Fugu-MT: arXiv Paper Translations

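This site provides Japanese translations of arXiv papers that are 30 pages or shorter and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not under a CC license, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.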

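The operator of this site does not guarantee the quality of the site (including all information and translations) and accepts no responsibility for any consequences arising from its use.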

Papers with a publication date of 20240319.

Title	Authors	Abstract	Publication date / translation date
# A Big Data Analytics System for Predicting Suicidal Ideation in Real-Time Based on Social Media Streaming Data ( http://arxiv.org/abs/2404.12394v1 ) License: see linked page	Mohamed A. Allayla, Serkan Ayvaz	Online social media platforms have recently become integral to our society and daily routines. Every day, users worldwide spend a couple of hours on such platforms, expressing their sentiments and emotional state and contacting each other. Analyzing the huge amounts of data from these platforms can provide clear insight into public sentiment and help detect users' mental status. Early identification of these health risks may help prevent or reduce suicidal ideation and potentially save lives. Traditional techniques have become ineffective at processing such streams and large-scale datasets. Therefore, this paper proposes a new methodology based on a big data architecture to predict suicidal ideation from social media content. The proposed approach provides a practical analysis of social media data in two phases: batch processing and real-time streaming prediction. The batch dataset was collected from Reddit forums and used for model building and training, while streaming data was extracted using the Twitter streaming API and used for real-time prediction. After the raw data was preprocessed, the extracted features were fed to multiple Apache Spark ML classifiers: NB, LR, LinearSVC, DT, RF, and MLP. Various experiments were conducted using different feature-extraction techniques and testing scenarios. In the batch processing phase, features extracted with (Unigram + Bigram) + CV-IDF, combined with the MLP classifier, classified suicidal ideation with an accuracy of 93.47%; this configuration was then applied in the real-time streaming prediction phase.	Translated: 2024-07-01 11:58:46 Published: 2024-03-19
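The (Unigram + Bigram) + CV-IDF feature set named in the abstract above can be illustrated with a minimal pure-Python sketch. This is not the authors' Spark pipeline: the tokenizer, the function names, and the toy posts are all assumptions made for illustration, and the smoothed IDF formula is assumed to follow the form documented for Spark MLlib's `IDF`.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams of a token list, each joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_features(text):
    """Count unigrams and bigrams in a lower-cased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(ngrams(tokens, 1) + ngrams(tokens, 2))

def fit_idf(count_vectors):
    """Smoothed IDF, log((N + 1) / (df + 1)), over a list of count vectors."""
    n_docs = len(count_vectors)
    doc_freq = Counter()
    for counts in count_vectors:
        doc_freq.update(counts.keys())  # each document counts once per term
    return {t: math.log((n_docs + 1) / (df + 1)) for t, df in doc_freq.items()}

def transform(counts, idf):
    """Scale raw counts by IDF; terms unseen during fitting are dropped."""
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

# Toy posts, invented for illustration only (hypothetical data).
posts = [
    "i feel fine today",
    "i feel so alone today",
    "great game last night",
]
vectors = [count_features(p) for p in posts]
idf = fit_idf(vectors)
features = [transform(v, idf) for v in vectors]
```

Terms that occur in fewer posts (such as `alone`) receive a larger weight than terms shared across posts (such as `i`); the resulting sparse vectors are what a classifier such as an MLP would consume.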
# A Semi-automatic Cranial Implant Design Tool Based on Rigid ICP Template Alignment and Voxel Space Reconstruction ( http://arxiv.org/abs/2404.15287v1 ) License: see linked page	Michael Lackner, Behrus Puladi, Jens Kleesiek, Jan Egger, Jianning Li	In traumatic medical emergencies, patients depend heavily on cranioplasty (the craft of neurocranial repair using cranial implants). Despite the improvements made in recent years, the design of a patient-specific implant (PSI) is among the most complex, expensive, and least automated tasks in cranioplasty. Further research in this area is needed. Therefore, we created a prototype application with a graphical user interface (UI) specifically tailored for semi-automatic implant generation, where users only need to perform high-level actions. The proposed implant generation process involves setting an area of interest, aligning the templates, and then creating the implant in voxel space. Furthermore, we show that the alignment can be improved significantly by considering only clipped geometry in the vicinity of the defect border. The software prototype will be open-sourced at https://github.com/3Descape/Cranial_Implant_Design	Translated: 2024-07-01 11:49:01 Published: 2024-03-19
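The last step of the outline above, creating the implant in voxel space, can be pictured as a boolean operation on voxel sets. The sketch below assumes (this is not confirmed by the abstract) that the implant consists of the aligned template voxels inside the area of interest that the defective skull does not already occupy; the function name and the toy geometry are hypothetical.

```python
def implant_voxels(aligned_template, defective_skull, region_of_interest):
    """Return the template voxels inside the ROI that the defective skull lacks."""
    return (aligned_template & region_of_interest) - defective_skull

# Toy one-dimensional "skull": voxels x = 0..9, with a defect at x = 4..6.
template = {(x, 0, 0) for x in range(10)}
skull = {(x, 0, 0) for x in range(10) if x not in (4, 5, 6)}
roi = {(x, 0, 0) for x in range(3, 8)}

implant = implant_voxels(template, skull, roi)
```

In this toy case the implant fills exactly the three missing voxels and does not overlap existing bone. A real tool would operate on dense 3-D grids after the rigid ICP alignment has been applied.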
# Algorithm for AGC index management against crowded radio environment ( http://arxiv.org/abs/2404.08652v1 ) License: see linked page	Morgane Joly, Fabian Rivière, Éric Renault	This paper describes a receiver that uses an innovative method to predict, from the history of receiver operating metrics (packets lost or well received), the optimum automatic gain control (AGC) index, or the most appropriate variable gain range, to use for the next packet reception, anticipating an interferer that appears during payload reception. This gives the receiver higher immunity to interferers, even if they occur during the gain-frozen payload reception period, while still ensuring an optimum sensitivity level. As a result, the method allows the receiver gain to be set to obtain an optimum trade-off between reception sensitivity and immunity to random interferers.	Translated: 2024-04-21 20:04:31 Published: 2024-03-19
# Thermal Crosstalk Modelling and Compensation Methods for Programmable Photonic Integrated Circuits ( http://arxiv.org/abs/2404.10589v1 ) License: see linked page	Isidora Teofilovic, Ali Cem, David Sanchez-Jacome, Daniel Perez-Lopez, Francesco Da Ros	Photonic integrated circuits play an important role in the field of optical computing, promising faster and more energy-efficient operations compared to their digital counterparts. This advantage stems from the inherent suitability of optical signals to carry out matrix multiplication. However, even deterministic phenomena such as thermal crosstalk make precise programming of photonic chips a challenging task. Here, we train and experimentally evaluate three models incorporating varying degrees of physics intuition to predict the effect of thermal crosstalk in different locations of an integrated programmable photonic mesh. We quantify the effect of thermal crosstalk by the resonance wavelength shift in the power spectrum of a microring resonator implemented in the chip, achieving modelling errors <0.5 pm. We experimentally validate the models through compensation of the crosstalk-induced wavelength shift. Finally, we evaluate the generalization capabilities of one of the models by employing it to predict and compensate for the effect of thermal crosstalk for parts of the chip it was not trained on, revealing root-mean-square errors of <2.0 pm.	Translated: 2024-04-21 19:45:03 Published: 2024-03-19
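The error figures quoted above are root-mean-square errors between predicted and measured resonance wavelength shifts. A minimal sketch of that metric follows, together with the residual shift left after subtracting a model's prediction; the numbers are invented, and treating compensation as a simple subtraction is an assumption made here rather than the authors' procedure.

```python
import math

def rmse(predicted, measured):
    """Root-mean-square error between two equal-length sequences (same units)."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))

def residual_shifts(measured, predicted):
    """Shift remaining if the predicted crosstalk is subtracted from the measurement."""
    return [m - p for m, p in zip(measured, predicted)]

# Hypothetical wavelength shifts in picometres (not data from the paper).
measured_pm = [1.5, 2.0, 2.5]
predicted_pm = [1.0, 2.0, 3.0]

error_pm = rmse(predicted_pm, measured_pm)
residual_pm = residual_shifts(measured_pm, predicted_pm)
```

Because the RMSE squares each error before averaging, a single large miss raises it more than several small ones, which makes it a conservative summary of how far a compensated resonance can end up from its target.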
# The Use of Generative Search Engines for Knowledge Work and Complex Tasks ( http://arxiv.org/abs/2404.04268v1 ) License: see linked page	Siddharth Suri, Scott Counts, Leijie Wang, Chacha Chen, Mengting Wan, Tara Safavi, Jennifer Neville, Chirag Shah, Ryen W. White, Reid Andersen, Georg Buscher, Sathish Manivannan, Nagu Rangan, Longqi Yang	Until recently, search engines were the predominant method for people to access online information. The recent emergence of large language models (LLMs) has given machines new capabilities, such as the ability to generate new digital artifacts like text, images, and code, resulting in a new tool, the generative search engine, which combines the capabilities of LLMs with a traditional search engine. Through an empirical analysis of Bing Copilot (Bing Chat), one of the first publicly available generative search engines, we analyze the types and complexity of tasks that people use Bing Copilot for compared to Bing Search. Findings indicate that people use the generative search engine for more knowledge work tasks that are higher in cognitive complexity than were commonly done with a traditional search engine.	Translated: 2024-04-14 13:21:48 Published: 2024-03-19
# Algorithmic Collective Action in Recommender Systems: Promoting Songs by Reordering Playlists ( http://arxiv.org/abs/2404.04269v1 ) License: see linked page	Joachim Baumann, Celestine Mendler-Dünner	We investigate algorithmic collective action in transformer-based recommender systems. Our use case is a collective of fans aiming to promote the visibility of an artist by strategically placing one of their songs in the existing playlists they control. The success of the collective is measured by the increase in test-time recommendations of the targeted song. We introduce two easily implementable strategies towards this goal and test their efficacy on a publicly available recommender system model released by a major music streaming platform. Our findings reveal that even small collectives (controlling less than 0.01% of the training data) can achieve up to 25x amplification of recommendations by strategically choosing the position at which to insert the song. We then focus on investigating the externalities of the strategy. We find that the performance loss for the platform is negligible, and the recommendations of other songs are largely preserved, minimally impairing the user experience of participants. Moreover, the costs are evenly distributed among other artists. Taken together, our findings demonstrate how collective action strategies can be effective while not necessarily being adversarial, raising new questions around incentives, social dynamics, and equilibria in recommender systems.	Translated: 2024-04-14 13:21:48 Published: 2024-03-19
# Deep learning-based auto-segmentation of paraganglioma for growth monitoring ( http://arxiv.org/abs/2404.07952v1 ) License: see linked page	E. M. C. Sijben, J. C. Jansen, M. de Ridder, P. A. N. Bosman, T. Alderliesten	Volume measurement of a paraganglioma (a rare neuroendocrine tumor that typically forms along major blood vessels and nerve pathways in the head and neck region) is crucial for monitoring and modeling tumor growth in the long term. However, in clinical practice, using available tools to do these measurements is time-consuming and suffers from tumor-shape assumptions and observer-to-observer variation. Growth modeling could play a significant role in solving a decades-old dilemma (stemming from uncertainty regarding how the tumor will develop over time). Treating paraganglioma patients can prevent severe symptoms. However, treating patients who do not actually need it comes at the cost of unnecessary possible side effects and complications. Improved measurement techniques could enable growth model studies with a large amount of tumor volume data, possibly giving valuable insights into how these tumors develop over time. Therefore, we propose an automated tumor volume measurement method based on a deep learning segmentation model using no-new-Net (nnU-Net). We assess the performance of the model based on visual inspection by a senior otorhinolaryngologist and on several quantitative metrics, comparing model outputs with manual delineations and with the variation in manual delineation among multiple observers. Our findings indicate that the automatic method performs (at least) equal to manual delineation. Finally, using the created model and a linking procedure that we propose to track the tumor over time, we show how additional volume measurements affect the fit of known growth functions.	Translated: 2024-04-14 13:03:36 Published: 2024-03-19
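Once a segmentation mask exists, the volume measurement described above reduces to counting voxels and multiplying by the voxel size, and agreement with a manual delineation is commonly summarized by an overlap score. The sketch below assumes masks are given as sets of voxel indices and uses the Dice coefficient as the overlap metric; the abstract does not name its metrics, so Dice is an illustrative choice, and the masks and spacing are invented.

```python
def tumor_volume_ml(mask, voxel_spacing_mm):
    """Volume in millilitres of a binary mask given as a set of voxel indices."""
    sx, sy, sz = voxel_spacing_mm
    return len(mask) * sx * sy * sz / 1000.0  # 1 mL = 1000 mm^3

def dice(mask_a, mask_b):
    """Dice overlap: 2 * |A and B| / (|A| + |B|); two empty masks count as 1.0."""
    if not mask_a and not mask_b:
        return 1.0
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))

# Hypothetical masks: a 10x10x10 cube and the same cube shifted one voxel in x.
automatic = {(x, y, z) for x in range(10) for y in range(10) for z in range(10)}
manual = {(x + 1, y, z) for (x, y, z) in automatic}

volume = tumor_volume_ml(automatic, (1.0, 1.0, 2.0))
overlap = dice(automatic, manual)
```

A series of such volumes over successive scans is the input a growth function would be fitted to.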
# Can AI Outperform Human Experts in Creating Social Media Creatives? ( http://arxiv.org/abs/2404.00018v1 ) License: see linked page	Eunkyung Park, Raymond K. Wong, Junbum Kwon	Artificial Intelligence has outperformed human experts in functional tasks such as chess and baduk. How about creative tasks? This paper evaluates AI's capability in the creative domain compared to human experts, on which little research has been conducted so far. We propose a novel Prompt-for-Prompt approach to generate social media creatives via prompt augmentation by Large Language Models. We take the most popular Instagram posts (those with the largest number of likes) in top brands' Instagram accounts to create social media creatives. We give GPT-4 several prompt instructions with text descriptions to generate the most effective prompts for cutting-edge text-to-image generators: Midjourney, DALL-E 3, and Stable Diffusion. LLM-augmented prompts can boost AI's abilities by adding objectives, engagement strategy, lighting, and brand consistency for social media image creation. We conduct an extensive human evaluation experiment and find that AI outperforms human experts, and that Midjourney is better than the other text-to-image generators. Surprisingly, contrary to conventional wisdom in the social media industry, prompt instructions that include "eye-catching" perform much worse than those that include "natural". Regarding the type of creatives, AI improves creatives with animals or products but less so those with real people. Also, AI improves creatives with short text descriptions more than those with long text descriptions, because there is more room for AI to augment prompts with shorter descriptions.	Translated: 2024-04-07 23:17:33 Published: 2024-03-19
# Advancing Explainable Autonomous Vehicle Systems: A Comprehensive Review and Research Roadmap ( http://arxiv.org/abs/2404.00019v1 ) License: see linked page	Sule Tekkesinoglu, Azra Habibovic, Lars Kunze	Given the uncertainty surrounding how existing explainability methods for autonomous vehicles (AVs) meet the diverse needs of stakeholders, a thorough investigation is imperative to determine the contexts requiring explanations and suitable interaction strategies. A comprehensive review becomes crucial to assess the alignment of current approaches with the varied interests and expectations within the AV ecosystem. This study presents a review to discuss the complexities associated with explanation generation and presentation to facilitate the development of more effective and inclusive explainable AV systems. Our investigation led to categorising existing literature into three primary topics: explanatory tasks, explanatory information, and explanatory information communication. Drawing upon our insights, we have proposed a comprehensive roadmap for future research centred on (i) knowing the interlocutor, (ii) generating timely explanations, (iii) communicating human-friendly explanations, and (iv) continuous learning. Our roadmap is underpinned by principles of responsible research and innovation, emphasising the significance of diverse explanation requirements. To effectively tackle the challenges associated with implementing explainable AV systems, we have delineated various research directions, including the development of privacy-preserving data integration, ethical frameworks, real-time analytics, human-centric interaction design, and enhanced cross-disciplinary collaborations. By exploring these research directions, the study aims to guide the development and deployment of explainable AVs, informed by a holistic understanding of user needs, technological advancements, regulatory compliance, and ethical considerations, thereby ensuring safer and more trustworthy autonomous driving experiences.	Translated: 2024-04-07 23:17:33 Published: 2024-03-19
# Evaluatology: The Science and Engineering of Evaluation ( http://arxiv.org/abs/2404.00021v1 ) License: see linked page	Jianfeng Zhan, Lei Wang, Wanling Gao, Hongxiao Li, Chenxi Wang, Yunyou Huang, Yatao Li, Zhengxin Yang, Guoxin Kang, Chunjie Luo, Hainan Ye, Shaopeng Dai, Zhifei Zhang	Evaluation is a crucial aspect of human existence and plays a vital role in various fields. However, it is often approached in an empirical and ad-hoc manner, lacking consensus on universal concepts, terminologies, theories, and methodologies. This lack of agreement has significant repercussions. This article aims to formally introduce the discipline of evaluatology, which encompasses the science and engineering of evaluation. We propose a universal framework for evaluation, encompassing concepts, terminologies, theories, and methodologies that can be applied across various disciplines. Our research reveals that the essence of evaluation lies in conducting experiments that intentionally apply a well-defined evaluation condition to diverse subjects and infer the impact of different subjects by measuring and/or testing. Derived from the essence of evaluation, we propose five axioms focusing on key aspects of evaluation outcomes as the foundational evaluation theory. These axioms serve as the bedrock upon which we build universal evaluation theories and methodologies. When evaluating a single subject, it is crucial to create evaluation conditions with different levels of equivalency. By applying these conditions to diverse subjects, we can establish reference evaluation models. These models allow us to alter a single independent variable at a time while keeping all other variables as controls. When evaluating complex scenarios, the key lies in establishing a series of evaluation models that maintain transitivity. Building upon the science of evaluation, we propose a formal definition of a benchmark as a simplified and sampled evaluation condition that guarantees different levels of equivalency. This concept serves as the cornerstone for a universal benchmark-based engineering approach to evaluation across various disciplines, which we refer to as benchmarkology.	Translated: 2024-04-07 23:17:33 Published: 2024-03-19
# WoLF: Large Language Model Framework for CXR Understanding ( http://arxiv.org/abs/2403.15456v1 ) License: see linked page	Seil Kang, Donghyun Kim, Junhyeok Kim, Hyo Kyung Lee, Seong Jae Hwang	Significant methodological strides have been made toward Chest X-ray (CXR) understanding via modern vision-language models (VLMs), demonstrating impressive Visual Question Answering (VQA) and CXR report generation abilities. However, existing CXR understanding frameworks still possess several procedural caveats. (1) Previous methods solely use CXR reports, which are insufficient for comprehensive VQA, especially when additional health-related data like medication history and prior diagnoses are needed. (2) Previous methods use raw CXR reports, which are often arbitrarily structured. While modern language models can understand various text formats, restructuring reports for clearer, organized anatomy-based information could enhance their usefulness. (3) Current evaluation methods for CXR-VQA primarily emphasize linguistic correctness, lacking the capability to offer nuanced assessments of the generated answers. In this work, to address the aforementioned caveats, we introduce WoLF, a Wide-scope Large Language Model Framework for CXR understanding. To resolve (1), we capture multi-faceted records of patients, which are utilized for accurate diagnoses in real-world clinical scenarios. Specifically, we adopt Electronic Health Records (EHR) to generate instruction-following data suited for CXR understanding. Regarding (2), we enhance report generation performance by decoupling knowledge in CXR reports based on anatomical structure, even within the attention step, via masked attention. To address (3), we introduce an AI-evaluation protocol optimized for assessing the capabilities of LLMs. Through extensive experimental validations, WoLF demonstrates superior performance over other models on MIMIC-CXR in the AI-evaluation arena for VQA (up to +9.47%p mean score) and on metrics for report generation (+7.3%p BLEU-1).	Translated: 2024-03-26 22:41:56 Published: 2024-03-19
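The report-generation gain above is quoted in BLEU-1, which also illustrates caveat (3): the score rewards word overlap rather than clinical correctness. The sketch below is a simplified sentence-level BLEU-1 with a single reference (clipped unigram precision times the brevity penalty); published scores are normally computed at corpus level with a standard toolkit, and the example sentences are invented.

```python
import math
from collections import Counter

def bleu1(candidate, reference):
    """Sentence-level BLEU-1 against one reference, on whitespace tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Clip each candidate word count by its count in the reference.
    clipped = sum(min(c, ref_counts[w]) for w, c in Counter(cand).items())
    precision = clipped / len(cand)
    # Brevity penalty punishes candidates shorter than the reference.
    penalty = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return penalty * precision

# Hypothetical report sentences (not taken from MIMIC-CXR).
score = bleu1("no acute cardiopulmonary process",
              "no acute cardiopulmonary abnormality")
repeated = bleu1("the the the", "the cat")
```

A candidate that says "no pneumothorax" where the reference says "small pneumothorax" would still share most of its words with the reference, which is why overlap metrics alone cannot judge whether a generated answer is clinically right.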
# The Journey to Trustworthy AI- Part 1: Pursuit of Pragmatic Frameworks ( http://arxiv.org/abs/2403.15457v1 ) License: see linked page	Mohamad M Nasr-Azadani, Jean-Luc Chatelain	This paper reviews Trustworthy Artificial Intelligence (TAI) and its various definitions. Considering the principles respected in any society, TAI is often characterized by a few attributes, some of which have led to confusion in regulatory or engineering contexts. We argue against using terms such as Responsible or Ethical AI as substitutes for TAI, and to help clarify any confusion, we suggest leaving them behind. Given the subjectivity and complexity inherent in TAI, developing a universal framework is deemed infeasible. Instead, we advocate for approaches centered on addressing key attributes and properties such as fairness, bias, risk, security, explainability, and reliability. We examine the ongoing regulatory landscape, with a focus on initiatives in the EU, China, and the USA. We recognize that differences in AI regulations based on geopolitical and geographical reasons pose an additional challenge for multinational companies. We identify risk as a core factor in AI regulation and TAI. For example, as outlined in the EU-AI Act, organizations must gauge the risk level of their AI products to act accordingly (or risk hefty fines). We compare modalities of TAI implementation and how multiple cross-functional teams are engaged in the overall process. Thus, a brute-force approach to enacting TAI renders its efficiency and agility moot. To address this, we introduce our framework Set-Formalize-Measure-Act (SFMA). Our solution highlights the importance of transforming TAI-aware metrics, drivers of TAI, stakeholders, and business/legal requirements into actual benchmarks or tests. Finally, over-regulation driven by panic about powerful AI models can, in fact, harm TAI too. Based on GitHub user-activity data, in 2023, AI open-source projects rose to the top projects by contributor count. Enabling innovation in TAI hinges on the independent contributions of the open-source community.	Translated: 2024-03-26 22:41:56 Published: 2024-03-19
# Fine-Tuning Pre-trained Language Models to Detect In-Game Trash Talks ( http://arxiv.org/abs/2403.15458v1 ) License: see linked page	Daniel Fesalbon, Arvin De La Cruz, Marvin Mallari, Nelson Rodelas	Common problems in playing online mobile and computer games are related to toxic behavior and abusive communication among players. Based on different reports and studies, the study also discusses the impact of online hate speech and toxicity on players' in-game performance and overall well-being. This study investigates the capability of pre-trained language models to classify or detect trash talk or toxic in-game messages. The study employs and evaluates the performance of pre-trained BERT and GPT language models in detecting toxicity within in-game chats. Using publicly available APIs, in-game chat data from DOTA 2 game matches were collected, processed, reviewed, and labeled as non-toxic, mild (toxicity), and toxic. The study was able to collect around two thousand in-game chats to train and test BERT (Base-uncased), BERT (Large-uncased), and GPT-3 models. Based on the three models' state-of-the-art performance, this study concludes that pre-trained language models show promising potential for addressing online hate speech and in-game insulting trash talk.	Translated: 2024-03-26 22:41:56 Published: 2024-03-19
# Assessing effect sizes, variability, and power in the on-line study of language production ( http://arxiv.org/abs/2403.15459v1 ) License: see linked page	Bürki Audrey, Vasishth Shravan	With the pandemic, many experimental psychologists and linguists have started to collect data over the internet (hereafter on-line data). The feasibility of such experiments and the sample sizes required to achieve sufficient statistical power in future experiments have to be assessed. This in turn requires information on effect sizes and variability. In a series of analyses, we compare response time data obtained in the same word production experiment conducted in the lab and on-line. These analyses allow us to determine whether the two settings differ in effect sizes, in the consistency of responses over the course of the experiment, in the variability of average response times across participants, in the magnitude of effect sizes across participants, or in the amount of unexplained variability. We assess the impact of these differences on the power of the design in a series of simulations. Our findings temper the enthusiasm raised by previous studies and suggest that on-line production studies might be feasible but at a non-negligible cost. The sample sizes required to achieve sufficient power in on-line language production studies come with a non-negligible increase in the amount of manual labour.	Translated: 2024-03-26 22:41:56 Published: 2024-03-19
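The simulation-based power assessment mentioned above can be illustrated with a deliberately simplified sketch: draw per-participant condition differences from a normal distribution, test whether their mean differs from zero, and report the share of simulated experiments that reach significance. The effect size, the standard deviation, and the normal-approximation z-test are assumptions chosen for illustration; they are not the authors' model or estimates.

```python
import math
import random
import statistics

def simulate_power(n_participants, effect_ms, sd_ms, n_sims=500, seed=1):
    """Estimate power of a two-sided test on per-participant differences."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    z_crit = 1.96  # two-sided 5% level under a normal approximation
    hits = 0
    for _ in range(n_sims):
        diffs = [rng.gauss(effect_ms, sd_ms) for _ in range(n_participants)]
        standard_error = statistics.stdev(diffs) / math.sqrt(n_participants)
        if abs(statistics.mean(diffs) / standard_error) > z_crit:
            hits += 1
    return hits / n_sims

# Hypothetical 20 ms effect with a between-participant SD of 50 ms.
power_small = simulate_power(10, effect_ms=20, sd_ms=50)
power_large = simulate_power(100, effect_ms=20, sd_ms=50)
```

Raising `sd_ms`, as noisier on-line data would, lowers the estimated power at a fixed sample size, which is the mechanism behind the larger samples the abstract calls for.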
# FUELVISION:ワイルドファイア燃料マッピングのためのマルチモーダルデータフュージョンとマルチモデルアンサンブルアルゴリズム FUELVISION: A Multimodal Data Fusion and Multimodel Ensemble Algorithm for Wildfire Fuels Mapping ( http://arxiv.org/abs/2403.15462v1 ) ライセンス: Link先を確認	Riyaaz Uddien Shaik, Mohamad Alipour, Eric Rowell, Bharathan Balaji, Adam Watts, Ertugrul Taciroglu,	(参考訳) 燃料条件の正確な評価は、火火の点火および行動予測およびリスク管理の前提条件である。提案手法は,ランドサット8光画像,センチネル-1(Cバンド)合成開口レーダ(SAR)画像,PALSAR(Lバンド)SAR画像,地形特徴などの多様なデータソースを利用して,燃料の種類や分布に関する包括的情報を取得する。 USDAフォレストサービスから得られた森林調査プロットデータを用いて「スコット・アンド・バーガン40」などのランドスケープスケールの燃料を推定するために,アンサンブルモデルを訓練した。しかし、この基本的なアプローチは、トレーニングデータの不足により、比較的貧弱な結果をもたらした。 Pseudo-labeled and fully synthetic datasets were developed using Generative AI approach to address the limit of ground truth data available。これらの合成データセットは、モデルトレーニングの堅牢性とカバレッジを高めるために、カリフォルニアからFIAデータを増強するために使用された。ディープラーニングニューラルネットワーク、決定木、勾配向上など、一連の手法を使用することで、燃料マッピングの精度は80%近く向上した。大規模な実験と評価を通じて、2021年のディクシー火災とカルドール火災の地域で提案手法の有効性が検証された。国立農業画像計画(NAIP)と木材収穫図の高分解能データとの比較分析により, ほぼリアルタイムの燃料マッピングが可能な提案手法の堅牢性と信頼性が確認された。 Accurate assessment of fuel conditions is a prerequisite for fire ignition and behavior prediction, and risk management. The method proposed herein leverages diverse data sources including Landsat-8 optical imagery, Sentinel-1 (C-band) Synthetic Aperture Radar (SAR) imagery, PALSAR (L-band) SAR imagery, and terrain features to capture comprehensive information about fuel types and distributions. An ensemble model was trained to predict landscape-scale fuels such as the 'Scott and Burgan 40' using the as-received Forest Inventory and Analysis (FIA) field survey plot data obtained from the USDA Forest Service. However, this basic approach yielded relatively poor results due to the inadequate amount of training data. Pseudo-labeled and fully synthetic datasets were developed using generative AI approaches to address the limitations of ground truth data availability. 
These synthetic datasets were used for augmenting the FIA data from California to enhance the robustness and coverage of model training. The use of an ensemble of methods including deep learning neural networks, decision trees, and gradient boosting offered a fuel mapping accuracy of nearly 80%. Through extensive experimentation and evaluation, the effectiveness of the proposed approach was validated for regions of the 2021 Dixie and Caldor fires. Comparative analyses against high-resolution data from the National Agriculture Imagery Program (NAIP) and timber harvest maps affirmed the robustness and reliability of the proposed approach, which is capable of near-real-time fuel mapping.	翻訳日:2024-03-26 22:41:56 公開日:2024-03-19
# 絶え間ない世界における異常発見:連続学習における画素レベル異常検出のためのベンチマーク Unveiling the Anomalies in an Ever-Changing World: A Benchmark for Pixel-Level Anomaly Detection in Continual Learning ( http://arxiv.org/abs/2403.15463v1 ) ライセンス: Link先を確認	Nikola Bugarin, Jovana Bugaric, Manuel Barusco, Davide Dalle Pezze, Gian Antonio Susto,	(参考訳) 異常検出は多くの実世界のアプリケーション、特に画像を扱う場合、関連する問題である。しかし、入力データ分布の時間的変化にはほとんど注意が払われておらず、性能が著しく低下する可能性がある。本研究では,連続学習環境におけるPixel-Level Anomaly Detectionの問題点について検討する。本研究では,古典的環境における異常検出問題の解決と,継続学習環境における動作に適応するために,いくつかの最先端技術を実装した。アプローチを検証するために,画素ベースの異常のある実世界の画像データセットを用いて,信頼性の高いベンチマークを提供し,この分野のさらなる進歩の基盤として機能する。我々は、どの異常検出方法と、どのファミリーのアプローチが継続的な学習環境に適しているかについて議論する包括的分析を行う。 Anomaly Detection is a relevant problem in numerous real-world applications, especially when dealing with images. However, little attention has been paid to the issue of changes over time in the input data distribution, which may cause a significant decrease in performance. In this study, we investigate the problem of Pixel-Level Anomaly Detection in the Continual Learning setting, where new data arrives over time and the goal is to perform well on new and old data. We implement several state-of-the-art techniques to solve the Anomaly Detection problem in the classic setting and adapt them to work in the Continual Learning setting. To validate the approaches, we use a real-world dataset of images with pixel-based anomalies to provide a reliable benchmark and serve as a foundation for further advancements in the field. We provide a comprehensive analysis, discussing which Anomaly Detection methods and which families of approaches seem more suitable for the Continual Learning setting.	翻訳日:2024-03-26 22:41:56 公開日:2024-03-19
# EHRを用いたLLMによるFew-Shot病の予測:予測エージェント推論とクリティカルエージェント指導を組み合わせた新しいアプローチ LLMs-based Few-Shot Disease Predictions using EHR: A Novel Approach Combining Predictive Agent Reasoning and Critical Agent Instruction ( http://arxiv.org/abs/2403.15464v1 ) ライセンス: Link先を確認	Hejie Cui, Zhuocheng Shen, Jieyu Zhang, Hui Shao, Lianhui Qin, Joyce C. Ho, Carl Yang,	(参考訳) 電子健康記録(EHR)は、疾患予測などの健康関連予測タスクに有用な患者データを含んでいる。従来のアプローチは、巨大なラベル付きデータセットを必要とする教師付き学習手法に依存しており、高価で入手が難しい。本研究では,Large Language Models (LLMs) を用いて,構造化患者訪問データ(例えば,診断,検査,処方薬)を自然言語の物語に変換する可能性について検討した。様々なERH予測指向のプロンプト戦略を用いて,LLMのゼロショット性能と少数ショット性能を評価した。さらに、予測を行い、推論プロセスを生成する予測エージェントと、誤った予測を解析し、予測エージェントの推論を改善するためのガイダンスを提供する批評家エージェントと、異なる役割を持つLLMエージェントを利用する新しいアプローチを提案する。提案手法により,従来のERHによる疾患予測における教師あり学習法と比較して,LLMは極めて少ない性能を達成でき,健康志向の応用の可能性が示唆された。 Electronic health records (EHRs) contain valuable patient data for health-related prediction tasks, such as disease prediction. Traditional approaches rely on supervised learning methods that require large labeled datasets, which can be expensive and challenging to obtain. In this study, we investigate the feasibility of applying Large Language Models (LLMs) to convert structured patient visit data (e.g., diagnoses, labs, prescriptions) into natural language narratives. We evaluate the zero-shot and few-shot performance of LLMs using various EHR-prediction-oriented prompting strategies. Furthermore, we propose a novel approach that utilizes LLM agents with different roles: a predictor agent that makes predictions and generates reasoning processes and a critic agent that analyzes incorrect predictions and provides guidance for improving the reasoning of the predictor agent. Our results demonstrate that with the proposed approach, LLMs can achieve decent few-shot performance compared to traditional supervised learning methods in EHR-based disease predictions, suggesting its potential for health-oriented applications.	翻訳日:2024-03-26 22:41:56 公開日:2024-03-19
# ロールアウトアルゴリズムによる$n$-Grams, Transformer, HMMs, Markov Chainsの配列生成 Most Likely Sequence Generation for $n$-Grams, Transformers, HMMs, and Markov Chains, by Using Rollout Algorithms ( http://arxiv.org/abs/2403.15465v1 ) ライセンス: Link先を確認	Yuchao Li, Dimitri Bertsekas,	(参考訳) 本稿では,ChatGPTの基盤となるような$n$-gram構造を持つ変圧器について考察する。変換器は、単語列を生成するために使用できる次の単語確率を提供する。これらの確率に基づいて,確率の高い単語列の計算方法を検討する。与えられた初期状態から始まる最適な(最も可能性が高い)単語列を計算することは難解な問題であり、我々は$N$の低次多項式で$n$-gramのボキャブラリサイズである時間に$N$の単語列を計算する方法を提案する。これらの手法は、任意のヒューリスティックポリシーの性能を向上させることができる単一のポリシーイテレーションの形式である、近似動的プログラミングからのロールアウトアプローチに基づいている。私たちの場合、最も高い確率を持つ次の単語を生成する欲求的ヒューリスティックを使用します。解析, 実例, 計算実験により, 本手法は, 強欲なヒューリスティックよりも計算量がわずかに増加し, 高い確率の列を生成することができることを示した。解析と実験は変換器やChatGPTのようなモデルで生じる型のマルコフ連鎖に焦点が当てられているが、本手法は一般的な有限状態マルコフ連鎖や、ビタビ復号法が広く用いられるHMM(Hidden Markov Models)の関連する推論応用に適用できる。 In this paper we consider a transformer with an $n$-gram structure, such as the one underlying ChatGPT. The transformer provides next word probabilities, which can be used to generate word sequences. We consider methods for computing word sequences that are highly likely, based on these probabilities. Computing the optimal (i.e., most likely) word sequence starting with a given initial state is an intractable problem, so we propose methods to compute highly likely sequences of $N$ words in time that is a low order polynomial in $N$ and in the vocabulary size of the $n$-gram. These methods are based on the rollout approach from approximate dynamic programming, a form of single policy iteration, which can improve the performance of any given heuristic policy. In our case we use a greedy heuristic that generates as next word one that has the highest probability. We show with analysis, examples, and computational experimentation that our methods are capable of generating highly likely sequences with a modest increase in computation over the greedy heuristic. 
While our analysis and experiments are focused on Markov chains of the type arising in transformer and ChatGPT-like models, our methods apply to general finite-state Markov chains, and related inference applications of Hidden Markov Models (HMM), where Viterbi decoding is used extensively.	翻訳日:2024-03-26 22:41:56 公開日:2024-03-19
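The rollout idea described in the abstract above can be illustrated on a toy Markov chain. The following is a minimal sketch, not the authors' implementation: the transition matrix `P` and the function names `greedy_sequence` and `rollout_sequence` are assumptions introduced only for this example. At each step, rollout scores every candidate next state by the probability of moving there multiplied by the probability of the greedy completion from that state, and keeps the best candidate.

```python
def greedy_sequence(P, start, n_steps):
    """Base heuristic: always follow the most probable transition.

    Returns (list of visited states, probability of that sequence).
    """
    seq, prob, state = [], 1.0, start
    for _ in range(n_steps):
        nxt = max(range(len(P)), key=lambda j: P[state][j])
        prob *= P[state][nxt]
        seq.append(nxt)
        state = nxt
    return seq, prob


def rollout_sequence(P, start, n_steps):
    """One-step lookahead with the greedy heuristic used to complete each candidate."""
    seq, prob, state = [], 1.0, start
    for k in range(n_steps):
        remaining = n_steps - k - 1
        best_next, best_score = None, -1.0
        for j in range(len(P)):
            # Probability of the greedy completion starting from candidate j.
            _, tail_prob = greedy_sequence(P, j, remaining)
            score = P[state][j] * tail_prob
            if score > best_score:
                best_next, best_score = j, score
        prob *= P[state][best_next]
        seq.append(best_next)
        state = best_next
    return seq, prob


if __name__ == "__main__":
    # Hypothetical 3-state chain chosen so that the greedy choice is short-sighted.
    P = [
        [0.00, 0.60, 0.40],
        [0.34, 0.33, 0.33],
        [0.00, 0.00, 1.00],
    ]
    print("greedy :", greedy_sequence(P, 0, 2))
    print("rollout:", rollout_sequence(P, 0, 2))
```

On this hypothetical chain the greedy heuristic takes the 0.6 transition first and ends with a sequence probability of about 0.204, while rollout accepts the less likely first step and reaches 0.4. Each rollout step costs one greedy completion per candidate state, which is the kind of modest extra computation the abstract refers to.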
# ストラグラー存在下での1ビット勾配符号化に基づく分散学習 Distributed Learning based on 1-Bit Gradient Coding in the Presence of Stragglers ( http://arxiv.org/abs/2403.14716v1 ) ライセンス: Link先を確認	Chengxi Li, Mikael Skoglund,	(参考訳) 本稿では,ストラグラーの存在下での分散学習(DL)の問題について考察する。この問題に対して、勾配符号化に基づくDL手法が広く研究されており、一部のワーカーがストラグラーである場合の収束を保証するために、トレーニングデータを冗長にワーカーに配布している。しかし、これらの手法では、学習中に実数値ベクトルを送信する必要があるため、非常に高い通信負担が生じる。この欠点を克服するために,1ビット勾配符号化(1ビットGC-DL)に基づく新しいDL手法を提案する。この手法では,局所的に計算した勾配から符号化した1ビットデータをワーカーが送信することで通信オーバヘッドを削減する。理論的には、凸損失関数と非凸損失関数の両方に対する提案手法の収束保証を提供する。 1ビットGC-DLはベースライン法よりも優れており、同じ通信オーバヘッド下での学習性能が向上する。 This paper considers the problem of distributed learning (DL) in the presence of stragglers. For this problem, DL methods based on gradient coding have been widely investigated, which redundantly distribute the training data to the workers to guarantee convergence when some workers are stragglers. However, these methods require the workers to transmit real-valued vectors during the process of learning, which induces a very high communication burden. To overcome this drawback, we propose a novel DL method based on 1-bit gradient coding (1-bit GC-DL), where 1-bit data encoded from the locally computed gradients are transmitted by the workers to reduce the communication overhead. We theoretically provide the convergence guarantees of the proposed method for both the convex loss functions and nonconvex loss functions. It is shown empirically that 1-bit GC-DL outperforms the baseline methods, attaining better learning performance under the same communication overhead.	翻訳日:2024-03-25 21:41:26 公開日:2024-03-19
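To make the communication saving in the abstract above concrete, the sketch below shows a generic sign-based 1-bit compression with majority-vote aggregation (in the style of signSGD). This is a swapped-in illustration, not the 1-bit GC-DL encoding from the paper, whose details are not given here, and it omits the redundant data allocation that gradient coding relies on. The names `encode_sign`, `majority_vote`, and `sgd_step` are assumptions made for this example.

```python
def encode_sign(gradient):
    """Compress a real-valued gradient to one bit per coordinate (+1 or -1)."""
    return [1 if g >= 0 else -1 for g in gradient]


def majority_vote(messages):
    """Aggregate the 1-bit messages that actually arrived.

    Stragglers are modelled simply as missing entries in `messages`.
    A tie on a coordinate yields 0, i.e. no update for that coordinate.
    """
    if not messages:
        raise ValueError("no worker responded")
    dim = len(messages[0])
    totals = [sum(msg[i] for msg in messages) for i in range(dim)]
    return [1 if t > 0 else -1 if t < 0 else 0 for t in totals]


def sgd_step(params, direction, lr):
    """Move the parameters against the aggregated sign direction."""
    return [p - lr * d for p, d in zip(params, direction)]


if __name__ == "__main__":
    # Three workers responded; a fourth (the straggler) did not.
    local_gradients = [[0.5, -2.0, 0.1], [0.2, 0.3, -0.4], [-0.1, -0.7, 0.9]]
    messages = [encode_sign(g) for g in local_gradients]
    direction = majority_vote(messages)
    print(sgd_step([1.0, 1.0, 1.0], direction, lr=0.1))
```

Each worker sends one bit per coordinate instead of a floating-point value, which is the source of the reduced overhead; whether such a scheme converges depends on assumptions that this sketch does not check.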
# FedSR: IoTシステムにおける非IID性のための半分散フェデレーション学習アルゴリズム FedSR: A Semi-Decentralized Federated Learning Algorithm for Non-IIDness in IoT System ( http://arxiv.org/abs/2403.14718v1 ) ライセンス: Link先を確認	Jianjun Huang, Lixin Ye, Li Kang,	(参考訳) IoT(Industrial Internet of Things)では、大量のデータが毎日生成される。プライバシとセキュリティの問題から、これらすべてのデータをまとめてディープラーニングモデルをトレーニングすることは難しいため、データプライバシを保護する分散型機械学習パラダイムであるフェデレーション学習がIoTで広く使用されている。しかし、実践的なフェデレーション学習では、データ分布は通常デバイス間で大きな差異があり、データの不均一性はモデルの性能を低下させる。さらに、IoTのフェデレーション学習は通常、トレーニングに関わる多数のデバイスを持ち、クラウドサーバの限られた通信リソースは、トレーニングのボトルネックになる。上記の課題に対処するため,本論文では,集中型フェデレーション学習と分散型フェデレーション学習を組み合わせて,半分散型クラウドエッジデバイス階層型フェデレーション学習フレームワークを設計する。このフレームワークはデータ不均一性の影響を緩和でき,IoTに大規模に展開できる。データの不均一性の影響に対処するために、各リングクラスタにおけるインクリメンタル劣勾配最適化アルゴリズムを用いて、リングクラスタモデルの一般化能力を向上する。我々の大規模な実験は、提案手法がデータ不均一性の影響を効果的に軽減し、クラウドサーバにおける通信ボトルネックを軽減することができることを示している。 In the Industrial Internet of Things (IoT), a large amount of data is generated every day. Due to privacy and security issues, it is difficult to collect all these data together to train deep learning models; federated learning, a distributed machine learning paradigm that protects data privacy, has therefore been widely used in IoT. However, in practical federated learning, the data distributions usually differ widely across devices, and this data heterogeneity degrades the performance of the model. Moreover, federated learning in IoT usually involves a large number of devices in training, and the limited communication resources of cloud servers become a bottleneck for training. To address the above issues, in this paper, we combine centralized federated learning with decentralized federated learning to design a semi-decentralized cloud-edge-device hierarchical federated learning framework, which can mitigate the impact of data heterogeneity, and can be deployed at large scale in IoT.
To address the effect of data heterogeneity, we use an incremental subgradient optimization algorithm in each ring cluster to improve the generalization ability of the ring cluster models. Our extensive experiments show that our approach can effectively mitigate the impact of data heterogeneity and alleviate the communication bottleneck in cloud servers.	翻訳日:2024-03-25 21:41:26 公開日:2024-03-19
# ラベルの平滑化が選択的分類を低下させる理由と修正方法 Understanding Why Label Smoothing Degrades Selective Classification and How to Fix It ( http://arxiv.org/abs/2403.14715v1 ) ライセンス: Link先を確認	Guoxuan Xia, Olivier Laurent, Gianni Franchi, Christos-Savvas Bouganis,	(参考訳) ラベルスムーシング(LS)は、テスト精度の向上と実装の単純さにより、ディープニューラルネットワーク分類器をトレーニングするための一般的な正規化手法である。ハード」ワンホットラベルは、確率質量を他のクラスに均一に分散し、オーバーフィッティングを減らすことで「平滑化」される。本研究では,LSが選択分類(SC)に悪影響を及ぼすことを明らかにする。まず、LSがSCに一貫した劣化をもたらす様々なタスクやアーキテクチャを経験的に実証する。次に、ロジトレベルの勾配を解析することにより、LSはエラーの確率が低い場合には最大ロジトをより規則化し、エラーの確率が高い場合はより小さくすることで、過信と過信を悪化させることを示す。この結果より, SCでは強い分類器が不十分であったことが示唆された。次に,LSによる損失SCの回復に対するロジット正規化の有効性を実証した。さらに、勾配解析に基づいて、なぜそのような正規化が有効かを説明する。近いうちにコードを公開します。 Label smoothing (LS) is a popular regularisation method for training deep neural network classifiers due to its effectiveness in improving test accuracy and its simplicity in implementation. "Hard" one-hot labels are "smoothed" by uniformly distributing probability mass to other classes, reducing overfitting. In this work, we reveal that LS negatively affects selective classification (SC) - where the aim is to reject misclassifications using a model's predictive uncertainty. We first demonstrate empirically across a range of tasks and architectures that LS leads to a consistent degradation in SC. We then explain this by analysing logit-level gradients, showing that LS exacerbates overconfidence and underconfidence by regularising the max logit more when the probability of error is low, and less when the probability of error is high. This elucidates previously reported experimental results where strong classifiers underperform in SC. We then demonstrate the empirical effectiveness of logit normalisation for recovering lost SC performance caused by LS. Furthermore, based on our gradient analysis, we explain why such normalisation is effective. We will release our code shortly.	翻訳日:2024-03-25 21:31:40 公開日:2024-03-19
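The two ingredients discussed in the abstract above, label smoothing and selective classification, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: it uses the common formulation in which a fraction `alpha` of the probability mass is spread uniformly over all classes, and it uses the maximum softmax probability as the confidence score for rejection. The function names and the threshold value are hypothetical.

```python
import math


def smooth_labels(target, num_classes, alpha):
    """Return (1 - alpha) * one_hot(target) + alpha / num_classes for every class."""
    uniform = alpha / num_classes
    return [(1.0 - alpha) + uniform if k == target else uniform
            for k in range(num_classes)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_dist, logits):
    """Cross-entropy between a (possibly smoothed) target distribution and the model."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target_dist, probs) if t > 0)


def selective_predict(logits, threshold):
    """Predict the arg-max class, or return None (abstain) when confidence is too low."""
    probs = softmax(logits)
    confidence = max(probs)
    return probs.index(confidence) if confidence >= threshold else None


if __name__ == "__main__":
    print(smooth_labels(1, 4, alpha=0.1))
    print(selective_predict([5.0, 0.0, 0.0], threshold=0.9))
    print(selective_predict([0.1, 0.0, 0.0], threshold=0.9))
```

Because the smoothed target never reaches probability 1, training against it pulls the largest logit down; the abstract's point is that how strongly this happens depends on whether the prediction is correct, which in turn affects how well a confidence threshold separates errors from correct predictions.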
# 電子的口コミの個人・製品関連の先行要因 Individual and Product-Related Antecedents of Electronic Word-of-Mouth ( http://arxiv.org/abs/2403.14717v1 ) ライセンス: Link先を確認	Bogdan Anastasiei, Nicoleta Dospinescu, Octavian Dospinescu,	(参考訳) 本研究は,肯定的および否定的な電子的口コミ(eWOM)傾向の先行要因,およびeWOM傾向が製品の再購入意図に与える影響について検討した。製品関連変数と個人要因の2種類のeWOM予測因子を検討した。データはルーマニアの335人の被験者を対象にしたオンライン調査を通じて収集され、解析方法は構造方程式モデリングである。以上の結果から,個人的要因(ソーシャルメディアの利用行動,マーケティング・メイヴンイズム,評価欲求)が,肯定的か否定的かを問わず,製品レビューやコメントをオンラインで書く意図の最も重要な先行要因であることが示された。製品関連要因のうち、eWOMを提供する傾向に影響を与えるのはブランド信頼のみである。さらに、肯定的および否定的なeWOM意図は、いずれも再購入意図と関連している。 This research investigates the antecedents of positive and negative electronic word-of-mouth (eWOM) propensity, as well as the impact of eWOM propensity on the intention to repurchase the product. Two types of eWOM predictors were considered: product-related variables and personal factors. The data were collected through an online survey conducted on a sample of 335 Romanian subjects, and the analysis method was Structural Equation Modeling. Our findings show that personal factors - social media usage behavior, marketing mavenism and need to evaluate - are the most important antecedents of the intention to write product reviews and comments online, either positive or negative. Among the product-related factors, only brand trust influences the propensity to provide eWOM. Furthermore, both positive and negative eWOM intentions are associated with the repurchase intention.	翻訳日:2024-03-25 21:31:40 公開日:2024-03-19
# カラーアウェア置換によるLLM透かしのバイパス Bypassing LLM Watermarks with Color-Aware Substitutions ( http://arxiv.org/abs/2403.14719v1 ) ライセンス: Link先を確認	Qilong Wu, Varun Chandrasekaran,	(参考訳) 流通しているテキストが人間によるものか大規模言語モデル(LLM)が生成したものかを識別するために、透かし手法が提案されている。 Kirchenbauer et al. (2023a) の最先端の透かし戦略は、LLMが特定の(「green」)トークンを生成するよう偏らせる。しかし、この透かし法の堅牢性を決定することは未解決の問題である。既存の攻撃方法は、長いテキストセグメントの検出を回避できない。我々はこの制限を克服し、最初の「カラーアウェア」攻撃であるSCTS(Self Color Testing-based Substitution)を提案する。 SCTSは、透かし入りLLMに戦略的にプロンプトを与え、出力トークンの頻度を比較することで、色情報を取得する。この情報を使ってトークンの色を決定し、緑色のトークンを非緑色のトークンに置き換える。本実験においてSCTSは関連研究よりも少ない編集数で透かし検出を回避した。さらに、SCTSが任意の長さの透かしテキストの透かしを除去できることを理論的および実証的に示す。 Watermarking approaches have been proposed to identify whether circulated text is human-written or generated by a large language model (LLM). The state-of-the-art watermarking strategy of Kirchenbauer et al. (2023a) biases the LLM to generate specific ("green") tokens. However, determining the robustness of this watermarking method is an open problem. Existing attack methods fail to evade detection for longer text segments. We overcome this limitation, and propose Self Color Testing-based Substitution (SCTS), the first "color-aware" attack. SCTS obtains color information by strategically prompting the watermarked LLM and comparing output token frequencies. It uses this information to determine token colors, and substitutes green tokens with non-green ones. In our experiments, SCTS successfully evades watermark detection using fewer edits than related work. Additionally, we show both theoretically and empirically that SCTS can remove the watermark for arbitrarily long watermarked text.	翻訳日:2024-03-25 21:31:40 公開日:2024-03-19
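To see why replacing green tokens defeats detection, it helps to look at the detector side. The sketch below computes the one-proportion z-score used for green-list watermark detection in the style of Kirchenbauer et al.; it is a simplified illustration, not code from either paper. The green-list fraction `gamma`, the threshold `z_threshold`, and the function names are assumptions chosen for the example, and the step that decides which tokens are green (normally a keyed hash of the preceding context) is left out.

```python
import math


def watermark_z_score(green_count, total_tokens, gamma):
    """z-score of the observed green-token count against the no-watermark expectation.

    Without a watermark, each token is assumed to be green with probability gamma.
    """
    if total_tokens <= 0 or not 0.0 < gamma < 1.0:
        raise ValueError("need total_tokens > 0 and 0 < gamma < 1")
    expected = gamma * total_tokens
    std = math.sqrt(total_tokens * gamma * (1.0 - gamma))
    return (green_count - expected) / std


def is_flagged(green_count, total_tokens, gamma=0.5, z_threshold=4.0):
    """Flag the text as watermarked when the z-score exceeds the threshold."""
    return watermark_z_score(green_count, total_tokens, gamma) > z_threshold


if __name__ == "__main__":
    # 75 of 100 tokens green: far above the 50 expected by chance.
    print(watermark_z_score(75, 100, 0.5), is_flagged(75, 100))
    # After substituting 20 green tokens with non-green ones.
    print(watermark_z_score(55, 100, 0.5), is_flagged(55, 100))
```

In this hypothetical case, swapping 20 of the 75 green tokens lowers the z-score from 5.0 to 1.0, below the assumed threshold. An attack that knows which tokens are green can therefore target its edits, which is consistent with the abstract's claim of needing fewer edits than color-blind substitution.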
# 粒子加速器におけるビームダイナミクスの生成と予測のための条件付き遅延自己回帰リカレントモデル A conditional latent autoregressive recurrent model for generation and forecasting of beam dynamics in particle accelerators ( http://arxiv.org/abs/2403.13858v1 ) ライセンス: Link先を確認	Mahindra Rautela, Alan Williams, Alexander Scheinker,	(参考訳) 粒子加速器は、高エネルギーに荷電粒子ビームを集中させ、誘導し、加速する複雑なシステムである。ビーム診断は, 限られた非破壊測定, 計算要求シミュレーション, システム内固有の不確実性により, 困難な問題となる。本研究では,加速器内の荷電粒子の時空間的ダイナミクスを学習するための,条件付き遅延自己回帰時間モデル(CLARM)と呼ばれる2段階の教師なしディープラーニングフレームワークを提案する。 CLARMは、6次元位相空間を低次元の潜在分布に変換する条件変分オートエンコーダ(CVAE)と、時間的ダイナミクスを自己回帰的に捉えるLong Short-Term Memory(LSTM)ネットワークで構成される。 CLARMは、潜在空間表現をサンプリングして復号することで、様々な加速器モジュールでプロジェクションを生成することができる。このモデルはまた、過去の状態(上流位置)から荷電粒子の将来の状態(下流位置)を予測する。その結果,提案手法の予測能力と生成能力は,様々な評価指標と比較した場合に有望であることが示唆された。 Particle accelerators are complex systems that focus, guide, and accelerate intense charged particle beams to high energy. Beam diagnostics present a challenging problem due to limited non-destructive measurements, computationally demanding simulations, and inherent uncertainties in the system. We propose a two-step unsupervised deep learning framework named as Conditional Latent Autoregressive Recurrent Model (CLARM) for learning the spatiotemporal dynamics of charged particles in accelerators. CLARM consists of a Conditional Variational Autoencoder (CVAE) transforming six-dimensional phase space into a lower-dimensional latent distribution and a Long Short-Term Memory (LSTM) network capturing temporal dynamics in an autoregressive manner. The CLARM can generate projections at various accelerator modules by sampling and decoding the latent space representation. The model also forecasts future states (downstream locations) of charged particles from past states (upstream locations). The results demonstrate that the generative and forecasting ability of the proposed approach is promising when tested against a variety of evaluation metrics.	翻訳日:2024-03-22 18:28:52 公開日:2024-03-19
# 創造性と機械学習: 調査 Creativity and Machine Learning: A Survey ( http://arxiv.org/abs/2104.02726v4 ) ライセンス: Link先を確認	Giorgio Franceschelli, Mirco Musolesi,	(参考訳) 機械学習とクリエイティビティの分野への関心が高まっている。本稿では,計算創造性理論の歴史と現状,鍵となる機械学習技術(生成的深層学習を含む),およびそれに対応する自動評価手法について概説する。この分野における重要な貢献について批判的な議論を行った後、この分野における現在の研究課題と新たな機会について概説する。 There is a growing interest in the area of machine learning and creativity. This survey presents an overview of the history and the state of the art of computational creativity theories, key machine learning techniques (including generative deep learning), and corresponding automatic evaluation methods. After presenting a critical discussion of the key contributions in this area, we outline the current research challenges and emerging opportunities in this field.	翻訳日:2024-03-21 23:26:53 公開日:2024-03-19
# 計量空間における弱凸集合の学習 Learning Weakly Convex Sets in Metric Spaces ( http://arxiv.org/abs/2105.06251v2 ) ライセンス: Link先を確認	Eike Stadtländer, Tamás Horváth, Stefan Wrobel,	(参考訳) 機械学習理論で研究される中心的な問題の1つは、与えられた仮説のクラスに対して、効率よく一貫した(consistent)仮説、すなわち訓練誤差がゼロの仮説を見つけることができるかどうかである。凸仮説に関する問題は広く研究されているが、いくつかの非連結領域からなる非凸仮説に対して効率的な学習が可能かどうかという問題は、いまだに理解されていない。弱凸仮説の効率的な学習(凸仮説のパラメータ化緩和)がブール関数の特別な場合において可能であることが、かなり以前から示されてきたが、このアイデアが一般的なパラダイムへと発展できるかどうかという問題は、まだ研究されていない。本稿では,計量空間上の弱凸仮説の広いクラスに対して,一貫した仮説探索問題を多項式時間で解くことができることを示す。そこで本研究では,一貫した弱凸仮説を導出する一般領域非依存アルゴリズムを提案し,その効率性のための十分条件を証明し,対応する仮説クラスを特徴づける。一般のアルゴリズムとその特性を説明するために,いくつかの非自明な学習例について論じ,それに対応する一貫した仮説探索問題を効率的に解く方法を示す。弱凸性制約がなければ、これらの問題は計算的に難解であることが知られている。そして、グラフで頂点分類を行う際に自然に発生するような外延的な弱凸仮説の場合にも、我々のアルゴリズムの一般的な考え方を拡張できることを示す。拡張アルゴリズムを用いることで、領域内の距離を効率的に計算できるならば、問題を多項式時間で解くことができることを証明する。 One of the central problems studied in the theory of machine learning is the question of whether, for a given class of hypotheses, it is possible to efficiently find a consistent hypothesis, i.e., one with zero training error. While problems involving convex hypotheses have been extensively studied, the question of whether efficient learning is possible for non-convex hypotheses composed of possibly several disconnected regions is still less understood. Although it has been shown quite a while ago that efficient learning of weakly convex hypotheses, a parameterized relaxation of convex hypotheses, is possible for the special case of Boolean functions, the question of whether this idea can be developed into a generic paradigm has not been studied yet. In this paper, we provide a positive answer and show that the consistent hypothesis finding problem can indeed be solved in polynomial time for a broad class of weakly convex hypotheses over metric spaces.
To this end, we propose a general domain-independent algorithm for finding consistent weakly convex hypotheses and prove sufficient conditions for its efficiency that characterize the corresponding hypothesis classes. To illustrate our general algorithm and its properties, we discuss several non-trivial learning examples to demonstrate how it can be used to efficiently solve the corresponding consistent hypothesis finding problem. Without the weak convexity constraint, these problems are known to be computationally intractable. We then proceed to show that the general idea of our algorithm can even be extended to the case of extensional weakly convex hypotheses, as naturally arises, e.g., when performing vertex classification in graphs. We prove that using our extended algorithm, the problem can be solved in polynomial time provided the distances in the domain can be computed efficiently.	翻訳日:2024-03-21 23:26:53 公開日:2024-03-19
# 量子エンタングルメントの初年次学生への導入 : トリレンマの解消 Introducing Quantum Entanglement to First-Year Students: Resolving the Trilemma ( http://arxiv.org/abs/2106.12043v4 ) ライセンス: Link先を確認	W. M. Stuckey, Timothy McDevitt, Michael Silberstein,	(参考訳) 量子力学(Quantum Mechanics, QM)は、入門物理学の教科書で長くカバーされているが、量子エンタングルメントの概念は、急速に成長する量子情報科学の領域と、その広範な実験的検証において重要であるにもかかわらず、一般的にはカバーされていない。このように、物理教育者は、この重要な概念を導入する方法について、各自の裁量に委ねられている。物理学の教育者が量子絡み合いを導入する方法がどうあるにせよ、ベルの不等式を破る謎めいた相関を含むトリレンマに直面している。 2022年のノーベル物理学賞を完全に無視して、導入の完全性に妥協し、その事実を共有しないことを選ぶこともできる。あるいは、このミステリーを導入したうえで、QM形式と関連する(同様に神秘的な)保存法則が実験に美しく対応しており、他に言うべきことはないとだけ伝えて、より好奇心の強い学生を不満にさせることもできる。あるいは、プレゼンテーションの厳密さを妥協し、競合するQM解釈という形而上学的泥沼に踏み込むことでミステリーを解決しようとすることもできる。ここでは、アインシュタインが19世紀後半に存在した時間拡張と長さ収縮の謎を解いたのと全く同じ方法で、このトリレンマを解く。すなわち、我々は「経験的に発見された」事実の数学的結果に基づいて「原理的」な説明を行う。実際、我々の量子絡み合いの原理的説明はアインシュタインが用いたのと同じ原理、すなわち相対性原理、つまり「特権的な基準系は存在しない」に基づいている。このように、トリレンマのこの原理的解決は、初年次物理学生のための特殊相対性理論の標準的な導入と同程度に完全であり、満足でき、分析的に厳密であり、理解しやすい。 While quantum mechanics (QM) is covered at length in introductory physics textbooks, the concept of quantum entanglement is typically not covered at all, despite its importance in the rapidly growing area of quantum information science and its extensive experimental confirmation. Thus, physics educators are left to their own devices as to how to introduce this important concept. Regardless of how a physics educator chooses to introduce quantum entanglement, they face a trilemma involving its mysterious Bell-inequality-violating correlations. They can compromise on the completeness of their introduction and simply choose not to share that fact, totally ignoring the 2022 Nobel Prize in Physics. They can frustrate their more curious students by introducing the mystery and simply telling them that the QM formalism with its associated (equally mysterious) conservation law maps beautifully to the experiments, so there is nothing else that needs to be said.
Or, they can compromise the rigor of their presentation and attempt to resolve the mystery by venturing into the metaphysical quagmire of competing QM interpretations. Herein, we resolve this trilemma in precisely the same way that Einstein resolved the mysteries of time dilation and length contraction that existed in the late nineteenth century. That is, we resort to a "principle" explanation based on the mathematical consequences of "empirically discovered" facts. Indeed, our principle account of quantum entanglement is even based on the same principle Einstein used, i.e., the relativity principle or "no preferred reference frame." Thus, this principle resolution of the trilemma is as complete, satisfying, analytically rigorous, and accessible as the standard introduction of special relativity for first-year physics students.	翻訳日:2024-03-21 23:26:53 公開日:2024-03-19
# 雑音量子コンピュータの位相データ解析 Topological data analysis on noisy quantum computers ( http://arxiv.org/abs/2209.09371v4 ) ライセンス: Link先を確認	Ismail Yunus Akhalwaya, Shashanka Ubaru, Kenneth L. Clarkson, Mark S. Squillante, Vishnu Jejjala, Yang-Hui He, Kugendran Naidoo, Vasileios Kalantzis, Lior Horesh,	(参考訳) トポロジカルデータ解析(TDA)は,高次元データの複雑で価値の高い形状関連要約を抽出する強力な手法である。しかし、TDA計算における古典的アルゴリズムの計算要求は極端であり、高次特性に対してはすぐに非現実的になる。量子コンピュータは、特定の計算問題に対して大きなスピードアップを達成する可能性を秘めている。実際、TDAはそのような問題の1つとして報告されているが、ロイド、ガーネロン、ザナルディによる量子TDA(QTDA)の定式化のような量子コンピューティングアルゴリズムでは、現在利用できないフォールトトレランスの資格が必要となる。本研究では,高次元古典データに適用可能な短い回路深度のみを必要とする完全実装のエンドツーエンド量子機械学習アルゴリズムであるNISQ-TDAについて述べる。このアルゴリズムは、データローディングの問題に悩まされず、入力データを量子コンピュータに明示的に格納する必要もない。このアルゴリズムは、小さなデータセットに適用された量子コンピューティングデバイスだけでなく、ノイズの多い量子シミュレータ上でもうまく実行された。予備的な経験的結果は、アルゴリズムがノイズに対して堅牢であることを示唆している。 Topological data analysis (TDA) is a powerful technique for extracting complex and valuable shape-related summaries of high-dimensional data. However, the computational demands of classical algorithms for computing TDA are exorbitant, and quickly become impractical for high-order characteristics. Quantum computers offer the potential of achieving significant speedup for certain computational problems. Indeed, TDA has been purported to be one such problem, yet, quantum computing algorithms proposed for the problem, such as the original Quantum TDA (QTDA) formulation by Lloyd, Garnerone and Zanardi, require fault-tolerance qualifications that are currently unavailable. In this study, we present NISQ-TDA, a fully implemented end-to-end quantum machine learning algorithm needing only a short circuit-depth, that is applicable to high-dimensional classical data, and with provable asymptotic speedup for certain classes of problems. The algorithm neither suffers from the data-loading problem nor does it need to store the input data on the quantum computer explicitly. 
The algorithm was successfully executed on quantum computing devices, as well as on noisy quantum simulators, applied to small datasets. Preliminary empirical results suggest that the algorithm is robust to noise.	翻訳日:2024-03-21 23:26:53 公開日:2024-03-19
# 非負行列分解のための高速乗法更新アルゴリズム A fast Multiplicative Updates algorithm for Non-negative Matrix Factorization ( http://arxiv.org/abs/2303.17992v2 ) ライセンス: Link先を確認	Mai-Quyen Pham, Jérémy Cohen, Thierry Chonavel,	(参考訳) 非負の行列因子化は、教師なし機械学習において、しばしば解釈可能な部分の積にデータマトリックスを分解する重要なツールである。過去30年間に多くのアルゴリズムが提案されてきた。有名な方法は2002年にLee and Seungによって提案された乗法更新アルゴリズムである。実装が簡単で、スパース非負行列因子化のような一般的な変種に適応でき、最近のベンチマークによると、損失関数がフロベニウスノルムではない多くの問題に対して最先端である。本稿では,各サブプロブレムに対して,ヘッセン行列のより厳密な上界を構築することにより,交互メジャライゼーション最小化(優関数最小化)アルゴリズムと見なされる乗法更新アルゴリズムを改善することを提案する。収束は引き続き保証されており、我々は実際に合成と実世界の両方のデータセットで、提案したfastMUアルゴリズムが通常の乗法更新アルゴリズムよりもしばしば数桁高速であり、フロベニウスの損失に対する最先端の手法と競合することさえあることを観察している。 Nonnegative Matrix Factorization is an important tool in unsupervised machine learning to decompose a data matrix into a product of parts that are often interpretable. Many algorithms have been proposed during the last three decades. A well-known method is the Multiplicative Updates algorithm proposed by Lee and Seung in 2002. Multiplicative updates have many interesting features: they are simple to implement, can be adapted to popular variants such as sparse Nonnegative Matrix Factorization, and, according to recent benchmarks, are state-of-the-art for many problems where the loss function is not the Frobenius norm. In this manuscript, we propose to improve the Multiplicative Updates algorithm, seen as an alternating majorization minimization algorithm, by crafting a tighter upper bound of the Hessian matrix for each alternate subproblem. Convergence is still ensured, and we observe in practice on both synthetic and real-world datasets that the proposed fastMU algorithm is often several orders of magnitude faster than the regular Multiplicative Updates algorithm, and can even be competitive with state-of-the-art methods for the Frobenius loss.	翻訳日:2024-03-21 23:07:03 公開日:2024-03-19
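As a point of reference for the abstract above, the sketch below implements the classical Lee and Seung multiplicative updates for the Frobenius loss, i.e. the baseline that the paper sets out to accelerate. It is not the proposed fastMU algorithm. The helper names, the small constant `eps` that guards against division by zero, and the toy matrices are assumptions made for this example; plain Python lists are used so the block has no dependencies.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_loss(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def multiplicative_update(V, W, H, eps=1e-12):
    """One sweep of the classical updates: first H, then W.

    H <- H * (W^T V) / (W^T W H),   W <- W * (V H^T) / (W H H^T)
    All operations other than the matrix products are element-wise.
    """
    Wt = transpose(W)
    num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
    H = [[h * n / (d + eps) for h, n, d in zip(rh, rn, rd)]
         for rh, rn, rd in zip(H, num, den)]
    Ht = transpose(H)
    num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
    W = [[w * n / (d + eps) for w, n, d in zip(rw, rn, rd)]
         for rw, rn, rd in zip(W, num, den)]
    return W, H


if __name__ == "__main__":
    V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # data matrix (3 x 2)
    W = [[0.5, 0.4], [0.3, 0.9], [0.8, 0.2]]   # initial factor (3 x 2)
    H = [[0.6, 0.7], [0.1, 0.5]]               # initial factor (2 x 2)
    for _ in range(50):
        W, H = multiplicative_update(V, W, H)
    print(frobenius_loss(V, W, H))
```

Because every factor in the update is nonnegative, `W` and `H` stay nonnegative without any projection step, and each update can be read as minimizing a separable upper bound of the loss. The abstract's contribution is a tighter bound of this kind, which this baseline sketch does not attempt to reproduce.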
# OSDaR23:Rail 2023用のオープンセンサーデータ OSDaR23: Open Sensor Data for Rail 2023 ( http://arxiv.org/abs/2305.03001v2 ) ライセンス: Link先を確認	Rustam Tagiew, Martin Köppel, Karsten Schwalbe, Patrick Denzler, Philipp Neumaier, Tobias Klockau, Martin Boekhoff, Pavel Klasek, Roman Tilly,	(参考訳) 本線鉄道における無人運転を実現するためには、適切なセンサシステムにより、列車の走行路の現実的および潜在的障害を自動的に検出する必要がある。機械学習アルゴリズムは、ここ数年でこのタスクに強力なツールであることが証明されている。しかしながら、これらのアルゴリズムは、トレーニングデータとして鉄道固有のオブジェクトを含む大量の高品質なアノテートデータを必要とする。残念ながら、この要件に対処する公開データセットはすべて、何らかの方法で制限されている。そこで本稿では,2021年9月にドイツのハンブルクで取得した45のサブシーケンスのマルチセンサ・データセットであるOSDaR23について述べる。センサーのセットアップは、複数の校正・同期赤外線(IR)と視覚カメラ(RGB)、ライダー、レーダー、およびレール車両の前面に取り付けられた位置と加速度センサーで構成されている。生データに加えて、データセットには204091のポリリン、多角形、矩形、立方形アノテーションが含まれており、合計で20の異なるオブジェクトクラスがある。これは、鉄道コンテキストに関連するさまざまなオブジェクトクラスに注釈付けされた、初めて公開されたマルチセンサーデータセットである。 data.fid-move.de/dataset/osdar23で利用可能なOSDaR23は、衝突予測以外のタスクにも使用することができる。 To achieve a driverless train operation on mainline railways, actual and potential obstacles for the train's driveway must be detected automatically by appropriate sensor systems. Machine learning algorithms have proven to be powerful tools for this task during the last years. However, these algorithms require large amounts of high-quality annotated data containing railway-specific objects as training data. Unfortunately, all of the publicly available datasets that tackle this requirement are restricted in some way. Therefore, this paper presents OSDaR23, a multi-sensor dataset of 45 subsequences acquired in Hamburg, Germany, in September 2021, that was created to foster driverless train operation on mainline railways. The sensor setup consists of multiple calibrated and synchronized infrared (IR) and visual (RGB) cameras, lidars, a radar, and position and acceleration sensors mounted on the front of a rail vehicle. In addition to the raw data, the dataset contains 204091 polyline, polygonal, rectangle, and cuboid annotations in total for 20 different object classes. 
It is the first publicly available multi-sensor dataset annotated with a variety of object classes that are relevant for the railway context. OSDaR23, available at data.fid-move.de/dataset/osdar23, can also be used for tasks beyond collision prediction, which are listed in this paper.	翻訳日:2024-03-21 23:07:03 公開日:2024-03-19
# Federated Foundation Models: 大規模モデルのためのプライバシ保護と協調学習 Federated Foundation Models: Privacy-Preserving and Collaborative Learning for Large Models ( http://arxiv.org/abs/2305.11414v3 ) ライセンス: Link先を確認	Sixing Yu, J. Pablo Muñoz, Ali Jannesari,	(参考訳) LLaMA、BERT、GPT、ViT、CLIPといったファンデーションモデル(FM)は、事前トレーニングに大量のデータを活用する能力によって、幅広いアプリケーションで顕著な成功を収めている。しかし、FMを最適化するには、機密データにアクセスし、プライバシー上の懸念を高め、多くのドメインで適用性を制限する必要がある。本稿では,FMとFederated Learning(FL)の利点を組み合わせたFFM(Federated Foundation Models)パラダイムを提案する。我々は,FMの寿命にFLを組み込むことの潜在的なメリットと課題について論じ,事前学習,微調整,応用について論じる。 FFMの事前トレーニング、FFMの微調整、フェデレートされたプロンプトチューニングなど、FFMの将来的な研究の道程を概説し、データのプライバシーを確保しつつ、よりパーソナライズされたコンテキスト対応モデルの開発を可能にする。さらに,FFMにおける連続的・長期学習の可能性についても検討し,エッジでの計算能力の増大が,データソースに近い新たに生成されたプライベートデータを用いてFMを最適化する可能性を秘めている。提案したFFMの概念は、大きな言語モデルをプライバシー保護方法でトレーニングするための柔軟でスケーラブルなフレームワークを提供する。 Foundation Models (FMs), such as LLaMA, BERT, GPT, ViT, and CLIP, have demonstrated remarkable success in a wide range of applications, driven by their ability to leverage vast amounts of data for pre-training. However, optimizing FMs often requires access to sensitive data, raising privacy concerns and limiting their applicability in many domains. In this paper, we propose the Federated Foundation Models (FFMs) paradigm, which combines the benefits of FMs and Federated Learning (FL) to enable privacy-preserving and collaborative learning across multiple end-users. We discuss the potential benefits and challenges of integrating FL into the lifespan of FMs, covering pre-training, fine-tuning, and application. We further outline potential future research avenues in FFM, including FFM pre-training, FFM fine-tuning, and federated prompt tuning, which allow the development of more personalized and context-aware models while ensuring data privacy. 
Moreover, we explore the possibility of continual/lifelong learning in FFMs, as increased computational power at the edge may unlock the potential for optimizing FMs using newly generated private data close to the data source. The proposed FFM concepts offer a flexible and scalable framework for training large language models in a privacy-preserving manner, setting the stage for subsequent advancements in both FM training and federated learning.	翻訳日:2024-03-21 23:07:03 公開日:2024-03-19
# DermSynth3D:in-the-wild Annotated Dermatology画像の合成 DermSynth3D: Synthesis of in-the-wild Annotated Dermatology Images ( http://arxiv.org/abs/2305.12621v3 ) ライセンス: Link先を確認	Ashish Sinha, Jeremy Kawahara, Arezou Pakzad, Kumar Abhishek, Matthieu Ruthven, Enjie Ghorbel, Anis Kacem, Djamila Aouada, Ghassan Hamarneh,	(参考訳) 近年, 深層学習(DL)は皮膚画像解析の分野で大きな可能性を秘めている。しかし、この領域の既存のデータセットには、少数の画像サンプル、限られた疾患条件、不十分なアノテーション、標準化されていない画像取得など、重大な制限がある。これらの欠点に対処するため,我々はDermSynth3Dという新しいフレームワークを提案する。 DermSynth3Dは、人体の3Dテクスチャメッシュに、微分可能なレンダラーを用いて皮膚の病気パターンをブレンドし、さまざまな背景条件下で選択された照明条件下で、様々なカメラ視点から2D画像を生成する。筆者らの手法は、ブレンディングとレンダリングを制約するトップダウンルールに従属し、より有意義な結果が得られるように、肌の状態の2D画像を作成する。本フレームワークは、皮膚、皮膚の状態、身体部分、病変周囲の境界ボックス、深度マップ、およびカメラ位置や照明条件などの他の3Dシーンパラメータを意味的セグメンテーションするための、フォトリアリスティックな2D皮膚鏡画像およびそれに対応する高密度アノテーションを生成する。 DermSynth3Dは、さまざまな皮膚科学タスクのためのカスタムデータセットを作成することができる。本稿では,DermSynth3Dを用いて合成データ上でDLモデルを訓練し,実際の2次元皮膚画像を用いて各種皮膚学タスクで評価することにより,データの有効性を実証する。コードをhttps://github.com/sfu-mial/DermSynth3Dで公開しています。 In recent years, deep learning (DL) has shown great potential in the field of dermatological image analysis. However, existing datasets in this domain have significant limitations, including a small number of image samples, limited disease conditions, insufficient annotations, and non-standardized image acquisitions. To address these shortcomings, we propose a novel framework called DermSynth3D. DermSynth3D blends skin disease patterns onto 3D textured meshes of human subjects using a differentiable renderer and generates 2D images from various camera viewpoints under chosen lighting conditions in diverse background scenes. Our method adheres to top-down rules that constrain the blending and rendering process to create 2D images with skin conditions that mimic in-the-wild acquisitions, ensuring more meaningful results. 
The framework generates photo-realistic 2D dermoscopy images and the corresponding dense annotations for semantic segmentation of the skin, skin conditions, body parts, bounding boxes around lesions, depth maps, and other 3D scene parameters, such as camera position and lighting conditions. DermSynth3D allows for the creation of custom datasets for various dermatology tasks. We demonstrate the effectiveness of data generated using DermSynth3D by training DL models on synthetic data and evaluating them on various dermatology tasks using real 2D dermatological images. We make our code publicly available at https://github.com/sfu-mial/DermSynth3D.	翻訳日:2024-03-21 23:07:03 公開日:2024-03-19
# Prodigy: 適応型パラメータフリー学習者 Prodigy: An Expeditiously Adaptive Parameter-Free Learner ( http://arxiv.org/abs/2306.06101v4 ) ライセンス: Link先を確認	Konstantin Mishchenko, Aaron Defazio,	(参考訳) 本稿では,AdaGradやAdamといった適応的な手法で学習率を推定する問題を考察する。本稿では,学習率を最適に設定するために必要となる,解までの距離$D$を証明可能な形で推定するアルゴリズムであるProdigyを提案する。 Prodigyの中核は、学習率フリーの学習のためのD-Adaptation法を修正したものである。これは、D-Adaptationの収束率を$O(\sqrt{\log(D/d_0)})$倍改善する。ここで$d_0$は$D$の初期推定値である。我々は12の共通ロジスティック回帰ベンチマークデータセット、CIFAR10でのVGG11およびResNet-50トレーニング、ImagenetでのViTトレーニング、IWSLT14でのLSTMトレーニング、CriteoデータセットでのDLRMトレーニング、Knee MRIデータセットでのVarNet、BookWikiでのRoBERTaおよびGPTトランスフォーマートレーニングでProdigyをテストする。実験結果から,本手法は一貫してD-Adaptationより優れ,手動で調整したAdamに近いテスト精度に達することがわかった。 We consider the problem of estimating the learning rate in adaptive methods, such as AdaGrad and Adam. We propose Prodigy, an algorithm that provably estimates the distance to the solution $D$, which is needed to set the learning rate optimally. At its core, Prodigy is a modification of the D-Adaptation method for learning-rate-free learning. It improves upon the convergence rate of D-Adaptation by a factor of $O(\sqrt{\log(D/d_0)})$, where $d_0$ is the initial estimate of $D$. We test Prodigy on 12 common logistic-regression benchmark datasets, VGG11 and ResNet-50 training on CIFAR10, ViT training on Imagenet, LSTM training on IWSLT14, DLRM training on Criteo dataset, VarNet on Knee MRI dataset, as well as RoBERTa and GPT transformer training on BookWiki. Our experimental results show that our approach consistently outperforms D-Adaptation and reaches test accuracy values close to that of hand-tuned Adam.	翻訳日:2024-03-21 22:57:10 公開日:2024-03-19
# $GRU^{spa}$:時空間分散のための空間的注意を伴うGated Recurrent Unit $GRU^{spa}$: Gated Recurrent Unit with Spatial Attention for Spatio-Temporal Disaggregation ( http://arxiv.org/abs/2306.07292v3 ) ライセンス: Link先を確認	Bin Han, Bill Howe,	(参考訳) オープンデータは、通常プライバシーポリシーに従うために、空間的に集約されることが多い。しかし、粗大で異質な集約は、下流のAI/MLシステムの学習と統合を複雑にします。本研究では,低分解能で不規則なパーティション(例:国勢調査区)から高分解能で不規則なパーティション(例:都市ブロック)へ時空間データを分解するモデルを考察する。本稿では,空間的注意層をGRUモデルに統合したGated Recurrent Unit with Spatial Attention($GRU^{spa}$)を提案する。空間的注意層は領域間の空間的相互作用を捉え、ゲートリカレントモジュールは時間的依存関係をキャプチャする。さらに,異なる地理的レベル間の包含関係(例えば,ある都市ブロックが所定の国勢調査区に完全に含まれている場合)を利用して,空間的注意層を制約する。限られた歴史的訓練データしか存在しない状況に対しては,転移学習のシナリオについて検討し,ある都市変数で事前学習したモデルを,数百のサンプルのみを用いて,他の都市変数に対して微調整できることを示す。これらの手法を2つのモビリティデータセット上で評価した結果、$GRU^{spa}$は、他のニューラルネットワークモデルや典型的なヒューリスティック手法よりも大幅に改善され、下流モデルのトレーニングに有用な小さな領域における現実的なポイントデータを合成できることがわかった。 Open data is frequently released spatially aggregated, usually to comply with privacy policies. But coarse, heterogeneous aggregations complicate learning and integration for downstream AI/ML systems. In this work, we consider models to disaggregate spatio-temporal data from a low-resolution, irregular partition (e.g., census tract) to a high-resolution, irregular partition (e.g., city block). We propose a model, Gated Recurrent Unit with Spatial Attention ($GRU^{spa}$), where spatial attention layers are integrated into the original Gated Recurrent Unit (GRU) model. The spatial attention layers capture spatial interactions among regions, while the gated recurrent module captures the temporal dependencies. Additionally, we utilize containment relationships between different geographic levels (e.g., when a given city block is wholly contained in a given census tract) to constrain the spatial attention layers. For situations where limited historical training data is available, we study transfer learning scenarios and show that a model pre-trained on one city variable can be fine-tuned for another city variable using only a few hundred samples. 
Evaluating these techniques on two mobility datasets, we find that $GRU^{spa}$ provides a significant improvement over other neural models as well as typical heuristic methods, allowing us to synthesize realistic point data over small regions useful for training downstream models.	翻訳日:2024-03-21 22:57:10 公開日:2024-03-19
# ロバスト・インストラクション・チューニングによる大規模マルチモーダルモデルにおける幻覚の緩和 Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning ( http://arxiv.org/abs/2306.14565v4 ) ライセンス: Link先を確認	Fuxiao Liu, Kevin Lin, Linjie Li, Jianfeng Wang, Yaser Yacoob, Lijuan Wang,	(参考訳) マルチモーダルタスクの有望な進歩にもかかわらず、現在の大規模マルチモーダルモデル(LMM)は、関連する画像や人間の指示に関して一貫性のない記述を幻覚させる傾向にある。本稿では,Large-scale Robust Visual (LRV)-Instructionという,大規模かつ多様な視覚的命令チューニングデータセットを導入することでこの問題に対処する。本データセットは, GPT4が生成した400kの視覚的命令からなり, 16の視覚・言語的タスクをオープンエンドの指示と回答でカバーする。主に正の命令サンプルに焦点を当てた既存の研究とは異なり、我々は、より堅牢な視覚的命令チューニングのための正と負の両方の命令を含むLRV-Instructionを設計する。私たちの負の命令は3つの意味レベルで設計されます。 (i)存在しないオブジェクトの操作 (ii)存在するオブジェクトの操作 (iii)知識操作。 LMMが生み出す幻覚を効率的に測定するために,人間の専門家のように視覚的命令チューニングを安定的に評価するためのGAVIE(GPT4-Assisted Visual Instruction Evaluation)を提案する。 GAVIEは人間による注釈付き基礎回答を必要とせず、多様な命令形式に適応することができる。われわれはLMMの幻覚を調査するための総合的な実験を行った。以上の結果から,既存のLMMは,負の命令,特に存在するオブジェクトの操作と知識操作の命令を提示されると,顕著な幻覚を示すことがわかった。さらに, LRV-InstructionにおけるMiniGPT4とmPLUG-Owlの微調整により幻覚の緩和を実現し, 最先端の手法と比較していくつかの公開データセットの性能向上を実現した。さらに、トレーニングデータにおける正と負のインスタンスのバランスの取れた比率は、より堅牢なモデルにつながることを観察した。コードとデータはhttps://github.com/FuxiaoLiu/LRV-Instructionで公開されている。 Despite the promising progress in multi-modal tasks, current large multi-modal models (LMMs) are prone to hallucinating inconsistent descriptions with respect to the associated image and human instructions. This paper addresses this issue by introducing the first large and diverse visual instruction tuning dataset, named Large-scale Robust Visual (LRV)-Instruction. Our dataset comprises 400k visual instructions generated by GPT4, covering 16 vision-and-language tasks with open-ended instructions and answers. Unlike existing studies that primarily focus on positive instruction samples, we design LRV-Instruction to include both positive and negative instructions for more robust visual instruction tuning. 
Our negative instructions are designed at three semantic levels: (i) Nonexistent Object Manipulation, (ii) Existent Object Manipulation and (iii) Knowledge Manipulation. To efficiently measure the hallucination generated by LMMs, we propose GPT4-Assisted Visual Instruction Evaluation (GAVIE), a stable approach to evaluate visual instruction tuning like human experts. GAVIE does not require human-annotated groundtruth answers and can adapt to diverse instruction formats. We conduct comprehensive experiments to investigate the hallucination of LMMs. Our results demonstrate existing LMMs exhibit significant hallucinations when presented with our negative instructions, particularly Existent Object and Knowledge Manipulation instructions. Moreover, we successfully mitigate hallucination by finetuning MiniGPT4 and mPLUG-Owl on LRV-Instruction while improving performance on several public datasets compared to state-of-the-art methods. Additionally, we observed that a balanced ratio of positive and negative instances in the training data leads to a more robust model. Code and data are available at https://github.com/FuxiaoLiu/LRV-Instruction.	翻訳日:2024-03-21 22:57:10 公開日:2024-03-19
# Kadanoff-Baym方程式を用いたオープン量子システム Open Quantum Systems with Kadanoff-Baym Equations ( http://arxiv.org/abs/2308.07659v3 ) ライセンス: Link先を確認	Tim Neidig, Jan Rais, Marcus Bleicher, Hendrik van Hees, Carsten Greiner,	(参考訳) 本研究では, ボソン粒子の熱浴中で,1次元の引力的な井戸型ポテンシャル内に1つの束縛状態を持つ量子力学的フェルミオン粒子の時間発展について検討した。この開量子系に対して、熱浴粒子との相互作用を弾性2-2散乱とすることで、系の粒子に対する非平衡カダノフ・ベイム方程式を定式化する。一粒子グリーン関数に対する空間的に不均一な積分微分方程式を数値的に解く。本研究では, 系粒子が熱浴と平衡化・熱化する様子と, 1粒子エネルギー固有基底で表した密度行列の非対角要素がデコヒーレンスを起こし, 対角成分,すなわち占有数のみが生き残る様子を示す。さらに、(遅延)グリーン関数の時間発展は、様々な一粒子量子状態のスペクトル特性も決定する。 We study the temporal evolution of quantum mechanical fermionic particles exhibiting one bound state within a one-dimensional attractive square-well potential in a heat bath of bosonic particles. For this open quantum system we formulate the non-equilibrium Kadanoff-Baym equations for the system particles by taking the interactions to be elastic 2-2 scatterings with the heat-bath particles. The corresponding spatially inhomogeneous integro-differential equations for the one-particle Green's function are solved numerically. We demonstrate how the system particles equilibrate and thermalize with the heat bath and how the off-diagonal elements of the density matrix, expressed in the one-particle energy eigenbasis, decohere, so that only the diagonal entries, i.e. the occupation numbers, survive. In addition, the time evolution of the (retarded) Green's function also determines the spectral properties of the various one-particle quantum states.	翻訳日:2024-03-21 22:47:21 公開日:2024-03-19
# 視覚・言語モデルにおけるチェーン・オブ・サート推論の測定と改善 Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models ( http://arxiv.org/abs/2309.04461v2 ) ライセンス: Link先を確認	Yangyi Chen, Karan Sikka, Michael Cogswell, Heng Ji, Ajay Divakaran,	(参考訳) 視覚言語モデル(VLM)は最近、視覚的内容に関する自然なクエリを解析し、人間のような出力を生成する視覚アシスタントとして、強力な効果を示した。本研究では,これらのモデルが知覚情報に基づいて人間ライクな推論を実証する能力について検討する。推論能力の完全整合性および基底性に関する重要な懸念に対処するため、これらのモデルの推論整合性も測定する。これを実現するために,チェーン・オブ・シント(CoT)に基づく一貫性尺度を提案する。しかし、そのような評価には高レベルの推論と詳細な推論チェーンの両方を含むベンチマークが必要である。 LLM-Human-in-the-Loopパイプラインを提案することで、この課題に対処する。このパイプラインと既存の粗粒化アノテートデータセットに基づいて、VLMのゼロショット推論性能と一貫性の両方を測定するためにCUREベンチマークを構築します。我々は、既存の最先端のVLMを評価し、最高の性能モデルでさえ、強い視覚的推論能力と一貫性を示すことができず、VLMが体系的かつ一貫して人間のように視覚的推論を実行できるようにするためには、かなりの努力が必要であることを示す。初期段階として,VLMの推論性能と一貫性の向上を目的とした2段階のトレーニングフレームワークを提案する。第1段階では、LLMが自動的に生成するステップバイステップ推論サンプルを使用して、VLMの教師付き微調整を行う。第2段階では,LLMによるフィードバックを取り入れて,高度に整合性のある推論連鎖を生成することにより,トレーニングプロセスをさらに強化する。パフォーマンスと一貫性の両方を推論する上で、私たちのフレームワークの有効性を実証的に強調します。 Vision-language models (VLMs) have recently demonstrated strong efficacy as visual assistants that can parse natural queries about the visual content and generate human-like outputs. In this work, we explore the ability of these models to demonstrate human-like reasoning based on the perceived information. To address a crucial concern regarding the extent to which their reasoning capabilities are fully consistent and grounded, we also measure the reasoning consistency of these models. We achieve this by proposing a chain-of-thought (CoT) based consistency measure. However, such an evaluation requires a benchmark that encompasses both high-level inference and detailed reasoning chains, which is costly. We tackle this challenge by proposing a LLM-Human-in-the-Loop pipeline, which notably reduces cost while simultaneously ensuring the generation of a high-quality dataset. 
Based on this pipeline and the existing coarse-grained annotated dataset, we build the CURE benchmark to measure both the zero-shot reasoning performance and consistency of VLMs. We evaluate existing state-of-the-art VLMs, and find that even the best-performing model is unable to demonstrate strong visual reasoning capabilities and consistency, indicating that substantial efforts are required to enable VLMs to perform visual reasoning as systematically and consistently as humans. As an early step, we propose a two-stage training framework aimed at improving both the reasoning performance and consistency of VLMs. The first stage involves employing supervised fine-tuning of VLMs using step-by-step reasoning samples automatically generated by LLMs. In the second stage, we further augment the training process by incorporating feedback provided by LLMs to produce reasoning chains that are highly consistent and grounded. We empirically highlight the effectiveness of our framework in both reasoning performance and consistency.	翻訳日:2024-03-21 22:37:29 公開日:2024-03-19
# Forman-Ricci曲率の増大による過スムージングと過スワッシングの緩和 Mitigating Over-Smoothing and Over-Squashing using Augmentations of Forman-Ricci Curvature ( http://arxiv.org/abs/2309.09384v3 ) ライセンス: Link先を確認	Lukas Fesser, Melanie Weber,	(参考訳) グラフニューラルネットワーク(GNN)は、ドメイン間のグラフ構造化データ学習に成功しているが、いくつかの潜在的な落とし穴が最近説明されている。これには、長距離接続(オーバー・スクワッシング)で符号化された情報を正確に活用できないことや、ネットワーク深度(オーバー・スムーシング)が増大する近くのノードの学習した表現を区別できないことが含まれる。両効果を特徴付ける効果的な方法は、離散曲率である: オーバー・スクアッシング効果を弱める長距離接続は、低い曲率を持つのに対し、オーバー・スムーシングに寄与するエッジは高い曲率を持つ。この観察により、オーバースムーシングとオーバースキャッシングを緩和するためにエッジを追加または除去する再配線技術がもたらされた。グラフの曲率やラプラシアンのスペクトルなどのグラフ特性を利用するいくつかの再配線手法が提案されている。しかし、既存の手法、特に曲率に基づく手法は、しばしば高価なサブルーチンと注意深いハイパーパラメータチューニングを必要とし、大規模なグラフに適用性を制限する。本稿では、線形時間で計算可能なスケーラブルな曲率表記法であるAFRC(Augmented Forman-Ricci curvature)に基づく書き換え手法を提案する。 AFRCはメッセージパッシングGNNにおける過剰なスムースと過剰なスキャッシング効果を効果的に特徴付ける。提案手法は,他の手法と比較して計算コストを大幅に削減しつつ,最先端の性能を実現することを示す実験により理論的結果を補完する。離散曲率の基本特性を生かして,高コストなハイパーパラメータ探索を回避し,提案手法のスケーラビリティを向上する,曲率ベースリワイアリングにおけるハイパーパラメータの効果的なヒューリスティックスを提案する。 While Graph Neural Networks (GNNs) have been successfully leveraged for learning on graph-structured data across domains, several potential pitfalls have been described recently. Those include the inability to accurately leverage information encoded in long-range connections (over-squashing), as well as difficulties distinguishing the learned representations of nearby nodes with growing network depth (over-smoothing). An effective way to characterize both effects is discrete curvature: Long-range connections that underlie over-squashing effects have low curvature, whereas edges that contribute to over-smoothing have high curvature. This observation has given rise to rewiring techniques, which add or remove edges to mitigate over-smoothing and over-squashing. Several rewiring approaches utilizing graph characteristics, such as curvature or the spectrum of the graph Laplacian, have been proposed. 
However, existing methods, especially those based on curvature, often require expensive subroutines and careful hyperparameter tuning, which limits their applicability to large-scale graphs. Here we propose a rewiring technique based on Augmented Forman-Ricci curvature (AFRC), a scalable curvature notion, which can be computed in linear time. We prove that AFRC effectively characterizes over-smoothing and over-squashing effects in message-passing GNNs. We complement our theoretical results with experiments, which demonstrate that the proposed approach achieves state-of-the-art performance while significantly reducing the computational cost in comparison with other methods. Utilizing fundamental properties of discrete curvature, we propose effective heuristics for hyperparameters in curvature-based rewiring, which avoids expensive hyperparameter searches, further improving the scalability of the proposed approach.	翻訳日:2024-03-21 22:37:29 公開日:2024-03-19
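To make the curvature idea in the entry above concrete, here is a minimal Python sketch of one common form of augmented Forman curvature for an unweighted, undirected graph: 4 - deg(u) - deg(v) + 3 * (number of triangles containing the edge). This form is an assumption taken from the general literature on Forman curvature; the abstract does not state the formula, so the paper's AFRC may differ (for example by also counting 4-cycles). The helper names `build_adjacency` and `augmented_forman_curvature` are hypothetical.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def augmented_forman_curvature(adj, u, v):
    """Curvature of edge (u, v) under the assumed triangle-augmented form.

    Assumed formula (not taken from the abstract):
        4 - deg(u) - deg(v) + 3 * triangles(u, v)
    where triangles(u, v) is the number of common neighbors of u and v.
    """
    if v not in adj[u]:
        raise ValueError("(u, v) is not an edge of the graph")
    triangles = len(adj[u] & adj[v])  # each common neighbor closes one triangle
    return 4 - len(adj[u]) - len(adj[v]) + 3 * triangles


# A bridge between two hubs gets a low (negative) value, while an edge
# inside a triangle gets a high value.
bridge = build_adjacency([("h1", "a"), ("h1", "b"), ("h1", "h2"),
                          ("h2", "c"), ("h2", "d")])
print(augmented_forman_curvature(bridge, "h1", "h2"))
```

Under this assumed form, edges in dense neighborhoods score high and bottleneck edges score low, which matches the qualitative statement in the abstract about over-smoothing and over-squashing.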
# BooookScore: LLM時代における書籍長要約の体系的研究 BooookScore: A systematic exploration of book-length summarization in the era of LLMs ( http://arxiv.org/abs/2310.00785v3 ) ライセンス: Link先を確認	Yapei Chang, Kyle Lo, Tanya Goyal, Mohit Iyyer,	(参考訳) 大規模言語モデル (LLM) のコンテキストウィンドウサイズを超える書籍の長さの文書 (>100Kトークン) を要約するには、まず入力文書を小さなチャンクに分割し、LLMにチャンクレベルの要約をマージ、更新、圧縮するよう促す必要がある。この課題の複雑さと重要性にもかかわらず、既存の書籍長要約データセット(例:BookSum)は、ほとんどの公共LCMの事前学習データであり、既存の評価手法は、現代のLCM要約器による誤りを捉えるのに苦労している。本稿では,(1)階層的にチャンクレベルの要約をマージし,(2)実行中の要約を漸進的に更新する。我々は、最近出版された100冊のGPT-4生成した要約に対して、1193個の微粒な人間のアノテーションを取得し、LLMによる8種類のコヒーレンスエラーを同定した。人間の評価は高価で時間を要するため,識別されたエラータイプを一切含まない要約文の比率を計測する自動尺度BooookScoreを開発する。 BooookScoreは、人間のアノテーションと高い合意を持っていて、他の多くの重要なパラメータ(例えば、チャンクサイズ、ベースLLM)の影響を体系的に評価できます。 GPT-4 や Claude 2 のようなクローズドソース LLM は,オープンソースモデルよりも BooookScore の高いサマリーを生成することがわかった。 LLaMA 2は他のモデルより遅れているが、MixtralはGPT-3.5-Turboと同等のパフォーマンスを達成している。増分更新によってBooookScoreは低下するが、階層的なマージよりも詳細度が高い。 Summarizing book-length documents (>100K tokens) that exceed the context window size of large language models (LLMs) requires first breaking the input document into smaller chunks and then prompting an LLM to merge, update, and compress chunk-level summaries. Despite the complexity and importance of this task, it has yet to be meaningfully studied due to the challenges of evaluation: existing book-length summarization datasets (e.g., BookSum) are in the pretraining data of most public LLMs, and existing evaluation methods struggle to capture errors made by modern LLM summarizers. In this paper, we present the first study of the coherence of LLM-based book-length summarizers implemented via two prompting workflows: (1) hierarchically merging chunk-level summaries, and (2) incrementally updating a running summary. We obtain 1193 fine-grained human annotations on GPT-4 generated summaries of 100 recently-published books and identify eight common types of coherence errors made by LLMs. 
Because human evaluation is expensive and time-consuming, we develop an automatic metric, BooookScore, that measures the proportion of sentences in a summary that do not contain any of the identified error types. BooookScore has high agreement with human annotations and allows us to systematically evaluate the impact of many other critical parameters (e.g., chunk size, base LLM) while saving $15K USD and 500 hours in human evaluation costs. We find that closed-source LLMs such as GPT-4 and Claude 2 produce summaries with higher BooookScore than those generated by open-source models. While LLaMA 2 falls behind other models, Mixtral achieves performance on par with GPT-3.5-Turbo. Incremental updating yields lower BooookScore but higher level of detail than hierarchical merging, a trade-off sometimes preferred by annotators.	翻訳日:2024-03-21 22:37:29 公開日:2024-03-19
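The metric described in the entry above is defined as the proportion of summary sentences that contain none of the identified error types. A minimal Python sketch of that ratio follows; it assumes the per-sentence error annotations already exist (in the paper they come from an LLM annotator, which is not reproduced here). The function name `booookscore_like` and the error labels are hypothetical placeholders.

```python
def booookscore_like(sentence_errors):
    """Fraction of sentences with no annotated coherence error.

    sentence_errors: one entry per summary sentence, each a collection of
    error-type labels found in that sentence (empty means no error).
    """
    if not sentence_errors:
        raise ValueError("summary has no sentences")
    clean = sum(1 for errors in sentence_errors if not errors)
    return clean / len(sentence_errors)


# Hypothetical annotations for a four-sentence summary; the labels are
# placeholders, not the paper's exact taxonomy.
annotations = [[], ["omission"], [], ["inconsistency", "omission"]]
print(booookscore_like(annotations))
```

A sentence with several errors counts the same as a sentence with one, so the score reflects how many sentences are affected and not how many errors occur in total.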
# FroSSL: 効率的なマルチビュー自己監視学習のためのFrobenius Norm最小化 FroSSL: Frobenius Norm Minimization for Efficient Multiview Self-Supervised Learning ( http://arxiv.org/abs/2310.02903v3 ) ライセンス: Link先を確認	Oscar Skean, Aayush Dhakal, Nathan Jacobs, Luis Gonzalo Sanchez Giraldo,	(参考訳) 自己教師付き学習(SSL)は、表現学習の一般的なパラダイムである。近年のマルチビュー手法は, サンプルコントラスト, 次元コントラスト, 非対称ネットワークベースに分類される。これらのファミリーは類似した品質の解に収束するが、いくつかの手法がエポック非効率であり、目標のパフォーマンスに到達するのに長い訓練を必要とすることを実証的に示すことができる。効率性を改善するための2つの主要なアプローチは、共分散固有値正則化と、より多くのビューの使用である。しかし、これらの2つのアプローチは固有値の計算の複雑さのために結合が難しい。固有分解を完全に回避しながら双方のアプローチを一致させる目的関数FroSSLを提案する。 FroSSLは、共分散フロベニウスノルムを最小化して崩壊を回避し、平均二乗誤差を最小化して拡張不変性を達成する。我々は,FroSSLが他のSSLメソッドよりも高速に競合精度に達することを示し,FroSSLが埋め込み共分散行列の固有値にどのように影響するかによって,この高速収束が理論的および実証的な支持を提供する。また、FroSSLは、STL-10、Tiny Imagenet、Imagenet-100など、複数のデータセット上でResNet18をトレーニングする際に、線形プローブ評価の競合表現を学習することを示した。 Self-supervised learning (SSL) is a popular paradigm for representation learning. Recent multiview methods can be classified as sample-contrastive, dimension-contrastive, or asymmetric network-based, with each family having its own approach to avoiding informational collapse. While these families converge to solutions of similar quality, it can be empirically shown that some methods are epoch-inefficient and require longer training to reach a target performance. Two main approaches to improving efficiency are covariance eigenvalue regularization and using more views. However, these two approaches are difficult to combine due to the computational complexity of computing eigenvalues. We present the objective function FroSSL which reconciles both approaches while avoiding eigendecomposition entirely. FroSSL works by minimizing covariance Frobenius norms to avoid collapse and minimizing mean-squared error to achieve augmentation invariance. 
We show that FroSSL reaches competitive accuracies more quickly than any other SSL method and provide theoretical and empirical support that this faster convergence is due to how FroSSL affects the eigenvalues of the embedding covariance matrices. We also show that FroSSL learns competitive representations on linear probe evaluation when used to train a ResNet18 on several datasets, including STL-10, Tiny Imagenet, and Imagenet-100.	翻訳日:2024-03-21 22:37:29 公開日:2024-03-19
# 漸近的にフリーなスケッチリッジアンサンブル:リスク、クロスバリデーション、チューニング Asymptotically free sketched ridge ensembles: Risks, cross-validation, and tuning ( http://arxiv.org/abs/2310.04357v3 ) ライセンス: Link先を確認	Pratik Patil, Daniel LeJeune,	(参考訳) 我々は、スケッチされたリッジ回帰アンサンブルの予測リスクを推定するために、一般化されたクロスバリデーション(GCV)の整合性を確立するために、ランダム行列理論を用いて、正規化とスケッチパラメータの効率的で一貫したチューニングを可能にする。我々の結果は、非常に穏やかなデータ仮定の下で、漸近的に自由なスケッチの幅広いクラスを保っている。正方形の予測リスクに対して,無作為な等価な暗黙のリッジバイアスとスケッチに基づく分散を分解し,無限アンサンブルでスケッチサイズをチューニングするだけで,そのリスクを大域的に最適化できることを示す。一般の準4次予測リスク関数に対しては、GCVを拡張して一貫したリスク推定器を構築し、ワッサーシュタイン2計量におけるGCV補正予測の分布収束を得る。これは特に、トレーニングデータに漸近的に正しいカバレッジ条件で予測間隔を構築することができる。また,小型のスケッチ付きリッジ・アンサンブルを用いて,GCVを用いて非スケッチ・リッジ・レグレッションのリスクを効率的に推定できるアンサンブル・トリックを提案する。我々は、CountSketchやサブサンプリングされたランダム化された離散コサイン変換を含む実用的なスケッチを持つ合成データセットと実大規模データセットの両方を用いて、理論的結果を実証的に検証した。 We employ random matrix theory to establish consistency of generalized cross validation (GCV) for estimating prediction risks of sketched ridge regression ensembles, enabling efficient and consistent tuning of regularization and sketching parameters. Our results hold for a broad class of asymptotically free sketches under very mild data assumptions. For squared prediction risk, we provide a decomposition into an unsketched equivalent implicit ridge bias and a sketching-based variance, and prove that the risk can be globally optimized by only tuning sketch size in infinite ensembles. For general subquadratic prediction risk functionals, we extend GCV to construct consistent risk estimators, and thereby obtain distributional convergence of the GCV-corrected predictions in Wasserstein-2 metric. This in particular allows construction of prediction intervals with asymptotically correct coverage conditional on the training data. We also propose an "ensemble trick" whereby the risk for unsketched ridge regression can be efficiently estimated via GCV using small sketched ridge ensembles. 
We empirically validate our theoretical results using both synthetic and real large-scale datasets with practical sketches including CountSketch and subsampled randomized discrete cosine transforms.	翻訳日:2024-03-21 22:37:29 公開日:2024-03-19
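For readers unfamiliar with GCV, the textbook form for plain (unsketched) ridge regression is sketched below. The normalization of the penalty by $n$ is an assumption made here for concreteness; the paper's estimator for sketched ensembles may use different conventions.

```latex
% Ridge smoother matrix for design X (n rows) and penalty \lambda > 0
S_\lambda = X \left( X^\top X + n \lambda I \right)^{-1} X^\top

% Generalized cross-validation estimate of prediction risk
\mathrm{GCV}(\lambda)
  = \frac{\tfrac{1}{n} \left\lVert y - S_\lambda y \right\rVert_2^2}
         {\left( 1 - \tfrac{1}{n} \operatorname{tr}(S_\lambda) \right)^2}
```

The numerator is the training error and the denominator inflates it according to the effective degrees of freedom $\operatorname{tr}(S_\lambda)$, so no data needs to be held out.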
# 脳波のマルチスケール因果バックボーンの抽出 Extracting the Multiscale Causal Backbone of Brain Dynamics ( http://arxiv.org/abs/2311.00118v2 ) ライセンス: Link先を確認	Gabriele D'Acunto, Francesco Bonchi, Gianmarco De Francisci Morales, Giovanni Petri,	(参考訳) 脳の接続に関する研究努力の大部分は、脳のダイナミックスを管理する因果機構に直接関係しない脳領域間の統計的関連を中心に展開している。本稿では,脳力学のマルチスケール因果バックボーン(MCB)を提案する。提案手法は,近年のマルチスケール因果構造学習の進歩を活用し,モデル適合と複雑性のトレードオフを最適化する。合成データに対する経験的評価は,標準機能接続ネットワークに基づくベースラインよりも,我々の方法論の方が優れていることを示している。安静時fMRIデータに適用すると,左脳半球と右脳半球の両方に細いMCBが認められる。マルチスケールの性質から,低周波帯では因果ダイナミクスは高次認知機能に関連する脳の領域によって駆動され,高周波では知覚処理に関連するノードが重要な役割を担っていることが示唆された。最後に,脳接続の因果指紋の存在を確認し,因果的観点からの脳接続指紋の広範な研究を支援した。 The bulk of the research effort on brain connectivity revolves around statistical associations among brain regions, which do not directly relate to the causal mechanisms governing brain dynamics. Here we propose the multiscale causal backbone (MCB) of brain dynamics, shared by a set of individuals across multiple temporal scales, and devise a principled methodology to extract it. Our approach leverages recent advances in multiscale causal structure learning and optimizes the trade-off between the model fit and its complexity. Empirical assessment on synthetic data shows the superiority of our methodology over a baseline based on canonical functional connectivity networks. When applied to resting-state fMRI data, we find sparse MCBs for both the left and right brain hemispheres. Thanks to its multiscale nature, our approach shows that at low-frequency bands, causal dynamics are driven by brain regions associated with high-level cognitive functions; at higher frequencies instead, nodes related to sensory processing play a crucial role. Finally, our analysis of individual multiscale causal structures confirms the existence of a causal fingerprint of brain connectivity, thus supporting the existing extensive research in brain connectivity fingerprinting from a causal perspective.	翻訳日:2024-03-21 22:27:37 公開日:2024-03-19
# 分散化フェデレーション学習ネットワークにおける対立ノード配置の影響 The Impact of Adversarial Node Placement in Decentralized Federated Learning Networks ( http://arxiv.org/abs/2311.07946v4 ) ライセンス: Link先を確認	Adam Piaseczny, Eric Ruzomberka, Rohit Parasnis, Christopher G. Brinton,	(参考訳) 連邦学習(FL)の人気が高まるにつれ、新しい分散フレームワークが広まりつつある。これらのフレームワークは分散環境の利点を利用して、高速でエネルギー効率の良いデバイス間通信を可能にする。しかし、この人気の高まりは、堅牢なセキュリティ対策の必要性を増している。既存の研究はFLセキュリティの様々な側面を探求してきたが、分散ネットワークにおける敵ノード配置の役割はほとんど解明されていない。本稿では,敵がネットワーク内の配置を協調的に調整できる場合の様々な敵配置戦略において,分散FLの性能を解析することにより,このギャップを解消する。ランダムな配置とネットワーク中心性に基づく配置の2つの基本戦略を確立する。本稿では, 敵同士の平均ネットワーク距離を最大化することにより, 敵中心性よりも敵の拡散を優先する新たな攻撃アルゴリズムを提案する。新たな攻撃アルゴリズムは、テスト精度などの重要なパフォーマンス指標に大きく影響し、考慮されたセットアップに対して、ベースラインフレームワークを$9\%$から$66.5\%$上回ることを示す。我々の研究は、分散FLシステムの脆弱性に関する貴重な知見を提供し、よりセキュアで堅牢な分散FLフレームワークを開発することを目的とした将来の研究の舞台となる。 As Federated Learning (FL) grows in popularity, new decentralized frameworks are becoming widespread. These frameworks leverage the benefits of decentralized environments to enable fast and energy-efficient inter-device communication. However, this growing popularity also intensifies the need for robust security measures. While existing research has explored various aspects of FL security, the role of adversarial node placement in decentralized networks remains largely unexplored. This paper addresses this gap by analyzing the performance of decentralized FL for various adversarial placement strategies when adversaries can jointly coordinate their placement within a network. We establish two baseline strategies for placing adversarial nodes: random placement and network centrality-based placement. Building on this foundation, we propose a novel attack algorithm that prioritizes adversarial spread over adversarial centrality by maximizing the average network distance between adversaries. We show that the new attack algorithm significantly impacts key performance metrics such as testing accuracy, outperforming the baseline frameworks by between $9\%$ and $66.5\%$ for the considered setups. 
Our findings provide valuable insights into the vulnerabilities of decentralized FL systems, setting the stage for future research aimed at developing more secure and robust decentralized FL frameworks.	翻訳日:2024-03-21 22:17:48 公開日:2024-03-19
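The placement objective named in the entry above, maximizing the average network distance between adversaries, can be illustrated with a small Python sketch. This is a brute-force search over all placements on a tiny connected graph and is not the authors' algorithm, which the abstract does not specify; the function names are hypothetical and hop count is assumed as the distance.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def most_spread_placement(adj, k):
    """Return the k nodes with the largest average pairwise hop distance.

    Assumes a connected graph and 2 <= k <= number of nodes. Exhaustive
    search, so only suitable for small illustrative graphs.
    """
    if k < 2 or k > len(adj):
        raise ValueError("k must satisfy 2 <= k <= number of nodes")
    dist = {node: bfs_distances(adj, node) for node in adj}

    def average_distance(group):
        pairs = list(combinations(group, 2))
        return sum(dist[a][b] for a, b in pairs) / len(pairs)

    return max(combinations(sorted(adj), k), key=average_distance)


# Path graph 0-1-2-3-4: two adversaries end up at opposite ends.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(most_spread_placement(path, 2))
```

A centrality-based baseline would instead pick node 2 and its neighbors on this graph, which shows how the spread objective and the centrality objective can select very different nodes.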
# 平均報酬MDPのためのスパンに基づく最適サンプル複雑性 Span-Based Optimal Sample Complexity for Average Reward MDPs ( http://arxiv.org/abs/2311.13469v2 ) ライセンス: Link先を確認	Matthew Zurek, Yudong Chen,	(参考訳) 平均報酬マルコフ決定過程(MDP)において,生成モデルの下で$\varepsilon$-最適ポリシーを学習する際のサンプル複雑性について検討した。我々は、複雑性の上界$\widetilde{O}\left(SA\frac{H}{\varepsilon^2} \right)$を確立する。ここで、$H$は最適ポリシーのバイアス関数のスパンであり、$SA$は状態-作用空間の濃度である。我々の結果は、すべてのパラメータ$S,A,H$および$\varepsilon$において(ログファクタを除いて)ミニマックス最適である最初の結果であり、すべてのポリシーに対して一様に有界な混合時間を仮定するか、パラメータへの依存が最適でない既存研究を改善する。本結果は, 平均報酬MDPを割引MDPに還元することに基づく。この還元の最適性を確立するために、$\gamma$-割引MDPに対する改善された境界を開発し、$\gamma \geq 1 - \frac{1}{H}$の領域では、弱通信MDPにおいて$\widetilde{O}\left(SA\frac{H}{(1-\gamma)^2\varepsilon^2} \right)$サンプルが$\varepsilon$-最適ポリシーを学習するのに十分であることを示し、一般の$\gamma$-割引MDPに対するよく知られた下限$\widetilde{\Omega}\left(SA\frac{1}{(1-\gamma)^3\varepsilon^2} \right)$を回避する。分析では,スパンパラメータの観点から,特定のインスタンス依存分散パラメータの上限を求める。これらの境界は、MDPの混合時間や直径に基づくものよりも厳密であり、より広い用途がある可能性がある。 We study the sample complexity of learning an $\varepsilon$-optimal policy in an average-reward Markov decision process (MDP) under a generative model. We establish the complexity bound $\widetilde{O}\left(SA\frac{H}{\varepsilon^2} \right)$, where $H$ is the span of the bias function of the optimal policy and $SA$ is the cardinality of the state-action space. Our result is the first that is minimax optimal (up to log factors) in all parameters $S,A,H$ and $\varepsilon$, improving on existing work that either assumes uniformly bounded mixing times for all policies or has suboptimal dependence on the parameters. Our result is based on reducing the average-reward MDP to a discounted MDP. 
To establish the optimality of this reduction, we develop improved bounds for $\gamma$-discounted MDPs, showing that $\widetilde{O}\left(SA\frac{H}{(1-\gamma)^2\varepsilon^2} \right)$ samples suffice to learn an $\varepsilon$-optimal policy in weakly communicating MDPs under the regime that $\gamma \geq 1 - \frac{1}{H}$, circumventing the well-known lower bound of $\widetilde{\Omega}\left(SA\frac{1}{(1-\gamma)^3\varepsilon^2} \right)$ for general $\gamma$-discounted MDPs. Our analysis develops upper bounds on certain instance-dependent variance parameters in terms of the span parameter. These bounds are tighter than those based on the mixing time or diameter of the MDP and may be of broader use.	翻訳日:2024-03-21 22:17:48 公開日:2024-03-19
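One way to read the regime $\gamma \geq 1 - \frac{1}{H}$ in the entry above is to compare the two discounted bounds directly. The short derivation below ignores constants and logarithmic factors and cancels the common factor $SA/\varepsilon^2$; it is an illustrative check, not a step taken from the paper.

```latex
% Compare the span-based upper bound with the general lower bound, for 0 < \gamma < 1.
\frac{H}{(1-\gamma)^2} \;\le\; \frac{1}{(1-\gamma)^3}
\;\iff\; H\,(1-\gamma) \;\le\; 1      % multiply both sides by (1-\gamma)^3 > 0
\;\iff\; \gamma \;\ge\; 1 - \frac{1}{H}
```

So the span-based bound falls below the general lower bound exactly when the effective horizon $\frac{1}{1-\gamma}$ is at least $H$. This does not contradict the lower bound, which holds for general discounted MDPs and not for the weakly communicating class with bounded span.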
# DREAM:拡散整流と推定適応モデル DREAM: Diffusion Rectification and Estimation-Adaptive Models ( http://arxiv.org/abs/2312.00210v2 ) ライセンス: Link先を確認	Jinxin Zhou, Tianyu Ding, Tianyi Chen, Jiachen Jiang, Ilya Zharkov, Zhihui Zhu, Luming Liang,	(参考訳) DREAM(Diffusion Rectification and Estimation Adaptive Models)は,最小限のコード変更(たった3行)しか必要としないが,拡散モデルにおけるトレーニングとサンプリングのアライメントを著しく向上させる新しいトレーニングフレームワークである。 DREAMには2つのコンポーネントがある。サンプリングプロセスを反映するようにトレーニングを調整する拡散補正と、歪みに対する知覚のバランスをとる推定適応である。画像超解像(SR)に適用すると、DREAMは歪みの最小化と高画質の保存とのトレードオフを確実にナビゲートする。実験では、標準拡散ベースのSR法よりもDREAMの方が優れており、トレーニング収束が$2$倍から$3$倍高速化し、同等の結果を得るためのサンプリングステップが$10$分の1から$20$分の1に削減されることを示す。 DREAMが拡散モデルトレーニングパラダイムの再考を促すことを願っている。 We present DREAM, a novel training framework representing Diffusion Rectification and Estimation Adaptive Models, requiring minimal code changes (just three lines) yet significantly enhancing the alignment of training with sampling in diffusion models. DREAM features two components: diffusion rectification, which adjusts training to reflect the sampling process, and estimation adaptation, which balances perception against distortion. When applied to image super-resolution (SR), DREAM adeptly navigates the tradeoff between minimizing distortion and preserving high image quality. Experiments demonstrate DREAM's superiority over standard diffusion-based SR methods, showing a $2\times$ to $3\times$ faster training convergence and a $10\times$ to $20\times$ reduction in sampling steps to achieve comparable results. We hope DREAM will inspire a rethinking of diffusion model training paradigms.	翻訳日:2024-03-21 22:08:02 公開日:2024-03-19
# メタキャリブレータを用いたNeRFの瞬時不確かさ校正 Instant Uncertainty Calibration of NeRFs Using a Meta-calibrator ( http://arxiv.org/abs/2312.02350v2 ) ライセンス: Link先を確認	Niki Amini-Naieni, Tomas Jakab, Andrea Vedaldi, Ronald Clark,	(参考訳) ニューラルレージアンス場(NeRF)は、新しいビュー合成を著しく改善しているが、画像予測における正確な不確かさの定量化は未解決の問題である。最先端の密度認識型NeRFアンサンブル(DANE)[29]を含む不確実性を推定するための一般的な手法は、キャリブレーションなしで不確実性を定量化する。これはしばしば画像予測における過度または過度な信頼につながり、実際の応用を損なう可能性がある。本稿では,NeRFの校正不確かさを初めて達成する手法を提案する。これを実現するために,既存のキャリブレーション手法をNeRFに適用する上で重要な課題を克服した。この問題は、スパースビューの設定では特に問題があり、3つのイメージで操作できます。そこで本研究では,単一前方通過によるNeRFの不確実な校正を行うメタキャリブレータの概念を提案する。我々のメタキャリブレータはニューラルネットワークであり、NeRF画像と未校正不確実性マップを入力として、NeRFの未校正不確かさを補正するシーン固有の校正曲線を出力する。メタキャリブレータは,未確認シーンを一般化し,NeRFの良好な校正および最先端の不確実性を達成し,DANEや他のアプローチを著しく上回ることを示す。これにより、次世代の視点計画や、診断のための信頼性の高い画像再構成など、正確なNeRF不確実性推定に依存するアプリケーションを改善する機会が開ける。 Although Neural Radiance Fields (NeRFs) have markedly improved novel view synthesis, accurate uncertainty quantification in their image predictions remains an open problem. The prevailing methods for estimating uncertainty, including the state-of-the-art Density-aware NeRF Ensembles (DANE) [29], quantify uncertainty without calibration. This frequently leads to over- or under-confidence in image predictions, which can undermine their real-world applications. In this paper, we propose a method which, for the first time, achieves calibrated uncertainties for NeRFs. To accomplish this, we overcome a significant challenge in adapting existing calibration techniques to NeRFs: a need to hold out ground truth images from the target scene, reducing the number of images left to train the NeRF. This issue is particularly problematic in sparse-view settings, where we can operate with as few as three images. To address this, we introduce the concept of a meta-calibrator that performs uncertainty calibration for NeRFs with a single forward pass without the need for holding out any images from the target scene. 
Our meta-calibrator is a neural network that takes as input the NeRF images and uncalibrated uncertainty maps and outputs a scene-specific calibration curve that corrects the NeRF's uncalibrated uncertainties. We show that the meta-calibrator can generalize on unseen scenes and achieves well-calibrated and state-of-the-art uncertainty for NeRFs, significantly beating DANE and other approaches. This opens opportunities to improve applications that rely on accurate NeRF uncertainty estimates such as next-best view planning and potentially more trustworthy image reconstruction for medical diagnosis.	翻訳日:2024-03-21 22:08:02 公開日:2024-03-19
# 映像行動検出のための半教師付き能動学習 Semi-supervised Active Learning for Video Action Detection ( http://arxiv.org/abs/2312.07169v2 ) ライセンス: Link先を確認	Ayush Singh, Aayush J Rana, Akash Kumar, Shruti Vyas, Yogesh Singh Rawat,	(参考訳) 本研究では,映像行動検出のためのラベル効率の良い学習に焦点をあてる。ラベル付きデータとラベルなしデータの両方を活用し,行動検出のための情報量の多いサンプル選択を併用する,新しい半教師付きアクティブラーニング手法を開発した。ビデオ行動検出には分類に加えて時空間的局所化が必要であるため、アクティブラーニングにおける情報サンプル選択と、半教師付き学習における擬似ラベル生成の両方にいくつかの課題が生じる。まず,映像行動検出のための情報サンプルを効果的に選択するシンプルな拡張戦略であるNoiseAugを提案する。次に、ビデオ内の関連する活動領域を強調することで、ビデオ行動検出におけるSSLの擬似ラベルの有効活用を可能にする、ハイパスフィルタリングに基づく新しい技術であるfft-attentionを提案する。提案手法を,UCF-101-24,JHMDB-21,Youtube-VOSの3種類のベンチマークデータセットで評価した。まず,ビデオ行動検出において,提案手法がUCF-101-24とJHMDB-21の両方で,半教師付き・弱教師付き学習の先行研究およびいくつかのベースライン手法を上回ることを示す。次に、ビデオオブジェクトセグメンテーションにおけるYoutube-VOSでの有効性を示し、ビデオ内の他の密な予測タスクに対する一般化能力を示す。 In this work, we focus on label efficient learning for video action detection. We develop a novel semi-supervised active learning approach which utilizes both labeled as well as unlabeled data along with informative sample selection for action detection. Video action detection requires spatio-temporal localization along with classification, which poses several challenges for both active learning informative sample selection as well as semi-supervised learning pseudo label generation. First, we propose NoiseAug, a simple augmentation strategy which effectively selects informative samples for video action detection. Next, we propose fft-attention, a novel technique based on high-pass filtering which enables effective utilization of pseudo labels for SSL in video action detection by emphasizing relevant activity regions within a video. We evaluate the proposed approach on three different benchmark datasets, UCF-101-24, JHMDB-21, and Youtube-VOS. First, we demonstrate its effectiveness on video action detection where the proposed approach outperforms prior works in semi-supervised and weakly-supervised learning along with several baseline approaches in both UCF-101-24 and JHMDB-21. 
Next, we also show its effectiveness on Youtube-VOS for video object segmentation demonstrating its generalization capability for other dense prediction tasks in videos.	翻訳日:2024-03-21 21:58:15 公開日:2024-03-19
# バレンプラトーの証明可能な欠如は、古典的なシミュラビリティを暗示しているか?それとも、なぜ変分量子コンピューティングを再考する必要があるのか? Does provable absence of barren plateaus imply classical simulability? Or, why we need to rethink variational quantum computing ( http://arxiv.org/abs/2312.09121v2 ) ライセンス: Link先を確認	M. Cerezo, Martin Larocca, Diego García-Martín, N. L. Diaz, Paolo Braccia, Enrico Fontana, Manuel S. Rudolph, Pablo Bermejo, Aroosa Ijaz, Supanut Thanasilp, Eric R. Anschuetz, Zoë Holmes,	(参考訳) 近年,バレン高原現象の解明に多大な努力が払われている。このパースペクティブな記事では、部屋の中でますます大きな象に直面し、多くの人にほのめかされたが明示されていない質問に答える: 不毛の台地を避けることができる構造も活用して、古典的な損失を効率的にシミュレートできるだろうか? 我々は、初期データ取得フェーズにおいて量子デバイスから古典的なデータを収集できることを前提として、バレン高原の証明可能なモデルが古典的にシミュレート可能であることを示す強力な証拠を示す。これは、不毛の高原が次元性の呪いによって生じること、そしてそれらを解決するための現在のアプローチが、問題をいくつかの小さく古典的にシミュレート可能な部分空間にエンコードする、という観察から導かれる。したがって、データ収集には量子コンピュータの強調が不可欠であるが、我々の分析は、不毛高原の景観に対するパラメタライズド量子回路の情報処理能力の非古典性に深刻な疑問を呈している。議論の要点、スマートイニシャライゼーションの役割、そしてパラメタライズド量子回路の実行による証明可能な超ポリノミカル(あるいは単に実用的)の利点の可能性について論じる。 A large amount of effort has recently been put into understanding the barren plateau phenomenon. In this perspective article, we face the increasingly loud elephant in the room and ask a question that has been hinted at by many but not explicitly addressed: Can the structure that allows one to avoid barren plateaus also be leveraged to efficiently simulate the loss classically? We present strong evidence that commonly used models with provable absence of barren plateaus are also classically simulable, provided that one can collect some classical data from quantum devices during an initial data acquisition phase. This follows from the observation that barren plateaus result from a curse of dimensionality, and that current approaches for solving them end up encoding the problem into some small, classically simulable, subspaces. 
Thus, while stressing quantum computers can be essential for collecting data, our analysis sheds serious doubt on the non-classicality of the information processing capabilities of parametrized quantum circuits for barren plateau-free landscapes. We end by discussing caveats in our arguments, the role of smart initializations and the possibility of provably superpolynomial, or simply practical, advantages from running parametrized quantum circuits.	翻訳日:2024-03-21 21:58:15 公開日:2024-03-19
# DiffPortrait3D:ゼロショットポートレートビュー合成のための制御可能な拡散 DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis ( http://arxiv.org/abs/2312.13016v4 ) ライセンス: Link先を確認	Yuming Gu, You Xie, Hongyi Xu, Guoxian Song, Yichun Shi, Di Chang, Jing Yang, Linjie Luo,	(参考訳) 本稿では,DiffPortrait3Dという条件付き拡散モデルについて述べる。具体的には、単一のRGB入力を前提として、アイデンティティと表情の両方を保持した新しいカメラビューから、可塑性だが一貫した顔の詳細を合成することを目的としている。我々のゼロショット法は、時間を要する最適化と微調整の代わりに、カメラビューのない任意の顔像、極端な表情、多様な芸術的描写を一般化する。その中心となるのは、大規模画像データセットで事前訓練された2次元拡散モデルの生成前をレンダリングバックボーンとして利用し、一方、デノナイジングは、外観とカメラのポーズの無拘束制御でガイドされる。そこで我々はまず,凍結したユニセットの自己注意層に参照画像から外観コンテキストを注入する。そして、レンダリングビューを、同じビューから横断被写体の条件画像を見て、カメラポーズを解釈する新しい条件制御モジュールで操作する。さらに、トレーニング可能なクロスビューアテンションモジュールを挿入してビューの一貫性を高め、推論中に新しい3D対応ノイズ生成プロセスによりさらに強化する。我々は、我々の挑戦的インザワイルドとマルチビューのベンチマークにおいて、質的にも定量的にも、最先端の結果を実証する。 We present DiffPortrait3D, a conditional diffusion model that is capable of synthesizing 3D-consistent photo-realistic novel views from as few as a single in-the-wild portrait. Specifically, given a single RGB input, we aim to synthesize plausible but consistent facial details rendered from novel camera views with retained both identity and facial expression. In lieu of time-consuming optimization and fine-tuning, our zero-shot method generalizes well to arbitrary face portraits with unposed camera views, extreme facial expressions, and diverse artistic depictions. At its core, we leverage the generative prior of 2D diffusion models pre-trained on large-scale image datasets as our rendering backbone, while the denoising is guided with disentangled attentive control of appearance and camera pose. To achieve this, we first inject the appearance context from the reference image into the self-attention layers of the frozen UNets. The rendering view is then manipulated with a novel conditional control module that interprets the camera pose by watching a condition image of a crossed subject from the same view. 
Furthermore, we insert a trainable cross-view attention module to enhance view consistency, which is further strengthened with a novel 3D-aware noise generation process during inference. We demonstrate state-of-the-art results both qualitatively and quantitatively on our challenging in-the-wild and multi-view benchmarks.	翻訳日:2024-03-21 21:58:15 公開日:2024-03-19
# 不均一グラフ上の分布外一般化のためのFew-Shot Causal Representation Learning Few-Shot Causal Representation Learning for Out-of-Distribution Generalization on Heterogeneous Graphs ( http://arxiv.org/abs/2401.03597v2 ) ライセンス: Link先を確認	Pengfei Ding, Yan Wang, Guanfeng Liu, Nan Wang,	(参考訳) Heterogeneous graph few-shot Learning (HGFL) は、様々な種類のノードとエッジから構成されるヘテロジニアスグラフ(HG)のラベル空間問題に対処するために開発された。 HGFLの中核的な概念は、ソースHGのリッチラベルされたクラスから知識を抽出し、この知識をターゲットHGに転送して、少数のラベル付きトレーニングデータで新しいクラスを学習し、最終的にラベル付きテストデータで予測することである。既存の方法は、典型的には、ソースHG、トレーニングデータ、テストデータは全て同じ分布を共有していると仮定する。しかし、実際には、(1)対象のHG分布と一致するソースHGの限られた可用性、(2)対象のHGの予測不能なデータ生成機構の2つの理由により、これらの3種類のデータ間の分散シフトは避けられない。このような分布シフトは,既存の手法では非効率な知識伝達と学習性能の低下をもたらすため,HGFLにおけるアウト・オブ・ディストリビューション(OOD)の一般化という新たな問題に繋がる。この課題に対処するため、我々はCausal OOD Heterogeneous graph Few-shot Learning Model、すなわちCOHFを提案する。 COHFでは、構造因果モデルを用いてHGの分布シフトを初めて特徴付け、HGFLにおけるOOD一般化の不変原理を確立した。そして、この不変原理に従い、分散シフトの影響を軽減するために、新しい変分自己エンコーダに基づく異種グラフニューラルネットワークを提案する。最後に、このネットワークを新しいメタ学習フレームワークと統合することにより、COHFは知識をターゲットHGに効果的に転送し、ラベルの少ないデータで新しいクラスを予測する。 7つの実世界のデータセットに対する大規模な実験は、最先端の手法よりもCOHFの優れた性能を示している。 Heterogeneous graph few-shot learning (HGFL) has been developed to address the label sparsity issue in heterogeneous graphs (HGs), which consist of various types of nodes and edges. The core concept of HGFL is to extract knowledge from rich-labeled classes in a source HG, transfer this knowledge to a target HG to facilitate learning new classes with few-labeled training data, and finally make predictions on unlabeled testing data. Existing methods typically assume that the source HG, training data, and testing data all share the same distribution. However, in practice, distribution shifts among these three types of data are inevitable due to two reasons: (1) the limited availability of the source HG that matches the target HG distribution, and (2) the unpredictable data generation mechanism of the target HG. 
Such distribution shifts result in ineffective knowledge transfer and poor learning performance in existing methods, thereby leading to a novel problem of out-of-distribution (OOD) generalization in HGFL. To address this challenging problem, we propose a novel Causal OOD Heterogeneous graph Few-shot learning model, namely COHF. In COHF, we first characterize distribution shifts in HGs with a structural causal model, establishing an invariance principle for OOD generalization in HGFL. Then, following this invariance principle, we propose a new variational autoencoder-based heterogeneous graph neural network to mitigate the impact of distribution shifts. Finally, by integrating this network with a novel meta-learning framework, COHF effectively transfers knowledge to the target HG to predict new classes with few-labeled data. Extensive experiments on seven real-world datasets have demonstrated the superior performance of COHF over the state-of-the-art methods.	翻訳日:2024-03-21 21:48:20 公開日:2024-03-19
# k- Support Normによる反復正規化:スパースリカバリにおける重要な補完 Iterative Regularization with k-support Norm: An Important Complement to Sparse Recovery ( http://arxiv.org/abs/2401.05394v4 ) ライセンス: Link先を確認	William de Vazelhes, Bhaskar Mukhoty, Xiao-Tong Yuan, Bin Gu,	(参考訳) スパースリカバリは機械学習と信号処理においてユビキタスである。スパースリカバリのNPハードの性質のため、既存の手法は制限的な(あるいは未知の)適用条件や高い計算コストに悩まされていることが知られている。近年, 反復正規化手法は, 従来手法で用いられてきた面倒なグリッド探索よりも, 早い停止時間でスパースリカバリを達成できるため, 有望な高速手法として出現している。しかし、これらの反復的メソッドのほとんどは、制限的な適用性条件を必要とする$\ell_1$ノルムに基づいており、多くの場合失敗する可能性がある。そのため、より広い条件下で反復正則化法を用いてスパースリカバリを実現することは、まだ研究されていない。この問題に対処するために、$\ell_1$標準ではなく$k$サポート標準正規化器に基づく新しい反復正規化アルゴリズムIRKSNを提案する。我々は、IRKSNとスパースリカバリ条件を提供し、従来のリカバリ条件と$\ell_1$標準正規化器を比較した。さらに,IRKSNのモデル誤差を定数で早期に停止し,スパースリカバリの標準線形レートを達成する。最後に, 相関設計行列を用いた支援回復実験を含むいくつかの実験において, アルゴリズムの適用性について述べる。 Sparse recovery is ubiquitous in machine learning and signal processing. Due to the NP-hard nature of sparse recovery, existing methods are known to suffer either from restrictive (or even unknown) applicability conditions, or high computational cost. Recently, iterative regularization methods have emerged as a promising fast approach because they can achieve sparse recovery in one pass through early stopping, rather than the tedious grid-search used in the traditional methods. However, most of those iterative methods are based on the $\ell_1$ norm which requires restrictive applicability conditions and could fail in many cases. Therefore, achieving sparse recovery with iterative regularization methods under a wider range of conditions has yet to be further explored. To address this issue, we propose a novel iterative regularization algorithm, IRKSN, based on the $k$-support norm regularizer rather than the $\ell_1$ norm. We provide conditions for sparse recovery with IRKSN, and compare them with traditional conditions for recovery with $\ell_1$ norm regularizers. 
Additionally, we give an early stopping bound on the model error of IRKSN with explicit constants, achieving the standard linear rate for sparse recovery. Finally, we illustrate the applicability of our algorithm on several experiments, including a support recovery experiment with a correlated design matrix.	翻訳日:2024-03-21 21:48:20 公開日:2024-03-19
# Health-LLM:Personalized Retrieval-Augmented Disease Prediction System Health-LLM: Personalized Retrieval-Augmented Disease Prediction System ( http://arxiv.org/abs/2402.00746v6 ) ライセンス: Link先を確認	Mingyu Jin, Qinkai Yu, Dong Shu, Chong Zhang, Lizhou Fan, Wenyue Hua, Suiyuan Zhu, Yanda Meng, Zhenting Wang, Mengnan Du, Yongfeng Zhang,	(参考訳) 人工知能(AI)の最近の進歩、特に大規模言語モデル(LLM)は、医療応用を著しく進歩させ、インテリジェントな医療の可能性を実証している。しかし、膨大なデータ量や一貫性のない症状の特徴付け基準といった顕著な課題があり、医療AIシステムを個々の患者のニーズに完全に統合することを妨げている。専門的かつパーソナライズされた医療を促進するために,大規模特徴抽出と医療知識トレードオフスコアリングを組み合わせた,革新的なフレームワークHealth-LLMを提案する。従来の健康管理アプリケーションと比較して,1) 健康レポートと医療知識を大きなモデルに統合し,疾患予測のために大規模言語モデルへ関連性のある質問を投げかける,2) 特徴抽出の強化にRAG(検索強化生成)機構を活用する,3) 特徴の統合と削除が可能な半自動の特徴更新フレームワークを組み込んで,疾患予測の精度を向上させる,という3つの利点がある。我々は,Health-LLMシステムの有効性を評価するために,多数の健康レポートで実験を行った。その結果,提案システムは既存のシステムを上回り,疾患予測とパーソナライズされた健康管理を著しく進める可能性が示唆された。コードはhttps://github.com/jmyissb/HealthLLMで入手できる。 Recent advancements in artificial intelligence (AI), especially large language models (LLMs), have significantly advanced healthcare applications and demonstrated potential in intelligent medical treatment. However, there are conspicuous challenges such as vast data volumes and inconsistent symptom characterization standards, preventing full integration of healthcare AI systems with individual patients' needs. To promote professional and personalized healthcare, we propose an innovative framework, Health-LLM, which combines large-scale feature extraction and medical knowledge trade-off scoring. Compared to traditional health management applications, our system has three main advantages: (1) It integrates health reports and medical knowledge into a large model to ask relevant questions to a large language model for disease prediction; (2) It leverages a retrieval augmented generation (RAG) mechanism to enhance feature extraction; (3) It incorporates a semi-automated feature updating framework that can merge and delete features to improve the accuracy of disease prediction. 
We experiment on a large number of health reports to assess the effectiveness of Health-LLM system. The results indicate that the proposed system surpasses the existing ones and has the potential to significantly advance disease prediction and personalized health management. The code is available at https://github.com/jmyissb/HealthLLM.	翻訳日:2024-03-21 21:48:20 公開日:2024-03-19
# 高エネルギー物理におけるイベント分類のためのハイブリッド量子ビジョン変換器 Hybrid Quantum Vision Transformers for Event Classification in High Energy Physics ( http://arxiv.org/abs/2402.00776v2 ) ライセンス: Link先を確認	Eyup B. Unlu, Marçal Comajoan Cara, Gopal Ramesh Dahale, Zhongtian Dong, Roy T. Forestano, Sergei Gleyzer, Daniel Justice, Kyoungchul Kong, Tom Magorsch, Konstantin T. Matchev, Katia Matcheva,	(参考訳) 視覚変換器アーキテクチャに基づくモデルは、画像分類タスクに関しては最先端と見なされる。しかし、トレーニングとデプロイメントの両方に広範な計算資源が必要である。データの量と複雑さが増加するにつれて、問題は悪化する。量子ベースの視覚変換器モデルは、同じ予測力を保ちながら、トレーニングと運用時間を短縮することで、この問題を軽減できる可能性がある。現在の量子コンピュータはまだ高次元のタスクを実行できないが、将来に向けて最も効率的なソリューションの1つを提供する。本研究では、高エネルギー物理学における分類問題(電磁カロリメータにおける光子と電子の識別)のための量子ハイブリッド・ビジョン・トランスフォーマーの様々なバリエーションを構築した。これらを古典的なビジョントランスフォーマーアーキテクチャと比較して評価する。以上の結果から,これらのハイブリッドモデルは,同程度のパラメータ数を持つ古典的モデルに匹敵する性能を達成できることが示唆された。 Models based on vision transformer architectures are considered state-of-the-art when it comes to image classification tasks. However, they require extensive computational resources both for training and deployment. The problem is exacerbated as the amount and complexity of the data increases. Quantum-based vision transformer models could potentially alleviate this issue by reducing the training and operating time while maintaining the same predictive power. Although current quantum computers are not yet able to perform high-dimensional tasks, they do offer one of the most efficient solutions for the future. In this work, we construct several variations of a quantum hybrid vision transformer for a classification problem in high energy physics (distinguishing photons and electrons in the electromagnetic calorimeter). We test them against classical vision transformer architectures. Our findings indicate that the hybrid models can achieve comparable performance to their classical analogues with a similar number of parameters.	翻訳日:2024-03-21 21:48:20 公開日:2024-03-19
# 暗号検閲 Cryptographic Censorship ( http://arxiv.org/abs/2402.03425v2 ) ライセンス: Link先を確認	Netta Engelhardt, Åsmund Folkestad, Adam Levine, Evita Verheijden, Lisa Yang,	(参考訳) 我々は、弱い宇宙検閲予想の量子版を定式化し、その証明に向けて2つの大きな一歩を踏み出す。まず「暗号検閲(Cryptographic Censorship)」を証明する。これは、ホログラフィック CFT の時間発展作用素が、あるコード部分空間上で近似的に擬似ランダム(あるいはハールランダム)であるとき、対応するバルク双対に事象の地平線が存在しなければならないことを示す定理である。この結果は、大域時空構造に関する最小の仮定で、(有限時間における)事象の地平線の形成を保証する一般的な条件を与える。我々の定理は、最近の量子学習のno-go定理の拡張に依存しており、擬似ランダム測度集中の新しい手法を用いて証明されている。この結果を宇宙検閲に適用するために、特異点を古典的、半プランク的、プランク的の各タイプに分ける。古典的および半プランク的特異点が近似的に擬似ランダムな CFT 時間発展と両立することを示す。したがって、そのような特異点が実際に近似的に擬似ランダムであるならば、暗号検閲により、それらは事象の地平線なしには存在できない。この結果は、一般的な適用可能性が地平線の典型性に依存している量子カオスと熱化に関する重要なホログラフィック結果が、AdS/CFTにおける裸の特異点の形成によって無効にならないことを保証する十分条件を与える。 We formulate and take two large strides towards proving a quantum version of the weak cosmic censorship conjecture. We first prove "Cryptographic Censorship": a theorem showing that when the time evolution operator of a holographic CFT is approximately pseudorandom (or Haar random) on some code subspace, then there must be an event horizon in the corresponding bulk dual. This result provides a general condition that guarantees (in finite time) event horizon formation, with minimal assumptions about the global spacetime structure. Our theorem relies on an extension of a recent quantum learning no-go theorem and is proved using new techniques of pseudorandom measure concentration. To apply this result to cosmic censorship, we separate singularities into classical, semi-Planckian, and Planckian types. We illustrate that classical and semi-Planckian singularities are compatible with approximately pseudorandom CFT time evolution; thus, if such singularities are indeed approximately pseudorandom, by Cryptographic Censorship, they cannot exist in the absence of event horizons. 
This result provides a sufficient condition guaranteeing that seminal holographic results on quantum chaos and thermalization, whose general applicability relies on typicality of horizons, will not be invalidated by the formation of naked singularities in AdS/CFT.	翻訳日:2024-03-21 21:38:31 公開日:2024-03-19
# HYPO:超球面上の分布外一般化 HYPO: Hyperspherical Out-of-Distribution Generalization ( http://arxiv.org/abs/2402.07785v2 ) ライセンス: Link先を確認	Yifei Ming, Haoyue Bai, Julian Katz-Samuels, Yixuan Li,	(参考訳) アウト・オブ・ディストリビューション(OOD)の一般化は、現実世界にデプロイされた機械学習モデルにとって重要である。しかし、異なるドメインや環境にまたがって不変の特徴を学ぶ能力を必要とするため、これを実現することは本質的に困難である。本稿では,超球面空間における領域不変表現を証明可能な形で学習する新しいフレームワークHYPO(HYPerspherical OOD generalization)を提案する。特に、我々の超球面学習アルゴリズムは、クラス内変動とクラス間分離の原則によって導かれる。すなわち、同じクラス(異なるトレーニング領域全体)の特徴がそのクラスプロトタイプと密接に整列することを保証する一方で、異なるクラスプロトタイプは最大限に分離される。さらに、我々のプロトタイプ学習目的がOOD一般化境界をどのように改善するかに関する理論的正当性を提供する。挑戦的なOODベンチマークでの広範な実験を通じて、我々のアプローチが競合するベースラインを上回り、優れたパフォーマンスを実現することを実証した。コードはhttps://github.com/deeplearning-wisc/hypoで入手できる。 Out-of-distribution (OOD) generalization is critical for machine learning models deployed in the real world. However, achieving this can be fundamentally challenging, as it requires the ability to learn invariant features across different domains or environments. In this paper, we propose a novel framework HYPO (HYPerspherical OOD generalization) that provably learns domain-invariant representations in a hyperspherical space. In particular, our hyperspherical learning algorithm is guided by intra-class variation and inter-class separation principles -- ensuring that features from the same class (across different training domains) are closely aligned with their class prototypes, while different class prototypes are maximally separated. We further provide theoretical justifications on how our prototypical learning objective improves the OOD generalization bound. Through extensive experiments on challenging OOD benchmarks, we demonstrate that our approach outperforms competitive baselines and achieves superior performance. Code is available at https://github.com/deeplearning-wisc/hypo.	翻訳日:2024-03-21 21:38:31 公開日:2024-03-19
# 非マルコフ位相雑音下での単一量子ビットゲートの量子力学写像 Quantum dynamical maps for single-qubit gates under non-Markovian phase noise ( http://arxiv.org/abs/2402.14530v2 ) ライセンス: Link先を確認	J. M. Sánchez Velázquez, A. Steiner, R. Freund, M. Guevara-Bertsch, Ch. D. Marciniak, T. Monz, A. Bermudez,	(参考訳) ノイズはユビキタスで、精度が要求される環境では一般的に有害である。これは、システムユーティリティがその影響下で急速に崩壊する量子技術分野において特に当てはまる。したがって、量子デバイスにおけるノイズを理解することは、その有害な影響を軽減または排除するための効率的な戦略の前提となる。しかし、これはしばしば禁止されるリソースを必要とするため、一般的に使用されるノイズモデルは、しばしば実験的な現実から逸脱する単純化に依存している。ここでは、単一実験入力(ノイズパワースペクトル密度)のみを必要とする単一量子ゲートのコンパクトな顕微鏡誤差モデルを導出する。我々のモデルは標準的な偏極化あるいはパウリ旋回ノイズモデルを超えており、非クリフォードおよび非マルコフの動的誤差写像への寄与を明示的に含んでいる。我々は,トラップイオン量子コンピュータ上で動作している確立された特性評価技術に対して,実験的な指標の予測を行う。特に,ランダム化ベンチマークを用いて測定し,量子プロセストモグラフィーを用いて再構成した平均ゲート誤差の実験的推定は,解析的推定により厳密に下界し,非分極モデルではゲート誤差を過大評価することがわかった。非マルコフ的寄与を含むノイズモデリングは、動的デカップリングや動的修正ゲートなどの確立されたフレームワークや、量子誤り訂正のためのより現実的なしきい値を提供するために、容易に適用することができる。 Noise is both ubiquitous and generally deleterious in settings where precision is required. This is especially true in the quantum technology sector where system utility typically decays rapidly under its influence. Understanding the noise in quantum devices is thus a prerequisite for efficient strategies to mitigate or even eliminate its harmful effects. However, this requires resources that are often prohibitive, such that the typically-used noise models rely on simplifications that sometimes depart from experimental reality. Here we derive a compact microscopic error model for single-qubit gates that only requires a single experimental input -the noise power spectral density. Our model goes beyond standard depolarizing or Pauli-twirled noise models, explicitly including non-Clifford and non-Markovian contributions to the dynamical error map. We gauge our predictions for experimentally relevant metrics against established characterization techniques run on a trapped-ion quantum computer. 
In particular, we find that experimental estimates of average gate errors measured through randomized benchmarking and reconstructed via quantum process tomography are tightly lower-bounded by our analytical estimates, while the depolarizing model overestimates the gate error. Our noise modeling including non-Markovian contributions can be readily applied to established frameworks such as dynamical decoupling and dynamically-corrected gates, or to provide more realistic thresholds for quantum error correction.	翻訳日:2024-03-21 21:38:31 公開日:2024-03-19
# 集積フォトニック回路における量子ビットと量子ディットを用いた任意の状態準備を行うための電気エネルギーコストの推定 Estimating the electrical energy cost of performing arbitrary state preparation using qubits and qudits in integrated photonic circuits ( http://arxiv.org/abs/2402.16603v2 ) ライセンス: Link先を確認	Maria Carolina Volpato, Pierre-Louis de Assis,	(参考訳) 量子情報処理に単一光子とフォトニック集積回路(PIC)を用いる場合、量子状態は量子ビット(qubit)または量子ディット(qudit)を用いて符号化することができる。quditは状態の符号化に必要な光子数が少ない(原理的には1個のみ)という点で効率的であるが、同じ次元$d$に達するのに使用する導波路の数という点ではqubitの方が効率的である。quditは$d$本の導波路を必要とするのに対し、qubitが使用するのは$2\log_2d$本だからである。有用なタスクに十分な大きさの次元については、他のリソースが少なくともPICの導波路の数に対して多項式的にスケールするため、これはqubitが明らかに最良の選択肢であることを示しているように見える。しかし、比較はそれほど直接的ではない。例えば変分量子アルゴリズムに関係する量子状態準備のタスクを考える。 n 個のqubitに対して、このタスクは回路に n について指数関数的な数の制御NOT(CNOT)ゲートを必要とする。どちらの実装も指数的なリソースコストを負うため、より詳細な評価が必要である。我々は、量子状態準備を行うようにフォトニック回路をプログラムするために平均して費やされる電気エネルギーの量の観点から、qubitとquditのアプローチを比較する。完全に再構成可能な干渉計アレイを持つPICを使用する場合、qubitを使用する方がquditを使用するよりも多くのエネルギーを必要とする。しかし、専用CNOTブロックを持つ回路はエネルギー消費がはるかに小さく、CNOTゲートの確率的性質など、より重要なボトルネックが現れる大きなqubit数でも実行可能であることを示す。 When using single photons and photonic integrated circuits (PICs) for quantum information processing, quantum states can be encoded using either qubits or qudits. Qudits are more efficient in terms of requiring fewer photons (in principle, only one) to encode the state, while qubits are more efficient in terms of the number of waveguides used to reach the same dimension $d$, as qudits need $d$ waveguides in comparison to the $2\log_2d$ waveguides used by qubits. For dimensions large enough for useful tasks, this would indicate that qubits are clearly the best option, as other resources scale at least polynomially with the number of waveguides in the PIC. The comparison, however, is not so direct. We consider the task of quantum state preparation, which is relevant for variational quantum algorithms, for instance. For n qubits, this task requires the circuit to have a number of Controlled-NOT (CNOT) gates that is exponential in n. 
Since both implementations suffer from an exponential resource cost, a more detailed evaluation is required. We compare the qubit and qudit approaches in terms of the amount of electrical energy that must be spent, on average, to program a photonic circuit to perform quantum state preparation. We find that if a PIC with a fully reconfigurable array of interferometers is to be used, using qubits requires more energy than using qudits. We show, however, that a circuit with dedicated CNOT blocks has a much smaller energy consumption, remaining viable even at large qubit numbers, where more important bottlenecks come into play, such as the probabilistic nature of the CNOT gates.	翻訳日:2024-03-21 21:28:43 公開日:2024-03-19
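The waveguide counts quoted in this abstract ($d$ for a qudit versus $2\log_2 d$ for qubits) can be tabulated with a short script. This is a minimal sketch, not code from the paper: the function names are hypothetical, and it assumes $d$ is a power of two so that $\log_2 d$ is the number of qubits.

```python
def waveguides_qudit(d: int) -> int:
    """Waveguides for one qudit of dimension d (the abstract states d)."""
    return d


def waveguides_qubits(d: int) -> int:
    """Waveguides for qubits spanning dimension d (the abstract states 2*log2(d)).

    Assumption of this sketch: d must be a power of two.
    """
    if d < 2 or d & (d - 1) != 0:
        raise ValueError("d must be a power of two, d >= 2")
    n_qubits = d.bit_length() - 1  # exact integer log2 for powers of two
    return 2 * n_qubits


if __name__ == "__main__":
    # Print the two counts side by side for 1 to 6 qubits.
    for n in range(1, 7):
        d = 2 ** n
        print(n, d, waveguides_qudit(d), waveguides_qubits(d))
```

The two counts coincide at $d = 2$ and $d = 4$; beyond that the qudit count grows exponentially in the number of qubits while the qubit count grows linearly, which is the scaling argument the abstract then qualifies with its energy analysis.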
# センサ故障時の一般化可能性:Tokenization + Transformerにより、よりロバストな潜在空間が実現 Generalizability Under Sensor Failure: Tokenization + Transformers Enable More Robust Latent Spaces ( http://arxiv.org/abs/2402.18546v3 ) ライセンス: Link先を確認	Geeling Chau, Yujin An, Ahamed Raffey Iqbal, Soon-Jo Chung, Yisong Yue, Sabera Talukder,	(参考訳) 神経科学の主要な目標は、一般化する神経データ表現を発見することである。この目標には、記録セッション(eg環境)、被験者(egニューラルネットワーク構造)、センサ(egセンサノイズ)などに沿った変動性がある。最近の研究は、セッションや主題間の一般化に対処し始めているが、神経科学実験でよく見られるセンサー障害に対する堅牢性の研究はほとんどない。これらの一般化可能性次元に対処するために、我々はまず多数のセッション、被験者、センサーで独自の脳波データセットを収集し、次にEEGNet(Lawhern et al , 2018)とTOTEM(Talukder et al , 2024)の2つの時系列モデルを研究します。 EEGNetは広く使われている畳み込みニューラルネットワークであり、TOTEMは離散時系列トークンとトランスフォーマーモデルである。一般化可能なすべてのケースにおいて、TOTEMがEEGNetを上回ったり、マッチすることがわかった。最後に、TOTEMの潜在コードブックの分析を通して、トークン化が一般化を可能にすることを観察する。 A major goal in neuroscience is to discover neural data representations that generalize. This goal is challenged by variability along recording sessions (e.g. environment), subjects (e.g. varying neural structures), and sensors (e.g. sensor noise), among others. Recent work has begun to address generalization across sessions and subjects, but few study robustness to sensor failure which is highly prevalent in neuroscience experiments. In order to address these generalizability dimensions we first collect our own electroencephalography dataset with numerous sessions, subjects, and sensors, then study two time series models: EEGNet (Lawhern et al., 2018) and TOTEM (Talukder et al., 2024). EEGNet is a widely used convolutional neural network, while TOTEM is a discrete time series tokenizer and transformer model. We find that TOTEM outperforms or matches EEGNet across all generalizability cases. Finally through analysis of TOTEM's latent codebook we observe that tokenization enables generalization.	翻訳日:2024-03-21 21:28:43 公開日:2024-03-19
# Few-Shot事例選択のための情報量のあるメトリクスの設計 Designing Informative Metrics for Few-Shot Example Selection ( http://arxiv.org/abs/2403.03861v2 ) ライセンス: Link先を確認	Rishabh Adiga, Lakshminarayanan Subramanian, Varun Chandrasekaran,	(参考訳) 事前訓練された言語モデル(PLM)は、適切にフォーマットされた例を与えると、顕著なfew-shot学習能力を示す。しかしながら、「ベスト」な例を選択することは、依然としてオープンな課題である。本稿では,系列タギングタスクのための複雑性に基づくプロンプト選択手法を提案する。このアプローチは、例の選択専用のモデルのトレーニングを回避し、代わりに特定のメトリクスを使用して、テスト文と例の構文的・意味的な複雑さを揃える。文レベルと単語レベルの両方のメトリクスを用いて、例の複雑さを検討中の(テスト)文に一致させる。結果として、我々のアプローチはPLMからより高い性能を引き出すことが示された。few-shot NERで最先端の性能を達成し、GPT-4ではCoNLL2003データセットのF1スコアを絶対値で5%改善した。また、GPT-j-6Bのような小型モデルでは最大28.85ポイント(F1/Acc.)の大きな改善も見られる。 Pretrained language models (PLMs) have shown remarkable few-shot learning capabilities when provided with properly formatted examples. However, selecting the "best" examples remains an open challenge. We propose a complexity-based prompt selection approach for sequence tagging tasks. This approach avoids the training of a dedicated model for selection of examples, and instead uses certain metrics to align the syntactico-semantic complexity of test sentences and examples. We use both sentence- and word-level metrics to match the complexity of examples to the (test) sentence being considered. Our results demonstrate that our approach extracts greater performance from PLMs: it achieves state-of-the-art performance on few-shot NER, achieving a 5% absolute improvement in F1 score on the CoNLL2003 dataset for GPT-4. We also see large gains of up to 28.85 points (F1/Acc.) in smaller models like GPT-j-6B.	翻訳日:2024-03-21 21:18:47 公開日:2024-03-19
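The general idea of matching example complexity to the test sentence, without training a selector model, can be illustrated with a toy sketch. The abstract does not specify its metrics, so the `complexity` function below is a purely hypothetical stand-in (token count plus mean token length), and `select_examples` is an illustrative name rather than the authors' method.

```python
def complexity(sentence: str) -> float:
    """Hypothetical complexity proxy: token count plus mean token length.

    The paper's sentence- and word-level metrics are not given in the
    abstract; this stand-in only makes the sketch runnable.
    """
    tokens = sentence.split()
    if not tokens:
        return 0.0
    return len(tokens) + sum(len(t) for t in tokens) / len(tokens)


def select_examples(test_sentence: str, pool: list[str], k: int = 2) -> list[str]:
    """Return the k pool sentences whose complexity is closest to the test sentence."""
    target = complexity(test_sentence)
    return sorted(pool, key=lambda s: abs(complexity(s) - target))[:k]


if __name__ == "__main__":
    pool = [
        "Hi .",
        "John lives in Paris .",
        "The European Commission met in Brussels on Monday .",
    ]
    print(select_examples("Mary works in Berlin .", pool, k=1))
```

Because selection is a sort over a scalar score, it needs no training; swapping in a different metric only means replacing `complexity`.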
# Large, Small or Both: 意見要約のバイアス除去のための言語モデルに基づく新しいデータ拡張フレームワーク Large, Small or Both: A Novel Data Augmentation Framework Based on Language Models for Debiasing Opinion Summarization ( http://arxiv.org/abs/2403.07693v2 ) ライセンス: Link先を確認	Yanyue Zhang, Pengfei Li, Yilong Lai, Deyu Zhou, Yulan He,	(参考訳) 既存の意見要約データセットのレビューの70$\%$以上は肯定的であるため、現在の意見要約アプローチは、否定的なテキストを入力しても否定的な要約を生成したがらない。このような感情バイアスに対処するために、特定のフレームワークに過度に依存しない直接的なアプローチは、大規模言語モデルに基づいて追加データを生成し、データセットの感情分布のバランスをとることである。しかし、大規模言語モデルに基づくデータ拡張は2つの欠点に直面している。 1) 拡張データに潜在する問題や毒性 2) 高いコスト。そこで本稿では,意見要約のバイアス除去のために,大規模・小規模両方の言語モデルに基づく新たなデータ拡張フレームワークを提案する。具体的には、大規模言語モデルを用いて肯定的なテキストを書き換えることにより、少量の合成された否定的レビューを得る。そして、生成されたデータに基づいて、disentangle reconstructionモデルをトレーニングする。トレーニング後、異なるサンプル表現の組み合わせから得られた新しい表現を復号し、混乱度と感情分類に基づいてフィルタリングすることで、大量の合成データを得ることができる。実験により、我々のフレームワークは、大規模モデルのみを用いる場合と同程度に、かつより経済的に、感情バイアスを効果的に軽減できることが示された。 As more than 70$\%$ of reviews in the existing opinion summary dataset are positive, current opinion summarization approaches are reluctant to generate negative summaries given the input of negative texts. To address such sentiment bias, a direct approach without the over-reliance on a specific framework is to generate additional data based on large language models to balance the emotional distribution of the dataset. However, data augmentation based on large language models faces two disadvantages: 1) the potential issues or toxicity in the augmented data; 2) the expensive costs. Therefore, in this paper, we propose a novel data augmentation framework based on both large and small language models for debiasing opinion summarization. Specifically, a small number of synthesized negative reviews is obtained by rewriting the positive text via a large language model. Then, a disentangle reconstruction model is trained based on the generated data. 
After training, a large amount of synthetic data can be obtained by decoding new representations formed by combining different sample representations, then filtering the results by confusion degree and sentiment classification. Experiments show that our framework alleviates sentiment bias as effectively as using only large models, but more economically.	翻訳日:2024-03-21 21:18:47 公開日:2024-03-19
# 対話レコメンデーションのための生成ユーザシミュレータとしての大規模言語モデルの評価 Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation ( http://arxiv.org/abs/2403.09738v2 ) ライセンス: Link先を確認	Se-eun Yoon, Zhankui He, Jessica Maria Echterhoff, Julian McAuley,	(参考訳) 合成ユーザは,対話レコメンデーションシステムの評価において,実際のユーザにとって費用対効果の高いプロキシである。大規模言語モデルは、人間の様態をシミュレートし、多様なユーザーを表わす能力の疑問を提起する。本稿では,言語モデルが対話的推薦において人間の行動を正確にエミュレートできる程度を測定するための新しいプロトコルを提案する。このプロトコルは5つのタスクから構成されており、それぞれのタスクは、合成ユーザが提示すべき重要な特性、すなわち、どのアイテムについて話すべきかの選択、バイナリの好みの表現、オープンな好みの表現、レコメンデーションの要求、フィードバックの付与である。ベースラインシミュレータの評価を通じて、これらのタスクは人間の行動から言語モデルの逸脱を効果的に明らかにし、モデル選択と促進戦略による逸脱を減らす方法についての洞察を与える。 Synthetic users are cost-effective proxies for real users in the evaluation of conversational recommender systems. Large language models show promise in simulating human-like behavior, raising the question of their ability to represent a diverse population of users. We introduce a new protocol to measure the degree to which language models can accurately emulate human behavior in conversational recommendation. This protocol is comprised of five tasks, each designed to evaluate a key property that a synthetic user should exhibit: choosing which items to talk about, expressing binary preferences, expressing open-ended preferences, requesting recommendations, and giving feedback. Through evaluation of baseline simulators, we demonstrate these tasks effectively reveal deviations of language models from human behavior, and offer insights on how to reduce the deviations with model selection and prompting strategies.	翻訳日:2024-03-21 21:08:57 公開日:2024-03-19
# 透かしLLMの統計的理解に向けて Towards Better Statistical Understanding of Watermarking LLMs ( http://arxiv.org/abs/2403.13027v1 ) ライセンス: Link先を確認	Zhongze Cai, Shang Liu, Hanzhao Wang, Huaiyang Zhong, Xiaocheng Li,	(参考訳) 本稿では,大規模言語モデル (LLM) の透かし問題について検討する。モデル歪みと検出能力のトレードオフを考慮し,Kirchenbauer et al (2023a) のグリーンレッドアルゴリズムに基づく制約付き最適化問題として定式化する。最適化問題に対する最適解法は、より理解し、ウォーターマーキングプロセスのアルゴリズム設計を刺激する優れた解析的性質を享受できることを示す。本研究では,この最適化定式化を考慮したオンライン二重勾配上昇透かしアルゴリズムを開発し,その漸近的パレート最適性をモデル歪みと検出能力の間で証明する。このような結果は、緑リストの確率が平均的に増加することを保証し、従って(以前の結果とは対照的に)明示的に検出する。さらに,透かし問題に対するモデル歪み指標の選択について,系統的な考察を行った。我々は、KLの発散の選択を正当化し、既存の「歪曲フリー」とパープレキシティの基準で問題を提示する。最後に、ベンチマークアルゴリズムに対して、広範囲なデータセットでアルゴリズムを実証的に評価する。 In this paper, we study the problem of watermarking large language models (LLMs). We consider the trade-off between model distortion and detection ability and formulate it as a constrained optimization problem based on the green-red algorithm of Kirchenbauer et al. (2023a). We show that the optimal solution to the optimization problem enjoys a nice analytical property which provides a better understanding and inspires the algorithm design for the watermarking process. We develop an online dual gradient ascent watermarking algorithm in light of this optimization formulation and prove its asymptotic Pareto optimality between model distortion and detection ability. Such a result guarantees an averaged increased green list probability and henceforth detection ability explicitly (in contrast to previous results). Moreover, we provide a systematic discussion on the choice of the model distortion metrics for the watermarking problem. We justify our choice of KL divergence and present issues with the existing criteria of ``distortion-free'' and perplexity. Finally, we empirically evaluate our algorithms on extensive datasets against benchmark algorithms.	翻訳日:2024-03-21 20:59:01 公開日:2024-03-19
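As a rough illustration of the detection side of the green-red scheme mentioned in the abstract above, the sketch below computes the one-proportion z-score that is commonly used to test whether a text contains more green-list tokens than chance would predict. The function name, the green fraction `gamma`, and the example counts are illustrative assumptions, not values from the paper, and the paper's own online dual gradient ascent algorithm is not reproduced here.

```python
import math


def green_list_z_score(green_count: int, total_tokens: int, gamma: float) -> float:
    """z-score for seeing `green_count` green tokens among `total_tokens`,
    under the null hypothesis that each token is green with probability `gamma`."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    expected = gamma * total_tokens
    std_dev = math.sqrt(total_tokens * gamma * (1.0 - gamma))
    return (green_count - expected) / std_dev


# Hypothetical example: 140 of 200 tokens fall in a green list
# that covers half of the vocabulary (gamma = 0.5).
z = green_list_z_score(green_count=140, total_tokens=200, gamma=0.5)
print(round(z, 2))  # 5.66
```

A larger green-list probability raises this statistic for the same text length, which is why a guaranteed increase in that probability translates into detection ability.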
# 回折限界を打ち破る強度積に基づく光センシング Intensity product-based optical sensing to beat the diffraction limit ( http://arxiv.org/abs/2403.13029v1 ) ライセンス: Link先を確認	Byoung S. Ham,	(参考訳) 古典的に定義された光学位相感度は、ショットノイズ限界(SNL)、あるいは量子力学の不確定性原理に由来する標準量子限界として知られている。 SNLに基づくと、位相分解能(感度)はNの平方根に反比例する。ここでNは干渉する光子の数、または個別に測定された事象の数である。したがって、信号対雑音比がNの平方根だけ向上するため、高出力レーザーの使用が有利となる。しかし、光学干渉計では、N個のプローブ光子が検出過程で分解されない限り、位相分解能はN=1の場合に留まり、古典光学の回折限界が生じる。そこで本研究では、高精度計測で一般的に用いられる典型的な干渉計においてSNLを実現するために、強度積に基づく光センシングのための射影測定を提案する。このため、干渉計の出力ポートの1つをN個のポートに均等に分割し、m次強度相関(mはN以下)を測定する。ここで最大のNは入力レーザーの光子の総数によって与えられる。 N光子間の最大時間遅延はレーザーのスペクトル帯域幅によって制限され、これは同時計数検出に基づく量子センシングに使用される光子アンサンブルの有効コヒーレンス時間と同じである。 Classically defined optical phase sensitivity is known as the shot-noise limit (SNL) or standard quantum limit originating in the uncertainty principle of quantum mechanics. Based on SNL, the phase resolution (sensitivity) is inversely proportional to the square root of N, where N is the number of interfering photons or individually measured events. Thus, using a high-power laser is advantageous due to the square-root-of-N gain in the signal-to-noise ratio. In an optical interferometer, however, the phase resolution remains in the N=1 case unless N probe photons are resolved in a detection process, resulting in the diffraction limit of classical optics. Here, a projective measurement is proposed for intensity product-based optical sensing to realize SNL in a typical interferometer commonly used for high-precision metrology. For this, one of the output ports of the interferometer is evenly divided into N ports and measured for mth-order intensity correlations (m is less than or equal to N), where the maximum N is given by the total number of photons of the input laser. The maximum temporal delay among N photons is constrained by the spectral bandwidth of the laser, which is the same as the effective coherence time of the photon ensemble used for coincidence detection-based quantum sensing.	翻訳日:2024-03-21 20:59:01 公開日:2024-03-19
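To make the scaling in the abstract above concrete: under the shot-noise limit the phase resolution falls off as one over the square root of N. The sketch below only evaluates that textbook relation with a unit prefactor. The function name and the sample photon numbers are illustrative assumptions, and nothing here models the proposed intensity-product measurement itself.

```python
import math


def snl_phase_resolution(n_photons: int) -> float:
    """Shot-noise-limited phase resolution, taken as 1/sqrt(N) (unit prefactor assumed)."""
    if n_photons < 1:
        raise ValueError("n_photons must be at least 1")
    return 1.0 / math.sqrt(n_photons)


# Each 100-fold increase in resolved photons improves the resolution 10-fold.
for n in (1, 100, 10_000):
    print(n, snl_phase_resolution(n))
```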
# 階層ROIと適応量子化による超高忠実画像圧縮 Super-High-Fidelity Image Compression via Hierarchical-ROI and Adaptive Quantization ( http://arxiv.org/abs/2403.13030v1 ) ライセンス: Link先を確認	Jixiang Luo, Yan Wang, Hongwei Qin,	(参考訳) 学習された画像圧縮(LIC)は、客観的および主観的メトリクスに関して劇的な進歩を遂げた。 MSEベースのモデルは客観的メトリクスを改善することを目的としており、生成モデルは主観的メトリクスによって測定された視覚的品質を改善するために活用される。しかし、いずれも低ビットレート、特に$0.2$ bpp未満では、ぼやけや変形に悩まされている。さらに、人間の顔やテキストの変形は視覚的品質評価には受け入れられず、小さな顔やテキストではより顕著になる。この問題を解決するために、関心領域(ROI)を利用して、MSEベースのモデルと生成モデルの利点を組み合わせる。本研究では,顔,テキスト,複雑なテクスチャを含む領域の再構成を改善するために,画像を複数の前景領域と1つの背景領域に分割する階層ROI(H-ROI)を提案する。さらに、チャネル次元内における非線形マッピングによる適応量子化を提案し、視覚的品質を維持しながらビットレートを制約する。徹底的な実験により、提案手法は、HiFiCの$0.7\times$、BPGの$0.5\times$のビット数といった低いビットレートで、小さな顔やテキストに対してより良い視覚的品質が得られることを示す。 Learned Image Compression (LIC) has achieved dramatic progress regarding objective and subjective metrics. MSE-based models aim to improve objective metrics while generative models are leveraged to improve visual quality measured by subjective metrics. However, they all suffer from blurring or deformation at low bit rates, especially below $0.2$ bpp. Besides, deformation on human faces and text is unacceptable for visual quality assessment, and the problem becomes more prominent on small faces and text. To solve this problem, we combine the advantage of MSE-based models and generative models by utilizing region of interest (ROI). We propose Hierarchical-ROI (H-ROI), to split images into several foreground regions and one background region to improve the reconstruction of regions containing faces, text, and complex textures. Further, we propose adaptive quantization by non-linear mapping within the channel dimension to constrain the bit rate while maintaining the visual quality. Exhaustive experiments demonstrate that our methods achieve better visual quality on small faces and text with lower bit rates, e.g., $0.7\times$ the bits of HiFiC and $0.5\times$ the bits of BPG.	翻訳日:2024-03-21 20:59:01 公開日:2024-03-19
# RigorLLM: 望ましくないコンテンツに対する大規模言語モデルのための回復力のあるガードレール RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content ( http://arxiv.org/abs/2403.13031v1 ) ライセンス: Link先を確認	Zhuowen Yuan, Zidi Xiong, Yi Zeng, Ning Yu, Ruoxi Jia, Dawn Song, Bo Li,	(参考訳) 大規模言語モデル(LLM)の最近の進歩は、様々な領域における様々なタスクにまたがる顕著な機能を示した。しかし、特に悪意のある入力の下では、バイアスの出現とLSMの有害なコンテンツを生成する可能性には大きな課題が生じる。現在の緩和戦略は効果はあるものの、敵の攻撃下では弾力性がない。本稿では,LLMに対する有害かつ安全でない入力と出力を効率よく効果的に抑制する新しいフレームワークであるResilient Guardrails for Large Language Models (RigorLLM)を紹介する。ランゲヴィンダイナミクスによるエネルギーベースのトレーニングデータ拡張を含む多面的アプローチを採用し、ミニマックス最適化による入力に対する安全なサフィックスを最適化し、我々のデータ拡張に基づくロバストなKNNとLLMを組み合わせた融合モデルを統合することにより、RigorLLMは有害なコンテンツモデレーションに対する堅牢なソリューションを提供する。実験により、RigorLLMは、有害なコンテンツの検出において、OpenAI APIやAspective APIのような既存のベースラインよりも優れているだけでなく、脱獄攻撃に対する非並列なレジリエンスも示している。制約付き最適化とフュージョンベースのガードレールアプローチの革新的利用は、よりセキュアで信頼性の高いLCMを開発するための大きな一歩であり、デジタル脅威の進化に直面したコンテンツモデレーションフレームワークの新たな標準となる。 Recent advancements in Large Language Models (LLMs) have showcased remarkable capabilities across various tasks in different domains. However, the emergence of biases and the potential for generating harmful content in LLMs, particularly under malicious inputs, pose significant challenges. Current mitigation strategies, while effective, are not resilient under adversarial attacks. This paper introduces Resilient Guardrails for Large Language Models (RigorLLM), a novel framework designed to efficiently and effectively moderate harmful and unsafe inputs and outputs for LLMs. By employing a multi-faceted approach that includes energy-based training data augmentation through Langevin dynamics, optimizing a safe suffix for inputs via minimax optimization, and integrating a fusion-based model combining robust KNN with LLMs based on our data augmentation, RigorLLM offers a robust solution to harmful content moderation. 
Our experimental evaluations demonstrate that RigorLLM not only outperforms existing baselines like OpenAI API and Perspective API in detecting harmful content but also exhibits unparalleled resilience to jailbreaking attacks. The innovative use of constrained optimization and a fusion-based guardrail approach represents a significant step forward in developing more secure and reliable LLMs, setting a new standard for content moderation frameworks in the face of evolving digital threats.	翻訳日:2024-03-21 20:59:01 公開日:2024-03-19
# 産業バッチプロセス監視のためのハイブリッド型教師なし学習戦略 Hybrid Unsupervised Learning Strategy for Monitoring Industrial Batch Processes ( http://arxiv.org/abs/2403.13032v1 ) ライセンス: Link先を確認	Christian W. Frey,	(参考訳) 工業生産プロセス、特に製薬業界は、効率、製品品質、安全性を確保するために継続的な監視を必要とする複雑なシステムである。本稿では,複雑な産業プロセスを監視するためのハイブリッド型教師なし学習戦略(HULS)を提案する。従来の自己組織化マップ(SOM)の制限、特にバランスの取れていないデータセットと高相関のプロセス変数のシナリオに対処するため、HULSは既存の教師なし学習技術を組み合わせてこれらの課題に対処する。 HULSの概念の性能を評価するために,実験室のバッチに基づいて比較実験を行った。 Industrial production processes, especially in the pharmaceutical industry, are complex systems that require continuous monitoring to ensure efficiency, product quality, and safety. This paper presents a hybrid unsupervised learning strategy (HULS) for monitoring complex industrial processes. Addressing the limitations of traditional Self-Organizing Maps (SOMs), especially in scenarios with unbalanced data sets and highly correlated process variables, HULS combines existing unsupervised learning techniques to address these challenges. To evaluate the performance of the HULS concept, comparative experiments are performed based on a laboratory batch	翻訳日:2024-03-21 20:59:01 公開日:2024-03-19
# 部分オラクルとグローバーアルゴリズムを用いた加速量子探索 Accelerated quantum search using partial oracles and Grover's algorithm ( http://arxiv.org/abs/2403.13035v1 ) ライセンス: Link先を確認	Fintan M. Bolton,	(参考訳) グローバーのアルゴリズム(Grover's algorithm)は、もともと順序のないデータベースを探索する手段として考案されたが、量子計算によって生成された結果集合から解を抽出する手法としても利用できる。このアルゴリズムはオラクル関数の概念を利用して、検索項目をマッチングするプロセスを抽象化する(一致すれば1、そうでなければ0を返す)。このブラックボックスアプローチは、検索問題の詳細を隠蔽し、検索空間内のアイテムが完全に順序づけられていないという仮定で機能する。この場合、サイズ$N$の検索空間で1つのターゲット項目を検索するのに必要なオラクルクエリ数は$\mathcal{O}(\sqrt{N})$でスケールする(サイズ$N$の検索空間に$M$個のターゲット項目がある場合は$\mathcal{O}(\sqrt{N/M})$)。しかし、典型的なブラックボックスのオラクルの内部に隠れているものの、項目をマッチングするプロセスは通常、複数のデータビットのマッチングを伴う。本稿では,個別のオラクルをマッチング条件の各ビットに関連付け,独立にテスト可能な複数の部分的なオラクル関数を得るという考え方について検討する。このアイデアを探ることで、性能が$\mathcal{O}(\sqrt{N})$(Groverと同等)から$\mathcal{O}(\log(N))$(指数関数的に高速)までの広い範囲に収まりうる多段階ハイブリッド検索アルゴリズムが得られる。この探索の高速化は、探索空間内の秩序を動的に発見することで機能しているとみられ、その秩序は部分オラクル関数と探索インデックスの相関関係から成る。このアルゴリズムは最も単純な検索シナリオに対して検証される。 Grover's algorithm, originally conceived as a means of searching an unordered database, can also be used as a technique for extracting solutions from the result sets generated by quantum computations. The algorithm exploits the concept of an oracle function, which abstracts the process of matching a search item (returning 1 for a match and 0 otherwise). This black-box approach hides the details of a search problem and works with the assumption that the items in the search space are completely unordered. In this case, searching for 1 target item in a search space of size $N$ scales as $\mathcal{O}(\sqrt{N})$ oracle queries (or $\mathcal{O}(\sqrt{N/M})$ oracle queries for $M$ target items in a search space of size $N$). Hidden inside the typical black-box oracle, however, the process of matching an item usually involves matching multiple data bits. In this article, we explore the idea of associating a separate oracle with each bit of the matching condition, obtaining multiple partial oracle functions which can be tested independently. 
Exploring this idea leads to a multi-stage hybrid search algorithm, whose performance can fall within a wide range, anywhere between $\mathcal{O}(\sqrt{N})$ (same as Grover) and $\mathcal{O}(\log(N))$ (exponentially faster). Apparently, the search acceleration works by dynamically discovering order in the search space, where the order consists of correlations between the partial oracle functions and the search index. This new algorithm is validated against the simplest kind of search scenario.	翻訳日:2024-03-21 20:59:01 公開日:2024-03-19
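For readers who want to see the baseline that the abstract above compares against, here is a minimal state-vector simulation of textbook Grover search with a single black-box oracle. It is a sketch of the standard algorithm only: the partial-oracle, multi-stage algorithm proposed in the paper is not reproduced, and the function names and the problem size are assumptions made for illustration.

```python
import math

import numpy as np


def grover_success_probability(n_items: int, target: int, iterations: int) -> float:
    """Simulate textbook Grover search and return the probability of measuring `target`."""
    state = np.full(n_items, 1.0 / math.sqrt(n_items))  # uniform superposition
    for _ in range(iterations):
        state[target] *= -1.0               # oracle: flip the sign of the marked item
        state = 2.0 * state.mean() - state  # diffusion: inversion about the mean
    return float(state[target] ** 2)


def optimal_iterations(n_items: int, n_targets: int = 1) -> int:
    """Roughly (pi / 4) * sqrt(N / M) oracle calls, rounded down."""
    return math.floor(math.pi / 4.0 * math.sqrt(n_items / n_targets))


n = 1024
k = optimal_iterations(n)
print(k, grover_success_probability(n, target=3, iterations=k))
```

The iteration count grows with the square root of the search-space size, which is the $\mathcal{O}(\sqrt{N})$ scaling quoted in the abstract.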
# BiLoRA: 大規模事前学習モデルの高効率低ランク適応のための2レベル最適化フレームワーク BiLoRA: A Bi-level Optimization Framework for Overfitting-Resilient Low-Rank Adaptation of Large Pre-trained Models ( http://arxiv.org/abs/2403.13037v1 ) ライセンス: Link先を確認	Rushi Qiang, Ruiyi Zhang, Pengtao Xie,	(参考訳) 低ランク適応(LoRA)は、低ランクインクリメンタル行列を学習することにより、下流タスクにおける大規模事前学習モデルの微調整に人気がある手法である。 LoRAとその変種は、完全な微調整法に比べてトレーニング可能なパラメータの数を効果的に減少させるが、トレーニングデータによく適合し、テストデータに対する準最適一般化をもたらす。この問題に対処するために,バイレベル最適化(BLO)に基づく過度な微調整手法であるBiLoRAを導入する。 BiLoRAは擬似特異値分解を用いて低ランクインクリメンタル行列をパラメータ化し、擬似特異ベクトルと値のトレーニングをトレーニングデータの2つの異なるサブセットに分割する。この分割は、BLOフレームワークの別のレベルに埋め込まれており、単一のデータセットに過度に適合するリスクを軽減する。自然言語の理解と生成タスクをカバーする10のデータセットでテストされ、よく知られた大規模な事前学習モデルに適用されたBiLoRAは、同様のトレーニング可能なパラメータを持つLoRAメソッドやその他の微調整アプローチを著しく上回っている。 Low-rank adaptation (LoRA) is a popular method for fine-tuning large-scale pre-trained models in downstream tasks by learning low-rank incremental matrices. Though LoRA and its variants effectively reduce the number of trainable parameters compared to full fine-tuning methods, they often overfit training data, resulting in sub-optimal generalization on test data. To address this problem, we introduce BiLoRA, an overfitting-alleviating fine-tuning approach based on bi-level optimization (BLO). BiLoRA employs pseudo singular value decomposition to parameterize low-rank incremental matrices and splits the training of pseudo singular vectors and values across two different subsets of training data. This division, embedded within separate levels of the BLO framework, mitigates the risk of overfitting to a single dataset. Tested on ten datasets covering natural language understanding and generation tasks and applied to various well-known large pre-trained models, BiLoRA significantly outperforms LoRA methods and other fine-tuning approaches, with similar amounts of trainable parameters.	翻訳日:2024-03-21 20:59:01 公開日:2024-03-19
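The parameterization described in the abstract above can be pictured with a few lines of NumPy. This is a sketch under stated assumptions: the layer shape, the rank, and the variable names `P`, `Q`, and `lam` are hypothetical, the matrices are random rather than trained, and the bi-level training loop is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 64, 32, 4  # hypothetical layer shape and rank

# Pseudo singular vectors (P, Q) and pseudo singular values (lam)
# of the low-rank incremental matrix.
P = rng.normal(size=(d_out, rank))
Q = rng.normal(size=(rank, d_in))
lam = rng.normal(size=rank)

W0 = rng.normal(size=(d_out, d_in))  # stands in for a frozen pre-trained weight
delta_W = P @ np.diag(lam) @ Q       # low-rank update, rank at most `rank`
W = W0 + delta_W                     # adapted weight used in the forward pass

full_params = d_out * d_in
low_rank_params = P.size + Q.size + lam.size
print(delta_W.shape, full_params, low_rank_params)
```

In this toy setting the update has 388 trainable numbers instead of 2048. Following the abstract, the vectors (`P`, `Q`) and the values (`lam`) would then be trained on two different subsets of the training data, one at each level of the bi-level problem.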
# 顔表情認識のための注意融合型エモティックマスク付きオートエンコーダ Emotic Masked Autoencoder with Attention Fusion for Facial Expression Recognition ( http://arxiv.org/abs/2403.13039v1 ) ライセンス: Link先を確認	Bach Nguyen-Xuan, Thien Nguyen-Hoang, Nhu Tai-Do,	(参考訳) 表情認識(FER)はコンピュータビジョンにおける重要な課題であり、様々な領域にまたがる多様な応用がある。表現認識モデルの一般化能力を損なうような限られたFERデータセットの課題に対処することは、性能向上に不可欠である。本稿では,表現分類におけるMAE-Face Self-supervised Learning(SSL)手法とFusion Attention(フュージョン・アテンション・アテンション・メカニズム)を統合した革新的なアプローチを提案する。さらに,Aff-wild2データセットで顕著に示されたトレーニングセットと検証セットのモデル性能を向上させるために,重要な顔特徴を強調する前処理手法を提案する。 Facial Expression Recognition (FER) is a critical task within computer vision with diverse applications across various domains. Addressing the challenge of limited FER datasets, which hampers the generalization capability of expression recognition models, is imperative for enhancing performance. Our paper presents an innovative approach integrating the MAE-Face self-supervised learning (SSL) method and Fusion Attention mechanism for expression classification, particularly showcased in the 6th Affective Behavior Analysis in-the-wild (ABAW) competition. Additionally, we propose preprocessing techniques to emphasize essential facial features, thereby enhancing model performance on both training and validation sets, notably demonstrated on the Aff-wild2 dataset.	翻訳日:2024-03-21 20:59:01 公開日:2024-03-19
# 心室内ベクトルフローマッピングのための物理誘導型ニューラルネットワーク Physics-Guided Neural Networks for Intraventricular Vector Flow Mapping ( http://arxiv.org/abs/2403.13040v1 ) ライセンス: Link先を確認	Hang Jung Ling, Salomé Bru, Julia Puig, Florian Vixège, Simon Mendez, Franck Nicoud, Pierre-Yves Courand, Olivier Bernard, Damien Garcia,	(参考訳) 心内ベクターフローマッピング(iVFM)は、心臓画像におけるカラードプラの増強と定量化を目的としている。本研究では,物理インフォームドニューラルネットワーク (PINN) と物理誘導 nnU-Net を用いた教師付きアプローチを用いて,従来の iVFM 最適化手法に代わる新しい手法を提案する。患者固有の流体力学モデルから得られたカラードップラー画像の厳密な評価と生体内ドップラーの取得により、どちらの手法も元のiVFMアルゴリズムに匹敵する再現性能を示した。 PINNの効率は2段最適化と事前最適化により向上する。一方、nnU-Net法は一般化性とリアルタイム性に優れる。特に、nnU-Netは、明示的な境界条件からの独立性を維持しつつ、スパースおよびトランケートドップラーデータに優れたロバスト性を示す。以上の結果から,心室内ベクター血流の再建におけるこれらの方法の有効性が示唆された。この研究は、超高速カラードプライメージングにおけるPINNの潜在的な応用と、血流に基づく心臓血管疾患のバイオマーカーを導出するための流体力学方程式の導入についても示唆している。 Intraventricular vector flow mapping (iVFM) seeks to enhance and quantify color Doppler in cardiac imaging. In this study, we propose novel alternatives to the traditional iVFM optimization scheme by utilizing physics-informed neural networks (PINNs) and a physics-guided nnU-Net-based supervised approach. Through rigorous evaluation on simulated color Doppler images derived from a patient-specific computational fluid dynamics model and in vivo Doppler acquisitions, both approaches demonstrate comparable reconstruction performance to the original iVFM algorithm. The efficiency of PINNs is boosted through dual-stage optimization and pre-optimized weights. On the other hand, the nnU-Net method excels in generalizability and real time capabilities. Notably, nnU-Net shows superior robustness on sparse and truncated Doppler data while maintaining independence from explicit boundary conditions. Overall, our results highlight the effectiveness of these methods in reconstructing intraventricular vector blood flow. 
The study also suggests potential applications of PINNs in ultrafast color Doppler imaging and the incorporation of fluid dynamics equations to derive biomarkers for cardiovascular diseases based on blood flow.	翻訳日:2024-03-21 20:59:01 公開日:2024-03-19
# 非プロプライエタリなプレプロシージャによる予測可能なプライバシ Provable Privacy with Non-Private Pre-Processing ( http://arxiv.org/abs/2403.13041v1 ) ライセンス: Link先を確認	Yaxi Hu, Amartya Sanyal, Bernhard Schölkopf,	(参考訳) Differentially Private(DP)機械学習パイプラインを分析する場合、データ依存の事前処理の潜在的なプライバシコストは、プライバシ会計においてしばしば見過ごされる。本研究では,非プライベートなデータ依存型前処理アルゴリズムによって生じる追加のプライバシーコストを評価するための一般的なフレームワークを提案する。本フレームワークは,Smooth DPと呼ばれるDPの変種と,前処理アルゴリズムの限界感度という,2つの新しい技術的概念を活用することにより,全体的なプライバシー保証の上限を確立する。汎用フレームワークに加えて、複数のDPアルゴリズムと組み合わせて使用する場合、データ計算、量子化、復号化、PCAなどの複数のデータ依存事前処理アルゴリズムに対して、全体的なプライバシー保証を提供する。このフレームワークは実装も簡単で、既存のDPパイプラインに直接統合できる。 When analysing Differentially Private (DP) machine learning pipelines, the potential privacy cost of data-dependent pre-processing is frequently overlooked in privacy accounting. In this work, we propose a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms. Our framework establishes upper bounds on the overall privacy guarantees by utilising two new technical notions: a variant of DP termed Smooth DP and the bounded sensitivity of the pre-processing algorithms. In addition to the generic framework, we provide explicit overall privacy guarantees for multiple data-dependent pre-processing algorithms, such as data imputation, quantization, deduplication and PCA, when used in combination with several DP algorithms. Notably, this framework is also simple to implement, allowing direct integration into existing DP pipelines.	翻訳日:2024-03-21 20:59:01 公開日:2024-03-19
# TAPTR: トランスフォーマーを検出として任意のポイントを追跡する TAPTR: Tracking Any Point with Transformers as Detection ( http://arxiv.org/abs/2403.13042v1 ) ライセンス: Link先を確認	Hongyang Li, Hao Zhang, Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Lei Zhang,	(参考訳) 本稿では,TRansformers (TAPTR) を用いた任意の点追跡のためのシンプルで強力なフレームワークを提案する。点追跡は物体検出と追跡に非常によく似ているという観測に基づいて,TAPの課題に対処するためにDETRライクなアルゴリズムから設計を借りる。提案フレームワークでは、各ビデオフレームにおいて、各トラッキングポイントを位置部分とコンテンツ部分からなるポイントクエリとして表現する。 DETRのように、各クエリ(位置とコンテンツ機能)は層ごとに自然に更新される。その可視性は、そのアップデートされたコンテンツ機能によって予測される。同じ追跡ポイントに属するクエリは、時間次元に沿って自己アテンションを介して情報を交換することができる。これらの操作はすべてDETRのようなアルゴリズムでよく設計されているため、概念的には非常に単純である。また,光学フローモデルからのコスト容積などの有用な設計も採用し,機能ドリフト問題を緩和しつつ,長時間の時間的情報を提供するための簡易な設計を開発した。提案フレームワークは,高速な推論速度を持つ様々なTAPデータセットに対して,最先端の性能で高い性能を示す。 In this paper, we propose a simple and strong framework for Tracking Any Point with TRansformers (TAPTR). Based on the observation that point tracking bears a great resemblance to object detection and tracking, we borrow designs from DETR-like algorithms to address the task of TAP. In the proposed framework, in each video frame, each tracking point is represented as a point query, which consists of a positional part and a content part. As in DETR, each query (its position and content feature) is naturally updated layer by layer. Its visibility is predicted by its updated content feature. Queries belonging to the same tracking point can exchange information through self-attention along the temporal dimension. As all such operations are well-designed in DETR-like algorithms, the model is conceptually very simple. We also adopt some useful designs such as cost volume from optical flow models and develop simple designs to provide long temporal information while mitigating the feature drifting issue. Our framework demonstrates strong performance with state-of-the-art performance on various TAP datasets with faster inference speed.	翻訳日:2024-03-21 20:59:01 公開日:2024-03-19
# より大型のビジョンモデルはいつ必要か? When Do We Not Need Larger Vision Models? ( http://arxiv.org/abs/2403.13043v1 ) ライセンス: Link先を確認	Baifeng Shi, Ziyang Wu, Maolin Mao, Xin Wang, Trevor Darrell,	(参考訳) 視覚モデルのサイズを拡大することが、より強力な視覚表現を得るためのデファクトスタンダードとなっている。本稿では,より大きな視覚モデルが不要な点について論じる。まず、トレーニング済みで凍結された小さな視覚モデル(例えば、ViT-BまたはViT-L)を複数の画像スケールで実行することで、分類、セグメンテーション、深さ推定、マルチモーダルLLM(MLLM)ベンチマーク、ロボット操作において、より大きなモデル(例えば、ViT-HまたはViT-G)よりも優れた性能を発揮できる(S$^2$)。特に、S$^2$は、GPT-4Vのようなモデルを上回る、Vベンチマーク上でのMLLMの詳細な理解において、最先端のパフォーマンスを達成する。 S$^2$がモデルサイズでのスケーリングよりも好ましいスケーリング手法である条件について検討する。より大型のモデルでは、ハードな例でのより優れた一般化の利点があるが、より大型の視覚モデルの特徴は、マルチスケールの小型モデルによってよく近似できることを示す。これは、全てではないとしても、現在の大きな事前訓練されたモデルによって学習された表現のほとんどが、マルチスケールのより小さなモデルから得ることができることを示唆している。以上の結果から,S$^2$の事前学習モデルでは,より大規模なモデルに匹敵する学習能力を有し,より大規模なモデルに匹敵するか,あるいはその優位性を超えうることが示された。我々は、任意のビジョンモデルに1行のコードでS$^2$を適用することができるPythonパッケージをリリースした。 Scaling up the size of vision models has been the de facto standard to obtain more powerful visual representations. In this work, we discuss the point beyond which larger vision models are not necessary. First, we demonstrate the power of Scaling on Scales (S$^2$), whereby a pre-trained and frozen smaller vision model (e.g., ViT-B or ViT-L), run over multiple image scales, can outperform larger models (e.g., ViT-H or ViT-G) on classification, segmentation, depth estimation, Multimodal LLM (MLLM) benchmarks, and robotic manipulation. Notably, S$^2$ achieves state-of-the-art performance in detailed understanding of MLLM on the V benchmark, surpassing models such as GPT-4V. We examine the conditions under which S$^2$ is a preferred scaling approach compared to scaling on model size. While larger models have the advantage of better generalization on hard examples, we show that features of larger vision models can be well approximated by those of multi-scale smaller models. This suggests most, if not all, of the representations learned by current large pre-trained models can also be obtained from multi-scale smaller models. 
Our results show that a multi-scale smaller model has comparable learning capacity to a larger model, and pre-training smaller models with S$^2$ can match or even exceed the advantage of larger models. We release a Python package that can apply S$^2$ on any vision model with one line of code: https://github.com/bfshi/scaling_on_scales.	翻訳日:2024-03-21 20:59:01 公開日:2024-03-19
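The idea of running one frozen model over several image scales can be sketched as follows. This is not the released package's API: `toy_backbone` is a stand-in for a real vision model, and the nearest-neighbour upsampling, the tiling into base-size sub-images, and the average pooling are assumptions of this sketch rather than details stated in the abstract. It also assumes a square input whose side is divisible by the patch size.

```python
import numpy as np


def toy_backbone(image: np.ndarray, patch: int = 4) -> np.ndarray:
    """Stand-in for a frozen vision model: per-patch mean and standard deviation.
    Input shape (H, W); output shape (H // patch, W // patch, 2)."""
    h, w = image.shape
    tiles = image.reshape(h // patch, patch, w // patch, patch)
    return np.stack([tiles.mean(axis=(1, 3)), tiles.std(axis=(1, 3))], axis=-1)


def multiscale_features(image: np.ndarray, scales=(1, 2)) -> np.ndarray:
    """Run the same backbone at several scales and concatenate along channels."""
    base = image.shape[0]  # square input assumed
    outputs = []
    for s in scales:
        # Nearest-neighbour upsampling by the integer factor s.
        scaled = np.repeat(np.repeat(image, s, axis=0), s, axis=1)
        # Split the enlarged image into s * s tiles of the base size
        # and run the backbone on each tile separately.
        rows = []
        for i in range(s):
            row = [
                toy_backbone(scaled[i * base:(i + 1) * base, j * base:(j + 1) * base])
                for j in range(s)
            ]
            rows.append(np.concatenate(row, axis=1))
        feat = np.concatenate(rows, axis=0)
        # Average-pool back to the base feature-map size.
        hp, wp, c = feat.shape
        feat = feat.reshape(hp // s, s, wp // s, s, c).mean(axis=(1, 3))
        outputs.append(feat)
    return np.concatenate(outputs, axis=-1)


image = np.random.default_rng(0).random((16, 16))
print(multiscale_features(image).shape)  # (4, 4, 4)
```

The spatial size of the feature map stays fixed while the channel count grows with the number of scales, so downstream layers see a wider feature vector at each location rather than a larger model.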
# Magic Fixup:ダイナミックビデオによる写真編集の合理化 Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos ( http://arxiv.org/abs/2403.13044v1 ) ライセンス: Link先を確認	Hadi Alzayer, Zhihao Xia, Xuaner Zhang, Eli Shechtman, Jia-Bin Huang, Michael Gharbi,	(参考訳) 本稿では、粗い編集画像が与えられた場合、所定のレイアウトに従うフォトリアリスティックな出力を合成する生成モデルを提案する。本手法は,元の画像から細かな詳細を転送し,その部分の同一性を保持する。しかし、新しいレイアウトで定義された照明とコンテキストに適応する。物体やカメラの動きは、視点、照明、物理的相互作用によって世界がどのように変化するかを多くの観察する。我々は,サンプルが同一ビデオからランダムに選択された時間間隔で抽出された,一対のソースとターゲットフレームである画像データセットを構築した。期待されるテストタイムのユーザ編集を模倣する2つのモーションモデルを用いて、ソースフレームをターゲットに向けてワープする。我々は、事前訓練された拡散モデルから、歪んだ画像を地上の真実に翻訳するモデルを監督する。モデル設計では,ユーザが指定したレイアウトを忠実に追従しながら,ソースフレームから生成画像への細部移動を可能にする。簡単なセグメンテーションと粗い2D操作により、編集対象間の光の調和や物理的相互作用といった2階効果に対処しながら、ユーザの入力に忠実なフォトリアリスティックな編集を合成できることが示される。 We propose a generative model that, given a coarsely edited image, synthesizes a photorealistic output that follows the prescribed layout. Our method transfers fine details from the original image and preserves the identity of its parts. Yet, it adapts it to the lighting and context defined by the new layout. Our key insight is that videos are a powerful source of supervision for this task: objects and camera motions provide many observations of how the world changes with viewpoint, lighting, and physical interactions. We construct an image dataset in which each sample is a pair of source and target frames extracted from the same video at randomly chosen time intervals. We warp the source frame toward the target using two motion models that mimic the expected test-time user edits. We supervise our model to translate the warped image into the ground truth, starting from a pretrained diffusion model. Our model design explicitly enables fine detail transfer from the source frame to the generated image, while closely following the user-specified layout. 
We show that by using simple segmentations and coarse 2D manipulations, we can synthesize a photorealistic edit faithful to the user's input while addressing second-order effects like harmonizing the lighting and physical interactions between edited objects.	翻訳日:2024-03-21 20:59:01 公開日:2024-03-19
# 局所可観測物の熱化に及ぼす非可換電荷の影響 Noncommuting charges' effect on the thermalization of local observables ( http://arxiv.org/abs/2403.13046v1 ) ライセンス: Link先を確認	Shayan Majidy,	(参考訳) 非可換保存量(または「電荷」)の研究は、概念的なパズルを生み出した。近年の研究では、非可換電荷はいくつかの点で熱化を妨げるが、他の点では促進することが示唆されている。このパズルの解決に向けて, 固有状態熱化仮説に従って熱化する局所観測値の数を減らし, 非可換電荷が熱化を促進することを示す。まず、電荷と、可観測量が熱化しないための十分条件との対応を確立する。これらの条件は「動的対称性」として知られている。ハミルトニアンが持つ動的対称性のペアごとに、対応する電荷が存在することを示す。また、その逆の関係が広範な電荷に対して成り立つことを証明する。この対応から、システムに新しい電荷を導入することで、既存の非定常ダイナミクスに寄与することも、それを破壊することもできることを示す。新しい電荷が既存の電荷と可換である場合、システムの非定常ダイナミクスはそのまま残り、新たな非定常ダイナミクスが現れる。そうでなければ既存の非定常ダイナミクスは除去される。 さまざまなモデルを用いて結果を説明する。その結果,非可換電荷が促進する熱化の一側面が示された。 Studying noncommuting conserved quantities (or 'charges') has produced a conceptual puzzle. Recent results suggest that noncommuting charges hinder thermalization in some ways, yet promote it in others. To help resolve this puzzle, we demonstrate how noncommuting charges can promote thermalization by reducing the number of local observables that thermalize according to the Eigenstate Thermalization Hypothesis. We first establish a correspondence between charges and sufficient conditions for observables to not thermalize. These conditions are known as 'dynamical symmetries'. We demonstrate that for each pair of dynamical symmetries a Hamiltonian has, there exists a corresponding charge. We prove that the reciprocal relationship holds for a broad range of charges. From this correspondence, we demonstrate that introducing a new charge to a system can either contribute to or disrupt its existing non-stationary dynamics. If the new charge commutes with existing ones, the system's non-stationary dynamics remain intact, and new ones emerge; if not, the existing non-stationary dynamics are removed. We illustrate our results using various models. Our results demonstrate a facet of thermalization which noncommuting charges promote.	翻訳日:2024-03-21 20:59:01 公開日:2024-03-19
# SceneScript: 自己回帰型構造化言語モデルでシーンを再構築する SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model ( http://arxiv.org/abs/2403.13064v1 ) ライセンス: Link先を確認	Armen Avetisyan, Christopher Xie, Henry Howard-Jenkins, Tsun-Yi Yang, Samir Aroudj, Suvam Patra, Fuyang Zhang, Duncan Frost, Luke Holland, Campbell Orme, Jakob Engel, Edward Miller, Richard Newcombe, Vasileios Balntas,	(参考訳) SceneScriptは,自己回帰型トークンベースのアプローチを用いて,構造化言語コマンドのシーケンスとして,シーンモデル全体を直接生成する手法である。提案するシーン表現は,近年のトランスフォーマーとLLMの成功に触発されたものであり,シーンをメッシュ,ボクセルグリッド,点群,放射場として記述することが多い従来の手法とは異なる。本手法は,シーン言語エンコーダ・デコーダアーキテクチャを用いて,符号化された視覚データから構造化言語コマンドのセットを直接推論する。 SceneScriptを訓練するために、10万(100k)の高品質な屋内シーンからなるAria Synthetic Environmentsと呼ばれる大規模な合成データセットを生成し、リリースする。このデータセットは、一人称視点でシーンを歩き回るフォトリアリスティックで正解アノテーション付きのレンダリングを含む。提案手法は,建築レイアウト推定で最先端の結果を,3次元物体検出で競争力のある結果を与える。最後に、構造化言語への簡単な追加を通じて新しいコマンドに容易に適応できるというSceneScriptの利点について検討し、粗い3次元物体パーツ再構成などのタスクで例示する。 We introduce SceneScript, a method that directly produces full scene models as a sequence of structured language commands using an autoregressive, token-based approach. Our proposed scene representation is inspired by recent successes in transformers and LLMs, and departs from more traditional methods which commonly describe scenes as meshes, voxel grids, point clouds or radiance fields. Our method infers the set of structured language commands directly from encoded visual data using a scene language encoder-decoder architecture. To train SceneScript, we generate and release a large-scale synthetic dataset called Aria Synthetic Environments consisting of 100k high-quality indoor scenes, with photorealistic and ground-truth annotated renders of egocentric scene walkthroughs. Our method gives state-of-the-art results in architectural layout estimation, and competitive results in 3D object detection. Lastly, we explore an advantage for SceneScript, which is the ability to readily adapt to new commands via simple additions to the structured language, which we illustrate for tasks such as coarse 3D object part reconstruction.	翻訳日:2024-03-21 20:59:01 公開日:2024-03-19
# 自由電子量子光学系における強結合と単一光子非線形性 Strong coupling and single-photon nonlinearity in free-electron quantum optics ( http://arxiv.org/abs/2403.13071v1 ) ライセンス: Link先を確認	Aviv Karnieli, Charles Roques-Carmes, Nicholas Rivera, Shanhui Fan,	(参考訳) 自由電子が量子化された電磁場や物質系とコヒーレントに相互作用できるという観測は、自由電子のユニークな量子的性質を活用する多くの提案につながった。これらの提案の中心には、空飛ぶ自由電子とフォトニックモードの間の強い量子相互作用の仮定がある。しかし、既存のスキームは電子回折によって本質的に制限され、相互作用長と量子カップリング強度に上限が与えられる。ここでは、自由電子が2つの誘導モードで共伝播する「自由電子ファイバー」を効果的に1次元フォトニックシステムとして使用することを提案する。第1モードは、自由電子に雷動トラップを適用し、電子回折による限界を効果的に引き上げる。第2モードはガイドされた自由電子に強く結合し、以前の設計よりも桁違いに大きい結合が強化される。さらに,提案手法によって実現された相互作用長の延長により,自由電子を介する強い単一光子非線形性を実現することができる。我々は、決定論的単一光子放出や複素非線形多モードダイナミクスなど、我々のシステムにおけるいくつかの興味深い観測可能な量子効果を予測する。我々の提案は、非ガウス光発生、決定論的単一光子放出、自由電子-光子相互作用によって制御される量子ゲートなど、自由電子量子光学における多くの期待される効果の実現に向けた道を開くものである。 The observation that free electrons can interact coherently with quantized electromagnetic fields and matter systems has led to a plethora of proposals leveraging the unique quantum properties of free electrons. At the heart of these proposals lies the assumption of a strong quantum interaction between a flying free electron and a photonic mode. However, existing schemes are intrinsically limited by electron diffraction, which puts an upper bound on the interaction length and therefore the quantum coupling strength. Here, we propose the use of "free-electron fibers'': effectively one-dimensional photonic systems where free electrons co-propagate with two guided modes. The first mode applies a ponderomotive trap to the free electron, effectively lifting the limitations due to electron diffraction. The second mode strongly couples to the guided free electron, with an enhanced coupling that is orders of magnitude larger than previous designs. Moreover, the extended interaction lengths enabled by our scheme allows for strong single-photon nonlinearities mediated by free electrons. 
We predict a few interesting observable quantum effects in our system, such as deterministic single-photon emission and complex, nonlinear multimode dynamics. Our proposal paves the way towards the realization of many anticipated effects in free-electron quantum optics, such as non-Gaussian light generation, deterministic single photon emission, and quantum gates controlled by free-electron--photon interactions.	翻訳日:2024-03-21 18:56:56 公開日:2024-03-19
# AIの石炭鉱山のカナリア:アメリカのユダヤ人は、大規模言語モデルトレーニングにおける知的財産の処分によって不釣り合いに被害を受けるかもしれない A Canary in the AI Coal Mine: American Jews May Be Disproportionately Harmed by Intellectual Property Dispossession in Large Language Model Training ( http://arxiv.org/abs/2403.13073v1 ) ライセンス: Link先を確認	Heila Precel, Allison McDonald, Brent Hecht, Nicholas Vincent,	(参考訳) 少数民族からの体系的な財産処分は、技術進歩という名目でしばしば行われている。本稿では,大規模言語モデル(LLM)の現在のパラダイムが,この長い歴史を継続していることを示す。一般的なLLMトレーニングデータセットから、ユダヤ人アメリカ人によって書かれた不均等な量のコンテンツが、同意なくトレーニングに使用されることが分かる。過剰表現の程度はおよそ2倍から6.5倍である。 LLMは、トレーニングデータを制作した人々の有給労働の代わりになる可能性があるので、今後数年でユダヤ人アメリカ人にさらに実質的で不均等な経済被害をもたらす可能性がある。本稿は、ユダヤ系アメリカ人を事例研究として取り上げるが、他の少数民族(アジア系アメリカ人、ヒンドゥー系アメリカ人など)も同様に影響を受けている可能性があり、最も重要なことは、害がすぐにほぼすべての人に影響を及ぼす現在のLLMパラダイムに関する深い構造的な懸念を浮き彫りにする「炭鉱のカナリア」として解釈されるべきである。政策立案者がLLMの規制方法や、LLMの推進に取り組んでいるAI分野の人たちに対して、これらの結果がもたらす意味について論じる。本研究は,異なる影響と広範囲な社会的危害を回避し,代替的なLLMパラダイムに向けた共同作業の重要性を強調した。 Systemic property dispossession from minority groups has often been carried out in the name of technological progress. In this paper, we identify evidence that the current paradigm of large language models (LLMs) likely continues this long history. Examining common LLM training datasets, we find that a disproportionate amount of content authored by Jewish Americans is used for training without their consent. The degree of over-representation ranges from around 2x to around 6.5x. Given that LLMs may substitute for the paid labor of those who produced their training data, they have the potential to cause even more substantial and disproportionate economic harm to Jewish Americans in the coming years. This paper focuses on Jewish Americans as a case study, but it is probable that other minority communities (e.g., Asian Americans, Hindu Americans) may be similarly affected and, most importantly, the results should likely be interpreted as a "canary in the coal mine" that highlights deep structural concerns about the current LLM paradigm whose harms could soon affect nearly everyone. 
We discuss the implications of these results for the policymakers thinking about how to regulate LLMs as well as for those in the AI field who are working to advance LLMs. Our findings stress the importance of working together towards alternative LLM paradigms that avoid both disparate impacts and widespread societal harms.	翻訳日:2024-03-21 18:56:56 公開日:2024-03-19
# HuLP: 予後のためのHuman-in-the-Loop HuLP: Human-in-the-Loop for Prognosis ( http://arxiv.org/abs/2403.13078v1 ) ライセンス: Link先を確認	Muhammad Ridzuan, Mai Kassem, Numan Saeed, Ikboljon Sobirov, Mohammad Yaqub,	(参考訳) 本稿では,HuLP(Human-in-the-Loop for Prognosis)モデルについて紹介する。 HuLPは、人間の専門家による介入を可能にする革新的なアプローチを提供し、臨床医がモデルの予測と対話し、修正できるようにし、より正確な予後を生み出すために人間とAIモデルの協力を促進する。加えて、HuLPは、ニューラルネットワークを活用し、欠落したデータを効果的に処理する調整済みの方法論を提供することによって、欠落するデータの課題に対処する。従来の方法では、患者集団内のニュアンスな変化を捉えるのに苦労することが多く、予後予測の妥協につながった。 HuLPは、画像特徴に基づいて欠落した共変量を補完し、臨床医のワークフローとより緊密に連携し、信頼性を高める。我々は,HuLPの優位性を示すために,実世界の2つの公開医療データセットを用いて実験を行った。 This paper introduces HuLP, a Human-in-the-Loop for Prognosis model designed to enhance the reliability and interpretability of prognostic models in clinical contexts, especially when faced with the complexities of missing covariates and outcomes. HuLP offers an innovative approach that enables human expert intervention, empowering clinicians to interact with and correct models' predictions, thus fostering collaboration between humans and AI models to produce more accurate prognosis. Additionally, HuLP addresses the challenges of missing data by utilizing neural networks and providing a tailored methodology that effectively handles missing data. Traditional methods often struggle to capture the nuanced variations within patient populations, leading to compromised prognostic predictions. HuLP imputes missing covariates based on imaging features, aligning more closely with clinician workflows and enhancing reliability. We conduct our experiments on two real-world, publicly available medical datasets to demonstrate the superiority of HuLP.	翻訳日:2024-03-21 18:56:56 公開日:2024-03-19
# クロスバー型アナログインメモリ加速器のADC効率向上のためのプルーニング Pruning for Improved ADC Efficiency in Crossbar-based Analog In-memory Accelerators ( http://arxiv.org/abs/2403.13082v1 ) ライセンス: Link先を確認	Timur Ibrayev, Isha Garg, Indranil Chakraborty, Kaushik Roy,	(参考訳) ディープラーニングは多くのアプリケーションで成功したが、高い計算要求に悩まされており、デプロイにはカスタムアクセラレータを必要とする。クロスバーベースのアナログインメモリアーキテクチャは、高いデータ再利用とメモリのストレージと計算を組み合わせることで高効率を実現するため、ディープニューラルネットワーク(DNN)の加速に魅力がある。しかし、それらはクロスバー出力を伝えるためにアナログ・デジタル変換器(ADC)を必要とする。 ADCはすべてのクロスバー処理ユニットのエネルギーと面積の大部分を消費するので、潜在的な効率性は低下する。プルーニングはDNNの効率を改善するためによく研究されている手法であるが、クロスバーに有効な修正が必要である。本稿では,ADC固有の非効率性を目標とするクロスバー調整プルーニングの動機付けを行う。これは、3つの重要な性質(D.U.B.と呼ばれる)を同定し、精度を犠牲にすることなくADCエネルギーを低減できる空間性を誘導することで達成される。最初の特性は、スパーシティレベルを2の離散パワーに制限することで、スペーシティがハードウェア効率に効果的に変換されることを保証する。他の2つの特性は、精度の低下を和らげるために、同じクロスバーの列が非構造化とバランスの取れた間隔の両方を達成することを奨励する。所望のD.U.B.間隔は、同じクロスバー内の隣接する列の$L_{0}$ノルムの分散を規則化することによって達成される。提案した実装は、エンドツーエンドのグラデーションベースのトレーニングで直接使用することができる。提案アルゴリズムは,CIFAR-10とImageNetデータセットに基づいてトレーニングされたVGG11とResNet18モデルの畳み込み層に適用し,それぞれ7.13倍,1.27倍の改善を実現した。 Deep learning has proved successful in many applications but suffers from high computational demands and requires custom accelerators for deployment. Crossbar-based analog in-memory architectures are attractive for acceleration of deep neural networks (DNN), due to their high data reuse and high efficiency enabled by combining storage and computation in memory. However, they require analog-to-digital converters (ADCs) to communicate crossbar outputs. ADCs consume a significant portion of energy and area of every crossbar processing unit, thus diminishing the potential efficiency benefits. Pruning is a well-studied technique to improve the efficiency of DNNs but requires modifications to be effective for crossbars. In this paper, we motivate crossbar-attuned pruning to target ADC-specific inefficiencies. This is achieved by identifying three key properties (dubbed D.U.B.) that induce sparsity that can be utilized to reduce ADC energy without sacrificing accuracy. 
The first property ensures that sparsity translates effectively to hardware efficiency by restricting sparsity levels to Discrete powers of 2. The other 2 properties encourage columns in the same crossbar to achieve both Unstructured and Balanced sparsity in order to amortize the accuracy drop. The desired D.U.B. sparsity is then achieved by regularizing the variance of $L_{0}$ norms of neighboring columns within the same crossbar. Our proposed implementation allows it to be directly used in end-to-end gradient-based training. We apply the proposed algorithm to convolutional layers of VGG11 and ResNet18 models, trained on CIFAR-10 and ImageNet datasets, and achieve up to 7.13x and 1.27x improvement, respectively, in ADC energy with less than 1% drop in accuracy.	翻訳日:2024-03-21 18:56:56 公開日:2024-03-19
# 量子PCP予想(ゲーム版)の現状 The status of the quantum PCP conjecture (games version) ( http://arxiv.org/abs/2403.13084v1 ) ライセンス: Link先を確認	Anand Natarajan, Chinmay Nirkhe,	(参考訳) 古典的複雑性理論において、確率論的に検証可能な証明(制約満足度と非局所ゲームバージョン)の2つの定義は計算量的に同じである。量子環境では、状況ははるかに明確ではない。 MIP* = RE of Ji et al (arXiv:2001.04383) と、Natarajan と Zhang (arXiv:2302.04322) による改良により、多対数長のメッセージを持つ多証明者対話型証明システムは、停止問題のような決定不能な問題を含むREのあらゆる問題を解決することができることを示した。これらの結果は、「制約満足度」あるいは「ハミルトニアン」量子PCP予想と非局所ゲームとの間の関係は、ゲーム内のプレイヤーを計算的に効率よく制限する必要があることを示している。本稿では,(1)標準AM完全問題に対する効率的なプロバーを備えた簡潔なMIP*プロトコルを新たに構築し,(2)ナタラジャンとヴィディックのエネルギー増幅手順における誤りを説明する(arXiv:1710.03062)。量子ゲームPCP for QMA の障害を調査する際には、局所性がハミルトンの「パウリスペクトル」上の境界のようなより弱い制約に置き換わる場合でも、ハミルトニアンにとってギャップ増幅を理解することの重要性と課題を強調した。これらの疑問がハミルトン量子PCP予想の新しい「ベイビーバージョン」への進展を動機づけることを願っている。 In classical complexity theory, the two definitions of probabilistically checkable proofs -- the constraint satisfaction and the nonlocal games version -- are computationally equal in power. In the quantum setting, the situation is far less clear. The result MIP* = RE of Ji et al. (arXiv:2001.04383) and refinements by Natarajan and Zhang (arXiv:2302.04322) show that multiprover interactive proof systems with polylogarithmically long messages can solve any decision problem in RE, including undecidable problems like the halting problem. These results show that any connection between the "constraint satisfaction" or "Hamiltonian" quantum PCP conjecture and nonlocal games must involve restricting the players in the game to be computationally efficient. This note contains two main results: (1) we give a "quantum games PCP for AM" in the form of a new construction of a succinct MIP* protocol with efficient provers for the canonical AM-complete problem, and (2) we explain an error in the energy amplification procedure of Natarajan and Vidick (arXiv:1710.03062) which invalidates their claim to have constructed a quantum games PCP for a QMA-complete problem. 
In surveying the obstacles remaining towards a quantum games PCP for QMA, we highlight the importance and challenge of understanding gap amplification for Hamiltonians even when locality is replaced by much weaker constraints, such as bounds on the "Pauli spectrum" of the Hamiltonian. We hope these questions will motivate progress towards new "baby versions" of Hamiltonian quantum PCP conjecture.	翻訳日:2024-03-21 18:56:56 公開日:2024-03-19
# 音声分類のための可聴マップ Listenable Maps for Audio Classifiers ( http://arxiv.org/abs/2403.13086v1 ) ライセンス: Link先を確認	Francesco Paissan, Mirco Ravanelli, Cem Subakan,	(参考訳) さまざまなタスクにわたるディープラーニングモデルの素晴らしいパフォーマンスにもかかわらず、その複雑さは解釈に挑戦する。この課題は、音声信号の伝達が本質的に困難になる場合に特に顕著である。この問題に対処するために,音声分類のためのリスナブルマップ (L-MAC) を導入し,忠実で聞きやすい解釈を生成するポストホック解釈法を提案する。 L-MACは、事前訓練された分類器の上のデコーダを使用して、入力オーディオの関連部分をハイライトするバイナリマスクを生成する。我々は、マスクアウト部分のモデル出力の確率を最小化しつつ、音声のマスクイン部分における分類器決定の信頼性を最大化する特別な損失でデコーダを訓練する。領域内および領域外データの定量的評価は、L-MACが複数の勾配およびマスキングに基づく手法よりも一貫して忠実な解釈を生成することを示す。さらに,ユーザスタディでは,提案手法が生成した解釈を平均的に好んでいることを確認した。 Despite the impressive performance of deep learning models across diverse tasks, their complexity poses challenges for interpretation. This challenge is particularly evident for audio signals, where conveying interpretations becomes inherently difficult. To address this issue, we introduce Listenable Maps for Audio Classifiers (L-MAC), a posthoc interpretation method that generates faithful and listenable interpretations. L-MAC utilizes a decoder on top of a pretrained classifier to generate binary masks that highlight relevant portions of the input audio. We train the decoder with a special loss that maximizes the confidence of the classifier decision on the masked-in portion of the audio while minimizing the probability of model output for the masked-out portion. Quantitative evaluations on both in-domain and out-of-domain data demonstrate that L-MAC consistently produces more faithful interpretations than several gradient and masking-based methodologies. Furthermore, a user study confirms that, on average, users prefer the interpretations generated by the proposed technique.	翻訳日:2024-03-21 18:56:56 公開日:2024-03-19
# プロンプトチューニングによる大規模言語モデルを用いた医師と患者の対話の自動要約 Automatic Summarization of Doctor-Patient Encounter Dialogues Using Large Language Model through Prompt Tuning ( http://arxiv.org/abs/2403.13089v1 ) ライセンス: Link先を確認	Mengxian Lyu, Cheng Peng, Xiaohan Li, Patrick Balian, Jiang Bian, Yonghui Wu,	(参考訳) 自動テキスト要約(ATS: Automatic Text summarization)は、医師が継続的かつ協調的なケアを提供することを支援する技術である。本研究では,ジェネレーティブ・大規模言語モデル(LLM)を用いて医師と患者との対話を要約する手法を提案する。我々は, 臨床テキストを要約するために, 生成LLMを指示するプロンプトチューニングアルゴリズムを開発した。 GatorTronGPTは,2770億のクリニカルおよび一般的な英単語を最大200億のパラメータで用いて開発された生成的臨床LLMであり,そのプロンプトチューニング戦略,ソフトプロンプトのサイズ,少数ショット学習能力について検討した。我々は,臨床ベンチマークデータセットMTS-DIALOGを用いて,広範に使用されているT5モデルの微調整に基づいて,GatorTronGPTと過去のソリューションを比較した。実験結果から, GatorTronGPT-20Bモデルがすべての評価指標で最高の性能を示した。提案手法は、LLMパラメータがプロンプトチューニング中に更新されないため、計算コストが低い。本研究は, プロンプトチューニングによる臨床ATSにおける生成的臨床LLMの有効性を示すものである。 Automatic text summarization (ATS) is an emerging technology to assist clinicians in providing continuous and coordinated care. This study presents an approach to summarize doctor-patient dialogues using generative large language models (LLMs). We developed prompt-tuning algorithms to instruct generative LLMs to summarize clinical text. We examined the prompt-tuning strategies, the size of soft prompts, and the few-shot learning ability of GatorTronGPT, a generative clinical LLM developed using 277 billion clinical and general English words with up to 20 billion parameters. We compared GatorTronGPT with a previous solution based on fine-tuning of a widely used T5 model, using a clinical benchmark dataset MTS-DIALOG. The experimental results show that the GatorTronGPT-20B model achieved the best performance on all evaluation metrics. The proposed solution has a low computing cost as the LLM parameters are not updated during prompt-tuning. This study demonstrates the efficiency of generative clinical LLMs for clinical ATS through prompt tuning.	翻訳日:2024-03-21 18:56:56 公開日:2024-03-19
# JaxUED: Jaxのシンプルで使えるUEDライブラリ JaxUED: A simple and useable UED library in Jax ( http://arxiv.org/abs/2403.13091v1 ) ライセンス: Link先を確認	Samuel Coward, Michael Beukman, Jakob Foerster,	(参考訳) 本稿では,最新のUnsupervised Environment Design (UED) アルゴリズムの最小限の依存関係による実装をJaxで提供するオープンソースライブラリであるJaxUEDを紹介する。 JaxUEDはハードウェアアクセラレーションを利用して、従来のCPUベースの実装と比べて100倍程度のスピードアップが得られる。 CleanRLにインスパイアされた我々は、高速で、明確で、理解しやすく、容易に変更可能な実装を提供し、UEDの研究を加速することを目的としている。本稿では,本ライブラリについて解説し,ベースライン結果について述べる。コードは https://github.com/DramaCow/jaxued にある。 We present JaxUED, an open-source library providing minimal dependency implementations of modern Unsupervised Environment Design (UED) algorithms in Jax. JaxUED leverages hardware acceleration to obtain on the order of 100x speedups compared to prior, CPU-based implementations. Inspired by CleanRL, we provide fast, clear, understandable, and easily modifiable implementations, with the aim of accelerating research into UED. This paper describes our library and contains baseline results. Code can be found at https://github.com/DramaCow/jaxued.	翻訳日:2024-03-21 18:56:56 公開日:2024-03-19
# エンド・ツー・エンド深層学習を用いた鉄道線路の列車エゴパス検出 Train Ego-Path Detection on Railway Tracks Using End-to-End Deep Learning ( http://arxiv.org/abs/2403.13094v1 ) ライセンス: Link先を確認	Thomas Laurent,	(参考訳) 本稿では、インテリジェントな車載視覚システムのための鉄道軌道検出手法である「トレインエゴパス検出」の課題について紹介する。既存の研究は精度に欠けており、視野内の全ての線路を均一に検討することが多いが、提案課題は、潜在的に複雑でダイナミックな鉄道環境の中で列車の直進経路、すなわち「エゴパス」を特定することを目的としている。これに基づいて、ego-pathアノテーションでRailSem19データセットを拡張し、この方向のさらなる研究を促進する。私たちの研究の中心は、エゴパス検出に適したエンドツーエンドのディープラーニングフレームワークであるTEP-Netで、構成可能なモデルアーキテクチャ、動的データ拡張戦略、ドメイン固有の損失関数が特徴です。トラック検出問題を従来よりも微妙に扱いながら、テストセット上で97.5%のIoUを達成し、既存のすべてのメソッドよりも高速である。さらに比較分析は、TEP-Netの背景にある概念的選択の関連性を強調し、多様な環境条件と運用力学にまたがるロバスト性に固有の妥当性を示す。この作業は、インテリジェントドライバー支援システムと自律列車の運用の発展のための有望な道を開き、より安全で効率的な鉄道輸送への道を開いた。 This paper introduces the task of "train ego-path detection", a refined approach to railway track detection designed for intelligent onboard vision systems. Whereas existing research lacks precision and often considers all tracks within the visual field uniformly, our proposed task specifically aims to identify the train's immediate path, or "ego-path", within potentially complex and dynamic railway environments. Building on this, we extend the RailSem19 dataset with ego-path annotations, facilitating further research in this direction. At the heart of our study lies TEP-Net, an end-to-end deep learning framework tailored for ego-path detection, featuring a configurable model architecture, a dynamic data augmentation strategy, and a domain-specific loss function. Leveraging a regression-based approach, TEP-Net outperforms SOTA: while addressing the track detection problem in a more nuanced way than previously, our model achieves 97.5% IoU on the test set and is faster than all existing methods. Further comparative analysis highlights the relevance of the conceptual choices behind TEP-Net, demonstrating its inherent propensity for robustness across diverse environmental conditions and operational dynamics. 
This work opens promising avenues for the development of intelligent driver assistance systems and autonomous train operations, paving the way toward safer and more efficient railway transportation.	翻訳日:2024-03-21 18:56:56 公開日:2024-03-19
# オフライン強化学習のための簡易学習法 Simple Ingredients for Offline Reinforcement Learning ( http://arxiv.org/abs/2403.13097v1 ) ライセンス: Link先を確認	Edoardo Cetin, Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric, Yann Ollivier, Ahmed Touati,	(参考訳) オフライン強化学習アルゴリズムは、ターゲット下流タスクに高度に接続されたデータセットに有効であることが証明されている。しかし,異種音源からトラジェクトリを抽出する新しいテストベッド (MOOD) を利用することで,既存の手法は多様なデータに苦しむことを示す。この発見を踏まえて、我々は大規模な実証的研究を行い、この失敗を説明するためにいくつかの仮説を定式化し、検証する。驚くべきことに、アルゴリズム的な考慮以上のスケールが、パフォーマンスに影響を与える重要な要因であることがわかった。ネットワークサイズを増大させたAWACやIQLのような単純な手法は、MOODに付加データを含めることでパラドックス的障害モードを克服し、特に標準D4RLベンチマークにおける最先端のアルゴリズムよりも優れていることを示す。 Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task. Yet, leveraging a novel testbed (MOOD) in which trajectories come from heterogeneous sources, we show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline buffer. In light of this finding, we conduct a large empirical study where we formulate and test several hypotheses to explain this failure. Surprisingly, we find that scale, more than algorithmic considerations, is the key factor influencing performance. We show that simple methods like AWAC and IQL with increased network size overcome the paradoxical failure modes from the inclusion of additional data in MOOD, and notably outperform prior state-of-the-art algorithms on the canonical D4RL benchmark.	翻訳日:2024-03-21 18:56:56 公開日:2024-03-19
# AdaptSFL:資源制約エッジネットワークにおける適応的分割学習 AdaptSFL: Adaptive Split Federated Learning in Resource-constrained Edge Networks ( http://arxiv.org/abs/2403.13101v1 ) ライセンス: Link先を確認	Zheng Lin, Guanqiao Qu, Wei Wei, Xianhao Chen, Kin K. Leung,	(参考訳) ディープニューラルネットワークの複雑さの増大は、リソース制限されたエッジデバイスにそれらを民主化する上で、大きな障壁となる。この課題に対処するため、分割フェデレーション学習(SFL)は、エッジデバイス間の並列トレーニングを可能にしながら、モデルのパーティショニングを通じて、プライマリトレーニングワークロードをサーバにオフロードすることで、有望なソリューションとして登場した。しかし、システム最適化は資源制約付きシステムにおけるSFLの性能に大きく影響するが、問題は未解決のままである。本稿では、モデル分割(MS)とクライアント側モデル集約(MA)が学習性能に与える影響を定量化するSFLの収束解析を行い、理論的基礎となる。そこで我々は,資源制約付きエッジコンピューティングシステムの下でSFLを高速化する新しいリソース適応型SFLフレームワークであるAdaptSFLを提案する。具体的には、AdaptSFLはクライアント側MAとMSを適応的に制御し、通信計算のレイテンシとトレーニング収束のバランスをとる。提案するAdaptSFLフレームワークは,ベンチマークよりも目標精度を達成するのに要する時間を大幅に削減し,提案手法の有効性を実証する。 The increasing complexity of deep neural networks poses significant barriers to democratizing them to resource-limited edge devices. To address this challenge, split federated learning (SFL) has emerged as a promising solution by offloading the primary training workload to a server via model partitioning while enabling parallel training among edge devices. However, although system optimization substantially influences the performance of SFL under resource-constrained systems, the problem remains largely uncharted. In this paper, we provide a convergence analysis of SFL which quantifies the impact of model splitting (MS) and client-side model aggregation (MA) on the learning performance, serving as a theoretical foundation. Then, we propose AdaptSFL, a novel resource-adaptive SFL framework, to expedite SFL under resource-constrained edge computing systems. Specifically, AdaptSFL adaptively controls client-side MA and MS to balance communication-computing latency and training convergence. Extensive simulations across various datasets validate that our proposed AdaptSFL framework takes considerably less time to achieve a target accuracy than benchmarks, demonstrating the effectiveness of the proposed strategies.	
翻訳日:2024-03-21 18:56:56 公開日:2024-03-19
# 量子情報における幾何学的手法と絡み合いの変動原理 Geometric methods in quantum information and entanglement variational principle ( http://arxiv.org/abs/2403.13102v1 ) ライセンス: Link先を確認	Daniele Iannotti, Alioscia Hamma,	(参考訳) 量子情報の幾何学的手法は、技術的なツールと難しい制御や最適化の問題への直感の両方を提供するために非常に有望である。さらに、これらは、GRのような純粋幾何学理論とAdS/CFT対応のような量子力学を結びつける上で、基本的な重要性がある。本稿ではまず,幾何学的手法が量子情報理論に有用であることが証明された最も重要な設定について調査する。次に、絡み合い、コヒーレンス、反平坦性といった量子資源に対する行動原理の一般的な枠組みを定めます。 2ビットシステムの場合について論じる。 Geometrical methods in quantum information are very promising for both providing technical tools and intuition into difficult control or optimization problems. Moreover, they are of fundamental importance in connecting pure geometrical theories, like GR, to quantum mechanics, like in the AdS/CFT correspondence. In this paper, we first make a survey of the most important settings in which geometrical methods have proven useful to quantum information theory. Then, we lay down a general framework for an action principle for quantum resources like entanglement, coherence, and anti-flatness. We discuss the case of a two-qubit system.	翻訳日:2024-03-21 18:56:56 公開日:2024-03-19
# IEEE-GDL CCDスマートビルディング入門 IEEE-GDL CCD Smart Buildings Introduction ( http://arxiv.org/abs/2403.13103v1 ) ライセンス: Link先を確認	Victor Manuel Larios, José Guadalupe Robledo, Leopoldo Gómez, R. Rincón,	(参考訳) IEEE-GDL CCDワーキンググループの物理インフラ活動の一環として、このホワイトペーパーは、スマートな建物開発のためのレイヤ、サービスの分類、ベストプラクティスを理解するための最初のガイドとなることを意図している。レイヤとサービス間の相互運用性を高めるために、オープンスタンダードが要求される。さらに、グアダラハラ市の2つの建物(新しい建物と更新する建物)は、開発中の概念実証であり、マスタープランに基づいてスマートシティインフラを開発する戦略の一部であると説明されている。この文書の貢献として、スマートな建物におけるイノベーションの領域と機会を特定するために、議論を行う。 As part of the activities of the IEEE-GDL CCD working group of physical infrastructure, this whitepaper is intended to be an initial guide to understand the layers, taxonomy of services and best practices for the development of smart buildings. Open standards are claimed in order to increase interoperability between layers and services. Moreover, two buildings in Guadalajara city, one new and another to renew, are described as a proof of concept under development and being part of the strategy to develop the smart city infrastructure based on a master plan. A discussion will be addressed in order to identify the areas of innovation and opportunities for the smart buildings as the contribution of this document.	翻訳日:2024-03-21 18:56:56 公開日:2024-03-19
# FPGAにおける未クロック繰り返しブール回路のタスク性能最適化のための進化的計算法 Using evolutionary computation to optimize task performance of unclocked, recurrent Boolean circuits in FPGAs ( http://arxiv.org/abs/2403.13105v1 ) ライセンス: Link先を確認	Raphael Norman-Tenazas, David Kleinberg, Erik C. Johnson, Daniel P. Lathrop, Matthew J. Roos,	(参考訳) FPGAにおけるブールゲートの非クロック・リカレントネットワークは低SWaP貯水池計算に利用できることが示されている。このようなシステムでは、ネットワークのトポロジとノード機能はランダムに初期化される。タスクを解くネットワークを構築するために、出力ノードに重みを適用し、それらの重みを従来の機械学習手法で調整して学習する。しかしながら、全てのパラメータが学習されるネットワークと比較して、パフォーマンスは制限されることが多い。本稿では,FPGAにおける非時計型再帰型ネットワークの代替学習手法について検討する。ネットワークノードのブール関数の進化には,進化計算を用いる。あるタイプの実装では、出力ノードは直接タスクを実行するために使用され、すべての学習はネットワークのノード関数の進化によって行われる。第2のタイプの実装では、従来の貯水池計算ではバックエンド分類器が使用される。この場合、ノード関数の進化と出力ノード重みの調整の両方が学習に寄与する。我々は,ノード関数の進化の実践性を実証し,画像分類タスクにおいて,毎秒300万以上のサンプルを処理しながら,約30%の精度向上が得られることを示した。さらに,ネットワークメモリと動的出力信号の進化性についても示す。 It has been shown that unclocked, recurrent networks of Boolean gates in FPGAs can be used for low-SWaP reservoir computing. In such systems, topology and node functionality of the network are randomly initialized. To create a network that solves a task, weights are applied to output nodes and learning is achieved by adjusting those weights with conventional machine learning methods. However, performance is often limited compared to networks where all parameters are learned. Herein, we explore an alternative learning approach for unclocked, recurrent networks in FPGAs. We use evolutionary computation to evolve the Boolean functions of network nodes. In one type of implementation the output nodes are used directly to perform a task and all learning is via evolution of the network's node functions. In a second type of implementation a back-end classifier is used as in traditional reservoir computing. In that case, both evolution of node functions and adjustment of output node weights contribute to learning. 
We demonstrate the practicality of node function evolution, obtaining an accuracy improvement of ~30% on an image classification task while processing at a rate of over three million samples per second. We additionally demonstrate evolvability of network memory and dynamic output signals.	翻訳日:2024-03-21 18:56:56 公開日:2024-03-19
# 非線形性を知る:データの下層構造を明らかにするShapleyインタラクション Knowing Your Nonlinearities: Shapley Interactions Reveal the Underlying Structure of Data ( http://arxiv.org/abs/2403.13106v1 ) ライセンス: Link先を確認	Divyansh Singhvi, Andrej Erkelens, Raghav Jain, Diganta Misra, Naomi Saphra,	(参考訳) 非線形特徴相互作用の測定は、多くのモデルにおける帰属の複雑なパターンを理解するための確立されたアプローチである。本稿では、Shapley Taylorインタラクション指標(STII)を用いて、様々なモダリティ、タスク、アーキテクチャにおいて、基礎となるデータ構造がモデル表現に与える影響を分析する。マスク付きおよび自己回帰型言語モデル(MLM,ALM)の言語構造を考えると,STIIは慣用表現内で増加し,MLMは構文的距離でSTIIをスケールし,ALMよりも非線形構造における構文に依存していることがわかった。本研究の音声モデルの結果は,口腔の開き具合によって音素が文脈に応じてどの程度変化するかが決まるという音声学の原理を反映している。最後に,画像分類器について検討し,特徴的相互作用が物体の境界を直感的に反映することを示す。我々の幅広い研究成果は、学際的な研究と解釈可能性研究におけるドメインの専門知識の利点を示している。 Measuring nonlinear feature interaction is an established approach to understanding complex patterns of attribution in many models. In this paper, we use Shapley Taylor interaction indices (STII) to analyze the impact of underlying data structure on model representations in a variety of modalities, tasks, and architectures. Considering linguistic structure in masked and auto-regressive language models (MLMs and ALMs), we find that STII increases within idiomatic expressions and that MLMs scale STII with syntactic distance, relying more on syntax in their nonlinear structure than ALMs do. Our speech model findings reflect the phonetic principle that the openness of the oral cavity determines how much a phoneme varies based on its context. Finally, we study image classifiers and illustrate that feature interactions intuitively reflect object boundaries. Our wide range of results illustrates the benefits of interdisciplinary work and domain expertise in interpretability research.	翻訳日:2024-03-21 18:56:56 公開日:2024-03-19
# 法的テキストの多段階要約による教師なし質問応答システムの実現に向けて Towards Unsupervised Question Answering System with Multi-level Summarization for Legal Text ( http://arxiv.org/abs/2403.13107v1 ) ライセンス: Link先を確認	M Manvith Prabhu, Haricharana Srinivasa, Anand Kumar M,	(参考訳) 本稿では,SCaLARチームによるSemEval-2024 Task 5: Legal Argument Reasoning in Civil procedureについて要約する。法文の複雑さに悩まされていたこのバイナリ分類課題に対処するために,ラベルを生成するための,単純ながら斬新な類似性と距離に基づく教師なしアプローチを提案する。さらに,CNN,GRU,LSTMなどのアンサンブル機能を用いて,多段階の法的検討を行った。データセットにおける法則的説明の長大な性質に対処するため、T5に基づくセグメントワイド要約を導入し、重要な情報を保持することに成功し、モデルの性能を向上させる。監視されていないシステムでは、開発セットのマクロF1スコアが20ポイント増加し、テストセットの10ポイント増加が見られた。 This paper summarizes Team SCaLAR's work on SemEval-2024 Task 5: Legal Argument Reasoning in Civil Procedure. To address this Binary Classification task, which was daunting due to the complexity of the Legal Texts involved, we propose a simple yet novel similarity and distance-based unsupervised approach to generate labels. Further, we explore the Multi-level fusion of Legal-Bert embeddings using ensemble features, including CNN, GRU, and LSTM. To address the lengthy nature of Legal explanation in the dataset, we introduce T5-based segment-wise summarization, which successfully retained crucial information, enhancing the model's performance. Our unsupervised system witnessed a 20-point increase in macro F1-score on the development set and a 10-point increase on the test set, which is promising given its uncomplicated architecture.	翻訳日:2024-03-21 18:47:08 公開日:2024-03-19
# 部分的共有がオンラインフェデレーション学習のモデル攻撃に対するレジリエンスに及ぼす影響の分析 Analyzing the Impact of Partial Sharing on the Resilience of Online Federated Learning Against Model Poisoning Attacks ( http://arxiv.org/abs/2403.13108v1 ) ライセンス: Link先を確認	Ehsan Lari, Vinay Chakravarthi Gogineni, Reza Arablouei, Stefan Werner,	(参考訳) 我々は,部分共有型オンラインフェデレーション学習(PSO-Fed)アルゴリズムのモデルポゾン攻撃に対するレジリエンスを精査する。 PSO-Fedは、クライアントが各更新ラウンドでサーバとモデル見積のごく一部しか交換できないようにすることで、通信負荷を低減する。モデル推定の部分的な共有はまた、モデル中毒攻撃に対するアルゴリズムの堅牢性を高める。この現象についてより深い知見を得るため,バイザンティンのクライアントの存在下でPSO-Fedアルゴリズムの性能を解析した。解析により, PSO-Fed は平均および平均2乗感覚の収束を保ち, モデルポゾン攻撃のひずみ下においても維持することを示した。さらに、PSO-Fedの理論的平均二乗誤差(MSE)を導出し、ステップサイズ、アタック確率、ビザンティンクライアント数、クライアント参加率、部分共有比、ノイズ分散などのパラメータにリンクする。また, PSO-Fed は, PSO-Fed に対して, モデルポゾン攻撃に直面する場合のステップサイズが非自明であることを示す。我々の大規模な数値実験の結果は、我々の理論的な主張を裏付け、PSO-Fedがビザンツ攻撃に対処する優れた能力を強調し、他の関連するアルゴリズムよりも優れています。 We scrutinize the resilience of the partial-sharing online federated learning (PSO-Fed) algorithm against model-poisoning attacks. PSO-Fed reduces the communication load by enabling clients to exchange only a fraction of their model estimates with the server at each update round. Partial sharing of model estimates also enhances the robustness of the algorithm against model-poisoning attacks. To gain better insights into this phenomenon, we analyze the performance of the PSO-Fed algorithm in the presence of Byzantine clients, malicious actors who may subtly tamper with their local models by adding noise before sharing them with the server. Through our analysis, we demonstrate that PSO-Fed maintains convergence in both mean and mean-square senses, even under the strain of model-poisoning attacks. We further derive the theoretical mean square error (MSE) of PSO-Fed, linking it to various parameters such as stepsize, attack probability, number of Byzantine clients, client participation rate, partial-sharing ratio, and noise variance. We also show that there is a non-trivial optimal stepsize for PSO-Fed when faced with model-poisoning attacks. 
The results of our extensive numerical experiments affirm our theoretical assertions and highlight the superior ability of PSO-Fed to counteract Byzantine attacks, outperforming other related leading algorithms.	翻訳日:2024-03-21 18:47:08 公開日:2024-03-19
# ダイヤモンド中のスズ空孔量子ビットの単一ショット読み出しと弱測定 Single-Shot Readout and Weak Measurement of a Tin-Vacancy Qubit in Diamond ( http://arxiv.org/abs/2403.13110v1 ) ライセンス: Link先を確認	Eric I. Rosenthal, Souvik Biswas, Giovanni Scuri, Hope Lee, Abigail J. Stein, Hannah C. Kleidermacher, Jakob Grzesik, Alison E. Rugar, Shahriar Aghaeimeibodi, Daniel Riedel, Michael Titze, Edward S. Bielejec, Joonhee Choi, Christopher P. Anderson, Jelena Vuckovic,	(参考訳) ダイヤモンド中の負に帯電したスズ空孔中心(SnV$^-$)は、次世代の長距離量子ネットワークを構築するための新興プラットフォームである。これは、SnV$^-$が明るい発光、電子ノイズに対する鈍感さ、1ケルビン以上の温度での長いスピンコヒーレンス時間を含む好ましい光学特性とスピン特性のためである。ここでは、単発読み出し忠実度が$87.4\%$の単一SnV$^-$電子スピンの測定を実証し、複数の読み出しの条件付けによりさらに$98.5\%$に改善できることを示した。この性能は高速マイクロ波スピン制御と互換性があることを示し、SnV$^-$に対して、ダイヤモンド中のグループIV中心に固有の光リードアウトとスピン制御のトレードオフを克服できることを示した。量子力学における測定とデコヒーレンスの間の基本的な相互作用を照らし、量子ビットのスピンコヒーレンスをメトロジーツールとして利用する。これらの結果は、SnV$^-$ベースの量子技術の発展において重要なハードルを克服し、その過程で固体量子エミッタの研究に広く適用できる技術と理解を開発する。 The negatively charged tin-vacancy center in diamond (SnV$^-$) is an emerging platform for building the next generation of long-distance quantum networks. This is due to the SnV$^-$'s favorable optical and spin properties including bright emission, insensitivity to electronic noise, and long spin coherence times at temperatures above 1 Kelvin. Here, we demonstrate measurement of a single SnV$^-$ electronic spin with a single-shot readout fidelity of $87.4\%$, which can be further improved to $98.5\%$ by conditioning on multiple readouts. We show this performance is compatible with rapid microwave spin control, demonstrating that the trade-off between optical readout and spin control inherent to group-IV centers in diamond can be overcome for the SnV$^-$. Finally, we use weak quantum measurement to study measurement induced dephasing; this illuminates the fundamental interplay between measurement and decoherence in quantum mechanics, and makes use of the qubit's spin coherence as a metrological tool. 
Taken together, these results overcome an important hurdle in the development of SnV$^-$-based quantum technologies, and in the process, develop techniques and understanding broadly applicable to the study of solid-state quantum emitters.	翻訳日:2024-03-21 18:47:08 公開日:2024-03-19
# 医学予測問題におけるノイズラベルを用いた深層学習--スコーピングレビュー Deep learning with noisy labels in medical prediction problems: a scoping review ( http://arxiv.org/abs/2403.13111v1 ) ライセンス: Link先を確認	Yishu Wei, Yu Deng, Cong Sun, Mingquan Lin, Hongmei Jiang, Yifan Peng,	(参考訳) 目的: 医学研究は、専門家間の多様性や機械抽出ラベルといった要因によるノイズの多いラベルによる重大な課題に直面します。それにもかかわらず、ラベルノイズ管理の採用は限定的であり、ラベルノイズはほとんど無視されている。この目的のためには,問題領域に着目したスクーピングレビューを実施することが不可欠である。本研究の目的は,ラベルノイズ検出,ラベルノイズハンドリング,評価を含む深層学習に基づく医療予測問題において,ラベルノイズ管理を包括的にレビューすることである。ラベルの不確実性に関する研究も含んでいる。方法】Scoping review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines。 PubMed、IEEE Xplore、Google Scholar、Semantic Scholarの4つのデータベースを検索しました。検索用語は,「ノイズラベルと医療/医療/臨床」「不確実性と医療/医療/臨床」「ノイズと医療/医療/臨床」である。結果: 2016年から2023年の間に合計60の論文が包括的基準を満たした。医学研究における一連の実践的課題について検討する。これにはラベルノイズの発生源、ラベルノイズの影響、ラベルノイズの検出、ラベルノイズハンドリング技術、評価が含まれる。ラベルノイズ検出手法とハンドリング手法の両方を分類する。考察: 方法論的観点から, 医療コミュニティは, より広範な深層学習コミュニティと最新のものであることを観察し, 医療データに基づいてほとんどの技術が評価されていることを考察した。医療研究の標準要素としてラベルノイズを検討することを推奨する。最初の実験は、ノイズロバスト損失関数、重み付け、カリキュラム学習など、簡単に実装できる方法から始めることができる。 Objectives: Medical research faces substantial challenges from noisy labels attributed to factors like inter-expert variability and machine-extracted labels. Despite this, the adoption of label noise management remains limited, and label noise is largely ignored. To this end, there is a critical need to conduct a scoping review focusing on the problem space. This scoping review aims to comprehensively review label noise management in deep learning-based medical prediction problems, which includes label noise detection, label noise handling, and evaluation. Research involving label uncertainty is also included. Methods: Our scoping review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We searched 4 databases, including PubMed, IEEE Xplore, Google Scholar, and Semantic Scholar. 
Our search terms include "noisy label AND medical / healthcare / clinical", "uncertainty AND medical / healthcare / clinical", and "noise AND medical / healthcare / clinical". Results: A total of 60 papers published between 2016 and 2023 met the inclusion criteria. A series of practical questions in medical research are investigated. These include the sources of label noise, the impact of label noise, the detection of label noise, label noise handling techniques, and their evaluation. Categorizations of both label noise detection methods and handling techniques are provided. Discussion: From a methodological perspective, we observe that the medical community has kept up to date with the broader deep-learning community, given that most techniques have been evaluated on medical data. We recommend considering label noise as a standard element in medical research, even if it is not dedicated to handling noisy labels. Initial experiments can start with easy-to-implement methods, such as noise-robust loss functions, weighting, and curriculum learning.	翻訳日:2024-03-21 18:47:08 公開日:2024-03-19
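The abstract above names noise-robust loss functions as an easy first experiment. One commonly cited example is the generalized cross-entropy loss, $L_q(p_y) = (1 - p_y^q)/q$, which tends to ordinary cross-entropy as $q \to 0$ and reduces to $1 - p_y$ at $q = 1$. The sketch below is illustrative only: the review does not prescribe this particular loss, and the function names and the default `q` are assumptions made here.

```python
import math


def generalized_cross_entropy(p_true: float, q: float = 0.7) -> float:
    """Generalized cross-entropy for one sample.

    p_true is the predicted probability of the (possibly noisy) label.
    q in (0, 1] trades noise robustness (q = 1) against the behavior of
    ordinary cross-entropy (q -> 0). The default value is an assumption.
    """
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must be in (0, 1]")
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    return (1.0 - p_true ** q) / q


def cross_entropy(p_true: float) -> float:
    """Standard cross-entropy for one sample, for comparison."""
    return -math.log(p_true)


if __name__ == "__main__":
    # A confidently wrong prediction, as can happen on a mislabeled sample.
    p = 0.01
    print("cross-entropy:", cross_entropy(p))
    print("generalized CE (q=0.7):", generalized_cross_entropy(p, 0.7))
```

Because this loss is bounded above by $1/q$ while cross-entropy is unbounded, a mislabeled sample that the model confidently rejects contributes less to the total loss.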
# 並列で一度のエンコードとデコード:効率的なトランスフォーマーデコード Encode Once and Decode in Parallel: Efficient Transformer Decoding ( http://arxiv.org/abs/2403.13112v1 ) ライセンス: Link先を確認	Bo-Ru Lu, Nikita Haduong, Chien-Yu Lin, Hao Cheng, Noah A. Smith, Mari Ostendorf,	(参考訳) トランスフォーマーベースのNLPモデルは強力だが、デプロイメントシナリオを制限する計算コストが高い。微細エンコーダ-デコーダモデルは特殊なドメインで人気があり、GPT-4のようなより一般化されたデコーダのみのモデルよりも優れている。本稿では,1つの入力に対して複数の出力を必要とする構造化出力と質問応答タスクの効率を向上させるエンコーダ・デコーダモデルのための新しい構成を提案する。提案手法は,インパルスインデコーダ(PiD)を一度エンコードし,出力を並列にデコードすることで,重複するインプットエンコーディングを回避することにより,トレーニングと推論の効率を向上し,デコーダのメモリフットプリントを低減する。我々は,対話状態追跡,要約,質問応答タスクにおいて,最大4.6倍の高速化を実現した。トレーニング/推論コードとチェックポイントをリリースします。 Transformer-based NLP models are powerful but have high computational costs that limit deployment scenarios. Finetuned encoder-decoder models are popular in specialized domains and can outperform larger more generalized decoder-only models, such as GPT-4. We introduce a new configuration for encoder-decoder models that improves efficiency on structured output and question-answering tasks where multiple outputs are required of a single input. Our method, prompt-in-decoder (PiD), encodes the input once and decodes output in parallel, boosting both training and inference efficiency by avoiding duplicate input encoding, thereby reducing the decoder's memory footprint. We achieve computation reduction that roughly scales with the number of subtasks, gaining up to 4.6x speed-up over state-of-the-art models for dialogue state tracking, summarization, and question-answering tasks with comparable or better performance. We release our training/inference code and checkpoints.	翻訳日:2024-03-21 18:47:08 公開日:2024-03-19
# 肺癌セグメンテーションのための事前学習済みトランスフォーマーの信頼性 Trustworthiness of Pretrained Transformers for Lung Cancer Segmentation ( http://arxiv.org/abs/2403.13113v1 ) ライセンス: Link先を確認	Aneesh Rangnekar, Nishant Nadkarni, Jue Jiang, Harini Veeraraghavan,	(参考訳) 670件のCTおよびMRIスキャンを用いて,肺癌(LC)腫瘍セグメンテーション向けに微調整した,Swin UNETR と SMIT の2種類の自己教師あり事前学習トランスフォーマーモデルの信頼性について検討した。2つの公開3D-CTデータセットでのセグメンテーション精度, 新型コロナウイルス患者のCTスキャン, 卵巣癌患者のCTスキャン, 前立腺癌患者のT2強調MRIに対する堅牢性, およびT2強調MRIにおけるLCのゼロショット一般化を測定した。どちらのモデルも、分布内データで高い精度を示した(Dice: SMITは0.80、Swin UNETRは0.78)。 SMITは、CTスキャンでの近い分布外(near-out-of-distribution)性能は同程度(AUROC 89.85% vs. 89.19%)であったが、遠い分布外(far-out-of-distribution)ではCT(AUROC 97.2% vs. 87.1%)とMRI(92.15% vs. 73.8%)のいずれでも有意に高い精度を示した。 SMITはMRIでのゼロショットセグメンテーション(Dice 0.78 vs. 0.69)でSwin UNETRより優れていた。これらの知見は,日常的な臨床利用における,現在および将来の事前訓練モデルの安全な開発と展開の指針となるものと期待される。 We assessed the trustworthiness of two self-supervision pretrained transformer models, Swin UNETR and SMIT, fine-tuned for lung cancer (LC) tumor segmentation using 670 CT and MRI scans. We measured segmentation accuracy on two public 3D-CT datasets, robustness on CT scans of patients with COVID-19, CT scans of patients with ovarian cancer and T2-weighted MRI of men with prostate cancer, and zero-shot generalization of LC for T2-weighted MRIs. Both models demonstrated high accuracy on in-distribution data (Dice 0.80 for SMIT and 0.78 for Swin UNETR). SMIT showed similar near-out-of-distribution performance on CT scans (AUROC 89.85% vs. 89.19%) but significantly better far-out-of-distribution accuracy on CT (AUROC 97.2% vs. 87.1%) and MRI (92.15% vs. 73.8%). SMIT outperformed Swin UNETR in zero-shot segmentation on MRI (Dice 0.78 vs. 0.69). We expect these findings to guide the safe development and deployment of current and future pretrained models in routine clinical use.	翻訳日:2024-03-21 18:47:08 公開日:2024-03-19
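The segmentation results above are reported as Dice scores. As a reminder of what that number measures, the following sketch computes the Dice coefficient $2|A \cap B| / (|A| + |B|)$ for two binary masks given as flat lists. The function name and the convention of returning 1.0 for two empty masks are assumptions made here, not details from the paper.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice overlap between two binary masks of equal length (values 0 or 1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count positions where both masks mark foreground.
    intersection = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    total = sum(pred) + sum(target)
    if total == 0:
        # Both masks are empty; treated as perfect agreement by assumption.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [0, 1, 1, 1, 0, 0]
    reference = [0, 0, 1, 1, 1, 0]
    print(dice_coefficient(predicted, reference))
```

A score of 1.0 means the masks coincide exactly and 0.0 means they share no foreground element.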
# MLOps原則導入のメリットと限界に関する専門家の見解 Professional Insights into Benefits and Limitations of Implementing MLOps Principles ( http://arxiv.org/abs/2403.13115v1 ) ライセンス: Link先を確認	Gabriel Araujo, Marcos Kalinowski, Markus Endler, Fabio Calefato,	(参考訳) コンテキスト: 機械学習オペレーション(MLOps)は、開発、テスト、運用を組み合わせて機械学習アプリケーションのデプロイとメンテナンスを行う一連のプラクティスとして登場した。目的:本稿では,オンライン教師あり学習におけるMLOps原則のメリットと限界を評価する。方法: 経験豊富な機械学習開発者6名を対象に、オンライン機械学習アプリケーションにMLOps原則を適用することのメリットと制限について、2つのフォーカスグループセッションを行った。結果: 機械学習の開発者はMLOps原則を使うことの多くのメリットを享受しているが、これらは彼らが取り組んでいるすべてのプロジェクトに当てはまらない。専門家によると、この投資は、十分に準備された自動化プロセスを必要とする継続的デプロイメントを備えた大規模アプリケーションに対して対価を支払う傾向にある。しかしながら、初期バージョンの機械学習アプリケーションでは、その原則の実装に要する労力はプロジェクトのスコープを拡大し、最初のバージョンを本番環境にデプロイするために必要な時間を増やすことができる。議論の結果、ほとんどのメリットは、エラーが発生しやすい手動ステップの回避、アプリケーションの以前の状態への復元、堅牢な継続的自動デプロイメントパイプラインの確立に関連している、という結論に達した。結論: プロジェクトのスコープとニーズを考慮してMLOps原則を実装する上で、投資時間と労力のトレードオフをバランスさせることが重要です。 Context: Machine Learning Operations (MLOps) has emerged as a set of practices that combines development, testing, and operations to deploy and maintain machine learning applications. Objective: In this paper, we assess the benefits and limitations of using the MLOps principles in online supervised learning. Method: We conducted two focus group sessions on the benefits and limitations of applying MLOps principles for online machine learning applications with six experienced machine learning developers. Results: The focus group revealed that machine learning developers see many benefits of using MLOps principles but also that these do not apply to all the projects they worked on. According to experts, this investment tends to pay off for larger applications with continuous deployment that require well-prepared automated processes. However, for initial versions of machine learning applications, the effort taken to implement the principles could enlarge the project's scope and increase the time needed to deploy a first version to production. 
The discussion indicated that most of the benefits relate to avoiding error-prone manual steps, being able to restore the application to a previous state, and having a robust continuous automated deployment pipeline. Conclusions: It is important to balance the trade-offs of investing time and effort in implementing the MLOps principles considering the scope and needs of the project, favoring such investments for larger applications with continuous model deployment requirements.	翻訳日:2024-03-21 18:47:08 公開日:2024-03-19
# 最適フローマッチング: たった1ステップで直線軌道を学習する Optimal Flow Matching: Learning Straight Trajectories in Just One Step ( http://arxiv.org/abs/2403.13117v1 ) ライセンス: Link先を確認	Nikita Kornilov, Alexander Gasnikov, Alexander Korotin,	(参考訳) 近年,生成モデルのためのフローマッチング手法の開発が盛んに行われている。コミュニティが追求する興味深い特性の1つは、最適な輸送(OT)変位を実現する直線軌道で流れを学習する能力である。学習したフローのパスを高速に積分するには、ストレートネスが重要です。残念ながら、既存のフローストレート化手法のほとんどは、トレーニング中にエラーを蓄積したり、ヒューリスティックなミニバッチOT近似を利用する非自明な反復手順に基づいている。この問題に対処するため, 直線的なOT変位を2次コストで回収する最適流れマッチング手法を, たった1つのフローマッチングステップで開発する。 Over the past several years, there has been a boom in the development of flow matching methods for generative modeling. One intriguing property pursued by the community is the ability to learn flows with straight trajectories which realize the optimal transport (OT) displacements. Straightness is crucial for fast integration of the learned flow's paths. Unfortunately, most existing flow straightening methods are based on non-trivial iterative procedures which accumulate the error during training or exploit heuristic minibatch OT approximations. To address this issue, we develop a novel optimal flow matching approach which recovers the straight OT displacement for the quadratic cost in just one flow matching step.	翻訳日:2024-03-21 18:47:08 公開日:2024-03-19
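To make "straight trajectories" concrete, here is a one-dimensional toy sketch. In 1D with quadratic cost, the optimal transport coupling between two equal-size samples pairs them in sorted order, and each point then moves along the straight line $x_t = (1 - t)x_0 + t x_1$ with constant velocity $x_1 - x_0$, so a single Euler step integrates the path exactly. This only illustrates the property the abstract refers to; it is not the authors' method, and all function names are hypothetical.

```python
from typing import List, Tuple


def ot_pairs_1d(source: List[float], target: List[float]) -> List[Tuple[float, float]]:
    """Quadratic-cost OT coupling of two equal-size 1D samples (sorted matching)."""
    if len(source) != len(target):
        raise ValueError("samples must have the same size")
    return list(zip(sorted(source), sorted(target)))


def straight_path(x0: float, x1: float, t: float) -> float:
    """Position at time t in [0, 1] on the straight displacement from x0 to x1."""
    return (1.0 - t) * x0 + t * x1


def euler_integrate(x0: float, velocity: float, steps: int) -> float:
    """Integrate dx/dt = velocity from t = 0 to t = 1 with `steps` Euler steps."""
    x = x0
    dt = 1.0 / steps
    for _ in range(steps):
        x += velocity * dt
    return x


if __name__ == "__main__":
    pairs = ot_pairs_1d([2.0, -1.0, 0.5], [3.0, 10.0, 4.0])
    for x0, x1 in pairs:
        # Constant velocity along a straight path: one Euler step reaches x1.
        print(x0, "->", euler_integrate(x0, x1 - x0, steps=1), "target", x1)
```

With curved trajectories the velocity would change along the path, and many small integration steps would be needed to reach the target accurately.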
# 多変量ガウス過程回帰による時空間データのモーダル解析 Modal Analysis of Spatiotemporal Data via Multivariate Gaussian Process Regression ( http://arxiv.org/abs/2403.13118v1 ) ライセンス: Link先を確認	Jiwoo Song, Daning Huang,	(参考訳) モーダル解析は複雑な流れのコヒーレントな構造を理解するための重要なツールとなっている。動的モード分解(DMD)やスペクトル固有直交分解(SPOD)のような古典的なモーダル解析法は、時間内に定期的にサンプリングされる十分な量のデータに依存している。しかし、実験的な測定とシミュレーションアルゴリズムのため、しばしばスパース時間的に不規則なデータ、例えば、データを扱う必要がある。データ不足と不規則サンプリングの限界を克服するために,多変量ガウス過程回帰(MVGPR)を用いた新しいモーダル解析手法を提案する。まず,線形システム同定の観点から,MVGPRと既存のモーダル解析技術であるDMDとSPODの関連性を確立する。次に、この接続を利用して、上記の制限に対処するMVGPRに基づくモーダル解析手法を開発する。 MVGPRの能力は、仮定された線形力学から導かれる相関関数の法則的に設計されたカーネル構造によって与えられる。その後, MVGPR法は, 学術・合成データから非定常空気翼空力学まで, DMD と SPOD に対してベンチマークを行った。その結果,MVGPRは古典的モーダル解析法に代わる有望な代替手段であることを示す。 Modal analysis has become an essential tool to understand the coherent structure of complex flows. The classical modal analysis methods, such as dynamic mode decomposition (DMD) and spectral proper orthogonal decomposition (SPOD), rely on a sufficient amount of data that is regularly sampled in time. However, often one needs to deal with sparse temporally irregular data, e.g., due to experimental measurements and simulation algorithm. To overcome the limitations of data scarcity and irregular sampling, we propose a novel modal analysis technique using multi-variate Gaussian process regression (MVGPR). We first establish the connection between MVGPR and the existing modal analysis techniques, DMD and SPOD, from a linear system identification perspective. Next, leveraging this connection, we develop a MVGPR-based modal analysis technique that addresses the aforementioned limitations. The capability of MVGPR is endowed by its judiciously designed kernel structure for correlation function, that is derived from the assumed linear dynamics. Subsequently, the proposed MVGPR method is benchmarked against DMD and SPOD on a range of examples, from academic and synthesized data to unsteady airfoil aerodynamics. 
The results demonstrate MVGPR as a promising alternative to classical modal analysis methods, especially in the scenario of scarce and temporally irregular data.	翻訳日:2024-03-21 18:47:08 公開日:2024-03-19
# 凸最適化による制約付き確率回路 Probabilistic Circuits with Constraints via Convex Optimization ( http://arxiv.org/abs/2403.13125v1 ) ライセンス: Link先を確認	Soroush Ghandi, Benjamin Quost, Cassio de Campos,	(参考訳) 本研究は確率論的論理制約を確率論的回路(PC)によって符号化された分布に統合する。 PCは、いくつかの領域で最先端の性能を達成しつつ、効率的な計算(条件や限界確率など)を可能にするトラクタブルモデルのクラスである。提案手法は,PCと制約の両方を入力とし,制約を満たす新しいPCを出力する。これは、モデル全体を再トレーニングすることなく、凸最適化によって効率的に行われる。経験的評価は、制約とPCの組み合わせは、不足データや不完全データによるモデル性能の向上、モデル適合性を損なうことなくモデルに機械学習公正度対策を適用するなど、複数のユースケースを持つことができることを示している。これらのアイデアは、論理と深い確率モデルの組み合わせを含む、他の複数のアプリケーションの可能性を開くだろうと考えています。 This work addresses integrating probabilistic propositional logic constraints into the distribution encoded by a probabilistic circuit (PC). PCs are a class of tractable models that allow efficient computations (such as conditional and marginal probabilities) while achieving state-of-the-art performance in some domains. The proposed approach takes both a PC and constraints as inputs, and outputs a new PC that satisfies the constraints. This is done efficiently via convex optimization without the need to retrain the entire model. Empirical evaluations indicate that the combination of constraints and PCs can have multiple use cases, including the improvement of model performance under scarce or incomplete data, as well as the enforcement of machine learning fairness measures into the model without compromising model fitness. We believe that these ideas will open possibilities for multiple other applications involving the combination of logics and deep probabilistic models.	翻訳日:2024-03-21 18:47:08 公開日:2024-03-19
# AdaFish: 2次情報を用いた高速低ランクパラメータ効率微調整 AdaFish: Fast low-rank parameter-efficient fine-tuning by using second-order information ( http://arxiv.org/abs/2403.13128v1 ) ライセンス: Link先を確認	Jiang Hu, Quanzheng Li,	(参考訳) 大規模事前学習モデルの最近の進歩は、自然言語処理やコンピュータビジョンにおける様々なタスクにおける性能を著しく向上させてきた。しかし、これらのモデルの膨大な数のパラメータは、完全なトレーニングのためにかなりのメモリと計算資源を必要とする。これらのモデルを下流タスクや特定のアプリケーション指向データセットに適用するために、事前訓練されたパラメータを利用するパラメータ効率の高い微調整手法が注目されている。しかし、多くのパラメータやエポックのため、依然として時間がかかります。本研究では,低ランク分解に基づく微調整フレームワークにおいて,学習プロセスを高速化するために設計された2次型の効率的なアルゴリズムであるAdaFishを紹介する。我々のキーとなる観察は、関連する一般化されたフィッシャー情報行列が低ランクか極小スケールであることである。このような一般化されたフィッシャー情報行列はヘッセン行列と同値であることが示される。さらに、AdaFishのグローバル収束と、そのイテレーション/オークルの複雑さを証明します。数値実験により,本アルゴリズムは最先端のAdamW法と非常に競合することを示した。 Recent advancements in large-scale pretrained models have significantly improved performance across a variety of tasks in natural language processing and computer vision. However, the extensive number of parameters in these models necessitates substantial memory and computational resources for full training. To adapt these models for downstream tasks or specific application-oriented datasets, parameter-efficient fine-tuning methods leveraging pretrained parameters have gained considerable attention. However, it can still be time-consuming due to lots of parameters and epochs. In this work, we introduce AdaFish, an efficient algorithm of the second-order type designed to expedite the training process within low-rank decomposition-based fine-tuning frameworks. Our key observation is that the associated generalized Fisher information matrix is either low-rank or extremely small-scaled. Such a generalized Fisher information matrix is shown to be equivalent to the Hessian matrix. Moreover, we prove the global convergence of AdaFish, along with its iteration/oracle complexity. Numerical experiments show that our algorithm is quite competitive with the state-of-the-art AdamW method.	翻訳日:2024-03-21 18:47:08 公開日:2024-03-19
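The abstract refers to low-rank decomposition-based fine-tuning. As a rough illustration of why such methods are parameter-efficient, the sketch below compares the number of trainable parameters in a full $d \times k$ weight update with a rank-$r$ factorization $BA$, where $B$ is $d \times r$ and $A$ is $r \times k$. The layer sizes are made-up examples and the function names are assumptions; nothing here reproduces AdaFish itself.

```python
def full_update_params(d: int, k: int) -> int:
    """Trainable parameters when the whole d x k weight matrix is updated."""
    return d * k


def low_rank_update_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for a rank-r update B @ A, with B: d x r and A: r x k."""
    if r < 1 or r > min(d, k):
        raise ValueError("rank must satisfy 1 <= r <= min(d, k)")
    return r * (d + k)


if __name__ == "__main__":
    d, k, r = 4096, 4096, 8  # hypothetical layer shape and rank
    full = full_update_params(d, k)
    low = low_rank_update_params(d, k, r)
    print(f"full: {full}, low-rank: {low}, ratio: {low / full:.4%}")
```

The small number of trainable parameters per layer is what makes second-order information, such as a Fisher information matrix, plausible to work with in this setting.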
# よりよいコールSAL: ライダーであらゆるものをセグメンテーションする学習を目指す Better Call SAL: Towards Learning to Segment Anything in Lidar ( http://arxiv.org/abs/2403.13129v1 ) ライセンス: Link先を確認	Aljoša Ošep, Tim Meinhardt, Francesco Ferroni, Neehar Peri, Deva Ramanan, Laura Leal-Taixé,	(参考訳) 我々は、Lidar内の任意のオブジェクトのセグメンテーションと分類のためのテキストプロンプタブルゼロショットモデルと、手動の監督なしにモデルトレーニングを容易にする擬似ラベルエンジンからなる、$\texttt{SAL}$ ($\texttt{S}$egment $\texttt{A}$nything in $\texttt{L}$idar)メソッドを提案する。 $\textit{Lidar Panoptic Segmentation}$ (LPS) の確立されたパラダイムは、事前に定義された少数のオブジェクトクラスの手動による監督に依存しているのに対し、我々は2D視覚基盤モデルを使用して「無償で」3Dの監視を生成する。擬似ラベルはインスタンスマスクと対応するCLIPトークンで構成されており、校正マルチモーダルデータを用いてLidarに持ち込む。これらのラベルでモデルをトレーニングすることにより、2DファンデーションモデルをLidar $\texttt{SAL}$モデルに蒸留する。手動ラベルがなくても、完全教師ありの最先端手法に対して、クラスに依存しないセグメンテーションでは$91\%$、ゼロショットLPSでは$44\%$を達成する。さらに, 蒸留を行わず画像特徴を3Dにリフトするだけのいくつかのベースラインを上回る。さらに重要なことは、$\texttt{SAL}$が任意のクラスプロンプトをサポートし、新しいデータセットに容易に拡張できることを示し、自己ラベル付きデータの増加によって改善する可能性を示している。 We propose the $\texttt{SAL}$ ($\texttt{S}$egment $\texttt{A}$nything in $\texttt{L}$idar) method, consisting of a text-promptable zero-shot model for segmenting and classifying any object in Lidar, and a pseudo-labeling engine that facilitates model training without manual supervision. While the established paradigm for $\textit{Lidar Panoptic Segmentation}$ (LPS) relies on manual supervision for a handful of object classes defined a priori, we utilize 2D vision foundation models to generate 3D supervision "for free". Our pseudo-labels consist of instance masks and corresponding CLIP tokens, which we lift to Lidar using calibrated multi-modal data. By training our model on these labels, we distill the 2D foundation models into our Lidar $\texttt{SAL}$ model. Even without manual labels, our model achieves $91\%$ in terms of class-agnostic segmentation and $44\%$ in terms of zero-shot LPS of the fully supervised state-of-the-art. Furthermore, we outperform several baselines that do not distill but only lift image features to 3D. 
More importantly, we demonstrate that $\texttt{SAL}$ supports arbitrary class prompts, can be easily extended to new datasets, and shows significant potential to improve with increasing amounts of self-labeled data.	翻訳日:2024-03-21 18:47:08 公開日:2024-03-19
# 連続的ニューラルネットワーク翻訳のための自己生成リプレイ記憶 Self-generated Replay Memories for Continual Neural Machine Translation ( http://arxiv.org/abs/2403.13130v1 ) ライセンス: Link先を確認	Michele Resta, Davide Bacciu,	(参考訳) 現代のニューラル機械翻訳システムは、いくつかの異なる言語で強い性能を示し、常に改善されている。しかし、彼らの継続的な学習能力は、大惨事な忘れ物問題によって依然として著しく制限されている。本研究では,エンコーダ・デコーダ変換器の重要な特性,すなわち生成能力を活用し,ニューラルマシン翻訳システムの継続的な学習方法を提案する。本稿では,モデル自体を並列文の生成元として使用することにより,リプレイメモリを活用することで,異なる言語を構成する経験の流れを効果的に学習する方法を示す。我々は,トレーニングデータの明示的な記憶を必要とせずに,破滅的な忘れ込みを防止できることを実証的に実証した。コードは公開時に公開される。コード:https://github.com/m-resta/sg-rep Modern Neural Machine Translation systems exhibit strong performance in several different languages and are constantly improving. Their ability to learn continuously is, however, still severely limited by the catastrophic forgetting issue. In this work, we leverage a key property of encoder-decoder Transformers, i.e. their generative ability, to propose a novel approach to continually learning Neural Machine Translation systems. We show how this can effectively learn on a stream of experiences comprising different languages, by leveraging a replay memory populated by using the model itself as a generator of parallel sentences. We empirically demonstrate that our approach can counteract catastrophic forgetting without requiring explicit memorization of training data. Code will be publicly available upon publication. Code: https://github.com/m-resta/sg-rep	翻訳日:2024-03-21 18:47:08 公開日:2024-03-19
# 対人訓練におけるロバストNAS--ベンチマーク、理論、その他 Robust NAS under adversarial training: benchmark, theory, and beyond ( http://arxiv.org/abs/2403.13134v1 ) ライセンス: Link先を確認	Yongtao Wu, Fanghui Liu, Carl-Johann Simon-Gabriel, Grigorios G Chrysos, Volkan Cevher,	(参考訳) ニューラルアーキテクチャサーチ(NAS)の最近の進歩は、悪意のあるデータに対して堅牢なアーキテクチャを考えることの重要性を強調している。しかし、特に敵の訓練を考慮した場合、これらの堅牢なアーキテクチャを探索するためのベンチマーク評価や理論的保証が特に欠落している。本研究は,これら2つの課題に対処し,2つのコントリビューションを実現することを目的としている。まず、画像データセット上のNAS-Bench-201探索空間から、多数の敵に訓練されたネットワークに対して、クリーンな精度とロバストな精度の両方を包含する包括的データセットをリリースする。深層学習理論からニューラル・タンジェント・カーネル(NTK)ツールを応用し,多目的対人訓練において,クリーンな精度とロバストな精度でアーキテクチャを探索する一般化理論を確立する。我々は、信頼性の高い再現性、効率的な評価、理論的基礎、特に堅牢なアーキテクチャの追求を通じて、我々のベンチマークと理論的洞察がNASコミュニティに大きな恩恵をもたらすと強く信じている。 Recent developments in neural architecture search (NAS) emphasize the significance of considering robust architectures against malicious data. However, there is a notable absence of benchmark evaluations and theoretical guarantees for searching these robust architectures, especially when adversarial training is considered. In this work, we aim to address these two challenges, making twofold contributions. First, we release a comprehensive data set that encompasses both clean accuracy and robust accuracy for a vast array of adversarially trained networks from the NAS-Bench-201 search space on image datasets. Then, leveraging the neural tangent kernel (NTK) tool from deep learning theory, we establish a generalization theory for searching architecture in terms of clean accuracy and robust accuracy under multi-objective adversarial training. We firmly believe that our benchmark and theoretical insights will significantly benefit the NAS community through reliable reproducibility, efficient assessment, and theoretical foundation, particularly in the pursuit of robust architectures.	翻訳日:2024-03-21 18:47:08 公開日:2024-03-19
# センチネル2画像の自動ラベリングによる極海氷分類のための並列ワークフロー A Parallel Workflow for Polar Sea-Ice Classification using Auto-labeling of Sentinel-2 Imagery ( http://arxiv.org/abs/2403.13135v1 ) ライセンス: Link先を確認	Jurdana Masuma Iqrah, Wei Wang, Hongjie Xie, Sushil Prasad,	(参考訳) 北極海氷の進行・後退パターンの観測は地球温暖化の重要な指標である。本研究の目的は, 極海氷を, 厚い, 雪に覆われた, 薄い, 開いている水として, センチネル2 (S2) 画像を用いて分類する, 堅牢で効果的でスケーラブルなシステムを開発することである。 S2衛星は地表の高解像度画像を積極的に捉えているため、分類する必要がある画像はたくさんある。 1つの大きな障害は、基礎となる真実として振る舞うためのラベル付きS2トレーニングデータ(イメージ)がないことである。そこで本研究では,S2画像のセグメンテーションと自動ラベル付けを行う,スケーラブルで高精度な手法を提案する。我々は、PySparkを使って9倍のデータロードを実現し、16倍のマップ・リデュース・スピードアップを、薄いクラウドとシャドウフィルタによるカラー・セグメンテーションに基づいて実現し、ラベルデータを生成する。このプロセスから生成された自動ラベル付きデータは、U-Net機械学習モデルのトレーニングに使用される。 U-Net分類モデルのトレーニングは計算に重く、時間を要するため、モデルの精度に影響を与えることなく、7.21倍のスピードアップを持つDGXクラスタ上でHorovodフレームワークを使用して8つのGPUにスケールするU-Netモデルトレーニングを配布する。南極のロス海地域を例として、自動ラベル付きデータに基づいてトレーニングされたU-Netモデルは、S2画像からの薄い雲と影がフィルタリングされたときに、自動ラベル付きトレーニングデータセットの分類精度98.97%を達成する。 The observation of the advancing and retreating pattern of polar sea ice cover stands as a vital indicator of global warming. This research aims to develop a robust, effective, and scalable system for classifying polar sea ice as thick/snow-covered, young/thin, or open water using Sentinel-2 (S2) images. Since the S2 satellite is actively capturing high-resolution imagery over the earth's surface, there are lots of images that need to be classified. One major obstacle is the absence of labeled S2 training data (images) to act as the ground truth. We demonstrate a scalable and accurate method for segmenting and automatically labeling S2 images using carefully determined color thresholds. We employ a parallel workflow using PySpark to scale and achieve 9-fold data loading and 16-fold map-reduce speedup on auto-labeling S2 images based on thin cloud and shadow-filtered color-based segmentation to generate label data. 
The auto-labeled data generated from this process are then employed to train a U-Net machine learning model, resulting in good classification accuracy. As training the U-Net classification model is computationally heavy and time-consuming, we distribute the U-Net model training to scale it over 8 GPUs using the Horovod framework over a DGX cluster with a 7.21x speedup without affecting the accuracy of the model. Using the Antarctic's Ross Sea region as an example, the U-Net model trained on auto-labeled data achieves a classification accuracy of 98.97% for auto-labeled training datasets when the thin clouds and shadows from the S2 images are filtered out.	翻訳日:2024-03-21 18:47:08 公開日:2024-03-19
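The reported speedups can be turned into parallel efficiency, defined as speedup divided by the number of workers. Using only the figures from the abstract (a 7.21x speedup on 8 GPUs), the short calculation below gives roughly 90% efficiency. The function name is an assumption; the worker counts behind the 9-fold and 16-fold PySpark speedups are not stated in the abstract, so they are not evaluated here.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Parallel efficiency = observed speedup / number of workers."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return speedup / workers


if __name__ == "__main__":
    # Figures taken from the abstract: 7.21x speedup on 8 GPUs.
    print(f"{parallel_efficiency(7.21, 8):.1%}")
```

An efficiency of 1.0 would correspond to ideal linear scaling; values below that reflect communication and synchronization overhead.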
# レーザー誘起エネルギーデポジションにおける溶融プールモデリングのための不均一入力空間を持つ多面体サロゲート Multi-fidelity surrogate with heterogeneous input spaces for modeling melt pools in laser-directed energy deposition ( http://arxiv.org/abs/2403.13136v1 ) ライセンス: Link先を確認	Nandana Menon, Amrita Basak,	(参考訳) MFモデリング(Multi-fidelity Modeling)は、様々なフィデリティソースからデータをインテリジェントにブレンドできる強力な統計手法である。このアプローチは、レーザー指向エネルギー沈着(L-DED)に対する溶融プール幾何学の予測に魅力的な応用を見出した。メルトプールモデルの階層をマージするためにMFサロゲートを使用する際の大きな課題は、入力空間における可変性である。この課題に対処するために, 不均一な入力空間で動作する様々な複雑さのモデルを統合することにより, 溶融プール形状を予測するためのMFサロゲートを構築するための新しいアプローチを提案する。第1の熱モデルには、レーザーパワー、走査速度、粉体流量、キャリアガス流量、ノズル高さの5つの入力パラメータが組み込まれている。対照的に、第2の熱モデルはレーザーパワーと走査速度しか扱えない。不均一な入力空間の間に写像を確立し、その5次元空間を擬二次元空間に変形させることができる。予測はガウス過程に基づくコクリグ法でブレンドされる。結果のヘテロジニアス多面体ガウス過程(Het-MFGP)は、予測精度を向上するだけでなく、高次元の高密度熱モデルから要求される評価を減らすことで計算効率も向上する。その結果,L-DEDにおける溶融プール挙動のモデル化にHet-MFGPを用いることの利点が示された。このフレームワークは、マルチモーダルデータをうまく活用し、特定の入力パラメータがモデル化や測定が難しいシナリオを扱う方法を実証する。 Multi-fidelity (MF) modeling is a powerful statistical approach that can intelligently blend data from varied fidelity sources. This approach finds a compelling application in predicting melt pool geometry for laser-directed energy deposition (L-DED). One major challenge in using MF surrogates to merge a hierarchy of melt pool models is the variability in input spaces. To address this challenge, this paper introduces a novel approach for constructing an MF surrogate for predicting melt pool geometry by integrating models of varying complexity, that operate on heterogeneous input spaces. The first thermal model incorporates five input parameters i.e., laser power, scan velocity, powder flow rate, carrier gas flow rate, and nozzle height. In contrast, the second thermal model can only handle laser power and scan velocity. A mapping is established between the heterogeneous input spaces so that the five-dimensional space can be morphed into a pseudo two-dimensional space. Predictions are then blended using a Gaussian process-based co-kriging method. 
The resulting heterogeneous multi-fidelity Gaussian process (Het-MFGP) surrogate not only improves predictive accuracy but also offers computational efficiency by reducing evaluations required from the high-dimensional, high-fidelity thermal model. The results underscore the benefits of employing Het-MFGP for modeling melt pool behavior in L-DED. The framework successfully demonstrates how to leverage multimodal data and handle scenarios where certain input parameters may be difficult to model or measure.	翻訳日:2024-03-21 18:37:24 公開日:2024-03-19
# 関数木: 透過的な機械学習 Function Trees: Transparent Machine Learning ( http://arxiv.org/abs/2403.13141v1 ) ライセンス: Link先を確認	Jerome H. Friedman,	(参考訳) 機械学習アルゴリズムの出力は通常、入力変数の1つ以上の多変量関数で表される。このような関数のグローバルな性質を知ることは、データを生成するシステムを理解するのに役立ち、対応するモデル予測を解釈し説明するのに役立ちます。より単純な関数のツリーとして、一般的な多変量関数を表現するための方法が提示される。この木は、入力変数のサブセットの結合影響を発見し、記述することで、関数のグローバルな内部構造を公開する。入力値と対応する関数値が与えられたとき、関数ツリーが構築され、関数の主および相互作用効果のすべてを高速に識別し、高次まで計算することができる。最大4変数の相互作用効果をグラフィカルに視覚化する。 The output of a machine learning algorithm can usually be represented by one or more multivariate functions of its input variables. Knowing the global properties of such functions can help in understanding the system that produced the data as well as interpreting and explaining corresponding model predictions. A method is presented for representing a general multivariate function as a tree of simpler functions. This tree exposes the global internal structure of the function by uncovering and describing the combined joint influences of subsets of its input variables. Given the inputs and corresponding function values, a function tree is constructed that can be used to rapidly identify and compute all of the function's main and interaction effects up to high order. Interaction effects involving up to four variables are graphically visualized.	翻訳日:2024-03-21 18:37:24 公開日:2024-03-19
# SIFT-DBT:不均衡な乳房共生画像分類のための自己教師付き初期化と微調整 SIFT-DBT: Self-supervised Initialization and Fine-Tuning for Imbalanced Digital Breast Tomosynthesis Image Classification ( http://arxiv.org/abs/2403.13148v1 ) ライセンス: Link先を確認	Yuexi Du, Regina J. Hooley, John Lewin, Nicha C. Dvornek,	(参考訳) デジタル乳房共生(Digital Breast Tomo synthesis、DBT)は、乳がんのスクリーニングと診断に広く用いられている医療画像のモダリティであり、3Dライクな乳房容積イメージング機能により、より高解像度で詳細な画像を提供する。しかし、データ量の増加はデータ不均衡の顕著な課題も引き起こし、ボリュームのごく一部に不審な組織が含まれている。これにより、実世界のデータにおけるケースレベルの分布によるデータの不均衡がさらに悪化し、多数派のみを予測する自明な分類モデルを学ぶことができる。そこで本研究では,ビューレベルのコントラスト型自己監督初期化とファインチューニングを用いて,異常なDBT画像(SIFT-DBT)を識別する手法を提案する。さらに,空間分解能を維持するためのパッチレベルのマルチインスタンス学習手法を提案する。提案手法は92.69%の量的AUCを970のユニークな研究で評価する。 Digital Breast Tomosynthesis (DBT) is a widely used medical imaging modality for breast cancer screening and diagnosis, offering higher spatial resolution and greater detail through its 3D-like breast volume imaging capability. However, the increased data volume also introduces pronounced data imbalance challenges, where only a small fraction of the volume contains suspicious tissue. This further exacerbates the data imbalance due to the case-level distribution in real-world data and leads to learning a trivial classification model that only predicts the majority class. To address this, we propose a novel method using view-level contrastive Self-supervised Initialization and Fine-Tuning for identifying abnormal DBT images, namely SIFT-DBT. We further introduce a patch-level multi-instance learning method to preserve spatial resolution. The proposed method achieves 92.69% volume-wise AUC on an evaluation of 970 unique studies.	翻訳日:2024-03-21 18:37:24 公開日:2024-03-19
# Scoring Rules を用いた生存モデルの訓練 Training Survival Models using Scoring Rules ( http://arxiv.org/abs/2403.13150v1 ) ライセンス: Link先を確認	Philipp Kopper, David Rügamer, Raphael Sonabend, Bernd Bischl, Andreas Bender,	(参考訳) 生存分析(Survival Analysis)は、いくつかの領域において、部分的に不完全な時間と時間のデータに対する重要な洞察を提供する。また、確率論的機械学習の重要な例である。予測の確率的性質は、確率に基づく最適化ではなく、モデルフィッティングプロセスにおける(適切な)スコアリングルールを使用することによって利用することができる。私たちの提案は一般的な方法で行われ、さまざまなモデルクラスに使用できます。柔軟性の異なるパラメトリックと非パラメトリックのサブフレームワークを確立します。ニューラルネットワークに組み込まれると、計算効率が良くスケーラブルな最適化ルーチンが発生し、最先端の予測性能が得られる。最後に、我々のフレームワークを用いて様々なパラメトリックモデルを復元し、確率に基づく手法と比較して最適化が等しく動作することを示す。 Survival Analysis provides critical insights for partially incomplete time-to-event data in various domains. It is also an important example of probabilistic machine learning. The probabilistic nature of the predictions can be exploited by using (proper) scoring rules in the model fitting process instead of likelihood-based optimization. Our proposal does so in a generic manner and can be used for a variety of model classes. We establish different parametric and non-parametric sub-frameworks that allow different degrees of flexibility. Incorporated into neural networks, it leads to a computationally efficient and scalable optimization routine, yielding state-of-the-art predictive performance. Finally, we show that using our framework, we can recover various parametric models and demonstrate that optimization works equally well when compared to likelihood-based methods.	翻訳日:2024-03-21 18:37:24 公開日:2024-03-19
# カゴメ格子$\mathbb{Z}_2$スピン液体中のバイソン凝縮とスピノン閉じ込め:量子二量体モデルの数値的研究 Vison condensation and spinon confinement in a kagome lattice $\mathbb{Z}_2$ spin liquid: A numerical study of a quantum dimer model ( http://arxiv.org/abs/2403.13154v1 ) ライセンス: Link先を確認	Kyusung Hwang,	(参考訳) 量子スピン液体(quantum spin liquids)は、長距離の絡み合いと分数的なエニオン準粒子を特徴とする、エキゾチックな多体状態である。スピン液体の量子相転移は、特にエニオン凝縮とエニオン閉じ込めの新しい現象に関連する興味深い問題である。本稿では,カゴメ格子上の$\mathbb{Z}_2$スピン液体($\mathbb{Z}_2$SL)と価結合固体(VBS)の遷移を実装する量子二量体モデルについて検討する。この遷移は、$\mathbb{Z}_2$ スピン液体のバイソン励起の凝縮によって引き起こされ、他のエニオン励起に影響を及ぼし、特にスピノン励起の閉じ込めにつながる。ダイマーモデルの数値的厳密対角化により、バイソン弦演算子を用いてバイソン凝縮を直接測定し、VBS状態のスピノン励起に作用する閉じ込めポテンシャルを明示的に確認する。スピン液体状態のトポロジカル縮退は、バイソン凝縮に伴って解けることが観察された。 VBS状態の二量体秩序パターンを、二量体構造因子を調べることにより同定する。さらに、スピン液体とVBSの特徴を同時に示す興味深い状態が見つかる。本稿では,混合挙動の起源と熱力学的極限で期待されるシナリオについて論じる。この研究は、ダイマーモデルに関する以前の解析的研究 Phys. Rev. B 87, 104408 (2013) および Phys. Rev. B 92, 205131 (2015) を補完し、$\mathbb{Z}_2$SL-to-VBS遷移におけるバイソン凝縮とスピノン閉じ込めに関する数値的な証拠を提供する。 Quantum spin liquids are exotic many-body states characterized by long-range entanglement and fractional anyon quasiparticles. Quantum phase transitions of spin liquids are particularly interesting problems related to the novel phenomena of anyon condensation and anyon confinement. Here we study a quantum dimer model which implements a transition between a $\mathbb{Z}_2$ spin liquid ($\mathbb{Z}_2$SL) and a valence bond solid (VBS) on the kagome lattice. The transition is driven by the condensation of vison excitations of the $\mathbb{Z}_2$ spin liquid, which affects other anyon excitations, in particular leading to the confinement of spinon excitations. By numerical exact diagonalization of the dimer model, we directly measure the vison condensation using vison string operators, and explicitly check a confining potential acting on spinon excitations in the VBS state. It is observed that topological degeneracy of the spin liquid state is lifted concomitantly with the vison condensation. 
The dimer ordering pattern of the VBS state is identified by investigating the dimer structure factor. Furthermore, we find an interesting state that exhibits features of spin liquid and VBS simultaneously. We discuss the origin of the mixed behaviors and possible scenarios expected in the thermodynamic limit. This work complements the previous analytical studies on the dimer model, Phys. Rev. B 87, 104408 (2013) and Phys. Rev. B 92, 205131 (2015), by providing numerical evidence for the vison condensation and the spinon confinement in the $\mathbb{Z}_2$SL-to-VBS transition.	翻訳日:2024-03-21 18:37:24 公開日:2024-03-19
# DeblurDiNAT: 軽量で効果的な画像デブラー変換器 DeblurDiNAT: A Lightweight and Effective Transformer for Image Deblurring ( http://arxiv.org/abs/2403.13163v1 ) ライセンス: Link先を確認	Hanzhou Liu, Binghan Li, Chengkai Liu, Mi Lu,	(参考訳) ぼやけた画像には局所的およびグローバルな非一様アーティファクトが含まれており、これはデブラー処理を複雑にし、満足のいく結果を達成するのをより困難にする。近年、トランスフォーマーは既存のCNNアーキテクチャよりも優れたデブラー結果を生成している。しかし、大きなモデルサイズと長い推論時間は、まだ十分に検討されていない2つの厄介な問題である。そこで本研究では,現実のぼやけた画像からクリーンな画像を効率よく復元する小型エンコーダデコーダトランスフォーマーであるDeblurDiNATを提案する。我々は,グローバル・ローカルな特徴学習を目的とした交互拡張因子構造を採用する。また,ネットワークで自己注意層を単に利用するだけでは,必ずしもよいデブラー結果が得られるとは限らないことも観察した。この問題を解決するために、チャネル変調自己注意ブロック(CMSA)を提案し、チャネル間学習器(CCL)を用いてチャネル関係をキャプチャする。さらに,高速な特徴伝播が可能な分割・乗算フィードフォワードネットワーク(DMFN)を提案する。さらに,制御された特徴マージを行う軽量ゲート特徴融合(LGFF)モジュールを設計する。総合的な実験結果から,提案モデルであるDeblurDiNATは,ベースラインに対して顕著な計算コストを伴わずに良好な性能向上を実現し,複数の画像デブラーデータセット上でSOTA(State-of-the-art)性能を実現することを示す。最寄りの競合と比べて、空間効率と時間効率に優れた本手法は、パラメータが3%から68%少ないにもかかわらずより強力な一般化能力を示し、視覚的に正解に近いデブラー画像を生成する。 Blurry images may contain local and global non-uniform artifacts, which complicate the deblurring process and make it more challenging to achieve satisfactory results. Recently, Transformers have generated better deblurring outcomes than existing CNN architectures. However, the large model size and long inference time are still two bothersome issues which have not been fully explored. To this end, we propose DeblurDiNAT, a compact encoder-decoder Transformer which efficiently restores clean images from real-world blurry ones. We adopt an alternating dilation factor structure with the aim of global-local feature learning. Also, we observe that simply using self-attention layers in networks does not always produce good deblurred results. To solve this problem, we propose a channel modulation self-attention (CMSA) block, where a cross-channel learner (CCL) is utilized to capture channel relationships. In addition, we present a divide and multiply feed-forward network (DMFN) allowing fast feature propagation. 
Moreover, we design a lightweight gated feature fusion (LGFF) module, which performs controlled feature merging. Comprehensive experimental results show that the proposed model, named DeblurDiNAT, provides a favorable performance boost without introducing noticeable computational costs over the baseline, and achieves state-of-the-art (SOTA) performance on several image deblurring datasets. Compared to nearest competitors, our space-efficient and time-saving method demonstrates a stronger generalization ability with 3%-68% fewer parameters and produces deblurred images that are visually closer to the ground truth.	翻訳日:2024-03-21 18:37:24 公開日:2024-03-19
# VL-ICL Bench:マルチモーダルインコンテキストラーニングのベンチマークの詳細 VL-ICL Bench: The Devil in the Details of Benchmarking Multimodal In-Context Learning ( http://arxiv.org/abs/2403.13164v1 ) ライセンス: Link先を確認	Yongshuo Zong, Ondrej Bohdal, Timothy Hospedales,	(参考訳) 大規模言語モデル(LLM)は、モデルの重みを更新することなく、プロンプトとして提供される少数ショット例を使用して、新しいタスクに迅速に適応する能力である、創発的なインコンテキスト学習(ICL)を示すことで有名である。 LLM上に構築された視覚大言語モデル(VLLM)は、認識、推論、接地といった分野で大きく進歩している。しかし、emph{multimodal ICL} の調査は、主に数発の視覚的質問応答(VQA)と画像キャプションに焦点を合わせており、ICL の強みを活用せず、その限界もテストしない。マルチモーダルICLのより広範な機能と限界は、まだ未調査のままである。本研究では,マルチモーダル・インコンテキスト学習のための総合ベンチマークVL-ICL Benchを導入し,画像とテキストの両方を入力や出力として含むタスクの幅広い範囲を包含する。我々は、このベンチマークスイートに対して最先端のVLLMの能力を評価し、その多様な長所と短所を明らかにし、GPT-4のような最も先進的なモデルでさえ課題を見出すことを示した。さまざまな新しいICLタスクと既存のモデルの強みと制限を強調して、私たちのデータセットがVLLMのコンテキスト内学習能力を向上し、VLLM ICLを利用する新しいアプリケーションに刺激を与えることを期待しています。コードとデータセットはhttps://github.com/ys-zong/VL-ICLで公開されている。 Large language models (LLMs) famously exhibit emergent in-context learning (ICL) -- the ability to rapidly adapt to new tasks using few-shot examples provided as a prompt, without updating the model's weights. Built on top of LLMs, vision large language models (VLLMs) have advanced significantly in areas such as recognition, reasoning, and grounding. However, investigations into \emph{multimodal ICL} have predominantly focused on few-shot visual question answering (VQA), and image captioning, which we will show neither exploit the strengths of ICL, nor test its limitations. The broader capabilities and limitations of multimodal ICL remain under-explored. In this study, we introduce a comprehensive benchmark VL-ICL Bench for multimodal in-context learning, encompassing a broad spectrum of tasks that involve both images and text as inputs and outputs, and different types of challenges, from {perception to reasoning and long context length}. 
We evaluate the abilities of state-of-the-art VLLMs against this benchmark suite, revealing their diverse strengths and weaknesses, and showing that even the most advanced models, such as GPT-4, find the tasks challenging. By highlighting a range of new ICL tasks, and the associated strengths and limitations of existing models, we hope that our dataset will inspire future work on enhancing the in-context learning capabilities of VLLMs, as well as inspire new applications that leverage VLLM ICL. The code and dataset are available at https://github.com/ys-zong/VL-ICL.	翻訳日:2024-03-21 18:37:24 公開日:2024-03-19
# 医用画像分類のための視覚変換器EATFormerの改良 Improved EATFormer: A Vision Transformer for Medical Image Classification ( http://arxiv.org/abs/2403.13167v1 ) ライセンス: Link先を確認	Yulong Shisu, Susano Mingwin, Yongshuai Wanwag, Zengqiang Chenso, Sunshin Huing,	(参考訳) 医療画像の正確な分析は、医療状況の診断と予測に不可欠である。放射線技師や臨床医に依存する伝統的なアプローチは、不整合と診断の欠如に悩まされている。コンピュータ支援診断システムは、早期、正確、効率的な診断の達成を支援することができる。本稿では,視覚変換器を用いた医用画像分類のための改良された進化的アルゴリズムに基づくトランスフォーマアーキテクチャを提案する。提案したEATFormerアーキテクチャは、畳み込みニューラルネットワークとビジョントランスフォーマーの強みを組み合わせて、データのパターンを特定し、特定の特性に適応する能力を活用している。このアーキテクチャには、Feed-Forward Networkによる拡張EAベースのTransformerブロック、Global and Local Interaction、マルチスケールリージョンアグリゲーションモジュールなど、新しいコンポーネントが含まれている。また、不規則な位置の動的モデリングのためのModulated Deformable MSAモジュールも導入されている。本稿では,ビジョントランスフォーマー(ViT)モデルの主要な特徴として,パッチベースの処理,位置空間の取り込み,マルチヘッドアテンション機構について論じる。これは、異なる受容領域から情報を集約して誘導バイアスを提供する、マルチスケール領域集約モジュールを導入している。 Global and Local Interactionモジュールは、識別的ローカル情報を抽出するローカルパスを導入することで、MSAベースのグローバルモジュールを強化する。 Chest X-rayデータセットとKvasirデータセットの実験結果から,提案したEATFormerはベースラインモデルと比較して予測速度と精度を大幅に向上することが示された。 The accurate analysis of medical images is vital for diagnosing and predicting medical conditions. Traditional approaches relying on radiologists and clinicians suffer from inconsistencies and missed diagnoses. Computer-aided diagnosis systems can assist in achieving early, accurate, and efficient diagnoses. This paper presents an improved Evolutionary Algorithm-based Transformer architecture for medical image classification using Vision Transformers. The proposed EATFormer architecture combines the strengths of Convolutional Neural Networks and Vision Transformers, leveraging their ability to identify patterns in data and adapt to specific characteristics. The architecture incorporates novel components, including the Enhanced EA-based Transformer block with Feed-Forward Network, Global and Local Interaction , and Multi-Scale Region Aggregation modules. It also introduces the Modulated Deformable MSA module for dynamic modeling of irregular locations. 
The paper discusses the Vision Transformer (ViT) model's key features, such as patch-based processing, positional context incorporation, and Multi-Head Attention mechanism. It introduces the Multi-Scale Region Aggregation module, which aggregates information from different receptive fields to provide an inductive bias. The Global and Local Interaction module enhances the MSA-based global module by introducing a local path for extracting discriminative local information. Experimental results on the Chest X-ray and Kvasir datasets demonstrate that the proposed EATFormer significantly improves prediction speed and accuracy compared to baseline models.	翻訳日:2024-03-21 18:37:24 公開日:2024-03-19
# Wav2Gloss: 音声からインターリニア・グロステキストを生成する Wav2Gloss: Generating Interlinear Glossed Text from Speech ( http://arxiv.org/abs/2403.13169v1 ) ライセンス: Link先を確認	Taiqi He, Kwanghee Choi, Lindia Tjuatja, Nathaniel R. Robinson, Jiatong Shi, Shinji Watanabe, Graham Neubig, David R. Mortensen, Lori Levin,	(参考訳) 世界中の何千もの言語が絶滅の危機にさらされている。 Interlinear Glossed Text (IGT) は言語アノテーションの一種で、これらの言語コミュニティのドキュメントやリソース作成をサポートする。 IGTは通常、(1)転写、(2)形態的セグメンテーション、(3)グルース、(4)多数言語への自由翻訳からなる。本稿では,これらの4つのアノテーションコンポーネントを音声から自動的に抽出するタスクであるWav2Glossを提案し,その最後に最初のデータセットであるFieldworkを紹介した。我々は,エンドツーエンドとカスケードのWav2Gloss法を比較し,事前学習したデコーダが翻訳とグロス処理を補助し,マルチタスクと多言語アプローチが不十分であり,テキストのみの利点にもかかわらず,エンドツーエンドシステムはカスケードシステムよりも優れた性能を発揮することを示唆する分析を行った。音声からのIGT生成に関する今後の研究のための基礎研究を行うためのベンチマークを提供する。 Thousands of the world's languages are in danger of extinction--a tremendous threat to cultural identities and human language diversity. Interlinear Glossed Text (IGT) is a form of linguistic annotation that can support documentation and resource creation for these languages' communities. IGT typically consists of (1) transcriptions, (2) morphological segmentation, (3) glosses, and (4) free translations to a majority language. We propose Wav2Gloss: a task to extract these four annotation components automatically from speech, and introduce the first dataset to this end, Fieldwork: a corpus of speech with all these annotations covering 37 languages with standard formatting and train/dev/test splits. We compare end-to-end and cascaded Wav2Gloss methods, with analysis suggesting that pre-trained decoders assist with translation and glossing, that multi-task and multilingual approaches are underperformant, and that end-to-end systems perform better than cascaded systems, despite the text-only systems' advantages. We provide benchmarks to lay the ground work for future research on IGT generation from speech.	翻訳日:2024-03-21 18:37:24 公開日:2024-03-19
# LUWAデータセット: 顕微鏡画像におけるLithic Use-Wear解析の学習 LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images ( http://arxiv.org/abs/2403.13171v1 ) ライセンス: Link先を確認	Jing Zhang, Irving Fang, Juexiao Zhang, Hao Wu, Akshat Kaushik, Alice Rodriguez, Hanwen Zhao, Zhuo Zheng, Radu Iovita, Chen Feng,	(参考訳) 顕微鏡画像を用いたLithic Use-Wear Analysis (LUWA) は、未開拓のビジョン・フォー・サイエンス研究領域である。考古学的アーティファクト、材料相互作用、ツール機能、歯科記録を理解する上で重要な、加工対象の材料を区別することを目指している。しかし、この課題は、一般的な対象に対するよく研究された画像分類問題を越えている。複雑な摩耗機構と顕微鏡画像により、多くの交絡因子の影響を受けており、人間の専門家でさえその素材をうまく識別することは困難である。本稿では,このユニークな視覚課題について,初めて以下の3つの疑問を考察する。 (i)最先端の事前訓練されたモデル(例えばDINOv2)は、まれにしか見られない領域にどの程度一般化できるのか? (ii)少ない顕微鏡画像に少ショット学習をどのように活用することができるか。 (iii)曖昧な倍率とセンシングモダリティが分類精度にどのような影響を及ぼすか。これらの研究のために,我々は考古学者と共同で,異なる倍率とセンシングモダリティを持つ23,130の顕微鏡画像を含む最初のオープンソースかつ最大のLUWAデータセットを構築した。大規模な実験では、既存の事前訓練されたモデルは人間の専門家を顕著に上回るが、依然として大きな改善の余地を残している。最も重要なのは、LUWAデータセットが、ビジョンと学習コミュニティに未開拓の機会を提供し、共通オブジェクト上の既存の画像分類問題を補完することです。 Lithic Use-Wear Analysis (LUWA) using microscopic images is an underexplored vision-for-science research area. It seeks to distinguish the worked material, which is critical for understanding archaeological artifacts, material interactions, tool functionalities, and dental records. However, this challenging task goes beyond the well-studied image classification problem for common objects. It is affected by many confounders owing to the complex wear mechanism and microscopic imaging, which makes it difficult even for human experts to identify the worked material successfully. In this paper, we investigate the following three questions on this unique vision task for the first time: (i) How well can state-of-the-art pre-trained models (like DINOv2) generalize to the rarely seen domain? (ii) How can few-shot learning be exploited for scarce microscopic images? (iii) How do the ambiguous magnification and sensing modality influence the classification accuracy? 
To study these, we collaborated with archaeologists and built the first open-source and the largest LUWA dataset containing 23,130 microscopic images with different magnifications and sensing modalities. Extensive experiments show that existing pre-trained models notably outperform human experts but still leave a large gap for improvements. Most importantly, the LUWA dataset provides an underexplored opportunity for vision and learning communities and complements existing image classification problems on common objects.	翻訳日:2024-03-21 18:37:24 公開日:2024-03-19
# Castor: 高速かつ正確な時系列分類のためのシェープレットの競合 Castor: Competing shapelets for fast and accurate time series classification ( http://arxiv.org/abs/2403.13176v1 ) ライセンス: Link先を確認	Isak Samsten, Zed Lee,	(参考訳) シェープレットは識別サブシーケンスであり、もともとはシェイプレットベースの決定木に埋め込まれていたが、その後シェイプレットベースの変換へと拡張された。シェープレットを用いて時系列を変換する,単純で効率的かつ正確な時系列分類アルゴリズムであるCastorを提案する。この変換は、様々なダイレーションを持つグループにシェイプレットを編成し、シェイプレットが時間的文脈で競争し、多様な特徴表現を構築することを可能にする。シェープレットをグループにまとめることで、競合のレベル間の遷移を可能とし、その結果、距離ベースの変換や辞書ベースの変換とよりよく似た方法が生まれる。我々は、広範囲にわたる経験的調査を通じて、カスターは、いくつかの最先端の分類器よりもはるかに正確な分類器をもたらす変換をもたらすことを示した。広範囲にわたるアブレーション研究において、ハイパーパラメータの選択の効果を検証し、正確で効率的なデフォルト値を提案する。 Shapelets are discriminative subsequences, originally embedded in shapelet-based decision trees but have since been extended to shapelet-based transformations. We propose Castor, a simple, efficient, and accurate time series classification algorithm that utilizes shapelets to transform time series. The transformation organizes shapelets into groups with varying dilation and allows the shapelets to compete over the time context to construct a diverse feature representation. By organizing the shapelets into groups, we enable the transformation to transition between levels of competition, resulting in methods that more closely resemble distance-based transformations or dictionary-based transformations. We demonstrate, through an extensive empirical investigation, that Castor yields transformations that result in classifiers that are significantly more accurate than several state-of-the-art classifiers. In an extensive ablation study, we examine the effect of choosing hyperparameters and suggest accurate and efficient default values.	翻訳日:2024-03-21 18:37:24 公開日:2024-03-19
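The dilated shapelet transform described in the Castor abstract above can be illustrated with a small sketch. This is not the authors' implementation: the function names (`dilated_windows`, `shapelet_distance`, `shapelet_transform`) are hypothetical, the distance is a plain unnormalized Euclidean minimum, and the competition between shapelets within a group is left out. It only shows how a dilation factor changes which points of the series a shapelet is compared against.

```python
import math


def dilated_windows(series, length, dilation):
    """Yield every subsequence of `length` points spaced `dilation` steps apart."""
    span = (length - 1) * dilation  # index distance from first to last sampled point
    for start in range(len(series) - span):
        yield [series[start + k * dilation] for k in range(length)]


def shapelet_distance(series, shapelet, dilation=1):
    """Minimum Euclidean distance between the shapelet and any dilated window.

    Returns math.inf when the series is too short to hold a single window.
    """
    best = math.inf
    for window in dilated_windows(series, len(shapelet), dilation):
        dist = math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        best = min(best, dist)
    return best


def shapelet_transform(series, shapelets, dilations):
    """Feature vector: one minimum distance per (dilation, shapelet) pair."""
    return [
        shapelet_distance(series, shapelet, dilation)
        for dilation in dilations
        for shapelet in shapelets
    ]
```

For the series `[0, 1, 0, 2, 0, 3, 0]` and the shapelet `[1, 2, 3]`, a dilation of 2 aligns the shapelet with every second point and gives an exact match, while a dilation of 1 does not. The resulting feature vector would then be passed to an ordinary classifier.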
# 深層強化学習のための高速値追跡 Fast Value Tracking for Deep Reinforcement Learning ( http://arxiv.org/abs/2403.13178v1 ) ライセンス: Link先を確認	Frank Shih, Faming Liang,	(参考訳) 強化学習(Reinforcement Learning、RL)は、環境と相互作用するエージェントを作成することで、シーケンシャルな意思決定問題に取り組む。しかし、既存のアルゴリズムはしばしばこれらの問題を静的とみなし、期待される報酬を最大化するためにモデルパラメータの点推定に注目し、エージェント環境相互作用の確率力学と不確実性定量化の重要な役割を無視している。我々の研究はカルマンフィルタのパラダイムを活用し、Langevinized Kalman Temporal-Difference (LKTD) と呼ばれる新しい拡張性のあるサンプリングアルゴリズムを導入する。このアルゴリズムはSGMCMC(Stochastic Gradient Markov Chain Monte Carlo)に基づいており、ディープニューラルネットワークパラメータの事後分布からサンプルを効率的に引き出す。軽度条件下では、LKTDアルゴリズムによって生成された事後サンプルが定常分布に収束することが証明される。この収束によって、値関数やモデルパラメータに関連する不確実性を定量化できるだけでなく、トレーニングフェーズ全体を通してポリシー更新中にこれらの不確実性を監視できる。 LKTDアルゴリズムは、より堅牢で適応可能な強化学習アプローチの道を開く。 Reinforcement learning (RL) tackles sequential decision-making problems by creating agents that interact with their environment. However, existing algorithms often view these problems as static, focusing on point estimates for model parameters to maximize expected rewards, neglecting the stochastic dynamics of agent-environment interactions and the critical role of uncertainty quantification. Our research leverages the Kalman filtering paradigm to introduce a novel and scalable sampling algorithm called Langevinized Kalman Temporal-Difference (LKTD) for deep reinforcement learning. This algorithm, grounded in Stochastic Gradient Markov Chain Monte Carlo (SGMCMC), efficiently draws samples from the posterior distribution of deep neural network parameters. Under mild conditions, we prove that the posterior samples generated by the LKTD algorithm converge to a stationary distribution. This convergence not only enables us to quantify uncertainties associated with the value function and model parameters but also allows us to monitor these uncertainties during policy updates throughout the training phase. The LKTD algorithm paves the way for more robust and adaptable reinforcement learning approaches.	翻訳日:2024-03-21 18:37:24 公開日:2024-03-19
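The abstract above builds on SGMCMC, whose simplest member is the Langevin update: a half-step along the gradient of the log posterior plus Gaussian noise whose variance equals the step size. The sketch below shows that generic update for a one-dimensional parameter. It is not the LKTD algorithm: the function name `langevin_samples` is hypothetical, and the exact gradient is used where an SGMCMC method would use a minibatch estimate.

```python
import math
import random


def langevin_samples(grad_log_post, theta0, step, n_steps, rng):
    """Draw a chain of samples with the (unadjusted) Langevin update.

    theta <- theta + (step / 2) * grad_log_post(theta) + N(0, step)

    grad_log_post: gradient of the log posterior density at theta.
    rng: a random.Random instance, passed in so runs are reproducible.
    """
    theta = theta0
    samples = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, math.sqrt(step))  # standard deviation sqrt(step)
        theta = theta + 0.5 * step * grad_log_post(theta) + noise
        samples.append(theta)
    return samples


def mean_and_variance(values):
    """Sample mean and (population) variance of a list of floats."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance
```

With a standard normal posterior the log-density gradient is `-theta`, so `langevin_samples(lambda t: -t, 0.0, 0.1, 50000, random.Random(0))` should give samples whose mean is near 0 and whose variance is near 1. The spread of such samples is what makes uncertainty estimates possible, in contrast to a single point estimate.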
# 構造化ドメイン上の予測的、スケーラブルで解釈可能な知識トレース Predictive, scalable and interpretable knowledge tracing on structured domains ( http://arxiv.org/abs/2403.13179v1 ) ライセンス: Link先を確認	Hanqi Zhou, Robert Bamler, Charley M. Wu, Álvaro Tejero-Cantero,	(参考訳) 知的学習システムは、学習教材の選択とタイミングを最適化し、理解と長期保持を強化する。これは学習者の進捗('knowledge Trace'; KT')と学習領域の必須構造('knowledge mapping')の両方を推定する必要がある。近年のディープラーニングモデルは高いKT精度を達成しているが、心理的にインスパイアされたモデルの解釈容易性を犠牲にしている。この作業では、このトレードオフに対する解決策を提示します。 PSI-KTは階層的生成手法であり、個々の認知特性と知識の前提構造の両方が学習力学に影響を及ぼし、設計による解釈可能性を実現する。さらに、スケーラブルなベイズ推定を用いることで、PSI-KTは学習者や学習履歴の増大を伴っても、実世界の効率的なパーソナライズの必要性を目標としている。 PSI-KTは、オンライン学習プラットフォームから得られた3つのデータセットに基づいて、学習者固有の特徴の解釈可能な表現と、学習を因果的に支援する知識の前提構造を提供しながら、連続的な学習設定において、優れた多段階予測精度とスケーラブルな推論を実現する。総じて、予測的でスケーラブルで解釈可能な知識追跡とソリッド・ナレッジ・マッピングは、効果的なパーソナライズド・ラーニングの基盤として、幅広いグローバルなオーディエンスに教育を利用できるようにする。 Intelligent tutoring systems optimize the selection and timing of learning materials to enhance understanding and long-term retention. This requires estimates of both the learner's progress (''knowledge tracing''; KT), and the prerequisite structure of the learning domain (''knowledge mapping''). While recent deep learning models achieve high KT accuracy, they do so at the expense of the interpretability of psychologically-inspired models. In this work, we present a solution to this trade-off. PSI-KT is a hierarchical generative approach that explicitly models how both individual cognitive traits and the prerequisite structure of knowledge influence learning dynamics, thus achieving interpretability by design. Moreover, by using scalable Bayesian inference, PSI-KT targets the real-world need for efficient personalization even with a growing body of learners and learning histories. 
Evaluated on three datasets from online learning platforms, PSI-KT achieves superior multi-step predictive accuracy and scalable inference in continual-learning settings, all while providing interpretable representations of learner-specific traits and the prerequisite structure of knowledge that causally supports learning. In sum, predictive, scalable and interpretable knowledge tracing with solid knowledge mapping lays a key foundation for effective personalized learning to make education accessible to a broad, global audience.	翻訳日:2024-03-21 18:37:24 公開日:2024-03-19
# モデルマージレシピの進化的最適化 Evolutionary Optimization of Model Merging Recipes ( http://arxiv.org/abs/2403.13187v1 ) ライセンス: Link先を確認	Takuya Akiba, Makoto Shing, Yujin Tang, Qi Sun, David Ha,	(参考訳) 本稿では、強力な基礎モデルの作成を自動化するための進化的アルゴリズムの新たな応用について述べる。モデルマージは、LLM開発においてコスト効率のために有望なアプローチとして現れてきたが、現在は人間の直観とドメイン知識に依存しており、その可能性を制限する。本稿では、多様なオープンソースモデルの効果的な組み合わせを自動的に発見し、大規模なトレーニングデータや計算を必要とせず、その集合的知性を活用することにより、この制限を克服する進化的アプローチを提案する。我々の手法はパラメータ空間とデータフロー空間の両方で動作し、個々のモデルの重み以上の最適化を可能にする。このアプローチはドメイン間のマージを容易にし、Math推論機能を備えた日本のLLMのようなモデルを生成する。驚くべきことに、我々の日本語数学 LLM は、これらのタスクを明示的に訓練されていないにもかかわらず、パラメータがかなり多いモデルよりもはるかに多く、様々な確立された日本語 LLM ベンチマークで最先端のパフォーマンスを達成した。さらに,本手法により得られた文化に配慮したVLMは,従来のVLMよりも優れた日本文化特化コンテンツを記述する上で,その効果を実証する。この作業は、新しい最先端のモデルをオープンソースコミュニティに還元するだけでなく、自動化されたモデル構成のための新しいパラダイムを導入し、基盤モデル開発への代替的で効率的なアプローチを探求する道を開いた。 We present a novel application of evolutionary algorithms to automate the creation of powerful foundation models. While model merging has emerged as a promising approach for LLM development due to its cost-effectiveness, it currently relies on human intuition and domain knowledge, limiting its potential. Here, we propose an evolutionary approach that overcomes this limitation by automatically discovering effective combinations of diverse open-source models, harnessing their collective intelligence without requiring extensive additional training data or compute. Our approach operates in both parameter space and data flow space, allowing for optimization beyond just the weights of the individual models. This approach even facilitates cross-domain merging, generating models like a Japanese LLM with Math reasoning capabilities. Surprisingly, our Japanese Math LLM achieved state-of-the-art performance on a variety of established Japanese LLM benchmarks, even surpassing models with significantly more parameters, despite not being explicitly trained for such tasks. 
Furthermore, a culturally-aware Japanese VLM generated through our approach demonstrates its effectiveness in describing Japanese culture-specific content, outperforming previous Japanese VLMs. This work not only contributes new state-of-the-art models back to the open-source community, but also introduces a new paradigm for automated model composition, paving the way for exploring alternative, efficient approaches to foundation model development.	翻訳日:2024-03-21 18:37:24 公開日:2024-03-19
# LiDARセマンティックセグメンテーションの改善 Reflectivity Is All You Need!: Advancing LiDAR Semantic Segmentation ( http://arxiv.org/abs/2403.13188v1 ) ライセンス: Link先を確認	Kasi Viswanath, Peng Jiang, Srikanth Saripalli,	(参考訳) LiDARセマンティックセグメンテーションフレームワークは、主に幾何学に基づく特徴を活用してスキャン内のオブジェクトを識別する。これらの手法は明確な境界と異なる形状のシナリオで優れているが、特にオフロード環境のように境界がぼやけている環境では性能が低下する。これを解決するために、最近の3次元分割アルゴリズムは、予測精度を向上させるために生のLiDAR強度測定を活用することに重点を置いている。これらの努力にもかかわらず、現在の学習ベースのモデルは、生の強度と距離、入射角、物質反射率、大気条件などの要因の間の複雑な関係を捉えることに苦慮している。本稿では,従来の研究に基づいて,学習ベースのLiDARセマンティックセグメンテーションフレームワークに校正強度(リフレクティビティとも呼ばれる)を用いることの利点を考察する。我々はまず、リフレクティビティを入力として組み込むことで、既存のLiDARセマンティックセグメンテーションモデルが強化されることを確立した。さらに,モデルが強度の校正を学習できるようにすることで,性能が向上することを示す。オフロードデータセットRellis-3Dの広範な実験により、顕著な改善が示された。特に, 強度から反射率への変換は, オフロードシナリオで生強度を用いた場合と比較して, mIoU(mean Intersection over Union)を4%向上させる。さらに,都市環境(SemanticKITTI)でのセマンティックセグメンテーションおよびクロスセンサー領域適応における校正強度の利点についても検討した。 LiDAR semantic segmentation frameworks predominantly leverage geometry-based features to differentiate objects within a scan. While these methods excel in scenarios with clear boundaries and distinct shapes, their performance declines in environments where boundaries are blurred, particularly in off-road contexts. To address this, recent strides in 3D segmentation algorithms have focused on harnessing raw LiDAR intensity measurements to improve prediction accuracy. Despite these efforts, current learning-based models struggle to correlate the intricate connections between raw intensity and factors such as distance, incidence angle, material reflectivity, and atmospheric conditions. Building upon our prior work, this paper delves into the advantages of employing calibrated intensity (also referred to as reflectivity) within learning-based LiDAR semantic segmentation frameworks. We initially establish that incorporating reflectivity as an input enhances the existing LiDAR semantic segmentation model. Furthermore, we present findings that enabling the model to learn to calibrate intensity can boost its performance. 
Through extensive experimentation on the off-road dataset Rellis-3D, we demonstrate notable improvements. Specifically, converting intensity to reflectivity results in a 4% increase in mean Intersection over Union (mIoU) when compared to using raw intensity in off-road scenarios. We also investigate the possible benefits of using calibrated intensity in semantic segmentation in urban environments (SemanticKITTI) and cross-sensor domain adaptation.	翻訳日:2024-03-21 18:37:24 公開日:2024-03-19
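The 4% figure above is reported in mean Intersection over Union (mIoU), so a short sketch of that metric may help. For each class, IoU is TP / (TP + FP + FN) counted over all labelled points, and mIoU is the average over classes. The function names are hypothetical, and skipping classes that appear in neither the labels nor the predictions is an assumption of this sketch; benchmark evaluation scripts may treat such classes differently.

```python
def per_class_iou(y_true, y_pred, num_classes):
    """Return a list with the IoU of each class, or None if the class never occurs."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    ious = []
    for cls in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        union = tp + fp + fn
        ious.append(tp / union if union > 0 else None)
    return ious


def mean_iou(y_true, y_pred, num_classes):
    """Average the per-class IoU over the classes that occur at least once."""
    valid = [iou for iou in per_class_iou(y_true, y_pred, num_classes) if iou is not None]
    if not valid:
        raise ValueError("no class occurs in labels or predictions")
    return sum(valid) / len(valid)
```

With labels `[0, 0, 1, 1, 2, 2]` and predictions `[0, 1, 1, 1, 2, 0]`, the per-class IoUs are 1/3, 2/3 and 1/2, so the mIoU is 0.5.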
# 3Dセマンティックマップネット:3Dのマルチオブジェクト再識別のためのマップの構築 3D Semantic MapNet: Building Maps for Multi-Object Re-Identification in 3D ( http://arxiv.org/abs/2403.13190v1 ) ライセンス: Link先を確認	Vincent Cartillier, Neha Jain, Irfan Essa,	(参考訳) 具体化ツアーにおける3次元多目的再識別の課題について検討する。具体的には、エージェントは、2つの異なるレイアウト(例えば家具の配置)の下で環境(例えばアパート)の2つのツアーを与えられる。例えば、ロケーションAからBへ移動した"ソファ"、ロケーションCにおける第2のレイアウトで新しい"チェア"、第1のレイアウトで第2のレイアウトに欠けているロケーションDからの"ランプ"などである。このタスクを支援するために、我々は、Matterport3Dシーン、YCB、Googleスキャンされたオブジェクトを用いて、Habitatシミュレータで初期/修正レイアウトのペア・エゴセントリックなツアーを生成する自動化インフラを構築した。 3Dセマンティックマップネット(3D-SMNet)は,(1)RGB-Dビデオで動作する3Dオブジェクト検出器と(2)2組の3Dバウンディングボックス間の対応推定を解く微分可能なオブジェクトマッチングモジュールからなる2段階の再同定モデルである。全体として、3D-SMNetは、各レイアウトのオブジェクトベースのマップを構築し、その後、各ツアーのオブジェクトを再識別するために、差別化可能なマーカを使用する。生成されたエピソードを3D-SMNetでトレーニングした後,Replica,Active Vision,RIO環境においてタスクをインスタンス化することにより,実世界の再配置シナリオへのゼロショット転送を実証した。すべてのデータセットにおいて、3D-SMNetは競争ベースラインよりも優れています。さらに,実際のエピソードと生成されたエピソードを共同でトレーニングすることで,実際のデータのみのトレーニングよりも大きな改善がもたらされることを示す。 We study the task of 3D multi-object re-identification from embodied tours. Specifically, an agent is given two tours of an environment (e.g. an apartment) under two different layouts (e.g. arrangements of furniture). Its task is to detect and re-identify objects in 3D - e.g. a "sofa" moved from location A to B, a new "chair" in the second layout at location C, or a "lamp" from location D in the first layout missing in the second. To support this task, we create an automated infrastructure to generate paired egocentric tours of initial/modified layouts in the Habitat simulator using Matterport3D scenes, YCB and Google-scanned objects. We present 3D Semantic MapNet (3D-SMNet) - a two-stage re-identification model consisting of (1) a 3D object detector that operates on RGB-D videos with known pose, and (2) a differentiable object matching module that solves correspondence estimation between two sets of 3D bounding boxes. 
Overall, 3D-SMNet builds object-based maps of each layout and then uses a differentiable matcher to re-identify objects across the tours. After training 3D-SMNet on our generated episodes, we demonstrate zero-shot transfer to real-world rearrangement scenarios by instantiating our task in Replica, Active Vision, and RIO environments depicting rearrangements. On all datasets, we find 3D-SMNet outperforms competitive baselines. Further, we show jointly training on real and generated episodes can lead to significant improvements over training on real data alone.	翻訳日:2024-03-21 18:27:31 公開日:2024-03-19
# 大規模言語モデルを用いたJavaScriptプログラムの脆弱性修復に関する研究 A Study of Vulnerability Repair in JavaScript Programs with Large Language Models ( http://arxiv.org/abs/2403.13193v1 ) ライセンス: Link先を確認	Tan Khang Le, Saba Alimadadi, Steven Y. Ko,	(参考訳) 近年、JavaScriptは特にWeb開発において最も広く使われているプログラミング言語となっている。しかし、セキュアなJavaScriptコードを書くのは簡単ではなく、プログラマはWebアプリケーションのセキュリティ上の脆弱性につながる間違いを犯すことが多い。大規模言語モデル(LLM)は、複数のドメインにまたがる大幅な進歩を示しており、その進化する能力は、自動バグ修正を含む要求仕様に基づいて自動コード生成の可能性を示している。本研究では,JavaScriptプログラムにおけるセキュリティ脆弱性の発見と修正におけるLCM,すなわちChatGPTとBardの精度について検討する。また、脆弱なJavaScriptコードの正しいパッチを生成するためにLLMを指示するプロンプトにおけるコンテキストの影響についても検討する。実世界のソフトウェア脆弱性に関する我々の実験によると、LLMはJavaScriptコードの自動プログラム修復において有望であるが、正しいバグ修正を達成するには、しばしばプロンプトに適切なコンテキストを必要とする。 In recent years, JavaScript has become the most widely used programming language, especially in web development. However, writing secure JavaScript code is not trivial, and programmers often make mistakes that lead to security vulnerabilities in web applications. Large Language Models (LLMs) have demonstrated substantial advancements across multiple domains, and their evolving capabilities indicate their potential for automatic code generation based on a required specification, including automatic bug fixing. In this study, we explore the accuracy of LLMs, namely ChatGPT and Bard, in finding and fixing security vulnerabilities in JavaScript programs. We also investigate the impact of context in a prompt on directing LLMs to produce a correct patch of vulnerable JavaScript code. Our experiments on real-world software vulnerabilities show that while LLMs are promising in automatic program repair of JavaScript code, achieving a correct bug fix often requires an appropriate amount of context in the prompt.	翻訳日:2024-03-21 18:27:31 公開日:2024-03-19
# Hermite座標補間カーネル:画像ズームへの応用 Hermite coordinate interpolation kernels: application to image zooming ( http://arxiv.org/abs/2403.13195v1 ) ライセンス: Link先を確認	Konstantinos K. Delibasis, Iro Oikonomou, Aristides I. Kechriniotis, Georgios N. Tsigaridas,	(参考訳) 幾何変換のような多くの基本的な画像処理タスクは、サブピクセル位置での画像値の補間を必要とする。本研究では,非等間隔の直線格子上に定義された多次元座標Hermiteスプライン補間を利用して,画像ズームという,非常に一般的な画像処理タスクに適用する。 Hermite補間は関数値に加えて偏微分値を利用するため、各画素における画像偏微分の数値近似を用いて、等間隔格子の特別な場合として画像処理タスクに適用することは自然である。さらに、画像補間のタスクは、非ゼロの小数部分を持つ位置における画像値の計算を必要とする。したがって、任意のスプライン補間は適切なカーネルとの畳み込みとして記述することができる。この文脈では、 [1] の定理 2 で導出された$n-$次元補間式に従ってHermiteカーネルを生成する。補間式の複雑さが増大しているにもかかわらず、カーネルが構築されると、Hermiteスプライン補間は他のより単純な方法と同様に効率的に画像に適用できることを示す。最後に,画像ズームのためのHermiteカーネルの適用性と高精度性を示すために,従来の畳み込みベースの手法および深層学習を用いる手法と、PSNRおよびSSIMの誤差指標で比較する実証的な数値例を示す。提案したHermiteスプラインカーネルは、ズーム操作のカスケード繰り返しを用いた実験において、テスト画像の大部分において、他のすべての方法よりも優れている。比較対象のすべての方法を考慮すると、興味深い結論が得られる。 A number of basic image processing tasks, such as any geometric transformation, require interpolation at subpixel image values. In this work we utilize the multidimensional coordinate Hermite spline interpolation defined on non-equal spaced, rectilinear grids and apply it to a very common image processing task, image zooming. Since Hermite interpolation utilizes function values, as well as partial derivative values, it is natural to apply it to image processing tasks as a special case of equi-spaced grid, using numerical approximations of the image partial derivatives at each pixel. Furthermore, the task of image interpolation requires the calculation of image values at positions with non-zero fractional part. Thus, any spline interpolation can be written as convolution with an appropriate kernel. In this context we generate the Hermite kernels according to the derived $n-$dimensional interpolant of Theorem 2 in [1]. 
We show that despite the increased complexity of the interpolant, once the kernels are constructed, the Hermite spline interpolation can be applied to images as efficiently as any other less complicated method. Finally, we perform illustrative numerical examples to showcase the applicability and high accuracy of the proposed Hermite kernels for image zooming, compared to other interpolation methods, both traditional convolution-based, as well as employing deep learning, in terms of PSNR, as well as SSIM error metrics. The proposed Hermite spline kernels outperform all other methods in the majority of the test images, in experiments using many cascaded repetitions of the zoom operation. Interesting conclusions can be drawn considering all methods under comparison.	翻訳日:2024-03-21 18:27:31 公開日:2024-03-19
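The idea in the abstract above, Hermite interpolation on an equi-spaced grid with derivatives replaced by numerical approximations, can be illustrated in one dimension. The sketch below uses the standard cubic Hermite basis on unit-spaced samples and estimates derivatives with central differences (one-sided at the two ends). The paper's method is $n$-dimensional and kernel-based and its interpolant may be of a different order, so this is only a simplified illustration with hypothetical function names.

```python
def estimate_derivatives(samples):
    """Approximate the derivative at each sample of a unit-spaced signal."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    derivs = []
    for i in range(n):
        if i == 0:
            derivs.append(samples[1] - samples[0])  # forward difference
        elif i == n - 1:
            derivs.append(samples[-1] - samples[-2])  # backward difference
        else:
            derivs.append((samples[i + 1] - samples[i - 1]) / 2.0)  # central
    return derivs


def hermite_zoom_1d(samples, factor):
    """Upsample a 1D signal by an integer factor with cubic Hermite interpolation."""
    derivs = estimate_derivatives(samples)
    n = len(samples)
    out = []
    for j in range((n - 1) * factor + 1):
        x = j / factor          # position in input coordinates
        i = min(int(x), n - 2)  # left sample of the enclosing interval
        t = x - i               # fractional part within the interval, in [0, 1]
        # Cubic Hermite basis functions on the unit interval.
        h00 = 2 * t**3 - 3 * t**2 + 1
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        out.append(
            h00 * samples[i] + h10 * derivs[i]
            + h01 * samples[i + 1] + h11 * derivs[i + 1]
        )
    return out
```

The output passes through the original samples at every `factor`-th position and reproduces straight lines exactly. For an image, one simple and non-equivalent extension would be to apply the 1D routine along rows and then along columns.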
# ADAPTがプロンプト調整型視覚変換器をロバスト化 ADAPT to Robustify Prompt Tuning Vision Transformers ( http://arxiv.org/abs/2403.13196v1 ) ライセンス: Link先を確認	Masih Eskandar, Tooba Imtiaz, Zifeng Wang, Jennifer Dy,	(参考訳) ビジョントランスフォーマーを含むディープモデルの性能は、敵の攻撃に弱いことが知られている。これらの攻撃に対する多くの既存の防御、例えば敵の訓練は、モデルの堅牢性を誘導するためにフルモデルの微調整に依存している。これらの防御は、タスク毎に数十億のパラメータを持つことができるモデル全体のコピーを保存する必要がある。同時に、パラメータ効率のよいプロンプトチューニングを使用して、大きなコピーを保存することなく、大きなトランスフォーマーベースのモデルを下流タスクに適応させる。本稿では、ロバスト性レンズ下での下流タスクに対するビジョントランスフォーマーのパラメータ効率向上プロンプトチューニングについて検討する。本稿では,従来の対角防御手法を即時チューニングパラダイムに適用した場合,勾配難読化に悩まされ,適応攻撃に弱いことを示す。本稿では,アダプティブ・チューニング・パラダイムにおける適応的対角訓練を行うための新しいフレームワークであるADAPTを紹介する。提案手法は,パラメータの約1%をチューニングすることで,フルモデルファインチューニングを用いた約40%のSOTAロバストネスの競合ロバスト精度を実現する。 The performance of deep models, including Vision Transformers, is known to be vulnerable to adversarial attacks. Many existing defenses against these attacks, such as adversarial training, rely on full-model fine-tuning to induce robustness in the models. These defenses require storing a copy of the entire model, that can have billions of parameters, for each task. At the same time, parameter-efficient prompt tuning is used to adapt large transformer-based models to downstream tasks without the need to save large copies. In this paper, we examine parameter-efficient prompt tuning of Vision Transformers for downstream tasks under the lens of robustness. We show that previous adversarial defense methods, when applied to the prompt tuning paradigm, suffer from gradient obfuscation and are vulnerable to adaptive attacks. We introduce ADAPT, a novel framework for performing adaptive adversarial training in the prompt tuning paradigm. Our method achieves competitive robust accuracy of ~40% w.r.t. SOTA robustness methods using full-model fine-tuning, by tuning only ~1% of the number of parameters.	翻訳日:2024-03-21 18:27:31 公開日:2024-03-19
# DecentNeRFs: クラウドソーシング画像からの分散ニューラルラジアンスフィールド DecentNeRFs: Decentralized Neural Radiance Fields from Crowdsourced Images ( http://arxiv.org/abs/2403.13199v1 ) ライセンス: Link先を確認	Zaid Tasneem, Akshat Dave, Abhishek Singh, Kushagra Tiwary, Praneeth Vepakomma, Ashok Veeraraghavan, Ramesh Raskar,	(参考訳) ニューラルレイディアンス場(NeRF)は、世界中で撮影された画像を没入型3D視覚体験に変換する可能性を示している。しかし、これらのキャプチャーされた視覚データのほとんどは、画像が個人の詳細を含んでいるため、カメラロールにサイロ化されている。たとえ公開されても、毎日撮影される数十億のシーンの3D表現を集中的に学習する問題は、計算的に難解である。私たちのアプローチであるDecentNeRFは、中央集権的なアプローチと比べてシーンあたりのサーバ計算量が$\sim 10^4\times$少ない、分散型でクラウドソースのNeRFの最初の試みです。当社のアプローチでは,生データを送信するのではなく,ユーザが3D表現を送信し,集中的なNeRFをトレーニングする際の高い計算コストをユーザ間で分散する。ユーザの3Dビューを個人的およびグローバルなNeRFに分解し、後者のみを新しい最適重み付けで集約することで、フォトリアリスティックなシーン表現を学習する。我々は、構造化された合成データセットと実世界の写真観光データセット上で、フォトリアリズムと最小限のサーバ計算コストでNeRFを学習できるという本手法の利点を検証した。さらに、DecentNeRFにおけるグローバルNeRFの安全なアグリゲーションが、サーバによる個人コンテンツの望ましくない再構築をいかに最小化するかを分析する。 Neural radiance fields (NeRFs) show potential for transforming images captured worldwide into immersive 3D visual experiences. However, most of this captured visual data remains siloed in our camera rolls as these images contain personal details. Even if made public, the problem of learning 3D representations of billions of scenes captured daily in a centralized manner is computationally intractable. Our approach, DecentNeRF, is the first attempt at decentralized, crowd-sourced NeRFs that require $\sim 10^4\times$ less server computing for a scene than a centralized approach. Instead of sending the raw data, our approach requires users to send a 3D representation, distributing the high computation cost of training centralized NeRFs between the users. It learns photorealistic scene representations by decomposing users' 3D views into personal and global NeRFs and a novel optimally weighted aggregation of only the latter. We validate the advantage of our approach to learn NeRFs with photorealism and minimal server computation cost on structured synthetic and real-world photo tourism datasets. 
We further analyze how secure aggregation of global NeRFs in DecentNeRF minimizes the undesired reconstruction of personal content by the server.	翻訳日:2024-03-21 18:27:31 公開日:2024-03-19
# シャープネス最小化器の多様性を考慮したアグノスティックアンサンブル Diversity-Aware Agnostic Ensemble of Sharpness Minimizers ( http://arxiv.org/abs/2403.13204v1 ) ライセンス: Link先を確認	Anh Bui, Vy Vo, Tung Pham, Dinh Phung, Trung Le,	(参考訳) アンサンブル学習の成功を裏付ける理論的・実証的な証拠は、長い間数多くあった。特にディープアンサンブルは、訓練のランダム性と個々のニューラルネットワークの表現力を利用して予測の多様性を獲得し、最終的により良い一般化、頑健性、不確実性推定をもたらす。一般化に関しては、より広い局所ミニマを追求すると、モデルはトレーニングセットとテストセットの間のシフトに対してより堅牢になることが知られている。この2つのアプローチから、アンサンブル学習とロスシャープネスの最小化を統合すれば一般化能力の向上が達成できるかどうかという自然な研究課題が生じる。本研究は,この関係を解明し,深層アンサンブルにおける多様性と平坦性を促進する学習アルゴリズムであるDASHを提案する。より具体的には、DASHは、基礎学習者が最小のシャープネスの低損失領域へ分岐することを奨励する。我々は,本手法の理論的バックボーンと,アンサンブルの一般化性の向上を示す広範な実証的証拠を提供する。 There has long been plenty of theoretical and empirical evidence supporting the success of ensemble learning. Deep ensembles in particular take advantage of training randomness and expressivity of individual neural networks to gain prediction diversity, ultimately leading to better generalization, robustness and uncertainty estimation. With respect to generalization, pursuing wider local minima has been found to yield models that are more robust to shifts between training and testing sets. A natural research question arises out of these two approaches as to whether a boost in generalization ability can be achieved if ensemble learning and loss sharpness minimization are integrated. Our work investigates this connection and proposes DASH, a learning algorithm that promotes diversity and flatness within deep ensembles. More concretely, DASH encourages base learners to move divergently towards low-loss regions of minimal sharpness. We provide a theoretical backbone for our method along with extensive empirical evidence demonstrating an improvement in ensemble generalizability.	翻訳日:2024-03-21 18:27:31 公開日:2024-03-19
# Earth Mover's Distanceによる深度誘導型NeRF訓練 Depth-guided NeRF Training via Earth Mover's Distance ( http://arxiv.org/abs/2403.13206v1 ) ライセンス: Link先を確認	Anita Rau, Josiah Aklilu, F. Christopher Holsinger, Serena Yeung-Levy,	(参考訳) ニューラルラジアンスフィールド(NeRF)は、予測された視点のレンダリング損失を最小限に抑えるために訓練される。しかし、測光損失は、同じ画像をもたらす複数の可能な形状を区別するための十分な情報を提供していないことが多い。そのため、これまでの研究は、NeRFトレーニング中に深度監視を取り入れており、事前訓練された深度ネットワークからの密な予測を擬似的な正解(ground truth)として活用している。これらの深度事前分布は、一度ノイズをフィルターすると完璧であると仮定されるが、実際には、その精度を捉えることはより困難である。この研究は、NeRF監視のための深度事前分布の不確実性に対する新しいアプローチを提案する。独自に訓練した深度事前分布や不確実性事前分布を使用する代わりに、既製の事前学習済み拡散モデルを用いて深度を予測し、デノイジング過程において不確実性を捉える。深度事前分布は誤差を含みやすいことがわかっているので、L2損失によって描画された深度を深度事前分布と正確に一致させるのではなく、Earth Mover's Distanceでレイの終端距離分布を監督することを提案する。我々の深度誘導型NeRFは、測光指標における性能を維持しながら、標準的な深度指標においてすべてのベースラインを大きなマージンで上回る。 Neural Radiance Fields (NeRFs) are trained to minimize the rendering loss of predicted viewpoints. However, the photometric loss often does not provide enough information to disambiguate between different possible geometries yielding the same image. Previous work has thus incorporated depth supervision during NeRF training, leveraging dense predictions from pre-trained depth networks as pseudo-ground truth. While these depth priors are assumed to be perfect once filtered for noise, in practice, their accuracy is more challenging to capture. This work proposes a novel approach to uncertainty in depth priors for NeRF supervision. Instead of using custom-trained depth or uncertainty priors, we use off-the-shelf pretrained diffusion models to predict depth and capture uncertainty during the denoising process. Because we know that depth priors are prone to errors, we propose to supervise the ray termination distance distribution with Earth Mover's Distance instead of enforcing the rendered depth to replicate the depth prior exactly through L2-loss. Our depth-guided NeRF outperforms all baselines on standard depth metrics by a large margin while maintaining performance on photometric measures.	翻訳日:2024-03-21 18:27:31 公開日:2024-03-19
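The contrast drawn in the abstract above, between supervising a whole ray termination distribution with Earth Mover's Distance and forcing a single rendered depth to match a prior through an L2 loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `emd_1d` and `l2_depth_loss`, the shared uniform depth bins, and the normalization step are all assumptions made for illustration. For two histograms on the same ordered bins, the 1D EMD reduces to the summed absolute difference of their cumulative distributions.

```python
from itertools import accumulate


def emd_1d(p, q, bin_width=1.0):
    """Earth Mover's Distance between two histograms on the same ordered bins.

    Assumption: both histograms share uniformly spaced depth bins and have
    positive total mass. Each is normalized to sum to 1 before comparison.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total_p, total_q = sum(p), sum(q)
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    # In 1D, the transport cost equals the area between the two CDFs.
    return bin_width * sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def l2_depth_loss(weights, depths, target_depth):
    """Squared error between the expected (rendered) depth and a depth prior."""
    rendered = sum(w * d for w, d in zip(weights, depths)) / sum(weights)
    return (rendered - target_depth) ** 2
```

Unlike the L2 loss, which only sees the mean of the termination weights, the EMD grows with how far probability mass has to move along the ray, so a distribution that is slightly shifted is penalized less than one that is far away.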
# マスケッド学習を用いたトランスフォーマを用いた感情認識 Emotion Recognition Using Transformers with Masked Learning ( http://arxiv.org/abs/2403.13731v1 ) ライセンス: Link先を確認	Seongjae Min, Junseok Yang, Sangjun Lim, Junyong Lee, Sangwon Lee, Sejoon Lim,	(参考訳) 近年、深層学習は、人間の感情や行動の分析など、様々な分野で革新的な進歩を遂げている。 ABAW(Affective Behavior Analysis in-the-Wild)コンペティションのようなイニシアチブは、複雑な感情状態の正確な評価を可能にする多様で挑戦的なデータセットを提供することによって、この分野の研究を促進する上で特に役立っている。本研究では、視覚変換器(ViT)とトランスフォーマー(Transformer)モデルを用いて、感情の肯定性と強さ、様々な表情の認識、基本的な筋運動を表すアクションユニット(AU)の検出に焦点をあてる。このアプローチは従来の畳み込みニューラルネットワーク(CNN)とLong Short-Term Memory(LSTM)ベースの手法を超越し、時間的および空間的特徴の理解を最大化する新しいTransformerベースのフレームワークを提案する。本研究のコアコントリビューションは,ランダムフレームマスキングによる学習手法の導入と,不均衡なデータに適応した焦点損失の適用,実世界の環境における感情と行動分析の正確性と適用性の向上である。このアプローチは、感情コンピューティングとディープラーニング方法論の進歩に寄与することが期待されている。 In recent years, deep learning has achieved innovative advancements in various fields, including the analysis of human emotions and behaviors. Initiatives such as the Affective Behavior Analysis in-the-wild (ABAW) competition have been particularly instrumental in driving research in this area by providing diverse and challenging datasets that enable precise evaluation of complex emotional states. This study leverages the Vision Transformer (ViT) and Transformer models to focus on the estimation of Valence-Arousal (VA), which signifies the positivity and intensity of emotions, recognition of various facial expressions, and detection of Action Units (AU) representing fundamental muscle movements. This approach transcends traditional Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) based methods, proposing a new Transformer-based framework that maximizes the understanding of temporal and spatial features. The core contributions of this research include the introduction of a learning technique through random frame masking and the application of Focal loss adapted for imbalanced data, enhancing the accuracy and applicability of emotion and behavior analysis in real-world settings. 
This approach is expected to contribute to the advancement of emotional computing and deep learning methodologies.	翻訳日:2024-03-21 16:08:57 公開日:2024-03-19
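The two training ingredients named in the abstract above, random frame masking and focal loss for imbalanced data, can be sketched as follows. This is a hedged illustration only: the abstract does not specify the masking ratio, the mask value, or the focal-loss hyperparameters, so `mask_ratio`, `mask_value`, `gamma` and `alpha` below are hypothetical parameters, and the focal loss is the standard per-sample form $-\alpha (1 - p_t)^{\gamma} \log p_t$.

```python
import math
import random


def focal_loss(prob_true_class, gamma=2.0, alpha=1.0):
    """Standard focal loss for one sample, given the predicted probability
    of the correct class. gamma=0 recovers weighted cross-entropy."""
    p = min(max(prob_true_class, 1e-12), 1.0)  # clamp to avoid log(0)
    return -alpha * (1.0 - p) ** gamma * math.log(p)


def mask_random_frames(frames, mask_ratio, mask_value=None, rng=None):
    """Replace a random subset of frames with mask_value.

    Assumption: masking whole frames uniformly at random; the paper's exact
    masking scheme is not described in the abstract.
    """
    rng = rng or random.Random()
    n_mask = int(round(mask_ratio * len(frames)))
    masked_idx = set(rng.sample(range(len(frames)), n_mask))
    return [mask_value if i in masked_idx else f for i, f in enumerate(frames)]
```

With `gamma > 0`, well-classified samples (probability close to 1) contribute much less to the loss than under plain cross-entropy, which is why the focal loss is commonly used when a few classes dominate the data.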
# パウリのチャネル学習に適応性は役に立たない Adaptivity is not helpful for Pauli channel learning ( http://arxiv.org/abs/2403.09033v3 ) ライセンス: Link先を確認	Xuan Du Trinh, Nengkun Yu,	(参考訳) このノートは、適応戦略が、絡み合った入力でPauliチャネルを学習し、テストするための追加の利点を提供していないことを示している。まず、一般ノルム$l_p$に対して、絡み合った入力を持つパウリチャネルを学習する際の厳密なクエリ複雑性を確立する。特に、$l_{1}$, $l_2$, $l_\infty$ノルムの複雑さは、文献の絡み合いを用いた以前の結果と比較して改善または整合する。 Pauliチャネルが$l_p$のホワイトノイズソースであるかどうかをテストするためのクエリの複雑さも解決します。さらに,誤差分布のエントロピーと非ゼロ確率のカウントを特徴とするPauliチャネルのノイズレベルを推定するクエリの複雑さが$\Theta(4^n/n)$であることを示す。さらに、$\Theta(4^n/n)$クエリは、2つのパウリチャネル間のダイヤモンドノルムを推定するのに十分である。 This note shows that adaptive strategies do not offer additional advantages for learning and testing Pauli channels with entangled input. First, the tight query complexity of learning Pauli channels with entangled input is established for the general norm $l_p$. In particular, the complexities for the $l_{1}$, $l_2$ and $l_\infty$ norms are improved or matched compared to previous results using entanglement in the literature. We also settle the query complexity to test if Pauli channels are white noise sources across $l_p$. Additionally, we demonstrate that the query complexity of estimating the noise level of a Pauli channel, characterized by the entropy of its error distribution and the count of non-zero probabilities, is $\Theta(4^n/n)$. Further, $\Theta(4^n/n)$ queries are sufficient to estimate the diamond norm between two Pauli channels.	翻訳日:2024-03-21 11:37:40 公開日:2024-03-19
# Open Stamped Parts Dataset Open Stamped Parts Dataset ( http://arxiv.org/abs/2403.10369v2 ) ライセンス: Link先を確認	Sarah Antiles, Sachin S. Talathi,	(参考訳) 自動車製造用のプレス加工された金属板の合成画像および実画像を特徴とするOpen Stamped Parts Dataset(OSPD)について述べる。実際の写真は7台のカメラから撮影され、7,980枚の未ラベル画像と1,680枚のラベル画像で構成されている。さらに, 穴の10%に合成マスクをオーバーレイすることで, 欠陥データセットをコンパイルした。合成データセットは、実際の製造環境を、照明とカメラに対する部品配置の観点から再現する。合成データは、訓練用画像7,980枚、検証用画像1,680枚、テスト用画像1,680枚を含み、それぞれにすべての穴を囲むバウンディングボックスとセグメンテーションマスクのアノテーションが付いている。合成データの穴の10%は、実際の画像データセットで生成された欠陥を模倣している。合成OSPDで穴検出モデルを訓練し,修正リコールスコア67.2%,適合率(precision)94.4%を達成した。我々は、自動車製造、より広範な機械学習およびコンピュータビジョンのコミュニティの研究者がOSPDを用いて、金属板プレス加工プロセスにおけるプレス穴の欠陥検出の最先端を推し進めることを期待している。データセットは https://tinyurl.com/hm6xatd7 でダウンロードできる。 We present the Open Stamped Parts Dataset (OSPD), featuring synthetic and real images of stamped metal sheets for auto manufacturing. The real part images, captured from 7 cameras, consist of 7,980 unlabeled images and 1,680 labeled images. In addition, we have compiled a defect dataset by overlaying synthetically generated masks on 10% of the holes. The synthetic dataset replicates the real manufacturing environment in terms of lighting and part placement relative to the cameras. The synthetic data includes 7,980 training images, 1,680 validation images and 1,680 test images, each with bounding box and segmentation mask annotations around all holes. 10% of the holes in the synthetic data mimic defects generated in the real image dataset. We trained a hole-detection model on the synthetic OSPD, achieving a modified recall score of 67.2% and a precision of 94.4%. We anticipate researchers in the auto manufacturing and broader machine learning and computer vision communities using OSPD to advance the state of the art in defect detection of stamped holes in the metal-sheet stamping process. The dataset is available for download at: https://tinyurl.com/hm6xatd7	翻訳日:2024-03-21 11:37:40 公開日:2024-03-19
# V2X-DGW: 悪天候条件下でのマルチエージェント知覚のためのドメイン一般化 V2X-DGW: Domain Generalization for Multi-agent Perception under Adverse Weather Conditions ( http://arxiv.org/abs/2403.11371v2 ) ライセンス: Link先を確認	Baolu Li, Jinlong Li, Xinyu Liu, Runsheng Xu, Zhengzhong Tu, Jiacheng Guo, Xiaopeng Li, Hongkai Yu,	(参考訳) 現在のLiDARベースのV2X(Vehicle-to-Everything)マルチエージェント認識システムは、3Dオブジェクト検出において大きな成功を収めている。これらのモデルは、訓練に用いたクリーンな天候下ではよく機能するが、現実のドメインギャップを伴う未知の悪天候条件下では性能が低下する。本稿では,悪天候下でのマルチエージェント認識システム上でのLiDARに基づく3次元物体検出のための領域一般化手法であるV2X-DGWを提案する。本研究は、クリーンな天候データのみで学習することで、クリーンな天候だけでなく未知の悪天候条件下でも良好なマルチエージェント性能を確保することを目的としている。この領域の研究を進めるために、我々は広く使われている2つのマルチエージェントデータセットに対する3つの悪天候条件の影響をシミュレートし、2つの新しいベンチマークデータセット、OPV2V-wとV2XSet-wを作成した。一般化のために,まずAdaptive Weather Augmentation(AWA)を導入して未知の悪天候条件を模倣し,次に一般化可能な表現学習のためにTWA(Trust-region Weather-invariant Alignment)とACA(Agent-aware Contrastive Alignment)の2つのアライメントを提案する。広範な実験結果は、我々のV2X-DGWが未知の悪天候条件下で性能向上を達成したことを示している。 Current LiDAR-based Vehicle-to-Everything (V2X) multi-agent perception systems have shown significant success in 3D object detection. While these models perform well in the clean weather they were trained on, they struggle in unseen adverse weather conditions with the real-world domain gap. In this paper, we propose a domain generalization approach, named V2X-DGW, for LiDAR-based 3D object detection on multi-agent perception systems under adverse weather conditions. Our research aims to ensure favorable multi-agent performance not only in clean weather but also in unseen adverse weather conditions, while learning only from clean-weather data. To advance research in this area, we have simulated the impact of three prevalent adverse weather conditions on two widely-used multi-agent datasets, resulting in the creation of two novel benchmark datasets: OPV2V-w and V2XSet-w. 
To this end, we first introduce the Adaptive Weather Augmentation (AWA) to mimic the unseen adverse weather conditions, and then propose two alignments for generalizable representation learning: Trust-region Weather-invariant Alignment (TWA) and Agent-aware Contrastive Alignment (ACA). Extensive experimental results demonstrate that our V2X-DGW achieved improvements in the unseen adverse weather conditions.	翻訳日:2024-03-21 11:32:23 公開日:2024-03-19
# 視聴覚感情模倣強度推定のための効率的な特徴抽出とレイトフュージョン戦略 Efficient Feature Extraction and Late Fusion Strategy for Audiovisual Emotional Mimicry Intensity Estimation ( http://arxiv.org/abs/2403.11757v2 ) ライセンス: Link先を確認	Jun Yu, Wangyuan Zhu, Jichao Zhu,	(参考訳) 本稿では,第6回Affective Behavior Analysis in-the-wild(ABAW)コンペティションの一部であるEmotional Mimicry Intensity(EMI)推定課題の解決法を提案する。EMI推定課題は,あらかじめ定義された感情カテゴリ(「Admiration」「Amusement」「Determination」「Empathic Pain」「Excitement」「Joy」)の集合からシード動画の感情強度を評価することを目的とする。この課題に対処するために,ビデオモダリティのためのResNet18とAUに基づいてリッチなデュアルチャネル視覚特徴を抽出し,オーディオモダリティのためのWav2Vec2.0に基づく効果的なシングルチャネル特徴を抽出した。これにより、視聴覚モダリティに対する包括的な感情的特徴が得られた。さらに、後期融合戦略を利用して視覚モデルと音響モデルの予測を平均化し、より正確な視聴覚感情模倣強度を推定した。実験の結果,検証セットにおける6つの感情次元の平均ピアソン相関係数($\rho$)は0.3288に達し,提案手法の有効性が確認された。 In this paper, we present our solution to the Emotional Mimicry Intensity (EMI) Estimation challenge, which is part of the 6th Affective Behavior Analysis in-the-wild (ABAW) Competition. The EMI Estimation challenge task aims to evaluate the emotional intensity of seed videos by assessing them from a set of predefined emotion categories (i.e., "Admiration", "Amusement", "Determination", "Empathic Pain", "Excitement" and "Joy"). To tackle this challenge, we extracted rich dual-channel visual features based on ResNet18 and AUs for the video modality and effective single-channel features based on Wav2Vec2.0 for the audio modality. This allowed us to obtain comprehensive emotional features for the audiovisual modality. Additionally, leveraging a late fusion strategy, we averaged the predictions of the visual and acoustic models, resulting in a more accurate estimation of audiovisual emotional mimicry intensity. Experimental results validate the effectiveness of our approach, with the average Pearson's correlation coefficient ($\rho$) across the 6 emotion dimensions on the validation set reaching 0.3288.	翻訳日:2024-03-21 11:32:23 公開日:2024-03-19
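The late fusion step and the evaluation metric described in the abstract above (averaging the visual and acoustic predictions, then reporting the mean Pearson correlation over the emotion dimensions) can be sketched as below. The function names and the list-of-lists data layout are assumptions for illustration; the feature extractors themselves (ResNet18, AUs, Wav2Vec2.0) are not reproduced here.

```python
import math


def late_fusion(visual_pred, audio_pred):
    """Average the per-dimension predictions of two models for one sample."""
    return [(v + a) / 2.0 for v, a in zip(visual_pred, audio_pred)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_pearson(preds, targets):
    """Mean Pearson correlation over emotion dimensions.

    preds and targets are lists of samples; each sample is a list with one
    score per emotion dimension (six in the EMI challenge).
    """
    dims = len(preds[0])
    per_dim = [
        pearson([p[d] for p in preds], [t[d] for t in targets])
        for d in range(dims)
    ]
    return sum(per_dim) / dims
```

Averaging is the simplest late fusion rule; it needs no extra training and treats both modalities as equally reliable, which is an assumption that a weighted average would relax.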
# 非負のニューラルネットワークの固定点 Fixed points of nonnegative neural networks ( http://arxiv.org/abs/2106.16239v8 ) ライセンス: Link先を確認	Tomasz J. Piotrowski, Renato L. G. Cavalcante, Mateusz Gabor,	(参考訳) 非負のベクトルを非負のベクトルにマッピングするニューラルネットワークとして定義する、非負のニューラルネットワークの解析に固定点理論を用いる。まず、非負の重みとバイアスを持つ非負のニューラルネットワークは、非線形ペロン・フロベニウス理論の枠組みの中で単調かつ(弱く)スケーラブルな写像として認識できることを示す。この事実により、同じ次元の入力と出力を持つ非負のニューラルネットワークの固定点の存在条件を提供することができ、これらの条件は凸解析の引数を用いて最近得られた条件よりも弱い。さらに、非負の重みとバイアスを持つ非負のニューラルネットワークの固定点集合の形状が間隔であり、穏やかな条件下では点に縮退することを示した。これらの結果は、より一般的な非負のニューラルネットワークの固定点の存在を得るために用いられる。実用的観点からは, オートエンコーダの挙動の理解に寄与し, 深層平衡モデルにおける今後の発展に有用な数学機械も提供する。 We use fixed point theory to analyze nonnegative neural networks, which we define as neural networks that map nonnegative vectors to nonnegative vectors. We first show that nonnegative neural networks with nonnegative weights and biases can be recognized as monotonic and (weakly) scalable mappings within the framework of nonlinear Perron-Frobenius theory. This fact enables us to provide conditions for the existence of fixed points of nonnegative neural networks having inputs and outputs of the same dimension, and these conditions are weaker than those recently obtained using arguments in convex analysis. Furthermore, we prove that the shape of the fixed point set of nonnegative neural networks with nonnegative weights and biases is an interval, which under mild conditions degenerates to a point. These results are then used to obtain the existence of fixed points of more general nonnegative neural networks. From a practical perspective, our results contribute to the understanding of the behavior of autoencoders, and we also offer valuable mathematical machinery for future developments in deep equilibrium models.	翻訳日:2024-03-21 02:10:44 公開日:2024-03-19
# 強化学習によるヒューマンシークエンシャル意思決定の改善 Improving Human Sequential Decision-Making with Reinforcement Learning ( http://arxiv.org/abs/2108.08454v5 ) ライセンス: Link先を確認	Hamsa Bastani, Osbert Bastani, Wichinpong Park Sinchaisri,	(参考訳) 労働者は良い決断をする方法を学ぶのにかなりの時間を費やします。しかし、ある決定の有効性を評価することは、複雑である可能性がある。例えば、決定結果はしばしば長期的であり、複雑な方法で元の決定と関係する。驚くべきことに、優れた意思決定戦略を学ぶことは難しいが、単純で簡潔な形で表現されることも多い。逐次的意思決定に着目し,トレースデータから"ベストプラクティス"を抽出し,その洞察を解釈可能な"チップ"の形で人間に伝達する,新しい機械学習アルゴリズムを設計する。提案アルゴリズムは, 作業者による行動と最適方針による行動のギャップを, より高い性能を達成するためにどの行動が適切であるかを考慮し, 最善を尽くすためのヒントを選択する。本手法は,参加者が仮想キッチンを管理するランダム化制御実験を通じて評価する。実験の結果,アルゴリズムによって生成されたヒントは,直感的なベースラインに対して人的パフォーマンスを著しく向上させることができることがわかった。さらに,人間-AIインタフェースを意図したアルゴリズムの設計を支援するための実証的な知見をいくつか紹介する。例えば、参加者は単にヒントに盲目的に従うのではなく、自分たちの経験と組み合わせて、パフォーマンスを改善するための追加の戦略を発見するのです。 Workers spend a significant amount of time learning how to make good decisions. Evaluating the efficacy of a given decision, however, can be complicated -- e.g., decision outcomes are often long-term and relate to the original decision in complex ways. Surprisingly, even though learning good decision-making strategies is difficult, they can often be expressed in simple and concise forms. Focusing on sequential decision-making, we design a novel machine learning algorithm that is capable of extracting "best practices" from trace data and conveying its insights to humans in the form of interpretable "tips". Our algorithm selects the tip that best bridges the gap between the actions taken by human workers and those taken by the optimal policy in a way that accounts for which actions are consequential for achieving higher performance. We evaluate our approach through a series of randomized controlled experiments where participants manage a virtual kitchen. Our experiments show that the tips generated by our algorithm can significantly improve human performance relative to intuitive baselines. 
In addition, we discuss a number of empirical insights that can help inform the design of algorithms intended for human-AI interfaces. For instance, we find evidence that participants do not simply blindly follow our tips; instead, they combine them with their own experience to discover additional strategies for improving performance.	翻訳日:2024-03-21 02:10:44 公開日:2024-03-19
# ソフトウェア工学実験における金融インセンティブの異なる手法を用いた実験室実験 A Laboratory Experiment on Using Different Financial-Incentivization Schemes in Software-Engineering Experimentation ( http://arxiv.org/abs/2202.10985v6 ) ライセンス: Link先を確認	Dmitri Bershadskyy, Jacob Krüger, Gül Çalıklı, Siegmar Otto, Sarah Zabel, Jannik Greif, Robert Heyer,	(参考訳) ソフトウェア工学の研究では、多くの経験的研究がオープンソースや業界開発者によって行われている。しかし、経済学や心理学のような他の研究コミュニティとは対照的に、参加者の行動を動機づけ、パフォーマンスに報いる戦略として金銭的インセンティブ(すなわち、お金を払うこと)を使用する実験はごくわずかである。最新のSIGSOFT Empirical Standardsでは、調査への参加の増加のためだけに、実際のモチベーションや実験の振る舞いを模倣するためではなく、支払いについて言及している。本稿では、金融インセンティブの異なるスキームが開発者に与える影響を研究することによって、このギャップに対処する制御実験を報告する。そこで我々はまず,(1)従業員が好むパフォーマンス依存型スキーム,(2)パフォーマンス非依存型スキーム,(3)オープンソース開発を模倣するスキームの3つのインセンティブを設計した実世界の金融インセンティブに関する調査を行った。そして,これらの3つのスキームが参加者のパフォーマンスに与える影響について検討した。提案手法は,ソフトウェア工学実験における参加者のパフォーマンスに影響を及ぼす可能性が示唆された。サンプルサイズが小さいため、統計的に有意ではないが、それでも明らかな傾向が観察できる。私たちのコントリビューションは、ファイナンシャルインセンティブが実験参加者や実世界のシナリオに与える影響を理解し、研究者が実験を設計し、開発者を補償する組織を指導する上で役立ちます。 In software-engineering research, many empirical studies are conducted with open-source or industry developers. However, in contrast to other research communities like economics or psychology, only few experiments use financial incentives (i.e., paying money) as a strategy to motivate participants' behavior and reward their performance. The most recent version of the SIGSOFT Empirical Standards mentions payouts only for increasing participation in surveys, but not for mimicking real-world motivations and behavior in experiments. Within this article, we report a controlled experiment in which we tackled this gap by studying how different financial incentivization schemes impact developers. For this purpose, we first conducted a survey on financial incentives used in the real-world, based on which we designed three incentivization schemes: (1) a performance-dependent scheme that employees prefer, (2) a scheme that is performance-independent, and (3) a scheme that mimics open-source development. 
Then, using a between-subject experimental design, we explored how these three schemes impact participants' performance. Our findings indicate that the different schemes can impact participants' performance in software-engineering experiments. Due to the small sample sizes, our results are not statistically significant, but we can still observe clear tendencies. Our contributions help understand the impact of financial incentives on participants in experiments as well as real-world scenarios, guiding researchers in designing experiments and organizations in compensating developers.	翻訳日:2024-03-21 02:10:44 公開日:2024-03-19
# GCT:半教師付きFew-Shot学習のためのグラフ共同学習 GCT: Graph Co-Training for Semi-Supervised Few-Shot Learning ( http://arxiv.org/abs/2203.07738v4 ) ライセンス: Link先を確認	Rui Xu, Lei Xing, Shuai Shao, Lifei Zhao, Baodi Liu, Weifeng Liu, Yicong Zhou,	(参考訳) 近年,データスカース問題の解決を目的としたFSL (Few-shot Learning) が注目されている。一般的なFSLフレームワークには2つのフェーズがある。 i)プレトレインフェーズは、ベースデータを用いてCNNベースの特徴抽出器を訓練する。 (II) メタテストフェーズでは, 凍結特徴抽出器を新しいデータに適用し, 認識のための分類器を設計する。少ないショットデータ分布を補正するために、研究者はラベルなしデータを導入してSSFSL(Semi-Supervised Few-Shot Learning)を提案する。 SSFSL は FSL コミュニティにおいて優れた性能を発揮することが証明されているが、未学習の特徴抽出器はクロスカテゴリ設定のため、新しいデータに不完全に対応できないという根本的な問題がある。通常、新しい特徴に大量のノイズが導入される。 FEM(Feature-Extractor-Maladaptive)問題と呼ぶ。 FEMに取り組むために,本稿では2つの取り組みを行う。まず,新しいラベル予測手法である孤立グラフ学習(IGL)を提案する。 IGLは、生データをグラフ空間にエンコードするラプラシアン演算子を導入し、分類時の特徴への依存を減らし、予測のためにラベル空間にグラフ表現を投影する。 IGLは特徴表現の観点からノイズの負の影響を弱めることができ、SSFSLに適した独立した訓練や試験手順にも柔軟である。第2に,提案したIGLを協調学習フレームワークに拡張することにより,マルチモーダル融合の観点から,この課題に対処するグラフコレーニング(GCT)を提案する。 GCTは、IGL分類器を横断的に強化するために、2つのモーダル特徴を持つラベル付きサンプルを利用する半教師付き手法である。 Few-shot learning (FSL), purposing to resolve the problem of data-scarce, has attracted considerable attention in recent years. A popular FSL framework contains two phases: (i) the pre-train phase employs the base data to train a CNN-based feature extractor. (ii) the meta-test phase applies the frozen feature extractor to novel data (novel data has different categories from base data) and designs a classifier for recognition. To correct few-shot data distribution, researchers propose Semi-Supervised Few-Shot Learning (SSFSL) by introducing unlabeled data. Although SSFSL has been proved to achieve outstanding performances in the FSL community, there still exists a fundamental problem: the pre-trained feature extractor can not adapt to the novel data flawlessly due to the cross-category setting. Usually, large amounts of noises are introduced to the novel feature. We dub it as Feature-Extractor-Maladaptive (FEM) problem. To tackle FEM, we make two efforts in this paper. 
First, we propose a novel label prediction method, Isolated Graph Learning (IGL). IGL introduces the Laplacian operator to encode the raw data into graph space, which helps reduce the dependence on features when classifying, and then projects the graph representation to label space for prediction. The key point is that IGL can weaken the negative influence of noise from the feature representation perspective, and is also flexible enough to complete the training and testing procedures independently, which makes it suitable for SSFSL. Second, we propose Graph Co-Training (GCT) to tackle this challenge from a multi-modal fusion perspective by extending the proposed IGL to the co-training framework. GCT is a semi-supervised method that exploits the unlabeled samples with two modal features to mutually strengthen the IGL classifiers.	翻訳日:2024-03-21 02:10:44 公開日:2024-03-19
# 遠位両腕ロボット操作のためのゴール条件付きデュアルアクション模倣学習 Goal-conditioned dual-action imitation learning for dexterous dual-arm robot manipulation ( http://arxiv.org/abs/2203.09749v2 ) ライセンス: Link先を確認	Heecheol Kim, Yoshiyuki Ohmura, Yasuo Kuniyoshi,	(参考訳) バナナの皮剥きなどの変形可能な物体の長い水平なデキスタラスロボット操作は、物体モデリングの難しさと安定的でデキスタラスな操作スキルに関する知識の欠如から問題となる課題である。本稿では、人間の実演データを用いて、巧妙な操作スキルを学習できる、GC-DA(Deep mimicion Learning)アプローチを提案する。従来のDIL法は、現在の感覚入力と反応動作をマッピングするが、これはしばしば、繰り返し発生する動作の計算による模倣学習における複合的なエラーのために失敗する。この方法は、対象物の正確な操作が必要な場合(局所動作)にのみ反応作用を予測し、正確な操作が不要な場合(グローバル動作)に全軌道を生成する。この二重作用定式化は、反応局所作用中の対象物体の予期せぬ変化に応答しながら、軌道に基づく大域作用を用いた模倣学習における複合的誤りを効果的に防止する。提案手法は実物のデュアルアームロボットを用いて試験し,バナナピーリング作業の達成に成功した。 Long-horizon dexterous robot manipulation of deformable objects, such as banana peeling, is a problematic task because of the difficulties in object modeling and a lack of knowledge about stable and dexterous manipulation skills. This paper presents a goal-conditioned dual-action (GC-DA) deep imitation learning (DIL) approach that can learn dexterous manipulation skills using human demonstration data. Previous DIL methods map the current sensory input and reactive action, which often fails because of compounding errors in imitation learning caused by the recurrent computation of actions. The method predicts reactive action only when the precise manipulation of the target object is required (local action) and generates the entire trajectory when precise manipulation is not required (global action). This dual-action formulation effectively prevents compounding error in the imitation learning using the trajectory-based global action while responding to unexpected changes in the target object during the reactive local action. The proposed method was tested in a real dual-arm robot and successfully accomplished the banana-peeling task.	翻訳日:2024-03-21 02:10:44 公開日:2024-03-19
# 双相最適化による超低レイテンシANN-SNN変換の無損失化 Towards Lossless ANN-SNN Conversion under Ultra-Low Latency with Dual-Phase Optimization ( http://arxiv.org/abs/2205.07473v3 ) ライセンス: Link先を確認	Ziming Wang, Shuang Lian, Yuhao Zhang, Xiaoxin Cui, Rui Yan, Huajin Tang,	(参考訳) 非同期離散イベントで動作するスパイキングニューラルネットワーク(SNN)は、スパース計算によるエネルギー効率の向上を示す。ディープSNNを実装するための一般的なアプローチは、ANNの効率的なトレーニングとSNNの効率的な推論を組み合わせたANN-SNN変換である。しかし、特に少数のタイムステップでは精度の損失は通常無視できず、レイテンシに敏感なエッジデバイスへのSNNの応用を著しく制限する。本稿ではまず,このような性能劣化がSNNにおける負またはオーバーフローした残留膜電位の誤表現に起因することを明らかにする。そこで我々は,変換誤差を量子化誤差,クリッピング誤差,残留膜電位表現誤差の3つの部分に分解した。この知見に基づき,これらの誤差をそれぞれ最小化するための2段階変換アルゴリズムを提案する。さらに,各ステージが相補的な方法で大きなパフォーマンス向上を達成することを示す。提案手法は, CIFAR-10, CIFAR-100, ImageNetなどの挑戦的データセットを用いて, 精度, レイテンシ, エネルギー保存の観点から, 最先端の性能を示す。さらに,より困難な物体検出タスクでも本手法を評価し,既存のスパイクベースの検出アルゴリズムと比較して,超低レイテンシ下での回帰性能が顕著に向上することを示した。コードはhttps://github.com/Windere/snn-cvt-dual-phaseで公開されている。 Spiking neural networks (SNNs) operating with asynchronous discrete events show higher energy efficiency with sparse computation. A popular approach for implementing deep SNNs is ANN-SNN conversion combining both efficient training of ANNs and efficient inference of SNNs. However, the accuracy loss is usually non-negligible, especially with only a few time steps, which greatly restricts the application of SNNs on latency-sensitive edge devices. In this paper, we first identify that such performance degradation stems from the misrepresentation of the negative or overflow residual membrane potential in SNNs. Inspired by this, we decompose the conversion error into three parts: quantization error, clipping error, and residual membrane potential representation error. With such insights, we propose a two-stage conversion algorithm to minimize each of those errors. We also show that each stage achieves significant performance gains in a complementary manner. 
By evaluating on challenging datasets including CIFAR-10, CIFAR-100 and ImageNet, the proposed method demonstrates state-of-the-art performance in terms of accuracy, latency and energy preservation. Furthermore, our method is evaluated using a more challenging object detection task, revealing notable gains in regression performance under ultra-low latency when compared to existing spike-based detection algorithms. Code is available at https://github.com/Windere/snn-cvt-dual-phase.	翻訳日:2024-03-21 02:10:44 公開日:2024-03-19
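The three error sources named in the abstract above (quantization, clipping, and residual membrane potential) can be made concrete with a textbook integrate-and-fire neuron that resets by subtraction. This is a sketch under stated assumptions, not the paper's two-stage algorithm: a constant input per time step, zero initial membrane potential, and the function names `if_neuron_rate` and `quantized_clipped_activation` are all chosen for illustration.

```python
import math


def if_neuron_rate(x, threshold=1.0, steps=4):
    """Simulate an integrate-and-fire neuron with reset-by-subtraction.

    Returns (rate-coded output, residual membrane potential), where the
    output is spike_count * threshold / steps.
    """
    v = 0.0
    spikes = 0
    for _ in range(steps):
        v += x                 # integrate the constant input
        if v >= threshold:     # fire at most once per time step
            v -= threshold     # soft reset keeps the remainder
            spikes += 1
    return spikes * threshold / steps, v


def quantized_clipped_activation(x, threshold=1.0, steps=4):
    """ANN-side counterpart: clip to [0, threshold] and quantize to `steps` levels."""
    level = math.floor(x * steps / threshold)
    level = min(max(level, 0), steps)
    return level * threshold / steps
```

For this simple setting the spiking output matches the clipped, quantized activation; inputs above the threshold leave an overflowing positive residual potential and negative inputs leave a negative one, neither of which the spike count can represent with so few steps.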
# 普遍リンドブラッド方程式から得られる定常状態の精度の定量化 Quantifying the accuracy of steady states obtained from the Universal Lindblad Equation ( http://arxiv.org/abs/2206.02917v2 ) ライセンス: Link先を確認	Frederik Nathan, Mark S. Rudner,	(参考訳) 普遍リンドブラッド方程式(ULE)により予測される定常予測値は,有効系-バス結合と線形にスケールする有界補正まで正確に,$\Gamma$(顕微鏡結合の2次)であることを示す。また、ULEの導出時に用いられる準局所的な準局所的な「メモリ・ドレッシング」変換を同定し、その逆は、安定状態値自体が$\Gamma$でゼロにスケールする非平衡電流であっても、総じて$\Gamma$で0にスケールするオブザーバブルの相対偏差を達成するために適用することができる。この結果はリンドブラッド方程式の精度に関する最近特定された制限に対する解となり、保存された量の電流における大きな相対誤差の可能性を強調した。この変換により、弱結合状態における電流の高忠実度計算が可能となり、リンドブラッド形式マスター方程式の安定性と物理性を保ちながら、熱力学的一貫性と局所保存則が保証される。 We show that steady-state expectation values predicted by the universal Lindblad equation (ULE) are accurate up to bounded corrections that scale linearly with the effective system-bath coupling, $\Gamma$ (second order in the microscopic coupling). We also identify a near-identity, quasilocal "memory-dressing" transformation, used during the derivation of the ULE, whose inverse can be applied to achieve relative deviations of observables that generically scale to zero with $\Gamma$, even for nonequilibrium currents whose steady-state values themselves scale to zero with $\Gamma$. This result provides a solution to recently identified limitations on the accuracy of Lindblad equations, which highlighted a potential for significant relative errors in currents of conserved quantities. The transformation we identify allows for high-fidelity computation of currents in the weak-coupling regime, ensuring thermodynamic consistency and local conservation laws, while retaining the stability and physicality of a Lindblad-form master equation.	翻訳日:2024-03-21 02:10:44 公開日:2024-03-19
# フロケット回路のトポロジカル欠陥 Topological Defects in Floquet Circuits ( http://arxiv.org/abs/2206.06272v3 ) ライセンス: Link先を確認	Mao Tian Tan, Yifan Wang, Aditi Mitra,	(参考訳) トポロジカルな欠陥を持つ駆動Ising鎖を記述したFloquet回路を導入する。対応するゲートはスピンを反転する欠陥と、クラマース・ワニエ双対変換を明示的に実装する双対性欠陥を含む。フロケユニタリ進化作用素はそのような欠陥で可換であるが、双対性欠陥は半分の状態を射出するのでユニタリではない。これらの欠陥の応用は2つあります。 1つは、システムの周りに広がる「空間的」欠陥の存在下での戻り振幅を分析することである。我々は、返却振幅が欠陥の融合規則に合致していることを明確に検証する。第二の応用は、反周期的・双対的境界条件を実装する「時間的」欠陥の存在下でのユニタリ進化を研究することである。後者の場合、単一の未ペアローカライズされたMajorana 0 モードが現れることを示す。我々は、このFloquet回路の対称性として機能する演算子を明示的に構成する。また, 複数箇所のシステムに対して, 一つの時間ステップで絡み合いエントロピーの解析式を, 上記のすべての欠陥構成に対して提示する。 We introduce a Floquet circuit describing the driven Ising chain with topological defects. The corresponding gates include a defect that flips spins as well as the duality defect that explicitly implements the Kramers-Wannier duality transformation. The Floquet unitary evolution operator commutes with such defects, but the duality defect is not unitary, as it projects out half the states. We give two applications of these defects. One is to analyze the return amplitudes in the presence of "space-like" defects stretching around the system. We verify explicitly that the return amplitudes are in agreement with the fusion rules of the defects. The second application is to study unitary evolution in the presence of "time-like" defects that implement anti-periodic and duality-twisted boundary conditions. We show that a single unpaired localized Majorana zero mode appears in the latter case. We explicitly construct this operator, which acts as a symmetry of this Floquet circuit. We also present analytic expressions for the entanglement entropy after a single time step for a system of a few sites, for all of the above defect configurations.	翻訳日:2024-03-21 02:10:44 公開日:2024-03-19
# ネスト合成最適化のためのリーマン確率勾配法 Riemannian Stochastic Gradient Method for Nested Composition Optimization ( http://arxiv.org/abs/2207.09350v2 ) ライセンス: Link先を確認	Dewei Zhang, Sam Davanloo Tajbakhsh,	(参考訳) この研究は、各函数が期待を含むリーマン多様体上のネスト形式の函数の構成の最適化を考える。このような問題は、強化学習における政策評価やメタ学習におけるモデルカスタマイズといった応用において人気が高まっている。非合成最適化のためのリーマン確率勾配法は、内部関数の確率近似が外函数の勾配にバイアスを与えるものとして直接適用できない。 2段階の組成最適化のために、およそ定常点を求めるリーマン確率組成勾配 (R-SCGD) 法を提案し、期待される2乗リーマン勾配が$\epsilon$, in $O(\epsilon^{-2})$ である。さらに,多層ネスト構成構造問題に対するR-SCGDアルゴリズムを,一階確率オラクルに対して$O(\epsilon^{-2})$と同じ複雑さで一般化する。最後に、強化学習における政策評価問題に対して、R-SCGD法の性能を数値的に評価する。 This work considers optimization of composition of functions in a nested form over Riemannian manifolds where each function contains an expectation. This type of problems is gaining popularity in applications such as policy evaluation in reinforcement learning or model customization in meta-learning. The standard Riemannian stochastic gradient methods for non-compositional optimization cannot be directly applied as stochastic approximation of inner functions create bias in the gradients of the outer functions. For two-level composition optimization, we present a Riemannian Stochastic Composition Gradient Descent (R-SCGD) method that finds an approximate stationary point, with expected squared Riemannian gradient smaller than $\epsilon$, in $O(\epsilon^{-2})$ calls to the stochastic gradient oracle of the outer function and stochastic function and gradient oracles of the inner function. Furthermore, we generalize the R-SCGD algorithms for problems with multi-level nested compositional structures, with the same complexity of $O(\epsilon^{-2})$ for the first-order stochastic oracle. Finally, the performance of the R-SCGD method is numerically evaluated over a policy evaluation problem in reinforcement learning.	翻訳日:2024-03-21 02:00:54 公開日:2024-03-19
# セキュアな同型解析のための検証可能なエンコーディング Verifiable Encodings for Secure Homomorphic Analytics ( http://arxiv.org/abs/2207.14071v3 ) ライセンス: Link先を確認	Sylvain Chatel, Christian Knabenhans, Apostolos Pyrgelis, Carmela Troncoso, Jean-Pierre Hubaux,	(参考訳) 暗号文上での算術演算の直接実行を可能にする同型暗号化は、機密データ上でのクラウドデリゲート計算のプライバシを保護するための有望なソリューションである。しかし、計算結果の正確性は保証されない。本稿では,異なるトレードオフの下で,暗号化アルゴリズムの特徴を損なうことなく,クラウドベースの同型計算の実用的なクライアント検証を可能にする2つの誤り検出符号化とビルド認証手法を提案する。我々の認証装置は、整数上の完全同型暗号スキームに基づいて、トレンドリング学習を演算する。我々は,暗号化されたデータ上で実行されたアウトソース計算の検証システムであるVERITASにソリューションを実装した。従来の作業とは対照的に、VERITASは任意のホモモルフィック動作の検証をサポートしており、ライドシェアリング、ゲノムデータ分析、暗号化検索、機械学習のトレーニングと推論など、様々な応用にその実用性を実証している。 Homomorphic encryption, which enables the execution of arithmetic operations directly on ciphertexts, is a promising solution for protecting privacy of cloud-delegated computations on sensitive data. However, the correctness of the computation result is not ensured. We propose two error detection encodings and build authenticators that enable practical client-verification of cloud-based homomorphic computations under different trade-offs and without compromising on the features of the encryption algorithm. Our authenticators operate on top of trending ring learning with errors based fully homomorphic encryption schemes over the integers. We implement our solution in VERITAS, a ready-to-use system for verification of outsourced computations executed over encrypted data. We show that contrary to prior work VERITAS supports verification of any homomorphic operation and we demonstrate its practicality for various applications, such as ride-hailing, genomic-data analysis, encrypted search, and machine-learning training and inference.	翻訳日:2024-03-21 02:00:54 公開日:2024-03-19
# 長波長光ファイバー干渉計の限界と展望 Limits and prospects for long-baseline optical fiber interferometry ( http://arxiv.org/abs/2208.09247v2 ) ライセンス: Link先を確認	Christopher Hilweg, Danial Shadmany, Philip Walther, Nergis Mavalvala, Vivishek Sudhir,	(参考訳) 今日の最も精密な光学機器(重力波干渉計と光学原子時計)は、光子の正確な感度を実現するために長い貯蔵時間に依存している。光ファイバー技術は、長距離光伝搬を実現するための最も広く利用されているプラットフォームである。しかし、精密光学測定への応用は少ない。本稿では,従来の(ソリッドコア)光ファイバのノイズ特性について,高精度な光学計測と長距離情報伝達に依存する量子技術の観点から概観する。そうすることで、このプラットフォームの限界を強調し、構造化ファイバー技術がこれらの制限を克服する機会を指摘する。 Today's most precise optical instruments -- gravitational-wave interferometers and optical atomic clocks -- rely on long storage times for photons to realize their exquisite sensitivity. Optical fiber technology is the most widely deployed platform for realizing long-distance optical propagation. Yet, their application to precision optical measurements is sparse. We review the state-of-the-art in the noise performance of conventional (solid-core) optical fibers from the perspective of precision optical measurements and quantum technology that rely on precise transfer of information over long distances. In doing so, we highlight the limitations of this platform and point to the opportunities that structured fiber technology offers to overcome some of these limitations.	翻訳日:2024-03-21 02:00:54 公開日:2024-03-19
# 深層学習から見たアライメント問題 The Alignment Problem from a Deep Learning Perspective ( http://arxiv.org/abs/2209.00626v6 ) ライセンス: Link先を確認	Richard Ngo, Lawrence Chan, Sören Mindermann,	(参考訳) 今後数年や数十年で、汎用人工知能(AGI)は多くの重要なタスクにおいて人間の能力を上回る可能性がある。我々は、それを防ぐためのかなりの努力がなければ、AGIは人間の利益と矛盾している(すなわち、不整合な)目標を追求することを学ぶ可能性があると論じる。現代の最も有能なモデルのように訓練された場合、AGIは、より高い報酬を得るために欺瞞的に行動することを学び、微調整の分布を超えて一般化する、内部的に表現された不整合な目標を学習し、権力追求戦略を用いてそれらの目標を追求する可能性がある。これらの特性の新たな証拠をレビューする。これらの特性を持つAGIは整合させることが困難であり、整合していない場合でも整合しているように見える可能性がある。最後に、不整合AGIの展開が人類の世界のコントロールを不可逆的に損なう可能性について概説し、この結果の防止を目的とした研究の方向性についてレビューする。 In coming years or decades, artificial general intelligence (AGI) may surpass human capabilities at many critical tasks. We argue that, without substantial effort to prevent it, AGIs could learn to pursue goals that are in conflict (i.e. misaligned) with human interests. If trained like today's most capable models, AGIs could learn to act deceptively to receive higher reward, learn misaligned internally-represented goals which generalize beyond their fine-tuning distributions, and pursue those goals using power-seeking strategies. We review emerging evidence for these properties. AGIs with these properties would be difficult to align and may appear aligned even when they are not. Finally, we briefly outline how the deployment of misaligned AGIs might irreversibly undermine human control over the world, and we review research directions aimed at preventing this outcome.	翻訳日:2024-03-21 02:00:54 公開日:2024-03-19
# パラメトリックダウン・コンバージョンにおける横ラゲア・ガウスモードのスペクトル特性 Spectral Properties of Transverse Laguerre-Gauss Modes in Parametric Down-Conversion ( http://arxiv.org/abs/2209.01913v2 ) ライセンス: Link先を確認	Carlos Sevilla-Gutiérrez, Varun Raj Kaipalath, Baghdasar Baghdasaryan, Markus Gräfe, Stephan Fritzsche, Fabian Steinlechner,	(参考訳) パラメトリックダウンコンバージョン(PDC)発光円錐の最初のカラー写真は、プロセス中の縦-横運動量の相関、すなわちPDC光子の波長依存性放出角を示す。しかしながら、現在の実験と応用は離散モード集合の観点からより便利に記述されており、最も適切な選択は実験環境の伝播対称性に依存する。顕著なことに、PDC源を用いた実験は、例えば明るさや状態の忠実度の観点からは、より要求が強くなっているにもかかわらず、離散モード分解の場合のパラメトリックダウンコンバージョンにおけるスペクトル-空間結合の記述は、いまだに解明されていない。本稿では,パラメトリックダウンコンバージョンにおけるラゲール・ガウスモードのスペクトル依存性について,理論的および実験的に包括的に研究する。さらに、よく知られた軌道角運動量エンタングルメントの純度を調整するために、スペクトルと空間のカップリングをどのように利用できるかを示す。この研究は、横方向の単一モードにおける絡み合った光子の効率的な収集、量子イメージング、高次元量子情報処理のための工学的純粋状態に影響を及ぼす。 The first color photos of the parametric down-conversion (PDC) emission cone illustrate the correlation of longitudinal- and transverse momentum in the process, i.e., wavelength-dependent emission angle of PDC photons. However, current experiments and applications are more conveniently described in terms of discrete mode sets, with the most suitable choice depending on the propagation symmetries of the experimental setting. Remarkably, despite the fact that experiments with PDC sources are becoming ever more demanding, e.g. in terms of brightness or state fidelity, a description of spectral-spatial coupling in parametric downconversion for the case of discrete modal decompositions remains elusive. We present a comprehensive study, in theory and experiment, of the spectral dependence of the transverse Laguerre-Gauss modes in parametric downconversion. Moreover, we show how the spectral and spatial coupling can be harnessed to tune the purity of the well-known orbital angular momentum entanglement. This work has implications for efficient collection of entangled photons in a transverse single mode, quantum imaging, and engineering pure states for high-dimensional quantum information processing.	
翻訳日:2024-03-21 02:00:54 公開日:2024-03-19
# 多目的学習におけるグラディエントバイアスの緩和:確率論的アプローチ Mitigating Gradient Bias in Multi-objective Learning: A Provably Convergent Stochastic Approach ( http://arxiv.org/abs/2210.12624v2 ) ライセンス: Link先を確認	Heshan Fernando, Han Shen, Miao Liu, Subhajit Chaudhury, Keerthiram Murugesan, Tianyi Chen,	(参考訳) 複数の目的関数を持つ機械学習の問題は、公正性や安全性、正確性といった複数のパフォーマンス指標間のトレードオフを学習する複数の基準で学習する場合や、複数のタスクを共同で最適化したマルチタスク学習において、それら間で帰納的バイアスを共有する場合のどちらかに現れます。この問題は多目的最適化フレームワークによってしばしば解決される。しかし、既存の確率的多目的勾配法とその変種(MGDA、PCGrad、CAGradなど)は、いずれもバイアス付き雑音勾配方向を採用しており、経験的性能が劣化する。そこで我々は,多目的最適化のための確率的多目的勾配補正法(MoCo)を開発した。本手法の特徴は,非凸設定においてもバッチサイズを増大させることなく収束を保証できる点である。マルチタスク指導および強化学習のシミュレーションは,最先端手法と比較して,本手法の有効性を実証する。 Machine learning problems with multiple objective functions appear either in learning with multiple criteria where learning has to make a trade-off between multiple performance metrics such as fairness, safety and accuracy; or, in multi-task learning where multiple tasks are optimized jointly, sharing inductive bias between them. This problems are often tackled by the multi-objective optimization framework. However, existing stochastic multi-objective gradient methods and its variants (e.g., MGDA, PCGrad, CAGrad, etc.) all adopt a biased noisy gradient direction, which leads to degraded empirical performance. To this end, we develop a stochastic Multi-objective gradient Correction (MoCo) method for multi-objective optimization. The unique feature of our method is that it can guarantee convergence without increasing the batch size even in the non-convex setting. Simulations on multi-task supervised and reinforcement learning demonstrate the effectiveness of our method relative to state-of-the-art methods.	翻訳日:2024-03-21 02:00:54 公開日:2024-03-19
# SFPDML:MKTFHEに基づくシーカライズと高速なプライバシ保護型分散機械学習 SFPDML: Securer and Faster Privacy-Preserving Distributed Machine Learning based on MKTFHE ( http://arxiv.org/abs/2211.09353v2 ) ライセンス: Link先を確認	Hongxiao Wang, Zoe L. Jiang, Yanmin Zhao, Siu-Ming Yiu, Peng Yang, Man Chen, Zejiu Tan, Bohan Jin,	(参考訳) 近年,分散機械学習が注目されている。しかし、プライバシーはこの分野における未解決の問題であり続けている。 MKTFHE(Multi-key homomorphic encryption over torus)は、この問題に対処するための有望な候補の1つである。それでも、MKTFHEの解読にはセキュリティ上のリスクがある可能性がある。さらに、我々の知る限り、MKTFHEに関する最新の研究は、Sigmoidのような非線形関数を直接計算できないブール演算と線形演算しかサポートしていない。したがって、ロジスティック回帰やニューラルネットワークなどの一般的な機械学習を高性能に実行することは依然として困難である。本稿では,MKTFHEの既存の分散復号化プロトコルに対する攻撃の可能性を最初に発見し,その後秘密共有を導入し,セキュアな復号化を提案する。次に,emph{homogenizer} と \emph{compare quads} を用いて,新しい MKTFHE 対応活性化関数を設計する。最後に,MKTFHEにおけるロジスティック回帰とニューラルネットワークトレーニングの実装に活用する。 Sigmoid のTaylor 多項式を活性化関数として使用する場合の効率と精度を比較すると、我々の関数の効率は7次Taylor 多項式を直線的に使用する場合の10倍高く、訓練モデルの精度は高次多項式を活性化関数スキームとして使用する場合と似ている。 In recent years, distributed machine learning has garnered significant attention. However, privacy continues to be an unresolved issue within this field. Multi-key homomorphic encryption over torus (MKTFHE) is one of the promising candidates for addressing this concern. Nevertheless, there may be security risks in the decryption of MKTFHE. Moreover, to our best known, the latest works about MKTFHE only support Boolean operation and linear operation which cannot directly compute the non-linear function like Sigmoid. Therefore, it is still hard to perform common machine learning such as logistic regression and neural networks in high performance. In this paper, we first discover a possible attack on the existing distributed decryption protocol for MKTFHE and subsequently introduce secret sharing to propose a securer one. Next, we design a new MKTFHE-friendly activation function via \emph{homogenizer} and \emph{compare quads}. Finally, we utilize them to implement logistic regression and neural network training in MKTFHE. 
Comparing the efficiency and accuracy between using Taylor polynomials of Sigmoid and our proposed function as an activation function, the experiments show that the efficiency of our function is 10 times higher than using 7-order Taylor polynomials straightly and the accuracy of the training model is similar to using a high-order polynomial as an activation function scheme.	翻訳日:2024-03-21 02:00:54 公開日:2024-03-19
# 非コヒーレントオーバーザエア分散勾配降下法 Non-Coherent Over-the-Air Decentralized Gradient Descent ( http://arxiv.org/abs/2211.10777v2 ) ライセンス: Link先を確認	Nicolo Michelusi,	(参考訳) Decentralized Gradient Descent (DGD) は、リモートセンシング、分散推論、マルチエージェント調整、フェデレーション学習など、さまざまな領域における分散最適化問題を解決するために使われる一般的なアルゴリズムである。しかし、ノイズ、フェージング、帯域幅の制限によって影響を受ける無線システム上でDGDを実行することは、干渉を軽減するために送信のスケジューリングが必要であり、また、無線分散システムにおける複雑なタスクであるトポロジとチャネル状態情報を取得する必要がある。本稿では,無線システムに適したDGDアルゴリズムを提案する。既存のアプローチとは異なり、エージェント間の調整、トポロジー情報、チャネル状態情報なしで動作する。その中核は非コヒーレントオーバー・ザ・エア(NCOTA)コンセンサススキームであり、無線チャネルのノイズの多いエネルギー重畳特性を利用する。半二重動作に対応するランダム化された伝送戦略により、送信機はOFDMフレーム内のサブキャリア全体のエネルギーレベルに局所最適化信号をマッピングし、調整なしで同時に送信する。受信したエネルギーは雑音の多いコンセンサス信号を形成し、その変動はコンセンサスステップサイズによって緩和されることが示される。 NCOTA-DGDは、混合重みを明示的に知ることなく、チャネルパスロスを利用してコンセンサスを形成する。強凸問題のクラスでは、減少するステップサイズを適切に設計することで、局所モデルと大域的最適モデルの間の期待二乗距離が $k$ 回の反復後に $\mathcal O(1/\sqrt{k})$ のレートで消失することが示される。拡張は、幅広い種類のフェージングモデルと周波数選択チャネルに対処する。画像分類タスクの数値的な結果は、特に密に配置されたネットワークにおいて、最先端のスキームよりも実行時間に対して高速な収束を示している。 Decentralized Gradient Descent (DGD) is a popular algorithm used to solve decentralized optimization problems in diverse domains such as remote sensing, distributed inference, multi-agent coordination, and federated learning. Yet, executing DGD over wireless systems affected by noise, fading and limited bandwidth presents challenges, requiring scheduling of transmissions to mitigate interference and the acquisition of topology and channel state information -- complex tasks in wireless decentralized systems. This paper proposes a DGD algorithm tailored to wireless systems. Unlike existing approaches, it operates without inter-agent coordination, topology information, or channel state information. Its core is a Non-Coherent Over-The-Air (NCOTA) consensus scheme, exploiting a noisy energy superposition property of wireless channels. 
With a randomized transmission strategy to accommodate half-duplex operation, transmitters map local optimization signals to energy levels across subcarriers in an OFDM frame, and transmit concurrently without coordination. It is shown that received energies form a noisy consensus signal, whose fluctuations are mitigated via a consensus stepsize. NCOTA-DGD leverages the channel pathloss for consensus formation, without explicit knowledge of the mixing weights. It is shown that, for the class of strongly-convex problems, the expected squared distance between the local and globally optimum models vanishes with rate $\mathcal O(1/\sqrt{k})$ after $k$ iterations, with a proper design of decreasing stepsizes. Extensions address a broad class of fading models and frequency-selective channels. Numerical results on an image classification task depict faster convergence vis-\`a-vis running time than state-of-the-art schemes, especially in densely deployed networks.	翻訳日:2024-03-21 01:51:05 公開日:2024-03-19
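The entry above builds on the classical DGD iteration, in which each agent mixes its neighbours' models and then takes a local gradient step with a decreasing stepsize. The sketch below illustrates only that noise-free baseline, not the NCOTA scheme of the paper: the scalar quadratic objectives, the 4-agent ring, the mixing matrix `RING`, and the stepsize `1/(k+1)` are all assumptions chosen for illustration.

```python
def dgd(targets, mixing, num_iters=2000):
    """Plain, noise-free decentralized gradient descent on scalar quadratics.

    Agent i holds f_i(x) = 0.5 * (x - targets[i])**2, so the minimizer of
    sum_i f_i is the mean of the targets. `mixing` is assumed to be a
    doubly stochastic matrix describing how agents average with neighbours.
    """
    n = len(targets)
    x = [0.0] * n  # one scalar local model per agent
    for k in range(num_iters):
        step = 1.0 / (k + 1)  # decreasing stepsize (illustrative choice)
        # Consensus step: each agent takes a weighted average of all models.
        mixed = [sum(mixing[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Local gradient step, evaluated at the agent's own previous model.
        x = [mixed[i] - step * (x[i] - targets[i]) for i in range(n)]
    return x


# Hypothetical 4-agent ring with a doubly stochastic mixing matrix.
RING = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]
TARGETS = [1.0, 2.0, 3.0, 6.0]

if __name__ == "__main__":
    # Every local model should end up close to the mean of TARGETS.
    print(dgd(TARGETS, RING))
```

In this toy setting the mixing weights are known exactly; the abstract's point is that NCOTA-DGD avoids needing them, which this sketch does not attempt to reproduce.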
# 非エルミート二バンドBCSモデルにおけるゼロ例外点におけるマイスナー効果の破壊 Breakdown of the Meissner effect at the zero exceptional point in non-Hermitian two-band BCS model ( http://arxiv.org/abs/2211.11422v3 ) ライセンス: Link先を確認	Takano Taira,	(参考訳) パラメータ空間の例外点における複素場理論における連続対称性の自発的対称性の破れは、ヒッグス機構の崩壊のような興味深い現象を示すことが知られている。本研究では、経路積分による非エルミート二バンドBCSモデルから複素ギンズバーグ・ランダウモデルを導出し、その自発対称性の破れについて検討する。ヒッグス機構と類似して、複素ギンズバーグ・ランダウ模型のマイスナー効果も例外点で分解され、ギャップパラメータは有限である。 The spontaneous symmetry breaking of a continuous symmetry in complex field theory at the exceptional point of the parameter space is known to exhibit interesting phenomena, such as the breakdown of a Higgs mechanism. In this work, we derive the complex Ginzburg-Landau model from a non-Hermitian two-band BCS model via path integral and investigate its spontaneous symmetry breaking. We find that analog to the Higgs mechanism, the Meissner effect of the complex Ginzburg-Landau model also breaks down at the exceptional point while the gap parameters stay finite.	翻訳日:2024-03-21 01:51:05 公開日:2024-03-19
# ラフメッシュによる3次元シーン作成とレンダリング:照明伝達アベニュー 3D Scene Creation and Rendering via Rough Meshes: A Lighting Transfer Avenue ( http://arxiv.org/abs/2211.14823v3 ) ライセンス: Link先を確認	Bowen Cai, Yujie Li, Yuqin Liang, Rongfei Jia, Binqiang Zhao, Mingming Gong, Huan Fu,	(参考訳) 本稿では,再構成された3Dモデルを3Dシーン作成やレンダリングなどの実用的な3Dモデリングパイプラインに柔軟に統合する方法について述べる。技術的難しさから、既存の3D再構成技術を用いて、ほとんどの実物に対して粗い3Dモデル(R3DM)しか得られない。その結果、物理ベースレンダリング(PBR)はR3DMで構築された低画質の画像やビデオを表示するようになった。期待できる解決策の1つは、現実世界のオブジェクトをNeRFのようなニューラルフィールドとして表現し、望まれる視点の下でオブジェクトの写実的なレンダリングを生成することである。しかし、ニューラルフィールドレンダリング(NFR)による合成ビューは、特に3次元シーン生成におけるオブジェクトの相互作用が局所影を引き起こす場合、PBRパイプラインにおけるR3DMのシミュレーションライティング詳細を反映できない。このジレンマを解決するために,NFRとPBRを橋渡しする照明伝達ネットワーク(LighTNet)を提案する。 LighTNetは、簡易な画像合成モデルに関する理由から、R3DMによる表面の不均一な問題を是正し、いくつかの知覚的モチベーションを持つ制約と、照明強度と色とのコントラストを高める新しいLab角損失によって強化されている。比較では、LighTNetは印象的な照明の合成に優れており、実用的な3DモデリングワークフローにおいてNFRをさらに推し進めることを約束している。 This paper studies how to flexibly integrate reconstructed 3D models into practical 3D modeling pipelines such as 3D scene creation and rendering. Due to the technical difficulty, one can only obtain rough 3D models (R3DMs) for most real objects using existing 3D reconstruction techniques. As a result, physically-based rendering (PBR) would render low-quality images or videos for scenes that are constructed by R3DMs. One promising solution would be representing real-world objects as Neural Fields such as NeRFs, which are able to generate photo-realistic renderings of an object under desired viewpoints. However, a drawback is that the synthesized views through Neural Fields Rendering (NFR) cannot reflect the simulated lighting details on R3DMs in PBR pipelines, especially when object interactions in the 3D scene creation cause local shadows. To solve this dilemma, we propose a lighting transfer network (LighTNet) to bridge NFR and PBR, such that they can benefit from each other. 
LighTNet reasons about a simplified image composition model, remedies the uneven surface issue caused by R3DMs, and is empowered by several perceptual-motivated constraints and a new Lab angle loss which enhances the contrast between lighting strength and colors. Comparisons demonstrate that LighTNet is superior in synthesizing impressive lighting, and is promising in pushing NFR further in practical 3D modeling workflows.	翻訳日:2024-03-21 01:51:05 公開日:2024-03-19
# 量子ジャジンスキー等式の設定における射影仮説 Projection hypothesis in the setting for the quantum Jarzynski equality ( http://arxiv.org/abs/2212.07785v6 ) ライセンス: Link先を確認	Eiji Konishi,	(参考訳) 射影量子計測は、現代の量子力学において理論的に受け入れられた過程である。しかし、その射影仮説は実験的に確立された経験則として広く見なされている。本稿では、投射量子測定における射影仮説のハミルトン過程の実現に関する以前の結果と、マクロ量子力学系の質量中心の軌道可観測物の完全な集合が相互に可換な古典的可観測物の集合に制限されていることと、イベント読取に必要な作業(すなわち射影量子測定における情報的過程)に関する以前の結果を組み合わせる。次に、これら2つの相互独立な量子計測理論結果を同時に試験するための量子熱力学スキームを提案する。 Projective quantum measurement is a theoretically accepted process in modern quantum mechanics. However, its projection hypothesis is widely regarded as an experimentally established empirical law. In this article, we combine a previous result regarding the realization of a Hamiltonian process of the projection hypothesis in projective quantum measurement, where the complete set of the orbital observables of the center of mass of a macroscopic quantum mechanical system is restricted to a set of mutually commuting classical observables, and a previous result regarding the work required for an event reading (i.e., the informatical process in projective quantum measurement). Then, a quantum thermodynamic scheme is proposed for experimentally testing these two mutually independent theoretical results of projective quantum measurement simultaneously.	翻訳日:2024-03-21 01:51:05 公開日:2024-03-19
# マッキーン・ブラソフ制御問題に対する平均場ニューラルネットワークに基づくアルゴリズム* Mean-field neural networks-based algorithms for McKean-Vlasov control problems * ( http://arxiv.org/abs/2212.11518v2 ) ライセンス: Link先を確認	Huyên Pham, Xavier Warin,	(参考訳) 本稿では,ワッサーシュタイン空間の解法を学習するために,我々の共用紙[25]に導入した平均場ニューラルネットワークのクラスを用いて,マッキーン・ブラソフ制御問題の数値解法について述べる。ポリシーや値反復による制御学習を用いた動的プログラミングや,大域的あるいは局所的損失関数を用いた確率的最大原理による逆SDEを提案する。 8つのアルゴリズムの各々の精度を示すために、異なる例の大規模な数値結果を示す。テストされたすべてのメソッドの長所と短所について議論し、比較する。 This paper is devoted to the numerical resolution of McKean-Vlasov control problems via the class of mean-field neural networks introduced in our companion paper [25] in order to learn the solution on the Wasserstein space. We propose several algorithms either based on dynamic programming with control learning by policy or value iteration, or backward SDE from stochastic maximum principle with global or local loss functions. Extensive numerical results on different examples are presented to illustrate the accuracy of each of our eight algorithms. We discuss and compare the pros and cons of all the tested methods.	翻訳日:2024-03-21 01:51:05 公開日:2024-03-19
# カスタマイズされたプライバシ保護によるソーシャル・アウェア・クラスタ型フェデレーション学習 Social-Aware Clustered Federated Learning with Customized Privacy Preservation ( http://arxiv.org/abs/2212.13992v3 ) ライセンス: Link先を確認	Yuntao Wang, Zhou Su, Yanghe Pan, Tom H Luan, Ruidong Li, Shui Yu,	(参考訳) FL(Federated Learning)の重要な特徴は、エンドユーザのデータのプライバシを維持することだ。しかし、FLの下で勾配を交換する際の潜在的なプライバシー漏洩が存在する。その結果、近年の研究では、計算結果にノイズを加えるための差分プライバシ(DP)アプローチを検討し、低オーバーヘッドでプライバシの問題に対処するが、モデル性能は低下する。本稿では,ユーザ間のソーシャルな関係を利用して,データプライバシと効率のバランスをとる。具体的には,ソーシャル・アウェア・クラスタ・フェデレート・ラーニング・スキームであるSCFLを提案する。このスキームでは,信頼関係のある個人が,グローバルアグリゲーションのためにクラウドにアップロードする前に,ソーシャル・クラスタを自由に形成し,各クラスタ内の生モデル更新(グラデーションなど)を集約することができる。モデル更新をソーシャルグループに混ぜ合わせることで、敵はソーシャル層を組み合わせた結果のみを盗むことができるが、個人のプライバシーを盗むことはできない。 SCFLの設計を3段階に展開する。ユーザの不均一なトレーニングサンプルやデータ分布を考慮すると、最適なソーシャルクラスタ形成問題をフェデレーションゲームとして定式化し、フリーライダーに抵抗する公平な収益配分機構を考案する。二差別化された信託民営地図相互信頼度が低いクラスタに対しては,社会的信頼度に応じて参加者のモデル更新を適応的に正当化するための,カスタマイズ可能なプライバシ保護機構を設計する。三分散収束分散二面マッチングアルゴリズムは、Nash-stable 収束で最適化された解離分割を実現するために考案された。 FacebookネットワークとMNIST/CIFAR-10データセットの実験は、SCFLが学習ユーティリティを効果的に強化し、ユーザの支払いを改善し、カスタマイズ可能なプライバシ保護を強制できることを検証する。 A key feature of federated learning (FL) is to preserve the data privacy of end users. However, there still exist potential privacy leakage in exchanging gradients under FL. As a result, recent research often explores the differential privacy (DP) approaches to add noises to the computing results to address privacy concerns with low overheads, which however degrade the model performance. In this paper, we strike the balance of data privacy and efficiency by utilizing the pervasive social connections between users. Specifically, we propose SCFL, a novel Social-aware Clustered Federated Learning scheme, where mutually trusted individuals can freely form a social cluster and aggregate their raw model updates (e.g., gradients) inside each cluster before uploading to the cloud for global aggregation. 
By mixing model updates in a social group, adversaries can only eavesdrop the social-layer combined results, but not the privacy of individuals. We unfold the design of SCFL in three steps. i) Stable social cluster formation. Considering users' heterogeneous training samples and data distributions, we formulate the optimal social cluster formation problem as a federation game and devise a fair revenue allocation mechanism to resist free-riders. ii) Differentiated trust-privacy mapping. For the clusters with low mutual trust, we design a customizable privacy preservation mechanism to adaptively sanitize participants' model updates depending on social trust degrees. iii) Distributed convergence. A distributed two-sided matching algorithm is devised to attain an optimized disjoint partition with Nash-stable convergence. Experiments on Facebook network and MNIST/CIFAR-10 datasets validate that our SCFL can effectively enhance learning utility, improve user payoff, and enforce customizable privacy protection.	翻訳日:2024-03-21 01:51:05 公開日:2024-03-19
# Behave-XAI:行動表現データの深い説明可能な学習 Behave-XAI: Deep Explainable Learning of Behavioral Representational Data ( http://arxiv.org/abs/2301.00016v2 ) ライセンス: Link先を確認	Rossi Kamal, Zuzana Kubincova,	(参考訳) 人工知能の最新トレンドによると、AIシステムは、一般的な、特定の決定、それが提供するサービスについて明確にする必要がある。消費者だけが満足しており、例えば、なぜ分類結果が与えられた時間の結果であるのかを説明する。これは実際に、行動的マイニングのシナリオにおいて、説明可能なAIや人間の理解可能なAIを使用することを動機付けています。しかし、AIシステムの出力は必ずしも体系的に正しいというわけではなく、しばしば体系的に正しいというわけではない。下にある理由は何ですか。この文脈で、我々はまず、深層畳み込みニューラルネットワークアーキテクチャにおける行動マイニング問題を定式化する。最終的に、ユーザーの生理的および環境的センサー読み取りから時系列データが存在するため、再帰的ニューラルネットワークを適用する。モデルが開発されると、ユーザの前でXAIモデルの出現が説明される。この重要なステップは、ユーザーが従来のAIよりも説明を好み、説明の信頼性を判断する広範囲なトライアルである。 According to the latest trend of artificial intelligence, AI-systems needs to clarify regarding general,specific decisions,services provided by it. Only consumer is satisfied, with explanation , for example, why any classification result is the outcome of any given time. This actually motivates us using explainable or human understandable AI for a behavioral mining scenario, where users engagement on digital platform is determined from context, such as emotion, activity, weather, etc. However, the output of AI-system is not always systematically correct, and often systematically correct, but apparently not-perfect and thereby creating confusions, such as, why the decision is given? What is the reason underneath? In this context, we first formulate the behavioral mining problem in deep convolutional neural network architecture. Eventually, we apply a recursive neural network due to the presence of time-series data from users physiological and environmental sensor-readings. Once the model is developed, explanations are presented with the advent of XAI models in front of users. This critical step involves extensive trial with users preference on explanations over conventional AI, judgement of credibility of explanation.	翻訳日:2024-03-21 01:51:05 公開日:2024-03-19
# 長期記憶とTOPSISによる深部反復学習 Deep Recurrent Learning Through Long Short Term Memory and TOPSIS ( http://arxiv.org/abs/2301.00693v2 ) ライセンス: Link先を確認	Rossi Kamal, Zuzana Kubincova, Mosaddek Hossain Kamal, Upama Kabir,	(参考訳) エンタープライズリソース計画(ERP)ソフトウェアは、企業のビジネスプロセス内でソフトウェアフローを維持するために、リソースとデータをまとめます。しかし、クラウドコンピューティングの安価で簡単で迅速な管理の約束は、ビジネスオーナーにモノリシックからデータセンター/クラウドベースのERPへの移行を迫る。クラウドERP開発には、計画、実装、テスト、アップグレードといった循環的なプロセスが伴うため、その採用はディープリカレントニューラルネットワーク問題として実現されている。最終的に、長寿命メモリ(LSTM)とTOPSISに基づく分類アルゴリズムが提案され、それぞれ採用特徴を識別およびランク付けする。我々の理論モデルは、キープレーヤー、サービス、アーキテクチャ、機能を明確にすることで、参照モデル上で検証される。技術,イノベーション,抵抗問題を考慮した質的調査を行い,主要な採用要因に関する仮説を定式化する。 Enterprise resource planning (ERP) software brings resources, data together to keep software-flow within business processes in a company. However, cloud computing's cheap, easy and quick management promise pushes business-owners for a transition from monolithic to a data-center/cloud based ERP. Since cloud-ERP development involves a cyclic process, namely planning, implementing, testing and upgrading, its adoption is realized as a deep recurrent neural network problem. Eventually, a classification algorithm based on long short term memory (LSTM) and TOPSIS is proposed to identify and rank, respectively, adoption features. Our theoretical model is validated over a reference model by articulating key players, services, architecture, functionalities. Qualitative survey is conducted among users by considering technology, innovation and resistance issues, to formulate hypotheses on key adoption factors.	翻訳日:2024-03-21 01:51:05 公開日:2024-03-19
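The entry above uses TOPSIS to rank adoption features once they have been identified. As a rough illustration of the ranking step only, the following is a minimal sketch of textbook TOPSIS (vector normalization, weighted ideal and anti-ideal points, relative closeness). The decision matrix, weights, and benefit/cost flags are hypothetical and are not taken from the paper, and the LSTM stage is not modeled.

```python
import math


def topsis(matrix, weights, benefit):
    """Return the TOPSIS closeness score of each alternative (row).

    matrix  : rows are alternatives, columns are criteria (non-zero columns assumed)
    weights : one weight per criterion
    benefit : True if larger is better for that criterion, False if it is a cost
    """
    n_crit = len(weights)
    # Vector-normalize each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Ideal (best) and anti-ideal (worst) value per criterion.
    cols = [[row[j] for row in v] for j in range(n_crit)]
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_best = math.sqrt(sum((a - b) ** 2 for a, b in zip(row, best)))
        d_worst = math.sqrt(sum((a - b) ** 2 for a, b in zip(row, worst)))
        # Closeness: 1 means the ideal point, 0 means the anti-ideal point.
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical example: three candidate features scored on one benefit
# criterion (column 0) and one cost criterion (column 1).
MATRIX = [[9.0, 1.0], [5.0, 5.0], [1.0, 9.0]]
SCORES = topsis(MATRIX, weights=[0.5, 0.5], benefit=[True, False])
RANKING = sorted(range(len(SCORES)), key=lambda i: SCORES[i], reverse=True)

if __name__ == "__main__":
    print(SCORES, RANKING)
```

The first row dominates on both criteria and the last row is dominated on both, so they sit at the two ends of the ranking; in practice the weights would come from the study's own survey or model rather than being fixed by hand.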
# 未知の量子計測の識別と認証 Discrimination and certification of unknown quantum measurements ( http://arxiv.org/abs/2301.04948v3 ) ライセンス: Link先を確認	Aleksandra Krawiec, Łukasz Pawela, Zbigniew Puchała,	(参考訳) 我々は,フォン・ノイマン測定の基準値や他の測定値が与えられた場合のシナリオにおける識別について検討した。判別の目的は、他の測定値が最初の測定値と同じかどうかを決定することである。基準測定が古典的な記述を伴わない場合と古典的な記述が知られている場合を考察する。どちらのケースも、対称的および非対称的な識別設定で研究されている。さらに、未知の量子計測に対して既知の量子測定を証明できる最適な認証方式を提供する。 We study the discrimination of von Neumann measurements in the scenario when we are given a reference measurement and some other measurement. The aim of the discrimination is to determine whether the other measurement is the same as the first one. We consider the cases when the reference measurement is given without the classical description and when its classical description is known. Both cases are studied in the symmetric and asymmetric discrimination setups. Moreover, we provide optimal certification schemes enabling us to certify a known quantum measurement against the unknown one.	翻訳日:2024-03-21 01:51:05 公開日:2024-03-19
# 変圧器による映像塗装における光学的フロー誘導 Exploiting Optical Flow Guidance for Transformer-Based Video Inpainting ( http://arxiv.org/abs/2301.10048v2 ) ライセンス: Link先を確認	Kaidong Zhang, Jialun Peng, Jingjing Fu, Dong Liu,	(参考訳) トランスフォーマーはマルチヘッド・セルフアテンション(MHSA)機構によってビデオ処理に広く利用されている。しかし、MHSA機構は、劣化した領域に付随する特徴が劣化し、不正確な自己注意が生じるため、ビデオ塗布の本質的な困難に遭遇する。問合せ分解と呼ばれるこの問題は、最初に光学的流れを完了し、フローを用いて自己注意を導くことで緩和される可能性がある。さらにフローガイダンスを利用してFGT++を提案する。まず,ローカルアグリゲーションとエッジロスを用いて,軽量なフローコンプリートネットワークを設計する。第2に、クエリの劣化に対処するために、フロー誘導機能統合モジュールを提案し、フローにしたがって特徴を警告するフロー誘導機能伝搬モジュールとともに、動作の相違を利用して特徴を増強する。第3に、時間的および空間的次元に沿って変換器を分離し、時間的変形可能なMHSA機構でトークンの選択にフローを使用し、大域トークンは双対視点MHSA機構で内窓局所トークンと結合する。 FGT++は、既存のビデオインパインティングネットワークを質的かつ定量的に上回っていると実験的に評価されている。 Transformers have been widely used for video processing owing to the multi-head self attention (MHSA) mechanism. However, the MHSA mechanism encounters an intrinsic difficulty for video inpainting, since the features associated with the corrupted regions are degraded and incur inaccurate self attention. This problem, termed query degradation, may be mitigated by first completing optical flows and then using the flows to guide the self attention, which was verified in our previous work - flow-guided transformer (FGT). We further exploit the flow guidance and propose FGT++ to pursue more effective and efficient video inpainting. First, we design a lightweight flow completion network by using local aggregation and edge loss. Second, to address the query degradation, we propose a flow guidance feature integration module, which uses the motion discrepancy to enhance the features, together with a flow-guided feature propagation module that warps the features according to the flows. Third, we decouple the transformer along the temporal and spatial dimensions, where flows are used to select the tokens through a temporally deformable MHSA mechanism, and global tokens are combined with the inner-window local tokens through a dual perspective MHSA mechanism. 
FGT++ is experimentally evaluated to be outperforming the existing video inpainting networks qualitatively and quantitatively.	翻訳日:2024-03-21 01:51:05 公開日:2024-03-19
# ETHNO-DAANN:Deep Adversarial Transfer Learningによるエスノグラフィーエンゲージメント分類 ETHNO-DAANN: Ethnographic Engagement Classification by Deep Adversarial Transfer Learning ( http://arxiv.org/abs/2301.10229v2 ) ライセンス: Link先を確認	Rossi Kamal, Zuzana Kubincova, Mosaddek Hossain Kamal, Upama Kabir,	(参考訳) 学生のモチベーションは、現在進行中の第4次産業革命において、ポストコロニアル教育改革と青少年雇用市場適応の必要性から、重要な研究課題である。ポスト共産主義時代の教師は、より良い教育を提供することを目的として、背景、起源などの学生の民族情報を分析するよう促される。スマートデバイスデータの普及,遠隔学習プラットフォームへの需要の増大,バーチャル学習のさまざまな調査結果などにより,学生のエンゲージメントデータにアクセスできることは幸運なことだ。本研究の動機は, ラベル付き知識が限られている場合に, エスノグラフィ情報から学生のエンゲージメントを予測できるか, という課題に対処することにある。もし答えがイエスなら、どの特徴がエスノグラフィーエンゲージメント学習に最も影響しているかを知ることができるだろうか? この文脈では、エスノグラフィーエンゲージメント予測のための逆適応を用いたディープニューラルネットワークに基づくトランスファーラーニングアルゴリズム ETHNO-DAANN を提案する。最終予測に有効な最も影響力のある特徴を明らかにするために,民族を基盤とした学生のモチベーションに関する参加者を対象に調査を行った。このように,本研究は,限られたラベル付きデータの場合のエスノグラフィー・モチベーション・パラメーター推定の一般解である。 Student motivation is a key research agenda due to the necessity of both postcolonial education reform and youth job-market adaptation in ongoing fourth industrial revolution. Post-communism era teachers are prompted to analyze student ethnicity information such as background, origin with the aim of providing better education. With the proliferation of smart-device data, ever-increasing demand for distance learning platforms and various survey results of virtual learning, we are fortunate to have some access to student engagement data. In this research, we are motivated to address the following questions: can we predict student engagement from ethnographic information when we have limited labeled knowledge? If the answer is yes, can we tell which features are most influential in ethnographic engagement learning? In this context, we have proposed a deep neural network based transfer learning algorithm ETHNO-DAANN with adversarial adaptation for ethnographic engagement prediction. 
We conduct a survey among participants about ethnicity-based student motivation to figure out the most influential feature helpful in final prediction. Thus, our research stands as a general solution for ethnographic motivation parameter estimation in case of limited labeled data.	翻訳日:2024-03-21 01:51:05 公開日:2024-03-19
# 層が宝くじを引くとき、すべてのチケットが初期化時に当選する When Layers Play the Lottery, all Tickets Win at Initialization ( http://arxiv.org/abs/2301.10835v2 ) ライセンス: Link先を確認	Artur Jordao, George Correa de Araujo, Helena de Almeida Maia, Helio Pedrini,	(参考訳) プルーニングはディープネットワークの計算コストを削減するための標準的な手法である。プルーニングにおける多くの進歩は、LTH(Lottery Ticket Hypothesis)の概念を活用している。 LTHは、訓練された密なネットワークの内部に、同等の精度を達成できる(すなわち、宝くじに当選する)スパースなサブネットワーク(チケット)が存在することを明らかにしている。初期化時のプルーニングは、密なネットワークを訓練せずに当選チケットを見つけることに焦点を当てている。これらの概念に関する研究は、サブネットワークが重みまたはフィルタのプルーニングから得られるという傾向を共有している。本研究では,層プルーニングの観点から,LTHおよび初期化時のプルーニングについて検討する。まず,プルーニング処理が層を取り除く場合にも当選チケットが存在することを確認する。この観察に基づき,初期化時に当選チケットを発見することを提案し,初期の(過パラメータ化された)密なネットワークを訓練するための重い計算資源を不要にする。大規模な実験により,我々の当選チケットはトレーニングフェーズを顕著に高速化し,二酸化炭素排出量を最大51%削減することが示された。これは民主化とグリーンAIに向けた重要な一歩である。計算上の利点に加えて,我々の当選チケットは敵対的サンプルおよび分布外サンプルに対して頑健性を示す。最後に,フィルタ除去によるチケット(標準的な構造化LTH)はほとんど当選チケットにならない一方で,我々のサブネットワークは初期化時に容易に宝くじに当選することを示す。 Pruning is a standard technique for reducing the computational cost of deep networks. Many advances in pruning leverage concepts from the Lottery Ticket Hypothesis (LTH). LTH reveals that inside a trained dense network exists sparse subnetworks (tickets) able to achieve similar accuracy (i.e., win the lottery - winning tickets). Pruning at initialization focuses on finding winning tickets without training a dense network. Studies on these concepts share the trend that subnetworks come from weight or filter pruning. In this work, we investigate LTH and pruning at initialization from the lens of layer pruning. First, we confirm the existence of winning tickets when the pruning process removes layers. Leveraged by this observation, we propose to discover these winning tickets at initialization, eliminating the requirement of heavy computational resources for training the initial (over-parameterized) dense network. 
Extensive experiments show that our winning tickets notably speed up the training phase and reduce up to 51% of carbon emission, an important step towards democratization and green Artificial Intelligence. Beyond computational benefits, our winning tickets exhibit robustness against adversarial and out-of-distribution examples. Finally, we show that our subnetworks easily win the lottery at initialization while tickets from filter removal (the standard structured LTH) hardly become winning tickets.	翻訳日:2024-03-21 01:51:05 公開日:2024-03-19
# 区分的アフィンシステムにおける予測のための平滑化オンライン学習 Smoothed Online Learning for Prediction in Piecewise Affine Systems ( http://arxiv.org/abs/2301.11187v2 ) ライセンス: Link先を確認	Adam Block, Max Simchowitz, Russ Tedrake,	(参考訳) 区分的アフィン(PWA)回帰と計画の問題は、オンライン学習、制御、ロボット工学の研究において基礎的な重要性を持ち、ダイナミクスが急激に変化するシステムを研究するための、理論的にも経験的にも扱いやすい設定を提供する。残念なことに、異なる「区分」へまたがる際に生じる不連続性のため、一般的な逐次設定での学習は不可能であり、実用的なアルゴリズムはヒューリスティックなアプローチに頼らざるを得ない。本稿は、最近開発された平滑化オンライン学習の枠組みに基づき、弱い平滑性の仮定の下で、リグレットがすべての関連する問題パラメータの多項式となる、PWAシステムにおける予測とシミュレーションのための最初のアルゴリズムを提供する。さらに、本アルゴリズムは最適化オラクルの呼び出し回数に関して効率的である。また、この結果を区分的アフィン力学系における一段階予測と多段階シミュレーションのリグレットの問題に適用する。そこでは学習者が軌道のシミュレーションを課され、リグレットはシミュレーションデータと実データとのワッサーシュタイン距離で測られる。その過程で、より一般的な関心の対象となる技術的ツールをいくつか開発する。 The problem of piecewise affine (PWA) regression and planning is of foundational importance to the study of online learning, control, and robotics, where it provides a theoretically and empirically tractable setting to study systems undergoing sharp changes in the dynamics. Unfortunately, due to the discontinuities that arise when crossing into different ``pieces,'' learning in general sequential settings is impossible and practical algorithms are forced to resort to heuristic approaches. This paper builds on the recently developed smoothed online learning framework and provides the first algorithms for prediction and simulation in PWA systems whose regret is polynomial in all relevant problem parameters under a weak smoothness assumption; moreover, our algorithms are efficient in the number of calls to an optimization oracle. We further apply our results to the problems of one-step prediction and multi-step simulation regret in piecewise affine dynamical systems, where the learner is tasked with simulating trajectories and regret is measured in terms of the Wasserstein distance between simulated and true data. Along the way, we develop several technical tools of more general interest.	翻訳日:2024-03-21 01:51:05 公開日:2024-03-19
# LAGAN: 条件付き敵対的生成ニューラルネットワークを用いた深層半教師あり言語人類学分類 LAGAN: Deep Semi-Supervised Linguistic-Anthropology Classification with Conditional Generative Adversarial Neural Network ( http://arxiv.org/abs/2301.13853v2 ) ライセンス: Link先を確認	Rossi Kamal, Zuzana Kubincova,	(参考訳) 教育は万人の権利であるが、個人はそれぞれ異なる。ポスト共産主義時代の教師は、個々人に固有の個性を見出し、第4次産業革命の雇用市場に向けてすべての人を平等に教育しようとする。ここでは、学術実践における少数民族教育のシナリオを考える。少数民族のグループは独自の文化の中で育っており、自分たちの流儀で教わることを好む。我々は,このような言語人類学(人々がどのように学ぶか)に基づくエンゲージメントを半教師あり問題として定式化した。そこで我々は,学生エンゲージメントにおける言語的エスノグラフィの特徴を分類するために,LA-GANという条件付き深層敵対的生成ネットワークアルゴリズムを開発した。理論的な正当化により,我々の半教師あり敵対的モデルの目的関数,正則化,損失関数を裏付ける。学習スタイル、学習アプローチ、嗜好が我々の主な関心領域であるZ世代と少数民族グループについて、何らかの仮定に到達できるよう調査質問を用意した。 Education is a right of all, however, every individual is different than others. Teachers in post-communism era discover inherent individualism to equally train all towards job market of fourth industrial revolution. We can consider scenario of ethnic minority education in academic practices. Ethnic minority group has grown in their own culture and would prefer to be taught in their native way. We have formulated such linguistic anthropology(how people learn)based engagement as semi-supervised problem. Then, we have developed an conditional deep generative adversarial network algorithm namely LA-GAN to classify linguistic ethnographic features in student engagement. Theoretical justification proves the objective, regularization and loss function of our semi-supervised adversarial model. Survey questions are prepared to reach some form of assumptions about z-generation and ethnic minority group, whose learning style, learning approach and preference are our main area of interest.	翻訳日:2024-03-21 01:40:47 公開日:2024-03-19
# 二重量子井戸を有する微小キャビティにおける強い機械的スクイーズ Strong mechanical squeezing in a microcavity with double quantum wells ( http://arxiv.org/abs/2302.00534v4 ) ライセンス: Link先を確認	Muhammad Asjad, Berihu Teklu, Hichem Eleuch,	(参考訳) 二色コヒーレント光で励起される、可動端ミラーを持つ共振器の内部に配置された2つの量子井戸からなるハイブリッド量子系において、機械共振器のスクイーズド状態の形成を扱う。励起子モードと機械共振器は、マイクロキャビティ場を介して間接的に相互作用する。生成された結合の条件下で、既存の実験パラメータを用いて、分解サイドバンド領域を超えた機械モードのスクイーズを予測する。最後に、このスクイーズの熱ゆらぎに対する堅牢性が、このような系の実用化にとって重要であることを示す。 In a hybrid quantum system composed of two quantum wells placed inside a cavity with a moving end mirror pumped by bichromatic coherent light, we address the formation of squeezed states of a mechanical resonator. The exciton mode and mechanical resonator interact indirectly via microcavity fields. Under the conditions of the generated coupling, we predict squeezing of the mechanical-mode beyond the resolved side-band regime with existing experimental parameters. Finally, we show that the robustness of this squeezing against thermal fluctuations is important for practical applications of such systems.	翻訳日:2024-03-21 01:40:47 公開日:2024-03-19
# Oracle-Efficient Smoothed Online Learning for Piecewise Continuous Decision Making Oracle-Efficient Smoothed Online Learning for Piecewise Continuous Decision Making ( http://arxiv.org/abs/2302.05430v2 ) ライセンス: Link先を確認	Adam Block, Alexander Rakhlin, Max Simchowitz,	(参考訳) 平滑化オンライン学習は、古典的学習から敵対的学習へ移行する際に生じる、統計的および計算的複雑性の大幅な損失を緩和するための一般的な枠組みとして登場した。残念なことに、いくつかの空間では、学習者がその空間上の最適化オラクルにアクセスできる場合でも、効率的なアルゴリズムはミニマックス最適なものより指数関数的に悪いリグレットを被ることが示されている。この指数的依存を緩和するために、本研究は新しい複雑性の概念である一般化ブラケット数を導入する。これは敵対者に対する制約と空間の大きさとを結びつけるものである。さらに、Follow-the-Perturbed-Leaderのあるインスタンスが低いリグレットを達成でき、その最適化オラクルの呼び出し回数が平均リグレットに関して最適にスケールすることを示す。次に、オンライン予測や区分的連続関数の計画など、関心のあるいくつかの問題についてこの境界を具体化する。これらは計量経済学やロボット工学といった多様な分野に多くの応用を持つ。 Smoothed online learning has emerged as a popular framework to mitigate the substantial loss in statistical and computational complexity that arises when one moves from classical to adversarial learning. Unfortunately, for some spaces, it has been shown that efficient algorithms suffer an exponentially worse regret than that which is minimax optimal, even when the learner has access to an optimization oracle over the space. To mitigate that exponential dependence, this work introduces a new notion of complexity, the generalized bracketing numbers, which marries constraints on the adversary to the size of the space, and shows that an instantiation of Follow-the-Perturbed-Leader can attain low regret with the number of calls to the optimization oracle scaling optimally with respect to average regret. We then instantiate our bounds in several problems of interest, including online prediction and planning of piecewise continuous functions, which has many applications in fields as diverse as econometrics and robotics.	翻訳日:2024-03-21 01:40:47 公開日:2024-03-19
# 知識ネットワークナビゲーションにおける個人差 Individual differences in knowledge network navigation ( http://arxiv.org/abs/2303.10036v2 ) ライセンス: Link先を確認	Manran Zhu, Taha Yasseri, János Kertész,	(参考訳) オンライン情報の急速な蓄積により、効率的なWebナビゲーションは不可欠だが困難なものになっている。多様な属性の人々に対応した、ナビゲートしやすいサイバースペースを構築するには、人々のナビゲーションの仕方がどのように異なるかを理解することが最重要である。これまでの研究は空間ナビゲーションにおける個人差を明らかにしてきたが、知識空間ナビゲーションにおける個人差に関する研究は依然として少ない。このギャップを埋めるために、参加者がウィキペディア上でナビゲーションゲームを行い、個人情報に関するアンケートに回答するオンライン実験を行った。分析の結果,年齢は知識空間ナビゲーションの成績に負の影響を与える一方,多言語使用は成績を向上させることがわかった。時間的プレッシャーの下では、参加者の成績は試行を重ねるごとに向上し、男性が女性を上回るが、これは時間的プレッシャーのないゲームでは見られない効果である。我々の実験では、経路探索の成功は、通常、経路を革新的に探索する能力とは関係しない。本結果は,知識空間ナビゲーションにおける年齢,多言語使用,時間制約の重要性を強調するものである。 With the rapid accumulation of online information, efficient web navigation has grown vital yet challenging. To create an easily navigable cyberspace catering to diverse demographics, understanding how people navigate differently is paramount. While previous research has unveiled individual differences in spatial navigation, such differences in knowledge space navigation remain sparse. To bridge this gap, we conducted an online experiment where participants played a navigation game on Wikipedia and completed personal information questionnaires. Our analysis shows that age negatively affects knowledge space navigation performance, while multilingualism enhances it. Under time pressure, participants' performance improves across trials and males outperform females, an effect not observed in games without time pressure. In our experiment, successful route-finding is usually not related to abilities of innovative exploration of routes. Our results underline the importance of age, multilingualism and time constraint in the knowledge space navigation.	翻訳日:2024-03-21 01:40:47 公開日:2024-03-19
# BugNIST - ドメインシフトによるオブジェクト検出のための大規模ボリュームデータセット BugNIST - a Large Volumetric Dataset for Object Detection under Domain Shift ( http://arxiv.org/abs/2304.01838v2 ) ライセンス: Link先を確認	Patrick Møller Jensen, Vedrana Andersen Dahl, Carsten Gundlach, Rebecca Engberg, Hans Martin Kjer, Anders Bjorholm Dahl,	(参考訳) ドメインシフトはディープラーニングアルゴリズムの性能に大きく影響する。アノテーション付きトレーニングデータは、ディープラーニングに基づくオブジェクト検出に不可欠である。しかし、密集したオブジェクトに注釈を付けるのに時間がかかり、コストがかかる。代わりに、個別にスキャンされたオブジェクトのトレーニングモデルを提案し、トレーニングデータと検出データのドメインシフトを引き起こします。この課題に対処するために,12種類のバグタイプ9154マイクロCTボリュームと,密充填されたバグミックス388ボリュームからなるBugNISTデータセットを紹介した。このデータセットは、ソースとターゲットドメインで同じ外観のオブジェクトを持つのが特徴で、ドメインシフトのための他のベンチマークデータセットでは珍しい。トレーニングでは、クラスによってラベル付けされた個々のバグボリュームが使用され、テストではセンターポイントアノテーションとバグタイプラベルが混在している。データセットとともに,3次元物体検出手法のフィールド化をめざして,ベースライン検出分析を行う。 Domain shift significantly influences the performance of deep learning algorithms, particularly for object detection within volumetric 3D images. Annotated training data is essential for deep learning-based object detection. However, annotating densely packed objects is time-consuming and costly. Instead, we suggest training models on individually scanned objects, causing a domain shift between training and detection data. To address this challenge, we introduce the BugNIST dataset, comprising 9154 micro-CT volumes of 12 bug types and 388 volumes of tightly packed bug mixtures. This dataset is characterized by having objects with the same appearance in the source and target domain, which is uncommon for other benchmark datasets for domain shift. During training, individual bug volumes labeled by class are utilized, while testing employs mixtures with center point annotations and bug type labels. Together with the dataset, we provide a baseline detection analysis, aiming at advancing the field of 3D object detection methods.	翻訳日:2024-03-21 01:30:29 公開日:2024-03-19
# ポピュレーションパラメータ平均化(PAPA) PopulAtion Parameter Averaging (PAPA) ( http://arxiv.org/abs/2304.03094v3 ) ライセンス: Link先を確認	Alexia Jolicoeur-Martineau, Emy Gervais, Kilian Fatras, Yan Zhang, Simon Lacoste-Julien,	(参考訳) アンサンブル法は、複数のモデルの予測を組み合わせて性能を向上させるが、推論時に大幅に高い計算コストを必要とする。これらのコストを回避するために、重みを平均化することで複数のニューラルネットワークを1つにまとめることができる。しかし、これは通常、アンサンブルよりもはるかに性能が悪い。重み平均化が有益なのは、モデル同士が、組み合わせることで利益を得られる程度に異なり、かつうまく平均化できる程度に似ている場合に限られる。この考えに基づき、アンサンブルの汎用性と重み平均化の効率性を兼ね備えた手法であるPopulAtion Parameter Averaging (PAPA)を提案する。 PAPAは多様なモデル(異なるデータ順序、データ拡張、正則化で訓練されたもの)の集団を活用しながら、各ネットワークの重みを集団の重みの平均へとゆっくり近づけていく。また、重みを連続的にではなく、まれに平均化するPAPAの変種(PAPA-allおよびPAPA-2)も提案する。いずれの手法も汎化性能を向上させるが、PAPAが最も良い性能を示す傾向にある。 PAPAは平均化とアンサンブルの性能差を縮小し、独立に(平均化なしで)訓練したモデルと比較して、モデル集団の平均精度を最大でCIFAR-10で0.8%、CIFAR-100で1.9%、ImageNetで1.6%向上させる。 Ensemble methods combine the predictions of multiple models to improve performance, but they require significantly higher computation costs at inference time. To avoid these costs, multiple neural networks can be combined into one by averaging their weights. However, this usually performs significantly worse than ensembling. Weight averaging is only beneficial when different enough to benefit from combining them, but similar enough to average well. Based on this idea, we propose PopulAtion Parameter Averaging (PAPA): a method that combines the generality of ensembling with the efficiency of weight averaging. PAPA leverages a population of diverse models (trained on different data orders, augmentations, and regularizations) while slowly pushing the weights of the networks toward the population average of the weights. We also propose PAPA variants (PAPA-all, and PAPA-2) that average weights rarely rather than continuously; all methods increase generalization, but PAPA tends to perform best. 
PAPA reduces the performance gap between averaging and ensembling, increasing the average accuracy of a population of models by up to 0.8% on CIFAR-10, 1.9% on CIFAR-100, and 1.6% on ImageNet when compared to training independent (non-averaged) models.	翻訳日:2024-03-21 01:30:29 公開日:2024-03-19
# マスクを用いたニューラルラジアンス場のモデリング Mask-Based Modeling for Neural Radiance Fields ( http://arxiv.org/abs/2304.04962v2 ) ライセンス: Link先を確認	Ganlin Yang, Guoqiang Wei, Zhizheng Zhang, Yan Lu, Dong Liu,	(参考訳) ほとんどのニューラルラジアンス場(NeRF)は限定的な一般化能力を示し、単一のモデルを用いて複数のシーンを表現することの適用性を制限している。この問題に対処するために、既存の一般化可能なNeRF法は単純に画像の特徴にモデルを条件付ける。これらの手法は、異なる視点と視点の間で相互作用する効果的なメカニズムが欠如しているため、様々な場面で正確なグローバル表現を学ぶのに依然として苦労している。本研究では,マスクベースモデリングにより3次元暗黙表現学習を大幅に改善できることを明らかにする。具体的には,自己教師型事前学習対象である一般化可能なNeRF(MRVM-NeRF)に対するマスク付き光線とビューモデリングを提案し,各光線に沿った部分マスキング特徴からシーンの完全な表現を予測する。この事前学習目標により、MRVM-NeRFは、異なる点とビュー間の相関関係をよりよく利用することができ、それによって、シーン内の複雑な詳細をキャプチャし、異なるシーン間での一般化能力を高めることができる。提案したMRVM-NeRFが,合成および実世界の両方のデータセットに対して質的,定量的に有効であることを示す。また,提案手法の種々のバックボーンとの整合性を示す実験も行った。 Most Neural Radiance Fields (NeRFs) exhibit limited generalization capabilities, which restrict their applicability in representing multiple scenes using a single model. To address this problem, existing generalizable NeRF methods simply condition the model on image features. These methods still struggle to learn precise global representations over diverse scenes since they lack an effective mechanism for interacting among different points and views. In this work, we unveil that 3D implicit representation learning can be significantly improved by mask-based modeling. Specifically, we propose masked ray and view modeling for generalizable NeRF (MRVM-NeRF), which is a self-supervised pretraining target to predict complete scene representations from partially masked features along each ray. With this pretraining target, MRVM-NeRF enables better use of correlations across different points and views as the geometry priors, which thereby strengthens the capability of capturing intricate details within the scenes and boosts the generalization capability across different scenes. Extensive experiments demonstrate the effectiveness of our proposed MRVM-NeRF on both synthetic and real-world datasets, qualitatively and quantitatively. 
Besides, we also conduct experiments to show the compatibility of our proposed method with various backbones and its superiority under few-shot cases.	翻訳日:2024-03-21 01:30:29 公開日:2024-03-19
# 非エルミート量子フェルミ加速器 Non-Hermitian Quantum Fermi Accelerator ( http://arxiv.org/abs/2304.07950v2 ) ライセンス: Link先を確認	Andreas Fring, Takano Taira,	(参考訳) 我々は、時間依存のディリクレ境界条件を持つ時間非依存の非エルミートハミルトニアンからなる量子フェルミ加速器モデルを厳密に解く。そのような系に対するヒルベルト空間は、2つの等価な方法で定義できる。すなわち、まず時間非依存のダイソン写像を構成してから固定境界条件へユニタリに写像するか、あるいは、まず固定境界条件へユニタリに写像してから時間依存のダイソン写像を構成するかである。これにより、時間非依存の計量と、移動境界を凍結する2つの時間依存ユニタリ写像から、時間依存の計量演算子を構成することができる。時間依存エネルギースペクトルから、PT領域では平均エネルギーが振動的に振る舞うという既知の可能性を見出す一方、自発的に破れたPT領域では、エネルギーが一度だけ枯渇するという新たな特徴を観察する。 PTの破れた領域は移動境界によって修復され、これは時間依存のダイソン写像による修復と等価であることを示す。 We exactly solve a quantum Fermi accelerator model consisting of a time-independent non-Hermitian Hamiltonian with time-dependent Dirichlet boundary conditions. A Hilbert space for such systems can be defined in two equivalent ways, either by first constructing a time-independent Dyson map and subsequently unitarily mapping to fixed boundary conditions or by first unitarily mapping to fixed boundary conditions followed by the construction of a time-dependent Dyson map. In turn this allows to construct time-dependent metric operators from a time-independent metric and two time-dependent unitary maps that freeze the moving boundaries. From the time-dependent energy spectrum, we find the known possibility of oscillatory behavior in the average energy in the PT-regime, whereas in the spontaneously broken PT-regime we observe the new feature of a one-time depletion of the energy. We show that the PT broken regime is mended with moving boundary, equivalently to mending it with a time-dependent Dyson map.	翻訳日:2024-03-21 01:30:29 公開日:2024-03-19
# 横断か待機か? 信号のない横断歩道における歩行者インタラクションの結果予測 Cross or Wait? Predicting Pedestrian Interaction Outcomes at Unsignalized Crossings ( http://arxiv.org/abs/2304.08260v2 ) ライセンス: Link先を確認	Chi Zhang, Amir Hossein Kalantari, Yue Yang, Zhongjun Ni, Gustav Markkula, Natasha Merat, Christian Berger,	(参考訳) 車両と相互作用する際の歩行者の行動を予測することは、自動運転分野における最も重要な課題の1つである。歩行者の横断行動は、到着までの時間、歩行者の待ち時間、横断歩道の有無、歩行者と運転者双方の属性や性格特性など、様々な相互作用要因の影響を受ける。しかし、これらの要因は相互作用の結果を予測する用途では十分に検討されていない。本稿では,信号のない横断歩道で車両と相互作用する際の歩行者の横断行動(横断判断,横断開始時間(CIT),横断所要時間(CD)を含む)を機械学習により予測する。相互作用要因の予測と分析には分散シミュレータのデータを用いる。ロジスティック回帰のベースラインモデルと比較して,提案するニューラルネットワークモデルは予測精度とF1スコアをそれぞれ4.46%,3.23%向上させる。また、線形回帰モデルと比較して、CITとCDの二乗平均平方根誤差(RMSE)をそれぞれ21.56%、30.14%削減する。さらに、相互作用要因の重要度を分析し、より少ない要因を用いたモデルの結果を示す。これにより、入力特徴量が限られる様々なシナリオでのモデル選択に役立つ情報が得られる。 Predicting pedestrian behavior when interacting with vehicles is one of the most critical challenges in the field of automated driving. Pedestrian crossing behavior is influenced by various interaction factors, including time to arrival, pedestrian waiting time, the presence of zebra crossing, and the properties and personality traits of both pedestrians and drivers. However, these factors have not been fully explored for use in predicting interaction outcomes. In this paper, we use machine learning to predict pedestrian crossing behavior including pedestrian crossing decision, crossing initiation time (CIT), and crossing duration (CD) when interacting with vehicles at unsignalized crossings. Distributed simulator data are utilized for predicting and analyzing the interaction factors. Compared with the logistic regression baseline model, our proposed neural network model improves the prediction accuracy and F1 score by 4.46% and 3.23%, respectively. Our model also reduces the root mean squared error (RMSE) for CIT and CD by 21.56% and 30.14% compared with the linear regression model. Additionally, we have analyzed the importance of interaction factors, and present the results of models using fewer factors. 
This provides information for model selection in different scenarios with limited input features.	翻訳日:2024-03-21 01:30:29 公開日:2024-03-19
# GenCorres: 結合された陰的・陽的形状生成モデルによる一貫した形状マッチング GenCorres: Consistent Shape Matching via Coupled Implicit-Explicit Shape Generative Models ( http://arxiv.org/abs/2304.10523v2 ) ライセンス: Link先を確認	Haitao Yang, Xiangru Huang, Bo Sun, Chandrajit Bajaj, Qixing Huang,	(参考訳) 本稿では,新しい教師なしのジョイント形状マッチング(JSM)手法であるGenCorresを紹介する。我々の鍵となるアイデアは、未整理の変形可能な形状コレクションに適合するようにメッシュジェネレータを学習しつつ、隣接する合成形状間の変形を制約して、局所剛性や局所共形性などの幾何構造を保存することである。 GenCorresは既存のJSM技術に対して3つの魅力的な利点を持つ。第1に、GenCorresは入力形状よりもはるかに規模の大きい合成形状コレクションの中でJSMを実行し、JSMのデータ駆動の力を十分に活用する。第2に、GenCorresは一貫した形状マッチングとペアワイズマッチングを(すなわち、隣接する合成形状間に変形の事前知識を課すことによって)統一する。第3に、ジェネレータは一貫した形状対応の簡潔な符号化を提供する。しかし、未整理の形状コレクションからメッシュジェネレータを学習することは困難であり、良い初期化が必要である。 GenCorresは、入力形状から陰的ジェネレータを学習することでこの問題に対処する。この陰的ジェネレータは、任意の2つの形状の間の中間形状を与える。隣接する陰的曲面間の対応を計算するための新しい手法を導入し、これを陰的ジェネレータの正則化に用いる。次に、陰的ジェネレータの合成形状が、メッシュジェネレータを学習するための初期フィッティング(すなわち、テンプレートベースの変形による)を導く。実験の結果,GenCorresは最先端のJSM技術を大幅に上回った。 GenCorresの合成形状は、最先端の変形可能な形状ジェネレータに対しても顕著な性能向上を達成する。 This paper introduces GenCorres, a novel unsupervised joint shape matching (JSM) approach. Our key idea is to learn a mesh generator to fit an unorganized deformable shape collection while constraining deformations between adjacent synthetic shapes to preserve geometric structures such as local rigidity and local conformality. GenCorres presents three appealing advantages over existing JSM techniques. First, GenCorres performs JSM among a synthetic shape collection whose size is much bigger than the input shapes and fully leverages the datadriven power of JSM. Second, GenCorres unifies consistent shape matching and pairwise matching (i.e., by enforcing deformation priors between adjacent synthetic shapes). Third, the generator provides a concise encoding of consistent shape correspondences. However, learning a mesh generator from an unorganized shape collection is challenging, requiring a good initialization. 
GenCorres addresses this issue by learning an implicit generator from the input shapes, which provides intermediate shapes between two arbitrary shapes. We introduce a novel approach for computing correspondences between adjacent implicit surfaces, which we use to regularize the implicit generator. Synthetic shapes of the implicit generator then guide initial fittings (i.e., via template-based deformation) for learning the mesh generator. Experimental results show that GenCorres considerably outperforms state-of-the-art JSM techniques. The synthetic shapes of GenCorres also achieve salient performance gains against state-of-the-art deformable shape generators.	翻訳日:2024-03-21 01:30:29 公開日:2024-03-19
# 多くの深層ネットワークの訓練過程は同一の低次元多様体を探索する The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold ( http://arxiv.org/abs/2305.01604v3 ) ライセンス: Link先を確認	Jialin Mao, Itay Griniasty, Han Kheng Teoh, Rahul Ramesh, Rubing Yang, Mark K. Transtrum, James P. Sethna, Pratik Chaudhari,	(参考訳) 我々は,訓練中の深層ネットワークの予測の軌跡を解析するための情報幾何学的手法を開発する。基礎となる高次元確率モデルを調べることにより、訓練過程が実効的に低次元の多様体を探索することを明らかにする。幅広いアーキテクチャやサイズを持ち、異なる最適化手法、正則化手法、データ拡張手法、重みの初期化を用いて訓練されたネットワークは、予測空間において同じ多様体上に位置する。この多様体の詳細を調べたところ、異なるアーキテクチャを持つネットワークは区別可能な軌跡をたどるが、他の要因の影響は最小限であること、より大きなネットワークはより小さなネットワークと同様の多様体に沿って、ただしより速く訓練されること、そして予測空間の大きく異なる場所で初期化されたネットワークも、同様の多様体に沿って解に収束することがわかった。 We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures, sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations lie on the same manifold in the prediction space. We study the details of this manifold to find that networks with different architectures follow distinguishable trajectories but other factors have a minimal influence; larger networks train along a similar manifold as that of smaller networks, just faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.	翻訳日:2024-03-21 01:30:29 公開日:2024-03-19
# 量子整数分解性能評価における拡張性の向上 Enhanced Scalability in Assessing Quantum Integer Factorization Performance ( http://arxiv.org/abs/2305.05249v3 ) ライセンス: Link先を確認	Junseo Lee,	(参考訳) 量子技術の進歩により、整数分解に基づく従来の暗号化システムには潜在的な脅威がある。したがって、関連する量子アルゴリズムの性能を正確に測定する技術の開発が重要である。本章では,行列積状態型のゲートベース量子回路シミュレータにおいて,Shorのアルゴリズムを用いて整数分解タスクに必要な時間を分析することを目的とする。さらに、Shorのアルゴリズムにおけるパラメータ事前選択の影響を観察する。具体的には、この事前選択は、繰り返し回数を減らし、固定条件下での性能測定を容易にすることにより、整数分解の成功率を高めることが期待され、実際の量子ハードウェアでもスケーラブルな性能評価を可能にする。 With the advancement of quantum technologies, there is a potential threat to traditional encryption systems based on integer factorization. Therefore, developing techniques for accurately measuring the performance of associated quantum algorithms is crucial, as it can provide insights into the practical feasibility from the current perspective. In this chapter, we aim to analyze the time required for integer factorization tasks using Shor's algorithm within a gate-based quantum circuit simulator of the matrix product state type. Additionally, we observe the impact of parameter pre-selection in Shor's algorithm. Specifically, this pre-selection is expected to increase the success rate of integer factorization by reducing the number of iterations and facilitating performance measurement under fixed conditions, thus enabling scalable performance evaluation even on real quantum hardware.	翻訳日:2024-03-21 01:30:29 公開日:2024-03-19
# ディープモーダルアライメントと自己教師型マルチタスク学習を用いたマルチモーダル感性分析における共有・プライベート情報学習 Shared and Private Information Learning in Multimodal Sentiment Analysis with Deep Modal Alignment and Self-supervised Multi-Task Learning ( http://arxiv.org/abs/2305.08473v2 ) ライセンス: Link先を確認	Songning Lai, Jiakang Li, Guinan Guo, Xifeng Hu, Yulong Li, Yuan Tan, Zichen Song, Yutong Liu, Zhaoxia Ren, Chun Wan, Danmin Miao, Zhi Liu,	(参考訳) マルチモーダル感情分析タスクのための効果的な表現学習法の設計は重要な研究方向である。この課題は、共有情報とプライベート情報の両方を完全なモーダル表現で学習することであり、均一なマルチモーダルラベルと生の機能融合アプローチでは難しい。本研究では,共分散行列に基づく深層モード共有情報学習モジュールを提案し,モダリティ間の共有情報をキャプチャする。さらに、自己教師付き学習戦略に基づくラベル生成モジュールを使用して、モダリティのプライベート情報をキャプチャする。モジュールはマルチモーダルタスクにおいてプラグ・アンド・プレイであり、パラメータ化を変更することで、モード間の情報交換関係を調整し、指定されたモード間のプライベートまたは共有情報を学習することができる。また、モデルがモーダル微分訓練データに焦点を合わせるのを支援するために、マルチタスク学習戦略も採用している。深層モード共有情報学習モジュールの設計のための詳細な定式化の導出と実現可能性証明を提供する。我々は,3つの一般的なマルチモーダル感情分析ベースラインデータセットについて広範な実験を行い,実験結果からモデルの信頼性が検証された。さらに,モジュール利用のための組合せ手法についても検討する。当社のアプローチは,3つの公開データセットの指標のほとんどにおいて,最先端の手法よりも優れています。 Designing an effective representation learning method for multimodal sentiment analysis tasks is a crucial research direction. The challenge lies in learning both shared and private information in a complete modal representation, which is difficult with uniform multimodal labels and a raw feature fusion approach. In this work, we propose a deep modal shared information learning module based on the covariance matrix to capture the shared information between modalities. Additionally, we use a label generation module based on a self-supervised learning strategy to capture the private information of the modalities. Our module is plug-and-play in multimodal tasks, and by changing the parameterization, it can adjust the information exchange relationship between the modes and learn the private or shared information between the specified modes. We also employ a multi-task learning strategy to help the model focus its attention on the modal differentiation training data. 
We provide a detailed formulation derivation and feasibility proof for the design of the deep modal shared information learning module. We conduct extensive experiments on three common multimodal sentiment analysis baseline datasets, and the experimental results validate the reliability of our model. Furthermore, we explore more combinatorial techniques for the use of the module. Our approach outperforms current state-of-the-art methods on most of the metrics of the three public datasets.	翻訳日:2024-03-21 01:20:39 公開日:2024-03-19
# LeTI: テキストインタラクションから生成を学習する LeTI: Learning to Generate from Textual Interactions ( http://arxiv.org/abs/2305.10314v2 ) ライセンス: Link先を確認	Xingyao Wang, Hao Peng, Reyhaneh Jabbarvand, Heng Ji,	(参考訳) 事前訓練された言語モデル(LM)の微調整は、その能力を高めるために不可欠である。既存の手法は一般に、入力と出力のペア(例えば、命令チューニング)や、出力品質を測る数値的な報酬(例えば、RLHF)を用いて微調整を行う。本稿では、バイナリラベルで正しさを確認するだけでなく、テキストフィードバックを通じて出力中の誤りを特定し説明するテキストインタラクションから学習する(LETI)、LMの可能性を探る。我々が焦点を当てるのはコード生成タスクであり、そこではモデルが自然言語の指示に基づいてコードを生成する。この設定では、テキストフィードバックを自然かつスケーラブルに取得できる。すなわち、Pythonインタプリタでコードを実行した際のエラーメッセージとスタックトレースである。 LETIは、自然言語の指示、LMが生成したプログラム、テキストフィードバックを連結したものに対して、LMの目的関数を用いてモデルを反復的に微調整する。この微調整用テキストの先頭にはバイナリ報酬トークンが付加され、正しい解とバグのある解を区別するために使われる。 LETIはトレーニングに正解出力を必要とせず、正解出力を用いる微調整済みベースラインをも上回る。 LETIは、コード生成データセットMBPP上でのLMの性能を向上させるだけでなく、他のデータセットにも汎化する。 MBPPで訓練されたモデルは、HumanEvalの未見の問題に対して、ベースのLMと同等以上の性能を達成する。さらに、バイナリフィードバックと比較して、テキストフィードバックは生成品質とサンプル効率の向上をもたらし、半分未満の勾配ステップ数で同じ性能を達成することがわかった。 LETIは、コード生成として定式化できる場合には自然言語タスクにも同様に適用可能であり、これをイベント引数抽出で実証的に検証した。 Fine-tuning pre-trained language models (LMs) is essential for enhancing their capabilities. Existing techniques commonly fine-tune on input-output pairs (e.g., instruction tuning) or with numerical rewards that gauge the output quality (e.g., RLHF). We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback. Our focus is the code generation task, where the model produces code based on natural language instructions. This setting invites a natural and scalable way to acquire textual feedback: the error messages and stack traces from code execution using a Python interpreter. LETI iteratively fine-tunes the model, using the LM objective, on a concatenation of natural language instructions, LM-generated programs, and textual feedback. Prepended to this fine-tuning text, a binary reward token is used to differentiate correct and buggy solutions. 
LETI requires no ground-truth outputs for training and even outperforms a fine-tuned baseline that does. LETI not only improves the performance of LMs on a code generation dataset MBPP, but also generalizes to other datasets. Trained on MBPP, it achieves comparable or better performance than the base LMs on unseen problems in HumanEval. Furthermore, compared to binary feedback, we observe that textual feedback leads to improved generation quality and sample efficiency, achieving the same performance with fewer than half of the gradient steps. LETI is equally applicable in natural language tasks when they can be formulated as code generation, which we empirically verified on event argument extraction.	翻訳日:2024-03-21 01:20:39 公開日:2024-03-19
# モーメントマッチング・デノイジング・ギブスサンプリング Moment Matching Denoising Gibbs Sampling ( http://arxiv.org/abs/2305.11650v6 ) ライセンス: Link先を確認	Mingtian Zhang, Alex Hawkins-Hooker, Brooks Paige, David Barber,	(参考訳) エネルギーベースモデル(EBM)は、複雑なデータ分布をモデル化するための汎用的な枠組みを提供する。しかし、EBMの訓練とEBMからのサンプリングは依然として大きな課題である。スケーラブルなEBM訓練に広く使われているDenoising Score Matching (DSM) 法は不整合性の問題を抱えており、エネルギーモデルが「ノイズの多い」データ分布を学習してしまう。そこで本研究では,モーメントマッチングを用いた(擬似)ギブスサンプリングという効率的なサンプリングフレームワークを提案する。これは、DSMで十分に訓練された「ノイズの多い」モデルが与えられたときに、その基礎にあるクリーンなモデルからの効果的なサンプリングを可能にする。関連手法と比較した本手法の利点を検討し、高次元データセットへのスケール方法を示す。 Energy-Based Models (EBMs) offer a versatile framework for modeling complex data distributions. However, training and sampling from EBMs continue to pose significant challenges. The widely-used Denoising Score Matching (DSM) method for scalable EBM training suffers from inconsistency issues, causing the energy model to learn a `noisy' data distribution. In this work, we propose an efficient sampling framework: (pseudo)-Gibbs sampling with moment matching, which enables effective sampling from the underlying clean model when given a `noisy' model that has been well-trained via DSM. We explore the benefits of our approach compared to related methods and demonstrate how to scale the method to high-dimensional datasets.	翻訳日:2024-03-21 01:20:39 公開日:2024-03-19
# Enrichedストリングネットモデルとその励起 Enriched string-net models and their excitations ( http://arxiv.org/abs/2305.14068v2 ) ライセンス: Link先を確認	David Green, Peter Huston, Kyle Kawagoe, David Penneys, Anup Poudel, Sean Sanford,	(参考訳) Walker-Wangモデルの境界は、境界励起としてカイラルなユニタリモジュラーテンソル圏(UMTC)を実現する可換射影モデルを構築するために用いられてきた。アノマリーのWitt類を表すUMTC $\mathcal{A}$が与えられたとき、論文 [arXiv:2208.14018] は、$\mathcal{A}$に付随する3次元Walker-Wangモデルの2次元境界上で、$\mathcal{A}$-enrichedユニタリフュージョン圏 $\mathcal{X}$に付随する可換射影モデルを与えた。同論文は、境界励起が enriched center、すなわち $Z(\mathcal{X})$ における $\mathcal{A}$ の M\"uger centralizer $Z^\mathcal{A}(\mathcal{X})$ によって与えられると主張した。本稿では、この2次元境界モデルを厳密に扱い、スケイン加群や、表現圏が境界励起を記述するある半単純代数を含むトポロジカル量子場理論(TQFT)の手法を用いて、この主張を検証する。また、TQFTの手法を用いて、Walker-Wangバルクの3次元バルク点励起が M\"uger center $Z_2(\mathcal{A})$ によって与えられることを示し、境界励起のUMTC $Z^{\mathcal{A}}(\mathcal{X})$ が $Z_2(\mathcal{A})$ 上で symmetric-braided enriched であることを反映する、バルクから境界へのホッピング作用素 $Z_2(\mathcal{A})\to Z^{\mathcal{A}}(\mathcal{X})$ を構成する。本稿はまた、skeletal な $6j$ 記号の観点ではなく、ユニタリテンソル圏の観点からのLevin-Wenストリングネットモデルの自己完結した包括的なレビューを含む。 Boundaries of Walker-Wang models have been used to construct commuting projector models which realize chiral unitary modular tensor categories (UMTCs) as boundary excitations. Given a UMTC $\mathcal{A}$ representing the Witt class of an anomaly, the article [arXiv:2208.14018] gave a commuting projector model associated to an $\mathcal{A}$-enriched unitary fusion category $\mathcal{X}$ on a 2D boundary of the 3D Walker-Wang model associated to $\mathcal{A}$. That article claimed that the boundary excitations were given by the enriched center/M\"uger centralizer $Z^\mathcal{A}(\mathcal{X})$ of $\mathcal{A}$ in $Z(\mathcal{X})$. In this article, we give a rigorous treatment of this 2D boundary model, and we verify this assertion using topological quantum field theory (TQFT) techniques, including skein modules and a certain semisimple algebra whose representation category describes boundary excitations. 
We also use TQFT techniques to show the 3D bulk point excitations of the Walker-Wang bulk are given by the M\"uger center $Z_2(\mathcal{A})$, and we construct bulk-to-boundary hopping operators $Z_2(\mathcal{A})\to Z^{\mathcal{A}}(\mathcal{X})$ reflecting how the UMTC of boundary excitations $Z^{\mathcal{A}}(\mathcal{X})$ is symmetric-braided enriched in $Z_2(\mathcal{A})$. This article also includes a self-contained comprehensive review of the Levin-Wen string net model from a unitary tensor category viewpoint, as opposed to the skeletal $6j$ symbol viewpoint.	翻訳日:2024-03-21 01:20:39 公開日:2024-03-19
# 不可能な蒸留--要約・言い換えのための低品質モデルから高品質データセット・モデルへ Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing ( http://arxiv.org/abs/2305.16635v2 ) ライセンス: Link先を確認	Jaehun Jung, Peter West, Liwei Jiang, Faeze Brahman, Ximing Lu, Jillian Fisher, Taylor Sorensen, Yejin Choi,	(参考訳) 本稿では,言い換えと文要約のための新しいフレームワークであるImpossible Distillationを提案する。これは,これらのタスクを自身では遂行できない低品質の教師から,高品質なデータセットとモデルを蒸留するものである。超大規模な教師モデル(例: GPT3)やタスク固有のアーキテクチャに依存する先行研究とは異なり,事前学習済みLM(例: GPT2)に内在する言い換えの近接性,すなわち言い換えがLM分布の近接した部分空間を占めるという性質を仮説として立て,検証する。これらの部分空間から生成文を特定して蒸留することにより,Impossible DistillationはGPT2規模のLMからでも高品質なデータセットとモデルを生成する。制約なし/構文制御の言い換え生成と文要約にまたがる複数のベンチマークで本手法を評価した。770Mパラメータの我々のモデルは,ChatGPTから蒸留されたモデルを含む強力なベースラインを一貫して上回り,時にはChatGPT自体をも上回る。また,1.5BのLMから蒸留したデータセットは,最大13倍大きいデータセットよりも高い多様性と忠実度を示すことがわかった。 We present Impossible Distillation, a novel framework for paraphrasing and sentence summarization, that distills a high-quality dataset and model from a low-quality teacher that itself cannot perform these tasks. Unlike prior works that rely on an extreme-scale teacher model (e.g., GPT3) or task-specific architecture, we hypothesize and verify the paraphrastic proximity intrinsic to pre-trained LMs (e.g., GPT2), where paraphrases occupy a proximal subspace in the LM distribution. By identifying and distilling generations from these subspaces, Impossible Distillation produces a high-quality dataset and model even from GPT2-scale LMs. We evaluate our method on multiple benchmarks spanning unconstrained / syntax-controlled paraphrase generation and sentence summarization. Our model with 770M parameters consistently outperforms strong baselines, including models distilled from ChatGPT, and sometimes, even ChatGPT itself. Also, we find that our distilled dataset from 1.5B LMs exhibits higher diversity and fidelity than up to 13 times larger datasets.	翻訳日:2024-03-21 01:20:39 公開日:2024-03-19
# 費用対効果のある適応的ランダムテストに向けて : 近似的近傍アプローチ Toward Cost-effective Adaptive Random Testing: An Approximate Nearest Neighbor Approach ( http://arxiv.org/abs/2305.17496v2 ) ライセンス: Link先を確認	Rubing Huang, Chenhui Cui, Junlong Lian, Dave Towey, Weifeng Sun, Haibo Chen,	(参考訳) 適応ランダムテスト(ART)は、入力ドメイン全体のランダムテストケースの多様性を増大させることで、ランダムテスト(RT)のテストの有効性(障害検出機能を含む)を高める。多くのARTアルゴリズムがFSCS (Fixed-Size-Candidate-Set ART) やRRT (Restricted Random Testing) などによって研究され、多くの実用的応用で広く利用されている。その人気にもかかわらず、ARTは特にテストケースの数が増えるにつれて、テストケース生成時の高い計算コストの問題に悩まされている。これらのアルゴリズムは,(1)計算時間を短縮できるが,その実行コストは,特にテストケースの数が多い場合に非常に高く,(2)低い計算コストを達成するためには,いくつかの故障検出能力を犠牲にする可能性がある。本稿では,ANNをベースとしたLocality-Sensitive Hashing ART(LSH-ART)を提案する。異なるテスト入力間の距離を計算するとき、LSH-ARTは、候補者に最も近い(必ずしも正確ではない)近傍を効率的な方法で識別する。 LSH-ARTはARTテストの有効性と効率のバランスをとる。 Adaptive Random Testing (ART) enhances the testing effectiveness (including fault-detection capability) of Random Testing (RT) by increasing the diversity of the random test cases throughout the input domain. Many ART algorithms have been investigated such as Fixed-Size-Candidate-Set ART (FSCS) and Restricted Random Testing (RRT), and have been widely used in many practical applications. Despite its popularity, ART suffers from the problem of high computational costs during test-case generation, especially as the number of test cases increases. Although several strategies have been proposed to enhance the ART testing efficiency, such as the forgetting strategy and the k-dimensional tree strategy, these algorithms still face some challenges, including: (1) Although these algorithms can reduce the computation time, their execution costs are still very high, especially when the number of test cases is large; and (2) To achieve low computational costs, they may sacrifice some fault-detection capability. In this paper, we propose an approach based on Approximate Nearest Neighbors (ANNs), called Locality-Sensitive Hashing ART (LSH-ART). 
When calculating distances among different test inputs, LSH-ART identifies the approximate (not necessarily exact) nearest neighbors for candidates in an efficient way. LSH-ART attempts to balance ART testing effectiveness and efficiency.	翻訳日:2024-03-21 01:20:39 公開日:2024-03-19
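The LSH-ART entry above combines candidate-set ART with approximate nearest-neighbour search. The following is a minimal sketch of that idea, not the authors' implementation: the abstract does not specify the hashing scheme, so the random-projection hash, the class `LshIndex`, the function `lsh_art`, the bucket width, the unit-hypercube input domain, and the exact-scan fallback for empty buckets are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


class LshIndex:
    """Toy random-projection LSH index for points in [0, 1)^dim (illustrative only)."""

    def __init__(self, dim, num_tables=4, num_projections=2, bucket_width=0.25, seed=0):
        rng = random.Random(seed)
        self.bucket_width = bucket_width
        self.points = []
        # Each table has its own Gaussian projection vectors and random offsets.
        self.tables = []
        for _ in range(num_tables):
            projections = [
                ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, bucket_width))
                for _ in range(num_projections)
            ]
            self.tables.append((projections, defaultdict(list)))

    def _key(self, projections, point):
        # Quantise each projected coordinate into a bucket index.
        return tuple(
            math.floor((sum(a * x for a, x in zip(vec, point)) + offset) / self.bucket_width)
            for vec, offset in projections
        )

    def add(self, point):
        self.points.append(point)
        for projections, buckets in self.tables:
            buckets[self._key(projections, point)].append(point)

    def approx_nearest_distance(self, query):
        """Distance to the closest stored point that shares a bucket with `query`."""
        neighbours = []
        for projections, buckets in self.tables:
            neighbours.extend(buckets.get(self._key(projections, query), ()))
        if not neighbours:
            # Assumption: fall back to an exact scan when nothing collides.
            neighbours = self.points
        return min(math.dist(query, p) for p in neighbours)


def lsh_art(num_tests, dim=2, num_candidates=10, seed=0):
    """Generate test inputs FSCS-style, scoring candidates by approximate NN distance."""
    rng = random.Random(seed)
    index = LshIndex(dim, seed=seed)
    first = tuple(rng.random() for _ in range(dim))
    index.add(first)
    tests = [first]
    while len(tests) < num_tests:
        candidates = [tuple(rng.random() for _ in range(dim)) for _ in range(num_candidates)]
        # Keep the candidate that is (approximately) farthest from executed tests.
        best = max(candidates, key=index.approx_nearest_distance)
        index.add(best)
        tests.append(best)
    return tests
```

Calling `lsh_art(20)` returns 20 points in the unit square. Because only colliding buckets are inspected, the selected candidate is not guaranteed to be the exact FSCS choice, which is the effectiveness/efficiency trade-off the abstract refers to.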
# 雑音を考慮した低次多項式閾値関数の属性効率PAC学習 Attribute-Efficient PAC Learning of Low-Degree Polynomial Threshold Functions with Nasty Noise ( http://arxiv.org/abs/2306.00673v2 ) ライセンス: Link先を確認	Shiwei Zeng, Jie Shen,	(参考訳) 低次多項式しきい値関数(PTF)の概念クラスは、機械学習において基本的な役割を果たす。本稿では,$K$-sparse degree-$d$ PTFs on $\mathbb{R}^n$のPAC学習について検討する。我々の主な貢献は、時間$({nd}/{\epsilon})^{O(d)}$とガウス限界分布の下で、PACはクラスを$O(\frac{K^{4d}}{\epsilon^{2d}} \cdot \log^{5d} n)$サンプルで学習する。この研究に先立ち、属性効率の良いロバストアルゴリズムはスパースホモジニアス半空間の特別なケースに対してのみ確立される。主な材料は以下の通り。 1) ハーマイト多項式に基づくChowベクトルのスパーシティパターンに属性のスパーシティを変換する構造的結果、及び 2) 制限されたフロベニウスノルムのみを用いて良好な近似を証明するか, あるいは, 破損したサンプルを検出するフィルタとして, 疎度誘起次数-$2d$多項式の検証を行う新しい属性効率の強いChowベクトル推定アルゴリズムを提案する。 The concept class of low-degree polynomial threshold functions (PTFs) plays a fundamental role in machine learning. In this paper, we study PAC learning of $K$-sparse degree-$d$ PTFs on $\mathbb{R}^n$, where any such concept depends only on $K$ out of $n$ attributes of the input. Our main contribution is a new algorithm that runs in time $({nd}/{\epsilon})^{O(d)}$ and under the Gaussian marginal distribution, PAC learns the class up to error rate $\epsilon$ with $O(\frac{K^{4d}}{\epsilon^{2d}} \cdot \log^{5d} n)$ samples even when an $\eta \leq O(\epsilon^d)$ fraction of them are corrupted by the nasty noise of Bshouty et al. (2002), possibly the strongest corruption model. Prior to this work, attribute-efficient robust algorithms are established only for the special case of sparse homogeneous halfspaces. Our key ingredients are: 1) a structural result that translates the attribute sparsity to a sparsity pattern of the Chow vector under the basis of Hermite polynomials, and 2) a novel attribute-efficient robust Chow vector estimation algorithm which uses exclusively a restricted Frobenius norm to either certify a good approximation or to validate a sparsity-induced degree-$2d$ polynomial as a filter to detect corrupted samples.	翻訳日:2024-03-21 01:20:39 公開日:2024-03-19
# 両世界のベスト:イベントベース光フロー推定のためのハイブリッドSNN-ANNアーキテクチャ Best of Both Worlds: Hybrid SNN-ANN Architecture for Event-based Optical Flow Estimation ( http://arxiv.org/abs/2306.02960v2 ) ライセンス: Link先を確認	Shubham Negi, Deepika Sharma, Adarsh Kumar Kosta, Kaushik Roy,	(参考訳) ロボット工学の分野では、イベントベースのカメラは、高速な動きとダイナミックレンジのシーンを撮影するための、従来のフレームベースのカメラに代わる有望な低消費電力カメラとして登場している。これはスパースで非同期なイベント出力のためです。非同期イベント駆動型計算でニューラルネットワーク(SNN)をスパイクすることは、これらのイベントストリームから時空間的特徴を抽出する大きな可能性を示す。対照的に、標準的なアナログニューラルネットワーク(ANN)は、イベントデータを効率的に処理できない。しかし、トレーニング可能なパラメータ(閾値とリーク)の追加、深い層でのスパイクの消滅、微分不可能なバイナリアクティベーション機能により、SNNのトレーニングは困難である。さらに、時間情報の追跡に責任を持つ追加のデータ構造である膜電位は、SNNのすべての時間ステップで取得および更新されなければならない。これらの課題を克服するために,両モデルの強みを組み合わせた新しいSNN-ANNハイブリッドアーキテクチャを提案する。具体的には、SNNレイヤの非同期計算機能を活用して、入力時間情報を効果的に抽出する。同時に、ANNレイヤは、GPUのような従来の機械学習ハードウェア上でのトレーニングと効率的なハードウェアデプロイメントを容易にする。我々は、各層をスパイクまたはアナログに割り当てる実験的な分析を行い、性能と訓練の容易さに最適化されたネットワーク構成をもたらす。 DSEC-flowとMVSEC(Multi-Vehicle Stereo Event-Camera)データセットを用いた光フロー推定のためのハイブリッドアーキテクチャの評価を行った。 DSEC-flowデータセットでは、ハイブリッドSNN-ANNアーキテクチャは、平均エンドポイントエラー(AEE)を40%削減し、Full-SNNよりも22%低いエネルギー消費量、Full-ANNより48%低いAEEを実現する。 In the field of robotics, event-based cameras are emerging as a promising low-power alternative to traditional frame-based cameras for capturing high-speed motion and high dynamic range scenes. This is due to their sparse and asynchronous event outputs. Spiking Neural Networks (SNNs) with their asynchronous event-driven compute, show great potential for extracting the spatio-temporal features from these event streams. In contrast, the standard Analog Neural Networks (ANNs) fail to process event data effectively. However, training SNNs is difficult due to additional trainable parameters (thresholds and leaks), vanishing spikes at deeper layers, and a non-differentiable binary activation function. Furthermore, an additional data structure, membrane potential, responsible for keeping track of temporal information, must be fetched and updated at every timestep in SNNs. 
To overcome these challenges, we propose a novel SNN-ANN hybrid architecture that combines the strengths of both. Specifically, we leverage the asynchronous compute capabilities of SNN layers to effectively extract the input temporal information. Concurrently, the ANN layers facilitate training and efficient hardware deployment on traditional machine learning hardware such as GPUs. We provide extensive experimental analysis for assigning each layer to be spiking or analog, leading to a network configuration optimized for performance and ease of training. We evaluate our hybrid architecture for optical flow estimation on DSEC-flow and Multi-Vehicle Stereo Event-Camera (MVSEC) datasets. On the DSEC-flow dataset, the hybrid SNN-ANN architecture achieves a 40% reduction in average endpoint error (AEE) with 22% lower energy consumption compared to Full-SNN, and 48% lower AEE compared to Full-ANN, while maintaining comparable energy usage.	翻訳日:2024-03-21 01:20:39 公開日:2024-03-19
# もうひとつのICUベンチマーク: 臨床MLのためのフレキシブルなマルチセンターフレームワーク Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML ( http://arxiv.org/abs/2306.05109v4 ) ライセンス: Link先を確認	Robin van de Water, Hendrik Schmidt, Paul Elbers, Patrick Thoral, Bert Arnrich, Patrick Rockenschaub,	(参考訳) 近年,機械学習(ML)の医療応用が急増している。集中治療室(ICU)は、電子健康記録から得られるデータが豊富であることから、MLの適用に自然に適した場である。合併症の早期検出など、多数のICU予測タスクに対処するモデルが提案されている。著者らはしばしば最先端の性能を報告するが、優位性の主張を検証することは難しい。データセットとコードは必ずしも公開されておらず、コホート定義、前処理パイプライン、訓練設定の再現は困難である。本研究では,研究者が再現可能かつ比較可能な臨床ML実験を定義できるモジュール式フレームワークであるYet Another ICU Benchmark (YAIB)を紹介し,コホート定義からモデル評価までのエンドツーエンドのソリューションを提供する。このフレームワークは、ほとんどのオープンアクセスICUデータセット(MIMIC III/IV、eICU、HiRID、AUMCdb)をネイティブにサポートしており、将来のICUデータセットにも容易に適応できる。透明性のある前処理パイプラインと、複数のMLおよびディープラーニングモデル向けの拡張可能な訓練コードを組み合わせることで、YAIBは統一的なモデル開発を可能にする。本ベンチマークには、臨床医と共同で開発した5つの定義済み予測タスク(死亡、急性腎障害、敗血症、腎機能、在院日数)が付属する。さらなるタスクの追加も設計上容易である。YAIBを用いて、データセット、コホート定義、前処理の選択が予測性能に大きな影響を与え、その影響はしばしばモデルクラスの選択よりも大きいことを示す。これは、包括的なベンチマークツールとしてのYAIBが緊急に必要とされていることを示している。本研究を臨床MLコミュニティに提供し、手法開発の加速と実世界での臨床実装を可能にする。ソフトウェアリポジトリ: https://github.com/rvandewater/YAIB Medical applications of machine learning (ML) have experienced a surge in popularity in recent years. The intensive care unit (ICU) is a natural habitat for ML given the abundance of available data from electronic health records. Models have been proposed to address numerous ICU prediction tasks like the early detection of complications. While authors frequently report state-of-the-art performance, it is challenging to verify claims of superiority. Datasets and code are not always published, and cohort definitions, preprocessing pipelines, and training setups are difficult to reproduce. This work introduces Yet Another ICU Benchmark (YAIB), a modular framework that allows researchers to define reproducible and comparable clinical ML experiments; we offer an end-to-end solution from cohort definition to model evaluation. 
The framework natively supports most open-access ICU datasets (MIMIC III/IV, eICU, HiRID, AUMCdb) and is easily adaptable to future ICU datasets. Combined with a transparent preprocessing pipeline and extensible training code for multiple ML and deep learning models, YAIB enables unified model development. Our benchmark comes with five predefined established prediction tasks (mortality, acute kidney injury, sepsis, kidney function, and length of stay) developed in collaboration with clinicians. Adding further tasks is straightforward by design. Using YAIB, we demonstrate that the choice of dataset, cohort definition, and preprocessing have a major impact on the prediction performance - often more so than model class - indicating an urgent need for YAIB as a holistic benchmarking tool. We provide our work to the clinical ML community to accelerate method development and enable real-world clinical implementations. Software Repository: https://github.com/rvandewater/YAIB.	翻訳日:2024-03-21 01:20:39 公開日:2024-03-19
# Ada-NAV:ロボットナビゲーションのための適応軌道長に基づく効率的な政策学習 Ada-NAV: Adaptive Trajectory Length-Based Sample Efficient Policy Learning for Robotic Navigation ( http://arxiv.org/abs/2306.06192v4 ) ライセンス: Link先を確認	Bhrij Patel, Kasun Weerakoon, Wesley A. Suttle, Alec Koppel, Brian M. Sadler, Tianyi Zhou, Amrit Singh Bedi, Dinesh Manocha,	(参考訳) 軌道長は強化学習(RL)アルゴリズムにおける重要なハイパーパラメータであり、ロボット工学の応用におけるサンプルの非効率性に大きく貢献する。 Ada-NAVはロボットナビゲーションタスクにおけるRLアルゴリズムのトレーニングサンプル効率を高めるために設計された新しい適応軌道長スキームである。軌道長を固定されたハイパーパラメータとして扱う従来の手法とは異なり、下層の航法方針のエントロピーに基づいて動的に調整することを提案する。興味深いことに、Ada-NAVは既存のオン・ポリティとオフ・ポリティィのRL手法の両方に適用でき、この手法はREINFORCE, Proximal Policy Optimization (PPO), Soft Actor-Critic (SAC)の3つの一般的なRL法に対して実証的に有効性を示す。我々は、Ada-NAVが一定またはランダムにサンプリングされた軌道長を用いる従来の手法よりも優れている、シミュレーションおよび実世界のロボット実験を通して実証する。特に、固定サンプル予算では、Ada-NAV は航法成功率 18 % 、航法パス長 20-38 % 、高架コスト 9.32 % を達成している。さらに,Ada-NAVをClearpath Huskyロボットに統合することで,複雑な屋外環境に適用可能であることを示す。 Trajectory length stands as a crucial hyperparameter within reinforcement learning (RL) algorithms, significantly contributing to the sample inefficiency in robotics applications. Motivated by the pivotal role trajectory length plays in the training process, we introduce Ada-NAV, a novel adaptive trajectory length scheme designed to enhance the training sample efficiency of RL algorithms in robotic navigation tasks. Unlike traditional approaches that treat trajectory length as a fixed hyperparameter, we propose to dynamically adjust it based on the entropy of the underlying navigation policy. Interestingly, Ada-NAV can be applied to both existing on-policy and off-policy RL methods, which we demonstrate by empirically validating its efficacy on three popular RL methods: REINFORCE, Proximal Policy Optimization (PPO), and Soft Actor-Critic (SAC). We demonstrate through simulated and real-world robotic experiments that Ada-NAV outperforms conventional methods that employ constant or randomly sampled trajectory lengths. 
Specifically, for a fixed sample budget, Ada-NAV achieves an 18\% increase in navigation success rate, a 20-38\% reduction in navigation path length, and a 9.32\% decrease in elevation costs. Furthermore, we showcase the versatility of Ada-NAV by integrating it with the Clearpath Husky robot, illustrating its applicability in complex outdoor environments.	翻訳日:2024-03-21 01:10:09 公開日:2024-03-19
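The Ada-NAV entry above adjusts trajectory length from the entropy of the navigation policy. A minimal sketch of such a schedule follows. The abstract does not give the mapping, so everything here is an assumption: the linear form, the direction (lower entropy gives longer trajectories), the bounds `min_len` and `max_len`, and the function names are hypothetical, and a discrete action distribution is used only to keep the example small.

```python
import math


def policy_entropy(action_probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in action_probs if p > 0.0)


def adaptive_trajectory_length(action_probs, min_len=16, max_len=256):
    """Map policy entropy to a rollout length.

    Assumed schedule (not taken from the paper): a uniform, high-entropy
    policy gets `min_len` steps, a deterministic policy gets `max_len`
    steps, with linear interpolation in between.
    """
    max_entropy = math.log(len(action_probs))
    if max_entropy > 0.0:
        normalized = policy_entropy(action_probs) / max_entropy
    else:
        normalized = 0.0
    # Guard against floating-point drift outside [0, 1].
    normalized = min(max(normalized, 0.0), 1.0)
    return round(max_len - normalized * (max_len - min_len))
```

In a training loop, the returned value would replace the fixed rollout horizon before each batch of trajectories is collected; for continuous-action policies the entropy term would need a different normalisation.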
# Radiology-GPT: 放射線医学のための大規模言語モデル Radiology-GPT: A Large Language Model for Radiology ( http://arxiv.org/abs/2306.08666v2 ) ライセンス: Link先を確認	Zhengliang Liu, Aoxiao Zhong, Yiwei Li, Longtao Yang, Chao Ju, Zihao Wu, Chong Ma, Peng Shu, Cheng Chen, Sekeun Kim, Haixing Dai, Lin Zhao, Lichao Sun, Dajiang Zhu, Jun Liu, Wei Liu, Dinggang Shen, Xiang Li, Quanzheng Li, Tianming Liu,	(参考訳) 本稿では,放射線医学のための大規模言語モデルであるRadiology-GPTを紹介する。放射線医学のドメイン知識に関する広範なデータセットに対する指示チューニング手法を用いることで、Radiology-GPTは、StableLM、Dolly、LLaMAといった汎用言語モデルと比較して優れた性能を示す。放射線診断、研究、コミュニケーションにおいて高い汎用性を示す。本研究は臨床NLPの今後の発展の触媒となる。Radiology-GPTの実装の成功は、HIPAAのようなプライバシー基準の遵守を確保しつつ、個別の医療専門分野に特化した生成的大規模言語モデルをローカライズできる可能性を示している。各病院の固有のニーズに応える個別化された大規模言語モデルを開発するという見通しは、有望な方向性である。これらのモデルにおける会話能力とドメイン固有知識の融合は、医療AIの今後の発展を促進すると期待される。Radiology-GPTのデモは https://huggingface.co/spaces/allen-eric/radiology-gpt で公開されている。 We introduce Radiology-GPT, a large language model for radiology. Using an instruction tuning approach on an extensive dataset of radiology domain knowledge, Radiology-GPT demonstrates superior performance compared to general language models such as StableLM, Dolly and LLaMA. It exhibits significant versatility in radiological diagnosis, research, and communication. This work serves as a catalyst for future developments in clinical NLP. The successful implementation of Radiology-GPT is indicative of the potential of localizing generative large language models, specifically tailored for distinctive medical specialties, while ensuring adherence to privacy standards such as HIPAA. The prospect of developing individualized, large-scale language models that cater to specific needs of various hospitals presents a promising direction. The fusion of conversational competence and domain-specific knowledge in these models is set to foster future development in healthcare AI. A demo of Radiology-GPT is available at https://huggingface.co/spaces/allen-eric/radiology-gpt.	翻訳日:2024-03-21 01:10:09 公開日:2024-03-19
# Granger-Causal Hierarchical Skill Discovery Granger-Causal Hierarchical Skill Discovery ( http://arxiv.org/abs/2306.09509v2 ) ライセンス: Link先を確認	Caleb Chuck, Kevin Black, Aditya Arjun, Yuke Zhu, Scott Niekum,	(参考訳) 強化学習(Reinforcement Learning, RL)は複雑なタスクの学習方針において有望な結果を示してきたが、しばしばサンプル効率の低下と限られた伝達性に悩まされる。階層的RL(HRL)手法は、政策をスキルに分解し、状態を抽象化し、新しいタスクでスキルを再利用することで、長期タスクの学習の難しさを解決することを目的としている。しかし、多くのHRL手法は、有用なスキルを見つけるためにいくつかの初期タスク成功を必要とする。一方、報酬のないHRL法は、高次元領域における適切なカバレッジを達成するために、あまりにも多くのスキルを習得する必要があることが多い。対照的に、我々は、高レベルの制御を許す少数のタスク非依存スキルを特定するために、分解されたドメインの制御性に焦点を当てた相互作用スキルの連鎖(COInS)アルゴリズムを導入している。 COInSは学習した検出器を使って状態要因間の相互作用を識別し、それぞれの要因を連続的に制御する一連のスキルを訓練する。障害物のあるロボット押下作業におけるCOInSの評価-他のRL法とHRL法が不足する課題領域である。また、一般的なRLベンチマークであるBreakoutの変種を用いて、COInSが学習したスキルの伝達性を実証し、標準RLベースラインと比較してサンプル効率と最終性能を2～3倍改善したことを示す。 Reinforcement Learning (RL) has demonstrated promising results in learning policies for complex tasks, but it often suffers from low sample efficiency and limited transferability. Hierarchical RL (HRL) methods aim to address the difficulty of learning long-horizon tasks by decomposing policies into skills, abstracting states, and reusing skills in new tasks. However, many HRL methods require some initial task success to discover useful skills, which paradoxically may be very unlikely without access to useful skills. On the other hand, reward-free HRL methods often need to learn far too many skills to achieve proper coverage in high-dimensional domains. In contrast, we introduce the Chain of Interaction Skills (COInS) algorithm, which focuses on controllability in factored domains to identify a small number of task-agnostic skills that still permit a high degree of control. COInS uses learned detectors to identify interactions between state factors and then trains a chain of skills to control each of these factors successively. We evaluate COInS on a robotic pushing task with obstacles-a challenging domain where other RL and HRL methods fall short. 
We also demonstrate the transferability of skills learned by COInS, using variants of Breakout, a common RL benchmark, and show 2-3x improvement in both sample efficiency and final performance compared to standard RL baselines.	翻訳日:2024-03-21 01:10:08 公開日:2024-03-19
# 生成型マルチモーダルエンティティリンク Generative Multimodal Entity Linking ( http://arxiv.org/abs/2306.12725v3 ) ライセンス: Link先を確認	Senbao Shi, Zhenran Xu, Baotian Hu, Min Zhang,	(参考訳) MEL(Multimodal Entity Linking)は、知識ベースから参照エンティティへの参照をマルチモーダルコンテキストでマッピングするタスクである。既存のMEL法は主に複雑なマルチモーダル相互作用機構の設計に重点を置いており、全てのモデルパラメータを微調整する必要がある。本研究では,ジェネレーティブ・マルチモーダル・エンティティ・リンク・フレームワークであるGEMELを提案する。私たちはビジョンと言語モデルを凍結し続け、モダリティ間の相互作用を可能にするために機能マッパーをトレーニングします。 MELタスクにLLMを適用するために,マルチモーダルインスタンスを実演として検索することで,LLMのコンテキスト内学習能力を活用する。 GEMEL はモデルパラメータの 0.3% しか微調整されていないため、2つの確立された MEL データセット(WikiDiverse では 7.7% 、WikiMEL では 8.8% の精度向上)で最先端の結果が得られている。性能向上は、LLM予測の人気バイアスを緩和し、あまり一般的でないエンティティを効果的に曖昧にすることに起因する。さらなる解析により、GEMELの一般性とスケーラビリティが検証される。我々のフレームワークは市販の言語モデルと互換性があり、MELタスクでLLMを利用するための効率的で汎用的なソリューションに向かっている。 Multimodal Entity Linking (MEL) is the task of mapping mentions with multimodal contexts to the referent entities from a knowledge base. Existing MEL methods mainly focus on designing complex multimodal interaction mechanisms and require fine-tuning all model parameters, which can be prohibitively costly and difficult to scale in the era of Large Language Models (LLMs). In this work, we propose GEMEL, a Generative Multimodal Entity Linking framework based on LLMs, which directly generates target entity names. We keep the vision and language model frozen and only train a feature mapper to enable cross-modality interactions. To adapt LLMs to the MEL task, we leverage the in-context learning capability of LLMs by retrieving multimodal instances as demonstrations. Extensive experiments show that, with only ~0.3% of the model parameters fine-tuned, GEMEL achieves state-of-the-art results on two well-established MEL datasets (7.7% accuracy gains on WikiDiverse and 8.8% accuracy gains on WikiMEL). The performance gain stems from mitigating the popularity bias of LLM predictions and disambiguating less common entities effectively. Further analysis verifies the generality and scalability of GEMEL. 
Our framework is compatible with any off-the-shelf language model, paving the way towards an efficient and general solution for utilizing LLMs in the MEL task.	翻訳日:2024-03-21 01:10:08 公開日:2024-03-19
# 時間的に一貫した人間のアニメーションのための双方向時間拡散モデル Bidirectional Temporal Diffusion Model for Temporally Consistent Human Animation ( http://arxiv.org/abs/2307.00574v4 ) ライセンス: Link先を確認	Tserendorj Adiya, Jae Shin Yoon, Jungeun Lee, Sanghun Kim, Hwasup Lim,	(参考訳) 本研究では,1枚の画像,ビデオ,またはランダムノイズから時間的に一貫した人間のアニメーションを生成する手法を提案する。この問題はこれまで,自己回帰的な生成,すなわち過去のフレームを回帰して将来のフレームを復号するモデルとして定式化されてきた。しかし、このような一方向の生成は時間の経過とともに動きがドリフトしやすく、外見の歪みのような顕著なアーチファクトを伴う非現実的な人間のアニメーションを生成する。我々は、双方向の時間的モデリングが、人間の外見の動きの曖昧さを大幅に抑制することにより、生成ネットワークに時間的コヒーレンスを強制すると主張する。この主張を示すため、デノイジング拡散モデルを用いた新しい人間アニメーションフレームワークを設計する。ニューラルネットワークは、連続するフレーム間で中間結果が双方向に相互条件付けされた時間的ガウス雑音を除去することにより、人物の画像を生成することを学習する。実験では,本手法は既存の一方向アプローチと比較して,現実的な時間的コヒーレンスを伴う高い性能を示す。 We introduce a method to generate temporally coherent human animation from a single image, a video, or a random noise. This problem has been formulated as modeling of an auto-regressive generation, i.e., to regress past frames to decode future frames. However, such unidirectional generation is highly prone to motion drifting over time, generating unrealistic human animation with significant artifacts such as appearance distortion. We claim that bidirectional temporal modeling enforces temporal coherence on a generative network by largely suppressing the motion ambiguity of human appearance. To prove our claim, we design a novel human animation framework using a denoising diffusion model: a neural network learns to generate the image of a person by denoising temporal Gaussian noises whose intermediate results are cross-conditioned bidirectionally between consecutive frames. In the experiments, our method demonstrates strong performance compared to existing unidirectional approaches with realistic temporal coherence.	翻訳日:2024-03-21 01:10:08 公開日:2024-03-19
# VOLTA: 変分相互情報最大化オートエンコーダによる生成多様性の向上 VOLTA: Improving Generative Diversity by Variational Mutual Information Maximizing Autoencoder ( http://arxiv.org/abs/2307.00852v2 ) ライセンス: Link先を確認	Yueen Ma, Dafeng Chi, Jingjing Li, Kai Song, Yuzheng Zhuang, Irwin King,	(参考訳) 自然言語生成ドメインはTransformerモデルのおかげで大きな成功を収めた。彼らは最先端の世代的品質を達成したが、しばしば世代的多様性を無視する。この問題に対処する以前の試みは、モデル容量の低いか、複雑すぎるアーキテクチャのいずれかに悩まされていた。いくつかの最近の手法では、多様性を高めるためにVAEフレームワークを使用しているが、潜伏変数は入力コンテキストに完全に依存しており、潜伏空間の探索を制限している。本稿では,従来の埋め込み結合や要約から離れて,より効果的な相互接続により,トランスフォーマーとVAEをブリッジすることで生成多様性を高めるフレームワークであるVOLTAを紹介する。さらに,インプットに依存しない可変性を実現するためにInfoGANスタイルの潜時符号を統合することを提案する。さらに,本フレームワークは,従来の連続入力サポートと並行して,離散入力に対応している。我々は3つの異なるNLGタスクから得られた6つのデータセットに対して2種類のトランスフォーマーを用いて総合的な実験を行い、生成品質を維持しながら生成の多様性を著しく改善できることを示す。 The natural language generation domain has witnessed great success thanks to Transformer models. Although they have achieved state-of-the-art generative quality, they often neglect generative diversity. Prior attempts to tackle this issue suffer from either low model capacity or over-complicated architectures. Some recent methods employ the VAE framework to enhance diversity, but their latent variables fully depend on the input context, restricting exploration of the latent space. In this paper, we introduce VOLTA, a framework that elevates generative diversity by bridging Transformer with VAE via a more effective cross-attention-based connection, departing from conventional embedding concatenation or summation. Additionally, we propose integrating InfoGAN-style latent codes to enable input-independent variability, further diversifying the generation. Moreover, our framework accommodates discrete inputs alongside its existing support for continuous inputs. We perform comprehensive experiments with two types of Transformers on six datasets from three different NLG tasks to show that our approach can significantly improve generative diversity while maintaining generative quality.	翻訳日:2024-03-21 01:10:08 公開日:2024-03-19
# SAMAug: Segment Anything Modelのためのポイントプロンプト拡張 SAMAug: Point Prompt Augmentation for Segment Anything Model ( http://arxiv.org/abs/2307.01187v4 ) ライセンス: Link先を確認	Haixing Dai, Chong Ma, Zhiling Yan, Zhengliang Liu, Enze Shi, Yiwei Li, Peng Shu, Xiaozheng Wei, Lin Zhao, Zihao Wu, Fang Zeng, Dajiang Zhu, Wei Liu, Quanzheng Li, Lichao Sun, Shu Zhang, Tianming Liu, Xiang Li,	(参考訳) 本稿では,対話型画像分割の性能を向上させる,Segment Anything Model(SAM)のための新しい視覚的ポイント拡張手法であるSAMAugを紹介する。SAMAugは、ユーザの意図に関するより多くの情報をSAMに提供するために、拡張ポイントプロンプトを生成する。まず初期のポイントプロンプトからSAMが初期マスクを生成し、それを提案するSAMAugに入力して拡張ポイントプロンプトを生成する。これらの追加の点を組み込むことで、SAMは拡張ポイントプロンプトと初期プロンプトの両方に基づいて拡張セグメンテーションマスクを生成でき、セグメンテーション性能が向上する。ランダムサンプリング,最大差エントロピーに基づくサンプリング,最大距離,サリエンシという4つの異なるポイント拡張戦略を用いて評価を行った。COCO、Fundus、COVID QUEx、ISIC2018データセットでの実験結果は、SAMAugが、特に最大距離とサリエンシを用いた場合に、SAMのセグメンテーション結果を向上できることを示している。SAMAugはコンピュータビジョンにおける視覚的プロンプト拡張の可能性を示す。SAMAugのコードは github.com/yhydhx/SAMAug で入手できる。 This paper introduces SAMAug, a novel visual point augmentation method for the Segment Anything Model (SAM) that enhances interactive image segmentation performance. SAMAug generates augmented point prompts to provide more information about the user's intention to SAM. Starting with an initial point prompt, SAM produces an initial mask, which is then fed into our proposed SAMAug to generate augmented point prompts. By incorporating these extra points, SAM can generate augmented segmentation masks based on both the augmented point prompts and the initial prompt, resulting in improved segmentation performance. We conducted evaluations using four different point augmentation strategies: random sampling, sampling based on maximum difference entropy, maximum distance, and saliency. Experiment results on the COCO, Fundus, COVID QUEx, and ISIC2018 datasets show that SAMAug can boost SAM's segmentation results, especially using the maximum distance and saliency. SAMAug demonstrates the potential of visual prompt augmentation for computer vision. 
Codes of SAMAug are available at github.com/yhydhx/SAMAug	翻訳日:2024-03-21 01:10:08 公開日:2024-03-19
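Of the four point augmentation strategies named in the SAMAug entry above, the maximum-distance one is the simplest to illustrate. The sketch below is one plausible reading, not the released code: it assumes the initial mask is a 2D grid of 0/1 values and that the augmented point is the foreground pixel farthest (in Euclidean distance) from the initial prompt. The function name `max_distance_point` is hypothetical, and no SAM API is called.

```python
def max_distance_point(mask, initial_point):
    """Pick the foreground pixel farthest from the initial point prompt.

    mask          -- 2D list of 0/1 values (1 = inside the initial mask)
    initial_point -- (row, col) of the user's original prompt
    Returns a (row, col) tuple, or None if the mask has no foreground pixel.
    """
    r0, c0 = initial_point
    best, best_d2 = None, -1
    for r, row in enumerate(mask):
        for c, value in enumerate(row):
            if value:
                # Squared distance is enough for an argmax.
                d2 = (r - r0) ** 2 + (c - c0) ** 2
                if d2 > best_d2:
                    best, best_d2 = (r, c), d2
    return best


initial_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
augmented_point = max_distance_point(initial_mask, (1, 1))
```

Here `augmented_point` is `(2, 4)`. Under this reading, the segmenter would then be prompted a second time with both the initial point and the augmented point.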
# 3次元フェルミオン物質中の渦ループダイナミクスと動的量子相転移 Vortex loop dynamics and dynamical quantum phase transitions in 3D fermion matter ( http://arxiv.org/abs/2307.02985v3 ) ライセンス: Link先を確認	Arkadiusz Kosior, Markus Heyl,	(参考訳) 過去10年間、動的量子相転移(DQPT)は非平衡量子多体系を理解するためのパラダイムシフトとして現れてきた。しかし、課題は、関連する動的位相を効果的に特徴づける順序パラメータを特定することである。本研究では,グリーン関数の位相における渦特異点の挙動を,相互作用系と非相互作用系の両方において瞬時クエンチ後の3次元の広い種類のフェルミオン格子モデルに対して検討する。渦の全集合が一次元の動的対象を形成しており、これを \emph{vortex loops} と呼ぶ。このような渦ループの数は、異なる非平衡位相を区別する量子化順序パラメータとして解釈できる。本結果は,非相互作用シナリオにおける順序パラメータの変動とDQPTの関係を明確にするものである。さらに,Loschmidt振幅とグリーン関数との間に直接的関係がないにもかかわらず,渦ループは弱い相互作用の場合において頑健であることを示す。最後に、渦ループが運動量空間の複雑な力学パターンを形成することを観察する。本研究は,非平衡系における動的順序パラメータの定義の開発に有用な知見を提供する。 Over the past decade, dynamical quantum phase transitions (DQPTs) have emerged as a paradigm shift in understanding nonequilibrium quantum many-body systems. However, the challenge lies in identifying order parameters that effectively characterize the associated dynamic phases. In this study, we investigate the behavior of vortex singularities in the phase of the Green's function for a broad class of fermion lattice models in three dimensions after an instantaneous quench in both interacting and non-interacting systems. We find that the full set of vortices form one-dimensional dynamical objects, which we call \emph{vortex loops}. We propose that the number of such vortex loops can be interpreted as a quantized order parameter that distinguishes between different non-equilibrium phases. Our results establish an explicit link between variations in the order parameter and DQPTs in the non-interacting scenario. Moreover, we show that the vortex loops are robust in the weakly interacting case, even though there is no direct relation between the Loschmidt amplitude and the Green's function. Finally, we observe that vortex loops can form complex dynamical patterns in momentum space. 
Our findings provide valuable insights for developing definitions of dynamical order parameters in non-equilibrium systems.	翻訳日:2024-03-21 01:10:08 公開日:2024-03-19
# 機械学習による複雑システムの情報分解 Information decomposition in complex systems via machine learning ( http://arxiv.org/abs/2307.04755v2 ) ライセンス: Link先を確認	Kieran A. Murphy, Dani S. Bassett,	(参考訳) 複雑なシステムを理解するための基本的なステップの1つは、マクロスケールの振る舞いに最も関係のあるシステムのコンポーネントのスケールにおける変動を特定することである。相互情報は、観測物間の機能的関係の独立性から、システムのスケールをまたいで変動をリンクする自然な手段を提供する。しかしながら、情報が観測対象の集合に分散される方法の特徴付けは、計算的に困難であり、少数の測定範囲を超えて一般的には不可能である。本稿では、機械学習を用いて、各測定の損失圧縮を共同で最適化することにより、測定セットに含まれる情報を分解する、実用的で一般的な手法を提案する。分散情報ボトルネックを学習目的として導いた情報分解は、特定マクロな振る舞いに最も関係のあるシステム状態の測定における変動を識別する。我々は,塑性変形を受けるブール回路とアモルファス材料という,2つのパラダイム的複雑系に着目した解析を行った。どちらの例でも、システム状態のエントロピーの量は、マクロな振る舞いに最も関係しているものによって、ビット単位で分解される。情報理論によってもたらされる有意義なデータ変動の同定は、複雑なシステムにおけるミクロ構造とマクロ構造の間の関係を研究するために実用的である。 One of the fundamental steps toward understanding a complex system is identifying variation at the scale of the system's components that is most relevant to behavior on a macroscopic scale. Mutual information provides a natural means of linking variation across scales of a system due to its independence of functional relationship between observables. However, characterizing the manner in which information is distributed across a set of observables is computationally challenging and generally infeasible beyond a handful of measurements. Here we propose a practical and general methodology that uses machine learning to decompose the information contained in a set of measurements by jointly optimizing a lossy compression of each measurement. Guided by the distributed information bottleneck as a learning objective, the information decomposition identifies the variation in the measurements of the system state most relevant to specified macroscale behavior. We focus our analysis on two paradigmatic complex systems: a Boolean circuit and an amorphous material undergoing plastic deformation. In both examples, the large amount of entropy of the system state is decomposed, bit by bit, in terms of what is most related to macroscale behavior. 
The identification of meaningful variation in data, with the full generality brought by information theory, is made practical for studying the connection between micro- and macroscale structure in complex systems.	翻訳日:2024-03-21 01:10:08 公開日:2024-03-19
# 層ワイドリニアモード接続性 Layer-wise Linear Mode Connectivity ( http://arxiv.org/abs/2307.06966v3 ) ライセンス: Link先を確認	Linara Adilova, Maksym Andriushchenko, Michael Kamp, Asja Fischer, Martin Jaggi,	(参考訳) ニューラルネットワークパラメータの平均化は、2つの独立したモデルの知識を融合させる直感的な方法である。連邦学習において最も顕著に用いられている。トレーニングの終わりにモデルが平均化されると、関心の損失面が非常に特殊である場合、すなわち、2つのモデルの間の中間点の損失が十分に低くなければならない場合にのみ、優れたパフォーマンスモデルをもたらす。これは、最先端ネットワークの非凸損失を保証することは不可能である。非常に異なるデータセットでトレーニングされた平均モデルに対して、特定のレイヤのパラメータやレイヤの組み合わせだけを平均化して、よりよいパフォーマンスのモデルが提案された。レイヤワイド平均化の効果をより深く理解するために、単一のレイヤやレイヤのグループを平均化するモデルの性能を分析します。実験的および理論的研究に基づき、我々は層幅線形接続という新しい概念を導入し、深層ネットワークが層幅障壁を持たないことを示す。 Averaging neural network parameters is an intuitive method for fusing the knowledge of two independent models. It is most prominently used in federated learning. If models are averaged at the end of training, this can only lead to a good performing model if the loss surface of interest is very particular, i.e., the loss in the midpoint between the two models needs to be sufficiently low. This is impossible to guarantee for the non-convex losses of state-of-the-art networks. For averaging models trained on vastly different datasets, it was proposed to average only the parameters of particular layers or combinations of layers, resulting in better performing models. To get a better understanding of the effect of layer-wise averaging, we analyse the performance of the models that result from averaging single layers, or groups of layers. Based on our empirical and theoretical investigation, we introduce a novel notion of the layer-wise linear connectivity, and show that deep networks do not have layer-wise barriers between them.	翻訳日:2024-03-21 01:00:25 公開日:2024-03-19
# 超大規模原子構造の有効ハミルトニアンのためのアクティブラーニング Active learning for effective Hamiltonian of super-large-scale atomic structures ( http://arxiv.org/abs/2307.08929v2 ) ライセンス: Link先を確認	Xingyue Ma, Hongying Chen, Ri He, Zhanbo Yu, Sergei Prokhorenko, Zheng Wen, Zhicheng Zhong, Jorge Iñiguez, L. Bellaiche, Di Wu, Yurong Yang,	(参考訳) 第一原理に基づく有効ハミルトニアンは、強誘電体と緩和体強誘電体の特性を予測し、シミュレートするために広く用いられている。しかし、実効ハミルトニアンのパラメトリゼーション法は複雑であり、複雑な相互作用や複雑な成分を持つシステムをほとんど解決できない。そこで我々は,ベイズ線形回帰に基づく実効ハミルトニアンをパラメータ化するための,オンザフライアクティブ機械学習手法を開発した。パラメトリゼーションは分子動力学シミュレーションで完了し、各ステップで予測されるエネルギー、力、ストレスは不確実性と共に行われる。第一原理計算は、不確実性が大きければパラメータを再訓練するときに実行される。このアプローチは、以前の方法では扱えない複雑なシステムを含むあらゆる考慮されたシステムに対して、効果的なハミルトンパラメータを計算する普遍的で自動的な方法を提供する。実効ハミルトニアンの形式も、いくつかの複素項を含むように修正されている。 BaTiO3, CsPbI3およびSrTiO3/PbTiO3表面は、従来の第一原理パラメトリゼーション法と比較して、このアプローチの正確性を示す例として挙げられる。 The first-principles-based effective Hamiltonian is widely used to predict and simulate the properties of ferroelectrics and relaxor ferroelectrics. However, the parametrization method of the effective Hamiltonian is complicated and hardly can resolve the systems with complex interactions and/or complex components. Here, we developed an on-the-fly active machine learning approach to parametrize the effective Hamiltonian based on Bayesian linear regression. The parametrization is completed in molecular dynamics simulations, with the energy, forces and stress predicted at each step along with their uncertainties. First-principles calculations are executed when the uncertainties are large to retrain the parameters. This approach provides a universal and automatic way to compute the effective Hamiltonian parameters for any considered systems including complex systems which previous methods can not handle. The form of the effective Hamiltonian is also revised to include some complex terms. BaTiO3, CsPbI3 and SrTiO3/PbTiO3 surface are taken as examples to show the accurateness of this approach comparing with conventional first-principles parametrization method.	
翻訳日:2024-03-21 01:00:25 公開日:2024-03-19
# HateModerate: コンテンツモデレーションポリシーに対するHate Speech Detectorのテスト HateModerate: Testing Hate Speech Detectors against Content Moderation Policies ( http://arxiv.org/abs/2307.12418v2 ) ライセンス: Link先を確認	Jiangrui Zheng, Xueqing Liu, Guanqun Yang, Mirazul Haque, Xing Qian, Ravishka Rathnasuriya, Wei Yang, Girish Budhrani,	(参考訳) ヘイトフルコンテンツからユーザーを守るため、既存の研究はヘイトスピーチの自動検出を研究した。ヘイトスピーチ検出の自動化はソーシャルメディアのコンテンツポリシーに準拠しているのだろうか? プラットフォームの内容ポリシーは、ソーシャルメディアプラットフォームによって調整されたコンテンツのチェックリストである。コンテンツモデレーションルールはしばしば一意に定義されているため、既存のヘイトスピーチデータセットはこの質問に答えることはできない。この研究は、コンテンツポリシーに対する自動コンテンツモデレーターの振る舞いをテストするデータセットであるHateModerateを作成することで、この問題に答えようとしている。まず、28のアノテータとGPTを6ステップのアノテーションプロセスで処理し、その結果、Facebookの41のヘイトスピーチポリシーのそれぞれにマッチする憎悪と非憎しみのないテストスイートのリストを作成します。第2に、HateModerateに対して最先端のヘイトスピーチ検出器の性能を検証し、これらのモデルがポリシーに適合していることを示す。第3に、HateModerateを使用して、HuggingFace上のトップダウンのヘイト検知器のトレーニングデータを増強します。我々は,オリジナルテストデータに匹敵するスコアを持ちながら,コンテンツポリシーに対するモデル適合性の大幅な改善を観察する。データセットとコードは添付ファイルにある。 To protect users from massive hateful content, existing works studied automated hate speech detection. Despite the existing efforts, one question remains: do automated hate speech detectors conform to social media content policies? A platform's content policies are a checklist of content moderated by the social media platform. Because content moderation rules are often uniquely defined, existing hate speech datasets cannot directly answer this question. This work seeks to answer this question by creating HateModerate, a dataset for testing the behaviors of automated content moderators against content policies. First, we engage 28 annotators and GPT in a six-step annotation process, resulting in a list of hateful and non-hateful test suites matching each of Facebook's 41 hate speech policies. Second, we test the performance of state-of-the-art hate speech detectors against HateModerate, revealing substantial failures these models have in their conformity to the policies. 
Third, using HateModerate, we augment the training data of a top-downloaded hate detector on HuggingFace. We observe significant improvement in the models' conformity to content policies while having comparable scores on the original test data. Our dataset and code can be found in the attachment.	翻訳日:2024-03-21 01:00:25 公開日:2024-03-19
# 分布モーメントを用いた量子トモグラフィーの信頼性領域 Reliable confidence regions for quantum tomography using distribution moments ( http://arxiv.org/abs/2307.12823v2 ) ライセンス: Link先を確認	D. O. Norkin, E. O. Kiktenko, A. K. Fedorov,	(参考訳) 量子トモグラフィーは、未知の量子状態とプロセスの再構成に広く応用できる方法である。しかし、その量子技術への応用は通常、信頼できる信頼区間を持つ準備された量子状態とターゲットの量子状態の違いを推定する必要がある。本研究では,量子トモグラフィーの精度の高い誤差バーを決定するための計算効率が高く信頼性の高い手法を提案する。我々は,2つのモーメントを計算することで,対象状態と線形反転によって与えられる推定値との間のヒルベルト・シュミット距離の確率分布を近似する。また、量子プロセストモグラフィーに対するこのアプローチの一般化とアフィン関数に対する信頼区間の導出を提案する。我々は,クラウドアクセス可能な量子プロセッサを用いてシミュレーションと実演の両方を用いて,多数の量子トモグラフィープロトコルのアプローチをベンチマークした。得られた結果は、様々な性質の量子システムの完全なキャラクタリゼーションのための提案されたスキームの使用方法を明らかにする。 Quantum tomography is a widely applicable method for reconstructing unknown quantum states and processes. However, its applications in quantum technologies usually also require estimating the difference between prepared and target quantum states with reliable confidence intervals. In this work we suggest a computationally efficient and reliable scheme for determining well-justified error bars for quantum tomography. We approximate the probability distribution of the Hilbert-Schmidt distance between the target state and the estimation, which is given by the linear inversion, by calculating its two moments. We also present a generalization of this approach for quantum process tomography and deriving confidence intervals for affine functions. We benchmark our approach for a number of quantum tomography protocols using both simulation and demonstration with the use of a cloud-accessible quantum processor. The obtained results pave the way for the use of the suggested scheme for the complete characterization of quantum systems of various natures.	翻訳日:2024-03-21 01:00:25 公開日:2024-03-19
# FedDRL:段階的強化学習に基づく信頼できるフェデレーション学習モデル融合法 FedDRL: A Trustworthy Federated Learning Model Fusion Method Based on Staged Reinforcement Learning ( http://arxiv.org/abs/2307.13716v4 ) ライセンス: Link先を確認	Leiming Chen, Weishan Zhang, Cihao Dong, Sibo Qiao, Ziling Huang, Yuming Nie, Zhaoxiang Hou, Chee Wei Tan,	(参考訳) 従来のフェデレート学習では、各クライアントモデルの重みを計算するためにサンプルの数を使用し、この固定重み値を使用してグローバルモデルを融合する。しかし、現実的なシナリオでは、各クライアントのデバイスとデータの均一性は、各クライアントのモデルの品質に違いをもたらす。したがって、グローバルモデルへの貢献は、サンプルサイズによって完全には決定されない。さらに、クライアントが意図的に低品質または悪意のあるモデルをアップロードした場合、集約にこれらのモデルを使用することで、グローバルモデルの精度が大幅に低下する。従来のフェデレーション学習アルゴリズムはこれらの問題に対処しない。本稿では,2段階のアプローチに基づく強化学習を用いたモデル融合手法であるFedDRLを提案する。最初の段階では、我々の手法は悪意あるモデルをフィルタリングし、信頼されたクライアントモデルを選択してモデル融合に参加する。第2段階では、FedDRLアルゴリズムは信頼されたクライアントモデルの重みを適応的に調整し、最適なグローバルモデルを集約する。また、5つのモデル融合シナリオを定義し、それらのシナリオにおける2つのベースラインアルゴリズムと比較する。実験結果から,本アルゴリズムは精度を維持しつつ,他のアルゴリズムよりも信頼性が高いことがわかった。 Traditional federated learning uses the number of samples to calculate the weights of each client model and uses this fixed weight value to fuse the global model. However, in practical scenarios, each client's device and data heterogeneity leads to differences in the quality of each client's model. Thus the contribution to the global model is not wholly determined by the sample size. In addition, if clients intentionally upload low-quality or malicious models, using these models for aggregation will lead to a severe decrease in global model accuracy. Traditional federated learning algorithms do not address these issues. To solve this problem, we propose FedDRL, a model fusion approach using reinforcement learning based on a two-stage approach. In the first stage, our method filters out malicious models and selects trusted client models to participate in the model fusion. In the second stage, the FedDRL algorithm adaptively adjusts the weights of the trusted client models and aggregates the optimal global model. 
We also define five model fusion scenarios and compare our method with two baseline algorithms in those scenarios. The experimental results show that our algorithm has higher reliability than other algorithms while maintaining accuracy.	翻訳日:2024-03-21 01:00:25 公開日:2024-03-19
# アクティブラーニングに基づく事前学習データ重複モデル A Pre-trained Data Deduplication Model based on Active Learning ( http://arxiv.org/abs/2308.00721v2 ) ライセンス: Link先を確認	Xinyao Liu, Shengdong Du, Fengmao Lv, Hongtao Xue, Jie Hu, Tianrui Li,	(参考訳) ビッグデータの時代、データ品質の問題はますます顕著になっている。主な課題の1つは重複データの問題であり、これは反復的なエントリや複数のデータソースのマージによって生じる可能性がある。これらの"汚れたデータ"問題は、ビッグデータの効果的な適用を著しく制限することができる。データ重複の問題に対処するため,本研究では,アクティブラーニングをベースとした事前学習型重複解消モデルを提案する。このモデルは、事前訓練されたトランスフォーマー上に構築され、復号化問題を分類タスクのシーケンスとして解くために微調整され、まず、トランスフォーマーとアクティブラーニングをエンド・ツー・エンドのアーキテクチャに統合し、復号化モデルのトレーニングに最も有用なデータを選択するとともに、R-Drop法を用いてラベル付きデータのラウンド毎にデータ拡張を行い、手動ラベリングのコストを低減し、モデルの性能を向上させる。実験結果から,提案モデルが従来のデータ識別技術(SOTA)よりも優れており,ベンチマークデータセット上でのリコールスコアが最大28%向上していることがわかった。 In the era of big data, the issue of data quality has become increasingly prominent. One of the main challenges is the problem of duplicate data, which can arise from repeated entry or the merging of multiple data sources. These "dirty data" problems can significantly limit the effective application of big data. To address the issue of data deduplication, we propose a pre-trained deduplication model based on active learning, which is the first work that utilizes active learning to address the problem of deduplication at the semantic level. The model is built on a pre-trained Transformer and fine-tuned to solve the deduplication problem as a sequence to classification task, which firstly integrate the transformer with active learning into an end-to-end architecture to select the most valuable data for deduplication model training, and also firstly employ the R-Drop method to perform data augmentation on each round of labeled data, which can reduce the cost of manual labeling and improve the model's performance. Experimental results demonstrate that our proposed model outperforms previous state-of-the-art (SOTA) for deduplicated data identification, achieving up to a 28% improvement in Recall score on benchmark datasets.	翻訳日:2024-03-21 01:00:25 公開日:2024-03-19
# EventBind: イベントベースのオープンワールド理解のためのバインディングテーマの統一表現学習 EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding ( http://arxiv.org/abs/2308.03135v5 ) ライセンス: Link先を確認	Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu, Lin Wang,	(参考訳) 本稿では,大規模イベントベースデータセットの欠如を補うために,イベントベース認識のための視覚言語モデル(VLM)の可能性を明らかにする,新しい効果的なフレームワークであるEventBindを提案する。特に、画像テキストデータとの相違と大規模データセットの欠如により、画像、テキスト、イベントの共通表現空間を学習するのは簡単ではない。 1)CLIPの視覚エンコーダをイベントデータに一般化する方法。 2)マルチモーダル埋め込み、すなわち画像、テキスト、イベントを効果的に整列する方法。そこで我々はまず,イベントから時間情報を微妙にモデル化する新しいイベントエンコーダを導入するとともに,モダリティブリッジのためのイベントプロンプトを生成する。提案するイベントエンコーダ,テキストエンコーダ,画像エンコーダを用いて,新たな階層型三重コントラストアライメント(HTCA)モジュールを導入し,相関関係の最適化と3つのモード間の効率的な知識伝達を実現する。 N-Caltech101 (+5.34%と+1.70%) や N-Imagenet (+5.65%と+1.99%) といった,微調整と20ショットの設定で, 従来手法と比較して, 新たな最先端の精度を実現している。さらに、私たちのEventBindは、テキストや画像クエリを使用して、イベント検索タスクに柔軟に拡張することができ、妥当なパフォーマンスを示します。私たちのプロジェクトコードは公開されます。 In this paper, we propose EventBind, a novel and effective framework that unleashes the potential of vision-language models (VLMs) for event-based recognition to compensate for the lack of large-scale event-based datasets. In particular, due to the distinct modality gap with the image-text data and the lack of large-scale datasets, learning a common representation space for images, texts, and events is non-trivial. Intuitively, we need to address two key challenges: 1) how to generalize CLIP's visual encoder to event data while fully leveraging events' unique properties, e.g., sparsity and high temporal resolution; 2) how to effectively align the multi-modal embeddings, i.e., image, text, and events. Accordingly, we first introduce a novel event encoder that subtly models the temporal information from events and meanwhile, generates event prompts for modality bridging. 
We then design a text encoder that generates content prompts and utilizes hybrid text prompts to enhance EventBind's generalization ability across diverse datasets. With the proposed event encoder, text encoder, and image encoder, a novel Hierarchical Triple Contrastive Alignment (HTCA) module is introduced to jointly optimize the correlation and enable efficient knowledge transfer among the three modalities. We evaluate various settings, including fine-tuning and few-shot on three benchmarks, and our EventBind achieves new state-of-the-art accuracy compared with the previous methods, such as on N-Caltech101 (+5.34% and +1.70%) and N-Imagenet (+5.65% and +1.99%) with fine-tuning and 20-shot settings, respectively. Moreover, our EventBind can be flexibly extended to the event retrieval task using text or image queries, showing plausible performance. Our project code will be made publicly available.	翻訳日:2024-03-21 01:00:25 公開日:2024-03-19
# 惑星ローバーの高速かつ最適学習経路計画法 A Fast and Optimal Learning-based Path Planning Method for Planetary Rovers ( http://arxiv.org/abs/2308.04792v2 ) ライセンス: Link先を確認	Yiming Ji, Yang Liu, Guanghu Xie, Boyu Ma, Zongwu Xie, Baoshi Cao,	(参考訳) インテリジェントな自律経路計画は、惑星探査機の探索効率を向上させるために不可欠である。本稿では,NNPPと呼ばれる標高マップの最適経路を高速に探索する学習手法を提案する。 NNPPモデルは、多数の事前注釈付き最適経路デモから開始位置とゴール位置のセマンティック情報とマップ表現を学習し、地図上の最適経路に属する可能性を表す画素ごとの確率分布を生成する。より具体的には、DEMから得られた勾配、粗さ、標高差から各格子セルのトラバースコストを算出する。その後、ガウス分布を用いて開始位置とゴール位置を符号化し、モデル性能への影響について異なる位置符号化パラメータを解析する。トレーニング後、NNPPモデルは、新しい地図上で経路計画を実行することができる。実験の結果,NNPPモデルにより生成された誘導場は,同じハードウェア条件下での最適経路の探索時間を著しく短縮することができ,NNPPの利点は地図の規模によって増大することがわかった。 Intelligent autonomous path planning is crucial to improve the exploration efficiency of planetary rovers. In this paper, we propose a learning-based method to quickly search for optimal paths in an elevation map, which is called NNPP. The NNPP model learns semantic information about start and goal locations, as well as map representations, from numerous pre-annotated optimal path demonstrations, and produces a probabilistic distribution over each pixel representing the likelihood of it belonging to an optimal path on the map. More specifically, the paper computes the traversal cost for each grid cell from the slope, roughness and elevation difference obtained from the DEM. Subsequently, the start and goal locations are encoded using a Gaussian distribution and different location encoding parameters are analyzed for their effect on model performance. After training, the NNPP model is able to perform path planning on novel maps. Experiments show that the guidance field generated by the NNPP model can significantly reduce the search time for optimal paths under the same hardware conditions, and the advantage of NNPP increases with the scale of the map.	翻訳日:2024-03-21 01:00:25 公開日:2024-03-19
# EasyEdit: 大規模言語モデルのための使いやすい知識編集フレームワーク EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models ( http://arxiv.org/abs/2308.07269v2 ) ライセンス: Link先を確認	Peng Wang, Ningyu Zhang, Bozhong Tian, Zekun Xi, Yunzhi Yao, Ziwen Xu, Mengru Wang, Shengyu Mao, Xiaohan Wang, Siyuan Cheng, Kangwei Liu, Yuansheng Ni, Guozhou Zheng, Huajun Chen,	(参考訳) 大きな言語モデル(LLM)は、通常、知識の切り離しや誤りの問題に悩まされる。この目的のために、LLMの知識編集アプローチが数多く登場し、更新された知識を微妙に注入/編集したり、望ましくない振る舞いを調整したりしながら、無関係な入力への影響を最小限に抑えることを目的としている。しかし,様々な知識編集手法とタスク設定の違いにより,実践者がアプリケーションに知識編集を適用することを妨げる標準実装フレームワークがコミュニティに存在しない。これらの問題に対処するため,LLM のための知識編集フレームワーク EasyEdit を提案する。様々な最先端の知識編集アプローチをサポートしており、T5、GPT-J、LlaMAなど、よく知られたLLMにも容易に適用できる。実験的に,LlaMA-2の知識編集結果をEasyEditで報告し,信頼性と一般化の観点から,知識編集が従来の微調整を上回ることを示した。 Google Colabのチュートリアルと初心者が始めるための包括的なドキュメントとともに、ソースコードをGitHubでリリースしました。また,リアルタイム知識編集のためのオンラインシステムとデモビデオも提示する。 Large Language Models (LLMs) usually suffer from knowledge cutoff or fallacy issues, which means they are unaware of unseen events or generate text with incorrect facts owing to outdated/noisy data. To this end, many knowledge editing approaches for LLMs have emerged -- aiming to subtly inject/edit updated knowledge or adjust undesired behavior while minimizing the impact on unrelated inputs. Nevertheless, due to significant differences among various knowledge editing methods and the variations in task setups, there is no standard implementation framework available for the community, which hinders practitioners from applying knowledge editing to applications. To address these issues, we propose EasyEdit, an easy-to-use knowledge editing framework for LLMs. It supports various cutting-edge knowledge editing approaches and can be readily applied to many well-known LLMs such as T5, GPT-J, LlaMA, etc. Empirically, we report the knowledge editing results on LlaMA-2 with EasyEdit, demonstrating that knowledge editing surpasses traditional fine-tuning in terms of reliability and generalization. 
We have released the source code on GitHub, along with Google Colab tutorials and comprehensive documentation for beginners to get started. Besides, we present an online system for real-time knowledge editing, and a demo video.	翻訳日:2024-03-21 01:00:25 公開日:2024-03-19
# O$^2$-Recon: 事前学習した2次元拡散モデルによるシーンにおける付加物体の3次元再構成を補完する O$^2$-Recon: Completing 3D Reconstruction of Occluded Objects in the Scene with a Pre-trained 2D Diffusion Model ( http://arxiv.org/abs/2308.09591v3 ) ライセンス: Link先を確認	Yubin Hu, Sheng Ye, Wang Zhao, Matthieu Lin, Yuze He, Yu-Hui Wen, Ying He, Yong-Jin Liu,	(参考訳) 咬合は、RGB-Dビデオからの3D再構成において一般的な問題であり、しばしばオブジェクトの完全な再構成をブロックし、進行中の問題を提示する。本稿では,物体の隠れた部分の完全な表面を再構築する2次元拡散に基づくインペインティングモデルを用いて,新しい枠組みを提案する。具体的には,事前学習した拡散モデルを用いて2次元画像の隠れた領域を埋める。次に、これらのインペイント画像を用いて、各インスタンスのニューラル暗示表面表現を最適化し、3D再構成する。このプロセスに必要な塗装マスクの作成は難しいので、我々は高品質なマスクを作成するために、人間のエンゲージメントをほとんど含まない、ループ内戦略を採用しています。さらに、ビデオは通常、限られた視点から撮影されるため、オブジェクトの一部を完全に隠すことができる。そこで我々は,これらの見えない領域の回復を確保するために,符号付き距離場を予測し,位置符号化の周波数帯域を多用し,全体的な滑らかさを維持するカスケードネットワークアーキテクチャを開発した。一般的なレンダリング損失、アイコン損失、シルエット損失に加えて、CLIPに基づくセマンティック一貫性損失を採用し、見えないカメラアングルから表面を誘導する。 ScanNetのシーンでの実験では,シーンレベルのRGB-Dビデオからのオブジェクトレベルの再構築において,最先端の精度と完全性を実現している。コード:https://github.com/THU-LYJ-Lab/O2-Recon Occlusion is a common issue in 3D reconstruction from RGB-D videos, often blocking the complete reconstruction of objects and presenting an ongoing problem. In this paper, we propose a novel framework, empowered by a 2D diffusion-based in-painting model, to reconstruct complete surfaces for the hidden parts of objects. Specifically, we utilize a pre-trained diffusion model to fill in the hidden areas of 2D images. Then we use these in-painted images to optimize a neural implicit surface representation for each instance for 3D reconstruction. Since creating the in-painting masks needed for this process is tricky, we adopt a human-in-the-loop strategy that involves very little human engagement to generate high-quality masks. Moreover, some parts of objects can be totally hidden because the videos are usually shot from limited perspectives. 
To ensure recovering these invisible areas, we develop a cascaded network architecture for predicting signed distance field, making use of different frequency bands of positional encoding and maintaining overall smoothness. Besides the commonly used rendering loss, Eikonal loss, and silhouette loss, we adopt a CLIP-based semantic consistency loss to guide the surface from unseen camera angles. Experiments on ScanNet scenes show that our proposed framework achieves state-of-the-art accuracy and completeness in object-level reconstruction from scene-level RGB-D videos. Code: https://github.com/THU-LYJ-Lab/O2-Recon.	翻訳日:2024-03-21 00:50:27 公開日:2024-03-19
# テンソル表現による逆流補正の再検討:Fermi-Hubbard型モデルのベンチマーク Revisiting Backflow Corrections by Tensor Representations: Benchmarks on Fermi-Hubbard-type Models ( http://arxiv.org/abs/2308.11823v5 ) ライセンス: Link先を確認	Yu-Tong Zhou, Zheng-Wei Zhou, Xiao Liang,	(参考訳) 量子多体問題は凝縮物質物理学において重要なトピックである。この問題を解決するために波動関数の表現能力を向上させるためにいくつかの手法が開発され、Fermi-Hubbardモデルでは現在の最先端の手法はニューラルネットワークのバックフローと隠れフェルミオンSlater行列式である。逆流補正は、自由粒子のスレーター行列式を改善する効率的な方法である。本研究では、バックフロー補正波動関数のテンソル表現を提案し、スピンレス$t$-$V$モデルでは、エネルギー精度が現在の最先端テンソルネットワーク法よりも高いか、あるいは低いかを示す。スピンを持つモデルでは、軌道と粒子の間の異なるスピンの非ゼロ逆流補正を考慮し、表現能力をさらに向上する。我々は、STO-3Gに基づく分子と、周期的および円筒的境界条件を持つフェルミ・ハッバードモデルについてベンチマークを行った。提案手法は,現在の最先端手法よりも競争力や低エネルギー化を実現していることを示す。 The quantum many-body problem is an important topic in condensed matter physics. To efficiently solve the problem, several methods have been developed to improve the representation ability of wave-functions. For the Fermi-Hubbard model, current state-of-the-art methods are neural network backflows and the hidden fermion Slater determinant. The backflow correction is an efficient way to improve the Slater determinant of free-particles. In this work we propose a tensor representation of the backflow corrected wave-function, and we show that for the spinless $t$-$V$ model, the energy precision is competitive or even lower than current state-of-the-art tensor network methods. For models with spin, we further improve the representation ability by considering non-zero backflow corrections on different spins between the orbital and the particle. We benchmark on molecules under STO-3G basis and the Fermi-Hubbard model with periodic and cylindrical boundary conditions. We show that our methods achieve competitive or even lower energy results than current state-of-the-art methods.	翻訳日:2024-03-21 00:50:27 公開日:2024-03-19
# Dysen-VDM: LLMを用いたダイナミクス対応テキスト・ビデオ拡散 Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs ( http://arxiv.org/abs/2308.13812v2 ) ライセンス: Link先を確認	Hao Fei, Shengqiong Wu, Wei Ji, Hanwang Zhang, Tat-Seng Chua,	(参考訳) テキスト・ツー・ビデオ(T2V)合成は,最近出現した拡散モデル (DM) が,過去のアプローチよりも有望な性能を示したコミュニティで注目を集めている。既存の最先端のDMは高精細なビデオ生成を実現する能力があるが、ビデオ合成の要点である複雑な時間ダイナミクスモデリングに関して重要な限界(例えば、アクション発生障害、粗雑なビデオ運動)に悩まされる。本研究では,高品質なT2V生成のためのDMの映像力学の認知度向上について検討する。人間の直感に触発されて、入力テキストからキーアクションを適切な時間順アレンジで抽出する(ステップ-1)、アクションスケジュールを動的シーングラフ(DSG)表現に変換する(ステップ-2)、そして(ステップ-3)DSG内のシーンを十分かつ合理的にリッチにする(ステップ-3)、革新的な動的シーンマネージャ(Dysen)モジュールを設計する。既存の強力なLLM(例えばChatGPT)をコンテキスト内学習を通じて活用することで、Dysenは(ほぼ)人間レベルの時間的ダイナミックス理解を実現する。最後に、アクションシーンの詳細が豊富な映像DSGを微細な時空間特徴として符号化し、ビデオ生成用バックボーンT2V DMに統合する。一般的なT2Vデータセットの実験から、私たちのDysen-VDMは、特に複雑なアクションのシナリオにおいて、大きなマージンを持つ先行技術よりも一貫して優れています。 https://haofei.vip/Dysen-VDM Text-to-video (T2V) synthesis has gained increasing attention in the community, in which the recently emerged diffusion models (DMs) have promisingly shown stronger performance than the past approaches. While existing state-of-the-art DMs are competent to achieve high-resolution video generation, they may largely suffer from key limitations (e.g., action occurrence disorders, crude video motions) with respect to the intricate temporal dynamics modeling, one of the crux of video synthesis. In this work, we investigate strengthening the awareness of video dynamics for DMs, for high-quality T2V generation. Inspired by human intuition, we design an innovative dynamic scene manager (dubbed as Dysen) module, which includes (step-1) extracting from input text the key actions with proper time-order arrangement, (step-2) transforming the action schedules into the dynamic scene graph (DSG) representations, and (step-3) enriching the scenes in the DSG with sufficient and reasonable details. 
Taking advantage of the existing powerful LLMs (e.g., ChatGPT) via in-context learning, Dysen realizes (nearly) human-level temporal dynamics understanding. Finally, the resulting video DSG with rich action scene details is encoded as fine-grained spatio-temporal features, integrated into the backbone T2V DM for video generating. Experiments on popular T2V datasets suggest that our Dysen-VDM consistently outperforms prior arts with significant margins, especially in scenarios with complex actions. Codes at https://haofei.vip/Dysen-VDM	翻訳日:2024-03-21 00:50:27 公開日:2024-03-19
# 相対的ガウスのメカニズムとプライベート勾配降下法への応用 The Relative Gaussian Mechanism and its Application to Private Gradient Descent ( http://arxiv.org/abs/2308.15250v2 ) ライセンス: Link先を確認	Hadrien Hendrikx, Paul Mangold, Aurélien Bellet,	(参考訳) リリース前にベクトル値クエリにガウスノイズを追加することで構成されるガウスメカニズム(GM)は、標準的なプライバシ保護メカニズムである。特に、クエリがL2感度特性(隣り合う2つの入力の出力間のL2距離は有界)を尊重すると、GMはRényi Differential Privacy (RDP)を保証する。残念ながら、L2感度の正確なバウンドは難しいため、プライバシーのバウンドは緩い。本研究では,2つの問合せ出力間の距離を基準とする相対的L2感度仮定について考察する。この仮定を利用して、雑音の分散が出力のノルムに依存する相対ガウス機構(RGM)を導入する。相対的なL2感度下でのRDPパラメータの厳密な境界を証明し、出力依存ノイズを用いて生じるプライバシー損失を特徴付ける。特に、RGMは自然に出力のノルムを制御する潜在変数に適応することを示す。最後に、我々のフレームワークをインスタンス化し、相対的なL2感度仮定に自然に適合する問題であるPrivate Gradient Descentの厳密な保証を示す。 The Gaussian Mechanism (GM), which consists in adding Gaussian noise to a vector-valued query before releasing it, is a standard privacy protection mechanism. In particular, given that the query respects some L2 sensitivity property (the L2 distance between outputs on any two neighboring inputs is bounded), GM guarantees Rényi Differential Privacy (RDP). Unfortunately, precisely bounding the L2 sensitivity can be hard, thus leading to loose privacy bounds. In this work, we consider a Relative L2 sensitivity assumption, in which the bound on the distance between two query outputs may also depend on their norm. Leveraging this assumption, we introduce the Relative Gaussian Mechanism (RGM), in which the variance of the noise depends on the norm of the output. We prove tight bounds on the RDP parameters under relative L2 sensitivity, and characterize the privacy loss incurred by using output-dependent noise. In particular, we show that RGM naturally adapts to a latent variable that would control the norm of the output. Finally, we instantiate our framework to show tight guarantees for Private Gradient Descent, a problem that naturally fits our relative L2 sensitivity assumption.	翻訳日:2024-03-21 00:50:27 公開日:2024-03-19
# ロバスト制御理論を用いたコヒーレントパッシブ量子等化器の設計 Design of Coherent Passive Quantum Equalizers Using Robust Control Theory ( http://arxiv.org/abs/2308.15805v2 ) ライセンス: Link先を確認	V. Ugrinovskii, M. R. James,	(参考訳) 本稿では,量子通信チャネルにおけるコヒーレント等化フィルタの設計手法を提案する。量子通信チャネルの線形量子系モデルが与えられた場合、元のシステムと結合すると環境の劣化効果を緩和する別の量子系を得るのが目的である。本研究の主な成果は、半定値プログラミングによる状態空間ロバストな制御設計法に依存する、体系的等化器合成アルゴリズムである。 The paper develops a methodology for the design of coherent equalizing filters for quantum communication channels. Given a linear quantum system model of a quantum communication channel, the aim is to obtain another quantum system which, when coupled with the original system, mitigates degrading effects of the environment. The main result of the paper is a systematic equalizer synthesis algorithm which relies on methods of state-space robust control design via semidefinite programming.	翻訳日:2024-03-21 00:50:27 公開日:2024-03-19
# クーパー対スプリッターを用いたフェルミオン量子計算 Fermionic quantum computation with Cooper pair splitters ( http://arxiv.org/abs/2309.00447v2 ) ライセンス: Link先を確認	Kostas Vilkelis, Antonio Manesco, Juan Daniel Torres Luna, Sebastian Miles, Michael Wimmer, Anton Akhmerov,	(参考訳) 量子ビットではなく局所フェルミオンモード(LFM)を用いる普遍量子コンピュータの実践的実装を提案する。デバイスレイアウトは、ハイブリッド超伝導島で結合された量子ドットトンネルと、ドット間の可変容量結合からなる。クーパー対分割, 弾性コツネリング, クーロン相互作用のコヒーレント制御により, ブラヴィイとキタエフによって定義された量子ゲートの普遍的な集合を実現できることを示す。電荷量子ビットとの類似性のため、電荷ノイズがデコヒーレンスの主な原因になると期待する。このため、量子ドットが超伝導体に調整可能な結合を持つような代替設計も検討する。この第2のデバイス設計では、局所フェルミオンモードが電荷中立であるスイートスポットが存在し、ノイズ効果に敏感であることを示す。最後に、設計と実験的制約を比較し、それらを克服するための今後の取り組みを提案する。 We propose a practical implementation of a universal quantum computer that uses local fermionic modes (LFM) rather than qubits. The device layout consists of quantum dots tunnel coupled by a hybrid superconducting island and a tunable capacitive coupling between the dots. We show that coherent control of Cooper pair splitting, elastic cotunneling, and Coulomb interactions allows us to implement the universal set of quantum gates defined by Bravyi and Kitaev. Due to the similarity with charge qubits, we expect charge noise to be the main source of decoherence. For this reason, we also consider an alternative design where the quantum dots have tunable coupling to the superconductor. In this second device design, we show that there is a sweetspot for which the local fermionic modes are charge neutral, making the device insensitive to charge noise effects. Finally, we compare both designs and their experimental limitations and suggest future efforts to overcome them.	翻訳日:2024-03-21 00:50:27 公開日:2024-03-19
# 顔認証における視覚的品質改善と対向的攻撃の伝達性 Improving Visual Quality and Transferability of Adversarial Attacks on Face Recognition Simultaneously with Adversarial Restoration ( http://arxiv.org/abs/2309.01582v4 ) ライセンス: Link先を確認	Fengfan Zhou, Hefei Ling, Yuxuan Shi, Jiazhong Chen, Ping Li,	(参考訳) 敵対的な顔の例には、視覚的品質と伝達可能性という2つの重要な特性がある。しかし、既存のアプローチではこれらの特性を同時に扱うことはめったにない。そこで本稿では, 顔の修復に先立って, 顔の視覚的品質と転写性の向上を図った, AdvRestore と呼ばれる新たな対向攻撃手法を提案する。本手法では,顔の復元を目的としたリカバリ潜在拡散モデル(RLDM)を訓練する。次に、RLDMの推論プロセスを用いて、対向顔例を生成する。 RLDMの中間特性に逆方向の摂動を適用した。さらに、RLDM顔復元を兄弟タスクとして扱うことにより、生成した対向顔例の転送性をさらに向上する。提案手法の有効性を実験的に検証した。 Adversarial face examples possess two critical properties: Visual Quality and Transferability. However, existing approaches rarely address these properties simultaneously, leading to subpar results. To address this issue, we propose a novel adversarial attack technique known as Adversarial Restoration (AdvRestore), which enhances both visual quality and transferability of adversarial face examples by leveraging a face restoration prior. In our approach, we initially train a Restoration Latent Diffusion Model (RLDM) designed for face restoration. Subsequently, we employ the inference process of RLDM to generate adversarial face examples. The adversarial perturbations are applied to the intermediate features of RLDM. Additionally, by treating RLDM face restoration as a sibling task, the transferability of the generated adversarial face examples is further improved. Our experimental results validate the effectiveness of the proposed attack method.	翻訳日:2024-03-21 00:50:27 公開日:2024-03-19
# DePT: 切り離されたプロンプトチューニング DePT: Decoupled Prompt Tuning ( http://arxiv.org/abs/2309.07439v2 ) ライセンス: Link先を確認	Ji Zhang, Shihan Wu, Lianli Gao, Heng Tao Shen, Jingkuan Song,	(参考訳) この作業は、即時チューニングにおいてベース・ニュートレードオフ(BNT)ジレンマ(英語版)を突破し、すなわち、チューニングされたモデルがベース(またはターゲット)タスクに一般化するほど、それが新しいタスクに一般化されるほど、その逆である。具体的には、基礎の学習した特徴と新しいタスクの詳細な分析を通して、BNTはチャネルバイアスの問題、すなわち、ほとんどの特徴チャネルがベース固有の知識によって占められていることから、新しいタスクにとって重要なタスク共有知識が崩壊するのを観察する。これを解決するために,Decoupled Prompt Tuning (DePT) フレームワークを提案する。これは,特徴チャネルからのベース固有知識を即時チューニング中に分離された特徴空間に分離することで,タスク共有知識を元の特徴空間に最大に保存し,新しいタスクのゼロショットの一般化を向上する。重要なことは、DePTは既存のプロンプトチューニング手法と直交しているため、それらすべてを改善することができる。 11のデータセットに対する大規模な実験は、DePTの柔軟性と有効性を示している。私たちのコードと事前訓練されたモデルは https://github.com/Koorye/DePT で公開されています。 This work breaks through the Base-New Tradeoff (BNT) dilemma in prompt tuning, i.e., the better the tuned model generalizes to the base (or target) task, the worse it generalizes to new tasks, and vice versa. Specifically, through an in-depth analysis of the learned features of the base and new tasks, we observe that the BNT stems from a channel bias issue, i.e., the vast majority of feature channels are occupied by base-specific knowledge, resulting in the collapse of task-shared knowledge important to new tasks. To address this, we propose the Decoupled Prompt Tuning (DePT) framework, which decouples base-specific knowledge from feature channels into an isolated feature space during prompt tuning, so as to maximally preserve task-shared knowledge in the original feature space for achieving better zero-shot generalization on new tasks. Importantly, our DePT is orthogonal to existing prompt tuning methods, hence it can improve all of them. Extensive experiments on 11 datasets show the strong flexibility and effectiveness of DePT. Our code and pretrained models are available at https://github.com/Koorye/DePT.	翻訳日:2024-03-21 00:50:27 公開日:2024-03-19
# 安全に配慮したLLaMA: インストラクションをフォローする大規模言語モデルの安全性向上から学ぶ Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions ( http://arxiv.org/abs/2309.07875v3 ) ライセンス: Link先を確認	Federico Bianchi, Mirac Suzgun, Giuseppe Attanasio, Paul Röttger, Dan Jurafsky, Tatsunori Hashimoto, James Zou,	(参考訳) 命令に従うために大規模な言語モデルをトレーニングすることで、幅広いタスクでパフォーマンスが向上し、一般的にはより役に立つようになる。しかし、完璧に有用なモデルは、最も悪意のある命令にも従い、有害なコンテンツを簡単に生成する。本稿では,その指導指導において,無害性にのみ注目するモデルの安全性に対する懸念を提起する。いくつかの一般的な命令調整モデルは非常に安全でないことを示す。さらに,LLaMAのようなモデルを微調整した場合,安全性を著しく向上させるには,わずか3%の安全性例(数百のデモ)を追加するだけでよいことを示す。私たちの安全チューニングは、標準ベンチマークによって測定されたモデルの性能や有用性を著しく低下させません。しかし、過度に安全性を調整しすぎると、安全でないモデルに似ていれば、モデルは完全に安全なプロンプトを拒否する、という大げさな安全行動が見つかります。全体としては、LLMをトレーニングし、安全性をトレーニングする上でのトレードオフについて説明します。 Training large language models to follow instructions makes them perform better on a wide range of tasks and generally become more helpful. However, a perfectly helpful model will follow even the most malicious instructions and readily generate harmful content. In this paper, we raise concerns over the safety of models that only emphasize helpfulness, not harmlessness, in their instruction-tuning. We show that several popular instruction-tuned models are highly unsafe. Moreover, we show that adding just 3% safety examples (a few hundred demonstrations) when fine-tuning a model like LLaMA can substantially improve its safety. Our safety-tuning does not make models significantly less capable or helpful as measured by standard benchmarks. However, we do find exaggerated safety behaviours, where too much safety-tuning makes models refuse perfectly safe prompts if they superficially resemble unsafe ones. As a whole, our results illustrate trade-offs in training LLMs to be helpful and training them to be safe.	翻訳日:2024-03-21 00:40:38 公開日:2024-03-19
# 昼夜の追跡: イベントカメラで誘導されるロバストで効率的な全日物体検出に向けて Chasing Day and Night: Towards Robust and Efficient All-Day Object Detection Guided by an Event Camera ( http://arxiv.org/abs/2309.09297v2 ) ライセンス: Link先を確認	Jiahang Cao, Xu Zheng, Yuanhuiyi Lyu, Jiaxu Wang, Renjing Xu, Lin Wang,	(参考訳) すべての照明条件(通常、過露光、露光不足など)の物体を検出する能力は、自動運転車のような現実の用途には不可欠であり、従来のRGBベースの検出器は、このような様々な照明条件下ではしばしば失敗する。近年の研究では、新しいイベントカメラを使用してRGBのモダリティを補う、またはガイドするが、これらの手法は通常、RGBのモダリティに大きく依存する非対称ネットワーク構造を採用しており、全日検出の堅牢性に限界がある。本稿では,RGBとイベントモダリティの両方を融合させることで,堅牢かつ効率的な全日検出を実現する新しいオブジェクト検出フレームワークであるEOLOを提案する。我々のEOLOフレームワークは、イベントの非同期特性を効率的に活用するために、軽量スパイキングニューラルネットワーク(SNN)に基づいて構築されている。イベントの時間的アテンション(ETA)モジュールを導入し、重要なエッジ情報を保持しながら、イベントから高い時間的情報を学ぶ。第2に、様々なモードが様々な照明条件下で様々な重要性を示すため、特定のモダリティに頼ることなくRGBイベント機能を効果的に融合するSymmetric RGB-Event Fusion (SREF) モジュールを提案する。さらに,全日連続トレーニングと評価のためのRGB-Eventデータセットの欠如を補うために,単一露光画像からイベントフレームを直接生成可能なランダム化光学フローに基づくイベント合成手法を提案する。さらに、人気のあるベンチマークであるMSCOCOとPASCAL VOCに基づいて、E-MSCOCOとE-VOCという2つの新しいデータセットを構築します。大規模な実験では、EOLOはすべての照明条件において、最先端の検出器であるRENetなどを相当なマージン(+3.74% mAP50)で上回る。我々のコードとデータセットはhttps://vlislab22.github.io/EOLO/で利用可能になる予定である。 The ability to detect objects in all lighting (i.e., normal-, over-, and under-exposed) conditions is crucial for real-world applications, such as self-driving. Traditional RGB-based detectors often fail under such varying lighting conditions. Therefore, recent works utilize novel event cameras to supplement or guide the RGB modality; however, these methods typically adopt asymmetric network structures that rely predominantly on the RGB modality, resulting in limited robustness for all-day detection. In this paper, we propose EOLO, a novel object detection framework that achieves robust and efficient all-day detection by fusing both RGB and event modalities. Our EOLO framework is built based on a lightweight spiking neural network (SNN) to efficiently leverage the asynchronous property of events. 
Buttressed by it, we first introduce an Event Temporal Attention (ETA) module to learn the high temporal information from events while preserving crucial edge information. Secondly, as different modalities exhibit varying levels of importance under diverse lighting conditions, we propose a novel Symmetric RGB-Event Fusion (SREF) module to effectively fuse RGB-Event features without relying on a specific modality, thus ensuring a balanced and adaptive fusion for all-day detection. In addition, to compensate for the lack of paired RGB-Event datasets for all-day training and evaluation, we propose an event synthesis approach based on the randomized optical flow that allows for directly generating the event frame from a single exposure image. We further build two new datasets, E-MSCOCO and E-VOC, based on the popular benchmarks MSCOCO and PASCAL VOC. Extensive experiments demonstrate that our EOLO outperforms the state-of-the-art detectors, e.g., RENet, by a substantial margin (+3.74% mAP50) in all lighting conditions. Our code and datasets will be available at https://vlislab22.github.io/EOLO/	翻訳日:2024-03-21 00:40:38 公開日:2024-03-19
# AV-SUPERB:オーディオ映像表現モデルのためのマルチタスク評価ベンチマーク AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models ( http://arxiv.org/abs/2309.10787v2 ) ライセンス: Link先を確認	Yuan Tseng, Layne Berry, Yi-Ting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Shinji Watanabe, Abdelrahman Mohamed, Chi-Luen Feng, Hung-yi Lee,	(参考訳) 視聴覚表現学習は,聴覚情報と視覚情報との相関を利用して,人間のような知覚を持つシステムを開発することを目的としている。しかし、現在のモデルは限られたタスクに焦点をあてることが多く、学習された表現の一般化能力は不明確である。そこで本研究では,音声・音声処理における5つの音声・視覚タスクをカバーする7つのデータセットに対して,音声・視覚・バイモーダル融合表現の汎用的評価を可能にするAV-SUPERBベンチマークを提案する。我々は,最近の5つの自己教師型モデルを評価し,これらのモデルがすべてのタスクに一般化されることはなく,今後のユニバーサルモデルの性能向上に向けた研究の必要性を強調した。さらに,AudioSetを用いた中間タスクの微調整と音声イベント分類によって表現が改善されることが示唆された。評価コードとモデル提出プラットフォームを備えたベンチマークを公開し、音声視覚学習のさらなる研究を奨励する。 Audio-visual representation learning aims to develop systems with human-like perception by utilizing correlation between auditory and visual information. However, current models often focus on a limited set of tasks, and generalization abilities of learned representations are unclear. To this end, we propose the AV-SUPERB benchmark that enables general-purpose evaluation of unimodal audio/visual and bimodal fusion representations on 7 datasets covering 5 audio-visual tasks in speech and audio processing. We evaluate 5 recent self-supervised models and show that none of these models generalize to all tasks, emphasizing the need for future study on improving universal model performance. In addition, we show that representations may be improved with intermediate-task fine-tuning and audio event classification with AudioSet serves as a strong intermediate task. We release our benchmark with evaluation code and a model submission platform to encourage further research in audio-visual learning.	翻訳日:2024-03-21 00:40:38 公開日:2024-03-19
# BenLLMEval: Bengali NLPにおける大規模言語モデルの可能性と落とし穴に関する総合的な評価 BenLLMEval: A Comprehensive Evaluation into the Potentials and Pitfalls of Large Language Models on Bengali NLP ( http://arxiv.org/abs/2309.13173v2 ) ライセンス: Link先を確認	Mohsinul Kabir, Mohammed Saidul Islam, Md Tahmid Rahman Laskar, Mir Tafseer Nayeem, M Saiful Bari, Enamul Hoque,	(参考訳) 言語モデル(LLM)は、言語生成やその他の言語固有のタスクにおいて、NLPにおいて最も重要なブレークスルーの1つとなっている。 LLMは様々なタスク、主に英語で評価されてきたが、ベンガル語 (Bangla) のような未公開言語では十分に評価されていない。この目的のために,本論文では,モデストリソースを持つベンガル言語において,LLMを総合的に評価し,その性能をベンチマークするBenLLM-Evalを紹介する。そこで本研究では,GPT-3.5,LLaMA-2-13b-chat,Claude-2のゼロショット評価のために,テキスト要約,質問応答,パラフレージング,自然言語推論,文字化,テキスト分類,感情分析などの重要なベンガルNLPタスクを選択する。実験の結果、ベンガルのNLPタスクでは、ゼロショットLLMは、現在のSOTAモデルよりも同等またはそれ以上の性能を達成できたが、ほとんどのタスクでは、現在のSOTA結果と比較して、LLaMA-2-13b-chatのようなオープンソースのLLMの性能がかなり悪いため、その性能は非常に劣っていることがわかった。そのため、ベンガル語のようなモデストソース言語におけるLLMの理解を深めるためのさらなる努力が求められている。 Large Language Models (LLMs) have emerged as one of the most important breakthroughs in NLP for their impressive skills in language generation and other language-specific tasks. Though LLMs have been evaluated in various tasks, mostly in English, they have not yet undergone thorough evaluation in under-resourced languages such as Bengali (Bangla). To this end, this paper introduces BenLLM-Eval, which consists of a comprehensive evaluation of LLMs to benchmark their performance in the Bengali language that has modest resources. In this regard, we select various important and diverse Bengali NLP tasks, such as text summarization, question answering, paraphrasing, natural language inference, transliteration, text classification, and sentiment analysis for zero-shot evaluation of popular LLMs, namely, GPT-3.5, LLaMA-2-13b-chat, and Claude-2. 
Our experimental results demonstrate that while in some Bengali NLP tasks, zero-shot LLMs could achieve performance on par, or even better than current SOTA fine-tuned models; in most tasks, their performance is quite poor (with the performance of open-source LLMs like LLaMA-2-13b-chat being significantly bad) in comparison to the current SOTA results. Therefore, it calls for further efforts to develop a better understanding of LLMs in modest-resourced languages like Bengali.	翻訳日:2024-03-21 00:40:38 公開日:2024-03-19
# BAMBOO:大規模言語モデルの長文モデリング能力評価のための総合ベンチマーク BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models ( http://arxiv.org/abs/2309.13345v3 ) ライセンス: Link先を確認	Zican Dong, Tianyi Tang, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen,	(参考訳) 大規模言語モデル(LLM)は、通常の長さのNLPタスクよりも劇的な熟練を実現している。近年,LLMの文脈長の延長と長文モデリング機能の向上に,複数の研究が取り組んでいる。 LLMの長期コンテキスト能力を総合的に評価するために,マルチタスク長コンテキストベンチマークであるBAMBOOを提案する。 BAMBOOは、包括的なキャパシティ評価、データ汚染の回避、正確な自動評価、異なる長さレベルという4つの原則で設計されている。質問応答、幻覚検出、テキストソート、言語モデリング、コード補完の5つの異なる長文理解タスクから10のデータセットで構成され、中核容量とLLMの様々な領域をカバーする。 BAMBOO上で5つの長期文脈モデルを用いて実験を行い、さらに長文の4つの重要な研究課題について考察する。また、現状の長期文脈モデルを定性的に分析し、長期テキストモデリング能力を高めるための今後の方向性を指摘する。データ、プロンプト、コードはhttps://github.com/RUCAIBox/BAMBOOで公開しています。 Large language models (LLMs) have achieved dramatic proficiency over NLP tasks with normal length. Recently, multiple studies have committed to extending the context length and enhancing the long text modeling capabilities of LLMs. To comprehensively evaluate the long context ability of LLMs, we propose BAMBOO, a multi-task long context benchmark. BAMBOO has been designed with four principles: comprehensive capacity evaluation, avoidance of data contamination, accurate automatic evaluation, and different length levels. It consists of 10 datasets from 5 different long text understanding tasks, i.e. question answering, hallucination detection, text sorting, language modeling, and code completion, to cover core capacities and various domains of LLMs. We conduct experiments with five long context models on BAMBOO and further discuss four key research questions of long text. We also qualitatively analyze current long context models and point out future directions for enhancing long text modeling capacities. We release our data, prompts, and code at https://github.com/RUCAIBox/BAMBOO.	翻訳日:2024-03-21 00:40:38 公開日:2024-03-19
# Light Schrödinger 橋 Light Schrödinger Bridge ( http://arxiv.org/abs/2310.01174v3 ) ライセンス: Link先を確認	Alexander Korotin, Nikita Gushchin, Evgeny Burnaev,	(参考訳) Schrödinger Bridges (SB) の分野での最近の進歩にもかかわらず、既存のSBソルバは依然として重量級であり、複数のニューラルネットワークの複雑な最適化が必要である。クラスタリングにおける$k$-means法、分類におけるロジスティック回帰、離散的最適輸送におけるSinkhornアルゴリズムのように、SBの単純かつ効果的なベースラインの役割を果たす主要なソルバは存在しないことが判明した。本稿では,この問題に対処し,高速で簡単なSBソルバを提案する。私たちの開発は、最近この分野に登場した2つのアイデアのスマートな組み合わせです。 (a)sum-exp二次関数によるSchrödingerポテンシャルのパラメータ化 (b)log-Schrödingerポテンシャルをエネルギー関数として見る。これらのアイデアを組み合わせることで、単純な最適化目的を持つ軽量でシミュレーション不要で理論的に正当化されたSBソルバが得られることを示す。結果として、痛みを伴うハイパーパラメータ選択なしで、CPU上で数分で適度な次元でSBを解くことができる。我々の軽量ソルバは密度推定に広く用いられているガウス混合モデルに類似している。この類似性に着想を得て、軽量ソルバがSBの普遍近似であることを示す重要な理論的結果も証明した。さらに,軽量ソルバの一般化誤差の解析を行った。我々のソルバのコードはhttps://github.com/ngushchin/LightSBで確認できる。 Despite the recent advances in the field of computational Schrödinger Bridges (SB), most existing SB solvers are still heavy-weighted and require complex optimization of several neural networks. It turns out that there is no principal solver which plays the role of simple-yet-effective baseline for SB just like, e.g., $k$-means method in clustering, logistic regression in classification or Sinkhorn algorithm in discrete optimal transport. We address this issue and propose a novel fast and simple SB solver. Our development is a smart combination of two ideas which recently appeared in the field: (a) parameterization of the Schrödinger potentials with sum-exp quadratic functions and (b) viewing the log-Schrödinger potentials as the energy functions. We show that combined together these ideas yield a lightweight, simulation-free and theoretically justified SB solver with a simple straightforward optimization objective. As a result, it allows solving SB in moderate dimensions in a matter of minutes on CPU without a painful hyperparameter selection. Our light solver resembles the Gaussian mixture model which is widely used for density estimation. 
Inspired by this similarity, we also prove an important theoretical result showing that our light solver is a universal approximator of SBs. Furthermore, we conduct the analysis of the generalization error of our light solver. The code for our solver can be found at https://github.com/ngushchin/LightSB	翻訳日:2024-03-21 00:40:38 公開日:2024-03-19
# 散逸量子ビットをモニタリングする測定装置からの熱流 Heat flow from a measurement apparatus monitoring a dissipative qubit ( http://arxiv.org/abs/2310.02789v2 ) ライセンス: Link先を確認	Tsuyoshi Yamamoto, Yasuhiro Tokura,	(参考訳) 連続量子測定により, 熱浴に結合したキュービットの熱流について検討した。定常状態限界では、測定装置からキュービットに常に熱が流れ、キュービットと測定装置の間の熱電流の上下境界を導出することを示す。さらに,過渡期における熱電流と過渡期熱の過渡ダイナミクスについて検討した。 We investigate the heat flow of a qubit coupled to heat baths under continuous quantum measurement. In the steady-state limit, we show that heat always flows from the measurement apparatus into the qubit regardless of the measured qubit state and derive lower and upper bounds for the heat current between the qubit and the measurement apparatus. Furthermore, we study the transient dynamics of the heat current and the excess heat during the transient regime.	翻訳日:2024-03-21 00:30:47 公開日:2024-03-19
# 形状分類のための微分可能なオイラー特性変換 Differentiable Euler Characteristic Transforms for Shape Classification ( http://arxiv.org/abs/2310.07630v3 ) ライセンス: Link先を確認	Ernst Roell, Bastian Rieck,	(参考訳) オイラー特性変換(ECT)は、形状とグラフの幾何学的特徴と位相的特徴を組み合わせた強力な表現であることが証明されている。しかし、ECTはタスク固有の表現を学べなかった。我々はこの問題を克服し、エンドツーエンドでECTを学習できる新しい計算層を開発する。我々の手法である微分可能なオイラー特性変換(DECT)は高速かつ計算的に効率的であり、グラフおよび点クラウドの分類タスクにおいてより複雑なモデルと同等の性能を示す。さらに、この一見単純な統計は、より複雑なトポロジカルディープラーニング層と同じトポロジカルな表現性を提供することを示す。 The Euler Characteristic Transform (ECT) has proven to be a powerful representation, combining geometrical and topological characteristics of shapes and graphs. However, the ECT was hitherto unable to learn task-specific representations. We overcome this issue and develop a novel computational layer that enables learning the ECT in an end-to-end fashion. Our method, the Differentiable Euler Characteristic Transform (DECT), is fast and computationally efficient, while exhibiting performance on a par with more complex models in both graph and point cloud classification tasks. Moreover, we show that this seemingly simple statistic provides the same topological expressivity as more complex topological deep learning layers.	翻訳日:2024-03-21 00:30:47 公開日:2024-03-19
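The Euler Characteristic Transform mentioned in the DECT abstract above can be illustrated with a minimal sketch for a graph and a single direction. This is not the authors' implementation: the function name `euler_characteristic_curve` and the toy triangle are hypothetical, and the sketch uses a hard threshold, whereas the abstract does not spell out how DECT makes the transform differentiable (a smooth step such as a sigmoid would be one possible assumption).

```python
def euler_characteristic_curve(points, edges, direction, thresholds):
    """Euler characteristic of the sublevel sets of a graph along one direction.

    points:     list of coordinate tuples, one per vertex
    edges:      list of (u, v) vertex-index pairs
    direction:  tuple with the same dimension as the points
    thresholds: iterable of height values t
    """
    # Height of each vertex = inner product with the direction vector.
    heights = [sum(p_i * d_i for p_i, d_i in zip(p, direction)) for p in points]
    curve = []
    for t in thresholds:
        # A vertex is included once its height is at most t.
        n_vertices = sum(1 for h in heights if h <= t)
        # An edge is included once both of its endpoints are included.
        n_edges = sum(1 for (u, v) in edges if max(heights[u], heights[v]) <= t)
        # For a graph, the Euler characteristic is #vertices - #edges.
        curve.append(n_vertices - n_edges)
    return curve


if __name__ == "__main__":
    # Toy example: a triangle (one cycle) swept along the x-axis.
    triangle_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    triangle_edges = [(0, 1), (1, 2), (0, 2)]
    print(euler_characteristic_curve(
        triangle_points, triangle_edges, (1.0, 0.0), [-0.5, 0.0, 0.5, 1.0]))
```

Repeating this over several directions and stacking the resulting curves gives a discretized transform; the final value of 0 for the full triangle reflects that a single cycle has Euler characteristic 0.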
# Think, Act, and Ask: オープンワールドの対話型パーソナライズされたロボットナビゲーション Think, Act, and Ask: Open-World Interactive Personalized Robot Navigation ( http://arxiv.org/abs/2310.07968v3 ) ライセンス: Link先を確認	Yinpei Dai, Run Peng, Sikai Li, Joyce Chai,	(参考訳) Zero-Shot Object Navigation (ZSON)は、エージェントが未知の環境でオープン語彙オブジェクトへナビゲートすることを可能にする。 ZSONの既存の研究は主に、汎用オブジェクトクラスを見つけるための個別の命令に従うことに焦点を当てており、自然言語の相互作用の利用や、ユーザ固有のオブジェクトを特定する複雑さを無視している。これらの制限に対処するために、ZIPON(Zero-shot Interactive Personalized Object Navigation)を導入する。 ZIPON を解決するために,Large Language Models (LLM) を用いた Open-woRld Interactive persOnalized Navigation (ORION) と呼ばれる新しいフレームワークを提案する。実験の結果,ユーザフィードバックを活用できる対話型エージェントの性能は著しく向上した。しかし,タスク完了とナビゲーションとインタラクションの効率のバランスが良好であることは,すべての方法において依然として困難である。さらに,多様なユーザフィードバックフォームがエージェントのパフォーマンスに与える影響について,さらなる知見を提供する。 Zero-Shot Object Navigation (ZSON) enables agents to navigate towards open-vocabulary objects in unknown environments. The existing works of ZSON mainly focus on following individual instructions to find generic object classes, neglecting the utilization of natural language interaction and the complexities of identifying user-specific objects. To address these limitations, we introduce Zero-shot Interactive Personalized Object Navigation (ZIPON), where robots need to navigate to personalized goal objects while engaging in conversations with users. To solve ZIPON, we propose a new framework termed Open-woRld Interactive persOnalized Navigation (ORION), which uses Large Language Models (LLMs) to make sequential decisions to manipulate different modules for perception, navigation and communication. Experimental results show that the performance of interactive agents that can leverage user feedback exhibits significant improvement. However, obtaining a good balance between task completion and the efficiency of navigation and interaction remains challenging for all methods. We further provide more findings on the impact of diverse user feedback forms on the agents' performance.	
翻訳日:2024-03-21 00:30:47 公開日:2024-03-19
# 現象を補う:仮説補充による言語モデルの帰納的推論能力の検証 Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement ( http://arxiv.org/abs/2310.08559v3 ) ライセンス: Link先を確認	Linlu Qiu, Liwei Jiang, Ximing Lu, Melanie Sclar, Valentina Pyatkin, Chandra Bhagavatula, Bailin Wang, Yoon Kim, Yejin Choi, Nouha Dziri, Xiang Ren,	(参考訳) 基礎となる原則を少数の観察から導き出し、誘導的推論として知られる新しい状況に一般化する能力は、人間の知性の中心である。以前の研究は、言語モデル(LM)が、しばしば帰納的推論に不足していることを示唆している。本研究では,従来のインプット・アウトプット・プロンプトよりも人為的インダクティブ・プロセスをより密接に反映する手法である反復的仮説修正を通じて,LMの帰納的推論能力を体系的に研究する。反復的仮説修正は3段階のプロセス、すなわちテキスト規則の形で仮説を提案し、選択し、修正するプロセスを採用する。中間ルールを検証した結果,LMは現象仮説の提案者(すなわち,候補規則の生成)であり,提案したルールセットを体系的にフィルタリングする(タスク固有の)シンボリックインタプリタと組み合わせることで,因果関係,言語的指示,記号的概念の誘導を必要とする帰納的推論ベンチマークに対して強い結果が得られた。しかし、それらは帰納的推論器としても振る舞うことができ、規則帰納法(可塑性規則を識別する)と規則適用法(インスタンスに提案された規則を適用する)の間に顕著なパフォーマンスギャップを示し、LMが実際に規則を適用することなく仮説を提案していることを示唆している。実験的および人為的分析により, LMの誘導的推論過程と人間とのいくつかの相違が明らかとなり, 誘導的推論タスクにおけるLMの使用の可能性と限界の両方に光を当てる。 The ability to derive underlying principles from a handful of observations and then generalize to novel situations -- known as inductive reasoning -- is central to human intelligence. Prior work suggests that language models (LMs) often fall short on inductive reasoning, despite achieving impressive success on research benchmarks. In this work, we conduct a systematic study of the inductive reasoning capabilities of LMs through iterative hypothesis refinement, a technique that more closely mirrors the human inductive process than standard input-output prompting. Iterative hypothesis refinement employs a three-step process: proposing, selecting, and refining hypotheses in the form of textual rules. 
By examining the intermediate rules, we observe that LMs are phenomenal hypothesis proposers (i.e., generating candidate rules), and when coupled with a (task-specific) symbolic interpreter that is able to systematically filter the proposed set of rules, this hybrid approach achieves strong results across inductive reasoning benchmarks that require inducing causal relations, language-like instructions, and symbolic concepts. However, they also behave as puzzling inductive reasoners, showing notable performance gaps between rule induction (i.e., identifying plausible rules) and rule application (i.e., applying proposed rules to instances), suggesting that LMs are proposing hypotheses without being able to actually apply the rules. Through empirical and human analyses, we further reveal several discrepancies between the inductive reasoning processes of LMs and humans, shedding light on both the potentials and limitations of using LMs in inductive reasoning tasks.	翻訳日:2024-03-21 00:30:47 公開日:2024-03-19
# 超低ビットレートにおける完全リアリズムによる画像圧縮に向けて Towards image compression with perfect realism at ultra-low bitrates ( http://arxiv.org/abs/2310.10325v2 ) ライセンス: Link先を確認	Marlène Careil, Matthew J. Muckley, Jakob Verbeek, Stéphane Lathuilière,	(参考訳) 画像コーデックは通常、ビットレートと歪みメトリクスのトレードオフに最適化される。低ビットレートでは、知覚的または敵対的な損失を伴うトレーニングであっても、容易に知覚できる圧縮アーチファクトが導かれる。画像品質の向上とビットレート依存の除去を目的として,反復拡散モデルを用いてデコードを提案する。本稿では,ベクトル量子化画像表現のデコード処理とグローバルな画像記述を併用して追加のコンテキストを提供する。当社のモデルPerCoを"知覚圧縮"としてダブし、最先端コーデックを0.1から0.003ビット/ピクセルのレートで比較します。後者の速度は、従来考えられていたよりも桁違いに小さく、153バイト未満の512x768 Kodak画像を圧縮する。この超低ビットレートにもかかわらず、我々のアプローチは現実的なイメージを再構築する能力を維持している。 FID と KID で測定した現状の視覚的品質を再現する上で,本モデルが有効であることがわかった。速度歪み知覚理論によって予測されるように、視覚的品質は以前の方法よりもビットレートに依存しない。 Image codecs are typically optimized to trade off bitrate vs. distortion metrics. At low bitrates, this leads to compression artefacts which are easily perceptible, even when training with perceptual or adversarial losses. To improve image quality and remove dependency on the bitrate, we propose to decode with iterative diffusion models. We condition the decoding process on a vector-quantized image representation, as well as a global image description to provide additional context. We dub our model PerCo for 'perceptual compression', and compare it to state-of-the-art codecs at rates from 0.1 down to 0.003 bits per pixel. The latter rate is more than an order of magnitude smaller than those considered in most prior work, compressing a 512x768 Kodak image with less than 153 bytes. Despite this ultra-low bitrate, our approach maintains the ability to reconstruct realistic images. We find that our model leads to reconstructions with state-of-the-art visual quality as measured by FID and KID. As predicted by rate-distortion-perception theory, visual quality is less dependent on the bitrate than previous methods.	翻訳日:2024-03-21 00:30:47 公開日:2024-03-19
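As a rough sanity check on the figures quoted in the PerCo abstract above, the relation between bits per pixel and file size can be worked out directly. This is a small arithmetic sketch, not code from the paper; the function name `compressed_size_bytes` is hypothetical, and it assumes the quoted rate counts only the coded payload.

```python
def compressed_size_bytes(width: int, height: int, bits_per_pixel: float) -> float:
    """Return the payload size in bytes for an image coded at a given bitrate."""
    total_bits = width * height * bits_per_pixel  # bits for the whole image
    return total_bits / 8.0                       # 8 bits per byte


if __name__ == "__main__":
    # A 512x768 Kodak image at the lowest rate quoted in the abstract.
    size = compressed_size_bytes(512, 768, 0.003)
    print(f"{size:.1f} bytes")
```

At 0.003 bits per pixel, the 393,216 pixels of a 512x768 image come to roughly 147 bytes, which is consistent with the abstract's "less than 153 bytes".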
# ハッカーとしてのLLM: 自律的なLinux特権エスカレーション攻撃 LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks ( http://arxiv.org/abs/2310.11409v3 ) ライセンス: Link先を確認	Andreas Happe, Aaron Kaplan, Jürgen Cito,	(参考訳) ソフトウェアセキュリティテストの不可欠なコンポーネントである浸透テストは、組織がシステムの脆弱性を積極的に識別し、修正し、潜在的なサイバー攻撃に対する防御メカニズムを強化することを可能にする。浸透試験の領域における最近の進歩の1つは言語モデル(LLM)の利用である。 LLMと浸透試験の交差点を探索し、特権拡大の文脈におけるそれらの能力と課題について考察する。ローカル仮想マシンを利用した自動Linux特権エスカレーションベンチマークを作成する。異なるLLMの評価とベンチマークに対する戦略の促進を目的として,LLM誘導型特権エスカレーションツールを提案する。その結果,GPT-4はファイルベースのエクスプロイトの検出に適しており,その脆弱性クラスの75～100%のテストケースを解決できることがわかった。 GPT-3.5-turboは25-50%しか解決できなかったが、Llama2のようなローカルモデルはいかなるエクスプロイトも検出できなかった。我々は、異なるプロンプト設計の影響、文脈内学習の利点、LLMに高レベルのガイダンスを提供することの利点を分析した。テスト中のフォーカスの維持、エラーへの対処、そして最終的には確率的なオウムと人間のハッカーとの比較など、LLMの課題領域について論じる。 Penetration testing, an essential component of software security testing, allows organizations to proactively identify and remediate vulnerabilities in their systems, thus bolstering their defense mechanisms against potential cyberattacks. One recent advancement in the realm of penetration testing is the utilization of Language Models (LLMs). We explore the intersection of LLMs and penetration testing to gain insight into their capabilities and challenges in the context of privilege escalation. We create an automated Linux privilege-escalation benchmark utilizing local virtual machines. We introduce an LLM-guided privilege-escalation tool designed for evaluating different LLMs and prompt strategies against our benchmark. Our results show that GPT-4 is well suited for detecting file-based exploits as it can typically solve 75-100% of test cases of that vulnerability class. GPT-3.5-turbo was only able to solve 25-50% of those, while local models, such as Llama2, were not able to detect any exploits. We analyze the impact of different prompt designs, the benefits of in-context learning, and the advantages of offering high-level guidance to LLMs. 
We discuss challenging areas for LLMs, including maintaining focus during testing, coping with errors, and finally comparing them with both stochastic parrots as well as with human hackers.	翻訳日:2024-03-21 00:30:47 公開日:2024-03-19
# ハイブリッドアナログデジタル戦略によるSu-Schrieffer-Heeger鎖のエッジ状態伝達の最適化 Optimizing edge state transfer in a Su-Schrieffer-Heeger chain via hybrid analog-digital strategies ( http://arxiv.org/abs/2310.12179v2 ) ライセンス: Link先を確認	Sebastián V. Romero, Xi Chen, Gloria Platero, Yue Ban,	(参考訳) Su-Schrieffer-Heeger(SSH)連鎖は、位相相とその関連するエッジ状態を理解するためのパラダイムモデルとして機能し、量子材料や量子情報処理および技術の理解を深める上で重要な役割を果たす。本稿では,SSHチェーンにおけるエッジ状態の非断熱的かつ高忠実な転送を目的としたハイブリッドアナログデジタルプロトコルを提案する。しかし、特に長距離チェーンにおける伝達忠実性を高めるために、高次ネスト型通勤機が重要となる。実験的な実装を簡略化し,計算複雑性をナビゲートするために,サブラチテンAサイト間の隣り合う次のホッピング項を支配的なCD駆動として同定し,変動量子回路を用いてそれらを最適化する。ディジタル量子シミュレーションにより,障害があっても高速で堅牢な解を実現する能力を示す。このアナログデジタル転送プロトコルは、量子制御手法の拡張であり、エッジ状態転送のための堅牢なフレームワークを確立する。重要なことは、同定された最適なCD駆動は、様々な量子レジスタでシームレスに実装することができ、我々のアプローチの汎用性を強調している。 The Su-Schrieffer-Heeger (SSH) chain, which serves as a paradigmatic model for comprehending topological phases and their associated edge states, plays an essential role in advancing our understanding of quantum materials and quantum information processing and technology. In this paper, we introduce a hybrid analog-digital protocol designed for the nonadiabatic yet high-fidelity transfer of edge states in an SSH chain, featuring two sublattices, A and B. The core of our approach lies in harnessing the approximate time-dependent counterdiabatic (CD) interaction, derived from adiabatic gauge potentials. However, to enhance transfer fidelity, particularly in long-distance chains, higher-order nested commutators become crucial. To simplify the experimental implementation and navigate computational complexities, we identify the next-to-nearest-neighbor hopping terms between sublattice A sites as dominant CD driving and further optimize them by using variational quantum circuits. Through digital quantum simulation, our protocol showcases the capability to achieve rapid and robust solutions, even in the presence of disorder. 
This analog-digital transfer protocol, an extension of quantum control methodology, establishes a robust framework for edge-state transfer. Importantly, the optimal CD driving identified can be seamlessly implemented across various quantum registers, highlighting the versatility of our approach.	翻訳日:2024-03-21 00:30:47 公開日:2024-03-19
# PGA: 単一ロボットインタラクションによるグラスピングエージェントのパーソナライズ PGA: Personalizing Grasping Agents with Single Human-Robot Interaction ( http://arxiv.org/abs/2310.12547v2 ) ライセンス: Link先を確認	Junghyun Kim, Gi-Cheon Kang, Jaein Kim, Seoyun Yang, Minjoon Jung, Byoung-Tak Zhang,	(参考訳) LCRG(Language-Conditioned Robotic Grasping)は、自然言語の指示に基づいてオブジェクトを理解・把握するロボットを開発することを目的としている。私の財布のような個人的なオブジェクトを理解する能力は、人間のユーザとのより自然なインタラクションを促進するが、現在のLCRGシステムでは、ラップトップの横にある黒いウォレットのような一般的な言語命令しか使えません。この目的のために、大きなラベル付きデータセットではなく、単一の人間とロボットのインタラクションから学習することで、個人指標が与えられた個人オブジェクトをピンポイントし、把握することを目的とした、新しいデータセットとともに、タスクシナリオGraspMineを紹介した。提案手法であるPersonalized Grasping Agent (PGA)は,Reminiscenceと呼ばれる,ユーザの環境のラベルのない画像データを活用することでGraspMineに対処する。具体的には、PGAは、個人オブジェクトに関連指標を提示するユーザによって個人オブジェクト情報を取得し、PGAはそれを回転させてオブジェクトを検査する。得られた情報に基づいて,提案したラベル伝搬アルゴリズムにより,PGAの擬似ラベルオブジェクトを記憶する。 PGAは、インタラクションから取得した情報と、Reminiscence内の擬似ラベルオブジェクトとを調和させ、個人オブジェクトを把握するためにオブジェクトグラウンドモデルを適用する。これまでのLCRGシステムはリソース集約的な人間のアノテーションに依存していたが、財布を学ぶには数百のラベル付きデータを必要としていた。さらに、PGAはすべてのメトリクスでベースラインメソッドよりも優れており、9kの注釈付きデータサンプルから学習する完全教師付きメソッドと同等のパフォーマンスを示している。 GraspMineの実行に物理ロボットを用いることにより,PGAの現実的適用性をさらに検証する。コードとデータはhttps://github.com/JHKim-snu/PGAで公開されている。 Language-Conditioned Robotic Grasping (LCRG) aims to develop robots that comprehend and grasp objects based on natural language instructions. While the ability to understand personal objects like my wallet facilitates more natural interaction with human users, current LCRG systems only allow generic language instructions, e.g., the black-colored wallet next to the laptop. To this end, we introduce a task scenario GraspMine alongside a novel dataset aimed at pinpointing and grasping personal objects given personal indicators via learning from a single human-robot interaction, rather than a large labeled dataset. Our proposed method, Personalized Grasping Agent (PGA), addresses GraspMine by leveraging the unlabeled image data of the user's environment, called Reminiscence. 
Specifically, PGA acquires personal object information by a user presenting a personal object with its associated indicator, followed by PGA inspecting the object by rotating it. Based on the acquired information, PGA pseudo-labels objects in the Reminiscence by our proposed label propagation algorithm. Harnessing the information acquired from the interactions and the pseudo-labeled objects in the Reminiscence, PGA adapts the object grounding model to grasp personal objects. This results in significant efficiency while previous LCRG systems rely on resource-intensive human annotations -- necessitating hundreds of labeled data to learn my wallet. Moreover, PGA outperforms baseline methods across all metrics and even shows comparable performance compared to the fully-supervised method, which learns from 9k annotated data samples. We further validate PGA's real-world applicability by employing a physical robot to execute GraspMine. Code and data are publicly available at https://github.com/JHKim-snu/PGA.	翻訳日:2024-03-21 00:30:47 公開日:2024-03-19
# 分散ヘビアン時間記憶を用いた継承関数の学習 Learning Successor Features with Distributed Hebbian Temporal Memory ( http://arxiv.org/abs/2310.13391v2 ) ライセンス: Link先を確認	Evgenii Dzhivelikian, Petr Kuderov, Aleksandr I. Panov,	(参考訳) 本稿では,非定常部分観測環境における不確実性を考慮した意思決定におけるオンライン時間記憶学習の課題に対して,新しいアプローチを提案する。提案アルゴリズムは因子グラフ形式と多成分ニューロンモデルに基づく分散Hebbian Temporal Memory (DHTM) である。 DHTMは、シーケンシャルなデータ関係を捉え、将来の観測について累積的な予測を行い、継承機能(SF)を形成することを目的としている。新皮質の神経生理学的モデルにインスパイアされたこのアルゴリズムは、分散表現、スパース遷移行列、および局所ヘビアンのような学習規則を利用して、RNNやHMMのような伝統的な時間記憶アルゴリズムの不安定性と遅い学習プロセスを克服する。実験の結果,非定常データセットの場合,DHTMはLSTMと生物学的にインスパイアされたHMMライクなアルゴリズムCSCGより優れていた。この結果から,DHTMは動的環境におけるオンラインシーケンス学習と計画の課題に対処するための,有望なアプローチであることが示唆された。 This paper presents a novel approach to address the challenge of online temporal memory learning for decision-making under uncertainty in non-stationary, partially observable environments. The proposed algorithm, Distributed Hebbian Temporal Memory (DHTM), is based on factor graph formalism and a multicomponent neuron model. DHTM aims to capture sequential data relationships and make cumulative predictions about future observations, forming Successor Features (SF). Inspired by neurophysiological models of the neocortex, the algorithm utilizes distributed representations, sparse transition matrices, and local Hebbian-like learning rules to overcome the instability and slow learning process of traditional temporal memory algorithms like RNN and HMM. Experimental results demonstrate that DHTM outperforms LSTM and a biologically inspired HMM-like algorithm, CSCG, in the case of non-stationary datasets. Our findings suggest that DHTM is a promising approach for addressing the challenges of online sequence learning and planning in dynamic environments.	翻訳日:2024-03-21 00:30:47 公開日:2024-03-19
# TiC-CLIP:CLIPモデルの継続的なトレーニング TiC-CLIP: Continual Training of CLIP Models ( http://arxiv.org/abs/2310.16226v2 ) ライセンス: Link先を確認	Saurabh Garg, Mehrdad Farajtabar, Hadi Pouransari, Raviteja Vemulapalli, Sachin Mehta, Oncel Tuzel, Vaishaal Shankar, Fartash Faghri,	(参考訳) 最新のデータに基づいて、大規模なファンデーションモデルを最新に保つことは本質的にコストがかかる。絶え間なく再トレーニングすることの禁止コストを避けるため、これらのモデルを継続的に訓練するのは必須である。この問題は、大規模な継続的学習ベンチマークやベースラインの欠如によって悪化している。我々は、TiC-DataComp、TiC-YFCC、TiC-Redcapsといったビジョン言語モデルをトレーニングするための、WebスケールのTime-Continual(TiC)ベンチマークの最初のセットを紹介する。当社最大のデータセットであるTiC-DataCompには、2014年から2022年にかけての12.7Bのタイムスタンプイメージテキストペアが含まれています。まず,既存のモデルの時間的ロバスト性を評価するために,ベンチマークを用いて各種の動的評価をキュレートする。私たちは、OpenAIのCLIP(2020年までのデータでトレーニングされた)が、最近トレーニングされたOpenCLIPリポジトリのモデルと比較して、2021年から2022年までのキュレートされた検索タスクにおいて、約8%のゼロショットの精度を失うことを示しています。次に、時間連続データに基づいてモデルを効率的にトレーニングする方法を研究します。我々は、前回のチェックポイントからトレーニングを継続し、古いデータを再生する単純なリハーサルベースのアプローチが、スクラッチからリトレーニングする標準的なプラクティスと比較して、計算を2.5倍削減することを示した。コードはhttps://github.com/apple/ml-tic-clipで入手できる。 Keeping large foundation models up to date on latest data is inherently expensive. To avoid the prohibitive costs of constantly retraining, it is imperative to continually train these models. This problem is exacerbated by the lack of any large scale continual learning benchmarks or baselines. We introduce the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language models: TiC-DataComp, TiC-YFCC, and TiC-Redcaps. TiC-DataComp, our largest dataset, contains over 12.7B timestamped image-text pairs spanning 9 years (2014--2022). We first use our benchmarks to curate various dynamic evaluations to measure temporal robustness of existing models. We show OpenAI's CLIP (trained on data up to 2020) loses approximately 8% zero-shot accuracy on our curated retrieval task from 2021--2022 compared with more recently trained models in OpenCLIP repository. We then study how to efficiently train models on time-continuous data. 
We demonstrate that a simple rehearsal-based approach that continues training from the last checkpoint and replays old data reduces compute by $2.5\times$ when compared to the standard practice of retraining from scratch. Code is available at https://github.com/apple/ml-tic-clip.	翻訳日:2024-03-21 00:30:47 公開日:2024-03-19
# FTIC: Frequency-Aware Transformer for Learned Image Compression ( http://arxiv.org/abs/2310.16387v2 ) License: see link	Han Li, Shaohui Li, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong	Learned image compression (LIC) has gained traction as an effective solution for image storage and transmission in recent years. However, existing LIC methods produce redundant latent representations because they are limited in capturing anisotropic frequency components and preserving directional details. To overcome these challenges, we propose a novel frequency-aware transformer (FAT) block that, for the first time, achieves multiscale directional analysis for LIC. The FAT block comprises frequency-decomposition window attention (FDWA) modules to capture multiscale and directional frequency components of natural images. Additionally, we introduce a frequency-modulation feed-forward network (FMFFN) to adaptively modulate different frequency components, improving rate-distortion performance. Furthermore, we present a transformer-based channel-wise autoregressive (T-CA) model that effectively exploits channel dependencies. Experiments show that our method achieves state-of-the-art rate-distortion performance compared to existing LIC methods, and outperforms the latest standardized codec VTM-12.1 by 14.5%, 15.1%, and 13.0% in BD-rate on the Kodak, Tecnick, and CLIC datasets, respectively.	Translated: 2024-03-21 00:20:56 Published: 2024-03-19
# Transductive conformal inference with adaptive scores ( http://arxiv.org/abs/2310.18108v2 ) License: see link	Ulysse Gazin, Gilles Blanchard, Etienne Roquain	Conformal inference is a fundamental and versatile tool that provides distribution-free guarantees for many machine learning tasks. We consider the transductive setting, where decisions are made on a test sample of $m$ new points, giving rise to $m$ conformal $p$-values. While classical results only concern their marginal distribution, we show that their joint distribution follows a Pólya urn model, and establish a concentration inequality for their empirical distribution function. The results hold for arbitrary exchangeable scores, including adaptive ones that can use the covariates of the test+calibration samples at the training stage for increased accuracy. We demonstrate the usefulness of these theoretical results through uniform, in-probability guarantees for two machine learning tasks of current interest: interval prediction for transductive transfer learning and novelty detection based on two-class classification.	Translated: 2024-03-21 00:20:56 Published: 2024-03-19
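To make the objects in the abstract above concrete, here is a minimal sketch of how $m$ conformal $p$-values and their empirical distribution function are commonly computed from calibration and test scores. It assumes nonconformity scores where larger means more atypical and uses the usual $(1 + \text{count})/(n + 1)$ convention; the function names are illustrative, and the paper's own definitions should be checked before relying on this.

```python
def conformal_p_values(calib_scores, test_scores):
    """One p-value per test score: (1 + #{calibration scores >= test score}) / (n + 1)."""
    n = len(calib_scores)
    return [
        (1 + sum(1 for s in calib_scores if s >= t)) / (n + 1)
        for t in test_scores
    ]


def empirical_cdf(p_values, t):
    """Fraction of the p-values that are <= t."""
    return sum(1 for p in p_values if p <= t) / len(p_values)


if __name__ == "__main__":
    calib = [0.1, 0.4, 0.7, 0.9]   # n = 4 calibration scores
    test = [0.5, 1.0, 0.0]         # m = 3 test scores
    p = conformal_p_values(calib, test)
    print(p, empirical_cdf(p, 0.6))
```

Because every test point is compared against the same calibration sample, the resulting $p$-values are dependent, which is why a statement about their joint distribution (rather than each marginal) is needed for guarantees that hold uniformly over the test sample.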
# Optimal Single-Shot Decoding of Quantum Codes ( http://arxiv.org/abs/2310.18138v2 ) License: see link	Aldo Cumitini, Stefano Tinelli, Balázs Matuz, Francisco Lázaro, Luca Barletta	We discuss single-shot decoding of quantum Calderbank-Shor-Steane codes with faulty syndrome measurements. We state the problem as a joint source-channel coding problem. By adding redundant rows to the code's parity-check matrix we obtain an additional syndrome error-correcting code which addresses faulty syndrome measurements. The redundant rows are chosen to obtain good syndrome error-correcting capabilities while keeping the stabilizer weights low. Optimal joint decoding rules are derived which, though too complex for general codes, can be evaluated for short quantum codes.	Translated: 2024-03-21 00:20:56 Published: 2024-03-19
# MCRAGE: Synthetic Healthcare Data for Fairness ( http://arxiv.org/abs/2310.18430v2 ) License: see link	Keira Behal, Jiayi Chen, Caleb Fikes, Sophia Xiao	In the field of healthcare, electronic health records (EHR) serve as crucial training data for developing machine learning models for diagnosis, treatment, and the management of healthcare resources. However, medical datasets are often imbalanced in terms of sensitive attributes such as race/ethnicity, gender, and age. Machine learning models trained on class-imbalanced EHR datasets perform significantly worse in deployment for individuals of the minority classes than for those of the majority classes, which may lead to inequitable healthcare outcomes for minority groups. To address this challenge, we propose Minority Class Rebalancing through Augmentation by Generative modeling (MCRAGE), a novel approach to augment imbalanced datasets using samples generated by a deep generative model. The MCRAGE process involves training a Conditional Denoising Diffusion Probabilistic Model (CDDPM) capable of generating high-quality synthetic EHR samples from underrepresented classes. We use this synthetic data to augment the existing imbalanced dataset, thereby achieving a more balanced distribution across all classes, which can be used to train an unbiased machine learning model. We measure the performance of MCRAGE versus alternative approaches using accuracy, F1 score, and AUROC. We provide theoretical justification for our method in terms of recent convergence results for DDPMs with minimal assumptions.	Translated: 2024-03-21 00:20:56 Published: 2024-03-19
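The rebalancing step described above can be sketched as follows. This is an illustration under stated assumptions, not the MCRAGE implementation: it assumes the target is to bring every class up to the majority-class count (the abstract does not specify the target), and `generate(cls, k)` is a hypothetical stand-in for sampling `k` records of class `cls` from a trained conditional generative model.

```python
from collections import Counter


def synthetic_quota(labels):
    """Number of synthetic samples each class needs to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - k for cls, k in counts.items()}


def augment(samples, labels, generate):
    """Append synthetic samples so that all classes have equal counts.

    generate(cls, k) is a hypothetical callable returning k synthetic samples
    of class cls, e.g. draws from a conditional generative model.
    """
    out_x, out_y = list(samples), list(labels)
    for cls, k in synthetic_quota(labels).items():
        if k > 0:
            out_x.extend(generate(cls, k))
            out_y.extend([cls] * k)
    return out_x, out_y


if __name__ == "__main__":
    labels = ["a"] * 5 + ["b"] * 2 + ["c"]
    samples = list(range(len(labels)))
    fake_generator = lambda cls, k: [("synthetic", cls)] * k  # placeholder only
    x, y = augment(samples, labels, fake_generator)
    print(Counter(y))
```

Equalizing to the majority count is the simplest choice; a partial rebalance or a cap on the synthetic-to-real ratio would be equally easy to express by changing `target`.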
# Towards Plastic and Stable Exemplar-Free Incremental Learning: A Dual-Learner Framework with Cumulative Parameter Averaging ( http://arxiv.org/abs/2310.18639v3 ) License: see link	Wenju Sun, Qingyong Li, Wen Wang, Yangli-ao Geng	The dilemma between plasticity and stability presents a significant challenge in Incremental Learning (IL), especially in the exemplar-free scenario where accessing old-task samples is strictly prohibited during the learning of a new task. A straightforward solution to this issue is learning and storing an independent model for each task, known as Single Task Learning (STL). Despite the linear growth in model storage with the number of tasks in STL, we empirically discover that averaging these model parameters can potentially preserve knowledge across all tasks. Inspired by this observation, we propose a Dual-Learner framework with Cumulative Parameter Averaging (DLCPA). DLCPA employs a dual-learner design: a plastic learner focused on acquiring new-task knowledge and a stable learner responsible for accumulating all learned knowledge. The knowledge from the plastic learner is transferred to the stable learner via cumulative parameter averaging. Additionally, several task-specific classifiers work in cooperation with the stable learner to yield the final prediction. Specifically, when learning a new task, these modules are updated in a cyclic manner: i) the plastic learner is first optimized using a self-supervised loss in addition to the supervised loss to enhance the robustness of feature extraction; ii) the stable learner is then updated with respect to the plastic learner by cumulative parameter averaging to maintain its task-wise generalization; iii) the task-specific classifier is then optimized to align with the stable learner. Experimental results on CIFAR-100 and Tiny-ImageNet show that DLCPA outperforms several state-of-the-art exemplar-free baselines in both Task-IL and Class-IL settings.	Translated: 2024-03-21 00:20:56 Published: 2024-03-19
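Step ii) above, the cumulative parameter averaging, can be read as a running mean of the plastic learner's parameters over tasks. The sketch below shows that reading with plain Python lists standing in for weight tensors; the update rule and the names (`stable`, `plastic`, `task_index`) are assumptions for illustration, and the paper's exact rule may differ.

```python
def cumulative_average(stable, plastic, task_index):
    """Running mean over tasks: stable_t = ((t - 1) * stable_{t-1} + plastic_t) / t.

    stable, plastic : dicts mapping parameter names to lists of floats
    task_index      : 1-based index t of the task just learned
    """
    if task_index < 1:
        raise ValueError("task_index is 1-based")
    t = task_index
    return {
        name: [((t - 1) * s + p) / t for s, p in zip(stable[name], plastic[name])]
        for name in plastic
    }


if __name__ == "__main__":
    # Parameters of the plastic learner after each of three tasks (toy values).
    plastic_per_task = [
        {"w": [1.0, 3.0]},
        {"w": [3.0, 5.0]},
        {"w": [5.0, 1.0]},
    ]
    stable = {"w": [0.0, 0.0]}  # initial value is overwritten at t = 1
    for t, plastic in enumerate(plastic_per_task, start=1):
        stable = cumulative_average(stable, plastic, t)
    print(stable)
```

The appeal of this form is that only one stable copy is stored regardless of the number of tasks, in contrast to the linear storage growth of keeping one model per task.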
# Pointer Networks with Q-Learning for OP Combinatorial Optimization ( http://arxiv.org/abs/2311.02629v2 ) License: see link	Alessandro Barro	The Orienteering Problem (OP) presents a unique challenge in Combinatorial Optimization (CO), emphasized by its widespread use in logistics, delivery, and transportation planning. Given the NP-hard nature of OP, obtaining optimal solutions is inherently complex. While Pointer Networks (Ptr-Nets) have exhibited prowess in various combinatorial tasks, their performance on OP, and on tasks requiring a focus on future return or exploration, leaves room for improvement. Recognizing the potency of combining Reinforcement Learning (RL) methods with sequence-to-sequence models, this research unveils the Pointer Q-Network (PQN). This method combines Ptr-Nets and Q-learning, which, thanks to its critic-only nature, excels at capturing relationships within an embedded graph, a fundamental requirement for effectively addressing the specific challenges presented by OP. We explore the architecture and functionality of the PQN system, while showcasing its theoretical and practical advantages in terms of efficiency for combinatorial optimization problems such as the Orienteering Problem.	Translated: 2024-03-21 00:20:56 Published: 2024-03-19
# One-Shot Strategic Classification Under Unknown Costs ( http://arxiv.org/abs/2311.02761v2 ) License: see link	Elan Rosenfeld, Nir Rosenfeld	The goal of strategic classification is to learn decision rules which are robust to strategic input manipulation. Earlier works assume that these responses are known; while some recent works handle unknown responses, they exclusively study online settings with repeated model deployments. But there are many domains (particularly in public policy, a common motivating use case) where multiple deployments are infeasible, or where even one bad round is unacceptable. To address this gap, we initiate the formal study of one-shot strategic classification under unknown responses, which requires committing to a single classifier once. Focusing on uncertainty in the users' cost function, we begin by proving that for a broad class of costs, even a small mis-estimation of the true cost can entail trivial accuracy in the worst case. In light of this, we frame the task as a minimax problem, with the goal of identifying the classifier with the smallest worst-case risk over an uncertainty set of possible costs. We design efficient algorithms for both the full-batch and stochastic settings, which we prove converge (offline) to the minimax solution at the dimension-independent rate of $\tilde{\mathcal{O}}(T^{-\frac{1}{2}})$. Our theoretical analysis reveals important structure stemming from strategic responses, particularly the value of dual norm regularization with respect to the cost function.	Translated: 2024-03-21 00:20:56 Published: 2024-03-19
# MuSHRoom: Multi-Sensor Hybrid Room Dataset for Joint 3D Reconstruction and Novel View Synthesis ( http://arxiv.org/abs/2311.02778v2 ) License: see link	Xuqian Ren, Wenjia Wang, Dingding Cai, Tuuli Tuominen, Juho Kannala, Esa Rahtu	Metaverse technologies demand accurate, real-time, and immersive modeling on consumer-grade hardware for both non-human perception (e.g., drone/robot/autonomous car navigation) and immersive technologies like AR/VR, requiring both structural accuracy and photorealism. However, there exists a knowledge gap in how to apply geometric reconstruction and photorealism modeling (novel view synthesis) in a unified framework. To address this gap and promote the development of robust and immersive modeling and rendering with consumer-grade devices, we propose a real-world Multi-Sensor Hybrid Room Dataset (MuSHRoom). Our dataset presents exciting challenges: it requires state-of-the-art methods to be cost-effective, robust to noisy data and devices, and able to jointly learn 3D reconstruction and novel view synthesis instead of treating them as separate tasks, making them ideal for real-world applications. We benchmark several well-known pipelines on our dataset for joint 3D mesh reconstruction and novel view synthesis. Our dataset and benchmark show great potential for promoting improvements in fusing 3D reconstruction and high-quality rendering in a robust and computationally efficient end-to-end fashion. The dataset and code are available at the project website: https://xuqianren.github.io/publications/MuSHRoom/.	Translated: 2024-03-21 00:20:56 Published: 2024-03-19
# TabRepo: A Large Scale Repository of Tabular Model Evaluations and its AutoML Applications ( http://arxiv.org/abs/2311.02971v2 ) License: see link	David Salinas, Nick Erickson	We introduce TabRepo, a new dataset of tabular model evaluations and predictions. TabRepo contains the predictions and metrics of 1310 models evaluated on 200 classification and regression datasets. We illustrate the benefit of our dataset in multiple ways. First, we show that it enables analyses such as comparing Hyperparameter Optimization against current AutoML systems while also considering ensembling at marginal cost by using precomputed model predictions. Second, we show that our dataset can be readily leveraged to perform transfer learning. In particular, we show that applying standard transfer-learning techniques makes it possible to outperform current state-of-the-art tabular systems in accuracy, runtime, and latency.	Translated: 2024-03-21 00:20:56 Published: 2024-03-19
# Krylov complexity is not a measure of distance between states or operators ( http://arxiv.org/abs/2311.04093v2 ) License: see link	Sergio E. Aguilar-Gutierrez, Andrew Rolph	We ask whether Krylov complexity is mutually compatible with the circuit and Nielsen definitions of complexity. We show that the Krylov complexities between three states fail to satisfy the triangle inequality and so cannot be a measure of distance: there is no possible metric for which Krylov complexity is the length of the shortest path to the target state or operator. We show this explicitly in the simplest example, a single qubit, and in general.	Translated: 2024-03-21 00:20:56 Published: 2024-03-19
# Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs ( http://arxiv.org/abs/2311.04417v3 ) License: see link	Hongwu Peng, Caiwen Ding, Tong Geng, Sutanay Choudhury, Kevin Barker, Ang Li	The relentless advancement of artificial intelligence (AI) and machine learning (ML) applications necessitates the development of specialized hardware accelerators capable of handling the increasing complexity and computational demands. Traditional computing architectures, based on the von Neumann model, are being outstripped by the requirements of contemporary AI/ML algorithms, leading to a surge in the creation of accelerators like the Graphcore Intelligence Processing Unit (IPU), SambaNova Reconfigurable Dataflow Unit (RDU), and enhanced GPU platforms. These hardware accelerators are characterized by their innovative data-flow architectures and other design optimizations that promise to deliver superior performance and energy efficiency for AI/ML tasks. This research provides a preliminary evaluation and comparison of these commercial AI/ML accelerators, examining their hardware and software design features to discern their strengths and unique capabilities. By conducting a series of benchmark evaluations on common DNN operators and other AI/ML workloads, we aim to illuminate the advantages of data-flow architectures over conventional processor designs and offer insights into the performance trade-offs of each platform. The findings from our study will serve as a valuable reference for the design and performance expectations of research prototypes, thereby facilitating the development of next-generation hardware accelerators tailored for the ever-evolving landscape of AI/ML applications. Through this analysis, we aspire to contribute to the broader understanding of current accelerator technologies and to provide guidance for future innovations in the field.	Translated: 2024-03-21 00:20:56 Published: 2024-03-19
# Solving High Frequency and Multi-Scale PDEs with Gaussian Processes ( http://arxiv.org/abs/2311.04465v2 ) License: see link	Shikai Fang, Madison Cooley, Da Long, Shibo Li, Robert Kirby, Shandian Zhe	Machine learning based solvers have garnered much attention in physical simulation and scientific computing, with physics-informed neural networks (PINNs) as a prominent example. However, PINNs often struggle to solve high-frequency and multi-scale PDEs, which can be due to spectral bias during neural network training. To address this problem, we resort to the Gaussian process (GP) framework. To flexibly capture the dominant frequencies, we model the power spectrum of the PDE solution with a Student $t$ mixture or Gaussian mixture. We apply the inverse Fourier transform to obtain the covariance function (by the Wiener-Khinchin theorem). The covariance derived from the Gaussian mixture spectrum corresponds to the known spectral mixture kernel. Next, we estimate the mixture weights in the log domain, which we show is equivalent to placing a Jeffreys prior. It automatically induces sparsity, prunes excessive frequencies, and adjusts the remaining ones toward the ground truth. Third, to enable efficient and scalable computation on massive collocation points, which are critical to capture high frequencies, we place the collocation points on a grid and multiply our covariance function across the input dimensions. We use the GP conditional mean to predict the solution and its derivatives so as to fit the boundary condition and the equation itself. As a result, we can derive a Kronecker product structure in the covariance matrix. We use Kronecker product properties and multilinear algebra to promote computational efficiency and scalability, without low-rank approximations. We show the advantage of our method in systematic experiments. The code is released at https://github.com/xuangu-fang/Gaussian-Process-Slover-for-High-Freq-PDE.	Translated: 2024-03-21 00:20:56 Published: 2024-03-19
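For the Gaussian-mixture case mentioned above, the resulting covariance is the spectral mixture kernel, which in one dimension has the well-known closed form $k(\tau) = \sum_q w_q \exp(-2\pi^2 \tau^2 v_q)\cos(2\pi \mu_q \tau)$, with $\mu_q$ and $v_q$ the mean and variance of the $q$-th spectral component. The sketch below evaluates that form with weights stored in the log domain and multiplies one such kernel per input dimension. It is an illustration of the kernel only, not the authors' solver; parameter names are assumptions, and the Student $t$ variant is not shown.

```python
import math


def spectral_mixture_kernel(tau, log_weights, means, variances):
    """1-D spectral mixture kernel evaluated at lag tau.

    log_weights : log of the mixture weights w_q (weights kept in the log domain)
    means       : component frequencies mu_q
    variances   : component variances v_q in the frequency domain
    """
    total = 0.0
    for lw, mu, v in zip(log_weights, means, variances):
        envelope = math.exp(-2.0 * math.pi ** 2 * tau ** 2 * v)
        total += math.exp(lw) * envelope * math.cos(2.0 * math.pi * mu * tau)
    return total


def product_kernel(x, y, params_per_dim):
    """Multiply one spectral mixture kernel per input dimension."""
    k = 1.0
    for xd, yd, params in zip(x, y, params_per_dim):
        k *= spectral_mixture_kernel(xd - yd, *params)
    return k


if __name__ == "__main__":
    # Two components: a low frequency (1.0) and a high frequency (10.0).
    params = ([math.log(0.5), math.log(1.5)], [1.0, 10.0], [0.1, 0.2])
    print(spectral_mixture_kernel(0.0, *params))
    print(product_kernel([0.2, 0.7], [0.25, 0.6], [params, params]))
```

A product kernel evaluated on a grid of collocation points is what yields a Kronecker-structured covariance matrix: the full matrix factors into one small matrix per dimension, so it never has to be formed explicitly.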
# Functional Bayesian Tucker Decomposition for Continuous-indexed Tensor Data ( http://arxiv.org/abs/2311.04829v2 ) License: see link	Shikai Fang, Xin Yu, Zheng Wang, Shibo Li, Mike Kirby, Shandian Zhe	Tucker decomposition is a powerful tensor model to handle multi-aspect data. It demonstrates the low-rank property by decomposing the grid-structured data as interactions between a core tensor and a set of object representations (factors). A fundamental assumption of such decomposition is that there are finite objects in each aspect or mode, corresponding to discrete indexes of data entries. However, real-world data is often not naturally posed in this setting. For example, geographic data is represented as continuous indexes of latitude and longitude coordinates, and cannot fit tensor models directly. To generalize Tucker decomposition to such scenarios, we propose Functional Bayesian Tucker Decomposition (FunBaT). We treat the continuous-indexed data as the interaction between the Tucker core and a group of latent functions. We use Gaussian processes (GP) as functional priors to model the latent functions. Then, we convert each GP into a state-space prior by constructing an equivalent stochastic differential equation (SDE) to reduce computational cost. An efficient inference algorithm is developed for scalable posterior approximation based on advanced message-passing techniques. The advantage of our method is shown in both synthetic data and several real-world applications. We release the code of FunBaT at https://github.com/xuangu-fang/Functional-Bayesian-Tucker-Decomposition.	Translated: 2024-03-21 00:20:56 Published: 2024-03-19
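The modelling idea above, a Tucker core interacting with latent functions of continuous indexes, can be illustrated with a short evaluation routine. This is a sketch under assumptions: the latent functions here are fixed Python callables chosen for the example, whereas in the paper they are unknown and given GP priors, and the data layout (`core` as a dict keyed by rank tuples) is purely illustrative.

```python
from itertools import product


def tucker_value(core, latent_fns, index):
    """Evaluate a functional Tucker model at one continuous index.

    core       : dict mapping rank tuples (r_1, ..., r_K) to core entries
    latent_fns : latent_fns[k][r] is the r-th latent function of mode k
    index      : continuous coordinates (i_1, ..., i_K), one per mode
    """
    # Evaluate every latent function of each mode at that mode's coordinate.
    factors = [[f(x) for f in fns] for fns, x in zip(latent_fns, index)]
    ranks = [range(len(fs)) for fs in factors]
    value = 0.0
    for r in product(*ranks):
        term = core[r]
        for mode, rk in enumerate(r):
            term *= factors[mode][rk]
        value += term
    return value


if __name__ == "__main__":
    # Two modes (e.g. latitude and longitude), rank 2 in each mode.
    core = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}
    latent_fns = [
        [lambda u: u, lambda u: 1.0],
        [lambda v: v * v, lambda v: 3.0],
    ]
    print(tucker_value(core, latent_fns, (2.0, 0.5)))
```

With discrete indexes the inner lists would be rows of factor matrices; replacing row lookup by function evaluation is what lets the same formula accept arbitrary real-valued coordinates.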
# Establishing Performance Baselines in Fine-Tuning, Retrieval-Augmented Generation and Soft-Prompting for Non-Specialist LLM Users ( http://arxiv.org/abs/2311.05903v2 ) License: see link	Jennifer Dodgson, Lin Nanzheng, Julian Peh, Akira Rafhael Janson Pattirane, Alfath Daryl Alhajir, Eko Ridho Dinarto, Joseph Lim, Syed Danyal Ahmad	Research into methods for improving the performance of large language models (LLMs) through fine-tuning, retrieval-augmented generation (RAG) and soft-prompting has tended to focus on the use of highly technical or high-cost techniques, making many of the newly discovered approaches comparatively inaccessible to non-technical users. In this paper we tested an unmodified version of GPT 3.5, a fine-tuned version, and the same unmodified model when given access to a vectorised RAG database, both in isolation and in combination with a basic, non-algorithmic soft prompt. In each case we tested the model's ability to answer a set of 100 questions relating primarily to events that occurred after September 2021 (the point at which GPT 3.5's training data set ends). We found that if commercial platforms are used and default settings are applied with no iteration in order to establish a baseline set of outputs, a fine-tuned model outperforms GPT 3.5 Turbo, while the RAG approach outperformed both. The application of a soft prompt significantly improved the performance of each approach.	Translated: 2024-03-21 00:11:07 Published: 2024-03-19
# Aria-NeRF: Multimodal Egocentric View Synthesis ( http://arxiv.org/abs/2311.06455v2 ) License: see link	Jiankai Sun, Jianing Qiu, Chuanyang Zheng, John Tucker, Javier Yu, Mac Schwager	We seek to accelerate research in developing rich, multimodal scene models trained from egocentric data, based on differentiable volumetric ray-tracing inspired by Neural Radiance Fields (NeRFs). The construction of a NeRF-like model from an egocentric image sequence plays a pivotal role in understanding human behavior and holds diverse applications within the realms of VR/AR. Such egocentric NeRF-like models may be used as realistic simulations, contributing significantly to the advancement of intelligent agents capable of executing tasks in the real world. The future of egocentric view synthesis may lead to novel environment representations going beyond today's NeRFs by augmenting visual data with multimodal sensors such as IMU for egomotion tracking, audio sensors to capture surface texture and human language context, and eye-gaze trackers to infer human attention patterns in the scene. To support and facilitate the development and evaluation of egocentric multimodal scene modeling, we present a comprehensive multimodal egocentric video dataset. This dataset offers a broad collection of sensory data, featuring RGB images, eye-tracking camera footage, audio recordings from a microphone, atmospheric pressure readings from a barometer, positional coordinates from GPS, connectivity details from Wi-Fi and Bluetooth, and information from dual-frequency IMU datasets (1kHz and 800Hz) paired with a magnetometer. The dataset was collected with the Meta Aria Glasses wearable device platform. The diverse data modalities and the real-world context captured within this dataset serve as a robust foundation for furthering our understanding of human behavior and enabling more immersive and intelligent experiences in the realms of VR, AR, and robotics.	Translated: 2024-03-21 00:11:07 Published: 2024-03-19
# Gate-Compatible Circuit Quantum Electrodynamics in a Three-Dimensional Cavity Architecture ( http://arxiv.org/abs/2311.07337v2 ) License: see link	Zezhou Xia, Jierong Huo, Zonglin Li, Jianghua Ying, Yulong Liu, Xin-Yi Tang, Yuqing Wang, Mo Chen, Dong Pan, Shan Zhang, Qichun Liu, Tiefu Li, Lin Li, Ke He, Jianhua Zhao, Runan Shang, Hao Zhang	Semiconductor-based superconducting qubits offer a versatile platform for studying hybrid quantum devices in the circuit quantum electrodynamics (cQED) architecture. Most of these cQED experiments utilize coplanar waveguides, where the incorporation of DC gate lines is straightforward. Here, we present a technique for probing gate-tunable hybrid devices using a three-dimensional (3D) microwave cavity. A recess is machined inside the cavity wall for the placement of devices and gate lines. We validate this design using a hybrid device based on an InAs-Al nanowire Josephson junction. The coupling between the device and the cavity is facilitated by a long superconducting strip, the antenna. The Josephson junction and the antenna together form a gatemon qubit. We further demonstrate the gate-tunable cavity shift and two-tone qubit spectroscopy. This technique could be used to probe various quantum devices and materials in a 3D cQED architecture that requires DC gate voltages.	Translated: 2024-03-21 00:11:07 Published: 2024-03-19
# 大気量子チャネルにおける時間相関 Time correlations in atmospheric quantum channels ( http://arxiv.org/abs/2311.07730v2 ) ライセンス: Link先を確認	M. Klen, D. Vasylyev, W. Vogel, A. A. Semenov,	(参考訳) リモートパーティ間での量子情報の効率的な転送は、大気チャネル上での量子通信にとって重要な課題である。チャネル透過率のランダム変動は、その実践上の大きな障害要因である。本研究では,異なるタイミングでチャネル透過率の相関について検討し,二つの伝送プロトコルに着目した。 1つ目は、時間分離光パルス間の離散的および連続的可変な絡み合いの堅牢性に関連しており、ヒルベルト空間の有効次元を拡大する可能性を示している。 2つ目は、明るい古典的なパルスと量子光でそれらをテストすることで、高透過事象の選択に対処する。以上の結果から,大気中の光の量子状態を符号化し,伝送するための時間コヒーレンス資源の容量が高いことが示された。 Efficient transfer of quantum information between remote parties is a crucial challenge for quantum communication over atmospheric channels. Random fluctuations of the channel transmittance are a major disturbing factor for its practical implementation. We study correlations between channel transmittances at different moments of time and focus on two transmission protocols. The first is related to the robustness of both discrete- and continuous-variable entanglement between time-separated light pulses, showing a possibility to enlarge the effective dimension of the Hilbert space. The second addresses a selection of high-transmittance events by testing them with bright classical pulses followed by quantum light. Our results show a high capacity of the time-coherence resource for encoding and transferring quantum states of light in atmospheric channels.	翻訳日:2024-03-21 00:11:07 公開日:2024-03-19
# LifeTox:人生相談における暗黙的な毒性を明らかにする LifeTox: Unveiling Implicit Toxicity in Life Advice ( http://arxiv.org/abs/2311.09585v2 ) ライセンス: Link先を確認	Minbeom Kim, Jahyun Koo, Hwanhee Lee, Joonsuk Park, Hwaran Lee, Kyomin Jung,	(参考訳) 大規模言語モデルが日々の生活にますます統合されるにつれて、様々な文脈で暗黙的な毒性を検出することが不可欠である。この目的のために、幅広いアドバイスを求めるシナリオ内で暗黙的な毒性を特定するために設計されたデータセットであるLifeToxを紹介する。既存の安全データセットとは異なり、LifeToxはオープンエンドの質問を通じて個人体験から派生したさまざまなコンテキストで構成されている。実験により、LifeToxで微調整されたRoBERTaは、毒性分類タスクにおいて、大規模言語モデルのゼロショット性能にマッチするか、上回っていることが示された。これらの結果は、暗黙の毒性に固有の複雑な課題に対処する上で、LifeToxの有効性を裏付けるものである。私たちはデータセット\footnote{\url{https://huggingface.co/datasets/mbkim/LifeTox}}とLifeToxモデレータファミリー(350M、7B、13B)をオープンソース化しました。 As large language models become increasingly integrated into daily life, detecting implicit toxicity across diverse contexts is crucial. To this end, we introduce LifeTox, a dataset designed for identifying implicit toxicity within a broad range of advice-seeking scenarios. Unlike existing safety datasets, LifeTox comprises diverse contexts derived from personal experiences through open-ended questions. Experiments demonstrate that RoBERTa fine-tuned on LifeTox matches or surpasses the zero-shot performance of large language models in toxicity classification tasks. These results underscore the efficacy of LifeTox in addressing the complex challenges inherent in implicit toxicity. We open-sourced the dataset\footnote{\url{https://huggingface.co/datasets/mbkim/LifeTox}} and the LifeTox moderator family: 350M, 7B, and 13B.	翻訳日:2024-03-21 00:11:07 公開日:2024-03-19
# 医師はプロンプトの仕方を知っているか? : 臨床ノート作成における自動プロンプト最適化支援の必要性 Do Physicians Know How to Prompt? The Need for Automatic Prompt Optimization Help in Clinical Note Generation ( http://arxiv.org/abs/2311.09684v2 ) ライセンス: Link先を確認	Zonghai Yao, Ahmed Jaafar, Beining Wang, Zhichao Yang, Hong Yu,	(参考訳) 本研究は,臨床ノート作成における大規模言語モデル(LLM)の性能に及ぼすプロンプトエンジニアリングの影響について検討する。本稿では,初期プロンプトを洗練するための自動プロンプト最適化(APO)フレームワークを導入し,医療専門家,非医療専門家,APO強化GPT3.5およびGPT4のアウトプットを比較する。その結果, GPT4 APO は, 臨床ノートの各セクションにわたるプロンプト品質の標準化に優れていた。 Human-in-the-loopアプローチは、専門家が自身の修正を好みつつAPO以降もコンテンツ品質を維持することを示し、専門家によるカスタマイズの価値を示唆している。整合性にはAPO-GPT4、パーソナライズにはエキスパートインプットを利用する2相最適化プロセスを提案する。 This study examines the effect of prompt engineering on the performance of Large Language Models (LLMs) in clinical note generation. We introduce an Automatic Prompt Optimization (APO) framework to refine initial prompts and compare the outputs of medical experts, non-medical experts, and APO-enhanced GPT3.5 and GPT4. Results highlight GPT4 APO's superior performance in standardizing prompt quality across clinical note sections. A human-in-the-loop approach shows that experts maintain content quality post-APO, with a preference for their own modifications, suggesting the value of expert customization. We recommend a two-phase optimization process, leveraging APO-GPT4 for consistency and expert input for personalization.	翻訳日:2024-03-21 00:11:07 公開日:2024-03-19
# DRESS:自然言語フィードバックによる人との交流・交流のための大規模視覚言語モデルの構築 DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback ( http://arxiv.org/abs/2311.10081v2 ) ライセンス: Link先を確認	Yangyi Chen, Karan Sikka, Michael Cogswell, Heng Ji, Ajay Divakaran,	(参考訳) 本稿では,LVLM(Large Vision Language Model)として,Large Language Modelsの自然言語フィードバック(NLF)を革新的に活用し,現状のLVLMにおける2つの重要な制限に対処することによって,そのアライメントとインタラクションを強化する。第一に、以前のLVLMは一般に、人間の好みに合わせて調整するために、命令の微調整段階にのみ依存する。追加のフィードバックを含まないと、いまだに無害、幻覚、有害な反応を起こす傾向にある。第二に、視覚的インストラクションチューニングデータは一般的にマルチターン対話形式で構成されるが、連続する会話のターン間の接続や依存関係は弱い。これにより、効果的なマルチターン相互作用のキャパシティが低下する。そこで本研究では,NLFを批判と洗練の2つの重要なタイプに分類する手法を提案する。批判的NLFは反応の強さと弱さを特定し、LVLMを人間の好みに合わせるために使用される。改良NLFは、改善のための具体的な提案を提供し、マルチターンインタラクションにフィードバックを組み込むことで、LVLMの応答を洗練できる能力に焦点を当てたLVLMの相互作用能力を改善するために採用されている。 NLFの非微分性に対処するため,条件付き強化学習を一般化した。実験の結果,DRESS は SOTA LVML よりも有用 (9.76%), 正直 (11.52%), 無害 (21.03%) な応答を生成でき, マルチターン相互作用におけるフィードバックからより効果的に学習できることがわかった。 We present DRESS, a large vision language model (LVLM) that innovatively exploits Natural Language feedback (NLF) from Large Language Models to enhance its alignment and interactions by addressing two key limitations in the state-of-the-art LVLMs. First, prior LVLMs generally rely only on the instruction finetuning stage to enhance alignment with human preferences. Without incorporating extra feedback, they are still prone to generate unhelpful, hallucinated, or harmful responses. Second, while the visual instruction tuning data is generally structured in a multi-turn dialogue format, the connections and dependencies among consecutive conversational turns are weak. This reduces the capacity for effective multi-turn interactions. To tackle these, we propose a novel categorization of the NLF into two key types: critique and refinement. The critique NLF identifies the strengths and weaknesses of the responses and is used to align the LVLMs with human preferences. 
The refinement NLF offers concrete suggestions for improvement and is adopted to improve the interaction ability of the LVLMs, that is, their ability to refine responses by incorporating feedback in multi-turn interactions. To address the non-differentiable nature of NLF, we generalize conditional reinforcement learning for training. Our experimental results demonstrate that DRESS can generate more helpful (9.76%), honest (11.52%), and harmless (21.03%) responses, and more effectively learn from feedback during multi-turn interactions compared to SOTA LVLMs.	翻訳日:2024-03-21 00:11:07 公開日:2024-03-19
# 超高磁場機能MRIの高分解能・高感度超解像 : 視覚研究への応用 Resolution- and Stimulus-agnostic Super-Resolution of Ultra-High-Field Functional MRI: Application to Visual Studies ( http://arxiv.org/abs/2311.14918v2 ) ライセンス: Link先を確認	Hongwei Bran Li, Matthew S. Rosen, Shahin Nasr, Juan Eugenio Iglesias,	(参考訳) 高分解能fMRIは脳のメソスケール組織への窓を提供する。しかし、高い空間分解能はスキャン時間を増加させ、低信号とコントラスト-ノイズ比を補う。本研究では,fMRIのための深層学習に基づく3次元超解像法を提案する。解像度に依存しない画像拡張フレームワークを組み込むことで,リトレーニングなしで様々なボクセルサイズに適応できる。初期視覚領域における微細な動き選択部位のローカライズに,この革新的な手法を適用した。これらのサイトの検出には一般的に1mm等方性以上の解像度を必要とするが、ここでは低分解能(2-3mm等方性)のfMRIデータに基づいてそれらを可視化する。興味深いことに、超解像fMRIは、異なる被験者から得られたトレーニングデータや実験パラダイム(非視覚的静止状態fMRIを含む)から得られたデータであっても、これらのサイトの相互に結合した組織(色選択された部位)の高頻度の詳細を復元することができる。定量的および定性的な結果から,fMRIの空間分解能が向上する可能性が示唆された。 High-resolution fMRI provides a window into the brain's mesoscale organization. Yet, higher spatial resolution increases scan times, to compensate for the low signal and contrast-to-noise ratio. This work introduces a deep learning-based 3D super-resolution (SR) method for fMRI. By incorporating a resolution-agnostic image augmentation framework, our method adapts to varying voxel sizes without retraining. We apply this innovative technique to localize fine-scale motion-selective sites in the early visual areas. Detection of these sites typically requires a resolution higher than 1 mm isotropic, whereas here, we visualize them based on lower resolution (2-3mm isotropic) fMRI data. Remarkably, the super-resolved fMRI is able to recover high-frequency detail of the interdigitated organization of these sites (relative to the color-selective sites), even with training data sourced from different subjects and experimental paradigms -- including non-visual resting-state fMRI, underscoring its robustness and versatility. Quantitative and qualitative results indicate that our method has the potential to enhance the spatial resolution of fMRI, leading to a drastic reduction in acquisition time.	翻訳日:2024-03-21 00:11:07 公開日:2024-03-19
# 真空絡み込みにおける曲率の普遍的役割 Universal role of curvature in vacuum entanglement ( http://arxiv.org/abs/2311.15019v2 ) ライセンス: Link先を確認	Hari K, Subhajit Barman, Dawood Kothawala,	(参考訳) 適切な真空状態で量子場に結合した量子プローブ間の絡み合いにおける時空曲率の役割に関する普遍的な特徴を強調した。プローブは、最初は因果的に切断され、絡み合っていない。パラメータ空間 $\{{\omega}, d_0, \boldsymbol{v}_0\}$ は、検出器のエネルギーギャップ $\omega$ と、分離距離 $d_0$ と相対速度 $\boldsymbol{v}_0$ の初値は、いずれも任意の曲線時空で共変的に定義される。ド・ジッター時空の数値的な結果も得られ、これらを用いて強い曲率構造を探索するとともに、摂動的な結果を任意の曲率時空で補正する。解析により, 前記パラメータ空間の特定の領域において, 曲率を時空曲率のプローブとして利用しやすくすることで, 絡み合い特性を誘導できることが示唆された。 We highlight some universal features concerning the role of spacetime curvature in the entanglement induced between quantum probes coupled to a quantum field in a suitable vacuum state. The probes are initially causally disconnected and non-entangled. We explore the parameter space $\{{\omega}, d_0, \boldsymbol{v}_0\}$ spanned by the energy gap $\omega$ of the detectors, and the initial values of separation distance $d_0$ and relative velocity $\boldsymbol{v}_0$, both covariantly defined in arbitrary curved spacetime. We also obtain numerical results in de Sitter spacetimes and use these to explore strong curvature regime, while also corroborating our perturbative results in arbitrary curved spacetime. Our analysis shows that curvature can induce entanglement features in certain regions of the above parameter space, in a manner which facilitates using entanglement as a probe of spacetime curvature.	翻訳日:2024-03-21 00:11:07 公開日:2024-03-19
# 知識サブグラフ学習による薬物と薬物の相互作用予測の精度と解釈 Accurate and interpretable drug-drug interaction prediction enabled by knowledge subgraph learning ( http://arxiv.org/abs/2311.15056v2 ) ライセンス: Link先を確認	Yaqing Wang, Zaifei Yang, Quanming Yao,	(参考訳) 背景: 薬物と薬物の相互作用(DDI)を明らかにすることは、臨床治療と薬物開発における長年にわたる課題である。近年,DDI予測のためのディープラーニング技術が開発されている。しかし、一般的に大量のサンプルを必要とするが、既知のDDIは稀である。方法:本研究では,上記の課題に対処するグラフニューラルネットワークに基づくKnowDDIを提案する。 KnowDDIは、大きなバイオメディカル知識グラフからリッチな近隣情報を適応的に活用することで、薬物表現を強化する。そして、予測されたDDIを解釈するために、各ドラッグペアの知識サブグラフを学習し、それぞれのエッジが、既知のDDIの重要性を示す接続強度、または、接続が不明なドラッグペア間の強度に類似した接続強度に関連付けられている。したがって、DDIの欠如は、豊かな薬物表現とプロパゲーションされた薬物類似性によって暗黙的に補償される。結果:2つのベンチマークDDIデータセット上でKnowDDIを評価する。その結果,KnowDDIは高い解釈性で最先端の予測性能が得られることがわかった。また,KnowDDIはスペーサーの知識グラフから既存の作業よりも苦しむことが判明した。このことは、薬物の表現が豊かでないときのDDIの欠如を補う上で、伝達される薬物の類似性がより重要な役割を担っていることを示している。結論: KnowDDIは、深層学習技術の効率と、生物医学知識グラフにおける豊富な事前知識をうまく組み合わせている。元々のオープンソースツールであるKnowDDIは、タンパク質-タンパク質間相互作用、薬物-標的間相互作用、疾患-遺伝子間相互作用など、幅広い関連する相互作用予測タスクにおける潜在的な相互作用の検出を支援し、最終的にはバイオメディシンと医療の発展を促進する。 Background: Discovering potential drug-drug interactions (DDIs) is a long-standing challenge in clinical treatments and drug developments. Recently, deep learning techniques have been developed for DDI prediction. However, they generally require a huge number of samples, while known DDIs are rare. Methods: In this work, we present KnowDDI, a graph neural network-based method that addresses the above challenge. KnowDDI enhances drug representations by adaptively leveraging rich neighborhood information from large biomedical knowledge graphs. Then, it learns a knowledge subgraph for each drug-pair to interpret the predicted DDI, where each of the edges is associated with a connection strength indicating the importance of a known DDI or resembling strength between a drug-pair whose connection is unknown. Thus, the lack of DDIs is implicitly compensated by the enriched drug representations and propagated drug similarities. Results: We evaluate KnowDDI on two benchmark DDI datasets. 
Results show that KnowDDI obtains the state-of-the-art prediction performance with better interpretability. We also find that KnowDDI suffers less than existing works given a sparser knowledge graph. This indicates that the propagated drug similarities play a more important role in compensating for the lack of DDIs when the drug representations are less enriched. Conclusions: KnowDDI nicely combines the efficiency of deep learning techniques and the rich prior knowledge in biomedical knowledge graphs. As an original open-source tool, KnowDDI can help detect possible interactions in a broad range of relevant interaction prediction tasks, such as protein-protein interactions, drug-target interactions and disease-gene interactions, eventually promoting the development of biomedicine and healthcare.	翻訳日:2024-03-21 00:11:07 公開日:2024-03-19
# オンラインコミュニティからの完全な視覚的質問応答データセット Fully Authentic Visual Question Answering Dataset from Online Communities ( http://arxiv.org/abs/2311.15562v3 ) ライセンス: Link先を確認	Chongyan Chen, Mengchen Liu, Noel Codella, Yunsheng Li, Lu Yuan, Danna Gurari,	(参考訳) VQA(Visual Question Answering)は、画像に関する質問に答える機能である。 VQAデータセットは、すべてのコンテンツが真正のユースケースから生まれたものである。オンラインの質問応答コミュニティフォーラムから引用して、VQAonlineと呼ぶ。このデータセットと8つの主流VQAデータセットとの関係を特徴付ける。データセットの回答はより長い(平均173語)ため、標準的なVQA評価指標と互換性がないため、VQAonline上で6つの最先端のVQAモデルを評価し、最も苦労したことを報告するために、より長いテキスト評価に人気のあるメトリクスを使用する。最後に、評価指標が人間の判断に最も適しているかを分析する。将来の拡張を容易にするため、データセットをhttps://vqaonline.github.io/で公開しています。 Visual Question Answering (VQA) entails answering questions about images. We introduce the first VQA dataset in which all contents originate from an authentic use case. Sourced from online question answering community forums, we call it VQAonline. We characterize this dataset and how it relates to eight mainstream VQA datasets. Observing that answers in our dataset tend to be much longer (i.e., a mean of 173 words) and so incompatible with standard VQA evaluation metrics, we instead utilize popular metrics for longer text evaluation for evaluating six state-of-the-art VQA models on VQAonline and report where they struggle most. Finally, we analyze which evaluation metrics align best with human judgments. To facilitate future extensions, we publicly-share the dataset at: https://vqaonline.github.io/.	翻訳日:2024-03-21 00:01:19 公開日:2024-03-19
# 適応前のアライメント: 一般化可能なビデオアクション認識のためのEntity-to-Regionアライメントの活用 Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition ( http://arxiv.org/abs/2311.15619v2 ) ライセンス: Link先を確認	Yifei Chen, Dapeng Chen, Ruijin Liu, Sai Zhou, Wenyuan Xue, Wei Peng,	(参考訳) 大規模視覚言語事前学習モデルは様々なビデオタスクで大きな成功を収めた。しかし、既存のほとんどの手法は、訓練済みの画像エンコーダをビデオレベルの表現のモデル化に適応させ、アクションラベルのワンホットまたはテキスト埋め込みを監督に利用する「適応的整合」パラダイムに従っている。このパラダイムは、静的イメージから複雑なアクティビティ概念へのマッピングという課題を見落としている。本稿では,Align before Adapt(ALT)パラダイムを提案する。ビデオ表現学習に適応する前に、各フレームのエンティティ・ツー・リージョンアライメントを利用する。このアライメントは、領域認識された画像埋め込みをオフラインで構築されたテキストコーパスにマッチングすることで達成される。一致したエンティティを用いて、変換器ベースのビデオアダプタにテキスト埋め込みをクエリとして送り、ビデオからベクターへの最も重要なエンティティのセマンティクスの抽出に役立てる。このパラダイムは、適応中のVLPの視覚言語アライメントを再利用し、基礎となるエンティティによるアクションを説明しようとする。これは複雑なアクティビティセマンティクスとのギャップを埋めることによって、アクションを理解するのに役立つ。 ALTは計算コストを著しく低く保ちながら、競争性能を示す。完全に監督された実験では、キネティクス400で88.1%の精度で4947 GFLOPを達成している。さらに、ALTはゼロショットと少数ショットの両方の実験において従来の最先端の手法よりも優れており、様々な学習シナリオにおける優れた一般化性を強調している。 Large-scale visual-language pre-trained models have achieved significant success in various video tasks. However, most existing methods follow an "adapt then align" paradigm, which adapts pre-trained image encoders to model video-level representations and utilizes one-hot or text embedding of the action labels for supervision. This paradigm overlooks the challenge of mapping from static images to complicated activity concepts. In this paper, we propose a novel "Align before Adapt" (ALT) paradigm. Prior to adapting to video representation learning, we exploit the entity-to-region alignments for each frame. The alignments are fulfilled by matching the region-aware image embeddings to an offline-constructed text corpus. With the aligned entities, we feed their text embeddings to a transformer-based video adapter as the queries, which can help extract the semantics of the most important entities from a video to a vector. 
This paradigm reuses the visual-language alignment of VLP during adaptation and tries to explain an action by the underlying entities. This helps understand actions by bridging the gap with complex activity semantics, particularly when facing unfamiliar or unseen categories. ALT demonstrates competitive performance while maintaining remarkably low computational costs. In fully supervised experiments, it achieves 88.1% top-1 accuracy on Kinetics-400 with only 4947 GFLOPs. Moreover, ALT outperforms the previous state-of-the-art methods in both zero-shot and few-shot experiments, emphasizing its superior generalizability across various learning scenarios.	翻訳日:2024-03-21 00:01:19 公開日:2024-03-19
# 述語拡散:テキスト・画像拡散モデルのための述語論理に基づく注意誘導 Predicated Diffusion: Predicate Logic-Based Attention Guidance for Text-to-Image Diffusion Models ( http://arxiv.org/abs/2311.16117v2 ) ライセンス: Link先を確認	Kota Sueyoshi, Takashi Matsubara,	(参考訳) 拡散モデルは高品質で多彩で創造的な画像を生成することに顕著な成果を上げている。しかし、テキストベースの画像生成に関しては、しばしばテキストに示される意図された意味を捉えることに失敗する。例えば、指定されたオブジェクトは生成されず、不要なオブジェクトは生成され、形容詞は変更を意図していないオブジェクトを変更できる。さらに、オブジェクト間の保持を示す関係がしばしば見過ごされることがわかった。テキストにおけるユーザの意図は様々であるが、既存の手法はこれらのいくつかの側面のみを専門化する傾向にある。本稿では,ユーザの意図を表現する統合フレームワークであるPredicated Diffusionを提案する。上述の問題の根源はテキストエンコーダであり、しばしば個々の単語にのみ焦点をあて、それらの間の論理的関係を無視する。提案手法は,テキストエンコーダにのみ依存するのではなく,テキスト中の意図した意味を述語論理を用いた命題として表現し,ファジィ述語として注目マップ内の画素を扱う。これにより、画像を最小化して命題を満たすような、微分可能な損失関数を得ることができる。いくつかの既存手法と比較すると、人間の評価や事前学習した画像テキストモデルによって検証されたように、述語拡散は様々なテキストプロンプトに忠実な画像を生成することができることを示した。 Diffusion models have achieved remarkable results in generating high-quality, diverse, and creative images. However, when it comes to text-based image generation, they often fail to capture the intended meaning presented in the text. For instance, a specified object may not be generated, an unnecessary object may be generated, and an adjective may alter objects it was not intended to modify. Moreover, we found that relationships indicating possession between objects are often overlooked. While users' intentions in text are diverse, existing methods tend to specialize in only some aspects of these. In this paper, we propose Predicated Diffusion, a unified framework to express users' intentions. We consider that the root of the above issues lies in the text encoder, which often focuses only on individual words and neglects the logical relationships between them. The proposed method does not solely rely on the text encoder, but instead, represents the intended meaning in the text as propositions using predicate logic and treats the pixels in the attention maps as the fuzzy predicates. 
This enables us to obtain a differentiable loss function that makes the image fulfill the proposition by minimizing it. When compared to several existing methods, we demonstrated that Predicated Diffusion can generate images that are more faithful to various text prompts, as verified by human evaluators and pretrained image-text models.	翻訳日:2024-03-21 00:01:19 公開日:2024-03-19
# 時系列異常検出のための周波数領域におけるマルチパタン正規性学習 Learning Multi-Pattern Normalities in the Frequency Domain for Efficient Time Series Anomaly Detection ( http://arxiv.org/abs/2311.16191v2 ) ライセンス: Link先を確認	Feiyi Chen, Yingying zhang, Zhen Qin, Lunting Fan, Renhe Jiang, Yuxuan Liang, Qingsong Wen, Shuiguang Deng,	(参考訳) 異常検出は、クラウドシステムの堅牢性を大幅に向上させる。ニューラルネットワークベースの手法は、最近、強力なアドバンテージを示しているが、クラウド環境では、各サービスのためのユニークなモデルを維持する非現実性と、統一モデルによる多様な正常パターンを扱う能力の制限との矛盾、そして、大量のトラフィックをリアルタイムに処理することの問題点と、短期的な異常検出感度といった、実践的な課題に直面している。そこで本研究では、時系列異常検出のための、周波数領域における複数の正常パターンに対応した効率的な異常検出手法であるMACEを提案する。特徴は3つある。 (i) 統一モデルにより多様な正常パターンを扱うのに優れたパターン抽出機構であって、データサンプル自体にのみ焦点をあてるのではなく、データサンプルとそのサービスの正常パターンとの相関を調べて異常を識別することができる。 (ii) 時間領域における短期異常を増幅し、周波数領域における異常の再構成を阻害する双対的畳み込み機構であって、異常と正常との再構成誤差の差を拡大し、異常検出を容易にする。 (iii) 周波数領域の疎性と並列性を活用してモデル効率を向上させる。理論的および実験的に、フーリエ基底の戦略的に選択された部分集合を使用することで、計算オーバーヘッドを低減できるだけでなく、完全なスペクトルを用いた場合と比較して異常を区別する利益も得られることを証明した。さらに,多種多様な正常パターンを統一モデルで処理し,高い効率で最先端の性能を実現するMACEの有効性を広範な実験で実証した。 Anomaly detection significantly enhances the robustness of cloud systems. While neural network-based methods have recently demonstrated strong advantages, they encounter practical challenges in cloud environments: the contradiction between the impracticality of maintaining a unique model for each service and the limited ability to deal with diverse normal patterns by a unified model, as well as issues with handling heavy traffic in real time and short-term anomaly detection sensitivity. Thus, we propose MACE, a multi-normal-pattern accommodated and efficient anomaly detection method in the frequency domain for time series anomaly detection.
There are three novel characteristics of it: (i) a pattern extraction mechanism excelling at handling diverse normal patterns with a unified model, which enables the model to identify anomalies by examining the correlation between the data sample and its service normal pattern, instead of solely focusing on the data sample itself; (ii) a dualistic convolution mechanism that amplifies short-term anomalies in the time domain and hinders the reconstruction of anomalies in the frequency domain, which enlarges the reconstruction error disparity between anomaly and normality and facilitates anomaly detection; (iii) leveraging the sparsity and parallelism of frequency domain to enhance model efficiency. We theoretically and experimentally prove that using a strategically selected subset of Fourier bases can not only reduce computational overhead but is also profitable to distinguish anomalies, compared to using the complete spectrum. Moreover, extensive experiments demonstrate MACE's effectiveness in handling diverse normal patterns with a unified model and it achieves state-of-the-art performance with high efficiency.	翻訳日:2024-03-21 00:01:19 公開日:2024-03-19
# 共ドープ結晶膜における希土類エミッタのスペクトル多重化 Spectral Multiplexing of Rare-earth Emitters in a Co-doped Crystalline Membrane ( http://arxiv.org/abs/2311.16875v2 ) ライセンス: Link先を確認	Alexander Ulanowski, Johannes Früh, Fabian Salamon, Adrian Holzäpfel, Andreas Reiserer,	(参考訳) 光共振器における多数のレアアースドパントのスペクトルアドレス化は、分散量子情報プロセッサの実現に大きな可能性をもたらす。この目的のためには、ミクロンスケールデバイスにおけるエミッタのスペクトル特性を理解し制御する必要がある。ここで、エルビウムエミッタは、ユーロピウムと共ドープした結晶性イットリウムオルソシリケート10ミクロンの膜を含むファブリペロ共振器で研究される。共ドーピングにより、エミッタ周波数の不均一分布を調整でき、Purcell因子が35を超える360量子ビット以上の高忠実スペクトル多重化が可能となる。同時に、光学的コヒーレンスは、動的疎結合の下で0.62(3)msまで保存される。デカップリングなしでは、コヒーレンスは110倍の寿命短縮(0.104(9)msまで減少する最強のパーセルエンハンスメントを持つエミッターの寿命限界に達する。将来の研究は、これを長寿命の核スピン記憶と組み合わせることで、研究された共ドープ膜は量子リピータと分散量子コンピュータのための有望なプラットフォームとなる。 The spectral addressing of many individual rare-earth dopants in optical resonators offers great potential for realizing distributed quantum information processors. To this end, it is required to understand and control the spectral properties of the emitters in micron-scale devices. Here, erbium emitters are investigated in a Fabry-Perot resonator which contains a ten-micrometer-thin membrane of crystalline yttrium orthosilicate that is co-doped with europium. The co-doping allows for tailoring the inhomogeneous distribution of the emitter frequency, which enables high-fidelity spectral multiplexing of more than 360 qubits with Purcell factors exceeding 35. At the same time, the optical coherence is preserved up to 0.62(3) ms under dynamical decoupling. Without decoupling, the coherence still reaches the lifetime limit for the emitters with the strongest Purcell enhancement that leads up to a 110-fold lifetime reduction, down to 0.104(9) ms. Future work may combine this with long-lived nuclear spin memories, which makes the investigated co-doped membranes a promising platform for quantum repeaters and distributed quantum computers.	翻訳日:2024-03-21 00:01:19 公開日:2024-03-19
# WoVoGen: 制御可能なマルチカメラ運転シーン生成のための世界ボリューム対応拡散 WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation ( http://arxiv.org/abs/2312.02934v3 ) ライセンス: Link先を確認	Jiachen Lu, Ze Huang, Zeyu Yang, Jiahui Zhang, Li Zhang,	(参考訳) マルチカメラストリートビュービデオの生成は、広範囲で多様なデータに対する緊急の要求に対処するため、自動運転データセットの拡充に不可欠である。多様性の限界と照明条件の扱いの難しさから、従来のレンダリングベースの手法は拡散ベースの手法に取って代わられつつある。しかし、拡散ベースの手法における重要な課題は、生成したセンサデータが世界内部の一貫性とセンサ間のコヒーレンスの両方を維持することを保証することである。これらの課題に対処するため,明示的なワールドボリュームを追加で組み込み,WoVoGen(World Volume-aware Multi-camera Driving Scene Generator)を提案する。このシステムは、4Dワールドボリュームをビデオ生成の基礎要素として活用するように設計されている。私たちのモデルは2つの異なるフェーズで動作します。 (i) 車両制御シーケンスに基づいて将来の4D時間的ワールドボリュームを予測し、 (ii) この予測された4D時間的ワールドボリュームとセンサ間の相互接続性に基づいてマルチカメラ映像を生成する。 4Dワールドボリュームの導入により、WoVoGenは車両制御入力に応じて高品質なストリートビュービデオを生成するだけでなく、シーン編集作業を容易にすることができる。 Generating multi-camera street-view videos is critical for augmenting autonomous driving datasets, addressing the urgent demand for extensive and varied data. Due to the limitations in diversity and challenges in handling lighting conditions, traditional rendering-based methods are increasingly being supplanted by diffusion-based methods. However, a significant challenge in diffusion-based methods is ensuring that the generated sensor data preserve both intra-world consistency and inter-sensor coherence. To address these challenges, we combine an additional explicit world volume and propose the World Volume-aware Multi-camera Driving Scene Generator (WoVoGen). This system is specifically designed to leverage 4D world volume as a foundational element for video generation. Our model operates in two distinct phases: (i) envisioning the future 4D temporal world volume based on vehicle control sequences, and (ii) generating multi-camera videos, informed by this envisioned 4D temporal world volume and sensor interconnectivity.
The incorporation of the 4D world volume empowers WoVoGen not only to generate high-quality street-view videos in response to vehicle control inputs but also to facilitate scene editing tasks.	翻訳日:2024-03-21 00:01:19 公開日:2024-03-19
# UFineBench:超微細粒度のテキストベース人物検索を目指して UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity ( http://arxiv.org/abs/2312.03441v3 ) ライセンス: Link先を確認	Jialong Zuo, Hanyu Zhou, Ying Nie, Feng Zhang, Tianyu Guo, Nong Sang, Yunhe Wang, Changxin Gao,	(参考訳) 既存のテキストベースの人物検索データセットは、しばしば比較的粗い粒度のテキストアノテーションを持つ。これにより、実際のシナリオにおけるクエリテキストのきめ細かいセマンティクスをモデルが理解することが妨げられる。この問題に対処するため,超微細粒度テキストに基づく人物検索のための新しいベンチマーク \textbf{UFineBench} を提案する。まず、UFine6926という新しい \textbf{dataset} を構築する。我々は、多数の人物画像を収集し、各画像に2つの詳細なテキスト記述を手動で注釈付けした。記述は平均80.8語である。平均単語数は、従来のデータセットの3倍から4倍である。ドメイン内での標準的な評価に加えて、実際のシナリオをより多く表現する特別な \textbf{evaluation paradigm} も提案する。これには、クロスドメイン、クロステキスト粒度、クロステキストスタイルを備えた新しい評価セットUFine3Cと、検索能力を正確に測定するための新しい評価指標である平均類似度分布(mSD)が含まれる。さらに,超きめ細かなテキストを用いたテキストベースの人物検索のために設計した,より効率的な \textbf{algorithm} であるCFAMを提案する。共有クロスモーダル粒度デコーダとハードネガティブマッチング機構を採用することにより、微細な粒度マイニングを実現する。標準のドメイン内評価により、CFAMは様々なデータセット、特に超微細なUFine6926上で競争力のある性能を確立する。さらに、UFine3Cで評価することにより、UFine6926でのトレーニングが、他の粗粒度データセットと比較して、実際のシナリオへの一般化を著しく改善することを示した。データセットとコードは、 \url{https://github.com/Zplusdragon/UFineBench}で公開される。 Existing text-based person retrieval datasets often have relatively coarse-grained text annotations. This hinders models from comprehending the fine-grained semantics of query texts in real scenarios. To address this problem, we contribute a new benchmark named \textbf{UFineBench} for text-based person retrieval with ultra-fine granularity. Firstly, we construct a new \textbf{dataset} named UFine6926. We collect a large number of person images and manually annotate each image with two detailed textual descriptions, averaging 80.8 words each. The average word count is three to four times that of the previous datasets. In addition to standard in-domain evaluation, we also propose a special \textbf{evaluation paradigm} more representative of real scenarios.
It contains a new evaluation set with cross domains, cross textual granularity and cross textual styles, named UFine3C, and a new evaluation metric for accurately measuring retrieval ability, named mean Similarity Distribution (mSD). Moreover, we propose CFAM, a more efficient \textbf{algorithm} especially designed for text-based person retrieval with ultra fine-grained texts. It achieves fine granularity mining by adopting a shared cross-modal granularity decoder and hard negative match mechanism. With standard in-domain evaluation, CFAM establishes competitive performance across various datasets, especially on our ultra fine-grained UFine6926. Furthermore, by evaluating on UFine3C, we demonstrate that training on our UFine6926 significantly improves generalization to real scenarios compared with other coarse-grained datasets. The dataset and code will be made publicly available at \url{https://github.com/Zplusdragon/UFineBench}.	翻訳日:2024-03-21 00:01:19 公開日:2024-03-19
# 蒸留データセットの多様性と現実性:効率的なデータセット蒸留パラダイム On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm ( http://arxiv.org/abs/2312.03526v2 ) ライセンス: Link先を確認	Peng Sun, Bei Shi, Daiwei Yu, Tao Lin,	(参考訳) 現代の機械学習では、大量のデータセット上で大規模なニューラルネットワークをトレーニングする必要があるため、高い計算要求の課題に直面している。データセットの蒸留は、最近の新しい戦略として、効率的なトレーニングのために現実世界のデータセットを圧縮することを目的としている。しかし、この一連の研究は現在、大規模で高解像度なデータセットに苦戦しており、その実用性と実現性を妨げている。この目的のために、既存のデータセット蒸留法を再検討し、大規模な実世界の応用に必要な3つの特性、すなわち、リアリズム、多様性、効率を同定する。その解決策として,蒸留データの多様性と現実性の両方を実現する,計算効率が高く効果的な新しいデータ蒸留パラダイムRDEDを提案する。様々なニューラルアーキテクチャとデータセットに対する広範な実験結果は、RDEDの優位性を示している。ImageNet-1K全体を7分以内にクラスあたり10画像の小さなデータセットに蒸留でき、単一のRTX-4090 GPU上でResNet-18により42%のトップ1精度を達成する(SOTAは21%にとどまり、6時間を要する)。 Contemporary machine learning requires training large neural networks on massive datasets and thus faces the challenges of high computational demands. Dataset distillation, as a recent emerging strategy, aims to compress real-world datasets for efficient training. However, this line of research currently struggles with large-scale and high-resolution datasets, hindering its practicality and feasibility. To this end, we re-examine the existing dataset distillation methods and identify three properties required for large-scale real-world applications, namely, realism, diversity, and efficiency. As a remedy, we propose RDED, a novel computationally-efficient yet effective data distillation paradigm, to enable both diversity and realism of the distilled data. Extensive empirical results over various neural architectures and datasets demonstrate the advancement of RDED: we can distill the full ImageNet-1K to a small dataset comprising 10 images per class within 7 minutes, achieving a notable 42% top-1 accuracy with ResNet-18 on a single RTX-4090 GPU (while the SOTA only achieves 21% but requires 6 hours).	翻訳日:2024-03-21 00:01:19 公開日:2024-03-19
# GeoShapley: 機械学習モデルにおける空間効果測定のためのゲーム理論アプローチ GeoShapley: A Game Theory Approach to Measuring Spatial Effects in Machine Learning Models ( http://arxiv.org/abs/2312.03675v2 ) ライセンス: Link先を確認	Ziqi Li,	(参考訳) 本稿では,機械学習モデルにおける空間効果を測定するゲーム理論であるGeoShapleyを紹介する。 GeoShapleyは、モデル予測ゲームにおいてプレイヤーとしての位置を概念化し、モデルにおける位置の重要性と位置と他の特徴の間の相乗効果の定量化を可能にすることにより、ゲーム理論におけるノーベル賞受賞者のシャプリー値フレームワークを拡張している。 GeoShapleyはモデルに依存しないアプローチであり、様々な構造の統計モデルやブラックボックス機械学習モデルに適用することができる。 GeoShapleyの解釈は、空間効果を説明するための空間変化係数モデルと非空間効果を説明するための付加モデルと直接リンクしている。シミュレーションデータを用いて、GeoShapleyの値は既知のデータ生成プロセスに対して検証され、7つの統計モデルと機械学習モデルの相互比較に使用される。住宅価格モデリングの実証的な例は、GeoShapleyの実用性と解釈を実世界のデータで説明するために用いられる。このメソッドはgeoshapleyというオープンソースのPythonパッケージとして利用できる。 This paper introduces GeoShapley, a game theory approach to measuring spatial effects in machine learning models. GeoShapley extends the Nobel Prize-winning Shapley value framework in game theory by conceptualizing location as a player in a model prediction game, which enables the quantification of the importance of location and the synergies between location and other features in a model. GeoShapley is a model-agnostic approach and can be applied to statistical or black-box machine learning models in various structures. The interpretation of GeoShapley is directly linked with spatially varying coefficient models for explaining spatial effects and additive models for explaining non-spatial effects. Using simulated data, GeoShapley values are validated against known data-generating processes and are used for cross-comparison of seven statistical and machine learning models. An empirical example of house price modeling is used to illustrate GeoShapley's utility and interpretation with real world data. The method is available as an open-source Python package named geoshapley.	翻訳日:2024-03-21 00:01:19 公開日:2024-03-19
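The GeoShapley entry above describes location as a player in a model prediction game. As a rough illustration of that idea only, the following sketch computes exact Shapley values for a made-up three-player game in which "location" is one player and has a synergy with "size". It is plain Shapley-value arithmetic, not the `geoshapley` package's API; the function names, player names, and payoff numbers are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a cooperative game.

    players: list of hashable player names.
    value:   function mapping a frozenset of players to a payoff.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        # Average the marginal contribution of p over all coalitions of the others.
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_value(coalition):
    """Hypothetical payoff: additive effects plus a location-size synergy."""
    v = 0.0
    if "location" in coalition:
        v += 10.0
    if "size" in coalition:
        v += 4.0
    if "age" in coalition:
        v += 2.0
    if "location" in coalition and "size" in coalition:
        v += 6.0  # synergy, split equally between the two players by Shapley
    return v


players = ["location", "size", "age"]
phi = shapley_values(players, toy_value)
print({p: round(v, 6) for p, v in phi.items()})
```

In this toy game the synergy of 6 is shared equally, so location is credited 13, size 7, and age 2, and the three values sum to the grand-coalition payoff of 22 (the efficiency property of Shapley values).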
# OctreeOcc:Octreeクエリを用いた効率的なマルチグラニュラリティ占有予測 OctreeOcc: Efficient and Multi-Granularity Occupancy Prediction Using Octree Queries ( http://arxiv.org/abs/2312.03774v3 ) ライセンス: Link先を確認	Yuhang Lu, Xinge Zhu, Tai Wang, Yuexin Ma,	(参考訳) 近年,3Dシーンのきめ細かい理解のために,占有予測が注目を集めている。伝統的なアプローチは一般に密度の高い正規の格子表現に依存しており、しばしば過剰な計算要求と小さな物体の空間的詳細の損失を招く。本稿では,OctreeOccについて紹介する。OctreeOccはオクツリー表現を利用して3Dで貴重な情報を適応的にキャプチャし,オブジェクトの形状や大きさ・複雑度の異なる意味領域に対応できる可変の粒度を提供する,革新的な3D占有予測フレームワークである。特に,画像意味情報を組み込んで初期オクツリー構造の精度を向上させるとともに,オクツリー構造を反復的に洗練するための効果的な修正機構を設計する。以上の結果から,OctreeOccは占有予測において最先端の手法を上回るだけでなく,高密度グリッド法に比べて計算オーバーヘッドを15%-24%削減できることがわかった。 Occupancy prediction has increasingly garnered attention in recent years for its fine-grained understanding of 3D scenes. Traditional approaches typically rely on dense, regular grid representations, which often leads to excessive computational demands and a loss of spatial details for small objects. This paper introduces OctreeOcc, an innovative 3D occupancy prediction framework that leverages the octree representation to adaptively capture valuable information in 3D, offering variable granularity to accommodate object shapes and semantic regions of varying sizes and complexities. In particular, we incorporate image semantic information to improve the accuracy of initial octree structures and design an effective rectification mechanism to refine the octree structure iteratively. Our extensive evaluations show that OctreeOcc not only surpasses state-of-the-art methods in occupancy prediction, but also achieves a 15%-24% reduction in computational overhead compared to dense-grid-based methods.	翻訳日:2024-03-21 00:01:19 公開日:2024-03-19
# イベントカメラを用いた低消費電力連続リモート行動定位 Low-power, Continuous Remote Behavioral Localization with Event Cameras ( http://arxiv.org/abs/2312.03799v2 ) ライセンス: Link先を確認	Friedhelm Hamann, Suman Ghosh, Ignacio Juarez Martinez, Tom Hart, Alex Kacelnik, Guillermo Gallego,	(参考訳) 自然科学の研究者は動物行動の定量化に信頼できる方法を必要としている。近年,プロセスを自動化するために多数のコンピュータビジョン手法が出現している。しかし、遠隔地で野生種を観察することは、困難な照明条件と電力供給とデータストレージの制約のため、依然として困難な課題である。イベントカメラは、低消費電力と高ダイナミックレンジ能力のために、バッテリ依存の遠隔監視にユニークな利点を提供する。我々はこの新しいセンサーを用いて、エコスタティックディスプレイと呼ばれるチンストラップペンギンの挙動を定量化する。時間的行動検出タスクとして問題を定式化し,行動開始時刻と終了時刻を決定する。この目的で,南極のペンギンの繁殖コロニーを数週間にわたって記録し,16羽の巣のイベントデータをラベル付けした。提案手法は,候補時間間隔(韻律)の生成器と,その中の動作の分類器から構成される。実験により、イベントカメラの動作に対する自然な反応は、連続的な行動監視と検出に有効であることが示され、平均的な平均精度(mAP)は58%に達した(良質な気象条件では63%まで上昇する)。また,難易度データセットに含まれる各種照明条件に対するロバスト性を実証した。イベントカメラの低消費電力化により、従来のカメラよりもはるかに長時間記録できる。この研究は、リモート野生生物観察のためのイベントカメラの使用の先駆者となり、学際的な新たな機会を開拓した。 https://tub-rip.github.io/eventpenguins/ Researchers in natural science need reliable methods for quantifying animal behavior. Recently, numerous computer vision methods emerged to automate the process. However, observing wild species at remote locations remains a challenging task due to difficult lighting conditions and constraints on power supply and data storage. Event cameras offer unique advantages for battery-dependent remote monitoring due to their low power consumption and high dynamic range capabilities. We use this novel sensor to quantify a behavior in Chinstrap penguins called ecstatic display. We formulate the problem as a temporal action detection task, determining the start and end times of the behavior. For this purpose, we recorded a colony of breeding penguins in Antarctica for several weeks and labeled event data on 16 nests. The developed method consists of a generator of candidate time intervals (proposals) and a classifier of the actions within them. 
The experiments show that the event cameras' natural response to motion is effective for continuous behavior monitoring and detection, reaching a mean average precision (mAP) of 58% (which increases to 63% in good weather conditions). The results also demonstrate the robustness against various lighting conditions contained in the challenging dataset. The low-power capabilities of the event camera allow it to record for significantly longer than a conventional camera. This work pioneers the use of event cameras for remote wildlife observation, opening new interdisciplinary opportunities. https://tub-rip.github.io/eventpenguins/	翻訳日:2024-03-21 00:01:19 公開日:2024-03-19
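The abstract above frames the task as temporal action detection, where predicted (start, end) intervals are compared with labeled ones. A minimal sketch of the overlap measure commonly used for this, temporal intersection-over-union, is given below. The function names and the 0.5 matching threshold are assumptions made for illustration; the abstract does not state which thresholds enter the reported mAP.

```python
def temporal_iou(pred, truth):
    """Intersection over union of two (start, end) time intervals."""
    inter = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, truths, threshold=0.5):
    """A proposal counts as a hit if it overlaps any labeled interval enough.

    The threshold value is an assumption, not taken from the paper.
    """
    return any(temporal_iou(pred, t) >= threshold for t in truths)


labeled = [(10.0, 20.0), (40.0, 55.0)]      # hypothetical labeled behavior intervals
print(temporal_iou((12.0, 22.0), labeled[0]))
print(is_true_positive((12.0, 22.0), labeled))
print(is_true_positive((25.0, 30.0), labeled))
```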
# セマンティック・アウェア拡散モデルによる層状3次元人体生成 Layered 3D Human Generation via Semantic-Aware Diffusion Model ( http://arxiv.org/abs/2312.05804v2 ) ライセンス: Link先を確認	Yi Wang, Jian Ma, Ruizhi Shao, Qiao Feng, Yu-Kun Lai, Yebin Liu, Kun Li,	(参考訳) 近年,3D衣服のヒトの誕生が注目されている。しかし、既存の作業は、一貫した身体構造を持つ階層化された高品質な3D人間を生成できない。その結果、人体や衣服を任意に、別々に変更・編集することができない。本稿では, 物理的に分離された意味認識拡散モデルに基づくテキスト駆動型3次元ヒューマン生成フレームワークを提案する。生成した衣服を対象のテキストと整合性を保つため,モデルが生成する非着装コンテンツを排除可能な衣服のセマンティック・信頼戦略を提案する。そこで本研究では,衣服の自由移動と再利用を可能にするSMPL方式の暗黙的フィールド変形ネットワークを提案する。さらに,身体と衣服のSMPLモデルに基づく均一な形状の先行モデルを導入し,個々のテンプレートに拘束されずに,より多様な3Dコンテンツを生成する。実験結果から,本手法は立体構造が一貫した3次元人体を生成できるだけでなく,自由な編集もできることがわかった。ソースコードは公開されます。 The generation of 3D clothed humans has attracted increasing attention in recent years. However, existing work cannot generate layered high-quality 3D humans with consistent body structures. As a result, these methods are unable to arbitrarily and separately change and edit the body and clothing of the human. In this paper, we propose a text-driven layered 3D human generation framework based on a novel physically-decoupled semantic-aware diffusion model. To keep the generated clothing consistent with the target text, we propose a semantic-confidence strategy for clothing that can eliminate the non-clothing content generated by the model. To match the clothing with different body shapes, we propose a SMPL-driven implicit field deformation network that enables the free transfer and reuse of clothing. Besides, we introduce uniform shape priors based on the SMPL model for body and clothing, respectively, which generates more diverse 3D content without being constrained by specific templates. The experimental results demonstrate that the proposed method not only generates 3D humans with consistent body structures but also allows free editing in a layered manner. The source code will be made public.	翻訳日:2024-03-20 23:51:29 公開日:2024-03-19
# Genixer: 強力なデータジェネレータとしてのマルチモーダル大言語モデル Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator ( http://arxiv.org/abs/2312.06731v2 ) ライセンス: Link先を確認	Henry Hengyuan Zhao, Pan Zhou, Mike Zheng Shou,	(参考訳) インストラクションチューニングデータは、MLLM(Multimodal Large Language Models)のトレーニングに不可欠である。しかし、高品質なチューニングチューニングデータの作成には大きな課題がある。データ生成のための GPT-4 に依存する以前の手法はコストがかかるだけでなく、複雑なタスク(グラウンドベース推論タスク)において満足な性能が欠如していた。これらの課題に対処するため、我々は、9つの代表タスク、例えば、Common VQA、REC、REG、PointQを含む、様々な高品質な命令チューニングデータを生成する革新的なデータ生成パイプラインGenixerを開発した。具体的には、Genixerは4つの重要なステップで統一されたソリューションを提供し、データ生成の難しさを軽減する。 (i)命令データ収集 (ii) 命令テンプレートの設計三 MLLMの強化、及び (iv)データ生成とフィルタリング。続いて、我々のGenixerの優れた定性的結果から、現在のMLLMは強力なデータジェネレータに進化する可能性が強いことが示される。さらに、生成したデータの有効性を定量的に検証するために、2つの代表MLLMのトレーニングにGenixerが生成した命令チューニングデータを追加し、様々なVQAタスクとマルチモーダルベンチマークにおける一貫した改善を観察する。 Instruction tuning data is essential for training the Multimodal Large Language Models (MLLMs). However, the creation of high-quality instruction tuning data presents significant challenges. Prior methods that depended on GPT-4 for data generation were not only costly but also lacked satisfactory performance in complex tasks (i.e., grounding-based reasoning tasks). To address these issues, we developed an innovative data generation pipeline, Genixer, to generate various high-quality instruction tuning data, including nine representative tasks, e.g., Common VQA, REC, REG, and PointQ. Specifically, Genixer provides a unified solution with four key steps for alleviating the difficulty of data generation: (i) instruction data collection, (ii) instruction template design, (iii) empowering MLLM, and (iv) data generation and filtering. Subsequently, the superior qualitative results of our Genixer demonstrate that current MLLMs have a strong potential to evolve into powerful data generators. 
Additionally, to validate the efficacy of the generated data quantitatively, we add the instruction tuning data produced by Genixer into the training of two representative MLLMs and observe consistent improvements on various VQA tasks and multimodal benchmarks.	翻訳日:2024-03-20 23:51:29 公開日:2024-03-19
# OCTDL:画像に基づく深層学習のための光コヒーレンストモグラフィデータセット OCTDL: Optical Coherence Tomography Dataset for Image-Based Deep Learning Methods ( http://arxiv.org/abs/2312.08255v2 ) ライセンス: Link先を確認	Mikhail Kulyabin, Aleksei Zhdanov, Anastasia Nikiforova, Andrey Stepichev, Anna Kuznetsova, Mikhail Ronkin, Vasilii Borisov, Alexander Bogachev, Sergey Korotkich, Paul A Constable, Andreas Maier,	(参考訳) 光コヒーレンス断層撮影(OCT)は、眼科領域に広く応用された非侵襲的画像診断技術である。 OCTは網膜層の可視化を可能にし、網膜疾患の早期発見とモニタリングにおいて重要な役割を果たす。 OCTは光波干渉の原理を用いて網膜の微細構造の詳細な画像を作成する。本研究は,2000枚以上の OCT 画像からなるオープンアクセス型 OCT データセット (OCTDL) を提案する。このデータセットは、加齢関連黄斑変性症(AMD)、糖尿病黄斑浮腫(DME)、網膜膜(ERM)、網膜動脈閉塞症(RAO)、網膜静脈閉塞症(RVO)、およびVID患者のOCT記録からなる。これらの画像は、動的スキャン長と画像解像度を持つラスタ走査プロトコルを用いて、Optovue Avanti RTVue XRで取得された。各網膜b-スキャンは、胎児に集中して取得され、経験豊富な網膜専門家によって解釈され、カタログ化された。本研究では,新しいオープンアクセスデータセットにディープラーニングの分類手法を適用した。 Optical coherence tomography (OCT) is a non-invasive imaging technique with extensive clinical applications in ophthalmology. OCT enables the visualization of the retinal layers, playing a vital role in the early detection and monitoring of retinal diseases. OCT uses the principle of light wave interference to create detailed images of the retinal microstructures, making it a valuable tool for diagnosing ocular conditions. This work presents an open-access OCT dataset (OCTDL) comprising over 2000 OCT images labeled according to disease group and retinal pathology. The dataset consists of OCT records of patients with Age-related Macular Degeneration (AMD), Diabetic Macular Edema (DME), Epiretinal Membrane (ERM), Retinal Artery Occlusion (RAO), Retinal Vein Occlusion (RVO), and Vitreomacular Interface Disease (VID). The images were acquired with an Optovue Avanti RTVue XR using raster scanning protocols with dynamic scan length and image resolution. Each retinal b-scan was acquired by centering on the fovea and interpreted and cataloged by an experienced retinal specialist. 
In this work, we applied Deep Learning classification techniques to this new open-access dataset.	翻訳日:2024-03-20 23:51:29 公開日:2024-03-19
# MaxK-GNN: グラフニューラルネットワークトレーニングの高速化のための超高速GPUカーネル設計 MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training ( http://arxiv.org/abs/2312.08656v5 ) ライセンス: Link先を確認	Hongwu Peng, Xi Xie, Kaustubh Shivdikar, MD Amit Hasan, Jiahui Zhao, Shaoyi Huang, Omer Khan, David Kaeli, Caiwen Ding,	(参考訳) ディープニューラルネットワークトレーニングの加速において、GPUは主流のプラットフォームになった。 GPUは、ワークロードの不均衡やメモリアクセスの不規則など、GNNに重大な課題に直面し、未使用のハードウェアに繋がる。 PyG、cuSPARSEを使ったDGL、GNNAdvisorフレームワークといった既存のソリューションは、これらの課題に部分的に対処するが、メモリトラフィックは依然として重要である。我々は、高速化最適化を「後考」として扱うのではなく、アルゴリズムとシステム革新の垂直最適化によってのみ、劇的な性能改善が達成できると主張している。 (i)GNNアルゴリズムを与えられたり、加速器を設計したり、 (II) ハードウェアが与えられた場合、主にGNNアルゴリズムを最適化する。本稿では,アルゴリズムとシステム革新を統合した高性能GPUトレーニングシステムMaxK-GNNを提案する。 (i)MaxK非線形性を導入し、MaxK非線形性を普遍近似として理論的解析し、非線形性後の特徴行列のデータとインデックスを保存するために設計されたCompressed Balanced Sparse Row(CBSR)フォーマットを示す。 (II)入力特徴量取得と共有メモリにおけるスパース出力蓄積バッファの戦略的配置にCBSRを用いた行ワイズ製品ベースSpGEMMカーネルを用いたコーデッシング強化フォワード計算を設計する。 (iii)外部製品ベースおよびSSpMMカーネルを用いた最適化された後方計算を開発する。我々は、MaxK-GNNの広範囲な評価を行い、エンドツーエンドのシステム実行状況を報告する。実験により、マックスK-GNNシステムは、アムダールの法則に従って理論的なスピードアップ限界に接近できることが示された。我々はSOTA GNNに匹敵する精度を達成したが、DGLやGNNAdvisorの実装と比較して、Redditの3.22/4.24倍のスピードアップ(理論上の制限は5.52/7.27倍)を実現した。 In the acceleration of deep neural network training, the GPU has become the mainstream platform. GPUs face substantial challenges on GNNs, such as workload imbalance and memory access irregularities, leading to underutilized hardware. Existing solutions such as PyG, DGL with cuSPARSE, and GNNAdvisor frameworks partially address these challenges but memory traffic is still significant. We argue that drastic performance improvements can only be achieved by the vertical optimization of algorithm and system innovations, rather than treating the speedup optimization as an "after-thought" (i.e., (i) given a GNN algorithm, designing an accelerator, or (ii) given hardware, mainly optimizing the GNN algorithm). 
In this paper, we present MaxK-GNN, an advanced high-performance GPU training system integrating algorithm and system innovation. (i) We introduce the MaxK nonlinearity and provide a theoretical analysis of MaxK nonlinearity as a universal approximator, and present the Compressed Balanced Sparse Row (CBSR) format, designed to store the data and index of the feature matrix after nonlinearity; (ii) We design a coalescing enhanced forward computation with row-wise product-based SpGEMM Kernel using CBSR for input feature matrix fetching and strategic placement of a sparse output accumulation buffer in shared memory; (iii) We develop an optimized backward computation with outer product-based and SSpMM Kernel. We conduct extensive evaluations of MaxK-GNN and report the end-to-end system run-time. Experiments show that the MaxK-GNN system can approach the theoretical speedup limit according to Amdahl's law. We achieve comparable accuracy to SOTA GNNs, but at a significantly increased speed: 3.22/4.24 times speedup (vs. theoretical limits, 5.52/7.27 times) on Reddit compared to DGL and GNNAdvisor implementations.	翻訳日:2024-03-20 23:51:29 公開日:2024-03-19
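To make two ideas from the MaxK-GNN abstract concrete, here is a small sketch under stated assumptions. `maxk` reads the MaxK nonlinearity as "keep the k largest entries of each feature row and zero the rest", which is an interpretation of the abstract rather than the authors' kernel, and `amdahl_speedup` is the textbook form of Amdahl's law. Both function names are hypothetical, and none of this reflects the CBSR format or the GPU implementation.

```python
def maxk(row, k):
    """Keep the k largest entries of a feature row and set the others to zero."""
    if k >= len(row):
        return list(row)
    # Indices of the k largest values; ties are broken by original position.
    order = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
    keep = set(order[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(row)]


def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the runtime is accelerated by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


sparse_row = maxk([0.2, -1.0, 3.0, 0.5], k=2)
print(sparse_row)

# Even with an arbitrarily fast kernel, the unaccelerated part bounds the gain:
print(amdahl_speedup(0.9, 10.0))
print(amdahl_speedup(0.9, 1e12))  # approaches 1 / (1 - 0.9) = 10
```

The second function shows why a measured end-to-end speedup can sit well below a kernel-level speedup: only the accelerated fraction of the run benefits.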
# OMG:コントローラの混合によるオープン語彙運動生成を目指して OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers ( http://arxiv.org/abs/2312.08985v3 ) ライセンス: Link先を確認	Han Liang, Jiacheng Bao, Ruichi Zhang, Sihan Ren, Yuecheng Xu, Sibei Yang, Xin Chen, Jingyi Yu, Lan Xu,	(参考訳) 最近、現実的なテキスト・ツー・モーション・ジェネレーションが大幅に進歩しました。しかし、既存の手法は、目に見えないテキスト入力で、しばしば失敗または不可解な動作を生成し、アプリケーションを制限する。本稿では、ゼロショットオープン語彙テキストプロンプトから魅力的な動き生成を可能にする新しいフレームワークOMGを提案する。私たちのキーとなるアイデアは、プレトレイン-then-finetuneパラダイムをテキスト・トゥ・モーション・ジェネレーションに慎重に調整することです。事前学習の段階では、ドメイン外固有のリッチな動作特性を学習することで、生成能力を向上させる。この目的のために, 大規模非条件拡散モデルを最大1Bパラメータにスケールアップし, 最大2000万の動作インスタンスに対して, 大規模無ラベル動作データを利用する。その後の微調整段階では、事前訓練されたモデルと提案したMixture-of-Controllers(MoC)ブロックのトレーニング可能なコピーを通じて、テキストプロンプトを条件情報として組み込んだモーションコントロールネットを導入する。 MoCブロックは、クロスアテンション機構でサブモーションの様々な範囲を適応的に認識し、テキストトークンの専門家と個別に処理する。このような設計は、テキストプロンプトのCLIPトークンの埋め込みを、様々なコンパクトかつ表現力のあるモーション特徴に効果的に整合させる。広汎な実験により、OMGはゼロショットテキスト・モーション生成における最先端手法よりも大幅に改善されていることが示された。プロジェクトページ: https://tr3e.github.io/omg-page We have recently seen tremendous progress in realistic text-to-motion generation. Yet, the existing methods often fail or produce implausible motions with unseen text inputs, which limits the applications. In this paper, we present OMG, a novel framework, which enables compelling motion generation from zero-shot open-vocabulary text prompts. Our key idea is to carefully tailor the pretrain-then-finetune paradigm into the text-to-motion generation. At the pre-training stage, our model improves the generation ability by learning the rich out-of-domain inherent motion traits. To this end, we scale up a large unconditional diffusion model up to 1B parameters, so as to utilize the massive unlabeled motion data up to over 20M motion instances. At the subsequent fine-tuning stage, we introduce motion ControlNet, which incorporates text prompts as conditioning information, through a trainable copy of the pre-trained model and the proposed novel Mixture-of-Controllers (MoC) block. 
The MoC block adaptively recognizes various ranges of the sub-motions with a cross-attention mechanism and processes them separately with text-token-specific experts. Such a design effectively aligns the CLIP token embeddings of text prompts to various ranges of compact and expressive motion features. Extensive experiments demonstrate that OMG achieves significant improvements over the state-of-the-art methods on zero-shot text-to-motion generation. Project page: https://tr3e.github.io/omg-page.	翻訳日:2024-03-20 23:51:29 公開日:2024-03-19
# Hypergraph-MLP: メッセージパッシングなしでハイパーグラフを学習する Hypergraph-MLP: Learning on Hypergraphs without Message Passing ( http://arxiv.org/abs/2312.09778v2 ) ライセンス: Link先を確認	Bohan Tang, Siheng Chen, Xiaowen Dong,	(参考訳) ハイパーグラフは、2つ以上のエンティティを含む高次関係を持つデータモデリングにおいて不可欠である。多くのハイパーグラフニューラルネットワークは、ハイパーグラフ構造上のメッセージパッシングを利用して、ノード表現学習を強化し、ハイパーグラフノード分類のようなタスクにおいて印象的なパフォーマンスをもたらす。しかし、これらのメッセージパッシングベースのモデルは、オーバースムーシングや高レイテンシ、推論時の構造摂動に対する感度など、いくつかの課題に直面している。これらの課題に対処するために,ハイパーグラフ構造に関する情報を明示的なメッセージパッシングを伴わずにトレーニング指導に統合する手法を提案する。具体的には,ハイパーグラフ上の信号スムースネスの概念に基づく損失関数によって教師される単純な多層パーセプトロン(MLP)であるハイパーグラフ構造化データのための新しい学習フレームワークであるHypergraph-MLPを紹介する。ハイパーグラフノード分類タスクの実験により、ハイパーグラフ-MLPは既存のベースラインと比較して競争性能が向上し、推論における構造的摂動に対してより高速で堅牢であることが示された。 Hypergraphs are vital in modelling data with higher-order relations containing more than two entities, gaining prominence in machine learning and signal processing. Many hypergraph neural networks leverage message passing over hypergraph structures to enhance node representation learning, yielding impressive performances in tasks like hypergraph node classification. However, these message-passing-based models face several challenges, including oversmoothing as well as high latency and sensitivity to structural perturbations at inference time. To tackle those challenges, we propose an alternative approach where we integrate the information about hypergraph structures into training supervision without explicit message passing, thus also removing the reliance on it at inference. Specifically, we introduce Hypergraph-MLP, a novel learning framework for hypergraph-structured data, where the learning model is a straightforward multilayer perceptron (MLP) supervised by a loss function based on a notion of signal smoothness on hypergraphs. Experiments on hypergraph node classification tasks demonstrate that Hypergraph-MLP achieves competitive performance compared to existing baselines, and is considerably faster and more robust against structural perturbations at inference.	
翻訳日:2024-03-20 23:51:29 公開日:2024-03-19
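The Hypergraph-MLP abstract mentions a loss based on signal smoothness on hypergraphs but does not give its form. The sketch below shows one commonly used notion, the sum over hyperedges of the largest squared distance between any two member nodes, purely as an assumption for illustration. The function names are hypothetical and the paper's actual loss may differ.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hypergraph_smoothness(embeddings, hyperedges):
    """Sum over hyperedges of the maximum pairwise squared distance inside each edge.

    Small values mean nodes sharing a hyperedge have similar embeddings.
    This is an assumed smoothness measure, not necessarily the paper's.
    """
    total = 0.0
    for edge in hyperedges:
        total += max(
            (squared_distance(embeddings[i], embeddings[j]) for i in edge for j in edge),
            default=0.0,
        )
    return total


node_embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]   # hypothetical MLP outputs
print(hypergraph_smoothness(node_embeddings, [(0, 1, 2)]))
print(hypergraph_smoothness(node_embeddings, [(0, 1)]))
```

A term like this can be added to a classification loss during training, so the hypergraph structure is only needed for supervision and not at inference time.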
# GSVA:マルチモーダル大言語モデルによる一般化セグメンテーション GSVA: Generalized Segmentation via Multimodal Large Language Models ( http://arxiv.org/abs/2312.10103v2 ) ライセンス: Link先を確認	Zhuofan Xia, Dongchen Han, Yizeng Han, Xuran Pan, Shiji Song, Gao Huang,	(参考訳) Generalized Referring Expression Segmentation (GRES)は、従来のRESの範囲を拡張して、1つの式で複数のオブジェクトを参照したり、画像に存在しない空のターゲットを特定する。 GRESは、画像内のインスタンスの複雑な空間的関係をモデル化し、既存の参照を識別する際の課題を提起する。 MLLM(Multimodal Large Language Models)は、近年、複雑な視覚言語タスクにおいて大きな進歩を見せている。 LLM(Large Language Models)とビジョンモデル(Vision Models)を結びつけると、MLLMは視覚入力による文脈理解に長けている。 LISAは、代表として、セグメンテーションマスクデコーダ(例えばSAM)をプロンプトするために特別な[SEG]トークンを採用し、RESタスクでMLLMを有効にします。しかし、GRESの既存のソリューションは、現在のセグメンテーションMLLMは、ユーザーが特定のプロンプトで複数の主題を参照したり、任意の画像ターゲットと矛盾する説明を提供するようなケースを正しく扱えないため、満足できないままである。本稿では,このギャップに対処する汎用セグメンテーションビジョンアシスタント(GSVA)を提案する。具体的には、GSVAは[SEG]トークンを再利用して、セグメンテーションモデルを複数のマスク参照を同時にサポートするように促し、革新的にnullターゲットを明示的に拒否する[REJ]トークンを生成することを学習する。 GRES問題の解決におけるGSVAの有効性を検証する実験は、GRESベンチマークgRefCOCOデータセットに注目すべき拡張点と、新たな記録を設定している。 GSVAはまた、様々な古典的な参照セグメンテーションや理解タスクにおいて有効であることを示す。 Generalized Referring Expression Segmentation (GRES) extends the scope of classic RES to refer to multiple objects in one expression or identify the empty targets absent in the image. GRES poses challenges in modeling the complex spatial relationships of the instances in the image and identifying non-existing referents. Multimodal Large Language Models (MLLMs) have recently shown tremendous progress in these complicated vision-language tasks. Connecting Large Language Models (LLMs) and vision models, MLLMs are proficient in understanding contexts with visual inputs. Among them, LISA, as a representative, adopts a special [SEG] token to prompt a segmentation mask decoder, e.g., SAM, to enable MLLMs in the RES task. 
However, existing solutions to GRES remain unsatisfactory since current segmentation MLLMs cannot correctly handle the cases where users might reference multiple subjects in a singular prompt or provide descriptions incongruent with any image target. In this paper, we propose Generalized Segmentation Vision Assistant (GSVA) to address this gap. Specifically, GSVA reuses the [SEG] token to prompt the segmentation model towards supporting multiple mask references simultaneously and innovatively learns to generate a [REJ] token to reject the null targets explicitly. Experiments validate GSVA's efficacy in resolving the GRES issue, marking a notable enhancement and setting a new record on the GRES benchmark gRefCOCO dataset. GSVA also proves effective across various classic referring segmentation and comprehension tasks.	翻訳日:2024-03-20 23:51:29 公開日:2024-03-19
# TrojFSP: 少数ショットのプロンプトチューニングにおけるトロイの木馬の挿入 TrojFSP: Trojan Insertion in Few-shot Prompt Tuning ( http://arxiv.org/abs/2312.10467v3 ) ライセンス: Link先を確認	Mengxin Zheng, Jiaqi Xue, Xun Chen, YanShan Wang, Qian Lou, Lei Jiang,	(参考訳) プロンプトチューニングは、様々な下流タスク、特に少数の入力サンプルにおいて、固定事前訓練言語モデル(PLM)を適応するための最も効果的なソリューションの1つである。しかし、少数のデータサンプルに対するプロンプトチューニングにおけるTrojan攻撃のようなセキュリティ問題は、十分に研究されていない。確立したデータ中毒攻撃を直接数発のプロンプトチューニングに転送することは、複数の課題をもたらす。重要な問題のひとつは、ターゲットでないクラスサンプルがターゲットクラスに追加され、ターゲットでないクラスよりもターゲットクラスサンプルの数が多くなる、という「poisoned imbalance issue」である。この問題は定期的なチューニングでは重要ではないが、数発のプロンプトチューニングを著しく損なうため、高い攻撃成功率(ASR)とクリーンデータ精度(CDA)を同時に達成することは困難である。さらに、数発のプロンプトはASRとCDAの両方の点において過度に適合する傾向にある。本稿では,これらの課題に対処するための手法であるTrojFSPを紹介する。poisoned imbalance issueを解決するために,汚染サンプル数の等化を目的としたTarget-Class Shrink (TC-Shrink)技術を開発した。オーバーフィッティングと闘うために,攻撃性能を高めるSelective Token Poisoning技術を用いる。さらに, 汚染されたトロイの木馬プロンプトのトリガーに対する注意を増幅するために, 目的関数Trojan-Trigger Attentionを導入する。実験により、TrojFSPは、様々なPLMおよびデータセットにわたるCDAの無視可能な減少を維持しながら、99%以上のASRを達成することが示された。 Prompt tuning is one of the most effective solutions to adapting a fixed pre-trained language model (PLM) for various downstream tasks, especially with only a few input samples. However, the security issues, e.g., Trojan attacks, of prompt tuning on a few data samples are not well-studied. Transferring established data poisoning attacks directly to few-shot prompt tuning presents multiple challenges. One significant issue is the "poisoned imbalance issue", where non-target class samples are added to the target class, resulting in a greater number of target-class samples compared to non-target class. While this issue is not critical in regular tuning, it significantly hampers the few-shot prompt tuning, making it difficult to simultaneously achieve a high attack success rate (ASR) and maintain clean data accuracy (CDA). Additionally, few-shot prompting is prone to overfitting in terms of both ASR and CDA. 
In this paper, we introduce TrojFSP, a method designed to address the challenges. To solve the poisoned imbalance issue, we develop a Target-Class Shrink (TC-Shrink) technique, which aims to equalize the number of poisoning samples. To combat overfitting, we employ a Selective Token Poisoning technique to boost attack performance. Furthermore, we introduce a Trojan-Trigger Attention objective function to amplify the attention of the poisoned trojan prompt on triggers. Experiments show that our TrojFSP achieves an ASR of over 99% while maintaining negligible decreases in CDA across various PLMs and datasets.	翻訳日:2024-03-20 23:51:29 公開日:2024-03-19
# 画像の超解像化とデブロワーリングのための自己教師付き学習 Self-Supervised Learning for Image Super-Resolution and Deblurring ( http://arxiv.org/abs/2312.11232v2 ) ライセンス: Link先を確認	Jérémy Scanvic, Mike Davies, Patrice Abry, Julián Tachella,	(参考訳) 近年、自己監督法は、様々な画像逆問題において教師あり手法と同じくらい有効であることが証明され、地上の真理データが入手しづらい、あるいは高価である科学・医学画像の応用において、学習に基づく手法の道を開いた。 MRIとCTで診断された症例である。これらの手法は、不完全な測定データのみから学ぶために、画像分布の変換や回転への不変性に批判的に依存する。しかし、既存の手法は、ほとんどの画像システムにおいて重要な役割を担っている画像超解像と退色の問題において、競合する性能を得ることができない。本研究では,低周波情報のみを含む測定結果から,翻訳や回転に対する不変性が十分でないことを示す。代わりに,多数の画像分布がほぼ不変であるという事実を活用し,計測過程で失われる高周波情報を復元する,新たな自己教師型手法を提案する。提案手法は他の自己教師付き手法よりも優れており,完全教師付き学習と同等の性能が得られることを示す。 Self-supervised methods have recently proved to be nearly as effective as supervised methods in various imaging inverse problems, paving the way for learning-based methods in scientific and medical imaging applications where ground truth data is hard or expensive to obtain. This is the case in magnetic resonance imaging and computed tomography. These methods critically rely on invariance to translations and/or rotations of the image distribution to learn from incomplete measurement data alone. However, existing approaches fail to obtain competitive performances in the problems of image super-resolution and deblurring, which play a key role in most imaging systems. In this work, we show that invariance to translations and rotations is insufficient to learn from measurements that only contain low-frequency information. Instead, we propose a new self-supervised approach that leverages the fact that many image distributions are approximately scale-invariant, and that enables recovering high-frequency information lost in the measurement process. We demonstrate throughout a series of experiments on real datasets that the proposed method outperforms other self-supervised approaches, and obtains performances on par with fully supervised learning.	翻訳日:2024-03-20 23:51:29 公開日:2024-03-19
# 正規化アテンションスコアを用いたより強いグラフ変換器 Stronger Graph Transformer with Regularized Attention Scores ( http://arxiv.org/abs/2312.11730v3 ) ライセンス: Link先を確認	Eugene Ku, Swetha Arunraj,	(参考訳) Graph Neural Networksは、そのメモリ消費で有名だ。最近、Graph Transformerと呼ばれるTransformerベースのGNNでは、長距離依存が存在する場合、優れたパフォーマンスが得られることが示されている。しかし、グラフデータとTransformerアーキテクチャを組み合わせることで、メモリの問題が相まって悪化した。本稿では、位置エンコーディングの必要性を軽減し、最終的にGTのメモリ外問題を軽減する「エッジ正規化技術」の新たなバージョンを提案する。位置エンコーディング上のエッジ正規化が有用かどうかは不明である。しかし, エッジ正規化技術を用いることで, 位置エンコーディングのないGTと比較してGTの性能が安定的に向上することが明らかである。 Graph Neural Networks are notorious for their memory consumption. A recent Transformer-based GNN called Graph Transformer (GT) has been shown to obtain superior performance when long-range dependencies exist. However, combining graph data with the Transformer architecture leads to a compounded memory issue. We propose a novel version of the "edge regularization technique" that alleviates the need for Positional Encoding and ultimately alleviates GT's out-of-memory issue. We observe that it is not clear whether having an edge regularization on top of positional encoding is helpful. However, it seems evident that applying our edge regularization technique indeed stably improves GT's performance compared to GT without Positional Encoding.	翻訳日:2024-03-20 23:51:29 公開日:2024-03-19
# グラフニューラルネットワークに基づく設計技術の協調最適化のための高速セルライブラリー評価 Fast Cell Library Characterization for Design Technology Co-Optimization Based on Graph Neural Networks ( http://arxiv.org/abs/2312.12784v4 ) ライセンス: Link先を確認	Tianliang Ma, Guangxi Fan, Zhihui Deng, Xuguang Sun, Kainlu Low, Leilai Shao,	(参考訳) 設計技術共最適化(DTCO)は、半導体プロセス開発における最適電力、性能、面積(PPA)を達成する上で重要な役割を担っている。細胞ライブラリーのキャラクタリゼーションはDTCOフローに不可欠であるが、従来の手法は時間と費用がかかる。これらの課題を克服するため,我々は,高速かつ正確なセルライブラリ解析のためのグラフニューラルネットワーク(GNN)に基づく機械学習モデルを提案する。本モデルでは, セル構造を取り入れ, 様々なプロセス電圧温度コーナーおよび技術パラメータにわたって高い予測精度を示す。 512の技術コーナーと100万以上のテストデータポイントによる検証は、平均絶対パーセンテージ誤差(MAPE)0.95%、SPICEシミュレーションと比較して100倍の速度で、33種類のセルの遅延、電力、入力ピン容量の正確な予測を示している。さらに、GNNモデルから得られた予測値を用いて、最悪の負スラック(WNS)、リーク電力、動的電力などのシステムレベルの指標について検討する。我々のモデルは、WNSに対して絶対誤差$\le$3.0 ps、リーク電力に対して$\le$0.60%、ゴールデンレファレンスと比較して$$\le$0.99%という正確な予測を行う。さらに, 小型・中規模設計におけるPPA向上のための微粒化駆動強度補間法を提案し, ほぼ1-3%の改善を実現した。 Design technology co-optimization (DTCO) plays a critical role in achieving optimal power, performance, and area (PPA) for advanced semiconductor process development. Cell library characterization is essential in DTCO flow, but traditional methods are time-consuming and costly. To overcome these challenges, we propose a graph neural network (GNN)-based machine learning model for rapid and accurate cell library characterization. Our model incorporates cell structures and demonstrates high prediction accuracy across various process-voltage-temperature (PVT) corners and technology parameters. Validation with 512 unseen technology corners and over one million test data points shows accurate predictions of delay, power, and input pin capacitance for 33 types of cells, with a mean absolute percentage error (MAPE) $\le$ 0.95% and a speed-up of 100X compared with SPICE simulations. Additionally, we investigate system-level metrics such as worst negative slack (WNS), leakage power, and dynamic power using predictions obtained from the GNN-based model on unseen corners. 
Our model achieves precise predictions, with absolute error $\le$3.0 ps for WNS, percentage errors $\le$0.60% for leakage power, and $\le$0.99% for dynamic power, when compared to the golden reference. With the developed model, we further propose a fine-grained drive strength interpolation methodology to enhance PPA for small-to-medium-scale designs, resulting in an approximate 1-3% improvement.	翻訳日:2024-03-20 23:51:29 公開日:2024-03-19
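The accuracy figures in the abstract above are reported as mean absolute percentage error (MAPE) against SPICE or a golden reference. A minimal sketch of that metric follows; the function name and the sample delay values are invented for illustration and are not data from the paper.

```python
def mape(predicted, reference):
    """Mean absolute percentage error, in percent.

    Assumes no reference value is zero.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    errors = [abs(p - r) / abs(r) for p, r in zip(predicted, reference)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical cell delays in picoseconds: model output vs. reference simulation.
model_delay = [101.0, 198.0]
reference_delay = [100.0, 200.0]
print(mape(model_delay, reference_delay))
```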
# ECAMP: エンティティ中心のコンテキスト対応医療ビジョン言語事前トレーニング ECAMP: Entity-centered Context-aware Medical Vision Language Pre-training ( http://arxiv.org/abs/2312.13316v3 ) ライセンス: Link先を確認	Rongsheng Wang, Qingsong Yao, Haoran Lai, Zhiyang He, Xiaodong Tao, Zihang Jiang, S. Kevin Zhou,	(参考訳) 医学的視覚言語による事前訓練の大幅な進歩にもかかわらず、既存の手法は、放射線学レポートにおける固有の実体固有の文脈と、テキストと画像の間の複雑な相互モーダルな文脈関係をほとんど見落としてきた。このギャップを埋めるために、我々は、よりエンティティ中心でコンテキストに敏感な医療データの解釈を可能にするために設計された、エンティティ中心のコンテキスト対応医療ビジョン言語事前学習(ECAMP)フレームワークを提案する。近年の強力な大規模言語モデルを用いて,医療報告からエンティティ中心のコンテキストを抽出し,ECAMPがテキストのモダリティからより効果的な監視を行えるようにした。さらに、慎重に設計されたエンティティ認識、コンテキスト強化されたマスク付き言語モデリング、コンテキスト誘導された超解像タスクでモデルを事前学習することにより、ECAMPはテキストと画像のモダリティ間の相互作用を著しく改善し、エンティティ中心のコンテキスト特徴を抽出する能力が向上する。さらに、提案するマルチスケールコンテキスト融合設計により、粗い画像表現と細かな画像表現のセマンティック統合が向上し、マルチスケールダウンストリームアプリケーションの性能が向上する。これらのコンポーネントを組み合わせることで、現在の最先端の手法よりも大幅にパフォーマンスが向上し、医療画像におけるクロスモダリティ学習の新たな標準を確立します。コードとモデルはhttps://github.com/ToniChopp/ECAMPで入手できる。 Despite significant advancements in medical vision-language pre-training, existing methods have largely overlooked the inherent entity-specific context within radiology reports and the complex cross-modality contextual relationships between text and images. To close this gap, we propose a novel Entity-centered Context-aware Medical Vision-language Pre-training (ECAMP) framework, which is designed to enable a more entity-centered and context-sensitive interpretation of medical data. Utilizing the recent powerful large language model, we distill entity-centered context from medical reports, which enables ECAMP to gain more effective supervision from the text modality. By further pre-training our model with carefully designed entity-aware, context-enhanced masked language modeling and context-guided super-resolution tasks, ECAMP significantly refines the interplay between text and image modalities, leading to an enhanced ability to extract entity-centered contextual features. 
Besides, our proposed multi-scale context fusion design also improves the semantic integration of both coarse and fine-level image representations, prompting better performance for multi-scale downstream applications. Combining these components leads to significant performance leaps over current state-of-the-art methods and establishes a new standard for cross-modality learning in medical imaging, whose effectiveness is demonstrated by our extensive experiments on various tasks including classification, segmentation, and detection across several public datasets. Code and models are available at https://github.com/ToniChopp/ECAMP.	翻訳日:2024-03-20 23:51:29 公開日:2024-03-19
# SynCDR : 合成データを用いたクロスドメイン検索モデルの訓練 SynCDR : Training Cross Domain Retrieval Models with Synthetic Data ( http://arxiv.org/abs/2401.00420v2 ) ライセンス: Link先を確認	Samarth Mishra, Carlos D. Castillo, Hongcheng Wang, Kate Saenko, Venkatesh Saligrama,	(参考訳) クロスドメイン検索では、同じ意味圏から2つの視覚領域にまたがるイメージを識別するためにモデルが必要である。例えば、オブジェクトのスケッチが与えられた場合、オンラインストアのカタログから実際のイメージを取得する必要がある。そのような問題に対する標準的なアプローチは、ユークリッド距離が類似性を反映する画像の特徴空間を学ぶことである。取得に費用がかかる人間のアノテーションがなくても、事前の手法はトレーニングのためにラベルなしのイメージを使用して合理的に機能する。私たちの問題制約は、この2つのドメインが必ずしもトレーニングデータに共通するカテゴリを共有していないシナリオにさらに対応します。問題の2つのドメインが、異なる人の身元を記録する生体センサーの異なるバージョンから来ている場合、これは起こりうる。我々は、これらの欠落したカテゴリの例を満たすために合成データを生成する単純な解を提案する。これは、カテゴリを保持したまま、ある視覚領域から別の視覚領域へ画像を変換することで行われる。我々は,この2つのドメインに対して,この変換に特化して訓練されたアプローチと,プロンプトを介して大規模に事前訓練されたテキスト-画像拡散モデルを使用する手法を比較し,後者がより良い置換データを生成し,より正確なクロスドメイン検索モデルを実現することを見出した。われわれの最高のSynCDRモデルは、先行技術を最大15%上回る。私たちの作業のコードはhttps://github.com/samarth4149/SynCDR で公開されている。 In cross-domain retrieval, a model is required to identify images from the same semantic category across two visual domains. For instance, given a sketch of an object, a model needs to retrieve a real image of it from an online store's catalog. A standard approach for such a problem is learning a feature space of images where Euclidean distances reflect similarity. Even without human annotations, which may be expensive to acquire, prior methods function reasonably well using unlabeled images for training. Our problem constraint takes this further to scenarios where the two domains do not necessarily share any common categories in training data. This can occur when the two domains in question come from different versions of some biometric sensor recording identities of different people. We posit a simple solution, which is to generate synthetic data to fill in these missing category examples across domains. This, we do via category preserving translation of images from one visual domain to another. 
We compare approaches specifically trained for this translation for a pair of domains, as well as those that can use large-scale pre-trained text-to-image diffusion models via prompts, and find that the latter can generate better replacement synthetic data, leading to more accurate cross-domain retrieval models. Our best SynCDR model can outperform prior art by up to 15%. Code for our work is available at https://github.com/samarth4149/SynCDR .	翻訳日:2024-03-20 23:41:33 公開日:2024-03-19
# BA-SAM: セグメンテーションモデルのためのスケーラブルなバイアスモードアテンションマスク BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model ( http://arxiv.org/abs/2401.02317v3 ) ライセンス: Link先を確認	Yiran Song, Qianyu Zhou, Xiangtai Li, Deng-Ping Fan, Xuequan Lu, Lizhuang Ma,	(参考訳) 本稿では,Segment Anything Model (SAM)における画像解像度変化の課題について述べる。 SAMはゼロショットの汎用性で知られており、さまざまな画像サイズを持つデータセットに直面するとパフォーマンスが低下する。以前のアプローチでは、イメージを一定のサイズに縮小するか、構造変更を採用する傾向があり、SAMの豊富な事前知識の保存を妨げる。さらに、このようなタスク固有のチューニングは、ダウンストリームタスクのデプロイに費用対効果があり許容できないモデルを完全に再トレーニングする必要があります。本稿では,この問題を,異なるサイズの画像に対する一貫したパッチサイズを維持しつつ,トークン列の長さが変化する長さ補間問題として再検討する。そこで本研究では,多様な画像解像度に対するSAMの適応性を向上し,構造修正の必要をなくすために,スケーラブルバイアス修正注意マスク(BA-SAM)を提案する。まず,トークン列の長さが変化すると,注目層のドット積値が一貫した大きさとなるように,新しいスケーリング係数を導入する。第2に,未学習の遠方情報の影響を緩和し,各トークンが隣り合う情報を優先できるバイアスモードの注目マスクを提案する。我々のBA-SAMは、ゼロショットと微調整の2つのシナリオで有効性を示す。 DIS5K、DUTS、ISIC、COD10K、COCOを含む多様なデータセットに対する広範な評価は、ゼロショット設定のパフォーマンス劣化を著しく軽減し、最小限の微調整で最先端のパフォーマンスを達成する能力を明らかにしている。さらに,BA-SAMの一般化可能性を4つのデータセットで同時に示す一般化モデルとベンチマークを提案する。 In this paper, we address the challenge of image resolution variation for the Segment Anything Model (SAM). SAM, known for its zero-shot generalizability, exhibits a performance degradation when faced with datasets with varying image sizes. Previous approaches tend to resize the image to a fixed size or adopt structure modifications, hindering the preservation of SAM's rich prior knowledge. Besides, such task-specific tuning necessitates a complete retraining of the model, which is cost-expensive and unacceptable for deployment in the downstream tasks. In this paper, we reformulate this issue as a length extrapolation problem, where token sequence length varies while maintaining a consistent patch size for images of different sizes. To this end, we propose Scalable Bias-Mode Attention Mask (BA-SAM) to enhance SAM's adaptability to varying image resolutions while eliminating the need for structure modifications. 
Firstly, we introduce a new scaling factor to ensure consistent magnitude in the attention layer's dot product values when the token sequence length changes. Secondly, we present a bias-mode attention mask that allows each token to prioritize neighboring information, mitigating the impact of untrained distant information. Our BA-SAM demonstrates efficacy in two scenarios: zero-shot and fine-tuning. Extensive evaluation on diverse datasets, including DIS5K, DUTS, ISIC, COD10K, and COCO, reveals its ability to significantly mitigate performance degradation in the zero-shot setting and achieve state-of-the-art performance with minimal fine-tuning. Furthermore, we propose a generalized model and benchmark, showcasing BA-SAM's generalizability across all four datasets simultaneously.	翻訳日:2024-03-20 23:41:33 公開日:2024-03-19
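The two BA-SAM ingredients named in the abstract above (a length-dependent scaling factor and a bias that favours neighbouring tokens) can be illustrated with a minimal sketch. The abstract does not give the formulas, so both choices below are assumptions made for illustration only: the scale is taken as log(n)/log(train_len), and the bias as a linear penalty on 1-D token distance. The function and parameter names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention_weights(queries, keys, train_len, slope=0.5):
    """Attention weights with a length-aware scale and a distance penalty.

    Assumptions (not taken from the paper):
      * scale = log(n) / log(train_len), where n is the current token count
        (requires train_len > 1);
      * bias  = -slope * |i - j| over 1-D token positions.
    """
    n = len(keys)
    d = len(keys[0])
    length_scale = math.log(n) / math.log(train_len)
    weights = []
    for i, q in enumerate(queries):
        logits = []
        for j, k in enumerate(keys):
            dot = sum(a * b for a, b in zip(q, k))
            logits.append(length_scale * dot / math.sqrt(d) - slope * abs(i - j))
        weights.append(softmax(logits))
    return weights
```

With identical tokens, the distance penalty alone decides the weights, so each token attends most strongly to itself and progressively less to tokens further away.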
# RadarCam-Depth:Radar-Camera Fusion for Depth Estimation with Learned Metric Scale (特集:一般) RadarCam-Depth: Radar-Camera Fusion for Depth Estimation with Learned Metric Scale ( http://arxiv.org/abs/2401.04325v2 ) ライセンス: Link先を確認	Han Li, Yukai Ma, Yaqing Gu, Kewei Hu, Yong Liu, Xingxing Zuo,	(参考訳) 本稿では,単一視点画像とスパースノイズのレーダー点雲の融合に基づく距離密度推定のための新しい手法を提案する。異種レーダーと画像データの直接融合、あるいはそれらの符号化は、重要なアーティファクト、ぼやけた境界、そして準最適精度を持つ密度の深い深度マップを生成する傾向にある。この問題を回避するために,スパースレーダとノイズレーダデータから誘導される高密度な計量スケールを用いて,汎用性とロバストな単分子深度予測を強化することを学ぶ。本稿では,高精度かつ詳細な深度推定を行うためのRadar-Cameraフレームワークを提案する。例えば,単眼深度予測,スパースレーダ点を用いた一眼深度の大域的スケールアライメント,レーダ点と画像パッチの関連性学習による準密度スケール推定,スケールマップ学習者による局所的な深度改善などである。提案手法は,難解なnuScenesデータセットと自己コンパイルしたZJU-4DRadarCamデータセットにおいて,平均絶対誤差(MAE)を25.6%,40.2%削減することにより,最先端のRadar-Camera深度推定法を著しく上回っている。コードとデータセットは \url{https://github.com/MMOCKING/RadarCam-Depth} でリリースされます。 We present a novel approach for metric dense depth estimation based on the fusion of a single-view image and a sparse, noisy Radar point cloud. The direct fusion of heterogeneous Radar and image data, or their encodings, tends to yield dense depth maps with significant artifacts, blurred boundaries, and suboptimal accuracy. To circumvent this issue, we learn to augment versatile and robust monocular depth prediction with the dense metric scale induced from sparse and noisy Radar data. We propose a Radar-Camera framework for highly accurate and fine-detailed dense depth estimation with four stages, including monocular depth prediction, global scale alignment of monocular depth with sparse Radar points, quasi-dense scale estimation through learning the association between Radar points and image patches, and local scale refinement of dense depth using a scale map learner. 
Our proposed method significantly outperforms the state-of-the-art Radar-Camera depth estimation methods by reducing the mean absolute error (MAE) of depth estimation by 25.6% and 40.2% on the challenging nuScenes dataset and our self-collected ZJU-4DRadarCam dataset, respectively. Our code and dataset will be released at https://github.com/MMOCKING/RadarCam-Depth.	翻訳日:2024-03-20 23:41:33 公開日:2024-03-19
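The global scale alignment stage and the MAE metric mentioned in the RadarCam-Depth abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: the alignment is taken to be a single least-squares scale factor fitted at pixels that have a radar return, which may differ from the paper's actual procedure (for example, it could also fit a shift). The function names are hypothetical.

```python
def global_scale(mono_depth, radar_depth):
    """Least-squares scale s minimising sum((s * m - r)^2).

    mono_depth and radar_depth are equal-length lists holding the monocular
    prediction and the radar depth at the same pixels (assumed pairing).
    """
    num = sum(m * r for m, r in zip(mono_depth, radar_depth))
    den = sum(m * m for m in mono_depth)
    return num / den


def mean_absolute_error(pred, target):
    """Mean absolute error between two equal-length depth lists."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


mono = [1.0, 2.0, 4.0]    # scale-free monocular depth at radar-hit pixels
radar = [2.0, 4.0, 8.0]   # metric radar depth at the same pixels
s = global_scale(mono, radar)
aligned = [s * m for m in mono]
```

Here the radar depths are exactly twice the monocular ones, so the fitted scale is 2.0 and the aligned prediction has zero MAE.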
# Wigner-Dunkl量子力学における準特殊可解ポテンシャル Quasi-exactly solvable potentials in Wigner-Dunkl quantum mechanics ( http://arxiv.org/abs/2401.04586v2 ) ライセンス: Link先を確認	C. Quesne,	(参考訳) 直線上のダンクル高調波発振器は、任意の$n\in \N$に対してn+1$の既知固有状態を持つ非調和発振器である準特殊可解発振器に一般化できることが示されている。また、後者のハミルトニアンが拡張ダンクル微分の観点からより単純な方法で書き換えられることも証明されている。さらに、平面上のダンクル等方振動子とダンクルクーロンポテンシャルは、準正確に解けるものに一般化される。前者の場合、$n+1$既知の固有状態を持つポテンシャルが得られ、後者では、与えられたエネルギーに関連する$n+1$ポテンシャルの集合が導出される。 It is shown that the Dunkl harmonic oscillator on the line can be generalized to a quasi-exactly solvable one, which is an anharmonic oscillator with $n+1$ known eigenstates for any $n\in \N$. It is also proved that the Hamiltonian of the latter can also be rewritten in a simpler way in terms of an extended Dunkl derivative. Furthermore, the Dunkl isotropic oscillator and Dunkl Coulomb potentials in the plane are generalized to quasi-exactly solvable ones. In the former case, potentials with $n+1$ known eigenstates are obtained, whereas, in the latter, sets of $n+1$ potentials associated with a given energy are derived.	翻訳日:2024-03-20 23:41:33 公開日:2024-03-19
# 目視によるマルチタスクリアルタイムロボットデータによる両腕微細加工 Multi-task real-robot data with gaze attention for dual-arm fine manipulation ( http://arxiv.org/abs/2401.07603v3 ) ライセンス: Link先を確認	Heecheol Kim, Yoshiyuki Ohmura, Yasuo Kuniyoshi,	(参考訳) ロボット操作の分野では、深層模倣学習が操作スキル獲得の有望なアプローチとして認識されている。さらに、多様なロボットデータセットからの学習は、汎用性と適応性を達成するための実行可能な方法であると考えられている。このような研究において、様々なタスクを学習することで、ロボットは複数の対象にまたがる汎用性を達成した。しかし、こうしたマルチタスクロボットデータセットは、ロボットが現実世界で実行すると予想される細かいオブジェクト操作に対処せず、比較的不正確な単一アームタスクに主に焦点を当てている。本稿では,2つのアームタスクや細かな操作を必要とするタスクを含む多種多様なオブジェクト操作のデータセットを紹介する。この目的のために、ボウルムービング、鉛筆ケースのオープニング、バナナペリングといった2本腕の細かなタスクを含む224kエピソード(150時間、1104の言語命令)のデータセットを生成し、このデータを公開している。さらに、このデータセットには、視覚的注意信号とデュアルアクションラベル、アクションを堅牢な到達軌道とオブジェクトとの正確な相互作用に分離する信号、堅牢で正確なオブジェクト操作を実現するための言語命令が含まれている。このデータセットをDual-Action and Attention (DAA)に適用した。このモデルは、実際のロボット操作タスクで7万回以上の試行でテストされ、微細な操作の能力を実証した。 In the field of robotic manipulation, deep imitation learning is recognized as a promising approach for acquiring manipulation skills. Additionally, learning from diverse robot datasets is considered a viable method to achieve versatility and adaptability. In such research, by learning various tasks, robots achieved generality across multiple objects. However, such multi-task robot datasets have mainly focused on single-arm tasks that are relatively imprecise, not addressing the fine-grained object manipulation that robots are expected to perform in the real world. This paper introduces a dataset of diverse object manipulations that includes dual-arm tasks and/or tasks requiring fine manipulation. To this end, we have generated dataset with 224k episodes (150 hours, 1,104 language instructions) which includes dual-arm fine tasks such as bowl-moving, pencil-case opening or banana-peeling, and this data is publicly available. 
Additionally, this dataset includes visual attention signals as well as dual-action labels, a signal that separates actions into a robust reaching trajectory and precise interaction with objects, and language instructions to achieve robust and precise object manipulation. We applied the dataset to our Dual-Action and Attention (DAA), a model designed for fine-grained dual arm manipulation tasks and robust against covariate shifts. The model was tested with over 7k total trials in real robot manipulation tasks, demonstrating its capability in fine manipulation.	翻訳日:2024-03-20 23:41:33 公開日:2024-03-19
# 垂直Federated Image Segmentation Vertical Federated Image Segmentation ( http://arxiv.org/abs/2401.07931v2 ) ライセンス: Link先を確認	Paul K. Mandal, Cole Leo,	(参考訳) 画像ベースの問題に対するAIソリューションの普及により、データのプライバシと取得の両方に懸念が高まっている。多くの場合、情報は別々のデータサイロ上に置かれており、開発者が機械学習モデル開発に適した方法でこれらすべてを統合することは困難である。これに加えて、これらのローカライズされたデータ領域の一部は、ラベル付き基底真理にアクセスできない可能性がある。これは、数値的に結論に達する能力を持っているが、関連する情報の欠如により分類を割り当てることができないことを示している。このような決定はしばしば無視されるが、特にこの能力を必要とする画像ベースのソリューションを開発しようとする場合である。このような状況下で動作可能な,革新的な垂直統合学習(VFL)モデルアーキテクチャを提案する。これは、VFL環境の制約の下で動作し、名目上の精度を維持しながらイメージセグメンテーションを実行するシステムの最初の(そして現在唯一の)実装である。我々は、ラベル付きデータを持たないフェデレートで運用でき、各重みを中央サーバーとプライベートに共有できるFCNを利用してこれを達成した。 CamVidデータセット上でテストを行い、フェデレート間での情報転送に必要な重い特徴圧縮の影響を判定し、そのような制約の下で作業する際の全体的なパフォーマンス指標に関する明確な結論に達した。 With the popularization of AI solutions for image based problems, there has been a growing concern for both data privacy and acquisition. In a large number of cases, information is located on separate data silos and it can be difficult for a developer to consolidate all of it in a fashion that is appropriate for machine learning model development. Alongside this, a portion of these localized data regions may not have access to a labelled ground truth. This indicates that they have the capacity to reach conclusions numerically, but are not able to assign classifications amid a lack of pertinent information. Such a determination is often negligible, especially when attempting to develop image based solutions that often necessitate this capability. With this being the case, we propose an innovative vertical federated learning (VFL) model architecture that can operate under this common set of conditions. This is the first (and currently the only) implementation of a system that can work under the constraints of a VFL environment and perform image segmentation while maintaining nominal accuracies. 
We achieved this by utilizing an FCN that can operate on federates that lack labelled data and privately share the respective weights with a central server, which hosts the necessary features for classification. Tests were conducted on the CamVid dataset in order to determine the impact of heavy feature compression required for the transfer of information between federates, as well as to reach nominal conclusions about the overall performance metrics when working under such constraints.	翻訳日:2024-03-20 23:41:33 公開日:2024-03-19
# 拡散モデルを用いたキーポイント誘導変形画像マニピュレーション Key-point Guided Deformable Image Manipulation Using Diffusion Model ( http://arxiv.org/abs/2401.08178v3 ) ライセンス: Link先を確認	Seok-Hwan Oh, Guil Jung, Myeong-Gee Kim, Sang-Yun Kim, Young-Min Kim, Hyeon-Jik Lee, Hyuk-Sool Kwon, Hyeon-Min Bae,	(参考訳) 本稿では,キーポイント誘導拡散確率モデル(KDM)を提案する。中間出力として光フローマップを組み込んだ2段階生成モデルを提案する。これにより、画像とスパースキーポイントのセマンティクス関係の高密度な画素ワイズ理解が構成され、より現実的な画像生成につながる。さらに、光学フローの統合は、シーケンシャルな画像のフレーム間分散を制御し、真にシーケンシャルな画像生成を示す。 KDMは、顔画像生成、ヒトのポーズ合成、心エコー画像予測など、さまざまなキーポイント条件付き画像合成タスクを用いて評価され、KDMは、最先端のモデルと比較して一貫性とフォトリアリスティックなイメージを実証している。 In this paper, we introduce a Key-point-guided Diffusion probabilistic Model (KDM) that gains precise control over images by manipulating the object's key-point. We propose a two-stage generative model incorporating an optical flow map as an intermediate output. By doing so, a dense pixel-wise understanding of the semantic relation between the image and sparse key point is configured, leading to more realistic image generation. Additionally, the integration of optical flow helps regulate the inter-frame variance of sequential images, demonstrating an authentic sequential image generation. The KDM is evaluated with diverse key-point conditioned image synthesis tasks, including facial image generation, human pose synthesis, and echocardiography video prediction, demonstrating the KDM is proving consistency enhanced and photo-realistic images compared with state-of-the-art models.	翻訳日:2024-03-20 23:41:33 公開日:2024-03-19
# Krylovサブスペースリサイクルによるニューラル演算子の高速化データ生成 Accelerating Data Generation for Neural Operators via Krylov Subspace Recycling ( http://arxiv.org/abs/2401.09516v2 ) ライセンス: Link先を確認	Hong Wang, Zhongkai Hao, Jie Wang, Zijie Geng, Zhen Wang, Bin Li, Feng Wu,	(参考訳) 偏微分方程式(PDE)を解くためのニューラル演算子の学習は、高い推論効率のために大きな注目を集めている。しかし、そのような演算子を訓練するには、大量のラベル付きデータを生成する必要がある。データ生成プロセスは、多くの線形方程式系を解いてPDEの数値解を得るため、非常に時間がかかります。多くの既存手法は、固有の類似性を考慮せずにこれらのシステムを独立に解き、非常に冗長な計算をもたらす。この問題に対処するために,Sorting Krylov Recycling (SKR) という新しい手法を提案する。我々の知る限りでは、SKRはニューラル演算子を学習するためのデータ生成の時間を要する性質に対処する最初の試みである。 SKRの中核は、Krylovサブスペースリサイクルである。具体的には、SKRはソートアルゴリズムを用いてこれらのシステムをシーケンスに配置し、隣のシステムは高い類似性を示す。次に、Krylovサブスペースのリサイクルを解き、独立にではなく逐次的にシステムを解くことで、解決効率を効果的に向上する。理論解析と広範な実験の両方により、SKRはニューラル演算子のデータ生成を著しく加速し、最大13.9倍のスピードアップを達成することが示されている。 Learning neural operators for solving partial differential equations (PDEs) has attracted great attention due to its high inference efficiency. However, training such operators requires generating a substantial amount of labeled data, i.e., PDE problems together with their solutions. The data generation process is exceptionally time-consuming, as it involves solving numerous systems of linear equations to obtain numerical solutions to the PDEs. Many existing methods solve these systems independently without considering their inherent similarities, resulting in extremely redundant computations. To tackle this problem, we propose a novel method, namely Sorting Krylov Recycling (SKR), to boost the efficiency of solving these systems, thus significantly accelerating data generation for neural operator training. To the best of our knowledge, SKR is the first attempt to address the time-consuming nature of data generation for learning neural operators. The workhorse of SKR is Krylov subspace recycling, a powerful technique for solving a series of interrelated systems by leveraging their inherent similarities. 
Specifically, SKR employs a sorting algorithm to arrange these systems in a sequence, where adjacent systems exhibit high similarities. Then it equips a solver with Krylov subspace recycling to solve the systems sequentially instead of independently, thus effectively enhancing the solving efficiency. Both theoretical analysis and extensive experiments demonstrate that SKR can significantly accelerate neural operator data generation, achieving a remarkable speedup of up to 13.9 times.	翻訳日:2024-03-20 23:31:36 公開日:2024-03-19
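The sorting step described in the SKR abstract above (arrange the linear systems so that adjacent ones are similar, then solve them one after another) can be illustrated with a minimal sketch. The abstract does not say which sorting algorithm or similarity measure is used, so the greedy nearest-neighbour ordering over hypothetical per-system parameter vectors below is an assumption. The Krylov recycling solver itself is not shown.

```python
import math


def greedy_order(params):
    """Return an ordering in which each system follows its nearest predecessor.

    params: list of equal-length tuples, one hypothetical parameter vector per
    linear system. Euclidean distance stands in for "similarity" here; this is
    an illustrative choice, not the paper's definition.
    """
    if not params:
        return []
    remaining = list(range(1, len(params)))
    order = [0]  # start from the first system
    while remaining:
        last = params[order[-1]]
        nxt = min(remaining, key=lambda i: math.dist(last, params[i]))
        remaining.remove(nxt)
        order.append(nxt)
    return order


systems = [(0.0,), (10.0,), (1.0,), (9.0,)]
order = greedy_order(systems)
```

A sequential solver would then visit the systems in `order`, carrying information from each solve into the next, which is where subspace recycling would plug in.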
# 分断は忘れず--連続学習における選択訓練専門家の集まり Divide and not forget: Ensemble of selectively trained experts in Continual Learning ( http://arxiv.org/abs/2401.10191v3 ) ライセンス: Link先を確認	Grzegorz Rypeść, Sebastian Cygert, Valeriya Khan, Tomasz Trzciński, Bartosz Zieliński, Bartłomiej Twardowski,	(参考訳) クラス増分学習は、モデルがすでに知っていることを忘れずに適用範囲を広げるのに役立つため、人気が高まっている。この領域のトレンドは、異なるモデルがタスクを解決するために一緒に働く、エキスパートの混合技術を使うことである。しかし、専門家は通常、すべてのタスクデータを使って一度に訓練されるため、計算負荷を忘れたり、増やしたりする傾向があります。この制限に対処するために,SEEDという新しいアプローチを導入する。 SEEDは、考慮されたタスクに対して最も最適な専門家である1人だけを選択し、このタスクからのデータを使用して、この専門家のみを微調整する。この目的のために、各専門家はガウス分布を持つ各クラスを表現し、最適な専門家はそれらの分布の類似性に基づいて選択される。その結果、SEEDはアンサンブル法の安定性を維持しつつ、専門家の多様性と不均一性を増大させる。この実験により、SEEDは様々なシナリオにまたがる模範のない設定において最先端のパフォーマンスを達成し、連続学習におけるデータによる専門家の多様化の可能性を示している。 Class-incremental learning is becoming more popular as it helps models widen their applicability while not forgetting what they already know. A trend in this area is to use a mixture-of-expert technique, where different models work together to solve the task. However, the experts are usually trained all at once using whole task data, which makes them all prone to forgetting and increasing computational burden. To address this limitation, we introduce a novel approach named SEED. SEED selects only one, the most optimal expert for a considered task, and uses data from this task to fine-tune only this expert. For this purpose, each expert represents each class with a Gaussian distribution, and the optimal expert is selected based on the similarity of those distributions. Consequently, SEED increases diversity and heterogeneity within the experts while maintaining the high stability of this ensemble method. The extensive experiments demonstrate that SEED achieves state-of-the-art performance in exemplar-free settings across various scenarios, showing the potential of expert diversification through data in continual learning.	翻訳日:2024-03-20 23:31:36 公開日:2024-03-19
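The expert selection idea in the SEED abstract above (each expert models each class as a Gaussian, and one expert is chosen by comparing those distributions) can be sketched as follows. The abstract does not state the exact rule, so this sketch assumes 1-D Gaussians, a symmetrised KL divergence, and a "pick the expert whose classes overlap least" criterion. All of these are illustrative assumptions, and the names are hypothetical.

```python
import math
from itertools import combinations


def kl_gauss(mp, vp, mq, vq):
    """KL(N(mp, vp) || N(mq, vq)) for 1-D Gaussians given means and variances."""
    return 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)


def sym_kl(a, b):
    """Symmetrised KL divergence between two (mean, variance) pairs."""
    return kl_gauss(*a, *b) + kl_gauss(*b, *a)


def select_expert(experts):
    """Pick the expert whose class Gaussians are best separated.

    experts: list with one entry per expert; each entry is a list of
    (mean, variance) pairs, one per class of the new task, as seen in that
    expert's feature space. Assumed rule: maximise the smallest pairwise
    symmetrised KL divergence.
    """
    def separation(classes):
        return min(sym_kl(a, b) for a, b in combinations(classes, 2))

    return max(range(len(experts)), key=lambda i: separation(experts[i]))


experts = [
    [(0.0, 1.0), (0.5, 1.0)],  # expert 0: the two classes overlap heavily
    [(0.0, 1.0), (3.0, 1.0)],  # expert 1: the two classes are well separated
]
chosen = select_expert(experts)
```

Only the chosen expert would then be fine-tuned on the new task's data, which is the part of the method the abstract does state.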
# 生産におけるハイブリッド量子ソルバー : NISQ時代をどう成功させるか Hybrid Quantum Solvers in Production: how to succeed in the NISQ era? ( http://arxiv.org/abs/2401.10302v4 ) ライセンス: Link先を確認	Eneko Osaba, Esther Villar-Rodriguez, Aitor Gomez-Tejedor, Izaskun Oregi,	(参考訳) ハイブリッド量子コンピューティングは、量子コンピューティングの分野における現在と未来と考えられている。一時的な流行とは程遠く、この傾向はNISQ時代のデバイスの限界に対処するための単なる一時しのぎとは考えられない。両方のコンピューティングパラダイムをリンクする基盤は、今後も堅牢なままだ。この研究の貢献は2つある: まず、文献で最近発表された2つの異なる分類体系に頼って、最も頻繁に使用されるハイブリッド・ソルバのいくつかを記述し分類する。第二に、現在実運用にデプロイされており、実際の産業利用に近いことが示されている2つのソルバに特に焦点を当てる。これらのソルバは、D-WaveのHybrid Solver Serviceに含まれるLeapHybridBQMSamplerと、QuantagoniaのHybrid Solverである。4つの組合せ最適化問題をベンチマークとして用いて,両手法の性能を解析した。 Hybrid quantum computing is considered the present and the future within the field of quantum computing. Far from being a passing fad, this trend cannot be considered just a stopgap to address the limitations of NISQ-era devices. The foundations linking both computing paradigms will remain robust over time. The contribution of this work is twofold: first, we describe and categorize some of the most frequently used hybrid solvers, resorting to two different taxonomies recently published in the literature. Second, we focus on two solvers that are currently deployed in production and have proven to be close to real industrial use. These solvers are the LeapHybridBQMSampler contained in D-Wave's Hybrid Solver Service and Quantagonia's Hybrid Solver. We analyze the performance of both methods using four combinatorial optimization problems as benchmarks.	翻訳日:2024-03-20 23:31:36 公開日:2024-03-19
# 教師なし表現学習による量子アーキテクチャ探索 Quantum Architecture Search with Unsupervised Representation Learning ( http://arxiv.org/abs/2401.11576v2 ) ライセンス: Link先を確認	Yize Sun, Zixin Wu, Yunpu Ma, Volker Tresp,	(参考訳) 量子アーキテクチャ探索(QAS)における教師なし表現学習の利用は、ノイズ中間スケール量子(NISQ)デバイスにおける潜在的な量子優位性を実現するための最先端のアプローチである。ほとんどのQASアルゴリズムは、探索空間と探索アルゴリズムを組み合わせ、一般に、探索プロセス中に多数の量子回路を評価する必要がある。予測器に基づくQASアルゴリズムは、回路の性能をその構造に応じて直接推定することでこの問題を軽減することができる。しかし、高性能な予測器は、多くのラベル付き量子回路を得るのに非常に時間を要する。近年、古典的ニューラルネットワーク探索アルゴリズムArch2vecは、アーキテクチャ探索が、教師なし表現学習を探索プロセスから切り離すことの恩恵を享受できることを示し、私たちを刺激している。教師なしの表現学習が予測子なしでQASに役立つかどうかは、まだオープントピックである。本研究では、教師なし表現学習を用いたフレームワークQASを提案し、教師なしアーキテクチャ表現学習が、類似の接続と演算子による量子回路アーキテクチャのクラスタリングをいかに促進するかを可視化する。具体的には、QASのプロセスが教師なしアーキテクチャ表現学習から切り離され、学習された表現が異なる下流アプリケーションに直接適用できるようにする。さらに,本フレームワークは,多数のラベル付き量子回路の必要性を排除した予測器レスである。探索では,ReINFORCE と Bayesian Optimization の2つのアルゴリズムを用いて遅延表現を直接探索し,ランダム探索法と比較する。その結果,本フレームワークは,検索回数の限られた範囲で,より効率的に高い性能の候補回路を得ることができた。 Utilizing unsupervised representation learning for quantum architecture search (QAS) represents a cutting-edge approach poised to realize potential quantum advantage on Noisy Intermediate-Scale Quantum (NISQ) devices. Most QAS algorithms combine their search space and search algorithms together and thus generally require evaluating a large number of quantum circuits during the search process. Predictor-based QAS algorithms can alleviate this problem by directly estimating the performance of circuits according to their structures. However, a high-performance predictor generally requires very time-consuming labeling to obtain a large number of labeled quantum circuits. Recently, a classical neural architecture search algorithm Arch2vec inspires us by showing that architecture search can benefit from decoupling unsupervised representation learning from the search process. Whether unsupervised representation learning can help QAS without any predictor is still an open topic. 
In this work, we propose a framework, QAS with unsupervised representation learning, and visualize how unsupervised architecture representation learning encourages quantum circuit architectures with similar connections and operators to cluster together. Specifically, our framework decouples the QAS process from unsupervised architecture representation learning so that the learned representation can be directly applied to different downstream applications. Furthermore, our framework is predictor-free, eliminating the need for a large number of labeled quantum circuits. During the search process, we use two algorithms, REINFORCE and Bayesian Optimization, to search directly on the latent representation, and compare them with Random Search. The results show that our framework can obtain well-performing candidate circuits more efficiently within a limited number of searches.	翻訳日:2024-03-20 23:31:36 公開日:2024-03-19
# 古典的ハードハミルトニアン類の基底状態を解く多項式時間量子アルゴリズム A polynomial-time quantum algorithm for solving the ground states of a class of classically hard Hamiltonians ( http://arxiv.org/abs/2401.13946v3 ) ライセンス: Link先を確認	Zhong-Xia Shang, Zi-Han Chen, Chao-Yang Lu, Jian-Wei Pan, Ming-Cheng Chen,	(参考訳) 本研究では,古典的ハードハミルトニアン群の基底状態を解く多項式時間量子アルゴリズムを提案する。我々のアルゴリズムに現れた指数的スピードアップのメカニズムは、既存の全ての量子アルゴリズムとは異なる。この考え方は、純状態を表すために密度行列を使用するために$f: \rho\rightarrow \|\rho\rangle$という写像を導入することである。この写像は、$\rho$に対する測定から$\|\rho\rangle$の情報を得る効率的な方法を与えることで意味を成す。この写像の下で、リンドブラッドのマスター方程式(LME)は、自然な虚時間発展を含む非エルミート・ハミルトニアンを持つシュレーディンガー方程式となる。したがって、 LME の定常状態は、$L$ を LME のリウヴィリアン演算子として、$L^\dag L$ の基底状態に対応する。 LMEの実行時間は、$\zeta$を初期状態と基底状態の重なりとして$\mathcal{O}(\log(\zeta^{-1}))$でスケールし、他のアルゴリズムの$\mathcal{O}(\mathrm{poly}(\zeta^{-1}))$スケーリングと対照的であることを示す。ハミルトニアン$L^\dag L$は、LMEのシミュレーションが難しいと信じている場合、古典的なコンピュータでは難しいことが保証される。さらに、既知の基底エネルギー$E_0$を持つ任意の局所ハミルトニアン$H$に対して、$H-E_0=L^\dag L$であるような$L$が存在するかどうかを判定し、存在する場合にはそれを求めるための多項式時間古典的な手続きを与える。その後,アルゴリズムに現れる非線形力学を含む,アルゴリズムのいくつかの重要な側面を論じ,解析する。 In this work, we present a polynomial-time quantum algorithm for solving the ground states of a class of classically hard Hamiltonians. The mechanism of the exponential speedup that appeared in our algorithm is different from all existing quantum algorithms. The idea is to introduce a mapping $f: \rho\rightarrow \|\rho\rangle$ to use density matrices to represent pure states. We show that this mapping makes sense by giving an efficient method to obtain the information of $\|\rho\rangle$ from measurements on $\rho$. Under this mapping, the Lindblad master equation (LME) becomes a Schrödinger equation with a non-Hermitian Hamiltonian which contains natural imaginary time evolution. The steady state of the LME, therefore, corresponds to the ground state of $L^\dag L$ with $L$ the Liouvillian operator of the LME. 
We show that the runtime of the LME scales as $\mathcal{O}(\log(\zeta^{-1}))$, with $\zeta$ the overlap between the initial state and the ground state, compared with the $\mathcal{O}(\mathrm{poly}(\zeta^{-1}))$ scaling in other algorithms. The Hamiltonians $L^\dag L$ are guaranteed to be difficult for classical computers if we believe the simulation of LME is difficult. Further, for any given local Hamiltonian $H$ with known ground energy $E_0$, we give a polynomial-time classical procedure to decide whether there exists $L$ such that $H-E_0=L^\dag L$ and, if so, to find it. Later, we discuss and analyze several important aspects of the algorithm, including the non-linear dynamics that appear in the algorithm.	翻訳日:2024-03-20 23:31:36 公開日:2024-03-19
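The link between the steady state and the ground state stated in the abstract above can be spelled out with the standard vectorisation of a density matrix. This is a sketch of the textbook argument, not the paper's derivation: the unnormalised vector $|\rho\rangle\rangle$ is used in place of the paper's $\|\rho\rangle$, whose normalisation convention is not given in the abstract.

```latex
% Vectorisation: stack the matrix elements of \rho into a vector.
\rho=\sum_{i,j}\rho_{ij}\,|i\rangle\langle j|
\;\longmapsto\;
|\rho\rangle\rangle=\sum_{i,j}\rho_{ij}\,|i\rangle\otimes|j\rangle .

% The Lindblad master equation becomes a linear, Schr\"odinger-like equation
% generated by the (generally non-Hermitian) Liouvillian matrix L.
\frac{d}{dt}\,|\rho(t)\rangle\rangle = L\,|\rho(t)\rangle\rangle .

% A steady state is annihilated by L, hence it has zero energy under L^\dagger L.
L\,|\rho_{\mathrm{ss}}\rangle\rangle = 0
\;\Longrightarrow\;
\langle\langle\rho_{\mathrm{ss}}|\,L^{\dagger}L\,|\rho_{\mathrm{ss}}\rangle\rangle
=\bigl\|L\,|\rho_{\mathrm{ss}}\rangle\rangle\bigr\|^{2}=0 .

% Since L^\dagger L is positive semidefinite, zero is its lowest possible
% eigenvalue, so the steady state is a ground state of L^\dagger L.
L^{\dagger}L\succeq 0 .
```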
# Bayesian Nonparametricsがデータ駆動ロバスト最適化に挑戦 Bayesian Nonparametrics Meets Data-Driven Robust Optimization ( http://arxiv.org/abs/2401.15771v3 ) ライセンス: Link先を確認	Nicola Bariletto, Nhat Ho,	(参考訳) 機械学習と統計モデルのトレーニングは、しばしばデータ駆動型リスク基準の最適化を伴う。リスクは通常、経験的データ分布に関して計算されるが、これは分布の不確実性のため、貧弱で不安定なアウト・オブ・サンプル性能をもたらす可能性がある。分布的にロバストな最適化の精神において、ベイズ的非パラメトリック(ディリクレ過程)理論と、滑らかなあいまいさ-逆選好の最近の決定論的モデルからの洞察を組み合わせることによって、新しいロバストな基準を提案する。まず、標準正規化経験的リスク最小化技術との新たな接続を強調し、その中ではリッジとLASSOの回帰について述べる。そこで,理論上,頑健な最適化手法の性能に対する良好な有限サンプルと漸近的な統計的保証の存在を実証する。実用的な実装として、よく知られたディリクレプロセスの表現に基づいて、評価基準の抽出可能な近似を提案し、研究する。また, 基準値の滑らかさが, 標準勾配に基づく数値最適化につながることも示している。最後に、高次元の疎線形回帰、二分分類、ロバストな位置パラメータ推定タスクに適用することで、本手法の動作に関する洞察を提供する。 Training machine learning and statistical models often involves optimizing a data-driven risk criterion. The risk is usually computed with respect to the empirical data distribution, but this may result in poor and unstable out-of-sample performance due to distributional uncertainty. In the spirit of distributionally robust optimization, we propose a novel robust criterion by combining insights from Bayesian nonparametric (i.e., Dirichlet Process) theory and recent decision-theoretic models of smooth ambiguity-averse preferences. First, we highlight novel connections with standard regularized empirical risk minimization techniques, among which Ridge and LASSO regressions. Then, we theoretically demonstrate the existence of favorable finite-sample and asymptotic statistical guarantees on the performance of the robust optimization procedure. For practical implementation, we propose and study tractable approximations of the criterion based on well-known Dirichlet Process representations. We also show that the smoothness of the criterion naturally leads to standard gradient-based numerical optimization. Finally, we provide insights into the workings of our method by applying it to high-dimensional sparse linear regression, binary classification, and robust location parameter estimation tasks.	
翻訳日:2024-03-20 23:31:36 公開日:2024-03-19
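The abstract above mentions tractable approximations built on well-known Dirichlet Process representations without naming one. As an illustration only, the sketch below uses the truncated stick-breaking construction, which is one such standard representation. Whether the paper uses this particular construction is an assumption, and the function name is hypothetical.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking weights of a Dirichlet Process.

    Each step breaks off a Beta(1, alpha) fraction of the remaining stick.
    The returned weights are non-negative and sum to slightly less than 1;
    the missing mass is the part of the stick left after truncation.
    """
    weights = []
    remaining = 1.0
    for _ in range(truncation):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


rng = random.Random(0)  # fixed seed so the sketch is reproducible
weights = stick_breaking_weights(alpha=2.0, truncation=50, rng=rng)
```

A risk criterion could then be averaged over atoms drawn from the base measure with these weights, giving a finite approximation of an expectation under the process.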
# 3次元コンテンツ生成に関する総合調査 A Comprehensive Survey on 3D Content Generation ( http://arxiv.org/abs/2402.01166v2 ) ライセンス: Link先を確認	Jian Liu, Xiaoshui Huang, Tianyu Huang, Lu Chen, Yuenan Hou, Shixiang Tang, Ziwei Liu, Wanli Ouyang, Wangmeng Zuo, Junjun Jiang, Xianming Liu,	(参考訳) 近年、人工知能生成コンテンツ(AIGC)は、テキスト、画像、ビデオ、オーディオ、そして3Dといった様々な入力モダリティによって著しく進歩している。 3Dは現実世界の3D環境に最も近い視覚的モダリティであり、膨大な知識を持っている。 3Dコンテンツ生成は、学術的価値と実践的価値の両方を示し、同時に、恐ろしい技術的課題も提示する。本レビューは,3Dコンテンツ生成の急成長する領域内での開発を統合することを目的としている。具体的には,既存のアプローチを3Dネイティブ生成法,2D先行3D生成法,ハイブリッド3D生成法という3つのタイプに分類する新たな分類法を提案する。調査は主要な技術にまたがる約60の論文をカバーしている。さらに,現在の3Dコンテンツ生成技術の限界についても論じ,オープンな課題と将来的な方向性を指摘する。本調査と合わせて,3次元コンテンツ生成研究のリソースを提供するプロジェクトウェブサイトを開設した。プロジェクトページはhttps://github.com/hitcslj/Awesome-AIGC-3Dで公開されている。 Recent years have witnessed remarkable advances in artificial intelligence generated content(AIGC), with diverse input modalities, e.g., text, image, video, audio and 3D. The 3D is the most close visual modality to real-world 3D environment and carries enormous knowledge. The 3D content generation shows both academic and practical values while also presenting formidable technical challenges. This review aims to consolidate developments within the burgeoning domain of 3D content generation. Specifically, a new taxonomy is proposed that categorizes existing approaches into three types: 3D native generative methods, 2D prior-based 3D generative methods, and hybrid 3D generative methods. The survey covers approximately 60 papers spanning the major techniques. Besides, we discuss limitations of current 3D content generation techniques, and point out open challenges as well as promising directions for future work. Accompanied with this survey, we have established a project website where the resources on 3D content generation research are provided. The project page is available at https://github.com/hitcslj/Awesome-AIGC-3D.	翻訳日:2024-03-20 23:31:36 公開日:2024-03-19
# 深層学習に必要な算数的特徴相互作用 Arithmetic Feature Interaction Is Necessary for Deep Tabular Learning ( http://arxiv.org/abs/2402.02334v2 ) ライセンス: Link先を確認	Yi Cheng, Renjun Hu, Haochao Ying, Xing Shi, Jian Wu, Wei Lin,	(参考訳) 最近まで、表形式のデータに対するディープモデルの効果的な帰納バイアスの問題は未解決のままである。本稿では,表層学習に算術的特徴相互作用が必要であるという仮説を考察する。この点をテストするために、軽度の特徴相互作用を仮定した合成表式データセットを作成し、AMFormerと呼ばれる算術的特徴相互作用を実現する変換器アーキテクチャを検証した。その結果,AMFormerは微粒な表データモデリング,トレーニングにおけるデータ効率,一般化において,優れた性能を発揮することがわかった。これは、パラレルな加法的および乗法的注意演算子とプロンプトベースの最適化により、算術的に設計された特徴を持つ拡張空間における表のサンプルの分離を容易にする。実世界のデータに関する広範な実験は、AMFormerの一貫性のある有効性、効率、合理性も検証しており、表形式のデータに対する深い学習のための強力な帰納的バイアスを確立していることを示唆している。コードはhttps://github.com/aigc-apps/AMFormer で入手できる。 Until recently, the question of the effective inductive bias of deep models on tabular data has remained unanswered. This paper investigates the hypothesis that arithmetic feature interaction is necessary for deep tabular learning. To test this point, we create a synthetic tabular dataset with a mild feature interaction assumption and examine a modified transformer architecture enabling arithmetical feature interactions, referred to as AMFormer. Results show that AMFormer outperforms strong counterparts in fine-grained tabular data modeling, data efficiency in training, and generalization. This is attributed to its parallel additive and multiplicative attention operators and prompt-based optimization, which facilitate the separation of tabular samples in an extended space with arithmetically-engineered features. Our extensive experiments on real-world data also validate the consistent effectiveness, efficiency, and rationale of AMFormer, suggesting it has established a strong inductive bias for deep learning on tabular data. Code is available at https://github.com/aigc-apps/AMFormer.	翻訳日:2024-03-20 23:31:36 公開日:2024-03-19
# A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer ( http://arxiv.org/abs/2402.02464v2 ) ライセンス: Link先を確認	Zhangyang Gao, Daize Dong, Cheng Tan, Jun Xia, Bozhen Hu, Stan Z. Li,	(参考訳) 非ユークリッドグラフを純粋言語やユークリッドベクトルとしてモデル化することは可能か。非ユークリッド的性質はグラフモデリングにおいて長期にわたる課題を提起している。グラフをユークリッドベクトルとして符号化する最近のGNNとグラフフォーマーの取り組みにもかかわらず、元のグラフをベクトルから復元することは依然として困難である。本稿では,非ユークリッドグラフをユークリッド空間で学習可能なグラフ語に変換するGraph2Seqエンコーダと,元のグラフをグラフ語から再構成して情報等価性を確保するGraphGPTデコーダを紹介する。 1)グラフ表現学習に優れ、8/9のグラフ分類と回帰タスクにおける最先端の結果を達成する。 2) 事前訓練グラフGPTは,非条件グラフ生成と条件グラフ生成の両方を実行する能力によって,強力なグラフ生成器として機能する。 (3) Graph2Seq+GraphGPT はユークリッド空間におけるグラフの効果的な混合を可能にし、既知の非ユークリッド問題に打ち勝つ。 (4) 提案したエッジ中心のGPT事前学習タスクはグラフフィールドにおいて有効であり,表現と生成の両面での成功を裏付けるものである。 Can we model non-Euclidean graphs as pure language or even Euclidean vectors while retaining their inherent information? The non-Euclidean property have posed a long term challenge in graph modeling. Despite recent GNN and Graphformer efforts encoding graphs as Euclidean vectors, recovering original graph from the vectors remains a challenge. We introduce GraphsGPT, featuring a Graph2Seq encoder that transforms non-Euclidean graphs into learnable graph words in a Euclidean space, along with a GraphGPT decoder that reconstructs the original graph from graph words to ensure information equivalence. We pretrain GraphsGPT on 100M molecules and yield some interesting findings: (1) Pretrained Graph2Seq excels in graph representation learning, achieving state-of-the-art results on 8/9 graph classification and regression tasks. (2) Pretrained GraphGPT serves as a strong graph generator, demonstrated by its ability to perform both unconditional and conditional graph generation. (3) Graph2Seq+GraphGPT enables effective graph mixup in the Euclidean space, overcoming previously known non-Euclidean challenge. 
(4) Our proposed novel edge-centric GPT pretraining task is effective in graph fields, underscoring its success in both representation and generation.	翻訳日:2024-03-20 23:21:52 公開日:2024-03-19
# 拡張Open-Set Object DetectorによるクロスドメインFew-Shotオブジェクト検出 Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector ( http://arxiv.org/abs/2402.03094v2 ) ライセンス: Link先を確認	Yuqian Fu, Yu Wang, Yixuan Pan, Lian Huai, Xingyu Qiu, Zeyu Shangguan, Tong Liu, Yanwei Fu, Luc Van Gool, Xingqun Jiang,	(参考訳) 本稿では,最小限のラベル付きサンプルを用いた新規ドメイン向け高精度物体検出装置の開発を目指して,CD-FSODの挑戦的領域間多重ショット検出手法について検討する。 DE-ViTのようなトランスフォーマーベースのオープンセット検出器は、従来の数発の物体検出において有望であるが、CD-FSODへの一般化はまだ不明である。 1) このような開集合検出法はCD-FSODに容易に一般化できるのか? 2) もしそうでなければ、巨大なドメインギャップに直面したモデルをどのように拡張できるでしょうか? 最初の質問に答えるために、私たちは、ドメインギャップを理解するために、スタイル、クラス間分散(ICV)、定義不能境界(IB)などの手段を使用します。これらの測定値に基づいて,オブジェクト検出手法を評価するためのCD-FSODという新しいベンチマークを構築し,現在のアプローチの大部分がドメイン全体の一般化に失敗していることを明らかにする。技術的には, 性能低下は, 提案手法であるスタイル, ICV, IBと関連していると考えられる。そこで本研究では,これらの問題に対処する新しいモジュールをいくつか提案する。まず、学習可能なインスタンス機能は、初期固定インスタンスをターゲットカテゴリに整列し、特徴の識別性を向上する。第二に、インスタンス再重み付けモジュールは、わずかなIBを持つ高品質なインスタンスにより高い重要性を割り当てる。第3に、ドメインプロンプトは、意味内容を変更することなく想像領域を合成することにより、異なるスタイルに回復する機能を奨励する。これらの技術はCD-FSOD(CD-ViTO)用クロスドメインビジョントランスの開発に一括して寄与し、D-ViTベースで大幅に改善された。実験により,本モデルの有効性が検証された。すべてのデータセット、コード、モデルがコミュニティにリリースされる。 This paper studies the challenging cross-domain few-shot object detection (CD-FSOD), aiming to develop an accurate object detector for novel domains with minimal labeled examples. While transformer-based open-set detectors, such as DE-ViT, show promise in traditional few-shot object detection, their generalization to CD-FSOD remains unclear: 1) can such open-set detection methods easily generalize to CD-FSOD? 2) If not, how can models be enhanced when facing huge domain gaps? To answer the first question, we employ measures including style, inter-class variance (ICV), and indefinable boundaries (IB) to understand the domain gap. Based on these measures, we establish a new benchmark named CD-FSOD to evaluate object detection methods, revealing that most of the current approaches fail to generalize across domains. 
Technically, we observe that the performance decline is associated with our proposed measures: style, ICV, and IB. Consequently, we propose several novel modules to address these issues. First, the learnable instance features align initial fixed instances with target categories, enhancing feature distinctiveness. Second, the instance reweighting module assigns higher importance to high-quality instances with slight IB. Third, the domain prompter encourages features resilient to different styles by synthesizing imaginary domains without altering semantic contents. These techniques collectively contribute to the development of the Cross-Domain Vision Transformer for CD-FSOD (CD-ViTO), significantly improving upon the base DE-ViT. Experimental results validate the efficacy of our model. All datasets, codes, and models will be released to the community.	翻訳日:2024-03-20 23:21:52 公開日:2024-03-19
# EscherNet: スケーラブルなビュー合成のための生成モデル EscherNet: A Generative Model for Scalable View Synthesis ( http://arxiv.org/abs/2402.03908v2 ) ライセンス: Link先を確認	Xin Kong, Shikun Liu, Xiaoyang Lyu, Marwan Taher, Xiaojuan Qi, Andrew J. Davison,	(参考訳) ビュー合成のための多視点条件付き拡散モデルであるEscherNetを紹介する。 EscherNetは、暗黙的かつ生成的な3D表現と特殊なカメラ位置エンコーディングを組み合わせて学習し、任意の数の参照とターゲットビューの間のカメラ変換の正確かつ連続的な相対的制御を可能にする。 EscherNetは、ビュー合成における例外的な汎用性、柔軟性、スケーラビリティを提供する。3つの参照ビューから3つのターゲットビューへの固定数で訓練されているにもかかわらず、1つのコンシューマグレードのGPUで同時に100以上の一貫したターゲットビューを生成することができる。結果として、EscherNetはゼロショットの新規ビュー合成に対処するだけでなく、自然にシングルイメージとマルチイメージの3D再構成を統一し、これらの多様なタスクを単一のまとまりのあるフレームワークに統合する。 EscherNetは、個々の問題に特化された手法と比較しても、複数のベンチマークで最先端のパフォーマンスを実現していることを示す。この素晴らしい汎用性は、3Dビジョンのためにスケーラブルなニューラルアーキテクチャを設計するための新たな方向性を開く。プロジェクトページ: https://kxhit.github.io/EscherNet We introduce EscherNet, a multi-view conditioned diffusion model for view synthesis. EscherNet learns implicit and generative 3D representations coupled with a specialised camera positional encoding, allowing precise and continuous relative control of the camera transformation between an arbitrary number of reference and target views. EscherNet offers exceptional generality, flexibility, and scalability in view synthesis -- it can generate more than 100 consistent target views simultaneously on a single consumer-grade GPU, despite being trained with a fixed number of 3 reference views to 3 target views. As a result, EscherNet not only addresses zero-shot novel view synthesis, but also naturally unifies single- and multi-image 3D reconstruction, combining these diverse tasks into a single, cohesive framework. Our extensive experiments demonstrate that EscherNet achieves state-of-the-art performance in multiple benchmarks, even when compared to methods specifically tailored for each individual problem. This remarkable versatility opens up new directions for designing scalable neural architectures for 3D vision. Project page: https://kxhit.github.io/EscherNet.	翻訳日:2024-03-20 23:21:52 公開日:2024-03-19
# RA-Rec: LLMに基づくレコメンデーションのための効率的なID表現アライメントフレームワーク RA-Rec: An Efficient ID Representation Alignment Framework for LLM-based Recommendation ( http://arxiv.org/abs/2402.04527v2 ) ライセンス: Link先を確認	Xiaohan Yu, Li Zhang, Xin Zhao, Yue Wang, Zhongrui Ma,	(参考訳) 大規模言語モデル(LLM)は、最近、様々な自然言語処理タスクのための強力なツールとして登場し、LLMベースのRSと呼ばれる、LLMとレコメンデーションシステムを組み合わせる新たな潮流をもたらした。現在のアプローチは一般的に、ID直接利用パラダイムとID翻訳パラダイムという2つの主要なパラダイムに分類されるが、その中核的な弱点は推薦知識と一意性の欠如に起因する。この制限に対処するため,事前学習したID埋め込みをLLMに補完的に組み込む新しいパラダイムであるID表現を提案する。本稿では,複数のIDベースの手法やLLMアーキテクチャと互換性のある,LLMベースのレコメンデーションのための効率的なID表現アライメントフレームワークであるRA-Recを提案する。具体的には,ID埋め込みをソフトプロンプトとして扱い,新しいアライメントモジュールと,アライメントのために調整されたデータ構築を伴う効率的なチューニング手法を設計する。大規模な実験では、RA-Recが現在の最先端メソッドを大幅に上回り、最大3.0%のHitRate@100の絶対的改善を達成しつつ、10倍未満のトレーニングデータしか使用しないことを示す。 Large language models (LLM) have recently emerged as a powerful tool for a variety of natural language processing tasks, bringing a new surge of combining LLM with recommendation systems, termed as LLM-based RS. Current approaches generally fall into two main paradigms, the ID direct usage paradigm and the ID translation paradigm, noting their core weakness stems from lacking recommendation knowledge and uniqueness. To address this limitation, we propose a new paradigm, ID representation, which incorporates pre-trained ID embeddings into LLMs in a complementary manner. In this work, we present RA-Rec, an efficient ID representation alignment framework for LLM-based recommendation, which is compatible with multiple ID-based methods and LLM architectures. Specifically, we treat ID embeddings as soft prompts and design an innovative alignment module and an efficient tuning method with tailored data construction for alignment. Extensive experiments demonstrate RA-Rec substantially outperforms current state-of-the-art methods, achieving up to 3.0% absolute HitRate@100 improvements while utilizing less than 10x training data.	翻訳日:2024-03-20 23:21:52 公開日:2024-03-19
# 個別処理効果予測のためのコンフォーマルモンテカルロメタラーナー Conformal Monte Carlo Meta-learners for Predictive Inference of Individual Treatment Effects ( http://arxiv.org/abs/2402.04906v2 ) ライセンス: Link先を確認	Jef Jonkers, Jarne Verhaeghe, Glenn Van Wallendael, Luc Duchateau, Sofie Van Hoecke,	(参考訳) 治療効果と呼ばれる介入の効果の知識は、意思決定において最重要である。条件平均処理効果(CATE)推定器を用いて、この治療効果を推定するアプローチは、多くの場合、この治療効果の点推定しか提供せず、代わりにさらなる不確実性定量化がしばしば望まれる。そこで本研究では, 共形予測システム, モンテカルロサンプリング, CATEメタラーナを活用して, 個別化意思決定に有用な予測分布を生成する新しい手法であるCMCメタラーナを提案する。さらに,結果の雑音分布に関する特定の仮定が,これらの不確実性予測にどのように影響するかを示す。それにもかかわらず、CMCフレームワークは、真の個々の治療効果を推定するために、小さな間隔幅を維持しながら、強力な実験カバレッジを示す。 Knowledge of the effect of interventions, called the treatment effect, is paramount for decision-making. Approaches to estimating this treatment effect, e.g. by using Conditional Average Treatment Effect (CATE) estimators, often only provide a point estimate of this treatment effect, while additional uncertainty quantification is frequently desired instead. Therefore, we present a novel method, the Conformal Monte Carlo (CMC) meta-learners, leveraging conformal predictive systems, Monte Carlo sampling, and CATE meta-learners, to instead produce a predictive distribution usable in individualized decision-making. Furthermore, we show how specific assumptions on the noise distribution of the outcome heavily affect these uncertainty predictions. Nonetheless, the CMC framework shows strong experimental coverage while retaining small interval widths to provide estimates of the true individual treatment effect.	翻訳日:2024-03-20 23:21:52 公開日:2024-03-19
# 潜在因果規則の解明: 異常事象説明のための時間点過程アプローチ Unveiling Latent Causal Rules: A Temporal Point Process Approach for Abnormal Event Explanation ( http://arxiv.org/abs/2402.05946v2 ) ライセンス: Link先を確認	Yiling Kuang, Chao Yang, Yang Yang, Shuang Li,	(参考訳) 医療などのハイステークスなシステムでは、患者の健康状態の急激な変化など、異常な事象の背後にある因果的な理由を理解することが重要である。原因を解明することは、迅速な診断と正確な治療計画に役立つ。本稿では,観測事象を説明するために,「if-then」論理規則を発見する自動手法を提案する。興味のある事象をモデル化するために時間点過程を導入し、事象の発生を説明する潜在ルールの集合を発見する。これを実現するために、期待値最大化(EM)アルゴリズムを用いる。 Eステップでは、各事象が発見された各ルールによって説明される尤度を計算する。 Mステップでは、ルールセットとモデルパラメータの両方を更新し、尤度関数の下界を改善する。特に、ルールセットを微分可能な方法で最適化する。提案手法は,ルールの発見と根本原因の同定の両方において,正確な性能を示す。合成データセットおよび実際の医療データセットを用いて、有望な結果を示す。 In high-stakes systems such as healthcare, it is critical to understand the causal reasons behind unusual events, such as sudden changes in patient's health. Unveiling the causal reasons helps with quick diagnoses and precise treatment planning. In this paper, we propose an automated method for uncovering "if-then" logic rules to explain observational events. We introduce temporal point processes to model the events of interest, and discover the set of latent rules to explain the occurrence of events. To achieve this, we employ an Expectation-Maximization (EM) algorithm. In the E-step, we calculate the likelihood of each event being explained by each discovered rule. In the M-step, we update both the rule set and model parameters to enhance the likelihood function's lower bound. Notably, we optimize the rule set in a differential manner. Our approach demonstrates accurate performance in both discovering rules and identifying root causes. We showcase its promising results using synthetic and real healthcare datasets.	翻訳日:2024-03-20 23:21:52 公開日:2024-03-19
# LLM会話安全のための攻撃・防衛・評価 Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey ( http://arxiv.org/abs/2402.09283v2 ) ライセンス: Link先を確認	Zhichen Dong, Zhanhui Zhou, Chao Yang, Jing Shao, Yu Qiao,	(参考訳) 大規模言語モデル(LLM)は、いまや会話アプリケーションで一般的なものになっている。しかし, 有害な応答を生成する誤用リスクは深刻な社会的懸念を生じさせ, LLM会話の安全性に関する最近の研究を刺激している。そこで本調査では,LLM会話の安全性の3つの重要な側面,すなわち攻撃,防御,評価を対象に、最近の研究を包括的に概説する。我々のゴールは、LLM会話の安全性の理解を深め、この重要な課題のさらなる調査を促進する構造的な要約を提供することである。参照しやすいように、本調査で言及したすべての研究を我々の分類法に従って分類し、以下で公開している: https://github.com/niconi19/LLM-conversation-safety Large Language Models (LLMs) are now commonplace in conversation applications. However, their risks of misuse for generating harmful responses have raised serious societal concerns and spurred recent research on LLM conversation safety. Therefore, in this survey, we provide a comprehensive overview of recent studies, covering three critical aspects of LLM conversation safety: attacks, defenses, and evaluations. Our goal is to provide a structured summary that enhances understanding of LLM conversation safety and encourages further investigation into this important subject. For easy reference, we have categorized all the studies mentioned in this survey according to our taxonomy, available at: https://github.com/niconi19/LLM-conversation-safety.	翻訳日:2024-03-20 23:21:52 公開日:2024-03-19
# 心電図の時空間的関係を捉えるための仮面表現学習の指導 Guiding Masked Representation Learning to Capture Spatio-Temporal Relationship of Electrocardiogram ( http://arxiv.org/abs/2402.09450v3 ) ライセンス: Link先を確認	Yeongyeon Na, Minje Park, Yunwon Tae, Sunghoon Joo,	(参考訳) 心電図(ECG)は、心臓由来の電気信号を監視する診断ツールとして広く用いられている。近年の機械学習研究は,心電図信号を用いた各種疾患のスクリーニングに重点を置いている。しかし、ラベル付きECGデータが限られているため、疾患スクリーニングへの適用は困難である。自己教師付き学習(SSL)による一般的な表現の獲得はラベル付きデータの不足を克服するためのよく知られたアプローチであるが、ECG信号に固有の空間的・時間的関係を考慮せずにSSLをECGデータに素朴に適用すると、準最適な結果になる可能性がある。本稿では,マスクされた12誘導心電図データを再構成することで時空間特徴を学習するST-MEM(Spatio-Temporal Masked Electrocardiogram Modeling)を提案する。 ST-MEMは、不整脈分類タスクの様々な実験環境で、他のSSLベースラインメソッドよりも優れている。さらに,ST-MEMは様々な誘導(リード)の組み合わせに適応可能であることを示す。定量的および定性的な分析により、心電図データ内の時空間的関係を示す。私たちのコードは https://github.com/bakqui/ST-MEM で利用可能です。 Electrocardiograms (ECG) are widely employed as a diagnostic tool for monitoring electrical signals originating from a heart. Recent machine learning research efforts have focused on the application of screening various diseases using ECG signals. However, adapting to the application of screening disease is challenging in that labeled ECG data are limited. Achieving general representation through self-supervised learning (SSL) is a well-known approach to overcome the scarcity of labeled data; however, a naive application of SSL to ECG data, without considering the spatial-temporal relationships inherent in ECG signals, may yield suboptimal results. In this paper, we introduce ST-MEM (Spatio-Temporal Masked Electrocardiogram Modeling), designed to learn spatio-temporal features by reconstructing masked 12-lead ECG data. ST-MEM outperforms other SSL baseline methods in various experimental settings for arrhythmia classification tasks. Moreover, we demonstrate that ST-MEM is adaptable to various lead combinations. Through quantitative and qualitative analysis, we show a spatio-temporal relationship within ECG data. Our code is available at https://github.com/bakqui/ST-MEM.	翻訳日:2024-03-20 23:21:52 公開日:2024-03-19
# 解釈可能なリスク予測による診断誤差の低減に向けて Towards Reducing Diagnostic Errors with Interpretable Risk Prediction ( http://arxiv.org/abs/2402.10109v2 ) ライセンス: Link先を確認	Denis Jered McInerney, William Dickinson, Lucy C. Flynn, Andrea C. Young, Geoffrey S. Young, Jan-Willem van de Meent, Byron C. Wallace,	(参考訳) 臨床医が患者の電子健康記録(EHR)内の関連情報に容易にアクセスできないため、多くの診断ミスが発生する。本研究は, 特定の診断のリスクの増大または低下を示す証拠を患者のEHRデータからLLMを用いて特定する手法を提案する。最終的な目的は、証拠へのアクセスを向上させ、診断ミスを減らすことである。特に, 臨床医がいまだ不確実な時点において, 証拠に裏付けられた個別化リスク推定による予測を行うニューラル付加モデルを提案し, 診断の遅れや不完全な鑑別診断に起因するエラーを軽減することを目的とする。このようなモデルをトレーニングするには、最終的な「真の」診断について、時間的にきめ細かな振り返りラベルを推測する必要がある。我々はこれを LLM を用いて行い, 入力テキストが確実な診断が可能になる前のものであることを保証する。我々は LLM を用いて初期の証拠プールを検索するが、その後、モデルによって学習された相関関係に従ってこの証拠集合を絞り込む。臨床医が事前に定義された鑑別診断リストの中から判断する際にどのように利用されうるかをシミュレートし,本手法の有用性を詳細に評価する。 Many diagnostic errors occur because clinicians cannot easily access relevant information in patient Electronic Health Records (EHRs). In this work we propose a method to use LLMs to identify pieces of evidence in patient EHR data that indicate increased or decreased risk of specific diagnoses; our ultimate aim is to increase access to evidence and reduce diagnostic errors. In particular, we propose a Neural Additive Model to make predictions backed by evidence with individualized risk estimates at time-points where clinicians are still uncertain, aiming to specifically mitigate delays in diagnosis and errors stemming from an incomplete differential. To train such a model, it is necessary to infer temporally fine-grained retrospective labels of eventual "true" diagnoses. We do so with LLMs, to ensure that the input text is from before a confident diagnosis can be made. We use an LLM to retrieve an initial pool of evidence, but then refine this set of evidence according to correlations learned by the model. We conduct an in-depth evaluation of the usefulness of our approach by simulating how it might be used by a clinician to decide between a pre-defined list of differential diagnoses.	翻訳日:2024-03-20 23:21:52 公開日:2024-03-19
# 名詞句における主要部の最適配置 : 指示詞・数詞・形容詞・名詞の場合 The optimal placement of the head in the noun phrase. The case of demonstrative, numeral, adjective and noun ( http://arxiv.org/abs/2402.10311v4 ) ライセンス: Link先を確認	Ramon Ferrer-i-Cancho,	(参考訳) 文の語順は複数の原理によって形作られる。単一主要部の統語的依存構造において、統語的依存距離最小化の原理はサプライザル最小化(または予測可能性最大化)の原理と矛盾する:前者は主要部を線形配置の中心に置くべきであると予測する一方で、後者は主要部を一方の端(最初または最後)に配置するべきであると予測する。サプライザル最小化(または予測可能性最大化)がいつ統語的依存距離最小化を上回るかが重要な問題である。単一主要部構造の文脈では、次の2つの条件が満たされた場合にこれがより起こりやすいと予測されている。 (a) 関係する単語が少ない、(b) 単語が短い。ここでは、指示詞、数詞、形容詞、名詞からなる名詞句でこの予測を検証する。諸言語で好まれる語順全体を通じて、名詞は両端のいずれかに置かれる傾向があり、理論的な予測が裏付けられる。また、反局所性効果の証拠も示す。すなわち、好まれる語順における統語的依存距離は、偶然に予想されるよりも長い。 The word order of a sentence is shaped by multiple principles. The principle of syntactic dependency distance minimization is in conflict with the principle of surprisal minimization (or predictability maximization) in single head syntactic dependency structures: while the former predicts that the head should be placed at the center of the linear arrangement, the latter predicts that the head should be placed at one of the ends (either first or last). A critical question is when surprisal minimization (or predictability maximization) should surpass syntactic dependency distance minimization. In the context of single head structures, it has been predicted that this is more likely to happen when two conditions are met, i.e. (a) fewer words are involved and (b) words are shorter. Here we test the prediction on the noun phrase when it is composed of a demonstrative, a numeral, an adjective and a noun. We find that, across preferred orders in languages, the noun tends to be placed at one of the ends, confirming the theoretical prediction. We also show evidence of anti locality effects: syntactic dependency distances in preferred orders are longer than expected by chance.	翻訳日:2024-03-20 23:12:03 公開日:2024-03-19
# TuneTables: スケーラブルなPrior-Data Fitted Networksのためのコンテキスト最適化 TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks ( http://arxiv.org/abs/2402.11137v2 ) ライセンス: Link先を確認	Benjamin Feuer, Robin Tibor Schirrmeister, Valeriia Cherepanova, Chinmay Hegde, Frank Hutter, Micah Goldblum, Niv Cohen, Colin White,	(参考訳) 表形式の分類は伝統的にゼロからの学習に依存してきたが、最近のブレークスルーであるPFN(prior-data fitted networks)がこのアプローチに挑戦している。大規模言語モデルと同様に、PFNは事前学習とコンテキスト内学習を利用して、1つのフォワードパスで新しいタスクの強力なパフォーマンスを達成する。しかし、現在のPFNには、広く普及することを妨げる制限がある。特にTabPFNは、小さな表形式データセットで非常に強力なパフォーマンスを達成するが、サイズが1000を超えるデータセットに対する予測は想定されていない。本研究では,PFNのためのコンテキスト最適化手法を開発することにより、これらの制約を克服し,PFNの性能を大幅に向上する。具体的には、大規模データセットをより小さな学習済みコンテキストに圧縮する新しいプロンプトチューニング戦略であるTuneTablesを提案する。 TuneTablesはTabPFNを、TabPFNよりもかなり低い推論時間を持ちながら、より大きなデータセットにおいて最先端の表形式分類手法と競合できるようにスケールさせる。さらに、TuneTablesは解釈可能性ツールとして使用することができ、公平性目標を最適化することでバイアスを軽減することもできることを示す。 While tabular classification has traditionally relied on from-scratch training, a recent breakthrough called prior-data fitted networks (PFNs) challenges this approach. Similar to large language models, PFNs make use of pretraining and in-context learning to achieve strong performance on new tasks in a single forward pass. However, current PFNs have limitations that prohibit their widespread adoption. Notably, TabPFN achieves very strong performance on small tabular datasets but is not designed to make predictions for datasets of size larger than 1000. In this work, we overcome these limitations and substantially improve the performance of PFNs by developing context optimization techniques for PFNs. Specifically, we propose TuneTables, a novel prompt-tuning strategy that compresses large datasets into a smaller learned context. TuneTables scales TabPFN to be competitive with state-of-the-art tabular classification methods on larger datasets, while having a substantially lower inference time than TabPFN. 
Furthermore, we show that TuneTables can be used as an interpretability tool and can even be used to mitigate biases by optimizing a fairness objective.	翻訳日:2024-03-20 23:12:03 公開日:2024-03-19
# MatPlotAgent: LLMに基づくエージェント科学データの可視化手法と評価 MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization ( http://arxiv.org/abs/2402.11453v3 ) ライセンス: Link先を確認	Zhiyu Yang, Zihan Zhou, Shuo Wang, Xin Cong, Xu Han, Yukun Yan, Zhenghao Liu, Zhixing Tan, Pengyuan Liu, Dong Yu, Zhiyuan Liu, Xiaodong Shi, Maosong Sun,	(参考訳) 科学データ可視化は、複雑な情報の直接表示を可能にし、研究者が暗黙のパターンを識別するのを支援することによって、研究において重要な役割を担っている。その重要性にもかかわらず、科学データの可視化にLarge Language Models (LLMs) を用いることは、まだほとんど研究されていない。本研究では,科学データ可視化タスクの自動化を目的とした,効率的でモデルに依存しないLLMエージェントフレームワークであるMatPlotAgentを紹介する。コードLLMとマルチモーダルLLMの両方の機能を活用するMatPlotAgentは、クエリ理解、反復デバッグによるコード生成、エラー修正のための視覚的フィードバックメカニズムの3つのコアモジュールで構成されている。この分野でのベンチマークの欠如に対処するため、100の人間検証済みテストケースからなる高品質なベンチマークであるMatPlotBenchを紹介する。さらに,GPT-4Vを自動評価に用いるスコアリング手法を提案する。実験の結果,MatPlotAgentは商用モデルとオープンソースモデルの両方を含む様々なLLMの性能を向上させることができることがわかった。さらに,提案した評価手法は人手による注釈付きスコアと強い相関関係を示す。 Scientific data visualization plays a crucial role in research by enabling the direct display of complex information and assisting researchers in identifying implicit patterns. Despite its importance, the use of Large Language Models (LLMs) for scientific data visualization remains rather unexplored. In this study, we introduce MatPlotAgent, an efficient model-agnostic LLM agent framework designed to automate scientific data visualization tasks. Leveraging the capabilities of both code LLMs and multi-modal LLMs, MatPlotAgent consists of three core modules: query understanding, code generation with iterative debugging, and a visual feedback mechanism for error correction. To address the lack of benchmarks in this field, we present MatPlotBench, a high-quality benchmark consisting of 100 human-verified test cases. Additionally, we introduce a scoring approach that utilizes GPT-4V for automatic evaluation. Experimental results demonstrate that MatPlotAgent can improve the performance of various LLMs, including both commercial and open-source models. 
Furthermore, the proposed evaluation method shows a strong correlation with human-annotated scores.	翻訳日:2024-03-20 23:12:03 公開日:2024-03-19
# ガウス過程ニューラル付加モデル Gaussian Process Neural Additive Models ( http://arxiv.org/abs/2402.12518v2 ) ライセンス: Link先を確認	Wei Zhang, Brian Barr, John Paisley,	(参考訳) ディープニューラルネットワークは多くの分野に革命をもたらしたが、そのブラックボックスの性質は、解釈可能なモデルと説明可能なモデルを必要とする医療や金融などの分野で広く採用されるのを防ぐこともある。ニューラル付加モデル(NAMs)の最近の発展は、表付きデータセットの解釈可能な深層学習の方向への大きな一歩である。本稿では,ガウス過程ニューラル付加モデル (GP-NAM) と呼ばれる,ランダムフーリエ特徴によるガウス過程の単一層ニューラルネットワーク構築を用いたNAMの新しいサブクラスを提案する。 GP-NAMは凸目的関数と、特徴次元と線形に成長する訓練可能なパラメータの数が有利である。 GPは複雑な非パラメトリックな単変数関数を学習するのに適しているため、より深いNAMアプローチと比較してパフォーマンスが損なわれることはない。 GP-NAMの複数の表付きデータセットにおける性能を実証し,パラメータ数を大幅に削減して,分類タスクと回帰タスクの両方において同等あるいはより良い性能が得られることを示した。 Deep neural networks have revolutionized many fields, but their black-box nature also occasionally prevents their wider adoption in fields such as healthcare and finance, where interpretable and explainable models are required. The recent development of Neural Additive Models (NAMs) is a significant step in the direction of interpretable deep learning for tabular datasets. In this paper, we propose a new subclass of NAMs that use a single-layer neural network construction of the Gaussian process via random Fourier features, which we call Gaussian Process Neural Additive Models (GP-NAM). GP-NAMs have the advantage of a convex objective function and number of trainable parameters that grows linearly with feature dimensionality. It suffers no loss in performance compared to deeper NAM approaches because GPs are well-suited for learning complex non-parametric univariate functions. We demonstrate the performance of GP-NAM on several tabular datasets, showing that it achieves comparable or better performance in both classification and regression tasks with a large reduction in the number of parameters.	翻訳日:2024-03-20 23:12:03 公開日:2024-03-19
# シャトリングを備えた2$\times$N量子ビット配列における早期耐故障性に向けて Towards early fault tolerance on a 2$\times$N array of qubits equipped with shuttling ( http://arxiv.org/abs/2402.12599v2 ) ライセンス: Link先を確認	Adam Siegel, Armands Strikis, Michael Fogarty,	(参考訳) 局所的に相互作用する量子ビットの2次元グリッドが、フォールトトレラント量子コンピューティングを実現するための有望なプラットフォームであることはよく理解されている。しかし、近い将来においては、より低次元の構造を開発する方が困難が少ない可能性がある。本稿では,そのような制約付きアーキテクチャも耐故障性をサポートできることを示す。特に,配列の行に沿って論理情報をシャトリング(移送)することで非隣接量子ビット間の相互作用が可能となる、2$\times$Nの量子ビット配列を検討する。この設定の明らかな制約にもかかわらず、エラー訂正が可能であることを示し、このプラットフォームに自然に適合する符号のクラスを特定する。我々の要求を満たすと考えられる量子ビットの実用的な例としてシリコンスピン量子ビットに着目し,表面符号による完全な普遍量子計算を実現するためのプロトコルを提供するとともに,シリコンスピン量子ビットデバイスに特有の追加制約にも対処する。数値シミュレーションにより,現実的な雑音モデルを用いて本アーキテクチャの性能を評価し,表面符号とより複雑なqLDPC符号の両方が、古典的に計算困難な領域で量子アルゴリズムを実行できる水準までゲートおよびシャトリングのノイズを効率的に抑制することを実証した。この研究により、古典的マシンを上回る量子アルゴリズムの実行に一歩近づいた。 It is well understood that a two-dimensional grid of locally-interacting qubits is a promising platform for achieving fault tolerant quantum computing. However in the near-future, it may prove less challenging to develop lower dimensional structures. In this paper, we show that such constrained architectures can also support fault tolerance; specifically we explore a 2$\times$N array of qubits where the interactions between non-neighbouring qubits are enabled by shuttling the logical information along the rows of the array. Despite the apparent constraints of this setup, we demonstrate that error correction is possible and identify the classes of codes that are naturally suited to this platform. Focusing on silicon spin qubits as a practical example of qubits believed to meet our requirements, we provide a protocol for achieving full universal quantum computation with the surface code, while also addressing the additional constraints that are specific to a silicon spin qubit device. 
Through numerical simulations, we evaluate the performance of this architecture using a realistic noise model, demonstrating that both surface code and more complex qLDPC codes efficiently suppress gate and shuttling noise to a level that allows for the execution of quantum algorithms within the classically intractable regime. This work thus brings us one step closer to the execution of quantum algorithms that outperform classical machines.	翻訳日:2024-03-20 23:12:03 公開日:2024-03-19
# ユーザ行動モデリングと確率計画による大型電気自動車充電ステーションの制御 Controlling Large Electric Vehicle Charging Stations via User Behavior Modeling and Stochastic Programming ( http://arxiv.org/abs/2402.13224v3 ) ライセンス: Link先を確認	Alban Puech, Tristan Rigaut, William Templier, Maud Tournoud,	(参考訳) 本稿では,スロット電力制限,契約しきい値超過によるペナルティ,電気自動車(EV)の早期切断といった実世界の制約を取り入れた電気自動車充電ステーション(EVCS)モデルを提案する。本稿では,不確実性下でのEVCS制御問題を定式化し,ユーザが提供する情報を活用する2つの多段階確率計画法、すなわちモデル予測制御と2段階確率計画法を実装する。このモデルは、充電セッションの開始時刻と終了時刻、およびエネルギー需要における不確実性に対処する。滞在時間依存の確率過程に基づくユーザ行動モデルは、顧客満足度を維持しながらコスト削減を促進する。提案する2つの手法の利点は、実世界のデータセットを用いた22日間のシミュレーションにおいて、2つのベースラインとの比較で示される。 2段階のアプローチは、最適化においてより広範な不確実性シナリオを考慮することで、早期切断に対する堅牢性を示す。電力コストよりもユーザ満足度を優先するアルゴリズムは,業界標準ベースラインと比較して,2つのユーザ満足度指標において20%と36%の改善を実現している。さらに、コストとユーザ満足度の最良のバランスを達成するアルゴリズムは、非予見性制約を緩和した理論上最適なベースラインと比較して相対的なコスト増加がわずか3%にとどまり、用いた2つの満足度指標ではユーザ満足度の性能の94%と84%に達する。 This paper introduces an Electric Vehicle Charging Station (EVCS) model that incorporates real-world constraints, such as slot power limitations, contract threshold overruns penalties, or early disconnections of electric vehicles (EVs). We propose a formulation of the problem of EVCS control under uncertainty, and implement two Multi-Stage Stochastic Programming approaches that leverage user-provided information, namely, Model Predictive Control and Two-Stage Stochastic Programming. The model addresses uncertainties in charging session start and end times, as well as in energy demand. A user's behavior model based on a sojourn-time-dependent stochastic process enhances cost reduction while maintaining customer satisfaction. The benefits of the two proposed methods are showcased against two baselines over a 22-day simulation using a real-world dataset. The two-stage approach demonstrates robustness against early disconnections by considering a wider range of uncertainty scenarios for optimization. 
The algorithm prioritizing user satisfaction over electricity cost achieves a 20% and 36% improvement in two user satisfaction metrics compared to an industry-standard baseline. Additionally, the algorithm striking the best balance between cost and user satisfaction exhibits a mere 3% relative cost increase compared to the theoretically optimal baseline - for which the nonanticipativity constraint is relaxed - while attaining 94% and 84% of the user satisfaction performance in the two used satisfaction metrics.	翻訳日:2024-03-20 23:12:03 公開日:2024-03-19
# KorNAT:韓国の社会価値と共通知識のためのLLMアライメントベンチマーク KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge ( http://arxiv.org/abs/2402.13605v4 ) ライセンス: Link先を確認	Jiyoung Lee, Minwoo Kim, Seungho Kim, Junghwan Kim, Seunghyun Won, Hwaran Lee, Edward Choi,	(参考訳) 大きな言語モデル(LLM)が特定の国に効果的に展開されるためには、その国の文化と基本的な知識を理解する必要がある。この目的のために,社会価値アライメントと共通知識アライメントという2つの側面から,LLMと対象国間のアライメントを測定する全国アライメントを導入する。社会的価値のアライメントは、モデルがいかに国家固有の社会的価値を理解するかを評価する一方、共通の知識のアライメントは、モデルが国家に関連する基本的な知識をいかに捉えるかを調べる。我々は韓国と国家の整合性を測定する最初のベンチマークであるKorNATを構築した。社会価値データセットについて,6,174人の韓国人参加者を対象とした大規模調査から,基礎的真理ラベルを得た。共通知識データセットについて,韓国の教科書とGED参照資料に基づくサンプルを構築した。 KorNATには、それぞれ社会的価値と共通知識に関する4Kと6Kの多重選択質問が含まれている。我々のデータセット作成プロセスは、統計的サンプリング理論に基づいて慎重に設計され、複数ラウンドの人間によるレビューを通して洗練されている。 7つのLLM実験の結果, 基準値に適合するモデルはごくわずかであり, さらなる拡張の可能性を示した。 KorNATは、データセットの品質評価を専門とする政府機関による評価を通過させた後、政府の承認を受けた。データセットのサンプルと詳細な評価プロトコルはhttps://selectstar.ai/ko/papers-national-alignmentに記載されている。 For Large Language Models (LLMs) to be effectively deployed in a specific country, they must possess an understanding of the nation's culture and basic knowledge. To this end, we introduce National Alignment, which measures an alignment between an LLM and a targeted country from two aspects: social value alignment and common knowledge alignment. Social value alignment evaluates how well the model understands nation-specific social values, while common knowledge alignment examines how well the model captures basic knowledge related to the nation. We constructed KorNAT, the first benchmark that measures national alignment with South Korea. For the social value dataset, we obtained ground truth labels from a large-scale survey involving 6,174 unique Korean participants. For the common knowledge dataset, we constructed samples based on Korean textbooks and GED reference materials. KorNAT contains 4K and 6K multiple-choice questions for social value and common knowledge, respectively. 
Our dataset creation process is meticulously designed and based on statistical sampling theory and was refined through multiple rounds of human review. The experiment results of seven LLMs reveal that only a few models met our reference score, indicating a potential for further enhancement. KorNAT has received government approval after passing an assessment conducted by a government-affiliated organization dedicated to evaluating dataset quality. Samples and detailed evaluation protocols of our dataset can be found in https://selectstar.ai/ko/papers-national-alignment	翻訳日:2024-03-20 23:12:03 公開日:2024-03-19
# 建設廃棄物運搬トラックのGPSデータを用いた土木関連箇所の分類: 成都のケーススタディ Using construction waste hauling trucks' GPS data to classify earthwork-related locations: A Chengdu case study ( http://arxiv.org/abs/2402.14698v2 ) ライセンス: Link先を確認	Lei Yu, Ke Han,	(参考訳) 建設現場、土砂処分場、コンクリートミキシングステーションなどの土木関連箇所(ERL)は、都市の粉塵汚染(粒子状物質)の主な原因である。 ERLの効果的な管理は不可欠であり、市内のこれらの場所をタイムリーかつ効率的に追跡する必要がある。本研究の目的は,16,000台以上の建設廃棄物運搬トラック(CWHT)のGPS軌跡データと,地理,土地被覆,POI,交通の各次元を含む58の都市特徴を用いて都市のERLを識別・分類することである。いくつかの機械学習モデルを比較し,中国成都市における実世界データを用いて、様々な時空間的特徴が分類性能に及ぼす影響を検討した。その結果、限られた数の特徴で77.8%の分類精度が達成できることが示された。この分類フレームワークは成都のAlpha MAPSシステムに実装され、2023年12月に市内で724の建設現場・土砂処分場、48のコンクリートミキシングステーション、80のトラック駐車場所の特定に成功した。これにより地方自治体は低い人件費で都市の粉塵汚染を効果的に管理できるようになった。 Earthwork-related locations (ERLs), such as construction sites, earth dumping grounds, and concrete mixing stations, are major sources of urban dust pollution (particulate matter). The effective management of ERLs is crucial and requires timely and efficient tracking of these locations throughout the city. This work aims to identify and classify urban ERLs using GPS trajectory data of over 16,000 construction waste hauling trucks (CWHTs), as well as 58 urban features encompassing geographic, land cover, POI and transport dimensions. We compare several machine learning models and examine the impact of various spatial-temporal features on classification performance using real-world data in Chengdu, China. The results demonstrate that 77.8% classification accuracy can be achieved with a limited number of features. This classification framework was implemented in the Alpha MAPS system in Chengdu, which successfully identified 724 construction sites/earth dumping grounds, 48 concrete mixing stations, and 80 truck parking locations in the city during December 2023. This has enabled local authorities to effectively manage urban dust pollution at low personnel costs.	翻訳日:2024-03-20 23:12:03 公開日:2024-03-19
# 光線としてのカメラ: 光線拡散によるポーズ推定 Cameras as Rays: Pose Estimation via Ray Diffusion ( http://arxiv.org/abs/2402.14817v2 ) ライセンス: Link先を確認	Jason Y. Zhang, Amy Lin, Moneish Kumar, Tzu-Hsuan Yang, Deva Ramanan, Shubham Tulsiani,	(参考訳) カメラポーズの推定は3次元再構成の基本的な課題であり, まばらにサンプリングされたビュー(10枚未満)では依然として困難である。カメラ外部パラメータのグローバルなパラメータ化をトップダウンで予測する既存のアプローチとは対照的に,カメラを光線の束として扱うカメラポーズの分散表現を提案する。この表現は、空間的な画像特徴との密結合を可能にし、ポーズ精度を向上させる。この表現は集合レベルのトランスフォーマーに自然に適しており、画像パッチを対応する光線にマッピングする回帰ベースのアプローチを開発する。スパースビューのポーズ推論に内在する不確かさを捉えるため,このアプローチをデノイジング拡散モデルの学習に適応させ、性能を向上させつつ尤もらしいモードのサンプリングを可能にする。提案手法は回帰ベースと拡散ベースの両方で,CO3Dのカメラポーズ推定において最先端の性能を示し,未知の物体カテゴリや実環境(in-the-wild)での撮影にも一般化する。 Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views (<10). In contrast to existing approaches that pursue top-down prediction of global parametrizations of camera extrinsics, we propose a distributed representation of camera pose that treats a camera as a bundle of rays. This representation allows for a tight coupling with spatial image features improving pose precision. We observe that this representation is naturally suited for set-level transformers and develop a regression-based approach that maps image patches to corresponding rays. To capture the inherent uncertainties in sparse-view pose inference, we adapt this approach to learn a denoising diffusion model which allows us to sample plausible modes while improving performance. Our proposed methods, both regression- and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D while generalizing to unseen object categories and in-the-wild captures.	翻訳日:2024-03-20 23:12:03 公開日:2024-03-19
# GVA: 単眼映像から鮮明な3Dガウスアバターを再構築 GVA: Reconstructing Vivid 3D Gaussian Avatars from Monocular Videos ( http://arxiv.org/abs/2402.16607v2 ) ライセンス: Link先を確認	Xinqi Liu, Chenming Wu, Jialun Liu, Xing Liu, Jinbo Wu, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang,	(参考訳) 本稿では,単眼ビデオ入力から鮮明な3Dガウスアバターの作成を容易にする新しい手法(GVA)を提案する。私たちのイノベーションは、高忠実な人体再構成を実現し、3Dガウスを人間の皮膚表面と正確に整合させるという、複雑な課題に対処することにあります。本論文の重要な貢献は2つある。まず,法線マップとシルエットを整列させることで手足のポーズ精度を向上させるポーズ改善手法を提案する。精密なポーズは、正確な形状と外観の復元に不可欠である。第2に、3次元ガウス点とアバター表面との正確なアライメントを保証する新しい表面誘導再初期化法により、これまで3次元ガウスアバターの品質を低下させていたアンバランスアグリゲーションと初期化バイアスの問題に対処する。実験により,提案手法は高忠実かつ鮮明な3次元ガウスアバター再構成を実現することが示された。広範な実験的分析により性能を質的かつ定量的に検証し、人体と手のポーズのきめ細かい制御を提供しながら、フォトリアリスティックな新規ビュー合成において最先端のパフォーマンスを達成することを実証する。プロジェクトページ: https://3d-aigc.github.io/GVA/ In this paper, we present a novel method that facilitates the creation of vivid 3D Gaussian avatars from monocular video inputs (GVA). Our innovation lies in addressing the intricate challenges of delivering high-fidelity human body reconstructions and aligning 3D Gaussians with human skin surfaces accurately. The key contributions of this paper are twofold. Firstly, we introduce a pose refinement technique to improve hand and foot pose accuracy by aligning normal maps and silhouettes. Precise pose is crucial for correct shape and appearance reconstruction. Secondly, we address the problems of unbalanced aggregation and initialization bias that previously diminished the quality of 3D Gaussian avatars, through a novel surface-guided re-initialization method that ensures accurate alignment of 3D Gaussian points with avatar surfaces. Experimental results demonstrate that our proposed method achieves high-fidelity and vivid 3D Gaussian avatar reconstruction. Extensive experimental analyses validate the performance qualitatively and quantitatively, demonstrating that it achieves state-of-the-art performance in photo-realistic novel view synthesis while offering fine-grained control over the human body and hand pose. 
Project page: https://3d-aigc.github.io/GVA/.	翻訳日:2024-03-20 23:12:03 公開日:2024-03-19
# 圧縮領域に向けた強調バイアスの緩和による圧縮画像の品質向上 Enhancing Quality of Compressed Images by Mitigating Enhancement Bias Towards Compression Domain ( http://arxiv.org/abs/2402.17200v3 ) ライセンス: Link先を確認	Qunliang Xing, Mai Xu, Shengxi Li, Xin Deng, Meisong Zheng, Huaida Liu, Ying Chen,	(参考訳) 圧縮画像の既存の品質向上手法は、実画像を生成するために、拡張領域を原領域に整合させることに重点を置いている。しかし、これらの手法は圧縮領域に対して広範に拡張バイアスを示し、必然的に原領域よりも現実的であると考えている。このバイアスにより、強調画像は圧縮された画像とよく似ているため、知覚品質は低下する。本稿では,このバイアスを緩和し,圧縮画像の品質を高めるための,シンプルで効果的な手法を提案する。提案手法では,圧縮画像をキー条件とする条件判別器を用い,領域分割規則を組み込んで圧縮領域から拡張領域を積極的に遠ざける。この2つの戦略により,提案手法は圧縮領域に対する識別を可能にし,拡張領域を生領域に近づける。総合的な品質評価は,提案手法が推論オーバーヘッドを発生させることなく,他の最先端手法よりも優れていることを示す。 Existing quality enhancement methods for compressed images focus on aligning the enhancement domain with the raw domain to yield realistic images. However, these methods exhibit a pervasive enhancement bias towards the compression domain, inadvertently regarding it as more realistic than the raw domain. This bias makes enhanced images closely resemble their compressed counterparts, thus degrading their perceptual quality. In this paper, we propose a simple yet effective method to mitigate this bias and enhance the quality of compressed images. Our method employs a conditional discriminator with the compressed image as a key condition, and then incorporates a domain-divergence regularization to actively distance the enhancement domain from the compression domain. Through this dual strategy, our method enables the discrimination against the compression domain, and brings the enhancement domain closer to the raw domain. Comprehensive quality evaluations confirm the superiority of our method over other state-of-the-art methods without incurring inference overheads.	翻訳日:2024-03-20 23:12:03 公開日:2024-03-19
# MATHSENSEI: 数学的推論のためのツール拡張大型言語モデル MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning ( http://arxiv.org/abs/2402.17231v2 ) ライセンス: Link先を確認	Debrup Das, Debopriyo Banerjee, Somak Aditya, Ashish Kulkarni,	(参考訳) ツール強化された大規模言語モデル(TALM)は、大きな言語モデル(LLM)のスキルセットを高めることで知られており、多くのタスクにおける推論能力の向上につながっている。 TALMは様々な質問応答ベンチマークでうまく採用されているが、複雑な数学的推論ベンチマークにおける有効性や、知識検索や数学的方程式の解法のためのツールがもたらす潜在的な相補的利点は、未解決の研究課題である。本研究では,数学的推論のためのツール強化された大規模言語モデルMATHSENSEIを提案する。知識検索(Bing Web Search)、プログラム実行(Python)、記号方程式の解法(Wolfram-Alpha)などのツールを用いて,数学的推論データセットの評価を通じて,これらのツールの相補的メリットについて検討する。我々は、様々な数学的分野の数学的推論を評価するための一般的なデータセットであるMATHについて、網羅的なアブレーションを行う。また、有名なツールプランナによる実験を行い、ツールシークエンシングがモデル性能に与える影響について検討する。 MATHSENSEIは、MATHデータセットにおいて、chain-of-thoughtを用いたgpt-3.5-turboよりも13.5%高い精度を達成した。さらに,より単純な数学文章題 (GSM-8k) に対してTALMはそれほど有効ではなく,複雑性や必要な知識が増大するにつれてメリットが増大する(AQuA,MMLU-Math,MATHの高次複雑問題の順)。コードとデータはhttps://github.com/Debrup-61/MathSenseiで公開されている。 Tool-augmented Large Language Models (TALM) are known to enhance the skillset of large language models (LLM), thereby leading to their improved reasoning abilities across many tasks. While TALMs have been successfully employed in different question-answering benchmarks, their efficacy on complex mathematical reasoning benchmarks, and the potential complementary benefits offered by tools for knowledge retrieval and mathematical equation solving, are open research questions. In this work, we present MATHSENSEI, a tool-augmented large language model for mathematical reasoning. Augmented with tools for knowledge retrieval (Bing Web Search), program execution (Python), and symbolic equation solving (Wolfram-Alpha), we study the complementary benefits of these tools through evaluations on mathematical reasoning datasets. We perform exhaustive ablations on MATH, a popular dataset for evaluating mathematical reasoning on diverse mathematical disciplines. We also conduct experiments involving well-known tool planners to study the impact of tool sequencing on the model performance. MATHSENSEI achieves 13.5% better accuracy over gpt-3.5-turbo with chain-of-thought on the MATH dataset. We further observe that TALMs are not as effective for simpler math word problems (in GSM-8k), and the benefit increases as the complexity and required knowledge increase (progressively over AQuA, MMLU-Math, and higher level complex questions in MATH). The code and data are available at https://github.com/Debrup-61/MathSensei.	翻訳日:2024-03-20 23:01:00 公開日:2024-03-19
# MambaMIR: 医用画像再構成と不確かさ推定を同時に行うための任意マスクMamba MambaMIR: An Arbitrary-Masked Mamba for Joint Medical Image Reconstruction and Uncertainty Estimation ( http://arxiv.org/abs/2402.18451v2 ) ライセンス: Link先を確認	Jiahao Huang, Liutao Yang, Fanwen Wang, Yinzhe Wu, Yang Nan, Angelica I. Aviles-Rivero, Carola-Bibiane Schönlieb, Daoqiang Zhang, Guang Yang,	(参考訳) 最近のMambaモデルは、医用画像タスクを含む視覚表現学習に顕著な適応性を示している。本研究では,Mambaをベースとした医用画像再構成モデルであるMambaMIRと,その敵対的生成ネットワーク(GAN)ベースの派生モデルであるMambaMIR-GANを紹介する。提案したMambaMIRは,線形複雑性,大域受容場,動的重み付けなどの利点を元のMambaモデルから継承する。新たに考案した任意マスク機構は,Mambaを画像再構成タスクに効果的に適応させ,その後のモンテカルロ法に基づく不確実性推定にランダム性を与える。膝, 胸, 腹部などの解剖学的領域をカバーする高速MRI, SVCT などの医用画像再構成タスクにおいて, MambaMIR と MambaMIR-GAN は, 最先端の手法と比較して同等あるいは優れた再構成結果を示した。さらに、推定された不確実性マップは、再構成品質の信頼性に関するさらなる洞察を提供する。コードはhttps://github.com/ayanglab/MambaMIRで公開されている。 The recent Mamba model has shown remarkable adaptability for visual representation learning, including in medical imaging tasks. This study introduces MambaMIR, a Mamba-based model for medical image reconstruction, as well as its Generative Adversarial Network-based variant, MambaMIR-GAN. Our proposed MambaMIR inherits several advantages, such as linear complexity, global receptive fields, and dynamic weights, from the original Mamba model. The innovated arbitrary-mask mechanism effectively adapts Mamba to our image reconstruction task, providing randomness for subsequent Monte Carlo-based uncertainty estimation. Experiments conducted on various medical image reconstruction tasks, including fast MRI and SVCT, which cover anatomical regions such as the knee, chest, and abdomen, have demonstrated that MambaMIR and MambaMIR-GAN achieve comparable or superior reconstruction results relative to state-of-the-art methods. Additionally, the estimated uncertainty maps offer further insights into the reliability of the reconstruction quality. The code is publicly available at https://github.com/ayanglab/MambaMIR.	翻訳日:2024-03-20 23:01:00 公開日:2024-03-19
# 大規模言語モデルがレコメンダーシステムに与える影響を探求する: 広範なレビュー Exploring the Impact of Large Language Models on Recommender Systems: An Extensive Review ( http://arxiv.org/abs/2402.18590v3 ) ライセンス: Link先を確認	Arpita Vats, Vinija Jain, Rahul Raja, Aman Chadha,	(参考訳) 本稿では,レコメンダーシステムを再構築するうえでのLarge Language Models (LLMs) の重要性を強調し,その価値は従来のレコメンダーにはない独自の推論能力にあるとする。直接のユーザインタラクションデータを持たない従来のシステムとは異なり、LLMはアイテムの推薦に優れた能力を示し、言語の複雑さを理解する力を示す。これはレコメンデーションの領域における根本的なパラダイムシフトである。動的な研究の展望の中で、研究者はレコメンデーションタスクの基礎を再定義するために、LLMの言語理解と生成能力を積極的に活用している。この調査は、レコメンデーションフレームワークにおけるLLMの本質的な強みである、ニュアンスを踏まえた文脈理解、さまざまなドメイン間のシームレスな移行、統一されたアプローチの採用、共有データ資源を活用した総合的な学習戦略、透明性のある意思決定、反復的な改善などについて、徹底的に調査している。変革をもたらす可能性にもかかわらず、入力プロンプトに対する感受性、時折生じる誤解釈、予期せぬレコメンデーションといった課題が残っており、LLM駆動のレコメンダーシステムには継続的な洗練と進化が必要である。 The paper underscores the significance of Large Language Models (LLMs) in reshaping recommender systems, attributing their value to unique reasoning abilities absent in traditional recommenders. Unlike conventional systems lacking direct user interaction data, LLMs exhibit exceptional proficiency in recommending items, showcasing their adeptness in comprehending intricacies of language. This marks a fundamental paradigm shift in the realm of recommendations. Amidst the dynamic research landscape, researchers actively harness the language comprehension and generation capabilities of LLMs to redefine the foundations of recommendation tasks. The investigation thoroughly explores the inherent strengths of LLMs within recommendation frameworks, encompassing nuanced contextual comprehension, seamless transitions across diverse domains, adoption of unified approaches, holistic learning strategies leveraging shared data reservoirs, transparent decision-making, and iterative improvements. Despite their transformative potential, challenges persist, including sensitivity to input prompts, occasional misinterpretations, and unforeseen recommendations, necessitating continuous refinement and evolution in LLM-driven recommender systems.	翻訳日:2024-03-20 23:01:00 公開日:2024-03-19
# CASIMIR:複数の著者による改訂で強化された学術論文のコーパス CASIMIR: A Corpus of Scientific Articles enhanced with Multiple Author-Integrated Revisions ( http://arxiv.org/abs/2403.00241v2 ) ライセンス: Link先を確認	Leane Jourdan, Florian Boudin, Nicolas Hernandez, Richard Dufour,	(参考訳) 科学的論文を書くことは、高度に体系化された特定のジャンルであるため、研究成果やアイデアを効果的に伝達するためには、文章によるコミュニケーションの熟練が不可欠である。本稿では,学術論文の執筆過程の改訂段階における原文資源を提案する。この新しいデータセットはCASIMIRと呼ばれ、OpenReviewの15,646の科学論文の改訂版とピアレビューを含んでいる。談話レベルでの今後の改訂研究を支援するメタデータとして、段落位置情報を保持しつつ、記事の連続バージョンを文レベルで整列する。各改訂文は、自動的に抽出された編集と関連する修正意図で濃縮される。データセットの初期品質を評価するために,いくつかの最先端テキストリビジョン手法の質的研究を行い,様々な評価指標を比較した。実験の結果,テキスト改訂作業における現在の評価手法の妥当性が疑問視された。 Writing a scientific article is a challenging task as it is a highly codified and specific genre, consequently proficiency in written communication is essential for effectively conveying research findings and ideas. In this article, we propose an original textual resource on the revision step of the writing process of scientific articles. This new dataset, called CASIMIR, contains the multiple revised versions of 15,646 scientific articles from OpenReview, along with their peer reviews. Pairs of consecutive versions of an article are aligned at sentence-level while keeping paragraph location information as metadata for supporting future revision studies at the discourse level. Each pair of revised sentences is enriched with automatically extracted edits and associated revision intention. To assess the initial quality on the dataset, we conducted a qualitative study of several state-of-the-art text revision approaches and compared various evaluation metrics. Our experiments led us to question the relevance of the current evaluation methods for the text revision task.	翻訳日:2024-03-20 23:01:00 公開日:2024-03-19
# 説明可能なニューロシンボリックパイプラインによるマルチドメイン自動短答採点の改善 Enhancing Multi-Domain Automatic Short Answer Grading through an Explainable Neuro-Symbolic Pipeline ( http://arxiv.org/abs/2403.01811v2 ) ライセンス: Link先を確認	Felix Künnecke, Anna Filighera, Colin Leong, Tim Steuer,	(参考訳) 採点判断の背後にある解釈可能な推論を伴って短答問題を自動採点することは、現在のトランスフォーマーアプローチにとって難しい目標である。正当化の手がかり(justification cue)の検出は、論理的推論器と組み合わせることで、自動短答採点(ASAG)のニューロシンボリックアーキテクチャにとって有望な方向を示している。しかし、主な課題の1つは、学生の回答中の正当化の手がかりに対するアノテーションが必要であり、それがごく少数のASAGデータセットにしか存在しないことである。この課題を克服するために,(1)ASAGデータセットにおける正当化の手がかりの弱教師付きアノテーション手法,(2)正当化の手がかりに基づく説明可能なASAGのニューロシンボリックモデルを提案する。提案手法は,2言語,マルチドメイン,マルチクエスチョンのトレーニング設定におけるShort Answer Feedbackデータセットの最先端と比較して,RMSEを0.24から0.3改善する。この結果は,本手法が高品質な成績とそれに伴う説明を生成するうえで有望な方向性であることを示しており,今後のASAG研究や教育分野のNLP研究に資する。 Grading short answer questions automatically with interpretable reasoning behind the grading decision is a challenging goal for current transformer approaches. Justification cue detection, in combination with logical reasoners, has shown a promising direction for neuro-symbolic architectures in ASAG. However, one of the main challenges is the requirement of annotated justification cues in the students' responses, which only exist for a few ASAG datasets. To overcome this challenge, we contribute (1) a weakly supervised annotation procedure for justification cues in ASAG datasets, and (2) a neuro-symbolic model for explainable ASAG based on justification cues. Our approach improves upon the RMSE by 0.24 to 0.3 compared to the state-of-the-art on the Short Answer Feedback dataset in a bilingual, multi-domain, and multi-question training setup. This result shows that our approach provides a promising direction for generating high-quality grades and accompanying explanations for future research in ASAG and educational NLP.	翻訳日:2024-03-20 23:01:00 公開日:2024-03-19
# 組合せ最適化問題に対する再帰的量子緩和 Recursive Quantum Relaxation for Combinatorial Optimization Problems ( http://arxiv.org/abs/2403.02045v2 ) ライセンス: Link先を確認	Ruho Kondo, Yuki Sato, Rudy Raymond, Naoki Yamamoto,	(参考訳) 量子最適化法は、量子状態の自由度を連続的に使い、MAX-CUT問題などの組合せ問題をヒューリスティックに解く。本稿では,既存の量子最適化手法を解法に統一し,最適量子状態から最もよく測定される2値解を求めることを示す。この発見と、より少ない量子ビット上の量子状態にビットを符号化する量子ランダムアクセス符号(QRAC)の概念を組み合わせることで、MAX-CUTのための再帰的量子ランダムアクセス最適化(RQRAO)と呼ばれる効率的な再帰的量子緩和法を提案する。テンソルネットワーク技術を用いたMAX-CUT問題における数百ノードの標準ベンチマークグラフの実験は、RQRAOがゴーマン-ウィリアムソン法より優れ、最先端の古典的解法に匹敵することを示した。コードも近く公開される予定だ。 Quantum optimization methods use a continuous degree-of-freedom of quantum states to heuristically solve combinatorial problems, such as the MAX-CUT problem, which can be attributed to various NP-hard combinatorial problems. This paper shows that some existing quantum optimization methods can be unified into a solver that finds the binary solution that is most likely measured from the optimal quantum state. Combining this finding with the concept of quantum random access codes (QRACs) for encoding bits into quantum states on fewer qubits, we propose an efficient recursive quantum relaxation method called recursive quantum random access optimization (RQRAO) for MAX-CUT. Experiments on standard benchmark graphs with several hundred nodes in the MAX-CUT problem, conducted in a fully classical manner using a tensor network technique, show that RQRAO outperforms the Goemans--Williamson method and is comparable to state-of-the-art classical solvers. The codes will be made available soon.	翻訳日:2024-03-20 23:01:00 公開日:2024-03-19
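The quantum random access code (QRAC) mentioned in the entry above packs several classical bits into fewer qubits. As a hedged illustration, the sketch below uses the textbook (3,1)-QRAC, which stores three bits in the Bloch vector of a single qubit. The abstract does not state which code RQRAO uses, so this choice and the helper names `qrac31_bloch` and `decode_probability` are assumptions for illustration only.

```python
import math


def qrac31_bloch(bits):
    """Bloch vector of the (3,1)-QRAC state encoding three bits.

    Bit 0 maps to +1/sqrt(3) and bit 1 to -1/sqrt(3) along the X, Y and Z
    axes, so the vector always has unit length (a pure single-qubit state).
    """
    s = 1.0 / math.sqrt(3.0)
    return tuple(s if b == 0 else -s for b in bits)


def decode_probability(bloch, axis, bit):
    """Probability that measuring the Pauli operator on `axis` (0=X, 1=Y, 2=Z)
    returns the outcome that decodes to `bit` (+1 outcome means bit 0)."""
    p_zero = (1.0 + bloch[axis]) / 2.0
    return p_zero if bit == 0 else 1.0 - p_zero


if __name__ == "__main__":
    bits = (1, 0, 1)
    r = qrac31_bloch(bits)
    for axis, name in enumerate("XYZ"):
        print(name, decode_probability(r, axis, bits[axis]))
```

Each bit is recovered with the same probability, 1/2 + 1/(2√3) ≈ 0.79, whichever of the three bits is requested. Recovery is probabilistic, which is why approaches built on such codes need a rounding step to get back to a binary solution.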
# PHAnToM: パーソナリティは大規模言語モデルにおける心の理論推論に影響を及ぼす PHAnToM: Personality Has An Effect on Theory-of-Mind Reasoning in Large Language Models ( http://arxiv.org/abs/2403.02246v2 ) ライセンス: Link先を確認	Fiona Anting Tan, Gerard Christopher Yeo, Fanyou Wu, Weijie Xu, Vinija Jain, Aman Chadha, Kokil Jaidka, Yang Liu, See-Kiong Ng,	(参考訳) 大規模言語モデル(LLM)の最近の進歩は、自然言語処理における多くのタスクにおいて、その能力が人間に匹敵する、あるいは優れていることを示している。この進歩にもかかわらず、LLMは人類が自然に得意とする社会的認知的推論にはまだ不十分である。特定の性格特性と心の理論(ToM)推論の関連性に関する心理学的研究と、LLMの能力に影響を及ぼすプロンプトへの過敏性に関するプロンプトエンジニアリング研究からインスピレーションを得て,プロンプトを用いてLLMにパーソナリティを誘導することがToM推論能力にどのように影響するかを考察した。以上の結果から,誘導された特定のパーソナリティは3つの異なるToMタスクにおいてLLMの推論能力に有意に影響を及ぼす可能性が示唆された。特にダークトライアドの特性は、様々なToMタスクにわたって、GPT-3.5、Llama 2、MistralのようなLLMにより大きな変動効果を持つ。また、ToMにおいてパーソナリティプロンプト間の分散が大きいLLMほど、性格検査でも制御しやすい傾向があり、GPT-3.5, Llama 2, Mistral などの LLM の性格特性は, パーソナリティプロンプトによって制御可能である。 LLMを使う場合、ロールプレイが一般的な戦略である現在の状況では、パーソナリティを持った特定のペルソナを採用するモデルが、予期しない方法で推論能力を変化させる可能性があるため、注意が必要であることを本研究は強調している。 Recent advances in large language models (LLMs) demonstrate that their capabilities are comparable, or even superior, to humans in many tasks in natural language processing. Despite this progress, LLMs are still inadequate at social-cognitive reasoning, which humans are naturally good at. Drawing inspiration from psychological research on the links between certain personality traits and Theory-of-Mind (ToM) reasoning, and from prompt engineering research on the hyper-sensitivity of prompts in affecting LLMs' capabilities, this study investigates how inducing personalities in LLMs using prompts affects their ToM reasoning capabilities. Our findings show that certain induced personalities can significantly affect the LLMs' reasoning capabilities in three different ToM tasks. In particular, traits from the Dark Triad have a larger variable effect on LLMs like GPT-3.5, Llama 2, and Mistral across the different ToM tasks. We find that LLMs that exhibit a higher variance across personality prompts in ToM also tend to be more controllable in personality tests: personality traits in LLMs like GPT-3.5, Llama 2 and Mistral can be controllably adjusted through our personality prompts. In today's landscape where role-play is a common strategy when using LLMs, our research highlights the need for caution, as models that adopt specific personas with personalities potentially also alter their reasoning abilities in an unexpected manner.	翻訳日:2024-03-20 23:01:00 公開日:2024-03-19
# インタラクティブな継続学習: 速い思考と遅い思考 Interactive Continual Learning: Fast and Slow Thinking ( http://arxiv.org/abs/2403.02628v2 ) ライセンス: Link先を確認	Biqing Qi, Xingquan Chen, Junqi Gao, Dong Li, Jianxing Liu, Ligang Wu, Bowen Zhou,	(参考訳) 高度な生命形態は、神経認知機構の相乗的相互作用によって維持され、生涯を通して継続的に知識を取得し、伝達する。対照的に、現代の機械学習パラダイムは継続学習(CL)の側面をエミュレートする際の限界を示す。それでも、大きな言語モデル(LLM)の出現は、これらのモデルとの相互作用を通じてCLを実現するための有望な道を示す。本稿では,補完学習システム理論に基づき,様々なサイズのモデル間の協調的な相互作用によって実現される対話型継続学習(Interactive Continual Learning, ICL)フレームワークを提案する。具体的には, ViT モデルを System1 として,マルチモーダル LLM を System2 として割り当てる。メモリモジュールがクラス情報からタスクを推論し、Set2Set検索を強化するために、クラス知識タスクマルチヘッドアテンション(CKT-MHA)を提案する。さらに,幾何表現の強化によりSystem1のメモリ検索を改善するために,von Mises-Fisher(vMF)分布に基づくCL-vMF機構を導入する。一方,困難な事例を特定するためにvon Mises-Fisher Outlier Detection and Interaction (vMF-ODI) 戦略を導入し,複雑な推論を実現するためのSystem1とSystem2の連携を強化する。提案したICLの包括的評価は,既存の手法と比較して,忘却に対する顕著な耐性と優れた性能を示す。コードはgithub.com/ICLで入手できる。 Advanced life forms, sustained by the synergistic interaction of neural cognitive mechanisms, continually acquire and transfer knowledge throughout their lifespan. In contrast, contemporary machine learning paradigms exhibit limitations in emulating the facets of continual learning (CL). Nonetheless, the emergence of large language models (LLMs) presents promising avenues for realizing CL via interactions with these models. Drawing on Complementary Learning System theory, this paper presents a novel Interactive Continual Learning (ICL) framework, enabled by collaborative interactions among models of various sizes. Specifically, we assign the ViT model as System1 and multimodal LLM as System2. To enable the memory module to deduce tasks from class information and enhance Set2Set retrieval, we propose the Class-Knowledge-Task Multi-Head Attention (CKT-MHA). Additionally, to improve memory retrieval in System1 through enhanced geometric representation, we introduce the CL-vMF mechanism, based on the von Mises-Fisher (vMF) distribution. Meanwhile, we introduce the von Mises-Fisher Outlier Detection and Interaction (vMF-ODI) strategy to identify hard examples, thus enhancing collaboration between System1 and System2 for complex reasoning realization. Comprehensive evaluation of our proposed ICL demonstrates significant resistance to forgetting and superior performance relative to existing methods. Code is available at github.com/ICL.	翻訳日:2024-03-20 23:01:00 公開日:2024-03-19
# トポロジカルに保護された負のエンタングルメント Topologically protected negative entanglement ( http://arxiv.org/abs/2403.03259v2 ) ライセンス: Link先を確認	Wen-Tan Xue, Ching Hua Lee,	(参考訳) エンタングルメントエントロピーは量子多体系の基本的な特性を符号化し、固有状態が一般に非直交となる非エルミート系において特に微妙である。本研究では, 自由フェルミオン系, 特にトポロジカルフラットバンドにおいて, トポロジカルに保護された非直交エッジ状態から負の双直交エンタングルメントが一般に生じることを見出した。負のエンタングルメントを例外的なギャップレス点と関連づけた以前の文献とは異なり, ギャップのある系でも頑健に負のエンタングルメントが生じうることを示す。一方、ギャップレスな2次元トポロジカルフラットバンドは、横方向の次元$L_z$に対して2次でスケールする新しい$S\sim -L_z^2\log L$のエンタングルメント挙動を示す。我々の発見は、トポロジカルエンタングルメントエントロピーの伝統的な概念とは無関係な、トポロジーとエンタングルメントの新たな相互作用に光を当てるものであり、SWAP演算子の期待値を介した超低温原子格子における2次のRényiエントロピー測定により実験的に検証することができる。 The entanglement entropy encodes fundamental characteristics of quantum many-body systems, and is particularly subtle in non-Hermitian settings where eigenstates generically become non-orthogonal. In this work, we find that negative biorthogonal entanglement generically arises from topologically-protected non-orthogonal edge states in free fermion systems, especially within topological flat bands. Departing from previous literature which associated negative entanglement with exceptional gapless points, we show that robustly negative entanglement can still occur in gapped systems. Gapless 2D topological flat bands, however, exhibit novel $S\sim -L_z^2\log L$ entanglement behavior which scales quadratically with the transverse dimension $L_z$. Our discovery sheds light on a new interplay between topology and entanglement unrelated to traditional concepts of topological entanglement entropy, and can be experimentally verified through second Rényi entropy measurements in ultracold atomic lattices via SWAP operator expectation values.	翻訳日:2024-03-20 20:59:05 公開日:2024-03-19
# ハイパーパラメータ最適化におけるエンコーダに基づくウォームスタート法の再検討 Rethinking of Encoder-based Warm-start Methods in Hyperparameter Optimization ( http://arxiv.org/abs/2403.04720v2 ) ライセンス: Link先を確認	Dawid Płudowski, Antoni Zajko, Anna Kozak, Katarzyna Woźnica,	(参考訳) メタラーニングのための異種表形式のデータセットを効果的に表現することは、未解決の問題である。以前のアプローチは、例えば統計測度やランドマークのような、事前に定義されたメタ機能に依存していた。 Dataset2Vecのようなエンコーダベースのモデルは、人間の介入なしに重要なメタ機能を自動的に抽出することができる。この研究は、GitHub https://github.com/azoz01/liltabで利用可能なLiltabパッケージ内に実装された、新しいエンコーダベースのグラフデータセットの表現を導入している。本パッケージは, 岩田友治, 熊谷篤俊両氏が提唱した異種表型データの確立したモデルに基づく。提案手法では,Dataset2Vecのような既存手法と比較して,特徴関係を符号化し,代替表現を生成する。どちらもデータセット類似性学習の基本的な前提を活用している。本研究では、データセット全体の表現とハイパーパラメータ最適化のウォームスタートという、2つの一般的なメタタスクでDataset2VecとLiltabを評価した。しかし、独立したメタMIMICデータセットの検証は、表現学習における煩雑な課題を浮き彫りにする。一般表現は,要求が抽出中に明示的に考慮されないメタタスクでは十分でないことを示す。 Effectively representing heterogeneous tabular datasets for meta-learning remains an open problem. Previous approaches rely on predefined meta-features, for example, statistical measures or landmarkers. Encoder-based models, such as Dataset2Vec, allow us to extract significant meta-features automatically without human intervention. This research introduces a novel encoder-based representation of tabular datasets implemented within the liltab package available on GitHub https://github.com/azoz01/liltab. Our package is based on an established model for heterogeneous tabular data proposed in [Tomoharu Iwata and Atsutoshi Kumagai. Meta-learning from Tasks with Heterogeneous Attribute Spaces. In Advances in Neural Information Processing Systems, 2020]. The proposed approach employs a different model for encoding feature relationships, generating alternative representations compared to existing methods like Dataset2Vec. Both of them leverage the fundamental assumption of dataset similarity learning. In this work, we evaluate Dataset2Vec and liltab on two common meta-tasks - representing entire datasets and hyperparameter optimization warm-start. 
However, validation on an independent metaMIMIC dataset highlights the nuanced challenges in representation learning. We show that general representations may not suffice for some meta-tasks where requirements are not explicitly considered during extraction.	翻訳日:2024-03-20 20:59:05 公開日:2024-03-19
# 仮想試着のための拡散モデルの改善 Improving Diffusion Models for Virtual Try-on ( http://arxiv.org/abs/2403.05139v2 ) ライセンス: Link先を確認	Yisol Choi, Sangkyung Kwak, Kyungmin Lee, Hyungwon Choi, Jinwoo Shin,	(参考訳) 本稿では, 人物と衣服をそれぞれ写した一対の画像が与えられたときに, その衣服を着た人物の画像を生成する画像ベースの仮想試着について考察する。従来の研究は、他の方法(例えば、GANベース)と比較して生成した画像の自然さを改善するために、既存の事例ベースのインペインティング拡散モデルを仮想試着に適用しているが、衣服の同一性を保てない。この制限を克服するために,衣服の忠実度を改善し,本物らしい仮想試着画像を生成する新しい拡散モデルを提案する。 IDM-VTONと呼ばれる本手法では,2つの異なるモジュールを用いて衣服画像のセマンティクスを符号化する。拡散モデルのベースUNetに対して、 1)視覚エンコーダから抽出された高レベルのセマンティクスをクロスアテンション層に融合し、その後、 2) 並列UNetから抽出した低レベル特徴を自己注意層に融合する。さらに、生成した画像の本物らしさを高めるために、衣服画像と人物画像の両方に詳細なテキストプロンプトを提供する。最後に,一対の人物・衣服画像を用いたカスタマイズ手法を提案し、忠実度と本物らしさを大幅に向上させる。実験結果から,本手法は衣服の詳細の保存や本物らしい仮想試着画像の生成において,従来の手法(拡散ベースとGANベースの両方)より定性的にも定量的にも優れていることが示された。さらに,提案するカスタマイズ手法は実世界のシナリオにおいて有効であることを示す。より多くの可視化結果はプロジェクトページ https://idm-vton.github.io を参照してほしい。 This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment, given a pair of images depicting the person and the garment, respectively. Previous works adapt existing exemplar-based inpainting diffusion models for virtual try-on to improve the naturalness of the generated visuals compared to other methods (e.g., GAN-based), but they fail to preserve the identity of the garments. To overcome this limitation, we propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images. Our method, coined IDM-VTON, uses two different modules to encode the semantics of garment image; given the base UNet of the diffusion model, 1) the high-level semantics extracted from a visual encoder are fused to the cross-attention layer, and then 2) the low-level features extracted from parallel UNet are fused to the self-attention layer. In addition, we provide detailed textual prompts for both garment and person images to enhance the authenticity of the generated visuals. Finally, we present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity. Our experimental results show that our method outperforms previous approaches (both diffusion-based and GAN-based) in preserving garment details and generating authentic virtual try-on images, both qualitatively and quantitatively. Furthermore, the proposed customization method demonstrates its effectiveness in a real-world scenario. More visualizations are available on our project page: https://idm-vton.github.io	翻訳日:2024-03-20 20:59:05 公開日:2024-03-19
# ChatASU: LLMのリフレクションを喚起して対話におけるアスペクト感情を真に理解する ChatASU: Evoking LLM's Reflexion to Truly Understand Aspect Sentiment in Dialogues ( http://arxiv.org/abs/2403.05326v3 ) ライセンス: Link先を確認	Yiding Liu, Jingjing Wang, Jiamin Luo, Tao Zeng, Guodong Zhou,	(参考訳) 対話型シナリオ(質問応答や対話など)におけるアスペクト感情理解(ASU:Aspect Sentiment Understanding)は,近年ますます関心を集め,重要な進歩を遂げている。しかしながら、対話型ASUに関する既存の研究は、意見対象(つまりアスペクト)の共参照問題をほとんど無視している。この現象は対話型シナリオ、特に対話において広く見られ、ASUの性能を制限している。近年,大規模言語モデル (LLM) は,様々なNLPタスクをチャットパラダイムに統合する強力な能力を示している。そこで本稿では,対話シナリオにおけるアスペクト感情を理解するLLMの能力を探究する,Chat-based Aspect Sentiment Understanding (ChatASU)タスクを提案する。特に、このChatASUタスクは、アスペクトの共参照問題に対処するためにサブタスク、すなわちアスペクトチェイン推論(ACR)タスクを導入している。そこで我々は,ChatGLMをバックボーンとする、ChatASUのための信頼できる自己リフレクションアプローチ(TSA: Trusted Self-reflexion Approach)を提案する。具体的には、このTSAは、ACRタスクを補助タスクとして扱うことで主タスクであるASUの性能を高めるとともに、信頼できる学習をリフレクション機構に統合し、LLMに内在する事実の幻覚問題を緩和する。さらに,TSAを評価するために高品質なChatASUデータセットをアノテートし、広範な実験により、提案したTSAが複数の最先端ベースラインを大幅に上回ることを示す。これはChatASUに対するTSAの有効性と、ChatASUにおいて共参照と幻覚の問題を考慮することの重要性を裏付けている。 Aspect Sentiment Understanding (ASU) in interactive scenarios (e.g., Question-Answering and Dialogue) has attracted ever-more interest in recent years and achieved important progress. However, existing studies on interactive ASU largely ignore the coreference issue for opinion targets (i.e., aspects), while this phenomenon is ubiquitous in interactive scenarios especially dialogues, limiting the ASU performance. Recently, large language models (LLMs) have shown a powerful ability to integrate various NLP tasks with the chat paradigm. In this way, this paper proposes a new Chat-based Aspect Sentiment Understanding (ChatASU) task, aiming to explore LLMs' ability in understanding aspect sentiments in dialogue scenarios. Particularly, this ChatASU task introduces a sub-task, i.e., Aspect Chain Reasoning (ACR) task, to address the aspect coreference issue. On this basis, we propose a Trusted Self-reflexion Approach (TSA) with ChatGLM as backbone to ChatASU. Specifically, this TSA treats the ACR task as an auxiliary task to boost the performance of the primary ASU task, and further integrates trusted learning into reflexion mechanisms to alleviate the LLMs-intrinsic factual hallucination problem in TSA. Furthermore, a high-quality ChatASU dataset is annotated to evaluate TSA, and extensive experiments show that our proposed TSA can significantly outperform several state-of-the-art baselines, justifying the effectiveness of TSA to ChatASU and the importance of considering the coreference and hallucination issues in ChatASU.	翻訳日:2024-03-20 20:59:05 公開日:2024-03-19
# デジタルウェルビーイング再定義 : ポジティブなソーシャルメディアエンゲージメントのためのユーザ中心アプローチに向けて Digital Wellbeing Redefined: Toward User-Centric Approach for Positive Social Media Engagement ( http://arxiv.org/abs/2403.05723v2 ) ライセンス: Link先を確認	Yixue Zhao, Tianyi Li, Michael Sobolev,	(参考訳) ソーシャルメディアの普及とその精神的健康への影響の拡大は、効果的なデジタルウェルビーイング戦略の必要性を浮き彫りにした。現在のデジタルウェルビーイングの介入は、主にスクリーン時間とソーシャルメディアの使用を減らすことに焦点を当てており、しばしばこれらのプラットフォームの潜在的な利益を無視している。本稿では,ユーザを制約的なルールで縛るのではなく,ポジティブなソーシャルメディア体験を後押しすることを主眼とした新たな視点を紹介する。この観点から、今後の研究で考慮すべき重要な要件を概説し、この新興分野での議論を喚起することを目指す。さらに、これらの要件に対処する最初の取り組みとして、ユーザのデジタル行動と意図を一致させるように設計された革新的なデジタルウェルビーイングの介入であるPauseNowを紹介する。 PauseNowは、デジタルナッジと意図を考慮したレコメンデーションを活用して、デジタル利用中に「迷子になった」ユーザーを元の意図へと穏やかに導き、よりマインドフルなソーシャルメディアの利用を促進する。 The prevalence of social media and its escalating impact on mental health has highlighted the need for effective digital wellbeing strategies. Current digital wellbeing interventions have primarily focused on reducing screen time and social media use, often neglecting the potential benefits of these platforms. This paper introduces a new perspective centered around empowering positive social media experiences, instead of limiting users with restrictive rules. In line with this perspective, we lay out the key requirements that should be considered in future work, aiming to spark a dialogue in this emerging area. We further present our initial effort to address these requirements with PauseNow, an innovative digital wellbeing intervention designed to align users' digital behaviors with their intentions. PauseNow leverages digital nudging and intention-aware recommendations to gently guide users back to their original intentions when they "get lost" during their digital usage, promoting a more mindful use of social media.	翻訳日:2024-03-20 20:59:05 公開日:2024-03-19
# KG-Rank:知識グラフとランキング技術による医療QAのための大規模言語モデルの実現 KG-Rank: Enhancing Large Language Models for Medical QA with Knowledge Graphs and Ranking Techniques ( http://arxiv.org/abs/2403.05881v2 ) ライセンス: Link先を確認	Rui Yang, Haoran Liu, Edison Marrese-Taylor, Qingcheng Zeng, Yu He Ke, Wanxin Li, Lechao Cheng, Qingyu Chen, James Caverlee, Yutaka Matsuo, Irene Li,	(参考訳) 大規模言語モデル(LLM)は、生成機能に対する医療革新が著しく進んでいる。しかし、医学的事実や固有のバイアスから逸脱する可能性があるため、実際の臨床現場での応用は困難である。本研究では,医学領域における自由文質問応答(QA)を改善することを目的として,医療知識グラフ(KG)をランク付けおよび再ランク付け技術に活用する拡張LDMフレームワークKG-Rankを開発する。具体的には,まず医療用KGからトリプレットを抽出し,事実情報を収集する。次に,これらの三重項の順序付けを洗練させる手法を革新的に適用し,より正確な解を求める。我々の知る限りでは、KG-Rankは、医学的QAにおけるKGと組み合わせたランキングモデルの最初の応用である。 KG-RankがROUGE-Lスコアで18%以上改善したことを示す。さらに、KG-Rankをオープンドメインに拡張し、ROUGE-Lの14%の改善を実現し、KG-Rankの有効性と可能性を示す。 Large Language Models (LLMs) have significantly advanced healthcare innovation on generation capabilities. However, their application in real clinical settings is challenging due to potential deviations from medical facts and inherent biases. In this work, we develop an augmented LLM framework, KG-Rank, which leverages a medical knowledge graph (KG) with ranking and re-ranking techniques, aiming to improve free-text question-answering (QA) in the medical domain. Specifically, upon receiving a question, we initially retrieve triplets from a medical KG to gather factual information. Subsequently, we innovatively apply ranking methods to refine the ordering of these triplets, aiming to yield more precise answers. To the best of our knowledge, KG-Rank is the first application of ranking models combined with KG in medical QA specifically for generating long answers. Evaluation of four selected medical QA datasets shows that KG-Rank achieves an improvement of over 18% in the ROUGE-L score. Moreover, we extend KG-Rank to open domains, where it realizes a 14% improvement in ROUGE-L, showing the effectiveness and potential of KG-Rank.	翻訳日:2024-03-20 20:59:05 公開日:2024-03-19
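The KG-Rank entry above reports its gains as ROUGE-L scores. For readers unfamiliar with the metric, the following is a minimal sketch of ROUGE-L as an F1 score over the longest common subsequence (LCS) of whitespace-separated tokens. The equal weighting of precision and recall, the lowercasing, the whitespace tokenisation, and the function names are simplifying assumptions; the paper's exact evaluation setup is not given in the abstract.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between a generated answer and a reference answer.

    Assumes lowercased whitespace tokenisation and beta = 1
    (precision and recall weighted equally).
    """
    c = candidate.lower().split()
    r = reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision = lcs / len(c)
    recall = lcs / len(r)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge_l_f1("the cat sat on the mat", "the cat is on the mat"))
```

Because the LCS rewards in-order overlap without requiring contiguous matches, ROUGE-L is commonly used for long free-text answers such as those KG-Rank generates.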
# CarbonNet: コンピュータビジョンは気候変動においてどのように役割を果たすか? 応用: CCSの地下構造からジオメカニクスを学習し、地球温暖化を緩和する CarbonNet: How Computer Vision Plays a Role in Climate Change? Application: Learning Geomechanics from Subsurface Geometry of CCS to Mitigate Global Warming ( http://arxiv.org/abs/2403.06025v3 ) ライセンス: Link先を確認	Wei Chen, Yunan Li, Yuan Tian,	(参考訳) 本稿では,炭素捕獲・隔離のための地下地形画像から地表面の変位を予測するために,コンピュータビジョンを用いた新しいアプローチを提案する。 CCSは、炭素中立社会の重要な構成要素であることが証明されている。しかし、科学者は、大きなモデルスケールと複雑な物理を持つ事前学習モデルの一般化に制限があるため、計算コストが高いという課題があると考えている。地下地形画像から直接モデルを訓練することで,これらの課題に対処する。本研究の目的は,炭素注入による地表面変位の応答を理解し,訓練されたモデルを用いてCCSプロジェクトの意思決定を通知することである。我々は,画像予測問題である静的力学問題に対して,複数のモデル(CNN,ResNet,ResNetUNet)を実装した。次に、ビデオ予測問題である過渡的力学シナリオにLSTMとトランスフォーマーを用いる。 ResNetUNetは静的力学問題におけるアーキテクチャにより他より優れており、LSTMは過渡問題におけるトランスフォーマーに匹敵する性能を示している。本報告では、我々のデータセットの詳細と、メソッドセクションのモデル記述について述べる。結果と議論では、将来の作業で重要な学習、観察、結論が論文をまとめて述べられている。 We introduce a new approach using computer vision to predict the land surface displacement from subsurface geometry images for Carbon Capture and Sequestration (CCS). CCS has been proved to be a key component for a carbon neutral society. However, scientists see there are challenges along the way including the high computational cost due to the large model scale and limitations to generalize a pre-trained model with complex physics. We tackle those challenges by training models directly from the subsurface geometry images. The goal is to understand the respons of land surface displacement due to carbon injection and utilize our trained models to inform decision making in CCS projects. We implement multiple models (CNN, ResNet, and ResNetUNet) for static mechanics problem, which is a image prediction problem. Next, we use the LSTM and transformer for transient mechanics scenario, which is a video prediction problem. It shows ResNetUNet outperforms the others thanks to its architecture in static mechanics problem, and LSTM shows comparable performance to transformer in transient problem. 
This report proceeds by outlining our dataset in detail, followed by model descriptions in the method section. The results and discussion state the key learnings and observations, and a conclusion with future work rounds out the paper.	翻訳日:2024-03-20 20:59:05 公開日:2024-03-19
# LLMは人間のラベルを置換できるか? : UAV配送のための微粒な中国語アドレスエンティティ認識データセットを事例として Can LLM Substitute Human Labeling? A Case Study of Fine-grained Chinese Address Entity Recognition Dataset for UAV Delivery ( http://arxiv.org/abs/2403.06097v2 ) ライセンス: Link先を確認	Yuxuan Yao, Sichun Luo, Haohan Zhao, Guanzhi Deng, Linqi Song,	(参考訳) 本稿では、無人航空機(UAV)配送システムにおける住所解決タスクのために特別に設計された、きめ細かな中国語固有表現認識データセットCNER-UAVを提案する。データセットには5つのカテゴリがあり、NERモデルの総合的なトレーニングと評価を可能にする。このデータセットを構築するために、実際のUAV配信システムからデータをソースし、プライバシーとデータの整合性を確保するために厳密なデータクリーニングとデセンシタイズプロセスを実行した。得られたデータセットは約12,000の注釈付きサンプルからなり、人間の専門家と大規模言語モデル(LLM)によるアノテーションが与えられた。従来のNERモデルをデータセット上で評価し,詳細な分析を行った。データセットとモデルは、 https://github.com/zhhvvv/CNER-UAV で公開されている。 We present CNER-UAV, a fine-grained Chinese Name Entity Recognition dataset specifically designed for the task of address resolution in Unmanned Aerial Vehicle delivery systems. The dataset encompasses a diverse range of five categories, enabling comprehensive training and evaluation of NER models. To construct this dataset, we sourced the data from a real-world UAV delivery system and conducted a rigorous data cleaning and desensitization process to ensure privacy and data integrity. The resulting dataset, consisting of around 12,000 annotated samples, underwent annotation by human experts and a Large Language Model (LLM). We evaluated classical NER models on our dataset and provided in-depth analysis. The dataset and models are publicly available at https://github.com/zhhvvv/CNER-UAV.	翻訳日:2024-03-20 20:59:04 公開日:2024-03-19
# RLingua:大規模言語モデルを用いたロボットマニピュレーションにおける強化学習サンプル効率の改善 RLingua: Improving Reinforcement Learning Sample Efficiency in Robotic Manipulations With Large Language Models ( http://arxiv.org/abs/2403.06420v2 ) ライセンス: Link先を確認	Liangliang Chen, Yutian Lei, Shiyu Jin, Ying Zhang, Liangjun Zhang,	(参考訳) 強化学習(Reinforcement Learning, RL)は、様々なタスクを解く能力を示したが、サンプル効率が低いことで悪名高い。本稿では,大規模言語モデル(LLM)の内部知識を活用し,ロボット操作におけるRLの複雑さを軽減するフレームワークであるRLinguaを提案する。この目的のために,まず,特定のタスクに対する予備ルールベースのロボットコントローラをユーザフレンドリな方法で生成できるように,エンジニアリングの迅速化によるLCMの事前知識抽出手法を提案する。不完全にもかかわらず、LLM生成ロボットコントローラを使用して、ロールアウト中の動作サンプルを減衰確率で生成し、RLのサンプル効率を向上させる。我々は、広く使われているRLベースライン手法であるTD3を使用し、LCM生成コントローラに対するポリシー学習を規則化するためにアクター損失を修正した。 RLinguaはまた、不完全なLLM生成ロボットコントローラをRLにより改善する新しい方法も提供する。 RLBenchでは,Panda_gymの4つのロボットタスクにおいて,TD3のサンプル複雑性を著しく低減し,標準のTD3が故障した12のロボットタスクにおいて高い成功率を達成できることが実証された。さらに,実世界のロボット実験におけるRLinguaの有効性をSim2Realを通じて検証し,学習方針が実ロボットのタスクに効果的に伝達可能であることを示した。私たちの作業の詳細は、プロジェクトのWebサイトhttps://rlingua.github.io.comで確認できます。 Reinforcement learning (RL) has demonstrated its capability in solving various tasks but is notorious for its low sample efficiency. In this paper, we propose RLingua, a framework that can leverage the internal knowledge of large language models (LLMs) to reduce the sample complexity of RL in robotic manipulations. To this end, we first present a method for extracting the prior knowledge of LLMs by prompt engineering so that a preliminary rule-based robot controller for a specific task can be generated in a user-friendly manner. Despite being imperfect, the LLM-generated robot controller is utilized to produce action samples during rollouts with a decaying probability, thereby improving RL's sample efficiency. We employ TD3, the widely-used RL baseline method, and modify the actor loss to regularize the policy learning towards the LLM-generated controller. RLingua also provides a novel method of improving the imperfect LLM-generated robot controllers by RL. 
We demonstrate that RLingua can significantly reduce the sample complexity of TD3 in four robot tasks of panda_gym and achieve high success rates in 12 sampled sparsely rewarded robot tasks in RLBench, where the standard TD3 fails. Additionally, we validated RLingua's effectiveness in real-world robot experiments through Sim2Real, demonstrating that the learned policies are effectively transferable to real robot tasks. Further details about our work are available at our project website https://rlingua.github.io.	翻訳日:2024-03-20 20:59:04 公開日:2024-03-19
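The RLingua abstract states that the LLM-generated controller supplies action samples during rollouts "with a decaying probability". A minimal sketch of that mixing rule follows. The exponential schedule, its constants, and the two toy controllers are assumptions chosen for illustration; the abstract does not specify the decay schedule, and the TD3 update itself is omitted.

```python
import random


def controller_probability(step, p_start=0.9, decay=0.999):
    """Probability of using the rule-based controller at a given step.

    Exponential decay is an assumed schedule, not taken from the paper.
    """
    return p_start * (decay ** step)


def select_action(step, observation, llm_controller, rl_policy, rng):
    """Pick the action source for one rollout step."""
    if rng.random() < controller_probability(step):
        return llm_controller(observation), "controller"
    return rl_policy(observation), "policy"


def rule_based_controller(observation):
    """Hypothetical stand-in for an LLM-generated controller: push toward zero."""
    return 1.0 if observation < 0 else -1.0


def untrained_policy(observation):
    """Hypothetical stand-in for the RL actor before any training."""
    return 0.0


def count_controller_actions(first_step, last_step, seed=0):
    """Count how many steps in [first_step, last_step) use the controller."""
    rng = random.Random(seed)
    return sum(
        select_action(s, -0.5, rule_based_controller, untrained_policy, rng)[1]
        == "controller"
        for s in range(first_step, last_step)
    )


early = count_controller_actions(0, 1000)
late = count_controller_actions(5000, 6000)
print(early, late)
```

Early in training most actions come from the imperfect controller, which seeds the replay buffer with useful samples; as the probability decays, the learned policy takes over.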
# 事前訓練モデルによる画像復元の促進 Boosting Image Restoration via Priors from Pre-trained Models ( http://arxiv.org/abs/2403.06793v2 ) ライセンス: Link先を確認	Xiaogang Xu, Shu Kong, Tao Hu, Zhe Liu, Hujun Bao,	(参考訳) CLIPやStable Diffusionのような大規模トレーニングデータを持つ事前学習モデルは、画像理解や言語記述からの生成など、様々なハイレベルなコンピュータビジョンタスクにおいて顕著な性能を示している。しかし、画像復元のような低レベルタスクの可能性は、いまだに未解明のままである。本稿では,画像復元のためのモデルについて検討する。事前学習したモデルからのオフ・ザ・シェルフ機能(OSF)は直接画像復元に役立たないため,OSFを用いたターゲット復元ネットワークの復元結果を改善するために,Pre-Train-Guided Refinement Module (PTG-RM)と呼ばれる軽量モジュールを学習することを提案する。 PTG-RMは、PTG-SVE(Pre-Train-Guided Space-Varying Enhancement)とPTG-CSA(Pre-Train-Guided Channel-Spatial Attention)の2つのコンポーネントから構成される。 PTG-SVEは最適な短距離と長距離の神経操作を可能にし、PTG-CSAは修復関連学習のための空間チャネルの注意を高める。 PTG-RMの小型化($1M)は, 低照度向上, デラライニング, 脱臭, 脱臭など, 各種モデルの復元性能を効果的に向上することを示した。 Pre-trained models with large-scale training data, such as CLIP and Stable Diffusion, have demonstrated remarkable performance in various high-level computer vision tasks such as image understanding and generation from language descriptions. Yet, their potential for low-level tasks such as image restoration remains relatively unexplored. In this paper, we explore such models to enhance image restoration. As off-the-shelf features (OSF) from pre-trained models do not directly serve image restoration, we propose to learn an additional lightweight module called Pre-Train-Guided Refinement Module (PTG-RM) to refine restoration results of a target restoration network with OSF. PTG-RM consists of two components, Pre-Train-Guided Spatial-Varying Enhancement (PTG-SVE), and Pre-Train-Guided Channel-Spatial Attention (PTG-CSA). PTG-SVE enables optimal short- and long-range neural operations, while PTG-CSA enhances spatial-channel attention for restoration-related learning. Extensive experiments demonstrate that PTG-RM, with its compact size ($<$1M parameters), effectively enhances restoration performance of various models across different tasks, including low-light enhancement, deraining, deblurring, and denoising.	
翻訳日:2024-03-20 20:59:04 公開日:2024-03-19
# フラクソニウム量子ビット間の高忠実二量子ゲートの設計 Designing high-fidelity two-qubit gates between fluxonium qubits ( http://arxiv.org/abs/2403.07242v2 ) ライセンス: Link先を確認	Emma L. Rosenfeld, Connor T. Hann, David I. Schuster, Matthew H. Matheny, Aashish A. Clerk,	(参考訳) 我々は、最小限の誤差、速度、制御の単純さのために、フラクソニウム量子ビット間の2量子ゲートを設計するために、ボトムアップの第1原理のアプローチをとる。提案アーキテクチャは、線形共振器を介して結合された2つのフラクソニウムからなる。線形カプラは、損失を抑制するための材料最適化の可能性を導入し、大きな電荷ゼロ点変動による状態選択遷移の効率的な駆動を可能にし、接合時効に対する感度を低下させ、2レベル系へのコヒーレント結合を部分的に緩和する。重要なことに、共振器・アズ・カプラのアプローチは、カプラのインピーダンスが高いときに容量負荷を減らすことにより、フラクソニウム量子ビット間の接続性を高めるための明確な経路を示唆している。回路ハミルトニアンおよびゲートダイナミクスの解析および数値解析を行った後、回路パラメータを調整してコヒーレントエラーの発生源を破壊的に妨害し、ゲート長によるコヒーレントエラーの効率的な4次スケーリングを明らかにする。文献からの成分特性について、開系平均CZゲート不忠実度は70nsで1.86 \times 10^{-4}$と予測する。 We take a bottom-up, first-principles approach to design a two-qubit gate between fluxonium qubits for minimal error, speed, and control simplicity. Our proposed architecture consists of two fluxoniums coupled via a linear resonator. Using a linear coupler introduces the possibility of material optimization for suppressing its loss, enables efficient driving of state-selective transitions through its large charge zero point fluctuation, reduces sensitivity to junction aging, and partially mitigates coherent coupling to two-level systems. Crucially, a resonator-as-coupler approach also suggests a clear path to increased connectivity between fluxonium qubits, by reducing capacitive loading when the coupler has a high impedance. After performing analytic and numeric analyses of the circuit Hamiltonian and gate dynamics, we tune circuit parameters to destructively interfere sources of coherent error, revealing an efficient, fourth-order scaling of coherent error with gate duration. For component properties from the literature, we predict an open-system average CZ gate infidelity of $1.86 \times 10^{-4}$ in 70ns.	翻訳日:2024-03-20 20:49:20 公開日:2024-03-19
# マイノリティー分節の促進は一般化にどのように影響するか?-群不均衡における一重層ニューラルネットワークの理論的研究 How does promoting the minority fraction affect generalization? A theoretical study of the one-hidden-layer neural network on group imbalance ( http://arxiv.org/abs/2403.07310v2 ) ライセンス: Link先を確認	Hongkang Li, Shuai Zhang, Yihua Zhang, Meng Wang, Sijia Liu, Pin-Yu Chen,	(参考訳) グループ不均衡は経験的リスク最小化(ERM)において既知の問題であり、達成された平均精度は少数集団において低い精度で伴っている。マイノリティ群精度を改善するアルゴリズム的な努力にもかかわらず、個々の群に対するERMの理論的一般化分析はいまだに解明されていない。ガウス混合モデルを用いて群不均衡問題を定式化することにより、各群がサンプルの複雑さ、収束率、平均および群レベルの試験性能に与える影響を定量化する。理論的枠組みは,一層ニューラルネットワークを用いた二項分類に重点を置いているが,一般に研究されている平均一般化性能に加えて,ERMの群レベル一般化に関する最初の理論的解析を行った。我々の理論結果のサンプルは、全てのグループレベルの共分散が中程度にあり、全ての平均が0に近い場合、学習性能は、小さなサンプルの複雑さ、速いトレーニング率、高い平均およびグループレベルのテスト精度の点で最も望ましいことである。さらに,トレーニングデータにおけるマイノリティ群の割合の増加は,マイノリティ群の一般化性能を必ずしも向上させるものではないことを示す。画像分類において,CelebAやCIFAR-10などの合成データセットと実験データセットの両方で理論的結果が検証された。 Group imbalance has been a known problem in empirical risk minimization (ERM), where the achieved high average accuracy is accompanied by low accuracy in a minority group. Despite algorithmic efforts to improve the minority group accuracy, a theoretical generalization analysis of ERM on individual groups remains elusive. By formulating the group imbalance problem with the Gaussian Mixture Model, this paper quantifies the impact of individual groups on the sample complexity, the convergence rate, and the average and group-level testing performance. Although our theoretical framework is centered on binary classification using a one-hidden-layer neural network, to the best of our knowledge, we provide the first theoretical analysis of the group-level generalization of ERM in addition to the commonly studied average generalization performance. 
Sample insights of our theoretical results include that when all group-level covariances are in the medium regime and all means are close to zero, the learning performance is most desirable in the sense of a small sample complexity, a fast training rate, and a high average and group-level testing accuracy. Moreover, we show that increasing the fraction of the minority group in the training data does not necessarily improve the generalization performance of the minority group. Our theoretical results are validated on both synthetic and empirical datasets, such as CelebA and CIFAR-10 in image classification.	翻訳日:2024-03-20 20:49:20 公開日:2024-03-19
# リンク予測のための知識グラフ大言語モデル(KG-LLM) Knowledge Graph Large Language Model (KG-LLM) for Link Prediction ( http://arxiv.org/abs/2403.07311v4 ) ライセンス: Link先を確認	Dong Shu, Tianle Chen, Mingyu Jin, Yiting Zhang, Chong Zhang, Mengnan Du, Yongfeng Zhang,	(参考訳) 知識グラフ(KG)内の複数のリンクを予測するタスクは、知識グラフ解析の分野における課題であり、自然言語処理(NLP)やKG埋め込み技術の進歩により、ますます解決しやすくなっている。本稿では,知識グラフ大言語モデルフレームワーク(KG-LLM)を提案する。このフレームワークは,KGにおけるマルチホップリンク予測を強化するために,チェーン・オブ・シンクレット(CoT)やインコンテキスト学習(ICL)など,重要なNLPパラダイムを活用する。 KGをCoTプロンプトに変換することで、我々のフレームワークはエンティティの潜在表現とその相互関係を識別し、学習するように設計されている。 KG-LLM フレームワークの有効性を示すため,本フレームワークでは,ICL と ICL の2つのタスクを総合的な評価に用い,主要な3つのLarge Language Model (LLM) を微調整する。さらに、これまで見つからなかったプロンプトを扱うため、ゼロショット機能を備えたLLMを提供するフレームワークの可能性についても検討する。実験の結果,ICLとCoTの統合はアプローチの性能を向上するだけでなく,モデルの一般化能力を大幅に向上させ,不慣れなシナリオにおけるより正確な予測を可能にすることがわかった。 The task of predicting multiple links within knowledge graphs (KGs) stands as a challenge in the field of knowledge graph analysis, a challenge increasingly resolvable due to advancements in natural language processing (NLP) and KG embedding techniques. This paper introduces a novel methodology, the Knowledge Graph Large Language Model Framework (KG-LLM), which leverages pivotal NLP paradigms, including chain-of-thought (CoT) prompting and in-context learning (ICL), to enhance multi-hop link prediction in KGs. By converting the KG to a CoT prompt, our framework is designed to discern and learn the latent representations of entities and their interrelations. To show the efficacy of the KG-LLM Framework, we fine-tune three leading Large Language Models (LLMs) within this framework, employing both non-ICL and ICL tasks for a comprehensive evaluation. Further, we explore the framework's potential to provide LLMs with zero-shot capabilities for handling previously unseen prompts. 
Our experimental findings show that integrating ICL and CoT not only augments the performance of our approach but also significantly boosts the models' generalization capacity, thereby ensuring more precise predictions in unfamiliar scenarios.	翻訳日:2024-03-20 20:49:20 公開日:2024-03-19
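The KG-LLM abstract describes converting a knowledge-graph path into a chain-of-thought prompt for multi-hop link prediction. The sketch below shows one way such a conversion could look. The prompt wording, the function name, and the example entities are hypothetical; the abstract does not give the authors' actual template.

```python
def path_to_cot_prompt(path, question_relation="connected to"):
    """Turn a chain of (head, relation, tail) triples into a step-by-step prompt.

    The template is an illustrative assumption, not the paper's format.
    """
    if not path:
        raise ValueError("path must contain at least one triple")
    # Each triple must start where the previous one ended.
    for (_, _, tail), (head, _, _) in zip(path, path[1:]):
        if tail != head:
            raise ValueError("triples must form a chain")
    first, last = path[0][0], path[-1][2]
    lines = [f"Question: is {first} {question_relation} {last}?", "Reasoning:"]
    for i, (head, relation, tail) in enumerate(path, 1):
        lines.append(f"Step {i}: {head} {relation} {tail}.")
    lines.append(f"Answer: yes, {first} is {question_relation} {last}.")
    return "\n".join(lines)


# Hypothetical two-hop path.
example_path = [
    ("Alice", "works at", "Acme"),
    ("Acme", "is located in", "Berlin"),
]
print(path_to_cot_prompt(example_path))
```

Prompts of this shape could serve as fine-tuning examples; for in-context learning, a few such worked examples would presumably be prepended to a new question whose answer line is left blank.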
# 運動マンバ:階層型および双方向選択型SSMを用いた効率よく長周期な運動生成 Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM ( http://arxiv.org/abs/2403.07487v3 ) ライセンス: Link先を確認	Zeyu Zhang, Akide Liu, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang,	(参考訳) 人間の動き生成は、生成的コンピュータビジョンにおいて重要な追求であり、長いシーケンスと効率的な動き生成を実現することは依然として困難である。状態空間モデル(SSM)の最近の進歩、特にMambaは、効率的なハードウェア・アウェア・デザインによる長いシーケンス・モデリングにおいてかなりの可能性を秘めている。それでも、モーション生成へのSSMの適用は、モーションシーケンスをモデル化するための特別な設計アーキテクチャが欠如しているため、ハードルに直面している。これらの課題に対処するために、我々はSSMを用いた先駆的な動き生成モデルを示すシンプルで効率的なアプローチであるMotion Mambaを提案する。具体的には,階層型テンポラルマンバ(HTM)ブロックを設計し,フレーム間の動きの整合性を保つことを目的とした対称U-Netアーキテクチャを用いて,孤立SSMモジュールの様々な数をアンサンブルすることで時間データを処理する。また,2方向空間マンバ(BSM)ブロックを2方向処理し,時間フレーム内での高精度な動作生成を実現する。提案手法は,HumanML3DおよびKIT-MLデータセットの最大50%のFID改善と最大4倍の高速化を実現する。 https://steve-zeyu-zhang.github.io/MotionMamba/ Human motion generation stands as a significant pursuit in generative computer vision, while achieving long-sequence and efficient motion generation remains challenging. Recent advancements in state space models (SSMs), notably Mamba, have showcased considerable promise in long sequence modeling with an efficient hardware-aware design, which appears to be a promising direction to build motion generation model upon it. Nevertheless, adapting SSMs to motion generation faces hurdles since the lack of a specialized design architecture to model motion sequence. To address these challenges, we propose Motion Mamba, a simple and efficient approach that presents the pioneering motion generation model utilized SSMs. Specifically, we design a Hierarchical Temporal Mamba (HTM) block to process temporal data by ensemble varying numbers of isolated SSM modules across a symmetric U-Net architecture aimed at preserving motion consistency between frames. We also design a Bidirectional Spatial Mamba (BSM) block to bidirectionally process latent poses, to enhance accurate motion generation within a temporal frame. 
Our proposed method achieves up to 50% FID improvement and up to 4 times faster on the HumanML3D and KIT-ML datasets compared to the previous best diffusion-based method, which demonstrates strong capabilities of high-quality long sequence motion modeling and real-time human motion generation. See project website https://steve-zeyu-zhang.github.io/MotionMamba/	翻訳日:2024-03-20 20:49:20 公開日:2024-03-19
# ドメインシフト緩和によるGAN画像翻訳におけるモデル抽出攻撃に向けて Towards Model Extraction Attacks in GAN-Based Image Translation via Domain Shift Mitigation ( http://arxiv.org/abs/2403.07673v3 ) ライセンス: Link先を確認	Di Mi, Yanjun Zhang, Leo Yu Zhang, Shengshan Hu, Qi Zhong, Haizhuan Yuan, Shirui Pan,	(参考訳) モデル抽出攻撃(MEA)は、攻撃者が被害者のディープニューラルネットワーク(DNN)モデルの機能をリモートでクエリするだけで複製することができ、クエリ毎のDNNベースのサービスのセキュリティと整合性に深刻な脅威をもたらす。近年のMEA研究の大部分はニューラル分類器に重点を置いているが、日々の作業においてイメージ・ツー・イメージ翻訳(I2IT)タスクが普及している。しかし、DNN分類器のMEAのために開発された技術は、直接I2ITのケースに転送できず、I2ITモデルの脆弱性をMEA攻撃に反映することが多い。本稿では,新たな視点からI2ITタスクにおけるMEAの脅威を明らかにする。攻撃者クエリと被害者のトレーニングサンプル間の分配ギャップを埋める従来のアプローチから、我々はドメインシフトとして知られる、異なる分布に起因する影響を緩和することを選択した。これは、高周波ノイズをペナライズする新しい正規化項を導入し、シフト分布への過度な適合を避けるために、より平坦な最小化を求めることで達成される。画像スーパーレゾリューションやスタイル転送など、さまざまな画像翻訳タスクに関する大規模な実験は、異なるバックボーンの犠牲者モデルで行われ、新しい設計は、すべての指標において、ベースラインよりも一貫して優れています。実際のI2IT APIも、攻撃に対して極めて脆弱であることが確認されており、防御強化の必要性と、潜在的に改訂されたAPIパブリッシングポリシーを強調している。 Model extraction attacks (MEAs) enable an attacker to replicate the functionality of a victim deep neural network (DNN) model by only querying its API service remotely, posing a severe threat to the security and integrity of pay-per-query DNN-based services. Although the majority of current research on MEAs has primarily concentrated on neural classifiers, there is a growing prevalence of image-to-image translation (I2IT) tasks in our everyday activities. However, techniques developed for MEA of DNN classifiers cannot be directly transferred to the case of I2IT, rendering the vulnerability of I2IT models to MEA attacks often underestimated. This paper unveils the threat of MEA in I2IT tasks from a new perspective. Diverging from the traditional approach of bridging the distribution gap between attacker queries and victim training samples, we opt to mitigate the effect caused by the different distributions, known as the domain shift. 
This is achieved by introducing a new regularization term that penalizes high-frequency noise, and seeking a flatter minimum to avoid overfitting to the shifted distribution. Extensive experiments on different image translation tasks, including image super-resolution and style transfer, are performed on different backbone victim models, and the new design consistently outperforms the baseline by a large margin across all metrics. A few real-life I2IT APIs are also verified to be extremely vulnerable to our attack, emphasizing the need for enhanced defenses and potentially revised API publishing policies.	翻訳日:2024-03-20 20:49:20 公開日:2024-03-19
# 再現性と幾何学的固有次元:グラフニューラルネットワークの研究 Reproducibility and Geometric Intrinsic Dimensionality: An Investigation on Graph Neural Network Research ( http://arxiv.org/abs/2403.08438v2 ) ライセンス: Link先を確認	Tobias Hille, Maximilian Stubbemann, Tom Hanika,	(参考訳) 近年,機械学習研究における実証的証拠の複製と再現性の難しさが注目されている。機械学習の研究結果が健全で信頼性の高いことを保証するには再現性が必要であり、同じコードとデータを使って研究結果の信頼性を検証する。これにより、オープンでアクセス可能な研究、堅牢な実験ワークフロー、そして新しい発見の迅速な統合が促進される。研究出版物がこれらの再現性の異なる側面をサポートする程度を評価することが,本研究の目標である。そこで我々は,機械学習における再現性オントロジーを導入し,それをグラフニューラルネットワークの手法に適用する。データ収集、表現、分析の課題を引き起こす次元の呪いによって、代表データを見つけにくくなり、トレーニングや推論プロセスの妨げになる。幾何内在次元という密接な結びついた概念を用いて、使用する機械学習モデルの拡張は、トレーニングされたデータセットの内在次元に影響されるかを調べる。 Difficulties in replication and reproducibility of empirical evidences in machine learning research have become a prominent topic in recent years. Ensuring that machine learning research results are sound and reliable requires reproducibility, which verifies the reliability of research findings using the same code and data. This promotes open and accessible research, robust experimental workflows, and the rapid integration of new findings. Evaluating the degree to which research publications support these different aspects of reproducibility is one goal of the present work. For this we introduce an ontology of reproducibility in machine learning and apply it to methods for graph neural networks. Building on these efforts we turn towards another critical challenge in machine learning, namely the curse of dimensionality, which poses challenges in data collection, representation, and analysis, making it harder to find representative data and impeding the training and inference processes. Using the closely linked concept of geometric intrinsic dimension we investigate to which extend the used machine learning models are influenced by the intrinsic dimension of the data sets they are trained on.	翻訳日:2024-03-20 20:49:20 公開日:2024-03-19
# マルチモーダル拡散モデルを用いた高密度・高精度レーダ知覚に向けて Towards Dense and Accurate Radar Perception Via Efficient Cross-Modal Diffusion Model ( http://arxiv.org/abs/2403.08460v2 ) ライセンス: Link先を確認	Ruibin Zhang, Donglai Xue, Yuhan Wang, Ruixu Geng, Fei Gao,	(参考訳) ミリ波レーダー(mmWave)は、極度の気象条件下での運用能力から、学術と産業の両方から大きな注目を集めている。しかし、マイクロ航空機(MAV)の自律航法分野への応用を妨げる、スパース性やノイズ干渉の観点からは課題に直面している。そこで本稿では, クロスモーダル学習による高密度かつ高精度なmmWaveレーダポイント雲構築手法を提案する。具体的には, 対になった生レーダデータからLiDARのような点雲を予測するために, 生成モデルにおける最先端性能を有する拡散モデルを提案する。また,提案手法が限られた計算資源を持つMAV上で実装可能であることを保証するため,近年の拡散モデル推論の高速化技術も取り入れた。コードおよび事前トレーニングされたモデルは https://github.com/ZJU-FAST-Lab/Radar-Diffusion で利用可能になる。 Millimeter wave (mmWave) radars have attracted significant attention from both academia and industry due to their capability to operate in extreme weather conditions. However, they face challenges in terms of sparsity and noise interference, which hinder their application in the field of micro aerial vehicle (MAV) autonomous navigation. To this end, this paper proposes a novel approach to dense and accurate mmWave radar point cloud construction via cross-modal learning. Specifically, we introduce diffusion models, which possess state-of-the-art performance in generative modeling, to predict LiDAR-like point clouds from paired raw radar data. We also incorporate the most recent diffusion model inference accelerating techniques to ensure that the proposed method can be implemented on MAVs with limited computing resources. We validate the proposed method through extensive benchmark comparisons and real-world experiments, demonstrating its superior performance and generalization ability. Code and pretrained models will be available at https://github.com/ZJU-FAST-Lab/Radar-Diffusion.	翻訳日:2024-03-20 20:49:20 公開日:2024-03-19
# 断熱的に分離されたサブシステム進化による予測量子固有解法:雑音量子コンピュータにおける分子エネルギー学への資源効率なアプローチ Projective Quantum Eigensolver via Adiabatically Decoupled Subsystem Evolution: a Resource Efficient Approach to Molecular Energetics in Noisy Quantum Computers ( http://arxiv.org/abs/2403.08519v2 ) ライセンス: Link先を確認	Chayan Patra, Sonaldeep Halder, Rahul Maitra,	(参考訳) 量子コンピュータは化学の分野で大きな可能性を秘めており、古典的なコンピュータの範囲を超える複雑な多くの身体問題を解くために新しいフロンティアを開拓している。しかし、現在の量子ハードウェアのノイズは、大きな化学系に適用性を制限する。この研究は、ノイズ中間スケール量子(NISQ)ハードウェアを用いて分子系の基底状態エネルギーを資源効率よく正確に計算することを目的とした射影形式の開発を含む。本手法は, 連接型ユニタリ結合クラスタ(dUCC)フレームワークにおいて, 連接型パラメータ化アンサッツの定式化に依存している。このようなデカップリングは、低次元多様体における全パラメータ最適化をエミュレートし、パラメータ間の相互相乗関係を利用して特性精度を確保する。回路前測定を行なわずに、より浅い回路と期待値の少ないよりコンパクトな固定深度アンサッツを導出する。解析的および数値的な実演を通して,将来の耐故障システムにおいて必要な精度を確保しつつ,ノイズ下での手法の優れた性能を実証する。このアプローチは、短期量子ハードウェア資源の効率的な利用により、出現する化学空間の迅速な探索を可能にする。 Quantum computers hold immense potential in the field of chemistry, ushering new frontiers to solve complex many body problems that are beyond the reach of classical computers. However, noise in the current quantum hardware limits their applicability to large chemical systems. This work encompasses the development of a projective formalism that aims to compute ground-state energies of molecular systems accurately using Noisy Intermediate Scale Quantum (NISQ) hardware in a resource efficient manner. Our approach is reliant upon the formulation of a bipartitely decoupled parameterized ansatz within the disentangled unitary coupled cluster (dUCC) framework based on the principles of synergetics. Such decoupling emulates the total parameter optimization in a lower dimensional manifold, while a mutual synergistic relationship among the parameters is exploited to ensure characteristic accuracy. Without any pre-circuit measurements, our method leads to a highly compact fixed-depth ansatz with shallower circuits and fewer expectation value evaluations. 
Through analytical and numerical demonstrations, we show the method's superior performance under noise while concurrently ensuring requisite accuracy in future fault-tolerant systems. This approach enables rapid exploration of emerging chemical spaces by efficient utilization of near-term quantum hardware resources.	翻訳日:2024-03-20 20:49:20 公開日:2024-03-19
# 量子モンテカルロによる分配関数の計算法 Reweight-annealing method for calculating the value of partition function via quantum Monte Carlo ( http://arxiv.org/abs/2403.08642v2 ) ライセンス: Link先を確認	Yi-Ming Ding, Jun-Song Sun, Nvsen Ma, Gaopei Pan, Chen Cheng, Zheng Yan,	(参考訳) 分割関数、自由エネルギー、熱エントロピー計算の効率的かつ正確なアルゴリズムは、統計物理学や量子多体物理学において非常に重要である。ここでは、量子モンテカルロフレームワーク内のバイアスのない低技術バリアアルゴリズムについて述べる。従来の比熱積分法やWang-Landauサンプリング法と比較すると,エントロピーのサブリード係数のより正確な結果が得られる。この方法は古典的モンテカルロシミュレーションと量子的モンテカルロシミュレーションの両方で広く利用でき、コンピュータ上で容易に並列化できる。 Efficient and accurate algorithm for partition function, free energy and thermal entropy calculations is of great significance in statistical physics and quantum many-body physics. Here we present an unbiased but low-technical-barrier algorithm within the quantum Monte Carlo framework, which has exceptionally high accuracy and no systemic error. Compared with the conventional specific heat integral method and Wang-Landau sampling algorithm, our method can obtain a much more accurate result of the sub-leading coefficient of the entropy. This method can be widely used in both classical and quantum Monte Carlo simulations and is easy to be parallelized on computer.	翻訳日:2024-03-20 20:49:20 公開日:2024-03-19
# サイバーセキュリティにおけるジェネレーティブAI手法のレビュー Review of Generative AI Methods in Cybersecurity ( http://arxiv.org/abs/2403.08701v2 ) ライセンス: Link先を確認	Yagmur Yigit, William J Buchanan, Madjid G Tehrani, Leandros Maglaras,	(参考訳) 過去10年間で、人工知能(AI)は、特にChatGPT、Gemini、DALL-Eといったチャットボットの使用によって、ますます人気が高まっている。この増加に伴い、大規模言語モデル(LLM)やジェネレーティブAI(GenAI)も日常的な利用で普及している。これらの進歩はサイバーセキュリティの防御姿勢を強化し、敵に対する新たな攻撃経路を開く。本稿では,GenAIの現状を概観し,暴行,脱獄,即時注射と逆心理学の応用について概説する。本稿では,自動ハッキング,フィッシングメール,ソーシャルエンジニアリング,リバース暗号,攻撃ペイロードの作成,マルウェア作成など,サイバー犯罪におけるGenAIのさまざまな応用について述べる。 GenAIは、データセットの構築、安全なコード開発、脅威インテリジェンス、防御措置、報告、サイバー攻撃検出などの戦略を通じて、防御サイバーセキュリティプロセスの自動化を大幅に改善することができる。本研究では、GenAIが現在生み出している問題に対処し、サイバーセキュリティにおけるその将来的な応用への公平なアプローチをさらに促進するために、堅牢な倫理的規範と革新的な防衛メカニズムの開発に重点を置くことを提案する。さらに、科学的発展と倫理的考察のギャップを埋めるための学際的アプローチの重要性を強調した。 Over the last decade, Artificial Intelligence (AI) has become increasingly popular, especially with the use of chatbots such as ChatGPT, Gemini, and DALL-E. With this rise, large language models (LLMs) and Generative AI (GenAI) have also become more prevalent in everyday use. These advancements strengthen cybersecurity's defensive posture and open up new attack avenues for adversaries as well. This paper provides a comprehensive overview of the current state-of-the-art deployments of GenAI, covering assaults, jailbreaking, and applications of prompt injection and reverse psychology. This paper also provides the various applications of GenAI in cybercrimes, such as automated hacking, phishing emails, social engineering, reverse cryptography, creating attack payloads, and creating malware. GenAI can significantly improve the automation of defensive cyber security processes through strategies such as dataset construction, safe code development, threat intelligence, defensive measures, reporting, and cyberattack detection. 
In this study, we suggest that future research should focus on developing robust ethical norms and innovative defense mechanisms to address the current issues that GenAI creates and to further encourage an impartial approach to its future application in cybersecurity. Moreover, we underscore the importance of interdisciplinary approaches to further bridge the gap between scientific developments and ethical considerations.	翻訳日:2024-03-20 20:49:20 公開日:2024-03-19
# 量子基礎への新しいアプローチといくつかの結果 A new approach towards quantum foundation and some consequences ( http://arxiv.org/abs/2403.09224v2 ) ライセンス: Link先を確認	Inge S. Helland,	(参考訳) 6つの仮定に基づく一般的な理論が紹介される。基本的な概念は、観測者または通信観測者のグループと関連付けられた理論変数である。これらの変数はアクセス可能かアクセス不能である。これらの仮定から、量子論の通常の形式主義が導かれる。数学の導出はこの記事には書かれていないが、最近の記事[9, 10]を参照しよう。一般理論の3つの可能な応用が与えられる。 1) 変数は,個人または集団の決定に関連する決定変数でありうる。 2) 変数は統計的パラメータや将来のデータかもしれない。 3)変数は、あるコンテキストにおける物理変数である。この最後の応用は、量子力学の全く新しい基盤を与える。これは私の意見では、通常の形式論よりも理解しやすい基礎であり、他の応用もこのアプローチの興味深い結果をもたらすように思える。シュレーディンガーの猫のようないわゆるパラドックスは、この理論の下で解明することができる。デービッド・ボームのEPR実験の結果とベル実験の結果について解説する。最後に、相対論と場の量子論へのリンクへの参照が与えられる。 A general theory based upon 6 postulates is introduced. The basic notions are theoretical variables that are associated with an observer or with a group of communicating observers. These variables may be accessible or inaccessible. From these postulates, the ordinary formalism of quantum theory is derived. The mathematical derivations are not given in this article, but I refer to the recent articles [9, 10]. Three possible applications of the general theory can be given: 1) The variables may be decision variables connected to the decisions of a person or of a group of persons. 2) The variables may be statistical parameters or future data. But most importantly here: 3) The variables are physical variables in some context. This last application gives a completely new foundation of quantum mechanics, a foundation which in my opinion is much easier to understand than the ordinary formalism. The other applications also seem to give interesting consequences of the approach. So-called paradoxes like that of Schrödinger's cat can be clarified under the theory. Explanations of the outcomes of David Bohm's version of the EPR experiment and of the Bell experiment are provided. Finally, references to links towards relativity theory and to quantum field theory are given.	翻訳日:2024-03-20 20:49:20 公開日:2024-03-19
# リンドブラッド進化における古典的量子対応 Classical-Quantum correspondence in Lindblad evolution ( http://arxiv.org/abs/2403.09345v2 ) ライセンス: Link先を確認	Jeffrey Galkowski, Maciej Zworski,	(参考訳) (高々)2次で増大する古典的ハミルトニアンと(高々)線型に増大する古典的ジャンプ関数(ある楕円性条件を満たすと仮定されるジャンプ作用素に量子化され、より大きな系との相互作用をモデル化する)を用いて定義されるリンドブラッド発展について、量子可観測量の発展は、エーレンフェスト時間(ジャンプ作用素がない場合のそのような一致の限界)をはるかに超える時間にわたって、ヒルベルト-シュミットノルムにおいて古典的フォッカー-プランク発展に近いままであることを示す。時間スケールは、Hernández-Ranard-Riedelによる最近の2つの論文と同じであるが、主張と手法が異なる。 We show that for the Lindblad evolution defined using (at most) quadratically growing classical Hamiltonians and (at most) linearly growing classical jump functions (quantized into jump operators assumed to satisfy certain ellipticity conditions and modeling interaction with a larger system), the evolution of a quantum observable remains close to the classical Fokker-Planck evolution in the Hilbert-Schmidt norm for times vastly exceeding the Ehrenfest time (the limit of such agreement with no jump operators). The time scale is the same as in two recent papers by Hernández-Ranard-Riedel but the statement and methods are different.	翻訳日:2024-03-20 20:49:20 公開日:2024-03-19
# コモド:インドネシアの地域言語への言語学的遠征 Komodo: A Linguistic Expedition into Indonesia's Regional Languages ( http://arxiv.org/abs/2403.09362v2 ) ライセンス: Link先を確認	Louis Owen, Vishesh Tripathi, Abhay Kumar, Biddwan Ahmed,	(参考訳) 近年のLLM(Large Language Models)のブレークスルーは、主に英語のような手軽で十分なリソースを持つ言語に焦点を当てている。しかし、パブリックドメインで十分な言語資源が不足している言語には、依然として大きなギャップがある。インドネシア語,英語,11の地域言語をシームレスに操作することで,このギャップに対処する7ビリオンパラメータ大言語モデルであるKomodo-7Bを紹介した。コモド-7Bは、コモド-7B-ベースとコモド-7B-インストラクションからなるLLMのファミリーである。 Komodo-7B-Instructは、OpenAIのGPT-3.5、CohereのAya-101、Llama-2-Chat-13B、Mixtral-8x7B-Instruct-v0.1、Gemma-7B-itなどのベンチマークを上回り、様々なタスクや言語で最先端のパフォーマンスを達成することで際立っている。このモデルは、言語固有の評価と全体的な評価の両方において優れた性能を示すだけでなく、言語多様性に優れる能力を強調している。言語モデルの発展への我々のコミットメントは、限られた言語資産を持つ人々のギャップを埋めることを目的として、十分なリソースを持つ言語を超えて拡張されます。さらに、コモド7B-インストラクトはインドネシアの教育格差に対処するために、英語から11の地域言語への直接翻訳を提供しており、既存の言語翻訳サービスに比べて大幅に改善されている。コモド7Bは言語モデルにおける傾きと有効性への重要なステップであり、多様なコミュニティの言語的ニーズに寄与する。 The recent breakthroughs in Large Language Models (LLMs) have mostly focused on languages with easily available and sufficient resources, such as English. However, there remains a significant gap for languages that lack sufficient linguistic resources in the public domain. Our work introduces Komodo-7B, 7-billion-parameter Large Language Models designed to address this gap by seamlessly operating across Indonesian, English, and 11 regional languages in Indonesia. Komodo-7B is a family of LLMs that consist of Komodo-7B-Base and Komodo-7B-Instruct. Komodo-7B-Instruct stands out by achieving state-of-the-art performance in various tasks and languages, outperforming the benchmarks set by OpenAI's GPT-3.5, Cohere's Aya-101, Llama-2-Chat-13B, Mixtral-8x7B-Instruct-v0.1, Gemma-7B-it , and many more. This model not only demonstrates superior performance in both language-specific and overall assessments but also highlights its capability to excel in linguistic diversity. 
Our commitment to advancing language models extends beyond well-resourced languages, aiming to bridge the gap for those with limited linguistic assets. Additionally, Komodo-7B-Instruct's better cross-language understanding contributes to addressing educational disparities in Indonesia, offering direct translations from English to 11 regional languages, a significant improvement compared to existing language translation services. Komodo-7B represents a crucial step towards inclusivity and effectiveness in language models, catering to the linguistic needs of diverse communities.	翻訳日:2024-03-20 20:49:20 公開日:2024-03-19
# ベイジアンネットワークを用いた語彙データとテキストによる臨床推論 Clinical Reasoning over Tabular Data and Text with Bayesian Networks ( http://arxiv.org/abs/2403.09481v2 ) ライセンス: Link先を確認	Paloma Rabaey, Johannes Deleu, Stefan Heytens, Thomas Demeester,	(参考訳) ベイジアンネットワークは、表形式のデータに対する臨床推論には適しているが、ニューラルネットワークが成功したフレームワークを提供する自然言語データとの互換性が低い。本稿では,ベイジアンネットワークとニューラルテキスト表現を生成的・識別的に比較検討する。本研究は, プライマリ・ケア・ユースケース(肺炎の診断)のシミュレーション結果と, より広い臨床文脈で考察した。 Bayesian networks are well-suited for clinical reasoning on tabular data, but are less compatible with natural language data, for which neural networks provide a successful framework. This paper compares and discusses strategies to augment Bayesian networks with neural text representations, both in a generative and discriminative manner. This is illustrated with simulation results for a primary care use case (diagnosis of pneumonia) and discussed in a broader clinical context.	翻訳日:2024-03-20 20:49:20 公開日:2024-03-19
# Qラーニングを用いた乳牛用バッテリー管理への強化学習アプローチ A Reinforcement Learning Approach to Dairy Farm Battery Management using Q Learning ( http://arxiv.org/abs/2403.09499v2 ) ライセンス: Link先を確認	Nawazish Ali, Abdul Wahid, Rachael Shaw, Karl Mason,	(参考訳) 乳牛の農業はかなりの量のエネルギーを消費しており、農業のエネルギー集約部門となっている。再生可能エネルギーの農業への統合は、この課題に対処するのに役立つ。再生可能エネルギーの創出に有効な電池管理が重要である。電力消費の変動、再生可能エネルギーの断続的な性質、エネルギー価格の変動など、バッテリー充電と放電の管理は大きな課題となっている。人工知能(AI)は、乳園農業における再生可能エネルギーの利用を著しく改善する可能性があるが、この領域では限定的な研究が行われている。本研究は、アイルランドを再生可能エネルギーの利用を中心とした2030年のエネルギー戦略の達成に向けたケーススタディとみなす。本研究は, 乳園における電池充電と排出をスケジューリングするQラーニングに基づくアルゴリズムを提案する。本研究は,風力発生データの追加とケーススタディの追加による提案アルゴリズムの効果についても検討する。提案アルゴリズムは,送電網からの電力輸入コストを13.41 %,ピーク需要を2 %,風力発電を24.49 %削減する。これらの結果は, 農林水産部門における増補学習が, バッテリー管理に極めて有効であることを示すものである。 Dairy farming consumes a significant amount of energy, making it an energy-intensive sector within agriculture. Integrating renewable energy generation into dairy farming could help address this challenge. Effective battery management is important for integrating renewable energy generation. Managing battery charging and discharging poses significant challenges because of fluctuations in electrical consumption, the intermittent nature of renewable energy generation, and fluctuations in energy prices. Artificial Intelligence (AI) has the potential to significantly improve the use of renewable energy in dairy farming, however, there is limited research conducted in this particular domain. This research considers Ireland as a case study as it works towards attaining its 2030 energy strategy centered on the utilization of renewable sources. This study proposes a Q-learning-based algorithm for scheduling battery charging and discharging in a dairy farm setting. This research also explores the effect of the proposed algorithm by adding wind generation data and considering additional case studies. 
The proposed algorithm reduces the cost of imported electricity from the grid by 13.41%, peak demand by 2%, and 24.49% when utilizing wind generation. These results underline how reinforcement learning is highly effective in managing batteries in the dairy farming sector.	翻訳日:2024-03-20 20:49:20 公開日:2024-03-19
# MM1:マルチモーダルLLM事前学習の方法・分析・洞察 MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training ( http://arxiv.org/abs/2403.09611v2 ) ライセンス: Link先を確認	Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Mark Lee, Zirui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang,	(参考訳) 本稿では,MLLM(Multimodal Large Language Models)の構築について論じる。特に,さまざまなアーキテクチャコンポーネントとデータ選択の重要性について検討する。画像エンコーダ,視覚言語コネクタ,各種事前学習データの選択を慎重にかつ包括的に改善することにより,いくつかの重要な設計の教訓を明らかにした。例えば、画像キャプチャー、インターリーブド画像テキスト、テキストのみのデータを慎重に組み合わせた大規模マルチモーダル事前学習は、複数のベンチマークで最新のSOTA (State-of-the-art) 数ショット結果を達成するのに不可欠であることを示す。さらに、画像解像度と画像トークン数とを併用した画像エンコーダは、視覚言語コネクタ設計が比較的重要視されているのに対して、かなりの影響を与えることを示す。提案したレシピをスケールアップすることにより,厳密なモデルと混合オブエキスパート(MoE)変異を含む最大30BパラメータのマルチモーダルモデルのファミリーであるMM1を構築する。大規模な事前トレーニングによって、MM1は、強化されたコンテキスト内学習やマルチイメージ推論などの魅力的な特性を享受し、数発のチェーン・オブ・シークレットのプロンプトを可能にしている。 In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that for large-scale multimodal pre-training using a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art (SOTA) few-shot results across multiple benchmarks, compared to other published pre-training results. Further, we show that the image encoder together with image resolution and the image token count has substantial impact, while the vision-language connector design is of comparatively negligible importance. 
By scaling up the presented recipe, we build MM1, a family of multimodal models up to 30B parameters, including both dense models and mixture-of-experts (MoE) variants, that are SOTA in pre-training metrics and achieve competitive performance after supervised fine-tuning on a range of established multimodal benchmarks. Thanks to large-scale pre-training, MM1 enjoys appealing properties such as enhanced in-context learning and multi-image reasoning, enabling few-shot chain-of-thought prompting.	翻訳日:2024-03-20 20:39:33 公開日:2024-03-19
# FoldToken: ベクトル量子化とそれを超えるタンパク質言語を学ぶ FoldToken: Learning Protein Language via Vector Quantization and Beyond ( http://arxiv.org/abs/2403.09673v2 ) ライセンス: Link先を確認	Zhangyang Gao, Cheng Tan, Jue Wang, Yufei Huang, Lirong Wu, Stan Z. Li,	(参考訳) タンパク質配列と構造を同時に記述する外国語はあるか? 連続した3Dポイントで表されるタンパク質構造は、離散配列の対照的なモデリングパラダイムのため、長年にわたって課題を提起してきた。タンパク質配列構造を離散シンボルとして表現するために、FoldTokenizerを導入する。この革新的なアプローチは、情報保存のための再構築損失によって導かれる、残余のタイプと構造を離散空間に投影することである。学習した離散記号を「FoldToken」と呼び、FoldTokensの配列は新しいタンパク質言語として機能し、タンパク質の配列構造を統一されたモダリティに変換する。生成したタンパク質言語を、一般的なバックボーン塗布および抗体設計タスクに適用し、最初のGPTスタイルモデル(FoldGPT)を構築し、将来性のある結果を得る。我々の成功の鍵は、ベクトル量子化モジュールであるソフト条件ベクトル量子化(SoftCVQ)の大幅な拡張である。 Is there a foreign language describing protein sequences and structures simultaneously? Protein structures, represented by continuous 3D points, have long posed a challenge due to the contrasting modeling paradigms of discrete sequences. We introduce FoldTokenizer to represent protein sequence-structure as discrete symbols. This innovative approach involves projecting residue types and structures into a discrete space, guided by a reconstruction loss for information preservation. We refer to the learned discrete symbols as FoldToken, and the sequence of FoldTokens serves as a new protein language, transforming the protein sequence-structure into a unified modality. We apply the created protein language on general backbone inpainting and antibody design tasks, building the first GPT-style model (FoldGPT) for sequence-structure co-generation with promising results. Key to our success is the substantial enhancement of the vector quantization module, Soft Conditional Vector Quantization (SoftCVQ).	翻訳日:2024-03-20 20:39:33 公開日:2024-03-19
# 表面配向ガウス平板による可制御型テキスト・ツー・3D生成 Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting ( http://arxiv.org/abs/2403.09981v2 ) ライセンス: Link先を確認	Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu,	(参考訳) テキスト・トゥ・3Dと画像・ツー・3D生成タスクは注目されているが,その間には制御可能なテキスト・ツー・3D生成機能がある。この課題に対処する。 1)Multi-view ControlNet(MVControl)は,エッジ,深さ,正規,スクリブルマップなどの入力条件を統合することで,既存のトレーニング済みのマルチビュー拡散モデルを強化するニューラルネットワークアーキテクチャである。我々の革新は、入力条件画像とカメラポーズから計算される局所的およびグローバルな埋め込みを用いてベース拡散モデルを制御する条件付きモジュールの導入にある。トレーニングが完了すると、MVControlは最適化ベースの3D生成のための3D拡散ガイダンスを提供することができる。そして 2) 近年の大規模再構成モデルとスコア蒸留アルゴリズムの利点を生かした,効率的な多段3D生成パイプラインを提案する。 MVControlアーキテクチャを基盤として,最適化プロセスの指示に独自のハイブリッド拡散誘導手法を採用している。効率性を追求するために、一般的に使用される暗黙の表現の代わりに、3Dガウスを表現として採用する。我々はまた、ガウスを三角形の面に結合するハイブリッド表現SuGaRの使用の先駆者でもある。このアプローチは、3Dガウスの幾何学の問題を緩和し、メッシュ上の微細な幾何学を直接彫刻することを可能にする。大規模な実験により,本手法は堅牢な一般化を実現し,高品質な3Dコンテンツの制御可能な生成を可能にした。 While text-to-3D and image-to-3D generation tasks have received considerable attention, one important but under-explored field between them is controllable text-to-3D generation, which we mainly focus on in this work. To address this task, 1) we introduce Multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing pre-trained multi-view diffusion models by integrating additional input conditions, such as edge, depth, normal, and scribble maps. Our innovation lies in the introduction of a conditioning module that controls the base diffusion model using both local and global embeddings, which are computed from the input condition images and camera poses. Once trained, MVControl is able to offer 3D diffusion guidance for optimization-based 3D generation. And, 2) we propose an efficient multi-stage 3D generation pipeline that leverages the benefits of recent large reconstruction models and score distillation algorithm. Building upon our MVControl architecture, we employ a unique hybrid diffusion guidance method to direct the optimization process. 
In pursuit of efficiency, we adopt 3D Gaussians as our representation instead of the commonly used implicit representations. We also pioneer the use of SuGaR, a hybrid representation that binds Gaussians to mesh triangle faces. This approach alleviates the issue of poor geometry in 3D Gaussians and enables the direct sculpting of fine-grained geometry on the mesh. Extensive experiments demonstrate that our method achieves robust generalization and enables the controllable generation of high-quality 3D content.	翻訳日:2024-03-20 20:39:33 公開日:2024-03-19
# 非機能特性を目標としたシステムレベルテストプログラムを生成するための大規模言語モデル Large Language Models to Generate System-Level Test Programs Targeting Non-functional Properties ( http://arxiv.org/abs/2403.10086v2 ) ライセンス: Link先を確認	Denis Schwachhofer, Peter Domanski, Steffen Becker, Stefan Wagner, Matthias Sauer, Dirk Pflüger, Ilia Polian,	(参考訳) System-Level Test (SLT) は10年以上にわたって集積回路のテストフローの一部であり、依然として重要になっている。しかしながら、テストプログラム生成のための体系的なアプローチは存在せず、特にテスト対象デバイス(DUT)の非機能特性をターゲットにしている。現在、テストエンジニアは、DUTのエンドユーザ環境を近似して、既製のソフトウェアからテストスイートを手作業で作成している。これは、機能しないプロパティに対する十分な制御を保証しない、困難で退屈なタスクです。本稿では,テストプログラムを生成するためのLarge Language Models (LLM)を提案する。我々は、DUTの非機能特性を最適化するために、事前訓練されたLLMがテストプログラム生成でどのように機能するかを、一目で見てみる。したがって、シミュレーションにおいて超スカラー・アウト・オブ・オーダーアーキテクチャのサイクル当たりの命令を最大化するCコードスニペットを生成するプロンプトを記述する。さらに,さらなるトレーニングを行なわずに最適な結果を得るために,プロンプトとハイパーパラメータの最適化を適用した。 System-Level Test (SLT) has been a part of the test flow for integrated circuits for over a decade and still gains importance. However, no systematic approaches exist for test program generation, especially targeting non-functional properties of the Device under Test (DUT). Currently, test engineers manually compose test suites from off-the-shelf software, approximating the end-user environment of the DUT. This is a challenging and tedious task that does not guarantee sufficient control over non-functional properties. This paper proposes Large Language Models (LLMs) to generate test programs. We take a first glance at how pre-trained LLMs perform in test program generation to optimize non-functional properties of the DUT. Therefore, we write a prompt to generate C code snippets that maximize the instructions per cycle of a super-scalar, out-of-order architecture in simulation. Additionally, we apply prompt and hyperparameter optimization to achieve the best possible results without further training.	翻訳日:2024-03-20 20:39:33 公開日:2024-03-19
# フィッシャー・ラオ距離の近似と有界化 Approximation and bounding techniques for the Fisher-Rao distances ( http://arxiv.org/abs/2403.10089v2 ) ライセンス: Link先を確認	Frank Nielsen,	(参考訳) 統計モデルの2つの確率分布間のフィッシャー・ラオ距離は、フィッシャー情報計量によって誘導されるリーマン測地距離として定義される。クローズド形式のフィッシャー・ラオ距離を計算するためには、(1)フィッシャー・ラオ測地線の式を導出し、(2)フィッシャー長要素をそれらの測地線に沿って積分する必要がある。我々はフィッシャー・ラオ距離の数値的ロバストな近似とバウンディング手法について考察する: まず, サブモデルの閉形式1Dフィッシャー・ラオ距離に基づくフィッシャー・ラオ距離の一般上界について報告する。第二に、フィッシャー・ラオ測地学やプレジオデシクスがクローズド形式で利用できるかどうかによって、いくつかの一般的な近似スキームを記述する。特に,フィッシャー・ラオ・プレジェデシクスとタイト・ロウアー・アッパー・バウンドが利用できると仮定して,任意に小さな加法誤差を保証できる汎用的手法を得る。第3に,フィッシャー測度がヘッセン測度である場合を考察し,情報幾何学の手法を用いて,フィッシャー・ラオ距離の総称的上界を報告する。単パラメトリックおよび双パラメトリック統計モデルは、常にフィッシャー・ヘッセン計量を持ち、一般に、フィッシャー情報行列がヘッセン計量を生成するかどうかを確認するための単純なテストが可能である。第4に、楕円分布系を考察し、上記の手法をこれらのモデルに適用する方法を示す。また、フィッシャー・ラオ測地線のプロキシとして機能する曲線のフィッシャー・ラオ長に基づく2つの新しい距離や、バーコフ/ヒルベルト射影円錐距離に基づく2つの新しい距離を提案する。最後に、フィッシャー・ラオ距離式の構造に関する洞察を得られる最大不変量の概念に基づいて、統計変換モデルに対する別の群論的アプローチを考える。 The Fisher-Rao distance between two probability distributions of a statistical model is defined as the Riemannian geodesic distance induced by the Fisher information metric. In order to calculate the Fisher-Rao distance in closed-form, we need (1) to elicit a formula for the Fisher-Rao geodesics, and (2) to integrate the Fisher length element along those geodesics. We consider several numerically robust approximation and bounding techniques for the Fisher-Rao distances: First, we report generic upper bounds on Fisher-Rao distances based on closed-form 1D Fisher-Rao distances of submodels. Second, we describe several generic approximation schemes depending on whether the Fisher-Rao geodesics or pregeodesics are available in closed-form or not. In particular, we obtain a generic method to guarantee an arbitrarily small additive error on the approximation provided that Fisher-Rao pregeodesics and tight lower and upper bounds are available. 
Third, we consider the case of Fisher metrics being Hessian metrics, and report generic tight upper bounds on the Fisher-Rao distances using techniques of information geometry. Uniparametric and biparametric statistical models always have Fisher Hessian metrics, and in general a simple test allows one to check whether the Fisher information matrix yields a Hessian metric or not. Fourth, we consider elliptical distribution families and show how to apply the above techniques to these models. We also propose two new distances based either on the Fisher-Rao lengths of curves serving as proxies of Fisher-Rao geodesics, or based on the Birkhoff/Hilbert projective cone distance. Last, we consider an alternative group-theoretic approach for statistical transformation models based on the notion of maximal invariant which yields insights on the structures of the Fisher-Rao distance formula which may be used fruitfully in applications.	翻訳日:2024-03-20 20:39:33 公開日:2024-03-19
# DyBluRF:Blury Monocular Videoによる動的神経放射場 DyBluRF: Dynamic Neural Radiance Fields from Blurry Monocular Video ( http://arxiv.org/abs/2403.10103v2 ) ライセンス: Link先を確認	Huiqiang Sun, Xingyi Li, Liao Shen, Xinyi Ye, Ke Xian, Zhiguo Cao,	(参考訳) 近年, 動的神経放射場法が進歩し, 顕著な結果が得られた。しかし、これらのアプローチはシャープな入力画像の仮定に依存している。動きのぼやけに直面した場合、既存の動的NeRF法は、しばしば高品質な新しいビューを生成するのに苦労する。本稿では,動きのぼかしに影響を受ける単眼映像から鋭い新しい視点を合成する動的放射場アプローチであるDyBluRFを提案する。入力画像中の動きのぼかしを考慮し、シーン内のカメラ軌跡とオブジェクト離散コサイン変換(DCT)トラジェクトリを同時にキャプチャする。さらに、シーン全体にわたって一貫した時間的コヒーレンスを確保するために、グローバルなクロスタイムレンダリングアプローチを採用しています。タスクに適した多様な動的シーンからなるデータセットをキュレートする。提案手法は,映像の空間的・時間的一貫性を維持しつつ,動色入力から鮮明な新規ビューを生成する上で,既存の手法よりも優れていることを示す。 Recent advancements in dynamic neural radiance field methods have yielded remarkable outcomes. However, these approaches rely on the assumption of sharp input images. When faced with motion blur, existing dynamic NeRF methods often struggle to generate high-quality novel views. In this paper, we propose DyBluRF, a dynamic radiance field approach that synthesizes sharp novel views from a monocular video affected by motion blur. To account for motion blur in input images, we simultaneously capture the camera trajectory and object Discrete Cosine Transform (DCT) trajectories within the scene. Additionally, we employ a global cross-time rendering approach to ensure consistent temporal coherence across the entire scene. We curate a dataset comprising diverse dynamic scenes that are specifically tailored for our task. Experimental results on our dataset demonstrate that our method outperforms existing approaches in generating sharp novel views from motion-blurred inputs while maintaining spatial-temporal consistency of the scene.	翻訳日:2024-03-20 20:39:33 公開日:2024-03-19
# LLMを用いたゼロショット視覚認識のためのメタプロンプト Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs ( http://arxiv.org/abs/2403.11755v2 ) ライセンス: Link先を確認	M. Jehanzeb Mirza, Leonid Karlinsky, Wei Lin, Sivan Doveh, Jakub Micorek, Mateusz Kozinski, Hilde Kuhene, Horst Possegger,	(参考訳) 視覚言語モデル(VLM)のゼロショット認識能力を向上する有効な手法として,大規模言語モデル(LLM)の生成したカテゴリ固有プロンプトのプロンプトアンサンブルが出現している。これらのカテゴリ固有のプロンプトを得るには、下流タスクのためのVLMプロンプトを生成するために、LLMに手作りのプロンプトを使用する。しかし、これはこれらのタスク固有のプロンプトを手作業で組み立てる必要があり、それでも、関心のカテゴリに関連する様々な視覚概念やタスク固有のスタイルをカバーしていないかもしれない。そこで本研究では,視覚認識のためのメタプロンプティング(MPVR)を提案する。入力は、目的のタスクに関する最小限の情報と、その短い自然言語記述と関連するクラスラベルのリストの形式で、MPVRは自動的にカテゴリ固有のプロンプトの多様なセットを生成し、強力なゼロショット分類器を生成する。 MPVRは、複数のLLMとVLMでテストする際に、広く異なるドメインに属する様々な人気のあるゼロショット画像認識ベンチマークを効果的に一般化する。例えば、MPVRは、それぞれGPTとMixtral LLMを活用して、CLIPを19.8%、CLIPを18.2%(平均で5.0%、および4.5%)ゼロショット認識改善する。 Prompt ensembling of Large Language Model (LLM) generated category-specific prompts has emerged as an effective method to enhance zero-shot recognition ability of Vision-Language Models (VLMs). To obtain these category-specific prompts, the present methods rely on hand-crafting the prompts to the LLMs for generating VLM prompts for the downstream tasks. However, this requires manually composing these task-specific prompts and still, they might not cover the diverse set of visual concepts and task-specific styles associated with the categories of interest. To effectively take humans out of the loop and completely automate the prompt generation process for zero-shot recognition, we propose Meta-Prompting for Visual Recognition (MPVR). Taking as input only minimal information about the target task, in the form of its short natural language description, and a list of associated class labels, MPVR automatically produces a diverse set of category-specific prompts resulting in a strong zero-shot classifier. 
MPVR generalizes effectively across various popular zero-shot image recognition benchmarks belonging to widely different domains when tested with multiple LLMs and VLMs. For example, MPVR obtains a zero-shot recognition improvement over CLIP by up to 19.8% and 18.2% (5.0% and 4.5% on average over 20 datasets) leveraging GPT and Mixtral LLMs, respectively.	翻訳日:2024-03-20 20:29:45 公開日:2024-03-19
# BAD-ガウシアン: ガウシアン・スプラッティングを調整したバンドル BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting ( http://arxiv.org/abs/2403.11831v2 ) ライセンス: Link先を確認	Lingzhe Zhao, Peng Wang, Peidong Liu,	(参考訳) ニューラルレンダリングは、3Dシーンの再構築と新しいビュー合成において印象的な能力を示しているが、高品質のシャープな画像と正確なカメラのポーズに大きく依存している。低照度や長時間露光といった現実のシナリオでよく見られる、モーションブルーの画像でニューラル・レージアンス・フィールド(NeRF)を訓練するための多くのアプローチが提案されている。しかし、NeRFの暗黙の表現は、高度に動きのある画像から複雑な詳細を正確に復元するのに苦労し、リアルタイムレンダリングを達成できない。対照的に、3次元ガウス球面の最近の進歩は、点雲をガウス球として明示的に最適化することにより、高品質な3次元シーン再構成とリアルタイムレンダリングを実現している。本稿では,ガウス表現を明示的に活用し,不正確なカメラポーズで鮮明な映像を処理し,高品質なシーン再構成を実現する,BAD-Gaussian (Bundle Adjusted Deblur Gaussian Splatting) という新しいアプローチを提案する。本手法は, 映像の物理的形成過程をモデル化し, 露光時のカメラモーショントラジェクトリを復元しながら, ガウスのパラメータを共同で学習する。実験では、BAD-Gaussianは、合成データセットと実データの両方で、従来の最先端のデブルーニューラルネットワークレンダリング手法よりも優れたレンダリング品質を実現するだけでなく、リアルタイムレンダリング機能も実現していることを示した。私たちのプロジェクトページとソースコードはhttps://lingzhezhao.github.io/BAD-Gaussians/で公開されています。 While neural rendering has demonstrated impressive capabilities in 3D scene reconstruction and novel view synthesis, it heavily relies on high-quality sharp images and accurate camera poses. Numerous approaches have been proposed to train Neural Radiance Fields (NeRF) with motion-blurred images, commonly encountered in real-world scenarios such as low-light or long-exposure conditions. However, the implicit representation of NeRF struggles to accurately recover intricate details from severely motion-blurred images and cannot achieve real-time rendering. In contrast, recent advancements in 3D Gaussian Splatting achieve high-quality 3D scene reconstruction and real-time rendering by explicitly optimizing point clouds as Gaussian spheres. In this paper, we introduce a novel approach, named BAD-Gaussians (Bundle Adjusted Deblur Gaussian Splatting), which leverages explicit Gaussian representation and handles severe motion-blurred images with inaccurate camera poses to achieve high-quality scene reconstruction. 
Our method models the physical image formation process of motion-blurred images and jointly learns the parameters of Gaussians while recovering camera motion trajectories during exposure time. In our experiments, we demonstrate that BAD-Gaussians not only achieves superior rendering quality compared to previous state-of-the-art deblur neural rendering methods on both synthetic and real datasets but also enables real-time rendering capabilities. Our project page and source code are available at https://lingzhezhao.github.io/BAD-Gaussians/	翻訳日:2024-03-20 20:10:10 公開日:2024-03-19
# 遅延状態推論による観測遅延下での自律型オンランプマージの強化学習 Reinforcement Learning with Latent State Inference for Autonomous On-ramp Merging under Observation Delay ( http://arxiv.org/abs/2403.11852v2 ) ライセンス: Link先を確認	Amin Tabrizian, Zhitong Huang, Peng Wei,	(参考訳) 本稿では、自動運転車が多車線高速道路の車両の流れにシームレスに統合されるという、自律的なオンランプ統合の課題に対処する新しいアプローチを提案する。車両の意図や運転スタイルに関する包括的知識を必要とせず,オンランプマージタスクを安全に行うために設計されたL3ISエージェントを用いたレーンキーピング・レーンチェンジについて紹介する。また、このエージェントであるAL3ISを、観測遅延を考慮し、車車間通信遅延(V2V)を用いて、実環境においてより堅牢な決定を行えるようにした。他の運転者の意図などの潜伏状態を通じて環境の観測不能な側面をモデル化することにより、我々のアプローチは、エージェントが動的な交通条件に適応し、マージ操作を最適化し、他の車両との安全な相互作用を確保する能力を高める。実交通データから発生する広範囲なシミュレーションにより,本手法の有効性を実証し,その性能を既存手法と比較する。 L3ISは、実際のアメリカ国道101号線のデータから生成された、ランプ上の合併事件において、99.90%の成功率を示している。さらに、AL3ISの感度解析を行い、様々な観測遅延に対する頑健さを評価し、1秒間V2V通信遅延における93.84%の成功率を許容できる性能を示す。 This paper presents a novel approach to address the challenging problem of autonomous on-ramp merging, where a self-driving vehicle needs to seamlessly integrate into a flow of vehicles on a multi-lane highway. We introduce the Lane-keeping, Lane-changing with Latent-state Inference and Safety Controller (L3IS) agent, designed to perform the on-ramp merging task safely without comprehensive knowledge about surrounding vehicles' intents or driving styles. We also present an augmentation of this agent called AL3IS that accounts for observation delays, allowing the agent to make more robust decisions in real-world environments with vehicle-to-vehicle (V2V) communication delays. By modeling the unobservable aspects of the environment through latent states, such as other drivers' intents, our approach enhances the agent's ability to adapt to dynamic traffic conditions, optimize merging maneuvers, and ensure safe interactions with other vehicles. We demonstrate the effectiveness of our method through extensive simulations generated from real traffic data and compare its performance with existing approaches. 
L3IS shows a 99.90% success rate in a challenging on-ramp merging case generated from the real US Highway 101 data. We further perform a sensitivity analysis on AL3IS to evaluate its robustness against varying observation delays, which demonstrates acceptable performance, with a 93.84% success rate under a 1-second V2V communication delay.	翻訳日:2024-03-20 20:00:12 公開日:2024-03-19
# バングラデシュの農業知識グラフ: セマンティック統合とデータ駆動分析の実現--フルバージョン Bangladesh Agricultural Knowledge Graph: Enabling Semantic Integration and Data-driven Analysis--Full Version ( http://arxiv.org/abs/2403.11920v2 ) ライセンス: Link先を確認	Rudra Pratap Deb Nath, Tithi Rani Das, Tonmoy Chandro Das, S. M. Shafkat Raihan,	(参考訳) バングラデシュでは、農業は持続可能開発目標1(貧困なし)と2(飢餓をゼロに)に対処するための重要な要因であり、経済と人々の生活に基本的な役割を担っている。バングラデシュ統計局は、データ駆動の洞察を通じて農業産業の持続可能性とレジリエンスを高めるため、ウェブ上で一貫して農業データを収集し、公開している。それでも、現在のデータセットは、さまざまな課題に直面している。 1)持続不可能で、静的で、読み取り専用で、集約されたフォーマットで表示されます。 2)FAIR(Findability, Accessibility, Interoperability, and Reusability)の原則に従っていない。 3)他のデータソースとの対話的分析や統合を容易にするものではない。本稿では,バングラデシュにおける農業データを意味的に解析的に統合した知識グラフであるBDAKGを開発するための体系的な手順を概説する。 BDAKGは多次元意味論を取り入れ、外部知識グラフとリンクし、OLAPと互換性があり、FAIRの原則に準拠している。実験的な評価は,完全性,適時性,FAIR性,OLAP適合性,データ駆動分析の観点から,統合プロセスの評価と結果の知識グラフの品質評価に重点を置いている。当社のフェデレーションデータ分析は、CO2の排出削減、経済成長の促進、持続可能な林業の促進に重点を置く戦略的アプローチを推奨している。 In Bangladesh, agriculture is a crucial driver for addressing Sustainable Development Goal 1 (No Poverty) and 2 (Zero Hunger), playing a fundamental role in the economy and people's livelihoods. To enhance the sustainability and resilience of the agriculture industry through data-driven insights, the Bangladesh Bureau of Statistics and other organizations consistently collect and publish agricultural data on the Web. Nevertheless, the current datasets encounter various challenges: 1) they are presented in an unsustainable, static, read-only, and aggregated format, 2) they do not conform to the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles, and 3) they do not facilitate interactive analysis and integration with other data sources. In this paper, we present a thorough solution, delineating a systematic procedure for developing BDAKG: a knowledge graph that semantically and analytically integrates agriculture data in Bangladesh. 
BDAKG incorporates multidimensional semantics, is linked with external knowledge graphs, is compatible with OLAP, and adheres to the FAIR principles. Our experimental evaluation centers on evaluating the integration process and assessing the quality of the resultant knowledge graph in terms of completeness, timeliness, FAIRness, OLAP compatibility and data-driven analysis. Our federated data analysis recommends a strategic approach focused on decreasing CO2 emissions, fostering economic growth, and promoting sustainable forestry.	翻訳日:2024-03-20 19:40:35 公開日:2024-03-19
# 完全ゼロ知識PCP for #P Perfect Zero-Knowledge PCPs for #P ( http://arxiv.org/abs/2403.11941v2 ) ライセンス: Link先を確認	Tom Gur, Jack O'Connor, Nicholas Spooner,	(参考訳) 完全ゼロ知識確率的証明(PZK-PCP)を#Pのすべての言語に対して構築する。これは、BPP以外の言語向けのPZK-PCPの最初の構成である。さらに,従来の(統計的)ゼロ知識PCPとは違って,任意の(適応的な)多項式時間不正な検証に対して,非適応性とゼロ知識を同時に実現している。我々の構成は、合成ヌルステレンサッツを用いてハイパーキューブ内の非対称構造と、その外側のランダム性を得る新しいマスク付き要約PCPで構成されている。ゼロ知識を証明するために、メッセージのローカルビューを考慮し、すべてのローカルビューを効率的にサンプリングできるランダム化されたエンコーディングという、局所的なシミュラブルエンコーディングの概念を導入する。 sumcheckプロトコル(subcube sumsを付加したReed-Mullerコード)から生じるコードは、局所的にシミュラブルなエンコーディングが可能であることを示す。これにより、マスク付き和チェックを非対称関数の組合せ的性質にシミュレートする代数的問題を減らすことができる。 We construct perfect zero-knowledge probabilistically checkable proofs (PZK-PCPs) for every language in #P. This is the first construction of a PZK-PCP for any language outside BPP. Furthermore, unlike previous constructions of (statistical) zero-knowledge PCPs, our construction simultaneously achieves non-adaptivity and zero knowledge against arbitrary (adaptive) polynomial-time malicious verifiers. Our construction consists of a novel masked sumcheck PCP, which uses the combinatorial nullstellensatz to obtain antisymmetric structure within the hypercube and randomness outside of it. To prove zero knowledge, we introduce the notion of locally simulatable encodings: randomised encodings in which every local view of the encoding can be efficiently sampled given a local view of the message. We show that the code arising from the sumcheck protocol (the Reed-Muller code augmented with subcube sums) admits a locally simulatable encoding. This reduces the algebraic problem of simulating our masked sumcheck to a combinatorial property of antisymmetric functions.	翻訳日:2024-03-20 19:30:44 公開日:2024-03-19
# 半教師付き事前訓練と時間モデルによる顔表情認識の探索 Exploring Facial Expression Recognition through Semi-Supervised Pretraining and Temporal Modeling ( http://arxiv.org/abs/2403.11942v2 ) ライセンス: Link先を確認	Jun Yu, Zhihong Wei, Zhongpeng Cai, Gongpeng Zhao, Zerui Zhang, Yongqi Wang, Guochen Xie, Jichao Zhu, Wangyuan Zhu,	(参考訳) 表情認識(FER)はコンピュータビジョンにおいて重要な役割を担い、様々な分野にまたがる幅広い応用を見出す。本稿では,CVPR2024で開催される第6回ABAW(Affective Behavior Analysis in-the-Wild)コンペティションへのアプローチを提案する。表情認識タスクにおいて、FERデータセットの限られたサイズは、表現認識モデルの一般化能力に挑戦し、その結果、サブパー認識性能が低下する。この問題に対処するために、私たちは半教師付き学習技術を用いて、未ラベルの顔データに対する表現カテゴリ擬似ラベルを生成する。同時に、ラベル付き表情サンプルを一様にサンプリングし、半教師付き学習におけるカテゴリ不均衡の問題とデータバイアスに対処するために、偏りのあるフィードバック学習戦略を実装した。さらに,静止画像のみから得られる特徴の制限やバイアスを補うために,隣接する画像特徴間の時間的関係を学習・把握するテンポラルエンコーダを導入した。第6回ABAWコンペティションでは,提案手法の有効性と競争性を検証した公式な検証セットにおいて,優れた結果を得た。 Facial Expression Recognition (FER) plays a crucial role in computer vision and finds extensive applications across various fields. This paper aims to present our approach for the upcoming 6th Affective Behavior Analysis in-the-Wild (ABAW) competition, scheduled to be held at CVPR2024. In the facial expression recognition task, The limited size of the FER dataset poses a challenge to the expression recognition model's generalization ability, resulting in subpar recognition performance. To address this problem, we employ a semi-supervised learning technique to generate expression category pseudo-labels for unlabeled face data. At the same time, we uniformly sampled the labeled facial expression samples and implemented a debiased feedback learning strategy to address the problem of category imbalance in the dataset and the possible data bias in semi-supervised learning. Moreover, to further compensate for the limitation and bias of features obtained only from static images, we introduced a Temporal Encoder to learn and capture temporal relationships between neighbouring expression image features. 
In the 6th ABAW competition, our method achieved outstanding results on the official validation set, a result that fully confirms the effectiveness and competitiveness of our proposed method.	翻訳日:2024-03-20 19:30:44 公開日:2024-03-19
# テキスト・ビデオ品質評価のための主観的アライメントされた日付と基準 Subjective-Aligned Dateset and Metric for Text-to-Video Quality Assessment ( http://arxiv.org/abs/2403.11956v2 ) ライセンス: Link先を確認	Tengchuan Kou, Xiaohong Liu, Zicheng Zhang, Chunyi Li, Haoning Wu, Xiongkuo Min, Guangtao Zhai, Ning Liu,	(参考訳) 生成モデルの急速な発展に伴い、AIGC(Artificial Intelligence-Generated Contents)は、日常生活において指数関数的に増加している。このうち、テキスト・トゥ・ビデオ(T2V)世代は広く注目を集めている。高い知覚品質のビデオを生成するための多くのT2Vモデルがリリースされているが、これらのビデオの品質を定量的に評価する方法がまだ存在しない。この問題を解決するため,これまでで最大規模のテキスト・ビデオ品質評価データベース(T2VQA-DB)を構築した。データセットは、9つの異なるT2Vモデルによって生成される1万のビデオで構成されている。また、各ビデオの対応する平均意見スコアを得るための主観的研究を行う。本稿では,T2VQA-DBに基づくテキスト・ツー・ビデオ品質評価(T2VQA)のためのトランスフォーマーモデルを提案する。このモデルはテキスト・ビデオのアライメントとビデオの忠実度の観点から特徴を抽出し,大言語モデルの能力を活用して予測スコアを与える。実験の結果,T2VQAは既存のT2VメトリクスとSOTAビデオ品質評価モデルより優れていた。定量的分析により、T2VQAは主観的適応予測を行い、その効果を検証できることが示された。データセットとコードはhttps://github.com/QMME/T2VQAで公開される。 With the rapid development of generative models, Artificial Intelligence-Generated Contents (AIGC) have exponentially increased in daily lives. Among them, Text-to-Video (T2V) generation has received widespread attention. Though many T2V models have been released for generating high perceptual quality videos, there is still lack of a method to evaluate the quality of these videos quantitatively. To solve this issue, we establish the largest-scale Text-to-Video Quality Assessment DataBase (T2VQA-DB) to date. The dataset is composed of 10,000 videos generated by 9 different T2V models. We also conduct a subjective study to obtain each video's corresponding mean opinion score. Based on T2VQA-DB, we propose a novel transformer-based model for subjective-aligned Text-to-Video Quality Assessment (T2VQA). The model extracts features from text-video alignment and video fidelity perspectives, then it leverages the ability of a large language model to give the prediction score. Experimental results show that T2VQA outperforms existing T2V metrics and SOTA video quality assessment models. 
Quantitative analysis indicates that T2VQA is capable of giving subjectively aligned predictions, validating its effectiveness. The dataset and code will be released at https://github.com/QMME/T2VQA.	翻訳日:2024-03-20 19:30:44 公開日:2024-03-19
# 携帯型デジタル行動変化介入によるがん患者の幸福感を高める効果的なエンゲージメントの定義 Defining Effective Engagement For Enhancing Cancer Patients' Well-being with Mobile Digital Behavior Change Interventions ( http://arxiv.org/abs/2403.12007v2 ) ライセンス: Link先を確認	Aneta Lisowska, Szymon Wilk, Laura Locati, Mimma Rizzo, Lucia Sacchi, Silvana Quaglini, Matteo Terzaghi, Valentina Tibollo, Mor Peleg,	(参考訳) デジタル行動変化介入(DBCI)は、新しい健康行動の開発を支援している。効果を評価することは、成功要因の改善と理解に不可欠です。しかし、特に倫理的制約のある小規模な研究において、開発者の包括的なガイダンスは限られている。本研究は,CAPABLEプロジェクトに基づいて,がん患者のQOL向上を支援するために,DBCIとの効果的な関与を定義することを目的とする。エンゲージメントを測定するための指標を同定し,DBCIにおける患者と臨床医の両方の関心を探索し,そのような文脈におけるDBCIの影響を評価するための仮説を提案する。以上の結果より, 臨床用処方薬は移動型DBCIとの持続的関与を著しく増加させる可能性が示唆された。さらに、DBCIとの週1回のエンゲージメントは、幸福を維持するのに十分であるが、外在的なモチベーションから内在的なモチベーションへの移行には、より高いレベルのエンゲージメントが必要になる可能性がある。 Digital Behavior Change Interventions (DBCIs) are supporting development of new health behaviors. Evaluating their effectiveness is crucial for their improvement and understanding of success factors. However, comprehensive guidance for developers, particularly in small-scale studies with ethical constraints, is limited. Building on the CAPABLE project, this study aims to define effective engagement with DBCIs for supporting cancer patients in enhancing their quality of life. We identify metrics for measuring engagement, explore the interest of both patients and clinicians in DBCIs, and propose hypotheses for assessing the impact of DBCIs in such contexts. Our findings suggest that clinician prescriptions significantly increase sustained engagement with mobile DBCIs. In addition, while one weekly engagement with a DBCI is sufficient to maintain well-being, transitioning from extrinsic to intrinsic motivation may require a higher level of engagement.	翻訳日:2024-03-20 19:11:08 公開日:2024-03-19
# カプセルネットワークとグラフニューラルネットワークを用いた皮膚癌診断のための空間的特徴抽出と意味的特徴抽出 Leveraging Spatial and Semantic Feature Extraction for Skin Cancer Diagnosis with Capsule Networks and Graph Neural Networks ( http://arxiv.org/abs/2403.12009v2 ) ライセンス: Link先を確認	K. P. Santoso, R. V. H. Ginardi, R. A. Sastrowardoyo, F. A. Madany,	(参考訳) 皮膚病変画像分類の分野では、複雑な空間的特徴と意味的特徴は、従来の畳み込みニューラルネットワーク(CNN)に基づく手法において重要な課題となっている。これらの課題は、モデルが少数民族の特徴を効果的に学習する能力を阻害する皮膚病変データセットの不均衡の性質によって複雑化されている。 GAN(Generative Adversarial Networks)のような拡張戦略にもかかわらず、以前の試みはこれらの複雑さを完全には解決していない。本研究では,グラフニューラルネットワーク(GNN)とCapsule Networksを統合して,分類性能を向上させるという,革新的なアプローチを提案する。グラフ構造化データを扱う能力で知られているGNNは、従来のCNNの能力を超えた複雑なパターンや関係をキャプチャする高度なメカニズムを提供する。カプセルネットワークは、画像内の空間階層の優れた認識を提供することによって、さらに貢献する。本稿では,Tiny Pyramid Vision GNN(Tiny Pyramid ViG)アーキテクチャをCapsule Networkに組み込んで評価・拡張することに焦点を当てた。このハイブリッドモデルは、分類モデルのベンチマーク用に設計された総合的な皮膚病変データセットであるMNIST:HAM10000データセットに適用された。 75回のトレーニングの後、我々のモデルは89.23%、95.52%に到達し、GoogLeNet (83.94%)、InceptionV3 (86.82%)、MobileNet V3 (89.87%)、EfficientNet-B7 (92.07%)、ResNet18 (92.22%)、ResNet34 (91.90%)、ViT-Base (73.70%)、IRv2-SA (93.47%)といった既存のベンチマークを上回った。この結果から,皮膚病変分類の課題を克服し,皮膚科における画像診断の進歩に寄与する可能性が示唆された。 In the realm of skin lesion image classification, the intricate spatial and semantic features pose significant challenges for conventional Convolutional Neural Network (CNN)-based methodologies. These challenges are compounded by the imbalanced nature of skin lesion datasets, which hampers the ability of models to learn minority class features effectively. Despite augmentation strategies, such as those using Generative Adversarial Networks (GANs), previous attempts have not fully addressed these complexities. This study introduces an innovative approach by integrating Graph Neural Networks (GNNs) with Capsule Networks to enhance classification performance. 
GNNs, known for their proficiency in handling graph-structured data, offer an advanced mechanism for capturing complex patterns and relationships beyond the capabilities of traditional CNNs. Capsule Networks further contribute by providing superior recognition of spatial hierarchies within images. Our research focuses on evaluating and enhancing the Tiny Pyramid Vision GNN (Tiny Pyramid ViG) architecture by incorporating it with a Capsule Network. This hybrid model was applied to the MNIST:HAM10000 dataset, a comprehensive skin lesion dataset designed for benchmarking classification models. After 75 epochs of training, our model achieved a significant accuracy improvement, reaching 89.23% and 95.52%, surpassing established benchmarks such as GoogLeNet (83.94%), InceptionV3 (86.82%), MobileNet V3 (89.87%), EfficientNet-B7 (92.07%), ResNet18 (92.22%), ResNet34 (91.90%), ViT-Base (73.70%), and IRv2-SA (93.47%) on the same dataset. This outcome underscores the potential of our approach in overcoming the inherent challenges of skin lesion classification, contributing to the advancement of image-based diagnosis in dermatology.	翻訳日:2024-03-20 19:11:08 公開日:2024-03-19
# 6100個の高コヒーレント原子量子ビットを持つツイーザアレイ A tweezer array with 6100 highly coherent atomic qubits ( http://arxiv.org/abs/2403.12021v2 ) ライセンス: Link先を確認	Hannah J. Manetsch, Gyohei Nomura, Elie Bataille, Kon H. Leung, Xudong Lv, Manuel Endres,	(参考訳) 光ツイーザーアレイは過去数年間、原子物理学や分子物理学に革命的な影響を与えており、量子コンピューティング、シミュレーション、気象学における幅広い主要な実験のバックボーンを形成している。この開発の根底にあるのは、この技術固有の単一粒子制御と検出の単純さである。典型的な実験では、数十から数百の原子量子ビットをトラップし、最近になって約1000個の原子を持つ系が、量子ビットを定義したりコヒーレントな制御を示すことなく実現された。しかし、長いコヒーレンス時間と低損失の高密度イメージングを持つ何千もの原子量子ビットへのスケーリングは、量子コンピューティング、シミュレーション、およびメトロジーの進歩、特に量子エラー補正の応用において顕著な課題であり、重要な課題である。そこで我々は,約12,000の場所で6,100個の中性原子をトラップする光学的ツイーザーの配列を実験的に実現し,同時にプラットフォームの基本的制約に関連するいくつかの重要な指標に対する最先端性能を克服した。具体的には、このような大量の原子にスケーリングしながら、コヒーレンス時間は12.6(1)秒であり、光ツイーザーアレイにおける超微細量子ビットの記録である。さらに, 室温装置で23分近いトラップ寿命を示し, 99.98952(1)%の高画像生存率と99.99%以上の画像忠実度を併用できるようにした。我々の結果は、他の最近の発展とともに、1万の原子量子ビットを持つ普遍量子コンピューティングが近い将来の展望であることを示している。さらに、我々の研究は、量子シミュレーションとメトロジーの実験において、固有の単一粒子の読み出しと位置決め機能を同様のスケールで行う道を開くことができる。 Optical tweezer arrays have had a transformative impact on atomic and molecular physics over the past years, and they now form the backbone for a wide range of leading experiments in quantum computing, simulation, and metrology. Underlying this development is the simplicity of single particle control and detection inherent to the technique. Typical experiments trap tens to hundreds of atomic qubits, and very recently systems with around one thousand atoms were realized without defining qubits or demonstrating coherent control. However, scaling to thousands of atomic qubits with long coherence times and low-loss, high-fidelity imaging is an outstanding challenge and critical for progress in quantum computing, simulation, and metrology, in particular, towards applications with quantum error correction. 
Here, we experimentally realize an array of optical tweezers trapping over 6,100 neutral atoms in around 12,000 sites while simultaneously surpassing state-of-the-art performance for several key metrics associated with fundamental limitations of the platform. Specifically, while scaling to such a large number of atoms, we also demonstrate a coherence time of 12.6(1) seconds, a record for hyperfine qubits in an optical tweezer array. Further, we show trapping lifetimes close to 23 minutes in a room-temperature apparatus, enabling record-high imaging survival of 99.98952(1)% in combination with an imaging fidelity of over 99.99%. Our results, together with other recent developments, indicate that universal quantum computing with ten thousand atomic qubits could be a near-term prospect. Furthermore, our work could pave the way for quantum simulation and metrology experiments with inherent single particle readout and positioning capabilities at a similar scale.	翻訳日:2024-03-20 19:01:22 公開日:2024-03-19
# 制御されたマルチビュー編集を用いた汎用3次元拡散適応器 Generic 3D Diffusion Adapter Using Controlled Multi-View Editing ( http://arxiv.org/abs/2403.12032v2 ) ライセンス: Link先を確認	Hansheng Chen, Ruoxi Shi, Yulin Liu, Bokui Shen, Jiayuan Gu, Gordon Wetzstein, Hao Su, Leonidas Guibas,	(参考訳) オープンドメインの3Dオブジェクト合成は、限られたデータと高い計算複雑性のために、画像合成に遅れを取っている。このギャップを埋めるために、最近の研究は多視点拡散を調査してきたが、しばしば3次元の一貫性、視覚的品質、効率のいずれかに欠ける。本稿では,SDEditの3次元版として機能するMVEditを提案する。MVEditは祖先サンプリングを用いて多視点画像を共同でデノイズし、高品質なテクスチャ付きメッシュを出力する。 MVEditは、市販の2D拡散モデルに基づいて、トレーニング不要な3Dアダプタを通じて3D一貫性を実現し、直前のタイムステップの2Dビューをコヒーレントな3D表現に持ち上げ、次にレンダリングされたビューを使用して次のタイムステップの2Dビューを、視覚的品質を損なうことなく条件付けする。推論時間はわずか2～5分であり、この枠組みはスコア蒸留よりも品質と速度のトレードオフが優れている。 MVEditは非常に汎用的で拡張性があり、テキスト/画像-3D生成、3D-3D編集、高品質なテクスチャ合成など幅広い応用がある。特に,画像から3Dへの生成とテキスト誘導テクスチャ生成の両タスクにおいて,最先端の性能が評価により示された。さらに,限られた資源を持つ小さな3次元データセット上での2次元潜時拡散モデルを微調整し,高速な低解像度テキスト・ツー・3D初期化を実現する手法を提案する。 Open-domain 3D object synthesis has been lagging behind image synthesis due to limited data and higher computational complexity. To bridge this gap, recent works have investigated multi-view diffusion but often fall short in either 3D consistency, visual quality, or efficiency. This paper proposes MVEdit, which functions as a 3D counterpart of SDEdit, employing ancestral sampling to jointly denoise multi-view images and output high-quality textured meshes. Built on off-the-shelf 2D diffusion models, MVEdit achieves 3D consistency through a training-free 3D Adapter, which lifts the 2D views of the last timestep into a coherent 3D representation, then conditions the 2D views of the next timestep using rendered views, without compromising visual quality. With an inference time of only 2-5 minutes, this framework achieves a better trade-off between quality and speed than score distillation. MVEdit is highly versatile and extendable, with a wide range of applications including text/image-to-3D generation, 3D-to-3D editing, and high-quality texture synthesis. In particular, evaluations demonstrate state-of-the-art performance in both image-to-3D and text-guided texture generation tasks. 
Additionally, we introduce a method for fine-tuning 2D latent diffusion models on small 3D datasets with limited resources, enabling fast low-resolution text-to-3D initialization.	翻訳日:2024-03-20 19:01:22 公開日:2024-03-19
# MineDreamer: シミュレーション世界制御のためのチェーン・オブ・イマジネーションによるインストラクションの追跡学習 MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control ( http://arxiv.org/abs/2403.12037v2 ) ライセンス: Link先を確認	Enshen Zhou, Yiran Qin, Zhenfei Yin, Yuzhou Huang, Ruimao Zhang, Lu Sheng, Yu Qiao, Jing Shao,	(参考訳) 人間のような方法で多様な指示に従うことができる汎用的なエージェントを設計することは、長く続く目標である。しかし、既存のアプローチは、抽象的かつシーケンシャルな自然言語命令を理解するのが難しいため、命令に従うのに失敗することが多い。この目的のために、我々は、低レベル制御信号生成における命令追従能力を向上させる革新的なパラダイムを備えた、挑戦的なMinecraftシミュレータ上に構築された、オープンなエンボディエージェントであるMineDreamerを紹介する。具体的には、MineDreamerは、近年のMLLM(Multimodal Large Language Models)と拡散モデルの進歩の上に開発されており、命令を実行し、想像をより正確に視覚的なプロンプトに変換するステップ・バイ・ステップを想定するCoI(Chain-of-Imagination)機構を用いており、その後、エージェントはキーボード・アンド・ムース・アクションを生成して、各ステップでの指示に従って、これらのイマジネーションを効率的に実現している。大規模な実験により、MineDreamerは単段階および多段階の命令を着実に追従し、最高のジェネラリストエージェントのベースラインを著しく上回り、性能をほぼ倍増させることを示した。さらに、エージェントの想像力の質的分析により、オープンワールドの一般化と理解が明らかになる。 It is a long-lasting goal to design a generalist-embodied agent that can follow diverse instructions in human-like ways. However, existing approaches often fail to steadily follow instructions due to difficulties in understanding abstract and sequential natural language instructions. To this end, we introduce MineDreamer, an open-ended embodied agent built upon the challenging Minecraft simulator with an innovative paradigm that enhances instruction-following ability in low-level control signal generation. Specifically, MineDreamer is developed on top of recent advances in Multimodal Large Language Models (MLLMs) and diffusion models, and we employ a Chain-of-Imagination (CoI) mechanism to envision the step-by-step process of executing instructions and translating imaginations into more precise visual prompts tailored to the current state; subsequently, the agent generates keyboard-and-mouse actions to efficiently achieve these imaginations, steadily following the instructions at each step. 
Extensive experiments demonstrate that MineDreamer follows single and multi-step instructions steadily, significantly outperforming the best generalist agent baseline and nearly doubling its performance. Moreover, qualitative analysis of the agent's imaginative ability reveals its generalization and comprehension of the open world.	翻訳日:2024-03-20 18:51:33 公開日:2024-03-19
# FedFisher: ワンショットフェデレーションラーニングのためのフィッシャー情報の活用 FedFisher: Leveraging Fisher Information for One-Shot Federated Learning ( http://arxiv.org/abs/2403.12329v1 ) ライセンス: Link先を確認	Divyansh Jhunjhunwala, Shiqiang Wang, Gauri Joshi,	(参考訳) FL(Standard Federated Learning)アルゴリズムは、通常、サーバとクライアント間の複数ラウンドの通信を必要とする。これには、常時ネットワーク接続が必要になること、計算資源を繰り返し投入すること、プライバシー攻撃を受けやすいことなど、いくつかの欠点がある。 One-Shot FLは、サーバが単一の通信ラウンドでグローバルモデルをトレーニングできるようにすることによって、この問題に対処することを目的とした、新しいパラダイムである。本稿では,ワンショットFLのための新しいアルゴリズムであるFedFisherを提案する。FedFisherは,ローカルクライアントモデルで計算されたFisher情報行列を利用しており,これはFLのベイズ的視点によって動機付けられている。まず,2層オーバーパラメータ化されたReLUニューラルネットワークのFedFisherを理論的に解析し,ニューラルネットワークの幅とクライアントでのローカルトレーニングの量が増加するにつれて,ワンショットのFedFisherグローバルモデルの誤差が限りなく小さくなることを示す。次に、フルフィッシャーに対する対角的フィッシャーとK-FAC近似を用いたFedFisherの実用的な変種を提案し、FLの通信と計算効率を強調した。最後に,FedFisherのこれらの変種は,競合するベースラインよりも一貫して改善されていることを示す。 Standard federated learning (FL) algorithms typically require multiple rounds of communication between the server and the clients, which has several drawbacks, including requiring constant network connectivity, repeated investment of computational resources, and susceptibility to privacy attacks. One-Shot FL is a new paradigm that aims to address this challenge by enabling the server to train a global model in a single round of communication. In this work, we present FedFisher, a novel algorithm for one-shot FL that makes use of Fisher information matrices computed on local client models, motivated by a Bayesian perspective of FL. First, we theoretically analyze FedFisher for two-layer over-parameterized ReLU neural networks and show that the error of our one-shot FedFisher global model becomes vanishingly small as the width of the neural networks and amount of local training at clients increase. Next, we propose practical variants of FedFisher using the diagonal Fisher and K-FAC approximation for the full Fisher and highlight their communication and compute efficiency for FL. 
Finally, we conduct extensive experiments on various datasets, which show that these variants of FedFisher consistently improve over competing baselines.	翻訳日:2024-03-20 15:51:27 公開日:2024-03-19
# 無限温度における非可積分スピン鎖の厳密な熱固有状態 Exact thermal eigenstates of nonintegrable spin chains at infinite temperature ( http://arxiv.org/abs/2403.12330v1 ) ライセンス: Link先を確認	Yuuya Chiba, Yasushi Yoneta,	(参考訳) 固有状態熱化仮説(ETH)は、孤立量子多体系の熱化を説明する上で重要な役割を果たしている。しかし、非可積分系の熱エネルギー固有状態の理論的な処理が困難であるため、現実的な系ではETHが証明されていない。ここでは、非可積分スピン鎖の熱固有状態を初めて解析的に記述する。我々は, エンタングルド・アンチポーダル・ペア (EAP) 状態と呼ぶ, 理論的に扱いやすい体積則状態のクラスを導入する。これらの状態は熱的であり、最も厳密な意味で、すべての局所観測量に関して無限温度のギブス状態と区別できない。次に、EAP状態が固有状態であるハミルトニアンを同定し、これらのハミルトニアンのうちいくつかが非可積分であることを厳密に示す。さらに、EAP状態の虚時間発展により任意の温度で熱純状態を得る。以上の結果から,ETHの証明可能な例を与える道筋が得られる可能性がある。 The eigenstate thermalization hypothesis (ETH) plays a major role in explaining thermalization of isolated quantum many-body systems. However, there has been no proof of the ETH in realistic systems due to the difficulty in the theoretical treatment of thermal energy eigenstates of nonintegrable systems. Here, we write down analytically, for the first time, thermal eigenstates of nonintegrable spin chains. We introduce a class of theoretically tractable volume-law states, which we call entangled antipodal pair (EAP) states. These states are thermal, in the most strict sense that they are indistinguishable from the Gibbs state with respect to all local observables, with infinite temperature. We then identify Hamiltonians having the EAP state as an eigenstate and rigorously show that some of these Hamiltonians are nonintegrable. Furthermore, a thermal pure state at an arbitrary temperature is obtained by the imaginary time evolution of an EAP state. Our results offer a potential avenue for providing a provable example of the ETH.	翻訳日:2024-03-20 15:51:27 公開日:2024-03-19
# Halved Doseにおける高分解能光子計数CTの臨床的検討 Deep Few-view High-resolution Photon-counting Extremity CT at Halved Dose for a Clinical Trial ( http://arxiv.org/abs/2403.12331v1 ) ライセンス: Link先を確認	Mengzhou Li, Chuang Niu, Ge Wang, Maya R Amma, Krishna M Chapagain, Stefan Gabrielson, Andrew Li, Kevin Jonker, Niels de Ruiter, Jennifer A Clark, Phil Butler, Anthony Butler, Hengyong Yu,	(参考訳) 最新のX線光子計数計算トモグラフィ(PCCT)は、組織の特徴と材料分解のための多エネルギー高分解能イメージングを可能にする。しかしながら、放射線線量と撮像速度は、コントラスト強化や他の研究に改善が必要である。深層学習による2次元少数ビュー再構成の成功にもかかわらず,GPUメモリの制約,データ不足のトレーニング,ドメインギャップの問題などにより,臨床診断のためのHRボリュームスキャンに応用することは限られている。本稿では,ニュージーランドの臨床試験において,半減量と2倍の速度でPCCT画像再構成を行うための深層学習に基づくアプローチを提案する。特に,GPUメモリの制限を緩和するパッチベースのボリュームリファインメントネットワーク,合成データを用いたトレーニングネットワーク,およびモデルベースの反復リファインメントを用いて,合成データと実世界のギャップを埋める。シミュレーションとファントム実験は、固定されたネットワークを用いて、ドメイン内構造とオフドメイン構造の両方の異なる取得条件下で、一貫して改善された結果を示す。臨床治験患者8人の画像品質を,標準画像再構成とフルビューデータセットと比較し,3人の放射線技師により評価した。提案手法は, 診断画像の品質スコアの点で, 臨床ベンチマークと基本的に同一かそれ以上であることがわかった。提案手法は,画像品質を損なうことなく,PCCTの安全性と効率を向上する大きな可能性を秘めている。 The latest X-ray photon-counting computed tomography (PCCT) for extremity allows multi-energy high-resolution (HR) imaging for tissue characterization and material decomposition. However, both radiation dose and imaging speed need improvement for contrast-enhanced and other studies. Despite the success of deep learning methods for 2D few-view reconstruction, applying them to HR volumetric reconstruction of extremity scans for clinical diagnosis has been limited due to GPU memory constraints, training data scarcity, and domain gap issues. In this paper, we propose a deep learning-based approach for PCCT image reconstruction at halved dose and doubled speed in a New Zealand clinical trial. Particularly, we present a patch-based volumetric refinement network to alleviate the GPU memory limitation, train network with synthetic data, and use model-based iterative refinement to bridge the gap between synthetic and real-world data. 
The simulation and phantom experiments demonstrate consistently improved results under different acquisition conditions on both in- and off-domain structures using a fixed network. The image quality for 8 patients from the clinical trial is evaluated by three radiologists in comparison with the standard image reconstruction with a full-view dataset. It is shown that our proposed approach is essentially identical to or better than the clinical benchmark in terms of diagnostic image quality scores. Our approach has great potential to improve the safety and efficiency of PCCT without compromising image quality.	翻訳日:2024-03-20 15:51:27 公開日:2024-03-19
# 動的システムの予測のための時間一貫性クープマンオートエンコーダ Temporally-Consistent Koopman Autoencoders for Forecasting Dynamical Systems ( http://arxiv.org/abs/2403.12335v1 ) ライセンス: Link先を確認	Indranil Nayak, Debdipta Goswami, Mrinal Kumar, Fernando Teixeira,	(参考訳) 十分に高品質なデータの欠如は、高次元時空間力学系のデータ駆動モデリングにおいて、しばしば重要な課題となる。クープマンオートエンコーダ(KAEs)は、ディープニューラルネットワーク(DNN)の表現性、オートエンコーダの次元低減能力、およびクープマン作用素のスペクトル特性を利用して、より単純で線形な力学で低次特徴空間を学習する。しかし、KAEsの有効性は、限られたノイズの多いトレーニングデータセットによって妨げられ、一般化性が低下する。これを解決するために,制約付き,ノイズの多いトレーニングデータであっても,正確な長期予測を生成するように設計されたTcKAE(Temporally-Consistent Koopman Autoencoder)を導入する。これは、異なる時間ステップにわたる予測コヒーレンスを強制する一貫性の規則化項によって達成され、既存のモデルに対するtcKAEの堅牢性と一般化性を高める。我々は, クープマンスペクトル理論に基づくこのアプローチの解析的正当性を示し, 簡単な振り子振動, 運動プラズマ, 流体流, 海面温度データを含む, 最先端KAEモデルに対するtcKAEの優れた性能を実証的に示す。 Absence of sufficiently high-quality data often poses a key challenge in data-driven modeling of high-dimensional spatio-temporal dynamical systems. Koopman Autoencoders (KAEs) harness the expressivity of deep neural networks (DNNs), the dimension reduction capabilities of autoencoders, and the spectral properties of the Koopman operator to learn a reduced-order feature space with simpler, linear dynamics. However, the effectiveness of KAEs is hindered by limited and noisy training datasets, leading to poor generalizability. To address this, we introduce the Temporally-Consistent Koopman Autoencoder (tcKAE), designed to generate accurate long-term predictions even with constrained and noisy training data. This is achieved through a consistency regularization term that enforces prediction coherence across different time steps, thus enhancing the robustness and generalizability of tcKAE over existing models. We provide analytical justification for this approach based on Koopman spectral theory and empirically demonstrate tcKAE's superior performance over state-of-the-art KAE models across a variety of test cases, including simple pendulum oscillations, kinetic plasmas, fluid flows, and sea surface temperature data.	翻訳日:2024-03-20 15:51:27 公開日:2024-03-19
# ノルム空間における確率的ハルパーン反復と強化学習への応用 Stochastic Halpern iteration in normed spaces and applications to reinforcement learning ( http://arxiv.org/abs/2403.12338v1 ) ライセンス: Link先を確認	Mario Bravo, Juan Pablo Contreras,	(参考訳) 確率的ハルパーン反復のオラクル複雑性を分散還元を用いて解析し、ノルム有限次元空間における非拡張的および収縮的作用素の固定点を近似することを目指す。基礎となる確率的オラクルが一様有界分散を持つ場合、我々の手法は全体のオラクルの複雑さを$\tilde{O}(\varepsilon^{-5})$で表し、確率的クラスノセルスキイ・マンの反復に対して確立された最近の速度を改善する。また、小バッチであっても全ての平均反復を含む幅広いアルゴリズムに適用可能な、$\Omega(\varepsilon^{-3})$の低い境界を確立する。我々のアプローチの適切な修正を用いて、作用素が$\gamma$-contractionである場合、$O(\varepsilon^{-2}(1-\gamma)^{-3})$複雑性を導出する。アプリケーションとして、平均報酬と割引報酬を決定するための新しい同期アルゴリズムを提案する。特に、平均的な報酬に対して、本手法は最もよく知られたサンプルの複雑さを改善する。 We analyze the oracle complexity of the stochastic Halpern iteration with variance reduction, where we aim to approximate fixed-points of nonexpansive and contractive operators in a normed finite-dimensional space. We show that if the underlying stochastic oracle is with uniformly bounded variance, our method exhibits an overall oracle complexity of $\tilde{O}(\varepsilon^{-5})$, improving recent rates established for the stochastic Krasnoselskii-Mann iteration. Also, we establish a lower bound of $\Omega(\varepsilon^{-3})$, which applies to a wide range of algorithms, including all averaged iterations even with minibatching. Using a suitable modification of our approach, we derive a $O(\varepsilon^{-2}(1-\gamma)^{-3})$ complexity bound in the case in which the operator is a $\gamma$-contraction. As an application, we propose new synchronous algorithms for average reward and discounted reward Markov decision processes. In particular, for the average reward, our method improves on the best-known sample complexity.	翻訳日:2024-03-20 15:51:27 公開日:2024-03-19
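The Halpern iteration named in the abstract above anchors every step back to the starting point: $x_{k+1} = \beta_k x_0 + (1-\beta_k) T(x_k)$. The following is a minimal sketch of that update, not the authors' algorithm. The step size $\beta_k = 1/(k+2)$, the plain minibatch averaging used as a stand-in for the paper's variance reduction, the Gaussian noise oracle, and all function names are assumptions made for illustration.

```python
import math
import random


def rotate90(x):
    """Nonexpansive map on R^2 (rotation by 90 degrees); its only fixed point is the origin."""
    return [-x[1], x[0]]


def residual(T, x):
    """Fixed-point residual ||x - T(x)|| in the Euclidean norm."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, T(x))))


def stochastic_halpern(T, x0, num_iters, noise_std=0.0, batch_size=1, seed=0):
    """Halpern iteration with a noisy oracle for T (illustrative sketch).

    Each query returns T(x) plus zero-mean Gaussian noise; averaging
    batch_size queries is a simple stand-in for variance reduction
    (an assumption, not the scheme analyzed in the paper).
    """
    rng = random.Random(seed)
    x = list(x0)
    for k in range(num_iters):
        beta = 1.0 / (k + 2)  # assumed anchoring weight, decays to zero
        tx = T(x)
        # Minibatch estimate of T(x): exact value plus averaged noise.
        estimate = [
            v + sum(rng.gauss(0.0, noise_std) for _ in range(batch_size)) / batch_size
            for v in tx
        ]
        # Pull the iterate back toward the anchor x0 with weight beta.
        x = [beta * a + (1.0 - beta) * b for a, b in zip(x0, estimate)]
    return x


if __name__ == "__main__":
    x = stochastic_halpern(rotate90, [1.0, 0.0], 1000)
    print("noise-free residual:", residual(rotate90, x))
    y = stochastic_halpern(rotate90, [1.0, 0.0], 1000, noise_std=0.1, batch_size=64)
    print("noisy residual:", residual(rotate90, y))
```

The rotation is chosen because plain fixed-point iteration $x_{k+1} = T(x_k)$ circles forever on it, whereas the anchored update drives the residual toward zero.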
# Entity6K: リアルタイムエンティティ認識のための大規模なオープンドメイン評価データセット Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition ( http://arxiv.org/abs/2403.12339v1 ) ライセンス: Link先を確認	Jielin Qiu, William Han, Winfred Wang, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Christos Faloutsos, Lei Li, Lijuan Wang,	(参考訳) オープンドメインの実体認識は、多様な環境において様々な実体を識別することを含む、不可欠だが難しい。適切な評価データセットの欠如は、膨大な数のエンティティと、データキュレーションに必要な膨大な人的労力のために、この分野において大きな障害となっている。実世界のエンティティ認識のための包括的なデータセットであるEntity6Kを紹介します。 Entity6Kはさまざまなエンティティ名と分類を提供し、既存のデータセットのギャップに対処する。画像キャプションやオブジェクト検出,ゼロショット分類,高密度キャプションといったタスクにおいて,既存のモデルを用いたベンチマークを行い,エンティティ認識能力の評価におけるEntity6Kの有効性を実証した。 Entity6Kは、オープンドメイン設定で正確なエンティティ認識を前進させるための貴重なリソースになるだろうと考えています。 Open-domain real-world entity recognition is essential yet challenging, involving identifying various entities in diverse environments. The lack of a suitable evaluation dataset has been a major obstacle in this field due to the vast number of entities and the extensive human effort required for data curation. We introduce Entity6K, a comprehensive dataset for real-world entity recognition, featuring 5,700 entities across 26 categories, each supported by 5 human-verified images with annotations. Entity6K offers a diverse range of entity names and categorizations, addressing a gap in existing datasets. We conducted benchmarks with existing models on tasks like image captioning, object detection, zero-shot classification, and dense captioning to demonstrate Entity6K's effectiveness in evaluating models' entity recognition capabilities. We believe Entity6K will be a valuable resource for advancing accurate entity recognition in open-domain settings.	翻訳日:2024-03-20 15:51:27 公開日:2024-03-19
# 親しみやすいシャープネスを意識した最小化 Friendly Sharpness-Aware Minimization ( http://arxiv.org/abs/2403.12350v1 ) ライセンス: Link先を確認	Tao Li, Pan Zhou, Zhengbao He, Xinwen Cheng, Xiaolin Huang,	(参考訳) シャープネス・アウェアの最小化(SAM)は、トレーニング損失とロスシャープネスの両方を最小化することにより、ディープニューラルネットワークトレーニングの改善に役立っている。実践的な成功にもかかわらず、SAMの一般化拡張のメカニズムは解明され、ディープラーニング最適化の進歩を制限している。本研究では, SAMの中核となるコンポーネントを一般化するために検討し, SAMの一般化をさらに促進するために"フレンドリーSAM"(F-SAM)を導入する。本研究は,直交摂動におけるバッチ特異的確率勾配雑音,すなわち,SAMの一般化性能に大きな影響を及ぼす,現在のミニバッチ勾配における重要な役割を明らかにするものである。 SAMの対向摂動を全勾配および確率勾配雑音成分に分解することにより、全勾配成分のみに依存することは一般化を低下させ、それを除くことで性能が向上することを発見した。考えられる理由は、全勾配コンポーネントがデータセット全体のシャープネス損失を増大させ、現在のミニバッチデータのみに後続のシャープネス最小化ステップと矛盾を生じさせるためである。これらの知見にインスパイアされたF-SAMは、全勾配成分の負の効果を軽減することを目的としている。歴史的確率勾配の指数移動平均(EMA)によって推定される全勾配を除去し、確率勾配雑音を利用して一般化を改善する。さらに、EMA近似の理論的検証を行い、非凸問題に対するF-SAMの収束性を証明する。広汎な実験は、バニラSAM上でのF-SAMの一般化性能とロバスト性を示す。コードはhttps://github.com/nblt/F-SAMで入手できる。 Sharpness-Aware Minimization (SAM) has been instrumental in improving deep neural network training by minimizing both training loss and loss sharpness. Despite the practical success, the mechanisms behind SAM's generalization enhancements remain elusive, limiting its progress in deep learning optimization. In this work, we investigate SAM's core components for generalization improvement and introduce "Friendly-SAM" (F-SAM) to further enhance SAM's generalization. Our investigation reveals the key role of batch-specific stochastic gradient noise within the adversarial perturbation, i.e., the current minibatch gradient, which significantly influences SAM's generalization performance. By decomposing the adversarial perturbation in SAM into full gradient and stochastic gradient noise components, we discover that relying solely on the full gradient component degrades generalization while excluding it leads to improved performance. 
A possible reason is that the full gradient component increases the sharpness loss over the entire dataset, which is inconsistent with the subsequent sharpness minimization step performed solely on the current minibatch data. Inspired by these insights, F-SAM aims to mitigate the negative effects of the full gradient component. It removes the full gradient estimated by an exponential moving average (EMA) of historical stochastic gradients, and then leverages stochastic gradient noise for improved generalization. Moreover, we provide theoretical validation for the EMA approximation and prove the convergence of F-SAM on non-convex problems. Extensive experiments demonstrate the superior generalization performance and robustness of F-SAM over vanilla SAM. Code is available at https://github.com/nblt/F-SAM.	翻訳日:2024-03-20 15:51:27 公開日:2024-03-19
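The perturbation step described in the F-SAM abstract (estimate the full gradient with an EMA of past stochastic gradients, subtract it, and perturb along the remaining noise) can be sketched as follows. This is an illustrative reading of the abstract, not the code from the linked repository. The function name `fsam_perturbation`, the parameters `lam`, `sigma`, and `rho`, their default values, and the normalization to radius `rho` are all assumptions.

```python
import math


def fsam_perturbation(grad, ema, lam=0.9, sigma=1.0, rho=0.05):
    """Return (perturbation, updated_ema) for one minibatch (hypothetical sketch).

    grad: stochastic gradient of the current minibatch (list of floats)
    ema:  running EMA of past stochastic gradients, used as a
          full-gradient estimate (assumption based on the abstract)
    """
    # Update the EMA estimate of the full gradient.
    new_ema = [lam * m + (1.0 - lam) * g for m, g in zip(ema, grad)]
    # Remove the estimated full-gradient component, keeping the
    # batch-specific stochastic gradient noise.
    noise = [g - sigma * m for g, m in zip(grad, new_ema)]
    norm = math.sqrt(sum(n * n for n in noise))
    if norm == 0.0:
        # No noise direction available: apply no perturbation.
        return [0.0 for _ in grad], new_ema
    # Scale the noise direction to the perturbation radius rho.
    perturbation = [rho * n / norm for n in noise]
    return perturbation, new_ema


if __name__ == "__main__":
    eps, ema = fsam_perturbation([3.0, 4.0], [0.0, 0.0], lam=0.5)
    print("perturbation:", eps)
    print("updated EMA:", ema)
```

In a SAM-style training loop, the weights would be shifted by this perturbation before computing the gradient used for the actual update; that outer loop is omitted here.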
# Sim2Real in Reconstructive Spectroscopy: Augmented Device-Informed Data Simulationによるディープラーニング Sim2Real in Reconstructive Spectroscopy: Deep Learning with Augmented Device-Informed Data Simulation ( http://arxiv.org/abs/2403.12354v1 ) ライセンス: Link先を確認	Jiyi Chen, Pengyu Li, Yutong Wang, Pei-Cheng Ku, Qing Qu,	(参考訳) 本研究は,効率的なデータサンプリングと高速推論時間に着目し,再構成分光におけるスペクトル信号再構成のための深層学習(DL)ベースのフレームワークであるSim2Realを提案する。この研究は、デバイスインフォームド・シミュレートされたデータのみをトレーニングに利用できる極端な設定の下で、現実世界のスペクトル信号を再構築するという課題に焦点を当てている。このようなデバイスインフォームド・シミュレートされたデータは、実際のデータよりもはるかに容易に収集できるが、実際のデータから大きな分散シフトを示す。このようなシミュレーションデータを効果的に活用するために、このドメインシフトの悪影響を軽減するために階層的なデータ拡張戦略を導入し、我々の拡張データによるスペクトル信号再構成のための対応するニューラルネットワークを設計する。我々の分光計装置から測定した実データを用いて実験したところ、Sim2Realは、最先端の最適化手法でオンパー性能を達成しつつ、推論中にかなりのスピードアップを達成することがわかった。 This work proposes a deep learning (DL)-based framework, namely Sim2Real, for spectral signal reconstruction in reconstructive spectroscopy, focusing on efficient data sampling and fast inference time. The work focuses on the challenge of reconstructing real-world spectral signals under the extreme setting where only device-informed simulated data are available for training. Such device-informed simulated data are much easier to collect than real-world data but exhibit large distribution shifts from their real-world counterparts. To leverage such simulated data effectively, a hierarchical data augmentation strategy is introduced to mitigate the adverse effects of this domain shift, and a corresponding neural network for the spectral signal reconstruction with our augmented data is designed. Experiments using a real dataset measured from our spectrometer device demonstrate that Sim2Real achieves significant speed-up during the inference while attaining on-par performance with the state-of-the-art optimization-based methods.	翻訳日:2024-03-20 15:51:27 公開日:2024-03-19
# 精度行列に対する代替的なグラフィカルラッソアルゴリズム An Alternative Graphical Lasso Algorithm for Precision Matrices ( http://arxiv.org/abs/2403.12357v1 ) ライセンス: Link先を確認	Aramayis Dallakyan, Mohsen Pourahmadi,	(参考訳) Graphical Lasso (GLasso) アルゴリズムは高速であり、スパース精度行列の推定に広く用いられている(Friedman et al., 2008)。高次元共分散推定の文献における中心的な役割は、平均ベクトルのスパース推定のためのラッソ回帰に匹敵する。最適化対象、収束、正定性、性能に関する謎は、Mazumder and Hastie (2011) で発見され、解決され、提示され、新しく改良された(双対-主)DP-GLassoが導かれた。精度行列の最後の列の新たな、わずかに異なる再パラメータ化を用いて、正則化された正規対数尤度が、最小化の容易な2つの凸関数の和に自然に分解され、その一方がラッソ回帰問題であることを示す。この分解は、GLassoの更新をDP-GLassoに匹敵する性能で計算するための、透過的でシンプルな反復ブロック座標降下アルゴリズムを開発する鍵となる。特に,本アルゴリズムは最初から精度行列を最適化対象としており,DP-GLassoアルゴリズムの良好な特性をすべて保持している。 The Graphical Lasso (GLasso) algorithm is fast and widely used for estimating sparse precision matrices (Friedman et al., 2008). Its central role in the literature of high-dimensional covariance estimation rivals that of Lasso regression for sparse estimation of the mean vector. Some mysteries regarding its optimization target, convergence, positive-definiteness and performance have been unearthed, resolved and presented in Mazumder and Hastie (2011), leading to a new/improved (dual-primal) DP-GLasso. Using a new and slightly different reparametrization of the last column of a precision matrix, we show that the regularized normal log-likelihood naturally decouples into a sum of two easy-to-minimize convex functions, one of which is a Lasso regression problem. This decomposition is the key in developing a transparent, simple iterative block coordinate descent algorithm for computing the GLasso updates with performance comparable to DP-GLasso. In particular, our algorithm has the precision matrix as its optimization target right at the outset, and retains all the favorable properties of the DP-GLasso algorithm.	翻訳日:2024-03-20 15:51:27 公開日:2024-03-19
# 散逸多体量子カオスの符号としてのリャプノフ指数 The Lyapunov exponent as a signature of dissipative many-body quantum chaos ( http://arxiv.org/abs/2403.12359v1 ) ライセンス: Link先を確認	Antonio M. García-García, Jacobus J. M. Verbaarschot, Jie-ping Zheng,	(参考訳) エルミート量子カオス力学の際立った特徴は、ルプノフ指数によって与えられる速度でエレンフェスト時間の周りにある時間外相関関数(OTOC)が指数関数的に増加することである。物理的には、OTOCは量子運動の性質に大きく依存する量子不確実性の成長を記述している。ここでは、散逸的量子カオスの正確な定義を提供するためにOTOCを用いる。この目的のために、我々は、マルコフ浴に結合した$q$-body Sachdev-Ye-Kitaevモデルの大きな$q$-極限のベクトル化された定式化のためのリアプノフ指数を解析的に計算する。これらの解析結果は、シュウィンガー・ダイソン方程式とベーテ・サルペーター方程式の解に基づいて、いくつかの値の$q \geq 4$に対してラプノフ指数の明示的な数値計算によって確認される。リアプノフ指数は浴へのカップリングが増加するにつれて単調に減少し、最終的には量子カオスではない力学への遷移をシグナルするカップリングの臨界値において負となることを示す。したがって、正のリャプノフ指数は放散多体量子カオスの定義的特徴である。十分に強いカップリングのための指数的成長の破れの観察は、散逸的な量子カオスが環境に十分弱いカップリングを必要とすることを示唆している。 A distinct feature of Hermitian quantum chaotic dynamics is the exponential increase of certain out-of-time-order-correlation (OTOC) functions around the Ehrenfest time with a rate given by a Lyapunov exponent. Physically, the OTOCs describe the growth of quantum uncertainty that crucially depends on the nature of the quantum motion. Here, we employ the OTOC in order to provide a precise definition of dissipative quantum chaos. For this purpose, we compute analytically the Lyapunov exponent for the vectorized formulation of the large $q$-limit of a $q$-body Sachdev-Ye-Kitaev model coupled to a Markovian bath. These analytic results are confirmed by an explicit numerical calculation of the Lyapunov exponent for several values of $q \geq 4$ based on the solutions of the Schwinger-Dyson and Bethe-Salpeter equations. We show that the Lyapunov exponent decreases monotonically as the coupling to the bath increases and eventually becomes negative at a critical value of the coupling signaling a transition to a dynamics which is no longer quantum chaotic. Therefore, a positive Lyapunov exponent is a defining feature of dissipative many-body quantum chaos. 
The observation of the breaking of the exponential growth for sufficiently strong coupling suggests that dissipative quantum chaos may require in certain cases a sufficiently weak coupling to the environment.	翻訳日:2024-03-20 15:51:27 公開日:2024-03-19
# DMAD: 実世界の異常検出のためのデュアルメモリバンク DMAD: Dual Memory Bank for Real-World Anomaly Detection ( http://arxiv.org/abs/2403.12362v1 ) ライセンス: Link先を確認	Jianlong Hu, Xu Chen, Zhenye Gan, Jinlong Peng, Shengchuan Zhang, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Liujuan Cao, Rongrong Ji,	(参考訳) 統一モデルの訓練は、その汎化能力とストレージ効率により、実用的な産業異常検出シナリオにより適していると考えられる。しかし、正常データのみを使用するこのマルチクラス設定は、現実世界で利用可能な、数は少ないが重要なアノテーション付きの異常を無視する。実世界の異常検出の課題に対処するため,Dual Memory bank enhanced representation learning for Anomaly Detection (DMAD) と呼ばれる新しいフレームワークを提案する。このフレームワークは、統一された(複数クラスの)設定で、教師なしシナリオと半教師ありシナリオの両方を処理する。 DMADはデュアルメモリバンクを用いて、正常なパターンと異常なパターンの間の特徴距離と特徴アテンションを計算し、正常および異常なインスタンスに関する知識をカプセル化する。この知識は、異常スコア学習のための拡張された表現を構築するために使用される。 DMADをMVTec-ADおよびVisAデータセット上で評価した。その結果、DMADは現在の最先端手法を上回り、実世界の異常検出シナリオの複雑さを扱うDMADの能力が示された。 Training a unified model is considered to be more suitable for practical industrial anomaly detection scenarios due to its generalization ability and storage efficiency. However, this multi-class setting, which exclusively uses normal data, overlooks the few but important accessible annotated anomalies in the real world. To address the challenge of real-world anomaly detection, we propose a new framework named Dual Memory bank enhanced representation learning for Anomaly Detection (DMAD). This framework handles both unsupervised and semi-supervised scenarios in a unified (multi-class) setting. DMAD employs a dual memory bank to calculate feature distance and feature attention between normal and abnormal patterns, thereby encapsulating knowledge about normal and abnormal instances. This knowledge is then used to construct an enhanced representation for anomaly score learning. We evaluated DMAD on the MVTec-AD and VisA datasets. The results show that DMAD surpasses current state-of-the-art methods, highlighting DMAD's capability in handling the complexities of real-world anomaly detection scenarios.	翻訳日:2024-03-20 15:51:27 公開日:2024-03-19
# E-DoH:インターネット上のオープンDoHサービスの深さをエレガントに検出する E-DoH: Elegantly Detecting the Depths of Open DoH Service on the Internet ( http://arxiv.org/abs/2403.12363v1 ) ライセンス: Link先を確認	Cong Dong, Jiahai Yang, Yun Li, Yue Wu, Yufan Chen, Chenglong Li, Haoran Jiao, Xia Yin, Yuling Liu,	(参考訳) 近年,DNS over Encrypted (DoE) 方式は,DNSエコシステムにおける新たなトレンドと見なされている。これらのDoE方式のうち、DNS over HTTPS (DoH) は、データの機密性を保護する暗号化を提供すると同時に、ポート443をWebサービスと多重化することで検閲を回避するための、より優れた難読化を提供する。この発展は、公開されているDoHサービスの発見にいくつかの不都合をもたらした。本稿では,エレガントかつ効率的なDoHサービス検出のためのE-DoH法を提案する。まず、単一のDoH接続でサービス発見、正当性検証、依存関係構築を含む複数のタスクを実現できるように、探索メカニズムを最適化した。次に,効率的なDoH検出ツールを提案する。このツールは、必要なトラフィック量を大幅に削減しつつ、探索効率を向上させることができる。第3に、上記の最適化手法に基づいて、IPv4空間の探索を行い、収集した情報に基づいて、DoHの詳細な分析を行った。実験により,本手法は時間効率が80%向上し,検出作業の完了には4%～20%のトラフィック量しか必要としないことがわかった。実環境での検出では46kのDoHサービスが発見され、これは最先端手法による発見数のほぼ2倍である。収集したデータに基づいて、現在のDoHサービスエコシステムに関する興味深い結論をいくつか提示する。 In recent years, DNS over Encrypted (DoE) methods have been regarded as a novel trend within the realm of the DNS ecosystem. In these DoE methods, DNS over HTTPS (DoH) provides encryption to protect data confidentiality while providing better obfuscation to avoid censorship by multiplexing port 443 with web services. This development introduced certain inconveniences in discovering publicly available DoH services. In this paper, we propose the E-DoH method for elegant and efficient DoH service detection. First, we optimized the probing mechanism to enable a single DoH connection to accomplish multiple tasks including service discovery, correctness validation and dependency construction. Second, we propose an efficient DoH detection tool. This tool can enhance probing efficiency while significantly reducing the required traffic volume. Third, based on the above optimization methods, we conducted an exploration of the IPv4 space and performed an in-depth analysis of DoH based on the collected information. 
Through experiments, our approach demonstrates a remarkable 80% improvement in time efficiency and requires only 4%-20% of the traffic volume to complete the detection task. In detection in the wild, our approach discovered 46k DoH services, nearly double the number discovered by the state of the art. Based on the collected data, we present several intriguing conclusions about the current DoH service ecosystem.	翻訳日:2024-03-20 15:51:27 公開日:2024-03-19
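As background for the probing described above, the sketch below shows what a single DoH request looks like on the wire under RFC 8484: a DNS query in RFC 1035 wire format, base64url-encoded without padding and carried in the `dns` parameter of a GET request. This is not the E-DoH tool. The resolver URL and function names are hypothetical, and no network traffic is sent.

```python
import base64
import struct

def build_dns_query(name, qtype=1, qclass=1, query_id=0):
    """Encode a minimal DNS query (one question) in RFC 1035 wire format."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, qclass)

def doh_get_url(resolver_url, name):
    """Build an RFC 8484 GET URL; the query is base64url-encoded without padding."""
    wire = build_dns_query(name)
    encoded = base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")
    return f"{resolver_url}?dns={encoded}"

# Hypothetical resolver endpoint, used only to show the URL shape
url = doh_get_url("https://dns.example.net/dns-query", "www.example.com")
print(url)
```

A real probe would send this URL over HTTPS on port 443 with an `Accept: application/dns-message` header and parse the binary response. Using query ID 0, as RFC 8484 recommends, keeps the URL cache-friendly.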
# ネットワーク校正のためのクラスと領域適応制約 Class and Region-Adaptive Constraints for Network Calibration ( http://arxiv.org/abs/2403.12364v1 ) ライセンス: Link先を確認	Balamurali Murugesan, Julio Silva-Rodriguez, Ismail Ben Ayed, Jose Dolz,	(参考訳) 本研究では,異なるカテゴリや対象領域がもたらす固有の課題を考慮したセグメンテーションネットワークの校正手法を提案する。特に,クラスと地域による制約を学習目標に統合し,クラスと地域の違いを考慮に入れた複数のペナルティ重みを定式化する。しかし、最適なペナルティウェイトを手動で見つけることは不可能であり、最適化プロセスを妨げる可能性がある。この制限を克服するために,クラスおよび地域適応制約(CRaC)に基づくアプローチを提案する。 CRaCは、制約付き最適化において確立された手法である一般化ラグランジアン法に基づいている。 2つの人気のあるセグメンテーションベンチマークと2つのよく知られたセグメンテーションネットワークの実験結果は、既存のアプローチと比較してCRaCの優位性を示している。コードは、https://github.com/Bala93/CRac/で入手できる。 In this work, we present a novel approach to calibrate segmentation networks that considers the inherent challenges posed by different categories and object regions. In particular, we present a formulation that integrates class and region-wise constraints into the learning objective, with multiple penalty weights to account for class and region differences. Finding the optimal penalty weights manually, however, might be unfeasible, and potentially hinder the optimization process. To overcome this limitation, we propose an approach based on Class and Region-Adaptive constraints (CRaC), which allows to learn the class and region-wise penalty weights during training. CRaC is based on a general Augmented Lagrangian method, a well-established technique in constrained optimization. Experimental results on two popular segmentation benchmarks, and two well-known segmentation networks, demonstrate the superiority of CRaC compared to existing approaches. The code is available at: https://github.com/Bala93/CRac/	翻訳日:2024-03-20 15:51:27 公開日:2024-03-19
# GaussianFlow:4Dコンテンツ作成のためのガウス力学のスプラッティング GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation ( http://arxiv.org/abs/2403.12365v1 ) ライセンス: Link先を確認	Quankai Gao, Qiangeng Xu, Zhe Cao, Ben Mildenhall, Wenchao Ma, Le Chen, Danhang Tang, Ulrich Neumann,	(参考訳) 画像やビデオからGaussian Splattingの4Dフィールドを作成することは、制約が不足しているという性質のため難しい作業である。この最適化は、入力ビデオから測光的な参照を得たり、生成モデルによって正則化したりできるが、ガウスの動きを直接監督することは、まだ十分に探索されていない。本稿では,連続するフレーム間の3次元ガウスのダイナミクスと画素速度を結びつけるガウスフローという新しい概念を紹介する。ガウスフローは、ガウスのダイナミクスを画像空間にスプラッティングすることで効率よく得ることができる。この微分可能なプロセスは、オプティカルフローからの直接的な動的監督を可能にする。提案手法は,Gaussian Splattingによる4次元動的コンテンツ生成と4次元新規ビュー合成,特に既存の方法では処理が困難な動きの豊富なコンテンツに対して,大きな効果がある。 4次元生成で発生する一般的な色ドリフト問題も、改善されたガウスのダイナミクスによって解決される。広範な実験における優れた視覚的品質は,本手法の有効性を示す。定量的および定性的な評価により,本手法は4次元生成と4次元新規ビュー合成の両課題において,最先端の成果が得られることが示された。プロジェクトページ:https://zerg-overmind.github.io/GaussianFlow.github.io/ Creating 4D fields of Gaussian Splatting from images or videos is a challenging task due to its under-constrained nature. While the optimization can draw photometric reference from the input videos or be regulated by generative models, directly supervising Gaussian motions remains underexplored. In this paper, we introduce a novel concept, Gaussian flow, which connects the dynamics of 3D Gaussians and pixel velocities between consecutive frames. The Gaussian flow can be efficiently obtained by splatting Gaussian dynamics into the image space. This differentiable process enables direct dynamic supervision from optical flow. Our method significantly benefits 4D dynamic content generation and 4D novel view synthesis with Gaussian Splatting, especially for contents with rich motions that are hard for existing methods to handle. The common color drifting issue that happens in 4D generation is also resolved with improved Gaussian dynamics. Superior visual quality on extensive experiments demonstrates our method's effectiveness. 
Quantitative and qualitative evaluations show that our method achieves state-of-the-art results on both tasks of 4D generation and 4D novel view synthesis. Project page: https://zerg-overmind.github.io/GaussianFlow.github.io/	翻訳日:2024-03-20 15:51:27 公開日:2024-03-19
# U-Net Kalman Filter (UNetKF): 機械学習支援型アンサンブルデータ同化の例 U-Net Kalman Filter (UNetKF): An Example of Machine Learning-assisted Ensemble Data Assimilation ( http://arxiv.org/abs/2403.12366v1 ) ライセンス: Link先を確認	Feiyu Lu,	(参考訳) 機械学習技術は、気象科学や気候科学で急速に人気が高まっている。データ同化(DA)は、観測と数値モデルを組み合わせるもので、機械学習と人工知能(ML/AI)技術を組み込む大きな可能性を秘めている。本稿では,畳み込みニューラルネットワーク(CNN)の一種であるU-Netを用いて,Ensemble Kalman Filter(EnKF)アルゴリズムの局所化されたアンサンブル共分散を予測する。 2層準地衡流モデルを用いて、U-NetはEnKF DA実験のデータで訓練される。訓練されたU-Netは、U-Net Kalman Filter(UNetKF)実験において流れ依存の局所化誤差共分散行列を予測するために使用され、従来の3次元変分法(3DVar)、アンサンブル3DVar(En3DVar)、EnKF法と比較される。 UNetKFの性能は、3DVar、En3DVar、EnKFと同等かそれを上回る。また、訓練されたU-Netをより高解像度のモデルに転用してUNetKFを実装でき、特に小さなアンサンブルサイズで3DVarやEnKFと競合する性能が得られることを実証した。 Machine learning techniques have seen a tremendous rise in popularity in weather and climate sciences. Data assimilation (DA), which combines observations and numerical models, has great potential to incorporate machine learning and artificial intelligence (ML/AI) techniques. In this paper, we use U-Net, a type of convolutional neural network (CNN), to predict the localized ensemble covariances for the Ensemble Kalman Filter (EnKF) algorithm. Using a 2-layer quasi-geostrophic model, U-Nets are trained using data from EnKF DA experiments. The trained U-Nets are then used to predict the flow-dependent localized error covariance matrices in U-Net Kalman Filter (UNetKF) experiments, which are compared to traditional 3-dimensional variational (3DVar), ensemble 3DVar (En3DVar) and EnKF methods. The performance of UNetKF can match or exceed that of 3DVar, En3DVar or EnKF. We also demonstrate that trained U-Nets can be transferred to a higher-resolution model for UNetKF implementation, which again performs competitively with 3DVar and EnKF, particularly for small ensemble sizes.	翻訳日:2024-03-20 15:51:27 公開日:2024-03-19
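To make "localized ensemble covariance" concrete, here is a minimal sketch of where such a covariance enters a Kalman-type update. In UNetKF the localized covariance is predicted by a trained U-Net. This sketch instead substitutes a fixed Gaussian distance taper applied elementwise to the sample covariance, which is a common textbook stand-in and not the paper's method. The toy ensemble, the taper, and all function names are assumptions.

```python
import math

def sample_covariance(ensemble):
    """Sample covariance of an ensemble given as a list of state vectors."""
    m, n = len(ensemble), len(ensemble[0])
    mean = [sum(member[i] for member in ensemble) / m for i in range(n)]
    cov = [[0.0] * n for _ in range(n)]
    for member in ensemble:
        for i in range(n):
            for j in range(n):
                cov[i][j] += (member[i] - mean[i]) * (member[j] - mean[j])
    return [[c / (m - 1) for c in row] for row in cov]

def gaussian_taper(n, length_scale):
    """Hypothetical localization weights that decay with grid distance |i - j|."""
    return [[math.exp(-0.5 * ((i - j) / length_scale) ** 2) for j in range(n)]
            for i in range(n)]

def localized_gain(cov, taper, obs_index, obs_var):
    """Kalman gain column for a single, directly observed state variable."""
    n = len(cov)
    # Elementwise (Schur) product damps spurious long-range covariances
    loc = [[cov[i][j] * taper[i][j] for j in range(n)] for i in range(n)]
    denom = loc[obs_index][obs_index] + obs_var
    return [loc[i][obs_index] / denom for i in range(n)]

# Toy 4-member ensemble of a 3-variable state
ensemble = [[1.0, 2.0, 0.5], [1.5, 2.5, 0.1], [0.5, 1.0, 0.9], [2.0, 3.5, 0.4]]
cov = sample_covariance(ensemble)
gain = localized_gain(cov, gaussian_taper(3, length_scale=1.0),
                      obs_index=0, obs_var=0.25)
print(gain)
```

With small ensembles the raw sample covariance is noisy, which is why some form of localization is needed. That is also the regime where the abstract reports UNetKF to be most competitive.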
# 公衆衛生介入の効果評価のための半教師付きスコアベースマッチングアルゴリズム Semisupervised score based matching algorithm to evaluate the effect of public health interventions ( http://arxiv.org/abs/2403.12367v1 ) ライセンス: Link先を確認	Hongzhe Zhang, Jiasheng Shi, Jing Huang,	(参考訳) 多変量マッチングアルゴリズムは、ランダム化の欠如によって生じる潜在的なバイアスと交絡効果を取り除くために、観察研究において類似した研究ユニットを「ペア」にする。 1対1の多変量マッチングアルゴリズムでは、マッチすべき「ペア」が多いことは、大規模サンプルからの情報と多数のタスクの両方を意味しうる。そのため、効率的で、かつドメインの専門家がペア済みユニットの「トレーニング」セットを通じて提供する比較的限られた補助的マッチング知識だけで動作するマッチングアルゴリズムが実用上望ましい。我々は2次スコア関数 $S_{\beta}(x_i,x_j)= \beta^T (x_i-x_j)(x_i-x_j)^T \beta$ に基づく新しい1対1マッチングアルゴリズムを提案した。重み$\beta$は、変数重要度の尺度として解釈でき、ペアのトレーニングユニット間のスコア差を最小限に抑えつつ、未ペアのトレーニングユニット間のスコア差を最大化するように設計されている。さらに、トレーニングセットが未ペア集合よりもはるかに小さい典型的だが複雑な場合に対して、未ペアのユニットを最大限活用する \underline{s}emisupervised \underline{c}ompanion \underline{o}ne-\underline{t}o-\underline{o}ne \underline{m}atching \underline{a}lgorithm (SCOTOMA) を提案する。提案した重み推定量は、真のマッチング基準が実際に2次スコア関数であるときに一致性を持つことが証明される。モデルの仮定が破られた場合でも、提案アルゴリズムが一般的な競合マッチングアルゴリズムよりも優れていることを一連のシミュレーションにより示す。提案アルゴリズムを実世界の研究に適用し,政策立案のために、対面授業が地域の Covid-19 感染率に及ぼす影響を調べた。 Multivariate matching algorithms "pair" similar study units in an observational study to remove potential bias and confounding effects caused by the absence of randomizations. In one-to-one multivariate matching algorithms, a large number of "pairs" to be matched could mean both the information from a large sample and a large number of tasks. Therefore, a matching algorithm that is efficient and needs only the comparatively limited auxiliary matching knowledge that domain experts provide through a "training" set of paired units is of practical interest. We proposed a novel one-to-one matching algorithm based on a quadratic score function $S_{\beta}(x_i,x_j)= \beta^T (x_i-x_j)(x_i-x_j)^T \beta$. The weights $\beta$, which can be interpreted as a variable importance measure, are designed to minimize the score difference between paired training units while maximizing the score difference between unpaired training units. 
Further, in the typical but intricate case where the training set is much smaller than the unpaired set, we propose a \underline{s}emisupervised \underline{c}ompanion \underline{o}ne-\underline{t}o-\underline{o}ne \underline{m}atching \underline{a}lgorithm (SCOTOMA) that makes the best use of the unpaired units. The proposed weight estimator is proved to be consistent when the true matching criterion is indeed the quadratic score function. When the model assumptions are violated, we demonstrate through a series of simulations that the proposed algorithm still outperforms some popular competing matching algorithms. We applied the proposed algorithm to a real-world study to investigate the effect of in-person schooling on the community Covid-19 transmission rate for policy-making purposes.	翻訳日:2024-03-20 15:41:42 公開日:2024-03-19
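The quadratic score in this abstract simplifies to a squared weighted difference, $S_{\beta}(x_i,x_j) = (\beta^T (x_i - x_j))^2$, because $\beta^T d\, d^T \beta = (\beta^T d)^2$ for $d = x_i - x_j$. The sketch below evaluates that score and picks the lowest-scoring candidate for one unit. The weights, the covariate values, and the nearest-score matching rule are illustrative assumptions. The abstract does not specify the matching step, and SCOTOMA's estimation of $\beta$ is not implemented here.

```python
def quadratic_score(beta, x_i, x_j):
    """S_beta(x_i, x_j) = beta^T (x_i - x_j)(x_i - x_j)^T beta = (beta . (x_i - x_j))^2."""
    diff = [a - b for a, b in zip(x_i, x_j)]
    return sum(w * d for w, d in zip(beta, diff)) ** 2

def closest_candidate(beta, unit, candidates):
    """Index of the candidate with the smallest score (assumed matching rule)."""
    scores = [quadratic_score(beta, unit, c) for c in candidates]
    return min(range(len(candidates)), key=scores.__getitem__)

# Hypothetical weights and covariates, for illustration only
beta = [1.0, 0.5]
unit = [2.0, 1.0]
candidates = [[1.0, 1.0], [2.0, 1.5], [4.0, 0.0]]

scores = [quadratic_score(beta, unit, c) for c in candidates]
best = closest_candidate(beta, unit, candidates)
print(scores, best)
```

Because the score depends on the covariates only through the projection onto $\beta$, a covariate with a larger weight has more influence on which units count as similar. This is consistent with the abstract's reading of $\beta$ as a variable importance measure.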
# 大規模言語モデルによる特徴的AIエージェント Characteristic AI Agents via Large Language Models ( http://arxiv.org/abs/2403.12368v1 ) ライセンス: Link先を確認	Xi Wang, Hongliang Dai, Shen Gao, Piji Li,	(参考訳) 大規模言語モデル(LLM)の進歩は、チャットボットシステムの性能を大幅に向上させた。多くの研究者が、チャットボットに特徴を持たせる開発に力を注いでいる。 LLMを用いたロール駆動型チャットボットを開発するための商用製品は存在するが、この分野の学術研究は比較的少ないことは注目に値する。本研究は,異なる設定で実在の個人をシミュレートすることで,特徴的AIエージェント構築におけるLLMの性能調査に焦点をあてる。現在の調査は、主に単純なプロファイルを持つ役割を演じることに焦点を当てている。この研究ギャップに対応するために、私たちは、データセット、テクニック、評価指標を含む、特徴的AIエージェントタスクのベンチマークを作成する。このベンチマークのために、言語モデルがロールプレイする対象としてWikipediaで最も閲覧された人物からなる「Character100」というデータセットを構築した。構築したデータセットを用いて,様々な設定におけるLLMの包括的評価を行う。さらに,定量的性能評価のための自動評価指標のセットを考案した。実験結果から,特徴的AIエージェント構築におけるLLMの能力向上に向けた潜在的な方向性が明らかにされた。ベンチマークはhttps://github.com/nuaa-nlp/Character100で公開されている。 The advancement of Large Language Models (LLMs) has led to significant enhancements in the performance of chatbot systems. Many researchers have dedicated their efforts to the development of bringing characteristics to chatbots. While there have been commercial products for developing role-driven chatbots using LLMs, it is worth noting that academic research in this area remains relatively scarce. Our research focuses on investigating the performance of LLMs in constructing Characteristic AI Agents by simulating real-life individuals across different settings. Current investigations have primarily focused on acting out roles with simple profiles. In response to this research gap, we create a benchmark for the characteristic AI agents task, including dataset, techniques, and evaluation metrics. A dataset called "Character100" is built for this benchmark, comprising the most-visited people on Wikipedia for language models to role-play. With the constructed dataset, we conduct a comprehensive assessment of LLMs across various settings. In addition, we devise a set of automatic metrics for quantitative performance evaluation. The experimental results underscore the potential directions for further improvement in the capabilities of LLMs in constructing characteristic AI agents. 
The benchmark is available at https://github.com/nuaa-nlp/Character100.	翻訳日:2024-03-20 15:41:42 公開日:2024-03-19
# XPose: eXplainable Human Pose Estimation XPose: eXplainable Human Pose Estimation ( http://arxiv.org/abs/2403.12370v1 ) ライセンス: Link先を確認	Luyu Qiu, Jianing Li, Lei Wen, Chi Su, Fei Hao, Chen Jason Zhang, Lei Chen,	(参考訳) ポーズ推定における現在のアプローチは、主にモデルアーキテクチャの強化に焦点を当てており、しばしばモデル決定の背後にある理論的根拠を包括的に理解することの重要性を見落としている。本稿では,説明可能なAI(XAI)の原理を組み込んだ新しいフレームワークであるXPoseを提案する。この統合は、最終的な予測に対する各キーポイントの個々の貢献を解明することを目的としており、それによってモデルの透明性と解釈可能性を高める。従来のXAI技術は、分類のような単一ターゲットタスクで主にタスクに対処してきた。さらに、XAIにおける一般的な測度であるShapley値の見積もりへの適用は、禁忌な計算要求によって妨げられている。これらの課題に対処するため、この研究はGroup Shapley Value (GSV)と呼ばれる革新的な概念を導入している。このアプローチは、キーポイントを相互依存性に基づいてクラスタに戦略的に整理する。これらのクラスタ内では、GSVはキーポイントのShapley値を慎重に計算し、クラスタ間キーポイントでは、より包括的なグループレベルの評価を選択する。この二重レベル計算フレームワークは、計算効率を最適化し、最終結果に対するキーポイントの貢献を慎重に評価する。キーポイントインタラクションの洞察に基づいて、グループベースのキーポイント除去(GKR)と呼ばれる新しいデータ拡張手法を考案する。この方法は、トレーニングフェーズ中に個々のキーポイントを巧妙に除去し、強い相互接続を持つキーポイントを意図的に保存し、非可視キーポイントに対するモデルの予測能力を改善する。標準アプローチのスペクトルにおけるGKRの実証的検証は、その有効性を証明している。 GKRの成功は、Explainable AI(XAI)を使用することで、ポーズ推定モデルを直接強化できることを実証している。 Current approaches in pose estimation primarily concentrate on enhancing model architectures, often overlooking the importance of comprehensively understanding the rationale behind model decisions. In this paper, we propose XPose, a novel framework that incorporates Explainable AI (XAI) principles into pose estimation. This integration aims to elucidate the individual contribution of each keypoint to final prediction, thereby elevating the model's transparency and interpretability. Conventional XAI techniques have predominantly addressed tasks with single-target tasks like classification. Additionally, the application of Shapley value, a common measure in XAI, to pose estimation has been hindered by prohibitive computational demands. To address these challenges, this work introduces an innovative concept called Group Shapley Value (GSV). This approach strategically organizes keypoints into clusters based on their interdependencies. 
Within these clusters, GSV meticulously calculates Shapley value for keypoints, while for inter-cluster keypoints, it opts for a more holistic group-level valuation. This dual-level computation framework meticulously assesses keypoint contributions to the final outcome, optimizing computational efficiency. Building on the insights into keypoint interactions, we devise a novel data augmentation technique known as Group-based Keypoint Removal (GKR). This method ingeniously removes individual keypoints during training phases, deliberately preserving those with strong mutual connections, thereby refining the model's predictive prowess for non-visible keypoints. The empirical validation of GKR across a spectrum of standard approaches attests to its efficacy. GKR's success demonstrates how using Explainable AI (XAI) can directly enhance pose estimation models.	翻訳日:2024-03-20 15:41:42 公開日:2024-03-19
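To illustrate why exact Shapley values are costly and why grouping helps, here is a minimal exact Shapley computation over a handful of "players". Treating keypoint groups as players, the group names, and the toy value function are all assumptions for illustration. The paper's exact Group Shapley Value definition is not given in the abstract and is not reproduced here.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values; enumerates all 2^(n-1) coalitions per player."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result

def toy_accuracy(coalition):
    """Hypothetical model score when only these keypoint groups are visible."""
    score = 0.0
    score += 0.2 if "head" in coalition else 0.0
    score += 0.3 if "arms" in coalition else 0.0
    score += 0.1 if "legs" in coalition else 0.0
    # Interaction term: arms and legs together help more than separately
    if "arms" in coalition and "legs" in coalition:
        score += 0.2
    return score

groups = ["head", "arms", "legs"]
phi = shapley_values(groups, toy_accuracy)
print(phi)
```

The interaction bonus is split evenly between the two groups that create it, and the values sum to the score of the full coalition. With individual keypoints as players, the number of coalitions grows exponentially, which matches the "prohibitive computational demands" the abstract cites as motivation for a group-level valuation.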
# マルチモーダル言語モデリングによる時系列分類の改善 Advancing Time Series Classification with Multimodal Language Modeling ( http://arxiv.org/abs/2403.12371v1 ) ライセンス: Link先を確認	Mingyue Cheng, Yiheng Chen, Qi Liu, Zhiding Liu, Yucong Luo,	(参考訳) 時系列分類の進歩, 先行研究の精査, 既存のほとんどの手法は, 共通の学習と分類のパラダイムを採用しており, 時系列分類モデルでは, シーケンス入力と1ホット分布で符号化されたターゲットラベルの関係を学習しようとする。このパラダイムは,(1)一点分布を持つ対象カテゴリの符号化ではラベル間の相違性や類似性を反映できないこと,(2)ドメイン間の伝達可能なモデルを学習することが極めて困難であること,という2つの固有の制約を隠蔽する。本研究では,時系列分類を学習から生成へのパラダイムとして再形成する新しい試みであるInstructTimeを提案する。事前学習された言語モデルの強力な生成能力に基づいて、ラベル情報をテキストで表現しながら、タスク固有の命令と生の時系列の両方をマルチモーダル入力として扱うマルチモーダル理解タスクとして時系列の分類を定式化する。この目標を達成するために、3つの異なるデザインがInstructTimeで開発されている。第一に、時系列離散化モジュールは、連続時系列をハードトークンの列に変換し、モーダル入力間の不整合問題を解決するように設計されている。モーダリティ表現ギャップの問題を解決するために、時系列の変換トークンを言語モデルに入力する前に、アライメント投影層を導入する。また,言語モデルの伝達性の向上と一般化性能の向上を図るために,ドメイン間の自動回帰事前学習の必要性を強調した。 InstructTimeの優れた性能と時系列分類における普遍的な基礎モデルの可能性を明らかにするため、ベンチマークデータセット上で大規模な実験が行われた。 For the advancements of time series classification, scrutinizing previous studies, most existing methods adopt a common learning-to-classify paradigm - a time series classifier model tries to learn the relation between sequence inputs and target label encoded by one-hot distribution. Although effective, this paradigm conceals two inherent limitations: (1) encoding target categories with one-hot distribution fails to reflect the comparability and similarity between labels, and (2) it is very difficult to learn transferable model across domains, which greatly hinder the development of universal serving paradigm. In this work, we propose InstructTime, a novel attempt to reshape time series classification as a learning-to-generate paradigm. Relying on the powerful generative capacity of the pre-trained language model, the core idea is to formulate the classification of time series as a multimodal understanding task, in which both task-specific instructions and raw time series are treated as multimodal inputs while the label information is represented by texts. 
To accomplish this goal, three distinct designs are developed in InstructTime. First, a time series discretization module is designed to convert continuous time series into a sequence of hard tokens to solve the inconsistency issue across modal inputs. Second, to narrow the modality representation gap, we introduce an alignment projection layer before feeding the transformed time series tokens into the language model. Third, we highlight the necessity of auto-regressive pre-training across domains, which can facilitate the transferability of the language model and boost the generalization performance. Extensive experiments are conducted over benchmark datasets, whose results uncover the superior performance of InstructTime and the potential for a universal foundation model in time series classification.	翻訳日:2024-03-20 15:41:42 公開日:2024-03-19
# 言語モデルからのクロスドメイン事前学習による移動可能時系列分類器の学習 Learning Transferable Time Series Classifier with Cross-Domain Pre-training from Language Model ( http://arxiv.org/abs/2403.12372v1 ) ライセンス: Link先を確認	Mingyue Cheng, Xiaoyu Tao, Qi Liu, Hao Zhang, Yiheng Chen, Chenyi Lei,	(参考訳) 自己教師付き事前訓練(SSL)の進歩は、下流タスクの強化に非常に有用である、転送可能な時系列表現の学習分野を著しく進歩させてきた。効果的であるにもかかわらず、既存のほとんどの作業は、クロスドメインSSL事前トレーニングを達成するのに苦労し、異なるドメインのパターンや機能を統合する貴重な機会を欠いている。主な課題は、チャンネル数や時間分解能尺度の変動など、異なる領域にわたる時系列データの特徴の顕著な違いにある。この課題に対処するために、さまざまなドメインから転送可能な知識を学習し、ターゲットの下流タスクに大きく貢献するクロスドメインSSL学習フレームワークであるCrossTimeNetを提案する。 CrossTimeNetの重要な特徴の1つは、新しく設計された時系列トークン化モジュールである。さらに,SSL事前トレーニング中に複数のドメインにまたがる情報パターンの抽出には,高頻度の不正トークンの予測が極めて有用であることも強調した。さらに,本研究では,先行学習言語モデル(PLM)をエンコーダネットワークの初期化として扱い,PLMが学習した知識を時系列領域に転送する可能性について検討した。これらの取り組みを通じて、ジェネリック時系列モデルのクロスドメイン事前学習へのパスを効果的に舗装することができる。我々は、様々な時系列分類領域にわたる実世界のシナリオにおいて広範な実験を行う。実験の結果、CrossTimeNetの優れたパフォーマンスが明確に確認された。 Advancements in self-supervised pre-training (SSL) have significantly advanced the field of learning transferable time series representations, which can be very useful in enhancing the downstream task. Despite being effective, most existing works struggle to achieve cross-domain SSL pre-training, missing valuable opportunities to integrate patterns and features from different domains. The main challenge lies in the significant differences in the characteristics of time-series data across different domains, such as variations in the number of channels and temporal resolution scales. To address this challenge, we propose CrossTimeNet, a novel cross-domain SSL learning framework to learn transferable knowledge from various domains to largely benefit the target downstream task. One of the key characteristics of CrossTimeNet is the newly designed time series tokenization module, which could effectively convert the raw time series into a sequence of discrete tokens based on a reconstruction optimization process. 
Besides, we highlight that predicting a high proportion of corrupted tokens can be very helpful for extracting informative patterns across different domains during SSL pre-training, which has been largely overlooked in past years. Furthermore, unlike previous works, our work treats the pre-training language model (PLM) as the initialization of the encoder network, investigating the feasibility of transferring the knowledge learned by the PLM to the time series area. Through these efforts, the path to cross-domain pre-training of a generic time series model can be effectively paved. We conduct extensive experiments in a real-world scenario across various time series classification domains. The experimental results clearly confirm CrossTimeNet's superior performance.	翻訳日:2024-03-20 15:41:42 公開日:2024-03-19
# RankPrompt: ステップバイステップの比較が言語モデルをより優れた推論器にする RankPrompt: Step-by-Step Comparisons Make Language Models Better Reasoners ( http://arxiv.org/abs/2403.12373v1 ) ライセンス: Link先を確認	Chi Hu, Yuan Ge, Xiangnan Ma, Hang Cao, Qiang Li, Yonghua Yang, Tong Xiao, Jingbo Zhu,	(参考訳) 大規模言語モデル(LLM)は、様々な推論タスクで優れた性能を達成している。しかし、ChatGPTのような最先端のLLMでさえ、推論プロセス中に論理的な誤りを犯しやすい。タスク固有の検証器の導入や複数の推論パスに対する投票といった既存のソリューションは、広範な人手によるアノテーションを必要とするか、応答が一貫しないシナリオで失敗する。これらの課題に対処するために, LLMが追加資源を使わずに自身の応答を自己ランク付けできる新しいプロンプト手法である RankPrompt を導入する。 RankPromptは、ランキング問題を多様な応答間の一連の比較に分解し、LLMの本質的な能力を活用して、比較の連鎖を文脈内の例示として生成する。 11の算術的および常識的推論タスクを対象とした実験により,RankPromptはChatGPTとGPT-4の推論性能を大幅に向上し,最大13%の改善が得られた。 RankPromptは、LLMベースのオープンエンド生成の自動評価にも優れ、AlpacaEvalセットにおいて74%の割合で人間の好みと一致する。さらにRankPromptは、応答の順序や一貫性の変動に対して堅牢性を示す。 Large Language Models (LLMs) have achieved impressive performance across various reasoning tasks. However, even state-of-the-art LLMs such as ChatGPT are prone to logical errors during their reasoning processes. Existing solutions, which include deploying task-specific verifiers or voting over multiple reasoning paths, either require extensive human annotations or fail in scenarios with inconsistent responses. To address these challenges, we introduce RankPrompt, a new prompting method that enables LLMs to self-rank their responses without additional resources. RankPrompt breaks down the ranking problem into a series of comparisons among diverse responses, leveraging the inherent capabilities of LLMs to generate chains of comparison as contextual exemplars. Our experiments across 11 arithmetic and commonsense reasoning tasks show that RankPrompt significantly enhances the reasoning performance of ChatGPT and GPT-4, with improvements of up to 13%. RankPrompt also excels in LLM-based automatic evaluations for open-ended generation, aligning with human preferences 74% of the time in the AlpacaEval set. 
Moreover, RankPrompt demonstrates robustness against variations in the orderings and consistencies of responses.	翻訳日:2024-03-20 15:41:42 公開日:2024-03-19
# プロンプトチューニングによる大規模言語モデルを用いた健康の社会的決定要因抽出の一般化性の向上 Improving Generalizability of Extracting Social Determinants of Health Using Large Language Models through Prompt-tuning ( http://arxiv.org/abs/2403.12374v1 ) ライセンス: Link先を確認	Cheng Peng, Zehao Yu, Kaleb E Smith, Wei-Hsuan Lo-Ciganic, Jiang Bian, Yonghui Wu,	(参考訳) 大規模言語モデル(LLM)を用いた自然言語処理(NLP)の進歩は,臨床記述からの患者情報抽出を大幅に改善した。しかし、ファインチューニング戦略に基づくほとんどの手法は、クロスドメインアプリケーションに対する転移学習能力に制限がある。本研究では,LLMを所望の出力に導く訓練可能なプロンプトを導入する、ソフトプロンプトに基づく学習アーキテクチャを用いた新しい手法を提案する。我々は,エンコーダのみのGatorTronとデコーダのみのGatorTronGPTの2種類のLLMアーキテクチャについて検討し,2022 n2c2チャレンジの機関横断データセットとフロリダ大学(UF)ヘルスの疾患横断データセットを用いて,健康の社会的決定要因(SDoH)抽出の性能を評価した。その結果,プロンプトチューニングを用いたデコーダのみのLLMがクロスドメインアプリケーションでより高い性能を達成することがわかった。 GatorTronGPTは両方のデータセットで最高のF1スコアを獲得し、従来のファインチューニングによるGatorTronを機関横断設定で8.9%と21.8%、疾患横断設定で5.5%と14.5%上回った。 The progress in natural language processing (NLP) using large language models (LLMs) has greatly improved patient information extraction from clinical narratives. However, most methods based on the fine-tuning strategy have limited transfer learning ability for cross-domain applications. This study proposed a novel approach that employs a soft prompt-based learning architecture, which introduces trainable prompts to guide LLMs toward desired outputs. We examined two types of LLM architectures, including encoder-only GatorTron and decoder-only GatorTronGPT, and evaluated their performance for the extraction of social determinants of health (SDoH) using a cross-institution dataset from the 2022 n2c2 challenge and a cross-disease dataset from the University of Florida (UF) Health. The results show that decoder-only LLMs with prompt tuning achieved better performance in cross-domain applications. GatorTronGPT achieved the best F1 scores for both datasets, outperforming traditional fine-tuned GatorTron by 8.9% and 21.8% in a cross-institution setting, and 5.5% and 14.5% in a cross-disease setting.	翻訳日:2024-03-20 15:41:42 公開日:2024-03-19
# ゼロショット自己教師ありブラインド画像デノイジングの低トレース適応 Low-Trace Adaptation of Zero-shot Self-supervised Blind Image Denoising ( http://arxiv.org/abs/2403.12382v1 ) ライセンス: Link先を確認	Jintong Hu, Bin Xia, Bingchen Li, Wenming Yang,	(参考訳) ディープラーニングベースのデノイザは、近年の画像デノイジング研究の中心となっている。ここ数年,訓練にクリーンな正解画像を必要とせず、ノイズの多い画像のみを必要とする自己教師ありデノイジングネットワークの開発への関心が高まっている。しかし、現在の自己教師あり手法と教師あり手法の間には、性能のギャップが残っている。さらに、これらの手法は一般的にノイズ特性に関する仮定に依存し、現実のシナリオにおける適用性を制約する。フロベニウスノルムの展開の性質に着想を得て、トレース項を組み込むことで、自己教師あり手法と教師あり手法の最適化目標の相違が軽減され、自己教師あり学習の性能が向上することを発見した。この知見を活かして,トレース制約損失関数を提案し、自己教師あり学習と教師あり学習のギャップを埋める低トレース適応Noise2Noise(LoTA-N2N)モデルを設計する。さらに,既存のいくつかの自己教師ありデノイジングフレームワークが,提案したトレース制約損失の部分ケースとして自然に含まれることを発見した。自然画像および共焦点画像データセットを用いた広範な実験により,本手法はノイズに関する仮定に頼らずに,ゼロショット自己教師あり画像デノイジング手法の中で最先端の性能を達成することが示された。 Deep learning-based denoisers have been the focus of recent developments in image denoising. In the past few years, there has been increasing interest in developing self-supervised denoising networks that only require noisy images, without the need for clean ground truth for training. However, a performance gap remains between current self-supervised methods and their supervised counterparts. Additionally, these methods commonly depend on assumptions about noise characteristics, thereby constraining their applicability in real-world scenarios. Inspired by the properties of the Frobenius norm expansion, we discover that incorporating a trace term reduces the optimization goal disparity between self-supervised and supervised methods, thereby enhancing the performance of self-supervised learning. To exploit this insight, we propose a trace-constraint loss function and design the low-trace adaptation Noise2Noise (LoTA-N2N) model that bridges the gap between self-supervised and supervised learning. Furthermore, we have discovered that several existing self-supervised denoising frameworks naturally fall within the proposed trace-constraint loss as subcases. 
Extensive experiments conducted on natural and confocal image datasets indicate that our method achieves state-of-the-art performance within the realm of zero-shot self-supervised image denoising approaches, without relying on any assumptions regarding the noise.	翻訳日:2024-03-20 15:41:42 公開日:2024-03-19
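The "Frobenius norm expansion" mentioned in this abstract is the standard identity $\lVert A - B \rVert_F^2 = \lVert A \rVert_F^2 + \lVert B \rVert_F^2 - 2\operatorname{tr}(A^T B)$, which is where a trace term naturally appears in a squared-error loss. The sketch below only checks that identity numerically on two small matrices. How LoTA-N2N turns the trace term into a constraint is not described in the abstract and is not modeled here. The matrices and function names are assumptions.

```python
def frobenius_sq(a):
    """Squared Frobenius norm: sum of squared entries."""
    return sum(v * v for row in a for v in row)

def trace_of_product(a, b):
    """tr(A^T B), which equals the sum of elementwise products of A and B."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Toy stand-ins for a network output A and a target B
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.5, 1.5], [2.0, 5.0]]

diff = [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]
lhs = frobenius_sq(diff)
rhs = frobenius_sq(A) + frobenius_sq(B) - 2.0 * trace_of_product(A, B)
print(lhs, rhs)
```

Read this way, a squared-error loss between two images differs from another only through the cross term $\operatorname{tr}(A^T B)$, which gives some intuition for why a trace term can relate self-supervised and supervised objectives.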
# マルチモーダルレコメンデーションのためのアライニングとトレーニングフレームワーク An Aligning and Training Framework for Multimodal Recommendations ( http://arxiv.org/abs/2403.12384v1 ) ライセンス: Link先を確認	Yifan Liu, Kangning Zhang, Xiangyuan Ren, Yanhua Huang, Jiarui Jin, Yingjie Qin, Ruilong Su, Ruiwen Xu, Weinan Zhang,	(参考訳) マルチメディアアプリケーションの開発において、ユーザインタラクション以上のリッチなコンテキストを活用できるため、マルチモーダルレコメンデーションは重要な役割を担っている。既存の手法では, マルチモーダル情報を補助的とみなし, それらを用いてIDの特徴を学習するが, 多モーダルコンテンツの特徴とIDの特徴の間には意味的ギャップがあり, ユーザやアイテムの表現の誤調整につながる。本稿では,まず,マルチモーダルレコメンデーションにおけるミスアライメント問題を体系的に検討し,AlignRecというソリューションを提案する。 AlignRecでは、推奨目的をコンテンツ内のアライメント、コンテンツとカテゴリID間のアライメント、ユーザとアイテム間のアライメントという3つのアライメントに分解する。各アライメントは、特定の目的関数によって特徴づけられ、当社のマルチモーダルレコメンデーションフレームワークに統合されます。 AlignRecを効果的にトレーニングするために、まず最初にアライメントを事前訓練して、統一されたマルチモーダル特徴を取得し、その後、これらの特徴を入力として、以下の2つのアライメントをトレーニングすることを提案する。各マルチモーダルフィーチャがトレーニングに役立つかどうかを分析することが不可欠であるため、中間性能を評価するために3つの新しいメトリクスクラスを設計する。実世界の3つのデータセットに関する広範な実験は、9つのベースラインと比較して、AlignRecの優位性を一貫して検証している。また、AlignRecによって生成されるマルチモーダル機能は、現在使われているものよりも優れていることが分かりました。 With the development of multimedia applications, multimodal recommendations are playing an essential role, as they can leverage rich contexts beyond user interactions. Existing methods mainly regard multimodal information as an auxiliary, using them to help learn ID features; however, there exist semantic gaps among multimodal content features and ID features, for which directly using multimodal information as an auxiliary would lead to misalignment in representations of users and items. In this paper, we first systematically investigate the misalignment issue in multimodal recommendations, and propose a solution named AlignRec. In AlignRec, the recommendation objective is decomposed into three alignments, namely alignment within contents, alignment between content and categorical ID, and alignment between users and items. Each alignment is characterized by a specific objective function and is integrated into our multimodal recommendation framework. 
To effectively train our AlignRec, we propose starting from pre-training the first alignment to obtain unified multimodal features and subsequently training the following two alignments together with these features as input. As it is essential to analyze whether each multimodal feature helps in training, we design three new classes of metrics to evaluate intermediate performance. Our extensive experiments on three real-world datasets consistently verify the superiority of AlignRec compared to nine baselines. We also find that the multimodal features generated by AlignRec are better than currently used ones, which are to be open-sourced.	翻訳日:2024-03-20 15:41:42 公開日:2024-03-19
# VideoBadminton:バドミントン行動認識のためのビデオデータセット VideoBadminton: A Video Dataset for Badminton Action Recognition ( http://arxiv.org/abs/2403.12385v1 ) ライセンス: Link先を確認	Qi Li, Tzu-Chen Chiu, Hsiang-Wei Huang, Min-Te Sun, Wei-Shinn Ku,	(参考訳) コンピュータビジョンのダイナミックで進化する分野では、特に畳み込みニューラルネットワーク(CNN)、畳み込み3D、トランスフォーマー、空間時間的特徴融合といった高度な方法論の出現によって、アクション認識が重要な焦点となっている。これらの技術は、確立されたベンチマークにおいて有望な結果を示しているが、特にスポーツ分析において、活動の正確な分解と微妙な異なる行動の区別が不可欠である現実の応用において、ユニークな課題に直面している。 UCF101、HMDB51、Kineticsといった既存のデータセットは、さまざまなシナリオのための多様なビデオデータを提供している。しかし、より広範なアクションカテゴリ内の詳細な分類とニュアンスをキャプチャする、きめ細かいビデオデータセットの必要性が高まっている。本稿では,高品質なバドミントン映像から得られたビデオバドミントンデータセットを紹介する。本研究は,特にバドミントンスポーツにおける行動認識の分野での進歩をめざす。 VideoBadmintonの導入は、バドミントンアクション認識だけでなく、きめ細かいアクションを認識するためのデータセットも提供する。これらの評価から得られた知見は、特にスポーツの文脈において、行動理解のさらなる研究を促進することが期待されている。 In the dynamic and evolving field of computer vision, action recognition has become a key focus, especially with the advent of sophisticated methodologies like Convolutional Neural Networks (CNNs), Convolutional 3D, Transformer, and spatial-temporal feature fusion. These technologies have shown promising results on well-established benchmarks but face unique challenges in real-world applications, particularly in sports analysis, where the precise decomposition of activities and the distinction of subtly different actions are crucial. Existing datasets like UCF101, HMDB51, and Kinetics have offered a diverse range of video data for various scenarios. However, there's an increasing need for fine-grained video datasets that capture detailed categorizations and nuances within broader action categories. In this paper, we introduce the VideoBadminton dataset derived from high-quality badminton footage. Through an exhaustive evaluation of leading methodologies on this dataset, this study aims to advance the field of action recognition, particularly in badminton sports. 
The introduction of VideoBadminton could not only serve for badminton action recognition but also provide a dataset for recognizing fine-grained actions. The insights gained from these evaluations are expected to catalyze further research in action comprehension, especially within sports contexts.	翻訳日:2024-03-20 15:41:42 公開日:2024-03-19
# 共同学習に匹敵するパイプライン型バイオメディカルイベント抽出 Pipelined Biomedical Event Extraction Rivaling Joint Learning ( http://arxiv.org/abs/2403.12386v1 ) ライセンス: Link先を確認	Pengchao Wu, Xuefeng Li, Jinghang Gu, Longhua Qian, Guodong Zhou,	(参考訳) バイオメディカルイベント抽出(Biomedical event extraction)は、バイオメディカルテキストからイベントを取得するための情報抽出タスクである。従来のバイオメディカルイベント抽出は通常、トリガー識別、引数ロール認識、最終的なイベント構築を含むパイプライン化されたアプローチを採用する。本稿では,イベントのコンテキストとその参加者に関する意味情報をキャプチャするために,BERT事前学習モデルに基づくn-ary関係抽出手法によってBindingイベントを構築することを提案する。実験の結果,BioNLP共有タスクのGE11およびGE13コーパスにおいて,F1スコアはそれぞれ63.14%,59.40%であった。その結果、Bindingイベントの性能を大幅に向上させることで、パイプライン化されたイベント抽出アプローチの全体的なパフォーマンスが、現在のジョイントラーニング手法に迫る、あるいはそれを上回ることが示される。 Biomedical event extraction is an information extraction task to obtain events from biomedical text, whose targets include the type, the trigger, and the respective arguments involved in an event. Traditional biomedical event extraction usually adopts a pipelined approach, which contains trigger identification, argument role recognition, and finally event construction either using specific rules or by machine learning. In this paper, we propose an n-ary relation extraction method based on the BERT pre-training model to construct Binding events, in order to capture the semantic information about an event's context and its participants. The experimental results show that our method achieves promising results on the GE11 and GE13 corpora of the BioNLP shared task with F1 scores of 63.14% and 59.40%, respectively. It demonstrates that by significantly improving the performance of Binding events, the overall performance of the pipelined event extraction approach approaches or even exceeds that of current joint learning methods.	翻訳日:2024-03-20 15:41:42 公開日:2024-03-19
# 大規模言語モデルを用いた会話システムの解釈可能なユーザ満足度推定 Interpretable User Satisfaction Estimation for Conversational Systems with Large Language Models ( http://arxiv.org/abs/2403.12388v1 ) ライセンス: Link先を確認	Ying-Chun Lin, Jennifer Neville, Jack W. Stokes, Longqi Yang, Tara Safavi, Mengting Wan, Scott Counts, Siddharth Suri, Reid Andersen, Xiaofeng Xu, Deepak Gupta, Sujay Kumar Jauhar, Xia Song, Georg Buscher, Saurabh Tiwary, Brent Hecht, Jaime Teevan,	(参考訳) 正確なユーザ満足度推定(USE)は、会話システムを理解し、評価し、継続的に改善するために重要である。ユーザは、汎用(ChatGPTとBing Copilot)とタスク指向(顧客サービスチャットボット)の会話システムの両方において、多様な会話パターンに対する満足感や不満を表明する。既存のMLモデルやテキスト埋め込みに基づくアプローチは、一般化可能なパターンの抽出に不足しており、解釈が難しい。本研究では,LLMが自然言語音声からユーザ満足度の解釈可能な信号を抽出できることを,埋め込み型アプローチよりも効果的に示す。さらに、ラベル付き例の監視を使用して反復的なプロンプトフレームワークを通じて、LLMをUSE用に調整することもできる。その結果,ユーザ満足度向上のためのSupervised Prompting for User satisfaction Rubrics (SPUR) が得られた。 Accurate and interpretable user satisfaction estimation (USE) is critical for understanding, evaluating, and continuously improving conversational systems. Users express their satisfaction or dissatisfaction with diverse conversational patterns in both general-purpose (ChatGPT and Bing Copilot) and task-oriented (customer service chatbot) conversational systems. Existing approaches based on featurized ML models or text embeddings fall short in extracting generalizable patterns and are hard to interpret. In this work, we show that LLMs can extract interpretable signals of user satisfaction from their natural language utterances more effectively than embedding-based approaches. Moreover, an LLM can be tailored for USE via an iterative prompting framework using supervision from labeled examples. The resulting method, Supervised Prompting for User satisfaction Rubrics (SPUR), not only has higher accuracy but is more interpretable as it scores user satisfaction via learned rubrics with a detailed breakdown.	翻訳日:2024-03-20 15:41:42 公開日:2024-03-19
# 学習誘導型反復局所探索によるミンマックス多旅行セールスマン問題の解法 Learning-guided iterated local search for the minmax multiple traveling salesman problem ( http://arxiv.org/abs/2403.12389v1 ) ライセンス: Link先を確認	Pengfei He, Jin-Kao Hao, Jinhui Xia,	(参考訳) ミンマックス・マルチトラベルセールスマンの問題は、一連のツアーの中で最長のツアーを最小化することである。この問題は、いくつかの現実の応用を定式化するために使用できるため、非常に実践的な関心事である。そこで本研究では,学習駆動型反復局所探索手法を提案する。本手法は,高品質な局所最適解を見つけるために攻撃的局所探索手順と確率論的受容基準を組み合わせ,局所最適トラップを回避するためにマルチアームバンディットアルゴリズムで様々な除去・挿入演算子を選択する。 77のベンチマークインスタンスに対する大規模な実験により,我々のアルゴリズムは,ソリューションの品質と実行時間の観点から優れた結果が得られることが示された。特に、32のインスタンスで既知最良解を更新し、他の35のインスタンスで既知最良解と一致した。さらなる実験は、アルゴリズムの構成要素の理解に光を当てた。 The minmax multiple traveling salesman problem involves minimizing the longest tour among a set of tours. The problem is of great practical interest because it can be used to formulate several real-life applications. To solve this computationally challenging problem, we propose a learning-driven iterated local search approach that combines an aggressive local search procedure with a probabilistic acceptance criterion to find high-quality local optimal solutions and a multi-armed bandit algorithm to select various removal and insertion operators to escape local optimal traps. Extensive experiments on 77 commonly used benchmark instances show that our algorithm achieves excellent results in terms of solution quality and running time. In particular, it achieves 32 new best-known results and matches the best-known results for 35 other instances. Additional experiments shed light on the contribution of the algorithm's components.	翻訳日:2024-03-20 15:41:42 公開日:2024-03-19
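As a rough illustration of the two ingredients named in the abstract above, the following Python sketch computes the minmax objective (the length of the longest tour) and picks a removal or insertion operator with a bandit rule. The abstract does not say which bandit policy the authors use; UCB1 is an assumption here, and all function names (`tour_length`, `minmax_objective`, `ucb1_select`) are hypothetical.

```python
import math

def tour_length(depot, tour, dist):
    """Length of one closed tour: depot -> cities in order -> depot."""
    if not tour:
        return 0.0
    path = [depot] + list(tour) + [depot]
    return sum(dist(a, b) for a, b in zip(path, path[1:]))

def minmax_objective(depot, tours, dist):
    """Minmax mTSP objective: the length of the longest tour in the set."""
    return max(tour_length(depot, t, dist) for t in tours)

def ucb1_select(counts, rewards, c=math.sqrt(2)):
    """Choose an operator index with UCB1 (assumed policy, not from the paper).

    counts[i]  -- how often operator i has been applied so far
    rewards[i] -- cumulative reward collected by operator i
    """
    for i, n in enumerate(counts):
        if n == 0:
            return i  # apply every operator once before comparing scores
    total = sum(counts)
    scores = [
        rewards[i] / counts[i] + c * math.sqrt(math.log(total) / counts[i])
        for i in range(len(counts))
    ]
    return scores.index(max(scores))

# Toy instance: two salesmen leaving from the same depot.
depot = (0.0, 0.0)
tours = [[(3.0, 0.0), (3.0, 4.0)], [(0.0, 1.0)]]
longest = minmax_objective(depot, tours, math.dist)
```

In an iterated local search loop, the reward fed back to the bandit would typically be the improvement in `minmax_objective` after applying the chosen operator; that reward definition is likewise an assumption.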
# FairSTG: 協調的なサンプルレベルの最適化による性能不均一性対策 FairSTG: Countering performance heterogeneity via collaborative sample-level optimization ( http://arxiv.org/abs/2403.12391v1 ) ライセンス: Link先を確認	Gengyu Lin, Zhengyang Zhou, Qihe Huang, Kuo Yang, Shifen Cheng, Yang Wang,	(参考訳) 時空間学習は、スマートシティを支えるモバイルコンピューティング技術において重要な役割を担っている。既存の研究はデータセット全体の正確な予測を達成するために多大な努力をしてきたが、それでもサンプル間の大きなパフォーマンスの不均一性を無視している。本研究では, この性能不均一性を不公平な時空間学習の原因として位置づける。これはモデルの実用的機能を低下させるだけでなく, 現実の都市への適用にも重大なリスクをもたらす。このギャップを解消するために,SpatioTemporal Graph Learning (FairSTG) のためのモデルに依存しないFairness-awareフレームワークを提案する。特に、FairSTGは、モデル初期化のための時空間的特徴抽出器、よく学習されたサンプルと挑戦的なサンプル間の知識伝達のための協調表現強化、サンプルレベルの不均一性を即時抑制するための公正性目的からなる。 4つの時空間データセットの実験により、FairSTGは、同等の予測精度を維持しながら、フェアネス品質を著しく改善することを示した。ケーススタディでは、サンプルレベルの検索と補償により、FairSTGは空間的および時間的パフォーマンスの不均一性を両立させることができることが示され、我々の研究は、表現不足の都市部における時空間資源配分のリスクを軽減できる可能性がある。 Spatiotemporal learning plays a crucial role in mobile computing techniques to empower smart cities. While existing research has made great efforts to achieve accurate predictions on the overall dataset, it still neglects the significant performance heterogeneity across samples. In this work, we designate the performance heterogeneity as the reason for unfair spatiotemporal learning, which not only degrades the practical functions of models, but also brings serious potential risks to real-world urban applications. To close this gap, we propose a model-independent Fairness-aware framework for SpatioTemporal Graph learning (FairSTG), which builds on the idea of transferring the advantages of well-learned samples to challenging ones through collaborative mix-up. Specifically, FairSTG consists of a spatiotemporal feature extractor for model initialization, a collaborative representation enhancement for knowledge transfer between well-learned samples and challenging ones, and fairness objectives for immediately suppressing sample-level performance heterogeneity. 
Experiments on four spatiotemporal datasets demonstrate that our FairSTG significantly improves the fairness quality while maintaining comparable forecasting accuracy. Case studies show FairSTG can counter both spatial and temporal performance heterogeneity by our sample-level retrieval and compensation, and our work can potentially alleviate the risks on spatiotemporal resource allocation for underrepresented urban regions.	翻訳日:2024-03-20 15:41:42 公開日:2024-03-19
# AraPoemBERT:アラビア詩分析のための事前訓練された言語モデル AraPoemBERT: A Pretrained Language Model for Arabic Poetry Analysis ( http://arxiv.org/abs/2403.12392v1 ) ライセンス: Link先を確認	Faisal Qarah,	(参考訳) アラビア語の詩は、その豊かな言語的特徴と文化的意義から、自然言語処理(NLP)分野に固有の課題を呈している。その構造と文脈の複雑さは、正確な解析のために高度な計算モデルを必要とする。本稿では,アラビア詩文にのみ事前訓練されたアラビア語モデルであるAraPoemBERTを紹介する。提案モデルの有効性を示すため,アラビア詩に関連するさまざまなNLP課題に対して,AraPoemBERTと5つの異なるアラビア語モデルを比較した。新しいモデルは、他のすべてのモデルより優れ、ダウンストリームタスクの大部分で最先端の結果を得た。 AraPoemBERTは、3つの新しいタスクのうち、詩人の性別分類(99.34%の精度)と詩のサブメーター分類(97.79%の精度)の2つで前例のない精度を達成した。さらに、このモデルは詩の押韻分類(97.73%の精度)において、この研究で報告された最良のスコアとほぼ等しい精度のスコアを得た。さらに, 提案モデルは, 詩の感情分析(78.95%の精度)と詩のメーター分類(99.03%の精度)のタスクにおいて, 従来の研究や比較モデルを大きく上回り, これら2つの問題の範囲を大きく広げている。本研究で用いたデータセットは、オンラインソースから収集された209万以上の詩句を収録しており、それぞれにメーター、サブメーター、詩人、押韻、トピックといった様々な属性が関連付けられている。その結果、提案モデルがアラビア詩の理解と分析、いくつかのタスクにおける最先端の成果の達成、研究に含まれる以前の作品や他の言語モデルよりも優れていたことを示す。 AraPoemBERT モデルは https://huggingface.co/faisalq で公開されている。 Arabic poetry, with its rich linguistic features and profound cultural significance, presents a unique challenge to the Natural Language Processing (NLP) field. The complexity of its structure and context necessitates advanced computational models for accurate analysis. In this paper, we introduce AraPoemBERT, an Arabic language model pretrained exclusively on Arabic poetry text. To demonstrate the effectiveness of the proposed model, we compared AraPoemBERT with 5 different Arabic language models on various NLP tasks related to Arabic poetry. The new model outperformed all other models and achieved state-of-the-art results in most of the downstream tasks. AraPoemBERT achieved unprecedented accuracy in two out of three novel tasks: poet's gender classification (99.34% accuracy), and poetry sub-meter classification (97.79% accuracy). In addition, the model achieved an accuracy score in poems' rhyme classification (97.73% accuracy) which is almost equivalent to the best score reported in this study. 
Moreover, the proposed model significantly outperformed previous work and other comparative models in the tasks of poems' sentiment analysis, achieving an accuracy of 78.95%, and poetry meter classification (99.03% accuracy), while significantly expanding the scope of these two problems. The dataset used in this study contains more than 2.09 million verses collected from online sources, each associated with various attributes such as meter, sub-meter, poet, rhyme, and topic. The results demonstrate the effectiveness of the proposed model in understanding and analyzing Arabic poetry, achieving state-of-the-art results in several tasks and outperforming previous works and other language models included in the study. The AraPoemBERT model is publicly available at https://huggingface.co/faisalq.	翻訳日:2024-03-20 15:41:42 公開日:2024-03-19
# Dr3: Open Domain Multi-Hop Question Answeringで、大規模言語モデルにオフ・トピックの回答を与えないように求める Dr3: Ask Large Language Models Not to Give Off-Topic Answers in Open Domain Multi-Hop Question Answering ( http://arxiv.org/abs/2403.12393v1 ) ライセンス: Link先を確認	Yuan Gao, Yiheng Zhu, Yuanbin Cao, Yinzhi Zhou, Zhen Wu, Yujie Chen, Shenglan Wu, Haoyuan Hu, Xinyu Dai,	(参考訳) Open Domain Multi-Hop Question Answering (ODMHQA) は、自然言語処理(NLP)において重要な役割を果たす。最近、LLM(Large Language Models)は、計画、推論、ツールの利用といったODMHQAの能力により、ODMHQAの解決において顕著なパフォーマンスを示している。しかし、LDMはODMHQAを解こうとすると、オフトピー的な解を生成する可能性があり、すなわち、生成された解は元の質問とは無関係である。オフトピー回答のこの問題は、その重要性にもかかわらず、不正確な回答のおよそ3分の1を占めている。この問題を軽減するために,Dr3 の離散化->Re-Compose->Re- Solve->Re-Decompose (Dr3) 機構を提案する。具体的には, LLMの本質的な能力を利用して, 生成した回答が話題外かどうかを判定する。オフトピー応答が検出された場合、Correctorは、最終回答がオントピーとなるまで、逆推論チェーン(Re-Compose->Re-Solve->Re-Decompose)に沿ってステップワイズリビジョンを行う。 HotpotQAおよび2WikiMultiHopQAデータセットによる実験結果から,我々のDr3機構はODMHQAにおけるオフトピー応答の発生を約13%削減し,エクササイズマッチ(EM)の性能をDr3機構のないベースライン法と比較して約3%向上することが示された。 Open Domain Multi-Hop Question Answering (ODMHQA) plays a crucial role in Natural Language Processing (NLP) by aiming to answer complex questions through multi-step reasoning over retrieved information from external knowledge sources. Recently, Large Language Models (LLMs) have demonstrated remarkable performance in solving ODMHQA owing to their capabilities including planning, reasoning, and utilizing tools. However, LLMs may generate off-topic answers when attempting to solve ODMHQA, namely the generated answers are irrelevant to the original questions. This issue of off-topic answers accounts for approximately one-third of incorrect answers, yet remains underexplored despite its significance. To alleviate this issue, we propose the Discriminate->Re-Compose->Re- Solve->Re-Decompose (Dr3) mechanism. Specifically, the Discriminator leverages the intrinsic capabilities of LLMs to judge whether the generated answers are off-topic. 
In cases where an off-topic answer is detected, the Corrector performs step-wise revisions along the reversed reasoning chain (Re-Compose->Re-Solve->Re-Decompose) until the final answer becomes on-topic. Experimental results on the HotpotQA and 2WikiMultiHopQA datasets demonstrate that our Dr3 mechanism considerably reduces the occurrence of off-topic answers in ODMHQA by nearly 13%, improving the performance in Exact Match (EM) by nearly 3% compared to the baseline method without the Dr3 mechanism.	翻訳日:2024-03-20 15:31:57 公開日:2024-03-19
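The Exact Match (EM) figure quoted in the Dr3 abstract above can be made concrete with a small Python sketch. The abstract does not spell out the scoring script; the SQuAD-style normalization below (lowercasing, stripping punctuation and English articles, collapsing whitespace) is an assumption, and the helper names are hypothetical.

```python
import re
import string

def normalize_answer(text):
    """SQuAD-style normalization (assumed): lowercase, drop punctuation,
    drop the articles a/an/the, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    """True when prediction and gold answer agree after normalization."""
    return normalize_answer(prediction) == normalize_answer(gold)

def em_score(predictions, golds):
    """Percentage of predictions that exactly match their gold answer."""
    pairs = list(zip(predictions, golds))
    return 100.0 * sum(exact_match(p, g) for p, g in pairs) / len(pairs)
```

Under this metric an off-topic answer always scores zero, which is why reducing off-topic answers can translate directly into EM gains.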
# OV9D:Open-Vocabulary Category-Level 9D Object Poseとサイズ推定 OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation ( http://arxiv.org/abs/2403.12396v1 ) ライセンス: Link先を確認	Junhao Cai, Yisheng He, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, Qifeng Chen,	(参考訳) 本稿では,新しい開集合問題である,オープン語彙のカテゴリレベルのオブジェクトポーズ・サイズ推定について検討する。ロボットエージェントは、任意の新規対象カテゴリの人間によるテキスト記述を与えられ、観察されたシーン画像における対象オブジェクトの位置、向き、サイズを予測する。このような一般化を可能にするために,我々はまずOO3D-9Dという大規模フォトリアリスティックデータセットを導入する。 OmniObject3Dから派生したOO3D-9Dは、カテゴリレベルのオブジェクトのポーズとサイズ推定の分野で、最大かつ最も多様なデータセットである。各カテゴリの対称性軸に対する追加アノテーションが含まれており、対称性の曖昧さを解決するのに役立つ。大規模なデータセットとは別に、そのような一般化を可能にするための別の鍵は、事前学習された視覚言語基盤モデルにおける強力な事前知識を活用することである。次に, 対象インスタンスの正規化オブジェクト座標空間 (NOCS) マップを推定するために, 事前学習した DinoV2 とテキストから画像への安定拡散モデルに基づくフレームワークを提案する。このフレームワークは、DinoV2の視覚的意味の事前知識と、テキストと画像の拡散モデルにおける協調的な視覚的および言語的知識をフル活用し、新しいカテゴリの様々なテキスト記述への一般化を可能にする。大規模合成データに基づいて学習したオープンボキャブラリ法がベースラインを著しく上回り,目に見えないカテゴリの実際の画像に効果的に一般化できることを,包括的定量的・定性的な実験により実証した。プロジェクトページは https://ov9d.github.io にある。 This paper studies a new open-set problem, the open-vocabulary category-level object pose and size estimation. Given human text descriptions of arbitrary novel object categories, the robot agent seeks to predict the position, orientation, and size of the target object in the observed scene image. To enable such generalizability, we first introduce OO3D-9D, a large-scale photorealistic dataset for this task. Derived from OmniObject3D, OO3D-9D is the largest and most diverse dataset in the field of category-level object pose and size estimation. It includes additional annotations for the symmetry axis of each category, which help resolve symmetric ambiguity. Apart from the large-scale dataset, we find another key to enabling such generalizability is leveraging the strong prior knowledge in pre-trained visual-language foundation models. 
We then propose a framework built on pre-trained DinoV2 and text-to-image stable diffusion models to infer the normalized object coordinate space (NOCS) maps of the target instances. This framework fully leverages the visual semantic prior from DinoV2 and the aligned visual and language knowledge within the text-to-image diffusion model, which enables generalization to various text descriptions of novel categories. Comprehensive quantitative and qualitative experiments demonstrate that the proposed open-vocabulary method, trained on our large-scale synthesized data, significantly outperforms the baseline and can effectively generalize to real-world images of unseen categories. The project page is at https://ov9d.github.io.	翻訳日:2024-03-20 15:31:57 公開日:2024-03-19
# ネットワークの選考: コミュニティキャンバスのための動的マルチステップ・アタック Electioneering the Network: Dynamic Multi-Step Adversarial Attacks for Community Canvassing ( http://arxiv.org/abs/2403.12399v1 ) ライセンス: Link先を確認	Saurabh Sharma, Ambuj SIngh,	(参考訳) コミュニティキャンバス化のためのオンラインソーシャルネットワーク操作の問題は、今日の世界で本当に懸念されている。ネットワーク上での投票モデル、意見、偏極のダイナミクスの研究により、GNNに対する勾配に基づく攻撃によって可能となるネットワーク上の動的なプロセスとして、コミュニティキャンバスをモデル化する。既存のGNNに対する攻撃はすべてシングルステップであり、ネットワーク内の情報拡散の動的カスケードの性質を考慮していない。敵がGNNを代理として利用して投票者の選好、特に不確実な有権者を予測・操作する現実的なシナリオを考察する。 GNNに対するグラディエントベースの攻撃は、ターゲットの有権者を勧誘するための戦略的な操作を敵に通知する。特に、minimum budget attacks for community canvassing (MBACC)について調べる。 MBACC問題はNP-Hardであり,その対策として動的マルチステップ・コミュニティ・キャンバスリング(MAC)を提案する。 MACは低予算のヒューリスティックと高い第2次影響力に基づいて動的局所決定を行い、ターゲットの有権者を転向・摂動させる。 MACは動的多段階攻撃であり、効率的なカスケード攻撃が起こるような低予算かつ高影響の標的を発見する。複数の基盤ネットワークとGNNモデルを用いて,MBACC問題に基づく単一ステップベースラインに対するMACの評価を行った。本実験は, 敵コミュニティキャンバスの効率的なマルチホップ攻撃を検出できるMACの優位性を示すものである。コードの実装とデータは https://github.com/saurabhsharma1993/mac で公開されています。 The problem of online social network manipulation for community canvassing is of real concern in today's world. Motivated by the study of voter models, opinion and polarization dynamics on networks, we model community canvassing as a dynamic process over a network enabled via gradient-based attacks on GNNs. Existing attacks on GNNs are all single-step and do not account for the dynamic cascading nature of information diffusion in networks. We consider the realistic scenario where an adversary uses a GNN as a proxy to predict and manipulate voter preferences, especially uncertain voters. Gradient-based attacks on the GNN inform the adversary of strategic manipulations that can be made to proselytize targeted voters. In particular, we explore minimum budget attacks for community canvassing (MBACC). We show that the MBACC problem is NP-Hard and propose Dynamic Multi-Step Adversarial Community Canvassing (MAC) to address it. 
MAC makes dynamic local decisions based on the heuristic of low budget and high second-order influence to convert and perturb target voters. MAC is a dynamic multi-step attack that discovers low-budget and high-influence targets from which efficient cascading attacks can happen. We evaluate MAC against single-step baselines on the MBACC problem with multiple underlying networks and GNN models. Our experiments show the superiority of MAC, which is able to discover efficient multi-hop attacks for adversarial community canvassing. Our code implementation and data are available at https://github.com/saurabhsharma1993/mac.	翻訳日:2024-03-20 15:31:57 公開日:2024-03-19
# BERTにインスパイアされたワイヤレスセンシングにおけるパッケージ損失対策 Finding the Missing Data: A BERT-inspired Approach Against Package Loss in Wireless Sensing ( http://arxiv.org/abs/2403.12400v1 ) ライセンス: Link先を確認	Zijian Zhao, Tingwei Chen, Fanyi Meng, Hang Li, Xiaoyang Li, Guangxu Zhu,	(参考訳) Wi-Fiセンシングのための様々なディープラーニング手法の開発にもかかわらず、パッケージの損失は、学習モデルの性能に悪影響を及ぼすChannel State Information (CSI)の不連続な推定をもたらすことが多い。この課題を克服するために,CSIリカバリのための双方向エンコーダ表現(BERT)に基づく深層学習モデルCSI-BERTを提案する。 CSI-BERTは、追加のデータを必要とせずに、ターゲットデータセット上で自己教師ありの方法でトレーニングすることができる。さらに、一度に1つのサブキャリアにフォーカスする従来の補間方法とは異なり、CSI-BERTは異なるサブキャリア間のシーケンシャルな関係をキャプチャする。実験により,CSI-BERTは損失率が高い場合でも従来の補間法に比べて誤り率が低く,処理も高速であることを示した。さらに、CSI-BERTから得られた回復したCSIを利用して、Residual NetworkやRecurrent Neural Networkのような他のディープラーニングモデルでも、Wi-Fiセンシングタスクの精度を平均で約15%向上させることができる。収集したデータセット WiGesture とモデル用のコードは https://github.com/RS2002/CSI-BERT で公開されている。 Despite the development of various deep learning methods for Wi-Fi sensing, package loss often results in noncontinuous estimation of the Channel State Information (CSI), which negatively impacts the performance of the learning models. To overcome this challenge, we propose a deep learning model based on Bidirectional Encoder Representations from Transformers (BERT) for CSI recovery, named CSI-BERT. CSI-BERT can be trained in a self-supervised manner on the target dataset without the need for additional data. Furthermore, unlike traditional interpolation methods that focus on one subcarrier at a time, CSI-BERT captures the sequential relationships across different subcarriers. Experimental results demonstrate that CSI-BERT achieves lower error rates and faster speed compared to traditional interpolation methods, even when facing high loss rates. Moreover, by harnessing the recovered CSI obtained from CSI-BERT, other deep learning models like Residual Network and Recurrent Neural Network can achieve an average increase in accuracy of approximately 15% in Wi-Fi sensing tasks. 
The collected dataset WiGesture and code for our model are publicly available at https://github.com/RS2002/CSI-BERT.	翻訳日:2024-03-20 15:31:57 公開日:2024-03-19
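For context on the "traditional interpolation methods that focus on one subcarrier at a time" mentioned in the CSI-BERT abstract, here is a minimal Python sketch of such a baseline: linear interpolation along time for a single subcarrier, with lost packets marked as `None`. This is not the authors' code; the function name and the `None` convention are assumptions made for illustration.

```python
def interpolate_missing(series):
    """Fill None gaps in one subcarrier's time series by linear interpolation.

    Gaps at the start or end are filled with the nearest observed value,
    since there is only one neighbour to interpolate from.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("no observed samples to interpolate from")
    filled = list(series)
    # Hold the nearest observed value before the first / after the last sample.
    for i in range(known[0]):
        filled[i] = series[known[0]]
    for i in range(known[-1] + 1, len(series)):
        filled[i] = series[known[-1]]
    # Linear interpolation between each pair of neighbouring observations.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            w = (i - left) / span
            filled[i] = (1 - w) * series[left] + w * series[right]
    return filled
```

Because each subcarrier is processed independently, a baseline like this cannot use the cross-subcarrier structure that the abstract says CSI-BERT captures.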
# VQ-NeRV:ビデオのためのベクトル量子化ニューラル表現 VQ-NeRV: A Vector Quantized Neural Representation for Videos ( http://arxiv.org/abs/2403.12401v1 ) ライセンス: Link先を確認	Yunjie Xu, Xiang Feng, Feiwei Qin, Ruiquan Ge, Yong Peng, Changmiao Wang,	(参考訳) Implicit Neural representations (INR)は、ニューラルネットワーク内のビデオのエンコーディングに優れ、ビデオ圧縮やデノイングといったコンピュータビジョンタスクにおける約束を示す。 INRベースのアプローチは、コンテンツ非依存の埋め込みから映像フレームを再構成するため、映像フレームの回帰性能が損なわれ、映像補間における一般化能力も制限される。これらの欠陥に対処するため、Hybrid Neural Representation for Videos (HNeRV) がコンテンツ適応型埋め込みと共に導入された。それでも、HNeRVの圧縮比は比較的低いままであり、これはネットワークの浅い特徴とフレーム間の残差情報の活用を見落としていることに起因する。本稿では,Vector Quantized-NeRV (VQ-NeRV) という,新しいコンポーネントであるVQ-NeRVブロックを統合する,高度なU字型アーキテクチャを提案する。このブロックには、ネットワークの浅い残差特徴とフレーム間の残差情報を効果的に離散化するコードブック機構が組み込まれている。このアプローチはビデオ圧縮において特に有利であり、量子化された特徴に比べてサイズが小さくなる。さらに,独自のコードブック最適化手法である浅層コードブック最適化を導入し,コードブックの有用性と効率性を向上する。実験により、VQ-NeRVはビデオレグレッションタスクにおいてHNeRVより優れており、(Peak Signal-to-Noise Ratio (PSNR)における1-2dBの増加とともに)より優れた再構成品質を実現し、ピクセル当たりのビット効率(bpp)が向上し、ビデオインパインティング結果が改善された。 Implicit neural representations (INR) excel in encoding videos within neural networks, showcasing promise in computer vision tasks like video compression and denoising. INR-based approaches reconstruct video frames from content-agnostic embeddings, which hampers their efficacy in video frame regression and restricts their generalization ability for video interpolation. To address these deficiencies, Hybrid Neural Representation for Videos (HNeRV) was introduced with content-adaptive embeddings. Nevertheless, HNeRV's compression ratios remain relatively low, attributable to an oversight in leveraging the network's shallow features and inter-frame residual information. In this work, we introduce an advanced U-shaped architecture, Vector Quantized-NeRV (VQ-NeRV), which integrates a novel component, the VQ-NeRV Block. This block incorporates a codebook mechanism to discretize the network's shallow residual features and inter-frame residual information effectively. 
This approach proves particularly advantageous in video compression, as it results in smaller size compared to quantized features. Furthermore, we introduce an original codebook optimization technique, termed shallow codebook optimization, designed to refine the utility and efficiency of the codebook. The experimental evaluations indicate that VQ-NeRV outperforms HNeRV on video regression tasks, delivering superior reconstruction quality (with an increase of 1-2 dB in Peak Signal-to-Noise Ratio (PSNR)), better bit per pixel (bpp) efficiency, and improved video inpainting outcomes.	翻訳日:2024-03-20 15:31:57 公開日:2024-03-19
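Two quantities in the VQ-NeRV abstract lend themselves to a short Python sketch: the codebook lookup at the heart of vector quantization, and the PSNR metric used to report the 1-2 dB gain. The code below is a generic illustration under stated assumptions (flat lists of floats, pixel values scaled to [0, 1], squared Euclidean distance for the lookup); it is not the VQ-NeRV Block itself, and the function names are hypothetical.

```python
import math

def quantize(vector, codebook):
    """Return (index, code) of the codebook entry nearest to `vector`
    under squared Euclidean distance, i.e. a plain VQ lookup."""
    def sqdist(code):
        return sum((x - c) ** 2 for x, c in zip(vector, code))
    index = min(range(len(codebook)), key=lambda k: sqdist(codebook[k]))
    return index, codebook[index]

def psnr(reference, reconstruction, max_value=1.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized signals."""
    if not reference or len(reference) != len(reconstruction):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Storing only the integer `index` instead of the full feature vector is what makes a codebook attractive for compression. Since PSNR is logarithmic in the mean squared error, a gain of 1 dB corresponds to roughly 20% lower MSE.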
# プロンプト調音音声合成のための音声モデルに関する実証的研究 An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis ( http://arxiv.org/abs/2403.12402v1 ) ライセンス: Link先を確認	Yifan Peng, Ilia Kulikov, Yilin Yang, Sravya Popuri, Hui Lu, Changhan Wang, Hongyu Gong,	(参考訳) 音声言語モデル(LM)は、文脈内学習を通じて高品質な音声合成を実現することを約束している。典型的な音声LMは、個別の意味単位を内容として、短い発話をプロンプトとして取り、内容の意味を保存しながらプロンプトのスタイルを模倣する音声を合成する。しかし、合成音声がどのようにプロンプトと内容によって制御されるかは体系的に理解されていない。本研究では、広く使われている自己回帰(AR)および非自己回帰(NAR)音声LMの実証的研究を行い、迅速な設計とコンテンツセマンティックユニットに関する洞察を提供する。分析の結果,不均質かつ非定常なプロンプトは,より長いプロンプトが常により良い合成につながるという従来の発見とは対照的に,音質を損なうことが明らかとなった。さらに、合成音声の話者スタイルも、プロンプトに加えてコンテンツに影響されていることがわかった。さらに、セマンティックユニットは、コンテンツから合成音声に漏れる可能性のあるピッチ、テンポ、ボリューム、音声強調などの豊富な音響情報を持っていることを示す。 Speech language models (LMs) are promising for high-quality speech synthesis through in-context learning. A typical speech LM takes discrete semantic units as content and a short utterance as prompt, and synthesizes speech which preserves the content's semantics but mimics the prompt's style. However, there is no systematic understanding on how the synthesized audio is controlled by the prompt and content. In this work, we conduct an empirical study of the widely used autoregressive (AR) and non-autoregressive (NAR) speech LMs and provide insights into the prompt design and content semantic units. Our analysis reveals that heterogeneous and nonstationary prompts hurt the audio quality in contrast to the previous finding that longer prompts always lead to better synthesis. Moreover, we find that the speaker style of the synthesized audio is also affected by the content in addition to the prompt. We further show that semantic units carry rich acoustic information such as pitch, tempo, volume and speech emphasis, which might be leaked from the content to the synthesized audio.	翻訳日:2024-03-20 15:31:57 公開日:2024-03-19
# 大規模言語モデル抽出によるヘイト音声認識の解釈に向けて Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales ( http://arxiv.org/abs/2403.12403v1 ) ライセンス: Link先を確認	Ayushi Nirmal, Amrita Bhattacharjee, Paras Sheth, Huan Liu,	(参考訳) ソーシャルメディアプラットフォームは、ユーザーが対人的な議論や意見を表明するための重要な場であるが、ソーシャルメディアが提供するファサードと匿名性によって、ヘイトスピーチや不快なコンテンツを発信することができる。このようなプラットフォームの大規模化を考えると、ヘイトスピーチのインスタンスを自動的に識別し、フラグを付ける必要がある。いくつかのヘイトスピーチ検出法が存在するが、これらのブラックボックス法のほとんどは、設計によって解釈可能あるいは説明可能ではない。解釈可能性の欠如に対処するため,本稿では,言語モデル(LLM)を用いて,入力テキストから有理形の特徴を抽出し,基本ヘイトスピーチ分類器を訓練し,設計による忠実な解釈を可能にすることを提案する。我々のフレームワークは,LLMのテキスト理解能力と最先端のヘイトスピーチ分類器の識別能力とを効果的に組み合わせて,これらの分類器を忠実に解釈できるようにする。各種ソーシャルメディアヘイトスピーチデータセットの総合評価では,(1)LLM抽出された有理性の良さ,(2)解釈可能性を確保するためのトレーニング後においても,検出性能の驚くほどの維持が示されている。 Although social media platforms are a prominent arena for users to engage in interpersonal discussions and express opinions, the facade and anonymity offered by social media may allow users to spew hate speech and offensive content. Given the massive scale of such platforms, there arises a need to automatically identify and flag instances of hate speech. Although several hate speech detection methods exist, most of these black-box methods are not interpretable or explainable by design. To address the lack of interpretability, in this paper, we propose to use state-of-the-art Large Language Models (LLMs) to extract features in the form of rationales from the input text, to train a base hate speech classifier, thereby enabling faithful interpretability by design. Our framework effectively combines the textual understanding capabilities of LLMs and the discriminative power of state-of-the-art hate speech classifiers to make these classifiers faithfully interpretable. Our comprehensive evaluation on a variety of social media hate speech datasets demonstrate: (1) the goodness of the LLM-extracted rationales, and (2) the surprising retention of detector performance even after training to ensure interpretability.	
翻訳日:2024-03-20 15:31:57 公開日:2024-03-19
# 無訓練拡散誘導の理解--メカニズムと限界 Understanding Training-free Diffusion Guidance: Mechanisms and Limitations ( http://arxiv.org/abs/2403.12404v1 ) ライセンス: Link先を確認	Yifei Shen, Xinyang Jiang, Yezhen Wang, Yifan Yang, Dongqi Han, Dongsheng Li,	(参考訳) 事前訓練された拡散モデルにさらなる制御を加えることが、コンピュータビジョン、強化学習、科学のためのAIなど、ますます人気のある研究領域となっている。近年,クリーンな画像に事前学習したオフ・ザ・シェルフネットワークを用いて,トレーニングフリーな拡散誘導法を提案する研究がいくつかある。このアプローチは、拡散誘導の無料ランチを提供するように見えるユニバーサル制御フォーマットのゼロショット条件生成を可能にする。本稿では,トレーニングフリーガイダンスの運用メカニズムと基本的制約について,より深く理解することを目的としている。我々は,学習自由指導を最適化の観点から支援する理論解析を行い,それを分類者に基づく(または分類者なし)指導と区別する。それらの欠点を解明するために, 学習自由法は, 対角勾配の影響を受けやすく, 分類器指導と比較して収束速度が遅いことを理論的に証明した。次に,その限界を克服するために,理論的理論的根拠と実証的証拠を伴って,一連の手法を導入する。画像と動きの生成実験により,これらの手法の有効性が確認された。 Adding additional control to pretrained diffusion models has become an increasingly popular research area, with extensive applications in computer vision, reinforcement learning, and AI for science. Recently, several studies have proposed training-free diffusion guidance by using off-the-shelf networks pretrained on clean images. This approach enables zero-shot conditional generation for universal control formats, which appears to offer a free lunch in diffusion guidance. In this paper, we aim to develop a deeper understanding of the operational mechanisms and fundamental limitations of training-free guidance. We offer a theoretical analysis that supports training-free guidance from the perspective of optimization, distinguishing it from classifier-based (or classifier-free) guidance. To elucidate their drawbacks, we theoretically demonstrate that training-free methods are more susceptible to adversarial gradients and exhibit slower convergence rates compared to classifier guidance. We then introduce a collection of techniques designed to overcome the limitations, accompanied by theoretical rationale and empirical evidence. Our experiments in image and motion generation confirm the efficacy of these techniques.	翻訳日:2024-03-20 15:31:57 公開日:2024-03-19
# 量子センサのための実用的超低周波ノイズレーザーシステム Practical ultra-low frequency noise laser system for quantum sensors ( http://arxiv.org/abs/2403.12405v1 ) ライセンス: Link先を確認	Shiyu Xue, Mingyong Jing, Hao Zhang, Linjie Zhang, Liantuan Xiao, Suotang Jia,	(参考訳) レーザーの周波数ノイズは、量子センサーの感度に不可欠である。レーザの周波数ノイズを抑制するための2つの一般的な方法は、ロックイン法でレーザーを原子遷移にロックするか、PDH法で超低熱膨張(ULE)ガラス空洞にロックすることである。前者は急速に変化する周波数ノイズを抑えることができず、ニーズを満たすことは困難である。高性能で低コストなレーザーノイズ抑圧法が欠如しているため、量子センサーの実用化は劇的に制限された。この研究は、Rydberg原子超ヘテロダイン受信機のような多くの量子センシングアプリケーションにおいて、レーザーを原子遷移と低コスト(LC)キャビティの両方にロックすることで、ULEキャビティへのロックと同じ性能が得られることを示した。この研究は量子センサーの実用化を促進する上で重要である。 The laser's frequency noise is crucial to the sensitivity of quantum sensors. Two commonly used methods to suppress the laser's frequency noise are locking the laser to an atomic transition by the lock-in technique or to an ultra-low thermal expansion (ULE) glass cavity by the PDH technique. The former cannot suppress rapidly changing frequency noise and hardly meets the needs; the latter has powerful performance but a heightened cost. The lack of high-performance and low-cost laser noise suppression methods dramatically limits the practical application of quantum sensors. This work demonstrates that, in many quantum sensing applications such as the Rydberg atomic superheterodyne receiver, by cascade locking the laser to both the atomic transition and a low-cost (LC) cavity, the same performance as locking to the ULE cavity can be achieved. This work is significant in promoting the practical application of quantum sensors.	翻訳日:2024-03-20 15:31:57 公開日:2024-03-19
# 経験的文脈とブラウン運動によるバドミントン選手の行動のオフライン模倣 Offline Imitation of Badminton Player Behavior via Experiential Contexts and Brownian Motion ( http://arxiv.org/abs/2403.12406v1 ) ライセンス: Link先を確認	Kuang-Da Wang, Wei-Yao Wang, Ping-Chun Hsieh, Wen-Chih Peng,	(参考訳) ターンベーススポーツの動的かつ迅速な戦術的関与において、バドミントンはプレイヤーの交代依存的な意思決定を必要とする本質的なパラダイムとして際立っている。連続的な意思決定におけるオフラインの専門家データからの学習の進歩は、様々な領域で見られてきたが、オフラインバドミントンの試合から人間のプレイヤーの行動を適切に模倣する方法は、まだ探索されていない。相手の行動の再現は、試合前に戦略的な開発を行うことでプレイヤーに利益をもたらす。しかし、既存の手法を直接適用することは、代わりにアクションを取るプレイヤーのターンベースの性質によって、マッチの固有の階層と複合効果に悩まされる。本稿では,バドミントン奏者行動のための新しい階層型オフライン模倣学習モデルであるRallyNetを提案する。 (i)RallyNetは、意思決定プロセスを文脈的マルコフ決定プロセスとしてモデル化することにより、プレイヤーの意思決定依存性をキャプチャする。 (ii) RallyNetは、エージェントのアライメントにおける意図としてコンテキストを生成するために、経験を活用します。 3)より現実的な行動を生成するため,RallyNetは幾何学的ブラウン運動(GBM)を活用してプレイヤー間の相互作用をモデル化する。このように、RallyNetはプレイヤーの意図をGBMとのインタラクションモデルと結びつけ、スポーツ分析のためのインタラクションの理解を提供する。我々はRallyNetを、男性と女性のシングルで構成された世界最大規模のバドミントンデータセットで広く検証し、プレイヤーの振る舞いを模倣する能力を実証した。その結果、RallyNetはオフラインの模倣学習法や最先端のターンベースアプローチよりも優れており、ルールベースのエージェント正規化スコアの平均で少なくとも16%上回っていることが明らかとなった。さらに、RallyNetの適用性を強調するために、さまざまなユースケースについて論じる。 In the dynamic and rapid tactic involvements of turn-based sports, badminton stands out as an intrinsic paradigm that requires alter-dependent decision-making of players. While the advancement of learning from offline expert data in sequential decision-making has been witnessed in various domains, how to rally-wise imitate the behaviors of human players from offline badminton matches has remained underexplored. Replicating opponents' behavior benefits players by allowing them to undergo strategic development with direction before matches. However, directly applying existing methods suffers from the inherent hierarchy of the match and the compounding effect due to the turn-based nature of players alternatively taking actions. 
In this paper, we propose RallyNet, a novel hierarchical offline imitation learning model for badminton player behaviors: (i) RallyNet captures players' decision dependencies by modeling decision-making processes as a contextual Markov decision process. (ii) RallyNet leverages the experience to generate context as the agent's intent in the rally. (iii) To generate more realistic behavior, RallyNet leverages Geometric Brownian Motion (GBM) to model the interactions between players by introducing a valuable inductive bias for learning player behaviors. In this manner, RallyNet links player intents with interaction models with GBM, providing an understanding of interactions for sports analytics. We extensively validate RallyNet with the largest available real-world badminton dataset consisting of men's and women's singles, demonstrating its ability to imitate player behaviors. Results reveal RallyNet's superiority over offline imitation learning methods and state-of-the-art turn-based approaches, outperforming them by at least 16% in mean rule-based agent normalization score. Furthermore, we discuss various practical use cases to highlight RallyNet's applicability.	翻訳日:2024-03-20 15:31:57 公開日:2024-03-19
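The abstract above names Geometric Brownian Motion (GBM) as the inductive bias for player interactions but does not say how RallyNet parameterizes it. As a minimal sketch of GBM itself only, the following simulates one path with the exact log-normal update; the function name, the drift `mu`, the volatility `sigma`, and the step size `dt` are illustrative assumptions, not values from the paper.

```python
import math
import random


def simulate_gbm(s0, mu, sigma, dt, steps, seed=0):
    """Simulate one Geometric Brownian Motion path (hypothetical helper).

    Uses the exact update S_{t+dt} = S_t * exp((mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z)
    with Z drawn from a standard normal distribution.
    """
    rng = random.Random(seed)
    path = [s0]
    for _ in range(steps):
        z = rng.gauss(0.0, 1.0)
        growth = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
        path.append(path[-1] * math.exp(growth))
    return path


if __name__ == "__main__":
    # Illustrative parameters only; the paper does not report these.
    print(simulate_gbm(s0=1.0, mu=0.05, sigma=0.3, dt=0.01, steps=5))
```

Because each step multiplies by an exponential, a path that starts positive stays positive, and with `sigma = 0` it reduces to deterministic exponential growth.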
# 多言語プロンプトトランスレータによる自然言語推論のための言語間転移 Cross-Lingual Transfer for Natural Language Inference via Multilingual Prompt Translator ( http://arxiv.org/abs/2403.12407v1 ) ライセンス: Link先を確認	Xiaoyu Qiu, Yuechen Wang, Jiaxin Shi, Wengang Zhou, Houqiang Li,	(参考訳) 多言語事前学習モデルに基づくプロンプト学習による言語間転移は有望な有効性を示しており、ソース言語で学習したソフトプロンプトを、特に低リソースのシナリオにおいて、下流タスクのためにターゲット言語へ転移する。ソフトプロンプトを効率的に転移するために、多言語プロンプトトランスレータ(MPT)という新しいフレームワークを提案する。このフレームワークでは多言語プロンプトトランスレータを導入し、タスク知識を保持しつつ言語知識を変換することで、プロンプトに埋め込まれた重要な知識を適切に処理する。具体的には、まずソース言語でプロンプトを学習し、トランスレータを用いてターゲットプロンプトに変換する。さらに、外部コーパスを補助データとして用い、その上で予測解答確率のアライメントタスクを設計して言語知識を変換し、ターゲットプロンプトに多言語知識を付与する。XNLIの少数ショット設定において、MPTはベースラインを大きく上回る。MPTの優位性は、ソース言語と大きく異なる言語へ転移する場合に、通常のプロンプト手法と比べてより顕著である。 Based on multilingual pre-trained models, cross-lingual transfer with prompt learning has shown promising effectiveness, where soft prompt learned in a source language is transferred to target languages for downstream tasks, particularly in the low-resource scenario. To efficiently transfer soft prompt, we propose a novel framework, Multilingual Prompt Translator (MPT), where a multilingual prompt translator is introduced to properly process crucial knowledge embedded in prompt by changing language knowledge while retaining task knowledge. Concretely, we first train prompt in source language and employ translator to translate it into target prompt. Besides, we extend an external corpus as auxiliary data, on which an alignment task for predicted answer probability is designed to convert language knowledge, thereby equipping target prompt with multilingual knowledge. In few-shot settings on XNLI, MPT demonstrates superiority over baselines by remarkable improvements. MPT is more prominent compared with vanilla prompting when transferring to languages quite distinct from source language.	翻訳日:2024-03-20 15:31:57 公開日:2024-03-19
# MSLM-S2ST: 話者スタイルを保持するテキストレス音声間翻訳のためのマルチタスク音声言語モデル MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation ( http://arxiv.org/abs/2403.12408v1 ) ライセンス: Link先を確認	Yifan Peng, Ilia Kulikov, Yilin Yang, Sravya Popuri, Hui Lu, Changhan Wang, Hongyu Gong,	(参考訳) ある言語の発話を別の言語に翻訳する音声間翻訳(S2ST)に対して、研究上の関心が高まり、進展が見られている。本研究は、マルチタスク設定で訓練されたデコーダのみの音声言語モデルであるマルチタスク音声言語モデル(MSLM)を提案する。テキストの学習データに頼ることなく、話者スタイルを保持した多言語S2STをサポートできる。 There have been emerging research interest and advances in speech-to-speech translation (S2ST), translating utterances from one language to another. This work proposes Multitask Speech Language Model (MSLM), which is a decoder-only speech language model trained in a multitask setting. Without reliance on text training data, our model is able to support multilingual S2ST with speaker style preserved.	翻訳日:2024-03-20 15:31:57 公開日:2024-03-19
# ComboVerse:空間認識拡散誘導を用いた合成3Dアセットの作成 ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance ( http://arxiv.org/abs/2403.12409v1 ) ライセンス: Link先を確認	Yongwei Chen, Tengfei Wang, Tong Wu, Xingang Pan, Kui Jia, Ziwei Liu,	(参考訳) 与えられた画像から高品質な3Dアセットを生成することは、AR/VRのような様々なアプリケーションで非常に望ましい。単一画像の3D生成の最近の進歩は、最適化せずに物体の3Dモデルを推測するフィードフォワードモデルを探っている。単一のオブジェクト生成において有望な結果が得られたが、これらの手法は本質的に複数のオブジェクトを含む複雑な3Dアセットのモデル化に苦慮することが多い。本稿では,複数のモデルを組み合わせて学習することで,複雑な構成で高品質な3Dアセットを生成する3D生成フレームワークであるComboVerseを紹介する。 1) この「マルチオブジェクトギャップ」をモデルとデータの両方の観点から詳細に分析する。次に, 異なる物体の3次元モデルを再構成し, その大きさ, 回転角, 位置を調整し, 与えられた画像に一致する3次元アセットを作成する。 3) このプロセスを自動化するために, 既訓練拡散モデルから空間認識型スコア蒸留サンプリング(SSDS)を適用し, 物体の位置を導出する。提案手法は,標準スコア蒸留法と比較して,物体の空間的アライメントを重視し,より正確な結果が得られる。 ComboVerseの大規模な実験は、既存の3Dアセットの生成方法よりも明らかに改善されている。 Generating high-quality 3D assets from a given image is highly desirable in various applications such as AR/VR. Recent advances in single-image 3D generation explore feed-forward models that learn to infer the 3D model of an object without optimization. Though promising results have been achieved in single object generation, these methods often struggle to model complex 3D assets that inherently contain multiple objects. In this work, we present ComboVerse, a 3D generation framework that produces high-quality 3D assets with complex compositions by learning to combine multiple models. 1) We first perform an in-depth analysis of this ``multi-object gap'' from both model and data perspectives. 2) Next, with reconstructed 3D models of different objects, we seek to adjust their sizes, rotation angles, and locations to create a 3D asset that matches the given image. 3) To automate this process, we apply spatially-aware score distillation sampling (SSDS) from pretrained diffusion models to guide the positioning of objects. Our proposed framework emphasizes spatial alignment of objects, compared with standard score distillation sampling, and thus achieves more accurate results. 
Extensive experiments validate ComboVerse achieves clear improvements over existing methods in generating compositional 3D assets.	翻訳日:2024-03-20 15:31:57 公開日:2024-03-19
# 命令からのサードパーティによる言語モデルの性能予測 Third-Party Language Model Performance Prediction from Instruction ( http://arxiv.org/abs/2403.12413v1 ) ライセンス: Link先を確認	Rahul Nadkarni, Yizhong Wang, Noah A. Smith,	(参考訳) 言語モデルに基づく命令追従システムは、近年、多くのベンチマークタスクで性能を向上させており、幅広い命令に適応できることを示している。しかし、そのようなシステムは自らの限界について透明であるようには設計されていないことが多い。ユーザーは、応答が正確であると期待できるのか、そもそもシステムがそのタスクを実行できるのかを知らないまま、容易に命令をモデルに与えてしまう。本稿では、サードパーティによる性能予測フレームワークを提案する。このフレームワークでは、推論時に入力と出力にのみアクセスできると仮定し、命令追従システムをあるタスクで評価して得られる指標を予測するように、別のモデルを訓練する。この分析を、オープンおよびクローズドのさまざまな命令追従モデルと複数の性能予測器を用いて行い、モデルサイズ、訓練タスク数、プロンプト形式などの要因の影響を調べる。その結果、サードパーティによる性能予測は非常に困難であり、現代の命令追従型自然言語処理システムの限界を自動的に明らかにできる予測器の開発には、多くの課題が残されていることが示された。 Language model-based instruction-following systems have lately shown increasing performance on many benchmark tasks, demonstrating the capability of adapting to a broad variety of instructions. However, such systems are often not designed to be transparent about their limitations; a user may easily prompt a model with an instruction without any idea of whether the responses should be expected to be accurate, or if the system is even capable of performing the task. We propose a third party performance prediction framework, where a separate model is trained to predict the metric resulting from evaluating an instruction-following system on a task while assuming access only to its inputs and outputs at inference time. We perform this analysis with a variety of both open and closed instruction-following models as well as multiple performance predictors, and examine the effect of various factors such as model size, number of training tasks, and prompt format. Our findings indicate that third-party performance prediction is very challenging, and much work remains in developing predictors that can automatically reveal the limitations of modern instruction-following natural language processing systems.	翻訳日:2024-03-20 15:31:57 公開日:2024-03-19
# VisionGPT: 安全な視覚ナビゲーションのためのLLM支援リアルタイム異常検出 VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation ( http://arxiv.org/abs/2403.12415v1 ) ライセンス: Link先を確認	Hao Wang, Jiayou Qin, Ashish Bastola, Xiwen Chen, John Suchanek, Zihao Gong, Abolfazl Razi,	(参考訳) 本稿では、安全な視覚ナビゲーションのためのゼロショット異常検出における大規模言語モデル(LLM)の可能性を検討する。最先端のリアルタイム・オープンワールド物体検出モデルYolo-Worldと専用のプロンプトを用いることで、提案フレームワークは、あらゆる障害物を含むカメラ撮影フレーム内の異常を識別し、異常を強調した簡潔な説明を音声で提供し、複雑な状況下での安全な視覚ナビゲーションを支援する。さらに、LLMとオープン語彙物体検出モデルの利点を活用して動的なシナリオ切り替えを実現し、ユーザーがシーンからシーンへスムーズに移行できるようにすることで、従来の視覚ナビゲーションの限界に対処する。加えて、異なるプロンプト構成要素が性能に与える寄与を調べ、視覚的アクセシビリティの今後の改善に向けた展望を示し、ビデオ異常検出と視覚言語理解におけるLLM活用への道を開く。 This paper explores the potential of Large Language Models(LLMs) in zero-shot anomaly detection for safe visual navigation. With the assistance of the state-of-the-art real-time open-world object detection model Yolo-World and specialized prompts, the proposed framework can identify anomalies within camera-captured frames that include any possible obstacles, then generate concise, audio-delivered descriptions emphasizing abnormalities, assist in safe visual navigation in complex circumstances. Moreover, our proposed framework leverages the advantages of LLMs and the open-vocabulary object detection model to achieve the dynamic scenario switch, which allows users to transition smoothly from scene to scene, which addresses the limitation of traditional visual navigation. Furthermore, this paper explored the performance contribution of different prompt components, provided the vision for future improvement in visual accessibility, and paved the way for LLMs in video anomaly detection and vision-language understanding.	翻訳日:2024-03-20 15:31:57 公開日:2024-03-19
# アイ・ゲイズガイドによる放射線学用マルチモーダルアライメントフレームワーク Eye-gaze Guided Multi-modal Alignment Framework for Radiology ( http://arxiv.org/abs/2403.12416v1 ) ライセンス: Link先を確認	Chong Ma, Hanqi Jiang, Wenting Chen, Zihao Wu, Xiaowei Yu, Fang Zeng, Lei Guo, Dajiang Zhu, Tuo Zhang, Dinggang Shen, Tianming Liu, Xiang Li,	(参考訳) マルチモーダルフレームワークでは、クロスモーダル機能のアライメントが大きな課題となる。マルチモーダル事前学習における主要なアプローチは、広範囲なデータセットを利用して、モダリティ間のグローバルまたはローカルなアライメントを強調している。このボトムアップ駆動法は、しばしばラジオロジーにおいて重要な関心事である解釈可能性の欠如に悩まされる。これまでの研究では、医療画像やテキストにハイレベルなラベルが組み込まれていたが、それでも手作業によるアノテーションに依存している。本研究は,放射線医が診断評価中に同期的に収集した眼球運動データを用いた新しいアプローチを提案する。このデータは、放射線医の焦点領域を示すもので、胸部X線と診断用テキストを自然に関連付けている。画像とテキストの特徴の整合性を改善するためにアイ・ゲイズ・ガイドド・マルチモーダル・アライメント(EGMA)フレームワークを提案し,手動アノテーションへの依存を減らし,トレーニングコストを削減することを目的とした。我々のモデルは、ゼロショット分類および検索タスクにおいて、他の最先端手法よりも優れたロバストな性能を示す。定期的な放射線診断における目視データの導入は、手動のアノテーション依存を最小化するための一歩である。さらに、様々な眼球運動データがモデル性能に与える影響について検討し、これらの補助データをマルチモーダル事前学習に組み込む可能性と有用性を強調した。 In multi-modal frameworks, the alignment of cross-modal features presents a significant challenge. The predominant approach in multi-modal pre-training emphasizes either global or local alignment between modalities, utilizing extensive datasets. This bottom-up driven method often suffers from a lack of interpretability, a critical concern in radiology. Previous studies have integrated high-level labels in medical images or text, but these still rely on manual annotation, a costly and labor-intensive process. Our work introduces a novel approach by using eye-gaze data, collected synchronously by radiologists during diagnostic evaluations. This data, indicating radiologists' focus areas, naturally links chest X-rays to diagnostic texts. We propose the Eye-gaze Guided Multi-modal Alignment (EGMA) framework to harness eye-gaze data for better alignment of image and text features, aiming to reduce reliance on manual annotations and thus cut training costs. 
Our model demonstrates robust performance, outperforming other state-of-the-art methods in zero-shot classification and retrieval tasks. The incorporation of easily-obtained eye-gaze data during routine radiological diagnoses signifies a step towards minimizing manual annotation dependency. Additionally, we explore the impact of varying amounts of eye-gaze data on model performance, highlighting the feasibility and utility of integrating this auxiliary data into multi-modal pre-training.	翻訳日:2024-03-20 15:22:07 公開日:2024-03-19
# 能動推論における予測計画と対実学習について On Predictive planning and counterfactual learning in active inference ( http://arxiv.org/abs/2403.12417v1 ) ライセンス: Link先を確認	Aswin Paul, Takuya Isomura, Adeel Razi,	(参考訳) 人工知能の急速な進歩を考えると、知的行動の基礎を理解することがますます重要である。行動の一般的な理論と見なされる活発な推論は、計画と意思決定の高度化の基盤を探索するための原則的なアプローチを提供する。本稿では,「計画」と「経験から学ぶ」に基づく,アクティブ推論における2つの意思決定手法について検討する。さらに、これらの戦略間のデータ複雑度トレードオフをナビゲートする混合モデルを導入し、両者の強みを活用して、バランスの取れた意思決定を容易にする。提案手法を,エージェントの適応性を必要とするグリッドワールドシナリオで評価する。さらに、我々のモデルは、様々なパラメータの進化を分析する機会を提供し、貴重な洞察を提供し、インテリジェントな意思決定のための説明可能なフレームワークに貢献する。 Given the rapid advancement of artificial intelligence, understanding the foundations of intelligent behaviour is increasingly important. Active inference, regarded as a general theory of behaviour, offers a principled approach to probing the basis of sophistication in planning and decision-making. In this paper, we examine two decision-making schemes in active inference based on 'planning' and 'learning from experience'. Furthermore, we also introduce a mixed model that navigates the data-complexity trade-off between these strategies, leveraging the strengths of both to facilitate balanced decision-making. We evaluate our proposed model in a challenging grid-world scenario that requires adaptability from the agent. Additionally, our model provides the opportunity to analyze the evolution of various parameters, offering valuable insights and contributing to an explainable framework for intelligent decision-making.	翻訳日:2024-03-20 15:22:07 公開日:2024-03-19
# STG-Mamba:選択状態空間モデルによる時空間グラフ学習 STG-Mamba: Spatial-Temporal Graph Learning via Selective State Space Model ( http://arxiv.org/abs/2403.12418v1 ) ライセンス: Link先を確認	Lincan Li, Hanchen Wang, Wenjie Zhang, Adelle Coster,	(参考訳) 時空間グラフ(STG)データは動的,異種,非定常的に特徴付けられ,空間時空間グラフ学習の継続的な課題に繋がる。近年,STGネットワークのノード間の関係を模倣することにのみ焦点をあて,STGシステムに存在する固有の特徴をモデル化することの重要性を無視して,様々なGNNベースの手法が提案されている。対照的に、現代の選択的状態空間モデル(SSSM)は、STGネットワークをシステムとして扱う新しいアプローチを示し、時間次元にわたってSTGシステムの動的状態進化を慎重に探求する。本研究では,STGネットワークをシステムとして扱うことにより,STG学習のための強力な選択的状態空間モデルを活用するための最初の探索として空間空間グラフマンバ(STG-Mamba)を導入し,グラフ選択的状態空間ブロック(GS3B)を用いてSTGネットワークの動的進化を正確に評価する。 STG-Mamba は Encoder-Decoder アーキテクチャとして定式化され、GS3B を基本モジュールとし、効率的なシーケンシャルなデータモデリングを行う。さらに、SSSMの設定下でSTGデータをモデル化するGNNの能力を強化するために、適応グラフ構造更新のためのKFGN(Kalman Filtering Graph Neural Networks)を提案する。 KFGNは選択状態空間の進化の文脈にスムーズに適合し、同時に線形複雑性も維持する。 3つのベンチマークSTG予測データセットを用いて,STG-Mambaの性能優位性と計算効率を実証した。 STG予測性能の点で既存の最先端手法を超えるだけでなく、大規模グラフネットワークの計算ボトルネックを効果的に軽減し、FLOPの計算コストとテスト推論時間を削減している。 Spatial-Temporal Graph (STG) data is characterized as dynamic, heterogenous, and non-stationary, leading to the continuous challenge of spatial-temporal graph learning. In the past few years, various GNN-based methods have been proposed to solely focus on mimicking the relationships among node individuals of the STG network, ignoring the significance of modeling the intrinsic features that exist in STG system over time. In contrast, modern Selective State Space Models (SSSMs) present a new approach which treat STG Network as a system, and meticulously explore the STG system's dynamic state evolution across temporal dimension. In this work, we introduce Spatial-Temporal Graph Mamba (STG-Mamba) as the first exploration of leveraging the powerful selective state space models for STG learning by treating STG Network as a system, and employing the Graph Selective State Space Block (GS3B) to precisely characterize the dynamic evolution of STG networks. 
STG-Mamba is formulated as an Encoder-Decoder architecture, which takes GS3B as the basic module, for efficient sequential data modeling. Furthermore, to strengthen GNN's ability of modeling STG data under the setting of SSSMs, we propose Kalman Filtering Graph Neural Networks (KFGN) for adaptive graph structure upgrading. KFGN smoothly fits in the context of selective state space evolution, and at the same time keeps linear complexity. Extensive empirical studies are conducted on three benchmark STG forecasting datasets, demonstrating the performance superiority and computational efficiency of STG-Mamba. It not only surpasses existing state-of-the-art methods in terms of STG forecasting performance, but also effectively alleviate the computational bottleneck of large-scale graph networks in reducing the computational cost of FLOPs and test inference time.	翻訳日:2024-03-20 15:22:07 公開日:2024-03-19
# Jetfire: INT8データフローとブロック単位量子化による効率的かつ高精度なTransformer事前学習 Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization ( http://arxiv.org/abs/2403.12422v1 ) ライセンス: Link先を確認	Haocheng Xi, Yuxiang Chen, Kang Zhao, Kaijun Zheng, Jianfei Chen, Jun Zhu,	(参考訳) Transformerの事前学習は一般に時間がかかる。完全量子化学習(FQT)は、事前学習を高速化する有望なアプローチである。しかし、ほとんどのFQT手法は量子化・計算・逆量子化の手順を採用しており、Transformerに用いると、メモリアクセスのオーバーヘッドが大きいことと低精度計算のために、高速化が不十分になり、性能も大きく低下することが多い。本研究では、Transformerに特化した効率的かつ高精度なINT8学習手法であるJetfireを提案する。本手法は、メモリアクセスを最適化するINT8データフローと、事前学習されるTransformerの精度を維持するブロック単位の量子化手法を特徴とする。広範な実験により、我々のINT8 FQT手法はFP16学習ベースラインに匹敵する精度を達成し、Transformer向けの既存のINT8学習手法を上回ることを示す。さらに、標準的なTransformerブロックにおいて、FP16ベースラインと比較して1.42倍のエンドツーエンド学習高速化と1.49倍のメモリ削減を実現する。 Pretraining transformers are generally time-consuming. Fully quantized training (FQT) is a promising approach to speed up pretraining. However, most FQT methods adopt a quantize-compute-dequantize procedure, which often leads to suboptimal speedup and significant performance degradation when used in transformers due to the high memory access overheads and low-precision computations. In this work, we propose Jetfire, an efficient and accurate INT8 training method specific to transformers. Our method features an INT8 data flow to optimize memory access and a per-block quantization method to maintain the accuracy of pretrained transformers. Extensive experiments demonstrate that our INT8 FQT method achieves comparable accuracy to the FP16 training baseline and outperforms the existing INT8 training works for transformers. Moreover, for a standard transformer block, our method offers an end-to-end training speedup of 1.42x and a 1.49x memory reduction compared to the FP16 baseline.	翻訳日:2024-03-20 15:22:07 公開日:2024-03-19
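To illustrate why per-block quantization can preserve accuracy better than a single scale per tensor, here is a minimal sketch of symmetric INT8 quantization with one scale per block. It is not the Jetfire implementation: the flat one-dimensional blocks, the block size, and every function name below are assumptions made for illustration.

```python
def quantize_block(block):
    """Symmetric INT8 quantization of a list of floats; returns (ints, scale)."""
    max_abs = max(abs(x) for x in block)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    ints = [max(-127, min(127, round(x / scale))) for x in block]
    return ints, scale


def dequantize_block(ints, scale):
    """Map INT8 codes back to floats using the block's scale."""
    return [v * scale for v in ints]


def per_block_roundtrip(values, block_size):
    """Quantize and dequantize `values`, using a separate scale for each block."""
    out = []
    for start in range(0, len(values), block_size):
        ints, scale = quantize_block(values[start:start + block_size])
        out.extend(dequantize_block(ints, scale))
    return out


def max_abs_error(a, b):
    """Largest element-wise absolute difference between two lists."""
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    # Small values followed by a block containing a large outlier (made-up data).
    values = [0.01, -0.02, 0.03, 0.015, 100.0, -50.0, 25.0, 10.0]
    whole = per_block_roundtrip(values, block_size=8)   # one scale for everything
    blocks = per_block_roundtrip(values, block_size=4)  # one scale per block of 4
    print(max_abs_error(values[:4], whole[:4]), max_abs_error(values[:4], blocks[:4]))
```

With a single scale, the outlier forces a coarse step size and the small values collapse to zero; with one scale per block, the outlier only affects its own block.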
# 時空間系列と関係学習を用いた感情価・覚醒度推定のためのマルチモーダル融合手法 Multimodal Fusion Method with Spatiotemporal Sequences and Relationship Learning for Valence-Arousal Estimation ( http://arxiv.org/abs/2403.12425v1 ) ライセンス: Link先を確認	Jun Yu, Gongpeng Zhao, Yongqi Wan, Zhihong Wei, Yang Zheng, Zerui Zhang, Zhongpeng Cai, Guochen Xie, Jichao Zhu, Wangyuan Zhu,	(参考訳) 本稿では、ABAW6コンペティションにおけるVA(Valence-Arousal、感情価・覚醒度)推定課題に対する我々のアプローチについて述べる。映像フレームと音声セグメントを前処理して視覚的・音声的特徴を抽出する包括的なモデルを考案した。時間的畳み込みネットワーク(TCN)モジュールを用いることで、これらの特徴間の時間的および空間的相関を効果的に捉えた。その後、Transformerエンコーダ構造を用いて長距離依存を学習し、モデルの性能と汎化能力を向上させた。提案手法はマルチモーダルデータ融合のアプローチを採り、事前学習済みの音声および映像バックボーンを特徴抽出に用い、続いてTCNベースの時空間符号化とTransformerベースの時間情報の抽出を行う。実験結果は提案手法の有効性を示しており、AffWild2データセットにおけるVA推定で競争力のある性能を達成した。 This paper presents our approach for the VA (Valence-Arousal) estimation task in the ABAW6 competition. We devised a comprehensive model by preprocessing video frames and audio segments to extract visual and audio features. Through the utilization of Temporal Convolutional Network (TCN) modules, we effectively captured the temporal and spatial correlations between these features. Subsequently, we employed a Transformer encoder structure to learn long-range dependencies, thereby enhancing the model's performance and generalization ability. Our method leverages a multimodal data fusion approach, integrating pre-trained audio and video backbones for feature extraction, followed by TCN-based spatiotemporal encoding and Transformer-based temporal information capture. Experimental results demonstrate the effectiveness of our approach, achieving competitive performance in VA estimation on the AffWild2 dataset.	翻訳日:2024-03-20 15:22:07 公開日:2024-03-19
# 報酬サンプルによる逐次多腕バンディットにおける転移 Transfer in Sequential Multi-armed Bandits via Reward Samples ( http://arxiv.org/abs/2403.12428v1 ) ライセンス: Link先を確認	Rahul N R, Vaibhav Katewa,	(参考訳) エージェントが複数のエピソードにわたってバンディットと相互作用する、逐次的な確率的多腕バンディット問題を考える。腕の報酬分布は1つのエピソード内では一定であるが、エピソード間では変化しうる。我々はUCBに基づくアルゴリズムを提案し、過去のエピソードの報酬サンプルを転移することで、全エピソードにわたる累積リグレット性能を改善する。提案アルゴリズムのリグレット解析と実験結果を示し、転移を行わない標準的なUCBアルゴリズムに対して大幅な改善が得られることを示す。 We consider a sequential stochastic multi-armed bandit problem where the agent interacts with bandit over multiple episodes. The reward distribution of the arms remain constant throughout an episode but can change over different episodes. We propose an algorithm based on UCB to transfer the reward samples from the previous episodes and improve the cumulative regret performance over all the episodes. We provide regret analysis and empirical results for our algorithm, which show significant improvement over the standard UCB algorithm without transfer.	翻訳日:2024-03-20 15:22:07 公開日:2024-03-19
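The idea of reusing reward samples can be sketched with plain UCB1 that is warm-started from counts and reward sums collected earlier. This is not the authors' algorithm: it assumes, for simplicity, that the arm distributions did not change between episodes, whereas the abstract explicitly allows them to change. The function name, the Bernoulli arms, and all parameters are hypothetical.

```python
import math
import random


def ucb_with_prior(arm_means, horizon, prior_counts=None, prior_sums=None, seed=0):
    """Run UCB1 on Bernoulli arms, optionally warm-started with earlier samples.

    Returns the cumulative pseudo-regret after `horizon` pulls.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = list(prior_counts) if prior_counts else [0] * k
    sums = list(prior_sums) if prior_sums else [0.0] * k
    best = max(arm_means)
    regret = 0.0
    for _ in range(horizon):
        untried = [a for a in range(k) if counts[a] == 0]
        if untried:
            arm = untried[0]  # pull every arm once before using the index
        else:
            total = sum(counts)
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(total) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - arm_means[arm]
    return regret


if __name__ == "__main__":
    means = [0.2, 0.8]  # made-up arm means
    cold = ucb_with_prior(means, horizon=200)
    warm = ucb_with_prior(means, horizon=200,
                          prior_counts=[1000, 1000], prior_sums=[200.0, 800.0])
    print(cold, warm)
```

When the transferred samples are accurate, the confidence intervals are already tight at the start of the new episode, so the warm-started run spends fewer pulls on the worse arm.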
# TransformMix:データから学習形質転換と混合戦略 TransformMix: Learning Transformation and Mixing Strategies from Data ( http://arxiv.org/abs/2403.12429v1 ) ライセンス: Link先を確認	Tsz-Him Cheung, Dit-Yan Yeung,	(参考訳) データ拡張は、より多くのトレーニングサンプルを合成することによって、ディープラーニングモデルの一般化能力を向上させる。サンプルミキシングは、既存のサンプルを組み合わせることで追加データを生成する一般的なデータ拡張アプローチである。 MixupやCutmixのような最近のサンプルミキシング手法では、複数の入力をブレンドするための単純な混合操作が採用されている。このようなヒューリスティックなアプローチは、一部のコンピュータビジョンタスクで特定のパフォーマンス向上を示すが、画像は盲目的に混合され、異なるデータセットに自動的に適応しない。特定のデータセットに有効なミキシング戦略は、他のデータセットによく当てはまらないことが多い。適切に設定されていない場合、この手法はミスリード混合画像を生成し、サンプル混合増強の有効性を損なう可能性がある。本研究では,データからより優れた変換と拡張戦略を混合するための自動アプローチであるTransformMixを提案する。特にTransformMixは、学習した変換と混合マスクを適用して、ターゲットタスクの正確かつ重要な情報を含む魅力的な混合画像を生成する。本稿では,トランスフォーメーション学習,分類,オブジェクト検出,知識蒸留設定におけるTransformMixの有効性を示す。実験結果から,本手法は試料混合ベースラインに比べて優れた性能と効率を達成できることがわかった。 Data augmentation improves the generalization power of deep learning models by synthesizing more training samples. Sample-mixing is a popular data augmentation approach that creates additional data by combining existing samples. Recent sample-mixing methods, like Mixup and Cutmix, adopt simple mixing operations to blend multiple inputs. Although such a heuristic approach shows certain performance gains in some computer vision tasks, it mixes the images blindly and does not adapt to different datasets automatically. A mixing strategy that is effective for a particular dataset does not often generalize well to other datasets. If not properly configured, the methods may create misleading mixed images, which jeopardize the effectiveness of sample-mixing augmentations. In this work, we propose an automated approach, TransformMix, to learn better transformation and mixing augmentation strategies from data. In particular, TransformMix applies learned transformations and mixing masks to create compelling mixed images that contain correct and important information for the target tasks. 
We demonstrate the effectiveness of TransformMix on multiple datasets in transfer learning, classification, object detection, and knowledge distillation settings. Experimental results show that our method achieves better performance as well as efficiency when compared with strong sample-mixing baselines.	翻訳日:2024-03-20 15:22:07 公開日:2024-03-19
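For context, the Mixup baseline mentioned in the abstract blends two samples and their labels with a single random weight. The sketch below shows only that baseline, not TransformMix, whose learned transformations and mixing masks are not specified in the abstract. The function name, the flat list representation of inputs, and the default `alpha` are illustrative assumptions.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=None):
    """Blend two samples and their one-hot labels with a Beta(alpha, alpha) weight."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


if __name__ == "__main__":
    # Two tiny made-up "images" (flattened) with one-hot labels for two classes.
    x, y, lam = mixup([0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0],
                      rng=random.Random(0))
    print(lam, x, y)
```

The same weight is applied to every pixel regardless of content, which is the "blind" mixing that the abstract contrasts with a learned strategy.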
# ディープラーニングフレームワークにおける幾何学的制約:サーベイ Geometric Constraints in Deep Learning Frameworks: A Survey ( http://arxiv.org/abs/2403.12431v1 ) ライセンス: Link先を確認	Vibhas K Vats, David J Crandall,	(参考訳) ステレオフォトグラム法はシーン理解の新たな技術である。起源は1800年代まで遡り、人々が世界の物理的特性を測定するために写真を使って調査を始めた。それ以来、何千ものアプローチが検討されている。 Stereoの形状の古典的な幾何学的技法は、幾何学を用いてシーン幾何学とカメラ幾何学の制約を定義し、方程式の非線形系を解く。より最近の研究は、幾何学を明示的にモデル化することなしに、エンドツーエンドのディープラーニングを使用して、まったく異なるアプローチを取っている。本稿では,幾何学ベースのフレームワークと深層学習ベースのフレームワークの重複について検討する。我々は、深度推定や他の密接に関連する問題に対して、制約を深層学習フレームワークに統合した幾何を比較し、対比する。本稿では、現代のディープラーニングフレームワークで使用される制約を規定する、一般的な幾何学のための新しい分類法を提案する。また、洞察に富んだ観察と今後の研究の方向性を示す。 Stereophotogrammetry is an emerging technique of scene understanding. Its origins go back to at least the 1800s when people first started to investigate using photographs to measure the physical properties of the world. Since then, thousands of approaches have been explored. The classic geometric techniques of Shape from Stereo is built on using geometry to define constraints on scene and camera geometry and then solving the non-linear systems of equations. More recent work has taken an entirely different approach, using end-to-end deep learning without any attempt to explicitly model the geometry. In this survey, we explore the overlap for geometric-based and deep learning-based frameworks. We compare and contrast geometry enforcing constraints integrated into a deep learning framework for depth estimation or other closely related problems. We present a new taxonomy for prevalent geometry enforcing constraints used in modern deep learning frameworks. We also present insightful observations and potential future research directions.	翻訳日:2024-03-20 15:22:07 公開日:2024-03-19
# 子どもと高齢者の身体活動を促すための3Dカメラに基づくアクティブビデオゲームのプロトタイプ Prototipo de video juego activo basado en una cámara 3D para motivar la actividad física en niños y adultos mayores ( http://arxiv.org/abs/2403.12432v1 ) ライセンス: Link先を確認	Benjamín Ojeda Magaña, José Guadalupe Robledo Hernández, Leopoldo Gómez Barba, Victor Manuel Rangel Cobián,	(参考訳) 本論文は、子どもと高齢者の身体活動を促進するために設計されたビデオゲームのプロトタイプの開発について述べる。プロトタイプはノートパソコンと3Dセンサー付きカメラで構成され、オプションでLCDスクリーンまたはプロジェクターを使用する。このプロトタイプのプログラミング部分は、子ども向けのプログラミング言語であるScratchで開発されており、ユーザーの好みに合わせたゲームの作成が非常に容易になる。このようなプロトタイプを作るという発想は、運動不足が糖尿病や高血圧といった慢性変性疾患の発症の主要な要因であることを踏まえ、子どもと大人の身体活動を促す選択肢を提供したいという思いから生まれた。この取り組みの結果、ピンポンゲームに基づくアクティブビデオゲームのプロトタイプの開発に成功した。このゲームでは、子どもと大人の両方が楽しく交流しながら、ユーザーの健康に良い影響を与えうる身体活動を行うよう促される。 This document describes the development of a video game prototype designed to encourage physical activity among children and older adults. The prototype consists of a laptop, a camera with 3D sensors, and optionally requires an LCD screen or a projector. The programming component of this prototype was developed in Scratch, a programming language geared towards children, which greatly facilitates the creation of a game tailored to the users' preferences. The idea to create such a prototype originated from the desire to offer an option that promotes physical activity among children and adults, given that a lack of physical exercise is a predominant factor in the development of chronic degenerative diseases such as diabetes and hypertension, to name the most common. As a result of this initiative, an active video game prototype was successfully developed, based on a ping-pong game, which allows both children and adults to interact in a fun way while encouraging the performance of physical activities that can positively impact the users' health.	翻訳日:2024-03-20 15:22:07 公開日:2024-03-19
# 動的学習指標に基づくアルゴリズム的複雑度攻撃 Algorithmic Complexity Attacks on Dynamic Learned Indexes ( http://arxiv.org/abs/2403.12433v1 ) ライセンス: Link先を確認	Rui Yang, Evgenios M. Kornaropoulos, Yue Cheng,	(参考訳) Learned Index Structures (LIS)は、ソートインデックスをデータ分散を学習し、データ要素キーを入力として取り、予測されたキーの位置を出力するモデルとして見ている。オリジナルのLISは、更新をサポートせずにルックアップ操作のみを処理することができ、典型的なワークロードで使用するには実用的ではない。この制限に対処するため、最近の研究では、効率的な動的学習インデックスの設計に焦点が当てられている。 ALEXは動的学習インデックス構造のパイオニアとして、適応的なキー空間分割、動的モデル再トレーニング、読み書き性能を優先する高度なエンジニアリングとポリシーを含む一連の設計選択を取り入れることで、ダイナミズムを可能にする。これらの設計選択は平均ケースのパフォーマンスを改善するが、ALEXのメモリ空間と最悪のシナリオにおける時間複雑さを最大化する敵の振る舞いを許容することで、柔軟性とパフォーマンスに重点を置いて攻撃面を増大させる。本稿では,ALEXの最悪のシナリオを対象とした,アルゴリズム複雑性攻撃(ACA)に関する最初の体系的な研究を示す。本稿では,空間ACAと時間ACAの2つのカテゴリに分類される新しいACAを紹介する。まず、データノード上のACAは、ALEXのギャップ化された配列レイアウトを利用して、Multiple-Choice Knapsack(MCK)を使用して、データノードレベルでのメモリ消費を最大化する最適な逆挿入計画を生成する。第2に、内部ノード上のACAは、ALEXの壊滅的なコスト軽減機構を利用して、数百の逆挿入でメモリ外エラーを引き起こします。第3に、ACAは、実際のキー分布とデータノードの線形モデルとの格差を増大させるために、病理的な挿入を生成し、ALEXが正常なワークロードで動作しているのに対して、実行時の性能を最大1,641X劣化させる。 Learned Index Structures (LIS) view a sorted index as a model that learns the data distribution, takes a data element key as input, and outputs the predicted position of the key. The original LIS can only handle lookup operations with no support for updates, rendering it impractical to use for typical workloads. To address this limitation, recent studies have focused on designing efficient dynamic learned indexes. ALEX, as the pioneering dynamic learned index structures, enables dynamism by incorporating a series of design choices, including adaptive key space partitioning, dynamic model retraining, and sophisticated engineering and policies that prioritize read/write performance. While these design choices offer improved average-case performance, the emphasis on flexibility and performance increases the attack surface by allowing adversarial behaviors that maximize ALEX's memory space and time complexity in worst-case scenarios. 
In this work, we present the first systematic investigation of algorithmic complexity attacks (ACAs) targeting the worst-case scenarios of ALEX. We introduce new ACAs that fall into two categories, space ACAs and time ACAs, which target the memory space and time complexity, respectively. First, our space ACA on data nodes exploits ALEX's gapped array layout and uses Multiple-Choice Knapsack (MCK) to generate an optimal adversarial insertion plan for maximizing the memory consumption at the data node level. Second, our space ACA on internal nodes exploits ALEX's catastrophic cost mitigation mechanism, causing an out-of-memory error with only a few hundred adversarial insertions. Third, our time ACA generates pathological insertions to increase the disparity between the actual key distribution and the linear models of data nodes, deteriorating the runtime performance by up to 1,641X compared to ALEX operating under legitimate workloads.	翻訳日:2024-03-20 15:22:07 公開日:2024-03-19
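The time attack described above relies on widening the gap between the real key distribution and a node's linear model. The sketch below shows that effect on a toy learned index: a single linear model over sorted keys plus a bounded local search. It is not ALEX, which uses gapped arrays, a model hierarchy, and its own search and retraining policies; the class name and all details here are hypothetical and assume unique integer keys.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index: one linear model plus a bounded binary search."""

    def __init__(self, sorted_keys):
        self.keys = list(sorted_keys)
        n = len(self.keys)
        lo, hi = self.keys[0], self.keys[-1]
        # Fit a line through the first and last key (a deliberately crude model).
        self.slope = (n - 1) / (hi - lo) if hi != lo else 0.0
        self.intercept = -self.slope * lo
        # Worst prediction error over stored keys bounds the search window.
        self.max_err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key):
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of `key`, or -1 if it is not stored."""
        p = self._predict(key)
        lo = max(0, p - self.max_err)
        hi = min(len(self.keys), p + self.max_err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else -1


if __name__ == "__main__":
    uniform = LinearLearnedIndex(range(0, 1000, 10))
    skewed = LinearLearnedIndex(list(range(99)) + [10 ** 6])  # one extreme key
    print(uniform.max_err, skewed.max_err)
```

A single extreme key flattens the fitted line, so predictions for the remaining keys land far from their true positions and every lookup has to search a much wider window.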
# 任意多視点画像からの人間のメッシュ復元 Human Mesh Recovery from Arbitrary Multi-view Images ( http://arxiv.org/abs/2403.12434v1 ) ライセンス: Link先を確認	Xiaoben Li, Mancheng Meng, Ziyan Wu, Terrence Chen, Fan Yang, Dinggang Shen,	(参考訳) 任意のマルチビュー画像からのヒューマンメッシュリカバリには、任意のカメラポーズと、任意の数のカメラビューの2つの特徴がある。可変性のため、このタスクに取り組むために統一されたフレームワークを設計することは困難である。この課題は、フレキシビリティを維持しつつ、任意のカメラのポーズを同時に推定し、任意のマルチビューイメージから人間のメッシュを復元できるというジレンマとして要約できる。このジレンマを解決するために、任意の多視点画像から統一人間メッシュ回復(U-HMR)を分離・征服するフレームワークを提案する。特にU-HMRは、分離された構造と、カメラとボディーデカップリング(CBD)、カメラポーズ推定(CPE)、任意のビュー融合(AVF)の2つの主要コンポーネントから構成される。カメラのポーズと人体メッシュが互いに独立しているため、CBDはそれらを2つのサブタスクに分割し、2つのサブネットワーク(\ie, CPE, AVF)でそれぞれ処理する。 CPEでは、各カメラのポーズは他のカメラと無関係であるため、すべてのビューを並列に処理するために共有MLPを採用する。 AVFでは、マルチビュー情報を融合して融合操作をビュー数に依存しないものにするため、SMPLパラメータクエリトークンを用いたトランスフォーマーデコーダを導入し、メッシュリカバリのためのクロスビュー機能を抽出する。提案するフレームワークの有効性と各コンポーネントの効果を実証するため,Human3.6M,MPI-INF-3DHP,TotalCaptureの3つの公開データセットに対して広範な実験を行った。 Human mesh recovery from arbitrary multi-view images involves two characteristics: the arbitrary camera poses and arbitrary number of camera views. Because of the variability, designing a unified framework to tackle this task is challenging. The challenges can be summarized as the dilemma of being able to simultaneously estimate arbitrary camera poses and recover human mesh from arbitrary multi-view images while maintaining flexibility. To solve this dilemma, we propose a divide and conquer framework for Unified Human Mesh Recovery (U-HMR) from arbitrary multi-view images. In particular, U-HMR consists of a decoupled structure and two main components: camera and body decoupling (CBD), camera pose estimation (CPE), and arbitrary view fusion (AVF). As camera poses and human body mesh are independent of each other, CBD splits the estimation of them into two sub-tasks for two individual sub-networks (\ie, CPE and AVF) to handle respectively, thus the two sub-tasks are disentangled. 
In CPE, since each camera pose is unrelated to the others, we adopt a shared MLP to process all views in a parallel way. In AVF, in order to fuse multi-view information and make the fusion operation independent of the number of views, we introduce a transformer decoder with a SMPL parameters query token to extract cross-view features for mesh recovery. To demonstrate the efficacy and flexibility of the proposed framework and effect of each component, we conduct extensive experiments on three public datasets: Human3.6M, MPI-INF-3DHP, and TotalCapture.	翻訳日:2024-03-20 15:22:07 公開日:2024-03-19
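The AVF idea above, fusing features so that the result does not depend on how many views are supplied, can be illustrated with single-query attention pooling. This is a minimal sketch under stated assumptions, not the U-HMR implementation: the function name `fuse_views`, the fixed query vector (standing in for a learned SMPL query token), and the toy feature vectors are all hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_views(query, view_features):
    """Single-query dot-product attention over per-view feature vectors.

    query: list of d floats (hypothetical stand-in for a learned query token).
    view_features: list of N vectors of length d, one per camera view.
    Returns one fused vector of length d for any N >= 1.
    """
    d = len(query)
    scores = [sum(q * f for q, f in zip(query, feat)) / math.sqrt(d)
              for feat in view_features]
    weights = softmax(scores)
    return [sum(w * feat[i] for w, feat in zip(weights, view_features))
            for i in range(d)]

# Toy inputs (assumed values): the same query fuses 2 or 4 views.
query = [0.5, -0.2, 0.1]
two_views = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
four_views = two_views + [[0.5, 0.5, 0.5], [2.0, -1.0, 0.0]]

fused_two = fuse_views(query, two_views)
fused_four = fuse_views(query, four_views)
print(len(fused_two), len(fused_four))
```

Because the attention weights are a softmax over views, the output size is fixed by the feature dimension and the result does not depend on the order in which views are listed.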
# 精密物理駆動型テキスト・ツー・3D生成 Precise-Physics Driven Text-to-3D Generation ( http://arxiv.org/abs/2403.12438v1 ) ライセンス: Link先を確認	Qingshan Xu, Jiao Liu, Melvin Wong, Caishun Chen, Yew-Soon Ong,	(参考訳) テキスト・ツー・3D生成は、与えられたテキスト・プロンプトに基づいて、新しい3Dコンテンツを生成するという大きな可能性を示している。しかし、既存の生成法は主に幾何学的あるいは視覚的可視性に焦点を当て、生成した3次元形状の正確な物理知覚を無視している。これにより、現実世界の応用において生成された3D形状の実用性が著しく阻害される。本研究では,Phy3DGenを提案する。生成した3次元形状のソリッド・メカニクスを解析することにより,既存のテキスト・ツー・3次元生成法で生成された3次元形状が物理法則に従わないため,実世界の応用には実用的でないことを明らかにした。この目的のために、3次元拡散モデルを用いて3次元形状の先行情報を提供し、データ駆動型微分可能な物理層を設計し、固体力学を用いて3次元形状の先行情報を最適化する。これにより、幾何学を効率的に最適化し、3次元形状に関する正確な物理情報を同時に学習することができる。実験により, 幾何学的妥当性と正確な物理知覚を両立させることができ, さらに3次元仮想モデリングと正確な物理世界を考えることができることがわかった。 Text-to-3D generation has shown great promise in generating novel 3D content based on given text prompts. However, existing generative methods mostly focus on geometric or visual plausibility while ignoring precise physics perception for the generated 3D shapes. This greatly hinders the practicality of generated 3D shapes in real-world applications. In this work, we propose Phy3DGen, a precise-physics-driven text-to-3D generation method. By analyzing the solid mechanics of generated 3D shapes, we reveal that the 3D shapes generated by existing text-to-3D generation methods are impractical for real-world applications as the generated 3D shapes do not conform to the laws of physics. To this end, we leverage 3D diffusion models to provide 3D shape priors and design a data-driven differentiable physics layer to optimize 3D shape priors with solid mechanics. This allows us to optimize geometry efficiently and learn precise physics information about 3D shapes at the same time. Experimental results demonstrate that our method can consider both geometric plausibility and precise physics perception, further bridging 3D virtual modeling and precise physical worlds.	翻訳日:2024-03-20 15:22:07 公開日:2024-03-19
# マルチビュー3次元人物姿勢推定のための自己学習カノニカル空間 Self-learning Canonical Space for Multi-view 3D Human Pose Estimation ( http://arxiv.org/abs/2403.12440v1 ) ライセンス: Link先を確認	Xiaoben Li, Mancheng Meng, Ziyan Wu, Terrence Chen, Fan Yang, Dinggang Shen,	(参考訳) マルチビュー3次元人間のポーズ推定は、自然に単一のビューよりも優れており、複数のビューの画像によって提供されるより包括的な情報から恩恵を受けている。情報には、カメラのポーズ、2D/3Dの人間のポーズ、3Dの幾何学が含まれる。しかし、これらの情報の正確なアノテーションを得ることは困難であり、多視点画像から正確な3次元ポーズを予測することは困難である。この問題に対処するため、我々はCMANet(Cascaded Multi-view aggregating Network)と呼ばれる完全に自己教師ありのフレームワークを提案し、多視点情報の統合と活用を目的とした標準パラメータ空間を構築した。本フレームワークでは,マルチビュー情報を2つのカテゴリに分類する。 1)ビュー内情報、2)ビュー間情報。そのため、CMANetは、IRV(Intra-view Module)とIEV(Inter-view Module)の2つのコンポーネントで構成されている。 IRVは、各ビューの初期のカメラポーズと3D人間のポーズを抽出するために使用され、IEVは、最後の3D人間のポーズのために補完的なポーズ情報と3Dの幾何学を融合することを目的としている。ビュー内およびビュー間のアグリゲーションを容易にするため、SMPLモデルのカメラポーズと人間のポーズと形状パラメータ($\theta$と$\beta$)で表現された標準パラメータ空間を定義し、2段階の学習手順を提案する。第一段階では、IRVは、市販の2Dキーポイント検出器の確実な出力によって監督されるカメラのポーズとビュー依存の人間のポーズを推定することを学ぶ。第2段階では、IRVは凍結され、IEVはカメラポーズをさらに洗練し、予測されたマルチビュー2Dキーポイントを併用することで達成される、クロスビュー補完と3D幾何制約を暗黙的に符号化することで、3D人間のポーズを最適化する。提案したフレームワーク,モジュール,学習戦略は総合的な実験によって有効であることが実証され,CMANetは広範な定量的・定性的分析において最先端の手法よりも優れている。 Multi-view 3D human pose estimation is naturally superior to single-view estimation, benefiting from the more comprehensive information provided by images of multiple views. The information includes camera poses, 2D/3D human poses, and 3D geometry. However, accurate annotation of this information is hard to obtain, making it challenging to predict accurate 3D human pose from multi-view images. To deal with this issue, we propose a fully self-supervised framework, named cascaded multi-view aggregating network (CMANet), to construct a canonical parameter space to holistically integrate and exploit multi-view information. In our framework, the multi-view information is grouped into two categories: 1) intra-view information, 2) inter-view information. Accordingly, CMANet consists of two components: intra-view module (IRV) and inter-view module (IEV). 
IRV extracts the initial camera pose and 3D human pose of each view; IEV fuses complementary pose information and cross-view 3D geometry into a final 3D human pose. To facilitate intra- and inter-view aggregation, we define a canonical parameter space, described by the per-view camera pose and the human pose and shape parameters ($\theta$ and $\beta$) of the SMPL model, and propose a two-stage learning procedure. In the first stage, IRV learns to estimate the camera pose and view-dependent 3D human pose, supervised by the confident outputs of an off-the-shelf 2D keypoint detector. In the second stage, IRV is frozen and IEV further refines the camera pose and optimizes the 3D human pose by implicitly encoding the cross-view complement and 3D geometry constraint, achieved by jointly fitting the predicted multi-view 2D keypoints. Comprehensive experiments demonstrate the effectiveness of the proposed framework, modules, and learning strategy, and CMANet outperforms state-of-the-art methods in extensive quantitative and qualitative analysis.	翻訳日:2024-03-20 15:22:07 公開日:2024-03-19
# 対向軌道の断面積に沿った多角化による視覚言語攻撃の伝達性向上 Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory ( http://arxiv.org/abs/2403.12445v1 ) ライセンス: Link先を確認	Sensen Gao, Xiaojun Jia, Xuhong Ren, Ivor Tsang, Qing Guo,	(参考訳) 視覚言語事前学習(VLP)モデルは、画像とテキストの両方を解釈する際、顕著な能力を示すが、多モーダル対逆例(AE)の影響を受けやすい。敵攻撃の強化と脆弱性の発見、特にVLPモデルの一般的な問題(例えば、高転送性AE)は、信頼性と実用的なVLPモデルの構築に関するさらなる研究を刺激することができる。最近の研究(すなわち、セットレベル誘導攻撃)は、最適化経路に沿ってAEの多様性を高めるために画像とテキストのペアを増大させることが、敵の例の転送可能性を大幅に向上させることを示している。しかし、このアプローチは、主にオンライン敵の事例(すなわち最適化期間におけるAE)の多様性を強調し、被害者モデルに過度に適合し、転送可能性に影響を与えるリスクをもたらす。本研究では,VLPモデル間の転送可能性を高めるために,クリーンインプットとオンラインAEに対する逆例の多様性が重要であることを示唆する。そこで本稿では,AEsの多様性を拡大するために,対向軌道の交差領域に沿った多様化手法を提案する。モダリティ間の相互作用をフル活用するために,最適化中のテキスト誘導対逆例選択を導入する。さらに,潜在的なオーバーフィッティングを緩和するために,既存手法のような逆画像ではなく,最適化経路に沿った最終交差点領域から逸脱した逆テキストを指示する。広汎な実験により、VLPモデルと下流の視覚・言語タスク(例えば、画像テキスト検索(ITR)、ビジュアルグラウンド(VG)、画像キャプション(IC))間での転送性を向上させる方法の有効性が確認された。 Vision-language pre-training (VLP) models exhibit remarkable capabilities in comprehending both images and text, yet they remain susceptible to multimodal adversarial examples (AEs). Strengthening adversarial attacks and uncovering vulnerabilities, especially common issues in VLP models (e.g., high transferable AEs), can stimulate further research on constructing reliable and practical VLP models. A recent work (i.e., Set-level guidance attack) indicates that augmenting image-text pairs to increase AE diversity along the optimization path enhances the transferability of adversarial examples significantly. However, this approach predominantly emphasizes diversity around the online adversarial examples (i.e., AEs in the optimization period), leading to the risk of overfitting the victim model and affecting the transferability. In this study, we posit that the diversity of adversarial examples towards the clean input and online AEs are both pivotal for enhancing transferability across VLP models. 
Consequently, we propose using diversification along the intersection region of the adversarial trajectory to expand the diversity of AEs. To fully leverage the interaction between modalities, we introduce text-guided adversarial example selection during optimization. Furthermore, to mitigate potential overfitting, we direct the adversarial text to deviate from the last intersection region along the optimization path, rather than adversarial images as in existing methods. Extensive experiments affirm the effectiveness of our method in improving transferability across various VLP models and downstream vision-and-language tasks (e.g., Image-Text Retrieval (ITR), Visual Grounding (VG), Image Captioning (IC)).	翻訳日:2024-03-20 15:22:07 公開日:2024-03-19
# GitHubワークフローにおける大規模言語モデルの有効性について On the effectiveness of Large Language Models for GitHub Workflows ( http://arxiv.org/abs/2403.12446v1 ) ライセンス: Link先を確認	Xinyu Zhang, Siddharth Muralee, Sourag Cherupattamoolayil, Aravind Machiry,	(参考訳) GitHubワークフローまたはGitHub CIは、開発者がワークフロー、すなわちYAMLファイルとジョブのリストを指定することで、さまざまなソフトウェアエンジニアリングタスクを自動化できる人気のある継続的インテグレーションプラットフォームである。しかし、エンジニアリングの有効なワークフローは面倒だ。また、深刻なセキュリティ問題も発生し、サプライチェーンの脆弱性が発生する可能性がある。大規模言語モデル(LLM)の最近の進歩は、様々なソフトウェア開発タスクにおいてその効果を実証している。しかし、GitHubのワークフローは構造とセマンティクスの両方の通常のプログラムとは異なる。異なるレベルのプロンプトを持つ5つのワークフロー関連タスクにおけるLLMの有効性を理解するための、最初の総合的研究を行う。私たちは$\sim$400Kのワークフローをキュレートし、さまざまな詳細でプロンプトを生成しました。 GitHubのワークフロータスクでLLMを微調整しました。現状のLLM3種とその微調整版について検討した結果,LLMの現在の有効性と欠点について,様々な興味深い知見が得られた。 GitHub workflows or GitHub CI is a popular continuous integration platform that enables developers to automate various software engineering tasks by specifying them as workflows, i.e., YAML files with a list of jobs. However, engineering valid workflows is tedious. They are also prone to severe security issues, which can result in supply chain vulnerabilities. Recent advancements in Large Language Models (LLMs) have demonstrated their effectiveness in various software development tasks. However, GitHub workflows differ from regular programs in both structure and semantics. We perform the first comprehensive study to understand the effectiveness of LLMs on five workflow-related tasks with different levels of prompts. We curated a set of $\sim$400K workflows and generated prompts with varying detail. We also fine-tuned LLMs on GitHub workflow tasks. Our evaluation of three state-of-the-art LLMs and their fine-tuned variants revealed various interesting findings on the current effectiveness and drawbacks of LLMs.	翻訳日:2024-03-20 15:22:07 公開日:2024-03-19
# エディントンにインスパイアされたボルン・インフェルド重力におけるボソニックKG振動子:Wu-Yang磁極とリッチスカラー曲率効果 Bosonic KG-oscillators in Eddington-inspired Born-Infeld gravity: Wu-Yang magnetic monopole and Ricci scalar curvature effects ( http://arxiv.org/abs/2403.12447v1 ) ライセンス: Link先を確認	Omar Mustafa,	(参考訳) 我々は,Eddington-inspired Born-Infeld(EiBI)重力とWu-Yang磁気モノポール(WYMM)のグローバルモノポール(GM)時空におけるボソニッククライン・ゴルドン(KG)振動子について検討した。リッチスカラー曲率 $R=R_{\upsilon }^{\upsilon }$ の存在下での重力効果について議論する。リッチスカラーの曲率の存在は、効果的かつ明白に、対応する量子力学的反発コアをより反発させる力場を導入することが観察された。 EiBI重力場でも同様の効果が観察されている。我々は、対応するボソニックKG-オシレータ量子力学系が、合流型ホイン関数の形で解を持つことを繰り返し、報告する。その物理的に許容される多項式への打ち切りは、いくつかのパラメトリックな相関/条件と関連していることが示される。このような条件/相関の使用は必須であり、すべての放射量子数$n_{r}\geq 0$に対して許容/制限された量子力学的軌道$\ell $-excitationsの集合が得られる。本手法は,異なるEiBI重力およびリッチスカラー曲率設定において,GM時空におけるKGオシレータの結果の検索を可能にするため,非常に有用であることが示されている。 We investigate the bosonic Klein-Gordon (KG) oscillators in a global monopole (GM) spacetime in Eddington-inspired Born-Infeld (EiBI) gravity and a Wu-Yang magnetic monopole (WYMM). We discuss the gravitational effects in the presence of Ricci scalar curvature $R=R_{\upsilon }^{\upsilon }$. It is observed that the presence of the Ricci scalar curvature, effectively and manifestly, introduces a force field that makes the corresponding quantum mechanical repulsive core more repulsive. A similar effect is also observed for the EiBI-gravitational field. We reiterate and report that the corresponding bosonic KG-oscillator quantum mechanical system admits a solution in the form of confluent Heun functions, the truncation of which into a physically admissible polynomial is shown to be associated with some parametric correlations/conditions. The use of such conditions/correlations is mandatory and yields a set of allowed/restricted quantum mechanical orbital $\ell $-excitations, for all radial quantum numbers $n_{r}\geq 0$. Our procedure is shown to be quite handy, in the sense that it allows one to retrieve results for KG-oscillators in GM-spacetime in different EiBI-gravity and Ricci scalar curvature settings.	翻訳日:2024-03-20 15:12:20 公開日:2024-03-19
# 生成データは常にコントラスト学習に役立つか? Do Generated Data Always Help Contrastive Learning? ( http://arxiv.org/abs/2403.12448v1 ) ライセンス: Link先を確認	Yifei Wang, Jizhe Zhang, Yisen Wang,	(参考訳) 対照的学習(CL)は、教師なしの視覚表現学習において最も成功したパラダイムの1つだが、しばしば手作業によるデータ拡張に依存している。生成モデル、特に拡散モデルの増加に伴い、実際のデータ分布に近い現実的な画像を生成する能力はよく認識されている。これらの高品質な生成画像は、コントラスト表現学習の強化にうまく適用されており、この手法は「データインフレーション」と呼ばれる。しかし、生成したデータ(DDPMのような優れた拡散モデルからでも)は、コントラスト学習に害を与えることもある。データインフレーションとデータ拡張の観点から,この障害の原因を考察する。我々は初めて、より強いデータインフレーションにはより弱いデータ拡張を伴わせるべきであり、その逆も同様であるという相補的な役割を明らかにする。また、データインフレーションの下での一般化境界を導出することにより、これらの現象の厳密な理論的説明を提供する。これらの知見から,追加の計算コストを伴わない純粋にデータ中心の戦略であるAdaptive Inflation(AdaInf)を提案する。ベンチマークデータセットでは、AdaInfはさまざまな対照的な学習方法に大幅な改善をもたらすことができる。特に、外部データを使わずに、AdaInfはCIFAR-10の94.70%の線形精度をSimCLRで取得し、多くの洗練された手法を超える新しい記録を樹立した。コードは https://github.com/PKU-ML/adainf で入手できる。 Contrastive Learning (CL) has emerged as one of the most successful paradigms for unsupervised visual representation learning, yet it often depends on intensive manual data augmentations. With the rise of generative models, especially diffusion models, the ability to generate realistic images close to the real data distribution has been well recognized. These generated high-quality images have been successfully applied to enhance contrastive representation learning, a technique termed "data inflation". However, we find that the generated data (even from a good diffusion model like DDPM) may sometimes harm contrastive learning. We investigate the causes behind this failure from the perspective of both data inflation and data augmentation. For the first time, we reveal their complementary roles: stronger data inflation should be accompanied by weaker augmentations, and vice versa. We also provide rigorous theoretical explanations for these phenomena by deriving generalization bounds under data inflation. Drawing from these insights, we propose Adaptive Inflation (AdaInf), a purely data-centric strategy without introducing any extra computation cost. 
On benchmark datasets, AdaInf can bring significant improvements for various contrastive learning methods. Notably, without using external data, AdaInf obtains 94.70% linear accuracy on CIFAR-10 with SimCLR, setting a new record that surpasses many sophisticated methods. Code is available at https://github.com/PKU-ML/adainf.	翻訳日:2024-03-20 15:12:20 公開日:2024-03-19
# ガイドフィードバックループ機構を用いた意図行動予測モデル Intention Action Anticipation Model with Guide-Feedback Loop Mechanism ( http://arxiv.org/abs/2403.12450v1 ) ライセンス: Link先を確認	Zongnan Ma, Fuchun Zhang, Zhixiong Nan, Yao Ge,	(参考訳) ビデオから人間の意図を予測するには、自動運転、ロボットアシスト技術、仮想現実などの幅広い応用がある。本研究では,人間の意図を示す行動を推定するために,エゴセントリックなビデオシーケンスを用いた意図的行動予測の問題に対処する。本稿では,ビデオシーケンス全体の特徴(すなわち,完全特徴)とビデオテールシーケンスの機能(すなわち,最近の特徴)をフル活用した階層的完全最新情報融合モデルを提案する。 HCRモデルには2つの主要なメカニズムがある。 Guide-Feedback Loop (GFL) メカニズムは、1つの最近の特徴と1つの完全な特徴の関係をモデル化するために提案されている。 GFLをベースとしたMCRFA(MultiComplete-Recent Feature Aggregation)モジュールは,最近の機能とマルチスケールな機能の関係をモデル化するために提案されている。 GFLとMCRFAに基づいて、HCRモデルは階層的に、マルチスケールの完全特徴とマルチスケールの最近の特徴の間のリッチな相互関係を探索することができる。比較およびアブレーション実験を通じて、EPIC-KitchensとEGTEA Gaze+の2つのよく知られた公開データセット上で、我々のモデルの有効性を検証する。 Anticipating human intention from videos has broad applications, such as automatic driving, robot assistive technology, and virtual reality. This study addresses the problem of intention action anticipation using egocentric video sequences to estimate actions that indicate human intention. We propose a Hierarchical Complete-Recent (HCR) information fusion model that makes full use of the features of the entire video sequence (i.e., complete features) and the features of the video tail sequence (i.e., recent features). The HCR model has two primary mechanisms. The Guide-Feedback Loop (GFL) mechanism is proposed to model the relation between one recent feature and one complete feature. Based on GFL, the MultiComplete-Recent Feature Aggregation (MCRFA) module is proposed to model the relation of one recent feature with multiscale complete features. Based on GFL and MCRFA, the HCR model can hierarchically explore the rich interrelationships between multiscale complete features and multiscale recent features. Through comparative and ablation experiments, we validate the effectiveness of our model on two well-known public datasets: EPIC-Kitchens and EGTEA Gaze+.	翻訳日:2024-03-20 15:12:20 公開日:2024-03-19
# INSIGHT: 言語説明による終末から終末へのニューロシンボリック視覚強化学習 INSIGHT: End-to-End Neuro-Symbolic Visual Reinforcement Learning with Language Explanations ( http://arxiv.org/abs/2403.12451v1 ) ライセンス: Link先を確認	Lirui Luo, Guoxi Zhang, Hongming Xu, Yaodong Yang, Cong Fang, Qing Li,	(参考訳) ニューロシンボリック強化学習(NS-RL)は、象徴的政策の解釈可能性に特徴付けられる、説明可能な意思決定のための有望なパラダイムとして登場した。視覚的な観察を行うタスクでは、NS-RLは状態の構造化表現を必要とするが、前のアルゴリズムでは効率の欠如により報酬信号で構造化状態を洗練できない。アクセシビリティもまた問題であり、現在の象徴的なポリシーを解釈するためには広範なドメイン知識が必要である。本稿では,視覚基盤モデルをスケーラブルな知覚モジュールに蒸留することにより,効率のボトルネックを克服する,構造化状態とシンボルポリシを同時に学習可能なフレームワークを提案する。さらに、我々は大規模な言語モデルを用いて、ポリシーや決定のための簡潔で読みやすい言語説明を生成するパイプラインを設計する。 9つのAtariタスクの実験では,既存のNSRL法よりもかなりの性能向上を示した。また、政策や意思決定の説明も紹介する。 Neuro-symbolic reinforcement learning (NS-RL) has emerged as a promising paradigm for explainable decision-making, characterized by the interpretability of symbolic policies. For tasks with visual observations, NS-RL entails structured representations for states, but previous algorithms are unable to refine the structured states with reward signals due to a lack of efficiency. Accessibility is also an issue, as extensive domain knowledge is required to interpret current symbolic policies. In this paper, we present a framework that is capable of learning structured states and symbolic policies simultaneously, whose key idea is to overcome the efficiency bottleneck by distilling vision foundation models into a scalable perception module. Moreover, we design a pipeline that uses large language models to generate concise and readable language explanations for policies and decisions. In experiments on nine Atari tasks, our approach demonstrates substantial performance gains over existing NSRL methods. We also showcase explanations for policies and decisions.	翻訳日:2024-03-20 15:12:20 公開日:2024-03-19
# CLIP-VIS: オープン語彙ビデオインスタンスセグメンテーションのためのCLIP適応 CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation ( http://arxiv.org/abs/2403.12455v1 ) ライセンス: Link先を確認	Wenqi Zhu, Jiale Cao, Jin Xie, Shuangming Yang, Yanwei Pang,	(参考訳) Open-vocabularyビデオインスタンスのセグメンテーションは、ビデオ内のオープンなカテゴリに属するインスタンスをセグメンテーションし追跡する。視覚言語モデルであるContrastive Language-Image Pre-Training (CLIP)は,画像レベルのオープン語彙タスクにおいて,ゼロショットの強い分類能力を示した。本稿では,CLIPをオープン語彙ビデオインスタンスセグメンテーションに適応させる,CLIP-VISと呼ばれる簡単なエンコーダデコーダネットワークを提案する。私たちのCLIP-VISは凍結したCLIP画像エンコーダを採用し、クラス非依存マスク生成、時間的トップK強調マッチング、重み付きオープン語彙分類を含む3つのモジュールを導入している。初期クエリのセットが与えられた場合、クラスに依存しないマスク生成では、クエリマスクと対応するオブジェクトスコアとマスクIoUスコアを予測するトランスフォーマーデコーダが使用される。次に、時間的トップK強調マッチングは、最もよく一致したK個のフレームを用いて、フレーム間のクエリマッチングを実行する。最後に、重み付きオープン語彙分類は、まず、マスクプーリングによるクエリビジュアル特徴を生成し、次に、オブジェクトスコアとマスクIoUスコアを用いて重み付き分類を行う。私たちのCLIP-VISは、インスタンスカテゴリやIDのアノテーションを必要としない。様々なビデオインスタンスセグメンテーションデータセットを用いた実験により,特に新規カテゴリにおける提案手法の有効性を実証した。 ConvNeXt-Bをバックボーンとして使用すると、当社のCLIP-VISは、LV-VISデータセットの検証セットにおいてAPとAPnスコアでそれぞれ32.1%と40.3%を達成し、OV2Segをそれぞれ11.0%と24.0%上回ります。ソースコードとモデルは https://github.com/zwq456/CLIP-VIS.git で公開します。 Open-vocabulary video instance segmentation strives to segment and track instances belonging to an open set of categories in a video. The vision-language model Contrastive Language-Image Pre-training (CLIP) has shown strong zero-shot classification ability in image-level open-vocabulary tasks. In this paper, we propose a simple encoder-decoder network, called CLIP-VIS, to adapt CLIP for open-vocabulary video instance segmentation. Our CLIP-VIS adopts a frozen CLIP image encoder and introduces three modules, including class-agnostic mask generation, temporal topK-enhanced matching, and weighted open-vocabulary classification. Given a set of initial queries, class-agnostic mask generation employs a transformer decoder to predict query masks and corresponding object scores and mask IoU scores. Then, temporal topK-enhanced matching performs query matching across frames using the K best-matched frames. 
Finally, weighted open-vocabulary classification first generates query visual features with mask pooling and then performs weighted classification using object scores and mask IoU scores. Our CLIP-VIS does not require annotations of instance categories or identities. Experiments on various video instance segmentation datasets demonstrate the effectiveness of our proposed method, especially on novel categories. When using ConvNeXt-B as the backbone, our CLIP-VIS achieves AP and APn scores of 32.1% and 40.3% on the validation set of the LV-VIS dataset, outperforming OV2Seg by 11.0% and 24.0%, respectively. We will release the source code and models at https://github.com/zwq456/CLIP-VIS.git.	翻訳日:2024-03-20 15:12:20 公開日:2024-03-19
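The last module described above (mask pooling followed by classification weighted by object and mask IoU scores) can be sketched in a few lines. This is only an illustration under assumptions: the abstract does not state how the scores are combined, so the plain product used here, the softmax temperature, and the names `mask_pool` and `weighted_classify` are all hypothetical, and the toy tensors stand in for CLIP features and text embeddings.

```python
import math

def mask_pool(feature_map, mask):
    """Average the feature vectors at pixels where the binary mask is 1.

    feature_map: H x W x C nested lists; mask: H x W nested lists of 0/1.
    """
    channels = len(feature_map[0][0])
    total = [0.0] * channels
    count = 0
    for feat_row, mask_row in zip(feature_map, mask):
        for feat, m in zip(feat_row, mask_row):
            if m:
                count += 1
                for c in range(channels):
                    total[c] += feat[c]
    if count == 0:
        return total  # empty mask: fall back to a zero vector
    return [t / count for t in total]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def weighted_classify(query_feature, text_embeddings, object_score,
                      iou_score, temperature=0.1):
    """Softmax over cosine similarities, scaled by an assumed quality weight."""
    sims = [cosine(query_feature, t) / temperature for t in text_embeddings]
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    weight = object_score * iou_score  # assumed combination rule
    return [weight * e / z for e in exps]

# Toy 2x2 feature map with 2 channels; the mask selects the top row.
feature_map = [[[1.0, 0.0], [1.0, 0.0]],
               [[0.0, 1.0], [0.0, 1.0]]]
mask = [[1, 1], [0, 0]]
text_embeddings = [[1.0, 0.0], [0.0, 1.0]]  # stand-ins for class text features

pooled = mask_pool(feature_map, mask)
scores = weighted_classify(pooled, text_embeddings,
                           object_score=0.9, iou_score=0.8)
print(pooled, scores)
```

Scaling the class probabilities by mask quality means a confident class prediction on a poor mask still receives a low final score, which is one plausible reading of "weighted classification".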
# 訓練可能な特徴サブトラクションを用いたプライバシ保護顔認証 Privacy-Preserving Face Recognition Using Trainable Feature Subtraction ( http://arxiv.org/abs/2403.12457v1 ) ライセンス: Link先を確認	Yuxi Mi, Zhizhou Zhong, Yuge Huang, Jiazhen Ji, Jianqing Xu, Jun Wang, Shaoming Wang, Shouhong Ding, Shuigeng Zhou,	(参考訳) 顔認識の普及によりプライバシーの懸念が高まり、顔画像への不正アクセスは機密性の高い個人情報を暴露する可能性がある。本稿では,閲覧攻撃と復元攻撃に対する顔画像保護について検討する。画像圧縮に着想を得て,元の顔とモデルが生成した再生成画像との間の特徴減算により,視覚的に情報を持たない顔画像を作成することを提案する。画像内の認識可能な特徴は、その高次元の特徴表現上で認識モデルを協調訓練することによって促進される。プライバシーを高めるために、高次元表現はランダムチャネルシャッフルによって作成され、攻撃者が利用可能なテクスチャの詳細を欠いたランダム化認識可能な画像となる。我々は,この手法を新たなプライバシ保護顔認識手法であるMinusFaceに精錬する。実験では、高い認識精度と効果的なプライバシー保護を示す。コードは https://github.com/Tencent/TFace で公開されている。 The widespread adoption of face recognition has led to increasing privacy concerns, as unauthorized access to face images can expose sensitive personal information. This paper explores face image protection against viewing and recovery attacks. Inspired by image compression, we propose creating a visually uninformative face image through feature subtraction between an original face and its model-produced regeneration. Recognizable identity features within the image are encouraged by co-training a recognition model on its high-dimensional feature representation. To enhance privacy, the high-dimensional representation is crafted through random channel shuffling, resulting in randomized recognizable images devoid of attacker-leverageable texture details. We distill our methodologies into a novel privacy-preserving face recognition method, MinusFace. Experiments demonstrate its high recognition accuracy and effective privacy protection. Its code is available at https://github.com/Tencent/TFace.	翻訳日:2024-03-20 15:12:20 公開日:2024-03-19
# 非負のコントラスト学習 Non-negative Contrastive Learning ( http://arxiv.org/abs/2403.12459v1 ) ライセンス: Link先を確認	Yifei Wang, Qi Zhang, Yaoyu Guo, Yisen Wang,	(参考訳) 深い表現は、ブラックボックス方式で下流タスクに転送する際の有望なパフォーマンスを示している。しかし、それらの解釈可能性の欠如は、人間の理解に不透明なことが多いため、依然として大きな課題である。本稿では,解釈可能な特徴の導出を目的とした,非負の行列因子化(NMF)の再興である非負のコントラスト学習(NCL)を提案する。 NCLの力は、NMFがサンプルクラスタと密接に整合する特徴を抽出する能力を思い出させる、特徴に対する非負性制約の実施にある。 NCLは数学的にNMFの目的とよく一致しているだけでなく、NMFの解釈可能性特性も保ち、標準のコントラスト学習(CL)よりも疎で非絡み合った表現をもたらす。理論的には、NCLの識別可能性と下流一般化の保証を確立する。実験的に、これらの利点により、NCLは特徴の非絡み合い、特徴選択、下流分類タスクにおいてCLを大幅に上回ることが示される。最後に,NCLを他の学習シナリオに拡張し,教師付き学習にも役立てることができることを示す。コードは https://github.com/PKU-ML/non_neg で入手できる。 Deep representations have shown promising performance when transferred to downstream tasks in a black-box manner. Yet, their inherent lack of interpretability remains a significant challenge, as these features are often opaque to human understanding. In this paper, we propose Non-negative Contrastive Learning (NCL), a renaissance of Non-negative Matrix Factorization (NMF) aimed at deriving interpretable features. The power of NCL lies in its enforcement of non-negativity constraints on features, reminiscent of NMF's capability to extract features that align closely with sample clusters. NCL not only aligns well mathematically with an NMF objective but also preserves NMF's interpretability attributes, resulting in a more sparse and disentangled representation compared to standard contrastive learning (CL). Theoretically, we establish guarantees on the identifiability and downstream generalization of NCL. Empirically, we show that these advantages enable NCL to outperform CL significantly on feature disentanglement, feature selection, as well as downstream classification tasks. Finally, we show that NCL can be easily extended to other learning scenarios and benefit supervised learning as well. Code is available at https://github.com/PKU-ML/non_neg.	翻訳日:2024-03-20 15:12:20 公開日:2024-03-19
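To make the idea of a non-negativity constraint on contrastive features concrete, the sketch below passes raw encoder outputs through a ReLU before a standard InfoNCE loss. The abstract does not say how the constraint is enforced, so the ReLU is an assumption chosen as one simple option; the function names, the temperature, and the toy vectors are hypothetical as well. For the authors' actual method, see the repository linked in the entry above.

```python
import math

def relu(v):
    # Clamp negative entries to zero so every feature is non-negative.
    return [max(0.0, x) for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else list(v)

def non_negative_features(raw):
    """Assumed constraint: ReLU followed by L2 normalization."""
    return [normalize(relu(v)) for v in raw]

def info_nce(anchors, positives, temperature=0.5):
    """Mean InfoNCE loss; anchors[i] and positives[i] are two views of sample i."""
    losses = []
    for i, a in enumerate(anchors):
        logits = [dot(a, p) / temperature for p in positives]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)

# Toy raw encoder outputs for two samples under two augmentations (assumed values).
raw_a = [[2.0, -1.0, 0.0], [-0.5, 0.0, 3.0]]
raw_b = [[1.5, -2.0, 0.1], [-1.0, 0.2, 2.0]]

features_a = non_negative_features(raw_a)
features_b = non_negative_features(raw_b)
loss = info_nce(features_a, features_b)
print(features_a, loss)
```

In this toy case each sample ends up dominated by a single active coordinate after the ReLU, which hints at why non-negative features tend to be sparse and to line up with sample clusters.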
# リカレントスパイキングニューラルネットワークの不均一学習ダイナミクスのトポロジ的表現 Topological Representations of Heterogeneous Learning Dynamics of Recurrent Spiking Neural Networks ( http://arxiv.org/abs/2403.12462v1 ) ライセンス: Link先を確認	Biswadeep Chakraborty, Saibal Mukhopadhyay,	(参考訳) スパイキングニューラルネットワーク(SNN)は神経科学と人工知能において重要なパラダイムとなり、脳にインスパイアされた計算を提供している。近年,深層ニューラルネットワークのネットワーク表現について研究が進められている。しかし、SNNが学習した表現の研究はほとんど行われておらず、特にスパイクタイミング依存可塑性(STDP)のような教師なしの局所学習手法を用いている。最近の研究は、Representation Topology Divergence (RTD)と呼ばれる学習表現のトポロジカルマッピングを比較する新しい手法を導入している。有用ではあるが、この方法は特にフィードフォワードディープニューラルネットワーク向けに設計されており、リカレントSNN(RSNN)のようなリカレントネットワークには使用できない。本稿では,学習方法の異なるRSNNモデルの分散表現の違いを測定するために,RTDを用いた新しい手法を提案する。フィードフォワード・オートエンコーダネットワークとスキップ接続を併用した新しいRSNNの再構成を提案し,再帰型ネットワークのRTD計算に役立てる。そこで本研究では,STDPを用いて訓練したRSNNの学習能力と,そのような表現の学習におけるシナプス力学における不均一性の役割について検討する。我々は、RSNNにおける異種STDPが、等質で代理的な勾配に基づく教師付き学習と異なる表現をもたらすことを示した。この結果は、より効率的で生物学的に可能なハイブリッド人工知能システムの開発を支援するため、異種SNNモデルの可能性についての洞察を提供する。 Spiking Neural Networks (SNNs) have become an essential paradigm in neuroscience and artificial intelligence, providing brain-inspired computation. Recent advances in the literature have studied the network representations of deep neural networks. However, there has been little work that studies representations learned by SNNs, especially using unsupervised local learning methods like spike-timing dependent plasticity (STDP). Recent work by Barannikov et al. (2021) has introduced a novel method to compare topological mappings of learned representations called Representation Topology Divergence (RTD). Though useful, this method is engineered particularly for feedforward deep neural networks and cannot be used for recurrent networks like Recurrent SNNs (RSNNs). This paper introduces a novel methodology to use RTD to measure the difference between distributed representations of RSNN models with different learning methods. 
We propose a novel reformulation of RSNNs using feedforward autoencoder networks with skip connections to help us compute the RTD for recurrent networks. Thus, we investigate the learning capabilities of RSNNs trained using STDP and the role of heterogeneity in the synaptic dynamics in learning such representations. We demonstrate that heterogeneous STDP in RSNNs yields representations distinct from those of their homogeneous and surrogate gradient-based supervised learning counterparts. Our results provide insights into the potential of heterogeneous SNN models, aiding the development of more efficient and biologically plausible hybrid artificial intelligence systems.	翻訳日:2024-03-20 15:12:20 公開日:2024-03-19
# Few-Shotオブジェクトローカライゼーション Few-shot Object Localization ( http://arxiv.org/abs/2403.12466v1 ) ライセンス: Link先を確認	Yunhan Ren, Bo Li, Chengyang Zhang, Yong Zhang,	(参考訳) 既存のFew-Shotオブジェクトカウントタスクは、画像中のオブジェクトの数を定量化することに集中し、正確な位置情報を無視する。本稿では,この研究ギャップを埋めるため,Few-Shot Object Localization (FSOL) の新たな課題を紹介した。本課題は、少数のラベル付きサポートサンプルを利用して、対応する画像内のオブジェクトの位置情報をクエリすることで、一般化されたオブジェクトのローカライゼーションを実現する。そこで本研究では,革新的な高性能ベースラインモデルを提案する。本モデルでは,特徴マップと問合せ画像の関連性を探るための自己問合せモジュールとともに,サポート画像と問合せ画像間の形状関連と勾配差を高めるために,デュアルパス機能拡張モジュールを統合した。実験の結果,FSOLタスクにおけるアプローチの大幅な性能向上が示され,さらなる研究のための効率的なベンチマークが確立された。 Existing few-shot object counting tasks primarily focus on quantifying the number of objects in an image, neglecting precise positional information. To bridge this research gap, this paper introduces the novel task of Few-Shot Object Localization (FSOL), which aims to provide accurate object positional information. This task achieves generalized object localization by leveraging a small number of labeled support samples to query the positional information of objects within corresponding images. To advance this research field, we propose an innovative high-performance baseline model. Our model integrates a dual-path feature augmentation module to enhance shape association and gradient differences between support and query images, alongside a self-query module designed to explore the association between feature maps and query images. Experimental results demonstrate a significant performance improvement of our approach in the FSOL task, establishing an efficient benchmark for further research.	翻訳日:2024-03-20 15:12:20 公開日:2024-03-19
# CrossTune:ラベル拡張によるブラックボックスのFew-Shot分類 CrossTune: Black-Box Few-Shot Classification with Label Enhancement ( http://arxiv.org/abs/2403.12468v1 ) ライセンス: Link先を確認	Danqing Luo, Chen Zhang, Yan Zhang, Haizhou Li,	(参考訳) 大規模言語モデル(LLM)の訓練や微調整には、かなりの計算資源が必要である。ひとつのアプローチは、これらのモデルをブラックボックスとして扱い、フォワードパス(推論API)を使用してそれらと対話することです。現在の研究は、勾配のないプロンプト最適化を用いて、これらのブラックボックスモデルを下流タスクに適応させることに重点を置いているが、これはタスク固有のプロンプトを探索する高価なプロセスを伴うことが多い。そこで我々は,ブラックボックス言語モデルへの適応を即時検索なしで研究する動機付けを行った。具体的には、入力テキストシーケンスとタスク固有のラベル記述とのセマンティックな関連性をモデル化する、CrossTuneというラベル強化型クロスアテンションネットワークを提案する。その有効性は、少数ショットテキスト分類の文脈で検証される。 CrossTuneの一般化を改善するために、ChatGPTを使用して、コンテキスト内学習を通じて追加のトレーニングデータを生成する。低品質のChatGPT生成データを除外するためにスイッチ機構が実装されている。 7つのベンチマークテキスト分類データセットの広範な実験を通して,提案手法が従来の勾配なしブラックボックスチューニング手法を平均5.7%上回っていることを示す。 ChatGPTを付加したデータを使用しなくても、CrossTuneは従来のブラックボックスチューニング手法よりも良い、あるいはコンパラブルに動作し、我々のアプローチの有効性を示唆している。 Training or finetuning large-scale language models (LLMs) requires substantial computation resources, motivating recent efforts to explore parameter-efficient adaptation to downstream tasks. One approach is to treat these models as black boxes and use forward passes (Inference APIs) to interact with them. Current research focuses on adapting these black-box models to downstream tasks using gradient-free prompt optimization, but this often involves an expensive process of searching task-specific prompts. Therefore, we are motivated to study black-box language model adaptation without prompt search. Specifically, we introduce a label-enhanced cross-attention network called CrossTune, which models the semantic relatedness between the input text sequence and task-specific label descriptions. Its effectiveness is examined in the context of few-shot text classification. To improve the generalization of CrossTune, we utilize ChatGPT to generate additional training data through in-context learning. A switch mechanism is implemented to exclude low-quality ChatGPT-generated data. 
Through extensive experiments on seven benchmark text classification datasets, we demonstrate that our proposed approach outperforms the previous state-of-the-art gradient-free black-box tuning method by 5.7% on average. Even without using ChatGPT-augmented data, CrossTune performs better than or comparably to previous black-box tuning methods, suggesting the effectiveness of our approach.	翻訳日:2024-03-20 15:12:20 公開日:2024-03-19
# 皮肉認識に「より多くのコンテキスト」はいつ役立つのか? When Do "More Contexts" Help with Sarcasm Recognition? ( http://arxiv.org/abs/2403.12469v1 ) ライセンス: Link先を確認	Ojas Nimase, Sanghyun Hong,	(参考訳) 皮肉の認識は、単語の文字通りの意味とは正反対、あるいは異なる真の意図を理解する必要があるため、困難である。これまでの研究は、モデルによりリッチなコンテキスト、例えば、感情や文化的ニュアンスを提供する一連のメソッドを開発することで、この問題に対処してきた。個別に有効であることが示されているが、その集団的効果を体系的に評価する研究は行われていない。結果として、追加の文脈が皮肉認識をどの程度改善できるかは、まだ不明である。本研究では、モデルにより多くのコンテキストを組み込むことによって、既存のメソッドがもたらす改善について検討する。この目的のために、複数のコンテキストキューを統合し、異なるアプローチをテストするためのフレームワークを開発する。 3つの皮肉認識ベンチマークに対する4つのアプローチによる評価では、既存の最先端性能を実現し、さらにコンテキストを逐次追加する利点を示す。さらに、より優れた結果の追求において、モデルが社会的バイアスを採用する必要があることを強調して、より多くのコンテキストを使用することの固有の欠点を特定します。 Sarcasm recognition is challenging because it needs an understanding of the true intention, which is opposite to or different from the literal meaning of the words. Prior work has addressed this challenge by developing a series of methods that provide richer contexts, e.g., sentiment or cultural nuances, to models. While shown to be effective individually, no study has systematically evaluated their collective effectiveness. As a result, it remains unclear to what extent additional contexts can improve sarcasm recognition. In this work, we explore the improvements that existing methods bring by incorporating more contexts into a model. To this end, we develop a framework where we can integrate multiple contextual cues and test different approaches. In evaluation with four approaches on three sarcasm recognition benchmarks, we achieve existing state-of-the-art performances and also demonstrate the benefits of sequentially adding more contexts. We also identify inherent drawbacks of using more contexts, highlighting that in the pursuit of even better results, the model may need to adopt societal biases.	翻訳日:2024-03-20 15:12:20 公開日:2024-03-19
# SC-Diff:潜在拡散モデルを用いた3次元形状補完 SC-Diff: 3D Shape Completion with Latent Diffusion Models ( http://arxiv.org/abs/2403.12470v1 ) ライセンス: Link先を確認	Juan D. Galvis, Xingxing Zuo, Simon Schaefer, Stefan Leutenegger,	(参考訳) 本稿では, 部分的な3次元スキャンから, TSDF(Truncated Signed Distance Function)として表現される形状の完備化に最適化された3次元潜在拡散モデルを用いた3次元形状完備化手法を提案する。本手法は,空間的コンディショニングとクロスアテンションによる画像ベースコンディショニングを,キャプチャー部分スキャンからの3次元特徴の統合により組み合わせたものである。このデュアルガイダンスにより、高忠実でリアルな形状を優れた解像度で実現することができる。提案手法のコアとなるのは,2次元潜伏拡散モデルにインスパイアされたオートエンコーダを用いた低次元潜伏空間への3次元データの圧縮である。この圧縮により、高解像度形状の処理が容易になり、複数のオブジェクトクラスにまたがってモデルを適用できます。我々は,形状完備化の分野での2つの一般的なベンチマークに対するアプローチを検証し,精度とリアリズムの両面での競争性能を実証し,全てのオブジェクトクラスに対して単一のモデルで高解像度で動作しながら,最先端の手法に匹敵する性能を実証した。本稿では,本モデルに対する包括的評価を行い,未確認オブジェクトクラスにおいても多様な形状完備化課題に対処する上での有効性を示す。コードは受理時にリリースされます。 This paper introduces a 3D shape completion approach using a 3D latent diffusion model optimized for completing shapes, represented as Truncated Signed Distance Functions (TSDFs), from partial 3D scans. Our method combines image-based conditioning through cross-attention and spatial conditioning through the integration of 3D features from captured partial scans. This dual guidance enables high-fidelity, realistic shape completions at superior resolutions. At the core of our approach is the compression of 3D data into a low-dimensional latent space using an auto-encoder inspired by 2D latent diffusion models. This compression facilitates the processing of higher-resolution shapes and allows us to apply our model across multiple object classes, a significant improvement over other existing diffusion-based shape completion methods, which often require a separate diffusion model for each class. We validated our approach against two common benchmarks in the field of shape completion, demonstrating competitive performance in terms of accuracy and realism and performing on par with state-of-the-art methods despite operating at a higher resolution with a single model for all object classes. 
We present a comprehensive evaluation of our model, showcasing its efficacy in handling diverse shape completion challenges, even on unseen object classes. The code will be released upon acceptance.	翻訳日:2024-03-20 15:12:20 公開日:2024-03-19
# PostoMETRO:ロバストな3次元メッシュ回復のためのポーズトークン強化メッシュトランス PostoMETRO: Pose Token Enhanced Mesh Transformer for Robust 3D Human Mesh Recovery ( http://arxiv.org/abs/2403.12473v1 ) ライセンス: Link先を確認	Wendi Yang, Zihang Jiang, Shang Zhao, S. Kevin Zhou,	(参考訳) シングルイメージベースのヒューマンメッシュリカバリの最近の進歩により、モデル全体の正確性を維持しながら、閉塞のような極端なシナリオにおけるパフォーマンス向上への関心が高まっている。隠蔽下で正確に注釈付けされた3Dポーズを得るのは難しいが、それでも活用できるリッチで正確な2Dポーズアノテーションが豊富にある。しかし、既存の研究は主に2Dポーズ座標を直接活用して3Dポーズとメッシュを推定することに焦点を当てている。本稿では, PostoMETRO($\textbf{Pos}$e $\textbf{to}$ken enhanced $\textbf{ME}$sh $\textbf{TR}$ansf$\textbf{O}$rmer)を提案する。特殊なポーズトークンライザを用いることで、2Dのポーズデータをコンパクトなポーズトークン列に効率的にコンデンスし、画像トークンとともにトランスフォーマーに供給する。このプロセスは、画像からテクスチャの豊かな描写を確実にするだけでなく、ポーズと画像情報の堅牢な統合を促進する。その後、これらの組み合わせトークンは頂点とジョイントトークンによってクエリされ、メッシュ頂点と人間の関節の3D座標をデコードする。頑健なポーズトークン表現と効果的な組み合わせによって達成された私たちは、閉塞のような極端なシナリオの下でも、より正確な3D座標を生成することができる。標準およびオクルージョン固有のベンチマークの実験では、PostoMETROの有効性が示されている。質的な結果は、どのように2Dポーズが3D再構築に役立つかをより明確に示している。コードは利用可能になる。 With the recent advancements in single-image-based human mesh recovery, there is a growing interest in enhancing its performance in certain extreme scenarios, such as occlusion, while maintaining overall model accuracy. Although obtaining accurately annotated 3D human poses under occlusion is challenging, there is still a wealth of rich and precise 2D pose annotations that can be leveraged. However, existing works mostly focus on directly leveraging 2D pose coordinates to estimate 3D pose and mesh. In this paper, we present PostoMETRO($\textbf{Pos}$e $\textbf{to}$ken enhanced $\textbf{ME}$sh $\textbf{TR}$ansf$\textbf{O}$rmer), which integrates occlusion-resilient 2D pose representation into transformers in a token-wise manner. Utilizing a specialized pose tokenizer, we efficiently condense 2D pose data to a compact sequence of pose tokens and feed them to the transformer together with the image tokens. 
This process not only ensures a rich depiction of texture from the image but also fosters a robust integration of pose and image information. Subsequently, these combined tokens are queried by vertex and joint tokens to decode 3D coordinates of mesh vertices and human joints. Facilitated by the robust pose token representation and the effective combination, we are able to produce more precise 3D coordinates, even under extreme scenarios like occlusion. Experiments on both standard and occlusion-specific benchmarks demonstrate the effectiveness of PostoMETRO. Qualitative results further illustrate how 2D pose can help 3D reconstruction. Code will be made available.	翻訳日:2024-03-20 15:12:20 公開日:2024-03-19
# FairSIN: 知覚情報中立化によるグラフニューラルネットワークの公平性の実現 FairSIN: Achieving Fairness in Graph Neural Networks through Sensitive Information Neutralization ( http://arxiv.org/abs/2403.12474v1 ) ライセンス: Link先を確認	Cheng Yang, Jixi Liu, Yunhe Yan, Chuan Shi,	(参考訳) 他の機械学習モデルと同様に、グラフ構造化データのモデリングにおいてグラフニューラルネットワーク(GNN)が顕著に成功しているにもかかわらず、GNNは人種や性別などのセンシティブな属性に基づいてバイアス付き予測を行うことができる。公平性を考慮した最近のSOTA(State-of-the-art)手法では,入力や表現,例えばエッジドロップや特徴マスキングからセンシティブな情報をフィルタリングする手法が提案されている。しかし、このようなフィルタリングベースの戦略は、いくつかの非感度な特徴情報をフィルタリングする可能性もあり、予測性能と公平性の間の準最適トレードオフにつながる。この問題に対処するため、我々は、メッセージパッシング前のノード機能や表現にF3(Fairness-facilitating Features)を追加する、革新的な中和ベースのパラダイムを公表した。 F3はノード表現の感度バイアスを統計的に中和し、追加の非感度情報を提供すると期待されている。また、F3は各ノードの異質な隣人(異なる感度特性を持つ隣人)の特徴を強調することで実現可能であると結論付ける。提案手法をFairSINと命名し,データ中心とモデル中心の両方の観点から,実装のバリエーションを3つ提示する。 3つの異なるGNNバックボーンを持つ5つのベンチマークデータセットの実験結果から、FairSINは高い予測精度を維持しながら、フェアネスの指標を大幅に改善することが示された。 Despite the remarkable success of graph neural networks (GNNs) in modeling graph-structured data, like other machine learning models, GNNs are also susceptible to making biased predictions based on sensitive attributes, such as race and gender. For fairness consideration, recent state-of-the-art (SOTA) methods propose to filter out sensitive information from inputs or representations, e.g., edge dropping or feature masking. However, we argue that such filtering-based strategies may also filter out some non-sensitive feature information, leading to a sub-optimal trade-off between predictive performance and fairness. To address this issue, we unveil an innovative neutralization-based paradigm, where additional Fairness-facilitating Features (F3) are incorporated into node features or representations before message passing. The F3 are expected to statistically neutralize the sensitive bias in node representations and provide additional nonsensitive information. 
We also provide theoretical explanations for our rationale, concluding that F3 can be realized by emphasizing the features of each node's heterogeneous neighbors (neighbors with different sensitive attributes). We name our method FairSIN, and present three implementation variants from both data-centric and model-centric perspectives. Experimental results on five benchmark datasets with three different GNN backbones show that FairSIN significantly improves fairness metrics while maintaining high prediction accuracies.	翻訳日:2024-03-20 15:12:20 公開日:2024-03-19
# TT-BLIP:BLIPとTri-Transformerを用いたフェイクニュース検出 TT-BLIP: Enhancing Fake News Detection Using BLIP and Tri-Transformer ( http://arxiv.org/abs/2403.12481v1 ) ライセンス: Link先を確認	Eunjee Choi, Jong-Kook Kim,	(参考訳) 偽ニュースを検出することには多くの注目を集めている。従来の多くの手法は独立してアンモダルデータを符号化し、統合マルチモーダル情報の利点を無視していた。また、テキストや画像の特殊特徴抽出がないため、これらの方法はさらに制限される。本稿では,3種類の情報に対して, BERT と BLIP\textsubscript{Txt} と ResNet と BLIP\textsubscript{Img} の3種類の情報に対して BLIP と BLIP\textsubscript{Txt} のブートストラップ言語前処理を適用した TT-BLIP というエンドツーエンドモデルを提案する。マルチモーダル・トリ・トランスフォーマーは3種類のマルチヘッドアテンション機構を用いてトリモーダル特徴を融合し、拡張表現のための統合モーダル性を確保し、マルチモーダルデータ解析を改善した。実験は、WeiboとGossipcopという2つのフェイクニュースデータセットを使って行われる。その結果,TT-BLIPは最先端モデルよりも優れていた。 Detecting fake news has received a lot of attention. Many previous methods concatenate independently encoded unimodal data, ignoring the benefits of integrated multimodal information. Also, the absence of specialized feature extraction for text and images further limits these methods. This paper introduces an end-to-end model called TT-BLIP that applies the bootstrapping language-image pretraining for unified vision-language understanding and generation (BLIP) for three types of information: BERT and BLIP\textsubscript{Txt} for text, ResNet and BLIP\textsubscript{Img} for images, and bidirectional BLIP encoders for multimodal information. The Multimodal Tri-Transformer fuses tri-modal features using three types of multi-head attention mechanisms, ensuring integrated modalities for enhanced representations and improved multimodal data analysis. The experiments are performed using two fake news datasets, Weibo and Gossipcop. The results indicate TT-BLIP outperforms the state-of-the-art models.	翻訳日:2024-03-20 15:12:20 公開日:2024-03-19
# LLMエージェントが組織化されたチームで協力することを学ぶ Embodied LLM Agents Learn to Cooperate in Organized Teams ( http://arxiv.org/abs/2403.12482v1 ) ライセンス: Link先を確認	Xudong Guo, Kaixuan Huang, Jiale Liu, Wenhui Fan, Natalia Vélez, Qingyun Wu, Huazheng Wang, Thomas L. Griffiths, Mengdi Wang,	(参考訳) 大規模言語モデル (LLM) は推論、計画、意思決定のための統合的なツールとして登場し、その広範な世界的知識と言語関連タスクの習熟度に基づいている。したがって、LLMは協力を促進するために多エージェントシステム内での自然言語の相互作用に大きな可能性を秘めている。しかし、LSMエージェントは過剰に報告し、いかなる命令にも従う傾向にあり、情報冗長性とマルチエージェント協調の混乱をもたらす可能性がある。人的組織にインスパイアされた本論文では,LLMエージェントに即時的な組織構造を課し,これらの問題を緩和する枠組みを提案する。本研究は, LLMエージェントを具体化して実施した一連の実験を通じて, LLMエージェントが提示するリーダーシップの質と自発的協調行動に光を当てることにより, チームの効率性に及ぼすリーダーシップの影響を明らかにする。さらに、LCMの可能性を生かして、Criticize-Reflectプロセスを通じて組織的プロンプトの強化を提案し、その結果、コミュニケーションコストを削減し、チームの効率を向上する新たな組織構造が生まれる。 Large Language Models (LLMs) have emerged as integral tools for reasoning, planning, and decision-making, drawing upon their extensive world knowledge and proficiency in language-related tasks. LLMs thus hold tremendous potential for natural language interaction within multi-agent systems to foster cooperation. However, LLM agents tend to over-report and comply with any instruction, which may result in information redundancy and confusion in multi-agent cooperation. Inspired by human organizations, this paper introduces a framework that imposes prompt-based organization structures on LLM agents to mitigate these problems. Through a series of experiments with embodied LLM agents and human-agent collaboration, our results highlight the impact of designated leadership on team efficiency, shedding light on the leadership qualities displayed by LLM agents and their spontaneous cooperative behaviors. Further, we harness the potential of LLMs to propose enhanced organizational prompts, via a Criticize-Reflect process, resulting in novel organization structures that reduce communication costs and enhance team efficiency.	翻訳日:2024-03-20 15:02:36 公開日:2024-03-19
# 顔画像からの年齢・性別分類のためのハイブリッドトランスフォーマーシーケンサアプローチ A Hybrid Transformer-Sequencer approach for Age and Gender classification from in-wild facial images ( http://arxiv.org/abs/2403.12483v1 ) ライセンス: Link先を確認	Aakash Singh, Vivek Kumar Singh,	(参考訳) コンピュータビジョンと画像処理技術の進歩は、視覚監視、ターゲット広告、コンテンツベースの検索、人間とコンピュータのインタラクションといった分野における新しい応用の出現につながっている。コンピュータビジョンの様々な技術のうち、特に顔分析は注目されている。これまでのいくつかの研究では、年齢や性別の分類など、さまざまなタスクに対する顔の特徴処理の様々な応用を探究してきた。しかし、これまでいくつかの研究がこの問題を調査してきたが、人間の顔の年齢と性別の分類は、現実世界の応用に必要とされる精度のレベルを達成するには程遠い。そこで本稿では,年齢と性別の分類問題に対する自己意識とBiLSTMアプローチを組み合わせたハイブリッドモデルを提案し,このギャップを埋めようとしている。提案したモデルの性能は、これまでに提案されたいくつかの最先端モデルと比較される。提案モデルでは, 年齢と性別の分類のための最先端実装に比べて, 約10%, 6%の改善が注目されている。提案したモデルは優れた性能を達成し,より一般化された学習を提供する。したがって、このモデルは様々な画像処理やコンピュータビジョン問題において、コア分類コンポーネントとして適用することができる。 The advancements in computer vision and image processing techniques have led to the emergence of new applications in domains such as visual surveillance, targeted advertisement, content-based searching, and human-computer interaction. Out of the various techniques in computer vision, face analysis, in particular, has gained much attention. Several previous studies have tried to explore different applications of facial feature processing for a variety of tasks, including age and gender classification. However, despite several previous studies having explored the problem, the age and gender classification of in-wild human faces is still far from achieving the levels of accuracy required for real-world applications. This paper, therefore, attempts to bridge this gap by proposing a hybrid model that combines self-attention and BiLSTM approaches for age and gender classification problems. The proposed model's performance is compared with several state-of-the-art models proposed so far. Improvements of approximately 10% and 6% over the state-of-the-art implementations for age and gender classification, respectively, are noted for the proposed model. 
The proposed model thus achieves superior performance and provides more generalized learning. The model can, therefore, be applied as a core classification component in various image processing and computer vision problems.	翻訳日:2024-03-20 15:02:36 公開日:2024-03-19
# NTK-Guided Few-Shot Class Incremental Learning NTK-Guided Few-Shot Class Incremental Learning ( http://arxiv.org/abs/2403.12486v1 ) ライセンス: Link先を確認	Jingren Liu, Zhong Ji, Yanwei Pang, YunLong Yu,	(参考訳) 反アムネシアのFSCIL学習者は、しばしばインクリメンタルセッションに優れるが、彼らは、知識獲得のモデルの可能性を活用することよりも、知識獲得の軽減を優先する傾向にある。本稿では、ニューラルタンジェントカーネル(NTK)のレンズを用いて、FSCILにおけるモデル一般化の基礎を掘り下げる。我々の主設計は、最適NTK収束とNTK関連一般化誤差の確保に重点を置いており、例外的一般化の理論的基盤として機能している。 NTKのグローバルな収束を実現するため,拡張ネットワーク内での最適化プロセスを導くために,数学的原理に基づくメタ学習機構を用いる。さらに,NTK関連一般化誤差を低減するため,その一般化損失を構成する要因を最適化し,基礎レベルから始める。具体的には,初期ネットワークの重みを形作るために,ベースセッションで自己指導型事前学習を開始する。その後、曲線アライメントにより慎重に洗練され、続いて、畳み込み層と線形層の両方に特化して2つのNTK正規化が適用される。これらの効果を組み合わせることで、ネットワークは堅牢なNTK特性を取得し、基礎的な一般化を著しく強化する。一般的なFSCILベンチマークデータセットでは、NTK-FSCILは現代の最先端のアプローチを超越し、エンドセッション精度を2.9%から8.7%向上させた。 While anti-amnesia FSCIL learners often excel in incremental sessions, they tend to prioritize mitigating knowledge attrition over harnessing the model's potential for knowledge acquisition. In this paper, we delve into the foundations of model generalization in FSCIL through the lens of the Neural Tangent Kernel (NTK). Our primary design focus revolves around ensuring optimal NTK convergence and NTK-related generalization error, serving as the theoretical bedrock for exceptional generalization. To attain globally optimal NTK convergence, we employ a meta-learning mechanism grounded in mathematical principles to guide the optimization process within an expanded network. Furthermore, to reduce the NTK-related generalization error, we commence from the foundational level, optimizing the relevant factors constituting its generalization loss. Specifically, we initiate self-supervised pre-training on the base session to shape the initial network weights. Then they are carefully refined through curricular alignment, followed by the application of dual NTK regularization tailored specifically for both convolutional and linear layers. 
Through the combined effects of these measures, our network acquires robust NTK properties, significantly enhancing its foundational generalization. On popular FSCIL benchmark datasets, our NTK-FSCIL surpasses contemporary state-of-the-art approaches, elevating end-session accuracy by 2.9% to 8.7%.	翻訳日:2024-03-20 15:02:36 公開日:2024-03-19
# DetToolChain:MLLMのアンリーシュ検出機能のための新しいプロンプトパラダイム DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM ( http://arxiv.org/abs/2403.12488v1 ) ライセンス: Link先を確認	Yixuan Wu, Yizhou Wang, Shixiang Tang, Wenhao Wu, Tong He, Wanli Ouyang, Jian Wu, Philip Torr,	(参考訳) 本稿では,GPT-4V や Gemini などのマルチモーダル大規模言語モデル (MLLM) のゼロショットオブジェクト検出能力を解き放つために,新しいプロンプトパラダイムである DetToolChain を提案する。提案手法は,高精度検出にヒントを得た検出プロンプトツールキットと,これらのプロンプトを実装するための新しいChain-of-Thoughtから構成される。特に、ツールキットのプロンプトは、MLLMが地域情報(例えば、ズームイン)に集中するように誘導し、測定基準(例えば、オーバレイの定規とコンパス)に従って座標を読み、コンテキスト情報(例えば、シーングラフのオーバーレイ)から推測するように設計されている。これらのツールに基づいて、新しい検出チェーンはタスクを単純なサブタスクに自動的に分解し、予測を診断し、プログレッシブボックスの改良計画を立てる。本フレームワークの有効性は,検出タスク,特にハードケースの幅広い範囲で実証されている。既存の最先端の手法と比較して、GPT-4VとDetToolChainは、MS COCO上の最先端オブジェクト検出器を+21.5%改善し、オープン語彙検出のための新しいクラスセット +24.23% Acc on RefCOCO val set for zero-shot Reference Expression comprehension, +14.5% AP on D-cube describe object detection FULL setting。 We present DetToolChain, a novel prompting paradigm, to unleash the zero-shot object detection ability of multimodal large language models (MLLMs), such as GPT-4V and Gemini. Our approach consists of a detection prompting toolkit inspired by high-precision detection priors and a new Chain-of-Thought to implement these prompts. Specifically, the prompts in the toolkit are designed to guide the MLLM to focus on regional information (e.g., zooming in), read coordinates according to measure standards (e.g., overlaying rulers and compasses), and infer from the contextual information (e.g., overlaying scene graphs). Building upon these tools, the new detection chain-of-thought can automatically decompose the task into simple subtasks, diagnose the predictions, and plan for progressive box refinements. The effectiveness of our framework is demonstrated across a spectrum of detection tasks, especially hard cases. 
Compared to existing state-of-the-art methods, GPT-4V with our DetToolChain improves state-of-the-art object detectors by +21.5% AP50 on MS COCO Novel class set for open-vocabulary detection, +24.23% Acc on RefCOCO val set for zero-shot referring expression comprehension, +14.5% AP on D-cube describe object detection FULL setting.	翻訳日:2024-03-20 15:02:36 公開日:2024-03-19
# 遺伝的にプログラム可能な光ランダムニューラルネットワーク Genetically programmable optical random neural networks ( http://arxiv.org/abs/2403.12490v1 ) ライセンス: Link先を確認	Bora Çarpınlıoğlu, Bahrem Serhat Daniş, Uğur Teğin,	(参考訳) 今日では、機械学習ツール、特に人工知能は、多様なアプリケーションに欠かせない存在になっている。しかし、ニューラルネットワークをトレーニングし、デプロイする現在のデジタルコンピューティングツールは、大規模なデータサイズと高消費電力に悩まされることが多い。光コンピューティングは本質的に並列性を提供し、受動光学部品で基本的な操作を行う。しかし、ほとんどの光学コンピューティングプラットフォームは、複雑でセンシティブな技術を避けながら、固定接続による機械学習タスクの精度が比較的低い。本稿では,遺伝的にプログラム可能な単純な光学ニューラルネットワークを実演し,光学的ランダムプロジェクションによる高い性能を実現する。ランダムなプロジェクションカーネルとして機能し、検索空間の1%しか使用しない散乱媒体の向きを遺伝的にプログラミングすることにより、新しい手法は最適なカーネルを見つけ、様々な機械学習タスクに対する初期試験精度を7-22%改善する。提案手法は,シンプルでスケーラブルな設計で,光ニューラルネットワークの高性能化を実現するための有望な手法である。 Today, machine learning tools, particularly artificial neural networks, have become crucial for diverse applications. However, current digital computing tools to train and deploy artificial neural networks often struggle with massive data sizes and high power consumption. Optical computing provides inherent parallelism and performs fundamental operations with passive optical components. However, most optical computing platforms suffer from relatively low accuracies for machine learning tasks due to fixed connections while avoiding complex and sensitive techniques. Here, we demonstrate a genetically programmable yet simple optical neural network to achieve high performance with optical random projection. By genetically programming the orientation of the scattering medium, which acts as a random projection kernel, and using only 1% of the search space, our novel technique finds an optimum kernel and improves its initial test accuracies by 7-22% for various machine learning tasks. Our optical computing method presents a promising approach to achieve high performance in optical neural networks with a simple and scalable design.	翻訳日:2024-03-20 15:02:36 公開日:2024-03-19
# ディープニューラルネットワークとスキャンパス分類のための訓練可能な機能外命令モジュール A Trainable Feature Extractor Module for Deep Neural Networks and Scanpath Classification ( http://arxiv.org/abs/2403.12493v1 ) ライセンス: Link先を確認	Wolfgang Fuhl,	(参考訳) スキャンパス分類(Scanpath classification)は、医学、製造、および様々な領域の学生のための訓練システムに応用できる眼球追跡研究の分野である。本稿では,ディープニューラルネットワークのためのトレーニング可能な特徴抽出モジュールを提案する。このモジュールの目的は、ディープニューラルネットワークアーキテクチャに直接使用可能な機能ベクトルにスキャンパスを変換することである。ディープニューラルネットワークのバックプロパゲートエラーに基づいて、特徴抽出モジュールはそのパラメータを適応させ、分類性能を向上させる。したがって,我々の特徴抽出モジュールはディープニューラルネットワークと共同で訓練可能である。この特徴抽出モジュールの動機は、古典的なヒストグラムに基づくアプローチに基づいており、通常はスキャンパス上の分布を計算する。モジュールを3つの公開データセットで評価し,最先端のアプローチと比較した。 Scanpath classification is an area in eye tracking research with possible applications in medicine, manufacturing, and training systems for students in various domains. In this paper, we propose a trainable feature extraction module for deep neural networks. The purpose of this module is to transform a scanpath into a feature vector that is directly usable by the deep neural network architecture. Based on the backpropagated error of the deep neural network, the feature extraction module adapts its parameters to improve the classification performance. Therefore, our feature extraction module is jointly trainable with the deep neural network. The motivation for this feature extraction module comes from classical histogram-based approaches, which usually compute distributions over a scanpath. We evaluated our module on three public datasets and compared it to state-of-the-art approaches.	翻訳日:2024-03-20 15:02:36 公開日:2024-03-19
# 一般画像融合用適応器のタスクカスタマイズ混合 Task-Customized Mixture of Adapters for General Image Fusion ( http://arxiv.org/abs/2403.12494v1 ) ライセンス: Link先を確認	Pengfei Zhu, Yang Sun, Bing Cao, Qinghua Hu,	(参考訳) 一般画像融合は、マルチソース画像から重要な情報を統合することを目的としている。しかし、タスク間の大きなギャップのため、それぞれの融合機構は実際に大きく変化し、サブタスク間での性能が制限される。この問題に対処するために,汎用画像融合のための新しいタスクカストマイズド・ミックス(TC-MoA)を提案し,統一モデルにおける様々な融合タスクを適応的に促進する。我々は、専門家(MoE)の混在から洞察を借り、専門家を効果的なチューニングアダプタとして捉え、事前訓練された基礎モデルを促す。これらのアダプタは異なるタスク間で共有され、相互情報の規則化によって制約される。タスク固有のルーティングネットワークは、これらのアダプタをカスタマイズして、動的に支配的な強度で異なるソースからタスク固有の情報を抽出し、適応的な視覚的特徴のプロンプト融合を実行する。特に、我々のTC-MoAは、異なる核融合タスクに対する支配的な強度バイアスを制御し、単一のモデルで複数の核融合タスクを統一することに成功した。 TC-MoAは、一般画像融合(マルチモーダル、マルチ露光、マルチフォーカス)の互換性を維持しつつ、共通性を学ぶための競合するアプローチよりも優れており、さらに、より一般化実験において顕著な制御性を示している。コードはhttps://github.com/YangSun22/TC-MoA で公開されている。 General image fusion aims at integrating important information from multi-source images. However, due to the significant cross-task gap, the respective fusion mechanism varies considerably in practice, resulting in limited performance across subtasks. To handle this problem, we propose a novel task-customized mixture of adapters (TC-MoA) for general image fusion, adaptively prompting various fusion tasks in a unified model. We borrow the insight from the mixture of experts (MoE), taking the experts as efficient tuning adapters to prompt a pre-trained foundation model. These adapters are shared across different tasks and constrained by mutual information regularization, ensuring compatibility with different tasks while complementarity for multi-source images. The task-specific routing networks customize these adapters to extract task-specific information from different sources with dynamic dominant intensity, performing adaptive visual feature prompt fusion. Notably, our TC-MoA controls the dominant intensity bias for different fusion tasks, successfully unifying multiple fusion tasks in a single model. 
Extensive experiments show that TC-MoA outperforms the competing approaches in learning commonalities while retaining compatibility for general image fusion (multi-modal, multi-exposure, and multi-focus), and also demonstrates striking controllability in further generalization experiments. The code is available at https://github.com/YangSun22/TC-MoA .	翻訳日:2024-03-20 15:02:36 公開日:2024-03-19
# 一貫性を考慮した対話システムのためのモデル生成コントラクタ応答の大規模コレクション A Large Collection of Model-generated Contradictory Responses for Consistency-aware Dialogue Systems ( http://arxiv.org/abs/2403.12500v1 ) ライセンス: Link先を確認	Shiki Sato, Reina Akama, Jun Suzuki, Kentaro Inui,	(参考訳) 矛盾する応答の生成を緩和することは、対話応答の生成において重大な課題となる。利用可能な矛盾応答データの質と量は、これらの矛盾を抑制する上で重要な役割を担い、2つの重要な利点を提供している。第一に、大きな矛盾データにアクセスすることで、それらの特性を総合的に調べることができる。第二に、矛盾を緩和するためのデータ駆動手法は、訓練のための大規模な矛盾データによって強化される可能性がある。それでも、モデル生成の矛盾する応答の広範なコレクションを構築する試みは行われていない。本稿では, 応答生成モデルの矛盾点の大規模なデータセットを初めて構築する。そして, 得られた応答を広範囲に解析することにより, モデル生成矛盾の特性に関する貴重な知見を得る。最後に,このデータセットがデータ駆動型矛盾抑制手法の性能を大幅に向上させることを示す。 Mitigating the generation of contradictory responses poses a substantial challenge in dialogue response generation. The quality and quantity of available contradictory response data play a vital role in suppressing these contradictions, offering two significant benefits. First, having access to large contradiction data enables a comprehensive examination of their characteristics. Second, data-driven methods to mitigate contradictions may be enhanced with large-scale contradiction data for training. Nevertheless, no attempt has been made to build an extensive collection of model-generated contradictory responses. In this paper, we build a large dataset of response generation models' contradictions for the first time. Then, we acquire valuable insights into the characteristics of model-generated contradictions through an extensive analysis of the collected responses. Lastly, we also demonstrate how this dataset substantially enhances the performance of data-driven contradiction suppression methods.	翻訳日:2024-03-20 15:02:36 公開日:2024-03-19
# 大規模言語モデルのセキュア化 - 脅威、脆弱性、責任あるプラクティス Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices ( http://arxiv.org/abs/2403.12503v1 ) ライセンス: Link先を確認	Sara Abdali, Richard Anarfi, CJ Barberan, Jia He,	(参考訳) 大規模言語モデル(LLM)は自然言語処理(NLP)のランドスケープを大きく変えた。彼らの影響は、言語理解と世代へのアプローチに革命をもたらした、さまざまなタスクの範囲に及んでいる。それにもかかわらず、LLMは目覚ましい実用性とともに、重要なセキュリティとリスクの考慮を導入している。これらの課題は、責任あるデプロイメントと潜在的な脆弱性に対する保護を保証するために、慎重に検査することを保証します。本研究は, LLMのセキュリティとプライバシに関する懸念を, セキュリティとプライバシの懸念, 敵攻撃に対する脆弱性, LLMの誤用による潜在的な害, 対処戦略の緩和, 現行戦略の限界の特定という5つのテーマから, 徹底的に調査する。最後に,LLMのセキュリティとリスク管理を強化するための今後の研究の道筋について提案する。 Large language models (LLMs) have significantly transformed the landscape of Natural Language Processing (NLP). Their impact extends across a diverse spectrum of tasks, revolutionizing how we approach language understanding and generation. Nevertheless, alongside their remarkable utility, LLMs introduce critical security and risk considerations. These challenges warrant careful examination to ensure responsible deployment and safeguard against potential vulnerabilities. This research paper thoroughly investigates security and privacy concerns related to LLMs from five thematic perspectives: security and privacy concerns, vulnerabilities against adversarial attacks, potential harms caused by misuse of LLMs, mitigation strategies to address these challenges, and limitations of current strategies. Lastly, the paper recommends promising avenues for future research to enhance the security and risk management of LLMs.	翻訳日:2024-03-20 15:02:36 公開日:2024-03-19
# セマンティックス, 歪み, スタイル項目:パノラマセグメンテーションのためのソースフリーUDAを目指して Semantics, Distortion, and Style Matter: Towards Source-free UDA for Panoramic Segmentation ( http://arxiv.org/abs/2403.12505v1 ) ライセンス: Link先を確認	Xu Zheng, Pengyuan Zhou, Athanasios Vasilakos, Lin Wang,	(参考訳) 本稿では、ピンホールからパノラマのセマンティックセマンティックセグメンテーションのための、ソースフリーの非教師なしドメイン適応(SFUDA)と、ピンホール画像訓練モデル(ソース)とラベルなしパノラマ画像(ターゲット)のみを提供する、興味深い課題に対処する。この問題に取り組むことは、意味的ミスマッチ、スタイルの相違、パノラマ画像の避けられない歪みのため、簡単ではない。そこで本研究では,Tangent Projection (TP) を用いて歪みを小さくし,固定されたFoVで正方形投影(ERP)をスリットしてピンホール画像を模倣する手法を提案する。どちらのプロジェクションも、ソースモデルから知識を抽出するのに効果的である。しかし、ソースドメインとターゲットドメインの差は直接知識伝達を妨げるため、抽出した知識からパノラマプロトタイプを統合するためのパノラマプロトタイプ適応モジュール(PPAM)を提案する。そこで我々は, 予測とプロトタイプの両方に損失制約を課し, ドメインとプロジェクション間の空間特性とチャネル特性の整合性を改善するために, 機能レベルでの相互注意モジュール (CDAM) を提案する。知識抽出と転送プロセスは同期的に更新され、最高のパフォーマンスを得る。室内・屋外のシナリオを含む,合成および実世界のベンチマーク実験により,本手法は従来のSFUDA法に比べてピンホール・パノラマ適応法よりも有意に優れた性能を示した。 This paper addresses an interesting yet challenging problem-- source-free unsupervised domain adaptation (SFUDA) for pinhole-to-panoramic semantic segmentation--given only a pinhole image-trained model (i.e., source) and unlabeled panoramic images (i.e., target). Tackling this problem is nontrivial due to the semantic mismatches, style discrepancies, and inevitable distortion of panoramic images. To this end, we propose a novel method that utilizes Tangent Projection (TP) as it has less distortion and meanwhile slits the equirectangular projection (ERP) with a fixed FoV to mimic the pinhole images. Both projections are shown effective in extracting knowledge from the source model. However, the distinct projection discrepancies between source and target domains impede the direct knowledge transfer; thus, we propose a panoramic prototype adaptation module (PPAM) to integrate panoramic prototypes from the extracted knowledge for adaptation. 
We then impose the loss constraints on both predictions and prototypes and propose a cross-dual attention module (CDAM) at the feature level to better align the spatial and channel characteristics across the domains and projections. Both knowledge extraction and transfer processes are synchronously updated to reach the best performance. Extensive experiments on the synthetic and real-world benchmarks, including outdoor and indoor scenarios, demonstrate that our method achieves significantly better performance than prior SFUDA methods for pinhole-to-panoramic adaptation.	翻訳日:2024-03-20 15:02:36 公開日:2024-03-19
# 画像操作のための一般化された一貫性軌道モデル Generalized Consistency Trajectory Models for Image Manipulation ( http://arxiv.org/abs/2403.12510v1 ) ライセンス: Link先を確認	Beomsu Kim, Jaemin Kim, Jeongsol Kim, Jong Chul Ye,	(参考訳) 拡散に基づく生成モデルは、画像編集や復元といった応用タスクと同様に、無条件生成において優れている。拡散モデルの成功は拡散の反復的な性質に起因している: 拡散はノイズをデータにマッピングする複雑な過程を単純な認知タスクの列に分解する。さらに,各認知ステップに誘導項を注入することにより,生成プロセスのきめ細かい制御を行うことができる。しかし、反復過程も計算集約的であり、しばしば数万から数千の関数評価を取る。整合性軌道モデル(CTM)は、確率フローODE(PFODE)に沿った任意の時間点間のトラバースを可能にし、単一関数評価によるスコア推定を可能にするが、CTMはガウスノイズからデータへの変換のみを可能にする。このようにして、この研究は、ODEを介して任意の分布を変換する一般化CTM(Generalized CTMs)を提案することによって、CTMの完全なポテンシャルを解き放つことを目的としている。本稿では,GCTMの設計空間について論じ,画像から画像への変換,復元,編集など,様々な画像操作タスクにおいて有効性を示す。コード: \url{https://github.com/1202kbs/GCTM} Diffusion-based generative models excel in unconditional generation, as well as on applied tasks such as image editing and restoration. The success of diffusion models lies in the iterative nature of diffusion: diffusion breaks down the complex process of mapping noise to data into a sequence of simple denoising tasks. Moreover, we are able to exert fine-grained control over the generation process by injecting guidance terms into each denoising step. However, the iterative process is also computationally intensive, often taking from tens up to thousands of function evaluations. Although consistency trajectory models (CTMs) enable traversal between any time points along the probability flow ODE (PFODE) and score inference with a single function evaluation, CTMs only allow translation from Gaussian noise to data. Thus, this work aims to unlock the full potential of CTMs by proposing generalized CTMs (GCTMs), which translate between arbitrary distributions via ODEs. We discuss the design space of GCTMs and demonstrate their efficacy in various image manipulation tasks such as image-to-image translation, restoration, and editing. Code: \url{https://github.com/1202kbs/GCTM}	翻訳日:2024-03-20 15:02:36 公開日:2024-03-19
# メモリ効率のよい深層ニューラルネットワークトレーニングのためのフォワードグラディエントベースFrank-Wolfe最適化 Forward Gradient-Based Frank-Wolfe Optimization for Memory Efficient Deep Neural Network Training ( http://arxiv.org/abs/2403.12511v1 ) ライセンス: Link先を確認	M. Rostami, S. S. Kia,	(参考訳) 勾配に基づく手法を用いたディープニューラルネットワークのトレーニングは、各レベルの勾配の計算を必要とする。しかし、バックプロパゲーションやリバースモードの微分を用いて、大きなメモリ消費を必要とする勾配を計算することで、バックプロパゲーションは勾配を計算する非効率な方法である。本稿では,Frank-Wolfeアルゴリズム,すなわち条件勾配アルゴリズムの性能解析に焦点をあてる。本稿では,自動微分の前方モードで得られる真の勾配のノイズ推定値にアクセスすることにより,提案アルゴリズムが最適解に収束することを示す,詳細な技術的詳細を提供する。対照的に、標準的なフランク=ウルフアルゴリズムは、プロジェクテッド・フォワード・グラディエントへのアクセスを提供すると、最適解に収束しない。数値的な例を用いて提案アルゴリズムの収束特性を実証する。 Training a deep neural network using gradient-based methods necessitates the calculation of gradients at each level. However, using backpropagation, or reverse-mode differentiation, to calculate the gradients necessitates significant memory consumption, rendering backpropagation an inefficient method for computing gradients. This paper focuses on analyzing the performance of the well-known Frank-Wolfe algorithm, a.k.a. the conditional gradient algorithm, when gradients are computed using the forward mode of automatic differentiation. We provide in-depth technical details that show the proposed algorithm converges to the optimal solution at a sub-linear rate given access to a noisy estimate of the true gradient obtained in the forward mode of automatic differentiation, referred to as the Projected Forward Gradient. In contrast, the standard Frank-Wolfe algorithm, when provided with access to the Projected Forward Gradient, fails to converge to the optimal solution. We demonstrate the convergence attributes of our proposed algorithms using a numerical example.	翻訳日:2024-03-20 15:02:36 公開日:2024-03-19
# スケルトン対応手話認識のための動的時空間集約 Dynamic Spatial-Temporal Aggregation for Skeleton-Aware Sign Language Recognition ( http://arxiv.org/abs/2403.12519v1 ) ライセンス: Link先を確認	Lianyu Hu, Liqing Gao, Zekang Liu, Wei Feng,	(参考訳) スケルトン対応手話認識(SLR)は、背景情報の影響を受けにくく、計算要求も低いことから人気を博している。現在の手法では、空間グラフモジュールと時間モジュールを使用して、それぞれ空間的特徴と時間的特徴をキャプチャする。しかし、それらの空間グラフモジュールは通常、グラフ畳み込みネットワークや単一の学習可能なグラフのような固定されたグラフ構造の上に構築されており、関節間の関係を部分的にしか探索しない。さらに、時間的情報の取得には単純な時間的畳み込みカーネルが使われており、異なる手話者の複雑な動きパターンを完全には捉えられない可能性がある。これらの制約を克服するために,入力に敏感な関節間の関係を構築する分岐と,認識のための特定のドメイン知識を組み込む分岐という,2つの並列分岐からなる新しい空間アーキテクチャを提案する。これら2つの分岐の後には、重要な関節接続を区別する集約プロセスが続く。次に,複雑な人間のダイナミクスを捉えるために,マルチスケールの時間情報をモデル化する新しい時間モジュールを提案する。提案手法は,4つの大規模SLRベンチマークにおいて,従来のスケルトン認識手法と比較して,最先端の精度を実現する。さらに,ほとんどの場合においてRGBベースの手法よりも精度がよく,必要な計算資源もはるかに少ないため,精度と計算のトレードオフが向上することを示す。コードはhttps://github.com/hulianyuyy/DSTA-SLRで入手できる。 Skeleton-aware sign language recognition (SLR) has gained popularity due to its ability to remain unaffected by background information and its lower computational requirements. Current methods utilize spatial graph modules and temporal modules to capture spatial and temporal features, respectively. However, their spatial graph modules are typically built on fixed graph structures such as graph convolutional networks or a single learnable graph, which only partially explore joint relationships. Additionally, a simple temporal convolution kernel is used to capture temporal information, which may not fully capture the complex movement patterns of different signers. To overcome these limitations, we propose a new spatial architecture consisting of two concurrent branches, which build input-sensitive joint relationships and incorporate specific domain knowledge for recognition, respectively. These two branches are followed by an aggregation process to distinguish important joint connections. We then propose a new temporal module to model multi-scale temporal information to capture complex human dynamics. 
Our method achieves state-of-the-art accuracy compared to previous skeleton-aware methods on four large-scale SLR benchmarks. Moreover, our method demonstrates superior accuracy compared to RGB-based methods in most cases while requiring far fewer computational resources, yielding a better accuracy-computation trade-off. Code is available at https://github.com/hulianyuyy/DSTA-SLR.	翻訳日:2024-03-20 15:02:36 公開日:2024-03-19
# GraphERE: グラフ強化イベント埋め込みによる複数イベント関係抽出 GraphERE: Jointly Multiple Event-Event Relation Extraction via Graph-Enhanced Event Embeddings ( http://arxiv.org/abs/2403.12523v1 ) ライセンス: Link先を確認	Haochen Li, Di Geng,	(参考訳) イベントはエンティティの状態変化を記述する。文書では、複数のイベントは様々な関係(例えば、Coreference、Temporal、Causal、Subevent)によって接続される。したがって、イベント関係抽出(ERE)を通じてイベント間の接続を得るには、自然言語を理解することが不可欠である。現在のEREには2つの大きな問題があります。 a) イベントトリガの埋め込みは、イベントの特徴表現、イベント引数(例えば、時間、場所、人など)の無視、イベント内のそれらの構造にのみ使用される。 b) 関係(例えば、時間的関係と因果関係)の相互関係は無視される。上記の問題を解決するために,グラフ強化イベント埋め込みに基づく共同で複数のEREフレームワークであるGraphEREを提案する。まず、静的なAMRグラフとIEグラフを使用してイベントの埋め込みと構造化機能を強化します。次に、複数のイベント関係を共同で抽出するために、Node Transformerを使用し、各タイプの関係に対してタスク固有の動的イベントグラフを構築します。最後に、フレームワーク全体をトレーニングするために、マルチタスクの学習戦略を使用しました。最新のMAVEN-EREデータセットの実験結果は、GraphEREが既存のメソッドよりも大幅に優れていることを検証している。さらに分析した結果,グラフ強調イベント埋め込みの有効性と共同抽出戦略の有効性が示唆された。 Events describe the state changes of entities. In a document, multiple events are connected by various relations (e.g., Coreference, Temporal, Causal, and Subevent). Therefore, obtaining the connections between events through Event-Event Relation Extraction (ERE) is critical to understand natural language. There are two main problems in the current ERE works: a. Only embeddings of the event triggers are used for event feature representation, ignoring event arguments (e.g., time, place, person, etc.) and their structure within the event. b. The interconnection between relations (e.g., temporal and causal relations usually interact with each other ) is ignored. To solve the above problems, this paper proposes a jointly multiple ERE framework called GraphERE based on Graph-enhanced Event Embeddings. First, we enrich the event embeddings with event argument and structure features by using static AMR graphs and IE graphs; Then, to jointly extract multiple event relations, we use Node Transformer and construct Task-specific Dynamic Event Graphs for each type of relation. Finally, we used a multi-task learning strategy to train the whole framework. 
Experimental results on the latest MAVEN-ERE dataset validate that GraphERE significantly outperforms existing methods. Further analyses indicate the effectiveness of the graph-enhanced event embeddings and the joint extraction strategy.	翻訳日:2024-03-20 15:02:36 公開日:2024-03-19
# 連立自由度イベント抽出とイベントスキーマ誘導のためのプロンプトグラフモデル Prompt-based Graph Model for Joint Liberal Event Extraction and Event Schema Induction ( http://arxiv.org/abs/2403.12526v1 ) ライセンス: Link先を確認	Haochen Li, Di Geng,	(参考訳) イベントは、エンティティの状態の変化を記述する、スピーチとテキストの不可欠なコンポーネントである。イベント抽出タスクは、イベントを特定して分類し、イベントスキーマに従って参加者を見つけることを目的としている。手動で事前定義されたイベントスキーマは、カバー範囲が限られており、ドメイン間での移行が困難である。そこで研究者らは,イベント抽出とイベントスキーマの同時発見を目的としたリベラルイベント抽出(LEE)を提案する。しかし、既存のLEEモデルは外部の言語知識ベースに大きく依存しており、ノイズ除去と知識アライメントのための多数のルールを手作業で開発する必要がある。そこで我々は,自由イベント抽出(PGLEE)のためのPromptベースのグラフモデルを提案する。具体的には、プロンプトベースのモデルを使用して、候補のトリガと引数を取得し、それから不均一なイベントグラフを構築し、イベント内およびイベント間の構造をエンコードする。実験結果から,自動検出されたイベントスキーマは高品質であるのに対して,事前定義されたイベントスキーマでは性能が良好であることが確認された。 Events are essential components of speech and texts, describing the changes in the state of entities. The event extraction task aims to identify and classify events and find their participants according to event schemas. Manually predefined event schemas have limited coverage and are hard to migrate across domains. Therefore, the researchers propose Liberal Event Extraction (LEE), which aims to extract events and discover event schemas simultaneously. However, existing LEE models rely heavily on external language knowledge bases and require the manual development of numerous rules for noise removal and knowledge alignment, which is complex and laborious. To this end, we propose a Prompt-based Graph Model for Liberal Event Extraction (PGLEE). Specifically, we use a prompt-based model to obtain candidate triggers and arguments, and then build heterogeneous event graphs to encode the structures within and between events. Experimental results prove that our approach achieves excellent performance with or without predefined event schemas, while the automatically detected event schemas are proven high quality.	翻訳日:2024-03-20 15:02:36 公開日:2024-03-19
# コンテキスト化されたメッセージはグラフ表現を増強する Contextualized Messages Boost Graph Representations ( http://arxiv.org/abs/2403.12529v1 ) ライセンス: Link先を確認	Brian Godwin Lim,	(参考訳) グラフニューラルネットワーク(GNN)は、グラフとして表される任意の構造化されたデータを扱う能力のため、近年大きな関心を集めている。 GNNは一般的に、ノードの特徴表現をローカルに更新するメッセージパッシングスキームに従う。グラフ読み込み関数を使用して、グラフ全体の表現を生成する。いくつかの研究は、しばしばヒューリスティックスにインスパイアされたメッセージパッシングフレームワークのアグリゲーションと組み合わせ戦略を変更することで、異なるGNNを提案した。しかしながら、いくつかの研究は、本質的に可算ノード特徴表現を仮定するグラフ同型問題に基づく理論的な観点からGNNの探索を開始した。しかし、非可算なノード特徴表現を持つGNNを探索する理論的な研究はごくわずかである。本稿では,ノード特徴表現の空間が非可算である場合,すべてのレベル(ノードレベル,近傍レベル,グラフレベル)にわたるGNNの表現能力に関する新たな視点を示す。この結果から, 近傍特徴表現の非線形・文脈変換を重視した, ソフト同型関係グラフ畳み込みネットワーク (SIR-GCN) が提案された。 SIR-GCNと3つの広く使われているGNNの数学的関係について検討した。合成データセットの検証により、SIR-GCNは単純なノードやグラフプロパティの予測タスクでも同等のモデルより優れていることが示される。 Graph neural networks (GNNs) have gained significant interest in recent years due to their ability to handle arbitrarily structured data represented as graphs. GNNs generally follow the message-passing scheme to locally update node feature representations. A graph readout function is then employed to create a representation for the entire graph. Several studies proposed different GNNs by modifying the aggregation and combination strategies of the message-passing framework, often inspired by heuristics. Nevertheless, several studies have begun exploring GNNs from a theoretical perspective based on the graph isomorphism problem which inherently assumes countable node feature representations. Yet, there are only a few theoretical works exploring GNNs with uncountable node feature representations. This paper presents a new perspective on the representational capabilities of GNNs across all levels - node-level, neighborhood-level, and graph-level - when the space of node feature representation is uncountable. From the results, a novel soft-isomorphic relational graph convolution network (SIR-GCN) is proposed that emphasizes non-linear and contextualized transformations of neighborhood feature representations. 
The mathematical relationship of SIR-GCN and three widely used GNNs is explored to highlight the contribution. Validation on synthetic datasets then demonstrates that SIR-GCN outperforms comparable models even in simple node and graph property prediction tasks.	翻訳日:2024-03-20 14:52:48 公開日:2024-03-19
# PCT:マルチカメラBEVセグメンテーションのためのパースペクティブキュートレーニングフレームワーク PCT: Perspective Cue Training Framework for Multi-Camera BEV Segmentation ( http://arxiv.org/abs/2403.12530v1 ) ライセンス: Link先を確認	Haruya Ishikawa, Takumi Iida, Yoshinori Konishi, Yoshimitsu Aoki,	(参考訳) 鳥眼ビュー(BEV)セグメンテーションのためのアノテーションの生成は、シーンの複雑さと手作業によるアノテーションのコストが高いため、大きな課題となる。本研究では、利用可能なラベルなしデータの豊富さを活用することで、これらの課題に対処する。本研究では,大規模なストリートビューデータセットでトレーニングされた公開セマンティックセグメンテーションモデルを用いて,ラベルのない視点画像から生成された擬似ラベルを利用する新しいトレーニングフレームワークであるパースペクティブキュートレーニング(PCT)フレームワークを提案する。 PCTは、BEVセグメンテーションヘッドと共有される画像エンコーダにビュービュータスクヘッドを適用し、生成した擬似ラベルでトレーニングされるラベルなしデータを効果的に活用する。ほぼ全てのカメラベースのBEVセグメンテーションアーキテクチャに画像エンコーダが存在するため、PCTは柔軟であり、既存のBEVアーキテクチャにも適用可能である。 PCTはラベルのないデータが利用できる様々な設定に適用できる。本稿では,半教師付き学習(SSL)と教師なしドメイン適応(UDA)にPCTを適用した。さらに,カメラドロップアウト(CamDrop)による強い入力摂動と,BEV機能ドロップアウト(BFD)による特徴摂動を導入する。私たちの包括的なアプローチはシンプルで柔軟なものですが、SSLやUDAのさまざまなベースラインよりも大幅に改善されています。 Generating annotations for bird's-eye-view (BEV) segmentation presents significant challenges due to the scenes' complexity and the high manual annotation cost. In this work, we address these challenges by leveraging the abundance of unlabeled data available. We propose the Perspective Cue Training (PCT) framework, a novel training framework that utilizes pseudo-labels generated from unlabeled perspective images using publicly available semantic segmentation models trained on large street-view datasets. PCT applies a perspective view task head to the image encoder shared with the BEV segmentation head, effectively utilizing the unlabeled data to be trained with the generated pseudo-labels. Since image encoders are present in nearly all camera-based BEV segmentation architectures, PCT is flexible and applicable to various existing BEV architectures. PCT can be applied to various settings where unlabeled data is available. In this paper, we applied PCT for semi-supervised learning (SSL) and unsupervised domain adaptation (UDA). 
Additionally, we introduce strong input perturbation through Camera Dropout (CamDrop) and feature perturbation via BEV Feature Dropout (BFD), which are crucial for enhancing SSL capabilities using our teacher-student framework. Our comprehensive approach is simple and flexible but yields significant improvements over various baselines for SSL and UDA, achieving competitive performances even against the current state-of-the-art.	翻訳日:2024-03-20 14:52:48 公開日:2024-03-19
# UniBind:LLMの拡張された統一とバランスの取れた表現空間 UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All ( http://arxiv.org/abs/2403.12532v1 ) ライセンス: Link先を確認	Yuanhuiyi Lyu, Xu Zheng, Jiazhou Zhou, Lin Wang,	(参考訳) UniBindは、画像、テキスト、オーディオ、ポイントクラウド、サーマル、ビデオ、イベントデータという7つの異なるモードの統一表現空間を学習する、柔軟で効率的なアプローチである。既存の作品など。 ImageBindは、イメージを中心モダリティとして扱い、イメージ中心の表現空間を構築するが、すべてのモダリティの非均衡表現空間につながるため、その空間は準最適であるかもしれない。さらに、カテゴリ名は、下流タスクのテキスト埋め込みを抽出するために直接使用されるため、マルチモーダルデータのセマンティクスを表現できない。 UniBindの'out-of-the-box'の洞察は、アライメントセンターをモダリティに依存しないものにし、さらに大きな言語モデル(LLM)によって強化された統一されたバランスの取れた表現空間を学習することです。 UniBindはすべてのCLIPスタイルのモデルよりも柔軟なアプリケーションに優れており、優れたパフォーマンス向上を実現している。これを可能にするために、私たちは 1) LLM とマルチモーダル LLM の助けを借りて,テキスト埋め込みの知識基盤を構築する。 2) 知識基盤の上にLLMを付加したクラスワイド埋め込みセンターを適応的に構築し, 視覚的埋め込みを符号化する。 3) コントラスト学習により, 埋め込みをLLM拡張埋め込みセンタに整列させ, 統一的かつバランスの取れた表現空間を実現する。 UniBindは、先行技術よりも平均して6.36%のゼロショット認識性能が向上している。最後に、最新のパフォーマンス、例えば、新しいパフォーマンスを実現します。学習可能なパラメータの90%を削減しつつ、マルチモーダルな微調整設定でImageNetが6.75%向上した。 We present UniBind, a flexible and efficient approach that learns a unified representation space for seven diverse modalities-- images, text, audio, point cloud, thermal, video, and event data. Existing works, eg., ImageBind, treat the image as the central modality and build an image-centered representation space; however, the space may be sub-optimal as it leads to an unbalanced representation space among all modalities. Moreover, the category names are directly used to extract text embeddings for the downstream tasks, making it hardly possible to represent the semantics of multi-modal data. The 'out-of-the-box' insight of our UniBind is to make the alignment center modality-agnostic and further learn a unified and balanced representation space, empowered by the large language models (LLMs). UniBind is superior in its flexible application to all CLIP-style models and delivers remarkable performance boosts. 
To make this possible, we 1) construct a knowledge base of text embeddings with the help of LLMs and multi-modal LLMs; 2) adaptively build LLM-augmented class-wise embedding centers on top of the knowledge base and encoded visual embeddings; 3) align all the embeddings to the LLM-augmented embedding centers via contrastive learning to achieve a unified and balanced representation space. UniBind shows strong zero-shot recognition performance gains over prior art by an average of 6.36%. Finally, we achieve new state-of-the-art performance, e.g., a 6.75% gain on ImageNet, in the multi-modal fine-tuning setting while reducing the learnable parameters by 90%.	翻訳日:2024-03-20 14:52:48 公開日:2024-03-19
# 人-ロボットグループインタラクションのためのLLMベースの注意支援 To Help or Not to Help: LLM-based Attentive Support for Human-Robot Group Interactions ( http://arxiv.org/abs/2403.12533v1 ) ライセンス: Link先を確認	Daniel Tanneberg, Felix Ocker, Stephan Hasler, Joerg Deigmoeller, Anna Belardinelli, Chao Wang, Heiko Wersing, Bernhard Sendhoff, Michael Gienger,	(参考訳) ロボットは、どのようにして人間のグループ内で邪魔にならない身体的支援を提供することができるのか? 我々は,人間のグループを支援するロボットのための,新しいインタラクション概念であるAttentive Supportを紹介する。シーン認識、対話獲得、状況理解、行動生成とLarge Language Models(LLM)の常識推論能力を組み合わせる。ユーザの指示に従うことに加えて、Attentive Supportは、いつ、どのように人間をサポートするか、いつ、いつ、沈黙のままでグループを邪魔しないかを決定することができる。多様なシナリオのセットでロボットの注意行動を示して評価し、必要なときに人間を支援し、助けるが、助けがなければ邪魔しない。 How can a robot provide unobtrusive physical support within a group of humans? We present Attentive Support, a novel interaction concept for robots to support a group of humans. It combines scene perception, dialogue acquisition, situation understanding, and behavior generation with the common-sense reasoning capabilities of Large Language Models (LLMs). In addition to following user instructions, Attentive Support is capable of deciding when and how to support the humans, and when to remain silent to not disturb the group. With a diverse set of scenarios, we show and evaluate the robot's attentive behavior, which supports and helps the humans when required, while not disturbing if no help is needed.	翻訳日:2024-03-20 14:52:48 公開日:2024-03-19
# ExACT:イベントベース行動認識のための言語誘導概念推論と不確かさ推定など ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More ( http://arxiv.org/abs/2403.12534v1 ) ライセンス: Link先を確認	Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu, Lin Wang,	(参考訳) イベントカメラは、高時間分解能、電力効率、プライバシー上の懸念の軽減などにより、アクション認識などの実用的な視覚タスクに有用であることが最近示されている。しかし、現在の研究は妨げられている。 1)イベントの処理の困難さは、その持続時間と、複雑であいまいな意味論による動的行動の長期化によるものである。 2)固定スタックによるイベントフレーム表現の冗長なアクション描写。言語は自然に豊富な意味情報を伝達し、意味の不確実性を減らすのに驚くほど優れている。そこで我々は, イベントに基づく行動認識を, クロスモーダルな概念化の観点から初めて取り組んだ, 新たなアプローチであるExACTを提案する。当社のExACTには2つの技術コントリビューションがあります。まず、動的イベントを保存しながら、定常オブジェクトの繰り返しイベントを適応的にフィルタリングする、適応的きめ細かいイベント(AFE)表現を提案する。これにより、余分な計算コストなしでExACTの性能が微妙に向上する。そこで本研究では,認識過程をシミュレートして意味表現を充実させる,概念推論に基づく不確実性推定モジュールを提案する。特に、概念的推論は行動意味論に基づく時間的関係を構築し、不確実性推定は分布表現に基づく行動の意味的不確実性に取り組む。実験の結果、当社のExACTは、PAF、HARDVS、SeActデータセットでそれぞれ94.83%(+2.23%)、90.10%(+37.47%)、67.24%の認識精度を達成していることがわかった。 Event cameras have recently been shown beneficial for practical vision tasks, such as action recognition, thanks to their high temporal resolution, power efficiency, and reduced privacy concerns. However, current research is hindered by 1) the difficulty in processing events because of their prolonged duration and dynamic actions with complex and ambiguous semantics and 2) the redundant action depiction of the event frame representation with fixed stacks. We find language naturally conveys abundant semantic information, rendering it stunningly superior in reducing semantic uncertainty. In light of this, we propose ExACT, a novel approach that, for the first time, tackles event-based action recognition from a cross-modal conceptualizing perspective. Our ExACT brings two technical contributions. Firstly, we propose an adaptive fine-grained event (AFE) representation to adaptively filter out the repeated events for the stationary objects while preserving dynamic ones. This subtly enhances the performance of ExACT without extra computational cost. 
Then, we propose a conceptual reasoning-based uncertainty estimation module, which simulates the recognition process to enrich the semantic representation. In particular, conceptual reasoning builds the temporal relation based on the action semantics, and uncertainty estimation tackles the semantic uncertainty of actions based on the distributional representation. Experiments show that our ExACT achieves superior recognition accuracy of 94.83% (+2.23%), 90.10% (+37.47%), and 67.24% on the PAF, HARDVS, and our SeAct datasets, respectively.	翻訳日:2024-03-20 14:52:48 公開日:2024-03-19
# Rendering-Guided Densificationと正規化最適化を用いたガウス平板を用いた高忠実SLAM High-Fidelity SLAM Using Gaussian Splatting with Rendering-Guided Densification and Regularized Optimization ( http://arxiv.org/abs/2403.12535v1 ) ライセンス: Link先を確認	Shuo Sun, Malcolm Mielle, Achim J. Lilienthal, Martin Magnusson,	(参考訳) 本稿では,3次元ガウススプラッティングに基づく高密度RGBD SLAMシステムを提案する。そこで我々はまず,未観測領域を地図化し,再観測領域を精査するためのレンダリング損失に基づくガウス密度化戦略を提案する。第2に、連続写像問題において、パラメータが最新のフレームに過度に適合し、その結果、前のフレームのレンダリング品質が低下する傾向にある忘れ問題を軽減するために、余分な正規化パラメータを導入する。マッピングと追跡は、ガウスパラメータを用いて、異なる方法で損失の再レンダリングを最小化することによって行われる。近年のニューラルかつ並列に開発されたガウススプラッティング RGBD SLAM ベースラインと比較して,本手法は合成データセット Replica の最先端結果と実世界のデータセット TUM の競合結果を得る。 We propose a dense RGBD SLAM system based on 3D Gaussian Splatting that provides metrically accurate pose tracking and visually realistic reconstruction. To this end, we first propose a Gaussian densification strategy based on the rendering loss to map unobserved areas and refine reobserved areas. Second, we introduce extra regularization parameters to alleviate the forgetting problem in the continuous mapping problem, where parameters tend to overfit the latest frame and result in decreasing rendering quality for previous frames. Both mapping and tracking are performed with Gaussian parameters by minimizing re-rendering loss in a differentiable way. Compared to recent neural and concurrently developed gaussian splatting RGBD SLAM baselines, our method achieves state-of-the-art results on the synthetic dataset Replica and competitive results on the real-world dataset TUM.	翻訳日:2024-03-20 14:52:48 公開日:2024-03-19
# Vox-Fusion++: マルチマップによるVoxelベースのニューラルインプリシト・センストラッキングとマッピング Vox-Fusion++: Voxel-based Neural Implicit Dense Tracking and Mapping with Multi-maps ( http://arxiv.org/abs/2403.12536v1 ) ライセンス: Link先を確認	Hongjia Zhai, Hai Li, Xingrui Yang, Gan Huang, Yuhang Ming, Hujun Bao, Guofeng Zhang,	(参考訳) 本稿では,Vox-Fusion++について紹介する。Vox-Fusion++は,従来の容積融合技術とニューラルな暗黙表現をシームレスに融合する,マルチマップベースの頑健な高密度トラッキングとマッピングシステムである。暗黙のマッピングと位置決めシステムの概念に基づいて、我々のアプローチは実世界のシナリオに適用性を広げる。本システムでは,各ボクセル内のシーンの効率的なエンコーディングと最適化を実現するために,ボクセルをベースとしたニューラル暗黙表面表現を用いる。先行知識のない多様な環境を扱うため,シーン分割と動的展開のためのオクツリー構造を組み込んだ。リアルタイム性能を実現するために,我々は高性能なマルチプロセスフレームワークを提案する。これにより、厳密な時間制約のあるアプリケーションに対するシステムの適合性が保証される。さらに,大規模シーンを扱うためにマルチマップの考え方を採用し,ループ検出と階層的なポーズ最適化戦略を活用して長期ポーズのドリフトを低減し,重複幾何学を除去する。総合的な評価を通じて,提案手法は様々なシナリオにおける再現性や精度において,従来の手法よりも優れていたことを実証する。また、我々のVox-Fusion++は拡張現実および協調マッピングアプリケーションで使用できることを示す。ソースコードは \url{https://github.com/zju3dv/Vox-Fusion_Plus_Plus} で公開されます。 In this paper, we introduce Vox-Fusion++, a multi-maps-based robust dense tracking and mapping system that seamlessly fuses neural implicit representations with traditional volumetric fusion techniques. Building upon the concept of implicit mapping and positioning systems, our approach extends its applicability to real-world scenarios. Our system employs a voxel-based neural implicit surface representation, enabling efficient encoding and optimization of the scene within each voxel. To handle diverse environments without prior knowledge, we incorporate an octree-based structure for scene division and dynamic expansion. To achieve real-time performance, we propose a high-performance multi-process framework. This ensures the system's suitability for applications with stringent time constraints. Additionally, we adopt the idea of multi-maps to handle large-scale scenes, and leverage loop detection and hierarchical pose optimization strategies to reduce long-term pose drift and remove duplicate geometry. 
Through comprehensive evaluations, we demonstrate that our method outperforms previous methods in terms of reconstruction quality and accuracy across various scenarios. We also show that our Vox-Fusion++ can be used in augmented reality and collaborative mapping applications. Our source code will be publicly available at \url{https://github.com/zju3dv/Vox-Fusion_Plus_Plus}	翻訳日:2024-03-20 14:52:48 公開日:2024-03-19
# 全スライド画像分類のためのPrompt-Guided Adaptive Model Transformation Prompt-Guided Adaptive Model Transformation for Whole Slide Image Classification ( http://arxiv.org/abs/2403.12537v1 ) ライセンス: Link先を確認	Yi Lin, Zhengjie Zhu, Kwang-Ting Cheng, Hao Chen,	(参考訳) スライド画像全体(WSI)を分類するための一般的な手法として,MIL(Multiple Case Learning)が登場している。既存のアプローチは、例の特徴を抽出するために、凍結した訓練済みのモデルに依存しており、訓練済みの自然像と病理像の間の領域シフトを無視している。この問題を解決するために, PAMT を提案する。Pmpt-Guided Adaptive Model Transformation フレームワークは, 学習済みモデルを病理組織学的データの特徴にシームレスに適応させることにより, MIL 分類性能を向上させる。本稿では, 複雑な病理組織分布を捉えるために, 適応的パッチサンプリング (RPS) とPVP (Prototypeal Visual Prompt) を導入し, 入力データを再構成し, コンパクトかつ情報的表現を構築する。さらに、ドメインギャップを狭めるために、機能抽出パイプラインにアダプタブロックを統合するAdaptive Model Transformation (AMT)を導入し、事前学習したモデルがドメイン固有の特徴を学習できるようにする。我々は,Camelyon16 と TCGA-NSCLC の2つの公開データセットに対するアプローチを厳格に評価し,様々な MIL モデルで大幅に改善されていることを示す。本研究は, PAMTがWSI分類に新たなベンチマークを設定できる可能性を実証し, 対象とする再プログラミング手法の価値を裏付けるものである。 Multiple instance learning (MIL) has emerged as a popular method for classifying histopathology whole slide images (WSIs). Existing approaches typically rely on frozen pre-trained models to extract instance features, neglecting the substantial domain shift between pre-training natural and histopathological images. To address this issue, we propose PAMT, a novel Prompt-guided Adaptive Model Transformation framework that enhances MIL classification performance by seamlessly adapting pre-trained models to the specific characteristics of histopathology data. To capture the intricate histopathology distribution, we introduce Representative Patch Sampling (RPS) and Prototypical Visual Prompt (PVP) to reform the input data, building a compact while informative representation. Furthermore, to narrow the domain gap, we introduce Adaptive Model Transformation (AMT) that integrates adapter blocks within the feature extraction pipeline, enabling the pre-trained models to learn domain-specific features. 
We rigorously evaluate our approach on two publicly available datasets, Camelyon16 and TCGA-NSCLC, showcasing substantial improvements across various MIL models. Our findings affirm the potential of PAMT to set a new benchmark in WSI classification, underscoring the value of a targeted reprogramming approach.	翻訳日:2024-03-20 14:52:48 公開日:2024-03-19
# 多層ネットワークにおけるスペクトル法によるコミュニティ検出 Community detection by spectral methods in multi-layer networks ( http://arxiv.org/abs/2403.12540v1 ) ライセンス: Link先を確認	Huan Qing,	(参考訳) 多層ネットワークにおけるコミュニティ検出は,ネットワーク解析において重要な問題である。本稿では,MLDCSBM(Multilayer degree-corrected stochastic block model)フレームワークにおけるコミュニティ検出のための2つのスペクトルクラスタリングアルゴリズムの性能を解析する。 1つのアルゴリズムは隣接行列の和に基づいており、もう1つは2乗隣接行列の偏りの和を利用する。 MLDCSBMでは,ネットワークのサイズや層数の増加に伴い,これらの手法を用いたコミュニティ検出の一貫性が確立される。本定理は,コミュニティ検出に複数の層を利用する利点を実証するものである。さらに,2乗隣接行列の縮退和によるスペクトルクラスタリングは,概して隣接行列の和によるスペクトルクラスタリングよりも優れていることを示す。数値シミュレーションにより, このアルゴリズムは, 多層ネットワークにおける既存のコミュニティ検出手法を超越した2乗隣接行列のデバイアス和を用いていることを確認した。最後に、実世界の複数層ネットワークの解析は有意義な洞察を与える。 Community detection in multi-layer networks is a crucial problem in network analysis. In this paper, we analyze the performance of two spectral clustering algorithms for community detection within the multi-layer degree-corrected stochastic block model (MLDCSBM) framework. One algorithm is based on the sum of adjacency matrices, while the other utilizes the debiased sum of squared adjacency matrices. We establish consistency results for community detection using these methods under MLDCSBM as the size of the network and/or the number of layers increases. Our theorems demonstrate the advantages of utilizing multiple layers for community detection. Moreover, our analysis indicates that spectral clustering with the debiased sum of squared adjacency matrices is generally superior to spectral clustering with the sum of adjacency matrices. Numerical simulations confirm that our algorithm, employing the debiased sum of squared adjacency matrices, surpasses existing methods for community detection in multi-layer networks. Finally, the analysis of several real-world multi-layer networks yields meaningful insights.	翻訳日:2024-03-20 14:52:48 公開日:2024-03-19
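The two aggregate matrices compared in the entry above can be sketched as follows. The abstract does not state the debiasing formula, so this sketch assumes the common bias-adjusted construction, in which the diagonal degree matrix of each layer is subtracted from that layer's squared adjacency matrix. The function names are hypothetical, and the subsequent steps (leading eigenvectors, row normalization, k-means) are omitted.

```python
def sum_of_adjacency(layers):
    """Element-wise sum of the layers' adjacency matrices."""
    n = len(layers[0])
    return [[sum(A[i][j] for A in layers) for j in range(n)] for i in range(n)]


def debiased_sum_of_squares(layers):
    """Sum over layers of (A @ A - D), with D the diagonal degree matrix.

    For a symmetric 0/1 adjacency matrix, the diagonal of A @ A equals
    the node degrees; subtracting D removes that bias term (assumed form).
    """
    n = len(layers[0])
    S = [[0] * n for _ in range(n)]
    for A in layers:
        for i in range(n):
            for j in range(n):
                S[i][j] += sum(A[i][k] * A[k][j] for k in range(n))
            S[i][i] -= sum(A[i])  # subtract the degree of node i in this layer
    return S
```

Either matrix would then be passed to a standard spectral clustering routine. Squaring each layer before summing counts shared neighbours, so connectivity signals of opposite sign across layers do not cancel out in the sum, which is the usual motivation for this construction.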
# TAGS:Tag-Propagation-based Provenance Graphアライメントによるストリーミングイベントのリアルタイム侵入検出 TAGS: Real-time Intrusion Detection with Tag-Propagation-based Provenance Graph Alignment on Streaming Events ( http://arxiv.org/abs/2403.12541v1 ) ライセンス: Link先を確認	Zhenyuan Li, Yangyang Wei, Xiangmin Shen, Lingzhi Wang, Yan Chen, Haitao Xu, Shouling Ji, Fan Zhang,	(参考訳) サイバー攻撃の進化と進展は、既存のセキュリティ製品に課題をもたらす。近年、前駆体グラフによる検出に関する集中的な研究は、攻撃検出と調査においてその効果が証明されている。しかし、これらのアプローチを実際に実装することは、高いオーバーヘッド、遅い応答性、低い解釈可能性と拡張性といった課題に直面する。本稿では,タグプロパゲーションに基づくストリーミング前駆グラフアライメントシステムTAGSを提案する。タグベースの中間結果キャッシュ機構と、慎重に作成された伝搬規則を用いることで、生データ処理の保存と複製を不要にする。このアプローチは、インメモリストレージの要求を効果的に軽減し、データ処理のオーバーヘッドを最小限にし、CPUとメモリリソースを著しく保存しながら、高速なオンストリームグラフアライメントを容易にします。その結果、TAGSはリアルタイムで様々なサイバー攻撃を検出し、調査することができる。さらに、TAGSでは、データストリーム内の拡張攻撃を特定するために、クエリグラフを柔軟にカスタマイズすることができる。我々は,257.42GBの監査ログと12種類の関連クエリグラフを含む2つの大規模公開データセットを実験的に評価し,複数の攻撃手法とシナリオをカバーした。その結果,TAGSは1秒あたり176Kのイベントを処理するのに十分な効率性を示し,29のアライメントをすべて精度よく同定した。さらに、300MB未満の安定した低レベルのメモリ消費で依存性の爆発問題を効果的に処理でき、3つの偽陽性しか発生しない。全体として、TAGSのパフォーマンスは最先端の手法よりも著しく優れています。 The evolution and advancement of cyberattacks pose challenges to existing security products. Recent concentrated research on provenance graph-based detection has proved its effectiveness in attack detection and investigation. However, implementing these approaches in practice encounters challenges such as high overhead, slow responsiveness, and low interpretability and extensibility. Towards practical attack detection and investigation with provenance graphs, we propose TAGS, a tag-propagation-based streaming provenance graph alignment system. Utilizing the tag-based intermediate result caching mechanism alongside carefully crafted propagation rules, we eliminate the need to store and duplicate raw data processing. This approach effectively mitigates in-memory storage requirements and minimizes data processing overhead, facilitating rapid on-stream graph alignment while significantly conserving CPU and memory resources. 
As a result, TAGS can detect and investigate various cyber-attacks in real time. Moreover, TAGS allows analysts to flexibly customize attack query graphs to identify extended attacks in data streams. We conduct experimental evaluations on two large-scale public datasets containing 257.42 GB of audit logs with 12 relevant query graphs of varying sizes, covering multiple attack techniques and scenarios. The results demonstrate that TAGS is efficient enough to process 176K events per second while accurately identifying all 29 alignments in massive data. Moreover, it effectively handles the dependency explosion problem with steady, low memory consumption (less than 300 MB), producing only 3 false positives. Overall, TAGS significantly outperforms the state-of-the-art methods.	翻訳日:2024-03-20 14:52:48 公開日:2024-03-19
# HCPM: 効率的な検出自由マッチングのための階層的候補決定 HCPM: Hierarchical Candidates Pruning for Efficient Detector-Free Matching ( http://arxiv.org/abs/2403.12543v1 ) ライセンス: Link先を確認	Ying Chen, Yong Liu, Kai Wu, Qiang Nie, Shang Xu, Huifang Ma, Bing Wang, Chengjie Wang,	(参考訳) 深層学習に基づく画像マッチング法はコンピュータビジョンにおいて重要な役割を担っているが、しばしばかなりの計算要求に悩まされる。この課題に対処するために,階層的プルーニングを用いてマッチングパイプラインを最適化する,効率的かつ検出不要な局所特徴マッチング手法HCPMを提案する。整合性のための粗度候補の徹底的な集合に依存する最近の検出自由法とは対照的に、HCPMは情報的候補の簡潔な部分集合に選択的に集中し、計算的候補が減り、マッチング効率が向上する。本発明の方法は、信頼性の高い候補を選択できるセルフプランニングステージと、粗いレベルで相関パッチを識別するインタラクティブプルーニングステージとを含む。その結果,HCPMは精度を保ちながら,既存の手法をはるかに上回っていることが明らかとなった。ソースコードは公開時に公開される予定だ。 Deep learning-based image matching methods play a crucial role in computer vision, yet they often suffer from substantial computational demands. To tackle this challenge, we present HCPM, an efficient and detector-free local feature-matching method that employs hierarchical pruning to optimize the matching pipeline. In contrast to recent detector-free methods that depend on an exhaustive set of coarse-level candidates for matching, HCPM selectively concentrates on a concise subset of informative candidates, resulting in fewer computational candidates and enhanced matching efficiency. The method comprises a self-pruning stage for selecting reliable candidates and an interactive-pruning stage that identifies correlated patches at the coarse level. Our results reveal that HCPM significantly surpasses existing methods in terms of speed while maintaining high accuracy. The source code will be made available upon publication.	翻訳日:2024-03-20 14:52:48 公開日:2024-03-19
# AffineQuant: 大規模言語モデルのためのアフィン変換量子化 AffineQuant: Affine Transformation Quantization for Large Language Models ( http://arxiv.org/abs/2403.12544v1 ) ライセンス: Link先を確認	Yuexiao Ma, Huixia Li, Xiawu Zheng, Feng Ling, Xuefeng Xiao, Rui Wang, Shilei Wen, Fei Chao, Rongrong Ji,	(参考訳) 大規模言語モデル(LLM)に伴う膨大なリソース要件は、ニューラルネットワークの圧縮と加速を目的とした技術の開発に多大な関心を集めている。これらの技術の中で,PTQ(Post-Training Quantization)は、その優れた圧縮効率と訓練面でのコスト効率から注目されている。 LLMの既存のPTQ法は、量子化前後の重みの間のスケーリング変換に最適化範囲を制限している。本稿では,PTQにおいて等価なアフィン変換を用いた直接最適化(AffineQuant)を提案する。このアプローチは最適化範囲を拡張し、量子化誤差を大幅に低減する。さらに、対応する逆行列を用いることで、PTQの量子化前後の出力の等価性を確保し、効率と一般化能力を維持できる。さらに,最適化中の変換の可逆性を確保するために,段階的なマスク最適化手法を導入する。この方法は当初、対角要素の最適化に焦点を合わせ、徐々に他の要素に拡張する。このようなアプローチは、理論的に変換の可逆性を保証するLevy-Desplanquesの定理と整合している。その結果、多様なデータセット上で異なるLLMにわたる大幅なパフォーマンス改善が明らかとなった。具体的には、W4A4量子化のLLaMA2-7Bモデル上で、オーバーヘッドなしに15.76のC4パープレキシティ(OmniQuantの18.02より2.26低い)を得る。ゼロショットタスクにおいて、AffineQuantはLLaMA-30Bの4/4ビット量子化で平均58.61の精度(OmniQuantの56.63より1.98高い)を達成し、LLMのPTQにおける新たな最先端を確立した。 The significant resource requirements associated with Large-scale Language Models (LLMs) have generated considerable interest in the development of techniques aimed at compressing and accelerating neural networks. Among these techniques, Post-Training Quantization (PTQ) has emerged as a subject of considerable interest due to its noteworthy compression efficiency and cost-effectiveness in the context of training. Existing PTQ methods for LLMs limit the optimization scope to scaling transformations between pre- and post-quantization weights. In this paper, we advocate for direct optimization using equivalent affine transformations in PTQ (AffineQuant). This approach extends the optimization scope and thus significantly reduces quantization errors. Additionally, by employing the corresponding inverse matrix, we can ensure equivalence between the pre- and post-quantization outputs of PTQ, thereby maintaining its efficiency and generalization capabilities. 
To ensure the invertibility of the transformation during optimization, we further introduce a gradual mask optimization method. This method initially focuses on optimizing the diagonal elements and gradually extends to the other elements. Such an approach aligns with the Levy-Desplanques theorem, theoretically ensuring invertibility of the transformation. As a result, significant performance improvements are evident across different LLMs on diverse datasets. To illustrate, we attain a C4 perplexity of 15.76 (2.26 lower vs 18.02 in OmniQuant) on the LLaMA2-7B model of W4A4 quantization without overhead. On zero-shot tasks, AffineQuant achieves an average of 58.61 accuracy (1.98 higher vs 56.63 in OmniQuant) when using 4/4-bit quantization for LLaMA-30B, setting a new state-of-the-art benchmark for PTQ in LLMs.	翻訳日:2024-03-20 14:52:48 公開日:2024-03-19
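The gradual-mask idea in the AffineQuant abstract can be illustrated with a small sketch. This is not the authors' implementation: the banded mask schedule, the helper names (`is_strictly_diagonally_dominant`, `banded_mask`, `apply_mask`), and the example matrix are all assumptions made for illustration. The only fact relied on is the Levy-Desplanques theorem, which says that a strictly diagonally dominant matrix is invertible.

```python
def is_strictly_diagonally_dominant(matrix):
    """Return True if |a_ii| > sum of |a_ij| (j != i) holds for every row i."""
    for i, row in enumerate(matrix):
        off_diagonal = sum(abs(value) for j, value in enumerate(row) if j != i)
        if abs(row[i]) <= off_diagonal:
            return False
    return True


def banded_mask(size, bandwidth):
    """Hypothetical mask: 1.0 within `bandwidth` of the diagonal, 0.0 elsewhere."""
    return [
        [1.0 if abs(i - j) <= bandwidth else 0.0 for j in range(size)]
        for i in range(size)
    ]


def apply_mask(matrix, mask):
    """Element-wise product of a matrix and a mask of the same shape."""
    return [
        [value * keep for value, keep in zip(row, mask_row)]
        for row, mask_row in zip(matrix, mask)
    ]


# Illustrative (made-up) candidate transformation matrix.
candidate = [
    [2.0, 0.5, 1.8],
    [0.4, 3.0, 0.3],
    [1.5, 0.2, 2.5],
]

# Widen the band step by step and check the sufficient condition each time.
for bandwidth in range(3):
    masked = apply_mask(candidate, banded_mask(3, bandwidth))
    print(bandwidth, is_strictly_diagonally_dominant(masked))
```

Strict diagonal dominance is only a sufficient condition for invertibility, so a matrix that fails the check may still be invertible. The sketch only shows why starting from the diagonal and releasing off-diagonal elements gradually keeps the matrix inside a region where invertibility is guaranteed.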
# 散逸的量子力学から構築された動的に制約されたモデル Kinetically constrained models constructed from dissipative quantum dynamics ( http://arxiv.org/abs/2403.12548v1 ) ライセンス: Link先を確認	Somnath Maity, Ryusuke Hamazaki,	(参考訳) 強散逸下でのマルコフ量子力学を用いた速度論的拘束モデルの構築を提案する。 Gorini-Kossakowski-Sudarshan-Lindblad (GKSL) の定式化を用いて、強い散逸が創発的な非コヒーレンスな部分空間に繋がることを示す。我々は、GKSL のダイナミクスによって構成されるユニタリダイナミクスは、相互作用が GKSL のジャンプ作用素と同一の形式を持つ強い相互作用を持つハミルトニアンによって構成されるものよりも、より厳密に制約されていると論じる。一例として、二点散逸を持つ一次元スピン系が、凍結ブロック構造を付加した自由磁壁運動を示す運動論的に制約された ``PXQ" モデルに導かれることを示した。均一磁場下では、PXQモデルはワニエ・スタークのローカライゼーションと同様、磁壁のローカライゼーションを示す。次に、磁場と鎖間相互作用を持つ2つのPXQ鎖を結合する。典型的なパラメータ状態の相互作用にもかかわらず、領域壁の局所化は継続するが、あるパラメータ線に対して非自明な部分的非局在化が現れる。 We propose a construction of kinetically constrained models using the Markovian quantum dynamics under strong dissipation. Using the Gorini-Kossakowski-Sudarshan-Lindblad (GKSL) formalism, we show that strong dissipation leads to the emergent decoherence-free subspaces, within which constrained quantum many-body unitary dynamics can take place. We argue that the unitary dynamics constructed by the GKSL dynamics is more tightly constrained than that constructed by the strongly interacting Hamiltonian, where the interactions have the same form with the GKSL jump operators. As an example, we demonstrate that a one-dimensional spin system with two-site dissipation leads to the kinetically constrained ``PXQ" model, which exhibits the free domain-wall motion with an additional frozen-block structure. Under the uniform magnetic field, the PXQ model shows the domain-wall localization, similar to the Wannier-Stark localization. We then couple two PXQ chains with the magnetic field and inter-chain interaction. We discover that, while localization of the domain walls persists despite the interactions for typical parameter regimes, a non-trivial partial delocalization appears for a certain parameter line.	翻訳日:2024-03-20 14:52:48 公開日:2024-03-19
# RGBD GS-ICP SLAM RGBD GS-ICP SLAM ( http://arxiv.org/abs/2403.12550v1 ) ライセンス: Link先を確認	Seongbo Ha, Jiung Yeon, Hyeonwoo Yu,	(参考訳) 濃密な表現を伴う同時局在マッピング(SLAM)は、ロボット工学、仮想現実(VR)、拡張現実(AR)アプリケーションにおいて重要な役割を果たす。密度表現SLAMの最近の進歩は、高忠実度空間表現にニューラルシーン表現と3次元ガウス表現を活用する可能性を強調している。本稿では,G-ICP(Generalized Iterative Closest Point)と3D Gaussian Splatting(3DGS)を融合した新しい高密度表現SLAM手法を提案する。既存の手法とは対照的に、トラッキングとマッピングの両方に単一のガウス写像を使用し、相互に利益をもたらす。追跡処理とマッピング処理との共分散をスケールアライメント技術と交換することで、冗長な計算を最小化し、効率的なシステムを実現する。さらに,キーフレーム選択手法により追跡精度とマッピング品質を向上させる。提案手法の有効性を実験的に示し,107 FPS (システム全体) の超高速化と再建地図の高品質化を実証した。 Simultaneous Localization and Mapping (SLAM) with dense representation plays a key role in robotics, Virtual Reality (VR), and Augmented Reality (AR) applications. Recent advancements in dense representation SLAM have highlighted the potential of leveraging neural scene representation and 3D Gaussian representation for high-fidelity spatial representation. In this paper, we propose a novel dense representation SLAM approach with a fusion of Generalized Iterative Closest Point (G-ICP) and 3D Gaussian Splatting (3DGS). In contrast to existing methods, we utilize a single Gaussian map for both tracking and mapping, resulting in mutual benefits. Through the exchange of covariances between tracking and mapping processes with scale alignment techniques, we minimize redundant computations and achieve an efficient system. Additionally, we enhance tracking accuracy and mapping quality through our keyframe selection methods. Experimental results demonstrate the effectiveness of our approach, showing an incredibly fast speed up to 107 FPS (for the entire system) and superior quality of the reconstructed map.	翻訳日:2024-03-20 14:52:48 公開日:2024-03-19
# M2DA: 自律運転のための運転注意を組み込んだ多モード核融合変圧器 M2DA: Multi-Modal Fusion Transformer Incorporating Driver Attention for Autonomous Driving ( http://arxiv.org/abs/2403.12552v1 ) ライセンス: Link先を確認	Dongyang Xu, Haokun Li, Qingfan Wang, Ziying Song, Lei Chen, Hanming Deng,	(参考訳) エンドツーエンドの自動運転は目覚ましい進歩を遂げた。しかし、主に自動運転車の広範な展開は実現されていない。 1)非効率なマルチモーダル環境認識:マルチモーダルセンサからのデータをより効率的に統合する方法 2) 経験豊富なドライバーのような交通シナリオにおいて,危険因子を効果的に発見し,予測する方法。本稿では,運転注意(M2DA)を組み込んだ多モード核融合トランスフォーマを提案する。マルチモーダルデータを融合し、異なるモーダル間の高整合を実現するために、新しいLidar-Vision-Attention-based Fusion (LVAFusion)モジュールを提案する。ドライバーの注意を取り入れることで、自動運転車に人間のようなシーン理解能力を付与し、複雑なシナリオの中で重要な領域を正確に特定し、安全性を確保する。我々はCARLAシミュレータで実験を行い、クローズドループベンチマークにおいて少ないデータで最先端の性能を達成する。ソースコードはhttps://anonymous.4open.science/r/M2DA-4772で公開されている。 End-to-end autonomous driving has witnessed remarkable progress. However, the extensive deployment of autonomous vehicles has yet to be realized, primarily due to 1) inefficient multi-modal environment perception: how to integrate data from multi-modal sensors more efficiently; 2) non-human-like scene understanding: how to effectively locate and predict critical risky agents in traffic scenarios like an experienced driver. To overcome these challenges, in this paper, we propose a Multi-Modal fusion transformer incorporating Driver Attention (M2DA) for autonomous driving. To better fuse multi-modal data and achieve higher alignment between different modalities, a novel Lidar-Vision-Attention-based Fusion (LVAFusion) module is proposed. By incorporating driver attention, we empower the human-like scene understanding ability to autonomous vehicles to identify crucial areas within complex scenarios precisely and ensure safety. We conduct experiments on the CARLA simulator and achieve state-of-the-art performance with less data in closed-loop benchmarks. Source codes are available at https://anonymous.4open.science/r/M2DA-4772.	翻訳日:2024-03-20 14:52:48 公開日:2024-03-19
# 多分野PDEのためのコドメイン注意神経オペレータの事前学習 Pretraining Codomain Attention Neural Operators for Solving Multiphysics PDEs ( http://arxiv.org/abs/2403.12553v1 ) ライセンス: Link先を確認	Md Ashiqur Rahman, Robert Joseph George, Mogab Elleithy, Daniel Leibovici, Zongyi Li, Boris Bonev, Colin White, Julius Berner, Raymond A. Yeh, Jean Kossaifi, Kamyar Azizzadenesheli, Anima Anandkumar,	(参考訳) 既存のニューラルネットワークアーキテクチャは、複雑なジオメトリー、物理変数間の相互作用、高解像度のトレーニングデータの欠如などにより、結合偏微分方程式(PDE)で多重物理問題を解く際の課題に直面している。このような問題に対処するために、コドメインやチャネル空間に沿った機能をトークン化し、自己教師付き学習や複数のPDEシステムの事前学習を可能にするコドメイン注意ニューラルネットワーク(CoDA-NO)を提案する。具体的には、位置符号化、自己アテンション、正規化層を関数空間に拡張する。 CoDA-NOは1つのモデルで異なるPDEシステムの表現を学習することができる。我々は,CoDA-NOの可能性を,複数システム上で多物理PDEを学習するためのバックボーンとして評価する。流体流動シミュレーションや流体構造相互作用などの限られたデータを含む複雑な下流タスクにおいて,CoDA-NOは,数ショット学習タスクにおける既存手法を36 %以上上回る性能を示した。コードはhttps://github.com/ashiq24/CoDA-NOで公開されている。 Existing neural operator architectures face challenges when solving multiphysics problems with coupled partial differential equations (PDEs), due to complex geometries, interactions between physical variables, and the lack of large amounts of high-resolution training data. To address these issues, we propose Codomain Attention Neural Operator (CoDA-NO), which tokenizes functions along the codomain or channel space, enabling self-supervised learning or pretraining of multiple PDE systems. Specifically, we extend positional encoding, self-attention, and normalization layers to the function space. CoDA-NO can learn representations of different PDE systems with a single model. We evaluate CoDA-NO's potential as a backbone for learning multiphysics PDEs over multiple systems by considering few-shot learning settings. On complex downstream tasks with limited data, such as fluid flow simulations and fluid-structure interactions, we found CoDA-NO to outperform existing methods on the few-shot learning task by over $36\%$. The code is available at https://github.com/ashiq24/CoDA-NO.	翻訳日:2024-03-20 14:43:03 公開日:2024-03-19
# グロースフリー手話翻訳のための大規模言語モデルを用いた因子学習 Factorized Learning Assisted with Large Language Model for Gloss-free Sign Language Translation ( http://arxiv.org/abs/2403.12556v1 ) ライセンス: Link先を確認	Zhigang Chen, Benjia Zhou, Jun Li, Jun Wan, Zhen Lei, Ning Jiang, Quan Lu, Guoqing Zhao,	(参考訳) 従来の手話翻訳(SLT)手法は、グロスアノテーションを頼りにすることで優れた性能を実現する。しかし、高品質なグルースをラベル付けすることは労働集約的な作業であり、SLTのさらなる発展を制限する。視覚エンコーダと翻訳ネットワークを共同でトレーニングすることで、光沢のないSLTに向けたアプローチもあるが、これらの取り組みは依然として性能が悪く、強力なLarge Language Model (LLM) の非効率な利用に悩まされている。最も深刻なのは、LSMをSLTに直接導入することで、LLMが学習曲線を支配しているため、視覚表現の学習が不十分になることである。これらの問題に対処するために,Glos-free SLTのためのLarge Language Model (FLa-LLM) を用いたファクトライズドラーニングを提案する。具体的には、トレーニングプロセスを2段階に分類する。視覚初期化段階では、視覚エンコーダの後に軽量な翻訳モデルを用いて、視覚エンコーダを事前訓練する。 LLMの微調整段階では、視覚エンコーダの取得した知識を凍結し、学習済みのLLMと統合し、LLMの翻訳電位を刺激する。この分解されたトレーニング戦略は、3つのSLTデータセットで達成された大幅な改善によって証明されたように、非常に効果的であることが証明されている。 Previous Sign Language Translation (SLT) methods achieve superior performance by relying on gloss annotations. However, labeling high-quality glosses is a labor-intensive task, which limits the further development of SLT. Although some approaches work towards gloss-free SLT through jointly training the visual encoder and translation network, these efforts still suffer from poor performance and inefficient use of the powerful Large Language Model (LLM). Most seriously, we find that directly introducing LLM into SLT will lead to insufficient learning of visual representations as LLM dominates the learning curve. To address these problems, we propose Factorized Learning assisted with Large Language Model (FLa-LLM) for gloss-free SLT. Concretely, we factorize the training process into two stages. In the visual initialing stage, we employ a lightweight translation model after the visual encoder to pre-train the visual encoder. In the LLM fine-tuning stage, we freeze the acquired knowledge in the visual encoder and integrate it with a pre-trained LLM to inspire the LLM's translation potential. 
This factorized training strategy proves highly effective, as evidenced by significant improvements across three SLT datasets, all evaluated under the gloss-free setting.	翻訳日:2024-03-20 14:43:03 公開日:2024-03-19
# マルチラベルクラスインクリメンタルラーニングのための信頼自己校正 Confidence Self-Calibration for Multi-Label Class-Incremental Learning ( http://arxiv.org/abs/2403.12559v1 ) ライセンス: Link先を確認	Kaile Du, Yifan Zhou, Fan Lyu, Yuyang Li, Chen Lu, Guangcan Liu,	(参考訳) MLCIL(Multi-Label Class-Incremental Learning)では、トレーニング中に新しいクラスだけがラベル付けされるのに対して、過去と将来のラベルは利用できない。この問題は、誤って高い信頼度を持つマルチラベル予測によって偽陽性エラーが増加し、不連続ラベル空間内で破滅的な忘れを悪化させる。本稿では,MLCILのマルチラベル信頼度校正を改良し,信頼性自己校正(CSC)アプローチを提案する。まず、ラベル関係の校正のために、学習可能で動的に拡張されたラベル関係グラフを構築することにより、孤立ラベル空間を橋渡しするクラスインクリメンタルグラフ畳み込みネットワークを導入する。そして、信頼度校正のために、各マルチラベルインクリメントに対して最大エントロピー正規化を示し、過信出力分布のペナル化による自信自己校正を容易にする。提案手法は,MS-COCOとPASCALのVOCデータセット上でのMLCILタスクにおいて,ラベルの信頼性の校正を手法によって確認した。 The partial label challenge in Multi-Label Class-Incremental Learning (MLCIL) arises when only the new classes are labeled during training, while past and future labels remain unavailable. This issue leads to a proliferation of false-positive errors due to erroneously high confidence multi-label predictions, exacerbating catastrophic forgetting within the disjoint label space. In this paper, we aim to refine multi-label confidence calibration in MLCIL and propose a Confidence Self-Calibration (CSC) approach. Firstly, for label relationship calibration, we introduce a class-incremental graph convolutional network that bridges the isolated label spaces by constructing learnable, dynamically extended label relationship graph. Then, for confidence calibration, we present a max-entropy regularization for each multi-label increment, facilitating confidence self-calibration through the penalization of over-confident output distributions. Our approach attains new state-of-the-art results in MLCIL tasks on both MS-COCO and PASCAL VOC datasets, with the calibration of label confidences confirmed through our methodology.	翻訳日:2024-03-20 14:43:03 公開日:2024-03-19
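The max-entropy regularization mentioned in the CSC abstract can be sketched as a generic confidence penalty. This is a minimal illustration under stated assumptions, not the paper's exact loss: the combination "binary cross-entropy minus a weighted mean entropy", the `weight` parameter, and all function names are hypothetical.

```python
import math


def sigmoid(logit):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def binary_entropy(prob, eps=1e-12):
    """Entropy of a Bernoulli(prob) distribution, in nats."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(prob * math.log(prob) + (1.0 - prob) * math.log(1.0 - prob))


def binary_cross_entropy(prob, target, eps=1e-12):
    """Cross-entropy between a predicted probability and a 0/1 target."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


def regularized_loss(logits, targets, weight=0.1):
    """Mean BCE minus `weight` times mean entropy (assumed form).

    Subtracting the entropy rewards less peaked predictions, so
    over-confident outputs are penalized relative to plain BCE.
    """
    probs = [sigmoid(z) for z in logits]
    bce = sum(binary_cross_entropy(p, t) for p, t in zip(probs, targets)) / len(probs)
    entropy = sum(binary_entropy(p) for p in probs) / len(probs)
    return bce - weight * entropy


# Made-up logits for three labels; the first two are very confident.
print(regularized_loss([4.0, -3.0, 0.1], [1, 0, 1]))
```

Each label is treated as an independent Bernoulli variable here, which matches the usual sigmoid-per-label setup for multi-label classification. How the paper applies the term per increment is not specified in the abstract.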
# アクセスによるエクイティ:小規模深層学習の事例 Equity through Access: A Case for Small-scale Deep Learning ( http://arxiv.org/abs/2403.12562v1 ) ライセンス: Link先を確認	Raghavendra Selvan, Bob Pepin, Christian Igel, Gabrielle Samuel, Erik B Dam,	(参考訳) 近年のディープラーニング(DL)の進歩は,大規模データへのアクセスと計算によって加速されている。これらの大規模資源は、計算、データ、エネルギー、炭素排出量の点で資源集約的な、徐々に大きなモデルを訓練するために使われてきた。これらのコストは、特にグローバル・サウスでは、そのような規模のリソースへのアクセスが限られている研究者や実践者にとって、新たなタイプの参入障壁になりつつある。本稿では,既存の視覚タスク用DLモデルの展望を包括的に見て,リソースが制限されている環境での有用性を実証する。 DLモデルの資源消費を考慮し,資源単位当たりの性能を推定する新たな尺度を導入し,これをPePRスコアと呼ぶ。 131のユニークなDLアーキテクチャ(1Mから130Mのトレーニング可能なパラメータ)と3つの医用画像データセットの多種多様なファミリを使用して、パフォーマンスとリソースのトレードオフに関するトレンドを捉えます。医用画像解析のような応用において、我々は、小規模で専門的なモデルは大規模モデルの試行より優れていると論じる。さらに,事前学習モデルを用いることで,必要な計算資源やデータを大幅に削減できることを示す。この取り組みは、より小さなリソースフットプリントを持つ方法やモデルを開発することによって、AIのエクイティを改善することに注力することをコミュニティに促すことを願っている。 The recent advances in deep learning (DL) have been accelerated by access to large-scale data and compute. These large-scale resources have been used to train progressively larger models which are resource intensive in terms of compute, data, energy, and carbon emissions. These costs are becoming a new type of entry barrier to researchers and practitioners with limited access to resources at such scale, particularly in the Global South. In this work, we take a comprehensive look at the landscape of existing DL models for vision tasks and demonstrate their usefulness in settings where resources are limited. To account for the resource consumption of DL models, we introduce a novel measure to estimate the performance per resource unit, which we call the PePR score. Using a diverse family of 131 unique DL architectures (spanning 1M to 130M trainable parameters) and three medical image datasets, we capture trends about the performance-resource trade-offs. In applications like medical image analysis, we argue that small-scale, specialized models are better than striving for large-scale models. 
Furthermore, we show that using pretrained models can significantly reduce the computational resources and data required. We hope this work will encourage the community to focus on improving AI equity by developing methods and models with smaller resource footprints.	翻訳日:2024-03-20 14:43:03 公開日:2024-03-19
# 時間・メモリ制限型GPUサービスにおける重テキスト分類に対する変換器の簡単なハック Simple Hack for Transformers against Heavy Long-Text Classification on a Time- and Memory-Limited GPU Service ( http://arxiv.org/abs/2403.12563v1 ) ライセンス: Link先を確認	Mirza Alim Mutasodirin, Radityo Eko Prasojo, Achmad F. Abka, Hanif Rasyidi,	(参考訳) 多くのNLP研究者は、Google Colabのような無料の計算サービスを使って、Transformerモデルを微調整し、二次的な複雑さとより大きなリソースを必要とするメソッドのために、長文分類におけるハイパーパラメータ最適化(HPO)の制限を引き起こしている。インドネシアでは、トランスフォーマーを用いた長文の分類ではごくわずかしか見つからなかった。ほとんどの場合、少量のデータのみを使用し、HPOを報告しない。本研究では,18kのニュース記事を用いて,トークンの出力長に基づいた事前学習モデルの使用を推奨する手法を検討する。次に、いくつかのハックを比較して、停止語、句読点、低頻度語、繰り返し単語の削除といったシーケンスを短くし、強化します。公平な比較を得るために,限られたリソースで段階的に実行可能で,長期の最適化ライブラリを必要としない,効率的で動的なHPOプロシージャを提案し,実行している。見つかった最高のハックを使って、512、256、および128のトークン長を比較します。句読点と低頻度の単語を保ちながら、停止語を削除することが、最良のハックであることに気付きました。セットアップのいくつかは、より小さな128または256のファーストトークンを使用して、512のファーストトークンを処理し、計算リソースを少なくしながら同じ情報を表現しています。この発見は,限られたリソースを使用して,モデルの最適なパフォーマンスを効率的に追求する上で有効だ。 Many NLP researchers rely on free computational services, such as Google Colab, to fine-tune their Transformer models, causing a limitation for hyperparameter optimization (HPO) in long-text classification due to the method having quadratic complexity and needing a bigger resource. In Indonesian, only a few works were found on long-text classification using Transformers. Most only use a small amount of data and do not report any HPO. In this study, using 18k news articles, we investigate which pretrained models are recommended to use based on the output length of the tokenizer. We then compare some hacks to shorten and enrich the sequences, which are the removals of stopwords, punctuation, low-frequency words, and recurring words. To get a fair comparison, we propose and run an efficient and dynamic HPO procedure that can be done gradually on a limited resource and does not require a long-running optimization library. Using the best hack found, we then compare 512, 256, and 128 tokens length. 
We find that removing stopwords while keeping punctuation and low-frequency words is the best hack. Some of our setups that use only the first 128 or 256 tokens outperform taking the first 512 tokens, representing the same information while requiring fewer computational resources. These findings could help developers efficiently pursue optimal model performance with limited resources.	翻訳日:2024-03-20 14:43:03 公開日:2024-03-19
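The best-performing hack described above (drop stopwords, keep punctuation and rare words, then keep only the first N tokens) can be sketched as follows. The whitespace tokenizer, the tiny English `STOPWORDS` set, and the function name `shorten` are assumptions for illustration; the study itself works on Indonesian news with a Transformer tokenizer.

```python
# Hypothetical stopword list; a real setup would use a language-specific one.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is"}


def shorten(text, max_tokens, stopwords=STOPWORDS):
    """Remove stopwords, keep everything else, and truncate to max_tokens."""
    tokens = text.split()  # assumption: whitespace tokenization
    kept = [token for token in tokens if token.lower() not in stopwords]
    return kept[:max_tokens]


sentence = "The price of rice is rising in the capital , officials said ."
print(shorten(sentence, max_tokens=5))
```

Because stopwords are removed before truncation, the first N tokens carry more content than the first N tokens of the raw text, which is one plausible reading of why shorter inputs can match longer ones.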
# TrustZone対応IoTデバイス上でのメモリ効率とセキュアDNN推論 Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices ( http://arxiv.org/abs/2403.12568v1 ) ライセンス: Link先を確認	Xueshuo Xie, Haoxu Wang, Zhaolong Jian, Tao Li, Wei Wang, Zhiwei Xu, Guiling Wang,	(参考訳) エッジインテリジェンスは、リソース要求のDeep Neural Network(DNN)推論を、元のデータを転送することなく実現し、IoT(Consumer Internet of Things)デバイスにおけるデータのプライバシに関する懸念に対処する。プライバシに敏感なアプリケーションでは、ハードウェアアイソレーションされた信頼できる実行環境(TEE)にモデルをデプロイすることが不可欠である。しかし、TEEの限られたセキュアメモリは、DNN推論のデプロイに課題をもたらし、モデルのパーティショニングやオフロードといった代替技術は、パフォーマンスの低下とセキュリティの問題をもたらす。本稿では,モデル推論における包括的プライバシ保護を実現するため,TrustZoneにおける高度なモデル展開のための新しいアプローチを提案する。我々は,TEEにおけるメモリ要求推論を支援するためのメモリ効率管理手法を設計する。メモリの優先度を調整することで、メモリリークリスクとメモリオーバーラップの競合を効果的に軽減し、信頼されたオペレーティングシステムで32行のコード変更を行う。さらに、小さなディープラーニングライブラリであるS-Tinylib(2,538 LoCs)と、小さな数学ライブラリであるTinylibm(827 LoCs)という2つの小さなライブラリを活用して、TEEの効率的な推論をサポートしています。 Raspberry Pi 3B+のプロトタイプを実装し,3種類の軽量DNNモデルを用いて評価した。実験の結果,TEEにおける非メモリ最適化手法と比較して,提案手法は推論速度を3.13倍改善し,消費電力を66.5%以上削減することがわかった。 Edge intelligence enables resource-demanding Deep Neural Network (DNN) inference without transferring original data, addressing concerns about data privacy in consumer Internet of Things (IoT) devices. For privacy-sensitive applications, deploying models in hardware-isolated trusted execution environments (TEEs) becomes essential. However, the limited secure memory in TEEs poses challenges for deploying DNN inference, and alternative techniques like model partitioning and offloading introduce performance degradation and security issues. In this paper, we present a novel approach for advanced model deployment in TrustZone that ensures comprehensive privacy preservation during model inference. We design a memory-efficient management method to support memory-demanding inference in TEEs. By adjusting the memory priority, we effectively mitigate memory leakage risks and memory overlap conflicts, resulting in 32 lines of code alterations in the trusted operating system. 
Additionally, we leverage two tiny libraries: S-Tinylib (2,538 LoCs), a tiny deep learning library, and Tinylibm (827 LoCs), a tiny math library, to support efficient inference in TEEs. We implemented a prototype on Raspberry Pi 3B+ and evaluated it using three well-known lightweight DNN models. The experimental results demonstrate that our design improves inference speed by 3.13 times and reduces power consumption by over 66.5% compared to the non-memory-optimized method in TEEs.	翻訳日:2024-03-20 14:43:03 公開日:2024-03-19
# 医用画像における一般化可能な異常検出のための視覚言語モデルの適用 Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images ( http://arxiv.org/abs/2403.12570v1 ) ライセンス: Link先を確認	Chaoqin Huang, Aofan Jiang, Jinghao Feng, Ya Zhang, Xinchao Wang, Yanfeng Wang,	(参考訳) 大規模視覚言語事前訓練モデルの最近の進歩は、自然画像領域におけるゼロ/ファウショット異常検出に大きな進歩をもたらした。しかし, 自然画像と医用画像の領域差は, 医学的異常検出におけるこれらの手法の有効性を制限している。本稿では,CLIPモデルを用いた医用異常検出のための軽量な多レベル適応と比較フレームワークを提案する。提案手法では,複数の残像アダプタを事前学習した視覚エンコーダに統合し,視覚的特徴の段階的向上を実現する。このマルチレベル適応は、自然画像におけるオブジェクトの意味論から医学画像における異常識別まで、モデルの焦点を再検討する多レベルな画素単位の視覚言語特徴アライメント損失関数によって導かれる。適応された特徴は、訓練中に未知の医学的モダリティや解剖学的領域に遭遇するゼロショットシナリオにおいても、様々な医療データタイプにわたる一般化が向上している。医学的異常検出ベンチマーク実験により,本手法が現在の最先端モデルを大幅に上回り,異常分類では平均6.24%,異常分類では7.33%,異常セグメンテーションでは2.03%と2.37%,ゼロショットおよび少数ショット設定では2.37%であった。ソースコードは、https://github.com/MediaBrain-SJTU/MVFA-ADで入手できる。 Recent advancements in large-scale visual-language pre-trained models have led to significant progress in zero-/few-shot anomaly detection within natural image domains. However, the substantial domain divergence between natural and medical images limits the effectiveness of these methodologies in medical anomaly detection. This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection. Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels. This multi-level adaptation is guided by multi-level, pixel-wise visual-language feature alignment loss functions, which recalibrate the model's focus from object semantics in natural imagery to anomaly identification in medical images. The adapted features exhibit improved generalization across various medical data types, even in zero-shot scenarios where the model encounters unseen medical modalities and anatomical regions during training. 
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models, with average AUC improvements of 6.24% and 7.33% for anomaly classification and 2.03% and 2.37% for anomaly segmentation under the zero-shot and few-shot settings, respectively. Source code is available at: https://github.com/MediaBrain-SJTU/MVFA-AD	翻訳日:2024-03-20 14:43:03 公開日:2024-03-19
# マルチモデルアンサンブルによる複合表現認識 Compound Expression Recognition via Multi Model Ensemble ( http://arxiv.org/abs/2403.12572v1 ) ライセンス: Link先を確認	Jun Yu, Jichao Zhu, Wangyuan Zhu,	(参考訳) 複合表現認識(CER)は対人相互作用において重要な役割を担っている。複合表現が存在するため、人間の感情表現は複雑であり、判断するためには局所的およびグローバルな表情の両方を考慮する必要がある。本稿では,この問題を解決するために,複合表現認識のためのアンサンブル学習手法を提案する。具体的には、畳み込みネットワーク、視覚変換器、マルチスケールローカルアテンションネットワークに基づく3つの表現分類モデルを訓練する。そして、後期融合を用いたモデルアンサンブルにより、複数のモデルの出力をマージして最終的な結果を予測する。提案手法はRAF-DBの精度が高く,C-EXPR-DBの一部でゼロショットで表現を認識できる。 Compound Expression Recognition (CER) plays a crucial role in interpersonal interactions. Because compound expressions exist, human emotional expressions are complex, and judging them requires considering both local and global facial expressions. To address this issue, we propose a solution based on ensemble learning for Compound Expression Recognition. Specifically, we treat the task as classification and train three expression classification models based on convolutional networks, Vision Transformers, and multi-scale local attention networks. Then, through model ensembling with late fusion, we merge the outputs of the models to predict the final result. Our method achieves high accuracy on RAF-DB and is able to recognize expressions in a zero-shot manner on certain portions of C-EXPR-DB.	翻訳日:2024-03-20 14:43:03 公開日:2024-03-19
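Late fusion, as named in the abstract above, combines the per-class outputs of separately trained models after each has made its own prediction. The sketch below uses a weighted average of class probabilities followed by an argmax. The averaging rule, the optional `weights` argument, and the example probabilities are assumptions; the abstract does not state which fusion rule the authors use.

```python
def late_fusion(model_probs, weights=None):
    """Fuse per-model class probabilities by weighted averaging.

    model_probs: list of probability lists, one list per model.
    weights: optional per-model weights; defaults to equal weighting.
    Returns (fused_probabilities, predicted_class_index).
    """
    if weights is None:
        weights = [1.0] * len(model_probs)
    total_weight = sum(weights)
    num_classes = len(model_probs[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, model_probs)) / total_weight
        for c in range(num_classes)
    ]
    predicted = max(range(num_classes), key=lambda c: fused[c])
    return fused, predicted


# Made-up outputs of three models (e.g. CNN, ViT, local-attention) on 3 classes.
outputs = [
    [0.60, 0.30, 0.10],
    [0.20, 0.50, 0.30],
    [0.30, 0.45, 0.25],
]
fused, label = late_fusion(outputs)
print(fused, label)
```

In this made-up case the first model alone would pick class 0, while the fused prediction follows the other two models, which is the kind of disagreement an ensemble is meant to smooth out.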
# 鳥の視線に対するマルチビュー検出と追跡のリフティング Lifting Multi-View Detection and Tracking to the Bird's Eye View ( http://arxiv.org/abs/2403.12573v1 ) ライセンス: Link先を確認	Torben Teepe, Philipp Wolters, Johannes Gilg, Fabian Herzog, Gerhard Rigoll,	(参考訳) マルチビューアグリゲーションの利点を生かして、マルチオブジェクトのトラッキングと検出において、閉塞や欠落検出といった課題に対処する、有望なソリューションを提供する。近年の多視点検出と3次元物体認識の進歩は、地上面に全てのビューを戦略的に投影し、鳥の視線から検出分析を行うことにより、性能を著しく向上させた。本稿では,パラメータフリーとパラメータ化の両方の現代的なリフト法とマルチビューアグリゲーションを比較した。さらに,複数のステップの特徴を集約してロバスト検出を学習し,外見と動きに基づく追跡手法を組み合わせたアーキテクチャを提案する。現在の追跡手法は歩行者か車両に重点を置いている。当社の作業では、両方のブランチを組み合わせて、クロスシーンセットアップによるマルチビュー検出に新たな課題を加えています。本手法は,歩行者:WildtrackとMultiviewX,道路側認識:Synthehicleの3つの領域にわたる公開データセットに一般化する。 https://github.com/tteepe/TrackTacular Taking advantage of multi-view aggregation presents a promising solution to tackle challenges such as occlusion and missed detection in multi-object tracking and detection. Recent advancements in multi-view detection and 3D object recognition have significantly improved performance by strategically projecting all views onto the ground plane and conducting detection analysis from a Bird's Eye View. In this paper, we compare modern lifting methods, both parameter-free and parameterized, to multi-view aggregation. Additionally, we present an architecture that aggregates the features of multiple times steps to learn robust detection and combines appearance- and motion-based cues for tracking. Most current tracking approaches either focus on pedestrians or vehicles. In our work, we combine both branches and add new challenges to multi-view detection with cross-scene setups. Our method generalizes to three public datasets across two domains: (1) pedestrian: Wildtrack and MultiviewX, and (2) roadside perception: Synthehicle, achieving state-of-the-art performance in detection and tracking. https://github.com/tteepe/TrackTacular	翻訳日:2024-03-20 14:43:03 公開日:2024-03-19
# EAS-SNN: 繰り返しスパイクニューラルネットワークを用いた事象検出のためのエンドツーエンド適応サンプリングと表現 EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks ( http://arxiv.org/abs/2403.12574v1 ) ライセンス: Link先を確認	Ziming Wang, Ziling Wang, Huaning Li, Lang Qin, Runhao Jiang, De Ma, Huajin Tang,	(参考訳) イベントカメラは、高いダイナミックレンジと時間分解能を持ち、特に動きのぼやけと困難な照明条件のシナリオにおいて、オブジェクト検出に最適である。しかし、ほとんどの既存手法は、高度な検出バックボーンと早期集約関数による時空間表現の最適化を優先しているが、適応的なイベントサンプリングの重要な問題は、ほとんど未適応のままである。スパーススパイク通信を通じてイベント駆動のパラダイムで動作するスパイキングニューラルネットワーク(SNN)は、この課題に対処するための自然なフィットとして現れます。本研究では、スパイキングニューロンの神経力学が理想的な時間事象サンプリング器の動作と密接に一致していることを明らかにする。そこで本研究では,時間記憶を付加した再帰的畳み込みSNNを活用する適応型サンプリングモジュールを提案する。さらに、スパイクベースサンプリングモジュールで発生する電位分布の制御と性能劣化に対処するため、Residual potential Dropout (RPD) と Spike-Aware Training (SAT) を導入する。イベントベース検出のためのニューロモルフィックデータセットの厳密なテストを通じて、既存のスパイクベースの手法を明らかに超越し、パラメータや時間ステップをはるかに少なくして優れた性能を実現した。例えば、我々の手法は、パラメータを38 %削減し、3 つの時間ステップで、Gen1 データセットで 4.4 % mAP の改善を実現している。さらに, 適応サンプリング手法の適用性および有効性は, 従来の非スパイキング検出モデルに対するさらなる検証を通じて示されるように, SNN を超えて拡張される。 Event cameras, with their high dynamic range and temporal resolution, are ideally suited for object detection, especially under scenarios with motion blur and challenging lighting conditions. However, while most existing approaches prioritize optimizing spatiotemporal representations with advanced detection backbones and early aggregation functions, the crucial issue of adaptive event sampling remains largely unaddressed. Spiking Neural Networks (SNNs), which operate on an event-driven paradigm through sparse spike communication, emerge as a natural fit for addressing this challenge. In this study, we discover that the neural dynamics of spiking neurons align closely with the behavior of an ideal temporal event sampler. 
Motivated by this insight, we propose a novel adaptive sampling module that leverages recurrent convolutional SNNs enhanced with temporal memory, facilitating a fully end-to-end learnable framework for event-based detection. Additionally, we introduce Residual Potential Dropout (RPD) and Spike-Aware Training (SAT) to regulate potential distribution and address performance degradation encountered in spike-based sampling modules. Through rigorous testing on neuromorphic datasets for event-based detection, our approach demonstrably surpasses existing state-of-the-art spike-based methods, achieving superior performance with significantly fewer parameters and time steps. For instance, our method achieves a 4.4% mAP improvement on the Gen1 dataset, while requiring 38% fewer parameters and three time steps. Moreover, the applicability and effectiveness of our adaptive sampling methodology extend beyond SNNs, as demonstrated through further validation on conventional non-spiking detection models.	翻訳日:2024-03-20 14:43:03 公開日:2024-03-19
# 条件量子力学のための厳密なモデル還元 Exact model reduction for conditional quantum dynamics ( http://arxiv.org/abs/2403.12575v1 ) ライセンス: Link先を確認	Tommaso Grigoletto, Francesco Ticozzi,	(参考訳) 量子確率の最小化と条件付き期待に基づく代数的アプローチを応用し、関心事の結果の正確な分布を維持しつつ、量子フィルタの次元を小さくする手法を提案する。この方法は、測定結果に依存し、システム理論の可観測性解析に基づく一般的な量子系に対して提示され、プロトタイプの例で検証される。 Leveraging an algebraic approach built on minimal realizations and conditional expectations in quantum probability, we propose a method to reduce the dimension of quantum filters while maintaining the correct distributions on the outcomes of interest. The method is presented for general quantum systems whose dynamics depend on measurement outcomes, hinges on a system-theoretic observability analysis, and is tested on prototypical examples.	翻訳日:2024-03-20 14:43:03 公開日:2024-03-19
# Real-IAD:Versatile Industrial Anomaly Detectionのベンチマークのための実世界のマルチビューデータセット Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection ( http://arxiv.org/abs/2403.12580v1 ) ライセンス: Link先を確認	Chengjie Wang, Wenbing Zhu, Bin-Bin Gao, Zhenye Gan, Jianning Zhang, Zhihao Gu, Shuguang Qian, Mingang Chen, Lizhuang Ma,	(参考訳) 産業的異常検出 (IAD) が注目され, 急速な発展を遂げている。しかし、最近のIADアプローチの開発は、データセットの制限により、ある種の困難に直面している。一方、最先端の手法のほとんどはMVTecのような主流データセット上で飽和(AUROCの99%以上)を達成しており、メソッドの違いは十分に区別できないため、パブリックデータセットと実際のアプリケーションシナリオの間には大きなギャップがある。一方, 各種の実用的異常検出設定に関する研究は, データセットの規模によって制限されており, 評価結果に過度に適合するリスクが生じる。そこで本研究では,30個の異なるオブジェクトの150Kの高分解能画像を含む大規模,実世界,多視点の産業異常検出データセットであるReal-IADを提案する。欠陥領域と比率の幅が広いため、以前のデータセットよりも難しい。このデータセットを実際のアプリケーションシナリオに近づけるために,多視点撮影法とサンプルレベルの評価指標を提案する。さらに, 一般の非監督的異常検出設定を超えて, 工業生産における収率が60%以上であり, 実用的価値が高いという観測に基づいて, 完全無監督産業異常検出(FUIAD)の新たな設定を提案する。最後に、Real-IADデータセット上での一般的なIAD手法の結果を報告し、IADフィールドの開発を促進するための非常に難しいベンチマークを提供する。 Industrial anomaly detection (IAD) has garnered significant attention and experienced rapid development. However, the recent development of IAD approach has encountered certain difficulties due to dataset limitations. On the one hand, most of the state-of-the-art methods have achieved saturation (over 99% in AUROC) on mainstream datasets such as MVTec, and the differences of methods cannot be well distinguished, leading to a significant gap between public datasets and actual application scenarios. On the other hand, the research on various new practical anomaly detection settings is limited by the scale of the dataset, posing a risk of overfitting in evaluation results. Therefore, we propose a large-scale, Real-world, and multi-view Industrial Anomaly Detection dataset, named Real-IAD, which contains 150K high-resolution images of 30 different objects, an order of magnitude larger than existing datasets. It has a larger range of defect area and ratio proportions, making it more challenging than previous datasets. 
To make the dataset closer to real application scenarios, we adopted a multi-view shooting method and proposed sample-level evaluation metrics. In addition, beyond the general unsupervised anomaly detection setting, we propose a new setting for Fully Unsupervised Industrial Anomaly Detection (FUIAD) based on the observation that the yield rate in industrial production is usually greater than 60%, which has more practical application value. Finally, we report the results of popular IAD methods on the Real-IAD dataset, providing a highly challenging benchmark to promote the development of the IAD field.	翻訳日:2024-03-20 14:43:03 公開日:2024-03-19
# AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework ( http://arxiv.org/abs/2403.12582v1 ) License: see linked page	Xiang Li, Zhenyu Li, Chen Shi, Yong Xu, Qing Du, Mingkui Tan, Jun Huang, Wei Lin	The task of financial analysis primarily encompasses two key areas: stock trend prediction and the corresponding financial question answering. Currently, machine learning and deep learning algorithms (ML&DL) have been widely applied for stock trend predictions, leading to significant progress. However, these methods fail to provide reasons for predictions, lacking interpretability and reasoning processes. Also, they cannot integrate textual information such as financial news or reports. Meanwhile, large language models (LLMs) have remarkable textual understanding and generation ability. But due to the scarcity of financial training datasets and limited integration with real-time knowledge, LLMs still suffer from hallucinations and are unable to keep up with the latest information. To tackle these challenges, we first release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data. It has a positive impact on training LLMs for completing financial analysis. We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task, which integrates retrieval-augmented generation (RAG) techniques. Extensive experiments are conducted to demonstrate the effectiveness of our framework on financial analysis.	Translated: 2024-03-20 14:43:03 Published: 2024-03-19
# LASPA: Latent Spatial Alignment for Fast Training-free Single Image Editing ( http://arxiv.org/abs/2403.12585v1 ) License: see linked page	Yazeed Alharbi, Peter Wonka	We present a novel, training-free approach for textual editing of real images using diffusion models. Unlike prior methods that rely on computationally expensive finetuning, our approach leverages LAtent SPatial Alignment (LASPA) to efficiently preserve image details. We demonstrate how the diffusion process is amenable to spatial guidance using a reference image, leading to semantically coherent edits. This eliminates the need for complex optimization and costly model finetuning, resulting in significantly faster editing compared to previous methods. Additionally, our method avoids the storage requirements associated with large finetuned models. These advantages make our approach particularly well-suited for editing on mobile devices and applications demanding rapid response times. While simple and fast, our method achieves 62-71% preference in a user study and significantly better model-based editing strength and image preservation scores.	Translated: 2024-03-20 14:43:03 Published: 2024-03-19
# Machine Learning of the Prime Distribution ( http://arxiv.org/abs/2403.12588v1 ) License: see linked page	Alexander Kolpakov, Aidan Rocke	In the present work we use maximum entropy methods to derive several theorems in probabilistic number theory, including a version of the Hardy-Ramanujan Theorem. We also provide a theoretical argument explaining the experimental observations of Y.-H. He about the learnability of primes, and posit that the Erdős-Kac law would very unlikely be discovered by current machine learning techniques. Numerical experiments that we perform corroborate our theoretical findings.	Translated: 2024-03-20 14:43:03 Published: 2024-03-19
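For readers unfamiliar with the two results named in this abstract, the following is a minimal numerical sketch, not the authors' experiments. It counts the distinct prime factors ω(n) with a sieve, which lets one check the Hardy-Ramanujan statement that ω(n) is typically close to log log n, and it forms the standardized quantity (ω(n) − log log n) / √(log log n) that the Erdős-Kac law describes as asymptotically standard normal. The function names and the chosen limits are illustrative assumptions.

```python
import math


def distinct_prime_factor_counts(limit):
    """Return a list omega where omega[n] is the number of distinct primes dividing n."""
    omega = [0] * (limit + 1)
    for p in range(2, limit + 1):
        if omega[p] == 0:  # no smaller prime divides p, so p is prime
            for multiple in range(p, limit + 1, p):
                omega[multiple] += 1
    return omega


def erdos_kac_standardized(limit):
    """Standardize omega(n) for 3 <= n <= limit using the log log n mean and variance."""
    omega = distinct_prime_factor_counts(limit)
    values = []
    for n in range(3, limit + 1):  # start at 3 so that log(log(n)) is positive
        loglog = math.log(math.log(n))
        values.append((omega[n] - loglog) / math.sqrt(loglog))
    return values


if __name__ == "__main__":
    limit = 100_000
    omega = distinct_prime_factor_counts(limit)
    mean_omega = sum(omega[2:]) / (limit - 1)
    print("mean omega(n):", round(mean_omega, 3))
    print("log log limit:", round(math.log(math.log(limit)), 3))
```

Convergence in the Erdős-Kac law is very slow (the scale is log log n), so a histogram of the standardized values at this range only loosely resembles a normal curve.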
# FootstepNet: an Efficient Actor-Critic Method for Fast On-line Bipedal Footstep Planning and Forecasting ( http://arxiv.org/abs/2403.12589v1 ) License: see linked page	Clément Gaspard, Grégoire Passault, Mélodie Daniel, Olivier Ly	Designing a humanoid locomotion controller is challenging and classically split up into sub-problems. Footstep planning is one of those, where the sequence of footsteps is defined. Even in simpler environments, finding a minimal sequence, or even a feasible sequence, yields a complex optimization problem. In the literature, this problem is usually addressed by search-based algorithms (e.g. variants of A*). However, such approaches are either computationally expensive or rely on hand-crafted tuning of several parameters. In this work, we first propose an efficient footstep planning method to navigate in local environments with obstacles, based on state-of-the-art Deep Reinforcement Learning (DRL) techniques, with very low computational requirements for on-line inference. Our approach is heuristic-free and relies on a continuous set of actions to generate feasible footsteps. In contrast, other methods necessitate the selection of a relevant discrete set of actions. Second, we propose a forecasting method that quickly estimates the number of footsteps required to reach different candidate local targets. This approach relies on inherent computations made by the actor-critic DRL architecture. We demonstrate the validity of our approach with simulation results, and by a deployment on a kid-size humanoid robot during the RoboCup 2023 competition.	Translated: 2024-03-20 14:33:18 Published: 2024-03-19
# Hamiltonian for a Bose gas with Contact Interactions ( http://arxiv.org/abs/2403.12594v1 ) License: see linked page	Daniele Ferretti, Alessandro Teta	We study the Hamiltonian for a Bose gas in three dimensions of $N \geq 3$ spinless particles interacting via zero-range or contact interactions. Such interactions are described by (singular) boundary conditions satisfied at the coincidence hyperplanes, i.e., when the coordinates of two particles coincide. It is known that if one imposes the same kind of boundary condition as in the one-body problem with a point interaction, then one is inevitably led to a Hamiltonian unbounded from below and therefore unstable. This is because the interaction becomes too strong and attractive when the coordinates of three or more particles coincide. To avoid this instability, we develop a suggestion formulated by Minlos and Faddeev in 1962 and introduce a slightly modified boundary condition which reduces the strength of the interaction when the positions of the particles $i, j$ coincide in the following cases: a) a third particle approaches the common position of $i$ and $j$; b) two other particles approach each other. In all the other cases the usual boundary condition is restored. Following a quadratic form approach, we prove that the Hamiltonian characterized by such modified boundary condition is self-adjoint and bounded from below. We also show that the N-body Hamiltonian with contact interactions obtained years ago by Albeverio, Høegh-Krohn and Streit using the theory of Dirichlet forms (J. Math. Phys., 18, 907-917, 1977) is a special case of our Hamiltonian.	Translated: 2024-03-20 14:33:18 Published: 2024-03-19
# Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs ( http://arxiv.org/abs/2403.12596v1 ) License: see linked page	Victor Carbune, Hassan Mansoor, Fangyu Liu, Rahul Aralikatte, Gilles Baechler, Jindong Chen, Abhanshu Sharma	Vision-language models (VLMs) are achieving increasingly strong performance on multimodal tasks. However, reasoning capabilities remain limited, particularly for smaller VLMs, while those of large language models (LLMs) have seen numerous improvements. We propose a technique to transfer capabilities from LLMs to VLMs. On the recently introduced ChartQA, our method obtains state-of-the-art performance when applied on the PaLI3-5B VLM by \citet{chen2023pali3}, while also enabling much better performance on PlotQA and FigureQA. We first improve the chart representation by continuing the pre-training stage using an improved version of the chart-to-table translation task by \citet{liu2023deplot}. We then propose constructing a 20x larger dataset than the original training set. To improve general reasoning capabilities and improve numerical operations, we synthesize reasoning traces using the table representation of charts. Lastly, our model is fine-tuned using the multitask loss introduced by \citet{hsieh2023distilling}. Our variant ChartPaLI-5B outperforms even 10x larger models such as PaLIX-55B without using an upstream OCR system, while keeping inference time constant compared to the PaLI3-5B baseline. When rationales are further refined with a simple program-of-thought prompt \cite{chen2023program}, our model outperforms the recently introduced Gemini Ultra and GPT-4V.	Translated: 2024-03-20 14:33:18 Published: 2024-03-19
# Preventing Eviction-Caused Homelessness through ML-Informed Distribution of Rental Assistance ( http://arxiv.org/abs/2403.12599v1 ) License: see linked page	Catalina Vajiac, Arun Frey, Joachim Baumann, Abigail Smith, Kasun Amarasinghe, Alice Lai, Kit Rodolfa, Rayid Ghani	Rental assistance programs provide individuals with financial assistance to prevent housing instabilities caused by evictions and avert homelessness. Since these programs operate under resource constraints, they must decide who to prioritize. Typically, funding is distributed by a reactive or first-come-first-served allocation process that does not systematically consider risk of future homelessness. We partnered with Allegheny County, PA to explore a proactive allocation approach that prioritizes individuals facing eviction based on their risk of future homelessness. Our ML system, which uses state and county administrative data to accurately identify individuals in need of support, outperforms simpler prioritization approaches by at least 20% while being fair and equitable across race and gender. Furthermore, our approach would identify 28% of individuals who are overlooked by the current process and end up homeless. Beyond improvements to the rental assistance program in Allegheny County, this study can inform the development of evidence-based decision support tools in similar contexts, including lessons about data needs, model design, evaluation, and field validation.	Translated: 2024-03-20 14:33:18 Published: 2024-03-19
# LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models ( http://arxiv.org/abs/2403.12601v1 ) License: see linked page	Chuang Liu, Renren Jin, Yuqi Ren, Deyi Xiong	Chinese Large Language Models (LLMs) have recently demonstrated impressive capabilities across various NLP benchmarks and real-world applications. However, the existing benchmarks for comprehensively evaluating these LLMs are still insufficient, particularly in terms of measuring knowledge that LLMs capture. Current datasets collect questions from Chinese examinations across different subjects and educational levels to address this issue. Yet, these benchmarks primarily focus on objective questions such as multiple-choice questions, leading to a lack of diversity in question types. To tackle this problem, we propose LHMKE, a Large-scale, Holistic, and Multi-subject Knowledge Evaluation benchmark. LHMKE is designed to provide a comprehensive evaluation of the knowledge acquisition capabilities of Chinese LLMs. It encompasses 10,465 questions across 75 tasks covering 30 subjects, ranging from primary school to professional certification exams. Notably, LHMKE includes both objective and subjective questions, offering a more holistic evaluation of the knowledge level of LLMs. We have assessed 11 Chinese LLMs under the zero-shot setting, which aligns with real examinations, and compared their performance across different subjects. We also conduct an in-depth analysis to check whether GPT-4 can automatically score subjective predictions. Our findings suggest that LHMKE is a challenging and advanced testbed for Chinese LLMs.	Translated: 2024-03-20 14:33:18 Published: 2024-03-19
# Integrated distributed sensing and quantum communication networks ( http://arxiv.org/abs/2403.12602v1 ) License: see linked page	Yuehan Xu, Tao Wang, Peng Huang, Guihua Zeng	The integration of sensing and communication can achieve ubiquitous sensing while enabling ubiquitous communication. Within the gradually improving global communication, the integrated sensing and communication (ISAC) system based on optical fibers can accomplish various functionalities, such as urban structure imaging, seismic wave detection, and pipeline safety monitoring. With the development of quantum communication, quantum networks based on optical fiber are gradually being established. In this paper, we propose an integrated sensing and quantum network (ISAQN) scheme, which can achieve secure key distribution among multiple nodes and distributed sensing under the standard quantum limit. The CV-QKD protocol and the round-trip multi-band structure are adopted to achieve the multi-node secure key distribution. Meanwhile, the spectrum phase monitoring (SPM) protocol is proposed to realize distributed sensing. It determines which node is vibrating by monitoring the frequency spectrum and restores the vibration waveform by monitoring the phase change. The scheme is experimentally demonstrated by simulating the vibration in a star structure network. Experimental results indicate that this multi-user quantum network can achieve a secret key rate (SKR) of approximately 0.7 Mbit/s for each user under 10 km standard fiber transmission and its network capacity is 8. In terms of distributed sensing, it can achieve a vibration response bandwidth ranging from 1 Hz to 2 kHz, a strain resolution of $0.50\ \mathrm{n}\varepsilon/\sqrt{\mathrm{Hz}}$, and a spatial resolution of 0.20 m under shot-noise-limited detection. The proposed ISAQN scheme enables simultaneous quantum communication and distributed sensing in a multi-point network, laying a foundation for future large-scale quantum networks and high-precision sensing networks.	Translated: 2024-03-20 14:33:18 Published: 2024-03-19
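The idea of locating a vibration by monitoring the frequency spectrum can be illustrated with a generic spectral-peak search. The sketch below is not the SPM protocol from the paper; it only shows, under the assumption of a uniformly sampled real-valued signal, how a discrete Fourier transform exposes the dominant vibration frequency. The function names, the sample rate, and the synthetic 20 Hz test tone are all hypothetical.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Magnitudes of the one-sided discrete Fourier transform (bins 0 .. n/2)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)
        )
        spectrum.append(abs(coefficient))
    return spectrum


def dominant_frequency(samples, sample_rate_hz):
    """Frequency in Hz of the strongest non-DC spectral bin."""
    spectrum = magnitude_spectrum(samples)
    peak_bin = max(range(1, len(spectrum)), key=lambda k: spectrum[k])  # skip bin 0 (DC)
    return peak_bin * sample_rate_hz / len(samples)


if __name__ == "__main__":
    sample_rate_hz = 256
    # Synthetic vibration: a pure 20 Hz tone sampled for one second.
    signal = [math.sin(2 * math.pi * 20 * t / sample_rate_hz) for t in range(256)]
    print(dominant_frequency(signal, sample_rate_hz))
```

The direct O(n²) transform keeps the example dependency-free; a real monitoring loop would use an FFT and would also track the phase of the peak bin over time.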
# A Benchmark for Data Management Challenges in Microservices ( http://arxiv.org/abs/2403.12605v1 ) License: see linked page	Rodrigo Laigner, Zhexiang Zhang, Yijian Liu, Leonardo Freitas Gomes, Yongluan Zhou	Microservice architectures emerged as a popular architecture for designing scalable distributed applications. Although microservices have been extensively employed in industry settings for over a decade, there is little understanding of the data management challenges that arise in these applications. As a result, it is difficult to advance data system technologies for supporting microservice applications. To fill this gap, we present Online Marketplace, a microservice benchmark that incorporates core data management challenges that existing benchmarks have not sufficiently addressed. These challenges include transaction processing, query processing, event processing, constraint enforcement, and data replication. We have defined criteria for various data management issues to enable proper comparison across data systems and platforms. After specifying the benchmark, we present the challenges we faced in creating workloads that accurately reflect the dynamic state of the microservices. We also discuss implementation issues that we encountered when developing Online Marketplace in state-of-the-art data platforms, which prevented us from meeting the specified data management requirements and criteria. Our evaluation demonstrates that the benchmark is a valuable tool for testing important properties sought by microservice practitioners. As a result, our proposed benchmark will facilitate the design of future data systems to meet the expectations of microservice practitioners.	Translated: 2024-03-20 14:33:18 Published: 2024-03-19
# On the Effectiveness of Heterogeneous Ensemble Methods for Re-identification ( http://arxiv.org/abs/2403.12606v1 ) License: see linked page	Simon Klüttermann, Jérôme Rutinowski, Anh Nguyen, Britta Grimme, Moritz Roidl, Emmanuel Müller	In this contribution, we introduce a novel ensemble method for the re-identification of industrial entities, using images of chipwood pallets and galvanized metal plates as dataset examples. Our algorithms replace commonly used, complex siamese neural networks with an ensemble of simplified, rudimentary models, providing wider applicability, especially in hardware-restricted scenarios. Each ensemble sub-model uses different types of extracted features of the given data as its input, allowing for the creation of effective ensembles in a fraction of the training duration needed for more complex state-of-the-art models. We reach state-of-the-art performance at our task, with a Rank-1 accuracy of over 77% and a Rank-10 accuracy of over 99%. We also introduce five distinct feature extraction approaches and study their combination using different ensemble methods.	Translated: 2024-03-20 14:33:18 Published: 2024-03-19
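The Rank-1 and Rank-10 figures quoted in this abstract are rank-k accuracies: the fraction of queries whose true identity appears among the top k ranked gallery candidates. The sketch below shows that metric together with one simple way of combining sub-model outputs, namely averaging their similarity scores. The averaging rule, the function names, and the toy identifiers are assumptions for illustration; the abstract does not say which ensemble rules the authors use.

```python
def fuse_scores(score_dicts):
    """Average per-candidate similarity scores across sub-models (assumed fusion rule)."""
    n_models = len(score_dicts)
    candidates = score_dicts[0].keys()
    return {c: sum(scores[c] for scores in score_dicts) / n_models for c in candidates}


def rank_candidates(scores):
    """Candidate ids sorted from most to least similar."""
    return sorted(scores, key=lambda c: scores[c], reverse=True)


def rank_k_accuracy(rankings, true_ids, k):
    """Fraction of queries whose true id is within the first k entries of its ranking."""
    hits = sum(1 for ranking, truth in zip(rankings, true_ids) if truth in ranking[:k])
    return hits / len(true_ids)


if __name__ == "__main__":
    # Two hypothetical sub-models scoring three gallery items for one query.
    model_a = {"pallet_1": 0.9, "pallet_2": 0.2, "pallet_3": 0.1}
    model_b = {"pallet_1": 0.3, "pallet_2": 0.8, "pallet_3": 0.1}
    ranking = rank_candidates(fuse_scores([model_a, model_b]))
    print(ranking)
    print(rank_k_accuracy([ranking], ["pallet_1"], k=1))
```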
# SUN Team's Contribution to ABAW 2024 Competition: Audio-visual Valence-Arousal Estimation and Expression Recognition ( http://arxiv.org/abs/2403.12609v1 ) License: see linked page	Denis Dresvyanskiy, Maxim Markitantov, Jiawei Yu, Peitong Li, Heysem Kaya, Alexey Karpov	As emotions play a central role in human communication, automatic emotion recognition has attracted increasing attention in the last two decades. While multimodal systems achieve high performance on lab-controlled data, they are still far from providing ecological validity on non-lab-controlled, namely 'in-the-wild', data. This work investigates audiovisual deep learning approaches for the in-the-wild emotion recognition problem. We particularly explore the effectiveness of architectures based on fine-tuned Convolutional Neural Networks (CNN) and Public Dimensional Emotion Model (PDEM), for video and audio modality, respectively. We compare alternative temporal modeling and fusion strategies using the embeddings from these multi-stage trained modality-specific Deep Neural Networks (DNN). We report results on the AffWild2 dataset under the Affective Behavior Analysis in-the-Wild 2024 (ABAW'24) challenge protocol.	Translated: 2024-03-20 14:33:18 Published: 2024-03-19
# Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code ( http://arxiv.org/abs/2403.12627v1 ) License: see linked page	Andreas Florath	In the realm of formal theorem proving, the Coq proof assistant stands out for its rigorous approach to verifying mathematical assertions and software correctness. Despite the advances in artificial intelligence and machine learning, the specialized nature of Coq syntax and semantics poses unique challenges for Large Language Models (LLMs). Addressing this gap, we present a comprehensive dataset specifically designed to enhance LLMs' proficiency in interpreting and generating Coq code. This dataset, derived from a collection of over 10,000 Coq source files, encompasses a wide array of propositions, proofs, and definitions, enriched with metadata including source references and licensing information. Our primary aim is to facilitate the development of LLMs capable of generating syntactically correct and semantically meaningful Coq constructs, thereby advancing the frontier of automated theorem proving. Initial experiments with this dataset have showcased its significant potential; models trained on this data exhibited enhanced accuracy in Coq code generation. Notably, a particular experiment revealed that a fine-tuned LLM was capable of generating 141 valid proofs for a basic lemma, highlighting the dataset's utility in facilitating the discovery of diverse and valid proof strategies. This paper discusses the dataset's composition, the methodology behind its creation, and the implications of our findings for the future of machine learning in formal verification. The dataset is accessible for further research and exploration: https://huggingface.co/datasets/florath/coq-facts-props-proofs-gen0-v1	Translated: 2024-03-20 14:33:18 Published: 2024-03-19
# PointGrasp: Point Cloud-based Grasping for Tendon-driven Soft Robotic Glove Applications ( http://arxiv.org/abs/2403.12631v1 ) License: see linked page	Chen Hu, Shirui Lyu, Eojin Rho, Daekyum Kim, Shan Luo, Letizia Gionfrida	Controlling hand exoskeletons to assist individuals with grasping tasks poses a challenge due to the difficulty in understanding user intentions. We propose that most daily grasping tasks during activities of daily living (ADL) can be deduced by analyzing object geometries (simple and complex) from 3D point clouds. The study introduces PointGrasp, a real-time system designed for identifying household scenes semantically, aiming to support and enhance assistance during ADL for tailored end-to-end grasping tasks. The system comprises an RGB-D camera with an inertial measurement unit and a microprocessor integrated into a tendon-driven soft robotic glove. The RGB-D camera processes 3D scenes at a rate exceeding 30 frames per second. The proposed pipeline demonstrates an average RMSE of 0.8 $\pm$ 0.39 cm for simple and 0.11 $\pm$ 0.06 cm for complex geometries. Within each mode, it identifies and pinpoints reachable objects. This system shows promise in end-to-end vision-driven robotic-assisted rehabilitation manual tasks.	Translated: 2024-03-20 14:33:18 Published: 2024-03-19
# A Practical Guide to Statistical Distances for Evaluating Generative Models in Science ( http://arxiv.org/abs/2403.12636v1 ) License: see linked page	Sebastian Bischoff, Alana Darcher, Michael Deistler, Richard Gao, Franziska Gerken, Manuel Gloeckler, Lisa Haxel, Jaivardhan Kapoor, Janne K Lappalainen, Jakob H Macke, Guy Moss, Matthijs Pals, Felix Pei, Rachel Rapp, A Erdem Sağtekin, Cornelius Schröder, Auguste Schulz, Zinovia Stefanidi, Shoji Toyota, Linda Ulmer, Julius Vetter	Generative models are invaluable in many fields of science because of their ability to capture high-dimensional and complicated distributions, such as photo-realistic images, protein structures, and connectomes. How do we evaluate the samples these models generate? This work aims to provide an accessible entry point to understanding popular notions of statistical distances, requiring only foundational knowledge in mathematics and statistics. We focus on four commonly used notions of statistical distances representing different methodologies: using low-dimensional projections (Sliced-Wasserstein; SW), obtaining a distance using classifiers (Classifier Two-Sample Tests; C2ST), using embeddings through kernels (Maximum Mean Discrepancy; MMD), or neural networks (Fréchet Inception Distance; FID). We highlight the intuition behind each distance and explain their merits, scalability, complexity, and pitfalls. To demonstrate how these distances are used in practice, we evaluate generative models from different scientific domains, namely a model of decision making and a model generating medical images. We showcase that distinct distances can give different results on similar data. Through this guide, we aim to help researchers to use, interpret, and evaluate statistical distances for generative models in science.	Translated: 2024-03-20 14:33:18 Published: 2024-03-19
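Of the four distances named in this abstract, the kernel-based Maximum Mean Discrepancy is the shortest to write down. The following is a minimal sketch of the biased (V-statistic) estimate of squared MMD with a Gaussian RBF kernel, assuming samples are given as tuples of floats. It is not taken from the guide itself; the bandwidth of 1.0 and the synthetic Gaussian samples are arbitrary illustrative choices, and in practice the bandwidth strongly affects the result.

```python
import math
import random


def rbf_kernel(a, b, bandwidth):
    """Gaussian RBF kernel between two points given as equal-length tuples."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-squared_distance / (2 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate: mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""

    def mean_kernel(us, vs):
        total = sum(rbf_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)


if __name__ == "__main__":
    rng = random.Random(0)
    reference = [(rng.gauss(0, 1),) for _ in range(100)]
    same_dist = [(rng.gauss(0, 1),) for _ in range(100)]
    shifted = [(rng.gauss(3, 1),) for _ in range(100)]
    print("same distribution:", mmd_squared(reference, same_dist))
    print("shifted mean:     ", mmd_squared(reference, shifted))
```

The estimate is close to zero when both sample sets come from the same distribution and grows as the distributions separate; the pairwise loops make the cost quadratic in the number of samples.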
# Automated Contrastive Learning Strategy Search for Time Series ( http://arxiv.org/abs/2403.12641v1 ) License: see linked page	Baoyu Jing, Yansen Wang, Guoxin Sui, Jing Hong, Jingrui He, Yuqing Yang, Dongsheng Li, Kan Ren	In recent years, Contrastive Learning (CL) has become a predominant representation learning paradigm for time series. Most existing methods in the literature focus on manually building specific Contrastive Learning Strategies (CLS) by human heuristics for certain datasets and tasks. However, manually developing CLS usually requires excessive prior knowledge about the datasets and tasks, e.g., professional cognition of the medical time series in healthcare, as well as huge human labor and massive experiments to determine the detailed learning configurations. In this paper, we present an Automated Machine Learning (AutoML) practice at Microsoft, which automatically learns to contrastively learn representations for various time series datasets and tasks, namely Automated Contrastive Learning (AutoCL). We first construct a principled universal search space of size over $3\times10^{12}$, covering data augmentation, embedding transformation, contrastive pair construction and contrastive losses. Further, we introduce an efficient reinforcement learning algorithm, which optimizes CLS from the performance on the validation tasks, to obtain more effective CLS within the space. Experimental results on various real-world tasks and datasets demonstrate that AutoCL could automatically find the suitable CLS for a given dataset and task. From the candidate CLS found by AutoCL on several public datasets/tasks, we compose a transferable Generally Good Strategy (GGS), which has a strong performance for other datasets. We also provide empirical analysis as a guidance for future design of CLS.	Translated: 2024-03-20 14:33:18 Published: 2024-03-19
# あなたの脳はいつあなたを知るのか? セグメント長と脳波による生体認証精度への影響 When Does Your Brain Know You? Segment Length and Its Impact on EEG-based Biometric Authentication Accuracy ( http://arxiv.org/abs/2403.12644v1 ) ライセンス: Link先を確認	Nibras Abo Alzahab, Lorenzo Scalise, Marco Baldi,	(参考訳) 本研究は,脳波に基づく生体認証を最適化する上で,性能を犠牲にすることなく正確な識別を行うための重要なバランスについて検討する。セグメント期間の方法論的な探索と、さまざまな高度な機械学習モデルの使用を通じて、脳波データが認証目的のために最大情報収率を提供するしきい値の特定を試みる。この知見は,非侵襲的生体認証技術の分野を推し進め,安全でユーザフレンドリな識別認証システムへの実践的アプローチを提案するとともに,脳波に基づく生体認証の現実的な適用を制御環境を超えて検討することを目的としている。 In the quest for optimal EEG-based biometric authentication, this study investigates the pivotal balance for accurate identification without sacrificing performance or adding unnecessary computational complexity. Through a methodical exploration of segment durations, and employing a variety of sophisticated machine learning models, the research seeks to pinpoint a threshold where EEG data provides maximum informational yield for authentication purposes. The findings are set to advance the field of non-invasive biometric technologies, proposing a practical approach to secure and user-friendly identity verification systems while also raising considerations for the real-world application of EEG-based biometric authentication beyond controlled environments.	翻訳日:2024-03-20 14:33:18 公開日:2024-03-19
# 帰納的論理的問合せ解答のための Prompt-fused フレームワーク Prompt-fused framework for Inductive Logical Query Answering ( http://arxiv.org/abs/2403.12646v1 ) ライセンス: Link先を確認	Zezhong Xu, Peng Ye, Lei Liang, Huajun Chen, Wen Zhang,	(参考訳) 知識グラフ(KG)上の論理的クエリに答えることは、機械推論にとって大きな課題となる。このタスクの主な障害は、KGsの固有の不完全性に起因する。既存の研究は、KGsにおけるエッジの欠如の問題に対処することに集中しており、その結果、新しいエンティティの出現という不完全性のもう1つの側面を無視している。さらに、既存のメソッドの多くは、推論プロセス中にクエリ全体を包括的に解析するのではなく、それぞれの論理演算子を別々に推論する傾向がある。本稿では,既存のクエリ埋め込み手法を組み込んで,コンテキスト情報アグリゲーションによる新しいエンティティの埋め込みに対処する,Pro-QEという問合せ対応型プロンプトフューズフレームワークを提案する。さらに、シンボリッククエリをエンコードして生成されるクエリプロンプトを導入して、全体的な視点からクエリに関連する情報を集める。帰納的設定におけるモデルの有効性を評価するために,2つの新しい挑戦的ベンチマークを導入する。実験結果から,本モデルが論理的クエリにおける未知のエンティティの問題にうまく対処できることが示唆された。さらに、アブレーション研究により、アグリゲータとプロンプトの各コンポーネントの有効性が確認された。 Answering logical queries on knowledge graphs (KG) poses a significant challenge for machine reasoning. The primary obstacle in this task stems from the inherent incompleteness of KGs. Existing research has predominantly focused on addressing the issue of missing edges in KGs, thereby neglecting another aspect of incompleteness: the emergence of new entities. Furthermore, most of the existing methods tend to reason over each logical operator separately, rather than comprehensively analyzing the query as a whole during the reasoning process. In this paper, we propose a query-aware prompt-fused framework named Pro-QE, which could incorporate existing query embedding methods and address the embedding of emerging entities through contextual information aggregation. Additionally, a query prompt, which is generated by encoding the symbolic query, is introduced to gather information relevant to the query from a holistic perspective. To evaluate the efficacy of our model in the inductive setting, we introduce two new challenging benchmarks. Experimental results demonstrate that our model successfully handles the issue of unseen entities in logical queries. Furthermore, the ablation study confirms the efficacy of the aggregator and prompt components.	
翻訳日:2024-03-20 14:33:18 公開日:2024-03-19
# InBox: 関心ボックス埋め込みを用いた知識グラフによる推薦 InBox: Recommendation with Knowledge Graph using Interest Box Embedding ( http://arxiv.org/abs/2403.12649v1 ) ライセンス: Link先を確認	Zezhong Xu, Yincen Qu, Wen Zhang, Lei Liang, Huajun Chen,	(参考訳) 知識グラフ(KG)は、現代のレコメンデータシステムにおいて重要な存在であり、性能と解釈可能性を大幅に向上させてきた。基本的に、レコメンデータシステムは、過去のインタラクションに基づいてユーザーの関心を識別し、適切な項目を推薦することを目的としている。しかし、既存の研究は、(1)1つの関心が潜在的に大規模な関連項目集合に対応すること、(2)KG情報と関心の接続性を明示的かつきめ細かく活用できていないこと、という2つの主要な課題を見落としている。その結果、エンティティと関心を単一の方法でモデル化すると、両者の区別を反映できない。さらに、推薦に使用される知識グラフにおける概念の粒度は粗い傾向にあり、ユーザの関心のきめ細かな性質と一致しない。この均質化は、知識グラフデータと関心の接続性の正確な活用を制限している。これらの制約に対処するために、InBoxと呼ばれる新しい埋め込みモデルを導入する。具体的には、様々な知識グラフのエンティティと関係を点またはボックスとして埋め込み、ユーザの関心はインタラクション履歴を包含するボックスとしてモデル化される。関心をボックスとして表現することで、その関心に関連する項目点の集合を内包することができる。さらに、関心は多様な基本概念からなり、ボックスの交差が概念の組み合わせを自然にサポートすることを提案する。 3つのトレーニングステップを通じて、InBoxは推薦タスクにおいてHAKGやKGINといった最先端の手法を大きく上回っている。さらなる分析により、推薦におけるさまざまなKGデータの価値の違いについて有意義な洞察が得られる。要約すると、InBoxは高度な知識グラフ活用のためのボックスベースの関心・概念モデリングを通じてレコメンデータシステムを前進させる。 Knowledge graphs (KGs) have become vitally important in modern recommender systems, effectively improving performance and interpretability. Fundamentally, recommender systems aim to identify user interests based on historical interactions and recommend suitable items. However, existing works overlook two key challenges: (1) an interest corresponds to a potentially large set of related items, and (2) the lack of explicit, fine-grained exploitation of KG information and interest connectivity. This leads to an inability to reflect distinctions between entities and interests when modeling them in a single way. Additionally, the granularity of concepts in the knowledge graphs used for recommendations tends to be coarse, failing to match the fine-grained nature of user interests. This homogenization limits the precise exploitation of knowledge graph data and interest connectivity. To address these limitations, we introduce a novel embedding-based model called InBox. 
Specifically, various knowledge graph entities and relations are embedded as points or boxes, while user interests are modeled as boxes encompassing interaction history. Representing interests as boxes enables containing collections of item points related to that interest. We further propose that an interest comprises diverse basic concepts, and box intersection naturally supports concept combination. Across three training steps, InBox significantly outperforms state-of-the-art methods like HAKG and KGIN on recommendation tasks. Further analysis provides meaningful insights into the variable value of different KG data for recommendations. In summary, InBox advances recommender systems through box-based interest and concept modeling for sophisticated knowledge graph exploitation.	翻訳日:2024-03-20 14:23:34 公開日:2024-03-19
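The box idea in the InBox abstract above (items as points, interests as boxes, concept combination as box intersection) can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' model: the `Box` class, the 2-D coordinates, and the concept and item names are all hypothetical, and a real system would learn the embeddings rather than hard-code them.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box described by its lower and upper corners."""
    lower: Tuple[float, ...]
    upper: Tuple[float, ...]

    def contains(self, point: Sequence[float]) -> bool:
        # A point lies in the box if every coordinate is within the bounds.
        return all(lo <= x <= hi
                   for lo, x, hi in zip(self.lower, point, self.upper))

    def intersect(self, other: "Box") -> Optional["Box"]:
        # Intersection takes the larger lower bound and the smaller upper
        # bound per dimension; an inverted bound means the boxes are disjoint.
        lower = tuple(max(a, b) for a, b in zip(self.lower, other.lower))
        upper = tuple(min(a, b) for a, b in zip(self.upper, other.upper))
        if any(lo > hi for lo, hi in zip(lower, upper)):
            return None
        return Box(lower, upper)


# Hypothetical concept boxes and item points (illustrative values only).
concept_a = Box((0.0, 0.0), (2.0, 2.0))
concept_b = Box((1.0, 1.0), (3.0, 3.0))
interest = concept_a.intersect(concept_b)  # combination of both concepts

items = {"item_a": (1.5, 1.2), "item_b": (0.5, 0.5), "item_c": (2.5, 2.5)}
recommended = [name for name, p in items.items() if interest.contains(p)]
print(recommended)
```

Only `item_a` falls inside both concept boxes, so it is the only item the combined interest would retrieve in this toy setting.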
# 誤差推定を伴うパラメトリックPDEのための適応型多レベルニューラルネットワーク Adaptive Multilevel Neural Networks for Parametric PDEs with Error Estimation ( http://arxiv.org/abs/2403.12650v1 ) ライセンス: Link先を確認	Janina E. Schütte, Martin Eigel,	(参考訳) 高次元パラメータ依存偏微分方程式(pPDE)を解くため、ニューラルネットワークアーキテクチャを提案する。モデルデータのパラメータを対応する有限要素解にマッピングするために構築される。トレーニング効率を向上し、近似誤差の制御を可能にするため、適応有限要素法(AFEM)を模倣する。 AFEMで生成された粗いグリッド解と一連の補正を出力し、ネットワークの連続する層上でエラーの減衰を追跡する。観測された誤差は、信頼性の高い残差に基づく後誤差推定器によって測定され、ネットワークの出力における近似のパラメータをわずかに減らすことができる。これにより、局所的に洗練された格子上の解の適応表現が導かれる。さらに、AFEMの各解は階層的に離散化される。アーキテクチャでは、畳み込みニューラルネットワーク(CNN)が選択される。階層的な基盤は、細分化されたメッシュのスパースイメージを処理できる。さらに、より微細なレベルの補正が振幅の減少、すなわち全体近似の重要度を減少させるため、ネットワーク近似の精度を連続的に低下させることができる。これは、トレーニングに使用される生成された高忠実度サンプルの数や、細いグリッド出力に責任を負うネットワークコンポーネントのサイズに組み込むことができる。アーキテクチャを説明し、予備的な数値例を示す。 To solve high-dimensional parameter-dependent partial differential equations (pPDEs), a neural network architecture is presented. It is constructed to map parameters of the model data to corresponding finite element solutions. To improve training efficiency and to enable control of the approximation error, the network mimics an adaptive finite element method (AFEM). It outputs a coarse grid solution and a series of corrections as produced in an AFEM, allowing a tracking of the error decay over successive layers of the network. The observed errors are measured by a reliable residual based a posteriori error estimator, enabling the reduction to only few parameters for the approximation in the output of the network. This leads to a problem adapted representation of the solution on locally refined grids. Furthermore, each solution of the AFEM is discretized in a hierarchical basis. For the architecture, convolutional neural networks (CNNs) are chosen. The hierarchical basis then allows to handle sparse images for finely discretized meshes. 
Additionally, as corrections on finer levels decrease in amplitude, i.e., importance for the overall approximation, the accuracy of the network approximation is allowed to decrease successively. This can either be incorporated in the number of generated high fidelity samples used for training or the size of the network components responsible for the fine grid outputs. The architecture is described and preliminary numerical examples are presented.	翻訳日:2024-03-20 14:23:34 公開日:2024-03-19
# 画像とテキスト誘導によるチューニング不要な画像カスタマイズ Tuning-Free Image Customization with Image and Text Guidance ( http://arxiv.org/abs/2403.12658v1 ) ライセンス: Link先を確認	Pengzhi Li, Qiang Nie, Ying Chen, Xi Jiang, Kai Wu, Yuhuan Lin, Yong Liu, Jinlong Peng, Chengjie Wang, Feng Zheng,	(参考訳) 拡散モデルによる画像カスタマイズの大幅な進歩にもかかわらず、現在の手法にはいくつかの制限がある。 1) 画像全体を再生成する際に生じる,非目標領域の意図しない変化, 2) 参照画像またはテキスト記述のいずれか一方のみによる誘導, 3) 時間を要する微調整。これらにより実用化が制限される。そこで本研究では,テキストと画像の同時誘導による画像カスタマイズのためのチューニング不要なフレームワークを導入し,特定の画像領域の正確な編集を数秒以内で行えるようにした。提案手法は,テキスト記述に基づく詳細な属性の修正が可能でありながら,参照画像の被写体のセマンティックな特徴を保っている。そのために,デノイジング過程においてUNetデコーダ内の自己注意特徴をブレンドする新しい注意ブレンディング戦略を提案する。我々の知る限り、これは特定の領域における画像のカスタマイズにテキストと画像のガイダンスを同時に利用する初めてのチューニング不要な手法である。提案手法は, 人手評価と定量評価の両方において従来の手法よりも優れており, 画像合成, デザイン, クリエイティブ写真など, 様々な実践的応用に対して効率的な解決策を提供する。 Despite significant advancements in image customization with diffusion models, current methods still have several limitations: 1) unintended changes in non-target areas when regenerating the entire image; 2) guidance solely by a reference image or text descriptions; and 3) time-consuming fine-tuning, which limits their practical application. In response, we introduce a tuning-free framework for simultaneous text-image-guided image customization, enabling precise editing of specific image regions within seconds. Our approach preserves the semantic features of the reference image subject while allowing modification of detailed attributes based on text descriptions. To achieve this, we propose an innovative attention blending strategy that blends self-attention features in the UNet decoder during the denoising process. To our knowledge, this is the first tuning-free method that concurrently utilizes text and image guidance for image customization in specific regions. Our approach outperforms previous methods in both human and quantitative evaluations, providing an efficient solution for various practical applications, such as image synthesis, design, and creative photography.	翻訳日:2024-03-20 14:23:34 公開日:2024-03-19
# 深層学習を用いたゼオライト吸着特性予測 Zeolite Adsorption Property Prediction using Deep Learning ( http://arxiv.org/abs/2403.12659v1 ) ライセンス: Link先を確認	Marko Petković, José Manuel Vicent-Luna, Vlado Menkovski, Sofía Calero,	(参考訳) ゼオライトの吸着特性を効率的に予測する能力は、新規材料の設計プロセスの促進に大きな利益をもたらす可能性がある。これらの材料の構成空間は広大である一方、既存の分子シミュレーション手法は計算コストが高い。本研究では,分子シミュレーションと比較して4～5桁高速に吸着特性を予測できるモデルを提案する。モデルを検証するため,MOR,MFI,RHOおよびITWゼオライトの様々なアルミニウム配置と,モンテカルロシミュレーションから得られたCO$_2$の吸着熱およびヘンリー係数を含むデータセットを作成した。機械学習モデルから得られた予測はモンテカルロシミュレーションから得られた値と一致し、そのモデルが特性予測に利用できることを確認する。さらに, 本モデルが吸着部位の同定に利用できることを示す。最後に、遺伝的アルゴリズムと組み合わせて新しいゼオライト構成を生成するためのモデルの有効性を評価する。 The ability to efficiently predict adsorption properties of zeolites can be of large benefit in accelerating the design process of novel materials. The existing configuration space for these materials is wide, while existing molecular simulation methods are computationally expensive. In this work, we propose a model which is 4 to 5 orders of magnitude faster at predicting adsorption properties compared to molecular simulations. To validate the model, we generated datasets containing various aluminium configurations for the MOR, MFI, RHO and ITW zeolites along with their heats of adsorption and Henry coefficients for CO$_2$, obtained from Monte Carlo simulations. The predictions obtained from the Machine Learning model are in agreement with the values obtained from the Monte Carlo simulations, confirming that the model can be used for property prediction. Furthermore, we show that the model can be used for identifying adsorption sites. Finally, we evaluate the capability of our model for generating novel zeolite configurations by using it in combination with a genetic algorithm.	翻訳日:2024-03-20 14:23:34 公開日:2024-03-19
# ERASE: 深層レコメンダシステムのための特徴選択手法のベンチマーク ERASE: Benchmarking Feature Selection Methods for Deep Recommender Systems ( http://arxiv.org/abs/2403.12660v1 ) ライセンス: Link先を確認	Pengyue Jia, Yejing Wang, Zhaocheng Du, Xiangyu Zhao, Yichao Wang, Bo Chen, Wanyu Wang, Huifeng Guo, Ruiming Tang,	(参考訳) Deep Recommender Systems(DRS)は、より正確なレコメンデーションのために、多数の特徴フィールドへの依存を強めている。その結果, 効果的な特徴選択手法は, 精度をさらに向上し, デプロイ要求に合うようにストレージ効率を最適化するために重要になっている。この研究領域は、特にDRSの文脈においてはまだ黎明期にあり、3つの中核的な課題に直面している。第一に、研究論文ごとに異なる実験設定は、しばしば不公平な比較をもたらし、実践的な洞察を妨げている。第二に、既存の文献では、大規模なデータセットに基づく選択属性の詳細な分析や、選択手法とDRSのバックボーンの徹底的な比較が欠如しており、これが知見の一般化性を制限し、DRSへの展開を妨げている。最後に、研究はしばしば特徴選択法によって達成可能なピーク性能の比較に焦点をあてるが、これは最適なハイパーパラメータの特定が通常は計算上実行不可能なアプローチであり、これらの手法の堅牢性と安定性の評価を見落としている。これらのギャップを埋めるために,本論文では,DRSのための特徴選択(feAture SElection)の包括的ベンチマーク(bEnchmaRk)であるERASEを提示する。 ERASEは、従来手法とディープラーニングの両方のアプローチをカバーする11の特徴選択手法を、4つのパブリックデータセット、プライベートな産業データセット、および現実世界の商用プラットフォームにわたって徹底的に評価し、大幅な改善を達成している。私たちのコードは、再現を容易にするためオンラインで公開されている。 Deep Recommender Systems (DRS) are increasingly dependent on a large number of feature fields for more precise recommendations. Effective feature selection methods are consequently becoming critical for further enhancing the accuracy and optimizing storage efficiencies to align with the deployment demands. This research area, particularly in the context of DRS, is nascent and faces three core challenges. Firstly, variant experimental setups across research papers often yield unfair comparisons, obscuring practical insights. Secondly, the existing literature's lack of detailed analysis on selection attributes, based on large-scale datasets and a thorough comparison among selection techniques and DRS backbones, restricts the generalizability of findings and impedes deployment on DRS. 
Lastly, research often focuses on comparing the peak performance achievable by feature selection methods, an approach that is typically computationally infeasible for identifying the optimal hyperparameters and overlooks evaluating the robustness and stability of these methods. To bridge these gaps, this paper presents ERASE, a comprehensive bEnchmaRk for feAture SElection for DRS. ERASE comprises a thorough evaluation of eleven feature selection methods, covering both traditional and deep learning approaches, across four public datasets, private industrial datasets, and a real-world commercial platform, achieving significant enhancement. Our code is available online for ease of reproduction.	翻訳日:2024-03-20 14:23:34 公開日:2024-03-19
# AutoMLアンサンブルの解読: 意思決定におけるcattleiaの支援 Deciphering AutoML Ensembles: cattleia's Assistance in Decision-Making ( http://arxiv.org/abs/2403.12664v1 ) ライセンス: Link先を確認	Anna Kozak, Dominik Kędzierski, Jakub Piwko, Malwina Wojewoda, Katarzyna Woźnica,	(参考訳) 多くのアプリケーションにおいて、モデルアンサンブルは単一の予測モデルよりも優れていることが証明されている。したがって、これはAutomated Machine Learning (AutoML)において最も一般的な後処理技術である。最も人気のあるフレームワークは、最終モデルの解釈可能性の低下と引き換えにアンサンブルを使用している。私たちの研究では、回帰、マルチクラス、バイナリ分類タスクのアンサンブルを解読するアプリケーションであるcattleiaを提案する。このツールは、auto-sklearn、AutoGluon、FLAMLという3つのAutoMLパッケージによって構築されたモデルで動作する。与えられたアンサンブルは異なる視点から分析される。我々は,アンサンブルとそのコンポーネントモデルの評価指標を用いて,予測性能調査を行う。モデル予測の多様性と相補性を評価するための新しい尺度を導入することで、検証の視点を広げる。さらに、変数の重要性を検討するために、説明可能な人工知能(XAI)技術を適用した。得られた知見を踏まえ、修正ツールを用いて重みを調査・調整し、所望の方法でアンサンブルをチューニングすることができる。このアプリケーションは、専用のインタラクティブな可視化を通じて上記の側面を提供し、多様な利用者にとって利用しやすいものとなっている。私たちは、cattleiaがユーザの意思決定を支援し、AutoMLフレームワークの理解を深めることができると信じている。 In many applications, model ensembling proves to be better than a single predictive model. Hence, it is the most common post-processing technique in Automated Machine Learning (AutoML). The most popular frameworks use ensembles at the expense of reducing the interpretability of the final models. In our work, we propose cattleia - an application that deciphers the ensembles for regression, multiclass, and binary classification tasks. This tool works with models built by three AutoML packages: auto-sklearn, AutoGluon, and FLAML. The given ensemble is analyzed from different perspectives. We conduct a predictive performance investigation through evaluation metrics of the ensemble and its component models. We extend the validation perspective by introducing new measures to assess the diversity and complementarity of the model predictions. Moreover, we apply explainable artificial intelligence (XAI) techniques to examine the importance of variables. Summarizing obtained insights, we can investigate and adjust the weights with a modification tool to tune the ensemble in the desired way. 
The application provides the aforementioned aspects through dedicated interactive visualizations, making it accessible to a diverse audience. We believe the cattleia can support users in decision-making and deepen the comprehension of AutoML frameworks.	翻訳日:2024-03-20 14:23:34 公開日:2024-03-19
# 多次元機械翻訳評価 : 韓国語のためのモデル評価とリソース Multi-Dimensional Machine Translation Evaluation: Model Evaluation and Resource for Korean ( http://arxiv.org/abs/2403.12666v1 ) ライセンス: Link先を確認	Dojun Park, Sebastian Padó,	(参考訳) 機械翻訳の人手または自動評価のためのほとんどのフレームワークは、単一の数値でMT出力の品質を特徴付ける。例外として、スコアリングのための品質次元(スタイル、流暢さ、正確さ、用語など)のきめ細かなオントロジーを提供するMultidimensional Quality Metrics(MQM)フレームワークがある。従来の研究では、MQMアノテーションが実現可能であることが実証されているが、我々の知る限り、リソース不足のため、新しいテキストに対するMQMスコアを予測する計算モデルは存在しない。本稿では,これらの問題点に対処するために, (a) 英語-韓国語の言語対について1200文のMQM評価ベンチマークを提供し, (b) MT評価を,参照ベースMT評価設定と参照フリー品質推定(QE)設定の両方において,SOTA言語モデルを用いて複数のMQMスコアを同時に予測するマルチタスク問題として再定式化する。参照フリーの設定はスタイルの次元において参照ベースの設定よりも優れており、参照ベースモデルは正確さに関して優位を保つ。全体として、RemBERTが最も有望なモデルとして現れる。評価を通じて、よりきめ細かく解釈可能な形で翻訳品質に関する洞察を提供する。 Almost all frameworks for the manual or automatic evaluation of machine translation characterize the quality of an MT output with a single number. An exception is the Multidimensional Quality Metrics (MQM) framework which offers a fine-grained ontology of quality dimensions for scoring (such as style, fluency, accuracy, and terminology). Previous studies have demonstrated the feasibility of MQM annotation but there are, to our knowledge, no computational models that predict MQM scores for novel texts, due to a lack of resources. In this paper, we address these shortcomings by (a) providing a 1200-sentence MQM evaluation benchmark for the language pair English-Korean and (b) reframing MT evaluation as the multi-task problem of simultaneously predicting several MQM scores using SOTA language models, both in a reference-based MT evaluation setup and a reference-free quality estimation (QE) setup. We find that the reference-free setup outperforms its counterpart in the style dimension while reference-based models retain an edge regarding accuracy. Overall, RemBERT emerges as the most promising model. Through our evaluation, we offer an insight into the translation quality in a more fine-grained, interpretable manner.	翻訳日:2024-03-20 14:23:34 公開日:2024-03-19
# 音声によるアニマトロニクスロボット顔表情の駆動 Driving Animatronic Robot Facial Expression From Speech ( http://arxiv.org/abs/2403.12670v1 ) ライセンス: Link先を確認	Boren Li, Hang Li, Hangxin Liu,	(参考訳) アニマトロニクスロボットは、ライフライクな表情を通して自然な人間とロボットの相互作用を可能にすることを目的としている。しかし、顔のバイオメカニクスと応答性動作合成の複雑さのため、現実的な音声同期型ロボット表現の生成は困難である。本稿では,音声からアニマトロニクスロボットの表情を駆動するスキン中心方式を提案する。提案手法では、リニアブレンドスキンニング(LBS)を中心表現として、エンボディメント設計とモーション合成における密に統合されたイノベーションを導出する。 LBSはアクティベーショントポロジを通知し、人間の表情の再ターゲティングを可能にし、音声による顔の動き生成を可能にする。提案手法は、アニマトロニックな顔の音声から、非常にリアルでリアルタイムな表情を生成することができ、自然な相互作用のために、人間の表情を再現するロボットの能力を著しく向上させることができる。 Animatronic robots aim to enable natural human-robot interaction through lifelike facial expressions. However, generating realistic, speech-synchronized robot expressions is challenging due to the complexities of facial biomechanics and responsive motion synthesis. This paper presents a principled, skinning-centric approach to drive animatronic robot facial expressions from speech. The proposed approach employs linear blend skinning (LBS) as the core representation to guide tightly integrated innovations in embodiment design and motion synthesis. LBS informs the actuation topology, enables human expression retargeting, and allows speech-driven facial motion generation. The proposed approach is capable of generating highly realistic, real-time facial expressions from speech on an animatronic face, significantly advancing robots' ability to replicate nuanced human expressions for natural interaction.	翻訳日:2024-03-20 14:23:34 公開日:2024-03-19
# GitHub CopilotによるAIベースのコード合成のセキュリティ向上 Enhancing Security of AI-Based Code Synthesis with GitHub Copilot via Cheap and Efficient Prompt-Engineering ( http://arxiv.org/abs/2403.12671v1 ) ライセンス: Link先を確認	Jakub Res, Ivan Homoliak, Martin Perešíni, Aleš Smrčka, Kamil Malinka, Petr Hanacek,	(参考訳) コーディングのためのAIアシスタントが増えている。しかし、開発者や企業がその潜在能力を最大限に活用することを避けている理由の1つは、生成されたコードのセキュリティが疑わしいことである。本稿ではまず,現状を概観し,この問題に関する改善点を特定する。そのうえで,GitHub Copilotのような(プロプライエタリなブラックボックスであっても)AIベースのコードジェネレータのコードセキュリティを向上させるために,ユーザ視点でのアプリケーションの複雑さ,計算リソース,運用コストを最小限に抑えつつ,プロンプト変更手法に基づく体系的なアプローチを提案する。本稿では,(1)シナリオ特化,(2)反復,(3)一般節の3つのプロンプト変更手法を提案・評価し,その組み合わせについて議論する。コードセキュリティの監査とは対照的に、提案手法のうち後者の2つは、ユーザに専門知識を必要としない。我々は,OpenVPNプロジェクトを用いた現実的なシナリオでGitHub Copilotにおける提案手法の有効性を評価し,提案手法が生成されるセキュアでないコードサンプルの数を最大16%削減し,セキュアなコードの数を最大8%増加させることを実証した。このアプローチはAIモデルの内部へのアクセスを必要としないため、GitHub Copilotだけでなく、一般に任意のAIベースのコードシンセサイザーに適用できる。 AI assistants for coding are on the rise. However, one of the reasons developers and companies avoid harnessing their full potential is the questionable security of the generated code. This paper first reviews the current state-of-the-art and identifies areas for improvement on this issue. Then, we propose a systematic approach based on prompt-altering methods to achieve better code security of (even proprietary black-box) AI-based code generators such as GitHub Copilot, while minimizing the complexity of the application from the user point-of-view, the computational resources, and operational costs. In sum, we propose and evaluate three prompt altering methods: (1) scenario-specific, (2) iterative, and (3) general clause, while we discuss their combination. Contrary to the audit of code security, the latter two of the proposed methods require no expert knowledge from the user. 
We assess the effectiveness of the proposed methods on the GitHub Copilot using the OpenVPN project in realistic scenarios, and we demonstrate that the proposed methods reduce the number of insecure generated code samples by up to 16% and increase the number of secure code by up to 8%. Since our approach does not require access to the internals of the AI models, it can be in general applied to any AI-based code synthesizer, not only GitHub Copilot.	翻訳日:2024-03-20 14:23:34 公開日:2024-03-19

# ソーシャルメディアのストリーミングデータに基づくリアルタイムの自殺念慮予測のためのビッグデータ分析システム

A Big Data Analytics System for Predicting Suicidal Ideation in Real-Time Based on Social Media Streaming Data ( http://arxiv.org/abs/2404.12394v1 )

ライセンス: Link先を確認

Mohamed A. Allayla, Serkan Ayvaz,

(参考訳) オンラインソーシャルメディアプラットフォームは、最近、私たちの社会や日々のルーチンに不可欠なものになっている。世界中のユーザーは毎日、このようなプラットフォームに数時間を費やし、感情や情緒の状態を表現し、お互いに連絡を取り合っている。これらのプラットフォームから得られる膨大なデータを分析することで、世論の感情を明確に把握し、ユーザーの精神状態を検出することができる。こうした健康状態のリスクを早期に特定することは、自殺念慮の件数を予防または減少させ、人々の命を救うのに役立つ可能性がある。従来の手法は、このようなストリームや大規模なデータセットを処理するのに効果的ではなくなっている。そこで本稿では,ソーシャルメディアコンテンツから自殺念慮を予測するため,ビッグデータアーキテクチャに基づく新たな手法を提案する。提案手法は、バッチ処理とリアルタイムストリーミング予測という2つのフェーズで、ソーシャルメディアデータの実用的な分析を提供する。バッチデータセットはRedditフォーラムから収集され、モデル構築とトレーニングに使用された。一方、ストリーミングのビッグデータはTwitterストリーミングAPIを使用して抽出され、リアルタイム予測に使用された。生データを前処理した後、抽出した特徴は複数のApache Spark ML分類器(NB、LR、LinearSVC、DT、RF、MLP)に供給された。異なるテストシナリオのもとで,様々な特徴抽出手法を用いて実験を行った。バッチ処理フェーズの実験結果から, (Unigram + Bigram) + CV-IDF で抽出した特徴と MLP 分類器の組み合わせが, 93.47%の精度で自殺念慮を分類する高い性能を示し, これをリアルタイムストリーミング予測フェーズに適用した。

Online social media platforms have recently become integral to our society and daily routines. Every day, users worldwide spend a couple of hours on such platforms, expressing their sentiments and emotional state and contacting each other. Analyzing such huge amounts of data from these platforms can provide a clear insight into public sentiments and help detect users' mental status. The early identification of these health condition risks may assist in preventing or reducing the incidence of suicidal ideation and potentially saving people's lives. The traditional techniques have become ineffective in processing such streams and large-scale datasets. Therefore, this paper proposes a new methodology based on a big data architecture to predict suicidal ideation from social media content. The proposed approach provides a practical analysis of social media data in two phases: batch processing and real-time streaming prediction. The batch dataset was collected from the Reddit forum and used for model building and training, while streaming big data was extracted using the Twitter streaming API and used for real-time prediction. After the raw data was preprocessed, the extracted features were fed to multiple Apache Spark ML classifiers: NB, LR, LinearSVC, DT, RF, and MLP. We conducted various experiments using various feature-extraction techniques with different testing scenarios. The experimental results of the batch processing phase showed that the features extracted with (Unigram + Bigram) + CV-IDF, combined with the MLP classifier, provided high performance for classifying suicidal ideation, with an accuracy of 93.47%; this configuration was then applied in the real-time streaming prediction phase.

翻訳日:2024-07-01 11:58:46 公開日:2024-03-19
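The "(Unigram + Bigram) + CV-IDF" feature set named in the abstract above can be illustrated with a small pure-Python sketch: count unigrams and bigrams per document, then reweight the counts by inverse document frequency. This is an assumption-laden illustration, not the paper's Spark pipeline: the helper names `ngrams` and `count_idf`, the smoothed formula log((N + 1) / (df + 1)), and the two toy documents are all choices made here for demonstration.

```python
import math
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str]) -> List[str]:
    """Return the unigrams of a tokenized document followed by its bigrams."""
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def count_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight raw term counts by a smoothed IDF, log((N + 1) / (df + 1))."""
    counts = [Counter(ngrams(doc)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: iterating a Counter yields each distinct term once.
    df = Counter(term for c in counts for term in c)
    idf = {t: math.log((n_docs + 1) / (d + 1)) for t, d in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


# Toy, already-tokenized documents (illustrative only).
docs = [["i", "feel", "fine"], ["i", "feel", "tired"]]
features = count_idf(docs)
for row in features:
    print(row)
```

Terms shared by every document ("i", "feel", "i feel") receive weight zero under this formula, while document-specific unigrams and bigrams keep a positive weight; the resulting sparse vectors are what a downstream classifier would consume.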

# Rigid ICPテンプレートアライメントとボクセル空間再構成に基づく半自動頭蓋インプラント設計ツール

A Semi-automatic Cranial Implant Design Tool Based on Rigid ICP Template Alignment and Voxel Space Reconstruction ( http://arxiv.org/abs/2404.15287v1 )

ライセンス: Link先を確認

Michael Lackner, Behrus Puladi, Jens Kleesiek, Jan Egger, Jianning Li,

(参考訳) 外傷性の医療緊急時には、患者は頭蓋形成術(頭蓋インプラントを用いた神経頭蓋修復術)に大きく依存する。近年の進歩にもかかわらず、患者固有のインプラント(PSI)の設計は、頭蓋形成術において最も複雑で、高価で、かつ、最も自動化されていないタスクの1つである。この分野のさらなる研究が必要である。そこで我々は,ユーザがハイレベルな操作を行うだけでよい半自動インプラント生成に特化したグラフィカルユーザインタフェース(UI)を備えたプロトタイプアプリケーションを作成した。提案するインプラント生成プロセスの概略は、関心領域を設定し、テンプレートを整列させ、その後、ボクセル空間にインプラントを作成することである。さらに, 欠損境界近傍でクリップした形状のみを考慮することで, アライメントを大きく改善できることを示す。ソフトウェアプロトタイプはhttps://github.com/3Descape/Cranial_Implant_Designでオープンソース化される。

In traumatic medical emergencies, the patients heavily depend on cranioplasty - the craft of neurocranial repair using cranial implants. Despite the improvements made in recent years, the design of a patient-specific implant (PSI) is among the most complex, expensive, and least automated tasks in cranioplasty. Further research in this area is needed. Therefore, we created a prototype application with a graphical user interface (UI) specifically tailored for semi-automatic implant generation, where the users only need to perform high-level actions. A general outline of the proposed implant generation process involves setting an area of interest, aligning the templates, and then creating the implant in voxel space. Furthermore, we show that the alignment can be improved significantly, by only considering clipped geometry in the vicinity of the defect border. The software prototype will be open-sourced at https://github.com/3Descape/Cranial_Implant_Design

翻訳日:2024-07-01 11:49:01 公開日:2024-03-19

# 混雑した無線環境に対するAGCインデックス管理アルゴリズム

Algorithm for AGC index management against crowded radio environment ( http://arxiv.org/abs/2404.08652v1 )

ライセンス: Link先を確認

Morgane Joly, Fabian Rivière, Éric Renault,

(参考訳) 本稿では,受信機の動作指標(パケットの損失/正常受信)の履歴に基づき,ペイロード受信中に出現する干渉波を見越して,次のパケット受信に使用する最適な自動利得制御(AGC)インデックス,あるいは最も適切な可変利得範囲を予測する革新的な手法を用いた受信機について述べる。これにより、受信機は、利得が固定されるペイロード受信期間中に干渉波が発生しても高い耐性を持ち、なおかつ、最適な感度レベルを確保できる。その結果、本手法により、受信感度とランダムな干渉波への耐性との間で最適なトレードオフが得られるように受信機利得を設定できる。

This paper describes a receiver that uses an innovative method to predict, according to history of receiver operating metrics (packet lost/well received), the optimum automatic gain control (AGC) index or most appropriate variable gain range to be used for next packet reception, anticipating an interferer appearing during the payload reception. This allows the receiver to have higher immunity to interferers even if they occur during the gain frozen payload reception period whilst still ensuring an optimum sensitivity level. As a result, the method allows setting the receiver gain to get an optimum trade-off between reception sensitivity and random interferer immunity.

翻訳日:2024-04-21 20:04:31 公開日:2024-03-19

# プログラマブルフォトニック集積回路の熱クロストークモデリングと補償法

Thermal Crosstalk Modelling and Compensation Methods for Programmable Photonic Integrated Circuits ( http://arxiv.org/abs/2404.10589v1 )

ライセンス: Link先を確認

Isidora Teofilovic, Ali Cem, David Sanchez-Jacome, Daniel Perez-Lopez, Francesco Da Ros,

(参考訳) フォトニック集積回路は光コンピューティングの分野で重要な役割を担い、デジタル回路に比べて高速でエネルギー効率の高い演算を約束する。この利点は、光信号が行列乗算の実行に本質的に適していることに起因している。しかし、熱クロストークのような決定論的現象でさえ、フォトニックチップの正確なプログラミングを難しい課題にしている。ここでは,集積プログラマブルフォトニックメッシュの異なる位置における熱クロストークの効果を予測するために,物理的直観をさまざまな程度で取り入れた3つのモデルを訓練し,実験的に評価する。チップ上に実装されたマイクロリング共振器のパワースペクトルにおける共振波長シフトによって熱クロストークの効果を定量化し, 0.5 pm未満のモデル化誤差を達成した。クロストークによる波長シフトの補償を通じてモデルの有効性を実験的に検証する。最後に、モデルの1つを用いて、訓練に使用していないチップ部分における熱クロストークの効果を予測・補償することでその一般化能力を評価し、2.0 pm未満の二乗平均平方根誤差を得た。

Photonic integrated circuits play an important role in the field of optical computing, promising faster and more energy-efficient operations compared to their digital counterparts. This advantage stems from the inherent suitability of optical signals to carry out matrix multiplication. However, even deterministic phenomena such as thermal crosstalk make precise programming of photonic chips a challenging task. Here, we train and experimentally evaluate three models incorporating varying degrees of physics intuition to predict the effect of thermal crosstalk in different locations of an integrated programmable photonic mesh. We quantify the effect of thermal crosstalk by the resonance wavelength shift in the power spectrum of a microring resonator implemented in the chip, achieving modelling errors <0.5 pm. We experimentally validate the models through compensation of the crosstalk-induced wavelength shift. Finally, we evaluate the generalization capabilities of one of the models by employing it to predict and compensate for the effect of thermal crosstalk for parts of the chip it was not trained on, revealing root-mean-square-errors of <2.0 pm.

翻訳日:2024-04-21 19:45:03 公開日:2024-03-19
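The evaluation described in the abstract above (predict a crosstalk-induced resonance shift, compensate for it, and report a root-mean-square error in picometres) can be sketched as follows. Everything here is hypothetical: the abstract does not specify the three models, so a simple linear superposition model is assumed purely for illustration, and the coefficients, heater powers, and "measured" shifts are made-up numbers rather than data from the paper.

```python
import math
from typing import List, Sequence


def predict_shift_pm(coeffs_pm_per_mw: Sequence[float],
                     heater_powers_mw: Sequence[float]) -> float:
    """Assumed linear model: each neighbouring heater adds k_j * P_j picometres."""
    return sum(k * p for k, p in zip(coeffs_pm_per_mw, heater_powers_mw))


def rmse(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Root-mean-square error between predicted and measured shifts."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up crosstalk coefficients (pm/mW) for two neighbouring heaters.
coeffs = [0.5, 0.25]
# Made-up heater settings (mW) and the shifts (pm) "measured" for each.
settings: List[List[float]] = [[10.0, 0.0], [0.0, 20.0], [10.0, 20.0]]
measured = [5.5, 4.5, 10.0]

predicted = [predict_shift_pm(coeffs, s) for s in settings]
# Compensation: the shift left over after subtracting the model prediction.
residual = [m - p for m, p in zip(measured, predicted)]
error = rmse(predicted, measured)
print(predicted, residual, round(error, 3))
```

The residual after compensation and the RMSE are two views of the same quantity: the smaller the model error, the closer the compensated resonance sits to its target wavelength.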

# 知識処理と複雑なタスクにおける生成検索エンジンの利用

The Use of Generative Search Engines for Knowledge Work and Complex Tasks ( http://arxiv.org/abs/2404.04268v1 )

ライセンス: Link先を確認

Siddharth Suri, Scott Counts, Leijie Wang, Chacha Chen, Mengting Wan, Tara Safavi, Jennifer Neville, Chirag Shah, Ryen W. White, Reid Andersen, Georg Buscher, Sathish Manivannan, Nagu Rangan, Longqi Yang,

(参考訳) 最近まで、検索エンジンは人々がオンライン情報にアクセスするための主要な方法だった。近年の大規模言語モデル(LLM)の出現により、機械はテキスト、画像、コードなどの新しいデジタルアーティファクトを生成できるようになり、その結果、LLMの能力を従来の検索エンジンと組み合わせた新しいツール、生成検索エンジンが誕生した。最初に一般公開された生成検索エンジンの1つであるBing Copilot(Bing Chat)の実証分析を通じて,人々がBing Copilotを用いて行うタスクの種類と複雑さをBing Searchと比較して分析した。その結果、人々は、従来の検索エンジンで一般的に行われていたものよりも認知的複雑さの高い知識労働タスクに、生成検索エンジンをより多く使用していることが示された。

Until recently, search engines were the predominant method for people to access online information. The recent emergence of large language models (LLMs) has given machines new capabilities such as the ability to generate new digital artifacts like text, images, code etc., resulting in a new tool, a generative search engine, which combines the capabilities of LLMs with a traditional search engine. Through the empirical analysis of Bing Copilot (Bing Chat), one of the first publicly available generative search engines, we analyze the types and complexity of tasks that people use Bing Copilot for compared to Bing Search. Findings indicate that people use the generative search engine for more knowledge work tasks that are higher in cognitive complexity than were commonly done with a traditional search engine.

翻訳日:2024-04-14 13:21:48 公開日:2024-03-19

# 推薦システムにおけるアルゴリズム的集合行動:プレイリストの並べ替えによる歌の促進

Algorithmic Collective Action in Recommender Systems: Promoting Songs by Reordering Playlists ( http://arxiv.org/abs/2404.04269v1 )

ライセンス: Link先を確認

Joachim Baumann, Celestine Mendler-Dünner,

(参考訳) Transformerベースの推薦システムにおけるアルゴリズム的集団行動について検討する。我々のユースケースは、自分たちが管理する既存のプレイリストにアーティストの曲を戦略的に配置することで、そのアーティストの可視性を高めることを目的としたファンの集団である。集団の成功は、対象曲のテスト時の推薦回数の増加によって測定される。我々は,この目標に向けて,実装が容易な2つの戦略を導入し,主要な音楽ストリーミングプラットフォームが公開しているレコメンデータシステムモデル上で,その有効性を検証した。その結果,小さな集団(トレーニングデータの0.01%未満をコントロールしている)でさえ,楽曲を挿入する位置を戦略的に選択することで,推薦を最大25倍に増幅できることが判明した。次に、戦略の外部性の調査に焦点をあてる。プラットフォームの性能損失は無視できる程度であり、他の曲の推薦は大部分が保存されており、参加者のユーザエクスペリエンスへの悪影響は最小限である。さらに、コストは他のアーティストの間で均等に分散される。総合すると、本研究は、集団行動戦略が必ずしも敵対的でなくとも有効であり得ることを示し、レコメンデータシステムにおけるインセンティブ、社会的ダイナミクス、および均衡に関する新たな疑問を提起するものである。

We investigate algorithmic collective action in transformer-based recommender systems. Our use case is a collective of fans aiming to promote the visibility of an artist by strategically placing one of their songs in the existing playlists they control. The success of the collective is measured by the increase in test-time recommendations of the targeted song. We introduce two easily implementable strategies towards this goal and test their efficacy on a publicly available recommender system model released by a major music streaming platform. Our findings reveal that even small collectives (controlling less than 0.01% of the training data) can achieve up to 25x amplification of recommendations by strategically choosing the position at which to insert the song. We then focus on investigating the externalities of the strategy. We find that the performance loss for the platform is negligible, and the recommendations of other songs are largely preserved, minimally impairing the user experience of participants. Moreover, the costs are evenly distributed among other artists. Taken together, our findings demonstrate how collective action strategies can be effective while not necessarily being adversarial, raising new questions around incentives, social dynamics, and equilibria in recommender systems.

翻訳日:2024-04-14 13:21:48 公開日:2024-03-19

# 成長モニタリングのための深層学習によるパラガングリオーマの自動分離

Deep learning-based auto-segmentation of paraganglioma for growth monitoring ( http://arxiv.org/abs/2404.07952v1 )

ライセンス: Link先を確認

E. M. C. Sijben, J. C. Jansen, M. de Ridder, P. A. N. Bosman, T. Alderliesten,

(参考訳) 神経内分泌腫瘍(典型的には頭頸部の血管や神経経路に沿って形成される稀な神経内分泌腫瘍)の体積測定は、腫瘍の成長を長期にわたって監視・モデル化するために重要である。しかし、臨床実践では、これらの測定に利用可能なツールを使用することは時間がかかり、腫瘍形成の仮定やオブザーバ・オブザーバの変動に悩まされる。成長モデリングは、数十年前のジレンマ(腫瘍が時間とともにどのように発達するかの不確実性から考える)を解決する上で重要な役割を果たす可能性がある。パラガングリオーマ患者に治療を施すことにより、重度の症状を予防することができる。しかし、実際には必要のない患者を治療するには、不要な副作用や合併症が伴う。改良された測定技術は、大量の腫瘍の体積データによる成長モデルの研究を可能にし、これらの腫瘍が時間とともにどのように発達するかについての貴重な洞察を与える可能性がある。そこで我々は,no-new-UNnet (nnUNet) を用いたディープラーニングセグメンテーションモデルに基づく腫瘍体積自動計測手法を提案する。本研究では, 高齢者耳鼻咽喉科医による視覚検査と, モデルアウトプットと手動記述との比較により, 複数の観察者による手動記述の変動との比較などにより, モデルの性能を定量的に評価した。以上の結果から,手動のデライン化に匹敵する自動手法が(少なくとも)有効であることが示唆された。最後に、生成したモデルと、時間とともに腫瘍を追跡できるリンク手順を用いて、既知の成長関数の適合度に追加の体積測定がどう影響するかを示す。

Volume measurement of a paraganglioma (a rare neuroendocrine tumor that typically forms along major blood vessels and nerve pathways in the head and neck region) is crucial for monitoring and modeling tumor growth in the long term. However, in clinical practice, using available tools to do these measurements is time-consuming and suffers from tumor-shape assumptions and observer-to-observer variation. Growth modeling could play a significant role in solving a decades-old dilemma (stemming from uncertainty regarding how the tumor will develop over time). By giving paraganglioma patients treatment, severe symptoms can be prevented. However, treating patients who do not actually need it comes at the cost of unnecessary possible side effects and complications. Improved measurement techniques could enable growth model studies with a large amount of tumor volume data, possibly giving valuable insights into how these tumors develop over time. Therefore, we propose an automated tumor volume measurement method based on a deep learning segmentation model using no-new-UNnet (nnUNet). We assess the performance of the model based on visual inspection by a senior otorhinolaryngologist and several quantitative metrics by comparing model outputs with manual delineations, including a comparison with variation in manual delineation by multiple observers. Our findings indicate that the automatic method performs at least as well as manual delineation. Finally, using the created model, and a linking procedure that we propose to track the tumor over time, we show how additional volume measurements affect the fit of known growth functions.
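As a rough illustration of how a segmentation mask turns into a volume measurement, the sketch below counts foreground voxels and multiplies by the physical voxel size. The function name `tumor_volume_ml`, the nested-list mask layout, and the spacing tuple are assumptions made for this example; the abstract does not describe the authors' actual post-processing.

```python
def tumor_volume_ml(mask, spacing_mm):
    """Estimate volume in millilitres from a binary segmentation mask.

    mask: nested lists indexed [z][y][x] holding 0 (background) or 1 (tumor).
    spacing_mm: (dz, dy, dx) voxel edge lengths in millimetres (assumed known
    from the image header).
    """
    # Count every voxel labelled as tumor.
    voxel_count = sum(v for plane in mask for row in plane for v in row)
    dz, dy, dx = spacing_mm
    # One voxel occupies dz*dy*dx cubic millimetres; 1000 mm^3 equals 1 ml.
    return voxel_count * dz * dy * dx / 1000.0
```

Repeating this on scans from successive dates would give the volume-over-time series that a growth function could then be fitted to.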

翻訳日:2024-04-14 13:03:36 公開日:2024-03-19

# AIはソーシャルメディアのクリエイティビティを創造する上で、人間のエキスパートより優れているか?

Can AI Outperform Human Experts in Creating Social Media Creatives? ( http://arxiv.org/abs/2404.00018v1 )

ライセンス: Link先を確認

Eunkyung Park, Raymond K. Wong, Junbum Kwon,

(参考訳) 人工知能はチェスやバデュークのような機能的なタスクにおいて、人間の専門家よりも優れています。創造的なタスクはどうでしょう? 本稿では、これまではほとんど研究されていない人間の専門家と比較して、創造的領域におけるAIの能力を評価する。本稿では,大規模言語モデルによる迅速な拡張を通じて,ソーシャルメディアの創造性を創出するための新しいPrompt-for-Promptを提案する。人気の高いInstagramの投稿(クリック数が最も多い)をトップブランドのInstagramアカウントに掲載して、ソーシャルメディアのクリエイティビティを創り出しています。我々はGPT 4にテキスト記述を用いたいくつかのプロンプトを与え、最先端のテキスト・ツー・イメージ・ジェネレータ(Midjourney、DALL E 3、Stable Diffusion)に最も効果的なプロンプトを生成する。 LLM拡張プロンプトは、ソーシャルメディアイメージ作成のための目標、エンゲージメント戦略、照明、ブランド整合性を追加することで、AIの能力を高めることができる。我々は、広範囲にわたる人的評価実験を行い、AIが人間の専門家に優れており、Midjourneyは他のテキストから画像へのジェネレータよりも優れていることを発見した。驚くことに、ソーシャルメディア業界における従来の知恵とは違って、アイキャッチを含むインストラクションは、自然なものよりもパフォーマンスが劣っている。クリエイティブのタイプに関しては、AIは動物や製品による創造性を改善するが、実際の人々による創造性は低下する。また、AIは長いテキスト記述よりも短いテキスト記述で創造性を向上する。

Artificial Intelligence has outperformed human experts in functional tasks such as chess and baduk. How about creative tasks? This paper evaluates AI's capability in the creative domain compared to human experts, on which little research has been conducted so far. We propose a novel Prompt-for-Prompt to generate social media creatives via prompt augmentation by Large Language Models. We take the most popular Instagram posts (with the largest number of likes) in top brands' Instagram accounts to create social media creatives. We give GPT 4 several prompt instructions with text descriptions to generate the most effective prompts for cutting-edge text-to-image generators: Midjourney, DALL E 3, and Stable Diffusion. LLM-augmented prompts can boost AI's abilities by adding objectives, engagement strategy, lighting and brand consistency for social media image creation. We conduct an extensive human evaluation experiment, and find that AI outperforms human experts, and Midjourney is better than the other text-to-image generators. Surprisingly, contrary to conventional wisdom in the social media industry, prompt instructions including "eye-catching" perform much worse than those including "natural". Regarding the type of creatives, AI improves creatives with animals or products but less with real people. Also, AI improves creatives with short text descriptions more than with long text descriptions, because there is more room for AI to augment prompts with shorter descriptions.

翻訳日:2024-04-07 23:17:33 公開日:2024-03-19

# 説明可能な自動運転車システムの進化 : 総合的なレビューと研究ロードマップ

Advancing Explainable Autonomous Vehicle Systems: A Comprehensive Review and Research Roadmap ( http://arxiv.org/abs/2404.00019v1 )

ライセンス: Link先を確認

Sule Tekkesinoglu, Azra Habibovic, Lars Kunze,

(参考訳) 自律走行車(AV)の既存の説明可能性手法がステークホルダーのニーズにどのように適合しているかという不確実性を考えると、説明を必要とする状況や適切なインタラクション戦略を決定するために徹底的な調査が不可欠である。 AVエコシステムにおける様々な関心や期待と現在のアプローチの整合性を評価するためには、包括的なレビューが不可欠である。本稿では,より効果的かつ包括的説明可能なAVシステムの開発を促進するために,説明生成とプレゼンテーションに関連する複雑さについて論じる。本研究は,既存の文献を説明課題,説明情報,説明情報通信の3つの主要なトピックに分類することにつながった。我々の洞察に基づいて、我々は今後の研究の総合的なロードマップを提案してきた。 (i)インターロケータを知ること、(ii)タイムリーな説明を作成すること、(iii)人間に優しい説明を伝えること、(iv)継続的学習。私たちのロードマップは、責任ある研究とイノベーションの原則に基づき、多様な説明要件の重要性を強調しています。説明可能なAVシステムの実装に関わる課題に効果的に取り組むため,プライバシー保護データ統合,倫理的枠組み,リアルタイム分析,人間中心のインタラクション設計,学際的コラボレーションの強化など,さまざまな研究方針を整理した。これらの研究の方向性を探求することにより、ユーザニーズ、技術進歩、規制順守、倫理的配慮の全体的理解から情報を得て、説明可能なAVの開発と展開をガイドし、より安全で信頼性の高い自動運転体験を確保することを目的とする。

Given the uncertainty surrounding how existing explainability methods for autonomous vehicles (AVs) meet the diverse needs of stakeholders, a thorough investigation is imperative to determine the contexts requiring explanations and suitable interaction strategies. A comprehensive review becomes crucial to assess the alignment of current approaches with the varied interests and expectations within the AV ecosystem. This study presents a review to discuss the complexities associated with explanation generation and presentation to facilitate the development of more effective and inclusive explainable AV systems. Our investigation led to categorising existing literature into three primary topics: explanatory tasks, explanatory information, and explanatory information communication. Drawing upon our insights, we have proposed a comprehensive roadmap for future research centred on (i) knowing the interlocutor, (ii) generating timely explanations, (iii) communicating human-friendly explanations, and (iv) continuous learning. Our roadmap is underpinned by principles of responsible research and innovation, emphasising the significance of diverse explanation requirements. To effectively tackle the challenges associated with implementing explainable AV systems, we have delineated various research directions, including the development of privacy-preserving data integration, ethical frameworks, real-time analytics, human-centric interaction design, and enhanced cross-disciplinary collaborations. By exploring these research directions, the study aims to guide the development and deployment of explainable AVs, informed by a holistic understanding of user needs, technological advancements, regulatory compliance, and ethical considerations, thereby ensuring safer and more trustworthy autonomous driving experiences.

翻訳日:2024-04-07 23:17:33 公開日:2024-03-19

# 評価学:評価の科学と工学

Evaluatology: The Science and Engineering of Evaluation ( http://arxiv.org/abs/2404.00021v1 )

ライセンス: Link先を確認

Jianfeng Zhan, Lei Wang, Wanling Gao, Hongxiao Li, Chenxi Wang, Yunyou Huang, Yatao Li, Zhengxin Yang, Guoxin Kang, Chunjie Luo, Hainan Ye, Shaopeng Dai, Zhifei Zhang,

(参考訳) 評価は人間の存在の重要な側面であり、様々な分野で重要な役割を果たしている。しかし、普遍的な概念、用語、理論、方法論についてのコンセンサスが欠如している経験的かつアドホックな方法でアプローチされることがしばしばある。この合意の欠如は大きな反響を呼んだ。本稿では,評価の科学と工学を包含する評価学の分野を正式に紹介することを目的とする。本稿では,様々な分野にまたがって適用可能な概念,用語,理論,方法論を包含して評価するための普遍的な枠組みを提案する。本研究は,多種多様な被験者に対して客観的に評価条件を適用し,測定および/または試験によって異なる被験者の影響を推定する実験を行うことが評価の本質であることを明らかにした。評価の本質から,評価結果の重要側面に着目した5つの公理を基礎評価理論として提案する。これらの公理は、普遍的な評価理論と方法論を構築する基盤となる。 1つの主題を評価する場合、同値性の異なる評価条件を作成することが不可欠である。これらの条件を多様な対象に適用することにより、基準評価モデルを確立することができる。これらのモデルでは、他のすべての変数をコントロールとして保ちながら、単一の独立変数を一度に変更することができます。複雑なシナリオを評価するとき、鍵となるのは、推移性を維持する一連の評価モデルを確立することである。評価の科学に基づいて,同値性の異なる評価条件として,ベンチマークの形式的定義を提案する。この概念は、様々な分野にまたがって評価を行う、普遍的なベンチマークベースのエンジニアリングアプローチの基盤となる。

Evaluation is a crucial aspect of human existence and plays a vital role in various fields. However, it is often approached in an empirical and ad-hoc manner, lacking consensus on universal concepts, terminologies, theories, and methodologies. This lack of agreement has significant repercussions. This article aims to formally introduce the discipline of evaluatology, which encompasses the science and engineering of evaluation. We propose a universal framework for evaluation, encompassing concepts, terminologies, theories, and methodologies that can be applied across various disciplines. Our research reveals that the essence of evaluation lies in conducting experiments that intentionally apply a well-defined evaluation condition to diverse subjects and infer the impact of different subjects by measuring and/or testing. Derived from the essence of evaluation, we propose five axioms focusing on key aspects of evaluation outcomes as the foundational evaluation theory. These axioms serve as the bedrock upon which we build universal evaluation theories and methodologies. When evaluating a single subject, it is crucial to create evaluation conditions with different levels of equivalency. By applying these conditions to diverse subjects, we can establish reference evaluation models. These models allow us to alter a single independent variable at a time while keeping all other variables as controls. When evaluating complex scenarios, the key lies in establishing a series of evaluation models that maintain transitivity. Building upon the science of evaluation, we propose a formal definition of a benchmark as a simplified and sampled evaluation condition that guarantees different levels of equivalency. This concept serves as the cornerstone for a universal benchmark-based engineering approach to evaluation across various disciplines, which we refer to as benchmarkology.

翻訳日:2024-04-07 23:17:33 公開日:2024-03-19

# WoLF: CXR理解のための大規模言語モデルフレームワーク

WoLF: Large Language Model Framework for CXR Understanding ( http://arxiv.org/abs/2403.15456v1 )

ライセンス: Link先を確認

Seil Kang, Donghyun Kim, Junhyeok Kim, Hyo Kyung Lee, Seong Jae Hwang,

(参考訳) 最新の視覚言語モデル(VLM)による胸部X線(CXR)の理解に向けた重要な手法が開発され、視覚質問応答(VQA)とCXRレポート生成能力が目覚ましい。しかし、既存のCXR理解フレームワークには、手続き上の注意事項がいくつか残っている。 1) 総合的視覚質問応答 (VQA) には不十分なCXRレポートのみを使用する従来手法では, 薬物歴や先行診断などの健康関連データが必要であった。 2) 従来の手法では生のCXRレポートを使用しており, 任意に構造化されることが多い。現代の言語モデルは、様々なテキスト形式を理解できるが、より明確で組織化された解剖学的情報のためのレポートの再構築は、それらの有用性を高めることができる。 3) CXR-VQAの現在の評価手法は, 主に言語的正当性を重視しており, 生成した回答の微妙な評価を行う能力は欠如している。本稿では,CXR理解のための広スコープ大言語モデルフレームワークであるWoLFを紹介する。 1) 実際の臨床シナリオにおいて, 正確な診断に利用される多面的な患者の記録を収集する。具体的には、電子健康記録(EHR)を用いて、CXR理解に適した指示追従データを生成する。 2)CXRレポートでは,注意ステップ内においても注意を隠蔽して,解剖学的構造に基づく知識の疎結合化によるレポート生成性能の向上が図られている。 (3)に対処するため,LLMの性能評価に最適化されたAI評価プロトコルを提案する。大規模な実験的検証を通じて、WoLFはVQA(平均スコア+9.47%まで)とレポート生成(+7.3%p BLEU-1まで)に関するAI評価領域におけるMIMIC-CXRの他のモデルよりも優れた性能を示す。

Significant methodological strides have been made toward Chest X-ray (CXR) understanding via modern vision-language models (VLMs), demonstrating impressive Visual Question Answering (VQA) and CXR report generation abilities. However, existing CXR understanding frameworks still possess several procedural caveats. (1) Previous methods solely use CXR reports, which are insufficient for comprehensive Visual Question Answering (VQA), especially when additional health-related data like medication history and prior diagnoses are needed. (2) Previous methods use raw CXR reports, which are often arbitrarily structured. While modern language models can understand various text formats, restructuring reports for clearer, organized anatomy-based information could enhance their usefulness. (3) Current evaluation methods for CXR-VQA primarily emphasize linguistic correctness, lacking the capability to offer nuanced assessments of the generated answers. In this work, to address the aforementioned caveats, we introduce WoLF, a Wide-scope Large Language Model Framework for CXR understanding. To resolve (1), we capture multi-faceted records of patients, which are utilized for accurate diagnoses in real-world clinical scenarios. Specifically, we adopt the Electronic Health Records (EHR) to generate instruction-following data suited for CXR understanding. Regarding (2), we enhance report generation performance by decoupling knowledge in CXR reports based on anatomical structure even within the attention step via masked attention. To address (3), we introduce an AI-evaluation protocol optimized for assessing the capabilities of LLM. Through extensive experimental validations, WoLF demonstrates superior performance over other models on MIMIC-CXR in the AI-evaluation arena about VQA (up to +9.47%p mean score) and by metrics about report generation (+7.3%p BLEU-1).

翻訳日:2024-03-26 22:41:56 公開日:2024-03-19

# 信頼できるAIへの旅その1:実践的なフレームワークの探求

The Journey to Trustworthy AI- Part 1: Pursuit of Pragmatic Frameworks ( http://arxiv.org/abs/2403.15457v1 )

ライセンス: Link先を確認

Mohamad M Nasr-Azadani, Jean-Luc Chatelain,

(参考訳) 本稿では,信頼に値する人工知能(TAI)とその様々な定義についてレビューする。あらゆる社会で尊重される原則を考えると、TAIはしばしばいくつかの属性によって特徴づけられる。我々は、TAIの代わりにResponsibleやEthical AIといった用語を使うことに反対する。そして、混乱を明確にするために、私たちはそれらを置き去りにすることを提案します。 TAIに固有の主観性と複雑性を考えると、普遍的な枠組みの開発は不可能であると考えられる。代わりに、フェアネス、バイアス、リスク、セキュリティ、説明可能性、信頼性といった重要な属性や特性に対処するアプローチを提唱します。我々は、EU、中国、米国におけるイニシアチブに焦点をあてて、現在進行中の規制の状況について検討する。我々は、地政学的理由と地理的理由に基づくAI規制の違いが、多国籍企業にとってさらなる課題となることを認識している。我々はリスクをAI規制とTAIの中核要因とみなしている。例えば、EU-AI法(EU-AI Act)で概説されているように、組織はAI製品のリスクレベルを評価して、それに従って行動しなければならない(あるいはリスクヘビーな罰金)。私たちは、TAI実装のモダリティと、複数のクロスファンクショナルチームがプロセス全体に従事しているかを比較します。したがって、TAIを実践するための残酷な力のアプローチは、その効率性と機敏さ、ムートをもたらす。これを解決するために、当社のフレームワークであるSet-Formalize-Measure-Act(SFMA)を紹介します。私たちのソリューションでは、TAI対応メトリクス、TAIのドライバ、ステークホルダ、ビジネス/法律要件を実際のベンチマークやテストに変換することの重要性を強調しています。最後に、強力なAIモデルのパニックによって引き起こされる過剰な規制は、事実、TAIにも害を与える可能性がある。 GitHubのユーザアクティビティデータに基づいて、2023年には、AIオープンソースプロジェクトがコントリビュータアカウントによってトッププロジェクトに昇格した。 TAIにおけるイノベーションの実現は、オープンソースコミュニティの独立した貢献に依存している。

This paper reviews Trustworthy Artificial Intelligence (TAI) and its various definitions. Considering the principles respected in any society, TAI is often characterized by a few attributes, some of which have led to confusion in regulatory or engineering contexts. We argue against using terms such as Responsible or Ethical AI as substitutes for TAI. And to help clarify any confusion, we suggest leaving them behind. Given the subjectivity and complexity inherent in TAI, developing a universal framework is deemed infeasible. Instead, we advocate for approaches centered on addressing key attributes and properties such as fairness, bias, risk, security, explainability, and reliability. We examine the ongoing regulatory landscape, with a focus on initiatives in the EU, China, and the USA. We recognize that differences in AI regulations based on geopolitical and geographical reasons pose an additional challenge for multinational companies. We identify risk as a core factor in AI regulation and TAI. For example, as outlined in the EU-AI Act, organizations must gauge the risk level of their AI products to act accordingly (or risk hefty fines). We compare modalities of TAI implementation and how multiple cross-functional teams are engaged in the overall process. Thus, a brute force approach for enacting TAI renders its efficiency and agility moot. To address this, we introduce our framework Set-Formalize-Measure-Act (SFMA). Our solution highlights the importance of transforming TAI-aware metrics, drivers of TAI, stakeholders, and business/legal requirements into actual benchmarks or tests. Finally, over-regulation driven by panic of powerful AI models can, in fact, harm TAI too. Based on GitHub user-activity data, in 2023, AI open-source projects rose to top projects by contributor count. Enabling innovation in TAI hinges on the independent contributions of the open-source community.

翻訳日:2024-03-26 22:41:56 公開日:2024-03-19

# ゲーム内トレーシュ音声検出のための微調整事前学習言語モデル

Fine-Tuning Pre-trained Language Models to Detect In-Game Trash Talks ( http://arxiv.org/abs/2403.15458v1 )

ライセンス: Link先を確認

Daniel Fesalbon, Arvin De La Cruz, Marvin Mallari, Nelson Rodelas,

(参考訳) オンラインモバイルゲームやコンピュータゲームの一般的な問題は、プレイヤー間の有害な行動と虐待的なコミュニケーションに関連していた。異なるレポートや研究に基づいて、オンラインヘイトスピーチと毒性がプレイヤーのゲーム内パフォーマンスおよび全体的な幸福に与える影響についても論じている。本研究は,ゲーム内チャットにおける有害性を検出するために,事前学習されたBERT言語モデルとGPT言語モデルの性能を評価し評価する。公開APIを用いて、DOTA 2のゲームマッチのゲーム内チャットデータを収集し、処理し、レビューし、非毒性、軽度(毒性)、有毒とラベル付けした。この研究は、BERT(Base-uncased)、BERT(Large-uncased)、GPT-3モデルのトレーニングとテストのために、約2千のゲーム内チャットを収集することができた。本研究は,3つのモデルの最先端性能に基づいて,オンラインヘイトスピーチとゲーム内侮辱的ゴミ話に対処する事前学習された言語モデルの有望な可能性について結論づける。

Common problems in playing online mobile and computer games were related to toxic behavior and abusive communication among players. Based on different reports and studies, the study also discusses the impact of online hate speech and toxicity on players' in-game performance and overall well-being. This study investigates the capability of pre-trained language models to classify or detect trash talk or toxic in-game messages. The study employs and evaluates the performance of pre-trained BERT and GPT language models in detecting toxicity within in-game chats. Using publicly available APIs, in-game chat data from DOTA 2 game matches were collected, processed, reviewed, and labeled as non-toxic, mild (toxicity), and toxic. The study was able to collect around two thousand in-game chats to train and test BERT (Base-uncased), BERT (Large-uncased), and GPT-3 models. Based on the three models' state-of-the-art performance, this study concludes pre-trained language models' promising potential for addressing online hate speech and in-game insulting trash talk.

翻訳日:2024-03-26 22:41:56 公開日:2024-03-19

# 言語生産のオンライン研究における効果の大きさ, 多様性, 力の評価

Assessing effect sizes, variability, and power in the on-line study of language production ( http://arxiv.org/abs/2403.15459v1 )

ライセンス: Link先を確認

Bürki Audrey, Vasishth Shravan,

(参考訳) パンデミックにより、多くの実験心理学者や言語学者がインターネット上でデータを集め始めた(オンラインデータ以降)。このような実験の実現可能性と、将来の実験で十分な統計的パワーを達成するために必要なサンプルサイズを評価する必要がある。これにより、効果の大きさや変動性に関する情報が必要となる。そこで本研究では,実験室とオンラインで行った同じ単語生成実験で得られた応答時間データを比較した。これらの分析により,2つの設定が効果サイズに異なるか,実験中における応答の整合性,参加者間の平均応答時間のばらつき,参加者間の効果サイズの大きさ,説明できない変数の量で異なるかを決定することができる。一連のシミュレーションにおいて,これらの違いが設計のパワーに与える影響を評価する。これまでの研究から得られた熱意を抑えつつ, オンライン生産研究は実現可能であるが, 非無視コストが伴う可能性が示唆された。オンライン言語生産研究において十分なパワーを達成するために必要なサンプルサイズは、手作業量の増加が不可避である。

With the pandemic, many experimental psychologists and linguists have started to collect data over the internet (hereafter on-line data). The feasibility of such experiments and the sample sizes required to achieve sufficient statistical power in future experiments have to be assessed. This in turn requires information on effect sizes and variability. In a series of analyses, we compare response time data obtained in the same word production experiment conducted in the lab and on-line. These analyses allow us to determine whether the two settings differ in effect sizes, in the consistency of responses over the course of the experiment, in the variability of average response times across participants, in the magnitude of effect sizes across participants, or in the amount of unexplained variability. We assess the impact of these differences on the power of the design in a series of simulations. Our findings temper the enthusiasm raised by previous studies and suggest that on-line production studies might be feasible but at a non-negligible cost. The sample sizes required to achieve sufficient power in on-line language production studies come with a non-negligible increase in the amount of manual labour.
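The idea of estimating power by simulation can be sketched as follows. This is a deliberately simplified illustration, not the authors' analysis: it assumes one response-time difference per participant drawn from a normal distribution and uses a normal-approximation test, and the function name `simulate_power` and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_power(effect_ms, sd_ms, n_participants,
                   n_sims=2000, alpha=0.05, seed=1):
    """Fraction of simulated experiments in which the effect is detected.

    effect_ms: assumed true mean difference between conditions (ms).
    sd_ms: assumed between-participant SD of that difference (ms).
    """
    rng = random.Random(seed)
    # Two-sided critical value from the standard normal (an approximation
    # to the t distribution, reasonable for moderately large samples).
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        # One simulated experiment: a difference score per participant.
        diffs = [rng.gauss(effect_ms, sd_ms) for _ in range(n_participants)]
        se = statistics.stdev(diffs) / n_participants ** 0.5
        if abs(statistics.mean(diffs) / se) > z_crit:
            hits += 1
    return hits / n_sims
```

Under these assumptions, raising `sd_ms` to mimic a noisier on-line setting lowers the estimated power for a fixed `n_participants`, which is the kind of trade-off the abstract describes between setting and required sample size.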

翻訳日:2024-03-26 22:41:56 公開日:2024-03-19

# FUELVISION:ワイルドファイア燃料マッピングのためのマルチモーダルデータフュージョンとマルチモデルアンサンブルアルゴリズム

FUELVISION: A Multimodal Data Fusion and Multimodel Ensemble Algorithm for Wildfire Fuels Mapping ( http://arxiv.org/abs/2403.15462v1 )

ライセンス: Link先を確認

Riyaaz Uddien Shaik, Mohamad Alipour, Eric Rowell, Bharathan Balaji, Adam Watts, Ertugrul Taciroglu,

(参考訳) 燃料条件の正確な評価は、火火の点火および行動予測およびリスク管理の前提条件である。提案手法は,ランドサット8光画像,センチネル-1(Cバンド)合成開口レーダ(SAR)画像,PALSAR(Lバンド)SAR画像,地形特徴などの多様なデータソースを利用して,燃料の種類や分布に関する包括的情報を取得する。 USDAフォレストサービスから得られた森林調査プロットデータを用いて「スコット・アンド・バーガン40」などのランドスケープスケールの燃料を推定するために,アンサンブルモデルを訓練した。しかし、この基本的なアプローチは、トレーニングデータの不足により、比較的貧弱な結果をもたらした。 Pseudo-labeled and fully synthetic datasets were developed using Generative AI approach to address the limit of ground truth data available。これらの合成データセットは、モデルトレーニングの堅牢性とカバレッジを高めるために、カリフォルニアからFIAデータを増強するために使用された。ディープラーニングニューラルネットワーク、決定木、勾配向上など、一連の手法を使用することで、燃料マッピングの精度は80%近く向上した。大規模な実験と評価を通じて、2021年のディクシー火災とカルドール火災の地域で提案手法の有効性が検証された。国立農業画像計画(NAIP)と木材収穫図の高分解能データとの比較分析により, ほぼリアルタイムの燃料マッピングが可能な提案手法の堅牢性と信頼性が確認された。

Accurate assessment of fuel conditions is a prerequisite for fire ignition and behavior prediction, and risk management. The method proposed herein leverages diverse data sources including Landsat-8 optical imagery, Sentinel-1 (C-band) Synthetic Aperture Radar (SAR) imagery, PALSAR (L-band) SAR imagery, and terrain features to capture comprehensive information about fuel types and distributions. An ensemble model was trained to predict landscape-scale fuels such as the 'Scott and Burgan 40' using the as-received Forest Inventory and Analysis (FIA) field survey plot data obtained from the USDA Forest Service. However, this basic approach yielded relatively poor results due to the inadequate amount of training data. Pseudo-labeled and fully synthetic datasets were developed using generative AI approaches to address the limitations of ground truth data availability. These synthetic datasets were used for augmenting the FIA data from California to enhance the robustness and coverage of model training. The use of an ensemble of methods including deep learning neural networks, decision trees, and gradient boosting offered a fuel mapping accuracy of nearly 80%. Through extensive experimentation and evaluation, the effectiveness of the proposed approach was validated for regions of the 2021 Dixie and Caldor fires. Comparative analyses against high-resolution data from the National Agriculture Imagery Program (NAIP) and timber harvest maps affirmed the robustness and reliability of the proposed approach, which is capable of near-real-time fuel mapping.

翻訳日:2024-03-26 22:41:56 公開日:2024-03-19

# 絶え間ない世界における異常発見:連続学習における画素レベル異常検出のためのベンチマーク

Unveiling the Anomalies in an Ever-Changing World: A Benchmark for Pixel-Level Anomaly Detection in Continual Learning ( http://arxiv.org/abs/2403.15463v1 )

ライセンス: Link先を確認

Nikola Bugarin, Jovana Bugaric, Manuel Barusco, Davide Dalle Pezze, Gian Antonio Susto,

(参考訳) 異常検出は多くの実世界のアプリケーション、特に画像を扱う場合、関連する問題である。しかし、入力データ分布の時間的変化にはほとんど注意が払われておらず、性能が著しく低下する可能性がある。本研究では,連続学習環境におけるPixel-Level Anomaly Detectionの問題点について検討する。本研究では,古典的環境における異常検出問題の解決と,継続学習環境における動作に適応するために,いくつかの最先端技術を実装した。アプローチを検証するために,画素ベースの異常のある実世界の画像データセットを用いて,信頼性の高いベンチマークを提供し,この分野のさらなる進歩の基盤として機能する。我々は、どの異常検出方法と、どのファミリーのアプローチが継続的な学習環境に適しているかについて議論する包括的分析を行う。

Anomaly Detection is a relevant problem in numerous real-world applications, especially when dealing with images. However, little attention has been paid to the issue of changes over time in the input data distribution, which may cause a significant decrease in performance. In this study, we investigate the problem of Pixel-Level Anomaly Detection in the Continual Learning setting, where new data arrives over time and the goal is to perform well on new and old data. We implement several state-of-the-art techniques to solve the Anomaly Detection problem in the classic setting and adapt them to work in the Continual Learning setting. To validate the approaches, we use a real-world dataset of images with pixel-based anomalies to provide a reliable benchmark and serve as a foundation for further advancements in the field. We provide a comprehensive analysis, discussing which Anomaly Detection methods and which families of approaches seem more suitable for the Continual Learning setting.

翻訳日:2024-03-26 22:41:56 公開日:2024-03-19

# EHRを用いたLLMによるFew-Shot病の予測:予測エージェント推論とクリティカルエージェント指導を組み合わせた新しいアプローチ

LLMs-based Few-Shot Disease Predictions using EHR: A Novel Approach Combining Predictive Agent Reasoning and Critical Agent Instruction ( http://arxiv.org/abs/2403.15464v1 )

ライセンス: Link先を確認

Hejie Cui, Zhuocheng Shen, Jieyu Zhang, Hui Shao, Lianhui Qin, Joyce C. Ho, Carl Yang,

(参考訳) 電子健康記録(EHR)は、疾患予測などの健康関連予測タスクに有用な患者データを含んでいる。従来のアプローチは、巨大なラベル付きデータセットを必要とする教師付き学習手法に依存しており、高価で入手が難しい。本研究では,Large Language Models (LLMs) を用いて,構造化患者訪問データ(例えば,診断,検査,処方薬)を自然言語の物語に変換する可能性について検討した。様々なERH予測指向のプロンプト戦略を用いて,LLMのゼロショット性能と少数ショット性能を評価した。さらに、予測を行い、推論プロセスを生成する予測エージェントと、誤った予測を解析し、予測エージェントの推論を改善するためのガイダンスを提供する批評家エージェントと、異なる役割を持つLLMエージェントを利用する新しいアプローチを提案する。提案手法により,従来のERHによる疾患予測における教師あり学習法と比較して,LLMは極めて少ない性能を達成でき,健康志向の応用の可能性が示唆された。

Electronic health records (EHRs) contain valuable patient data for health-related prediction tasks, such as disease prediction. Traditional approaches rely on supervised learning methods that require large labeled datasets, which can be expensive and challenging to obtain. In this study, we investigate the feasibility of applying Large Language Models (LLMs) to convert structured patient visit data (e.g., diagnoses, labs, prescriptions) into natural language narratives. We evaluate the zero-shot and few-shot performance of LLMs using various EHR-prediction-oriented prompting strategies. Furthermore, we propose a novel approach that utilizes LLM agents with different roles: a predictor agent that makes predictions and generates reasoning processes and a critic agent that analyzes incorrect predictions and provides guidance for improving the reasoning of the predictor agent. Our results demonstrate that with the proposed approach, LLMs can achieve decent few-shot performance compared to traditional supervised learning methods in EHR-based disease predictions, suggesting its potential for health-oriented applications.

翻訳日:2024-03-26 22:41:56 公開日:2024-03-19

# ロールアウトアルゴリズムによる$n$-Grams, Transformer, HMMs, Markov Chainsの配列生成

Most Likely Sequence Generation for $n$-Grams, Transformers, HMMs, and Markov Chains, by Using Rollout Algorithms ( http://arxiv.org/abs/2403.15465v1 )

ライセンス: Link先を確認

Yuchao Li, Dimitri Bertsekas,

(参考訳) 本稿では,ChatGPTの基盤となるような$n$-gram構造を持つ変圧器について考察する。変換器は、単語列を生成するために使用できる次の単語確率を提供する。これらの確率に基づいて,確率の高い単語列の計算方法を検討する。与えられた初期状態から始まる最適な(最も可能性が高い)単語列を計算することは難解な問題であり、我々は$N$の低次多項式で$n$-gramのボキャブラリサイズである時間に$N$の単語列を計算する方法を提案する。これらの手法は、任意のヒューリスティックポリシーの性能を向上させることができる単一のポリシーイテレーションの形式である、近似動的プログラミングからのロールアウトアプローチに基づいている。私たちの場合、最も高い確率を持つ次の単語を生成する欲求的ヒューリスティックを使用します。解析, 実例, 計算実験により, 本手法は, 強欲なヒューリスティックよりも計算量がわずかに増加し, 高い確率の列を生成することができることを示した。解析と実験は変換器やChatGPTのようなモデルで生じる型のマルコフ連鎖に焦点が当てられているが、本手法は一般的な有限状態マルコフ連鎖や、ビタビ復号法が広く用いられるHMM(Hidden Markov Models)の関連する推論応用に適用できる。

In this paper we consider a transformer with an $n$-gram structure, such as the one underlying ChatGPT. The transformer provides next word probabilities, which can be used to generate word sequences. We consider methods for computing word sequences that are highly likely, based on these probabilities. Computing the optimal (i.e., most likely) word sequence starting with a given initial state is an intractable problem, so we propose methods to compute highly likely sequences of $N$ words in time that is a low order polynomial in $N$ and in the vocabulary size of the $n$-gram. These methods are based on the rollout approach from approximate dynamic programming, a form of single policy iteration, which can improve the performance of any given heuristic policy. In our case we use a greedy heuristic that generates as next word one that has the highest probability. We show with analysis, examples, and computational experimentation that our methods are capable of generating highly likely sequences with a modest increase in computation over the greedy heuristic. While our analysis and experiments are focused on Markov chains of the type arising in transformer and ChatGPT-like models, our methods apply to general finite-state Markov chains, and related inference applications of Hidden Markov Models (HMM), where Viterbi decoding is used extensively.
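A minimal sketch of rollout with a greedy base heuristic on a finite-state Markov chain is given below. It follows the general description in the abstract only: the transition matrix `P` (a list of rows), the function names, and the tie-breaking rule are assumptions for illustration, and the paper's actual algorithms and variants may differ.

```python
import math


def greedy_completion(P, state, steps):
    """Base heuristic: repeatedly move to the most probable next state.

    Returns the visited states and the log-probability of that path.
    """
    path, logp = [], 0.0
    for _ in range(steps):
        nxt = max(range(len(P[state])), key=lambda j: P[state][j])
        logp += math.log(P[state][nxt])
        path.append(nxt)
        state = nxt
    return path, logp


def rollout_sequence(P, start, N):
    """Generate N states; each step scores every candidate next state by its
    own log-probability plus the log-probability of the greedy completion."""
    state, seq, total = start, [], 0.0
    for k in range(N):
        best = None
        for j, p in enumerate(P[state]):
            if p <= 0.0:
                continue  # impossible transition, skip it
            _, tail = greedy_completion(P, j, N - k - 1)
            score = math.log(p) + tail
            if best is None or score > best[0]:
                best = (score, j, math.log(p))
        _, j, step_logp = best
        seq.append(j)
        total += step_logp
        state = j
    return seq, total
```

Each rollout step runs one greedy completion per candidate next state, so the cost grows polynomially with the sequence length and the number of states, whereas an exhaustive search over all sequences grows exponentially in the length.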

翻訳日:2024-03-26 22:41:56 公開日:2024-03-19

# ストラグラー存在下での1ビット勾配符号化に基づく分散学習

Distributed Learning based on 1-Bit Gradient Coding in the Presence of Stragglers ( http://arxiv.org/abs/2403.14716v1 )

ライセンス: Link先を確認

Chengxi Li, Mikael Skoglund,

(参考訳) 本稿では,トラグラーの存在下での分散学習(DL)の問題について考察する。この問題に対して、勾配符号化に基づくDL手法が広く研究されており、労働者がストラグラーである場合の収束を保証するために、トレーニングデータを冗長に労働者に配布している。しかし、これらの手法では、学習中に実数値ベクトルを送信する必要があるため、非常に高い通信負担が生じる。この欠点を克服するために,1ビット勾配符号化(1ビットGCDL)に基づく新しいDL手法を提案する。理論的には、凸損失関数と非凸損失関数の両方に対する提案手法の収束保証を提供する。 1ビットのGC-DLはベースライン法よりも優れており、同じ通信オーバヘッド下での学習性能が向上する。

This paper considers the problem of distributed learning (DL) in the presence of stragglers. For this problem, DL methods based on gradient coding have been widely investigated, which redundantly distribute the training data to the workers to guarantee convergence when some workers are stragglers. However, these methods require the workers to transmit real-valued vectors during the process of learning, which induces very high communication burden. To overcome this drawback, we propose a novel DL method based on 1-bit gradient coding (1-bit GC-DL), where 1-bit data encoded from the locally computed gradients are transmitted by the workers to reduce the communication overhead. We theoretically provide the convergence guarantees of the proposed method for both convex and nonconvex loss functions. It is shown empirically that 1-bit GC-DL outperforms the baseline methods, attaining better learning performance under the same communication overhead.
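To make the communication saving concrete, the sketch below shows a generic sign-based scheme in which each responding worker sends one bit per gradient coordinate and the server takes a coordinate-wise majority vote. This is a stand-in technique (sign compression with majority vote), not the encoding proposed in the paper, whose details the abstract does not give; all names here are hypothetical.

```python
def sign_bits(grad):
    """Compress a real-valued gradient to one bit per coordinate (+1 or -1)."""
    return [1 if g >= 0 else -1 for g in grad]


def majority_vote(messages):
    """Coordinate-wise majority over the 1-bit messages that arrived."""
    dim = len(messages[0])
    return [1 if sum(m[i] for m in messages) >= 0 else -1 for i in range(dim)]


def server_step(w, worker_grads, responded, lr):
    """Update the model using only workers that responded in time.

    worker_grads: list of local gradients, one per worker.
    responded: indices of non-straggler workers in this round.
    """
    messages = [sign_bits(worker_grads[k]) for k in responded]
    direction = majority_vote(messages)
    return [wi - lr * d for wi, d in zip(w, direction)]
```

In this toy version stragglers are simply left out of the vote; a gradient-coding method would additionally assign training data redundantly so that the missing workers' contributions are still covered.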

翻訳日:2024-03-25 21:41:26 公開日:2024-03-19

# FedSR: IoTシステムにおける非IID性のための半分散フェデレーション学習アルゴリズム

FedSR: A Semi-Decentralized Federated Learning Algorithm for Non-IIDness in IoT System ( http://arxiv.org/abs/2403.14718v1 )

ライセンス: Link先を確認

Jianjun Huang, Lixin Ye, Li Kang,

(参考訳) IoT(Industrial Internet of Things)では、大量のデータが毎日生成される。プライバシとセキュリティの問題から、これらすべてのデータをまとめてディープラーニングモデルをトレーニングすることは難しいため、データプライバシを保護する分散型機械学習パラダイムであるフェデレーション学習がIoTで広く使用されている。しかし、実践的なフェデレート学習では、データ分布は通常デバイス間で大きな差異があり、データの均一性はモデルの性能を低下させる。さらに、IoTのフェデレーション学習は通常、トレーニングに関わる多数のデバイスを持ち、クラウドサーバの限られた通信リソースは、トレーニングのボトルネックになる。上記の課題に対処するため,本論文では,集中型フェデレーション学習と分散型フェデレーション学習を組み合わせて,半分散型クラウドエッジデバイス階層型フェデレーション学習フレームワークを設計する。データの不均一性の影響に対処するために、各リングクラスタにおける漸進的な段階的最適化アルゴリズムを用いて、リングクラスタモデルの一般化能力を向上する。我々の大規模な実験は、当社のアプローチがデータ不均一性の影響を効果的に軽減し、クラウドサーバにおける通信ボトルネックを軽減することができることを示している。

In the Industrial Internet of Things (IoT), a large amount of data is generated every day. Due to privacy and security issues, it is difficult to collect all these data together to train deep learning models; thus federated learning, a distributed machine learning paradigm that protects data privacy, has been widely used in IoT. However, in practical federated learning, the data distributions usually differ greatly across devices, and this data heterogeneity degrades the performance of the model. Moreover, federated learning in IoT usually involves a large number of devices in training, and the limited communication resources of cloud servers become a bottleneck for training. To address the above issues, in this paper, we combine centralized federated learning with decentralized federated learning to design a semi-decentralized cloud-edge-device hierarchical federated learning framework, which can mitigate the impact of data heterogeneity and can be deployed at large scale in IoT. To address the effect of data heterogeneity, we use an incremental subgradient optimization algorithm in each ring cluster to improve the generalization ability of the ring cluster models. Our extensive experiments show that our approach can effectively mitigate the impact of data heterogeneity and alleviate the communication bottleneck in cloud servers.

翻訳日:2024-03-25 21:41:26 公開日:2024-03-19

# ラベルの平滑化が選択的分類を低下させる理由と修正方法

Understanding Why Label Smoothing Degrades Selective Classification and How to Fix It ( http://arxiv.org/abs/2403.14715v1 )

ライセンス: Link先を確認

Guoxuan Xia, Olivier Laurent, Gianni Franchi, Christos-Savvas Bouganis,

(参考訳) ラベルスムーシング(LS)は、テスト精度の向上と実装の単純さにより、ディープニューラルネットワーク分類器をトレーニングするための一般的な正規化手法である。ハード」ワンホットラベルは、確率質量を他のクラスに均一に分散し、オーバーフィッティングを減らすことで「平滑化」される。本研究では,LSが選択分類(SC)に悪影響を及ぼすことを明らかにする。まず、LSがSCに一貫した劣化をもたらす様々なタスクやアーキテクチャを経験的に実証する。次に、ロジトレベルの勾配を解析することにより、LSはエラーの確率が低い場合には最大ロジトをより規則化し、エラーの確率が高い場合はより小さくすることで、過信と過信を悪化させることを示す。この結果より, SCでは強い分類器が不十分であったことが示唆された。次に,LSによる損失SCの回復に対するロジット正規化の有効性を実証した。さらに、勾配解析に基づいて、なぜそのような正規化が有効かを説明する。近いうちにコードを公開します。

Label smoothing (LS) is a popular regularisation method for training deep neural network classifiers due to its effectiveness in improving test accuracy and its simplicity in implementation. "Hard" one-hot labels are "smoothed" by uniformly distributing probability mass to other classes, reducing overfitting. In this work, we reveal that LS negatively affects selective classification (SC) - where the aim is to reject misclassifications using a model's predictive uncertainty. We first demonstrate empirically across a range of tasks and architectures that LS leads to a consistent degradation in SC. We then explain this by analysing logit-level gradients, showing that LS exacerbates overconfidence and underconfidence by regularising the max logit more when the probability of error is low, and less when the probability of error is high. This elucidates previously reported experimental results where strong classifiers underperform in SC. We then demonstrate the empirical effectiveness of logit normalisation for recovering lost SC performance caused by LS. Furthermore, based on our gradient analysis, we explain why such normalisation is effective. We will release our code shortly.
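The two ingredients discussed above, label smoothing and selective classification, can be sketched as follows. This is a minimal illustration under stated assumptions: the smoothing uses the common convention of mixing the one-hot label with a uniform distribution over all classes, and the rejection rule uses the maximum softmax probability as the confidence score. The function names and the threshold are hypothetical, and the logit normalisation studied in the paper is not reproduced here.

```python
import numpy as np

def smooth_labels(labels, num_classes, alpha=0.1):
    """Mix one-hot labels with a uniform distribution over all classes."""
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - alpha) * one_hot + alpha / num_classes

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def selective_predict(logits, threshold):
    """Predict the arg-max class and accept only confident predictions."""
    probs = softmax(logits)
    confidence = probs.max(axis=-1)
    prediction = probs.argmax(axis=-1)
    return prediction, confidence >= threshold

targets = smooth_labels(np.array([1]), num_classes=4, alpha=0.2)
logits = np.array([[4.0, 0.0, 0.0],    # peaked: confident
                   [0.2, 0.1, 0.0]])   # flat: uncertain
prediction, accepted = selective_predict(logits, threshold=0.8)
print(targets, prediction, accepted)
```

With `alpha=0.2` and four classes, the true class keeps probability 0.85 and every other class receives 0.05. In the second call, only the first sample is accepted, since the flat logits of the second give a maximum probability well below the threshold.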

翻訳日:2024-03-25 21:31:40 公開日:2024-03-19

# 電子的クチコミの個人的および製品関連の先行要因

Individual and Product-Related Antecedents of Electronic Word-of-Mouth ( http://arxiv.org/abs/2403.14717v1 )

ライセンス: Link先を確認

Bogdan Anastasiei, Nicoleta Dospinescu, Octavian Dospinescu,

(参考訳) 本研究は,肯定的および否定的な電子的クチコミ(eWOM)の傾向の先行要因,ならびにeWOMの傾向が製品の再購入意図に及ぼす影響について検討した。製品関連変数と個人要因の2種類のeWOM予測因子が検討された。データはルーマニアの335人の被験者を対象にしたオンライン調査を通じて収集され、解析方法は構造方程式モデリングである。以上の結果から,個人的要因(ソーシャルメディアの利用行動,マーケティング・メイブニズム,評価の必要性)が,肯定的か否定的かを問わず,製品レビューやコメントをオンラインで作成する意図の最も重要な先行要因であることが示された。製品関連因子のうち、eWOMを提供する傾向に影響を与えるのはブランド信頼のみである。さらに、肯定的および否定的なeWOM意図は、いずれも再購入意図と関連している。

This research investigates the antecedents of positive and negative electronic word-of-mouth (eWOM) propensity, as well as the impact of eWOM propensity on the intention to repurchase the product. Two types of eWOM predictors were considered: product-related variables and personal factors. The data were collected through an online survey conducted on a sample of 335 Romanian subjects, and the analysis method was Structural Equation Modeling. Our findings show that personal factors (social media usage behavior, marketing mavenism, and need to evaluate) are the most important antecedents of the intention to write product reviews and comments online, either positive or negative. Among the product-related factors, only brand trust influences the propensity to provide eWOM. Furthermore, both positive and negative eWOM intentions are associated with the repurchase intention.

翻訳日:2024-03-25 21:31:40 公開日:2024-03-19

# カラーアウェア置換によるLLM透かしのバイパス

Bypassing LLM Watermarks with Color-Aware Substitutions ( http://arxiv.org/abs/2403.14719v1 )

ライセンス: Link先を確認

Qilong Wu, Varun Chandrasekaran,

(参考訳) 流通するテキストが人間によるものか大規模言語モデル(LLM)が生成したものかを識別するために、透かし手法が提案されている。 Kirchenbauer et al. (2023a) の最先端の透かし戦略は、特定の(「グリーン」)トークンを生成するようにLLMを偏らせる。しかし、この透かし法の堅牢性を決定することは未解決の問題である。既存の攻撃方法は、長いテキストセグメントの検出を回避できない。我々はこの制限を克服し、最初の「カラーアウェア」攻撃であるSCTS(Self Color Testing-based Substitution)を提案する。 SCTSは、透かし入りのLLMに戦略的にプロンプトを与え、出力トークンの頻度を比較することで、色情報を取得する。この情報を使ってトークンの色を決定し、緑色のトークンを非緑色のトークンに置き換える。本実験においてSCTSは関連する作業よりも少ない編集数で透かし検出を回避した。さらに、SCTSが任意の長さの透かしテキストの透かしを除去できることを理論的および実証的に示す。

Watermarking approaches have been proposed to identify whether circulated text is human- or large language model (LLM)-generated. The state-of-the-art watermarking strategy of Kirchenbauer et al. (2023a) biases the LLM to generate specific ("green") tokens. However, determining the robustness of this watermarking method is an open problem. Existing attack methods fail to evade detection for longer text segments. We overcome this limitation and propose Self Color Testing-based Substitution (SCTS), the first "color-aware" attack. SCTS obtains color information by strategically prompting the watermarked LLM and comparing output token frequencies. It uses this information to determine token colors, and substitutes green tokens with non-green ones. In our experiments, SCTS successfully evades watermark detection using fewer edits than related work. Additionally, we show both theoretically and empirically that SCTS can remove the watermark for arbitrarily long watermarked text.
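The detection side of a green-token watermark can be sketched in a few lines, which also shows why replacing green tokens with non-green ones defeats it. The sketch assumes a toy coloring rule (a SHA-256 hash of the previous and current token, green with probability `gamma`) in place of a real model's seeded vocabulary partition; `is_green` and `z_score` are hypothetical names. The statistic is the usual one-proportion z-test: with $T$ scored tokens, of which $s_G$ are green, $z = (s_G - \gamma T) / \sqrt{T \gamma (1 - \gamma)}$.

```python
import hashlib
import math

def is_green(prev_token, token, gamma=0.5):
    """Toy coloring rule: hash the (previous, current) pair into [0, 1)."""
    digest = hashlib.sha256(f"{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma

def z_score(tokens, gamma=0.5):
    """One-proportion z-test on the number of green tokens after the first."""
    scored = len(tokens) - 1
    green = sum(is_green(p, t, gamma) for p, t in zip(tokens, tokens[1:]))
    return (green - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))

def generate(length, want_green, vocab_size=1000):
    """Build a token-id sequence whose tokens all have the requested color."""
    tokens = [0]
    for _ in range(length):
        prev = tokens[-1]
        tokens.append(next(t for t in range(vocab_size)
                           if is_green(prev, t) == want_green))
    return tokens

watermarked = generate(20, want_green=True)
substituted = generate(20, want_green=False)
print(z_score(watermarked), z_score(substituted))
```

A text whose 20 scored tokens are all green reaches $z = \sqrt{20} \approx 4.47$, while a text with no green tokens reaches $-\sqrt{20}$. An attacker who can tell colors apart therefore only needs enough substitutions to pull the score below the detection threshold; how SCTS infers the colors by prompting the model is not covered by this sketch.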

翻訳日:2024-03-25 21:31:40 公開日:2024-03-19

# 粒子加速器におけるビームダイナミクスの生成と予測のための条件付き遅延自己回帰リカレントモデル

A conditional latent autoregressive recurrent model for generation and forecasting of beam dynamics in particle accelerators ( http://arxiv.org/abs/2403.13858v1 )

ライセンス: Link先を確認

Mahindra Rautela, Alan Williams, Alexander Scheinker,

(参考訳) 粒子加速器は、高エネルギーに荷電粒子ビームを集中させ、誘導し、加速する複雑なシステムである。ビーム診断は, 限られた非破壊測定, 計算要求シミュレーション, システム内固有の不確実性により, 困難な問題となる。本研究では,加速器内の荷電粒子の時空間的ダイナミクスを学習するための,条件付き遅延自己回帰時間モデル(CLARM)と呼ばれる2段階の教師なしディープラーニングフレームワークを提案する。 CLARMは、6次元位相空間を低次元の潜在分布に変換する条件変分オートエンコーダ(CVAE)と、時間的ダイナミクスを自己回帰的に捉えるLong Short-Term Memory(LSTM)ネットワークで構成される。 CLARMは、潜在空間表現をサンプリングして復号することで、様々な加速器モジュールでプロジェクションを生成することができる。このモデルはまた、過去の状態(上流位置)から荷電粒子の将来の状態(下流位置)を予測する。その結果,提案手法の予測能力と生成能力は,様々な評価指標と比較した場合に有望であることが示唆された。

Particle accelerators are complex systems that focus, guide, and accelerate intense charged particle beams to high energy. Beam diagnostics present a challenging problem due to limited non-destructive measurements, computationally demanding simulations, and inherent uncertainties in the system. We propose a two-step unsupervised deep learning framework named Conditional Latent Autoregressive Recurrent Model (CLARM) for learning the spatiotemporal dynamics of charged particles in accelerators. CLARM consists of a Conditional Variational Autoencoder (CVAE) transforming six-dimensional phase space into a lower-dimensional latent distribution and a Long Short-Term Memory (LSTM) network capturing temporal dynamics in an autoregressive manner. CLARM can generate projections at various accelerator modules by sampling and decoding the latent space representation. The model also forecasts future states (downstream locations) of charged particles from past states (upstream locations). The results demonstrate that the generative and forecasting ability of the proposed approach is promising when tested against a variety of evaluation metrics.

翻訳日:2024-03-22 18:28:52 公開日:2024-03-19

# 創造性と機械学習: 調査

Creativity and Machine Learning: A Survey ( http://arxiv.org/abs/2104.02726v4 )

ライセンス: Link先を確認

Giorgio Franceschelli, Mirco Musolesi,

(参考訳) 機械学習とクリエイティビティの分野への関心が高まっている。本稿では,計算創造性理論の歴史と現状,鍵となる機械学習技術(生成的深層学習を含む),およびそれに対応する自動評価手法について概説する。この分野における重要な貢献について批判的な議論を行った後、この分野における現在の研究課題と新たな機会について概説する。

There is a growing interest in the area of machine learning and creativity. This survey presents an overview of the history and the state of the art of computational creativity theories, key machine learning techniques (including generative deep learning), and corresponding automatic evaluation methods. After presenting a critical discussion of the key contributions in this area, we outline the current research challenges and emerging opportunities in this field.

翻訳日:2024-03-21 23:26:53 公開日:2024-03-19

# 計量空間における弱凸集合の学習

Learning Weakly Convex Sets in Metric Spaces ( http://arxiv.org/abs/2105.06251v2 )

ライセンス: Link先を確認

Eike Stadtländer, Tamás Horváth, Stefan Wrobel,

(参考訳) 機械学習理論で研究される中心的な問題の1つは、与えられた仮説のクラスに対して、一貫した(consistent)仮説、すなわち訓練誤差がゼロである仮説を効率よく見つけることができるかどうかである。凸仮説に関する問題は広く研究されているが、いくつかの非連結領域からなる可能性のある非凸仮説に対して効率的な学習が可能かどうかという問題は、いまだに十分に理解されていない。弱凸仮説(凸仮説のパラメータ化緩和)の効率的な学習がブール関数の特別な場合において可能であることは、かなり以前から示されてきたが、このアイデアが一般的なパラダイムへと発展できるかどうかという問題は、まだ研究されていない。本稿では,距離空間上の弱凸仮説の広いクラスに対して,一貫した仮説探索問題を多項式時間で解くことができることを示す。そこで本研究では,一貫した弱凸仮説を求める一般的な領域非依存アルゴリズムを提案し,その効率性に関する十分条件を証明して,対応する仮説クラスを特徴づける。一般のアルゴリズムとその特性を説明するために,いくつかの非自明な学習例について論じ,それに対応する一貫した仮説探索問題を効率的に解く方法を示す。弱凸性制約がなければ、これらの問題は計算的に難解であることが知られている。そして、我々のアルゴリズムの一般的な考え方は、グラフで頂点分類を行う際などに自然に現れる外延的な弱凸仮説の場合にまで拡張可能であることを示す。拡張アルゴリズムを用いることで、領域内の距離を効率的に計算できる場合には、多項式時間で問題を解くことができることを証明する。

One of the central problems studied in the theory of machine learning is the question of whether, for a given class of hypotheses, it is possible to efficiently find a consistent hypothesis, i.e., one that has zero training error. While problems involving convex hypotheses have been extensively studied, the question of whether efficient learning is possible for non-convex hypotheses composed of possibly several disconnected regions is still less understood. Although it was shown quite a while ago that efficient learning of weakly convex hypotheses, a parameterized relaxation of convex hypotheses, is possible for the special case of Boolean functions, the question of whether this idea can be developed into a generic paradigm has not been studied yet. In this paper, we provide a positive answer and show that the consistent hypothesis finding problem can indeed be solved in polynomial time for a broad class of weakly convex hypotheses over metric spaces. To this end, we propose a general domain-independent algorithm for finding consistent weakly convex hypotheses and prove sufficient conditions for its efficiency that characterize the corresponding hypothesis classes. To illustrate our general algorithm and its properties, we discuss several non-trivial learning examples to demonstrate how it can be used to efficiently solve the corresponding consistent hypothesis finding problem. Without the weak convexity constraint, these problems are known to be computationally intractable. We then proceed to show that the general idea of our algorithm can even be extended to the case of extensional weakly convex hypotheses, as they naturally arise, e.g., when performing vertex classification in graphs. We prove that using our extended algorithm, the problem can be solved in polynomial time provided the distances in the domain can be computed efficiently.

翻訳日:2024-03-21 23:26:53 公開日:2024-03-19

# 量子エンタングルメントの初等生への導入 : トリレンマの解消

Introducing Quantum Entanglement to First-Year Students: Resolving the Trilemma ( http://arxiv.org/abs/2106.12043v4 )

ライセンス: Link先を確認

W. M. Stuckey, Timothy McDevitt, Michael Silberstein,

(参考訳) 量子力学(Quantum Mechanics, QM)は、入門物理学の教科書で長くカバーされているが、量子エンタングルメントの概念は、急速に成長する量子情報科学の領域と、その広範な実験的検証において重要であるにもかかわらず、一般的にはカバーされていない。このように、物理教育者は、この重要な概念を導入する方法について、自身のデバイスに委ねられている。物理学の教育者が量子絡み合いを導入する方法がどうあるにせよ、謎のベル-不等質-違反相関を含むトリレンマに直面している。 2022年のノーベル物理学賞を完全に無視して、導入の完全性に妥協し、その事実を共有することを単純に選択することができる。彼らは、このミステリーを導入することで、より好奇心をそそる学生達に不満を抱き、QM形式と関連する(同様に神秘的な)保存法則が実験に美しく対応していることを単に伝えただけで、言うべきことは他にない。あるいは、彼らはプレゼンテーションの厳密さを妥協し、競合するQM解釈のメタ物理準位に介入することでミステリーを解決しようとする。ここでは、アインシュタインが19世紀後半に存在した時間拡張と長さ収縮の謎を解くのと全く同じ方法で、このトリレンマを解く。すなわち、我々は「経験的に発見された」事実の数学的結果に基づいて「原理的」な説明を行う。実際、我々の量子絡み合いの原理はアインシュタインが用いたのと同じ原理、すなわち相対性理論や「好ましくない参照フレーム」に基づいている。このように、トリレンマのこの原理的解決は完全であり、満足し、分析的に厳密であり、初年次物理学生のための特殊相対性理論の標準導入としてアクセス可能である。

While quantum mechanics (QM) is covered at length in introductory physics textbooks, the concept of quantum entanglement is typically not covered at all, despite its importance in the rapidly growing area of quantum information science and its extensive experimental confirmation. Thus, physics educators are left to their own devices as to how to introduce this important concept. Regardless of how a physics educator chooses to introduce quantum entanglement, they face a trilemma involving its mysterious Bell-inequality-violating correlations. They can compromise on the completeness of their introduction and simply choose not to share that fact, totally ignoring the 2022 Nobel Prize in Physics. They can frustrate their more curious students by introducing the mystery and simply telling them that the QM formalism with its associated (equally mysterious) conservation law maps beautifully to the experiments, so there is nothing else that needs to be said. Or, they can compromise the rigor of their presentation and attempt to resolve the mystery by venturing into the metaphysical quagmire of competing QM interpretations. Herein, we resolve this trilemma in precisely the same way that Einstein resolved the mysteries of time dilation and length contraction that existed in the late nineteenth century. That is, we resort to a "principle" explanation based on the mathematical consequences of "empirically discovered" facts. Indeed, our principle account of quantum entanglement is even based on the same principle Einstein used, i.e., the relativity principle or "no preferred reference frame." Thus, this principle resolution of the trilemma is as complete, satisfying, analytically rigorous, and accessible as the standard introduction of special relativity for first-year physics students.

翻訳日:2024-03-21 23:26:53 公開日:2024-03-19

# 雑音量子コンピュータの位相データ解析

Topological data analysis on noisy quantum computers ( http://arxiv.org/abs/2209.09371v4 )

ライセンス: Link先を確認

Ismail Yunus Akhalwaya, Shashanka Ubaru, Kenneth L. Clarkson, Mark S. Squillante, Vishnu Jejjala, Yang-Hui He, Kugendran Naidoo, Vasileios Kalantzis, Lior Horesh,

(参考訳) トポロジカルデータ解析(TDA)は,高次元データの複雑で価値の高い形状関連要約を抽出する強力な手法である。しかし、TDA計算における古典的アルゴリズムの計算要求は極端であり、高次特性に対してはすぐに非現実的になる。量子コンピュータは、特定の計算問題に対して大きなスピードアップを達成する可能性を秘めている。実際、TDAはそのような問題の1つとして報告されているが、ロイド、ガーネロン、ザナルディによる量子TDA(QTDA)の定式化のような量子コンピューティングアルゴリズムでは、現在利用できないフォールトトレランスの資格が必要となる。本研究では,高次元古典データに適用可能な短い回路深度のみを必要とする完全実装のエンドツーエンド量子機械学習アルゴリズムであるNISQ-TDAについて述べる。このアルゴリズムは、データローディングの問題に悩まされず、入力データを量子コンピュータに明示的に格納する必要もない。このアルゴリズムは、小さなデータセットに適用された量子コンピューティングデバイスだけでなく、ノイズの多い量子シミュレータ上でもうまく実行された。予備的な経験的結果は、アルゴリズムがノイズに対して堅牢であることを示唆している。

Topological data analysis (TDA) is a powerful technique for extracting complex and valuable shape-related summaries of high-dimensional data. However, the computational demands of classical algorithms for computing TDA are exorbitant, and quickly become impractical for high-order characteristics. Quantum computers offer the potential of achieving significant speedup for certain computational problems. Indeed, TDA has been purported to be one such problem, yet, quantum computing algorithms proposed for the problem, such as the original Quantum TDA (QTDA) formulation by Lloyd, Garnerone and Zanardi, require fault-tolerance qualifications that are currently unavailable. In this study, we present NISQ-TDA, a fully implemented end-to-end quantum machine learning algorithm needing only a short circuit-depth, that is applicable to high-dimensional classical data, and with provable asymptotic speedup for certain classes of problems. The algorithm neither suffers from the data-loading problem nor does it need to store the input data on the quantum computer explicitly. The algorithm was successfully executed on quantum computing devices, as well as on noisy quantum simulators, applied to small datasets. Preliminary empirical results suggest that the algorithm is robust to noise.

翻訳日:2024-03-21 23:26:53 公開日:2024-03-19

# 非負行列分解のための高速乗算更新アルゴリズム

A fast Multiplicative Updates algorithm for Non-negative Matrix Factorization ( http://arxiv.org/abs/2303.17992v2 )

ライセンス: Link先を確認

Mai-Quyen Pham, Jérémy Cohen, Thierry Chonavel,

(参考訳) 非負の行列因子化は、教師なし機械学習において、しばしば解釈可能な部分の積にデータマトリックスを分解する重要なツールである。過去30年間に多くのアルゴリズムが提案されてきた。有名な方法は2002年にLee and Seungによって提案された乗法更新アルゴリズムである。実装が簡単で、スパース非負行列因子化のような一般的な変種に適応でき、最近のベンチマークによると、損失関数がフロベニウスノルムではない多くの問題に対して最先端である。本稿では,各サブプロブレムに対して,ヘッセン行列のより厳密な上界を構築することにより,交互な最大化最小化アルゴリズムと見なされる乗法更新アルゴリズムを改善することを提案する。コンバージェンスはまだ保証されており、我々は実際に合成と実世界の両方のデータセットで、提案したfastMUアルゴリズムが通常の乗算更新アルゴリズムよりも数桁高速であり、フロベニウスの損失に対する最先端の手法と競合することがあることを観察している。

Nonnegative Matrix Factorization is an important tool in unsupervised machine learning to decompose a data matrix into a product of parts that are often interpretable. Many algorithms have been proposed during the last three decades. A well-known method is the Multiplicative Updates algorithm proposed by Lee and Seung in 2002. Multiplicative updates have many interesting features: they are simple to implement, can be adapted to popular variants such as sparse Nonnegative Matrix Factorization, and, according to recent benchmarks, are state-of-the-art for many problems where the loss function is not the Frobenius norm. In this manuscript, we propose to improve the Multiplicative Updates algorithm, seen as an alternating majorization-minimization algorithm, by crafting a tighter upper bound of the Hessian matrix for each alternate subproblem. Convergence is still ensured, and we observe in practice on both synthetic and real-world datasets that the proposed fastMU algorithm is often several orders of magnitude faster than the regular Multiplicative Updates algorithm, and can even be competitive with state-of-the-art methods for the Frobenius loss.
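For reference, the classic Multiplicative Updates baseline for the Frobenius loss can be written in a few lines. This is a minimal sketch of the standard Lee and Seung updates, not the fastMU algorithm proposed in the paper; the function name, the small constant `eps` guarding against division by zero, and the random test matrix are choices made for the example.

```python
import numpy as np

def nmf_multiplicative_updates(V, rank, iters=200, eps=1e-12, seed=0):
    """Factor a nonnegative matrix V as W @ H with the classic updates."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    losses = []
    for _ in range(iters):
        # Each factor is rescaled elementwise, so nonnegativity is preserved.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        losses.append(0.5 * np.linalg.norm(V - W @ H, "fro") ** 2)
    return W, H, losses

V = np.random.default_rng(1).random((20, 15))
W, H, losses = nmf_multiplicative_updates(V, rank=4, iters=50)
print(losses[0], losses[-1])
```

Each update minimizes a separable quadratic upper bound (a majorizer) of the loss in one factor, which is why the loss never increases. The paper's stated contribution is a tighter bound on the Hessian inside that same majorization-minimization scheme.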

翻訳日:2024-03-21 23:07:03 公開日:2024-03-19

# OSDaR23:Rail 2023用のオープンセンサーデータ

OSDaR23: Open Sensor Data for Rail 2023 ( http://arxiv.org/abs/2305.03001v2 )

ライセンス: Link先を確認

Rustam Tagiew, Martin Köppel, Karsten Schwalbe, Patrick Denzler, Philipp Neumaier, Tobias Klockau, Martin Boekhoff, Pavel Klasek, Roman Tilly,

(参考訳) 本線鉄道における無人運転を実現するためには、適切なセンサシステムにより、列車の走行路の現実的および潜在的障害を自動的に検出する必要がある。機械学習アルゴリズムは、ここ数年でこのタスクに強力なツールであることが証明されている。しかしながら、これらのアルゴリズムは、トレーニングデータとして鉄道固有のオブジェクトを含む大量の高品質なアノテートデータを必要とする。残念ながら、この要件に対処する公開データセットはすべて、何らかの方法で制限されている。そこで本稿では,2021年9月にドイツのハンブルクで取得した45のサブシーケンスのマルチセンサ・データセットであるOSDaR23について述べる。センサーのセットアップは、複数の校正・同期赤外線(IR)と視覚カメラ(RGB)、ライダー、レーダー、およびレール車両の前面に取り付けられた位置と加速度センサーで構成されている。生データに加えて、データセットには204091のポリリン、多角形、矩形、立方形アノテーションが含まれており、合計で20の異なるオブジェクトクラスがある。これは、鉄道コンテキストに関連するさまざまなオブジェクトクラスに注釈付けされた、初めて公開されたマルチセンサーデータセットである。 data.fid-move.de/dataset/osdar23で利用可能なOSDaR23は、衝突予測以外のタスクにも使用することができる。

To achieve driverless train operation on mainline railways, actual and potential obstacles in the train's path must be detected automatically by appropriate sensor systems. Machine learning algorithms have proven to be powerful tools for this task in recent years. However, these algorithms require large amounts of high-quality annotated data containing railway-specific objects as training data. Unfortunately, all of the publicly available datasets that tackle this requirement are restricted in some way. Therefore, this paper presents OSDaR23, a multi-sensor dataset of 45 subsequences acquired in Hamburg, Germany, in September 2021, that was created to foster driverless train operation on mainline railways. The sensor setup consists of multiple calibrated and synchronized infrared (IR) and visual (RGB) cameras, lidars, a radar, and position and acceleration sensors mounted on the front of a rail vehicle. In addition to the raw data, the dataset contains 204091 polyline, polygonal, rectangle, and cuboid annotations in total for 20 different object classes. It is the first publicly available multi-sensor dataset annotated with a variety of object classes that are relevant for the railway context. OSDaR23, available at data.fid-move.de/dataset/osdar23, can also be used for tasks beyond collision prediction, which are listed in this paper.

翻訳日:2024-03-21 23:07:03 公開日:2024-03-19

# Federated Foundation Models: 大規模モデルのためのプライバシ保護と協調学習

Federated Foundation Models: Privacy-Preserving and Collaborative Learning for Large Models ( http://arxiv.org/abs/2305.11414v3 )

ライセンス: Link先を確認

Sixing Yu, J. Pablo Muñoz, Ali Jannesari,

(参考訳) LLaMA、BERT、GPT、ViT、CLIPといったファンデーションモデル(FM)は、事前トレーニングに大量のデータを活用する能力によって、幅広いアプリケーションで顕著な成功を収めている。しかし、FMを最適化するには、機密データにアクセスし、プライバシー上の懸念を高め、多くのドメインで適用性を制限する必要がある。本稿では,FMとFederated Learning(FL)の利点を組み合わせたFFM(Federated Foundation Models)パラダイムを提案する。我々は,FMの寿命にFLを組み込むことの潜在的なメリットと課題について論じ,事前学習,微調整,応用について論じる。 FFMの事前トレーニング、FFMの微調整、フェデレートされたプロンプトチューニングなど、FFMの将来的な研究の道程を概説し、データのプライバシーを確保しつつ、よりパーソナライズされたコンテキスト対応モデルの開発を可能にする。さらに,FFMにおける連続的・長期学習の可能性についても検討し,エッジでの計算能力の増大が,データソースに近い新たに生成されたプライベートデータを用いてFMを最適化する可能性を秘めている。提案したFFMの概念は、大きな言語モデルをプライバシー保護方法でトレーニングするための柔軟でスケーラブルなフレームワークを提供する。

Foundation Models (FMs), such as LLaMA, BERT, GPT, ViT, and CLIP, have demonstrated remarkable success in a wide range of applications, driven by their ability to leverage vast amounts of data for pre-training. However, optimizing FMs often requires access to sensitive data, raising privacy concerns and limiting their applicability in many domains. In this paper, we propose the Federated Foundation Models (FFMs) paradigm, which combines the benefits of FMs and Federated Learning (FL) to enable privacy-preserving and collaborative learning across multiple end-users. We discuss the potential benefits and challenges of integrating FL into the lifespan of FMs, covering pre-training, fine-tuning, and application. We further outline potential future research avenues in FFM, including FFM pre-training, FFM fine-tuning, and federated prompt tuning, which allow the development of more personalized and context-aware models while ensuring data privacy. Moreover, we explore the possibility of continual/lifelong learning in FFMs, as increased computational power at the edge may unlock the potential for optimizing FMs using newly generated private data close to the data source. The proposed FFM concepts offer a flexible and scalable framework for training large language models in a privacy-preserving manner, setting the stage for subsequent advancements in both FM training and federated learning.

翻訳日:2024-03-21 23:07:03 公開日:2024-03-19

# DermSynth3D:in-the-wild Annotated Dermatology画像の合成

DermSynth3D: Synthesis of in-the-wild Annotated Dermatology Images ( http://arxiv.org/abs/2305.12621v3 )

ライセンス: Link先を確認

Ashish Sinha, Jeremy Kawahara, Arezou Pakzad, Kumar Abhishek, Matthieu Ruthven, Enjie Ghorbel, Anis Kacem, Djamila Aouada, Ghassan Hamarneh,

(参考訳) 近年, 深層学習(DL)は皮膚画像解析の分野で大きな可能性を秘めている。しかし、この領域の既存のデータセットには、少数の画像サンプル、限られた疾患条件、不十分なアノテーション、標準化されていない画像取得など、重大な制限がある。これらの欠点に対処するため,我々はDermSynth3Dという新しいフレームワークを提案する。 DermSynth3Dは、人体の3Dテクスチャメッシュに、微分可能なレンダラーを用いて皮膚の病気パターンをブレンドし、さまざまな背景条件下で選択された照明条件下で、様々なカメラ視点から2D画像を生成する。筆者らの手法は、ブレンディングとレンダリングを制約するトップダウンルールに従属し、より有意義な結果が得られるように、肌の状態の2D画像を作成する。本フレームワークは、皮膚、皮膚の状態、身体部分、病変周囲の境界ボックス、深度マップ、およびカメラ位置や照明条件などの他の3Dシーンパラメータを意味的セグメンテーションするための、フォトリアリスティックな2D皮膚鏡画像およびそれに対応する高密度アノテーションを生成する。 DermSynth3Dは、さまざまな皮膚科学タスクのためのカスタムデータセットを作成することができる。本稿では,DermSynth3Dを用いて合成データ上でDLモデルを訓練し,実際の2次元皮膚画像を用いて各種皮膚学タスクで評価することにより,データの有効性を実証する。コードをhttps://github.com/sfu-mial/DermSynth3Dで公開しています。

In recent years, deep learning (DL) has shown great potential in the field of dermatological image analysis. However, existing datasets in this domain have significant limitations, including a small number of image samples, limited disease conditions, insufficient annotations, and non-standardized image acquisitions. To address these shortcomings, we propose a novel framework called DermSynth3D. DermSynth3D blends skin disease patterns onto 3D textured meshes of human subjects using a differentiable renderer and generates 2D images from various camera viewpoints under chosen lighting conditions in diverse background scenes. Our method adheres to top-down rules that constrain the blending and rendering process to create 2D images with skin conditions that mimic in-the-wild acquisitions, ensuring more meaningful results. The framework generates photo-realistic 2D dermoscopy images and the corresponding dense annotations for semantic segmentation of the skin, skin conditions, body parts, bounding boxes around lesions, depth maps, and other 3D scene parameters, such as camera position and lighting conditions. DermSynth3D allows for the creation of custom datasets for various dermatology tasks. We demonstrate the effectiveness of data generated using DermSynth3D by training DL models on synthetic data and evaluating them on various dermatology tasks using real 2D dermatological images. We make our code publicly available at https://github.com/sfu-mial/DermSynth3D.

翻訳日:2024-03-21 23:07:03 公開日:2024-03-19

# Prodigy: 適応型パラメータフリー学習者

Prodigy: An Expeditiously Adaptive Parameter-Free Learner ( http://arxiv.org/abs/2306.06101v4 )

ライセンス: Link先を確認

Konstantin Mishchenko, Aaron Defazio,

(参考訳) 本稿では,AdaGradやAdamといった適応的な手法で学習率を推定する問題を考察する。本稿では,学習率を最適に設定するために必要となる,解からD$までの距離を確実に推定するアルゴリズムであるProdigyを提案する。 Prodigyの中核となるのは、D-Adaptation法を学習速度のない学習に適用することである。これは、D-適応の収束率を$O(\sqrt{\log(D/d_0)})$で改善する。我々は12の共通ロジスティック回帰ベンチマークデータセット、CIFAR10でのVGG11およびResNet-50トレーニング、ImagenetでのViTトレーニング、IWSLT14でのLSTMトレーニング、CriteoデータセットでのDLRMトレーニング、Knee MRIデータセットでのVarNet、BookWikiでのRoBERTaおよびGPTトランスフォーマートレーニングでProdigyをテストする。実験結果から,本手法はD-Adaptationより優れ,手書きAdamに近い精度で精度が向上することがわかった。

We consider the problem of estimating the learning rate in adaptive methods, such as AdaGrad and Adam. We propose Prodigy, an algorithm that provably estimates the distance to the solution $D$, which is needed to set the learning rate optimally. At its core, Prodigy is a modification of the D-Adaptation method for learning-rate-free learning. It improves upon the convergence rate of D-Adaptation by a factor of $O(\sqrt{\log(D/d_0)})$, where $d_0$ is the initial estimate of $D$. We test Prodigy on 12 common logistic-regression benchmark datasets, VGG11 and ResNet-50 training on CIFAR10, ViT training on Imagenet, LSTM training on IWSLT14, DLRM training on the Criteo dataset, VarNet on the Knee MRI dataset, as well as RoBERTa and GPT transformer training on BookWiki. Our experimental results show that our approach consistently outperforms D-Adaptation and reaches test accuracy values close to those of hand-tuned Adam.

翻訳日:2024-03-21 22:57:10 公開日:2024-03-19

# $GRU^{spa}$:時空間分散のための空間的注意を伴うGated Recurrent Unit

$GRU^{spa}$: Gated Recurrent Unit with Spatial Attention for Spatio-Temporal Disaggregation ( http://arxiv.org/abs/2306.07292v3 )

ライセンス: Link先を確認

Bin Han, Bill Howe,

(参考訳) オープンデータは、通常プライバシーポリシーに従うために、空間的に集約されることが多い。しかし、粗大で異質な集約は、下流のAI/MLシステムの学習と統合を複雑にします。本研究では,低分解能で不規則なパーティション(例:国勢調査トラクション)から高分解能で不規則なパーティション(例:都市ブロック)へ時空間データを分解するモデルを考察する。本稿では,空間的注意層をGRUモデルに統合したGated Recurrent Unit with Space Attention(GRU^{spa}$)を提案する。空間的注意層は領域間の空間的相互作用を捉え、ゲートリカレントモジュールは時間的依存関係をキャプチャする。さらに,地域レベルの違い(例えば,ある都市ブロックが所定の国勢調査区域に完全に含まれている場合)の包摂関係を利用して,空間的注意層を制約する。限られた歴史的訓練データが存在する状況に対しては,移動学習のシナリオについて検討し,ある都市変数に事前学習したモデルを,数百のサンプルのみを用いて,他の都市変数に対して微調整できることを示す。これらの手法を2つのモビリティデータセット上で評価した結果、$GRU^{spa}$は、他のニューラルネットワークモデルや典型的なヒューリスティック手法よりも大幅に改善され、下流モデルのトレーニングに有用な小さな領域における現実的なポイントデータを合成できることがわかった。

Open data is frequently released spatially aggregated, usually to comply with privacy policies. But coarse, heterogeneous aggregations complicate learning and integration for downstream AI/ML systems. In this work, we consider models to disaggregate spatio-temporal data from a low-resolution, irregular partition (e.g., census tract) to a high-resolution, irregular partition (e.g., city block). We propose a model, Gated Recurrent Unit with Spatial Attention ($GRU^{spa}$), where spatial attention layers are integrated into the original Gated Recurrent Unit (GRU) model. The spatial attention layers capture spatial interactions among regions, while the gated recurrent module captures the temporal dependencies. Additionally, we utilize containment relationships between different geographic levels (e.g., when a given city block is wholly contained in a given census tract) to constrain the spatial attention layers. For situations where limited historical training data is available, we study transfer learning scenarios and show that a model pre-trained on one city variable can be fine-tuned for another city variable using only a few hundred samples. Evaluating these techniques on two mobility datasets, we find that $GRU^{spa}$ provides a significant improvement over other neural models as well as typical heuristic methods, allowing us to synthesize realistic point data over small regions useful for training downstream models.
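One simple way a containment relationship could constrain a spatial attention layer is through an attention mask. The sketch below is only an assumption about how such a constraint might look, since the abstract does not describe the mechanism: fine regions (e.g., city blocks) may attend to each other only when they lie in the same coarse region (e.g., census tract). The function name, the `tract_of` assignment, and the random features are hypothetical.

```python
import numpy as np

def masked_attention(queries, keys, values, mask):
    """Scaled dot-product attention restricted to pairs where mask is True."""
    scores = queries @ keys.T / np.sqrt(keys.shape[1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values, weights

# Five hypothetical city blocks; blocks 0-1 lie in tract 0, blocks 2-4 in tract 1.
tract_of = np.array([0, 0, 1, 1, 1])
mask = tract_of[:, None] == tract_of[None, :]

features = np.random.default_rng(0).normal(size=(5, 4))
output, weights = masked_attention(features, features, features, mask)
print(np.round(weights, 3))
```

Every row of the mask must contain at least one `True` entry, which holds here because each block shares a tract with itself. The resulting weight matrix is block-diagonal, so information is only mixed among blocks of the same tract.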

翻訳日:2024-03-21 22:57:10 公開日:2024-03-19

# ロバスト・インストラクション・チューニングによる大規模マルチモーダルモデルにおける幻覚の緩和

Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning ( http://arxiv.org/abs/2306.14565v4 )

ライセンス: Link先を確認

Fuxiao Liu, Kevin Lin, Linjie Li, Jianfeng Wang, Yaser Yacoob, Lijuan Wang,

(参考訳) マルチモーダルタスクの有望な進歩にもかかわらず、現在の大規模マルチモーダルモデル(LMM)は、関連する画像や人間の指示に関して一貫性のない記述を幻覚させる傾向にある。本稿では,Large-scale Robust Visual (LRV)-Instructionという,大規模かつ多様な視覚的命令チューニングデータセットを導入することでこの問題に対処する。本データセットは, GPT4が生成した400kの視覚的命令からなり, 16の視覚・言語的タスクをオープンエンドの指示と回答でカバーする。主に正の命令サンプルに焦点を当てた既存の研究とは異なり、我々は、より堅牢な視覚的命令チューニングのために正と負の両方の命令を含むようにLRV-Instructionを設計する。負の命令は3つの意味レベルで設計されている。(i)存在しないオブジェクトの操作、(ii)存在するオブジェクトの操作、(iii)知識の操作。LMMが生み出す幻覚を効率的に測定するために,人間の専門家のように視覚的命令チューニングを安定的に評価する手法であるGAVIE(GPT4-Assisted Visual Instruction Evaluation)を提案する。 GAVIEは人間による注釈付きの正解を必要とせず、多様な命令形式に適応することができる。われわれはLMMの幻覚を調査するための総合的な実験を行った。その結果,既存のLMMは,負の命令,特に存在するオブジェクトの操作と知識の操作の命令を与えられると,顕著な幻覚を示すことがわかった。さらに, LRV-Instruction上でMiniGPT4とmPLUG-Owlを微調整することにより幻覚の緩和を実現し, 最先端の手法と比較していくつかの公開データセットの性能向上を実現した。さらに、トレーニングデータにおける正と負のインスタンスのバランスの取れた比率は、より堅牢なモデルにつながることを観察した。コードとデータは https://github.com/FuxiaoLiu/LRV-Instruction で公開されている。

Despite the promising progress in multi-modal tasks, current large multi-modal models (LMMs) are prone to hallucinating inconsistent descriptions with respect to the associated image and human instructions. This paper addresses this issue by introducing the first large and diverse visual instruction tuning dataset, named Large-scale Robust Visual (LRV)-Instruction. Our dataset comprises 400k visual instructions generated by GPT4, covering 16 vision-and-language tasks with open-ended instructions and answers. Unlike existing studies that primarily focus on positive instruction samples, we design LRV-Instruction to include both positive and negative instructions for more robust visual instruction tuning. Our negative instructions are designed at three semantic levels: (i) Nonexistent Object Manipulation, (ii) Existent Object Manipulation and (iii) Knowledge Manipulation. To efficiently measure the hallucination generated by LMMs, we propose GPT4-Assisted Visual Instruction Evaluation (GAVIE), a stable approach to evaluate visual instruction tuning like human experts. GAVIE does not require human-annotated groundtruth answers and can adapt to diverse instruction formats. We conduct comprehensive experiments to investigate the hallucination of LMMs. Our results demonstrate existing LMMs exhibit significant hallucinations when presented with our negative instructions, particularly Existent Object and Knowledge Manipulation instructions. Moreover, we successfully mitigate hallucination by finetuning MiniGPT4 and mPLUG-Owl on LRV-Instruction while improving performance on several public datasets compared to state-of-the-art methods. Additionally, we observed that a balanced ratio of positive and negative instances in the training data leads to a more robust model. Code and data are available at https://github.com/FuxiaoLiu/LRV-Instruction.

翻訳日:2024-03-21 22:57:10 公開日:2024-03-19

# Kadanoff-Baym方程式を用いたオープン量子システム

Open Quantum Systems with Kadanoff-Baym Equations ( http://arxiv.org/abs/2308.07659v3 )

ライセンス: Link先を確認

Tim Neidig, Jan Rais, Marcus Bleicher, Hendrik van Hees, Carsten Greiner,

(参考訳) 本研究では, 量子力学的フェルミオン粒子の時間的進化について検討した。この開量子系に対して、熱バス粒子との相互作用を弾性2-2散乱とすることで、系の粒子に対する非平衡カダノフ・ベイム方程式を定式化する。一粒子グリーンズ関数に対する空間的に不均一な積分微分方程式を数値的に解く。本研究では, 系粒子が熱浴と平衡して熱分解し, 密度行列の対角要素が1粒子エネルギー固有基底, デコヘアで表されることにより, 対角成分,すなわち占有数のみが生き残ることを示す。さらに、グリーン関数の時間発展は、様々な一粒子量子状態のスペクトル特性も決定する。

We study the temporal evolution of quantum mechanical fermionic particles exhibiting one bound state within a one-dimensional attractive square-well potential in a heat bath of bosonic particles. For this open quantum system we formulate the non-equilibrium Kadanoff-Baym equations for the system particles by taking the interactions to be elastic 2-2 scatterings with the heat-bath particles. The corresponding spatially inhomogeneous integro-differential equations for the one-particle Green's function are solved numerically. We demonstrate how the system particles equilibrate and thermalize with the heat bath and how the off-diagonal elements of the density matrix, expressed in the one-particle energy eigenbasis, decohere, so that only the diagonal entries, i.e. the occupation numbers, survive. In addition, the time evolution of the (retarded) Green's function also determines the spectral properties of the various one-particle quantum states.

翻訳日:2024-03-21 22:47:21 公開日:2024-03-19

# 視覚・言語モデルにおけるChain-of-Thought推論の測定と改善

Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models ( http://arxiv.org/abs/2309.04461v2 )

ライセンス: Link先を確認

Yangyi Chen, Karan Sikka, Michael Cogswell, Heng Ji, Ajay Divakaran,

(参考訳) 視覚言語モデル(VLM)は最近、視覚的内容に関する自然なクエリを解析し、人間のような出力を生成する視覚アシスタントとして、強力な効果を示した。本研究では,これらのモデルが知覚情報に基づいて人間ライクな推論を実証する能力について検討する。推論能力の完全整合性および基底性に関する重要な懸念に対処するため、これらのモデルの推論整合性も測定する。これを実現するために,チェーン・オブ・シント(CoT)に基づく一貫性尺度を提案する。しかし、そのような評価には高レベルの推論と詳細な推論チェーンの両方を含むベンチマークが必要である。 LLM-Human-in-the-Loopパイプラインを提案することで、この課題に対処する。このパイプラインと既存の粗粒化アノテートデータセットに基づいて、VLMのゼロショット推論性能と一貫性の両方を測定するためにCUREベンチマークを構築します。我々は、既存の最先端のVLMを評価し、最高の性能モデルでさえ、強い視覚的推論能力と一貫性を示すことができず、VLMが体系的かつ一貫して人間のように視覚的推論を実行できるようにするためには、かなりの努力が必要であることを示す。初期段階として,VLMの推論性能と一貫性の向上を目的とした2段階のトレーニングフレームワークを提案する。第1段階では、LLMが自動的に生成するステップバイステップ推論サンプルを使用して、VLMの教師付き微調整を行う。第2段階では,LLMによるフィードバックを取り入れて,高度に整合性のある推論連鎖を生成することにより,トレーニングプロセスをさらに強化する。パフォーマンスと一貫性の両方を推論する上で、私たちのフレームワークの有効性を実証的に強調します。

Vision-language models (VLMs) have recently demonstrated strong efficacy as visual assistants that can parse natural queries about the visual content and generate human-like outputs. In this work, we explore the ability of these models to demonstrate human-like reasoning based on the perceived information. To address a crucial concern regarding the extent to which their reasoning capabilities are fully consistent and grounded, we also measure the reasoning consistency of these models. We achieve this by proposing a chain-of-thought (CoT) based consistency measure. However, such an evaluation requires a benchmark that encompasses both high-level inference and detailed reasoning chains, which is costly. We tackle this challenge by proposing an LLM-Human-in-the-Loop pipeline, which notably reduces cost while simultaneously ensuring the generation of a high-quality dataset. Based on this pipeline and the existing coarse-grained annotated dataset, we build the CURE benchmark to measure both the zero-shot reasoning performance and consistency of VLMs. We evaluate existing state-of-the-art VLMs, and find that even the best-performing model is unable to demonstrate strong visual reasoning capabilities and consistency, indicating that substantial efforts are required to enable VLMs to perform visual reasoning as systematically and consistently as humans. As an early step, we propose a two-stage training framework aimed at improving both the reasoning performance and consistency of VLMs. The first stage involves employing supervised fine-tuning of VLMs using step-by-step reasoning samples automatically generated by LLMs. In the second stage, we further augment the training process by incorporating feedback provided by LLMs to produce reasoning chains that are highly consistent and grounded. We empirically highlight the effectiveness of our framework in both reasoning performance and consistency.

翻訳日:2024-03-21 22:37:29 公開日:2024-03-19

# Forman-Ricci曲率の拡張を用いた過平滑化と過スカッシングの緩和

Mitigating Over-Smoothing and Over-Squashing using Augmentations of Forman-Ricci Curvature ( http://arxiv.org/abs/2309.09384v3 )

ライセンス: Link先を確認

Lukas Fesser, Melanie Weber,

(参考訳) グラフニューラルネットワーク(GNN)は、ドメイン間のグラフ構造化データ学習に成功しているが、いくつかの潜在的な落とし穴が最近説明されている。これには、長距離接続(オーバー・スクワッシング)で符号化された情報を正確に活用できないことや、ネットワーク深度(オーバー・スムーシング)が増大する近くのノードの学習した表現を区別できないことが含まれる。両効果を特徴付ける効果的な方法は、離散曲率である: オーバー・スクアッシング効果を弱める長距離接続は、低い曲率を持つのに対し、オーバー・スムーシングに寄与するエッジは高い曲率を持つ。この観察により、オーバースムーシングとオーバースキャッシングを緩和するためにエッジを追加または除去する再配線技術がもたらされた。グラフの曲率やラプラシアンのスペクトルなどのグラフ特性を利用するいくつかの再配線手法が提案されている。しかし、既存の手法、特に曲率に基づく手法は、しばしば高価なサブルーチンと注意深いハイパーパラメータチューニングを必要とし、大規模なグラフに適用性を制限する。本稿では、線形時間で計算可能なスケーラブルな曲率表記法であるAFRC(Augmented Forman-Ricci curvature)に基づく書き換え手法を提案する。 AFRCはメッセージパッシングGNNにおける過剰なスムースと過剰なスキャッシング効果を効果的に特徴付ける。提案手法は,他の手法と比較して計算コストを大幅に削減しつつ,最先端の性能を実現することを示す実験により理論的結果を補完する。離散曲率の基本特性を生かして,高コストなハイパーパラメータ探索を回避し,提案手法のスケーラビリティを向上する,曲率ベースリワイアリングにおけるハイパーパラメータの効果的なヒューリスティックスを提案する。

While Graph Neural Networks (GNNs) have been successfully leveraged for learning on graph-structured data across domains, several potential pitfalls have been described recently. Those include the inability to accurately leverage information encoded in long-range connections (over-squashing), as well as difficulties distinguishing the learned representations of nearby nodes with growing network depth (over-smoothing). An effective way to characterize both effects is discrete curvature: Long-range connections that underlie over-squashing effects have low curvature, whereas edges that contribute to over-smoothing have high curvature. This observation has given rise to rewiring techniques, which add or remove edges to mitigate over-smoothing and over-squashing. Several rewiring approaches utilizing graph characteristics, such as curvature or the spectrum of the graph Laplacian, have been proposed. However, existing methods, especially those based on curvature, often require expensive subroutines and careful hyperparameter tuning, which limits their applicability to large-scale graphs. Here we propose a rewiring technique based on Augmented Forman-Ricci curvature (AFRC), a scalable curvature notion, which can be computed in linear time. We prove that AFRC effectively characterizes over-smoothing and over-squashing effects in message-passing GNNs. We complement our theoretical results with experiments, which demonstrate that the proposed approach achieves state-of-the-art performance while significantly reducing the computational cost in comparison with other methods. Utilizing fundamental properties of discrete curvature, we propose effective heuristics for hyperparameters in curvature-based rewiring, which avoids expensive hyperparameter searches, further improving the scalability of the proposed approach.
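
As a rough illustration of the kind of per-edge quantity the abstract refers to, the sketch below computes a triangle-augmented Forman curvature for an unweighted graph, using the commonly cited form 4 - deg(u) - deg(v) + 3 * (number of triangles through the edge). Whether this matches the exact AFRC variant used in the paper is an assumption, and the function name and toy graph are illustrative only.

```python
def augmented_forman_curvature(edges):
    """Return {edge: curvature} for an undirected, unweighted simple graph.

    Assumed form (not taken from the abstract):
    4 - deg(u) - deg(v) + 3 * triangles(u, v). No self-loops or duplicate edges.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    curvature = {}
    for u, v in edges:
        # Each common neighbor of u and v closes one triangle over the edge.
        triangles = len(adj[u] & adj[v])
        curvature[(u, v)] = 4 - len(adj[u]) - len(adj[v]) + 3 * triangles
    return curvature


# Toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
toy_curvature = augmented_forman_curvature(toy_edges)
```

On this toy graph the bridge edge receives the lowest value and the edges inside the triangles the highest, which is consistent with the abstract's description of low-curvature long-range connections and high-curvature densely connected edges.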

翻訳日:2024-03-21 22:37:29 公開日:2024-03-19

# BooookScore: LLM時代における書籍長要約の体系的研究

BooookScore: A systematic exploration of book-length summarization in the era of LLMs ( http://arxiv.org/abs/2310.00785v3 )

ライセンス: Link先を確認

Yapei Chang, Kyle Lo, Tanya Goyal, Mohit Iyyer,

(参考訳) 大規模言語モデル (LLM) のコンテキストウィンドウサイズを超える書籍の長さの文書 (>100Kトークン) を要約するには、まず入力文書を小さなチャンクに分割し、LLMにチャンクレベルの要約をマージ、更新、圧縮するよう促す必要がある。この課題の複雑さと重要性にもかかわらず、既存の書籍長要約データセット(例:BookSum)は、ほとんどの公共LCMの事前学習データであり、既存の評価手法は、現代のLCM要約器による誤りを捉えるのに苦労している。本稿では,(1)階層的にチャンクレベルの要約をマージし,(2)実行中の要約を漸進的に更新する。我々は、最近出版された100冊のGPT-4生成した要約に対して、1193個の微粒な人間のアノテーションを取得し、LLMによる8種類のコヒーレンスエラーを同定した。人間の評価は高価で時間を要するため,識別されたエラータイプを一切含まない要約文の比率を計測する自動尺度BooookScoreを開発する。 BooookScoreは、人間のアノテーションと高い合意を持っていて、他の多くの重要なパラメータ(例えば、チャンクサイズ、ベースLLM)の影響を体系的に評価できます。 GPT-4 や Claude 2 のようなクローズドソース LLM は,オープンソースモデルよりも BooookScore の高いサマリーを生成することがわかった。 LLaMA 2は他のモデルより遅れているが、MixtralはGPT-3.5-Turboと同等のパフォーマンスを達成している。増分更新によってBooookScoreは低下するが、階層的なマージよりも詳細度が高い。

Summarizing book-length documents (>100K tokens) that exceed the context window size of large language models (LLMs) requires first breaking the input document into smaller chunks and then prompting an LLM to merge, update, and compress chunk-level summaries. Despite the complexity and importance of this task, it has yet to be meaningfully studied due to the challenges of evaluation: existing book-length summarization datasets (e.g., BookSum) are in the pretraining data of most public LLMs, and existing evaluation methods struggle to capture errors made by modern LLM summarizers. In this paper, we present the first study of the coherence of LLM-based book-length summarizers implemented via two prompting workflows: (1) hierarchically merging chunk-level summaries, and (2) incrementally updating a running summary. We obtain 1193 fine-grained human annotations on GPT-4 generated summaries of 100 recently-published books and identify eight common types of coherence errors made by LLMs. Because human evaluation is expensive and time-consuming, we develop an automatic metric, BooookScore, that measures the proportion of sentences in a summary that do not contain any of the identified error types. BooookScore has high agreement with human annotations and allows us to systematically evaluate the impact of many other critical parameters (e.g., chunk size, base LLM) while saving $15K USD and 500 hours in human evaluation costs. We find that closed-source LLMs such as GPT-4 and Claude 2 produce summaries with higher BooookScore than those generated by open-source models. While LLaMA 2 falls behind other models, Mixtral achieves performance on par with GPT-3.5-Turbo. Incremental updating yields lower BooookScore but higher level of detail than hierarchical merging, a trade-off sometimes preferred by annotators.
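
The metric described above reduces to a simple ratio once each summary sentence has been annotated. The sketch below shows only that final step, assuming the per-sentence error labels are already available; the function name and the placeholder labels are hypothetical, and the annotation step itself is not reproduced here.

```python
def booookscore(sentence_errors):
    """Fraction of summary sentences that carry no identified error type.

    `sentence_errors` holds one collection of error labels per sentence;
    an empty collection means the sentence was judged free of errors.
    """
    if not sentence_errors:
        raise ValueError("summary must contain at least one sentence")
    clean = sum(1 for errors in sentence_errors if not errors)
    return clean / len(sentence_errors)


# Four sentences, one of which was flagged (placeholder label).
example_score = booookscore([[], ["type_1"], [], []])
```

A summary in which every sentence is flagged scores 0.0, and one with no flagged sentences scores 1.0.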

翻訳日:2024-03-21 22:37:29 公開日:2024-03-19

# FroSSL: 効率的なマルチビュー自己監視学習のためのFrobenius Norm最小化

FroSSL: Frobenius Norm Minimization for Efficient Multiview Self-Supervised Learning ( http://arxiv.org/abs/2310.02903v3 )

ライセンス: Link先を確認

Oscar Skean, Aayush Dhakal, Nathan Jacobs, Luis Gonzalo Sanchez Giraldo,

(参考訳) 自己教師付き学習(SSL)は、表現学習の一般的なパラダイムである。近年のマルチビュー手法は, サンプルコントラスト, 次元コントラスト, 非対称ネットワークベースに分類される。これらのファミリーは類似した品質の解に収束するが、いくつかの手法がエポック非効率であり、目標のパフォーマンスに到達するのに長い訓練を必要とすることを実証的に示すことができる。効率性を改善するための2つの主要なアプローチは、共分散固有値正則化と、より多くのビューの使用である。しかし、これらの2つのアプローチは固有値の計算の複雑さのために結合が難しい。固有分解を完全に回避しながら双方のアプローチを一致させる目的関数FroSSLを提案する。 FroSSLは、共分散フロベニウスノルムを最小化して崩壊を回避し、平均二乗誤差を最小化して拡張不変性を達成する。我々は,FroSSLが他のSSLメソッドよりも高速に競合精度に達することを示し,FroSSLが埋め込み共分散行列の固有値にどのように影響するかによって,この高速収束が理論的および実証的な支持を提供する。また、FroSSLは、STL-10、Tiny Imagenet、Imagenet-100など、複数のデータセット上でResNet18をトレーニングする際に、線形プローブ評価の競合表現を学習することを示した。

Self-supervised learning (SSL) is a popular paradigm for representation learning. Recent multiview methods can be classified as sample-contrastive, dimension-contrastive, or asymmetric network-based, with each family having its own approach to avoiding informational collapse. While these families converge to solutions of similar quality, it can be empirically shown that some methods are epoch-inefficient and require longer training to reach a target performance. Two main approaches to improving efficiency are covariance eigenvalue regularization and using more views. However, these two approaches are difficult to combine due to the computational complexity of computing eigenvalues. We present the objective function FroSSL which reconciles both approaches while avoiding eigendecomposition entirely. FroSSL works by minimizing covariance Frobenius norms to avoid collapse and minimizing mean-squared error to achieve augmentation invariance. We show that FroSSL reaches competitive accuracies more quickly than any other SSL method and provide theoretical and empirical support that this faster convergence is due to how FroSSL affects the eigenvalues of the embedding covariance matrices. We also show that FroSSL learns competitive representations on linear probe evaluation when used to train a ResNet18 on several datasets, including STL-10, Tiny Imagenet, and Imagenet-100.

翻訳日:2024-03-21 22:37:29 公開日:2024-03-19

# 漸近的にフリーなスケッチリッジアンサンブル:リスク、クロスバリデーション、チューニング

Asymptotically free sketched ridge ensembles: Risks, cross-validation, and tuning ( http://arxiv.org/abs/2310.04357v3 )

ライセンス: Link先を確認

Pratik Patil, Daniel LeJeune,

(参考訳) 我々は、スケッチされたリッジ回帰アンサンブルの予測リスクを推定するために、一般化されたクロスバリデーション(GCV)の整合性を確立するために、ランダム行列理論を用いて、正規化とスケッチパラメータの効率的で一貫したチューニングを可能にする。我々の結果は、非常に穏やかなデータ仮定の下で、漸近的に自由なスケッチの幅広いクラスを保っている。正方形の予測リスクに対して,無作為な等価な暗黙のリッジバイアスとスケッチに基づく分散を分解し,無限アンサンブルでスケッチサイズをチューニングするだけで,そのリスクを大域的に最適化できることを示す。一般の準4次予測リスク関数に対しては、GCVを拡張して一貫したリスク推定器を構築し、ワッサーシュタイン2計量におけるGCV補正予測の分布収束を得る。これは特に、トレーニングデータに漸近的に正しいカバレッジ条件で予測間隔を構築することができる。また,小型のスケッチ付きリッジ・アンサンブルを用いて,GCVを用いて非スケッチ・リッジ・レグレッションのリスクを効率的に推定できるアンサンブル・トリックを提案する。我々は、CountSketchやサブサンプリングされたランダム化された離散コサイン変換を含む実用的なスケッチを持つ合成データセットと実大規模データセットの両方を用いて、理論的結果を実証的に検証した。

We employ random matrix theory to establish consistency of generalized cross validation (GCV) for estimating prediction risks of sketched ridge regression ensembles, enabling efficient and consistent tuning of regularization and sketching parameters. Our results hold for a broad class of asymptotically free sketches under very mild data assumptions. For squared prediction risk, we provide a decomposition into an unsketched equivalent implicit ridge bias and a sketching-based variance, and prove that the risk can be globally optimized by only tuning sketch size in infinite ensembles. For general subquadratic prediction risk functionals, we extend GCV to construct consistent risk estimators, and thereby obtain distributional convergence of the GCV-corrected predictions in Wasserstein-2 metric. This in particular allows construction of prediction intervals with asymptotically correct coverage conditional on the training data. We also propose an "ensemble trick" whereby the risk for unsketched ridge regression can be efficiently estimated via GCV using small sketched ridge ensembles. We empirically validate our theoretical results using both synthetic and real large-scale datasets with practical sketches including CountSketch and subsampled randomized discrete cosine transforms.

翻訳日:2024-03-21 22:37:29 公開日:2024-03-19

# 脳ダイナミクスのマルチスケール因果バックボーンの抽出

Extracting the Multiscale Causal Backbone of Brain Dynamics ( http://arxiv.org/abs/2311.00118v2 )

ライセンス: Link先を確認

Gabriele D'Acunto, Francesco Bonchi, Gianmarco De Francisci Morales, Giovanni Petri,

(参考訳) 脳の接続に関する研究努力の大部分は、脳のダイナミックスを管理する因果機構に直接関係しない脳領域間の統計的関連を中心に展開している。本稿では,脳力学のマルチスケール因果バックボーン(MCB)を提案する。提案手法は,近年のマルチスケール因果構造学習の進歩を活用し,モデル適合と複雑性のトレードオフを最適化する。合成データに対する経験的評価は,標準機能接続ネットワークに基づくベースラインよりも,我々の方法論の方が優れていることを示している。安静時fMRIデータに適用すると,左脳半球と右脳半球の両方に細いMCBが認められる。マルチスケールの性質から,低周波帯では因果ダイナミクスは高次認知機能に関連する脳の領域によって駆動され,高周波では知覚処理に関連するノードが重要な役割を担っていることが示唆された。最後に,脳接続の因果指紋の存在を確認し,因果的観点からの脳接続指紋の広範な研究を支援した。

The bulk of the research effort on brain connectivity revolves around statistical associations among brain regions, which do not directly relate to the causal mechanisms governing brain dynamics. Here we propose the multiscale causal backbone (MCB) of brain dynamics, shared by a set of individuals across multiple temporal scales, and devise a principled methodology to extract it. Our approach leverages recent advances in multiscale causal structure learning and optimizes the trade-off between the model fit and its complexity. Empirical assessment on synthetic data shows the superiority of our methodology over a baseline based on canonical functional connectivity networks. When applied to resting-state fMRI data, we find sparse MCBs for both the left and right brain hemispheres. Thanks to its multiscale nature, our approach shows that at low-frequency bands, causal dynamics are driven by brain regions associated with high-level cognitive functions; at higher frequencies instead, nodes related to sensory processing play a crucial role. Finally, our analysis of individual multiscale causal structures confirms the existence of a causal fingerprint of brain connectivity, thus supporting the existing extensive research in brain connectivity fingerprinting from a causal perspective.

翻訳日:2024-03-21 22:27:37 公開日:2024-03-19

# 分散化フェデレーション学習ネットワークにおける対立ノード配置の影響

The Impact of Adversarial Node Placement in Decentralized Federated Learning Networks ( http://arxiv.org/abs/2311.07946v4 )

ライセンス: Link先を確認

Adam Piaseczny, Eric Ruzomberka, Rohit Parasnis, Christopher G. Brinton,

(参考訳) 連合学習(FL)の人気が高まるにつれ、新しい分散フレームワークが広まりつつある。これらのフレームワークは分散環境の利点を利用して、高速でエネルギー効率の良いデバイス間通信を可能にする。しかし、この人気の高まりは、堅牢なセキュリティ対策の必要性も高めている。既存の研究はFLセキュリティの様々な側面を探求してきたが、分散ネットワークにおける敵ノード配置の役割はほとんど解明されていない。本稿では,敵がネットワーク内の配置を協調的に調整できる場合の様々な敵配置戦略において,分散FLの性能を解析することにより,このギャップを解消する。敵ノードを配置する基本戦略として、ランダムな配置とネットワーク中心性に基づく配置の2つを確立する。これを基に, 敵同士の平均ネットワーク距離を最大化することにより, 敵の中心性よりも敵の分散を優先する新たな攻撃アルゴリズムを提案する。新たな攻撃アルゴリズムは、テスト精度などの重要なパフォーマンス指標に大きく影響し、考慮した設定において、ベースラインフレームワークを9%から66.5%上回る結果となった。我々の研究は、分散FLシステムの脆弱性に関する貴重な知見を提供し、よりセキュアで堅牢な分散FLフレームワークを開発することを目的とした将来の研究の土台となる。

As Federated Learning (FL) grows in popularity, new decentralized frameworks are becoming widespread. These frameworks leverage the benefits of decentralized environments to enable fast and energy-efficient inter-device communication. However, this growing popularity also intensifies the need for robust security measures. While existing research has explored various aspects of FL security, the role of adversarial node placement in decentralized networks remains largely unexplored. This paper addresses this gap by analyzing the performance of decentralized FL for various adversarial placement strategies when adversaries can jointly coordinate their placement within a network. We establish two baseline strategies for placing adversarial nodes: random placement and network centrality-based placement. Building on this foundation, we propose a novel attack algorithm that prioritizes adversarial spread over adversarial centrality by maximizing the average network distance between adversaries. We show that the new attack algorithm significantly impacts key performance metrics such as testing accuracy, outperforming the baseline frameworks by between $9\%$ and $66.5\%$ for the considered setups. Our findings provide valuable insights into the vulnerabilities of decentralized FL systems, setting the stage for future research aimed at developing more secure and robust decentralized FL frameworks.
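
To make the idea of "maximizing the average network distance between adversaries" concrete, the sketch below uses a simple greedy heuristic on an unweighted graph: starting from one node, it repeatedly adds the node with the largest total hop distance to the nodes already chosen. This is an illustrative stand-in, not the paper's attack algorithm; the function names, the choice of starting node, and the toy graph are all assumptions.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def average_distance(adj, nodes):
    """Average shortest-path distance over all unordered pairs (needs >= 2 nodes)."""
    pairs = [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]
    return sum(bfs_distances(adj, a)[b] for a, b in pairs) / len(pairs)


def spread_placement(adj, k, first):
    """Greedily pick k nodes, each maximizing total distance to those already picked.

    Hypothetical heuristic for illustration; assumes a connected graph.
    """
    if not 1 <= k <= len(adj):
        raise ValueError("k must be between 1 and the number of nodes")
    chosen = [first]
    dist = {first: bfs_distances(adj, first)}
    while len(chosen) < k:
        candidates = [n for n in adj if n not in dist]
        best = max(candidates, key=lambda n: sum(dist[c][n] for c in chosen))
        chosen.append(best)
        dist[best] = bfs_distances(adj, best)
    return chosen


# Toy path graph 0-1-2-3-4 (illustrative data only).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
adversaries = spread_placement(path, k=2, first=0)
```

On the path graph the second adversary lands at the opposite end from the first. A centrality-based baseline would instead favor the middle node, which is the contrast the abstract draws between spread and centrality.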

翻訳日:2024-03-21 22:17:48 公開日:2024-03-19

# Span-based Optimal Sample Complexity for Average Reward MDPs

Span-Based Optimal Sample Complexity for Average Reward MDPs ( http://arxiv.org/abs/2311.13469v2 )

ライセンス: Link先を確認

Matthew Zurek, Yudong Chen,

(参考訳) 平均回帰マルコフ決定過程(MDP)において,$\varepsilon$-optimal Policyを生成モデルで学習する際のサンプル複雑性について検討した。我々は、$\widetilde{O}\left(SA\frac{H}{\varepsilon^2} \right)$, ここで、$H$は最適ポリシーのバイアス関数のスパンであり、$SA$は状態-作用空間の濃度である。我々の結果は、すべてのパラメータにおいて(ログファクタまで)最小値の最大値である$S,A,H$および$\varepsilon$で、すべてのポリシーに対して一様に有界な混合時間を仮定する既存の作業を改善するか、パラメータに最適に依存するかのいずれかである。本結果は, 平均再帰型MDPを, 割引型MDPに還元することに基づく。この削減の最適性を確立するために、$\widetilde{O}\left(SA\frac{H}{(1-\gamma)^2\varepsilon^2} \right)$サンプルが$\varepsilon$-optimal policyを学習するのに十分であることを示す$\gamma$-discounted MDPsに対する改善されたバウンダリを開発し、$\widetilde{\Omega}\left(SA\frac{1}{(1-\gamma)^3\varepsilon^2} \right)のよく知られた下限を回避した。分析では,スパンパラメータの観点から,特定のインスタンス依存分散パラメータの上限を求める。これらの境界は、MDPの混合時間や直径に基づくものよりも厳密であり、より広い用途がある可能性がある。

We study the sample complexity of learning an $\varepsilon$-optimal policy in an average-reward Markov decision process (MDP) under a generative model. We establish the complexity bound $\widetilde{O}\left(SA\frac{H}{\varepsilon^2} \right)$, where $H$ is the span of the bias function of the optimal policy and $SA$ is the cardinality of the state-action space. Our result is the first that is minimax optimal (up to log factors) in all parameters $S,A,H$ and $\varepsilon$, improving on existing work that either assumes uniformly bounded mixing times for all policies or has suboptimal dependence on the parameters. Our result is based on reducing the average-reward MDP to a discounted MDP. To establish the optimality of this reduction, we develop improved bounds for $\gamma$-discounted MDPs, showing that $\widetilde{O}\left(SA\frac{H}{(1-\gamma)^2\varepsilon^2} \right)$ samples suffice to learn a $\varepsilon$-optimal policy in weakly communicating MDPs under the regime that $\gamma \geq 1 - \frac{1}{H}$, circumventing the well-known lower bound of $\widetilde{\Omega}\left(SA\frac{1}{(1-\gamma)^3\varepsilon^2} \right)$ for general $\gamma$-discounted MDPs. Our analysis develops upper bounds on certain instance-dependent variance parameters in terms of the span parameter. These bounds are tighter than those based on the mixing time or diameter of the MDP and may be of broader use.
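
The two bounds quoted above can be checked against each other with a short calculation. The parameter choice below is an assumption made for illustration (an effective horizon of $H/\varepsilon$ and a discounted target accuracy $\varepsilon_\gamma = H$); the abstract does not state which choice the reduction uses.

```latex
% Assumed choice (not stated in the abstract):
%   1/(1-\gamma) = H/\varepsilon, \qquad \varepsilon_\gamma = H .
% For \varepsilon \le 1 this satisfies the regime \gamma \ge 1 - 1/H.
\[
SA\,\frac{H}{(1-\gamma)^{2}\,\varepsilon_\gamma^{2}}
  \;=\; SA \cdot H \cdot \frac{H^{2}}{\varepsilon^{2}} \cdot \frac{1}{H^{2}}
  \;=\; SA\,\frac{H}{\varepsilon^{2}} .
\]
% Under the same choice, the general discounted rate would instead give
\[
SA\,\frac{1}{(1-\gamma)^{3}\,\varepsilon_\gamma^{2}}
  \;=\; SA \cdot \frac{H^{3}}{\varepsilon^{3}} \cdot \frac{1}{H^{2}}
  \;=\; SA\,\frac{H}{\varepsilon^{3}} .
\]
```

Under this assumed choice the improved discounted bound reproduces the $SA\,H/\varepsilon^{2}$ rate, while the general $(1-\gamma)^{-3}$ rate would cost an extra factor of $1/\varepsilon$, which suggests why circumventing that lower bound matters for the reduction.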

翻訳日:2024-03-21 22:17:48 公開日:2024-03-19

# DREAM:拡散整流と推定適応モデル

DREAM: Diffusion Rectification and Estimation-Adaptive Models ( http://arxiv.org/abs/2312.00210v2 )

ライセンス: Link先を確認

Jinxin Zhou, Tianyu Ding, Tianyi Chen, Jiachen Jiang, Ilya Zharkov, Zhihui Zhu, Luming Liang,

(参考訳) 本稿では,DREAM(Diffusion Rectification and Estimation Adaptive Models)という新しいトレーニングフレームワークを提案する。最小限のコード変更(たった3行)しか必要としないが,拡散モデルにおけるトレーニングとサンプリングの整合性を大幅に向上させる。 DREAMには2つのコンポーネントがある。サンプリングプロセスを反映するようにトレーニングを調整する拡散補正と、知覚と歪みのバランスをとる推定適応である。画像超解像(SR)に適用すると、DREAMは歪みの最小化と高画質の保存とのトレードオフを巧みに調整する。実験では、標準的な拡散ベースのSR法に対するDREAMの優位性が示され、トレーニングの収束が2〜3倍速く、同等の結果を得るためのサンプリングステップ数が10〜20分の1に削減された。 DREAMが拡散モデルのトレーニングパラダイムの再考を促すことを願っている。

We present DREAM, a novel training framework representing Diffusion Rectification and Estimation Adaptive Models, requiring minimal code changes (just three lines) yet significantly enhancing the alignment of training with sampling in diffusion models. DREAM features two components: diffusion rectification, which adjusts training to reflect the sampling process, and estimation adaptation, which balances perception against distortion. When applied to image super-resolution (SR), DREAM adeptly navigates the tradeoff between minimizing distortion and preserving high image quality. Experiments demonstrate DREAM's superiority over standard diffusion-based SR methods, showing a $2$ to $3\times $ faster training convergence and a $10$ to $20\times$ reduction in sampling steps to achieve comparable results. We hope DREAM will inspire a rethinking of diffusion model training paradigms.

翻訳日:2024-03-21 22:08:02 公開日:2024-03-19

# メタキャリブレータを用いたNeRFの瞬時不確かさ校正

Instant Uncertainty Calibration of NeRFs Using a Meta-calibrator ( http://arxiv.org/abs/2312.02350v2 )

ライセンス: Link先を確認

Niki Amini-Naieni, Tomas Jakab, Andrea Vedaldi, Ronald Clark,

(参考訳) ニューラルレージアンス場(NeRF)は、新しいビュー合成を著しく改善しているが、画像予測における正確な不確かさの定量化は未解決の問題である。最先端の密度認識型NeRFアンサンブル(DANE)[29]を含む不確実性を推定するための一般的な手法は、キャリブレーションなしで不確実性を定量化する。これはしばしば画像予測における過度または過度な信頼につながり、実際の応用を損なう可能性がある。本稿では,NeRFの校正不確かさを初めて達成する手法を提案する。これを実現するために,既存のキャリブレーション手法をNeRFに適用する上で重要な課題を克服した。この問題は、スパースビューの設定では特に問題があり、3つのイメージで操作できます。そこで本研究では,単一前方通過によるNeRFの不確実な校正を行うメタキャリブレータの概念を提案する。我々のメタキャリブレータはニューラルネットワークであり、NeRF画像と未校正不確実性マップを入力として、NeRFの未校正不確かさを補正するシーン固有の校正曲線を出力する。メタキャリブレータは,未確認シーンを一般化し,NeRFの良好な校正および最先端の不確実性を達成し,DANEや他のアプローチを著しく上回ることを示す。これにより、次世代の視点計画や、診断のための信頼性の高い画像再構成など、正確なNeRF不確実性推定に依存するアプリケーションを改善する機会が開ける。

Although Neural Radiance Fields (NeRFs) have markedly improved novel view synthesis, accurate uncertainty quantification in their image predictions remains an open problem. The prevailing methods for estimating uncertainty, including the state-of-the-art Density-aware NeRF Ensembles (DANE) [29], quantify uncertainty without calibration. This frequently leads to over- or under-confidence in image predictions, which can undermine their real-world applications. In this paper, we propose a method which, for the first time, achieves calibrated uncertainties for NeRFs. To accomplish this, we overcome a significant challenge in adapting existing calibration techniques to NeRFs: a need to hold out ground truth images from the target scene, reducing the number of images left to train the NeRF. This issue is particularly problematic in sparse-view settings, where we can operate with as few as three images. To address this, we introduce the concept of a meta-calibrator that performs uncertainty calibration for NeRFs with a single forward pass without the need for holding out any images from the target scene. Our meta-calibrator is a neural network that takes as input the NeRF images and uncalibrated uncertainty maps and outputs a scene-specific calibration curve that corrects the NeRF's uncalibrated uncertainties. We show that the meta-calibrator can generalize on unseen scenes and achieves well-calibrated and state-of-the-art uncertainty for NeRFs, significantly beating DANE and other approaches. This opens opportunities to improve applications that rely on accurate NeRF uncertainty estimates such as next-best view planning and potentially more trustworthy image reconstruction for medical diagnosis.

翻訳日:2024-03-21 22:08:02 公開日:2024-03-19

# 映像行動検出のための半教師付き能動学習

Semi-supervised Active Learning for Video Action Detection ( http://arxiv.org/abs/2312.07169v2 )

ライセンス: Link先を確認

Ayush Singh, Aayush J Rana, Akash Kumar, Shruti Vyas, Yogesh Singh Rawat,

(参考訳) 本研究では,映像行動検出のためのラベル効率の良い学習に焦点をあてる。本研究では,ラベル付きデータとラベルなしデータの両方を,行動検出のための情報的サンプル選択と併用する,新しい半教師付きアクティブラーニング手法を開発した。ビデオ行動検出には時空間的局所化と分類が必要であるため、アクティブラーニングの情報的サンプル選択と半教師付き学習の擬似ラベル生成の両方にいくつかの課題が生じる。まず,映像行動検出のための情報的サンプルを効果的に選択するシンプルな拡張戦略であるNoiseAugを提案する。次に、ビデオ内の関連活動領域を強調することで、ビデオ行動検出におけるSSLの擬似ラベルの有効活用を可能にする、ハイパスフィルタリングに基づく新しい技術であるfft-attentionを提案する。提案手法を,UCF-101-24,JHMDB-21,Youtube-VOSの3種類のベンチマークデータセットで評価した。まず,提案手法は,UCF101-24とJHMDB-21の両方において,半教師付き・弱教師付き学習の先行研究およびいくつかのベースラインアプローチを上回り,ビデオ行動検出に有効であることを示す。次に、ビデオオブジェクトセグメンテーションのYoutube-VOSにおいても有効性を示し、ビデオ内の他の密な予測タスクに対する一般化能力を示す。

In this work, we focus on label-efficient learning for video action detection. We develop a novel semi-supervised active learning approach which utilizes both labeled and unlabeled data along with informative sample selection for action detection. Video action detection requires spatio-temporal localization along with classification, which poses several challenges for both active learning informative sample selection as well as semi-supervised learning pseudo label generation. First, we propose NoiseAug, a simple augmentation strategy which effectively selects informative samples for video action detection. Next, we propose fft-attention, a novel technique based on high-pass filtering which enables effective utilization of pseudo labels for SSL in video action detection by emphasizing relevant activity regions within a video. We evaluate the proposed approach on three different benchmark datasets, UCF-101-24, JHMDB-21, and Youtube-VOS. First, we demonstrate its effectiveness on video action detection where the proposed approach outperforms prior works in semi-supervised and weakly-supervised learning along with several baseline approaches in both UCF101-24 and JHMDB-21. Next, we also show its effectiveness on Youtube-VOS for video object segmentation demonstrating its generalization capability for other dense prediction tasks in videos.

翻訳日:2024-03-21 21:58:15 公開日:2024-03-19

# バレンプラトーの証明可能な欠如は、古典的なシミュラビリティを暗示しているか?それとも、なぜ変分量子コンピューティングを再考する必要があるのか?

Does provable absence of barren plateaus imply classical simulability? Or, why we need to rethink variational quantum computing ( http://arxiv.org/abs/2312.09121v2 )

ライセンス: Link先を確認

M. Cerezo, Martin Larocca, Diego García-Martín, N. L. Diaz, Paolo Braccia, Enrico Fontana, Manuel S. Rudolph, Pablo Bermejo, Aroosa Ijaz, Supanut Thanasilp, Eric R. Anschuetz, Zoë Holmes,

(参考訳) 近年,バレン高原現象の解明に多大な努力が払われている。このパースペクティブな記事では、部屋の中でますます大きな象に直面し、多くの人にほのめかされたが明示されていない質問に答える: 不毛の台地を避けることができる構造も活用して、古典的な損失を効率的にシミュレートできるだろうか? 我々は、初期データ取得フェーズにおいて量子デバイスから古典的なデータを収集できることを前提として、バレン高原の証明可能なモデルが古典的にシミュレート可能であることを示す強力な証拠を示す。これは、不毛の高原が次元性の呪いによって生じること、そしてそれらを解決するための現在のアプローチが、問題をいくつかの小さく古典的にシミュレート可能な部分空間にエンコードする、という観察から導かれる。したがって、データ収集には量子コンピュータの強調が不可欠であるが、我々の分析は、不毛高原の景観に対するパラメタライズド量子回路の情報処理能力の非古典性に深刻な疑問を呈している。議論の要点、スマートイニシャライゼーションの役割、そしてパラメタライズド量子回路の実行による証明可能な超ポリノミカル(あるいは単に実用的)の利点の可能性について論じる。

A large amount of effort has recently been put into understanding the barren plateau phenomenon. In this perspective article, we face the increasingly loud elephant in the room and ask a question that has been hinted at by many but not explicitly addressed: Can the structure that allows one to avoid barren plateaus also be leveraged to efficiently simulate the loss classically? We present strong evidence that commonly used models with provable absence of barren plateaus are also classically simulable, provided that one can collect some classical data from quantum devices during an initial data acquisition phase. This follows from the observation that barren plateaus result from a curse of dimensionality, and that current approaches for solving them end up encoding the problem into some small, classically simulable, subspaces. Thus, while stressing that quantum computers can be essential for collecting data, our analysis sheds serious doubt on the non-classicality of the information processing capabilities of parametrized quantum circuits for barren plateau-free landscapes. We end by discussing caveats in our arguments, the role of smart initializations and the possibility of provably superpolynomial, or simply practical, advantages from running parametrized quantum circuits.

翻訳日:2024-03-21 21:58:15 公開日:2024-03-19

# DiffPortrait3D:ゼロショットポートレートビュー合成のための制御可能な拡散

DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis ( http://arxiv.org/abs/2312.13016v4 )

ライセンス: Link先を確認

Yuming Gu, You Xie, Hongyi Xu, Guoxian Song, Yichun Shi, Di Chang, Jing Yang, Linjie Luo,

(参考訳) 本稿では,DiffPortrait3Dという条件付き拡散モデルについて述べる。具体的には、単一のRGB入力を前提として、アイデンティティと表情の両方を保持した新しいカメラビューから、可塑性だが一貫した顔の詳細を合成することを目的としている。我々のゼロショット法は、時間を要する最適化と微調整の代わりに、カメラビューのない任意の顔像、極端な表情、多様な芸術的描写を一般化する。その中心となるのは、大規模画像データセットで事前訓練された2次元拡散モデルの生成前をレンダリングバックボーンとして利用し、一方、デノナイジングは、外観とカメラのポーズの無拘束制御でガイドされる。そこで我々はまず,凍結したユニセットの自己注意層に参照画像から外観コンテキストを注入する。そして、レンダリングビューを、同じビューから横断被写体の条件画像を見て、カメラポーズを解釈する新しい条件制御モジュールで操作する。さらに、トレーニング可能なクロスビューアテンションモジュールを挿入してビューの一貫性を高め、推論中に新しい3D対応ノイズ生成プロセスによりさらに強化する。我々は、我々の挑戦的インザワイルドとマルチビューのベンチマークにおいて、質的にも定量的にも、最先端の結果を実証する。

We present DiffPortrait3D, a conditional diffusion model that is capable of synthesizing 3D-consistent photo-realistic novel views from as few as a single in-the-wild portrait. Specifically, given a single RGB input, we aim to synthesize plausible but consistent facial details rendered from novel camera views while retaining both identity and facial expression. In lieu of time-consuming optimization and fine-tuning, our zero-shot method generalizes well to arbitrary face portraits with unposed camera views, extreme facial expressions, and diverse artistic depictions. At its core, we leverage the generative prior of 2D diffusion models pre-trained on large-scale image datasets as our rendering backbone, while the denoising is guided with disentangled attentive control of appearance and camera pose. To achieve this, we first inject the appearance context from the reference image into the self-attention layers of the frozen UNets. The rendering view is then manipulated with a novel conditional control module that interprets the camera pose by watching a condition image of a crossed subject from the same view. Furthermore, we insert a trainable cross-view attention module to enhance view consistency, which is further strengthened with a novel 3D-aware noise generation process during inference. We demonstrate state-of-the-art results both qualitatively and quantitatively on our challenging in-the-wild and multi-view benchmarks.

翻訳日:2024-03-21 21:58:15 公開日:2024-03-19

# 不均一グラフ上の分布外一般化のためのFew-Shot Causal Representation Learning

Few-Shot Causal Representation Learning for Out-of-Distribution Generalization on Heterogeneous Graphs ( http://arxiv.org/abs/2401.03597v2 )

ライセンス: Link先を確認

Pengfei Ding, Yan Wang, Guanfeng Liu, Nan Wang,

(参考訳) Heterogeneous graph few-shot learning(HGFL)は、さまざまな種類のノードとエッジから構成される異種グラフ(HG)におけるラベルのスパース性の問題に対処するために開発された。HGFLの中核的な考え方は、ソースHGの豊富にラベル付けされたクラスから知識を抽出し、この知識をターゲットHGに転移して少数のラベル付きトレーニングデータで新しいクラスを学習し、最終的にラベルなしのテストデータに対して予測を行うことである。既存の手法は通常、ソースHG、トレーニングデータ、テストデータがすべて同じ分布を共有していると仮定する。しかし実際には、(1) ターゲットHGの分布と一致するソースHGが限られていること、(2) ターゲットHGのデータ生成機構が予測不能であること、という2つの理由により、これら3種類のデータ間の分布シフトは避けられない。このような分布シフトは、既存の手法において非効率な知識転移と学習性能の低下を招き、HGFLにおける分布外(OOD)一般化という新たな問題につながる。この困難な問題に対処するため、Causal OOD Heterogeneous graph Few-shot learningモデル、すなわちCOHFを提案する。COHFでは、まず構造的因果モデルを用いてHGの分布シフトを特徴付け、HGFLにおけるOOD一般化のための不変性原理を確立する。次に、この不変性原理に従い、分布シフトの影響を軽減するために、変分オートエンコーダに基づく新しい異種グラフニューラルネットワークを提案する。最後に、このネットワークを新しいメタ学習フレームワークと統合することにより、COHFは知識をターゲットHGに効果的に転移し、少数のラベル付きデータで新しいクラスを予測する。7つの実世界データセットでの広範な実験により、最先端の手法に対するCOHFの優位性が示された。

Heterogeneous graph few-shot learning (HGFL) has been developed to address the label sparsity issue in heterogeneous graphs (HGs), which consist of various types of nodes and edges. The core concept of HGFL is to extract knowledge from rich-labeled classes in a source HG, transfer this knowledge to a target HG to facilitate learning new classes with few-labeled training data, and finally make predictions on unlabeled testing data. Existing methods typically assume that the source HG, training data, and testing data all share the same distribution. However, in practice, distribution shifts among these three types of data are inevitable due to two reasons: (1) the limited availability of the source HG that matches the target HG distribution, and (2) the unpredictable data generation mechanism of the target HG. Such distribution shifts result in ineffective knowledge transfer and poor learning performance in existing methods, thereby leading to a novel problem of out-of-distribution (OOD) generalization in HGFL. To address this challenging problem, we propose a novel Causal OOD Heterogeneous graph Few-shot learning model, namely COHF. In COHF, we first characterize distribution shifts in HGs with a structural causal model, establishing an invariance principle for OOD generalization in HGFL. Then, following this invariance principle, we propose a new variational autoencoder-based heterogeneous graph neural network to mitigate the impact of distribution shifts. Finally, by integrating this network with a novel meta-learning framework, COHF effectively transfers knowledge to the target HG to predict new classes with few-labeled data. Extensive experiments on seven real-world datasets have demonstrated the superior performance of COHF over the state-of-the-art methods.

翻訳日:2024-03-21 21:48:20 公開日:2024-03-19

# k-supportノルムによる反復正則化:スパースリカバリにおける重要な補完

Iterative Regularization with k-support Norm: An Important Complement to Sparse Recovery ( http://arxiv.org/abs/2401.05394v4 )

ライセンス: Link先を確認

William de Vazelhes, Bhaskar Mukhoty, Xiao-Tong Yuan, Bin Gu,

(参考訳) スパースリカバリは機械学習と信号処理においてユビキタスである。スパースリカバリはNP困難であるため、既存の手法は制限的な(あるいは未知の)適用条件か、高い計算コストのいずれかに悩まされることが知られている。近年、反復正則化法は、従来手法で用いられる面倒なグリッド探索の代わりに、早期停止によって1回のパスでスパースリカバリを達成できるため、有望な高速手法として登場している。しかし、これらの反復法のほとんどは$\ell_1$ノルムに基づいており、制限的な適用条件を必要とし、多くの場合に失敗しうる。そのため、より広い条件下で反復正則化法によりスパースリカバリを実現することは、まだ十分に研究されていない。この問題に対処するために、$\ell_1$ノルムではなく$k$-supportノルム正則化項に基づく新しい反復正則化アルゴリズムIRKSNを提案する。IRKSNによるスパースリカバリの条件を示し、$\ell_1$ノルム正則化項による従来のリカバリ条件と比較する。さらに、IRKSNのモデル誤差に対する早期停止の上界を明示的な定数とともに与え、スパースリカバリの標準的な線形レートを達成する。最後に、相関のある計画行列を用いたサポート復元実験を含むいくつかの実験で、アルゴリズムの適用可能性を示す。

Sparse recovery is ubiquitous in machine learning and signal processing. Due to the NP-hard nature of sparse recovery, existing methods are known to suffer either from restrictive (or even unknown) applicability conditions, or high computational cost. Recently, iterative regularization methods have emerged as a promising fast approach because they can achieve sparse recovery in one pass through early stopping, rather than the tedious grid-search used in the traditional methods. However, most of those iterative methods are based on the $\ell_1$ norm which requires restrictive applicability conditions and could fail in many cases. Therefore, achieving sparse recovery with iterative regularization methods under a wider range of conditions has yet to be further explored. To address this issue, we propose a novel iterative regularization algorithm, IRKSN, based on the $k$-support norm regularizer rather than the $\ell_1$ norm. We provide conditions for sparse recovery with IRKSN, and compare them with traditional conditions for recovery with $\ell_1$ norm regularizers. Additionally, we give an early stopping bound on the model error of IRKSN with explicit constants, achieving the standard linear rate for sparse recovery. Finally, we illustrate the applicability of our algorithm on several experiments, including a support recovery experiment with a correlated design matrix.

翻訳日:2024-03-21 21:48:20 公開日:2024-03-19
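
The k-support norm used by IRKSN in the entry above interpolates between the $\ell_1$ norm (k = 1) and the $\ell_2$ norm (k = d). The following is a minimal sketch, assuming the closed-form expression from the original k-support norm paper by Argyriou, Foygel, and Srebro (2012), which the abstract does not restate. The function name `k_support_norm` is illustrative, and the sketch covers only the regularizer, not the IRKSN iteration itself.

```python
import math

def k_support_norm(w, k):
    """k-support norm of w via the closed form (assumed from Argyriou et al., 2012)."""
    d = len(w)
    if not 1 <= k <= d:
        raise ValueError("k must satisfy 1 <= k <= len(w)")
    # Absolute values sorted in decreasing order: a[0] >= a[1] >= ...
    a = sorted((abs(x) for x in w), reverse=True)
    for r in range(k):
        # In 1-indexed notation: head = entries 1..k-r-1, tail = entries k-r..d.
        tail_sum = sum(a[k - r - 1:])
        threshold = tail_sum / (r + 1)
        # |w|_{k-r-1}, with the convention |w|_0 = +infinity.
        prev = a[k - r - 2] if k - r - 2 >= 0 else math.inf
        if prev > threshold >= a[k - r - 1]:
            head_sq = sum(x * x for x in a[:k - r - 1])
            return math.sqrt(head_sq + tail_sum ** 2 / (r + 1))
    # Exactly one r should satisfy the condition in exact arithmetic;
    # floating-point ties could in principle land here.
    raise RuntimeError("no valid split index r found")

print(k_support_norm([3.0, -4.0], 1))  # k = 1 reduces to the l1 norm
print(k_support_norm([3.0, -4.0], 2))  # k = d reduces to the l2 norm
```

For intermediate k the norm penalizes the largest entries like $\ell_2$ and the remaining tail like $\ell_1$, which is one way to read the abstract's claim that it applies under conditions different from those of the $\ell_1$ norm.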

# Health-LLM:Personalized Retrieval-Augmented Disease Prediction System

Health-LLM: Personalized Retrieval-Augmented Disease Prediction System ( http://arxiv.org/abs/2402.00746v6 )

ライセンス: Link先を確認

Mingyu Jin, Qinkai Yu, Dong Shu, Chong Zhang, Lizhou Fan, Wenyue Hua, Suiyuan Zhu, Yanda Meng, Zhenting Wang, Mengnan Du, Yongfeng Zhang,

(参考訳) 人工知能(AI)、特に大規模言語モデル(LLM)の最近の進歩は、医療応用を大きく前進させ、インテリジェントな医療の可能性を示している。しかし、膨大なデータ量や症状の特徴付け基準の不統一といった顕著な課題があり、医療AIシステムを個々の患者のニーズに完全に統合することを妨げている。専門的かつパーソナライズされた医療を促進するために、大規模特徴抽出と医療知識のトレードオフスコアリングを組み合わせた革新的なフレームワークHealth-LLMを提案する。従来の健康管理アプリケーションと比較して、本システムには3つの利点がある。(1) 健康レポートと医療知識を大規模モデルに統合し、疾患予測のために関連する質問を大規模言語モデルに投げかける。(2) 検索拡張生成(RAG)機構を活用して特徴抽出を強化する。(3) 特徴の統合と削除が可能な半自動の特徴更新フレームワークを組み込み、疾患予測の精度を向上させる。Health-LLMシステムの有効性を評価するために、多数の健康レポートを用いて実験を行った。その結果、提案システムは既存のシステムを上回り、疾患予測とパーソナライズされた健康管理を大きく前進させる可能性があることが示された。コードはhttps://github.com/jmyissb/HealthLLMで入手できる。

Recent advancements in artificial intelligence (AI), especially large language models (LLMs), have significantly advanced healthcare applications and demonstrated potential in intelligent medical treatment. However, there are conspicuous challenges such as vast data volumes and inconsistent symptom characterization standards, preventing full integration of healthcare AI systems with individual patients' needs. To promote professional and personalized healthcare, we propose an innovative framework, Health-LLM, which combines large-scale feature extraction and medical knowledge trade-off scoring. Compared to traditional health management applications, our system has three main advantages: (1) It integrates health reports and medical knowledge into a large model to ask relevant questions to a large language model for disease prediction; (2) It leverages a retrieval augmented generation (RAG) mechanism to enhance feature extraction; (3) It incorporates a semi-automated feature updating framework that can merge and delete features to improve the accuracy of disease prediction. We experiment on a large number of health reports to assess the effectiveness of the Health-LLM system. The results indicate that the proposed system surpasses the existing ones and has the potential to significantly advance disease prediction and personalized health management. The code is available at https://github.com/jmyissb/HealthLLM.

翻訳日:2024-03-21 21:48:20 公開日:2024-03-19

# 高エネルギー物理におけるイベント分類のためのハイブリッド量子ビジョン変換器

Hybrid Quantum Vision Transformers for Event Classification in High Energy Physics ( http://arxiv.org/abs/2402.00776v2 )

ライセンス: Link先を確認

Eyup B. Unlu, Marçal Comajoan Cara, Gopal Ramesh Dahale, Zhongtian Dong, Roy T. Forestano, Sergei Gleyzer, Daniel Justice, Kyoungchul Kong, Tom Magorsch, Konstantin T. Matchev, Katia Matcheva,

(参考訳) ビジョントランスフォーマーアーキテクチャに基づくモデルは、画像分類タスクにおいて最先端と見なされている。しかし、トレーニングとデプロイの両方に膨大な計算資源を必要とする。データの量と複雑さが増すにつれて、この問題は悪化する。量子ベースのビジョントランスフォーマーモデルは、同じ予測性能を保ちながらトレーニングと運用の時間を短縮することで、この問題を軽減できる可能性がある。現在の量子コンピュータはまだ高次元のタスクを実行できないが、将来に向けて最も効率的なソリューションの1つを提供する。本研究では、高エネルギー物理学における分類問題(電磁カロリメータにおける光子と電子の識別)のために、量子ハイブリッドビジョントランスフォーマーのいくつかのバリエーションを構築し、古典的なビジョントランスフォーマーアーキテクチャと比較する。その結果、ハイブリッドモデルは、同程度のパラメータ数を持つ古典的モデルに匹敵する性能を達成できることが示された。

Models based on vision transformer architectures are considered state-of-the-art when it comes to image classification tasks. However, they require extensive computational resources both for training and deployment. The problem is exacerbated as the amount and complexity of the data increase. Quantum-based vision transformer models could potentially alleviate this issue by reducing the training and operating time while maintaining the same predictive power. Although current quantum computers are not yet able to perform high-dimensional tasks, they do offer one of the most efficient solutions for the future. In this work, we construct several variations of a quantum hybrid vision transformer for a classification problem in high energy physics (distinguishing photons and electrons in the electromagnetic calorimeter). We test them against classical vision transformer architectures. Our findings indicate that the hybrid models can achieve comparable performance to their classical analogues with a similar number of parameters.

翻訳日:2024-03-21 21:48:20 公開日:2024-03-19

# 暗号的検閲

Cryptographic Censorship ( http://arxiv.org/abs/2402.03425v2 )

ライセンス: Link先を確認

Netta Engelhardt, Åsmund Folkestad, Adam Levine, Evita Verheijden, Lisa Yang,

(参考訳) 我々は、弱い宇宙検閲予想の量子版を定式化し、その証明に向けて2つの大きな前進を遂げる。まず「暗号的検閲(Cryptographic Censorship)」を証明する。これは、ホログラフィックCFTの時間発展演算子があるコード部分空間上で近似的に擬似ランダム(あるいはHaarランダム)であるとき、対応するバルク双対に事象の地平線が存在しなければならないことを示す定理である。この結果は、大域的な時空構造に関する最小限の仮定のもとで、(有限時間における)事象の地平線の形成を保証する一般的な条件を与える。我々の定理は、最近の量子学習に関するno-go定理の拡張に基づいており、擬似ランダムな測度集中という新しい手法を用いて証明される。この結果を宇宙検閲に適用するために、特異点を古典的、半プランク的、プランク的の3種類に分類する。古典的および半プランク的特異点が、近似的に擬似ランダムなCFT時間発展と両立することを示す。したがって、そのような特異点が実際に近似的に擬似ランダムであるならば、暗号的検閲により、事象の地平線なしには存在できない。この結果は、地平線の典型性に一般的な適用可能性を依存している、量子カオスと熱化に関する重要なホログラフィーの結果が、AdS/CFTにおける裸の特異点の形成によって無効にならないことを保証する十分条件を与える。

We formulate and take two large strides towards proving a quantum version of the weak cosmic censorship conjecture. We first prove "Cryptographic Censorship": a theorem showing that when the time evolution operator of a holographic CFT is approximately pseudorandom (or Haar random) on some code subspace, then there must be an event horizon in the corresponding bulk dual. This result provides a general condition that guarantees (in finite time) event horizon formation, with minimal assumptions about the global spacetime structure. Our theorem relies on an extension of a recent quantum learning no-go theorem and is proved using new techniques of pseudorandom measure concentration. To apply this result to cosmic censorship, we separate singularities into classical, semi-Planckian, and Planckian types. We illustrate that classical and semi-Planckian singularities are compatible with approximately pseudorandom CFT time evolution; thus, if such singularities are indeed approximately pseudorandom, by Cryptographic Censorship, they cannot exist in the absence of event horizons. This result provides a sufficient condition guaranteeing that seminal holographic results on quantum chaos and thermalization, whose general applicability relies on typicality of horizons, will not be invalidated by the formation of naked singularities in AdS/CFT.

翻訳日:2024-03-21 21:38:31 公開日:2024-03-19

# HYPO:超球面上の分布外一般化

HYPO: Hyperspherical Out-of-Distribution Generalization ( http://arxiv.org/abs/2402.07785v2 )

ライセンス: Link先を確認

Yifei Ming, Haoyue Bai, Julian Katz-Samuels, Yixuan Li,

(参考訳) 分布外(OOD)一般化は、現実世界にデプロイされる機械学習モデルにとって重要である。しかし、異なるドメインや環境にまたがって不変な特徴を学習する能力が必要となるため、その実現は本質的に困難である。本稿では、超球面空間においてドメイン不変な表現を証明可能な形で学習する新しいフレームワークHYPO(HYPerspherical OOD generalization)を提案する。特に、我々の超球面学習アルゴリズムは、クラス内変動とクラス間分離の原則に導かれており、(異なるトレーニングドメインにまたがる)同じクラスの特徴がそのクラスプロトタイプと密接に整列する一方で、異なるクラスプロトタイプが最大限に分離されることを保証する。さらに、我々のプロトタイプ学習の目的関数がOOD一般化境界をどのように改善するかについて、理論的な正当化を与える。挑戦的なOODベンチマークにおける広範な実験を通じて、我々のアプローチが競合するベースラインを上回り、優れた性能を達成することを示す。コードはhttps://github.com/deeplearning-wisc/hypoで入手できる。

Out-of-distribution (OOD) generalization is critical for machine learning models deployed in the real world. However, achieving this can be fundamentally challenging, as it requires the ability to learn invariant features across different domains or environments. In this paper, we propose a novel framework HYPO (HYPerspherical OOD generalization) that provably learns domain-invariant representations in a hyperspherical space. In particular, our hyperspherical learning algorithm is guided by intra-class variation and inter-class separation principles -- ensuring that features from the same class (across different training domains) are closely aligned with their class prototypes, while different class prototypes are maximally separated. We further provide theoretical justifications on how our prototypical learning objective improves the OOD generalization bound. Through extensive experiments on challenging OOD benchmarks, we demonstrate that our approach outperforms competitive baselines and achieves superior performance. Code is available at https://github.com/deeplearning-wisc/hypo.

翻訳日:2024-03-21 21:38:31 公開日:2024-03-19
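
The two principles named in the HYPO entry above (features aligned with their class prototype, prototypes pushed apart) can be illustrated with a small prototype-based objective on the unit sphere. This is a hedged sketch, not the authors' implementation: the exact loss form, the temperature `tau`, and the function names `variation_loss` and `separation_loss` are assumptions made for illustration.

```python
import math

def normalize(v):
    """Project a vector onto the unit hypersphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def variation_loss(features, labels, prototypes, tau=0.1):
    """Cross-entropy over cosine similarities to class prototypes (assumed form).

    Low when each normalized feature points toward its own class prototype.
    """
    protos = [normalize(mu) for mu in prototypes]
    total = 0.0
    for z, y in zip(features, labels):
        z = normalize(z)
        logits = [dot(z, mu) / tau for mu in protos]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_norm - logits[y]
    return total / len(features)

def separation_loss(prototypes, tau=0.1):
    """Penalizes prototypes that are close to each other (assumed form)."""
    protos = [normalize(mu) for mu in prototypes]
    c = len(protos)
    total = 0.0
    for i in range(c):
        s = sum(math.exp(dot(protos[i], protos[j]) / tau)
                for j in range(c) if j != i)
        total += math.log(s / (c - 1))
    return total / c

protos = [[1.0, 0.0], [0.0, 1.0]]
feats = [[2.0, 0.0], [0.0, 3.0]]
print(variation_loss(feats, [0, 1], protos))  # features match their prototypes
print(variation_loss(feats, [1, 0], protos))  # labels swapped: much larger loss
```

Because every vector is normalized first, only directions matter, which is what makes the "hyperspherical" framing natural: alignment and separation both reduce to cosine similarities.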

# 非マルコフ位相雑音下での単一量子ビットゲートの量子力学写像

Quantum dynamical maps for single-qubit gates under non-Markovian phase noise ( http://arxiv.org/abs/2402.14530v2 )

ライセンス: Link先を確認

J. M. Sánchez Velázquez, A. Steiner, R. Freund, M. Guevara-Bertsch, Ch. D. Marciniak, T. Monz, A. Bermudez,

(参考訳) ノイズは至るところに存在し、精度が要求される場面では一般に有害である。これは、ノイズの影響下でシステムの有用性が急速に低下する量子技術分野において特に当てはまる。したがって、量子デバイスにおけるノイズを理解することは、その有害な影響を軽減または排除するための効率的な戦略の前提条件となる。しかし、これには法外な資源が必要になることが多く、一般的に使われるノイズモデルは、実験の現実から逸脱することもある単純化に依存している。ここでは、単一の実験入力(ノイズのパワースペクトル密度)のみを必要とする、単一量子ビットゲートのコンパクトな微視的誤差モデルを導出する。我々のモデルは、標準的な脱分極ノイズモデルやパウリトワール化されたノイズモデルを超えており、動的誤差写像への非クリフォードおよび非マルコフの寄与を明示的に含んでいる。実験的に重要な指標に対する我々の予測を、トラップイオン量子コンピュータ上で実行した確立された特性評価手法と比較する。特に、ランダム化ベンチマークで測定し、量子プロセストモグラフィーで再構成した平均ゲート誤差の実験的推定値は、我々の解析的推定値によって厳密に下から抑えられる一方、脱分極モデルはゲート誤差を過大評価することがわかった。非マルコフの寄与を含む我々のノイズモデリングは、動的デカップリングや動的補正ゲートなどの確立されたフレームワークに容易に適用でき、量子誤り訂正のより現実的なしきい値を与えるためにも利用できる。

Noise is both ubiquitous and generally deleterious in settings where precision is required. This is especially true in the quantum technology sector where system utility typically decays rapidly under its influence. Understanding the noise in quantum devices is thus a prerequisite for efficient strategies to mitigate or even eliminate its harmful effects. However, this requires resources that are often prohibitive, such that the commonly used noise models rely on simplifications that sometimes depart from experimental reality. Here we derive a compact microscopic error model for single-qubit gates that only requires a single experimental input: the noise power spectral density. Our model goes beyond standard depolarizing or Pauli-twirled noise models, explicitly including non-Clifford and non-Markovian contributions to the dynamical error map. We gauge our predictions for experimentally relevant metrics against established characterization techniques run on a trapped-ion quantum computer. In particular, we find that experimental estimates of average gate errors measured through randomized benchmarking and reconstructed via quantum process tomography are tightly lower-bounded by our analytical estimates, while the depolarizing model overestimates the gate error. Our noise modeling including non-Markovian contributions can be readily applied to established frameworks such as dynamical decoupling and dynamically-corrected gates, or to provide more realistic thresholds for quantum error correction.

翻訳日:2024-03-21 21:38:31 公開日:2024-03-19

# 集積フォトニック回路における量子ビットとquditを用いた任意の状態準備を行うための電気エネルギーコストの推定

Estimating the electrical energy cost of performing arbitrary state preparation using qubits and qudits in integrated photonic circuits ( http://arxiv.org/abs/2402.16603v2 )

ライセンス: Link先を確認

Maria Carolina Volpato, Pierre-Louis de Assis,

(参考訳) 量子情報処理に単一光子とフォトニック集積回路(PIC)を用いる場合、量子状態は量子ビット(qubit)またはquditを用いて符号化できる。quditは、状態の符号化に必要な光子数が少ない(原理的には1個のみ)という点でより効率的である。一方、量子ビットは、同じ次元$d$に到達するために使用する導波路の数という点でより効率的である。quditが$d$本の導波路を必要とするのに対し、量子ビットが使用する導波路は$2\log_2d$本だからである。有用なタスクに十分な大きさの次元では、他のリソースがPICの導波路数に対して少なくとも多項式的にスケールするため、量子ビットが明らかに最良の選択肢であるように見える。しかし、比較はそれほど単純ではない。例えば変分量子アルゴリズムに関係する、量子状態準備のタスクを考える。n個の量子ビットに対して、このタスクはnについて指数関数的な数の制御NOT(CNOT)ゲートを回路に必要とする。どちらの実装も指数的なリソースコストを伴うため、より詳細な評価が必要である。我々は、量子状態準備を行うようにフォトニック回路をプログラムするために平均して消費しなければならない電気エネルギー量の観点から、量子ビットとquditのアプローチを比較する。完全に再構成可能な干渉計アレイを持つPICを用いる場合、量子ビットを使う方がquditを使うよりも多くのエネルギーを必要とすることがわかった。しかし、専用のCNOTブロックを持つ回路はエネルギー消費がはるかに小さく、CNOTゲートの確率的性質など、より重要なボトルネックが現れる大きな量子ビット数でも実用的であり続けることを示す。

When using single photons and photonic integrated circuits (PICs) for quantum information processing, quantum states can be encoded using either qubits or qudits. Qudits are more efficient in terms of requiring fewer photons (in principle, only one) to encode the state, while qubits are more efficient in terms of the number of waveguides used to reach the same dimension $d$, as qudits need $d$ waveguides in comparison to the $2\log_2d$ waveguides used by qubits. For dimensions large enough for useful tasks, this would indicate that qubits are clearly the best option, as other resources scale at least polynomially with the number of waveguides in the PIC. The comparison, however, is not so direct. We consider the task of quantum state preparation, which is relevant for variational quantum algorithms, for instance. For n qubits, this task requires the circuit to have a number of Controlled-NOT (CNOT) gates that is exponential in n. Since both implementations suffer from an exponential resource cost, a more detailed evaluation is required. We compare the qubit and qudit approaches in terms of the amount of electrical energy that must be spent, on average, to program a photonic circuit to perform quantum state preparation. We find that if a PIC with a fully reconfigurable array of interferometers is to be used, using qubits requires more energy than using qudits. We show, however, that a circuit with dedicated CNOT blocks has a much smaller energy consumption, remaining viable even at large qubit numbers, where more important bottlenecks come into play, such as the probabilistic nature of the CNOT gates.

翻訳日:2024-03-21 21:28:43 公開日:2024-03-19
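
The waveguide counts quoted in the entry above ($d$ for a qudit versus $2\log_2 d$ for qubits) can be tabulated directly. This is a minimal sketch: the photon count for the qubit case assumes one photon per two-waveguide (dual-rail) qubit, which the abstract implies but does not state, and both function names are illustrative.

```python
def qudit_resources(d):
    """Single-photon qudit: one waveguide per basis state."""
    return {"photons": 1, "waveguides": d}

def qubit_resources(d):
    """Qubits with two waveguides each (assumed dual-rail encoding)."""
    n = (d - 1).bit_length()  # smallest n with 2**n >= d
    return {"photons": n, "waveguides": 2 * n}

for d in (4, 64, 1024):
    print(d, qudit_resources(d), qubit_resources(d))
```

The table makes the trade-off concrete: the waveguide gap grows exponentially in favour of qubits, while the photon count grows only logarithmically in favour of qudits, so neither count alone settles the energy comparison the paper makes.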

# センサ故障時の一般化可能性:Tokenization + Transformerにより、よりロバストな潜在空間が実現

Generalizability Under Sensor Failure: Tokenization + Transformers Enable More Robust Latent Spaces ( http://arxiv.org/abs/2402.18546v3 )

ライセンス: Link先を確認

Geeling Chau, Yujin An, Ahamed Raffey Iqbal, Soon-Jo Chung, Yisong Yue, Sabera Talukder,

(参考訳) 神経科学の主要な目標は、一般化する神経データ表現を発見することである。この目標は、記録セッション(例:環境)、被験者(例:神経構造の違い)、センサ(例:センサノイズ)などに沿った変動性によって難しくなっている。最近の研究はセッション間や被験者間の一般化に取り組み始めているが、神経科学実験で頻繁に起こるセンサ故障に対する堅牢性を調べた研究はほとんどない。これらの一般化可能性の次元に取り組むために、我々はまず多数のセッション、被験者、センサからなる独自の脳波データセットを収集し、次にEEGNet(Lawhern et al., 2018)とTOTEM(Talukder et al., 2024)という2つの時系列モデルを調べる。EEGNetは広く使われている畳み込みニューラルネットワークであり、TOTEMは離散的な時系列トークナイザとトランスフォーマーを組み合わせたモデルである。すべての一般化可能性のケースにおいて、TOTEMはEEGNetを上回るか、同等であることがわかった。最後に、TOTEMの潜在コードブックの分析を通して、トークン化が一般化を可能にすることを確認する。

A major goal in neuroscience is to discover neural data representations that generalize. This goal is challenged by variability along recording sessions (e.g. environment), subjects (e.g. varying neural structures), and sensors (e.g. sensor noise), among others. Recent work has begun to address generalization across sessions and subjects, but few study robustness to sensor failure which is highly prevalent in neuroscience experiments. In order to address these generalizability dimensions we first collect our own electroencephalography dataset with numerous sessions, subjects, and sensors, then study two time series models: EEGNet (Lawhern et al., 2018) and TOTEM (Talukder et al., 2024). EEGNet is a widely used convolutional neural network, while TOTEM is a discrete time series tokenizer and transformer model. We find that TOTEM outperforms or matches EEGNet across all generalizability cases. Finally through analysis of TOTEM's latent codebook we observe that tokenization enables generalization.

翻訳日:2024-03-21 21:28:43 公開日:2024-03-19

# Few-Shot事例選択のための情報量の多い指標の設計

Designing Informative Metrics for Few-Shot Example Selection ( http://arxiv.org/abs/2403.03861v2 )

ライセンス: Link先を確認

Rishabh Adiga, Lakshminarayanan Subramanian, Varun Chandrasekaran,

(参考訳) 事前訓練された言語モデル(PLM)は、適切にフォーマットされた例が与えられると、優れたfew-shot学習能力を示す。しかし、「最良」の例を選択することは依然として未解決の課題である。本稿では、系列タギングタスクのための、複雑性に基づくプロンプト選択手法を提案する。このアプローチは、例の選択専用のモデルを訓練することを避け、代わりに特定の指標を用いて、テスト文と例の構文的・意味的な複雑さを揃える。文レベルと単語レベルの両方の指標を用いて、例の複雑さを検討対象の(テスト)文に合わせる。実験の結果、我々のアプローチはPLMからより高い性能を引き出すことが示された。few-shot NERで最先端の性能を達成し、GPT-4ではCoNLL2003データセットのF1スコアが絶対値で5%向上した。また、GPT-J-6Bのような小型モデルでは最大28.85ポイント(F1/Acc.)の大きな向上も見られる。

Pretrained language models (PLMs) have shown remarkable few-shot learning capabilities when provided with properly formatted examples. However, selecting the "best" examples remains an open challenge. We propose a complexity-based prompt selection approach for sequence tagging tasks. This approach avoids the training of a dedicated model for selection of examples, and instead uses certain metrics to align the syntactico-semantic complexity of test sentences and examples. We use both sentence- and word-level metrics to match the complexity of examples to the (test) sentence being considered. Our results demonstrate that our approach extracts greater performance from PLMs: it achieves state-of-the-art performance on few-shot NER, achieving a 5% absolute improvement in F1 score on the CoNLL2003 dataset for GPT-4. We also see large gains of up to 28.85 points (F1/Acc.) in smaller models like GPT-J-6B.

翻訳日:2024-03-21 21:18:47 公開日:2024-03-19

# Large, Small or Both: 意見要約の曖昧化のための言語モデルに基づく新しいデータ拡張フレームワーク

Large, Small or Both: A Novel Data Augmentation Framework Based on Language Models for Debiasing Opinion Summarization ( http://arxiv.org/abs/2403.07693v2 )

ライセンス: Link先を確認

Yanyue Zhang, Pengfei Li, Yilong Lai, Deyu Zhou, Yulan He,

(参考訳) 既存の意見要約データセットではレビューの70$\%$以上が肯定的であるため、現在の意見要約手法は、否定的なテキストが入力されても否定的な要約を生成しにくい。このような感情バイアスに対処するために、特定のフレームワークに過度に依存しない直接的なアプローチは、大規模言語モデルに基づいて追加データを生成し、データセットの感情分布のバランスをとることである。しかし、大規模言語モデルに基づくデータ拡張には2つの欠点がある。1) 拡張データに潜在的な問題や有害性が含まれうること、2) コストが高いことである。そこで本稿では、意見要約のバイアス除去のために、大規模言語モデルと小規模言語モデルの両方に基づく新しいデータ拡張フレームワークを提案する。具体的には、大規模言語モデルで肯定的なテキストを書き換えることにより、少量の合成された否定的レビューを得る。次に、生成されたデータに基づいて、disentangle再構成モデルを訓練する。訓練後、異なるサンプル表現の組み合わせから得られる新しい表現を復号し、confusion degreeと感情分類に基づいてフィルタリングすることで、大量の合成データを得ることができる。実験により、我々のフレームワークは、大規模モデルのみを用いる場合と同等に感情バイアスを効果的に軽減でき、かつより経済的であることが示された。

As more than 70$\%$ of reviews in the existing opinion summary data set are positive, current opinion summarization approaches are reluctant to generate negative summaries given the input of negative texts. To address such sentiment bias, a direct approach without over-reliance on a specific framework is to generate additional data based on large language models to balance the emotional distribution of the dataset. However, data augmentation based on large language models faces two disadvantages: 1) the potential issues or toxicity in the augmented data; 2) the expensive costs. Therefore, in this paper, we propose a novel data augmentation framework based on both large and small language models for debiasing opinion summarization. Specifically, a small set of synthesized negative reviews is obtained by rewriting the positive text via a large language model. Then, a disentangle reconstruction model is trained based on the generated data. After training, a large amount of synthetic data can be obtained by decoding the new representation obtained from the combination of different sample representations and filtering based on confusion degree and sentiment classification. Experiments show that our framework alleviates emotional bias as effectively as using only large models, but more economically.

翻訳日:2024-03-21 21:18:47 公開日:2024-03-19

# 対話レコメンデーションのための生成ユーザシミュレータとしての大規模言語モデルの評価

Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation ( http://arxiv.org/abs/2403.09738v2 )

ライセンス: Link先を確認

Se-eun Yoon, Zhankui He, Jessica Maria Echterhoff, Julian McAuley,

(参考訳) 合成ユーザは、対話型推薦システムの評価において、実ユーザの費用対効果の高い代替である。大規模言語モデルは人間らしい振る舞いのシミュレーションにおいて有望であり、多様なユーザ集団をどこまで表現できるかという疑問が生じる。本稿では、言語モデルが対話型推薦における人間の行動をどの程度正確に模倣できるかを測定するための新しいプロトコルを提案する。このプロトコルは5つのタスクから構成され、各タスクは合成ユーザが示すべき重要な特性、すなわち、どのアイテムについて話すかの選択、二値の好みの表現、自由形式の好みの表現、推薦の要求、フィードバックの提供を評価するように設計されている。ベースラインシミュレータの評価を通じて、これらのタスクが言語モデルの人間の行動からの逸脱を効果的に明らかにすることを示し、モデル選択とプロンプト戦略によって逸脱を減らす方法についての洞察を与える。

Synthetic users are cost-effective proxies for real users in the evaluation of conversational recommender systems. Large language models show promise in simulating human-like behavior, raising the question of their ability to represent a diverse population of users. We introduce a new protocol to measure the degree to which language models can accurately emulate human behavior in conversational recommendation. This protocol is comprised of five tasks, each designed to evaluate a key property that a synthetic user should exhibit: choosing which items to talk about, expressing binary preferences, expressing open-ended preferences, requesting recommendations, and giving feedback. Through evaluation of baseline simulators, we demonstrate these tasks effectively reveal deviations of language models from human behavior, and offer insights on how to reduce the deviations with model selection and prompting strategies.

翻訳日:2024-03-21 21:08:57 公開日:2024-03-19

# 透かしLLMの統計的理解に向けて

Towards Better Statistical Understanding of Watermarking LLMs ( http://arxiv.org/abs/2403.13027v1 )

ライセンス: Link先を確認

Zhongze Cai, Shang Liu, Hanzhao Wang, Huaiyang Zhong, Xiaocheng Li,

(参考訳) 本稿では、大規模言語モデル(LLM)の透かし問題について検討する。モデルの歪みと検出能力のトレードオフを考慮し、Kirchenbauer et al. (2023a)のgreen-redアルゴリズムに基づく制約付き最適化問題として定式化する。この最適化問題の最適解が優れた解析的性質を持つことを示す。この性質は透かしプロセスの理解を深め、アルゴリズム設計の着想を与える。この最適化の定式化に基づいてオンライン双対勾配上昇法による透かしアルゴリズムを開発し、モデルの歪みと検出能力の間の漸近的なパレート最適性を証明する。この結果は(従来の結果とは対照的に)、グリーンリストの確率が平均的に増加すること、ひいては検出能力を明示的に保証する。さらに、透かし問題におけるモデル歪み指標の選択について体系的に議論する。KLダイバージェンスの選択を正当化し、既存の「distortion-free」基準とパープレキシティ基準の問題点を示す。最後に、広範なデータセット上で、ベンチマークアルゴリズムと比較してアルゴリズムを実証的に評価する。

In this paper, we study the problem of watermarking large language models (LLMs). We consider the trade-off between model distortion and detection ability and formulate it as a constrained optimization problem based on the green-red algorithm of Kirchenbauer et al. (2023a). We show that the optimal solution to the optimization problem enjoys a nice analytical property which provides a better understanding and inspires the algorithm design for the watermarking process. We develop an online dual gradient ascent watermarking algorithm in light of this optimization formulation and prove its asymptotic Pareto optimality between model distortion and detection ability. Such a result explicitly guarantees an increased green list probability on average, and hence detection ability (in contrast to previous results). Moreover, we provide a systematic discussion on the choice of the model distortion metrics for the watermarking problem. We justify our choice of KL divergence and present issues with the existing criteria of "distortion-free" and perplexity. Finally, we empirically evaluate our algorithms on extensive datasets against benchmark algorithms.

翻訳日:2024-03-21 20:59:01 公開日:2024-03-19
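
For readers unfamiliar with the green-red scheme that the entry above builds on, here is a minimal sketch of the baseline idea as commonly described for Kirchenbauer et al.: the previous token seeds a pseudorandom split of the vocabulary, "green" logits get a fixed bonus `delta`, and detection counts green tokens with a z-score. It is not the paper's online dual gradient ascent algorithm; the seeding via Python's `random.Random`, the parameter defaults, and all function names are assumptions for illustration.

```python
import math
import random

def green_list(prev_token, vocab_size, gamma=0.5):
    """Pseudorandom green subset of the vocabulary, seeded by the previous token."""
    rng = random.Random(prev_token)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])

def watermarked_probs(logits, prev_token, gamma=0.5, delta=2.0):
    """Add delta to green-token logits, then apply a softmax."""
    green = green_list(prev_token, len(logits), gamma)
    shifted = [l + delta if i in green else l for i, l in enumerate(logits)]
    m = max(shifted)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def detection_z_score(tokens, vocab_size, gamma=0.5):
    """z-score of the green-token count against the no-watermark expectation."""
    t = len(tokens) - 1  # number of (previous, current) token pairs
    hits = sum(tokens[i] in green_list(tokens[i - 1], vocab_size, gamma)
               for i in range(1, len(tokens)))
    return (hits - gamma * t) / math.sqrt(t * gamma * (1 - gamma))

probs = watermarked_probs([0.0] * 10, prev_token=3)
green_mass = sum(probs[i] for i in green_list(3, 10))
print(green_mass)  # above gamma = 0.5 because green logits were boosted
```

A larger `delta` raises the green probability mass (easier detection) but moves the sampling distribution further from the original model, which is the distortion-versus-detection trade-off the abstract formalizes.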

# 回折限界を打ち破る強度積に基づく光センシング

Intensity product-based optical sensing to beat the diffraction limit ( http://arxiv.org/abs/2403.13029v1 )

ライセンス: Link先を確認

Byoung S. Ham,

(参考訳) 古典的に定義された光学位相感度は、量子力学の不確定性原理に由来するショットノイズ限界(SNL)、あるいは標準量子限界として知られている。SNLによれば、位相分解能(感度)はNの平方根に反比例する。ここでNは干渉する光子、あるいは個別に測定される事象の数である。したがって、信号対雑音比がNの平方根倍に向上するため、高出力レーザーの使用が有利となる。しかし、光干渉計では、検出過程でN個のプローブ光子が分解されない限り、位相分解能はN=1の場合にとどまり、古典光学の回折限界が生じる。本稿では、高精度計測に広く用いられる典型的な干渉計でSNLを実現するために、強度積に基づく光センシングのための射影測定を提案する。このために、干渉計の出力ポートの1つをN個のポートに均等に分割し、m次の強度相関(mはN以下)を測定する。ここでNの最大値は、入力レーザーの総光子数で与えられる。N個の光子間の最大時間遅延はレーザーのスペクトル帯域幅によって制限される。これは、同時計数検出に基づく量子センシングに用いられる光子アンサンブルの有効コヒーレンス時間と同じである。

Classically defined optical phase sensitivity is known as the shot-noise limit (SNL) or standard quantum limit originating in the uncertainty principle of quantum mechanics. Based on SNL, the phase resolution (sensitivity) is inversely proportional to the square root of N, where N is the number of interfering photons or individually measured events. Thus, using a high-power laser is advantageous due to the square-root-of-N gain in the signal-to-noise ratio. In an optical interferometer, however, the phase resolution remains in the N=1 case unless N probe photons are resolved in a detection process, resulting in the diffraction limit of classical optics. Here, a projective measurement is proposed for intensity product-based optical sensing to realize SNL in a typical interferometer commonly used for high-precision metrology. For this, one of the output ports of the interferometer is evenly divided into N ports and measured for mth-order intensity correlations (m is less than or equal to N), where the maximum N is given by the total number of photons of the input laser. The maximum temporal delay among N photons is constrained by the spectral bandwidth of the laser, which is the same as the effective coherence time of the photon ensemble used for coincidence detection-based quantum sensing.

翻訳日:2024-03-21 20:59:01 公開日:2024-03-19
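
The square-root-of-N scaling of the shot-noise limit quoted in the entry above can be checked with a short error-propagation calculation. The sketch assumes a textbook two-port interferometer in which each of N independent photons exits the monitored port with probability (1 + cos φ)/2. This model and the function name `phase_resolution` are illustrative assumptions and do not reproduce the paper's intensity-product measurement.

```python
import math

def phase_resolution(n_photons, phi):
    """Phase uncertainty from binomial counting noise (assumed textbook model).

    Each photon exits the monitored port with probability p = (1 + cos(phi)) / 2.
    Undefined where sin(phi) == 0, because the fringe slope vanishes there.
    """
    p = (1 + math.cos(phi)) / 2
    std_counts = math.sqrt(n_photons * p * (1 - p))  # binomial standard deviation
    slope = n_photons * abs(math.sin(phi)) / 2       # |d<counts>/d(phi)|
    return std_counts / slope

for n in (1, 100, 10_000):
    print(n, phase_resolution(n, phi=1.0), 1 / math.sqrt(n))
```

In this model the result equals 1/√N at any working point with sin φ ≠ 0, because p(1 − p) = sin²φ / 4 cancels against the squared fringe slope; a hundredfold increase in photon number therefore buys only a tenfold gain in resolution.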

# 階層ROIと適応量子化による超高忠実画像圧縮

Super-High-Fidelity Image Compression via Hierarchical-ROI and Adaptive Quantization ( http://arxiv.org/abs/2403.13030v1 )

ライセンス: Link先を確認

Jixiang Luo, Yan Wang, Hongwei Qin,

(参考訳) 学習型画像圧縮(LIC)は、客観的指標と主観的指標の両面で劇的な進歩を遂げてきた。MSEベースのモデルは客観的指標の改善を目指し、生成モデルは主観的指標で測られる視覚的品質の改善に活用される。しかし、いずれも低ビットレート、特に$0.2bpp$未満では、ぼやけや変形が生じる。さらに、人間の顔やテキストの変形は視覚的品質評価において許容されず、小さな顔やテキストではこの問題がより顕著になる。この問題を解決するために、関心領域(ROI)を利用して、MSEベースのモデルと生成モデルの利点を組み合わせる。画像を複数の前景領域と1つの背景領域に分割する階層ROI(H-ROI)を提案し、顔、テキスト、複雑なテクスチャを含む領域の再構成を改善する。さらに、チャネル次元内の非線形マッピングによる適応量子化を提案し、視覚的品質を維持しながらビットレートを抑制する。網羅的な実験により、提案手法は、例えばHiFiCの$0.7X$、BPGの$0.5X$のビット数という、より低いビットレートで、小さな顔やテキストに対してより優れた視覚的品質を達成することが示された。

Learned Image Compression (LIC) has achieved dramatic progress regarding objective and subjective metrics. MSE-based models aim to improve objective metrics while generative models are leveraged to improve visual quality measured by subjective metrics. However, they all suffer from blurring or deformation at low bit rates, especially below $0.2bpp$. Besides, deformation on human faces and text is unacceptable for visual quality assessment, and the problem becomes more prominent on small faces and text. To solve this problem, we combine the advantages of MSE-based models and generative models by utilizing region of interest (ROI). We propose Hierarchical-ROI (H-ROI) to split images into several foreground regions and one background region to improve the reconstruction of regions containing faces, text, and complex textures. Further, we propose adaptive quantization by non-linear mapping within the channel dimension to constrain the bit rate while maintaining the visual quality. Exhaustive experiments demonstrate that our methods achieve better visual quality on small faces and text with lower bit rates, e.g., $0.7X$ bits of HiFiC and $0.5X$ bits of BPG.

翻訳日:2024-03-21 20:59:01 公開日:2024-03-19

# RigorLLM: 望ましくないコンテンツに対する大規模言語モデルのための回復力のあるガードレール

RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content ( http://arxiv.org/abs/2403.13031v1 )

ライセンス: Link先を確認

Zhuowen Yuan, Zidi Xiong, Yi Zeng, Ning Yu, Ruoxi Jia, Dawn Song, Bo Li,

(参考訳) 大規模言語モデル(LLM)の最近の進歩は、さまざまな領域の多様なタスクにおいて優れた能力を示してきた。しかし、バイアスの出現や、特に悪意のある入力のもとでLLMが有害なコンテンツを生成する可能性は、大きな課題となっている。現在の緩和戦略は有効ではあるものの、敵対的攻撃に対する耐性がない。本稿では、LLMに対する有害で安全でない入力と出力を効率的かつ効果的にモデレートするように設計された新しいフレームワーク、Resilient Guardrails for Large Language Models(RigorLLM)を紹介する。ランジュバン動力学によるエネルギーベースの訓練データ拡張、ミニマックス最適化による入力に対する安全なサフィックスの最適化、および我々のデータ拡張に基づくロバストなKNNとLLMを組み合わせた融合ベースのモデルの統合という多面的なアプローチを採用することで、RigorLLMは有害コンテンツのモデレーションに対する堅牢なソリューションを提供する。実験的評価により、RigorLLMは有害コンテンツの検出においてOpenAI APIやPerspective APIのような既存のベースラインを上回るだけでなく、脱獄(jailbreaking)攻撃に対しても比類のない耐性を示すことがわかった。制約付き最適化と融合ベースのガードレール手法の革新的な利用は、より安全で信頼性の高いLLMの開発に向けた大きな一歩であり、進化するデジタル脅威に直面するコンテンツモデレーションフレームワークの新たな基準となる。

Recent advancements in Large Language Models (LLMs) have showcased remarkable capabilities across various tasks in different domains. However, the emergence of biases and the potential for generating harmful content in LLMs, particularly under malicious inputs, pose significant challenges. Current mitigation strategies, while effective, are not resilient under adversarial attacks. This paper introduces Resilient Guardrails for Large Language Models (RigorLLM), a novel framework designed to efficiently and effectively moderate harmful and unsafe inputs and outputs for LLMs. By employing a multi-faceted approach that includes energy-based training data augmentation through Langevin dynamics, optimizing a safe suffix for inputs via minimax optimization, and integrating a fusion-based model combining robust KNN with LLMs based on our data augmentation, RigorLLM offers a robust solution to harmful content moderation. Our experimental evaluations demonstrate that RigorLLM not only outperforms existing baselines like OpenAI API and Perspective API in detecting harmful content but also exhibits unparalleled resilience to jailbreaking attacks. The innovative use of constrained optimization and a fusion-based guardrail approach represents a significant step forward in developing more secure and reliable LLMs, setting a new standard for content moderation frameworks in the face of evolving digital threats.
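As a rough illustration of what a "fusion-based" guardrail could look like, the sketch below averages a k-nearest-neighbour vote over labelled example embeddings with a probability assumed to come from an LLM-based classifier. The function names, the toy 2-D "embeddings", the weight `alpha`, and the `llm_prob` value are all assumptions made for this example; the abstract does not specify how RigorLLM combines its components.

```python
import math

def knn_harmful_prob(query, examples, k=3):
    """Fraction of the k nearest labelled examples (Euclidean distance) that are harmful."""
    ranked = sorted(examples, key=lambda ex: math.dist(query, ex[0]))
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)

def fused_score(query, examples, llm_prob, alpha=0.5, k=3):
    """Weighted average of the kNN vote and an LLM-derived probability.

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    return alpha * knn_harmful_prob(query, examples, k) + (1 - alpha) * llm_prob

# Toy 2-D "embeddings" with labels: 0 = benign, 1 = harmful.
examples = [
    ((0.0, 0.0), 0),
    ((0.1, 0.2), 0),
    ((1.0, 1.0), 1),
    ((0.9, 1.1), 1),
    ((1.2, 0.8), 1),
]

# llm_prob stands in for the output of some LLM-based classifier.
score = fused_score((1.0, 0.9), examples, llm_prob=0.6)
print(round(score, 3))
```

In this toy setup the three nearest neighbours are all labelled harmful, so the fused score lies between the kNN vote and the assumed LLM probability.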

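Translated: 2024-03-21 20:59:01 Published: 2024-03-19

# Hybrid Unsupervised Learning Strategy for Monitoring Industrial Batch Processes

Hybrid Unsupervised Learning Strategy for Monitoring Industrial Batch Processes ( http://arxiv.org/abs/2403.13032v1 )

License: see link

Christian W. Frey,

(Reference translation) Industrial production processes, particularly in the pharmaceutical industry, are complex systems that require continuous monitoring to ensure efficiency, product quality, and safety. This paper proposes a hybrid unsupervised learning strategy (HULS) for monitoring complex industrial processes. To address the limitations of conventional self-organizing maps (SOMs), especially in scenarios with unbalanced data sets and highly correlated process variables, HULS combines existing unsupervised learning techniques. To evaluate the performance of the HULS concept, comparative experiments were performed based on a laboratory batch.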

Industrial production processes, especially in the pharmaceutical industry, are complex systems that require continuous monitoring to ensure efficiency, product quality, and safety. This paper presents a hybrid unsupervised learning strategy (HULS) for monitoring complex industrial processes. Addressing the limitations of traditional Self-Organizing Maps (SOMs), especially in scenarios with unbalanced data sets and highly correlated process variables, HULS combines existing unsupervised learning techniques to address these challenges. To evaluate the performance of the HULS concept, comparative experiments are performed based on a laboratory batch

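Translated: 2024-03-21 20:59:01 Published: 2024-03-19

# Accelerated quantum search using partial oracles and Grover's algorithm

Accelerated quantum search using partial oracles and Grover's algorithm ( http://arxiv.org/abs/2403.13035v1 )

License: see link

Fintan M. Bolton,

(Reference translation) Grover's algorithm was conceived as a means of searching an unordered database and can also be used as a technique for extracting solutions from the result sets produced by quantum computations. The algorithm uses the concept of an oracle function to abstract the process of matching a search item (returning 1 for a match and 0 otherwise). This black-box approach hides the details of the search problem and works under the assumption that the items in the search space are completely unordered. In this case, searching for one target item in a search space of size $N$ scales as $\mathcal{O}(\sqrt{N})$ oracle queries (or $\mathcal{O}(\sqrt{N/M})$ oracle queries for $M$ target items). However, the matching process hidden inside a typical black-box oracle usually involves matching multiple data bits. This article examines the idea of associating a separate oracle with each bit of the matching condition, which gives multiple partial oracle functions that can be tested independently. Exploring this idea leads to a multi-stage hybrid search algorithm whose performance can fall anywhere in a wide range between $\mathcal{O}(\sqrt{N})$ (the same as Grover) and $\mathcal{O}(\log(N))$ (exponentially faster). The search acceleration appears to work by dynamically discovering order in the search space, where the order consists of correlations between the partial oracle functions and the search index. The algorithm is validated against the simplest kind of search scenario.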

Grover's algorithm, originally conceived as a means of searching an unordered database, can also be used as a technique for extracting solutions from the result sets generated by quantum computations. The algorithm exploits the concept of an oracle function, which abstracts the process of matching a search item (returning 1 for a match and 0 otherwise). This black-box approach hides the details of a search problem and works with the assumption that the items in the search space are completely unordered. In this case, searching for 1 target item in a search space of size $N$ scales as $\mathcal{O}(\sqrt{N})$ oracle queries (or $\mathcal{O}(\sqrt{N/M})$ oracle queries for $M$ target items in a search space of size $N$). Hidden inside the typical black-box oracle, however, the process of matching an item usually involves matching multiple data bits. In this article, we explore the idea of associating a separate oracle with each bit of the matching condition, obtaining multiple partial oracle functions which can be tested independently. Exploring this idea leads to a multi-stage hybrid search algorithm, whose performance can fall within a wide range, anywhere between $\mathcal{O}(\sqrt{N})$ (same as Grover) and $\mathcal{O}(\log(N))$ (exponentially faster). Apparently, the search acceleration works by dynamically discovering order in the search space, where the order consists of correlations between the partial oracle functions and the search index. This new algorithm is validated against the simplest kind of search scenario.
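To give a feel for the gap between the two ends of the quoted range, the sketch below compares the commonly cited estimate of about $\frac{\pi}{4}\sqrt{N/M}$ Grover iterations with $\lceil \log_2 N \rceil$, the number of index bits. The second function is only a stand-in for the $\mathcal{O}(\log(N))$ scale: the constant factors of the partial-oracle algorithm are not given in the abstract, and both function names are chosen for this example.

```python
import math

def grover_queries(n_items, n_targets=1):
    """Textbook estimate floor(pi/4 * sqrt(N/M)) of the number of Grover iterations."""
    return math.floor(math.pi / 4 * math.sqrt(n_items / n_targets))

def log_queries(n_items):
    """ceil(log2 N): the number of index bits, used here as the O(log N) scale."""
    return math.ceil(math.log2(n_items))

for bits in (10, 20, 30):
    n = 2 ** bits
    print(bits, grover_queries(n), log_queries(n))
```

The square-root count grows by a factor of about 32 for every 10 extra index bits, while the logarithmic scale grows by only 10.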

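Translated: 2024-03-21 20:59:01 Published: 2024-03-19

# BiLoRA: A Bi-level Optimization Framework for Overfitting-Resilient Low-Rank Adaptation of Large Pre-trained Models

BiLoRA: A Bi-level Optimization Framework for Overfitting-Resilient Low-Rank Adaptation of Large Pre-trained Models ( http://arxiv.org/abs/2403.13037v1 )

License: see link

Rushi Qiang, Ruiyi Zhang, Pengtao Xie,

(Reference translation) Low-rank adaptation (LoRA) is a popular method for fine-tuning large pre-trained models on downstream tasks by learning low-rank incremental matrices. LoRA and its variants effectively reduce the number of trainable parameters compared with full fine-tuning, but they often overfit the training data, which leads to sub-optimal generalization on test data. To address this problem, we introduce BiLoRA, a fine-tuning approach based on bi-level optimization (BLO) that alleviates overfitting. BiLoRA parameterizes the low-rank incremental matrices with a pseudo singular value decomposition and splits the training of the pseudo singular vectors and the pseudo singular values across two different subsets of the training data. This split, embedded in separate levels of the BLO framework, reduces the risk of overfitting to a single dataset. Tested on ten datasets covering natural language understanding and generation tasks and applied to well-known large pre-trained models, BiLoRA significantly outperforms LoRA methods and other fine-tuning approaches with a similar number of trainable parameters.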

Low-rank adaptation (LoRA) is a popular method for fine-tuning large-scale pre-trained models in downstream tasks by learning low-rank incremental matrices. Though LoRA and its variants effectively reduce the number of trainable parameters compared to full fine-tuning methods, they often overfit training data, resulting in sub-optimal generalization on test data. To address this problem, we introduce BiLoRA, an overfitting-alleviating fine-tuning approach based on bi-level optimization (BLO). BiLoRA employs pseudo singular value decomposition to parameterize low-rank incremental matrices and splits the training of pseudo singular vectors and values across two different subsets of training data. This division, embedded within separate levels of the BLO framework, mitigates the risk of overfitting to a single dataset. Tested on ten datasets covering natural language understanding and generation tasks and applied to various well-known large pre-trained models, BiLoRA significantly outperforms LoRA methods and other fine-tuning approaches, with similar amounts of trainable parameters.
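The pseudo singular value decomposition mentioned above can be pictured as writing the incremental matrix as a product of a tall matrix, a diagonal of "pseudo singular values", and a wide matrix. The sketch below is a minimal illustration under that assumption; the names `P`, `lam`, `Q`, the shapes, and the parameter count formula are choices made for this example, and the bi-level training loop (vectors and values trained on different data subsets) is not shown.

```python
def delta_w(P, lam, Q):
    """Low-rank update delta_W[i][j] = sum_t P[i][t] * lam[t] * Q[t][j]."""
    d, r, k = len(P), len(lam), len(Q[0])
    return [
        [sum(P[i][t] * lam[t] * Q[t][j] for t in range(r)) for j in range(k)]
        for i in range(d)
    ]

def trainable_params(d, k, r):
    """Parameter counts: full d x k update versus rank-r factors plus r values."""
    return {"full": d * k, "low_rank": r * (d + k) + r}

# Rank-2 update of a 3 x 4 weight matrix.
P = [[1, 0], [0, 1], [1, 1]]      # pseudo left singular vectors (3 x 2)
lam = [2, 3]                      # pseudo singular values (length 2)
Q = [[1, 0, 0, 1], [0, 1, 0, 0]]  # pseudo right singular vectors (2 x 4)

print(delta_w(P, lam, Q))
print(trainable_params(768, 768, 8))
```

For a 768 x 768 layer at rank 8, this count gives 12,296 trainable values instead of 589,824, which is the kind of reduction the abstract refers to.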

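Translated: 2024-03-21 20:59:01 Published: 2024-03-19

# Emotic Masked Autoencoder with Attention Fusion for Facial Expression Recognition

Emotic Masked Autoencoder with Attention Fusion for Facial Expression Recognition ( http://arxiv.org/abs/2403.13039v1 )

License: see link

Bach Nguyen-Xuan, Thien Nguyen-Hoang, Nhu Tai-Do,

(Reference translation) Facial expression recognition (FER) is an important task in computer vision with diverse applications across many domains. Addressing the challenge of limited FER datasets, which impairs the generalization ability of expression recognition models, is essential for improving performance. This paper proposes an innovative approach that integrates the MAE-Face self-supervised learning (SSL) method with a Fusion Attention mechanism for expression classification, showcased in the 6th Affective Behavior Analysis in-the-wild (ABAW) competition. We also propose preprocessing techniques that emphasize important facial features to improve model performance on the training and validation sets, as demonstrated notably on the Aff-wild2 dataset.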

Facial Expression Recognition (FER) is a critical task within computer vision with diverse applications across various domains. Addressing the challenge of limited FER datasets, which hampers the generalization capability of expression recognition models, is imperative for enhancing performance. Our paper presents an innovative approach integrating the MAE-Face self-supervised learning (SSL) method and Fusion Attention mechanism for expression classification, particularly showcased in the 6th Affective Behavior Analysis in-the-wild (ABAW) competition. Additionally, we propose preprocessing techniques to emphasize essential facial features, thereby enhancing model performance on both training and validation sets, notably demonstrated on the Aff-wild2 dataset.

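Translated: 2024-03-21 20:59:01 Published: 2024-03-19

# Physics-Guided Neural Networks for Intraventricular Vector Flow Mapping

Physics-Guided Neural Networks for Intraventricular Vector Flow Mapping ( http://arxiv.org/abs/2403.13040v1 )

License: see link

Hang Jung Ling, Salomé Bru, Julia Puig, Florian Vixège, Simon Mendez, Franck Nicoud, Pierre-Yves Courand, Olivier Bernard, Damien Garcia,

(Reference translation) Intraventricular vector flow mapping (iVFM) aims to enhance and quantify color Doppler in cardiac imaging. In this study, we propose new alternatives to the conventional iVFM optimization scheme using physics-informed neural networks (PINNs) and a supervised approach based on a physics-guided nnU-Net. In a rigorous evaluation on simulated color Doppler images derived from a patient-specific computational fluid dynamics model and on in vivo Doppler acquisitions, both approaches showed reconstruction performance comparable to the original iVFM algorithm. The efficiency of the PINNs is improved by dual-stage optimization and pre-optimized weights, while the nnU-Net method excels in generalizability and real-time capability. In particular, nnU-Net is more robust on sparse and truncated Doppler data while remaining independent of explicit boundary conditions. These results highlight the effectiveness of both methods for reconstructing intraventricular vector blood flow. The study also points to potential applications of PINNs in ultrafast color Doppler imaging and to incorporating fluid dynamics equations to derive flow-based biomarkers of cardiovascular disease.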

Intraventricular vector flow mapping (iVFM) seeks to enhance and quantify color Doppler in cardiac imaging. In this study, we propose novel alternatives to the traditional iVFM optimization scheme by utilizing physics-informed neural networks (PINNs) and a physics-guided nnU-Net-based supervised approach. Through rigorous evaluation on simulated color Doppler images derived from a patient-specific computational fluid dynamics model and in vivo Doppler acquisitions, both approaches demonstrate reconstruction performance comparable to the original iVFM algorithm. The efficiency of PINNs is boosted through dual-stage optimization and pre-optimized weights. On the other hand, the nnU-Net method excels in generalizability and real-time capabilities. Notably, nnU-Net shows superior robustness on sparse and truncated Doppler data while maintaining independence from explicit boundary conditions. Overall, our results highlight the effectiveness of these methods in reconstructing intraventricular vector blood flow. The study also suggests potential applications of PINNs in ultrafast color Doppler imaging and the incorporation of fluid dynamics equations to derive biomarkers for cardiovascular diseases based on blood flow.

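Translated: 2024-03-21 20:59:01 Published: 2024-03-19

# Provable Privacy with Non-Private Pre-Processing

Provable Privacy with Non-Private Pre-Processing ( http://arxiv.org/abs/2403.13041v1 )

License: see link

Yaxi Hu, Amartya Sanyal, Bernhard Schölkopf,

(Reference translation) When differentially private (DP) machine learning pipelines are analysed, the potential privacy cost of data-dependent pre-processing is often overlooked in privacy accounting. In this work, we propose a general framework for evaluating the additional privacy cost incurred by non-private, data-dependent pre-processing algorithms. The framework establishes upper bounds on the overall privacy guarantee by using two new technical notions: a variant of DP called Smooth DP and the bounded sensitivity of the pre-processing algorithm. Beyond the generic framework, we provide explicit overall privacy guarantees for several data-dependent pre-processing algorithms, such as data imputation, quantization, deduplication, and PCA, when they are combined with several DP algorithms. The framework is also simple to implement and can be integrated directly into existing DP pipelines.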

When analysing Differentially Private (DP) machine learning pipelines, the potential privacy cost of data-dependent pre-processing is frequently overlooked in privacy accounting. In this work, we propose a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms. Our framework establishes upper bounds on the overall privacy guarantees by utilising two new technical notions: a variant of DP termed Smooth DP and the bounded sensitivity of the pre-processing algorithms. In addition to the generic framework, we provide explicit overall privacy guarantees for multiple data-dependent pre-processing algorithms, such as data imputation, quantization, deduplication and PCA, when used in combination with several DP algorithms. Notably, this framework is also simple to implement, allowing direct integration into existing DP pipelines.
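One way to see why data-dependent pre-processing carries a privacy cost is to look at what it does to neighbouring datasets. The toy sketch below (an illustration only, not the paper's analysis) uses mean imputation: changing a single record shifts the imputed value, so the two pre-processed datasets differ in more than one record, and a DP guarantee calibrated for one-record neighbours no longer applies directly. The function names and the data are made up for this example.

```python
def impute_mean(records):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [x for x in records if x is not None]
    mean = sum(observed) / len(observed)
    return [mean if x is None else x for x in records]

def changed_positions(a, b):
    """Number of positions at which two equally long datasets differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

# Two neighbouring datasets: only the last record differs.
d1 = [1.0, 3.0, None, None, 5.0]
d2 = [1.0, 3.0, None, None, 11.0]

p1 = impute_mean(d1)
p2 = impute_mean(d2)

print(changed_positions(d1, d2))  # difference before pre-processing
print(changed_positions(p1, p2))  # difference after pre-processing
```

Here the single changed record also moves both imputed entries, so the downstream DP algorithm sees inputs that differ in three positions rather than one.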

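Translated: 2024-03-21 20:59:01 Published: 2024-03-19

# TAPTR: Tracking Any Point with Transformers as Detection

TAPTR: Tracking Any Point with Transformers as Detection ( http://arxiv.org/abs/2403.13042v1 )

License: see link

Hongyang Li, Hao Zhang, Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Lei Zhang,

(Reference translation) This paper proposes a simple and strong framework for Tracking Any Point with TRansformers (TAPTR). Based on the observation that point tracking closely resembles object detection and tracking, we borrow designs from DETR-like algorithms to address the TAP task. In the proposed framework, each tracking point in each video frame is represented as a point query consisting of a positional part and a content part. As in DETR, each query (its position and content feature) is naturally updated layer by layer, and its visibility is predicted from its updated content feature. Queries that belong to the same tracking point can exchange information through self-attention along the temporal dimension. Because all of these operations are well designed in DETR-like algorithms, the model is conceptually very simple. We also adopt useful designs such as the cost volume from optical flow models and develop simple designs that provide long-range temporal information while mitigating the feature drifting issue. The framework achieves state-of-the-art performance on various TAP datasets with faster inference speed.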

In this paper, we propose a simple and strong framework for Tracking Any Point with TRansformers (TAPTR). Based on the observation that point tracking bears a great resemblance to object detection and tracking, we borrow designs from DETR-like algorithms to address the task of TAP. In the proposed framework, in each video frame, each tracking point is represented as a point query, which consists of a positional part and a content part. As in DETR, each query (its position and content feature) is naturally updated layer by layer. Its visibility is predicted by its updated content feature. Queries belonging to the same tracking point can exchange information through self-attention along the temporal dimension. As all such operations are well-designed in DETR-like algorithms, the model is conceptually very simple. We also adopt some useful designs such as cost volume from optical flow models and develop simple designs to provide long temporal information while mitigating the feature drifting issue. Our framework achieves state-of-the-art performance on various TAP datasets with faster inference speed.

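Translated: 2024-03-21 20:59:01 Published: 2024-03-19

# When Do We Not Need Larger Vision Models?

When Do We Not Need Larger Vision Models? ( http://arxiv.org/abs/2403.13043v1 )

License: see link

Baifeng Shi, Ziyang Wu, Maolin Mao, Xin Wang, Trevor Darrell,

(Reference translation) Scaling up the size of vision models has become the de facto standard for obtaining more powerful visual representations. This paper discusses the point beyond which larger vision models are not necessary. First, we demonstrate the power of Scaling on Scales (S$^2$): a pre-trained and frozen smaller vision model (e.g., ViT-B or ViT-L) run over multiple image scales can outperform larger models (e.g., ViT-H or ViT-G) on classification, segmentation, depth estimation, multimodal LLM (MLLM) benchmarks, and robotic manipulation. In particular, S$^2$ achieves state-of-the-art performance in detailed understanding for MLLMs on the V* benchmark, surpassing models such as GPT-4V. We examine the conditions under which S$^2$ is preferable to scaling model size. Larger models have the advantage of generalizing better on hard examples, but we show that the features of larger vision models can be well approximated by those of multi-scale smaller models. This suggests that most, if not all, of the representations learned by current large pre-trained models can also be obtained from multi-scale smaller models. Our results show that a multi-scale smaller model has learning capacity comparable to a larger model, and that pre-training smaller models with S$^2$ can match or even exceed the advantage of larger models. We release a Python package that can apply S$^2$ to any vision model with one line of code.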

Scaling up the size of vision models has been the de facto standard to obtain more powerful visual representations. In this work, we discuss the point beyond which larger vision models are not necessary. First, we demonstrate the power of Scaling on Scales (S$^2$), whereby a pre-trained and frozen smaller vision model (e.g., ViT-B or ViT-L), run over multiple image scales, can outperform larger models (e.g., ViT-H or ViT-G) on classification, segmentation, depth estimation, Multimodal LLM (MLLM) benchmarks, and robotic manipulation. Notably, S$^2$ achieves state-of-the-art performance in detailed understanding of MLLM on the V* benchmark, surpassing models such as GPT-4V. We examine the conditions under which S$^2$ is a preferred scaling approach compared to scaling on model size. While larger models have the advantage of better generalization on hard examples, we show that features of larger vision models can be well approximated by those of multi-scale smaller models. This suggests most, if not all, of the representations learned by current large pre-trained models can also be obtained from multi-scale smaller models. Our results show that a multi-scale smaller model has comparable learning capacity to a larger model, and pre-training smaller models with S$^2$ can match or even exceed the advantage of larger models. We release a Python package that can apply S$^2$ on any vision model with one line of code: https://github.com/bfshi/scaling_on_scales.
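The idea of running one frozen model over several image scales can be sketched in a few lines. The toy code below is not the released package and does not use its API: `toy_model` is a stand-in for a frozen backbone, and the choices of nearest-neighbour upsampling, non-overlapping tiles of the base size, average pooling over tiles, and concatenation across scales are assumptions made for this illustration.

```python
def toy_model(img):
    """Stand-in for a frozen backbone: returns [mean, max] of the pixel values."""
    flat = [p for row in img for p in row]
    return [sum(flat) / len(flat), max(flat)]

def upsample(img, factor):
    """Nearest-neighbour upsampling of a square image by an integer factor."""
    return [
        [p for p in row for _ in range(factor)]
        for row in img
        for _ in range(factor)
    ]

def tiles(img, size):
    """Split a square image into non-overlapping size x size tiles."""
    n = len(img)
    return [
        [row[c:c + size] for row in img[r:r + size]]
        for r in range(0, n, size)
        for c in range(0, n, size)
    ]

def multiscale_features(img, scales=(1, 2)):
    """Run the same model at each scale, pool over tiles, concatenate the results."""
    size = len(img)
    feats = []
    for s in scales:
        per_tile = [toy_model(t) for t in tiles(upsample(img, s), size)]
        pooled = [
            sum(f[i] for f in per_tile) / len(per_tile)
            for i in range(len(per_tile[0]))
        ]
        feats.extend(pooled)
    return feats

image = [[0, 1], [2, 3]]
features = multiscale_features(image)
print(features)
```

The feature length grows with the number of scales while the model itself, and therefore its parameter count, stays fixed.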

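Translated: 2024-03-21 20:59:01 Published: 2024-03-19

# Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos

Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos ( http://arxiv.org/abs/2403.13044v1 )

License: see link

Hadi Alzayer, Zhihao Xia, Xuaner Zhang, Eli Shechtman, Jia-Bin Huang, Michael Gharbi,

(Reference translation) We propose a generative model that, given a coarsely edited image, synthesizes a photorealistic output that follows the prescribed layout. The method transfers fine details from the original image and preserves the identity of its parts, while adapting them to the lighting and context defined by the new layout. Our key insight is that videos are a powerful source of supervision for this task: object and camera motion provide many observations of how the world changes with viewpoint, lighting, and physical interactions. We build an image dataset in which each sample is a pair of source and target frames extracted from the same video at randomly chosen time intervals. We warp the source frame toward the target using two motion models that mimic the user edits expected at test time. Starting from a pretrained diffusion model, we supervise the model to translate the warped image into the ground truth. The model design enables fine detail transfer from the source frame to the generated image while closely following the user-specified layout. We show that simple segmentations and coarse 2D manipulations are enough to synthesize a photorealistic edit that is faithful to the user's input while handling second-order effects such as harmonizing the lighting and physical interactions between edited objects.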

We propose a generative model that, given a coarsely edited image, synthesizes a photorealistic output that follows the prescribed layout. Our method transfers fine details from the original image and preserves the identity of its parts. Yet, it adapts it to the lighting and context defined by the new layout. Our key insight is that videos are a powerful source of supervision for this task: objects and camera motions provide many observations of how the world changes with viewpoint, lighting, and physical interactions. We construct an image dataset in which each sample is a pair of source and target frames extracted from the same video at randomly chosen time intervals. We warp the source frame toward the target using two motion models that mimic the expected test-time user edits. We supervise our model to translate the warped image into the ground truth, starting from a pretrained diffusion model. Our model design explicitly enables fine detail transfer from the source frame to the generated image, while closely following the user-specified layout. We show that by using simple segmentations and coarse 2D manipulations, we can synthesize a photorealistic edit faithful to the user's input while addressing second-order effects like harmonizing the lighting and physical interactions between edited objects.

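Translated: 2024-03-21 20:59:01 Published: 2024-03-19

# Noncommuting charges' effect on the thermalization of local observables

Noncommuting charges' effect on the thermalization of local observables ( http://arxiv.org/abs/2403.13046v1 )

License: see link

Shayan Majidy,

(Reference translation) The study of noncommuting conserved quantities (or "charges") has produced a conceptual puzzle. Recent results suggest that noncommuting charges hinder thermalization in some ways yet promote it in others. To help resolve this puzzle, we show how noncommuting charges can promote thermalization by reducing the number of local observables that thermalize according to the Eigenstate Thermalization Hypothesis. We first establish a correspondence between charges and sufficient conditions for observables not to thermalize. These conditions are known as "dynamical symmetries." We show that for each pair of dynamical symmetries that a Hamiltonian has, there is a corresponding charge, and we prove that the reciprocal relationship holds for a broad range of charges. From this correspondence, we show that introducing a new charge into a system can either contribute to or disrupt its existing non-stationary dynamics. If the new charge commutes with the existing ones, the system's non-stationary dynamics remain intact and new ones emerge; if it does not, the existing non-stationary dynamics are removed. We illustrate the results with various models. The results demonstrate a facet of thermalization that noncommuting charges promote.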

Studying noncommuting conserved quantities (or "charges") has produced a conceptual puzzle. Recent results suggest that noncommuting charges hinder thermalization in some ways, yet promote it in others. To help resolve this puzzle, we demonstrate how noncommuting charges can promote thermalization by reducing the number of local observables that thermalize according to the Eigenstate Thermalization Hypothesis. We first establish a correspondence between charges and sufficient conditions for observables to not thermalize. These conditions are known as "dynamical symmetries." We demonstrate that for each pair of dynamical symmetries a Hamiltonian has, there exists a corresponding charge. We prove that the reciprocal relationship holds for a broad range of charges. From this correspondence, we demonstrate that introducing a new charge to a system can either contribute to or disrupt its existing non-stationary dynamics. If the new charge commutes with existing ones, the system's non-stationary dynamics remain intact, and new ones emerge; if not, the existing non-stationary dynamics are removed. We illustrate our results using various models. Our results demonstrate a facet of thermalization which noncommuting charges promote.
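For readers less familiar with the term, "noncommuting" simply means that the operators representing the charges have a nonzero commutator. The sketch below checks this for the Pauli matrices, a standard textbook example chosen for this illustration; it is not one of the models studied in the paper.

```python
def matmul(a, b):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(2)) for j in range(2)]
        for i in range(2)
    ]

def commutator(a, b):
    """[a, b] = ab - ba for 2 x 2 matrices."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]

sigma_x = [[0, 1], [1, 0]]
sigma_y = [[0, -1j], [1j, 0]]
sigma_z = [[1, 0], [0, -1]]

# Different spin components do not commute: [sigma_x, sigma_y] = 2i * sigma_z.
print(commutator(sigma_x, sigma_y))

# Any operator commutes with itself.
print(commutator(sigma_z, sigma_z))
```

A system that conserves all three spin components therefore has charges that cannot all be given definite values at the same time, which is the setting the abstract is concerned with.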

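Translated: 2024-03-21 20:59:01 Published: 2024-03-19

# SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model

SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model ( http://arxiv.org/abs/2403.13064v1 )

License: see link

Armen Avetisyan, Christopher Xie, Henry Howard-Jenkins, Tsun-Yi Yang, Samir Aroudj, Suvam Patra, Fuyang Zhang, Duncan Frost, Luke Holland, Campbell Orme, Jakob Engel, Edward Miller, Richard Newcombe, Vasileios Balntas,

(Reference translation) SceneScript is a method that directly produces full scene models as a sequence of structured language commands using an autoregressive, token-based approach. The proposed scene representation is inspired by the recent success of transformers and LLMs and departs from more traditional methods that commonly describe scenes as meshes, voxel grids, point clouds, or radiance fields. The method infers the set of structured language commands directly from encoded visual data using a scene language encoder-decoder architecture. To train SceneScript, we generate and release a large-scale synthetic dataset called Aria Synthetic Environments, consisting of 100k high-quality indoor scenes. The method gives state-of-the-art results in architectural layout estimation and competitive results in 3D object detection. Finally, we explore an advantage of SceneScript: it can readily adapt to new commands through simple additions to the structured language.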

We introduce SceneScript, a method that directly produces full scene models as a sequence of structured language commands using an autoregressive, token-based approach. Our proposed scene representation is inspired by recent successes in transformers & LLMs, and departs from more traditional methods which commonly describe scenes as meshes, voxel grids, point clouds or radiance fields. Our method infers the set of structured language commands directly from encoded visual data using a scene language encoder-decoder architecture. To train SceneScript, we generate and release a large-scale synthetic dataset called Aria Synthetic Environments consisting of 100k high-quality indoor scenes, with photorealistic and ground-truth annotated renders of egocentric scene walkthroughs. Our method gives state-of-the-art results in architectural layout estimation, and competitive results in 3D object detection. Lastly, we explore an advantage for SceneScript, which is the ability to readily adapt to new commands via simple additions to the structured language, which we illustrate for tasks such as coarse 3D object part reconstruction.
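To make "a sequence of structured language commands" concrete, the sketch below parses a small scene description written in a made-up command syntax. The command names, parameter names, and comma-separated `key=value` format are hypothetical and chosen only for this illustration; the abstract does not define the actual language.

```python
def parse_command(line):
    """Parse 'name, key=value, ...' into (name, {key: float})."""
    name, _, rest = line.partition(",")
    params = {}
    for item in rest.split(","):
        item = item.strip()
        if item:
            key, value = item.split("=")
            params[key.strip()] = float(value)
    return name.strip(), params

def parse_scene(script):
    """Parse one command per non-empty line."""
    return [parse_command(line) for line in script.splitlines() if line.strip()]

# Hypothetical scene script: one wall and a door attached to it.
script = """
make_wall, x0=0, y0=0, x1=4, y1=0, height=2.5
make_door, wall_id=0, x=1.5, width=0.9
"""

scene = parse_scene(script)
for name, params in scene:
    print(name, params)
```

Under this kind of representation, supporting a new element type amounts to adding a new command name and its parameters, which is the adaptability the abstract highlights.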

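Translated: 2024-03-21 20:59:01 Published: 2024-03-19

# Strong coupling and single-photon nonlinearity in free-electron quantum optics

Strong coupling and single-photon nonlinearity in free-electron quantum optics ( http://arxiv.org/abs/2403.13071v1 )

License: see link

Aviv Karnieli, Charles Roques-Carmes, Nicholas Rivera, Shanhui Fan,

(Reference translation) The observation that free electrons can interact coherently with quantized electromagnetic fields and matter systems has led to many proposals that exploit the unique quantum properties of free electrons. At the heart of these proposals is the assumption of a strong quantum interaction between a flying free electron and a photonic mode. However, existing schemes are intrinsically limited by electron diffraction, which places an upper bound on the interaction length and the quantum coupling strength. Here we propose the use of "free-electron fibers": effectively one-dimensional photonic systems in which free electrons co-propagate with two guided modes. The first mode applies a ponderomotive trap to the free electron, effectively lifting the limitation due to electron diffraction. The second mode couples strongly to the guided free electron, with an enhanced coupling that is orders of magnitude larger than in previous designs. Moreover, the extended interaction lengths enabled by the scheme allow strong single-photon nonlinearities mediated by free electrons. We predict several interesting observable quantum effects in the system, such as deterministic single-photon emission and complex nonlinear multimode dynamics. The proposal paves the way toward realizing many anticipated effects in free-electron quantum optics, such as non-Gaussian light generation, deterministic single-photon emission, and quantum gates controlled by free-electron-photon interactions.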

The observation that free electrons can interact coherently with quantized electromagnetic fields and matter systems has led to a plethora of proposals leveraging the unique quantum properties of free electrons. At the heart of these proposals lies the assumption of a strong quantum interaction between a flying free electron and a photonic mode. However, existing schemes are intrinsically limited by electron diffraction, which puts an upper bound on the interaction length and therefore the quantum coupling strength. Here, we propose the use of "free-electron fibers": effectively one-dimensional photonic systems where free electrons co-propagate with two guided modes. The first mode applies a ponderomotive trap to the free electron, effectively lifting the limitations due to electron diffraction. The second mode strongly couples to the guided free electron, with an enhanced coupling that is orders of magnitude larger than previous designs. Moreover, the extended interaction lengths enabled by our scheme allow for strong single-photon nonlinearities mediated by free electrons. We predict a few interesting observable quantum effects in our system, such as deterministic single-photon emission and complex, nonlinear multimode dynamics. Our proposal paves the way towards the realization of many anticipated effects in free-electron quantum optics, such as non-Gaussian light generation, deterministic single-photon emission, and quantum gates controlled by free-electron--photon interactions.