Fugu-MT: arXiv Paper Translations

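This site translates into Japanese those arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not under a CC license, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translations are produced with Fugu-Machine Translator.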

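The operator of this site makes no guarantee of the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use.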

Papers with a publication date of 2023-07-08.

Title	Authors	Abstract	Publication date / translation date
# Employing Deep Learning and Structured Information Retrieval to Answer Clarification Questions on Bug Reports ( http://arxiv.org/abs/2304.12494v3 ) License: see link	Usmi Mukherjee and Mohammad Masudur Rahman	Software bug reports reported on bug-tracking systems often lack crucial information for the developers to promptly resolve them, costing companies billions of dollars. There has been significant research on effectively eliciting information from bug reporters in bug tracking systems using different templates that bug reporters need to use. However, the need for asking follow-up questions persists. Recent studies propose techniques to suggest these follow-up questions to help developers obtain the missing details, but there has been little research on answering these follow-up questions, which are often unanswered. In this paper, we propose a novel approach that uses CodeT5 in combination with Lucene, an information retrieval technique that leverages the relevance of different bug reports, their components, and follow-up questions to recommend answers. These top-performing answers, along with their bug reports, serve as additional context, alongside the deficient bug report, for the deep learning model that generates an answer. We evaluate our recommended answers against the manually annotated answers using similarity metrics like Normalized Smooth BLEU Score, METEOR, Word Mover's Distance, and Semantic Similarity. We achieve a BLEU Score of up to 34 and Semantic Similarity of up to 64, which shows that the answers generated are understandable and good according to Google's standard and can outperform multiple baselines.	Translated: 2023-10-24 12:37:00, Published: 2023-07-08
# ReviewRanker: A Semi-Supervised Learning Based Approach for Code Review Quality Estimation ( http://arxiv.org/abs/2307.03996v1 ) License: see link	Saifullah Mahbub, Md. Easin Arafat, Chowdhury Rafeed Rahman, Zannatul Ferdows, Masum Hasan	Code review is considered a key process in the software industry for minimizing bugs and improving code quality. Inspection of review process effectiveness and continuous improvement can boost development productivity. Such inspection is a time-consuming and human-bias-prone task. We propose ReviewRanker, a semi-supervised learning based system that assigns each code review a confidence score expected to reflect the quality of the review. Our proposed method is trained on simple and well-defined labels provided by developers. The labeling task requires little to no effort from the developers and has an indirect relation to the end goal (assignment of review confidence score). ReviewRanker is expected to improve industry-wide code review quality inspection by reducing the human bias and effort required for such a task. The system has the potential to minimize the back-and-forth cycle in the development and review process. Usable code and dataset for this research can be found at: https://github.com/saifarnab/code_review	Translated: 2023-10-23 18:17:16, Published: 2023-07-08
# Exploring Automated Code Evaluation Systems and Resources for Code Analysis: A Comprehensive Survey ( http://arxiv.org/abs/2307.08705v1 ) License: see link	Md. Mostafizer Rahman, Yutaka Watanobe, Atsushi Shirafuji and Mohamed Hamada	The automated code evaluation system (AES) is mainly designed to reliably assess user-submitted code. Due to their extensive range of applications and the accumulation of valuable resources, AESs are becoming increasingly popular. Research on the application of AESs and their real-world resource exploration for diverse coding tasks is still lacking. In this study, we conducted a comprehensive survey on AESs and their resources. This survey explores the application areas of AESs, available resources, and resource utilization for coding tasks. AESs are categorized into programming contests, programming learning and education, recruitment, online compilers, and additional modules, depending on their application. We explore the available datasets and other resources of these systems for research, analysis, and coding tasks. Moreover, we provide an overview of machine learning-driven coding tasks, such as bug detection, code review, comprehension, refactoring, search, representation, and repair. These tasks are performed using real-life datasets. In addition, we briefly discuss the Aizu Online Judge (AOJ) platform as a real example of an AES from the perspectives of system design (hardware and software), operation (competition and education), and research. We chose the AOJ platform because of its scalability (programming education, competitions, and practice), open internal features (hardware and software), attention from the research community, open-source data (e.g., solution codes and submission documents), and transparency. We also analyze the overall performance of this system and the perceived challenges over the years.	Translated: 2023-10-23 17:11:05, Published: 2023-07-08
# Combining transmission speckle photography and convolutional neural network for determination of fat content in cow's milk -- an exercise in classification of parameters of a complex suspension ( http://arxiv.org/abs/2307.15069v1 ) License: see link	Kwasi Nyandey (1 and 2) and Daniel Jakubczyk (1) ((1) Institute of Physics, Polish Academy of Sciences, Warsaw, Poland (2) Laser and Fibre Optics Centre, Department of Physics, School of Physical Sciences, College of Agriculture and Natural Sciences, University of Cape Coast, Cape Coast, Ghana)	We have combined transmission speckle photography and machine learning for direct classification and recognition of milk fat content classes. Our aim was based on the fact that parameters of scattering particles (and the dispersion medium) can be linked to the intensity distribution (speckle) observed when coherent light is transmitted through a scattering medium. For milk, these are primarily the size distribution and concentration of fat globules, which constitute the total fat content. Consequently, we trained a convolutional neural network to recognise and classify laser speckle from different fat content classes (0.5, 1.5, 2.0 and 3.2%). We investigated four exposure-time protocols and obtained the highest performance for shorter exposure times, in which the intensity histograms are kept similar for all images and the most probable intensity in the speckle pattern is close to zero. Our neural network was able to recognize the milk fat content classes unambiguously, and we obtained the highest test and independent classification accuracies of 100% and ~99%, respectively. This indicates that the parameters of other complex realistic suspensions could be classified with similar methods.	Translated: 2023-08-06 11:37:27, Published: 2023-07-08
# Copilot for Xcode: Exploring AI-Assisted Programming by Prompting Cloud-based Large Language Models ( http://arxiv.org/abs/2307.14349v1 ) License: see link	Chee Wei Tan, Shangxin Guo, Man Fai Wong, Ching Nam Hang	This paper presents an AI-assisted programming tool called Copilot for Xcode for program composition and design to support human software developers. By seamlessly integrating cloud-based Large Language Models (LLMs) with Apple's local development environment, Xcode, this tool enhances productivity and unleashes creativity for software development in the Apple software ecosystem (e.g., iOS apps, macOS). Leveraging advanced natural language processing (NLP) techniques, Copilot for Xcode effectively processes source code tokens and patterns within code repositories, enabling features such as code generation, autocompletion, documentation, and error detection. Software developers can also query and make "small" decisions for program composition, some of which can be made simultaneously, and this is facilitated through prompt engineering in a chat interface of Copilot for Xcode. Finally, we present simple case studies as evidence of the effectiveness of utilizing NLP in Xcode to prompt popular LLM services like OpenAI ChatGPT for program composition and design.	Translated: 2023-07-30 03:57:44, Published: 2023-07-08
# Can LLMs be Good Financial Advisors?: An Initial Study in Personal Decision Making for Optimized Outcomes ( http://arxiv.org/abs/2307.07422v1 ) License: see link	Kausik Lakkaraju, Sai Krishna Revanth Vuruma, Vishal Pallagani, Bharath Muppasani, Biplav Srivastava	Increasingly powerful Large Language Model (LLM) based chatbots, like ChatGPT and Bard, are becoming available to users, and they have the potential to revolutionize the quality of decision-making achieved by the public. In this context, we set out to investigate how such systems perform in the personal finance domain, where financial inclusion has been an overarching stated aim of banks for decades. We asked 13 questions representing banking products in personal finance: bank accounts, credit cards, and certificates of deposit and their inter-product interactions, and decisions related to high-value purchases, payment of bank dues, and investment advice, in different dialects and languages (English, African American Vernacular English, and Telugu). We find that although the outputs of the chatbots are fluent and plausible, there are still critical gaps in providing accurate and reliable financial information using LLM-based chatbots.	Translated: 2023-07-23 12:26:09, Published: 2023-07-08
# Incomplete Utterance Rewriting as Sequential Greedy Tagging ( http://arxiv.org/abs/2307.06337v1 ) License: see link	Yunshan Chen	The task of incomplete utterance rewriting has recently received much attention. Previous models struggled to extract information from the dialogue context, as evidenced by their low restoration scores. To address this issue, we propose a novel sequence tagging-based model, which is more adept at extracting information from context. Meanwhile, we introduce speaker-aware embedding to model speaker variation. Experiments on multiple public datasets show that our model achieves optimal results on all nine restoration scores while having other metric scores comparable to previous state-of-the-art models. Furthermore, benefiting from the model's simplicity, our approach outperforms most previous models in inference speed.	Translated: 2023-07-16 03:14:38, Published: 2023-07-08
# Predicting Race and Ethnicity From the Sequence of Characters in a Name ( http://arxiv.org/abs/1805.02109v2 ) License: see link	Rajashekar Chintalapati, Suriyan Laohaprapanon, and Gaurav Sood	To answer questions about racial inequality and fairness, we often need a way to infer race and ethnicity from names. One way to infer race and ethnicity from names is by relying on the Census Bureau's list of popular last names. The list, however, suffers from at least three limitations: 1. it only contains last names, 2. it only includes popular last names, and 3. it is updated once every 10 years. To provide better generalization, and higher accuracy when first names are available, we model the relationship between characters in a name and race and ethnicity using various techniques. A model using Long Short-Term Memory works best, with an out-of-sample accuracy of .85. The best-performing last-name model achieves an out-of-sample accuracy of .81. To illustrate the utility of the models, we apply them to campaign finance data to estimate the share of donations made by people of various racial groups, and to news data to estimate the coverage of various races and ethnicities in the news.	Translated: 2023-07-13 20:54:43, Published: 2023-07-08
# Kencorpus: A Kenyan Language Corpus of Swahili, Dholuo and Luhya for Natural Language Processing Tasks ( http://arxiv.org/abs/2208.12081v2 ) License: see link	Barack Wanjawa, Lilian Wanzare, Florence Indede, Owen McOnyango, Edward Ombui, Lawrence Muchemi	Indigenous African languages are categorized as under-served in Natural Language Processing. They therefore experience poor digital inclusivity and information access. The processing challenge with such languages has been how to use machine learning and deep learning models without the requisite data. The Kencorpus project intends to bridge this gap by collecting and storing text and speech data that is good enough for data-driven solutions in applications such as machine translation, question answering and transcription in multilingual communities. The Kencorpus dataset is a text and speech corpus for three languages predominantly spoken in Kenya: Swahili, Dholuo and Luhya. Data collection was done by researchers from communities, schools, media, and publishers. The Kencorpus dataset has a collection of 5,594 items: 4,442 texts (5.6M words) and 1,152 speech files (177 hrs). Based on this data, Part of Speech tagging sets for Dholuo and Luhya (50,000 and 93,000 words respectively) were developed. We developed 7,537 Question-Answer pairs for Swahili and created a text translation set of 13,400 sentences from Dholuo and Luhya into Swahili. The datasets are useful for downstream machine learning tasks such as model training and translation. We also developed two proof-of-concept systems, a Kiswahili speech-to-text system and a machine learning system for the Question Answering task, with results of 18.87% word error rate and 80% Exact Match (EM) respectively. These initial results give great promise to the usability of Kencorpus to the machine learning community. Kencorpus is one of few public domain corpora for these three low-resource languages and forms a basis of learning and sharing experiences for similar works, especially for low-resource languages.	Translated: 2023-07-13 20:24:16, Published: 2023-07-08
# Symmetric Tensor Networks for Generative Modeling and Constrained Combinatorial Optimization ( http://arxiv.org/abs/2211.09121v3 ) License: see link	Javier Lopez-Piqueres, Jing Chen, Alejandro Perdomo-Ortiz	Constrained combinatorial optimization problems abound in industry, from portfolio optimization to logistics. One of the major roadblocks in solving these problems is the presence of non-trivial hard constraints which limit the valid search space. In some heuristic solvers, these are typically addressed by introducing certain Lagrange multipliers in the cost function, by relaxing them in some way, or worse yet, by generating many samples and only keeping valid ones, which leads to very expensive and inefficient searches. In this work, we encode arbitrary integer-valued equality constraints of the form Ax=b directly into U(1) symmetric tensor networks (TNs) and leverage their applicability as quantum-inspired generative models to assist in the search of solutions to combinatorial optimization problems. This allows us to exploit the generalization capabilities of TN generative models while constraining them so that they only output valid samples. Our constrained TN generative model efficiently captures the constraints by reducing the number of parameters and computational costs. We find that on tasks with constraints given by arbitrary equalities, symmetric Matrix Product States outperform their standard unconstrained counterparts at finding novel and better solutions to combinatorial optimization problems.	Translated: 2023-07-13 20:04:23, Published: 2023-07-08
# Deep Generative Models for Decision-Making and Control ( http://arxiv.org/abs/2306.08810v2 ) License: see link	Michael Janner	Deep model-based reinforcement learning methods offer a conceptually simple approach to the decision-making and control problem: use learning to estimate an approximate dynamics model, and offload the rest of the work to classical trajectory optimization. However, this combination has a number of empirical shortcomings, limiting the usefulness of model-based methods in practice. The dual purpose of this thesis is to study the reasons for these shortcomings and to propose solutions for the uncovered problems. Along the way, we highlight how inference techniques from the contemporary generative modeling toolbox, including beam search, classifier-guided sampling, and image inpainting, can be reinterpreted as viable planning strategies for reinforcement learning problems.	Translated: 2023-07-13 18:59:43, Published: 2023-07-08
# Coupling high-overtone bulk acoustic wave resonators via superconducting qubits ( http://arxiv.org/abs/2307.05544v1 ) License: see link	Wayne Crump, Alpo Välimaa, and Mika A. Sillanpää	In this work, we present a device consisting of two coupled transmon qubits, each of which is coupled to an independent high-overtone bulk acoustic wave resonator (HBAR). Both HBAR resonators support a plethora of acoustic modes, which can couple to the qubit near resonantly. We first show qubit-qubit interaction in the multimode system, and finally quantum state transfer where an excitation is swapped from an HBAR mode of one qubit to an HBAR mode of the other qubit.	Translated: 2023-07-13 16:27:30, Published: 2023-07-08
# Typology of Risks of Generative Text-to-Image Models ( http://arxiv.org/abs/2307.05543v1 ) License: see link	Charlotte Bird and Eddie L. Ungless and Atoosa Kasirzadeh	This paper investigates the direct risks and harms associated with modern text-to-image generative models, such as DALL-E and Midjourney, through a comprehensive literature review. While these models offer unprecedented capabilities for generating images, their development and use introduce new types of risk that require careful consideration. Our review reveals significant knowledge gaps concerning the understanding and treatment of these risks, despite some already being addressed. We offer a taxonomy of risks across six key stakeholder groups, inclusive of unexplored issues, and suggest future research directions. We identify 22 distinct risk types, spanning issues from data bias to malicious use. The investigation presented here is intended to enhance the ongoing discourse on responsible model development and deployment. By highlighting previously overlooked risks and gaps, it aims to shape subsequent research and governance initiatives, guiding them toward the responsible, secure, and ethically conscious evolution of text-to-image models.	Translated: 2023-07-13 16:27:20, Published: 2023-07-08
# High Fidelity 3D Hand Shape Reconstruction via Scalable Graph Frequency Decomposition ( http://arxiv.org/abs/2307.05541v1 ) License: see link	Tianyu Luan, Yuanhao Zhai, Jingjing Meng, Zhong Li, Zhang Chen, Yi Xu, and Junsong Yuan	Despite the impressive performance obtained by recent single-image hand modeling techniques, they lack the capability to capture sufficient details of the 3D hand mesh. This deficiency greatly limits their applications when high-fidelity hand modeling is required, e.g., personalized hand modeling. To address this problem, we design a frequency split network to generate 3D hand mesh using different frequency bands in a coarse-to-fine manner. To capture high-frequency personalized details, we transform the 3D mesh into the frequency domain, and propose a novel frequency decomposition loss to supervise each frequency component. By leveraging such a coarse-to-fine scheme, hand details that correspond to the higher frequency domain can be preserved. In addition, the proposed network is scalable, and can stop the inference at any resolution level to accommodate different hardware with varying computational powers. To quantitatively evaluate the performance of our method in terms of recovering personalized shape details, we introduce a new evaluation metric named Mean Signal-to-Noise Ratio (MSNR) to measure the signal-to-noise ratio of each mesh frequency component. Extensive experiments demonstrate that our approach generates fine-grained details for high-fidelity 3D hand reconstruction, and our evaluation metric is more effective for measuring mesh details compared with traditional metrics.	Translated: 2023-07-13 16:27:04, Published: 2023-07-08
# Advancements in Scientific Controllable Text Generation Methods ( http://arxiv.org/abs/2307.05538v1 ) License: see link	Arnav Goel, Medha Hira, Avinash Anand, Siddhesh Bangar, Dr. Rajiv Ratn Shah	In this study, we organize previous work on controllable text generation using a new schema. The schema consists of seven components, each of which is crucial to the generation process. To accomplish controlled generation for scientific literature, we describe the various modulation strategies used to modulate each of the seven components. We also offer a theoretical study and qualitative examination of these methods. This insight makes possible new architectures based on combinations of these components. Future research will compare these methods empirically to learn more about their strengths and utility.	Translated: 2023-07-13 16:26:40, Published: 2023-07-08
# NLP Meets RNA: Unsupervised Embedding Learning for Ribozymes with Word2Vec ( http://arxiv.org/abs/2307.05537v1 ) License: see link	Andrew Kean Gao	Ribozymes, RNA molecules with distinct 3D structures and catalytic activity, have widespread applications in synthetic biology and therapeutics. However, relatively little research has focused on leveraging deep learning to enhance our understanding of ribozymes. This study implements Word2Vec, an unsupervised learning technique for natural language processing, to learn ribozyme embeddings. Ribo2Vec was trained on over 9,000 diverse ribozymes, learning to map sequences to 128- and 256-dimensional vector spaces. Using Ribo2Vec, sequence embeddings for five classes of ribozymes (hatchet, pistol, hairpin, hovlinc, and twister sister) were calculated. Principal component analysis demonstrated the ability of these embeddings to distinguish between ribozyme classes. Furthermore, a simple SVM classifier trained on ribozyme embeddings showed promising results in accurately classifying ribozyme types. Our results suggest that the embedding vectors contained meaningful information about ribozymes. Interestingly, 256-dimensional embeddings behaved similarly to 128-dimensional embeddings, suggesting that a lower-dimensional vector space is generally sufficient to capture ribozyme features. This approach demonstrates the potential of Word2Vec for bioinformatics, opening new avenues for ribozyme research. Future research includes using a Transformer-based method to learn RNA embeddings, which can capture long-range interactions between nucleotides.	Translated: 2023-07-13 16:26:32, Published: 2023-07-08
# Face Image Quality Enhancement Study for Face Recognition ( http://arxiv.org/abs/2307.05534v1 ) License: see link	Iqbal Nouyed, Na Zhang	Unconstrained face recognition has been an active research area among computer vision and biometric researchers for many years. Still, the problem of face recognition in low quality photos has not been well studied so far. In this paper, we explore face recognition performance on low quality photos, and we try to improve the accuracy in dealing with low quality face images. We assemble a large database with low quality photos, and examine the performance of face recognition algorithms for three different quality sets. Using state-of-the-art facial image enhancement approaches, we explore the face recognition performance for the enhanced face images. To perform this without experimental bias, we have developed a new protocol for recognition with low quality face photos and validate the performance experimentally. Our designed protocol for face recognition with low quality face images can be useful to other researchers. Moreover, experiment results show some of the challenging aspects of this problem.	Translated: 2023-07-13 16:26:06, Published: 2023-07-08
# ChatGPTをオープンにする: 命令チューニングされたテキスト生成器におけるオープン性、透明性、説明責任の追跡 Opening up ChatGPT: Tracking openness, transparency, and accountability in instruction-tuned text generators ( http://arxiv.org/abs/2307.05532v1 ) ライセンス: Link先を確認	Andreas Liesenfeld, Alianda Lopez, Mark Dingemanse	(参考訳) 命令追従動作を示す大規模言語モデルは、近年の会話インターフェースにおける最大の変革の1つである。この流れは、人間のフィードバックからの強化学習(LLM+RLHF)によって微調整されたテキスト生成用のプロプライエタリな大規模言語モデルであるOpenAIのChatGPTのリリースによって、大きく後押しされた。我々はプロプライエタリなソフトウェアに依存するリスクをレビューし、同等のアーキテクチャと機能を持つオープンソースプロジェクトの第一陣を調査する。本論文の主な貢献は、オープン性には段階があることを示し、この急速に進展する分野におけるオープン性の度合いについて科学的な記録を提供することである。我々は、コード、トレーニングデータ、モデルの重み、RLHFデータ、ライセンス、科学的ドキュメント、アクセス方法のオープン性の観点からプロジェクトを評価する。「オープンソース」を自称するプロジェクトは急速に増えているが、その多くは合法性の疑わしい文書化されていないデータを継承しており、極めて重要な命令チューニング(人間のアノテーション労働が関与する重要な部分)を共有するものはほとんどなく、注意深い科学的文書化は極めて稀である。データ収集やキュレーションからモデルアーキテクチャまで、トレーニングや微調整からリリースやデプロイメントに至るまで、あらゆる段階でオープン性の度合いは公平性と説明責任に関係している。 Large language models that exhibit instruction-following behaviour represent one of the biggest recent upheavals in conversational interfaces, a trend in large part fuelled by the release of OpenAI's ChatGPT, a proprietary large language model for text generation fine-tuned through reinforcement learning from human feedback (LLM+RLHF). We review the risks of relying on proprietary software and survey the first crop of open-source projects of comparable architecture and functionality. The main contribution of this paper is to show that openness is differentiated, and to offer scientific documentation of degrees of openness in this fast-moving field. We evaluate projects in terms of openness of code, training data, model weights, RLHF data, licensing, scientific documentation, and access methods. We find that while there is a fast-growing list of projects billing themselves as 'open source', many inherit undocumented data of dubious legality, few share the all-important instruction-tuning (a key site where human annotation labour is involved), and careful scientific documentation is exceedingly rare. 
Degrees of openness are relevant to fairness and accountability at all points, from data collection and curation to model architecture, and from training and fine-tuning to release and deployment.	翻訳日:2023-07-13 16:25:54 公開日:2023-07-08
# リプレイとカリキュラムの統合:連続学習への影響 Integrating Curricula with Replays: Its Effects on Continual Learning ( http://arxiv.org/abs/2307.05747v1 ) ライセンス: Link先を確認	Ren Jie Tee and Mengmi Zhang	(参考訳) 人間は新しいスキルや知識を獲得する際、カリキュラムに沿って学習と復習を行う。この人間の学習行動は、連続学習エージェントにおけるカリキュラムとリプレイ手法の統合に着想を与えてきた。目標は、人間の学習プロセスを模倣し、知識の保持を改善し、学習の転移を促進することである。連続学習エージェントにおける既存のリプレイ手法では、前タスクのデータをランダムに選択・順序付けしており、これは有効であることが示されている。しかし、連続学習を強化するためにさまざまなカリキュラムをリプレイ手法と統合する研究は限られている。本研究は、カリキュラムとリプレイ手法の統合が連続学習に与える影響を、学習データに対してリプレイする例題をインターリーブする頻度、例題をリプレイする順序、リプレイバッファに入れる例題の選択戦略の3つの点から検討する第一歩である。これらのカリキュラム設計の側面は認知心理学の原則と整合しており、リプレイ中のインターリーブ練習、易から難へのリハーサル、難易度の一様分布から例題を選ぶ選択戦略の利点を活用する。以上の結果から、これら3つのカリキュラムは破滅的忘却を効果的に緩和し、正の知識転移を促進しており、連続学習手法の発展におけるカリキュラムの可能性を示した。 Humans engage in learning and reviewing processes with curricula when acquiring new skills or knowledge. This human learning behavior has inspired the integration of curricula with replay methods in continual learning agents. The goal is to emulate the human learning process, thereby improving knowledge retention and facilitating learning transfer. Existing replay methods in continual learning agents involve the random selection and ordering of data from previous tasks, which has been shown to be effective. However, limited research has explored the integration of different curricula with replay methods to enhance continual learning. Our study takes initial steps in examining the impact of integrating curricula with replay methods on continual learning in three specific aspects: the interleaved frequency of replayed exemplars with training data, the sequence in which exemplars are replayed, and the strategy for selecting exemplars into the replay buffer. These aspects of curriculum design align with cognitive psychology principles and leverage the benefits of interleaved practice during replays, easy-to-hard rehearsal, and an exemplar selection strategy that draws exemplars from a uniform distribution of difficulties. 
Based on our results, these three curricula effectively mitigated catastrophic forgetting and enhanced positive knowledge transfer, demonstrating the potential of curricula in advancing continual learning methodologies.	翻訳日:2023-07-13 15:19:01 公開日:2023-07-08
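The three curriculum aspects named above (interleaving frequency, replay order, and exemplar selection) can be illustrated with a minimal sketch. Everything here is hypothetical and is not the authors' implementation: exemplars are assumed to be `(name, difficulty)` pairs with precomputed difficulty scores. "Uniform over difficulties" is approximated by picking evenly spaced ranks in the difficulty ordering.

```python
# Hypothetical sketch: exemplars are (name, difficulty) pairs; difficulty scores are assumed given.

def select_exemplars(pool, k):
    """Pick k exemplars at evenly spaced ranks of the difficulty ordering (both extremes included)."""
    ranked = sorted(pool, key=lambda ex: ex[1])  # easiest first
    if k >= len(ranked):
        return ranked
    if k == 1:
        return [ranked[len(ranked) // 2]]  # a single pick: the median difficulty
    step = (len(ranked) - 1) / (k - 1)  # > 1 here, so the chosen ranks are distinct
    return [ranked[round(i * step)] for i in range(k)]


def interleave(train_batches, exemplars, every):
    """Insert one replayed exemplar after every `every` new batches, in easy-to-hard order."""
    queue = sorted(exemplars, key=lambda ex: ex[1])  # easy-to-hard replay order
    stream = []
    for i, batch in enumerate(train_batches, start=1):
        stream.append(batch)
        if i % every == 0 and queue:
            stream.append(queue.pop(0))
    return stream


pool = [(f"ex{i}", i / 10) for i in range(10)]
buffer = select_exemplars(pool, 4)
print([name for name, _ in buffer])
print(interleave(["b1", "b2", "b3", "b4"], buffer, every=2))
```

Raising `every` replays less often. The real study compares such variants by training continual learners, which this sketch does not attempt.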
# 微調整を伴わないBKT転移における漸近自由の量子ビット正則化 A qubit regularization of asymptotic freedom at the BKT transition without fine-tuning ( http://arxiv.org/abs/2307.06117v1 ) ライセンス: Link先を確認	Sandip Maiti, Debasish Banerjee, Shailesh Chandrasekharan, and Marina K. Marinkovic	(参考訳) 本稿では、BKT転移で現れる、漸近的に自由で質量をもつ連続体量子場理論を正則化する方法として、2次元ハードコアループガスモデルを提案する。微調整なしに、我々のモデルは、相転移に近づく際の質量相において、古典格子XYモデルの普遍的なステップスケーリング関数を再現できる。これは、ループガス配置空間におけるフォック真空サイトのフガシティを熱力学極限でゼロに下げることによって達成される。BKT転移におけるいくつかの普遍量は、従来のXYモデルと比べて、我々のモデルではより小さい有限サイズ効果を示す。我々のモデルは、ユークリッド時空における漸近的に自由で質量をもつ量子場理論の量子ビット正則化の好例であり、微調整なしに、分離した固定点における関連摂動として漸近的自由がどのように生じうるかを理解する助けとなる。 We propose a two-dimensional hard-core loop-gas model as a way to regularize the asymptotically free massive continuum quantum field theory that emerges at the BKT transition. Without fine-tuning, our model can reproduce the universal step-scaling function of the classical lattice XY model in the massive phase as we approach the phase transition. This is achieved by lowering the fugacity of Fock-vacuum sites in the loop-gas configuration space to zero in the thermodynamic limit. Some of the universal quantities at the BKT transition show smaller finite-size effects in our model than in the traditional XY model. Our model is a prime example of qubit regularization of an asymptotically free massive quantum field theory in Euclidean space-time and helps us understand how asymptotic freedom can arise as a relevant perturbation at a decoupled fixed point without fine-tuning.	翻訳日:2023-07-13 13:10:18 公開日:2023-07-08
# 画像生成モデルの定性的故障とディープフェイク検出への応用 Qualitative Failures of Image Generation Models and Their Application in Detecting Deepfakes ( http://arxiv.org/abs/2304.06470v3 ) ライセンス: Link先を確認	Ali Borji	(参考訳) 画像生成モデルと映像生成モデルがフォトリアリスティックな画像を作成する能力は前代未聞の高さに達しており、多くの場合、実画像と偽画像を区別することは難しい。しかし、この進歩にもかかわらず、生成された画像の品質と現実世界の画像との間にはギャップが残っている。そこで本稿では、学術出版物とソーシャルメディアの双方から膨大な文献をレビューして画像生成モデルの質的な欠点を特定し、それらを5つのカテゴリに分類した。これらの失敗を理解することで、これらのモデルの改善が必要な領域を特定し、ディープフェイクを検出する戦略を開発することができる。今日の社会におけるディープフェイクの蔓延は深刻な懸念であり、我々の知見はその悪影響の軽減に役立ちうる。 The ability of image and video generation models to create photorealistic images has reached unprecedented heights, making it difficult to distinguish between real and fake images in many cases. However, despite this progress, a gap remains between the quality of generated images and those found in the real world. To address this, we have reviewed a vast body of literature from both academic publications and social media to identify qualitative shortcomings in image generation models, which we have classified into five categories. By understanding these failures, we can identify areas where these models need improvement, as well as develop strategies for detecting deepfakes. The prevalence of deepfakes in today's society is a serious concern, and our findings can help mitigate their negative impact.	翻訳日:2023-07-12 18:27:43 公開日:2023-07-08
# 知識グラフと閉形式連続時間液体ニューラルネットワークを用いた患者ケアのためのデジタルツイン Digital Twins for Patient Care via Knowledge Graphs and Closed-Form Continuous-Time Liquid Neural Networks ( http://arxiv.org/abs/2307.04772v1 ) ライセンス: Link先を確認	Logan Nye	(参考訳) デジタルツイン技術は医療を変革し、個別化された医療とサポート、早期診断、治療結果のシミュレーション、最適化された手術計画を可能にすると期待されている。デジタルツインは、製造業、サプライチェーンのロジスティクス、社会インフラなどの産業で急速に注目を集めているが、患者ケアの分野ではまだそうなっていない。マルチモーダルな患者データを用いて複雑な疾患をモデル化するという課題と、その解析の計算的な複雑さが、生体医療分野におけるデジタルツインの導入を阻んできた。しかし、これらの大きな障害は、これらのモデルに異なる方法でアプローチすることで対処できる可能性がある。本稿では、計算コストとモデリングの複雑さによって生じる臨床ツインモデリングの障壁に対処する新しい枠組みを提案する。我々は、患者の健康データを知識グラフとして構造化し、閉形式連続時間液体ニューラルネットワークを用いてリアルタイム分析を行うことを提案する。マルチモーダルな患者データを統合し、閉形式連続時間ネットワークと知識グラフオントロジーの柔軟性と効率を活用することにより、我々のアプローチはリアルタイムの洞察、個別化医療、早期診断と介入、最適な手術計画を可能にする。この新しいアプローチは、患者の健康に関する包括的で適応可能な視点とリアルタイム分析を提供し、デジタルツインシミュレーションをはじめとする、医療において期待される利益への道を開く。 Digital twin technology is anticipated to transform healthcare, enabling personalized medicine and support, earlier diagnoses, simulated treatment outcomes, and optimized surgical plans. Digital twins are readily gaining traction in industries like manufacturing, supply chain logistics, and civil infrastructure, but not yet in patient care. The challenge of modeling complex diseases with multimodal patient data and the computational complexities of analyzing it have stifled digital twin adoption in the biomedical vertical. Yet, these major obstacles can potentially be handled by approaching these models in a different way. This paper proposes a novel framework for addressing the barriers to clinical twin modeling created by computational costs and modeling complexities. We propose structuring patient health data as a knowledge graph and using closed-form continuous-time liquid neural networks for real-time analytics. 
By synthesizing multimodal patient data and leveraging the flexibility and efficiency of closed-form continuous-time networks and knowledge graph ontologies, our approach enables real-time insights, personalized medicine, early diagnosis and intervention, and optimal surgical planning. This novel approach provides a comprehensive and adaptable view of patient health along with real-time analytics, paving the way for digital twin simulations and other anticipated benefits in healthcare.	翻訳日:2023-07-12 17:29:25 公開日:2023-07-08
# チェスのマスの価値 The Value of Chess Squares ( http://arxiv.org/abs/2307.05330v1 ) ライセンス: Link先を確認	Aditya Gupta and Shiva Maharaj and Nicholas Polson and Vadim Sokolov	(参考訳) チェスのマスの評価と盤上の駒の配置の決定が、本研究の主な目的である。チェスAIの出現により、チェスのゲームにおける局面の価値を正確に評価することが可能になった。従来の手法では、駒に固定値 $(K=\infty, Q=9, R=5, B=3, N=3, P=1)$ を割り当てる。我々は、駒とマスの両方について限界評価を導入することで、この分析を強化する。ナイトとビショップの配置を調べることで手法を示し、ポーンの評価についても有益な知見を提供する。特に、ニムゾヴィチはポーン構造とその評価の重要性を提唱した先駆者の一人であった。最後に、今後の研究の方向性を示唆する。 Valuing chess squares and determining the placement of pieces on the board are the main objectives of our study. With the emergence of chess AI, it has become possible to accurately assess the worth of positions in a game of chess. The conventional approach assigns fixed values to pieces $(K=\infty, Q=9, R=5, B=3, N=3, P=1)$, where K, Q, R, B, N, and P denote king, queen, rook, bishop, knight, and pawn. We enhance this analysis by introducing marginal valuations for both pieces and squares. We demonstrate our method by examining the positioning of knights and bishops, and also provide valuable insights into the valuation of pawns. Notably, Nimzowitsch was among the pioneers in advocating for the significance of pawn structure and valuation. Finally, we conclude by suggesting potential avenues for future research.	翻訳日:2023-07-12 14:36:53 公開日:2023-07-08
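The conventional fixed-value scheme that the paper starts from can be shown as a simple material count. This sketch covers only that baseline and not the authors' marginal piece/square valuations. It assumes the board is given as the piece-placement field of a FEN string (uppercase for White, lowercase for Black). The helper name is hypothetical.

```python
# Conventional fixed piece values; the king (value infinity) is excluded because each side
# always has exactly one, so it never changes the material balance.
PIECE_VALUES = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}


def material_balance(placement: str) -> int:
    """White material minus Black material for the piece-placement field of a FEN string.

    Uppercase letters are White pieces and lowercase letters are Black pieces. Digits
    (runs of empty squares), '/' (rank separators) and kings are ignored.
    """
    balance = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.upper())
        if value is None:
            continue
        balance += value if ch.isupper() else -value
    return balance


print(material_balance("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"))  # starting position
print(material_balance("4k3/8/8/8/8/8/8/3QK3"))  # White has an extra queen
```

A fixed table like this ignores where a piece stands. That square dependence is what the paper's marginal valuations of pieces and squares are meant to capture.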
# sEMGに基づく筋力と関節キネマティクス推定のための物理インフォームド・ローショット学習 A Physics-Informed Low-Shot Learning For sEMG-Based Estimation of Muscle Force and Joint Kinematics ( http://arxiv.org/abs/2307.05361v1 ) ライセンス: Link先を確認	Yue Shi, Shuhao Ma, Yihui Zhao, Zhiqiang Zhang	(参考訳) 表面筋電図(sEMG)からの筋力と関節キネマティクスの推定は、神経筋刺激、筋動態、運動力学の間の動的相互作用をリアルタイムに生体力学的解析するために不可欠である。深層ニューラルネットワーク(DNN)の最近の進歩は、完全に自動化され再現可能な方法で生体力学的解析を改善する可能性を示している。しかし、生体力学的解析の小サンプルという性質と物理的解釈可能性の必要性が、DNNの応用を制限している。本稿では、sEMGに基づく筋力と関節キネマティクスの推定のための、新しい物理インフォームド・ローショット学習法を提案する。この方法は、ラグランジュの運動方程式と逆動力学的筋モデルを、小サンプルデータからの構造化特徴デコードと外挿推定のための敵対的生成ネットワーク(GAN)フレームワークにシームレスに統合する。具体的には、ラグランジュの運動方程式を生成モデルに導入し、高次特徴の構造化デコードが物理法則に従うよう制約する。また、外挿推定値と物理的参照値の一貫した物理表現に報酬を与えることで敵対的学習の効率を高める、物理インフォームド方策勾配を設計する。実験的検証は2つのシナリオ(歩行試験と手首運動試験)で実施した。その結果、筋力と関節キネマティクスの推定値は物理ベースの逆動力学と比べて偏りがなく、物理インフォームド畳み込みニューラルネットワーク(PI-CNN)、バニラ敵対的生成ネットワーク(GAN)、多層エクストリーム学習マシン(ML-ELM)など、選択したベンチマーク手法を上回った。 Estimating muscle force and joint kinematics from surface electromyography (sEMG) is essential for real-time biomechanical analysis of the dynamic interplay among neural muscle stimulation, muscle dynamics, and kinetics. Recent advances in deep neural networks (DNNs) have shown the potential to improve biomechanical analysis in a fully automated and reproducible manner. However, the small-sample nature of biomechanical analysis and its need for physical interpretability limit the application of DNNs. This paper presents a novel physics-informed low-shot learning method for sEMG-based estimation of muscle force and joint kinematics. This method seamlessly integrates Lagrange's equation of motion and an inverse dynamic muscle model into the generative adversarial network (GAN) framework for structured feature decoding and extrapolated estimation from small-sample data. Specifically, Lagrange's equation of motion is introduced into the generative model to constrain the structured decoding of high-level features so that it follows the laws of physics. 
A physics-informed policy gradient is also designed to improve adversarial learning efficiency by rewarding consistent physical representations of the extrapolated estimations and the physical references. Experimental validations are conducted in two scenarios (walking trials and wrist-motion trials). Results indicate that the estimations of muscle forces and joint kinematics are unbiased compared to physics-based inverse dynamics, and that the method outperforms the selected benchmark methods, including a physics-informed convolutional neural network (PI-CNN), a vanilla generative adversarial network (GAN), and a multi-layer extreme learning machine (ML-ELM).	翻訳日:2023-07-12 14:15:10 公開日:2023-07-08
# 球面グラスホッパー問題 The Spherical Grasshopper Problem ( http://arxiv.org/abs/2307.05359v1 ) ライセンス: Link先を確認	Boris van Breugel	(参考訳) この論文の目的は、単位球面上のグラスホッパー問題の理解を深めることである。この問題はベルの不等式の解析によって動機づけられるが、次のような幾何学的パズルとして定式化できる。白い球と黒いペンキのバケツが与えられ、対蹠点の対が互いに反対の色になるように球面の半分を塗るよう求められる。バッタが球面に着地し、ランダムな方向に一定の角距離$\theta$だけジャンプする。バッタが同じ色に着地する確率を最大化するには、球面をどのように塗り分けるべきか? ゴルコとケントは、対蹠性制約のない平面上でこの問題を調べた。このエッセイは、対蹠性制約付きの球面問題が、平面問題と似た形状の彩色をもたらすという明確な示唆を与える。本研究では、この問題を離散化し、最適解の探索に焼きなまし法を用いた。結果はゴルコとケントの平面での結果と一致する。$0.10\pi\leq\theta\leq0.44\pi$では歯車型(cogwheel)解が最適であることが判明し、歯の数$n_o$は$\frac{2\pi}{\theta}$に近い奇数である。$0.45\pi\leq\theta\leq 0.55\pi$では臨界解が見つかり、同じ色の領域は$\theta$がどちら側からでも$0.5\pi$に近づくにつれて小さくなる。$\theta\geq 0.55\pi$では、歯のついた縞模様からなる彩色が得られる。$\theta$が$\pi$に近づくと、幅が$\pi-\theta$に応じて変化する縞模様のみからなる彩色が生成される。 The aim of this essay is to better understand the Grasshopper Problem on the surface of the unit sphere. The problem is motivated by analysing Bell inequalities, but can be formulated as a geometric puzzle as follows. Given a white sphere and a bucket of black paint, one is asked to paint half of the sphere such that antipodal pairs of points are oppositely coloured. A grasshopper lands on the sphere and jumps a fixed angular distance $\theta$ in a random direction. How should the sphere be coloured to maximize the probability that the grasshopper lands on the same colour? Goulko and Kent have explored this problem on the plane without an antipodality constraint. This essay gives a clear indication that the spherical problem with the antipodality constraint yields colourings of similar shapes to those of the planar problem. This research has discretised the problem and used a simulated annealing algorithm to search for the optimal solution. Results are consistent with the planar results of Goulko and Kent. For $0.10\pi\leq\theta\leq0.44\pi$, cogwheel solutions are found to be optimal, with an odd number of cogs $n_o$ close to $\frac{2\pi}{\theta}$. 
For $0.45\pi\leq\theta\leq 0.55\pi$, critical solutions are found, in which the domains of identical colour shrink as $\theta$ approaches $0.5\pi$ from either side. For $\theta\geq 0.55\pi$, colourings are found that consist of stripes with cogs. As $\theta$ approaches $\pi$, the colourings consist of stripes alone, whose width scales with $\pi-\theta$.	翻訳日:2023-07-12 14:14:15 公開日:2023-07-08
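The simplest colouring that satisfies the antipodality constraint paints one hemisphere black. For that colouring, the same-colour probability after a jump of angular distance $\theta \le \pi$ should be $1-\theta/\pi$: a great-circle arc of length $\theta$ crosses the equator with probability $\theta/\pi$. The Monte Carlo sketch below estimates this baseline. It is only for intuition and is not the discretised simulated-annealing search used in the essay. All function names are hypothetical.

```python
import math
import random


def random_unit_vector(rng):
    """Uniform random point on the unit sphere (a normalised Gaussian vector)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]


def jump(p, theta, rng):
    """Move from p along a great circle by angle theta in a uniformly random direction."""
    while True:
        v = random_unit_vector(rng)
        dot = sum(a * b for a, b in zip(v, p))
        t = [a - dot * b for a, b in zip(v, p)]  # project onto the tangent plane at p
        norm = math.sqrt(sum(c * c for c in t))
        if norm > 1e-9:
            t = [c / norm for c in t]  # unit tangent direction, uniform by symmetry
            break
    return [math.cos(theta) * a + math.sin(theta) * b for a, b in zip(p, t)]


def hemisphere_colour(p):
    """Northern hemisphere black, southern white, so antipodal points get opposite colours."""
    return p[2] > 0


def same_colour_probability(theta, trials=50_000, seed=0):
    """Monte Carlo estimate of P(landing colour == take-off colour) for the hemisphere colouring."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        p = random_unit_vector(rng)
        q = jump(p, theta, rng)
        hits += hemisphere_colour(p) == hemisphere_colour(q)
    return hits / trials


theta = math.pi / 4
print(same_colour_probability(theta), 1 - theta / math.pi)
```

The optimal cogwheel and striped colourings reported in the essay should beat this hemisphere baseline at the corresponding values of $\theta$.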
# 相対論的量子論の確率論的基礎に向けて:曲がった時空における1体ボルン則 Towards a Probabilistic Foundation of Relativistic Quantum Theory: The One-Body Born Rule in Curved Spacetime ( http://arxiv.org/abs/2012.05212v5 ) ライセンス: Link先を確認	Maik Reddiger and Bill Poirier	(参考訳) 本研究では、粒子位置の確率を決定する量子力学的ボルン則を曲がった時空へ一般化することに基づいた、相対論的量子論の基礎への新しいアプローチを確立する。この研究の主要な動機は、「無限大の問題」(くりこみ)のような量子場理論(QFT)の内部的な数学的問題を克服することである。QFTへの公理的アプローチは、これらの問題が数学的なだけでなく概念的な性質も持つことを示してきた。ここで示すアプローチは、構成上統計的であり、幅広い力学モデルに対応でき、ミンコフスキー時空の対称性に依存せず、一般相対性原理を尊重する。この研究の解析的な部分では、関係する数学的量の滑らかさを仮定して1体の場合を考える。これは一般相対論的連続の方程式の理論の特別な場合として位置づけられる。ボルン則の相対論的一般化に対する関連アプローチは、対象とする超曲面が空間的であり、時空が大域的に双曲的であると仮定する。これに対し我々はC. EckartとJ. Ehlersの先行研究を用いて、前者の条件は自然に「非接条件」に置き換えられ、後者は不要であることを示す。我々は、非相対論的な類似物から用語を借りてラグランジュ描像とオイラー描像と呼ぶ、1体の場合の2つの異なる定式化について論じ、両者を包括的に扱う。この研究の数理物理学の文献への主な貢献は、ラグランジュ描像の発展である。ラグランジュ描像は、このアプローチにおいて「時間の問題」をどのように解決できるかを示し、それゆえ多体への一般化や、体の数が保存されない場合(後者については例を示す)への一般化の青写真となる。 In this work we establish a novel approach to the foundations of relativistic quantum theory, which is based on generalizing the quantum-mechanical Born rule for determining particle position probabilities to curved spacetime. A principal motivator for this research has been to overcome internal mathematical problems of quantum field theory (QFT) such as the 'problem of infinities' (renormalization), which axiomatic approaches to QFT have shown to be not only of mathematical but also of conceptual nature. The approach presented here is statistical by construction, can accommodate a wide array of dynamical models, does not rely on the symmetries of Minkowski spacetime, and respects the general principle of relativity. In the analytical part of this work we consider the $1$-body case under the assumption of smoothness of the mathematical quantities involved. This is identified as a special case of the theory of the general-relativistic continuity equation. 
While related approaches to the relativistic generalization of the Born rule assume the hypersurfaces of interest to be spacelike and the spacetime to be globally hyperbolic, we employ prior contributions by C. Eckart and J. Ehlers to show that the former condition is naturally replaced by a 'non-tangency condition' and that the latter one is obsolete. We discuss two distinct formulations of the $1$-body case, which, borrowing terminology from the non-relativistic analog, we term the Lagrangian and Eulerian pictures. We provide a comprehensive treatment of both. The main contribution of this work to the mathematical physics literature is the development of the Lagrangian picture. The Lagrangian picture shows how one can resolve the 'problem of time' in this approach and therefore serves as a blueprint for the generalization to many bodies and to the case in which the number of bodies is not conserved (an example is given for the latter).	翻訳日:2023-07-11 23:09:51 公開日:2023-07-08
# 新型コロナウイルス危機における意味ネットワーク分析による金融市場の予測 Forecasting financial markets with semantic network analysis in the COVID-19 crisis ( http://arxiv.org/abs/2009.04975v4 ) ライセンス: Link先を確認	A. Fronzetti Colladon, S. Grassi, F. Ravazzolo, F. Violante	(参考訳) 本稿では、株式市場データの予測に新たなテキストデータ指数を用いる。この指数は、テキストに現れる1つ以上の一般的な経済関連キーワードの重要性を評価するために、大量のニュースに適用される。この指数は、経済関連キーワードの重要性を、その使用頻度と意味ネットワークにおける位置に基づいて評価する。我々はこれをイタリアの報道に適用し、新型コロナウイルス危機を含む最近のサンプル期間について、イタリアの株式市場と債券市場のリターンとボラティリティを予測する指数を構築する。その結果、この指数は金融時系列の異なる局面をうまく捉えていることが示された。さらに、債券市場のデータ(リターンとボラティリティの両方、短期と長期の満期)と株式市場のボラティリティについて、予測可能性の強い証拠が示されている。 This paper uses a new textual data index for predicting stock market data. The index is applied to a large set of news to evaluate the importance of one or more general economic-related keywords appearing in the text. The index assesses the importance of the economic-related keywords, based on their frequency of use and semantic network position. We apply it to the Italian press and construct indices to predict Italian stock and bond market returns and volatilities in a recent sample period, including the COVID-19 crisis. The evidence shows that the index captures the different phases of financial time series well. Moreover, results indicate strong evidence of predictability for bond market data (both returns and volatilities, at short and long maturities) and for stock market volatility.	翻訳日:2023-07-11 23:08:47 公開日:2023-07-08
# 非パラメトリック回帰における相転移 Phase transitions in nonparametric regressions ( http://arxiv.org/abs/2112.03626v6 ) ライセンス: Link先を確認	Ying Zhu	(参考訳) 1変数の未知の回帰関数について、$(\gamma+1)$階までの導関数の絶対値が至る所で(またはほとんど至る所で)共通の定数により有界であること(すなわち$(\gamma+1)$次の滑らかさ)が分かっているとき、平均積分二乗誤差(MISE)のミニマックス最適レートは、文献では$\left(\frac{1}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$とされている。本稿では次のことを示す: (i) $n\leq\left(\gamma+1\right)^{2\gamma+3}$の場合、ミニマックス最適MISEレートは$\frac{\log n}{n\log(\log n)}$であり、活用すべき滑らかさの最適な次数はおよそ$\max\left\{ \left\lfloor \frac{\log n}{2\log\left(\log n\right)}\right\rfloor ,\,1\right\} $である; (ii) $n>\left(\gamma+1\right)^{2\gamma+3}$の場合、ミニマックス最適MISEレートは$\left(\frac{1}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$であり、活用すべき滑らかさの最適な次数は$\gamma+1$である。本論文の基本的な貢献は、滑らかな関数クラスに対して我々が開発した一連の計量エントロピー境界である。我々の境界のいくつかは新しいものであり、いくつかは文献(例えば、コルモゴロフとティホミロフ、1959)の境界を改善・一般化したものである。我々の計量エントロピー境界により、よく見られる滑らかさのクラスや非標準的な滑らかさのクラスに付随するミニマックス最適MISEレートの相転移を示すことができ、これらの境界は非パラメトリック回帰問題以外でも独立した関心の対象となりうる。 When the unknown regression function of a single variable is known to have derivatives up to the $(\gamma+1)$th order bounded in absolute values by a common constant everywhere or a.e. (i.e., $(\gamma+1)$th degree of smoothness), the minimax optimal rate of the mean integrated squared error (MISE) is stated as $\left(\frac{1}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ in the literature. This paper shows that: (i) if $n\leq\left(\gamma+1\right)^{2\gamma+3}$, the minimax optimal MISE rate is $\frac{\log n}{n\log(\log n)}$ and the optimal degree of smoothness to exploit is roughly $\max\left\{ \left\lfloor \frac{\log n}{2\log\left(\log n\right)}\right\rfloor ,\,1\right\} $; (ii) if $n>\left(\gamma+1\right)^{2\gamma+3}$, the minimax optimal MISE rate is $\left(\frac{1}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ and the optimal degree of smoothness to exploit is $\gamma+1$. The fundamental contribution of this paper is a set of metric entropy bounds we develop for smooth function classes. 
Some of our bounds are original, and some of them improve and/or generalize the ones in the literature (e.g., Kolmogorov and Tikhomirov, 1959). Our metric entropy bounds allow us to show phase transitions in the minimax optimal MISE rates associated with some commonly seen smoothness classes as well as non-standard smoothness classes, and can also be of independent interest outside the nonparametric regression problems.	翻訳日:2023-07-11 23:07:17 公開日:2023-07-08
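The two regimes stated above can be evaluated directly. This is a minimal sketch that plugs numbers into the abstract's formulas. It gives orders of magnitude only: all constants are omitted, and the smoothness in regime (i) is the "roughly" value from the abstract. It needs $n \ge 3$ so that $\log(\log n) > 0$. The function name is hypothetical.

```python
import math


def minimax_mise_rate(n, gamma):
    """Return (rate, smoothness_to_exploit) following the two regimes stated in the abstract.

    Rates are orders of magnitude only (constants omitted); requires n >= 3 so log(log n) > 0.
    """
    if n < 3:
        raise ValueError("n must be at least 3")
    if n <= (gamma + 1) ** (2 * gamma + 3):
        # Regime (i): small n relative to the smoothness, rate does not depend on gamma.
        loglog = math.log(math.log(n))
        rate = math.log(n) / (n * loglog)
        smoothness = max(math.floor(math.log(n) / (2 * loglog)), 1)
    else:
        # Regime (ii): the classical rate, exploiting all gamma + 1 degrees of smoothness.
        rate = (1 / n) ** ((2 * gamma + 2) / (2 * gamma + 3))
        smoothness = gamma + 1
    return rate, smoothness


for n in (20, 1_000, 10_000):
    print(n, minimax_mise_rate(n, gamma=1))
```

With $\gamma = 1$ the threshold is $(\gamma+1)^{2\gamma+3} = 32$, so $n = 20$ falls in regime (i) and the larger sample sizes fall in regime (ii).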
# トレース汎函数の凸性と単調性 Some convexity and monotonicity results of trace functionals ( http://arxiv.org/abs/2108.05785v2 ) ライセンス: Link先を確認	Haonan Zhang	(参考訳) 本稿では、パラメータ$(p,q,s)$が可能な限り最良の範囲において、トレース汎関数 $$(A,B,C)\mapsto \text{Tr}\|B^{p}AC^{q}\|^{s}$$ の凸性を証明する。ここで$B$と$C$は任意の$n\times n$正定値行列、$A$は任意の$n\times n$行列である。また、このタイプのトレース汎関数の単調性版も得る。応用として、\cite{HP12quasi,CFL16some} のいくつかの結果を拡張し、行列設定における \cite{RZ14} の予想を解決する。\cite{RZ14} のその他の予想についても議論する。また、関連するいくつかのトレース汎関数が一般には凹でないことも示す。このような凹性は、さまざまな問題において成り立つことが期待されていた。 In this paper, we prove the convexity of trace functionals $$(A,B,C)\mapsto \text{Tr}\|B^{p}AC^{q}\|^{s},$$ for parameters $(p,q,s)$ that are best possible, where $B$ and $C$ are any $n$-by-$n$ positive definite matrices, and $A$ is any $n$-by-$n$ matrix. We also obtain monotonicity versions of trace functionals of this type. As applications, we extend some results in \cite{HP12quasi,CFL16some} and resolve a conjecture in \cite{RZ14} in the matrix setting. Other conjectures in \cite{RZ14} will also be discussed. We also show that some related trace functionals are not concave in general. Such concavity results were expected to hold in different problems.	翻訳日:2023-07-11 23:05:40 公開日:2023-07-08
# 分類システムにおける説明のための統一論理枠組み A unified logical framework for explanations in classifier systems ( http://arxiv.org/abs/2105.14452v8 ) ライセンス: Link先を確認	Xinghan Liu and Emiliano Lorini	(参考訳) 近年、説明可能なAI(XAI)の分野において、二値分類器を説明するためのブール関数への関心が再び高まっている。ブール関数に対する標準的なアプローチは命題論理である。我々は、二値入力分類器とその性質についての推論を支援する、ceteris paribus(他の条件が同じならば)的な性質を持つ様相言語を提案する。我々は分類器モデルの族を研究し、言語の濃度に応じた2つの証明体系としてそれを公理化し、その公理系の完全性を示す。さらに、我々の様相言語の充足可能性判定問題が、無限変数の場合にはNEXPTIME完全であり、有限変数の場合には多項式時間となることを証明する。加えて、無限変数の場合における我々の言語の興味深いNP断片を同定する。我々はこの言語を用いて、反事実条件文や、アブダクティブ説明、対照的説明、反事実的説明、バイアスを含むさまざまな説明の概念を形式化する。最後に、この言語の2つの拡張を示す: 分類器の変更を可能にする割り当ての概念による動的拡張と、実際の入力に関する分類器の不確実性を表現できる認識的拡張である。 Recent years have witnessed a renewed interest in Boolean functions for explaining binary classifiers in the field of explainable AI (XAI). The standard approach to Boolean functions is propositional logic. We present a modal language of a ceteris paribus nature which supports reasoning about binary input classifiers and their properties. We study a family of classifier models, axiomatize it as two proof systems regarding the cardinality of the language, and show completeness of our axiomatics. Moreover, we prove that the satisfiability checking problem for our modal language is NEXPTIME-complete in the infinite-variable case, while it becomes polynomial in the finite-variable case. We furthermore identify an interesting NP fragment of our language in the infinite-variable case. We leverage the language to formalize counterfactual conditionals as well as a variety of notions of explanation, including abductive, contrastive and counterfactual explanations, and biases. Finally, we present two extensions of our language: a dynamic extension by the notion of assignment enabling classifier change, and an epistemic extension in which the classifier's uncertainty about the actual input can be represented.	翻訳日:2023-07-11 23:05:20 公開日:2023-07-08
# 量子計測理論のためのポインタ Pointers for Quantum Measurement Theory ( http://arxiv.org/abs/2203.11144v2 ) ライセンス: Link先を確認	Jay Lawrence	(参考訳) 原子スピン1/2や光子偏光の代表的な測定では、空間的に分離された、相互作用しない2つの検出器を用いる。各検出器は二値的であり、原子または光子の有無を記録する。$d$状態粒子の測定について、我々はよく知られたポインタ変数を、$d$個の可能な結果それぞれに1つずつ対応するそのような検出器の配列に置き換えることで、標準的なフォン・ノイマン測定形式を書き直す。前測定過程のユニタリダイナミクスが検出器出力を単一結果の部分空間に制限し、その結果としてポインタが装置から現れることを示す。我々は、各検出器を読み出し装置に結合したアンシラ量子ビットに置き換える、この装置の物理的拡張を提案する。これにより、ポインタは量子的な部分と(実質的に)古典的な部分に明示的に分離され、量子から古典への遷移が遅らされる。その結果、通常の装置の崩壊シナリオを回復できるだけでなく、量子ポインタ状態の重ね合わせを観測することもできる。 In the iconic measurements of atomic spin-1/2 or photon polarization, one employs two spatially separated and noninteracting detectors. Each detector is binary, registering the presence or absence of the atom or the photon. For measurements on a $d$-state particle we recast the standard von Neumann measurement formalism by replacing the familiar pointer variable with an array of such detectors, one for each of the $d$ possible outcomes. We show that the unitary dynamics of the premeasurement process restricts the detector outputs to the subspace of single outcomes, so that the pointer emerges from the apparatus. We propose a physical extension of this apparatus which replaces each detector with an ancilla qubit coupled to a readout device. This explicitly separates the pointer into distinct quantum and (effectively) classical parts, and delays the quantum-to-classical transition. As a result, one not only recovers the collapse scenario of an ordinary apparatus, but one can also observe a superposition of the quantum pointer states.	翻訳日:2023-07-11 22:54:36 公開日:2023-07-08
# Bayanアルゴリズム:モジュラリティの完全および近似最適化によるネットワーク内のコミュニティの検出 The Bayan Algorithm: Detecting Communities in Networks Through Exact and Approximate Optimization of Modularity ( http://arxiv.org/abs/2209.04562v3 ) ライセンス: Link先を確認	Samin Aref, Hriday Chheda, and Mahdi Mostajabdaveh	(参考訳) コミュニティ検出はネットワーク科学における古典的な問題であり、様々な分野に幅広く応用されている。多くのアプローチの中で、最も一般的な方法はモジュラリティの最大化である。その設計思想と広い普及にもかかわらず、ヒューリスティックなモジュラリティ最大化アルゴリズムが最適分割やそれに近いものを返すことはめったにない。そこで我々は、最適性、あるいは最適分割への近さを保証した分割を返す専用アルゴリズムBayanを提案する。Bayanアルゴリズムの中核は、モジュラリティ最大化問題の整数計画による定式化を最適に解く、またはある係数の範囲内で近似する分枝カット法である。構造的に多様な合成ネットワークおよび実ネットワークを用いて、Bayanを30種類の他のコミュニティ検出手法と比較した。その結果は、標準ベンチマークグラフの正解コミュニティを復元する上でのBayanの際立った精度と安定性を示している。 Bayanは、モジュラリティ最大化のためのオープンソースや商用のソルバーより数倍高速で、既存の他のどの方法でも最適化できないインスタンスに対しても最適な分割を見つけることができる。全体として、我々の評価は、Bayanが(最大連結成分で)最大3000エッジの実ネットワークにおけるモジュラリティの厳密な最大化と、通常のコンピュータ上でのより大きなインスタンスにおける最大モジュラリティの近似に適した選択であることを示している。 Bayanアルゴリズム(bayanpyライブラリ)のPython実装は、Pythonのパッケージインストーラ(pip)を通じて公開されている。 Community detection is a classic problem in network science with extensive applications in various fields. Among numerous approaches, the most common method is modularity maximization. Despite their design philosophy and wide adoption, heuristic modularity maximization algorithms rarely return an optimal partition or anything similar. We propose a specialized algorithm, Bayan, which returns partitions with a guarantee of either optimality or proximity to an optimal partition. At the core of the Bayan algorithm is a branch-and-cut scheme that solves an integer programming formulation of the modularity maximization problem to optimality or approximates it within a factor. We compare Bayan against 30 alternative community detection methods using structurally diverse synthetic and real networks. Our results demonstrate Bayan's distinctive accuracy and stability in retrieving ground-truth communities of standard benchmark graphs. 
Bayan is several times faster than open-source and commercial solvers for modularity maximization, making it capable of finding optimal partitions for instances that cannot be optimized by any other existing method. Overall, our assessments point to Bayan as a suitable choice for exact maximization of modularity in real networks with up to 3000 edges (in their largest connected component) and for approximating maximum modularity in larger instances on ordinary computers. A Python implementation of the Bayan algorithm (the bayanpy library) is publicly available through the package installer for Python (pip).	翻訳日:2023-07-11 22:46:59 公開日:2023-07-08
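To make "exact maximization of modularity" concrete, the sketch below computes Newman–Girvan modularity $Q=\sum_c \left[L_c/m-(d_c/2m)^2\right]$ for an unweighted graph. Here $L_c$ is the number of edges inside community $c$, $d_c$ is its total degree, and $m$ is the number of edges. It then finds the optimum by exhaustive search over all partitions. This is not Bayan's branch-and-cut method and does not use the bayanpy API. Exhaustive search is exact but only feasible for a handful of nodes, because the number of partitions grows like the Bell numbers. That growth is why exact methods rely on integer programming instead. The helper names are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of an undirected, unweighted graph.

    Q = sum over communities c of [L_c / m - (d_c / 2m)^2], with L_c the number of edges
    inside c, d_c the total degree of its nodes and m the number of edges.
    """
    m = len(edges)
    inside = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (Bell-number many)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition


def exact_max_modularity(nodes, edges):
    """Exhaustive search: exact, but only feasible for very small graphs."""
    best_q, best_partition = float("-inf"), None
    for partition in set_partitions(list(nodes)):
        community = {node: idx for idx, block in enumerate(partition) for node in block}
        q = modularity(edges, community)
        if q > best_q:
            best_q, best_partition = q, partition
    return best_q, best_partition


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q, partition = exact_max_modularity(range(6), edges)
print(round(q, 4), sorted(sorted(block) for block in partition))
```

For this toy graph, the optimum places each triangle in its own community, with $Q = 5/14$.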
# 部分展開による拡張多目的A* Enhanced Multi-Objective A* with Partial Expansion ( http://arxiv.org/abs/2212.03712v2 ) ライセンス: Link先を確認	Valmiki Kothare, Zhongqiang Ren, Sivakumar Rathinam, Howie Choset	(参考訳) 一般にグラフ上で定式化される多目的最短経路問題(MO-SPP)は、複数の目的を最適化しながら、開始頂点から目的頂点への経路の集合を求める。一般に、すべての目的を同時に最適化できる単一の解経路は存在しないため、この問題はいわゆるパレート最適解の集合を求めることになる。この問題に対処するため、品質保証付きの解を素早く計算する多目的A*(MOA*)アルゴリズムがいくつか最近開発された。しかし、これらのMOA*アルゴリズムは、特にグラフの分岐係数(すなわち、任意の頂点の隣接頂点の数)が大きい場合に、高いメモリ使用量に悩まされることが多い。そこで本研究は、実行時間をほとんど増やさずにMOA*の高いメモリ消費を削減することを目的とする。複数の単一目的および多目的探索アルゴリズムを一般化・統一することで、2つのユーザ定義ハイパーパラメータを調整して実行時間とメモリ効率のバランスをとれる、実行時間とメモリ効率に優れたMOA*(RME-MOA*)アプローチを開発した。 The Multi-Objective Shortest Path Problem (MO-SPP), typically posed on a graph, determines a set of paths from a start vertex to a destination vertex while optimizing multiple objectives. In general, there does not exist a single solution path that can simultaneously optimize all the objectives and the problem thus seeks to find a set of so-called Pareto-optimal solutions. To address this problem, several Multi-Objective A* (MOA*) algorithms were recently developed to quickly compute solutions with quality guarantees. However, these MOA* algorithms often suffer from high memory usage, especially when the branching factor (i.e., the number of neighbors of any vertex) of the graph is large. This work thus aims at reducing the high memory consumption of MOA* with little increase in the runtime. By generalizing and unifying several single- and multi-objective search algorithms, we develop the Runtime and Memory Efficient MOA* (RME-MOA*) approach, which can balance between runtime and memory efficiency by tuning two user-defined hyper-parameters.	翻訳日:2023-07-11 22:36:58 公開日:2023-07-08
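The set of Pareto-optimal solutions that MO-SPP asks for can be illustrated on a tiny bi-objective graph. This sketch enumerates every path (which is exponential and requires an acyclic graph) and keeps the non-dominated cost vectors. It is not MOA* or RME-MOA*. Those algorithms aim to find the same Pareto set without enumerating all paths. The graph and function names are hypothetical.

```python
def dominates(a, b):
    """True if cost vector a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def path_costs(graph, node, goal, acc=(0, 0)):
    """Yield the bi-objective cost of every path from node to goal (graph must be acyclic)."""
    if node == goal:
        yield acc
        return
    for nxt, cost in graph.get(node, []):
        yield from path_costs(graph, nxt, goal, tuple(a + c for a, c in zip(acc, cost)))


def pareto_front(costs):
    """Keep only the non-dominated cost vectors, sorted by the first objective."""
    unique = set(costs)
    return sorted(c for c in unique if not any(dominates(o, c) for o in unique))


# Edge costs are (objective 1, objective 2), for example (distance, risk).
graph = {
    "s": [("a", (1, 5)), ("b", (3, 1))],
    "a": [("g", (1, 5)), ("b", (1, 4))],
    "b": [("g", (2, 2))],
}
print(pareto_front(path_costs(graph, "s", "g")))
```

Here path s-a-b-g costs (4, 11) and is dominated by s-a-g at (2, 10). The two remaining paths trade one objective against the other, so neither can be dropped.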
# エッジコンピューティングのためのオンラインシーケンス学習を用いた効率的な圧縮比推定 Efficient Compressed Ratio Estimation Using Online Sequential Learning for Edge Computing ( http://arxiv.org/abs/2211.04284v3 ) ライセンス: Link先を確認	Hiroki Oikawa, Hangli Ge, Noboru Koshizuka	(参考訳) モノのインターネットの普及により、大量のセンサー情報がリアルタイムで取得されている。これに伴い、エッジデバイスからのデータの通信コストが増加している。エッジデバイスで使用可能なデータ圧縮方式である圧縮センシング(CS)は、通信コストを低減する手段として注目を集めている。CSでは、適切な圧縮率を推定することが重要である。強化学習(RL)を用いて、取得したデータの圧縮率を適応的に推定する手法がある。しかし、エッジで使用可能な既存のRL手法の計算コストは、しばしば高い。本研究では、actor-critic online sequential extreme learning machine(AC-OSELM)と呼ぶエッジデバイス向けの効率的なRL手法と、AC-OSELMを用いてエッジ上で適切な圧縮率を推定しデータを圧縮するシステムを開発した。エッジデバイス向けの他のRL手法と比較することで、圧縮率推定における提案手法の性能を評価する。実験結果から、AC-OSELMは既存手法と同等以上の圧縮性能を示し、圧縮率の推定もより高速であることが示された。 Owing to the widespread adoption of the Internet of Things, a vast amount of sensor information is being acquired in real time. Accordingly, the communication cost of data from edge devices is increasing. Compressed sensing (CS), a data compression method that can be used on edge devices, has been attracting attention as a method to reduce communication costs. In CS, estimating the appropriate compression ratio is important. One existing method adaptively estimates the compression ratio for the acquired data using reinforcement learning (RL). However, the computational costs associated with existing RL methods that can be utilized on edge devices are often high. In this study, we developed an efficient RL method for edge devices, referred to as the actor-critic online sequential extreme learning machine (AC-OSELM), and a system to compress data by estimating an appropriate compression ratio on the edge using AC-OSELM. The performance of the proposed method in estimating the compression ratio is evaluated by comparing it with other RL methods for edge devices. The experimental results indicate that AC-OSELM demonstrated the same or better compression performance and faster compression ratio estimation than the existing methods.	翻訳日:2023-07-11 22:36:01 公開日:2023-07-08
# All Real Projective Measurements Can be Self-tested ( http://arxiv.org/abs/2302.00974v2 ) License: see link	Ranyiliu Chen, Laura Mančinska, Jurij Volčič	Self-testing is the strongest form of quantum functionality verification, which allows a classical user to deduce the quantum state and measurements used to produce measurement statistics. While self-testing of quantum states is well understood, self-testing of measurements, especially in high dimensions, has remained more elusive. We demonstrate the first general result in this direction by showing that every real projective measurement can be self-tested. The standard definition of self-testing only allows for the certification of real measurements. Therefore, our work effectively broadens the scope of self-testable projective measurements to their full potential. To reach this result, we employ the idea that existing self-tests can be extended to verify additional untrusted measurements. This is known as 'post-hoc self-testing'. We formalize the method of post-hoc self-testing and establish a sufficient condition for its application. Using this condition, we construct self-tests for all real projective measurements. Inspired by our construction, we develop a new technique of iterative self-testing, which involves using post-hoc self-testing in a sequential manner. Starting from any established self-test, we fully characterize the set of measurements that can be verified via iterative self-testing. This provides a clear methodology for constructing new self-tests from pre-existing ones.	Translated: 2023-07-11 22:28:52 Published: 2023-07-08
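The textbook example of a self-test is the CHSH game. Observing the maximal (Tsirelson) value 2√2 certifies, up to local isometries, the maximally entangled two-qubit state and real Pauli-type measurements. Local classical strategies cannot exceed 2. The sketch below only computes CHSH values for real qubit observables cos θ Z + sin θ X on (|00⟩ + |11⟩)/√2, using plain Python lists. It illustrates the setting and is not the paper's post-hoc or iterative construction.

```python
import math


def observable(theta):
    """Real +/-1-valued qubit observable cos(theta) Z + sin(theta) X (a real projective measurement)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]


def kron2(A, B):
    """Kronecker product of two 2x2 matrices."""
    return [[A[i // 2][j // 2] * B[i % 2][j % 2] for j in range(4)] for i in range(4)]


def expectation(state, M):
    """<state| M |state> for a real 4-dimensional state vector."""
    return sum(state[i] * M[i][j] * state[j] for i in range(4) for j in range(4))


PHI_PLUS = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]  # (|00> + |11>)/sqrt(2)


def correlator(a, b, state=PHI_PLUS):
    return expectation(state, kron2(observable(a), observable(b)))


def chsh_value(a0, a1, b0, b1, state=PHI_PLUS):
    return (correlator(a0, b0, state) + correlator(a0, b1, state)
            + correlator(a1, b0, state) - correlator(a1, b1, state))


print(chsh_value(0.0, math.pi / 2, math.pi / 4, -math.pi / 4))  # approximately 2*sqrt(2)
```

For this state the correlator reduces to cos(a − b). The angle choice above is the standard one that reaches 2√2.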
# Pre-Training Protein Encoder via Siamese Sequence-Structure Diffusion Trajectory Prediction ( http://arxiv.org/abs/2301.12068v2 ) License: see link	Zuobai Zhang, Minghao Xu, Aurélie Lozano, Vijil Chenthamarakshan, Payel Das, Jian Tang	Self-supervised pre-training methods on proteins have recently gained attention. Most approaches focus on either protein sequences or structures and neglect their joint distribution, which is crucial for a comprehensive understanding of protein functions because it integrates co-evolutionary information and structural characteristics. In this work, inspired by the success of denoising diffusion models in generative tasks, we propose the DiffPreT approach to pre-train a protein encoder by sequence-structure joint diffusion modeling. DiffPreT guides the encoder to recover the native protein sequences and structures from perturbed ones along the joint diffusion trajectory, thereby capturing the joint distribution of sequences and structures. Considering the essential conformational variations of proteins, we enhance DiffPreT with a method called Siamese Diffusion Trajectory Prediction (SiamDiff) to capture the correlation between different conformers of a protein. SiamDiff attains this goal by maximizing the mutual information between representations of diffusion trajectories of structurally correlated conformers. We study the effectiveness of DiffPreT and SiamDiff on both atom- and residue-level structure-based protein understanding tasks. Experimental results show that DiffPreT is consistently competitive on all tasks, and that SiamDiff achieves new state-of-the-art performance when considering the mean ranks over all tasks. Our implementation is available at https://github.com/DeepGraphLearning/SiamDiff.	Translated: 2023-07-11 22:28:33 Published: 2023-07-08
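A joint sequence-structure diffusion perturbs the coordinates and the residue types together. The sketch below draws one forward-noising sample at noise level `alpha_bar`. Coordinates get the usual Gaussian forward step, x_t = √ᾱ·x_0 + √(1 − ᾱ)·ε. Residue types are masked with probability 1 − ᾱ. The masking rule, the `MASK` token and the function name are hypothetical stand-ins chosen for illustration. The paper's exact noise schedules and discrete corruption process may differ.

```python
import math
import random

MASK = "<mask>"  # hypothetical mask token for corrupted residue types


def perturb(coords, seq, alpha_bar, rng):
    """One joint forward-diffusion sample at noise level alpha_bar in (0, 1].

    coords: list of (x, y, z) tuples; seq: list of residue-type strings.
    """
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    # Continuous part: Gaussian forward step applied to every coordinate.
    noisy_coords = [tuple(a * c + s * rng.gauss(0.0, 1.0) for c in xyz) for xyz in coords]
    # Discrete part (assumed): mask each residue type with probability 1 - alpha_bar.
    noisy_seq = [MASK if rng.random() < 1.0 - alpha_bar else aa for aa in seq]
    return noisy_coords, noisy_seq
```

A denoising network would be trained to recover `coords` and `seq` from the perturbed pair. SiamDiff additionally pairs trajectories of two correlated conformers, which is not modeled here.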
# Talk the Walk: Synthetic Data Generation for Conversational Music Recommendation ( http://arxiv.org/abs/2301.11489v2 ) License: see link	Megan Leszczynski, Ravi Ganti, Shu Zhang, Krisztian Balog, Filip Radlinski, Fernando Pereira, Arun Tejasvi Chaganty	Recommendation systems are ubiquitous, yet users often find them difficult to control and adjust when recommendation quality is poor. This has motivated the development of conversational recommendation systems (CRSs), which provide control over recommendations through natural language feedback. However, building conversational recommendation systems requires conversational training data involving user utterances paired with items that cover a diverse range of preferences. Such data has proved challenging to collect scalably using conventional methods like crowdsourcing. We address this challenge in the context of item-set recommendation, noting the increasing attention to this task motivated by use cases like music, news and recipe recommendation. We present a new technique, TalkTheWalk, that synthesizes realistic, high-quality conversational data by leveraging domain expertise encoded in widely available curated item collections, showing how these can be transformed into corresponding item set curation conversations. Specifically, TalkTheWalk generates a sequence of hypothetical yet plausible item sets returned by a system, then uses a language model to produce corresponding user utterances. Applying TalkTheWalk to music recommendation, we generate over one million diverse playlist curation conversations. A human evaluation shows that the conversations contain consistent utterances with relevant item sets, nearly matching the quality of small human-collected conversational data for this task. At the same time, when the synthetic corpus is used to train a CRS, it improves Hits@100 by 10.5 points on a benchmark dataset over standard baselines and is preferred over the top-performing baseline in an online evaluation.	Translated: 2023-07-11 22:28:07 Published: 2023-07-08
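The CRS result above is reported as Hits@100. The sketch below uses one common definition of Hits@k: the fraction of queries whose held-out target item appears among the top-k ranked items. Benchmarks sometimes define it slightly differently (for example, over several target items per query), so treat this as an assumption rather than the paper's exact metric.

```python
from typing import Hashable, Sequence


def hits_at_k(ranked_lists: Sequence[Sequence[Hashable]],
              targets: Sequence[Hashable], k: int = 100) -> float:
    """Fraction of queries whose target item appears in the first k ranked items."""
    hits = sum(1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k])
    return hits / len(targets)


# Two toy queries: the first target is ranked 2nd, the second target is never retrieved.
print(hits_at_k([["a", "b", "c"], ["d", "e", "f"]], ["b", "x"], k=2))  # 0.5
```

On this 0-to-1 scale, an improvement of "10.5 points" corresponds to 0.105 when the metric is reported as a percentage.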
# Physical realization of realignment criteria using structural physical approximation ( http://arxiv.org/abs/2301.09884v2 ) License: see link	Shruti Aggarwal, Anu Kumari, Satyabrata Adhikari	Entanglement detection is an important problem in quantum information theory because quantum entanglement is a key resource in quantum information processing. The realignment criterion is a powerful tool for detecting entangled states in bipartite and multipartite quantum systems. It is an important entanglement-detection criterion because it works well not only for negative partial transpose entangled states (NPTES) but also for positive partial transpose entangled states (PPTES). Since the matrix corresponding to the realignment map is indefinite, implementing the map experimentally is not straightforward. In this work, we first approximate the realignment map by a positive map using the method of structural physical approximation (SPA), and then show that the structural physical approximation of the realignment map (SPA-R) is completely positive. The positivity of the constructed map is characterized using moments that can be physically measured. Next, we develop a separability criterion based on our SPA-R map in the form of an inequality and show that the developed criterion detects not only NPTES but also PPTES. We provide some examples to support the results obtained. Moreover, we analyse the error that may occur because of approximating the realignment map.	Translated: 2023-07-11 22:27:09 Published: 2023-07-08
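For reference, the unapproximated realignment (CCNR) criterion states that a bipartite state ρ is entangled if the trace norm of its realigned matrix R(ρ), with R(ρ)_{(i,k),(j,l)} = ρ_{(i,j),(k,l)}, exceeds 1. The NumPy sketch below evaluates this on a fully known density matrix. A fully known matrix in practice means full state tomography, which is what a physically implementable SPA-based test aims to avoid. The function names are illustrative.

```python
import numpy as np


def realign(rho, m, n):
    """Realigned matrix R(rho)_{(i,k),(j,l)} = rho_{(i,j),(k,l)} for rho acting on C^m (x) C^n."""
    return rho.reshape(m, n, m, n).transpose(0, 2, 1, 3).reshape(m * m, n * n)


def realignment_detects_entanglement(rho, m, n, tol=1e-12):
    """Realignment (CCNR) test: a trace norm of R(rho) above 1 implies rho is entangled."""
    return np.linalg.norm(realign(rho, m, n), "nuc") > 1 + tol


bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho_bell = np.outer(bell, bell)
print(np.linalg.norm(realign(rho_bell, 2, 2), "nuc"))  # approximately 2.0 for the Bell state
```

The criterion is only sufficient for entanglement. A value at or below 1 does not certify separability.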
# Bridging closed and dissipative discrete time crystals in spin systems with infinite-range interactions ( http://arxiv.org/abs/2303.13334v2 ) License: see link	Jayson G. Cosme, Jim Skulte, Ludwig Mathey	We elucidate the role that dissipation in a bosonic channel plays in the prevalence and stability of time crystals (TCs) in a periodically driven spin-boson system described by the Dicke model. Here, the bosons are represented by photons, and they mediate the infinite-range interactions between the spin systems. For strong dissipation, we study the dynamics using an effective atom-only description and the closed Lipkin-Meshkov-Glick model. By mapping out the phase diagrams for dissipation strengths ranging from zero to infinitely strong, we demonstrate that the area of the phase diagram where a TC exists grows with the dissipation strength, but only up to an optimal point, beyond which most of the TCs become unstable. We find TCs in both closed-system and dissipative regimes, but dissipative TCs are shown to be more robust against random noise in the drive and are only weakly affected by the choice of initial state. We present the finite-size behaviour and the scaling of the lifetime of the TCs with respect to the number of spins and the interaction strength within a fully quantum mechanical description.	Translated: 2023-07-11 22:17:36 Published: 2023-07-08
# Transport in a periodically driven tilted lattice via the extended reservoir approach: Stability criterion for recovering the continuum limit ( http://arxiv.org/abs/2303.04160v3 ) License: see link	Bitan De, Gabriela Wojtowicz, Jakub Zakrzewski, Michael Zwolak, Marek M. Rams	Extended reservoirs provide a framework for capturing macroscopic, continuum environments, such as metallic electrodes driving a current through a nanoscale contact, impurity, or material. We examine the application of this approach to periodically driven systems, specifically in the context of quantum transport. As with non-equilibrium steady states in time-independent scenarios, the current displays a Kramers' turnover, including the formation of a plateau region that captures the physical, continuum-limit response. We demonstrate that a simple stability criterion identifies an appropriate relaxation rate to target this physical plateau. Using this approach, we study quantum transport through a periodically driven tilted lattice coupled to two metallic reservoirs held at a finite bias and temperature. We use this model to benchmark the extended reservoir approach and assess the stability criterion. The approach recovers well-understood physical behavior in the limit of weak system-reservoir coupling. Extended reservoirs also make it possible to address strong coupling and non-linear response, where we analyze how transport responds to the dynamics inside the driven lattice. These results lay the foundation for using the extended reservoir approach with periodically driven quantum systems, such as many-body Floquet states.	Translated: 2023-07-11 22:15:56 Published: 2023-07-08
# Synthetic data, real errors: how (not) to publish and use synthetic data ( http://arxiv.org/abs/2305.09235v2 ) License: see link	Boris van Breugel, Zhaozhi Qian, Mihaela van der Schaar	Generating synthetic data through generative models is gaining interest in the ML community and beyond, promising a future where datasets can be tailored to individual needs. Unfortunately, synthetic data is usually not perfect, resulting in potential errors in downstream tasks. In this work we explore how the generative process affects the downstream ML task. We show that the naive synthetic data approach (using synthetic data as if it were real) leads to downstream models and analyses that do not generalize well to real data. As a first step towards better ML in the synthetic data regime, we introduce Deep Generative Ensemble (DGE), a framework inspired by Deep Ensembles that aims to implicitly approximate the posterior distribution over the generative process model parameters. DGE improves downstream model training, evaluation, and uncertainty quantification, vastly outperforming the naive approach on average. The largest improvements are achieved for minority classes and low-density regions of the original data, where the generative uncertainty is largest.	Translated: 2023-07-11 22:07:58 Published: 2023-07-08
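The intuition behind a generative ensemble can be shown with a toy model. Instead of trusting one synthetic dataset, train several generators and run the downstream analysis on each synthetic dataset. The spread of the results then exposes uncertainty that the naive approach hides. In the sketch below, every concrete choice is a simplifying assumption and not the DGE method itself: the "generative model" is a 1-D Gaussian, each member is fit to a bootstrap resample with its own seed, and the "downstream analysis" is a sample mean.

```python
import random
import statistics


def fit_gaussian(data):
    """Toy 'generative model': a 1-D Gaussian fitted by mean and standard deviation."""
    return statistics.fmean(data), statistics.stdev(data)


def generative_ensemble_estimates(real, n_models=5, n_synth=500, seed=0):
    """Train n_models toy generators (bootstrap + distinct seeds) and run the downstream
    analysis (here: a sample mean) on each member's synthetic dataset."""
    estimates = []
    for k in range(n_models):
        rng = random.Random(seed + k)
        boot = [rng.choice(real) for _ in real]                # member-specific training data
        mu, sigma = fit_gaussian(boot)
        synth = [rng.gauss(mu, sigma) for _ in range(n_synth)]  # synthetic dataset
        estimates.append(statistics.fmean(synth))              # downstream estimate
    return estimates


rng0 = random.Random(42)
real_data = [rng0.gauss(5.0, 1.0) for _ in range(200)]
est = generative_ensemble_estimates(real_data)
print(statistics.fmean(est), statistics.stdev(est))  # ensemble estimate and its spread
```

Reporting the mean together with the spread across members, rather than one number from one synthetic dataset, is the practical difference from the naive approach.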
# Visual-LiDAR Odometry and Mapping with Monocular Scale Correction and Visual Bootstrapping ( http://arxiv.org/abs/2304.08978v2 ) License: see link	Hanyu Cai, Ni Ou and Junzheng Wang	This paper presents a novel visual-LiDAR odometry and mapping method with low-drift characteristics. The proposed method builds on two popular approaches, ORB-SLAM and A-LOAM, and modifies them with monocular scale correction and visual-bootstrapped LiDAR pose initialization. The scale corrector calculates the ratio between the depth of image keypoints recovered by triangulation and the depth provided by LiDAR, using an outlier rejection process to improve accuracy. For LiDAR pose initialization, the visual odometry provides initial guesses of the LiDAR motion to improve performance. The method applies not only to high-resolution LiDAR but also to low-resolution LiDAR. To evaluate the proposed SLAM system's robustness and accuracy, we conducted experiments on the KITTI Odometry and S3E datasets. Experimental results show that our method significantly outperforms standalone ORB-SLAM2 and A-LOAM. Furthermore, regarding the accuracy of visual odometry with scale correction, our method performs similarly to stereo-mode ORB-SLAM2.	Translated: 2023-07-11 22:06:52 Published: 2023-07-08
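The scale-correction step compares the triangulated depth of each keypoint with its LiDAR depth and rejects outliers before estimating a scale. The sketch below shows one plausible form of that step. The median/MAD outlier test, the threshold `k = 3` and the function name are assumptions made for illustration. The abstract does not specify the paper's rejection rule.

```python
import statistics


def monocular_scale(tri_depths, lidar_depths, k=3.0):
    """Estimate the metric scale of a monocular map from keypoints that have both a
    triangulated depth and a LiDAR depth, rejecting outlier ratios with a median/MAD test."""
    ratios = [l / t for t, l in zip(tri_depths, lidar_depths) if t > 0 and l > 0]
    med = statistics.median(ratios)
    # Median absolute deviation; fall back to a tiny value if all ratios coincide.
    mad = statistics.median(abs(r - med) for r in ratios) or 1e-9
    inliers = [r for r in ratios if abs(r - med) <= k * mad]
    return statistics.fmean(inliers)


# The last keypoint has a spurious LiDAR depth (e.g., a mismatched projection).
print(monocular_scale([1, 2, 3, 4, 5], [2, 4, 6, 8, 30]))  # 2.0
```

Multiplying the monocular map by the returned factor brings it to LiDAR (metric) scale. A robust median-based rule is a natural choice because a few mismatched LiDAR points would otherwise bias a plain average.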
# Graph Enabled Cross-Domain Knowledge Transfer ( http://arxiv.org/abs/2304.03452v2 ) License: see link	Shibo Yao	To leverage machine learning in any decision-making process, one must convert the given knowledge (e.g., natural language or unstructured text) into representation vectors that machine learning models can understand and process in a compatible data format. A frequently encountered difficulty, however, is that the given knowledge is not rich or reliable enough in the first place. In such cases, one seeks to fuse side information from a separate domain to bridge the gap between good representation learning and the scarce knowledge in the domain of interest. This approach is called Cross-Domain Knowledge Transfer. The problem is crucial to study because scarce knowledge is common in many scenarios, from online healthcare platform analyses to financial market risk quantification, and it is an obstacle to benefiting from automated decision-making. From the machine learning perspective, the paradigm of semi-supervised learning takes advantage of large amounts of data without ground truth and achieves impressive improvements in learning performance. This dissertation adopts it for cross-domain knowledge transfer. (to be continued)	Translated: 2023-07-11 22:05:29 Published: 2023-07-08
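Classic label propagation is a minimal example of graph-based semi-supervised learning: a few labeled nodes, plus many unlabeled ones that borrow information from their neighbours. The sketch below clamps the labeled scores and repeatedly sets each unlabeled node to the mean of its neighbours. It illustrates the general paradigm only and is not the dissertation's method. The graph, the 0.5 starting score and the iteration count are assumptions.

```python
def label_propagation(adj, seeds, n_iter=50):
    """adj: dict node -> list of neighbours; seeds: dict node -> known score in [0, 1].

    Unlabeled nodes repeatedly take the mean score of their neighbours; seed nodes stay clamped.
    """
    scores = {v: seeds.get(v, 0.5) for v in adj}  # unlabeled nodes start at 0.5 (assumed)
    for _ in range(n_iter):
        new = {}
        for v, nbrs in adj.items():
            if v in seeds:
                new[v] = seeds[v]
            elif nbrs:
                new[v] = sum(scores[u] for u in nbrs) / len(nbrs)
            else:
                new[v] = scores[v]
        scores = new
    return scores


# A path a - b - c - d where only the endpoints are labeled.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(label_propagation(chain, {"a": 1.0, "d": 0.0}))  # b -> 2/3, c -> 1/3
```

The iteration converges to the harmonic solution, in which every unlabeled node's score equals the average of its neighbours' scores.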
# Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target ( http://arxiv.org/abs/2305.18096v2 ) License: see link	Guan-Wei Wu, Guan-Ting Lin, Shang-Wen Li, Hung-yi Lee	Spoken Language Understanding (SLU) is a task that aims to extract semantic information from spoken utterances. Previous research has made progress in end-to-end SLU by using paired speech-text data, such as pre-trained Automatic Speech Recognition (ASR) models or paired text as intermediate targets. However, acquiring paired transcripts is expensive and impractical for unwritten languages. Textless SLU, on the other hand, extracts semantic information from speech without using paired transcripts. However, the absence of intermediate targets and training guidance for textless SLU often results in suboptimal performance. In this work, inspired by the content-disentangled discrete units from self-supervised speech models, we propose using discrete units as intermediate guidance to improve textless SLU performance. Our method surpasses the baseline method on five SLU benchmark corpora. Additionally, we find that unit guidance facilitates few-shot learning and enhances the model's ability to handle noise.	Translated: 2023-07-11 21:56:15 Published: 2023-07-08
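Discrete speech units are commonly obtained by clustering frame-level features from a self-supervised speech model and then collapsing consecutive repeated cluster ids. The sketch below shows only the assignment-and-dedup step. It assumes the centroids were already learned (for example with k-means) and uses toy 2-D "features". The paper's exact unit extraction settings are not given in the abstract.

```python
from itertools import groupby


def to_units(frames, centroids):
    """Map each feature frame to its nearest centroid id, then collapse consecutive repeats."""
    def nearest(frame):
        return min(range(len(centroids)),
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(frame, centroids[k])))

    ids = [nearest(f) for f in frames]
    return [unit for unit, _ in groupby(ids)]


# Toy 2-D features standing in for self-supervised speech representations.
centroids = [(0.0, 0.0), (10.0, 10.0)]
frames = [(0, 1), (1, 0), (9, 9), (10, 11), (0, 0)]
print(to_units(frames, centroids))  # [0, 1, 0]
```

The resulting unit sequence can then serve as an intermediate prediction target, in place of a text transcript, during SLU training.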
# Topological triple phase transition in non-Hermitian quasicrystals with complex asymmetric hopping ( http://arxiv.org/abs/2306.14987v2 ) License: see link	Shaina Gandhi and Jayendra N. Bandyopadhyay	Triple phase transitions, that is, simultaneous transitions of three different phases (topological, parity-time (PT) symmetry breaking, and metal-insulator transitions), are observed in an extension of the PT-symmetric non-Hermitian Aubry-André-Harper model. In this model, besides a non-Hermitian complex quasi-periodic onsite potential, non-Hermiticity is also included in the nearest-neighbor hopping terms. Moreover, the nearest-neighbor hopping terms are also quasi-periodic. The presence of two non-Hermitian parameters, one from the onsite potential and one from the hopping part, ensures a PT symmetry transition in the system. In addition, by tuning these two non-Hermitian parameters, we identify a parameter regime where we observe the triple phase transition. Following some recent studies, an electrical-circuit-based experimental realization of this model is also discussed.	Translated: 2023-07-11 21:47:45 Published: 2023-07-08
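A standard diagnostic for metal-insulator (localization) transitions in quasi-periodic lattices is the inverse participation ratio (IPR) of the eigenstates. The sketch below computes it for a single eigenvector given as a list of complex amplitudes. For non-Hermitian models one typically uses the right eigenvectors. Whether the paper uses exactly this normalization is an assumption.

```python
def inverse_participation_ratio(psi):
    """IPR = sum_n |psi_n|^4 / (sum_n |psi_n|^2)^2 for an eigenvector psi (complex entries allowed).

    Roughly 1/N for a state spread over N sites (metallic), O(1) for a localized state (insulating).
    """
    weights = [abs(x) ** 2 for x in psi]
    norm = sum(weights)
    return sum(w * w for w in weights) / norm ** 2


N = 100
print(inverse_participation_ratio([1.0] * N))              # 0.01: fully extended
print(inverse_participation_ratio([1.0] + [0.0] * (N - 1)))  # 1.0: localized on one site
```

Tracking the mean IPR over all eigenstates while tuning the non-Hermitian parameters is one way to locate the extended-to-localized transition numerically.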
# Insights of quantum time for quantum evolution ( http://arxiv.org/abs/2306.11675v2 ) License: see link	Ngo Phuc Duc Loc	If time is emergent, a quantum system becomes entangled with quantum time as it evolves. If the system contains entanglement within itself, which we call internal entanglement to distinguish it from the "external" time-system entanglement, the speed of evolution is enhanced. In this paper, we explore the insights that quantum time offers into the evolution of a system containing two entangled qubits. We consider two cases: (1) two initially entangled qubits that evolve under local dynamics; (2) two interacting qubits whose mutual entanglement is generated over time. In the first case, we obtain the main result that increasing internal entanglement speeds up the evolution and makes the system more entangled with time. In the second case, we show how the time-system entanglement entropy depends on the distance of evolution, which is characterized by fidelity. We also compare the two cases and find that two interacting qubits can evolve faster than two non-interacting qubits if the interaction is sufficiently strong, and thus become entangled with time more quickly. These results could help provide new insights into quantum time for black hole evaporation or cosmological perturbations in an expanding Universe, because those cases also involve an evolving entangled bipartite system.	Translated: 2023-07-11 21:46:59 Published: 2023-07-08
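The "internal entanglement" between the two qubits can be quantified with the standard entanglement entropy of a two-qubit pure state. The sketch below computes it in bits from the reduced density matrix ρ_A = M M†, where M = [[c00, c01], [c10, c11]]. It uses the closed-form eigenvalues of a 2×2 Hermitian matrix. This is a generic measure and is not claimed to be the paper's time-system entropy calculation.

```python
import math


def entanglement_entropy(psi):
    """Von Neumann entropy (bits) of either qubit for a normalized two-qubit pure state
    psi = [c00, c01, c10, c11]; complex amplitudes are allowed."""
    c00, c01, c10, c11 = psi
    # Reduced density matrix rho_A = [[a, b], [b*, d]] obtained by tracing out qubit B.
    a = abs(c00) ** 2 + abs(c01) ** 2
    d = abs(c10) ** 2 + abs(c11) ** 2
    b = c00 * c10.conjugate() + c01 * c11.conjugate()
    # Closed-form eigenvalues of a 2x2 Hermitian matrix.
    disc = math.sqrt(max(((a - d) / 2) ** 2 + abs(b) ** 2, 0.0))
    eigs = [(a + d) / 2 + disc, (a + d) / 2 - disc]
    return -sum(p * math.log2(p) for p in eigs if p > 1e-15)


s = 1 / math.sqrt(2)
print(entanglement_entropy([s, 0.0, 0.0, s]))  # about 1.0: maximally entangled
print(entanglement_entropy([0.5, 0.5, 0.5, 0.5]))  # about 0.0: product state |+>|+>
```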
# GNEP Based Dynamic Segmentation and Motion Estimation for Neuromorphic Imaging ( http://arxiv.org/abs/2307.02595v2 ) License: see link	Harbir Antil and David Sayre	This paper explores the application of event-based cameras to image segmentation and motion estimation. Departing from conventional frame-based image acquisition, these cameras capture visual information as a continuous stream of asynchronous events. We introduce a Generalized Nash Equilibrium based framework that leverages the temporal and spatial information derived from the event stream to carry out segmentation and velocity estimation. To establish the theoretical foundations, we derive an existence criterion and propose a multi-level optimization method for computing the equilibrium. The efficacy of this approach is shown through a series of experiments.	Translated: 2023-07-11 21:39:05 Published: 2023-07-08
# Shared Growth of Graph Neural Networks via Free-direction Knowledge Distillation ( http://arxiv.org/abs/2307.00534v2 ) License: see link	Kaituo Feng, Yikun Miao, Changsheng Li, Ye Yuan, Guoren Wang	Knowledge distillation (KD) has been shown to be effective in boosting the performance of graph neural networks (GNNs), where the typical objective is to distill knowledge from a deeper teacher GNN into a shallower student GNN. However, it is often quite challenging to train a satisfactory deeper GNN due to the well-known over-parametrization and over-smoothing issues, which leads to invalid knowledge transfer in practical applications. In this paper, we propose FreeKD, the first Free-direction Knowledge Distillation framework for GNNs via reinforcement learning, which no longer requires a deeper, well-optimized teacher GNN. Our core idea is to collaboratively train two shallower GNNs that exchange knowledge with each other via hierarchical reinforcement learning. Observing that a typical GNN model often performs better at some nodes and worse at others during training, we devise a dynamic, free-direction knowledge transfer strategy that involves two levels of actions: 1) a node-level action determines the direction of knowledge transfer between the corresponding nodes of the two networks; and then 2) a structure-level action determines which of the local structures generated by the node-level actions should be propagated. Furthermore, considering the diverse knowledge present in different GNNs when dealing with multi-view inputs, we introduce FreeKD++ to enable free-direction knowledge transfer among multiple shallow GNNs operating on multi-view inputs. Extensive experiments on five benchmark datasets demonstrate that our approaches outperform the base GNNs by a large margin and show their efficacy across various GNNs. More surprisingly, FreeKD achieves comparable or even better performance than traditional KD algorithms that distill knowledge from a deeper and stronger teacher GNN.	Translated: 2023-07-11 21:37:07 Published: 2023-07-08
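The node-level action decides, per node, which of the two GNNs teaches the other. FreeKD learns this decision with a reinforcement-learning agent. The sketch below replaces that agent with a simple greedy rule, which is an assumption made purely for illustration: at each labeled node, the network with the lower cross-entropy acts as the teacher. The function name and the 'A->B' / 'B->A' encoding are hypothetical.

```python
import math


def node_transfer_directions(probs_a, probs_b, labels):
    """Greedy stand-in for a node-level action: at each labeled node, the network with the
    lower cross-entropy on the true label teaches the other.

    probs_a, probs_b: per-node class-probability lists (entries must be > 0 at the label).
    Returns 'A->B' or 'B->A' for each node.
    """
    directions = []
    for pa, pb, y in zip(probs_a, probs_b, labels):
        loss_a, loss_b = -math.log(pa[y]), -math.log(pb[y])
        directions.append("A->B" if loss_a <= loss_b else "B->A")
    return directions


print(node_transfer_directions([[0.9, 0.1], [0.3, 0.7]],
                               [[0.6, 0.4], [0.2, 0.8]],
                               [0, 1]))  # ['A->B', 'B->A']
```

A learned policy can go beyond this rule, for example by using unlabeled nodes or structural context, which is the motivation for the reinforcement-learning formulation.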
# Achieving Efficiency in Black Box Simulation of Distribution Tails with Self-structuring Importance Samplers ( http://arxiv.org/abs/2102.07060v3 ) License: see link	Anand Deo, Karthyek Murthy	This paper presents a novel Importance Sampling (IS) scheme for estimating distribution tails of performance measures modeled with a rich set of tools such as linear programs, integer linear programs, piecewise linear/quadratic objectives, feature maps specified with deep neural networks, etc. The conventional approach of explicitly identifying efficient changes of measure suffers from feasibility and scalability concerns beyond highly stylized models, because such changes of measure must be tailored intricately to the objective and the underlying probability distribution. The proposed scheme overcomes this bottleneck with an elementary transformation that can implicitly induce an effective IS distribution in a variety of models by replicating the concentration properties observed in less rare samples. This novel approach is guided by a large deviations principle that brings out the phenomenon of self-similarity of optimal IS distributions. The proposed sampler is the first to attain asymptotically optimal variance reduction across a spectrum of multivariate distributions despite being oblivious to the specifics of the underlying model. Its applicability is illustrated with contextual shortest path and portfolio credit risk models informed by neural networks.	Translated: 2023-07-11 19:54:10 Published: 2023-07-08
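To see what "explicitly identifying an efficient change of measure" means, consider the simplest case. Estimate P(X > u) for a standard normal X by sampling from the shifted proposal N(u, 1) and reweighting by the likelihood ratio φ(x)/φ(x − u) = exp(−u·x + u²/2). The sketch below is this conventional, model-specific estimator. It is not the paper's self-structuring sampler, whose point is to avoid hand-deriving such shifts.

```python
import math
import random


def tail_prob_is(u, n=20000, seed=0):
    """Estimate P(X > u) for X ~ N(0, 1) by importance sampling from the proposal N(u, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(u, 1.0)
        if x > u:
            # Likelihood ratio phi(x) / phi(x - u) = exp(-u*x + u^2/2).
            total += math.exp(-u * x + 0.5 * u * u)
    return total / n


exact = 0.5 * math.erfc(4.0 / math.sqrt(2))  # P(X > 4), about 3.17e-5
print(tail_prob_is(4.0), exact)
```

With 20,000 samples, plain Monte Carlo would see a sample beyond 4 only about once on average. The shifted proposal instead puts roughly half its samples in the tail.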
# Fast Interpretable Greedy-Tree Sums ( http://arxiv.org/abs/2201.11931v3 ) License: see link	Yan Shuo Tan, Chandan Singh, Keyan Nasseri, Abhineet Agarwal, James Duncan, Omer Ronen, Matthew Epland, Aaron Kornblith, Bin Yu	Modern machine learning has achieved impressive prediction performance, but often sacrifices interpretability, a critical consideration in high-stakes domains such as medicine. In such settings, practitioners often use highly interpretable decision tree models, but these suffer from an inductive bias against additive structure. To overcome this bias, we propose Fast Interpretable Greedy-Tree Sums (FIGS), which generalizes the CART algorithm to simultaneously grow a flexible number of trees in summation. By combining logical rules with addition, FIGS is able to adapt to additive structure while remaining highly interpretable. Extensive experiments on real-world datasets show that FIGS achieves state-of-the-art prediction performance. To demonstrate the usefulness of FIGS in high-stakes domains, we adapt FIGS to learn clinical decision instruments (CDIs), which are tools for guiding clinical decision-making.
Specifically, we introduce a variant of FIGS known as G-FIGS that accounts for the heterogeneity in medical data. G-FIGS derives CDIs that reflect domain knowledge and enjoy improved specificity (by up to 20% over CART) without sacrificing sensitivity or interpretability. To provide further insight into FIGS, we prove that FIGS learns components of additive models, a property we refer to as disentanglement. Further, we show (under oracle conditions) that unconstrained tree-sum models leverage disentanglement to generalize more efficiently than single decision tree models when fitted to additive regression functions. Finally, to avoid overfitting with an unconstrained number of splits, we develop Bagging-FIGS, an ensemble version of FIGS that borrows the variance reduction techniques of random forests. Bagging-FIGS performs competitively with random forests and XGBoost on real-world datasets.	Translated: 2023-07-11 19:45:46 Published: 2023-07-08
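To make the point about additive structure concrete, here is a minimal pure-Python sketch that greedily fits a *sum* of depth-1 trees (stumps), each new stump fit to the residual of the current sum. This is a deliberately simplified illustration, not FIGS itself: FIGS can also grow existing trees by splitting their leaves, whereas this sketch only adds new stumps. The function names and toy data are assumptions for illustration.

```python
from itertools import product


def fit_stump(X, r):
    """Return (feature, threshold, left_value, right_value) minimizing squared error on r."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best[1:]


def predict_stump(stump, row):
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


def predict(stumps, row):
    # The model output is the SUM of all trees, which lets it capture additive structure.
    return sum(predict_stump(s, row) for s in stumps)


def fit_stump_sum(X, y, n_stumps):
    stumps = []
    for _ in range(n_stumps):
        # Each new stump is fit greedily to what the current sum still gets wrong.
        residual = [yi - predict(stumps, row) for row, yi in zip(X, y)]
        stumps.append(fit_stump(X, residual))
    return stumps


# Toy additive target: y = 1[x0 > 0] + 2 * 1[x1 > 0] on the four corners of {-1, 1}^2.
X = list(product([-1, 1], repeat=2))
y = [float(x0 > 0) + 2.0 * float(x1 > 0) for x0, x1 in X]
stumps = fit_stump_sum(X, y, n_stumps=2)
print([predict(stumps, row) for row in X])  # [0.0, 2.0, 1.0, 3.0]
```

Two stumps (two splits in total) reproduce this additive target exactly, whereas a single decision tree needs three splits (a root split plus one split in each child) to produce the four distinct output values.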
# Conservative Distributional Reinforcement Learning with Safety Constraints ( http://arxiv.org/abs/2201.07286v2 ) License: see link	Hengrui Zhang, Youfang Lin, Sheng Han, Shuo Wang, Kai Lv	Safe exploration can be regarded as a constrained Markov decision problem in which the expected long-term cost is constrained. Previous off-policy algorithms convert the constrained optimization problem into the corresponding unconstrained dual problem by introducing the Lagrangian relaxation technique. However, the cost functions of these algorithms provide inaccurate estimates, which destabilizes learning of the Lagrange multiplier. In this paper, we present a novel off-policy reinforcement learning algorithm called Conservative Distributional Maximum a Posteriori Policy Optimization (CDMPO). First, to accurately judge whether the current situation satisfies the constraints, CDMPO adopts a distributional reinforcement learning method to estimate the Q-function and C-function. Then, CDMPO uses a conservative value-function loss to reduce the number of constraint violations during exploration. In addition, we use Weighted Average Proportional Integral Derivative (WAPID) control to update the Lagrange multiplier stably. Empirical results show that the proposed method has fewer constraint violations in the early exploration process.
The final test results also show that our method achieves better risk control.	Translated: 2023-07-11 19:45:17 Published: 2023-07-08
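The abstract does not spell out how WAPID works. As a rough point of reference only, here is a minimal sketch of a plain PID-controlled Lagrange multiplier update of the kind used in PID Lagrangian methods for constrained RL; the weighted averaging that distinguishes WAPID is not reproduced, and the class name, gains, and cost values are hypothetical.

```python
class PIDLagrangeMultiplier:
    """Lagrange multiplier driven by a PID controller on the constraint violation."""

    def __init__(self, cost_limit, kp=0.1, ki=0.01, kd=0.05):
        self.cost_limit = cost_limit
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_cost = None

    def update(self, episode_cost):
        # Positive error means the observed cost exceeds the allowed budget.
        error = episode_cost - self.cost_limit
        # Integral term is clipped at zero so long safe stretches do not build up "credit".
        self.integral = max(0.0, self.integral + error)
        # Derivative term only reacts to costs that are rising.
        derivative = 0.0 if self.prev_cost is None else max(0.0, episode_cost - self.prev_cost)
        self.prev_cost = episode_cost
        # Project onto lambda >= 0, as required for a multiplier on an inequality constraint.
        return max(0.0, self.kp * error + self.ki * self.integral + self.kd * derivative)


controller = PIDLagrangeMultiplier(cost_limit=25.0)
multipliers = [controller.update(c) for c in [30.0, 30.0, 20.0, 10.0]]
print(multipliers)
```

The integral term is what makes the multiplier keep growing while violations persist, and the derivative term reacts early to rising cost; both aim at the instability of plain gradient-ascent multiplier updates that the abstract describes.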
# DDAC-SpAM: A Distributed Algorithm for Fitting High-dimensional Sparse Additive Models with Feature Division and Decorrelation ( http://arxiv.org/abs/2205.07932v2 ) License: see link	Yifan He and Ruiyang Wu and Yong Zhou and Yang Feng	Distributed statistical learning has become a popular technique for large-scale data analysis. Most existing work in this area focuses on dividing the observations, but we propose a new algorithm, DDAC-SpAM, which divides the features under a high-dimensional sparse additive model. Our approach involves three steps: divide, decorrelate, and conquer. The decorrelation operation enables each local estimator to recover the sparsity pattern for each additive component without imposing strict constraints on the correlation structure among variables. The effectiveness and efficiency of the proposed algorithm are demonstrated through theoretical analysis and empirical results on both synthetic and real data. The theoretical results include both consistent sparsity pattern recovery and statistical inference for each additive functional component. Our approach provides a practical solution for fitting sparse additive models, with promising applications in a wide range of domains.	Translated: 2023-07-11 19:35:49 Published: 2023-07-08
# Deep fiber clustering: Anatomically informed fiber clustering with self-supervised deep learning for fast and effective tractography parcellation ( http://arxiv.org/abs/2205.00627v3 ) License: see link	Yuqian Chen, Chaoyi Zhang, Tengfei Xue, Yang Song, Nikos Makris, Yogesh Rathi, Weidong Cai, Fan Zhang, Lauren J. O'Donnell	White matter fiber clustering is an important strategy for white matter parcellation, which enables quantitative analysis of brain connections in health and disease. In combination with expert neuroanatomical labeling, data-driven white matter fiber clustering is a powerful tool for creating atlases that can model white matter anatomy across individuals. While widely used fiber clustering approaches have shown good performance using classical unsupervised machine learning techniques, recent advances in deep learning reveal a promising direction toward fast and effective fiber clustering. In this work, we propose a novel deep learning framework for white matter fiber clustering, Deep Fiber Clustering (DFC), which reformulates the unsupervised clustering problem as a self-supervised learning task with a domain-specific pretext task: predicting pairwise fiber distances.
This process learns a high-dimensional embedding feature representation for each fiber, regardless of the order of fiber points reconstructed during tractography. We design a novel network architecture that represents input fibers as point clouds and allows the incorporation of additional sources of input information from gray matter parcellation to improve the anatomical coherence of clusters. In addition, DFC naturally performs outlier removal by rejecting fibers with low cluster assignment probability. We evaluate DFC on three independently acquired cohorts, including data from 220 individuals across genders, ages (young and elderly adults), and different health conditions (healthy control and multiple neuropsychiatric disorders). We compare DFC to several state-of-the-art white matter fiber clustering algorithms. Experimental results demonstrate superior performance of DFC in terms of cluster compactness, generalization ability, anatomical coherence, and computational efficiency.	Translated: 2023-07-11 19:35:12 Published: 2023-07-08
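Because a streamline's points may be stored in either direction, a fiber distance should not depend on point order. A common flip-invariant choice in tractography is the minimum average direct-flip (MDF) distance, sketched below for fibers resampled to the same number of points. The abstract does not say which distance DFC's pretext task predicts, so treating MDF as that target is an assumption; the toy fibers are hypothetical.

```python
import math


def mean_pointwise_distance(a, b):
    """Average Euclidean distance between corresponding points of two equal-length fibers."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def mdf_distance(a, b):
    """Minimum average direct-flip distance: unchanged if either fiber's point order is reversed."""
    if len(a) != len(b):
        raise ValueError("fibers must be resampled to the same number of points")
    return min(mean_pointwise_distance(a, b), mean_pointwise_distance(a, b[::-1]))


fiber = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
shifted = [(x, 1.0, z) for x, _, z in fiber]
print(mdf_distance(fiber, fiber[::-1]))    # 0.0: same fiber traced in the opposite direction
print(mdf_distance(fiber, shifted[::-1]))  # 1.0: parallel fiber one unit away
```

Without the flip, the first pair would score 4/3 even though both entries describe the same streamline.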
# Cryptocurrency Valuation: An Explainable AI Approach ( http://arxiv.org/abs/2201.12893v8 ) License: see link	Yulin Liu and Luyao Zhang	Currently, there are no convincing proxies for the fundamentals of cryptocurrency assets. We propose a new market-to-fundamental ratio, the price-to-utility (PU) ratio, utilizing unique blockchain accounting methods. We then proxy various existing fundamental-to-market ratios using Bitcoin historical data and find that they have little predictive power for short-term bitcoin returns. However, the PU ratio predicts long-term bitcoin returns more effectively than alternative methods. Furthermore, we verify the explainability of the PU ratio using machine learning. Finally, we present an automated trading strategy advised by the PU ratio that outperforms the conventional buy-and-hold and market-timing strategies. Our research contributes to explainable AI in finance in three ways: first, our market-to-fundamental ratio is grounded in classic monetary theory and the unique UTXO model of Bitcoin accounting rather than being ad hoc; second, the empirical evidence supports the buy-low, sell-high implications of the ratio; finally, we distribute the trading algorithms as open-source software via the Python Package Index for future research, which is uncommon in finance research.	Translated: 2023-07-11 19:33:29 Published: 2023-07-08
# Preferential Subsampling for Stochastic Gradient Langevin Dynamics ( http://arxiv.org/abs/2210.16189v3 ) License: see link	Srshti Putcha, Christopher Nemeth, Paul Fearnhead	Stochastic gradient MCMC (SGMCMC) offers a scalable alternative to traditional MCMC by constructing an unbiased estimate of the gradient of the log-posterior from a small, uniformly weighted subsample of the data. While efficient to compute, the resulting gradient estimator may exhibit high variance, which can degrade sampler performance. The problem of variance control has traditionally been addressed by constructing a better stochastic gradient estimator, often using control variates. We propose to use a discrete, non-uniform probability distribution to preferentially subsample data points that have a greater impact on the stochastic gradient. In addition, we present a method for adaptively adjusting the subsample size at each iteration of the algorithm, so that the subsample size increases in regions of the sample space where the gradient is harder to estimate. We demonstrate that this approach can maintain the same level of accuracy while substantially reducing the average subsample size used.	Translated: 2023-07-11 19:15:02 Published: 2023-07-08
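A minimal sketch of the core idea, with scalar toy gradients: draw data indices from a non-uniform distribution p and reweight each sampled gradient by 1/p_i, so that the estimate of the full-data gradient sum stays unbiased while its variance can shrink. Setting p proportional to the exact gradient magnitudes is only possible in a toy like this (a practical scheme would need a cheaper proxy), and the function names, SGLD step, and numbers are illustrative assumptions, not the paper's exact method; the adaptive subsample size is not reproduced.

```python
import math
import random


def preferential_gradient_estimate(grads, probs, m, rng):
    """Estimate sum(grads) from m indices drawn with replacement, reweighting each by 1 / p_i."""
    idx = rng.choices(range(len(grads)), weights=probs, k=m)
    return sum(grads[i] / probs[i] for i in idx) / m


def estimator_variance(grads, probs):
    """Exact variance of the single-draw estimator g_I / p_I, whose mean is sum(grads)."""
    total = sum(grads)
    return sum(p * (g / p - total) ** 2 for g, p in zip(grads, probs))


def sgld_step(theta, grad_log_prior, grad_log_lik_estimate, step_size, rng):
    """One SGLD update: half a step along the estimated log-posterior gradient plus Gaussian noise."""
    drift = 0.5 * step_size * (grad_log_prior + grad_log_lik_estimate)
    return theta + drift + rng.gauss(0.0, math.sqrt(step_size))


grads = [1.0, 2.0, 3.0, 10.0]  # toy per-datum log-likelihood gradients (scalars for brevity)
uniform = [0.25] * len(grads)
preferential = [g / sum(grads) for g in grads]  # p_i proportional to g_i (all positive here)
print(estimator_variance(grads, uniform))       # 200.0
print(estimator_variance(grads, preferential))  # 0.0

rng = random.Random(0)
estimate = preferential_gradient_estimate(grads, preferential, m=2, rng=rng)
theta = sgld_step(0.0, grad_log_prior=0.0, grad_log_lik_estimate=estimate, step_size=0.01, rng=rng)
```

In this extreme case every reweighted draw equals the full sum 16.0, so the preferential estimator has zero variance; with mixed-sign or vector gradients the reduction is partial rather than total.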
# Conditionally Risk-Averse Contextual Bandits ( http://arxiv.org/abs/2210.13573v2 ) License: see link	Mónika Farsang and Paul Mineiro and Wangda Zhang	Contextual bandits with average-case statistical guarantees are inadequate in risk-averse situations because they might trade off degraded worst-case behaviour for better average performance. Designing a risk-averse contextual bandit is challenging because exploration is necessary but risk-aversion is sensitive to the entire distribution of rewards; nonetheless, we exhibit the first risk-averse contextual bandit algorithm with an online regret guarantee. We conduct experiments in diverse scenarios where worst-case outcomes should be avoided, including dynamic pricing, inventory management, and self-tuning software, the last of which includes a production exascale data processing system.	Translated: 2023-07-11 19:14:45 Published: 2023-07-08
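To see why average-case guarantees can be inadequate, the sketch below compares two hypothetical reward samples with the same mean but very different lower tails, using the empirical conditional value-at-risk (CVaR: the mean of the worst alpha-fraction of rewards) as an example risk measure. CVaR is chosen here only for illustration; the abstract does not state which risk measure the paper's algorithm targets.

```python
import math


def empirical_cvar(rewards, alpha):
    """Mean of the worst ceil(alpha * n) rewards: an empirical lower-tail CVaR at level alpha."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    k = math.ceil(alpha * len(rewards))
    return sum(sorted(rewards)[:k]) / k


# Two hypothetical reward samples with the same mean but very different worst cases.
safe = [4, 5, 5, 5, 6] * 4
risky = [-10, 8, 8, 9, 10] * 4
print(sum(safe) / len(safe), sum(risky) / len(risky))          # 5.0 5.0
print(empirical_cvar(safe, 0.25), empirical_cvar(risky, 0.25))  # 4.2 -6.4
```

A learner that only compares means cannot tell these two options apart, while a tail-sensitive criterion strongly prefers the first; this dependence on the whole reward distribution is what makes exploration harder in the risk-averse setting.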
# The First IEEE UV2022 Mathematical Modelling Competition: Backgrounds and Problems ( http://arxiv.org/abs/2212.07903v2 ) License: see link	Juntao Jiang, Yuan Niu, Yi Tao	Economic growth, people's health, and urban development face challenges in the post-epidemic era. How to promote high-quality and sustainable urban development, improve citizens' sense of happiness, and solve problems in city management have become pressing and crucial topics. Mathematical modeling is a research method that uses mathematical symbols to express practical problems, establish mathematical models, and then propose solutions. The 1st IEEE UV2022 Mathematical Modelling Competition is a satellite activity of the 6th IEEE International Conference on Universal Village, which expects participants to apply mathematical modeling methods to practical problems and provide guidelines for sustainable social progress. This short paper introduces the background of the competition and publishes the problems to be solved.	Translated: 2023-07-11 19:05:27 Published: 2023-07-08
# AI Ethics on Blockchain: Topic Analysis on Twitter Data for Blockchain Security ( http://arxiv.org/abs/2212.06951v5 ) License: see link	Yihang Fu, Zesen Zhuang, Luyao Zhang	Blockchain has empowered computer systems to be more secure using a distributed network. However, the current blockchain design suffers from fairness issues in transaction ordering. Miners are able to reorder transactions to generate profits, the so-called miner extractable value (MEV). Existing research recognizes MEV as a severe security issue and proposes potential solutions, including the prominent Flashbots. However, previous studies have mostly analyzed blockchain data, which might not capture the impacts of MEV in the much broader AI society. Thus, in this research, we applied natural language processing (NLP) methods to comprehensively analyze topics in tweets on MEV. We collected more than 20,000 tweets with the #MEV and #Flashbots hashtags and analyzed their topics. Our results show that the tweets discussed profound topics of ethical concern, including security, equity, emotional sentiments, and the desire for solutions to MEV. We also identify co-movements of MEV activity on blockchain and social media platforms. Our study contributes to the literature at the interface of blockchain security, MEV solutions, and AI ethics.	Translated: 2023-07-11 19:05:10 Published: 2023-07-08
# A Machine with Short-Term, Episodic, and Semantic Memory Systems ( http://arxiv.org/abs/2212.02098v2 ) License: see link	Taewoon Kim, Michael Cochez, Vincent François-Lavet, Mark Neerincx, Piek Vossen	Inspired by the cognitive science theory of explicit human memory systems, we have modeled an agent with short-term, episodic, and semantic memory systems, each of which is modeled with a knowledge graph. To evaluate this system and analyze the behavior of this agent, we designed and released our own reinforcement learning agent environment, "the Room", in which an agent has to learn how to encode, store, and retrieve memories to maximize its return by answering questions. We show that our deep Q-learning-based agent successfully learns whether a short-term memory should be forgotten or instead stored in the episodic or semantic memory system. Our experiments indicate that an agent with human-like memory systems can outperform an agent without this memory structure in the environment.	Translated: 2023-07-11 19:04:26 Published: 2023-07-08
# Graph Neural Networks for temporal graphs: State of the art, open challenges, and opportunities ( http://arxiv.org/abs/2302.01018v4 ) License: see link	Antonio Longa, Veronica Lachi, Gabriele Santin, Monica Bianchini, Bruno Lepri, Pietro Lio, Franco Scarselli and Andrea Passerini	Graph Neural Networks (GNNs) have become the leading paradigm for learning on (static) graph-structured data. However, many real-world systems are dynamic in nature, since the graph and node/edge attributes change over time. In recent years, GNN-based models for temporal graphs have emerged as a promising area of research to extend the capabilities of GNNs. In this work, we provide the first comprehensive overview of the current state of the art in temporal GNNs, introducing a rigorous formalization of learning settings and tasks and a novel taxonomy that categorizes existing approaches by how the temporal aspect is represented and processed. We conclude the survey with a discussion of the most relevant open challenges for the field, from both research and application perspectives.	Translated: 2023-07-11 18:56:21 Published: 2023-07-08
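One basic choice behind "how the temporal aspect is represented" is whether a temporal graph is kept as a stream of timestamped events or discretized into snapshots. The survey's own taxonomy is not reproduced here; as a hedged illustration, this sketch converts a hypothetical event stream into fixed-width snapshots, and the function name and window width are assumptions.

```python
from collections import defaultdict


def events_to_snapshots(events, window):
    """Bucket timestamped edge events (u, v, t) into discrete snapshots of width `window`.

    Returns {snapshot_index: sorted list of edges active in that window}. The exact event
    times, and the order of events within a window, are discarded by this conversion.
    """
    snapshots = defaultdict(set)
    for u, v, t in events:
        snapshots[int(t // window)].add((u, v))
    return {k: sorted(edges) for k, edges in sorted(snapshots.items())}


events = [("a", "b", 0.5), ("b", "c", 1.2), ("a", "b", 1.9), ("c", "d", 3.1)]
print(events_to_snapshots(events, window=1.0))
# {0: [('a', 'b')], 1: [('a', 'b'), ('b', 'c')], 3: [('c', 'd')]}
```

Snapshots let a static GNN be applied per time step, at the cost of losing fine-grained timing (here, the repeated a-b interaction and its ordering relative to b-c inside window 1); event-based models keep that information but need architectures that consume continuous time.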
# Local transfer learning from one data space to another ( http://arxiv.org/abs/2302.00160v2 ) License: see link	H. N. Mhaskar and Ryan O'Dowd	A fundamental problem in manifold learning is to approximate a functional relationship on data chosen randomly from a probability distribution supported on a low-dimensional sub-manifold of a high-dimensional ambient Euclidean space. The manifold is essentially defined by the data set itself and is typically designed so that the data is dense on the manifold in some sense. The notion of a data space is an abstraction of a manifold encapsulating the essential properties that allow for function approximation. The problem of transfer learning (meta-learning) is to use the learning of a function on one data set to learn a similar function on a new data set. In terms of function approximation, this means lifting a function on one data space (the base data space) to another (the target data space). This viewpoint enables us to connect some inverse problems in applied mathematics (such as the inverse Radon transform) with transfer learning. In this paper we examine such lifting when the data is assumed to be known only on a part of the base data space. We are interested in determining the subsets of the target data space on which the lifting can be defined, and in how the local smoothness of the function relates to that of its lifting.	Translated: 2023-07-11 18:56:07 Published: 2023-07-08
# MADAv2: Advanced Multi-Anchor Based Active Domain Adaptation Segmentation ( http://arxiv.org/abs/2301.07354v2 ) License: see link	Munan Ning, Donghuan Lu, Yujia Xie, Dongdong Chen, Dong Wei, Yefeng Zheng, Yonghong Tian, Shuicheng Yan, Li Yuan	Unsupervised domain adaptation has been widely adopted in tasks with scarce annotated data. Unfortunately, unconditionally mapping the target-domain distribution to the source domain may distort the essential structural information of the target-domain data, leading to inferior performance. To address this issue, we first propose introducing active sample selection to assist domain adaptation for the semantic segmentation task. By adopting multiple anchors instead of a single centroid, both source and target domains can be better characterized as multimodal distributions, so that more complementary and informative samples are selected from the target domain. With only a small workload to manually annotate these active samples, the distortion of the target-domain distribution can be effectively alleviated, yielding a large performance gain. In addition, a powerful semi-supervised domain adaptation strategy is proposed to alleviate the long-tail distribution problem and further improve segmentation performance.
Extensive experiments are conducted on public datasets, and the results demonstrate that the proposed approach outperforms state-of-the-art methods by large margins and achieves performance similar to the fully supervised upper bound, i.e., 71.4% mIoU on GTA5 and 71.8% mIoU on SYNTHIA. The effectiveness of each component is also verified by thorough ablation studies.	Translated: 2023-07-11 18:55:17 Published: 2023-07-08
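A rough sketch of multi-anchor active selection under stated assumptions: anchors are taken as given (for example, cluster centres of source-domain features), each target feature is scored by its distance to the nearest anchor, and the targets whose nearest anchor is farthest away are chosen for annotation. This scoring rule is an assumption consistent with the abstract's description, not a reproduction of MADAv2, and all feature values are toy numbers.

```python
import math


def nearest_anchor_distance(feature, anchors):
    """Distance from a feature vector to the closest of several anchors."""
    return min(math.dist(feature, a) for a in anchors)


def select_active_samples(target_features, anchors, budget):
    """Indices of the `budget` target samples whose nearest anchor is farthest away."""
    scores = [nearest_anchor_distance(f, anchors) for f in target_features]
    ranked = sorted(range(len(target_features)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Two hypothetical anchors describing a bimodal source distribution.
anchors = [(0.0, 0.0), (10.0, 0.0)]
targets = [(0.5, 0.0), (9.5, 0.0), (5.0, 5.0), (10.0, 1.0)]
print(select_active_samples(targets, anchors, budget=2))  # [2, 3]
```

With a single centroid at (5, 0) instead, the first two targets would score 4.5, nearly as atypical as the genuinely novel sample at (5, 5) (score 5.0), even though each sits next to a source mode; multiple anchors avoid that distortion.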
# Provable Robust Saliency-based Explanations ( http://arxiv.org/abs/2212.14106v3 ) License: see link	Chao Chen, Chenghua Guo, Guixiang Ma, Ming Zeng, Xi Zhang, Sihong Xie	Robust explanations of machine learning models are critical to establishing human trust in the models. The top-$k$ intersection is widely used to evaluate the robustness of explanations. However, most existing attack and defense strategies are based on $\ell_p$ norms, creating a mismatch between the evaluation and optimization objectives. To this end, we define explanation thickness to measure the ranking stability of the top-$k$ salient features, and design the R2ET algorithm, based on a novel tractable surrogate, to maximize the thickness and efficiently stabilize the top salient features. Theoretically, we prove a connection between R2ET and adversarial training; using a novel multi-objective optimization formulation and a generalization error bound, we further prove that the surrogate objective can improve both the numerical and statistical stability of the explanations. Experiments with a wide spectrum of network architectures and data modalities demonstrate that R2ET attains higher explanation robustness under stealthy attacks while retaining model accuracy.	Translated: 2023-07-11 18:54:27 Published: 2023-07-08
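The top-$k$ intersection metric mentioned above can be computed as the fraction of the $k$ highest-saliency features that two explanations share, for example before and after a small input perturbation. A minimal sketch with hypothetical saliency scores; the tie-breaking rule is an assumption.

```python
def top_k_indices(saliency, k):
    """Indices of the k largest saliency scores (ties broken by lower index)."""
    return set(sorted(range(len(saliency)), key=lambda i: (-saliency[i], i))[:k])


def top_k_intersection(saliency_a, saliency_b, k):
    """Fraction of top-k features shared by two explanations (1.0 means identical top-k sets)."""
    return len(top_k_indices(saliency_a, k) & top_k_indices(saliency_b, k)) / k


original = [0.9, 0.1, 0.7, 0.3, 0.5]   # hypothetical saliency of 5 input features
perturbed = [0.8, 0.2, 0.3, 0.6, 0.5]  # saliency after a small input perturbation
print(top_k_intersection(original, perturbed, k=3))  # 0.6666666666666666
```

Here feature 2 drops out of the top 3 and feature 3 enters, so only two of the three top features survive the perturbation. Note that the metric depends only on set membership, not on the margins between scores, which is why it is awkward to optimize directly with $\ell_p$-based objectives.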
# CausalDialogue: Modeling Utterance-level Causality in Conversations ( http://arxiv.org/abs/2212.10515v2 ) License: see link	Yi-Lin Tuan, Alon Albalak, Wenda Xu, Michael Saxon, Connor Pryor, Lise Getoor, William Yang Wang	Despite their widespread adoption, neural conversation models have yet to exhibit natural chat capabilities with humans. In this research, we examine user utterances as causes and generated responses as effects, recognizing that changes in a cause should produce a different effect. To further explore this concept, we have compiled and expanded upon a new dataset called CausalDialogue through crowdsourcing. This dataset includes multiple cause-effect pairs within a directed acyclic graph (DAG) structure. Our analysis reveals that traditional loss functions struggle to effectively incorporate the DAG structure, leading us to propose a causality-enhanced method called Exponential Maximum Average Treatment Effect (ExMATE) to enhance the impact of causality at the utterance level in training neural conversation models. To evaluate the need to consider causality in dialogue generation, we built a comprehensive benchmark on the CausalDialogue dataset using different models, inference methods, and training methods.
Through experiments, we find that a causality-inspired loss like ExMATE can improve diversity and agility compared with a conventional loss function, and that there is still room for improvement to reach human-level quality on this new dataset.	Translated: 2023-07-11 18:54:07 Published: 2023-07-08
# BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning ( http://arxiv.org/abs/2303.14773v2 ) License: see link	Changdae Oh, Hyeji Hwang, Hee-young Lee, YongTaek Lim, Geunyoung Jung, Jiyoung Jung, Hosik Choi, Kyungwoo Song	With the surge of large-scale pre-trained models (PTMs), fine-tuning these models for numerous downstream tasks has become a crucial problem. Consequently, parameter-efficient transfer learning (PETL) of large models has attracted considerable attention. While recent PETL methods showcase impressive performance, they rely on optimistic assumptions: 1) the entire parameter set of a PTM is available, and 2) a sufficiently large memory capacity is available for fine-tuning. However, in most real-world applications, PTMs are served as black-box APIs or proprietary software without explicit parameter accessibility. Moreover, it is hard to meet the large memory requirements of modern PTMs. In this work, we propose black-box visual prompting (BlackVIP), which efficiently adapts PTMs without knowledge of model architectures or parameters. BlackVIP has two components: 1) the Coordinator and 2) simultaneous perturbation stochastic approximation with gradient correction (SPSA-GC).
The Coordinator designs input-dependent, image-shaped visual prompts, which improve few-shot adaptation and robustness under distribution/location shift. SPSA-GC efficiently estimates the gradient of the target model to update the Coordinator. Extensive experiments on 16 datasets demonstrate that BlackVIP enables robust adaptation to diverse domains without accessing PTMs' parameters, with minimal memory requirements. Code: https://github.com/changdaeoh/BlackVIP	Translated: 2023-07-11 18:46:32 Published: 2023-07-08
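SPSA estimates a gradient from only two loss evaluations per step by perturbing all parameters at once along a random ±1 direction, which is what makes it usable when the model is a black box. The sketch below shows plain SPSA on a toy quadratic standing in for a black-box loss; the gradient-correction part of SPSA-GC is not reproduced, and the function names, step sizes, and objective are illustrative assumptions.

```python
import random


def spsa_gradient(loss, theta, c, rng):
    """Two-evaluation SPSA gradient estimate using a random Rademacher (+/-1) perturbation."""
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = loss([t + c * d for t, d in zip(theta, delta)])
    minus = loss([t - c * d for t, d in zip(theta, delta)])
    scale = (plus - minus) / (2.0 * c)
    # Each component is scale / delta_i; for +/-1 entries, 1 / delta_i == delta_i.
    return [scale * d for d in delta]


def spsa_minimize(loss, theta, steps, lr, c, seed=0):
    """Plain SPSA descent: only loss values are queried; no gradients are computed."""
    rng = random.Random(seed)
    for _ in range(steps):
        grad = spsa_gradient(loss, theta, c, rng)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def black_box_loss(params):
    # Toy stand-in for a black-box objective with its minimum at (1, -2).
    return (params[0] - 1.0) ** 2 + (params[1] + 2.0) ** 2


theta = spsa_minimize(black_box_loss, [0.0, 0.0], steps=500, lr=0.05, c=0.01)
print([round(t, 3) for t in theta])  # [1.0, -2.0]
```

The cost per step is two forward queries regardless of the number of parameters, which is the property that matters when the target model exposes only outputs.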
# Object-Centric Slot Diffusion ( http://arxiv.org/abs/2303.10834v2 ) License: see link	Jindong Jiang, Fei Deng, Gautam Singh, Sungjin Ahn	The recent success of transformer-based image generative models in object-centric learning highlights the importance of powerful image generators for handling complex scenes. However, despite the high expressiveness of diffusion models in image generation, their integration into object-centric learning remains largely unexplored. In this paper, we explore the feasibility and potential of integrating diffusion models into object-centric learning and investigate the pros and cons of this approach. We introduce Latent Slot Diffusion (LSD), a novel model that serves dual purposes: it is the first object-centric learning model to replace conventional slot decoders with a latent diffusion model conditioned on object slots, and it is also the first unsupervised compositional conditional diffusion model that operates without the need for supervised annotations like text. Through experiments on various object-centric tasks, including the first application of the FFHQ dataset in this field, we demonstrate that LSD significantly outperforms state-of-the-art transformer-based decoders, particularly in more complex scenes, and exhibits superior unsupervised compositional generation quality. The project page is available at https://latentslotdiffusion.github.io	Translated: 2023-07-11 18:45:48 Published: 2023-07-08
# Convergence Rates for Localized Actor-Critic in Networked Markov Potential Games ( http://arxiv.org/abs/2303.04865v2 ) License: see link	Zhaoyi Zhou, Zaiwei Chen, Yiheng Lin, and Adam Wierman	We introduce a class of networked Markov potential games in which agents are associated with nodes in a network. Each agent has its own local potential function, and the reward of each agent depends only on the states and actions of the agents within a neighborhood. In this context, we propose a localized actor-critic algorithm. The algorithm is scalable since each agent uses only local information and does not need access to the global state. Further, the algorithm overcomes the curse of dimensionality through the use of function approximation. Our main results provide finite-sample guarantees up to a localization error and a function approximation error. Specifically, we achieve an $\tilde{\mathcal{O}}(\tilde{\epsilon}^{-4})$ sample complexity measured by the averaged Nash regret. This is the first finite-sample bound for multi-agent competitive games that does not depend on the number of agents.	Translated: 2023-07-11 18:45:07 Published: 2023-07-08
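"Local information" can be made concrete as the states of agents within some number of hops in the network. The sketch below computes such a neighborhood by breadth-first search; defining the neighborhood as a k-hop ball is an assumption typical of localized networked multi-agent RL, not a detail stated in this abstract, and the path graph and state labels are hypothetical.

```python
from collections import deque


def k_hop_neighborhood(adjacency, agent, k):
    """Agents within k hops of `agent` (including itself), via breadth-first search."""
    seen = {agent: 0}
    queue = deque([agent])
    while queue:
        node = queue.popleft()
        if seen[node] == k:
            continue  # do not expand beyond the k-hop boundary
        for nbr in adjacency[node]:
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return set(seen)


# Hypothetical path network 0 - 1 - 2 - 3 - 4.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
states = {i: f"s{i}" for i in adjacency}
local_view = {j: states[j] for j in sorted(k_hop_neighborhood(adjacency, 2, k=1))}
print(local_view)  # {1: 's1', 2: 's2', 3: 's3'}
```

Because each agent's input size depends on the neighborhood size rather than the total number of agents, this kind of truncation is what allows per-agent computation to stay bounded as the network grows.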
# OmniForce: 人間中心・大規模モデル強化・クラウドエッジ協調型AutoMLシステムについて OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge Collaborative AutoML System ( http://arxiv.org/abs/2303.00501v2 ) ライセンス: Link先を確認	Chao Xue, Wei Liu, Shuai Xie, Zhenfang Wang, Jiaxing Li, Xuyang Peng, Liang Ding, Shanshan Zhao, Qiong Cao, Yibo Yang, Fengxiang He, Bohua Cai, Rongcheng Bian, Yiyan Zhao, Heliang Zheng, Xiangyang Liu, Dongkai Liu, Daqing Liu, Li Shen, Chang Li, Shijin Zhang, Yukang Zhang, Guanpu Chen, Shixiang Chen, Yibing Zhan, Jing Zhang, Chaoyue Wang, Dacheng Tao	(参考訳) 自動機械学習(AutoML)は、最小限の人手でMLモデルを構築することを目指している。AutoML全般については、人工知能(AI)アプリケーションの構築において人間をループから外すことを目指して多くの研究が行われてきたが、大規模モデルの学習と更新のプロセス、産業サプライチェーン、産業メタバースといったオープン環境シナリオでAutoMLをいかにうまく機能させるかに焦点を当てた文献はほとんどない。こうしたシナリオでは、探索プロセス中にしばしばオープンループ問題に直面する。すなわち、データを継続的に収集し、データとモデルを更新し、開発・デプロイ環境の要件を満たし、膨大な数のデバイスをサポートし、評価指標を修正するなどしなければならない。純粋なデータ駆動アプローチでオープン環境の問題に対処するには、大量のデータ、計算リソース、専任のデータエンジニアの労力が必要であり、現在のAutoMLシステムやプラットフォームは非効率で計算的に扱いにくいものとなっている。人間とコンピュータの相互作用は、オープン環境AIの問題に取り組むための実用的で実現可能な方法である。本稿では、人間支援型MLとML支援型人間の両方の技術を生み出す人間中心型AutoML(HAML)システムであるOmniForceを紹介し、AutoMLシステムを実用化してオープン環境シナリオにおける適応型AIを構築する。具体的には、MLバージョン管理、パイプライン駆動の開発とデプロイの連携、柔軟な探索戦略フレームワーク、大規模モデルを含む広く提供されクラウドソースされたアプリケーションアルゴリズムという観点からOmniForceを紹介する。さらに、OmniForceによって構築された(大規模な)モデルは数分で自動的にリモートサービスに変換でき、このプロセスはMaaS(model as a service)と呼ばれる。複数の探索空間と実世界のユースケースで得られた実験結果は、OmniForceの有効性と効率性を示している。 Automated machine learning (AutoML) seeks to build ML models with minimal human effort. 
While considerable research has been conducted in the area of AutoML in general, aiming to take humans out of the loop when building artificial intelligence (AI) applications, scant literature has focused on how AutoML works well in open-environment scenarios such as the process of training and updating large models, industrial supply chains or the industrial metaverse, where people often face open-loop problems during the search process: they must continuously collect data, update data and models, satisfy the requirements of the development and deployment environment, support massive devices, modify evaluation metrics, etc. Addressing the open-environment issue with pure data-driven approaches requires considerable data, computing resources, and effort from dedicated data engineers, making current AutoML systems and platforms inefficient and computationally intractable. Human-computer interaction is a practical and feasible way to tackle the problem of open-environment AI. In this paper, we introduce OmniForce, a human-centered AutoML (HAML) system that yields both human-assisted ML and ML-assisted human techniques, to put an AutoML system into practice and build adaptive AI in open-environment scenarios. Specifically, we present OmniForce in terms of ML version management; pipeline-driven development and deployment collaborations; a flexible search strategy framework; and widely provisioned and crowdsourced application algorithms, including large models. Furthermore, the (large) models constructed by OmniForce can be automatically turned into remote services in a few minutes; this process is dubbed model as a service (MaaS). Experimental results obtained in multiple search spaces and real-world use cases demonstrate the efficacy and efficiency of OmniForce.	翻訳日:2023-07-11 18:44:38 公開日:2023-07-08
# ナップサック付きのほぼ定常なバンディット Approximately Stationary Bandits with Knapsacks ( http://arxiv.org/abs/2302.14686v2 ) ライセンス: Link先を確認	Giannis Fikioris, Éva Tardos	(参考訳) 大域的な予算制約の下でのバンディット問題の一般化であるナップサック付きバンディット(BwK)は、近年大きな注目を集めている。先行研究は2つの極端な場合、すなわち各ラウンドの報酬と資源消費がi.i.d.分布からサンプリングされる確率的BwKと、これらのパラメータが敵対者によって選ばれる敵対的BwKのいずれかに焦点を当ててきた。2つの場合で達成可能な保証には大きなギャップがある。確率的な場合にはノーリグレット学習が達成可能であるが、敵対的な場合には競争比型の保証しか達成できず、その競争比は予算、または時間と資源数の両方に依存する。このギャップをこれほど大きくしているのは、敵対的BwKでは、予算の制約がより厳しいという典型的な場合に保証が悪化することである。「best-of-both-worlds」型のアルゴリズム(各極端な場合において達成可能な最良の保証を与える単一のアルゴリズム)は知られているが、その境界は環境が完全に確率的でなくなった途端に敵対的な場合の境界まで劣化する。本研究はこのギャップを埋めることを目指し、厳密には確率的ではないが最悪ケースでもないワークロードに対する保証を与える。我々は、インスタンスが確率的な場合と敵対的な場合のどちらにどの程度近いかをパラメータ化する条件 Approximately Stationary BwK を定義する。これらのパラメータに基づいて、BwKで達成可能な最良の競争比を探る。パラメータの値を知らずに動作する2つのアルゴリズムを検討し、それらがパラメータの値に応じて2つの極端な場合の最良の保証の間を滑らかに遷移する競争比を保証することを示す。我々の保証は、特に利用可能な予算が小さい場合に、敵対的な場合の保証を大きく改善する。また、達成可能な保証に関する限界を証明し、予算が小さい場合に我々の結果がほぼタイトであることを示す。 Bandits with Knapsacks (BwK), the generalization of the Bandits problem under global budget constraints, has received a lot of attention in recent years. Previous work has focused on one of the two extremes: Stochastic BwK where the rewards and consumptions of the resources of each round are sampled from an i.i.d. distribution, and Adversarial BwK where these parameters are picked by an adversary. Achievable guarantees in the two cases exhibit a massive gap: No-regret learning is achievable in the stochastic case, but in the adversarial case only competitive ratio style guarantees are achievable, where the competitive ratio depends either on the budget or on both the time and the number of resources. What makes this gap so vast is that in Adversarial BwK the guarantees get worse in the typical case when the budget is more binding. While "best-of-both-worlds" type algorithms are known (single algorithms that provide the best achievable guarantee in each extreme case), their bounds degrade to the adversarial case as soon as the environment is not fully stochastic. 
Our work aims to bridge this gap, offering guarantees for a workload that is not exactly stochastic but is also not worst-case. We define a condition, Approximately Stationary BwK, that parameterizes how close to stochastic or adversarial an instance is. Based on these parameters, we explore the best competitive ratio attainable in BwK. We study two algorithms that are oblivious to the values of the parameters but guarantee competitive ratios that smoothly transition between the best possible guarantees in the two extreme cases, depending on the values of the parameters. Our guarantees greatly improve on the adversarial guarantee, especially when the available budget is small. We also prove bounds on the achievable guarantee, showing that our results are approximately tight when the budget is small.	翻訳日:2023-07-11 18:44:12 公開日:2023-07-08
# 雑音のある画像セグメンテーションにおける周辺確率のしきい値処理 Marginal Thresholding in Noisy Image Segmentation ( http://arxiv.org/abs/2304.04116v3 ) ライセンス: Link先を確認	Marcus Nordström, Henrik Hult, Atsuto Maki	(参考訳) 本研究は、ガウス場変形に基づく雑音モデルを考慮した、医用画像セグメンテーションにおけるラベルノイズの研究である。このようなノイズは現実的な外観のセグメンテーションをもたらし、また期待される変形が恒等写像であるという意味で不偏であるため、興味深い。サンプリングの効率的な方法と、周辺確率の閉形式解が提供される。さらに、損失関数であるクロスエントロピーとソフトDiceの理論的最適解について検討し、ノイズレベルが増加するにつれて両者がどのように乖離するかを示す。損失関数の特徴付けに関する最近の研究に基づき、事前には未知だが効率的に計算できる特定のしきい値でクロスエントロピーの解をしきい値処理することで、ソフトDiceの最適解を復元できることが示される。これにより、ソフトDiceと比べてクロスエントロピーを用いた場合に見られる性能低下は、誤ったしきい値を用いることに起因するのではないかという疑問が生じる。この仮説を、TotalSegmentatorデータセットの3つの臓器セグメンテーション問題について、4つの異なるノイズ強度を用いた5分割の実験で検証する。その結果、しきい値を変えることで、クロスエントロピーの性能はソフトDiceより体系的に劣る状態から、ソフトDiceと同等以上の状態へと変化することが示された。 This work presents a study on label noise in medical image segmentation by considering a noise model based on Gaussian field deformations. Such noise is of interest because it yields realistic-looking segmentations and because it is unbiased in the sense that the expected deformation is the identity mapping. Efficient methods for sampling and closed-form solutions for the marginal probabilities are provided. Moreover, theoretically optimal solutions to the loss functions cross-entropy and soft-Dice are studied and it is shown how they diverge as the level of noise increases. Based on recent work on loss function characterization, it is shown that optimal solutions to soft-Dice can be recovered by thresholding solutions to cross-entropy with a particular threshold that is unknown a priori but can be computed efficiently. This raises the question of whether the decrease in performance seen when using cross-entropy as compared to soft-Dice is caused by using the wrong threshold. The hypothesis is validated in 5-fold studies on three organ segmentation problems from the TotalSegmentator data set, using 4 different strengths of noise. 
The results show that changing the threshold takes cross-entropy from performing systematically worse than soft-Dice to performing on par with or better than soft-Dice.	翻訳日:2023-07-11 18:35:59 公開日:2023-07-08
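The central experimental question is whether cross-entropy outputs look worse only because they are cut at the default 0.5 threshold. A minimal sketch of that comparison follows. It picks the threshold by a grid search on held-out data, which is our own stand-in: the paper instead computes the threshold from its noise model, and that procedure is not reproduced here. The helper names are hypothetical.

```python
def dice(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 lists."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * inter / total


def binarize(probs, threshold):
    """Turn per-pixel foreground probabilities into a hard mask."""
    return [1 if p >= threshold else 0 for p in probs]


def best_threshold(probs, target, candidates):
    """Pick the candidate threshold whose mask has the highest Dice on held-out data."""
    return max(candidates, key=lambda t: dice(binarize(probs, t), target))


probs = [0.9, 0.4, 0.35, 0.1]   # toy cross-entropy-trained outputs for four pixels
target = [1, 1, 0, 0]
print(dice(binarize(probs, 0.5), target))               # default threshold
print(best_threshold(probs, target, [0.3, 0.4, 0.5]))   # tuned threshold
```

Under label noise the marginal foreground probability near a boundary is smeared out, so a threshold below 0.5 can recover masks that 0.5 cuts off, which is the effect the abstract attributes to the choice of threshold.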
# 自然法則発見のための機械学習 Machine learning for discovering laws of nature ( http://arxiv.org/abs/2303.17607v2 ) ライセンス: Link先を確認	Lizhi Xin, Kevin Xin, Houwen Xin	(参考訳) ダーウィンの自然選択に基づいて、生データから学習することで自然法則を発見する「機械科学者」を開発した。「機械科学者」は、論理木(状態決定木)と値木(観測関数木)を適用して物理理論を構築する。論理木は実体の状態を決定し、値木は実体の2つの観測の間の絶対値を決定する。論理木と値木を組み合わせることで、実体の軌道を再構築し、将来の結果を予測することができる。提案するアルゴリズムモデルは機械学習に重点を置いており、「機械科学者」は各決定に対して報酬または罰を受けることで経験を積み、最終的にニュートンの方程式(古典物理学)とボルンの規則(量子力学)を再発見する。 Based on Darwin's natural selection, we developed "machine scientists" to discover the laws of nature by learning from raw data. "Machine scientists" construct physical theories by applying a logic tree (state Decision Tree) and a value tree (observation Function Tree); the logic tree determines the state of the entity, and the value tree determines the absolute value between the two observations of the entity. A logic tree and a value tree together can reconstruct an entity's trajectory and make predictions about its future outcomes. Our proposed algorithmic model has an emphasis on machine learning, in which "machine scientists" build up their experience by being rewarded or punished for each decision they make, eventually leading to the rediscovery of Newton's equation (classical physics) and the Born rule (quantum mechanics).	翻訳日:2023-07-11 18:34:50 公開日:2023-07-08
# ビデオからの衝撃音合成のための物理駆動拡散モデル Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos ( http://arxiv.org/abs/2303.16897v3 ) ライセンス: Link先を確認	Kun Su, Kaizhi Qian, Eli Shlizerman, Antonio Torralba, Chuang Gan	(参考訳) 実世界と仮想世界での没入的な知覚体験には、物理的な物体の相互作用から生じる音のモデル化が重要である。従来の衝撃音合成法では、物理シミュレーションを用いて、音を表現・合成できる物理パラメータの集合を得る。しかし、これらの手法は物体の形状と衝撃位置の両方について細かな情報を必要とし、そうした情報は実世界ではほとんど得られないため、一般的なビデオからの衝撃音合成には適用できない。一方、既存のビデオ駆動の深層学習ベースのアプローチは、物理知識を欠くため、視覚内容と衝撃音との弱い対応しか捉えられなかった。本研究では、無音のビデオクリップに対して高忠実度の衝撃音を合成できる物理駆動拡散モデルを提案する。ビデオコンテンツに加えて、衝撃音合成の過程を導くために追加の物理事前情報(physics priors)を用いることを提案する。物理事前情報には、複雑な設定なしにノイズの多い実世界の衝撃音例から直接推定される物理パラメータと、ニューラルネットワークによって音響環境を解釈する学習済みの残差パラメータの両方が含まれる。さらに、物理事前情報と視覚情報を組み合わせて衝撃音を合成するために、専用の学習・推論戦略を備えた新しい拡散モデルを実装した。実験の結果、本モデルは現実的な衝撃音の生成において既存の複数のシステムを上回ることが示された。さらに重要なことに、物理ベースの表現は完全に解釈可能かつ透明であるため、柔軟に音の編集を行うことができる。 Modeling sounds emitted from physical object interactions is critical for immersive perceptual experiences in real and virtual worlds. Traditional methods of impact sound synthesis use physics simulation to obtain a set of physics parameters that could represent and synthesize the sound. However, they require fine details of both the object geometries and impact locations, which are rarely available in the real world, so these methods cannot be applied to synthesize impact sounds from common videos. On the other hand, existing video-driven deep learning-based approaches could only capture the weak correspondence between visual content and impact sounds since they lack physics knowledge. In this work, we propose a physics-driven diffusion model that can synthesize high-fidelity impact sound for a silent video clip. In addition to the video content, we propose to use additional physics priors to guide the impact sound synthesis procedure. The physics priors include both physics parameters that are directly estimated from noisy real-world impact sound examples without sophisticated setup and learned residual parameters that interpret the sound environment via neural networks. 
We further implement a novel diffusion model with specific training and inference strategies to combine physics priors and visual information for impact sound synthesis. Experimental results show that our model outperforms several existing systems in generating realistic impact sounds. More importantly, the physics-based representations are fully interpretable and transparent, thus enabling us to perform sound editing flexibly.	翻訳日:2023-07-11 18:34:34 公開日:2023-07-08
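A standard physics-based representation of an impact sound is modal synthesis: a sum of exponentially damped sinusoids, one per vibration mode, each described by a frequency, a decay rate, and an amplitude. The sketch below assumes the "physics parameters" are of this modal kind; the abstract does not spell out the exact parameterization or how the learned residual is combined, so both the function and its parameter names are illustrative assumptions.

```python
import math


def modal_impact(freqs, decays, amps, sample_rate=16000, duration=0.1):
    """Sum of exponentially damped sinusoids, one per vibration mode.

    freqs in Hz, decays in 1/s, amps unitless; returns a list of samples.
    """
    n = round(sample_rate * duration)
    out = []
    for i in range(n):
        t = i / sample_rate
        out.append(sum(a * math.exp(-d * t) * math.sin(2 * math.pi * f * t)
                       for f, d, a in zip(freqs, decays, amps)))
    return out


# Two hypothetical modes: a low, slowly decaying one and a higher, faster-decaying one
signal = modal_impact(freqs=[440.0, 1210.0], decays=[30.0, 60.0], amps=[1.0, 0.5])
print(len(signal), max(abs(x) for x in signal[:100]), max(abs(x) for x in signal[-100:]))
```

Because every sample is an explicit function of a few named numbers, editing a sound amounts to changing a frequency or decay rate, which illustrates why the abstract calls such representations interpretable and editable.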
# 巨大言語の頭脳が十分ではないとき! 知識スパークルダストを持つドメインピザズ When Giant Language Brains Just Aren't Enough! Domain Pizzazz with Knowledge Sparkle Dust ( http://arxiv.org/abs/2305.07230v2 ) ライセンス: Link先を確認	Minh-Tien Nguyen, Duy-Hung Nguyen, Shahab Sabahi, Hung Le, Jeff Yang, Hajime Hotta	(参考訳) 大規模言語モデル(LLM)は自然言語処理の分野を大きく進歩させ、GPTモデルがその最前線にある。その優れた性能はさまざまなタスクに及ぶが、実際のビジネスシナリオにLLMを適応させることには、さらなる調査を要する課題がまだ残っている。本稿では、LLMを実用的なユースケースに適応させる際のギャップを埋めることを目的とした実証分析を行う。そのために、推論が求められるという難しさから、保険の質問応答(QA)タスクをケーススタディとして選択する。このタスクに基づき、保険契約のルールブックやDBpediaから抽出した追加知識で強化されたLLMに依存する新しいモデルを設計する。追加知識は、ドメイン適応のためにLLMが保険の新しい概念を理解するのに役立つ。2つのQAデータセットでの予備的な結果は、知識強化によってGPT-3.5の推論能力が大幅に向上することを示している(正解率でそれぞれ55.80%と57.83%)。分析はまた、DBpediaのような既存の公開知識ベースが知識強化に有益であることを示している。我々の知見から、ビジネスシナリオに固有の複雑さのために、効果的な問題解決にはドメイン固有の知識と外部リソースを取り入れることがしばしば必要であることが明らかになった。 Large language models (LLMs) have significantly advanced the field of natural language processing, with GPT models at the forefront. While their remarkable performance spans a range of tasks, adapting LLMs for real-world business scenarios still poses challenges warranting further investigation. This paper presents an empirical analysis aimed at bridging the gap in adapting LLMs to practical use cases. To do so, we select the insurance question answering (QA) task as a case study because it demands reasoning. Based on this task, we design a new model that relies on LLMs empowered by additional knowledge extracted from insurance policy rulebooks and DBpedia. The additional knowledge helps LLMs to understand new concepts of insurance for domain adaptation. Preliminary results on two QA datasets show that knowledge enhancement significantly improves the reasoning ability of GPT-3.5 (55.80% and 57.83% in terms of accuracy). The analysis also indicates that existing public knowledge bases, e.g., DBpedia, are beneficial for knowledge enhancement. 
Our findings reveal that the inherent complexity of business scenarios often necessitates the incorporation of domain-specific knowledge and external resources for effective problem-solving.	翻訳日:2023-07-11 18:25:07 公開日:2023-07-08
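"Knowledge enhancement" here means putting extra domain text in front of the question before it reaches the LLM. A minimal retrieve-then-prompt sketch follows. It assumes a plain keyword-overlap retriever and a hypothetical prompt template, since the abstract does not describe how passages from the rulebooks or DBpedia are selected or formatted. The LLM call itself is left out.

```python
import re


def tokenize(text):
    """Lower-cased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, passages, k=2):
    """Rank passages by how many question tokens they share (ties keep input order)."""
    q = tokenize(question)
    ranked = sorted(passages, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]


def build_prompt(question, passages, k=2):
    """Prepend the top-k retrieved rules to the question (template is hypothetical)."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return f"Use the following policy rules to answer.\n{context}\n\nQuestion: {question}\nAnswer:"


rulebook = [
    "The deductible for dental claims is 100 dollars.",
    "Flood damage is excluded from the home policy.",
    "Claims must be filed within 30 days.",
]
print(build_prompt("What is the deductible for dental claims?", rulebook, k=1))
```

A production system would likely use dense embeddings or a search index instead of token overlap; the overlap score only keeps the example self-contained.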
# 推薦基盤モデルのためのアイテムIDのインデックス付け方法 How to Index Item IDs for Recommendation Foundation Models ( http://arxiv.org/abs/2305.06569v4 ) ライセンス: Link先を確認	Wenyue Hua, Shuyuan Xu, Yingqiang Ge, Yongfeng Zhang	(参考訳) 推薦基盤モデルは、推薦タスクを自然言語タスクに変換することで、大規模言語モデル(LLM)を推薦に利用する。これにより、従来の推薦モデルのようにすべての候補アイテムについてランキングスコアを計算するのではなく、推薦するアイテムを直接生成する生成的推薦が可能になり、推薦パイプラインが多段階フィルタリングから単一段階フィルタリングへと簡素化される。推薦するアイテムを決定する際に過度に長いテキストを生成しないよう、推薦基盤モデルにはLLMと互換性のあるアイテムIDを作成することが不可欠である。本研究では、P5を代表的なバックボーンモデルとし、様々なインデックス付け手法でその結果を再現しながら、推薦基盤モデルにおけるアイテムのインデックス付け問題を体系的に検討する。アイテムのインデックス付けの重要性を強調するため、まず独立インデックス付け、タイトルインデックス付け、ランダムインデックス付けといった、いくつかの自明なアイテムインデックス付け手法の問題点を論じる。次に、シーケンシャルインデックス付け、協調インデックス付け、セマンティック(コンテンツベース)インデックス付け、ハイブリッドインデックス付けという、シンプルかつ効果的な4つの解決策を提案する。P5の再現研究により、アイテムのインデックス付け手法がモデル性能に大きく影響することが明らかになり、実世界のデータセットでの結果は提案手法の有効性を裏付けている。 A recommendation foundation model utilizes large language models (LLMs) for recommendation by converting recommendation tasks into natural language tasks. It enables generative recommendation, which directly generates the item(s) to recommend rather than calculating a ranking score for each and every candidate item as in traditional recommendation models, simplifying the recommendation pipeline from multi-stage filtering to single-stage filtering. To avoid generating excessively long text when deciding which item(s) to recommend, creating LLM-compatible item IDs is essential for recommendation foundation models. In this study, we systematically examine the item indexing problem for recommendation foundation models, using P5 as the representative backbone model and replicating its results with various indexing methods. To emphasize the importance of item indexing, we first discuss the issues of several trivial item indexing methods, such as independent indexing, title indexing, and random indexing. We then propose four simple yet effective solutions, including sequential indexing, collaborative indexing, semantic (content-based) indexing, and hybrid indexing. 
Our reproducibility study of P5 highlights the significant influence of item indexing methods on the model performance, and our results on real-world datasets validate the effectiveness of our proposed solutions.	翻訳日:2023-07-11 18:24:48 公開日:2023-07-08
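Of the four proposed schemes, sequential indexing is the simplest to illustrate. A minimal sketch, assuming it means numbering items consecutively in the order they first appear across users' chronologically ordered interaction sequences, is shown below. The starting value `1001` and the helper name are hypothetical choices for the example, not taken from the paper.

```python
def sequential_indexing(user_sequences, start=1001):
    """Give items consecutive integer IDs in the order they first appear
    across users' chronologically ordered interaction sequences."""
    ids = {}
    next_id = start
    for seq in user_sequences:
        for item in seq:
            if item not in ids:
                ids[item] = next_id
                next_id += 1
    return ids


sequences = [["shoes", "socks", "laces"], ["socks", "insoles"]]
item_ids = sequential_indexing(sequences)
print(item_ids)
```

The assumed intuition is that items interacted with close together receive numerically close IDs, so under a sub-word tokenizer their ID strings (for example `1001` and `1002`) tend to share leading tokens, giving the LLM a collaborative signal that random IDs lack.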
# 拡張AIシステムと3Dデジタルツインを用いた低資源アフリカ環境における健康とウェルビーイングの再構想 Re-imagining health and well-being in low resource African settings using an augmented AI system and a 3D digital twin ( http://arxiv.org/abs/2306.01772v2 ) ライセンス: Link先を確認	Deshendran Moodley and Christopher Seebregts	(参考訳) 本稿では、低資源のアフリカ諸国の健康とウェルビーイングに対する、人工知能(AI)とデジタルツインの最近の発展の可能性と意義について論じ、探究する。我々は、疾病の流行に対する公衆衛生上の緊急対応と流行制御の事例を用いる。データの可用性とデジタル化の進展を活用して、分析と予測のための高度なAI手法を開発できる可能性がある。AIシステムの観点から、AIシステムとデジタルツインの新たな動向をレビューし、AIシステムが3Dデジタルツインと連携して公衆衛生上の目標に取り組む方法を示すために、初期の拡張AIシステムアーキテクチャを提案する。我々は、AIシステムとデジタルツインにとって不可欠な研究課題として、科学的知識発見、継続学習、実用的相互運用性、インタラクティブな説明と意思決定を挙げる。 This paper discusses and explores the potential and relevance of recent developments in artificial intelligence (AI) and digital twins for health and well-being in low-resource African countries. We use the case of public health emergency response to disease outbreaks and epidemic control. There is potential to take advantage of the increasing availability of data and digitization to develop advanced AI methods for analysis and prediction. Using an AI systems perspective, we review emerging trends in AI systems and digital twins and propose an initial augmented AI system architecture to illustrate how an AI system can work with a 3D digital twin to address public health goals. We highlight scientific knowledge discovery, continual learning, pragmatic interoperability, and interactive explanation and decision-making as essential research challenges for AI systems and digital twins.	翻訳日:2023-07-11 18:17:22 公開日:2023-07-08
# 自然言語理解のためのトランスフォーマーの量子化考慮・テンソル圧縮学習 Quantization-Aware and Tensor-Compressed Training of Transformers for Natural Language Understanding ( http://arxiv.org/abs/2306.01076v2 ) ライセンス: Link先を確認	Zi Yang, Samridhi Choudhary, Siegfried Kunzmann, Zheng Zhang	(参考訳) ファインチューニングされたトランスフォーマーモデルは、多くの自然言語タスクで優れた性能を示している。しかし、モデルサイズが大きいため、リソース制約のあるデバイスに高性能なトランスフォーマーモデルを展開することができない。本稿では、トランスフォーマーベースのモデルのモデルサイズ、算術演算、ひいては実行時のレイテンシを削減するために、量子化を考慮したテンソル圧縮学習手法を提案する。トランスフォーマーの埋め込み層と線形層を小さな低ランクテンソルコアに圧縮し、モデルパラメータを大幅に削減する。さらに、学習可能なスケール因子を用いた量子化考慮学習により、テンソル圧縮モデルの低精度表現を得る。開発した手法は、エンドツーエンドの学習と蒸留ベースの学習の両方に使用できる。収束性を改善するため、事前学習済みトランスフォーマーから量子化・テンソル圧縮された生徒モデルを蒸留する層ごとの蒸留を適用する。2つの自然言語理解タスクで性能を実証し、最大$63\times$の圧縮率、わずかな精度低下、顕著な推論と学習の高速化を示す。 Fine-tuned transformer models have shown superior performances in many natural language tasks. However, the large model size prohibits deploying high-performance transformer models on resource-constrained devices. This paper proposes a quantization-aware tensor-compressed training approach to reduce the model size, arithmetic operations, and ultimately runtime latency of transformer-based models. We compress the embedding and linear layers of transformers into small low-rank tensor cores, which significantly reduces model parameters. A quantization-aware training with learnable scale factors is used to further obtain low-precision representations of the tensor-compressed models. The developed approach can be used for both end-to-end training and distillation-based training. To improve the convergence, a layer-by-layer distillation is applied to distill a quantized and tensor-compressed student model from a pre-trained transformer. The performance is demonstrated on two natural language understanding tasks, showing a compression ratio of up to $63\times$, little accuracy loss, and remarkable inference and training speedups.	翻訳日:2023-07-11 18:16:20 公開日:2023-07-08
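Two ingredients of the abstract can be sketched in isolation: the forward pass of quantization with a scale factor, and the parameter count of a layer stored as small tensor cores. The sketch assumes symmetric signed integer fake quantization and a tensor-train-matrix layout for the "low-rank tensor cores"; both are common choices but are assumptions here, not details confirmed by the abstract. During training the non-differentiable `round` would typically be bypassed with a straight-through estimator, which this forward-only sketch does not show.

```python
def fake_quantize(values, scale, bits=8):
    """Quantize to signed `bits`-bit integers with step `scale`, then map back to floats."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [max(qmin, min(qmax, round(v / scale))) * scale for v in values]


def tt_matrix_params(in_modes, out_modes, ranks):
    """Parameter count of a tensor-train matrix whose k-th core has shape
    ranks[k] x in_modes[k] x out_modes[k] x ranks[k + 1]."""
    assert len(ranks) == len(in_modes) + 1 and ranks[0] == ranks[-1] == 1
    return sum(ranks[k] * m * n * ranks[k + 1]
               for k, (m, n) in enumerate(zip(in_modes, out_modes)))


print(fake_quantize([0.26, -1.0, 10.0], scale=0.5, bits=4))  # last value clips at 7 * 0.5
print(tt_matrix_params([4, 4], [4, 4], [1, 2, 1]), "vs dense", 16 * 16)
```

For a hypothetical 768 x 768 linear layer factorized with modes (12, 8, 8) x (8, 8, 12) and ranks (1, 8, 8, 1), the same formula gives 768 + 4096 + 768 = 5632 parameters instead of 589,824; the ratios actually reported in the paper depend on its own mode and rank choices.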
# 多肢選択式質問応答のための大規模言語モデルによる共形予測 Conformal Prediction with Large Language Models for Multi-Choice Question Answering ( http://arxiv.org/abs/2305.18404v3 ) ライセンス: Link先を確認	Bhawesh Kumar, Charlie Lu, Gauri Gupta, Anil Palepu, David Bellamy, Ramesh Raskar, Andrew Beam	(参考訳) 大規模言語モデルが広く開発され続けるにつれ、高リスクなシナリオで安全に展開するために、頑健な不確実性定量化技術が不可欠となる。本研究では、多肢選択式質問応答という特定のタスクについて、共形予測を用いて言語モデルの不確実性を定量化する方法を検討する。共形予測による不確実性推定は予測精度と密接に相関していることがわかった。この観察は、選択的分類や低品質な予測の除外といった下流の応用に役立つ。また、共形予測が必要とする交換可能性の仮定が、較正に用いた科目とは異なる科目の質問(out-of-subject)に対して成り立つかどうかも検討する。これは多くの実用的な応用において、より現実的なシナリオとなりうる。本研究は、誤り率に対する頑健な保証が求められる安全性重視の状況において、大規模言語モデルをより信頼できる形で利用することに貢献する。 As large language models continue to be widely developed, robust uncertainty quantification techniques will become crucial for their safe deployment in high-stakes scenarios. In this work, we explore how conformal prediction can be used to provide uncertainty quantification in language models for the specific task of multiple-choice question-answering. We find that the uncertainty estimates from conformal prediction are tightly correlated with prediction accuracy. This observation can be useful for downstream applications such as selective classification and filtering out low-quality predictions. We also investigate whether the exchangeability assumption required by conformal prediction holds for out-of-subject questions, which may be a more realistic scenario for many practical applications. Our work contributes towards more trustworthy and reliable usage of large language models in safety-critical situations, where robust guarantees of error rate are required.	翻訳日:2023-07-11 18:14:39 公開日:2023-07-08
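A minimal sketch of split conformal prediction for multiple-choice answers is given below. It assumes the model exposes a probability per answer option (for example a softmax over the option letters) and uses the nonconformity score 1 minus the probability of the correct option, a common choice that may differ in detail from the paper's setup. The helper names are hypothetical.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split conformal calibration with the score 1 - p(correct option)."""
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the finite-sample-corrected quantile
    if k > n:
        return float("inf")  # too few calibration points: every option is kept
    return scores[k - 1]


def prediction_set(probs, qhat):
    """Indices of the answer options whose score falls under the threshold."""
    return [i for i, p in enumerate(probs) if 1.0 - p <= qhat]


# Toy calibration data: option probabilities for four questions and their correct option
cal_probs = [[0.7, 0.1, 0.1, 0.1], [0.2, 0.6, 0.1, 0.1],
             [0.1, 0.1, 0.5, 0.3], [0.4, 0.3, 0.2, 0.1]]
cal_labels = [0, 1, 2, 1]
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.5)
print(qhat, prediction_set([0.5, 0.5, 0.0, 0.0], qhat))
```

If calibration and test questions are exchangeable, the set contains the correct option with probability at least 1 - alpha; larger sets signal higher uncertainty, which is what makes set size usable for selective classification. The out-of-subject experiments in the abstract probe exactly the case where that exchangeability may fail.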
# 構造化データの生成拡散モデルに関する包括的調査 A Comprehensive Survey on Generative Diffusion Models for Structured Data ( http://arxiv.org/abs/2306.04139v2 ) ライセンス: Link先を確認	Heejoon Koo, To Eun Kim	(参考訳) 近年、生成拡散モデルは様々なアプリケーションで画期的な性能を示し、深層生成モデルにおける急速なパラダイムシフトを実現した。一方、表データや時系列データを含む構造化データは、その遍在性と幅広い応用にもかかわらず、深層学習研究コミュニティから比較的限られた注目しか集めてこなかった。そのため、視覚データやテキストデータといった他のデータモダリティと比べて、拡散モデルによる構造化データモデリングに関する文献やそのレビューはまだ不足している。このギャップに対処するため、構造化データ分野で最近提案された拡散モデルを包括的にレビューする。まず、スコアベース拡散モデル理論を簡潔に概観し、続いてデータ駆動の汎用タスクとドメイン固有のアプリケーションの両方で構造化データを用いた先駆的な研究の大部分について技術的な説明を行う。その後、既存研究に見られる限界と課題を分析・議論し、今後の研究の方向性を提案する。このレビューが研究コミュニティの触媒となり、構造化データのための生成拡散モデルの発展を促進することを願っている。 In recent years, generative diffusion models have achieved a rapid paradigm shift in deep generative models by showing groundbreaking performance across various applications. Meanwhile, structured data, encompassing tabular and time series data, has received comparatively limited attention from the deep learning research community, despite its ubiquity and extensive applications. Thus, compared to other data modalities such as visual and textual data, literature on structured data modelling via diffusion models, and reviews of that literature, are still lacking. To address this gap, we present a comprehensive review of recently proposed diffusion models in the field of structured data. First, this survey provides a concise overview of the score-based diffusion model theory, subsequently proceeding to the technical descriptions of the majority of pioneering works that used structured data in both data-driven general tasks and domain-specific applications. Thereafter, we analyse and discuss the limitations and challenges shown in existing works and suggest potential research directions. We hope this review serves as a catalyst for the research community, promoting developments in generative diffusion models for structured data.	翻訳日:2023-07-11 18:04:14 公開日:2023-07-08
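To make "diffusion model for structured data" concrete, the sketch below shows the forward (noising) process applied to one standardized numeric row of a table. It uses the discrete DDPM-style variance-preserving formulation with a linear noise schedule, which is one instance of the score-based family the survey covers (its continuous-time limit is the VP SDE); the schedule values and helper names are illustrative assumptions. Categorical columns would need separate handling and are not covered.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_t rising linearly from beta_start to beta_end (num_steps >= 2)."""
    return [beta_start + (beta_end - beta_start) * t / (num_steps - 1) for t in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) for one numeric row."""
    a = alpha_bar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


alpha_bar = cumulative_alphas(linear_beta_schedule(1000))
row = [0.3, -1.2, 2.0]  # one standardized row of a toy numeric table
noisy = q_sample(row, t=500, alpha_bar=alpha_bar, rng=random.Random(0))
print(alpha_bar[0], alpha_bar[-1], noisy)
```

A generative model is then trained to undo this corruption (equivalently, to estimate the score of the noisy data), and sampling runs the learned reverse process from pure noise back to a synthetic row.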
# AdAM:Adaptation-Aware Kernel ModulationによるFew-Shot画像生成 AdAM: Few-Shot Image Generation via Adaptation-Aware Kernel Modulation ( http://arxiv.org/abs/2307.01465v2 ) ライセンス: Link先を確認	Yunqing Zhao, Keshigeyan Chandrasegaran, Abdollahzadeh Milad, Chao Du, Tianyu Pang, Ruoteng Li, Henghui Ding, Ngai-Man Cheung	(参考訳) 少数ショット画像生成(FSIG)は、少数(例:10枚)のトレーニングサンプルから新しく多様な画像を生成する学習を目指す。最近の研究は、大規模なソースドメインで事前学習されたGANを活用し、少数のターゲットサンプルでそれをターゲットドメインに適応させることでFSIGに取り組んでいる。最近のFSIG手法の中心にあるのは知識保存基準であり、ソース知識のサブセットを選択して適応後のモデルに保存する。しかし、既存手法の大きな制限は、その知識保存基準がソースドメイン/タスクのみを考慮し、ソース知識の選択においてターゲットドメイン/適応を考慮していないことであり、ソースドメインとターゲットドメインの近さが異なる設定への適合性に疑問が生じる。本研究の貢献は2つある。第1に、最近のFSIG研究とその実験を再検討する。ソースドメインとターゲットドメインが近いという仮定が緩和された設定では、知識保存においてソースドメインのみを考慮する既存の最先端(SOTA)手法の多くが、ベースライン手法を上回らないことを明らかにする。第2の貢献として、ソースドメインとターゲットドメインの近さが異なる一般的なFSIGのためのAdaptation-Aware kernel Modulation(AdAM)を提案する。大規模な実験により、ソースドメインとターゲットドメインがより離れた困難な設定を含め、AdAMがFSIGにおいて一貫してSOTA性能を達成することを示す。 Few-shot image generation (FSIG) aims to learn to generate new and diverse images given few (e.g., 10) training samples. Recent work has addressed FSIG by leveraging a GAN pre-trained on a large-scale source domain and adapting it to the target domain with few target samples. Central to recent FSIG methods are knowledge preservation criteria, which select a subset of source knowledge and preserve it in the adapted model. However, a major limitation of existing methods is that their knowledge preserving criteria consider only the source domain/task and fail to consider the target domain/adaptation in selecting source knowledge, casting doubt on their suitability for setups of different proximity between source and target domain. Our work makes two contributions. Firstly, we revisit recent FSIG works and their experiments. 
We reveal that under setups in which the assumption of close proximity between source and target domains is relaxed, many existing state-of-the-art (SOTA) methods that consider only the source domain in knowledge preservation perform no better than a baseline method. As our second contribution, we propose Adaptation-Aware kernel Modulation (AdAM) for general FSIG of different source-target domain proximity. Extensive experiments show that AdAM consistently achieves SOTA performance in FSIG, including challenging setups where source and target domains are further apart.	翻訳日:2023-07-11 17:57:58 公開日:2023-07-08
# プロンプトクラス:弱教師付きセマンティックセグメンテーションにおけるプロンプトクラス学習の力を探る Prompting classes: Exploring the Power of Prompt Class Learning in Weakly Supervised Semantic Segmentation ( http://arxiv.org/abs/2307.00097v2 ) ライセンス: Link先を確認	Balamurali Murugesan, Rukhshanda Hussain, Rajarshi Bhattacharya, Ismail Ben Ayed, and Jose Dolz	(参考訳) 近年、CLIPベースのアプローチは、対照的言語・視覚事前学習の力により、汎化と少数ショット学習のタスクで顕著な性能を示している。特に、タスク関連のテキストトークンを用いることで、事前学習済み言語・視覚モデルを下流タスクに適応させる効果的な手法としてプロンプトチューニングが登場している。この進展に動機づけられ、本研究では、弱教師付きセマンティックセグメンテーション(WSSS)のような他の基本的な問題もプロンプトチューニングの恩恵を受けられるかを問う。我々の結果から、WSSSにおけるプロンプトチューニングの影響を明らかにする2つの興味深い観察が得られた。第1に、テキストプロンプトのクラストークンのみを変更すると、コンテキストを最適化するより複雑な戦略と比べて、クラス活性化マップ(CAM)により大きな影響を与える。第2に、画像の正解ラベルに対応するクラストークンが、必ずしも最良のCAMをもたらすカテゴリに対応するとは限らない。これらの観察に動機づけられ、PrOmpt cLass lEarning(POLE)戦略に基づく新しいアプローチを導入する。大規模な実験を通じて、我々のシンプルかつ効率的なアプローチが、よく知られたWSSSベンチマークでSOTA性能を達成することを実証する。これらの結果は、WSSSにおける言語・視覚モデルの利点だけでなく、この問題に対するプロンプト学習の可能性も浮き彫りにしている。コードは https://github.com/rB080/WSS_POLE で公開されている。 Recently, CLIP-based approaches have exhibited remarkable performance on generalization and few-shot learning tasks, fueled by the power of contrastive language-vision pre-training. In particular, prompt tuning has emerged as an effective strategy to adapt the pre-trained language-vision models to downstream tasks by employing task-related textual tokens. Motivated by this progress, in this work we question whether other fundamental problems, such as weakly supervised semantic segmentation (WSSS), can benefit from prompt tuning. Our findings reveal two interesting observations that shed light on the impact of prompt tuning on WSSS. First, modifying only the class token of the text prompt results in a greater impact on the Class Activation Map (CAM), compared to arguably more complex strategies that optimize the context. Second, the class token associated with the image ground truth does not necessarily correspond to the category that yields the best CAM. 
Motivated by these observations, we introduce a novel approach based on a PrOmpt cLass lEarning (POLE) strategy. Through extensive experiments, we demonstrate that our simple yet efficient approach achieves SOTA performance in a well-known WSSS benchmark. These results highlight not only the benefits of language-vision models in WSSS but also the potential of prompt learning for this problem. The code is available at https://github.com/rB080/WSS_POLE.	翻訳日:2023-07-11 17:57:19 公開日:2023-07-08
# ラベル制約を用いた正則化と推論について On Regularization and Inference with Label Constraints ( http://arxiv.org/abs/2307.03886v1 ) ライセンス: Link先を確認	Kaifu Wang, Hangfeng He, Tin D. Nguyen, Piyush Kumar, Dan Roth	(参考訳) 機械学習における事前知識や記号的ルールは、特に構造化予測問題において、ラベル制約の形で表現されることが多い。本研究では、機械学習パイプラインにラベル制約を組み込むための2つの一般的な戦略、すなわち制約による正則化と制約付き推論を、モデル性能への影響を定量化することで比較する。正則化については、制約と矛盾するモデルを排除することで汎化ギャップを狭めることを示す。しかし、小さな違反を好むという性質は、準最適なモデルへのバイアスをもたらす。制約付き推論については、モデルの違反を修正することで母集団リスクを低減し、違反をむしろ利点に変えることを示す。これらの違いを踏まえ、2つの手法の併用をさらに検討し、モデルの複雑さと最適リスクの両方の改善を目指して、正則化によって生じるバイアスを制約付き推論が補償するための条件を提案する。 Prior knowledge and symbolic rules in machine learning are often expressed in the form of label constraints, especially in structured prediction problems. In this work, we compare two common strategies for encoding label constraints in a machine learning pipeline, regularization with constraints and constrained inference, by quantifying their impact on model performance. For regularization, we show that it narrows the generalization gap by precluding models that are inconsistent with the constraints. However, its preference for small violations introduces a bias toward a suboptimal model. For constrained inference, we show that it reduces the population risk by correcting a model's violation, and hence turns the violation into an advantage. Given these differences, we further explore the use of the two approaches together and propose conditions for constrained inference to compensate for the bias introduced by regularization, aiming to improve both the model complexity and optimal risk.	翻訳日:2023-07-11 17:00:00 公開日:2023-07-08
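The two strategies compared in the abstract can be illustrated on a toy structured output with two binary-labelled variables and the constraint that both labels agree. The sketch assumes a model that factorizes over the variables; `constrained_argmax` corrects violations at inference time, while `violation_mass` is one simple quantity a constraint regularizer could add to the training loss. Neither is claimed to be the paper's exact formulation, and brute-force enumeration stands in for the ILP or search a real system would use.

```python
import math
from itertools import product


def constrained_argmax(scores, constraint):
    """Best joint labeling among those satisfying `constraint`.

    scores: one dict per output variable mapping label -> log-score.
    Enumerates every joint labeling, so it only suits tiny examples.
    """
    best, best_score = None, float("-inf")
    for labels in product(*[list(s) for s in scores]):
        if not constraint(labels):
            continue
        total = sum(s[label] for s, label in zip(scores, labels))
        if total > best_score:
            best, best_score = labels, total
    return best


def violation_mass(probs, constraint):
    """Probability a factorized model assigns to labelings that break the constraint."""
    mass = 0.0
    for labels in product(*[list(p) for p in probs]):
        if not constraint(labels):
            weight = 1.0
            for p, label in zip(probs, labels):
                weight *= p[label]
            mass += weight
    return mass


def all_agree(labels):
    """Toy constraint: every output variable takes the same label."""
    return len(set(labels)) == 1


probs = [{"A": 0.9, "B": 0.1}, {"A": 0.2, "B": 0.8}]
log_scores = [{k: math.log(v) for k, v in p.items()} for p in probs]
print(constrained_argmax(log_scores, all_agree))   # unconstrained argmax would be ("A", "B")
print(violation_mass(probs, all_agree))
```

Here the unconstrained prediction ("A", "B") violates the constraint, and constrained inference turns it into the best feasible labeling instead, which mirrors the abstract's point that inference-time correction can convert a violation into an advantage.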
# 組合せ最適化のための変分量子固有値ソルバーの勾配計算に対するノイズありテンソルリング近似 Noisy Tensor Ring approximation for computing gradients of Variational Quantum Eigensolver for Combinatorial Optimization ( http://arxiv.org/abs/2307.03884v1 ) ライセンス: Link先を確認	Dheeraj Peddireddy, Utkarsh Priyam, Vaneet Aggarwal	(参考訳) 変分量子アルゴリズム、特に量子近似最適化と変分量子固有値ソルバー(VQE)は、組合せ最適化の領域で計算上の優位性をもたらす可能性を示してきた。しかし、これらのアルゴリズムは古典的に計算困難な勾配に悩まされ、それがスケーラビリティを制限している。本研究では、パラメータシフト則を利用しつつ、回路の期待値をテンソルリング近似で計算する古典的な勾配計算法を提案し、VQEのスケーラビリティの課題に取り組む。回路のパラメータ付きゲートは、テンソルリングの自由辺に沿って行列を縮約することでテンソルリングを変換する。1量子ビットゲートはリング構造を変えないが、2量子ビット回転による状態変換は特異値を切り捨てることで評価され、テンソルリングの構造を保ちつつ計算量を削減する。この行列積状態近似の変種は、古典シミュレーションの指数的な増大とは対照的に、量子ビット数と2量子ビットゲート数に対して線形にしか増大しないため、古典シミュレータ上で勾配をより高速に評価できる。 Variational Quantum algorithms, especially Quantum Approximate Optimization and Variational Quantum Eigensolver (VQE), have established their potential to provide computational advantage in the realm of combinatorial optimization. However, these algorithms suffer from classically intractable gradients limiting the scalability. This work addresses the scalability challenge for VQE by proposing a classical gradient computation method which utilizes the parameter shift rule but computes the expected values from the circuits using a tensor ring approximation. The parametrized gates from the circuit transform the tensor ring by contracting the matrix along the free edges of the tensor ring. While the single-qubit gates do not alter the ring structure, the state transformations from the two-qubit rotations are evaluated by truncating the singular values, thereby preserving the structure of the tensor ring and reducing the computational complexity. This variation of the matrix product state approximation grows linearly in the number of qubits and the number of two-qubit gates as opposed to the exponential growth in the classical simulations, allowing for a faster evaluation of the gradients on classical simulators.	翻訳日:2023-07-11 16:59:45 公開日:2023-07-08
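The parameter shift rule mentioned above obtains an exact derivative from two extra expectation values: for a gate generated by a Pauli operator, dE/dθ = [E(θ + π/2) - E(θ - π/2)] / 2. In the paper each shifted expectation would come from contracting a truncated tensor ring; in this minimal sketch a single-qubit state vector stands in for that step (an assumption made to keep the example self-contained), and the helper names are hypothetical.

```python
import math


def expectation_z_after_ry(theta):
    """<Z> for the state RY(theta)|0>, computed from its two real amplitudes."""
    amp0 = math.cos(theta / 2)   # amplitude of |0>
    amp1 = math.sin(theta / 2)   # amplitude of |1>
    return amp0 ** 2 - amp1 ** 2  # equals cos(theta)


def parameter_shift_grad(expectation, theta, shift=math.pi / 2):
    """Parameter-shift derivative for a rotation gate generated by a Pauli operator."""
    return (expectation(theta + shift) - expectation(theta - shift)) / 2


theta = 0.3
print(parameter_shift_grad(expectation_z_after_ry, theta), -math.sin(theta))
```

Since E(θ) = cos θ here, the rule gives [cos(θ + π/2) - cos(θ - π/2)] / 2 = -sin θ exactly, unlike a finite difference. A circuit with P parameters needs 2P expectation evaluations per gradient, which is why making each evaluation cheap (linear in qubits and two-qubit gates, per the abstract) matters.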
# 混合主導型ビデオゲームの設計 Designing Mixed-Initiative Video Games ( http://arxiv.org/abs/2307.03877v1 ) ライセンス: Link先を確認	Daijin Yang	(参考訳) 人工知能(AI)の発展により、人間は機械とともにコンテンツを共創できるようになった。AIが生成するコンテンツの意外性は、ユーザーにインスピレーションと楽しさをもたらし得る。しかし、共創のインタラクションは常にコンテンツクリエーター向けに設計されており、アクセシビリティに乏しい。混合主導型の共創のゲーム化を探求し、人間とAIのインタラクションをプレイヤーにとって身近で楽しいものにするため、筆者はSnake Storyを試作した。これは、「スネーク」のようなゲームをプレイしながら、AIが生成したテキストを選択してヘビの物語を書く混合主導型ゲームである。設計したインタフェースにおいて、ゲーム要素の有無によるプレイヤーとAIのインタラクションのダイナミクスを調べるため、対照実験を行った。11名のプレイヤー(n=11)による調査の結果, プレイヤーは2つのバージョンで異なる戦略を用いたこと, ゲームメカニクスが出力される物語, プレイヤーの創作プロセス, 役割認識に大きく影響したこと, 異なる背景を持つプレイヤーが2つのバージョンに対して異なる好みを示したことが分かった。これらの結果に基づき,混合主導型ゲームの設計に関する考慮事項を議論した。本研究は、魅力的な共創体験の設計に示唆を与えることを目的としている。 The development of Artificial Intelligence (AI) enables humans to co-create content with machines. The unexpectedness of AI-generated content can bring inspiration and entertainment to users. However, the co-creation interactions are always designed for content creators and have poor accessibility. To explore gamification of mixed-initiative co-creation and make human-AI interactions accessible and fun for players, I prototyped Snake Story, a mixed-initiative game where players can select AI-generated texts to write a story of a snake by playing a "Snake"-like game. A controlled experiment was conducted to investigate the dynamics of player-AI interactions with and without the game component in the designed interface. As a result of a study with 11 players (n=11), I found that players utilized different strategies when playing with the two versions, game mechanics significantly affected the output stories, players' creative process, as well as role perceptions, and players with different backgrounds showed different preferences for the two versions. Based on these results, I further discussed considerations for mixed-initiative game design. This work aims to inspire the design of engaging co-creation experiences.	翻訳日:2023-07-11 16:59:26 公開日:2023-07-08
# サプライチェーン最適化のための大規模言語モデル Large Language Models for Supply Chain Optimization ( http://arxiv.org/abs/2307.03875v1 ) ライセンス: Link先を確認	Beibin Li, Konstantina Mellou, Bo Zhang, Jeevan Pathuri, Ishai Menache	(参考訳) サプライチェーンの運用は伝統的に様々な複雑な意思決定問題を伴う。過去数十年間、サプライチェーンは計算の進歩から大きな恩恵を受け、手作業による処理から自動化とコスト効率の高い最適化へと移行した。それでも、ビジネスオペレーターは最適化の結果をステークホルダーに説明し解釈するために多大な労力を費やす必要がある。近年の大規模言語モデル(LLM)の進歩に触発され,この破壊的技術が、サプライチェーンの自動化と人間による理解および信頼との間のギャップを埋める上でいかに役立つかを検討する。我々は \name{} を設計する。これはプレーンテキストのクエリを入力として受け取り、基礎となる最適化結果に関する洞察を出力するフレームワークである。我々のフレームワークは最先端の組合せ最適化技術を捨てるのではなく、それを活用してwhat-ifシナリオ(例えば、ある需要に対してサプライヤーAの代わりにサプライヤーBを使用した場合、コストはどのように変化するか)に定量的に答える。重要なことに、我々の設計ではプロプライエタリなデータをLLMに送信する必要がない。これは状況によってはプライバシー上の懸念となり得る点である。我々は、Microsoftのクラウドサプライチェーンにおける実際のサーバ配置シナリオで本フレームワークの有効性を実証する。その過程で,他のシナリオにおけるLLM出力の精度の評価にも使用できる汎用評価ベンチマークを開発した。 Supply chain operations traditionally involve a variety of complex decision-making problems. Over the last few decades, supply chains have greatly benefited from advances in computation, which allowed the transition from manual processing to automation and cost-effective optimization. Nonetheless, business operators still need to spend substantial effort explaining and interpreting the optimization outcomes to stakeholders. Motivated by the recent advances in Large Language Models (LLMs), we study how this disruptive technology can help bridge the gap between supply chain automation and human comprehension and trust thereof. We design \name{}, a framework that accepts plain-text queries as input and outputs insights about the underlying optimization outcomes. Our framework does not forgo the state-of-the-art combinatorial optimization technology, but rather leverages it to quantitatively answer what-if scenarios (e.g., how would the cost change if we used supplier B instead of supplier A for a given demand?). Importantly, our design does not require sending proprietary data over to LLMs, which can be a privacy concern in some circumstances. 
We demonstrate the effectiveness of our framework on a real server placement scenario within Microsoft's cloud supply chain. Along the way, we develop a general evaluation benchmark, which can be used to evaluate the accuracy of the LLM output in other scenarios.	翻訳日:2023-07-11 16:59:06 公開日:2023-07-08
# デジタル病理におけるKi-67スコーリングのための銀標準ラベルを用いたドメイン適応 Domain Adaptation using Silver Standard Labels for Ki-67 Scoring in Digital Pathology: A Step Closer to Widescale Deployment ( http://arxiv.org/abs/2307.03872v1 ) ライセンス: Link先を確認	Amanda Dy, Ngoc-Nhu Jennifer Nguyen, Seyed Hossein Mirjahanmardi, Melanie Dawe, Anthony Fyles, Wei Shi, Fei-Fei Liu, Dimitrios Androutsos, Susan Done and April Khademi	(参考訳) Ki-67 PIスコアリングの客観性と効率を向上させるために深層学習システムが提案されている。課題は、深層学習技術は非常に高精度である一方、ドメイン外のデータに適用すると性能が低下することである。モデルは通常、ターゲットドメインのものではない、ベンダーが利用可能なデータで学習されるため、これは臨床応用にとって重要な課題である。そこで本研究では,教師なしフレームワークを用いてターゲットドメインの銀標準(擬似)ラベルを生成し,それによってゴールド標準(GS)のソースドメインデータを拡張するドメイン適応パイプラインを提案する。検証済みの2つのKi-67スコアリングアーキテクチャ(UV-Net, piNET)で5つの学習方式を試験した: (1) SS Only: ターゲットの銀標準(SS)ラベルで学習, (2) GS Only: ソースのGSラベルで学習, (3) Mixed: ターゲットのSSラベルとソースのGSラベルで学習, (4) GS+SS: ソースのGSラベルで学習し、ターゲットのSSラベルで微調整, (5) 提案手法SS+GS: ソースのSSラベルで学習し、ソースのGSラベルで微調整。SS+GS法は、ターゲットデータにおいてGS Onlyモデルと比べて有意に(p < 0.05)高いPI精度(95.9%)と、より一貫した結果を得た。t-SNEプロットの解析により、SS+GSモデルが学習した特徴はソースデータとターゲットデータの間でより整合しており、汎化性能が向上することが示された。提案するパイプラインは,手作業によるアノテーションなしでターゲット分布を学習する効率的な手法を提供する。このフレームワークは、広範な展開に向けた検査室ごとのキャリブレーション手法として、任意のターゲットサイトに適用できる。 Deep learning systems have been proposed to improve the objectivity and efficiency of Ki-67 PI scoring. The challenge is that while very accurate, deep learning techniques suffer from reduced performance when applied to out-of-domain data. This is a critical challenge for clinical translation, as models are typically trained using data available to the vendor, which is not from the target domain. To address this challenge, this study proposes a domain adaptation pipeline that employs an unsupervised framework to generate silver standard (pseudo) labels in the target domain, which are used to augment the gold standard (GS) source domain data. 
Five training regimes were tested on two validated Ki-67 scoring architectures (UV-Net and piNET): (1) SS Only: trained on target silver standard (SS) labels, (2) GS Only: trained on source GS labels, (3) Mixed: trained on target SS and source GS labels, (4) GS+SS: trained on source GS labels and fine-tuned on target SS labels, and (5) our proposed method, SS+GS: trained on source SS labels and fine-tuned on source GS labels. The SS+GS method yielded significantly (p < 0.05) higher PI accuracy (95.9%) and more consistent results compared to the GS Only model on target data. Analysis of t-SNE plots showed that features learned by the SS+GS models are more closely aligned between source and target data, resulting in improved generalization. The proposed pipeline provides an efficient method for learning the target distribution without manual annotations, which are time-consuming and costly to generate for medical images. This framework can be applied to any target site as a per-laboratory calibration method, for widescale deployment.	翻訳日:2023-07-11 16:58:41 公開日:2023-07-08
# HUMS2023 データチャレンジ結果提出 HUMS2023 Data Challenge Result Submission ( http://arxiv.org/abs/2307.03871v1 ) ライセンス: Link先を確認	Dhiraj Neupane, Lakpa Dorje Tamang, Ngoc Dung Huynh, Mohamed Reda Bouadjenek and Sunil Aryal	(参考訳) 本研究では,早期検知のための簡単な手法を実装した。実装した手法は、与えられた.matファイルのプロットと、サンプルに連続ウェーブレット変換(CWT)を適用して生成したスカログラム画像の解析である。また、各信号の平均値、標準偏差(STD)、ピーク間(P2P)値を求めることも、故障の兆候の検出に役立った。進行を追跡するために自己回帰和分移動平均(ARIMA)法を実装した。 We implemented a simple method for early detection in this research. The implemented methods are plotting the given .mat files and analyzing scalogram images generated by performing the Continuous Wavelet Transform (CWT) on the samples. Computing the mean, standard deviation (STD), and peak-to-peak (P2P) value of each signal also helped detect signs of faults. We implemented the autoregressive integrated moving average (ARIMA) method to track the progression.	翻訳日:2023-07-11 16:57:59 公開日:2023-07-08
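The per-signal statistics mentioned above are straightforward to compute. Below is a minimal Python sketch that assumes each signal is already loaded as a list of floats. Reading the .mat files, the CWT scalograms, and the ARIMA tracking are not shown. The thresholding rule `looks_faulty` and its factor `k` are hypothetical illustrations, not the authors' criterion.

```python
import statistics

def signal_features(signal):
    """Mean, standard deviation, and peak-to-peak value of one signal."""
    return {
        "mean": statistics.fmean(signal),
        "std": statistics.pstdev(signal),  # population standard deviation
        "p2p": max(signal) - min(signal),
    }

def looks_faulty(features, baseline, k=3.0):
    """Hypothetical rule: flag a signal whose spread exceeds k times a healthy baseline."""
    return features["std"] > k * baseline["std"] or features["p2p"] > k * baseline["p2p"]

healthy = signal_features([1.0, -1.0, 1.0, -1.0])
suspect = signal_features([5.0, -5.0, 5.0, -5.0])
print(healthy)                         # {'mean': 0.0, 'std': 1.0, 'p2p': 2.0}
print(looks_faulty(suspect, healthy))  # True
```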
# Sketch-A-Shape: ゼロショットスケッチから3D形状生成 Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation ( http://arxiv.org/abs/2307.03869v1 ) ライセンス: Link先を確認	Aditya Sanghi, Pradeep Kumar Jayaraman, Arianna Rampini, Joseph Lambourne, Hooman Shayani, Evan Atherton, Saeid Asgari Taghanaki	(参考訳) テキストからの形状生成のような3次元ビジョンの下流タスクへの大規模事前学習モデルの創造的応用において、近年大きな進歩がなされている。このことが、これらの事前学習モデルを用いてスケッチから3次元形状を効果的に生成する方法を調べる我々の研究の動機となっている。スケッチと形状のペアデータセットが限られていることや、スケッチの抽象度がさまざまであることから、これは主に未解決の課題のままである。学習時に合成レンダリングの特徴(凍結した大規模事前学習視覚モデルから得られる)で3次元生成モデルを条件付けることで、推論時にスケッチから3次元形状を効果的に生成できることを発見した。これは、大規模事前学習視覚モデルの特徴がドメインシフトに頑健なセマンティック信号を持つこと、すなわち、学習にはRGBレンダリングのみを用いながら推論時にはスケッチに汎化できることを示唆している。さまざまな設計要因を調べる包括的な実験を行い、学習時にペアデータセットを一切必要とせず、抽象度に関わらず各入力スケッチから複数の3次元形状を生成する単純なアプローチの有効性を実証する。 Significant progress has recently been made in creative applications of large pre-trained models for downstream tasks in 3D vision, such as text-to-shape generation. This motivates our investigation of how these pre-trained models can be used effectively to generate 3D shapes from sketches, which has largely remained an open challenge due to the limited sketch-shape paired datasets and the varying level of abstraction in the sketches. We discover that conditioning a 3D generative model on the features (obtained from a frozen large pre-trained vision model) of synthetic renderings during training enables us to effectively generate 3D shapes from sketches at inference time. This suggests that the large pre-trained vision model features carry semantic signals that are resilient to domain shifts, i.e., allowing us to train on RGB renderings only while generalizing to sketches at inference time. We conduct a comprehensive set of experiments investigating different design factors and demonstrate the effectiveness of our straightforward approach for generating multiple 3D shapes per input sketch, regardless of their level of abstraction, without requiring any paired datasets during training.	翻訳日:2023-07-11 16:57:50 公開日:2023-07-08
# 無線ネットワークにおけるパーソナライズドリソース割り当て:AI対応およびビッグデータ駆動多目的最適化 Personalized Resource Allocation in Wireless Networks: An AI-Enabled and Big Data-Driven Multi-Objective Optimization ( http://arxiv.org/abs/2307.03867v1 ) ライセンス: Link先を確認	Rawan Alkurd, Ibrahim Abualhaol, Halim Yanikomeroglu	(参考訳) 無線ネットワークの設計と最適化は、主に強力な数学的および理論的モデリングに基づいている。それでも、5G以降の時代に新しいアプリケーションが出現すると、ネットワークの設計と最適化において、前例のないレベルの複雑さが生じることになる。結果として、非常に複雑な問題をリアルタイムに解決できる柔軟性と適応性のために、無線ネットワークの設計と最適化に人工知能(AI)を用いることが想定されている。AIの主な将来の応用の1つは、多くのユースケースでユーザレベルのパーソナライズを可能にすることである。AIは、コンピュータが人間からの命令や感情を非侵襲的な方法で感知できるようになり、プロセス全体がユーザにとって透過的になるという形で、コンピュータとの対話方法に革命をもたらすだろう。この機能を活用し、コンピューティング技術の進歩に後押しされることで、無線ネットワークを再設計し、ネットワークサービスをユーザレベルでリアルタイムにパーソナライズできるようにすることができる。現在の無線ネットワークは、予め定義された品質要件を満たすように最適化されているが、本稿で提唱するパーソナライズ技術は、乏しいネットワークリソースをきめ細かく管理するように設計された、インテリジェントなビッグデータ駆動層によって支えられている。この層は、各ユーザの目標満足度を達成するために必要なサービス品質を決定するためのインテリジェンスを提供する。動的で柔軟な設計により、パーソナライズされたネットワークは、無線ネットワークにおける2つの相反する目標、すなわちリソースの節約とユーザ満足度の向上の最適化において、前例のない改善を達成すると期待されている。 The design and optimization of wireless networks have mostly been based on strong mathematical and theoretical modeling. Nonetheless, as novel applications emerge in the era of 5G and beyond, unprecedented levels of complexity will be encountered in the design and optimization of the network. As a result, the use of Artificial Intelligence (AI) is envisioned for wireless network design and optimization due to the flexibility and adaptability it offers in solving extremely complex problems in real-time. One of the main future applications of AI is enabling user-level personalization for numerous use cases. AI will revolutionize the way we interact with computers, so that computers will be able to sense commands and emotions from humans in a non-intrusive manner, making the entire process transparent to users. By leveraging this capability, and accelerated by the advances in computing technologies, wireless networks can be redesigned to enable the personalization of network services to the user level in real-time. 
While current wireless networks are being optimized to achieve a predefined set of quality requirements, the personalization technology advocated in this article is supported by an intelligent big data-driven layer designed to micro-manage the scarce network resources. This layer provides the intelligence required to decide the necessary service quality that achieves the target satisfaction level for each user. Due to its dynamic and flexible design, personalized networks are expected to achieve unprecedented improvements in optimizing two contradicting objectives in wireless networks: saving resources and improving user satisfaction levels.	翻訳日:2023-07-11 16:57:30 公開日:2023-07-08
# 超ハイゼンベルク精密な長距離相互作用スターク多体プローブ Long-range interacting Stark many-body probes with Super-Heisenberg precision ( http://arxiv.org/abs/2307.03904v1 ) ライセンス: Link先を確認	Rozhin Yousefjani, Xingjian He, and Abolfazl Bayat	(参考訳) 粒子間相互作用が有害であるインターフェロメトリベースの量子センシングとは対照的に、量子多体プローブはそのような相互作用を利用して量子増強感度を達成する。研究された多くの量子多体プローブでは、相互作用は短距離であると考えられている。本稿では,様々な充填因子における長距離相互作用がStark量子プローブの性能に及ぼす影響について検討する。これらのプローブは、システムサイズが増加するにつれて無限小勾配場で起こる基底状態スターク局在化相転移を利用する。その結果、超ハイゼンベルク精度は常に全ての相互作用範囲において達成可能であるが、長距離相互作用スタークプローブは2つの異なる挙動を明らかにした。第一に、相互作用の範囲を代数的に増加させることで、局所化のパワーは増大し、プローブの感度は低下する。第2に、相互作用範囲が完全連結グラフに近づくと、効果的な局在化力が消失し、プローブの感度が再び向上し始める。超ハイゼンベルク精度は、遷移点まで延長段階を通して達成可能であり、資源分析に状態準備時間が組み込まれても有効である。プローブが局在した位相に入ると感度が低下し、その性能は普遍的な振る舞いに従ってサイズ非依存になる。さらに, 解析の結果, 充填率の低下が弱勾配場の測定精度の向上につながることが示された。 In contrast to interferometry-based quantum sensing, where interparticle interaction is detrimental, quantum many-body probes exploit such interactions to achieve quantum-enhanced sensitivity. In most of the studied quantum many-body probes, the interaction is considered to be short-ranged. Here, we investigate the impact of long-range interaction at various filling factors on the performance of Stark quantum probes for measuring a small gradient field. These probes harness the ground state Stark localization phase transition which happens at an infinitesimal gradient field as the system size increases. Our results show that while super-Heisenberg precision is always achievable in all ranges of interaction, the long-range interacting Stark probe reveals two distinct behaviors. First, by algebraically increasing the range of interaction, the localization power enhances and thus the sensitivity of the probe decreases. Second, as the interaction range becomes close to a fully connected graph its effective localization power disappears and thus the sensitivity of the probe starts to enhance again. 
The super-Heisenberg precision is achievable throughout the extended phase until the transition point and remains valid even when the state preparation time is incorporated in the resource analysis. As the probe enters the localized phase, the sensitivity decreases and its performance becomes size-independent, following a universal behavior. In addition, our analysis shows that lower filling factors lead to better precision for measuring weak gradient fields.	翻訳日:2023-07-11 16:48:33 公開日:2023-07-08
# 可視赤外ビデオ人物再同定のための敵対的自己攻撃防御と時空間関係マイニング Adversarial Self-Attack Defense and Spatial-Temporal Relation Mining for Visible-Infrared Video Person Re-Identification ( http://arxiv.org/abs/2307.03903v1 ) ライセンス: Link先を確認	Huafeng Li, Le Xu, Yafei Zhang, Dapeng Tao, Zhengtao Yu	(参考訳) 可視赤外ビデオ人物再同定(re-ID)では、複雑なシーンの変化(モダリティ、カメラビュー、歩行者の姿勢、背景など)の影響を受けない特徴を抽出し、動き情報をマイニングして活用することが、クロスモーダルな歩行者の同一性照合の鍵となる。そこで本研究では,新たな視点,すなわち敵対的自己攻撃防御と時空間関係マイニングの観点から,新しい可視赤外ビデオ人物re-ID手法を提案する。本研究では,視点,姿勢,背景の変化およびモダリティ間の差異を,人物の同一性特徴に摂動を引き起こす主な要因とみなす。学習サンプルに含まれるこのような干渉情報を敵対的摂動として用いる。学習中にre-IDモデルに対して敵対的攻撃を行い、これらの不利な要因に対してモデルをより頑健にする。敵対的摂動による攻撃は、敵対的サンプルを生成することなく入力サンプルに含まれる干渉情報を活性化することで導入されるため、敵対的自己攻撃(adversarial self-attack)と呼ぶことができる。この設計により、敵対的攻撃と防御を一つのフレームワークに統合できる。さらに本稿では,ビデオシーケンスの情報を利用する時空間情報誘導型特徴表現ネットワークを提案する。このネットワークは、ビデオフレームシーケンスに含まれる情報を抽出するだけでなく、空間内の局所情報間の関係を利用して、より頑健な特徴を抽出するようネットワークを誘導する。提案手法は,大規模なクロスモダリティビデオデータセットにおいて優れた性能を示す。提案手法のソースコードはhttps://github.com/lhf12278/xxxで公開される予定である。 In visible-infrared video person re-identification (re-ID), extracting features that are unaffected by changes in complex scenes (such as modality, camera view, pedestrian pose, and background) and mining and utilizing motion information are the keys to cross-modal pedestrian identity matching. To this end, this paper proposes a new visible-infrared video person re-ID method from a novel perspective, i.e., adversarial self-attack defense and spatial-temporal relation mining. In this work, changes in view, posture, and background, as well as the modality discrepancy, are considered the main factors that perturb person identity features. Such interference information contained in the training samples is used as an adversarial perturbation. This perturbation is used to perform adversarial attacks on the re-ID model during training, making the model more robust to these unfavorable factors. 
The attack from the adversarial perturbation is introduced by activating the interference information contained in the input samples without generating adversarial samples, and it can thus be called adversarial self-attack. This design allows adversarial attack and defense to be integrated into one framework. This paper further proposes a spatial-temporal information-guided feature representation network to use the information in video sequences. The network not only extracts the information contained in the video-frame sequences but also uses the spatial relations among local information to guide itself toward extracting more robust features. The proposed method exhibits compelling performance on large-scale cross-modality video datasets. The source code of the proposed method will be released at https://github.com/lhf12278/xxx.	翻訳日:2023-07-11 16:48:11 公開日:2023-07-08
# クラス構造とクラスタ構造を同時に保存する特徴選択 Feature selection simultaneously preserving both class and cluster structures ( http://arxiv.org/abs/2307.03902v1 ) ライセンス: Link先を確認	Suchismita Das and Nikhil R. Pal	(参考訳) データセットのクラス構造とクラスタ構造に大きな差異がある場合、クラスの識別のみを目的として特徴を選択するとクラスタリング性能が低下し、同様に、クラスタ構造の保存のみを目的とした特徴選択では分類性能が低下する。我々の知る限り,クラス識別とクラスタ構造の保存を同時に考慮した特徴選択手法は文献に存在しない。本稿では,クラス識別と構造保存の両方を統合的に重視するニューラルネットワークに基づく特徴選択手法を提案することにより,このギャップを埋めることを試みた。典型的な分類問題での評価に加えて,ハイパースペクトル画像におけるバンド選択での有効性についても検討した。実験結果に基づき,提案する特徴/バンド選択は,分類とクラスタリングの両方に適した特徴の部分集合を選択できると主張する。 When a data set has significant differences in its class and cluster structure, selecting features aiming only at the discrimination of classes would lead to poor clustering performance, and similarly, feature selection aiming only at preserving cluster structures would lead to poor classification performance. To the best of our knowledge, a feature selection method that simultaneously considers class discrimination and cluster structure preservation is not available in the literature. In this paper, we have tried to bridge this gap by proposing a neural network-based feature selection method that focuses both on class discrimination and structure preservation in an integrated manner. In addition to assessing typical classification problems, we have investigated its effectiveness on band selection in hyperspectral images. Based on the results of the experiments, we may claim that the proposed feature/band selection can select a subset of features that is good for both classification and clustering.	翻訳日:2023-07-11 16:47:45 公開日:2023-07-08
# 物理学におけるアクティブラーニング:101から進歩,そして展望へ Active Learning in Physics: From 101, to Progress, and Perspective ( http://arxiv.org/abs/2307.03899v1 ) ライセンス: Link先を確認	Yongcheng Ding, José D. Martín-Guerrero, Yolanda Vives-Gilabert, Xi Chen	(参考訳) 能動学習(Active Learning, AL)は機械学習(ML)アルゴリズムの一群であり、現在の人工知能の時代よりも前から存在する。学習にラベル付きサンプルを必要とする従来のアプローチとは異なり、ALは専門家が注釈を付けるラベルなしサンプルを反復的に選択する。このプロトコルは最も情報量の多いサンプルを優先することを目的としており、すべてのラベル付きサンプルで学習する場合と比べてモデル性能の向上につながる。近年、ALは特に物理学の分野で注目を集めている。本稿では,AL理論の包括的で分かりやすい入門を提供し,様々な領域における最新の進歩を概観する。さらに、ALと量子MLの統合の可能性を探り、ALを古典MLの量子領域への単なる拡張と見なすのではなく、これら2つの分野の相乗的な融合を構想する。 Active Learning (AL) is a family of machine learning (ML) algorithms that predates the current era of artificial intelligence. Unlike traditional approaches that require labeled samples for training, AL iteratively selects unlabeled samples to be annotated by an expert. This protocol aims to prioritize the most informative samples, leading to improved model performance compared to training with all labeled samples. In recent years, AL has gained increasing attention, particularly in the field of physics. This paper presents a comprehensive and accessible introduction to the theory of AL and reviews the latest advancements across various domains. Additionally, we explore the potential integration of AL with quantum ML, envisioning a synergistic fusion of these two fields rather than viewing AL as a mere extension of classical ML into the quantum realm.	翻訳日:2023-07-11 16:47:34 公開日:2023-07-08
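The select-annotate-retrain loop described above hinges on the query strategy. Below is a minimal Python sketch of one common choice, entropy-based uncertainty sampling. It assumes the current model's class probabilities for the unlabeled pool are already available. The review covers many query strategies, and this one is only an illustration.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_queries(pool_probs, k):
    """Indices of the k unlabeled samples the model is least certain about."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: predictive_entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:k]

# Predicted class probabilities for three unlabeled samples.
pool = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]
print(select_queries(pool, 2))  # [1, 2]
```

In a full loop, the selected indices would go to the expert for labeling and be added to the labeled set. The model would then be retrained before the next round.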
# StyleGAN3:平行移動と回転の等変性向上のための生成ネットワーク StyleGAN3: Generative Networks for Improving the Equivariance of Translation and Rotation ( http://arxiv.org/abs/2307.03898v1 ) ライセンス: Link先を確認	Tianlei Zhu, Junqi Chen, Renzhe Zhu, Gaurav Gupta	(参考訳) StyleGANは、スタイルを用いて顔の姿勢やアイデンティティの特徴に影響を与え、ノイズを用いて髪、しわ、肌の色などの細部に影響を与えることができる。ただし、画像処理の結果はStyleGANのバージョンによって若干異なる。そこで本研究では、StyleGAN2とStyleGAN3の2つの改良版との性能差の比較を主な焦点とする。データセットとしてFFHQを使用し,モデルの評価にはFID,EQ-T,EQ-Rを用いた。その結果、StyleGAN3の方が等変性を改善する上でより優れた生成ネットワークであることが分かった。我々の知見は、アニメーションやビデオの制作に好影響を与える。 StyleGAN can use style to affect facial posture and identity features, and noise to affect hair, wrinkles, skin color, and other details. However, the outcomes of the image processing vary slightly between different versions of StyleGAN. The main focus of this study is therefore a comparison of the performance differences between StyleGAN2 and the two modified versions of StyleGAN3. We used the FFHQ dataset and evaluated the models with FID, EQ-T, and EQ-R. We found that StyleGAN3 is the better generative network for improving equivariance. Our findings have a positive impact on the creation of animation and videos.	翻訳日:2023-07-11 16:47:22 公開日:2023-07-08
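FID is one of the evaluation metrics named above. Below is a minimal NumPy sketch of the Fréchet distance between Gaussians fitted to two feature sets, where lower is better. It assumes the features have already been extracted. The standard metric uses Inception-v3 features, which are not computed here, and EQ-T and EQ-R are not shown. The random features in the usage lines are synthetic placeholders.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) is the sum of square roots of the eigenvalues of
    # cov_a @ cov_b, which are real and non-negative up to round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 4))
print(frechet_distance(x, x))        # approximately 0
print(frechet_distance(x, x + 3.0))  # approximately 36 (= 4 * 3**2)
```

The eigenvalue route avoids a dependency on a matrix square-root routine. Library implementations of FID typically call `scipy.linalg.sqrtm` instead.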
# イテレーティブ・プロンプティングによる曖昧な質問への回答 Answering Ambiguous Questions via Iterative Prompting ( http://arxiv.org/abs/2307.03897v1 ) ライセンス: Link先を確認	Weiwei Sun and Hengyi Cai and Hongshen Chen and Pengjie Ren and Zhumin Chen and Maarten de Rijke and Zhaochun Ren	(参考訳) オープンドメインの質問応答では、質問のあいまいさのため、複数の妥当な答えが存在する可能性がある。曖昧な質問に対して実現可能な回答を提供するには、すべての有効な回答を直接予測するアプローチがあるが、これは妥当性と多様性のバランスに苦労する可能性がある。別の方法として、候補回答を収集して集約する方法があるが、この方法は計算コストが高く、回答間の依存関係を無視する可能性がある。本稿では,曖昧な質問に答える既存手法の不完全さに対処するため,AmbigPromptを提案する。具体的には,回答モデルとプロンプトモデルを反復的に統合する。プロンプトモデルは読解プロセスを適応的に追跡し、回答モデルを段階的に起動して、互いに異なり、かつ関連性のある回答を構成させる。さらに、回答モデルとプロンプトモデルの両方に対してタスク固有の事後事前学習手法を開発し、フレームワークの性能を大幅に改善する。一般に使用されている2つのオープンベンチマークでの実証研究によると、AmbigPromptは競合手法よりも少ないメモリ使用量と低い推論レイテンシで、最先端または同等の結果を達成する。さらに、AmbigPromptは低リソース設定でもうまく機能する。コードはhttps://github.com/sunnweiwei/AmbigPrompt で入手できる。 In open-domain question answering, due to the ambiguity of questions, multiple plausible answers may exist. To provide feasible answers to an ambiguous question, one approach is to directly predict all valid answers, but this can struggle with balancing relevance and diversity. An alternative is to gather candidate answers and aggregate them, but this method can be computationally costly and may neglect dependencies among answers. In this paper, we present AmbigPrompt to address the imperfections of existing approaches to answering ambiguous questions. Specifically, we integrate an answering model with a prompting model in an iterative manner. The prompting model adaptively tracks the reading process and progressively triggers the answering model to compose distinct and relevant answers. Additionally, we develop a task-specific post-pretraining approach for both the answering model and the prompting model, which greatly improves the performance of our framework. 
Empirical studies on two commonly used open benchmarks show that AmbigPrompt achieves state-of-the-art or competitive results while using less memory and having a lower inference latency than competing approaches. AmbigPrompt also performs well in low-resource settings. The code is available at https://github.com/sunnweiwei/AmbigPrompt.	翻訳日:2023-07-11 16:47:12 公開日:2023-07-08
# 量子スターリング熱エンジンの臨界挙動 The Critical Behavior of Quantum Stirling Heat Engine ( http://arxiv.org/abs/2307.03895v1 ) ライセンス: Link先を確認	Yuan-Sheng Wang, Man-Hong Yung, Dazhi Xu, Maoxin Liu, Xiaosong Chen	(参考訳) 量子ラビモデル(QRM)でモデル化した作業物質(WS)を用いるスターリングサイクルの性能を調べ, 臨界性が効率に与える影響を探る。その結果,QRMの臨界性はスターリングサイクルの効率向上に正の効果をもつことが示された。さらに, 低温熱浴と高温熱浴の温度がともに有限であっても, WSパラメータが臨界点に近づくとカルノー効率が漸近的に達成可能であることが観察された。加えて、スターリングサイクルの効率に対する臨界挙動を導出し、WSパラメータが臨界点に近づくにつれて効率がどのようにカルノー効率に漸近するかを示す。本研究は、スターリング熱機関の性能に対する臨界性の影響の理解を深めるものである。 We investigate the performance of a Stirling cycle with a working substance (WS) modeled as the quantum Rabi model (QRM), exploring the impact of criticality on its efficiency. Our findings indicate that the criticality of the QRM has a positive effect on improving the efficiency of the Stirling cycle. Furthermore, we observe that the Carnot efficiency is asymptotically achievable as the WS parameter approaches the critical point, even when the temperatures of both the cold and hot reservoirs are finite. Additionally, we derive the critical behavior for the efficiency of the Stirling cycle, demonstrating how the efficiency asymptotically approaches the Carnot efficiency as the WS parameter approaches the critical point. Our work deepens the understanding of the impact of criticality on the performance of a Stirling heat engine.	翻訳日:2023-07-11 16:46:50 公開日:2023-07-08
# コミュニティレコメンデーションのためのメンタルヘルス談話の埋め込み Embedding Mental Health Discourse for Community Recommendation ( http://arxiv.org/abs/2307.03892v1 ) ライセンス: Link先を確認	Hy Dang, Bang Nguyen, Noah Ziems, Meng Jiang	(参考訳) 本稿では,ソーシャルメディア上のメンタルヘルス支援グループに着目したコミュニティ推薦システムの開発における,談話埋め込み技術の利用について検討する。ソーシャルメディアプラットフォームは、ユーザーが自身の特定の関心に応えるコミュニティと匿名でつながる手段を提供する。しかし、膨大な数のオンラインコミュニティが存在するため、ユーザーはメンタルヘルスの悩みに対処するための適切なグループを見つけるのが困難になる可能性がある。この課題に対処するために、埋め込み技術を用いて様々なサブレディットコミュニティからの談話情報を統合し、効果的な推薦システムを開発することを検討する。提案手法では,推薦システムの性能を高めるために,コンテンツベースおよび協調フィルタリング技術を用いる。その結果,提案手法は各手法を個別に用いる場合を上回り,推薦プロセスにおける解釈可能性を提供することが示された。 Our paper investigates the use of discourse embedding techniques to develop a community recommendation system that focuses on mental health support groups on social media. Social media platforms provide a means for users to anonymously connect with communities that cater to their specific interests. However, with the vast number of online communities available, users may face difficulties in identifying relevant groups to address their mental health concerns. To address this challenge, we explore the integration of discourse information from various subreddit communities using embedding techniques to develop an effective recommendation system. Our approach involves the use of content-based and collaborative filtering techniques to enhance the performance of the recommendation system. Our findings indicate that the proposed approach outperforms the use of each technique separately and provides interpretability in the recommendation process.	翻訳日:2023-07-11 16:46:35 公開日:2023-07-08
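The approach combines content-based and collaborative filtering. Below is a minimal Python sketch of one way to blend the two. It assumes discourse embeddings for the user and for each community are already computed, along with a collaborative-filtering score per community. The linear blend with weight `alpha`, the toy 2-D vectors, and the community names are hypothetical and not the paper's exact formulation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(user_vec, community_vecs, cf_scores, alpha=0.5, k=1):
    """Blend content similarity with collaborative-filtering scores; return top-k names."""
    blended = {
        name: alpha * cosine(user_vec, vec) + (1 - alpha) * cf_scores.get(name, 0.0)
        for name, vec in community_vecs.items()
    }
    return sorted(blended, key=blended.get, reverse=True)[:k]

# Toy 2-D "discourse embeddings" and CF scores for two hypothetical subreddits.
communities = {"r/CommunityA": [1.0, 0.0], "r/CommunityB": [0.0, 1.0]}
cf = {"r/CommunityA": 0.2, "r/CommunityB": 0.4}
print(recommend([0.9, 0.1], communities, cf))  # ['r/CommunityA']
```

Setting `alpha` to 1.0 or 0.0 recovers a purely content-based or purely collaborative ranking, respectively.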
# 固有値問題の量子技術 Quantum techniques for eigenvalue problems ( http://arxiv.org/abs/2307.03889v1 ) ライセンス: Link先を確認	Dean Lee	(参考訳) 本稿では,量子多体系における固有値問題に対する量子アルゴリズムの簡単な紹介を行う。トピックの広範な調査よりも、断熱進化の本質、変分法、位相検出アルゴリズム、その他いくつかのアプローチを網羅する、いくつかの量子アルゴリズムの概念的理解の提供に注力する。各手法について,潜在的な利点と課題について考察する。 This article is a brief introduction to quantum algorithms for the eigenvalue problem in quantum many-body systems. Rather than a broad survey of topics, we focus on providing a conceptual understanding of several quantum algorithms that cover the essentials of adiabatic evolution, variational methods, phase detection algorithms, and several other approaches. For each method, we discuss the potential advantages and remaining challenges.	翻訳日:2023-07-11 16:46:19 公開日:2023-07-08
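Here is a concrete picture of the variational idea covered above. The energy expectation of any normalized trial state upper-bounds the ground-state energy, so minimizing it over the state's parameters approaches the smallest eigenvalue. Below is a minimal classical Python sketch. It assumes a hypothetical real 2x2 Hamiltonian and a one-parameter trial state. A real variational quantum eigensolver would estimate the energy on quantum hardware and use a proper optimizer instead of the grid search used here.

```python
import math

def energy(theta, a, b, d):
    """<psi|H|psi> for H = [[a, b], [b, d]] and psi = (cos theta, sin theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return a * c * c + 2 * b * s * c + d * s * s

def variational_ground_energy(a, b, d, steps=20000):
    """Grid search over the one-parameter trial state (a stand-in for an optimizer)."""
    return min(energy(math.pi * i / steps, a, b, d) for i in range(steps))

def exact_ground_energy(a, b, d):
    """Smallest eigenvalue of the real symmetric 2x2 matrix."""
    return (a + d) / 2 - math.sqrt(((a - d) / 2) ** 2 + b * b)

print(variational_ground_energy(1.0, 0.5, -1.0))  # agrees with the exact value to ~1e-7
print(exact_ground_energy(1.0, 0.5, -1.0))
```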
# Reward Reweighing, Reselection, Retraining によるプロトタイプ部分ネットワークの改善 Improving Prototypical Part Networks with Reward Reweighing, Reselection, and Retraining ( http://arxiv.org/abs/2307.03887v1 ) ライセンス: Link先を確認	Robin Netzorg, Jiaxun Li, Bin Yu	(参考訳) 近年、モデルの出力をデータの特定の特徴に明確に帰属させる、画像分類のための深層解釈可能手法の開発が進められている。そのような手法の1つがプロトタイプ部分ネットワーク(ProtoPNet)であり、入力の意味のある部分に基づいて画像を分類しようとする。この手法は解釈可能な分類をもたらすが、画像の見せかけの(スプリアスな)部分や一貫性のない部分から分類を学習してしまうことが多い。これを改善するために、我々は人間のフィードバックによる強化学習(RLHF)の近年の発展に着想を得て、これらのプロトタイプを微調整する。CUB-200-2011データセット上でプロトタイプの品質に関する人間のアノテーションを1～5のスケールで収集し,スプリアスでないプロトタイプを識別するよう学習する報酬モデルを構築する。完全なRL更新の代わりに、再重み付け・再選択・再学習を行うプロトタイプ部分ネットワーク(R3-ProtoPNet)を提案し、ProtoPNetの学習ループに3つのステップを追加する。最初の2ステップは報酬に基づく再重み付けと再選択であり、プロトタイプを人間のフィードバックに整合させる。最後のステップは、更新されたプロトタイプにモデルの特徴を再整合させるための再学習である。R3-ProtoPNetはプロトタイプ全体の一貫性と有意味性を向上させるが、単独で使用するとテスト予測精度が低下する。複数のR3-ProtoPNetをアンサンブルに組み込むと、解釈可能性を維持しながらテスト予測性能が向上する。 In recent years, work has gone into developing deep interpretable methods for image classification that clearly attribute a model's output to specific features of the data. One such method is the prototypical part network (ProtoPNet), which attempts to classify images based on meaningful parts of the input. While this method results in interpretable classifications, it often learns to classify from spurious or inconsistent parts of the image. Hoping to remedy this, we take inspiration from the recent developments in Reinforcement Learning with Human Feedback (RLHF) to fine-tune these prototypes. By collecting human annotations of prototype quality via a 1-5 scale on the CUB-200-2011 dataset, we construct a reward model that learns to identify non-spurious prototypes. In place of a full RL update, we propose the reweighted, reselected, and retrained prototypical part network (R3-ProtoPNet), which adds three steps to the ProtoPNet training loop. 
The first two steps are reward-based reweighting and reselection, which align prototypes with human feedback. The final step is retraining to realign the model's features with the updated prototypes. We find that R3-ProtoPNet improves the overall consistency and meaningfulness of the prototypes, but lowers the test predictive accuracy when used independently. When multiple R3-ProtoPNets are incorporated into an ensemble, we find an increase in test predictive performance while maintaining interpretability.	翻訳日:2023-07-11 16:46:13 公開日:2023-07-08
# 迅速な経験的シナリオ Fast Empirical Scenarios ( http://arxiv.org/abs/2307.03927v1 ) ライセンス: Link先を確認	Michael Multerer, Paul Schneider, Rohan Sen	(参考訳) 我々は,大規模かつ高次元のパネルデータから,標本モーメントと整合する少数の代表的なシナリオを抽出することを目指す。2つの新しいアルゴリズムのうち、1つ目はこれまで観測されていないシナリオを特定するもので、共分散行列のシナリオに基づく表現を伴う。2つ目は、既に実現した世界の状態から、高次の標本モーメント情報と整合する重要なデータ点を選択する。どちらのアルゴリズムも効率的に計算でき、整合的なシナリオベースモデリングと高次元数値積分に適している。広範な数値ベンチマーク研究とポートフォリオ最適化への応用は,提案アルゴリズムの優位性を示している。 We seek to extract a small number of representative scenarios from large and high-dimensional panel data that are consistent with sample moments. Of the two novel algorithms we propose, the first identifies scenarios that have not been observed before, and comes with a scenario-based representation of covariance matrices. The second picks important data points from states of the world that have already been realized, and are consistent with higher-order sample moment information. Both algorithms are efficient to compute, and lend themselves to consistent scenario-based modeling and high-dimensional numerical integration. Extensive numerical benchmarking studies and an application in portfolio optimization favor the proposed algorithms.	翻訳日:2023-07-11 16:40:35 公開日:2023-07-08
# Inchworm法によるオープン量子スピン鎖のリアルタイムシミュレーション Real-Time Simulation of Open Quantum Spin Chains with Inchworm Method ( http://arxiv.org/abs/2307.03924v1 ) ライセンス: Link先を確認	Geshuo Wang, Zhenning Cai	(参考訳) 開放量子系の実時間シミュレーションについて検討し, スピン連鎖をモデルとし, それぞれのスピンがそれぞれの高調波浴と関連している場合について考察した。本手法はスピン-ボソンモデルに対するインヒワーム法とスピン系に対するモジュラーパス積分法を結合する。特に,inchworm法の導入は,数値的な符号問題を大幅に抑制することができる。どちらのメソッドも、互いにシームレスに動作するように調整されている。我々は,図式的手法の言語によるアプローチを表現し,計算コストの漸近的挙動を解析する。本手法を検証するために広範囲な数値実験を行った。 We study the real-time simulation of open quantum systems, where the system is modeled by a spin chain, with each spin associated with its own harmonic bath. Our method couples the inchworm method for the spin-boson model and the modular path integral methodology for spin systems. In particular, the introduction of the inchworm method can significantly suppress the numerical sign problem. Both methods are tweaked to make them work seamlessly with each other. We represent our approach in the language of diagrammatic methods, and analyze the asymptotic behavior of the computational cost. Extensive numerical experiments are done to validate our method.	翻訳日:2023-07-11 16:40:27 公開日:2023-07-08
# Training Physics-Informed Neural Networks via Multi-Task Optimization for Traffic Density Prediction ( http://arxiv.org/abs/2307.03920v1 ) License: see link	Bo Wang and A. K. Qin and Sajjad Shafiei and Hussein Dia and Adriana-Simona Mihaita and Hanna Grzybowska	Physics-informed neural networks (PINNs) are a newly emerging research frontier in machine learning. They incorporate physical laws that govern a given data set, e.g., those described by partial differential equations (PDEs), into the training of a neural network (NN) on that data set. In PINNs, the NN acts as the solution approximator for the PDE, while the PDE acts as prior knowledge that guides NN training, which yields the desired generalization performance when training data are scarce. However, training PINNs is non-trivial, largely because the loss is composed of both an NN part and a physical-law part. In this work, we propose a new PINN training framework based on the multi-task optimization (MTO) paradigm. Under this framework, multiple auxiliary tasks are created and solved together with the given (main) task. Useful knowledge from solving one task is adaptively transferred to assist in solving other tasks, with the aim of improving performance on the main task. We implement the proposed framework and apply it to train a PINN for the traffic density prediction problem. Experimental results demonstrate that our training framework leads to significant performance improvement compared with the traditional way of training the PINN.	Translated: 2023-07-11 16:40:18 Published: 2023-07-08
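The abstract does not state which PDE governs traffic density. The sketch below therefore assumes the common Lighthill-Whitham-Richards (LWR) conservation law with a Greenshields speed-density relation. It approximates derivatives by finite differences on a (time, space) grid, whereas an actual PINN would differentiate the network output with automatic differentiation. It only shows the two-part loss (data misfit plus physics residual) that the abstract describes as hard to train. It does not implement the paper's multi-task optimization framework. All names and parameter values are hypothetical.

```python
import numpy as np

def lwr_residual(rho, dt, dx, v_max=1.0, rho_max=1.0):
    """Residual of rho_t + (rho * v(rho))_x = 0 on a (time, space) grid,
    assuming the Greenshields speed v(rho) = v_max * (1 - rho / rho_max)."""
    flux = rho * v_max * (1.0 - rho / rho_max)
    rho_t = np.gradient(rho, dt, axis=0)    # time derivative (rows = time)
    flux_x = np.gradient(flux, dx, axis=1)  # space derivative (columns = space)
    return rho_t + flux_x

def pinn_style_loss(rho_pred, rho_obs, mask, dt, dx, lam=1.0):
    """Mean squared data misfit on observed cells plus a weighted
    mean squared physics residual over the whole grid."""
    data_term = np.mean((rho_pred[mask] - rho_obs[mask]) ** 2)
    physics_term = np.mean(lwr_residual(rho_pred, dt, dx) ** 2)
    return data_term + lam * physics_term

dt, dx = 0.1, 1.0
# A uniform density field satisfies the PDE, so a perfect fit has zero loss.
flat = np.full((20, 30), 0.4)
mask = np.ones_like(flat, dtype=bool)
loss_flat = pinn_style_loss(flat, flat, mask, dt, dx)
# Density rising uniformly in time with no spatial flux gradient violates it.
ramp = np.tile(np.linspace(0.1, 0.5, 20)[:, None], (1, 30))
ramp_residual = lwr_residual(ramp, dt, dx)
```

The weight `lam` is where the balancing difficulty shows up: the data and physics terms can differ by orders of magnitude.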
# VS-TransGRU: A Novel Transformer-GRU-based Framework Enhanced by Visual-Semantic Fusion for Egocentric Action Anticipation ( http://arxiv.org/abs/2307.03918v1 ) License: see link	Congqi Cao and Ze Sun and Qinyi Lv and Lingtong Min and Yanning Zhang	Egocentric action anticipation is a challenging task that aims to predict future actions in advance from current and historical observations in the first-person view. Most existing methods focus on improving the model architecture and loss function, based on visual input and recurrent neural networks, to boost anticipation performance. However, these methods consider only visual information and rely on a single network architecture, and they are gradually reaching a performance plateau. To fully understand what has been observed and to capture the dependencies between current observations and future actions, we propose a novel action anticipation framework that is based on a Transformer and a GRU and enhanced by visual-semantic fusion. First, we introduce high-level semantic information to improve action anticipation, for the first time. We propose to augment the original visual features with semantic features generated either from the class labels or directly from the visual observations. Second, we propose an effective visual-semantic fusion module to bridge the semantic gap and fully utilize the complementarity of the different modalities. Third, to take advantage of both parallel and autoregressive models, we design a Transformer-based encoder for long-term sequential modeling and a GRU-based decoder for flexible iterative decoding. Extensive experiments on two large-scale first-person view datasets, EPIC-Kitchens and EGTEA Gaze+, validate the effectiveness of the proposed method, which achieves new state-of-the-art performance and outperforms previous approaches by a large margin.	Translated: 2023-07-11 16:39:52 Published: 2023-07-08
# On decoder-only architecture for speech-to-text and large language model integration ( http://arxiv.org/abs/2307.03917v1 ) License: see link	Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu	Large language models (LLMs) have achieved remarkable success in natural language processing, enabling better human-computer interaction using natural language. However, the seamless integration of speech signals into LLMs has not been well explored. The "decoder-only" architecture has also not been well studied for speech processing tasks. In this research, we introduce Speech-LLaMA, a novel approach that effectively incorporates acoustic information into text-based large language models. Our method uses Connectionist Temporal Classification (CTC) and a simple audio encoder to map compressed acoustic features into the continuous semantic space of the LLM. In addition, we further probe the decoder-only architecture for speech-to-text tasks by training a smaller-scale, randomly initialized Speech-LLaMA model on speech-text paired data alone. We conduct experiments on multilingual speech-to-text translation tasks and demonstrate a significant improvement over strong baselines, highlighting the potential advantages of decoder-only models for speech-to-text conversion.	Translated: 2023-07-11 16:39:24 Published: 2023-07-08
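The abstract says CTC is combined with an audio encoder to compress acoustic features before they enter the LLM, but it does not give the exact compression rule. One common approach, assumed here and possibly different from the paper's, applies the usual CTC collapse rule to the frame-level greedy labels: frames with the same consecutive non-blank label are averaged into one vector, and blank frames are dropped. The function name and blank index are hypothetical.

```python
import numpy as np

def ctc_compress(features, frame_labels, blank=0):
    """Collapse frame-level features using greedy CTC labels.

    Runs of consecutive frames that share a non-blank label are averaged
    into a single vector; runs of blank frames are dropped.
    """
    segments = []
    current, start = None, 0
    for t, label in enumerate(frame_labels):
        if label != current:
            # Close the previous run (skip it if it was blank).
            if current is not None and current != blank:
                segments.append(features[start:t].mean(axis=0))
            current, start = label, t
    # Close the final run.
    if current is not None and current != blank:
        segments.append(features[start:].mean(axis=0))
    if not segments:
        return np.zeros((0, features.shape[1]))
    return np.stack(segments)

# 8 frames with 2-dim features; frame t has feature vector [t, t].
features = np.repeat(np.arange(8, dtype=float)[:, None], 2, axis=1)
labels = [0, 3, 3, 0, 5, 5, 5, 2]          # 0 = blank
compressed = ctc_compress(features, labels)
```

In this toy case, 8 frames become 3 vectors, one per emitted token. That shorter sequence is what makes feeding speech into a text LLM's context affordable.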
# Phased Geometric Controls of V-Shaped Three-Level System for Zero-field Quantum Sensing ( http://arxiv.org/abs/2307.03916v1 ) License: see link	Zhijie Li, Xiangyu Ye, Xi Kong, Tianyu Xie, Zhiping Yang, Pengju Zhao, Ya Wang, Fazhan Shi, Jiangfeng Du	Here we propose and demonstrate a phased geometric control protocol for zero-field double quantum gates in a V-shaped three-level spin system. This method uses linearly polarized microwave pulses and exploits geometric qubit properties to prevent state leakage. By employing specific phased geometric controls, we realize a low-power, multi-pulse zero-field sensing technique using single nitrogen-vacancy centers in diamond. Our method offers a novel approach to implementing precise double quantum gate operations with adaptable driving power, making it a valuable tool for zero-field spin-based quantum technology.	Translated: 2023-07-11 16:39:06 Published: 2023-07-08
# Applying human-centered AI in developing effective human-AI teaming: A perspective of human-AI joint cognitive systems ( http://arxiv.org/abs/2307.03913v1 ) License: see link	Wei Xu, Zaifeng Gao	Research and applications have used human-AI teaming (HAT) as a new paradigm for developing AI systems. HAT recognizes that AI will function as a teammate, rather than simply a tool, in collaboration with humans. Effective human-AI teams need to take advantage of the unique abilities of both humans and AI while overcoming the known challenges and limitations of each member, augmenting human capabilities and raising joint performance beyond that of either entity. The 2023 update of the National AI R&D Strategic Plan recognizes that research programs focusing primarily on the independent performance of AI systems generally fail to consider the functionality that AI must provide within dynamic, adaptive, and collaborative teams, and it calls for further research on human-AI teaming and collaboration. However, there has been debate about whether AI can work as a teammate with humans. The primary concern is that adopting the "teaming" paradigm contradicts the human-centered AI (HCAI) approach and results in humans losing control of AI systems. This article further analyzes the HAT paradigm and these debates. Specifically, we elaborate on our proposed conceptual framework of human-AI joint cognitive systems (HAIJCS) and apply it to represent HAT under the HCAI umbrella. We believe that HAIJCS may help adopt HAT while enabling HCAI. The implications of HAIJCS and future work are also discussed. Insights: AI has led to the emergence of a new form of human-machine relationship, human-AI teaming (HAT), which is a paradigmatic shift in human-AI systems; we must follow a human-centered AI (HCAI) approach when applying HAT as a new design paradigm; and we propose a conceptual framework of human-AI joint cognitive systems (HAIJCS) to represent and implement HAT for developing effective human-AI teaming.	Translated: 2023-07-11 16:38:57 Published: 2023-07-08
# A Survey of Spiking Neural Network Accelerator on FPGA ( http://arxiv.org/abs/2307.03910v1 ) License: see link	Murat Isik	Because FPGAs can implement customized topologies, they are increasingly used to deploy spiking neural networks (SNNs) in both embedded and high-performance applications. In this paper, we survey state-of-the-art SNN implementations and their applications on FPGA. We collect recent, widely used spiking neuron models, network structures, and signal encoding formats, and then enumerate related hardware design schemes for FPGA-based SNN implementations. Unlike previous surveys, this manuscript also enumerates application instances from recent research that apply these technical schemes. On that basis, we discuss the actual acceleration potential of implementing SNNs on FPGA. Building on this discussion, we outline upcoming trends and give guidelines for further advancement in related subjects.	Translated: 2023-07-11 16:38:06 Published: 2023-07-08
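The abstract mentions widely used spiking neuron models without naming them. The leaky integrate-and-fire (LIF) neuron is among the most common, so the sketch below assumes it and shows its discrete-time update: leak toward rest, integrate input, fire at a threshold, and reset. It uses floating point for readability. FPGA implementations typically use fixed-point arithmetic instead. Parameter names and values are illustrative.

```python
import numpy as np

def simulate_lif(input_current, tau=20.0, dt=1.0, v_rest=0.0,
                 v_threshold=1.0, v_reset=0.0):
    """Discrete-time leaky integrate-and-fire neuron (forward Euler).

    Returns the membrane-potential trace and a boolean spike train.
    """
    v = v_rest
    trace, spikes = [], []
    for i_t in input_current:
        v = v + (dt / tau) * (v_rest - v) + i_t  # leak toward rest, add input
        fired = v >= v_threshold
        if fired:
            v = v_reset                          # hard reset after a spike
        trace.append(v)
        spikes.append(fired)
    return np.array(trace), np.array(spikes, dtype=bool)

# A constant drive of 0.2 per step crosses threshold every 6th step.
_, spikes = simulate_lif(np.full(60, 0.2))
# No input, no spikes.
_, quiet_spikes = simulate_lif(np.zeros(60))
```

Each step needs only a multiply-add, a comparison, and a conditional reset. That simplicity is why LIF-style neurons map well onto FPGA logic.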
# Incorporating Deep Q-Network with Multiclass Classification Algorithms ( http://arxiv.org/abs/2307.03908v1 ) License: see link	Noopur Zambare, Ravindranath Sawane	In this study, we explore how Deep Q-Network (DQN) might improve the functionality of multiclass classification algorithms. We will use a benchmark dataset from Kaggle to create a framework incorporating DQN with existing supervised multiclass classification algorithms. The findings of this study will bring insight into how deep reinforcement learning strategies may be used to increase multiclass classification accuracy. They have been used in a number of fields, including image recognition, natural language processing, and bioinformatics. This study focuses on predicting financial distress in companies, in addition to the wider application of Deep Q-Network in multiclass classification. Identifying businesses that are likely to experience financial distress is a crucial task in finance and risk management. A business is said to be in financial distress when it faces serious challenges in sustaining its operations and meeting its financial obligations. This commonly happens when a company experiences a sharp and sustained decline in profitability, cash-flow problems, or an unsustainable level of debt.	Translated: 2023-07-11 16:37:48 Published: 2023-07-08
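The abstract does not say how DQN is combined with the classifiers. One common framing, which is our assumption and not necessarily the paper's, treats each labeled sample as a one-step episode. The state is the feature vector, the action is the predicted class, and the reward is +1 if the prediction is correct and -1 otherwise. Because every episode ends immediately, the Q-learning target is just the reward. To keep the sketch short, Q is linear rather than a deep network, and there is no replay buffer or target network. This is therefore plain Q-learning on a toy three-class problem, not a full DQN.

```python
import numpy as np

def train_q_classifier(X, y, n_classes, steps=5000, lr=0.01, epsilon=0.5, seed=0):
    """Learn Q(s, a) = W[a] @ [s, 1] from one-step classification episodes."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias feature
    W = np.zeros((n_classes, Xb.shape[1]))
    for _ in range(steps):
        i = rng.integers(len(Xb))
        s = Xb[i]
        if rng.random() < epsilon:             # explore: random class
            a = int(rng.integers(n_classes))
        else:                                  # exploit: greedy class
            a = int(np.argmax(W @ s))
        reward = 1.0 if a == y[i] else -1.0
        # Terminal step, so the TD target is the reward itself.
        W[a] += lr * (reward - W[a] @ s) * s
    return W

def predict(W, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.argmax(Xb @ W.T, axis=1)

# Three well-separated Gaussian blobs as a stand-in dataset (assumed data).
rng = np.random.default_rng(1)
centers = np.array([[5.0, 0.0], [0.0, 5.0], [-5.0, -5.0]])
X = np.vstack([c + rng.normal(size=(200, 2)) for c in centers])
y = np.repeat(np.arange(3), 200)
W = train_q_classifier(X, y, n_classes=3)
accuracy = np.mean(predict(W, X) == y)
```

With one-step episodes, this reduces to a regression of per-class scores toward ±1. A richer reward design (for example, penalizing missed distress cases more heavily) is where an RL formulation could differ from ordinary supervised training.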
# ScriptWorld: Text Based Environment For Learning Procedural Knowledge ( http://arxiv.org/abs/2307.03906v1 ) License: see link	Abhinav Joshi and Areeb Ahmad and Umang Pandey and Ashutosh Modi	Text-based games provide a framework for developing natural language understanding and commonsense knowledge about the world in reinforcement learning (RL) based agents. Existing text-based environments often rely on fictional situations and characters to create a gaming framework and are far from real-world scenarios. In this paper, we introduce ScriptWorld, a text-based environment for teaching agents about real-world daily chores and thereby imparting commonsense knowledge. To the best of our knowledge, it is the first interactive text-based gaming framework that consists of daily real-world human activities, designed using a scripts dataset. We provide gaming environments for 10 daily activities and perform a detailed analysis of the proposed environment. We develop RL-based baseline models/agents to play the games in ScriptWorld. To understand the role of language models in such environments, we use features obtained from pre-trained language models in the RL agents. Our experiments show that prior knowledge obtained from a pre-trained language model helps to solve real-world text-based gaming environments. We release the environment via GitHub: https://github.com/Exploration-Lab/ScriptWorld	Translated: 2023-07-11 16:37:08 Published: 2023-07-08
# Enhanced Strong Coupling between Spin Ensemble and non-Hermitian Topological Edge States ( http://arxiv.org/abs/2307.03944v1 ) License: see link	Jie Qian, Jie Li, Shi-Yao Zhu, J. Q. You, and Yi-Pu Wang	Light-matter interaction is crucial both to understanding fundamental phenomena and to developing versatile applications. Strong coupling, robustness, and controllability are the three most important aspects of realizing light-matter interactions. Topological and non-Hermitian photonics have provided frameworks for robustness and extensive control freedom, respectively. However, how to use non-Hermitian engineering to tailor edge-state properties, such as the photonic density of states and scattering parameters, while preserving topological protection has not been fully studied. Here we construct a parity-time (PT) symmetric dimerized photonic lattice and generate complex-valued edge states via spontaneous PT-symmetry breaking. We demonstrate enhanced strong coupling between the topological photonic edge mode and the magnon mode in a ferromagnetic spin ensemble. Our research reveals subtle non-Hermitian topological edge states and provides strategies for realizing and engineering topological light-matter interactions.	Translated: 2023-07-11 16:28:56 Published: 2023-07-08
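"Spontaneous PT-symmetry breaking" can be seen in the smallest possible PT-symmetric system. This is a textbook two-site toy model, not the dimerized lattice studied in the paper: one site has gain gamma, the other has equal loss, and the two are coupled with strength kappa. The eigenvalues are plus or minus sqrt(kappa**2 - gamma**2). They are real for gamma < kappa and become a purely imaginary pair once gamma exceeds kappa, where gamma = kappa is the exceptional point. The function name is ours.

```python
import numpy as np

def pt_dimer_eigenvalues(kappa, gamma):
    """Eigenvalues of H = [[i*gamma, kappa], [kappa, -i*gamma]]:
    a gain site and an equal-loss site coupled with strength kappa."""
    H = np.array([[1j * gamma, kappa], [kappa, -1j * gamma]], dtype=complex)
    return np.linalg.eigvals(H)

# PT-unbroken phase (gamma < kappa): real spectrum, +/- sqrt(0.75).
unbroken = pt_dimer_eigenvalues(kappa=1.0, gamma=0.5)
# PT-broken phase (gamma > kappa): eigenvalues +/- i*sqrt(3).
broken = pt_dimer_eigenvalues(kappa=1.0, gamma=2.0)
```

In the broken phase, one eigenmode is amplified and its partner decays. This is the mechanism by which the abstract's complex-valued edge states arise, here reduced to two sites.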
# Camouflaged Object Detection with Feature Grafting and Distractor Aware ( http://arxiv.org/abs/2307.03943v1 ) License: see link	Yuxuan Song and Xinyue Li and Lin Qi	Camouflaged Object Detection (COD) aims to accurately segment camouflaged objects that blend into their environment. This is more challenging than ordinary detection because the texture of the target is visually indistinguishable from that of the background. In this paper, we propose a novel Feature Grafting and Distractor Aware network (FDNet) to handle the COD task. Specifically, we use a CNN and a Transformer to encode multi-scale images in parallel. To better exploit the advantages of the two encoders, we design a cross-attention-based Feature Grafting Module that grafts features extracted from the Transformer branch into the CNN branch, after which the features are aggregated in the Feature Fusion Module. A Distractor Aware Module is designed to explicitly model the two possible distractors in the COD task in order to refine the coarse camouflage map. We also propose the largest artificial camouflaged object dataset, named ACOD2K, which contains 2,000 annotated images. We conducted extensive experiments on four widely used benchmark datasets and the ACOD2K dataset. The results show that our method significantly outperforms other state-of-the-art methods. The code and ACOD2K will be available at https://github.com/syxvision/FDNet.	Translated: 2023-07-11 16:28:41 Published: 2023-07-08
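The Feature Grafting Module is described only as cross-attention-based. The sketch below assumes the simplest form: single-head scaled dot-product cross-attention in which flattened CNN feature positions act as queries and Transformer tokens supply keys and values. The attended result is added back to the CNN features. The paper's module may differ in head count, normalization, and which branch supplies the queries. All shapes and weight matrices are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_graft(cnn_tokens, transformer_tokens, Wq, Wk, Wv):
    """Single-head cross-attention: each CNN position (query) gathers
    context from the Transformer tokens (keys/values); the result is
    added residually onto the CNN features."""
    Q = cnn_tokens @ Wq
    K = transformer_tokens @ Wk
    V = transformer_tokens @ Wv              # projected to the CNN channel width
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # scaled dot-product scores
    attn = softmax(scores, axis=-1)          # each row sums to 1
    return cnn_tokens + attn @ V, attn

rng = np.random.default_rng(0)
cnn = rng.normal(size=(64, 32))     # 8x8 CNN feature map, 32 channels, flattened
trans = rng.normal(size=(49, 48))   # 7x7 Transformer tokens, 48 channels
Wq = rng.normal(size=(32, 16)) / np.sqrt(32)
Wk = rng.normal(size=(48, 16)) / np.sqrt(48)
Wv = rng.normal(size=(48, 32)) / np.sqrt(48)
grafted, attn = cross_attention_graft(cnn, trans, Wq, Wk, Wv)
```

Because queries come from the CNN side, the output keeps the CNN branch's spatial resolution. This is what "grafting" Transformer features into the CNN branch suggests.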
# Ariadne's Thread: Using Text Prompts to Improve Segmentation of Infected Areas from Chest X-ray images ( http://arxiv.org/abs/2307.03942v1 ) License: see link	Yi Zhong, Mengqiu Xu, Kongming Liang, Kaixin Chen, and Ming Wu	Segmentation of the infected areas of the lung is essential for quantifying the severity of lung diseases such as pulmonary infections. Existing medical image segmentation methods are almost all uni-modal, image-based methods. However, these image-only methods tend to produce inaccurate results unless trained with large amounts of annotated data. To overcome this challenge, we propose a language-driven segmentation method that uses text prompts to improve segmentation results. Experiments on the QaTa-COV19 dataset indicate that our method improves the Dice score by at least 6.09% compared with the uni-modal methods. Moreover, our extended study reveals the flexibility of multi-modal methods with respect to the information granularity of the text, and demonstrates that multi-modal methods have a significant advantage over image-only methods in terms of the amount of training data required.	Translated: 2023-07-11 16:28:19 Published: 2023-07-08
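The reported gain is measured with the Dice score, 2|P ∩ T| / (|P| + |T|) for a predicted mask P and a ground-truth mask T. For reference, a minimal NumPy version for binary masks is sketched below. The small `eps` term, which makes two empty masks score 1, is our own choice and is not taken from the paper.

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# One overlapping pixel; two predicted, one true: 2*1 / (2 + 1) = 2/3.
partial = dice_score([[1, 1, 0, 0]], [[1, 0, 0, 0]])
mask = np.array([[0, 1, 1], [0, 1, 0]])
perfect = dice_score(mask, mask)
```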
# Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions ( http://arxiv.org/abs/2307.03941v1 ) License: see link	Dawen Zhang, Pamela Finckenberg-Broman, Thong Hoang, Shidong Pan, Zhenchang Xing, Mark Staples, Xiwei Xu	The Right to be Forgotten (RTBF) was first established as a result of the ruling in Google Spain SL, Google Inc. v AEPD, Mario Costeja González. It was later included as the Right to Erasure under the European Union's General Data Protection Regulation (GDPR), giving individuals the right to request that organizations delete their personal data. Specifically for search engines, individuals can send requests to organizations to exclude their information from query results. With the recent development of Large Language Models (LLMs) and their use in chatbots, LLM-enabled software systems have become popular. However, they are not exempt from the RTBF. Compared with the indexing approach used by search engines, LLMs store and process information in a completely different way, which poses new challenges for compliance with the RTBF. In this paper, we explore these challenges and provide our insights on how to implement technical solutions for the RTBF, including machine unlearning, model editing, and prompt engineering.	Translated: 2023-07-11 16:28:04 Published: 2023-07-08
# Inductive Meta-path Learning for Schema-complex Heterogeneous Information Networks ( http://arxiv.org/abs/2307.03937v1 ) License: see link	Shixuan Liu, Changjun Fan, Kewei Cheng, Yunfei Wang, Peng Cui, Yizhou Sun, Zhong Liu	Heterogeneous Information Networks (HINs) are information networks with multiple types of nodes and edges. The concept of a meta-path, i.e., a sequence of entity types and relation types connecting two entities, was proposed to provide meta-level, explainable semantics for various HIN tasks. Traditionally, meta-paths have been used primarily for schema-simple HINs, e.g., bibliographic networks with only a few entity types, where meta-paths are often enumerated using domain knowledge. However, the adoption of meta-paths for schema-complex HINs, such as knowledge bases (KBs) with hundreds of entity and relation types, has been limited by the computational complexity of meta-path enumeration. Additionally, effectively assessing meta-paths requires enumerating the relevant path instances, which adds further complexity to the meta-path learning process. To address these challenges, we propose SchemaWalk, an inductive meta-path learning framework for schema-complex HINs. We represent meta-paths with schema-level representations to support learning meta-path scores for varying relations, which mitigates the need for exhaustive path-instance enumeration for each relation. Further, we design a reinforcement-learning-based path-finding agent that directly navigates the network schema (i.e., the schema graph) to learn policies for establishing meta-paths with high coverage and confidence for multiple relations. Extensive experiments on real data sets demonstrate the effectiveness of the proposed paradigm.	Translated: 2023-07-11 16:27:44 Published: 2023-07-08
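To make "meta-path enumeration" concrete, the sketch below does the exhaustive thing that SchemaWalk is designed to avoid. It enumerates, by breadth-first search over a schema graph of (head type, relation, tail type) triples, every relation sequence that leads from one entity type to another within a length limit. It is not SchemaWalk. The tiny bibliographic schema is a made-up example.

```python
from collections import deque

def enumerate_meta_paths(schema, source_type, target_type, max_len):
    """List all relation sequences (meta-paths) from source_type to
    target_type in a schema graph, up to max_len relations (BFS order)."""
    out_edges = {}
    for head, rel, tail in schema:
        out_edges.setdefault(head, []).append((rel, tail))
    paths = []
    queue = deque([(source_type, ())])
    while queue:
        node, rels = queue.popleft()
        if rels and node == target_type:
            paths.append(rels)
        if len(rels) == max_len:
            continue                      # length limit reached
        for rel, tail in out_edges.get(node, []):
            queue.append((tail, rels + (rel,)))
    return paths

# Hypothetical bibliographic schema.
schema = [
    ("Author", "writes", "Paper"),
    ("Paper", "written_by", "Author"),
    ("Paper", "cites", "Paper"),
    ("Paper", "published_in", "Venue"),
    ("Venue", "publishes", "Paper"),
]
paths = enumerate_meta_paths(schema, "Author", "Author", max_len=3)
```

The number of candidate sequences grows roughly with the average out-degree raised to `max_len`. With hundreds of relation types, this blow-up makes exhaustive enumeration impractical for schema-complex HINs.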
# Towards Efficient In-memory Computing Hardware for Quantized Neural Networks: State-of-the-art, Open Challenges and Perspectives ( http://arxiv.org/abs/2307.03936v1 ) License: see link	Olga Krestinskaya, Li Zhang, Khaled Nabil Salama	The amount of data processed in the cloud, the development of Internet-of-Things (IoT) applications, and growing data-privacy concerns are forcing a transition from cloud-based to edge-based processing. Limited energy and computational resources at the edge push the transition from traditional von Neumann architectures to In-memory Computing (IMC), especially for machine learning and neural network applications. Network compression techniques are applied to implement neural networks on limited hardware resources. Quantization is one of the most efficient network compression techniques and reduces memory footprint, latency, and energy consumption. This paper provides a comprehensive review of IMC-based Quantized Neural Networks (QNNs) and links software-based quantization approaches to IMC hardware implementation. Moreover, open challenges, QNN design requirements, recommendations, and perspectives are provided, along with an IMC-based QNN hardware roadmap.	Translated: 2023-07-11 16:27:20 Published: 2023-07-08
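As a concrete reference point for "quantization", the sketch below shows per-tensor symmetric uniform quantization of weights to signed n-bit integers. This is one basic scheme among many covered in the QNN literature, and nothing here is specific to the paper or to any IMC device. Each weight is stored as a small integer plus one shared floating-point scale, and the rounding error per weight is at most half a quantization step.

```python
import numpy as np

def quantize_symmetric(w, n_bits=4):
    """Per-tensor symmetric uniform quantization to signed n_bits integers.
    Returns the integer codes and the shared scale."""
    q_max = 2 ** (n_bits - 1) - 1          # e.g. 7 for 4 bits
    scale = np.max(np.abs(w)) / q_max
    if scale == 0:
        return np.zeros_like(w, dtype=np.int32), 1.0
    q = np.clip(np.round(w / scale), -q_max, q_max).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float64) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000)
q, scale = quantize_symmetric(w, n_bits=4)
w_hat = dequantize(q, scale)
```

Storing 4-bit codes instead of 32-bit floats cuts weight memory by about 8x. In IMC crossbars, low bit widths also map more directly onto the limited number of conductance levels a device can hold.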
# Edge-Aware Mirror Network for Camouflaged Object Detection ( http://arxiv.org/abs/2307.03932v1 ) License: see link	Dongyue Sun, Shiyao Jiang, Lin Qi	Existing edge-aware camouflaged object detection (COD) methods normally output the edge prediction at an early stage. However, edges are important and fundamental factors in the subsequent segmentation task. Because of the high visual similarity between camouflaged targets and their surroundings, an edge prior predicted at an early stage usually introduces erroneous foreground-background cues and contaminates the features used for segmentation. To tackle this problem, we propose a novel Edge-aware Mirror Network (EAMNet), which models edge detection and camouflaged object segmentation as a cross-refinement process. More specifically, EAMNet has a two-branch architecture in which a segmentation-induced edge aggregation module and an edge-induced integrity aggregation module cross-guide the segmentation branch and the edge detection branch. Finally, a guided-residual channel attention module, which combines residual connections with gated convolution, extracts structural details from low-level features more effectively. Quantitative and qualitative experimental results show that EAMNet outperforms existing cutting-edge baselines on three widely used COD datasets. Code is available at https://github.com/sdy1999/EAMNet.	Translated: 2023-07-11 16:27:03 Published: 2023-07-08
# Rosko: Row Skipping Outer Products for Sparse Matrix Multiplication Kernels ( http://arxiv.org/abs/2307.03930v1 ) License: see link	Vikas Natesh, Andrew Sabot, H.T. Kung, Mark Ting	We propose Rosko (row skipping outer products) for deriving sparse matrix multiplication (SpMM) kernels that reduce the computation and memory access requirements of deep neural networks (DNNs). Rosko allows entire row computations to be skipped during program execution with low sparsity-management overheads. We analytically derive sparse CPU kernels that adapt to given hardware characteristics to effectively utilize processor cores and minimize data movement, without the need for auto-tuning or search-space exploration. Rosko can be integrated with other outer-product scheduling methods, allowing them to use Rosko's packing format to skip unnecessary computation. Rosko kernels outperform existing auto-tuning and search-based solutions, as well as state-of-the-art vendor-optimized libraries, on real hardware across a variety of neural network workloads. For matrices with sparsities ranging from 65% to 99.8%, as typically found in machine learning, Rosko kernels achieve up to a 6.5x runtime reduction on Intel and ARM CPUs.	Translated: 2023-07-11 16:26:42 Published: 2023-07-08
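The sketch below shows the outer-product formulation of sparse-times-dense multiplication that the abstract builds on. C = A @ B is accumulated as a sum over k of outer(A[:, k], B[k, :]). Only rows i with a nonzero A[i, k] are updated, and all-zero columns of A are skipped entirely. It is not the Rosko kernel. There is no packing format, cache blocking, or hardware-aware tiling, and a real kernel would read nonzeros from a compressed format instead of scanning a dense column. The function name is hypothetical.

```python
import numpy as np

def outer_product_spmm(A, B):
    """Compute C = A @ B as a sum of outer products A[:, k] x B[k, :],
    touching only the rows of C that receive a nonzero contribution."""
    m, n_inner = A.shape
    C = np.zeros((m, B.shape[1]), dtype=np.result_type(A, B))
    for k in range(n_inner):
        rows = np.flatnonzero(A[:, k])   # nonzero rows in column k of A
        if rows.size == 0:
            continue                     # whole outer product skipped
        C[rows] += np.outer(A[rows, k], B[k])
    return C

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 40))
A[rng.random(A.shape) < 0.9] = 0.0       # roughly 90% sparse
A[:, 3] = 0.0                            # force one empty column
B = rng.normal(size=(40, 30))
C = outer_product_spmm(A, B)
```

The work done is proportional to nnz(A) times the number of columns of B rather than to the full dense product. This is why high sparsity, such as the 65% to 99.8% range in the abstract, translates into runtime savings.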
# Fairness-Aware Graph Neural Networks: A Survey ( http://arxiv.org/abs/2307.03929v1 ) License: see link	April Chen, Ryan A. Rossi, Namyong Park, Puja Trivedi, Yu Wang, Tong Yu, Sungchul Kim, Franck Dernoncourt, Nesreen K. Ahmed	Graph Neural Networks (GNNs) have become increasingly important due to their representational power and state-of-the-art predictive performance on many fundamental learning tasks. Despite this success, GNNs suffer from fairness issues that arise from the underlying graph data and from the fundamental aggregation mechanism at the heart of the large class of GNN models. In this article, we examine and categorize techniques for improving the fairness of GNNs. Previous work on fair GNN models and techniques is discussed according to whether it improves fairness in a preprocessing step, during training, or in a post-processing phase. Furthermore, we discuss how such techniques can be combined when appropriate, and highlight their advantages and the intuition behind them. We also introduce an intuitive taxonomy of fairness evaluation metrics, including graph-level, neighborhood-level, embedding-level, and prediction-level fairness metrics. In addition, we succinctly summarize graph datasets that are useful for benchmarking the fairness of GNN models. Finally, we highlight key open problems and challenges that remain to be addressed.	Translated: 2023-07-11 16:26:25 Published: 2023-07-08
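Statistical parity and equal opportunity are two widely used prediction-level fairness metrics in the fair-GNN literature. The abstract does not list the survey's exact definitions, so the sketch below uses the standard ones for binary node predictions and a binary sensitive attribute. The inputs are a made-up example with two groups of four nodes.

```python
import numpy as np

def statistical_parity_difference(y_pred, sensitive):
    """|P(y_hat = 1 | s = 0) - P(y_hat = 1 | s = 1)|."""
    y_pred, sensitive = np.asarray(y_pred), np.asarray(sensitive)
    return abs(y_pred[sensitive == 0].mean() - y_pred[sensitive == 1].mean())

def equal_opportunity_difference(y_pred, y_true, sensitive):
    """|TPR(s = 0) - TPR(s = 1)|: gap in true-positive rates between groups."""
    y_pred, y_true, sensitive = map(np.asarray, (y_pred, y_true, sensitive))
    pos = y_true == 1
    tpr0 = y_pred[pos & (sensitive == 0)].mean()
    tpr1 = y_pred[pos & (sensitive == 1)].mean()
    return abs(tpr0 - tpr1)

y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
y_true = [1, 1, 1, 0, 1, 1, 0, 0]
sens = [0, 0, 0, 0, 1, 1, 1, 1]
spd = statistical_parity_difference(y_pred, sens)          # 0.5 vs 0.25
eod = equal_opportunity_difference(y_pred, y_true, sens)   # 2/3 vs 1/2
```

Both metrics look only at model outputs per node. The graph-level and neighborhood-level categories in the survey's taxonomy additionally take graph structure into account.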
# 差分プライバシーの仮説検定解釈によるデータ再構成攻撃の上界 Bounding data reconstruction attacks with the hypothesis testing interpretation of differential privacy ( http://arxiv.org/abs/2307.03928v1 ) ライセンス: Link先を確認	Georgios Kaissis, Jamie Hayes, Alexander Ziller, Daniel Rueckert	(参考訳) 機械学習モデルに対するデータ再構成攻撃の成功の上限として最近提案されたReconstruction Robustness(ReRo)について検討する。これまでの研究では、差分プライバシー(DP)機構がReRoも提供することが示されたが、これまでのところ、厳密なReRo境界についてはモンテカルロ法による漸近的な推定しか示されていない。したがって、一般のDP機構に対する直接計算可能なReRo境界が望ましい。本研究では、仮説検定DPとReRoの関連を確立し、ラプラス機構とガウス機構、およびそれらのサブサンプリングされた変種に対する閉形式、解析的、または数値的なReRo境界を導出する。 We explore Reconstruction Robustness (ReRo), which was recently proposed as an upper bound on the success of data reconstruction attacks against machine learning models. Previous research has demonstrated that differential privacy (DP) mechanisms also provide ReRo, but so far, only asymptotic Monte Carlo estimates of a tight ReRo bound have been shown. Directly computable ReRo bounds for general DP mechanisms are thus desirable. In this work, we establish a connection between hypothesis testing DP and ReRo and derive closed-form, analytic or numerical ReRo bounds for the Laplace and Gaussian mechanisms and their subsampled variants.	翻訳日:2023-07-11 16:26:06 公開日:2023-07-08
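In the hypothesis-testing view of DP, a mechanism is described by a trade-off function f(α): the smallest type II error achievable at type I error α. For the Gaussian mechanism with L2 sensitivity Δ and noise scale σ, this is the μ-GDP curve G_μ(α) = Φ(Φ⁻¹(1 − α) − μ) with μ = Δ/σ, following Dong, Roth and Su's Gaussian DP. The sketch evaluates that curve with the standard library. As an assumption on our part, not a restatement of the paper's theorem, it then turns the curve into a reconstruction bound of the form η ≤ 1 − f(κ), where κ is the attacker's baseline success probability. It covers only the non-subsampled Gaussian case, and all function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def gaussian_mu(sensitivity: float, sigma: float) -> float:
    """GDP parameter of the Gaussian mechanism: mu = L2 sensitivity / noise std."""
    return sensitivity / sigma


def gaussian_tradeoff(alpha: float, mu: float) -> float:
    """Trade-off function of mu-GDP: G_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(1.0 - alpha) - mu)


def rero_bound(kappa: float, mu: float) -> float:
    """Assumed bound on reconstruction success: eta <= 1 - f(kappa) with f = G_mu."""
    return 1.0 - gaussian_tradeoff(kappa, mu)


if __name__ == "__main__":
    mu = gaussian_mu(sensitivity=1.0, sigma=2.0)
    print(rero_bound(kappa=0.1, mu=mu))
```

With μ = 0 (no privacy loss), the bound reduces to the baseline κ. As μ grows, it approaches 1.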
# 今日の一針、明日の十針:低信頼生成の検証によるLLMの幻覚の検出と緩和 A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation ( http://arxiv.org/abs/2307.03987v1 ) ライセンス: Link先を確認	Neeraj Varshney, Wenlin Yao, Hongming Zhang, Jianshu Chen, and Dong Yu	(参考訳) 近年開発された大規模言語モデルは、流暢で一貫性のあるテキストの生成において顕著な成功を収めている。しかしながら、これらのモデルはしばしば「幻覚」を起こす傾向があり、それが信頼性を著しく損なう。本研究では、この重要な問題に取り組み、生成過程において幻覚を能動的に検出し緩和する手法を提案する。具体的には、まずモデルのロジット出力値を利用して潜在的な幻覚の候補を特定し、検証手順によりそれらの正しさを確認し、検出された幻覚を緩和した上で、生成過程を継続する。「記事生成タスク」を用いた広範な実験により、まず検出手法と緩和手法それぞれの有効性を実証する。具体的には、検出手法は88%の再現率を達成し、緩和手法は正しく検出された幻覚の57.6%を緩和した。重要なことに、緩和手法は、誤検出された幻覚、すなわち偽陽性の場合においても、新たな幻覚を導入しない。さらに、提案する能動的な検出・緩和手法により、GPT-3モデルの幻覚が平均47.5%から14.5%に低減されることを示す。まとめると、本研究は大規模言語モデルの信頼性と信用性の向上に寄与するものであり、これは実世界のアプリケーションでの広範な採用に向けた重要な一歩である。 Recently developed large language models have achieved remarkable success in generating fluent and coherent text. However, these models often tend to 'hallucinate', which critically hampers their reliability. In this work, we address this crucial problem and propose an approach that actively detects and mitigates hallucinations during the generation process. Specifically, we first identify candidates of potential hallucination by leveraging the model's logit output values, check their correctness through a validation procedure, mitigate the detected hallucinations, and then continue with the generation process. Through extensive experiments with the 'article generation task', we first demonstrate the individual efficacy of our detection and mitigation techniques. Specifically, the detection technique achieves a recall of 88% and the mitigation technique successfully mitigates 57.6% of the correctly detected hallucinations. Importantly, our mitigation technique does not introduce new hallucinations even in the case of incorrectly detected hallucinations, i.e., false positives. 
Then, we show that the proposed active detection and mitigation approach successfully reduces the hallucinations of the GPT-3 model from 47.5% to 14.5% on average. In summary, our work contributes to improving the reliability and trustworthiness of large language models, a crucial step en route to enabling their widespread adoption in real-world applications.	翻訳日:2023-07-11 16:20:34 公開日:2023-07-08
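Detection starts from the model's own token probabilities. The following is a minimal sketch under stated assumptions: each candidate concept is a span of generated tokens, and a span is flagged when its minimum token probability falls below a threshold. The scoring rule, the threshold value, and the function names are illustrative assumptions. The validation and mitigation steps are not shown.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one decoding step's logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_confidences(step_logits: Sequence[Sequence[float]],
                      token_ids: Sequence[int]) -> List[float]:
    """Probability the model assigned to each token it actually generated."""
    return [softmax(logits)[tok] for logits, tok in zip(step_logits, token_ids)]


def flag_low_confidence_spans(confidences: Sequence[float],
                              spans: Sequence[Tuple[int, int]],
                              threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return the [start, end) spans whose minimum token probability is below threshold."""
    return [(s, e) for s, e in spans if min(confidences[s:e]) < threshold]


if __name__ == "__main__":
    conf = token_confidences([[5.0, 0.0], [0.0, 0.0], [0.0, 5.0]], [0, 0, 1])
    print(flag_low_confidence_spans(conf, [(0, 1), (1, 3)], threshold=0.6))
```

Using the minimum rather than the mean means that a single uncertain token, such as a name or a date, is enough to send the whole concept to validation.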
# TractGeoNet: 言語評価成績を予測するための神経路微細構造の点ごと解析のための幾何学的深層学習フレームワーク TractGeoNet: A geometric deep learning framework for pointwise analysis of tract microstructure to predict language assessment performance ( http://arxiv.org/abs/2307.03982v1 ) ライセンス: Link先を確認	Yuqian Chen, Leo R. Zekelman, Chaoyi Zhang, Tengfei Xue, Yang Song, Nikos Makris, Yogesh Rathi, Alexandra J. Golby, Weidong Cai, Fan Zhang, Lauren J. O'Donnell	(参考訳) 拡散磁気共鳴画像(dMRI)トラクトグラフィーと関連する点ごとの組織微細構造計測を用いて回帰を行うための、幾何学的深層学習ベースのフレームワークであるTractGeoNetを提案する。点群表現を使用することで、TractGeoNetは繊維路内のすべての点から点ごとの組織微細構造と位置情報を直接利用できる。回帰性能を向上させるため、回帰ラベルスコアの絶対値だけでなく、その相対的な差を正確に予測することにモデルを集中させる新しい損失関数、Paired-Siamese回帰損失を提案する。さらに、回帰タスクに対して白質繊維路内の予測性の高い解剖学的領域を同定するための臨界領域局在化アルゴリズムを提案する。ヒューマン・コネクトーム・プロジェクトの806名の被験者から得られた20本の連合白質線維路のデータセットを用いて、言語に関する2つの神経心理学的評価における個人の成績を予測することにより、提案手法の有効性を評価する。その結果、いくつかの一般的な回帰モデルと比較して、TractGeoNetの予測性能が優れていることが示された。研究した20本の線維路のうち、左弓状束が2つの言語成績評価を最もよく予測することがわかった。局在化された臨界領域は両半球およびすべての大脳葉に広く分布しており、上側頭領域と前側頭領域、弁蓋部、中心前回など、言語機能に重要とされる脳領域を含む。全体として、TractGeoNetは、脳の白質繊維路の研究を強化し、その構造を言語成績などの人間の特性と関連付けるための幾何学的深層学習の可能性を示している。 We propose a geometric deep-learning-based framework, TractGeoNet, for performing regression using diffusion magnetic resonance imaging (dMRI) tractography and associated pointwise tissue microstructure measurements. By employing a point cloud representation, TractGeoNet can directly utilize pointwise tissue microstructure and positional information from all points within a fiber tract. To improve regression performance, we propose a novel loss function, the Paired-Siamese Regression loss, which encourages the model to focus on accurately predicting the relative differences between regression label scores rather than just their absolute values. In addition, we propose a Critical Region Localization algorithm to identify highly predictive anatomical regions within the white matter fiber tracts for the regression task. 
We evaluate the effectiveness of the proposed method by predicting individual performance on two neuropsychological assessments of language using a dataset of 20 association white matter fiber tracts from 806 subjects from the Human Connectome Project. The results demonstrate superior prediction performance of TractGeoNet compared to several popular regression models. Of the twenty tracts studied, we find that the left arcuate fasciculus tract is the most highly predictive of the two studied language performance assessments. The localized critical regions are widespread and distributed across both hemispheres and all cerebral lobes, including areas of the brain considered important for language function such as superior and anterior temporal regions, pars opercularis, and precentral gyrus. Overall, TractGeoNet demonstrates the potential of geometric deep learning to enhance the study of the brain's white matter fiber tracts and to relate their structure to human traits such as language performance.	翻訳日:2023-07-11 16:20:16 公開日:2023-07-08
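The Paired-Siamese Regression loss rewards getting the differences between subjects' scores right, not only each individual score. The sketch below is a hypothetical formulation, not the paper's exact loss. It combines a per-sample mean squared error with a weighted mean squared error over all pairwise differences. The name `paired_siamese_loss` and the `weight` parameter are illustrative.

```python
from itertools import combinations
from typing import Sequence


def paired_siamese_loss(pred: Sequence[float], target: Sequence[float],
                        weight: float = 1.0) -> float:
    """Hypothetical loss: per-sample MSE + weight * MSE on pairwise differences.

    The pairwise term compares (pred_i - pred_j) with (target_i - target_j),
    so it is unaffected by a constant offset shared by all predictions.
    """
    n = len(pred)
    if n != len(target) or n < 2:
        raise ValueError("need at least two predictions and targets of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    pairs = list(combinations(range(n), 2))
    diff_mse = sum(((pred[i] - pred[j]) - (target[i] - target[j])) ** 2
                   for i, j in pairs) / len(pairs)
    return mse + weight * diff_mse


if __name__ == "__main__":
    # Every prediction is off by +1: only the absolute (MSE) term is non-zero.
    print(paired_siamese_loss([2.0, 3.0, 4.0], [1.0, 2.0, 3.0]))
```

In a Siamese setup the two members of each pair would pass through the same network. The sketch only shows the loss on already-computed predictions.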
# EffUNetと転移学習アプローチを用いた建物と道路のセグメンテーション Building and Road Segmentation Using EffUNet and Transfer Learning Approach ( http://arxiv.org/abs/2307.03980v1 ) ライセンス: Link先を確認	Sahil Gangurde	(参考訳) 都市計画には、水道、鉄道、送電線、建物、道路などの都市内のオブジェクトに関する情報が必要である。特に、政策立案者が影響力のある決定を下すためには、これらのオブジェクトの広がり、場所、容量に関する情報が必要である。この論文は、衛星とUAVが撮影した空中画像から建物と道路をセグメンテーションすることを目的としている。セマンティックセグメンテーションタスクのために多くの異なるアーキテクチャが提案されており、UNetはその1つである。本論文では、Googleが新たに提案したEfficientNetV2を特徴抽出のためのエンコーダとし、UNetデコーダでセグメンテーションマップを構築する新しいアーキテクチャを提案する。このアプローチにより、マサチューセッツ建物データセットと道路データセットで、それぞれmIOU 0.8365と0.9153のベンチマークスコアを達成した。 In cities, information about urban objects such as water supply, railway lines, power lines, buildings, roads, etc., is necessary for city planning. In particular, policymakers need information about the spread, location, and capacity of these objects to make impactful decisions. This thesis aims to segment buildings and roads from aerial images captured by satellites and UAVs. Many different architectures have been proposed for the semantic segmentation task, UNet being one of them. In this thesis, we propose a novel architecture based on Google's newly proposed EfficientNetV2 as an encoder for feature extraction, with a UNet decoder for constructing the segmentation map. Using this approach, we achieved benchmark mIOU scores of 0.8365 and 0.9153 on the Massachusetts Building and Road datasets, respectively.	翻訳日:2023-07-11 16:19:44 公開日:2023-07-08
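The reported scores are mean intersection-over-union (mIoU). Below is a minimal sketch of the metric on flattened label maps. It assumes per-class IoU averaged over the classes that occur, for example background and building. The thesis may average differently, per image rather than over the whole dataset for instance, and the function name is hypothetical.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int = 2) -> float:
    """Mean IoU over classes for flattened label maps.

    Classes absent from both prediction and ground truth are skipped
    rather than counted as IoU = 1.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Four pixels: 0 = background, 1 = building.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1]))
```

Here background has IoU 1/2 and building has IoU 2/3, so the mean is 7/12.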
# Autonomy 2.0: 規模の経済の探求 Autonomy 2.0: The Quest for Economies of Scale ( http://arxiv.org/abs/2307.03973v1 ) ライセンス: Link先を確認	Shuang Wu, Bo Yu, Shaoshan Liu, Yuhao Zhu	(参考訳) 過去10年間のロボティクスとAI技術の進歩により、私たちは今や自律機械の時代に入った。この情報技術の新たな時代には、人間ではなく、サービスロボット、自律ドローン、配達ロボット、自動運転車といった自律機械がサービスを提供する。本稿では、デジタル経済の技術的課題と経済的影響を検討することで、スケーラビリティは技術的観点から極めて必要であり、経済的観点からも非常に有利であるため、自律性産業がその潜在能力を最大限に発揮するための鍵であると論じる。それにもかかわらず、Autonomy 1.0と呼ばれる現在の開発パラダイムは、データや計算リソースの量ではなくエンジニアの数に応じてスケールするため、自律性産業が規模の経済、特に指数関数的に安価になる計算コストと利用可能なデータの爆発的増加から十分な恩恵を受けることを妨げている。さらに、スケーラビリティを阻む主要な要因を分析し、Autonomy 2.0と呼ばれる新しい開発パラダイムがこれらの問題にどのように対処し、自律性産業を大きく押し上げることができるかを説明する。 With the advancement of robotics and AI technologies in the past decade, we have now entered the age of autonomous machines. In this new age of information technology, autonomous machines, such as service robots, autonomous drones, delivery robots, and autonomous vehicles, rather than humans, will provide services. In this article, through examining the technical challenges and economic impact of the digital economy, we argue that scalability is both highly necessary from a technical perspective and significantly advantageous from an economic perspective, and is thus the key for the autonomy industry to achieve its full potential. Nonetheless, the current development paradigm, dubbed Autonomy 1.0, scales with the number of engineers instead of with the amount of data or compute resources, hence preventing the autonomy industry from fully benefiting from the economies of scale, especially the exponentially cheapening compute cost and the explosion of available data. We further analyze the key scalability blockers and explain how a new development paradigm, dubbed Autonomy 2.0, can address these problems to greatly boost the autonomy industry.	翻訳日:2023-07-11 16:19:31 公開日:2023-07-08
# 中国語文法誤り訂正課題における大規模言語モデルの能力評価 Evaluating the Capability of Large-scale Language Models on Chinese Grammatical Error Correction Task ( http://arxiv.org/abs/2307.03972v1 ) ライセンス: Link先を確認	Fanyi Qu and Yunfang Wu	(参考訳) 大規模言語モデル(LLM)は、様々な自然言語処理(NLP)タスクにおいて顕著な能力を示し、近年多くの注目を集めている。しかし、いくつかの研究では、英語の文法誤り訂正(GEC)タスクにおいて、LLMが最先端モデルを上回る有望な結果を得られないことが示されている。本報告では、中国語の文法誤り訂正タスクにおける大規模言語モデルの性能を調査し、今後の研究の指針を提供することを目的とする。モデル規模の異なる3つのLLMを用いて、4つの中国語GECデータセットで実験を行った。実験結果から、自動評価指標におけるLLMの性能は、過剰訂正の問題により従来のSOTAモデルに及ばないことが示された。また、異なるデータ分布で評価した場合、LLMの性能に顕著な変動が見られた。以上の結果から、中国語GECタスクへのLLMの適用にはさらなる調査が必要であることが示唆される。 Large-scale language models (LLMs) have shown remarkable capability in various Natural Language Processing (NLP) tasks and have recently attracted much attention. However, some studies indicate that large language models fail to achieve results beyond those of state-of-the-art models on English grammatical error correction (GEC) tasks. In this report, we aim to explore how large language models perform on Chinese grammatical error correction tasks and to provide guidance for future work. We conduct experiments with three LLMs of different model scales on four Chinese GEC datasets. Our experimental results indicate that the performance of LLMs on automatic evaluation metrics falls short of previous state-of-the-art (SOTA) models because of the problem of over-correction. Furthermore, we discover notable variations in the performance of LLMs when they are evaluated on different data distributions. Our findings demonstrate that further investigation is required for the application of LLMs to the Chinese GEC task.	翻訳日:2023-07-11 16:19:10 公開日:2023-07-08
# エンド・ツー・エンドの教師ありマルチラベルコントラスト学習 End-to-End Supervised Multilabel Contrastive Learning ( http://arxiv.org/abs/2307.03967v1 ) ライセンス: Link先を確認	Ahmad Sajedi and Samir Khaki and Konstantinos N. Plataniotis and Mahdi S. Hosseini	(参考訳) マルチラベル表現学習は、オブジェクトカテゴリ間のラベル依存性や、正例/負例サンプルの固有の不均衡といったデータ関連の問題に関わる難しい問題として認識されている。最近の進歩は、モデル中心とデータ中心の視点からこれらの課題に対処している。モデル中心では、ラベル相関は外部のモデル設計(例えばグラフCNN)によって得られ、訓練のための帰納バイアスを組み込む。しかし、エンドツーエンドの訓練フレームワークを設計できておらず、計算の複雑さが高くなる。逆にデータ中心では、ラベル依存性を無視しつつ、分類を改善するためにデータセットの現実的な性質が考慮される。本稿では、モデル中心設計とデータ中心設計の両方の欠点に対処するために、新たなエンドツーエンド訓練フレームワークであるKMCL(Kernel-based Multilabel Contrastive Learning)を提案する。KMCLはまず埋め込み特徴をガウスRKHSにおける指数カーネルの混合に変換する。続いて、(a)カーネル表現を再構成するための再構成損失、(b)固有の不均衡問題に対処する非対称分類損失、(c)ラベル相関を捉えるための対照損失からなる目的損失を符号化する。KMCLは、低い計算フットプリントを維持しながら、特徴エンコーダの不確実性をモデル化する。画像分類タスクにおいて大規模な実験を行い、SOTA手法に対するKMCLの一貫した改善を示す。PyTorchの実装は https://github.com/mahdihosseini/KMCL で提供されている。 Multilabel representation learning is recognized as a challenging problem that can be associated with either label dependencies between object categories or data-related issues such as the inherent imbalance of positive/negative samples. Recent advances address these challenges from model- and data-centric viewpoints. In the model-centric view, label correlation is obtained through an external model design (e.g., graph CNN) that incorporates an inductive bias for training. However, these methods fail to provide an end-to-end training framework, leading to high computational complexity. Conversely, in the data-centric view, the realistic nature of the dataset is considered to improve classification while label dependencies are ignored. In this paper, we propose a new end-to-end training framework -- dubbed KMCL (Kernel-based Multilabel Contrastive Learning) -- to address the shortcomings of both model- and data-centric designs. KMCL first transforms the embedded features into a mixture of exponential kernels in Gaussian RKHS. 
This is followed by an objective loss comprising (a) a reconstruction loss to reconstruct the kernel representation, (b) an asymmetric classification loss to address the inherent imbalance problem, and (c) a contrastive loss to capture label correlation. KMCL models the uncertainty of the feature encoder while maintaining a low computational footprint. Extensive experiments on image classification tasks showcase the consistent improvements of KMCL over SOTA methods. A PyTorch implementation is available at https://github.com/mahdihosseini/KMCL.	翻訳日:2023-07-11 16:18:57 公開日:2023-07-08
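For the asymmetric classification term, the sketch below swaps in the asymmetric loss (ASL) of Ridnik et al. (2021) as a stand-in. KMCL's actual term may differ. Positive and negative labels get separate focusing exponents, and a probability margin discards very easy negatives. This counters the imbalance between the few positive and many negative labels per image. The parameter values and the function name are illustrative assumptions.

```python
import math
from typing import Sequence


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def asymmetric_loss(logits: Sequence[float], labels: Sequence[int],
                    gamma_pos: float = 0.0, gamma_neg: float = 4.0,
                    margin: float = 0.05, eps: float = 1e-8) -> float:
    """ASL-style multilabel loss for one sample (sum over labels)."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = _sigmoid(z)  # predicted probability that this label is present
        if y == 1:
            # Focal-style weighting for positives (plain BCE when gamma_pos = 0).
            total -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            # Probability shifting: negatives with p <= margin contribute nothing.
            p_m = max(p - margin, 0.0)
            total -= p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
    return total


if __name__ == "__main__":
    print(asymmetric_loss([2.0, -10.0, 0.0], [1, 0, 0]))
```

Compared with binary cross-entropy, a confidently rejected negative (logit −10) costs exactly zero here. An uncertain negative (p = 0.5) costs far less than log 2.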
# 例示によるプログラミング(PBE)システムにおけるユーザ提供アノテーションのマルチインテント検出 Multi-Intent Detection in User Provided Annotations for Programming by Examples Systems ( http://arxiv.org/abs/2307.03966v1 ) ライセンス: Link先を確認	Nischal Ashok Kumar, Nitin Gupta, Shanmukha Guttula, Hima Patel	(参考訳) エンタープライズアプリケーションのマッピングにおいて、データマッピングは統合開発の基本的な部分であり続けるが、時間がかかる。命名規則を欠くアプリケーションが増えており、ネストされたフィールド構造が統合開発者にとっての複雑さをさらに増している。マッピングが完了すると、各アプリケーションが特定のフォーマットのデータを期待するため、データ変換がユーザにとって次の課題になる。また、統合フローを構築する際、開発者はソースとターゲットのデータフィールドのフォーマットを理解し、データをソースからターゲットのフォーマットに変換できる変換プログラムを考え出す必要がある。何らかの仕様からプログラム合成パラダイムによって変換プログラムを自動生成する問題は、人工知能(AI)の初期から研究されている。 Programming by Example (PBE) は、ユーザが提供する入力および出力サンプルから、フォーマットや文字列の変換タスクを達成するコンピュータプログラムを自動的に推論することを目指す手法の一つである。正しい意図を学習するには、ユーザからの多様なサンプルセットが必要である。しかし、ユーザが多様なサンプルセットを提供できない可能性がある。これにより、入力と出力のサンプルに複数の意図や曖昧さが生じうる。したがって、PBEシステムは正しい意図のプログラムを生成する際に混乱する可能性がある。本稿では、入力出力文字列を解析し、複数の意図の原因となる異なるプロパティの集合にそれらを写像する、ディープニューラルネットワークに基づく曖昧さ予測モデルを提案する。ユーザはこれらのプロパティを分析し、それに応じて新しいサンプルを提供したり既存のサンプルを修正したりでき、これはエンタープライズアプリケーションをマッピングするためのより良いPBEシステムの構築に役立つ。 In mapping enterprise applications, data mapping remains a fundamental part of integration development, but it is time-consuming. An increasing number of applications lack naming standards, and nested field structures further add complexity for the integration developers. Once the mapping is done, data transformation is the next challenge for the users since each application expects data to be in a certain format. Also, while building an integration flow, developers need to understand the format of the source and target data fields and come up with a transformation program that can change data from the source to the target format. The problem of automatic generation of a transformation program through the program synthesis paradigm from some specifications has been studied since the early days of Artificial Intelligence (AI). 
Programming by Example (PBE) is one such technique; it targets automatic inference of a computer program that accomplishes a format or string conversion task from user-provided input and output samples. To learn the correct intent, a diverse set of samples from the user is required. However, the user may fail to provide a diverse set of samples. This can lead to multiple intents or ambiguity in the input and output samples. Hence, PBE systems can get confused when generating the program for the correct intent. In this paper, we propose a deep neural network based ambiguity prediction model, which analyzes the input-output strings and maps them to a different set of properties responsible for multiple intents. Users can analyze these properties and accordingly provide new samples or modify existing samples, which can help in building a better PBE system for mapping enterprise applications.	翻訳日:2023-07-11 16:18:37 公開日:2023-07-08
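To illustrate what multiple intents means here, the toy sketch below keeps two hypothetical candidate programs that both reproduce the user's single example. It then probes them on a new input where they disagree. This is not the paper's neural ambiguity predictor. The candidate programs and function names are illustrative assumptions.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Program = Callable[[str], str]

# Two hypothetical programs a PBE synthesizer might propose for "2023-07-08" -> "07".
CANDIDATES: Dict[str, Program] = {
    "second_dash_field": lambda s: s.split("-")[1],  # intent: "the month field"
    "fixed_slice_5_7": lambda s: s[5:7],             # intent: "characters 5..6"
}


def _run(prog: Program, x: str) -> Optional[str]:
    """Run a candidate program, mapping a failure (e.g. no '-' in x) to None."""
    try:
        return prog(x)
    except IndexError:
        return None


def consistent_programs(examples: Sequence[Tuple[str, str]]) -> List[str]:
    """Names of the candidates that reproduce every (input, output) example."""
    return [name for name, prog in CANDIDATES.items()
            if all(_run(prog, i) == o for i, o in examples)]


def is_ambiguous(examples: Sequence[Tuple[str, str]], probes: Sequence[str]) -> bool:
    """True if programs consistent with the examples disagree on some probe input."""
    names = consistent_programs(examples)
    return any(len({_run(CANDIDATES[n], x) for n in names}) > 1 for x in probes)


if __name__ == "__main__":
    examples = [("2023-07-08", "07")]
    print(consistent_programs(examples), is_ambiguous(examples, ["2023-7-8"]))
```

Adding the example ("2023-7-8", "7") eliminates `fixed_slice_5_7`, which would return "7-". That is the kind of extra sample the proposed model aims to prompt users for.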
# ChatGPTは人格認識に優れているか? 予備的研究 Is ChatGPT a Good Personality Recognizer? A Preliminary Study ( http://arxiv.org/abs/2307.03952v1 ) ライセンス: Link先を確認	Yu Ji, Wen Wu, Hong Zheng, Yi Hu, Xi Chen, Liang He	(参考訳) 近年、パーソナリティは、感情分析や製品推薦といった多くのタスクに組み込まれる価値ある個人的要因とみなされている。これにより、与えられたテキストに基づいて個人のパーソナリティを識別することを目的とした、テキストベースのパーソナリティ認識タスクが広く注目されている。ChatGPTが最近、様々な自然言語処理タスクにおいて顕著な能力を発揮していることを考慮し、効果的なパーソナリティデータを生成するために、テキストベースのパーソナリティ認識タスクにおけるChatGPTの予備評価を行う。具体的には、様々なプロンプト戦略を用いて、与えられたテキストからパーソナリティを認識するChatGPTの能力を探る。特に、指定されたレベルで与えられたテキストを分析するようChatGPTを導くために我々が設計した、レベル指向のプロンプト戦略に注目する。2つの代表的な実世界データセットにおけるChatGPTの性能を、従来のニューラルネットワーク、ファインチューニングされたRoBERTa、および対応する最先端のタスク固有モデルと比較する。実験の結果、ゼロショット・チェーン・オブ・ソート・プロンプトを用いたChatGPTは印象的なパーソナリティ認識能力を示すことがわかった。ゼロショット・チェーン・オブ・ソート・プロンプトによって、ChatGPTは2つのデータセットでファインチューニングされたRoBERTaを上回り、テキストベースの論理的推論を通じて自然言語による説明を提供できる。さらに、ゼロショット・チェーン・オブ・ソート・プロンプトと比べて、ゼロショット・レベル指向チェーン・オブ・ソート・プロンプトは、ChatGPTのパーソナリティ予測能力を高め、ChatGPTと対応する最先端のタスク固有モデルとの性能差を縮める。また、パーソナリティを識別する際のChatGPTの公平性を観察する実験を行い、ChatGPTが性別や年齢などの一部のセンシティブな人口統計学的属性に対して不公平であることを発見した。 In recent years, personality has been regarded as a valuable personal factor being incorporated into numerous tasks such as sentiment analysis and product recommendation. This has led to widespread attention to the text-based personality recognition task, which aims to identify an individual's personality based on given text. Considering that ChatGPT has recently exhibited remarkable abilities on various natural language processing tasks, we provide a preliminary evaluation of ChatGPT on the text-based personality recognition task for generating effective personality data. Concretely, we employ a variety of prompting strategies to explore ChatGPT's ability to recognize personality from given text, especially the level-oriented prompting strategy we designed for guiding ChatGPT in analyzing given text at a specified level. 
We compare the performance of ChatGPT on two representative real-world datasets with that of a traditional neural network, fine-tuned RoBERTa, and the corresponding state-of-the-art task-specific model. The experimental results show that ChatGPT with zero-shot chain-of-thought prompting exhibits impressive personality recognition ability. Triggered by zero-shot chain-of-thought prompting, ChatGPT outperforms fine-tuned RoBERTa on the two datasets and is capable of providing natural language explanations through text-based logical reasoning. Furthermore, relative to zero-shot chain-of-thought prompting, zero-shot level-oriented chain-of-thought prompting enhances the personality prediction ability of ChatGPT and reduces the performance gap between ChatGPT and the corresponding state-of-the-art task-specific model. In addition, we conduct experiments to observe the fairness of ChatGPT when identifying personality and find that ChatGPT is unfair toward some sensitive demographic attributes such as gender and age.	翻訳日:2023-07-11 16:18:14 公開日:2023-07-08
# レーン間の読書: 道路上のテキストビデオQA Reading Between the Lanes: Text VideoQA on the Road ( http://arxiv.org/abs/2307.03948v1 ) ライセンス: Link先を確認	George Tom, Minesh Mathew, Sergi Garcia, Dimosthenis Karatzas and C.V. Jawahar	(参考訳) 道路周辺のテキストと標識はドライバーにとって重要な情報を提供し、安全な走行と状況認識に不可欠である。移動中のシーンテキスト認識は難しい問題であり、テキストの手がかりは通常短時間しか現れず、遠距離での早期検出が必要となる。このような情報を利用してドライバーを支援するシステムは、ビデオストリームから視覚的およびテキスト的手がかりを抽出して取り入れるだけでなく、時間にわたって推論する必要がある。この問題に対処するために、運転支援の文脈におけるビデオ質問応答(VideoQA)タスクのための新しいデータセットであるRoadTextVQAを紹介する。RoadTextVQAは、複数の国から収集された3,222本の運転動画から成り、10,500個の質問が付与されている。質問はすべて、運転動画に存在するテキストまたは道路標識に基づいている。RoadTextVQAデータセット上で最先端のビデオ質問応答モデルの性能を評価し、この領域における大きな改善の余地と、車載支援システムおよびテキストを考慮したマルチモーダル質問応答の研究を進める上でのデータセットの有用性を明らかにする。データセットは http://cvit.iiit.ac.in/research/projects/cvit-projects/roadtextvqa で入手できる。 Text and signs around roads provide crucial information for drivers, vital for safe navigation and situational awareness. Scene text recognition in motion is a challenging problem: textual cues typically appear only for a short time span, and early detection at a distance is necessary. Systems that exploit such information to assist the driver should not only extract and incorporate visual and textual cues from the video stream but also reason over time. To address this issue, we introduce RoadTextVQA, a new dataset for the task of video question answering (VideoQA) in the context of driver assistance. RoadTextVQA consists of 3,222 driving videos collected from multiple countries, annotated with 10,500 questions, all based on text or road signs present in the driving videos. We assess the performance of state-of-the-art video question answering models on our RoadTextVQA dataset, highlighting the significant potential for improvement in this domain and the usefulness of the dataset in advancing research on in-vehicle support systems and text-aware multimodal question answering. 
The dataset is available at http://cvit.iiit.ac.in/research/projects/cvit-projects/roadtextvqa	翻訳日:2023-07-11 16:17:37 公開日:2023-07-08
# 機械学習を用いたパッシブ光ネットワークの故障モニタリング Fault Monitoring in Passive Optical Networks using Machine Learning Techniques ( http://arxiv.org/abs/2307.03945v1 ) ライセンス: Link先を確認	Khouloud Abdelli, Carsten Tropschug, Helmut Griesser, and Stephan Pachnicke	(参考訳) パッシブ光ネットワーク(PON)システムは、ファイバカットや光ネットワークユニット(ONU)の送信機/受信機の故障など、様々な障害に対して脆弱である。ファイバカットによるサービス中断は、サービスプロバイダやオペレーターに莫大な経済的損失をもたらす可能性がある。分岐終端がほぼ等距離にある場合、分岐からの反射が重なり合うため、全体の後方散乱信号から故障した分岐を区別することが難しく、故障したONUの特定が困難になる。ネットワーク規模が大きくなるにつれて、PONシステムの故障モニタリングの複雑さが増し、モニタリングの信頼性が低下する。これらの課題に対処するため、本稿ではPONシステムにおける故障モニタリングのための様々な機械学習(ML)手法を提案し、実験的な光時間領域反射測定(OTDR)データを用いてそれらを検証する。 Passive optical network (PON) systems are vulnerable to a variety of failures, including fiber cuts and optical network unit (ONU) transmitter/receiver failures. Any service interruption caused by a fiber cut can result in huge financial losses for service providers or operators. Identifying the faulty ONU becomes difficult in the case of nearly equidistant branch terminations because the reflections from the branches overlap, making it difficult to distinguish the faulty branch given the global backscattering signal. With increasing network size, the complexity of fault monitoring in PON systems increases, resulting in less reliable monitoring. To address these challenges, in this paper we propose various machine learning (ML) approaches for fault monitoring in PON systems and validate them using experimental optical time domain reflectometry (OTDR) data.	翻訳日:2023-07-11 16:17:16 公開日:2023-07-08
# 地下水数値モデリングにおけるU-Net & Vision Transformerの有効性の解明 Understanding the Efficacy of U-Net & Vision Transformer for Groundwater Numerical Modelling ( http://arxiv.org/abs/2307.04010v1 ) ライセンス: Link先を確認	Maria Luisa Taccari, Oded Ovadia, He Wang, Adar Kahana, Xiaohui Chen, Peter K. Jimack	(参考訳) 本稿では、地下水系における時間依存フォワードモデリングのための様々な機械学習モデル、すなわちビジョントランスフォーマー(ViT)と統合されたU-Netとフーリエニューラル演算子(FNO)を総合的に比較する。合成データセットのテストを通じて、U-NetとU-Net + ViTモデルは、特にスパースデータシナリオにおいて、精度と効率でFNOより優れていることを示した。これらの結果は,データ不足が顕著な実世界のアプリケーションにおいて,地下水モデリングのためのU-Netモデルの可能性を明らかにするものである。 This paper presents a comprehensive comparison of various machine learning models, namely U-Net, U-Net integrated with Vision Transformers (ViT), and Fourier Neural Operator (FNO), for time-dependent forward modelling in groundwater systems. Through testing on synthetic datasets, it is demonstrated that U-Net and U-Net + ViT models outperform FNO in accuracy and efficiency, especially in sparse data scenarios. These findings underscore the potential of U-Net-based models for groundwater modelling in real-world applications where data scarcity is prevalent.	翻訳日:2023-07-11 16:09:42 公開日:2023-07-08
# インタラクティブなディクテーションを目指して Toward Interactive Dictation ( http://arxiv.org/abs/2307.04008v1 ) ライセンス: Link先を確認	Belinda Z. Li, Jason Eisner, Adam Pauls, Sam Thomson	(参考訳) 音声ディクテーションは、ますます重要になっているテキスト入力モダリティである。ディクテーションと音声による編集の両方を可能にする既存のシステムは、コマンド言語をトリガーワードによって起動されるフラットなテンプレートに制限している。本研究では、ユーザが自由形式の自然言語による音声編集コマンドでディクテーションを中断できるようにすることの実現可能性を検討する。このようなシステムを実験するために、新しいタスクとデータセットTERTiUSを導入する。この柔軟性をリアルタイムでサポートするには、システムは音声の区間をディクテーションかコマンドのいずれかとして逐次的に分割・分類し、コマンドである区間を解釈する必要がある。我々は、大規模な事前学習済み言語モデルを用いて、編集後のテキストを予測するか、あるいは小さなテキスト編集プログラムを予測する実験を行う。実験の結果、モデルの精度とレイテンシの間には自然なトレードオフがあることが示された。より小さなモデルは1.3秒のレイテンシで30%の最終状態精度を達成し、より大きなモデルは7秒のレイテンシで55%の最終状態精度を達成する。 Voice dictation is an increasingly important text input modality. Existing systems that allow both dictation and editing-by-voice restrict their command language to flat templates invoked by trigger words. In this work, we study the feasibility of allowing users to interrupt their dictation with spoken editing commands in open-ended natural language. We introduce a new task and dataset, TERTiUS, to experiment with such systems. To support this flexibility in real-time, a system must incrementally segment and classify spans of speech as either dictation or command, and interpret the spans that are commands. We experiment with using large pre-trained language models to predict the edited text, or alternatively, to predict a small text-editing program. Experiments show a natural trade-off between model accuracy and latency: a smaller model achieves 30% end-state accuracy with 1.3 seconds of latency, while a larger model achieves 55% end-state accuracy with 7 seconds of latency.	翻訳日:2023-07-11 16:09:32 公開日:2023-07-08
# 第19回合理性と知識の理論的側面に関する会議 予稿集 Proceedings Nineteenth conference on Theoretical Aspects of Rationality and Knowledge ( http://arxiv.org/abs/2307.04005v1 ) ライセンス: Link先を確認	Rineke Verbrugge (University of Groningen)	(参考訳) TARK会議(Theoretical Aspects of Rationality and Knowledge)は、コンピュータ科学、人工知能、ゲーム理論、決定理論、哲学、論理学、言語学、認知科学など、さまざまな分野の研究者を集めることを目的とした会議である。その目標は、合理性と知識に関する推論を含む学際的な問題の理解を深めることである。これまでの会議は、ジョー・ハルパーン(コーネル大学)の主導により、1986年以降、世界各地で隔年開催されている。関心のあるトピックには、知識、信念、認識、不確実性のための意味モデル、限定合理性とリソース制約付き推論、常識的な認識論的推論、認識論理、認識論的ゲーム理論、知識と行動、知識やその他の心的状態に関する推論の応用、信念改訂、計算論的社会選択、アルゴリズム的ゲーム理論、マルチエージェントシステムの基礎などが含まれるが、これらに限定されない。TARKに関する情報(会議の予稿集を含む)は、ウェブサイト http://www.tark.org/ で入手できる。本予稿集には、2023年6月28日から6月30日にかけてイギリスのオックスフォード大学で開催された第19回合理性と知識の理論的側面に関する会議(TARK 2023)での発表が認められた論文が収録されている。会議のウェブサイトは https://sites.google.com/view/tark-2023 にある。 The TARK conference (Theoretical Aspects of Rationality and Knowledge) is a conference that aims to bring together researchers from a wide variety of fields, including computer science, artificial intelligence, game theory, decision theory, philosophy, logic, linguistics, and cognitive science. Its goal is to further our understanding of interdisciplinary issues involving reasoning about rationality and knowledge. Previous conferences have been held biennially around the world since 1986, on the initiative of Joe Halpern (Cornell University). Topics of interest include, but are not limited to, semantic models for knowledge, belief, awareness and uncertainty, bounded rationality and resource-bounded reasoning, commonsense epistemic reasoning, epistemic logic, epistemic game theory, knowledge and action, applications of reasoning about knowledge and other mental states, belief revision, computational social choice, algorithmic game theory, and foundations of multi-agent systems. 
Information about TARK, including conference proceedings, is available at the website http://www.tark.org/. These proceedings contain the papers that have been accepted for presentation at the Nineteenth Conference on Theoretical Aspects of Rationality and Knowledge (TARK 2023), held between June 28 and June 30, 2023, at the University of Oxford, United Kingdom. The conference website can be found at https://sites.google.com/view/tark-2023	翻訳日:2023-07-11 16:09:13 公開日:2023-07-08
# 高オフ角化学気相成長ダイヤモンド上の応力分布緩和による完全配向窒素-空孔中心のスピン位相緩和時間の延長 Extending spin dephasing time of perfectly aligned Nitrogen-Vacancy centers by mitigating stress distribution on highly misoriented chemical-vapor-deposition diamond ( http://arxiv.org/abs/2307.04003v1 ) ライセンス: Link先を確認	T. Tsuji, T. Sekiguchi, T.Iwasaki and M.Hatano	(参考訳) 大体積の化学気相成長(CVD)ダイヤモンド中の完全配向窒素-空孔(NV)中心のスピン位相緩和時間(T2*)を延長すると、直流磁気感度が向上する。しかし、NV中心のT2*は、膜厚が大きくなるにつれてダイヤモンド膜中の応力分布によって著しく短くなる。この問題を克服するため、本研究ではCVDダイヤモンド膜の応力分布を緩和し、アンサンブルNV中心のT2*を延長する方法を開発した。オフ角2.0°、3.7°、5.0°、10°の(111)ダイヤモンド基板上に、完全配向NV中心を含む厚さ約50 μmのCVDダイヤモンド膜を形成した。その結果、オフ角の増加に伴い、アンサンブルNV中心のT2*は増加し、電子スピン浴と核スピン浴のみによって制限される値に近づくことがわかった。微視的な応力測定により、低オフ角ではCVDダイヤモンド膜の深さ方向の応力分布が非常に不均一であったのに対し、高オフ角基板上ではその不均一性が大きく抑制されることが明らかになった。応力分布の減少は、CVDダイヤモンドの転位密度の低下に起因する可能性がある。本研究は、高感度量子センサに用いる高品質ダイヤモンド材料を合成するための重要な方法を提供する。 Extending the spin-dephasing time (T2*) of perfectly aligned nitrogen-vacancy (NV) centers in large-volume chemical vapor deposition (CVD) diamonds leads to enhanced DC magnetic sensitivity. However, T2* of the NV centers is significantly reduced by the stress distribution in the diamond film as its thickness increases. To overcome this issue, we developed a method to mitigate the stress distribution in the CVD diamond films, leading to a T2* extension of the ensemble NV centers. CVD diamond films of approximately 50 μm thickness with perfectly aligned NV centers were formed on (111) diamond substrates with misorientation angles of 2.0°, 3.7°, 5.0°, and 10°. We found that, as the misorientation angle increased, T2* of the ensemble of NV centers increased, approaching the value limited only by the electron and nuclear spin bath. Microscopic stress measurements revealed that the stress distribution was highly inhomogeneous along the depth direction in the CVD diamond film at low misorientation angles, whereas the inhomogeneity was largely suppressed on highly misoriented substrates. The reduced stress distribution possibly originates from the reduction of the dislocation density in the CVD diamond. 
This study provides an important method for synthesizing high-quality diamond materials for use in highly sensitive quantum sensors.	翻訳日:2023-07-11 16:08:45 公開日:2023-07-08
# 高次元特徴を持つ集合表現には多項式幅で十分である Polynomial Width is Sufficient for Set Representation with High-dimensional Features ( http://arxiv.org/abs/2307.04001v1 ) ライセンス: Link先を確認	Peihao Wang, Shenghao Yang, Shu Li, Zhangyang Wang, Pan Li	(参考訳) 入力順序に影響されないニューラルネットワークの帰納バイアスをモデル化するために、集合表現は深層学習で広く用いられるようになった。DeepSetsは、集合表現のために最も広く使われているニューラルネットワークアーキテクチャである。これは、各集合要素を次元$L$の潜在空間に埋め込み、続いて総和プーリングによって集合全体の埋め込みを得て、最後に集合全体の埋め込みを出力に写像する。本研究では、次元$L$がDeepSetsの表現力に与える影響について検討する。これまでの解析は、高次元特徴を1次元特徴に過度に単純化するか、解析的な活性化関数に限定されており、実用から乖離するか、集合サイズ$N$と特徴次元$D$に対して指数関数的に増大する$L$をもたらしていた。十分な表現力を達成する$L$の最小値を調べるために、2つの集合要素埋め込み層を示す: (a)線形+べき乗活性化(LP)と (b)線形+指数活性化(LE)。両方の埋め込み層を用いた集合表現には、$L$がpoly$(N, D)$であれば十分であることを示す。また、LP埋め込み層に対する$L$の下界も与える。さらに、この結果を置換同変な集合関数と複素数体に拡張する。 Set representation has become ubiquitous in deep learning for modeling the inductive bias of neural networks that are insensitive to the input order. DeepSets is the most widely used neural network architecture for set representation. It involves embedding each set element into a latent space with dimension $L$, followed by a sum pooling to obtain a whole-set embedding, and finally mapping the whole-set embedding to the output. In this work, we investigate the impact of the dimension $L$ on the expressive power of DeepSets. Previous analyses either oversimplified high-dimensional features to be one-dimensional features or were limited to analytic activations, thereby diverging from practical use or resulting in $L$ that grows exponentially with the set size $N$ and feature dimension $D$. To investigate the minimal value of $L$ that achieves sufficient expressive power, we present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activations (LE). We demonstrate that $L$ being poly$(N, D)$ is sufficient for set representation using both embedding layers. We also provide a lower bound of $L$ for the LP embedding layer. 
Furthermore, we extend our results to permutation-equivariant set functions and the complex field.	翻訳日:2023-07-11 16:08:22 公開日:2023-07-08
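The DeepSets pipeline described in this abstract (embed each element into a latent space of width $L$, sum-pool, then map to the output) can be sketched in NumPy with the LP (linear + power) embedding. This is a minimal illustration, not the authors' code: the weight matrix `W`, the exponent `p=2`, the output map `rho`, and the sizes are hypothetical choices. The check at the end shows why sum pooling makes the result independent of element order.

```python
import numpy as np

def lp_embed(X, W, p=2):
    """Linear + power (LP) embedding of every set element.

    X: (N, D) set of N elements with D features; W: (D, L) hypothetical weights.
    Returns the (N, L) element embeddings (X @ W) ** p.
    """
    return (X @ W) ** p

def deepsets(X, W, rho):
    # Summing over the N elements discards their order.
    pooled = lp_embed(X, W).sum(axis=0)  # whole-set embedding, shape (L,)
    return rho(pooled)

rng = np.random.default_rng(0)
N, D, L = 5, 3, 16                  # set size, feature dim, latent width (illustrative)
X = rng.normal(size=(N, D))
W = rng.normal(size=(D, L))
rho = lambda z: np.tanh(z).sum()    # hypothetical stand-in for the output network

out = deepsets(X, W, rho)
out_perm = deepsets(X[rng.permutation(N)], W, rho)
print(np.isclose(out, out_perm))    # True: permuting the elements does not change the output
```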
# 効率的な逆トーンマッピングのための軽量改良残差ネットワーク Lightweight Improved Residual Network for Efficient Inverse Tone Mapping ( http://arxiv.org/abs/2307.03998v1 ) ライセンス: Link先を確認	Liqi Xue, Tianyi Xu, Yongbao Song, Yan Liu, Lei Zhang, Xiantong Zhen, and Jun Xu	(参考訳) HDR10テレビのようなディスプレイデバイスは、高ダイナミックレンジ(HDR)画像を表示するために日常生活でますます普及している。しかし、インターネット上のメディア画像の大半は依然として8ビットの標準ダイナミックレンジ(SDR)形式である。したがって、逆トーンマッピング(ITM)によってSDR画像をHDR画像に変換することは、豊富なメディア画像の潜在能力を最大限に引き出すうえで重要である。しかし、既存のITM手法は通常、膨大な計算コストを要する複雑なネットワークアーキテクチャで構築されている。本稿では、効率的なITMのために、広く使われている残差ブロックの能力を高めた軽量な改良残差ネットワーク(IRNet)を提案する。具体的には、高精細なHDR画像再構成のために多層特徴を抽出・融合する新しい改良残差ブロック(IRB)を提案する。3つのベンチマークデータセットでの実験により、IRNetがITMタスクとSR-ITM結合タスクの両方で最先端の性能を達成することを示す。コード、モデル、データはhttps://github.com/ThisisVikki/ITM-baselineで公開予定である。 The display devices like HDR10 televisions are increasingly prevalent in our daily life for visualizing high dynamic range (HDR) images. But the majority of media images on the internet remain in 8-bit standard dynamic range (SDR) format. Therefore, converting SDR images to HDR ones by inverse tone mapping (ITM) is crucial to unlock the full potential of abundant media images. However, existing ITM methods are usually developed with complex network architectures requiring huge computational costs. In this paper, we propose a lightweight Improved Residual Network (IRNet) by enhancing the power of popular residual block for efficient ITM. Specifically, we propose a new Improved Residual Block (IRB) to extract and fuse multi-layer features for fine-grained HDR image reconstruction. Experiments on three benchmark datasets demonstrate that our IRNet achieves state-of-the-art performance on both the ITM and joint SR-ITM tasks. The code, models and data will be publicly available at https://github.com/ThisisVikki/ITM-baseline.	翻訳日:2023-07-11 16:08:01 公開日:2023-07-08
# 低ランクMDPにおける効率的なモデルフリー探索 Efficient Model-Free Exploration in Low-Rank MDPs ( http://arxiv.org/abs/2307.03997v1 ) ライセンス: Link先を確認	Zakaria Mhammedi, Adam Block, Dylan J. Foster, Alexander Rakhlin	(参考訳) 強化学習における大きな課題は、一般化と関数近似が必要な高次元領域での探索のための、実用的でサンプル効率の良いアルゴリズムを開発することである。低ランクマルコフ決定過程(遷移確率が未知の特徴埋め込みに基づく低ランク分解を持つ)は、関数近似を伴うRLのための単純だが表現力の高い枠組みを提供するが、既存のアルゴリズムは(1)計算的に扱いにくいか、(2)潜在変数構造、モデルベースの関数近似へのアクセス、到達可能性といった制約の強い統計的仮定に依存している。本研究では、計算効率が高くモデルフリーであり、一般の関数近似を許し、追加の構造的仮定を必要としない、低ランクMDPの探索のための証明可能なサンプル効率を持つ初のアルゴリズムを提案する。我々のアルゴリズムVoXは、探索のための効率的に計算可能な基底として特徴埋め込みの一般化最適設計の概念を用い、表現学習と方策最適化を交互に行うことで最適設計を効率的に計算する。我々の解析は、Frank-Wolfe法に基づく最適設計計算から方策最適化への新たな帰着や、先行研究にあるミニマックス表現学習目的関数の改良された解析など、いくつかの手法を組み合わせている。 A major challenge in reinforcement learning is to develop practical, sample-efficient algorithms for exploration in high-dimensional domains where generalization and function approximation is required. Low-Rank Markov Decision Processes -- where transition probabilities admit a low-rank factorization based on an unknown feature embedding -- offer a simple, yet expressive framework for RL with function approximation, but existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions such as latent variable structure, access to model-based function approximation, or reachability. In this work, we propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs that is both computationally efficient and model-free, allowing for general function approximation and requiring no additional structural assumptions. Our algorithm, VoX, uses the notion of a generalized optimal design for the feature embedding as an efficiently computable basis for exploration, performing efficient optimal design computation by interleaving representation learning and policy optimization. 
Our analysis -- which is appealingly simple and modular -- carefully combines several techniques, including a new reduction from optimal design computation to policy optimization based on the Frank-Wolfe method, and an improved analysis of a certain minimax representation learning objective found in prior work.	翻訳日:2023-07-11 16:07:47 公開日:2023-07-08
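The abstract's reduction from optimal design computation to policy optimization is based on the Frank-Wolfe method. The classic finite-set version of that idea, Frank-Wolfe (Fedorov-Wynn) iterations for a D-optimal design over fixed feature vectors, gives a feel for what an "optimal design" computes. The sketch below is that textbook algorithm under the assumption of a known, finite candidate set `Xf` (a hypothetical input); it is not VoX, which must learn the feature embedding and can only reach designs through policies. By the Kiefer-Wolfowitz theorem, the largest leverage $x^\top A(\pi)^{-1}x$ equals the dimension $d$ at the optimum, which gives a stopping rule.

```python
import numpy as np

def leverages(Xf, pi):
    """x_i^T A(pi)^{-1} x_i for every row, where A(pi) = sum_i pi_i x_i x_i^T."""
    A = Xf.T @ (pi[:, None] * Xf)
    return np.einsum("ij,jk,ik->i", Xf, np.linalg.inv(A), Xf)

def frank_wolfe_design(Xf, iters=500, tol=1e-6):
    """Fedorov-Wynn / Frank-Wolfe iterations maximizing log det A(pi).

    Xf: (n, d) candidate feature vectors, assumed to span R^d.
    Returns design weights pi (a probability distribution over the rows).
    """
    n, d = Xf.shape
    pi = np.full(n, 1.0 / n)                # start from the uniform design
    for _ in range(iters):
        g = leverages(Xf, pi)
        j = int(np.argmax(g))               # linear-maximization step picks one vertex
        if g[j] <= d * (1 + tol):           # Kiefer-Wolfowitz optimality reached
            break
        gamma = (g[j] / d - 1) / (g[j] - 1) # exact line search for log det
        pi = (1 - gamma) * pi
        pi[j] += gamma
    return pi

rng = np.random.default_rng(0)
Xf = rng.normal(size=(20, 3))
pi = frank_wolfe_design(Xf)
lev = leverages(Xf, pi)
print(lev.max())  # close to d = 3 after convergence
```

Because the leverages always satisfy $\sum_i \pi_i g_i = d$, the maximum can never fall below $d$; the loop only drives it down toward that floor.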
# 適応的埋め込みとアンサンブルによる画像ノイズ除去のための拡散モデルの活用 Stimulating the Diffusion Model for Image Denoising via Adaptive Embedding and Ensembling ( http://arxiv.org/abs/2307.03992v1 ) ライセンス: Link先を確認	Tong Li, Hansen Feng, Lizhi Wang, Zhiwei Xiong, Hua Huang	(参考訳) 画像ノイズ除去は計算写真学における基本的な問題であり、低い歪みで高品質な知覚性能を達成することが強く求められている。現在の手法は、知覚性能が不十分であるか、大きな歪みを生じるかのいずれかである。近年、拡散モデルは様々なタスクで最先端の性能を達成しており、そのノイズ除去機構は画像ノイズ除去に大きな可能性を示している。しかし、拡散モデルを画像ノイズ除去に活用することは容易ではなく、いくつかの重要な問題を解決する必要がある。一方では、入力の不整合が拡散モデルと画像ノイズ除去の接続を妨げる。他方では、生成画像と所望のノイズ除去画像の内容の不整合がさらなる歪みをもたらす。これらの問題に対処するため、ノイズ除去の観点から拡散モデルを理解・再考し、DMID(Diffusion Model for Image Denoising)と呼ぶ新しい戦略を提案する。DMID戦略は、ノイズ画像を事前学習済み拡散モデルに埋め込む適応的埋め込み法と、ノイズ除去画像の歪みを低減する適応的アンサンブル法からなる。DMID戦略は、ガウスノイズ除去と実画像ノイズ除去の両方において、すべての歪みベースおよび知覚指標で最先端の性能を達成する。 Image denoising is a fundamental problem in computational photography, where achieving high-quality perceptual performance with low distortion is highly demanding. Current methods either struggle with perceptual performance or suffer from significant distortion. Recently, the emerging diffusion model achieves state-of-the-art performance in various tasks, and its denoising mechanism demonstrates great potential for image denoising. However, stimulating diffusion models for image denoising is not straightforward and requires solving several critical problems. On the one hand, the input inconsistency hinders the connection of diffusion models and image denoising. On the other hand, the content inconsistency between the generated image and the desired denoised image introduces additional distortion. To tackle these problems, we present a novel strategy called Diffusion Model for Image Denoising (DMID) by understanding and rethinking the diffusion model from a denoising perspective. Our DMID strategy includes an adaptive embedding method that embeds the noisy image into a pre-trained diffusion model, and an adaptive ensembling method that reduces distortion in the denoised image. 
Our DMID strategy achieves state-of-the-art performance on all distortion-based and perceptual metrics, for both Gaussian and real-world image denoising.	翻訳日:2023-07-11 16:07:24 公開日:2023-07-08
# FTFDNet: 三モダリティ相互作用による発話顔動画の改ざん検出の学習 FTFDNet: Learning to Detect Talking Face Video Manipulation with Tri-Modality Interaction ( http://arxiv.org/abs/2307.03990v1 ) ライセンス: Link先を確認	Ganglai Wang, Peng Zhang, Junwen Xiong, Feihan Yang, Wei Huang, and Yufei Zha	(参考訳) DeepFakeに基づくデジタル顔偽造は、特に発話顔生成に唇の操作が用いられる場合に公共メディアのセキュリティを脅かしており、偽動画検出の難しさはさらに増している。与えられた音声に合わせて唇の形だけを変えるため、こうした偽の発話顔動画では人物の顔特徴から偽物を見分けることが難しい。事前知識としての音声ストリームへの注意が欠けていることも相まって、偽の発話顔動画の検出失敗は避けられないものとなっている。実際の動画のオプティカルフローは規則的に変化するのに対し、偽の発話顔動画のオプティカルフローは特に唇領域で乱れていることがわかった。つまり、オプティカルフローから得られる動き特徴は操作の手がかりを捉えるのに有用である。本研究では、効率的なクロスモーダル融合(CMF)モジュールを用いて視覚・音声・動きの特徴を取り入れた偽発話顔検出ネットワーク(FTFDNet)を提案する。さらに、より有益な特徴を発見するための新しい音声・視覚アテンション機構(AVAM)を提案する。AVAMはモジュール化により任意の音声・視覚CNNアーキテクチャにシームレスに統合できる。AVAMを追加することで、提案するFTFDNetは、構築した偽発話顔検出データセット(FTFDD)だけでなく、DeepFake動画検出データセット(DFDCおよびDF-TIMIT)でも、他の最先端DeepFake動画検出手法より優れた検出性能を達成できる。 DeepFake based digital facial forgery is threatening public media security, especially when lip manipulation has been used in talking face generation, and the difficulty of fake video detection is further increased. By only changing lip shape to match the given speech, the facial features of identity are hard to be discriminated in such fake talking face videos. Together with the lack of attention on audio stream as the prior knowledge, the detection failure of fake talking face videos also becomes inevitable. It's found that the optical flow of the fake talking face video is disordered especially in the lip region while the optical flow of the real video changes regularly, which means the motion feature from optical flow is useful to capture manipulation cues. In this study, a fake talking face detection network (FTFDNet) is proposed by incorporating visual, audio and motion features using an efficient cross-modal fusion (CMF) module. 
Furthermore, a novel audio-visual attention mechanism (AVAM) is proposed to discover more informative features, which can be seamlessly integrated into any audio-visual CNN architecture by modularization. With the additional AVAM, the proposed FTFDNet is able to achieve a better detection performance than other state-of-the-art DeepFake video detection methods not only on the established fake talking face detection dataset (FTFDD) but also on the DeepFake video detection datasets (DFDC and DF-TIMIT).	翻訳日:2023-07-11 16:07:04 公開日:2023-07-08
# PCGに基づく静的地下駐車場シナリオ生成 PCG-based Static Underground Garage Scenario Generation ( http://arxiv.org/abs/2307.03988v1 ) ライセンス: Link先を確認	Wenjin Li and Kai Li	(参考訳) 自動運転技術にはL0からL5までの5つのレベルがある。現在達成できるのはL2レベル(部分自動化)のみであり、最終レベルであるL5(完全自動化)に到達するまでには長い道のりがある。これらのレベルを越える鍵は、自動運転モデルの訓練にある。しかし、実際の道路データのみに頼ってモデルを訓練するのは到底不十分であり、大量のリソースを消費する。実世界のシナリオを再現するシミュレータを通じて自動運転モデルを訓練する例はすでにあるが、これらのシナリオは完全に手作業で構築する必要がある。道路ネットワーク形式から3Dシーンを直接変換すると多くの詳細が欠落し、訓練セットとして使用できない。本稿では、地下駐車場の静的シナリオシミュレーションを手続き的コンテンツ生成(PCG)問題とみなし、Sarsaアルゴリズムを用いて地下駐車場構造の手続き的コンテンツ生成を解く。 Autonomous driving technology has five levels, from L0 to L5. Currently, only the L2 level (partial automation) can be achieved, and there is a long way to go before reaching the final level of L5 (full automation). The key to crossing these levels lies in training the autonomous driving model. However, relying solely on real-world road data to train the model is far from enough and consumes a great deal of resources. Although there are already examples of training autonomous driving models through simulators that simulate real-world scenarios, these scenarios require complete manual construction. Directly converting 3D scenes from road network formats will lack a large amount of detail and cannot be used as training sets. Underground parking garage static scenario simulation is regarded as a procedural content generation (PCG) problem. This paper will use the Sarsa algorithm to solve procedural content generation on underground garage structures.	翻訳日:2023-07-11 16:06:32 公開日:2023-07-08
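Sarsa, the algorithm named in this abstract, is on-policy temporal-difference control: the bootstrap target uses the action the current policy actually takes next. The sketch below shows the tabular update and an epsilon-greedy episode loop on a hypothetical 5-cell corridor. The paper's garage-layout states, actions, and rewards are not described in the abstract, so the environment and the values of `alpha`, `gamma`, and `eps` are illustrative assumptions.

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9, done=False):
    """Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a)); no bootstrap at terminal states."""
    target = r if done else r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

def epsilon_greedy(Q, s, eps, rng):
    """Random action with probability eps, otherwise the greedy action."""
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def corridor_step(s, a, n_states=5):
    """Hypothetical toy environment: action 1 moves right, 0 moves left; reward 1 at the last cell."""
    s_next = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
    done = s_next == n_states - 1
    return s_next, (1.0 if done else 0.0), done

rng = np.random.default_rng(0)
Q = np.zeros((5, 2))
for episode in range(200):
    s = 0
    a = epsilon_greedy(Q, s, 0.1, rng)
    for _ in range(50):
        s_next, r, done = corridor_step(s, a)
        a_next = epsilon_greedy(Q, s_next, 0.1, rng)  # on-policy: next action from the same policy
        sarsa_update(Q, s, a, r, s_next, a_next, done=done)
        if done:
            break
        s, a = s_next, a_next

# A single hand-checkable update: 0 + 0.5 * (1 + 0.9 * 2 - 0) = 1.4
Q_demo = np.zeros((2, 2))
Q_demo[1, 0] = 2.0
sarsa_update(Q_demo, s=0, a=1, r=1.0, s_next=1, a_next=0)
print(Q_demo[0, 1])  # 1.4
```

Using `a_next` from the behaviour policy (rather than the greedy maximum) is what distinguishes Sarsa from Q-learning.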
# テスト時間領域一般化のための変分隣接ラベルの学習 Learning Variational Neighbor Labels for Test-Time Domain Generalization ( http://arxiv.org/abs/2307.04033v1 ) ライセンス: Link先を確認	Sameer Ambekar, Zehao Xiao, Jiayi Shen, Xiantong Zhen, Cees G. M. Snoek	(参考訳) 本稿では,モデルが対象領域にデプロイされる前に,ソースドメインのみをトレーニングするドメインの一般化について述べる。我々は、ソーストレーニングとターゲットテストの厳密な分離に従うが、推論中にラベル付けされていないターゲットデータ自体の価値を利用する。我々は3つの貢献をした。まず,実験時に対象領域に学習したモデルを一般化するために,対象サンプルの確率論的擬似ラベル化を提案する。一般化中の不確実性を考慮した分布として擬似ラベルをモデル化し、不正確な擬似ラベルの誤解を招く信号を緩和することにより、テスト時の一般化を変分推論問題として定式化する。次に,より堅牢な擬似ラベルを生成するために,隣接する対象サンプルの情報を含む変分隣接ラベルを学習する。第3に、より代表的対象情報を組み込んで、より正確で頑健な近隣ラベルを生成する能力を学ぶために、一般化手順をシミュレートする訓練中にメタ一般化ステージを導入する。 6つの広く利用されているデータセットの実験は、提案の利点、能力、有効性を示している。 This paper strives for domain generalization, where models are trained exclusively on source domains before being deployed at unseen target domains. We follow the strict separation of source training and target testing but exploit the value of the unlabeled target data itself during inference. We make three contributions. First, we propose probabilistic pseudo-labeling of target samples to generalize the source-trained model to the target domain at test time. We formulate the generalization at test time as a variational inference problem by modeling pseudo labels as distributions to consider the uncertainty during generalization and alleviate the misleading signal of inaccurate pseudo labels. Second, we learn variational neighbor labels that incorporate the information of neighboring target samples to generate more robust pseudo labels. Third, to learn the ability to incorporate more representative target information and generate more precise and robust variational neighbor labels, we introduce a meta-generalization stage during training to simulate the generalization procedure. Experiments on six widely-used datasets demonstrate the benefits, abilities, and effectiveness of our proposal.	翻訳日:2023-07-11 16:00:46 公開日:2023-07-08
# 完全情報ゲームにおける「無差別」と後ろ向き帰納法について On "Indifference" and Backward Induction in Games with Perfect Information ( http://arxiv.org/abs/2307.04029v1 ) ライセンス: Link先を確認	Nimrod Megiddo	(参考訳) ゲームの2つの異なる結果に対するプレイヤーの無差別性は、小さな摂動では扱えない。実際の選択が他のプレイヤーに大きな影響を与え、無差別なプレイヤーに大きな影響を及ぼすような行動を彼らに取らせる可能性があるからである。合理的な選択間の同点は、他のプレイヤーの効用に基づく合理性概念の精緻化によって解消できると論じる。そのような精緻化の一つがTit-for-Tatの概念である。 Indifference of a player with respect to two distinct outcomes of a game cannot be handled by small perturbations, because the actual choice may have significant impact on other players, and cause them to act in a way that has significant impact on the indifferent player. It is argued that ties among rational choices can be resolved by refinements of the concept of rationality based on the utilities of other players. One such refinement is the concept of Tit-for-Tat.	翻訳日:2023-07-11 16:00:27 公開日:2023-07-08
# 人間のアーティストの模倣における拡散モデルの成功度の測定 Measuring the Success of Diffusion Models at Imitating Human Artists ( http://arxiv.org/abs/2307.04028v1 ) ライセンス: Link先を確認	Stephen Casper, Zifan Guo, Shreya Mogulothu, Zachary Marinov, Chinmay Deshpande, Rui-Jie Yew, Zheng Dai, Dylan Hadfield-Menell	(参考訳) 現代の拡散モデルはAI画像生成の最先端を確立している。その成功の一因は、著作権のある作品を含むことが多いインターネット規模のデータで訓練されていることにある。このことは、これらのモデルが人間のアーティストの作品からどの程度学習し、模倣し、あるいは複製しているのかという疑問を提起する。本研究は、生成モデルのエコシステムが変化し続けていることを踏まえ、著作権上の責任をモデルの能力に結び付けることが有用である可能性を示唆する。具体的には、著作権と生成システムに関する法的分析の多くは、保護されたデータを訓練に使用することに焦点を当てている。その結果、データ、訓練、システムの間のつながりはしばしば不明瞭になる。本研究のアプローチでは、モデルが特定のアーティストを模倣する能力を測定するために、単純な画像分類手法を検討する。具体的には、Contrastive Language-Image Pretrained (CLIP)エンコーダを用いてゼロショット方式で画像を分類する。まずモデルに特定のアーティストを模倣するよう指示し、次にCLIPを用いて模倣からアーティスト(またはアーティストの作品)を再分類できるかどうかを検証する。これらのテストで模倣が元のアーティストと照合されれば、モデルがそのアーティストの表現を模倣できることを示唆する。我々のアプローチは単純かつ定量的である。さらに、標準的な手法を用い、追加の訓練を必要としない。著作権のある作品をオンラインで公開している70人のプロのデジタルアーティストを模倣するStable Diffusionの能力を監査することで、このアプローチを実証する。この集合のアーティストを模倣するようStable Diffusionに指示すると、平均81.0%の精度で模倣からアーティストを識別できることがわかった。最後に、アーティストの作品のサンプルをこれらの模倣画像と高い統計的信頼性で照合できることも示す。全体として、これらの結果はStable Diffusionが個々の人間のアーティストの模倣に概ね成功していることを示唆している。 Modern diffusion models have set the state-of-the-art in AI image generation. Their success is due, in part, to training on Internet-scale data which often includes copyrighted work. This prompts questions about the extent to which these models learn from, imitate, or copy the work of human artists. This work suggests that tying copyright liability to the capabilities of the model may be useful given the evolving ecosystem of generative models. Specifically, much of the legal analysis of copyright and generative systems focuses on the use of protected data for training. As a result, the connections between data, training, and the system are often obscured. In our approach, we consider simple image classification techniques to measure a model's ability to imitate specific artists. 
Specifically, we use Contrastive Language-Image Pretrained (CLIP) encoders to classify images in a zero-shot fashion. Our process first prompts a model to imitate a specific artist. Then, we test whether CLIP can be used to reclassify the artist (or the artist's work) from the imitation. If these tests match the imitation back to the original artist, this suggests the model can imitate that artist's expression. Our approach is simple and quantitative. Furthermore, it uses standard techniques and does not require additional training. We demonstrate our approach with an audit of Stable Diffusion's capacity to imitate 70 professional digital artists with copyrighted work online. When Stable Diffusion is prompted to imitate an artist from this set, we find that the artist can be identified from the imitation with an average accuracy of 81.0%. Finally, we also show that a sample of the artist's work can be matched to these imitation images with a high degree of statistical reliability. Overall, these results suggest that Stable Diffusion is broadly successful at imitating individual human artists.	翻訳日:2023-07-11 16:00:20 公開日:2023-07-08
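The zero-shot step described above reduces to nearest-prompt matching in a shared embedding space. The NumPy sketch below assumes image and text embeddings have already been produced by some CLIP encoder (no model is called here) and uses random stand-in vectors; the prompt wording in the comment, the number of artists, and the embedding size are all hypothetical.

```python
import numpy as np

def zero_shot_predict(image_emb, text_emb):
    """Assign each image to the text prompt with the highest cosine similarity."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return np.argmax(img @ txt.T, axis=1)

rng = np.random.default_rng(0)
n_artists, dim = 5, 64
# Stand-in for encoded prompts such as "Artwork by <artist>" (hypothetical wording).
text_emb = rng.normal(size=(n_artists, dim))
labels = np.repeat(np.arange(n_artists), 4)   # 4 imitation images per artist
# Toy "imitations": each image embedding lies near its artist's prompt embedding.
image_emb = text_emb[labels] + 0.3 * rng.normal(size=(labels.size, dim))

pred = zero_shot_predict(image_emb, text_emb)
accuracy = (pred == labels).mean()            # fraction matched back to the right artist
print(accuracy)
```

In a real audit the accuracy would come from how close the generated images land to the intended artist's prompt, which is what the 81.0% figure in the abstract measures.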
# 頑健なランキング説明 Robust Ranking Explanations ( http://arxiv.org/abs/2307.04024v1 ) ライセンス: Link先を確認	Chao Chen, Chenghua Guo, Guixiang Ma, Ming Zeng, Xi Zhang, Sihong Xie	(参考訳) 機械学習モデルの頑健な説明は、モデルに対する人間の信頼を確立するために重要である。認知能力が限られているため、ほとんどの人間は上位数個の顕著な特徴しか解釈できない。上位の顕著な特徴を敵対的攻撃、特により脆弱な勾配ベースの説明に対する攻撃に対して頑健にすることが重要である。既存の防御手法は$\ell_p$ノルムで頑健性を測っており、保護力が弱い。我々は顕著な特徴のランキングの安定性を測る説明の厚み(explanation thickness)を定義し、厚みの扱いやすい代理境界を導出して、厚みを効率的に最大化し上位の顕著な特徴を固定する\textit{R2ET}アルゴリズムを設計する。理論的には、R2ETと敵対的訓練の関係を証明する。脳ネットワークを含む幅広いネットワークアーキテクチャとデータモダリティでの実験により、R2ETは精度を維持しながら、ステルス攻撃下でより高い説明の頑健性を達成することを示す。 Robust explanations of machine learning models are critical to establish human trust in the models. Due to limited cognition capability, most humans can only interpret the top few salient features. It is critical to make top salient features robust to adversarial attacks, especially those against the more vulnerable gradient-based explanations. Existing defense measures robustness using $\ell_p$-norms, which have weaker protection power. We define explanation thickness for measuring salient features ranking stability, and derive tractable surrogate bounds of the thickness to design the \textit{R2ET} algorithm to efficiently maximize the thickness and anchor top salient features. Theoretically, we prove a connection between R2ET and adversarial training. Experiments with a wide spectrum of network architectures and data modalities, including brain networks, demonstrate that R2ET attains higher explanation robustness under stealthy attacks while retaining accuracy.	翻訳日:2023-07-11 15:59:52 公開日:2023-07-08
# 量子状態のフォック・バルグマン表現における流体力学的鏡像法と量子解析 Method of Hydrodynamic Images and Quantum Calculus in Fock-Bargmann Representation of Quantum States ( http://arxiv.org/abs/2307.04020v1 ) ライセンス: Link先を確認	Oktay K Pashaev	(参考訳) 古典流体力学の観点から、フォック空間の量子状態への新しいアプローチを提案する。フォック・バルグマン表現において量子状態の波動関数を表す複素解析関数の等角写像により、これらの量子状態を非圧縮かつ渦なしの古典流体力学的流れとして記述する複素ポテンシャルを定義する。我々のアプローチでは、波動関数の零点は平面内の同じ強さの点渦(湧き出し)の集合として現れ、それらを有界領域における鏡像として解釈できる。猫状態については、斜めの帯状領域内の点湧き出しを記述する流体表現が得られ、周期的に分布した無限個の鏡像を持つ。円環領域では、鏡像の無限集合はJacksonの$q$指数関数で記述される。これらの関数がq-フォック・バルグマン表現における$q$変形量子振動子の量子コヒーレント状態の波動関数を表し、等比数列状に分布する点渦の無限集合を記述することを示す。 We propose a new approach to quantum states in Fock space in terms of classical hydrodynamics. By conformal mapping of complex analytic function, representing the wave function of quantum states in Fock-Bargmann representation, we define the complex potential, describing these quantum states by incompressible and irrotational classical hydrodynamic flow. In our approach, zeros of the wave function appear as a set of point vortices (sources) in plane with the same strength, allowing interpretation of them as images in a bounded domain. For the cat states we find fluid representation as descriptive of a point source in the oblique strip domain, with infinite number of periodically distributed images. For the annular domain, the infinite set of images is described by Jackson $q$-exponential functions. We show that these functions represent the wave functions of quantum coherent states of the $q$-deformed quantum oscillator in q-Fock-Bargmann representation and describe the infinite set of point vortices, distributed in geometric progression.	翻訳日:2023-07-11 15:59:35 公開日:2023-07-08
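The Jackson $q$-exponential mentioned in this abstract is the series $e_q(z)=\sum_{n\ge 0} z^n/[n]_q!$ built from the $q$-number $[n]_q=(1-q^n)/(1-q)=1+q+\dots+q^{n-1}$; it reduces to $\exp(z)$ as $q\to 1$ and to the geometric series $1/(1-z)$ at $q=0$. The short sketch below evaluates this series with an arbitrary truncation length. It does not reproduce the paper's hydrodynamic construction, and Jackson's second $q$-exponential $E_q$ is not shown.

```python
import math

def q_number(n, q):
    """[n]_q = 1 + q + ... + q**(n-1), equal to (1 - q**n) / (1 - q) for q != 1."""
    return sum(q**k for k in range(n))

def jackson_q_exp(z, q, terms=100):
    """Truncated series e_q(z) = sum_n z**n / [n]_q!  (converges for |z| < 1/(1-q), 0 <= q < 1)."""
    total, fact = 0.0, 1.0
    for n in range(terms):
        if n > 0:
            fact *= q_number(n, q)  # [n]_q! = [1]_q [2]_q ... [n]_q
        total += z**n / fact
    return total

print(jackson_q_exp(1.0, 0.999), math.e)  # nearly equal: q close to 1 recovers exp
print(jackson_q_exp(0.5, 0.0))            # 1 / (1 - 0.5) = 2 at q = 0
```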
# GP誘導MPPIによる複雑で未知の乱雑環境における効率的なナビゲーション GP-guided MPPI for Efficient Navigation in Complex Unknown Cluttered Environments ( http://arxiv.org/abs/2307.04019v1 ) ライセンス: Link先を確認	Ihab S. Mohamed, Mahmoud Ali, and Lantao Liu	(参考訳) センシング能力が限られた未知の乱雑な環境におけるロボットナビゲーションは、ロボット工学における大きな課題である。モデル予測経路積分(MPPI)のような局所軌道最適化手法は、この課題に対する有望な解決策である。しかし、特に困難な環境条件に遭遇した場合や計画ホライズンを越えて移動する場合には、効果的なナビゲーションを確保するために大域的な誘導が必要である。本研究では、MPPIとスパースガウス過程(SGP)に基づく局所知覚モデルを統合したオンライン学習型の制御戦略GP-MPPIを提案する。鍵となるアイデアは、SGPの学習能力を活用して分散(不確実性)曲面を構築することである。これによりロボットは周囲の移動可能空間を学習し、サブゴール候補の集合を特定し、最終的に事前定義したコスト関数を最小化する最適なサブゴールを局所MPPIプランナーに推薦できる。その後、MPPIはロボットの制約と衝突回避制約を満たす最適な制御系列を計算する。このアプローチでは、環境の大域地図やオフライン訓練プロセスが不要になる。複雑な未知環境における2次元自律ナビゲーションタスクのシミュレーション実験と実環境実験により、提案する制御戦略の効率性と頑健性を検証し、障害物を回避し局所最小値への捕捉から脱出しつつロボットを目標へ安全に誘導できる点での優位性を示した。補足ビデオを含むGP-MPPIのGPU実装はhttps://github.com/IhabMohamed/GP-MPPIで公開されている。 Robotic navigation in unknown, cluttered environments with limited sensing capabilities poses significant challenges in robotics. Local trajectory optimization methods, such as Model Predictive Path Integral (MPPI), are a promising solution to this challenge. However, global guidance is required to ensure effective navigation, especially when encountering challenging environmental conditions or navigating beyond the planning horizon. This study presents the GP-MPPI, an online learning-based control strategy that integrates MPPI with a local perception model based on Sparse Gaussian Process (SGP). The key idea is to leverage the learning capability of SGP to construct a variance (uncertainty) surface, which enables the robot to learn about the navigable space surrounding it, identify a set of suggested subgoals, and ultimately recommend the optimal subgoal that minimizes a predefined cost function to the local MPPI planner. Afterward, MPPI computes the optimal control sequence that satisfies the robot and collision avoidance constraints. 
Such an approach eliminates the necessity of a global map of the environment or an offline training process. We validate the efficiency and robustness of our proposed control strategy through both simulated and real-world experiments of 2D autonomous navigation tasks in complex unknown environments, demonstrating its superiority in guiding the robot safely towards its desired goal while avoiding obstacles and escaping entrapment in local minima. The GPU implementation of GP-MPPI, including the supplementary video, is available at https://github.com/IhabMohamed/GP-MPPI.	翻訳日:2023-07-11 15:59:16 公開日:2023-07-08
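For readers unfamiliar with MPPI, one iteration samples perturbed control sequences, rolls them out through a dynamics model, and averages them with weights proportional to $\exp(-S_k/\lambda)$ of their trajectory costs $S_k$. The sketch below is plain MPPI on an assumed 2D single-integrator model with hand-picked parameters (horizon, sample count, noise scale, temperature). It omits the paper's Sparse Gaussian Process subgoal selection, so the goal here is fixed rather than recommended by an SGP.

```python
import numpy as np

def mppi_step(x0, u_nom, goal, rng, K=256, lam=1.0, sigma=1.0, dt=0.1, u_max=2.0):
    """One MPPI update for an assumed single integrator x_{t+1} = x_t + u_t * dt."""
    H = u_nom.shape[0]
    eps = rng.normal(scale=sigma, size=(K, H, 2))
    U = np.clip(u_nom + eps, -u_max, u_max)        # K sampled control sequences
    X = x0 + np.cumsum(U * dt, axis=1)             # rolled-out positions, shape (K, H, 2)
    S = ((X - goal) ** 2).sum(axis=(1, 2)) + 0.01 * (U ** 2).sum(axis=(1, 2))  # costs
    w = np.exp(-(S - S.min()) / lam)               # path-integral weights (shifted for stability)
    w /= w.sum()
    return u_nom + np.einsum("k,khd->hd", w, U - u_nom)  # cost-weighted perturbation average

rng = np.random.default_rng(0)
x, goal = np.zeros(2), np.array([2.0, 2.0])
u_nom = np.zeros((20, 2))                          # horizon of 20 steps
for _ in range(60):
    u_nom = mppi_step(x, u_nom, goal, rng)
    x = x + u_nom[0] * 0.1                         # apply only the first control (receding horizon)
    u_nom = np.vstack([u_nom[1:], np.zeros((1, 2))])  # warm start: shift the plan forward
print(np.linalg.norm(x - goal))                    # small once the robot has reached the goal
```

Subtracting `S.min()` before exponentiating leaves the normalized weights unchanged but avoids underflow when costs are large.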
# 言語間の要約を再考する:コーパスに基づく研究とアノテーションの改良による新しいベンチマーク Revisiting Cross-Lingual Summarization: A Corpus-based Study and A New Benchmark with Improved Annotation ( http://arxiv.org/abs/2307.04018v1 ) ライセンス: Link先を確認	Yulong Chen, Huajian Zhang, Yijie Zhou, Xuefeng Bai, Yueguan Wang, Ming Zhong, Jianhao Yan, Yafu Li, Judy Li, Michael Zhu, Yue Zhang	(参考訳) 既存の言語間要約(CLS)研究の多くは、事前に注釈された要約をある言語から別の言語へ単純に直接翻訳してCLSコーパスを構築しており、要約と翻訳の両方の過程に由来する誤りを含みうる。この問題に対処するため、ソース入力の文脈を明示的に考慮する新しいアノテーションスキーマにより、言語間会話要約ベンチマークConvSumXを提案する。ConvSumXは異なる実世界シナリオに対応する2つのサブタスクからなり、それぞれ3つの言語方向をカバーする。ConvSumXと広く使われている3つの人手注釈CLSコーパスを徹底的に分析し、ConvSumXが入力テキストに対してより忠実であることを実証的に見いだした。さらに、同じ直観に基づき、会話と要約の両方を入力として人間のアノテーション過程を模倣する2-Step手法を提案する。実験の結果、2-Step手法はConvSumX上で自動評価・人手評価の両方において強力なベースラインを上回った。分析により、ソース入力テキストと要約の両方が言語間要約のモデル化に重要であることが示された。 Most existing cross-lingual summarization (CLS) work constructs CLS corpora by simply and directly translating pre-annotated summaries from one language to another, which can contain errors from both summarization and translation processes. To address this issue, we propose ConvSumX, a cross-lingual conversation summarization benchmark, through a new annotation schema that explicitly considers source input context. ConvSumX consists of 2 sub-tasks under different real-world scenarios, with each covering 3 language directions. We conduct thorough analysis on ConvSumX and 3 widely-used manually annotated CLS corpora and empirically find that ConvSumX is more faithful towards input text. Additionally, based on the same intuition, we propose a 2-Step method, which takes both conversation and summary as input to simulate human annotation process. Experimental results show that 2-Step method surpasses strong baselines on ConvSumX under both automatic and human evaluation. Analysis shows that both source input text and summary are crucial for modeling cross-lingual summaries.	翻訳日:2023-07-11 15:58:49 公開日:2023-07-08
# 関連バイオマーカーに敏感な急性リンパ芽球性白血病診断のための新しいパイプライン Novel Pipeline for Diagnosing Acute Lymphoblastic Sensitive to Related Biomarkers ( http://arxiv.org/abs/2307.04014v1 ) ライセンス: Link先を確認	Amirhossein Askari-Farsangi, Ali Sharifi-Zarchi, Mohammad Hossein Rohban	(参考訳) 急性リンパ芽球性白血病(ALL)は小児の血液がんで最も多い型の一つである。患者の命を救うには治療の迅速な開始が重要であり、そのためこの疾患の早期診断が不可欠である。患者の血液塗抹標本画像を調べることは、専門医がこの疾患を診断するために用いる方法の一つである。深層学習に基づく手法は近年大きく進歩し、医療分野で多くの応用がある。この分野でもALL診断は例外ではなく、この問題に対する機械学習に基づく手法がいくつか提案されている。従来手法では高い診断精度が報告されていたが、本研究は、それだけではモデルがショートカットに頼り意味のある判断を下さない可能性があるため不十分であることを示した。この問題は医療用訓練データセットが小さいことに起因する。これに対処するため、専門家の作業に着想を得たパイプラインに従うようにモデルを制約した。また、1枚の画像のみに基づく判断は不十分であるため、実用的な結果を得るには問題をマルチインスタンス学習問題として再定義する必要があることを示した。我々のモデルは、マルチインスタンス学習の設定でこの問題への解決策を提供する初めてのモデルである。血液学者が用いるプロセスを近似し、疾患バイオマーカーに敏感で、ALL IDB 1において精度96.15%、F1スコア94.24%、感度97.56%、特異度90.91%を達成する新しいALL診断パイプラインを導入した。提案手法はさらに分布外データセットでも評価され、困難なテストであったが許容できる性能を示した。特に、我々のモデルは比較的小さなデータセットで訓練されており、データの利用可能性が限られた他の医療データセットにもこのアプローチを適用できる可能性を示している。 Acute Lymphoblastic Leukemia (ALL) is one of the most common types of childhood blood cancer. The quick start of the treatment process is critical to saving the patient's life, and for this reason, early diagnosis of this disease is essential. Examining the blood smear images of these patients is one of the methods used by expert doctors to diagnose this disease. Deep learning-based methods have numerous applications in medical fields, as they have significantly advanced in recent years. ALL diagnosis is not an exception in this field, and several machine learning-based methods for this problem have been proposed. In previous methods, high diagnostic accuracy was reported, but our work showed that this alone is not sufficient, as it can lead to models taking shortcuts and not making meaningful decisions. This issue arises due to the small size of medical training datasets. To address this, we constrained our model to follow a pipeline inspired by experts' work. 
We also demonstrated that, since a judgement based on only one image is insufficient, redefining the problem as a multiple-instance learning problem is necessary for achieving a practical result. Our model is the first to provide a solution to this problem in a multiple-instance learning setup. We introduced a novel pipeline for diagnosing ALL that approximates the process used by hematologists, is sensitive to disease biomarkers, and achieves an accuracy of 96.15%, an F1-score of 94.24%, a sensitivity of 97.56%, and a specificity of 90.91% on ALL IDB 1. Our method was further evaluated on an out-of-distribution dataset, which posed a challenging test and had acceptable performance. Notably, our model was trained on a relatively small dataset, highlighting the potential for our approach to be applied to other medical datasets with limited data availability.	翻訳日:2023-07-11 15:58:29 公開日:2023-07-08
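The four figures reported for ALL IDB 1 are standard confusion-matrix metrics. The sketch below computes them from a 2x2 confusion matrix; the counts are hypothetical and not taken from the paper. As an arithmetic observation only, these particular counts reproduce all four reported percentages if the F1-score is macro-averaged over the two classes; the abstract states neither the actual confusion matrix nor the F1 averaging convention.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity and macro-averaged F1 from a 2x2 confusion matrix."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)        # recall on the positive (ALL) class
    specificity = tn / (tn + fp)        # recall on the negative (healthy) class
    f1_pos = 2 * tp / (2 * tp + fp + fn)
    f1_neg = 2 * tn / (2 * tn + fn + fp)
    macro_f1 = (f1_pos + f1_neg) / 2    # unweighted mean of the per-class F1 scores
    return accuracy, sensitivity, specificity, macro_f1

# Hypothetical counts chosen for illustration, not the paper's results.
acc, sens, spec, f1 = binary_metrics(tp=40, fn=1, fp=1, tn=10)
print([round(100 * m, 2) for m in (acc, sens, spec, f1)])  # [96.15, 97.56, 90.91, 94.24]
```

With the positive-class F1 alone (80/82, about 97.56%) the same counts would not give 94.24%, which is why the averaging convention matters when comparing reported F1-scores.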
# BPNet:3D点群上のBézierプリミティブセグメンテーション BPNet: Bézier Primitive Segmentation on 3D Point Clouds ( http://arxiv.org/abs/2307.04013v1 ) ライセンス: Link先を確認	Rao Fu, Cheng Wen, Qian Li, Xiao Xiao, Pierre Alliez	(参考訳) 本稿では、3D点群上のBézierプリミティブセグメンテーションを学習する新しいエンドツーエンド深層学習フレームワークBPNetを提案する。既存研究は異なるプリミティブ型を個別に扱うため、有限の形状カテゴリに限定される。この問題に対処するため、点群上での汎用的なプリミティブセグメンテーションを目指す。NURBSモデルのBézier分解に着想を得て、それをプリミティブ型に依存しない点群セグメンテーションの指針として転用する。カスケード型アーキテクチャ上でBézierプリミティブセグメンテーションと幾何フィッティングを同時に学習する統合最適化フレームワークを提案する。具体的には、プリミティブセグメンテーションを改善するソフト投票正則化項を導入し、点特徴をクラスタリングする自動重み付け埋め込みモジュールを提案して、ネットワークをより頑健かつ汎用的にする。また、異なるプリミティブを持つ複数のCADモデルを同時に処理できる再構成モジュールも導入する。ABC合成データセットと実スキャンデータセットで広範な実験を行い、提案手法を検証して様々なベースライン手法と比較した。実験により、セグメンテーションにおいて従来研究より優れた性能を、大幅に高速な推論速度で達成することを示した。 This paper proposes BPNet, a novel end-to-end deep learning framework to learn Bézier primitive segmentation on 3D point clouds. The existing works treat different primitive types separately, thus limiting them to finite shape categories. To address this issue, we seek a generalized primitive segmentation on point clouds. Taking inspiration from Bézier decomposition on NURBS models, we transfer it to guide point cloud segmentation casting off primitive types. A joint optimization framework is proposed to learn Bézier primitive segmentation and geometric fitting simultaneously on a cascaded architecture. Specifically, we introduce a soft voting regularizer to improve primitive segmentation and propose an auto-weight embedding module to cluster point features, making the network more robust and generic. We also introduce a reconstruction module where we successfully process multiple CAD models with different primitives simultaneously. We conducted extensive experiments on the synthetic ABC dataset and real-scan datasets to validate and compare our approach with different baseline methods. 
Experiments show superior performance over previous work in terms of segmentation, with a substantially faster inference speed.	翻訳日:2023-07-11 15:57:57 公開日:2023-07-08
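BPNet fits Bézier primitives; evaluating one amounts to repeated linear interpolation (de Casteljau's algorithm), which agrees with the Bernstein-polynomial form $\sum_i \binom{n}{i} t^i (1-t)^{n-i} P_i$. The sketch below shows a curve and a tensor-product patch (the surface case relevant to CAD primitives). The control points are arbitrary, and nothing here reproduces BPNet's segmentation network or fitting losses.

```python
import numpy as np
from math import comb

def de_casteljau(ctrl, t):
    """Evaluate a Bézier curve with control points ctrl (n+1, dim) at parameter t in [0, 1]."""
    pts = np.asarray(ctrl, dtype=float)
    while len(pts) > 1:
        pts = (1 - t) * pts[:-1] + t * pts[1:]  # one round of linear interpolation
    return pts[0]

def bernstein_eval(ctrl, t):
    """The same point via the Bernstein form sum_i C(n, i) t^i (1-t)^(n-i) P_i."""
    ctrl = np.asarray(ctrl, dtype=float)
    n = len(ctrl) - 1
    w = np.array([comb(n, i) * t**i * (1 - t) ** (n - i) for i in range(n + 1)])
    return w @ ctrl

def bezier_patch(ctrl, u, v):
    """Tensor-product Bézier surface: ctrl has shape (m+1, n+1, dim)."""
    rows = [de_casteljau(row, v) for row in ctrl]  # collapse each row along v
    return de_casteljau(rows, u)                    # then collapse the results along u

curve = [[0, 0, 0], [1, 2, 0], [3, 2, 1], [4, 0, 1]]  # arbitrary cubic control polygon
print(de_casteljau(curve, 0.3), bernstein_eval(curve, 0.3))  # identical points

P = np.arange(27, dtype=float).reshape(3, 3, 3)       # arbitrary 3x3 grid of 3D control points
print(bezier_patch(P, 0.5, 0.5))
```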
# PapillArray光触覚センサを用いたロボット把持改善のためのロバストな学習ベースの初期すべり検出 Robust Learning-Based Incipient Slip Detection using the PapillArray Optical Tactile Sensor for Improved Robotic Gripping ( http://arxiv.org/abs/2307.04011v1 ) ライセンス: Link先を確認	Qiang Wang, Pablo Martinez Ulloa, Robert Burke, David Cordova Bulens, and Stephen J. Redmond	(参考訳) すべり、特に初期すべりを検出できれば、ロボットシステムは把持した物体の落下を防ぐための補正措置を取ることができる。したがって、すべり検出はロボット把持の全体的な安全性を高めうる。しかし、初期すべりを正確に検出することは依然として大きな課題である。本稿では、PapillArray (Contactile, Australia) 触覚センサを用いて初期すべりを検出する新しい学習ベースの手法を提案する。得られたモデルは初期すべりに関連するパターンの識別に非常に有効であり、オフラインデータセットでのテストで95.6%の検出成功率を達成した。さらに、モデルの頑健性を高めるためにいくつかのデータ拡張手法を導入する。訓練データを収集した環境とは異なるロボット把持環境に学習済みモデルを移した場合でも、モデルは96.8%の成功率で頑健な性能を維持し、いくつかの実用的な把持タスクを安定化するためのタイムリーなフィードバックを提供した。プロジェクトのWebサイト: https://sites.google.com/view/incipient-slip-detection 。 The ability to detect slip, particularly incipient slip, enables robotic systems to take corrective measures to prevent a grasped object from being dropped. Therefore, slip detection can enhance the overall security of robotic gripping. However, accurately detecting incipient slip remains a significant challenge. In this paper, we propose a novel learning-based approach to detect incipient slip using the PapillArray (Contactile, Australia) tactile sensor. The resulting model is highly effective in identifying patterns associated with incipient slip, achieving a detection success rate of 95.6% when tested with an offline dataset. Furthermore, we introduce several data augmentation methods to enhance the robustness of our model. When transferring the trained model to a robotic gripping environment distinct from where the training data was collected, our model maintained robust performance, with a success rate of 96.8%, providing timely feedback for stabilizing several practical gripping tasks. Our project website: https://sites.google.com/view/incipient-slip-detection.	翻訳日:2023-07-11 15:57:38 公開日:2023-07-08
# 戦略的買い手による文脈付き動的価格設定 Contextual Dynamic Pricing with Strategic Buyers ( http://arxiv.org/abs/2307.04055v1 ) ライセンス: Link先を確認	Pangpang Liu, Zhuoran Yang, Zhaoran Wang, Will Wei Sun	(参考訳) 個人の特性に基づいて価格を調整するパーソナライズド価格設定は、消費者ごとの価格政策を実施するために企業で広く用いられている。この過程で、購入者は一定の操作コストを払って特徴データを戦略的に操作し、より低い価格を得ることもできる。このような戦略的行動は、企業の利益最大化を妨げうる。本稿では、戦略的な購入者を伴う文脈付き動的価格設定問題を研究する。売り手は購入者の真の特徴ではなく、購入者の戦略的行動に従って操作された特徴を観測する。さらに、売り手は購入者の製品評価額を観測せず、販売が成立したかどうかを示す二値応答のみを観測する。これらの課題を踏まえ、購入者の戦略的行動をオンライン学習に組み込み、売り手の累積収益を最大化する戦略的動的価格政策を提案する。まず、購入者の戦略的行動を無視する既存の非戦略的価格政策は、総時間ホライズンを$T$として線形の$\Omega(T)$リグレットを生じることを証明し、これらの政策がランダムな価格政策より優れていないことを示す。次に、提案政策が$O(\sqrt{T})$の劣線形リグレット上界を達成することを示す。重要なのは、我々の政策が既存の動的価格政策と戦略的行動処理アルゴリズムの単なる寄せ集めではない点である。我々の政策は、操作の限界コストが事前に未知である場合にも対応できる。その場合は、オンライン価格政策において評価パラメータとコストパラメータを同時に推定し、この場合も$O(\sqrt{T})$のリグレット上界を達成することを示す。大規模な実験は理論的結果を裏付け、戦略的行動を考慮しない他の価格政策と比べて我々の政策が優れた性能を持つことを示す。 Personalized pricing, which involves tailoring prices based on individual characteristics, is commonly used by firms to implement a consumer-specific pricing policy. In this process, buyers can also strategically manipulate their feature data to obtain a lower price, incurring certain manipulation costs. Such strategic behavior can hinder firms from maximizing their profits. In this paper, we study the contextual dynamic pricing problem with strategic buyers. The seller does not observe the buyer's true feature, but a manipulated feature according to buyers' strategic behavior. In addition, the seller does not observe the buyers' valuation of the product, but only a binary response indicating whether a sale happens or not. Recognizing these challenges, we propose a strategic dynamic pricing policy that incorporates the buyers' strategic behavior into the online learning to maximize the seller's cumulative revenue. 
We first prove that existing non-strategic pricing policies that neglect the buyers' strategic behavior result in a linear $\Omega(T)$ regret with $T$ the total time horizon, indicating that these policies are not better than a random pricing policy. We then establish that our proposed policy achieves a sublinear regret upper bound of $O(\sqrt{T})$. Importantly, our policy is not a mere amalgamation of existing dynamic pricing policies and strategic behavior handling algorithms. Our policy can also accommodate the scenario when the marginal cost of manipulation is unknown in advance. To account for it, we simultaneously estimate the valuation parameter and the cost parameter in the online pricing policy, which is shown to also achieve an $O(\sqrt{T})$ regret bound. Extensive experiments support our theoretical developments and demonstrate the superior performance of our policy compared to other pricing policies that are unaware of the strategic behaviors.	翻訳日:2023-07-11 15:50:00 公開日:2023-07-08
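The worked definition below shows one common way to formalize regret for posted prices with binary (sale / no sale) feedback. The paper's exact formulation may differ, in particular in how manipulated features enter. The notation is assumed for illustration: $v_t$ is the buyer's valuation in round $t$, $p_t$ is the posted price, and $p_t^{*}$ is the revenue-maximizing price given the buyer's true feature.

$$
\mathrm{Regret}(T) = \sum_{t=1}^{T} \mathbb{E}\left[ p_t^{*}\,\mathbb{1}\{v_t \ge p_t^{*}\} - p_t\,\mathbb{1}\{v_t \ge p_t\} \right]
$$

Dividing by $T$ gives the average loss per round. An $\Omega(T)$ regret means $\mathrm{Regret}(T)/T$ stays bounded away from zero, so the policy never learns to price well. An $O(\sqrt{T})$ bound gives $\mathrm{Regret}(T)/T = O(1/\sqrt{T}) \to 0$.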
# スパイクタイミング依存塑性を用いた深層教師なし学習 Deep Unsupervised Learning Using Spike-Timing-Dependent Plasticity ( http://arxiv.org/abs/2307.04054v1 ) ライセンス: Link先を確認	Sen Lu, Abhronil Sengupta	(参考訳) Spike-Timing-Dependent Plasticity (STDP)は、スパイキングニューラルネットワーク(SNN)の教師なし学習メカニズムであり、ニューロモルフィックハードウェアコミュニティから大きな注目を集めている。しかし、そのような局所学習手法をより深いネットワークや大規模タスクに拡張することは、いまだ実現されていない。本研究では、ネットワーク出力に対するSTDPクラスタリング過程によって生成された擬似ラベルを用いて、畳み込みネットワークを並行して訓練するDeep-STDPフレームワークを検討する。$k$-meansクラスタリングによるアプローチと比較して、Tiny ImageNetデータセットの10クラスのサブセットにおいて、精度が$24.56\%$向上し、同精度での収束速度が$3.5\times$高速になった。 Spike-Timing-Dependent Plasticity (STDP) is an unsupervised learning mechanism for Spiking Neural Networks (SNNs) that has received significant attention from the neuromorphic hardware community. However, scaling such local learning techniques to deeper networks and large-scale tasks has remained elusive. In this work, we investigate a Deep-STDP framework where a convolutional network is trained in tandem with pseudo-labels generated by the STDP clustering process on the network outputs. We achieve $24.56\%$ higher accuracy and $3.5\times$ faster convergence speed at iso-accuracy on a 10-class subset of the Tiny ImageNet dataset in contrast to a $k$-means clustering approach.	翻訳日:2023-07-11 15:49:25 公開日:2023-07-08
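Deep-STDP builds on STDP learning. The minimal sketch below shows the classic pair-based STDP rule. A synapse is strengthened when the presynaptic spike precedes the postsynaptic one and weakened otherwise, and the size of the change decays exponentially with the spike-time gap. The amplitudes and time constants are illustrative assumptions. This is not the paper's Deep-STDP clustering procedure.

```python
import math


def stdp_delta_w(delta_t, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP weight change for delta_t = t_post - t_pre (in ms).

    Parameter values are illustrative assumptions, not taken from the paper.
    """
    if delta_t > 0:
        # Pre fires before post: potentiation, decaying with the time gap.
        return a_plus * math.exp(-delta_t / tau_plus)
    if delta_t < 0:
        # Post fires before pre: depression, decaying with the time gap.
        return -a_minus * math.exp(delta_t / tau_minus)
    # Simultaneous spikes: no change in this simple sketch.
    return 0.0


for dt in (-10.0, -1.0, 1.0, 10.0):
    print(dt, stdp_delta_w(dt))
```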
# シンガポールにおいて父親像はオンラインでどのように枠付けられているか How is Fatherhood Framed Online in Singapore? ( http://arxiv.org/abs/2307.04053v1 ) ライセンス: Link先を確認	Tran Hien Van, Abhay Goyal, Muhammad Siddique, Lam Yin Cheung, Nimay Parekh, Jonathan Y Huang, Keri McCrickerd, Edson C Tandoc Jr., Gerard Chung, Navin Kumar	(参考訳) シンガポールにおける父親であることに関する議論の急増はその重要性を示しており、父親像がどのように枠付けられているかを探ることで、シンガポールにおける父親に関する政策決定を支援する必要性を示している。シンガポールの父親に関する健全で包括的な政策は、親になることへのスティグマや不安を減らす可能性があり、これは低迷する出生率の改善に不可欠である。15,705件の記事と56,221件の投稿を分析し、シンガポールのさまざまなオンラインプラットフォーム(ニュースメディア、育児フォーラム、Twitter)において父親像がどのように枠付けられているかを調査した。これらの違いを理解するためにNLP手法を用いた。シンガポールのオンライン環境において父親像はさまざまな形で枠付けられていたが、父親がシンガポールの家族単位の中心として枠付けられているようには見えなかった。本研究の強みは、適用したさまざまな手法が互いを裏付けている点にある。 The proliferation of discussion about fatherhood in Singapore attests to its significance, indicating the need for an exploration of how fatherhood is framed, aiding policy-making around fatherhood in Singapore. Sound and holistic policy around fatherhood in Singapore may reduce stigma and apprehension around being a parent, critical to improving the nation's flagging birth rate. We analyzed 15,705 articles and 56,221 posts to study how fatherhood is framed in Singapore across a range of online platforms (news outlets, parenting forums, Twitter). We used NLP techniques to understand these differences. While fatherhood was framed in a range of ways on the Singaporean online environment, it did not seem that fathers were framed as central to the Singaporean family unit. A strength of our work is how the different techniques we have applied validate each other.	翻訳日:2023-07-11 15:49:13 公開日:2023-07-08
# 分子のための補助データセットのグループ化学習 Learning to Group Auxiliary Datasets for Molecule ( http://arxiv.org/abs/2307.04052v1 ) ライセンス: Link先を確認	Tinglin Huang, Ziniu Hu, Rex Ying	(参考訳) 小分子データセットにおけるアノテーションの入手可能性の制限は、機械学習モデルに課題をもたらす。これに対処する一般的な戦略は、追加の補助データセットと組み合わせて学習することである。しかし、より多くのデータがあっても必ずしも改善が保証されるわけではない。ターゲットデータセットの知識が補助分子データセットの知識と異なる場合や矛盾する場合、負の転移が起こりうる。これを踏まえると、共同学習時にターゲットデータセットに利益をもたらす補助分子データセットを特定することは、依然として重要かつ未解決の問題である。経験的分析により、グラフ構造の類似性とタスクの類似性を組み合わせることが、高親和性の補助データセットを特定するためのより信頼性の高い指標となりうることを確認した。この知見に基づき、データセットの親和性をタスク親和性と構造親和性に分離して各補助分子データセットの潜在的な利益を予測するMolGroupを提案する。MolGroupは、二段階(bi-level)最適化フレームワークによって最適化されるルーティング機構を利用してこれを実現する。メタ勾配を用いて、ルーティング機構はターゲットデータセットの性能を最大化するように最適化され、親和性をゲーティングスコアとして定量化する。その結果、MolGroupは各ターゲットデータセットに対する補助データセットの最適な組み合わせを予測できる。大規模な実験によりMolGroupの効率性と有効性が示され、11の標的分子データセットにおいて、MolGroupが選択した分子データセット群で学習したGIN/Graphormerで平均4.41%/3.47%の改善が得られた。 The limited availability of annotations in small molecule datasets presents a challenge to machine learning models. To address this, one common strategy is to collaborate with additional auxiliary datasets. However, having more data does not always guarantee improvements. Negative transfer can occur when the knowledge in the target dataset differs or contradicts that of the auxiliary molecule datasets. In light of this, identifying the auxiliary molecule datasets that can benefit the target dataset when jointly trained remains a critical and unresolved problem. Through an empirical analysis, we observe that combining graph structure similarity and task similarity can serve as a more reliable indicator for identifying high-affinity auxiliary datasets. Motivated by this insight, we propose MolGroup, which separates the dataset affinity into task and structure affinity to predict the potential benefits of each auxiliary molecule dataset. MolGroup achieves this by utilizing a routing mechanism optimized through a bi-level optimization framework. 
Empowered by the meta gradient, the routing mechanism is optimized toward maximizing the target dataset's performance and quantifies the affinity as the gating score. As a result, MolGroup is capable of predicting the optimal combination of auxiliary datasets for each target dataset. Our extensive experiments demonstrate the efficiency and effectiveness of MolGroup, showing an average improvement of 4.41%/3.47% for GIN/Graphormer trained with the group of molecule datasets selected by MolGroup on 11 target molecule datasets.	翻訳日:2023-07-11 15:48:58 公開日:2023-07-08
# トラックサービスネットワークにおける動的負荷計画のための最適化に基づく学習 Optimization-based Learning for Dynamic Load Planning in Trucking Service Networks ( http://arxiv.org/abs/2307.04050v1 ) ライセンス: Link先を確認	Ritesh Ojha, Wenbo Chen, Hanyu Zhang, Reem Khir, Alan Erera, Pascal Van Hentenryck	(参考訳) 負荷計画問題は、パーセルキャリアのサービスネットワーク設計において重要な課題であり、端末間の時間的ディスパッチを割り当てるトレーラー(またはロード)の数を決定する。もうひとつの重要な課題は、計画された負荷にどのようにパーセルボリュームを割り当てるかを指定するフロープランを決定することだ。本稿では,需要予測が運用開始前の時間とともに変化する中で,負荷と流れを調整するための流れと負荷計画の課題を共同で考慮した動的負荷計画問題(dlpp)について考察する。この論文は、ネットワーク全体の端末でこれらの決定を行うプランナーに通知する意思決定支援ツールの開発を目的としている。本論文は,DLPPをMIPとして定式化し,各商品を一次経路および代替経路にルーティング可能なネットワークにおいて,多数の対称性を有することを示す。その結果、最適化解法は基本的に異なる解を密接に関連する問題に還元し、プランナーを混乱させ、最適化の信頼を減らすことができる。この制限を緩和するために,参照計画に近い最適解を生成することで,これらの対称性を解消するゴール指向最適化を提案する。また,最適化モデルの計算課題に対処するための最適化プロキシを提案する。このプロキシは、機械学習モデルと実現可能性復元モデルを組み合わせて、プランナーがループ内で課すリアルタイム制約を満たすソリューションを見つける。産業インスタンスに関する広範な計算研究により、最適化プロキシは、互いに整合性のあるソリューションを生成する上で、同じ品質のソリューションと桁違いの順序を得る際に、商用の解決器よりも約10倍高速であることが示された。提案手法は,負荷統合のためのDLPPの利点と,機械学習と最適化を組み合わせることで得られる大幅な節約効果を示す。 The load planning problem is a critical challenge in service network design for parcel carriers: it decides how many trailers (or loads) to assign for dispatch over time between pairs of terminals. Another key challenge is to determine a flow plan, which specifies how parcel volumes are assigned to planned loads. This paper considers the Dynamic Load Planning Problem (DLPP) that considers both flow and load planning challenges jointly to adjust loads and flows as the demand forecast changes over time before the day of operations. The paper aims at developing a decision-support tool to inform planners making these decisions at terminals across the network. The paper formulates the DLPP as a MIP and shows that it admits a large number of symmetries in a network where each commodity can be routed through primary and alternate paths. 
As a result, an optimization solver may return fundamentally different solutions to closely related problems, confusing planners and reducing trust in optimization. To remedy this limitation, the paper proposes a Goal-Directed Optimization that eliminates those symmetries by generating optimal solutions staying close to a reference plan. The paper also proposes an optimization proxy to address the computational challenges of the optimization models. The proxy combines a machine learning model and a feasibility restoration model and finds solutions that satisfy real-time constraints imposed by planners-in-the-loop. An extensive computational study on industrial instances shows that the optimization proxy is around 10 times faster than the commercial solver in obtaining the same quality solutions and orders of magnitude faster for generating solutions that are consistent with each other. The proposed approach also demonstrates the benefits of the DLPP for load consolidation, and the significant savings obtained from combining machine learning and optimization.	翻訳日:2023-07-11 15:48:37 公開日:2023-07-08
# 並列アルゴリズムはニューラル実行と整合する Parallel Algorithms Align with Neural Execution ( http://arxiv.org/abs/2307.04049v1 ) ライセンス: Link先を確認	Valerie Engelmayer, Dobrik Georgiev, Petar Veličković	(参考訳) ニューラルアルゴリズム推論器は並列プロセッサである。逐次アルゴリズムを教えることはこの性質に反し、計算のかなりの部分を冗長にする。一方、並列アルゴリズムはその計算能力を最大限に活用でき、実行に必要な層数も少なくて済む。CLRSフレームワーク上で、探索、ソート、強連結成分の発見の並列実装を逐次的な対応版と比較すると、訓練時間が劇的に短縮されることが観察された。さらに、ほとんどの場合、並列版は大幅に優れた予測性能を達成する。 Neural algorithmic reasoners are parallel processors. Teaching them sequential algorithms contradicts this nature, rendering a significant share of their computations redundant. Parallel algorithms however may exploit their full computational power, therefore requiring fewer layers to be executed. This drastically reduces training times, as we observe when comparing parallel implementations of searching, sorting and finding strongly connected components to their sequential counterparts on the CLRS framework. Additionally, parallel versions achieve strongly superior predictive performance in most cases.	翻訳日:2023-07-11 15:48:07 公開日:2023-07-08
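Odd-even transposition sort, a textbook parallel sorting algorithm, makes the sequential-versus-parallel contrast concrete. It was chosen for illustration only; the abstract does not say which parallel sorting algorithm the paper uses. Within one round, every compare-exchange touches a disjoint pair of positions, so a parallel processor could execute the whole round at once. For $n$ elements, $n$ rounds always suffice. The sketch simulates the rounds sequentially.

```python
def odd_even_transposition_sort(values):
    """Sort a list with n rounds of disjoint compare-exchange steps.

    All compare-exchanges inside one round touch different index pairs,
    so on parallel hardware they could run simultaneously; here they are
    simulated one after another.
    """
    a = list(values)
    n = len(a)
    for rnd in range(n):
        # Even rounds compare pairs (0,1),(2,3),...; odd rounds (1,2),(3,4),...
        start = rnd % 2
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


print(odd_even_transposition_sort([5, 1, 4, 2, 3]))
```

The number of rounds (the "depth") grows linearly with $n$. This parallel depth is what a fixed stack of message-passing layers can mirror.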
# キャリブレーション・アウェア・マージン損失:深層距離学習における精度と校正一貫性のパレートフロンティアの押し上げ Calibration-Aware Margin Loss: Pushing the Accuracy-Calibration Consistency Pareto Frontier for Deep Metric Learning ( http://arxiv.org/abs/2307.04047v1 ) ライセンス: Link先を確認	Qin Zhang, Linghan Xu, Qingming Tang, Jun Fang, Ying Nian Wu, Joe Tighe, Yifan Xing	(参考訳) 異なるテストクラス/分布にわたって同じ距離しきい値を使用できることは、商用画像検索システムを円滑に展開するうえで非常に望ましい。しかし、最先端の深層距離学習損失は、しばしばクラス内およびクラス間の埋め込み構造に大きなばらつきを生じさせ、しきい値の校正を実際には自明でないプロセスにする。本稿では、対象の校正範囲において異なるクラス間の動作特性のばらつきを測定するOPIS(Operating-Point-Inconsistency-Score)と呼ばれる新しい指標を提案し、距離学習埋め込みモデルの精度が高くても、既知クラス・未知クラスの双方に対して校正一貫性が保証されないことを示す。高精度の領域では、精度の向上が校正一貫性の低下を代償とするパレートフロンティアが存在することが分かった。そこで、学習中にクラス間の表現構造の均一性を促すために、CAM(Calibration-Aware Margin)損失という新たな正則化を開発した。広範な実験により、CAMが精度を維持あるいは向上させつつ校正一貫性を改善し、最先端の深層距離学習手法を上回ることが示された。 The ability to use the same distance threshold across different test classes / distributions is highly desired for a frictionless deployment of commercial image retrieval systems. However, state-of-the-art deep metric learning losses often result in highly varied intra-class and inter-class embedding structures, making threshold calibration a non-trivial process in practice. In this paper, we propose a novel metric named Operating-Point-Inconsistency-Score (OPIS) that measures the variance in the operating characteristics across different classes in a target calibration range, and demonstrate that high accuracy of a metric learning embedding model does not guarantee calibration consistency for both seen and unseen classes. We find that, in the high-accuracy regime, there exists a Pareto frontier where accuracy improvement comes at the cost of calibration consistency. To address this, we develop a novel regularization, named Calibration-Aware Margin (CAM) loss, to encourage uniformity in the representation structures across classes during training. 
Extensive experiments demonstrate CAM's effectiveness in improving calibration-consistency while retaining or even enhancing accuracy, outperforming state-of-the-art deep metric learning methods.	翻訳日:2023-07-11 15:47:58 公開日:2023-07-08
# 敵対的訓練による非パラメトリック回帰のための深層ニューラルネットワーク推定器の上限ノルム収束 Sup-Norm Convergence of Deep Neural Network Estimator for Nonparametric Regression by Adversarial Training ( http://arxiv.org/abs/2307.04042v1 ) ライセンス: Link先を確認	Masaaki Imaizumi	(参考訳) 新しい敵対的訓練方式により、深層ニューラルネットワーク推定器の上限ノルム(sup-norm)収束を示す。非パラメトリック回帰問題に対して、深層ニューラルネットワークを用いた推定器は、$L^2$ノルムの意味でより良い性能を達成できることが示されている。一方、ニューラルネットワークモデルの深い構造のため、最小二乗法によるニューラル推定器が上限ノルム収束を達成することは困難である。本研究では、敵対的訓練方式を開発し、深層ニューラルネットワーク推定器の上限ノルム収束を調べる。第1に、通常の敵対的訓練ではニューラル推定器が一致性を持たなくなることを見いだした。第2に、提案する補正付きの敵対的訓練により、深層ニューラルネットワーク推定器が上限ノルムの意味で最適なレートを達成することを示す。さらに、敵対的訓練を損失関数とデータ生成関数の一般的な設定に拡張する。実験結果は理論的な結果を支持している。 We show the sup-norm convergence of deep neural network estimators with a novel adversarial training scheme. For the nonparametric regression problem, it has been shown that an estimator using deep neural networks can achieve better performances in the sense of the $L^2$-norm. In contrast, it is difficult for the neural estimator with least-squares to achieve the sup-norm convergence, due to the deep structure of neural network models. In this study, we develop an adversarial training scheme and investigate the sup-norm convergence of deep neural network estimators. First, we find that ordinary adversarial training makes neural estimators inconsistent. Second, we show that a deep neural network estimator achieves the optimal rate in the sup-norm sense by the proposed adversarial training with correction. We extend our adversarial training to general setups of a loss function and a data-generating function. Our experiments support the theoretical findings.	翻訳日:2023-07-11 15:47:34 公開日:2023-07-08
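The two error notions contrasted above can be written as follows. The notation is assumed here: $\hat f$ is the estimator, $f^{*}$ is the true regression function, and $P_X$ is the covariate distribution.

$$
\|\hat f - f^{*}\|_{L^2(P_X)} = \left( \int \big(\hat f(x) - f^{*}(x)\big)^2 \, dP_X(x) \right)^{1/2},
\qquad
\|\hat f - f^{*}\|_{\infty} = \sup_{x} \big|\hat f(x) - f^{*}(x)\big|.
$$

Because $P_X$ is a probability measure, $\int g^2\,dP_X \le \sup_x g(x)^2$, so $\|g\|_{L^2(P_X)} \le \|g\|_\infty$. Sup-norm convergence therefore implies $L^2$ convergence, but the converse fails: an estimator can have small average error while still deviating sharply on a small region of the input space.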
# 局所的説明による人間と畳み込みニューラルネットワーク間の直接フィードバックループの設計 Designing a Direct Feedback Loop between Humans and Convolutional Neural Networks through Local Explanations ( http://arxiv.org/abs/2307.04036v1 ) ライセンス: Link先を確認	Tong Steven Sun, Yuyang Gao, Shubham Khaladkar, Sijia Liu, Liang Zhao, Young-Ho Kim, Sungsoo Ray Hong	(参考訳) 局所的説明は、畳み込みニューラルネットワーク(CNN)がどのように出力を導出するかを説明するために、画像上にヒートマップを提供する。その視覚的な分かりやすさから、この手法はCNNを診断するための最も一般的な説明可能なAI(XAI)手法の1つとなっている。しかし、形成的研究(S1)を通じて、MLエンジニアが局所的説明を、CNN構築において価値があり不可欠なものと見なす一方で、脆弱性検出のヒューリスティックな性質ゆえに自分たちを疲弊させるプロセスとも捉えているという、両義的な見方を明らかにした。さらに、診断から得られた脆弱性に基づいてCNNを修正していくことは非常に困難であるように見えた。このギャップを軽減するために、局所的説明を用いてCNNの脆弱性を診断・修正する際に、ユーザとCNNの間の直接フィードバックループを実現する初のインタラクティブデザインであるDeepFuseを設計した。DeepFuseは、CNNエンジニアが「不合理な」局所的説明を体系的に探索し、不合理と判定された説明に対して新しい境界を労力効率よく注釈できるよう支援する。次に、与えられた注釈に基づいてモデルを修正し、モデルが同様の誤りを起こさないようにする。CNN経験者12名を対象に2日間の研究(S2)を行った。DeepFuseを使うことで、参加者は現在の最先端モデルよりも正確で「合理的」なモデルを作成した。また参加者は、DeepFuseが事例ベースの推論を導く方法が、現在の実践を実際に改善できると感じた。将来のHCI駆動の設計が実践をどのように前進させ、XAI駆動の知見をより実用的にできるかについて、設計上の示唆を提供する。 The local explanation provides heatmaps on images to explain how Convolutional Neural Networks (CNNs) derive their output. Due to its visual straightforwardness, the method has been one of the most popular explainable AI (XAI) methods for diagnosing CNNs. Through our formative study (S1), however, we captured ML engineers' ambivalent perspective about the local explanation as a valuable and indispensable envision in building CNNs versus the process that exhausts them due to the heuristic nature of detecting vulnerability. Moreover, steering the CNNs based on the vulnerability learned from the diagnosis seemed highly challenging. To mitigate the gap, we designed DeepFuse, the first interactive design that realizes the direct feedback loop between a user and CNNs in diagnosing and revising CNN's vulnerability using local explanations. 
DeepFuse helps CNN engineers to systematically search "unreasonable" local explanations and annotate the new boundaries for those identified as unreasonable in a labor-efficient manner. Next, it steers the model based on the given annotation such that the model doesn't introduce similar mistakes. We conducted a two-day study (S2) with 12 experienced CNN engineers. Using DeepFuse, participants made a more accurate and "reasonable" model than the current state-of-the-art. Also, participants found the way DeepFuse guides case-based reasoning can practically improve their current practice. We provide implications for design that explain how future HCI-driven design can move our practice forward to make XAI-driven insights more actionable.	翻訳日:2023-07-11 15:47:22 公開日:2023-07-08
# 量子変分アルゴリズムにおけるショット数最小化の新しい枠組み A novel framework for Shot number minimization in Quantum Variational Algorithms ( http://arxiv.org/abs/2307.04035v1 ) ライセンス: Link先を確認	Seyed Sajad Kahani and Amin Nobakhti	(参考訳) 変分量子アルゴリズム(VQA)は、近い将来、様々な量子コンピューティングアプリケーションに対する潜在的な解決策として注目されている。しかし、これらのアルゴリズムを量子デバイスに実装するには、しばしばかなりの量の測定が必要であり、結果として時間とリソース集約プロセスが生じる。本稿では,VQAにおけるショット評価の削減を目的とした最適化アルゴリズムの一般化フレームワークを提案する。提案するフレームワークは,推定器と最適化器を組み合わせたものである。本フレームワーク内の2つのケーススタディについて検討する。第1のケースでは,サンプル平均推定器と模擬焼鈍最適化器をペアリングし,第2のケースでは再帰的推定器と勾配降下最適化器を組み合わせる。いずれの場合も,提案手法が従来の手法と比較して顕著な性能向上をもたらすことを示す。 Variational Quantum Algorithms (VQAs) have gained significant attention as a potential solution for various quantum computing applications in the near term. However, implementing these algorithms on quantum devices often necessitates a substantial number of measurements, resulting in time-consuming and resource-intensive processes. This paper presents a generalized framework for optimization algorithms aiming to reduce the number of shot evaluations in VQAs. The proposed framework combines an estimator and an optimizer. We investigate two specific case studies within this framework. In the first case, we pair a sample mean estimator with a simulated annealing optimizer, while in the second case, we combine a recursive estimator with a gradient descent optimizer. In both instances, we demonstrate that our proposed approach yields notable performance enhancements compared to conventional methods.	翻訳日:2023-07-11 15:46:53 公開日:2023-07-08
# 連続単語エキスパートの混合としての双方向注意 Bidirectional Attention as a Mixture of Continuous Word Experts ( http://arxiv.org/abs/2307.04057v1 ) ライセンス: Link先を確認	Kevin Christian Wibisono, Yixin Wang	(参考訳) 位置エンコーディング付きの自己注意とマスク言語モデル(MLM)目的関数から構成される双方向注意は、現代の大規模言語モデル(LLM)の重要な構成要素となっている。実証的な成功にもかかわらず、その統計的基盤を調べた研究はほとんどない。双方向注意は暗黙的にどのような統計モデルを当てはめているのか? 注意機構を持たない先行モデルとは何が違うのか? 本論文ではこれらの疑問を探求する。鍵となる観察は、単層単一ヘッドの双方向注意を当てはめることが、再パラメータ化すると、混合エキスパート(MoE)重みを持つ連続単語袋(CBOW)モデルを当てはめることと等価になるという点である。さらに、複数ヘッドおよび複数層の双方向注意は、それぞれ積み重ねたMoEおよびMoEの混合と等価である。この統計的視点は、双方向注意におけるMoEの特徴的な使い方を明らかにし、これは異種データを扱う際の実用上の有効性と整合する。また、文中の各単語位置を表形式の特徴とみなせば、カテゴリカルな表形式データへの直接的な拡張も示唆される。実証研究全体を通じて、この拡張は分布外(OOD)汎化において既存のTransformerの表形式データ向け拡張を上回ることが分かった。最後に、この統計的視点により、双方向注意の単語埋め込みに線形な単語類推が現れる条件を理論的に特徴付けることができる。これらの分析から、双方向注意が線形な単語類推を示すには、注意機構を持たない先行モデルよりもはるかに強い仮定が必要になりうることが示された。 Bidirectional attention, composed of self-attention with positional encodings and the masked language model (MLM) objective, has emerged as a key component of modern large language models (LLMs). Despite its empirical success, few studies have examined its statistical underpinnings: What statistical model is bidirectional attention implicitly fitting? What sets it apart from its non-attention predecessors? We explore these questions in this paper. The key observation is that fitting a single-layer single-head bidirectional attention, upon reparameterization, is equivalent to fitting a continuous bag of words (CBOW) model with mixture-of-experts (MoE) weights. Further, bidirectional attention with multiple heads and multiple layers is equivalent to stacked MoEs and a mixture of MoEs, respectively. This statistical viewpoint reveals the distinct use of MoE in bidirectional attention, which aligns with its practical effectiveness in handling heterogeneous data. It also suggests an immediate extension to categorical tabular data, if we view each word location in a sentence as a tabular feature. 
Across empirical studies, we find that this extension outperforms existing tabular extensions of transformers in out-of-distribution (OOD) generalization. Finally, this statistical perspective of bidirectional attention enables us to theoretically characterize when linear word analogies are present in its word embeddings. These analyses show that bidirectional attention can require much stronger assumptions to exhibit linear word analogies than its non-attention predecessors.	翻訳日:2023-07-11 15:38:42 公開日:2023-07-08
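The mixture view rests on a basic property of softmax attention: for one query position, the output is a convex combination of the value vectors, with non-negative weights that sum to one. The sketch below shows only that property for a single head, using scaled dot-product scores in pure Python. It does not reproduce the paper's CBOW/MoE reparameterization.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query position.

    Returns the output vector and the mixing weights over positions.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Output = sum_i weights[i] * values[i]: a weighted average of value vectors.
    out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return out, weights


out, w = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
                [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(out, w)
```

Because the weights are a probability vector, each output coordinate lies between the smallest and largest corresponding value coordinate.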
# 多様体フィルタ結合ネットワーク Manifold Filter-Combine Networks ( http://arxiv.org/abs/2307.04056v1 ) ライセンス: Link先を確認	Joyce Chew and Edward De Brouwer and Smita Krishnaswamy and Deanna Needell and Michael Perlmutter	(参考訳) 我々は、Manifold Filter-Combine Networksと呼ぶ、多様体ニューラルネットワーク(MNN)の大きなクラスを導入する。このクラスには特別な場合として、Wang、Ruiz、Ribeiroによる先行研究で考慮されたMNN、多様体散乱変換(ニューラルネットワークのウェーブレットベースのモデル)、およびKipfとWellingのグラフ畳み込みネットワークの多様体版のような、これまで文献で考えられていなかった興味深い例が含まれる。次に、多様体の大域的な知識を持たず、有限個のサンプル点にのみアクセスできる場合に、データ駆動グラフを構築してそのようなネットワークを実装する手法を考える。サンプル点の数が無限大に近づくにつれて、ネットワークがその連続極限に証明可能に収束するための十分条件を与える。(特定のMNNアーキテクチャやグラフ構成に焦点を当てた)先行研究とは異なり、我々の収束レートは使用するフィルタの数に明示的には依存しない。さらに、ネットワークの深さに対して、従来得られていた指数的な依存ではなく線形な依存を示す。 We introduce a large class of manifold neural networks (MNNs) which we call Manifold Filter-Combine Networks. This class includes as special cases, the MNNs considered in previous work by Wang, Ruiz, and Ribeiro, the manifold scattering transform (a wavelet-based model of neural networks), and other interesting examples not previously considered in the literature such as the manifold equivalent of Kipf and Welling's graph convolutional network. We then consider a method, based on building a data-driven graph, for implementing such networks when one does not have global knowledge of the manifold, but merely has access to finitely many sample points. We provide sufficient conditions for the network to provably converge to its continuum limit as the number of sample points tends to infinity. Unlike previous work (which focused on specific MNN architectures and graph constructions), our rate of convergence does not explicitly depend on the number of filters used. Moreover, it exhibits linear dependence on the depth of the network rather than the exponential dependence obtained previously.	翻訳日:2023-07-11 15:38:09 公開日:2023-07-08
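The abstract mentions a manifold counterpart of Kipf and Welling's graph convolutional layer. On an ordinary graph, that layer propagates features with the symmetrically normalized adjacency matrix: $H' = \sigma(\tilde D^{-1/2}\tilde A\tilde D^{-1/2} H W)$, where $\tilde A = A + I$ and $\tilde D$ is the degree matrix of $\tilde A$. The minimal pure-Python sketch below implements one such layer with ReLU as $\sigma$. In the manifold setting, the graph would instead be built from sample points, which this sketch does not attempt.

```python
import math


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]
    return [[inv_sqrt[i] * a_hat[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Dense matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(adj, features, weight):
    """One graph convolutional layer with ReLU activation."""
    a_norm = normalized_adjacency(adj)
    h = matmul(matmul(a_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in h]


# Two connected nodes: each output averages both nodes' features.
print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))
```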
# PUFFIN: 蒸気圧予測のためのパス統一フィードフォワードインタフェースネットワーク PUFFIN: A Path-Unifying Feed-Forward Interfaced Network for Vapor Pressure Prediction ( http://arxiv.org/abs/2307.02903v2 ) ライセンス: Link先を確認	Vinicius Viena Santana, Carine Menezes Rebello, Luana P. Queiroz, Ana Mafalda Ribeiro, Nadia Shardt, and Idelfonso B. R. Nogueira	(参考訳) 蒸気圧の正確な予測は、様々な産業・環境用途に不可欠である。しかし、実験には多くの資源と労力が必要なため、関心のあるすべての化合物について正確な測定値を得ることはできない。蒸気圧を予測するための温度依存の関係が求められる場合、資源と労力の需要はさらに増大する。本稿では、転移学習とドメイン知識(アントワーヌ方程式)に着想を得た新しい帰納バイアスノードを組み合わせて蒸気圧予測を改善する機械学習フレームワーク、PUFFIN(Path-Unifying Feed-Forward Interfaced Network)を提案する。グラフ埋め込みを用いた帰納バイアスと転移学習を活用することで、PUFFINは帰納バイアスを使用しない、あるいは化合物の汎用記述子を使用する代替戦略よりも優れた性能を示す。データの入手可能性の乏しさという制約を克服するためにドメイン固有の知識を組み込むこのフレームワークは、他の物理化学的性質の予測を含む化学化合物分析へのより広い応用の可能性を示している。重要なことに、帰納的アントワーヌノードがネットワーク由来のアントワーヌ方程式係数を出力するため、提案する機械学習フレームワークは部分的に解釈可能である。得られた解析式をプロセス設計ソフトウェアに直接組み込むことで、産業や環境で生じるプロセスの予測と制御を改善できる可能性がある。 Accurately predicting vapor pressure is vital for various industrial and environmental applications. However, obtaining accurate measurements for all compounds of interest is not possible due to the resource and labor intensity of experiments. The demand for resources and labor further multiplies when a temperature-dependent relationship for predicting vapor pressure is desired. In this paper, we propose PUFFIN (Path-Unifying Feed-Forward Interfaced Network), a machine learning framework that combines transfer learning with a new inductive bias node inspired by domain knowledge (the Antoine equation) to improve vapor pressure prediction. By leveraging inductive bias and transfer learning using graph embeddings, PUFFIN outperforms alternative strategies that do not use inductive bias or that use generic descriptors of compounds. The framework's incorporation of domain-specific knowledge to overcome the limitation of poor data availability shows its potential for broader applications in chemical compound analysis, including the prediction of other physicochemical properties. 
Importantly, our proposed machine learning framework is partially interpretable, because the inductive Antoine node yields network-derived Antoine equation coefficients. It would then be possible to directly incorporate the obtained analytical expression in process design software for better prediction and control of processes occurring in industry and the environment.	翻訳日:2023-07-11 10:21:33 公開日:2023-07-08
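PUFFIN's inductive-bias node is inspired by the Antoine equation, $\log_{10} P = A - B/(C + T)$. The sketch below only evaluates that equation with fixed coefficients. In PUFFIN the coefficients are produced by the network, which is not reproduced here. The water constants are commonly tabulated values, with pressure in mmHg and temperature in °C, valid roughly between 1 and 100 °C.

```python
def antoine_pressure(temp_c, a, b, c):
    """Vapor pressure from the Antoine equation: log10(P) = A - B / (C + T)."""
    return 10 ** (a - b / (c + temp_c))


# Commonly tabulated Antoine constants for water (P in mmHg, T in degrees C).
WATER = (8.07131, 1730.63, 233.426)

p_boil = antoine_pressure(100.0, *WATER)  # close to 760 mmHg (1 atm) at the normal boiling point
print(round(p_boil, 1))
```

Because the three coefficients fully specify the temperature dependence, predicting $(A, B, C)$ instead of a single pressure value yields a closed-form curve that can be evaluated at any temperature within the valid range.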
# ディープラーニングを用いた光リモートセンシング画像における指向性物体検出 Oriented Object Detection in Optical Remote Sensing Images using Deep Learning: A Survey ( http://arxiv.org/abs/2302.10473v3 ) ライセンス: Link先を確認	Kun Wang, Zi Wang, Zhang Li, Ang Su, Xichao Teng, Minhao Liu and Qifeng Yu	(参考訳) 指向性物体検出は、リモートセンシングにおける最も基本的かつ困難なタスクの1つであり、多数の事前定義された物体カテゴリに属する指向性物体の位置を特定することを目的としている。近年、光リモートセンシング画像における指向性物体の検出において、深層学習に基づく手法が顕著な成果を上げている。しかし、リモートセンシング分野の文献を網羅したレビューはまだ行われていない。そこで我々は近年の進歩を包括的に調査し、問題定義、一般的なデータセット、評価プロトコル、検出フレームワーク、指向性物体の表現、特徴表現など、指向性物体検出の多くの側面を取り上げる。さらに、最先端の手法を分析し、考察する。最後に、今後の研究の方向性を議論し、有用な研究指針を示す。この調査は学界や産業界の研究者にとって有益であると考えている。 Oriented object detection is one of the most fundamental and challenging tasks in remote sensing, aiming at locating the oriented objects of numerous predefined object categories. Recently, deep learning based methods have achieved remarkable performance in detecting oriented objects in optical remote sensing imagery. However, a thorough review of the literature in remote sensing has not yet emerged. Therefore, we give a comprehensive survey of recent advances and cover many aspects of oriented object detection, including problem definition, commonly used datasets, evaluation protocols, detection frameworks, oriented object representations, and feature representations. Besides, the state-of-the-art methods are analyzed and discussed. We finally discuss future research directions to put forward some useful research guidance. We believe that this survey shall be valuable to researchers across academia and industry.	翻訳日:2023-07-11 10:19:13 公開日:2023-07-08
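Oriented detectors usually describe each object with a rotated box instead of an axis-aligned one. A widely used parameterization is centre, width, height and rotation angle, $(c_x, c_y, w, h, \theta)$. The sketch converts such a box to its four corner points. The counter-clockwise, radians convention is an assumption here. Angle definitions and ranges differ between datasets and codebases.

```python
import math


def obb_to_corners(cx, cy, w, h, theta):
    """Corners of a w x h box centred at (cx, cy), rotated by theta radians.

    Assumes a counter-clockwise rotation convention; corners are returned
    in the order of the unrotated box: bottom-left, bottom-right,
    top-right, top-left (in a y-up frame).
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each corner offset about the centre, then translate.
    return [(cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
            for dx, dy in half]


print(obb_to_corners(0.0, 0.0, 2.0, 1.0, math.pi / 4))
```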
# Proto-CLIP:Few-Shot Learningのためのビジョン言語プロトタイプネットワーク Proto-CLIP: Vision-Language Prototypical Network for Few-Shot Learning ( http://arxiv.org/abs/2307.03073v2 ) ライセンス: Link先を確認	Jishnu Jaykumar P, Kamalesh Palanisamy, Yu-Wei Chao, Xinya Du, Yu Xiang	(参考訳) 本稿では,CLIPのような大規模視覚言語モデルを活用することで,数ショット学習のための新しいフレームワークを提案する。初歩学習のためのユニモーダルな原型的ネットワークに動機づけられ,初歩学習に画像プロトタイプとテキストプロトタイプを利用するproto-clipを導入した。具体的には、PROTO-CLIPは、CLIP内の画像エンコーダとテキストエンコーダを、少数の例を用いて共同で適応させる。 2つのエンコーダは、分類のための画像クラスのプロトタイプを計算するために使用される。適応中に、対応するクラスの画像とテキストのプロトタイプの整列を提案する。このようなアライメントは、両タイプのプロトタイプからの貢献により、少数ショットの分類に有用である。本手法の有効性を,数発の学習のためのベンチマークデータセットと,ロボットの知覚のための実世界で実験することで実証する。 We propose a novel framework for few-shot learning by leveraging large-scale vision-language models such as CLIP. Motivated by the unimodal prototypical networks for few-shot learning, we introduce PROTO-CLIP that utilizes image prototypes and text prototypes for few-shot learning. Specifically, PROTO-CLIP adapts the image encoder and text encoder in CLIP in a joint fashion using few-shot examples. The two encoders are used to compute prototypes of image classes for classification. During adaptation, we propose aligning the image and text prototypes of corresponding classes. Such a proposed alignment is beneficial for few-shot classification due to the contributions from both types of prototypes. We demonstrate the effectiveness of our method by conducting experiments on benchmark datasets for few-shot learning as well as in the real world for robot perception.	翻訳日:2023-07-11 10:11:45 公開日:2023-07-08
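PROTO-CLIP builds on prototypical classification. Each class is summarized by a prototype, here the mean of its normalized support embeddings, and a query is assigned to the most similar prototype. The sketch below scores queries against both image and text prototypes using a hypothetical mixing weight `alpha`. The toy 2-D vectors stand in for CLIP embeddings. The paper's encoders, adaptation and prototype alignment are not modeled.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def class_prototypes(embeddings, labels):
    """Mean of normalized support embeddings per class, re-normalized."""
    sums, counts = {}, {}
    for emb, lab in zip(embeddings, labels):
        e = normalize(emb)
        if lab not in sums:
            sums[lab] = [0.0] * len(e)
            counts[lab] = 0
        sums[lab] = [s + x for s, x in zip(sums[lab], e)]
        counts[lab] += 1
    return {lab: normalize([s / counts[lab] for s in sums[lab]]) for lab in sums}


def classify(query, image_protos, text_protos, alpha=0.5):
    """Pick the class with the highest blended cosine similarity.

    `alpha` (hypothetical) weights image prototypes against text prototypes.
    """
    q = normalize(query)

    def cos(p):
        return sum(a * b for a, b in zip(q, normalize(p)))

    scores = {lab: alpha * cos(image_protos[lab]) + (1 - alpha) * cos(text_protos[lab])
              for lab in image_protos}
    return max(scores, key=scores.get)


img = class_prototypes([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
                       ["cat", "cat", "dog", "dog"])
txt = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}  # stand-ins for text-encoder prototypes
print(classify([0.8, 0.2], img, txt))
```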

# バグレポートに関する明確化質問に回答するためのディープラーニングと構造化情報検索の利用

Employing Deep Learning and Structured Information Retrieval to Answer Clarification Questions on Bug Reports ( http://arxiv.org/abs/2304.12494v3 )

ライセンス: Link先を確認

Usmi Mukherjee and Mohammad Masudur Rahman

(参考訳) バグ追跡システムに報告されるソフトウェアバグレポートは、開発者が迅速に解決するために必要な重要な情報を欠いていることが多く、企業に数十億ドルの損失をもたらしている。バグレポーターが使用しなければならないさまざまなテンプレートを用いて、バグ追跡システムにおいてバグレポーターから効果的に情報を引き出すための研究が数多く行われてきた。しかし、フォローアップ質問をする必要性はなくなっていない。最近の研究では、開発者が不足した詳細を得られるよう、こうしたフォローアップ質問を提案する手法が提案されているが、しばしば回答されないまま残るこれらのフォローアップ質問に回答する研究はほとんど行われていない。本稿では、CodeT5とLuceneを組み合わせた新しい手法を提案する。Luceneは、異なるバグレポート、その構成要素、およびフォローアップ質問の関連性を活用して回答を推薦する情報検索手法である。これら上位の回答は、そのバグレポートとともに、不備のあるバグレポート以外の追加コンテキストとして、回答を生成するディープラーニングモデルに与えられる。推薦した回答を、正規化Smooth BLEUスコア、METEOR、Word Mover's Distance、Semantic Similarityなどの類似度指標を用いて、手作業で注釈付けされた回答と比較して評価した。最大34のBLEUスコアと最大64のSemantic Similarityを達成し、生成された回答がGoogleの基準で理解可能かつ良好であり、複数のベースラインを上回りうることを示した。

Software bug reports reported on bug-tracking systems often lack crucial information for the developers to promptly resolve them, costing companies billions of dollars. There has been significant research on effectively eliciting information from bug reporters in bug-tracking systems using different templates that bug reporters need to use. However, the need for asking follow-up questions persists. Recent studies propose techniques to suggest these follow-up questions to help developers obtain the missing details, but there has been little research on answering these follow-up questions, which are often unanswered. In this paper, we propose a novel approach that uses CodeT5 in combination with Lucene, an information retrieval technique that leverages the relevance of different bug reports, their components, and follow-up questions to recommend answers. These top-performing answers, along with their bug report, serve as additional context apart from the deficient bug report to the deep learning model for generating an answer. We evaluate our recommended answers against the manually annotated answers using similarity metrics like Normalized Smooth BLEU Score, METEOR, Word Mover's Distance, and Semantic Similarity. We achieve a BLEU Score of up to 34 and Semantic Similarity of up to 64, which shows that the answers generated are understandable and good according to Google's standard and can outperform multiple baselines.

翻訳日:2023-10-24 12:37:00 公開日:2023-07-08

# ReviewRanker: コードレビュー品質推定のための半教師付き学習に基づくアプローチ

ReviewRanker: A Semi-Supervised Learning Based Approach for Code Review Quality Estimation ( http://arxiv.org/abs/2307.03996v1 )

ライセンス: Link先を確認

Saifullah Mahbub, Md. Easin Arafat, Chowdhury Rafeed Rahman, Zannatul Ferdows, Masum Hasan

(参考訳) コードレビューは、バグを最小限にし、コード品質を改善するための、ソフトウェア業界における重要なプロセスと考えられている。レビュープロセスの有効性を検査し継続的に改善することで、開発生産性を高めることができる。このような検査は時間がかかり、人間のバイアスが入りやすい作業である。本稿では、レビューの品質を反映すると期待される信頼性スコアを各コードレビューに割り当てることを目的とした、半教師付き学習ベースのシステムReviewRankerを提案する。提案手法は、開発者が提供するシンプルで明確に定義されたラベルに基づいて訓練される。ラベル付け作業は開発者にほとんど労力を要求せず、最終目標(レビュー信頼性スコアの割り当て)とは間接的な関係にある。ReviewRankerは、そのような作業に伴う人間のバイアスと労力を減らすことで、業界全体のコードレビュー品質検査を改善することが期待される。このシステムは、開発プロセスとレビュープロセスの間に存在するやり取りの往復を最小化する可能性がある。本研究で使用可能なコードとデータセットは https://github.com/saifarnab/code_review にある。

Code review is considered a key process in the software industry for minimizing bugs and improving code quality. Inspection of review process effectiveness and continuous improvement can boost development productivity. Such inspection is a time-consuming and human-bias-prone task. We propose ReviewRanker, a semi-supervised learning based system that aims to assign each code review a confidence score that is expected to reflect the quality of the review. Our proposed method is trained on simple and well-defined labels provided by developers. The labeling task requires little to no effort from the developers and has an indirect relation to the end goal (assignment of review confidence score). ReviewRanker is expected to improve industry-wide code review quality inspection by reducing the human bias and effort required for such a task. The system has the potential of minimizing the back-and-forth cycle existing in the development and review process. Usable code and dataset for this research can be found at: https://github.com/saifarnab/code_review

Translated: 2023-10-23 18:17:16, Published: 2023-07-08

# Exploring Automated Code Evaluation Systems and Resources for Code Analysis: A Comprehensive Survey

Exploring Automated Code Evaluation Systems and Resources for Code Analysis: A Comprehensive Survey ( http://arxiv.org/abs/2307.08705v1 )

License: see link for details

Md. Mostafizer Rahman, Yutaka Watanobe, Atsushi Shirafuji and Mohamed Hamada

The automated code evaluation system (AES) is mainly designed to reliably assess user-submitted code. Due to their extensive range of applications and the accumulation of valuable resources, AESs are becoming increasingly popular. However, research on the application of AESs and the exploration of their real-world resources for diverse coding tasks is still lacking. In this study, we conducted a comprehensive survey of AESs and their resources. This survey explores the application areas of AESs, available resources, and resource utilization for coding tasks. Depending on their application, AESs are categorized into programming contests, programming learning and education, recruitment, online compilers, and additional modules. We explore the available datasets and other resources of these systems for research, analysis, and coding tasks. Moreover, we provide an overview of machine learning-driven coding tasks, such as bug detection, code review, comprehension, refactoring, search, representation, and repair, which are performed using real-life datasets. In addition, we briefly discuss the Aizu Online Judge (AOJ) platform as a real example of an AES from the perspectives of system design (hardware and software), operation (competition and education), and research. We chose AOJ because of its scalability (programming education, competitions, and practice), open internal features (hardware and software), attention from the research community, open source data (e.g., solution codes and submission documents), and transparency. We also analyze the overall performance of this system and the challenges perceived over the years.

Translated: 2023-10-23 17:11:05, Published: 2023-07-08

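# Combining transmission speckle photography and convolutional neural network for determination of fat content in cow's milk -- an exercise in classification of parameters of a complex suspension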

Combining transmission speckle photography and convolutional neural network for determination of fat content in cow's milk -- an exercise in classification of parameters of a complex suspension ( http://arxiv.org/abs/2307.15069v1 )

License: see link for details

Kwasi Nyandey (1 and 2) and Daniel Jakubczyk (1) ((1) Institute of Physics, Polish Academy of Sciences, Warsaw, Poland (2) Laser and Fibre Optics Centre, Department of Physics, School of Physical Sciences, College of Agriculture and Natural Sciences, University of Cape Coast, Cape Coast, Ghana)

We have combined transmission speckle photography and machine learning for direct classification and recognition of milk fat content classes. Our approach hinges on the fact that the parameters of scattering particles (and the dispersion medium) can be linked to the intensity distribution (speckle) observed when coherent light is transmitted through a scattering medium. For milk, these parameters are primarily the size distribution and concentration of fat globules, which constitute the total fat content. Consequently, we trained a convolutional neural network to recognise and classify laser speckle from different fat content classes (0.5, 1.5, 2.0 and 3.2%). We investigated four exposure-time protocols and obtained the highest performance for shorter exposure times, in which the intensity histograms are kept similar for all images and the most probable intensity in the speckle pattern is close to zero. Our neural network was able to recognise the milk fat content classes unambiguously, with the highest test and independent classification accuracies of 100% and ~99%, respectively. This indicates that the parameters of other complex realistic suspensions could be classified with similar methods.

Translated: 2023-08-06 11:37:27, Published: 2023-07-08

# Copilot for Xcode: Exploring AI-Assisted Programming by Prompting Cloud-based Large Language Models

Copilot for Xcode: Exploring AI-Assisted Programming by Prompting Cloud-based Large Language Models ( http://arxiv.org/abs/2307.14349v1 )

License: see link for details

Chee Wei Tan, Shangxin Guo, Man Fai Wong, Ching Nam Hang

This paper presents an AI-assisted programming tool called Copilot for Xcode, which supports human software developers in program composition and design. By seamlessly integrating cloud-based Large Language Models (LLMs) with Apple's local development environment, Xcode, this tool enhances productivity and unleashes creativity for software development in the Apple software ecosystem (e.g., iOS apps, macOS). Leveraging advanced natural language processing (NLP) techniques, Copilot for Xcode effectively processes source code tokens and patterns within code repositories, enabling features such as code generation, autocompletion, documentation, and error detection. Software developers can also query and make "small" decisions for program composition, some of which can be made simultaneously, through prompt engineering in the chat interface of Copilot for Xcode. Finally, we present simple case studies as evidence of the effectiveness of using NLP in Xcode to prompt popular LLM services such as OpenAI ChatGPT for program composition and design.

Translated: 2023-07-30 03:57:44, Published: 2023-07-08

# Can LLMs be Good Financial Advisors?: An Initial Study in Personal Decision Making for Optimized Outcomes

Can LLMs be Good Financial Advisors?: An Initial Study in Personal Decision Making for Optimized Outcomes ( http://arxiv.org/abs/2307.07422v1 )

License: see link for details

Kausik Lakkaraju, Sai Krishna Revanth Vuruma, Vishal Pallagani, Bharath Muppasani, Biplav Srivastava

Increasingly powerful Large Language Model (LLM) based chatbots, like ChatGPT and Bard, are becoming available to users and have the potential to revolutionize the quality of decision-making achieved by the public. In this context, we set out to investigate how such systems perform in the personal finance domain, where financial inclusion has been an overarching stated aim of banks for decades. We asked 13 questions representing banking products in personal finance (bank accounts, credit cards, and certificates of deposit) and their inter-product interactions, as well as decisions related to high-value purchases, payment of bank dues, and investment advice, in different dialects and languages (English, African American Vernacular English, and Telugu). We find that although the chatbots' outputs are fluent and plausible, there are still critical gaps in providing accurate and reliable financial information using LLM-based chatbots.

Translated: 2023-07-23 12:26:09, Published: 2023-07-08

# Incomplete Utterance Rewriting as Sequential Greedy Tagging

Incomplete Utterance Rewriting as Sequential Greedy Tagging ( http://arxiv.org/abs/2307.06337v1 )

License: see link for details

Yunshan Chen

The task of incomplete utterance rewriting has recently attracted much attention. Previous models struggled to extract information from the dialogue context, as evidenced by their low restoration scores. To address this issue, we propose a novel sequence tagging-based model that is more adept at extracting information from context. We also introduce speaker-aware embedding to model speaker variation. Experiments on multiple public datasets show that our model achieves the best results on all nine restoration scores while having other metric scores comparable to previous state-of-the-art models. Furthermore, benefiting from the model's simplicity, our approach outperforms most previous models in inference speed.

Translated: 2023-07-16 03:14:38, Published: 2023-07-08

# Predicting Race and Ethnicity From the Sequence of Characters in a Name

Predicting Race and Ethnicity From the Sequence of Characters in a Name ( http://arxiv.org/abs/1805.02109v2 )

License: see link for details

Rajashekar Chintalapati, Suriyan Laohaprapanon, and Gaurav Sood

To answer questions about racial inequality and fairness, we often need a way to infer race and ethnicity from names. One way to infer race and ethnicity from names is to rely on the Census Bureau's list of popular last names. The list, however, suffers from at least three limitations: 1. it only contains last names, 2. it only includes popular last names, and 3. it is updated once every 10 years. To provide better generalization and higher accuracy when first names are available, we model the relationship between the characters in a name and race and ethnicity using various techniques. A model using Long Short-Term Memory works best, with an out-of-sample accuracy of .85. The best-performing last-name model achieves an out-of-sample accuracy of .81. To illustrate the utility of the models, we apply them to campaign finance data to estimate the share of donations made by people of various racial groups, and to news data to estimate the coverage of various races and ethnicities in the news.
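The abstract models a name as a sequence of characters. Before such a sequence can be fed to a Long Short-Term Memory layer, it has to be turned into integer ids of a fixed length. Below is a minimal sketch of one way to do that. The vocabulary, the `PAD`/`UNK` ids and `encode_name` are hypothetical choices for illustration, and the LSTM itself is not shown.

```python
import string

PAD, UNK = 0, 1
# Hypothetical vocabulary: lowercase ASCII letters, space, hyphen and apostrophe.
VOCAB = {ch: i + 2 for i, ch in enumerate(string.ascii_lowercase + " -'")}


def encode_name(name, max_len=20):
    """Map a name to exactly max_len character ids, truncating or padding."""
    # Characters outside the vocabulary (e.g. accented letters) become UNK.
    ids = [VOCAB.get(ch, UNK) for ch in name.lower()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))
```

Fixed-length padded ids are the usual input format for an embedding layer followed by an LSTM. A character-level vocabulary also lets the model handle rare last names that a lookup list would miss.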

Translated: 2023-07-13 20:54:43, Published: 2023-07-08

# Kencorpus: A Kenyan Language Corpus of Swahili, Dholuo and Luhya for Natural Language Processing Tasks

Kencorpus: A Kenyan Language Corpus of Swahili, Dholuo and Luhya for Natural Language Processing Tasks ( http://arxiv.org/abs/2208.12081v2 )

License: see link for details

Barack Wanjawa, Lilian Wanzare, Florence Indede, Owen McOnyango, Edward Ombui, Lawrence Muchemi

Indigenous African languages are categorized as under-served in Natural Language Processing. As a result, they experience poor digital inclusivity and information access. The processing challenge with such languages has been how to use machine learning and deep learning models without the requisite data. The Kencorpus project intends to bridge this gap by collecting and storing text and speech data that is good enough for data-driven solutions in applications such as machine translation, question answering, and transcription in multilingual communities. The Kencorpus dataset is a text and speech corpus for three languages predominantly spoken in Kenya: Swahili, Dholuo and Luhya. Data collection was done by researchers from communities, schools, media, and publishers. The Kencorpus dataset has a collection of 5,594 items: 4,442 texts (5.6M words) and 1,152 speech files (177 hrs). Based on this data, Part of Speech tagging sets for Dholuo and Luhya (50,000 and 93,000 words respectively) were developed. We developed 7,537 Question-Answer pairs for Swahili and created a text translation set of 13,400 sentences from Dholuo and Luhya into Swahili. The datasets are useful for downstream machine learning tasks such as model training and translation. We also developed two proof-of-concept systems, one for Kiswahili speech-to-text and one machine learning system for the Question Answering task, with results of 18.87% word error rate and 80% Exact Match (EM), respectively. These initial results show great promise for the usability of Kencorpus to the machine learning community. Kencorpus is one of few public domain corpora for these three low-resource languages and forms a basis for learning and sharing experiences in similar work, especially for low-resource languages.

Translated: 2023-07-13 20:24:16, Published: 2023-07-08

# Symmetric Tensor Networks for Generative Modeling and Constrained Combinatorial Optimization

Symmetric Tensor Networks for Generative Modeling and Constrained Combinatorial Optimization ( http://arxiv.org/abs/2211.09121v3 )

License: see link for details

Javier Lopez-Piqueres, Jing Chen, Alejandro Perdomo-Ortiz

Constrained combinatorial optimization problems abound in industry, from portfolio optimization to logistics. One of the major roadblocks in solving these problems is the presence of non-trivial hard constraints, which limit the valid search space. In some heuristic solvers, these are typically addressed by introducing certain Lagrange multipliers in the cost function, by relaxing them in some way, or worse yet, by generating many samples and only keeping valid ones, which leads to very expensive and inefficient searches. In this work, we encode arbitrary integer-valued equality constraints of the form Ax=b directly into U(1)-symmetric tensor networks (TNs) and leverage their applicability as quantum-inspired generative models to assist in the search for solutions to combinatorial optimization problems. This allows us to exploit the generalization capabilities of TN generative models while constraining them so that they only output valid samples. Our constrained TN generative model efficiently captures the constraints while reducing the number of parameters and the computational cost. We find that on tasks with constraints given by arbitrary equalities, symmetric Matrix Product States outperform their standard unconstrained counterparts at finding novel and better solutions to combinatorial optimization problems.
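The key idea is that a U(1) symmetry lets a sequential model carry a conserved "charge", so that only bitstrings satisfying Ax=b can be produced. The sketch below is not a tensor network. It only illustrates that charge bookkeeping with a small dynamic program that walks the variables left to right, tracks the running value of A x, and samples uniformly among the binary solutions. The name `sample_valid` and the uniform sampling target are assumptions; in the paper the probabilities come from a trained symmetric Matrix Product State instead. The number of states grows with the range of reachable charges, so this brute-force version is only meant for small examples.

```python
import random
from functools import lru_cache


def sample_valid(A, b, rng):
    """Draw a binary x with A x = b, uniformly over all binary solutions.

    A is a list of integer rows, b a list of integers, rng a random.Random.
    The running "charge" A[:, :k] x[:k] plays the role of the conserved U(1)
    quantum number carried between sites of a symmetric tensor network.
    """
    m, n = len(A), len(A[0])
    target = tuple(b)

    def step(charge, k, bit):
        # Charge after setting variable k to `bit`.
        return tuple(charge[r] + A[r][k] * bit for r in range(m))

    @lru_cache(maxsize=None)
    def count(k, charge):
        # Number of completions x[k:] that bring `charge` exactly to `target`.
        if k == n:
            return 1 if charge == target else 0
        return count(k + 1, step(charge, k, 0)) + count(k + 1, step(charge, k, 1))

    charge = (0,) * m
    if count(0, charge) == 0:
        raise ValueError("A x = b has no binary solution")
    x = []
    for k in range(n):
        options, weights = [], []
        for bit in (0, 1):
            nxt = step(charge, k, bit)
            w = count(k + 1, nxt)
            if w:  # prune choices that can no longer reach the target charge
                options.append((bit, nxt))
                weights.append(w)
        # Weighting by completion counts makes every valid x equally likely.
        bit, charge = rng.choices(options, weights=weights)[0]
        x.append(bit)
    return x
```

For example, with `A = [[1, 1, 1, 1], [1, 0, 1, 0]]` and `b = [2, 1]`, every sample has exactly one of x0, x2 set and exactly one of x1, x3 set. No draw is ever wasted on an invalid configuration, which is the property the abstract contrasts with rejection-style sampling.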

Translated: 2023-07-13 20:04:23, Published: 2023-07-08

# Deep Generative Models for Decision-Making and Control

Deep Generative Models for Decision-Making and Control ( http://arxiv.org/abs/2306.08810v2 )

License: see link for details

Michael Janner

Deep model-based reinforcement learning methods offer a conceptually simple approach to the decision-making and control problem: use learning for the purpose of estimating an approximate dynamics model, and offload the rest of the work to classical trajectory optimization. However, this combination has a number of empirical shortcomings, limiting the usefulness of model-based methods in practice. The dual purpose of this thesis is to study the reasons for these shortcomings and to propose solutions for the uncovered problems. Along the way, we highlight how inference techniques from the contemporary generative modeling toolbox, including beam search, classifier-guided sampling, and image inpainting, can be reinterpreted as viable planning strategies for reinforcement learning problems.

Translated: 2023-07-13 18:59:43, Published: 2023-07-08

# Coupling high-overtone bulk acoustic wave resonators via superconducting qubits

Coupling high-overtone bulk acoustic wave resonators via superconducting qubits ( http://arxiv.org/abs/2307.05544v1 )

License: see link for details

Wayne Crump, Alpo Välimaa, and Mika A. Sillanpää

In this work, we present a device consisting of two coupled transmon qubits, each of which is coupled to an independent high-overtone bulk acoustic wave resonator (HBAR). Both HBAR resonators support a plethora of acoustic modes, which can couple near-resonantly to the corresponding qubit. We first show qubit-qubit interaction in the multimode system, and finally quantum state transfer, in which an excitation is swapped from an HBAR mode of one qubit to an HBAR mode of the other qubit.

Translated: 2023-07-13 16:27:30, Published: 2023-07-08

# Typology of Risks of Generative Text-to-Image Models

Typology of Risks of Generative Text-to-Image Models ( http://arxiv.org/abs/2307.05543v1 )

License: see link for details

Charlotte Bird and Eddie L. Ungless and Atoosa Kasirzadeh

This paper investigates the direct risks and harms associated with modern text-to-image generative models, such as DALL-E and Midjourney, through a comprehensive literature review. While these models offer unprecedented capabilities for generating images, their development and use introduce new types of risk that require careful consideration. Our review reveals significant knowledge gaps concerning the understanding and treatment of these risks despite some already being addressed. We offer a taxonomy of risks across six key stakeholder groups, inclusive of unexplored issues, and suggest future research directions. We identify 22 distinct risk types, spanning issues from data bias to malicious use. The investigation presented here is intended to enhance the ongoing discourse on responsible model development and deployment. By highlighting previously overlooked risks and gaps, it aims to shape subsequent research and governance initiatives, guiding them toward the responsible, secure, and ethically conscious evolution of text-to-image models.

Translated: 2023-07-13 16:27:20, Published: 2023-07-08

# High Fidelity 3D Hand Shape Reconstruction via Scalable Graph Frequency Decomposition

High Fidelity 3D Hand Shape Reconstruction via Scalable Graph Frequency Decomposition ( http://arxiv.org/abs/2307.05541v1 )

License: see link for details

Tianyu Luan, Yuanhao Zhai, Jingjing Meng, Zhong Li, Zhang Chen, Yi Xu, and Junsong Yuan

Despite the impressive performance obtained by recent single-image hand modeling techniques, they lack the capability to capture sufficient details of the 3D hand mesh. This deficiency greatly limits their applications when high-fidelity hand modeling is required, e.g., personalized hand modeling. To address this problem, we design a frequency split network to generate 3D hand mesh using different frequency bands in a coarse-to-fine manner. To capture high-frequency personalized details, we transform the 3D mesh into the frequency domain, and propose a novel frequency decomposition loss to supervise each frequency component. By leveraging such a coarse-to-fine scheme, hand details that correspond to the higher frequency domain can be preserved. In addition, the proposed network is scalable, and can stop the inference at any resolution level to accommodate different hardware with varying computational powers. To quantitatively evaluate the performance of our method in terms of recovering personalized shape details, we introduce a new evaluation metric named Mean Signal-to-Noise Ratio (MSNR) to measure the signal-to-noise ratio of each mesh frequency component. Extensive experiments demonstrate that our approach generates fine-grained details for high-fidelity 3D hand reconstruction, and our evaluation metric is more effective for measuring mesh details compared with traditional metrics.
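A graph frequency decomposition of a mesh is commonly built on the eigenvectors of the graph Laplacian L = D - W. Eigenvectors with small eigenvalues vary smoothly over the mesh (low frequencies), while those with large eigenvalues capture fine detail. The minimal NumPy sketch below assumes that construction with an unweighted adjacency matrix; the abstract does not spell out the paper's exact basis. The per-component decibel formula in `mean_snr_db` is only a guess at the spirit of the MSNR metric, not its published definition, and all function names are hypothetical.

```python
import numpy as np


def laplacian_basis(num_vertices, edges):
    """Eigen-decompose the combinatorial graph Laplacian L = D - W of a mesh."""
    W = np.zeros((num_vertices, num_vertices))
    for i, j in edges:
        W[i, j] = W[j, i] = 1.0
    L = np.diag(W.sum(axis=1)) - W
    # eigh returns ascending eigenvalues, i.e. low graph frequencies first.
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvals, eigvecs


def to_frequency(U, X):
    """Project (num_vertices, 3) vertex positions onto the Laplacian eigenbasis."""
    return U.T @ X


def from_frequency(U, C):
    """Map frequency coefficients back to vertex positions."""
    return U @ C


def band_limited(U, X, k):
    """Reconstruct X from its k lowest-frequency components only."""
    Uk = U[:, :k]
    return Uk @ (Uk.T @ X)


def mean_snr_db(U, X_true, X_pred, eps=1e-12):
    """Average over frequency components of 10*log10(signal power / error power)."""
    C_true = to_frequency(U, X_true)
    C_pred = to_frequency(U, X_pred)
    signal = np.sum(C_true ** 2, axis=1)
    noise = np.sum((C_true - C_pred) ** 2, axis=1)
    return float(np.mean(10 * np.log10((signal + eps) / (noise + eps))))
```

Calling `band_limited(U, X, k)` with increasing `k` mimics a coarse-to-fine reconstruction. A small `k` keeps only the overall shape, and a larger `k` adds higher-frequency detail. Scoring each component separately keeps errors in the small high-frequency coefficients visible, whereas a plain vertex-space error would be dominated by the low frequencies.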

Translated: 2023-07-13 16:27:04, Published: 2023-07-08

# Advancements in Scientific Controllable Text Generation Methods

Advancements in Scientific Controllable Text Generation Methods ( http://arxiv.org/abs/2307.05538v1 )

License: see link for details

Arnav Goel, Medha Hira, Avinash Anand, Siddhesh Bangar, Dr. Rajiv Ratn Shah

In this study, we organize previous work on controllable text generation using a new schema. The schema consists of seven components, each of which is crucial to the generation process. To accomplish controlled generation of scientific literature, we describe the various modulation strategies used to modulate each of the seven components. We also offer a theoretical study and qualitative examination of these methods. This insight makes possible new architectures based on combinations of these components. Future research will compare these methods empirically to learn more about their strengths and utility.

Translated: 2023-07-13 16:26:40, Published: 2023-07-08

# NLP Meets RNA: Unsupervised Embedding Learning for Ribozymes with Word2Vec

NLP Meets RNA: Unsupervised Embedding Learning for Ribozymes with Word2Vec ( http://arxiv.org/abs/2307.05537v1 )

License: see link for details

Andrew Kean Gao

Ribozymes, RNA molecules with distinct 3D structures and catalytic activity, have widespread applications in synthetic biology and therapeutics. However, relatively little research has focused on leveraging deep learning to enhance our understanding of ribozymes. This study implements Word2Vec, an unsupervised learning technique for natural language processing, to learn ribozyme embeddings. Ribo2Vec was trained on over 9,000 diverse ribozymes, learning to map sequences to 128 and 256-dimensional vector spaces. Using Ribo2Vec, sequence embeddings for five classes of ribozymes (hatchet, pistol, hairpin, hovlinc, and twister sister) were calculated. Principal component analysis demonstrated the ability of these embeddings to distinguish between ribozyme classes. Furthermore, a simple SVM classifier trained on ribozyme embeddings showed promising results in accurately classifying ribozyme types. Our results suggest that the embedding vectors contained meaningful information about ribozymes. Interestingly, 256-dimensional embeddings behaved similarly to 128-dimensional embeddings, suggesting that a lower dimension vector space is generally sufficient to capture ribozyme features. This approach demonstrates the potential of Word2Vec for bioinformatics, opening new avenues for ribozyme research. Future research includes using a Transformer-based method to learn RNA embeddings, which can capture long-range interactions between nucleotides.
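Word2Vec needs each sequence split into "words". The abstract does not say how Ribo2Vec tokenizes, so overlapping k-mers are assumed here, since they are a common choice for biological sequences. The sketch below splits an RNA sequence into 3-mers, averages per-k-mer vectors into one sequence embedding, and projects several embeddings onto two principal components, mirroring the PCA step described above. The `kmer_vectors` mapping stands in for a trained model (for example, the word vectors of a gensim `Word2Vec` model). The names `kmers`, `embed_sequence` and `pca_2d` are hypothetical.

```python
import numpy as np


def kmers(seq, k=3):
    """Split an RNA sequence into overlapping k-mers (T is read as U)."""
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def embed_sequence(seq, kmer_vectors, k=3):
    """Average the vectors of all known k-mers into one sequence embedding."""
    vecs = [kmer_vectors[t] for t in kmers(seq, k) if t in kmer_vectors]
    if not vecs:
        raise ValueError("no k-mer of the sequence has a vector")
    return np.mean(vecs, axis=0)


def pca_2d(embeddings):
    """Project embeddings (one per row) onto their first two principal components."""
    X = np.asarray(embeddings, dtype=float)
    X = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T
```

Mean pooling is the simplest way to turn variable-length sequences into fixed 128- or 256-dimensional vectors. The same vectors can then be fed to a classifier such as an SVM.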

Translated: 2023-07-13 16:26:32, Published: 2023-07-08

# Face Image Quality Enhancement Study for Face Recognition

Face Image Quality Enhancement Study for Face Recognition ( http://arxiv.org/abs/2307.05534v1 )

License: see link for details

Iqbal Nouyed, Na Zhang

Unconstrained face recognition has been an active research area among computer vision and biometrics researchers for many years. Still, face recognition in low-quality photos has not been well studied so far. In this paper, we explore face recognition performance on low-quality photos and try to improve accuracy when dealing with low-quality face images. We assemble a large database of low-quality photos and examine the performance of face recognition algorithms on three different quality sets. Using state-of-the-art facial image enhancement approaches, we explore face recognition performance on the enhanced face images. To perform this without experimental bias, we have developed a new protocol for recognition with low-quality face photos and validated its performance experimentally. Our protocol for face recognition with low-quality face images can be useful to other researchers. Moreover, the experimental results highlight some of the challenging aspects of this problem.

Translated: 2023-07-13 16:26:06, Published: 2023-07-08

# Opening up ChatGPT: Tracking openness, transparency, and accountability in instruction-tuned text generators

Opening up ChatGPT: Tracking openness, transparency, and accountability in instruction-tuned text generators ( http://arxiv.org/abs/2307.05532v1 )

License: see link for details

Andreas Liesenfeld, Alianda Lopez, Mark Dingemanse

Large language models that exhibit instruction-following behaviour represent one of the biggest recent upheavals in conversational interfaces, a trend in large part fuelled by the release of OpenAI's ChatGPT, a proprietary large language model for text generation fine-tuned through reinforcement learning from human feedback (LLM+RLHF). We review the risks of relying on proprietary software and survey the first crop of open-source projects of comparable architecture and functionality. The main contribution of this paper is to show that openness is differentiated, and to offer scientific documentation of degrees of openness in this fast-moving field. We evaluate projects in terms of openness of code, training data, model weights, RLHF data, licensing, scientific documentation, and access methods. We find that while there is a fast-growing list of projects billing themselves as 'open source', many inherit undocumented data of dubious legality, few share the all-important instruction-tuning (a key site where human annotation labour is involved), and careful scientific documentation is exceedingly rare. Degrees of openness are relevant to fairness and accountability at all points, from data collection and curation to model architecture, and from training and fine-tuning to release and deployment.

Translated: 2023-07-13 16:25:54, Published: 2023-07-08

# Integrating Curricula with Replays: Its Effects on Continual Learning

Integrating Curricula with Replays: Its Effects on Continual Learning ( http://arxiv.org/abs/2307.05747v1 )

License: see link for details

Ren Jie Tee and Mengmi Zhang

Humans engage in learning and reviewing processes with curricula when acquiring new skills or knowledge. This human learning behavior has inspired the integration of curricula with replay methods in continual learning agents. The goal is to emulate the human learning process, thereby improving knowledge retention and facilitating learning transfer. Existing replay methods in continual learning agents involve the random selection and ordering of data from previous tasks, which has been shown to be effective. However, limited research has explored the integration of different curricula with replay methods to enhance continual learning. Our study takes initial steps in examining the impact of integrating curricula with replay methods on continual learning in three specific aspects: the interleaved frequency of replayed exemplars with training data, the sequence in which exemplars are replayed, and the strategy for selecting exemplars into the replay buffer. These aspects of curricula design align with cognitive psychology principles and leverage the benefits of interleaved practice during replays, easy-to-hard rehearsal, and an exemplar selection strategy that draws exemplars from a uniform distribution of difficulties. Based on our results, these three curricula effectively mitigated catastrophic forgetting and enhanced positive knowledge transfer, demonstrating the potential of curricula in advancing continual learning methodologies.
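The three curriculum aspects named in the abstract (interleaving frequency, easy-to-hard replay order, and selecting exemplars spread uniformly over difficulty) can be sketched as three small functions. This is a minimal illustration under assumed details. Difficulty is a user-supplied scalar score, "uniform over difficulty" is approximated with equal-width bins visited round-robin, and one exemplar is replayed after every `every` training samples. None of the names or parameters come from the paper.

```python
def select_exemplars(items, difficulty, budget, num_bins=4):
    """Pick up to `budget` items spread evenly over the difficulty range."""
    scores = [difficulty(x) for x in items]
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / num_bins or 1.0  # avoid division by zero if all scores match
    bins = [[] for _ in range(num_bins)]
    for x, s in zip(items, scores):
        bins[min(int((s - lo) / width), num_bins - 1)].append(x)
    chosen = []
    # Round-robin over equal-width bins so each difficulty band is represented.
    while len(chosen) < budget and any(bins):
        for band in bins:
            if band and len(chosen) < budget:
                chosen.append(band.pop(0))
    return chosen


def easy_to_hard(exemplars, difficulty):
    """Order replay exemplars from easiest to hardest."""
    return sorted(exemplars, key=difficulty)


def interleave(train, replay, every):
    """Insert the next replay exemplar after every `every` training samples."""
    out, replay_iter = [], iter(replay)
    for i, sample in enumerate(train, start=1):
        out.append(sample)
        if i % every == 0:
            nxt = next(replay_iter, None)
            if nxt is not None:
                out.append(nxt)
    return out
```

For instance, with `items = list(range(10))` and the difficulty equal to the value itself, `select_exemplars(items, lambda x: x, budget=4)` returns `[0, 3, 5, 7]`, one item from each quarter of the range.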

Translated: 2023-07-13 15:19:01 Published: 2023-07-08
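The three curriculum aspects named in the abstract (interleaving frequency, replay order, and exemplar selection) can be illustrated with a small Python sketch. It is not the authors' implementation: the difficulty scores in [0, 1], the number of difficulty bins, and the function names `select_uniform_difficulty` and `interleave` are all hypothetical.

```python
from typing import List, Tuple

Exemplar = Tuple[str, float]  # (example id, difficulty score in [0, 1]); scores are assumed given


def select_uniform_difficulty(pool: List[Exemplar], k: int, bins: int = 5) -> List[Exemplar]:
    """Fill a replay buffer with k exemplars spread across difficulty bins (hypothetical helper)."""
    buckets: List[List[Exemplar]] = [[] for _ in range(bins)]
    for ex in pool:
        buckets[min(int(ex[1] * bins), bins - 1)].append(ex)
    chosen: List[Exemplar] = []
    # Round-robin over the bins so that every difficulty level is represented.
    while len(chosen) < k and any(buckets):
        for bucket in buckets:
            if bucket and len(chosen) < k:
                chosen.append(bucket.pop(0))
    return chosen


def interleave(train_batches: List[str], replay: List[Exemplar], every: int) -> List[tuple]:
    """Insert one replayed exemplar after every `every` training batches, easy to hard."""
    ordered = iter(sorted(replay, key=lambda ex: ex[1]))  # easy-to-hard rehearsal
    schedule: List[tuple] = []
    for i, batch in enumerate(train_batches, start=1):
        schedule.append(("train", batch))
        if i % every == 0:
            nxt = next(ordered, None)
            if nxt is not None:
                schedule.append(("replay", nxt))
    return schedule
```

Varying `every` changes the interleaving frequency, the sort key controls the replay order, and the binning controls how exemplars enter the buffer; these are the three knobs the study compares.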

# 微調整を伴わないBKT転移における漸近自由の量子ビット正則化

A qubit regularization of asymptotic freedom at the BKT transition without fine-tuning ( http://arxiv.org/abs/2307.06117v1 )

License: see the linked page

Sandip Maiti, Debasish Banerjee, Shailesh Chandrasekharan, and Marina K. Marinkovic

(参考訳) 本稿では、BKT転移で現れる漸近的自由で質量をもつ連続体量子場理論を正則化する方法として、2次元ハードコアループガスモデルを提案する。微調整なしで、我々のモデルは相転移に近づく際に、質量相における古典格子XYモデルの普遍的なステップスケーリング関数を再現できる。これは、ループガス配位空間におけるフォック真空サイトのフガシティを熱力学極限でゼロに下げることによって達成される。BKT転移におけるいくつかの普遍量は、従来のXYモデルと比べて、我々のモデルではより小さな有限サイズ効果を示す。我々のモデルはユークリッド時空における漸近的自由で質量をもつ量子場理論の量子ビット正則化の代表例であり、微調整なしで、分離した固定点における関連摂動として漸近的自由がどのように生じうるかを理解する助けとなる。

We propose a two-dimensional hard-core loop-gas model as a way to regularize the asymptotically free massive continuum quantum field theory that emerges at the BKT transition. Without fine-tuning, our model can reproduce the universal step-scaling function of the classical lattice XY model in the massive phase as we approach the phase transition. This is achieved by lowering the fugacity of Fock-vacuum sites in the loop-gas configuration space to zero in the thermodynamic limit. Some of the universal quantities at the BKT transition show smaller finite-size effects in our model as compared to the traditional XY model. Our model is a prime example of qubit regularization of an asymptotically free massive quantum field theory in Euclidean space-time and helps understand how asymptotic freedom can arise as a relevant perturbation at a decoupled fixed point without fine-tuning.

Translated: 2023-07-13 13:10:18 Published: 2023-07-08

# 画像生成モデルの定性的故障とディープフェイク検出への応用

Qualitative Failures of Image Generation Models and Their Application in Detecting Deepfakes ( http://arxiv.org/abs/2304.06470v3 )

License: see the linked page

Ali Borji

(参考訳) 画像生成モデルと映像生成モデルがフォトリアリスティックな画像を作成する能力は前例のない水準に達しており、多くの場合、本物の画像と偽の画像を区別することは困難である。しかし、この進歩にもかかわらず、生成された画像の品質と現実世界の画像との間にはギャップが残っている。そこで本稿では、学術出版物とソーシャルメディアの双方から膨大な文献をレビューして画像生成モデルの質的な欠点を特定し、それらを5つのカテゴリに分類した。これらの失敗を理解することで、これらのモデルの改善が必要な領域を特定するとともに、ディープフェイクを検出する戦略を開発することができる。今日の社会におけるディープフェイクの蔓延は深刻な懸念であり、我々の知見はその悪影響の軽減に役立ちうる。

The ability of image and video generation models to create photorealistic images has reached unprecedented heights, making it difficult to distinguish between real and fake images in many cases. However, despite this progress, a gap remains between the quality of generated images and those found in the real world. To address this, we have reviewed a vast body of literature from both academic publications and social media to identify qualitative shortcomings in image generation models, which we have classified into five categories. By understanding these failures, we can identify areas where these models need improvement, as well as develop strategies for detecting deepfakes. The prevalence of deepfakes in today's society is a serious concern, and our findings can help mitigate their negative impact.

Translated: 2023-07-12 18:27:43 Published: 2023-07-08

# 知識グラフと閉形式連続時間リキッドニューラルネットワークを用いた患者ケアのためのデジタルツイン

Digital Twins for Patient Care via Knowledge Graphs and Closed-Form Continuous-Time Liquid Neural Networks ( http://arxiv.org/abs/2307.04772v1 )

License: see the linked page

Logan Nye

(参考訳) デジタルツイン技術は医療を変革し、パーソナライズされた医療とサポート、早期診断、治療結果のシミュレーション、最適化された手術計画を可能にすると期待されている。デジタルツインは、製造業、サプライチェーンのロジスティクス、社会インフラなどの業界で急速に普及しつつあるが、患者ケアではまだ普及していない。マルチモーダル患者データを用いた複雑な疾患のモデル化の課題と、その解析の計算上の複雑さが、生物医学分野におけるデジタルツインの採用を阻んできた。しかし、これらの大きな障害は、これらのモデルに異なる方法でアプローチすることで対処できる可能性がある。本稿では、計算コストとモデリングの複雑さに起因する臨床デジタルツインのモデリングの障壁に対処する新しい枠組みを提案する。患者の健康データをナレッジグラフとして構造化し、閉形式連続時間リキッドニューラルネットワークを用いてリアルタイム分析を行う。マルチモーダル患者データを統合し、閉形式連続時間ネットワークとナレッジグラフオントロジーの柔軟性と効率を活用することで、我々のアプローチはリアルタイムの洞察、パーソナライズド医療、早期診断と介入、最適な手術計画を可能にする。この新しいアプローチは、患者の健康に関する包括的で適応可能なビューとリアルタイム分析を提供し、デジタルツインシミュレーションをはじめとする医療における期待される利益への道を開く。

Digital twin technology is anticipated to transform healthcare, enabling personalized medicines and support, earlier diagnoses, simulated treatment outcomes, and optimized surgical plans. Digital twins are increasingly gaining traction in industries like manufacturing, supply chain logistics, and civil infrastructure, but not yet in patient care. The challenge of modeling complex diseases with multimodal patient data and the computational complexity of analyzing it have stifled digital twin adoption in the biomedical vertical. Yet these major obstacles can potentially be handled by approaching these models in a different way. This paper proposes a novel framework for addressing the barriers to clinical twin modeling created by computational costs and modeling complexities. We propose structuring patient health data as a knowledge graph and using closed-form continuous-time liquid neural networks for real-time analytics. By synthesizing multimodal patient data and leveraging the flexibility and efficiency of closed-form continuous-time networks and knowledge graph ontologies, our approach enables real-time insights, personalized medicine, early diagnosis and intervention, and optimal surgical planning. This novel approach provides a comprehensive and adaptable view of patient health along with real-time analytics, paving the way for digital twin simulations and other anticipated benefits in healthcare.

Translated: 2023-07-12 17:29:25 Published: 2023-07-08
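The abstract relies on closed-form continuous-time (CfC) liquid networks for real-time analytics. As a rough, hedged illustration of why they suit irregularly sampled patient data, the sketch below implements the CfC update form $x(t)=\sigma(-f\,t)\odot g+(1-\sigma(-f\,t))\odot h$ with toy single-layer heads. The layer shapes, tanh activations, and the class name `CfCCell` are assumptions; nothing here reflects the paper's knowledge-graph pipeline.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


class CfCCell:
    """Toy closed-form continuous-time cell.

    x(t) = sigma(-f*t) * g + (1 - sigma(-f*t)) * h, where f, g, h are single linear
    layers of [state, input] (an assumption; published CfC models use a shared backbone).
    """

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        d = hidden_dim + input_dim
        self.Wf = rng.normal(0.0, 0.1, (d, hidden_dim))
        self.Wg = rng.normal(0.0, 0.1, (d, hidden_dim))
        self.Wh = rng.normal(0.0, 0.1, (d, hidden_dim))

    def step(self, x: np.ndarray, u: np.ndarray, t: float) -> np.ndarray:
        """Advance hidden state x given input u over an elapsed time t (no ODE solver needed)."""
        z = np.concatenate([x, u])
        f = z @ self.Wf           # time-constant head
        g = np.tanh(z @ self.Wg)  # dominates when f*t is strongly negative
        h = np.tanh(z @ self.Wh)  # dominates as f*t grows large
        gate = sigmoid(-f * t)
        return gate * g + (1.0 - gate) * h
```

Because the elapsed time `t` enters the update directly, observations that arrive at uneven intervals can be processed without numerically integrating an ODE, which is the efficiency argument the abstract makes.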

# チェスのマスの価値

The Value of Chess Squares ( http://arxiv.org/abs/2307.05330v1 )

License: see the linked page

Aditya Gupta and Shiva Maharaj and Nicholas Polson and Vadim Sokolov

(参考訳) チェスのマスの評価と盤上の駒の配置の決定が、本研究の主な目的である。チェスAIの出現により、チェスのゲームにおける局面の価値を正確に評価することが可能になった。従来の手法では、駒に固定値(キング$=\infty$、クイーン$=9$、ルーク$=5$、ビショップ$=3$、ナイト$=3$、ポーン$=1$)を割り当てる。我々はこの分析を、駒とマスの両方の限界評価を導入することで拡張する。ナイトとビショップの配置を調べることで手法を示し、ポーンの評価についても有益な洞察を提供する。特に、ニムゾヴィッチはポーン構造とその評価の重要性を提唱した先駆者の一人であった。最後に、今後の研究の方向性を示す。

Valuing chess squares and determining the placement of pieces on the board are the main objectives of our study. With the emergence of chess AI, it has become possible to accurately assess the worth of positions in a game of chess. The conventional approach assigns fixed values to pieces (King $=\infty$, Queen $=9$, Rook $=5$, Bishop $=3$, Knight $=3$, Pawn $=1$). We enhance this analysis by introducing marginal valuations for both pieces and squares. We demonstrate our method by examining the positioning of Knights and Bishops, and also provide valuable insights into the valuation of pawns. Notably, Nimzowitsch was among the pioneers in advocating for the significance of pawn structure and valuation. Finally, we conclude by suggesting potential avenues for future research.

Translated: 2023-07-12 14:36:53 Published: 2023-07-08
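The conventional fixed-value baseline that the paper refines can be made concrete with a material count over the board field of a FEN string. This sketch implements only that baseline; the paper's marginal valuations of pieces and squares are not reproduced, and the function name `material_balance` is hypothetical.

```python
# Conventional fixed piece values from the abstract; the king is omitted because its
# value is treated as infinite and it never leaves the board.
PIECE_VALUES = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}


def material_balance(fen_board: str) -> int:
    """White material minus Black material for the board field of a FEN string.

    Uppercase letters are White pieces, lowercase are Black; digits and '/' are skipped.
    """
    score = 0
    for ch in fen_board:
        value = PIECE_VALUES.get(ch.upper())
        if value is not None:
            score += value if ch.isupper() else -value
    return score
```

A marginal valuation would replace these constants with values that depend on where a piece stands, for example a Knight on a central square versus the rim; how the paper derives those values is not specified in the abstract.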

# 筋力と関節運動量のsEMGに基づく物理インフォームドローショット学習

A Physics-Informed Low-Shot Learning For sEMG-Based Estimation of Muscle Force and Joint Kinematics ( http://arxiv.org/abs/2307.05361v1 )

License: see the linked page

Yue Shi, Shuhao Ma, Yihui Zhao, Zhiqiang Zhang

(参考訳) 表面筋電図(sEMG)からの筋力と関節キネマティクスの推定は、神経筋刺激、筋動態、運動力学の間の動的相互作用をリアルタイムに生体力学的解析するうえで不可欠である。深層ニューラルネットワーク(DNN)の最近の進歩は、完全に自動化され再現可能な方法で生体力学的解析を改善する可能性を示している。しかし、生体力学的解析における小サンプル性と物理的解釈可能性の要請が、DNNの応用を制限している。本稿では、sEMGに基づく筋力と関節キネマティクスの推定のための、新しい物理インフォームド・ローショット学習法を提案する。この方法は、ラグランジュの運動方程式と逆動力学的筋モデルを、小サンプルデータからの構造化特徴復号と外挿推定のための敵対的生成ネットワーク(GAN)の枠組みにシームレスに統合する。具体的には、ラグランジュの運動方程式を生成モデルに導入し、高次特徴の構造化復号が物理法則に従うよう制約する。また、外挿推定値と物理的参照値の物理表現が一貫していることに報酬を与えることで敵対的学習の効率を向上させる、物理インフォームド方策勾配を設計した。実験的検証は2つのシナリオ(歩行試験と手首運動試験)で実施した。その結果、筋力と関節キネマティクスの推定値は物理ベースの逆動力学と比較して不偏であり、物理インフォームド畳み込みニューラルネットワーク(PI-CNN)、標準的な敵対的生成ネットワーク(vanilla GAN)、多層エクストリーム学習マシン(ML-ELM)など、選択したベンチマーク手法を上回った。

Muscle force and joint kinematics estimation from surface electromyography (sEMG) is essential for real-time biomechanical analysis of the dynamic interplay among neural muscle stimulation, muscle dynamics, and kinetics. Recent advances in deep neural networks (DNNs) have shown the potential to improve biomechanical analysis in a fully automated and reproducible manner. However, the small-sample nature of biomechanical data and the requirement of physical interpretability limit the applications of DNNs. This paper presents a novel physics-informed low-shot learning method for sEMG-based estimation of muscle force and joint kinematics. This method seamlessly integrates Lagrange's equation of motion and an inverse dynamic muscle model into the generative adversarial network (GAN) framework for structured feature decoding and extrapolated estimation from small sample data. Specifically, Lagrange's equation of motion is introduced into the generative model to constrain the structured decoding of high-level features to follow the laws of physics. In addition, a physics-informed policy gradient is designed to improve adversarial learning efficiency by rewarding consistent physical representations between the extrapolated estimations and the physical references. Experimental validations are conducted in two scenarios: walking trials and wrist-motion trials. Results indicate that the estimations of muscle forces and joint kinematics are unbiased compared to physics-based inverse dynamics, and that the method outperforms the selected benchmark methods, including the physics-informed convolutional neural network (PI-CNN), vanilla generative adversarial network (GAN), and multi-layer extreme learning machine (ML-ELM).

Translated: 2023-07-12 14:15:10 Published: 2023-07-08
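To make the role of Lagrange's equation of motion concrete, the sketch below computes a physics-consistency loss for a single rigid segment rotating about one joint, for which the Euler-Lagrange equation reduces to $I\ddot\theta + m g l \sin\theta = \tau$. The one-joint pendulum model, the parameter names, and the use of a mean squared residual as a penalty are assumptions for illustration; the paper embeds its physics terms inside a GAN, which is not reproduced here.

```python
import math
from typing import List


def lagrange_residuals(theta: List[float], tau: List[float], dt: float,
                       inertia: float, mass: float, lever: float,
                       g: float = 9.81) -> List[float]:
    """Residuals of I*theta'' + m*g*l*sin(theta) - tau at interior samples.

    theta is measured from the downward vertical; theta'' uses central differences.
    """
    res = []
    for k in range(1, len(theta) - 1):
        acc = (theta[k + 1] - 2.0 * theta[k] + theta[k - 1]) / dt ** 2
        res.append(inertia * acc + mass * g * lever * math.sin(theta[k]) - tau[k])
    return res


def physics_loss(theta: List[float], tau: List[float], dt: float,
                 inertia: float, mass: float, lever: float) -> float:
    """Mean squared Euler-Lagrange residual; small when (theta, tau) obey the model."""
    r = lagrange_residuals(theta, tau, dt, inertia, mass, lever)
    return sum(x * x for x in r) / len(r)
```

In a generator, a term like `physics_loss` on the decoded joint angles and torques would penalize outputs that violate the equation of motion, which is the kind of constraint the abstract describes.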

# 球状グラスホッパー問題

The Spherical Grasshopper Problem ( http://arxiv.org/abs/2307.05359v1 )

License: see the linked page

Boris van Breugel

(参考訳) この論文の目的は、単位球面上のグラスホッパー問題の理解を深めることである。この問題はベルの不等式の解析に動機づけられているが、次のような幾何学的パズルとして定式化できる。白い球と黒いペンキのバケツが与えられ、対蹠点の組が互いに反対の色になるように球面の半分を塗るよう求められる。バッタが球面に着地し、ランダムな方向に一定の距離だけ跳ぶ。バッタが同じ色の上に着地する確率を最大化するには、球面をどのように塗ればよいか。GoulkoとKentは、対蹠性制約のない平面上でこの問題を調べた。本論文は、対蹠性制約付きの球面問題が平面問題と同様の形状の塗り分けを与えることを明確に示唆する。本研究では問題を離散化し、焼きなまし法を用いて最適解を探索した。結果は \cite{goulkokent} の平面での結果と整合する。$0.10\pi\leq\theta\leq0.44\pi$ では歯車型の解が最適となり、歯の数 $n_o$ は $\frac{2\pi}{\theta}$ に近い奇数である。$0.45\pi\leq\theta\leq 0.55\pi$ では臨界的な解が見つかり、同じ色の領域は(どちら側からでも)$0.5\pi$ に近づくにつれて小さくなる。$\theta\geq 0.55\pi$ では歯のついた縞からなる塗り分けが得られる。$\theta=\pi$ に近づくと、幅が $\pi-\theta$ に応じてスケールする縞のみからなる塗り分けが生成される。

The aim of this essay is to better understand the Grasshopper Problem on the surface of the unit sphere. The problem is motivated by analysing Bell inequalities, but can be formulated as a geometric puzzle as follows. Given a white sphere and a bucket of black paint, one is asked to paint half of the sphere such that antipodal pairs of points are oppositely coloured. A grasshopper lands on the sphere and jumps a fixed distance in a random direction. How should the sphere be coloured such that the probability of the grasshopper landing on the same colour is maximized? Goulko and Kent have explored this problem on the plane without an antipodality constraint. This essay gives a clear indication that the spherical problem with the antipodality constraint yields colourings with shapes similar to those of the planar problem. This research discretised the problem and used a simulated annealing algorithm to search for the optimal solution. Results are consistent with the planar results of \cite{goulkokent}. For $0.10\pi\leq\theta\leq0.44\pi$, cogwheel solutions are found to be optimal, with an odd number of cogs $n_o$ close to $\frac{2\pi}{\theta}$. For $0.45\pi\leq\theta\leq 0.55\pi$, critical solutions are found, in which domains of identical colour decrease in size towards $0.5\pi$ (moving from either side). For $\theta\geq 0.55\pi$, the colourings found consist of stripes with cogs. Towards $\theta=\pi$, the generated colourings display only stripes whose width scales with $\pi-\theta$.

Translated: 2023-07-12 14:14:15 Published: 2023-07-08
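The objective can be estimated numerically for any candidate colouring. The following Monte Carlo sketch is not the essay's discretised simulated-annealing search. It samples a uniform landing point, jumps an angular distance $\theta$ in a uniformly random tangent direction, and counts same-colour landings. For the hemisphere colouring, which satisfies the antipodality constraint, the exact value is $1-\theta/\pi$, because a random great circle separates two points at angular distance $\theta$ with probability $\theta/\pi$; this gives a check on the estimator. The function names are hypothetical.

```python
import math
import random


def random_unit_vector(rng: random.Random) -> list:
    """Uniform point on the unit sphere (normalised isotropic Gaussian)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        n = math.sqrt(sum(c * c for c in v))
        if n > 1e-12:
            return [c / n for c in v]


def jump(p: list, theta: float, rng: random.Random) -> list:
    """Move from unit vector p by angular distance theta in a uniformly random direction."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        dot = sum(a * b for a, b in zip(v, p))
        u = [a - dot * b for a, b in zip(v, p)]  # project onto the tangent plane at p
        n = math.sqrt(sum(c * c for c in u))
        if n > 1e-12:
            u = [c / n for c in u]
            return [math.cos(theta) * a + math.sin(theta) * b for a, b in zip(p, u)]


def hemisphere(p: list) -> bool:
    # Antipodal points get opposite colours (the equator has measure zero).
    return p[2] > 0.0


def same_colour_probability(colour, theta: float, samples: int = 100_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        p = random_unit_vector(rng)
        hits += colour(p) == colour(jump(p, theta, rng))
    return hits / samples
```

A search such as simulated annealing would repeatedly perturb a discretised colouring (flipping antipodal pairs together) and keep changes that raise this probability.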

# 相対論的量子論の確率論的基礎に向けて:曲線時空における1-Body Born Rule

Towards a Probabilistic Foundation of Relativistic Quantum Theory: The One-Body Born Rule in Curved Spacetime ( http://arxiv.org/abs/2012.05212v5 )

License: see the linked page

Maik Reddiger and Bill Poirier

(参考訳) 本研究では、粒子の位置確率を決定する量子力学のボルン則を曲がった時空へ一般化することに基づく、相対論的量子論の基礎への新しいアプローチを確立する。この研究の主要な動機は、「無限大の問題」(くりこみ)のような場の量子論(QFT)の内部的な数学的問題を克服することであり、QFTへの公理的アプローチは、これらの問題が数学的なだけでなく概念的な性質も持つことを示してきた。ここで示すアプローチは構成上統計的であり、幅広い力学モデルを扱うことができ、ミンコフスキー時空の対称性に依存せず、一般相対性原理を尊重する。本研究の解析的な部分では、関係する数学的量の滑らかさを仮定して$1$体の場合を考える。これは一般相対論的連続の方程式の理論の特別な場合として位置づけられる。ボルン則の相対論的一般化に関する関連アプローチは、対象とする超曲面が空間的であり、時空が大域的双曲的であることを仮定するが、我々はC. EckartとJ. Ehlersの先行研究を用いて、前者の条件が自然に「非接条件」に置き換えられ、後者は不要であることを示す。非相対論的な類似から用語を借りて、ラグランジアン描像およびオイラー描像と呼ぶ$1$体の場合の2つの異なる定式化を議論し、両者を包括的に扱う。数理物理学の文献に対する本研究の主な貢献は、ラグランジアン描像の構築である。ラグランジアン描像は、このアプローチにおいて「時間の問題」をどのように解決できるかを示し、したがって多体への一般化や体の数が保存されない場合(後者については例を示す)への一般化の青写真となる。

In this work we establish a novel approach to the foundations of relativistic quantum theory, which is based on generalizing the quantum-mechanical Born rule for determining particle position probabilities to curved spacetime. A principal motivator for this research has been to overcome internal mathematical problems of quantum field theory (QFT) such as the `problem of infinities' (renormalization), which axiomatic approaches to QFT have shown to be not only of mathematical but also of conceptual nature. The approach presented here is statistical by construction, can accommodate a wide array of dynamical models, does not rely on the symmetries of Minkowski spacetime, and respects the general principle of relativity. In the analytical part of this work we consider the $1$-body case under the assumption of smoothness of the mathematical quantities involved. This is identified as a special case of the theory of the general-relativistic continuity equation. While related approaches to the relativistic generalization of the Born rule assume the hypersurfaces of interest to be spacelike and the spacetime to be globally hyperbolic, we employ prior contributions by C. Eckart and J. Ehlers to show that the former condition is naturally replaced by a `non-tangency condition' and that the latter one is obsolete. We discuss two distinct formulations of the $1$-body case, which, borrowing terminology from the non-relativistic analog, we term the Lagrangian and Eulerian pictures. We provide a comprehensive treatment of both. The main contribution of this work to the mathematical physics literature is the development of the Lagrangian picture. The Lagrangian picture shows how one can resolve the `problem of time' in this approach and therefore serves as a blueprint for the generalization to many bodies and to the case in which the number of bodies is not conserved (an example is given for the latter).

Translated: 2023-07-11 23:09:51 Published: 2023-07-08

# 新型コロナウイルス危機における意味ネットワーク分析による金融市場の予測

Forecasting financial markets with semantic network analysis in the COVID-19 crisis ( http://arxiv.org/abs/2009.04975v4 )

License: see the linked page

A. Fronzetti Colladon, S. Grassi, F. Ravazzolo, F. Violante

(参考訳) 本稿では,ストックマーケットデータの予測に新たなテキストデータインデックスを用いる。インデックスは、テキストに現れる1つ以上の一般的な経済関連キーワードの重要性を評価するために、大量のニュースに適用される。この指標は、その使用頻度と意味ネットワークの位置に基づいて、経済関連キーワードの重要性を評価する。我々は、イタリアの報道機関に適用し、新型コロナウイルス危機を含む最近のサンプル期間におけるイタリア株と債券市場のリターンとボラティリティを予測する指標を構築します。その証拠は、この指数が金融時系列の異なるフェーズをうまく捉えていることを示している。さらに、債券市場のデータ、リターンとボラティリティ、短い熟成と長い熟成、株式市場のボラティリティの予測可能性の強い証拠が示されている。

This paper uses a new textual data index for predicting stock market data. The index is applied to a large set of news to evaluate the importance of one or more general economic-related keywords appearing in the text. The index assesses the importance of the economic-related keywords, based on their frequency of use and semantic network position. We apply it to the Italian press and construct indices to predict Italian stock and bond market returns and volatilities in a recent sample period, including the COVID-19 crisis. The evidence shows that the index captures the different phases of financial time series well. Moreover, results indicate strong evidence of predictability for bond market data, both returns and volatilities, short and long maturities, and stock market volatility.

Translated: 2023-07-11 23:08:47 Published: 2023-07-08

# 非パラメトリック回帰における相転移

Phase transitions in nonparametric regressions ( http://arxiv.org/abs/2112.03626v6 )

License: see the linked page

Ying Zhu

(参考訳) 単一の変数の未知回帰関数が、至る所で共通定数で有界な$(\gamma+1)$thの微分を持つことが知られている(つまり、$(\gamma+1)$thの滑らかさの次数)とき、平均積分二乗誤差(MISE)の最小値の最適値は、文学において$\left(\frac{1}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$と記述される。本稿では, (i)$n\leq\left(\gamma+1\right)^{2\gamma+3}$の場合、minimaxの最適ミゼレートは$\frac{\log n}{n\log(\log n)}$であり、最適な滑らか性はおよそ$\max\left\{ \left\lfloor \frac{\log n}{2\log\left(\log n\right)}\right\rfloor ,\,1\right\} $;である。 (ii)$n>\left(\gamma+1\right)^{2\gamma+3}$の場合、ミニマックス最適ミゼレートは$\left(\frac{1}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$であり、悪用するための滑らかさの最適度は$\gamma+1$である。本論文の基本的な貢献は、滑らかな関数クラスのために開発した計量エントロピー境界の集合である。私たちの境界のいくつかはオリジナルであり、そのうちのいくつかは文学(例えば、コルモゴロフとティホミロフ、1959)の改善と一般化である。我々の計量エントロピー境界は、よく見られる滑らか性クラスと非標準滑らか性クラスに付随するミニマックス最適MISEレートの位相遷移を示すことができ、非パラメトリック回帰問題以外の独立した関心を持つこともできる。

When the unknown regression function of a single variable is known to have derivatives up to the $(\gamma+1)$th order bounded in absolute values by a common constant everywhere or a.e. (i.e., $(\gamma+1)$th degree of smoothness), the minimax optimal rate of the mean integrated squared error (MISE) is stated as $\left(\frac{1}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ in the literature. This paper shows that: (i) if $n\leq\left(\gamma+1\right)^{2\gamma+3}$, the minimax optimal MISE rate is $\frac{\log n}{n\log(\log n)}$ and the optimal degree of smoothness to exploit is roughly $\max\left\{ \left\lfloor \frac{\log n}{2\log\left(\log n\right)}\right\rfloor ,\,1\right\} $; (ii) if $n>\left(\gamma+1\right)^{2\gamma+3}$, the minimax optimal MISE rate is $\left(\frac{1}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ and the optimal degree of smoothness to exploit is $\gamma+1$. The fundamental contribution of this paper is a set of metric entropy bounds we develop for smooth function classes. Some of our bounds are original, and some of them improve and/or generalize the ones in the literature (e.g., Kolmogorov and Tikhomirov, 1959). Our metric entropy bounds allow us to show phase transitions in the minimax optimal MISE rates associated with some commonly seen smoothness classes as well as non-standard smoothness classes, and can also be of independent interest outside the nonparametric regression problems.

Translated: 2023-07-11 23:07:17 Published: 2023-07-08
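The two regimes stated in the abstract can be transcribed into a small function that returns the minimax MISE rate (up to constants) and the degree of smoothness worth exploiting, for sample size $n$ and smoothness parameter $\gamma$. This only restates the abstract's results; the function name is hypothetical, multiplicative constants are dropped, and $n\geq 3$ is required so that $\log\log n>0$.

```python
import math


def minimax_mise_rate(n: int, gamma: int):
    """Return (rate, smoothness_to_exploit) for the two regimes stated in the abstract.

    Rates are given up to multiplicative constants.
    """
    if n < 3:
        raise ValueError("need n >= 3 so that log(log(n)) > 0")
    if n <= (gamma + 1) ** (2 * gamma + 3):
        # Small-sample regime: the rate no longer depends on gamma.
        loglog = math.log(math.log(n))
        rate = math.log(n) / (n * loglog)
        smoothness = max(math.floor(math.log(n) / (2 * loglog)), 1)
    else:
        # Classical regime: exploit all gamma + 1 derivatives.
        rate = (1.0 / n) ** ((2 * gamma + 2) / (2 * gamma + 3))
        smoothness = gamma + 1
    return rate, smoothness
```

For example, with $\gamma=5$ the threshold $(\gamma+1)^{2\gamma+3}=6^{13}$ exceeds $10^{10}$, so for $n=10^6$ the small-sample regime applies and only about two degrees of smoothness are worth exploiting.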

# トレース汎函数の凸性と単調性

Some convexity and monotonicity results of trace functionals ( http://arxiv.org/abs/2108.05785v2 )

License: see the linked page

Haonan Zhang

(参考訳) 本稿では、トレース汎関数 $$(A,B,C)\mapsto \text{Tr}|B^{p}AC^{q}|^{s}$$ の凸性を、最良のパラメータ $(p,q,s)$ について証明する。ここで $B$, $C$ は任意の $n\times n$ 正定値行列、$A$ は任意の $n\times n$ 行列である。また、この型のトレース汎関数の単調性版も得る。応用として、\cite{HP12quasi,CFL16some} のいくつかの結果を拡張し、行列の設定において \cite{RZ14} の予想を解決する。\cite{RZ14} のその他の予想についても議論する。また、関連するいくつかのトレース汎関数が一般には凹でないことも示す。このような凹性は、さまざまな問題で成り立つことが期待されていた。

In this paper, we prove the convexity of trace functionals $$(A,B,C)\mapsto \text{Tr}|B^{p}AC^{q}|^{s},$$ for parameters $(p,q,s)$ that are best possible, where $B$ and $C$ are any $n$-by-$n$ positive definite matrices, and $A$ is any $n$-by-$n$ matrix. We also obtain the monotonicity versions of trace functionals of this type. As applications, we extend some results in \cite{HP12quasi,CFL16some} and resolve a conjecture in \cite{RZ14} in the matrix setting. Other conjectures in \cite{RZ14} will also be discussed. We also show that some related trace functionals are not concave in general. Such concavity results were expected to hold in different problems.

Translated: 2023-07-11 23:05:40 Published: 2023-07-08
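For experimenting numerically with the functional, $\text{Tr}|X|^{s}$ equals the sum of the singular values of $X$ raised to the power $s$, and $B^{p}$, $C^{q}$ can be formed from eigendecompositions of the positive definite matrices. The NumPy sketch below only evaluates the functional; it does not encode the parameter ranges for which convexity is proved, which are given in the paper. The helper names are hypothetical.

```python
import numpy as np


def pd_power(M: np.ndarray, p: float) -> np.ndarray:
    """M**p for a Hermitian positive definite matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * w ** p) @ V.conj().T  # V diag(w**p) V^*


def trace_functional(A: np.ndarray, B: np.ndarray, C: np.ndarray,
                     p: float, q: float, s: float) -> float:
    """Tr |B^p A C^q|^s, computed as the sum of singular values raised to s."""
    X = pd_power(B, p) @ A @ pd_power(C, q)
    sv = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(sv ** s))
```

A midpoint test such as comparing `trace_functional` at the average of two triples $(A,B,C)$ against the average of the two values is a quick way to probe convexity numerically for a chosen $(p,q,s)$.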

# 分類システムにおける説明のための統一論理枠組み

A unified logical framework for explanations in classifier systems ( http://arxiv.org/abs/2105.14452v8 )

License: see the linked page

Xinghan Liu and Emiliano Lorini

(参考訳) 近年、説明可能なAI(XAI)の分野において、二値分類器を説明するためのブール関数への関心が再び高まっている。ブール関数の標準的なアプローチは命題論理である。我々は、二値入力分類器とその性質に関する推論を支援する、ceteris paribus的な性質をもつ様相言語を提案する。我々は分類器モデルの族を研究し、言語の濃度に応じた2つの証明体系として公理化し、その公理系の完全性を示す。さらに、我々の様相言語の充足可能性判定問題が、無限変数の場合にはNEXPTIME完全であり、有限変数の場合には多項式時間となることを証明する。加えて、無限変数の場合における我々の言語の興味深いNP断片を特定する。我々はこの言語を活用して、反事実条件文、およびアブダクティブ説明、対比的説明、反事実的説明やバイアスを含むさまざまな説明の概念を形式化する。最後に、この言語の2つの拡張を示す。分類器の変更を可能にする代入の概念による動的拡張と、実際の入力に関する分類器の不確実性を表現できる認識的拡張である。

Recent years have witnessed a renewed interest in Boolean functions for explaining binary classifiers in the field of explainable AI (XAI). The standard approach to Boolean functions is propositional logic. We present a modal language of a ceteris paribus nature which supports reasoning about binary input classifiers and their properties. We study a family of classifier models, axiomatize it as two proof systems depending on the cardinality of the language, and show completeness of our axiomatics. Moreover, we prove that the satisfiability checking problem for our modal language is NEXPTIME-complete in the infinite-variable case, while it becomes polynomial in the finite-variable case. We furthermore identify an interesting NP fragment of our language in the infinite-variable case. We leverage the language to formalize counterfactual conditionals as well as a variety of notions of explanation, including abductive, contrastive and counterfactual explanations, and biases. Finally, we present two extensions of our language: a dynamic extension by the notion of assignment enabling classifier change, and an epistemic extension in which the classifier's uncertainty about the actual input can be represented.

Translated: 2023-07-11 23:05:20 Published: 2023-07-08
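Abductive explanations (sufficient reasons) are among the notions the abstract formalizes. Outside the modal-logic setting, the underlying idea can be shown by brute force: find a smallest set of input features whose current values force the classifier's output regardless of the remaining features. The sketch assumes a classifier given as a Python function on 0/1 tuples and is exponential in the number of features, so it is only meant for toy examples; a smallest such set is in particular subset-minimal. The function names are hypothetical.

```python
from itertools import combinations, product
from typing import Callable, Set, Tuple

Classifier = Callable[[Tuple[int, ...]], int]


def is_sufficient(f: Classifier, x: Tuple[int, ...], fixed: Set[int]) -> bool:
    """True if fixing the features in `fixed` to their values in x forces f(x)."""
    free = [i for i in range(len(x)) if i not in fixed]
    target = f(x)
    for values in product((0, 1), repeat=len(free)):
        y = list(x)
        for i, v in zip(free, values):
            y[i] = v
        if f(tuple(y)) != target:
            return False
    return True


def minimal_abductive_explanation(f: Classifier, x: Tuple[int, ...]) -> Tuple[int, ...]:
    """Smallest set of feature indices whose values in x suffice for f(x) (brute force)."""
    for size in range(len(x) + 1):
        for subset in combinations(range(len(x)), size):
            if is_sufficient(f, x, set(subset)):
                return subset
    return tuple(range(len(x)))  # not reached: the full feature set is always sufficient
```

For the classifier $x_0 \wedge (x_1 \vee x_2)$ and input $(1,1,0)$, the explanation is $\{x_0, x_1\}$: with those two values fixed, the output is 1 whatever $x_2$ is.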

# 量子計測理論のためのポインタ

Pointers for Quantum Measurement Theory ( http://arxiv.org/abs/2203.11144v2 )

License: see the linked page

Jay Lawrence

(参考訳) 原子スピン1/2または光子偏極の象徴的な測定では、2つの空間分離および非相互作用検出器を用いる。各検出器は二分体であり、原子または光子の存在または欠如を登録する。 $d$状態粒子の測定では、よく知られたポインタ変数をそのような検出器の配列に置き換えることで、標準のフォン・ノイマン測度形式をリキャストする。予備測定プロセスのユニタリダイナミクスは、検出器出力を単一結果のサブ空間に制限し、ポインタが装置から現れることを示す。本装置の物理拡張により,各検出器をリードアウト装置に結合したアンシラ量子ビットに置き換える。これにより、ポインタを(効果的に)古典的部分と異なる量子に分離し、量子を古典的遷移に遅らせる。その結果、通常の装置の崩壊シナリオを回復するだけでなく、量子ポインター状態の重ね合わせを観測することもできる。

In the iconic measurements of atomic spin-1/2 or photon polarization, one employs two spatially separated and noninteracting detectors. Each detector is binary, registering the presence or absence of the atom or the photon. For measurements on a $d$-state particle we recast the standard von Neumann measurement formalism by replacing the familiar pointer variable with an array of such detectors, one for each of the $d$ possible outcomes. We show that the unitary dynamics of the premeasurement process restricts the detector outputs to the subspace of single outcomes, so that the pointer emerges from the apparatus. We propose a physical extension of this apparatus which replaces each detector with an ancilla qubit coupled to a readout device. This explicitly separates the pointer into distinct quantum and (effectively) classical parts, and delays the quantum to classical transition. As a result, one not only recovers the collapse scenario of an ordinary apparatus, but one can also observe a superposition of the quantum pointer states.

Translated: 2023-07-11 22:54:36 Published: 2023-07-08
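The detector-array picture of premeasurement can be checked on a toy state vector. In the sketch below, each basis state is a pair (particle level $k$, tuple of $d$ detector bits), and the coupling flips detector $k$ exactly when the particle is in level $k$. This controlled-flip coupling is an assumption chosen for illustration; it permutes basis states and is therefore unitary. Starting from all detectors in the ready state, every branch ends with exactly one clicked detector, so the output lies in the single-outcome subspace. The ancilla-qubit extension proposed in the paper is not modelled, and the function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Key = Tuple[int, Tuple[int, ...]]  # (particle level k, detector bits)


def couple(state: Dict[Key, complex]) -> Dict[Key, complex]:
    """Unitary |k>|b> -> |k>|b XOR e_k>: detector k flips iff the particle is in level k."""
    out: Dict[Key, complex] = {}
    for (k, bits), amp in state.items():
        flipped = list(bits)
        flipped[k] ^= 1
        out[(k, tuple(flipped))] = amp
    return out


def premeasure(amplitudes: Sequence[complex]) -> Dict[Key, complex]:
    """Couple sum_k c_k |k> to d binary detectors that all start in the ready state 0."""
    d = len(amplitudes)
    ready = (0,) * d
    initial = {(k, ready): complex(c) for k, c in enumerate(amplitudes) if c != 0}
    return couple(initial)
```

The resulting state is $\sum_k c_k |k\rangle|e_k\rangle$, where $e_k$ has a single 1 in position $k$, so reading the detector array reveals outcome $k$ with probability $|c_k|^2$.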

# Bayanアルゴリズム: モジュラリティの厳密および近似最適化によるネットワーク内のコミュニティの検出

The Bayan Algorithm: Detecting Communities in Networks Through Exact and Approximate Optimization of Modularity ( http://arxiv.org/abs/2209.04562v3 )

License: see the linked page

Samin Aref, Hriday Chheda, and Mahdi Mostajabdaveh

(参考訳) コミュニティ検出はネットワーク科学における古典的な問題であり、様々な分野に幅広く応用されている。多くのアプローチの中で、最も一般的な方法はモジュラリティの最大化である。設計哲学と広く採用されているにもかかわらず、ヒューリスティックなモジュラリティ最大化アルゴリズムが最適分割を返すことは滅多にない。そこで我々は,最適性あるいは最適分割への近さを保証した分割を返却する特殊アルゴリズムbayanを提案する。ベイアンアルゴリズムの中核は、モジュラリティ最大化問題の整数計画式を最適性や係数内で近似する分岐とカットのスキームである。構造的に多様な合成および実ネットワークを用いた30種類のコミュニティ検出手法と比較した。この結果は,標準ベンチマークグラフの基幹コミュニティの検索におけるベイアンの特異な精度と安定性を示す。 Bayanは、モジュラリティの最大化のためにオープンソースや商用の解決器よりも数倍高速で、既存の方法では最適化できないインスタンスの最適なパーティションを見つけることができる。全体として、ベイアンは最大3000のエッジを持つ実ネットワークにおけるモジュラリティの正確な最大化と、通常のコンピュータ上での大規模インスタンスにおける最大モジュラリティの近似に適した選択であると評価している。 Bayanアルゴリズム(bayanpyライブラリ)のPython実装は、Pythonのパッケージインストーラ(pip)を通じて公開されている。

Community detection is a classic problem in network science with extensive applications in various fields. Among numerous approaches, the most common method is modularity maximization. Despite their design philosophy and wide adoption, heuristic modularity maximization algorithms rarely return an optimal partition or anything similar to one. We propose a specialized algorithm, Bayan, which returns partitions with a guarantee of either optimality or proximity to an optimal partition. At the core of the Bayan algorithm is a branch-and-cut scheme that solves an integer programming formulation of the modularity maximization problem to optimality or approximates it within a factor. We compare Bayan against 30 alternative community detection methods using structurally diverse synthetic and real networks. Our results demonstrate Bayan's distinctive accuracy and stability in retrieving ground-truth communities of standard benchmark graphs. Bayan is several times faster than open-source and commercial solvers for modularity maximization, making it capable of finding optimal partitions for instances that cannot be optimized by any other existing method. Overall, our assessments point to Bayan as a suitable choice for exact maximization of modularity in real networks with up to 3000 edges (in their largest connected component) and for approximating maximum modularity in larger instances on ordinary computers. A Python implementation of the Bayan algorithm (the bayanpy library) is publicly available through the package installer for Python (pip).

Translated: 2023-07-11 22:46:59 Published: 2023-07-08
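Bayan maximizes modularity, exactly or approximately, over all partitions. The quantity itself is easy to evaluate for a given partition. The sketch below computes Newman-Girvan modularity $Q=\sum_c\left[L_c/m-(d_c/2m)^2\right]$ for an undirected, unweighted simple graph, where $m$ is the number of edges, $L_c$ the number of edges inside community $c$, and $d_c$ the total degree of its nodes. It is not the bayanpy API and performs no optimization.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(edges: Iterable[Tuple[Hashable, Hashable]],
               community: Dict[Hashable, Hashable]) -> float:
    """Newman-Girvan modularity of a partition of an undirected, unweighted simple graph."""
    edges = list(edges)
    m = len(edges)
    inside: Dict[Hashable, int] = defaultdict(int)      # L_c: edges within community c
    degree_sum: Dict[Hashable, int] = defaultdict(int)  # d_c: total degree of community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two disjoint triangles, splitting them into two communities gives $Q=0.5$, while putting all six nodes in one community gives $Q=0$; an exact method like Bayan certifies that no other partition scores higher.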

# 部分展開による拡張多目的A*

Enhanced Multi-Objective A* with Partial Expansion ( http://arxiv.org/abs/2212.03712v2 )

License: see the linked page

Valmiki Kothare, Zhongqiang Ren, Sivakumar Rathinam, Howie Choset

(参考訳) 一般にグラフ上に置かれるMO-SPP(Multi-Objective Shortest Path Problem)は、複数の目的を最適化しながら開始頂点から目的地頂点への経路のセットを決定する。一般に、全ての目的を同時に最適化できる単一の解経路は存在しないので、問題はいわゆるパレート最適解の集合を見つけようとする。この問題に対処するため、複数の多目的a*(moa*)アルゴリズムが最近開発され、品質保証付きで素早く解を計算できるようになった。しかし、これらのMOA*アルゴリズムは、特にグラフの分岐係数(すなわち、任意の頂点の隣人の数)が大きい場合、高いメモリ使用率に悩まされることが多い。この作業は,MOA*の高メモリ消費を,実行時にほとんど増加せずに削減することを目的としている。複数の単一目的および多目的探索アルゴリズムを一般化して統一することにより,2つのユーザ定義ハイパーパラメータをチューニングすることにより,実行時およびメモリ効率のバランスをとるランタイムとメモリ効率のmoa*(rme-moa*)アプローチを開発した。

The Multi-Objective Shortest Path Problem (MO-SPP), typically posed on a graph, determines a set of paths from a start vertex to a destination vertex while optimizing multiple objectives. In general, there does not exist a single solution path that can simultaneously optimize all the objectives and the problem thus seeks to find a set of so-called Pareto-optimal solutions. To address this problem, several Multi-Objective A* (MOA*) algorithms were recently developed to quickly compute solutions with quality guarantees. However, these MOA* algorithms often suffer from high memory usage, especially when the branching factor (i.e. the number of neighbors of any vertex) of the graph is large. This work thus aims at reducing the high memory consumption of MOA* with little increase in the runtime. By generalizing and unifying several single- and multi-objective search algorithms, we develop the Runtime and Memory Efficient MOA* (RME-MOA*) approach, which can balance between runtime and memory efficiency by tuning two user-defined hyper-parameters.

Translated: 2023-07-11 22:36:58 Published: 2023-07-08
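The notion of a Pareto-optimal set of paths can be made concrete with a small label-setting search. The sketch below is a plain multi-objective Dijkstra (equivalently, a multi-objective A* with a zero heuristic), not RME-MOA*: labels are expanded in lexicographic order of their cost vectors, and a label is discarded when a permanent label at the same vertex, or an already found solution, weakly dominates it. It assumes non-negative edge costs, and the function names are hypothetical. Keeping every non-dominated label per vertex is exactly the memory cost the paper aims to reduce.

```python
import heapq
from typing import Dict, List, Tuple

Cost = Tuple[float, ...]
Graph = Dict[str, List[Tuple[str, Cost]]]


def weakly_dominates(a: Cost, b: Cost) -> bool:
    """True if a is no worse than b in every objective."""
    return all(x <= y for x, y in zip(a, b))


def pareto_paths(graph: Graph, start: str, goal: str, num_objectives: int) -> List[Cost]:
    """Return the Pareto-optimal cost vectors of start-goal paths (non-negative costs)."""
    open_heap: List[Tuple[Cost, str]] = [(tuple([0.0] * num_objectives), start)]
    permanent: Dict[str, List[Cost]] = {}
    solutions: List[Cost] = []

    def pruned(vertex: str, cost: Cost) -> bool:
        # Extensions only add non-negative costs, so a dominated label can never help.
        return (any(weakly_dominates(h, cost) for h in permanent.get(vertex, []))
                or any(weakly_dominates(h, cost) for h in solutions))

    while open_heap:
        cost, vertex = heapq.heappop(open_heap)  # lexicographically smallest label first
        if pruned(vertex, cost):
            continue
        permanent.setdefault(vertex, []).append(cost)
        if vertex == goal:
            solutions.append(cost)
            continue
        for nxt, edge_cost in graph.get(vertex, []):
            new_cost = tuple(a + b for a, b in zip(cost, edge_cost))
            if not pruned(nxt, new_cost):
                heapq.heappush(open_heap, (new_cost, nxt))
    return solutions
```

On a graph where one route costs $(2, 4)$, another $(4, 2)$, and a direct edge $(5, 5)$, the search returns the first two and discards the dominated direct edge.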

# エッジコンピューティングのためのオンラインシーケンス学習を用いた効率的な圧縮比推定

Efficient Compressed Ratio Estimation Using Online Sequential Learning for Edge Computing ( http://arxiv.org/abs/2211.04284v3 )

License: see the linked page

Hiroki Oikawa, Hangli Ge, Noboru Koshizuka

(参考訳) モノのインターネットの普及により、大量のセンサー情報がリアルタイムで取得されている。これにより、エッジデバイスからのデータの通信コストが増加する。エッジデバイスで使用可能なデータ圧縮方式である圧縮センシング(cs)は,通信コストを低減する手段として注目を集めている。 csでは,適切な圧縮率の推定が重要である。強化学習(RL)を用いて取得したデータの圧縮比を適応的に推定する手法がある。しかしながら、エッジ上で使用可能な既存のrlメソッドに関連する計算コストは、しばしば高い。本研究では,actor-critic online sequential extreme learning machine (ac-oselm) と呼ばれるエッジデバイスのための効率的なrl法と,ac-oselmを用いてエッジ上の適切な圧縮率を推定してデータを圧縮するシステムを開発した。エッジデバイスにおける他のrl法との比較により,圧縮比推定における提案手法の性能を評価する。実験結果から,AC-OSELMは従来手法よりも圧縮性能が良く,圧縮比が速いことが示唆された。

Owing to the widespread adoption of the Internet of Things, a vast amount of sensor information is being acquired in real time. Accordingly, the communication cost of data from edge devices is increasing. Compressed sensing (CS), a data compression method that can be used on edge devices, has been attracting attention as a method to reduce communication costs. In CS, estimating the appropriate compression ratio is important. There is a method to adaptively estimate the compression ratio for the acquired data using reinforcement learning (RL). However, the computational costs associated with existing RL methods that can be utilized on edge devices are often high. In this study, we developed an efficient RL method for edge devices, referred to as the actor-critic online sequential extreme learning machine (AC-OSELM), and a system to compress data by estimating an appropriate compression ratio on the edge using AC-OSELM. The performance of the proposed method in estimating the compression ratio is evaluated by comparing it with other RL methods for edge devices. The experimental results indicate that AC-OSELM demonstrated the same or better compression performance and faster compression ratio estimation than the existing methods.

Translated: 2023-07-11 22:36:01 Published: 2023-07-08
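AC-OSELM builds on the online sequential extreme learning machine (OS-ELM). As a hedged sketch of that building block only (no actor-critic logic and no compression-ratio estimation), the class below keeps a random, fixed sigmoid hidden layer and updates the output weights chunk by chunk with the recursive least-squares formulas $P \leftarrow P - PH^\top(I+HPH^\top)^{-1}HP$ and $\beta \leftarrow \beta + PH^\top(T-H\beta)$. The initial chunk must have at least as many rows as hidden units so that $H_0^\top H_0$ is invertible. The class and method names are hypothetical.

```python
import numpy as np


class OSELM:
    """Online sequential ELM: fixed random hidden layer, output weights via recursive least squares."""

    def __init__(self, n_inputs: int, n_hidden: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(n_inputs, n_hidden))  # never trained
        self.b = rng.normal(size=n_hidden)
        self.P = None
        self.beta = None

    def hidden(self, X: np.ndarray) -> np.ndarray:
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def init_fit(self, X0: np.ndarray, T0: np.ndarray) -> None:
        """Batch least squares on the initial chunk."""
        H = self.hidden(X0)
        self.P = np.linalg.inv(H.T @ H)
        self.beta = self.P @ H.T @ T0

    def partial_fit(self, X: np.ndarray, T: np.ndarray) -> None:
        """Absorb a new chunk without revisiting old data."""
        H = self.hidden(X)
        K = np.linalg.inv(np.eye(H.shape[0]) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ K @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.hidden(X) @ self.beta
```

Each update only inverts a matrix the size of the incoming chunk, which is why this family of learners is attractive on resource-limited edge devices.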

# すべての実射影計測は自己テストできる

All Real Projective Measurements Can be Self-tested ( http://arxiv.org/abs/2302.00974v2 )

License: see the linked page

Ranyiliu Chen, Laura Man\v{c}inska, Jurij Vol\v{c}i\v{c}

(参考訳) 自己テストは、古典的なユーザーが量子状態と測定値を生成するために使用される測定値を推定できる量子機能検証の最も強力な形式である。量子状態の自己検定はよく理解されているが、特に高次元での自己検定はより解明されている。実射影測度はすべて自己検査可能であることを示すことで、この方向の最初の一般的な結果を示す。自己テストの標準的な定義は、実測値の認証のみを可能にする。したがって、本研究は、自己検証可能な射影測定の範囲を、その全潜在能力を効果的に広げる。この結果を達成するために、既存の自己検査を拡張して、さらなる信頼できない測定を検証できるという考えを用いる。これは 'post-hoc self-testing' として知られている。我々は,ポストホック自己検査法を形式化し,その適用に十分な条件を確立する。この条件を用いて全ての実射影測度に対する自己検査を構築する。本研究では, ポストホック自己検査を逐次的に活用する反復自己検査技術を開発した。確立された自己テストから始めて、反復的自己テストによって検証できる測定セットを完全に特徴づける。これは既存のテストから新しい自己テストを構築するための明確な方法を提供する。

Self-testing is the strongest form of quantum functionality verification which allows a classical user to deduce the quantum state and measurements used to produce measurement statistics. While self-testing of quantum states is well-understood, self-testing of measurements, especially in high dimensions, has remained more elusive. We demonstrate the first general result in this direction by showing that every real projective measurement can be self-tested. The standard definition of self-testing only allows for the certification of real measurements. Therefore, our work effectively broadens the scope of self-testable projective measurements to their full potential. To reach this result, we employ the idea that existing self-tests can be extended to verify additional untrusted measurements. This is known as `post-hoc self-testing'. We formalize the method of post-hoc self-testing and establish a sufficient condition for its application. Using this condition we construct self-tests for all real projective measurements. Inspired by our construction, we develop a new technique of iterative self-testing, which involves using post-hoc self-testing in a sequential manner. Starting from any established self-test, we fully characterize the set of measurements that can be verified via iterative self-testing. This provides a clear methodology for constructing new self-tests from pre-existing ones.

Translated: 2023-07-11 22:28:52 Published: 2023-07-08
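The canonical example of self-testing is the CHSH inequality: observing the maximal quantum value $2\sqrt2$ certifies, up to local isometries, a maximally entangled pair of qubits and anticommuting $\pm1$-valued observables, which can be chosen real. The NumPy sketch below only computes the ideal CHSH value for real Pauli measurements on $|\Phi^+\rangle$; it does not implement the post-hoc or iterative self-testing constructions of the paper.

```python
import numpy as np

Z = np.array([[1.0, 0.0], [0.0, -1.0]])
X = np.array([[0.0, 1.0], [1.0, 0.0]])
PHI_PLUS = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)  # (|00> + |11>)/sqrt(2)


def correlator(a: np.ndarray, b: np.ndarray, state: np.ndarray = PHI_PLUS) -> float:
    """Expectation <state| a (x) b |state> for real observables and a real state."""
    return float(state @ np.kron(a, b) @ state)


def chsh_value() -> float:
    """CHSH combination for Alice's (Z, X) and Bob's rotated real observables."""
    A0, A1 = Z, X
    B0 = (Z + X) / np.sqrt(2.0)
    B1 = (Z - X) / np.sqrt(2.0)
    return (correlator(A0, B0) + correlator(A0, B1)
            + correlator(A1, B0) - correlator(A1, B1))
```

Each observable has eigenvalues $\pm1$, so the corresponding projective measurements are $(I\pm A)/2$, all with real entries.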

# Siamese配列構造拡散軌道予測によるタンパク質エンコーダの事前学習

Pre-Training Protein Encoder via Siamese Sequence-Structure Diffusion Trajectory Prediction ( http://arxiv.org/abs/2301.12068v2 )

License: see the linked page

Zuobai Zhang, Minghao Xu, Aur\'elie Lozano, Vijil Chenthamarakshan, Payel Das, Jian Tang

(参考訳) タンパク質の自己教師付き事前学習法は最近注目されているが、ほとんどのアプローチはタンパク質配列または構造のどちらかに焦点を当てており、両者の同時分布の探索を軽視している。この同時分布は、共進化情報と構造特性を統合することでタンパク質の機能を包括的に理解するうえで極めて重要である。本研究では、生成タスクにおけるデノイジング拡散モデルの成功に着想を得て、配列と構造の同時拡散モデリングによってタンパク質エンコーダを事前学習するDiffPreTアプローチを提案する。DiffPreTは、同時拡散軌道に沿って摂動された配列と構造から元のタンパク質配列と構造を復元するようエンコーダを導き、配列と構造の同時分布を獲得する。タンパク質の本質的なコンフォメーション変化を考慮し、シャム拡散軌道予測(SiamDiff)と呼ぶ手法でDiffPreTを強化し、タンパク質の異なるコンフォーマー間の相関を捉える。SiamDiffは、構造的に相関したコンフォーマーの拡散軌道の表現間の相互情報を最大化することでこの目標を達成する。我々は、原子レベルおよび残基レベルの構造に基づくタンパク質理解タスクにおいてDiffPreTとSiamDiffの有効性を検討した。実験結果から、DiffPreTの性能はすべてのタスクで一貫して競争力があり、SiamDiffは全タスクの平均順位において新たな最先端の性能を達成することが示された。実装は https://github.com/DeepGraphLearning/SiamDiff で公開されている。

Self-supervised pre-training methods on proteins have recently gained attention, with most approaches focusing on either protein sequences or structures, neglecting the exploration of their joint distribution, which is crucial for a comprehensive understanding of protein functions by integrating co-evolutionary information and structural characteristics. In this work, inspired by the success of denoising diffusion models in generative tasks, we propose the DiffPreT approach to pre-train a protein encoder by sequence-structure joint diffusion modeling. DiffPreT guides the encoder to recover the native protein sequences and structures from the perturbed ones along the joint diffusion trajectory, thereby learning the joint distribution of sequences and structures. Considering the essential conformational variations of proteins, we enhance DiffPreT by a method called Siamese Diffusion Trajectory Prediction (SiamDiff) to capture the correlation between different conformers of a protein. SiamDiff attains this goal by maximizing the mutual information between representations of diffusion trajectories of structurally-correlated conformers. We study the effectiveness of DiffPreT and SiamDiff on both atom- and residue-level structure-based protein understanding tasks. Experimental results show that the performance of DiffPreT is consistently competitive on all tasks, and SiamDiff achieves new state-of-the-art performance, considering the mean ranks on all tasks. Our implementation is available at https://github.com/DeepGraphLearning/SiamDiff.
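Here is a minimal Python sketch of the forward (noising) step only, to make the idea of a joint sequence-structure diffusion trajectory concrete. It assumes two things:

- Gaussian noise on 3-D coordinates, with a linear DDPM-style schedule.
- Absorbing-state masking for residue types.

The schedule, the mask token `#` and the helper names are assumptions for illustration, not DiffPreT's actual configuration. The encoder would then be trained to recover the original sequence and coordinates from such perturbed inputs.

```python
import math
import random


def alpha_bar(t, T, beta_min=1e-4, beta_max=0.02):
    """Cumulative product of (1 - beta_s), s = 1..t, under a linear beta schedule (assumed)."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_min + (beta_max - beta_min) * (s - 1) / max(T - 1, 1)
        prod *= 1.0 - beta
    return prod


def joint_perturb(coords, seq, t, T, rng, mask_token="#"):
    """Perturb structure and sequence jointly to diffusion step t (hypothetical helper).

    coords: list of (x, y, z) tuples; seq: one-letter residue string of the same length.
    """
    ab = alpha_bar(t, T)
    # Structure: x_t = sqrt(ab) * x_0 + sqrt(1 - ab) * noise, per coordinate
    noisy_coords = [
        tuple(math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for x in xyz)
        for xyz in coords
    ]
    # Sequence: absorbing-state discrete diffusion, each residue masked with probability t / T
    noisy_seq = "".join(mask_token if rng.random() < t / T else aa for aa in seq)
    return noisy_coords, noisy_seq
```

At `t = 0` the input is returned unchanged. At `t = T` every residue is masked and the coordinates are dominated by noise.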

翻訳日:2023-07-11 22:28:33 公開日:2023-07-08

# Talk the Walk: 対話型音楽推薦のための合成データ生成

Talk the Walk: Synthetic Data Generation for Conversational Music Recommendation ( http://arxiv.org/abs/2301.11489v2 )

ライセンス: Link先を確認

Megan Leszczynski, Ravi Ganti, Shu Zhang, Krisztian Balog, Filip Radlinski, Fernando Pereira, Arun Tejasvi Chaganty

(参考訳) 推薦システムはユビキタスであるが、推薦品質が低い場合にユーザが推薦を制御・調整することは難しいことが多い。これが、自然言語フィードバックによって推薦を制御できる会話型推薦システム（CRS）の開発の動機となった。しかし、会話型推薦システムを構築するには、多様な嗜好をカバーするアイテムと組み合わされたユーザ発話を含む会話型学習データが必要である。このようなデータを、クラウドソーシングのような従来の手法でスケーラブルに収集することは困難であることがわかっている。本研究では、音楽、ニュース、レシピ推薦といったユースケースによってこのタスクへの関心が高まっていることに注目し、アイテムセット推薦の文脈でこの問題に取り組む。本稿では、広く入手可能なキュレーション済みアイテムコレクションに符号化されたドメイン知識を活用して、現実的で高品質な会話データを合成する新しい手法TalkTheWalkを提案し、これらのコレクションを対応するアイテムセットのキュレーション会話へと変換できることを示す。具体的には、TalkTheWalkは、システムが返す仮説的だがもっともらしい一連のアイテムセットを生成し、その後、言語モデルを用いて対応するユーザ発話を生成する。TalkTheWalkを音楽推薦に適用し、100万件を超える多様なプレイリストのキュレーション会話を生成した。人手評価では、会話には関連するアイテムセットと一貫した発話が含まれており、このタスク向けに人手で収集された小規模な会話データの品質にほぼ匹敵することが示された。同時に、合成コーパスを用いてCRSを学習すると、ベンチマークデータセットにおいて標準的なベースラインよりHits@100が10.5ポイント向上し、オンライン評価では最高性能のベースラインよりも好まれた。

Recommendation systems are ubiquitous yet often difficult for users to control and adjust when recommendation quality is poor. This has motivated the development of conversational recommendation systems (CRSs), with control over recommendations provided through natural language feedback. However, building conversational recommendation systems requires conversational training data involving user utterances paired with items that cover a diverse range of preferences. Such data has proved challenging to collect scalably using conventional methods like crowdsourcing. We address this challenge in the context of item-set recommendation, noting the increasing attention to this task motivated by use cases like music, news and recipe recommendation. We present a new technique, TalkTheWalk, that synthesizes realistic high-quality conversational data by leveraging domain expertise encoded in widely available curated item collections, showing how these can be transformed into corresponding item set curation conversations. Specifically, TalkTheWalk generates a sequence of hypothetical yet plausible item sets returned by a system, then uses a language model to produce corresponding user utterances. Applying TalkTheWalk to music recommendation, we generate over one million diverse playlist curation conversations. A human evaluation shows that the conversations contain consistent utterances with relevant item sets, nearly matching the quality of small human-collected conversational data for this task. At the same time, when the synthetic corpus is used to train a CRS, it improves Hits@100 by 10.5 points on a benchmark dataset over standard baselines and is preferred over the top-performing baseline in an online evaluation.
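The following Python sketch illustrates the kind of transformation described above. It turns one curated collection (here a playlist) into a multi-turn dialogue in which the system's item sets grow step by step. It is a hypothetical simplification:

- The growing-prefix walk and the helper names are assumptions.
- `generate_utterance` is a fixed-template placeholder where TalkTheWalk would use a language model.

```python
import random


def talk_the_walk(playlist_title, items, n_turns, rng):
    """Hypothetical sketch: turn one curated collection into (utterance, item set) turns."""
    shuffled = items[:]
    rng.shuffle(shuffled)
    step = max(1, len(shuffled) // n_turns)
    dialogue = []
    for turn in range(1, n_turns + 1):
        # Hypothetical yet plausible item set the system returns at this turn
        shown = shuffled[: min(len(shuffled), step * turn)]
        dialogue.append((generate_utterance(playlist_title, shown, turn), shown))
    return dialogue


def generate_utterance(title, shown, turn):
    """Placeholder for the language model; a fixed template keeps the sketch runnable."""
    if turn == 1:
        return f"Can you make me a playlist for '{title}'?"
    return f"More like {shown[-1]}, please."


for utterance, shown in talk_the_walk("rainy day jazz", ["a", "b", "c", "d"], 2, random.Random(0)):
    print(utterance, shown)
```

Because each conversation is derived from an existing curated collection, every item set stays coherent with the collection's theme. This is the property the synthetic data relies on.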

翻訳日:2023-07-11 22:28:07 公開日:2023-07-08

# 構造的物理近似を用いた再配列基準の物理的実現

Physical realization of realignment criteria using structural physical approximation ( http://arxiv.org/abs/2301.09884v2 )

ライセンス: Link先を確認

Shruti Aggarwal, Anu Kumari, Satyabrata Adhikari

(参考訳) 量子もつれは量子情報処理における重要な資源であるため、もつれ検出は量子情報理論において重要な問題である。再配列基準は、二部系および多部系の量子系においてもつれ状態を検出する強力なツールである。この基準は、負の部分転置もつれ状態（NPTES）だけでなく正の部分転置もつれ状態（PPTES）に対してもうまく機能するため、もつれ検出の重要な基準となっている。再配列写像に対応する行列は不定値であるため、この写像を実験的に実装することは容易ではない。本研究では、まず構造的物理近似（SPA）の手法を用いて再配列写像を正写像で近似し、次に再配列写像の構造的物理近似（SPA-R）が完全正であることを示す。構成した写像の正値性は、物理的に測定可能なモーメントを用いて特徴づけられる。次に、SPA-R写像に基づく分離可能性基準を不等式の形で導出し、この基準がNPTESだけでなくPPTESも検出することを示す。得られた結果を裏付けるいくつかの例を示す。さらに、再配列写像の近似によって生じうる誤差を解析する。

Entanglement detection is an important problem in quantum information theory because quantum entanglement is a key resource in quantum information processing. The realignment criterion is a powerful tool for detecting entangled states in bipartite and multipartite quantum systems. It is an important entanglement-detection criterion because it works well not only for negative partial transpose entangled states (NPTES) but also for positive partial transpose entangled states (PPTES). Since the matrix corresponding to the realignment map is indefinite, implementing the map experimentally is not straightforward. In this work, we first approximate the realignment map by a positive map using the method of structural physical approximation (SPA), and then show that the structural physical approximation of the realignment map (SPA-R) is completely positive. Positivity of the constructed map is characterized using moments that can be physically measured. Next, we develop a separability criterion based on our SPA-R map in the form of an inequality and show that it detects not only NPTES but also PPTES. We provide some examples to support these results. Moreover, we analyse the error that may arise from approximating the realignment map.
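For reference, the realignment criterion that SPA-R is built to make measurable can be evaluated numerically. A separable state always satisfies ‖R(ρ)‖₁ ≤ 1, so a trace norm above 1 certifies entanglement. The sketch below assumes NumPy and a d ⊗ d state. It computes the realigned matrix directly (a non-physical operation) and does not implement the structural physical approximation itself.

```python
import numpy as np


def realign(rho, d):
    """Realignment R[(i,j),(k,l)] = rho[(i,k),(j,l)] for a state on C^d (x) C^d."""
    # reshape gives indices [i, k, j, l]; swap the middle two to get [i, j, k, l]
    return rho.reshape(d, d, d, d).transpose(0, 2, 1, 3).reshape(d * d, d * d)


def realignment_norm(rho, d):
    """Trace norm (sum of singular values) of the realigned matrix; > 1 implies entanglement."""
    return float(np.linalg.svd(realign(rho, d), compute_uv=False).sum())


phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)  # Bell state |Phi+>
print(realignment_norm(np.outer(phi, phi), 2))      # about 2: entangled
print(realignment_norm(np.eye(4) / 4, 2))           # about 0.5: consistent with separability
```

A value at or below 1 is inconclusive rather than a proof of separability, since the criterion is only sufficient for entanglement.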

翻訳日:2023-07-11 22:27:09 公開日:2023-07-08

# 無限距離相互作用をもつスピン系における閉じた離散時間結晶と散逸的離散時間結晶の橋渡し

Bridging closed and dissipative discrete time crystals in spin systems with infinite-range interactions ( http://arxiv.org/abs/2303.13334v2 )

ライセンス: Link先を確認

Jayson G. Cosme, Jim Skulte, Ludwig Mathey

(参考訳) 我々は、ディッケモデルで記述される周期駆動スピン-ボソン系において、ボソンチャネルの散逸が時間結晶（TC）の出現と安定性に果たす役割を解明する。ここでボソンは光子で表され、スピン系間の無限距離相互作用を媒介する。強い散逸に対しては、有効的な原子のみの記述と閉じたリプキン-メシュコフ-グリックモデルを用いて力学を調べる。散逸強度をゼロから無限大まで変化させて相図を描くことで、相図においてTCが存在する領域は散逸強度とともに拡大するが、それはある最適点までであり、それを超えるとほとんどのTCが不安定になることを示す。TCは閉じた系と散逸系の両方の領域で見られるが、散逸的TCは駆動のランダムノイズに対してより頑健であり、初期状態の選択の影響も弱いことが示される。さらに、完全な量子力学的記述の範囲で、スピン数と相互作用強度に対するTCの有限サイズ挙動と寿命のスケーリングを示す。

We elucidate the role that the dissipation in a bosonic channel plays in the prevalence and stability of time crystals (TCs) in a periodically driven spin-boson system described by the Dicke model. Here, the bosons are represented by photons, and they mediate the infinite-range interactions between the spin systems. For strong dissipation, we study the dynamics using an effective atom-only description and the closed Lipkin-Meshkov-Glick model. By mapping out the phase diagrams for varying dissipation strengths, ranging from zero to infinitely strong, we demonstrate that the area in the phase diagram, where a TC exists, grows with the dissipation strength but only up to an optimal point, beyond which most of the TCs become unstable. We find TCs in both closed-system and dissipative regimes, but dissipative TCs are shown to be more robust against random noise in the drive, and are only weakly affected by the choice of initial state. We present the finite-size behaviour and the scaling of the lifetime of the TCs with respect to the number of spins and the interaction strength within a fully quantum mechanical description.
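A discrete time crystal is usually diagnosed by a subharmonic response: an order parameter sampled once per drive period flips sign every period (period doubling). The sketch below computes the Fourier amplitude at half the drive frequency from such stroboscopic samples. It is a generic diagnostic with a hypothetical helper name, not the analysis pipeline of this paper.

```python
def subharmonic_amplitude(samples):
    """Amplitude of the response at half the drive frequency.

    With one sample x_n per drive period T, the Fourier component at omega/2
    reduces to the alternating average (1/N) * sum_n (-1)^n x_n.
    """
    n = len(samples)
    return abs(sum((-1) ** k * x for k, x in enumerate(samples))) / n


print(subharmonic_amplitude([1.0, -1.0] * 50))  # perfect period doubling
print(subharmonic_amplitude([1.0] * 100))       # no subharmonic response
```

A time crystal with a finite lifetime shows up as an alternating signal whose envelope decays. Its subharmonic amplitude then lies strictly between 0 and 1 and shrinks as the observation window grows past the lifetime.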

翻訳日:2023-07-11 22:17:36 公開日:2023-07-08

# 拡張貯留層アプローチによる周期駆動傾斜格子の輸送:連続体限界の回復のための安定性基準

Transport in a periodically driven tilted lattice via the extended reservoir approach: Stability criterion for recovering the continuum limit ( http://arxiv.org/abs/2303.04160v3 )

ライセンス: Link先を確認

Bitan De, Gabriela Wojtowicz, Jakub Zakrzewski, Michael Zwolak, Marek M. Rams

(参考訳) 拡張貯留層は、ナノスケールの接合、不純物、または物質に電流を流す金属電極のような、巨視的な連続体環境を捉えるための枠組みを提供する。本稿では、この手法を周期駆動系、特に量子輸送の文脈に適用することを検討する。時間に依存しない状況における非平衡定常状態と同様に、電流はクラマースのターンオーバーを示し、物理的な連続体極限の応答を捉えるプラトー領域を形成する。単純な安定性基準によって、この物理的プラトーを狙うための適切な緩和率が特定できることを示す。この手法を用いて、有限のバイアスと温度に保たれた2つの金属貯留層に結合した周期駆動傾斜格子を通る量子輸送を調べる。このモデルを用いて拡張貯留層アプローチのベンチマークを行い、安定性基準を評価する。このアプローチは、系と貯留層の結合が弱い極限において、よく理解された物理的振る舞いを再現する。拡張貯留層により強結合や非線形応答も扱うことができ、そこでは駆動格子内部のダイナミクスに対して輸送がどのように応答するかを解析する。これらの結果は、多体フロケ状態のような周期駆動量子系に拡張貯留層アプローチを用いるための基盤となる。

Extended reservoirs provide a framework for capturing macroscopic, continuum environments, such as metallic electrodes driving a current through a nanoscale contact, impurity, or material. We examine the application of this approach to periodically driven systems, specifically in the context of quantum transport. As with non-equilibrium steady states in time-independent scenarios, the current displays a Kramers' turnover, including the formation of a plateau region that captures the physical, continuum-limit response. We demonstrate that a simple stability criterion identifies an appropriate relaxation rate to target this physical plateau. Using this approach, we study quantum transport through a periodically driven tilted lattice coupled to two metallic reservoirs held at a finite bias and temperature. We use this model to benchmark the extended reservoir approach and assess the stability criterion. The approach recovers well-understood physical behavior in the limit of weak system-reservoir coupling. Extended reservoirs also enable addressing strong coupling and non-linear response, where we analyze how transport responds to the dynamics inside the driven lattice. These results lay the foundations for using the extended reservoir approach for periodically driven quantum systems, such as many-body Floquet states.
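The idea of targeting the plateau of a Kramers' turnover can be illustrated with a toy curve. The sketch below is a hypothetical heuristic, not the paper's stability criterion. It scans a log-spaced grid of relaxation rates and returns the one where the current is least sensitive to the rate, i.e. the smallest |d ln I / d ln γ|. The functional form of `toy_current` is an assumption chosen only to produce a turnover with a broad plateau.

```python
import math


def plateau_rate(gammas, currents):
    """Return the relaxation rate with the smallest |d ln I / d ln gamma|.

    Uses central differences on the log-log curve at each interior grid point.
    """
    best_gamma, best_slope = None, math.inf
    for k in range(1, len(gammas) - 1):
        slope = (math.log(currents[k + 1]) - math.log(currents[k - 1])) / (
            math.log(gammas[k + 1]) - math.log(gammas[k - 1])
        )
        if abs(slope) < best_slope:
            best_gamma, best_slope = gammas[k], abs(slope)
    return best_gamma


def toy_current(g, a=1e-3, b=1e3):
    """Hypothetical turnover: rises for g << a, flat in between, falls for g >> b."""
    return g / (g + a) / (1 + g / b)


gammas = [10.0 ** k for k in range(-6, 7)]
print(plateau_rate(gammas, [toy_current(g) for g in gammas]))
```

In a real calculation each current value would come from a full extended-reservoir simulation at that relaxation rate. The toy function only stands in for that expensive step.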

翻訳日:2023-07-11 22:15:56 公開日:2023-07-08

# 合成データ、実際の誤り：合成データをどのように公開・利用すべきか（すべきでないか）

Synthetic data, real errors: how (not) to publish and use synthetic data ( http://arxiv.org/abs/2305.09235v2 )

ライセンス: Link先を確認

Boris van Breugel, Zhaozhi Qian, Mihaela van der Schaar

(参考訳) 生成モデルによる合成データの生成はMLコミュニティ内外で関心を集めており、データセットを個々のニーズに合わせて調整できる未来を約束している。残念ながら、合成データは通常完全ではないため、下流タスクで誤りが生じる可能性がある。本研究では、生成過程が下流のMLタスクにどのように影響するかを調べる。合成データをあたかも本物であるかのように使うナイーブな合成データアプローチは、実データにうまく汎化しない下流モデルと分析につながることを示す。合成データ領域におけるより良いMLへの第一歩として、深層生成アンサンブル（DGE）を導入する。これは深層アンサンブルに着想を得た枠組みで、生成過程モデルのパラメータに対する事後分布を暗黙的に近似することを目的とする。DGEは下流モデルの学習、評価、不確実性定量化を改善し、平均してナイーブアプローチを大きく上回る。最大の改善は、生成的不確実性が最も大きい、元データの少数派クラスと低密度領域で達成される。

Generating synthetic data through generative models is gaining interest in the ML community and beyond, promising a future where datasets can be tailored to individual needs. Unfortunately, synthetic data is usually not perfect, resulting in potential errors in downstream tasks. In this work we explore how the generative process affects the downstream ML task. We show that the naive synthetic data approach -- using synthetic data as if it were real -- leads to downstream models and analyses that do not generalize well to real data. As a first step towards better ML in the synthetic data regime, we introduce Deep Generative Ensemble (DGE) -- a framework inspired by Deep Ensembles that aims to implicitly approximate the posterior distribution over the generative process model parameters. DGE improves downstream model training, evaluation, and uncertainty quantification, vastly outperforming the naive approach on average. The largest improvements are achieved for minority classes and low-density regions of the original data, for which the generative uncertainty is largest.
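The core of DGE can be sketched in a few lines of Python: run the downstream task on data from several generative models and aggregate the results. This is a toy under stated assumptions. Each "generator" is a 1-D Gaussian fit to a bootstrap resample of the real data, standing in for deep generators trained with different seeds, and the helper names are hypothetical. The spread across members exposes the generative uncertainty that the naive single-dataset approach ignores.

```python
import random
import statistics


def fit_gaussian(data):
    """Toy 'generative model': a 1-D Gaussian fit by maximum likelihood."""
    return statistics.fmean(data), statistics.pstdev(data)


def sample_gaussian(params, n, rng):
    mu, sigma = params
    return [rng.gauss(mu, sigma) for _ in range(n)]


def generative_ensemble(real, n_members, n_synth, downstream, seed=0):
    """Run `downstream` on each member's synthetic data; return (mean, spread) of results."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_members):
        # Bootstrap resampling stands in for training a deep generator with a new seed
        boot = [rng.choice(real) for _ in real]
        synthetic = sample_gaussian(fit_gaussian(boot), n_synth, rng)
        results.append(downstream(synthetic))
    return statistics.fmean(results), statistics.pstdev(results)


real = [float(x) for x in range(100)]
print(generative_ensemble(real, n_members=5, n_synth=2000, downstream=statistics.fmean))
```

Here the downstream "task" is just estimating a mean. The same loop applies to training a classifier on each synthetic set and averaging its predictions.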

翻訳日:2023-07-11 22:07:58 公開日:2023-07-08

# 単眼スケール補正と視覚ブートストラップを用いた視覚-LiDARオドメトリとマッピング

Visual-LiDAR Odometry and Mapping with Monocular Scale Correction and Visual Bootstrapping ( http://arxiv.org/abs/2304.08978v2 )

ライセンス: Link先を確認

Hanyu Cai, Ni Ou and Junzheng Wang

(参考訳) 本稿では、低ドリフト特性をもつ新しい視覚-LiDARオドメトリとマッピング手法を提案する。提案手法は、代表的な2つの手法であるORB-SLAMとA-LOAMに基づき、単眼スケール補正と、視覚ブートストラップによるLiDAR姿勢初期化という改良を加えたものである。スケール補正器は、三角測量で復元した画像キーポイントの深度とLiDARが与える深度の比を計算し、精度向上のために外れ値除去処理を用いる。LiDAR姿勢の初期化については、ビジュアルオドメトリによってLiDARの運動の初期推定値を与え、性能を向上させる。この手法は高解像度LiDARだけでなく、低解像度LiDARにも適用できる。提案したSLAMシステムの頑健性と精度を評価するため、KITTI OdometryおよびS3Eデータセットで実験を行った。実験の結果、提案手法は単体のORB-SLAM2およびA-LOAMを大きく上回った。さらに、スケール補正を伴うビジュアルオドメトリの精度について、提案手法はステレオモードのORB-SLAM2と同等の性能を示した。

This paper presents a novel visual-LiDAR odometry and mapping method with low-drift characteristics. The proposed method builds on two popular approaches, ORB-SLAM and A-LOAM, adding two modifications: monocular scale correction and visually bootstrapped LiDAR pose initialization. The scale corrector calculates the ratio between the depth of image keypoints recovered by triangulation and the depth provided by LiDAR, using an outlier rejection process to improve accuracy. For LiDAR pose initialization, the visual odometry provides initial guesses of the LiDAR motion for better performance. This methodology is applicable not only to high-resolution LiDAR but also to low-resolution LiDAR. To evaluate the proposed SLAM system's robustness and accuracy, we conducted experiments on the KITTI Odometry and S3E datasets. Experimental results illustrate that our method significantly outperforms standalone ORB-SLAM2 and A-LOAM. Furthermore, regarding the accuracy of visual odometry with scale correction, our method performs similarly to the stereo-mode ORB-SLAM2.
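The scale-correction step can be sketched as follows:

1. Compute per-keypoint ratios between LiDAR depth and triangulated (up-to-scale) monocular depth.
2. Reject outliers.
3. Average the rest.

The median/MAD rejection rule and the threshold `k` are assumptions standing in for the paper's outlier rejection process, whose details are not given here.

```python
import statistics


def estimate_scale(tri_depths, lidar_depths, k=3.0):
    """Scale s such that lidar_depth is approximately s * triangulated_depth.

    Ratios farther than k median absolute deviations from the median are
    treated as outliers (an assumed rule) before averaging.
    """
    ratios = [l / t for t, l in zip(tri_depths, lidar_depths) if t > 0 and l > 0]
    med = statistics.median(ratios)
    mad = statistics.median(abs(r - med) for r in ratios)
    inliers = [r for r in ratios if abs(r - med) <= k * mad]
    return statistics.fmean(inliers)


# Four consistent keypoints and one bad LiDAR association
print(estimate_scale([1, 2, 3, 4, 5], [2, 4, 6, 8, 50]))
```

A median-based rule is used here because a single wrong LiDAR association (for example, a keypoint projected onto a background point) would otherwise bias a plain mean.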

翻訳日:2023-07-11 22:06:52 公開日:2023-07-08

# グラフによるドメイン間知識伝達

Graph Enabled Cross-Domain Knowledge Transfer ( http://arxiv.org/abs/2304.03452v2 )

ライセンス: Link先を確認

Shibo Yao

(参考訳) 意思決定プロセスで機械学習を活用するには、与えられた知識（自然言語、非構造化テキストなど）を、機械学習モデルが互換性のある言語とデータ形式で理解・処理できる表現ベクトルに変換する必要がある。しかし、しばしば直面する困難は、与えられた知識がそもそも十分に豊富でも信頼できるものでもないことである。そのような場合には、別の領域からの側面情報を融合させ、良好な表現学習と関心領域における乏しい知識との間のギャップを緩和しようとする。このアプローチはクロスドメイン知識転移と呼ばれる。オンライン医療プラットフォームの分析から金融市場のリスク定量化に至るまで、多くのシナリオで知識の不足は一般的であり、自動化された意思決定の恩恵を受けるうえでの障害となっているため、この問題を研究することは重要である。機械学習の観点からは、半教師付き学習のパラダイムは、正解ラベルのない大量のデータを活用し、目覚ましい学習性能の向上を実現する。本論文では、クロスドメイン知識転移にこのパラダイムを採用する。（続く）

To leverage machine learning in any decision-making process, one must convert the given knowledge (for example, natural language, unstructured text) into representation vectors that can be understood and processed by machine learning models in their compatible language and data format. A frequently encountered difficulty, however, is that the given knowledge is not rich or reliable enough in the first place. In such cases, one seeks to fuse side information from a separate domain to mitigate the gap between good representation learning and the scarce knowledge in the domain of interest. This approach is called Cross-Domain Knowledge Transfer. It is crucial to study the problem because scarce knowledge is common in many scenarios, from online healthcare platform analyses to financial market risk quantification, and it stands in the way of benefiting from automated decision making. From the machine learning perspective, the paradigm of semi-supervised learning takes advantage of large amounts of data without ground truth and achieves impressive improvements in learning performance. This dissertation adopts it for cross-domain knowledge transfer. (to be continued)
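The abstract does not specify the method. Label propagation is one classic way a graph lets semi-supervised learning exploit unlabelled data: labelled nodes are clamped, and every other node repeatedly takes the average score of its neighbours. The sketch below is a generic illustration with a hypothetical function name, not the dissertation's algorithm.

```python
def label_propagation(adj, labels, n_iter=100):
    """Binary label propagation on an undirected graph.

    adj: dict node -> list of neighbours; labels: dict node -> 0 or 1 for labelled nodes.
    Returns a score in [0, 1] for every node.
    """
    score = {v: float(labels.get(v, 0.5)) for v in adj}
    for _ in range(n_iter):
        new = {}
        for v, nbrs in adj.items():
            if v in labels:
                new[v] = float(labels[v])  # clamp known labels
            elif nbrs:
                new[v] = sum(score[u] for u in nbrs) / len(nbrs)
            else:
                new[v] = score[v]  # isolated unlabelled node keeps its prior
        score = new
    return score


# Path graph 0-1-2-3-4 with only the two end nodes labelled
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(label_propagation(path, {0: 1, 4: 0}))
```

In a cross-domain setting, edges could also link entities across domains. This lets side information from a richer domain flow into the scarce one.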

翻訳日:2023-07-11 22:05:29 公開日:2023-07-08

# 離散単位を中間目標とするテキストレス音声言語理解の改善

Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target ( http://arxiv.org/abs/2305.18096v2 )

ライセンス: Link先を確認

Guan-Wei Wu, Guan-Ting Lin, Shang-Wen Li, Hung-yi Lee

(参考訳) 音声言語理解（SLU）は、音声発話から意味情報を抽出することを目的としたタスクである。従来の研究は、事前学習済みの自動音声認識（ASR）モデルや、ペアテキストを中間目標とするなど、ペアの音声テキストデータを用いることでエンドツーエンドSLUを進展させてきた。しかし、ペアの書き起こしを取得するのは高価であり、文字をもたない言語では非現実的である。一方、テキストレスSLUは、ペアの書き起こしを使わずに音声から意味情報を抽出する。しかし、テキストレスSLUでは中間目標と学習の手がかりが欠如しているため、性能が最適でないことが多い。本研究では、自己教師付き音声モデルから得られるコンテンツを分離した離散単位に着想を得て、テキストレスSLUの性能を向上させるための中間的な手がかりとして離散単位を用いることを提案する。本手法は、5つのSLUベンチマークコーパスにおいてベースライン手法を上回った。さらに、単位による手がかりが少数ショット学習を促進し、ノイズに対処するモデルの能力を高めることがわかった。

Spoken Language Understanding (SLU) is a task that aims to extract semantic information from spoken utterances. Previous research has made progress in end-to-end SLU by using paired speech-text data, such as pre-trained Automatic Speech Recognition (ASR) models or paired text as intermediate targets. However, acquiring paired transcripts is expensive and impractical for unwritten languages. On the other hand, Textless SLU extracts semantic information from speech without utilizing paired transcripts. However, the absence of intermediate targets and training guidance for textless SLU often results in suboptimal performance. In this work, inspired by the content-disentangled discrete units from self-supervised speech models, we propose using discrete units as intermediate guidance to improve textless SLU performance. Our method surpasses the baseline method on five SLU benchmark corpora. Additionally, we find that unit guidance facilitates few-shot learning and enhances the model's ability to handle noise.
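Discrete speech units of this kind are commonly obtained by clustering frame-level features of a self-supervised speech model (for example with k-means). Each frame is mapped to its nearest centroid, and consecutive repeats are often collapsed. The sketch below shows only that quantization step on toy 2-D features. The centroids are hypothetical and the feature extractor is omitted.

```python
def to_units(frames, centroids, dedup=True):
    """Map each feature frame to the index of its nearest centroid (squared Euclidean)."""

    def nearest(x):
        return min(
            range(len(centroids)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])),
        )

    units = [nearest(f) for f in frames]
    if dedup:
        # Collapse runs of identical consecutive units, e.g. [0, 0, 1, 1, 0] -> [0, 1, 0]
        units = [u for i, u in enumerate(units) if i == 0 or u != units[i - 1]]
    return units


centroids = [(0.0, 0.0), (1.0, 1.0)]
frames = [(0.1, 0.0), (0.0, 0.2), (0.9, 1.0), (1.0, 1.1), (0.0, 0.0)]
print(to_units(frames, centroids))
```

The resulting unit sequence can then serve as an intermediate prediction target, playing the role that a transcript plays in text-based SLU.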

翻訳日:2023-07-11 21:56:15 公開日:2023-07-08

# 複素非対称ホッピングをもつ非エルミート準結晶におけるトポロジカル三重相転移

Topological triple phase transition in non-Hermitian quasicrystals with complex asymmetric hopping ( http://arxiv.org/abs/2306.14987v2 )

ライセンス: Link先を確認

Shaina Gandhi and Jayendra N. Bandyopadhyay

(参考訳) 3つの異なる相転移、すなわちトポロジカル相転移、パリティ-時間（PT）対称性の破れ、金属-絶縁体転移が同時に起こる三重相転移が、PT対称な非エルミート・オーブリー-アンドレ-ハーパー模型の拡張において観測される。この模型では、非エルミートな複素準周期オンサイトポテンシャルに加えて、最近接ホッピング項にも非エルミート性が含まれる。さらに、最近接ホッピング項も準周期的である。オンサイトポテンシャルとホッピング部分のそれぞれに由来する2つの非エルミートパラメータの存在が、系のPT対称性転移を保証する。加えて、これら2つの非エルミートパラメータを調整することで、三重相転移が観測されるパラメータ領域を特定する。いくつかの最近の研究にならい、この模型の電気回路に基づく実験的実現についても議論する。

The triple phase transitions, or simultaneous transitions of three different phases, namely topological, parity-time (PT) symmetry breaking, and metal-insulator transitions, are observed in an extension of the PT-symmetric non-Hermitian Aubry-André-Harper model. In this model, besides a non-Hermitian complex quasi-periodic onsite potential, non-Hermiticity is also included in the nearest-neighbor hopping terms. Moreover, the nearest-neighbor hopping terms are also quasi-periodic. The presence of two non-Hermitian parameters, one from the onsite potential and another from the hopping part, ensures a PT symmetry transition in the system. In addition, by tuning these two non-Hermitian parameters, we identify a parameter regime where we observe the triple phase transition. Following some recent studies, an electrical-circuit-based experimental realization of this model is also discussed.
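A quick numerical check for such models is whether the spectrum is entirely real. For a PT-symmetric Hamiltonian, a real spectrum indicates the unbroken PT phase. The NumPy sketch below builds an open-chain Aubry-André-Harper-type Hamiltonian with complex phases in the onsite and hopping terms. This parametrization is a hypothetical stand-in, not the paper's model, and the code does not verify PT symmetry.

```python
import numpy as np


def quasi_periodic_hamiltonian(n_sites, v=1.0, lam=0.5, t=1.0, h=0.0, g=0.0,
                               alpha=(np.sqrt(5) - 1) / 2, phi=0.0):
    """Open-chain AAH-type Hamiltonian with complex phases (hypothetical form).

    Onsite:  V_n = v * cos(2*pi*alpha*n + phi + 1j*h)
    Hopping: J_n = t + lam * cos(2*pi*alpha*(n + 1/2) + phi + 1j*g)
    h = g = 0 gives a Hermitian matrix; nonzero h or g makes it non-Hermitian.
    """
    n = np.arange(n_sites)
    onsite = v * np.cos(2 * np.pi * alpha * n + phi + 1j * h)
    hop = t + lam * np.cos(2 * np.pi * alpha * (n[:-1] + 0.5) + phi + 1j * g)
    H = np.diag(onsite).astype(complex)
    H += np.diag(hop, k=1) + np.diag(hop, k=-1)
    return H


def max_imag_energy(H):
    """Largest |Im E| over the spectrum; about 0 when the spectrum is entirely real."""
    return float(np.max(np.abs(np.linalg.eigvals(H).imag)))


for h in (0.0, 0.2, 1.0):
    print(h, max_imag_energy(quasi_periodic_hamiltonian(100, v=1.0, h=h, g=0.1)))
```

Scanning the two non-Hermitian parameters on a grid and recording where `max_imag_energy` departs from zero is one simple way to map a spectral transition. Localization (inverse participation ratio) and winding-number diagnostics would be needed alongside it for the other two transitions.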

翻訳日:2023-07-11 21:47:45 公開日:2023-07-08

# 量子進化のための量子時間の研究

Insights of quantum time for quantum evolution ( http://arxiv.org/abs/2306.11675v2 )

ライセンス: Link先を確認

Ngo Phuc Duc Loc

(参考訳) 時間が創発的なものであれば、量子系は進化するにつれて量子時間ともつれる。系が自身の内部にもつれ（「外部」の時間-系もつれと区別して内部もつれと呼ぶ）を含む場合、進化の速度は増大する。本稿では、2つのもつれた量子ビットを含む系の進化について、量子時間から得られる知見を探る。2つの場合を考える：(1) 局所的なダイナミクスの下で進化する、初期状態でもつれた2つの量子ビット、(2) 時間とともにもつれが生成される、相互作用する2つの量子ビット。1つ目の場合では、内部もつれを増やすと進化が速くなり、系が時間とより強くもつれるという主結果を得る。2つ目の場合では、時間-系もつれエントロピーが、忠実度で特徴づけられる進化の距離に依存することを示す。また2つの場合を比較し、相互作用が十分に強ければ、相互作用する2つの量子ビットは相互作用しない2つの量子ビットより速く進化でき、したがってより速く時間ともつれることを見出す。これらの結果は、ブラックホール蒸発や膨張宇宙における宇宙論的摂動においても、進化するもつれた二部系が存在することから、それらに関する量子時間の新たな知見を得るのに役立つ可能性がある。

If time is emergent, a quantum system becomes entangled with quantum time as it evolves. If the system contains entanglement within itself, which we call internal entanglement to distinguish it from the "external" time-system entanglement, the speed of evolution is enhanced. In this paper, we explore what quantum time reveals about the evolution of a system that contains two entangled qubits. We consider two cases: (1) two initially entangled qubits that evolve under local dynamics; (2) two interacting qubits such that entanglement between them is generated over time. In the first case, we obtain the main result that increasing internal entanglement speeds up the evolution and makes the system more entangled with time. In the second case, we show how the time-system entanglement entropy depends on the distance of evolution, which is characterized by fidelity. We also compare the two cases and find that two interacting qubits can evolve faster than two non-interacting qubits if the interaction is sufficiently strong, and thus they become entangled with time more quickly. These results could be useful for gaining new insights into quantum time in black hole evaporation or cosmological perturbations in an expanding Universe, because those cases also involve an evolving entangled bipartite system.
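The speed-up from internal entanglement can be seen in a standard two-qubit example. Under the local Hamiltonian H = (ω/2)(Z⊗I + I⊗Z), the Bell state (|00⟩+|11⟩)/√2 has fidelity cos²(ωt) with its initial state. The product state |+⟩|+⟩ has fidelity cos⁴(ωt/2). The entangled state therefore reaches an orthogonal state in half the time. The Python sketch below (hypothetical helper name) computes these fidelities. It illustrates the evolution speed only, not the time-system entanglement studied in the paper.

```python
import cmath
import math


def fidelity(amps, t, omega=1.0):
    """|<psi(0)|psi(t)>|^2 for amplitudes on |00>, |01>, |10>, |11>.

    H = (omega/2)(Z (x) I + I (x) Z) is diagonal with energies omega, 0, 0, -omega,
    so the overlap is sum_k |c_k|^2 exp(-i E_k t).
    """
    energies = [omega, 0.0, 0.0, -omega]
    overlap = sum(abs(c) ** 2 * cmath.exp(-1j * e * t) for c, e in zip(amps, energies))
    return abs(overlap) ** 2


product = [0.5, 0.5, 0.5, 0.5]                           # |+>|+>, no internal entanglement
bell = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]    # maximally entangled
print(fidelity(bell, math.pi / 2), fidelity(product, math.pi / 2))
```

At ωt = π/2 the Bell state is already orthogonal to its initial state, while the product state still has fidelity 1/4.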

翻訳日:2023-07-11 21:46:59 公開日:2023-07-08

# ニューロモルフィックイメージングのためのGNEPに基づく動的セグメンテーションと運動推定

GNEP Based Dynamic Segmentation and Motion Estimation for Neuromorphic Imaging ( http://arxiv.org/abs/2307.02595v2 )

ライセンス: Link先を確認

Harbir Antil and David Sayre

(参考訳) 本稿では,画像分割と動き推定の領域におけるイベントベースカメラの応用について検討する。これらのカメラは、従来のフレームベースの画像取得から離れ、非同期イベントの連続ストリームとして視覚情報をキャプチャすることで、画期的な技術を提供する。イベントストリームから得られる時間的・空間的情報を利用してセグメント化と速度推定を行う一般化ナッシュ平衡に基づくフレームワークを提案する。理論的基礎を確立するために, 存在条件を導出し, 平衡計算のための多レベル最適化法を提案する。このアプローチの有効性は、一連の実験を通じて示される。

This paper explores the application of event-based cameras in the domains of image segmentation and motion estimation. These cameras offer a groundbreaking technology by capturing visual information as a continuous stream of asynchronous events, departing from the conventional frame-based image acquisition. We introduce a Generalized Nash Equilibrium based framework that leverages the temporal and spatial information derived from the event stream to carry out segmentation and velocity estimation. To establish the theoretical foundations, we derive existence criteria and propose a multi-level optimization method for computing the equilibrium. The efficacy of this approach is shown through a series of experiments.
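As a toy picture of what computing a generalized Nash equilibrium involves, the sketch below runs Gauss-Seidel best-response iteration for two players with quadratic costs and a shared constraint x + y ≤ cap. It is a hypothetical illustration: the costs are made up, and best-response iteration is a simple swapped-in method, not the multi-level optimization scheme proposed in the paper. Generalized equilibria need not be unique, so the result can depend on the update order.

```python
def best_response_gne(a, b, c, cap, n_iter=100, tol=1e-12):
    """Gauss-Seidel best responses for two players sharing the constraint x + y <= cap.

    Player 1 minimizes (x - a)^2 + c*x*y; player 2 minimizes (y - b)^2 + c*x*y.
    Each 1-D convex problem with an upper bound is solved by clipping the
    unconstrained minimizer at the bound.
    """
    x = y = 0.0
    for _ in range(n_iter):
        x_new = min(a - c * y / 2, cap - y)          # player 1 reacts to current y
        y_new = min(b - c * x_new / 2, cap - x_new)  # player 2 reacts to updated x
        if abs(x_new - x) + abs(y_new - y) < tol:
            return x_new, y_new
        x, y = x_new, y_new
    return x, y


print(best_response_gne(1.0, 2.0, 1.0, cap=1.0))    # shared constraint active
print(best_response_gne(1.0, 2.0, 1.0, cap=100.0))  # constraint slack: plain Nash equilibrium
```

In the segmentation setting, the "players" would be, for instance, the segmentation and velocity variables, coupled through shared terms derived from the event stream.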

翻訳日:2023-07-11 21:39:05 公開日:2023-07-08

# 自由方向知識蒸留によるグラフニューラルネットワークの共有成長

Shared Growth of Graph Neural Networks via Free-direction Knowledge Distillation ( http://arxiv.org/abs/2307.00534v2 )

ライセンス: Link先を確認

Kaituo Feng, Yikun Miao, Changsheng Li, Ye Yuan, Guoren Wang

(参考訳) 知識蒸留（KD）は、より深い教師GNNからより浅い生徒GNNへ知識を蒸留することを典型的な目的として、グラフニューラルネットワーク（GNN）の性能向上に有効であることが示されている。しかし、よく知られた過剰パラメータ化と過平滑化の問題のために、満足のいく深いGNNを訓練することはしばしば困難であり、実用上は知識転移が無効になる。本稿では、強化学習によるGNNのための初の自由方向知識蒸留フレームワークFreeKDを提案する。FreeKDは、より深く十分に最適化された教師GNNを必要としない。核となるアイデアは、2つの浅いGNNを協調的に学習し、階層的な強化学習を通じてそれらの間で知識を交換することである。典型的なGNNモデルは学習中にノードによって性能の良し悪しが異なることが多いことから、2つのレベルの行動からなる動的かつ自由方向の知識転移戦略を考案する：1) ノードレベルの行動は、2つのネットワークの対応するノード間の知識転移の方向を決定し、2) 構造レベルの行動は、ノードレベルの行動によって生成された局所構造のうちどれを伝播させるかを決定する。さらに、マルチビュー入力を扱う際に異なるGNNに存在する多様な知識を考慮し、マルチビュー入力上で動作する複数の浅いGNN間で自由方向の知識転移を可能にするFreeKD++を導入する。5つのベンチマークデータセットでの大規模な実験により、提案手法がベースGNNを大きく上回り、様々なGNNに対して有効であることが示された。さらに驚くべきことに、FreeKDは、より深く強力な教師GNNから知識を蒸留する従来のKDアルゴリズムと同等か、それ以上の性能を示す。

Knowledge distillation (KD) has been shown to be effective in boosting the performance of graph neural networks (GNNs), where the typical objective is to distill knowledge from a deeper teacher GNN into a shallower student GNN. However, it is often quite challenging to train a satisfactory deeper GNN due to the well-known over-parametrization and over-smoothing issues, leading to ineffective knowledge transfer in practical applications. In this paper, we propose FreeKD, the first Free-direction Knowledge Distillation framework via reinforcement learning for GNNs, which no longer requires a deeper, well-optimized teacher GNN. Our core idea is to collaboratively learn two shallower GNNs that exchange knowledge with each other via reinforcement learning in a hierarchical way. Observing that a typical GNN model often performs better at some nodes and worse at others during training, we devise a dynamic and free-direction knowledge transfer strategy that involves two levels of actions: 1) node-level actions determine the direction of knowledge transfer between the corresponding nodes of the two networks; and then 2) structure-level actions determine which of the local structures generated by the node-level actions are propagated. Furthermore, considering the diverse knowledge present in different GNNs when dealing with multi-view inputs, we introduce FreeKD++ as a solution to enable free-direction knowledge transfer among multiple shallow GNNs operating on multi-view inputs. Extensive experiments on five benchmark datasets demonstrate that our approaches outperform the base GNNs by a large margin and are effective across various GNNs. More surprisingly, our FreeKD has comparable or even better performance than traditional KD algorithms that distill knowledge from a deeper and stronger teacher GNN.
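The node-level decision can be pictured as follows. At each node, whichever network currently fits the label better acts as teacher, and the other is distilled towards it with a KL term. In FreeKD these directions are chosen by a learned reinforcement-learning policy. The greedy cross-entropy rule and the helper names in this Python sketch are assumptions for illustration only.

```python
import math


def cross_entropy(probs, label):
    return -math.log(probs[label])


def kl(p, q):
    """KL(p || q) for discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def node_level_transfer(probs_a, probs_b, labels):
    """Per node: pick a direction ('A->B' means A teaches B) and the distillation loss.

    Greedy stand-in for FreeKD's learned node-level policy: the network with the
    lower cross-entropy on the node's label becomes the teacher at that node.
    """
    plan = []
    for pa, pb, y in zip(probs_a, probs_b, labels):
        if cross_entropy(pa, y) <= cross_entropy(pb, y):
            plan.append(("A->B", kl(pa, pb)))  # student B is pulled towards teacher A
        else:
            plan.append(("B->A", kl(pb, pa)))
    return plan


print(node_level_transfer([[0.9, 0.1], [0.4, 0.6]], [[0.6, 0.4], [0.2, 0.8]], [0, 1]))
```

Knowledge therefore flows in both directions within a single training step, which is what removes the need for a fixed, deeper teacher.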

翻訳日:2023-07-11 21:37:07 公開日:2023-07-08

# 自己構造化重要度サンプラーによる分布の裾のブラックボックスシミュレーションの効率化

Achieving Efficiency in Black Box Simulation of Distribution Tails with Self-structuring Importance Samplers ( http://arxiv.org/abs/2102.07060v3 )

ライセンス: Link先を確認

Anand Deo, Karthyek Murthy

(参考訳) 本稿では、線形計画、整数線形計画、区分線形・二次目的関数、深層ニューラルネットワークで指定された特徴写像などの豊富なツールでモデル化された性能指標の分布の裾を推定するための、新しい重要度サンプリング（IS）スキームを提案する。効率的な測度変換を明示的に特定する従来のアプローチは、目的関数と基礎となる確率分布に合わせて複雑に調整する必要があるため、高度に様式化されたモデルを超えると実現可能性とスケーラビリティの問題を抱える。提案スキームでは、このボトルネックを初等的な変換によって克服する。この変換は、より稀でないサンプルで観測される集中特性を再現することで、様々なモデルにおいて効果的なIS分布を暗黙的に誘導できる。この新しいアプローチは、最適IS分布の自己相似性という現象を明らかにする大偏差原理の構築に導かれている。提案するサンプラーは、基礎となるモデルの詳細を知らずに動作するにもかかわらず、広範な多変量分布にわたって漸近的に最適な分散削減を達成する初めてのものである。その適用可能性を、ニューラルネットワークに基づく文脈付き最短経路モデルとポートフォリオ信用リスクモデルで示す。

This paper presents a novel Importance Sampling (IS) scheme for estimating distribution tails of performance measures modeled with a rich set of tools such as linear programs, integer linear programs, piecewise linear/quadratic objectives, feature maps specified with deep neural networks, etc. The conventional approach of explicitly identifying efficient changes of measure suffers from feasibility and scalability concerns beyond highly stylized models, due to their need to be tailored intricately to the objective and the underlying probability distribution. This bottleneck is overcome in the proposed scheme with an elementary transformation which is capable of implicitly inducing an effective IS distribution in a variety of models by replicating the concentration properties observed in less rare samples. This novel approach is guided by developing a large deviations principle that brings out the phenomenon of self-similarity of optimal IS distributions. The proposed sampler is the first to attain asymptotically optimal variance reduction across a spectrum of multivariate distributions despite being oblivious to the specifics of the underlying model. Its applicability is illustrated with contextual shortest path and portfolio credit risk models informed by neural networks.
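For contrast, here is the conventional approach the paper moves away from: importance sampling with an explicitly hand-tailored change of measure. The Python sketch estimates P(Z > c) for a standard normal Z. It samples from N(c, 1) and reweights with the likelihood ratio exp(−cx + c²/2). It does not implement the self-structuring transformation, and the function names are illustrative.

```python
import math
import random


def tail_prob_is(threshold, n, seed=0):
    """Estimate P(Z > threshold), Z ~ N(0, 1), by sampling from N(threshold, 1).

    Likelihood ratio: phi(x) / phi(x - c) = exp(-c*x + c**2 / 2).
    """
    rng = random.Random(seed)
    c = threshold
    total = 0.0
    for _ in range(n):
        x = rng.gauss(c, 1.0)
        if x > threshold:
            total += math.exp(-c * x + c * c / 2)
    return total / n


def tail_prob_naive(threshold, n, seed=0):
    """Plain Monte Carlo for comparison: the rare event is almost never observed."""
    rng = random.Random(seed)
    return sum(rng.gauss(0.0, 1.0) > threshold for _ in range(n)) / n


exact = 0.5 * math.erfc(4.0 / math.sqrt(2))
print(exact, tail_prob_is(4.0, 100_000), tail_prob_naive(4.0, 100_000))
```

The mean shift works here only because the event {Z > c} and the Gaussian model are simple enough to tailor by hand. The abstract describes this need for case-by-case tailoring as the bottleneck for complex performance measures.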

翻訳日:2023-07-11 19:54:10 公開日:2023-07-08

# 高速解釈可能なGreedy-Tree Sums

Fast Interpretable Greedy-Tree Sums ( http://arxiv.org/abs/2201.11931v3 )

ライセンス: Link先を確認

Yan Shuo Tan, Chandan Singh, Keyan Nasseri, Abhineet Agarwal, James Duncan, Omer Ronen, Matthew Epland, Aaron Kornblith, Bin Yu

(参考訳) 現代の機械学習は印象的な予測性能を達成しているが、医療のような高リスク領域で重要な考慮事項である解釈可能性を犠牲にすることが多い。このような設定では、実務者は高度に解釈可能な決定木モデルをよく用いるが、これらは加法構造に対する帰納バイアスの問題を抱える。このバイアスを克服するため、柔軟な数の木を同時に成長させてその和をとるようにCARTアルゴリズムを一般化したFast Interpretable Greedy-Tree Sums（FIGS）を提案する。論理規則と加算を組み合わせることで、FIGSは高い解釈可能性を保ったまま加法構造に適応できる。実世界データセットでの大規模な実験により、FIGSが最先端の予測性能を達成することが示された。高リスク領域におけるFIGSの有用性を示すため、臨床意思決定を導くツールである臨床意思決定ツール（CDI）の学習にFIGSを適用する。具体的には、医療データの異質性を考慮したG-FIGSと呼ばれるFIGSの変種を導入する。G-FIGSは、ドメイン知識を反映し、感度や解釈可能性を犠牲にすることなく特異度が向上した（CARTより最大20%）CDIを導出する。FIGSについてさらなる知見を与えるため、FIGSが加法モデルの構成要素を学習すること（我々はこの性質を分離（disentanglement）と呼ぶ）を証明する。さらに、（オラクル条件の下で）制約のない木和モデルは、加法的回帰関数に当てはめた場合に、分離を利用して単一の決定木モデルよりも効率的に汎化することを示す。最後に、分割数に制約がないことによる過学習を避けるため、ランダムフォレストの分散削減技術を借用したFIGSのアンサンブル版であるBagging-FIGSを開発する。Bagging-FIGSは、実世界データセットにおいてランダムフォレストやXGBoostと競合する性能を示す。

Modern machine learning has achieved impressive prediction performance, but often sacrifices interpretability, a critical consideration in high-stakes domains such as medicine. In such settings, practitioners often use highly interpretable decision tree models, but these suffer from inductive bias against additive structure. To overcome this bias, we propose Fast Interpretable Greedy-Tree Sums (FIGS), which generalizes the CART algorithm to simultaneously grow a flexible number of trees in summation. By combining logical rules with addition, FIGS is able to adapt to additive structure while remaining highly interpretable. Extensive experiments on real-world datasets show that FIGS achieves state-of-the-art prediction performance. To demonstrate the usefulness of FIGS in high-stakes domains, we adapt FIGS to learn clinical decision instruments (CDIs), which are tools for guiding clinical decision-making. Specifically, we introduce a variant of FIGS known as G-FIGS that accounts for the heterogeneity in medical data. G-FIGS derives CDIs that reflect domain knowledge and enjoy improved specificity (by up to 20% over CART) without sacrificing sensitivity or interpretability. To provide further insight into FIGS, we prove that FIGS learns components of additive models, a property we refer to as disentanglement. Further, we show (under oracle conditions) that unconstrained tree-sum models leverage disentanglement to generalize more efficiently than single decision tree models when fitted to additive regression functions. Finally, to avoid overfitting with an unconstrained number of splits, we develop Bagging-FIGS, an ensemble version of FIGS that borrows the variance reduction techniques of random forests. Bagging-FIGS enjoys competitive performance with random forests and XGBoost on real-world datasets.
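Here is a simplified Python sketch of the "sum of trees" idea. Every tree is restricted to a single split (a stump), and trees are added greedily on the current residuals. Real FIGS also considers splitting a leaf of an existing tree at each step; that part is omitted here, and the function names are hypothetical. On y = 1{x1 > 0.5} + 1{x2 > 0.5}, two stumps fit the data exactly, whereas a single CART tree needs three splits.

```python
def best_stump(X, r):
    """Exhaustive search for the split (feature, threshold) minimizing squared error on residuals r."""
    best = None
    n = len(r)
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [r[i] for i in range(n) if X[i][j] <= thr]
            right = [r[i] for i in range(n) if X[i][j] > thr]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, ml, mr)
    return best


def fit_stump_sum(X, y, n_trees):
    """Greedily add stumps, each fitted to the residuals left by the previous ones."""
    stumps, r = [], list(y)
    for _ in range(n_trees):
        found = best_stump(X, r)
        if found is None:
            break
        _, j, thr, ml, mr = found
        stumps.append((j, thr, ml, mr))
        r = [r[i] - (ml if X[i][j] <= thr else mr) for i in range(len(r))]
    return stumps, r


X = [(a, b) for a in (0.2, 0.8) for b in (0.2, 0.8)]
y = [float(a > 0.5) + float(b > 0.5) for a, b in X]
stumps, residuals = fit_stump_sum(X, y, n_trees=2)
print(stumps, residuals)
```

Each stump ends up using a different feature. This mirrors the disentanglement property described above, where separate trees capture separate additive components.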

翻訳日:2023-07-11 19:45:46 公開日:2023-07-08

# 安全制約を考慮した保守的分布強化学習

Conservative Distributional Reinforcement Learning with Safety Constraints ( http://arxiv.org/abs/2201.07286v2 )

ライセンス: Link先を確認

Hengrui Zhang, Youfang Lin, Sheng Han, Shuo Wang, Kai Lv

(参考訳) 安全な探索は、期待される長期コストに制約が課されたマルコフ決定問題とみなすことができる。従来のオフポリシーアルゴリズムは、ラグランジュ緩和手法を導入することで、制約付き最適化問題を対応する制約なし双対問題に変換する。しかし、これらのアルゴリズムのコスト関数は推定が不正確であり、ラグランジュ乗数の学習を不安定にする。本稿では、保守的分布型最大事後確率方策最適化（CDMPO）と呼ばれる新しいオフポリシー強化学習アルゴリズムを提案する。まず、現在の状況が制約を満たすかどうかを正確に判断するため、CDMPOは分布型強化学習の手法を採用してQ関数とC関数を推定する。次に、CDMPOは保守的な価値関数損失を用いて、探索過程における制約違反の回数を減らす。さらに、ラグランジュ乗数を安定に更新するために、重み付き平均比例積分微分（WAPID）を用いる。実験結果から、提案手法は探索の初期段階における制約違反が少ないことが示された。最終的なテスト結果も、提案手法がより優れたリスク制御を行うことを示している。

Safe exploration can be regarded as a constrained Markov decision problem where the expected long-term cost is constrained. Previous off-policy algorithms convert the constrained optimization problem into the corresponding unconstrained dual problem by introducing the Lagrangian relaxation technique. However, the cost functions in these algorithms provide inaccurate estimates, which makes learning the Lagrange multiplier unstable. In this paper, we present a novel off-policy reinforcement learning algorithm called Conservative Distributional Maximum a Posteriori Policy Optimization (CDMPO). First, to accurately judge whether the current situation satisfies the constraints, CDMPO adopts a distributional reinforcement learning method to estimate the Q-function and C-function. Then, CDMPO uses a conservative value function loss to reduce the number of constraint violations during the exploration process. In addition, we utilize Weighted Average Proportional Integral Derivative (WAPID) to update the Lagrange multiplier stably. Empirical results show that the proposed method has fewer constraint violations in the early exploration process. The final test results also illustrate that our method achieves better risk control.
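A PID-controlled Lagrange multiplier update can be sketched as below. The multiplier grows with the current, accumulated and increasing constraint violation, and is clipped at zero. The exponential moving average of episode costs is an assumption standing in for the "weighted average" in WAPID. The gains and the class name are hypothetical.

```python
class PIDLagrangeMultiplier:
    """PID-controlled Lagrange multiplier for a constraint E[cost] <= cost_limit (sketch).

    The exponential moving average of episode costs is an assumed stand-in for
    the weighted averaging in WAPID; the paper's exact weighting may differ.
    """

    def __init__(self, kp, ki, kd, cost_limit, avg_weight=0.9):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.cost_limit = cost_limit
        self.avg_weight = avg_weight
        self.avg_cost = None
        self.integral = 0.0
        self.prev_error = 0.0
        self.value = 0.0

    def update(self, episode_cost):
        if self.avg_cost is None:
            self.avg_cost = episode_cost
        else:
            w = self.avg_weight
            self.avg_cost = w * self.avg_cost + (1 - w) * episode_cost
        error = self.avg_cost - self.cost_limit          # > 0 means the constraint is violated
        self.integral = max(0.0, self.integral + error)  # anti-windup: integral never negative
        derivative = max(0.0, error - self.prev_error)   # react only to rising violation
        self.prev_error = error
        self.value = max(0.0, self.kp * error + self.ki * self.integral + self.kd * derivative)
        return self.value


lag = PIDLagrangeMultiplier(kp=1.0, ki=0.1, kd=0.5, cost_limit=10.0)
for cost in [20.0, 18.0, 12.0, 8.0, 5.0]:
    print(round(lag.update(cost), 3))
```

Compared with plain gradient ascent on the multiplier (integral-only control), the proportional and derivative terms damp the oscillations that destabilize Lagrangian methods.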

Translated: 2023-07-11 19:45:17 | Published: 2023-07-08

# DDAC-SpAM: A Distributed Algorithm for Fitting High-dimensional Sparse Additive Models with Feature Division and Decorrelation

DDAC-SpAM: A Distributed Algorithm for Fitting High-dimensional Sparse Additive Models with Feature Division and Decorrelation ( http://arxiv.org/abs/2205.07932v2 )

License: see link

Yifan He and Ruiyang Wu and Yong Zhou and Yang Feng

Distributed statistical learning has become a popular technique for large-scale data analysis. Most existing work in this area focuses on dividing the observations, but we propose a new algorithm, DDAC-SpAM, which divides the features under a high-dimensional sparse additive model. Our approach involves three steps: divide, decorrelate, and conquer. The decorrelation operation enables each local estimator to recover the sparsity pattern for each additive component without imposing strict constraints on the correlation structure among variables. The effectiveness and efficiency of the proposed algorithm are demonstrated through theoretical analysis and empirical results on both synthetic and real data. The theoretical results include both consistent sparsity pattern recovery and statistical inference for each additive functional component. Our approach provides a practical solution for fitting sparse additive models, with promising applications in a wide range of domains.

Translated: 2023-07-11 19:35:49 | Published: 2023-07-08

# Deep fiber clustering: Anatomically informed fiber clustering with self-supervised deep learning for fast and effective tractography parcellation

Deep fiber clustering: Anatomically informed fiber clustering with self-supervised deep learning for fast and effective tractography parcellation ( http://arxiv.org/abs/2205.00627v3 )

License: see link

Yuqian Chen, Chaoyi Zhang, Tengfei Xue, Yang Song, Nikos Makris, Yogesh Rathi, Weidong Cai, Fan Zhang, Lauren J. O'Donnell

White matter fiber clustering is an important strategy for white matter parcellation, which enables quantitative analysis of brain connections in health and disease. In combination with expert neuroanatomical labeling, data-driven white matter fiber clustering is a powerful tool for creating atlases that can model white matter anatomy across individuals. While widely used fiber clustering approaches have shown good performance using classical unsupervised machine learning techniques, recent advances in deep learning reveal a promising direction toward fast and effective fiber clustering. In this work, we propose a novel deep learning framework for white matter fiber clustering, Deep Fiber Clustering (DFC), which solves the unsupervised clustering problem as a self-supervised learning task with a domain-specific pretext task to predict pairwise fiber distances. This process learns a high-dimensional embedding feature representation for each fiber, regardless of the order of fiber points reconstructed during tractography. We design a novel network architecture that represents input fibers as point clouds and allows the incorporation of additional sources of input information from gray matter parcellation to improve anatomical coherence of clusters. In addition, DFC conducts outlier removal naturally by rejecting fibers with low cluster assignment probability. We evaluate DFC on three independently acquired cohorts, including data from 220 individuals across genders, ages (young and elderly adults), and different health conditions (healthy control and multiple neuropsychiatric disorders). We compare DFC to several state-of-the-art white matter fiber clustering algorithms. Experimental results demonstrate superior performance of DFC in terms of cluster compactness, generalization ability, anatomical coherence, and computational efficiency.
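
DFC's pretext task predicts pairwise fiber distances. The abstract does not name the distance that is used, so this Python sketch assumes the minimum average direct-flip (MDF) distance, a common choice for streamlines resampled to a fixed number of points. MDF is symmetric under reversing a fiber's point order. The abstract credits the point-cloud network itself with the broader order invariance. The function names are hypothetical.

```python
import numpy as np


def mdf_distance(fiber_a: np.ndarray, fiber_b: np.ndarray) -> float:
    """Minimum average direct-flip (MDF) distance between two fibers.

    Each fiber is an (n_points, 3) array. Both must be resampled to the same
    number of points. Taking the smaller of the direct and flipped averages
    makes the result independent of which end a fiber was traced from.
    """
    if fiber_a.shape != fiber_b.shape:
        raise ValueError("fibers must have the same number of points")
    direct = np.linalg.norm(fiber_a - fiber_b, axis=1).mean()
    flipped = np.linalg.norm(fiber_a - fiber_b[::-1], axis=1).mean()
    return float(min(direct, flipped))


def pairwise_distance_targets(fibers: list) -> np.ndarray:
    """Symmetric matrix of MDF distances, usable as regression targets for a pretext task."""
    n = len(fibers)
    targets = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            targets[i, j] = targets[j, i] = mdf_distance(fibers[i], fibers[j])
    return targets
```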

Translated: 2023-07-11 19:35:12 | Published: 2023-07-08

# Cryptocurrency Valuation: An Explainable AI Approach

Cryptocurrency Valuation: An Explainable AI Approach ( http://arxiv.org/abs/2201.12893v8 )

License: see link

Yulin Liu and Luyao Zhang

Currently, there are no convincing proxies for the fundamentals of cryptocurrency assets. We propose a new market-to-fundamental ratio, the price-to-utility (PU) ratio, utilizing unique blockchain accounting methods. We then proxy various existing fundamental-to-market ratios using Bitcoin historical data and find that they have little predictive power for short-term bitcoin returns. However, the PU ratio predicts long-term bitcoin returns more effectively than alternative methods. Furthermore, we verify the explainability of the PU ratio using machine learning. Finally, we present an automated trading strategy advised by the PU ratio that outperforms the conventional buy-and-hold and market-timing strategies. Our research contributes to explainable AI in finance in three ways. First, our market-to-fundamental ratio is grounded in classic monetary theory and the unique UTXO model of Bitcoin accounting rather than being ad hoc. Second, the empirical evidence supports the buy-low and sell-high implications of the ratio. Finally, we distribute the trading algorithms as open-source software via the Python Package Index for future research, which is uncommon in finance research.

Translated: 2023-07-11 19:33:29 | Published: 2023-07-08

# Preferential Subsampling for Stochastic Gradient Langevin Dynamics

Preferential Subsampling for Stochastic Gradient Langevin Dynamics ( http://arxiv.org/abs/2210.16189v3 )

License: see link

Srshti Putcha, Christopher Nemeth, Paul Fearnhead

Stochastic gradient MCMC (SGMCMC) offers a scalable alternative to traditional MCMC by constructing an unbiased estimate of the gradient of the log-posterior from a small, uniformly weighted subsample of the data. While efficient to compute, the resulting gradient estimator may exhibit high variance and degrade sampler performance. The problem of variance control has traditionally been addressed by constructing a better stochastic gradient estimator, often using control variates. We propose to use a discrete, non-uniform probability distribution to preferentially subsample data points that have a greater impact on the stochastic gradient. In addition, we present a method of adaptively adjusting the subsample size at each iteration of the algorithm, so that the subsample size increases in areas of the sample space where the gradient is harder to estimate. We demonstrate that such an approach can maintain the same level of accuracy while substantially reducing the average subsample size used.
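
The core idea of non-uniform subsampling can be illustrated with importance weighting. If data point i is drawn with probability p_i, dividing its gradient by p_i keeps the estimate of the full-data gradient sum unbiased. This Python sketch assumes probabilities proportional to per-point gradient magnitudes, which is a hypothetical choice. The paper's actual sampling distribution and its adaptive subsample-size rule are not reproduced here.

```python
import numpy as np


def preferential_grad_estimate(per_point_grads, probs, batch_size, rng):
    """Unbiased estimate of sum_i g_i from a non-uniform subsample.

    per_point_grads: (N, d) per-datum log-likelihood gradients. They are passed
        in full only to keep the example short; a real sampler would evaluate
        gradients only for the sampled indices.
    probs: (N,) sampling probabilities that sum to 1.
    """
    idx = rng.choice(len(probs), size=batch_size, replace=True, p=probs)
    # Importance weights 1/p_i correct for the non-uniform selection.
    return (per_point_grads[idx] / probs[idx, None]).mean(axis=0)


def sgld_step(theta, grad_log_prior, grad_loglik_estimate, step_size, rng):
    """One SGLD update: half a gradient step on the log-posterior plus N(0, step_size) noise."""
    noise = rng.normal(scale=np.sqrt(step_size), size=theta.shape)
    return theta + 0.5 * step_size * (grad_log_prior + grad_loglik_estimate) + noise
```

When p_i is proportional to |g_i| and all scalar gradients share the same sign, every draw returns the exact sum. This extreme case shows why favouring influential points can reduce variance.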

Translated: 2023-07-11 19:15:02 | Published: 2023-07-08

# Conditionally Risk-Averse Contextual Bandits

Conditionally Risk-Averse Contextual Bandits ( http://arxiv.org/abs/2210.13573v2 )

License: see link

Mónika Farsang and Paul Mineiro and Wangda Zhang

Contextual bandits with average-case statistical guarantees are inadequate in risk-averse situations because they might trade off degraded worst-case behaviour for better average performance. Designing a risk-averse contextual bandit is challenging because exploration is necessary but risk-aversion is sensitive to the entire distribution of rewards; nonetheless, we exhibit the first risk-averse contextual bandit algorithm with an online regret guarantee. We conduct experiments in diverse scenarios where worst-case outcomes should be avoided, including dynamic pricing, inventory management, and self-tuning software, the last of which includes a production exascale data processing system.
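
In this setting, risk aversion means caring about the lower tail of the reward distribution rather than only its mean. The following Python sketch is an illustration only. It is not the paper's algorithm, which also handles exploration and carries a regret guarantee. The sketch uses empirical lower-tail conditional value at risk (CVaR), one common risk measure, to compare two actions. The risk level `alpha` and the function names are assumptions.

```python
import math

import numpy as np


def lower_cvar(rewards, alpha):
    """Empirical CVaR at level alpha: mean of the worst ceil(alpha * n) rewards."""
    r = np.sort(np.asarray(rewards, dtype=float))
    k = max(1, math.ceil(alpha * len(r)))
    return float(r[:k].mean())


def risk_averse_choice(reward_samples, alpha):
    """Pick the action whose observed rewards have the highest lower-tail CVaR."""
    return max(reward_samples, key=lambda a: lower_cvar(reward_samples[a], alpha))
```

Consider an action with a higher mean that occasionally pays nothing. A mean-maximizing rule prefers it, but a CVaR-based rule can prefer a steadier action. That gap is the average-case versus worst-case trade-off the abstract describes.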

Translated: 2023-07-11 19:14:45 | Published: 2023-07-08

# The First IEEE UV2022 Mathematical Modelling Competition: Backgrounds and Problems

The First IEEE UV2022 Mathematical Modelling Competition: Backgrounds and Problems ( http://arxiv.org/abs/2212.07903v2 )

License: see link

Juntao Jiang, Yuan Niu, Yi Tao

Economic growth, people's health, and urban development face challenges in the post-epidemic era. How to promote high-quality and sustainable urban development, improve citizens' sense of happiness, and solve problems in city management has become a hot and crucial topic. Mathematical modeling is a research method that uses mathematical symbols to express practical problems, establish mathematical models, and then propose solutions. The 1st IEEE UV2022 Mathematical Modelling Competition is a satellite activity of the 6th IEEE International Conference on Universal Village. It expects participants to apply mathematical modeling methods to practical problems and to provide guidelines for sustainable social progress. This short paper introduces the background of the competition and publishes the problems to be solved.

Translated: 2023-07-11 19:05:27 | Published: 2023-07-08

# AI Ethics on Blockchain: Topic Analysis on Twitter Data for Blockchain Security

AI Ethics on Blockchain: Topic Analysis on Twitter Data for Blockchain Security ( http://arxiv.org/abs/2212.06951v5 )

License: see link

Yihang Fu, Zesen Zhuang, Luyao Zhang

Blockchain has empowered computer systems to be more secure using a distributed network. However, the current blockchain design suffers from fairness issues in transaction ordering. Miners are able to reorder transactions to generate profits, known as miner extractable value (MEV). Existing research recognizes MEV as a severe security issue and proposes potential solutions, including the prominent Flashbots. However, previous studies have mostly analyzed blockchain data, which might not capture the impacts of MEV on the broader AI society. Thus, in this research, we applied natural language processing (NLP) methods to comprehensively analyze topics in tweets on MEV. We collected more than 20,000 tweets with the #MEV and #Flashbots hashtags and analyzed their topics. Our results show that the tweets discussed profound topics of ethical concern, including security, equity, emotional sentiments, and the desire for solutions to MEV. We also identify co-movements of MEV activities on blockchain and social media platforms. Our study contributes to the literature at the interface of blockchain security, MEV solutions, and AI ethics.

Translated: 2023-07-11 19:05:10 | Published: 2023-07-08

# A Machine with Short-Term, Episodic, and Semantic Memory Systems

A Machine with Short-Term, Episodic, and Semantic Memory Systems ( http://arxiv.org/abs/2212.02098v2 )

License: see link

Taewoon Kim, Michael Cochez, Vincent François-Lavet, Mark Neerincx, Piek Vossen

Inspired by the cognitive science theory of the explicit human memory systems, we have modeled an agent with short-term, episodic, and semantic memory systems, each of which is modeled with a knowledge graph. To evaluate this system and analyze the behavior of this agent, we designed and released our own reinforcement learning agent environment, "the Room", where an agent has to learn how to encode, store, and retrieve memories to maximize its return by answering questions. We show that our deep Q-learning based agent successfully learns whether a short-term memory should be forgotten, or rather be stored in the episodic or semantic memory systems. Our experiments indicate that an agent with human-like memory systems can outperform an agent without this memory structure in the environment.
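
The agent's core decision is what to do with a short-term memory: forget it, or move it to episodic or semantic memory. The paper trains a deep Q-network. The Python sketch below substitutes plain tabular Q-learning over those three actions to show only the update rule. The state encoding, the reward (for example, a correctly answered question), and the hyperparameters are hypothetical.

```python
import random
from collections import defaultdict

ACTIONS = ("forget", "episodic", "semantic")


class MemoryManagementQ:
    """Tabular Q-learning over the three memory-management actions (illustrative only)."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        # Q-values default to 0 for every (state, action) pair.
        self.q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

    def act(self, state):
        # Epsilon-greedy: occasionally explore a random action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        values = self.q[state]
        return max(values, key=values.get)

    def update(self, state, action, reward, next_state):
        # One-step Q-learning target: r + gamma * max_a' Q(s', a').
        target = reward + self.gamma * max(self.q[next_state].values())
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```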

Translated: 2023-07-11 19:04:26 | Published: 2023-07-08

# Graph Neural Networks for temporal graphs: State of the art, open challenges, and opportunities

Graph Neural Networks for temporal graphs: State of the art, open challenges, and opportunities ( http://arxiv.org/abs/2302.01018v4 )

License: see link

Antonio Longa, Veronica Lachi, Gabriele Santin, Monica Bianchini, Bruno Lepri, Pietro Lio, Franco Scarselli and Andrea Passerini

Graph Neural Networks (GNNs) have become the leading paradigm for learning on (static) graph-structured data. However, many real-world systems are dynamic in nature, since the graph and node/edge attributes change over time. In recent years, GNN-based models for temporal graphs have emerged as a promising area of research to extend the capabilities of GNNs. In this work, we provide the first comprehensive overview of the current state of the art in temporal GNNs, introducing a rigorous formalization of learning settings and tasks and a novel taxonomy that categorizes existing approaches by how the temporal aspect is represented and processed. We conclude the survey with a discussion of the most relevant open challenges for the field, from both research and application perspectives.

Translated: 2023-07-11 18:56:21 | Published: 2023-07-08

# Local transfer learning from one data space to another

Local transfer learning from one data space to another ( http://arxiv.org/abs/2302.00160v2 )

License: see link

H. N. Mhaskar and Ryan O'Dowd

A fundamental problem in manifold learning is to approximate a functional relationship from data chosen randomly from a probability distribution supported on a low-dimensional sub-manifold of a high-dimensional ambient Euclidean space. The manifold is essentially defined by the data set itself and is typically designed so that the data are dense on the manifold in some sense. The notion of a data space is an abstraction of a manifold that encapsulates the essential properties allowing for function approximation. The problem of transfer learning (meta-learning) is to use the learning of a function on one data set to learn a similar function on a new data set. In terms of function approximation, this means lifting a function on one data space (the base data space) to another (the target data space). This viewpoint enables us to connect some inverse problems in applied mathematics (such as the inverse Radon transform) with transfer learning. In this paper, we examine the question of such lifting when the data is assumed to be known only on a part of the base data space. We are interested in determining subsets of the target data space on which the lifting can be defined, and in how the local smoothness of the function and its lifting are related.

Translated: 2023-07-11 18:56:07 | Published: 2023-07-08

# MADAv2: Advanced Multi-Anchor Based Active Domain Adaptation Segmentation

MADAv2: Advanced Multi-Anchor Based Active Domain Adaptation Segmentation ( http://arxiv.org/abs/2301.07354v2 )

License: see link

Munan Ning, Donghuan Lu, Yujia Xie, Dongdong Chen, Dong Wei, Yefeng Zheng, Yonghong Tian, Shuicheng Yan, Li Yuan

Unsupervised domain adaptation has been widely adopted in tasks with scarce annotated data. Unfortunately, unconditionally mapping the target-domain distribution to the source domain may distort the essential structural information of the target-domain data, leading to inferior performance. To address this issue, we first propose to introduce active sample selection to assist domain adaptation for the semantic segmentation task. By adopting multiple anchors instead of a single centroid, both source and target domains can be better characterized as multimodal distributions, so that more complementary and informative samples are selected from the target domain. With only a small workload to manually annotate these active samples, the distortion of the target-domain distribution can be effectively alleviated, yielding a large performance gain. In addition, a powerful semi-supervised domain adaptation strategy is proposed to alleviate the long-tail distribution problem and further improve segmentation performance. Extensive experiments on public datasets demonstrate that the proposed approach outperforms state-of-the-art methods by large margins and achieves performance similar to the fully supervised upper bound, i.e., 71.4% mIoU on GTA5 and 71.8% mIoU on SYNTHIA. The effectiveness of each component is also verified by thorough ablation studies.
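
One reading of the multi-anchor idea is to represent the source features by several anchors (for example, k-means centroids) and to treat target samples that lie far from every anchor as the most complementary candidates for manual annotation. This Python sketch assumes the anchors are already computed and uses the distance to the nearest anchor as the selection score. The scoring rule is a simplification, not the paper's exact criterion, and the function name is hypothetical.

```python
import numpy as np


def select_active_samples(target_feats, anchors, budget):
    """Return indices of the `budget` target samples farthest from their nearest anchor.

    target_feats: (n, d) target-domain feature vectors.
    anchors: (k, d) anchor vectors summarising the source-domain features.
    """
    # Squared Euclidean distance from every target sample to every anchor: (n, k).
    d2 = ((target_feats[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.min(axis=1)
    # Largest nearest-anchor distance first.
    return np.argsort(-nearest)[:budget]
```

In this sketch, using several anchors instead of one centroid keeps a multimodal source distribution from collapsing into its mean. With a single centroid, samples sitting near a secondary source mode would look novel and waste annotation budget.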

Translated: 2023-07-11 18:55:17 | Published: 2023-07-08

# Provable Robust Saliency-based Explanations

Provable Robust Saliency-based Explanations ( http://arxiv.org/abs/2212.14106v3 )

License: see link

Chao Chen, Chenghua Guo, Guixiang Ma, Ming Zeng, Xi Zhang, Sihong Xie

Robust explanations of machine learning models are critical to establishing human trust in the models. The top-$k$ intersection is widely used to evaluate the robustness of explanations. However, most existing attack and defense strategies are based on $\ell_p$ norms, which creates a mismatch between the evaluation and optimization objectives. To this end, we define explanation thickness to measure the ranking stability of the top-$k$ salient features, and we design the R2ET algorithm, based on a novel tractable surrogate, to maximize the thickness and efficiently stabilize the top salient features. Theoretically, we prove a connection between R2ET and adversarial training; using a novel multi-objective optimization formulation and a generalization error bound, we further prove that the surrogate objective can improve both the numerical and statistical stability of the explanations. Experiments with a wide spectrum of network architectures and data modalities demonstrate that R2ET attains higher explanation robustness under stealthy attacks while retaining model accuracy.
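
The top-$k$ intersection metric mentioned above can be computed as the fraction of the $k$ most salient features that the explanation of an input shares with the explanation of its perturbed version. Here is a minimal Python sketch, assuming features are ranked by absolute saliency. The function name is hypothetical.

```python
import numpy as np


def topk_intersection(saliency_orig, saliency_perturbed, k):
    """Fraction of the top-k features (by absolute saliency) shared by two explanations."""
    top_orig = set(np.argsort(-np.abs(saliency_orig))[:k].tolist())
    top_pert = set(np.argsort(-np.abs(saliency_perturbed))[:k].tolist())
    return len(top_orig & top_pert) / k
```

An attack can push this score below 1.0 while changing the saliency map only slightly in $\ell_p$ norm. That is the mismatch between evaluation and optimization objectives that the abstract points to.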

Translated: 2023-07-11 18:54:27 | Published: 2023-07-08

# CausalDialogue: Modeling Utterance-level Causality in Conversations

CausalDialogue: Modeling Utterance-level Causality in Conversations ( http://arxiv.org/abs/2212.10515v2 )

License: see link

Yi-Lin Tuan, Alon Albalak, Wenda Xu, Michael Saxon, Connor Pryor, Lise Getoor, William Yang Wang

Despite their widespread adoption, neural conversation models have yet to exhibit natural chat capabilities with humans. In this research, we examine user utterances as causes and generated responses as effects, recognizing that changes in a cause should produce a different effect. To further explore this concept, we have compiled and expanded upon a new dataset called CausalDialogue through crowd-sourcing. This dataset includes multiple cause-effect pairs within a directed acyclic graph (DAG) structure. Our analysis reveals that traditional loss functions struggle to effectively incorporate the DAG structure. This led us to propose a causality-enhanced method called Exponential Maximum Average Treatment Effect (ExMATE), which enhances the impact of causality at the utterance level when training neural conversation models. To evaluate the need to consider causality in dialogue generation, we built a comprehensive benchmark on the CausalDialogue dataset using different models, inference methods, and training methods. Through experiments, we find that a causality-inspired loss like ExMATE can improve diversity and agility over a conventional loss function. We also find that there is still room for improvement to reach human-level quality on this new dataset.

Translated: 2023-07-11 18:54:07 | Published: 2023-07-08

# BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning

BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning ( http://arxiv.org/abs/2303.14773v2 )

License: see link

Changdae Oh, Hyeji Hwang, Hee-young Lee, YongTaek Lim, Geunyoung Jung, Jiyoung Jung, Hosik Choi, Kyungwoo Song

With the surge of large-scale pre-trained models (PTMs), fine-tuning these models for numerous downstream tasks has become a crucial problem. Consequently, parameter-efficient transfer learning (PETL) of large models has attracted considerable attention. While recent PETL methods showcase impressive performance, they rely on two optimistic assumptions: 1) the entire parameter set of a PTM is available, and 2) a sufficiently large memory capacity is available for fine-tuning. However, in most real-world applications, PTMs are served as a black-box API or as proprietary software without explicit parameter accessibility. Moreover, it is hard to meet the large memory requirements of modern PTMs. In this work, we propose black-box visual prompting (BlackVIP), which efficiently adapts PTMs without knowledge of model architectures and parameters. BlackVIP has two components: 1) the Coordinator and 2) simultaneous perturbation stochastic approximation with gradient correction (SPSA-GC). The Coordinator designs input-dependent, image-shaped visual prompts, which improve few-shot adaptation and robustness under distribution/location shift. SPSA-GC efficiently estimates the gradient of the target model to update the Coordinator. Extensive experiments on 16 datasets demonstrate that BlackVIP enables robust adaptation to diverse domains without accessing the PTMs' parameters, with minimal memory requirements. Code: https://github.com/changdaeoh/BlackVIP
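
SPSA needs only two loss evaluations per step, regardless of the number of parameters, which makes it suitable for a black-box API. This Python sketch shows a standard SPSA gradient estimate with Rademacher perturbations, plus a Nesterov-style look-ahead. The look-ahead is an assumed stand-in for the paper's gradient correction. The function names and hyperparameters (`c`, `lr`, `momentum`) are hypothetical.

```python
import numpy as np


def spsa_gradient(loss_fn, theta, c, rng):
    """Two-evaluation SPSA estimate of the gradient of loss_fn at theta."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Rademacher perturbation
    diff = loss_fn(theta + c * delta) - loss_fn(theta - c * delta)
    # For +/-1 entries, 1 / delta_i == delta_i.
    return diff / (2.0 * c) * delta


def spsa_gc_step(loss_fn, theta, velocity, lr, momentum, c, rng):
    """One update using a Nesterov-style look-ahead (assumed form of the correction)."""
    lookahead = theta + momentum * velocity
    grad = spsa_gradient(loss_fn, lookahead, c, rng)
    velocity = momentum * velocity - lr * grad
    return theta + velocity, velocity
```

Only `loss_fn` evaluations are used, so the same loop works whether the loss comes from a local function or from queries to a remote model.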

Translated: 2023-07-11 18:46:32 | Published: 2023-07-08

# Object-Centric Slot Diffusion

Object-Centric Slot Diffusion ( http://arxiv.org/abs/2303.10834v2 )

License: see link

Jindong Jiang, Fei Deng, Gautam Singh, Sungjin Ahn

The recent success of transformer-based image generative models in object-centric learning highlights the importance of powerful image generators for handling complex scenes. However, despite the high expressiveness of diffusion models in image generation, their integration into object-centric learning remains largely unexplored. In this paper, we explore the feasibility and potential of integrating diffusion models into object-centric learning and investigate the pros and cons of this approach. We introduce Latent Slot Diffusion (LSD), a novel model that serves dual purposes. It is the first object-centric learning model to replace conventional slot decoders with a latent diffusion model conditioned on object slots. It is also the first unsupervised compositional conditional diffusion model that operates without supervised annotations such as text. Through experiments on various object-centric tasks, including the first application of the FFHQ dataset in this field, we demonstrate that LSD significantly outperforms state-of-the-art transformer-based decoders, particularly in more complex scenes, and exhibits superior unsupervised compositional generation quality. Project page: https://latentslotdiffusion.github.io

Translated: 2023-07-11 18:45:48 | Published: 2023-07-08

# Convergence Rates for Localized Actor-Critic in Networked Markov Potential Games

Convergence Rates for Localized Actor-Critic in Networked Markov Potential Games ( http://arxiv.org/abs/2303.04865v2 )

License: see link

Zhaoyi Zhou, Zaiwei Chen, Yiheng Lin, and Adam Wierman

We introduce a class of networked Markov potential games in which agents are associated with nodes in a network. Each agent has its own local potential function, and the reward of each agent depends only on the states and actions of the agents within a neighborhood. In this context, we propose a localized actor-critic algorithm. The algorithm is scalable since each agent uses only local information and does not need access to the global state. Further, the algorithm overcomes the curse of dimensionality through the use of function approximation. Our main results provide finite-sample guarantees up to a localization error and a function approximation error. Specifically, we achieve an $\tilde{\mathcal{O}}(\tilde{\epsilon}^{-4})$ sample complexity measured by the averaged Nash regret. This is the first finite-sample bound for multi-agent competitive games that does not depend on the number of agents.
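
In localized methods of this kind, each agent typically works only with the states and actions of agents within a fixed number of hops. As a result, the information an agent needs grows with the size of its neighbourhood rather than with the total number of agents. This Python sketch computes such a neighbourhood by breadth-first search and restricts a global state to it. The hop radius `kappa`, the adjacency-list format, and the function names are assumptions for illustration.

```python
from collections import deque


def k_hop_neighborhood(adj, node, kappa):
    """Set of agents within `kappa` hops of `node` (including `node` itself).

    adj: dict mapping each agent to an iterable of its direct neighbours.
    """
    seen = {node}
    queue = deque([(node, 0)])
    while queue:
        current, dist = queue.popleft()
        if dist == kappa:
            continue  # do not expand beyond the radius
        for nbr in adj.get(current, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return seen


def local_view(global_state, adj, node, kappa):
    """Restrict a global state dict to the agent's kappa-hop neighbourhood."""
    hood = k_hop_neighborhood(adj, node, kappa)
    return {agent: global_state[agent] for agent in sorted(hood)}
```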

Translated: 2023-07-11 18:45:07 | Published: 2023-07-08

# OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge Collaborative AutoML System

OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge Collaborative AutoML System ( http://arxiv.org/abs/2303.00501v2 )

License: see link

Chao Xue, Wei Liu, Shuai Xie, Zhenfang Wang, Jiaxing Li, Xuyang Peng, Liang Ding, Shanshan Zhao, Qiong Cao, Yibo Yang, Fengxiang He, Bohua Cai, Rongcheng Bian, Yiyan Zhao, Heliang Zheng, Xiangyang Liu, Dongkai Liu, Daqing Liu, Li Shen, Chang Li, Shijin Zhang, Yukang Zhang, Guanpu Chen, Shixiang Chen, Yibing Zhan, Jing Zhang, Chaoyue Wang, Dacheng Tao

Automated machine learning (AutoML) seeks to build ML models with minimal human effort. While considerable research has been conducted in the area of AutoML in general, aiming to take humans out of the loop when building artificial intelligence (AI) applications, scant literature has focused on how AutoML works well in open-environment scenarios such as the process of training and updating large models, industrial supply chains or the industrial metaverse, where people often face open-loop problems during the search process: they must continuously collect data, update data and models, satisfy the requirements of the development and deployment environment, support massive devices, modify evaluation metrics, etc. Addressing the open-environment issue with pure data-driven approaches requires considerable data, computing resources, and effort from dedicated data engineers, making current AutoML systems and platforms inefficient and computationally intractable. Human-computer interaction is a practical and feasible way to tackle the problem of open-environment AI. In this paper, we introduce OmniForce, a human-centered AutoML (HAML) system that yields both human-assisted ML and ML-assisted human techniques, to put an AutoML system into practice and build adaptive AI in open-environment scenarios. Specifically, we present OmniForce in terms of ML version management; pipeline-driven development and deployment collaborations; a flexible search strategy framework; and widely provisioned and crowdsourced application algorithms, including large models. Furthermore, the (large) models constructed by OmniForce can be automatically turned into remote services in a few minutes; this process is dubbed model as a service (MaaS). Experimental results obtained in multiple search spaces and real-world use cases demonstrate the efficacy and efficiency of OmniForce.

Translated: 2023-07-11 18:44:38, Published: 2023-07-08

# Approximately Stationary Bandits with Knapsacks

( http://arxiv.org/abs/2302.14686v2 )

License: see the linked page

Giannis Fikioris, Éva Tardos

Bandits with Knapsacks (BwK), the generalization of the Bandits problem under global budget constraints, has received a lot of attention in recent years. Previous work has focused on one of two extremes: Stochastic BwK, where the rewards and resource consumptions of each round are sampled from an i.i.d. distribution, and Adversarial BwK, where these parameters are picked by an adversary. Achievable guarantees in the two cases exhibit a massive gap: no-regret learning is achievable in the stochastic case, but in the adversarial case only competitive-ratio-style guarantees are achievable, where the competitive ratio depends either on the budget or on both the time horizon and the number of resources. What makes this gap so vast is that in Adversarial BwK the guarantees get worse in the typical case when the budget is more binding. While "best-of-both-worlds" algorithms are known (single algorithms that provide the best achievable guarantee in each extreme case), their bounds degrade to the adversarial case as soon as the environment is not fully stochastic. Our work aims to bridge this gap, offering guarantees for workloads that are not exactly stochastic but are also not worst-case. We define a condition, Approximately Stationary BwK, that parameterizes how close to stochastic or adversarial an instance is. Based on these parameters, we explore the best competitive ratio attainable in BwK. We study two algorithms that are oblivious to the values of the parameters but guarantee competitive ratios that smoothly transition between the best possible guarantees in the two extreme cases, depending on the values of the parameters. Our guarantees offer great improvement over the adversarial guarantee, especially when the available budget is small. We also prove bounds on the achievable guarantee, showing that our results are approximately tight when the budget is small.

Translated: 2023-07-11 18:44:12, Published: 2023-07-08

# Marginal Thresholding in Noisy Image Segmentation

( http://arxiv.org/abs/2304.04116v3 )

License: see the linked page

Marcus Nordström, Henrik Hult, Atsuto Maki

This work presents a study on label noise in medical image segmentation by considering a noise model based on Gaussian field deformations. Such noise is of interest because it yields realistic-looking segmentations and because it is unbiased, in the sense that the expected deformation is the identity mapping. Efficient methods for sampling and closed-form solutions for the marginal probabilities are provided. Moreover, theoretically optimal solutions to the cross-entropy and soft-Dice loss functions are studied, and it is shown how they diverge as the level of noise increases. Based on recent work on loss function characterization, it is shown that optimal solutions to soft-Dice can be recovered by thresholding solutions to cross-entropy with a particular, a priori unknown threshold that can be computed efficiently. This raises the question of whether the decrease in performance seen when using cross-entropy instead of soft-Dice is caused by using the wrong threshold. The hypothesis is validated in 5-fold studies on three organ segmentation problems from the TotalSegmentator data set, using 4 different noise strengths. The results show that changing the threshold takes cross-entropy from systematically worse than soft-Dice to similar to or better than soft-Dice.
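The operation at the centre of this abstract is binarizing a per-voxel probability map at a threshold other than the default 0.5. The sketch below is illustrative only and is not the authors' code: it assumes NumPy, a flattened probability map from a cross-entropy-trained model, and a binary reference mask, and it picks the threshold by a simple grid search on labelled data, whereas the paper computes the threshold from its noise model. The helper names `soft_dice`, `binarize`, and `best_threshold` are hypothetical.

```python
import numpy as np


def soft_dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Soft-Dice overlap between a (soft or hard) prediction and a binary target."""
    inter = np.sum(pred * target)
    return float((2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps))


def binarize(probs: np.ndarray, t: float) -> np.ndarray:
    """Turn per-voxel marginal probabilities into a hard mask at threshold t."""
    return (probs >= t).astype(np.float64)


def best_threshold(probs: np.ndarray, target: np.ndarray, candidates):
    """Grid search (a stand-in for the paper's computed threshold)."""
    scores = [soft_dice(binarize(probs, t), target) for t in candidates]
    i = int(np.argmax(scores))
    return candidates[i], scores[i]


probs = np.array([0.9, 0.6, 0.4, 0.35, 0.1])   # toy cross-entropy output
target = np.array([1.0, 1.0, 1.0, 0.0, 0.0])   # toy reference mask
print(best_threshold(probs, target, [0.3, 0.4, 0.5]))
```

On this toy map the default 0.5 threshold misses the voxel at 0.4 (Dice 0.8), while a threshold of 0.4 recovers the reference mask exactly, which is the kind of gap the threshold hypothesis is about.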

Translated: 2023-07-11 18:35:59, Published: 2023-07-08

# Machine learning for discovering laws of nature

( http://arxiv.org/abs/2303.17607v2 )

License: see the linked page

Lizhi Xin, Kevin Xin, Houwen Xin

Based on Darwin's natural selection, we developed "machine scientists" to discover the laws of nature by learning from raw data. "Machine scientists" construct physical theories by applying a logic tree (state decision tree) and a value tree (observation function tree); the logic tree determines the state of the entity, and the value tree determines the absolute value between two observations of the entity. Together, a logic tree and a value tree can reconstruct an entity's trajectory and make predictions about its future outcomes. Our proposed algorithmic model emphasizes machine learning, where "machine scientists" build up their experience by being rewarded or punished for each decision they make, eventually rediscovering Newton's equation (classical physics) and the Born rule (quantum mechanics).

Translated: 2023-07-11 18:34:50, Published: 2023-07-08

# Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos

( http://arxiv.org/abs/2303.16897v3 )

License: see the linked page

Kun Su, Kaizhi Qian, Eli Shlizerman, Antonio Torralba, Chuang Gan

Modeling sounds emitted from physical object interactions is critical for immersive perceptual experiences in real and virtual worlds. Traditional methods of impact sound synthesis use physics simulation to obtain a set of physics parameters that can represent and synthesize the sound. However, they require fine details of both the object geometries and the impact locations, which are rarely available in the real world, so they cannot be applied to synthesize impact sounds from common videos. On the other hand, existing video-driven deep learning-based approaches can only capture the weak correspondence between visual content and impact sounds because they lack physics knowledge. In this work, we propose a physics-driven diffusion model that can synthesize high-fidelity impact sounds for a silent video clip. In addition to the video content, we propose to use additional physics priors to guide the impact sound synthesis procedure. The physics priors include both physics parameters that are directly estimated from noisy real-world impact sound examples without a sophisticated setup and learned residual parameters that interpret the sound environment via neural networks. We further implement a novel diffusion model with specific training and inference strategies to combine physics priors and visual information for impact sound synthesis. Experimental results show that our model outperforms several existing systems in generating realistic impact sounds. More importantly, the physics-based representations are fully interpretable and transparent, which lets us edit sounds flexibly.
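The abstract does not spell out its physics parameters. A common physics-based description of an impact sound is a modal model, a sum of exponentially damped sinusoids, each with a frequency, a damping rate, and an amplitude. The sketch below assumes that parameterization purely to show what "a set of physics parameters that can represent and synthesize the sound" can look like; the function name and the mode values are hypothetical, not taken from the paper.

```python
import numpy as np


def modal_impact(freqs_hz, dampings, amps, sr=16000, duration=0.5):
    """Synthesize an impact sound as a sum of exponentially damped sinusoids.

    Mode k contributes amps[k] * exp(-dampings[k] * t) * sin(2 * pi * freqs_hz[k] * t).
    """
    t = np.arange(int(sr * duration)) / sr  # sample times in seconds
    out = np.zeros_like(t)
    for f, d, a in zip(freqs_hz, dampings, amps):
        out += a * np.exp(-d * t) * np.sin(2 * np.pi * f * t)
    return out


# Two hypothetical modes, e.g. a small struck object.
y = modal_impact([440.0, 1230.0], [6.0, 15.0], [1.0, 0.5])
```

Because every parameter has a physical meaning (pitch, decay, loudness), editing a sound reduces to changing these numbers, which is the interpretability property the abstract highlights.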

Translated: 2023-07-11 18:34:34, Published: 2023-07-08

# When Giant Language Brains Just Aren't Enough! Domain Pizzazz with Knowledge Sparkle Dust

( http://arxiv.org/abs/2305.07230v2 )

License: see the linked page

Minh-Tien Nguyen, Duy-Hung Nguyen, Shahab Sabahi, Hung Le, Jeff Yang, Hajime Hotta

Large language models (LLMs) have significantly advanced the field of natural language processing, with GPT models at the forefront. While their remarkable performance spans a range of tasks, adapting LLMs to real-world business scenarios still poses challenges that warrant further investigation. This paper presents an empirical analysis aimed at bridging the gap in adapting LLMs to practical use cases. To do so, we select insurance question answering (QA) as a case study because of the reasoning it demands. Based on this task, we design a new model that relies on LLMs empowered by additional knowledge extracted from insurance policy rulebooks and DBpedia. The additional knowledge helps LLMs understand new insurance concepts for domain adaptation. Preliminary results on two QA datasets show that knowledge enhancement significantly improves the reasoning ability of GPT-3.5 (55.80% and 57.83% in terms of accuracy). The analysis also indicates that existing public knowledge bases, such as DBpedia, are beneficial for knowledge enhancement. Our findings reveal that the inherent complexity of business scenarios often necessitates the incorporation of domain-specific knowledge and external resources for effective problem-solving.

Translated: 2023-07-11 18:25:07, Published: 2023-07-08

# How to Index Item IDs for Recommendation Foundation Models

( http://arxiv.org/abs/2305.06569v4 )

License: see the linked page

Wenyue Hua, Shuyuan Xu, Yingqiang Ge, Yongfeng Zhang

A recommendation foundation model uses large language models (LLMs) for recommendation by converting recommendation tasks into natural language tasks. It enables generative recommendation, which directly generates the item(s) to recommend instead of calculating a ranking score for every candidate item as traditional recommendation models do, simplifying the recommendation pipeline from multi-stage filtering to single-stage filtering. To avoid generating excessively long text when deciding which item(s) to recommend, creating LLM-compatible item IDs is essential for recommendation foundation models. In this study, we systematically examine the item indexing problem for recommendation foundation models, using P5 as the representative backbone model and replicating its results with various indexing methods. To emphasize the importance of item indexing, we first discuss the issues of several trivial item indexing methods, such as independent indexing, title indexing, and random indexing. We then propose four simple yet effective solutions: sequential indexing, collaborative indexing, semantic (content-based) indexing, and hybrid indexing. Our reproducibility study of P5 highlights the significant influence of item indexing methods on model performance, and our results on real-world datasets validate the effectiveness of our proposed solutions.
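Of the four proposals, sequential indexing is the easiest to picture. A minimal sketch, assuming one plausible reading of the idea: walk each user's interaction history and hand out consecutive integer IDs in order of first appearance, so items that tend to co-occur get numerically close IDs, which may then share subword tokens when an LLM tokenizes them. The function name, the sorted user order, and the `start` parameter are assumptions for illustration, not P5's implementation.

```python
from typing import Dict, List


def sequential_index(user_sequences: Dict[str, List[str]], start: int = 1) -> Dict[str, int]:
    """Map raw item keys to consecutive integer IDs in order of first appearance.

    Users are visited in sorted order so the mapping is deterministic; items a
    user interacted with one after another receive neighboring IDs.
    """
    ids: Dict[str, int] = {}
    next_id = start
    for user in sorted(user_sequences):
        for item in user_sequences[user]:
            if item not in ids:
                ids[item] = next_id
                next_id += 1
    return ids


history = {"u1": ["shoe", "sock", "lace"], "u2": ["sock", "boot"]}
print(sequential_index(history))
# {'shoe': 1, 'sock': 2, 'lace': 3, 'boot': 4}
```

The resulting integers would then be rendered as text (for example "item_2") inside the natural-language prompt; how the numbers are split into tokens depends on the LLM's tokenizer.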

Translated: 2023-07-11 18:24:48, Published: 2023-07-08

# Re-imagining health and well-being in low resource African settings using an augmented AI system and a 3D digital twin

( http://arxiv.org/abs/2306.01772v2 )

License: see the linked page

Deshendran Moodley and Christopher Seebregts

This paper discusses and explores the potential and relevance of recent developments in artificial intelligence (AI) and digital twins for health and well-being in low-resource African countries. We use the case of public health emergency response to disease outbreaks and epidemic control. There is potential to take advantage of the increasing availability of data and digitization to develop advanced AI methods for analysis and prediction. Using an AI systems perspective, we review emerging trends in AI systems and digital twins and propose an initial augmented AI system architecture to illustrate how an AI system can work with a 3D digital twin to address public health goals. We highlight scientific knowledge discovery, continual learning, pragmatic interoperability, and interactive explanation and decision-making as essential research challenges for AI systems and digital twins.

Translated: 2023-07-11 18:17:22, Published: 2023-07-08

# Quantization-Aware and Tensor-Compressed Training of Transformers for Natural Language Understanding

( http://arxiv.org/abs/2306.01076v2 )

License: see the linked page

Zi Yang, Samridhi Choudhary, Siegfried Kunzmann, Zheng Zhang

Fine-tuned transformer models have shown superior performance in many natural language tasks. However, their large model size prohibits deploying high-performance transformer models on resource-constrained devices. This paper proposes a quantization-aware tensor-compressed training approach to reduce the model size, arithmetic operations, and ultimately the runtime latency of transformer-based models. We compress the embedding and linear layers of transformers into small low-rank tensor cores, which significantly reduces model parameters. Quantization-aware training with learnable scale factors is used to further obtain low-precision representations of the tensor-compressed models. The developed approach can be used for both end-to-end training and distillation-based training. To improve convergence, layer-by-layer distillation is applied to distill a quantized and tensor-compressed student model from a pre-trained transformer. The performance is demonstrated on two natural language understanding tasks, showing up to a $63\times$ compression ratio, little accuracy loss, and remarkable inference and training speedups.
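"Quantization-aware training with learnable scale factors" usually means inserting a fake-quantization step into the forward pass and letting gradients flow to the scale through a straight-through estimator. The NumPy sketch below shows that mechanism for a symmetric signed grid, with the scale gradient written in the style of learned step size quantization (LSQ). It is an assumption about the general technique, not the paper's implementation, and it ignores the tensor-core factorization entirely; the function names are hypothetical.

```python
import numpy as np


def fake_quantize(x: np.ndarray, scale: float, num_bits: int = 8) -> np.ndarray:
    """Symmetric uniform fake quantization.

    Values are divided by `scale`, rounded, clamped to the signed `num_bits`
    range, and multiplied back, so training sees the rounding error.
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    return np.clip(np.round(x / scale), qmin, qmax) * scale


def scale_gradient(x: np.ndarray, scale: float, num_bits: int = 8) -> np.ndarray:
    """d fake_quantize(x) / d scale under a straight-through estimator.

    Inside the range, rounding is treated as identity, giving round(v) - v;
    clamped values contribute qmin or qmax.
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    v = x / scale
    inside = (v >= qmin) & (v <= qmax)
    return np.where(inside, np.round(v) - v, np.clip(v, qmin, qmax))


w = np.array([0.26, -1.0, 5.0])
print(fake_quantize(w, scale=0.1, num_bits=4))  # approximately [0.3, -0.8, 0.7]
```

With 4 bits the grid covers only -8 to 7 steps of 0.1, so -1.0 and 5.0 saturate; the nonzero gradient for saturated values is what lets training enlarge the scale when too many weights clip.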

Translated: 2023-07-11 18:16:20, Published: 2023-07-08

# Conformal Prediction with Large Language Models for Multi-Choice Question Answering

( http://arxiv.org/abs/2305.18404v3 )

License: see the linked page

Bhawesh Kumar, Charlie Lu, Gauri Gupta, Anil Palepu, David Bellamy, Ramesh Raskar, Andrew Beam

As large language models continue to be widely developed, robust uncertainty quantification techniques will become crucial for their safe deployment in high-stakes scenarios. In this work, we explore how conformal prediction can be used to provide uncertainty quantification in language models for the specific task of multiple-choice question answering. We find that the uncertainty estimates from conformal prediction are tightly correlated with prediction accuracy. This observation can be useful for downstream applications such as selective classification and filtering out low-quality predictions. We also investigate the exchangeability assumption required by conformal prediction when it is applied to out-of-subject questions, which may be a more realistic scenario for many practical applications. Our work contributes towards more trustworthy and reliable use of large language models in safety-critical situations, where robust guarantees on error rate are required.
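For readers new to the method, here is a minimal split conformal prediction sketch for multiple-choice answers. It assumes the model's softmax probabilities over the answer options are available and uses the common nonconformity score 1 - p(correct option). The abstract does not state the exact score the authors use, so treat this as the textbook recipe rather than their setup. When calibration and test questions are exchangeable, the returned sets contain the correct option with probability at least 1 - alpha; that exchangeability is exactly the assumption the abstract probes for out-of-subject questions.

```python
import numpy as np


def conformal_threshold(cal_probs: np.ndarray, cal_labels: np.ndarray, alpha: float) -> float:
    """Split conformal calibration with nonconformity score 1 - p(correct option).

    cal_probs has shape (n, num_options); cal_labels holds the correct option index.
    Returns qhat, the ceil((n + 1) * (1 - alpha))-th smallest calibration score.
    """
    n = len(cal_labels)
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return 1.0  # too few calibration points: keep every option
    return float(scores[k - 1])


def prediction_set(probs: np.ndarray, qhat: float) -> list:
    """Indices of all options whose score 1 - p is at most qhat."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= qhat]


# Toy calibration set: four questions with options A-D (indices 0-3).
cal_probs = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.20, 0.60, 0.10, 0.10],
    [0.10, 0.10, 0.50, 0.30],
    [0.25, 0.25, 0.25, 0.25],
])
cal_labels = np.array([0, 1, 2, 3])
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
print(prediction_set(np.array([0.5, 0.3, 0.15, 0.05]), qhat))  # [0, 1]
```

Larger prediction sets signal higher uncertainty, which is what makes set size usable for selective classification.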

Translated: 2023-07-11 18:14:39, Published: 2023-07-08

# A Comprehensive Survey on Generative Diffusion Models for Structured Data

( http://arxiv.org/abs/2306.04139v2 )

License: see the linked page

Heejoon Koo, To Eun Kim

In recent years, generative diffusion models have driven a rapid paradigm shift in deep generative modelling by showing groundbreaking performance across various applications. Meanwhile, structured data, encompassing tabular and time series data, has received comparatively limited attention from the deep learning research community, despite its omnipresence and extensive applications. Thus, compared with other data modalities such as visual and textual data, there is still a lack of literature and reviews on structured data modelling via diffusion models. To address this gap, we present a comprehensive review of recently proposed diffusion models in the field of structured data. First, this survey provides a concise overview of score-based diffusion model theory and then proceeds to technical descriptions of the majority of pioneering works that use structured data in both data-driven general tasks and domain-specific applications. Thereafter, we analyse and discuss the limitations and challenges shown in existing works and suggest potential research directions. We hope this review serves as a catalyst for the research community, promoting developments in generative diffusion models for structured data.

Translated: 2023-07-11 18:04:14, Published: 2023-07-08

# AdAM: Few-Shot Image Generation via Adaptation-Aware Kernel Modulation

( http://arxiv.org/abs/2307.01465v2 )

License: see the linked page

Yunqing Zhao, Keshigeyan Chandrasegaran, Abdollahzadeh Milad, Chao Du, Tianyu Pang, Ruoteng Li, Henghui Ding, Ngai-Man Cheung

Few-shot image generation (FSIG) aims to learn to generate new and diverse images given few (e.g., 10) training samples. Recent work has addressed FSIG by leveraging a GAN pre-trained on a large-scale source domain and adapting it to the target domain with few target samples. Central to recent FSIG methods are knowledge preservation criteria, which select and preserve a subset of source knowledge in the adapted model. However, a major limitation of existing methods is that their knowledge preservation criteria consider only the source domain/task and fail to consider the target domain/adaptation when selecting source knowledge, casting doubt on their suitability for setups with different proximity between source and target domains. Our work makes two contributions. First, we revisit recent FSIG works and their experiments. We reveal that in setups where the assumption of close proximity between source and target domains is relaxed, many existing state-of-the-art (SOTA) methods that consider only the source domain in knowledge preservation perform no better than a baseline method. Second, we propose Adaptation-Aware kernel Modulation (AdAM) for general FSIG with different source-target domain proximity. Extensive experiments show that AdAM consistently achieves SOTA performance in FSIG, including challenging setups where the source and target domains are further apart.

Translated: 2023-07-11 17:57:58, Published: 2023-07-08

# Prompting classes: Exploring the Power of Prompt Class Learning in Weakly Supervised Semantic Segmentation

( http://arxiv.org/abs/2307.00097v2 )

License: see the linked page

Balamurali Murugesan, Rukhshanda Hussain, Rajarshi Bhattacharya, Ismail Ben Ayed, and Jose Dolz

Recently, CLIP-based approaches have exhibited remarkable performance on generalization and few-shot learning tasks, fueled by the power of contrastive language-vision pre-training. In particular, prompt tuning has emerged as an effective strategy to adapt pre-trained language-vision models to downstream tasks by employing task-related textual tokens. Motivated by this progress, in this work we ask whether other fundamental problems, such as weakly supervised semantic segmentation (WSSS), can benefit from prompt tuning. Our findings reveal two interesting observations that shed light on the impact of prompt tuning on WSSS. First, modifying only the class token of the text prompt has a greater impact on the Class Activation Map (CAM) than arguably more complex strategies that optimize the context. Second, the class token associated with the image ground truth does not necessarily correspond to the category that yields the best CAM. Motivated by these observations, we introduce a novel approach based on a PrOmpt cLass lEarning (POLE) strategy. Through extensive experiments, we demonstrate that our simple yet efficient approach achieves SOTA performance on a well-known WSSS benchmark. These results highlight not only the benefits of language-vision models in WSSS but also the potential of prompt learning for this problem. The code is available at https://github.com/rB080/WSS_POLE.

Translated: 2023-07-11 17:57:19, Published: 2023-07-08

# On Regularization and Inference with Label Constraints

( http://arxiv.org/abs/2307.03886v1 )

License: see the linked page

Kaifu Wang, Hangfeng He, Tin D. Nguyen, Piyush Kumar, Dan Roth

Prior knowledge and symbolic rules in machine learning are often expressed in the form of label constraints, especially in structured prediction problems. In this work, we compare two common strategies for encoding label constraints in a machine learning pipeline, regularization with constraints and constrained inference, by quantifying their impact on model performance. For regularization, we show that it narrows the generalization gap by precluding models that are inconsistent with the constraints. However, its preference for small violations introduces a bias toward a suboptimal model. For constrained inference, we show that it reduces the population risk by correcting a model's violations, and hence turns the violation into an advantage. Given these differences, we further explore using the two approaches together and propose conditions under which constrained inference compensates for the bias introduced by regularization, aiming to improve both the model complexity and the optimal risk.
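To make "constrained inference" concrete: at prediction time, instead of taking the argmax label at each position independently, one picks the highest-scoring joint labelling that satisfies the constraints. The brute-force sketch below only illustrates that definition and is not the paper's formulation. Enumeration is feasible only for tiny outputs (practical systems typically use integer linear programming or dynamic programming), and the scores and the "at most one positive label" constraint are hypothetical.

```python
import itertools
from typing import Callable, Optional, Sequence, Tuple

import numpy as np


def constrained_argmax(
    scores: np.ndarray, constraint: Callable[[Sequence[int]], bool]
) -> Optional[Tuple[int, ...]]:
    """Brute-force constrained inference over a tiny structured output.

    scores[i, k] is the model's score for assigning label k to position i.
    Returns the highest-scoring joint labelling that satisfies `constraint`,
    or None if no labelling does.
    """
    n_positions, n_labels = scores.shape
    best, best_score = None, -np.inf
    for labels in itertools.product(range(n_labels), repeat=n_positions):
        if not constraint(labels):
            continue
        total = sum(scores[i, k] for i, k in enumerate(labels))
        if total > best_score:
            best, best_score = labels, total
    return best


scores = np.array([[0.1, 0.9], [0.2, 0.8], [0.6, 0.4]])
print(constrained_argmax(scores, lambda ls: True))          # (1, 1, 0)
print(constrained_argmax(scores, lambda ls: sum(ls) <= 1))  # (1, 0, 0)
```

The unconstrained prediction (1, 1, 0) violates the rule; constrained inference corrects it to (1, 0, 0), which is the kind of violation-correcting step whose effect on population risk the abstract analyses.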

Translated: 2023-07-11 17:00:00, Published: 2023-07-08

# Noisy Tensor Ring approximation for computing gradients of Variational Quantum Eigensolver for Combinatorial Optimization

( http://arxiv.org/abs/2307.03884v1 )

License: see the linked page

Dheeraj Peddireddy, Utkarsh Priyam, Vaneet Aggarwal

Variational quantum algorithms, especially the Quantum Approximate Optimization Algorithm and the Variational Quantum Eigensolver (VQE), have established their potential to provide computational advantage in combinatorial optimization. However, these algorithms suffer from classically intractable gradients, which limits their scalability. This work addresses the scalability challenge for VQE by proposing a classical gradient computation method that uses the parameter shift rule but computes the expected values from the circuits using a tensor ring approximation. The parametrized gates from the circuit transform the tensor ring by contracting the gate matrix along the free edges of the tensor ring. While single-qubit gates do not alter the ring structure, the state transformations from two-qubit rotations are evaluated by truncating the singular values, thereby preserving the structure of the tensor ring and reducing the computational complexity. This variation of the matrix product state approximation grows linearly in the number of qubits and the number of two-qubit gates, as opposed to the exponential growth of classical simulations, allowing faster evaluation of the gradients on classical simulators.
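The two ingredients named here, the parameter shift rule and singular-value truncation, can be illustrated separately. This NumPy sketch is not the paper's tensor-ring simulator: it applies the shift rule to a single-qubit state vector, where the answer is known analytically (d/dθ cos θ = -sin θ), and it shows the truncation step on a plain matrix standing in for a two-site block. All function names are hypothetical.

```python
import numpy as np

Z = np.array([[1.0, 0.0], [0.0, -1.0]])


def ry(theta: float) -> np.ndarray:
    """RY(theta) = exp(-i * theta * Y / 2), which is a real 2x2 matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


def expectation_z(theta: float) -> float:
    """<Z> after applying RY(theta) to |0>; analytically cos(theta)."""
    psi = ry(theta) @ np.array([1.0, 0.0])
    return float(psi @ Z @ psi)


def parameter_shift_grad(f, theta: float) -> float:
    """Parameter shift rule for a gate generated by (Pauli operator) / 2."""
    return (f(theta + np.pi / 2) - f(theta - np.pi / 2)) / 2.0


def truncated_split(block: np.ndarray, max_rank: int):
    """Split a two-site block by SVD, keeping the max_rank largest singular values.

    Truncation bounds the bond dimension; the Frobenius error equals the norm
    of the discarded singular values.
    """
    u, s, vh = np.linalg.svd(block, full_matrices=False)
    r = min(max_rank, s.size)
    return u[:, :r], s[:r], vh[:r, :]


theta = 0.3
print(parameter_shift_grad(expectation_z, theta), -np.sin(theta))  # equal up to rounding
```

Unlike finite differences, the shift rule is exact for such gates and only needs two extra evaluations of the same expectation, which is why a cheap approximate way of computing those expectations (the tensor ring) is enough to get gradients.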

Translated: 2023-07-11 16:59:45, Published: 2023-07-08

# Designing Mixed-Initiative Video Games

( http://arxiv.org/abs/2307.03877v1 )

License: see the linked page

Daijin Yang

The development of artificial intelligence (AI) enables humans to co-create content with machines. The unexpectedness of AI-generated content can bring inspiration and entertainment to users. However, co-creation interactions are always designed for content creators and have poor accessibility. To explore the gamification of mixed-initiative co-creation and make human-AI interactions accessible and fun for players, I prototyped Snake Story, a mixed-initiative game in which players select AI-generated texts to write a story about a snake by playing a "Snake"-like game. A controlled experiment was conducted to investigate the dynamics of player-AI interactions with and without the game component in the designed interface. In a study with 11 players (n=11), I found that players used different strategies with the two versions; that game mechanics significantly affected the output stories, the players' creative process, and their role perceptions; and that players with different backgrounds preferred different versions. Based on these results, I further discuss considerations for mixed-initiative game design. This work aims to inspire the design of engaging co-creation experiences.

Translated: 2023-07-11 16:59:26, Published: 2023-07-08

# Large Language Models for Supply Chain Optimization

( http://arxiv.org/abs/2307.03875v1 )

License: see the linked page

Beibin Li, Konstantina Mellou, Bo Zhang, Jeevan Pathuri, Ishai Menache

Supply chain operations traditionally involve a variety of complex decision-making problems. Over the last few decades, supply chains have greatly benefited from advances in computation, which allowed the transition from manual processing to automation and cost-effective optimization. Nonetheless, business operators still need to spend substantial effort *explaining* and interpreting the optimization outcomes to stakeholders. Motivated by recent advances in large language models (LLMs), we study how this disruptive technology can help bridge the gap between supply chain automation and human comprehension and trust thereof. We design a framework that accepts plain-text queries as input and outputs insights about the underlying optimization outcomes. Our framework does not forgo state-of-the-art combinatorial optimization technology but rather leverages it to quantitatively answer what-if scenarios (e.g., how would the cost change if we used supplier B instead of supplier A for a given demand?). Importantly, our design does not require sending proprietary data to LLMs, which can be a privacy concern in some circumstances. We demonstrate the effectiveness of our framework on a real server placement scenario within Microsoft's cloud supply chain. Along the way, we develop a general evaluation benchmark that can be used to evaluate the accuracy of LLM output in other scenarios.

翻訳日:2023-07-11 16:59:06 公開日:2023-07-08

# デジタル病理におけるKi-67スコーリングのための銀標準ラベルを用いたドメイン適応

Domain Adaptation using Silver Standard Labels for Ki-67 Scoring in Digital Pathology: A Step Closer to Widescale Deployment ( http://arxiv.org/abs/2307.03872v1 )

ライセンス: Link先を確認

Amanda Dy, Ngoc-Nhu Jennifer Nguyen, Seyed Hossein Mirjahanmardi, Melanie Dawe, Anthony Fyles, Wei Shi, Fei-Fei Liu, Dimitrios Androutsos, Susan Done and April Khademi

(参考訳) Ki-67 PIスコアリングの客観性と効率を向上させるため、深層学習システムが提案されている。深層学習技術は非常に高精度である一方、ドメイン外のデータに適用すると性能が低下するという課題がある。モデルは通常、ターゲットドメインのデータではなく、ベンダーが入手可能なデータを用いて訓練される。そのため、これは臨床応用にとって重要な課題である。この課題に対処するため、本研究ではドメイン適応パイプラインを提案する。このパイプラインは教師なしフレームワークを用いてターゲットドメインの銀標準（擬似）ラベルを生成し、それによってゴールドスタンダード（GS）のソースドメインデータを拡張する。検証済みの2つのKi-67スコアリングアーキテクチャ（UV-NetとpiNET）上で、次の5つの学習方式をテストした。(1) SS Only：ターゲットの銀標準（SS）ラベルで学習。(2) GS Only：ソースのGSラベルで学習。(3) Mixed：ターゲットのSSラベルとソースのGSラベルで学習。(4) GS+SS：ソースのGSラベルで学習し、ターゲットのSSラベルで微調整。(5) 提案手法のSS+GS：ソースのSSラベルで学習し、ソースのGSラベルで微調整。SS+GS法は、ターゲットデータにおいてGS Onlyモデルと比べ、有意に（p < 0.05）高いPI精度（95.9%）と、より一貫した結果をもたらした。t-SNEプロットの解析から、SS+GSモデルが学習した特徴はソースデータとターゲットデータの間でより整合しており、汎化性能の向上につながっていることが示された。医用画像の手作業によるアノテーションは、作成に時間とコストがかかる。提案するパイプラインは、そうしたアノテーションなしにターゲット分布を学習する効率的な方法を提供する。本フレームワークは、広範な展開に向けた検査施設ごとのキャリブレーション手法として、任意のターゲットサイトに適用できる。

Deep learning systems have been proposed to improve the objectivity and efficiency of Ki-67 PI scoring. The challenge is that while very accurate, deep learning techniques suffer from reduced performance when applied to out-of-domain data. This is a critical challenge for clinical translation, as models are typically trained using data available to the vendor, which is not from the target domain. To address this challenge, this study proposes a domain adaptation pipeline that employs an unsupervised framework to generate silver standard (pseudo) labels in the target domain, which are used to augment the gold standard (GS) source domain data. Five training regimes were tested on two validated Ki-67 scoring architectures (UV-Net and piNET): (1) SS Only: trained on target silver standard (SS) labels, (2) GS Only: trained on source GS labels, (3) Mixed: trained on target SS and source GS labels, (4) GS+SS: trained on source GS labels and fine-tuned on target SS labels, and our proposed method (5) SS+GS: trained on source SS labels and fine-tuned on source GS labels. The SS+GS method yielded significantly (p < 0.05) higher PI accuracy (95.9%) and more consistent results compared to the GS Only model on target data. Analysis of t-SNE plots showed features learned by the SS+GS models are more aligned for source and target data, resulting in improved generalization. The proposed pipeline provides an efficient method for learning the target distribution without manual annotations, which are time-consuming and costly to generate for medical images. This framework can be applied to any target site as a per-laboratory calibration method, for widescale deployment.
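
The Ki-67 proliferation index (PI) that these systems score is conventionally the percentage of Ki-67-positive tumor cells among all tumor cells. The sketch below uses hypothetical cell counts. The abstract does not define how "PI accuracy (95.9%)" is computed, so the absolute PI difference printed here is only an illustrative comparison, not the paper's metric.

```python
def proliferation_index(n_positive: int, n_negative: int) -> float:
    """Ki-67 PI: percentage of Ki-67-positive tumor cells among all tumor cells."""
    total = n_positive + n_negative
    if total == 0:
        raise ValueError("no tumor cells detected")
    return 100.0 * n_positive / total


# Hypothetical counts: cells detected by a model vs. a pathologist's reference.
predicted_pi = proliferation_index(n_positive=42, n_negative=158)
reference_pi = proliferation_index(n_positive=45, n_negative=155)
pi_error = abs(predicted_pi - reference_pi)  # illustrative comparison only
print(predicted_pi, reference_pi, pi_error)
```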

翻訳日:2023-07-11 16:58:41 公開日:2023-07-08

# HUMS2023 データチャレンジ結果提出

HUMS2023 Data Challenge Result Submission ( http://arxiv.org/abs/2307.03871v1 )

ライセンス: Link先を確認

Dhiraj Neupane, Lakpa Dorje Tamang, Ngoc Dung Huynh, Mohamed Reda Bouadjenek and Sunil Aryal

(参考訳) 本研究では、早期検知のための簡単な手法を実装した。1つ目は、与えられたmatファイルをプロットすることである。2つ目は、サンプルに連続ウェーブレット変換（CWT）を適用してスカログラム画像を生成し、それを解析することである。また、各信号から平均値、標準偏差（STD）、ピーク・ツー・ピーク（P2P）値を求めることも、故障の兆候の検出に役立った。さらに、故障の進行を追跡するために自己回帰和分移動平均（ARIMA）法を実装した。

We implemented a simple method for early detection in this research. The implemented methods are plotting the given mat files and analyzing scalogram images generated by performing Continuous Wavelet Transform (CWT) on the samples. Finding the mean, standard deviation (STD), and peak-to-peak (P2P) values from each signal also helped detect faulty signs. We also implemented the autoregressive integrated moving average (ARIMA) method to track the progression.
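
The per-signal statistics named in the abstract (mean, STD, peak-to-peak) can be computed as below. The signal is synthetic: a clean sine wave, plus a copy with one injected spike that stands in for a faulty signal. The sketch does not reproduce the CWT scalograms or the ARIMA tracking. The abstract does not say whether the authors used the population or the sample standard deviation; the population version is assumed here.

```python
import math
import statistics


def signal_features(signal):
    """Return (mean, population standard deviation, peak-to-peak) of a 1-D signal."""
    mean = statistics.fmean(signal)
    std = statistics.pstdev(signal)
    p2p = max(signal) - min(signal)
    return mean, std, p2p


# Hypothetical vibration samples: four periods of a sine wave, and the same
# wave with a single impact-like spike added at sample 10.
n = 64
healthy = [math.sin(2 * math.pi * k / 16) for k in range(n)]
faulty = list(healthy)
faulty[10] += 3.0

h = signal_features(healthy)
f = signal_features(faulty)
print("healthy:", h)
print("faulty: ", f)
```

Both the STD and the peak-to-peak value grow for the spiked signal. This is the kind of shift such simple features can flag before a more detailed scalogram analysis.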

翻訳日:2023-07-11 16:57:59 公開日:2023-07-08

# Sketch-A-Shape:ゼロショットでのスケッチからの3D形状生成

Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation ( http://arxiv.org/abs/2307.03869v1 )

ライセンス: Link先を確認

Aditya Sanghi, Pradeep Kumar Jayaraman, Arianna Rampini, Joseph Lambourne, Hooman Shayani, Evan Atherton, Saeid Asgari Taghanaki

(参考訳) 近年、テキストからの形状生成など、3次元ビジョンのダウンストリームタスクに大規模事前学習モデルを創造的に応用する研究が大きく進展している。これを受けて本研究では、これらの事前学習モデルを用いてスケッチから3次元形状を効果的に生成する方法を検討する。この課題は、スケッチと形状のペアデータセットが限られていることや、スケッチの抽象度がさまざまであることから、これまでほぼ未解決のままであった。我々は、学習時に合成レンダリング画像の特徴で3次元生成モデルを条件付けることで、推論時にスケッチから3次元形状を効果的に生成できることを発見した。この特徴は、凍結した大規模事前学習済み視覚モデルから得たものである。これは、大規模事前学習済み視覚モデルの特徴が、ドメインシフトに頑健な意味的信号を持つことを示唆している。すなわち、学習にはRGBレンダリングのみを用いながら、推論時にはスケッチへ汎化できる。さまざまな設計要因を調べる包括的な実験を行い、この単純なアプローチの有効性を実証する。本アプローチは、学習時にペアデータセットを一切必要とせず、抽象度にかかわらず各入力スケッチから複数の3次元形状を生成できる。

Significant progress has recently been made in creative applications of large pre-trained models for downstream tasks in 3D vision, such as text-to-shape generation. This motivates our investigation of how these pre-trained models can be used effectively to generate 3D shapes from sketches, which has largely remained an open challenge due to the limited sketch-shape paired datasets and the varying level of abstraction in the sketches. We discover that conditioning a 3D generative model on the features (obtained from a frozen large pre-trained vision model) of synthetic renderings during training enables us to effectively generate 3D shapes from sketches at inference time. This suggests that the large pre-trained vision model features carry semantic signals that are resilient to domain shifts, i.e., allowing us to use only RGB renderings, but generalizing to sketches at inference time. We conduct a comprehensive set of experiments investigating different design factors and demonstrate the effectiveness of our straightforward approach for generation of multiple 3D shapes per each input sketch regardless of their level of abstraction without requiring any paired datasets during training.

翻訳日:2023-07-11 16:57:50 公開日:2023-07-08

# 無線ネットワークにおけるパーソナライズされたリソース割り当て:AIを活用したビッグデータ駆動型多目的最適化

Personalized Resource Allocation in Wireless Networks: An AI-Enabled and Big Data-Driven Multi-Objective Optimization ( http://arxiv.org/abs/2307.03867v1 )

ライセンス: Link先を確認

Rawan Alkurd, Ibrahim Abualhaol, Halim Yanikomeroglu

(参考訳) 無線ネットワークの設計と最適化は、主に強力な数学的・理論的モデリングに基づいてきた。しかし、5G以降の時代に新たなアプリケーションが登場するにつれ、ネットワークの設計と最適化はこれまでにないレベルの複雑さに直面することになる。その結果、無線ネットワークの設計と最適化に人工知能（AI）を用いることが構想されている。AIは、極めて複雑な問題をリアルタイムに解決する際の柔軟性と適応性を備えているためである。AIの主要な将来的応用の1つは、多くのユースケースにおいてユーザレベルのパーソナライズを可能にすることである。AIは、人とコンピュータの関わり方に革命をもたらすだろう。コンピュータは人間の命令や感情を邪魔にならない（非侵襲的な）方法で感知できるようになり、プロセス全体がユーザにとって透過的なものになる。この能力を活用し、さらにコンピューティング技術の進歩に後押しされることで、無線ネットワークを再設計し、ネットワークサービスをユーザレベルまでリアルタイムにパーソナライズできるようになる。現在の無線ネットワークは、あらかじめ定められた品質要件を満たすように最適化されている。一方、本稿で提唱するパーソナライズ技術は、インテリジェントなビッグデータ駆動型レイヤによって支えられる。このレイヤは、希少なネットワークリソースをきめ細かく管理するよう設計されている。このレイヤは、各ユーザの目標満足度を達成するのに必要なサービス品質を決定するための知能を提供する。パーソナライズされたネットワークは、その動的かつ柔軟な設計により、無線ネットワークにおける相反する2つの目的の最適化で前例のない改善を達成することが期待される。2つの目的とは、リソースの節約とユーザ満足度の向上である。

The design and optimization of wireless networks have mostly been based on strong mathematical and theoretical modeling. Nonetheless, as novel applications emerge in the era of 5G and beyond, unprecedented levels of complexity will be encountered in the design and optimization of the network. As a result, the use of Artificial Intelligence (AI) is envisioned for wireless network design and optimization due to the flexibility and adaptability it offers in solving extremely complex problems in real-time. One of the main future applications of AI is enabling user-level personalization for numerous use cases. AI will revolutionize the way we interact with computers in which computers will be able to sense commands and emotions from humans in a non-intrusive manner, making the entire process transparent to users. By leveraging this capability, and accelerated by the advances in computing technologies, wireless networks can be redesigned to enable the personalization of network services to the user level in real-time. While current wireless networks are being optimized to achieve a predefined set of quality requirements, the personalization technology advocated in this article is supported by an intelligent big data-driven layer designed to micro-manage the scarce network resources. This layer provides the intelligence required to decide the necessary service quality that achieves the target satisfaction level for each user. Due to its dynamic and flexible design, personalized networks are expected to achieve unprecedented improvements in optimizing two contradicting objectives in wireless networks: saving resources and improving user satisfaction levels.

翻訳日:2023-07-11 16:57:30 公開日:2023-07-08

# 超ハイゼンベルク精度をもつ長距離相互作用スターク多体プローブ

Long-range interacting Stark many-body probes with Super-Heisenberg precision ( http://arxiv.org/abs/2307.03904v1 )

ライセンス: Link先を確認

Rozhin Yousefjani, Xingjian He, and Abolfazl Bayat

(参考訳) 干渉計ベースの量子センシングでは、粒子間相互作用は有害となる。これとは対照的に、量子多体プローブはそのような相互作用を利用して量子増強された感度を実現する。これまで研究された量子多体プローブの多くでは、相互作用は短距離であるとみなされている。本稿では、さまざまな充填率における長距離相互作用が、スターク量子プローブの性能に及ぼす影響を調べる。このプローブは小さな勾配場を測定するためのものである。これらのプローブは、基底状態のスターク局在相転移を利用する。この転移は、系のサイズが大きくなるにつれて無限小の勾配場で起こる。結果から、超ハイゼンベルク精度はあらゆる相互作用範囲で常に達成可能であることがわかった。一方で、長距離相互作用スタークプローブは2つの異なる振る舞いを示す。第一に、相互作用の範囲を代数的に広げると局在化の力が強まり、その結果プローブの感度は低下する。第二に、相互作用範囲が全結合グラフに近づくと実効的な局在化の力が消失し、プローブの感度は再び向上し始める。超ハイゼンベルク精度は、転移点に至るまでの非局在相（extended phase）全体で達成可能である。この結果は、状態準備時間をリソース解析に含めても成り立つ。プローブが局在相に入ると感度は低下し、その性能は普遍的な振る舞いに従ってサイズに依存しなくなる。さらに解析から、充填率が低いほど弱い勾配場の測定精度が高くなることが示された。

In contrast to interferometry-based quantum sensing, where interparticle interaction is detrimental, quantum many-body probes exploit such interactions to achieve quantum-enhanced sensitivity. In most of the studied quantum many-body probes, the interaction is considered to be short-ranged. Here, we investigate the impact of long-range interaction at various filling factors on the performance of Stark quantum probes for measuring a small gradient field. These probes harness the ground state Stark localization phase transition which happens at an infinitesimal gradient field as the system size increases. Our results show that while super-Heisenberg precision is always achievable in all ranges of interaction, the long-range interacting Stark probe reveals two distinct behaviors. First, by algebraically increasing the range of interaction, the localization power enhances and thus the sensitivity of the probe decreases. Second, as the interaction range becomes close to a fully connected graph its effective localization power disappears and thus the sensitivity of the probe starts to enhance again. The super-Heisenberg precision is achievable throughout the extended phase until the transition point and remains valid even when the state preparation time is incorporated in the resource analysis. As the probe enters the localized phase, the sensitivity decreases and its performance becomes size-independent, following a universal behavior. In addition, our analysis shows that lower filling factors lead to better precision for measuring weak gradient fields.
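
For reference, "Heisenberg" and "super-Heisenberg" precision are usually defined through the quantum Cramér-Rao bound and the scaling of the quantum Fisher information $F_Q$ with probe size. The notation below is ours and is not taken from the abstract: $M$ is the number of measurement repetitions, $L$ the probe size, and $\beta$ the scaling exponent.

```latex
\delta h \;\ge\; \frac{1}{\sqrt{M\,F_Q(h)}},
\qquad
F_Q(h) \propto L^{\beta},
\qquad
\begin{cases}
\beta = 1 & \text{standard quantum limit}\\
\beta = 2 & \text{Heisenberg limit}\\
\beta > 2 & \text{super-Heisenberg scaling}
\end{cases}
```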

翻訳日:2023-07-11 16:48:33 公開日:2023-07-08

# 可視-赤外ビデオ人物再同定のための敵対的自己攻撃防御と時空間関係マイニング

Adversarial Self-Attack Defense and Spatial-Temporal Relation Mining for Visible-Infrared Video Person Re-Identification ( http://arxiv.org/abs/2307.03903v1 )

ライセンス: Link先を確認

Huafeng Li, Le Xu, Yafei Zhang, Dapeng Tao, Zhengtao Yu

(参考訳) 可視-赤外ビデオ人物再同定（re-ID）では、クロスモーダルな歩行者の同一性照合を解く鍵が2つある。1つは、複雑なシーンの変化（モダリティ、カメラ視点、歩行者の姿勢、背景など）に影響されない特徴を抽出することである。もう1つは、動き情報をマイニングして活用することである。そこで本稿では、新たな視点から可視-赤外ビデオ人物re-ID手法を提案する。その視点とは、敵対的自己攻撃防御と時空間関係マイニングである。本研究では、視点、姿勢、背景の変化およびモダリティ間の差異を、人物の同一性特徴に摂動をもたらす主要因とみなす。学習サンプルに含まれるこうした干渉情報を敵対的摂動として用いる。学習中にre-IDモデルへ敵対的攻撃を行うことで、モデルをこれらの不利な要因に対してより頑健にする。この敵対的摂動による攻撃は、敵対的サンプルを生成することなく、入力サンプルに含まれる干渉情報を活性化することで導入される。そのため、これを敵対的自己攻撃（adversarial self-attack）と呼ぶことができる。この設計により、敵対的攻撃と防御を1つのフレームワークに統合できる。本稿ではさらに、時空間情報に導かれた特徴表現ネットワークを提案し、ビデオシーケンス中の情報を活用する。このネットワークは、ビデオフレーム列に含まれる情報を抽出するだけではない。空間内の局所情報間の関係を利用して、より頑健な特徴を抽出するようネットワークを導く。提案手法は、大規模なクロスモダリティビデオデータセットにおいて優れた性能を示す。提案手法のソースコードは https://github.com/lhf12278/xxx で公開予定である。

In visible-infrared video person re-identification (re-ID), extracting features that are not affected by changes in complex scenes (such as modality, camera views, pedestrian pose, background, etc.), and mining and utilizing motion information, are the keys to solving cross-modal pedestrian identity matching. To this end, the paper proposes a new visible-infrared video person re-ID method from a novel perspective, i.e., adversarial self-attack defense and spatial-temporal relation mining. In this work, the changes of views, posture, background and modal discrepancy are considered as the main factors that cause the perturbations of person identity features. Such interference information contained in the training samples is used as an adversarial perturbation. It performs adversarial attacks on the re-ID model during the training to make the model more robust to these unfavorable factors. The attack from the adversarial perturbation is introduced by activating the interference information contained in the input samples without generating adversarial samples, and it can thus be called adversarial self-attack. This design allows adversarial attack and defense to be integrated into one framework. This paper further proposes a spatial-temporal information-guided feature representation network to use the information in video sequences. The network can not only extract the information contained in the video-frame sequences but also use the relation of the local information in space to guide the network to extract more robust features. The proposed method exhibits compelling performance on large-scale cross-modality video datasets. The source code of the proposed method will be released at https://github.com/lhf12278/xxx.

翻訳日:2023-07-11 16:48:11 公開日:2023-07-08

# クラス構造とクラスタ構造を同時に保存する特徴選択

Feature selection simultaneously preserving both class and cluster structures ( http://arxiv.org/abs/2307.03902v1 )

ライセンス: Link先を確認

Suchismita Das and Nikhil R. Pal

(参考訳) データセットのクラス構造とクラスタ構造に大きな違いがある場合、クラスの識別のみを目的として特徴を選択するとクラスタリング性能が低下する。同様に、クラスタ構造の保存のみを目的とした特徴選択では分類性能が低下する。我々の知る限り、クラス識別とクラスタ構造保存を同時に考慮した特徴選択手法は文献に存在しない。本稿では、このギャップを埋めることを試みた。具体的には、クラス識別と構造保存の両方を統合的に重視する、ニューラルネットワークに基づく特徴選択手法を提案する。典型的な分類問題での評価に加えて、ハイパースペクトル画像におけるバンド選択への有効性も調べた。実験結果に基づき、提案する特徴／バンド選択は、分類とクラスタリングの両方に適した特徴部分集合を選択できると主張できる。

When a data set has significant differences in its class and cluster structure, selecting features aiming only at the discrimination of classes would lead to poor clustering performance, and similarly, feature selection aiming only at preserving cluster structures would lead to poor classification performance. To the best of our knowledge, a feature selection method that simultaneously considers class discrimination and cluster structure preservation is not available in the literature. In this paper, we have tried to bridge this gap by proposing a neural network-based feature selection method that focuses both on class discrimination and structure preservation in an integrated manner. In addition to assessing typical classification problems, we have investigated its effectiveness on band selection in hyperspectral images. Based on the results of the experiments, we may claim that the proposed feature/band selection can select a subset of features that is good for both classification and clustering.

翻訳日:2023-07-11 16:47:45 公開日:2023-07-08

# 物理学における能動学習:入門から進展、そして展望へ

Active Learning in Physics: From 101, to Progress, and Perspective ( http://arxiv.org/abs/2307.03899v1 )

ライセンス: Link先を確認

Yongcheng Ding, Jos\'e D. Mart\'in-Guerrero, Yolanda Vives-Gilabert, Xi Chen

(参考訳) 能動学習（Active Learning, AL）は、機械学習（ML）アルゴリズムの一群であり、現在の人工知能の時代以前から存在する。学習にラベル付きサンプルを必要とする従来のアプローチとは異なり、ALはラベルなしサンプルを反復的に選択し、専門家にアノテーションさせる。このプロトコルは最も情報量の多いサンプルを優先することを目的としており、全てのラベル付きサンプルで学習する場合と比べてモデル性能の向上につながる。近年、ALは特に物理学の分野で注目を集めている。本稿では、ALの理論について包括的で分かりやすい入門を提供し、さまざまな分野における最新の進展をレビューする。さらに、ALと量子MLとの統合の可能性を探る。ALを古典MLの量子領域への単なる拡張とみなすのではなく、これら2つの分野の相乗的な融合を構想する。

Active Learning (AL) is a family of machine learning (ML) algorithms that predates the current era of artificial intelligence. Unlike traditional approaches that require labeled samples for training, AL iteratively selects unlabeled samples to be annotated by an expert. This protocol aims to prioritize the most informative samples, leading to improved model performance compared to training with all labeled samples. In recent years, AL has gained increasing attention, particularly in the field of physics. This paper presents a comprehensive and accessible introduction to the theory of AL, reviewing the latest advancements across various domains. Additionally, we explore the potential integration of AL with quantum ML, envisioning a synergistic fusion of these two fields rather than viewing AL as a mere extension of classical ML into the quantum realm.
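
The query step described above can be illustrated with uncertainty sampling. It is one common selection criterion among those such reviews cover, not the only one. The sketch assumes a classifier that already outputs class probabilities for each unlabeled sample; the probabilities below are hypothetical. A full AL loop would then retrain the model on the newly annotated samples and repeat.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(pool_probs, k):
    """Return indices of the k unlabeled samples with the most uncertain
    predictions (highest entropy); these are the ones sent to the expert."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:k]


# Hypothetical model outputs on an unlabeled pool of four samples (3 classes).
pool = [
    [0.98, 0.01, 0.01],  # confident prediction: little to learn
    [0.34, 0.33, 0.33],  # nearly uniform: most informative
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]
print(select_queries(pool, k=2))
```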

翻訳日:2023-07-11 16:47:34 公開日:2023-07-08

# StyleGAN3:平行移動と回転に対する等変性を向上させる生成ネットワーク

StyleGAN3: Generative Networks for Improving the Equivariance of Translation and Rotation ( http://arxiv.org/abs/2307.03898v1 )

ライセンス: Link先を確認

Tianlei Zhu, Junqi Chen, Renzhe Zhu, Gaurav Gupta

(参考訳) StyleGANは、スタイルによって顔の姿勢やアイデンティティの特徴に影響を与え、ノイズによって髪、しわ、肌の色などの細部に影響を与えることができる。ただし、画像処理の結果はStyleGANのバージョンによって若干異なる。そこで本研究では、StyleGAN2とStyleGAN3の2つの改良版との性能差の比較を主な焦点とする。データセットにはFFHQを用い、モデルの評価指標にはFID、EQ-T、EQ-Rを用いた。最終的に、StyleGAN3は等変性を向上させるうえでより優れた生成ネットワークであることがわかった。我々の知見は、アニメーションや動画の制作に好影響を与えるものである。

StyleGAN can use style to affect facial posture and identity features, and noise to affect hair, wrinkles, skin color and other details. The outcomes of the picture processing vary slightly between different versions of StyleGAN. As a result, the comparison of performance differences between StyleGAN2 and the two modified versions of StyleGAN3 is the main focus of this study. We used the FFHQ dataset and evaluated the models with FID, EQ-T, and EQ-R. In the end, we found that StyleGAN3 is a better generative network for improving equivariance. Our findings have a positive impact on the creation of animation and videos.
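
FID, one of the metrics used above, is the Fréchet distance between two Gaussians fitted to feature embeddings of real and generated images:
$\lVert\mu_1-\mu_2\rVert^2 + \operatorname{Tr}\bigl(\Sigma_1+\Sigma_2-2(\Sigma_1\Sigma_2)^{1/2}\bigr)$.

The sketch below computes only this distance with NumPy. The feature extractor (conventionally an Inception network) is not shown, and random vectors stand in for features. The matrix square root goes through `eigh` on a symmetric matrix, a numerically safe choice here but not necessarily what reference FID implementations do. EQ-T and EQ-R are not sketched.

```python
import numpy as np


def _sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})."""
    diff = mu1 - mu2
    s1_half = _sqrtm_psd(sigma1)
    # Tr((sigma1 sigma2)^{1/2}) equals Tr((s1_half sigma2 s1_half)^{1/2}), and the
    # latter matrix is symmetric PSD, so its eigenvalues are real and >= 0.
    inner = s1_half @ sigma2 @ s1_half
    tr_covmean = np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


# Random vectors stand in for image features (hypothetical data).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
fake = rng.normal(loc=0.5, size=(500, 4))
fid = frechet_distance(real.mean(axis=0), np.cov(real, rowvar=False),
                       fake.mean(axis=0), np.cov(fake, rowvar=False))
print(fid)
```

Lower is better: identical distributions give 0, and a pure mean shift contributes its squared length.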

翻訳日:2023-07-11 16:47:22 公開日:2023-07-08

# イテレーティブ・プロンプティングによる曖昧な質問への回答

Answering Ambiguous Questions via Iterative Prompting ( http://arxiv.org/abs/2307.03897v1 )

ライセンス: Link先を確認

Weiwei Sun and Hengyi Cai and Hongshen Chen and Pengjie Ren and Zhumin Chen and Maarten de Rijke and Zhaochun Ren

(参考訳) オープンドメインの質問応答では、質問の曖昧さのために、もっともらしい回答が複数存在する場合がある。曖昧な質問に適切な回答を提供する1つのアプローチは、有効な回答をすべて直接予測することである。しかし、この方法は関連性と多様性のバランスをとるのに苦労しうる。別のアプローチとして、候補回答を収集して集約する方法がある。しかし、これは計算コストが高く、回答間の依存関係を無視するおそれがある。本稿では、曖昧な質問に回答する既存アプローチの欠点に対処するため、AmbigPromptを提案する。具体的には、回答モデルとプロンプトモデルを反復的に統合する。プロンプトモデルは読解プロセスを適応的に追跡し、回答モデルを段階的に起動する。これにより、互いに異なり、かつ関連性の高い回答を構成させる。さらに、回答モデルとプロンプトモデルの両方に対するタスク固有の事後事前学習（post-pretraining）手法を開発し、フレームワークの性能を大幅に改善する。一般的に用いられる2つのオープンなベンチマークで実証研究を行った。その結果、AmbigPromptは最先端または競争力のある結果を達成し、しかも競合手法よりメモリ使用量が少なく、推論レイテンシも低いことが示された。さらに、AmbigPromptは低リソース設定でも良好に機能する。コードは https://github.com/sunnweiwei/AmbigPrompt で入手できる。

In open-domain question answering, due to the ambiguity of questions, multiple plausible answers may exist. To provide feasible answers to an ambiguous question, one approach is to directly predict all valid answers, but this can struggle with balancing relevance and diversity. An alternative is to gather candidate answers and aggregate them, but this method can be computationally costly and may neglect dependencies among answers. In this paper, we present AmbigPrompt to address the imperfections of existing approaches to answering ambiguous questions. Specifically, we integrate an answering model with a prompting model in an iterative manner. The prompting model adaptively tracks the reading process and progressively triggers the answering model to compose distinct and relevant answers. Additionally, we develop a task-specific post-pretraining approach for both the answering model and the prompting model, which greatly improves the performance of our framework. Empirical studies on two commonly-used open benchmarks show that AmbigPrompt achieves state-of-the-art or competitive results while using less memory and having a lower inference latency than competing approaches. Additionally, AmbigPrompt also performs well in low-resource settings. The code is available at: https://github.com/sunnweiwei/AmbigPrompt.

翻訳日:2023-07-11 16:47:12 公開日:2023-07-08

# 量子スターリング熱エンジンの臨界挙動

The Critical Behavior of Quantum Stirling Heat Engine ( http://arxiv.org/abs/2307.03895v1 )

ライセンス: Link先を確認

Yuan-Sheng Wang, Man-Hong Yung, Dazhi Xu, Maoxin Liu, Xiaosong Chen

(参考訳) 量子ラビモデル（QRM）でモデル化した作業物質（WS）を用いるスターリングサイクルの性能を調べ、臨界性が効率に与える影響を探る。結果から、QRMの臨界性はスターリングサイクルの効率向上に正の効果をもつことが示された。さらに、WSのパラメータが臨界点に近づくにつれて、カルノー効率に漸近的に到達可能であることを観察した。これは、低温熱浴と高温熱浴の温度がともに有限であっても成り立つ。加えて、スターリングサイクルの効率に関する臨界挙動を導出し、WSのパラメータが臨界点に近づくにつれて効率がどのようにカルノー効率に漸近するかを示す。本研究は、臨界性がスターリング熱機関の性能に与える影響についての理解を深めるものである。

We investigate the performance of a Stirling cycle with a working substance (WS) modeled as the quantum Rabi model (QRM), exploring the impact of criticality on its efficiency. Our findings indicate that the criticality of the QRM has a positive effect on improving the efficiency of the Stirling cycle. Furthermore, we observe that the Carnot efficiency is asymptotically achievable as the WS parameter approaches the critical point, even when both the temperatures of the cold and hot reservoirs are finite. Additionally, we derive the critical behavior for the efficiency of the Stirling cycle, demonstrating how the efficiency asymptotically approaches the Carnot efficiency as the WS parameter approaches the critical point. Our work deepens the understanding of the impact of criticality on the performance of a Stirling heat engine.
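
For reference, the bound the abstract refers to is the Carnot efficiency. Any heat engine running between a cold and a hot reservoir satisfies the inequality below, where $W$ is the net work per cycle and $Q_{\mathrm{in}}$ the total heat absorbed per cycle (our notation). The paper's claim is that $\eta \to \eta_{\mathrm{C}}$ as the working-substance parameter approaches its critical value, which we write as $\lambda_c$; the exact approach law is derived in the paper and is not reproduced here.

```latex
\eta \;=\; \frac{W}{Q_{\mathrm{in}}}
\;\le\;
\eta_{\mathrm{C}} \;=\; 1 - \frac{T_{\mathrm{cold}}}{T_{\mathrm{hot}}},
\qquad
\eta \;\longrightarrow\; \eta_{\mathrm{C}} \quad \text{as } \lambda \to \lambda_c .
```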

翻訳日:2023-07-11 16:46:50 公開日:2023-07-08

# コミュニティレコメンデーションのためのメンタルヘルス談話の埋め込み

Embedding Mental Health Discourse for Community Recommendation ( http://arxiv.org/abs/2307.03892v1 )

ライセンス: Link先を確認

Hy Dang, Bang Nguyen, Noah Ziems, Meng Jiang

(参考訳) 本稿では、ソーシャルメディア上のメンタルヘルス支援グループに焦点を当てたコミュニティ推薦システムの開発に向けて、談話埋め込み技術の利用を検討する。ソーシャルメディアプラットフォームは、ユーザが自身の特定の関心に応じたコミュニティと匿名でつながる手段を提供する。しかし、利用可能なオンラインコミュニティは膨大な数にのぼる。そのため、ユーザは自身のメンタルヘルスの悩みに対処するための適切なグループを見つけるのに苦労することがある。この課題に対処するため、埋め込み技術を用いてさまざまなsubredditコミュニティの談話情報を統合し、効果的な推薦システムを開発することを検討する。本アプローチでは、推薦システムの性能を高めるために、コンテンツベースフィルタリングと協調フィルタリングの両手法を用いる。結果から、提案アプローチは各手法を単独で用いる場合を上回る性能を示し、推薦プロセスにおける解釈可能性も提供することがわかった。

Our paper investigates the use of discourse embedding techniques to develop a community recommendation system that focuses on mental health support groups on social media. Social media platforms provide a means for users to anonymously connect with communities that cater to their specific interests. However, with the vast number of online communities available, users may face difficulties in identifying relevant groups to address their mental health concerns. To address this challenge, we explore the integration of discourse information from various subreddit communities using embedding techniques to develop an effective recommendation system. Our approach involves the use of content-based and collaborative filtering techniques to enhance the performance of the recommendation system. Our findings indicate that the proposed approach outperforms the use of each technique separately and provides interpretability in the recommendation process.
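
One simple way to combine the two techniques mentioned above is a weighted blend of two signals. The first is a content-based cosine similarity between a user's discourse embedding and each community's embedding. The second is a collaborative-filtering score. This is a minimal sketch only: the abstract does not say how the paper combines the two. The blend weight `alpha`, the 3-d embeddings, the CF scores, and the community names are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_scores(user_embedding, community_embeddings, cf_scores, alpha=0.5):
    """Blend content-based similarity with a collaborative-filtering score.
    `alpha` (hypothetical) weights the content-based part."""
    return {
        name: alpha * cosine(user_embedding, emb) + (1 - alpha) * cf_scores.get(name, 0.0)
        for name, emb in community_embeddings.items()
    }


# Hypothetical discourse embeddings and CF scores for three communities.
communities = {
    "anxiety_support": [0.9, 0.1, 0.0],
    "depression_support": [0.2, 0.9, 0.1],
    "fitness": [0.0, 0.1, 0.9],
}
cf = {"anxiety_support": 0.6, "depression_support": 0.8, "fitness": 0.1}
user = [0.8, 0.3, 0.0]  # embedding of the user's own posts

scores = hybrid_scores(user, communities, cf)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Keeping the two components as separate, additive terms also gives a crude form of interpretability: each recommendation's score can be split into its content and collaborative parts.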

翻訳日:2023-07-11 16:46:35 公開日:2023-07-08

# 固有値問題の量子技術

Quantum techniques for eigenvalue problems ( http://arxiv.org/abs/2307.03889v1 )

ライセンス: Link先を確認

Dean Lee

(参考訳) 本稿は、量子多体系における固有値問題のための量子アルゴリズムへの簡潔な入門である。トピックを幅広く概観するのではなく、いくつかの量子アルゴリズムの概念的な理解を提供することに焦点を当てる。扱うのは、断熱発展、変分法、位相検出アルゴリズム、およびその他いくつかのアプローチの要点である。各手法について、潜在的な利点と残された課題を論じる。

This article is a brief introduction to quantum algorithms for the eigenvalue problem in quantum many-body systems. Rather than a broad survey of topics, we focus on providing a conceptual understanding of several quantum algorithms that cover the essentials of adiabatic evolution, variational methods, phase detection algorithms, and several other approaches. For each method, we discuss the potential advantages and remaining challenges.
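
The idea behind variational methods such as VQE is the variational principle. Every normalized trial state has an energy at or above the ground-state energy, so minimizing over the state's parameters approaches the ground state from above. The toy below illustrates only this principle. The 2x2 Hamiltonian and the one-parameter trial state are hypothetical. The energy is computed exactly on a classical computer instead of being estimated from measurements on a quantum device, and a grid scan replaces a real optimizer.

```python
import math

# Hypothetical 2x2 real symmetric Hamiltonian H = [[a, b], [b, d]].
a, b, d = 1.0, 0.5, -1.0


def energy(theta):
    """Variational energy <psi|H|psi> of the normalized trial state
    psi(theta) = (cos theta, sin theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return a * c * c + 2 * b * c * s + d * s * s


# Exact ground-state energy of a 2x2 symmetric matrix, for comparison.
exact = (a + d) / 2 - math.sqrt(((a - d) / 2) ** 2 + b ** 2)

# Crude classical "optimizer": scan theta over [0, pi), which covers every
# real normalized state up to a global sign.
thetas = [math.pi * k / 1000 for k in range(1000)]
best = min(energy(t) for t in thetas)
print(exact, best)
```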

翻訳日:2023-07-11 16:46:19 公開日:2023-07-08

# 報酬に基づく再重み付け・再選択・再学習によるプロトタイプ部分ネットワークの改善

Improving Prototypical Part Networks with Reward Reweighing, Reselection, and Retraining ( http://arxiv.org/abs/2307.03887v1 )

ライセンス: Link先を確認

Robin Netzorg, Jiaxun Li, Bin Yu

(参考訳) 近年、画像分類のための深層解釈可能手法の開発が進められている。これらの手法は、モデルの出力をデータの特定の特徴に明確に帰属させる。そのような手法の1つがプロトタイプ部分ネットワーク（ProtoPNet）であり、入力の意味のある部分に基づいて画像を分類しようとする。この手法は解釈可能な分類をもたらす一方で、画像の見せかけの（spurious）部分や一貫性のない部分から分類を学習してしまうことが多い。これを改善するため、我々は人間のフィードバックによる強化学習（RLHF）の近年の発展に着想を得て、これらのプロトタイプを微調整する。まず、CUB-200-2011データセット上でプロトタイプの品質に関する1～5段階の人手アノテーションを収集する。それを用いて、見せかけでない（non-spurious）プロトタイプを識別することを学習する報酬モデルを構築する。完全なRL更新の代わりに、再重み付け・再選択・再学習プロトタイプ部分ネットワーク（R3-ProtoPNet）を提案する。これはProtoPNetの学習ループに3つのステップを追加するものである。最初の2ステップは報酬に基づく再重み付けと再選択であり、プロトタイプを人間のフィードバックに整合させる。最後のステップは再学習であり、モデルの特徴を更新後のプロトタイプに再整合させる。R3-ProtoPNetはプロトタイプ全体の一貫性と意味性を向上させるが、単独で用いるとテスト予測精度が低下することがわかった。一方、複数のR3-ProtoPNetをアンサンブルに組み込むと、解釈可能性を維持しながらテスト予測性能が向上した。

In recent years, work has gone into developing deep interpretable methods for image classification that clearly attribute a model's output to specific features of the data. One such method is the prototypical part network (ProtoPNet), which attempts to classify images based on meaningful parts of the input. While this method results in interpretable classifications, it often learns to classify from spurious or inconsistent parts of the image. Hoping to remedy this, we take inspiration from the recent developments in Reinforcement Learning with Human Feedback (RLHF) to fine-tune these prototypes. By collecting human annotations of prototype quality via a 1-5 scale on the CUB-200-2011 dataset, we construct a reward model that learns to identify non-spurious prototypes. In place of a full RL update, we propose the reweighted, reselected, and retrained prototypical part network (R3-ProtoPNet), which adds three steps to the ProtoPNet training loop. The first two steps are reward-based reweighting and reselection, which align prototypes with human feedback. The final step is retraining to realign the model's features with the updated prototypes. We find that R3-ProtoPNet improves the overall consistency and meaningfulness of the prototypes, but lowers the test predictive accuracy when used independently. When multiple R3-ProtoPNets are incorporated into an ensemble, we find an increase in test predictive performance while maintaining interpretability.

翻訳日:2023-07-11 16:46:13 公開日:2023-07-08

# 高速な経験的シナリオ

Fast Empirical Scenarios ( http://arxiv.org/abs/2307.03927v1 )

ライセンス: Link先を確認

Michael Multerer, Paul Schneider, Rohan Sen

(参考訳) 我々は、大規模かつ高次元のパネルデータから、標本モーメントと整合する少数の代表的なシナリオを抽出することを目指す。2つの新しいアルゴリズムを提案する。1つ目はこれまで観測されていないシナリオを特定するもので、共分散行列のシナリオに基づく表現を伴う。2つ目は、既に実現した世界の状態から、高次の標本モーメント情報と整合する重要なデータ点を選択する。どちらのアルゴリズムも効率的に計算でき、一貫したシナリオベースのモデリングや高次元の数値積分に適している。大規模な数値ベンチマーク研究とポートフォリオ最適化への応用により、提案アルゴリズムの優位性が示された。

We seek to extract a small number of representative scenarios from large and high-dimensional panel data that are consistent with sample moments. Among two novel algorithms, the first identifies scenarios that have not been observed before, and comes with a scenario-based representation of covariance matrices. The second proposal picks important data points from states of the world that have already realized, and are consistent with higher-order sample moment information. Both algorithms are efficient to compute, and lend themselves to consistent scenario-based modeling and high-dimensional numerical integration. Extensive numerical benchmarking studies and an application in portfolio optimization favor the proposed algorithms.
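
To make "scenarios consistent with sample moments" concrete, the sketch below uses a classical construction. It is not either of the paper's algorithms; it is the symmetric sigma-point set known from the unscented transform. From the Cholesky factor $L$ of the sample covariance ($\Sigma = LL^\top$), it builds $2d$ equally weighted, previously unobserved scenarios $\mu \pm \sqrt{d}\,L_{:,i}$. Their weighted mean and covariance reproduce the sample mean and covariance exactly, but unlike the paper's second algorithm they do not match higher-order moments. The panel data are simulated.

```python
import numpy as np


def sigma_point_scenarios(mu, cov):
    """Return 2d equally weighted scenarios mu +/- sqrt(d) * L[:, i], where
    cov = L @ L.T. Their weighted mean and covariance equal mu and cov."""
    d = mu.shape[0]
    L = np.linalg.cholesky(cov)
    offsets = np.sqrt(d) * L.T  # row i is sqrt(d) times column i of L
    scenarios = np.vstack([mu + offsets, mu - offsets])
    weights = np.full(2 * d, 1.0 / (2 * d))
    return scenarios, weights


# Hypothetical panel: 1,000 observations of 3 correlated series.
rng = np.random.default_rng(1)
mixing = np.array([[1.0, 0.3, 0.0],
                   [0.0, 1.0, 0.5],
                   [0.0, 0.0, 1.0]])
data = rng.normal(size=(1000, 3)) @ mixing
mu = data.mean(axis=0)
cov = np.cov(data, rowvar=False)

scenarios, weights = sigma_point_scenarios(mu, cov)
centered = scenarios - mu
print(scenarios.shape)
```

Six scenarios summarize 1,000 observations here while preserving the first two moments. Matching higher-order moments, as the paper's second algorithm does, needs a different construction.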

翻訳日:2023-07-11 16:40:35 公開日:2023-07-08

# Inchworm法によるオープン量子スピン鎖のリアルタイムシミュレーション

Real-Time Simulation of Open Quantum Spin Chains with Inchworm Method ( http://arxiv.org/abs/2307.03924v1 )

ライセンス: Link先を確認

Geshuo Wang, Zhenning Cai

(参考訳) 開放量子系の実時間シミュレーションについて研究する。ここでは系をスピン鎖でモデル化し、各スピンがそれぞれ独自の調和浴（harmonic bath）に結合しているものとする。本手法は、スピン-ボソン模型に対するインチワーム法と、スピン系に対するモジュラー経路積分法を組み合わせる。特に、インチワーム法の導入により数値的な符号問題を大幅に抑制できる。両手法は互いにシームレスに連携するよう調整されている。我々はダイアグラム的手法の言葉で本アプローチを記述し、計算コストの漸近的な振る舞いを解析する。本手法を検証するため、広範な数値実験を行った。

We study the real-time simulation of open quantum systems, where the system is modeled by a spin chain, with each spin associated with its own harmonic bath. Our method couples the inchworm method for the spin-boson model and the modular path integral methodology for spin systems. In particular, the introduction of the inchworm method can significantly suppress the numerical sign problem. Both methods are tweaked to make them work seamlessly with each other. We represent our approach in the language of diagrammatic methods, and analyze the asymptotic behavior of the computational cost. Extensive numerical experiments are done to validate our method.

翻訳日:2023-07-11 16:40:27 公開日:2023-07-08

# 交通密度予測のためのマルチタスク最適化による物理情報ニューラルネットワークの学習

Training Physics-Informed Neural Networks via Multi-Task Optimization for Traffic Density Prediction ( http://arxiv.org/abs/2307.03920v1 )

ライセンス: Link先を確認

Bo Wang and A. K. Qin and Sajjad Shafiei and Hussein Dia and Adriana-Simona Mihaita and Hanna Grzybowska

(参考訳) 物理情報ニューラルネットワーク（Physics-informed neural network、PINN）は、機械学習における新たな研究フロンティアである。PINNは、与えられたデータセットを支配する物理法則（例えば偏微分方程式（PDE）で記述されるもの）を、そのデータセットに基づくニューラルネットワーク（NN）の学習に組み込む。PINNでは、NNがPDEの解の近似器として機能し、PDEはNNの学習を導く事前知識として機能する。そのため、学習データが限られている場合でもNNの望ましい汎化性能が得られる。しかし、PINNの学習は容易ではない。その主な原因は、NNの部分と物理法則の部分の両方から構成される損失関数の複雑さである。本研究では、マルチタスク最適化（MTO）パラダイムに基づく新しいPINN学習フレームワークを提案する。このフレームワークでは、与えられた（主）タスクとともに複数の補助タスクを作成して解く。あるタスクを解いて得られた有用な知識は、適応的に他のタスクへ転移され、その解決を支援する。これにより、主タスクの解決性能を高めることを目指す。提案フレームワークを実装し、交通密度予測問題に取り組むPINNの学習に適用する。実験結果から、提案する学習フレームワークは従来のPINNの学習方法と比較して大幅な性能向上をもたらすことが示された。

Physics-informed neural networks (PINNs) are a newly emerging research frontier in machine learning, which incorporate certain physical laws that govern a given data set, e.g., those described by partial differential equations (PDEs), into the training of the neural network (NN) based on such a data set. In PINNs, the NN acts as the solution approximator for the PDE while the PDE acts as the prior knowledge to guide the NN training, leading to the desired generalization performance of the NN when facing the limited availability of training data. However, training PINNs is a non-trivial task largely due to the complexity of the loss composed of both NN and physical law parts. In this work, we propose a new PINN training framework based on the multi-task optimization (MTO) paradigm. Under this framework, multiple auxiliary tasks are created and solved together with the given (main) task, where the useful knowledge from solving one task is transferred in an adaptive mode to assist in solving some other tasks, aiming to uplift the performance of solving the main task. We implement the proposed framework and apply it to train the PINN for addressing the traffic density prediction problem. Experimental results demonstrate that our proposed training framework leads to significant performance improvement in comparison to the traditional way of training the PINN.
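
The composite loss described above can be made concrete with a worked equation. The abstract does not name the traffic PDE, so we assume the Lighthill-Whitham-Richards (LWR) conservation law with Greenshields' speed-density relation $v(\rho) = v_f(1-\rho/\rho_{\max})$. The symbols are ours:
- $\hat\rho_\theta$ is the network's density prediction.
- $(x_i,t_i,\rho_i)$ are $N_d$ observed densities.
- $(x_j,t_j)$ are $N_r$ collocation points.
- $\lambda$ weights the physics term.

The paper's multi-task scheme changes how this loss is optimized, not its basic data-plus-residual structure.

```latex
\mathcal{L}(\theta)
= \underbrace{\frac{1}{N_d}\sum_{i=1}^{N_d}\bigl(\hat\rho_\theta(x_i,t_i)-\rho_i\bigr)^2}_{\text{data term}}
+ \lambda\,\underbrace{\frac{1}{N_r}\sum_{j=1}^{N_r} r_\theta(x_j,t_j)^2}_{\text{physics term}},
\qquad
r_\theta = \partial_t\hat\rho_\theta
+ \partial_x\!\Bigl[v_f\,\hat\rho_\theta\Bigl(1-\frac{\hat\rho_\theta}{\rho_{\max}}\Bigr)\Bigr]
```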

翻訳日:2023-07-11 16:40:18 公開日:2023-07-08

# VS-TransGRU:エゴセントリックアクション予測のためのビジュアルセマンティックフュージョンにより強化された新しいトランスフォーマーGRUベースのフレームワーク

VS-TransGRU: A Novel Transformer-GRU-based Framework Enhanced by Visual-Semantic Fusion for Egocentric Action Anticipation ( http://arxiv.org/abs/2307.03918v1 )

ライセンス: Link先を確認

Congqi Cao and Ze Sun and Qinyi Lv and Lingtong Min and Yanning Zhang

(参考訳) エゴセントリック行動予測は、一人称視点における現在および過去の観測から将来の行動を事前に予測することを目的とした困難なタスクである。既存手法の多くは、予測性能を高めるため、モデルアーキテクチャや損失関数の改良に注力している。その多くは、視覚入力とリカレントニューラルネットワークに基づくものである。しかし、視覚情報のみを考慮し単一のネットワークアーキテクチャに依存するこれらの手法は、次第に性能の頭打ちに達しつつある。本稿では、視覚・意味融合によって強化されたTransformer-GRUベースの新しい行動予測フレームワークを提案する。その目的は、観測された内容を十分に理解し、現在の観測と将来の行動との依存関係を的確に捉えることである。第一に、行動予測の性能を向上させるため、高レベルの意味情報を初めて導入する。クラスラベルに基づいて、あるいは視覚観測から直接生成した意味特徴を用いて、元の視覚特徴を補強することを提案する。第二に、効果的な視覚・意味融合モジュールを提案する。これにより意味的ギャップを埋め、異なるモダリティの相補性を十分に活用する。第三に、並列モデルと自己回帰モデルの両方の利点を活かすため、2つの構成要素を設計する。長期の系列モデリングのためのTransformerベースのエンコーダと、柔軟な反復デコーディングのためのGRUベースのデコーダである。2つの大規模な一人称視点データセット、すなわちEPIC-KitchensとEGTEA Gaze+で広範な実験を行い、提案手法の有効性を検証した。本手法は従来のアプローチを大きく上回り、新たな最先端性能を達成した。

Egocentric action anticipation is a challenging task that aims to predict future actions in advance from current and historical observations in the first-person view. Most existing methods focus on improving the model architecture and loss function based on the visual input and recurrent neural networks to boost anticipation performance. However, these methods, which consider only visual information and rely on a single network architecture, gradually reach a performance plateau. To fully understand what has been observed and to capture the dependencies between current observations and future actions well enough, we propose a novel Transformer-GRU-based action anticipation framework enhanced by visual-semantic fusion. First, high-level semantic information is introduced to improve the performance of action anticipation for the first time. We propose to use semantic features, generated either from the class labels or directly from the visual observations, to augment the original visual features. Second, an effective visual-semantic fusion module is proposed to make up for the semantic gap and fully utilize the complementarity of different modalities. Third, to take advantage of both parallel and autoregressive models, we design a Transformer-based encoder for long-term sequential modeling and a GRU-based decoder for flexible iterative decoding. Extensive experiments on two large-scale first-person view datasets, EPIC-Kitchens and EGTEA Gaze+, validate the effectiveness of the proposed method, which achieves new state-of-the-art performance and outperforms previous approaches by a large margin.

Translated: 2023-07-11 16:39:52 Published: 2023-07-08

# On decoder-only architecture for speech-to-text and large language model integration

On decoder-only architecture for speech-to-text and large language model integration ( http://arxiv.org/abs/2307.03917v1 )

License: see link

Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu

Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language. However, the seamless integration of speech signals into LLMs has not been explored well. The "decoder-only" architecture has also not been well studied for speech processing tasks. In this research, we introduce Speech-LLaMA, a novel approach that effectively incorporates acoustic information into text-based large language models. Our method leverages Connectionist Temporal Classification and a simple audio encoder to map the compressed acoustic features to the continuous semantic space of the LLM. In addition, we further probe the decoder-only architecture for speech-to-text tasks by training a smaller scale randomly initialized speech-LLaMA model from speech-text paired data alone. We conduct experiments on multilingual speech-to-text translation tasks and demonstrate a significant improvement over strong baselines, highlighting the potential advantages of decoder-only models for speech-to-text conversion.
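
The abstract says CTC is used to compress the acoustic features before they are mapped into the LLM's semantic space. One common way to do CTC-based compression, assumed here rather than taken from the paper, works on the per-frame greedy CTC labels: frames predicted as blank are dropped, and each run of consecutive frames sharing the same non-blank label is averaged into a single vector. The function name `ctc_compress` and the blank id 0 are hypothetical.

```python
from typing import List, Sequence

BLANK = 0  # assumed id of the CTC blank symbol


def ctc_compress(
    frames: Sequence[Sequence[float]], labels: Sequence[int], blank: int = BLANK
) -> List[List[float]]:
    """Drop blank frames and average each run of identical non-blank labels.

    frames: per-frame acoustic feature vectors (all of the same length)
    labels: per-frame greedy (argmax) CTC predictions, one per frame
    """
    if len(frames) != len(labels):
        raise ValueError("need exactly one label per frame")
    out: List[List[float]] = []
    run: List[Sequence[float]] = []
    prev = None
    for vec, lab in zip(frames, labels):
        if lab != prev and run:
            # The label changed, so emit the mean of the finished run.
            out.append([sum(col) / len(run) for col in zip(*run)])
            run = []
        if lab != blank:
            run.append(vec)
        prev = lab
    if run:
        out.append([sum(col) / len(run) for col in zip(*run)])
    return out
```

As in CTC decoding, a blank between two identical labels separates them, so `[5, 5, 0, 5]` yields two vectors, not one. The output sequence is usually much shorter than the frame sequence, which is the point of compressing before the LLM.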

Translated: 2023-07-11 16:39:24 Published: 2023-07-08

# Phased Geometric Controls of V-Shaped Three-Level System for Zero-field Quantum Sensing

Phased Geometric Controls of V-Shaped Three-Level System for Zero-field Quantum Sensing ( http://arxiv.org/abs/2307.03916v1 )

License: see link

Zhijie Li, Xiangyu Ye, Xi Kong, Tianyu Xie, Zhiping Yang, Pengju Zhao, Ya Wang, Fazhan Shi, Jiangfeng Du

Here we propose and demonstrate a phased geometric control protocol for zero-field double quantum gates in a V-shaped three-level spin system. This method utilizes linearly polarized microwave pulses and exploits the geometric qubit properties to prevent state leakage. By employing specific phased geometric controls, we realize a low-power multi-pulse zero-field sensing technique using single nitrogen-vacancy centers in diamond. Our method offers a novel approach to implement precise double quantum gate operations with an adaptable driving power, making it a valuable tool for zero-field spin-based quantum technology.

Translated: 2023-07-11 16:39:06 Published: 2023-07-08

# Applying human-centered AI in developing effective human-AI teaming: A perspective of human-AI joint cognitive systems

Applying human-centered AI in developing effective human-AI teaming: A perspective of human-AI joint cognitive systems ( http://arxiv.org/abs/2307.03913v1 )

License: see link

Wei Xu, Zaifeng Gao

Research and application have used human-AI teaming (HAT) as a new paradigm to develop AI systems. HAT recognizes that AI will function as a teammate instead of simply a tool in collaboration with humans. Effective human-AI teams need to be capable of taking advantage of the unique abilities of both humans and AI while overcoming the known challenges and limitations of each member, augmenting human capabilities, and raising joint performance beyond that of either entity. The National AI Research and Strategic Plan 2023 update has recognized that research programs focusing primarily on the independent performance of AI systems generally fail to consider the functionality that AI must provide within the context of dynamic, adaptive, and collaborative teams, and it calls for further research on human-AI teaming and collaboration. However, there has been debate about whether AI can work as a teammate with humans. The primary concern is that adopting the "teaming" paradigm contradicts the human-centered AI (HCAI) approach, resulting in humans losing control of AI systems. This article further analyzes the HAT paradigm and the debates. Specifically, we elaborate on our proposed conceptual framework of human-AI joint cognitive systems (HAIJCS) and apply it to represent HAT under the HCAI umbrella. We believe that HAIJCS may help adopt HAT while enabling HCAI. The implications and future work for HAIJCS are also discussed. Insights: (1) AI has led to the emergence of a new form of human-machine relationship, human-AI teaming (HAT), a paradigmatic shift in human-AI systems; (2) we must follow a human-centered AI (HCAI) approach when applying HAT as a new design paradigm; (3) we propose a conceptual framework of human-AI joint cognitive systems (HAIJCS) to represent and implement HAT for developing effective human-AI teaming.

Translated: 2023-07-11 16:38:57 Published: 2023-07-08

# A Survey of Spiking Neural Network Accelerator on FPGA

A Survey of Spiking Neural Network Accelerator on FPGA ( http://arxiv.org/abs/2307.03910v1 )

License: see link

Murat Isik

Because FPGAs can implement customized topologies, they are increasingly used to deploy spiking neural networks (SNNs) in both embedded and high-performance applications. In this paper, we survey state-of-the-art SNN implementations and their applications on FPGA. We collect recent widely used spiking neuron models, network structures, and signal encoding formats, followed by an enumeration of related hardware design schemes for FPGA-based SNN implementations. Compared with previous surveys, this manuscript also enumerates application instances that applied these technical schemes in recent research. Based on that, we discuss the actual acceleration potential of implementing SNNs on FPGA. Building on this discussion, we outline upcoming trends and give guidelines for further advancement in related subjects.
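
As a reference point for the neuron models the survey collects, here is a minimal discrete-time leaky integrate-and-fire (LIF) neuron, one of the most widely used spiking models. The decay factor, threshold, and hard reset to zero are illustrative choices, not parameters taken from the survey, and `lif_run` is a hypothetical name.

```python
from typing import List, Sequence


def lif_run(
    inputs: Sequence[float], decay: float = 0.9, threshold: float = 1.0
) -> List[int]:
    """Simulate a discrete-time LIF neuron and return its spike train (0/1 per step).

    v[t] = decay * v[t-1] + I[t]; emit a spike and reset v to 0 when v >= threshold.
    """
    v = 0.0
    spikes: List[int] = []
    for current in inputs:
        v = decay * v + current  # leaky integration of the input current
        if v >= threshold:
            spikes.append(1)
            v = 0.0  # hard reset after a spike
        else:
            spikes.append(0)
    return spikes
```

Each time step costs one multiply-add and one compare, which is part of why this kind of model is a common starting point for hardware designs. A constant input of 0.05 never fires, since the membrane potential settles at 0.05 / (1 - 0.9) = 0.5, below the threshold.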

Translated: 2023-07-11 16:38:06 Published: 2023-07-08

# Incorporating Deep Q-Network with Multiclass Classification Algorithms

Incorporating Deep Q -- Network with Multiclass Classification Algorithms ( http://arxiv.org/abs/2307.03908v1 )

License: see link

Noopur Zambare, Ravindranath Sawane

In this study, we explore how Deep Q-Network (DQN) might improve the functionality of multiclass classification algorithms. We will use a benchmark dataset from Kaggle to create a framework incorporating DQN with existing supervised multiclass classification algorithms. The findings of this study will bring insight into how deep reinforcement learning strategies may be used to increase multiclass classification accuracy. Multiclass classification has been applied in a number of fields, including image recognition, natural language processing, and bioinformatics. This study focuses on predicting financial distress in companies, in addition to the wider application of Deep Q-Network in multiclass classification. Identifying businesses that are likely to experience financial distress is a crucial task in finance and risk management. A business is said to be in financial distress when it has serious difficulty keeping its operations going and meeting its financial obligations. This commonly happens when a company suffers a sharp and sustained decline in profitability, cash flow problems, or an unsustainable level of debt.
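
The abstract does not spell out how classification is cast as reinforcement learning. A common formulation, assumed here, treats each training sample as a one-step episode: the state is the sample, the action is a predicted class, and the reward is +1 for the correct label and -1 otherwise. Because every episode terminates immediately, the Q-learning target is just the reward. The sketch uses a lookup table over discretized states in place of a deep Q-network; `train_q_classifier`, `predict`, and all hyperparameters are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def train_q_classifier(
    data: Sequence[Tuple[Hashable, int]],
    n_classes: int,
    episodes: int = 5000,
    alpha: float = 0.1,
    epsilon: float = 0.2,
    seed: int = 0,
) -> Dict[Hashable, List[float]]:
    """Tabular Q-learning where each (state, label) sample is a one-step episode."""
    rng = random.Random(seed)
    q: Dict[Hashable, List[float]] = defaultdict(lambda: [0.0] * n_classes)
    for _ in range(episodes):
        state, label = rng.choice(data)
        values = q[state]
        if rng.random() < epsilon:
            action = rng.randrange(n_classes)  # explore a random class
        else:
            action = max(range(n_classes), key=values.__getitem__)  # exploit
        reward = 1.0 if action == label else -1.0
        # Terminal episode: the target is the reward, with no bootstrapped term.
        values[action] += alpha * (reward - values[action])
    return q


def predict(q: Dict[Hashable, List[float]], state: Hashable) -> int:
    """Greedy class choice: the action with the highest Q-value."""
    values = q[state]
    return max(range(len(values)), key=values.__getitem__)
```

A DQN variant would replace the table with a network trained on the same reward signal; reward shaping (for example, larger rewards for minority classes) is one way such setups are adapted to imbalanced data like financial-distress labels.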

Translated: 2023-07-11 16:37:48 Published: 2023-07-08

# ScriptWorld: Text Based Environment For Learning Procedural Knowledge

ScriptWorld: Text Based Environment For Learning Procedural Knowledge ( http://arxiv.org/abs/2307.03906v1 )

License: see link

Abhinav Joshi and Areeb Ahmad and Umang Pandey and Ashutosh Modi

Text-based games provide a framework for developing natural language understanding and commonsense knowledge about the world in reinforcement learning based agents. Existing text-based environments often rely on fictional situations and characters to create a gaming framework and are far from real-world scenarios. In this paper, we introduce ScriptWorld: a text-based environment for teaching agents about real-world daily chores and hence imparting commonsense knowledge. To the best of our knowledge, it is the first interactive text-based gaming framework that consists of daily real-world human activities designed using a scripts dataset. We provide gaming environments for 10 daily activities and perform a detailed analysis of the proposed environment. We develop RL-based baseline models/agents to play the games in ScriptWorld. To understand the role of language models in such environments, we leverage features obtained from pre-trained language models in the RL agents. Our experiments show that prior knowledge obtained from a pre-trained language model helps to solve real-world text-based gaming environments. We release the environment via GitHub: https://github.com/Exploration-Lab/ScriptWorld
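
To illustrate the kind of interaction described above, here is a toy text environment in the spirit of ScriptWorld: at each step the agent sees what has been done so far and must pick the correct next step of a daily-chore script from shuffled text options. Everything here is hypothetical (the class name, the +1/-1 rewards, one distractor per step, ending the episode on a wrong choice) and does not reflect the actual API in the linked repository.

```python
import random
from typing import List, Optional, Tuple

Observation = Tuple[str, List[str]]  # (description so far, candidate next steps)


class ToyScriptEnv:
    """Agent must choose the next step of a script among shuffled text options."""

    def __init__(self, script: List[str], distractors: List[str], seed: int = 0):
        self.script = script
        self.distractors = distractors
        self.rng = random.Random(seed)
        self.pos = 0

    def _observation(self) -> Observation:
        done_so_far = "; ".join(self.script[: self.pos]) or "(nothing yet)"
        options = [self.script[self.pos], self.rng.choice(self.distractors)]
        self.rng.shuffle(options)
        return f"Done so far: {done_so_far}", options

    def reset(self) -> Observation:
        self.pos = 0
        return self._observation()

    def step(self, choice: str) -> Tuple[Optional[Observation], float, bool]:
        """Return (next observation or None, reward, done)."""
        if choice != self.script[self.pos]:
            return None, -1.0, True  # a wrong step ends the episode
        self.pos += 1
        if self.pos == len(self.script):
            return None, 1.0, True  # the whole script was completed
        return self._observation(), 1.0, False
```

An RL agent would score each option text (for instance with features from a pre-trained language model, as the abstract describes) and learn from the rewards which option to pick.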

Translated: 2023-07-11 16:37:08 Published: 2023-07-08

# Enhanced Strong Coupling between Spin Ensemble and non-Hermitian Topological Edge States

Enhanced Strong Coupling between Spin Ensemble and non-Hermitian Topological Edge States ( http://arxiv.org/abs/2307.03944v1 )

License: see link

Jie Qian, Jie Li, Shi-Yao Zhu, J. Q. You, and Yi-Pu Wang

Light-matter interaction is crucial both to understanding fundamental phenomena and to developing versatile applications. Strong coupling, robustness, and controllability are the three most important aspects in realizing light-matter interactions. Topological and non-Hermitian photonics have provided frameworks for robustness and extensive control freedom, respectively. How to engineer the properties of edge states, such as the photonic density of states and scattering parameters, through non-Hermitian engineering while preserving topological protection has not been fully studied. Here we construct a parity-time-symmetric dimerized photonic lattice and generate complex-valued edge states via spontaneous PT-symmetry breaking. The enhanced strong coupling between the topological photonic edge mode and the magnon mode in a ferromagnetic spin ensemble is demonstrated. Our research reveals subtle non-Hermitian topological edge states and provides strategies for realizing and engineering topological light-matter interactions.
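
Spontaneous PT-symmetry breaking, the mechanism named above, already appears in the smallest PT-symmetric system: two modes with coupling kappa and balanced gain and loss gamma, H = [[omega + i*gamma, kappa], [kappa, omega - i*gamma]]. Its characteristic equation (lambda - omega)^2 + gamma^2 - kappa^2 = 0 gives eigenvalues omega ± sqrt(kappa^2 - gamma^2), which are real for gamma < kappa and become a complex-conjugate pair for gamma > kappa. This textbook dimer only illustrates the mechanism and is not the paper's dimerized lattice; the helper names are hypothetical.

```python
import cmath
from typing import Tuple


def pt_dimer_eigenvalues(omega: float, kappa: float, gamma: float) -> Tuple[complex, complex]:
    """Eigenvalues of [[omega + i*gamma, kappa], [kappa, omega - i*gamma]]."""
    root = cmath.sqrt(kappa**2 - gamma**2)  # real below the exceptional point, imaginary above
    return omega + root, omega - root


def pt_phase(kappa: float, gamma: float) -> str:
    """Classify the spectrum: 'unbroken' (real), 'exceptional point', or 'broken' (complex)."""
    if abs(gamma) < abs(kappa):
        return "unbroken"
    if abs(gamma) == abs(kappa):
        return "exceptional point"
    return "broken"
```

In the broken phase the two eigenmodes acquire opposite imaginary parts (one amplified, one damped), which is the kind of complex-valued mode the abstract exploits for edge states.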

Translated: 2023-07-11 16:28:56 Published: 2023-07-08

# Camouflaged Object Detection with Feature Grafting and Distractor Aware

Camouflaged Object Detection with Feature Grafting and Distractor Aware ( http://arxiv.org/abs/2307.03943v1 )

License: see link

Yuxuan Song and Xinyue Li and Lin Qi

The task of Camouflaged Object Detection (COD) is to accurately segment camouflaged objects that blend into their environment, which is more challenging than ordinary detection because the textures of the target and the background are visually indistinguishable. In this paper, we propose a novel Feature Grafting and Distractor Aware network (FDNet) to handle the COD task. Specifically, we use a CNN and a Transformer to encode multi-scale images in parallel. To better exploit the advantages of the two encoders, we design a cross-attention-based Feature Grafting Module that grafts features extracted from the Transformer branch into the CNN branch, after which the features are aggregated in the Feature Fusion Module. A Distractor Aware Module is designed to explicitly model the two possible distractors in the COD task and refine the coarse camouflage map. We also propose ACOD2K, the largest artificial camouflaged object dataset, which contains 2000 annotated images. We conducted extensive experiments on four widely used benchmark datasets and the ACOD2K dataset. The results show that our method significantly outperforms other state-of-the-art methods. The code and ACOD2K will be available at https://github.com/syxvision/FDNet.
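
The Feature Grafting Module is described as cross-attention from the CNN branch to the Transformer branch. The basic operation, single-head scaled dot-product cross-attention softmax(Q K^T / sqrt(d)) V with queries from one branch and keys and values from the other, can be sketched as follows. Learned projections, multiple heads, multi-scale handling, and FDNet's actual design are omitted; the residual `graft` step and all names are assumptions for illustration.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d)) V with Q from one branch and K, V from the other."""
    d = len(queries[0])
    out: Matrix = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out


def graft(cnn_feats: Matrix, transformer_feats: Matrix) -> Matrix:
    """Hypothetical graft: add attended Transformer features to the CNN features."""
    attended = cross_attention(cnn_feats, transformer_feats, transformer_feats)
    return [[c + a for c, a in zip(cr, ar)] for cr, ar in zip(cnn_feats, attended)]
```

Each output row is a convex combination of the Transformer-branch rows, weighted by how well they match the corresponding CNN-branch query, so global context can flow into local CNN features.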

Translated: 2023-07-11 16:28:41 Published: 2023-07-08

# Ariadne's Thread: Using Text Prompts to Improve Segmentation of Infected Areas from Chest X-ray images

Ariadne's Thread:Using Text Prompts to Improve Segmentation of Infected Areas from Chest X-ray images ( http://arxiv.org/abs/2307.03942v1 )

License: see link

Yi Zhong, Mengqiu Xu, Kongming Liang, Kaixin Chen, and Ming Wu

Segmentation of the infected areas of the lung is essential for quantifying the severity of lung diseases such as pulmonary infections. Existing medical image segmentation methods are almost all uni-modal, image-based methods. However, these image-only methods tend to produce inaccurate results unless trained with large amounts of annotated data. To overcome this challenge, we propose a language-driven segmentation method that uses text prompts to improve the segmentation result. Experiments on the QaTa-COV19 dataset indicate that our method improves the Dice score by at least 6.09% compared with the uni-modal methods. In addition, our extended study reveals the flexibility of multi-modal methods in terms of the information granularity of text and demonstrates that multi-modal methods have a significant advantage over image-only methods in terms of the size of training data required.
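
The reported gain is measured with the Dice score. For reference, the Dice coefficient between a predicted binary mask P and the ground truth G is 2|P ∩ G| / (|P| + |G|), ranging from 0 (no overlap) to 1 (identical masks). A minimal version over flattened 0/1 masks follows; scoring two empty masks as a perfect match is a convention chosen here, not something the abstract specifies.

```python
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|) for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, g in zip(pred, target) if p and g)
    total = sum(1 for p in pred if p) + sum(1 for g in target if g)
    return 1.0 if total == 0 else 2.0 * inter / total
```

For example, masks `[1, 1, 0, 0]` and `[1, 0, 1, 0]` share one foreground pixel out of four foreground pixels in total, giving a Dice score of 0.5.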

Translated: 2023-07-11 16:28:19 Published: 2023-07-08

# Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions

Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions ( http://arxiv.org/abs/2307.03941v1 )

License: see link

Dawen Zhang, Pamela Finckenberg-Broman, Thong Hoang, Shidong Pan, Zhenchang Xing, Mark Staples, Xiwei Xu

The Right to be Forgotten (RTBF) was first established as the result of the ruling in Google Spain SL, Google Inc. v AEPD, Mario Costeja González, and was later included as the Right to Erasure under the General Data Protection Regulation (GDPR) of the European Union, allowing individuals to request that organizations delete their personal data. Specifically for search engines, individuals can send requests to organizations to exclude their information from query results. With the recent development of Large Language Models (LLMs) and their use in chatbots, LLM-enabled software systems have become popular. However, they are not exempt from the RTBF. Compared with the indexing approach used by search engines, LLMs store and process information in a completely different way. This poses new challenges for compliance with the RTBF. In this paper, we explore these challenges and provide our insights on how to implement technical solutions for the RTBF, including the use of machine unlearning, model editing, and prompt engineering.

Translated: 2023-07-11 16:28:04 Published: 2023-07-08

# Inductive Meta-path Learning for Schema-complex Heterogeneous Information Networks

Inductive Meta-path Learning for Schema-complex Heterogeneous Information Networks ( http://arxiv.org/abs/2307.03937v1 )

License: see link

Shixuan Liu, Changjun Fan, Kewei Cheng, Yunfei Wang, Peng Cui, Yizhou Sun, Zhong Liu

Heterogeneous Information Networks (HINs) are information networks with multiple types of nodes and edges. The concept of meta-path, i.e., a sequence of entity types and relation types connecting two entities, was proposed to provide meta-level explainable semantics for various HIN tasks. Traditionally, meta-paths are primarily used for schema-simple HINs, e.g., bibliographic networks with only a few entity types, where meta-paths are often enumerated with domain knowledge. However, the adoption of meta-paths for schema-complex HINs, such as knowledge bases (KBs) with hundreds of entity and relation types, has been limited due to the computational complexity associated with meta-path enumeration. Additionally, effectively assessing meta-paths requires enumerating relevant path instances, which adds further complexity to the meta-path learning process. To address these challenges, we propose SchemaWalk, an inductive meta-path learning framework for schema-complex HINs. We represent meta-paths with schema-level representations to support learning the scores of meta-paths for varying relations, mitigating the need for exhaustive path instance enumeration for each relation. Further, we design a reinforcement-learning-based path-finding agent, which directly navigates the network schema (i.e., schema graph) to learn policies for establishing meta-paths with high coverage and confidence for multiple relations. Extensive experiments on real data sets demonstrate the effectiveness of our proposed paradigm.
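
To see why enumeration is costly on schema-complex HINs, here is a naive depth-first enumerator of meta-paths (alternating entity types and relation types) between two entity types on a schema graph, up to a maximum number of relations. The number of candidates grows roughly as the schema's branching factor raised to the path length, which is the cost SchemaWalk is designed to avoid. This brute-force baseline is illustrative only and is not the paper's method; the toy bibliographic schema is hypothetical.

```python
from typing import Dict, List, Tuple

# Schema graph: entity type -> list of (relation type, target entity type).
Schema = Dict[str, List[Tuple[str, str]]]

TOY_SCHEMA: Schema = {
    "Author": [("writes", "Paper")],
    "Paper": [("written_by", "Author"), ("published_in", "Venue")],
    "Venue": [("publishes", "Paper")],
}


def enumerate_meta_paths(schema: Schema, src: str, dst: str, max_len: int) -> List[List[str]]:
    """All meta-paths [type, rel, type, ..., type] from src to dst with at most
    max_len relations, found by brute-force depth-first search over the schema."""
    results: List[List[str]] = []

    def walk(path: List[str], hops: int) -> None:
        if hops > 0 and path[-1] == dst:
            results.append(list(path))
        if hops == max_len:
            return
        for rel, nxt in schema.get(path[-1], []):
            walk(path + [rel, nxt], hops + 1)

    walk([src], 0)
    return results
```

With `max_len=2` the only Author-to-Author meta-path is Author-writes-Paper-written_by-Author (co-authorship); raising it to 4 adds longer paths such as Author-Paper-Venue-Paper-Author, and on a knowledge base with hundreds of relation types the candidate set explodes far faster.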

Translated: 2023-07-11 16:27:44 Published: 2023-07-08

# Towards Efficient In-memory Computing Hardware for Quantized Neural Networks: State-of-the-art, Open Challenges and Perspectives

Towards Efficient In-memory Computing Hardware for Quantized Neural Networks: State-of-the-art, Open Challenges and Perspectives ( http://arxiv.org/abs/2307.03936v1 )

License: see link

Olga Krestinskaya, Li Zhang, Khaled Nabil Salama

The amount of data processed in the cloud, the development of Internet-of-Things (IoT) applications, and growing data privacy concerns force the transition from cloud-based to edge-based processing. Limited energy and computational resources at the edge push the transition from traditional von Neumann architectures to In-memory Computing (IMC), especially for machine learning and neural network applications. Network compression techniques are applied to implement neural networks on limited hardware resources. Quantization is one of the most efficient network compression techniques, reducing memory footprint, latency, and energy consumption. This paper provides a comprehensive review of IMC-based Quantized Neural Networks (QNN) and links software-based quantization approaches to IMC hardware implementation. Moreover, open challenges, QNN design requirements, recommendations, and perspectives, along with an IMC-based QNN hardware roadmap, are provided.
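
Quantization can be illustrated with symmetric uniform quantization: weights are mapped to signed b-bit integers in [-(2^(b-1) - 1), 2^(b-1) - 1] using a single per-tensor scale, and are approximately recovered by multiplying back. This is one common scheme among the many the review covers; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize_symmetric(weights: Sequence[float], bits: int = 4) -> Tuple[List[int], float]:
    """Map floats to signed integers in [-(2^(b-1) - 1), 2^(b-1) - 1] with one scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # round() returns an int; clamp guards against rounding past the range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Approximate reconstruction of the original floats."""
    return [v * scale for v in q]
```

With 4 bits, `[0.7, -0.3, 0.0, 0.12]` becomes `[7, -3, 0, 1]` with scale 0.1, and every reconstructed value is within half a step of the original. Low-bit integers like these are what make it practical to store weights directly in memory arrays in IMC hardware.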

Translated: 2023-07-11 16:27:20 Published: 2023-07-08

# Edge-Aware Mirror Network for Camouflaged Object Detection

Edge-Aware Mirror Network for Camouflaged Object Detection ( http://arxiv.org/abs/2307.03932v1 )

License: see link

Dongyue Sun, Shiyao Jiang, Lin Qi

Existing edge-aware camouflaged object detection (COD) methods normally output the edge prediction at an early stage. However, edges are important and fundamental factors in the subsequent segmentation task. Due to the high visual similarity between camouflaged targets and their surroundings, an edge prior predicted at an early stage often introduces erroneous foreground-background cues and contaminates the features used for segmentation. To tackle this problem, we propose a novel Edge-aware Mirror Network (EAMNet), which models edge detection and camouflaged object segmentation as a cross-refinement process. More specifically, EAMNet has a two-branch architecture, in which a segmentation-induced edge aggregation module and an edge-induced integrity aggregation module are designed to cross-guide the segmentation branch and the edge detection branch. A guided-residual channel attention module, which leverages residual connections and gated convolution, then better extracts structural details from low-level features. Quantitative and qualitative experimental results show that EAMNet outperforms existing cutting-edge baselines on three widely used COD datasets. Code is available at https://github.com/sdy1999/EAMNet.

Translated: 2023-07-11 16:27:03 Published: 2023-07-08

# Rosko: Row Skipping Outer Products for Sparse Matrix Multiplication Kernels

Rosko: Row Skipping Outer Products for Sparse Matrix Multiplication Kernels ( http://arxiv.org/abs/2307.03930v1 )

License: see link

Vikas Natesh, Andrew Sabot, H.T. Kung, Mark Ting

We propose Rosko (row skipping outer products) for deriving sparse matrix multiplication (SpMM) kernels that reduce the computation and memory access requirements of deep neural networks (DNNs). Rosko allows entire row computations to be skipped during program execution with low sparsity-management overheads. We analytically derive sparse CPU kernels that adapt to given hardware characteristics to effectively utilize processor cores and minimize data movement without the need for auto-tuning or search space exploration. Rosko can be integrated with other outer product scheduling methods, allowing them to leverage row skipping by using Rosko's packing format to skip unnecessary computation. Rosko kernels outperform existing auto-tuning and search-based solutions, as well as state-of-the-art vendor-optimized libraries, on real hardware across a variety of neural network workloads. For matrices with sparsities ranging from 65% to 99.8%, as typically found in machine learning, Rosko kernels achieve up to a 6.5x runtime reduction on Intel and ARM CPUs.
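
The outer-product formulation computes C = A B as a sum over k of the outer product of column k of A with row k of B. If each column of A is packed as its nonzeros only (a rough analogue of a packed format; Rosko's actual packing format, tiling, and CPU-specific tuning are not shown), every zero A[i][k] skips the whole update of row i of C for that k instead of multiplying by zero. `pack_columns` and `outer_product_spmm` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def pack_columns(a: Sequence[Sequence[float]]) -> List[List[Tuple[int, float]]]:
    """For each column k of A, keep only the (row index, value) pairs of nonzeros."""
    n_rows, n_cols = len(a), len(a[0])
    return [[(i, a[i][k]) for i in range(n_rows) if a[i][k] != 0] for k in range(n_cols)]


def outer_product_spmm(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """C = sum_k A[:, k] (outer) B[k, :], skipping row i of C whenever A[i][k] == 0."""
    n_rows, n_out = len(a), len(b[0])
    c: Matrix = [[0.0] * n_out for _ in range(n_rows)]
    for k, column in enumerate(pack_columns(a)):
        b_row = b[k]
        for i, a_ik in column:  # zero entries of A never reach this loop
            c_row = c[i]
            for j in range(n_out):
                c_row[j] += a_ik * b_row[j]
    return c
```

The work done is proportional to the number of nonzeros in A times the width of B, so at the 65% to 99.8% sparsities mentioned above most row updates are never executed.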

Translated: 2023-07-11 16:26:42 Published: 2023-07-08

# Fairness-Aware Graph Neural Networks: A Survey

Fairness-Aware Graph Neural Networks: A Survey ( http://arxiv.org/abs/2307.03929v1 )

License: see link

April Chen, Ryan A. Rossi, Namyong Park, Puja Trivedi, Yu Wang, Tong Yu, Sungchul Kim, Franck Dernoncourt, Nesreen K. Ahmed

Graph Neural Networks (GNNs) have become increasingly important due to their representational power and state-of-the-art predictive performance on many fundamental learning tasks. Despite this success, GNNs suffer from fairness issues that arise from the underlying graph data and from the fundamental aggregation mechanism at the heart of the large class of GNN models. In this article, we examine and categorize techniques for improving the fairness of GNNs. Previous work on fair GNN models and techniques is discussed in terms of whether it focuses on improving fairness in a preprocessing step, during training, or in a post-processing phase. Furthermore, we discuss how such techniques can be combined when appropriate, and highlight their advantages and intuition. We also introduce an intuitive taxonomy of fairness evaluation metrics, including graph-level, neighborhood-level, embedding-level, and prediction-level fairness metrics. In addition, graph datasets that are useful for benchmarking the fairness of GNN models are summarized succinctly. Finally, we highlight key open problems and challenges that remain to be addressed.
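
As a concrete example of prediction-level fairness metrics, two widely used ones are statistical parity difference, |P(y_hat = 1 | s = 0) - P(y_hat = 1 | s = 1)|, and equal opportunity difference, the same gap restricted to nodes whose true label is 1. A minimal version over per-node binary predictions and a binary sensitive attribute follows; the abstract does not say exactly which metrics the taxonomy lists, and the function names are hypothetical.

```python
from typing import Sequence


def _positive_rate(preds: Sequence[int], mask: Sequence[bool]) -> float:
    """Fraction of positive predictions among the nodes selected by mask."""
    selected = [p for p, m in zip(preds, mask) if m]
    if not selected:
        raise ValueError("empty group")
    return sum(selected) / len(selected)


def statistical_parity_diff(preds: Sequence[int], sensitive: Sequence[int]) -> float:
    """|P(y_hat=1 | s=0) - P(y_hat=1 | s=1)|."""
    r0 = _positive_rate(preds, [s == 0 for s in sensitive])
    r1 = _positive_rate(preds, [s == 1 for s in sensitive])
    return abs(r0 - r1)


def equal_opportunity_diff(
    preds: Sequence[int], labels: Sequence[int], sensitive: Sequence[int]
) -> float:
    """The same gap, computed only over nodes whose true label is 1."""
    r0 = _positive_rate(preds, [s == 0 and y == 1 for s, y in zip(sensitive, labels)])
    r1 = _positive_rate(preds, [s == 1 and y == 1 for s, y in zip(sensitive, labels)])
    return abs(r0 - r1)
```

Both metrics are 0 for a perfectly fair classifier under their respective definitions; they only look at predictions, which is why they sit at the prediction level rather than the graph, neighborhood, or embedding level.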

Translated: 2023-07-11 16:26:25 Published: 2023-07-08

# Bounding data reconstruction attacks with the hypothesis testing interpretation of differential privacy

Bounding data reconstruction attacks with the hypothesis testing interpretation of differential privacy ( http://arxiv.org/abs/2307.03928v1 )

License: see link

Georgios Kaissis, Jamie Hayes, Alexander Ziller, Daniel Rueckert

We explore Reconstruction Robustness (ReRo), which was recently proposed as an upper bound on the success of data reconstruction attacks against machine learning models. Previous research has demonstrated that differential privacy (DP) mechanisms also provide ReRo, but so far, only asymptotic Monte Carlo estimates of a tight ReRo bound have been shown. Directly computable ReRo bounds for general DP mechanisms are thus desirable. In this work, we establish a connection between hypothesis testing DP and ReRo and derive closed-form, analytic or numerical ReRo bounds for the Laplace and Gaussian mechanisms and their subsampled variants.
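
As background on the hypothesis-testing view of DP: under mu-Gaussian differential privacy (mu-GDP, part of the f-DP framework), the smallest achievable type II error at type I error alpha is the trade-off function G_mu(alpha) = Phi(Phi^-1(1 - alpha) - mu), where Phi is the standard normal CDF, and the Gaussian mechanism with L2 sensitivity Delta and noise standard deviation sigma is (Delta / sigma)-GDP. The sketch only computes this curve and the corresponding best-case power 1 - G_mu(alpha); it does not reproduce the paper's ReRo bounds, and the function names are hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def gdp_mu(sensitivity: float, sigma: float) -> float:
    """The Gaussian mechanism with L2 sensitivity Delta and noise std sigma is (Delta/sigma)-GDP."""
    return sensitivity / sigma


def gdp_tradeoff(alpha: float, mu: float) -> float:
    """Smallest type II error at type I error alpha under mu-GDP: Phi(Phi^-1(1 - alpha) - mu)."""
    if alpha <= 0.0:
        return 1.0
    if alpha >= 1.0:
        return 0.0
    return _STD.cdf(_STD.inv_cdf(1.0 - alpha) - mu)


def max_power(alpha: float, mu: float) -> float:
    """Largest probability that an adversary's test correctly rejects H0 at level alpha."""
    return 1.0 - gdp_tradeoff(alpha, mu)
```

At mu = 0 the curve is beta = 1 - alpha (no test beats random guessing), and the achievable power grows with mu, i.e., as the noise sigma shrinks relative to the sensitivity.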

Translated: 2023-07-11 16:26:06 Published: 2023-07-08

# A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation

A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation ( http://arxiv.org/abs/2307.03987v1 )

License: see link

Neeraj Varshney, Wenlin Yao, Hongming Zhang, Jianshu Chen, and Dong Yu

Recently developed large language models have achieved remarkable success in generating fluent and coherent text. However, these models often tend to 'hallucinate', which critically hampers their reliability. In this work, we address this crucial problem and propose an approach that actively detects and mitigates hallucinations during the generation process. Specifically, we first identify candidates for potential hallucination by leveraging the model's logit output values, check their correctness through a validation procedure, mitigate the detected hallucinations, and then continue with the generation process. Through extensive experiments with the 'article generation task', we first demonstrate the individual efficacy of our detection and mitigation techniques. Specifically, the detection technique achieves a recall of 88% and the mitigation technique successfully mitigates 57.6% of the correctly detected hallucinations. Importantly, our mitigation technique does not introduce new hallucinations even in the case of incorrectly detected hallucinations, i.e., false positives. Then, we show that the proposed active detection and mitigation approach successfully reduces the hallucinations of the GPT-3 model from 47.5% to 14.5% on average. In summary, our work contributes to improving the reliability and trustworthiness of large language models, a crucial step en route to enabling their widespread adoption in real-world applications.
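The candidate-identification step can be pictured as follows. The sketch assumes the generated sentence has already been split into key concepts and that each generated token's probability is available from the logits. Scoring a concept by its least confident token and the 0.5 threshold are illustrative assumptions. The paper's concept extraction, validation, and mitigation steps are not shown.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Concept:
    """A key phrase from a generated sentence plus the probability of each of its tokens."""
    text: str
    token_probs: Sequence[float]  # softmax probabilities read off the model's logits


def concept_confidence(concept: Concept) -> float:
    """Score a concept by its least confident token (minimum token probability)."""
    return min(concept.token_probs)


def hallucination_candidates(concepts: List[Concept], threshold: float) -> List[str]:
    """Concepts scoring below the threshold become candidates for the validation step."""
    return [c.text for c in concepts if concept_confidence(c) < threshold]


if __name__ == "__main__":
    # Made-up concepts and probabilities, for illustration only.
    concepts = [
        Concept("the first release", [0.97, 0.93, 0.95]),
        Concept("in March 2011", [0.88, 0.35, 0.81]),
        Concept("by the core team", [0.92, 0.89, 0.94, 0.90]),
    ]
    print(hallucination_candidates(concepts, threshold=0.5))
```

Only the flagged concepts would be passed to the (more expensive) validation step. This is what keeps the overhead of active detection limited to low-confidence parts of the output.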

翻訳日:2023-07-11 16:20:34 公開日:2023-07-08

# TractGeoNet: 言語評価成績を予測するための、線維束微細構造のポイントワイズ解析のための幾何学的ディープラーニングフレームワーク

TractGeoNet: A geometric deep learning framework for pointwise analysis of tract microstructure to predict language assessment performance ( http://arxiv.org/abs/2307.03982v1 )

ライセンス: Link先を確認

Yuqian Chen, Leo R. Zekelman, Chaoyi Zhang, Tengfei Xue, Yang Song, Nikos Makris, Yogesh Rathi, Alexandra J. Golby, Weidong Cai, Fan Zhang, Lauren J. O'Donnell

(参考訳) 拡散磁気共鳴画像(dMRI)トラクトグラフィーと、関連するポイントワイズな組織微細構造の計測値を用いて回帰を行うための、幾何学的深層学習ベースのフレームワークであるTractGeoNetを提案する。ポイントクラウド表現を用いることで、TractGeoNetは線維束内のすべての点から、ポイントワイズな組織微細構造と位置情報を直接利用できる。回帰性能を向上させるため、回帰ラベルスコアの絶対値だけでなく、スコア間の相対的な差を正確に予測することにモデルを集中させる新しい損失関数、Paired-Siamese Regression損失を提案する。さらに、回帰タスクにおいて予測力の高い解剖学的領域を白質線維束内で特定するための、Critical Region Localizationアルゴリズムを提案する。Human Connectome Projectの被験者806名から得られた20本の連合白質線維束のデータセットを用いて、言語に関する2つの神経心理学的評価における個人の成績を予測することで、提案手法の有効性を評価する。その結果、TractGeoNetは一般的な複数の回帰モデルと比較して優れた予測性能を示した。検討した20本の線維束のうち、左弓状束が2つの言語成績評価に対して最も高い予測力を持つことがわかった。特定された重要領域は両半球およびすべての脳葉に広く分布しており、上側頭・前側頭領域、弁蓋部、中心前回など、言語機能に重要とされる脳領域を含む。全体として、TractGeoNetは、脳の白質線維束の研究を強化し、その構造を言語成績などのヒトの特性と関連付けるための、幾何学的深層学習の可能性を示している。

We propose a geometric deep-learning-based framework, TractGeoNet, for performing regression using diffusion magnetic resonance imaging (dMRI) tractography and associated pointwise tissue microstructure measurements. By employing a point cloud representation, TractGeoNet can directly utilize pointwise tissue microstructure and positional information from all points within a fiber tract. To improve regression performance, we propose a novel loss function, the Paired-Siamese Regression loss, which encourages the model to focus on accurately predicting the relative differences between regression label scores rather than just their absolute values. In addition, we propose a Critical Region Localization algorithm to identify highly predictive anatomical regions within the white matter fiber tracts for the regression task. We evaluate the effectiveness of the proposed method by predicting individual performance on two neuropsychological assessments of language using a dataset of 20 association white matter fiber tracts from 806 subjects from the Human Connectome Project. The results demonstrate superior prediction performance of TractGeoNet compared to several popular regression models. Of the twenty tracts studied, we find that the left arcuate fasciculus tract is the most highly predictive of the two studied language performance assessments. The localized critical regions are widespread and distributed across both hemispheres and all cerebral lobes, including areas of the brain considered important for language function such as superior and anterior temporal regions, pars opercularis, and precentral gyrus. Overall, TractGeoNet demonstrates the potential of geometric deep learning to enhance the study of the brain's white matter fiber tracts and to relate their structure to human traits such as language performance.
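A pairwise term emphasizes relative differences. The sketch below shows why by comparing ordinary MSE with a squared error on pairwise differences. Predictions shifted by a constant are penalized by MSE but not by the pairwise term. The exact form and weighting of the Paired-Siamese Regression loss in TractGeoNet are not given in the abstract, so `paired_regression_loss` and `pair_weight` are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Ordinary mean squared error on absolute values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pairwise_difference_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error on pairwise differences: ((p_i - p_j) - (t_i - t_j))^2 over all i < j."""
    pairs = list(combinations(range(len(pred)), 2))
    if not pairs:
        raise ValueError("need at least two samples to form a pair")
    return sum(((pred[i] - pred[j]) - (target[i] - target[j])) ** 2 for i, j in pairs) / len(pairs)


def paired_regression_loss(pred, target, pair_weight: float = 1.0) -> float:
    """Hypothetical combination: absolute MSE plus a weighted pairwise-difference term."""
    return mse(pred, target) + pair_weight * pairwise_difference_loss(pred, target)


if __name__ == "__main__":
    target = [1.0, 2.0, 3.0]
    shifted = [2.0, 3.0, 4.0]  # right ordering and spacing, wrong offset
    print(mse(shifted, target), pairwise_difference_loss(shifted, target))
```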

翻訳日:2023-07-11 16:20:16 公開日:2023-07-08

# EffUNetとトランスファー学習アプローチを用いた建物と道路のセグメンテーション

Building and Road Segmentation Using EffUNet and Transfer Learning Approach ( http://arxiv.org/abs/2307.03980v1 )

ライセンス: Link先を確認

Sahil Gangurde

(参考訳) 都市計画には、上水道、鉄道、送電線、建物、道路などの都市物に関する情報が必要である。特に、政策立案者が影響力のある決定を下すためには、これらのオブジェクトの分布、位置、容量に関する情報が必要である。本論文は、衛星やUAVによって撮影された航空画像から建物と道路をセグメンテーションすることを目的とする。セマンティックセグメンテーションタスクのために多くの異なるアーキテクチャが提案されており、UNetはその1つである。本論文では、Googleが新たに提案したEfficientNetV2を特徴抽出のエンコーダとし、UNetデコーダでセグメンテーションマップを構築する新しいアーキテクチャを提案する。この手法を用いて、Massachusetts Building and Roadデータセットにおいて、それぞれ0.8365および0.9153のmIoUというベンチマークスコアを達成した。

In cities, information about urban objects such as water supply, railway lines, power lines, buildings, and roads is necessary for city planning. In particular, policymakers need information about the spread, location, and capacity of these objects to make impactful decisions. This thesis aims to segment buildings and roads from aerial images captured by satellites and UAVs. Many different architectures have been proposed for the semantic segmentation task, UNet being one of them. In this thesis, we propose a novel architecture that uses Google's newly proposed EfficientNetV2 as an encoder for feature extraction and a UNet decoder for constructing the segmentation map. Using this approach, we achieved benchmark mIoU scores of 0.8365 and 0.9153 on the Massachusetts Building and Road datasets, respectively.
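The reported scores are mean intersection-over-union (mIoU). Below is a minimal sketch of the metric on flattened label masks. Whether the paper averages over both background and foreground classes is an assumption.

```python
from typing import Sequence


def iou(pred: Sequence[int], truth: Sequence[int], cls: int) -> float:
    """Intersection over union for one class over flattened label masks (NaN if the class is absent)."""
    inter = sum(1 for p, t in zip(pred, truth) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, truth) if p == cls or t == cls)
    return inter / union if union else float("nan")


def mean_iou(pred: Sequence[int], truth: Sequence[int], classes: Sequence[int]) -> float:
    """Average the per-class IoU (mIoU), skipping classes absent from both masks."""
    scores = [iou(pred, truth, c) for c in classes]
    valid = [s for s in scores if s == s]  # NaN != NaN, so this drops absent classes
    return sum(valid) / len(valid)


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 1, 0]  # 1 = building (or road), 0 = background
    pred = [0, 1, 1, 1, 0, 0]
    print(mean_iou(pred, truth, [0, 1]))
```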

翻訳日:2023-07-11 16:19:44 公開日:2023-07-08

# Autonomy 2.0: 規模の経済の探求

Autonomy 2.0: The Quest for Economies of Scale ( http://arxiv.org/abs/2307.03973v1 )

ライセンス: Link先を確認

Shuang Wu, Bo Yu, Shaoshan Liu, Yuhao Zhu

(参考訳) 過去10年間のロボティクスとAI技術の進歩により、私たちは今や自律機械の時代に入った。この情報技術の新たな時代においては、人間ではなく、サービスロボット、自律ドローン、配達ロボット、自動運転車といった自律機械がサービスを提供するようになる。本稿では、デジタル経済の技術的課題と経済的影響を検討することにより、スケーラビリティは技術的観点から極めて必要であると同時に経済的観点からも大きな利点があり、したがって自律性産業がその潜在能力を最大限に発揮するための鍵であると論じる。しかしながら、Autonomy 1.0と呼ばれる現在の開発パラダイムは、データ量や計算資源ではなくエンジニアの人数に応じてスケールするため、自律性産業が規模の経済、特に指数関数的に低下する計算コストと利用可能なデータの爆発的増加から十分な恩恵を受けることを妨げている。さらに、スケーラビリティを阻む主要な要因を分析し、Autonomy 2.0と呼ばれる新たな開発パラダイムがこれらの問題にどのように対処し、自律性産業を大きく押し上げられるかを説明する。

With the advancement of robotics and AI technologies in the past decade, we have now entered the age of autonomous machines. In this new age of information technology, autonomous machines, such as service robots, autonomous drones, delivery robots, and autonomous vehicles, rather than humans, will provide services. In this article, through examining the technical challenges and economic impact of the digital economy, we argue that scalability is both highly necessary from a technical perspective and significantly advantageous from an economic perspective, and is thus the key for the autonomy industry to achieve its full potential. Nonetheless, the current development paradigm, dubbed Autonomy 1.0, scales with the number of engineers instead of with the amount of data or compute resources, hence preventing the autonomy industry from fully benefiting from the economies of scale, especially the exponentially cheapening compute cost and the explosion of available data. We further analyze the key scalability blockers and explain how a new development paradigm, dubbed Autonomy 2.0, can address these problems to greatly boost the autonomy industry.

翻訳日:2023-07-11 16:19:31 公開日:2023-07-08

# 中国語文法誤り訂正課題における大規模言語モデルの能力評価

Evaluating the Capability of Large-scale Language Models on Chinese Grammatical Error Correction Task ( http://arxiv.org/abs/2307.03972v1 )

ライセンス: Link先を確認

Fanyi Qu and Yunfang Wu

(参考訳) 大規模言語モデル(LLM)は、様々な自然言語処理(NLP)タスクにおいて顕著な能力を示し、近年大きな注目を集めている。しかし、英語の文法誤り訂正(GEC)タスクでは、LLMが最先端モデルを上回る有望な結果を達成できないことを示す研究もある。本報告では、中国語の文法誤り訂正タスクにおいて大規模言語モデルがどのような性能を示すかを調査し、今後の研究の指針を提供することを目指す。モデル規模の異なる3つのLLMを用いて、4つの中国語GECデータセットで実験を行った。実験結果から、自動評価指標におけるLLMの性能は、過剰訂正の問題のために従来のSOTAモデルに及ばないことが示された。さらに、異なるデータ分布で評価した場合、LLMの性能に顕著な変動があることも明らかになった。以上の結果は、中国語GECタスクへのLLMの適用にはさらなる調査が必要であることを示している。

Large-scale language models (LLMs) have shown remarkable capability in a variety of Natural Language Processing (NLP) tasks and have attracted a lot of attention recently. However, some studies indicate that large language models fail to achieve promising results beyond the state-of-the-art models in English grammatical error correction (GEC) tasks. In this report, we aim to explore how large language models perform on Chinese grammatical error correction tasks and to provide guidance for future work. We conduct experiments with 3 LLMs of different model scales on 4 Chinese GEC datasets. Our experimental results indicate that the performance of LLMs on automatic evaluation metrics falls short of the previous SOTA models because of the problem of over-correction. Furthermore, we also discover notable variations in the performance of LLMs when evaluated on different data distributions. Our findings demonstrate that further investigation is required for the application of LLMs to the Chinese GEC task.
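The abstract does not name its metrics. GEC systems, however, are commonly scored with precision-weighted F0.5 over edits, which makes over-correction costly. Extra, unnecessary edits add false positives and lower precision faster than the added recall can compensate. The edit counts below are made up for illustration, and the handling of empty denominators is an assumption.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta over edits; beta = 0.5 weights precision more heavily than recall."""
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    conservative = f_beta(tp=6, fp=2, fn=6)      # fewer edits, mostly correct
    over_correcting = f_beta(tp=8, fp=12, fn=4)  # higher recall, many unnecessary edits
    print(round(conservative, 3), round(over_correcting, 3))
```

The over-correcting system finds more of the gold edits, yet scores lower under F0.5.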

翻訳日:2023-07-11 16:19:10 公開日:2023-07-08

# エンド・ツー・エンドの教師ありマルチラベルコントラスト学習

End-to-End Supervised Multilabel Contrastive Learning ( http://arxiv.org/abs/2307.03967v1 )

ライセンス: Link先を確認

Ahmad Sajedi and Samir Khaki and Konstantinos N. Plataniotis and Mahdi S. Hosseini

(参考訳) マルチラベル表現学習は、オブジェクトカテゴリ間のラベル依存性や、正例/負例サンプルの固有の不均衡といったデータ関連の問題と関連付けられる難しい問題として認識されている。最近の進歩は、モデル中心とデータ中心の観点からこれらの課題に対処している。モデル中心の手法では、ラベル相関は外部のモデル設計(例えばグラフCNN)によって得られ、訓練のための帰納バイアスとして組み込まれる。しかし、これらはエンドツーエンドの訓練フレームワークを設計できておらず、計算の複雑さが高くなる。逆にデータ中心の手法では、データセットの現実的な性質を考慮して分類を改善するが、ラベル依存性は無視される。本稿では、モデル中心設計とデータ中心設計の両方の欠点に対処するために、KMCL(Kernel-based Multilabel Contrastive Learning)と呼ばれる新たなエンドツーエンドの訓練フレームワークを提案する。KMCLはまず、埋め込み特徴をガウスRKHSにおける指数カーネルの混合に変換する。続いて、(a) カーネル表現を再構成するための再構成損失、(b) 固有の不均衡問題に対処するための非対称分類損失、(c) ラベル相関を捉えるための対照損失、からなる目的損失を符号化する。KMCLは、計算量を低く抑えつつ、特徴エンコーダの不確実性をモデル化する。画像分類タスクにおいて大規模な実験を行い、SOTA手法に対するKMCLの一貫した改善を示す。PyTorchによる実装は https://github.com/mahdihosseini/KMCL で公開されている。

Multilabel representation learning is recognized as a challenging problem that can be associated with either label dependencies between object categories or data-related issues such as the inherent imbalance of positive/negative samples. Recent advances address these challenges from model- and data-centric viewpoints. In model-centric approaches, the label correlation is obtained by external model designs (e.g., graph CNN) to incorporate an inductive bias for training. However, they fail to design an end-to-end training framework, leading to high computational complexity. On the contrary, in data-centric approaches, the realistic nature of the dataset is considered for improving the classification while ignoring the label dependencies. In this paper, we propose a new end-to-end training framework, dubbed KMCL (Kernel-based Multilabel Contrastive Learning), to address the shortcomings of both model- and data-centric designs. KMCL first transforms the embedded features into a mixture of exponential kernels in Gaussian RKHS. It then encodes an objective loss that is comprised of (a) a reconstruction loss to reconstruct the kernel representation, (b) an asymmetric classification loss to address the inherent imbalance problem, and (c) a contrastive loss to capture label correlation. KMCL models the uncertainty of the feature encoder while maintaining a low computational footprint. Extensive experiments are conducted on image classification tasks to showcase the consistent improvements of KMCL over the SOTA methods. A PyTorch implementation is available at https://github.com/mahdihosseini/KMCL.
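The abstract does not specify the asymmetric classification loss. One widely used option for multilabel imbalance is the asymmetric loss (ASL) of Ridnik et al. (2021). ASL down-weights easy negatives more aggressively than positives (gamma_neg > gamma_pos) and discards very easy negatives entirely by shifting their probability with a margin. The sketch below assumes that form. Whether KMCL uses it, and with which hyperparameters, is an assumption.

```python
from math import log
from typing import Sequence


def asymmetric_loss(probs: Sequence[float], labels: Sequence[int],
                    gamma_pos: float = 0.0, gamma_neg: float = 4.0,
                    margin: float = 0.05, eps: float = 1e-8) -> float:
    """ASL-style multilabel loss summed over labels; probs are per-label sigmoid outputs."""
    total = 0.0
    for p, y in zip(probs, labels):
        if y == 1:
            # Positive label: focal-style weighting (1 - p)^gamma_pos on the log-likelihood.
            total -= (1.0 - p) ** gamma_pos * log(max(p, eps))
        else:
            # Negative label: shift the probability down by the margin so very easy negatives cost 0.
            p_m = max(p - margin, 0.0)
            total -= p_m ** gamma_neg * log(max(1.0 - p_m, eps))
    return total


if __name__ == "__main__":
    print(asymmetric_loss([0.03, 0.9, 0.6], [0, 1, 0]))
```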

翻訳日:2023-07-11 16:18:57 公開日:2023-07-08

# 例示プログラミング(PBE)システムにおけるユーザ提供アノテーションのマルチインテント検出

Multi-Intent Detection in User Provided Annotations for Programming by Examples Systems ( http://arxiv.org/abs/2307.03966v1 )

ライセンス: Link先を確認

Nischal Ashok Kumar, Nitin Gupta, Shanmukha Guttula, Hima Patel

(参考訳) エンタープライズアプリケーションのマッピングにおいて、データマッピングは統合開発の基本的な部分であり続けているが、時間がかかる。命名規則を欠くアプリケーションが増えており、ネストされたフィールド構造は統合開発者にとってさらなる複雑さをもたらす。マッピングが完了すると、各アプリケーションが特定の形式のデータを想定しているため、データ変換がユーザにとって次の課題となる。また、統合フローを構築する際、開発者はソースとターゲットのデータフィールドの形式を理解し、データをソース形式からターゲット形式に変換できる変換プログラムを考え出す必要がある。何らかの仕様からプログラム合成のパラダイムによって変換プログラムを自動生成する問題は、人工知能(AI)の初期から研究されてきた。Programming by Example (PBE) はそのような手法の一つであり、ユーザが提供する入力と出力のサンプルから、形式変換や文字列変換のタスクを達成するコンピュータプログラムを自動的に推論することを目指す。正しい意図を学習するには、ユーザから多様なサンプルを提供してもらう必要がある。しかし、ユーザが多様なサンプルを提供できない可能性もある。これにより、入力と出力のサンプルに複数の意図や曖昧さが生じうる。その結果、PBEシステムは正しい意図のプログラムを生成する際に混乱しうる。本稿では、入力と出力の文字列を解析し、複数の意図の原因となる様々な性質にマッピングする、深層ニューラルネットワークに基づく曖昧さ予測モデルを提案する。ユーザはこれらの性質を分析し、それに応じて新しいサンプルを提供したり既存のサンプルを修正したりできる。これは、エンタープライズアプリケーションのマッピングのためのより良いPBEシステムの構築に役立つ。

In mapping enterprise applications, data mapping remains a fundamental part of integration development, but it is time-consuming. An increasing number of applications lack naming standards, and nested field structures further add complexity for integration developers. Once the mapping is done, data transformation is the next challenge for users, since each application expects data to be in a certain format. Also, while building an integration flow, developers need to understand the formats of the source and target data fields and come up with a transformation program that can convert data from the source format to the target format. The problem of automatically generating a transformation program from some specifications through the program synthesis paradigm has been studied since the early days of Artificial Intelligence (AI). Programming by Example (PBE) is one such technique: it targets the automatic inference of a computer program that accomplishes a format or string conversion task from user-provided input and output samples. To learn the correct intent, a diverse set of samples from the user is required. However, the user may fail to provide a diverse set of samples. This can lead to multiple intents or ambiguity in the input and output samples. Hence, PBE systems can get confused when generating the correct intent program. In this paper, we propose a deep neural network based ambiguity prediction model, which analyzes the input-output strings and maps them to a different set of properties responsible for multiple intent. Users can analyze these properties and accordingly provide new samples or modify existing samples, which can help in building a better PBE system for mapping enterprise applications.
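The ambiguity problem can be illustrated with a tiny enumerative example. With a single sample, two different candidate programs may both reproduce the output yet disagree on new inputs, and one more sample removes the ambiguity. The candidate programs and probe inputs here are hypothetical. This brute-force check is not the paper's neural ambiguity predictor.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Program = Callable[[str], str]
Example = Tuple[str, str]

# Hypothetical candidate programs a PBE synthesizer might enumerate.
CANDIDATES: Dict[str, Program] = {
    "first_token": lambda s: s.split()[0],
    "first_5_chars": lambda s: s[:5],
    "upper_first_token": lambda s: s.split()[0].upper(),
}


def consistent(examples: Sequence[Example], candidates: Dict[str, Program]) -> List[str]:
    """Names of candidate programs that reproduce every input -> output example."""
    return [name for name, prog in candidates.items() if all(prog(i) == o for i, o in examples)]


def is_ambiguous(examples: Sequence[Example], candidates: Dict[str, Program],
                 probe_inputs: Sequence[str]) -> bool:
    """Ambiguous if two consistent programs disagree on some unseen probe input."""
    names = consistent(examples, candidates)
    return any(len({candidates[n](p) for n in names}) > 1 for p in probe_inputs)


if __name__ == "__main__":
    one = [("Alice Smith", "Alice")]
    print(consistent(one, CANDIDATES), is_ambiguous(one, CANDIDATES, ["Bob Jones"]))
    two = one + [("Bob Jones", "Bob")]
    print(consistent(two, CANDIDATES), is_ambiguous(two, CANDIDATES, ["Carol White"]))
```

Here the single sample "Alice Smith" -> "Alice" cannot distinguish "first word" from "first five characters". Adding "Bob Jones" -> "Bob" is exactly the kind of extra sample the paper's predicted properties are meant to prompt.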

翻訳日:2023-07-11 16:18:37 公開日:2023-07-08

# ChatGPTは人格認識に優れているか? 予備的研究

Is ChatGPT a Good Personality Recognizer? A Preliminary Study ( http://arxiv.org/abs/2307.03952v1 )

ライセンス: Link先を確認

Yu Ji, Wen Wu, Hong Zheng, Yi Hu, Xi Chen, Liang He

(参考訳) 近年、パーソナリティは、感情分析や製品推薦といった多くのタスクに組み込まれる、価値ある個人的要因とみなされている。これにより、与えられたテキストに基づいて個人のパーソナリティを識別することを目的とした、テキストベースのパーソナリティ認識タスクが広く注目されている。近年ChatGPTが様々な自然言語処理タスクにおいて顕著な能力を発揮していることを踏まえ、効果的なパーソナリティデータを生成するために、テキストベースのパーソナリティ認識タスクにおけるChatGPTの予備的評価を行う。具体的には、様々なプロンプト戦略を用いて、与えられたテキストからパーソナリティを認識するChatGPTの能力を探る。特に、指定したレベルで与えられたテキストを分析するようChatGPTを導くために設計した、レベル指向のプロンプト戦略に注目する。2つの代表的な実世界データセットにおいて、ChatGPTの性能を、従来のニューラルネットワーク、ファインチューニングしたRoBERTa、および対応する最先端のタスク固有モデルと比較する。実験の結果、ゼロショットChain-of-ThoughtプロンプトによるChatGPTは印象的なパーソナリティ認識能力を示すことがわかった。ゼロショットChain-of-Thoughtプロンプトを用いると、ChatGPTは2つのデータセットでファインチューニングしたRoBERTaを上回り、テキストに基づく論理的推論を通じて自然言語による説明を提供することができる。さらに、ゼロショットChain-of-Thoughtプロンプトと比べて、ゼロショットのレベル指向Chain-of-ThoughtプロンプトはChatGPTのパーソナリティ予測能力を高め、ChatGPTと対応する最先端のタスク固有モデルとの性能差を縮める。また、パーソナリティを識別する際のChatGPTの公平性を観察する実験も行い、ChatGPTが性別や年齢などの一部のセンシティブな人口統計学的属性に対して不公平さを示すことを発見した。

In recent years, personality has been regarded as a valuable personal factor being incorporated into numerous tasks such as sentiment analysis and product recommendation. This has led to widespread attention to the text-based personality recognition task, which aims to identify an individual's personality based on given text. Considering that ChatGPT has recently exhibited remarkable abilities on various natural language processing tasks, we provide a preliminary evaluation of ChatGPT on the text-based personality recognition task for generating effective personality data. Concretely, we employ a variety of prompting strategies to explore ChatGPT's ability to recognize personality from given text, especially the level-oriented prompting strategy we designed for guiding ChatGPT in analyzing given text at a specified level. We compare the performance of ChatGPT on two representative real-world datasets with a traditional neural network, fine-tuned RoBERTa, and the corresponding state-of-the-art task-specific model. The experimental results show that ChatGPT with zero-shot chain-of-thought prompting exhibits impressive personality recognition ability. Triggered by zero-shot chain-of-thought prompting, ChatGPT outperforms fine-tuned RoBERTa on the two datasets and is capable of providing natural language explanations through text-based logical reasoning. Furthermore, relative to zero-shot chain-of-thought prompting, zero-shot level-oriented chain-of-thought prompting enhances the personality prediction ability of ChatGPT and reduces the performance gap between ChatGPT and the corresponding state-of-the-art task-specific model. Besides, we also conduct experiments to observe the fairness of ChatGPT when identifying personality and discover that ChatGPT shows unfairness toward some sensitive demographic attributes such as gender and age.

翻訳日:2023-07-11 16:18:14 公開日:2023-07-08

# レーン間の読書: 道路上のテキストビデオQA

Reading Between the Lanes: Text VideoQA on the Road ( http://arxiv.org/abs/2307.03948v1 )

ライセンス: Link先を確認

George Tom, Minesh Mathew, Sergi Garcia, Dimosthenis Karatzas and C.V. Jawahar

(参考訳) 道路周辺のテキストや標識はドライバーに重要な情報を提供し、安全な走行と状況認識に不可欠である。移動中のシーンテキスト認識は難しい問題であり、テキストの手がかりは通常短時間しか現れず、遠距離での早期検出が必要となる。このような情報を利用してドライバーを支援するシステムは、ビデオストリームから視覚的およびテキスト的な手がかりを抽出して取り込むだけでなく、時間にわたって推論する必要もある。この問題に対処するため、ドライバー支援の文脈におけるビデオ質問応答(VideoQA)タスクのための新しいデータセット、RoadTextVQAを紹介する。RoadTextVQAは、複数の国で収集された3,222本の運転動画からなり、10,500個の質問がアノテーションされており、それらはすべて運転動画に含まれるテキストまたは道路標識に基づいている。RoadTextVQAデータセット上で最先端のビデオ質問応答モデルの性能を評価し、この領域における大きな改善の余地と、車載支援システムおよびテキストを考慮したマルチモーダル質問応答の研究を進める上でのデータセットの有用性を明らかにする。データセットはhttp://cvit.iiit.ac.in/research/projects/cvit-projects/roadtextvqaで入手できる。

Text and signs around roads provide crucial information for drivers, vital for safe navigation and situational awareness. Scene text recognition in motion is a challenging problem, as textual cues typically appear for a short time span and early detection at a distance is necessary. Systems that exploit such information to assist the driver should not only extract and incorporate visual and textual cues from the video stream but also reason over time. To address this issue, we introduce RoadTextVQA, a new dataset for the task of video question answering (VideoQA) in the context of driver assistance. RoadTextVQA consists of 3,222 driving videos collected from multiple countries, annotated with 10,500 questions, all based on text or road signs present in the driving videos. We assess the performance of state-of-the-art video question answering models on our RoadTextVQA dataset, highlighting the significant potential for improvement in this domain and the usefulness of the dataset in advancing research on in-vehicle support systems and text-aware multimodal question answering. The dataset is available at http://cvit.iiit.ac.in/research/projects/cvit-projects/roadtextvqa

翻訳日:2023-07-11 16:17:37 公開日:2023-07-08

# 機械学習を用いたパッシブ光ネットワークの故障モニタリング

Fault Monitoring in Passive Optical Networks using Machine Learning Techniques ( http://arxiv.org/abs/2307.03945v1 )

ライセンス: Link先を確認

Khouloud Abdelli, Carsten Tropschug, Helmut Griesser, and Stephan Pachnicke

(参考訳) パッシブ光ネットワーク(PON)システムは、ファイバ切断や光ネットワークユニット(ONU)の送信機/受信機の故障など、様々な障害に対して脆弱である。ファイバ切断によるサービス中断は、サービスプロバイダや事業者に莫大な経済的損失をもたらしうる。分岐終端がほぼ等距離にある場合、各分岐からの反射が重なり合うため、全体の後方散乱信号から故障した分岐を区別することが難しくなり、故障したONUの特定が困難になる。ネットワーク規模が大きくなるにつれてPONシステムの障害監視は複雑になり、監視の信頼性が低下する。これらの課題に対処するため、本稿ではPONシステムにおける障害監視のための様々な機械学習(ML)手法を提案し、実験的な光時間領域反射測定(OTDR)データを用いてそれらを検証する。

Passive optical network (PON) systems are vulnerable to a variety of failures, including fiber cuts and optical network unit (ONU) transmitter/receiver failures. Any service interruption caused by a fiber cut can result in huge financial losses for service providers or operators. Identifying the faulty ONU becomes difficult in the case of nearly equidistant branch terminations because the reflections from the branches overlap, making it difficult to distinguish the faulty branch given the global backscattering signal. With increasing network size, the complexity of fault monitoring in PON systems increases, resulting in less reliable monitoring. To address these challenges, we propose in this paper various machine learning (ML) approaches for fault monitoring in PON systems, and we validate them using experimental optical time domain reflectometry (OTDR) data.
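Nearly equidistant branches are hard to separate because of how OTDR converts time to distance. An event at fiber distance z returns after a round trip t = 2 n_g z / c. Two events closer than roughly c * tau / (2 n_g), for probe pulse width tau, blur together in the trace. The sketch assumes a group index of 1.468, a typical value for standard single-mode fiber near 1550 nm. The numbers are illustrative only.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0
DEFAULT_GROUP_INDEX = 1.468  # assumed typical group index of standard single-mode fiber


def event_distance_m(round_trip_s: float, group_index: float = DEFAULT_GROUP_INDEX) -> float:
    """Fiber distance of a reflective event from its OTDR round-trip delay: z = c * t / (2 * n_g)."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / (2.0 * group_index)


def two_point_resolution_m(pulse_width_s: float, group_index: float = DEFAULT_GROUP_INDEX) -> float:
    """Approximate spacing below which two events merge in the trace: c * tau / (2 * n_g)."""
    return SPEED_OF_LIGHT_M_PER_S * pulse_width_s / (2.0 * group_index)


def branches_separable(z1_m: float, z2_m: float, pulse_width_s: float) -> bool:
    """True if two branch terminations are farther apart than the two-point resolution."""
    return abs(z1_m - z2_m) > two_point_resolution_m(pulse_width_s)


if __name__ == "__main__":
    print(round(event_distance_m(100e-6)))             # event seen after a 100 us round trip
    print(branches_separable(5000.0, 5005.0, 100e-9))  # branches 5 m apart, 100 ns pulse
```

With a 100 ns pulse the resolution is about 10 m, so two branch terminations only 5 m apart produce overlapping reflections. That is the situation in which the paper turns to ML instead of simple peak picking.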

翻訳日:2023-07-11 16:17:16 公開日:2023-07-08

# 地下水数値モデリングにおけるU-Net & Vision Transformerの有効性の解明

Understanding the Efficacy of U-Net & Vision Transformer for Groundwater Numerical Modelling ( http://arxiv.org/abs/2307.04010v1 )

ライセンス: Link先を確認

Maria Luisa Taccari, Oded Ovadia, He Wang, Adar Kahana, Xiaohui Chen, Peter K. Jimack

(参考訳) 本稿では、地下水系における時間依存フォワードモデリングのための様々な機械学習モデル、すなわちビジョントランスフォーマー(ViT)と統合されたU-Netとフーリエニューラル演算子(FNO)を総合的に比較する。合成データセットのテストを通じて、U-NetとU-Net + ViTモデルは、特にスパースデータシナリオにおいて、精度と効率でFNOより優れていることを示した。これらの結果は,データ不足が顕著な実世界のアプリケーションにおいて,地下水モデリングのためのU-Netモデルの可能性を明らかにするものである。

This paper presents a comprehensive comparison of various machine learning models, namely U-Net, U-Net integrated with Vision Transformers (ViT), and Fourier Neural Operator (FNO), for time-dependent forward modelling in groundwater systems. Through testing on synthetic datasets, it is demonstrated that U-Net and U-Net + ViT models outperform FNO in accuracy and efficiency, especially in sparse data scenarios. These findings underscore the potential of U-Net-based models for groundwater modelling in real-world applications where data scarcity is prevalent.
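For readers unfamiliar with FNO: its core layer multiplies a truncated set of Fourier modes by learned weights and transforms back. The single-channel sketch below uses a naive DFT for clarity. Real FNO layers mix channels, use FFTs, and add a pointwise linear path. None of the paper's model configurations are reproduced here.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(xk: Sequence[complex]) -> List[complex]:
    """Inverse of dft()."""
    n = len(xk)
    return [sum(xk[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def spectral_conv_1d(u: Sequence[float], weights: Sequence[complex]) -> List[float]:
    """Single-channel Fourier layer: keep the lowest len(weights) modes, scale them, transform back."""
    n, modes = len(u), len(weights)
    if 2 * modes > n:
        raise ValueError("need 2 * modes <= len(u) so kept modes and their mirrors do not overlap")
    uk = dft(u)
    out_k = [0j] * n  # every mode above the cutoff is discarded
    for k in range(modes):
        out_k[k] = uk[k] * weights[k]
        if k:  # the mirrored negative frequency gets the conjugate weight, so real input stays real
            out_k[n - k] = uk[n - k] * weights[k].conjugate()
    return [v.real for v in idft(out_k)]


if __name__ == "__main__":
    n = 8
    field = [0.5 + math.cos(2 * math.pi * t / n) + 0.3 * math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
    print([round(v, 3) for v in spectral_conv_1d(field, [1.0, 1.0])])  # the mode-3 ripple is filtered out
```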

翻訳日:2023-07-11 16:09:42 公開日:2023-07-08

# インタラクティブなディクテーションを目指して

Toward Interactive Dictation ( http://arxiv.org/abs/2307.04008v1 )

ライセンス: Link先を確認

Belinda Z. Li, Jason Eisner, Adam Pauls, Sam Thomson

(参考訳) 音声ディクテーションは、ますます重要になっているテキスト入力手段である。ディクテーションと音声による編集の両方を可能にする既存のシステムは、コマンド言語をトリガーワードによって起動される平坦なテンプレートに制限している。本研究では、ユーザがオープンエンドの自然言語による音声編集コマンドでディクテーションを中断できるようにすることの実現可能性を検討する。このようなシステムを実験するために、新しいタスクとデータセットTERTiUSを導入する。この柔軟性をリアルタイムでサポートするには、システムは音声の区間を逐次的に分割してディクテーションかコマンドかに分類し、コマンドである区間を解釈する必要がある。我々は、大規模な事前学習済み言語モデルを用いて、編集後のテキストを予測するか、あるいは小さなテキスト編集プログラムを予測する実験を行う。実験の結果、モデルの精度とレイテンシの間には自然なトレードオフがあることが示された。小さなモデルは1.3秒のレイテンシで30%の最終状態精度を達成し、大きなモデルは7秒のレイテンシで55%の最終状態精度を達成する。

Voice dictation is an increasingly important text input modality. Existing systems that allow both dictation and editing-by-voice restrict their command language to flat templates invoked by trigger words. In this work, we study the feasibility of allowing users to interrupt their dictation with spoken editing commands in open-ended natural language. We introduce a new task and dataset, TERTiUS, to experiment with such systems. To support this flexibility in real-time, a system must incrementally segment and classify spans of speech as either dictation or command, and interpret the spans that are commands. We experiment with using large pre-trained language models to predict the edited text, or alternatively, to predict a small text-editing program. Experiments show a natural trade-off between model accuracy and latency: a smaller model achieves 30% end-state accuracy with 1.3 seconds of latency, while a larger model achieves 55% end-state accuracy with 7 seconds of latency.
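Predicting a small text-editing program instead of the full edited text can be pictured with a toy edit language. The `Replace`/`Delete` operations, and the rule that each acts on the last occurrence of its target, are hypothetical. TERTiUS's actual program representation is not described in the abstract.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Replace:
    old: str
    new: str


@dataclass
class Delete:
    target: str


EditOp = Union[Replace, Delete]


def run_program(text: str, program: List[EditOp]) -> str:
    """Apply hypothetical edit operations in order; each one acts on the last occurrence of its target."""
    for op in program:
        target = op.old if isinstance(op, Replace) else op.target
        idx = text.rfind(target)
        if idx < 0:
            raise ValueError(f"target not found: {target!r}")
        replacement = op.new if isinstance(op, Replace) else ""
        text = text[:idx] + replacement + text[idx + len(target):]
    return text


if __name__ == "__main__":
    draft = "Meet me at the cafe at 5 pm."
    # Made-up spoken command: "actually, make that 6 pm and drop the cafe part"
    print(run_program(draft, [Replace("5 pm", "6 pm"), Delete(" at the cafe")]))
```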

翻訳日:2023-07-11 16:09:32 公開日:2023-07-08

# 第19回合理性と知識の理論的側面に関する会議 予稿集

Proceedings Nineteenth conference on Theoretical Aspects of Rationality and Knowledge ( http://arxiv.org/abs/2307.04005v1 )

ライセンス: Link先を確認

Rineke Verbrugge (University of Groningen)

(参考訳) TARK(Theoretical Aspects of Rationality and Knowledge)は、コンピュータ科学、人工知能、ゲーム理論、意思決定理論、哲学、論理学、言語学、認知科学など、多様な分野の研究者を集めることを目的とした会議である。その目的は、合理性と知識に関する推論を含む学際的な問題の理解を深めることである。これまでの会議は、Joe Halpern(コーネル大学)の主導により、1986年以降、世界各地で隔年開催されてきた。関心のあるトピックには、知識・信念・認識・不確実性の意味論的モデル、有界合理性と資源制約下の推論、常識的な認識論的推論、認識論理、認識論的ゲーム理論、知識と行動、知識やその他の心的状態に関する推論の応用、信念改訂、計算論的社会選択、アルゴリズム的ゲーム理論、マルチエージェントシステムの基礎などが含まれるが、これらに限定されない。TARKに関する情報は、過去の会議の予稿集を含め、ウェブサイトhttp://www.tark.org/で入手できる。本予稿集には、2023年6月28日から30日にかけてイギリスのオックスフォード大学で開催された第19回合理性と知識の理論的側面に関する会議(TARK 2023)での発表に採択された論文が収録されている。会議のウェブサイトはhttps://sites.google.com/view/tark-2023にある。

The TARK conference (Theoretical Aspects of Rationality and Knowledge) is a conference that aims to bring together researchers from a wide variety of fields, including computer science, artificial intelligence, game theory, decision theory, philosophy, logic, linguistics, and cognitive science. Its goal is to further our understanding of interdisciplinary issues involving reasoning about rationality and knowledge. Previous conferences have been held biennially around the world since 1986, on the initiative of Joe Halpern (Cornell University). Topics of interest include, but are not limited to, semantic models for knowledge, belief, awareness and uncertainty, bounded rationality and resource-bounded reasoning, commonsense epistemic reasoning, epistemic logic, epistemic game theory, knowledge and action, applications of reasoning about knowledge and other mental states, belief revision, computational social choice, algorithmic game theory, and foundations of multi-agent systems. Information about TARK, including conference proceedings, is available at the website http://www.tark.org/. These proceedings contain the papers that have been accepted for presentation at the Nineteenth Conference on Theoretical Aspects of Rationality and Knowledge (TARK 2023), held between June 28 and June 30, 2023, at the University of Oxford, United Kingdom. The conference website can be found at https://sites.google.com/view/tark-2023

翻訳日:2023-07-11 16:09:13 公開日:2023-07-08

# 高オフ角化学気相成長ダイヤモンドにおける応力分布の緩和による、完全配向窒素-空孔中心のスピン位相緩和時間の延長

Extending spin dephasing time of perfectly aligned Nitrogen-Vacancy centers by mitigating stress distribution on highly misoriented chemical-vapor-deposition diamond ( http://arxiv.org/abs/2307.04003v1 )

ライセンス: Link先を確認

T. Tsuji, T. Sekiguchi, T. Iwasaki and M. Hatano

(参考訳) 大体積の化学気相成長(CVD)ダイヤモンド中の完全配向した窒素-空孔(NV)中心のスピン位相緩和時間(T2*)を延長すると、直流磁場感度が向上する。しかし、NV中心のT2*は、ダイヤモンド膜の厚さが増すにつれて、膜中の応力分布によって大きく低下する。この問題を克服するため、本研究ではCVDダイヤモンド膜中の応力分布を緩和する手法を開発し、アンサンブルNV中心のT2*の延長を実現した。オフ角2.0、3.7、5.0、10°の(111)ダイヤモンド基板上に、完全配向NV中心を含む厚さ約50 μmのCVDダイヤモンド膜を形成した。その結果、オフ角の増加に伴い、アンサンブルNV中心のT2*は電子スピン浴と核スピン浴のみによって制限される値に近づくことがわかった。微視的な応力測定により、低オフ角ではCVDダイヤモンド膜中の応力分布が深さ方向に非常に不均一であったのに対し、高オフ角基板上ではその不均一性が大きく抑制されることが明らかになった。応力分布の低減は、CVDダイヤモンド中の転位密度の低下に起因する可能性がある。本研究は、高感度量子センサに用いる高品質ダイヤモンド材料を合成するための重要な手法を提供する。

Extending the spin-dephasing time (T2*) of perfectly aligned nitrogen-vacancy (NV) centers in large-volume chemical vapor deposition (CVD) diamonds leads to enhanced DC magnetic sensitivity. However, T2* of the NV centers is significantly reduced by the stress distribution in the diamond film as its thickness increases. To overcome this issue, we developed a method to mitigate the stress distribution in the CVD diamond films, leading to a T2* extension of the ensemble NV centers. CVD diamond films of approximately 50 μm thickness with perfectly aligned NV centers were formed on (111) diamond substrates with misorientation angles of 2.0, 3.7, 5.0, and 10°. We found that, as the misorientation angle increased, T2* of the ensemble of NV centers approached the value limited only by the electron and nuclear spin bath. Microscopic stress measurements revealed that the stress distribution was highly inhomogeneous along the depth direction in the CVD diamond film at low misorientation angles, whereas the inhomogeneity was largely suppressed on highly misoriented substrates. The reduced stress distribution possibly originates from the reduction of the dislocation density in the CVD diamond. This study provides an important method for synthesizing high-quality diamond materials for use in highly sensitive quantum sensors.
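The standard shot-noise-limited Ramsey scaling shows why a longer T2* matters for DC magnetometry. This scaling is textbook background, not taken from the paper. Let N be the number of sensing NV centers and C the readout contrast, with the free-precession time set near T2*. Ignoring initialization and readout overhead, the minimum detectable field per square-root bandwidth, eta, scales as below. Holding N and C fixed, doubling T2* improves (lowers) eta by a factor of sqrt(2).

```latex
\eta \;\propto\; \frac{1}{C\sqrt{N\,T_2^{*}}}
\qquad\Longrightarrow\qquad
\frac{\eta_{\text{after}}}{\eta_{\text{before}}} = \sqrt{\frac{T^{*}_{2,\text{before}}}{T^{*}_{2,\text{after}}}},
\qquad
T_2^{*} \to 2\,T_2^{*} \;\Rightarrow\; \eta \to \frac{\eta}{\sqrt{2}} \approx 0.71\,\eta
```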

翻訳日:2023-07-11 16:08:45 公開日:2023-07-08

# 高次元特徴を持つ集合表現には多項式幅で十分である

Polynomial Width is Sufficient for Set Representation with High-dimensional Features ( http://arxiv.org/abs/2307.04001v1 )

ライセンス: Link先を確認

Peihao Wang, Shenghao Yang, Shu Li, Zhangyang Wang, Pan Li

(参考訳) 集合表現は、入力の順序に影響されないニューラルネットワークの帰納バイアスをモデル化するために、深層学習において広く用いられるようになった。DeepSetsは、集合表現のために最も広く用いられているニューラルネットワークアーキテクチャである。これは、各集合要素を次元$L$の潜在空間に埋め込み、続いて和プーリングによって集合全体の埋め込みを得て、最後に集合全体の埋め込みを出力に写像するものである。本研究では、次元$L$がDeepSetsの表現力に与える影響を調べる。これまでの解析は、高次元の特徴を1次元の特徴に過度に単純化するか、解析的な活性化関数に限定されていた。そのため、実際の使用から乖離するか、集合サイズ$N$と特徴次元$D$に対して指数関数的に増大する$L$をもたらしていた。十分な表現力を達成する$L$の最小値を調べるために、2つの集合要素埋め込み層、(a) 線形+べき乗活性化(LP)と(b) 線形+指数活性化(LE)を提示する。$L$がpoly$(N, D)$であれば、どちらの埋め込み層を用いても集合表現に十分であることを示す。また、LP埋め込み層について$L$の下界も与える。さらに、結果を置換同変な集合関数および複素数体に拡張する。

Set representation has become ubiquitous in deep learning for modeling the inductive bias of neural networks that are insensitive to the input order. DeepSets is the most widely used neural network architecture for set representation. It involves embedding each set element into a latent space with dimension $L$, followed by a sum pooling to obtain a whole-set embedding, and finally mapping the whole-set embedding to the output. In this work, we investigate the impact of the dimension $L$ on the expressive power of DeepSets. Previous analyses either oversimplified high-dimensional features to be one-dimensional features or were limited to analytic activations, thereby diverging from practical use or resulting in $L$ that grows exponentially with the set size $N$ and feature dimension $D$. To investigate the minimal value of $L$ that achieves sufficient expressive power, we present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activations (LE). We demonstrate that $L$ being poly$(N, D)$ is sufficient for set representation using both embedding layers. We also provide a lower bound of $L$ for the LP embedding layer. Furthermore, we extend our results to permutation-equivariant set functions and the complex field.
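The DeepSets pipeline with an LP-style embedding can be sketched in four steps: a linear map, an elementwise power, sum pooling over the set, and then an output map rho. The weights, the power, and using a single power for every coordinate are illustrative assumptions. The sketch only shows the structure and its permutation invariance, not the paper's bounds on L.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def lp_embed(x: Vector, weights: Sequence[Vector], power: int) -> List[float]:
    """Linear map followed by an elementwise power activation: phi(x)_l = (w_l . x) ** power."""
    return [sum(wi * xi for wi, xi in zip(w, x)) ** power for w in weights]


def deepsets_embed(elements: Sequence[Vector], weights: Sequence[Vector], power: int) -> List[float]:
    """Sum-pool the per-element embeddings into one order-invariant vector of length L = len(weights)."""
    pooled = [0.0] * len(weights)
    for x in elements:
        for l, v in enumerate(lp_embed(x, weights, power)):
            pooled[l] += v
    return pooled


def deepsets(elements: Sequence[Vector], weights: Sequence[Vector], power: int,
             rho: Callable[[List[float]], float]) -> float:
    """Full DeepSets forward pass: rho(sum_i phi(x_i))."""
    return rho(deepsets_embed(elements, weights, power))


if __name__ == "__main__":
    elements = [[1, 2], [3, -1], [0, 4]]   # a set of N = 3 elements in D = 2 dimensions
    weights = [[1, 0], [0, 1], [1, 1]]     # L = 3 hypothetical linear maps
    print(deepsets_embed(elements, weights, 2), deepsets_embed(elements[::-1], weights, 2))
```

Because the pooling is a sum, reordering the set leaves the embedding unchanged. The paper's question is how large L must be for this pooled vector to still determine the set.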

翻訳日:2023-07-11 16:08:22 公開日:2023-07-08

# 効率的な逆トーンマッピングのための軽量改良残差ネットワーク

Lightweight Improved Residual Network for Efficient Inverse Tone Mapping ( http://arxiv.org/abs/2307.03998v1 )

ライセンス: Link先を確認

Liqi Xue, Tianyi Xu, Yongbao Song, Yan Liu, Lei Zhang, Xiantong Zhen, and Jun Xu

(参考訳) HDR10テレビのような表示デバイスは、高ダイナミックレンジ(HDR)画像を表示するために、日常生活でますます普及している。しかし、インターネット上のメディア画像の大半は依然として8ビットの標準ダイナミックレンジ(SDR)形式である。したがって、豊富なメディア画像の潜在能力を最大限に引き出すには、逆トーンマッピング(ITM)によってSDR画像をHDR画像に変換することが重要である。しかし、既存のITM手法は通常、膨大な計算コストを要する複雑なネットワークアーキテクチャで開発されている。本稿では、効率的なITMのために、広く用いられている残差ブロックの能力を高めた軽量なImproved Residual Network (IRNet)を提案する。具体的には、高精細なHDR画像再構成のために多層の特徴を抽出・融合する、新しいImproved Residual Block (IRB)を提案する。3つのベンチマークデータセットでの実験により、IRNetがITMタスクとSR-ITM同時タスクの両方で最先端の性能を達成することが示された。コード、モデル、データはhttps://github.com/ThisisVikki/ITM-baselineで公開される予定である。

Display devices such as HDR10 televisions are increasingly prevalent in our daily life for visualizing high dynamic range (HDR) images. But the majority of media images on the internet remain in 8-bit standard dynamic range (SDR) format. Therefore, converting SDR images to HDR ones by inverse tone mapping (ITM) is crucial to unlock the full potential of abundant media images. However, existing ITM methods are usually developed with complex network architectures requiring huge computational costs. In this paper, we propose a lightweight Improved Residual Network (IRNet) that enhances the power of the popular residual block for efficient ITM. Specifically, we propose a new Improved Residual Block (IRB) to extract and fuse multi-layer features for fine-grained HDR image reconstruction. Experiments on three benchmark datasets demonstrate that our IRNet achieves state-of-the-art performance on both the ITM and joint SR-ITM tasks. The code, models and data will be publicly available at https://github.com/ThisisVikki/ITM-baseline.
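For context, the simplest non-learned way to map an 8-bit SDR code value onto an HDR display has two steps. First, decode the value to linear light; then stretch it to the display's peak luminance. The sketch below does exactly that, assuming the sRGB transfer curve and a hypothetical 1000-nit peak. A real HDR10 pipeline would further encode the result with the PQ curve (SMPTE ST 2084), which is omitted. This is a naive baseline for comparison, not IRNet, which instead learns the mapping with a network.

```python
def srgb_to_linear(code: int) -> float:
    """Decode an 8-bit sRGB code value to linear light in [0, 1] using the piecewise sRGB curve."""
    c = code / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def naive_inverse_tone_map(code: int, peak_nits: float = 1000.0) -> float:
    """Hypothetical baseline: linearly stretch SDR light onto an assumed HDR peak luminance (nits)."""
    return srgb_to_linear(code) * peak_nits


if __name__ == "__main__":
    for code in (0, 64, 128, 255):
        print(code, round(naive_inverse_tone_map(code), 1))
```

A single global curve like this brightens everything uniformly and cannot recover clipped highlight detail. That limitation is what motivates learned ITM methods such as the one proposed here.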

Translated: 2023-07-11 16:08:01, Published: 2023-07-08

# Efficient Model-Free Exploration in Low-Rank MDPs ( http://arxiv.org/abs/2307.03997v1 )

License: check the linked page

Zakaria Mhammedi, Adam Block, Dylan J. Foster, Alexander Rakhlin

A major challenge in reinforcement learning is to develop practical, sample-efficient algorithms for exploration in high-dimensional domains where generalization and function approximation are required. Low-Rank Markov Decision Processes -- where transition probabilities admit a low-rank factorization based on an unknown feature embedding -- offer a simple, yet expressive framework for RL with function approximation, but existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions such as latent variable structure, access to model-based function approximation, or reachability. In this work, we propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs that is both computationally efficient and model-free, allowing for general function approximation and requiring no additional structural assumptions. Our algorithm, VoX, uses the notion of a generalized optimal design for the feature embedding as an efficiently computable basis for exploration, performing efficient optimal design computation by interleaving representation learning and policy optimization. Our analysis -- which is appealingly simple and modular -- carefully combines several techniques, including a new reduction from optimal design computation to policy optimization based on the Frank-Wolfe method, and an improved analysis of a certain minimax representation learning objective found in prior work.
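Optimal design is the quantity VoX computes with Frank-Wolfe. The sketch shows only one classical finite-set instance: a D-optimal design over fixed feature vectors, solved with Frank-Wolfe (Fedorov-Wynn) steps and exact line search. In VoX the features are learned and each Frank-Wolfe step is reduced to policy optimization, which this toy version does not attempt; the point set `pts` is hypothetical.

```python
def mat_inv(a):
    """Invert a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def leverages(points, weights):
    """g_i = x_i^T V(w)^{-1} x_i with design matrix V(w) = sum_i w_i x_i x_i^T."""
    d = len(points[0])
    V = [[sum(w * x[i] * x[j] for w, x in zip(weights, points)) for j in range(d)]
         for i in range(d)]
    Vinv = mat_inv(V)
    return [sum(x[i] * Vinv[i][j] * x[j] for i in range(d) for j in range(d))
            for x in points]


def frank_wolfe_d_optimal(points, iters=5000, tol=1e-3):
    """Frank-Wolfe (Fedorov-Wynn) iterations for a D-optimal design on a finite set."""
    d = len(points[0])
    w = [1.0 / len(points)] * len(points)  # uniform start; points must span R^d
    for _ in range(iters):
        g = leverages(points, w)
        k = max(range(len(points)), key=lambda i: g[i])
        if g[k] <= d * (1.0 + tol):  # Kiefer-Wolfowitz: max leverage equals d at the optimum
            break
        step = (g[k] / d - 1.0) / (g[k] - 1.0)  # exact line search for log det V
        w = [(1.0 - step) * wi for wi in w]
        w[k] += step  # move mass toward the least-covered direction
    return w


pts = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights = frank_wolfe_d_optimal(pts)
print([round(v, 3) for v in weights])
```

The design concentrates on the two basis directions, since the diagonal point adds little coverage once they are included.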

Translated: 2023-07-11 16:07:47, Published: 2023-07-08

# Stimulating the Diffusion Model for Image Denoising via Adaptive Embedding and Ensembling ( http://arxiv.org/abs/2307.03992v1 )

License: check the linked page

Tong Li, Hansen Feng, Lizhi Wang, Zhiwei Xiong, Hua Huang

Image denoising is a fundamental problem in computational photography, where achieving high-quality perceptual performance with low distortion is highly demanding. Current methods either struggle with perceptual performance or suffer from significant distortion. Recently, emerging diffusion models have achieved state-of-the-art performance in various tasks, and their denoising mechanism shows great potential for image denoising. However, stimulating diffusion models for image denoising is not straightforward and requires solving several critical problems. On the one hand, input inconsistency hinders the connection between diffusion models and image denoising. On the other hand, content inconsistency between the generated image and the desired denoised image introduces additional distortion. To tackle these problems, we present a novel strategy called Diffusion Model for Image Denoising (DMID) by understanding and rethinking the diffusion model from a denoising perspective. Our DMID strategy includes an adaptive embedding method that embeds the noisy image into a pre-trained diffusion model, and an adaptive ensembling method that reduces distortion in the denoised image. Our DMID strategy achieves state-of-the-art performance on all distortion-based and perceptual metrics, for both Gaussian and real-world image denoising.
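One natural reading of "embedding the noisy image into a pre-trained diffusion model" is matching the image's noise level to a diffusion timestep. The sketch assumes a DDPM-style linear $\beta$ schedule and additive Gaussian noise $y = x + \sigma n$, with $\sigma$ measured on the model's data scale. Scaling $y$ by $\sqrt{\bar\alpha_t}$ gives $\sqrt{\bar\alpha_t}\,x + \sqrt{\bar\alpha_t}\,\sigma n$, which has the form of a diffusion state $x_t$ when $\sqrt{\bar\alpha_t}\,\sigma = \sqrt{1-\bar\alpha_t}$, i.e. $\bar\alpha_t = 1/(1+\sigma^2)$. Whether DMID uses exactly this rule is an assumption, and its adaptive ensembling step is not shown.

```python
import math


def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for a DDPM-style linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def embed_noisy_image(y, sigma, alpha_bars):
    """Pick the timestep whose noise level matches sigma and rescale y into x_t."""
    target = 1.0 / (1.0 + sigma ** 2)  # alpha_bar that makes sqrt(a)*y a valid x_t
    t = min(range(len(alpha_bars)), key=lambda i: abs(alpha_bars[i] - target))
    scale = math.sqrt(alpha_bars[t])
    return t, [scale * v for v in y]


alpha_bars = linear_alpha_bars()
t, x_t = embed_noisy_image([0.1, -0.4, 0.9], sigma=0.2, alpha_bars=alpha_bars)
print(t, x_t)
```

Reverse diffusion would then start from step `t` instead of pure noise, so a lightly corrupted image needs only a short reverse trajectory.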

Translated: 2023-07-11 16:07:24, Published: 2023-07-08

# FTFDNet: Learning to Detect Talking Face Video Manipulation with Tri-Modality Interaction ( http://arxiv.org/abs/2307.03990v1 )

License: check the linked page

Ganglai Wang, Peng Zhang, Junwen Xiong, Feihan Yang, Wei Huang, and Yufei Zha

DeepFake-based digital facial forgery threatens public media security, especially when lip manipulation is used in talking face generation, which makes fake video detection even harder. Because only the lip shape is changed to match the given speech, identity-related facial features are hard to discriminate in such fake talking face videos. Combined with the lack of attention to the audio stream as prior knowledge, this makes detection failures on fake talking face videos inevitable. We found that the optical flow of fake talking face videos is disordered, especially in the lip region, while the optical flow of real videos changes regularly, which means that motion features from optical flow are useful for capturing manipulation cues. In this study, a fake talking face detection network (FTFDNet) is proposed that incorporates visual, audio and motion features using an efficient cross-modal fusion (CMF) module. Furthermore, a novel audio-visual attention mechanism (AVAM) is proposed to discover more informative features; being modular, it can be seamlessly integrated into any audio-visual CNN architecture. With the additional AVAM, the proposed FTFDNet achieves better detection performance than other state-of-the-art DeepFake video detection methods, not only on the established fake talking face detection dataset (FTFDD) but also on the DeepFake video detection datasets DFDC and DF-TIMIT.
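Cross-modal fusion of the kind performed by FTFDNet's CMF module and AVAM can be illustrated with plain scaled dot-product attention in which visual frame features query audio frame features. This is a generic sketch, not the paper's modules: there are no learned projections, the residual sum is one hypothetical fusion choice, and the feature values are made up.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_modal_attention(visual, audio):
    """Each visual frame feature attends over all audio frame features."""
    d = len(visual[0])
    fused, all_weights = [], []
    for q in visual:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in audio]
        w = softmax(scores)
        attended = [sum(wj * a[i] for wj, a in zip(w, audio)) for i in range(d)]
        fused.append([qi + ai for qi, ai in zip(q, attended)])  # residual fusion
        all_weights.append(w)
    return fused, all_weights


visual = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 video frames, 2-dim features
audio = [[0.9, 0.1], [0.2, 0.8]]               # 2 audio frames
fused, weights = cross_modal_attention(visual, audio)
print(weights)
```

Each row of `weights` is a distribution over audio frames, so a visual frame with no matching audio evidence gets a diffuse, uninformative attention pattern.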

Translated: 2023-07-11 16:07:04, Published: 2023-07-08

# PCG-based Static Underground Garage Scenario Generation ( http://arxiv.org/abs/2307.03988v1 )

License: check the linked page

Wenjin Li and Kai Li

Autonomous driving technology is classified into levels from L0 to L5. Currently, only level L2 (partial automation) can be achieved, and there is a long way to go before reaching the final level, L5 (full automation). The key to crossing these levels lies in training the autonomous driving model. However, relying solely on real-world road data to train the model is far from enough and consumes a great deal of resources. Although autonomous driving models have already been trained with simulators that reproduce real-world scenarios, these scenarios require complete manual construction. 3D scenes converted directly from road network formats lack a large amount of detail and cannot be used as training sets. We regard static scenario simulation of underground parking garages as a procedural content generation (PCG) problem, and this paper uses the Sarsa algorithm to solve procedural content generation for underground garage structures.
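The Sarsa update is on-policy temporal-difference control: $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma Q(s',a') - Q(s,a)]$, where $a'$ is the action actually taken in $s'$. The sketch applies it to a toy corridor rather than a garage layout; the environment, the reward of $-1$ per step and the hyperparameters are all hypothetical stand-ins for the paper's PCG setting.

```python
import random


def sarsa_corridor(n_states=5, episodes=2000, alpha=0.1, gamma=1.0, eps=0.1, seed=0):
    """Tabular Sarsa on a toy corridor: start at state 0, goal at n_states - 1."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = {(s, a): 0.0 for s in range(n_states) for a in (0, 1)}

    def choose(s):
        """Epsilon-greedy action (0 = left, 1 = right)."""
        if rng.random() < eps:
            return rng.choice((0, 1))
        return max((0, 1), key=lambda a: q[(s, a)])

    for _ in range(episodes):
        s, a = 0, choose(0)
        for _ in range(100):  # cap the episode length
            s_next = max(0, s - 1) if a == 0 else min(goal, s + 1)
            reward = -1.0
            if s_next == goal:
                q[(s, a)] += alpha * (reward - q[(s, a)])  # terminal: no bootstrap
                break
            a_next = choose(s_next)
            # Sarsa bootstraps from the action actually chosen next (on-policy).
            q[(s, a)] += alpha * (reward + gamma * q[(s_next, a_next)] - q[(s, a)])
            s, a = s_next, a_next
    return q


q = sarsa_corridor()
greedy = [max((0, 1), key=lambda a: q[(s, a)]) for s in range(4)]
print(greedy)
```

In a PCG formulation the state would encode a partially generated layout and actions would place structural elements, but the update rule stays the same.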

Translated: 2023-07-11 16:06:32, Published: 2023-07-08

# Learning Variational Neighbor Labels for Test-Time Domain Generalization ( http://arxiv.org/abs/2307.04033v1 )

License: check the linked page

Sameer Ambekar, Zehao Xiao, Jiayi Shen, Xiantong Zhen, Cees G. M. Snoek

This paper strives for domain generalization, where models are trained exclusively on source domains before being deployed at unseen target domains. We follow the strict separation of source training and target testing but exploit the value of the unlabeled target data itself during inference. We make three contributions. First, we propose probabilistic pseudo-labeling of target samples to generalize the source-trained model to the target domain at test time. We formulate the generalization at test time as a variational inference problem by modeling pseudo labels as distributions to consider the uncertainty during generalization and alleviate the misleading signal of inaccurate pseudo labels. Second, we learn variational neighbor labels that incorporate the information of neighboring target samples to generate more robust pseudo labels. Third, to learn the ability to incorporate more representative target information and generate more precise and robust variational neighbor labels, we introduce a meta-generalization stage during training to simulate the generalization procedure. Experiments on six widely-used datasets demonstrate the benefits, abilities, and effectiveness of our proposal.
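A simple, non-variational way to fold neighboring target samples into a pseudo-label is to average the predicted class distributions of each sample and its nearest neighbors in feature space. The sketch does exactly that with cosine similarity. It is a stand-in for intuition only, since the paper learns variational neighbor labels; `k`, the features and the logits are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)
    e = [math.exp(z - m) for z in logits]
    s = sum(e)
    return [v / s for v in e]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def neighbor_soft_labels(features, logits, k=2):
    """Soft pseudo-label of each target sample: the mean class distribution
    of the sample itself and its k most cosine-similar target samples."""
    probs = [softmax(z) for z in logits]
    n_cls = len(probs[0])
    labels = []
    for i, f in enumerate(features):
        sims = sorted(((cosine(f, g), j) for j, g in enumerate(features) if j != i),
                      reverse=True)
        idx = [i] + [j for _, j in sims[:k]]
        labels.append([sum(probs[j][c] for j in idx) / len(idx) for c in range(n_cls)])
    return labels


feats = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0]]
logits = [[0.0, 0.0], [3.0, 0.0], [3.0, 0.0], [0.0, 3.0]]
print(neighbor_soft_labels(feats, logits))
```

Sample 0 is ambiguous on its own (50/50) but inherits a confident class-0 label from its two close neighbors, which is the robustness argument for using neighbor information.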

Translated: 2023-07-11 16:00:46, Published: 2023-07-08

# On "Indifference" and Backward Induction in Games with Perfect Information ( http://arxiv.org/abs/2307.04029v1 )

License: check the linked page

Nimrod Megiddo

Indifference of a player with respect to two distinct outcomes of a game cannot be handled by small perturbations, because the actual choice may have a significant impact on other players and cause them to act in a way that has a significant impact on the indifferent player. It is argued that ties among rational choices can be resolved by refinements of the concept of rationality based on the utilities of other players. One such refinement is the concept of Tit-for-Tat.
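The abstract's point, that an indifferent player's choice can change what others do and feed back on the indifferent player, shows up in a two-move backward-induction example. The game tree and both tie-breaking rules (`benevolent`, `spiteful`) are hypothetical illustrations, not the Tit-for-Tat refinement the paper proposes.

```python
def backward_induction(node, tie_break):
    """Return the payoff vector reached by backward induction.

    A node is either a leaf payoff tuple or {"player": i, "moves": {...}}.
    tie_break(player, candidates) chooses among payoff vectors the mover
    is indifferent between.
    """
    if isinstance(node, tuple):
        return node
    i = node["player"]
    outcomes = [backward_induction(child, tie_break) for child in node["moves"].values()]
    best = max(o[i] for o in outcomes)
    candidates = [o for o in outcomes if o[i] == best]
    return candidates[0] if len(candidates) == 1 else tie_break(i, candidates)


# Player 0 moves first; player 1 is indifferent between (3, 1) and (0, 1).
game = {"player": 0, "moves": {
    "L": (2, 2),
    "R": {"player": 1, "moves": {"a": (3, 1), "b": (0, 1)}},
}}


def benevolent(i, cands):
    """Break ties in favour of the other player (two-player game)."""
    return max(cands, key=lambda o: o[1 - i])


def spiteful(i, cands):
    """Break ties against the other player (two-player game)."""
    return min(cands, key=lambda o: o[1 - i])


print(backward_induction(game, benevolent))  # (3, 1)
print(backward_induction(game, spiteful))    # (2, 2)
```

Player 1 is indifferent at its own node, yet it ends with payoff 1 under the benevolent rule and 2 under the spiteful one, because player 0 anticipates the tie-break and chooses differently.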

Translated: 2023-07-11 16:00:27, Published: 2023-07-08

# Measuring the Success of Diffusion Models at Imitating Human Artists ( http://arxiv.org/abs/2307.04028v1 )

License: check the linked page

Stephen Casper, Zifan Guo, Shreya Mogulothu, Zachary Marinov, Chinmay Deshpande, Rui-Jie Yew, Zheng Dai, Dylan Hadfield-Menell

Modern diffusion models have set the state-of-the-art in AI image generation. Their success is due, in part, to training on Internet-scale data which often includes copyrighted work. This prompts questions about the extent to which these models learn from, imitate, or copy the work of human artists. This work suggests that tying copyright liability to the capabilities of the model may be useful given the evolving ecosystem of generative models. Specifically, much of the legal analysis of copyright and generative systems focuses on the use of protected data for training. As a result, the connections between data, training, and the system are often obscured. In our approach, we consider simple image classification techniques to measure a model's ability to imitate specific artists. Specifically, we use Contrastive Language-Image Pretrained (CLIP) encoders to classify images in a zero-shot fashion. Our process first prompts a model to imitate a specific artist. Then, we test whether CLIP can be used to reclassify the artist (or the artist's work) from the imitation. If these tests match the imitation back to the original artist, this suggests the model can imitate that artist's expression. Our approach is simple and quantitative. Furthermore, it uses standard techniques and does not require additional training. We demonstrate our approach with an audit of Stable Diffusion's capacity to imitate 70 professional digital artists with copyrighted work online. When Stable Diffusion is prompted to imitate an artist from this set, we find that the artist can be identified from the imitation with an average accuracy of 81.0%. Finally, we also show that a sample of the artist's work can be matched to these imitation images with a high degree of statistical reliability. Overall, these results suggest that Stable Diffusion is broadly successful at imitating individual human artists.
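Zero-shot classification with CLIP reduces to picking the text prompt whose embedding is most cosine-similar to the image embedding. The sketch shows that scoring step and the resulting imitation accuracy; the 3-dimensional vectors are made-up stand-ins for real CLIP image and text embeddings, and the prompt wording is assumed.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def zero_shot_classify(image_emb, text_embs):
    """Index of the text prompt whose embedding is most cosine-similar to the image."""
    img = normalize(image_emb)
    sims = [sum(a * b for a, b in zip(img, normalize(t))) for t in text_embs]
    return max(range(len(sims)), key=lambda i: sims[i])


def imitation_accuracy(image_embs, true_artist_idx, text_embs):
    """Fraction of imitation images classified back to the prompted artist."""
    hits = sum(zero_shot_classify(e, text_embs) == y
               for e, y in zip(image_embs, true_artist_idx))
    return hits / len(image_embs)


# Hypothetical stand-ins for CLIP embeddings of prompts like "Artwork by <artist>".
text_embs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
# Hypothetical embeddings of images generated with "imitate artist i" prompts.
image_embs = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3], [0.7, 0.6, 0.0], [0.0, 0.1, 0.9]]
labels = [0, 1, 1, 2]
print(imitation_accuracy(image_embs, labels, text_embs))  # 0.75
```

The third image, prompted as artist 1, is classified as artist 0; an audit reports the fraction of such round trips that land on the prompted artist.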

Translated: 2023-07-11 16:00:20, Published: 2023-07-08

# Robust Ranking Explanations ( http://arxiv.org/abs/2307.04024v1 )

License: check the linked page

Chao Chen, Chenghua Guo, Guixiang Ma, Ming Zeng, Xi Zhang, Sihong Xie

Robust explanations of machine learning models are critical to establishing human trust in the models. Due to limited cognitive capacity, most humans can only interpret the top few salient features. It is therefore critical to make the top salient features robust to adversarial attacks, especially attacks against the more vulnerable gradient-based explanations. Existing defenses measure robustness using $\ell_p$-norms, which offer weaker protection. We define explanation thickness to measure the ranking stability of salient features, and derive tractable surrogate bounds of the thickness to design the R2ET algorithm, which efficiently maximizes the thickness and anchors the top salient features. Theoretically, we prove a connection between R2ET and adversarial training. Experiments with a wide spectrum of network architectures and data modalities, including brain networks, demonstrate that R2ET attains higher explanation robustness under stealthy attacks while retaining accuracy.
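The intuition behind explanation thickness is that a top-$k$ saliency ranking is fragile when the $k$-th and $(k+1)$-th scores are close. The sketch uses a much simpler proxy, the gap between those two scores, together with a top-$k$ overlap check. It is not the paper's thickness definition or its surrogate bounds, and the saliency vectors are hypothetical.

```python
def top_k(saliency, k):
    """Indices of the k largest saliency values."""
    return set(sorted(range(len(saliency)), key=lambda i: -saliency[i])[:k])


def top_k_margin(saliency, k):
    """Gap between the k-th and (k+1)-th largest saliency values (simplified proxy)."""
    s = sorted(saliency, reverse=True)
    return s[k - 1] - s[k]


def top_k_overlap(saliency_a, saliency_b, k):
    """Fraction of the top-k features shared by two explanations."""
    return len(top_k(saliency_a, k) & top_k(saliency_b, k)) / k


clean = [0.90, 0.70, 0.68, 0.10, 0.05]
perturbed = [0.88, 0.66, 0.71, 0.12, 0.04]  # hypothetical post-attack saliency
print(top_k_margin(clean, 2), top_k_overlap(clean, perturbed, 2))
```

With a margin of only 0.02 at $k = 2$, a small perturbation swaps features 1 and 2 and halves the top-2 overlap, while the top-1 feature, protected by a 0.2 margin, survives.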

Translated: 2023-07-11 15:59:52, Published: 2023-07-08

# Method of Hydrodynamic Images and Quantum Calculus in Fock-Bargmann Representation of Quantum States ( http://arxiv.org/abs/2307.04020v1 )

License: check the linked page

Oktay K Pashaev

We propose a new approach to quantum states in Fock space in terms of classical hydrodynamics. By conformal mapping of the complex analytic function representing the wave function of quantum states in the Fock-Bargmann representation, we define a complex potential that describes these quantum states by an incompressible and irrotational classical hydrodynamic flow. In our approach, zeros of the wave function appear as a set of point vortices (sources) of equal strength in the plane, which allows them to be interpreted as images in a bounded domain. For the cat states, we find a fluid representation describing a point source in an oblique strip domain, with an infinite number of periodically distributed images. For the annular domain, the infinite set of images is described by Jackson $q$-exponential functions. We show that these functions represent the wave functions of quantum coherent states of the $q$-deformed quantum oscillator in the $q$-Fock-Bargmann representation and describe the infinite set of point vortices distributed in geometric progression.
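For $0<q<1$, Jackson's entire $q$-exponential $E_q(z) = \sum_{n\ge 0} q^{n(n-1)/2} z^n/[n]_q!$, with $[n]_q = (1-q^n)/(1-q)$, equals the Euler product $\prod_{k\ge 0}\bigl(1 + (1-q)q^k z\bigr)$. Its zeros $z_k = -1/\bigl((1-q)q^k\bigr)$ therefore form a geometric progression with ratio $1/q$, consistent with the vortex picture described above. Which of Jackson's $q$-exponentials the paper uses is an assumption here; the sketch only checks the series against the product numerically.

```python
import math


def q_number(n, q):
    """[n]_q = (1 - q^n) / (1 - q)."""
    return (1.0 - q ** n) / (1.0 - q)


def big_e_q_series(z, q, terms=60):
    """Truncated series for Jackson's E_q(z) = sum_n q^{n(n-1)/2} z^n / [n]_q!."""
    total, fact = 0.0, 1.0
    for n in range(terms):
        if n > 0:
            fact *= q_number(n, q)  # running [n]_q!
        total += q ** (n * (n - 1) // 2) * z ** n / fact
    return total


def big_e_q_product(z, q, factors=200):
    """Truncated Euler product prod_k (1 + (1 - q) q^k z)."""
    p = 1.0
    for k in range(factors):
        p *= 1.0 + (1.0 - q) * q ** k * z
    return p


def q_exp_zeros(q, count=5):
    """Zeros z_k = -1 / ((1 - q) q^k): a geometric progression with ratio 1/q."""
    return [-1.0 / ((1.0 - q) * q ** k) for k in range(count)]


print(q_exp_zeros(0.5))  # [-2.0, -4.0, -8.0, -16.0, -32.0]
print(big_e_q_series(0.7, 0.5), big_e_q_product(0.7, 0.5))
```

As $q \to 1$ the series tends to $e^z$ and the zeros recede to infinity, recovering the undeformed case.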

Translated: 2023-07-11 15:59:35, Published: 2023-07-08

# GP-guided MPPI for Efficient Navigation in Complex Unknown Cluttered Environments ( http://arxiv.org/abs/2307.04019v1 )

License: check the linked page

Ihab S. Mohamed, Mahmoud Ali, and Lantao Liu

Robotic navigation in unknown, cluttered environments with limited sensing capabilities poses significant challenges in robotics. Local trajectory optimization methods, such as Model Predictive Path Integral (MPPI), are a promising solution to this challenge. However, global guidance is required to ensure effective navigation, especially when encountering challenging environmental conditions or navigating beyond the planning horizon. This study presents GP-MPPI, an online learning-based control strategy that integrates MPPI with a local perception model based on a Sparse Gaussian Process (SGP). The key idea is to leverage the learning capability of SGP to construct a variance (uncertainty) surface, which enables the robot to learn about the navigable space surrounding it, identify a set of suggested subgoals, and ultimately recommend to the local MPPI planner the optimal subgoal that minimizes a predefined cost function. Afterward, MPPI computes the optimal control sequence that satisfies the robot and collision avoidance constraints. This approach eliminates the need for a global map of the environment or an offline training process. We validate the efficiency and robustness of our proposed control strategy through both simulated and real-world experiments of 2D autonomous navigation tasks in complex unknown environments, demonstrating its superiority in guiding the robot safely towards its desired goal while avoiding obstacles and escaping entrapment in local minima. The GPU implementation of GP-MPPI, including the supplementary video, is available at https://github.com/IhabMohamed/GP-MPPI.
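The MPPI step that GP-MPPI builds on samples perturbed control sequences, weights each by $\exp(-S/\lambda)$ of its rollout cost $S$, and shifts the nominal controls by the weighted mean perturbation. The sketch uses a hypothetical 1-D single integrator with a quadratic cost and omits the SGP subgoal selection entirely; `sigma`, `lam` and the 10-step horizon are illustrative values.

```python
import math
import random


def rollout_cost(x0, controls, goal=1.0, ctrl_weight=0.1):
    """Quadratic cost of a 1-D single-integrator rollout x_{t+1} = x_t + u_t."""
    x, cost = x0, 0.0
    for u in controls:
        x += u
        cost += (x - goal) ** 2 + ctrl_weight * u ** 2
    return cost


def mppi_update(x0, U, rng, n_samples=256, sigma=0.5, lam=1.0):
    """One MPPI iteration: perturb U, weight samples by exp(-cost / lambda),
    and shift U by the weighted mean perturbation."""
    noises, costs = [], []
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, sigma) for _ in U]
        noises.append(eps)
        costs.append(rollout_cost(x0, [u + e for u, e in zip(U, eps)]))
    c_min = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - c_min) / lam) for c in costs]
    z = sum(weights)
    return [u + sum(w * eps[t] for w, eps in zip(weights, noises)) / z
            for t, u in enumerate(U)]


rng = random.Random(0)
U = [0.0] * 10  # nominal control sequence over a 10-step horizon
for _ in range(30):
    U = mppi_update(0.0, U, rng)
print(rollout_cost(0.0, U), rollout_cost(0.0, [0.0] * 10))
```

In a receding-horizon controller only `U[0]` would be executed before the sequence is shifted and re-optimized; GP-MPPI additionally feeds a GP-recommended subgoal into the cost.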

Translated: 2023-07-11 15:59:16, Published: 2023-07-08

# Revisiting Cross-Lingual Summarization: A Corpus-based Study and A New Benchmark with Improved Annotation ( http://arxiv.org/abs/2307.04018v1 )

License: check the linked page

Yulong Chen, Huajian Zhang, Yijie Zhou, Xuefeng Bai, Yueguan Wang, Ming Zhong, Jianhao Yan, Yafu Li, Judy Li, Michael Zhu, Yue Zhang

Most existing cross-lingual summarization (CLS) work constructs CLS corpora by simply and directly translating pre-annotated summaries from one language to another, which can introduce errors from both the summarization and the translation processes. To address this issue, we propose ConvSumX, a cross-lingual conversation summarization benchmark, through a new annotation schema that explicitly considers source input context. ConvSumX consists of 2 sub-tasks under different real-world scenarios, each covering 3 language directions. We conduct a thorough analysis of ConvSumX and 3 widely used manually annotated CLS corpora and empirically find that ConvSumX is more faithful to the input text. Additionally, based on the same intuition, we propose a 2-Step method, which takes both the conversation and its summary as input to simulate the human annotation process. Experimental results show that the 2-Step method surpasses strong baselines on ConvSumX under both automatic and human evaluation. Analysis shows that both the source input text and the summary are crucial for modeling cross-lingual summaries.

Translated: 2023-07-11 15:58:49, Published: 2023-07-08

# Novel Pipeline for Diagnosing Acute Lymphoblastic Sensitive to Related Biomarkers ( http://arxiv.org/abs/2307.04014v1 )

License: check the linked page

Amirhossein Askari-Farsangi, Ali Sharifi-Zarchi, Mohammad Hossein Rohban

Acute Lymphoblastic Leukemia (ALL) is one of the most common types of childhood blood cancer. The quick start of the treatment process is critical to saving the patient's life, and for this reason, early diagnosis of this disease is essential. Examining the blood smear images of these patients is one of the methods used by expert doctors to diagnose this disease. Deep learning-based methods have numerous applications in medical fields, as they have significantly advanced in recent years. ALL diagnosis is not an exception in this field, and several machine learning-based methods for this problem have been proposed. In previous methods, high diagnostic accuracy was reported, but our work showed that this alone is not sufficient, as it can lead to models taking shortcuts and not making meaningful decisions. This issue arises due to the small size of medical training datasets. To address this, we constrained our model to follow a pipeline inspired by experts' work. We also demonstrated that, since a judgement based on only one image is insufficient, redefining the problem as a multiple-instance learning problem is necessary for achieving a practical result. Our model is the first to provide a solution to this problem in a multiple-instance learning setup. We introduced a novel pipeline for diagnosing ALL that approximates the process used by hematologists, is sensitive to disease biomarkers, and achieves an accuracy of 96.15%, an F1-score of 94.24%, a sensitivity of 97.56%, and a specificity of 90.91% on ALL IDB 1. Our method was further evaluated on an out-of-distribution dataset, which posed a challenging test and had acceptable performance. Notably, our model was trained on a relatively small dataset, highlighting the potential for our approach to be applied to other medical datasets with limited data availability.
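Recasting diagnosis as multiple-instance learning means that a patient (bag) receives one label from many cell images (instances). The sketch uses max-pooling over hypothetical per-image probabilities, a common MIL baseline that may differ from the paper's aggregation, and computes the kinds of metrics quoted above from confusion counts. The F1 here is the positive-class F1; the paper may average differently.

```python
def bag_prediction(instance_probs, threshold=0.5):
    """MIL max-pooling: a bag is positive if any instance is confidently positive."""
    return int(max(instance_probs) >= threshold)


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and positive-class F1 from labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical per-image probabilities for five patients.
bags = [[0.1, 0.7, 0.2], [0.2, 0.3], [0.6], [0.4, 0.45], [0.05, 0.1]]
y_true = [1, 0, 1, 1, 0]
y_pred = [bag_prediction(b) for b in bags]
print(y_pred, binary_metrics(y_true, y_pred))
```

The fourth patient is missed because no single image crosses the threshold, which illustrates why a patient-level judgement needs more than any one image.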

Translated: 2023-07-11 15:58:29, Published: 2023-07-08

# BPNet: Bézier Primitive Segmentation on 3D Point Clouds ( http://arxiv.org/abs/2307.04013v1 )

License: check the linked page

Rao Fu, Cheng Wen, Qian Li, Xiao Xiao, Pierre Alliez

This paper proposes BPNet, a novel end-to-end deep learning framework that learns Bézier primitive segmentation on 3D point clouds. Existing works treat different primitive types separately, which limits them to finite shape categories. To address this issue, we seek a generalized primitive segmentation on point clouds. Taking inspiration from Bézier decomposition of NURBS models, we use it to guide point cloud segmentation without relying on predefined primitive types. A joint optimization framework is proposed to learn Bézier primitive segmentation and geometric fitting simultaneously on a cascaded architecture. Specifically, we introduce a soft voting regularizer to improve primitive segmentation and propose an auto-weight embedding module to cluster point features, making the network more robust and generic. We also introduce a reconstruction module with which we successfully process multiple CAD models with different primitives simultaneously. We conducted extensive experiments on the synthetic ABC dataset and real-scan datasets to validate our approach and compare it with different baseline methods. Experiments show superior segmentation performance over previous work, with substantially faster inference.
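Bézier primitives are evaluated with de Casteljau's algorithm, and a tensor-product patch applies it first along one parameter and then along the other. The control grid below is a hypothetical bi-quadratic bump, included only to show what a fitted Bézier primitive represents, not how BPNet predicts one.

```python
def de_casteljau(points, t):
    """Evaluate a Bezier curve with control points (tuples) at parameter t."""
    pts = [tuple(p) for p in points]
    while len(pts) > 1:
        # Repeated linear interpolation between consecutive control points.
        pts = [tuple((1 - t) * a + t * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    return pts[0]


def bezier_patch(grid, u, v):
    """Tensor-product Bezier patch: evaluate each row at u, then the results at v."""
    row_points = [de_casteljau(row, u) for row in grid]
    return de_casteljau(row_points, v)


# Hypothetical bi-quadratic patch (3 x 3 control points) forming a bump.
grid = [
    [(0, 0, 0), (1, 0, 0), (2, 0, 0)],
    [(0, 1, 0), (1, 1, 2), (2, 1, 0)],
    [(0, 2, 0), (1, 2, 0), (2, 2, 0)],
]
print(bezier_patch(grid, 0.5, 0.5))  # (1.0, 1.0, 0.5)
```

The raised centre control point pulls the surface only to height 0.5, since a Bézier patch interpolates its corner control points but merely approximates the interior ones.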

Translated: 2023-07-11 15:57:57, Published: 2023-07-08

# Robust Learning-Based Incipient Slip Detection using the PapillArray Optical Tactile Sensor for Improved Robotic Gripping ( http://arxiv.org/abs/2307.04011v1 )

License: check the linked page

Qiang Wang, Pablo Martinez Ulloa, Robert Burke, David Cordova Bulens, and Stephen J. Redmond

The ability to detect slip, particularly incipient slip, enables robotic systems to take corrective measures to prevent a grasped object from being dropped. Therefore, slip detection can enhance the overall safety of robotic gripping. However, accurately detecting incipient slip remains a significant challenge. In this paper, we propose a novel learning-based approach to detect incipient slip using the PapillArray (Contactile, Australia) tactile sensor. The resulting model is highly effective in identifying patterns associated with incipient slip, achieving a detection success rate of 95.6% when tested with an offline dataset. Furthermore, we introduce several data augmentation methods to enhance the robustness of our model. When transferring the trained model to a robotic gripping environment distinct from where the training data was collected, our model maintained robust performance, with a success rate of 96.8%, providing timely feedback for stabilizing several practical gripping tasks. Project website: https://sites.google.com/view/incipient-slip-detection.

Translated: 2023-07-11 15:57:38, Published: 2023-07-08

# Contextual Dynamic Pricing with Strategic Buyers ( http://arxiv.org/abs/2307.04055v1 )

License: check the linked page

Pangpang Liu, Zhuoran Yang, Zhaoran Wang, Will Wei Sun

Personalized pricing, which tailors prices to individual characteristics, is commonly used by firms to implement a consumer-specific pricing policy. In this process, buyers can also strategically manipulate their feature data to obtain a lower price, incurring certain manipulation costs. Such strategic behavior can hinder firms from maximizing their profits. In this paper, we study the contextual dynamic pricing problem with strategic buyers. The seller does not observe the buyer's true feature, only a feature manipulated according to the buyer's strategic behavior. In addition, the seller does not observe the buyer's valuation of the product, but only a binary response indicating whether a sale happens. Recognizing these challenges, we propose a strategic dynamic pricing policy that incorporates the buyers' strategic behavior into online learning to maximize the seller's cumulative revenue. We first prove that existing non-strategic pricing policies that neglect the buyers' strategic behavior incur a linear $\Omega(T)$ regret, where $T$ is the total time horizon, indicating that these policies are no better than a random pricing policy. We then establish that our proposed policy achieves a sublinear regret upper bound of $O(\sqrt{T})$. Importantly, our policy is not a mere amalgamation of existing dynamic pricing policies and strategic behavior handling algorithms. Our policy can also accommodate the scenario in which the marginal cost of manipulation is unknown in advance. To account for it, we simultaneously estimate the valuation parameter and the cost parameter in the online pricing policy, which is shown to also achieve an $O(\sqrt{T})$ regret bound. Extensive experiments support our theoretical developments and demonstrate the superior performance of our policy compared to other pricing policies that are unaware of the strategic behaviors.
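To see how strategic behavior distorts the observed feature, assume a linear price $\beta^\top \tilde{x}$ and a quadratic manipulation cost $\tfrac{c}{2}\lVert \tilde{x} - x\rVert^2$. Setting the gradient $\beta + c(\tilde{x} - x)$ to zero gives the buyer's best response $\tilde{x} = x - \beta/c$, which a seller who knows $\beta$ and $c$ can invert. This cost model and the numbers are assumptions for illustration, not necessarily the paper's exact setup.

```python
def best_response(x_true, beta, c):
    """Reported feature minimizing beta . x_rep + (c / 2) * ||x_rep - x_true||^2."""
    return [xi - bi / c for xi, bi in zip(x_true, beta)]


def undo_manipulation(x_rep, beta, c):
    """Seller-side correction when beta and c are known: x_true = x_rep + beta / c."""
    return [xi + bi / c for xi, bi in zip(x_rep, beta)]


def price(x, beta):
    """Hypothetical linear pricing rule."""
    return sum(b * xi for b, xi in zip(beta, x))


beta = [2.0, 1.0]   # hypothetical pricing coefficients
x_true = [1.0, 3.0]
c = 4.0             # hypothetical marginal cost of manipulation
x_rep = best_response(x_true, beta, c)
print(x_rep, price(x_true, beta) - price(x_rep, beta))  # [0.5, 2.75] 1.25
```

The price drop equals $\lVert\beta\rVert^2/c = 5/4$; when $c$ is unknown, as in the abstract's harder setting, the seller has to estimate it online before this correction is possible.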

Translated: 2023-07-11 15:50:00, Published: 2023-07-08

# Deep Unsupervised Learning Using Spike-Timing-Dependent Plasticity ( http://arxiv.org/abs/2307.04054v1 )

License: check the linked page

Sen Lu, Abhronil Sengupta

Spike-Timing-Dependent Plasticity (STDP) is an unsupervised learning mechanism for Spiking Neural Networks (SNNs) that has received significant attention from the neuromorphic hardware community. However, scaling such local learning techniques to deeper networks and large-scale tasks has remained elusive. In this work, we investigate a Deep-STDP framework where a convolutional network is trained in tandem with pseudo-labels generated by the STDP clustering process on the network outputs. We achieve $24.56\%$ higher accuracy and $3.5\times$ faster convergence speed at iso-accuracy on a 10-class subset of the Tiny ImageNet dataset in contrast to a $k$-means clustering approach.
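For readers new to STDP, the sketch below shows the classic pair-based local rule. It is the textbook form, not the paper's Deep-STDP framework. A synapse is strengthened when the presynaptic spike precedes the postsynaptic one and weakened otherwise, and the magnitude decays exponentially with the timing gap. The amplitudes and time constants are illustrative assumptions.

```python
import math

def stdp_delta_w(dt_ms: float, a_plus: float = 0.01, a_minus: float = 0.012,
                 tau_plus: float = 20.0, tau_minus: float = 20.0) -> float:
    """Pair-based STDP weight change for dt = t_post - t_pre (milliseconds)."""
    if dt_ms > 0:   # pre fires before post: potentiation
        return a_plus * math.exp(-dt_ms / tau_plus)
    if dt_ms < 0:   # post fires before pre: depression
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0      # simultaneous spikes: no change in this simple variant

def apply_stdp(w: float, dt_ms: float, w_min: float = 0.0, w_max: float = 1.0) -> float:
    """Update one weight and clip it to [w_min, w_max]."""
    return min(w_max, max(w_min, w + stdp_delta_w(dt_ms)))

print(stdp_delta_w(5.0) > 0, stdp_delta_w(-5.0) < 0)
```

Because the update uses only the two spike times of one synapse, it is local. The abstract notes that this locality is what makes such rules hard to scale to deep networks.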

翻訳日:2023-07-11 15:49:25 公開日:2023-07-08

# シンガポールのオンライン上で父親像はどのように描かれているか

How is Fatherhood Framed Online in Singapore? ( http://arxiv.org/abs/2307.04053v1 )

ライセンス: Link先を確認

Tran Hien Van, Abhay Goyal, Muhammad Siddique, Lam Yin Cheung, Nimay Parekh, Jonathan Y Huang, Keri McCrickerd, Edson C Tandoc Jr., Gerard Chung, Navin Kumar

(参考訳) シンガポールにおける父親であることに関する議論の高まりはその重要性を示しており、父親像がどのように枠付け(フレーミング)されているかを探り、シンガポールにおける父親に関する政策立案に役立てる必要がある。シンガポールにおける父親に関する健全で包括的な政策は、親であることに伴う偏見や不安を軽減し、低迷する出生率の改善にとって重要となりうる。我々は15,705件の記事と56,221件の投稿を分析し、シンガポールのさまざまなオンラインプラットフォーム(ニュースメディア、育児フォーラム、Twitter)において父親像がどのように描かれているかを調査した。これらの違いを理解するためにNLP手法を用いた。シンガポールのオンライン環境では父親像はさまざまな形で描かれていたが、父親がシンガポールの家族単位の中心として描かれているとは見受けられなかった。本研究の強みは、適用した複数の手法が互いを検証し合っている点にある。

The proliferation of discussion about fatherhood in Singapore attests to its significance, indicating the need for an exploration of how fatherhood is framed, aiding policy-making around fatherhood in Singapore. Sound and holistic policy around fatherhood in Singapore may reduce stigma and apprehension around being a parent, critical to improving the nation's flagging birth rate. We analyzed 15,705 articles and 56,221 posts to study how fatherhood is framed in Singapore across a range of online platforms (news outlets, parenting forums, Twitter). We used NLP techniques to understand these differences. While fatherhood was framed in a range of ways on the Singaporean online environment, it did not seem that fathers were framed as central to the Singaporean family unit. A strength of our work is how the different techniques we have applied validate each other.

翻訳日:2023-07-11 15:49:13 公開日:2023-07-08

# 分子のための補助データセットのグループ化の学習

Learning to Group Auxiliary Datasets for Molecule ( http://arxiv.org/abs/2307.04052v1 )

ライセンス: Link先を確認

Tinglin Huang, Ziniu Hu, Rex Ying

(参考訳) 小規模な分子データセットではアノテーションが限られており、機械学習モデルにとって課題となる。これに対処する一般的な戦略は、追加の補助データセットを併用することである。しかし、データを増やしても必ずしも改善が保証されるわけではない。ターゲットデータセットの知識が補助分子データセットの知識と異なる、あるいは矛盾する場合には、負の転移が生じうる。このことから、共同訓練時にターゲットデータセットに利益をもたらす補助分子データセットを特定することは、依然として重要かつ未解決の問題である。経験的分析により、グラフ構造の類似性とタスクの類似性を組み合わせることが、高い親和性をもつ補助データセットを特定するうえでより信頼性の高い指標となることを確認した。この知見に基づき、データセット親和性をタスク親和性と構造親和性に分離して、各補助分子データセットの潜在的な利益を予測するMolGroupを提案する。MolGroupは、二段階最適化(bi-level optimization)フレームワークによって最適化されるルーティング機構を用いてこれを実現する。メタ勾配を利用して、ルーティング機構はターゲットデータセットの性能を最大化するように最適化され、親和性をゲーティングスコアとして定量化する。その結果、MolGroupは各ターゲットデータセットに対する補助データセットの最適な組み合わせを予測できる。11種類のターゲット分子データセットでの広範な実験により、MolGroupの効率性と有効性が示され、MolGroupが選択した分子データセット群で訓練したGIN/Graphormerは平均4.41%/3.47%の改善を示した。

The limited availability of annotations in small molecule datasets presents a challenge to machine learning models. To address this, one common strategy is to collaborate with additional auxiliary datasets. However, having more data does not always guarantee improvements. Negative transfer can occur when the knowledge in the target dataset differs or contradicts that of the auxiliary molecule datasets. In light of this, identifying the auxiliary molecule datasets that can benefit the target dataset when jointly trained remains a critical and unresolved problem. Through an empirical analysis, we observe that combining graph structure similarity and task similarity can serve as a more reliable indicator for identifying high-affinity auxiliary datasets. Motivated by this insight, we propose MolGroup, which separates the dataset affinity into task and structure affinity to predict the potential benefits of each auxiliary molecule dataset. MolGroup achieves this by utilizing a routing mechanism optimized through a bi-level optimization framework. Empowered by the meta gradient, the routing mechanism is optimized toward maximizing the target dataset's performance and quantifies the affinity as the gating score. As a result, MolGroup is capable of predicting the optimal combination of auxiliary datasets for each target dataset. Our extensive experiments demonstrate the efficiency and effectiveness of MolGroup, showing an average improvement of 4.41%/3.47% for GIN/Graphormer trained with the group of molecule datasets selected by MolGroup on 11 target molecule datasets.
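The abstract describes MolGroup as turning task affinity and structure affinity into a gating score. The sketch below shows only that final selection step, and it is a simplification: the weights are fixed by hand rather than learned through the bi-level meta-gradient optimization the paper uses. The affinity values, dataset names and 0.5 threshold are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

def gating_score(task_aff: float, struct_aff: float,
                 w_task: float = 1.0, w_struct: float = 1.0, bias: float = 0.0) -> float:
    """Sigmoid gate over a weighted sum of the two affinities (weights assumed, not learned)."""
    z = w_task * task_aff + w_struct * struct_aff + bias
    return 1.0 / (1.0 + math.exp(-z))

def select_auxiliary(affinities: Dict[str, Tuple[float, float]],
                     threshold: float = 0.5) -> List[str]:
    """Keep auxiliary datasets whose gate exceeds the threshold, highest gate first."""
    scores = {name: gating_score(t, s) for name, (t, s) in affinities.items()}
    kept = [name for name, g in scores.items() if g > threshold]
    return sorted(kept, key=lambda name: -scores[name])

# Hypothetical (task affinity, structure affinity) pairs for three auxiliary datasets.
aux = {"aux_a": (0.8, 0.6), "aux_b": (-0.9, 0.2), "aux_c": (0.1, 0.4)}
print(select_auxiliary(aux))   # ['aux_a', 'aux_c']
```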

翻訳日:2023-07-11 15:48:58 公開日:2023-07-08

# トラックサービスネットワークにおける動的負荷計画のための最適化に基づく学習

Optimization-based Learning for Dynamic Load Planning in Trucking Service Networks ( http://arxiv.org/abs/2307.04050v1 )

ライセンス: Link先を確認

Ritesh Ojha, Wenbo Chen, Hanyu Zhang, Reem Khir, Alan Erera, Pascal Van Hentenryck

(参考訳) 負荷計画問題は、小包輸送業者のサービスネットワーク設計における重要な課題であり、端末ペア間で時間ごとに配車するトレーラー(またはロード)の数を決定する。もう一つの重要な課題は、小包の量を計画されたロードにどのように割り当てるかを指定するフロープランの決定である。本論文では、運用日より前の期間に需要予測が変化するのに合わせてロードとフローを調整するために、フロー計画と負荷計画の課題を同時に考慮する動的負荷計画問題(DLPP)を扱う。本論文は、ネットワーク全体の端末でこれらの意思決定を行うプランナーを支援する意思決定支援ツールの開発を目指す。本論文はDLPPをMIPとして定式化し、各品目を主経路と代替経路で輸送できるネットワークでは、この問題が多数の対称性をもつことを示す。その結果、最適化ソルバーは密接に関連する問題に対して根本的に異なる解を返すことがあり、プランナーを混乱させ、最適化への信頼を損なう。この制約を解消するため、参照計画に近い最適解を生成することで対称性を除去する目標指向最適化(Goal-Directed Optimization)を提案する。また、最適化モデルの計算上の課題に対処するための最適化プロキシを提案する。このプロキシは機械学習モデルと実行可能性回復モデルを組み合わせ、ループ内のプランナーが課すリアルタイム制約を満たす解を求める。産業インスタンスを用いた広範な計算実験により、最適化プロキシは同品質の解を得るうえで商用ソルバーより約10倍高速であり、互いに整合的な解を生成するうえでは桁違いに高速であることが示された。提案手法は、ロード統合におけるDLPPの利点と、機械学習と最適化の組み合わせによって得られる大幅なコスト削減も示している。

The load planning problem is a critical challenge in service network design for parcel carriers: it decides how many trailers (or loads) to assign for dispatch over time between pairs of terminals. Another key challenge is to determine a flow plan, which specifies how parcel volumes are assigned to planned loads. This paper considers the Dynamic Load Planning Problem (DLPP) that considers both flow and load planning challenges jointly to adjust loads and flows as the demand forecast changes over time before the day of operations. The paper aims at developing a decision-support tool to inform planners making these decisions at terminals across the network. The paper formulates the DLPP as a MIP and shows that it admits a large number of symmetries in a network where each commodity can be routed through primary and alternate paths. As a result, an optimization solver may return fundamentally different solutions to closely related problems, confusing planners and reducing trust in optimization. To remedy this limitation, the paper proposes a Goal-Directed Optimization that eliminates those symmetries by generating optimal solutions staying close to a reference plan. The paper also proposes an optimization proxy to address the computational challenges of the optimization models. The proxy combines a machine learning model and a feasibility restoration model and finds solutions that satisfy real-time constraints imposed by planners-in-the-loop. An extensive computational study on industrial instances shows that the optimization proxy is around 10 times faster than the commercial solver in obtaining the same quality solutions and orders of magnitude faster for generating solutions that are consistent with each other. The proposed approach also demonstrates the benefits of the DLPP for load consolidation, and the significant savings obtained from combining machine learning and optimization.
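The symmetry problem and the goal-directed fix can be seen in a tiny, deliberately simplified example. It is a brute-force toy, not the paper's MIP or its proxy. One commodity's volume is split between a primary and an alternate path, and each path needs ceil(volume / capacity) trailers. Many splits tie at the minimum trailer count. Ranking ties by distance to a reference plan picks one of them consistently. The single-commodity setup and all parameter values are assumptions.

```python
import math
from typing import Optional, Tuple

def trailers(volume: int, capacity: int) -> int:
    """Trailers needed to carry a volume on one path."""
    return math.ceil(volume / capacity)

def goal_directed_split(total: int, capacity: int, reference_primary: int) -> Tuple[int, int]:
    """Return (primary, alternate): minimize trailers first, then distance to the reference."""
    best: Optional[Tuple[Tuple[int, int], Tuple[int, int]]] = None
    for primary in range(total + 1):
        alternate = total - primary
        cost = trailers(primary, capacity) + trailers(alternate, capacity)
        key = (cost, abs(primary - reference_primary))   # lexicographic: cost, then deviation
        if best is None or key < best[0]:
            best = (key, (primary, alternate))
    return best[1]

# 100 units, 30 per trailer: every optimal split uses 4 trailers, so the reference breaks the tie.
print(goal_directed_split(100, 30, 45))   # (45, 55)
print(goal_directed_split(100, 30, 62))   # (60, 40): 62/38 would need 5 trailers
```

A plain minimum-cost solver could return any of the tied splits. The lexicographic key is what keeps successive plans close to each other.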

翻訳日:2023-07-11 15:48:37 公開日:2023-07-08

# 並列アルゴリズムはニューラル実行と整合する

Parallel Algorithms Align with Neural Execution ( http://arxiv.org/abs/2307.04049v1 )

ライセンス: Link先を確認

Valerie Engelmayer, Dobrik Georgiev, Petar Veličković

(参考訳) ニューラルアルゴリズム推論器は並列プロセッサである。逐次アルゴリズムを教えることはこの性質に反し、計算のかなりの部分を冗長にしてしまう。一方、並列アルゴリズムはその計算能力を最大限に活用でき、そのため実行に必要な層が少なくて済む。CLRSフレームワーク上で、探索、ソート、強連結成分の検出の並列実装を逐次版と比較したところ、これにより訓練時間が大幅に短縮されることが確認された。さらに、ほとんどの場合、並列版は予測性能でも大きく上回った。

Neural algorithmic reasoners are parallel processors. Teaching them sequential algorithms contradicts this nature, rendering a significant share of their computations redundant. Parallel algorithms, however, may exploit their full computational power and therefore require fewer layers to be executed. This drastically reduces training times, as we observe when comparing parallel implementations of searching, sorting and finding strongly connected components to their sequential counterparts on the CLRS framework. Additionally, parallel versions achieve strongly superior predictive performance in most cases.
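The claim that parallel algorithms need fewer executed layers comes down to step counts. The sketch below compares a linear scan, which takes one step per element, with a compare-everything-then-OR-reduce search, which halves the partial results each round and so needs ceil(log2 n) rounds. It illustrates the counting argument only and does not model the CLRS processors. Treating one reduction round as one layer is an assumption.

```python
from typing import List, Tuple

def sequential_search(xs: List[int], target: int) -> Tuple[bool, int]:
    """Scan one element per step; returns (found, steps)."""
    steps = 0
    for x in xs:
        steps += 1
        if x == target:
            return True, steps
    return False, steps

def parallel_search(xs: List[int], target: int) -> Tuple[bool, int]:
    """Compare all elements at once, then OR-reduce pairwise; returns (found, rounds).

    The initial all-at-once comparison is not counted; each reduction round
    halves the number of partial results, so rounds = ceil(log2(n)).
    """
    flags = [x == target for x in xs]
    rounds = 0
    while len(flags) > 1:
        flags = [any(flags[i:i + 2]) for i in range(0, len(flags), 2)]
        rounds += 1
    return (flags[0] if flags else False), rounds

data = list(range(16))
print(sequential_search(data, 15))   # (True, 16)
print(parallel_search(data, 15))     # (True, 4)
```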

翻訳日:2023-07-11 15:48:07 公開日:2023-07-08

# キャリブレーションを考慮したマージン損失:深層距離学習における精度とキャリブレーション一貫性のパレートフロンティアの押し上げ

Calibration-Aware Margin Loss: Pushing the Accuracy-Calibration Consistency Pareto Frontier for Deep Metric Learning ( http://arxiv.org/abs/2307.04047v1 )

ライセンス: Link先を確認

Qin Zhang, Linghan Xu, Qingming Tang, Jun Fang, Ying Nian Wu, Joe Tighe, Yifan Xing

(参考訳) 異なるテストクラスや分布にわたって同一の距離しきい値を使えることは、商用画像検索システムを摩擦なく展開するうえで非常に望ましい。しかし、最先端の深層距離学習の損失関数は、クラス内およびクラス間の埋め込み構造を大きくばらつかせることが多く、実際にはしきい値のキャリブレーションが容易ではない。本稿では、目標とするキャリブレーション範囲において異なるクラス間の動作特性の分散を測定するOPIS(Operating-Point-Inconsistency-Score)と呼ばれる新しい指標を提案し、距離学習埋め込みモデルの精度が高くても、既知クラスと未知クラスのいずれについてもキャリブレーション一貫性が保証されないことを示す。高精度領域では、精度の向上がキャリブレーション一貫性を犠牲にして得られるパレートフロンティアが存在することが分かった。これに対処するため、訓練中にクラス間で表現構造の一様性を促す、Calibration-Aware Margin(CAM)損失と呼ばれる新しい正則化を開発した。広範な実験により、CAMが精度を維持あるいは向上させつつキャリブレーション一貫性を改善し、最先端の深層距離学習手法を上回ることが示された。

The ability to use the same distance threshold across different test classes / distributions is highly desired for a frictionless deployment of commercial image retrieval systems. However, state-of-the-art deep metric learning losses often result in highly varied intra-class and inter-class embedding structures, making threshold calibration a non-trivial process in practice. In this paper, we propose a novel metric named Operating-Point-Inconsistency-Score (OPIS) that measures the variance in the operating characteristics across different classes in a target calibration range, and demonstrate that high accuracy of a metric learning embedding model does not guarantee calibration consistency for both seen and unseen classes. We find that, in the high-accuracy regime, there exists a Pareto frontier where accuracy improvement comes at the cost of calibration consistency. To address this, we develop a novel regularization, named Calibration-Aware Margin (CAM) loss, to encourage uniformity in the representation structures across classes during training. Extensive experiments demonstrate CAM's effectiveness in improving calibration-consistency while retaining or even enhancing accuracy, outperforming state-of-the-art deep metric learning methods.
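To make "variance in the operating characteristics across classes" concrete, here is a simplified stand-in for OPIS. It is an assumption, not the paper's exact definition. For each threshold in a calibration range, it computes each class's true-accept rate on same-class pair distances, takes the variance across classes, and averages over thresholds. The class names and distances are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List

def true_accept_rate(pos_distances: List[float], threshold: float) -> float:
    """Fraction of same-class pairs whose embedding distance is at or below the threshold."""
    return sum(d <= threshold for d in pos_distances) / len(pos_distances)

def opis_like(per_class_pos: Dict[str, List[float]], thresholds: List[float]) -> float:
    """Mean over thresholds of the across-class variance of the true-accept rate (simplified)."""
    return mean(
        pvariance([true_accept_rate(d, t) for d in per_class_pos.values()])
        for t in thresholds
    )

# Two hypothetical classes: at threshold 0.3, class "a" accepts everything and "b" nothing.
print(opis_like({"a": [0.1, 0.2], "b": [0.5, 0.6]}, [0.3]))   # 0.25
```

A score of 0 means a single global threshold behaves identically for every class, which is the kind of uniformity the abstract says the CAM loss encourages.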

翻訳日:2023-07-11 15:47:58 公開日:2023-07-08

# 敵対的訓練による非パラメトリック回帰のための深層ニューラルネットワーク推定量のsupノルム収束

Sup-Norm Convergence of Deep Neural Network Estimator for Nonparametric Regression by Adversarial Training ( http://arxiv.org/abs/2307.04042v1 )

ライセンス: Link先を確認

Masaaki Imaizumi

(参考訳) 新しい敵対的訓練スキームによる深層ニューラルネットワーク推定量のsupノルム収束を示す。非パラメトリック回帰問題において、深層ニューラルネットワークを用いた推定量は$L_2$ノルムの意味でより良い性能を達成できることが示されている。一方、ニューラルネットワークモデルの深い構造のため、最小二乗法によるニューラル推定量がsupノルム収束を達成することは困難である。本研究では、敵対的訓練スキームを開発し、深層ニューラルネットワーク推定量のsupノルム収束を調べる。第一に、通常の敵対的訓練ではニューラル推定量が一致性を失うことを見いだす。第二に、提案する補正付き敵対的訓練により、深層ニューラルネットワーク推定量がsupノルムの意味で最適なレートを達成することを示す。さらに、損失関数とデータ生成関数の一般的な設定へ敵対的訓練を拡張する。実験結果は理論的な知見を裏付けている。

We show the sup-norm convergence of deep neural network estimators with a novel adversarial training scheme. For the nonparametric regression problem, it has been shown that an estimator using deep neural networks can achieve better performances in the sense of the $L_2$-norm. In contrast, it is difficult for the neural estimator with least-squares to achieve the sup-norm convergence, due to the deep structure of neural network models. In this study, we develop an adversarial training scheme and investigate the sup-norm convergence of deep neural network estimators. First, we find that ordinary adversarial training makes neural estimators inconsistent. Second, we show that a deep neural network estimator achieves the optimal rate in the sense of the sup-norm by the proposed adversarial training with correction. We extend our adversarial training to general setups of a loss function and a data-generating function. Our experiments support the theoretical findings.
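Why is sup-norm convergence the stronger guarantee? An estimate can have a small $L_2$ error while still being badly wrong somewhere. The grid computation below shows this for a hypothetical estimate that matches sin(2πx) everywhere except in a narrow spike of height 1. It illustrates the two norms only and does not reproduce the paper's estimator or training scheme.

```python
import math
from typing import Callable, Tuple

def grid_errors(f: Callable[[float], float], g: Callable[[float], float],
                n: int = 10_000) -> Tuple[float, float]:
    """Approximate L2 and sup-norm distances between f and g on [0, 1] with a uniform grid."""
    xs = [i / (n - 1) for i in range(n)]
    diffs = [f(x) - g(x) for x in xs]
    l2 = math.sqrt(sum(d * d for d in diffs) / n)
    sup = max(abs(d) for d in diffs)
    return l2, sup

def truth(x: float) -> float:
    return math.sin(2 * math.pi * x)

def spiky_estimate(x: float) -> float:
    """Exact except for a spike of height 1 on a window of width 0.01 around x = 0.5."""
    return truth(x) + (1.0 if abs(x - 0.5) < 0.005 else 0.0)

l2, sup = grid_errors(truth, spiky_estimate)
print(f"L2 error ~ {l2:.3f}, sup-norm error ~ {sup:.3f}")
```

Shrinking the spike's width drives the $L_2$ error toward zero while the sup-norm error stays at 1. This is why $L_2$ rates alone do not control worst-case pointwise error.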

翻訳日:2023-07-11 15:47:34 公開日:2023-07-08

# 局所的説明による人間と畳み込みニューラルネットワーク間の直接フィードバックループの設計

Designing a Direct Feedback Loop between Humans and Convolutional Neural Networks through Local Explanations ( http://arxiv.org/abs/2307.04036v1 )

ライセンス: Link先を確認

Tong Steven Sun, Yuyang Gao, Shubham Khaladkar, Sijia Liu, Liang Zhao, Young-Ho Kim, Sungsoo Ray Hong

(参考訳) 局所的説明は、畳み込みニューラルネットワーク(CNN)がどのように出力を導くかを説明するために、画像上のヒートマップを提供する。視覚的にわかりやすいため、この手法はCNNを診断するための最も一般的な説明可能AI(XAI)手法の一つとなっている。しかし、形成的研究(S1)を通じて、MLエンジニアが局所的説明に対して相反する見方を持つことを捉えた。すなわち、CNN構築において価値があり不可欠なものである一方、脆弱性検出のヒューリスティックな性質ゆえに疲弊させるプロセスでもある。さらに、診断から得られた脆弱性に基づいてCNNを修正(ステアリング)することは非常に困難であった。このギャップを埋めるため、局所的説明を用いてCNNの脆弱性を診断・修正する際に、ユーザとCNNの間の直接フィードバックループを実現する初のインタラクティブデザインであるDeepFuseを設計した。DeepFuseは、CNNエンジニアが「不合理な」局所的説明を体系的に検索し、不合理と判断されたものに対して新たな境界を労力効率よくアノテーションするのを支援する。次に、与えられたアノテーションに基づいてモデルをステアリングし、モデルが同様の誤りを犯さないようにする。12名の経験豊富なCNNエンジニアを対象に2日間の研究(S2)を実施した。DeepFuseを使うことで、参加者は現在の最先端手法よりも正確で「合理的」なモデルを作成した。また参加者は、DeepFuseがケースベース推論を導く方法によって現在の実務が実際に改善されうると感じた。将来のHCI駆動の設計がどのように実務を前進させ、XAI駆動の洞察をより実用的なものにできるかを説明する設計上の示唆を提供する。

The local explanation provides heatmaps on images to explain how Convolutional Neural Networks (CNNs) derive their output. Due to its visual straightforwardness, the method has been one of the most popular explainable AI (XAI) methods for diagnosing CNNs. Through our formative study (S1), however, we captured ML engineers' ambivalent perspective on the local explanation: a valuable and indispensable aid in building CNNs, yet a process that exhausts them due to the heuristic nature of detecting vulnerability. Moreover, steering the CNNs based on the vulnerability learned from the diagnosis seemed highly challenging. To close this gap, we designed DeepFuse, the first interactive design that realizes the direct feedback loop between a user and CNNs in diagnosing and revising CNN's vulnerability using local explanations. DeepFuse helps CNN engineers to systematically search "unreasonable" local explanations and annotate the new boundaries for those identified as unreasonable in a labor-efficient manner. Next, it steers the model based on the given annotation such that the model doesn't introduce similar mistakes. We conducted a two-day study (S2) with 12 experienced CNN engineers. Using DeepFuse, participants made a more accurate and "reasonable" model than the current state-of-the-art. Also, participants found the way DeepFuse guides case-based reasoning can practically improve their current practice. We provide implications for design that explain how future HCI-driven design can move our practice forward to make XAI-driven insights more actionable.

翻訳日:2023-07-11 15:47:22 公開日:2023-07-08

# 変分量子アルゴリズムにおけるショット数最小化のための新しいフレームワーク

A novel framework for Shot number minimization in Quantum Variational Algorithms ( http://arxiv.org/abs/2307.04035v1 )

ライセンス: Link先を確認

Seyed Sajad Kahani and Amin Nobakhti

(参考訳) 変分量子アルゴリズム(VQA)は、近い将来、様々な量子コンピューティングアプリケーションに対する潜在的な解決策として注目されている。しかし、これらのアルゴリズムを量子デバイスに実装するには、しばしばかなりの量の測定が必要であり、結果として時間とリソース集約プロセスが生じる。本稿では,VQAにおけるショット評価の削減を目的とした最適化アルゴリズムの一般化フレームワークを提案する。提案するフレームワークは,推定器と最適化器を組み合わせたものである。本フレームワーク内の2つのケーススタディについて検討する。第1のケースでは,サンプル平均推定器と模擬焼鈍最適化器をペアリングし,第2のケースでは再帰的推定器と勾配降下最適化器を組み合わせる。いずれの場合も,提案手法が従来の手法と比較して顕著な性能向上をもたらすことを示す。

Variational Quantum Algorithms (VQAs) have gained significant attention as a potential solution for various quantum computing applications in the near term. However, implementing these algorithms on quantum devices often necessitates a substantial number of measurements, resulting in time-consuming and resource-intensive processes. This paper presents a generalized framework for optimization algorithms aiming to reduce the number of shot evaluations in VQAs. The proposed framework combines an estimator and an optimizer. We investigate two specific case studies within this framework. In the first case, we pair a sample mean estimator with a simulated annealing optimizer, while in the second case, we combine a recursive estimator with a gradient descent optimizer. In both instances, we demonstrate that our proposed approach yields notable performance enhancements compared to conventional methods.
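The first case study pairs a sample-mean estimator with a simulated-annealing optimizer. The toy below shows how the two interact and how the shot cost accumulates. Its cost function is a single-qubit expectation ⟨Z⟩ = cos θ, estimated from ±1 single-shot outcomes. Every evaluation uses a fixed number of shots, so the sketch only counts the budget that the paper's framework aims to reduce. It does not implement that reduction. The cooling schedule, step size and shot count are assumptions.

```python
import math
import random
from typing import Tuple

def measure(theta: float, shots: int, rng: random.Random) -> float:
    """Sample-mean estimate of <Z> = cos(theta) from +/-1 single-shot outcomes."""
    p_plus = (1.0 + math.cos(theta)) / 2.0
    return sum(1 if rng.random() < p_plus else -1 for _ in range(shots)) / shots

def anneal(steps: int = 200, shots_per_eval: int = 50, t0: float = 1.0,
           seed: int = 0) -> Tuple[float, int]:
    """Simulated annealing on the noisy estimate; returns (theta in [0, 2*pi], total shots)."""
    rng = random.Random(seed)
    theta = rng.uniform(0, 2 * math.pi)
    energy = measure(theta, shots_per_eval, rng)
    total_shots = shots_per_eval
    for k in range(steps):
        temp = t0 * (1 - k / steps) + 1e-3          # linear cooling, kept strictly positive
        cand = theta + rng.gauss(0, 0.3)            # random local proposal
        cand_energy = measure(cand, shots_per_eval, rng)
        total_shots += shots_per_eval
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if cand_energy < energy or rng.random() < math.exp((energy - cand_energy) / temp):
            theta, energy = cand, cand_energy
    return theta % (2 * math.pi), total_shots

theta_hat, used = anneal()
print(f"theta ~ {theta_hat:.2f} (true minimizer is pi), shots used = {used}")
```

Every step here spends the full 50 shots. Adapting the shot count per evaluation, or reusing past estimates as a recursive estimator does, is where savings of the kind the abstract reports would come from.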

翻訳日:2023-07-11 15:46:53 公開日:2023-07-08

# 連続単語エキスパートの混合としての双方向アテンション

Bidirectional Attention as a Mixture of Continuous Word Experts ( http://arxiv.org/abs/2307.04057v1 )

ライセンス: Link先を確認

Kevin Christian Wibisono, Yixin Wang

(参考訳) 双方向アテンション(位置エンコーディング付きの自己アテンションとマスク言語モデル(MLM)目的関数からなる)は、現代の大規模言語モデル(LLM)の重要な構成要素となっている。実証的な成功にもかかわらず、その統計的基盤を調べた研究はほとんどない。双方向アテンションは暗黙のうちにどのような統計モデルを当てはめているのか。アテンションを用いない先行モデルとは何が違うのか。本論文ではこれらの問いを探る。重要な観察は、単層・単一ヘッドの双方向アテンションを当てはめることが、再パラメータ化のもとで、混合エキスパート(MoE)重みをもつ連続単語袋(CBOW)モデルを当てはめることと等価であるという点である。さらに、複数ヘッドおよび複数層をもつ双方向アテンションは、それぞれMoEの積み重ねおよびMoEの混合と等価である。この統計的視点により、双方向アテンションにおけるMoEの独特な使われ方が明らかになり、これは異種データを扱う際の実用上の有効性と整合する。また、文中の各単語位置を表形式の特徴量とみなすと、カテゴリカルな表形式データへの直接的な拡張も示唆される。実証研究全体を通じて、この拡張は分布外(OOD)汎化において、既存のTransformerの表形式データ向け拡張を上回ることが分かった。最後に、この統計的視点により、単語埋め込みに線形な単語アナロジーが現れる条件を理論的に特徴づけることができる。これらの分析から、双方向アテンションが線形な単語アナロジーを示すには、アテンションを用いない先行モデルよりもはるかに強い仮定が必要になりうることが示された。

Bidirectional attention, composed of self-attention with positional encodings and the masked language model (MLM) objective, has emerged as a key component of modern large language models (LLMs). Despite its empirical success, few studies have examined its statistical underpinnings: What statistical model is bidirectional attention implicitly fitting? What sets it apart from its non-attention predecessors? We explore these questions in this paper. The key observation is that fitting a single-layer single-head bidirectional attention, upon reparameterization, is equivalent to fitting a continuous bag of words (CBOW) model with mixture-of-experts (MoE) weights. Further, bidirectional attention with multiple heads and multiple layers is equivalent to stacked MoEs and a mixture of MoEs, respectively. This statistical viewpoint reveals the distinct use of MoE in bidirectional attention, which aligns with its practical effectiveness in handling heterogeneous data. It also suggests an immediate extension to categorical tabular data, if we view each word location in a sentence as a tabular feature. Across empirical studies, we find that this extension outperforms existing tabular extensions of transformers in out-of-distribution (OOD) generalization. Finally, this statistical perspective of bidirectional attention enables us to theoretically characterize when linear word analogies are present in its word embeddings. These analyses show that bidirectional attention can require much stronger assumptions to exhibit linear word analogies than its non-attention predecessors.
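The MoE reading of attention can be seen in the arithmetic of one attention read at a masked position. The softmax over query-key scores acts as a gate, and the output is the gate-weighted average of the context words' value vectors, which play the role of experts. This minimal sketch shows only that structure. It omits positional encodings, the MLM output layer and the reparameterization the paper uses to establish the CBOW equivalence. The toy vectors are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def attention_as_moe(query: Vector, keys: List[Vector],
                     values: List[Vector]) -> Tuple[List[float], Vector]:
    """Single-head attention read at one (masked) position.

    Returns the gate weights (softmax of query-key scores) and the output,
    i.e. the gate-weighted average of the context words' value vectors.
    """
    scores = [dot(query, k) for k in keys]
    m = max(scores)                               # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    gates = [e / z for e in exps]
    dim = len(values[0])
    output = [sum(g * v[i] for g, v in zip(gates, values)) for i in range(dim)]
    return gates, output

# Equal keys give uniform gates, so the output is the plain average: a CBOW-style context mean.
gates, out = attention_as_moe([1.0, 0.0], [[0.0, 0.0], [0.0, 0.0]], [[1.0, 0.0], [3.0, 2.0]])
print(gates, out)   # [0.5, 0.5] [2.0, 1.0]
```

When the gates are uniform the read reduces to a CBOW-style average of context vectors. Query-dependent gates are what turn it into a mixture.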

翻訳日:2023-07-11 15:38:42 公開日:2023-07-08

# 多様体フィルタ結合ネットワーク

Manifold Filter-Combine Networks ( http://arxiv.org/abs/2307.04056v1 )

ライセンス: Link先を確認

Joyce Chew and Edward De Brouwer and Smita Krishnaswamy and Deanna Needell and Michael Perlmutter

(参考訳) 我々は、多様体フィルタ・結合ネットワーク(Manifold Filter-Combine Networks)と呼ぶ、多様体ニューラルネットワーク(MNN)の大きなクラスを導入する。このクラスは特殊な場合として、Wang、Ruiz、Ribeiroによる先行研究で扱われたMNN、多様体散乱変換(ニューラルネットワークのウェーブレットベースのモデル)、さらにKipfとWellingのグラフ畳み込みネットワークの多様体版のように、これまで文献で扱われていなかった興味深い例を含む。次に、多様体の大域的な知識を持たず、有限個のサンプル点にしかアクセスできない場合に、データ駆動グラフを構築してそのようなネットワークを実装する方法を考える。サンプル点の数が無限大に近づくにつれて、ネットワークがその連続極限に確実に収束するための十分条件を与える。(特定のMNNアーキテクチャやグラフ構成に焦点を当てた)先行研究とは異なり、我々の収束レートは使用するフィルタの数に明示的には依存しない。さらに、先行研究で得られたネットワークの深さに対する指数的な依存ではなく、線形な依存を示す。

We introduce a large class of manifold neural networks (MNNs) which we call Manifold Filter-Combine Networks. This class includes, as special cases, the MNNs considered in previous work by Wang, Ruiz, and Ribeiro, the manifold scattering transform (a wavelet-based model of neural networks), and other interesting examples not previously considered in the literature such as the manifold equivalent of Kipf and Welling's graph convolutional network. We then consider a method, based on building a data-driven graph, for implementing such networks when one does not have global knowledge of the manifold, but merely has access to finitely many sample points. We provide sufficient conditions for the network to provably converge to its continuum limit as the number of sample points tends to infinity. Unlike previous work (which focused on specific MNN architectures and graph constructions), our rate of convergence does not explicitly depend on the number of filters used. Moreover, it exhibits linear dependence on the depth of the network rather than the exponential dependence obtained previously.
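The abstract implements manifold networks on a data-driven graph built from sample points. The sketch below shows one common way to do that, assumed here rather than taken from the paper. It builds Gaussian-kernel edge weights, forms the unnormalized Laplacian L = D − W, and applies a polynomial spectral filter h(L)x = Σ_k θ_k L^k x. The "combine" step that mixes channels is omitted. The kernel bandwidth and coefficients are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]

def gaussian_laplacian(points: List[List[float]], eps: float) -> Matrix:
    """Unnormalized graph Laplacian L = D - W with Gaussian-kernel weights exp(-|xi-xj|^2/eps)."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / eps)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]

def matvec(A: Matrix, x: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def polynomial_filter(L: Matrix, x: List[float], coeffs: List[float]) -> List[float]:
    """Apply h(L) x = sum_k coeffs[k] * L^k x to a signal x on the sample points."""
    out = [0.0] * len(x)
    power = list(x)                                   # L^0 x
    for c in coeffs:
        out = [o + c * p for o, p in zip(out, power)]
        power = matvec(L, power)                      # advance to the next power of L
    return out

pts = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
L = gaussian_laplacian(pts, eps=1.0)
print(polynomial_filter(L, [1.0, 2.0, 3.0], [1.0, -0.5]))   # x - 0.5 * L x
```

As more points are sampled, a Laplacian built this way approximates the manifold's Laplace-Beltrami operator under suitable bandwidth choices. That is the kind of continuum limit the abstract refers to.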

翻訳日:2023-07-11 15:38:09 公開日:2023-07-08

# PUFFIN:蒸気圧予測のためのパス統一フィードフォワードインタフェースネットワーク

PUFFIN: A Path-Unifying Feed-Forward Interfaced Network for Vapor Pressure Prediction ( http://arxiv.org/abs/2307.02903v2 )

ライセンス: Link先を確認

Vinicius Viena Santana, Carine Menezes Rebello, Luana P. Queiroz, Ana Mafalda Ribeiro, Nadia Shardt, and Idelfonso B. R. Nogueira

(参考訳) 蒸気圧の正確な予測は、さまざまな産業・環境用途に不可欠である。しかし、実験には多大な資源と労力を要するため、関心のあるすべての化合物について正確な測定値を得ることは不可能である。蒸気圧を予測するための温度依存関係が求められる場合、資源と労力の需要はさらに増大する。本稿では、転移学習と、ドメイン知識(Antoine式)に着想を得た新しい帰納バイアスノードを組み合わせて蒸気圧予測を改善する機械学習フレームワークPUFFIN(Path-Unifying Feed-Forward Interfaced Network)を提案する。グラフ埋め込みを用いた帰納バイアスと転移学習を活用することで、PUFFINは帰納バイアスを用いない、あるいは化合物の汎用的な記述子を用いる代替戦略を上回る。データの乏しさという制約を克服するためにドメイン固有の知識を組み込むこのフレームワークは、他の物理化学的性質の予測を含む、化学化合物解析へのより幅広い応用の可能性を示している。重要な点として、帰納的Antoineノードはネットワーク由来のAntoine式係数を出力するため、提案する機械学習フレームワークは部分的に解釈可能である。そのため、得られた解析的表現をプロセス設計ソフトウェアに直接組み込み、産業や環境で生じるプロセスの予測と制御を改善することが可能になる。

Accurately predicting vapor pressure is vital for various industrial and environmental applications. However, obtaining accurate measurements for all compounds of interest is not possible due to the resource and labor intensity of experiments. The demand for resources and labor further multiplies when a temperature-dependent relationship for predicting vapor pressure is desired. In this paper, we propose PUFFIN (Path-Unifying Feed-Forward Interfaced Network), a machine learning framework that combines transfer learning with a new inductive bias node inspired by domain knowledge (the Antoine equation) to improve vapor pressure prediction. By leveraging inductive bias and transfer learning using graph embeddings, PUFFIN outperforms alternative strategies that do not use inductive bias or that use generic descriptors of compounds. The framework's incorporation of domain-specific knowledge to overcome the limitation of poor data availability shows its potential for broader applications in chemical compound analysis, including the prediction of other physicochemical properties. Importantly, our proposed machine learning framework is partially interpretable, because the inductive Antoine node yields network-derived Antoine equation coefficients. It would then be possible to directly incorporate the obtained analytical expression in process design software for better prediction and control of processes occurring in industry and the environment.
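The interpretability claim rests on the Antoine equation, log10 P = A − B / (C + T). If the network's Antoine node emits coefficients (A, B, C), as the abstract states, they can be plugged directly into this closed form. The sketch below evaluates it with widely tabulated coefficients for water (P in mmHg, T in °C, valid roughly from 1 to 100 °C). These textbook values are not outputs of PUFFIN.

```python
def antoine_pressure(t_celsius: float, a: float, b: float, c: float) -> float:
    """Antoine equation: log10(P) = A - B / (C + T); units follow the coefficient set."""
    return 10 ** (a - b / (c + t_celsius))

# Widely tabulated coefficients for water: P in mmHg, T in deg C, valid roughly 1-100 deg C.
WATER = (8.07131, 1730.63, 233.426)

print(round(antoine_pressure(100.0, *WATER), 1))   # about 760 mmHg, i.e. 1 atm at the boiling point
```

Because the output is a three-parameter analytic formula rather than an opaque network, it can be exported to process-design software, which is the use case the abstract mentions.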

翻訳日:2023-07-11 10:21:33 公開日:2023-07-08

# ディープラーニングを用いた光学リモートセンシング画像における指向性物体検出:サーベイ

Oriented Object Detection in Optical Remote Sensing Images using Deep Learning: A Survey ( http://arxiv.org/abs/2302.10473v3 )

ライセンス: Link先を確認

Kun Wang, Zi Wang, Zhang Li, Ang Su, Xichao Teng, Minhao Liu and Qifeng Yu

(参考訳) 指向性物体検出は、リモートセンシングにおける最も基本的かつ困難なタスクの一つであり、多数の事前定義された物体カテゴリについて、向きを持つ物体の位置を特定することを目的としている。近年、深層学習に基づく手法は、光学リモートセンシング画像における指向性物体の検出で顕著な性能を達成している。しかし、リモートセンシング分野ではこの文献の徹底的なレビューはまだ現れていない。そこで本稿では、近年の進歩を包括的に調査し、問題定義、一般的に用いられるデータセット、評価プロトコル、検出フレームワーク、指向性物体の表現、特徴表現など、指向性物体検出の多くの側面を取り上げる。さらに、最先端の手法を分析・考察する。最後に、今後の研究の方向性を議論し、有用な研究指針を示す。この調査は学界や産業界の研究者にとって有益なものになると考えている。

Oriented object detection is one of the most fundamental and challenging tasks in remote sensing, aiming at locating the oriented objects of numerous predefined object categories. Recently, deep-learning-based methods have achieved remarkable performance in detecting oriented objects in optical remote sensing imagery. However, a thorough review of the literature in remote sensing has not yet emerged. Therefore, we give a comprehensive survey of recent advances and cover many aspects of oriented object detection, including problem definition, commonly used datasets, evaluation protocols, detection frameworks, oriented object representations, and feature representations. Besides, the state-of-the-art methods are analyzed and discussed. We finally discuss future research directions to put forward some useful research guidance. We believe that this survey shall be valuable to researchers across academia and industry.
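Among the "oriented object representations" the survey covers, one common choice is the five-parameter box (cx, cy, w, h, θ). The sketch below converts it to four corner points by rotating the half-extents about the center. Angle conventions differ across datasets and detectors, so radians measured counter-clockwise is an assumption here.

```python
import math
from typing import List, Tuple

def obb_to_corners(cx: float, cy: float, w: float, h: float,
                   theta: float) -> List[Tuple[float, float]]:
    """Corners of an oriented box given center, width, height and angle (radians, CCW)."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    # Half-extent offsets in the box's own frame, listed counter-clockwise from bottom-left.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        corners.append((cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t))
    return corners

print(obb_to_corners(0.0, 0.0, 4.0, 2.0, 0.0))   # axis-aligned 4 x 2 box
```

Converting between this form and the eight-coordinate corner form is the source of the angle-periodicity and boundary issues that loss-function designs for oriented detectors try to address.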

翻訳日:2023-07-11 10:19:13 公開日:2023-07-08

# Proto-CLIP:Few-Shot Learningのためのビジョン言語プロトタイプネットワーク

Proto-CLIP: Vision-Language Prototypical Network for Few-Shot Learning ( http://arxiv.org/abs/2307.03073v2 )

ライセンス: Link先を確認

Jishnu Jaykumar P, Kamalesh Palanisamy, Yu-Wei Chao, Xinya Du, Yu Xiang

(参考訳) 本稿では、CLIPのような大規模視覚言語モデルを活用した、少数ショット学習のための新しいフレームワークを提案する。少数ショット学習のための単一モーダルなプロトタイプネットワークに着想を得て、画像プロトタイプとテキストプロトタイプを少数ショット学習に利用するPROTO-CLIPを導入する。具体的には、PROTO-CLIPは少数の例を用いて、CLIPの画像エンコーダとテキストエンコーダを共同で適応させる。2つのエンコーダは、分類のための画像クラスのプロトタイプを計算するために用いられる。適応の際には、対応するクラスの画像プロトタイプとテキストプロトタイプを整列させることを提案する。このような整列は、両方のプロトタイプが寄与するため、少数ショット分類に有益である。少数ショット学習のベンチマークデータセットでの実験に加え、ロボットの知覚に関する実世界での実験により、本手法の有効性を示す。

We propose a novel framework for few-shot learning by leveraging large-scale vision-language models such as CLIP. Motivated by the unimodal prototypical networks for few-shot learning, we introduce PROTO-CLIP that utilizes image prototypes and text prototypes for few-shot learning. Specifically, PROTO-CLIP adapts the image encoder and text encoder in CLIP in a joint fashion using few-shot examples. The two encoders are used to compute prototypes of image classes for classification. During adaptation, we propose aligning the image and text prototypes of corresponding classes. Such a proposed alignment is beneficial for few-shot classification due to the contributions from both types of prototypes. We demonstrate the effectiveness of our method by conducting experiments on benchmark datasets for few-shot learning as well as in the real world for robot perception.
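The prototype idea behind PROTO-CLIP can be sketched independently of CLIP itself. In this sketch, a class's image prototype is the normalized mean of its normalized support embeddings. A query is scored by a weighted sum of its cosine similarities to the image prototype and to a text prototype. The toy 2-D vectors stand in for encoder outputs, and the mixing weight alpha is a hypothetical parameter. The joint encoder adaptation and the prototype-alignment training from the abstract are not modeled.

```python
import math
from typing import Dict, List

Vector = List[float]

def normalize(v: Vector) -> Vector:
    """Scale to unit L2 norm (assumes a non-zero vector)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def prototype(embeddings: List[Vector]) -> Vector:
    """Class prototype: normalized mean of the normalized support embeddings."""
    normed = [normalize(e) for e in embeddings]
    return normalize([sum(col) / len(normed) for col in zip(*normed)])

def classify(query: Vector, image_protos: Dict[str, Vector],
             text_protos: Dict[str, Vector], alpha: float = 0.5) -> str:
    """Pick the class with the highest alpha-weighted sum of cosine similarities."""
    q = normalize(query)

    def score(label: str) -> float:
        img = sum(a * b for a, b in zip(q, image_protos[label]))
        txt = sum(a * b for a, b in zip(q, text_protos[label]))
        return alpha * img + (1 - alpha) * txt

    return max(image_protos, key=score)

# Toy 2-D "embeddings" standing in for CLIP image/text encoder outputs.
image_protos = {"cat": prototype([[1.0, 0.1], [0.9, 0.0]]),
                "dog": prototype([[0.0, 1.0], [0.1, 0.9]])}
text_protos = {"cat": normalize([1.0, 0.0]), "dog": normalize([0.0, 1.0])}
print(classify([0.8, 0.2], image_protos, text_protos))   # cat
```

Aligning the image and text prototypes of each class during adaptation, as the abstract proposes, would make the two similarity terms agree. That agreement is why combining them can help with very few examples.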

翻訳日:2023-07-11 10:11:45 公開日:2023-07-08

PDF登録状況（公開日: 20230708）