Fugu-MT: arXiv Paper Translations

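This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Fugu-Machine Translator is used for translation.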

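The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).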

Papers listed here have a publication date of 20230709.

Title	Authors	Abstract	Publication date / translation date
# SeePrivacy: Automated Contextual Privacy Policy Generation for Mobile Applications ( http://arxiv.org/abs/2307.01691v3 ) License: see the linked page	Shidong Pan, Zhen Tao, Thong Hoang, Dawen Zhang, Zhenchang Xing, Xiwei Xu, Mark Staples, and David Lo	Privacy policies have become the most critical approach to safeguarding individuals' privacy and digital security. To enhance their presentation and readability, researchers propose the concept of contextual privacy policies (CPPs), aiming to fragment policies into shorter snippets and display them only in corresponding contexts. In this paper, we propose a novel multi-modal framework, namely SeePrivacy, designed to automatically generate contextual privacy policies for mobile apps. Our method synergistically combines mobile GUI understanding and privacy policy document analysis, yielding an impressive overall 83.6% coverage rate for privacy-related context detection and an accuracy of 0.92 in extracting corresponding policy segments. Remarkably, 96% of the retrieved policy segments can be correctly matched with their contexts. The user study shows SeePrivacy demonstrates excellent functionality and usability (4.5/5). Specifically, participants exhibit a greater willingness to read CPPs (4.1/5) compared to original privacy policies (2/5). Our solution effectively assists users in comprehending privacy notices, and this research establishes a solid foundation for further advancements and exploration.	Translated: 2023-10-23 18:27:04 Published: 2023-07-09
# Automatic Static Bug Detection for Machine Learning Libraries: Are We There Yet? ( http://arxiv.org/abs/2307.04080v1 ) License: see the linked page	Nima Shiri Harzevili, Jiho Shin, Junjie Wang, Song Wang, Nachiappan Nagappan	Automatic detection of software bugs is a critical task in software security. Many static tools that can help detect bugs have been proposed. While these static bug detectors are mainly evaluated on general software projects call into question their practical effectiveness and usefulness for machine learning libraries. In this paper, we address this question by analyzing five popular and widely used static bug detectors, i.e., Flawfinder, RATS, Cppcheck, Facebook Infer, and Clang static analyzer on a curated dataset of software bugs gathered from four popular machine learning libraries including Mlpack, MXNet, PyTorch, and TensorFlow with a total of 410 known bugs. Our research provides a categorization of these tools' capabilities to better understand the strengths and weaknesses of the tools for detecting software bugs in machine learning libraries. Overall, our study shows that static bug detectors find a negligible amount of all bugs accounting for 6/410 bugs (0.01%), Flawfinder and RATS are the most effective static checker for finding software bugs in machine learning libraries. Based on our observations, we further identify and discuss opportunities to make the tools more effective and practical.	Translated: 2023-10-23 18:06:00 Published: 2023-07-09
# Requirements Traceability: Recovering and Visualizing Traceability Links Between Requirements and Source Code of Object-oriented Software Systems ( http://arxiv.org/abs/2307.05188v1 ) License: see the linked page	Ra'Fat Al-Msie'deen	Requirements traceability is an important activity to reach an effective requirements management method in the requirements engineering. Requirement-to-Code Traceability Links (RtC-TLs) shape the relations between requirement and source code artifacts. RtC-TLs can assist engineers to know which parts of software code implement a specific requirement. In addition, these links can assist engineers to keep a correct mental model of software, and decreasing the risk of code quality degradation when requirements change with time mainly in large sized and complex software. However, manually recovering and preserving of these TLs puts an additional burden on engineers and is error-prone, tedious, and costly task. This paper introduces YamenTrace, an automatic approach and implementation to recover and visualize RtC-TLs in Object-Oriented software based on Latent Semantic Indexing (LSI) and Formal Concept Analysis (FCA). The originality of YamenTrace is that it exploits all code identifier names, comments, and relations in TLs recovery process. YamenTrace uses LSI to find textual similarity across software code and requirements. While FCA employs to cluster similar code and requirements together. Furthermore, YamenTrace gives a visualization of recovered TLs. To validate YamenTrace, it applied on three case studies. The findings of this evaluation prove the importance and performance of YamenTrace proposal as most of RtC-TLs were correctly recovered and visualized.	Translated: 2023-10-23 17:54:53 Published: 2023-07-09
# Semi Supervised Meta Learning for Spatiotemporal Learning ( http://arxiv.org/abs/2308.01916v1 ) License: see the linked page	Faraz Waseem, Pratyush Muthukumar	We approached the goal of applying meta-learning to self-supervised masked autoencoders for spatiotemporal learning in three steps. Broadly, we seek to understand the impact of applying meta-learning to existing state-of-the-art representation learning architectures. Thus, we test spatiotemporal learning through: a meta-learning architecture only, a representation learning architecture only, and an architecture applying representation learning alongside a meta learning architecture. We utilize the Memory Augmented Neural Network (MANN) architecture to apply meta-learning to our framework. Specifically, we first experiment with applying a pre-trained MAE and fine-tuning on our small-scale spatiotemporal dataset for video reconstruction tasks. Next, we experiment with training an MAE encoder and applying a classification head for action classification tasks. Finally, we experiment with applying a pre-trained MAE and fine-tune with MANN backbone for action classification tasks.	Translated: 2023-08-14 02:05:57 Published: 2023-07-09
# The Future of Fundamental Science Led by Generative Closed-Loop Artificial Intelligence ( http://arxiv.org/abs/2307.07522v1 ) License: see the linked page	Hector Zenil, Jesper Tegnér, Felipe S. Abrahão, Alexander Lavin, Vipin Kumar, Jeremy G. Frey, Adrian Weller, Larisa Soldatova, Alan R. Bundy, Nicholas R. Jennings, Koichi Takahashi, Lawrence Hunter, Saso Dzeroski, Andrew Briggs, Frederick D. Gregory, Carla P. Gomes, Christopher K. I. Williams, Jon Rowe, James Evans, Hiroaki Kitano, Joshua B. Tenenbaum, Ross King	Recent advances in machine learning and AI, including Generative AI and LLMs, are disrupting technological innovation, product development, and society as a whole. AI's contribution to technology can come from multiple approaches that require access to large training data sets and clear performance evaluation criteria, ranging from pattern recognition and classification to generative models. Yet, AI has contributed less to fundamental science in part because large data sets of high-quality data for scientific practice and model discovery are more difficult to access. Generative AI, in general, and Large Language Models in particular, may represent an opportunity to augment and accelerate the scientific discovery of fundamental deep science with quantitative models. Here we explore and investigate aspects of an AI-driven, automated, closed-loop approach to scientific discovery, including self-driven hypothesis generation and open-ended autonomous exploration of the hypothesis space. Integrating AI-driven automation into the practice of science would mitigate current problems, including the replication of findings, systematic production of data, and ultimately democratisation of the scientific process. Realising these possibilities requires a vision for augmented AI coupled with a diversity of AI approaches able to deal with fundamental aspects of causality analysis and model discovery while enabling unbiased search across the space of putative explanations. These advances hold the promise to unleash AI's potential for searching and discovering the fundamental structure of our world beyond what human scientists have been able to achieve. Such a vision would push the boundaries of new fundamental science rather than automatize current workflows and instead open doors for technological innovation to tackle some of the greatest challenges facing humanity today.	Translated: 2023-07-23 12:15:29 Published: 2023-07-09
# Disentangling Societal Inequality from Model Biases: Gender Inequality in Divorce Court Proceedings ( http://arxiv.org/abs/2307.10200v1 ) License: see the linked page	Sujan Dutta and Parth Srivastava and Vaishnavi Solunke and Swaprava Nath and Ashiqur R. KhudaBukhsh	Divorce is the legal dissolution of a marriage by a court. Since this is usually an unpleasant outcome of a marital union, each party may have reasons to call the decision to quit which is generally documented in detail in the court proceedings. Via a substantial corpus of 17,306 court proceedings, this paper investigates gender inequality through the lens of divorce court proceedings. While emerging data sources (e.g., public court records) on sensitive societal issues hold promise in aiding social science research, biases present in cutting-edge natural language processing (NLP) methods may interfere with or affect such studies. We thus require a thorough analysis of potential gaps and limitations present in extant NLP resources. In this paper, on the methodological side, we demonstrate that existing NLP resources required several non-trivial modifications to quantify societal inequalities. On the substantive side, we find that while a large number of court cases perhaps suggest changing norms in India where women are increasingly challenging patriarchy, AI-powered analyses of these court proceedings indicate striking gender inequality with women often subjected to domestic violence.	Translated: 2023-07-23 11:27:13 Published: 2023-07-09
# Ensemble learning for blending gridded satellite and gauge-measured precipitation data ( http://arxiv.org/abs/2307.06840v1 ) License: see the linked page	Georgia Papacharalampous, Hristos Tyralis, Nikolaos Doulamis, Anastasios Doulamis	Regression algorithms are regularly used for improving the accuracy of satellite precipitation products. In this context, ground-based measurements are the dependent variable and the satellite data are the predictor variables, together with topography factors. Alongside this, it is increasingly recognised in many fields that combinations of algorithms through ensemble learning can lead to substantial predictive performance improvements. Still, a sufficient number of ensemble learners for improving the accuracy of satellite precipitation products and their large-scale comparison are currently missing from the literature. In this work, we fill this specific gap by proposing 11 new ensemble learners in the field and by extensively comparing them for the entire contiguous United States and for a 15-year period. We use monthly data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) and IMERG (Integrated Multi-satellitE Retrievals for GPM) gridded datasets. We also use gauge-measured precipitation data from the Global Historical Climatology Network monthly database, version 2 (GHCNm). The ensemble learners combine the predictions by six regression algorithms (base learners), namely the multivariate adaptive regression splines (MARS), multivariate adaptive polynomial splines (poly-MARS), random forests (RF), gradient boosting machines (GBM), extreme gradient boosting (XGBoost) and Bayesian regularized neural networks (BRNN), and each of them is based on a different combiner. The combiners include the equal-weight combiner, the median combiner, two best learners and seven variants of a sophisticated stacking method. The latter stacks a regression algorithm on the top of the base learners to combine their independent predictions...	Translated: 2023-07-14 14:08:49 Published: 2023-07-09
# Fair Algorithm Design: Fair and Efficacious Machine Scheduling ( http://arxiv.org/abs/2204.06438v2 ) License: see the linked page	April Niu, Agnes Totschnig, Adrian Vetta	Motivated by a plethora of practical examples where bias is induced by automated-decision making algorithms, there has been strong recent interest in the design of fair algorithms. However, there is often a dichotomy between fairness and efficacy: fair algorithms may proffer low social welfare solutions whereas welfare optimizing algorithms may be very unfair. This issue is exemplified in the machine scheduling problem where, for $n$ jobs, the social welfare of any fair solution may be a factor $\Omega(n)$ worse than the optimal welfare. In this paper, we prove that this dichotomy between fairness and efficacy can be overcome if we allow for a negligible amount of bias: there exist algorithms that are both "almost perfectly fair" and have a constant factor efficacy ratio, that is, are guaranteed to output solutions that have social welfare within a constant factor of optimal welfare. Specifically, for any $\epsilon>0$, there exist mechanisms with efficacy ratio $\Theta(\frac{1}{\epsilon})$ and where no agent is more than an $\epsilon$ fraction worse off than they are in the fairest possible solution (given by an algorithm that does not use personal or type data). Moreover, these bicriteria guarantees are tight and apply to both the single machine case and the multiple machine case. The key to our results are the use of Pareto scheduling mechanisms. These mechanisms, by the judicious use of personal or type data, are able to exploit Pareto improvements that benefit every individual; such Pareto improvements would typically be forbidden by fair scheduling algorithms designed to satisfy standard statistical measures of group fairness. We anticipate this paradigm, the judicious use of personal data by a fair algorithm to greatly improve performance at the cost of negligible bias, has wider application.	Translated: 2023-07-13 20:47:25 Published: 2023-07-09
# On Complexity of 1-Center in Various Metrics ( http://arxiv.org/abs/2112.03222v3 ) License: see the linked page	Amir Abboud, Mohammad Hossein Bateni, Vincent Cohen-Addad, Karthik C. S., and Saeed Seddighin	We consider the classic 1-center problem: Given a set $P$ of $n$ points in a metric space find the point in $P$ that minimizes the maximum distance to the other points of $P$. We study the complexity of this problem in $d$-dimensional $\ell_p$-metrics and in edit and Ulam metrics over strings of length $d$. Our results for the 1-center problem may be classified based on $d$ as follows. $\bullet$ Small $d$: Assuming the hitting set conjecture (HSC), we show that when $d=\omega(\log n)$, no subquadratic algorithm can solve 1-center problem in any of the $\ell_p$-metrics, or in edit or Ulam metrics. $\bullet$ Large $d$: When $d=\Omega(n)$, we extend our conditional lower bound to rule out subquartic algorithms for 1-center problem in edit metric (assuming Quantified SETH). On the other hand, we give a $(1+\epsilon)$-approximation for 1-center in Ulam metric with running time $\tilde{O_{\varepsilon}}(nd+n^2\sqrt{d})$. We also strengthen some of the above lower bounds by allowing approximations or by reducing the dimension $d$, but only against a weaker class of algorithms which list all requisite solutions. Moreover, we extend one of our hardness results to rule out subquartic algorithms for the well-studied 1-median problem in the edit metric, where given a set of $n$ strings each of length $n$, the goal is to find a string in the set that minimizes the sum of the edit distances to the rest of the strings in the set.	Translated: 2023-07-13 20:46:01 Published: 2023-07-09
# GreenKGC: A Lightweight Knowledge Graph Completion Method ( http://arxiv.org/abs/2208.09137v2 ) License: see the linked page	Yun-Cheng Wang, Xiou Ge, Bin Wang, C.-C. Jay Kuo	Knowledge graph completion (KGC) aims to discover missing relationships between entities in knowledge graphs (KGs). Most prior KGC work focuses on learning embeddings for entities and relations through a simple scoring function. Yet, a higher-dimensional embedding space is usually required for a better reasoning capability, which leads to a larger model size and hinders applicability to real-world problems (e.g., large-scale KGs or mobile/edge computing). A lightweight modularized KGC solution, called GreenKGC, is proposed in this work to address this issue. GreenKGC consists of three modules: representation learning, feature pruning, and decision learning, to extract discriminant KG features and make accurate predictions on missing relationships using classifiers and negative sampling. Experimental results demonstrate that, in low dimensions, GreenKGC can outperform SOTA methods in most datasets. In addition, low-dimensional GreenKGC can achieve competitive or even better performance against high-dimensional models with a much smaller model size.	Translated: 2023-07-13 20:37:56 Published: 2023-07-09
# Identity Construction in a Misogynist Incels Forum ( http://arxiv.org/abs/2306.15745v3 ) License: see the linked page	Michael Miller Yoder, Chloe Perry, David West Brown, Kathleen M. Carley, Meredith L. Pruden	Online communities of involuntary celibates (incels) are a prominent source of misogynist hate speech. In this paper, we use quantitative text and network analysis approaches to examine how identity groups are discussed on incels-dot-is, the largest black-pilled incels forum. We find that this community produces a wide range of novel identity terms and, while terms for women are most common, mentions of other minoritized identities are increasing. An analysis of the associations made with identity groups suggests an essentialist ideology where physical appearance, as well as gender and racial hierarchies, determine human value. We discuss implications for research into automated misogynist hate speech detection.	Translated: 2023-07-13 18:47:56 Published: 2023-07-09
# Self-supervised learning via inter-modal reconstruction and feature projection networks for label-efficient 3D-to-2D segmentation ( http://arxiv.org/abs/2307.03008v2 ) License: see the linked page	José Morano, Guilherme Aresta, Dmitrii Lachinov, Julia Mai, Ursula Schmidt-Erfurth, Hrvoje Bogunović	Deep learning has become a valuable tool for the automation of certain medical image segmentation tasks, significantly relieving the workload of medical specialists. Some of these tasks require segmentation to be performed on a subset of the input dimensions, the most common case being 3D-to-2D. However, the performance of existing methods is strongly conditioned by the amount of labeled data available, as there is currently no data efficient method, e.g. transfer learning, that has been validated on these tasks. In this work, we propose a novel convolutional neural network (CNN) and self-supervised learning (SSL) method for label-efficient 3D-to-2D segmentation. The CNN is composed of a 3D encoder and a 2D decoder connected by novel 3D-to-2D blocks. The SSL method consists of reconstructing image pairs of modalities with different dimensionality. The approach has been validated in two tasks with clinical relevance: the en-face segmentation of geographic atrophy and reticular pseudodrusen in optical coherence tomography. Results on different datasets demonstrate that the proposed CNN significantly improves the state of the art in scenarios with limited labeled data by up to 8% in Dice score. Moreover, the proposed SSL method allows further improvement of this performance by up to 23%, and we show that the SSL is beneficial regardless of the network architecture.	Translated: 2023-07-13 18:38:23 Published: 2023-07-09
# DataComp: In search of the next generation of multimodal datasets ( http://arxiv.org/abs/2304.14108v3 ) License: see the linked page	Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, Ludwig Schmidt	Multimodal datasets are a critical component in recent breakthroughs such as Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms. To address this shortcoming in the ML ecosystem, we introduce DataComp, a testbed for dataset experiments centered around a new candidate pool of 12.8 billion image-text pairs from Common Crawl. Participants in our benchmark design new filtering techniques or curate new data sources and then evaluate their new dataset by running our standardized CLIP training code and testing the resulting model on 38 downstream test sets. Our benchmark consists of multiple compute scales spanning four orders of magnitude, which enables the study of scaling trends and makes the benchmark accessible to researchers with varying resources. Our baseline experiments show that the DataComp workflow leads to better training sets. In particular, our best baseline, DataComp-1B, enables training a CLIP ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet, outperforming OpenAI's CLIP ViT-L/14 by 3.7 percentage points while using the same training procedure and compute. We release DataComp and all accompanying code at www.datacomp.ai.	Translated: 2023-07-13 16:46:21 Published: 2023-07-09
# Evaluating the Zero-shot Robustness of Instruction-tuned Language Models ( http://arxiv.org/abs/2306.11270v2 ) License: see the linked page	Jiuding Sun, Chantal Shaib, Byron C. Wallace	Instruction fine-tuning has recently emerged as a promising approach for improving the zero-shot capabilities of Large Language Models (LLMs) on new tasks. This technique has shown particular strength in improving the performance of modestly sized LLMs, sometimes inducing performance competitive with much larger model variants. In this paper we ask two questions: (1) How sensitive are instruction-tuned models to the particular phrasings of instructions, and, (2) How can we make them more robust to such natural language variation? To answer the former, we collect a set of 319 instructions manually written by NLP practitioners for over 80 unique tasks included in widely used benchmarks, and we evaluate the variance and average performance of these instructions as compared to instruction phrasings observed during instruction fine-tuning. We find that using novel (unobserved) but appropriate instruction phrasings consistently degrades model performance, sometimes substantially so. Further, such natural instructions yield a wide variance in downstream performance, despite their semantic equivalence. Put another way, instruction-tuned models are not especially robust to instruction re-phrasings. We propose a simple method to mitigate this issue by introducing "soft prompt" embedding parameters and optimizing these to maximize the similarity between representations of semantically equivalent instructions. We show that this method consistently improves the robustness of instruction-tuned models.	Translated: 2023-07-13 16:36:01 Published: 2023-07-09
# 自動エッセイ採点におけるフィードバックのレビュー Review of feedback in Automated Essay Scoring ( http://arxiv.org/abs/2307.05553v1 ) ライセンス: Link先を確認	You-Jin Jong, Yong-Jin Kim, Ok-Chol Ri	(参考訳) 最初の自動エッセイ評価システムは50年前に開発された。自動エッセイスコアリングシステムは、従来の単純なスコアリングシステムよりもリッチな機能を持つシステムに発展しつつある。その目的は、エッセイを採点することだけでなく、ユーザの文章力を向上させる学習ツールとして機能することでもある。フィードバックは、実生活で有用な自動エッセイ評価システムを構築する上で最も重要な側面である。最初のAESシステムではフィードバックの重要性が強調されていた。本稿では,異なるフィードバックタイプやエッセイ特性を含むフィードバックに関する研究についてレビューする。また,フィードバックを提供する自動エッセイ評価システムの最新事例について検討した。 The first automated essay scoring system was developed 50 years ago. Automated essay scoring systems are developing into systems with richer functions than the previous simple scoring systems. Their purpose is not only to score essays but also to serve as a learning tool that improves users' writing skills. Feedback is the most important aspect of making an automated essay scoring system useful in real life. The importance of feedback was already emphasized in the first AES system. This paper reviews research on feedback including different feedback types and essay traits on automated essay scoring. We also review the latest case studies of automated essay scoring systems that provide feedback.	翻訳日:2023-07-13 16:28:39 公開日:2023-07-09
# グラフニューラルネットワークによるテラヘルツ型フロー誘導ナノスケール局在 Graph Neural Network-enabled Terahertz-based Flow-guided Nanoscale Localization ( http://arxiv.org/abs/2307.05551v1 ) ライセンス: Link先を確認	Gerard Calvo Bartra, Filip Lemic, Sergi Abadal, Xavier Costa Perez	(参考訳) ナノテクノロジーと先端材料における科学的進歩は、センシング、コンピューティング、通信、データ、エネルギー貯蔵機能を含む体内精密医療のためのナノスケールデバイスへの道を開く。ヒトの心血管系では、そのような装置は受動的に流れ、継続的に検知され、診断上の関心事を検出する。このような事象を検出する診断値は、フロー誘導ローカライゼーションの主命題である物理的な位置(例えば、身体領域)に割り当てることによって向上することができる。現在のフローガイド型ローカライズアプローチはローカライズ精度が低く、心血管系全体の事象をローカライズできない設計になっている。この問題に対処するために,我々はグラフニューラルネットワーク(GNN)の利用を提案し,既存の最先端技術(SotA)アプローチに対して,提案手法の局所化精度とカバレッジ向上を示す。本評価に基づき,GNN対応フロー誘導ローカライゼーションの設計ガイドラインについて述べる。 Scientific advancements in nanotechnology and advanced materials are paving the way toward nanoscale devices for in-body precision medicine; comprising integrated sensing, computing, communication, data and energy storage capabilities. In the human cardiovascular system, such devices are envisioned to be passively flowing and continuously sensing for detecting events of diagnostic interest. The diagnostic value of detecting such events can be enhanced by assigning to them their physical locations (e.g., body region), which is the main proposition of flow-guided localization. Current flow-guided localization approaches suffer from low localization accuracy and they are by-design unable to localize events within the entire cardiovascular system. Toward addressing this issue, we propose the utilization of Graph Neural Networks (GNNs) for this purpose, and demonstrate localization accuracy and coverage enhancements of our proposal over the existing State of the Art (SotA) approaches. Based on our evaluation, we provide several design guidelines for GNN-enabled flow-guided localization.	翻訳日:2023-07-13 16:28:13 公開日:2023-07-09
# 接続の開放:アインシュタインの業績におけるERブリッジとEPR Unveiling the Connection: ER bridges and EPR in the work of Einstein ( http://arxiv.org/abs/2307.05548v1 ) ライセンス: Link先を確認	Galina Weinstein	(参考訳) 本稿では,ERブリッジ理論とその量子現象との関係について考察する。 ERブリッジ理論は量子現象に明示的に対応せず、アインシュタインがERブリッジ理論内の個々の粒子とEPRパラドックスに関わる系とを区別することを意図している、という主張が成り立つ。しかし、この論文はアインシュタインが異なる視点を持っていたと論じている。一般相対性理論を変更して量子特性の解明に尽力し、量子力学の原理に頼らずに局所現実主義、分離性、因果性、決定論といった概念を取り入れることを目指した。彼は2枚の平板を接続する平行ER橋を用いた素粒子の表現を提案した。 This paper explores the ER bridges theory and its relationship with quantum phenomena. An argument can be made that the ER bridges theory does not explicitly address quantum phenomena and implies that Einstein intended to differentiate between individual particles within the ER bridges theory and the systems involved in the EPR paradox. However, this paper contends that Einstein held a distinct viewpoint. He endeavored to elucidate quantum characteristics by modifying general relativity, aiming to incorporate concepts such as local realism, separability, causality, and determinism, without relying on the principles of quantum mechanics. He proposed representing elementary particles using parallel ER bridges connecting two flat sheets to achieve this.	翻訳日:2023-07-13 16:27:43 公開日:2023-07-09
# SemEval-2023タスク1:プロンプト拡張とテキストから画像への拡散によるゼロショット視覚WSDの構成性とあいまいさの処理におけるCLIPの強化 Augmenters at SemEval-2023 Task 1: Enhancing CLIP in Handling Compositionality and Ambiguity for Zero-Shot Visual WSD through Prompt Augmentation and Text-To-Image Diffusion ( http://arxiv.org/abs/2307.05564v1 ) ライセンス: Link先を確認	Jie S. Li, Yow-Ting Shiue, Yong-Siang Shih, and Jonas Geiping	(参考訳) 本稿では,英語のVisual Word Sense Disambiguation (VWSD)タスクに対するゼロショットアプローチについて述べる。予備研究の結果,CLIPを用いて候補画像とフレーズをマッチングする単純な手法は,画像テキスト対の多対多性に苦しむことがわかった。 CLIPテキストエンコーダは、自然言語の構成性を捉える能力に制限がある可能性がある。逆に、フレーズの記述的焦点は、例によって異なる。 Augment-CLIPとStable Diffusion Sampling(SDサンプリング)という2つのシステムでこの問題に対処する。 Augment-CLIPは、大規模言語モデル(LLM)の助けを借りてコンテキストフレーズを含む文を生成することで、テキストプロンプトを強化する。あいまいな単語が他言語では曖昧でない単語に翻訳される可能性があるため、他の言語のCLIPモデルについても検討する。 SDサンプリングは、テキストから画像へのStable Diffusionを使用して、与えられた句から複数の画像を生成する。これにより、生成画像の一部がテキストと対になった画像に一致する可能性が高まる。 This paper describes our zero-shot approaches for the Visual Word Sense Disambiguation (VWSD) Task in English. Our preliminary study shows that the simple approach of matching candidate images with the phrase using CLIP suffers from the many-to-many nature of image-text pairs. We find that the CLIP text encoder may have limited abilities in capturing the compositionality in natural language. Conversely, the descriptive focus of the phrase varies from instance to instance. We address these issues in our two systems, Augment-CLIP and Stable Diffusion Sampling (SD Sampling). Augment-CLIP augments the text prompt by generating sentences that contain the context phrase with the help of large language models (LLMs). We further explore CLIP models in other languages, as an ambiguous word may be translated into an unambiguous one in the other language. SD Sampling uses text-to-image Stable Diffusion to generate multiple images from the given phrase, increasing the likelihood that a subset of images match the one paired with the text.	翻訳日:2023-07-13 16:17:08 公開日:2023-07-09
# RidgeBase: クロスセンサー多指非接触指紋データセット RidgeBase: A Cross-Sensor Multi-Finger Contactless Fingerprint Dataset ( http://arxiv.org/abs/2307.05563v1 ) ライセンス: Link先を確認	Bhavin Jawade, Deen Dayal Mohan, Srirangaraj Setlur, Nalini Ratha and Venu Govindaraju	(参考訳) スマートフォンカメラを用いた非接触指紋マッチングは、衛生的取得、ポータビリティ、プレゼンテーションアタックを含む従来の指紋システムの大きな課題を軽減することができる。しかし、実用的で堅牢な非接触指紋マッチング技術の開発は、大規模な実世界のデータセットが限られていることに制約されている。センサ間の非接触指紋マッチングのさらなる進歩を動機付けるために, RidgeBaseベンチマークデータセットを紹介する。 RidgeBaseは、異なる背景と照明条件下で88人の個人から2台のスマートフォンカメラと1台のフラットベッドコンタクトセンサーで取得された15,000以上のコンタクトレスとコンタクトベースの指紋画像ペアからなる。既存のデータセットとは異なり、RidgeBaseは、コンタクトレス・トゥ・コンタクトレス(CL2CL)とコンタクト・トゥ・コンタクトレス(C2CL)の検証と識別のためのシングルフィンガーマッチングとマルチフィンガーマッチングを含む、異なるマッチングシナリオ下での研究を促進するように設計されている。さらに,同一指に属する非接触指紋のサンプル内ばらつきが高いため,顔認識データセットの進歩に触発されたセットベースマッチングプロトコルを提案する。このプロトコルは、焦点、極性、指角のばらつきを考慮できる実用的な非接触指紋マッチングのために特別に設計されている。我々は,RidgeBaseデータセット上で,COTS指紋マッチャー(Verifinger)とDeep CNNベースのアプローチを用いて,異なるプロトコルに対する質的,定量的なベースライン結果について報告する。データセットは以下からダウンロードできる。 https://www.buffalo.edu/cubs/research/datasets/ridgebase-benchmark-dataset.html Contactless fingerprint matching using smartphone cameras can alleviate major challenges of traditional fingerprint systems including hygienic acquisition, portability and presentation attacks. However, development of practical and robust contactless fingerprint matching techniques is constrained by the limited availability of large scale real-world datasets. To motivate further advances in contactless fingerprint matching across sensors, we introduce the RidgeBase benchmark dataset. RidgeBase consists of more than 15,000 contactless and contact-based fingerprint image pairs acquired from 88 individuals under different background and lighting conditions using two smartphone cameras and one flatbed contact sensor. 
Unlike existing datasets, RidgeBase is designed to promote research under different matching scenarios that include Single Finger Matching and Multi-Finger Matching for both contactless-to-contactless (CL2CL) and contact-to-contactless (C2CL) verification and identification. Furthermore, due to the high intra-sample variance in contactless fingerprints belonging to the same finger, we propose a set-based matching protocol inspired by the advances in facial recognition datasets. This protocol is specifically designed for pragmatic contactless fingerprint matching that can account for variances in focus, polarity and finger-angles. We report qualitative and quantitative baseline results for different protocols using a COTS fingerprint matcher (Verifinger) and a Deep CNN based approach on the RidgeBase dataset. The dataset can be downloaded here: https://www.buffalo.edu/cubs/research/datasets/ridgebase-benchmark-dataset.html	翻訳日:2023-07-13 16:16:48 公開日:2023-07-09
# TransPose:深度補正機能を備えたトランスフォーマーベースの6Dオブジェクトポーズ推定ネットワーク TransPose: A Transformer-based 6D Object Pose Estimation Network with Depth Refinement ( http://arxiv.org/abs/2307.05561v1 ) ライセンス: Link先を確認	Mahmoud Abdulsalam and Nabil Aouf	(参考訳) ロボット操作アプリケーションへの需要が増加するにつれて、正確な視覚に基づく6Dポーズ推定が自律動作に必須となる。畳み込みニューラルネットワーク(CNN)に基づくポーズ推定手法が以前にも紹介されている。しかし、特に正確なロボティクス操作では、パフォーマンス向上の追求は引き続き続いている。この探求はアグリ・ロボティクス領域にまで及ぶ。本稿では,深度補正モジュールを用いた改良型のトランスフォーマーベース6Dポーズ推定法であるTransPoseを提案する。アーキテクチャはRGB画像のみを入力として取り込み、深度や熱画像などの追加のモダリティは用いない。このアーキテクチャは、アップサンプリング方式で特徴ピラミッドを用いてRGB画像から深度を推定する革新的な軽量深度推定ネットワークを含んでいる。対象物の中心を直接回帰し,対象物の6Dポーズを予測するために,追加予測ヘッドを備えたトランスフォーマーベース検出ネットワークを提案する。次に、予測された中心、6Dポーズ、深度パッチとともに、推定された6Dポーズの精度を向上するために、新しい深度補正モジュールが使用される。その結果を最先端の他の手法と比較し,果実摘みの応用について分析した。その結果,提案手法は文献で利用可能な他の手法よりも優れていることがわかった。 As demand for robotics manipulation applications increases, accurate vision-based 6D pose estimation becomes essential for autonomous operations. Convolutional Neural Networks (CNNs) based approaches for pose estimation have been previously introduced. However, the quest for better performance still persists, especially for accurate robotics manipulation. This quest extends to the Agri-robotics domain. In this paper, we propose TransPose, an improved Transformer-based 6D pose estimation with a depth refinement module. The architecture takes in only an RGB image as input with no additional supplementing modalities such as depth or thermal images. The architecture encompasses an innovative lighter depth estimation network that estimates depth from an RGB image using feature pyramid with an up-sampling method. A transformer-based detection network with additional prediction heads is proposed to directly regress the object's centre and predict the 6D pose of the target. A novel depth refinement module is then used alongside the predicted centers, 6D poses and depth patches to refine the accuracy of the estimated 6D pose. 
We extensively compared our results with other state-of-the-art methods and analysed our results for fruit-picking applications. The results we achieved show that our proposed technique outperforms the other methods available in the literature.	翻訳日:2023-07-13 16:16:18 公開日:2023-07-09
# 大規模自動コーディング:チリ公共医療システムにおける紹介状の正規化のための全国的なシステムの設計と展開 Automatic Coding at Scale: Design and Deployment of a Nationwide System for Normalizing Referrals in the Chilean Public Healthcare System ( http://arxiv.org/abs/2307.05560v1 ) ライセンス: Link先を確認	Fabián Villena, Matías Rojas, Felipe Arias, Jorge Pacheco, Paulina Vera, Jocelyn Dunstan	(参考訳) 疾患符号化タスクは、コントロールされた語彙から臨床文書に記載された各疾患にユニークな識別子を割り当てることを含む。このタスクは、非構造化データからの情報抽出を可能とし、例えば、特定の状況における疾患の発生率と有病率に関する疫学研究を行うことができる。しかしながら、手動のコーディングプロセスは、医療従事者がコーディングルールや用語に精通する必要があるため、エラーが生じやすい。さらに、このプロセスは多くの時間とエネルギーを消費し、それらはより臨床的に関連するタスクに割り当てることができる。これらの困難は、自動的に病気にコードを割り当てる計算システムを開発することで対処できる。そこで本稿では,チリの公共医療システムの紹介状に含まれる疾患を自動的にコード化する2段階のシステムを提案する。具体的には,病名認識に最先端のNERモデルを用い,Elasticsearchをベースとした検索エンジンシステムを用いて,これらの疾患名に関連性の高いコードを割り当てる。このシステムの性能は、臨床専門家が手作業でコーディングした紹介状に基づいて評価された。本システムでは,サブカテゴリレベルでは0.63,カテゴリーレベルでは0.83のMAPスコアを得ており,文献中の最良のモデルに近い。このシステムは、コーディングと管理のプロセスを最適化する健康専門家のためのサポートツールになり得る。最後に、再現性を保証するため、我々のモデルと実験のコードを公開します。 The disease coding task involves assigning a unique identifier from a controlled vocabulary to each disease mentioned in a clinical document. This task is relevant since it allows information extraction from unstructured data to perform, for example, epidemiological studies about the incidence and prevalence of diseases in a determined context. However, the manual coding process is subject to errors as it requires medical personnel to be competent in coding rules and terminology. In addition, this process consumes a lot of time and energy, which could be allocated to more clinically relevant tasks. These difficulties can be addressed by developing computational systems that automatically assign codes to diseases. In this way, we propose a two-step system for automatically coding diseases in referrals from the Chilean public healthcare system. 
Specifically, our model uses a state-of-the-art NER model for recognizing disease mentions and a search engine system based on Elasticsearch for assigning the most relevant codes associated with these disease mentions. The system's performance was evaluated on referrals manually coded by clinical experts. Our system obtained a MAP score of 0.63 for the subcategory level and 0.83 for the category level, close to the best-performing models in the literature. This system could be a support tool for health professionals, optimizing the coding and management process. Finally, to guarantee reproducibility, we publicly release the code of our models and experiments.	翻訳日:2023-07-13 16:16:01 公開日:2023-07-09
# スパイク・アンド・スラブによるベイズ線形回帰の推定からサンプリングへ From Estimation to Sampling for Bayesian Linear Regression with Spike-and-Slab Prior ( http://arxiv.org/abs/2307.05558v1 ) ライセンス: Link先を確認	Qijia Jiang	(参考訳) 後方収縮特性を利用した事前及び設計効率的なサンプリングアルゴリズムを用いてベイズ線形回帰を考察する。ガウスのスパイク・アンド・スラブ(統計的にも計算的にも好適)による準類似性を調査し、ギブスサンプリングと確率的局在に基づく2つのアルゴリズムを、スパース植込み信号の正当な推論を可能にする同じ(quite natural)統計仮定の下で解析する。 Stochastic Localization samplerの利点は、よく設計されていないデータマトリックスで特に顕著である。 We consider Bayesian linear regression with sparsity-inducing prior and design efficient sampling algorithms leveraging posterior contraction properties. A quasi-likelihood with Gaussian spike-and-slab (that is favorable both statistically and computationally) is investigated and two algorithms based on Gibbs sampling and Stochastic Localization are analyzed, both under the same (quite natural) statistical assumptions that also enable valid inference on the sparse planted signal. The benefit of the Stochastic Localization sampler is particularly prominent for data matrix that is not well-designed.	翻訳日:2023-07-13 16:15:39 公開日:2023-07-09
# オッペンハイマーとスナイダー重力崩壊の量子化によるブラックホールのシュレーディンガーとクライン=ゴルドンの理論 Schrödinger and Klein-Gordon theories of black holes from the quantization of the Oppenheimer and Snyder gravitational collapse ( http://arxiv.org/abs/2307.05554v1 ) ライセンス: Link先を確認	Christian Corda	(参考訳) シュワルツシルトブラックホール (BH) のシュレーディンガー方程式は、BH が中心場すなわち「核」と相互作用する粒子すなわち「電子」からなることを示しており、ド・ブロイの仮説により、BH ホライズンモードの観点で「電子」を解釈する。量子重力効果はプランクスケールではなくシュワルツシルトスケールでのBH半古典構造を変化させる。この BH シュレーディンガー方程式と水素原子の s 状態のシュレーディンガー方程式の類似により、同じ方程式を解くことができる。したがって、BHは「重力水素原子」というシュレーディンガーの理論に従うよく定義された量子重力系である。 By identifying the potential energy in the BH Schrödinger equation as being the gravitational energy of a spherically symmetric shell, a different nature of the quantum BH seems to surface. BHs are self-interacting, highly excited, spherically symmetric, massive quantum shells generated by matter condensing on the apparent horizon, concretely realizing the membrane paradigm. The quantum BH described as a "gravitational hydrogen atom" is a fictitious mathematical representation of the real, quantum BH, a quantum massive shell having as radius the oscillating gravitational radius. この結果から非自明な帰結が生まれる。 i) BH は地平線も特異点も持たない。 ii) BH蒸発における情報損失もBH相補性もファイアウォールパラドックスも存在しない。これらの結果は、Hawking、Vaz、Mitraなどによる以前のものと一致している。最後に、BH シュレーディンガー方程式に対する特殊相対論的補正は、BH Klein-Gordon方程式と対応する固有値を与える。 The Schrödinger equation of the Schwarzschild black hole (BH) shows that a BH is composed of a particle, the "electron", interacting with a central field, the "nucleus". Via de Broglie's hypothesis, one interprets the "electron" in terms of BH horizon's modes. Quantum gravity effects modify the BH semi-classical structure at the Schwarzschild scale rather than at the Planck scale. The analogy between this BH Schrödinger equation and the Schrödinger equation of the s states of the hydrogen atom permits us to solve the same equation. 
Therefore, BHs are well-defined quantum gravitational systems obeying Schrödinger's theory: the "gravitational hydrogen atoms". By identifying the potential energy in the BH Schrödinger equation as being the gravitational energy of a spherically symmetric shell, a different nature of the quantum BH seems to surface. BHs are self-interacting, highly excited, spherically symmetric, massive quantum shells generated by matter condensing on the apparent horizon, concretely realizing the membrane paradigm. The quantum BH described as a "gravitational hydrogen atom" is a fictitious mathematical representation of the real, quantum BH, a quantum massive shell having as radius the oscillating gravitational radius. Nontrivial consequences emerge from this result: i) BHs have neither horizons nor singularities; ii) there is neither information loss in BH evaporation, nor BH complementarity, nor firewall paradox. These results are consistent with previous ones by Hawking, Vaz, Mitra and others. Finally, the special relativistic corrections to the BH Schrödinger equation give the BH Klein-Gordon equation and the corresponding eigenvalues.	翻訳日:2023-07-13 16:15:27 公開日:2023-07-09
# 非構造化データから学習構造をパーソナライズした強化学習要約サービス A Personalized Reinforcement Learning Summarization Service for Learning Structure from Unstructured Data ( http://arxiv.org/abs/2307.05696v1 ) ライセンス: Link先を確認	Samira Ghodratnama, Amin Beheshti, Mehrdad Zakershahrak	(参考訳) テキストデータの指数関数的な成長は、有意義な洞察の抽出を支援するツールに対する重要なニーズを生み出した。従来の文書要約アプローチは、個々のユーザ要求を満たすことができず、効率的な情報処理のための構造が欠如していることが多い。これらの制限に対処するため,我々は階層型パーソナライズ概念に基づく要約手法であるsummationを提案する。文書を簡潔な階層的な概念マップに合成し、ユーザの好みを学習し、適応することによって、積極的にユーザと対話する。 Reinforcement Learningアルゴリズムを用いて、Summationは特定のトピックに関する未確認文書のパーソナライズされた要約を生成する。このフレームワークは、理解を高め、効果的なナビゲーションを可能にし、ユーザが独自の要求に沿う大きなドキュメントコレクションから意味のある洞察を抽出できるようにする。 The exponential growth of textual data has created a crucial need for tools that assist users in extracting meaningful insights. Traditional document summarization approaches often fail to meet individual user requirements and lack structure for efficient information processing. To address these limitations, we propose Summation, a hierarchical personalized concept-based summarization approach. It synthesizes documents into a concise hierarchical concept map and actively engages users by learning and adapting to their preferences. Using a Reinforcement Learning algorithm, Summation generates personalized summaries for unseen documents on specific topics. This framework enhances comprehension, enables effective navigation, and empowers users to extract meaningful insights from large document collections aligned with their unique requirements.	翻訳日:2023-07-13 15:27:22 公開日:2023-07-09
# 科学文献における図形分類手法に関する調査 A Survey on Figure Classification Techniques in Scientific Documents ( http://arxiv.org/abs/2307.05694v1 ) ライセンス: Link先を確認	Anurag Dhote and Mohammed Javed and David S Doermann	(参考訳) 図は重要な情報を視覚的に表現し、科学的事実を伝える効果的な手段を提供する。近年、さまざまな人工知能と機械学習技術を用いて、図、特に表、図、プロットから直接データを抽出する取り組みが数多く行われている。これは、数字から情報を取り除くことが、科学文書で強調された概念に対する深い洞察をもたらす可能性があるためである。本稿では,図を5つのクラス(表,写真,図,地図,プロット)に体系的に分類し,その上で,図形分類の問題に対処する既存の方法論とデータセットについて批判的なレビューを行う。最後に,現在の研究のギャップを特定し,図分類に関するさらなる研究の方向性を示す。 Figures visually represent an essential piece of information and provide an effective means to communicate scientific facts. Recently there have been many efforts toward extracting data directly from figures, specifically from tables, diagrams, and plots, using different Artificial Intelligence and Machine Learning techniques. This is because removing information from figures could lead to deeper insights into the concepts highlighted in the scientific documents. In this survey paper, we systematically categorize figures into five classes - tables, photos, diagrams, maps, and plots, and subsequently present a critical review of the existing methodologies and data sets that address the problem of figure classification. Finally, we identify the current research gaps and provide possible directions for further research on figure classification.	翻訳日:2023-07-13 15:26:55 公開日:2023-07-09
# HA-ViD:総合的なアセンブリ知識理解のためのヒューマンアセンブリビデオデータセット HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding ( http://arxiv.org/abs/2307.05721v1 ) ライセンス: Link先を確認	Hao Zheng, Regina Lee, Yuqian Lu	(参考訳) ビデオから総合的な組み立て知識を理解することは、未来的な超知能産業にとって不可欠である。技術的ブレークスルーを実現するため、HA-ViDを提案する。これは、代表的な産業的組み立てシナリオ、自然な手続き的知識獲得プロセス、一貫性のあるヒューマンロボット共有アノテーションを特徴とする、最初のヒューマンアセンブリビデオデータセットである。特に、HA-ViDは、現実世界の組み立てにおける多様なコラボレーションパターン、自然な人間の振る舞い、組み立て中の学習の進行をキャプチャし、アクションアノテーションを主語、アクション動詞、操作対象、ターゲット対象、ツールに細分化する。3222本のマルチビュー・マルチモーダルビデオ(各ビデオは1つの組立タスクを含む)、1.5Mフレーム、96K時間ラベル、2M空間ラベルを提供する。我々は、アクション認識、アクションセグメンテーション、オブジェクト検出、マルチオブジェクトトラッキングの4つの基本的なビデオ理解タスクをベンチマークする。重要なことは、アセンブリの進捗、プロセス効率、タスクコラボレーション、スキルパラメータ、人間の意図といった知識を理解するために、それらのパフォーマンスを分析することである。 HA-ViDの詳細は https://iai-hrc.github.io/ha-vid で公開されている。 Understanding comprehensive assembly knowledge from videos is critical for futuristic ultra-intelligent industry. To enable technological breakthrough, we present HA-ViD - the first human assembly video dataset that features representative industrial assembly scenarios, natural procedural knowledge acquisition process, and consistent human-robot shared annotations. Specifically, HA-ViD captures diverse collaboration patterns of real-world assembly, natural human behaviors and learning progression during assembly, and granulates action annotations to subject, action verb, manipulated object, target object, and tool. We provide 3222 multi-view, multi-modality videos (each video contains one assembly task), 1.5M frames, 96K temporal labels and 2M spatial labels. We benchmark four foundational video understanding tasks: action recognition, action segmentation, object detection and multi-object tracking. Importantly, we analyze their performance for comprehending knowledge in assembly progress, process efficiency, task collaboration, skill parameters and human intention. Details of HA-ViD are available at: https://iai-hrc.github.io/ha-vid.	翻訳日:2023-07-13 15:15:47 公開日:2023-07-09
# SpreadNUTS -- No-U-Turn サンプリングのための経路の動的拡張と訪問地域分割 SpreadNUTS -- Moderate Dynamic Extension of Paths for No-U-Turn Sampling & Partitioning Visited Regions ( http://arxiv.org/abs/2307.06279v1 ) ライセンス: Link先を確認	Fareed Sheriff	(参考訳) マルコフ連鎖モンテカルロ法(MCMC)は長い間存在しており、その分野はよく研究されている。 MCMC法の目的は、繰り返しサンプリングによって分布を近似することであり、ほとんどのMCMCアルゴリズムは、その極限で真の分布に収束する漸近的に最適な挙動を示す。しかし、これらのアルゴリズムを区別しているのは、実用的収束保証と効率性である。サンプリング器は最終的に分布をよく近似することができるが、実世界で使用されるため、サンプリング器が良い推定値を得る点が妥当な時間内に到達可能である必要がある。同様に、推定に使用する分布から良いサンプルを生成するのが計算的に困難または難解であれば、サンプリング者が利用できる実世界のユーティリティは存在しない。したがって、最近のMCMC手法のほとんどは効率の向上と収束のスピードアップに重点を置いている。しかし、多くのmcmcアルゴリズムはランダムウォークに苦しむため、ランダムウォークを消去するなど、そのような動作を緩和することは困難である。ハミルトニアン・モンテカルロ(英: Hamiltonian Monte Carlo、HMC)は、理論上はハミルトニアン力学に関連する性質のためランダムウォークの振る舞いを示さないMCMC法の一種である。本稿では, NUTSよりも高速にサンプル空間を探索することを目的とした, No-U-turn sampler (NUTS) と呼ばれる特定のHMCアルゴリズムの修正について述べる。 Markov chain Monte Carlo (MCMC) methods have existed for a long time and the field is well-explored. The purpose of MCMC methods is to approximate a distribution through repeated sampling; most MCMC algorithms exhibit asymptotically optimal behavior in that they converge to the true distribution at the limit. However, what differentiates these algorithms are their practical convergence guarantees and efficiency. While a sampler may eventually approximate a distribution well, because it is used in the real world it is necessary that the point at which the sampler yields a good estimate of the distribution is reachable in a reasonable amount of time. Similarly, if it is computationally difficult or intractable to produce good samples from a distribution for use in estimation, then there is no real-world utility afforded by the sampler. Thus, most MCMC methods these days focus on improving efficiency and speeding up convergence. However, many MCMC algorithms suffer from random walk behavior and often only mitigate such behavior as outright erasing random walks is difficult. 
Hamiltonian Monte Carlo (HMC) is a class of MCMC methods that theoretically exhibit no random walk behavior because of properties related to Hamiltonian dynamics. This paper introduces modifications to a specific HMC algorithm known as the no-U-turn sampler (NUTS) that aim to explore the sample space faster than NUTS, yielding a sampler that has faster convergence to the true distribution than NUTS.	翻訳日:2023-07-13 12:20:57 公開日:2023-07-09
# 大規模言語モデルの評価に関する調査 A Survey on Evaluation of Large Language Models ( http://arxiv.org/abs/2307.03109v2 ) ライセンス: Link先を確認	Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Kaijie Zhu, Hao Chen, Linyi Yang, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, and Xing Xie	(参考訳) 大規模言語モデル(LLM)は、様々なアプリケーションにおける前例のない性能のため、学術と産業の両方で人気が高まっている。 LLMは研究と日常利用の両方において重要な役割を担い続けており、その評価は、潜在的なリスクをよりよく理解するために、タスクレベルだけでなく社会レベルでもますます重要になっている。過去数年間、様々な観点からLLMを調べるための重要な努力が続けられてきた。本稿では, これらのLLMの評価手法を総合的に検討し, 何を評価するか, どこで評価するか, どのように評価するかの3つの重要な側面に着目した。まず,一般的な自然言語処理タスク,推論,医療利用,倫理,教育,自然科学,社会科学,エージェント応用など,評価タスクの観点から概観する。第2に,LLMの性能評価において重要な要素である評価手法とベンチマークに踏み込むことで,'where' と 'how' の質問に答える。次に、異なるタスクにおけるLLMの成功事例と失敗事例を要約する。最後に、LLM評価の先にあるいくつかの将来の課題に光を当てた。我々の目的は、LLMの評価の領域における研究者に貴重な洞察を提供することであり、それによってより熟練したLLMの開発を支援することである。我々のキーポイントは、LLMの開発を支援するために、評価を必須の規律として扱うべきであるということです。関連したオープンソース資料は、https://github.com/MLGroupJLU/LLM-eval-survey で一貫して保守しています。 Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented performance in various applications. As LLMs continue to play a vital role in both research and daily use, their evaluation becomes increasingly critical, not only at the task level, but also at the society level for better understanding of their potential risks. Over the past years, significant efforts have been made to examine LLMs from various perspectives. This paper presents a comprehensive review of these evaluation methods for LLMs, focusing on three key dimensions: what to evaluate, where to evaluate, and how to evaluate. Firstly, we provide an overview from the perspective of evaluation tasks, encompassing general natural language processing tasks, reasoning, medical usage, ethics, education, natural and social sciences, agent applications, and other areas. 
Secondly, we answer the 'where' and 'how' questions by diving into the evaluation methods and benchmarks, which serve as crucial components in assessing performance of LLMs. Then, we summarize the success and failure cases of LLMs in different tasks. Finally, we shed light on several future challenges that lie ahead in LLMs evaluation. Our aim is to offer invaluable insights to researchers in the realm of LLMs evaluation, thereby aiding the development of more proficient LLMs. Our key point is that evaluation should be treated as an essential discipline to better assist the development of LLMs. We consistently maintain the related open-source materials at: https://github.com/MLGroupJLU/LLM-eval-survey.	翻訳日:2023-07-12 17:50:45 公開日:2023-07-09
# mentalhealthai: パーソナルヘルスデバイスデータを利用して精神科治療を最適化する MentalHealthAI: Utilizing Personal Health Device Data to Optimize Psychiatry Treatment ( http://arxiv.org/abs/2307.04777v1 ) ライセンス: Link先を確認	Manan Shukla and Oshani Seneviratne	(参考訳) 精神疾患は現代医療において重要な課題であり、診断と治療はしばしば主観的な患者の記述と過去の医療史に依存している。この問題に対処するため,個人健康装置を用いて収集した患者の生理的データを利用する,個別のメンタルヘルストラッキングと気分予測システムを提案する。本システムでは,スマートコントラクトを用いた移動と連合型機械学習の概念を組み合わせた分散学習機構を活用し,ユーザのデバイスにデータを残し,プライバシを意識し説明可能な方法で精神科治療と管理のためのメンタルヘルス状態の効果的な追跡を可能にする。我々は、有望な結果を示す一般的なメンタルヘルスデータセットを用いてモデルを評価する。統合医療システムと機械学習モデルを利用することで、精神科医に従来のオフィス訪問以外の患者のメンタルヘルスに関するさらなる洞察を与えるという課題に対する新しい解決策を提供する。 Mental health disorders remain a significant challenge in modern healthcare, with diagnosis and treatment often relying on subjective patient descriptions and past medical history. To address this issue, we propose a personalized mental health tracking and mood prediction system that utilizes patient physiological data collected through personal health devices. Our system leverages a decentralized learning mechanism that combines transfer and federated machine learning concepts using smart contracts, allowing data to remain on users' devices and enabling effective tracking of mental health conditions for psychiatric treatment and management in a privacy-aware and accountable manner. We evaluate our model using a popular mental health dataset that demonstrates promising results. By utilizing connected health systems and machine learning models, our approach offers a novel solution to the challenge of providing psychiatrists with further insight into their patients' mental health outside of traditional office visits.	翻訳日:2023-07-12 17:29:41 公開日:2023-07-09
# 機械学習とニューラルネットワークを用いた脳波信号の感情解析 Emotion Analysis on EEG Signal Using Machine Learning and Neural Network ( http://arxiv.org/abs/2307.05375v1 ) ライセンス: Link先を確認	S. M. Masrur Ahmed (1), Eshaan Tanzim Sabur (2) ((1) bKash Limited, (2) BRAC University)	(参考訳) 感情は他人の考えや相互作用に大きな影響を与える。これは、その人の気持ちと行動とを結びつける役割を担っているが、時には人生の判断に影響を及ぼすともいえる。感情のパターンとその反射は人によって異なるため、その調査は幅広い地域において有効であるアプローチに基づいて行われる必要がある。特徴を抽出し精度を高めるため、脳波や脳波信号を用いた感情認識には、効率的な信号処理技術の実装が必要である。人間と機械の相互作用技術への様々なアプローチは長い間進行中であり、近年では脳信号を使って感情を自動的に理解することに成功した。本研究では、SVM(Support Vector Machine)、KNN(K-Nearest Neighbor)、LSTM(Long Short Term Memory)をトレーニングした先進ニューラルネットワークモデルRNN(Recurrent Neural Network)を用いて、よく知られた公開データセットであるDEAPデータセットから収集された脳波信号に基づいて、いくつかの感情状態を分類、検証した。本研究の目的は,脳信号を用いた感情認識性能を改善する方法を改善することである。一方、感情は時間とともに変化します。その結果,時間経過に伴う感情の変化についても検討した。 Emotion has a significant influence on how one thinks and interacts with others. It serves as a link between how a person feels and the actions one takes, or it could be said that it influences one's life decisions on occasion. Since the patterns of emotions and their reflections vary from person to person, their inquiry must be based on approaches that are effective over a wide range of population regions. To extract features and enhance accuracy, emotion recognition using brain waves or EEG signals requires the implementation of efficient signal processing techniques. Various approaches to human-machine interaction technologies have been ongoing for a long time, and in recent years, researchers have had great success in automatically understanding emotion using brain signals. In our research, several emotional states were classified and tested on EEG signals collected from a well-known publicly available dataset, the DEAP Dataset, using SVM (Support Vector Machine), KNN (K-Nearest Neighbor), and an advanced neural network model, RNN (Recurrent Neural Network), trained with LSTM (Long Short Term Memory). 
The main purpose of this study is to find better ways to improve emotion recognition performance using brain signals. Emotions, however, can change with time. As a result, the changes in emotion over time are also examined in our research.	翻訳日:2023-07-12 14:15:44 公開日:2023-07-09
# 書き換え規則による直線プログラム間の等価性を証明する自己教師付き学習 Self-Supervised Learning to Prove Equivalence Between Straight-Line Programs via Rewrite Rules ( http://arxiv.org/abs/2109.10476v4 ) ライセンス: Link先を確認	Steve Kommrusch, Martin Monperrus and Louis-Noël Pouchet	(参考訳) 文列からなる2つのプログラム間の意味同値性の証明を自動合成する問題を対象とする。抽象構文木(AST)を用いてプログラムを表現し、特定のASTパターンに特定のセマンティクス保存規則を適用することで、変換され、意味的に等価なプログラムを生成する。このシステムでは、書き換え規則の適用列によって一方のプログラムをもう一方のプログラムに書き換えられる場合に、2つのプログラムは等価であるとする。本稿では,プログラムペア間の等価性の証明を生成するトランスフォーマーモデルに基づくニューラルネットワークアーキテクチャを提案する。システムは書き換えのシーケンスを出力し、そのシーケンスの妥当性は、適用可能であることを検証するだけでチェックされる。ニューラルネットワークによって有効なシーケンスが生成されない場合、システムはプログラムを等価でないと報告し、設計によってプログラムが不正に等価であると報告されないようにする。本システムは,関数呼び出しと複数の型を持つ直線プログラムを表現可能な単一文法に対して完全に実装されている。このようなシーケンスを生成するシステムを効率的にトレーニングするために,自己教師付きサンプル選択という独自のインクリメンタルトレーニング手法を開発した。本稿では,この新たなトレーニング手法の有効性を,複雑さと長さが増大する証明に対して広く研究する。私たちのシステムであるS4Eqは、1万組の同等プログラムのデータセットで97%の証明成功を達成しています。 We target the problem of automatically synthesizing proofs of semantic equivalence between two programs made of sequences of statements. We represent programs using abstract syntax trees (AST), where a given set of semantics-preserving rewrite rules can be applied on a specific AST pattern to generate a transformed and semantically equivalent program. In our system, two programs are equivalent if there exists a sequence of application of these rewrite rules that leads to rewriting one program into the other. We propose a neural network architecture based on a transformer model to generate proofs of equivalence between program pairs. The system outputs a sequence of rewrites, and the validity of the sequence is simply checked by verifying it can be applied. If no valid sequence is produced by the neural network, the system reports the programs as non-equivalent, ensuring by design no programs may be incorrectly reported as equivalent. Our system is fully implemented for one single grammar which can represent straight-line programs with function calls and multiple types. 
To efficiently train the system to generate such sequences, we develop an original incremental training technique, named self-supervised sample selection. We extensively study the effectiveness of this novel training approach on proofs of increasing complexity and length. Our system, S4Eq, achieves 97% proof success on a curated dataset of 10,000 pairs of equivalent programs.	翻訳日:2023-07-11 23:06:01 公開日:2023-07-09
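The checking step described in the abstract above (a proposed rewrite sequence counts as a proof only if every step can actually be applied and the result is the other program) can be sketched as follows. This is a toy illustration, not the S4Eq implementation: the tuple-based AST encoding, the two rules `commute` and `add_zero`, and all function names are assumptions made for this example.

```python
# Toy sketch of proof checking by rule application (hypothetical encoding, not S4Eq itself).
# An expression is either a leaf string or a tuple (op, left, right).

def commute(node):
    """a + b -> b + a and a * b -> b * a; returns None if the rule does not apply."""
    if isinstance(node, tuple) and node[0] in ("+", "*"):
        return (node[0], node[2], node[1])
    return None

def add_zero(node):
    """a + 0 -> a; returns None if the rule does not apply."""
    if isinstance(node, tuple) and node[0] == "+" and node[2] == "0":
        return node[1]
    return None

RULES = {"commute": commute, "add_zero": add_zero}

def apply_at(expr, path, rule):
    """Apply `rule` to the sub-expression reached by `path` (child indices 1 or 2)."""
    if not path:
        return rule(expr)
    if not isinstance(expr, tuple):
        return None  # the path runs past a leaf
    child = apply_at(expr[path[0]], path[1:], rule)
    if child is None:
        return None
    rebuilt = list(expr)
    rebuilt[path[0]] = child
    return tuple(rebuilt)

def check_proof(source, target, steps):
    """True only if every step applies and the final expression equals `target`."""
    current = source
    for rule_name, path in steps:
        current = apply_at(current, path, RULES[rule_name])
        if current is None:
            return False  # an inapplicable step invalidates the whole sequence
    return current == target

source = ("*", ("+", "0", "x"), "y")   # (0 + x) * y
target = ("*", "y", "x")               # y * x
proof = [("commute", (1,)), ("add_zero", (1,)), ("commute", ())]
print(check_proof(source, target, proof))
```

Because a sequence is accepted only when it replays successfully, a wrong guess from the generator can produce a missed proof but never a false claim of equivalence, which is the property the abstract highlights.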
# SeedGNN: 教師付きグラフマッチングのためのグラフニューラルネットワーク SeedGNN: Graph Neural Networks for Supervised Seeded Graph Matching ( http://arxiv.org/abs/2205.13679v3 ) ライセンス: Link先を確認	Liren Yu, Jiaming Xu, Xiaojun Lin	(参考訳) グラフマッチングのためのグラフニューラルネットワーク(gnns)の設計には、トポロジカルな情報と小さなシードノードのみを使用して、2つのラベルのないグラフをマッチングすることを目的としている。しかし、このタスクの以前のgnnのほとんどは半教師付きアプローチを使用しており、大量の種を必要とし、見当たらないグラフに転送可能な知識を学べない。対照的に本論文では,未発見のグラフを数種の種とマッチングする方法を学習する新しい教師付きアプローチを提案する。私たちのSeedGNNアーキテクチャは、シードグラフマッチングの理論研究に触発された、いくつかの新しい設計を取り入れています。 1) 異なる大きさのグラフに一般化できる方法で、異なるホップから目撃者のような情報を計算し使用することを学ぶことができる。 2) 容易に整合したノードペアを新しいシードとして使用して,その後のレイヤでの整合性を改善する。合成グラフおよび実世界のグラフ上でのSeedGNNの評価を行い,既存の文献における非学習アルゴリズムと学習アルゴリズムを比較検討した。さらに,学習グラフからseedgnnから得られた知識を,サイズやカテゴリの異なるテストグラフに一般化できることを確認した。 There is a growing interest in designing Graph Neural Networks (GNNs) for seeded graph matching, which aims to match two unlabeled graphs using only topological information and a small set of seed nodes. However, most previous GNNs for this task use a semi-supervised approach, which requires a large number of seeds and cannot learn knowledge that is transferable to unseen graphs. In contrast, this paper proposes a new supervised approach that can learn from a training set how to match unseen graphs with only a few seeds. Our SeedGNN architecture incorporates several novel designs, inspired by theoretical studies of seeded graph matching: 1) it can learn to compute and use witness-like information from different hops, in a way that can be generalized to graphs of different sizes; 2) it can use easily-matched node-pairs as new seeds to improve the matching in subsequent layers. We evaluate SeedGNN on synthetic and real-world graphs and demonstrate significant performance improvements over both non-learning and learning algorithms in the existing literature. Furthermore, our experiments confirm that the knowledge learned by SeedGNN from training graphs can be generalized to test graphs of different sizes and categories.	
翻訳日:2023-07-11 22:56:37 公開日:2023-07-09
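The "witness-like information" mentioned in the abstract above can be illustrated with the simplest one-hop case: a seed pair (s1, s2) is a witness for a candidate pair (u, v) when s1 is adjacent to u in the first graph and s2 is adjacent to v in the second. The sketch below only counts such witnesses by hand; it is not the SeedGNN architecture, and the toy graphs, seed set, and function name are assumptions for illustration.

```python
# Minimal sketch of one-hop witness counting for seeded graph matching
# (illustrative only; SeedGNN learns to compute and combine such signals).

def witness_counts(adj1, adj2, seeds):
    """For each non-seed pair (u, v), count the seed pairs adjacent to both u and v."""
    counts = {}
    for u in adj1:
        if u in seeds:
            continue
        for v in adj2:
            if v in seeds.values():
                continue
            counts[(u, v)] = sum(
                1 for s1, s2 in seeds.items() if s1 in adj1[u] and s2 in adj2[v]
            )
    return counts

# Two copies of the path c - a - b - d, with upper-case labels in the second graph.
adj1 = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a"}, "d": {"b"}}
adj2 = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"}, "D": {"B"}}
seeds = {"a": "A", "b": "B"}  # known correspondences

counts = witness_counts(adj1, adj2, seeds)
for pair in sorted(counts):
    print(pair, counts[pair])
```

In this toy case the true pairs (c, C) and (d, D) each have one witness and the wrong pairs have none, so matching by the largest count recovers the correct correspondence.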
# 量子最適化アルゴリズムはどの程度必要か? How Much Entanglement Do Quantum Optimization Algorithms Require? ( http://arxiv.org/abs/2205.12283v2 ) ライセンス: Link先を確認	Yanzhu Chen, Linghua Zhu, Chenxu Liu, Nicholas J. Mayhall, Edwin Barnes, and Sophia E. Economou	(参考訳) 多くの古典的最適化問題は、量子近似最適化アルゴリズム(qaoa)のような変分量子アルゴリズムがヒューリスティックな手法を提供する対角イジングハミルトンの基底状態を見つけるためにマッピングすることができる。このような古典的最適化問題の解は必ずしも積状態であるため、絡み合いが性能に与える影響は明らかでない。 QAOAのAdaptive Derivative-Assembled Problem-Tailored (ADAPT) 変動は、回路全体のCNOTゲートが少なくなるのに対して、ミキサー層におけるエンタングリング操作を許容することで収束率を向上させる。本研究では,ADAPT-QAOAの実行時に発生する絡みについて検討する。重み付きMax-Cut問題のシミュレーションにより、ADAPT-QAOAは量子ビットのエンタングおよびアンタングリングにおいてかなりの柔軟性を示すことを示す。この柔軟性を漸進的に制限することにより、初期におけるより多くの絡み合いエントロピーが、後段におけるより速い収束と一致することが分かる。対照的に、標準QAOAはいくつかの層内での絡み合いを迅速に生成するが、過剰な絡み合いを効率的に除去することはできない。量子最適化における絡み合いの役割は微妙であり、量子最適化アルゴリズムに有利な特徴を構築するためのガイダンスを提供する。 Many classical optimization problems can be mapped to finding the ground states of diagonal Ising Hamiltonians, for which variational quantum algorithms such as the Quantum Approximate Optimization Algorithm (QAOA) provide heuristic methods. Because the solutions of such classical optimization problems are necessarily product states, it is unclear how entanglement affects their performance. An Adaptive Derivative-Assembled Problem-Tailored (ADAPT) variation of QAOA improves the convergence rate by allowing entangling operations in the mixer layers whereas it requires fewer CNOT gates in the entire circuit. In this work, we study the entanglement generated during the execution of ADAPT-QAOA. Through simulations of the weighted Max-Cut problem, we show that ADAPT-QAOA exhibits substantial flexibility in entangling and disentangling qubits. By incrementally restricting this flexibility, we find that a larger amount of entanglement entropy at earlier stages coincides with faster convergence at later stages. In contrast, while the standard QAOA quickly generates entanglement within a few layers, it cannot remove excess entanglement efficiently. 
Our results demonstrate that the role of entanglement in quantum optimization is subtle and provide guidance for building favorable features into quantum optimization algorithms.	翻訳日:2023-07-11 22:56:16 公開日:2023-07-09
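To make the mapping in the abstract above concrete: a weighted Max-Cut instance corresponds to a diagonal Ising energy, and every candidate solution is a classical bitstring (a product state). The brute-force sketch below checks the identity cut = (total weight - Ising energy) / 2 on a small graph. The graph and its weights are made up for illustration, and nothing here simulates QAOA or ADAPT-QAOA.

```python
# Sketch: weighted Max-Cut as a diagonal Ising problem, solved by brute force
# on a tiny hypothetical graph.
from itertools import product

edges = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 0.5, (2, 3): 1.5}
num_nodes = 4

def cut_value(bits, edges):
    """Total weight of the edges whose endpoints lie on different sides."""
    return sum(w for (i, j), w in edges.items() if bits[i] != bits[j])

def ising_energy(bits, edges):
    """Diagonal Ising energy sum_ij w_ij * s_i * s_j with spins s = +1 / -1."""
    spins = [1 - 2 * b for b in bits]
    return sum(w * spins[i] * spins[j] for (i, j), w in edges.items())

total_weight = sum(edges.values())
best = max(product((0, 1), repeat=num_nodes), key=lambda b: cut_value(b, edges))

print("best bitstring:", best)
print("cut value:", cut_value(best, edges))
print("Ising energy:", ising_energy(best, edges))
```

Minimizing the Ising energy is therefore the same task as maximizing the cut, which is why the optimum can be written as a single computational-basis state even though the algorithm may pass through entangled states on the way.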
# 未知動環境における高速運動計画のための障害物同定と楕円体分解 Obstacle Identification and Ellipsoidal Decomposition for Fast Motion Planning in Unknown Dynamic Environments ( http://arxiv.org/abs/2209.14233v4 ) ライセンス: Link先を確認	Mehmetcan Kaymaz and Nazim Kemal Ure	(参考訳) 未知の環境における動的障害物の存在による衝突回避は、無人システムにとって最も重要な課題の1つである。本稿では,楕円体の観点から障害物を識別し,線形および角障害物速度を推定する手法を提案する。提案手法は,任意の物体を楕円体で近似的に表現できるという考えに基づいている。そこで本研究では,ガウス混合モデルの変分ベイズ推定法,Khachiyanアルゴリズム,精細化アルゴリズムに基づく手法を提案する。提案手法はクラスタ数の知識を必要とせず,既存の最適化手法と異なり,リアルタイムに動作可能である。さらに,時間的に近接した2つの点フレームの間で障害物を対応付けるための楕円体ベースの特徴ベクトルを定義する。本手法は, 回転する障害物を含む静的および動的障害物のある環境に適用することができる。このアルゴリズムを他のクラスタリング手法と比較し,軌道プランナーと組み合わせることで,動的障害物が存在する場合でも,システム全体が未知の環境を効率的に横断できることを示す。 Collision avoidance in the presence of dynamic obstacles in unknown environments is one of the most critical challenges for unmanned systems. In this paper, we present a method that identifies obstacles in terms of ellipsoids to estimate linear and angular obstacle velocities. Our proposed method is based on the idea that any object can be approximately expressed by ellipsoids. To achieve this, we propose a method based on variational Bayesian estimation of a Gaussian mixture model, the Khachiyan algorithm, and a refinement algorithm. Our proposed method does not require knowledge of the number of clusters and can operate in real-time, unlike existing optimization-based methods. In addition, we define an ellipsoid-based feature vector to match obstacles given two temporally close point frames. Our method can be applied to any environment with static and dynamic obstacles, including the ones with rotating obstacles. We compare our algorithm with other clustering methods and show that when coupled with a trajectory planner, the overall system can efficiently traverse unknown environments in the presence of dynamic obstacles.	翻訳日:2023-07-11 22:47:15 公開日:2023-07-09
# 深層学習のための勾配に基づくbiレベル最適化に関する研究 Gradient-based Bi-level Optimization for Deep Learning: A Survey ( http://arxiv.org/abs/2207.11719v4 ) ライセンス: Link先を確認	Can Chen, Xi Chen, Chen Ma, Zixuan Liu, Xue Liu	(参考訳) 双レベル最適化,特に勾配に基づくカテゴリは,ハイパーパラメータ最適化やメタ知識抽出など,ディープラーニングコミュニティで広く利用されている。双レベル最適化は別の問題に埋め込まれ、勾配に基づくカテゴリは、進化アルゴリズムのような古典的な手法よりもはるかに効率的な過次性を計算することによって、外層課題を解決する。本研究では,まず,勾配に基づくbiレベル最適化を形式的に定義する。次に、二段階最適化に研究課題が適しているかどうかを判断するための基準を明確にし、これらの問題を二段階最適化フレームワークに構造化するための実践的なガイドを提供する。具体的には、正規化パラメータや蒸留データなどのハイパーパラメータを最適化するシングルタスク定式化と、モデル初期化のようなメタ知識を抽出するマルチタスク定式化の2つがある。次に,2段階の定式化により,外変数の明示的な勾配更新,プロキシ更新,暗黙的関数更新,クローズドフォーム更新を含む4つの2段階最適化ソルバについて検討する。最後に,(1)課題定式化のレンズを通して検証した科学における効果的なデータ最適化の2つの今後の方向性を強調することで調査をまとめる。 2)最適化の観点から解析した正確な明示的プロキシ更新。 Bi-level optimization, especially the gradient-based category, has been widely used in the deep learning community including hyperparameter optimization and meta-knowledge extraction. Bi-level optimization embeds one problem within another and the gradient-based category solves the outer-level task by computing the hypergradient, which is much more efficient than classical methods such as the evolutionary algorithm. In this survey, we first give a formal definition of the gradient-based bi-level optimization. Next, we delineate criteria to determine if a research problem is apt for bi-level optimization and provide a practical guide on structuring such problems into a bi-level optimization framework, a feature particularly beneficial for those new to this domain. More specifically, there are two formulations: the single-task formulation to optimize hyperparameters such as regularization parameters and the distilled data, and the multi-task formulation to extract meta-knowledge such as the model initialization. With a bi-level formulation, we then discuss four bi-level optimization solvers to update the outer variable including explicit gradient update, proxy update, implicit function update, and closed-form update. 
Finally, we wrap up the survey by highlighting two prospective future directions: (1) Effective Data Optimization for Science examined through the lens of task formulation. (2) Accurate Explicit Proxy Update analyzed from an optimization standpoint.	翻訳日:2023-07-11 22:46:08 公開日:2023-07-09
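The hypergradient idea summarized in the abstract above can be shown on a one-dimensional problem where everything is available in closed form. The inner problem, the outer loss, the constants `a` and `b`, and the step size below are all assumptions chosen for illustration and are not taken from the survey. The derivative of the inner solution is obtained from the implicit function theorem, in the spirit of what the survey calls the implicit function update.

```python
# Sketch of a gradient-based bi-level update on a scalar toy problem.
#   inner:  w*(lam) = argmin_w (w - a)^2 + lam * w^2   ->  w* = a / (1 + lam)
#   outer:  F(lam)  = (w*(lam) - b)^2

def inner_solution(lam, a):
    """Closed-form minimizer of the inner (regularized) objective."""
    return a / (1.0 + lam)

def outer_loss(lam, a, b):
    """Outer objective evaluated at the inner optimum."""
    return (inner_solution(lam, a) - b) ** 2

def hypergradient(lam, a, b):
    """dF/dlam via the implicit function theorem on the inner stationarity condition."""
    w = inner_solution(lam, a)
    # Stationarity: 2(w - a) + 2*lam*w = 0, so dw/dlam = -(2w) / (2 + 2*lam).
    dw_dlam = -w / (1.0 + lam)
    return 2.0 * (w - b) * dw_dlam

a, b = 2.0, 1.0
lam = 0.1
for _ in range(500):
    lam -= 0.5 * hypergradient(lam, a, b)  # plain gradient descent on the outer variable

print("learned regularization strength:", lam)
print("inner solution:", inner_solution(lam, a))
```

With these constants the outer loss is zero when the inner solution equals `b`, that is at `lam = 1`, so the loop should settle near that value. In realistic settings the inner solution is not available in closed form, which is where the solver families listed in the abstract differ.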
# アダマールテストと近似振幅制約を用いた量子Goemans-Williamsonアルゴリズム Quantum Goemans-Williamson Algorithm with the Hadamard Test and Approximate Amplitude Constraints ( http://arxiv.org/abs/2206.14999v3 ) ライセンス: Link先を確認	Taylor L. Patti, Jean Kossaifi, Anima Anandkumar, and Susanne F. Yelin	(参考訳) 半定値プログラムは、難しい組合せ問題を近似するなど、幅広い応用を持つ最適化手法である。そのような半定値プログラムの1つは、一般的な整数緩和法であるGoemans-Williamsonアルゴリズムである。我々は、最大$N=2^n$変数と$M \sim O(N)$制約を持つ半定値プログラムを近似的に解くために、$n{+}1$量子ビット、一定数の回路準備、および$\text{poly}(n)$個の期待値のみを使用する、Goemans-Williamsonアルゴリズムの変分量子アルゴリズムを導入する。効率的な最適化は、目的行列を補助量子ビットで条件付けられた適切にパラメータ化されたユニタリとして符号化することで達成される。これはアダマールテストとして知られる手法である。アダマールテストにより、指数的に多くの期待値を個別に推定するのではなく、補助量子ビットの1つの期待値のみを推定することで、目的関数を最適化することができる。同様に、半定値プログラムの制約は、多項式個のパウリ弦振幅制約を課すとともに、第2のアダマールテストを実装することで効果的に実施できることを示す。我々は,MaxCutを含む様々なNPハード問題に対してGoemans-Williamsonアルゴリズムの効率的な量子実装を考案し,提案プロトコルの有効性を実証する。本手法は,GSetライブラリから得られた多種多様なMaxCut問題に対する類似の古典的手法の性能を上回る。 Semidefinite programs are optimization methods with a wide array of applications, such as approximating difficult combinatorial problems. One such semidefinite program is the Goemans-Williamson algorithm, a popular integer relaxation technique. We introduce a variational quantum algorithm for the Goemans-Williamson algorithm that uses only $n{+}1$ qubits, a constant number of circuit preparations, and $\text{poly}(n)$ expectation values in order to approximately solve semidefinite programs with up to $N=2^n$ variables and $M \sim O(N)$ constraints. Efficient optimization is achieved by encoding the objective matrix as a properly parameterized unitary conditioned on an auxiliary qubit, a technique known as the Hadamard Test. The Hadamard Test enables us to optimize the objective function by estimating only a single expectation value of the ancilla qubit, rather than separately estimating exponentially many expectation values. 
Similarly, we illustrate that the semidefinite programming constraints can be effectively enforced by implementing a second Hadamard Test, as well as imposing a polynomial number of Pauli string amplitude constraints. We demonstrate the effectiveness of our protocol by devising an efficient quantum implementation of the Goemans-Williamson algorithm for various NP-hard problems, including MaxCut. Our method exceeds the performance of analogous classical methods on a diverse subset of well-studied MaxCut problems from the GSet library.	翻訳日:2023-07-11 22:45:22 公開日:2023-07-09
# 量子解を用いたビザンチン合意におけるフォールトトレランス境界とセキュリティホールの打破 Beating the fault-tolerance bound and security loopholes for Byzantine agreement with a quantum solution ( http://arxiv.org/abs/2206.09159v2 ) ライセンス: Link先を確認	Chen-Xun Weng, Rui-Qi Gao, Yu Bao, Bing-Hong Li, Wen-Bo Liu, Yuan-Mei Xie, Yu-Shuo Lu, Hua-Lei Yin, Zeng-Bing Chen	(参考訳) ブロックチェーンの基盤となるビザンチン合意は、分散ネットワーク内のすべてのノードが合意に達することを目指している。古典的なビザンチン合意は、必然的に2つの大きな問題に直面する。 1つは$1/3$のフォールトトレランス境界であり、$f$人の悪意のあるプレイヤーを許容するシステムは少なくとも$3f+1$人のプレイヤーを必要とする。もう1つは、古典的な暗号方式に由来するセキュリティの抜け穴である。ここでは,量子デジタル署名によるマルチパーティ相関により,約$1/2$のフォールトトレランスでこの境界を破る,無条件セキュリティを備えた厳密な量子ビザンチン合意を提案する。我々の研究は、もともとのビザンチン条件に厳密に従い、多粒子絡みを必要とせずに任意の数のプレイヤーに拡張することができる。デジタル台帳に対する3者および5者の量子コンセンサスを実験的に実証した。我々の研究は、コンセンサス問題の観点から量子優位性を示し、量子ブロックチェーンと量子コンセンサスネットワークの重要な道のりを示唆している。 Byzantine agreement, the underlying core of blockchain, aims to make every node in a decentralized network reach consensus. Classical Byzantine agreements unavoidably face two major problems. One is the $1/3$ fault-tolerance bound, which means that a system that tolerates $f$ malicious players requires at least $3f+1$ players. The other is the security loopholes from its classical cryptography methods. Here, we propose a strict quantum Byzantine agreement with unconditional security to break this bound with nearly $1/2$ fault tolerance due to multiparty correlation provided by quantum digital signatures. Our work strictly obeys the original Byzantine conditions and can be extended to any number of players without requirements for multiparticle entanglement. We experimentally demonstrate three-party and five-party quantum consensus for a digital ledger. Our work indicates the quantum advantage in terms of consensus problems and suggests an important avenue for quantum blockchain and quantum consensus networks.	翻訳日:2023-07-11 22:44:54 公開日:2023-07-09
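The arithmetic behind the two bounds in the abstract above is easy to tabulate. The classical condition, at least 3f+1 players to tolerate f malicious ones, is stated in the abstract. For the quantum protocol the abstract only says "nearly 1/2", so the condition n >= 2f+1 used below is an assumption of this sketch and should be checked against the paper itself.

```python
# Sketch: largest number of malicious players tolerated for a given network size n.

def max_faulty_classical(n):
    """Largest f with n >= 3f + 1 (classical bound quoted in the abstract)."""
    return (n - 1) // 3

def max_faulty_near_half(n):
    """Largest f with n >= 2f + 1 (assumed reading of 'nearly 1/2' fault tolerance)."""
    return (n - 1) // 2

for n in (3, 5, 10):
    print(n, max_faulty_classical(n), max_faulty_near_half(n))
```

Under this reading, a three-player network tolerates no malicious player classically but one under the assumed quantum condition, and a five-player network tolerates one versus two. These are the two network sizes the abstract reports as demonstrated experimentally.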
# 最適二分分類木学習のための混合整数線形最適化公式 Mixed integer linear optimization formulations for learning optimal binary classification trees ( http://arxiv.org/abs/2206.04857v2 ) ライセンス: Link先を確認	Brandon Alston, Hamidreza Validi, Illya V. Hicks	(参考訳) 決定木は分類と回帰のための強力なツールであり、機械学習の急成長する分野で働く多くの研究者を惹きつける。他の方法よりも決定木の方が優れているのは解釈可能性であり、比較的解釈不能な他の高精度な方法よりも好まれる。二分分類木には2種類の頂点がある。 (i)ちょうど2人の子供がいて、データポイントが一組の離散的特徴に基づいて評価される分岐頂点 (ii)データポイントが個別に予測される葉の頂点。最適な二分分類木は、目的とする生体的最適化問題を解くことで得られる。 i) 正しく分類されたデータポイントの数を最大化し、 (ii)分岐頂点の数を最小化する。本稿では, 最適二分分類木を設計するための4つの混合整数線形最適化 (milo) 式を提案する。本稿では,提案した定式化とAghaei et al. (2021) の最強フローベースMILO定式化とを理論的に比較する。我々は,パレートフロンティアを用いて,モデルがスケールする能力と2目的アプローチの強みを示すために,13の公開データセットについて実験を行う。コードとデータはGitHubで公開されている。 Decision trees are powerful tools for classification and regression that attract many researchers working in the burgeoning area of machine learning. One advantage of decision trees over other methods is their interpretability, which is often preferred over other higher accuracy methods that are relatively uninterpretable. A binary classification tree has two types of vertices: (i) branching vertices which have exactly two children and where datapoints are assessed on a set of discrete features; and (ii) leaf vertices at which datapoints are given a discrete prediction. An optimal binary classification tree can be obtained by solving a biobjective optimization problem that seeks to (i) maximize the number of correctly classified datapoints and (ii) minimize the number of branching vertices. In this paper, we propose four mixed integer linear optimization (MILO) formulations for designing optimal binary classification trees: two flow-based formulations and two-cut based formulations. We provide theoretical comparisons between our proposed formulations and the strongest flow-based MILO formulation of Aghaei et al. (2021). 
We conduct experiments on 13 publicly available datasets to show the models' ability to scale and the strength of a biobjective approach using Pareto frontiers. Our code and data are available on GitHub.	翻訳日:2023-07-11 22:44:39 公開日:2023-07-09
# 高次級数展開を用いたマルコフ開量子系シミュレーション Simulating Markovian open quantum systems using higher-order series expansion ( http://arxiv.org/abs/2212.02051v2 ) ライセンス: Link先を確認	Xiantao Li, Chunhao Wang	(参考訳) マルコフ開量子系の力学をシミュレーションするための効率的な量子アルゴリズムを提案する。このアルゴリズムの性能は、従来の最先端量子アルゴリズムと類似しており、進化時間に線形にスケールし、逆精度で多対数にスケールする。しかし,本アルゴリズムは概念的にクリーンであり,圧縮符号化のない単純な量子プリミティブのみを使用する。このアプローチは、デュハメルの原理に基づく高階級数展開とスケールドガウス二次数を用いた多重積分の近似を含む進化写像の新しい数学的処理に基づいている。本手法は時間依存リンドブレディアンを用いた量子力学のシミュレーションに容易に一般化する。さらに, スケールドガウス二次数を用いた多重積分近似法は, 時間次積分のより効率的な近似生成に応用できる可能性があり, ダイソン級数に基づく時間依存ハミルトニアンをシミュレートするための既存の量子アルゴリズムを単純化することができる。 We present an efficient quantum algorithm for simulating the dynamics of Markovian open quantum systems. The performance of our algorithm is similar to the previous state-of-the-art quantum algorithm, i.e., it scales linearly in evolution time and poly-logarithmically in inverse precision. However, our algorithm is conceptually cleaner, and it only uses simple quantum primitives without compressed encoding. Our approach is based on a novel mathematical treatment of the evolution map, which involves a higher-order series expansion based on Duhamel's principle and approximating multiple integrals using scaled Gaussian quadrature. Our method easily generalizes to simulating quantum dynamics with time-dependent Lindbladians. Furthermore, our method of approximating multiple integrals using scaled Gaussian quadrature could potentially be used to produce a more efficient approximation of time-ordered integrals, and therefore can simplify existing quantum algorithms for simulating time-dependent Hamiltonians based on a truncated Dyson series.	翻訳日:2023-07-11 22:36:37 公開日:2023-07-09
# ニューラルネットワークを用いたバイカルGVDデータのノイズ除去 Rejecting noise in Baikal-GVD data with neural networks ( http://arxiv.org/abs/2210.04653v2 ) ライセンス: Link先を確認	I. Kharuk, G. Rubtsov, G. Safronov	(参考訳) Baikal-GVDはバイカル湖の淡水に設置された大型($\sim$1 km$^3$)水中ニュートリノ望遠鏡である。深い湖水環境には背景光が満ちており、バイカルGVDの光センサーによって検出される。本稿では,これらのノイズヒットを,検出器内を伝播する相対論的粒子に由来する信号ヒットから効率的に分離するためのニューラルネットワークを提案する。モデルはU-netのようなアーキテクチャを持ち、イベントの時間的(因果的)構造を用いる。ニューラルネットワークのメトリクスは、モンテカルロシミュレーションデータセット上で、最大99\%の信号純度(精度)と96\%の生存効率(リコール)に達する。提案手法を,雑音を除去するアルゴリズム的手法と比較し,グラフベースなど他のニューラルネットワークのアーキテクチャについて考察する。 Baikal-GVD is a large ($\sim$1 km$^3$) underwater neutrino telescope installed in the fresh waters of Lake Baikal. The deep lake water environment is pervaded by background light, which is detectable by Baikal-GVD's photosensors. We introduce a neural network for an efficient separation of these noise hits from the signal ones, stemming from the propagation of relativistic particles through the detector. The model has a U-net-like architecture and employs the temporal (causal) structure of events. The neural network's metrics reach up to 99\% signal purity (precision) and 96\% survival efficiency (recall) on a Monte-Carlo simulated dataset. We compare the developed method with the algorithmic approach to rejecting the noise and discuss other possible architectures of neural networks, including graph-based ones.	翻訳日:2023-07-11 22:33:45 公開日:2023-07-09
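The two metrics quoted in the abstract above, signal purity (precision) and survival efficiency (recall), can be computed from per-hit labels as in the sketch below. The label arrays are made-up toy values, not Baikal-GVD data, and the function name is an assumption for illustration.

```python
# Sketch: signal purity (precision) and survival efficiency (recall) for hit classification.
# Label 1 marks a signal hit, label 0 a noise hit.

def purity_and_efficiency(true_labels, predicted_labels):
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # signal kept as signal
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # noise mistaken for signal
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # signal rejected as noise
    purity = tp / (tp + fp) if tp + fp else 0.0
    efficiency = tp / (tp + fn) if tp + fn else 0.0
    return purity, efficiency

true_hits = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 1, 0, 0, 1]
purity, efficiency = purity_and_efficiency(true_hits, predicted)
print("purity:", purity, "efficiency:", efficiency)
```

Purity answers how many of the hits kept as signal really are signal, while efficiency answers how many true signal hits survive the cut. The two usually trade off against each other as the classification threshold moves.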
# 学生のt-distribution:観測時の信頼度の測定について Student's t-Distribution: On Measuring the Inter-Rater Reliability When the Observations are Scarce ( http://arxiv.org/abs/2303.04526v2 ) ライセンス: Link先を確認	Serge Gladkoff and Lifeng Han and Goran Nenadic	(参考訳) 自然言語処理(NLP)において、我々は常にゴールデンクオリティ評価法として人間の判断に頼っている。しかし、翻訳品質評価(TQE)、特にデータサンプル(観測値)が非常に少ない場合など、特定の評価タスクに対して、レータ間信頼性(IRR)レベルをより良く評価する方法に関する議論が続いている。本研究ではまず,1つのデータ(評価)ポイントしか得られない場合に,測定値の信頼区間を推定する方法について検討する。次に,2つの人間生成観察スコアを例示し,``sudent's \textit{t}-distribution'' 法を紹介し,これら2つのデータ点のみを用いて irr スコアを測定する方法と,品質評価の信頼区間 (cis) について説明する。評価信頼度は, 1回だけ観察しても, より多くの観察を導入することで, 評価信頼度が大幅に向上することを示す。研究者は、学生の「textit{t}-Distribution method」など、あらゆる方法でIRRスコアを報告し、NLP評価をより有意義で透明で信頼性の高いものにすることを推奨する。この \textit{t}-distribution 法は nlp フィールドの外でも利用でき、観測データが乏しい場合には、実験調査の信頼に値する評価のために irr レベルを測定することができる。キーワード:インターレータ信頼性(IRR)、スカース観測(Scarce Observations)、信頼区間(CIs)、自然言語処理(NLP)、翻訳品質評価(TQE)、学生の『textit{t}-Distribution』 In natural language processing (NLP) we always rely on human judgement as the golden quality evaluation method. However, there has been an ongoing debate on how to better evaluate inter-rater reliability (IRR) levels for certain evaluation tasks, such as translation quality evaluation (TQE), especially when the data samples (observations) are very scarce. In this work, we first introduce the study on how to estimate the confidence interval for the measurement value when only one data (evaluation) point is available. Then, this leads to our example with two human-generated observational scores, for which, we introduce ``Student's \textit{t}-Distribution'' method and explain how to use it to measure the IRR score using only these two data points, as well as the confidence intervals (CIs) of the quality evaluation. We give quantitative analysis on how the evaluation confidence can be greatly improved by introducing more observations, even if only one extra observation. We encourage researchers to report their IRR scores in all possible means, e.g. 
using Student's \textit{t}-Distribution method whenever possible; thus making the NLP evaluation more meaningful, transparent, and trustworthy. This \textit{t}-Distribution method can be also used outside of NLP fields to measure IRR level for trustworthy evaluation of experimental investigations, whenever the observational data is scarce. Keywords: Inter-Rater Reliability (IRR); Scarce Observations; Confidence Intervals (CIs); Natural Language Processing (NLP); Translation Quality Evaluation (TQE); Student's \textit{t}-Distribution	翻訳日:2023-07-11 22:16:24 公開日:2023-07-09
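To illustrate why even one extra observation matters, as the abstract above argues, the sketch below computes a 95% confidence interval for the mean of a few rater scores using Student's t-distribution. The scores are invented for illustration, the critical values are standard two-sided 95% table values for 1 to 4 degrees of freedom, and this is a generic textbook calculation rather than the paper's exact IRR procedure.

```python
# Sketch: 95% confidence interval for a mean score from very few observations.
import math
import statistics

# Two-sided 95% critical values of Student's t, indexed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776}

def t_confidence_interval(scores):
    """Return (low, high) for the mean of `scores`; requires 2 to 5 observations."""
    n = len(scores)
    mean = statistics.mean(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(n)
    half_width = T_CRIT_95[n - 1] * standard_error
    return mean - half_width, mean + half_width

two_raters = [80, 90]
three_raters = [80, 90, 85]

low2, high2 = t_confidence_interval(two_raters)
low3, high3 = t_confidence_interval(three_raters)
print("two observations:  ", round(low2, 2), round(high2, 2))
print("three observations:", round(low3, 2), round(high3, 2))
```

With two scores the half-width is about 63.5 points, which is almost uninformative on a 100-point scale. A third score in the same range shrinks it to about 12.4 points, mainly because the critical value drops from 12.706 to 4.303.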
# 固有値問題に対するほぼ退化密度行列摂動理論の係数 Coefficients of almost-degenerate density matrix perturbation theory for eigenvalue problems ( http://arxiv.org/abs/2305.09026v2 ) ライセンス: Link先を確認	Charles Arnal, Louis Garrigue	(参考訳) 固有値問題のほぼ退化摂動理論をスペクトルプロジェクタ、別名密度行列を用いて検討する。複数の固有値が互いに近いとき、摂動級数の係数は、固有値間の差の逆がいくつかの因子として現れるため特異になる。級数の係数の表現におけるこれらの人工特異点を取り除き、固有値のギャップを任意に小さくし、結果の式で消えることさえできる。 We investigate almost-degenerate perturbation theory of eigenvalue problems, using spectral projectors, also named density matrices. When several eigenvalues are close to each other, the coefficients of the perturbative series become singular because inverses of differences between eigenvalues arise as some factors. We remove those artificial singularities in the expressions of the coefficients of the series, allowing eigenvalue gaps to be arbitrarily small and even vanishing in the resulting formulas.	翻訳日:2023-07-11 22:07:41 公開日:2023-07-09
# 分子関係学習のための条件付きグラフ情報ボトルネック Conditional Graph Information Bottleneck for Molecular Relational Learning ( http://arxiv.org/abs/2305.01520v2 ) ライセンス: Link先を確認	Namkyeong Lee, Dongmin Hyun, Gyoung S. Na, Sungwon Kim, Junseok Lee, Chanyoung Park	(参考訳) 分子関係学習は、分子対間の相互作用の振る舞いを学ぶことを目的としており、その幅広い応用のために分子科学への関心が高まっている。近年、グラフニューラルネットワークは、分子をグラフ構造としてモデル化し、2分子間の原子レベルの相互作用を考慮することで、分子関係学習において大きな成功を収めている。その成功にもかかわらず、既存の分子関係学習法は化学の性質を見落とす傾向にある。すなわち、化合物は、特有の化学反応を引き起こす官能基のような複数のサブ構造から構成される。本研究では,コアサブグラフを検出することによって,グラフ対間のインタラクション挙動を予測するCGIBと呼ばれる新しい関係学習フレームワークを提案する。主なアイデアは、一対のグラフが与えられたとき、条件付きグラフ情報ボトルネックの原理に基づいて、ペアとなるグラフを条件として、タスクに関する最小限の十分な情報を含むサブグラフをグラフから見つけることである。提案手法は化学反応の性質、すなわち分子の核構造がどの分子と相互作用するかによって変化するという性質を模倣していると論じる。実世界のデータセットを用いた様々なタスクに関する大規模な実験は、最先端のベースラインよりもCGIBの方が優れていることを示す。私たちのコードは https://github.com/Namkyeong/CGIB で利用可能です。 Molecular relational learning, whose goal is to learn the interaction behavior between molecular pairs, has attracted a surge of interest in molecular sciences due to its wide range of applications. Recently, graph neural networks have shown great success in molecular relational learning by modeling a molecule as a graph structure and considering atom-level interactions between two molecules. Despite their success, existing molecular relational learning methods tend to overlook the nature of chemistry, i.e., a chemical compound is composed of multiple substructures such as functional groups that cause distinctive chemical reactions. In this work, we propose a novel relational learning framework, called CGIB, that predicts the interaction behavior between a pair of graphs by detecting core subgraphs therein. The main idea is, given a pair of graphs, to find a subgraph from a graph that contains the minimal sufficient information regarding the task at hand conditioned on the paired graph based on the principle of conditional graph information bottleneck. 
We argue that our proposed method mimics the nature of chemical reactions, i.e., the core substructure of a molecule varies depending on which other molecule it interacts with. Extensive experiments on various tasks with real-world datasets demonstrate the superiority of CGIB over state-of-the-art baselines. Our code is available at https://github.com/Namkyeong/CGIB.	翻訳日:2023-07-11 22:07:32 公開日:2023-07-09
# Cu配線の非破壊診断における反射係数のグラフパターンの学習 Learning Graph Patterns of Reflection Coefficient for Non-destructive Diagnosis of Cu Interconnects ( http://arxiv.org/abs/2304.10207v2 ) ライセンス: Link先を確認	Tae Yeob Kang, Haebom Lee, Sungho Suh	(参考訳) プロセッサの動作周波数とクロック速度の増加に伴い、相互接続は電子システム全体の信頼性と性能の両方に影響を及ぼす。配線の故障検出と診断は、電子の予後と健康管理(PHM)に不可欠である。しかし、電気信号を予後因子として用いる従来のアプローチは、欠陥根本原因を識別し、さらなる破壊的な評価を必要とすることがあり、ノイズ干渉の危険性があり、誤報につながる可能性がある。これらの制約に対処するため,Cu配線欠陥の非破壊検出と診断のための新しい手法を提案し,早期検出,診断精度の向上,耐雑音性を実現した。本手法は,従来の時系列信号解析とは異なる手法である反射係数のグラフパターンを利用して,相互接続欠陥の根本原因と重大度を一意に解析する。本研究では,グラフパターンが故障診断の能力を有し,学習アルゴリズムの効果的な入力データとなることを実験的に実証する。さらに,重大度評価アンサンブル学習(srel)アプローチを導入し,診断精度と雑音ロバスト性を大幅に向上させる。実験の結果,提案手法は従来の機械学習手法やマルチクラス畳み込みニューラルネットワーク(CNN)よりも優れており,特に高騒音下での最大精度は99.3%であることがわかった。 With the increasing operating frequencies and clock speeds in processors, interconnects affect both the reliability and performance of entire electronic systems. Fault detection and diagnosis of the interconnects are crucial for prognostics and health management (PHM) of electronics. However, traditional approaches using electrical signals as prognostic factors often face challenges in distinguishing defect root causes, necessitating additional destructive evaluations, and are prone to noise interference, leading to potential false alarms. To address these limitations, this paper introduces a novel approach for non-destructive detection and diagnosis of defects in Cu interconnects, offering early detection, enhanced diagnostic accuracy, and noise resilience. Our approach uniquely analyzes both the root cause and severity of interconnect defects by leveraging graph patterns of reflection coefficient, a technique distinct from traditional time series signal analysis. We experimentally demonstrate that the graph patterns possess the capability for fault diagnosis and serve as effective input data for learning algorithms. 
Additionally, we introduce a novel severity rating ensemble learning (SREL) approach, which significantly enhances diagnostic accuracy and noise robustness. Experimental results demonstrate that the proposed method outperforms conventional machine learning methods and multi-class convolutional neural networks (CNN), achieving a maximum accuracy of 99.3%, especially under elevated noise levels.	翻訳日:2023-07-11 22:07:14 公開日:2023-07-09
# simbaml: 機械モデルと機械学習を拡張データで接続する SimbaML: Connecting Mechanistic Models and Machine Learning with Augmented Data ( http://arxiv.org/abs/2304.04000v2 ) ライセンス: Link先を確認	Maximilian Kleissl, Lukas Drews, Benedict B. Heyder, Julian Zabbarov, Pascal Iversen, Simon Witzke, Bernhard Y. Renard, Katharina Baum	(参考訳) 高度な機械学習(ML)モデルのトレーニングには、多くのアプリケーションで収集するのが困難または高価である大規模なデータセットが必要である。システムダイナミクスに関する事前知識が利用可能であれば、実世界のデータを補完するために機械的な表現が使用できる。我々は,通常の微分方程式モデルからリアルな合成データセットを生成するオープンソースツールであるSimbaML(Simulation-based ML)と,MLパイプラインの直接解析と包含について述べる。 SimbaMLは、合成データから実世界のデータへの変換学習、データ拡張、データ収集の必要性の識別、物理インフォームドMLアプローチのベンチマークを可能にする。 SimbaMLはhttps://pypi.org/project/simba-ml/から入手できる。 Training sophisticated machine learning (ML) models requires large datasets that are difficult or expensive to collect for many applications. If prior knowledge about system dynamics is available, mechanistic representations can be used to supplement real-world data. We present SimbaML (Simulation-Based ML), an open-source tool that unifies realistic synthetic dataset generation from ordinary differential equation-based models and the direct analysis and inclusion in ML pipelines. SimbaML conveniently enables investigating transfer learning from synthetic to real-world data, data augmentation, identifying needs for data collection, and benchmarking physics-informed ML approaches. SimbaML is available from https://pypi.org/project/simba-ml/.	翻訳日:2023-07-11 22:05:42 公開日:2023-07-09
# 深部物理誘導粒子流場を用いた非教師なしクロスドメインソフトセンサモデリング Unsupervised Cross-Domain Soft Sensor Modelling via Deep Physics-Inspired Particle Flow Bayes ( http://arxiv.org/abs/2306.04919v4 ) ライセンス: Link先を確認	Junn Yong Loo, Ze Yang Ding, Surya G. Nurzaman, Chee-Ming Ting, Vishnu Monn Baskaran and Chee Pin Tan	(参考訳) データ駆動型ソフトセンサーは、信頼できる状態推定によって正確な知覚を達成するために不可欠である。しかし、代表的なソフトセンサーモデルの開発には、ラベルの欠如、ドメイン適応性、データの時間的コヒーレンスといった問題がある。これらの課題に対処するため,我々は,対象とする状態ラベルがない場合のクロスドメインソフトセンサモデリングのためのdpfb(deep particle flow bayes)フレームワークを提案する。特に、シーケンシャルベイズ目標を最初に定式化し、クロスドメインソフトセンシング問題の基礎となる最大確率推定を行う。フレームワークのコアには物理に触発された粒子の流れが組み込まれており、シーケンシャルベイズ目標を最適化し、抽出された潜在性と隠れた特徴の正確なベイズ更新を行う。その結果,提案手法は複雑なクロスドメインシステムのダイナミクスを特徴付け,効率的な時系列非教師なしドメイン適応 (uda) を実現することができる。最後に,複雑なダイナミクスと複数の動作条件を有する複合産業多相流プロセスシステム上での枠組みを検証する。その結果,DPFBフレームワークは高いドメイン間ソフトセンシング性能,最先端の深部UDA性能,正規化フローアプローチを実現していることがわかった。 Data-driven soft sensors are essential for achieving accurate perception through reliable state inference. However, developing representative soft sensor models is challenged by issues such as missing labels, domain adaptability, and temporal coherence in data. To address these challenges, we propose a deep Particle Flow Bayes (DPFB) framework for cross-domain soft sensor modeling in the absence of target state labels. In particular, a sequential Bayes objective is first formulated to perform the maximum likelihood estimation underlying the cross-domain soft sensing problem. At the core of the framework, we incorporate a physics-inspired particle flow that optimizes the sequential Bayes objective to perform an exact Bayes update of the model extracted latent and hidden features. As a result, these contributions enable the proposed framework to learn a rich approximate posterior feature representation capable of characterizing complex cross-domain system dynamics and performing effective time series unsupervised domain adaptation (UDA). 
Finally, we validate the framework on a complex industrial multiphase flow process system with complex dynamics and multiple operating conditions. The results demonstrate that the DPFB framework achieves superior cross-domain soft sensing performance, outperforming state-of-the-art deep UDA and normalizing flow approaches.	翻訳日:2023-07-11 21:57:29 公開日:2023-07-09
# 量子コヒーレンス保護のための熱コヒーレント状態の調製 Preparation of thermal coherent state for quantum coherence protection ( http://arxiv.org/abs/2306.04369v2 ) ライセンス: Link先を確認	Asghar Ullah, M. Tahir Naseem, and \"Ozg\"ur E. M\"ustecapl{\i}o\u{g}lu	(参考訳) 熱環境と量子システムの間の不可避な相互作用は、量子特性の劣化を招き、量子状態工学によって対抗することができる。特に、熱コヒーレント状態(tcs)の調製は、量子ビットの量子特性の延長に有望である。熱的, 縦方向の伝送線路共振器において, アンシラ量子ビットを用いてTCSを実現することを提案する。開系力学を記述するためにマスター方程式を用いると、量子ビットと共振器に対するマスター方程式の定常解が得られる。注目すべきは、共振器の状態はTCSであり、アンシラ量子ビットは熱のままである。さらに,2次相関係数と光子数統計値を用いて量子特性の検証を行った。そこで本研究では,二段系と共振器からなるハイブリッド系に基づいて量子コヒーレンスを生成する機構について検討し,アシラ支援による熱コヒーレント状態が量子ビットのコヒーレンス寿命を延ばすのに役立つと主張する。この結果は,量子科学と技術のためのTCSの作成と実装に有望な方向性をもたらす可能性がある。 The unavoidable interaction between thermal environments and quantum systems leads to the degradation of the quantum features, which can be fought against by engineered quantum states. In particular, preparing a thermal coherent state (TCS) can be promising for prolonging the quantum properties of qubits. We propose that a TCS can be realized by using an ancilla qubit to thermally and longitudinally driven transmission line resonator. Using the master equation approach to describe the open system dynamics, we obtain the steady-state solution of the master equation for the qubit and resonator. Remarkably, the state of the resonator is a TCS, while the ancilla qubit remains thermal. Furthermore, we study the second-order correlation coefficient and photon number statistics to validate its quantum properties. To sum up, we also investigate a mechanism for generating quantum coherence based on a hybrid system composed of two-level systems and a resonator to claim that an ancilla-assisted engineered thermal coherent state can assist in prolonging the coherence lifetimes of qubits. Our results may provide a promising direction for preparing and practically implementing TCSs for quantum science and technology.	翻訳日:2023-07-11 21:57:07 公開日:2023-07-09
# 音声表現モデルのタスク非依存的構造化プルーニング Task-Agnostic Structured Pruning of Speech Representation Models ( http://arxiv.org/abs/2306.01385v2 ) ライセンス: Link先を確認	Haoyu Wang, Siyuan Wang, Wei-Qiang Zhang, Hongbin Suo, Yulong Wan	(参考訳) Wav2vec2, Hubert, WavLMなどの自己教師付き事前訓練モデルでは、多くの音声タスクを大幅に改善することが示されている。しかし、その大きなメモリと強力な計算要求が産業応用を妨げている。構造化プルーニングはハードウェアフレンドリーなモデル圧縮技術であるが、通常は精度が低下する。本稿では,性能劣化を補償するための細粒度注意ヘッドプルーニング法を提案する。さらに,L0正則化に直線スルー推定器を導入し,プルーンドモデルをさらに高速化する。 superbベンチマークの実験では、複数のタスクで密度の高いモデルと同等の性能を達成でき、平均でwav2vec 2.0ベースモデルよりも72%少ないパラメータと2倍速い推論速度を持つ。 Self-supervised pre-trained models such as Wav2vec2, Hubert, and WavLM have been shown to significantly improve many speech tasks. However, their large memory and strong computational requirements hinder their industrial applicability. Structured pruning is a hardware-friendly model compression technique but usually results in a larger loss of accuracy. In this paper, we propose a fine-grained attention head pruning method to compensate for the performance degradation. In addition, we also introduce the straight through estimator into the L0 regularization to further accelerate the pruned model. Experiments on the SUPERB benchmark show that our model can achieve comparable performance to the dense model in multiple tasks and outperforms the Wav2vec 2.0 base model on average, with 72% fewer parameters and 2 times faster inference speed.	翻訳日:2023-07-11 21:56:28 公開日:2023-07-09
# 二重複素量子トランスダクションのための最適化プロトコル Optimized protocols for duplex quantum transduction ( http://arxiv.org/abs/2305.15648v2 ) ライセンス: Link先を確認	Zhaoyou Wang, Mengzhen Zhang, Yat Wong, Changchun Zhong, Liang Jiang	(参考訳) 量子トランスデューサは量子ネットワーク内の物理プラットフォームのハイブリッドインターフェースを介して量子信号を変換する。量子通信チャネルとしてモデル化され、一方向量子変換の性能は量子チャネル容量によって測定できる。しかし、双方向に信号が変換される二重量子トランスダクションに用いられる量子トランスデューサの特性は未解決のままである。本稿では、二重複素量子トランスダクションの性能を特徴付けるためのレート領域を提案する。このツールを用いることで、同時二重変換に最適化された量子トランスデューサは、時間共有一方向変換の標準プロトコルに基づく戦略よりも優れることがわかった。周波数領域に統合されたレート領域は、有限帯域幅の量子トランスデューサを特徴付けることもできる。 Quantum transducers convert quantum signals through hybrid interfaces of physical platforms in quantum networks. Modeled as quantum communication channels, performance of unidirectional quantum transduction can be measured by the quantum channel capacity. However, characterizing performance of quantum transducers used for duplex quantum transduction where signals are converted bidirectionally remains an open question. Here, we propose rate regions to characterize the performance of duplex quantum transduction. Using this tool, we find that quantum transducers optimized for simultaneous duplex transduction can outperform strategies based on the standard protocol of time-shared unidirectional transduction. Integrated over the frequency domain, we demonstrate that rate region can also characterize quantum transducers with finite bandwidth.	翻訳日:2023-07-11 21:55:30 公開日:2023-07-09
# 自由フェルミオン模型における拡散複雑性 Spread Complexity in free fermion models ( http://arxiv.org/abs/2305.12115v2 ) ライセンス: Link先を確認	Mamta Gautam, Nitesh Jaiswal, and Ankit Gill	(参考訳) 3スピン相互作用型イジングモデル、xyスピンチェーン、su-schrieffer-heegerモデルにおけるクエンチェの作業の複雑さと統計について検討した。我々は,これらのモデルについて,急速クエンチや急速クエンチなどの異なるクエンチのスキームについて検討した。パラメータの時間依存周期駆動の存在下で、3つのモデルすべてを調べるためにフロッケ演算子手法を用いる。急激な焼成事件とは対照的に、周期的に変化するパラメーターケースは臨界点付近の非解析的挙動をはっきりと示している。また, 作業とランチョス係数の関係と, 作業の統計が臨界点付近でどのように振る舞うかを明らかにする。 We study spread complexity and the statistics of work done for quenches in the three-spin interacting Ising model, the XY spin chain, and the Su-Schrieffer-Heeger model. We study these models without quench and for different schemes of quenches, such as sudden quench and multiple sudden quenches. We employ the Floquet operator technique to investigate all three models in the presence of time-dependent periodic driving of parameters. In contrast to the sudden quenched cases, the periodically varying parameter case clearly shows non-analytical behaviour near the critical point. We also elucidate the relation between work done and the Lanczos coefficient and how the statistics of work done behave near critical points.	翻訳日:2023-07-11 21:54:44 公開日:2023-07-09
# 最適化された二乗誤差を用いた量子振幅推定 Quantum Amplitude Estimation with Optimized Squared Error ( http://arxiv.org/abs/2306.16695v2 ) ライセンス: Link先を確認	Xi Lu, Hongwei Lin	(参考訳) まず,量子位相推定回路の初期状態を最適化することにより,量子振幅推定の誤差挙動を最適化する手法を提案する。次に、半分のオラクル呼び出し回数で同じ性能を達成する量子回路を構築する。このような最適化された量子振幅推定(OQAE)アルゴリズムは標準偏差$\Delta x \sim 1.283/N$を達成でき、$\Delta x$がおよそ$4/N$を超える既存アルゴリズムを大きく上回る。 We first introduce a method to optimize the error behavior of quantum amplitude estimation by optimizing the initial state of the quantum phase estimation circuit. Then we construct a quantum circuit that achieves the same performance with half the number of oracle calls. The resulting optimized quantum amplitude estimation (OQAE) algorithm can achieve a standard deviation $\Delta x \sim 1.283/N$, which substantially outperforms existing algorithms, for which $\Delta x$ is roughly above $4/N$.	翻訳日:2023-07-11 21:48:48 公開日:2023-07-09
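The two error constants quoted in the abstract above can be compared with a few lines of arithmetic. This is only a sketch under stated assumptions: the abstract does not define $N$, so it is treated here as an unspecified resource budget, the relation $\Delta x \sim c/N$ is read as an equality, and the function names are hypothetical.

```python
def std_ratio(existing_coeff: float = 4.0, oqae_coeff: float = 1.283) -> float:
    """Ratio of the two standard deviations at the same N.

    Assumes Delta x = c / N holds exactly for both algorithms, which the
    abstract only states up to "~" and "about >".
    """
    return existing_coeff / oqae_coeff


def budget_for_target_std(target_std: float, coeff: float) -> float:
    """Value of N needed to reach target_std if Delta x = coeff / N."""
    if target_std <= 0:
        raise ValueError("target_std must be positive")
    return coeff / target_std


if __name__ == "__main__":
    print(f"std ratio at equal N: {std_ratio():.3f}")
    print(f"N for Delta x = 1e-3 (OQAE constant): {budget_for_target_std(1e-3, 1.283):.0f}")
    print(f"N for Delta x = 1e-3 (constant 4):    {budget_for_target_std(1e-3, 4.0):.0f}")
```

Under these assumptions the constant 4 requires roughly three times the budget of the constant 1.283 for the same target standard deviation.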
# インテリジェントトレーディング確率波方程式に基づく複素適応学習の理論 A Theory of Complex Adaptive Learning Based on an Intelligent Trading Probability Wave Equation ( http://arxiv.org/abs/2306.15554v3 ) ライセンス: Link先を確認	Leilei Shi, Bing-Hong Wang, Xinshuai Guo, Guocheng Wang	(参考訳) 複雑適応学習は知的であり、生命と無生物の複雑なシステムにおいて不可欠である。複雑なシステムは、相互作用する多くの個人や単位を含み、相互作用するときに隠れたパターンを示し、自然科学から社会科学まで、ほぼ全ての伝統的な分野において広く起こる。最近の研究では、いわゆる建築材料が学習できることを示した。複雑な系の定式化のメカニズムを探求する科学者を刺激する。しかし、それは非常に難しい。ここでは,複素系の局所的力学平衡を対象とする普遍的規則あるいは複素適応学習法則を,貿易量-価格確率波方程式から抽出し,その応用として複素量子系に適用しようとする。複雑な量子系に作用する運動量力が非局在化されていれば、相互作用コヒーレンスにおけるインテリジェンスのような性質を持つ粒子が証明される。これは時間間隔で観測された移動粒子の累積確率である。したがって、複雑な量子系の粒子は、金融市場の複雑さにおけるトレーダーのそれと正確に複雑な適応学習機構によって支配される強化座標において、複雑な適応学習または知性のような性質を持つと仮定する。この仮定により、量子力学における絡み合いの革新的な解釈を提案する。量子の絡み合いはコペンハーゲンの主流派が維持するコヒーレント状態の重ね合わせの状態ではないと結論付けている。相補的な2つの力と可変力の間の相互作用におけるコヒーレントな状態である。著者らは,新しい技術経路における絡み合い資源の産業生産を示唆し,その妥当性を検証し,その理論が完全になるまでさらに改良する実験結果を見据えた。 Complex adaptive learning is intelligent and crucial in living and inanimate complex systems. A complex system comprises many interacting individuals or units, shows hidden patterns as they interact, and widely occurs in almost every traditional discipline, from natural to social sciences. A recent study has demonstrated a so-called architected material capable of learning. It stimulates scientists to explore the mechanism of complex systems formulation. However, it is very challenging. Here the authors attempt to extract a universal rule or a law of complex adaptive learning subject to local dynamic equilibrium in complex systems from a trading volume-price probability wave equation and apply it to complex quantum systems as its application. It proves particles capable of intelligence-like properties in interactive coherence if the momentum force exerted on the complex quantum systems is non-localized. It is the cumulative probability of the moving particles observed in a time interval. 
Thus, it assumes that particles in complex quantum systems have a complex adaptive learning- or intelligence-like property in a reinforced coordinate, governed by the exact complex adaptive learning mechanism as that of traders in the complexity of the financial markets. With this assumption, the authors propose an innovative interpretation of entanglement in quantum mechanics. It concludes that quantum entanglement is not a state of the superposition of coherent states as the mainstream Copenhagen school of thought maintains. It is a coherent state in the interaction between two opposite, complementary, and variable forces. The authors look forward to the experimental results to examine its validity and further improve the theory until it is perfect, suggesting industrial production of entanglement resources in new technical routes available	翻訳日:2023-07-11 21:48:38 公開日:2023-07-09
# 二次計画法における$O(\sqrt{n})$から$O(\log(n))$へ From $O(\sqrt{n})$ to $O(\log(n))$ in Quadratic Programming ( http://arxiv.org/abs/2306.15079v2 ) ライセンス: Link先を確認	Liang Wu	(参考訳) 「暗雲」が数十年にわたり数値最適化理論を覆ってきた。すなわち、反復複雑性が$O(\log(n))$の最適化アルゴリズムは存在するのか、という問題である。本論文は、新たな最適化アルゴリズムと厳密な理論的証明によって「存在する」と答える。出発点はボックス制約付き二次計画法(Box-QP)であり、多くの実用的な最適化問題がBox-QPに該当する。一般的な滑らかな二次計画法(QP)、非滑らかなLasso、サポートベクターマシン(または回帰)は双対性理論によりBox-QPとして再定式化できる。特に、"direct"法のように振る舞う$O(\log(n))$反復複雑性のQPアルゴリズムが提示されるのは初めてである。必要な反復回数は決定論的であり、正確な値は$\left\lceil\log\left(\frac{3.125n}{\epsilon}\right)/\log(1.5625)\right\rceil$である。この大きなブレークスルーにより、$O(\sqrt{n})$から$O(\log(n))$の最適化アルゴリズムへの移行が可能になり、そのスケーラビリティは今日のビッグデータと人工知能の時代に特に重要である。 A "dark cloud" has hung over numerical optimization theory for decades, namely, whether an optimization algorithm with $O(\log(n))$ iteration complexity exists. "Yes", this paper answers, with a new optimization algorithm and a rigorous theoretical proof. It starts with box-constrained quadratic programming (Box-QP), and many practical optimization problems fall into Box-QP. General smooth quadratic programming (QP), nonsmooth Lasso, and support vector machine (or regression) can be reformulated as Box-QP via duality theory. This is the first time an $O(\log(n))$ iteration complexity QP algorithm has been presented; in particular, it behaves like a "direct" method: the required number of iterations is deterministic, with the exact value $\left\lceil\log\left(\frac{3.125n}{\epsilon}\right)/\log(1.5625)\right\rceil$. This significant breakthrough enables us to transition from $O(\sqrt{n})$ to $O(\log(n))$ optimization algorithms, whose scalability is particularly relevant in today's era of big data and artificial intelligence.	翻訳日:2023-07-11 21:48:13 公開日:2023-07-09
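The deterministic iteration count quoted in the abstract above can be evaluated directly. The snippet below is a minimal sketch that only evaluates the quoted expression; it does not implement the Box-QP algorithm itself. The function name is hypothetical, and reading $n$ as the problem dimension and $\epsilon$ as the target accuracy is an assumption, since the abstract does not define them.

```python
import math


def boxqp_iteration_count(n: int, eps: float) -> int:
    """Evaluate ceil(log(3.125 * n / eps) / log(1.5625)) from the abstract.

    n   : assumed to be the problem dimension (not defined in the abstract)
    eps : assumed to be the target accuracy (not defined in the abstract)
    """
    if n <= 0 or eps <= 0:
        raise ValueError("n and eps must be positive")
    return math.ceil(math.log(3.125 * n / eps) / math.log(1.5625))


if __name__ == "__main__":
    # The count grows logarithmically: doubling n adds at most two iterations.
    for n in (1_000, 2_000, 1_000_000):
        print(n, boxqp_iteration_count(n, 1e-6))
```

Because the count depends on $n$ only through a logarithm, doubling the problem size adds about $\log 2 / \log 1.5625 \approx 1.55$ iterations.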
# 構造量子状態のための安定トモグラフィ Stable Tomography for Structured Quantum States ( http://arxiv.org/abs/2306.09432v2 ) ライセンス: Link先を確認	Zhen Qin, Casey Jameson, Zhexuan Gong, Michael B. Wakin and Zhihui Zhu	(参考訳) 量子状態トモグラフィ(QST)を用いてしばしば達成される実験的測定から量子状態の再構成は、量子デバイスの検証とベンチマークに不可欠である。しかし、一般の非構造化量子状態に対してQSTを実行するには、最も最適な測定設定であっても、システム内の個々の量子数とともに \emph{exponentially} を成長させる膨大な数の状態コピーが必要である。幸いなことに、ノイズや中間スケールの量子コンピュータによって生成される状態のような多くの物理量子状態は通常、構造化される。一次元では、そのような状態は、キュービットの個数に依存しない有限行列/結合次元を持つ行列積作用素(MPO)によってよく近似されることが期待される。しかしながら、これらの状態に対して効率的なQSTが実行可能であるかどうかはまだ不明である。本稿では, このギャップを橋渡しし, 圧縮センシングと経験的過程の理論を用いたmposの安定回復のための理論的保証を確立する。まず、ガウス測度とHaar random rank-one Positive Operator Valued Measures (POVMs)の2種類のランダム測定設定について検討する。有限結合次元のMPOに含まれる情報は、測定値の統計的誤差を仮定して、キュービット数にのみ依存する多数のランダムな測定値を用いて保存可能であることを示す。次に、量子コンピュータ上で実装可能なHaarランダムランクワンPOVMを用いて、MPOベースのQSTを物理量子測定により研究する。我々は、MPO状態の有界回復誤差を保証するために、キュービット数における状態コピー数 \emph{polynomial} だけが必要であることを証明した。 The reconstruction of quantum states from experimental measurements, often achieved using quantum state tomography (QST), is crucial for the verification and benchmarking of quantum devices. However, performing QST for a generic unstructured quantum state requires an enormous number of state copies that grows \emph{exponentially} with the number of individual quanta in the system, even for the most optimal measurement settings. Fortunately, many physical quantum states, such as states generated by noisy, intermediate-scale quantum computers, are usually structured. In one dimension, such states are expected to be well approximated by matrix product operators (MPOs) with a finite matrix/bond dimension independent of the number of qubits, therefore enabling efficient state representation. Nevertheless, it is still unclear whether efficient QST can be performed for these states in general. 
In this paper, we attempt to bridge this gap and establish theoretical guarantees for the stable recovery of MPOs using tools from compressive sensing and the theory of empirical processes. We begin by studying two types of random measurement settings: Gaussian measurements and Haar random rank-one Positive Operator Valued Measures (POVMs). We show that the information contained in an MPO with a finite bond dimension can be preserved using a number of random measurements that depends only \emph{linearly} on the number of qubits, assuming no statistical error of the measurements. We then study MPO-based QST with physical quantum measurements through Haar random rank-one POVMs that can be implemented on quantum computers. We prove that only a \emph{polynomial} number of state copies in the number of qubits is required to guarantee bounded recovery error of an MPO state.	翻訳日:2023-07-11 21:46:37 公開日:2023-07-09
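The contrast the abstract draws between exponential and linear scaling can be illustrated by counting parameters. This is a rough sketch, not the authors' analysis: it counts complex entries only, ignores Hermiticity, trace, and gauge redundancies, and assumes open boundary conditions with a uniform bond dimension `r`. The function names are hypothetical.

```python
def full_density_matrix_entries(n_qubits: int) -> int:
    """Entries of a generic 2^n x 2^n density matrix: 4^n."""
    return 4 ** n_qubits


def mpo_entries(n_qubits: int, r: int) -> int:
    """Entries of an open-boundary MPO with uniform bond dimension r.

    Each site carries a physical operator index of size 2 * 2 = 4.
    Boundary cores have shape (1, 4, r) and (r, 4, 1); interior cores (r, 4, r).
    """
    if n_qubits < 1 or r < 1:
        raise ValueError("n_qubits and r must be at least 1")
    if n_qubits == 1:
        return 4
    boundary = 2 * 4 * r
    interior = (n_qubits - 2) * 4 * r * r
    return boundary + interior


if __name__ == "__main__":
    for n in (4, 10, 20):
        print(n, full_density_matrix_entries(n), mpo_entries(n, r=4))
```

For fixed `r`, each additional qubit adds `4 * r * r` entries to the MPO, while the full density matrix grows by a factor of four.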
# GEMO-CLAP:ジェンダー属性強化コントラスト言語-Audio Pretraining for Speech Emotion Recognition GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Speech Emotion Recognition ( http://arxiv.org/abs/2306.07848v3 ) ライセンス: Link先を確認	Yu Pan, Yanni Hu, Yuguang Yang, Jixun Yao, Wen Fei, Lei Ma, Heng Lu	(参考訳) コントラスト学習に基づく事前学習手法は,近年,様々な分野において顕著な成功を収めている。本稿では,音声感情認識のための,ジェンダー属性強調コントラスト言語-audio pretraining (clap) モデルの一種であるgemo-clapを提案する。具体的には、まず感情認識のための効果的な感情CLAPモデルEmo-CLAPを構築し、様々な自己教師付き学習に基づく事前学習モデルを利用する。そして、音声感情モデリングにおけるジェンダー属性の重要性を考慮し、2つのGEmo-CLAPアプローチを提案し、音声信号の感情情報とジェンダー情報を統合し、より合理的な目的を形成する。 iemocapコーパスの広範囲な実験により,本提案手法は異なる事前学習モデルでベースラインのemo-clapを一貫して上回り,他の最先端手法よりも優れた認識性能を達成していることが示された。 Contrastive learning based pretraining methods have recently exhibited impressive success in diverse fields. In this paper, we propose GEmo-CLAP, a kind of efficient gender-attribute-enhanced contrastive language-audio pretraining (CLAP) model for speech emotion recognition. To be specific, we first build an effective emotion CLAP model Emo-CLAP for emotion recognition, utilizing various self-supervised learning based pre-trained models. Then, considering the importance of the gender attribute in speech emotion modeling, two GEmo-CLAP approaches are further proposed to integrate the emotion and gender information of speech signals, forming more reasonable objectives. Extensive experiments on the IEMOCAP corpus demonstrate that our proposed two GEmo-CLAP approaches consistently outperform the baseline Emo-CLAP with different pre-trained models, while also achieving superior recognition performance compared with other state-of-the-art methods.	翻訳日:2023-07-11 21:45:49 公開日:2023-07-09
# 実用的なコラボレーティブ知覚:非同期およびマルチエージェント3Dオブジェクト検出のためのフレームワーク Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection ( http://arxiv.org/abs/2307.01462v2 ) ライセンス: Link先を確認	Minh-Quan Dao, Julie Stephany Berrio, Vincent Frémont, Mao Shan, Elwan Héry, and Stewart Worrall	(参考訳) オクルージョン(遮蔽)は、LiDARベースの物体検出手法における大きな課題である。この課題は、多数の道路利用者による遮蔽で視野が著しく狭まる一方、衝突を避けるために自車両が信頼性の高い物体検出を行わなければならない都市交通において、安全上重要となる。V2X(Vehicle-to-Everything)通信による協調知覚は、複数の場所に存在する接続エージェントの多様な視点を活用して完全なシーン表現を形成するものであり、魅力的な解決策である。最先端のV2X手法は、点群の鳥瞰図(BEV)画像を交換する中間協調方式によって性能と帯域幅のトレードオフを解決する。これにより、点群そのものを通信する早期協調よりも帯域消費が低く、また接続エージェント間のより深い相互作用のおかげで、エージェントの出力を融合する後期協調よりも検出性能が高くなる。中間協調アプローチの多くは高い性能を達成する一方で、学習可能な協調グラフやオートエンコーダベースの圧縮器/伸張器を含む過度に複雑なアーキテクチャと、エージェント間の同期に関する非現実的な仮定によって、実環境への展開が妨げられている。本研究では,単一車両検出モデルへの変更を最小限に抑え,エージェント間同期に関する非現実的な仮定を緩和しつつ,従来の最先端手法よりも優れた帯域幅と性能のトレードオフを実現する,シンプルかつ効果的な協調手法を考案する。V2X-Simデータセットを用いた実験により,提案手法は後期協調手法と同等の帯域幅しか消費せずに,早期協調手法の性能の98%を達成することが示された。 Occlusion is a major challenge for LiDAR-based object detection methods. This challenge becomes safety-critical in urban traffic where the ego vehicle must have reliable object detection to avoid collision while its field of view is severely reduced due to the obstruction posed by a large number of road users. Collaborative perception via Vehicle-to-Everything (V2X) communication, which leverages the diverse perspective thanks to the presence at multiple locations of connected agents to form a complete scene representation, is an appealing solution. State-of-the-art V2X methods resolve the performance-bandwidth tradeoff using a mid-collaboration approach where the Bird-Eye View images of point clouds are exchanged so that the bandwidth consumption is lower than communicating point clouds as in early collaboration, and the detection performance is higher than late collaboration, which fuses agents' output, thanks to a deeper interaction among connected agents. 
While achieving strong performance, the real-world deployment of most mid-collaboration approaches is hindered by their overly complicated architectures, involving learnable collaboration graphs and autoencoder-based compressor/ decompressor, and unrealistic assumptions about inter-agent synchronization. In this work, we devise a simple yet effective collaboration method that achieves a better bandwidth-performance tradeoff than prior state-of-the-art methods while minimizing changes made to the single-vehicle detection models and relaxing unrealistic assumptions on inter-agent synchronization. Experiments on the V2X-Sim dataset show that our collaboration method achieves 98\% of the performance of an early-collaboration method, while only consuming the equivalent bandwidth of a late-collaboration method.	翻訳日:2023-07-11 21:37:52 公開日:2023-07-09
# 非局所性のない非有界ランダム性の証明 Certification of unbounded randomness without nonlocality ( http://arxiv.org/abs/2307.01333v2 ) ライセンス: Link先を確認	Shubhayan Sarkar	(参考訳) 乱数生成器は暗号と鍵分布において重要な役割を果たす。したがって、これらのデバイスから生成された乱数は、あらゆる敵によって予測不可能であるかどうかを検証することが重要である。近年、量子非局所性はランダム性を証明できる資源として認識されている。これらのスキームはデバイスに依存しないため非常に安全であるが、量子非局所性の観測は実際的な観点からは非常に困難である。本研究では,Leggett-Gargの不等式の最大値違反に基づいて,半デバイス独立な方法で非有界ランダム性を証明するためのスキームを提供する。興味深いことに、このスキームは量子状態の選択とは独立であり、従って「量子」ノイズでさえ自己検定の量子測定に利用でき、非有界ランダム性を生成して、このスキームを実用目的に非常に効率的にすることができる。 Random number generators play an essential role in cryptography and key distribution. It is thus important to verify whether the random numbers generated from these devices are genuine and unpredictable by any adversary. Recently, quantum nonlocality has been identified as a resource that can be utilised to certify randomness. Although these schemes are device-independent and thus highly secure, the observation of quantum nonlocality is extremely difficult from a practical perspective. In this work, we provide a scheme to certify unbounded randomness in a semi-device-independent way based on the maximal violation of Leggett-Garg inequalities. Interestingly, the scheme is independent of the choice of the quantum state, and consequently even "quantum" noise could be utilized to self-test quantum measurements and generate unbounded randomness making the scheme highly efficient for practical purposes.	翻訳日:2023-07-11 21:37:20 公開日:2023-07-09
# エッジクラウドコンピューティングによる大規模生成AIの概観 An Overview on Generative AI at Scale with Edge-Cloud Computing ( http://arxiv.org/abs/2306.17170v2 ) ライセンス: Link先を確認	Yun-Cheng Wang, Jintang Xue, Chengwei Wei, C.-C. Jay Kuo	(参考訳) 人工知能(AI)の一分野である生成人工知能(GenAI)は、人間が作成したものに似た新しいコンテンツを生成する。GenAIシステムの急速な発展はインターネット上に大量の新しいデータを生み出し、現在のコンピューティングおよび通信フレームワークに新たな課題を突きつけている。現在、GenAIサービスは大規模な計算リソースを必要とするため、従来のクラウドコンピューティングフレームワークに依存している。しかし、データ転送と大量のリクエストのために、そのようなサービスは高いレイテンシに直面する。一方、エッジクラウドコンピューティングは、エッジとクラウドの協調を通じて、十分な計算能力と低レイテンシを同時に提供できる。したがって、エッジクラウドコンピューティングのパラダイムを活用してGenAIシステムを大規模に構築することは魅力的である。本稿では,GenAIとエッジクラウドコンピューティングそれぞれの最近の展開について概説する。次に、2つの代表的なGenAIアプリケーションを例に、エッジクラウド協調システムを用いてソリューションをスケールアップする際の技術的課題について議論する。最後に、GenAIシステムを大規模に訓練・展開するための設計上の考慮事項を挙げ、今後の研究の方向性を指摘する。 As a specific category of artificial intelligence (AI), generative artificial intelligence (GenAI) generates new content that resembles what is created by humans. The rapid development of GenAI systems has created a huge amount of new data on the Internet, posing new challenges to current computing and communication frameworks. Currently, GenAI services rely on the traditional cloud computing framework due to the need for large computation resources. However, such services will encounter high latency because of data transmission and a high volume of requests. On the other hand, edge-cloud computing can provide adequate computation power and low latency at the same time through the collaboration between edges and the cloud. Thus, it is attractive to build GenAI systems at scale by leveraging the edge-cloud computing paradigm. In this overview paper, we review recent developments in GenAI and edge-cloud computing, respectively. Then, we use two exemplary GenAI applications to discuss technical challenges in scaling up their solutions using edge-cloud collaborative systems. Finally, we list design considerations for training and deploying GenAI systems at scale and point out future research directions.	翻訳日:2023-07-11 21:36:17 公開日:2023-07-09
# OSP: 2段階同期による分散モデルトレーニングの強化 OSP: Boosting Distributed Model Training with 2-stage Synchronization ( http://arxiv.org/abs/2306.16926v2 ) ライセンス: Link先を確認	Zixuan Chen, Lei Shi, Xuandong Liu, Jiahui Li, Sen Liu, Yang Xu	(参考訳) 分散ディープラーニング(DDL)は、大規模なデータセットとモデルを用いるディープラーニングタスクの訓練効率を高めることを目的とした、有望な研究分野である。DDLノードの計算能力が向上し続けるにつれて、ノード間のネットワーク接続が大きなボトルネックになりつつある。パラメータサーバベースのDDLにおいて、このボトルネックに対処するために、勾配圧縮やモデル同期の改善に関する様々な手法が提案されている。しかし、前者は破棄された勾配による精度の損失を招く可能性があり、後者はモデル同期のスループット向上が限定的である。これらの課題に対処するために,新しいモデル同期法であるOverlapped Synchronization Parallel (OSP)を提案する。OSPは2段階同期方式によって効率的な通信を実現し,局所勾配ベースのパラメータ補正(LGP)を用いて古い(stale)パラメータによる精度損失を回避する。OSPのプロトタイプはPyTorchを用いて実装され、9ノードのテストベッド上で、一般的に使用されるディープラーニングモデルとデータセットで評価された。評価の結果,OSPは一般的な同期モデルと比較して,精度を損なうことなく最大50%のスループット向上を達成できることが示された。 Distributed deep learning (DDL) is a promising research area, which aims to increase the efficiency of training deep learning tasks with large datasets and models. As the computation capability of DDL nodes continues to increase, the network connection between nodes is becoming a major bottleneck. Various methods of gradient compression and improved model synchronization have been proposed to address this bottleneck in Parameter-Server-based DDL. However, these two types of methods can result in accuracy loss due to discarded gradients and have limited enhancement on the throughput of model synchronization, respectively. To address these challenges, we propose a new model synchronization method named Overlapped Synchronization Parallel (OSP), which achieves efficient communication with a 2-stage synchronization approach and uses Local-Gradient-based Parameter correction (LGP) to avoid accuracy loss caused by stale parameters. The prototype of OSP has been implemented using PyTorch and evaluated on commonly used deep learning models and datasets with a 9-node testbed. Evaluation results show that OSP can achieve up to 50% improvement in throughput without accuracy loss compared to popular synchronization models.	翻訳日:2023-07-11 21:35:40 公開日:2023-07-09
# Riemannian Gauss-Newtonによる低ランクテンソル推定:統計的最適性と2次収束 Low-rank Tensor Estimation via Riemannian Gauss-Newton: Statistical Optimality and Second-Order Convergence ( http://arxiv.org/abs/2104.12031v4 ) ライセンス: Link先を確認	Yuetian Luo, Anru R. Zhang	(参考訳) 本稿では, タッカー級のテンソルを, ノイズの少ない線形測定値から推定する。一般的な問題は、テンソル回帰、テンソル完備化、テンソルPCA/SVDなど、応用から生じる多くの具体例をカバーする。低タッカーランクテンソル推定のための効率的なリーマンガウスニュートン法(RGN)を提案する。文献におけるRGNの一般(超)線形収束保証とは違い、正規性条件下での雑音条件下での低ランクテンソル推定に対するRGNの最初の局所二次収束保証を証明し、対応する推定誤差上限を与える。 rgnの統計的最適性を示す決定論的推定誤差が上限値に一致する。 RGNの利点は、テンソル回帰とテンソルSVDという2つの機械学習アプリケーションを通して説明される。最後に,理論的な知見を裏付けるシミュレーション結果を提供する。 In this paper, we consider the estimation of a low Tucker rank tensor from a number of noisy linear measurements. The general problem covers many specific examples arising from applications, including tensor regression, tensor completion, and tensor PCA/SVD. We consider an efficient Riemannian Gauss-Newton (RGN) method for low Tucker rank tensor estimation. Different from the generic (super)linear convergence guarantee of RGN in the literature, we prove the first local quadratic convergence guarantee of RGN for low-rank tensor estimation in the noisy setting under some regularity conditions and provide the corresponding estimation error upper bounds. A deterministic estimation error lower bound, which matches the upper bound, is provided that demonstrates the statistical optimality of RGN. The merit of RGN is illustrated through two machine learning applications: tensor regression and tensor SVD. Finally, we provide the simulation results to corroborate our theoretical findings.	翻訳日:2023-07-11 19:54:24 公開日:2023-07-09
# 予期せぬ敵に対するロバスト性テスト Testing Robustness Against Unforeseen Adversaries ( http://arxiv.org/abs/1908.08016v3 ) ライセンス: Link先を確認	Max Kaufmann, Daniel Kang, Yi Sun, Steven Basart, Xuwang Yin, Mantas Mazeika, Akul Arora, Adam Dziedzic, Franziska Boenisch, Tom Brown, Jacob Steinhardt, Dan Hendrycks	(参考訳) 現実の敵の設定を考えると、ディフェンダーは訓練中に展開時間の完全な敵にアクセスできる可能性は低く、敵は小さなL_p制約の摂動に制限されない現実的な敵の歪みを使用する可能性が高い。この研究と現実の相違を狭めるために、我々は、予期せぬ幅広い敵に対してモデルロバスト性を評価する新しいベンチマークであるImageNet-UAを作成するために使用する18の新たな敵攻撃を導入する。当社は、この一般化ギャップを克服するための幅広い防御戦略を特定し、予期せぬ堅牢性を改善するための豊富な技術空間を見つけるために、ベンチマークを利用しています。 ImageNet-UAの多様性と現実性により、これは現実世界の最悪のケースの堅牢性に取り組む人々にとって有用なツールになり、トレーニング中に見られる攻撃を超えて、より堅牢な防御を開発することができることを期待しています。 When considering real-world adversarial settings, defenders are unlikely to have access to the full range of deployment-time adversaries during training, and adversaries are likely to use realistic adversarial distortions that will not be limited to small L_p-constrained perturbations. To narrow in on this discrepancy between research and reality we introduce eighteen novel adversarial attacks, which we use to create ImageNet-UA, a new benchmark for evaluating model robustness against a wide range of unforeseen adversaries. We make use of our benchmark to identify a range of defense strategies which can help overcome this generalization gap, finding a rich space of techniques which can improve unforeseen robustness. We hope the greater variety and realism of ImageNet-UA will make it a useful tool for those working on real-world worst-case robustness, enabling development of more robust defenses which can generalize beyond attacks seen during training.	翻訳日:2023-07-11 19:51:40 公開日:2023-07-09
# 非支配的ソーティング遺伝的アルゴリズム(NSGA-II)の数学的実行解析 Mathematical Runtime Analysis for the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) ( http://arxiv.org/abs/2112.08581v5 ) ライセンス: Link先を確認	Weijie Zheng, Benjamin Doerr	(参考訳) 非支配的ソート遺伝アルゴリズムII(NSGA-II)は、現実世界の応用において最も集中的に使用される多目的進化アルゴリズムである。しかし、数学的な方法で解析されたいくつかの単純なMOEAとは対照的に、NSGA-IIにはそのような研究は存在しない。本研究では,NSGA-IIにも数学的ランタイム解析が適用可能であることを示す。特に,paretoフロントの大きさの4倍の大きさの個体群を持つnsga-iiは,従来の2つの変異演算子と4つの異なる方法で親を選択することで,基本oneminmaxおよびleadingonestrailingzerosベンチマークにおけるsemoおよびgsemoアルゴリズムと同じ漸近的実行保証を満足できることが証明された。しかし、人口の大きさがパレート前線のサイズに等しい場合、nsga-iiは完全なパレート前線を効率的に計算することはできない。我々の実験は上記の結果を確認した。 The non-dominated sorting genetic algorithm II (NSGA-II) is the most intensively used multi-objective evolutionary algorithm (MOEA) in real-world applications. However, in contrast to several simple MOEAs analyzed also via mathematical means, no such study exists for the NSGA-II so far. In this work, we show that mathematical runtime analyses are feasible also for the NSGA-II. As particular results, we prove that with a population size four times larger than the size of the Pareto front, the NSGA-II with two classic mutation operators and four different ways to select the parents satisfies the same asymptotic runtime guarantees as the SEMO and GSEMO algorithms on the basic OneMinMax and LeadingOnesTrailingZeros benchmarks. However, if the population size is only equal to the size of the Pareto front, then the NSGA-II cannot efficiently compute the full Pareto front: for an exponential number of iterations, the population will always miss a constant fraction of the Pareto front. Our experiments confirm the above findings.	翻訳日:2023-07-11 19:45:00 公開日:2023-07-09
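The OneMinMax benchmark named in the abstract above is small enough to write down. The sketch below uses the standard bi-objective definition (maximize the number of zeros and the number of ones at once) and the population size of four times the Pareto front size quoted in the abstract. It does not implement NSGA-II, and the helper names are hypothetical.

```python
from itertools import product
from typing import Sequence, Tuple


def one_min_max(x: Sequence[int]) -> Tuple[int, int]:
    """OneMinMax objectives for a bit string: (number of zeros, number of ones)."""
    ones = sum(x)
    return (len(x) - ones, ones)


def pareto_front_size(n: int) -> int:
    """Every bit string is Pareto-optimal, so the front has n + 1 distinct points."""
    return n + 1


def population_size_from_abstract(n: int) -> int:
    """Population size of four times the Pareto front size, as stated in the abstract."""
    return 4 * pareto_front_size(n)


if __name__ == "__main__":
    n = 4
    # Enumerate all bit strings to confirm the number of distinct objective vectors.
    front = {one_min_max(bits) for bits in product((0, 1), repeat=n)}
    print(sorted(front))
    print(len(front), pareto_front_size(n), population_size_from_abstract(n))
```

The enumeration is only feasible for small `n`; it is there to make the size of the Pareto front concrete.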
# 単眼路平面視差推定法 Monocular Road Planar Parallax Estimation ( http://arxiv.org/abs/2111.11089v2 ) ライセンス: Link先を確認	Haobo Yuan, Teng Chen, Wei Sui, Jiafeng Xie, Lefei Zhang, Yuan Li, Qian Zhang	(参考訳) ドライブル表面および周辺環境の3次元構造の推定は、補助運転および自律運転にとって重要な課題である。 lidarのような3dセンサーを使うか、ディープラーニングによってポイントの深さを直接予測する。しかし、前者は高価であり、後者はシーンの幾何学的情報を使用しない。本稿では,既存の手法を踏襲する代わりに,平面視差に基づく単眼画像シーケンスから3次元センシングを行う新しい深層ニューラルネットワークである road planar parallax attention network (rpanet) を提案する。 rpanetは、路面のホモグラフィで整列した画像を入力とし、3次元再構成のために$\gamma$ map(高さと深さの比)を出力する。 $\gamma$ 写像は、2つの連続するフレーム間の2次元変換を構築することができる。これは平面視差を意味し、連続するフレームをワープすることで3次元構造を推定するための基準となる道路平面と組み合わせることができる。さらに,平面視差による変位をネットワークがよりよく知覚できるように,新しいクロスアテンションモジュールを導入する。提案手法の有効性を検証するため,Waymo Open Datasetのデータをサンプリングし,平面視差に関するアノテーションを構築する。また,本手法の3次元再構成精度を示すため,サンプルデータセットを用いた総合実験を行った。 Estimating the 3D structure of the drivable surface and surrounding environment is a crucial task for assisted and autonomous driving. It is commonly solved either by using 3D sensors such as LiDAR or directly predicting the depth of points via deep learning. However, the former is expensive, and the latter lacks the use of geometry information for the scene. In this paper, instead of following existing methodologies, we propose Road Planar Parallax Attention Network (RPANet), a new deep neural network for 3D sensing from monocular image sequences based on planar parallax, which takes full advantage of the omnipresent road plane geometry in driving scenes. RPANet takes a pair of images aligned by the homography of the road plane as input and outputs a $\gamma$ map (the ratio of height to depth) for 3D reconstruction. The $\gamma$ map has the potential to construct a two-dimensional transformation between two consecutive frames. It implies planar parallax and can be combined with the road plane serving as a reference to estimate the 3D structure by warping the consecutive frames. 
Furthermore, we introduce a novel cross-attention module to make the network better perceive the displacements caused by planar parallax. To verify the effectiveness of our method, we sample data from the Waymo Open Dataset and construct annotations related to planar parallax. Comprehensive experiments are conducted on the sampled dataset to demonstrate the 3D reconstruction accuracy of our approach in challenging scenarios.	翻訳日:2023-07-11 19:43:57 公開日:2023-07-09
# SCORE:オフライン強化学習のためのSpurious Correlation Reduction SCORE: Spurious COrrelation REduction for Offline Reinforcement Learning ( http://arxiv.org/abs/2110.12468v2 ) ライセンス: Link先を確認	Zhihong Deng, Zuyue Fu, Lingxiao Wang, Zhuoran Yang, Chenjia Bai, Tianyi Zhou, Zhaoran Wang, Jing Jiang	(参考訳) オフライン強化学習(RL)は、シーケンシャルな決定問題の解決に大量のデータセットのパワーを利用する。既存の論文では,より広い課題である認識的不確実性と意思決定との相関性,すなわち非最適性を引き起こす重要な要因について検討しながら,分散(ood)行動に対する防御についてのみ論じている。本稿では,実効的かつ理論的に証明可能なアルゴリズムであるオフラインRLに対するSpurious Correlation Reduction (SCORE)を提案する。 SCOREは、標準ベンチマーク(D4RL)において、様々なタスクにおいて3.1倍の高速化でSoTA性能を達成することを実証的に示す。提案アルゴリズムでは,不確かさの高精度な推定を支援するため,アニーリング動作クローニング正則化器を導入している。理論的には,提案手法の合理性を正当化し,その最適方針への収束を軽度仮定下でサブリニアレートで証明する。 Offline reinforcement learning (RL) harnesses the power of massive datasets for resolving sequential decision problems. Most existing papers only discuss defending against out-of-distribution (OOD) actions while we investigate a broader issue, the spurious correlations between epistemic uncertainty and decision-making, an essential factor that causes suboptimality. In this paper, we propose Spurious COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm. We empirically show that SCORE achieves the SoTA performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL). The proposed algorithm introduces an annealing behavior cloning regularizer to help produce a high-quality estimation of uncertainty which is critical for eliminating spurious correlations from suboptimality. Theoretically, we justify the rationality of the proposed method and prove its convergence to the optimal policy with a sublinear rate under mild assumptions.	翻訳日:2023-07-11 19:43:14 公開日:2023-07-09
# スパースMoEが効率的なアンサンブルと出会う Sparse MoEs meet Efficient Ensembles ( http://arxiv.org/abs/2110.03360v2 ) ライセンス: Link先を確認	James Urquhart Allingham, Florian Wenzel, Zelda E Mariet, Basil Mustafa, Joan Puigcerver, Neil Houlsby, Ghassen Jerfel, Vincent Fortuin, Balaji Lakshminarayanan, Jasper Snoek, Dustin Tran, Carlos Riquelme Ruiz, Rodolphe Jenatton	(参考訳) サブモデルの集約された出力に基づく機械学習モデルは、アクティベーションレベルまたは予測レベルにおいて、個々のモデルと比較して強いパフォーマンスを示すことが多い。本稿では,ニューラルネットワークのアンサンブルと,専門家のスパースミックス(スパースMoE)の2つの人気クラスの相互作用について検討する。まず、2つのアプローチが相補的な特徴を持ち,それらの組み合わせが有益であることを示す。これには、不確実性関連ベンチマークにおけるスパースMoEの包括的な評価が含まれる。次に、両モデルのクラスを最大限に活用するスケーラブルでシンプルなMOEのアンサンブルであるE$^3$(Efficient Ensemble of Experts)を紹介し、深いアンサンブルよりも最大45%少ないFLOPを使用する。大規模な実験では、いくつかの難解な視覚トランスフォーマーベースのベースラインに対して、精度、ログライク、少数ショット学習、ロバスト性、E$^3$の不確実性の改善が示されている。 e$^3$は、最大2.7bのパラメータを持つモデルにスケールしながらその効率を維持するだけでなく、より大きなモデルに対する予測性能と不確実性の推定も改善する。 Machine learning models based on the aggregated outputs of submodels, either at the activation or prediction levels, often exhibit strong performance compared to individual models. We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixture of experts (sparse MoEs). First, we show that the two approaches have complementary features whose combination is beneficial. This includes a comprehensive evaluation of sparse MoEs in uncertainty related benchmarks. Then, we present Efficient Ensemble of Experts (E$^3$), a scalable and simple ensemble of sparse MoEs that takes the best of both classes of models, while using up to 45% fewer FLOPs than a deep ensemble. Extensive experiments demonstrate the accuracy, log-likelihood, few-shot learning, robustness, and uncertainty improvements of E$^3$ over several challenging vision Transformer-based baselines. E$^3$ not only preserves its efficiency while scaling to models with up to 2.7B parameters, but also provides better predictive performance and uncertainty estimates for larger models.	翻訳日:2023-07-11 19:42:23 公開日:2023-07-09
# ニューラルビデオ圧縮のための生成モデリングの展望 Insights from Generative Modeling for Neural Video Compression ( http://arxiv.org/abs/2107.13136v2 ) ライセンス: Link先を確認	Ruihan Yang, Yibo Yang, Joseph Marino, Stephan Mandt	(参考訳) 最近の機械学習研究は、VAEのような深層生成モデルと学習圧縮で使用される速度歪み損失の関連を明らかにしているが、この研究の大部分は画像に焦点を当てている。同様に、我々は最近提案されたニューラルビデオ符号化アルゴリズムを、深い自己回帰と潜伏変数モデリングのレンズを通して見る。我々は、これらのコーデックを一般化された確率的時間的自己回帰変換の例として提示し、流れの正規化と構造的事前化に触発されたさらなる改善のための新しい道を提案する。本稿では,高精細度ビデオに最先端のビデオ圧縮性能をもたらすいくつかのアーキテクチャを提案し,そのトレードオフと改善について議論する。特に,提案する (i)時間的自己回帰変換の改善 (ii)構造的・時間的依存によるエントロピーモデルの改善、及び (iii)我々のアルゴリズムの可変ビットレートバージョン。我々の改良は既存のモデルと互換性があるため、生成的モデリングの観点がニューラルビデオ符号化の分野を前進させる証拠となる。 While recent machine learning research has revealed connections between deep generative models such as VAEs and rate-distortion losses used in learned compression, most of this work has focused on images. In a similar spirit, we view recently proposed neural video coding algorithms through the lens of deep autoregressive and latent variable modeling. We present these codecs as instances of a generalized stochastic temporal autoregressive transform, and propose new avenues for further improvements inspired by normalizing flows and structured priors. We propose several architectures that yield state-of-the-art video compression performance on high-resolution video and discuss their tradeoffs and ablations. In particular, we propose (i) improved temporal autoregressive transforms, (ii) improved entropy models with structured and temporal dependencies, and (iii) variable bitrate versions of our algorithms. Since our improvements are compatible with a large class of existing models, we provide further evidence that the generative modeling viewpoint can advance the neural video coding field.	翻訳日:2023-07-11 19:42:03 公開日:2023-07-09
# KenSwQuAD - Swahili低リソース言語のための質問回答データセット KenSwQuAD -- A Question Answering Dataset for Swahili Low Resource Language ( http://arxiv.org/abs/2205.02364v3 ) ライセンス: Link先を確認	Barack W. Wanjawa (1), Lilian D.A. Wanzare (2), Florence Indede (2), Owen McOnyango (2), Lawrence Muchemi (1), Edward Ombui (3) ((1) University of Nairobi Kenya, (2) Maseno University Kenya (3) Africa Nazarene University Kenya)	(参考訳) 低リソース言語における質問回答データセットの必要性はこの研究の動機であり、Kencorpus Swahili Question Answering Dataset, KenSwQuADの開発につながっている。このデータセットは、東アフリカや世界の他の地域で主に話されているスワヒリ低資源言語の生の物語テキストから注釈付けされている。質問応答(QA)データセットは、インターネット検索やダイアログシステムなどのタスクに対する自然言語の機械的理解において重要である。機械学習システムには,本研究で開発されたゴールド標準質問回答セットなどのトレーニングデータが必要である。この研究は、ケニア語コーパスであるKencorpusプロジェクトによって収集されたスワヒリ語のテキストからQAペアを定式化するためにアノテータを雇った。このプロジェクトは、少なくとも5つのQAペアを持つ合計2,585のテキストから1,445の注釈を付け、最終的なデータセットは7,526のQAペアになった。注釈付きテキストの12.5%の品質保証セットは、QAペアがすべて正しく注釈付けされていることを確認した。データセットをQAタスクに適用する概念実証では、データセットがそのようなタスクに使用できることを確認した。 KenSwQuADはスワヒリ語の再配布にも貢献している。 The need for Question Answering datasets in low resource languages is the motivation of this research, leading to the development of Kencorpus Swahili Question Answering Dataset, KenSwQuAD. This dataset is annotated from raw story texts of Swahili low resource language, which is a predominantly spoken in Eastern African and in other parts of the world. Question Answering (QA) datasets are important for machine comprehension of natural language for tasks such as internet search and dialog systems. Machine learning systems need training data such as the gold standard Question Answering set developed in this research. The research engaged annotators to formulate QA pairs from Swahili texts collected by the Kencorpus project, a Kenyan languages corpus. The project annotated 1,445 texts from the total 2,585 texts with at least 5 QA pairs each, resulting into a final dataset of 7,526 QA pairs. A quality assurance set of 12.5% of the annotated texts confirmed that the QA pairs were all correctly annotated. 
A proof of concept applying the set to a QA task confirmed that the dataset is usable for such tasks. KenSwQuAD has also contributed to the resourcing of the Swahili language.	翻訳日:2023-07-11 19:35:32 公開日:2023-07-09
# 量子力学学習のための分布外一般化 Out-of-distribution generalization for learning quantum dynamics ( http://arxiv.org/abs/2204.10268v3 ) ライセンス: Link先を確認	Matthias C. Caro, Hsin-Yuan Huang, Nicholas Ezzell, Joe Gibbs, Andrew T. Sornborger, Lukasz Cincio, Patrick J. Coles, Zo\"e Holmes	(参考訳) 一般化バウンダリは、量子機械学習(QML)のトレーニングデータ要求を評価する重要なツールである。最近の研究は、同じデータ分布からトレーニングとテストデータを引き出す量子ニューラルネットワーク(QNN)の分散内一般化の保証を確立している。しかし,qmlでは,異なる分布からトレーニング分布へ引き出されたデータに対しても,トレーニングモデルがうまく機能するように要求されるため,分散一般化の結果は得られていない。ここでは,未知のユニタリを学習するタスクに対する分散の一般化を証明する。特に,積状態のみを訓練することで,絡み合った状態に対するユニタリの作用を学習できることを示す。積状態は単一量子ビットゲートのみを使用して作成できるため、近距離量子ハードウェア上での量子力学の学習の展望を前進させ、量子回路の古典的および量子的コンパイルのための新しい方法をさらに開ける。 Generalization bounds are a critical tool to assess the training data requirements of Quantum Machine Learning (QML). Recent work has established guarantees for in-distribution generalization of quantum neural networks (QNNs), where training and testing data are drawn from the same data distribution. However, there are currently no results on out-of-distribution generalization in QML, where we require a trained model to perform well even on data drawn from a different distribution to the training distribution. Here, we prove out-of-distribution generalization for the task of learning an unknown unitary. In particular, we show that one can learn the action of a unitary on entangled states having trained only product states. Since product states can be prepared using only single-qubit gates, this advances the prospects of learning quantum dynamics on near term quantum hardware, and further opens up new methods for both the classical and quantum compilation of quantum circuits.	翻訳日:2023-07-11 19:34:39 公開日:2023-07-09
# クラウドソーシングにおける空間的未報告格差の定量化 Quantifying Spatial Under-reporting Disparities in Resident Crowdsourcing ( http://arxiv.org/abs/2204.08620v3 ) ライセンス: Link先を確認	Zhi Liu, Uma Bhandaram, Nikhil Garg	(参考訳) 現代の都市ガバナンスは、倒木や電力線といった問題を特定するためにクラウドソーシング(‘コプロダクション’)に大きく依存している。主な懸念は、住民が同じレートで問題を報告しないことである。不均質な報告遅延は、インシデントがいかに迅速に対処できるかで下流の格差に直接翻訳される。このようなアンダーレポートの測定は、定義上、報告されていないインシデントや報告されたインシデントの発生を観測しないため、難しい統計的タスクである。したがって、低報告率と低地動事故率をナレーション的に区別することはできず、報告遅延は観測されない。外部の根拠データを用いずに(ヘテロジェンシーな)報告遅延を識別する手法を開発した。当社の見解では、同じインシデントに関する \textit{duplicate}レポートのレートは、インシデントが発生した後にそのインシデントがレポートレートで発生したかどうかを曖昧化するために利用することができる。このアイデアを用いて、我々は、標準的なポアソンレート推定タスク -- 完全なインシデント報告間隔が守られていないにもかかわらず。我々は、ニューヨークで作成された10万以上のインシデントレポートと、シカゴで作成された90万以上のレポートに適用し、インシデント特性を制御した後でも、インシデントがいかに早く報告されるかにかなりの空間的差異があることを見出します。これらの空間的格差は社会経済的特徴に対応しており、ニューヨーク市では人口密度が高く、大学の学位を持つ人の比率、収入、人口の比率は報告率と正の相関がある。最後に、ニューヨーク市公園・レクリエーション省との協力を利用して、レポートの遅延を見積もると、より公平で効率的な政府サービスのための‘textit{practical}の洞察と介入につながるかを実証する。 Modern city governance relies heavily on crowdsourcing (``co-production'') to identify problems such as downed trees and power-lines. A major concern is that residents do not report problems at the same rates, with heterogeneous reporting delays directly translating to downstream disparities in how quickly incidents can be addressed. Measuring such under-reporting is a difficult statistical task, as, by definition, we do not observe incidents that are not reported or when reported incidents first occurred. Thus, low reporting rates and low ground-truth incident rates cannot be naively distinguished, and reporting delays are unobserved. We develop a method to identify (heterogeneous) reporting delays, without using external ground truth data. Our insight is that rates on \textit{duplicate} reports about the same incident can be leveraged to disambiguate whether an incident has occurred with its reporting rate once it has occurred. 
Using this idea, we reduce the question to a standard Poisson rate estimation task -- even though the full incident reporting interval is also unobserved. We apply our method to over 100,000 resident reports made in New York City and to over 900,000 reports made in Chicago, finding that there are substantial spatial disparities in how quickly incidents are reported, even after controlling for incident characteristics -- some neighborhoods report three times as quickly as do others. These spatial disparities correspond to socio-economic characteristics: in NYC, higher population density, fraction of people with college degrees, income, and fraction of population that is White all positively correlate with reporting rates. Finally, leveraging a collaboration with the NYC Department of Parks and Recreation, we demonstrate how estimating reporting delays leads to \textit{practical} insights and interventions for more equitable, efficient government service.	翻訳日:2023-07-11 19:34:22 公開日:2023-07-09
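The abstract above reduces the estimation of reporting delays to a standard Poisson rate estimation task driven by duplicate reports. The sketch below illustrates only that reduction and is not the authors' estimator. It assumes that, once an incident exists, reports about it arrive as a homogeneous Poisson process, so the rate can be estimated from duplicate counts over known observation windows. The function names and the toy numbers are hypothetical.

```python
def poisson_rate_mle(duplicate_counts, window_days):
    """Maximum-likelihood Poisson rate: total events divided by total exposure.

    duplicate_counts[i] is the number of duplicate reports observed for
    incident i during an observation window of window_days[i] days.
    """
    if len(duplicate_counts) != len(window_days):
        raise ValueError("one observation window per incident is required")
    exposure = sum(window_days)
    if exposure <= 0:
        raise ValueError("total exposure must be positive")
    return sum(duplicate_counts) / exposure


def mean_delay_days(rate):
    """Mean waiting time until the first report under a Poisson process."""
    return float("inf") if rate == 0 else 1.0 / rate


# Hypothetical toy data for two areas with identical observation windows.
area_a = poisson_rate_mle([3, 1, 2], [10.0, 5.0, 5.0])  # 6 reports / 20 days
area_b = poisson_rate_mle([1, 0, 1], [10.0, 5.0, 5.0])  # 2 reports / 20 days
```

Under these assumptions, an area with one third of the reporting rate has three times the expected delay before an incident is first reported, which is the kind of spatial disparity the abstract describes.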
# 継続的な学習、速く、ゆっくり Continual Learning, Fast and Slow ( http://arxiv.org/abs/2209.02370v3 ) ライセンス: Link先を確認	Quang Pham, Chenghao Liu, Steven C. H. Hoi	(参考訳) 神経科学における補足学習システム(cls)理論~\cite{mcclelland1995there} によれば、人間は2つの補足的なシステムを通して効果的な \emph{continual learning} を行う。この理論によって動機づけられた「emph{DualNets}」(デュアルネットワークのための)は、特定のタスクからパターン分離表現を指導する高速学習システムと、自己監視学習(SSL)を介してタスク非依存の汎用表現を学習する遅い学習システムからなる一般的な連続学習フレームワークである。 DualNetsは、両方の表現型を総合的なフレームワークにシームレスに組み込んで、ディープニューラルネットワークの継続的な学習を容易にする。幅広い実験を通じて,オフライン環境からタスク対応環境,オンライン・タスクフリーシナリオまで幅広い学習プロトコルにおいて,デュアルネットの有望な結果を示す。特に、CTrL~\cite{veniat2020efficient}ベンチマークでは、非常に異なる視覚イメージと無関係なタスクを持つため、DualNetsは既存の最先端の動的アーキテクチャ戦略~\cite{ostapenko2021continual}と競合する性能を達成できる。さらに,デュアルネットの有効性,ロバスト性,拡張性を検証するため,包括的なアブレーション研究を行う。コードは \url{https://github.com/phquang/dualnet}で入手できる。 According to the Complementary Learning Systems (CLS) theory~\cite{mcclelland1995there} in neuroscience, humans do effective \emph{continual learning} through two complementary systems: a fast learning system centered on the hippocampus for rapid learning of the specifics, individual experiences; and a slow learning system located in the neocortex for the gradual acquisition of structured knowledge about the environment. Motivated by this theory, we propose \emph{DualNets} (for Dual Networks), a general continual learning framework comprising a fast learning system for supervised learning of pattern-separated representation from specific tasks and a slow learning system for representation learning of task-agnostic general representation via Self-Supervised Learning (SSL). DualNets can seamlessly incorporate both representation types into a holistic framework to facilitate better continual learning in deep neural networks. Via extensive experiments, we demonstrate the promising results of DualNets on a wide range of continual learning protocols, ranging from the standard offline, task-aware setting to the challenging online, task-free scenario. 
Notably, on the CTrL~\cite{veniat2020efficient} benchmark that has unrelated tasks with vastly different visual images, DualNets can achieve competitive performance with existing state-of-the-art dynamic architecture strategies~\cite{ostapenko2021continual}. Furthermore, we conduct comprehensive ablation studies to validate DualNets' efficacy, robustness, and scalability. Code will be made available at https://github.com/phquang/DualNet.	翻訳日:2023-07-11 19:25:52 公開日:2023-07-09
# 大規模言語モデルを用いた複数人のシミュレーションと人間研究の再現 Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies ( http://arxiv.org/abs/2208.10264v5 ) ライセンス: Link先を確認	Gati Aher, Rosa I. Arriaga, Adam Tauman Kalai	(参考訳) チューリング実験(te)と呼ばれる新しいタイプのテストを導入し、gptモデルのような特定の言語モデルが人間の行動の様々な側面をシミュレートできるかどうかを評価する。 TEはまた、言語モデルの特定の人間の振る舞いのシミュレーションにおいて一貫した歪みを明らかにすることができる。単一の任意の個人をシミュレートするチューリングテストとは異なり、TEは人体研究の参加者の代表サンプルをシミュレートする必要がある。我々は,先行研究から確立した発見を再現しようとするTEを行う。我々は、TEをシミュレーションするための方法論を設計し、異なる言語モデルが古典的な経済、精神言語、社会心理学の実験をいかにうまく再現できるかを比較するために、Ultimatum Game、Garden Path Sentences、Milgram Shock Experiment、Wisdom of Crowds。最初の3つのTEでは、既存の発見は最近のモデルで再現され、最後のTEでは、一部の言語モデル(ChatGPTやGPT-4など)に「超精度の歪み」があることが示され、教育や芸術における下流の応用に影響を及ぼす可能性がある。 We introduce a new type of test, called a Turing Experiment (TE), for evaluating to what extent a given language model, such as GPT models, can simulate different aspects of human behavior. A TE can also reveal consistent distortions in a language model's simulation of a specific human behavior. Unlike the Turing Test, which involves simulating a single arbitrary individual, a TE requires simulating a representative sample of participants in human subject research. We carry out TEs that attempt to replicate well-established findings from prior studies. We design a methodology for simulating TEs and illustrate its use to compare how well different language models are able to reproduce classic economic, psycholinguistic, and social psychology experiments: Ultimatum Game, Garden Path Sentences, Milgram Shock Experiment, and Wisdom of Crowds. In the first three TEs, the existing findings were replicated using recent models, while the last TE reveals a "hyper-accuracy distortion" present in some language models (including ChatGPT and GPT-4), which could affect downstream applications in education and the arts.	翻訳日:2023-07-11 19:24:34 公開日:2023-07-09
# オンライン不一致最小化による適応的ドメイン一般化 Adaptive Domain Generalization via Online Disagreement Minimization ( http://arxiv.org/abs/2208.01996v2 ) ライセンス: Link先を確認	Xin Zhang, Ying-Cong Chen	(参考訳) ディープニューラルネットワークは、デプロイメントとトレーニングの間に分布シフトがある場合、パフォーマンスが著しく低下する。ドメイン一般化(DG)は、ソースドメインの集合のみに依存して、モデルを未知のターゲットドメインに安全に転送することを目的としている。様々なDGアプローチが提案されているが、DomainBedという最近の研究によると、そのほとんどは単純な経験的リスク最小化(ERM)に勝っていない。そこで本研究では、既存のDGアルゴリズムと直交し、その性能を一貫して改善しうる汎用フレームワークを提案する。静的なソースモデルが普遍的であることに賭ける従来のDG研究とは異なり、提案するAdaODMは、異なるターゲットドメインに対してテスト時にソースモデルを適応的に修正する。具体的には、共有のドメイン汎用特徴抽出器の上に複数のドメイン固有の分類器を作成する。特徴抽出器と分類器は敵対的に訓練される。特徴抽出器は入力サンプルをドメイン不変空間に埋め込み、複数の分類器はそれぞれ特定のソースドメインに関連する決定境界を捉える。テスト時には、ソース分類器間の予測の不一致を利用して、ターゲットドメインとソースドメインの分布差を効果的に測定できる。テスト時に不一致を最小化するようにソースモデルを微調整することで、ターゲットドメインの特徴は不変特徴空間とよく整合する。AdaODMを、ERMとCORALという2つの一般的なDG手法と、VLCS、PACS、OfficeHome、TerraIncognitaという4つのDGベンチマークで検証する。その結果、AdaODMは未知ドメインへの一般化能力を安定的に改善し、最先端の性能を達成する。 Deep neural networks suffer from significant performance deterioration when there is a distribution shift between deployment and training. Domain Generalization (DG) aims to safely transfer a model to unseen target domains by only relying on a set of source domains. Although various DG approaches have been proposed, a recent study named DomainBed reveals that most of them do not beat the simple Empirical Risk Minimization (ERM). To this end, we propose a general framework that is orthogonal to existing DG algorithms and could improve their performance consistently. Unlike previous DG works that bet on a static source model being universal, our proposed AdaODM adaptively modifies the source model at test time for different target domains. Specifically, we create multiple domain-specific classifiers upon a shared domain-generic feature extractor.
The feature extractor and classifiers are trained in an adversarial way, where the feature extractor embeds the input samples into a domain-invariant space, and the multiple classifiers capture the distinct decision boundaries that each of them relates to a specific source domain. During testing, distribution differences between target and source domains could be effectively measured by leveraging prediction disagreement among source classifiers. By fine-tuning source models to minimize the disagreement at test time, target domain features are well aligned to the invariant feature space. We verify AdaODM on two popular DG methods, namely ERM and CORAL, and four DG benchmarks, namely VLCS, PACS, OfficeHome, and TerraIncognita. The results show AdaODM stably improves the generalization capacity on unseen domains and achieves state-of-the-art performance.	翻訳日:2023-07-11 19:24:13 公開日:2023-07-09
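The abstract above measures the gap between target and source domains through prediction disagreement among the source classifiers. As a minimal sketch of such a signal, the code below computes the mean pairwise L1 distance between the softmax outputs of several classifier heads for one input. The choice of L1 distance and the function names are assumptions made for illustration; the abstract does not state which disagreement measure AdaODM uses.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def disagreement(per_classifier_logits):
    """Mean pairwise L1 distance between the classifiers' softmax outputs.

    per_classifier_logits holds one list of logits per classifier head,
    all computed for the same input sample.
    """
    probs = [softmax(logits) for logits in per_classifier_logits]
    if len(probs) < 2:
        raise ValueError("at least two classifiers are required")
    total, pairs = 0.0, 0
    for i in range(len(probs)):
        for j in range(i + 1, len(probs)):
            total += sum(abs(p - q) for p, q in zip(probs[i], probs[j]))
            pairs += 1
    return total / pairs


# Hypothetical logits from two heads on one sample.
agree = disagreement([[5.0, 0.0], [4.0, 0.0]])     # both heads favor class 0
conflict = disagreement([[5.0, 0.0], [0.0, 5.0]])  # heads favor different classes
```

A test-time adaptation loop would then update the model to reduce this quantity on unlabeled target samples. That loop needs a differentiable framework and is left out of this sketch.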
# 多人数対話における多言語共参照解析 Multilingual Coreference Resolution in Multiparty Dialogue ( http://arxiv.org/abs/2208.01307v2 ) ライセンス: Link先を確認	Boyuan Zheng, Patrick Xia, Mahsa Yarmohammadi, Benjamin Van Durme	(参考訳) エンティティ共参照解析のための既存の多人数対話データセットはまだ初期段階にあり、多くの課題が未解決である。そこで我々は、このタスクのために、テレビ番組の書き起こしに基づく大規模データセットMultilingual Multiparty Coref (MMC) を作成した。複数の言語でゴールド品質の字幕が利用できるため、アノテーション射影によってアノテーションを再利用し、他の言語(中国語とペルシア語(Farsi))でシルバーの共参照解析データを作成することを提案する。ゴールド(英語)データでは、既製のモデルのMMCでの性能は比較的低く、MMCが以前のデータセットよりも多人数の共参照を幅広くカバーしていることを示唆している。シルバーデータについては、データ拡張と、ゼロショット言語間設定を事実上シミュレートするスクラッチからの学習の両方で有効であることを確認した。 Existing multiparty dialogue datasets for entity coreference resolution are nascent, and many challenges are still unaddressed. We create a large-scale dataset, Multilingual Multiparty Coref (MMC), for this task based on TV transcripts. Due to the availability of gold-quality subtitles in multiple languages, we propose reusing the annotations to create silver coreference resolution data in other languages (Chinese and Farsi) via annotation projection. On the gold (English) data, off-the-shelf models perform relatively poorly on MMC, suggesting that MMC has broader coverage of multiparty coreference than prior datasets. On the silver data, we find success both using it for data augmentation and training from scratch, which effectively simulates the zero-shot cross-lingual setting.	翻訳日:2023-07-11 19:23:45 公開日:2023-07-09
# Indecision Trees: 定量化された不確実性の下での論証に基づく推論の学習 Indecision Trees: Learning Argument-Based Reasoning under Quantified Uncertainty ( http://arxiv.org/abs/2206.12252v2 ) ライセンス: Link先を確認	Jonathan S. Kent, David H. Menager	(参考訳) 現実世界で機械学習システムを使用すると、説明不能なブラックボックスモデル、不完全な測定値を確実なものとみなす仮定、確率分布ではなく単一の分類しか出力しないことなどが、しばしば問題となる。本稿では、決定木を改良したIndecision Treesを提案する。これは不確実性の下で学習と推論を行い、可能なラベル上の頑健な分布を与え、他の推論システムで利用できる論理的な論証の集合に分解できる。 Using Machine Learning systems in the real world can often be problematic, with inexplicable black-box models, the assumed certainty of imperfect measurements, or providing a single classification instead of a probability distribution. This paper introduces Indecision Trees, a modification of Decision Trees that learn under uncertainty, can perform inference under uncertainty, provide a robust distribution over the possible labels, and can be disassembled into a set of logical arguments for use in other reasoning systems.	翻訳日:2023-07-11 19:23:07 公開日:2023-07-09
# Survival Kernets: 精度保証によるスケーラブルで解釈可能なDeep Kernel Survival Analysis Survival Kernets: Scalable and Interpretable Deep Kernel Survival Analysis with an Accuracy Guarantee ( http://arxiv.org/abs/2206.10477v4 ) ライセンス: Link先を確認	George H. Chen	(参考訳) カーネルサバイバル解析モデルは、2つのデータポイント間の類似度を測定するカーネル関数の助けを借りて、個々のサバイバル分布を推定する。このようなカーネル関数は、ディープカーネルサバイバルモデルを用いて学習することができる。本稿では,モデル解釈や理論解析に適した方法で大規模データセットにスケール可能な,サバイバルカーネットと呼ばれる新しいディープカーネルサバイバルモデルを提案する。具体的には、最近開発されたカーネルネットと呼ばれる分類と回帰のためのトレーニングセット圧縮スキームに基づいて、トレーニングデータをクラスタに分割し、サバイバル分析設定に拡張する。テスト時には、各データポイントをこれらのクラスタの重み付けの組み合わせとして表現し、それぞれのクラスタを可視化することができる。生存カーネットの特殊な場合、予測生存分布に縛られる有限サンプル誤差を、ログ係数まで最適に設定する。上記のカーネルネット圧縮戦略を用いてテスト時のスケーラビリティを実現する一方で、トレーニング中のスケーラビリティは、XGBoostのようなツリーアンサンブルに基づくウォームスタート手順と、ニューラルネットワーク探索を加速するためのヒューリスティックアプローチによって達成される。異なるサイズ(約300万データポイントまで)の標準生存分析データセットでは、時間依存コンコーダンス指数で検証された各種ベースラインと比較して、生存カーネットは高い競争力を示す。私たちのコードは、https://github.com/georgehc/survival-kernetsで利用可能です。 Kernel survival analysis models estimate individual survival distributions with the help of a kernel function, which measures the similarity between any two data points. Such a kernel function can be learned using deep kernel survival models. In this paper, we present a new deep kernel survival model called a survival kernet, which scales to large datasets in a manner that is amenable to model interpretation and also theoretical analysis. Specifically, the training data are partitioned into clusters based on a recently developed training set compression scheme for classification and regression called kernel netting that we extend to the survival analysis setting. At test time, each data point is represented as a weighted combination of these clusters, and each such cluster can be visualized. For a special case of survival kernets, we establish a finite-sample error bound on predicted survival distributions that is, up to a log factor, optimal. 
Whereas scalability at test time is achieved using the aforementioned kernel netting compression strategy, scalability during training is achieved by a warm-start procedure based on tree ensembles such as XGBoost and a heuristic approach to accelerating neural architecture search. On four standard survival analysis datasets of varying sizes (up to roughly 3 million data points), we show that survival kernets are highly competitive compared to various baselines tested in terms of time-dependent concordance index. Our code is available at: https://github.com/georgehc/survival-kernets	翻訳日:2023-07-11 19:22:58 公開日:2023-07-09
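The abstract above describes kernel survival analysis as estimating an individual survival distribution from training points weighted by a kernel similarity. The sketch below shows that general idea with a kernel-weighted Kaplan-Meier estimator. It assumes a fixed Gaussian kernel on raw features and weights every training point directly. The paper instead learns the kernel with a neural network and compresses the training data into clusters, neither of which is reproduced here. All names are hypothetical.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Similarity between two feature vectors (fixed kernel, an assumption)."""
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared / (2.0 * bandwidth ** 2))


def kernel_survival_curve(x_test, train_x, train_time, train_event, bandwidth=1.0):
    """Kernel-weighted Kaplan-Meier estimate of S(t | x_test).

    train_event[i] is truthy if the event was observed for point i and
    falsy if that point was censored at train_time[i].
    Returns a list of (event_time, survival_probability) pairs.
    """
    weights = [gaussian_kernel(x_test, x, bandwidth) for x in train_x]
    event_times = sorted({t for t, e in zip(train_time, train_event) if e})
    curve, survival = [], 1.0
    for t in event_times:
        # Weighted number of points still at risk just before time t.
        at_risk = sum(w for w, ti in zip(weights, train_time) if ti >= t)
        # Weighted number of observed events exactly at time t.
        events = sum(
            w for w, ti, e in zip(weights, train_time, train_event) if e and ti == t
        )
        if at_risk > 0:
            survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data: identical features, so all weights are equal and
# the estimate reduces to the ordinary Kaplan-Meier curve.
toy_curve = kernel_survival_curve(
    [0.0], [[0.0], [0.0], [0.0], [0.0]], [1.0, 2.0, 3.0, 4.0], [1, 1, 0, 1]
)
```

When the test point is far from some training points, their weights shrink toward zero and the curve is driven by the nearby points, which is what makes the estimate individual-specific.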
# MetaFed: パーソナライズドヘルスケアのための循環型知識蒸留によるフェデレーション間の連合学習 MetaFed: Federated Learning among Federations with Cyclic Knowledge Distillation for Personalized Healthcare ( http://arxiv.org/abs/2206.08516v3 ) ライセンス: Link先を確認	Yiqiang Chen, Wang Lu, Xin Qin, Jindong Wang, Xing Xie	(参考訳) 連合学習は、特にヘルスケアにおいて、生のユーザーデータにアクセスせずにモデルを構築する手法として注目を集めている。実際のアプリケーションでは、データの不均一性や、中央サーバへの不信・中央サーバの不在といった理由により、異なるフェデレーションが連携できることはほとんどない。本稿では、異なるフェデレーション間で信頼できるFLを実現するためのMetaFedという新しいフレームワークを提案する。MetaFedは、提案する循環型知識蒸留(Cyclic Knowledge Distillation)を通じて、中央サーバなしで各フェデレーションのパーソナライズされたモデルを取得する。具体的には、MetaFedは各フェデレーションをメタ分布として扱い、各フェデレーションの知識を循環的に集約する。トレーニングは、共通知識の蓄積とパーソナライズという2つの部分に分けられる。3つのベンチマークでの包括的な実験により、サーバを持たないMetaFedが、最先端の手法よりも高い精度(例えばPAMAP2ではベースライン比で10%以上の精度向上)を、より少ない通信コストで達成することが示されている。 Federated learning has attracted increasing attention for building models without accessing the raw user data, especially in healthcare. In real applications, different federations can seldom work together due to possible reasons such as data heterogeneity and distrust/inexistence of the central server. In this paper, we propose a novel framework called MetaFed to facilitate trustworthy FL between different federations. MetaFed obtains a personalized model for each federation without a central server via the proposed Cyclic Knowledge Distillation. Specifically, MetaFed treats each federation as a meta distribution and aggregates knowledge of each federation in a cyclic manner. The training is split into two parts: common knowledge accumulation and personalization. Comprehensive experiments on three benchmarks demonstrate that MetaFed without a server achieves better accuracy compared to state-of-the-art methods (e.g., 10%+ accuracy improvement compared to the baseline for PAMAP2) with fewer communication costs.	翻訳日:2023-07-11 19:22:33 公開日:2023-07-09
# 再帰的分割のポイントワイズ挙動とその不均一因果効果推定への応用について On the Pointwise Behavior of Recursive Partitioning and Its Implications for Heterogeneous Causal Effect Estimation ( http://arxiv.org/abs/2211.10805v2 ) ライセンス: Link先を確認	Matias D. Cattaneo, Jason M. Klusowski, Peter M. Tian	(参考訳) 決定木学習は、ポイントワイズ推論にますます使われている。重要な応用例としては、因果的不均質な治療効果や動的政策決定、条件付き質的回帰や実験の設計などがある。本稿では,決定木(適応再帰的分割によって訓練される)が一様ノルムにおける収束率を定式化しても達成できないことを示すことで,決定木の使用を疑問視する。代わりに、収束は多対数であるかもしれないし、正直な回帰木のようないくつかの重要な特殊ケースでは、完全に失敗する。ランダムな森林は、樹木をほとんど最適な手順に転換し、解釈可能性を失い、さらに2つの追加のチューニングパラメータを導入することで状況を改善することができることを示す。ランダム林の2つの特徴, サブサンプリングとランダム特徴選択機構は, それぞれが考慮されたモデルクラスに対してほぼ最適な性能を達成するのに顕著に寄与している。 Decision tree learning is increasingly being used for pointwise inference. Important applications include causal heterogenous treatment effects and dynamic policy decisions, as well as conditional quantile regression and design of experiments, where tree estimation and inference is conducted at specific values of the covariates. In this paper, we call into question the use of decision trees (trained by adaptive recursive partitioning) for such purposes by demonstrating that they can fail to achieve polynomial rates of convergence in uniform norm, even with pruning. Instead, the convergence may be poly-logarithmic or, in some important special cases, such as honest regression trees, fail completely. We show that random forests can remedy the situation, turning poor performing trees into nearly optimal procedures, at the cost of losing interpretability and introducing two additional tuning parameters. The two hallmarks of random forests, subsampling and the random feature selection mechanism, are seen to each distinctively contribute to achieving nearly optimal performance for the model class considered.	翻訳日:2023-07-11 19:15:54 公開日:2023-07-09
# nano: 最小限の言語モデル制御のためのループ内人間報酬学習 Nano: Nested Human-in-the-Loop Reward Learning for Few-shot Language Model Control ( http://arxiv.org/abs/2211.05750v2 ) ライセンス: Link先を確認	Xiang Fan, Yiwei Lyu, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency	(参考訳) 事前訓練された言語モデルは、言語生成において異常な能力を示した。しかし、現実のタスクは、バイアスを緩和し、公平性を促進し、パーソナライズを達成するために、生成されたテキストの分配を制御する必要があることが多い。生成したテキストの分布を制御する既存の技術は、あらかじめ定義されたカテゴリ、分布の比率、あるいは所望の分布に従う既存のコーパスを必要とする、定量化された分布でのみ機能する。しかし、個人の好みなど多くの重要な分布は不適切である。本研究では,人間のフィードバックから継続的に学習する数発の学習アルゴリズムであるnanoを提案することで,任意の分布(定量化,非定量化)に従ってテキストを生成する問題に取り組む。 nanoは、以前の作品と比較して、単一のトピック/属性と定量化された分布制御で最先端の結果を得る。また,nanoは非定量的分布を学習し,パーソナライゼーションを実現し,サンプル効率の高い個人選好の違いを捉えることができることを示した。 Pretrained language models have demonstrated extraordinary capabilities in language generation. However, real-world tasks often require controlling the distribution of generated text in order to mitigate bias, promote fairness, and achieve personalization. Existing techniques for controlling the distribution of generated text only work with quantified distributions, which require pre-defined categories, proportions of the distribution, or an existing corpus following the desired distributions. However, many important distributions, such as personal preferences, are unquantified. In this work, we tackle the problem of generating text following arbitrary distributions (quantified and unquantified) by proposing Nano, a few-shot human-in-the-loop training algorithm that continuously learns from human feedback. Nano achieves state-of-the-art results on single topic/attribute as well as quantified distribution control compared to previous works. We also show that Nano is able to learn unquantified distributions, achieves personalization, and captures differences between different individuals' personal preferences with high sample efficiency.	翻訳日:2023-07-11 19:15:36 公開日:2023-07-09
# ブラックボックス検証アルゴリズムを用いた強化学習による運転の安全性向上 Self-Improving Safety Performance of Reinforcement Learning Based Driving with Black-Box Verification Algorithms ( http://arxiv.org/abs/2210.16575v3 ) ライセンス: Link先を確認	Resul Dagdanov, Halil Durmus, Nazim Kemal Ure	(参考訳) 本研究では,強化学習(RL)に基づく自律運転(AD)エージェントの安全性向上を目的とした,ブラックボックス検証手法を用いた自己改善人工知能システムを提案する。近年,ADアプリケーションでRLアルゴリズムが普及している。しかし、既存のRLアルゴリズムの性能はトレーニングシナリオの多様性に大きく依存している。トレーニング段階での安全性クリティカルなシナリオの欠如は、実世界の運転アプリケーションの一般化性能を低下させる可能性がある。本稿では,ブラックボックス検証手法を用いて,トレーニングセットの弱点を探索する新しい枠組みを提案する。 AD障害シナリオを発見した後、RLエージェントのトレーニングは転送学習を通じて再起動され、以前は安全ではなかったシナリオのパフォーマンスが向上する。シミュレーションの結果,RLに基づく適応巡航制御(ACC)アプリケーションにおける動作決定の安全性の低下を効果的に発見し,本手法の反復的適用により車両衝突回数を大幅に削減することを示した。ソースコードはhttps://github.com/data-and-decision-lab/self-improving-RLで公開されている。 In this work, we propose a self-improving artificial intelligence system to enhance the safety performance of reinforcement learning (RL)-based autonomous driving (AD) agents using black-box verification methods. RL algorithms have become popular in AD applications in recent years. However, the performance of existing RL algorithms heavily depends on the diversity of training scenarios. A lack of safety-critical scenarios during the training phase could result in poor generalization performance in real-world driving applications. We propose a novel framework in which the weaknesses of the training set are explored through black-box verification methods. After discovering AD failure scenarios, the RL agent's training is re-initiated via transfer learning to improve the performance of previously unsafe scenarios. Simulation results demonstrate that our approach efficiently discovers safety failures of action decisions in RL-based adaptive cruise control (ACC) applications and significantly reduces the number of vehicle collisions through iterative applications of our method. The source code is publicly available at https://github.com/data-and-decision-lab/self-improving-RL.	翻訳日:2023-07-11 19:15:18 公開日:2023-07-09
# シミュレーションに基づく推論のための合成スコアモデリング Compositional Score Modeling for Simulation-based Inference ( http://arxiv.org/abs/2209.14249v3 ) ライセンス: Link先を確認	Tomas Geffner, George Papamakarios, Andriy Mnih	(参考訳) シミュレーションに基づく推論のための神経後部推定法は、正確な近似を学習するために多数のシミュレーターコールを必要とする傾向があるため、複数の観測で条件付けした後部分布を扱うのに不適である。対照的に、Neural Likelihood Estimation法は個々の観測から学んだ後の推論時間で複数の観測を処理できるが、MCMCや変分推論のような標準的な推論法に依存しており、特定の性能上の欠点がある。本稿では,両手法の利点を享受する条件スコアモデリングに基づく新しい手法を提案する。個々の観測によって引き起こされる(拡散した)後方分布のスコアをモデル化し、学習したスコアを目標後方分布からほぼサンプルに結合する方法を紹介する。提案手法はサンプル効率が高く,自然に複数の観測結果を推定時に集約し,標準推定手法の欠点を回避することができる。 Neural Posterior Estimation methods for simulation-based inference can be ill-suited for dealing with posterior distributions obtained by conditioning on multiple observations, as they tend to require a large number of simulator calls to learn accurate approximations. In contrast, Neural Likelihood Estimation methods can handle multiple observations at inference time after learning from individual observations, but they rely on standard inference methods, such as MCMC or variational inference, which come with certain performance drawbacks. We introduce a new method based on conditional score modeling that enjoys the benefits of both approaches. We model the scores of the (diffused) posterior distributions induced by individual observations, and introduce a way of combining the learned scores to approximately sample from the target posterior distribution. Our approach is sample-efficient, can naturally aggregate multiple observations at inference time, and avoids the drawbacks of standard inference methods.	翻訳日:2023-07-11 19:14:22 公開日:2023-07-09
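As a worked illustration of why scores learned from individual observations can be combined, consider the undiffused case with $n$ observations $x_1, \dots, x_n$ that are conditionally independent given the parameters $\theta$. The conditional independence and the notation are assumptions made here; the abstract does not spell out the construction. Applying Bayes' rule to each likelihood term gives:

```latex
\begin{aligned}
p(\theta \mid x_{1:n})
  &\propto p(\theta) \prod_{i=1}^{n} p(x_i \mid \theta)
   \propto p(\theta)^{\,1-n} \prod_{i=1}^{n} p(\theta \mid x_i), \\
\nabla_\theta \log p(\theta \mid x_{1:n})
  &= (1-n)\, \nabla_\theta \log p(\theta)
   + \sum_{i=1}^{n} \nabla_\theta \log p(\theta \mid x_i).
\end{aligned}
```

The second line follows by taking the gradient of the logarithm of the first, since the normalizing constants do not depend on $\theta$. For diffused posteriors at a positive noise level this factorization does not hold exactly in general, which is consistent with the abstract describing the resulting samples as approximate.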
# DynDepNet:動的グラフ構造学習によるfMRIデータからの時間変化依存構造学習 DynDepNet: Learning Time-Varying Dependency Structures from fMRI Data via Dynamic Graph Structure Learning ( http://arxiv.org/abs/2209.13513v3 ) ライセンス: Link先を確認	Alexander Campbell, Antonio Giuliano Zippo, Luca Passamonti, Nicola Toschi, Pietro Lio	(参考訳) グラフニューラルネットワーク(GNN)は、機能的磁気共鳴画像(fMRI)データから得られる脳グラフの学習表現に成功している。しかし、既存のGNN法では、脳グラフは時間とともに静的であると仮定し、グラフ隣接行列はモデルトレーニング前に知られている。これらの仮定は、脳グラフが機能的接続尺度の選択に依存する接続構造を持つ時間変化である証拠と矛盾する。ノイズの多い脳グラフでfMRIデータを誤って表現することは、GNNのパフォーマンスに悪影響を及ぼす可能性がある。そこで我々は,下流予測タスクによって誘導されるfMRIデータの最適時間変化依存性構造を学習するDynDepNetを提案する。実世界のfMRIデータセットの実験は、性別分類のタスクにおいて、DynDepNetが最先端の結果を達成し、それぞれ8ポイントと6ポイントの精度で最高のベースラインを上回ります。さらに、学習したダイナミックグラフの分析により、既存の神経科学文献と一致する予測関連脳領域が明らかになる。 Graph neural networks (GNNs) have demonstrated success in learning representations of brain graphs derived from functional magnetic resonance imaging (fMRI) data. However, existing GNN methods assume brain graphs are static over time and the graph adjacency matrix is known prior to model training. These assumptions contradict evidence that brain graphs are time-varying with a connectivity structure that depends on the choice of functional connectivity measure. Incorrectly representing fMRI data with noisy brain graphs can adversely affect GNN performance. To address this, we propose DynDepNet, a novel method for learning the optimal time-varying dependency structure of fMRI data induced by downstream prediction tasks. Experiments on real-world fMRI datasets, for the task of sex classification, demonstrate that DynDepNet achieves state-of-the-art results, outperforming the best baseline in terms of accuracy by approximately 8 and 6 percentage points, respectively. Furthermore, analysis of the learned dynamic graphs reveals prediction-related brain regions consistent with existing neuroscience literature.	翻訳日:2023-07-11 19:14:06 公開日:2023-07-09
# 高次元(ロバスト)ワッサースタインアライメントに対するデータ依存的アプローチ A Data-dependent Approach for High Dimensional (Robust) Wasserstein Alignment ( http://arxiv.org/abs/2209.02905v2 ) ライセンス: Link先を確認	Hu Ding, Wenjie Liu, Mingquan Ye	(参考訳) 多くの実世界の問題は、2つの幾何学的パターンのアライメントとして定式化することができる。これまで多くの研究が、コンピュータビジョンの分野における2dまたは3dパターンのアライメントに焦点を当ててきた。近年,高次元のアライメント問題にいくつかの新しい応用が提案されている。しかし、この研究はアルゴリズム的な側面ではまだ限られている。我々の知る限りでは、既存のほとんどのアプローチは2次元および3次元のケースに対する単純な拡張であり、高い計算複雑性のような問題に悩まされることが多い。本稿では,高次元幾何学パターンを圧縮する効果的な枠組みを提案する。既存のアライメント法は圧縮幾何パターンに適用でき、時間の複雑さを大幅に削減できる。我々の考えは、高次元データはしばしば本質的な次元が低いという観察にインスパイアされている。我々のフレームワークは ``data-dependent' アプローチであり、入力データの本質的な次元に依存する複雑さを持つ。実験結果から, 圧縮パターン上でのアライメントアルゴリズムの実行は, 元のパターンと比較すると, 同様の特性が得られることがわかったが, 実行時(圧縮にかかる時間を含む)は著しく低い。 Many real-world problems can be formulated as the alignment between two geometric patterns. Previously, a great amount of research focus on the alignment of 2D or 3D patterns in the field of computer vision. Recently, the alignment problem in high dimensions finds several novel applications in practice. However, the research is still rather limited in the algorithmic aspect. To the best of our knowledge, most existing approaches are just simple extensions of their counterparts for 2D and 3D cases, and often suffer from the issues such as high computational complexities. In this paper, we propose an effective framework to compress the high dimensional geometric patterns. Any existing alignment method can be applied to the compressed geometric patterns and the time complexity can be significantly reduced. Our idea is inspired by the observation that high dimensional data often has a low intrinsic dimension. Our framework is a ``data-dependent'' approach that has the complexity depending on the intrinsic dimension of the input data. 
Our experimental results reveal that running the alignment algorithm on compressed patterns achieves quality similar to that obtained on the original patterns, while the runtimes (including the time cost of compression) are substantially lower.	翻訳日:2023-07-11 19:13:33 公開日:2023-07-09
# 高品質シャドウ合成によるシャドウ除去 Shadow Removal by High-Quality Shadow Synthesis ( http://arxiv.org/abs/2212.04108v2 ) ライセンス: Link先を確認	Yunshan Zhong, Lizhou You, Yuxin Zhang, Fei Chao, Yonghong Tian, Rongrong Ji	(参考訳) ほとんどのシャドウ除去手法は、精巧で豪華なシャドウ領域アノテーションに関連するトレーニング画像の侵入に依存しているため、シャドウ画像合成の人気が高まっている。しかし、これらの合成画像は、しばしば陰性で細部が不完全であるため、性能が劣っている。本稿では,高品質擬似影画像合成のためのhqssと呼ばれる新しい生成フレームワークを提案する。与えられた画像はまずシャドー領域idと非シャドー領域idに分離される。 HQSSは擬似画像を合成するためにシャドー機能エンコーダとジェネレータを使用している。具体的には、エンコーダは、他の領域アイデンティティとペアになって擬似画像を合成するジェネレータ入力として機能する領域アイデンティティの影特徴を抽出する。擬似画像は、その入力影特徴としての影特徴と、その入力領域のアイデンティティとしてのリアルライクな画像詳細を有することが期待されている。この目標を達成するために,我々は3つの学習目標を設計する。影の特徴と入力領域のアイデンティティが同じ領域の同一性を持つ場合、生成元を誘導して同一の擬似画像を入力として再構成する自己再構成損失を提案する。シャドウ特徴と入力領域の同一性が異なる場合、合成画像中にシャドウ特性と詳細情報が適切に保持されることを確認するために、再構成間損失とサイクル再構成損失を導入する。我々のHQSSは、ISTDデータセット、ビデオシャドウ除去データセット、SRDデータセットにおいて最先端の手法よりも優れています。コードはhttps://github.com/zysxmu/hqssで入手できる。 Most shadow removal methods rely on the invasion of training images associated with laborious and lavish shadow region annotations, leading to the increasing popularity of shadow image synthesis. However, the poor performance also stems from these synthesized images since they are often shadow-inauthentic and details-impaired. In this paper, we present a novel generation framework, referred to as HQSS, for high-quality pseudo shadow image synthesis. The given image is first decoupled into a shadow region identity and a non-shadow region identity. HQSS employs a shadow feature encoder and a generator to synthesize pseudo images. Specifically, the encoder extracts the shadow feature of a region identity which is then paired with another region identity to serve as the generator input to synthesize a pseudo image. The pseudo image is expected to have the shadow feature as its input shadow feature and as well as a real-like image detail as its input region identity. To fulfill this goal, we design three learning objectives. 
When the shadow feature and input region identity are from the same region identity, we propose a self-reconstruction loss that guides the generator to reconstruct an identical pseudo image as its input. When the shadow feature and input region identity are from different identities, we introduce an inter-reconstruction loss and a cycle-reconstruction loss to make sure that shadow characteristics and detail information can be well retained in the synthesized images. Our HQSS is observed to outperform the state-of-the-art methods on ISTD dataset, Video Shadow Removal dataset, and SRD dataset. The code is available at https://github.com/zysxmu/HQSS.	翻訳日:2023-07-11 19:04:54 公開日:2023-07-09
# 連続学習のための逐次ベイズ推論について On Sequential Bayesian Inference for Continual Learning ( http://arxiv.org/abs/2301.01828v2 ) ライセンス: Link先を確認	Samuel Kessler, Adam Cobb, Tim G. J. Rudner, Stefan Zohren, Stephen J. Roberts	(参考訳) 連続ベイズ推論は、過去のタスクの破滅的な忘れ込みを防止し、新しいタスクを学ぶ前に情報を提供するために連続学習に使用できる。我々はシーケンシャルベイズ推定を再検討し、真の後方へのアクセスがベイズニューラルネットワークの破滅的な忘れを防げるかどうかを検証する。これを行うために、ハミルトンモンテカルロを用いて連続ベイズ推論を行う。我々は、ハミルトンモンテカルロサンプルに密度推定器を組み込むことにより、新しいタスクの先行として後部を伝播する。ニューラルネットワークにおける逐次ベイズ推論の実行の困難さを示す破滅的な忘れ込みを防ぐには,このアプローチは失敗する。そこで, 逐次ベイズ推論とCLの簡単な解析例を考察し, 正確な推論にもかかわらず, 準最適連続学習性能に繋がるモデル不特定の問題を強調した。さらに、タスクデータの不均衡がいかに忘れてしまうかについて議論する。これらの制限から、ベイズニューラルネットワークの重みに対する逐次ベイズ推定に頼るのではなく、連続的な学習生成過程の確率論的モデルが必要であると論じる。そこで本研究では,古典的ベイズ連続学習法と競合する,原型的ベイズ連続学習という単純なベースラインを提案する。 Sequential Bayesian inference can be used for continual learning to prevent catastrophic forgetting of past tasks and provide an informative prior when learning new tasks. We revisit sequential Bayesian inference and test whether having access to the true posterior is guaranteed to prevent catastrophic forgetting in Bayesian neural networks. To do this we perform sequential Bayesian inference using Hamiltonian Monte Carlo. We propagate the posterior as a prior for new tasks by fitting a density estimator on Hamiltonian Monte Carlo samples. We find that this approach fails to prevent catastrophic forgetting demonstrating the difficulty in performing sequential Bayesian inference in neural networks. From there we study simple analytical examples of sequential Bayesian inference and CL and highlight the issue of model misspecification which can lead to sub-optimal continual learning performance despite exact inference. Furthermore, we discuss how task data imbalances can cause forgetting. From these limitations, we argue that we need probabilistic models of the continual learning generative process rather than relying on sequential Bayesian inference over Bayesian neural network weights. 
In this vein, we also propose a simple baseline called Prototypical Bayesian Continual Learning, which is competitive with state-of-the-art Bayesian continual learning methods on class incremental continual learning vision benchmarks.	翻訳日:2023-07-11 18:54:53 公開日:2023-07-09
# Adaptive Experimentation at Scale: 柔軟なバッチのための計算フレームワーク Adaptive Experimentation at Scale: A Computational Framework for Flexible Batches ( http://arxiv.org/abs/2303.11582v3 ) ライセンス: Link先を確認	Ethan Che, Hongseok Namkoong	(参考訳) 計測努力の継続的な再配置を仮定する標準的なバンディットアルゴリズムは、遅延したフィードバックとインフラ/組織的困難のために実装が困難である。結果がバッチで測定される少数の再配置時代の実例に動機づけられて,バッチ処理を柔軟に処理可能な計算駆動型適応実験フレームワークを開発した。我々の主な観察は、統計的推論において普遍的な正規近似は適応アルゴリズムの設計を導くことができることである。ガウスの逐次実験を導出することにより,先行情報を平均報酬に活用できる動的プログラムを定式化する。一般的な理論駆動のパラダイムの代わりに、計算ツールと経験的ベンチマークをアルゴリズム開発に活用する。特に,経験的解析では,確率的勾配降下を用いて計画問題を反復的に解く,単純かつ効果的なアルゴリズムである残留地平線最適化を強調する。我々の手法は、個々の報酬の完全な分布的知識を必要とするベイズ帯域幅アルゴリズム(例えばトンプソンサンプリング)と比較しても、標準手法よりも統計的パワーを著しく向上させる。全体的に、適応実験の範囲を標準的な方法では難しい設定に拡大し、少数の再配置エポック、低い信号対雑音比、未知の報酬分布を含む。 Standard bandit algorithms that assume continual reallocation of measurement effort are challenging to implement due to delayed feedback and infrastructural/organizational difficulties. Motivated by practical instances involving a handful of reallocation epochs in which outcomes are measured in batches, we develop a computation-driven adaptive experimentation framework that can flexibly handle batching. Our main observation is that normal approximations, which are universal in statistical inference, can also guide the design of adaptive algorithms. By deriving a Gaussian sequential experiment, we formulate a dynamic program that can leverage prior information on average rewards. Instead of the typical theory-driven paradigm, we leverage computational tools and empirical benchmarking for algorithm development. In particular, our empirical analysis highlights a simple yet effective algorithm, Residual Horizon Optimization, which iteratively solves a planning problem using stochastic gradient descent. Our approach significantly improves statistical power over standard methods, even when compared to Bayesian bandit algorithms (e.g., Thompson sampling) that require full distributional knowledge of individual rewards. 
Overall, we expand the scope of adaptive experimentation to settings that are difficult for standard methods, involving a small number of reallocation epochs, low signal-to-noise ratio, and unknown reward distributions.	翻訳日:2023-07-11 18:46:09 公開日:2023-07-09
# 3次元点群におけるオープンボキャブラリのアフォーダンス検出 Open-Vocabulary Affordance Detection in 3D Point Clouds ( http://arxiv.org/abs/2303.02401v2 ) ライセンス: Link先を確認	Toan Nguyen, Minh Nhat Vu, An Vuong, Dzung Nguyen, Thieu Vo, Ngan Le, Anh Nguyen	(参考訳) アフォーダンス検出は、様々なロボット応用を持つ難しい問題である。従来のアフォーダンス検出手法は、予め定義されたアフォーダンスラベルの集合に制限されており、複雑で動的な環境での知能ロボットの適応性を制限する可能性がある。そこで本稿では、3次元点群において数に上限のないアフォーダンスを検出できるOpen-Vocabulary Affordance Detection (OpenAD)法を提案する。 OpenADは、アフォーダンスのテキストと点特徴を同時に学習することで、アフォーダンス間の意味的関係をうまく活用する。したがって、提案手法はゼロショット検出が可能であり、アノテーション例が1つもなくても未知のアフォーダンスを検出できる。集中的な実験の結果、OpenADは幅広いアフォーダンス検出設定で効果的に機能し、他のベースラインを大きく上回ることが示された。さらに、高速な推論速度(約100ms)により、実世界のロボットアプリケーションにおけるOpenADの実用性を示す。私たちのプロジェクトはhttps://openad2023.github.ioで利用可能です。 Affordance detection is a challenging problem with a wide variety of robotic applications. Traditional affordance detection methods are limited to a predefined set of affordance labels, hence potentially restricting the adaptability of intelligent robots in complex and dynamic environments. In this paper, we present the Open-Vocabulary Affordance Detection (OpenAD) method, which is capable of detecting an unbounded number of affordances in 3D point clouds. By simultaneously learning the affordance text and the point feature, OpenAD successfully exploits the semantic relationships between affordances. Therefore, our proposed method enables zero-shot detection and can detect previously unseen affordances without a single annotation example. Intensive experimental results show that OpenAD works effectively on a wide range of affordance detection setups and outperforms other baselines by a large margin. Additionally, we demonstrate the practicality of the proposed OpenAD in real-world robotic applications with a fast inference speed (~100ms). Our project is available at https://openad2023.github.io.	翻訳日:2023-07-11 18:44:55 公開日:2023-07-09
# FedCLIP:フェデレートラーニングにおけるCLIPの迅速な一般化とパーソナライズ FedCLIP: Fast Generalization and Personalization for CLIP in Federated Learning ( http://arxiv.org/abs/2302.13485v2 ) ライセンス: Link先を確認	Wang Lu, Xixu Hu, Jindong Wang, Xing Xie	(参考訳) フェデレーション学習(FL)は,近年,プライバシ保護計算の新しいパラダイムとして登場している。残念ながら、FLはその実際のパフォーマンスを妨げる2つの重要な課題に直面している。データ分布の不均一性と、大規模基盤モデルがもたらす高いリソースコストである。特に、異なるクライアントの非IIDデータは既存のFLアルゴリズムの収束を難しくし、計算コストや通信コストを含む高いリソースコストは実世界のシナリオでの展開を難しくする。本稿では,フェデレート学習におけるCLIPの迅速な一般化とパーソナライズを実現するために,FedCLIPという効果的かつシンプルな手法を提案する。具体的には,大規模モデルであるCLIPのアテンションベースのアダプタを設計し,残りの操作はアダプタにのみ依存する。軽量アダプタは事前訓練されたモデル情報を最大限活用し、特定のタスクにおいてモデルがクライアントに適応することを保証する。同時に、小規模な操作により、大規模モデルによる計算負担と通信負担を軽減することができる。分布シフトを伴う3つのデータセットに対して大規模な実験を行う。定性的かつ定量的な結果は、FedCLIPが他のベースライン(PACS全体の9%の改善)を著しく上回り、計算と通信のコスト(FedAVGより283倍速い)を効果的に削減していることを示している。私たちのコードは、https://github.com/microsoft/PersonalizedFLで公開される予定です。 Federated learning (FL) has emerged as a new paradigm for privacy-preserving computation in recent years. Unfortunately, FL faces two critical challenges that hinder its actual performance: data distribution heterogeneity and high resource costs brought by large foundation models. Specifically, the non-IID data across clients make existing FL algorithms hard to converge, while the high resource costs, including computational and communication costs, increase the deployment difficulty in real-world scenarios. In this paper, we propose an effective yet simple method, named FedCLIP, to achieve fast generalization and personalization for CLIP in federated learning. Concretely, we design an attention-based adapter for the large model, CLIP, and the remaining operations depend only on the adapters. Lightweight adapters can make the most of pretrained model information and ensure that models adapt to clients in specific tasks. Simultaneously, small-scale operations can mitigate the computational and communication burdens caused by large models. Extensive experiments are conducted on three datasets with distribution shifts. 
Qualitative and quantitative results demonstrate that FedCLIP significantly outperforms other baselines (9% overall improvements on PACS) and effectively reduces computational and communication costs (283x faster than FedAVG). Our code will be available at: https://github.com/microsoft/PersonalizedFL.	翻訳日:2023-07-11 18:43:37 公開日:2023-07-09
# 真陰性の数が無限大に近づくと、MCCは適合率と再現率の幾何平均に近づく The MCC approaches the geometric mean of precision and recall as true negatives approach infinity ( http://arxiv.org/abs/2305.00594v2 ) ライセンス: Link先を確認	Jon Crall	(参考訳) 二値分類器の性能は、真陽性(TP)、真陰性(TN)、偽陽性(FP)、偽陰性(FN)の数という4つのエントリからなる混同行列によって記述される。マシューズ相関係数(MCC)、F1、Fowlkes-Mallows(FM)スコアは、混同行列を要約するスカラーである。 F1 と FM のスコアは、混同行列の4つのエントリのうち3つにしか基づかない(TN を無視する)。対照的に、MCC は混同行列の4つのエントリすべてを考慮するため、より代表的な全体像を与えると見なすことができる。しかし、物体検出問題では、真陰性の数が非常に大きいため、その計測はしばしば困難である。そこで、真陰性の数が無限大に近づくと MCC はどうなるのかを問う。本稿では、真陰性の数が無限大に近づくときの MCC の極限が FM スコアに等しいことを証明し、MCC と FM スコアの関係について洞察を与える。 The performance of a binary classifier is described by a confusion matrix with four entries: the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The Matthews correlation coefficient (MCC), F1, and Fowlkes-Mallows (FM) scores are scalars that summarize a confusion matrix. Both the F1 and FM scores are based on only three of the four entries in the confusion matrix (they ignore TN). In contrast, the MCC takes into account all four entries of the confusion matrix and thus can be seen as providing a more representative picture. However, in object detection problems, the number of true negatives is so large that measuring it is often intractable. Thus we ask, what happens to the MCC as the number of true negatives approaches infinity? This paper provides insight into the relationship between the MCC and the FM score by proving that the FM score is equal to the limit of the MCC as the number of true negatives approaches infinity.	翻訳日:2023-07-11 18:37:16 公開日:2023-07-09
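The limit stated in the abstract above can be checked numerically. The following is a minimal sketch, not code from the paper: it uses the standard definitions of the MCC and of the FM score (the geometric mean of precision and recall), and the confusion-matrix counts are hypothetical values chosen only for illustration.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def fowlkes_mallows(tp, fp, fn):
    """Geometric mean of precision and recall; TN does not appear."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return math.sqrt(precision * recall)

# Hypothetical counts, chosen only for illustration.
tp, fp, fn = 80, 20, 40
fm = fowlkes_mallows(tp, fp, fn)

# Grow TN by factors of 100 and record how far the MCC is from the FM score.
exponents = (2, 4, 6, 8)
gaps = [abs(mcc(tp, 10 ** k, fp, fn) - fm) for k in exponents]
for k, gap in zip(exponents, gaps):
    print(f"TN = 1e{k}: |MCC - FM| = {gap:.3e}")
```

With TP, FP, and FN held fixed, the gap shrinks as TN grows, which is the behaviour the abstract describes for detection problems with a very large number of true negatives.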
# 動きブレアを有する大規模シーンのためのハイブリッドニューラルレンダリング Hybrid Neural Rendering for Large-Scale Scenes with Motion Blur ( http://arxiv.org/abs/2304.12652v2 ) ライセンス: Link先を確認	Peng Dai, Yinda Zhang, Xin Yu, Xiaoyang Lyu, Xiaojuan Qi	(参考訳) 新規なビューイメージのレンダリングは多くのアプリケーションにとって非常に望ましい。近年の進歩にもかかわらず、不可避なアーティファクト(例えば、動きのぼかし)で、野生のイメージから大規模シーンの高忠実さとビュー一貫性を保った斬新なビューをレンダリングすることは、依然として困難である。そこで我々は,画像ベース表現とニューラル3D表現を結合して高品質なビュー一貫性画像を生成するハイブリッドなニューラルレンダリングモデルを開発した。さらに、野生で撮影された画像には、レンダリングされた画像の品質を劣化させる動きのぼやけなど、必然的に人工物が含まれている。そこで本研究では,画像のぼかし効果をシミュレートし,ぼやけた画像の悪影響を軽減し,事前計算した品質認識重みに基づいて学習中の重要度を低減させる手法を提案する。実データおよび合成データに関する広範な実験により,新しい視点合成のための最先端のポイントベース手法を超越したモデルが証明された。コードはhttps://daipengwa.github.io/hybrid-rendering-projectpageで入手できる。 Rendering novel view images is highly desirable for many applications. Despite recent progress, it remains challenging to render high-fidelity and view-consistent novel views of large-scale scenes from in-the-wild images with inevitable artifacts (e.g., motion blur). To this end, we develop a hybrid neural rendering model that makes image-based representation and neural 3D representation join forces to render high-quality, view-consistent images. Besides, images captured in the wild inevitably contain artifacts, such as motion blur, which deteriorates the quality of rendered images. Accordingly, we propose strategies to simulate blur effects on the rendered images to mitigate the negative influence of blurriness images and reduce their importance during training based on precomputed quality-aware weights. Extensive experiments on real and synthetic data demonstrate our model surpasses state-of-the-art point-based methods for novel view synthesis. The code is available at https://daipengwa.github.io/Hybrid-Rendering-ProjectPage.	翻訳日:2023-07-11 18:36:49 公開日:2023-07-09
# Adaptive Spiking Encoder-Decoder Network を用いた高精度かつ効率的なイベントベースセマンティックセマンティックセグメンテーション Accurate and Efficient Event-based Semantic Segmentation Using Adaptive Spiking Encoder-Decoder Network ( http://arxiv.org/abs/2304.11857v2 ) ライセンス: Link先を確認	Rui Zhang, Luziwei Leng, Kaiwei Che, Hu Zhang, Jie Cheng, Qinghai Guo, Jiangxing Liao and Ran Cheng	(参考訳) 低消費電力でイベント駆動型計算と固有の時間的ダイナミクスを活用して、スパイクニューラルネットワーク(SNN)は、イベントベースのセンサーから動的および非同期信号を処理するための、潜在的に理想的なソリューションである。しかしながら、トレーニングの課題とアーキテクチャ設計の制約により、人工知能ニューラルネットワーク(ANN)と比較して、イベントベースの高密度予測という領域における競合するSNNの例は限られている。本稿では,大規模なイベントベースセマンティックセマンティックセグメンテーションタスクのために設計された,効率的なスパイキングエンコーダデコーダネットワークを提案する。これは階層探索法を用いてエンコーダを最適化することで達成される。動的イベントストリームからの学習を強化するために,スパイキングニューロンの固有適応しきい値を用いてネットワーク活性化を変調する。さらに,スパースイベントの表現性を高め,ネットワーク性能を著しく向上させるために,二経路空間適応変調(SSAM)ブロックを導入する。提案するネットワークは,ddd17データセット上では72.57%,最近導入された大規模dsec-semanticデータセットでは57.22%のmiouを実現する。この性能は、現在の最先端のANNを4%上回り、計算リソースを著しく削減している。我々の知る限りでは、イベントベースセマンティックセグメンテーションタスクにおいて、SNNがANNよりも優れていることを示す最初の研究であり、イベントベースビジョンの分野でSNNの巨大な可能性を確立する。私たちのソースコードは公開されます。 Leveraging the low-power, event-driven computation and the inherent temporal dynamics, spiking neural networks (SNNs) are potentially ideal solutions for processing dynamic and asynchronous signals from event-based sensors. However, due to the challenges in training and the restrictions in architectural design, there are limited examples of competitive SNNs in the realm of event-based dense prediction when compared to artificial neural networks (ANNs). In this paper, we present an efficient spiking encoder-decoder network designed for large-scale event-based semantic segmentation tasks. This is achieved by optimizing the encoder using a hierarchical search method. To enhance learning from dynamic event streams, we harness the inherent adaptive threshold of spiking neurons to modulate network activation. 
Moreover, we introduce a dual-path Spiking Spatially-Adaptive Modulation (SSAM) block, specifically designed to enhance the representation of sparse events, thereby considerably improving network performance. Our proposed network achieves a 72.57% mean intersection over union (MIoU) on the DDD17 dataset and a 57.22% MIoU on the recently introduced, larger DSEC-Semantic dataset. This performance surpasses the current state-of-the-art ANNs by 4%, whilst consuming significantly less computational resources. To the best of our knowledge, this is the first study demonstrating SNNs outperforming ANNs in demanding event-based semantic segmentation tasks, thereby establishing the vast potential of SNNs in the field of event-based vision. Our source code will be made publicly accessible.	翻訳日:2023-07-11 18:36:26 公開日:2023-07-09
# 大規模言語モデルの創造性について On the Creativity of Large Language Models ( http://arxiv.org/abs/2304.00008v3 ) ライセンス: Link先を確認	Giorgio Franceschelli, Mirco Musolesi	(参考訳) 大規模言語モデル(LLM)は、人工知能のいくつかの領域に革命をもたらしている。最も顕著な応用の1つは、例えば詩やストーリーテリングのような創造的な執筆である: 生成されたアウトプットは、しばしば驚くべき品質である。しかし、自然の疑問が生まれます。 LLMは本当に創造的であるか? この記事では、まず創造性理論のレンズの下でllmの開発を分析し、鍵となるオープン質問と課題を調査します。特に、マーガレット・ボーデン(Margaret Boden)が自身の著書で提案した、価値、斬新、驚きの次元に関する議論に焦点をあてる。次に, 製品, プロセス, プレス, パーソナライズという, 異なる古典的視点を考える。我々は,機械の創造性における「easy」と「hard」の一連の問題を論じ,LLMに関連する問題を提示する。最後に,これらの技術の社会的影響を,特に創造産業に焦点を絞って検討し,それらがもたらす機会,それらによって生じる課題,法的・倫理的な観点からの潜在的なリスクを分析した。 Large Language Models (LLMs) are revolutionizing several areas of Artificial Intelligence. One of the most remarkable applications is creative writing, e.g., poetry or storytelling: the generated outputs are often of astonishing quality. However, a natural question arises: can LLMs be really considered creative? In this article we firstly analyze the development of LLMs under the lens of creativity theories, investigating the key open questions and challenges. In particular, we focus our discussion around the dimensions of value, novelty and surprise as proposed by Margaret Boden in her work. Then, we consider different classic perspectives, namely product, process, press and person. We discuss a set of ``easy'' and ``hard'' problems in machine creativity, presenting them in relation to LLMs. Finally, we examine the societal impact of these technologies with a particular focus on the creative industries, analyzing the opportunities offered by them, the challenges arising by them and the potential associated risks, from both legal and ethical points of view.	翻訳日:2023-07-11 18:35:06 公開日:2023-07-09
# SKIの高速化 - 非対称カーネルによるToeplitzニューラルネットワークの高速化 SKI to go Faster: Accelerating Toeplitz Neural Networks via Asymmetric Kernels ( http://arxiv.org/abs/2305.09028v2 ) ライセンス: Link先を確認	Alexander Moreno, Jonathan Mei, Luke Walters	(参考訳) Toeplitz Neural Networks (TNN) (Qin et al. 2023) は、印象的な結果を持つ最近のシーケンスモデルである。これらは O(n log n) 計算複雑性と O(n) 相対位置エンコーダ (RPE) 多層パーセプトロン (MLP) と減衰バイアス呼び出しを必要とする。私たちは両方を減らすことを目指している。まず、RPEは非SPD(対称正定値)カーネルであり、Toeplitz行列は擬グラム行列である。さらに 1) 学習したカーネルは,主対角線付近にスパイクな振る舞いを示す。 2) RPE MLP は遅い。双方向モデルの場合、これはスパースと低ランクのToeplitz行列分解を動機付ける。スパース成分の作用に対して、我々は小さな1D畳み込みを行う。低ランク成分に対しては、線形補間により RPE MLP を置換し、O(n) の複雑性に対して非対称な構造化カーネル補間 (SKI) (Wilson et al. 2015) を用いる。因果モデルでは、"高速"因果マスク (Katharopoulos et al. 2020) はSKIの利点を否定する。周波数領域では、明示的な減衰バイアスを避ける。因果関係を強制するために、RPEを用いて周波数応答の実部を通してカーネルを表現し、ヒルベルト変換を用いて虚部を計算する。これは O(n log n) の複雑性を維持するが、絶対的なスピードアップを達成する。周波数応答を直接モデル化することは、FFTを1つ減らして双方向の訓練にも適している。我々は、スコアの劣化を最小限に抑えつつ、Long Range Arena (Tay et al. 2020) において速度面での最先端を達成した。 Toeplitz Neural Networks (TNNs) (Qin et al. 2023) are a recent sequence model with impressive results. They require O(n log n) computational complexity and O(n) relative positional encoder (RPE) multi-layer perceptron (MLP) and decay bias calls. We aim to reduce both. We first note that the RPE is a non-SPD (symmetric positive definite) kernel and the Toeplitz matrices are pseudo-Gram matrices. Further 1) the learned kernels display spiky behavior near the main diagonals with otherwise smooth behavior; 2) the RPE MLP is slow. For bidirectional models, this motivates a sparse plus low-rank Toeplitz matrix decomposition. For the sparse component's action, we do a small 1D convolution. For the low-rank component, we replace the RPE MLP with linear interpolation and use asymmetric Structured Kernel Interpolation (SKI) (Wilson et al. 2015) for O(n) complexity: we provide rigorous error analysis. For causal models, "fast" causal masking (Katharopoulos et al. 2020) negates SKI's benefits. 
Working in the frequency domain, we avoid an explicit decay bias. To enforce causality, we represent the kernel via the real part of its frequency response using the RPE and compute the imaginary part via a Hilbert transform. This maintains O(n log n) complexity but achieves an absolute speedup. Modeling the frequency response directly is also competitive for bidirectional training, using one fewer FFT. We set a new state of the art for speed on Long Range Arena (Tay et al. 2020) with minimal score degradation.	翻訳日:2023-07-11 18:25:35 公開日:2023-07-09
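The causality step described above rests on a standard signal-processing fact: for a causal real kernel, the real part of the frequency response determines the imaginary part. The sketch below illustrates that relation on a toy kernel and is not the authors' implementation. It assumes a kernel of even length n that is zero beyond lag n/2, uses a naive O(n^2) DFT instead of an FFT to stay dependency-free, and all function names are hypothetical.

```python
import cmath
import random

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; adequate for a small demo."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def causal_kernel_from_real_part(real_part):
    """Recover a causal kernel from the real part of its frequency response.

    The inverse DFT of the real part is the even part of the kernel. Doubling
    the strictly positive lags and zeroing the negative lags restores the
    causal kernel (the discrete Hilbert-transform relation).
    """
    n = len(real_part)  # assumed even
    even = [v.real for v in dft(real_part, inverse=True)]
    window = [1.0 if t in (0, n // 2) else 2.0 if t < n // 2 else 0.0
              for t in range(n)]
    return [e * w for e, w in zip(even, window)]

random.seed(0)
n = 16
# Toy causal kernel: nonzero only on lags 0 .. n/2 - 1.
kernel = [random.uniform(-1, 1) if t < n // 2 else 0.0 for t in range(n)]
response = dft(kernel)

# Keep only the real part, then rebuild the kernel and the imaginary part.
recovered = causal_kernel_from_real_part([v.real for v in response])
rebuilt = dft(recovered)
max_err = max(abs(a - b) for a, b in zip(kernel, recovered))
imag_err = max(abs(a.imag - b.imag) for a, b in zip(response, rebuilt))
print(max_err, imag_err)
```

In this toy setting both errors are at the level of floating-point round-off, so a model that parameterizes only the real part of the response can still obtain a causal kernel.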
# MARS: 車両損傷インスタンスセグメンテーションのための逐次四分木ノードを用いたマスクアテンションの改良 MARS: Mask Attention Refinement with Sequential Quadtree Nodes for Car Damage Instance Segmentation ( http://arxiv.org/abs/2305.04743v2 ) ライセンス: Link先を確認	Teerapong Panboonyuen, Naphat Nithisopa, Panin Pienroj, Laphonchai Jirachuphun, Chaiwasut Watthanasirikrit, Naruepon Pornwiriyakul	(参考訳) 事故による自動車の損傷を評価することは、自動車保険業界にとって重要である。しかし、ディープラーニングネットワークは車の損傷画像を入力として設計されておらず、セグメンテーションマスクは依然として非常に粗いため、現実のアプリケーションには精度が不十分である。本稿では,車両損傷インスタンスセグメンテーションのためのMARS(Mask Attention Refinement with Sequential quadtree nodes)を提案する。 MARSは、逐次四分木ノード層と四分木トランスフォーマーの間のグローバルな依存関係を捉える自己注意機構を用いて、チャネル重みを再調整し、高精度なインスタンスマスクを予測する。広範な実験により、MARSは、タイの自動車損傷データセットにおいて、Mask R-CNN [9]、PointRend [13]、Mask Transfiner [12]という3つの代表的な最先端(SOTA)インスタンスセグメンテーション手法を、R50-FPNバックボーンで+1.3 maskAP、R101-FPNバックボーンで+2.3 maskAPという大きな差で上回ることが示された。デモはhttps://github.com/kaopanboonyuen/MARSで公開しています。 Assessing car damage from accidents is critical to the car insurance industry. However, the accuracy is still insufficient for real-world applications because deep learning networks are not designed for car damage images as inputs, and their segmented masks remain very coarse. This paper presents MARS (Mask Attention Refinement with Sequential quadtree nodes) for car damage instance segmentation. MARS uses self-attention mechanisms to draw global dependencies between the sequential quadtree nodes layer and the quadtree transformer, recalibrating channel weights and predicting highly accurate instance masks. Our extensive experiments demonstrate that MARS outperforms three popular state-of-the-art (SOTA) instance segmentation methods, Mask R-CNN [9], PointRend [13], and Mask Transfiner [12], by a large margin of +1.3 maskAP with an R50-FPN backbone and +2.3 maskAP with an R101-FPN backbone on a Thai car-damage dataset. Our demos are available at https://github.com/kaopanboonyuen/MARS.	翻訳日:2023-07-11 18:24:23 公開日:2023-07-09
# CrAFT: 効率的な視覚タスク適応のための圧縮対応ファインチューニング CrAFT: Compression-Aware Fine-Tuning for Efficient Visual Task Adaptation ( http://arxiv.org/abs/2305.04526v2 ) ライセンス: Link先を確認	Jung Hwan Heo, Seyedarmin Azizi, Arash Fayyazi, Massoud Pedram	(参考訳) 転移学習は基礎モデルの時代において一般的なタスク適応手法となった。しかし、多くのファンデーションモデルは大規模なストレージとコンピューティングリソースを必要としている。プルーニングや量子化といったポストトレーニング圧縮技術は、デプロイメントコストの削減に役立つ。残念ながら、結果として生じるパフォーマンス劣化は、そのようなテクニックのユーザビリティとメリットを制限します。この性能ギャップを埋めるために,ネットワーク圧縮を効果的に学習できる簡易な微調整フレームワークCrAFTを提案する。 CrAFTでは、ユーザーは単にデフォルトの微調整スケジュールとシャープネスの最小化目標を使い、同時にタスク適応と圧縮親和性を容易にする。事前トレーニング中に適用される従来のシャープネス最小化技術とは対照的に、CrAFTアプローチでは、単一のGPUで数分または数時間で微調整を行うため、無視可能なトレーニングオーバーヘッドが加わる。汎用ツールであるCrAFTの有効性は,多種多様な目標タスクにおいて,畳み込みに基づく視覚基盤モデルと注意に基づく視覚基盤モデルの両方で実証された。コードは公開される予定だ。 Transfer learning has become a popular task adaptation method in the era of foundation models. However, many foundation models require large storage and computing resources, which makes off-the-shelf deployment impractical. Post-training compression techniques such as pruning and quantization can help lower deployment costs. Unfortunately, the resulting performance degradation limits the usability and benefits of such techniques. To close this performance gap, we propose CrAFT, a simple fine-tuning framework that enables effective post-training network compression. In CrAFT, users simply employ the default fine-tuning schedule along with sharpness minimization objective, simultaneously facilitating task adaptation and compression-friendliness. Contrary to the conventional sharpness minimization techniques, which are applied during pretraining, the CrAFT approach adds negligible training overhead as fine-tuning is done in under a couple of minutes or hours with a single GPU. The effectiveness of CrAFT, which is a general-purpose tool that can significantly boost one-shot pruning and post-training quantization, is demonstrated on both convolution-based and attention-based vision foundation models on a variety of target tasks. 
The code will be made publicly available.	翻訳日:2023-07-11 18:23:57 公開日:2023-07-09
# DiffusEmp: 共感応答生成のための多点制御による拡散モデルベースフレームワーク DiffusEmp: A Diffusion Model-Based Framework with Multi-Grained Control for Empathetic Response Generation ( http://arxiv.org/abs/2306.01657v2 ) ライセンス: Link先を確認	Guanqun Bi, Lei Shen, Yanan Cao, Meng Chen, Yuqiang Xie, Zheng Lin and Xiaodong He	(参考訳) 共感はオープンドメインの会話において重要な要素であり、他人の世話や理解を自然に示します。共感応答を生成するためにいくつかの方法が提案されているが、既存の作品はしばしば汎用的で安全な表現を参照する単調な共感に繋がる。本稿では,対話コンテキストと属性指向制御信号の利用を統一する条件拡散言語モデルに基づいて,共感表現のガイドとフレームワークDiffusEmpの設計に明示的な制御を用いることを提案する。具体的には, コミュニケーション機構, 意図, セマンティックフレームを, 粗いレベルから細かいレベルへの共感の実現を制御するための, 多粒度信号として輸入する。次に,多重粒度信号と応答トークンの関係を反映したマスキング戦略をデザインし,生成過程に影響を与える拡散モデルに統合する。ベンチマークデータセットEmpatheticDialogueの実験結果から,我々のフレームワークは文脈関連性を失うことなく,制御性,情報性,多様性の点で競争ベースラインを上回っていることがわかった。 Empathy is a crucial factor in open-domain conversations, which naturally shows one's caring and understanding to others. Though several methods have been proposed to generate empathetic responses, existing works often lead to monotonous empathy that refers to generic and safe expressions. In this paper, we propose to use explicit control to guide the empathy expression and design a framework DiffusEmp based on conditional diffusion language model to unify the utilization of dialogue context and attribute-oriented control signals. Specifically, communication mechanism, intent, and semantic frame are imported as multi-grained signals that control the empathy realization from coarse to fine levels. We then design a specific masking strategy to reflect the relationship between multi-grained signals and response tokens, and integrate it into the diffusion model to influence the generative process. Experimental results on a benchmark dataset EmpatheticDialogue show that our framework outperforms competitive baselines in terms of controllability, informativeness, and diversity without the loss of context-relatedness.	翻訳日:2023-07-11 18:17:06 公開日:2023-07-09
# 会話における感情認識のための教師付きコントラスト学習 Supervised Adversarial Contrastive Learning for Emotion Recognition in Conversations ( http://arxiv.org/abs/2306.01505v2 ) ライセンス: Link先を確認	Dou Hu, Yinan Bao, Lingwei Wei, Wei Zhou, Songlin Hu	(参考訳) 一般化されたロバスト表現の抽出は、会話における感情認識(erc)において大きな課題である。そこで本研究では,クラススプレッド構造表現を教師付きで学習するための,教師付き対逆学習(SACL)フレームワークを提案する。 SACLはコントラスト対応逆行訓練を適用し、最悪のサンプルを生成し、コントラスト学習を用いて構造化表現を抽出する。ラベルレベルの機能一貫性を効果的に活用し、クラス内の細かな機能を保持できる。文脈依存データに対する敵意摂動の悪影響を避けるために,コンテキストからより多様な特徴を学習し,モデルのコンテキストロバスト性を高めるために,cat(contextual adversarial training)戦略を設計する。 CAT を用いたフレームワークでは,ERC のラベル一貫性とコンテキスト特性を学習するためのシーケンスベース SACL-LSTM を開発した。 3つのデータセットの実験により、SACL-LSTMはERCの最先端のパフォーマンスを達成することが示された。拡張実験はSACLとCATの有効性を証明した。 Extracting generalized and robust representations is a major challenge in emotion recognition in conversations (ERC). To address this, we propose a supervised adversarial contrastive learning (SACL) framework for learning class-spread structured representations in a supervised manner. SACL applies contrast-aware adversarial training to generate worst-case samples and uses joint class-spread contrastive learning to extract structured representations. It can effectively utilize label-level feature consistency and retain fine-grained intra-class features. To avoid the negative impact of adversarial perturbations on context-dependent data, we design a contextual adversarial training (CAT) strategy to learn more diverse features from context and enhance the model's context robustness. Under the framework with CAT, we develop a sequence-based SACL-LSTM to learn label-consistent and context-robust features for ERC. Experiments on three datasets show that SACL-LSTM achieves state-of-the-art performance on ERC. Extended experiments prove the effectiveness of SACL and CAT.	翻訳日:2023-07-11 18:16:49 公開日:2023-07-09
# 因果部分構造を用いたシフトロバスト分子関係学習 Shift-Robust Molecular Relational Learning with Causal Substructure ( http://arxiv.org/abs/2305.18451v2 ) ライセンス: Link先を確認	Namkyeong Lee, Kanghoon Yoon, Gyoung S. Na, Sein Kim, Chanyoung Park	(参考訳) 近年、分子対間の相互作用の振る舞いを予測することを目的とした分子関係学習が、その幅広い応用から分子科学において大きな関心を集めている。本研究では,化学反応と因果的に関係するコア部分構造を検出することで,分子関係学習における分布シフトに頑健なCMRLを提案する。そこで我々はまず,分子科学の領域知識に基づいて因果関係を仮定し,変数間の関係を明らかにする構造因果モデル(SCM)を構築する。 SCMに基づいて, 対となる分子に介入を条件付ける新しい条件付き介入フレームワークを導入する。条件付き介入フレームワークにより,本モデルは因果的部分構造から学習し,化学反応と疑似相関を持つショートカット部分構造の交絡効果を緩和する。実世界および合成データセットを用いた様々なタスクに関する大規模な実験は、最先端のベースラインモデルに対するCMRLの優位性を示す。私たちのコードはhttps://github.com/Namkyeong/CMRLで利用可能です。 Recently, molecular relational learning, whose goal is to predict the interaction behavior between molecular pairs, has attracted a surge of interest in molecular sciences due to its wide range of applications. In this work, we propose CMRL, which is robust to distributional shift in molecular relational learning by detecting the core substructure that is causally related to chemical reactions. To do so, we first assume a causal relationship based on the domain knowledge of molecular sciences and construct a structural causal model (SCM) that reveals the relationship between variables. Based on the SCM, we introduce a novel conditional intervention framework whose intervention is conditioned on the paired molecule. With the conditional intervention framework, our model successfully learns from the causal substructure and alleviates the confounding effect of shortcut substructures that are spuriously correlated to chemical reactions. Extensive experiments on various tasks with real-world and synthetic datasets demonstrate the superiority of CMRL over state-of-the-art baseline models. Our code is available at https://github.com/Namkyeong/CMRL.	翻訳日:2023-07-11 18:14:58 公開日:2023-07-09
# 効率的なシーケンスモデリングのためのスパースモジュラーアクティベーション Sparse Modular Activation for Efficient Sequence Modeling ( http://arxiv.org/abs/2306.11197v2 ) ライセンス: Link先を確認	Liliang Ren, Yang Liu, Shuohang Wang, Yichong Xu, Chenguang Zhu, ChengXiang Zhai	(参考訳) 線形状態空間モデル(SSM)は、繰り返し構造を効率的に符号化するため、様々なシーケンスモデリングタスクにおいて強い性能を示した。しかし、言語モデリングや機械翻訳といったより包括的なタスクでは、自己注意に基づくモデルは依然としてSSMよりも優れています。 SSMと自己注意の両方を併用したハイブリッドモデルは一般に有望な性能を示すが、現在のアプローチでは、入力シーケンスのすべての要素に対して静的かつ均一に注意モジュールを適用し、準最適品質と効率のトレードオフをもたらす。本研究では,ニューラルネットワークが配列要素のサブモジュールを分離的かつ動的に動的に活性化する機構であるスパースモジュール活性化(SMA)を紹介する。各要素が非アクティブなサブモジュールをスキップできるようにすることで、SMAはシーケンスモデリングのトレーニングと推論の段階で計算とメモリ消費を減らす。 SMAの特定のインスタンス化として、SMAを用いて、SSMから学んだ状態表現に基づいて、GAU(Gated Attention Unit)をスパースに活性化する新しいニューラルネットワークSeqBoatを設計する。 GAUが活性化された入力にのみ局所的な注意を集中させることで、セックボートは理論上無限の注意範囲を持つ線形推論複雑性を達成でき、チャンキングベースモデルよりもはるかに優れた品質と効率のトレードオフを提供できる。言語モデリング、音声分類、長距離アリーナを含む幅広いタスクの実験により、SeqBoatは線形複雑性を持つハイブリッドモデルに新しい最先端の結果をもたらし、学習されたスパースアクティベーションパターンを通じて各タスクに必要な注意の量を明らかにする。 Linear State Space Models (SSMs) have demonstrated strong performance in a variety of sequence modeling tasks due to their efficient encoding of the recurrent structure. However, in more comprehensive tasks like language modeling and machine translation, self-attention-based models still outperform SSMs. Hybrid models employing both SSM and self-attention generally show promising performance, but current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs. In this work, we introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely and dynamically activate sub-modules for sequence elements in a differentiable manner. Through allowing each element to skip non-activated sub-modules, SMA reduces computation and memory consumption at both training and inference stages of sequence modeling. 
As a specific instantiation of SMA, we design a novel neural architecture, SeqBoat, which employs SMA to sparsely activate a Gated Attention Unit (GAU) based on the state representations learned from an SSM. By constraining the GAU to only conduct local attention on the activated inputs, SeqBoat can achieve linear inference complexity with theoretically infinite attention span, and provide substantially better quality-efficiency trade-off than the chunking-based models. With experiments on a wide range of tasks, including language modeling, speech classification and long-range arena, SeqBoat brings new state-of-the-art results among hybrid models with linear complexity and reveals the amount of attention needed for each task through the learned sparse activation patterns.	翻訳日:2023-07-11 18:06:29 公開日:2023-07-09
# AIに基づくモーション編集とスタイライズの実用化のためのモーションキャプチャデータセット Motion Capture Dataset for Practical Use of AI-based Motion Editing and Stylization ( http://arxiv.org/abs/2306.08861v2 ) ライセンス: Link先を確認	Makito Kobayashi, Chen-Chieh Liao, Keito Inoue, Sentaro Yojima, Masafumi Takahashi	(参考訳) 本研究では,モーションスタイル転送領域のための,スタイルの多様な新しいデータセットを提案する。モーションデータセットは業界標準の人体ボーン構造を用いており、多くのプロジェクトで3Dキャラクタにそのまま組み込むことができる。我々はモーションスタイル転送における課題を述べ,提案するモーションデータセットを一般と市場の両方に公開することにより,この領域における今後の研究を促進する。実験では,最先端手法を用いてモーションスタイル転送に関する包括的な検討を行い,その結果,提案するデータセットがモーションスタイル転送タスクに有効であることを示す。 In this work, we propose a new style-diverse dataset for the domain of motion style transfer. The motion dataset uses an industry-standard human bone structure and is thus ready to be plugged into 3D characters in many projects. We state the challenges in motion style transfer and encourage future work in this domain by releasing the proposed motion dataset both to the public and to the market. In our experiments, we conduct a comprehensive study of motion style transfer using a state-of-the-art method, and the results show the proposed dataset's validity for the motion style transfer task.	翻訳日:2023-07-11 18:05:25 published:2023-07-09
# 単語順における一様情報密度に対する言語間圧力 A Cross-Linguistic Pressure for Uniform Information Density in Word Order ( http://arxiv.org/abs/2306.03734v2 ) ライセンス: Link先を確認	Thomas Hikaru Clark, Clara Meister, Tiago Pimentel, Michael Hahn, Ryan Cotterell, Richard Futrell and Roger Levy	(参考訳) 自然言語は、標準語順と単語順の柔軟性の両方で大きく異なるが、その単語順は、しばしば機能的な圧力による共有言語間統計パターンに従っている。これらのプレッシャーを特定するために、先行研究は実際の語順と偽語順を比較した。しかし、このような調査では、一様情報密度(UID)仮説という1つの機能的圧力が見過ごされている。ここでは,UIDの圧力が語順パターンに相互言語的に影響を与えているかどうかを問う。この目的のために、実順序が反実順序よりも情報均一性が高まるかどうかを計算モデルを用いて検証する。類型的に多様性のある10の言語に関する実証的研究では、 (i)SVO言語では、実語順は逆語順よりも一貫して一様であり、 (ii) 言語的に不可解な反実順序のみが、実際の順序の均一性を超え続ける。これらの知見は、自然言語の開発と利用における情報の均一性の圧力と互換性がある。 While natural languages differ widely in both canonical word order and word order flexibility, their word orders still follow shared cross-linguistic statistical patterns, often attributed to functional pressures. In the effort to identify these pressures, prior work has compared real and counterfactual word orders. Yet one functional pressure has been overlooked in such investigations: the uniform information density (UID) hypothesis, which holds that information should be spread evenly throughout an utterance. Here, we ask whether a pressure for UID may have influenced word order patterns cross-linguistically. To this end, we use computational models to test whether real orders lead to greater information uniformity than counterfactual orders. In our empirical study of 10 typologically diverse languages, we find that: (i) among SVO languages, real word orders consistently have greater uniformity than reverse word orders, and (ii) only linguistically implausible counterfactual orders consistently exceed the uniformity of real orders. These findings are compatible with a pressure for information uniformity in the development and usage of natural languages.	翻訳日:2023-07-11 18:03:56 公開日:2023-07-09
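The comparison of real and counterfactual orders described above can be illustrated with a small sketch. The abstract does not say which uniformity measure is used, so this example assumes one common operationalization, the variance of per-word surprisal. The probabilities are made-up illustrative values, not output from the paper's models, and the function names are hypothetical.

```python
import math


def surprisals(probs):
    """Surprisal in bits (-log2 p) for each word's conditional probability."""
    return [-math.log2(p) for p in probs]


def surprisal_variance(probs):
    """Population variance of surprisal; lower values mean more uniform information."""
    s = surprisals(probs)
    mean = sum(s) / len(s)
    return sum((x - mean) ** 2 for x in s) / len(s)


# Hypothetical conditional probabilities for two orderings that carry the
# same total information (8 bits) but distribute it differently.
even_order = [0.25, 0.25, 0.25, 0.25]     # 2 bits per word
spiky_order = [0.5, 0.5, 0.5, 0.03125]    # 1, 1, 1 and 5 bits

print("even order variance :", surprisal_variance(even_order))
print("spiky order variance:", surprisal_variance(spiky_order))
```

Under this assumed measure, the ordering with the lower variance is the more uniform one. A real study would take the probabilities from a trained language model rather than from fixed lists.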
# 多言語言語モデルは多文化的ではない:感情のケーススタディ Multilingual Language Models are not Multicultural: A Case Study in Emotion ( http://arxiv.org/abs/2307.01370v2 ) ライセンス: Link先を確認	Shreya Havaldar, Sunny Rai, Bhumika Singhal, Langchen Liu, Sharath Chandra Guntuku, Lyle Ungar	(参考訳) 感情は世界中で経験され、表現される。感情に敏感な多言語タスクにLarge Language Models(LM)を使用するには、感情の文化的変化を反映しなければならない。本研究では,2023年の多言語LMが,文化や言語間の感情表現の差異を反映しているかどうかを検討する。 LMから得られる埋め込み(例えば、XLM-RoBERTa)はアングロ中心であり、生成的LM(例えば、ChatGPT)は、他の言語のプロンプトに応答しても、西洋のノルムを反映する。以上の結果から,多言語lmsは感情の文化的に適切なニュアンスを学習できないことを示し,これを修正するための研究の方向性を強調する。 Emotions are experienced and expressed differently across the world. In order to use Large Language Models (LMs) for multilingual tasks that require emotional sensitivity, LMs must reflect this cultural variation in emotion. In this study, we investigate whether the widely-used multilingual LMs in 2023 reflect differences in emotional expressions across cultures and languages. We find that embeddings obtained from LMs (e.g., XLM-RoBERTa) are Anglocentric, and generative LMs (e.g., ChatGPT) reflect Western norms, even when responding to prompts in other languages. Our results show that multilingual LMs do not successfully learn the culturally appropriate nuances of emotion and we highlight possible research directions towards correcting this.	翻訳日:2023-07-11 17:57:33 公開日:2023-07-09
# DragDiffusion:インタラクティブなポイントベース画像編集のための拡散モデル DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing ( http://arxiv.org/abs/2306.14435v3 ) ライセンス: Link先を確認	Yujun Shi, Chuhui Xue, Jiachun Pan, Wenqing Zhang, Vincent Y. F. Tan, Song Bai	(参考訳) 正確かつ制御可能な画像編集は、大きな注目を集めている課題である。近年、DragGANはインタラクティブな点ベース画像編集フレームワークを提供し、画素レベルの精度で印象的な編集結果を実現する。しかし, この手法はGAN(Generative Adversarial Network)に基づくため, 事前学習したGANモデルの容量により, 一般性は上界となる。本研究では,このようなフレームワークを拡散モデルに拡張し,DragDiffusionを提案する。大規模事前学習された拡散モデルを利用することにより,実世界シナリオにおける対話型ポイントベース編集の適用性が大幅に向上する。既存の拡散ベースの画像編集手法はテキスト埋め込みで動作するが、dragdiffusionは拡散潜時を最適化して正確な空間制御を実現する。拡散モデルは反復的に画像を生成するが、一つのステップで拡散遅延を最適化すればコヒーレントな結果が得られ、DragDiffusionが効率よく高品質な編集を完了できることを実証的に示す。幅広い挑戦的なケース(マルチオブジェクト、多様なオブジェクトカテゴリ、様々なスタイルなど)にわたる広範な実験は、dragdiffusionの汎用性と汎用性を示している。コード: https://github.com/yujun-shi/dragdiffusion。 Precise and controllable image editing is a challenging task that has attracted significant attention. Recently, DragGAN enables an interactive point-based image editing framework and achieves impressive editing results with pixel-level precision. However, since this method is based on generative adversarial networks (GAN), its generality is upper-bounded by the capacity of the pre-trained GAN models. In this work, we extend such an editing framework to diffusion models and propose DragDiffusion. By leveraging large-scale pretrained diffusion models, we greatly improve the applicability of interactive point-based editing in real world scenarios. While most existing diffusion-based image editing methods work on text embeddings, DragDiffusion optimizes the diffusion latent to achieve precise spatial control. Although diffusion models generate images in an iterative manner, we empirically show that optimizing diffusion latent at one single step suffices to generate coherent results, enabling DragDiffusion to complete high-quality editing efficiently. 
Extensive experiments across a wide range of challenging cases (e.g., multi-objects, diverse object categories, various styles, etc.) demonstrate the versatility and generality of DragDiffusion. Code: https://github.com/Yujun-Shi/DragDiffusion.	翻訳日:2023-07-11 17:55:28 公開日:2023-07-09
# LVM-Med:2次グラフマッチングによる医用イメージングのための大規模自己スーパービジョンモデル学習 LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching ( http://arxiv.org/abs/2306.11925v2 ) ライセンス: Link先を確認	Duy M. H. Nguyen, Hoang Nguyen, Nghiem T. Diep, Tan N. Pham, Tri Cao, Binh T. Nguyen, Paul Swoboda, Nhat Ho, Shadi Albarqouni, Pengtao Xie, Daniel Sonntag, Mathias Niepert	(参考訳) 注釈付きサンプルを限定した新しいタスクに微調整できる大規模な事前訓練モデルを持つことは、医療画像データにとってオープンな課題である。 ImageNetの事前訓練されたディープネットワークとWebスケールデータで訓練されたビジョン言語基盤モデルが一般的であるが、天然画像と医用画像のドメインシフトが大きいため、医療タスクにおけるそれらの効果は限られている。このギャップを埋めるために,大規模医療データセットでトレーニングされた最初のディープネットワークであるlmm-medを紹介する。我々は、55の公開データセットから約130万の医療画像を収集し、CT、MRI、X線、超音波などの多数の臓器とモダリティをカバーした。このデータセット上で,最先端の自己教師付きアルゴリズムをベンチマークし,グラフマッチングを用いた新しい自己教師付きコントラスト学習アルゴリズムを提案する。提案するアプローチには3つの貢献がある。 (i)地域情報及びグローバル情報に基づく先行的な対向画像類似度指標を統合する。 (ii)組合せグラフマッチング目的によって構築された損失関数を通して特徴埋め込みの構造的制約を捉え、 (iii)ブラックボックスソルバに対する現代の勾配推定手法を用いて、エンドツーエンドを効率的に訓練することができる。提案手法は,セグメンテーションや分類,オブジェクト検出,分布内および分布外の設定など15の下流医療タスクにおいて,提案手法を徹底的に評価した。 LVM-Medは、多くの最先端の教師付き、自己監督型、基礎モデルよりも経験的に優れている。脳腫瘍分類や糖尿病網膜症グラディングといった課題に対して、LVM-MedはResNet-50のみを使用しながら、10億のマスクでトレーニングされた以前の視覚言語モデルを6～7%改善する。 Obtaining large pre-trained models that can be fine-tuned to new tasks with limited annotated samples has remained an open challenge for medical imaging data. While pre-trained deep networks on ImageNet and vision-language foundation models trained on web-scale data are prevailing approaches, their effectiveness on medical tasks is limited due to the significant domain shift between natural and medical images. To bridge this gap, we introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets. We have collected approximately 1.3 million medical images from 55 publicly available datasets, covering a large number of organs and modalities such as CT, MRI, X-ray, and Ultrasound. 
We benchmark several state-of-the-art self-supervised algorithms on this dataset and propose a novel self-supervised contrastive learning algorithm using a graph-matching formulation. The proposed approach makes three contributions: (i) it integrates prior pair-wise image similarity metrics based on local and global information; (ii) it captures the structural constraints of feature embeddings through a loss function constructed via a combinatorial graph-matching objective; and (iii) it can be trained efficiently end-to-end using modern gradient-estimation techniques for black-box solvers. We thoroughly evaluate the proposed LVM-Med on 15 downstream medical tasks ranging from segmentation and classification to object detection, and both for the in and out-of-distribution settings. LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models. For challenging tasks such as Brain Tumor Classification or Diabetic Retinopathy Grading, LVM-Med improves previous vision-language models trained on 1 billion masks by 6-7% while using only a ResNet-50.	翻訳日:2023-07-11 17:55:08 公開日:2023-07-09
# SRCD:単一ドメイン汎用オブジェクト検出のための複合ドメインを用いた意味推論 SRCD: Semantic Reasoning with Compound Domains for Single-Domain Generalized Object Detection ( http://arxiv.org/abs/2307.01750v2 ) ライセンス: Link先を確認	Zhijie Rao, Jingcai Guo, Luyao Tang, Yue Huang, Xinghao Ding, Song Guo	(参考訳) 本稿では,単一ドメイン一般化オブジェクト検出(Single-DGOD)のための新しいフレームワークを提案し,モデルの一般化能力を高めるために,自己拡張された複合クロスドメインサンプルの意味構造を学習し,維持することに関心を寄せる。複数のソースドメインでトレーニングされたDGODとは異なり、Single-DGODは単一のソースドメインだけで複数のターゲットドメインにうまく一般化することがはるかに難しい。既存の手法は主にDGODからの同様の処理を採用し、意味空間を分離または圧縮することでドメイン不変の特徴を学習する。しかし、潜在的な制限は2つある。 1) 極端に少ない単一ドメインデータによる擬似属性・ラベル相関 2) セマンティックな構造情報は一般に無視される。つまり,サンプルにおけるインスタンスレベルのセマンティック関係の親和性は,モデルの一般化に不可欠であることがわかった。本稿では,Single-DGODのためのSemantic Reasoning with Compound Domains (SRCD)を提案する。具体的には,テクスチャベースの自己拡張(TBSA)モジュールと局所・大域意味推論(LGSR)モジュールの2つの主要コンポーネントを含む。 TBSAは、光、影、色などのラベルに関連する無関係な属性の影響を、軽量かつ効率的な自己拡張によって画像レベルで除去することを目的としている。さらに、LGSRは、インスタンス特徴のセマンティック関係をさらにモデル化し、本質的なセマンティック構造を解明し、維持するために使用される。複数のベンチマークで大規模な実験を行い、提案したSRCDの有効性を示した。 This paper provides a novel framework for single-domain generalized object detection (i.e., Single-DGOD), where we are interested in learning and maintaining the semantic structures of self-augmented compound cross-domain samples to enhance the model's generalization ability. Different from DGOD trained on multiple source domains, Single-DGOD is far more challenging to generalize well to multiple target domains with only one single source domain. Existing methods mostly adopt a similar treatment from DGOD to learn domain-invariant features by decoupling or compressing the semantic space. However, there may be two potential limitations: 1) pseudo attribute-label correlation, due to extremely scarce single-domain data; and 2) the semantic structural information is usually ignored, i.e., we found the affinities of instance-level semantic relations in samples are crucial to model generalization. In this paper, we introduce Semantic Reasoning with Compound Domains (SRCD) for Single-DGOD. 
Specifically, our SRCD contains two main components, namely, the texture-based self-augmentation (TBSA) module, and the local-global semantic reasoning (LGSR) module. TBSA aims to eliminate the effects of irrelevant attributes associated with labels, such as light, shadow, color, etc., at the image level by a light-yet-efficient self-augmentation. Moreover, LGSR is used to further model the semantic relationships on instance features to uncover and maintain the intrinsic semantic structures. Extensive experiments on multiple benchmarks demonstrate the effectiveness of the proposed SRCD.	翻訳日:2023-07-11 17:44:52 公開日:2023-07-09
# DebateKG: セマンティック知識グラフを用いた事例作成のための自動政策議論 DebateKG: Automatic Policy Debate Case Creation with Semantic Knowledge Graphs ( http://arxiv.org/abs/2307.04090v1 ) ライセンス: Link先を確認	Allen Roush	(参考訳) 近年のArgument Miningコミュニティにおける研究は、競争の激しい議論の中で見つかった問題の解決に自然言語処理システムの適用性を示している。競争討論における最も重要な課題の1つは、議論者が高品質の討論ケースを作成することである。議論的意味論的知識グラフ上の制約付き最短経路トラバーサルを用いて,効果的な議論事例を構築できることを示す。我々は、この可能性について、DebateSumと呼ばれる大規模データセットをすでに備えている、Policy Debateと呼ばれる米国競争的議論の文脈で研究する。我々は,データセットに53180個の新しい例と,さらに有用なメタデータを導入することで,ディベートサムを大幅に改善した。我々はtxtaiセマンティックサーチとナレッジグラフツールチェーンを利用して,このデータセット上に構築した9つのセマンティックナレッジグラフを作成し,コントリビュートする。政策論争事例作成の文脈において,どの知識グラフが優れているかを評価するユニークな手法を提案する。他のすべてのコードや知識グラフとともに、議論のケースを自動的に生成するデモがオープンソースとして公開されている。 Recent work within the Argument Mining community has shown the applicability of Natural Language Processing systems for solving problems found within competitive debate. One of the most important tasks within competitive debate is for debaters to create high quality debate cases. We show that effective debate cases can be constructed using constrained shortest path traversals on Argumentative Semantic Knowledge Graphs. We study this potential in the context of a type of American Competitive Debate, called Policy Debate, which already has a large scale dataset targeting it called DebateSum. We significantly improve upon DebateSum by introducing 53180 new examples, as well as further useful metadata for every example, to the dataset. We leverage the txtai semantic search and knowledge graph toolchain to produce and contribute 9 semantic knowledge graphs built on this dataset. We create a unique method for evaluating which knowledge graphs are better in the context of producing policy debate cases. A demo which automatically generates debate cases, along with all other code and the Knowledge Graphs, are open-sourced and made available to the public here: https://github.com/Hellisotherpeople/DebateKG	翻訳日:2023-07-11 15:41:44 公開日:2023-07-09
# 変分量子アルゴリズムは量子アドバンテージを実証できるか? 本当に重要な時間 Can Variational Quantum Algorithms Demonstrate Quantum Advantages? Time Really Matters ( http://arxiv.org/abs/2307.04089v1 ) ライセンス: Link先を確認	Huan-Yu Liu, Zhao-Yun Chen, Tai-Ping Sun, Cheng Xue, Yu-Chun Wu, and Guo-Ping Guo	(参考訳) 低深度量子ニューラルネットワーク(QNN)を採用する変分量子アルゴリズム(VQA)は、ノイズの多い中間スケール量子(NISQ)時代において有望であると同時に課題も多い。目覚ましい進歩にもかかわらず、効率と実現可能性に対する批判は絶えない。しかしながら、VQAが量子的優位性を実証できるかどうかはまだ未定であり、本論文で検討する。まず、QNNのトレーニングにおいてパラメータ数と勾配評価コストの間に依存性があることを証明する。バックプロパゲーションアルゴリズムを用いて古典的ニューラルネットワークをトレーニングする際にはそのような直接的な依存は存在しないことから、この依存はVQAのスケーラビリティを制限すると論じる。第2に、ノイズや到達可能性といった現実的な制限を考慮しない理想的な場合について、VQAの実行時間を見積もる。理想的な時間コストが1年のウォールタイムのオーダーに容易に達することを示す。第3に、量子回路の古典的シミュレーションの時間コストと比較することにより、VQAは時間コストが$10^0$-$10^2$年のスケールに達した場合にのみ古典的シミュレーションを上回れることを示す。最後に、上記の結果に基づいて、現在のワークフローでは、VQAが時間スケーリングの観点から古典的なケースを上回ること、ひいては量子的優位性を示すことは困難である、と論じる。 VQAと量子コンピューティングは急速に発展しているため、この研究はVQAの可能性を否定しようとはしていない。本論文の分析はVQAの最適化に向けた指針を提供し、長期的にはより自然なハイブリッド量子古典アルゴリズムを求めることは有意義である。 Applying low-depth quantum neural networks (QNNs), variational quantum algorithms (VQAs) are both promising and challenging in the noisy intermediate-scale quantum (NISQ) era: despite their remarkable progress, criticism of their efficiency and feasibility has never stopped. However, whether VQAs can demonstrate quantum advantages is still undetermined, which will be investigated in this paper. First, we will prove that there exists a dependency between the parameter number and the gradient-evaluation cost when training QNNs. Noticing there is no such direct dependency when training classical neural networks with the backpropagation algorithm, we argue that such a dependency limits the scalability of VQAs. Second, we estimate the time for running VQAs in ideal cases, i.e., without considering realistic limitations like noise and reachability. We will show that the ideal time cost easily reaches the order of a 1-year wall time. 
Third, by comparing with the time cost using classical simulation of quantum circuits, we will show that VQAs can only outperform the classical simulation case when the time cost reaches the scaling of $10^0$-$10^2$ years. Finally, based on the above results, we argue that it would be difficult for VQAs to outperform classical cases in view of time scaling, and therefore, demonstrate quantum advantages, with the current workflow. Since VQAs as well as quantum computing are developing rapidly, this work does not aim to deny the potential of VQAs. The analysis in this paper provides directions for optimizing VQAs, and in the long run, seeking more natural hybrid quantum-classical algorithms would be meaningful.	翻訳日:2023-07-11 15:41:22 公開日:2023-07-09
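The dependency between parameter count and gradient-evaluation cost mentioned above can be illustrated with a toy example. The abstract does not name a gradient method, so this sketch assumes the widely used parameter-shift rule, which needs two expectation-value evaluations per parameter. The "circuit" is a classical stand-in: the expectation of Z on every qubit after an RY(theta_i) rotation of |0>, which equals the product of cos(theta_i). The class and function names are hypothetical.

```python
import math


class CountingCircuit:
    """Classical stand-in for a parameterized circuit that counts its evaluations.

    Models <Z x ... x Z> after RY(theta_i) on each qubit of |0...0>,
    which equals the product of cos(theta_i).
    """

    def __init__(self):
        self.calls = 0

    def expectation(self, thetas):
        self.calls += 1
        return math.prod(math.cos(t) for t in thetas)


def parameter_shift_gradient(circuit, thetas):
    """Gradient via the parameter-shift rule: two evaluations per parameter."""
    grad = []
    for i in range(len(thetas)):
        plus = list(thetas)
        minus = list(thetas)
        plus[i] += math.pi / 2
        minus[i] -= math.pi / 2
        grad.append((circuit.expectation(plus) - circuit.expectation(minus)) / 2)
    return grad


circuit = CountingCircuit()
thetas = [0.3, 1.1, -0.7]
gradient = parameter_shift_gradient(circuit, thetas)
print("gradient:", gradient)
print("circuit evaluations:", circuit.calls, "for", len(thetas), "parameters")
```

On hardware each evaluation would itself require many measurement shots, so under this assumption the cost of one gradient grows linearly with the number of parameters. Backpropagation in a classical network, by contrast, obtains all partial derivatives from one forward and one backward pass.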
# SVIT: Visual Instruction Tuningのスケールアップ SVIT: Scaling up Visual Instruction Tuning ( http://arxiv.org/abs/2307.04087v1 ) ライセンス: Link先を確認	Bo Zhao, Boya Wu, Tiejun Huang	(参考訳) 基礎モデルの出現により、大きな言語とビジョンモデルは統合され、視覚的キャプション、対話、質問応答などのマルチモーダル機能を取得する。既存のマルチモーダルモデルは、視覚的理解と推論の印象的な性能を示すが、高品質な命令チューニングデータの不足のため、その限界は依然としてほとんど解明されていない。マルチモーダル能力の限界を押し上げるために,1.6Mの会話質問応答(QA)ペアと1.6Mの複雑な推論QAペアと106Kの詳細な画像記述を含む320万の視覚的命令チューニングデータのデータセットを構築し,視覚命令チューニングをスケールアップする(SVIT)。ボリュームに加えて,提案するデータセットは,画像の豊富な手動アノテーションでGPT-4を誘導して生成されており,高品質で豊富な多様性を特徴とする。 SVIT上でのマルチモーダルモデルのトレーニングは,視覚的知覚や推論,計画といった面で,マルチモーダル性能を大幅に向上させることができることを実証的に検証した。 Thanks to the emergence of foundation models, the large language and vision models are integrated to acquire the multimodal ability of visual captioning, dialogue, question answering, etc. Although existing multimodal models present impressive performance of visual understanding and reasoning, their limits are still largely under-explored due to the scarcity of high-quality instruction tuning data. To push the limits of multimodal capability, we Scale up Visual Instruction Tuning (SVIT) by constructing a dataset of 3.2 million visual instruction tuning data including 1.6M conversation question-answer (QA) pairs and 1.6M complex reasoning QA pairs and 106K detailed image descriptions. Besides the volume, the proposed dataset also features high quality and rich diversity, as it is generated by prompting GPT-4 with the abundant manual annotations of images. We empirically verify that training multimodal models on SVIT can significantly improve the multimodal performance in terms of visual perception, reasoning and planning.	翻訳日:2023-07-11 15:40:52 公開日:2023-07-09
# 自己校正分類器指導によるラベルデータ少ないスコアベース条件生成 Score-based Conditional Generation with Fewer Labeled Data by Self-calibrating Classifier Guidance ( http://arxiv.org/abs/2307.04081v1 ) ライセンス: Link先を確認	Paul Kuo-Ming Huang, Si-An Chen, Hsuan-Tien Lin	(参考訳) SGM(Score-based Generative Models)は、画像生成品質の高い深層生成モデルのファミリである。以前の研究では、未条件のSGMと訓練された分類器のガイダンスを結合することにより、SGMをクラス条件の生成に適応するように拡張してきた。しかしながら、そのような分類器誘導型SGMは、特にラベル付きデータが少ない場合、正確な条件生成を必ずしも達成しない。この問題は、分類器の信頼性の低い勾配と、トレーニング中にラベルなしのデータを完全に活用できないことに根ざしている。次に、分類器自身を校正することで分類器誘導SGMを改善することを提案する。我々のキーとなる考え方は、エネルギーモデルからの原理を使って分類器を無条件SGMの別の見方に変換することである。そして、ラベル付きデータとラベルなしデータの両方を用いて分類器を校正するために、無条件SGMの既存の損失を採用することができる。実験により,提案手法はラベル付きデータの異なるパーセンテージ間で条件生成品質を著しく改善することを確認した。性能の改善により、ラベル付きデータが少ない場合、提案手法は他の条件付きSGMよりも一貫して優れている。その結果,限定ラベルデータを用いた生成モデルに対する提案手法の可能性が確認された。 Score-based Generative Models (SGMs) are a popular family of deep generative models that achieves leading image generation quality. Earlier studies have extended SGMs to tackle class-conditional generation by coupling an unconditional SGM with the guidance of a trained classifier. Nevertheless, such classifier-guided SGMs do not always achieve accurate conditional generation, especially when trained with fewer labeled data. We argue that the issue is rooted in unreliable gradients of the classifier and the inability to fully utilize unlabeled data during training. We then propose to improve classifier-guided SGMs by letting the classifier calibrate itself. Our key idea is to use principles from energy-based models to convert the classifier as another view of the unconditional SGM. Then, existing loss for the unconditional SGM can be adopted to calibrate the classifier using both labeled and unlabeled data. Empirical results validate that the proposed approach significantly improves the conditional generation quality across different percentages of labeled data. The improved performance makes the proposed approach consistently superior to other conditional SGMs when using fewer labeled data. 
The results confirm the potential of the proposed approach for generative modeling with limited labeled data.	翻訳日:2023-07-11 15:40:32 公開日:2023-07-09
# 高速でスケーラブルなプライベート推論に向けて Towards Fast and Scalable Private Inference ( http://arxiv.org/abs/2307.04077v1 ) ライセンス: Link先を確認	Jianqiao Mo, Karthik Garimella, Negar Neda, Austin Ebel, Brandon Reagen	(参考訳) プライバシとセキュリティは、ファーストオーダーの設計制約として急速に現れています。ユーザーは、データを見る人(秘密性)と利用方法(コントロール)に対して、より多くの保護を求めるようになった。ここでは、セキュリティのための既存の暗号化技術は不足している。保存または通信時にデータを保護するが、計算のために復号化する必要がある。幸いにも、プライバシ保護計算(PPC)と呼ばれる新しい計算パラダイムが存在する。新興のPPC技術は、セキュアなアウトソース計算や、2つのパーティの計算に利用することができる。デジタル時代のユーザー保護に革命をもたらす驚くべき可能性にもかかわらず、その実現は計算能力、通信能力、ストレージのオーバーヘッドのために制限されている。本稿では、ニューラルネットワークにおけるプライベート推論(PI)をモチベーションアプリケーションとして利用して、様々なPPCオーバーヘッドに対処する取り組みについてレビューする。まず,準同型暗号 (he), 秘密共有 (ss), ガーブレッド回路 (gcs), オブリベイト転送 (ot) など様々な技術が紹介されている。次に、PI実装時のオーバーヘッドのキャラクタリゼーションをカバーします。キャラクタリゼーションはgcとheアクセラレータの両方の必要性を動機付けている。次に、GCを加速するHAACとHEを加速するRPUの2つのソリューションが提示される。結論として、piの残りのオーバーヘッドを克服するための今後の作業について、結果と効果を議論して示します。 Privacy and security have rapidly emerged as first order design constraints. Users now demand more protection over who can see their data (confidentiality) as well as how it is used (control). Here, existing cryptographic techniques for security fall short: they secure data when stored or communicated but must decrypt it for computation. Fortunately, a new paradigm of computing exists, which we refer to as privacy-preserving computation (PPC). Emerging PPC technologies can be leveraged for secure outsourced computation or to enable two parties to compute without revealing either users' secret data. Despite their phenomenal potential to revolutionize user protection in the digital age, the realization has been limited due to exorbitant computational, communication, and storage overheads. This paper reviews recent efforts on addressing various PPC overheads using private inference (PI) in neural network as a motivating application. First, the problem and various technologies, including homomorphic encryption (HE), secret sharing (SS), garbled circuits (GCs), and oblivious transfer (OT), are introduced. 
Next, a characterization of their overheads when used to implement PI is covered. The characterization motivates the need for both GCs and HE accelerators. Then two solutions are presented: HAAC for accelerating GCs and RPU for accelerating HE. To conclude, results and effects are shown with a discussion on what future work is needed to overcome the remaining overheads of PI.	翻訳日:2023-07-11 15:40:13 公開日:2023-07-09
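Of the techniques listed above, secret sharing (SS) is the simplest to show in a few lines. The sketch below is a minimal additive secret-sharing example over a fixed modulus. It is not the protocol evaluated in the paper, and the modulus and function names are illustrative assumptions. Each party holds one share that looks random on its own, and addition can be carried out locally on the shares.

```python
import secrets

# Illustrative modulus (an assumption for this sketch, not taken from the paper).
MODULUS = 2**61 - 1


def share(value, n_parties=2):
    """Split value into n_parties additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine all shares to recover the secret."""
    return sum(shares) % MODULUS


def add_shares(a, b):
    """Each party adds its own two shares locally; no communication is needed."""
    return [(x + y) % MODULUS for x, y in zip(a, b)]


x_shares = share(20)
y_shares = share(22)
sum_shares = add_shares(x_shares, y_shares)
print("reconstructed sum:", reconstruct(sum_shares))
```

Additions are cheap in this scheme, whereas multiplications and non-linear layers need extra interaction between the parties. That is one source of the communication overhead the abstract refers to.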
# 癌マルチオミクスデータに基づくがんの新しいサブタイプと治療のためのマルチヘッド注意機構学習 Multi-Head Attention Mechanism Learning for Cancer New Subtypes and Treatment Based on Cancer Multi-Omics Data ( http://arxiv.org/abs/2307.04075v1 ) ライセンス: Link先を確認	Liangrui Pan, Dazhen Liu, Yutao Dou, Lian Wang, Zhichao Feng, Pengfei Rong, Liwen Xu, Shaoliang Peng	(参考訳) がんは不均一性が高く臨床的特徴も多様であるため, 異なるがんのサブタイプ間では, マルチオミクスデータと臨床特徴に有意な差がある。したがって、癌の診断、治療、予後には、癌サブタイプの同定と発見が不可欠である。本研究では,がんのマルチオミクスデータを解析して癌サブタイプを同定・特徴付けるために,教師なしコントラスト学習のための注意機構に基づく一般化フレームワーク(AMUCL)を提案する。 AMUCLフレームワークには、教師なしマルチヘッドアテンション機構が含まれており、マルチオミクスデータの特徴を深く抽出する。さらに,マルチヘッドアテンション機構に基づく非結合型コントラスト学習モデル(DMACL)を提案し,マルチオミクスデータの特徴とクラスターを学習し,新しいがんサブタイプを同定する。この教師なしコントラスト学習法は、マルチオミクスデータの特徴空間とサンプル空間におけるサンプル間の類似度を計算してサブタイプをクラスタ化する。他の11のディープラーニングモデルと比較して、DMACLモデルはシングルセルマルチオミクスデータセットでCインデックス0.002、Silhouetteスコア0.801、Davies Bouldinスコア0.38を達成した。がんマルチオミクスデータセットにおいて、DMACLモデルは、Cインデックス0.016、Silhouetteスコア0.688、Davies Bouldinスコア0.46を取得し、各種類のがんに対して最も信頼性の高い癌サブタイプクラスタリング結果を得た。最後に、AMUCLフレームワークでDMACLモデルを用いて、AMLの6つの癌サブタイプを明らかにした。 AMLのGO機能エンリッチメント,サブタイプ特異的生物学的機能,GSEAの解析により,AMUCLフレームワークに基づいた癌サブタイプ解析の解釈性がさらに向上した。 Due to the high heterogeneity and clinical characteristics of cancer, there are significant differences in multi-omics data and clinical features among subtypes of different cancers. Therefore, the identification and discovery of cancer subtypes are crucial for the diagnosis, treatment, and prognosis of cancer. In this study, we proposed a generalization framework based on attention mechanisms for unsupervised contrastive learning (AMUCL) to analyze cancer multi-omics data for the identification and characterization of cancer subtypes. AMUCL framework includes an unsupervised multi-head attention mechanism, which deeply extracts multi-omics data features. Importantly, a decoupled contrastive learning model (DMACL) based on a multi-head attention mechanism is proposed to learn multi-omics data features and clusters and identify new cancer subtypes. 
This unsupervised contrastive learning method clusters subtypes by calculating the similarity between samples in the feature space and sample space of multi-omics data. Compared to 11 other deep learning models, the DMACL model achieved a C-index of 0.002, a Silhouette score of 0.801, and a Davies Bouldin Score of 0.38 on a single-cell multi-omics dataset. On a cancer multi-omics dataset, the DMACL model obtained a C-index of 0.016, a Silhouette score of 0.688, and a Davies Bouldin Score of 0.46, and obtained the most reliable cancer subtype clustering results for each type of cancer. Finally, we used the DMACL model in the AMUCL framework to reveal six cancer subtypes of AML. By analyzing the GO functional enrichment, subtype-specific biological functions, and GSEA of AML, we further enhanced the interpretability of cancer subtype analysis based on the generalizable AMUCL framework.	翻訳日:2023-07-11 15:39:51 公開日:2023-07-09
# 視覚トランスフォーマーのためのランダム位置反転パッチ Random Position Adversarial Patch for Vision Transformers ( http://arxiv.org/abs/2307.04066v1 ) ライセンス: Link先を確認	Mingzhen Shao	(参考訳) 以前の研究では、視覚トランスフォーマーが敵のパッチに脆弱性があることが示されているが、これらの研究はすべて重要な仮定に依存している。この厳密な要件により、視覚トランスフォーマーの物理的世界での対向パッチの展開は、cnnでの有効性とは異なり、現実的ではない。本稿では、アライメント制約を克服し、視野内の任意の位置に標的攻撃を発射できる敵パッチ(G-Patch)を生成する新しい手法を提案する。具体的には、勾配を使ってパッチを直接最適化するのではなく、GANのような構造を用いて逆パッチを生成する。本実験は,デジタルおよび物理世界のシナリオにおいて,視覚トランスフォーマーに対するユニバーサルアタックを実現する上で,敵パッチの有効性を示す。さらに、さらに分析した結果、生成した対向パッチは、輝度制限、色移動、ランダムノイズに対する堅牢性を示すことが明らかとなった。実世界の攻撃実験は、非常に困難な条件下でも堅牢な攻撃を発射するためのGパッチの有効性を検証する。 Previous studies have shown the vulnerability of vision transformers to adversarial patches, but these studies all rely on a critical assumption: the attack patches must be perfectly aligned with the patches used for linear projection in vision transformers. Due to this stringent requirement, deploying adversarial patches for vision transformers in the physical world becomes impractical, unlike their effectiveness on CNNs. This paper proposes a novel method for generating an adversarial patch (G-Patch) that overcomes the alignment constraint, allowing the patch to launch a targeted attack at any position within the field of view. Specifically, instead of directly optimizing the patch using gradients, we employ a GAN-like structure to generate the adversarial patch. Our experiments show the effectiveness of the adversarial patch in achieving universal attacks on vision transformers, both in digital and physical-world scenarios. Additionally, further analysis reveals that the generated adversarial patch exhibits robustness to brightness restriction, color transfer, and random noise. Real-world attack experiments validate the effectiveness of the G-Patch to launch robust attacks even under some very challenging conditions.	翻訳日:2023-07-11 15:39:19 公開日:2023-07-09
# 生成ニューラルネットワークに基づく超高次元非凸景観の大規模大域的最適化 Large-scale global optimization of ultra-high dimensional non-convex landscapes based on generative neural networks ( http://arxiv.org/abs/2307.04065v1 ) ライセンス: Link先を確認	Jiaqi Jiang, Jonathan A. Fan	(参考訳) 超高次元連続景観における効果的な探索を可能にする深層生成ネットワークの訓練に基づいて,非凸最適化アルゴリズムのメタヒューリスティックを提案する。ネットワークトレーニングでは, サンプリングした局所勾配の集団をカスタマイズされた損失関数内で利用し, ネットワーク出力分布関数を高い性能で1つのピークに進化させる。深層ネットワークアーキテクチャは、トレーニングの過程で進行的な成長をサポートするように調整されており、高次元景観の次元特性の呪いをアルゴリズムが管理できる。我々は,1000の次元を持つ標準的な最適化問題に適用し,最先端のアルゴリズムベンチマークと比較して,関数評価の少ない手法で性能が向上することを示す。また、深層ネットワークの過度パラメータ化、損失関数工学、最適化における適切なネットワークアーキテクチャ選択の役割や、サンプリングした局所勾配のバッチサイズが問題次元に依存しない理由についても論じる。これらの概念は、非凸最適化問題を解決するためにカスタマイズ可能で表現可能な深層生成ネットワークを利用する新しいアルゴリズムの基盤となる。 We present a non-convex optimization algorithm metaheuristic, based on the training of a deep generative network, which enables effective searching within continuous, ultra-high dimensional landscapes. During network training, populations of sampled local gradients are utilized within a customized loss function to evolve the network output distribution function towards one peak at high-performing optima. The deep network architecture is tailored to support progressive growth over the course of training, which allows the algorithm to manage the curse of dimensionality characteristic of high-dimensional landscapes. We apply our concept to a range of standard optimization problems with dimensions as high as one thousand and show that our method performs better with fewer function evaluations compared to state-of-the-art algorithm benchmarks. We also discuss the role of deep network over-parameterization, loss function engineering, and proper network architecture selection in optimization, and why the required batch size of sampled local gradients is independent of problem dimension. These concepts form the foundation for a new class of algorithms that utilize customizable and expressive deep generative networks to solve non-convex optimization problems.	
翻訳日:2023-07-11 15:39:02 公開日:2023-07-09
# 最適輸送による条件付サンプリングのための生成フロー A generative flow for conditional sampling via optimal transport ( http://arxiv.org/abs/2307.04102v1 ) ライセンス: Link先を確認	Jason Alfonso, Ricardo Baptista, Anupam Bhakta, Noam Gal, Alfin Hou, Isa Lyubimova, Daniel Pocklington, Josef Sajonz, Giulio Trigila, and Ryan Tsai	(参考訳) サンプリング条件分布はベイズ推定と密度推定の基本的なタスクである。フローの正規化や生成的敵ネットワークのような生成モデルは、単純な参照(例えば標準ガウス)を目標分布にプッシュするトランスポートマップを学習することで条件分布を特徴付ける。これらのアプローチは非ゲージ問題の多くをうまく記述するが、パラメトリックバイアスと、これらの変換を学ぶための勾配ベース(逆)最適化器の信頼性によって、その性能はしばしば制限される。本研究は,参照サンプルをターゲットに反復的にマッピングする非パラメトリック生成モデルを提案する。モデルはブロック三角形輸送マップを使用し、そのコンポーネントは対象分布の条件を特徴付ける。これらのマップは、重み付き$L^2$コスト関数による最適輸送問題の解法から生じ、条件付きサンプリングのための[Trigila and Tabak, 2016]におけるデータ駆動アプローチを拡張した。提案手法は,2次元の例と非線形odeを含むパラメータ推論問題について実証した。 Sampling conditional distributions is a fundamental task for Bayesian inference and density estimation. Generative models, such as normalizing flows and generative adversarial networks, characterize conditional distributions by learning a transport map that pushes forward a simple reference (e.g., a standard Gaussian) to a target distribution. While these approaches successfully describe many non-Gaussian problems, their performance is often limited by parametric bias and the reliability of gradient-based (adversarial) optimizers to learn these transformations. This work proposes a non-parametric generative model that iteratively maps reference samples to the target. The model uses block-triangular transport maps, whose components are shown to characterize conditionals of the target distribution. These maps arise from solving an optimal transport problem with a weighted $L^2$ cost function, thereby extending the data-driven approach in [Trigila and Tabak, 2016] for conditional sampling. The proposed approach is demonstrated on a two dimensional example and on a parameter inference problem involving nonlinear ODEs.	翻訳日:2023-07-11 15:30:35 公開日:2023-07-09
# 超解像とディープラーニングによる意味セグメンテーションの精度向上--空間分解能が各種データセットに与える影響の検討 Enhancing Building Semantic Segmentation Accuracy with Super Resolution and Deep Learning: Investigating the Impact of Spatial Resolution on Various Datasets ( http://arxiv.org/abs/2307.04101v1 ) ライセンス: Link先を確認	Zhiling Guo, Xiaodan Shi, Haoran Zhang, Dou Huang, Xiaoya Song, Jinyue Yan, Ryosuke Shibasaki	(参考訳) リモートセンシングおよび深層学習技術の開発により,高精度かつ効率的にセマンティックセグメンテーションを構築することが可能となった。異なるタスクで成功したにもかかわらず、深層学習に基づくセマンティックセグメンテーションに対する空間分解能の影響に関する議論は非常に不十分であり、コスト効率の高いデータソースを選択することが大きな課題である。以上の課題に対処するため,本研究では,3つの研究領域のリモートセンシング画像を,超解像・ダウンサンプリングにより複数の空間解像度に分割する。その後、モデルトレーニングとテストのためにUNetとFPNの2つの代表的なディープラーニングアーキテクチャが選択される。 2つの深層学習モデルを持つ3つの都市から得られた実験結果から,空間分解能が建物セグメンテーションに大きく影響し,コスト効率が0.3m程度に向上することが示唆された。 The development of remote sensing and deep learning techniques has enabled building semantic segmentation with high accuracy and efficiency. Despite their success in different tasks, the discussions on the impact of spatial resolution on deep learning based building semantic segmentation are quite inadequate, which makes choosing a higher cost-effective data source a big challenge. To address the issue mentioned above, in this study, we create remote sensing images among three study areas into multiple spatial resolutions by super-resolution and down-sampling. After that, two representative deep learning architectures: UNet and FPN, are selected for model training and testing. The experimental results obtained from three cities with two deep learning models indicate that the spatial resolution greatly influences building segmentation results, and with a better cost-effectiveness around 0.3m, which we believe will be an important insight for data selection and preparation.	翻訳日:2023-07-11 15:30:19 公開日:2023-07-09
# Visible and infrared self-supervised fusion trained on a single example ( http://arxiv.org/abs/2307.04100v1 ) License: see link	Nati Ofir	This paper addresses the problem of visible (RGB) to Near-Infrared (NIR) image fusion. Multispectral imaging is an important task relevant to image processing and computer vision, even more so since the development of the RGBT sensor. While the visible image sees color and suffers from noise, haze, and clouds, the NIR channel captures a clearer picture and is much needed by applications such as dehazing or object detection. The proposed approach fuses these two aligned channels by training a Convolutional Neural Network (CNN) with Self-Supervised Learning (SSL) on a single example. For each such pair, RGB and IR, the network is trained for seconds to deduce the final fusion. The SSL is based on a Structure-of-Similarity (SSIM) loss combined with an Edge-Preservation (EP) loss. The labels for the SSL are the input channels themselves. This fusion preserves the relevant detail of each spectral channel without relying on a heavy training process. In the experiments section, the proposed approach achieves better qualitative and quantitative multispectral fusion results than other recent methods that are not based on large dataset training.	Translated: 2023-07-11 15:30:00 Published: 2023-07-09
# GNP Attack: Transferable Adversarial Examples via Gradient Norm Penalty ( http://arxiv.org/abs/2307.04099v1 ) License: see link	Tao Wu, Tie Luo, Donald C. Wunsch	Adversarial examples (AE) with good transferability enable practical black-box attacks on diverse target models, where insider knowledge about the target models is not required. Previous methods often generate AE with no or very limited transferability; that is, they easily overfit to the particular architecture and feature representation of the source, white-box model, and the generated AE barely work for target, black-box models. In this paper, we propose a novel approach to enhance AE transferability using Gradient Norm Penalty (GNP). It drives the loss function optimization procedure to converge to a flat region of local optima in the loss landscape. By attacking 11 state-of-the-art (SOTA) deep learning models and 6 advanced defense methods, we empirically show that GNP is very effective in generating AE with high transferability. We also demonstrate that it is very flexible in that it can be easily integrated with other gradient-based methods for stronger transfer-based attacks.	Translated: 2023-07-11 15:29:42 Published: 2023-07-09
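The abstract for 2307.04099 does not spell out the update rule, so the following is only a minimal sketch of one way a gradient norm penalty could be folded into an attack gradient. It assumes the penalized objective L(x) - penalty * ||grad L(x)|| and a finite-difference approximation of the penalty term; the toy quadratic loss and the names `toy_loss_gradient`, `gnp_gradient`, `penalty` and `step` are hypothetical and not taken from the paper.

```python
import math


def toy_loss_gradient(x, curvatures=(1.0, 4.0)):
    """Gradient of the toy loss 0.5 * sum(c_i * x_i**2) (illustrative only)."""
    return [c * xi for c, xi in zip(curvatures, x)]


def gnp_gradient(x, grad_fn, penalty=0.5, step=0.01):
    """Approximate the gradient of L(x) - penalty * ||grad L(x)||.

    The Hessian-vector product in the penalty term is replaced by a finite
    difference of gradients taken a small step along the gradient direction.
    """
    g1 = grad_fn(x)
    norm = math.sqrt(sum(g * g for g in g1))
    if norm == 0.0:
        return g1  # flat point: nothing to penalize
    probe = [xi + step * g / norm for xi, g in zip(x, g1)]
    g2 = grad_fn(probe)
    beta = penalty / step
    return [(1.0 + beta) * a - beta * b for a, b in zip(g1, g2)]


# Plain gradient at this point is [3.0, 4.0]; the penalized direction
# down-weights the high-curvature coordinate more strongly.
penalized = gnp_gradient([3.0, 1.0], toy_loss_gradient)
```

In an iterative attack, a direction like `penalized` would take the place of the plain gradient in each step, which is one reading of how the method could combine with other gradient-based attacks.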
# A User Study on Explainable Online Reinforcement Learning for Adaptive Systems ( http://arxiv.org/abs/2307.04098v1 ) License: see link	Andreas Metzger and Jan Laufer and Felix Feit and Klaus Pohl	Online reinforcement learning (RL) is increasingly used for realizing adaptive systems in the presence of design time uncertainty. Online RL facilitates learning from actual operational data and thereby leverages feedback only available at runtime. However, Online RL requires the definition of an effective and correct reward function, which quantifies the feedback to the RL algorithm and thereby guides learning. With Deep RL gaining interest, the learned knowledge is no longer explicitly represented, but is represented as a neural network. For a human, it becomes practically impossible to relate the parametrization of the neural network to concrete RL decisions. Deep RL thus essentially appears as a black box, which severely limits the debugging of adaptive systems. We previously introduced the explainable RL technique XRL-DINE, which provides visual insights into why certain decisions were made at important time points. Here, we introduce an empirical user study involving 54 software engineers from academia and industry to assess (1) the performance of software engineers when performing different tasks using XRL-DINE and (2) the perceived usefulness and ease of use of XRL-DINE.	Translated: 2023-07-11 15:29:24 Published: 2023-07-09
# Restricted Generative Projection for One-Class Classification and Anomaly Detection ( http://arxiv.org/abs/2307.04097v1 ) License: see link	Feng Xiao, Ruoyu Sun, Jicong Fan	We present a simple framework for one-class classification and anomaly detection. The core idea is to learn a mapping to transform the unknown distribution of training (normal) data to a known target distribution. Crucially, the target distribution should be sufficiently simple, compact, and informative. The simplicity is to ensure that we can sample from the distribution easily, the compactness is to ensure that the decision boundary between normal data and abnormal data is clear and reliable, and the informativeness is to ensure that the transformed data preserve the important information of the original data. Therefore, we propose to use truncated Gaussian, uniform in hypersphere, uniform on hypersphere, or uniform between hyperspheres, as the target distribution. We then minimize the distance between the transformed data distribution and the target distribution while keeping the reconstruction error for the original data small enough. Comparative studies on multiple benchmark datasets verify the effectiveness of our methods in comparison to baselines.	Translated: 2023-07-11 15:29:04 Published: 2023-07-09
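The abstract for 2307.04097 asks for target distributions that are easy to sample from. As an illustration of that point only, here is a minimal sketch of standard samplers for three of the named targets (uniform on a hypersphere, in a hypersphere, and between two hyperspheres). The function names and default radii are hypothetical, and this is not claimed to be the authors' implementation.

```python
import math
import random


def sample_on_hypersphere(dim, rng, radius=1.0):
    """Uniform point on the sphere surface: normalize a standard Gaussian vector."""
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in direction))
    return [radius * v / norm for v in direction]


def sample_in_hypersphere(dim, rng, radius=1.0):
    """Uniform point inside the ball: the radius is scaled by U**(1/dim)."""
    scale = radius * rng.random() ** (1.0 / dim)
    return sample_on_hypersphere(dim, rng, scale)


def sample_between_hyperspheres(dim, rng, inner=0.5, outer=1.0):
    """Uniform point in the shell between two concentric spheres."""
    u = rng.random()
    scale = (inner ** dim + u * (outer ** dim - inner ** dim)) ** (1.0 / dim)
    return sample_on_hypersphere(dim, rng, scale)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
on_point = sample_on_hypersphere(8, rng)
in_point = sample_in_hypersphere(8, rng)
shell_point = sample_between_hyperspheres(8, rng)
```

All three targets are bounded, which is consistent with the compactness requirement stated in the abstract.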
# Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing ( http://arxiv.org/abs/2307.04096v1 ) License: see link	Tom Sherborne, Tom Hosking, Mirella Lapata	Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data. Previous work has primarily considered silver-standard data augmentation or zero-shot methods; however, exploiting few-shot gold data is comparatively unexplored. We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between probabilistic latent variables using Optimal Transport. We demonstrate how this direct guidance improves parsing from natural languages using fewer examples and less training. We evaluate our method on two datasets, MTOP and MultiATIS++SQL, establishing state-of-the-art results under a few-shot cross-lingual regime. Ablation studies further reveal that our method improves performance even without parallel input translations. In addition, we show that our model better captures cross-lingual structure in the latent space to improve semantic representation similarity.	Translated: 2023-07-11 15:28:49 Published: 2023-07-09
# Class-Incremental Mixture of Gaussians for Deep Continual Learning ( http://arxiv.org/abs/2307.04094v1 ) License: see link	Lukasz Korycki, Bartosz Krawczyk	Continual learning models for stationary data focus on learning and retaining concepts coming to them in a sequential manner. In the most generic class-incremental environment, we have to be ready to deal with classes coming one by one, without any higher-level grouping. This requirement invalidates many previously proposed methods and forces researchers to look for more flexible alternative approaches. In this work, we follow the idea of centroid-driven methods and propose end-to-end incorporation of the mixture of Gaussians model into the continual learning framework. By employing the gradient-based approach and designing losses capable of learning discriminative features while avoiding degenerate solutions, we successfully combine the mixture model with a deep feature extractor allowing for joint optimization and adjustments in the latent space. Additionally, we show that our model can effectively learn in memory-free scenarios with fixed extractors. In the conducted experiments, we empirically demonstrate the effectiveness of the proposed solutions and exhibit the competitiveness of our model when compared with state-of-the-art continual learning baselines evaluated in the context of image classification problems.	Translated: 2023-07-11 15:28:33 Published: 2023-07-09
# Properly Learning Decision Trees with Queries Is NP-Hard ( http://arxiv.org/abs/2307.04093v1 ) License: see link	Caleb Koch and Carmen Strassle and Li-Yang Tan	We prove that it is NP-hard to properly PAC learn decision trees with queries, resolving a longstanding open problem in learning theory (Bshouty 1993; Guijarro-Lavin-Raghavan 1999; Mehta-Raghavan 2002; Feldman 2016). While there has been a long line of work, dating back to (Pitt-Valiant 1988), establishing the hardness of properly learning decision trees from random examples, the more challenging setting of query learners necessitates different techniques and there were no previous lower bounds. En route to our main result, we simplify and strengthen the best known lower bounds for a different problem of Decision Tree Minimization (Zantema-Bodlaender 2000; Sieling 2003). On a technical level, we introduce the notion of hardness distillation, which we study for decision tree complexity but can be considered for any complexity measure: for a function that requires large decision trees, we give a general method for identifying a small set of inputs that is responsible for its complexity. Our technique even rules out query learners that are allowed constant error. This contrasts with existing lower bounds for the setting of random examples which only hold for inverse-polynomial error. Our result, taken together with a recent almost-polynomial time query algorithm for properly learning decision trees under the uniform distribution (Blanc-Lange-Qiao-Tan 2022), demonstrates the dramatic impact of distributional assumptions on the problem.	Translated: 2023-07-11 15:28:13 Published: 2023-07-09
# CMDFusion: Bidirectional Fusion Network with Cross-modality Knowledge Distillation for LIDAR Semantic Segmentation ( http://arxiv.org/abs/2307.04091v1 ) License: see link	Jun Cen, Shiwei Zhang, Yixuan Pei, Kun Li, Hang Zheng, Maochun Luo, Yingya Zhang, Qifeng Chen	2D RGB images and 3D LIDAR point clouds provide complementary knowledge for the perception system of autonomous vehicles. Several 2D and 3D fusion methods have been explored for the LIDAR semantic segmentation task, but they suffer from different problems. 2D-to-3D fusion methods require strictly paired data during inference, which may not be available in real-world scenarios, while 3D-to-2D fusion methods cannot explicitly make full use of the 2D information. Therefore, we propose a Bidirectional Fusion Network with Cross-Modality Knowledge Distillation (CMDFusion) in this work. Our method has two contributions. First, our bidirectional fusion scheme explicitly and implicitly enhances the 3D feature via 2D-to-3D fusion and 3D-to-2D fusion, respectively, which surpasses either one of the single fusion schemes. Second, we distill the 2D knowledge from a 2D network (Camera branch) to a 3D network (2D knowledge branch) so that the 3D network can generate 2D information even for those points not in the FOV (field of view) of the camera. In this way, RGB images are no longer required during inference, since the 2D knowledge branch provides 2D information according to the 3D LIDAR input. We show that our CMDFusion achieves the best performance among all fusion-based methods on SemanticKITTI and nuScenes datasets. The code will be released at https://github.com/Jun-CEN/CMDFusion.	Translated: 2023-07-11 15:27:47 Published: 2023-07-09
# Marine Debris Detection in Satellite Surveillance using Attention Mechanisms ( http://arxiv.org/abs/2307.04128v1 ) License: see link	Ao Shen, Yijie Zhu and Richard Jiang	Marine debris is an important issue for environmental protection, but current methods for locating marine debris are still limited. In order to achieve higher efficiency and wider applicability in the localization of marine debris, this study combines the instance segmentation of YOLOv7 with different attention mechanisms and explores which model performs best. Using a labelled dataset of satellite images containing ocean debris, we examined three attention models: lightweight coordinate attention, CBAM (combining spatial and channel focus), and bottleneck transformer (based on self-attention). Box detection assessment revealed that CBAM achieved the best outcome (F1 score of 77%) compared to coordinate attention (F1 score of 71%) and YOLOv7/bottleneck transformer (both F1 scores around 66%). Mask evaluation showed CBAM again leading with an F1 score of 73%, whereas coordinate attention and YOLOv7 had comparable performances (F1 scores of around 68%/69%) and bottleneck transformer lagged behind at an F1 score of 56%. These findings suggest that CBAM is the most suitable for detecting marine debris. However, it should be noted that the bottleneck transformer detected some areas missed by manual annotation and displayed better mask precision for larger debris pieces, signifying potentially superior practical performance.	Translated: 2023-07-11 15:22:45 Published: 2023-07-09
# Towards cross-language prosody transfer for dialog ( http://arxiv.org/abs/2307.04123v1 ) License: see link	Jonathan E. Avila, Nigel G. Ward	Speech-to-speech translation systems today do not adequately support use for dialog purposes. In particular, nuances of speaker intent and stance can be lost due to improper prosody transfer. We present an exploration of what needs to be done to overcome this. First, we developed a data collection protocol in which bilingual speakers re-enact utterances from an earlier conversation in their other language, and used this to collect an English-Spanish corpus, so far comprising 1871 matched utterance pairs. Second, we developed a simple prosodic dissimilarity metric based on Euclidean distance over a broad set of prosodic features. We then used these to investigate cross-language prosodic differences, measure the likely utility of three simple baseline models, and identify phenomena which will require more powerful modeling. Our findings should inform future research on cross-language prosody and the design of speech-to-speech translation systems capable of effective prosody transfer.	Translated: 2023-07-11 15:22:18 Published: 2023-07-09
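The prosodic dissimilarity metric in 2307.04123 is described only as a Euclidean distance over a broad set of prosodic features. A minimal sketch of such a metric is given below; the three-element feature vectors are made-up values for illustration, and the feature set, normalization and function name are assumptions rather than the authors' definitions.

```python
import math


def prosodic_dissimilarity(features_a, features_b):
    """Euclidean distance between two equal-length prosodic feature vectors."""
    if len(features_a) != len(features_b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(features_a, features_b)))


# Hypothetical, already-normalized features for one matched utterance pair.
english_utterance = [0.2, -1.0, 0.5]
spanish_utterance = [0.2, 2.0, -3.5]
distance = prosodic_dissimilarity(english_utterance, spanish_utterance)
```

A baseline that copies the source prosody unchanged could then be scored by the average of this distance over all matched pairs, with lower values indicating closer prosody.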
# Enhancing Low-Light Images Using Infrared-Encoded Images ( http://arxiv.org/abs/2307.04122v1 ) License: see link	Shulin Tian, Yufei Wang, Renjie Wan, Wenhan Yang, Alex C. Kot, Bihan Wen	The low-light image enhancement task is essential yet challenging, as it is intrinsically ill-posed. Prior work mainly focuses on low-light images captured in the visible spectrum using a pixel-wise loss, which limits the capacity to recover brightness, contrast, and texture details due to the small number of incoming photons. In this work, we propose a novel approach to increase the visibility of images captured under low-light environments by removing the in-camera infrared (IR) cut-off filter, which allows for the capture of more photons and results in improved signal-to-noise ratio due to the inclusion of information from the IR spectrum. To verify the proposed strategy, we collect a paired dataset of low-light images captured without the IR cut-off filter, with corresponding long-exposure reference images with an external filter. The experimental results on the proposed dataset demonstrate the effectiveness of the proposed method, showing better performance quantitatively and qualitatively. The dataset and code are publicly available at https://wyf0912.github.io/ELIEI/	Translated: 2023-07-11 15:22:06 Published: 2023-07-09
# A Deep Learning Framework for Solving Hyperbolic Partial Differential Equations: Part I ( http://arxiv.org/abs/2307.04121v1 ) License: see link	Rajat Arora	Physics-informed neural networks (PINNs) have emerged as a powerful tool to provide robust and accurate approximations of solutions to partial differential equations (PDEs). However, PINNs face serious difficulties and challenges when trying to approximate PDEs with dominant hyperbolic character. This research focuses on the development of a physics-informed deep learning framework to approximate solutions to nonlinear PDEs that can develop shocks or discontinuities without any a-priori knowledge of the solution or the location of the discontinuities. The work takes motivation from the finite element method, which solves for solution values at nodes in the discretized domain and uses these nodal values to obtain a globally defined solution field. Built on the rigorous mathematical foundations of the discontinuous Galerkin method, the framework naturally handles imposition of boundary conditions (Neumann/Dirichlet), entropy conditions, and regularity requirements. Several numerical experiments and validation with analytical solutions demonstrate the accuracy, robustness, and effectiveness of the proposed framework.	Translated: 2023-07-11 15:21:39 Published: 2023-07-09
# FILM: How can Few-Shot Image Classification Benefit from Pre-Trained Language Models? ( http://arxiv.org/abs/2307.04114v1 ) License: see link	Zihao Jiang, Yunkai Dang, Dong Pang, Huishuai Zhang, Weiran Huang	Few-shot learning aims to train models that can be generalized to novel classes with only a few samples. Recently, a line of work has been proposed to enhance few-shot learning with accessible semantic information from class names. However, these works focus on improving existing modules such as visual prototypes and feature extractors of the standard few-shot learning framework. This limits the full potential use of semantic information. In this paper, we propose a novel few-shot learning framework that uses pre-trained language models based on contrastive learning. To address the challenge of alignment between visual features and textual embeddings obtained from a text-based pre-trained language model, we carefully design the textual branch of our framework and introduce a metric module to generalize the cosine similarity. For better transferability, we let the metric module adapt to different few-shot tasks and adopt MAML to train the model via bi-level optimization. Moreover, we conduct extensive experiments on multiple benchmarks to demonstrate the effectiveness of our method.	Translated: 2023-07-11 15:21:06 Published: 2023-07-09
# Mitosis Detection from Partial Annotation by Dataset Generation via Frame-Order Flipping ( http://arxiv.org/abs/2307.04113v1 ) License: see link	Kazuya Nishimura, Ami Katanaya, Shinichiro Chuma, Ryoma Bise	Detection of mitosis events plays an important role in biomedical research. Deep-learning-based mitosis detection methods have achieved outstanding performance with a certain amount of labeled data. However, these methods require annotations for each imaging condition. Collecting labeled data involves time-consuming human labor. In this paper, we propose a mitosis detection method that can be trained with partially annotated sequences. The basic idea is to generate a fully labeled dataset from the partial labels and train a mitosis detection model with the generated dataset. First, we generate an image pair not containing mitosis events by frame-order flipping. Then, we paste mitosis events to the image pair by alpha-blending pasting and generate a fully labeled dataset. We demonstrate the performance of our method on four datasets, and we confirm that our method outperforms other compared methods that use partially labeled sequences.	Translated: 2023-07-11 15:20:12 Published: 2023-07-09
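The two generation steps named in the abstract for 2307.04113 (frame-order flipping, then alpha-blending pasting) can be sketched on tiny grayscale frames stored as nested lists. This is only an illustration under assumptions: the helper names, the per-pixel alpha mask and the toy pixel values are hypothetical, and the paper's actual procedure may differ in its details.

```python
def flip_frame_order(frame_pair):
    """Swap the temporal order of a two-frame pair (earlier, later)."""
    earlier, later = frame_pair
    return (later, earlier)


def alpha_blend_paste(frame, patch, alpha, top, left):
    """Paste `patch` into a copy of `frame` using a per-pixel alpha mask."""
    blended = [row[:] for row in frame]  # copy so the input frame stays intact
    for i, patch_row in enumerate(patch):
        for j, value in enumerate(patch_row):
            a = alpha[i][j]
            background = frame[top + i][left + j]
            blended[top + i][left + j] = a * value + (1.0 - a) * background
    return blended


# Toy 3x3 frames with uniform intensity and a 2x2 bright patch.
frame_t = [[10.0] * 3 for _ in range(3)]
frame_t1 = [[12.0] * 3 for _ in range(3)]
flipped_pair = flip_frame_order((frame_t, frame_t1))

patch = [[20.0, 20.0], [20.0, 20.0]]
alpha = [[1.0, 0.5], [0.5, 0.0]]
pasted = alpha_blend_paste(flipped_pair[1], patch, alpha, top=1, left=1)
```

Because the pasted location is chosen by the generator, a label for the synthetic event is known by construction, which is what would make the generated pair fully labeled.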
# Learning Space-Time Continuous Neural PDEs from Partially Observed States ( http://arxiv.org/abs/2307.04110v1 ) License: see link	Valerii Iakovlev, Markus Heinonen, Harri Lähdesmäki	We introduce a novel grid-independent model for learning partial differential equations (PDEs) from noisy and partial observations on irregular spatiotemporal grids. We propose a space-time continuous latent neural PDE model with an efficient probabilistic framework and a novel encoder design for improved data efficiency and grid independence. The latent state dynamics are governed by a PDE model that combines the collocation method and the method of lines. We employ amortized variational inference for approximate posterior estimation and utilize a multiple shooting technique for enhanced training speed and stability. Our model demonstrates state-of-the-art performance on complex synthetic and real-world datasets, overcoming limitations of previous approaches and effectively handling partially-observed data. The proposed model outperforms recent methods, showing its potential to advance data-driven PDE modeling and enabling robust, grid-independent modeling of complex partially-observed dynamic processes.	Translated: 2023-07-11 15:19:52 Published: 2023-07-09
# Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's Eye View ( http://arxiv.org/abs/2307.04106v1 ) License: see link	Jiayu Yang, Enze Xie, Jose M. Alvarez, Miaomiao Liu	Recent vision-only perception models for autonomous driving achieved promising results by encoding multi-view image features into Bird's-Eye-View (BEV) space. A critical step and the main bottleneck of these methods is transforming image features into the BEV coordinate frame. This paper focuses on leveraging geometry information, such as depth, to model such feature transformation. Existing works either rely on non-parametric depth distribution modeling, which leads to significant memory consumption, or ignore the geometry information to address this problem. In contrast, we propose to use parametric depth distribution modeling for feature transformation. We first lift the 2D image features to the 3D space defined for the ego vehicle via a predicted parametric depth distribution for each pixel in each view. Then, we aggregate the 3D feature volume based on the 3D space occupancy derived from depth to the BEV frame. Finally, we use the transformed features for downstream tasks such as object detection and semantic segmentation. Existing semantic segmentation methods also suffer from a hallucination problem, as they do not take visibility information into account. This hallucination can be particularly problematic for subsequent modules such as control and planning. To mitigate the issue, our method provides depth uncertainty and reliable visibility-aware estimations. We further leverage our parametric depth modeling to present a novel visibility-aware evaluation metric that, when taken into account, can mitigate the hallucination problem. Extensive experiments on object detection and semantic segmentation on the nuScenes datasets demonstrate that our method outperforms existing methods on both tasks.	Translated: 2023-07-11 15:19:36 Published: 2023-07-09
# Towards Assumption-free Bias Mitigation ( http://arxiv.org/abs/2307.04105v1 ) License: see link	Chia-Yuan Chang, Yu-Neng Chuang, Kwei-Herng Lai, Xiaotian Han, Xia Hu, Na Zou	Despite their impressive prediction ability, machine learning models show discrimination towards certain demographics and suffer from unfair prediction behaviors. To alleviate the discrimination, extensive studies focus on eliminating the unequal distribution of sensitive attributes via multiple approaches. However, due to privacy concerns, sensitive attributes are often either unavailable or missing in real-world scenarios. Therefore, several existing works alleviate the bias without sensitive attributes. Those studies face challenges, either in inaccurate predictions of sensitive attributes or the need to mitigate unequal distribution of manually defined non-sensitive attributes related to bias. The latter requires strong assumptions about the correlation between sensitive and non-sensitive attributes. As data distribution and task goals vary, the strong assumption on non-sensitive attributes may not be valid and may require domain expertise. In this work, we propose an assumption-free framework to detect the related attributes automatically by modeling feature interaction for bias mitigation. The proposed framework aims to mitigate the unfair impact of identified biased feature interactions. Experimental results on four real-world datasets demonstrate that our proposed framework can significantly alleviate unfair prediction behaviors by considering biased feature interactions.	Translated: 2023-07-11 15:19:14 Published: 2023-07-09
# CA-CentripetalNet: ハードハット着用検出のための新しいアンカーフリーディープラーニングフレームワーク CA-CentripetalNet: A novel anchor-free deep learning framework for hardhat wearing detection ( http://arxiv.org/abs/2307.04103v1 ) ライセンス: Link先を確認	Zhijian Liu, Nian Cai, Wensheng Ouyang, Chengbin Zhang, Nili Tian, Han Wang	(参考訳) 検出用ヘルメットの自動着用は、複雑なビデオ監視シーンのため、建設現場の安全管理を強化することができる。従来の深層学習手法の一般化に対処するために,CA-CentripetalNetと呼ばれる新しいアンカーフリー深層学習フレームワークが提案されている。垂直水平コーナープール型ca-centripetalnetの特性抽出と利用能力の向上を目的として, 2つの新しい手法を提案した。前者は限界特徴と内部特徴の包括的利用を実現するように設計されている。後者は、バックボーンが内部機能に注意を払わなければならないように設計されており、これは検出中ではなくトレーニング中にのみ使用される。実験結果から,CA-CentripetalNet は 86.63% mAP (平均平均精度) で,既存のディープラーニングベースの手法,特に小型のハードハットや非ウーンハードハットと比較して,メモリ消費を適度に削減した。 Automatic hardhat wearing detection can strengthen the safety management in construction sites, which is still challenging due to complicated video surveillance scenes. To deal with the poor generalization of previous deep learning based methods, a novel anchor-free deep learning framework called CA-CentripetalNet is proposed for hardhat wearing detection. Two novel schemes are proposed to improve the feature extraction and utilization ability of CA-CentripetalNet, which are vertical-horizontal corner pooling and bounding constrained center attention. The former is designed to realize the comprehensive utilization of marginal features and internal features. The latter is designed to enforce the backbone to pay attention to internal features, which is only used during the training rather than during the detection. Experimental results indicate that the CA-CentripetalNet achieves better performance with the 86.63% mAP (mean Average Precision) with less memory consumption at a reasonable speed than the existing deep learning based methods, especially in case of small-scale hardhats and non-worn-hardhats.	翻訳日:2023-07-11 15:18:54 公開日:2023-07-09
# DIFF-NST: 変形可能な神経伝達のための拡散インターリーブ DIFF-NST: Diffusion Interleaving For deFormable Neural Style Transfer ( http://arxiv.org/abs/2307.04157v1 ) ライセンス: Link先を確認	Dan Ruta, Gemma Canet Tarr\'es, Andrew Gilbert, Eli Shechtman, Nicholas Kolkin, John Collomosse	(参考訳) ニューラルスタイル転送(Neural Style Transfer, NST)は、コンテンツイメージの芸術的外観を、参照スタイルイメージのスタイルに合わせるために、ニューラルテクニックを適用した研究分野である。伝統的に、NST法はテクスチャベースの画像編集に重点を置いており、ほとんどの低レベル情報に影響を与え、ほとんどの画像構造を同じに保っている。しかし、特にそのスタイルが抽象的である場合や、スタイルの主要な概念が、一部のコンテンツの変形したレンドレーションにある場合など、一部のスタイルには、スタイルに基づく変形が望ましい。安定拡散など最近の拡散モデルの導入により、より強力な画像生成技術にアクセスでき、新しい可能性を可能にしている。本研究では,従来のモデルにおいて,変形可能なスタイル転送を実現しつつ,スタイル転送を行うために,この新しいモデルのクラスを提案する。我々は,これらのモデルの先行的活用が推論時に新たな芸術的制御を顕在化できることを示すとともに,この新たなスタイル伝達の方向性を探究する上での知見を文書化する。 Neural Style Transfer (NST) is the field of study applying neural techniques to modify the artistic appearance of a content image to match the style of a reference style image. Traditionally, NST methods have focused on texture-based image edits, affecting mostly low level information and keeping most image structures the same. However, style-based deformation of the content is desirable for some styles, especially in cases where the style is abstract or the primary concept of the style is in its deformed rendition of some content. With the recent introduction of diffusion models, such as Stable Diffusion, we can access far more powerful image generation techniques, enabling new possibilities. In our work, we propose using this new class of models to perform style transfer while enabling deformable style transfer, an elusive capability in previous models. We show how leveraging the priors of these models can expose new artistic controls at inference time, and we document our findings in exploring this new direction for the field of style transfer.	翻訳日:2023-07-11 15:10:49 公開日:2023-07-09
# 空間文脈拡張のための潜在グラフ注意 Latent Graph Attention for Enhanced Spatial Context ( http://arxiv.org/abs/2307.04149v1 ) ライセンス: Link先を確認	Ayush Singh, Yash Bhambhu, Himanshu Buckchash, Deepak K. Gupta, Dilip K. Prasad	(参考訳) 画像のグローバルコンテキストは、画像から画像への翻訳問題で非常に有用である。従来のアテンションベースモデルとグラフベースモデルは、グローバルコンテキストをかなり捉えているが、これらは計算コストが高い。さらに、既存のアプローチは、画像上の任意の2点間のペアワイズ意味関係を学習することのみに限られる。本稿では、LGA(Latent Graph Attention)を、計算コストが低く(ノード数に比例して)、かつ、既存のアーキテクチャにグローバルコンテキストを組み込むための、安定的でモジュール化されたフレームワークとして提案する。 lgaは局所連結グラフのネットワークを用いて空間的に情報を伝達し、中間画素の影響も考慮した2つの空間的距離点間の意味的にコヒーレントな関係の構築を容易にする。さらに、グラフネットワークの深さを利用して、ターゲットデータセットへのコンテキスト拡散の程度を調整し、追加の計算コストを明示的に制御することができる。また,LGAの学習機構を向上するために,LGAモジュールを計算負荷の最小化を犠牲にして,元のアーキテクチャとうまく結合するのに役立つ新しい対照的な損失項を導入する。 LGAを取り入れることで、透明なオブジェクトセグメンテーション、デハジングのための画像復元、光フロー推定という3つの難解なアプリケーションの性能が向上することを示す。 Global contexts in images are quite valuable in image-to-image translation problems. Conventional attention-based and graph-based models capture the global context to a large extent; however, they are computationally expensive. Moreover, the existing approaches are limited to only learning the pairwise semantic relation between any two points on the image. In this paper, we present Latent Graph Attention (LGA), a computationally inexpensive (linear in the number of nodes) and stable, modular framework for incorporating the global context in the existing architectures, especially empowering small-scale architectures to give performance closer to large size architectures, thus making the light-weight architectures more useful for edge devices with lower compute power and lower energy needs. LGA propagates information spatially using a network of locally connected graphs, thereby facilitating the construction of a semantically coherent relation between any two spatially distant points that also takes into account the influence of the intermediate pixels. 
Moreover, the depth of the graph network can be used to adapt the extent of contextual spread to the target dataset, thereby being able to explicitly control the added computational cost. To enhance the learning mechanism of LGA, we also introduce a novel contrastive loss term that helps our LGA module to couple well with the original architecture at the expense of minimal additional computational load. We show that incorporating LGA improves the performance on three challenging applications, namely transparent object segmentation, image restoration for dehazing and optical flow estimation.	翻訳日:2023-07-11 15:10:30 公開日:2023-07-09
# チャート分類に関する調査とアプローチ A Survey and Approach to Chart Classification ( http://arxiv.org/abs/2307.04147v1 ) ライセンス: Link先を確認	Anurag Dhote and Mohammed Javed and David S Doermann	(参考訳) チャートは文書における視覚情報の本質的な情報源であり、典型的には数値的に伝えられる情報の深い理解と解釈を促進する。科学文献には多くの図表があり、それぞれに様式的な違いがある。近年,文書理解コミュニティは,表分類から始まる自動チャート理解の問題に対処し始めている。本稿では,グラフ分類の最先端技術に関する調査を行い,利用可能なデータセットとその対応するチャートタイプについて考察する。これらの貢献をml、cnn、transformersに基づいた従来のアプローチに大まかに分類します。さらに、ICPR 2022におけるCHART-InfographicsコンペティションのためのCHARTINFO UB-UNITECH PMCデータセットについて、CNNベースのアプローチとトランスフォーマーベースのアプローチの比較分析を行った。データセットには、22,923のトレーニングイメージと13,260のテストイメージを含む15の異なるチャートカテゴリが含まれている。我々は,グラフ分類における最先端結果を生成するビジョンベーストランスフォーマーモデルを実装した。 Charts represent an essential source of visual information in documents and facilitate a deep understanding and interpretation of information typically conveyed numerically. In the scientific literature, there are many charts, each with its stylistic differences. Recently the document understanding community has begun to address the problem of automatic chart understanding, which begins with chart classification. In this paper, we present a survey of the current state-of-the-art techniques for chart classification and discuss the available datasets and their supported chart types. We broadly classify these contributions as traditional approaches based on ML, CNN, and Transformers. Furthermore, we carry out an extensive comparative performance analysis of CNN-based and transformer-based approaches on the recently published CHARTINFO UB-UNITECH PMC dataset for the CHART-Infographics competition at ICPR 2022. The data set includes 15 different chart categories, including 22,923 training images and 13,260 test images. We have implemented a vision-based transformer model that produces state-of-the-art results in chart classification.	翻訳日:2023-07-11 15:10:03 公開日:2023-07-09
# 機械学習のランダム性がグループフェアネスに及ぼす影響について On The Impact of Machine Learning Randomness on Group Fairness ( http://arxiv.org/abs/2307.04138v1 ) ライセンス: Link先を確認	Prakhar Ganesh, Hongyan Chang, Martin Strobel, Reza Shokri	(参考訳) 機械学習におけるグループフェアネスの統計的尺度は、異なるグループにわたるアルゴリズムのパフォーマンスのギャップを反映している。しかし、これらの尺度は異なるトレーニングインスタンス間で高いばらつきを示し、公平さの実証的評価には信頼できない。この大きなばらつきの原因は何でしょう? ニューラルネットワークのトレーニングにおけるランダム性の異なる源の群フェアネスへの影響について検討する。グループフェアネス尺度のばらつきは、非表現群における学習過程の高ボラティリティに根ざしていることを示す。さらに,学習中のデータ順序の確率性として,ランダム性の主源が認識される。これらの結果から,グループレベルの精度(すなわちモデルフェアネス)を1つのエポックのデータ順序を変更するだけで,モデル全体の性能に高い効率と無視可能な影響で制御できることを示す。 Statistical measures for group fairness in machine learning reflect the gap in performance of algorithms across different groups. These measures, however, exhibit a high variance between different training instances, which makes them unreliable for empirical evaluation of fairness. What causes this high variance? We investigate the impact on group fairness of different sources of randomness in training neural networks. We show that the variance in group fairness measures is rooted in the high volatility of the learning process on under-represented groups. Further, we recognize the dominant source of randomness as the stochasticity of data order during training. Based on these findings, we show how one can control group-level accuracy (i.e., model fairness), with high efficiency and negligible impact on the model's overall performance, by simply changing the data order for a single epoch.	翻訳日:2023-07-11 15:09:51 公開日:2023-07-09
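The abstract above describes group fairness measures as the gap in performance across groups. A minimal sketch of one such measure follows, assuming accuracy as the performance metric; the function name `group_accuracy_gap` and its arguments are hypothetical and are not taken from the paper.

```python
import numpy as np

def group_accuracy_gap(y_true, y_pred, group):
    """Return (gap, per-group accuracy) for arbitrary group labels.

    Assumption: "fairness gap" here means max minus min per-group accuracy.
    The paper may use a different metric; this is only an illustration.
    """
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    accs = {}
    for g in np.unique(group):
        mask = group == g                      # samples belonging to group g
        accs[g.item()] = float(np.mean(y_true[mask] == y_pred[mask]))
    gap = max(accs.values()) - min(accs.values())
    return gap, accs

# Usage: evaluate the gap for each training run (e.g. one per random seed),
# then inspect the spread of the gaps across runs, for instance with np.std.
gap, per_group = group_accuracy_gap(
    [1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 1], ["a", "a", "a", "b", "b", "b"]
)
```

Repeating this over runs that differ only in data order would, under these assumptions, give a rough view of the variance the abstract refers to.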
# 画像分類問題における説明可能な人工知能モデル A Novel Explainable Artificial Intelligence Model in Image Classification problem ( http://arxiv.org/abs/2307.04137v1 ) ライセンス: Link先を確認	Quoc Hung Cao, Truong Thanh Hung Nguyen, Vo Thanh Khang Nguyen, Xuan Phong Nguyen	(参考訳) 近年、人工知能は様々な分野に広く適用され、人間の生活に深く直接的に影響を与えるようになっている。次に、予測を行うモデルの原則を理解する必要がある。現在の高精度モデルのほとんどはブラックボックスであるため、AI科学者もエンドユーザもこれらのモデル内で何が起きているのかを深く理解していません。したがって、AIモデル、特にLIME、CAM、GradCAMといったコンピュータビジョンの分野における画像分類の問題を説明するために、多くのアルゴリズムが研究されている。しかし、これらのアルゴリズムには、limeの長い実行時間やcamの具体性と明快さの紛らわしい解釈といった制限がある。そこで本稿では,これらのアルゴリズムの利点を組み合わせたセグメンテーション-クラス活性化マッピング(SeCAM)という新しい手法を提案する。我々は、このアルゴリズムを、画像Net Large Scale Visual Recognition Challenge (ILSVRC)データセットのResNet50、Inception-v3、VGG16など様々なモデルでテストした。アルゴリズムが特定の説明に対する全ての要求を非常に簡潔な時間で満たした際、優れた結果が得られる。 In recent years, artificial intelligence is increasingly being applied widely in many different fields and has a profound and direct impact on human life. Following this is the need to understand the principles of the model making predictions. Since most of the current high-precision models are black boxes, neither the AI scientist nor the end-user deeply understands what's going on inside these models. Therefore, many algorithms are studied for the purpose of explaining AI models, especially those in the problem of image classification in the field of computer vision such as LIME, CAM, GradCAM. However, these algorithms still have limitations such as LIME's long execution time and CAM's confusing interpretation of concreteness and clarity. Therefore, in this paper, we propose a new method called Segmentation - Class Activation Mapping (SeCAM) that combines the advantages of these algorithms above, while at the same time overcoming their disadvantages. We tested this algorithm with various models, including ResNet50, Inception-v3, VGG16 from ImageNet Large Scale Visual Recognition Challenge (ILSVRC) data set. 
The results are outstanding: the algorithm meets all the requirements for a specific explanation in a remarkably short time.	翻訳日:2023-07-11 15:09:39 公開日:2023-07-09
# ECL:ロングテール皮膚病変分類のためのクラスエンハンスメントコントラスト学習 ECL: Class-Enhancement Contrastive Learning for Long-tailed Skin Lesion Classification ( http://arxiv.org/abs/2307.04136v1 ) ライセンス: Link先を確認	Yilan Zhang, Jianqi Chen, Ke Wang, Fengying Xie	(参考訳) 皮膚画像データセットは、しばしば不均衡なデータ分布に悩まされ、コンピュータ支援皮膚疾患の診断が困難になる。最近の研究では、この長い課題に対して教師付きコントラスト学習(SCL)を活用している。性能は高いが、これらのSCLベースの手法はヘッドクラスに重点を置いているが、テールクラスにおける情報の利用は無視している。本稿では,マイノリティクラスの情報を充実させ,異なるクラスを等しく扱う,ecl(class-enhancement contrastive learning)を提案する。情報強化のために,クラス依存プロキシを生成するハイブリッドプロキシモデルを設計し,パラメータ最適化のためのサイクル更新戦略を提案する。 balanced-hybrid-proxy lossは、異なるクラスで等しく扱われるサンプルとプロキシの関係を利用するように設計されている。さらに,「不均衡データ」と「不均衡診断困難」を考慮に入れ,カリキュラム学習スケジュールに従って,バランスのとれたクロスエントロピー損失を示す。不均衡皮膚病変データの分類実験の結果,本手法の優位性と有効性が確認された。 Skin image datasets often suffer from imbalanced data distribution, exacerbating the difficulty of computer-aided skin disease diagnosis. Some recent works exploit supervised contrastive learning (SCL) for this long-tailed challenge. Despite achieving significant performance, these SCL-based methods focus more on head classes while ignoring the information in tail classes. In this paper, we propose class-Enhancement Contrastive Learning (ECL), which enriches the information of minority classes and treats different classes equally. For information enhancement, we design a hybrid-proxy model to generate class-dependent proxies and propose a cycle update strategy for parameter optimization. A balanced-hybrid-proxy loss is designed to exploit relations between samples and proxies with different classes treated equally. Taking both "imbalanced data" and "imbalanced diagnosis difficulty" into account, we further present a balanced-weighted cross-entropy loss following a curriculum learning schedule. Experimental results on the classification of imbalanced skin lesion data have demonstrated the superiority and effectiveness of our method.	翻訳日:2023-07-11 15:09:20 公開日:2023-07-09
# 超音波画像のアノテーション除去 : 自己教師付きノイズ2ノイズアプローチ Ultrasonic Image's Annotation Removal: A Self-supervised Noise2Noise Approach ( http://arxiv.org/abs/2307.04133v1 ) ライセンス: Link先を確認	Yuanheng Zhang, Nan Jiang, Zhaoheng Xie, Junying Cao, Yueyang Teng	(参考訳) 正確な注釈付き超音波画像は、高品質な医療報告の重要な構成要素である。病院はしばしば、撮像結果に現れるべきアノテーションの種類について厳格なガイドラインを持っている。しかし、手動でこれらの画像を検査するのは面倒な作業です。ニューラルネットワークはプロセスを自動化する可能性があるが、そのようなモデルのトレーニングは通常、ペア化された入力とターゲットイメージのデータセットを必要とする。本研究では,画像中のアノテーションを自動検出する手法を提案する。これは、アノテーションをノイズとして扱い、自己教師付きプリテキストタスクを作成し、ノイズ2noiseスキームでトレーニングされたモデルを使用して、画像をクリーンな状態に復元することで実現される。我々は、ボディマーカーアノテーションやラジアルラインアノテーションなど、様々なタイプのアノテーションに対して、分節タスクで様々なモデル構造をテストした。その結果,ノイズ2ノイズ方式でトレーニングされたほとんどのモデルは,ノイズとクリーンなデータペアでトレーニングしたモデルよりも優れていた。コスチュームされたu-netは、ボディマーカーアノテーションデータセットにおいて最も最適な結果となり、セグメンテーションの精度と再構成の類似度が高い。私たちはコードをhttps://github.com/grandarth/ultrasonicimage-n2n-approachでリリースした。 Accurately annotated ultrasonic images are vital components of a high-quality medical report. Hospitals often have strict guidelines on the types of annotations that should appear on imaging results. However, manually inspecting these images can be a cumbersome task. While a neural network could potentially automate the process, training such a model typically requires a dataset of paired input and target images, which in turn involves significant human labour. This study introduces an automated approach for detecting annotations in images. This is achieved by treating the annotations as noise, creating a self-supervised pretext task and using a model trained under the Noise2Noise scheme to restore the image to a clean state. We tested a variety of model structures on the denoising task against different types of annotation, including body marker annotation, radial line annotation, etc. Our results demonstrate that most models trained under the Noise2Noise scheme outperformed their counterparts trained with noisy-clean data pairs. 
The customized U-Net yielded the best outcome on the body marker annotation dataset, with high scores on segmentation precision and reconstruction similarity. We released our code at https://github.com/GrandArth/UltrasonicImage-N2N-Approach.	翻訳日:2023-07-11 15:09:01 公開日:2023-07-09
# 副詞型認識のためのビデオクリップにおける物体の挙動に関する推論 Reasoning over the Behaviour of Objects in Video-Clips for Adverb-Type Recognition ( http://arxiv.org/abs/2307.04132v1 ) ライセンス: Link先を確認	Amrit Diggavi Seshadri, Alessandra Russo	(参考訳) 本稿では,シーン系列を記述した副詞が,高レベルなオブジェクト・ビヘイビアの概念を推論することによって最も識別されるという直感に従い,生のビデオクリップから抽出されたオブジェクト・ビヘイビアを理由とする新しいフレームワークの設計を提案し,クリップの対応する副詞タイプを認識する。本手法は,ビデオクリップのアクションタイプが不明なより一般的な問題設定において,従来のシーンの副詞認識では,アクションタイプに基づくクリップの知識を前提としていたが,本手法は直接的に適用可能である。具体的には、生のビデオクリップから人間の解釈可能な物体の挙動を抽出する新しいパイプラインを提案し、これら抽出された事実を操作して副詞型を識別する新しいシンボルと変換器に基づく推論手法を提案する。実験の結果,提案手法は従来の技術に対して好適に機能することが示された。さらに,シンボリックビデオ処理の取り組みをサポートするために,生のビデオクリップから抽出したオブジェクトビヘイビアファクトの2つの新しいデータセット,msr-vtt-asp と activitynet-asp データセットをリリースする。 In this work, following the intuition that adverbs describing scene-sequences are best identified by reasoning over high-level concepts of object-behavior, we propose the design of a new framework that reasons over object-behaviours extracted from raw-video-clips to recognize the clip's corresponding adverb-types. Importantly, while previous works for general scene adverb-recognition assume knowledge of the clip's underlying action-types, our method is directly applicable in the more general problem setting where the action-type of a video-clip is unknown. Specifically, we propose a novel pipeline that extracts human-interpretable object-behaviour-facts from raw video clips and propose novel symbolic and transformer based reasoning methods that operate over these extracted facts to identify adverb-types. Experimental results demonstrate that our proposed methods perform favourably against the previous state-of-the-art. Additionally, to support efforts in symbolic video-processing, we release two new datasets of object-behaviour-facts extracted from raw video clips - the MSR-VTT-ASP and ActivityNet-ASP datasets.	翻訳日:2023-07-11 15:08:42 公開日:2023-07-09
# 炭素効率のよいニューラルアーキテクチャ探索 Carbon-Efficient Neural Architecture Search ( http://arxiv.org/abs/2307.04131v1 ) ライセンス: Link先を確認	Yiyang Zhao and Tian Guo	(参考訳) 本研究は, モデル設計過程におけるエネルギーコストの低減と炭素効率の向上を目的としたニューラルアーキテクチャサーチ(NAS)の新たなアプローチを提案する。 carbon- efficient nas (ce-nas) と呼ばれるこのフレームワークは、異なるエネルギー要件を持つnas評価アルゴリズム、マルチ目的オプティマイザ、ヒューリスティックなgpu割り当て戦略で構成されている。 CE-NASは、現在の二酸化炭素排出量に基づくエネルギー効率サンプリングとエネルギー消費評価タスクを動的にバランスさせる。最近のnasベンチマークデータセットと2つのカーボントレースを用いて、ce-nasが3つのベースラインよりも優れた炭素と検索効率を達成していることを示す。 This work presents a novel approach to neural architecture search (NAS) that aims to reduce energy costs and increase carbon efficiency during the model design process. The proposed framework, called carbon-efficient NAS (CE-NAS), consists of NAS evaluation algorithms with different energy requirements, a multi-objective optimizer, and a heuristic GPU allocation strategy. CE-NAS dynamically balances energy-efficient sampling and energy-consuming evaluation tasks based on current carbon emissions. Using a recent NAS benchmark dataset and two carbon traces, our trace-driven simulations demonstrate that CE-NAS achieves better carbon and search efficiency than the three baselines.	翻訳日:2023-07-11 15:08:20 公開日:2023-07-09
# RGB-Event Transformer-Tracker におけるクロスモーダル直交高階化 Cross-modal Orthogonal High-rank Augmentation for RGB-Event Transformer-trackers ( http://arxiv.org/abs/2307.04129v1 ) ライセンス: Link先を確認	Zhiyu Zhu, Junhui Hou, and Dapeng Oliver Wu	(参考訳) 本稿では,RGBビデオとイベントデータからのクロスモーダルオブジェクト追跡の問題に対処する。複雑なクロスモーダル融合ネットワークを構築するのではなく、事前学習された視覚変換器(ViT)の大きな可能性を探る。特に,2つのモード間の広い分散ギャップを橋渡しし,網羅的な相互モーダル情報通信を可能にし,その能力を高めるプラグイン・アンド・プレイ・トレーニングの強化を微妙に調査する。具体的には,あるトークンの特定のモダリティをランダムにマスクして,異なるモダリティからのトークン間のインタラクションを積極的に実施するマスクモデリング戦略を提案する。マスキング戦略によるネットワーク振動を緩和し、さらにその正の効果を増幅するため、理論上は注意行列を正則化する直交高ランク損失を提案する。広汎な実験により、我々のプラグアンドプレイトレーニング強化技術は、追跡精度と成功率の両方の観点から、最先端の1ストリームと2ストリームのトラッカーを大幅に向上させることができることが示された。我々の新たな視点と発見は、強力なトレーニング済みのViTを使って、クロスモーダルデータをモデル化する分野に洞察をもたらす可能性がある。コードは公開される予定だ。 This paper addresses the problem of cross-modal object tracking from RGB videos and event data. Rather than constructing a complex cross-modal fusion network, we explore the great potential of a pre-trained vision Transformer (ViT). Particularly, we delicately investigate plug-and-play training augmentations that encourage the ViT to bridge the vast distribution gap between the two modalities, enabling comprehensive cross-modal information interaction and thus enhancing its ability. Specifically, we propose a mask modeling strategy that randomly masks a specific modality of some tokens to enforce proactive interaction between tokens from different modalities. To mitigate network oscillations resulting from the masking strategy and further amplify its positive effect, we then theoretically propose an orthogonal high-rank loss to regularize the attention matrix. Extensive experiments demonstrate that our plug-and-play training augmentation techniques can significantly boost state-of-the-art one-stream and two-stream trackers in terms of both tracking precision and success rate. 
Our new perspective and findings will potentially bring insights to the field of leveraging powerful pre-trained ViTs to model cross-modal data. The code will be publicly available.	翻訳日:2023-07-11 15:08:08 公開日:2023-07-09
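The mask modeling strategy described above (randomly masking one modality for some tokens) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function name `mask_random_modality`, the use of zeroing as the masking operation, the one-to-one pairing of RGB and event tokens, and the 50/50 choice of which modality to drop. The paper's actual procedure may differ.

```python
import numpy as np

def mask_random_modality(rgb_tokens, event_tokens, mask_ratio, rng):
    """Zero out exactly one modality at a random subset of token positions.

    rgb_tokens, event_tokens: arrays of shape (num_tokens, dim), assumed to be
    spatially paired (hypothetical layout). Inputs are not modified.
    """
    n = rgb_tokens.shape[0]
    rgb, event = rgb_tokens.copy(), event_tokens.copy()
    chosen = rng.random(n) < mask_ratio   # positions selected for masking
    drop_rgb = rng.random(n) < 0.5        # which modality to drop at each position
    rgb[chosen & drop_rgb] = 0.0          # RGB masked, event kept
    event[chosen & ~drop_rgb] = 0.0       # event masked, RGB kept
    return rgb, event

# Usage: mask roughly 30% of positions before feeding tokens to the backbone.
rng = np.random.default_rng(0)
rgb_masked, event_masked = mask_random_modality(
    np.ones((16, 4)), np.ones((16, 4)), mask_ratio=0.3, rng=rng
)
```

Because a masked position always keeps the other modality, a model trained this way would have to draw on the surviving modality, which is the intuition the abstract gives for enforcing cross-modal interaction.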
# 周波数変調光パラメトリック発振器 Integrated frequency-modulated optical parametric oscillator ( http://arxiv.org/abs/2307.04200v1 ) ライセンス: Link先を確認	Hubert S. Stokowski, Devin J. Dean, Alexander Y. Hwang, Taewon Park, Oguz Tolga Celik, Marc Jankowski, Carsten Langrock, Vahid Ansari, Martin M. Fejer, and Amir H. Safavi-Naeini	(参考訳) 光周波数コムは精密測定、時間保存、分子分光に革命をもたらした。コーム生成技術をコンパクトで信頼性の高いフォトニックプラットフォームに統合することである。最近のマイクロコンブ生成のアプローチには、電気光学(eo)機構とケラー機構がある。急速な進歩にもかかわらず、高い効率と広い帯域幅を維持することは依然として困難である。本稿では、電気光学とパラメトリック増幅を組み合わせて周波数変調光パラメトリック発振器(FM-OPO)を生成する集積型光周波数コム発生器の新たなクラスを紹介する。 eoやカーコームとは対照的に、fm-opoマイクロコームはパルスを形成するのではなく、周波数変調レーザーに似た出力で操作の単純さと高効率なポンプ電力利用を維持している。 FM-OPOの動作原理を概説し, 薄膜ニオブ酸リチウム (LNOI) で完全な光学系を作製した。約1,000モード (約6 THz) にまたがるほぼ平らなスペクトル分布に対して, 内部変換効率が93%(34%外結合)を超えるようにポンプを計測した。 EOコムと比較して、損失よりもキャビティ分散がFM-OPO帯域幅を決定するので、より小さいRF変調パワーでブロードバンドコムを実現することができる。 fm-opoマイクロコームは、その堅牢な運用ダイナミクス、高効率、大きな帯域幅を持ち、マイクロコームの分野への新しいアプローチに貢献し、小型化による精密測定の時代と、メトロロジー、スペクトロスコピー、通信、センシング、コンピューティングの進歩を加速する分光ツールの確立を約束している。 Optical frequency combs have revolutionized precision measurement, time-keeping, and molecular spectroscopy. A substantial effort has developed around "microcombs": integrating comb-generating technologies into compact, reliable photonic platforms. Current approaches for generating these microcombs involve either the electro-optic (EO) or Kerr mechanisms. Despite rapid progress, maintaining high efficiency and wide bandwidth remains challenging. Here, we introduce a new class of microcomb -- an integrated optical frequency comb generator that combines electro-optics and parametric amplification to yield a frequency-modulated optical parametric oscillator (FM-OPO). In stark contrast to EO and Kerr combs, the FM-OPO microcomb does not form pulses but maintains operational simplicity and highly efficient pump power utilization with an output resembling a frequency-modulated laser. 
We outline the working principles of FM-OPO and demonstrate them by fabricating the complete optical system in thin-film lithium niobate (LNOI). We measure pump to comb internal conversion efficiency exceeding 93% (34% out-coupled) over a nearly flat-top spectral distribution spanning approximately 1,000 modes (approximately 6 THz). Compared to an EO comb, the cavity dispersion rather than loss determines the FM-OPO bandwidth, enabling broadband combs with a smaller RF modulation power. The FM-OPO microcomb, with its robust operational dynamics, high efficiency, and large bandwidth, contributes a new approach to the field of microcombs and promises to herald an era of miniaturized precision measurement, and spectroscopy tools to accelerate advancements in metrology, spectroscopy, telecommunications, sensing, and computing.	翻訳日:2023-07-11 15:02:24 公開日:2023-07-09
# 広波長可変薄膜ニオブ酸リチウム光パラメトリック発振器を用いた中赤外分光 Mid-infrared spectroscopy with a broadly tunable thin-film lithium niobate optical parametric oscillator ( http://arxiv.org/abs/2307.04199v1 ) ライセンス: Link先を確認	Alexander Y. Hwang, Hubert S. Stokowski, Taewon Park, Marc Jankowski, Timothy P. McKenna, Carsten Langrock, Jatadhari Mishra, Vahid Ansari, Martin M. Fejer, and Amir H. Safavi-Naeini	(参考訳) 中赤外分光法(mid-infrared spectroscopy)は、分子を感知する重要な技術であり、調整範囲が限られているか、現場での使用のために過度にかさばる源からの障壁に遭遇している。本稿では,これらの課題を克服した,コンパクトで効率的な広帯域可変光パラメトリック発振器(OPO)を提案する。薄膜ニオブ酸リチウムオンサファイアに実装した分散工学による単共振OPOを用いて,オクターブを1.5ミクロンから3.3ミクロンの範囲で広帯域かつ制御したチューニングを実現する。この装置は3.2ミクロンで25mWの赤外線光を生成し、電力変換効率は15%(量子効率45%)である。メタンとアンモニアのスペクトルを計測し, ガス検知に対するアプローチの有効性を検証することで, 装置のチューニングと性能を実証した。我々の装置は、非線形フォトニクスの小型化における重要な進歩を示し、高速・ブロードバンド中赤外分光の実用的応用を現実に近づける。 Mid-infrared spectroscopy, an important and widespread technique for sensing molecules, has encountered barriers stemming from sources either limited in tuning range or excessively bulky for practical field use. We present a compact, efficient, and broadly tunable optical parametric oscillator (OPO) device surmounting these challenges. Leveraging a dispersion-engineered singly-resonant OPO implemented in thin-film lithium niobate-on-sapphire, we achieve broad and controlled tuning over an octave, from 1.5 to 3.3 microns by combining laser and temperature tuning. The device generates > 25 mW of mid-infrared light at 3.2 microns, offering a power conversion efficiency of 15% (45% quantum efficiency). We demonstrate the tuning and performance of the device by successfully measuring the spectra of methane and ammonia, verifying our approach's relevance for gas sensing. Our device signifies an important advance in nonlinear photonics miniaturization and brings practical field applications of high-speed and broadband mid-infrared spectroscopy closer to reality.	翻訳日:2023-07-11 15:01:56 公開日:2023-07-09
# 現場作業におけるロボットアシスタントとの直感的対話のための自然言語指導 Natural Language Instructions for Intuitive Human Interaction with Robotic Assistants in Field Construction Work ( http://arxiv.org/abs/2307.04195v1 ) ライセンス: Link先を確認	Somin Park, Xi Wang, Carol C. Menassa, Vineet R. Kamat, Joyce Y. Chai	(参考訳) ロボットの導入は、建設産業に支障をきたす労働者不足や生産性の停滞を緩和する大きな可能性を秘めていると考えられている。しかし、複雑で非構造な建設現場で完全自動化されたロボットを使うことは困難である。ヒューマンロボットコラボレーション(HRC)は、建設作業に固有の不確実性に共同で対処するために、人間の労働者の柔軟性とロボットアシスタントの身体能力を組み合わせることを約束している。建設にHRCを導入する際には、現場建設におけるチームワークと監督の重要性を認識し、ヒューマンワーカーとロボットアシスタントの自然な直感的なコミュニケーションシステムを確立することが重要である。自然言語に基づく対話は、ロボットプログラミングの非熟練者のために、直感的で親しみやすいロボットとのコミュニケーションを可能にする。しかし、この話題に関する限定的な研究が建設中である。本稿では,人間の作業者が自然言語に基づく建設ロボットと対話できる枠組みを提案する。提案手法は,自然言語理解(NLU),情報マッピング(IM),ロボット制御(RC)の3段階からなる。自然言語命令は言語モデルに入力され、NLUモジュール内の各単語のタグを予測する。 IMモジュールは、NLUモジュールの結果とコンポーネント情報を用いて、ロボットが建設作業を認識し実行するために必要となる最終命令出力を生成する。提案手法を評価するために, ドライウォール設置の事例検討を行った。その結果,人間ロボットチームのコンテキスト内での作業者間のコミュニケーションを再現するために,自然言語によるインタラクションを利用する可能性を強調した。 The introduction of robots is widely considered to have significant potential of alleviating the issues of worker shortage and stagnant productivity that afflict the construction industry. However, it is challenging to use fully automated robots in complex and unstructured construction sites. Human-Robot Collaboration (HRC) has shown promise of combining human workers' flexibility and robot assistants' physical abilities to jointly address the uncertainties inherent in construction work. When introducing HRC in construction, it is critical to recognize the importance of teamwork and supervision in field construction and establish a natural and intuitive communication system for the human workers and robotic assistants. Natural language-based interaction can enable intuitive and familiar communication with robots for human workers who are non-experts in robot programming. However, limited research has been conducted on this topic in construction. 
This paper proposes a framework to allow human workers to interact with construction robots based on natural language instructions. The proposed method consists of three stages: Natural Language Understanding (NLU), Information Mapping (IM), and Robot Control (RC). Natural language instructions are input to a language model to predict a tag for each word in the NLU module. The IM module uses the result of the NLU module and building component information to generate the final instructional output essential for a robot to acknowledge and perform the construction task. A case study for drywall installation is conducted to evaluate the proposed approach. The obtained results highlight the potential of using natural language-based interaction to replicate the communication that occurs between human workers within the context of human-robot teams.	翻訳日:2023-07-11 15:01:37 公開日:2023-07-09
# SAS Video-QA: 効率的なビデオ質問応答のための自己適応サンプリング SAS Video-QA: Self-Adaptive Sampling for Efficient Video Question-Answering ( http://arxiv.org/abs/2307.04192v1 ) ライセンス: Link先を確認	Wei Han, Hui Chen, Min-Yen Kan, Soujanya Poria	(参考訳) ビデオ質問応答は、ビデオ理解の分野における基本的な課題である。ビデオ変換器を備えた現在の視覚言語モデル(VLM)では、時間的モデリングが可能であり、優れた結果が得られるが、計算能力の巨大なコストがかかるため、リアルタイムアプリケーションシナリオへのデプロイには高すぎる。 An economical workaround only samples a small portion of frames to represent the main content of that video and tune an image--text model on these sampled frames. Recent video understanding models usually randomly sample a set of frames or clips, regardless of internal correlations between their visual contents, nor their relevance to the problem. We argue that such kinds of aimless sampling may omit the key frames from which the correct answer can be deduced, and the situation gets worse when the sampling sparsity increases, which always happens as the video lengths increase. To mitigate this issue, we propose two frame sampling strategies, namely the most domain frames (MDF) and most implied frames (MIF), to maximally preserve those frames that are most likely vital to the given questions. MDF passively minimizes the risk of key frame omission in a bootstrap manner, while MIF actively searches key frames customized for each video--question pair with the assistance of auxiliary models. 3つの高度なVLM(CLIP, GIT, All-in-one)による3つの公開データセットに対する実験結果から,提案手法が画像テキスト事前学習モデルの性能を向上させることを示す。本論文で提案されている手法に関するソースコードはhttps://github.com/declare-lab/sas-vqaで公開されている。 Video question--answering is a fundamental task in the field of video understanding. Although current vision--language models (VLMs) equipped with Video Transformers have enabled temporal modeling and yielded superior results, they are at the cost of huge computational power and thus too expensive to deploy in real-time application scenarios. 
An economical workaround only samples a small portion of frames to represent the main content of that video and tune an image--text model on these sampled frames. Recent video understanding models usually randomly sample a set of frames or clips, regardless of internal correlations between their visual contents, nor their relevance to the problem. We argue that such kinds of aimless sampling may omit the key frames from which the correct answer can be deduced, and the situation gets worse when the sampling sparsity increases, which always happens as the video lengths increase. To mitigate this issue, we propose two frame sampling strategies, namely the most domain frames (MDF) and most implied frames (MIF), to maximally preserve those frames that are most likely vital to the given questions. MDF passively minimizes the risk of key frame omission in a bootstrap manner, while MIF actively searches key frames customized for each video--question pair with the assistance of auxiliary models. The experimental results on three public datasets from three advanced VLMs (CLIP, GIT and All-in-one) demonstrate that our proposed strategies can boost the performance for image--text pretrained models. The source code for the method proposed in this paper is publicly available at https://github.com/declare-lab/sas-vqa.	翻訳日:2023-07-11 15:01:12 公開日:2023-07-09
# ロジスティック回帰における推定のサンプル複雑性について On the sample complexity of estimation in logistic regression ( http://arxiv.org/abs/2307.04191v1 ) ライセンス: Link先を確認	Daniel Hsu, Arya Mazumdar	(参考訳) ロジスティック回帰モデルは、ノイズの多いバイナリ分類問題において最も一般的なデータ生成モデルの一つである。本研究では,ロジスティック回帰モデルのパラメータを与えられた$\ell_2$誤差まで推定するサンプルの複雑さを,標準正規共変量を用いて,次元と逆温度の観点から検討する。逆温度は、データ生成プロセスの信号対雑音比を制御する。一般化境界とロジスティック回帰のための最大類似推定器の漸近的性能はよく研究されているが、誤差依存性とパラメータ推定の逆温度を示す非漸近的サンプル複雑性は、以前の解析から外れている。サンプルの複雑性曲線は逆温度の点で2つの変化点(もしくは臨界点)を持ち、低、中、高温の状態を明確に分離していることを示す。 The logistic regression model is one of the most popular data generation models in noisy binary classification problems. In this work, we study the sample complexity of estimating the parameters of the logistic regression model up to a given $\ell_2$ error, in terms of the dimension and the inverse temperature, with standard normal covariates. The inverse temperature controls the signal-to-noise ratio of the data generation process. While both generalization bounds and asymptotic performance of the maximum-likelihood estimator for logistic regression are well-studied, the non-asymptotic sample complexity that shows the dependence on error and the inverse temperature for parameter estimation is absent from previous analyses. We show that the sample complexity curve has two change-points (or critical points) in terms of the inverse temperature, clearly separating the low, moderate, and high temperature regimes.	翻訳日:2023-07-11 15:00:47 公開日:2023-07-09
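To make the role of the inverse temperature concrete, here is a minimal data-generation sketch. It assumes a common parameterization that the abstract does not spell out: labels in {-1, +1}, a unit-norm parameter vector `w`, and P(y = +1 | x) = sigmoid(beta * <w, x>) with inverse temperature `beta`. The function names are hypothetical.

```python
import numpy as np

def sample_logistic_data(n, w, beta, rng):
    """Draw n pairs (x, y) with x ~ N(0, I_d) and y in {-1, +1}.

    Assumed model: P(y = +1 | x) = 1 / (1 + exp(-beta * <w, x>)).
    """
    d = w.shape[0]
    x = rng.standard_normal((n, d))            # standard normal covariates
    p = 1.0 / (1.0 + np.exp(-beta * (x @ w)))  # probability of label +1
    y = np.where(rng.random(n) < p, 1, -1)
    return x, y

def label_noise_rate(x, y, w):
    """Fraction of labels that disagree with the noiseless rule sign(<w, x>)."""
    return float(np.mean(np.sign(x @ w) != y))

# Usage: compare a small beta (high temperature) with a large beta (low temperature).
rng = np.random.default_rng(0)
w = np.zeros(5)
w[0] = 1.0
x_noisy, y_noisy = sample_logistic_data(20000, w, 0.5, rng)
x_clean, y_clean = sample_logistic_data(20000, w, 50.0, rng)
```

Under this parameterization, a small `beta` gives labels that are close to coin flips and a large `beta` gives labels that nearly always match sign(<w, x>), which is the sense in which the inverse temperature controls the signal-to-noise ratio.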
# 異種グラフ表現学習を用いた病理組織学的全スライド画像解析 Histopathology Whole Slide Image Analysis with Heterogeneous Graph Representation Learning ( http://arxiv.org/abs/2307.04189v1 ) ライセンス: Link先を確認	Tsai Hor Chan, Fernando Julio Cendra, Lan Ma, Guosheng Yin, Lequan Yu	(参考訳) 種々の組織間の空間的関係をモデル化することの利点から,wsi解析にグラフベースの手法が広く適用されている。しかしながら、既存の手法のほとんどは、均質なグラフ(例えば、均質なノード型)によるwsisのモデリングに焦点を当てている。その成功にもかかわらず、これらの作品はwsiにおける生物学的実体間の複雑な構造的関係(例えば、異なる細胞種間の多様な相互作用)を採掘することができない。本稿では,WSI分析のために,異なる種類の核間の相互関係を利用する新しい異種グラフベースのフレームワークを提案する。具体的には、各ノードに"nucleus-type"属性と各エッジにセマンティック類似性属性を持つ異種グラフとしてwsiを定式化する。次に,メッセージ集約中にエッジとノードの不均一性を利用する新しい異種グラフエッジ属性トランスフォーマー(heat)を提案する。さらに,従来のクラスタベースプールの過度パラメータ化問題を緩和できるグラフレベルの特徴量を得るための,擬似ラベルベースのセマンティック一貫性プーリング機構を設計する。さらに,既存の連想型ローカライズ手法の限界を観測し,各ノードの寄与を因果駆動アプローチにより,フレームワークの解釈性を向上させることを提案する。 3つの公開TCGAベンチマークデータセットに対する大規模な実験により、我々のフレームワークは様々なタスクに対してかなりのマージンで最先端の手法よりも優れています。私たちのコードはhttps://github.com/HKU-MedAI/WSI-HGNNで公開されています。 Graph-based methods have been extensively applied to whole-slide histopathology image (WSI) analysis due to the advantage of modeling the spatial relationships among different entities. However, most of the existing methods focus on modeling WSIs with homogeneous graphs (e.g., with homogeneous node type). Despite their successes, these works are incapable of mining the complex structural relations between biological entities (e.g., the diverse interaction among different cell types) in the WSI. We propose a novel heterogeneous graph-based framework to leverage the inter-relationships among different types of nuclei for WSI analysis. Specifically, we formulate the WSI as a heterogeneous graph with a "nucleus-type" attribute on each node and a semantic similarity attribute on each edge. We then present a new heterogeneous-graph edge attribute transformer (HEAT) to take advantage of the edge and node heterogeneity during message aggregation. 
Further, we design a new pseudo-label-based semantic-consistent pooling mechanism to obtain graph-level features, which can mitigate the over-parameterization issue of conventional cluster-based pooling. Additionally, observing the limitations of existing association-based localization methods, we propose a causal-driven approach attributing the contribution of each node to improve the interpretability of our framework. Extensive experiments on three public TCGA benchmark datasets demonstrate that our framework outperforms the state-of-the-art methods with considerable margins on various tasks. Our codes are available at https://github.com/HKU-MedAI/WSI-HGNN.	翻訳日:2023-07-11 15:00:25 公開日:2023-07-09
# 動画圧縮のための予測符号化 Predictive Coding For Animation-Based Video Compression ( http://arxiv.org/abs/2307.04187v1 ) ライセンス: Link先を確認	Goluck Konuko, Stéphane Lathuilière and Giuseppe Valenzise	(参考訳) 会議型アプリケーションにおいて,映像を効率よく圧縮する問題に対処する。画像アニメーションをベースとした近年のアプローチは, 粗いキーポイントの集合で顔の動きを表現することで, 非常に低ビットレートで良好な再構成品質を実現することができる。しかし、これらの手法はフレームバイフレーム方式で映像をエンコードする、すなわち、各フレームは参照フレームから再構成されるため、帯域幅が大きくなると再構成品質が制限される。そこで我々は,画像アニメーションを予測器として用いる予測符号化方式を提案し,実際の対象フレームに対する残差を符号化する。残差は予測的な方法でコード化できるため、効率良く時間依存を取り除くことができる。実験の結果, HEVCビデオ標準に比べて70%以上, VVCに比べて30%以上, 有意なビットレート増加が認められた。 We address the problem of efficiently compressing video for conferencing-type applications. We build on recent approaches based on image animation, which can achieve good reconstruction quality at very low bitrate by representing face motions with a compact set of sparse keypoints. However, these methods encode video in a frame-by-frame fashion, i.e., each frame is reconstructed from a reference frame, which limits the reconstruction quality when the bandwidth is larger. Instead, we propose a predictive coding scheme which uses image animation as a predictor, and codes the residual with respect to the actual target frame. The residuals can in turn be coded in a predictive manner, thus efficiently removing temporal dependencies. Our experiments indicate a significant bitrate gain, in excess of 70% compared to the HEVC video standard and over 30% compared to VVC, on a dataset of talking-head videos.	翻訳日:2023-07-11 14:59:57 公開日:2023-07-09
# 生成型大規模言語モデルによるasr誤り訂正は可能か? Can Generative Large Language Models Perform ASR Error Correction? ( http://arxiv.org/abs/2307.04172v1 ) ライセンス: Link先を確認	Rao Ma, Mengjie Qian, Potsawee Manakul, Mark Gales, Kate Knill	(参考訳) ASR誤り訂正は、音声認識システムにおける後処理の重要な部分であり続けている。伝統的にこれらのモデルは、基礎となるasrシステムと参照テキストのデコード結果を使用して教師付きトレーニングでトレーニングされる。このアプローチは計算集約的であり、基礎となるASRモデルを切り替える際にモデルを再訓練する必要がある。近年,大規模言語モデルの開発や,自然言語処理タスクをゼロショットで行う能力が注目されている。本稿では,チャットgptを実例とし,ゼロショットまたは1ショット設定でasr誤り訂正を行う能力について検討する。我々は,asr n-bestリストをモデル入力として使用し,制約なし誤り訂正とn-best制約付き誤り補正法を提案する。コンフォーメータトランスデューサモデルと事前学習されたwhisperモデルの結果から,強力なchatgptモデルを用いた誤り訂正により,asrシステムの性能が大幅に向上することが示された。 ASR error correction continues to serve as an important part of post-processing for speech recognition systems. Traditionally, these models are trained with supervised training using the decoding results of the underlying ASR system and the reference text. This approach is computationally intensive and the model needs to be re-trained when switching the underlying ASR model. Recent years have seen the development of large language models and their ability to perform natural language processing tasks in a zero-shot manner. In this paper, we take ChatGPT as an example to examine its ability to perform ASR error correction in the zero-shot or 1-shot settings. We use the ASR N-best list as model input and propose unconstrained error correction and N-best constrained error correction methods. Results on a Conformer-Transducer model and the pre-trained Whisper model show that we can largely improve the ASR system performance with error correction using the powerful ChatGPT model.	翻訳日:2023-07-11 14:59:43 公開日:2023-07-09
# 教師なし混合手法によるRedditからのドリームコンテンツ発見 Dream Content Discovery from Reddit with an Unsupervised Mixed-Method Approach ( http://arxiv.org/abs/2307.04167v1 ) ライセンス: Link先を確認	Anubhab Das, Sanja Šćepanović, Luca Maria Aiello, Remington Mallett, Deirdre Barrett, and Daniele Quercia	(参考訳) 夢は人間の体験の基本的な部分ですが、完全には理解されていません。伝統的なドリーム分析のプラクティスは、130以上のユニークなスケールと評価システムによって人気があり助けられているが、制限がある。主に振り返り調査や研究室の調査に基づいて、それらは大規模に適用されるか、異なる夢のテーマ間の重要性とつながりを示すのに苦労している。これらの問題を克服するために,自然言語処理による自由形式のドリームレポートにおけるトピックを識別するためのデータ駆動型混合手法を開発した。 Redditのr/Dreamsサブレディット(r/Dreams subreddit)の44,213のドリームレポートでこの方法を試したところ、217のトピックが22の大きなテーマにまとめられました。広範に使用されているホールとファン・デ・キャッスルのスケールと比較し,そのトピックを検証する。従来のスケールを超えて、様々な種類の夢(悪夢や繰り返しの夢など)に特有のパターンを見つけ、話題の重要性とつながりを理解し、covid-19パンデミックや最近のロシア・ウクライナ戦争のような主要な出来事に関する集団的な夢体験の変化を観察します。本手法の応用は,夢の複雑な性質に対する貴重な洞察を与えるものと期待する。 Dreaming is a fundamental but not fully understood part of human experience that can shed light on our thought patterns. Traditional dream analysis practices, while popular and aided by over 130 unique scales and rating systems, have limitations. Mostly based on retrospective surveys or lab studies, they struggle to be applied on a large scale or to show the importance and connections between different dream themes. To overcome these issues, we developed a new, data-driven mixed-method approach for identifying topics in free-form dream reports through natural language processing. We tested this method on 44,213 dream reports from Reddit's r/Dreams subreddit, where we found 217 topics, grouped into 22 larger themes: the most extensive collection of dream topics to date. We validated our topics by comparing them to the widely-used Hall and van de Castle scale. 
Going beyond traditional scales, our method can find unique patterns in different dream types (like nightmares or recurring dreams), understand topic importance and connections, and observe changes in collective dream experiences over time and around major events, like the COVID-19 pandemic and the recent Russo-Ukrainian war. We envision that the applications of our method will provide valuable insights into the intricate nature of dreaming.	翻訳日:2023-07-11 14:59:26 公開日:2023-07-09
# 深部特徴統計モデルによる映像サーベイランスにおける偽アラームの低減 Reducing False Alarms in Video Surveillance by Deep Feature Statistical Modeling ( http://arxiv.org/abs/2307.04159v1 ) ライセンス: Link先を確認	Xavier Bou, Aitor Artola, Thibaud Ehret, Gabriele Facciolo, Jean-Michel Morel, Rafael Grompone von Gioi	(参考訳) 関連する変化を検出することは、ビデオ監視の根本的な問題である。データの可変性が高く、適切に変更をアノテートすることが難しいため、教師なしのメソッドがフィールドを支配している。実用性を実現する上で最も重要な問題のひとつは、誤報率を下げることだろう。本研究では, 深部特徴の高次元統計モデルに基づく手法に依存しない弱教師付きa-コントラリオ検証法を開発し, 変化検出アルゴリズムの誤報数を削減する。また,ほとんどの実アプリケーションの性能要求を正確に把握できないため,従来の画素評価では不十分である。このため、画素単位のメトリクスとオブジェクト単位のメトリクスを補完し、異なるデータセットからの6つのメソッドと複数のシーケンスに対して、画素レベルとオブジェクトレベルの両方でのアプローチの影響を評価する。実験結果から,提案するa-contrarioバリデーションにより,画素レベルとオブジェクトレベルでの誤報数を大幅に削減できることがわかった。 Detecting relevant changes is a fundamental problem of video surveillance. Because of the high variability of data and the difficulty of properly annotating changes, unsupervised methods dominate the field. Arguably one of the most critical issues to make them practical is to reduce their false alarm rate. In this work, we develop a method-agnostic weakly supervised a-contrario validation process, based on high dimensional statistical modeling of deep features, to reduce the number of false alarms of any change detection algorithm. We also raise the insufficiency of the conventionally used pixel-wise evaluation, as it fails to precisely capture the performance needs of most real applications. For this reason, we complement pixel-wise metrics with object-wise metrics and evaluate the impact of our approach at both pixel and object levels, on six methods and several sequences from different datasets. Experimental results reveal that the proposed a-contrario validation is able to largely reduce the number of false alarms at both pixel and object levels.	翻訳日:2023-07-11 14:58:58 公開日:2023-07-09
# 感度インフォーム多項式カオス展開と深部生成ネットワークを用いた地質コンプレックスによるベイズ旅行時間トモグラフィの効率化 Efficient Bayesian travel-time tomography with geologically-complex priors using sensitivity-informed polynomial chaos expansion and deep generative networks ( http://arxiv.org/abs/2307.04228v1 ) ライセンス: Link先を確認	Giovanni Angelo Meles, Macarena Amaya, Shiran Levy, Stefano Marelli, Niklas Linde	(参考訳) モンテカルロ・マルコフ・チェーン (mcmc) 法は、事前分布の正確なキャラクタリゼーションと確率の効率的な評価という2つの基本的な課題に直面する。トモグラフィーに関するベイズ研究の文脈では、主成分分析(PCA)は、計算集約的な全物理前方解法を置き換えるために多項式カオス展開(PCE)に基づく正確な代理モデルの実装を可能にすると同時に、事前分布の直接的な定義を容易にする。 PCAが、より深い生成モデル(VAE)のような、事前の配布方法を簡単に定義する手段を提供していないシナリオに直面する場合、実行可能なオプションとして使用できる。しかしながら、VAEの潜伏パラメータとフォワードモデリングの出力との間の複雑な非線形関係を捉えることができるサロゲートを正確に生成することは、注目すべき課題である。実際、PCEモデルは、入力-出力関係が比較的低次多変量多項式によって効果的に近似できる場合に高い精度を提供するが、この条件は通常、深層生成モデルから派生した潜時変数を利用する際には未成熟である。本研究では,事前分布の表現の観点からのvaeの優れた再構成性能と,ベイズ地中レーダ(gpr)トモグラフィの文脈におけるpca-pceサロゲートモデル精度を組み合わせた手法を提案する。 MCMCプロセス内では、VAEのパラメトリゼーションが事前探索とサンプル提案に利用される。同時に、VAEサンプルのグローバルまたはローカルに定義された主成分を検査対象とするPCEを用いてモデリングを行う。 Monte Carlo Markov Chain (MCMC) methods commonly confront two fundamental challenges: the accurate characterization of the prior distribution and the efficient evaluation of the likelihood. In the context of Bayesian studies on tomography, principal component analysis (PCA) can in some cases facilitate the straightforward definition of the prior distribution, while simultaneously enabling the implementation of accurate surrogate models based on polynomial chaos expansion (PCE) to replace computationally intensive full-physics forward solvers. When faced with scenarios where PCA does not offer a direct means of easily defining the prior distribution, alternative methods such as deep generative models (e.g., variational autoencoders (VAEs)) can be employed as viable options. However, accurately producing a surrogate capable of capturing the intricate non-linear relationship between the latent parameters of a VAE and the outputs of forward modeling presents a notable challenge. 
Indeed, while PCE models provide high accuracy when the input-output relationship can be effectively approximated by relatively low-degree multivariate polynomials, this condition is typically unmet when utilizing latent variables derived from deep generative models. In this contribution, we present a strategy that combines the excellent reconstruction performance of VAEs in terms of prior representation with the accuracy of PCA-PCE surrogate modeling in the context of Bayesian ground penetrating radar (GPR) travel-time tomography. Within the MCMC process, the parametrization of the VAE is leveraged for prior exploration and sample proposal. Concurrently, modeling is conducted using PCE, which operates on either globally or locally defined principal components of the VAE samples under examination.	翻訳日:2023-07-11 14:50:58 公開日:2023-07-09
# 再サンプリングを伴う拡散入射モデルに基づく地震データ補間 Seismic Data Interpolation based on Denoising Diffusion Implicit Models with Resampling ( http://arxiv.org/abs/2307.04226v1 ) ライセンス: Link先を確認	Xiaoli Wei, Chunxia Zhang, Hongtao Wang, Chengli Tan, Deng Xiong, Baisong Jiang, Jiangshe Zhang, Sang-Woon Kim	(参考訳) 空間拡張に伴う痕跡の欠如に起因する地震データの不完全性は,地下地質構造の撮像品質を著しく損なう障害や経済的な制約が存在するため,地震探査において一般的な問題である。近年, 深層学習に基づく補間法が有望な進歩を遂げているが, 生成型逆ネットワークの安定な訓練は容易ではなく, テストやトレーニングの欠落パターンが一致しない場合, 性能劣化が顕著である。そこで本稿では,再サンプリングによる暗黙的拡散モデルを提案する。モデルトレーニングは、U-Netが各ステップのノイズにマッチするマルチヘッド自己アテンションを備えているデノナイジング拡散確率モデルに基づいて行われる。グローバルノイズ構成としてのコサインノイズスケジュールは、過大なノイズステージの通過を加速することにより、既知のトレース情報の高利用を促進する。モデル推論は、既知のトレースの条件付けである拡散暗黙モデルを利用して、拡散ステップの少ない高品質な補間を可能にする。各逆ステップにおける既知のトレースと不足トレースとの一貫性を高めるために、推論プロセスは、再サンプリング戦略を統合し、以前の補間されたトレースに記録された情報を取得する。合成およびフィールド地震探査データを用いた大規模実験により, モデルが優れていること, 各種の欠落パターンに対するロバスト性について検証した。また不確かさの定量化とアブレーションの研究も行われている。 The incompleteness of the seismic data caused by missing traces along the spatial extension is a common issue in seismic acquisition due to the existence of obstacles and economic constraints, which severely impairs the imaging quality of subsurface geological structures. Recently, deep learning-based seismic interpolation methods have attained promising progress, while achieving stable training of generative adversarial networks is not easy, and performance degradation is usually notable if the missing patterns in the testing and training do not match. In this paper, we propose a novel seismic denoising diffusion implicit model with resampling. The model training is established on the denoising diffusion probabilistic model, where U-Net is equipped with the multi-head self-attention to match the noise in each step. The cosine noise schedule, serving as the global noise configuration, promotes the high utilization of known trace information by accelerating the passage of the excessive noise stages. 
The model inference utilizes the denoising diffusion implicit model, conditioning on the known traces, to enable high-quality interpolation with fewer diffusion steps. To enhance the coherency between the known traces and the missing traces within each reverse step, the inference process integrates a resampling strategy to achieve an information recap on the former interpolated traces. Extensive experiments conducted on synthetic and field seismic data validate the superiority of our model and its robustness on various missing patterns. In addition, uncertainty quantification and ablation studies are also investigated.	翻訳日:2023-07-11 14:50:26 公開日:2023-07-09
# 赤外線・熱画像融合による火災シナリオのリアルタイム人体検出 Real-time Human Detection in Fire Scenarios using Infrared and Thermal Imaging Fusion ( http://arxiv.org/abs/2307.04223v1 ) ライセンス: Link先を確認	Truong-Dong Do, Nghe-Nhan Truong and My-Ha Le	(参考訳) 火災は人命に対する最も深刻な脅威の1つと考えられており、死者の確率が高い。これらの深刻な影響は、避難する犠牲者や救助隊の視認性をほとんど制限する火災による激しい煙によるものである。このような危険な状況下では、視覚に基づく人間検出システムを使用することで、より多くの命を救う能力を向上させることができる。そこで本論文では, 煙による低視認性シナリオにおける人間検出のための複数のカメラを用いた熱赤外画像融合方式を提案する。複数のカメラで処理することで、人間の検出に有用な特徴を生成するために、バイタル情報を収集することができる。まず、カメラはLight Heatating Chessboardを使って調整される。その後、軽量のディープニューラルネットワークを通過する前に入力画像から抽出した特徴をマージして人検出タスクを実行する。 NVIDIA Jetson Nano コンピュータで行った実験により,提案手法は妥当な速度で処理でき,mAP@0.5 95% で良好な性能が得られることを示した。 Fire is considered one of the most serious threats to human lives which results in a high probability of fatalities. Those severe consequences stem from the heavy smoke emitted from a fire that mostly restricts the visibility of escaping victims and rescuing squad. In such hazardous circumstances, the use of a vision-based human detection system is able to improve the ability to save more lives. To this end, a thermal and infrared imaging fusion strategy based on multiple cameras for human detection in low-visibility scenarios caused by smoke is proposed in this paper. By processing with multiple cameras, vital information can be gathered to generate more useful features for human detection. Firstly, the cameras are calibrated using a Light Heating Chessboard. Afterward, the features extracted from the input images are merged prior to being passed through a lightweight deep neural network to perform the human detection task. The experiments conducted on an NVIDIA Jetson Nano computer demonstrated that the proposed method can process with reasonable speed and can achieve favorable performance with a mAP@0.5 of 95%.	翻訳日:2023-07-11 14:50:00 公開日:2023-07-09
# lakebench: データレイク上のデータディスカバリのベンチマーク LakeBench: Benchmarks for Data Discovery over Data Lakes ( http://arxiv.org/abs/2307.04217v1 ) ライセンス: Link先を確認	Kavitha Srinivas, Julian Dolby, Ibrahim Abdelaziz, Oktie Hassanzadeh, Harsha Kokel, Aamod Khatiwada, Tejaswini Pedapati, Subhajit Chaudhury, Horst Samulowitz	(参考訳) 企業では、データ発見を中心に、データレイクをインテリジェントにナビゲートする必要性が高まっています。企業にとって特に重要なのは、関連するテーブルをデータレポジトリで見つける能力だ。これらのテーブルは互いに結合可能、結合可能、あるいはサブセットでもよい。パブリックドメインにはこれらのタスクのベンチマークが多数あり、関連する作業はプライベートデータセットをターゲットにしている。 LakeBenchでは、CKAN、ソクラタ、欧州中央銀行の政府データなど、さまざまなデータソースから抽出された表を用いて、これらのタスクの複数のベンチマークを作成する。これらのタスクにおける4つの表型基礎モデルの性能を比較した。既存のモデルはいずれも、このベンチマークのために開発したデータ発見タスクについてトレーニングされていません。その結果,このようなベンチマークの確立は,データレイクにおけるデータ発見に有用な表型モデルを構築する上で,コミュニティにとって有用であることが示唆された。 Within enterprises, there is a growing need to intelligently navigate data lakes, specifically focusing on data discovery. Of particular importance to enterprises is the ability to find related tables in data repositories. These tables can be unionable, joinable, or subsets of each other. There is a dearth of benchmarks for these tasks in the public domain, with related work targeting private datasets. In LakeBench, we develop multiple benchmarks for these tasks by using the tables that are drawn from a diverse set of data sources such as government data from CKAN, Socrata, and the European Central Bank. We compare the performance of 4 publicly available tabular foundational models on these tasks. None of the existing models had been trained on the data discovery tasks that we developed for this benchmark; not surprisingly, their performance shows significant room for improvement. The results suggest that the establishment of such benchmarks may be useful to the community to build tabular models usable for data discovery in data lakes.	翻訳日:2023-07-11 14:49:46 公開日:2023-07-09
# 階層型オートエンコーダを用いた大規模高解像度科学データに対するロシー圧縮 Hierarchical Autoencoder-based Lossy Compression for Large-scale High-resolution Scientific Data ( http://arxiv.org/abs/2307.04216v1 ) ライセンス: Link先を確認	Hieu Le, Hernan Santos, Jian Tao	(参考訳) ロスシー圧縮は多くの領域でデータサイズを減らす重要な技術となっている。この種の圧縮は、サイズが数ペタバイトに及ぶ大規模な科学データに特に有用である。オートエンコーダベースのモデルは画像やビデオの圧縮に成功しているが、そのようなニューラルネットワークは科学データ領域で広く注目を集めていない。本研究は,大規模科学データを著しく圧縮するだけでなく,高い再構成品質を維持するニューラルネットワークを提案する。提案モデルは,大規模高分解能気候モデルデータセットに適用可能な科学ベンチマークデータを用いて検証した。本モデルは,複数のベンチマークデータセットにおいて,復元品質を損なうことなく圧縮率140を達成する。高分解能コミュニティ・アース・システム・モデル(cesm)のバージョン1.3のシミュレーションデータは、500年以上にわたって圧縮率200で圧縮されているが、復元誤差は科学的解析には無視できない。 Lossy compression has become an important technique to reduce data size in many domains. This type of compression is especially valuable for large-scale scientific data, whose size ranges up to several petabytes. Although Autoencoder-based models have been successfully leveraged to compress images and videos, such neural networks have not widely gained attention in the scientific data domain. Our work presents a neural network that not only significantly compresses large-scale scientific data but also maintains high reconstruction quality. The proposed model is tested with scientific benchmark data available publicly and applied to a large-scale high-resolution climate modeling data set. Our model achieves a compression ratio of 140 on several benchmark data sets without compromising the reconstruction quality. Simulation data from the High-Resolution Community Earth System Model (CESM) Version 1.3 over 500 years are also being compressed with a compression ratio of 200 while the reconstruction error is negligible for scientific analysis.	翻訳日:2023-07-11 14:49:32 公開日:2023-07-09
# 360$^\circ$データを用いた一般アクションベースボール復元モデル Generalized Action-based Ball Recovery Model using 360$^\circ$ data ( http://arxiv.org/abs/2307.04215v1 ) ライセンス: Link先を確認	Ricardo Furbino Marques do Nascimento and Hugo M. R. Rios-Neto	(参考訳) しかし、マンチェスター・シティ、リバプール、リーズ・ユナイテッドといったチームは、この数年間で失ったボールをすぐに取り戻そうとしている。現在、世界トップマネージャの何人かは、ハイプレッシャースタイルを採用しており、通常はguardiolaとクレジットされる5秒ルールのような概念は、[9][10]を広めており、近年、多くのチームがプレーしている基本的な部分となっている。メディア[4][5][6]では、“息を吸わない”や“できるだけ早くボールを取り戻す”といった表現が頻繁に聞かれるが、持ち主の変更に最も繋がるアクションは何か? チームの位置決めがボールのリカバリに与える影響は? プレッシャーを受けると、より頻繁に崩壊する選手はどちらですか。上記のようにプレイヤーを強烈に押すわけではないチームの防御力を評価することは可能か? 本稿では, Statsbomb 360$^\circ$データを用いてGABR(Generalized Action based Ball Recovery Model)を作成することで, これらの疑問に答えようとしている。 Even though having more possession does not necessarily lead to winning, teams like Manchester City, Liverpool, and Leeds United notably have tried to recover the ball quickly after they lost it over the past few years. Nowadays, some of the top managers in the world apply high-pressing styles, and concepts such as the five-second rule, usually credited to Guardiola, have been spreading out [9][10], becoming a fundamental part of how lots of teams have played over the recent years. Expressions like "don't let them breathe" and "get the ball back as soon as possible" are often heard in the media [4][5][6], but what are the actions that most lead to a change in possession? What is the influence of a team's positioning on the ball recovery? Which are the players that more often collapse when under pressure? Can we evaluate the defensive dynamics of teams that do not necessarily press the player in possession as intensely as those mentioned above? We try to answer those and other questions in this paper by creating a Generalized Action based Ball Recovery model (GABR) using Statsbomb 360$^\circ$ data.	翻訳日:2023-07-11 14:49:19 公開日:2023-07-09
# 強化学習における安定現象のエッジの検討 Investigating the Edge of Stability Phenomenon in Reinforcement Learning ( http://arxiv.org/abs/2307.04210v1 ) ライセンス: Link先を確認	Rares Iordan, Marc Peter Deisenroth, Mihaela Rosca	(参考訳) 近年,教師付き学習における安定性現象のエッジを明らかにする運動量による全バッチ勾配降下学習ニューラルネットワークの最適化ダイナミクスの理解が進んでいる。安定現象のエッジは、ヘッシアンの主固有値が二次損失に対する最適化アルゴリズムの発散しきい値に達すると発生し、その後、しきい値の周りを振動し始め、損失は局所不安定となり始めるが、長い時間フレームで減少する。本研究では,オフラインからオンラインrlまで,さまざまなデータレジームにまたがるオフポリシーq-ラーニングアルゴリズムである強化学習(rl)における安定性現象のエッジについて検討する。実験の結果,データ分布の非定常性やブートストラップの利用など,教師あり学習に大きく違いがあるにもかかわらず,非政治的な深層RLには安定性現象の端が存在することがわかった。しかし、教師あり学習とは異なり、根底にある損失によって強い違いが観察され、DQN -- Huber損失 -- はC51では観測できない安定性効果の強いエッジを示す。この結果から,ニューラルネットワーク構造は問題領域間の移動を最適化するダイナミクスをもたらす可能性があるが,深いRL最適化の特定の側面は,教師付き学習のような領域と区別できる可能性が示唆された。 Recent progress has been made in understanding optimisation dynamics in neural networks trained with full-batch gradient descent with momentum with the uncovering of the edge of stability phenomenon in supervised learning. The edge of stability phenomenon occurs as the leading eigenvalue of the Hessian reaches the divergence threshold of the underlying optimisation algorithm for a quadratic loss, after which it starts oscillating around the threshold, and the loss starts to exhibit local instability but decreases over long time frames. In this work, we explore the edge of stability phenomenon in reinforcement learning (RL), specifically off-policy Q-learning algorithms across a variety of data regimes, from offline to online RL. Our experiments reveal that, despite significant differences to supervised learning, such as non-stationarity of the data distribution and the use of bootstrapping, the edge of stability phenomenon can be present in off-policy deep RL. Unlike supervised learning, however, we observe strong differences depending on the underlying loss, with DQN -- using a Huber loss -- showing a strong edge of stability effect that we do not observe with C51 -- using a cross entropy loss. 
Our results suggest that, while neural network structure can lead to optimisation dynamics that transfer between problem domains, certain aspects of deep RL optimisation can differentiate it from domains such as supervised learning.	翻訳日:2023-07-11 14:48:58 公開日:2023-07-09
# 企業におけるプライバシ保護型合成データの展開の課題 On the Challenges of Deploying Privacy-Preserving Synthetic Data in the Enterprise ( http://arxiv.org/abs/2307.04208v1 ) ライセンス: Link先を確認	Lauren Arthur, Jason Costello, Jonathan Hardy, Will O'Brien, James Rea, Gareth Rees, Georgi Ganev	(参考訳) 生成AI技術は前例のない人気を得ており、その優れた能力によって興奮と不安が混ざり合っている。本稿では,生成AIのサブフィールドである合成データのデプロイに関わる課題について検討する。当社の焦点は企業の展開であり、大量の個人的および高度に機密性の高いデータによって引き起こされるプライバシーの懸念に重点を置いている。 40以上の課題を特定し、それらを5つの主要なグループに体系化する。 i)世代二インフラ及び建築三統治四コンプライアンス及び規制、及び v) 採用。さらに,企業が課題に効果的に対処し,実現したソリューションへの信頼を確立することで目標を達成するための戦略的かつ体系的なアプローチについても論じる。 Generative AI technologies are gaining unprecedented popularity, causing a mix of excitement and apprehension through their remarkable capabilities. In this paper, we study the challenges associated with deploying synthetic data, a subfield of Generative AI. Our focus centers on enterprise deployment, with an emphasis on privacy concerns caused by the vast amount of personal and highly sensitive data. We identify 40+ challenges and systematize them into five main groups -- i) generation, ii) infrastructure & architecture, iii) governance, iv) compliance & regulation, and v) adoption. Additionally, we discuss a strategic and systematic approach that enterprises can employ to effectively address the challenges and achieve their goals by establishing trust in the implemented solutions.	翻訳日:2023-07-11 14:48:35 公開日:2023-07-09
# forward アルゴリズムの拡張 Extending the Forward Forward Algorithm ( http://arxiv.org/abs/2307.04205v1 ) ライセンス: Link先を確認	Saumya Gandhi, Ritu Gala, Jonah Kornberg, Advaith Sridhar	(参考訳) 2022年11月にGeoffrey Hintonによって提案されたフォワードフォワードアルゴリズムは、バックプロパゲーションの代わりにニューラルネットワークをトレーニングするための新しい方法である。本プロジェクトでは,mnistデータセットにおける hinton の実験を再現し,その手法の範囲を2つの重要な貢献で拡張する。まず,imdb movie reviewsデータセット上で,フォワードフォワードネットワークのベースライン性能を確立する。私たちが知る限り、この感情分析タスクの結果は、コンピュータビジョンを超えたアルゴリズムの拡張の最初の例である。第二に、損失閾値に対する新しいピラミッド最適化戦略、すなわちフォワードフォワード法に特有のハイパーパラメータを導入する。我々のピラミッド的アプローチは、良好なしきい値戦略がテストエラーの最大8%の差を引き起こすことを示している。最後に,訓練パラメータの可視化を行い,特に大きな (10-20x) 平均や前方ネットワークによって獲得された重みのばらつきなど,いくつかの重要な洞察を得た。 The Forward Forward algorithm, proposed by Geoffrey Hinton in November 2022, is a novel method for training neural networks as an alternative to backpropagation. In this project, we replicate Hinton's experiments on the MNIST dataset, and subsequently extend the scope of the method with two significant contributions. First, we establish a baseline performance for the Forward Forward network on the IMDb movie reviews dataset. As far as we know, our results on this sentiment analysis task mark the first instance of the algorithm's extension beyond computer vision. Second, we introduce a novel pyramidal optimization strategy for the loss threshold - a hyperparameter specific to the Forward Forward method. Our pyramidal approach shows that a good thresholding strategy causes a difference of up to 8% in test error. Lastly, we perform visualizations of the trained parameters and derive several significant insights, such as a notably larger (10-20x) mean and variance in the weights acquired by the Forward Forward network.	翻訳日:2023-07-11 14:48:24 公開日:2023-07-09
# 軌道アライメント:分岐理論による安定性現象の端の理解 Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory ( http://arxiv.org/abs/2307.04204v1 ) ライセンス: Link先を確認	Minhak Song, Chulhee Yun	(参考訳) cohen et al. (2021) は勾配降下(gd)軌道に沿って損失ヘッセンの最大の固有値の進化を実証的に研究し、安定性のエッジ(英語版)(eos)と呼ばれる現象を観測した。トレーニングの初期段階(プログレッシブ・シャープニング(progressive sharpening)と呼ばれる)でシャープ性が向上し、最終的に$2 / \text{(step size)}$のしきい値近くで飽和する。本稿では、EoS現象が起こると(適切な再パラメータ化の後)異なるGD軌道が初期化とは無関係に特定の分岐図に整列することを示す経験的研究から始める。次に、この軌道アライメント現象を2層完全連結線形ネットワークと1つのデータポイントで訓練された1つの非線形ネットワークに対して厳密に証明する。トラジェクトリアライメント分析により,最近の文献の知見を包含し,拡張する進行的シャープニングとEoS現象が確立される。 Cohen et al. (2021) empirically study the evolution of the largest eigenvalue of the loss Hessian, also known as sharpness, along the gradient descent (GD) trajectory and observe a phenomenon called the Edge of Stability (EoS). The sharpness increases at the early phase of training (referred to as progressive sharpening), and eventually saturates close to the threshold of $2 / \text{(step size)}$. In this paper, we start by demonstrating through empirical studies that when the EoS phenomenon occurs, different GD trajectories (after a proper reparameterization) align on a specific bifurcation diagram independent of initialization. We then rigorously prove this trajectory alignment phenomenon for a two-layer fully-connected linear network and a single-neuron nonlinear network trained with a single data point. Our trajectory alignment analysis establishes both progressive sharpening and EoS phenomena, encompassing and extending recent findings in the literature.	翻訳日:2023-07-11 14:48:09 公開日:2023-07-09
# イノベーションのエコシステムを育む - ステークホルダ,ツール,人々の相乗効果 Thriving Innovation Ecosystems: Synergy Among Stakeholders, Tools, and People ( http://arxiv.org/abs/2307.04263v1 ) ライセンス: Link先を確認	Shruti Misra, Denise Wilson	(参考訳) イノベーションエコシステムは、様々な利害関係者が交流して複雑な社会技術的課題を解決する、マルチステークホルダー環境である。我々は、ステークホルダーがデジタルツール、人的資源、それらの組み合わせを使って情報を集め、イノベーションエコシステムで意思決定する方法について検討した。利害関係者のモチベーション,情報ニーズ,実践を包括的に理解するため,インタラクティブなデジタルダッシュボードを用いて5つの利害関係者グループ(N=13)を対象に,三部インタビュー調査を行った。利害関係者は主に、彼らの貢献の潜在的な社会的影響によって、イノベーションエコシステムに参加する動機があることに気付きました。また、ステークホルダーはデジタルツールを使って「ハイレベル」な情報を探し出し、初期意思決定の努力を足場としたが、最終的な決定は人間のネットワークが提供するコンテキスト情報に依存していた。したがって、デジタルツールではなく、人々はこれらのエコシステムにおける重要な情報源であるように見える。我々は,技術がステークホルダーの意思決定努力をいかに強化し,堅牢で公平なイノベーションエコシステムを実現するかを検討した。 An innovation ecosystem is a multi-stakeholder environment, where different stakeholders interact to solve complex socio-technical challenges. We explored how stakeholders use digital tools, human resources, and their combination to gather information and make decisions in innovation ecosystems. To comprehensively understand stakeholders' motivations, information needs and practices, we conducted a three-part interview study across five stakeholder groups (N=13) using an interactive digital dashboard. We found that stakeholders were primarily motivated to participate in innovation ecosystems by the potential social impact of their contributions. We also found that stakeholders used digital tools to seek "high-level" information to scaffold initial decision-making efforts but ultimately relied on contextual information provided by human networks to enact final decisions. Therefore, people, not digital tools, appear to be the key source of information in these ecosystems. Guided by our findings, we explored how technology might nevertheless enhance stakeholders' decision-making efforts and enable robust and equitable innovation ecosystems.	翻訳日:2023-07-11 14:41:31 公開日:2023-07-09
# ビームスプリッタアレイ上の量子ランダムウォーク Quantum random walks on a beam splitter array ( http://arxiv.org/abs/2307.04262v1 ) ライセンス: Link先を確認	Mario Ivan Estrada Delgado and Zurika Iveth Blanco Garcia	(参考訳) ビームスプリッタアレイの一般的な行列表現を示す。各ビームスプリッターは送信/反射係数を持ち、それぞれの装置の動作を決定し、その結果、システム全体の応答を決定する。各ビームスプリッターの一般的な行列表現は、$2n$次元空間の回転として与えられる。これらの演算子により、配列全体を記述し、その結果、入力光子状態の最終確率分布を計算することができる。 The general matrix representation of a beam splitter array is presented. Each beam splitter has a transmission/reflection coefficient that determines the behavior of these individual devices and, in consequence, the whole system response. The general matrix representation of each beam splitter is given as rotations of a $2n$-dimensional space. With these operators, the matrix that describes the entire array and, consequently, the final probability distribution of an input photon state can be calculated.	翻訳日:2023-07-11 14:41:11 公開日:2023-07-09
# 量子確率過程からの古典性 Classicality from Quantum Stochastic Processes ( http://arxiv.org/abs/2307.04258v1 ) ライセンス: Link先を確認	Esteban Martínez-Vargas	(参考訳) 我々は量子系から古典性の理論を構築する。この理論は古典的および量子的定常確率過程の研究に由来する。確率過程は、多面体錐(古典)および半定値表現可能な錐(量子)によって特徴づけられる。以前の結果 [2209.06806v1] に基づいて、量子チャネルの固定点の研究を拡張する。我々は、コアと、反復を重ねると減衰する部分とに分離する量子チャネルを特徴付ける半定値計画問題を与える。一般に、解はそれが定義されている空間において非分離である。分離可能な場合について、固定点の観点からチャネルの特徴付けを示す。そして、多面体錐の量子シミュレーションを構築することができる。 We develop a theory of classicality from quantum systems. This theory stems from the study of classical and quantum stationary stochastic processes. The stochastic processes are characterized by polyhedral (classical) and semidefinite representative (quantum) cones. Based on a previous result [2209.06806v1], we expand the study of fixed points from quantum channels. We give a semidefinite program that characterizes a quantum channel separating into a core and a part that decays with many iterations. In general, the solution is non-separable in the space it is defined. We present a characterization of channels in terms of their fixed points for the separable case. A quantum simulation of a polyhedral cone can then be constructed.	翻訳日:2023-07-11 14:41:04 公開日:2023-07-09
# 古典領域と量子領域における学習と制御の枠組み Framework for Learning and Control in the Classical and Quantum Domains ( http://arxiv.org/abs/2307.04256v1 ) ライセンス: Link先を確認	Seyed Shakib Vedaie, Archismita Dalal, Eduardo J. Páez, Barry C. Sanders	(参考訳) 制御と学習は古典的領域と量子的領域の両方において技術進歩の鍵であるが、両者の相互関係、特に制御と学習の古典的定義と量子的定義の間の関係は、文献において十分に明確ではない。我々は,古典的および量子的に,学習と制御を形式的に関連付ける枠組みを構築し,学習が制御にどのように役立つかを示す。さらに,本フレームワークは,古典的および量子的制御と学習のネクサスにおける興味深い未解決問題を識別し,問題解決ツールの選択を支援する。利用例として,よく研究されている適応型量子エンハンス型干渉位相推定の問題を,実現可能な制御方針を考案するための教師あり学習問題として定式化した。これらの分野の統合は、知識の状態を図式的に表現することに依存しており、これにより既存の知識がエレガントに要約され、知識ギャップが明らかになる。 Control and learning are key to technological advancement, both in the classical and quantum domains, yet their interrelationship is insufficiently clear in the literature, especially between classical and quantum definitions of control and learning. We construct a framework that formally relates learning and control, both classical and quantum, to each other, with this formalism showing how learning can aid control. Furthermore, our framework helps to identify interesting unsolved problems in the nexus of classical and quantum control and learning and helps in choosing tools to solve problems. As a use case, we cast the well-studied problem of adaptive quantum-enhanced interferometric-phase estimation as a supervised learning problem for devising feasible control policies. Our unification of these fields relies on diagrammatically representing the state of knowledge, which elegantly summarizes existing knowledge and exposes knowledge gaps.	翻訳日:2023-07-11 14:40:56 公開日:2023-07-09
# 量子機構としての相対論的時間拡張 Relativistic time dilation as a quantum mechanism ( http://arxiv.org/abs/2307.04254v1 ) ライセンス: Link先を確認	Esteban Martínez-Vargas	(参考訳) 量子システムを用いた時間拡張のメカニズムを提案する。我々は、異なる参照フレームからの量子状態の変化に敏感な作用素の族を導入する。参照フレーム間の変化はガリレオ変換によって行われるので、この場合の拡張の源は可観測量に由来する。これらの可観測量は時間とともに線形に成長し、状態の参照フレームによってその線形成長の傾きが変化するので、同じ点まで成長するのにより長い時間がかかる。このようなメカニズムは、時空に対する通常の理解とは異なる見方を意味する。 We propose a mechanism for time dilation using quantum systems. We introduce a family of operators that are sensitive to the changes of quantum states from different frames of reference. The change between reference frames is done via a Galilean transformation; therefore, the source of the dilation in our case comes from the observable. These observables grow linearly in time, and depending on the reference frame of the state the linear growth changes its slope; therefore, it takes longer to grow to the same point. Such a mechanism implies a different view from the usual understanding of spacetime.	翻訳日:2023-07-11 14:40:39 公開日:2023-07-09
# 生成AIと大規模言語モデルの時代におけるChatGPT:簡潔な調査 ChatGPT in the Age of Generative AI and Large Language Models: A Concise Survey ( http://arxiv.org/abs/2307.04251v1 ) ライセンス: Link先を確認	Salman Mohamadi, Ghulam Mujtaba, Ngan Le, Gianfranco Doretto, Donald A. Adjeroh	(参考訳) ChatGPTはOpenAIが開発した大規模言語モデル(LLM)で、大量のデータに対して慎重にトレーニングされている。自然言語処理(NLP)の分野に革命をもたらし、LLMの機能の境界を押し広げた。 ChatGPTは、一般の人々が生成的人工知能(GAI)と大規模に対話できるようにする上で重要な役割を担っている。また、同様の技術を開発し、その応用や含意を調査する研究への関心も呼び起こした。本稿では、ChatGPTとその進化に関する現在の研究ラインについて、簡潔な調査を行うことを主な目標とする。 ChatGPTのグラスボックスとブラックボックスの両方の観点を検討し、技術の構成要素と基礎的要素、およびその応用、影響、含意を取り上げた。グラスボックスのアプローチは技術の内部の動作を理解することに集中しており、ブラックボックスのアプローチは技術を複雑なシステムとして受け入れ、その入力、出力、効果を調べる。これは、この技術の包括的な探求の道を開き、さらなる研究と実験のためのロードマップを提供する。また, LLM と GAI 全般に関する基本文献と、それらの ChatGPT との関係についても概説した。この概要は、LLMの新興分野における既存および欠落の研究ラインに光を当て、一般ユーザと開発者の両方に利益をもたらす。さらに, 教育, 研究, 医療, 金融などの分野における幅広い応用と重要な懸念事項について検討した。 ChatGPT is a large language model (LLM) created by OpenAI that has been carefully trained on a large amount of data. It has revolutionized the field of natural language processing (NLP) and has pushed the boundaries of LLM capabilities. ChatGPT has played a pivotal role in enabling widespread public interaction with generative artificial intelligence (GAI) on a large scale. It has also sparked research interest in developing similar technologies and investigating their applications and implications. In this paper, our primary goal is to provide a concise survey on the current lines of research on ChatGPT and its evolution. We considered both the glass box and black box views of ChatGPT, encompassing the components and foundational elements of the technology, as well as its applications, impacts, and implications. The glass box approach focuses on understanding the inner workings of the technology, and the black box approach embraces it as a complex system, and thus examines its inputs, outputs, and effects. 
This paves the way for a comprehensive exploration of the technology and provides a road map for further research and experimentation. We also lay out essential foundational literature on LLMs and GAI in general and their connection with ChatGPT. This overview sheds light on existing and missing research lines in the emerging field of LLMs, benefiting both public users and developers. Furthermore, the paper delves into the broad spectrum of applications and significant concerns in fields such as education, research, healthcare, finance, etc.	翻訳日:2023-07-11 14:40:30 公開日:2023-07-09
# 室内シーンの凸分解 Convex Decomposition of Indoor Scenes ( http://arxiv.org/abs/2307.04246v1 ) ライセンス: Link先を確認	Vaibhav Vavilala and David Forsyth	(参考訳) 本稿では,複雑な室内シーンをプリミティブに解析する方法について述べる。プリミティブは単純な凸です。提案手法は,RGBD入力からシーンを一定数の凸に解析するために学習された回帰手法を用いており,任意のセグメンテーションを受け入れて分解を改善することができる。その結果は下降法で研磨され、凸を調整して非常によくフィットし、強欲に余分な原始物を取り除く。シーン全体が解析されるので、従来の深さ、正規度、セグメンテーションエラーメトリクスを使って評価できる。評価手法により, プリミティブ表現からの誤差は, 一つの画像から深度を予測する誤差に匹敵することを示した。 We describe a method to parse a complex, cluttered indoor scene into primitives which offer a parsimonious abstraction of scene structure. Our primitives are simple convexes. Our method uses a learned regression procedure to parse a scene into a fixed number of convexes from RGBD input, and can optionally accept segmentations to improve the decomposition. The result is then polished with a descent method which adjusts the convexes to produce a very good fit, and greedily removes superfluous primitives. Because the entire scene is parsed, we can evaluate using traditional depth, normal, and segmentation error metrics. Our evaluation procedure demonstrates that the error from our primitive representation is comparable to that of predicting depth from a single image.	翻訳日:2023-07-11 14:40:05 公開日:2023-07-09
# 自然言語処理を用いた後処理による光文字認識のための新しいパイプライン A Novel Pipeline for Improving Optical Character Recognition through Post-processing Using Natural Language Processing ( http://arxiv.org/abs/2307.04245v1 ) ライセンス: Link先を確認	Aishik Rakshit, Samyak Mehta, Anirban Dasgupta	(参考訳) 光文字認識(OCR)技術は、書籍や非構造化文書のデジタル化や、モビリティ統計、法執行機関、交通、セキュリティシステムなど他の分野の応用に応用している。最先端のメソッドは、ライセンスプレートやショップ名などに印刷されたテキストでOCRとうまく動作します。しかし、印刷教科書や手書きテキストなどのアプリケーションは、既存の技術では精度が限られている。その理由は、類似した文字や手書き文字のバリエーションによる可能性がある。これらの課題はOCR技術にのみ対処することが困難であるため,自然言語処理(NLP)ツールを用いた後処理手法を提案する。この研究は、手書きまたは印刷されたテキストに対して最初にOCRを実行するエンドツーエンドパイプラインを示し、NLPを使用してその精度を向上させる。 Optical Character Recognition (OCR) technology finds applications in digitizing books and unstructured documents, along with applications in other domains such as mobility statistics, law enforcement, traffic, security systems, etc. The state-of-the-art methods work well with the OCR with printed text on license plates, shop names, etc. However, applications such as printed textbooks and handwritten texts have limited accuracy with existing techniques. The reason may be attributed to similar-looking characters and variations in handwritten characters. Since these issues are challenging to address with OCR technologies exclusively, we propose a post-processing approach using Natural Language Processing (NLP) tools. This work presents an end-to-end pipeline that first performs OCR on the handwritten or printed text and then improves its accuracy using NLP.	翻訳日:2023-07-11 14:39:51 公開日:2023-07-09
# 強結合状態における温度測定用マルチスピンプローブ Multi-spin probes for thermometry in the strong-coupling regime ( http://arxiv.org/abs/2307.04232v1 ) ライセンス: Link先を確認	Marlon Brenes and Dvira Segal	(参考訳) 温度$T$で調製した試料に結合した$N$個のスピンからなる温度測定プローブの感度について検討した。我々の分析は弱結合極限を超えて、試料-プローブ間の強結合領域にまで及んでいる。特に、各スピン間の試料誘起相互作用は強結合効果によって生成され、プローブを構成する各要素間で微調整されたものではない。反応座標マッピングを用いて、有限結合におけるプローブの非正準平衡状態を評価することにより、平衡状態そのものから量子フィッシャー情報を介して温度測定感度を計算する。単スピンプローブ$(N = 1)$の場合、温度感度は弱結合から中間結合の領域で低下するが、結合の増大に伴い、低温領域においてプローブのより高い感度が観察される。さらに、$N > 1$ である限り、試料-プローブ相互作用エネルギーの最適値が存在し、特に低温領域では、弱結合における熱ギブス状態から得られる最大精度と比較して、温度測定感度を高めることができる。最後に、この感度の増大は準最適な測定からも観察できることを示した。 We study the sensitivity of thermometric probes that are composed of $N$ spins coupled to a sample prepared at temperature $T$. Our analysis extends beyond the weak-coupling limit into the strong sample-probe coupling regime. In particular, sample-induced interactions between each of the spins are generated via strong coupling effects and are not fine-tuned amongst each body composing the probe. By employing the reaction-coordinate mapping to evaluate the non-canonical equilibrium state of the probe at finite coupling, we compute the thermometric sensitivity via the quantum Fisher information through the equilibrium state itself. We find that for single-spin probes $(N = 1)$, temperature sensitivity decreases in the regime of weak-to-intermediate coupling strength; however, as the coupling increases we observe much higher sensitivity of the probe in the low-temperature regime. Furthermore, as long as $N > 1$, there exist optimal values of the sample-probe interaction energy that allow one to attain enhanced thermometric sensitivity when compared to the maximum achieved precision obtained from thermal Gibbs states at weak coupling, particularly in the regime of low temperature. Finally, we show that this enhanced sensitivity may be observed from suboptimal measurements.	翻訳日:2023-07-11 14:39:37 公開日:2023-07-09
# mx2m:3次元意味セグメンテーションのための領域適応におけるマスク型クロスモダリティモデリング Mx2M: Masked Cross-Modality Modeling in Domain Adaptation for 3D Semantic Segmentation ( http://arxiv.org/abs/2307.04231v1 ) ライセンス: Link先を確認	Boxiang Zhang, Zunran Wang, Yonggen Ling, Yuanyuan Guan, Shenghao Zhang, Wenhui Li	(参考訳) 3次元セマンティックセグメンテーションのための既存のクロスモーダル領域適応法は、クロスモーダル特徴マッチングによって得られる2D-3D相補性によってのみ結果を予測する。しかし、対象ドメインの監督が欠如しているため、相補性は常に信頼できるとは限らない。ドメインギャップが大きい場合、結果は理想的ではありません。監視の欠如を解決するため,マスクドモデリングを課題に導入し,マスクド・クロスモダリティ・モデリングを用いて大きなドメインギャップを低減する手法Mx2Mを提案する。私たちのMx2Mには2つのコンポーネントがあります。ひとつは、Mx2Mを様々なシナリオに適応させ、クロスモーダルな自己スーパービジョンを提供する、クロスモーダルな除去と予測(xMRP)である。もう1つはクロスモーダルな特徴マッチングの新しい方法である動的クロスモーダルフィルタ(DxMF)で、メソッド全体がより適切な2D-3D相補性を動的に使用できるようにする。 DAシナリオにおけるMx2Mの評価には、Day/Night、USA/Singapore、A2D2/SemanticKITTIなどがある。 Existing methods of cross-modal domain adaptation for 3D semantic segmentation predict results only via 2D-3D complementarity that is obtained by cross-modal feature matching. However, as lacking supervision in the target domain, the complementarity is not always reliable. The results are not ideal when the domain gap is large. To solve the problem of lacking supervision, we introduce masked modeling into this task and propose a method Mx2M, which utilizes masked cross-modality modeling to reduce the large domain gap. Our Mx2M contains two components. One is the core solution, cross-modal removal and prediction (xMRP), which makes the Mx2M adapt to various scenarios and provides cross-modal self-supervision. The other is a new way of cross-modal feature matching, the dynamic cross-modal filter (DxMF) that ensures the whole method dynamically uses more suitable 2D-3D complementarity. Evaluation of the Mx2M on three DA scenarios, including Day/Night, USA/Singapore, and A2D2/SemanticKITTI, brings large improvements over previous methods on many metrics.	翻訳日:2023-07-11 14:39:19 公開日:2023-07-09
# 窒素イオンLasingにおける軌道角運動量(OAM)による光パルス増幅 Amplification of light pulses with orbital angular momentum (OAM) in nitrogen ions lasing ( http://arxiv.org/abs/2307.04282v1 ) ライセンス: Link先を確認	Haicheng Mei, Jingsong Gao, Kailu Wang, Jiahao Dong, Qihuang Gong, Chengyin Wu, Yunquan Liu, Hongbing Jiang, and Yi Liu	(参考訳) 強いフェムト秒レーザーパルスで励起された窒素イオンは、紫外域の光増幅を引き起こす。ここでは,軌道角運動量(OAM)を有するシード光パルスが,ガウスフェムト秒レーザーパルスによって励起される窒素プラズマにおいて顕著に増幅できることを実証した。トポロジカル電荷 +1 と -1 では、シード光パルスの2桁のエネルギー増幅が観測され、増幅パルスは入射シードパルスと同じOAMを担っている。さらに,プラズマ増幅器とOAMシードビームとの空間的な位置ずれは,ドーナツ形状の強度分布を示すOAMシードパルスの特別な空間的プロファイルにより,OAMを持たないガウスモードの増幅放射を生じさせることを示した。この位置ずれを利用して、ガウスモードとOAMモードの間で出力信号をトグルする光スイッチを実装できる。この研究は、シード光から増幅信号への位相移動を実証するだけでなく、OAMビーム増幅の達成のために、ドーナツ形状のシードビームと窒素プラズマのゲイン領域との空間的重なりが重要であることも強調している。 Nitrogen ions pumped by intense femtosecond laser pulses give rise to optical amplification in the ultraviolet range. Here, we demonstrated that a seed light pulse carrying orbital angular momentum (OAM) can be significantly amplified in nitrogen plasma excited by a Gaussian femtosecond laser pulse. With the topological charge of +1 and -1, we observed an energy amplification of the seed light pulse by two orders of magnitude, while the amplified pulse carries the same OAM as the incident seed pulse. Moreover, we show that a spatial misalignment of the plasma amplifier with the OAM seed beam leads to an amplified emission of Gaussian mode without OAM, due to the special spatial profile of the OAM seed pulse that presents a donut-shaped intensity distribution. Utilizing this misalignment, we can implement an optical switch that toggles the output signal between Gaussian mode and OAM mode. This work not only certifies the phase transfer from the seed light to the amplified signal, but also highlights the important role of spatial overlap of the donut-shaped seed beam with the gain region of the nitrogen plasma for the achievement of OAM beam amplification.	翻訳日:2023-07-11 14:29:27 公開日:2023-07-09
# 解説文における自動エッセイスコーリング:DeBERTeaching Assistant Automated Essay Scoring in Argumentative Writing: DeBERTeachingAssistant ( http://arxiv.org/abs/2307.04276v1 ) ライセンス: Link先を確認	Yann Hicke, Tonghua Tian, Karan Jha, Choong Hee Kim	(参考訳) 自動評価は50年以上にわたって研究・産業問題として研究されてきた。世界中の教育者にとって貴重な時間節約ツールを創出できる研究分野としての教育的価値が明白であることから、NLPコミュニティから多くの注目を集めている。しかし、これらのツールは一般的に良い文法の検出、スペルミス、組織品質にフォーカスしているが、最終的な評価には説得力のある特徴を組み込むのに失敗する傾向がある。議論の強さを改善するために生徒に行動可能なフィードバックを与える責任は、教師の肩にのみ残される。そこで本研究では,その説得力の質を議論的に記述する談話要素に注釈を付けることで,上述の正確性を達成するトランスフォーマーアーキテクチャを提案するとともに,提案モデルの説明可能性を調査する今後の課題についても拡張し,教師のアドバイスと機械のアドバイスとのパートナーシップを可能にする。 Automated Essay scoring has been explored as a research and industry problem for over 50 years. It has drawn a lot of attention from the NLP community because of its clear educational value as a research area that can engender the creation of valuable time-saving tools for educators around the world. Yet, these tools are generally focused on detecting good grammar, spelling mistakes, and organization quality but tend to fail at incorporating persuasiveness features in their final assessment. The responsibility to give actionable feedback to the student to improve the strength of their arguments is left solely on the teacher's shoulders. In this work, we present a transformer-based architecture capable of achieving above-human accuracy in annotating argumentative writing discourse elements for their persuasiveness quality and we expand on planned future work investigating the explainability of our model so that actionable feedback can be offered to the student and thus potentially enable a partnership between the teacher's advice and the machine's advice.	翻訳日:2023-07-11 14:29:06 公開日:2023-07-09
# 教師の正確な反応生成における大規模言語モデルの有効性評価 Assessing the efficacy of large language models in generating accurate teacher responses ( http://arxiv.org/abs/2307.04274v1 ) ライセンス: Link先を確認	Yann Hicke, Abhishek Masand, Wentao Guo, Tushaar Gangavarapu	(参考訳) (Tack et al., 2023)は、教育対話における教師語の生成に関する教育アプリケーション構築のためのNLPの革新的利用に関する第18回ワークショップの主催する共有タスクを組織した。本研究は,共用課題の構造に従って,学生に情報的かつ有益な洞察を提供することによって,大規模言語モデルの生成能力を評価し,知識のある教師の役割をシミュレートする。そこで本研究では,GPT-4 (few-shot, in-context learning), fine-tuned GPT-2, fine-tuned DialoGPTなどのベンチマーク生成モデルの広範な評価を行う。さらに,教育的品質を最適化するために,強化学習を用いたflan-t5モデルの微調整を行った。教師-学生チャットルームコーパスのサブセットについて,BERTScore と DialogRPT を用いて測定し,他の微調整モデルに対する GPT-4 の有効性を示した。我々は、サンプリング、代表性、ダイアログ完全性など、いくつかのデータセット特性が微調整に重大な課題をもたらし、微調整モデルの一般化性に悪影響を及ぼすと仮定する。最後に,これらの生成モデルに対して,対話コヒーレンスやマッチング言語モデル分布だけでなく,教育的スキルを提示するモデルの能力にも依存するメトリクスを用いた評価の必要性を指摘する。 (Tack et al., 2023) organized the shared task hosted by the 18th Workshop on Innovative Use of NLP for Building Educational Applications on generation of teacher language in educational dialogues. Following the structure of the shared task, in this study, we attempt to assess the generative abilities of large language models in providing informative and helpful insights to students, thereby simulating the role of a knowledgeable teacher. To this end, we present an extensive evaluation of several benchmarking generative models, including GPT-4 (few-shot, in-context learning), fine-tuned GPT-2, and fine-tuned DialoGPT. Additionally, to optimize for pedagogical quality, we fine-tuned the Flan-T5 model using reinforcement learning. Our experimental findings on the Teacher-Student Chatroom Corpus subset indicate the efficacy of GPT-4 over other fine-tuned models, measured using BERTScore and DialogRPT. We hypothesize that several dataset characteristics, including sampling, representativeness, and dialog completeness, pose significant challenges to fine-tuning, thus contributing to the poor generalizability of the fine-tuned models. 
Finally, we note the need for these generative models to be evaluated with a metric that relies not only on dialog coherence and matched language modeling distribution but also on the model's ability to showcase pedagogical skills.	翻訳日:2023-07-11 14:28:49 公開日:2023-07-09
# 局所ブラウン回路におけるサンプリングと誤差補正の相転移 Phase transitions in sampling and error correction in local Brownian circuits ( http://arxiv.org/abs/2307.04267v1 ) ライセンス: Link先を確認	Subhayan Sahu, Shao-Kai Jian	(参考訳) 局所ブラウン回路における反集中と近似ユニタリ設計挙動の出現について検討した。出力状態の確率分布とエントロピーの回路平均モーメントのダイナミクスは、レプリカ空間における有効局所ハミルトニアンを用いた虚時間発展として表現することができる。これにより、テンソルネットワークツールを用いて、そのような回路平均量の$1+1d$のダイナミクスを大規模に数値シミュレーションし、ブラウン回路の様々な領域を異なる熱力学相として同定することができる。特に、反集中の出現を、$\log N$の時間スケールにおける衝突確率の急激な転移として同定する。ここで$N$は量子ビットの数である。また,特定の古典近似アルゴリズムが同じ時間スケールで計算困難性の転移を持つことを示す。ノイズの存在下では、ノイズレートを$1/N$としてスケールダウンした場合、線形クロスエントロピーベンチマークにノイズ誘起の1次相転移が存在することを示す。より長い時間では、ブラウン回路は$O(N)$時間でユニタリ2-設計を近似する。このような回路による量子誤差補正の実現可能性を直接調査し、$O(N)$の時間スケールで1次転移を同定する。これら全ての相転移のスケーリング挙動は、大規模数値計算から得られ、有効レプリカハミルトニアンのスペクトルを解析することによって裏付けられる。 We study the emergence of anticoncentration and approximate unitary design behavior in local Brownian circuits. The dynamics of circuit averaged moments of the probability distribution and entropies of the output state can be represented as imaginary time evolution with an effective local Hamiltonian in the replica space. This facilitates large scale numerical simulation of the dynamics in $1+1d$ of such circuit-averaged quantities using tensor network tools, as well as identifying the various regimes of the Brownian circuit as distinct thermodynamic phases. In particular, we identify the emergence of anticoncentration as a sharp transition in the collision probability at $\log N$ timescale, where $N$ is the number of qubits. We also show that a specific classical approximation algorithm has a computational hardness transition at the same timescale. In the presence of noise, we show there is a noise-induced first order phase transition in the linear cross entropy benchmark when the noise rate is scaled down as $1/N$. At longer times, the Brownian circuits approximate a unitary 2-design in $O(N)$ time. 
We directly probe the feasibility of quantum error correction by such circuits, and identify a first order transition at $O(N)$ timescales. The scaling behaviors for all these phase transitions are obtained from the large scale numerics, and corroborated by analyzing the spectrum of the effective replica Hamiltonian.	翻訳日:2023-07-11 14:28:28 公開日:2023-07-09

# SeePrivacy: モバイルアプリケーションのためのコンテキストプライバシポリシの自動生成

SeePrivacy: Automated Contextual Privacy Policy Generation for Mobile Applications ( http://arxiv.org/abs/2307.01691v3 )

ライセンス: Link先を確認

Shidong Pan, Zhen Tao, Thong Hoang, Dawen Zhang, Zhenchang Xing, Xiwei Xu, Mark Staples, and David Lo

(参考訳) プライバシーポリシーは個人のプライバシーとデジタルセキュリティを守るための最も重要なアプローチとなっている。プレゼンテーションと可読性を高めるために、研究者はコンテキストプライバシポリシ(CPPs)の概念を提案し、ポリシーを短いスニペットに断片化し、対応するコンテキストでのみ表示することを目指している。本稿では,モバイルアプリのコンテキストプライバシポリシを自動的に生成するように設計された,新たなマルチモーダルフレームワークSeePrivacyを提案する。本手法は,モバイルGUIの理解とプライバシーポリシー文書分析を相乗的に組み合わせ,プライバシー関連コンテキスト検出において全体で83.6%のカバー率を,対応するポリシーセグメントの抽出において0.92の精度を達成した。驚くべきことに、検索されたポリシーセグメントの96%は、そのコンテキストと正しくマッチさせることができる。ユーザ調査では、SeePrivacyは優れた機能性とユーザビリティ(4.5/5)を示した。具体的には、参加者はオリジナルのプライバシーポリシー(2/5)と比較してCPPs(4.1/5)を読む意欲が強い。本ソリューションは、ユーザのプライバシー通知の理解を効果的に支援し、この研究は、さらなる進歩と探索のための確かな基盤を確立する。

Privacy policies have become the most critical approach to safeguarding individuals' privacy and digital security. To enhance their presentation and readability, researchers propose the concept of contextual privacy policies (CPPs), aiming to fragment policies into shorter snippets and display them only in corresponding contexts. In this paper, we propose a novel multi-modal framework, namely SeePrivacy, designed to automatically generate contextual privacy policies for mobile apps. Our method synergistically combines mobile GUI understanding and privacy policy document analysis, yielding an impressive overall 83.6% coverage rate for privacy-related context detection and an accuracy of 0.92 in extracting corresponding policy segments. Remarkably, 96% of the retrieved policy segments can be correctly matched with their contexts. The user study shows SeePrivacy demonstrates excellent functionality and usability (4.5/5). Specifically, participants exhibit a greater willingness to read CPPs (4.1/5) compared to original privacy policies (2/5). Our solution effectively assists users in comprehending privacy notices, and this research establishes a solid foundation for further advancements and exploration.

翻訳日:2023-10-23 18:27:04 公開日:2023-07-09

# 機械学習ライブラリの自動静的バグ検出:実用段階に達しているか?

Automatic Static Bug Detection for Machine Learning Libraries: Are We There Yet? ( http://arxiv.org/abs/2307.04080v1 )

ライセンス: Link先を確認

Nima Shiri harzevili, Jiho Shin, Junjie Wang, Song Wang, Nachiappan Nagappan

(参考訳) ソフトウェアバグの自動検出は、ソフトウェアセキュリティにおいて重要なタスクである。バグ検出に役立つ多くの静的ツールが提案されている。しかし、これらの静的バグ検出ツールは主に一般的なソフトウェアプロジェクトで評価されており、機械学習ライブラリに対する実用性と有用性には疑問が残る。本稿では、Mlpack、MXNet、PyTorch、TensorFlowの4つのポピュラーな機械学習ライブラリから収集された、合計410の既知のバグからなるキュレートされたデータセットを用いて、Flawfinder、RATS、Cppcheck、Facebook Infer、Clang static analyzerという5つの広く使われている静的バグ検出ツールを分析することで、この問いに答える。私たちの研究は、これらのツールの能力を分類し、機械学習ライブラリ内のソフトウェアバグを検出するツールの強みと弱みをよりよく理解できるようにする。全体として、静的バグ検出ツールが発見できるのは全バグのうちごくわずか(6/410のバグ、0.01%)であり、FlawfinderとRATSが機械学習ライブラリのソフトウェアバグを見つける上で最も効果的な静的チェッカーであることを示す。観察結果に基づいて,ツールをより効果的かつ実用的なものにするための機会を更に特定し,議論する。

Automatic detection of software bugs is a critical task in software security. Many static tools that can help detect bugs have been proposed. However, these static bug detectors are mainly evaluated on general software projects, which calls into question their practical effectiveness and usefulness for machine learning libraries. In this paper, we address this question by analyzing five popular and widely used static bug detectors, i.e., Flawfinder, RATS, Cppcheck, Facebook Infer, and Clang static analyzer on a curated dataset of software bugs gathered from four popular machine learning libraries including Mlpack, MXNet, PyTorch, and TensorFlow with a total of 410 known bugs. Our research provides a categorization of these tools' capabilities to better understand the strengths and weaknesses of the tools for detecting software bugs in machine learning libraries. Overall, our study shows that static bug detectors find a negligible amount of all bugs, accounting for 6/410 bugs (0.01%), and that Flawfinder and RATS are the most effective static checkers for finding software bugs in machine learning libraries. Based on our observations, we further identify and discuss opportunities to make the tools more effective and practical.

翻訳日:2023-10-23 18:06:00 公開日:2023-07-09

# 要件トレーサビリティ: オブジェクト指向ソフトウェアシステムの要件とソースコード間のトレーサビリティリンクの回復と可視化

Requirements Traceability: Recovering and Visualizing Traceability Links Between Requirements and Source Code of Object-oriented Software Systems ( http://arxiv.org/abs/2307.05188v1 )

ライセンス: Link先を確認

Ra'Fat Al-Msie'deen

(参考訳) 要求トレーサビリティは、要求工学において効果的な要求管理手法に到達するための重要な活動である。要件・コード間トレーサビリティリンク(RtC-TLs)は、要件とソースコードアーチファクトの関係を形作る。 RtC-TLsは、ソフトウェアコードのどの部分が特定の要件を実装するかを知るのに役立つ。さらに、これらのリンクはエンジニアがソフトウェアの正しいメンタルモデルを維持するのを手助けし、主に大規模で複雑なソフトウェアにおいて、要求が時間とともに変化する際にコード品質が低下するリスクを減らすことができる。しかし、これらのTLを手動で回復し保存することは、エンジニアにさらなる負担を与え、エラーを起こしやすく、面倒で、コストのかかる作業である。本稿では,Latent Semantic Indexing (LSI) とFormal Concept Analysis (FCA) に基づき,オブジェクト指向ソフトウェアにおいてRtC-TLsを回復・可視化するための自動アプローチと実装であるYamenTraceを紹介する。 YamenTraceの独創性は、TLの回復プロセスにおいてすべてのコード識別子名、コメント、関係を活用することである。 YamenTraceはLSIを使用して、ソフトウェアコードと要件間のテキスト類似性を見つける。一方、FCAは類似のコードと要件を一緒にクラスタリングするために用いられる。さらにYamenTraceは、回復したTLを可視化する。 YamenTraceを検証するために、3つのケーススタディに適用した。この評価の結果、RtC-TLsの大部分が正しく回復され、可視化されたため、YamenTraceの提案の重要性と性能が証明された。

Requirements traceability is an important activity for achieving effective requirements management in requirements engineering. Requirement-to-Code Traceability Links (RtC-TLs) shape the relations between requirements and source code artifacts. RtC-TLs can help engineers know which parts of the software code implement a specific requirement. In addition, these links can help engineers keep a correct mental model of the software and decrease the risk of code quality degradation when requirements change over time, mainly in large and complex software. However, manually recovering and preserving these TLs puts an additional burden on engineers and is an error-prone, tedious, and costly task. This paper introduces YamenTrace, an automatic approach and implementation to recover and visualize RtC-TLs in object-oriented software based on Latent Semantic Indexing (LSI) and Formal Concept Analysis (FCA). The originality of YamenTrace is that it exploits all code identifier names, comments, and relations in the TL recovery process. YamenTrace uses LSI to find textual similarity across software code and requirements, while FCA is employed to cluster similar code and requirements together. Furthermore, YamenTrace gives a visualization of the recovered TLs. To validate YamenTrace, it was applied to three case studies. The findings of this evaluation prove the importance and performance of the YamenTrace proposal, as most of the RtC-TLs were correctly recovered and visualized.

翻訳日:2023-10-23 17:54:53 公開日:2023-07-09
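
The FCA clustering step mentioned in the abstract above can be illustrated with a small sketch. This is not YamenTrace's implementation: the function names, the artifact names (`LoginController`, `REQ-1 user login`, and so on), and the term sets are all hypothetical, and the LSI similarity step is skipped entirely. The sketch only shows how Formal Concept Analysis groups objects (here, code elements and requirements) that share the same set of attributes (here, terms).

```python
from itertools import combinations


def common_attributes(objects, context, all_attributes):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set(all_attributes)
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def objects_with(attributes, context):
    """Objects that carry every attribute in `attributes`."""
    return frozenset(o for o, attrs in context.items() if attributes <= attrs)


def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a small binary context.

    Brute force over attribute subsets: fine for toy data, exponential in general.
    """
    all_attributes = set().union(*context.values())
    concepts = set()
    for size in range(len(all_attributes) + 1):
        for subset in combinations(sorted(all_attributes), size):
            extent = objects_with(frozenset(subset), context)
            intent = common_attributes(extent, context, all_attributes)
            concepts.add((extent, intent))
    return concepts


# Hypothetical toy context: artifacts (code and requirements) mapped to their terms.
context = {
    "LoginController": {"login", "password"},
    "REQ-1 user login": {"login", "password"},
    "SessionStore": {"login"},
    "ReportPrinter": {"report", "print"},
    "REQ-2 print report": {"report", "print"},
}

concepts = formal_concepts(context)
for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), "<->", sorted(intent))
```

In this toy context, a concept whose extent contains both a code element and a requirement is a candidate traceability link; a real tool would build the context from extracted identifiers and comments rather than hand-written term sets.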

# 時空間学習のための半教師付きメタ学習

Semi Supervised Meta Learning for Spatiotemporal Learning ( http://arxiv.org/abs/2308.01916v1 )

ライセンス: Link先を確認

Faraz Waseem, Pratyush Muthukumar

(参考訳) 自己教師付きマスク付きオートエンコーダにメタラーニングを適用して時空間学習を行うという目標に、3つのステップで取り組んだ。大まかには、既存の最先端の表現学習アーキテクチャにメタラーニングを適用することの影響を理解することを目指す。そこで、メタラーニングアーキテクチャのみ、表現学習アーキテクチャのみ、表現学習をメタラーニングアーキテクチャと併用するアーキテクチャ、という3つの構成で時空間学習をテストする。メモリ拡張ニューラルネットワーク(MANN)アーキテクチャを用いて、メタラーニングをフレームワークに適用する。具体的には、まず、事前学習したMAEを適用し、小規模な時空間データセット上でビデオ再構成タスク向けに微調整する実験を行う。次に、MAEエンコーダを訓練し、アクション分類タスクのために分類ヘッドを適用する実験を行う。最後に、アクション分類タスクのために、事前学習したMAEを適用し、MANNバックボーンで微調整する実験を行う。

We approached the goal of applying meta-learning to self-supervised masked autoencoders for spatiotemporal learning in three steps. Broadly, we seek to understand the impact of applying meta-learning to existing state-of-the-art representation learning architectures. Thus, we test spatiotemporal learning through: a meta-learning architecture only, a representation learning architecture only, and an architecture applying representation learning alongside a meta learning architecture. We utilize the Memory Augmented Neural Network (MANN) architecture to apply meta-learning to our framework. Specifically, we first experiment with applying a pre-trained MAE and fine-tuning on our small-scale spatiotemporal dataset for video reconstruction tasks. Next, we experiment with training an MAE encoder and applying a classification head for action classification tasks. Finally, we experiment with applying a pre-trained MAE and fine-tune with MANN backbone for action classification tasks.

翻訳日:2023-08-14 02:05:57 公開日:2023-07-09

# 生成的閉ループ型人工知能による基礎科学の未来

The Future of Fundamental Science Led by Generative Closed-Loop Artificial Intelligence ( http://arxiv.org/abs/2307.07522v1 )

ライセンス: Link先を確認

Hector Zenil, Jesper Tegnér, Felipe S. Abrahão, Alexander Lavin, Vipin Kumar, Jeremy G. Frey, Adrian Weller, Larisa Soldatova, Alan R. Bundy, Nicholas R. Jennings, Koichi Takahashi, Lawrence Hunter, Saso Dzeroski, Andrew Briggs, Frederick D. Gregory, Carla P. Gomes, Christopher K. I. Williams, Jon Rowe, James Evans, Hiroaki Kitano, Joshua B. Tenenbaum, Ross King

(参考訳) ジェネレーティブAIやLLMなど、機械学習とAIの最近の進歩は、技術革新、製品開発、社会全体を大きく変えつつある。 AIのテクノロジへの貢献は、大規模なトレーニングデータセットへのアクセスと明確なパフォーマンス評価基準を必要とする、パターン認識や分類から生成モデルまでの複数のアプローチから得ることができる。しかしAIは、科学的な実践やモデル発見のための高品質な大規模データセットへのアクセスがより難しいこともあり、基礎科学にはあまり貢献していない。生成的AI全般、特に大規模言語モデルは、定量的モデルによる基礎的な深層科学の科学的発見を拡大し加速する機会となる可能性がある。ここでは、自己駆動の仮説生成や仮説空間のオープンエンドな自律探索を含む、科学的発見に対するAI駆動の自動化されたクローズドループアプローチの諸側面を探究し、調査する。 AIによる自動化を科学の実践に統合することは、発見の再現、データの体系的な生産、究極的には科学プロセスの民主化など、現在の問題を緩和するだろう。これらの可能性を実現するには、拡張AIのビジョンと、推定される説明の空間にわたる偏りのない探索を可能にしつつ、因果分析とモデル発見の基本的な側面に対処できる多様なAIアプローチが必要となる。これらの進歩は、人間の科学者が達成してきた以上に、世界の基本構造を探索し発見するAIの可能性を解き放つことを約束している。このようなビジョンは、現在のワークフローを自動化するのではなく、新しい基礎科学の境界を推し進め、今日の人類が直面している最大の課題に取り組むための技術革新への扉を開くだろう。

Recent advances in machine learning and AI, including Generative AI and LLMs, are disrupting technological innovation, product development, and society as a whole. AI's contribution to technology can come from multiple approaches that require access to large training data sets and clear performance evaluation criteria, ranging from pattern recognition and classification to generative models. Yet, AI has contributed less to fundamental science in part because large data sets of high-quality data for scientific practice and model discovery are more difficult to access. Generative AI, in general, and Large Language Models in particular, may represent an opportunity to augment and accelerate the scientific discovery of fundamental deep science with quantitative models. Here we explore and investigate aspects of an AI-driven, automated, closed-loop approach to scientific discovery, including self-driven hypothesis generation and open-ended autonomous exploration of the hypothesis space. Integrating AI-driven automation into the practice of science would mitigate current problems, including the replication of findings, systematic production of data, and ultimately democratisation of the scientific process. Realising these possibilities requires a vision for augmented AI coupled with a diversity of AI approaches able to deal with fundamental aspects of causality analysis and model discovery while enabling unbiased search across the space of putative explanations. These advances hold the promise to unleash AI's potential for searching and discovering the fundamental structure of our world beyond what human scientists have been able to achieve. Such a vision would push the boundaries of new fundamental science rather than automatize current workflows and instead open doors for technological innovation to tackle some of the greatest challenges facing humanity today.

Translated: 2023-07-23 12:15:29 Published: 2023-07-09

# Disentangling Societal Inequality from Model Biases: Gender Inequality in Divorce Court Proceedings

http://arxiv.org/abs/2307.10200v1

License: see the linked page

Sujan Dutta and Parth Srivastava and Vaishnavi Solunke and Swaprava Nath and Ashiqur R. KhudaBukhsh

Divorce is the legal dissolution of a marriage by a court. Since this is usually an unpleasant outcome of a marital union, each party may have reasons for the decision to quit, which are generally documented in detail in the court proceedings. Via a substantial corpus of 17,306 court proceedings, this paper investigates gender inequality through the lens of divorce court proceedings. While emerging data sources (e.g., public court records) on sensitive societal issues hold promise in aiding social science research, biases present in cutting-edge natural language processing (NLP) methods may interfere with or affect such studies. We thus require a thorough analysis of potential gaps and limitations present in extant NLP resources. In this paper, on the methodological side, we demonstrate that existing NLP resources required several non-trivial modifications to quantify societal inequalities. On the substantive side, we find that while a large number of court cases perhaps suggest changing norms in India where women are increasingly challenging patriarchy, AI-powered analyses of these court proceedings indicate striking gender inequality, with women often subjected to domestic violence.

Translated: 2023-07-23 11:27:13 Published: 2023-07-09

# Ensemble learning for blending gridded satellite and gauge-measured precipitation data

http://arxiv.org/abs/2307.06840v1

License: see the linked page

Georgia Papacharalampous, Hristos Tyralis, Nikolaos Doulamis, Anastasios Doulamis

Regression algorithms are regularly used for improving the accuracy of satellite precipitation products. In this context, ground-based measurements are the dependent variable and the satellite data are the predictor variables, together with topography factors. Alongside this, it is increasingly recognised in many fields that combinations of algorithms through ensemble learning can lead to substantial predictive performance improvements. Still, a sufficient number of ensemble learners for improving the accuracy of satellite precipitation products and their large-scale comparison are currently missing from the literature. In this work, we fill this specific gap by proposing 11 new ensemble learners in the field and by extensively comparing them for the entire contiguous United States and for a 15-year period. We use monthly data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) and IMERG (Integrated Multi-satellitE Retrievals for GPM) gridded datasets. We also use gauge-measured precipitation data from the Global Historical Climatology Network monthly database, version 2 (GHCNm). The ensemble learners combine the predictions by six regression algorithms (base learners), namely the multivariate adaptive regression splines (MARS), multivariate adaptive polynomial splines (poly-MARS), random forests (RF), gradient boosting machines (GBM), extreme gradient boosting (XGBoost) and Bayesian regularized neural networks (BRNN), and each of them is based on a different combiner. The combiners include the equal-weight combiner, the median combiner, two best learners and seven variants of a sophisticated stacking method. The latter stacks a regression algorithm on top of the base learners to combine their independent predictions...
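The two simplest combiners named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the six prediction values are hypothetical, and the stacking variants are not shown.

```python
from statistics import mean, median


def equal_weight_combiner(predictions):
    """Combine base-learner predictions for one sample by simple averaging."""
    return mean(predictions)


def median_combiner(predictions):
    """Combine base-learner predictions for one sample by taking the median."""
    return median(predictions)


# Hypothetical monthly precipitation predictions (mm) for a single station,
# one value per base learner (MARS, poly-MARS, RF, GBM, XGBoost, BRNN).
base_predictions = [52.0, 48.5, 60.0, 55.5, 47.0, 90.0]

print(equal_weight_combiner(base_predictions))
print(median_combiner(base_predictions))  # 53.75
```

In this toy input the median is less affected by the single outlying prediction (90.0) than the equal-weight average is, which is one common motivation for including both combiners in a comparison.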

Translated: 2023-07-14 14:08:49 Published: 2023-07-09

# Fair Algorithm Design: Fair and Efficacious Machine Scheduling

http://arxiv.org/abs/2204.06438v2

License: see the linked page

April Niu, Agnes Totschnig, Adrian Vetta

Motivated by a plethora of practical examples where bias is induced by automated decision-making algorithms, there has been strong recent interest in the design of fair algorithms. However, there is often a dichotomy between fairness and efficacy: fair algorithms may proffer low social welfare solutions whereas welfare optimizing algorithms may be very unfair. This issue is exemplified in the machine scheduling problem where, for $n$ jobs, the social welfare of any fair solution may be a factor $\Omega(n)$ worse than the optimal welfare. In this paper, we prove that this dichotomy between fairness and efficacy can be overcome if we allow for a negligible amount of bias: there exist algorithms that are both "almost perfectly fair" and have a constant factor efficacy ratio, that is, are guaranteed to output solutions that have social welfare within a constant factor of optimal welfare. Specifically, for any $\epsilon>0$, there exist mechanisms with efficacy ratio $\Theta(\frac{1}{\epsilon})$ and where no agent is more than an $\epsilon$ fraction worse off than they are in the fairest possible solution (given by an algorithm that does not use personal or type data). Moreover, these bicriteria guarantees are tight and apply to both the single machine case and the multiple machine case. The key to our results is the use of Pareto scheduling mechanisms. These mechanisms, by the judicious use of personal or type data, are able to exploit Pareto improvements that benefit every individual; such Pareto improvements would typically be forbidden by fair scheduling algorithms designed to satisfy standard statistical measures of group fairness. We anticipate that this paradigm, the judicious use of personal data by a fair algorithm to greatly improve performance at the cost of negligible bias, has wider application.

Translated: 2023-07-13 20:47:25 Published: 2023-07-09

# On Complexity of 1-Center in Various Metrics

http://arxiv.org/abs/2112.03222v3

License: see the linked page

Amir Abboud, Mohammad Hossein Bateni, Vincent Cohen-Addad, Karthik C. S., and Saeed Seddighin

We consider the classic 1-center problem: Given a set $P$ of $n$ points in a metric space, find the point in $P$ that minimizes the maximum distance to the other points of $P$. We study the complexity of this problem in $d$-dimensional $\ell_p$-metrics and in edit and Ulam metrics over strings of length $d$. Our results for the 1-center problem may be classified based on $d$ as follows. $\bullet$ Small $d$: Assuming the hitting set conjecture (HSC), we show that when $d=\omega(\log n)$, no subquadratic algorithm can solve the 1-center problem in any of the $\ell_p$-metrics, or in edit or Ulam metrics. $\bullet$ Large $d$: When $d=\Omega(n)$, we extend our conditional lower bound to rule out subquartic algorithms for the 1-center problem in the edit metric (assuming Quantified SETH). On the other hand, we give a $(1+\epsilon)$-approximation for 1-center in the Ulam metric with running time $\tilde{O}_{\epsilon}(nd+n^2\sqrt{d})$. We also strengthen some of the above lower bounds by allowing approximations or by reducing the dimension $d$, but only against a weaker class of algorithms which list all requisite solutions. Moreover, we extend one of our hardness results to rule out subquartic algorithms for the well-studied 1-median problem in the edit metric, where given a set of $n$ strings each of length $n$, the goal is to find a string in the set that minimizes the sum of the edit distances to the rest of the strings in the set.
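For orientation, the problem definition above can be made concrete with the straightforward quadratic-time baseline that the lower bounds are measured against. This is only a sketch of the naive $O(n^2 d)$ approach in an $\ell_p$ metric, not any algorithm from the paper; the function names and sample points are illustrative.

```python
def lp_distance(a, b, p=2.0):
    """l_p distance between two d-dimensional points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def one_center(points, p=2.0):
    """Return (index, radius) of the input point minimizing the maximum
    distance to the other input points, by checking all pairs."""
    best_index, best_radius = None, float("inf")
    for i, candidate in enumerate(points):
        radius = max(
            (lp_distance(candidate, other, p)
             for j, other in enumerate(points) if j != i),
            default=0.0,
        )
        if radius < best_radius:
            best_index, best_radius = i, radius
    return best_index, best_radius


points = [(0.0, 0.0), (4.0, 0.0), (2.0, 1.0), (2.0, 5.0)]
print(one_center(points))  # (2, 4.0)
```

The double loop over all pairs is what makes this baseline quadratic in $n$; the conditional lower bounds in the abstract say that, under the stated conjectures, this dependence cannot be improved substantially in the listed regimes.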

Translated: 2023-07-13 20:46:01 Published: 2023-07-09

# GreenKGC: A Lightweight Knowledge Graph Completion Method

http://arxiv.org/abs/2208.09137v2

License: see the linked page

Yun-Cheng Wang, Xiou Ge, Bin Wang, C.-C. Jay Kuo

Knowledge graph completion (KGC) aims to discover missing relationships between entities in knowledge graphs (KGs). Most prior KGC work focuses on learning embeddings for entities and relations through a simple scoring function. Yet, a higher-dimensional embedding space is usually required for a better reasoning capability, which leads to a larger model size and hinders applicability to real-world problems (e.g., large-scale KGs or mobile/edge computing). A lightweight modularized KGC solution, called GreenKGC, is proposed in this work to address this issue. GreenKGC consists of three modules: representation learning, feature pruning, and decision learning, to extract discriminant KG features and make accurate predictions on missing relationships using classifiers and negative sampling. Experimental results demonstrate that, in low dimensions, GreenKGC can outperform SOTA methods on most datasets. In addition, low-dimensional GreenKGC can achieve competitive or even better performance against high-dimensional models with a much smaller model size.
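As an illustration of what a "simple scoring function" over entity and relation embeddings can look like, the sketch below uses the well-known TransE score, $-\lVert h + r - t \rVert_2$. The abstract does not say which scoring functions GreenKGC builds on, so TransE is an assumption chosen for illustration, and the 2-dimensional embeddings are made up.

```python
import math


def transe_score(head, relation, tail):
    """TransE plausibility score for a triple (head, relation, tail).

    The score is the negated Euclidean norm of head + relation - tail,
    so values closer to zero indicate a more plausible triple.
    """
    return -math.sqrt(sum((h + r - t) ** 2
                          for h, r, t in zip(head, relation, tail)))


# Hypothetical 2-dimensional embeddings.
head = (1.0, 0.0)
relation = (0.0, 1.0)
plausible_tail = (1.0, 1.0)    # equals head + relation exactly
implausible_tail = (3.0, 1.0)  # two units away from head + relation

print(transe_score(head, relation, plausible_tail))
print(transe_score(head, relation, implausible_tail))
```

Ranking candidate tails by this score is the usual way such embeddings are used to propose missing links; the embedding dimension is the quantity whose growth the abstract identifies as the cause of large model sizes.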

Translated: 2023-07-13 20:37:56 Published: 2023-07-09

# Identity Construction in a Misogynist Incels Forum

http://arxiv.org/abs/2306.15745v3

License: see the linked page

Michael Miller Yoder, Chloe Perry, David West Brown, Kathleen M. Carley, Meredith L. Pruden

Online communities of involuntary celibates (incels) are a prominent source of misogynist hate speech. In this paper, we use quantitative text and network analysis approaches to examine how identity groups are discussed on incels-dot-is, the largest black-pilled incels forum. We find that this community produces a wide range of novel identity terms and, while terms for women are most common, mentions of other minoritized identities are increasing. An analysis of the associations made with identity groups suggests an essentialist ideology where physical appearance, as well as gender and racial hierarchies, determine human value. We discuss implications for research into automated misogynist hate speech detection.

Translated: 2023-07-13 18:47:56 Published: 2023-07-09

# Self-supervised learning via inter-modal reconstruction and feature projection networks for label-efficient 3D-to-2D segmentation

http://arxiv.org/abs/2307.03008v2

License: see the linked page

José Morano, Guilherme Aresta, Dmitrii Lachinov, Julia Mai, Ursula Schmidt-Erfurth, Hrvoje Bogunović

Deep learning has become a valuable tool for the automation of certain medical image segmentation tasks, significantly relieving the workload of medical specialists. Some of these tasks require segmentation to be performed on a subset of the input dimensions, the most common case being 3D-to-2D. However, the performance of existing methods is strongly conditioned by the amount of labeled data available, as there is currently no data-efficient method (e.g., transfer learning) that has been validated on these tasks. In this work, we propose a novel convolutional neural network (CNN) and self-supervised learning (SSL) method for label-efficient 3D-to-2D segmentation. The CNN is composed of a 3D encoder and a 2D decoder connected by novel 3D-to-2D blocks. The SSL method consists of reconstructing image pairs of modalities with different dimensionality. The approach has been validated in two tasks with clinical relevance: the en-face segmentation of geographic atrophy and reticular pseudodrusen in optical coherence tomography. Results on different datasets demonstrate that the proposed CNN significantly improves the state of the art in scenarios with limited labeled data by up to 8% in Dice score. Moreover, the proposed SSL method allows further improvement of this performance by up to 23%, and we show that the SSL is beneficial regardless of the network architecture.
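The Dice score used to report these gains is the standard overlap measure $2|A \cap B| / (|A| + |B|)$ between a predicted mask $A$ and a reference mask $B$. A minimal sketch for 2D binary masks follows; the function name, the empty-mask convention, and the toy masks are assumptions for illustration and do not come from the paper.

```python
def dice_score(pred_mask, true_mask):
    """Dice overlap between two binary 2D masks given as lists of rows."""
    pred = [bool(v) for row in pred_mask for v in row]
    true = [bool(v) for row in true_mask for v in row]
    intersection = sum(p and t for p, t in zip(pred, true))
    total = sum(pred) + sum(true)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy en-face masks: 1 marks a pixel labeled as lesion.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]

print(dice_score(pred_mask, true_mask))
```

Here two of the three predicted pixels overlap the three reference pixels, giving a Dice score of 4/6.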

Translated: 2023-07-13 18:38:23 Published: 2023-07-09

# DataComp: In search of the next generation of multimodal datasets

http://arxiv.org/abs/2304.14108v3

License: see the linked page

Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, Ludwig Schmidt

Multimodal datasets are a critical component in recent breakthroughs such as Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms. To address this shortcoming in the ML ecosystem, we introduce DataComp, a testbed for dataset experiments centered around a new candidate pool of 12.8 billion image-text pairs from Common Crawl. Participants in our benchmark design new filtering techniques or curate new data sources and then evaluate their new dataset by running our standardized CLIP training code and testing the resulting model on 38 downstream test sets. Our benchmark consists of multiple compute scales spanning four orders of magnitude, which enables the study of scaling trends and makes the benchmark accessible to researchers with varying resources. Our baseline experiments show that the DataComp workflow leads to better training sets. In particular, our best baseline, DataComp-1B, enables training a CLIP ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet, outperforming OpenAI's CLIP ViT-L/14 by 3.7 percentage points while using the same training procedure and compute. We release DataComp and all accompanying code at www.datacomp.ai.

Translated: 2023-07-13 16:46:21 Published: 2023-07-09

# Evaluating the Zero-shot Robustness of Instruction-tuned Language Models

http://arxiv.org/abs/2306.11270v2

License: see the linked page

Jiuding Sun, Chantal Shaib, Byron C. Wallace

Instruction fine-tuning has recently emerged as a promising approach for improving the zero-shot capabilities of Large Language Models (LLMs) on new tasks. This technique has shown particular strength in improving the performance of modestly sized LLMs, sometimes inducing performance competitive with much larger model variants. In this paper we ask two questions: (1) how sensitive are instruction-tuned models to the particular phrasings of instructions, and (2) how can we make them more robust to such natural language variation? To answer the former, we collect a set of 319 instructions manually written by NLP practitioners for over 80 unique tasks included in widely used benchmarks, and we evaluate the variance and average performance of these instructions as compared to instruction phrasings observed during instruction fine-tuning. We find that using novel (unobserved) but appropriate instruction phrasings consistently degrades model performance, sometimes substantially so. Further, such natural instructions yield a wide variance in downstream performance, despite their semantic equivalence. Put another way, instruction-tuned models are not especially robust to instruction re-phrasings. We propose a simple method to mitigate this issue by introducing "soft prompt" embedding parameters and optimizing these to maximize the similarity between representations of semantically equivalent instructions. We show that this method consistently improves the robustness of instruction-tuned models.

Translated: 2023-07-13 16:36:01 Published: 2023-07-09

# Review of feedback in Automated Essay Scoring

http://arxiv.org/abs/2307.05553v1

License: see the linked page

You-Jin Jong, Yong-Jin Kim, Ok-Chol Ri

The first automated essay scoring system was developed 50 years ago. Automated essay scoring systems are developing into systems with richer functions than the previous simple scoring systems. Their purpose is not only to score essays but also to serve as a learning tool that improves users' writing skills. Feedback is the most important aspect of making an automated essay scoring system useful in real life. The importance of feedback was already emphasized in the first AES system. This paper reviews research on feedback in automated essay scoring, including different feedback types and essay traits. We also review the latest case studies of automated essay scoring systems that provide feedback.

Translated: 2023-07-13 16:28:39 Published: 2023-07-09

# Graph Neural Network-enabled Terahertz-based Flow-guided Nanoscale Localization

http://arxiv.org/abs/2307.05551v1

License: see the linked page

Gerard Calvo Bartra, Filip Lemic, Sergi Abadal, Xavier Costa Perez

Scientific advancements in nanotechnology and advanced materials are paving the way toward nanoscale devices for in-body precision medicine, comprising integrated sensing, computing, communication, data and energy storage capabilities. In the human cardiovascular system, such devices are envisioned to be passively flowing and continuously sensing for detecting events of diagnostic interest. The diagnostic value of detecting such events can be enhanced by assigning to them their physical locations (e.g., body region), which is the main proposition of flow-guided localization. Current flow-guided localization approaches suffer from low localization accuracy, and they are by design unable to localize events within the entire cardiovascular system. To address this issue, we propose the use of Graph Neural Networks (GNNs), and demonstrate localization accuracy and coverage enhancements of our proposal over the existing State of the Art (SotA) approaches. Based on our evaluation, we provide several design guidelines for GNN-enabled flow-guided localization.

Translated: 2023-07-13 16:28:13 Published: 2023-07-09

# Unveiling the Connection: ER bridges and EPR in the work of Einstein

http://arxiv.org/abs/2307.05548v1

License: see the linked page

Galina Weinstein

This paper explores the ER bridges theory and its relationship with quantum phenomena. An argument can be made that the ER bridges theory does not explicitly address quantum phenomena and implies that Einstein intended to differentiate between individual particles within the ER bridges theory and the systems involved in the EPR paradox. However, this paper contends that Einstein held a distinct viewpoint. He endeavored to elucidate quantum characteristics by modifying general relativity, aiming to incorporate concepts such as local realism, separability, causality, and determinism, without relying on the principles of quantum mechanics. To achieve this, he proposed representing elementary particles using parallel ER bridges connecting two flat sheets.

Translated: 2023-07-13 16:27:43 Published: 2023-07-09

# Augmenters at SemEval-2023 Task 1: Enhancing CLIP in Handling Compositionality and Ambiguity for Zero-Shot Visual WSD through Prompt Augmentation and Text-To-Image Diffusion

http://arxiv.org/abs/2307.05564v1

License: see the linked page

Jie S. Li, Yow-Ting Shiue, Yong-Siang Shih, and Jonas Geiping

This paper describes our zero-shot approaches for the Visual Word Sense Disambiguation (VWSD) Task in English. Our preliminary study shows that the simple approach of matching candidate images with the phrase using CLIP suffers from the many-to-many nature of image-text pairs. We find that the CLIP text encoder may have limited abilities in capturing the compositionality in natural language. Conversely, the descriptive focus of the phrase varies from instance to instance. We address these issues in our two systems, Augment-CLIP and Stable Diffusion Sampling (SD Sampling). Augment-CLIP augments the text prompt by generating sentences that contain the context phrase with the help of large language models (LLMs). We further explore CLIP models in other languages, as an ambiguous word may be translated into an unambiguous one in the other language. SD Sampling uses text-to-image Stable Diffusion to generate multiple images from the given phrase, increasing the likelihood that a subset of images matches the one paired with the text.

Translated: 2023-07-13 16:17:08 Published: 2023-07-09

# RidgeBase: A Cross-Sensor Multi-Finger Contactless Fingerprint Dataset

http://arxiv.org/abs/2307.05563v1

License: see the linked page

Bhavin Jawade, Deen Dayal Mohan, Srirangaraj Setlur, Nalini Ratha and Venu Govindaraju

Contactless fingerprint matching using smartphone cameras can alleviate major challenges of traditional fingerprint systems including hygienic acquisition, portability and presentation attacks. However, development of practical and robust contactless fingerprint matching techniques is constrained by the limited availability of large-scale real-world datasets. To motivate further advances in contactless fingerprint matching across sensors, we introduce the RidgeBase benchmark dataset. RidgeBase consists of more than 15,000 contactless and contact-based fingerprint image pairs acquired from 88 individuals under different background and lighting conditions using two smartphone cameras and one flatbed contact sensor. Unlike existing datasets, RidgeBase is designed to promote research under different matching scenarios that include Single Finger Matching and Multi-Finger Matching for both contactless-to-contactless (CL2CL) and contact-to-contactless (C2CL) verification and identification. Furthermore, due to the high intra-sample variance in contactless fingerprints belonging to the same finger, we propose a set-based matching protocol inspired by the advances in facial recognition datasets. This protocol is specifically designed for pragmatic contactless fingerprint matching that can account for variances in focus, polarity and finger-angles. We report qualitative and quantitative baseline results for different protocols using a COTS fingerprint matcher (Verifinger) and a Deep CNN based approach on the RidgeBase dataset. The dataset can be downloaded here: https://www.buffalo.edu/cubs/research/datasets/ridgebase-benchmark-dataset.html

Translated: 2023-07-13 16:16:48 Published: 2023-07-09

# TransPose: A Transformer-based 6D Object Pose Estimation Network with Depth Refinement

http://arxiv.org/abs/2307.05561v1

License: see the linked page

Mahmoud Abdulsalam and Nabil Aouf

As demand for robotic manipulation applications increases, accurate vision-based 6D pose estimation becomes essential for autonomous operations. Convolutional Neural Network (CNN) based approaches for pose estimation have been previously introduced. However, the quest for better performance persists, especially for accurate robotic manipulation. This quest extends to the Agri-robotics domain. In this paper, we propose TransPose, an improved Transformer-based 6D pose estimation with a depth refinement module. The architecture takes in only an RGB image as input with no additional supplementing modalities such as depth or thermal images. The architecture encompasses an innovative lighter depth estimation network that estimates depth from an RGB image using a feature pyramid with an up-sampling method. A transformer-based detection network with additional prediction heads is proposed to directly regress the object's centre and predict the 6D pose of the target. A novel depth refinement module is then used alongside the predicted centres, 6D poses and depth patches to refine the accuracy of the estimated 6D pose. We extensively compared our results with other state-of-the-art methods and analysed our results for fruit-picking applications. The results we achieved show that our proposed technique outperforms the other methods available in the literature.

Translated: 2023-07-13 16:16:18 Published: 2023-07-09

# Automatic Coding at Scale: Design and Deployment of a Nationwide System for Normalizing Referrals in the Chilean Public Healthcare System

http://arxiv.org/abs/2307.05560v1

License: see the linked page

Fabián Villena, Matías Rojas, Felipe Arias, Jorge Pacheco, Paulina Vera, Jocelyn Dunstan

(参考訳) 疾患符号化タスクは、コントロールされた語彙から臨床文書に記載された各疾患にユニークな識別子を割り当てることを含む。このタスクは、非構造化データからの情報抽出を可能とし、例えば、特定された状況において疾患の発生率と感染率に関する疫学研究を行う。しかしながら、手動のコーディングプロセスは、医療従事者がコーディングルールや用語に精通する必要があるため、エラーとなる。さらに、このプロセスは多くの時間とエネルギーを消費し、より臨床的に関連するタスクに割り当てることができる。これらの困難は、自動的に病気にコードを割り当てる計算システムを開発することで対処できる。そこで本稿では,チリの公共医療システムから参照される疾患を自動的にコードする2段階のシステムを提案する。具体的には,病名認識に最先端のNERモデルとElasticsearchをベースとした検索エンジンシステムを用いて,これらの疾患名に関連性の高いコードを割り当てる。このシステムの性能は、臨床専門家が手作業でコーディングした基準に基づいて評価された。本システムでは,サブカテゴリレベルでは0.63,カテゴリーレベルでは0.83のマップスコアを得た。このシステムは、コーディングと管理のプロセスを最適化する健康専門家のためのサポートツールになり得る。最後に、再現性を保証するため、我々のモデルと実験のコードを公開します。

The disease coding task involves assigning a unique identifier from a controlled vocabulary to each disease mentioned in a clinical document. This task is relevant since it allows information extraction from unstructured data to perform, for example, epidemiological studies about the incidence and prevalence of diseases in a given context. However, the manual coding process is error-prone, as it requires medical personnel to be competent in coding rules and terminology. In addition, this process consumes a lot of time and energy, which could be allocated to more clinically relevant tasks. These difficulties can be addressed by developing computational systems that automatically assign codes to diseases. To this end, we propose a two-step system for automatically coding diseases in referrals from the Chilean public healthcare system. Specifically, our model uses a state-of-the-art NER model for recognizing disease mentions and a search engine system based on Elasticsearch for assigning the most relevant codes associated with these disease mentions. The system's performance was evaluated on referrals manually coded by clinical experts. Our system obtained a MAP score of 0.63 for the subcategory level and 0.83 for the category level, close to the best-performing models in the literature. This system could be a support tool for health professionals, optimizing the coding and management process. Finally, to guarantee reproducibility, we publicly release the code of our models and experiments.

翻訳日:2023-07-13 16:16:01 公開日:2023-07-09
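
The MAP scores quoted in the abstract above rank candidate codes per disease mention. As a rough illustration only, the sketch below computes mean average precision for ranked code lists. The function names and the example codes are hypothetical, and the paper's exact MAP variant (cut-off, tie handling) is not given in the abstract, so this follows the common textbook definition.

```python
def average_precision(ranked_codes, relevant_codes):
    """Average precision of one ranked list against a set of correct codes."""
    hits = 0
    precision_sum = 0.0
    for rank, code in enumerate(ranked_codes, start=1):
        if code in relevant_codes:
            hits += 1
            # Precision at this rank, counted only where a correct code appears.
            precision_sum += hits / rank
    return precision_sum / len(relevant_codes) if relevant_codes else 0.0


def mean_average_precision(queries):
    """Mean of average precision over (ranked_codes, relevant_codes) pairs."""
    if not queries:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Hypothetical example: two disease mentions, each with one correct code.
example = [
    (["J45", "E11", "I10"], {"E11"}),  # correct code ranked second
    (["E11", "J45"], {"E11"}),         # correct code ranked first
]
print(mean_average_precision(example))
```

A search engine that returns the correct code first for every mention would score 1.0 under this definition.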

# スパイク・アンド・スラブによるベイズ線形回帰の推定からサンプリングへ

From Estimation to Sampling for Bayesian Linear Regression with Spike-and-Slab Prior ( http://arxiv.org/abs/2307.05558v1 )

ライセンス: Link先を確認

Qijia Jiang

(参考訳) 後方収縮特性を利用した事前及び設計効率的なサンプリングアルゴリズムを用いてベイズ線形回帰を考察する。ガウスのスパイク・アンド・スラブ(統計的にも計算的にも好適)による準類似性を調査し、ギブスサンプリングと確率的局在に基づく2つのアルゴリズムを、スパース植込み信号の正当な推論を可能にする同じ(quite natural)統計仮定の下で解析する。 Stochastic Localization samplerの利点は、よく設計されていないデータマトリックスで特に顕著である。

We consider Bayesian linear regression with a sparsity-inducing prior and design efficient sampling algorithms leveraging posterior contraction properties. A quasi-likelihood with a Gaussian spike-and-slab prior (which is favorable both statistically and computationally) is investigated, and two algorithms based on Gibbs sampling and Stochastic Localization are analyzed, both under the same (quite natural) statistical assumptions that also enable valid inference on the sparse planted signal. The benefit of the Stochastic Localization sampler is particularly prominent for data matrices that are not well-designed.

翻訳日:2023-07-13 16:15:39 公開日:2023-07-09

# オッペンハイマーとスナイダー重力崩壊の量子化によるブラックホールのシュル・オーディンガーとクライン=ゴルドンの理論

Schrödinger and Klein-Gordon theories of black holes from the quantization of the Oppenheimer and Snyder gravitational collapse ( http://arxiv.org/abs/2307.05554v1 )

ライセンス: Link先を確認

Christian Corda

(参考訳) シュワルツシルトブラックホール (bh) のシュルツチャイルド方程式は、bh が中心場と相互作用する「電子」、すなわち「核」からなることを示しており、ド・ブロイの仮説により、bh ホライズンモードの観点で「電子」を解釈する。量子重力効果はプランクスケールではなくシュワルツシルトスケールでのBH半古典構造を変化させる。この BH Schr\"odinger 方程式と水素原子の s 状態の Schr\"odinger 方程式の類似により、同じ方程式を解くことができる。したがって、BHは「重力水素原子」というシュリンガーの理論に従うよく定義された量子重力系である。 By identifying the potential energy in the BH Schr\"odinger equation as being the gravitational energy of a spherically symmetric shell, a different nature of the quantum BH seems to surface. BHs are self-interacting, highly excited, spherically symmetric, massive quantum shells generated by matter condensing on the apparent horizon, concretely realizing the membrane paradigm. The quantum BH descripted as a "gravitational hydrogen atom" is a fictitious mathematical representation of the real, quantum BH, a quantum massive shell having as radius the oscillating gravitational radius. この結果から自明な結果が生まれます i) bhs は地平線も特異点も持たない。 ii) bh蒸発における情報損失もbh相補性もファイアウォールパラドックスも存在しない。これらの結果は、Hawking、Vaz、Mitraなどによる以前のものと一致している。最後に、BH Schr\\odinger方程式に対する特殊相対論的補正は、BH Klein-Gordon方程式と対応する固有値を与える。

The Schrödinger equation of the Schwarzschild black hole (BH) shows that a BH is composed of a particle, the "electron", interacting with a central field, the "nucleus". Via de Broglie's hypothesis, one interprets the "electron" in terms of BH horizon's modes. Quantum gravity effects modify the BH semi-classical structure at the Schwarzschild scale rather than at the Planck scale. The analogy between this BH Schrödinger equation and the Schrödinger equation of the s states of the hydrogen atom permits us to solve the same equation. Therefore, BHs are well-defined quantum gravitational systems obeying Schrödinger's theory: the "gravitational hydrogen atoms". By identifying the potential energy in the BH Schrödinger equation as being the gravitational energy of a spherically symmetric shell, a different nature of the quantum BH seems to surface. BHs are self-interacting, highly excited, spherically symmetric, massive quantum shells generated by matter condensing on the apparent horizon, concretely realizing the membrane paradigm. The quantum BH described as a "gravitational hydrogen atom" is a fictitious mathematical representation of the real, quantum BH, a quantum massive shell having as radius the oscillating gravitational radius. Nontrivial consequences emerge from this result: i) BHs have neither horizons nor singularities; ii) there is neither information loss in BH evaporation, nor BH complementarity, nor firewall paradox. These results are consistent with previous ones by Hawking, Vaz, Mitra and others. Finally, the special relativistic corrections to the BH Schrödinger equation give the BH Klein-Gordon equation and the corresponding eigenvalues.

翻訳日:2023-07-13 16:15:27 公開日:2023-07-09

# 非構造化データから学習構造をパーソナライズした強化学習要約サービス

A Personalized Reinforcement Learning Summarization Service for Learning Structure from Unstructured Data ( http://arxiv.org/abs/2307.05696v1 )

ライセンス: Link先を確認

Samira Ghodratnama, Amin Beheshti, Mehrdad Zakershahrak

(参考訳) テキストデータの指数関数的な成長は、有意義な洞察の抽出を支援するツールに対する重要なニーズを生み出した。従来の文書要約アプローチは、個々のユーザ要求を満たすことができず、効率的な情報処理のための構造が欠如していることが多い。これらの制限に対処するため,我々は階層型パーソナライズ概念に基づく要約手法であるsummationを提案する。文書を簡潔な階層的な概念マップに合成し、ユーザの好みを学習し、適応することによって、積極的にユーザと対話する。 Reinforcement Learningアルゴリズムを用いて、Summationは特定のトピックに関する未確認文書のパーソナライズされた要約を生成する。このフレームワークは、理解を高め、効果的なナビゲーションを可能にし、ユーザが独自の要求に沿う大きなドキュメントコレクションから意味のある洞察を抽出できるようにする。

The exponential growth of textual data has created a crucial need for tools that assist users in extracting meaningful insights. Traditional document summarization approaches often fail to meet individual user requirements and lack structure for efficient information processing. To address these limitations, we propose Summation, a hierarchical personalized concept-based summarization approach. It synthesizes documents into a concise hierarchical concept map and actively engages users by learning and adapting to their preferences. Using a Reinforcement Learning algorithm, Summation generates personalized summaries for unseen documents on specific topics. This framework enhances comprehension, enables effective navigation, and empowers users to extract meaningful insights from large document collections aligned with their unique requirements.

翻訳日:2023-07-13 15:27:22 公開日:2023-07-09

# 科学文献における図形分類手法に関する調査

A Survey on Figure Classification Techniques in Scientific Documents ( http://arxiv.org/abs/2307.05694v1 )

ライセンス: Link先を確認

Anurag Dhote and Mohammed Javed and David S Doermann

(参考訳) 図は重要な情報を視覚的に表現し、科学的事実を伝える効果的な手段を提供する。近年、さまざまな人工知能と機械学習技術を用いて、図、特に表、図、プロットから直接データを抽出する取り組みが数多く行われている。これは、数字から情報を取り除くことが、科学文書で強調された概念に対する深い洞察をもたらす可能性があるためである。本稿では,図を5つのクラス(表,写真,図,地図,プロット)に体系的に分類し,その上で,図形分類の問題に対処する既存の方法論とデータセットについて批判的なレビューを行う。最後に,現在の研究のギャップを特定し,図分類に関するさらなる研究の方向性を示す。

Figures visually represent an essential piece of information and provide an effective means to communicate scientific facts. Recently there have been many efforts toward extracting data directly from figures, specifically from tables, diagrams, and plots, using different Artificial Intelligence and Machine Learning techniques. This is because extracting information from figures could lead to deeper insights into the concepts highlighted in the scientific documents. In this survey paper, we systematically categorize figures into five classes (tables, photos, diagrams, maps, and plots) and subsequently present a critical review of the existing methodologies and data sets that address the problem of figure classification. Finally, we identify the current research gaps and provide possible directions for further research on figure classification.

翻訳日:2023-07-13 15:26:55 公開日:2023-07-09

# HA-ViD:統合アセンブリ理解のためのヒューマンアセンブリビデオデータセット

HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding ( http://arxiv.org/abs/2307.05721v1 )

ライセンス: Link先を確認

Hao Zheng, Regina Lee, Yuqian Lu

(参考訳) ビデオから総合的な組み立て知識を理解することは、未来的な超知能産業にとって不可欠である。技術的ブレークスルーを実現するため、HA-ViDは、産業的な組み立てシナリオ、自然な手続き的知識獲得プロセス、一貫性のあるヒューマンロボット共有アノテーションを特徴とする、最初のヒューマンアセンブリビデオデータセットである。特に、HA-ViDは、現実世界のアセンブリ、自然な人間の振る舞い、組み立て中の学習の進行の多様なコラボレーションパターンをキャプチャし、主語、アクション動詞、操作対象、ターゲット対象、ツールに対するグラニュレートなアクションアノテーションをキャプチャする。マルチビュー・マルチモーダルビデオ(各ビデオは1つの組立タスクを含む)、1.5Mフレーム、96K時間ラベル、2M空間ラベルを提供する。我々は、アクション認識、アクションセグメンテーション、オブジェクト検出、マルチオブジェクトトラッキングの4つの基本的なビデオ理解タスクをベンチマークする。重要なことは、アセンブリの進捗、プロセス効率、タスクコラボレーション、スキルパラメータ、人間の意図といった知識を理解するために、それらのパフォーマンスを分析することである。 HA-ViDの詳細は以下の通り。

Understanding comprehensive assembly knowledge from videos is critical for a futuristic ultra-intelligent industry. To enable technological breakthroughs, we present HA-ViD, the first human assembly video dataset that features representative industrial assembly scenarios, a natural procedural knowledge acquisition process, and consistent human-robot shared annotations. Specifically, HA-ViD captures diverse collaboration patterns of real-world assembly, natural human behaviors and learning progression during assembly, and breaks action annotations down into subject, action verb, manipulated object, target object, and tool. We provide 3222 multi-view, multi-modality videos (each video contains one assembly task), 1.5M frames, 96K temporal labels and 2M spatial labels. We benchmark four foundational video understanding tasks: action recognition, action segmentation, object detection and multi-object tracking. Importantly, we analyze their performance for comprehending knowledge in assembly progress, process efficiency, task collaboration, skill parameters and human intention. Details of HA-ViD are available at: https://iai-hrc.github.io/ha-vid.

翻訳日:2023-07-13 15:15:47 公開日:2023-07-09

# SpreadNUTS -- No-U-Turn サンプリングのための経路の動的拡張と訪問地域分割

SpreadNUTS -- Moderate Dynamic Extension of Paths for No-U-Turn Sampling & Partitioning Visited Regions ( http://arxiv.org/abs/2307.06279v1 )

ライセンス: Link先を確認

Fareed Sheriff

(参考訳) マルコフ連鎖モンテカルロ法(MCMC)は長い間存在しており、その分野はよく研究されている。 MCMC法の目的は、繰り返しサンプリングによって分布を近似することであり、ほとんどのMCMCアルゴリズムは、その極限で真の分布に収束する漸近的に最適な挙動を示す。しかし、これらのアルゴリズムを区別しているのは、実用的収束保証と効率性である。サンプリング器は最終的に分布をよく近似することができるが、実世界で使用されるため、サンプリング器が良い推定値を得る点が妥当な時間内に到達可能である必要がある。同様に、推定に使用する分布から良いサンプルを生成するのが計算的に困難または難解であれば、サンプリング者が利用できる実世界のユーティリティは存在しない。したがって、最近のMCMC手法のほとんどは効率の向上と収束のスピードアップに重点を置いている。しかし、多くのmcmcアルゴリズムはランダムウォークに苦しむため、ランダムウォークを消去するなど、そのような動作を緩和することは困難である。ハミルトニアン・モンテカルロ(英: Hamiltonian Monte Carlo、HMC)は、理論上はハミルトニアン力学に関連する性質のためランダムウォークの振る舞いを示さないMCMC法の一種である。本稿では, NUTSよりも高速にサンプル空間を探索することを目的とした, No-U-turn sampler (NUTS) と呼ばれる特定のHMCアルゴリズムの修正について述べる。

Markov chain Monte Carlo (MCMC) methods have existed for a long time and the field is well-explored. The purpose of MCMC methods is to approximate a distribution through repeated sampling; most MCMC algorithms exhibit asymptotically optimal behavior in that they converge to the true distribution in the limit. However, what differentiates these algorithms is their practical convergence guarantees and efficiency. While a sampler may eventually approximate a distribution well, because it is used in the real world it is necessary that the point at which the sampler yields a good estimate of the distribution is reachable in a reasonable amount of time. Similarly, if it is computationally difficult or intractable to produce good samples from a distribution for use in estimation, then there is no real-world utility afforded by the sampler. Thus, most MCMC methods these days focus on improving efficiency and speeding up convergence. However, many MCMC algorithms suffer from random walk behavior and often only mitigate such behavior, as eliminating random walks outright is difficult. Hamiltonian Monte Carlo (HMC) is a class of MCMC methods that theoretically exhibit no random walk behavior because of properties related to Hamiltonian dynamics. This paper introduces modifications to a specific HMC algorithm known as the no-U-turn sampler (NUTS) that aim to explore the sample space faster than NUTS, yielding a sampler that has faster convergence to the true distribution than NUTS.

翻訳日:2023-07-13 12:20:57 公開日:2023-07-09
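
The abstract above builds on Hamiltonian Monte Carlo, which proposes moves by simulating Hamiltonian dynamics. As background only, here is a minimal sketch of the standard leapfrog integrator that HMC and NUTS rely on, for a one-dimensional target. It does not implement the SpreadNUTS modifications, which the abstract does not specify, and the function and variable names are illustrative assumptions.

```python
def leapfrog(q, p, grad_U, step_size, n_steps):
    """Integrate Hamiltonian dynamics for H(q, p) = U(q) + p**2 / 2.

    q: position, p: momentum, grad_U: gradient of the potential energy
    (the negative log density of the target).
    """
    p = p - 0.5 * step_size * grad_U(q)      # initial half step for momentum
    for i in range(n_steps):
        q = q + step_size * p                # full step for position
        if i != n_steps - 1:
            p = p - step_size * grad_U(q)    # full step for momentum
    p = p - 0.5 * step_size * grad_U(q)      # final half step for momentum
    return q, p


# Standard normal target: U(q) = q**2 / 2, so grad_U(q) = q.
def grad_U(q):
    return q


def hamiltonian(q, p):
    return 0.5 * q * q + 0.5 * p * p


q0, p0 = 1.0, 0.0
q1, p1 = leapfrog(q0, p0, grad_U, step_size=0.01, n_steps=100)
print(hamiltonian(q0, p0), hamiltonian(q1, p1))
```

The integrator nearly conserves the Hamiltonian and is time-reversible, which is what lets HMC take long, directed moves rather than a random walk.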

# 大規模言語モデルの評価に関する調査

A Survey on Evaluation of Large Language Models ( http://arxiv.org/abs/2307.03109v2 )

ライセンス: Link先を確認

Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Kaijie Zhu, Hao Chen, Linyi Yang, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, and Xing Xie

(参考訳) 大規模言語モデル(LLM)は、様々なアプリケーションにおける前例のない性能のため、学術と産業の両方で人気が高まっている。 LLMは研究と日常利用の両方において重要な役割を担い続けており、その評価はタスクレベルだけでなく社会レベルでもますます重要になり、潜在的なリスクの理解を深めている。過去数年間、様々な観点からLSMを調べるための重要な努力が続けられてきた。本稿では, これらのLCMの評価手法を総合的に検討し, 評価方法, 評価方法, 評価方法の3つの重要な側面に着目した。まず,一般的な自然言語処理タスク,推論,医療利用,倫理,教育,自然科学,社会科学,エージェント応用など,評価タスクの観点から概観する。第2に,LLMの性能評価において重要な要素である評価手法とベンチマークに飛び乗ることで,'where' と 'how' の質問に答える。次に、異なるタスクにおけるLCMの成功事例と失敗事例を要約する。最後に、llms評価の先にあるいくつかの将来の課題に光を当てた。我々の目的は、LLMの評価の領域における研究者に貴重な洞察を提供することであり、それによってより熟練したLLMの開発を支援することである。我々のキーポイントは、LCMの開発を支援するために、評価を必須の規律として扱うべきであるということです。関連したオープンソース資料は、https://github.com/mlgroupjlu/llm-eval-surveyで一貫して保守しています。

Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented performance in various applications. As LLMs continue to play a vital role in both research and daily use, their evaluation becomes increasingly critical, not only at the task level but also at the societal level, for a better understanding of their potential risks. Over the past years, significant efforts have been made to examine LLMs from various perspectives. This paper presents a comprehensive review of these evaluation methods for LLMs, focusing on three key dimensions: what to evaluate, where to evaluate, and how to evaluate. Firstly, we provide an overview from the perspective of evaluation tasks, encompassing general natural language processing tasks, reasoning, medical usage, ethics, education, natural and social sciences, agent applications, and other areas. Secondly, we answer the 'where' and 'how' questions by diving into the evaluation methods and benchmarks, which serve as crucial components in assessing the performance of LLMs. Then, we summarize the success and failure cases of LLMs in different tasks. Finally, we shed light on several future challenges that lie ahead in LLM evaluation. Our aim is to offer invaluable insights to researchers in the realm of LLM evaluation, thereby aiding the development of more proficient LLMs. Our key point is that evaluation should be treated as an essential discipline to better assist the development of LLMs. We consistently maintain the related open-source materials at: https://github.com/MLGroupJLU/LLM-eval-survey.

翻訳日:2023-07-12 17:50:45 公開日:2023-07-09

# mentalhealthai: パーソナルヘルスデバイスデータを利用して精神科治療を最適化する

MentalHealthAI: Utilizing Personal Health Device Data to Optimize Psychiatry Treatment ( http://arxiv.org/abs/2307.04777v1 )

ライセンス: Link先を確認

Manan Shukla and Oshani Seneviratne

(参考訳) 精神疾患は現代医療において重要な課題であり、診断と治療はしばしば主観的な患者の記述と過去の医療史に依存している。この問題に対処するため,個人健康装置を用いて収集した患者の生理的データを利用する,個別のメンタルヘルストラッキングと気分予測システムを提案する。本システムでは,スマートコントラクトを用いた移動と連合型機械学習の概念を組み合わせた分散学習機構を活用し,ユーザのデバイスにデータを残し,プライバシを意識し説明可能な方法で精神科治療と管理のためのメンタルヘルス状態の効果的な追跡を可能にする。我々は、有望な結果を示す一般的なメンタルヘルスデータセットを用いてモデルを評価する。統合医療システムと機械学習モデルを利用することで、精神科医に従来のオフィス訪問以外の患者のメンタルヘルスに関するさらなる洞察を与えるという課題に対する新しい解決策を提供する。

Mental health disorders remain a significant challenge in modern healthcare, with diagnosis and treatment often relying on subjective patient descriptions and past medical history. To address this issue, we propose a personalized mental health tracking and mood prediction system that utilizes patient physiological data collected through personal health devices. Our system leverages a decentralized learning mechanism that combines transfer and federated machine learning concepts using smart contracts, allowing data to remain on users' devices and enabling effective tracking of mental health conditions for psychiatric treatment and management in a privacy-aware and accountable manner. We evaluate our model using a popular mental health dataset that demonstrates promising results. By utilizing connected health systems and machine learning models, our approach offers a novel solution to the challenge of providing psychiatrists with further insight into their patients' mental health outside of traditional office visits.

翻訳日:2023-07-12 17:29:41 公開日:2023-07-09

# 機械学習とニューラルネットワークを用いた脳波信号の感情解析

Emotion Analysis on EEG Signal Using Machine Learning and Neural Network ( http://arxiv.org/abs/2307.05375v1 )

ライセンス: Link先を確認

S. M. Masrur Ahmed (1), Eshaan Tanzim Sabur (2) ((1) bKash Limited, (2) BRAC University)

(参考訳) 感情は他人の考えや相互作用に大きな影響を与える。これは、その人の気持ちと行動とを結びつける役割を担っているが、時には人生の判断に影響を及ぼすともいえる。感情のパターンとその反射は人によって異なるため、その調査は幅広い地域において有効であるアプローチに基づいて行われる必要がある。特徴を抽出し精度を高めるため、脳波や脳波信号を用いた感情認識には、効率的な信号処理技術の実装が必要である。人間と機械の相互作用技術への様々なアプローチは長い間進行中であり、近年では脳信号を使って感情を自動的に理解することに成功した。本研究では、SVM(Support Vector Machine)、KNN(K-Nearest Neighbor)、LSTM(Long Short Term Memory)をトレーニングした先進ニューラルネットワークモデルRNN(Recurrent Neural Network)を用いて、よく知られた公開データセットであるDEAPデータセットから収集された脳波信号に基づいて、いくつかの感情状態を分類、検証した。本研究の目的は,脳信号を用いた感情認識性能を改善する方法を改善することである。一方、感情は時間とともに変化します。その結果,時間経過に伴う感情の変化についても検討した。

Emotion has a significant influence on how one thinks and interacts with others. It serves as a link between how a person feels and the actions one takes, and it can on occasion influence one's life decisions. Since the patterns of emotions and their reflections vary from person to person, their study must be based on approaches that are effective over a wide range of population regions. To extract features and enhance accuracy, emotion recognition using brain waves or EEG signals requires the implementation of efficient signal processing techniques. Various approaches to human-machine interaction technologies have been pursued for a long time, and in recent years, researchers have had great success in automatically understanding emotion using brain signals. In our research, several emotional states were classified and tested on EEG signals collected from a well-known publicly available dataset, the DEAP Dataset, using SVM (Support Vector Machine), KNN (K-Nearest Neighbor), and an advanced neural network model, RNN (Recurrent Neural Network), trained with LSTM (Long Short Term Memory). The main purpose of this study is to find ways to improve emotion recognition performance using brain signals. Emotions, however, can change with time. As a result, the changes in emotion over time are also examined in our research.

翻訳日:2023-07-12 14:15:44 公開日:2023-07-09

# 書き直し規則による線形プログラム間の等価性を示す自己教師付き学習

Self-Supervised Learning to Prove Equivalence Between Straight-Line Programs via Rewrite Rules ( http://arxiv.org/abs/2109.10476v4 )

ライセンス: Link先を確認

Steve Kommrusch, Martin Monperrus and Louis-Noël Pouchet

(参考訳) 文列からなる2つのプログラム間の意味同値性の証明を自動合成する問題を対象とする。抽象構文木(AST)を用いてプログラムを表現し、特定のASTパターンに特定のセマンティクス保存規則を適用することで、変換され、意味的に等価なプログラムを生成する。このシステムでは、2つのプログラムが等価であり、この書き換え規則の適用順序が1つのプログラムをもう1つのプログラムに書き換える結果となる場合である。本稿では,プログラムペア間の等価性の証明を生成するトランスフォーマーモデルに基づくニューラルネットワークアーキテクチャを提案する。システムは書き直しのシーケンスを出力し、そのシーケンスの妥当性は、適用可能な検証によって単純にチェックされる。ニューラルネットワークによって有効なシーケンスが生成されない場合、システムはプログラムを等価でないと報告し、設計によってプログラムが不正に等価であると報告されないようにする。本システムは,関数呼び出しと複数の型を持つ直線プログラムを表現可能な単一文法に対して完全に実装されている。このようなシーケンスを生成するシステムを効率的にトレーニングするために,自己教師付きサンプル選択という独自のインクリメンタルトレーニング手法を開発した。本稿では,この新たなトレーニング手法の有効性を,複雑さと長さの増大の証明に広く研究する。私たちのシステムであるs4eqは、1万組の同等プログラムのデータセットで97%の証明成功を達成しています。

We target the problem of automatically synthesizing proofs of semantic equivalence between two programs made of sequences of statements. We represent programs using abstract syntax trees (AST), where a given set of semantics-preserving rewrite rules can be applied on a specific AST pattern to generate a transformed and semantically equivalent program. In our system, two programs are equivalent if there exists a sequence of applications of these rewrite rules that rewrites one program into the other. We propose a neural network architecture based on a transformer model to generate proofs of equivalence between program pairs. The system outputs a sequence of rewrites, and the validity of the sequence is checked simply by verifying that it can be applied. If no valid sequence is produced by the neural network, the system reports the programs as non-equivalent, ensuring by design that no programs are incorrectly reported as equivalent. Our system is fully implemented for a single grammar that can represent straight-line programs with function calls and multiple types. To efficiently train the system to generate such sequences, we develop an original incremental training technique, named self-supervised sample selection. We extensively study the effectiveness of this novel training approach on proofs of increasing complexity and length. Our system, S4Eq, achieves 97% proof success on a curated dataset of 10,000 pairs of equivalent programs.

翻訳日:2023-07-11 23:06:01 公開日:2023-07-09
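
The abstract above says a proposed proof is validated simply by checking that the rewrite sequence can be applied. The sketch below illustrates that idea on a toy expression language. The two rules, the tuple encoding, and all names are assumptions made for illustration; they are not S4Eq's grammar or rule set.

```python
# Toy expression trees: ("+", a, b), ("*", a, b), or a leaf string such as "x".

def commute(node):
    """a op b -> b op a for commutative operators; None if not applicable."""
    if isinstance(node, tuple) and node[0] in ("+", "*"):
        return (node[0], node[2], node[1])
    return None


def add_zero(node):
    """a + 0 -> a; None if the pattern does not match."""
    if isinstance(node, tuple) and node[0] == "+" and node[2] == "0":
        return node[1]
    return None


RULES = {"commute": commute, "add_zero": add_zero}


def apply_at(tree, path, rule):
    """Apply rule at the subtree reached by path (child indices 1 or 2)."""
    if not path:
        return rule(tree)
    if not isinstance(tree, tuple):
        return None
    i = path[0]
    if i < 1 or i >= len(tree):
        return None
    new_child = apply_at(tree[i], path[1:], rule)
    if new_child is None:
        return None
    return tree[:i] + (new_child,) + tree[i + 1:]


def check_proof(src, dst, proof):
    """True only if every step applies and the final tree equals dst."""
    current = src
    for rule_name, path in proof:
        rule = RULES.get(rule_name)
        if rule is None:
            return False
        current = apply_at(current, path, rule)
        if current is None:
            return False
    return current == dst


src = ("*", ("+", "0", "x"), "y")   # (0 + x) * y
dst = ("*", "y", "x")               # y * x
proof = [("commute", (1,)), ("add_zero", (1,)), ("commute", ())]
print(check_proof(src, dst, proof))
```

Because the checker rejects any sequence that fails to apply or ends on a different tree, a wrong proof can only cause a missed equivalence, never a false one, which matches the safety property described in the abstract.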

# SeedGNN: 教師付きグラフマッチングのためのグラフニューラルネットワーク

SeedGNN: Graph Neural Networks for Supervised Seeded Graph Matching ( http://arxiv.org/abs/2205.13679v3 )

ライセンス: Link先を確認

Liren Yu, Jiaming Xu, Xiaojun Lin

(参考訳) グラフマッチングのためのグラフニューラルネットワーク(gnns)の設計には、トポロジカルな情報と小さなシードノードのみを使用して、2つのラベルのないグラフをマッチングすることを目的としている。しかし、このタスクの以前のgnnのほとんどは半教師付きアプローチを使用しており、大量の種を必要とし、見当たらないグラフに転送可能な知識を学べない。対照的に本論文では,未発見のグラフを数種の種とマッチングする方法を学習する新しい教師付きアプローチを提案する。私たちのSeedGNNアーキテクチャは、シードグラフマッチングの理論研究に触発された、いくつかの新しい設計を取り入れています。 1) 異なる大きさのグラフに一般化できる方法で、異なるホップから目撃者のような情報を計算し使用することを学ぶことができる。 2) 容易に整合したノードペアを新しいシードとして使用して,その後のレイヤでの整合性を改善する。合成グラフおよび実世界のグラフ上でのSeedGNNの評価を行い,既存の文献における非学習アルゴリズムと学習アルゴリズムを比較検討した。さらに,学習グラフからseedgnnから得られた知識を,サイズやカテゴリの異なるテストグラフに一般化できることを確認した。

There is a growing interest in designing Graph Neural Networks (GNNs) for seeded graph matching, which aims to match two unlabeled graphs using only topological information and a small set of seed nodes. However, most previous GNNs for this task use a semi-supervised approach, which requires a large number of seeds and cannot learn knowledge that is transferable to unseen graphs. In contrast, this paper proposes a new supervised approach that can learn from a training set how to match unseen graphs with only a few seeds. Our SeedGNN architecture incorporates several novel designs, inspired by theoretical studies of seeded graph matching: 1) it can learn to compute and use witness-like information from different hops, in a way that can be generalized to graphs of different sizes; 2) it can use easily-matched node-pairs as new seeds to improve the matching in subsequent layers. We evaluate SeedGNN on synthetic and real-world graphs and demonstrate significant performance improvements over both non-learning and learning algorithms in the existing literature. Furthermore, our experiments confirm that the knowledge learned by SeedGNN from training graphs can be generalized to test graphs of different sizes and categories.

翻訳日:2023-07-11 22:56:37 公開日:2023-07-09
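
The abstract above mentions "witness-like information". In the seeded graph matching literature, a witness for a candidate pair (u, v) is usually a seed pair whose two members are neighbors of u and v respectively. The sketch below counts such 1-hop witnesses. It is only an assumption-laden illustration of that notion, not the SeedGNN architecture, and all names are hypothetical.

```python
def witness_counts(adj1, adj2, seeds):
    """Count 1-hop witnesses for every candidate node pair.

    adj1, adj2: dicts mapping each node to the set of its neighbors.
    seeds: dict mapping seed nodes of graph 1 to their matches in graph 2.
    """
    counts = {}
    for u in adj1:
        for v in adj2:
            counts[(u, v)] = sum(
                1
                for s1, s2 in seeds.items()
                if s1 in adj1[u] and s2 in adj2[v]
            )
    return counts


# Two copies of the same small tree, with nodes 0..3 matching 'a'..'d'.
adj1 = {0: {1, 2}, 1: {0, 3}, 2: {0}, 3: {1}}
adj2 = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a"}, "d": {"b"}}
seeds = {0: "a", 3: "d"}

counts = witness_counts(adj1, adj2, seeds)
print(counts[(1, "b")], counts[(1, "c")])
```

In this example the true pair (1, 'b') collects more witnesses than the wrong pair (1, 'c'), which is the signal that witness-based matching exploits.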

# 量子最適化アルゴリズムはどの程度必要か?

How Much Entanglement Do Quantum Optimization Algorithms Require? ( http://arxiv.org/abs/2205.12283v2 )

ライセンス: Link先を確認

Yanzhu Chen, Linghua Zhu, Chenxu Liu, Nicholas J. Mayhall, Edwin Barnes, and Sophia E. Economou

(参考訳) 多くの古典的最適化問題は、量子近似最適化アルゴリズム(qaoa)のような変分量子アルゴリズムがヒューリスティックな手法を提供する対角イジングハミルトンの基底状態を見つけるためにマッピングすることができる。このような古典的最適化問題の解は必ずしも積状態であるため、絡み合いが性能に与える影響は明らかでない。 QAOAのAdaptive Derivative-Assembled Problem-Tailored (ADAPT) 変動は、回路全体のCNOTゲートが少なくなるのに対して、ミキサー層におけるエンタングリング操作を許容することで収束率を向上させる。本研究では,ADAPT-QAOAの実行時に発生する絡みについて検討する。重み付きMax-Cut問題のシミュレーションにより、ADAPT-QAOAは量子ビットのエンタングおよびアンタングリングにおいてかなりの柔軟性を示すことを示す。この柔軟性を漸進的に制限することにより、初期におけるより多くの絡み合いエントロピーが、後段におけるより速い収束と一致することが分かる。対照的に、標準QAOAはいくつかの層内での絡み合いを迅速に生成するが、過剰な絡み合いを効率的に除去することはできない。量子最適化における絡み合いの役割は微妙であり、量子最適化アルゴリズムに有利な特徴を構築するためのガイダンスを提供する。

Many classical optimization problems can be mapped to finding the ground states of diagonal Ising Hamiltonians, for which variational quantum algorithms such as the Quantum Approximate Optimization Algorithm (QAOA) provide heuristic methods. Because the solutions of such classical optimization problems are necessarily product states, it is unclear how entanglement affects their performance. An Adaptive Derivative-Assembled Problem-Tailored (ADAPT) variation of QAOA improves the convergence rate by allowing entangling operations in the mixer layers while requiring fewer CNOT gates in the entire circuit. In this work, we study the entanglement generated during the execution of ADAPT-QAOA. Through simulations of the weighted Max-Cut problem, we show that ADAPT-QAOA exhibits substantial flexibility in entangling and disentangling qubits. By incrementally restricting this flexibility, we find that a larger amount of entanglement entropy at earlier stages coincides with faster convergence at later stages. In contrast, while the standard QAOA quickly generates entanglement within a few layers, it cannot remove excess entanglement efficiently. Our results demonstrate that the role of entanglement in quantum optimization is subtle and provide guidance for building favorable features into quantum optimization algorithms.

翻訳日:2023-07-11 22:56:16 公開日:2023-07-09
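
The abstract above relies on mapping weighted Max-Cut to the ground state of a diagonal Ising Hamiltonian. The classical sketch below shows that mapping on a tiny graph: with spins of ±1 and energy E = Σ w_ij z_i z_j, the cut weight equals (W_total − E) / 2, so minimizing the energy maximizes the cut. The sign convention and names are assumptions; the paper may use a different but equivalent convention, and no quantum simulation is attempted here.

```python
from itertools import product


def ising_energy(edges, spins):
    """E = sum over edges of w * z_i * z_j, with spins in {-1, +1}."""
    return sum(w * spins[i] * spins[j] for i, j, w in edges)


def cut_weight(edges, spins):
    """Total weight of edges whose endpoints carry opposite spins."""
    return sum(w for i, j, w in edges if spins[i] != spins[j])


def brute_force_max_cut(n, edges):
    """Exhaustive search over all 2**n assignments; small graphs only."""
    best = max(product((-1, 1), repeat=n), key=lambda s: cut_weight(edges, s))
    return best, cut_weight(edges, best)


# Unit-weight triangle: at most two of the three edges can be cut.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
total_weight = sum(w for _, _, w in edges)
best_spins, best_cut = brute_force_max_cut(3, edges)
print(best_spins, best_cut, ising_energy(edges, best_spins))
```

Every spin assignment is a computational basis state, which is the sense in which the abstract calls the optimal solutions product states.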

# 未知動環境における高速運動計画のための障害物同定と楕円形分解

Obstacle Identification and Ellipsoidal Decomposition for Fast Motion Planning in Unknown Dynamic Environments ( http://arxiv.org/abs/2209.14233v4 )

ライセンス: Link先を確認

Mehmetcan Kaymaz and Nazim Kemal Ure

(参考訳) 未知の環境における動的障害物の存在による衝突回避は、無人システムにとって最も重要な課題の1つである。本稿では,楕円体の観点から障害物を識別し,線形および角障害物速度を推定する手法を提案する。提案手法は,任意の物体を楕円体で近似的に表現できるという考えに基づいている。そこで本研究では,ガウス混合モデルの変分ベイズ推定法,カチヤンアルゴリズム,精細化アルゴリズムを提案する。提案手法はクラスタ数の知識を必要とせず,既存の最適化手法と異なり,リアルタイムに動作可能である。さらに,2つの時間的近接点フレームの障害物に一致する楕円型特徴ベクトルを定義する。本手法は, 回転する障害物を含む静的および動的障害のある環境に適用することができる。このアルゴリズムを他のクラスタリング手法と比較し,軌道プランナーと組み合わせることで,動的障害が存在する場合,システム全体が未知の環境を効率的に横断できることを示す。

Collision avoidance in the presence of dynamic obstacles in unknown environments is one of the most critical challenges for unmanned systems. In this paper, we present a method that identifies obstacles in terms of ellipsoids to estimate linear and angular obstacle velocities. Our proposed method is based on the idea that any object can be approximately expressed by ellipsoids. To achieve this, we propose a method based on variational Bayesian estimation of a Gaussian mixture model, the Khachiyan algorithm, and a refinement algorithm. Our proposed method does not require knowledge of the number of clusters and can operate in real time, unlike existing optimization-based methods. In addition, we define an ellipsoid-based feature vector to match obstacles given two temporally close point frames. Our method can be applied to any environment with static and dynamic obstacles, including ones with rotating obstacles. We compare our algorithm with other clustering methods and show that when coupled with a trajectory planner, the overall system can efficiently traverse unknown environments in the presence of dynamic obstacles.

翻訳日:2023-07-11 22:47:15 公開日:2023-07-09

# 深層学習のための勾配に基づくbiレベル最適化に関する研究

Gradient-based Bi-level Optimization for Deep Learning: A Survey ( http://arxiv.org/abs/2207.11719v4 )

ライセンス: Link先を確認

Can Chen, Xi Chen, Chen Ma, Zixuan Liu, Xue Liu

(参考訳) 双レベル最適化,特に勾配に基づくカテゴリは,ハイパーパラメータ最適化やメタ知識抽出など,ディープラーニングコミュニティで広く利用されている。双レベル最適化は別の問題に埋め込まれ、勾配に基づくカテゴリは、進化アルゴリズムのような古典的な手法よりもはるかに効率的な過次性を計算することによって、外層課題を解決する。本研究では,まず,勾配に基づくbiレベル最適化を形式的に定義する。次に、二段階最適化に研究課題が適しているかどうかを判断するための基準を明確にし、これらの問題を二段階最適化フレームワークに構造化するための実践的なガイドを提供する。具体的には、正規化パラメータや蒸留データなどのハイパーパラメータを最適化するシングルタスク定式化と、モデル初期化のようなメタ知識を抽出するマルチタスク定式化の2つがある。次に,2段階の定式化により,外変数の明示的な勾配更新,プロキシ更新,暗黙的関数更新,クローズドフォーム更新を含む4つの2段階最適化ソルバについて検討する。最後に,(1)課題定式化のレンズを通して検証した科学における効果的なデータ最適化の2つの今後の方向性を強調することで調査をまとめる。 2)最適化の観点から解析した正確な明示的プロキシ更新。

Bi-level optimization, especially the gradient-based category, has been widely used in the deep learning community including hyperparameter optimization and meta-knowledge extraction. Bi-level optimization embeds one problem within another and the gradient-based category solves the outer-level task by computing the hypergradient, which is much more efficient than classical methods such as the evolutionary algorithm. In this survey, we first give a formal definition of the gradient-based bi-level optimization. Next, we delineate criteria to determine if a research problem is apt for bi-level optimization and provide a practical guide on structuring such problems into a bi-level optimization framework, a feature particularly beneficial for those new to this domain. More specifically, there are two formulations: the single-task formulation to optimize hyperparameters such as regularization parameters and the distilled data, and the multi-task formulation to extract meta-knowledge such as the model initialization. With a bi-level formulation, we then discuss four bi-level optimization solvers to update the outer variable including explicit gradient update, proxy update, implicit function update, and closed-form update. Finally, we wrap up the survey by highlighting two prospective future directions: (1) Effective Data Optimization for Science examined through the lens of task formulation. (2) Accurate Explicit Proxy Update analyzed from an optimization standpoint.

翻訳日:2023-07-11 22:46:08 公開日:2023-07-09
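
The abstract above refers to a formal definition of gradient-based bi-level optimization and to the hypergradient. A generic statement, written with assumed symbols (outer variable $\lambda$, inner variable $\theta$, losses $\mathcal{L}_{\mathrm{out}}$ and $\mathcal{L}_{\mathrm{in}}$) that may differ from the survey's own notation, can be sketched as:

```latex
\begin{aligned}
\min_{\lambda}\quad & \mathcal{L}_{\mathrm{out}}\bigl(\theta^{*}(\lambda), \lambda\bigr) \\
\text{s.t.}\quad & \theta^{*}(\lambda) = \arg\min_{\theta}\, \mathcal{L}_{\mathrm{in}}(\theta, \lambda) \\[4pt]
\frac{d\mathcal{L}_{\mathrm{out}}}{d\lambda}
  &= \frac{\partial \mathcal{L}_{\mathrm{out}}}{\partial \lambda}
   + \Bigl(\frac{\partial \theta^{*}}{\partial \lambda}\Bigr)^{\!\top}
     \frac{\partial \mathcal{L}_{\mathrm{out}}}{\partial \theta}
\end{aligned}
```

The last line is the chain rule for the hypergradient. The four solver families named in the abstract can be read as different ways of obtaining or approximating the term $\partial \theta^{*} / \partial \lambda$.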

# アダマール試験と近似振幅制約を用いた量子ゲーマン-ウィリアムソンアルゴリズム

Quantum Goemans-Williamson Algorithm with the Hadamard Test and Approximate Amplitude Constraints ( http://arxiv.org/abs/2206.14999v3 )

ライセンス: Link先を確認

Taylor L. Patti, Jean Kossaifi, Anima Anandkumar, and Susanne F. Yelin

(参考訳) 半有限プログラムは、難しい組合せ問題を近似するなど、幅広い応用の最適化手法である。そのような半定義のプログラムの1つは、一般的な整数緩和法であるゴーマンス・ウィリアムソンアルゴリズムである。我々は、最大$N=2^n$変数と$M \sim O(N)$制約を持つ半定値プログラムを近似的に解くために、n{+}1$ qubits, a constant number of circuit prepareds, $\text{poly}(n)$ expectation値のみを使用するGoemans-Williamsonアルゴリズムの変分量子アルゴリズムを導入する。効率的な最適化は、目的行列を補助量子ビット上で適切にパラメータ化されたユニタリ条件として符号化することで達成される。アダマールテストにより、指数的に多くの期待値を推定するのではなく、1つの期待値のみを推定することで、目的関数を最適化することができる。同様に、半定値プログラミングの制約は、パウリ弦振幅制約の多項式数を課すとともに、第2アダマールテストを実装することで効果的に実施できることを示す。我々は,Guemans-Williamson アルゴリズムの効率的な量子実装を MaxCut を含む様々なNPハード問題に対して考案し,提案プロトコルの有効性を実証する。本手法は,GSetライブラリから得られた多種多様なMaxCut問題に対する類似の古典的手法の性能を上回る。

Semidefinite programs are optimization methods with a wide array of applications, such as approximating difficult combinatorial problems. One such semidefinite program is the Goemans-Williamson algorithm, a popular integer relaxation technique. We introduce a variational quantum algorithm for the Goemans-Williamson algorithm that uses only $n{+}1$ qubits, a constant number of circuit preparations, and $\text{poly}(n)$ expectation values in order to approximately solve semidefinite programs with up to $N=2^n$ variables and $M \sim O(N)$ constraints. Efficient optimization is achieved by encoding the objective matrix as a properly parameterized unitary conditioned on an auxiliary qubit, a technique known as the Hadamard Test. The Hadamard Test enables us to optimize the objective function by estimating only a single expectation value of the ancilla qubit, rather than separately estimating exponentially many expectation values. Similarly, we illustrate that the semidefinite programming constraints can be effectively enforced by implementing a second Hadamard Test, as well as imposing a polynomial number of Pauli string amplitude constraints. We demonstrate the effectiveness of our protocol by devising an efficient quantum implementation of the Goemans-Williamson algorithm for various NP-hard problems, including MaxCut. Our method exceeds the performance of analogous classical methods on a diverse subset of well-studied MaxCut problems from the GSet library.

Translated: 2023-07-11 22:45:22 Published: 2023-07-09

# Beating the fault-tolerance bound and security loopholes for Byzantine agreement with a quantum solution ( http://arxiv.org/abs/2206.09159v2 )

License: check the linked page

Chen-Xun Weng, Rui-Qi Gao, Yu Bao, Bing-Hong Li, Wen-Bo Liu, Yuan-Mei Xie, Yu-Shuo Lu, Hua-Lei Yin, Zeng-Bing Chen

Byzantine agreement, the underlying core of blockchain, aims to make every node in a decentralized network reach consensus. Classical Byzantine agreements unavoidably face two major problems. One is the $1/3$ fault-tolerance bound, which means that a system tolerating $f$ malicious players requires at least $3f+1$ players. The other is the security loopholes of the classical cryptographic methods they rely on. Here, we propose a strict quantum Byzantine agreement with unconditional security that breaks this bound, reaching nearly $1/2$ fault tolerance thanks to the multiparty correlation provided by quantum digital signatures. Our work strictly obeys the original Byzantine conditions and can be extended to any number of players without requiring multiparticle entanglement. We experimentally demonstrate three-party and five-party quantum consensus for a digital ledger. Our work indicates the quantum advantage in terms of consensus problems and suggests an important avenue for quantum blockchain and quantum consensus networks.
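
The classical $3f+1$ requirement quoted above can be evaluated directly. The sketch below covers only the classical bound; the helper names are illustrative, and the exact tolerance of the quantum protocol is not encoded because the abstract gives it only as "nearly $1/2$".

```python
def min_players_classical(f):
    """Players needed to tolerate f malicious players under the classical 1/3 bound."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def max_faults_classical(n):
    """Largest f with 3f + 1 <= n, i.e. floor((n - 1) / 3)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3


# Network sizes matching the three- and five-party experiments in the abstract.
for n in (3, 5):
    print(n, "players ->", max_faults_classical(n), "tolerable faults (classical bound)")
```

Under the classical bound a three-player network tolerates no malicious player and a five-player network tolerates one, which is the regime a tolerance approaching $1/2$ would improve on.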

Translated: 2023-07-11 22:44:54 Published: 2023-07-09

# Mixed integer linear optimization formulations for learning optimal binary classification trees ( http://arxiv.org/abs/2206.04857v2 )

License: check the linked page

Brandon Alston, Hamidreza Validi, Illya V. Hicks

Decision trees are powerful tools for classification and regression that attract many researchers working in the burgeoning area of machine learning. One advantage of decision trees over other methods is their interpretability, which is often preferred over other higher-accuracy methods that are relatively uninterpretable. A binary classification tree has two types of vertices: (i) branching vertices, which have exactly two children and where datapoints are assessed on a set of discrete features; and (ii) leaf vertices, at which datapoints are given a discrete prediction. An optimal binary classification tree can be obtained by solving a biobjective optimization problem that seeks to (i) maximize the number of correctly classified datapoints and (ii) minimize the number of branching vertices. In this paper, we propose four mixed integer linear optimization (MILO) formulations for designing optimal binary classification trees: two flow-based formulations and two cut-based formulations. We provide theoretical comparisons between our proposed formulations and the strongest flow-based MILO formulation of Aghaei et al. (2021). We conduct experiments on 13 publicly available datasets to show the models' ability to scale and the strength of a biobjective approach using Pareto frontiers. Our code and data are available on GitHub.
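
The biobjective trade-off described above (more correctly classified datapoints, fewer branching vertices) can be illustrated with a small Pareto filter. This is a sketch on made-up numbers, not the paper's MILO formulations; `pareto_frontier` and the candidate trees are hypothetical.

```python
def pareto_frontier(candidates):
    """Keep trees not dominated in (more correct, fewer branching vertices).

    Each candidate is a (num_correct, num_branching) pair. A candidate is
    dominated if another one is at least as good on both objectives and
    strictly better on at least one.
    """
    frontier = []
    for correct, branches in candidates:
        dominated = any(
            c >= correct and b <= branches and (c > correct or b < branches)
            for c, b in candidates
        )
        if not dominated and (correct, branches) not in frontier:
            frontier.append((correct, branches))
    return sorted(frontier, key=lambda pair: pair[1])


# Hypothetical (correctly classified, branching vertices) results for five trees.
trees = [(70, 1), (85, 3), (84, 4), (90, 7), (85, 5)]
frontier = pareto_frontier(trees)
```

Each point left on the frontier is a tree for which no other candidate is both at least as accurate and at least as small.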

Translated: 2023-07-11 22:44:39 Published: 2023-07-09

# Simulating Markovian open quantum systems using higher-order series expansion ( http://arxiv.org/abs/2212.02051v2 )

License: check the linked page

Xiantao Li, Chunhao Wang

We present an efficient quantum algorithm for simulating the dynamics of Markovian open quantum systems. The performance of our algorithm is similar to that of the previous state-of-the-art quantum algorithm, i.e., it scales linearly in evolution time and poly-logarithmically in inverse precision. However, our algorithm is conceptually cleaner, and it only uses simple quantum primitives without compressed encoding. Our approach is based on a novel mathematical treatment of the evolution map, which involves a higher-order series expansion based on Duhamel's principle and approximating multiple integrals using scaled Gaussian quadrature. Our method easily generalizes to simulating quantum dynamics with time-dependent Lindbladians. Furthermore, our method of approximating multiple integrals using scaled Gaussian quadrature could potentially be used to produce a more efficient approximation of time-ordered integrals, and can therefore simplify existing quantum algorithms for simulating time-dependent Hamiltonians based on a truncated Dyson series.

Translated: 2023-07-11 22:36:37 Published: 2023-07-09

# Rejecting noise in Baikal-GVD data with neural networks ( http://arxiv.org/abs/2210.04653v2 )

License: check the linked page

I. Kharuk, G. Rubtsov, G. Safronov

Baikal-GVD is a large ($\sim$1 km$^3$) underwater neutrino telescope installed in the fresh waters of Lake Baikal. The deep lake water environment is pervaded by background light, which is detectable by Baikal-GVD's photosensors. We introduce a neural network for efficiently separating these noise hits from the signal ones, which stem from the propagation of relativistic particles through the detector. The model has a U-net-like architecture and employs the temporal (causal) structure of events. The neural network's metrics reach up to 99\% signal purity (precision) and 96\% survival efficiency (recall) on a Monte Carlo simulated dataset. We compare the developed method with the algorithmic approach to rejecting the noise and discuss other possible neural network architectures, including graph-based ones.
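
The two metrics named above follow the usual definitions for a binary signal/noise classifier. The sketch below spells them out; the hit counts are hypothetical and are not taken from the paper.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) for the signal class.

    precision = TP / (TP + FP): fraction of hits kept as signal that are signal.
    recall    = TP / (TP + FN): fraction of true signal hits that survive.
    """
    if true_pos + false_pos == 0 or true_pos + false_neg == 0:
        raise ValueError("precision or recall is undefined for these counts")
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Hypothetical hit counts, not taken from the paper.
purity, efficiency = precision_recall(true_pos=960, false_pos=10, false_neg=40)
```

Here "purity" corresponds to precision and "survival efficiency" to recall, matching the parenthetical labels in the abstract.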

Translated: 2023-07-11 22:33:45 Published: 2023-07-09

# Student's t-Distribution: On Measuring the Inter-Rater Reliability When the Observations are Scarce ( http://arxiv.org/abs/2303.04526v2 )

License: check the linked page

Serge Gladkoff and Lifeng Han and Goran Nenadic

In natural language processing (NLP) we always rely on human judgement as the golden quality evaluation method. However, there has been an ongoing debate on how to better evaluate inter-rater reliability (IRR) levels for certain evaluation tasks, such as translation quality evaluation (TQE), especially when the data samples (observations) are very scarce. In this work, we first introduce a study of how to estimate the confidence interval for the measurement value when only one data (evaluation) point is available. This leads to our example with two human-generated observational scores, for which we introduce the Student's \textit{t}-distribution method and explain how to use it to measure the IRR score using only these two data points, as well as the confidence intervals (CIs) of the quality evaluation. We give a quantitative analysis of how the evaluation confidence can be greatly improved by introducing more observations, even just one extra observation. We encourage researchers to report their IRR scores by all possible means, e.g. using the Student's \textit{t}-distribution method whenever possible, thus making NLP evaluation more meaningful, transparent, and trustworthy. This \textit{t}-distribution method can also be used outside of NLP fields to measure the IRR level for trustworthy evaluation of experimental investigations whenever the observational data is scarce. Keywords: Inter-Rater Reliability (IRR); Scarce Observations; Confidence Intervals (CIs); Natural Language Processing (NLP); Translation Quality Evaluation (TQE); Student's \textit{t}-Distribution
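
As a rough illustration of why one extra observation matters, the sketch below computes the textbook 95% Student's $t$ confidence interval for a mean from two and then three scores. It is the standard interval, which may differ in detail from the paper's IRR procedure; the scores and the helper name `t_confidence_interval` are hypothetical.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t for small degrees of freedom
# (standard table values, rounded to three decimals).
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776}


def t_confidence_interval(scores):
    """Return (mean, margin) of the 95% t-interval for the mean of `scores`."""
    n = len(scores)
    if n < 2 or n - 1 not in T_CRIT_95:
        raise ValueError("this sketch supports 2 to 5 observations")
    mean = statistics.mean(scores)
    std_err = statistics.stdev(scores) / math.sqrt(n)  # sample std, n - 1 divisor
    return mean, T_CRIT_95[n - 1] * std_err


# Hypothetical quality scores from two raters, then with a third rater added.
mean2, margin2 = t_confidence_interval([80, 90])
mean3, margin3 = t_confidence_interval([80, 90, 85])
```

With two scores the critical value for one degree of freedom (12.706) makes the interval very wide; a third score drops the critical value to 4.303 and shrinks the standard error, so the margin falls sharply.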

Translated: 2023-07-11 22:16:24 Published: 2023-07-09

# Coefficients of almost-degenerate density matrix perturbation theory for eigenvalue problems ( http://arxiv.org/abs/2305.09026v2 )

License: check the linked page

Charles Arnal, Louis Garrigue

We investigate almost-degenerate perturbation theory of eigenvalue problems, using spectral projectors, also called density matrices. When several eigenvalues are close to each other, the coefficients of the perturbative series become singular because inverses of differences between eigenvalues arise as factors. We remove those artificial singularities in the expressions of the coefficients of the series, allowing eigenvalue gaps to be arbitrarily small and even vanishing in the resulting formulas.

Translated: 2023-07-11 22:07:41 Published: 2023-07-09

# Conditional Graph Information Bottleneck for Molecular Relational Learning ( http://arxiv.org/abs/2305.01520v2 )

License: check the linked page

Namkyeong Lee, Dongmin Hyun, Gyoung S. Na, Sungwon Kim, Junseok Lee, Chanyoung Park

Molecular relational learning, whose goal is to learn the interaction behavior between molecular pairs, has attracted a surge of interest in molecular sciences due to its wide range of applications. Recently, graph neural networks have shown great success in molecular relational learning by modeling a molecule as a graph structure and considering atom-level interactions between two molecules. Despite their success, existing molecular relational learning methods tend to overlook the nature of chemistry, i.e., a chemical compound is composed of multiple substructures, such as functional groups, that cause distinctive chemical reactions. In this work, we propose a novel relational learning framework, called CGIB, that predicts the interaction behavior between a pair of graphs by detecting core subgraphs therein. The main idea is, given a pair of graphs, to find a subgraph of one graph that contains the minimal sufficient information regarding the task at hand, conditioned on the paired graph, based on the principle of conditional graph information bottleneck. We argue that our proposed method mimics the nature of chemical reactions, i.e., the core substructure of a molecule varies depending on which other molecule it interacts with. Extensive experiments on various tasks with real-world datasets demonstrate the superiority of CGIB over state-of-the-art baselines. Our code is available at https://github.com/Namkyeong/CGIB.

Translated: 2023-07-11 22:07:32 Published: 2023-07-09

# Learning Graph Patterns of Reflection Coefficient for Non-destructive Diagnosis of Cu Interconnects ( http://arxiv.org/abs/2304.10207v2 )

License: check the linked page

Tae Yeob Kang, Haebom Lee, Sungho Suh

With the increasing operating frequencies and clock speeds in processors, interconnects affect both the reliability and performance of entire electronic systems. Fault detection and diagnosis of the interconnects are crucial for prognostics and health management (PHM) of electronics. However, traditional approaches using electrical signals as prognostic factors often face challenges in distinguishing defect root causes, necessitating additional destructive evaluations, and are prone to noise interference, leading to potential false alarms. To address these limitations, this paper introduces a novel approach for non-destructive detection and diagnosis of defects in Cu interconnects, offering early detection, enhanced diagnostic accuracy, and noise resilience. Our approach uniquely analyzes both the root cause and severity of interconnect defects by leveraging graph patterns of the reflection coefficient, a technique distinct from traditional time series signal analysis. We experimentally demonstrate that the graph patterns possess the capability for fault diagnosis and serve as effective input data for learning algorithms. Additionally, we introduce a novel severity rating ensemble learning (SREL) approach, which significantly enhances diagnostic accuracy and noise robustness. Experimental results demonstrate that the proposed method outperforms conventional machine learning methods and multi-class convolutional neural networks (CNN), achieving a maximum accuracy of 99.3%, especially under elevated noise levels.

Translated: 2023-07-11 22:07:14 Published: 2023-07-09

# SimbaML: Connecting Mechanistic Models and Machine Learning with Augmented Data ( http://arxiv.org/abs/2304.04000v2 )

License: check the linked page

Maximilian Kleissl, Lukas Drews, Benedict B. Heyder, Julian Zabbarov, Pascal Iversen, Simon Witzke, Bernhard Y. Renard, Katharina Baum

Training sophisticated machine learning (ML) models requires large datasets that are difficult or expensive to collect for many applications. If prior knowledge about system dynamics is available, mechanistic representations can be used to supplement real-world data. We present SimbaML (Simulation-Based ML), an open-source tool that unifies realistic synthetic dataset generation from ordinary differential equation-based models with their direct analysis and inclusion in ML pipelines. SimbaML conveniently enables investigating transfer learning from synthetic to real-world data, data augmentation, identifying needs for data collection, and benchmarking physics-informed ML approaches. SimbaML is available from https://pypi.org/project/simba-ml/.

Translated: 2023-07-11 22:05:42 Published: 2023-07-09

# Unsupervised Cross-Domain Soft Sensor Modelling via Deep Physics-Inspired Particle Flow Bayes ( http://arxiv.org/abs/2306.04919v4 )

License: check the linked page

Junn Yong Loo, Ze Yang Ding, Surya G. Nurzaman, Chee-Ming Ting, Vishnu Monn Baskaran and Chee Pin Tan

Data-driven soft sensors are essential for achieving accurate perception through reliable state inference. However, developing representative soft sensor models is challenged by issues such as missing labels, domain adaptability, and temporal coherence in data. To address these challenges, we propose a deep Particle Flow Bayes (DPFB) framework for cross-domain soft sensor modeling in the absence of target state labels. In particular, a sequential Bayes objective is first formulated to perform the maximum likelihood estimation underlying the cross-domain soft sensing problem. At the core of the framework, we incorporate a physics-inspired particle flow that optimizes the sequential Bayes objective to perform an exact Bayes update of the model-extracted latent and hidden features. As a result, these contributions enable the proposed framework to learn a rich approximate posterior feature representation capable of characterizing complex cross-domain system dynamics and performing effective time series unsupervised domain adaptation (UDA). Finally, we validate the framework on a complex industrial multiphase flow process system with complex dynamics and multiple operating conditions. The results demonstrate that the DPFB framework achieves superior cross-domain soft sensing performance, outperforming state-of-the-art deep UDA and normalizing flow approaches.

Translated: 2023-07-11 21:57:29 Published: 2023-07-09

# Preparation of thermal coherent state for quantum coherence protection ( http://arxiv.org/abs/2306.04369v2 )

License: check the linked page

Asghar Ullah, M. Tahir Naseem, and Özgür E. Müstecaplıoğlu

The unavoidable interaction between thermal environments and quantum systems leads to the degradation of quantum features, which can be countered by engineered quantum states. In particular, preparing a thermal coherent state (TCS) can be promising for prolonging the quantum properties of qubits. We propose that a TCS can be realized by using an ancilla qubit coupled to a thermally and longitudinally driven transmission line resonator. Using the master equation approach to describe the open system dynamics, we obtain the steady-state solution of the master equation for the qubit and resonator. Remarkably, the state of the resonator is a TCS, while the ancilla qubit remains thermal. Furthermore, we study the second-order correlation coefficient and photon number statistics to validate its quantum properties. Finally, we investigate a mechanism for generating quantum coherence based on a hybrid system composed of two-level systems and a resonator, to argue that an ancilla-assisted engineered thermal coherent state can help prolong the coherence lifetimes of qubits. Our results may provide a promising direction for preparing and practically implementing TCSs for quantum science and technology.

Translated: 2023-07-11 21:57:07 Published: 2023-07-09

# Task-Agnostic Structured Pruning of Speech Representation Models ( http://arxiv.org/abs/2306.01385v2 )

License: check the linked page

Haoyu Wang, Siyuan Wang, Wei-Qiang Zhang, Hongbin Suo, Yulong Wan

Self-supervised pre-trained models such as Wav2vec2, Hubert, and WavLM have been shown to significantly improve many speech tasks. However, their large memory and heavy computational requirements hinder their industrial applicability. Structured pruning is a hardware-friendly model compression technique but usually results in a larger loss of accuracy. In this paper, we propose a fine-grained attention head pruning method to compensate for the performance degradation. In addition, we introduce the straight-through estimator into the L0 regularization to further accelerate the pruned model. Experiments on the SUPERB benchmark show that our model can achieve comparable performance to the dense model in multiple tasks and outperforms the Wav2vec 2.0 base model on average, with 72% fewer parameters and 2 times faster inference speed.

Translated: 2023-07-11 21:56:28 Published: 2023-07-09

# Optimized protocols for duplex quantum transduction ( http://arxiv.org/abs/2305.15648v2 )

License: check the linked page

Zhaoyou Wang, Mengzhen Zhang, Yat Wong, Changchun Zhong, Liang Jiang

Quantum transducers convert quantum signals through hybrid interfaces of physical platforms in quantum networks. When transducers are modeled as quantum communication channels, the performance of unidirectional quantum transduction can be measured by the quantum channel capacity. However, characterizing the performance of quantum transducers used for duplex quantum transduction, where signals are converted bidirectionally, remains an open question. Here, we propose rate regions to characterize the performance of duplex quantum transduction. Using this tool, we find that quantum transducers optimized for simultaneous duplex transduction can outperform strategies based on the standard protocol of time-shared unidirectional transduction. We also demonstrate that the rate region, integrated over the frequency domain, can characterize quantum transducers with finite bandwidth.

Translated: 2023-07-11 21:55:30 Published: 2023-07-09

# Spread Complexity in free fermion models ( http://arxiv.org/abs/2305.12115v2 )

License: check the linked page

Mamta Gautam, Nitesh Jaiswal, and Ankit Gill

We study spread complexity and the statistics of work done for quenches in the three-spin interacting Ising model, the XY spin chain, and the Su-Schrieffer-Heeger model. We study these models without quench and for different quench schemes, such as a sudden quench and multiple sudden quenches. We employ the Floquet operator technique to investigate all three models in the presence of time-dependent periodic driving of parameters. In contrast to the sudden quench cases, the periodically varying parameter case clearly shows non-analytical behaviour near the critical point. We also elucidate the relation between work done and the Lanczos coefficient, and how the statistics of work done behave near critical points.

Translated: 2023-07-11 21:54:44 Published: 2023-07-09

# Quantum Amplitude Estimation with Optimized Squared Error ( http://arxiv.org/abs/2306.16695v2 )

License: check the linked page

Xi Lu, Hongwei Lin

We first introduce a method to optimize the error behavior of quantum amplitude estimation by optimizing the initial state of the quantum phase estimation circuit. Then we construct a quantum circuit that achieves the same performance with half the number of oracle calls. The resulting optimized quantum amplitude estimation (OQAE) algorithm can achieve a standard deviation $\Delta x \sim 1.283/N$, which beats existing algorithms, for which $\Delta x$ is roughly above $4/N$.

Translated: 2023-07-11 21:48:48 Published: 2023-07-09

# A Theory of Complex Adaptive Learning Based on an Intelligent Trading Probability Wave Equation ( http://arxiv.org/abs/2306.15554v3 )

License: check the linked page

Leilei Shi, Bing-Hong Wang, Xinshuai Guo, Guocheng Wang

Complex adaptive learning is intelligent and crucial in living and inanimate complex systems. A complex system comprises many interacting individuals or units, shows hidden patterns as they interact, and occurs widely in almost every traditional discipline, from the natural to the social sciences. A recent study has demonstrated a so-called architected material capable of learning. This stimulates scientists to explore the mechanism of complex systems formulation. However, it is very challenging. Here the authors attempt to extract a universal rule, or a law of complex adaptive learning subject to local dynamic equilibrium in complex systems, from a trading volume-price probability wave equation, and apply it to complex quantum systems. It shows that particles are capable of intelligence-like properties in interactive coherence if the momentum force exerted on the complex quantum system is non-localized. It is the cumulative probability of the moving particles observed in a time interval. Thus, it assumes that particles in complex quantum systems have a complex adaptive learning- or intelligence-like property in a reinforced coordinate, governed by exactly the same complex adaptive learning mechanism as that of traders in the complexity of the financial markets. With this assumption, the authors propose an innovative interpretation of entanglement in quantum mechanics. It concludes that quantum entanglement is not a state of the superposition of coherent states, as the mainstream Copenhagen school of thought maintains. It is a coherent state in the interaction between two opposite, complementary, and variable forces. The authors look forward to experimental results that examine its validity and further improve the theory until it is perfect, and suggest industrial production of entanglement resources via new technical routes.

Translated: 2023-07-11 21:48:38 Published: 2023-07-09

# From $O(\sqrt{n})$ to $O(\log(n))$ in Quadratic Programming ( http://arxiv.org/abs/2306.15079v2 )

License: check the linked page

Liang Wu

A "dark cloud" has hung over numerical optimization theory for decades, namely, whether an optimization algorithm with $O(\log(n))$ iteration complexity exists. "Yes", this paper answers, with a new optimization algorithm and a rigorous theoretical proof. It starts with box-constrained quadratic programming (Box-QP), and many practical optimization problems fall into Box-QP. General smooth quadratic programming (QP), nonsmooth Lasso, and support vector machine (or regression) can be reformulated as Box-QP via duality theory. This is the first time a QP algorithm with $O(\log(n))$ iteration complexity has been presented, one which in particular behaves like a "direct" method: the required number of iterations is deterministic, with exact value $\left\lceil\log\left(\frac{3.125n}{\epsilon}\right)/\log(1.5625)\right\rceil$. This significant breakthrough enables us to transition from the $O(\sqrt{n})$ to the $O(\log(n))$ optimization algorithm, whose amazing scalability is particularly relevant in today's era of big data and artificial intelligence.
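
The iteration count stated in the abstract can be evaluated directly to see how slowly it grows with the problem size. The sketch below only evaluates that formula; the function name `boxqp_iterations`, the problem sizes, and the tolerance are illustrative assumptions, and nothing of the algorithm itself is implemented.

```python
import math


def boxqp_iterations(n, eps):
    """Evaluate ceil(log(3.125 * n / eps) / log(1.5625)) from the abstract."""
    if n < 1 or eps <= 0:
        raise ValueError("need n >= 1 and eps > 0")
    return math.ceil(math.log(3.125 * n / eps) / math.log(1.5625))


# Hypothetical problem sizes at a fixed tolerance of 1e-6.
counts = {n: boxqp_iterations(n, 1e-6) for n in (10, 1_000, 100_000)}
```

Because the ratio of logarithms does not depend on the base, each 100-fold increase in $n$ adds roughly ten iterations under this formula.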

Translated: 2023-07-11 21:48:13 Published: 2023-07-09

# Stable Tomography for Structured Quantum States ( http://arxiv.org/abs/2306.09432v2 )

License: check the linked page

Zhen Qin, Casey Jameson, Zhexuan Gong, Michael B. Wakin and Zhihui Zhu

The reconstruction of quantum states from experimental measurements, often achieved using quantum state tomography (QST), is crucial for the verification and benchmarking of quantum devices. However, performing QST for a generic unstructured quantum state requires an enormous number of state copies that grows \emph{exponentially} with the number of individual quanta in the system, even for the most optimal measurement settings. Fortunately, many physical quantum states, such as states generated by noisy, intermediate-scale quantum computers, are usually structured. In one dimension, such states are expected to be well approximated by matrix product operators (MPOs) with a finite matrix/bond dimension independent of the number of qubits, therefore enabling efficient state representation. Nevertheless, it is still unclear whether efficient QST can be performed for these states in general. In this paper, we attempt to bridge this gap and establish theoretical guarantees for the stable recovery of MPOs using tools from compressive sensing and the theory of empirical processes. We begin by studying two types of random measurement settings: Gaussian measurements and Haar random rank-one Positive Operator Valued Measures (POVMs). We show that the information contained in an MPO with a finite bond dimension can be preserved using a number of random measurements that depends only \emph{linearly} on the number of qubits, assuming no statistical error in the measurements. We then study MPO-based QST with physical quantum measurements through Haar random rank-one POVMs that can be implemented on quantum computers. We prove that only a \emph{polynomial} number of state copies in the number of qubits is required to guarantee bounded recovery error of an MPO state.

Translated: 2023-07-11 21:46:37 Published: 2023-07-09

# GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Speech Emotion Recognition ( http://arxiv.org/abs/2306.07848v3 )

License: check the linked page

Yu Pan, Yanni Hu, Yuguang Yang, Jixun Yao, Wen Fei, Lei Ma, Heng Lu

Contrastive-learning-based pretraining methods have recently exhibited impressive success in diverse fields. In this paper, we propose GEmo-CLAP, an efficient gender-attribute-enhanced contrastive language-audio pretraining (CLAP) model for speech emotion recognition. Specifically, we first build an effective emotion CLAP model, Emo-CLAP, for emotion recognition, utilizing various pre-trained models based on self-supervised learning. Then, considering the importance of the gender attribute in speech emotion modeling, we further propose two GEmo-CLAP approaches that integrate the emotion and gender information of speech signals, forming more reasonable objectives. Extensive experiments on the IEMOCAP corpus demonstrate that our two proposed GEmo-CLAP approaches consistently outperform the baseline Emo-CLAP with different pre-trained models, while also achieving superior recognition performance compared with other state-of-the-art methods.

Translated: 2023-07-11 21:45:49 Published: 2023-07-09

# Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection ( http://arxiv.org/abs/2307.01462v2 )

License: check the linked page

Minh-Quan Dao, Julie Stephany Berrio, Vincent Frémont, Mao Shan, Elwan Héry, and Stewart Worrall

Occlusion is a major challenge for LiDAR-based object detection methods. This challenge becomes safety-critical in urban traffic, where the ego vehicle must detect objects reliably to avoid collisions while its field of view is severely reduced by the obstruction posed by a large number of road users. Collaborative perception via Vehicle-to-Everything (V2X) communication, which leverages the diverse perspectives of connected agents present at multiple locations to form a complete scene representation, is an appealing solution. State-of-the-art V2X methods resolve the performance-bandwidth tradeoff using a mid-collaboration approach in which the Bird's-Eye View images of point clouds are exchanged, so that bandwidth consumption is lower than when communicating point clouds, as in early collaboration, and detection performance is higher than in late collaboration, which fuses agents' outputs, thanks to a deeper interaction among connected agents. While achieving strong performance, most mid-collaboration approaches are hindered in real-world deployment by their overly complicated architectures, involving learnable collaboration graphs and autoencoder-based compressors/decompressors, and by unrealistic assumptions about inter-agent synchronization. In this work, we devise a simple yet effective collaboration method that achieves a better bandwidth-performance tradeoff than prior state-of-the-art methods while minimizing changes made to the single-vehicle detection models and relaxing unrealistic assumptions on inter-agent synchronization. Experiments on the V2X-Sim dataset show that our collaboration method achieves 98% of the performance of an early-collaboration method, while only consuming the equivalent bandwidth of a late-collaboration method.

Translated: 2023-07-11 21:37:52 Published: 2023-07-09

# Certification of unbounded randomness without nonlocality ( http://arxiv.org/abs/2307.01333v2 )

License: check the linked page

Shubhayan Sarkar

Random number generators play an essential role in cryptography and key distribution. It is thus important to verify whether the random numbers generated from these devices are genuine and unpredictable by any adversary. Recently, quantum nonlocality has been identified as a resource that can be utilised to certify randomness. Although these schemes are device-independent and thus highly secure, the observation of quantum nonlocality is extremely difficult from a practical perspective. In this work, we provide a scheme to certify unbounded randomness in a semi-device-independent way based on the maximal violation of Leggett-Garg inequalities. Interestingly, the scheme is independent of the choice of the quantum state, and consequently even "quantum" noise could be utilized to self-test quantum measurements and generate unbounded randomness, making the scheme highly efficient for practical purposes.

Translated: 2023-07-11 21:37:20 Published: 2023-07-09

# An Overview on Generative AI at Scale with Edge-Cloud Computing ( http://arxiv.org/abs/2306.17170v2 )

License: check the linked page

Yun-Cheng Wang, Jintang Xue, Chengwei Wei, C.-C. Jay Kuo

As a specific category of artificial intelligence (AI), generative artificial intelligence (GenAI) generates new content that resembles what is created by humans. The rapid development of GenAI systems has created a huge amount of new data on the Internet, posing new challenges to current computing and communication frameworks. Currently, GenAI services rely on the traditional cloud computing framework because they need large computation resources. However, such services encounter high latency because of data transmission and a high volume of requests. On the other hand, edge-cloud computing can provide adequate computation power and low latency at the same time through the collaboration between edges and the cloud. Thus, it is attractive to build GenAI systems at scale by leveraging the edge-cloud computing paradigm. In this overview paper, we review recent developments in both GenAI and edge-cloud computing. Then, we use two exemplary GenAI applications to discuss technical challenges in scaling up their solutions using edge-cloud collaborative systems. Finally, we list design considerations for training and deploying GenAI systems at scale and point out future research directions.

Translated: 2023-07-11 21:36:17 Published: 2023-07-09

# OSP: Boosting Distributed Model Training with 2-stage Synchronization ( http://arxiv.org/abs/2306.16926v2 )

License: check the linked page

Zixuan Chen, Lei Shi, Xuandong Liu, Jiahui Li, Sen Liu, Yang Xu

Distributed deep learning (DDL) is a promising research area that aims to increase the efficiency of training deep learning tasks with large datasets and models. As the computation capability of DDL nodes continues to increase, the network connection between nodes is becoming a major bottleneck. Various methods of gradient compression and improved model synchronization have been proposed to address this bottleneck in Parameter-Server-based DDL. However, the former can lose accuracy because gradients are discarded, and the latter offers only limited improvement in the throughput of model synchronization. To address these challenges, we propose a new model synchronization method named Overlapped Synchronization Parallel (OSP), which achieves efficient communication with a 2-stage synchronization approach and uses Local-Gradient-based Parameter correction (LGP) to avoid accuracy loss caused by stale parameters. A prototype of OSP has been implemented using PyTorch and evaluated on commonly used deep learning models and datasets with a 9-node testbed. Evaluation results show that OSP can achieve up to 50% improvement in throughput without accuracy loss compared to popular synchronization models.

Translated: 2023-07-11 21:35:40 Published: 2023-07-09

# Low-rank Tensor Estimation via Riemannian Gauss-Newton: Statistical Optimality and Second-Order Convergence ( http://arxiv.org/abs/2104.12031v4 )

License: check the linked page

Yuetian Luo, Anru R. Zhang

In this paper, we consider the estimation of a low Tucker rank tensor from a number of noisy linear measurements. The general problem covers many specific examples arising from applications, including tensor regression, tensor completion, and tensor PCA/SVD. We consider an efficient Riemannian Gauss-Newton (RGN) method for low Tucker rank tensor estimation. Unlike the generic (super)linear convergence guarantee of RGN in the literature, we prove the first local quadratic convergence guarantee of RGN for low-rank tensor estimation in the noisy setting under some regularity conditions and provide the corresponding estimation error upper bounds. We also provide a deterministic estimation error lower bound that matches the upper bound, which demonstrates the statistical optimality of RGN. The merit of RGN is illustrated through two machine learning applications: tensor regression and tensor SVD. Finally, we provide simulation results to corroborate our theoretical findings.

Translated: 2023-07-11 19:54:24 Published: 2023-07-09

# Testing Robustness Against Unforeseen Adversaries ( http://arxiv.org/abs/1908.08016v3 )

License: check the linked page

Max Kaufmann, Daniel Kang, Yi Sun, Steven Basart, Xuwang Yin, Mantas Mazeika, Akul Arora, Adam Dziedzic, Franziska Boenisch, Tom Brown, Jacob Steinhardt, Dan Hendrycks

When considering real-world adversarial settings, defenders are unlikely to have access to the full range of deployment-time adversaries during training, and adversaries are likely to use realistic adversarial distortions that will not be limited to small L_p-constrained perturbations. To narrow this discrepancy between research and reality, we introduce eighteen novel adversarial attacks, which we use to create ImageNet-UA, a new benchmark for evaluating model robustness against a wide range of unforeseen adversaries. We use our benchmark to identify a range of defense strategies that can help overcome this generalization gap, finding a rich space of techniques that can improve unforeseen robustness. We hope the greater variety and realism of ImageNet-UA will make it a useful tool for those working on real-world worst-case robustness, enabling the development of more robust defenses that can generalize beyond attacks seen during training.

Translated: 2023-07-11 19:51:40 Published: 2023-07-09

# Mathematical Runtime Analysis for the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) ( http://arxiv.org/abs/2112.08581v5 )

License: check the linked page

Weijie Zheng, Benjamin Doerr

The non-dominated sorting genetic algorithm II (NSGA-II) is the most intensively used multi-objective evolutionary algorithm (MOEA) in real-world applications. However, in contrast to several simple MOEAs that have also been analyzed by mathematical means, no such study exists for the NSGA-II so far. In this work, we show that mathematical runtime analyses are also feasible for the NSGA-II. In particular, we prove that with a population size four times larger than the size of the Pareto front, the NSGA-II with two classic mutation operators and four different ways to select the parents satisfies the same asymptotic runtime guarantees as the SEMO and GSEMO algorithms on the basic OneMinMax and LeadingOnesTrailingZeros benchmarks. However, if the population size is only equal to the size of the Pareto front, then the NSGA-II cannot efficiently compute the full Pareto front: for an exponential number of iterations, the population will always miss a constant fraction of the Pareto front. Our experiments confirm the above findings.
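To make the benchmark named in this abstract concrete, the following is a minimal sketch of OneMinMax, assuming its usual definition (the two objectives count the zeros and the ones of a bit string, and both are maximized). The function names are illustrative and not taken from the paper, and the code only shows the dominance check and the Pareto front, not NSGA-II itself.

```python
from itertools import product

def one_min_max(bits):
    """OneMinMax objectives: (number of zeros, number of ones), both maximized."""
    ones = sum(bits)
    return (len(bits) - ones, ones)

def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated(points):
    """Return the distinct objective vectors that no other vector dominates."""
    unique = set(points)
    return sorted(p for p in unique if not any(dominates(q, p) for q in unique))

n = 4
all_points = [one_min_max(bits) for bits in product((0, 1), repeat=n)]
front = non_dominated(all_points)
print(front)       # every pair (n - k, k) for k = 0..n
print(len(front))  # n + 1 points on the Pareto front
```

Under this definition every bit string is Pareto optimal, so the front has n + 1 points, and "a population size four times larger than the size of the Pareto front" would correspond to 4(n + 1) individuals.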

Translated: 2023-07-11 19:45:00 Published: 2023-07-09

# Monocular Road Planar Parallax Estimation ( http://arxiv.org/abs/2111.11089v2 )

License: check the linked page

Haobo Yuan, Teng Chen, Wei Sui, Jiafeng Xie, Lefei Zhang, Yuan Li, Qian Zhang

Estimating the 3D structure of the drivable surface and surrounding environment is a crucial task for assisted and autonomous driving. It is commonly solved either by using 3D sensors such as LiDAR or by directly predicting the depth of points via deep learning. However, the former is expensive, and the latter does not use the geometric information of the scene. In this paper, instead of following existing methodologies, we propose Road Planar Parallax Attention Network (RPANet), a new deep neural network for 3D sensing from monocular image sequences based on planar parallax, which takes full advantage of the omnipresent road plane geometry in driving scenes. RPANet takes a pair of images aligned by the homography of the road plane as input and outputs a $\gamma$ map (the ratio of height to depth) for 3D reconstruction. The $\gamma$ map can be used to construct a two-dimensional transformation between two consecutive frames. It implies planar parallax and can be combined with the road plane, which serves as a reference, to estimate the 3D structure by warping the consecutive frames. Furthermore, we introduce a novel cross-attention module to make the network better perceive the displacements caused by planar parallax. To verify the effectiveness of our method, we sample data from the Waymo Open Dataset and construct annotations related to planar parallax. Comprehensive experiments are conducted on the sampled dataset to demonstrate the 3D reconstruction accuracy of our approach in challenging scenarios.

Translated: 2023-07-11 19:43:57 Published: 2023-07-09

# SCORE: Spurious COrrelation REduction for Offline Reinforcement Learning ( http://arxiv.org/abs/2110.12468v2 )

License: check the linked page

Zhihong Deng, Zuyue Fu, Lingxiao Wang, Zhuoran Yang, Chenjia Bai, Tianyi Zhou, Zhaoran Wang, Jing Jiang

Offline reinforcement learning (RL) harnesses the power of massive datasets for resolving sequential decision problems. Most existing papers only discuss defending against out-of-distribution (OOD) actions, whereas we investigate a broader issue: the spurious correlations between epistemic uncertainty and decision-making, an essential factor that causes suboptimality. In this paper, we propose Spurious COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm. We empirically show that SCORE achieves state-of-the-art (SoTA) performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL). The proposed algorithm introduces an annealing behavior cloning regularizer to help produce a high-quality estimation of uncertainty, which is critical for eliminating spurious correlations from suboptimality. Theoretically, we justify the rationality of the proposed method and prove its convergence to the optimal policy at a sublinear rate under mild assumptions.

Translated: 2023-07-11 19:43:14 Published: 2023-07-09

# Sparse MoEs meet Efficient Ensembles ( http://arxiv.org/abs/2110.03360v2 )

License: check the linked page

James Urquhart Allingham, Florian Wenzel, Zelda E Mariet, Basil Mustafa, Joan Puigcerver, Neil Houlsby, Ghassen Jerfel, Vincent Fortuin, Balaji Lakshminarayanan, Jasper Snoek, Dustin Tran, Carlos Riquelme Ruiz, Rodolphe Jenatton

Machine learning models based on the aggregated outputs of submodels, either at the activation or prediction levels, often exhibit strong performance compared to individual models. We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixtures of experts (sparse MoEs). First, we show that the two approaches have complementary features whose combination is beneficial. This includes a comprehensive evaluation of sparse MoEs on uncertainty-related benchmarks. Then, we present Efficient Ensemble of Experts (E$^3$), a scalable and simple ensemble of sparse MoEs that takes the best of both classes of models, while using up to 45% fewer FLOPs than a deep ensemble. Extensive experiments demonstrate the accuracy, log-likelihood, few-shot learning, robustness, and uncertainty improvements of E$^3$ over several challenging vision Transformer-based baselines. E$^3$ not only preserves its efficiency while scaling to models with up to 2.7B parameters, but also provides better predictive performance and uncertainty estimates for larger models.
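The two aggregation levels contrasted in this abstract can be sketched in a few lines. The example below is a toy illustration under stated assumptions: the experts are hypothetical scalar functions, the router scores are fixed numbers rather than learned, and gate weights are renormalized over the selected experts. It does not attempt to reproduce the E$^3$ architecture from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_moe(x, router_scores, experts, k=2):
    """Activation-level aggregation: evaluate only the k highest-scoring experts.

    Gate weights are the softmax probabilities of the selected experts,
    renormalized so that they sum to 1.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return sum(probs[i] / norm * experts[i](x) for i in top)

def ensemble(x, members):
    """Prediction-level aggregation: average the outputs of the members."""
    return sum(member(x) for member in members) / len(members)

# Hypothetical toy experts standing in for expert sub-networks.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]

# Two sparse MoEs that share experts but route differently (assumed scores).
moe_a = lambda x: sparse_moe(x, [2.0, 1.0, -1.0, 0.0], experts, k=2)
moe_b = lambda x: sparse_moe(x, [0.0, -1.0, 1.0, 2.0], experts, k=2)

y = ensemble(3.0, [moe_a, moe_b])
print(y)
```

Each call to `sparse_moe` evaluates only two of the four experts, which is where the compute saving of sparse routing comes from, while `ensemble` averages complete predictions.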

Translated: 2023-07-11 19:42:23 Published: 2023-07-09

# Insights from Generative Modeling for Neural Video Compression ( http://arxiv.org/abs/2107.13136v2 )

License: check the linked page

Ruihan Yang, Yibo Yang, Joseph Marino, Stephan Mandt

While recent machine learning research has revealed connections between deep generative models such as VAEs and rate-distortion losses used in learned compression, most of this work has focused on images. In a similar spirit, we view recently proposed neural video coding algorithms through the lens of deep autoregressive and latent variable modeling. We present these codecs as instances of a generalized stochastic temporal autoregressive transform, and propose new avenues for further improvements inspired by normalizing flows and structured priors. We propose several architectures that yield state-of-the-art video compression performance on high-resolution video and discuss their tradeoffs and ablations. In particular, we propose (i) improved temporal autoregressive transforms, (ii) improved entropy models with structured and temporal dependencies, and (iii) variable bitrate versions of our algorithms. Since our improvements are compatible with a large class of existing models, we provide further evidence that the generative modeling viewpoint can advance the neural video coding field.

Translated: 2023-07-11 19:42:03 Published: 2023-07-09

# KenSwQuAD -- A Question Answering Dataset for Swahili Low Resource Language ( http://arxiv.org/abs/2205.02364v3 )

License: check the linked page

Barack W. Wanjawa (1), Lilian D.A. Wanzare (2), Florence Indede (2), Owen McOnyango (2), Lawrence Muchemi (1), Edward Ombui (3) ((1) University of Nairobi, Kenya; (2) Maseno University, Kenya; (3) Africa Nazarene University, Kenya)

The need for question answering datasets in low-resource languages motivated this research and led to the development of the Kencorpus Swahili Question Answering Dataset, KenSwQuAD. The dataset is annotated from raw story texts in Swahili, a low-resource language spoken predominantly in Eastern Africa and in other parts of the world. Question answering (QA) datasets are important for machine comprehension of natural language in tasks such as internet search and dialog systems. Machine learning systems need training data such as the gold-standard question answering set developed in this research. The research engaged annotators to formulate QA pairs from Swahili texts collected by the Kencorpus project, a Kenyan languages corpus. The project annotated 1,445 of the 2,585 texts with at least 5 QA pairs each, resulting in a final dataset of 7,526 QA pairs. A quality assurance set of 12.5% of the annotated texts confirmed that the QA pairs were all correctly annotated. A proof of concept applying the set to the QA task confirmed that the dataset is usable for such tasks. KenSwQuAD has also contributed to the resourcing of the Swahili language.

Translated: 2023-07-11 19:35:32 Published: 2023-07-09

# Out-of-distribution generalization for learning quantum dynamics ( http://arxiv.org/abs/2204.10268v3 )

License: check the linked page

Matthias C. Caro, Hsin-Yuan Huang, Nicholas Ezzell, Joe Gibbs, Andrew T. Sornborger, Lukasz Cincio, Patrick J. Coles, Zoë Holmes

Generalization bounds are a critical tool to assess the training data requirements of Quantum Machine Learning (QML). Recent work has established guarantees for in-distribution generalization of quantum neural networks (QNNs), where training and testing data are drawn from the same data distribution. However, there are currently no results on out-of-distribution generalization in QML, where we require a trained model to perform well even on data drawn from a distribution different from the training distribution. Here, we prove out-of-distribution generalization for the task of learning an unknown unitary. In particular, we show that one can learn the action of a unitary on entangled states having trained only on product states. Since product states can be prepared using only single-qubit gates, this advances the prospects of learning quantum dynamics on near-term quantum hardware, and further opens up new methods for both the classical and quantum compilation of quantum circuits.
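One piece of intuition for why product-state data can constrain a unitary's action on entangled states is plain linearity: product states span the full Hilbert space. The sketch below illustrates only this linear-algebra fact with a known two-qubit gate (CNOT, chosen here as an assumption for illustration). It does not reproduce the paper's generalization bounds, which concern training on finitely many random product states.

```python
import math

def apply_unitary(u, state):
    """Multiply a matrix (list of rows) by a state vector."""
    return [sum(u[r][c] * state[c] for c in range(len(state))) for r in range(len(u))]

def kron(a, b):
    """Tensor product of two single-qubit state vectors."""
    return [x * y for x in a for y in b]

# CNOT in the basis |00>, |01>, |10>, |11>, with the first qubit as control.
CNOT = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

zero, one = [1, 0], [0, 1]
s = 1 / math.sqrt(2)

# Outputs observed on two product states only.
out_00 = apply_unitary(CNOT, kron(zero, zero))
out_11 = apply_unitary(CNOT, kron(one, one))

# The Bell state (|00> + |11>)/sqrt(2) is entangled, yet its image under the
# gate follows from the product-state outputs by linearity.
predicted = [s * a + s * b for a, b in zip(out_00, out_11)]
direct = apply_unitary(CNOT, [s, 0, 0, s])

print(predicted == direct)  # True
```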

Translated: 2023-07-11 19:34:39 Published: 2023-07-09

# Quantifying Spatial Under-reporting Disparities in Resident Crowdsourcing ( http://arxiv.org/abs/2204.08620v3 )

License: check the linked page

Zhi Liu, Uma Bhandaram, Nikhil Garg

Modern city governance relies heavily on crowdsourcing ("co-production") to identify problems such as downed trees and power lines. A major concern is that residents do not report problems at the same rates, with heterogeneous reporting delays directly translating to downstream disparities in how quickly incidents can be addressed. Measuring such under-reporting is a difficult statistical task because, by definition, we observe neither incidents that are not reported nor the time at which reported incidents first occurred. Thus, low reporting rates and low ground-truth incident rates cannot be naively distinguished, and reporting delays are unobserved. We develop a method to identify (heterogeneous) reporting delays without using external ground-truth data. Our insight is that the rate of duplicate reports about the same incident can be leveraged to separate whether an incident has occurred from its reporting rate once it has occurred. Using this idea, we reduce the question to a standard Poisson rate estimation task, even though the full incident reporting interval is also unobserved. We apply our method to over 100,000 resident reports made in New York City and to over 900,000 reports made in Chicago, finding that there are substantial spatial disparities in how quickly incidents are reported, even after controlling for incident characteristics: some neighborhoods report three times as quickly as others. These spatial disparities correspond to socio-economic characteristics: in NYC, higher population density, fraction of people with college degrees, income, and fraction of the population that is White all positively correlate with reporting rates. Finally, leveraging a collaboration with the NYC Department of Parks and Recreation, we demonstrate how estimating reporting delays leads to practical insights and interventions for more equitable, efficient government service.
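The "standard Poisson rate estimation task" mentioned in this abstract can be illustrated with the textbook maximum-likelihood estimator: total events divided by total exposure. The sketch below assumes that duplicate reports for an incident arrive as a homogeneous Poisson process during an observed window after the first report. The record layout and the numbers are hypothetical, and the paper's actual model (with covariates and controls) is not reproduced here.

```python
from collections import defaultdict

def poisson_rate_mle(observations):
    """MLE of a homogeneous Poisson rate from (count, exposure) pairs."""
    total_events = sum(count for count, _ in observations)
    total_exposure = sum(exposure for _, exposure in observations)
    if total_exposure <= 0:
        raise ValueError("total exposure must be positive")
    return total_events / total_exposure

# Hypothetical records: (neighborhood, duplicate reports after the first one,
# days between the first report and the end of the observation window).
records = [
    ("A", 3, 10.0),
    ("A", 1, 5.0),
    ("A", 2, 5.0),
    ("B", 1, 10.0),
    ("B", 0, 6.0),
    ("B", 1, 4.0),
]

by_area = defaultdict(list)
for area, duplicates, days in records:
    by_area[area].append((duplicates, days))

rates = {area: poisson_rate_mle(obs) for area, obs in by_area.items()}
for area, rate in sorted(rates.items()):
    # Under the Poisson assumption, the mean wait until a report is 1 / rate.
    print(area, rate, 1.0 / rate)
```

Because the estimate uses only reports made after the first one, it does not require knowing when the incident actually occurred, which is the unobserved quantity the abstract highlights.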

Translated: 2023-07-11 19:34:22 Published: 2023-07-09

# Continual Learning, Fast and Slow ( http://arxiv.org/abs/2209.02370v3 )

License: check the linked page

Quang Pham, Chenghao Liu, Steven C. H. Hoi

According to the Complementary Learning Systems (CLS) theory (McClelland et al., 1995) in neuroscience, humans perform effective continual learning through two complementary systems: a fast learning system centered on the hippocampus for rapid learning of specific, individual experiences; and a slow learning system located in the neocortex for the gradual acquisition of structured knowledge about the environment. Motivated by this theory, we propose DualNets (for Dual Networks), a general continual learning framework comprising a fast learning system for supervised learning of pattern-separated representations from specific tasks and a slow learning system for learning task-agnostic general representations via Self-Supervised Learning (SSL). DualNets can seamlessly incorporate both representation types into a holistic framework to facilitate better continual learning in deep neural networks. Via extensive experiments, we demonstrate the promising results of DualNets on a wide range of continual learning protocols, ranging from the standard offline, task-aware setting to the challenging online, task-free scenario. Notably, on the CTrL benchmark (Veniat et al., 2020), which has unrelated tasks with vastly different visual images, DualNets can achieve competitive performance with existing state-of-the-art dynamic architecture strategies (Ostapenko et al., 2021). Furthermore, we conduct comprehensive ablation studies to validate DualNets' efficacy, robustness, and scalability. Code will be made available at https://github.com/phquang/DualNet.

翻訳日:2023-07-11 19:25:52 公開日:2023-07-09

# 大規模言語モデルを用いた複数人のシミュレーションと人間研究の再現

Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies ( http://arxiv.org/abs/2208.10264v5 )

ライセンス: Link先を確認

Gati Aher, Rosa I. Arriaga, Adam Tauman Kalai

(参考訳) チューリング実験(te)と呼ばれる新しいタイプのテストを導入し、gptモデルのような特定の言語モデルが人間の行動の様々な側面をシミュレートできるかどうかを評価する。 TEはまた、言語モデルの特定の人間の振る舞いのシミュレーションにおいて一貫した歪みを明らかにすることができる。単一の任意の個人をシミュレートするチューリングテストとは異なり、TEは人体研究の参加者の代表サンプルをシミュレートする必要がある。我々は,先行研究から確立した発見を再現しようとするTEを行う。我々は、TEをシミュレーションするための方法論を設計し、異なる言語モデルが古典的な経済、精神言語、社会心理学の実験をいかにうまく再現できるかを比較するために、Ultimatum Game、Garden Path Sentences、Milgram Shock Experiment、Wisdom of Crowds。最初の3つのTEでは、既存の発見は最近のモデルで再現され、最後のTEでは、一部の言語モデル(ChatGPTやGPT-4など)に「超精度の歪み」があることが示され、教育や芸術における下流の応用に影響を及ぼす可能性がある。

We introduce a new type of test, called a Turing Experiment (TE), for evaluating to what extent a given language model, such as GPT models, can simulate different aspects of human behavior. A TE can also reveal consistent distortions in a language model's simulation of a specific human behavior. Unlike the Turing Test, which involves simulating a single arbitrary individual, a TE requires simulating a representative sample of participants in human subject research. We carry out TEs that attempt to replicate well-established findings from prior studies. We design a methodology for simulating TEs and illustrate its use to compare how well different language models are able to reproduce classic economic, psycholinguistic, and social psychology experiments: Ultimatum Game, Garden Path Sentences, Milgram Shock Experiment, and Wisdom of Crowds. In the first three TEs, the existing findings were replicated using recent models, while the last TE reveals a "hyper-accuracy distortion" present in some language models (including ChatGPT and GPT-4), which could affect downstream applications in education and the arts.

翻訳日:2023-07-11 19:24:34 公開日:2023-07-09

# オンライン不一致最小化による適応的ドメイン一般化

Adaptive Domain Generalization via Online Disagreement Minimization ( http://arxiv.org/abs/2208.01996v2 )

ライセンス: Link先を確認

Xin Zhang, Ying-Cong Chen

(参考訳) ディープニューラルネットワークは、デプロイ時とトレーニング時の間に分布シフトがある場合、性能が著しく低下する。ドメイン一般化(DG)は、ソースドメインの集合のみに依存して、モデルを未知のターゲットドメインに安全に転送することを目的としている。様々なDGアプローチが提案されているが、DomainBedという最近の研究によると、そのほとんどは単純な経験的リスク最小化(ERM)に勝っていない。そこで本研究では、既存のDGアルゴリズムと直交し、その性能を一貫して改善しうる汎用フレームワークを提案する。静的なソースモデルが普遍的であることに賭ける従来のDG研究と異なり、提案するAdaODMは、異なるターゲットドメインに対してテスト時にソースモデルを適応的に修正する。具体的には、共有のドメイン汎用特徴抽出器の上に複数のドメイン固有の分類器を作成する。特徴抽出器と分類器は敵対的に訓練され、特徴抽出器は入力サンプルをドメイン不変空間に埋め込み、複数の分類器はそれぞれが特定のソースドメインに対応する個別の決定境界を捉える。テスト時には、ソース分類器間の予測不一致を利用して、ターゲットドメインとソースドメインの分布差を効果的に測定できる。テスト時に不一致を最小化するようにソースモデルを微調整することで、ターゲットドメインの特徴は不変特徴空間によく整列する。 AdaODMを、ERMとCORALという2つの一般的なDG手法と、VLCS、PACS、OfficeHome、TerraIncognitaという4つのDGベンチマークで検証する。その結果、AdaODMは未知ドメインに対する汎化能力を安定して改善し、最先端の性能を達成する。

Deep neural networks suffer from significant performance deterioration when there is a distribution shift between deployment and training. Domain Generalization (DG) aims to safely transfer a model to unseen target domains by relying only on a set of source domains. Although various DG approaches have been proposed, a recent study, DomainBed, reveals that most of them do not beat simple Empirical Risk Minimization (ERM). To this end, we propose a general framework that is orthogonal to existing DG algorithms and could improve their performance consistently. Unlike previous DG works that bet on a static source model being universal, our proposed AdaODM adaptively modifies the source model at test time for different target domains. Specifically, we create multiple domain-specific classifiers upon a shared domain-generic feature extractor. The feature extractor and classifiers are trained in an adversarial way, where the feature extractor embeds the input samples into a domain-invariant space, and the multiple classifiers capture distinct decision boundaries, each of which relates to a specific source domain. During testing, distribution differences between target and source domains can be effectively measured by leveraging prediction disagreement among source classifiers. By fine-tuning source models to minimize the disagreement at test time, target domain features are well aligned to the invariant feature space. We verify AdaODM on two popular DG methods, namely ERM and CORAL, and four DG benchmarks, namely VLCS, PACS, OfficeHome, and TerraIncognita. The results show AdaODM stably improves the generalization capacity on unseen domains and achieves state-of-the-art performance.

翻訳日:2023-07-11 19:24:13 公開日:2023-07-09
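
The AdaODM abstract above uses prediction disagreement among source classifiers as a test-time signal of distribution shift. A minimal sketch of such a signal follows. The choice of mean pairwise L1 distance between softmax outputs, and the names `softmax` and `prediction_disagreement`, are assumptions made for illustration; the abstract does not state which disagreement measure the authors use.

```python
import math


def softmax(logits):
    """Convert a list of logits into class probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_disagreement(per_classifier_logits):
    """Mean pairwise L1 distance between the classifiers' probability vectors.

    Assumption: L1 distance is one plausible disagreement measure,
    not necessarily the one used in the paper.
    """
    probs = [softmax(logits) for logits in per_classifier_logits]
    k = len(probs)
    if k < 2:
        return 0.0
    total, pairs = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            total += sum(abs(a - b) for a, b in zip(probs[i], probs[j]))
            pairs += 1
    return total / pairs


# Three classifier heads that agree on a sample versus three that do not.
agree = prediction_disagreement([[2.0, 0.1, -1.0]] * 3)
disagree = prediction_disagreement(
    [[2.0, 0.1, -1.0], [-1.0, 2.0, 0.1], [0.1, -1.0, 2.0]]
)
```

In a test-time adaptation loop, a quantity like `disagree` would be the loss that fine-tuning tries to reduce on unlabeled target samples; the optimization itself is outside this sketch.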

# 多人数対話における多言語共参照解決

Multilingual Coreference Resolution in Multiparty Dialogue ( http://arxiv.org/abs/2208.01307v2 )

ライセンス: Link先を確認

Boyuan Zheng, Patrick Xia, Mahsa Yarmohammadi, Benjamin Van Durme

(参考訳) エンティティの共参照解決のための既存の多人数対話データセットはまだ黎明期にあり、多くの課題が未解決のままである。そこで我々は、テレビ番組の書き起こしに基づく大規模データセットMultilingual Multiparty Coref (MMC) を作成した。複数の言語でゴールド品質の字幕が利用できるため、アノテーション射影によってアノテーションを再利用し、他の言語(中国語とFarsi)でシルバー品質の共参照解決データを作成することを提案する。ゴールド(英語)データでは、既製のモデルはMMCで比較的低い性能を示し、MMCが以前のデータセットよりも多人数共参照を幅広くカバーしていることを示唆している。シルバーデータでは、データ拡張と、ゼロショットの言語間設定を効果的にシミュレートするスクラッチからのトレーニングの両方で、その利用に成功した。

Existing multiparty dialogue datasets for entity coreference resolution are nascent, and many challenges are still unaddressed. We create a large-scale dataset, Multilingual Multiparty Coref (MMC), for this task based on TV transcripts. Due to the availability of gold-quality subtitles in multiple languages, we propose reusing the annotations to create silver coreference resolution data in other languages (Chinese and Farsi) via annotation projection. On the gold (English) data, off-the-shelf models perform relatively poorly on MMC, suggesting that MMC has broader coverage of multiparty coreference than prior datasets. On the silver data, we find success both using it for data augmentation and training from scratch, which effectively simulates the zero-shot cross-lingual setting.

翻訳日:2023-07-11 19:23:45 公開日:2023-07-09

# indecision tree: 量化不確実性下での議論に基づく推論の学習

Indecision Trees: Learning Argument-Based Reasoning under Quantified Uncertainty ( http://arxiv.org/abs/2206.12252v2 )

ライセンス: Link先を確認

Jonathan S. Kent, David H. Menager

(参考訳) 現実世界での機械学習システムの使用は、しばしば問題となり、説明不能なブラックボックスモデル、不完全な測定の仮定された確実性、確率分布の代わりに単一の分類を提供する。本稿では,不確実性の下で学習し,不確実性の下で推論を行い,可能なラベル上で強固な分布を提供し,他の推論システムで使用する論理的な引数の集合に分解できる決定木の改良であるindecision treeを提案する。

Using Machine Learning systems in the real world can often be problematic, with inexplicable black-box models, the assumed certainty of imperfect measurements, or providing a single classification instead of a probability distribution. This paper introduces Indecision Trees, a modification to Decision Trees which learn under uncertainty, can perform inference under uncertainty, provide a robust distribution over the possible labels, and can be disassembled into a set of logical arguments for use in other reasoning systems.

翻訳日:2023-07-11 19:23:07 公開日:2023-07-09

# Survival Kernets: 精度保証によるスケーラブルで解釈可能なDeep Kernel Survival Analysis

Survival Kernets: Scalable and Interpretable Deep Kernel Survival Analysis with an Accuracy Guarantee ( http://arxiv.org/abs/2206.10477v4 )

ライセンス: Link先を確認

George H. Chen

(参考訳) カーネルサバイバル解析モデルは、2つのデータポイント間の類似度を測定するカーネル関数の助けを借りて、個々のサバイバル分布を推定する。このようなカーネル関数は、ディープカーネルサバイバルモデルを用いて学習することができる。本稿では,モデル解釈や理論解析に適した方法で大規模データセットにスケール可能な,サバイバルカーネットと呼ばれる新しいディープカーネルサバイバルモデルを提案する。具体的には、最近開発されたカーネルネットと呼ばれる分類と回帰のためのトレーニングセット圧縮スキームに基づいて、トレーニングデータをクラスタに分割し、サバイバル分析設定に拡張する。テスト時には、各データポイントをこれらのクラスタの重み付けの組み合わせとして表現し、それぞれのクラスタを可視化することができる。生存カーネットの特殊な場合、予測生存分布に縛られる有限サンプル誤差を、ログ係数まで最適に設定する。上記のカーネルネット圧縮戦略を用いてテスト時のスケーラビリティを実現する一方で、トレーニング中のスケーラビリティは、XGBoostのようなツリーアンサンブルに基づくウォームスタート手順と、ニューラルネットワーク探索を加速するためのヒューリスティックアプローチによって達成される。異なるサイズ(約300万データポイントまで)の標準生存分析データセットでは、時間依存コンコーダンス指数で検証された各種ベースラインと比較して、生存カーネットは高い競争力を示す。私たちのコードは、https://github.com/georgehc/survival-kernetsで利用可能です。

Kernel survival analysis models estimate individual survival distributions with the help of a kernel function, which measures the similarity between any two data points. Such a kernel function can be learned using deep kernel survival models. In this paper, we present a new deep kernel survival model called a survival kernet, which scales to large datasets in a manner that is amenable to model interpretation and also theoretical analysis. Specifically, the training data are partitioned into clusters based on a recently developed training set compression scheme for classification and regression called kernel netting that we extend to the survival analysis setting. At test time, each data point is represented as a weighted combination of these clusters, and each such cluster can be visualized. For a special case of survival kernets, we establish a finite-sample error bound on predicted survival distributions that is, up to a log factor, optimal. Whereas scalability at test time is achieved using the aforementioned kernel netting compression strategy, scalability during training is achieved by a warm-start procedure based on tree ensembles such as XGBoost and a heuristic approach to accelerating neural architecture search. On four standard survival analysis datasets of varying sizes (up to roughly 3 million data points), we show that survival kernets are highly competitive compared to various baselines tested in terms of time-dependent concordance index. Our code is available at: https://github.com/georgehc/survival-kernets

翻訳日:2023-07-11 19:22:58 公開日:2023-07-09
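
The survival kernets abstract above builds on kernel survival analysis, where a kernel function weights training points by similarity to a test point. The sketch below shows the classical kernel-weighted Kaplan-Meier estimator as background. It is not the survival kernet itself: the fixed Gaussian kernel, the absence of cluster compression, and the names `gaussian_kernel` and `kernel_survival_curve` are all assumptions for illustration.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Fixed Gaussian similarity between two feature vectors (not a learned kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def kernel_survival_curve(x_test, train_x, train_time, train_event,
                          kernel=gaussian_kernel):
    """Kernel-weighted Kaplan-Meier estimate of S(t | x_test).

    train_event[i] is 1 if the event was observed and 0 if censored.
    Returns a list of (event_time, survival_probability) pairs.
    """
    weights = [kernel(x_test, x) for x in train_x]
    event_times = sorted({t for t, e in zip(train_time, train_event) if e})
    curve, surv = [], 1.0
    for t in event_times:
        # Weighted number of points still at risk just before time t.
        at_risk = sum(w for w, y in zip(weights, train_time) if y >= t)
        # Weighted number of observed events exactly at time t.
        deaths = sum(w for w, y, e in zip(weights, train_time, train_event)
                     if y == t and e)
        if at_risk > 0:
            surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


# With a constant kernel the estimator reduces to the ordinary Kaplan-Meier curve.
features = [[0.0], [1.0], [2.0], [3.0]]
times = [1, 2, 3, 4]
events = [1, 0, 1, 1]
uniform_curve = kernel_survival_curve([0.0], features, times, events,
                                      kernel=lambda a, b: 1.0)
local_curve = kernel_survival_curve([0.0], features, times, events)
```

Every prediction here touches all training points, which is the test-time cost that the abstract's compression into clusters is meant to avoid.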

# metafed: 循環型知識蒸留によるパーソナライズ医療における連合学習

MetaFed: Federated Learning among Federations with Cyclic Knowledge Distillation for Personalized Healthcare ( http://arxiv.org/abs/2206.08516v3 )

ライセンス: Link先を確認

Yiqiang Chen, Wang Lu, Xin Qin, Jindong Wang, Xing Xie

(参考訳) フェデレーション学習は、特にヘルスケアにおいて、生のユーザーデータにアクセスせずにモデルを構築することに注目が集まっている。実際のアプリケーションでは、異なるフェデレーションは、データの不均一性や中央サーバの不信/不信など、起こりうる理由により、ほとんど連携できない。本稿では,異なるフェデレーション間の信頼性の高いFLを実現するためのMetaFedというフレームワークを提案する。 MetaFedは、提案されたサイクリック知識蒸留を通じて、中央サーバーなしで各フェデレーションのパーソナライズされたモデルを取得する。具体的には、MetaFedは各フェデレーションをメタ分布として扱い、各フェデレーションの知識を循環的に集約する。トレーニングは、共通知識蓄積とパーソナライズという2つの部分に分けられる。 3つのベンチマークの総合的な実験により、MetaFedは最先端の手法(PAMAP2のベースラインに比べて10%以上精度が向上している)に比べて通信コストが低いことが示されている。

Federated learning has attracted increasing attention to building models without accessing the raw user data, especially in healthcare. In real applications, different federations can seldom work together due to possible reasons such as data heterogeneity and distrust/inexistence of the central server. In this paper, we propose a novel framework called MetaFed to facilitate trustworthy FL between different federations. MetaFed obtains a personalized model for each federation without a central server via the proposed Cyclic Knowledge Distillation. Specifically, MetaFed treats each federation as a meta distribution and aggregates knowledge of each federation in a cyclic manner. The training is split into two parts: common knowledge accumulation and personalization. Comprehensive experiments on three benchmarks demonstrate that MetaFed without a server achieves better accuracy compared to state-of-the-art methods (e.g., 10%+ accuracy improvement compared to the baseline for PAMAP2) with fewer communication costs.

翻訳日:2023-07-11 19:22:33 公開日:2023-07-09

# 再帰的分割のポイントワイズ挙動とその不均一因果効果推定への応用について

On the Pointwise Behavior of Recursive Partitioning and Its Implications for Heterogeneous Causal Effect Estimation ( http://arxiv.org/abs/2211.10805v2 )

ライセンス: Link先を確認

Matias D. Cattaneo, Jason M. Klusowski, Peter M. Tian

(参考訳) 決定木学習は、ポイントワイズ推論にますます使われている。重要な応用例としては、因果的不均質な治療効果や動的政策決定、条件付き質的回帰や実験の設計などがある。本稿では,決定木(適応再帰的分割によって訓練される)が一様ノルムにおける収束率を定式化しても達成できないことを示すことで,決定木の使用を疑問視する。代わりに、収束は多対数であるかもしれないし、正直な回帰木のようないくつかの重要な特殊ケースでは、完全に失敗する。ランダムな森林は、樹木をほとんど最適な手順に転換し、解釈可能性を失い、さらに2つの追加のチューニングパラメータを導入することで状況を改善することができることを示す。ランダム林の2つの特徴, サブサンプリングとランダム特徴選択機構は, それぞれが考慮されたモデルクラスに対してほぼ最適な性能を達成するのに顕著に寄与している。

Decision tree learning is increasingly being used for pointwise inference. Important applications include causal heterogeneous treatment effects and dynamic policy decisions, as well as conditional quantile regression and design of experiments, where tree estimation and inference are conducted at specific values of the covariates. In this paper, we call into question the use of decision trees (trained by adaptive recursive partitioning) for such purposes by demonstrating that they can fail to achieve polynomial rates of convergence in uniform norm, even with pruning. Instead, the convergence may be poly-logarithmic or, in some important special cases, such as honest regression trees, fail completely. We show that random forests can remedy the situation, turning poorly performing trees into nearly optimal procedures, at the cost of losing interpretability and introducing two additional tuning parameters. The two hallmarks of random forests, subsampling and the random feature selection mechanism, are each seen to contribute distinctively to achieving nearly optimal performance for the model class considered.

翻訳日:2023-07-11 19:15:54 公開日:2023-07-09

# nano: 最小限の言語モデル制御のためのループ内人間報酬学習

Nano: Nested Human-in-the-Loop Reward Learning for Few-shot Language Model Control ( http://arxiv.org/abs/2211.05750v2 )

ライセンス: Link先を確認

Xiang Fan, Yiwei Lyu, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency

(参考訳) 事前学習された言語モデルは、言語生成において卓越した能力を示している。しかし、現実のタスクでは、バイアスの緩和、公平性の促進、パーソナライズの実現のために、生成テキストの分布を制御する必要があることが多い。生成テキストの分布を制御する既存の技術は、定量化された分布でしか機能せず、事前定義されたカテゴリ、分布の比率、あるいは所望の分布に従う既存のコーパスを必要とする。しかし、個人の好みなど多くの重要な分布は定量化されていない。本研究では、人間のフィードバックから継続的に学習するfew-shotのhuman-in-the-loop学習アルゴリズムであるNanoを提案し、任意の分布(定量化されたもの、されていないもの)に従ってテキストを生成する問題に取り組む。 Nanoは、単一のトピック/属性の制御および定量化された分布の制御において、先行研究と比較して最先端の結果を達成する。また、Nanoは定量化されていない分布を学習し、パーソナライズを実現し、個人間の好みの違いを高いサンプル効率で捉えられることを示す。

Pretrained language models have demonstrated extraordinary capabilities in language generation. However, real-world tasks often require controlling the distribution of generated text in order to mitigate bias, promote fairness, and achieve personalization. Existing techniques for controlling the distribution of generated text only work with quantified distributions, which require pre-defined categories, proportions of the distribution, or an existing corpus following the desired distributions. However, many important distributions, such as personal preferences, are unquantified. In this work, we tackle the problem of generating text following arbitrary distributions (quantified and unquantified) by proposing Nano, a few-shot human-in-the-loop training algorithm that continuously learns from human feedback. Nano achieves state-of-the-art results on single topic/attribute as well as quantified distribution control compared to previous works. We also show that Nano is able to learn unquantified distributions, achieves personalization, and captures differences between different individuals' personal preferences with high sample efficiency.

翻訳日:2023-07-11 19:15:36 公開日:2023-07-09

# ブラックボックス検証アルゴリズムを用いた強化学習による運転の安全性向上

Self-Improving Safety Performance of Reinforcement Learning Based Driving with Black-Box Verification Algorithms ( http://arxiv.org/abs/2210.16575v3 )

ライセンス: Link先を確認

Resul Dagdanov, Halil Durmus, Nazim Kemal Ure

(参考訳) 本研究では,強化学習(RL)に基づく自律運転(AD)エージェントの安全性向上を目的とした,ブラックボックス検証手法を用いた自己改善人工知能システムを提案する。近年,ADアプリケーションでRLアルゴリズムが普及している。しかし、既存のRLアルゴリズムの性能はトレーニングシナリオの多様性に大きく依存している。トレーニング段階での安全性クリティカルなシナリオの欠如は、実世界の運転アプリケーションの一般化性能を低下させる可能性がある。本稿では,ブラックボックス検証手法を用いて,トレーニングセットの弱点を探索する新しい枠組みを提案する。 AD障害シナリオを発見した後、RLエージェントのトレーニングは転送学習を通じて再起動され、以前は安全ではなかったシナリオのパフォーマンスが向上する。シミュレーションの結果,RLに基づく適応巡航制御(ACC)アプリケーションにおける動作決定の安全性の低下を効果的に発見し,本手法の反復的適用により車両衝突回数を大幅に削減することを示した。ソースコードはhttps://github.com/data-and-decision-lab/self-improving-RLで公開されている。

In this work, we propose a self-improving artificial intelligence system to enhance the safety performance of reinforcement learning (RL)-based autonomous driving (AD) agents using black-box verification methods. RL algorithms have become popular in AD applications in recent years. However, the performance of existing RL algorithms heavily depends on the diversity of training scenarios. A lack of safety-critical scenarios during the training phase could result in poor generalization performance in real-world driving applications. We propose a novel framework in which the weaknesses of the training set are explored through black-box verification methods. After discovering AD failure scenarios, the RL agent's training is re-initiated via transfer learning to improve the performance of previously unsafe scenarios. Simulation results demonstrate that our approach efficiently discovers safety failures of action decisions in RL-based adaptive cruise control (ACC) applications and significantly reduces the number of vehicle collisions through iterative applications of our method. The source code is publicly available at https://github.com/data-and-decision-lab/self-improving-RL.

翻訳日:2023-07-11 19:15:18 公開日:2023-07-09

# シミュレーションに基づく推論のための合成スコアモデリング

Compositional Score Modeling for Simulation-based Inference ( http://arxiv.org/abs/2209.14249v3 )

ライセンス: Link先を確認

Tomas Geffner, George Papamakarios, Andriy Mnih

(参考訳) シミュレーションに基づく推論のための神経後部推定法は、正確な近似を学習するために多数のシミュレーターコールを必要とする傾向があるため、複数の観測で条件付けした後部分布を扱うのに不適である。対照的に、Neural Likelihood Estimation法は個々の観測から学んだ後の推論時間で複数の観測を処理できるが、MCMCや変分推論のような標準的な推論法に依存しており、特定の性能上の欠点がある。本稿では,両手法の利点を享受する条件スコアモデリングに基づく新しい手法を提案する。個々の観測によって引き起こされる(拡散した)後方分布のスコアをモデル化し、学習したスコアを目標後方分布からほぼサンプルに結合する方法を紹介する。提案手法はサンプル効率が高く,自然に複数の観測結果を推定時に集約し,標準推定手法の欠点を回避することができる。

Neural Posterior Estimation methods for simulation-based inference can be ill-suited for dealing with posterior distributions obtained by conditioning on multiple observations, as they tend to require a large number of simulator calls to learn accurate approximations. In contrast, Neural Likelihood Estimation methods can handle multiple observations at inference time after learning from individual observations, but they rely on standard inference methods, such as MCMC or variational inference, which come with certain performance drawbacks. We introduce a new method based on conditional score modeling that enjoys the benefits of both approaches. We model the scores of the (diffused) posterior distributions induced by individual observations, and introduce a way of combining the learned scores to approximately sample from the target posterior distribution. Our approach is sample-efficient, can naturally aggregate multiple observations at inference time, and avoids the drawbacks of standard inference methods.

翻訳日:2023-07-11 19:14:22 公開日:2023-07-09
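
The compositional score modeling abstract above combines scores learned from individual observations into a score for the posterior given all observations. For conditionally independent observations and undiffused posteriors, the identity behind this is that the joint posterior is proportional to the prior raised to the power 1 - n times the product of the single-observation posteriors, so the scores add. The sketch below checks this on a conjugate Gaussian toy model. The toy model (prior N(0, 1), likelihood N(theta, 1)) and all function names are assumptions for illustration; how the paper handles the diffused posteriors is not shown here.

```python
def prior_score(theta):
    """Score (derivative of the log density) of the prior N(0, 1)."""
    return -theta


def single_obs_posterior_score(theta, x):
    """Score of p(theta | x) = N(x / 2, 1 / 2) for one observation x ~ N(theta, 1)."""
    return -(theta - x / 2.0) / 0.5


def composed_score(theta, xs):
    """Combine per-observation posterior scores into a joint posterior score.

    Uses: score of p(theta | x_1..x_n)
          = (1 - n) * prior score + sum of single-observation posterior scores.
    """
    n = len(xs)
    return (1 - n) * prior_score(theta) + sum(
        single_obs_posterior_score(theta, x) for x in xs
    )


def exact_joint_posterior_score(theta, xs):
    """Closed-form score of p(theta | x_1..x_n) = N(sum(xs) / (n + 1), 1 / (n + 1))."""
    n = len(xs)
    mean = sum(xs) / (n + 1)
    var = 1.0 / (n + 1)
    return -(theta - mean) / var


observations = [1.0, -0.5, 2.0]
theta = 0.3
composed = composed_score(theta, observations)
exact = exact_joint_posterior_score(theta, observations)
```

In the setting of the abstract, the two analytic score functions would be replaced by a learned score network evaluated once per observation, which is why the number of observations can vary at inference time.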

# DynDepNet:動的グラフ構造学習によるfMRIデータからの時間変化依存構造学習

DynDepNet: Learning Time-Varying Dependency Structures from fMRI Data via Dynamic Graph Structure Learning ( http://arxiv.org/abs/2209.13513v3 )

ライセンス: Link先を確認

Alexander Campbell, Antonio Giuliano Zippo, Luca Passamonti, Nicola Toschi, Pietro Lio

(参考訳) グラフニューラルネットワーク(GNN)は、機能的磁気共鳴画像(fMRI)データから得られる脳グラフの学習表現に成功している。しかし、既存のGNN法では、脳グラフは時間とともに静的であると仮定し、グラフ隣接行列はモデルトレーニング前に知られている。これらの仮定は、脳グラフが機能的接続尺度の選択に依存する接続構造を持つ時間変化である証拠と矛盾する。ノイズの多い脳グラフでfMRIデータを誤って表現することは、GNNのパフォーマンスに悪影響を及ぼす可能性がある。そこで我々は,下流予測タスクによって誘導されるfMRIデータの最適時間変化依存性構造を学習するDynDepNetを提案する。実世界のfMRIデータセットの実験は、性別分類のタスクにおいて、DynDepNetが最先端の結果を達成し、それぞれ8ポイントと6ポイントの精度で最高のベースラインを上回ります。さらに、学習したダイナミックグラフの分析により、既存の神経科学文献と一致する予測関連脳領域が明らかになる。

Graph neural networks (GNNs) have demonstrated success in learning representations of brain graphs derived from functional magnetic resonance imaging (fMRI) data. However, existing GNN methods assume brain graphs are static over time and the graph adjacency matrix is known prior to model training. These assumptions contradict evidence that brain graphs are time-varying with a connectivity structure that depends on the choice of functional connectivity measure. Incorrectly representing fMRI data with noisy brain graphs can adversely affect GNN performance. To address this, we propose DynDepNet, a novel method for learning the optimal time-varying dependency structure of fMRI data induced by downstream prediction tasks. Experiments on real-world fMRI datasets, for the task of sex classification, demonstrate that DynDepNet achieves state-of-the-art results, outperforming the best baseline in terms of accuracy by approximately 8 and 6 percentage points, respectively. Furthermore, analysis of the learned dynamic graphs reveals prediction-related brain regions consistent with existing neuroscience literature.

翻訳日:2023-07-11 19:14:06 公開日:2023-07-09

# 高次元(ロバスト)ワッサースタインアライメントに対するデータ依存的アプローチ

A Data-dependent Approach for High Dimensional (Robust) Wasserstein Alignment ( http://arxiv.org/abs/2209.02905v2 )

ライセンス: Link先を確認

Hu Ding, Wenjie Liu, Mingquan Ye

(参考訳) 多くの実世界の問題は、2つの幾何学的パターンのアライメントとして定式化できる。これまで、コンピュータビジョンの分野では2次元または3次元パターンのアライメントに多くの研究が注がれてきた。近年、高次元のアライメント問題にいくつかの新しい応用が見いだされている。しかし、アルゴリズム面での研究はまだかなり限られている。我々の知る限り、既存のアプローチのほとんどは2次元および3次元の場合の手法の単純な拡張であり、高い計算複雑性などの問題に悩まされることが多い。本稿では、高次元の幾何学的パターンを圧縮する効果的な枠組みを提案する。既存の任意のアライメント法を圧縮された幾何学的パターンに適用でき、時間計算量を大幅に削減できる。我々の考えは、高次元データはしばしば本質的な次元が低いという観察に着想を得ている。我々のフレームワークは ``data-dependent'' なアプローチであり、その計算量は入力データの本質的な次元に依存する。実験結果から、圧縮パターン上でアライメントアルゴリズムを実行すると、元のパターン上での結果と同程度の品質が得られる一方、実行時間(圧縮にかかる時間を含む)は大幅に短いことがわかった。

Many real-world problems can be formulated as the alignment between two geometric patterns. Previously, a great amount of research focused on the alignment of 2D or 3D patterns in the field of computer vision. Recently, the alignment problem in high dimensions has found several novel applications in practice. However, the research is still rather limited in the algorithmic aspect. To the best of our knowledge, most existing approaches are just simple extensions of their counterparts for 2D and 3D cases, and often suffer from issues such as high computational complexity. In this paper, we propose an effective framework to compress high dimensional geometric patterns. Any existing alignment method can be applied to the compressed geometric patterns, and the time complexity can be significantly reduced. Our idea is inspired by the observation that high dimensional data often has a low intrinsic dimension. Our framework is a ``data-dependent'' approach whose complexity depends on the intrinsic dimension of the input data. Our experimental results reveal that running the alignment algorithm on compressed patterns achieves quality similar to the results on the original patterns, while the runtimes (including the time spent on compression) are substantially lower.

翻訳日:2023-07-11 19:13:33 公開日:2023-07-09

# 高品質シャドウ合成によるシャドウ除去

Shadow Removal by High-Quality Shadow Synthesis ( http://arxiv.org/abs/2212.04108v2 )

ライセンス: Link先を確認

Yunshan Zhong, Lizhou You, Yuxin Zhang, Fei Chao, Yonghong Tian, Rongrong Ji

(参考訳) ほとんどのシャドウ除去手法は、精巧で豪華なシャドウ領域アノテーションに関連するトレーニング画像の侵入に依存しているため、シャドウ画像合成の人気が高まっている。しかし、これらの合成画像は、しばしば陰性で細部が不完全であるため、性能が劣っている。本稿では,高品質擬似影画像合成のためのhqssと呼ばれる新しい生成フレームワークを提案する。与えられた画像はまずシャドー領域idと非シャドー領域idに分離される。 HQSSは擬似画像を合成するためにシャドー機能エンコーダとジェネレータを使用している。具体的には、エンコーダは、他の領域アイデンティティとペアになって擬似画像を合成するジェネレータ入力として機能する領域アイデンティティの影特徴を抽出する。擬似画像は、その入力影特徴としての影特徴と、その入力領域のアイデンティティとしてのリアルライクな画像詳細を有することが期待されている。この目標を達成するために,我々は3つの学習目標を設計する。影の特徴と入力領域のアイデンティティが同じ領域の同一性を持つ場合、生成元を誘導して同一の擬似画像を入力として再構成する自己再構成損失を提案する。シャドウ特徴と入力領域の同一性が異なる場合、合成画像中にシャドウ特性と詳細情報が適切に保持されることを確認するために、再構成間損失とサイクル再構成損失を導入する。我々のHQSSは、ISTDデータセット、ビデオシャドウ除去データセット、SRDデータセットにおいて最先端の手法よりも優れています。コードはhttps://github.com/zysxmu/hqssで入手できる。

Most shadow removal methods rely on the invasion of training images associated with laborious and lavish shadow region annotations, leading to the increasing popularity of shadow image synthesis. However, the poor performance also stems from these synthesized images since they are often shadow-inauthentic and details-impaired. In this paper, we present a novel generation framework, referred to as HQSS, for high-quality pseudo shadow image synthesis. The given image is first decoupled into a shadow region identity and a non-shadow region identity. HQSS employs a shadow feature encoder and a generator to synthesize pseudo images. Specifically, the encoder extracts the shadow feature of a region identity which is then paired with another region identity to serve as the generator input to synthesize a pseudo image. The pseudo image is expected to have the shadow feature as its input shadow feature and as well as a real-like image detail as its input region identity. To fulfill this goal, we design three learning objectives. When the shadow feature and input region identity are from the same region identity, we propose a self-reconstruction loss that guides the generator to reconstruct an identical pseudo image as its input. When the shadow feature and input region identity are from different identities, we introduce an inter-reconstruction loss and a cycle-reconstruction loss to make sure that shadow characteristics and detail information can be well retained in the synthesized images. Our HQSS is observed to outperform the state-of-the-art methods on ISTD dataset, Video Shadow Removal dataset, and SRD dataset. The code is available at https://github.com/zysxmu/HQSS.

翻訳日:2023-07-11 19:04:54 公開日:2023-07-09

# 連続学習のための逐次ベイズ推論について

On Sequential Bayesian Inference for Continual Learning ( http://arxiv.org/abs/2301.01828v2 )

ライセンス: Link先を確認

Samuel Kessler, Adam Cobb, Tim G. J. Rudner, Stefan Zohren, Stephen J. Roberts

(参考訳) 連続ベイズ推論は、過去のタスクの破滅的な忘れ込みを防止し、新しいタスクを学ぶ前に情報を提供するために連続学習に使用できる。我々はシーケンシャルベイズ推定を再検討し、真の後方へのアクセスがベイズニューラルネットワークの破滅的な忘れを防げるかどうかを検証する。これを行うために、ハミルトンモンテカルロを用いて連続ベイズ推論を行う。我々は、ハミルトンモンテカルロサンプルに密度推定器を組み込むことにより、新しいタスクの先行として後部を伝播する。ニューラルネットワークにおける逐次ベイズ推論の実行の困難さを示す破滅的な忘れ込みを防ぐには,このアプローチは失敗する。そこで, 逐次ベイズ推論とCLの簡単な解析例を考察し, 正確な推論にもかかわらず, 準最適連続学習性能に繋がるモデル不特定の問題を強調した。さらに、タスクデータの不均衡がいかに忘れてしまうかについて議論する。これらの制限から、ベイズニューラルネットワークの重みに対する逐次ベイズ推定に頼るのではなく、連続的な学習生成過程の確率論的モデルが必要であると論じる。そこで本研究では,古典的ベイズ連続学習法と競合する,原型的ベイズ連続学習という単純なベースラインを提案する。

Sequential Bayesian inference can be used for continual learning to prevent catastrophic forgetting of past tasks and provide an informative prior when learning new tasks. We revisit sequential Bayesian inference and test whether having access to the true posterior is guaranteed to prevent catastrophic forgetting in Bayesian neural networks. To do this we perform sequential Bayesian inference using Hamiltonian Monte Carlo. We propagate the posterior as a prior for new tasks by fitting a density estimator on Hamiltonian Monte Carlo samples. We find that this approach fails to prevent catastrophic forgetting demonstrating the difficulty in performing sequential Bayesian inference in neural networks. From there we study simple analytical examples of sequential Bayesian inference and CL and highlight the issue of model misspecification which can lead to sub-optimal continual learning performance despite exact inference. Furthermore, we discuss how task data imbalances can cause forgetting. From these limitations, we argue that we need probabilistic models of the continual learning generative process rather than relying on sequential Bayesian inference over Bayesian neural network weights. In this vein, we also propose a simple baseline called Prototypical Bayesian Continual Learning, which is competitive with state-of-the-art Bayesian continual learning methods on class incremental continual learning vision benchmarks.

翻訳日:2023-07-11 18:54:53 公開日:2023-07-09
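
The sequential Bayesian inference abstract above rests on the idea of propagating the posterior from one task as the prior for the next. The sketch below shows the ideal case on a conjugate Beta-Bernoulli model, where this propagation is exact and matches a single batch update on all data. The toy model and the name `beta_update` are assumptions for illustration; the abstract's point is that this exactness does not carry over easily to Bayesian neural networks.

```python
def beta_update(alpha, beta, outcomes):
    """Exact posterior of a Beta(alpha, beta) prior after Bernoulli outcomes (0 or 1)."""
    successes = sum(outcomes)
    failures = len(outcomes) - successes
    return alpha + successes, beta + failures


task1 = [1, 1, 0, 1]
task2 = [0, 0, 1]

# Sequential inference: the posterior after task 1 becomes the prior for task 2.
posterior_after_task1 = beta_update(1.0, 1.0, task1)
sequential = beta_update(*posterior_after_task1, task2)

# Batch inference on the union of both tasks, starting from the same prior.
batch = beta_update(1.0, 1.0, task1 + task2)
```

Here nothing about task 1 is forgotten because the two posterior parameters summarize it completely. With neural network weights the posterior has no such compact exact form, which is where the approximations discussed in the abstract enter.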

# Adaptive Experimentation at Scale: 柔軟なバッチのための計算フレームワーク

Adaptive Experimentation at Scale: A Computational Framework for Flexible Batches ( http://arxiv.org/abs/2303.11582v3 )

ライセンス: Link先を確認

Ethan Che, Hongseok Namkoong

(参考訳) 計測努力の継続的な再配置を仮定する標準的なバンディットアルゴリズムは、遅延したフィードバックとインフラ/組織的困難のために実装が困難である。結果がバッチで測定される少数の再配置時代の実例に動機づけられて,バッチ処理を柔軟に処理可能な計算駆動型適応実験フレームワークを開発した。我々の主な観察は、統計的推論において普遍的な正規近似は適応アルゴリズムの設計を導くことができることである。ガウスの逐次実験を導出することにより,先行情報を平均報酬に活用できる動的プログラムを定式化する。一般的な理論駆動のパラダイムの代わりに、計算ツールと経験的ベンチマークをアルゴリズム開発に活用する。特に,経験的解析では,確率的勾配降下を用いて計画問題を反復的に解く,単純かつ効果的なアルゴリズムである残留地平線最適化を強調する。我々の手法は、個々の報酬の完全な分布的知識を必要とするベイズ帯域幅アルゴリズム(例えばトンプソンサンプリング)と比較しても、標準手法よりも統計的パワーを著しく向上させる。全体的に、適応実験の範囲を標準的な方法では難しい設定に拡大し、少数の再配置エポック、低い信号対雑音比、未知の報酬分布を含む。

Standard bandit algorithms that assume continual reallocation of measurement effort are challenging to implement due to delayed feedback and infrastructural/organizational difficulties. Motivated by practical instances involving a handful of reallocation epochs in which outcomes are measured in batches, we develop a computation-driven adaptive experimentation framework that can flexibly handle batching. Our main observation is that normal approximations, which are universal in statistical inference, can also guide the design of adaptive algorithms. By deriving a Gaussian sequential experiment, we formulate a dynamic program that can leverage prior information on average rewards. Instead of the typical theory-driven paradigm, we leverage computational tools and empirical benchmarking for algorithm development. In particular, our empirical analysis highlights a simple yet effective algorithm, Residual Horizon Optimization, which iteratively solves a planning problem using stochastic gradient descent. Our approach significantly improves statistical power over standard methods, even when compared to Bayesian bandit algorithms (e.g., Thompson sampling) that require full distributional knowledge of individual rewards. Overall, we expand the scope of adaptive experimentation to settings that are difficult for standard methods, involving a small number of reallocation epochs, low signal-to-noise ratio, and unknown reward distributions.

翻訳日:2023-07-11 18:46:09 公開日:2023-07-09
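
The adaptive experimentation abstract above relies on normal approximations of batched outcomes to drive a Gaussian sequential experiment. A minimal sketch of one such step follows: the batch mean of an arm is treated as approximately Gaussian and used in a conjugate update of a Gaussian prior on that arm's average reward. The known noise variance, the parameter names, and the function `gaussian_batch_update` are assumptions for illustration; the paper's dynamic program and Residual Horizon Optimization are not reproduced here.

```python
def gaussian_batch_update(prior_mean, prior_var, batch_mean, batch_size, noise_var):
    """Posterior over an arm's average reward after observing one batch.

    Assumes batch_mean ~ N(true_mean, noise_var / batch_size), which is the
    normal approximation for a batch mean, and a N(prior_mean, prior_var) prior.
    """
    prior_precision = 1.0 / prior_var
    data_precision = batch_size / noise_var
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean
                 + data_precision * batch_mean) / post_precision
    return post_mean, 1.0 / post_precision


# One reallocation epoch for a single arm: 4 samples with average reward 1.0.
post_mean, post_var = gaussian_batch_update(
    prior_mean=0.0, prior_var=1.0, batch_mean=1.0, batch_size=4, noise_var=4.0
)
```

Because the update depends on the data only through the batch mean and batch size, the same step applies whatever the distribution of individual rewards is, as long as the batch is large enough for the normal approximation to be reasonable.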

# 3次元点群におけるオープンボキャブラリ・アフォーダンス検出

Open-Vocabulary Affordance Detection in 3D Point Clouds ( http://arxiv.org/abs/2303.02401v2 )

ライセンス: Link先を確認

Toan Nguyen, Minh Nhat Vu, An Vuong, Dzung Nguyen, Thieu Vo, Ngan Le, Anh Nguyen

(参考訳) アフォーダンス検出は、多様なロボット応用を持つ難しい問題である。従来のアフォーダンス検出手法は、事前定義されたアフォーダンスラベルの集合に制限されており、複雑で動的な環境における知能ロボットの適応性を制限する可能性がある。そこで本稿では、3次元点群内の無制限の数のアフォーダンスを検出できるOpen-Vocabulary Affordance Detection (OpenAD)法を提案する。 OpenADは、アフォーダンスのテキストと点の特徴を同時に学習することで、アフォーダンス間の意味的関係をうまく活用する。したがって、提案手法はゼロショット検出が可能であり、アノテーション例を1つも使わずに、未知のアフォーダンスを検出できる。集中的な実験の結果、OpenADは幅広いアフォーダンス検出設定で効果的に機能し、他のベースラインを大差で上回ることが示された。さらに、高速な推論速度(約100ms)を持つ実世界のロボット応用において、提案するOpenADの実用性を示す。私たちのプロジェクトはhttps://openad2023.github.ioで利用可能です。

Affordance detection is a challenging problem with a wide variety of robotic applications. Traditional affordance detection methods are limited to a predefined set of affordance labels, hence potentially restricting the adaptability of intelligent robots in complex and dynamic environments. In this paper, we present the Open-Vocabulary Affordance Detection (OpenAD) method, which is capable of detecting an unbounded number of affordances in 3D point clouds. By simultaneously learning the affordance text and the point feature, OpenAD successfully exploits the semantic relationships between affordances. Therefore, our proposed method enables zero-shot detection and can detect previously unseen affordances without a single annotation example. Intensive experimental results show that OpenAD works effectively on a wide range of affordance detection setups and outperforms other baselines by a large margin. Additionally, we demonstrate the practicality of the proposed OpenAD in real-world robotic applications with a fast inference speed (~100ms). Our project is available at https://openad2023.github.io.

翻訳日:2023-07-11 18:44:55 公開日:2023-07-09

# FedCLIP:フェデレートラーニングにおけるCLIPの迅速な一般化とパーソナライズ

FedCLIP: Fast Generalization and Personalization for CLIP in Federated Learning ( http://arxiv.org/abs/2302.13485v2 )

ライセンス: Link先を確認

Wang Lu, Xixu Hu, Jindong Wang, Xing Xie

(参考訳) フェデレーション学習(FL)は、近年、プライバシ保護計算の新しいパラダイムとして登場している。残念ながら、FLはその実際の性能を妨げる2つの重要な課題、すなわちデータ分布の不均一性と、大規模基盤モデルがもたらす高いリソースコストに直面している。具体的には、クライアントごとに異なる非IIDデータは既存のFLアルゴリズムの収束を難しくし、計算コストや通信コストを含む高いリソースコストは実世界のシナリオでのデプロイを難しくする。本稿では、フェデレーション学習におけるCLIPの迅速な一般化とパーソナライズを実現するために、FedCLIPという効果的かつシンプルな手法を提案する。具体的には、大規模モデルであるCLIPのためにアテンションベースのアダプタを設計し、残りの操作はアダプタのみに依存する。軽量なアダプタは、事前学習済みモデルの情報を最大限に活用し、特定のタスクにおいてモデルがクライアントに適応することを保証する。同時に、小規模な操作により、大規模モデルによる計算負担と通信負担を軽減できる。分布シフトを伴う3つのデータセットで広範な実験を行う。定性的および定量的な結果は、FedCLIPが他のベースラインを大きく上回り(PACSで全体として9%の改善)、計算と通信のコストを効果的に削減する(FedAVGより283倍高速)ことを示している。私たちのコードは https://github.com/microsoft/PersonalizedFL で公開予定です。

Federated learning (FL) has emerged as a new paradigm for privacy-preserving computation in recent years. Unfortunately, FL faces two critical challenges that hinder its actual performance: data distribution heterogeneity and high resource costs brought by large foundation models. Specifically, the non-IID data in different clients make existing FL algorithms hard to converge, while the high resource costs, including computational and communication costs, increase the deployment difficulty in real-world scenarios. In this paper, we propose an effective yet simple method, named FedCLIP, to achieve fast generalization and personalization for CLIP in federated learning. Concretely, we design an attention-based adapter for the large model, CLIP, and the remaining operations depend only on the adapters. Lightweight adapters can make the most of pretrained model information and ensure that models adapt to clients in specific tasks. Simultaneously, small-scale operations can mitigate the computational and communication burden caused by large models. Extensive experiments are conducted on three datasets with distribution shifts. Qualitative and quantitative results demonstrate that FedCLIP significantly outperforms other baselines (9% overall improvement on PACS) and effectively reduces computational and communication costs (283x faster than FedAVG). Our code will be available at: https://github.com/microsoft/PersonalizedFL.
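The abstract says that everything beyond the CLIP model itself depends only on the adapters, which suggests that only adapter parameters need to travel between clients and server. The following is a minimal sketch of such a server-side aggregation step, not the authors' implementation: the sample-size-weighted (FedAvg-style) rule, the function name `average_adapters`, the parameter names, and the client sizes are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def average_adapters(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Sample-size-weighted average of adapter parameters (FedAvg-style, assumed)."""
    total = sum(client_sizes)
    averaged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two illustrative clients; only the small adapter is exchanged, never the backbone.
clients = [
    {"adapter.weight": [1.0, 2.0], "adapter.bias": [0.0]},
    {"adapter.weight": [3.0, 6.0], "adapter.bias": [4.0]},
]
global_adapter = average_adapters(clients, client_sizes=[100, 300])
print(global_adapter)
```

Because the averaged dictionary holds only adapter entries, the per-round payload scales with the adapter size rather than with the size of the large model, which is the kind of communication saving the abstract refers to.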

翻訳日:2023-07-11 18:43:37 公開日:2023-07-09

# 真陰性数が無限大に近づくとき、MCCは適合率と再現率の幾何平均に近づく

The MCC approaches the geometric mean of precision and recall as true negatives approach infinity ( http://arxiv.org/abs/2305.00594v2 )

ライセンス: Link先を確認

Jon Crall

(参考訳) 二値分類器の性能は、真陽性(TP)、真陰性(TN)、偽陽性(FP)、偽陰性(FN)の数という4つのエントリからなる混同行列によって記述される。マシューズ相関係数(MCC)、F1、Fowlkes-Mallows(FM)スコアは、混同行列を要約するスカラーである。F1とFMのスコアはいずれも、混同行列の4つのエントリのうち3つにしか基づかない(TNを無視している)。対照的に、MCCは混同行列の4つのエントリすべてを考慮するため、より代表的な全体像を与えるとみなせる。しかし、物体検出問題では真陰性の数が非常に大きく、その計測はしばしば困難である。では、真陰性の数が無限大に近づくとMCCはどうなるのか? 本稿では、真陰性の数が無限大に近づくときのMCCの極限がFM尺度に等しいことを証明し、MCCとFMスコアの関係についての洞察を与える。

The performance of a binary classifier is described by a confusion matrix with four entries: the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The Matthews Correlation Coefficient (MCC), F1, and Fowlkes-Mallows (FM) scores are scalars that summarize a confusion matrix. Both the F1 and FM scores are based on only three of the four entries in the confusion matrix (they ignore TN). In contrast, the MCC takes into account all four entries of the confusion matrix and thus can be seen as providing a more representative picture. However, in object detection problems, the number of true negatives is so large that measuring it is often intractable. Thus we ask, what happens to the MCC as the number of true negatives approaches infinity? This paper provides insight into the relationship between the MCC and FM score by proving that the FM-measure is equal to the limit of the MCC as the number of true negatives approaches infinity.
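The limit stated in the abstract can be checked numerically. The sketch below uses the standard definitions, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) and FM = sqrt(precision · recall); the confusion-matrix counts are made up for illustration and do not come from the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def fowlkes_mallows(tp, fp, fn):
    """Geometric mean of precision and recall; TN is not used."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return math.sqrt(precision * recall)


tp, fp, fn = 80, 20, 40  # illustrative counts, not taken from the paper
fm = fowlkes_mallows(tp, fp, fn)
exponents = (2, 4, 6, 8)
gaps = [abs(mcc(tp, 10 ** k, fp, fn) - fm) for k in exponents]
for k, gap in zip(exponents, gaps):
    print(f"TN = 1e{k}: |MCC - FM| = {gap:.2e}")
```

With TP, FP, and FN held fixed, the gap shrinks as TN grows, which is the behavior the paper proves in the limit.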

翻訳日:2023-07-11 18:37:16 公開日:2023-07-09

# モーションブラーを含む大規模シーンのためのハイブリッドニューラルレンダリング

Hybrid Neural Rendering for Large-Scale Scenes with Motion Blur ( http://arxiv.org/abs/2304.12652v2 )

ライセンス: Link先を確認

Peng Dai, Yinda Zhang, Xin Yu, Xiaoyang Lyu, Xiaojuan Qi

(参考訳) 新規視点画像のレンダリングは多くのアプリケーションにとって非常に望ましい。近年の進歩にもかかわらず、不可避なアーティファクト(例えば、モーションブラー)を含む実環境の画像から、大規模シーンの高忠実かつ視点一貫性のある新規視点をレンダリングすることは、依然として困難である。そこで我々は,画像ベース表現とニューラル3D表現を組み合わせて高品質で視点一貫性のある画像をレンダリングするハイブリッドなニューラルレンダリングモデルを開発した。さらに、実環境で撮影された画像には、レンダリングされた画像の品質を劣化させるモーションブラーなどのアーティファクトが必然的に含まれる。そこで本研究では,レンダリングされた画像上でブラー効果をシミュレートしてぼやけた画像の悪影響を軽減する手法と,事前計算した品質考慮の重みに基づいて学習中のそれらの重要度を下げる手法を提案する。実データおよび合成データに関する広範な実験により,提案モデルが新規視点合成のための最先端のポイントベース手法を上回ることが示された。コードはhttps://daipengwa.github.io/Hybrid-Rendering-ProjectPageで入手できる。

Rendering novel view images is highly desirable for many applications. Despite recent progress, it remains challenging to render high-fidelity and view-consistent novel views of large-scale scenes from in-the-wild images with inevitable artifacts (e.g., motion blur). To this end, we develop a hybrid neural rendering model that makes image-based representation and neural 3D representation join forces to render high-quality, view-consistent images. Besides, images captured in the wild inevitably contain artifacts, such as motion blur, which deteriorate the quality of rendered images. Accordingly, we propose strategies to simulate blur effects on the rendered images to mitigate the negative influence of blurry images and reduce their importance during training based on precomputed quality-aware weights. Extensive experiments on real and synthetic data demonstrate our model surpasses state-of-the-art point-based methods for novel view synthesis. The code is available at https://daipengwa.github.io/Hybrid-Rendering-ProjectPage.

翻訳日:2023-07-11 18:36:49 公開日:2023-07-09

# Adaptive Spiking Encoder-Decoder Network を用いた高精度かつ効率的なイベントベースセマンティックセグメンテーション

Accurate and Efficient Event-based Semantic Segmentation Using Adaptive Spiking Encoder-Decoder Network ( http://arxiv.org/abs/2304.11857v2 )

ライセンス: Link先を確認

Rui Zhang, Luziwei Leng, Kaiwei Che, Hu Zhang, Jie Cheng, Qinghai Guo, Jiangxing Liao and Ran Cheng

(参考訳) 低消費電力でイベント駆動型の計算と固有の時間的ダイナミクスを活用するスパイキングニューラルネットワーク(SNN)は、イベントベースのセンサーからの動的かつ非同期な信号を処理するための、潜在的に理想的なソリューションである。しかしながら、トレーニングの課題とアーキテクチャ設計の制約により、人工ニューラルネットワーク(ANN)と比較して、イベントベースの高密度予測という領域における競争力のあるSNNの例は限られている。本稿では,大規模なイベントベースセマンティックセグメンテーションタスクのために設計された,効率的なスパイキングエンコーダデコーダネットワークを提案する。これは階層探索法を用いてエンコーダを最適化することで達成される。動的イベントストリームからの学習を強化するために,スパイキングニューロンの固有の適応しきい値を用いてネットワーク活性化を変調する。さらに,スパースイベントの表現を強化し,ネットワーク性能を大幅に向上させるために,デュアルパスのSpiking Spatially-Adaptive Modulation(SSAM)ブロックを導入する。提案するネットワークは,DDD17データセットで72.57%,最近導入されたより大規模なDSEC-Semanticデータセットで57.22%のMIoU(mean intersection over union)を達成する。この性能は、現在の最先端のANNを4%上回り、消費する計算リソースも大幅に少ない。我々の知る限りでは、要求の厳しいイベントベースセマンティックセグメンテーションタスクにおいてSNNがANNよりも優れていることを示した最初の研究であり、イベントベースビジョンの分野におけるSNNの大きな可能性を確立する。ソースコードは公開される予定である。

Leveraging the low-power, event-driven computation and the inherent temporal dynamics, spiking neural networks (SNNs) are potentially ideal solutions for processing dynamic and asynchronous signals from event-based sensors. However, due to the challenges in training and the restrictions in architectural design, there are limited examples of competitive SNNs in the realm of event-based dense prediction when compared to artificial neural networks (ANNs). In this paper, we present an efficient spiking encoder-decoder network designed for large-scale event-based semantic segmentation tasks. This is achieved by optimizing the encoder using a hierarchical search method. To enhance learning from dynamic event streams, we harness the inherent adaptive threshold of spiking neurons to modulate network activation. Moreover, we introduce a dual-path Spiking Spatially-Adaptive Modulation (SSAM) block, specifically designed to enhance the representation of sparse events, thereby considerably improving network performance. Our proposed network achieves a 72.57% mean intersection over union (MIoU) on the DDD17 dataset and a 57.22% MIoU on the recently introduced, larger DSEC-Semantic dataset. This performance surpasses the current state-of-the-art ANNs by 4%, whilst consuming significantly less computational resources. To the best of our knowledge, this is the first study demonstrating SNNs outperforming ANNs in demanding event-based semantic segmentation tasks, thereby establishing the vast potential of SNNs in the field of event-based vision. Our source code will be made publicly accessible.
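The headline numbers in this abstract are mean intersection over union (MIoU) scores. As a reminder of how that metric is usually computed, here is a minimal sketch from a class confusion matrix; the matrix values are invented, and the paper's exact evaluation protocol (for example, which classes are ignored) is not stated in the abstract.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix (rows: ground truth, cols: prediction)."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                          # class c pixels predicted as other classes
        fp = sum(confusion[r][c] for r in range(n)) - tp     # other pixels predicted as class c
        union = tp + fp + fn
        if union:  # skip classes absent from both ground truth and prediction
            ious.append(tp / union)
    return sum(ious) / len(ious)


# Illustrative two-class pixel counts, not results from the paper.
conf = [
    [50, 10],
    [5, 35],
]
score = mean_iou(conf)
print(f"mIoU = {score:.4f}")
```

Averaging per-class IoU rather than pooling pixels means rare classes weigh as much as frequent ones, which is why MIoU is a stricter summary than pixel accuracy for segmentation.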

翻訳日:2023-07-11 18:36:26 公開日:2023-07-09

# 大規模言語モデルの創造性について

On the Creativity of Large Language Models ( http://arxiv.org/abs/2304.00008v3 )

ライセンス: Link先を確認

Giorgio Franceschelli, Mirco Musolesi

(参考訳) 大規模言語モデル(LLM)は、人工知能のいくつかの領域に革命をもたらしている。最も顕著な応用の1つは、詩やストーリーテリングのような創造的な執筆であり、生成されたアウトプットはしばしば驚くべき品質である。しかし、自然な疑問が生まれる。LLMは本当に創造的と言えるのか? 本稿では、まず創造性理論の観点からLLMの開発を分析し、鍵となる未解決の問いと課題を調査する。特に、マーガレット・ボーデン(Margaret Boden)が著作で提案した、価値、新規性、驚きという次元を中心に議論する。次に、プロダクト(product)、プロセス(process)、プレス(press)、パーソン(person)という古典的な視点を検討する。機械の創造性における一連の「easy」な問題と「hard」な問題を論じ、LLMとの関連で提示する。最後に、これらの技術の社会的影響を、特に創造産業に焦点を当てて検討し、それらがもたらす機会、それらによって生じる課題、および法的・倫理的な観点からの潜在的なリスクを分析する。

Large Language Models (LLMs) are revolutionizing several areas of Artificial Intelligence. One of the most remarkable applications is creative writing, e.g., poetry or storytelling: the generated outputs are often of astonishing quality. However, a natural question arises: can LLMs really be considered creative? In this article, we first analyze the development of LLMs through the lens of creativity theories, investigating the key open questions and challenges. In particular, we focus our discussion on the dimensions of value, novelty, and surprise as proposed by Margaret Boden in her work. Then, we consider different classic perspectives, namely product, process, press, and person. We discuss a set of "easy" and "hard" problems in machine creativity, presenting them in relation to LLMs. Finally, we examine the societal impact of these technologies with a particular focus on the creative industries, analyzing the opportunities they offer, the challenges they raise, and the potential associated risks, from both legal and ethical points of view.

翻訳日:2023-07-11 18:35:06 公開日:2023-07-09

# SKIの高速化 - 非対称カーネルによるToeplitzニューラルネットワークの高速化

SKI to go Faster: Accelerating Toeplitz Neural Networks via Asymmetric Kernels ( http://arxiv.org/abs/2305.09028v2 )

ライセンス: Link先を確認

Alexander Moreno, Jonathan Mei, Luke Walters

(参考訳) Toeplitz Neural Networks (TNN) (Qin et al. 2023) は、印象的な結果を持つ最近のシーケンスモデルである。TNNは O(n log n) の計算量と、O(n) 回の相対位置エンコーダ (RPE) 多層パーセプトロン (MLP) および減衰バイアスの呼び出しを必要とする。我々はその両方の削減を目指す。まず、RPEは非SPD(対称正定値)カーネルであり、Toeplitz行列は擬グラム行列であることに注意する。さらに、1) 学習されたカーネルは主対角線付近でスパイク状の振る舞いを示し、それ以外では滑らかである。2) RPE MLP は遅い。双方向モデルの場合、これはスパース+低ランクのToeplitz行列分解を動機付ける。スパース成分の作用には、小さな1D畳み込みを用いる。低ランク成分については、RPE MLP を線形補間で置き換え、非対称な構造化カーネル補間 (SKI) (Wilson et al. 2015) を用いて O(n) の計算量を実現し、厳密な誤差解析を与える。因果モデルでは、「高速」な因果マスキング (Katharopoulos et al. 2020) がSKIの利点を打ち消す。そこで周波数領域で扱うことで、明示的な減衰バイアスを避ける。因果性を強制するために、RPEを用いて周波数応答の実部によりカーネルを表現し、ヒルベルト変換を用いて虚部を計算する。これは O(n log n) の計算量を維持するが、絶対的なスピードアップを達成する。周波数応答を直接モデル化することは、FFTを1つ減らせるため、双方向の学習でも競争力がある。我々は、スコアの劣化を最小限に抑えつつ、Long Range Arena (Tay et al. 2020) で速度面の最先端を達成した。

Toeplitz Neural Networks (TNNs) (Qin et al. 2023) are a recent sequence model with impressive results. They require O(n log n) computational complexity and O(n) relative positional encoder (RPE) multi-layer perceptron (MLP) and decay bias calls. We aim to reduce both. We first note that the RPE is a non-SPD (symmetric positive definite) kernel and the Toeplitz matrices are pseudo-Gram matrices. Further, 1) the learned kernels display spiky behavior near the main diagonals with otherwise smooth behavior; 2) the RPE MLP is slow. For bidirectional models, this motivates a sparse plus low-rank Toeplitz matrix decomposition. For the sparse component's action, we do a small 1D convolution. For the low-rank component, we replace the RPE MLP with linear interpolation and use asymmetric Structured Kernel Interpolation (SKI) (Wilson et al. 2015) for O(n) complexity: we provide rigorous error analysis. For causal models, "fast" causal masking (Katharopoulos et al. 2020) negates SKI's benefits. Working in the frequency domain, we avoid an explicit decay bias. To enforce causality, we represent the kernel via the real part of its frequency response using the RPE and compute the imaginary part via a Hilbert transform. This maintains O(n log n) complexity but achieves an absolute speedup. Modeling the frequency response directly is also competitive for bidirectional training, using one fewer FFT. We set a speed state of the art on Long Range Arena (Tay et al. 2020) with minimal score degradation.
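The O(n log n) baseline cost mentioned in the abstract is what the standard FFT-based Toeplitz product gives. The sketch below shows that generic trick (embed the Toeplitz matrix in a circulant one, then multiply in the frequency domain). It is not the paper's SKI or Hilbert-transform method, and the pure-Python radix-2 FFT is used only to keep the example self-contained; the helper names `fft` and `toeplitz_matvec` are assumptions for illustration.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def toeplitz_matvec(first_col, first_row, x):
    """Multiply the Toeplitz matrix T[i][j] = t(i - j) by x in O(n log n)."""
    n = len(x)
    size = 1
    while size < 2 * n:
        size *= 2
    # First column of the circulant embedding:
    # t(0..n-1), zero padding, then t(-(n-1))..t(-1).
    c = list(first_col) + [0.0] * (size - 2 * n + 1) + list(reversed(first_row[1:]))
    xp = list(x) + [0.0] * (size - n)
    prod = [a * b for a, b in zip(fft(c), fft(xp))]
    y = fft(prod, inverse=True)
    return [v.real / size for v in y[:n]]


# Small asymmetric example, checked against the dense O(n^2) product.
first_col = [1.0, 2.0, 3.0]   # t(0), t(1), t(2)
first_row = [1.0, 4.0, 5.0]   # t(0), t(-1), t(-2)
x = [1.0, 0.0, -1.0]
fast = toeplitz_matvec(first_col, first_row, x)
dense = [
    sum((first_col[i - j] if i >= j else first_row[j - i]) * x[j] for j in range(3))
    for i in range(3)
]
print(fast, dense)
```

In practice one would use an optimized FFT library instead of the recursive routine here; the point is only that a Toeplitz matrix is fully described by 2n − 1 numbers, so the product never needs the dense n × n matrix.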

翻訳日:2023-07-11 18:25:35 公開日:2023-07-09

# MARS: 車両損傷インスタンスセグメンテーションのための逐次クアッドツリーノードを用いたマスクアテンション精緻化

MARS: Mask Attention Refinement with Sequential Quadtree Nodes for Car Damage Instance Segmentation ( http://arxiv.org/abs/2305.04743v2 )

ライセンス: Link先を確認

Teerapong Panboonyuen, Naphat Nithisopa, Panin Pienroj, Laphonchai Jirachuphun, Chaiwasut Watthanasirikrit, Naruepon Pornwiriyakul

(参考訳) 事故による自動車損傷の評価は、自動車保険業界にとって重要である。しかし、ディープラーニングネットワークは自動車の損傷画像を入力とするようには設計されておらず、分割されたマスクもまだ非常に粗いため、現実のアプリケーションには精度が不十分である。本稿では,車両損傷インスタンスセグメンテーションのためのMARS(Mask Attention Refinement with Sequential quadtree nodes)を提案する。MARSは、逐次クアッドツリーノード層とクアッドツリートランスフォーマーの間のグローバルな依存関係を捉える自己注意機構を用いてチャネル重みを再較正し、高精度なインスタンスマスクを予測する。広範囲にわたる実験により、MARSはタイの自動車損傷データセットにおいて、Mask R-CNN [9]、PointRend [13]、Mask Transfiner [12] といった最先端(SOTA)のインスタンスセグメンテーション手法を、R50-FPNバックボーンで+1.3 maskAP、R101-FPNバックボーンで+2.3 maskAPという大きな差で上回ることが示された。デモはhttps://github.com/kaopanboonyuen/MARSで公開している。

Evaluating car damage from accidents is critical to the car insurance industry. However, the accuracy is still insufficient for real-world applications since the deep learning network is not designed for car damage images as inputs, and its segmented masks are still very coarse. This paper presents MARS (Mask Attention Refinement with Sequential quadtree nodes) for car damage instance segmentation. MARS uses self-attention mechanisms to draw global dependencies between the sequential quadtree nodes layer and the quadtree transformer to recalibrate channel weights and predict highly accurate instance masks. Our extensive experiments demonstrate that MARS outperforms state-of-the-art (SOTA) instance segmentation methods such as Mask R-CNN [9], PointRend [13], and Mask Transfiner [12] by a large margin of +1.3 maskAP with an R50-FPN backbone and +2.3 maskAP with an R101-FPN backbone on a Thai car-damage dataset. Our demos are available at https://github.com/kaopanboonyuen/MARS.

翻訳日:2023-07-11 18:24:23 公開日:2023-07-09

# CrAFT: 効率的な視覚タスク適応のための圧縮対応ファインチューニング

CrAFT: Compression-Aware Fine-Tuning for Efficient Visual Task Adaptation ( http://arxiv.org/abs/2305.04526v2 )

ライセンス: Link先を確認

Jung Hwan Heo, Seyedarmin Azizi, Arash Fayyazi, Massoud Pedram

(参考訳) 転移学習は基礎モデルの時代において一般的なタスク適応手法となった。しかし、多くのファンデーションモデルは大規模なストレージとコンピューティングリソースを必要としている。プルーニングや量子化といったポストトレーニング圧縮技術は、デプロイメントコストの削減に役立つ。残念ながら、結果として生じるパフォーマンス劣化は、そのようなテクニックのユーザビリティとメリットを制限します。この性能ギャップを埋めるために,ネットワーク圧縮を効果的に学習できる簡易な微調整フレームワークCrAFTを提案する。 CrAFTでは、ユーザーは単にデフォルトの微調整スケジュールとシャープネスの最小化目標を使い、同時にタスク適応と圧縮親和性を容易にする。事前トレーニング中に適用される従来のシャープネス最小化技術とは対照的に、CrAFTアプローチでは、単一のGPUで数分または数時間で微調整を行うため、無視可能なトレーニングオーバーヘッドが加わる。汎用ツールであるCrAFTの有効性は,多種多様な目標タスクにおいて,畳み込みに基づく視覚基盤モデルと注意に基づく視覚基盤モデルの両方で実証された。コードは公開される予定だ。

Transfer learning has become a popular task adaptation method in the era of foundation models. However, many foundation models require large storage and computing resources, which makes off-the-shelf deployment impractical. Post-training compression techniques such as pruning and quantization can help lower deployment costs. Unfortunately, the resulting performance degradation limits the usability and benefits of such techniques. To close this performance gap, we propose CrAFT, a simple fine-tuning framework that enables effective post-training network compression. In CrAFT, users simply employ the default fine-tuning schedule along with sharpness minimization objective, simultaneously facilitating task adaptation and compression-friendliness. Contrary to the conventional sharpness minimization techniques, which are applied during pretraining, the CrAFT approach adds negligible training overhead as fine-tuning is done in under a couple of minutes or hours with a single GPU. The effectiveness of CrAFT, which is a general-purpose tool that can significantly boost one-shot pruning and post-training quantization, is demonstrated on both convolution-based and attention-based vision foundation models on a variety of target tasks. The code will be made publicly available.

翻訳日:2023-07-11 18:23:57 公開日:2023-07-09

# DiffusEmp: 共感応答生成のための多粒度制御を備えた拡散モデルベースフレームワーク

DiffusEmp: A Diffusion Model-Based Framework with Multi-Grained Control for Empathetic Response Generation ( http://arxiv.org/abs/2306.01657v2 )

ライセンス: Link先を確認

Guanqun Bi, Lei Shen, Yanan Cao, Meng Chen, Yuqiang Xie, Zheng Lin and Xiaodong He

(参考訳) 共感はオープンドメインの会話において重要な要素であり、他人の世話や理解を自然に示します。共感応答を生成するためにいくつかの方法が提案されているが、既存の作品はしばしば汎用的で安全な表現を参照する単調な共感に繋がる。本稿では,対話コンテキストと属性指向制御信号の利用を統一する条件拡散言語モデルに基づいて,共感表現のガイドとフレームワークDiffusEmpの設計に明示的な制御を用いることを提案する。具体的には, コミュニケーション機構, 意図, セマンティックフレームを, 粗いレベルから細かいレベルへの共感の実現を制御するための, 多粒度信号として輸入する。次に,多重粒度信号と応答トークンの関係を反映したマスキング戦略をデザインし,生成過程に影響を与える拡散モデルに統合する。ベンチマークデータセットEmpatheticDialogueの実験結果から,我々のフレームワークは文脈関連性を失うことなく,制御性,情報性,多様性の点で競争ベースラインを上回っていることがわかった。

Empathy is a crucial factor in open-domain conversations, which naturally shows one's caring and understanding to others. Though several methods have been proposed to generate empathetic responses, existing works often lead to monotonous empathy that refers to generic and safe expressions. In this paper, we propose to use explicit control to guide the empathy expression and design a framework DiffusEmp based on conditional diffusion language model to unify the utilization of dialogue context and attribute-oriented control signals. Specifically, communication mechanism, intent, and semantic frame are imported as multi-grained signals that control the empathy realization from coarse to fine levels. We then design a specific masking strategy to reflect the relationship between multi-grained signals and response tokens, and integrate it into the diffusion model to influence the generative process. Experimental results on a benchmark dataset EmpatheticDialogue show that our framework outperforms competitive baselines in terms of controllability, informativeness, and diversity without the loss of context-relatedness.

翻訳日:2023-07-11 18:17:06 公開日:2023-07-09

# 会話における感情認識のための教師付き敵対的コントラスト学習

Supervised Adversarial Contrastive Learning for Emotion Recognition in Conversations ( http://arxiv.org/abs/2306.01505v2 )

ライセンス: Link先を確認

Dou Hu, Yinan Bao, Lingwei Wei, Wei Zhou, Songlin Hu

(参考訳) 汎化性が高く頑健な表現の抽出は、会話における感情認識(ERC)において大きな課題である。そこで本研究では,クラススプレッド構造表現を教師付きで学習するための,教師付き敵対的コントラスト学習(SACL)フレームワークを提案する。SACLはコントラストを考慮した敵対的訓練を適用して最悪ケースのサンプルを生成し、クラススプレッドの同時コントラスト学習を用いて構造化表現を抽出する。これにより、ラベルレベルの特徴の一貫性を効果的に活用し、クラス内の細かな特徴を保持できる。文脈依存データに対する敵対的摂動の悪影響を避けるために,文脈からより多様な特徴を学習し,モデルの文脈ロバスト性を高める文脈的敵対的訓練(CAT)戦略を設計する。CATを用いたフレームワークのもとで,ERCのためのラベル一貫性と文脈ロバスト性を備えた特徴を学習するシーケンスベースのSACL-LSTMを開発した。3つのデータセットでの実験により、SACL-LSTMはERCで最先端の性能を達成することが示された。追加実験はSACLとCATの有効性を示している。

Extracting generalized and robust representations is a major challenge in emotion recognition in conversations (ERC). To address this, we propose a supervised adversarial contrastive learning (SACL) framework for learning class-spread structured representations in a supervised manner. SACL applies contrast-aware adversarial training to generate worst-case samples and uses joint class-spread contrastive learning to extract structured representations. It can effectively utilize label-level feature consistency and retain fine-grained intra-class features. To avoid the negative impact of adversarial perturbations on context-dependent data, we design a contextual adversarial training (CAT) strategy to learn more diverse features from context and enhance the model's context robustness. Under the framework with CAT, we develop a sequence-based SACL-LSTM to learn label-consistent and context-robust features for ERC. Experiments on three datasets show that SACL-LSTM achieves state-of-the-art performance on ERC. Extended experiments prove the effectiveness of SACL and CAT.

翻訳日:2023-07-11 18:16:49 公開日:2023-07-09

# 因果部分構造を用いたシフトロバスト分子関係学習

Shift-Robust Molecular Relational Learning with Causal Substructure ( http://arxiv.org/abs/2305.18451v2 )

ライセンス: Link先を確認

Namkyeong Lee, Kanghoon Yoon, Gyoung S. Na, Sein Kim, Chanyoung Park

(参考訳) 近年、分子ペア間の相互作用の振る舞いを予測することを目的とした分子関係学習は、その幅広い応用から分子科学において関心が高まっている。本研究では,化学反応と因果的に関係するコア部分構造を検出することで,分子関係学習における分布シフトに頑健なCMRLを提案する。そのために、まず分子科学の領域知識に基づいて因果関係を仮定し、変数間の関係を明らかにする構造因果モデル(SCM)を構築する。SCMに基づいて、ペアとなる分子に条件付けて介入を行う新しい条件付き介入フレームワークを導入する。条件付き介入フレームワークにより、本モデルは因果的部分構造からうまく学習し、化学反応と擬似的に相関するショートカット部分構造の交絡効果を緩和する。実世界および合成データセットを用いた様々なタスクに関する大規模な実験は、最先端のベースラインモデルに対するCMRLの優位性を示している。コードはhttps://github.com/Namkyeong/CMRLで利用可能である。

Recently, molecular relational learning, whose goal is to predict the interaction behavior between molecular pairs, has attracted a surge of interest in molecular sciences due to its wide range of applications. In this work, we propose CMRL, which is robust to the distributional shift in molecular relational learning by detecting the core substructure that is causally related to chemical reactions. To do so, we first assume a causal relationship based on the domain knowledge of molecular sciences and construct a structural causal model (SCM) that reveals the relationship between variables. Based on the SCM, we introduce a novel conditional intervention framework whose intervention is conditioned on the paired molecule. With the conditional intervention framework, our model successfully learns from the causal substructure and alleviates the confounding effect of shortcut substructures that are spuriously correlated to chemical reactions. Extensive experiments on various tasks with real-world and synthetic datasets demonstrate the superiority of CMRL over state-of-the-art baseline models. Our code is available at https://github.com/Namkyeong/CMRL.

翻訳日:2023-07-11 18:14:58 公開日:2023-07-09

# 効率的なシーケンスモデリングのためのスパースモジュラーアクティベーション

Sparse Modular Activation for Efficient Sequence Modeling ( http://arxiv.org/abs/2306.11197v2 )

ライセンス: Link先を確認

Liliang Ren, Yang Liu, Shuohang Wang, Yichong Xu, Chenguang Zhu, ChengXiang Zhai

(参考訳) 線形状態空間モデル(SSM)は、再帰構造を効率的に符号化できるため、様々なシーケンスモデリングタスクにおいて強い性能を示してきた。しかし、言語モデリングや機械翻訳といったより包括的なタスクでは、自己注意に基づくモデルが依然としてSSMよりも優れている。SSMと自己注意の両方を用いるハイブリッドモデルは一般に有望な性能を示すが、現在のアプローチでは、入力シーケンスのすべての要素に対して注意モジュールを静的かつ一様に適用するため、品質と効率のトレードオフが準最適になる。本研究では,ニューラルネットワークがシーケンス要素に対してサブモジュールを微分可能な形でスパースかつ動的に活性化できるようにする汎用的な機構であるSparse Modular Activation(SMA)を紹介する。各要素が活性化されていないサブモジュールをスキップできるようにすることで、SMAはシーケンスモデリングの訓練と推論の両段階で計算量とメモリ消費を削減する。SMAの具体的な実装として、SSMから学習した状態表現に基づいてGated Attention Unit(GAU)をスパースに活性化する新しいニューラルアーキテクチャSeqBoatを設計する。GAUが活性化された入力に対してのみ局所的な注意を行うよう制約することで、SeqBoatは理論上無限の注意範囲を持ちながら線形の推論計算量を達成でき、チャンキングベースのモデルよりも大幅に優れた品質と効率のトレードオフを提供できる。言語モデリング、音声分類、Long Range Arenaを含む幅広いタスクの実験により、SeqBoatは線形計算量を持つハイブリッドモデルの中で新しい最先端の結果をもたらし、学習されたスパースな活性化パターンを通じて各タスクに必要な注意の量を明らかにする。

Linear State Space Models (SSMs) have demonstrated strong performance in a variety of sequence modeling tasks due to their efficient encoding of the recurrent structure. However, in more comprehensive tasks like language modeling and machine translation, self-attention-based models still outperform SSMs. Hybrid models employing both SSM and self-attention generally show promising performance, but current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs. In this work, we introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely and dynamically activate sub-modules for sequence elements in a differentiable manner. Through allowing each element to skip non-activated sub-modules, SMA reduces computation and memory consumption at both training and inference stages of sequence modeling. As a specific instantiation of SMA, we design a novel neural architecture, SeqBoat, which employs SMA to sparsely activate a Gated Attention Unit (GAU) based on the state representations learned from an SSM. By constraining the GAU to only conduct local attention on the activated inputs, SeqBoat can achieve linear inference complexity with theoretically infinite attention span, and provide substantially better quality-efficiency trade-off than the chunking-based models. With experiments on a wide range of tasks, including language modeling, speech classification and long-range arena, SeqBoat brings new state-of-the-art results among hybrid models with linear complexity and reveals the amount of attention needed for each task through the learned sparse activation patterns.

翻訳日:2023-07-11 18:06:29 公開日:2023-07-09

# AIベースのモーション編集とスタイル化の実用に向けたモーションキャプチャデータセット

Motion Capture Dataset for Practical Use of AI-based Motion Editing and Stylization ( http://arxiv.org/abs/2306.08861v2 )

ライセンス: Link先を確認

Makito Kobayashi, Chen-Chieh Liao, Keito Inoue, Sentaro Yojima, Masafumi Takahashi

(参考訳) 本研究では,モーションスタイル変換の領域のための,スタイルの多様な新しいデータセットを提案する。このモーションデータセットは業界標準の人体ボーン構造を用いているため、多くのプロジェクトで3Dキャラクタにそのまま組み込むことができる。我々はモーションスタイル変換の課題を示し、提案するモーションデータセットを一般と市場の両方に公開することで、この領域における今後の研究を促進する。実験では最先端手法を用いてモーションスタイル変換に関する包括的な検討を行い、その結果は提案データセットがモーションスタイル変換タスクに有効であることを示している。

In this work, we propose a new style-diverse dataset for the domain of motion style transfer. The motion dataset uses an industry-standard human bone structure and is therefore ready to be plugged into 3D characters for many projects. We outline the challenges in motion style transfer and encourage future work in this domain by releasing the proposed motion dataset both to the public and to the market. We conduct a comprehensive study on motion style transfer in the experiment using the state-of-the-art method, and the results show the proposed dataset's validity for the motion style transfer task.

翻訳日:2023-07-11 18:05:25 公開日:2023-07-09

# 単語順における一様情報密度に対する言語間圧力

A Cross-Linguistic Pressure for Uniform Information Density in Word Order ( http://arxiv.org/abs/2306.03734v2 )

ライセンス: Link先を確認

Thomas Hikaru Clark, Clara Meister, Tiago Pimentel, Michael Hahn, Ryan Cotterell, Richard Futrell and Roger Levy

(参考訳) 自然言語は、標準語順と単語順の柔軟性の両方で大きく異なるが、その単語順は、しばしば機能的な圧力による共有言語間統計パターンに従っている。これらのプレッシャーを特定するために、先行研究は実際の語順と偽語順を比較した。しかし、このような調査では、一様情報密度(UID)仮説という1つの機能的圧力が見過ごされている。ここでは,UIDの圧力が語順パターンに相互言語的に影響を与えているかどうかを問う。この目的のために、実順序が反実順序よりも情報均一性が高まるかどうかを計算モデルを用いて検証する。類型的に多様性のある10の言語に関する実証的研究では、 (i)SVO言語では、実語順は逆語順よりも一貫して一様であり、 (ii) 言語的に不可解な反実順序のみが、実際の順序の均一性を超え続ける。これらの知見は、自然言語の開発と利用における情報の均一性の圧力と互換性がある。

While natural languages differ widely in both canonical word order and word order flexibility, their word orders still follow shared cross-linguistic statistical patterns, often attributed to functional pressures. In the effort to identify these pressures, prior work has compared real and counterfactual word orders. Yet one functional pressure has been overlooked in such investigations: the uniform information density (UID) hypothesis, which holds that information should be spread evenly throughout an utterance. Here, we ask whether a pressure for UID may have influenced word order patterns cross-linguistically. To this end, we use computational models to test whether real orders lead to greater information uniformity than counterfactual orders. In our empirical study of 10 typologically diverse languages, we find that: (i) among SVO languages, real word orders consistently have greater uniformity than reverse word orders, and (ii) only linguistically implausible counterfactual orders consistently exceed the uniformity of real orders. These findings are compatible with a pressure for information uniformity in the development and usage of natural languages.
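The uniform information density idea can be made concrete with per-token surprisal. The sketch below uses the variance of surprisal as a uniformity score; this is one common operationalization and is an assumption here, since the abstract does not say which measure the authors use. The token probabilities are invented and would normally come from a language model.

```python
import math


def surprisals(token_probs):
    """Per-token surprisal in bits: -log2 p(token | context)."""
    return [-math.log2(p) for p in token_probs]


def surprisal_variance(token_probs):
    """Variance of surprisal; lower means information is spread more evenly."""
    s = surprisals(token_probs)
    mean = sum(s) / len(s)
    return sum((v - mean) ** 2 for v in s) / len(s)


# Two hypothetical orderings carrying the same total information (8 bits).
even_order = [0.25, 0.25, 0.25, 0.25]     # 2 bits per token
peaky_order = [0.5, 0.5, 0.5, 0.03125]    # 1, 1, 1, and 5 bits

for name, probs in (("even", even_order), ("peaky", peaky_order)):
    total = sum(surprisals(probs))
    print(f"{name}: total = {total:.1f} bits, variance = {surprisal_variance(probs):.2f}")
```

Under this measure, an ordering that concentrates most of the information in one token scores worse than one that spreads the same total evenly, which is the kind of comparison the study makes between real and counterfactual word orders.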

翻訳日:2023-07-11 18:03:56 公開日:2023-07-09

# 多言語言語モデルは多文化的ではない:感情のケーススタディ

Multilingual Language Models are not Multicultural: A Case Study in Emotion ( http://arxiv.org/abs/2307.01370v2 )

ライセンス: Link先を確認

Shreya Havaldar, Sunny Rai, Bhumika Singhal, Langchen Liu, Sharath Chandra Guntuku, Lyle Ungar

(参考訳) 感情は世界中で経験され、表現される。感情に敏感な多言語タスクにLarge Language Models(LM)を使用するには、感情の文化的変化を反映しなければならない。本研究では,2023年の多言語LMが,文化や言語間の感情表現の差異を反映しているかどうかを検討する。 LMから得られる埋め込み(例えば、XLM-RoBERTa)はアングロ中心であり、生成的LM(例えば、ChatGPT)は、他の言語のプロンプトに応答しても、西洋のノルムを反映する。以上の結果から,多言語lmsは感情の文化的に適切なニュアンスを学習できないことを示し,これを修正するための研究の方向性を強調する。

Emotions are experienced and expressed differently across the world. In order to use Large Language Models (LMs) for multilingual tasks that require emotional sensitivity, LMs must reflect this cultural variation in emotion. In this study, we investigate whether the widely-used multilingual LMs in 2023 reflect differences in emotional expressions across cultures and languages. We find that embeddings obtained from LMs (e.g., XLM-RoBERTa) are Anglocentric, and generative LMs (e.g., ChatGPT) reflect Western norms, even when responding to prompts in other languages. Our results show that multilingual LMs do not successfully learn the culturally appropriate nuances of emotion and we highlight possible research directions towards correcting this.

翻訳日:2023-07-11 17:57:33 公開日:2023-07-09

# DragDiffusion:インタラクティブなポイントベース画像編集のための拡散モデル

DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing ( http://arxiv.org/abs/2306.14435v3 )

ライセンス: Link先を確認

Yujun Shi, Chuhui Xue, Jiachun Pan, Wenqing Zhang, Vincent Y. F. Tan, Song Bai

(参考訳) 正確かつ制御可能な画像編集は、大きな注目を集めている難しい課題である。近年、DragGANはインタラクティブな点ベース画像編集フレームワークを可能にし、画素レベルの精度で印象的な編集結果を実現している。しかし, この手法はGAN(Generative Adversarial Network)に基づくため, その汎用性は事前学習したGANモデルの容量によって上限が定まる。本研究では,このような編集フレームワークを拡散モデルに拡張し,DragDiffusionを提案する。大規模な事前学習済み拡散モデルを利用することにより,実世界のシナリオにおける対話型ポイントベース編集の適用性が大幅に向上する。既存の拡散ベースの画像編集手法の多くはテキスト埋め込みを操作するが、DragDiffusionは拡散潜在変数を最適化して正確な空間制御を実現する。拡散モデルは反復的に画像を生成するが、単一のステップで拡散潜在変数を最適化するだけで一貫性のある結果が得られることを実証的に示し、DragDiffusionが高品質な編集を効率よく完了できるようにする。幅広い挑戦的なケース(複数オブジェクト、多様なオブジェクトカテゴリ、様々なスタイルなど)にわたる広範な実験は、DragDiffusionの汎用性と一般性を示している。コード: https://github.com/Yujun-Shi/DragDiffusion。

Precise and controllable image editing is a challenging task that has attracted significant attention. Recently, DragGAN enables an interactive point-based image editing framework and achieves impressive editing results with pixel-level precision. However, since this method is based on generative adversarial networks (GAN), its generality is upper-bounded by the capacity of the pre-trained GAN models. In this work, we extend such an editing framework to diffusion models and propose DragDiffusion. By leveraging large-scale pretrained diffusion models, we greatly improve the applicability of interactive point-based editing in real world scenarios. While most existing diffusion-based image editing methods work on text embeddings, DragDiffusion optimizes the diffusion latent to achieve precise spatial control. Although diffusion models generate images in an iterative manner, we empirically show that optimizing diffusion latent at one single step suffices to generate coherent results, enabling DragDiffusion to complete high-quality editing efficiently. Extensive experiments across a wide range of challenging cases (e.g., multi-objects, diverse object categories, various styles, etc.) demonstrate the versatility and generality of DragDiffusion. Code: https://github.com/Yujun-Shi/DragDiffusion.

翻訳日:2023-07-11 17:55:28 公開日:2023-07-09

# LVM-Med:2次グラフマッチングによる医用イメージングのための大規模自己教師ありビジョンモデルの学習

LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching ( http://arxiv.org/abs/2306.11925v2 )

ライセンス: Link先を確認

Duy M. H. Nguyen, Hoang Nguyen, Nghiem T. Diep, Tan N. Pham, Tri Cao, Binh T. Nguyen, Paul Swoboda, Nhat Ho, Shadi Albarqouni, Pengtao Xie, Daniel Sonntag, Mathias Niepert

(参考訳) 限られた注釈付きサンプルで新しいタスクに微調整できる大規模な事前学習モデルを得ることは、医療画像データにとって未解決の課題であり続けている。ImageNetで事前学習されたディープネットワークやWebスケールのデータで学習されたビジョン言語基盤モデルが一般的なアプローチであるが、自然画像と医用画像の間のドメインシフトが大きいため、医療タスクにおけるそれらの効果は限られている。このギャップを埋めるために,大規模医療データセットで学習された最初のディープネットワーク群であるLVM-Medを紹介する。我々は、55の公開データセットから約130万枚の医療画像を収集し、CT、MRI、X線、超音波など多数の臓器とモダリティをカバーした。このデータセット上でいくつかの最先端の自己教師付きアルゴリズムをベンチマークし、グラフマッチングの定式化を用いた新しい自己教師付きコントラスト学習アルゴリズムを提案する。提案するアプローチには3つの貢献がある。 (i)局所情報および大域情報に基づく既存のペアワイズ画像類似度指標を統合する。 (ii)組合せグラフマッチングの目的関数によって構築された損失関数を通して、特徴埋め込みの構造的制約を捉える。 (iii)ブラックボックスソルバに対する現代的な勾配推定手法を用いて、エンドツーエンドで効率的に学習できる。提案するLVM-Medを、セグメンテーションや分類から物体検出まで、分布内および分布外の両方の設定を含む15の下流医療タスクで徹底的に評価した。LVM-Medは、多くの最先端の教師付き、自己教師付き、および基盤モデルを経験的に上回る。脳腫瘍分類や糖尿病網膜症のグレーディングといった難しいタスクにおいて、LVM-MedはResNet-50のみを使用しながら、10億のマスクで学習された従来の視覚言語モデルを6～7%改善する。

Obtaining large pre-trained models that can be fine-tuned to new tasks with limited annotated samples has remained an open challenge for medical imaging data. While pre-trained deep networks on ImageNet and vision-language foundation models trained on web-scale data are prevailing approaches, their effectiveness on medical tasks is limited due to the significant domain shift between natural and medical images. To bridge this gap, we introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets. We have collected approximately 1.3 million medical images from 55 publicly available datasets, covering a large number of organs and modalities such as CT, MRI, X-ray, and Ultrasound. We benchmark several state-of-the-art self-supervised algorithms on this dataset and propose a novel self-supervised contrastive learning algorithm using a graph-matching formulation. The proposed approach makes three contributions: (i) it integrates prior pair-wise image similarity metrics based on local and global information; (ii) it captures the structural constraints of feature embeddings through a loss function constructed via a combinatorial graph-matching objective; and (iii) it can be trained efficiently end-to-end using modern gradient-estimation techniques for black-box solvers. We thoroughly evaluate the proposed LVM-Med on 15 downstream medical tasks ranging from segmentation and classification to object detection, and both for the in and out-of-distribution settings. LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models. For challenging tasks such as Brain Tumor Classification or Diabetic Retinopathy Grading, LVM-Med improves previous vision-language models trained on 1 billion masks by 6-7% while using only a ResNet-50.

翻訳日:2023-07-11 17:55:08 公開日:2023-07-09

# SRCD:単一ドメイン汎用オブジェクト検出のための複合ドメインを用いた意味推論

SRCD: Semantic Reasoning with Compound Domains for Single-Domain Generalized Object Detection ( http://arxiv.org/abs/2307.01750v2 )

ライセンス: Link先を確認

Zhijie Rao, Jingcai Guo, Luyao Tang, Yue Huang, Xinghao Ding, Song Guo

(参考訳) 本稿では,単一ドメイン一般化オブジェクト検出(Single-DGOD)のための新しいフレームワークを提案し,モデルの一般化能力を高めるために,自己拡張された複合クロスドメインサンプルの意味構造を学習し,維持することに着目する。複数のソースドメインでトレーニングされたDGODとは異なり、Single-DGODは単一のソースドメインだけで複数のターゲットドメインにうまく一般化しなければならず、はるかに難しい。既存の手法は主にDGODと同様の処理を採用し、意味空間を分離または圧縮することでドメイン不変の特徴を学習する。しかし、潜在的な制限が2つある。 1) 極端に少ない単一ドメインデータに起因する擬似的な属性・ラベル相関 2) 意味的な構造情報が一般に無視されること。すなわち,サンプルにおけるインスタンスレベルの意味関係の親和性がモデルの一般化に不可欠であることを我々は見出した。本稿では,Single-DGODのためのSemantic Reasoning with Compound Domains (SRCD)を提案する。具体的には,テクスチャベースの自己拡張(TBSA)モジュールと局所・大域意味推論(LGSR)モジュールの2つの主要コンポーネントを含む。 TBSAは、光、影、色などラベルと結びついた無関係な属性の影響を、軽量かつ効率的な自己拡張によって画像レベルで除去することを目的としている。さらに、LGSRは、インスタンス特徴のセマンティック関係をさらにモデル化し、本質的なセマンティック構造を解明し、維持するために使用される。複数のベンチマークでの大規模な実験により、提案したSRCDの有効性を示した。

This paper provides a novel framework for single-domain generalized object detection (i.e., Single-DGOD), where we are interested in learning and maintaining the semantic structures of self-augmented compound cross-domain samples to enhance the model's generalization ability. Unlike DGOD, which is trained on multiple source domains, Single-DGOD must generalize to multiple target domains from only a single source domain, which is far more challenging. Existing methods mostly adopt a similar treatment from DGOD to learn domain-invariant features by decoupling or compressing the semantic space. However, there may be two potential limitations: 1) pseudo attribute-label correlation, due to extremely scarce single-domain data; and 2) the semantic structural information is usually ignored, i.e., we find that the affinities of instance-level semantic relations in samples are crucial to model generalization. In this paper, we introduce Semantic Reasoning with Compound Domains (SRCD) for Single-DGOD. Specifically, our SRCD contains two main components, namely, the texture-based self-augmentation (TBSA) module, and the local-global semantic reasoning (LGSR) module. TBSA aims to eliminate the effects of irrelevant attributes associated with labels, such as light, shadow, color, etc., at the image level by a light-yet-efficient self-augmentation. Moreover, LGSR is used to further model the semantic relationships on instance features to uncover and maintain the intrinsic semantic structures. Extensive experiments on multiple benchmarks demonstrate the effectiveness of the proposed SRCD.

翻訳日:2023-07-11 17:44:52 公開日:2023-07-09

# DebateKG: セマンティック知識グラフを用いた事例作成のための自動政策議論

DebateKG: Automatic Policy Debate Case Creation with Semantic Knowledge Graphs ( http://arxiv.org/abs/2307.04090v1 )

ライセンス: Link先を確認

Allen Roush

(参考訳) 近年のArgument Miningコミュニティにおける研究は、競争の激しい議論の中で見つかった問題の解決に自然言語処理システムの適用性を示している。競争討論における最も重要な課題の1つは、議論者が高品質の討論ケースを作成することである。議論的意味論的知識グラフ上の制約付き最短経路トラバーサルを用いて,効果的な議論事例を構築できることを示す。我々は、この可能性について、DebateSumと呼ばれる大規模データセットをすでに備えている、Policy Debateと呼ばれる米国競争的議論の文脈で研究する。我々は,データセットに53180個の新しい例と,さらに有用なメタデータを導入することで,ディベートサムを大幅に改善した。我々はtxtaiセマンティックサーチとナレッジグラフツールチェーンを利用して,このデータセット上に構築した9つのセマンティックナレッジグラフを作成し,コントリビュートする。政策論争事例作成の文脈において,どの知識グラフが優れているかを評価するユニークな手法を提案する。他のすべてのコードや知識グラフとともに、議論のケースを自動的に生成するデモがオープンソースとして公開されている。

Recent work within the Argument Mining community has shown the applicability of Natural Language Processing systems for solving problems found within competitive debate. One of the most important tasks within competitive debate is for debaters to create high-quality debate cases. We show that effective debate cases can be constructed using constrained shortest path traversals on Argumentative Semantic Knowledge Graphs. We study this potential in the context of a type of American Competitive Debate, called Policy Debate, which already has a large-scale dataset targeting it called DebateSum. We significantly improve upon DebateSum by introducing 53180 new examples, as well as further useful metadata for every example, to the dataset. We leverage the txtai semantic search and knowledge graph toolchain to produce and contribute 9 semantic knowledge graphs built on this dataset. We create a unique method for evaluating which knowledge graphs are better in the context of producing policy debate cases. A demo which automatically generates debate cases, along with all other code and the Knowledge Graphs, is open-sourced and made available to the public here: https://github.com/Hellisotherpeople/DebateKG

翻訳日:2023-07-11 15:41:44 公開日:2023-07-09

# 変分量子アルゴリズムは量子アドバンテージを実証できるか? 本当に重要な時間

Can Variational Quantum Algorithms Demonstrate Quantum Advantages? Time Really Matters ( http://arxiv.org/abs/2307.04089v1 )

ライセンス: Link先を確認

Huan-Yu Liu, Zhao-Yun Chen, Tai-Ping Sun, Cheng Xue, Yu-Chun Wu, and Guo-Ping Guo

(参考訳) 低深度量子ニューラルネットワーク(QNN)を適用する変分量子アルゴリズム(VQA)は、ノイズの多い中間スケール量子(NISQ)時代において有望であると同時に課題も多い。目覚ましい進歩にもかかわらず、効率性と実現可能性に対する批判は止んでいない。しかしながら、VQAが量子的優位性を実証できるかどうかはまだ未定であり、本論文で検討する。まず、QNNのトレーニングにおいてパラメータ数と勾配評価コストの間に依存性があることを証明する。バックプロパゲーションアルゴリズムを用いて古典的ニューラルネットワークをトレーニングする際にはそのような直接的な依存が存在しないことに注目し、この依存性がVQAのスケーラビリティを制限すると論じる。第2に、ノイズや到達可能性といった現実的な制限を考慮しない理想的な場合におけるVQAの実行時間を見積もる。理想的な時間コストが1年の壁時計時間のオーダーに容易に達することを示す。第3に、量子回路の古典的シミュレーションによる時間コストと比較することにより、VQAが古典的シミュレーションを上回るのは時間コストが$10^0$-$10^2$年のスケールに達した場合に限られることを示す。最後に、上記の結果に基づいて、現在のワークフローでは、VQAが時間スケーリングの観点から古典的なケースを上回り、量子的優位性を実証することは困難であると論じる。 VQAと量子コンピューティングは急速に発展しているため、この研究はVQAの可能性を否定しようとするものではない。本論文の分析はVQAの最適化に向けた指針を提供し、長期的にはより自然なハイブリッド量子古典アルゴリズムを求めることは有意義である。

Applying low-depth quantum neural networks (QNNs), variational quantum algorithms (VQAs) are both promising and challenging in the noisy intermediate-scale quantum (NISQ) era: despite their remarkable progress, criticism of their efficiency and feasibility has never stopped. However, whether VQAs can demonstrate quantum advantages is still undetermined, and this question is investigated in this paper. First, we will prove that there exists a dependency between the parameter number and the gradient-evaluation cost when training QNNs. Noticing there is no such direct dependency when training classical neural networks with the backpropagation algorithm, we argue that such a dependency limits the scalability of VQAs. Second, we estimate the time for running VQAs in ideal cases, i.e., without considering realistic limitations like noise and reachability. We will show that the ideal time cost easily reaches the order of a 1-year wall time. Third, by comparing with the time cost using classical simulation of quantum circuits, we will show that VQAs can only outperform the classical simulation case when the time cost reaches the scaling of $10^0$-$10^2$ years. Finally, based on the above results, we argue that, with the current workflow, it would be difficult for VQAs to outperform classical cases in view of time scaling, and therefore to demonstrate quantum advantages. Since VQAs as well as quantum computing are developing rapidly, this work does not aim to deny the potential of VQAs. The analysis in this paper provides directions for optimizing VQAs, and in the long run, seeking more natural hybrid quantum-classical algorithms would be meaningful.
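
The link between parameter count and gradient-evaluation cost can be illustrated with a small sketch. It assumes the parameter-shift rule as the gradient estimator, which the abstract does not name, and it replaces a real quantum circuit with a hypothetical toy expectation value (a product of cosines, as obtained from independent RX rotations measured in Z). The point is only that the number of circuit evaluations grows as twice the number of parameters.

```python
import math


def expectation(thetas):
    """Toy stand-in for a circuit run: product of cos(theta_i).

    This equals <Z x ... x Z> for independent RX(theta_i) rotations on |0>.
    It is an illustrative assumption, not a circuit from the paper.
    """
    value = 1.0
    for theta in thetas:
        value *= math.cos(theta)
    return value


def parameter_shift_gradient(f, thetas):
    """Return (gradient, number_of_circuit_evaluations).

    Each partial derivative needs two shifted evaluations, so the cost
    is 2 * len(thetas) calls to f.
    """
    gradient = []
    evaluations = 0
    for i in range(len(thetas)):
        plus = list(thetas)
        minus = list(thetas)
        plus[i] += math.pi / 2
        minus[i] -= math.pi / 2
        gradient.append((f(plus) - f(minus)) / 2.0)
        evaluations += 2
    return gradient, evaluations


thetas = [0.1, 0.7, 1.3, 2.0]
grad, n_evals = parameter_shift_gradient(expectation, thetas)
print(n_evals)  # twice the number of parameters
```

By contrast, backpropagation on a classical network obtains all partial derivatives from a single forward and backward pass, which is the asymmetry the abstract points to.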

翻訳日:2023-07-11 15:41:22 公開日:2023-07-09

# SVIT: Visual Instruction Tuningのスケールアップ

SVIT: Scaling up Visual Instruction Tuning ( http://arxiv.org/abs/2307.04087v1 )

ライセンス: Link先を確認

Bo Zhao, Boya Wu, Tiejun Huang

(参考訳) 基盤モデルの出現により、大規模な言語モデルと視覚モデルが統合され、視覚的キャプション、対話、質問応答などのマルチモーダル能力を獲得している。既存のマルチモーダルモデルは、視覚的理解と推論において印象的な性能を示すが、高品質な命令チューニングデータの不足のため、その限界は依然としてほとんど解明されていない。マルチモーダル能力の限界を押し上げるために,1.6Mの会話質問応答(QA)ペアと1.6Mの複雑な推論QAペアと106Kの詳細な画像記述を含む320万件の視覚的命令チューニングデータのデータセットを構築し,視覚的命令チューニングをスケールアップする(SVIT)。規模に加えて,提案するデータセットは,画像の豊富な手動アノテーションでGPT-4をプロンプトして生成されており,高品質で豊富な多様性を特徴とする。 SVIT上でのマルチモーダルモデルのトレーニングは,視覚的知覚や推論,計画といった面で,マルチモーダル性能を大幅に向上させることができることを実証的に検証した。

Thanks to the emergence of foundation models, large language and vision models are integrated to acquire the multimodal abilities of visual captioning, dialogue, question answering, etc. Although existing multimodal models present impressive performance in visual understanding and reasoning, their limits are still largely under-explored due to the scarcity of high-quality instruction tuning data. To push the limits of multimodal capability, we Scale up Visual Instruction Tuning (SVIT) by constructing a dataset of 3.2 million visual instruction tuning data including 1.6M conversation question-answer (QA) pairs, 1.6M complex reasoning QA pairs, and 106K detailed image descriptions. Besides its volume, the proposed dataset also features high quality and rich diversity, as it is generated by prompting GPT-4 with the abundant manual annotations of images. We empirically verify that training multimodal models on SVIT can significantly improve the multimodal performance in terms of visual perception, reasoning and planning.

翻訳日:2023-07-11 15:40:52 公開日:2023-07-09

# 自己校正分類器指導によるラベルデータ少ないスコアベース条件生成

Score-based Conditional Generation with Fewer Labeled Data by Self-calibrating Classifier Guidance ( http://arxiv.org/abs/2307.04081v1 )

ライセンス: Link先を確認

Paul Kuo-Ming Huang, Si-An Chen, Hsuan-Tien Lin

(参考訳) SGM(Score-based Generative Models)は、画像生成品質の高い深層生成モデルのファミリである。以前の研究では、未条件のSGMと訓練された分類器のガイダンスを結合することにより、SGMをクラス条件の生成に適応するように拡張してきた。しかしながら、そのような分類器誘導型SGMは、特にラベル付きデータが少ない場合、正確な条件生成を必ずしも達成しない。この問題は、分類器の信頼性の低い勾配と、トレーニング中にラベルなしのデータを完全に活用できないことに根ざしている。次に、分類器自身を校正することで分類器誘導SGMを改善することを提案する。我々のキーとなる考え方は、エネルギーモデルからの原理を使って分類器を無条件SGMの別の見方に変換することである。そして、ラベル付きデータとラベルなしデータの両方を用いて分類器を校正するために、無条件SGMの既存の損失を採用することができる。実験により,提案手法はラベル付きデータの異なるパーセンテージ間で条件生成品質を著しく改善することを確認した。性能の改善により、ラベル付きデータが少ない場合、提案手法は他の条件付きSGMよりも一貫して優れている。その結果,限定ラベルデータを用いた生成モデルに対する提案手法の可能性が確認された。

Score-based Generative Models (SGMs) are a popular family of deep generative models that achieve leading image generation quality. Earlier studies have extended SGMs to tackle class-conditional generation by coupling an unconditional SGM with the guidance of a trained classifier. Nevertheless, such classifier-guided SGMs do not always achieve accurate conditional generation, especially when trained with fewer labeled data. We argue that the issue is rooted in unreliable gradients of the classifier and the inability to fully utilize unlabeled data during training. We then propose to improve classifier-guided SGMs by letting the classifier calibrate itself. Our key idea is to use principles from energy-based models to recast the classifier as another view of the unconditional SGM. Then, the existing loss for the unconditional SGM can be adopted to calibrate the classifier using both labeled and unlabeled data. Empirical results validate that the proposed approach significantly improves the conditional generation quality across different percentages of labeled data. The improved performance makes the proposed approach consistently superior to other conditional SGMs when using fewer labeled data. The results confirm the potential of the proposed approach for generative modeling with limited labeled data.

翻訳日:2023-07-11 15:40:32 公開日:2023-07-09

# 高速でスケーラブルなプライベート推論に向けて

Towards Fast and Scalable Private Inference ( http://arxiv.org/abs/2307.04077v1 )

ライセンス: Link先を確認

Jianqiao Mo, Karthik Garimella, Negar Neda, Austin Ebel, Brandon Reagen

(参考訳) プライバシとセキュリティは、ファーストオーダーの設計制約として急速に現れています。ユーザーは、データを見る人(秘密性)と利用方法(コントロール)に対して、より多くの保護を求めるようになった。ここでは、セキュリティのための既存の暗号化技術は不足している。保存または通信時にデータを保護するが、計算のために復号化する必要がある。幸いにも、プライバシ保護計算(PPC)と呼ばれる新しい計算パラダイムが存在する。新興のPPC技術は、セキュアなアウトソース計算や、2つのパーティの計算に利用することができる。デジタル時代のユーザー保護に革命をもたらす驚くべき可能性にもかかわらず、その実現は計算能力、通信能力、ストレージのオーバーヘッドのために制限されている。本稿では、ニューラルネットワークにおけるプライベート推論(PI)をモチベーションアプリケーションとして利用して、様々なPPCオーバーヘッドに対処する取り組みについてレビューする。まず,準同型暗号 (he), 秘密共有 (ss), ガーブレッド回路 (gcs), オブリベイト転送 (ot) など様々な技術が紹介されている。次に、PI実装時のオーバーヘッドのキャラクタリゼーションをカバーします。キャラクタリゼーションはgcとheアクセラレータの両方の必要性を動機付けている。次に、GCを加速するHAACとHEを加速するRPUの2つのソリューションが提示される。結論として、piの残りのオーバーヘッドを克服するための今後の作業について、結果と効果を議論して示します。

Privacy and security have rapidly emerged as first-order design constraints. Users now demand more protection over who can see their data (confidentiality) as well as how it is used (control). Here, existing cryptographic techniques for security fall short: they secure data when stored or communicated but must decrypt it for computation. Fortunately, a new paradigm of computing exists, which we refer to as privacy-preserving computation (PPC). Emerging PPC technologies can be leveraged for secure outsourced computation or to enable two parties to compute without revealing either user's secret data. Despite their phenomenal potential to revolutionize user protection in the digital age, their realization has been limited due to exorbitant computational, communication, and storage overheads. This paper reviews recent efforts on addressing various PPC overheads using private inference (PI) in neural networks as a motivating application. First, the problem and various technologies, including homomorphic encryption (HE), secret sharing (SS), garbled circuits (GCs), and oblivious transfer (OT), are introduced. Next, a characterization of their overheads when used to implement PI is covered. The characterization motivates the need for both GCs and HE accelerators. Then two solutions are presented: HAAC for accelerating GCs and RPU for accelerating HE. To conclude, results and effects are shown with a discussion on what future work is needed to overcome the remaining overheads of PI.
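
Of the technologies listed, secret sharing (SS) is the easiest to show in a few lines. The following is a minimal sketch of additive secret sharing over a prime field, under the assumption of honest parties and with an arbitrarily chosen modulus. The function names are hypothetical and it is not taken from the surveyed systems. It shows that parties can add shared values locally without ever seeing the underlying secrets.

```python
import secrets

# Field modulus, chosen only for illustration (2**61 - 1 is a Mersenne prime).
PRIME = 2**61 - 1


def share(value, n_parties=2):
    """Split `value` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine all shares; any strict subset reveals nothing about the value."""
    return sum(shares) % PRIME


def add_shares(a_shares, b_shares):
    """Each party adds its own two shares locally; no communication is needed."""
    return [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]


x_shares = share(42)
y_shares = share(100)
result = reconstruct(add_shares(x_shares, y_shares))
print(result)  # 142
```

Addition is free in this scheme, whereas multiplication and non-linear layers need interaction between parties or other primitives such as garbled circuits, which is one source of the communication overhead the abstract mentions.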

翻訳日:2023-07-11 15:40:13 公開日:2023-07-09

# 癌マルチオミクスデータに基づくがんの新しいサブタイプと治療のためのマルチヘッド注意機構学習

Multi-Head Attention Mechanism Learning for Cancer New Subtypes and Treatment Based on Cancer Multi-Omics Data ( http://arxiv.org/abs/2307.04075v1 )

ライセンス: Link先を確認

Liangrui Pan, Dazhen Liu, Yutao Dou, Lian Wang, Zhichao Feng, Pengfei Rong, Liwen Xu, Shaoliang Peng

(参考訳) がんの多様性が高く, 臨床的特徴も高いため, 癌サブタイプ間では, マルチオミクスデータと臨床特徴に有意差がみられた。したがって、癌の診断、治療、予後には、癌サブタイプの同定と発見が不可欠である。本研究では,非教師なしコントラスト学習(unsupervised contrastive learning, amucl)のための注意機構に基づく一般化フレームワークを提案する。 AMUCLフレームワークには、教師なしマルチヘッドアテンション機構が含まれており、マルチオミクスデータの特徴を深く抽出する。さらに,マルチヘッドアテンション機構に基づく非結合型コントラスト学習モデル(dmacl)を提案し,マルチオミクスデータの特徴とクラスターを学習し,新しいがんサブタイプを同定する。この教師なしコントラスト学習法は、マルチオミクスデータの特徴空間におけるサンプルとサンプル空間との類似度を計算してサブタイプをクラスタ化する。他の11のディープラーニングモデルと比較して、DMACLモデルは0.002のCインデックス、Silhouetteスコア0.801、Davies Bouldinスコア0.38のCインデックスをシングルセルマルチオミクスデータセットで達成した。がんマルチオミクスデータセットにおいて、dmaclモデルは、0.016のc-インデックス、0.688のシルエットスコア、0.06のデイビスブルディンスコアを取得し、各種類のがんに対して最も信頼性の高い癌サブタイプクラスタリング結果を得た。最後に、AMUCLフレームワークでDMACLモデルを用いて、AMLの6つの癌サブタイプを明らかにした。 amlのgo機能強化,サブタイプ特異的生物学的機能,gseaの解析により,amuclフレームワークに基づいた癌サブタイプ解析の解釈性がさらに向上した。

Due to the high heterogeneity and clinical characteristics of cancer, there are significant differences in multi-omics data and clinical features among subtypes of different cancers. Therefore, the identification and discovery of cancer subtypes are crucial for the diagnosis, treatment, and prognosis of cancer. In this study, we proposed a generalization framework based on attention mechanisms for unsupervised contrastive learning (AMUCL) to analyze cancer multi-omics data for the identification and characterization of cancer subtypes. The AMUCL framework includes an unsupervised multi-head attention mechanism, which deeply extracts multi-omics data features. Importantly, a decoupled contrastive learning model (DMACL) based on a multi-head attention mechanism is proposed to learn multi-omics data features and clusters and identify new cancer subtypes. This unsupervised contrastive learning method clusters subtypes by calculating the similarity between samples in the feature space and sample space of multi-omics data. Compared to 11 other deep learning models, the DMACL model achieved a C-index of 0.002, a Silhouette score of 0.801, and a Davies Bouldin Score of 0.38 on a single-cell multi-omics dataset. On a cancer multi-omics dataset, the DMACL model obtained a C-index of 0.016, a Silhouette score of 0.688, and a Davies Bouldin Score of 0.46, and obtained the most reliable cancer subtype clustering results for each type of cancer. Finally, we used the DMACL model in the AMUCL framework to reveal six cancer subtypes of AML. By analyzing the GO functional enrichment, subtype-specific biological functions, and GSEA of AML, we further enhanced the interpretability of cancer subtype analysis based on the generalizable AMUCL framework.

翻訳日:2023-07-11 15:39:51 公開日:2023-07-09

# 視覚トランスフォーマーのためのランダム位置反転パッチ

Random Position Adversarial Patch for Vision Transformers ( http://arxiv.org/abs/2307.04066v1 )

ライセンス: Link先を確認

Mingzhen Shao

(参考訳) 以前の研究では、視覚トランスフォーマーが敵のパッチに脆弱性があることが示されているが、これらの研究はすべて重要な仮定に依存している。この厳密な要件により、視覚トランスフォーマーの物理的世界での対向パッチの展開は、cnnでの有効性とは異なり、現実的ではない。本稿では、アライメント制約を克服し、視野内の任意の位置に標的攻撃を発射できる敵パッチ(G-Patch)を生成する新しい手法を提案する。具体的には、勾配を使ってパッチを直接最適化するのではなく、GANのような構造を用いて逆パッチを生成する。本実験は,デジタルおよび物理世界のシナリオにおいて,視覚トランスフォーマーに対するユニバーサルアタックを実現する上で,敵パッチの有効性を示す。さらに、さらに分析した結果、生成した対向パッチは、輝度制限、色移動、ランダムノイズに対する堅牢性を示すことが明らかとなった。実世界の攻撃実験は、非常に困難な条件下でも堅牢な攻撃を発射するためのGパッチの有効性を検証する。

Previous studies have shown the vulnerability of vision transformers to adversarial patches, but these studies all rely on a critical assumption: the attack patches must be perfectly aligned with the patches used for linear projection in vision transformers. Due to this stringent requirement, deploying adversarial patches for vision transformers in the physical world becomes impractical, unlike their effectiveness on CNNs. This paper proposes a novel method for generating an adversarial patch (G-Patch) that overcomes the alignment constraint, allowing the patch to launch a targeted attack at any position within the field of view. Specifically, instead of directly optimizing the patch using gradients, we employ a GAN-like structure to generate the adversarial patch. Our experiments show the effectiveness of the adversarial patch in achieving universal attacks on vision transformers, both in digital and physical-world scenarios. Additionally, further analysis reveals that the generated adversarial patch exhibits robustness to brightness restriction, color transfer, and random noise. Real-world attack experiments validate the effectiveness of the G-Patch to launch robust attacks even under some very challenging conditions.

翻訳日:2023-07-11 15:39:19 公開日:2023-07-09

# 生成ニューラルネットワークに基づく超高次元非凸景観の大規模大域的最適化

Large-scale global optimization of ultra-high dimensional non-convex landscapes based on generative neural networks ( http://arxiv.org/abs/2307.04065v1 )

ライセンス: Link先を確認

Jiaqi Jiang, Jonathan A. Fan

(参考訳) 超高次元連続景観における効果的な探索を可能にする深層生成ネットワークの訓練に基づいて,非凸最適化アルゴリズムのメタヒューリスティックを提案する。ネットワークトレーニングでは, サンプリングした局所勾配の集団をカスタマイズされた損失関数内で利用し, ネットワーク出力分布関数を高い性能で1つのピークに進化させる。深層ネットワークアーキテクチャは、トレーニングの過程で進行的な成長をサポートするように調整されており、高次元景観の次元特性の呪いをアルゴリズムが管理できる。我々は,1000の次元を持つ標準的な最適化問題に適用し,最先端のアルゴリズムベンチマークと比較して,関数評価の少ない手法で性能が向上することを示す。また、深層ネットワークの過度パラメータ化、損失関数工学、最適化における適切なネットワークアーキテクチャ選択の役割や、サンプリングした局所勾配のバッチサイズが問題次元に依存しない理由についても論じる。これらの概念は、非凸最適化問題を解決するためにカスタマイズ可能で表現可能な深層生成ネットワークを利用する新しいアルゴリズムの基盤となる。

We present a non-convex optimization algorithm metaheuristic, based on the training of a deep generative network, which enables effective searching within continuous, ultra-high dimensional landscapes. During network training, populations of sampled local gradients are utilized within a customized loss function to evolve the network output distribution function towards one peak at high-performing optima. The deep network architecture is tailored to support progressive growth over the course of training, which allows the algorithm to manage the curse of dimensionality characteristic of high-dimensional landscapes. We apply our concept to a range of standard optimization problems with dimensions as high as one thousand and show that our method performs better with fewer function evaluations compared to state-of-the-art algorithm benchmarks. We also discuss the role of deep network over-parameterization, loss function engineering, and proper network architecture selection in optimization, and why the required batch size of sampled local gradients is independent of problem dimension. These concepts form the foundation for a new class of algorithms that utilize customizable and expressive deep generative networks to solve non-convex optimization problems.

翻訳日:2023-07-11 15:39:02 公開日:2023-07-09

# 最適輸送による条件付サンプリングのための生成フロー

A generative flow for conditional sampling via optimal transport ( http://arxiv.org/abs/2307.04102v1 )

ライセンス: Link先を確認

Jason Alfonso, Ricardo Baptista, Anupam Bhakta, Noam Gal, Alfin Hou, Isa Lyubimova, Daniel Pocklington, Josef Sajonz, Giulio Trigila, and Ryan Tsai

(参考訳) サンプリング条件分布はベイズ推定と密度推定の基本的なタスクである。フローの正規化や生成的敵ネットワークのような生成モデルは、単純な参照(例えば標準ガウス)を目標分布にプッシュするトランスポートマップを学習することで条件分布を特徴付ける。これらのアプローチは非ゲージ問題の多くをうまく記述するが、パラメトリックバイアスと、これらの変換を学ぶための勾配ベース(逆)最適化器の信頼性によって、その性能はしばしば制限される。本研究は,参照サンプルをターゲットに反復的にマッピングする非パラメトリック生成モデルを提案する。モデルはブロック三角形輸送マップを使用し、そのコンポーネントは対象分布の条件を特徴付ける。これらのマップは、重み付き$L^2$コスト関数による最適輸送問題の解法から生じ、条件付きサンプリングのための[Trigila and Tabak, 2016]におけるデータ駆動アプローチを拡張した。提案手法は,2次元の例と非線形odeを含むパラメータ推論問題について実証した。

Sampling conditional distributions is a fundamental task for Bayesian inference and density estimation. Generative models, such as normalizing flows and generative adversarial networks, characterize conditional distributions by learning a transport map that pushes forward a simple reference (e.g., a standard Gaussian) to a target distribution. While these approaches successfully describe many non-Gaussian problems, their performance is often limited by parametric bias and the reliability of gradient-based (adversarial) optimizers to learn these transformations. This work proposes a non-parametric generative model that iteratively maps reference samples to the target. The model uses block-triangular transport maps, whose components are shown to characterize conditionals of the target distribution. These maps arise from solving an optimal transport problem with a weighted $L^2$ cost function, thereby extending the data-driven approach in [Trigila and Tabak, 2016] for conditional sampling. The proposed approach is demonstrated on a two dimensional example and on a parameter inference problem involving nonlinear ODEs.

翻訳日:2023-07-11 15:30:35 公開日:2023-07-09

# 超解像とディープラーニングによる意味セグメンテーションの精度向上--空間分解能が各種データセットに与える影響の検討

Enhancing Building Semantic Segmentation Accuracy with Super Resolution and Deep Learning: Investigating the Impact of Spatial Resolution on Various Datasets ( http://arxiv.org/abs/2307.04101v1 )

ライセンス: Link先を確認

Zhiling Guo, Xiaodan Shi, Haoran Zhang, Dou Huang, Xiaoya Song, Jinyue Yan, Ryosuke Shibasaki

(参考訳) リモートセンシングおよび深層学習技術の開発により,高精度かつ効率的にセマンティックセグメンテーションを構築することが可能となった。異なるタスクで成功したにもかかわらず、深層学習に基づくセマンティックセグメンテーションに対する空間分解能の影響に関する議論は非常に不十分であり、コスト効率の高いデータソースを選択することが大きな課題である。以上の課題に対処するため,本研究では,3つの研究領域のリモートセンシング画像を,超解像・ダウンサンプリングにより複数の空間解像度に分割する。その後、モデルトレーニングとテストのためにUNetとFPNの2つの代表的なディープラーニングアーキテクチャが選択される。 2つの深層学習モデルを持つ3つの都市から得られた実験結果から,空間分解能が建物セグメンテーションに大きく影響し,コスト効率が0.3m程度に向上することが示唆された。

The development of remote sensing and deep learning techniques has enabled building semantic segmentation with high accuracy and efficiency. Despite their success in different tasks, the impact of spatial resolution on deep learning based building semantic segmentation has received little discussion, which makes choosing a more cost-effective data source a big challenge. To address this issue, in this study, we convert remote sensing images from three study areas into multiple spatial resolutions by super-resolution and down-sampling. After that, two representative deep learning architectures, UNet and FPN, are selected for model training and testing. The experimental results obtained from three cities with two deep learning models indicate that the spatial resolution greatly influences building segmentation results, with better cost-effectiveness achieved at around 0.3 m, which we believe will be an important insight for data selection and preparation.

翻訳日:2023-07-11 15:30:19 公開日:2023-07-09

# 単一例による可視・赤外線自己監督核融合

Visible and infrared self-supervised fusion trained on a single example ( http://arxiv.org/abs/2307.04100v1 )

ライセンス: Link先を確認

Nati Ofir

(参考訳) 本稿では、可視光(RGB)と近赤外(NIR)画像融合の問題に対処する。マルチスペクトルイメージングは、RGBTセンサーの開発以来、画像処理やコンピュータビジョンにおいてますます重要な課題となっている。可視画像は色を捉える一方でノイズ、ヘイズ、雲の影響を受けるが、NIRチャネルはより鮮明な画像をキャプチャし、デヘイジングやオブジェクト検出などのアプリケーションで強く求められている。提案手法は,CNN(Convolutional-Neural-Network)を単一の例に対してSSL(Self-Supervised-Learning)でトレーニングすることで,位置合わせされたこれら2つのチャネルを融合させる。 RGBとIRのそれぞれのペアに対して、ネットワークは最終的な融合結果を得るために数秒間訓練される。 SSLは、SSIM(Structural-Similarity)損失とEP(Edge-Preservation)損失の組み合わせに基づいている。 SSLのラベルは入力チャネル自身である。この融合は、重いトレーニングプロセスに依存せずに、各スペクトルチャネルの関連する詳細を保存する。実験部では,大規模データセットでのトレーニングに基づかない他の最近の手法と比較して,提案手法はより優れた定性的および定量的なマルチスペクトル融合結果を達成する。

This paper addresses the problem of visible (RGB) to Near-Infrared (NIR) image fusion. Multispectral imaging is an important task relevant to image processing and computer vision, even more so since the development of the RGBT sensor. While the visible image sees color and suffers from noise, haze, and clouds, the NIR channel captures a clearer picture and is strongly required by applications such as dehazing or object detection. The proposed approach fuses these two aligned channels by training a Convolutional-Neural-Network (CNN) by Self-Supervised-Learning (SSL) on a single example. For each such pair, RGB and IR, the network is trained for seconds to deduce the final fusion. The SSL is based on a Structural-Similarity (SSIM) loss combined with an Edge-Preservation (EP) loss. The labels for the SSL are the input channels themselves. This fusion preserves the relevant detail of each spectral channel without relying on a heavy training process. In the experiments section, the proposed approach achieves better qualitative and quantitative multispectral fusion results than other recent methods that are not based on large dataset training.
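
As a rough illustration of the SSIM term, the sketch below computes a single global SSIM value between two flattened grayscale images and turns it into a loss. It is a simplification under stated assumptions: SSIM is normally averaged over local windows, the constants follow the commonly used defaults, and the way the two input channels are combined in `fusion_ssim_loss` is a hypothetical choice, not the paper's formulation.

```python
def ssim_global(x, y, data_range=1.0):
    """Global SSIM between two equal-length sequences of pixel values.

    Uses whole-image statistics instead of sliding windows (a simplification).
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def fusion_ssim_loss(fused, visible, infrared):
    """Hypothetical self-supervised loss: the inputs act as their own labels."""
    return (1.0 - ssim_global(fused, visible)) + (1.0 - ssim_global(fused, infrared))


visible = [0.1, 0.4, 0.5, 0.9]
infrared = [0.2, 0.3, 0.7, 0.8]
fused = [(v + r) / 2 for v, r in zip(visible, infrared)]
loss = fusion_ssim_loss(fused, visible, infrared)
```

Because the targets are the input channels themselves, no external ground-truth fusion is needed, which is what makes training on a single example possible.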

翻訳日:2023-07-11 15:30:00 公開日:2023-07-09

# gnpアタック:勾配ノルムペナルティによる転送可能な逆行例

GNP Attack: Transferable Adversarial Examples via Gradient Norm Penalty ( http://arxiv.org/abs/2307.04099v1 )

ライセンス: Link先を確認

Tao Wu, Tie Luo, Donald C. Wunsch

(参考訳) 転送性の良い逆例(ae)は、ターゲットモデルに関する内部知識が不要な多様なターゲットモデルに対して、実用的なブラックボックス攻撃を可能にする。つまり、ソースのホワイトボックスモデルの特定のアーキテクチャや特徴表現に容易に適合し、生成されたAEはターゲットのブラックボックスモデルではほとんど機能しない。本稿では,GNP(Gradient Norm Penalty)を用いたAE転送性向上手法を提案する。損失関数最適化手順を駆動し、損失ランドスケープ内の局所最適の平坦な領域に収束する。 11種類の最先端(SOTA)深層学習モデルと6つの先進防衛手法を攻撃することにより、GNPは高い伝達性を持つAEを生成するのに非常に有効であることを示す。また,より強固な転送ベースの攻撃に対して,他の勾配ベース手法と容易に統合できるという点で,非常に柔軟であることを示す。

Adversarial examples (AE) with good transferability enable practical black-box attacks on diverse target models, where insider knowledge about the target models is not required. Previous methods often generate AE with no or very limited transferability; that is, they easily overfit to the particular architecture and feature representation of the source, white-box model and the generated AE barely work for target, black-box models. In this paper, we propose a novel approach to enhance AE transferability using Gradient Norm Penalty (GNP). It drives the loss function optimization procedure to converge to a flat region of local optima in the loss landscape. By attacking 11 state-of-the-art (SOTA) deep learning models and 6 advanced defense methods, we empirically show that GNP is very effective in generating AE with high transferability. We also demonstrate that it is very flexible in that it can be easily integrated with other gradient based methods for stronger transfer-based attacks.
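
A generic gradient norm penalty can be sketched as follows. This is an illustration on a toy quadratic loss, not the authors' implementation: it takes the regularized objective to be L(x) + lam * ||grad L(x)|| and approximates the required Hessian-vector product with a finite difference of gradients. All names and the values of `lam` and `r` are assumptions, and in an attack setting the signs would depend on whether the loss is being maximized.

```python
import math


def loss_gradient(x, curvatures):
    """Gradient of the toy loss L(x) = 0.5 * sum(a_i * x_i**2)."""
    return [a * xi for a, xi in zip(curvatures, x)]


def l2_norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def penalized_gradient(x, grad_fn, lam=0.5, r=0.01):
    """Gradient of L(x) + lam * ||grad L(x)||.

    The penalty's gradient is H g / ||g||, where H is the Hessian and g the
    gradient. H g / ||g|| is approximated by a finite difference of gradients
    taken along the unit gradient direction, so no Hessian is ever formed.
    """
    g = grad_fn(x)
    norm = l2_norm(g)
    if norm == 0.0:
        return g
    unit = [gi / norm for gi in g]
    shifted = [xi + r * ui for xi, ui in zip(x, unit)]
    g_shifted = grad_fn(shifted)
    hvp = [(gs - gi) / r for gs, gi in zip(g_shifted, g)]
    return [gi + lam * hi for gi, hi in zip(g, hvp)]


curvatures = [1.0, 4.0]
x = [1.0, 1.0]
g_pen = penalized_gradient(x, lambda p: loss_gradient(p, curvatures))
```

The penalty term is largest along high-curvature directions, so steps taken with this gradient are biased toward flatter regions of the loss landscape, which is the property the abstract associates with transferability.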

翻訳日:2023-07-11 15:29:42 公開日:2023-07-09

# 適応型システムのための説明可能なオンライン強化学習に関する研究

A User Study on Explainable Online Reinforcement Learning for Adaptive Systems ( http://arxiv.org/abs/2307.04098v1 )

ライセンス: Link先を確認

Andreas Metzger and Jan Laufer and Felix Feit and Klaus Pohl

(参考訳) オンライン強化学習(RL)は、設計時間の不確実性の存在下で適応システムの実現にますます利用されている。オンラインRLは実際の運用データからの学習を容易にし、実行時にのみ利用できるフィードバックを活用する。しかし、オンラインRLは、RLアルゴリズムへのフィードバックを定量化し、学習をガイドする効果的な報酬関数の定義を必要とする。 deep rlへの関心が高まるにつれ、学習知識はもはや明示的に表現されるものではなく、ニューラルネットワークとして表現される。人間にとって、ニューラルネットワークのパラメータ化と具体的なRL決定を関連付けることは事実上不可能になる。したがって、Deep RLは本質的にブラックボックスとして現れ、適応システムのデバッグを著しく制限する。我々は以前、重要な時点において決定が下された理由についての視覚的な洞察を提供する説明可能なRL技術であるXRL-DINEを紹介した。本稿では,学術・産業系ソフトウェア技術者54名を対象に,(1)XRL-DINEを用いて異なるタスクを遂行する際のソフトウェア技術者の性能評価を行い,(2)XRL-DINEの有用性と使いやすさについて考察する。

Online reinforcement learning (RL) is increasingly used for realizing adaptive systems in the presence of design time uncertainty. Online RL facilitates learning from actual operational data and thereby leverages feedback only available at runtime. However, Online RL requires the definition of an effective and correct reward function, which quantifies the feedback to the RL algorithm and thereby guides learning. With Deep RL gaining interest, the learned knowledge is no longer explicitly represented, but is represented as a neural network. For a human, it becomes practically impossible to relate the parametrization of the neural network to concrete RL decisions. Deep RL thus essentially appears as a black box, which severely limits the debugging of adaptive systems. We previously introduced the explainable RL technique XRL-DINE, which provides visual insights into why certain decisions were made at important time points. Here, we introduce an empirical user study involving 54 software engineers from academia and industry to assess (1) the performance of software engineers when performing different tasks using XRL-DINE and (2) the perceived usefulness and ease of use of XRL-DINE.

翻訳日:2023-07-11 15:29:24 公開日:2023-07-09

# 1クラス分類と異常検出のための制約付き生成投影

Restricted Generative Projection for One-Class Classification and Anomaly Detection ( http://arxiv.org/abs/2307.04097v1 )

ライセンス: Link先を確認

Feng Xiao, Ruoyu Sun, Jicong Fan

(参考訳) 一級分類と異常検出のための簡単なフレームワークを提案する。中心となるアイデアは、未知のトレーニング(通常の)データの分布を既知のターゲット分布に変換するマッピングを学ぶことだ。重要な点として、ターゲット分布は十分に単純でコンパクトで情報に富むべきである。簡易性は、分布から容易にサンプリングできること、コンパクト性は、正規データと異常データとの間の決定境界が明確かつ信頼性があること、情報性は、変換されたデータが元のデータの重要な情報を保存することを保証することである。そこで,超球面における一様,超球面上の一様,あるいは超球面間の一様を対象分布として用いることを提案する。次に、変換されたデータ分布とターゲット分布との距離を最小化し、元のデータの再構成誤差を十分に小さくする。複数のベンチマークデータセットの比較研究により,本手法の有効性をベースラインと比較した。

We present a simple framework for one-class classification and anomaly detection. The core idea is to learn a mapping to transform the unknown distribution of training (normal) data to a known target distribution. Crucially, the target distribution should be sufficiently simple, compact, and informative. The simplicity is to ensure that we can sample from the distribution easily, the compactness is to ensure that the decision boundary between normal data and abnormal data is clear and reliable, and the informativeness is to ensure that the transformed data preserve the important information of the original data. Therefore, we propose to use truncated Gaussian, uniform in hypersphere, uniform on hypersphere, or uniform between hyperspheres, as the target distribution. We then minimize the distance between the transformed data distribution and the target distribution while keeping the reconstruction error for the original data small enough. Comparative studies on multiple benchmark datasets verify the effectiveness of our methods in comparison to baselines.
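
Three of the target distributions named above are simple to sample from, which is the "simplicity" requirement in practice. The sketch below uses standard constructions (normalizing a Gaussian vector for the direction, and inverting the radial CDF for the radius). The function names are hypothetical and the code is not taken from the paper.

```python
import math
import random


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def sample_on_hypersphere(dim, radius=1.0, rng=random):
    """Uniform on the sphere surface: normalize a standard Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    length = norm(v)
    return [radius * c / length for c in v]


def sample_in_hypersphere(dim, radius=1.0, rng=random):
    """Uniform inside the ball: the radius follows radius * U**(1/dim)."""
    direction = sample_on_hypersphere(dim, 1.0, rng)
    r = radius * rng.random() ** (1.0 / dim)
    return [r * c for c in direction]


def sample_between_hyperspheres(dim, r_inner, r_outer, rng=random):
    """Uniform in the shell between two concentric spheres."""
    direction = sample_on_hypersphere(dim, 1.0, rng)
    u = rng.random()
    r = (r_inner ** dim + u * (r_outer ** dim - r_inner ** dim)) ** (1.0 / dim)
    return [r * c for c in direction]


random.seed(0)
on_sphere = sample_on_hypersphere(8)
in_ball = sample_in_hypersphere(8)
in_shell = sample_between_hyperspheres(8, 0.5, 1.0)
```

All three targets have bounded support, so a point whose mapped representation falls far outside that support can be flagged as anomalous, which matches the "compactness" requirement described in the abstract.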

翻訳日:2023-07-11 15:29:04 公開日:2023-07-09

# 言語間意味解析のための最適伝達後方アライメント

Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing ( http://arxiv.org/abs/2307.04096v1 )

ライセンス: Link先を確認

Tom Sherborne, Tom Hosking, Mirella Lapata

(参考訳) 言語間セマンティックパーシングは、高リソース言語(例えば英語)から、トレーニングデータの乏しい低リソース言語へパーシング能力を転移する。以前の研究は銀標準データ拡張法やゼロショット法を主に検討していたが、few-shotの金標準データを利用する方法は比較的探究されていない。最適輸送を用いて確率的潜在変数間の言語間差異を明示的に最小化することにより,言語間意味解析への新たなアプローチを提案する。この直接的なガイダンスが、より少ない例と少ないトレーニングを用いて、自然言語からの構文解析をどのように改善するかを実証する。本手法は,MTOPとMultiATIS++SQLの2つのデータセットで評価し,few-shotの言語間設定で最先端の結果を得た。アブレーション研究により, 並列入力翻訳を使わずとも, 性能が向上することが明らかとなった。さらに,本モデルでは,潜在空間における言語間構造をよりよく捉え,意味表現の類似性を改善する。

Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data. Previous work has primarily considered silver-standard data augmentation or zero-shot methods, however, exploiting few-shot gold data is comparatively unexplored. We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between probabilistic latent variables using Optimal Transport. We demonstrate how this direct guidance improves parsing from natural languages using fewer examples and less training. We evaluate our method on two datasets, MTOP and MultiATIS++SQL, establishing state-of-the-art results under a few-shot cross-lingual regime. Ablation studies further reveal that our method improves performance even without parallel input translations. In addition, we show that our model better captures cross-lingual structure in the latent space to improve semantic representation similarity.

翻訳日:2023-07-11 15:28:49 公開日:2023-07-09

# 深層連続学習のためのクラス・インクリメンタル混合ガウス

Class-Incremental Mixture of Gaussians for Deep Continual Learning ( http://arxiv.org/abs/2307.04094v1 )

ライセンス: Link先を確認

Lukasz Korycki, Bartosz Krawczyk

(参考訳) 定常データに対する継続的な学習モデルは、それらに連続的に来る概念の学習と保持に焦点を当てる。最も一般的なクラスインクリメンタルな環境では、高レベルのグループ化なしに、クラスをひとつずつ扱う準備ができている必要があります。この要件は、これまで提案されていた多くの手法を無効にし、研究者により柔軟な代替アプローチを探さざるを得ない。本研究では,遠心駆動型手法の考え方に従い,ガウスモデルの混合を連続学習フレームワークに組み入れることを提案する。解の退化を回避しながら識別的特徴を学習できる勾配に基づくアプローチと設計損失を利用することで,混合モデルと深部特徴抽出器を組み合わせ,潜在空間における共同最適化と調整を実現した。さらに,固定抽出器を用いてメモリフリーシナリオで効果的に学習できることを示す。実験では,提案手法の有効性を実証的に実証し,画像分類問題の文脈で評価された最先端の連続学習ベースラインと比較した場合のモデルの競争力を示す。

Continual learning models for stationary data focus on learning and retaining concepts coming to them in a sequential manner. In the most generic class-incremental environment, we have to be ready to deal with classes coming one by one, without any higher-level grouping. This requirement invalidates many previously proposed methods and forces researchers to look for more flexible alternative approaches. In this work, we follow the idea of centroid-driven methods and propose end-to-end incorporation of the mixture of Gaussians model into the continual learning framework. By employing the gradient-based approach and designing losses capable of learning discriminative features while avoiding degenerate solutions, we successfully combine the mixture model with a deep feature extractor allowing for joint optimization and adjustments in the latent space. Additionally, we show that our model can effectively learn in memory-free scenarios with fixed extractors. In the conducted experiments, we empirically demonstrate the effectiveness of the proposed solutions and exhibit the competitiveness of our model when compared with state-of-the-art continual learning baselines evaluated in the context of image classification problems.

翻訳日:2023-07-11 15:28:33 公開日:2023-07-09

# クエリで決定木を適切に学習するNP-Hard

Properly Learning Decision Trees with Queries Is NP-Hard ( http://arxiv.org/abs/2307.04093v1 )

ライセンス: Link先を確認

Caleb Koch and Carmen Strassle and Li-Yang Tan

(参考訳) PACが問合せ付き決定木を適切に学習することがNPハードであることを証明する(Bshouty 1993; Guijarro-Lavin-Raghavan 1999; Mehta-Raghavan 2002; Feldman 2016)。ランダムな例から判断木を適切に学習することの難しさを確立する(pitt-valiant 1988)まで遡る長い作業があったが、クエリ学習者のより困難な設定にはさまざまなテクニックが必要であり、それまでの下限は存在しなかった。そこで本研究では,決定木最小化問題(Zantema-Bodlaender 2000; Sieling 2003)について,最もよく知られた下界を単純化し,強化する。技術的レベルでは、決定木複雑性について研究するが、いかなる複雑性尺度に対しても考慮できる硬度蒸留の概念を導入し、大きな決定木を必要とする関数に対しては、その複雑さに責任がある小さな入力の集合を識別する一般的な方法を与える。我々の手法は、一定のエラーを許容するクエリ学習者を規則化さえしている。これは、逆多項式誤差のみを保持するランダムな例の設定に対する既存の下界とは対照的である。その結果,一様分布下で決定木を適切に学習する近多項時間問合せアルゴリズム(blanc-lange-qiao-tan 2022)を組み合わせることで,分布仮定が問題に劇的な影響を与えることを示した。

We prove that it is NP-hard to properly PAC learn decision trees with queries, resolving a longstanding open problem in learning theory (Bshouty 1993; Guijarro-Lavin-Raghavan 1999; Mehta-Raghavan 2002; Feldman 2016). While there has been a long line of work, dating back to (Pitt-Valiant 1988), establishing the hardness of properly learning decision trees from random examples, the more challenging setting of query learners necessitates different techniques and there were no previous lower bounds. En route to our main result, we simplify and strengthen the best known lower bounds for a different problem of Decision Tree Minimization (Zantema-Bodlaender 2000; Sieling 2003). On a technical level, we introduce the notion of hardness distillation, which we study for decision tree complexity but can be considered for any complexity measure: for a function that requires large decision trees, we give a general method for identifying a small set of inputs that is responsible for its complexity. Our technique even rules out query learners that are allowed constant error. This contrasts with existing lower bounds for the setting of random examples which only hold for inverse-polynomial error. Our result, taken together with a recent almost-polynomial time query algorithm for properly learning decision trees under the uniform distribution (Blanc-Lange-Qiao-Tan 2022), demonstrates the dramatic impact of distributional assumptions on the problem.

翻訳日:2023-07-11 15:28:13 公開日:2023-07-09

# CMDFusion: LIDARセマンティックセマンティックセグメンテーションのための双方向融合ネットワーク

CMDFusion: Bidirectional Fusion Network with Cross-modality Knowledge Distillation for LIDAR Semantic Segmentation ( http://arxiv.org/abs/2307.04091v1 )

ライセンス: Link先を確認

Jun Cen, Shiwei Zhang, Yixuan Pei, Kun Li, Hang Zheng, Maochun Luo, Yingya Zhang, Qifeng Chen

(参考訳) 2D RGB画像と3D LIDAR点雲は、自動運転車の知覚システムに補完的な知識を提供する。 LIDARセマンティックセグメンテーションタスクのためにいくつかの2Dおよび3D融合法が検討されているが、それらは異なる問題に悩まされている。 2D-to-3D融合法は、実世界のシナリオでは利用できないが、3D-to-2D融合法は2D情報を完全に利用できない。そこで本研究では,クロスモーダル知識蒸留(CMDFusion)を用いた双方向融合ネットワークを提案する。我々の方法には2つの貢献がある。まず,2次元から3次元への融合と3次元から2次元への融合により,両方向の融合スキームは2次元の融合スキームのいずれかを上回る3次元特徴を明確かつ暗黙的に拡張する。次に、2dネットワーク(カメラブランチ)から3dネットワーク(2d知識ブランチ)への2d知識を蒸留することにより、3dネットワークがカメラのfov(視野領域)にない点でも2d情報を生成することができる。このようにして、2D知識ブランチは3D LIDAR入力に従って2D情報を提供するため、推論中にRGB画像は不要になる。我々のCMDFusionは、SemanticKITTIとnuScenesデータセット上のすべてのフュージョンベースのメソッドの中で、最高のパフォーマンスを実現していることを示す。コードはhttps://github.com/jun-cen/cmdfusionでリリースされる。

2D RGB images and 3D LIDAR point clouds provide complementary knowledge for the perception system of autonomous vehicles. Several 2D and 3D fusion methods have been explored for the LIDAR semantic segmentation task, but they suffer from different problems. 2D-to-3D fusion methods require strictly paired data during inference, which may not be available in real-world scenarios, while 3D-to-2D fusion methods cannot explicitly make full use of the 2D information. Therefore, we propose a Bidirectional Fusion Network with Cross-Modality Knowledge Distillation (CMDFusion) in this work. Our method makes two contributions. First, our bidirectional fusion scheme explicitly and implicitly enhances the 3D feature via 2D-to-3D fusion and 3D-to-2D fusion, respectively, which surpasses either one of the single fusion schemes. Second, we distill the 2D knowledge from a 2D network (Camera branch) to a 3D network (2D knowledge branch) so that the 3D network can generate 2D information even for those points not in the FOV (field of view) of the camera. In this way, RGB images are no longer required during inference, since the 2D knowledge branch provides 2D information according to the 3D LIDAR input. We show that our CMDFusion achieves the best performance among all fusion-based methods on the SemanticKITTI and nuScenes datasets. The code will be released at https://github.com/Jun-CEN/CMDFusion.

翻訳日:2023-07-11 15:27:47 公開日:2023-07-09

# 注意機構を用いた衛星観測における海ゴミ検出

Marine Debris Detection in Satellite Surveillance using Attention Mechanisms ( http://arxiv.org/abs/2307.04128v1 )

ライセンス: Link先を確認

Ao Shen, Yijie Zhu and Richard Jiang

(参考訳) 海洋デブリは環境保護の重要な問題であるが、現在の海洋デブリの特定方法はまだ限られている。海洋堆積物の局在化において高い効率とより広い適用性を達成するため,本研究は,yolov7のインスタンス分割を異なる注意機構と組み合わせ,最良のモデルについて検討する。海洋ゴミを含む衛星画像からなるラベル付きデータセットを用いて,軽量座標注意,CBAM(空間焦点とチャネル焦点を組み合わせた),ボトルネックトランスフォーマ(自己注意に基づく)の3つの注意モデルを検討した。ボックス検出評価の結果,CBAMは座標注意(F1スコア71%)とYOLOv7/bottleneck Transformer(F1スコア約66%)と比較して最高の成績(F1スコア77%)を示した。マスク評価では、cbamが再びf1スコアを73%、コーディネートアテンションとyolov7が同等のパフォーマンス(f1スコア68%/69%)、ボトルネックトランスフォーマーがf1スコア56%で遅れていた。これらの結果から,CBAMは海洋破片の検出に最適であることがわかった。しかし、ボトルネックトランスフォーマは手動アノテーションで見落とされた部分を検出し、大きな破片のマスク精度が向上し、実用的な性能が向上する可能性があることに注意すべきである。

Marine debris is an important issue for environmental protection, but current methods for locating marine debris are still limited. To achieve higher efficiency and wider applicability in the localization of marine debris, this study combines the instance segmentation of YOLOv7 with different attention mechanisms and explores which model performs best. Using a labelled dataset of satellite images containing ocean debris, we examined three attention models: lightweight coordinate attention, CBAM (combining spatial and channel focus), and bottleneck transformer (based on self-attention). Box detection assessment revealed that CBAM achieved the best outcome (F1 score of 77%) compared to coordinate attention (F1 score of 71%) and YOLOv7/bottleneck transformer (both F1 scores around 66%). Mask evaluation showed CBAM again leading with an F1 score of 73%, whereas coordinate attention and YOLOv7 had comparable performance (F1 scores of around 68%/69%) and bottleneck transformer lagged behind at an F1 score of 56%. These findings suggest that CBAM is the most suitable of the tested models for detecting marine debris. However, it should be noted that the bottleneck transformer detected some areas missed by manual annotation and displayed better mask precision for larger debris pieces, signifying potentially superior practical performance.
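
For readers comparing the reported numbers, the F1 score is the harmonic mean of precision and recall. The short sketch below computes it from detection counts; the counts in the usage line are hypothetical and are not taken from the paper.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall, computed from raw counts."""
    if true_pos == 0:
        # No correct detections: define F1 as 0 to avoid division by zero.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical counts chosen so that precision and recall are both 0.77.
example = f1_score(77, 23, 23)
```

Because F1 penalizes an imbalance between precision and recall, a model that finds debris missed by the annotators can score lower on this metric even when its detections are useful in practice.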

翻訳日:2023-07-11 15:22:45 公開日:2023-07-09

# 対話のための言語間韻律伝達に向けて

Towards cross-language prosody transfer for dialog ( http://arxiv.org/abs/2307.04123v1 )

ライセンス: Link先を確認

Jonathan E. Avila, Nigel G. Ward

(参考訳) 現在、音声音声翻訳システムは、対話目的の使用を十分にサポートしていない。特に、不適切な韻律移動により話者意図や姿勢のニュアンスを失うことがある。我々はこれを克服するためにすべきことを探求する。まず, 英語とスペイン語のコーパスを収集するために, 1871年のマッチング発話ペアを用いて, バイリンガル話者が他の言語での会話から発話を再現するデータ収集プロトコルを開発した。第2に,幅広い韻律的特徴集合上のユークリッド距離に基づく簡易な韻律的異性度尺度を開発した。次にこれらを用いて、言語間の韻律的差異を調査し、3つの単純なベースラインモデルの有用性を測定し、より強力なモデリングを必要とする現象を特定する。本研究は, 言語間韻律に関する今後の研究や, 効果的韻律伝達が可能な音声音声翻訳システムの設計について報告する。

Speech-to-speech translation systems today do not adequately support use for dialog purposes. In particular, nuances of speaker intent and stance can be lost due to improper prosody transfer. We present an exploration of what needs to be done to overcome this. First, we developed a data collection protocol in which bilingual speakers re-enact utterances from an earlier conversation in their other language, and used this to collect an English-Spanish corpus, so far comprising 1871 matched utterance pairs. Second, we developed a simple prosodic dissimilarity metric based on Euclidean distance over a broad set of prosodic features. We then used these to investigate cross-language prosodic differences, measure the likely utility of three simple baseline models, and identify phenomena which will require more powerful modeling. Our findings should inform future research on cross-language prosody and the design of speech-to-speech translation systems capable of effective prosody transfer.
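
A dissimilarity metric of the kind described, Euclidean distance over a prosodic feature vector, can be sketched as follows. The feature choice (for instance normalized pitch, intensity, and speaking rate) and the function name are assumptions for illustration; the abstract does not list the actual feature set.

```python
import math


def prosodic_dissimilarity(features_a, features_b):
    """Euclidean distance between two equal-length prosodic feature vectors."""
    if len(features_a) != len(features_b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(features_a, features_b)))


# Hypothetical, already-normalized features for an English utterance
# and its Spanish re-enactment.
english = [0.2, -1.0, 0.5]
spanish = [0.1, -0.4, 0.9]
distance = prosodic_dissimilarity(english, spanish)
```

Features would normally be normalized (for example, z-scored per speaker) before taking the distance, so that no single feature dominates; that preprocessing step is likewise an assumption here.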

翻訳日:2023-07-11 15:22:18 公開日:2023-07-09

# 赤外線符号化画像を用いた低光度画像の強調

Enhancing Low-Light Images Using Infrared-Encoded Images ( http://arxiv.org/abs/2307.04122v1 )

ライセンス: Link先を確認

Shulin Tian, Yufei Wang, Renjie Wan, Wenhan Yang, Alex C. Kot, Bihan Wen

(参考訳) 低照度画像強調タスクは、本質的に不備であるため、不可欠だが困難である。以前の芸術は、ピクセル単位での損失を用いて可視光スペクトルで撮影された低光度画像を主に重視し、わずかな収入光子によって明るさ、コントラスト、テクスチャの詳細を回復する能力を制限する。本研究では,低光環境下で撮影される画像の可視性を向上させるために,赤外線遮断フィルタ(ir)を除去し,より多くの光子を捕捉し,irスペクトルからの情報を包含することで信号対雑音比が向上する手法を提案する。提案手法を検証するために,irカットオフフィルタを使わずに撮像された低光画像と,外部フィルタを用いた長時間露光参照画像のペアデータセットを収集した。その結果,提案手法の有効性が実証され,定量的,質的に性能が向上した。データセットとコードはhttps://wyf0912.github.io/ELIEI/で公開されている。

The low-light image enhancement task is essential yet challenging because it is intrinsically ill-posed. Previous work mainly focuses on low-light images captured in the visible spectrum using a pixel-wise loss, which limits the capacity to recover brightness, contrast, and texture details due to the small number of incoming photons. In this work, we propose a novel approach to increase the visibility of images captured in low-light environments by removing the in-camera infrared (IR) cut-off filter, which allows more photons to be captured and improves the signal-to-noise ratio through the inclusion of information from the IR spectrum. To verify the proposed strategy, we collect a paired dataset of low-light images captured without the IR cut-off filter, with corresponding long-exposure reference images captured with an external filter. The experimental results on the proposed dataset demonstrate the effectiveness of the proposed method, showing better performance quantitatively and qualitatively. The dataset and code are publicly available at https://wyf0912.github.io/ELIEI/

翻訳日:2023-07-11 15:22:06 公開日:2023-07-09

# 双曲偏微分方程式を解くための深層学習フレームワーク:その1

A Deep Learning Framework for Solving Hyperbolic Partial Differential Equations: Part I ( http://arxiv.org/abs/2307.04121v1 )

ライセンス: Link先を確認

Rajat Arora

(参考訳) 物理情報ニューラルネットワーク(PINN)は、偏微分方程式(PDE)に対する解の堅牢かつ正確な近似を提供する強力なツールとして登場した。しかし、PINNは、PDEを支配的な双曲的特徴と近似しようとする際に深刻な困難と課題に直面している。本研究は, 非線形pdesに対する近似解法として, aプライオリな解の知識や不連続の場所を知らずに, 衝撃や不連続を生じさせる物理学的インフォームド深層学習フレームワークの開発に焦点をあてている。この研究は、離散化された領域のノードにおける解の値を解く有限要素法から動機づけられ、これらのノーダル値を用いてグローバルに定義された解体を得る。不連続ガレルキン法の厳密な数学的基礎の上に構築され、この枠組みは境界条件(ノイマン/ディリクレ)、エントロピー条件、および正則性要件を自然に扱う。解析解を用いた数値実験と検証により,提案手法の精度,堅牢性,有効性を示す。

Physics-informed neural networks (PINNs) have emerged as a powerful tool to provide robust and accurate approximations of solutions to partial differential equations (PDEs). However, PINNs face serious difficulties and challenges when trying to approximate PDEs with dominant hyperbolic character. This research focuses on the development of a physics-informed deep learning framework to approximate solutions to nonlinear PDEs that can develop shocks or discontinuities, without any a priori knowledge of the solution or the location of the discontinuities. The work is motivated by the finite element method, which solves for solution values at nodes in the discretized domain and uses these nodal values to obtain a globally defined solution field. Built on the rigorous mathematical foundations of the discontinuous Galerkin method, the framework naturally handles imposition of boundary conditions (Neumann/Dirichlet), entropy conditions, and regularity requirements. Several numerical experiments and validation with analytical solutions demonstrate the accuracy, robustness, and effectiveness of the proposed framework.

翻訳日:2023-07-11 15:21:39 公開日:2023-07-09

# FILM: 事前学習言語モデルによる画像分類はどのように適合するか?

FILM: How can Few-Shot Image Classification Benefit from Pre-Trained Language Models? ( http://arxiv.org/abs/2307.04114v1 )

ライセンス: Link先を確認

Zihao Jiang, Yunkai Dang, Dong Pang, Huishuai Zhang, Weiran Huang

(参考訳) 少数のサンプルしか持たない新しいクラスに一般化可能なモデルをトレーニングすることを目的としている。近年、クラス名からアクセス可能なセマンティック情報を用いて、少数ショット学習を強化するための一連の研究が提案されている。しかし、これらの作業は、標準のマイナショット学習フレームワークのビジュアルプロトタイプや機能抽出子などの既存のモジュールの改善に焦点を当てている。これにより、意味情報の完全な利用が制限される。本稿では,コントラスト学習に基づく事前学習言語モデルを用いた,新しい数発学習フレームワークを提案する。テキストベースの事前学習言語モデルから得られる視覚的特徴とテキスト埋め込みの整合性に対処するため,フレームワークのテキスト分岐を慎重に設計し,コサイン類似性を一般化するためのメトリックモジュールを導入する。転送性を向上させるため、メトリックモジュールを異なる数ショットタスクに適応させ、MAMLを採用してバイレベル最適化によりモデルをトレーニングする。さらに,本手法の有効性を実証するため,複数のベンチマーク実験を行った。

Few-shot learning aims to train models that can generalize to novel classes with only a few samples. Recently, a line of work has been proposed to enhance few-shot learning with accessible semantic information from class names. However, these works focus on improving existing modules of the standard few-shot learning framework, such as visual prototypes and feature extractors. This limits the full use of semantic information. In this paper, we propose a novel few-shot learning framework that uses pre-trained language models based on contrastive learning. To address the challenge of aligning visual features with textual embeddings obtained from a text-based pre-trained language model, we carefully design the textual branch of our framework and introduce a metric module to generalize the cosine similarity. For better transferability, we let the metric module adapt to different few-shot tasks and adopt MAML to train the model via bi-level optimization. Moreover, we conduct extensive experiments on multiple benchmarks to demonstrate the effectiveness of our method.

翻訳日:2023-07-11 15:21:06 公開日:2023-07-09

# フレーム次フリップによるデータセット生成による部分アノテーションからのミトーシスの検出

Mitosis Detection from Partial Annotation by Dataset Generation via Frame-Order Flipping ( http://arxiv.org/abs/2307.04113v1 )

ライセンス: Link先を確認

Kazuya Nishimura, Ami Katanaya, Shinichiro Chuma, Ryoma Bise

(参考訳) 分裂現象の検出は、生物医学研究において重要な役割を担っている。深層学習に基づくミトーシス検出法は,一定のラベル付きデータを用いて優れた性能を達成している。しかし、これらの手法は各撮像条件にアノテーションを必要とする。ラベル付きデータの収集には時間を要する。本稿では,部分的に注釈付きシーケンスでトレーニング可能なミオシス検出法を提案する。基本的なアイデアは、部分ラベルから完全なラベル付きデータセットを生成し、生成されたデータセットで分裂検出モデルをトレーニングすることだ。まず,フレーム次反転によりmitosisイベントを含まない画像対を生成する。次に,アルファブレイディングペーストにより画像ペアにmitosisイベントをペーストし,完全なラベル付きデータセットを生成する。提案手法は,4つのデータセット上での性能を実証し,部分ラベル付きシーケンスを用いた他の比較よりも優れていることを確認した。

Detection of mitosis events plays an important role in biomedical research. Deep-learning-based mitosis detection methods have achieved outstanding performance with a certain amount of labeled data. However, these methods require annotations for each imaging condition. Collecting labeled data involves time-consuming human labor. In this paper, we propose a mitosis detection method that can be trained with partially annotated sequences. The basic idea is to generate a fully labeled dataset from the partial labels and train a mitosis detection model with the generated dataset. First, we generate an image pair not containing mitosis events by frame-order flipping. Then, we paste mitosis events into the image pair by alpha-blending pasting and generate a fully labeled dataset. We demonstrate the performance of our method on four datasets, and we confirm that our method outperforms other compared methods that use partially labeled sequences.
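
The two data-generation steps can be sketched on toy grayscale images stored as nested lists. This is a minimal illustration under stated assumptions: frame-order flipping is taken to mean swapping the temporal order of a consecutive frame pair, and the helper names, patch, and alpha mask are hypothetical rather than the authors' code.

```python
def flip_frame_order(frame_pair):
    """Swap the temporal order of a (previous, next) frame pair."""
    previous_frame, next_frame = frame_pair
    return (next_frame, previous_frame)


def alpha_blend_paste(background, patch, alpha, top, left):
    """Paste a patch onto a copy of the background, weighted per pixel by alpha."""
    out = [row[:] for row in background]  # leave the input image untouched
    for i, (patch_row, alpha_row) in enumerate(zip(patch, alpha)):
        for j, (p, a) in enumerate(zip(patch_row, alpha_row)):
            y, x = top + i, left + j
            # Standard alpha blending: a * foreground + (1 - a) * background.
            out[y][x] = a * p + (1.0 - a) * out[y][x]
    return out


# Toy usage: blend a single bright pixel into the centre of a dark 3x3 image.
dark = [[0.0] * 3 for _ in range(3)]
blended = alpha_blend_paste(dark, [[10.0]], [[0.5]], top=1, left=1)
```

Since the pasted locations are known, every mitosis event in the generated pair carries a label, which is what allows a fully labeled dataset to be built from partial annotations.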

翻訳日:2023-07-11 15:20:12 公開日:2023-07-09

# 部分観測状態からの時空間連続型PDEの学習

Learning Space-Time Continuous Neural PDEs from Partially Observed States ( http://arxiv.org/abs/2307.04110v1 )

ライセンス: Link先を確認

Valerii Iakovlev, Markus Heinonen, Harri Lähdesmäki

(参考訳) 本稿では,不規則時空間格子上の雑音および部分観測から偏微分方程式(pdes)を学習するための新しい格子非依存モデルを提案する。本稿では,効率的な確率的枠組みを持つ時空連続潜在性ニューラルpdeモデルと,データ効率とグリッド独立性を改善する新しいエンコーダ設計を提案する。潜在状態力学は、コロケーション法とライン法を組み合わせたPDEモデルによって制御される。近似後推定にアモータイズされた変分推定を用い、訓練速度と安定性を向上させるために多重射撃法を用いる。本モデルは,複雑な合成データと実世界のデータセットにおける最先端のパフォーマンスを示し,従来のアプローチの限界を克服し,部分的に観測されたデータを効果的に処理する。提案手法は,データ駆動pdeモデリングを前進させる可能性を示し,複雑な部分観測動的プロセスのロバストでグリッド非依存なモデリングを可能にする。

We introduce a novel grid-independent model for learning partial differential equations (PDEs) from noisy and partial observations on irregular spatiotemporal grids. We propose a space-time continuous latent neural PDE model with an efficient probabilistic framework and a novel encoder design for improved data efficiency and grid independence. The latent state dynamics are governed by a PDE model that combines the collocation method and the method of lines. We employ amortized variational inference for approximate posterior estimation and utilize a multiple shooting technique for enhanced training speed and stability. Our model demonstrates state-of-the-art performance on complex synthetic and real-world datasets, overcoming limitations of previous approaches and effectively handling partially-observed data. The proposed model outperforms recent methods, showing its potential to advance data-driven PDE modeling and enabling robust, grid-independent modeling of complex partially-observed dynamic processes.

翻訳日:2023-07-11 15:19:52 公開日:2023-07-09

# 鳥眼視における物体検出とセグメンテーションのためのパラメトリック奥行きに基づく特徴表現学習

Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's Eye View ( http://arxiv.org/abs/2307.04106v1 )

ライセンス: Link先を確認

Jiayu Yang, Enze Xie, Jose M. Alvarez, Miaomiao Liu

(参考訳) 近年の自律走行のための視覚のみの知覚モデルは、多視点画像特徴をバードアイビュー(BEV)空間に符号化することで有望な結果を得た。これらの手法の主なボトルネックは、画像特徴をBEV座標フレームに変換することである。本稿では,そのような特徴変換をモデル化するために,深度などの幾何学情報を活用することに焦点を当てる。既存の研究は、メモリ消費に繋がる非パラメトリックな深さ分布モデリングや、この問題に対処する幾何情報を無視している。対照的に、特徴変換にパラメトリック深度分布モデルを用いることを提案する。まず2次元画像の特徴をego車両で定義された3次元空間に持ち上げ,各ビューにおける各画素のパラメトリック深度分布を予測した。次に、深度からBEVフレームへの3次元空間占有度に基づいて、3次元特徴量を集約する。最後に、オブジェクト検出やセマンティクスセグメンテーションといった下流タスクに変換された機能を使用します。既存のセマンティックセグメンテーション手法は、視覚的な情報を考慮に入れないため、幻覚的な問題にも悩まされる。この幻覚は、制御や計画といった後続のモジュールでは特に問題となる。この問題を軽減するため,本手法は深度不確実性と信頼性の高い可視性評価を行う。我々はさらにパラメトリック深度モデルを用いて、幻覚の問題を緩和できる新しい可視性を考慮した評価指標を提案する。 nuscenesデータセットにおけるオブジェクト検出とセマンティクスセグメンテーションに関する広範な実験により,提案手法が両タスクにおいて既存の手法よりも優れていることが証明された。

Recent vision-only perception models for autonomous driving have achieved promising results by encoding multi-view image features into Bird's-Eye-View (BEV) space. A critical step and the main bottleneck of these methods is transforming image features into the BEV coordinate frame. This paper focuses on leveraging geometry information, such as depth, to model such feature transformation. Existing works either rely on non-parametric depth distribution modeling, which leads to significant memory consumption, or ignore the geometry information altogether. In contrast, we propose to use parametric depth distribution modeling for feature transformation. We first lift the 2D image features to the 3D space defined for the ego vehicle via a predicted parametric depth distribution for each pixel in each view. Then, we aggregate the 3D feature volume into the BEV frame based on the 3D space occupancy derived from depth. Finally, we use the transformed features for downstream tasks such as object detection and semantic segmentation. Existing semantic segmentation methods also suffer from a hallucination problem because they do not take visibility information into account. This hallucination can be particularly problematic for subsequent modules such as control and planning. To mitigate the issue, our method provides depth uncertainty and reliable visibility-aware estimations. We further leverage our parametric depth modeling to present a novel visibility-aware evaluation metric that, when taken into account, can mitigate the hallucination problem. Extensive experiments on object detection and semantic segmentation on the nuScenes dataset demonstrate that our method outperforms existing methods on both tasks.

翻訳日:2023-07-11 15:19:36 公開日:2023-07-09

# 無仮定バイアス緩和に向けて

Towards Assumption-free Bias Mitigation ( http://arxiv.org/abs/2307.04105v1 )

ライセンス: Link先を確認

Chia-Yuan Chang, Yu-Neng Chuang, Kwei-Herng Lai, Xiaotian Han, Xia Hu, Na Zou

(参考訳) 驚くべき予測能力にもかかわらず、機械学習モデルは特定の人口層に対する差別を示し、不公平な予測行動に苦しむ。差別を緩和するために、広範囲な研究は複数のアプローチによる機密属性の不等な分布の排除に焦点を当てている。しかしながら、プライバシ上の懸念から、センシティブな属性は現実のシナリオでは利用できないか、あるいは欠落していることが多い。したがって、いくつかの既存の研究は、敏感な属性なしでバイアスを軽減する。これらの研究は、センシティブな属性の不正確な予測や、バイアスに関連する手動で定義された非センシティブな属性の不平等な分布の緩和といった課題に直面している。後者は、感度特性と非感度特性の相関について強い仮定を必要とする。データ分散とタスクの目標が異なるため、非感受性属性に対する強い仮定は有効ではなく、ドメインの専門知識を必要とする可能性がある。本研究では,バイアス緩和のための特徴的相互作用をモデル化し,関連する属性を自動的に検出する前提なしフレームワークを提案する。提案するフレームワークは、特定されたバイアスのある特徴相互作用による不公平な影響を軽減することを目的としている。実世界の4つのデータセットに対する実験結果から,提案するフレームワークは,偏りのある特徴相互作用を考慮し,不当な予測行動を著しく軽減できることが示された。

Despite their impressive prediction ability, machine learning models show discrimination towards certain demographics and suffer from unfair prediction behaviors. To alleviate the discrimination, extensive studies focus on eliminating the unequal distribution of sensitive attributes via multiple approaches. However, due to privacy concerns, sensitive attributes are often either unavailable or missing in real-world scenarios. Therefore, several existing works alleviate the bias without sensitive attributes. Those studies face challenges: either inaccurate predictions of sensitive attributes, or the need to mitigate the unequal distribution of manually defined non-sensitive attributes related to bias. The latter requires strong assumptions about the correlation between sensitive and non-sensitive attributes. As data distributions and task goals vary, such strong assumptions about non-sensitive attributes may not be valid and may require domain expertise. In this work, we propose an assumption-free framework to detect the related attributes automatically by modeling feature interaction for bias mitigation. The proposed framework aims to mitigate the unfair impact of identified biased feature interactions. Experimental results on four real-world datasets demonstrate that our proposed framework can significantly alleviate unfair prediction behaviors by considering biased feature interactions.

翻訳日:2023-07-11 15:19:14 公開日:2023-07-09

# CA-CentripetalNet: ハードハット着用検出のための新しいアンカーフリーディープラーニングフレームワーク

CA-CentripetalNet: A novel anchor-free deep learning framework for hardhat wearing detection ( http://arxiv.org/abs/2307.04103v1 )

ライセンス: Link先を確認

Zhijian Liu, Nian Cai, Wensheng Ouyang, Chengbin Zhang, Nili Tian, Han Wang

(参考訳) 検出用ヘルメットの自動着用は、複雑なビデオ監視シーンのため、建設現場の安全管理を強化することができる。従来の深層学習手法の一般化に対処するために,CA-CentripetalNetと呼ばれる新しいアンカーフリー深層学習フレームワークが提案されている。垂直水平コーナープール型ca-centripetalnetの特性抽出と利用能力の向上を目的として, 2つの新しい手法を提案した。前者は限界特徴と内部特徴の包括的利用を実現するように設計されている。後者は、バックボーンが内部機能に注意を払わなければならないように設計されており、これは検出中ではなくトレーニング中にのみ使用される。実験結果から,CA-CentripetalNet は 86.63% mAP (平均平均精度) で,既存のディープラーニングベースの手法,特に小型のハードハットや非ウーンハードハットと比較して,メモリ消費を適度に削減した。

Automatic hardhat wearing detection can strengthen safety management on construction sites, but it remains challenging due to complicated video surveillance scenes. To address the poor generalization of previous deep-learning-based methods, a novel anchor-free deep learning framework called CA-CentripetalNet is proposed for hardhat wearing detection. Two novel schemes are proposed to improve the feature extraction and utilization ability of CA-CentripetalNet: vertical-horizontal corner pooling and bounding constrained center attention. The former is designed to make comprehensive use of marginal features and internal features. The latter is designed to force the backbone to pay attention to internal features, and is used only during training, not during detection. Experimental results indicate that CA-CentripetalNet achieves better performance than existing deep-learning-based methods, with 86.63% mAP (mean Average Precision), less memory consumption, and reasonable speed, especially for small-scale hardhats and non-worn hardhats.

翻訳日:2023-07-11 15:18:54 公開日:2023-07-09

# DIFF-NST: 変形可能な神経伝達のための拡散インターリーブ

DIFF-NST: Diffusion Interleaving For deFormable Neural Style Transfer ( http://arxiv.org/abs/2307.04157v1 )

ライセンス: Link先を確認

Dan Ruta, Gemma Canet Tarrés, Andrew Gilbert, Eli Shechtman, Nicholas Kolkin, John Collomosse

(参考訳) ニューラルスタイル転送(Neural Style Transfer, NST)は、コンテンツイメージの芸術的外観を、参照スタイルイメージのスタイルに合わせるために、ニューラルテクニックを適用した研究分野である。伝統的に、NST法はテクスチャベースの画像編集に重点を置いており、ほとんどの低レベル情報に影響を与え、ほとんどの画像構造を同じに保っている。しかし、特にそのスタイルが抽象的である場合や、スタイルの主要な概念が、一部のコンテンツの変形したレンドレーションにある場合など、一部のスタイルには、スタイルに基づく変形が望ましい。安定拡散など最近の拡散モデルの導入により、より強力な画像生成技術にアクセスでき、新しい可能性を可能にしている。本研究では,従来のモデルにおいて,変形可能なスタイル転送を実現しつつ,スタイル転送を行うために,この新しいモデルのクラスを提案する。我々は,これらのモデルの先行的活用が推論時に新たな芸術的制御を顕在化できることを示すとともに,この新たなスタイル伝達の方向性を探究する上での知見を文書化する。

Neural Style Transfer (NST) is the field of study applying neural techniques to modify the artistic appearance of a content image to match the style of a reference style image. Traditionally, NST methods have focused on texture-based image edits, affecting mostly low level information and keeping most image structures the same. However, style-based deformation of the content is desirable for some styles, especially in cases where the style is abstract or the primary concept of the style is in its deformed rendition of some content. With the recent introduction of diffusion models, such as Stable Diffusion, we can access far more powerful image generation techniques, enabling new possibilities. In our work, we propose using this new class of models to perform style transfer while enabling deformable style transfer, an elusive capability in previous models. We show how leveraging the priors of these models can expose new artistic controls at inference time, and we document our findings in exploring this new direction for the field of style transfer.

翻訳日:2023-07-11 15:10:49 公開日:2023-07-09

# 空間文脈拡張のための潜在グラフ注意

Latent Graph Attention for Enhanced Spatial Context ( http://arxiv.org/abs/2307.04149v1 )

ライセンス: Link先を確認

Ayush Singh, Yash Bhambhu, Himanshu Buckchash, Deepak K. Gupta, Dilip K. Prasad

(参考訳) 画像のグローバルコンテキストは、画像から画像への翻訳問題で非常に有用である。従来のアテンションベースモデルとグラフベースモデルは、グローバルコンテキストをかなり捉えているが、これらは計算コストが高い。さらに、既存のアプローチは、画像上の任意の2点間のペアワイズ意味関係を学習することのみに限られる。本稿では、LGA(Latent Graph Attention)を、計算コストが低く(ノード数に比例して)、かつ、既存のアーキテクチャにグローバルコンテキストを組み込むための、安定的でモジュール化されたフレームワークとして提案する。 lgaは局所連結グラフのネットワークを用いて空間的に情報を伝達し、中間画素の影響も考慮した2つの空間的距離点間の意味的にコヒーレントな関係の構築を容易にする。さらに、グラフネットワークの深さを利用して、ターゲットデータセットへのコンテキスト拡散の程度を調整し、追加の計算コストを明示的に制御することができる。また,LGAの学習機構を向上するために,LGAモジュールを計算負荷の最小化を犠牲にして,元のアーキテクチャとうまく結合するのに役立つ新しい対照的な損失項を導入する。 LGAを取り入れることで、透明なオブジェクトセグメンテーション、デハジングのための画像復元、光フロー推定という3つの難解なアプリケーションの性能が向上することを示す。

Global contexts in images are quite valuable in image-to-image translation problems. Conventional attention-based and graph-based models capture the global context to a large extent; however, they are computationally expensive. Moreover, the existing approaches are limited to learning only the pairwise semantic relation between any two points on the image. In this paper, we present Latent Graph Attention (LGA), a computationally inexpensive (linear in the number of nodes), stable, and modular framework for incorporating the global context into existing architectures. In particular, it empowers small-scale architectures to perform closer to large architectures, making lightweight architectures more useful for edge devices with lower compute power and lower energy needs. LGA propagates information spatially using a network of locally connected graphs, thereby facilitating the construction of a semantically coherent relation between any two spatially distant points that also takes into account the influence of the intermediate pixels. Moreover, the depth of the graph network can be used to adapt the extent of contextual spread to the target dataset, which allows explicit control of the added computational cost. To enhance the learning mechanism of LGA, we also introduce a novel contrastive loss term that helps our LGA module couple well with the original architecture at the expense of minimal additional computational load. We show that incorporating LGA improves performance on three challenging applications, namely transparent object segmentation, image restoration for dehazing, and optical flow estimation.

翻訳日:2023-07-11 15:10:30 公開日:2023-07-09

# チャート分類に関する調査とアプローチ

A Survey and Approach to Chart Classification ( http://arxiv.org/abs/2307.04147v1 )

ライセンス: Link先を確認

Anurag Dhote and Mohammed Javed and David S Doermann

(参考訳) チャートは文書における視覚情報の本質的な情報源であり、典型的には数値的に伝えられる情報の深い理解と解釈を促進する。科学文献には多くの図表があり、それぞれに様式的な違いがある。近年,文書理解コミュニティは,表分類から始まる自動チャート理解の問題に対処し始めている。本稿では,グラフ分類の最先端技術に関する調査を行い,利用可能なデータセットとその対応するチャートタイプについて考察する。これらの貢献をml、cnn、transformersに基づいた従来のアプローチに大まかに分類します。さらに、ICPR 2022におけるCHART-InfographicsコンペティションのためのCHARTINFO UB-UNITECH PMCデータセットについて、CNNベースのアプローチとトランスフォーマーベースのアプローチの比較分析を行った。データセットには、22,923のトレーニングイメージと13,260のテストイメージを含む15の異なるチャートカテゴリが含まれている。我々は,グラフ分類における最先端結果を生成するビジョンベーストランスフォーマーモデルを実装した。

Charts represent an essential source of visual information in documents and facilitate a deep understanding and interpretation of information typically conveyed numerically. In the scientific literature, there are many charts, each with its stylistic differences. Recently the document understanding community has begun to address the problem of automatic chart understanding, which begins with chart classification. In this paper, we present a survey of the current state-of-the-art techniques for chart classification and discuss the available datasets and their supported chart types. We broadly classify these contributions as traditional approaches based on ML, CNN, and Transformers. Furthermore, we carry out an extensive comparative performance analysis of CNN-based and transformer-based approaches on the recently published CHARTINFO UB-UNITECH PMC dataset for the CHART-Infographics competition at ICPR 2022. The data set includes 15 different chart categories, including 22,923 training images and 13,260 test images. We have implemented a vision-based transformer model that produces state-of-the-art results in chart classification.

Translated: 2023-07-11 15:10:03 Published: 2023-07-09

# On The Impact of Machine Learning Randomness on Group Fairness ( http://arxiv.org/abs/2307.04138v1 )

License: see link

Prakhar Ganesh, Hongyan Chang, Martin Strobel, Reza Shokri

Statistical measures of group fairness in machine learning reflect the gap in the performance of algorithms across different groups. These measures, however, exhibit high variance between different training instances, which makes them unreliable for the empirical evaluation of fairness. What causes this high variance? We investigate the impact on group fairness of different sources of randomness in training neural networks. We show that the variance in group fairness measures is rooted in the high volatility of the learning process on under-represented groups. Further, we identify the stochasticity of data order during training as the dominant source of randomness. Based on these findings, we show how one can control group-level accuracy (i.e., model fairness), with high efficiency and negligible impact on the model's overall performance, by simply changing the data order for a single epoch.
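
To make the quantity under discussion concrete, here is a minimal sketch of a group-fairness gap, measured as the largest difference in per-group accuracy, and of its spread across training runs. The function names and the toy predictions are illustrative assumptions, not the authors' evaluation code.

```python
from statistics import pstdev

def group_accuracy(y_true, y_pred, groups, group):
    """Accuracy restricted to the samples that belong to `group`."""
    pairs = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g == group]
    return sum(t == p for t, p in pairs) / len(pairs)

def accuracy_gap(y_true, y_pred, groups):
    """Largest difference in accuracy between any two groups."""
    accs = [group_accuracy(y_true, y_pred, groups, g) for g in sorted(set(groups))]
    return max(accs) - min(accs)

# Toy data: labels, group membership, and predictions from two hypothetical
# training runs that differ only in their random seed.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "a", "a", "b", "b"]
run_1 = [1, 0, 1, 1, 0, 1, 0, 0]  # both groups fully correct
run_2 = [1, 0, 1, 1, 0, 1, 1, 0]  # one error in the small group "b"

gaps = [accuracy_gap(y_true, run, groups) for run in (run_1, run_2)]
spread = pstdev(gaps)  # variability of the fairness measure across runs
```

In this toy case a single misclassified sample in the smaller group moves the gap from 0 to 0.5, which illustrates why measures computed on under-represented groups can be volatile.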

Translated: 2023-07-11 15:09:51 Published: 2023-07-09

# A Novel Explainable Artificial Intelligence Model in Image Classification problem ( http://arxiv.org/abs/2307.04137v1 )

License: see link

Quoc Hung Cao, Truong Thanh Hung Nguyen, Vo Thanh Khang Nguyen, Xuan Phong Nguyen

In recent years, artificial intelligence has been applied increasingly widely in many fields and has a profound and direct impact on human life. With this comes the need to understand the principles by which a model makes its predictions. Since most current high-precision models are black boxes, neither the AI scientist nor the end user deeply understands what is going on inside them. Many algorithms have therefore been studied for explaining AI models, especially for image classification in computer vision, such as LIME, CAM, and GradCAM. However, these algorithms still have limitations, such as LIME's long execution time and CAM's confusing interpretations, which lack concreteness and clarity. Therefore, in this paper, we propose a new method called Segmentation - Class Activation Mapping (SeCAM) that combines the advantages of these algorithms while overcoming their disadvantages. We tested this algorithm with various models, including ResNet50, Inception-v3, and VGG16, on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset. The results are outstanding: the algorithm meets all the requirements for a specific explanation in a remarkably short time.

Translated: 2023-07-11 15:09:39 Published: 2023-07-09

# ECL: Class-Enhancement Contrastive Learning for Long-tailed Skin Lesion Classification ( http://arxiv.org/abs/2307.04136v1 )

License: see link

Yilan Zhang, Jianqi Chen, Ke Wang, Fengying Xie

Skin image datasets often suffer from imbalanced data distribution, which exacerbates the difficulty of computer-aided skin disease diagnosis. Some recent works exploit supervised contrastive learning (SCL) for this long-tailed challenge. Despite achieving strong performance, these SCL-based methods focus more on head classes while neglecting the information in tail classes. In this paper, we propose class-Enhancement Contrastive Learning (ECL), which enriches the information of minority classes and treats different classes equally. For information enhancement, we design a hybrid-proxy model to generate class-dependent proxies and propose a cycle update strategy for parameter optimization. A balanced-hybrid-proxy loss is designed to exploit relations between samples and proxies, with different classes treated equally. Taking both "imbalanced data" and "imbalanced diagnosis difficulty" into account, we further present a balanced-weighted cross-entropy loss that follows a curriculum learning schedule. Experimental results on the classification of imbalanced skin lesion data demonstrate the superiority and effectiveness of our method.

Translated: 2023-07-11 15:09:20 Published: 2023-07-09

# Ultrasonic Image's Annotation Removal: A Self-supervised Noise2Noise Approach ( http://arxiv.org/abs/2307.04133v1 )

License: see link

Yuanheng Zhang, Nan Jiang, Zhaoheng Xie, Junying Cao, Yueyang Teng

Accurately annotated ultrasonic images are vital components of a high-quality medical report. Hospitals often have strict guidelines on the types of annotations that should appear on imaging results. However, manually inspecting these images can be a cumbersome task. While a neural network could potentially automate the process, training such a model typically requires a dataset of paired input and target images, which in turn involves significant human labour. This study introduces an automated approach for detecting annotations in images. This is achieved by treating the annotations as noise, creating a self-supervised pretext task, and using a model trained under the Noise2Noise scheme to restore the image to a clean state. We tested a variety of model structures on the denoising task against different types of annotation, including body marker annotations and radial line annotations. Our results demonstrate that most models trained under the Noise2Noise scheme outperformed their counterparts trained with noisy-clean data pairs. The customized U-Net yielded the best results on the body marker annotation dataset, with high scores for segmentation precision and reconstruction similarity. We released our code at https://github.com/GrandArth/UltrasonicImage-N2N-Approach.

Translated: 2023-07-11 15:09:01 Published: 2023-07-09

# Reasoning over the Behaviour of Objects in Video-Clips for Adverb-Type Recognition ( http://arxiv.org/abs/2307.04132v1 )

License: see link

Amrit Diggavi Seshadri, Alessandra Russo

In this work, following the intuition that adverbs describing scene-sequences are best identified by reasoning over high-level concepts of object-behaviour, we propose the design of a new framework that reasons over object-behaviours extracted from raw video-clips to recognize each clip's corresponding adverb-types. Importantly, while previous works on general scene adverb-recognition assume knowledge of the clip's underlying action-types, our method is directly applicable in the more general problem setting where the action-type of a video-clip is unknown. Specifically, we propose a novel pipeline that extracts human-interpretable object-behaviour-facts from raw video clips, and we propose novel symbolic and transformer-based reasoning methods that operate over these extracted facts to identify adverb-types. Experimental results demonstrate that our proposed methods perform favourably against the previous state-of-the-art. Additionally, to support efforts in symbolic video-processing, we release two new datasets of object-behaviour-facts extracted from raw video clips: the MSR-VTT-ASP and ActivityNet-ASP datasets.

Translated: 2023-07-11 15:08:42 Published: 2023-07-09

# Carbon-Efficient Neural Architecture Search ( http://arxiv.org/abs/2307.04131v1 )

License: see link

Yiyang Zhao and Tian Guo

This work presents a novel approach to neural architecture search (NAS) that aims to reduce energy costs and increase carbon efficiency during the model design process. The proposed framework, called carbon-efficient NAS (CE-NAS), consists of NAS evaluation algorithms with different energy requirements, a multi-objective optimizer, and a heuristic GPU allocation strategy. CE-NAS dynamically balances energy-efficient sampling and energy-consuming evaluation tasks based on current carbon emissions. Using a recent NAS benchmark dataset and two carbon traces, our trace-driven simulations demonstrate that CE-NAS achieves better carbon and search efficiency than three baselines.

Translated: 2023-07-11 15:08:20 Published: 2023-07-09

# Cross-modal Orthogonal High-rank Augmentation for RGB-Event Transformer-trackers ( http://arxiv.org/abs/2307.04129v1 )

License: see link

Zhiyu Zhu, Junhui Hou, and Dapeng Oliver Wu

This paper addresses the problem of cross-modal object tracking from RGB videos and event data. Rather than constructing a complex cross-modal fusion network, we explore the great potential of a pre-trained vision Transformer (ViT). In particular, we carefully investigate plug-and-play training augmentations that encourage the ViT to bridge the vast distribution gap between the two modalities, enabling comprehensive cross-modal information interaction and thus enhancing its ability. Specifically, we propose a mask modeling strategy that randomly masks a specific modality of some tokens to enforce proactive interaction between tokens from different modalities. To mitigate network oscillations resulting from the masking strategy and further amplify its positive effect, we then propose a theoretically motivated orthogonal high-rank loss to regularize the attention matrix. Extensive experiments demonstrate that our plug-and-play training augmentation techniques can significantly boost state-of-the-art one-stream and two-stream trackers in terms of both tracking precision and success rate. Our new perspective and findings may bring insights to the field of leveraging powerful pre-trained ViTs to model cross-modal data. The code will be publicly available.
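
The masking idea described above can be sketched as follows. This is a toy illustration under stated assumptions: tokens from the two modalities are taken to be aligned by position, `MASK` is a placeholder rather than a learned mask token, and the function name and the 50/50 choice between modalities are hypothetical. The abstract does not specify these details.

```python
import random

MASK = None  # placeholder; a real model would substitute a learned mask token

def mask_one_modality(rgb_tokens, event_tokens, mask_ratio, rng):
    """Hide exactly one modality at a random subset of token positions.

    At each selected position either the RGB token or the event token is
    replaced by MASK, so the model has to recover it from the other modality.
    """
    assert len(rgb_tokens) == len(event_tokens)
    rgb, event = list(rgb_tokens), list(event_tokens)
    n_masked = int(len(rgb) * mask_ratio)
    for pos in rng.sample(range(len(rgb)), n_masked):
        if rng.random() < 0.5:
            rgb[pos] = MASK
        else:
            event[pos] = MASK
    return rgb, event

rng = random.Random(0)
rgb_tokens = [f"rgb_{i}" for i in range(8)]
event_tokens = [f"evt_{i}" for i in range(8)]
masked_rgb, masked_event = mask_one_modality(rgb_tokens, event_tokens, 0.5, rng)
```

Because a position never loses both modalities in this sketch, the information needed to fill the gap is always present in the other stream.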

Translated: 2023-07-11 15:08:08 Published: 2023-07-09

# Integrated frequency-modulated optical parametric oscillator ( http://arxiv.org/abs/2307.04200v1 )

License: see link

Hubert S. Stokowski, Devin J. Dean, Alexander Y. Hwang, Taewon Park, Oguz Tolga Celik, Marc Jankowski, Carsten Langrock, Vahid Ansari, Martin M. Fejer, and Amir H. Safavi-Naeini

Optical frequency combs have revolutionized precision measurement, time-keeping, and molecular spectroscopy. A substantial effort has developed around "microcombs": integrating comb-generating technologies into compact, reliable photonic platforms. Current approaches for generating these microcombs involve either the electro-optic (EO) or Kerr mechanisms. Despite rapid progress, maintaining high efficiency and wide bandwidth remains challenging. Here, we introduce a new class of microcomb: an integrated optical frequency comb generator that combines electro-optics and parametric amplification to yield a frequency-modulated optical parametric oscillator (FM-OPO). In stark contrast to EO and Kerr combs, the FM-OPO microcomb does not form pulses but maintains operational simplicity and highly efficient pump power utilization, with an output resembling a frequency-modulated laser. We outline the working principles of the FM-OPO and demonstrate them by fabricating the complete optical system in thin-film lithium niobate (LNOI). We measure a pump-to-comb internal conversion efficiency exceeding 93% (34% out-coupled) over a nearly flat-top spectral distribution spanning approximately 1,000 modes (approximately 6 THz). In contrast to an EO comb, cavity dispersion rather than loss determines the FM-OPO bandwidth, enabling broadband combs with a smaller RF modulation power. The FM-OPO microcomb, with its robust operational dynamics, high efficiency, and large bandwidth, contributes a new approach to the field of microcombs and promises to herald an era of miniaturized precision measurement and spectroscopy tools that accelerate advancements in metrology, spectroscopy, telecommunications, sensing, and computing.

Translated: 2023-07-11 15:02:24 Published: 2023-07-09

# Mid-infrared spectroscopy with a broadly tunable thin-film lithium niobate optical parametric oscillator ( http://arxiv.org/abs/2307.04199v1 )

License: see link

Alexander Y. Hwang, Hubert S. Stokowski, Taewon Park, Marc Jankowski, Timothy P. McKenna, Carsten Langrock, Jatadhari Mishra, Vahid Ansari, Martin M. Fejer, and Amir H. Safavi-Naeini

Mid-infrared spectroscopy, an important and widespread technique for sensing molecules, has encountered barriers stemming from sources that are either limited in tuning range or excessively bulky for practical field use. We present a compact, efficient, and broadly tunable optical parametric oscillator (OPO) device that surmounts these challenges. Leveraging a dispersion-engineered singly-resonant OPO implemented in thin-film lithium niobate-on-sapphire, we achieve broad and controlled tuning over an octave, from 1.5 to 3.3 microns, by combining laser and temperature tuning. The device generates more than 25 mW of mid-infrared light at 3.2 microns, offering a power conversion efficiency of 15% (45% quantum efficiency). We demonstrate the tuning and performance of the device by successfully measuring the spectra of methane and ammonia, verifying the relevance of our approach for gas sensing. Our device signifies an important advance in nonlinear photonics miniaturization and brings practical field applications of high-speed and broadband mid-infrared spectroscopy closer to reality.

Translated: 2023-07-11 15:01:56 Published: 2023-07-09

# Natural Language Instructions for Intuitive Human Interaction with Robotic Assistants in Field Construction Work ( http://arxiv.org/abs/2307.04195v1 )

License: see link

Somin Park, Xi Wang, Carol C. Menassa, Vineet R. Kamat, Joyce Y. Chai

The introduction of robots is widely considered to have significant potential for alleviating the worker shortage and stagnant productivity that afflict the construction industry. However, it is challenging to use fully automated robots in complex and unstructured construction sites. Human-Robot Collaboration (HRC) has shown promise in combining human workers' flexibility and robot assistants' physical abilities to jointly address the uncertainties inherent in construction work. When introducing HRC in construction, it is critical to recognize the importance of teamwork and supervision in field construction and to establish a natural and intuitive communication system for the human workers and robotic assistants. Natural language-based interaction can enable intuitive and familiar communication with robots for human workers who are non-experts in robot programming. However, limited research has been conducted on this topic in construction. This paper proposes a framework that allows human workers to interact with construction robots through natural language instructions. The proposed method consists of three stages: Natural Language Understanding (NLU), Information Mapping (IM), and Robot Control (RC). In the NLU module, natural language instructions are input to a language model that predicts a tag for each word. The IM module uses the result of the NLU module and building component information to generate the final instructional output that a robot needs to acknowledge and perform the construction task. A case study on drywall installation is conducted to evaluate the proposed approach. The results highlight the potential of using natural language-based interaction to replicate, within human-robot teams, the communication that occurs between human workers.

Translated: 2023-07-11 15:01:37 Published: 2023-07-09

# SAS Video-QA: Self-Adaptive Sampling for Efficient Video Question-Answering ( http://arxiv.org/abs/2307.04192v1 )

License: see link

Wei Han, Hui Chen, Min-Yen Kan, Soujanya Poria

Video question answering is a fundamental task in the field of video understanding. Although current vision-language models (VLMs) equipped with Video Transformers have enabled temporal modeling and yielded superior results, they come at the cost of huge computational power and are thus too expensive to deploy in real-time application scenarios. An economical workaround samples only a small portion of frames to represent the main content of the video and tunes an image-text model on these sampled frames. Recent video understanding models usually sample a set of frames or clips at random, regardless of the internal correlations between their visual contents or their relevance to the problem. We argue that such aimless sampling may omit the key frames from which the correct answer can be deduced, and the situation gets worse as the sampling sparsity increases, which always happens as video lengths increase. To mitigate this issue, we propose two frame sampling strategies, namely the most domain frames (MDF) and most implied frames (MIF), to maximally preserve the frames that are most likely vital to the given questions. MDF passively minimizes the risk of key frame omission in a bootstrap manner, while MIF actively searches for key frames customized for each video-question pair with the assistance of auxiliary models. The experimental results on three public datasets with three advanced VLMs (CLIP, GIT, and All-in-one) demonstrate that our proposed strategies can boost the performance of image-text pretrained models. The source code for the method proposed in this paper is publicly available at https://github.com/declare-lab/sas-vqa.

Translated: 2023-07-11 15:01:12 Published: 2023-07-09

# On the sample complexity of estimation in logistic regression ( http://arxiv.org/abs/2307.04191v1 )

License: see link

Daniel Hsu, Arya Mazumdar

The logistic regression model is one of the most popular data generation models in noisy binary classification problems. In this work, we study the sample complexity of estimating the parameters of the logistic regression model up to a given $\ell_2$ error, in terms of the dimension and the inverse temperature, with standard normal covariates. The inverse temperature controls the signal-to-noise ratio of the data generation process. While both generalization bounds and the asymptotic performance of the maximum-likelihood estimator for logistic regression are well studied, the non-asymptotic sample complexity that shows the dependence on error and the inverse temperature for parameter estimation is absent from previous analyses. We show that the sample complexity curve has two change-points (or critical points) in terms of the inverse temperature, clearly separating the low, moderate, and high temperature regimes.
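
For reference, one standard way to write the setting described above is sketched below. The symbols are notation assumed here for illustration and are not taken from the abstract: $w$ is the unit-norm parameter vector, $\beta$ the inverse temperature, $d$ the dimension, $\hat{w}$ the estimate, and $\epsilon$ the target $\ell_2$ error.

```latex
\[
x \sim \mathcal{N}(0, I_d), \qquad
\Pr(y = +1 \mid x) = \frac{1}{1 + e^{-\beta \langle w, x \rangle}}, \qquad
y \in \{-1, +1\}, \quad \|w\|_2 = 1,
\]
\[
\text{find } \hat{w} \text{ such that } \|\hat{w} - w\|_2 \le \epsilon .
\]
```

In this parameterization a large $\beta$ makes the labels nearly deterministic given $x$ (low temperature), while $\beta \to 0$ makes them close to fair coin flips (high temperature), which is the sense in which $\beta$ sets the signal-to-noise ratio.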

Translated: 2023-07-11 15:00:47 Published: 2023-07-09

# Histopathology Whole Slide Image Analysis with Heterogeneous Graph Representation Learning ( http://arxiv.org/abs/2307.04189v1 )

License: see link

Tsai Hor Chan, Fernando Julio Cendra, Lan Ma, Guosheng Yin, Lequan Yu

Graph-based methods have been extensively applied to whole-slide histopathology image (WSI) analysis because they can model the spatial relationships among different entities. However, most existing methods focus on modeling WSIs with homogeneous graphs (e.g., with a homogeneous node type). Despite their successes, these works are incapable of mining the complex structural relations between biological entities (e.g., the diverse interactions among different cell types) in the WSI. We propose a novel heterogeneous graph-based framework that leverages the inter-relationships among different types of nuclei for WSI analysis. Specifically, we formulate the WSI as a heterogeneous graph with a "nucleus-type" attribute on each node and a semantic similarity attribute on each edge. We then present a new heterogeneous-graph edge attribute transformer (HEAT) to take advantage of the edge and node heterogeneity during message aggregation. Further, we design a new pseudo-label-based semantic-consistent pooling mechanism to obtain graph-level features, which can mitigate the over-parameterization issue of conventional cluster-based pooling. Additionally, observing the limitations of existing association-based localization methods, we propose a causal-driven approach that attributes the contribution of each node, improving the interpretability of our framework. Extensive experiments on three public TCGA benchmark datasets demonstrate that our framework outperforms state-of-the-art methods by considerable margins on various tasks. Our code is available at https://github.com/HKU-MedAI/WSI-HGNN.

Translated: 2023-07-11 15:00:25 Published: 2023-07-09

# Predictive Coding For Animation-Based Video Compression ( http://arxiv.org/abs/2307.04187v1 )

License: see link

Goluck Konuko, Stéphane Lathuilière and Giuseppe Valenzise

We address the problem of efficiently compressing video for conferencing-type applications. We build on recent approaches based on image animation, which can achieve good reconstruction quality at very low bitrates by representing face motion with a compact set of sparse keypoints. However, these methods encode video frame by frame, i.e., each frame is reconstructed from a reference frame, which limits the reconstruction quality when more bandwidth is available. Instead, we propose a predictive coding scheme that uses image animation as a predictor and codes the residual with respect to the actual target frame. The residuals can in turn be coded in a predictive manner, thus efficiently removing temporal dependencies. Our experiments indicate a significant bitrate gain, in excess of 70% compared to the HEVC video standard and over 30% compared to VVC, on a dataset of talking-head videos.
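
The predict-then-code-the-residual idea can be sketched in a few lines. This is a toy illustration, not the authors' codec: frames are short lists of pixel intensities, `prediction` stands in for the output of an animation-based predictor that is not implemented here, and the uniform quantizer and all function names are assumptions.

```python
def quantize(values, step):
    """Uniform scalar quantizer: index of the nearest multiple of `step`."""
    return [round(v / step) for v in values]

def dequantize(indices, step):
    """Map quantizer indices back to residual values."""
    return [i * step for i in indices]

def encode_residual(target, prediction, step):
    """Code only what the predictor got wrong."""
    residual = [t - p for t, p in zip(target, prediction)]
    return quantize(residual, step)

def decode_frame(prediction, indices, step):
    """Add the decoded residual back onto the prediction."""
    return [p + r for p, r in zip(prediction, dequantize(indices, step))]

# Toy 1-D "frames". The decoder is assumed to be able to form the same
# prediction as the encoder, so only `indices` would need to be transmitted.
target = [100, 102, 98, 131, 131]
prediction = [101, 101, 99, 120, 128]
step = 4

indices = encode_residual(target, prediction, step)
reconstruction = decode_frame(prediction, indices, step)
max_error = max(abs(t - r) for t, r in zip(target, reconstruction))
```

Where the prediction is already close, the residual indices are zero and cost little to code; the quantization step trades bitrate against the remaining error, which stays within half a step.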

Translated: 2023-07-11 14:59:57 Published: 2023-07-09

# Can Generative Large Language Models Perform ASR Error Correction? ( http://arxiv.org/abs/2307.04172v1 )

License: see link

Rao Ma, Mengjie Qian, Potsawee Manakul, Mark Gales, Kate Knill

ASR error correction continues to serve as an important part of post-processing for speech recognition systems. Traditionally, these models are trained in a supervised manner using the decoding results of the underlying ASR system and the reference text. This approach is computationally intensive, and the model needs to be re-trained when the underlying ASR model is switched. Recent years have seen the development of large language models and their ability to perform natural language processing tasks in a zero-shot manner. In this paper, we take ChatGPT as an example and examine its ability to perform ASR error correction in zero-shot and 1-shot settings. We use the ASR N-best list as model input and propose unconstrained error correction and N-best constrained error correction methods. Results on a Conformer-Transducer model and the pre-trained Whisper model show that error correction with the powerful ChatGPT model can substantially improve ASR system performance.
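
A minimal sketch of the N-best constrained variant is given below. The prompt wording, the function names, and the fallback to the top hypothesis are all assumptions made for illustration; the abstract does not give the authors' prompts. The model reply is hard-coded here rather than obtained from a real chat model.

```python
import re

def build_nbest_prompt(hypotheses):
    """Format an ASR N-best list as a numbered prompt for a language model."""
    lines = [f"{i}. {hyp}" for i, hyp in enumerate(hypotheses, start=1)]
    return (
        "The following are candidate transcriptions of one utterance, "
        "ordered by ASR score.\n"
        + "\n".join(lines)
        + "\nReply with the number of the most likely transcription."
    )

def select_constrained(hypotheses, model_reply):
    """Return one of the given hypotheses, chosen by the index in the reply.

    Falls back to the top ASR hypothesis if the reply has no valid index.
    """
    match = re.search(r"\d+", model_reply)
    if match:
        index = int(match.group())
        if 1 <= index <= len(hypotheses):
            return hypotheses[index - 1]
    return hypotheses[0]

nbest = [
    "the whether is nice today",
    "the weather is nice today",
    "the weather is nice to day",
]
prompt = build_nbest_prompt(nbest)
reply = "2"  # stand-in for the text a chat model would return for `prompt`
corrected = select_constrained(nbest, reply)
```

An unconstrained variant would instead ask the model to write the corrected sentence freely and return that text, which allows fixes that appear in none of the hypotheses but also allows the model to drift from what was said.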

Translated: 2023-07-11 14:59:43 Published: 2023-07-09

# Dream Content Discovery from Reddit with an Unsupervised Mixed-Method Approach ( http://arxiv.org/abs/2307.04167v1 )

License: see link

Anubhab Das, Sanja Šćepanović, Luca Maria Aiello, Remington Mallett, Deirdre Barrett, and Daniele Quercia

Dreaming is a fundamental but not fully understood part of human experience that can shed light on our thought patterns. Traditional dream analysis practices, while popular and aided by over 130 unique scales and rating systems, have limitations. Mostly based on retrospective surveys or lab studies, they are hard to apply on a large scale and struggle to show the importance of, and connections between, different dream themes. To overcome these issues, we developed a new, data-driven mixed-method approach for identifying topics in free-form dream reports through natural language processing. We tested this method on 44,213 dream reports from Reddit's r/Dreams subreddit, where we found 217 topics, grouped into 22 larger themes: the most extensive collection of dream topics to date. We validated our topics by comparing them to the widely used Hall and van de Castle scale. Going beyond traditional scales, our method can find unique patterns in different dream types (like nightmares or recurring dreams), capture topic importance and connections, and observe changes in collective dream experiences over time and around major events, like the COVID-19 pandemic and the recent Russo-Ukrainian war. We envision that the applications of our method will provide valuable insights into the intricate nature of dreaming.

翻訳日:2023-07-11 14:59:26 公開日:2023-07-09

# 深部特徴統計モデルによる映像サーベイランスにおける偽アラームの低減

Reducing False Alarms in Video Surveillance by Deep Feature Statistical Modeling ( http://arxiv.org/abs/2307.04159v1 )

ライセンス: Link先を確認

Xavier Bou, Aitor Artola, Thibaud Ehret, Gabriele Facciolo, Jean-Michel Morel, Rafael Grompone von Gioi

(参考訳) 関連する変化を検出することは、ビデオ監視の根本的な問題である。データの可変性が高く、適切に変更をアノテートすることが難しいため、教師なしのメソッドがフィールドを支配している。実用性を実現する上で最も重要な問題のひとつは、誤報率を下げることだろう。本研究では, 深部特徴の高次元統計モデルに基づく手法に依存しない弱教師付きa-コントラリオ検証法を開発し, 変化検出アルゴリズムの誤報数を削減する。また,ほとんどの実アプリケーションの性能要求を正確に把握できないため,従来の画素評価では不十分である。このため、画素単位のメトリクスとオブジェクト単位のメトリクスを補完し、異なるデータセットからの6つのメソッドと複数のシーケンスに対して、画素レベルとオブジェクトレベルの両方でのアプローチの影響を評価する。実験結果から,提案するa-contrarioバリデーションにより,画素レベルとオブジェクトレベルでの誤報数を大幅に削減できることがわかった。

Detecting relevant changes is a fundamental problem of video surveillance. Because of the high variability of data and the difficulty of properly annotating changes, unsupervised methods dominate the field. Arguably one of the most critical issues to make them practical is to reduce their false alarm rate. In this work, we develop a method-agnostic weakly supervised a-contrario validation process, based on high dimensional statistical modeling of deep features, to reduce the number of false alarms of any change detection algorithm. We also raise the insufficiency of the conventionally used pixel-wise evaluation, as it fails to precisely capture the performance needs of most real applications. For this reason, we complement pixel-wise metrics with object-wise metrics and evaluate the impact of our approach at both pixel and object levels, on six methods and several sequences from different datasets. Experimental results reveal that the proposed a-contrario validation is able to largely reduce the number of false alarms at both pixel and object levels.

翻訳日:2023-07-11 14:58:58 公開日:2023-07-09

# 感度情報を用いた多項式カオス展開と深層生成ネットワークによる、地質学的に複雑な事前分布を持つ効率的なベイズ走時トモグラフィ

Efficient Bayesian travel-time tomography with geologically-complex priors using sensitivity-informed polynomial chaos expansion and deep generative networks ( http://arxiv.org/abs/2307.04228v1 )

ライセンス: Link先を確認

Giovanni Angelo Meles, Macarena Amaya, Shiran Levy, Stefano Marelli, Niklas Linde

(参考訳) モンテカルロ・マルコフ連鎖(MCMC)法は、事前分布の正確な特徴付けと尤度の効率的な評価という2つの基本的な課題に直面する。トモグラフィに関するベイズ研究の文脈では、主成分分析(PCA)は場合によっては事前分布の直接的な定義を容易にすると同時に、計算負荷の高い全物理フォワードソルバを置き換える、多項式カオス展開(PCE)に基づく正確なサロゲートモデルの実装を可能にする。PCAでは事前分布を容易に定義できないシナリオでは、深層生成モデル(例えば変分オートエンコーダ(VAE))のような代替手法が実行可能な選択肢となる。しかしながら、VAEの潜在パラメータとフォワードモデリングの出力との間の複雑な非線形関係を捉えられるサロゲートを正確に構築することは、注目すべき課題である。実際、PCEモデルは、入出力関係が比較的低次の多変量多項式によって効果的に近似できる場合に高い精度を提供するが、深層生成モデルから得られる潜在変数を利用する場合、この条件は通常満たされない。本研究では、事前分布の表現におけるVAEの優れた再構成性能と、PCA-PCEサロゲートモデリングの精度とを、ベイズ地中レーダ(GPR)走時トモグラフィの文脈で組み合わせる手法を提案する。MCMCプロセス内では、VAEのパラメトリゼーションが事前分布の探索とサンプルの提案に利用される。同時に、検討対象のVAEサンプルについてグローバルまたはローカルに定義された主成分に作用するPCEを用いて、モデリングを行う。

Monte Carlo Markov Chain (MCMC) methods commonly confront two fundamental challenges: the accurate characterization of the prior distribution and the efficient evaluation of the likelihood. In the context of Bayesian studies on tomography, principal component analysis (PCA) can in some cases facilitate the straightforward definition of the prior distribution, while simultaneously enabling the implementation of accurate surrogate models based on polynomial chaos expansion (PCE) to replace computationally intensive full-physics forward solvers. When faced with scenarios where PCA does not offer a direct means of easily defining the prior distribution, alternative methods like deep generative models (e.g., variational autoencoders (VAEs)) can be employed as viable options. However, accurately producing a surrogate capable of capturing the intricate non-linear relationship between the latent parameters of a VAE and the outputs of forward modeling presents a notable challenge. Indeed, while PCE models provide high accuracy when the input-output relationship can be effectively approximated by relatively low-degree multivariate polynomials, this condition is typically unmet when utilizing latent variables derived from deep generative models. In this contribution, we present a strategy that combines the excellent reconstruction performance of VAEs in terms of prior representation with the accuracy of PCA-PCE surrogate modeling in the context of Bayesian ground penetrating radar (GPR) travel-time tomography. Within the MCMC process, the parametrization of the VAE is leveraged for prior exploration and sample proposal. Concurrently, modeling is conducted using PCE, which operates on either globally or locally defined principal components of the VAE samples under examination.

翻訳日:2023-07-11 14:50:58 公開日:2023-07-09

# 再サンプリングを伴うデノイジング拡散暗黙モデルに基づく地震データ補間

Seismic Data Interpolation based on Denoising Diffusion Implicit Models with Resampling ( http://arxiv.org/abs/2307.04226v1 )

ライセンス: Link先を確認

Xiaoli Wei, Chunxia Zhang, Hongtao Wang, Chengli Tan, Deng Xiong, Baisong Jiang, Jiangshe Zhang, Sang-Woon Kim

(参考訳) 空間方向のトレース欠落に起因する地震データの不完全性は、障害物や経済的制約の存在により地震探査データ取得において一般的な問題であり、地下地質構造のイメージング品質を著しく損なう。近年、深層学習に基づく地震データ補間法は有望な進歩を遂げているが、敵対的生成ネットワークの安定した訓練は容易ではなく、テスト時と訓練時の欠落パターンが一致しない場合には性能劣化が顕著になることが多い。本稿では、再サンプリングを伴う新しい地震データ用デノイジング拡散暗黙モデルを提案する。モデルの訓練はデノイジング拡散確率モデルに基づいており、U-Netには各ステップのノイズに適合させるためのマルチヘッド自己注意が備わっている。グローバルなノイズ構成として用いるコサインノイズスケジュールは、過大なノイズ段階の通過を加速することにより、既知のトレース情報の高い利用を促進する。モデルの推論では、既知のトレースで条件付けしたデノイジング拡散暗黙モデルを利用し、より少ない拡散ステップで高品質な補間を可能にする。各逆ステップにおける既知のトレースと欠落トレースとの一貫性を高めるために、推論プロセスには再サンプリング戦略を組み込み、先に補間したトレースの情報を再確認する。合成およびフィールドの地震データを用いた広範な実験により、本モデルの優位性と、さまざまな欠落パターンに対するロバスト性を検証した。さらに、不確かさの定量化とアブレーション研究も行っている。

The incompleteness of the seismic data caused by missing traces along the spatial extension is a common issue in seismic acquisition due to the existence of obstacles and economic constraints, which severely impairs the imaging quality of subsurface geological structures. Recently, deep learning-based seismic interpolation methods have attained promising progress, while achieving stable training of generative adversarial networks is not easy, and performance degradation is usually notable if the missing patterns in the testing and training do not match. In this paper, we propose a novel seismic denoising diffusion implicit model with resampling. The model training is established on the denoising diffusion probabilistic model, where U-Net is equipped with the multi-head self-attention to match the noise in each step. The cosine noise schedule, serving as the global noise configuration, promotes the high utilization of known trace information by accelerating the passage of the excessive noise stages. The model inference utilizes the denoising diffusion implicit model, conditioning on the known traces, to enable high-quality interpolation with fewer diffusion steps. To enhance the coherency between the known traces and the missing traces within each reverse step, the inference process integrates a resampling strategy to achieve an information recap on the former interpolated traces. Extensive experiments conducted on synthetic and field seismic data validate the superiority of our model and its robustness on various missing patterns. In addition, uncertainty quantification and ablation studies are also investigated.
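The abstract above names a cosine noise schedule but does not specify it further. As a rough illustration only, the sketch below assumes the widely used cosine schedule of Nichol and Dhariwal (2021) for denoising diffusion probabilistic models. The offset `s = 0.008`, the clipping value `0.999`, and the function name `cosine_beta_schedule` are assumptions, not details taken from the paper.

```python
import math


def cosine_beta_schedule(num_steps, s=0.008, max_beta=0.999):
    """Return (betas, alpha_bars) for an assumed cosine noise schedule.

    alpha_bar(t) = f(t) / f(0), with
    f(t) = cos^2(((t / T + s) / (1 + s)) * pi / 2) and T = num_steps.
    """
    def f(t):
        return math.cos(((t / num_steps + s) / (1.0 + s)) * math.pi / 2.0) ** 2

    # Cumulative signal level after t steps, for t = 0 .. num_steps.
    alpha_bars = [f(t) / f(0) for t in range(num_steps + 1)]

    # Per-step noise variance, clipped to avoid a singular final step.
    betas = [
        min(1.0 - alpha_bars[t] / alpha_bars[t - 1], max_beta)
        for t in range(1, num_steps + 1)
    ]
    return betas, alpha_bars


betas, alpha_bars = cosine_beta_schedule(1000)
print(f"first beta: {betas[0]:.6f}, last beta: {betas[-1]:.3f}")
```

Under this assumed schedule the signal level falls slowly at first and quickly near the end, so comparatively few steps are spent in the heavily noised regime.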

翻訳日:2023-07-11 14:50:26 公開日:2023-07-09

# 赤外線・熱画像融合による火災シナリオのリアルタイム人体検出

Real-time Human Detection in Fire Scenarios using Infrared and Thermal Imaging Fusion ( http://arxiv.org/abs/2307.04223v1 )

ライセンス: Link先を確認

Truong-Dong Do, Nghe-Nhan Truong and My-Ha Le

(参考訳) 火災は人命に対する最も深刻な脅威の1つと考えられており、死者が出る確率が高い。こうした深刻な結果は、避難する被災者や救助隊の視界を大きく制限する、火災による濃い煙に起因する。このような危険な状況下では、視覚に基づく人検出システムを用いることで、より多くの命を救える可能性が高まる。そこで本論文では、煙による低視認性シナリオでの人検出のために、複数のカメラに基づく熱・赤外画像融合方式を提案する。複数のカメラで処理することで重要な情報を収集し、人検出により有用な特徴を生成できる。まず、カメラはLight Heating Chessboardを用いて較正される。その後、入力画像から抽出した特徴を統合してから軽量なディープニューラルネットワークに通し、人検出タスクを実行する。NVIDIA Jetson Nanoコンピュータで行った実験により、提案手法は妥当な速度で処理でき、mAP@0.5で95%という良好な性能を達成できることが示された。

Fire is considered one of the most serious threats to human lives which results in a high probability of fatalities. Those severe consequences stem from the heavy smoke emitted from a fire that mostly restricts the visibility of escaping victims and rescuing squad. In such hazardous circumstances, the use of a vision-based human detection system is able to improve the ability to save more lives. To this end, a thermal and infrared imaging fusion strategy based on multiple cameras for human detection in low-visibility scenarios caused by smoke is proposed in this paper. By processing with multiple cameras, vital information can be gathered to generate more useful features for human detection. Firstly, the cameras are calibrated using a Light Heating Chessboard. Afterward, the features extracted from the input images are merged prior to being passed through a lightweight deep neural network to perform the human detection task. The experiments conducted on an NVIDIA Jetson Nano computer demonstrated that the proposed method can process with reasonable speed and can achieve favorable performance with a mAP@0.5 of 95%.

翻訳日:2023-07-11 14:50:00 公開日:2023-07-09

# LakeBench: データレイク上のデータディスカバリのベンチマーク

LakeBench: Benchmarks for Data Discovery over Data Lakes ( http://arxiv.org/abs/2307.04217v1 )

ライセンス: Link先を確認

Kavitha Srinivas, Julian Dolby, Ibrahim Abdelaziz, Oktie Hassanzadeh, Harsha Kokel, Aamod Khatiwada, Tejaswini Pedapati, Subhajit Chaudhury, Horst Samulowitz

(参考訳) 企業では、データ発見を中心に、データレイクをインテリジェントにナビゲートする必要性が高まっている。企業にとって特に重要なのは、データリポジトリ内で関連するテーブルを見つける能力である。これらのテーブルは、ユニオン可能、結合可能、あるいは互いの部分集合でありうる。パブリックドメインにはこれらのタスクのベンチマークが不足しており、関連研究はプライベートデータセットを対象としている。LakeBenchでは、CKAN、Socrata、欧州中央銀行の政府データなど、多様なデータソースから抽出したテーブルを用いて、これらのタスクのための複数のベンチマークを開発する。これらのタスクにおいて、公開されている4つの表形式基盤モデルの性能を比較する。既存のモデルはいずれも、このベンチマークのために開発したデータ発見タスクで訓練されておらず、当然ながら、その性能には大きな改善の余地がある。この結果は、このようなベンチマークの確立が、データレイクにおけるデータ発見に利用可能な表形式モデルを構築するうえでコミュニティに有用でありうることを示唆している。

Within enterprises, there is a growing need to intelligently navigate data lakes, specifically focusing on data discovery. Of particular importance to enterprises is the ability to find related tables in data repositories. These tables can be unionable, joinable, or subsets of each other. There is a dearth of benchmarks for these tasks in the public domain, with related work targeting private datasets. In LakeBench, we develop multiple benchmarks for these tasks by using the tables that are drawn from a diverse set of data sources such as government data from CKAN, Socrata, and the European Central Bank. We compare the performance of 4 publicly available tabular foundational models on these tasks. None of the existing models had been trained on the data discovery tasks that we developed for this benchmark; not surprisingly, their performance shows significant room for improvement. The results suggest that the establishment of such benchmarks may be useful to the community to build tabular models usable for data discovery in data lakes.

翻訳日:2023-07-11 14:49:46 公開日:2023-07-09

# 階層型オートエンコーダを用いた大規模高解像度科学データの非可逆圧縮

Hierarchical Autoencoder-based Lossy Compression for Large-scale High-resolution Scientific Data ( http://arxiv.org/abs/2307.04216v1 )

ライセンス: Link先を確認

Hieu Le, Hernan Santos, Jian Tao

(参考訳) 非可逆圧縮は、多くの領域でデータサイズを削減する重要な技術となっている。この種の圧縮は、サイズが数ペタバイトに及ぶ大規模な科学データに特に有用である。オートエンコーダベースのモデルは画像や動画の圧縮に活用され成功を収めているが、こうしたニューラルネットワークは科学データの領域では広く注目されていない。本研究では、大規模な科学データを大幅に圧縮するだけでなく、高い再構成品質も維持するニューラルネットワークを提案する。提案モデルは、公開されている科学ベンチマークデータでテストし、大規模な高解像度気候モデリングデータセットに適用した。本モデルは、複数のベンチマークデータセットにおいて、再構成品質を損なうことなく圧縮率140を達成する。高解像度Community Earth System Model(CESM)バージョン1.3による500年分のシミュレーションデータも圧縮率200で圧縮しており、その再構成誤差は科学的解析において無視できる程度である。

Lossy compression has become an important technique to reduce data size in many domains. This type of compression is especially valuable for large-scale scientific data, whose size ranges up to several petabytes. Although Autoencoder-based models have been successfully leveraged to compress images and videos, such neural networks have not widely gained attention in the scientific data domain. Our work presents a neural network that not only significantly compresses large-scale scientific data but also maintains high reconstruction quality. The proposed model is tested with scientific benchmark data available publicly and applied to a large-scale high-resolution climate modeling data set. Our model achieves a compression ratio of 140 on several benchmark data sets without compromising the reconstruction quality. Simulation data from the High-Resolution Community Earth System Model (CESM) Version 1.3 over 500 years are also being compressed with a compression ratio of 200 while the reconstruction error is negligible for scientific analysis.

翻訳日:2023-07-11 14:49:32 公開日:2023-07-09

# 360$^\circ$データを用いた一般化アクションベース・ボールリカバリモデル

Generalized Action-based Ball Recovery Model using 360$^\circ$ data ( http://arxiv.org/abs/2307.04215v1 )

ライセンス: Link先を確認

Ricardo Furbino Marques do Nascimento and Hugo M. R. Rios-Neto

(参考訳) ポゼッションが多いことが必ずしも勝利につながるわけではないが、マンチェスター・シティ、リバプール、リーズ・ユナイテッドといったチームは、ここ数年、ボールを失った直後に素早く奪い返そうとしてきた。現在、世界のトップ監督の何人かはハイプレスのスタイルを採用しており、通常はGuardiolaの考案とされる5秒ルールのような概念が広まり[9][10]、近年の多くのチームのプレーの基本的な部分となっている。「息をつかせるな」や「できるだけ早くボールを取り戻せ」といった表現はメディアでよく聞かれる[4][5][6]が、ポゼッションの交代に最もつながるアクションは何か? チームのポジショニングはボールリカバリにどのような影響を与えるのか? プレッシャーを受けたときに崩れやすい選手は誰か? 上記のチームほど激しくボール保持者にプレスをかけるとは限らないチームの守備のダイナミクスを評価できるか? 本稿では、Statsbomb 360$^\circ$データを用いてGABR(Generalized Action based Ball Recovery)モデルを作成することで、これらの疑問やその他の疑問に答えることを試みる。

Even though having more possession does not necessarily lead to winning, teams like Manchester City, Liverpool, and Leeds United notably have tried to recover the ball quickly after they lost it over the past few years. Nowadays, some of the top managers in the world apply high-pressing styles, and concepts such as the five-second rule, usually credited to Guardiola, have been spreading out [9][10], becoming a fundamental part of how lots of teams have played over the recent years. Expressions like "don't let them breathe" and "get the ball back as soon as possible" are often heard in the media [4][5][6], but what are the actions that most lead to a change in possession? What is the influence of a team's positioning on the ball recovery? Which are the players that more often collapse when under pressure? Can we evaluate the defensive dynamics of teams that do not necessarily press the player in possession as intensely as those mentioned above? We try to answer those and other questions in this paper by creating a Generalized Action based Ball Recovery model (GABR) using Statsbomb 360$^\circ$ data.

翻訳日:2023-07-11 14:49:19 公開日:2023-07-09

# 強化学習における安定性のエッジ現象の検討

Investigating the Edge of Stability Phenomenon in Reinforcement Learning ( http://arxiv.org/abs/2307.04210v1 )

ライセンス: Link先を確認

Rares Iordan, Marc Peter Deisenroth, Mihaela Rosca

(参考訳) 近年、教師あり学習における安定性のエッジ現象が明らかになったことで、モーメンタム付きフルバッチ勾配降下法で訓練されたニューラルネットワークの最適化ダイナミクスの理解が進んでいる。安定性のエッジ現象は、ヘッセ行列の最大固有値が、二次損失に対する最適化アルゴリズムの発散しきい値に達すると発生し、その後、固有値はしきい値の周りで振動し始め、損失は局所的には不安定になるものの、長い時間スケールでは減少する。本研究では、強化学習(RL)、特にオフポリシーのQ学習アルゴリズムにおける安定性のエッジ現象を、オフラインRLからオンラインRLまでのさまざまなデータレジームにわたって調べる。実験の結果、データ分布の非定常性やブートストラップの利用など、教師あり学習とは大きな違いがあるにもかかわらず、オフポリシーの深層RLにも安定性のエッジ現象が存在しうることがわかった。しかし、教師あり学習とは異なり、用いる損失によって強い違いが観察され、Huber損失を用いるDQNは強い安定性のエッジ効果を示すが、クロスエントロピー損失を用いるC51ではそれが観測されない。この結果は、ニューラルネットワークの構造が問題領域をまたいで共通する最適化ダイナミクスをもたらしうる一方で、深層RLの最適化の特定の側面が、教師あり学習のような領域との違いを生みうることを示唆している。

Recent progress has been made in understanding optimisation dynamics in neural networks trained with full-batch gradient descent with momentum with the uncovering of the edge of stability phenomenon in supervised learning. The edge of stability phenomenon occurs as the leading eigenvalue of the Hessian reaches the divergence threshold of the underlying optimisation algorithm for a quadratic loss, after which it starts oscillating around the threshold, and the loss starts to exhibit local instability but decreases over long time frames. In this work, we explore the edge of stability phenomenon in reinforcement learning (RL), specifically off-policy Q-learning algorithms across a variety of data regimes, from offline to online RL. Our experiments reveal that, despite significant differences to supervised learning, such as non-stationarity of the data distribution and the use of bootstrapping, the edge of stability phenomenon can be present in off-policy deep RL. Unlike supervised learning, however, we observe strong differences depending on the underlying loss, with DQN -- using a Huber loss -- showing a strong edge of stability effect that we do not observe with C51 -- using a cross entropy loss. Our results suggest that, while neural network structure can lead to optimisation dynamics that transfer between problem domains, certain aspects of deep RL optimisation can differentiate it from domains such as supervised learning.

翻訳日:2023-07-11 14:48:58 公開日:2023-07-09

# 企業におけるプライバシ保護型合成データの展開の課題

On the Challenges of Deploying Privacy-Preserving Synthetic Data in the Enterprise ( http://arxiv.org/abs/2307.04208v1 )

ライセンス: Link先を確認

Lauren Arthur, Jason Costello, Jonathan Hardy, Will O'Brien, James Rea, Gareth Rees, Georgi Ganev

(参考訳) 生成AI技術は前例のない人気を得ており、その優れた能力によって興奮と不安の入り混じった反応を引き起こしている。本稿では、生成AIのサブフィールドである合成データの展開に伴う課題を検討する。焦点は企業での展開であり、大量の個人データや機密性の高いデータに起因するプライバシーの懸念に重点を置く。40以上の課題を特定し、それらを i) 生成、ii) インフラストラクチャとアーキテクチャ、iii) ガバナンス、iv) コンプライアンスと規制、v) 導入、の5つの主要なグループに体系化する。さらに、企業が課題に効果的に対処し、実装したソリューションへの信頼を確立することで目標を達成するための、戦略的かつ体系的なアプローチについても論じる。

Generative AI technologies are gaining unprecedented popularity, causing a mix of excitement and apprehension through their remarkable capabilities. In this paper, we study the challenges associated with deploying synthetic data, a subfield of Generative AI. Our focus centers on enterprise deployment, with an emphasis on privacy concerns caused by the vast amount of personal and highly sensitive data. We identify 40+ challenges and systematize them into five main groups -- i) generation, ii) infrastructure & architecture, iii) governance, iv) compliance & regulation, and v) adoption. Additionally, we discuss a strategic and systematic approach that enterprises can employ to effectively address the challenges and achieve their goals by establishing trust in the implemented solutions.

翻訳日:2023-07-11 14:48:35 公開日:2023-07-09

# Forward Forward アルゴリズムの拡張

Extending the Forward Forward Algorithm ( http://arxiv.org/abs/2307.04205v1 )

ライセンス: Link先を確認

Saumya Gandhi, Ritu Gala, Jonah Kornberg, Advaith Sridhar

(参考訳) 2022年11月にGeoffrey Hintonによって提案されたForward Forwardアルゴリズムは、バックプロパゲーションに代わるニューラルネットワークの新しい訓練方法である。本プロジェクトでは、MNISTデータセットにおけるHintonの実験を再現し、続いて2つの重要な貢献によってこの手法の適用範囲を拡張する。第一に、IMDb movie reviewsデータセット上でForward Forwardネットワークのベースライン性能を確立する。私たちが知る限り、この感情分析タスクでの結果は、このアルゴリズムをコンピュータビジョン以外に拡張した最初の例である。第二に、Forward Forward法に特有のハイパーパラメータである損失しきい値に対して、新しいピラミッド型最適化戦略を導入する。このピラミッド型アプローチにより、良好なしきい値戦略がテストエラーに最大8%の差をもたらすことが示された。最後に、訓練されたパラメータを可視化し、Forward Forwardネットワークが獲得した重みの平均と分散が著しく大きい(10〜20倍)ことなど、いくつかの重要な知見を得た。

The Forward Forward algorithm, proposed by Geoffrey Hinton in November 2022, is a novel method for training neural networks as an alternative to backpropagation. In this project, we replicate Hinton's experiments on the MNIST dataset, and subsequently extend the scope of the method with two significant contributions. First, we establish a baseline performance for the Forward Forward network on the IMDb movie reviews dataset. As far as we know, our results on this sentiment analysis task mark the first instance of the algorithm's extension beyond computer vision. Second, we introduce a novel pyramidal optimization strategy for the loss threshold - a hyperparameter specific to the Forward Forward method. Our pyramidal approach shows that a good thresholding strategy causes a difference of up to 8% in test error. Lastly, we perform visualizations of the trained parameters and derive several significant insights, such as a notably larger (10-20x) mean and variance in the weights acquired by the Forward Forward network.
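The loss threshold discussed in the abstract above enters the Forward Forward objective roughly as follows. This is a minimal sketch assuming the formulation in Hinton's original proposal, where a layer's goodness is the sum of its squared activations and the probability that an input is positive is the logistic function of goodness minus a threshold. The names `goodness`, `ff_layer_loss` and `theta` are illustrative and do not come from the paper, and the pyramidal threshold strategy itself is not reproduced here.

```python
import math


def goodness(activations):
    """Goodness of one layer: the sum of its squared activations."""
    return sum(a * a for a in activations)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ff_layer_loss(pos_activations, neg_activations, theta):
    """Per-layer loss for one positive and one negative example.

    Positive data is pushed to goodness above theta, negative data below it.
    Equivalent to -log sigmoid(g_pos - theta) - log sigmoid(theta - g_neg).
    """
    g_pos = goodness(pos_activations)
    g_neg = goodness(neg_activations)
    return softplus(theta - g_pos) + softplus(g_neg - theta)


# A layer that separates the two examples well has a lower loss
# than one that confuses them.
theta = 2.0  # hypothetical threshold value
well_separated = ff_layer_loss([1.5, 1.5], [0.1, 0.2], theta)
poorly_separated = ff_layer_loss([0.1, 0.2], [1.5, 1.5], theta)
print(well_separated < poorly_separated)
```

Because `theta` sets where the boundary between positive and negative goodness lies, changing it shifts how hard each layer is pushed, which is presumably why the threshold choice can affect test error.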

翻訳日:2023-07-11 14:48:24 公開日:2023-07-09

# 軌道アライメント:分岐理論による安定性現象の端の理解

Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory ( http://arxiv.org/abs/2307.04204v1 )

ライセンス: Link先を確認

Minhak Song, Chulhee Yun

(参考訳) Cohen et al. (2021) は、勾配降下(GD)の軌道に沿った損失ヘッセ行列の最大固有値(シャープネスとも呼ばれる)の推移を実証的に調べ、安定性のエッジ(EoS)と呼ばれる現象を観測した。シャープネスは訓練の初期段階で増加し(プログレッシブ・シャープニングと呼ばれる)、最終的には $2 / \text{(step size)}$ のしきい値付近で飽和する。本稿ではまず、EoS現象が起こるとき、(適切な再パラメータ化の後に)異なるGD軌道が初期化とは無関係に特定の分岐図上に整列することを実証的研究によって示す。次に、この軌道アライメント現象を、単一のデータ点で訓練された2層全結合線形ネットワークと単一ニューロンの非線形ネットワークについて厳密に証明する。我々の軌道アライメント解析は、プログレッシブ・シャープニングとEoS現象の両方を立証し、最近の文献の知見を包含し拡張するものである。

Cohen et al. (2021) empirically study the evolution of the largest eigenvalue of the loss Hessian, also known as sharpness, along the gradient descent (GD) trajectory and observe a phenomenon called the Edge of Stability (EoS). The sharpness increases at the early phase of training (referred to as progressive sharpening), and eventually saturates close to the threshold of $2 / \text{(step size)}$. In this paper, we start by demonstrating through empirical studies that when the EoS phenomenon occurs, different GD trajectories (after a proper reparameterization) align on a specific bifurcation diagram independent of initialization. We then rigorously prove this trajectory alignment phenomenon for a two-layer fully-connected linear network and a single-neuron nonlinear network trained with a single data point. Our trajectory alignment analysis establishes both progressive sharpening and EoS phenomena, encompassing and extending recent findings in the literature.
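The threshold $2 / \text{(step size)}$ in the abstract above comes from the stability of gradient descent on a quadratic. As a small illustration, not taken from the paper, the sketch below runs gradient descent on the one-dimensional loss $\frac{1}{2} a x^2$, whose sharpness is $a$. Each step multiplies $x$ by $1 - \eta a$, so the iterates shrink when $a < 2/\eta$ and blow up when $a > 2/\eta$. The function name and numeric values are hypothetical.

```python
def gd_on_quadratic(sharpness, step_size, num_steps=100, x0=1.0):
    """Run gradient descent on L(x) = 0.5 * sharpness * x**2; return final |x|."""
    x = x0
    for _ in range(num_steps):
        grad = sharpness * x       # dL/dx
        x = x - step_size * grad   # x <- (1 - step_size * sharpness) * x
    return abs(x)


step_size = 0.1
threshold = 2.0 / step_size  # sharpness above which gradient descent diverges

below = gd_on_quadratic(0.9 * threshold, step_size)
above = gd_on_quadratic(1.1 * threshold, step_size)
print(f"sharpness below threshold: |x| = {below:.3e}")
print(f"sharpness above threshold: |x| = {above:.3e}")
```

In a neural network the sharpness changes during training, which is what allows it to rise towards this threshold and then hover near it rather than simply diverge.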

翻訳日:2023-07-11 14:48:09 公開日:2023-07-09

# イノベーションのエコシステムを育む - ステークホルダ,ツール,人々の相乗効果

Thriving Innovation Ecosystems: Synergy Among Stakeholders, Tools, and People ( http://arxiv.org/abs/2307.04263v1 )

ライセンス: Link先を確認

Shruti Misra, Denise Wilson

(参考訳) イノベーションエコシステムは、様々な利害関係者が交流して複雑な社会技術的課題を解決する、マルチステークホルダー環境である。我々は、ステークホルダーがデジタルツール、人的資源、それらの組み合わせを使って情報を集め、イノベーションエコシステムで意思決定する方法について検討した。利害関係者のモチベーション,情報ニーズ,実践を包括的に理解するため,インタラクティブなデジタルダッシュボードを用いて5つの利害関係者グループ(N=13)を対象に,三部インタビュー調査を行った。利害関係者は主に、彼らの貢献の潜在的な社会的影響によって、イノベーションエコシステムに参加する動機があることに気付きました。また、ステークホルダーはデジタルツールを使って「ハイレベル」な情報を探し出し、初期意思決定の努力を足場としたが、最終的な決定は人間のネットワークが提供するコンテキスト情報に依存していた。したがって、デジタルツールではなく、人々はこれらのエコシステムにおける重要な情報源であるように見える。我々は,技術がステークホルダーの意思決定努力をいかに強化し,堅牢で公平なイノベーションエコシステムを実現するかを検討した。

An innovation ecosystem is a multi-stakeholder environment, where different stakeholders interact to solve complex socio-technical challenges. We explored how stakeholders use digital tools, human resources, and their combination to gather information and make decisions in innovation ecosystems. To comprehensively understand stakeholders' motivations, information needs and practices, we conducted a three-part interview study across five stakeholder groups (N=13) using an interactive digital dashboard. We found that stakeholders were primarily motivated to participate in innovation ecosystems by the potential social impact of their contributions. We also found that stakeholders used digital tools to seek "high-level" information to scaffold initial decision-making efforts but ultimately relied on contextual information provided by human networks to enact final decisions. Therefore, people, not digital tools, appear to be the key source of information in these ecosystems. Guided by our findings, we explored how technology might nevertheless enhance stakeholders' decision-making efforts and enable robust and equitable innovation ecosystems.

翻訳日:2023-07-11 14:41:31 公開日:2023-07-09

# ビームスプリッタアレイ上の量子ランダムウォーク

Quantum random walks on a beam splitter array ( http://arxiv.org/abs/2307.04262v1 )

ライセンス: Link先を確認

Mario Ivan Estrada Delgado and Zurika Iveth Blanco Garcia

(参考訳) ビームスプリッタアレイの一般的な行列表現を示す。各ビームスプリッタは透過/反射係数を持ち、これが個々の装置の振る舞いを決定し、その結果、システム全体の応答を決定する。各ビームスプリッタの一般的な行列表現は、$2n$ 次元空間の回転として与えられる。これらの演算子を用いて、アレイ全体を記述する行列を計算でき、その結果、入力光子状態の最終的な確率分布を計算できる。

The general matrix representation of a beam splitter array is presented. Each beam splitter has a transmission/reflection coefficient that determines the behavior of these individual devices and, in consequence, the whole system response. The general matrix representation of each beam splitter is given as rotations of a $2n$-dimensional space. With these operators, the matrix that describes the entire array and, consequently, the final probability distribution of an input photon state can be calculated.
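To make the rotation picture in the abstract above concrete, here is a minimal sketch of a single photon passing through lossless beam splitters acting on two modes. It assumes a real rotation convention, with transmission amplitude $\cos\theta$ and reflection amplitude $\sin\theta$. The paper's own parametrization and its array construction are not given in the abstract, so all names below are illustrative.

```python
import math


def beam_splitter(theta):
    """2x2 rotation matrix: cos(theta) transmits, sin(theta) reflects."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def apply(matrix, amplitudes):
    """Multiply a 2x2 matrix by a two-mode amplitude vector."""
    return [
        matrix[0][0] * amplitudes[0] + matrix[0][1] * amplitudes[1],
        matrix[1][0] * amplitudes[0] + matrix[1][1] * amplitudes[1],
    ]


def output_probabilities(thetas, amplitudes):
    """Send a single-photon state through beam splitters in sequence."""
    for theta in thetas:
        amplitudes = apply(beam_splitter(theta), amplitudes)
    # Amplitudes are real in this convention, so probability = amplitude squared.
    return [a * a for a in amplitudes]


# Photon enters in mode 0 and meets one, then two, balanced beam splitters.
one_splitter = output_probabilities([math.pi / 4], [1.0, 0.0])
two_splitters = output_probabilities([math.pi / 4, math.pi / 4], [1.0, 0.0])
print(one_splitter)
print(two_splitters)
```

Successive rotations add their angles, so two balanced splitters in a row act as a quarter-turn and move the photon entirely into the other mode. A larger array would be described by the product of such rotations embedded in a higher-dimensional space.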

翻訳日:2023-07-11 14:41:11 公開日:2023-07-09

# 量子確率過程からの古典性

Classicality from Quantum Stochastic Processes ( http://arxiv.org/abs/2307.04258v1 )

ライセンス: Link先を確認

Esteban Martínez-Vargas

(参考訳) 我々は量子系から古典性の理論を構築する。この理論は、古典的および量子的な定常確率過程の研究に由来する。確率過程は、多面体錐(古典)と半正定値で表現される錐(量子)によって特徴付けられる。先行結果 [2209.06806v1] に基づき、量子チャネルの不動点の研究を拡張する。量子チャネルを、コアと、多数回の反復で減衰する部分とに分離して特徴付ける半正定値計画問題を与える。一般に、解はそれが定義されている空間において非分離である。分離可能な場合について、不動点の観点からチャネルの特徴付けを示す。これにより、多面体錐の量子シミュレーションを構築できる。

We develop a theory of classicality from quantum systems. This theory stems from the study of classical and quantum stationary stochastic processes. The stochastic processes are characterized by polyhedral (classical) and semidefinite representative (quantum) cones. Based on a previous result [2209.06806v1], we expand the study of fixed points of quantum channels. We give a semidefinite program that characterizes a quantum channel separating into a core and a part that decays with many iterations. In general, the solution is non-separable in the space in which it is defined. We present a characterization of channels in terms of their fixed points for the separable case. A quantum simulation of a polyhedral cone can then be constructed.

翻訳日:2023-07-11 14:41:04 公開日:2023-07-09

# 古典領域と量子領域における学習と制御の枠組み

Framework for Learning and Control in the Classical and Quantum Domains ( http://arxiv.org/abs/2307.04256v1 )

ライセンス: Link先を確認

Seyed Shakib Vedaie, Archismita Dalal, Eduardo J. Páez, Barry C. Sanders

(参考訳) 制御と学習は古典領域と量子領域の両方において技術進歩の鍵であるが、両者の相互関係、とりわけ制御と学習の古典的定義と量子的定義の間の関係は、文献において十分に明確になっていない。我々は、古典と量子の両方について学習と制御を互いに形式的に関連付ける枠組みを構築し、この形式化によって学習が制御にどのように役立つかを示す。さらに、本枠組みは、古典的および量子的な制御と学習の接点にある興味深い未解決問題の特定に役立ち、問題を解くためのツールの選択を助ける。ユースケースとして、よく研究されている適応型の量子強化干渉位相推定の問題を、実現可能な制御方策を考案するための教師あり学習問題として定式化する。これらの分野の統一は、知識の状態を図式的に表現することに基づいており、これにより既存の知識が簡潔に要約され、知識のギャップが明らかになる。

Control and learning are key to technological advancement, both in the classical and quantum domains, yet their interrelationship is insufficiently clear in the literature, especially between classical and quantum definitions of control and learning. We construct a framework that formally relates learning and control, both classical and quantum, to each other, with this formalism showing how learning can aid control. Furthermore, our framework helps to identify interesting unsolved problems in the nexus of classical and quantum control and learning and helps in choosing tools to solve problems. As a use case, we cast the well-studied problem of adaptive quantum-enhanced interferometric-phase estimation as a supervised learning problem for devising feasible control policies. Our unification of these fields relies on diagrammatically representing the state of knowledge, which elegantly summarizes existing knowledge and exposes knowledge gaps.

翻訳日:2023-07-11 14:40:56 公開日:2023-07-09

# 量子的機構としての相対論的な時間の遅れ

Relativistic time dilation as a quantum mechanism ( http://arxiv.org/abs/2307.04254v1 )

ライセンス: Link先を確認

Esteban Martínez-Vargas

(参考訳) 量子系を用いた時間の遅れのメカニズムを提案する。異なる参照系から見た量子状態の変化に敏感な演算子の族を導入する。参照系間の変換はガリレイ変換によって行われるため、我々の場合、時間の遅れの源はオブザーバブルに由来する。これらのオブザーバブルは時間とともに線形に成長し、状態の参照系に応じて線形成長の傾きが変化するため、同じ点まで成長するのにより長い時間がかかる。このようなメカニズムは、時空に対する通常の理解とは異なる見方を示唆する。

We propose a mechanism for time dilation using quantum systems. We introduce a family of operators that are sensitive to the changes of quantum states from different frames of reference. The change between reference frames is done via a Galilean transformation; therefore, the source of the dilation in our case comes from the observable. These observables grow linearly in time and, depending on the reference frame of the state, the linear growth changes its slope, so it takes longer to grow to the same point. Such a mechanism implies a different view from the usual understanding of spacetime.

翻訳日:2023-07-11 14:40:39 公開日:2023-07-09

# 生成AIと大規模言語モデルの時代におけるChatGPT: 簡潔な調査

ChatGPT in the Age of Generative AI and Large Language Models: A Concise Survey ( http://arxiv.org/abs/2307.04251v1 )

ライセンス: Link先を確認

Salman Mohamadi, Ghulam Mujtaba, Ngan Le, Gianfranco Doretto, Donald A. Adjeroh

(参考訳) ChatGPTはOpenAIが開発した大規模言語モデル(LLM)で、大量のデータに対して注意深く訓練されている。自然言語処理(NLP)の分野に革命をもたらし、LLMの能力の限界を押し広げた。ChatGPTは、一般の人々が生成的人工知能(GAI)と大規模に対話できるようにするうえで重要な役割を果たした。また、同様の技術の開発や、その応用と影響の調査に対する研究上の関心も呼び起こした。本稿の主な目的は、ChatGPTとその進化に関する現在の研究の流れについて簡潔な調査を提供することである。ChatGPTのグラスボックスとブラックボックスの両方の見方を検討し、技術の構成要素と基礎的要素、ならびにその応用、影響、含意を取り上げた。グラスボックスのアプローチは技術の内部動作の理解に焦点を当て、ブラックボックスのアプローチは技術を複雑なシステムとして捉え、その入力、出力、効果を調べる。これは、この技術を包括的に探求する道を開き、さらなる研究と実験のためのロードマップを提供する。また、LLMとGAI全般に関する重要な基礎文献と、それらとChatGPTとの関係についても整理した。この概観は、LLMという新興分野における既存の研究の流れと欠けている研究の流れに光を当て、一般ユーザと開発者の双方に利益をもたらす。さらに本稿は、教育、研究、医療、金融などの分野における幅広い応用と重要な懸念事項についても掘り下げる。

ChatGPT is a large language model (LLM) created by OpenAI that has been carefully trained on a large amount of data. It has revolutionized the field of natural language processing (NLP) and has pushed the boundaries of LLM capabilities. ChatGPT has played a pivotal role in enabling widespread public interaction with generative artificial intelligence (GAI) on a large scale. It has also sparked research interest in developing similar technologies and investigating their applications and implications. In this paper, our primary goal is to provide a concise survey on the current lines of research on ChatGPT and its evolution. We considered both the glass box and black box views of ChatGPT, encompassing the components and foundational elements of the technology, as well as its applications, impacts, and implications. The glass box approach focuses on understanding the inner workings of the technology, and the black box approach embraces it as a complex system, and thus examines its inputs, outputs, and effects. This paves the way for a comprehensive exploration of the technology and provides a road map for further research and experimentation. We also lay out essential foundational literature on LLMs and GAI in general and their connection with ChatGPT. This overview sheds light on existing and missing research lines in the emerging field of LLMs, benefiting both public users and developers. Furthermore, the paper delves into the broad spectrum of applications and significant concerns in fields such as education, research, healthcare, finance, etc.

翻訳日:2023-07-11 14:40:30 公開日:2023-07-09

# 室内シーンの凸分解

Convex Decomposition of Indoor Scenes ( http://arxiv.org/abs/2307.04246v1 )

ライセンス: Link先を確認

Vaibhav Vavilala and David Forsyth

(参考訳) 本稿では、複雑で雑然とした室内シーンを、シーン構造の簡潔な抽象化を与えるプリミティブに分解する方法について述べる。プリミティブは単純な凸体である。提案手法は、学習された回帰手続きを用いてRGBD入力からシーンを一定数の凸体に分解し、オプションでセグメンテーションを受け取って分解を改善できる。その結果は降下法によって洗練され、凸体を調整して非常に良い適合を得るとともに、余分なプリミティブを貪欲に取り除く。シーン全体が分解されるため、従来の深度、法線、セグメンテーションの誤差指標を用いて評価できる。評価の結果、我々のプリミティブ表現による誤差は、単一画像から深度を予測する場合の誤差に匹敵することが示された。

We describe a method to parse a complex, cluttered indoor scene into primitives which offer a parsimonious abstraction of scene structure. Our primitives are simple convexes. Our method uses a learned regression procedure to parse a scene into a fixed number of convexes from RGBD input, and can optionally accept segmentations to improve the decomposition. The result is then polished with a descent method which adjusts the convexes to produce a very good fit, and greedily removes superfluous primitives. Because the entire scene is parsed, we can evaluate using traditional depth, normal, and segmentation error metrics. Our evaluation procedure demonstrates that the error from our primitive representation is comparable to that of predicting depth from a single image.

翻訳日:2023-07-11 14:40:05 公開日:2023-07-09

# 自然言語処理を用いた後処理による光文字認識のための新しいパイプライン

A Novel Pipeline for Improving Optical Character Recognition through Post-processing Using Natural Language Processing ( http://arxiv.org/abs/2307.04245v1 )

License: see the linked page

Aishik Rakshit, Samyak Mehta, Anirban Dasgupta

Optical Character Recognition (OCR) technology finds applications in digitizing books and unstructured documents, along with applications in other domains such as mobility statistics, law enforcement, traffic, and security systems. State-of-the-art methods work well on printed text such as license plates and shop names. However, applications such as printed textbooks and handwritten texts have limited accuracy with existing techniques. This may be attributed to similar-looking characters and variations in handwritten characters. Since these issues are challenging to address with OCR technologies exclusively, we propose a post-processing approach using Natural Language Processing (NLP) tools. This work presents an end-to-end pipeline that first performs OCR on the handwritten or printed text and then improves its accuracy using NLP.
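The two-stage idea in this abstract (run OCR first, then repair the output with language-level knowledge) can be illustrated with a deliberately simple stand-in. The sketch below is not the authors' pipeline: it assumes the OCR output is already tokenized, and it uses a small hypothetical vocabulary plus Python's standard `difflib` string similarity in place of a real NLP model. The function name `correct_tokens` and the `cutoff` value are illustrative choices.

```python
import difflib


def correct_tokens(ocr_tokens, vocabulary, cutoff=0.8):
    """Replace each OCR token with its closest vocabulary word, if any.

    Tokens already in the vocabulary are kept. Tokens with no candidate
    whose similarity ratio reaches `cutoff` are left unchanged.
    """
    corrected = []
    for token in ocr_tokens:
        if token in vocabulary:
            corrected.append(token)
            continue
        # get_close_matches returns candidates sorted by similarity, best first.
        matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)
    return corrected


# Hypothetical vocabulary and OCR output with look-alike character errors.
vocabulary = ["optical", "character", "recognition", "pipeline"]
print(correct_tokens(["optical", "charactcr", "recogn1tion"], vocabulary))
```

A dictionary lookup of this kind only handles isolated misspellings; context-dependent errors would need a language model, which is presumably where the NLP tools mentioned in the abstract come in.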

Translated: 2023-07-11 14:39:51 Published: 2023-07-09

# Multi-spin probes for thermometry in the strong-coupling regime

http://arxiv.org/abs/2307.04232v1

License: see the linked page

Marlon Brenes and Dvira Segal

We study the sensitivity of thermometric probes that are composed of $N$ spins coupled to a sample prepared at temperature $T$. Our analysis extends beyond the weak-coupling limit into the strong sample-probe coupling regime. In particular, sample-induced interactions between each of the spins are generated via strong coupling effects and are not fine-tuned amongst each body composing the probe. By employing the reaction-coordinate mapping to evaluate the non-canonical equilibrium state of the probe at finite coupling, we compute the thermometric sensitivity via the quantum Fisher information through the equilibrium state itself. We find that for single-spin probes $(N = 1)$, temperature sensitivity decreases in the regime of weak-to-intermediate coupling strength; however, as the coupling increases we observe much higher sensitivity of the probe in the low-temperature regime. Furthermore, as long as $N > 1$, there exist optimal values of the sample-probe interaction energy that allow one to attain enhanced thermometric sensitivity when compared to the maximum precision obtained from thermal Gibbs states at weak coupling, particularly in the regime of low temperature. Finally, we show that this enhanced sensitivity may be observed from suboptimal measurements.
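For orientation, the weak-coupling baseline that the abstract compares against can be written down in closed form for a single spin. The sketch below is not taken from the paper: it assumes a two-level probe with energy gap `gap`, units with $k_B = 1$, and a canonical Gibbs state, for which the quantum Fisher information for temperature is the standard result $\mathrm{Var}(H)/T^4$. The function names are illustrative, and the strong-coupling (reaction-coordinate) results of the paper are not reproduced here.

```python
import math


def qfi_two_level_gibbs(gap: float, temperature: float) -> float:
    """Quantum Fisher information for T of a two-level Gibbs state (k_B = 1).

    For a thermal state the QFI equals Var(H) / T**4. With levels at
    +gap/2 and -gap/2 the energy variance is (gap/2)**2 / cosh(gap/(2T))**2.
    """
    x = gap / (2.0 * temperature)
    energy_variance = (gap / 2.0) ** 2 / math.cosh(x) ** 2
    return energy_variance / temperature ** 4


def relative_error_bound(gap: float, temperature: float, shots: int) -> float:
    """Cramer-Rao lower bound on delta_T / T after `shots` independent runs."""
    qfi = qfi_two_level_gibbs(gap, temperature)
    return 1.0 / (temperature * math.sqrt(shots * qfi))


# Example: gap = 1, T = 1, 1000 repetitions (all values are illustrative).
print(qfi_two_level_gibbs(1.0, 1.0))
print(relative_error_bound(1.0, 1.0, 1000))
```

Because the energy variance is exponentially suppressed once $T$ falls well below the gap, this baseline loses sensitivity at low temperature, which is the regime where the abstract reports gains from strong coupling.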

Translated: 2023-07-11 14:39:37 Published: 2023-07-09

# Mx2M: Masked Cross-Modality Modeling in Domain Adaptation for 3D Semantic Segmentation

http://arxiv.org/abs/2307.04231v1

License: see the linked page

Boxiang Zhang, Zunran Wang, Yonggen Ling, Yuanyuan Guan, Shenghao Zhang, Wenhui Li

Existing methods of cross-modal domain adaptation for 3D semantic segmentation predict results only via 2D-3D complementarity that is obtained by cross-modal feature matching. However, because supervision is lacking in the target domain, the complementarity is not always reliable, and the results are not ideal when the domain gap is large. To address the lack of supervision, we introduce masked modeling into this task and propose a method, Mx2M, which utilizes masked cross-modality modeling to reduce the large domain gap. Mx2M contains two components. One is the core solution, cross-modal removal and prediction (xMRP), which lets Mx2M adapt to various scenarios and provides cross-modal self-supervision. The other is a new way of cross-modal feature matching, the dynamic cross-modal filter (DxMF), which ensures that the whole method dynamically uses more suitable 2D-3D complementarity. Evaluation of Mx2M on three DA scenarios, including Day/Night, USA/Singapore, and A2D2/SemanticKITTI, brings large improvements over previous methods on many metrics.

Translated: 2023-07-11 14:39:19 Published: 2023-07-09

# Amplification of light pulses with orbital angular momentum (OAM) in nitrogen ions lasing

http://arxiv.org/abs/2307.04282v1

License: see the linked page

Haicheng Mei, Jingsong Gao, Kailu Wang, Jiahao Dong, Qihuang Gong, Chengyin Wu, Yunquan Liu, Hongbing Jiang, and Yi Liu

Nitrogen ions pumped by intense femtosecond laser pulses give rise to optical amplification in the ultraviolet range. Here, we demonstrate that a seed light pulse carrying orbital angular momentum (OAM) can be significantly amplified in nitrogen plasma excited by a Gaussian femtosecond laser pulse. With topological charges of +1 and -1, we observed an energy amplification of the seed light pulse by two orders of magnitude, while the amplified pulse carries the same OAM as the incident seed pulse. Moreover, we show that a spatial misalignment of the plasma amplifier with the OAM seed beam leads to an amplified emission of Gaussian mode without OAM, due to the special spatial profile of the OAM seed pulse, which presents a donut-shaped intensity distribution. Utilizing this misalignment, we can implement an optical switch that toggles the output signal between Gaussian mode and OAM mode. This work not only certifies the phase transfer from the seed light to the amplified signal, but also highlights the important role of spatial overlap of the donut-shaped seed beam with the gain region of the nitrogen plasma for the achievement of OAM beam amplification.

Translated: 2023-07-11 14:29:27 Published: 2023-07-09

# Automated Essay Scoring in Argumentative Writing: DeBERTeachingAssistant

http://arxiv.org/abs/2307.04276v1

License: see the linked page

Yann Hicke, Tonghua Tian, Karan Jha, Choong Hee Kim

Automated essay scoring has been explored as a research and industry problem for over 50 years. It has drawn a lot of attention from the NLP community because of its clear educational value as a research area that can engender the creation of valuable time-saving tools for educators around the world. Yet, these tools are generally focused on detecting good grammar, spelling mistakes, and organization quality but tend to fail at incorporating persuasiveness features in their final assessment. The responsibility to give actionable feedback to the student to improve the strength of their arguments is left solely on the teacher's shoulders. In this work, we present a transformer-based architecture capable of achieving above-human accuracy in annotating argumentative writing discourse elements for their persuasiveness quality. We also expand on planned future work investigating the explainability of our model, so that actionable feedback can be offered to the student and thus potentially enable a partnership between the teacher's advice and the machine's advice.

Translated: 2023-07-11 14:29:06 Published: 2023-07-09

# Assessing the efficacy of large language models in generating accurate teacher responses

http://arxiv.org/abs/2307.04274v1

License: see the linked page

Yann Hicke, Abhishek Masand, Wentao Guo, Tushaar Gangavarapu

Tack et al. (2023) organized the shared task hosted by the 18th Workshop on Innovative Use of NLP for Building Educational Applications on generation of teacher language in educational dialogues. Following the structure of the shared task, in this study, we attempt to assess the generative abilities of large language models in providing informative and helpful insights to students, thereby simulating the role of a knowledgeable teacher. To this end, we present an extensive evaluation of several benchmarking generative models, including GPT-4 (few-shot, in-context learning), fine-tuned GPT-2, and fine-tuned DialoGPT. Additionally, to optimize for pedagogical quality, we fine-tuned the Flan-T5 model using reinforcement learning. Our experimental findings on the Teacher-Student Chatroom Corpus subset indicate the efficacy of GPT-4 over other fine-tuned models, measured using BERTScore and DialogRPT. We hypothesize that several dataset characteristics, including sampling, representativeness, and dialog completeness, pose significant challenges to fine-tuning, thus contributing to the poor generalizability of the fine-tuned models. Finally, we note the need for these generative models to be evaluated with a metric that relies not only on dialog coherence and matched language modeling distribution but also on the model's ability to showcase pedagogical skills.

Translated: 2023-07-11 14:28:49 Published: 2023-07-09

# Phase transitions in sampling and error correction in local Brownian circuits

http://arxiv.org/abs/2307.04267v1

License: see the linked page

Subhayan Sahu, Shao-Kai Jian

We study the emergence of anticoncentration and approximate unitary design behavior in local Brownian circuits. The dynamics of circuit-averaged moments of the probability distribution and entropies of the output state can be represented as imaginary time evolution with an effective local Hamiltonian in the replica space. This facilitates large-scale numerical simulation of the dynamics in $1+1d$ of such circuit-averaged quantities using tensor network tools, as well as identifying the various regimes of the Brownian circuit as distinct thermodynamic phases. In particular, we identify the emergence of anticoncentration as a sharp transition in the collision probability at a $\log N$ timescale, where $N$ is the number of qubits. We also show that a specific classical approximation algorithm has a computational hardness transition at the same timescale. In the presence of noise, we show there is a noise-induced first-order phase transition in the linear cross entropy benchmark when the noise rate is scaled down as $1/N$. At longer times, the Brownian circuits approximate a unitary 2-design in $O(N)$ time. We directly probe the feasibility of quantum error correction by such circuits, and identify a first-order transition at $O(N)$ timescales. The scaling behaviors for all these phase transitions are obtained from the large-scale numerics, and corroborated by analyzing the spectrum of the effective replica Hamiltonian.
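The collision probability used above as the anticoncentration diagnostic is $Z = \sum_x p(x)^2$ for the output distribution $p$; a distribution over $D = 2^N$ bitstrings is anticoncentrated when $D \cdot Z$ stays of order one. The sketch below only illustrates that diagnostic and does not simulate a Brownian circuit: as a stand-in it samples a Haar-random state from normalized complex Gaussian amplitudes, for which the average of $D \cdot Z$ is the standard value $2D/(D+1)$. The function names and the seed are illustrative.

```python
import random


def random_state_probabilities(num_qubits, seed=0):
    """Output probabilities of a Haar-random pure state on `num_qubits` qubits.

    Each amplitude is a complex Gaussian; |amplitude|**2 is re**2 + im**2,
    and dividing by the total norm gives the measurement probabilities.
    """
    rng = random.Random(seed)
    dim = 2 ** num_qubits
    weights = [rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2 for _ in range(dim)]
    total = sum(weights)
    return [w / total for w in weights]


def collision_probability(probs):
    """Probability that two independent samples from `probs` coincide."""
    return sum(p * p for p in probs)


num_qubits = 12
dim = 2 ** num_qubits
spread = random_state_probabilities(num_qubits)
peaked = [1.0] + [0.0] * (dim - 1)  # a computational basis state

print(dim * collision_probability(spread))  # of order one: anticoncentrated
print(dim * collision_probability(peaked))  # equals dim: fully concentrated
```

In the paper's setting the same quantity would be tracked as a function of circuit time, starting from the concentrated value and relaxing toward the random-state value.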

Translated: 2023-07-11 14:28:28 Published: 2023-07-09

PDF registration status (published: 20230709)