Fugu-MT paper translation (abstract): Human-aligned Quantification of Numerical Data

Paper summary: Human-aligned Quantification of Numerical Data

arxiv url: http://arxiv.org/abs/2511.15723v1
Date: Sat, 15 Nov 2025 04:44:18 GMT
Status: Translation complete
Last updated in system: 2025-11-21 17:08:52.288301
Title: Human-aligned Quantification of Numerical Data
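Title (reference translation): Human-aligned Quantification of Numerical Data
Authors: Anton Kolonin
Abstract summary: We assess the applicability of metrics based on information compression and the Silhouette coefficient for quantifying numerical data. The results suggest that the ability to classify numerical data into distinct categories is associated with a Silhouette coefficient above 0.65 and a Dip Test below 0.5.
Reference score (independently computed attention score): 0.152292571922932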
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Quantifying numerical data involves addressing two key challenges: first, determining whether the data can be naturally quantified, and second, identifying the numerical intervals or ranges of values that correspond to specific value classes, referred to as "quantums," which represent statistically meaningful states. If such quantification is feasible, continuous streams of numerical data can be transformed into sequences of "symbols" that reflect the states of the system described by the measured parameter. People often perform this task intuitively, relying on common sense or practical experience, while information theory and computer science offer computable metrics for this purpose. In this study, we assess the applicability of metrics based on information compression and the Silhouette coefficient for quantifying numerical data. We also investigate the extent to which these metrics correlate with one another and with what is commonly referred to as "human intuition." Our findings suggest that the ability to classify numeric data values into distinct categories is associated with a Silhouette coefficient above 0.65 and a Dip Test below 0.5; otherwise, the data can be treated as following a unimodal normal distribution. Furthermore, when quantification is possible, the Silhouette coefficient appears to align more closely with human intuition than the "normalized centroid distance" method derived from information compression perspective.
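Abstract (reference translation): Quantifying numerical data involves two challenges: first, determining whether the data can be naturally quantified, and second, identifying the numerical intervals or ranges of values that correspond to specific value classes, called "quantums," which represent statistically meaningful states. If such quantification is feasible, continuous streams of numerical data can be converted into sequences of "symbols" that reflect the states of the system described by the measured parameter. Information theory and computer science offer computable metrics for this purpose, whereas people often perform this task intuitively, relying on common sense or practical experience. In this study, we assess the applicability of metrics based on information compression and the Silhouette coefficient for quantifying numerical data. We also examine how these metrics relate to one another and to what is commonly called "human intuition." The results indicate that the classification of numeric values is associated with a Silhouette coefficient above 0.65 and a Dip Test below 0.5. Furthermore, when quantification is possible, the Silhouette coefficient appears to align more closely with human intuition than the "normalized centroid distance" method derived from an information compression perspective.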
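To make the Silhouette criterion in the abstract concrete, here is a minimal sketch in pure Python. It is not the paper's procedure: the abstract does not say how candidate classes are formed, so the two-way split by least squared error (`best_two_way_split`) is an assumption made only for illustration, and all function names are hypothetical. The only value taken from the abstract is the 0.65 threshold. The Dip Test is not implemented here.

```python
from statistics import mean

# Threshold reported in the abstract; everything else below is illustrative.
SILHOUETTE_THRESHOLD = 0.65


def best_two_way_split(values):
    """Split 1D values into two contiguous groups (assumed clustering step).

    Tries every cut point in the sorted data and keeps the one with the
    smallest total within-group squared error.
    """
    xs = sorted(values)

    def sse(group):
        m = mean(group)
        return sum((x - m) ** 2 for x in group)

    cut = min(range(1, len(xs)), key=lambda i: sse(xs[:i]) + sse(xs[i:]))
    return [xs[:cut], xs[cut:]]


def silhouette_coefficient(clusters):
    """Mean silhouette over all points, using absolute difference as distance."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for pi, x in enumerate(cluster):
            if len(cluster) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean distance to the other points in the same cluster
            a = sum(abs(x - y) for pj, y in enumerate(cluster) if pj != pi) / (len(cluster) - 1)
            # b: smallest mean distance to the points of any other cluster
            b = min(
                mean(abs(x - y) for y in other)
                for cj, other in enumerate(clusters)
                if cj != ci
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Two well-separated groups versus evenly spaced values.
bimodal = [1.0, 1.1, 0.9, 1.2, 5.0, 5.1, 4.9, 5.2]
uniform = [float(v) for v in range(10)]

for name, data in [("bimodal", bimodal), ("uniform", uniform)]:
    score = silhouette_coefficient(best_two_way_split(data))
    print(name, round(score, 3), "above threshold:", score > SILHOUETTE_THRESHOLD)
```

In this toy data the well-separated sample clears the 0.65 threshold and the evenly spaced sample does not. A fuller check in the spirit of the abstract would also apply a Dip Test for unimodality, which is omitted from this sketch.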

List of related papers