Scientific collaborations are increasingly relying on large volumes of data
for their work and many of them employ tiered systems to replicate the data to
their worldwide user communities. Each user in the community often selects a
different subset of data for their analysis tasks; however, members of a
research group often are working on related research topics that require
similar data objects. Thus, there is a significant amount of data sharing
possible. In this work, we study the access traces of a federated storage cache
known as the Southern California Petabyte Scale Cache. By studying the access
patterns and potential for network traffic reduction by this caching system, we
aim to explore the predictability of the cache uses and the potential for a
more general in-network data caching. Our study shows that this distributed
storage cache is able to reduce the network traffic volume by a factor of 2.35
during a part of the study period. We further show that machine learning models
could predict cache utilization with an accuracy of 0.88. This demonstrates
that such cache usage is predictable, which could be useful for managing
complex networking resources such as in-network caching.
Access Trends of In-network Cache for Scientific Data
Ruize Han
University of California, Berkeley
Berkeley, CA, USA
hrz98@berkeley.edu

Alex Sim, Kesheng Wu
Lawrence Berkeley Nat’l Laboratory
Berkeley, CA, USA
{asim,kwu}@lbl.gov

Inder Monga, Chin Guok
Energy Sciences Network
Berkeley, CA, USA
{imonga,chin}@es.net

Frank Würthwein, Diego Davila
University of California, San Diego
La Jolla, CA, USA
{fkw,didavila}@ucsd.edu

Justas Balcas, Harvey Newman
California Institute of Technology
Pasadena, CA, USA
{jbalcas,newman}@hep.caltech.edu

1 INTRODUCTION
The increasing volume of data from scientific experiments and simulations requires a vast amount of resources to store and distribute to geographically distributed users.
Many collaborations such as the Large Hadron Collider (LHC) utilize tiered systems to replicate the data in a few places, so that users can access nearby storage sites.
However, with the increasing cost of managing storage resources and the limited number of replicas, the large number of user accesses still creates considerable demand on the wide-area network, which increases the cost of data analyses and could cause large-scale network traffic congestion [3, 6].
The wide-area network traffic for retrieving and replicating their data is primarily carried on the Energy Sciences Network (ESnet), one of the key components of the internet backbone especially designed for our nation’s science and research communities.
Because the data lakes have demonstrated their effectiveness in reducing the load on the internet backbone, we are interested in exploring the predictability of their impact and the potential for providing a more general distributed storage caching strategy known as in-network caching [11, 12, 18].
More specifically, our work starts with a study of data access trends with one of the data lakes named Southern California Petabyte Scale Cache (SoCal Repo) [7].
We examine the trends of network traffic volume and establish a machine learning model to predict the future network bandwidth requirement for the regional data cache.
The key contributions of this paper can be summarized as follows: (1) our study finds that the SoCal Repo was able to reduce the traffic by 23% over the study period, and by 57% under normal usage;
(2) this network traffic reduction is stable and predictable by LSTM, with 88.4% accuracy; (3) because of the network traffic reduction, we recommend a general in-network cache to supplement the existing data lakes from HEP to benefit all science user communities.
The XRootD system is the basis for XCache, and supports unique capabilities for data distribution and access, especially for large collaborations such as the Large Hadron Collider (LHC) [1, 5].
SoCal Repo consists of 24 data cache nodes at Caltech, UCSD, and ESnet with approximately 2.5PB of storage capacity, supporting client computing jobs for High-Luminosity Large Hadron Collider (HL-LHC) analysis in Southern California.
In this cache installation, there are 11 nodes at Caltech with storage sizes ranging from 96TB to 388TB, 12 nodes at UCSD with 24TB each, and one node at an ESnet endpoint in Sunnyvale, CA with 44TB of storage.
The two Southern California sites are within 200 km of each other, with a round trip time (RTT) of less than 3 milliseconds (ms) between them, while the ESnet node is about 700 km away from UCSD, with an RTT of about 10 ms.
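As a rough plausibility check on these round trip times, the sketch below computes the propagation-only lower bound for each distance. It assumes a signal speed in optical fiber of about 200,000 km/s (roughly two-thirds of the speed of light in vacuum) and a straight fiber path; the constant and the function name `min_rtt_ms` are illustrative and not part of the SoCal Repo tooling.

```python
# Assumption: light travels about 200 km per millisecond in optical fiber.
FIBER_KM_PER_MS = 200.0


def min_rtt_ms(distance_km: float) -> float:
    """Propagation-only lower bound on the round trip time, in milliseconds."""
    return 2.0 * distance_km / FIBER_KM_PER_MS


for label, km in [("Caltech to UCSD (at most)", 200), ("ESnet node to UCSD", 700)]:
    print(f"{label}: RTT >= {min_rtt_ms(km):.1f} ms")
```

Under these assumptions the bounds are 2 ms and 7 ms, which is compatible with the reported values of under 3 ms and about 10 ms once routing detours and equipment delays are taken into account.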
When a user’s computing job needs a file from SoCal Repo, the system first looks up the location of the file using the "Trivial File Catalogue" (TFC) [8, 9].
Following the established convention for the tiered storage system, the data files are grouped into the namespace for the local cache nodes and the TFC points to a "local redirector" in XRootD where the "local redirector" knows all regional caches.
The XRootD client is configured to get the file from the national XRootD data federation to the local cache node.
Local cache nodes do not connect to another cache node but always connect to the higher tier of the federation.
Thus it is possible to find a copy of any file needed for analysis even though the lookup mechanism in TFC does not always guarantee to recommend a replica in the US.
When new cache nodes are added to the local cache nodes, all cache misses go to the new cache nodes first, so that the distributed cache nodes avoid deleting old data as long as there is new space to fill.
This means that cache nodes that have been around for some time tend to hold data that is of interest to fewer users, and that data eventually gets deleted when space runs out.
This happened around Aug. 26, 2021, when 7 new nodes at Caltech (xrd 3-8, 11) were added to the system, and around Sep. 30, 2021, when 2 new nodes at Caltech (xrd 9-10) were added.
The new data is of more interest and leads to more accesses.
Old data does not get deleted as there is still space on the new nodes.
At some point this will resolve itself, but it may take some time.
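The placement behavior described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the rule "send a miss to the node with the most free space, and evict the oldest entry only when that node is full" is a simplification chosen for illustration, and `CacheNode` and `place_on_miss` are hypothetical names, not XRootD or XCache interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class CacheNode:
    name: str
    capacity: int                               # any consistent unit, e.g. TB
    files: dict = field(default_factory=dict)   # file id -> size, oldest first

    @property
    def free(self) -> int:
        return self.capacity - sum(self.files.values())


def place_on_miss(nodes, file_id, size):
    """Store a newly fetched file and return the name of the chosen node.

    Assumed rule: prefer the node with the most free space, so that newly
    added (empty) nodes absorb misses before any old data is evicted.
    """
    target = max(nodes, key=lambda n: n.free)
    # Evict the oldest entries only if the chosen node lacks space.
    while target.free < size and target.files:
        oldest = next(iter(target.files))       # dicts keep insertion order
        del target.files[oldest]
    target.files[file_id] = size
    return target.name


old_node = CacheNode("old", 100, {"a": 60, "b": 40})
new_node = CacheNode("new", 100)
chosen = place_on_miss([old_node, new_node], "c", 30)
```

In this toy run the miss lands on the new node and the old node keeps both of its files, which mirrors the observation that old data survives while new space remains.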
3 DATA ACCESS TRENDS
Our work is based on monitoring information collected from the SoCal Repo between July 2021 and January 2022.
The collected information includes the following attributes about every data access request: user id, file id, file path, file size, the data transmission start time, the data transmission finished time, the total size of the transmission, whether the data request is a data transfer (cache miss) or data share (cache hit), which cache node the request is sent to, whether the transmission is successful, and so on.
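One way to hold such a record in memory is sketched below. The field names and types are assumptions chosen to mirror the attribute list above; the actual monitoring schema is not given in the text.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessRecord:
    """One data access request (hypothetical field names)."""
    user_id: str
    file_id: str
    file_path: str
    file_size: int            # bytes
    start_time: datetime      # data transmission start time
    end_time: datetime        # data transmission finished time
    transferred_bytes: int    # total size of the transmission
    is_cache_hit: bool        # True: data share (hit); False: data transfer (miss)
    cache_node: str
    success: bool

    @property
    def duration_seconds(self) -> float:
        return (self.end_time - self.start_time).total_seconds()


example = AccessRecord(
    user_id="u1", file_id="f1", file_path="/example/f1", file_size=100,
    start_time=datetime(2021, 7, 1, 0, 0, 0),
    end_time=datetime(2021, 7, 1, 0, 0, 30),
    transferred_bytes=100, is_cache_hit=False, cache_node="node-1", success=True,
)
```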
A cache miss requires a data file to be transferred from a remote site over the wide-area network.
The "Data Transfer Size" in the table is the total volume of data transferred to satisfy the cache misses.
The "Shared Data Size" refers to the total volume served from the cache hits.
The "Net Traffic Reduction" is the percentage of network traffic reduction by the cache system, calculated monthly by (shared data size) / (total access size).
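The monthly calculation can be sketched as follows. The record layout (month, size, hit flag) and the sample values are made up for illustration; only the formula (shared data size) / (total access size) comes from the text.

```python
from collections import defaultdict

# Hypothetical access records: (month, size in bytes, is cache hit)
records = [
    ("2021-07", 400, True),
    ("2021-07", 600, False),
    ("2021-08", 100, True),
    ("2021-08", 900, False),
]


def monthly_reduction(records):
    """Percentage of requested bytes served from the cache, per month."""
    shared = defaultdict(int)   # bytes served by cache hits
    total = defaultdict(int)    # bytes requested overall
    for month, size, is_hit in records:
        total[month] += size
        if is_hit:
            shared[month] += size
    return {month: 100.0 * shared[month] / total[month] for month in total}


reduction = monthly_reduction(records)
```

With the sample records above, July has 400 of 1000 bytes shared and August has 100 of 1000.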
Figure 2: Total data access counts in the regional cache: (a) daily, (b) weekly. The number of accesses is relatively stable during the time period of this study.
Figure 3: Total data access sizes in the regional cache: (a) daily, (b) weekly.
Figure 4: Average data size per access in the regional cache: (a) daily, (b) weekly.
[...] misses), and the distribution among the cache nodes.
Figure 2b shows the weekly total data access counts and distribution among the cache nodes.
The number of total accesses is fairly consistent throughout the study period, fluctuating around 31,000 per day.
Before September 2021, file requests were distributed evenly among the cache nodes.
After new cache nodes were added to the regional cache, many of the cache accesses were spread evenly over the new cache nodes, for the reason described in Section 2.
Figure 3a shows the daily total data access sizes, combining shared data sizes (i.e. cache hits) and transferred data sizes (i.e. cache misses) on each cache node.
Figure 3b shows the weekly total data access sizes among cache nodes.
The total access size increases over the study period, indicating that the requested data size grows while the number of accesses remains about the same each month.
Figure 5: Total sizes of the cache hits in the regional cache
[...] been sent to the new cache nodes, and it is expected by the policy described in Section 2.
Figure 4b shows the weekly average data sizes per access.
The upper parts of both daily and weekly plots show the average data size per access for each node, and the lower parts show the average data size per access of all nodes combined.
Overall, the average data size per access is increasing during the study period, consistent with the increases in the total access size while the data access counts remain about the same each month.
Figures 5a and 5b show the daily and weekly total shared data sizes among the cache nodes, respectively.
The total shared data size drops sharply from mid-Sept. 2021 onward, with only a few occasional spikes.
After new cache nodes have been added to the regional cache, most of the cache hits have been sent to new cache nodes as the new nodes have recent data of more interest.
It shows that the network traffic demand reduction rate drops suddenly in Oct. 2021, when the user access trend changes to streaming many new data files.
The average network traffic demand reduction rate is 1.30 over the whole study period, while the average rate from July 2021 to Sep. 2021, before the user access trend changes, is 2.35.
The average rate drops to 1.11 from Oct. 2021 to Jan. 2022, as streaming accesses have a large negative impact on the statistics of the caching system.
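The relation between these reduction rates and the percentage figures quoted elsewhere in the paper can be sketched as below. This assumes the rate is defined as (total access size) / (transferred size), which the text here does not state explicitly; under that assumption the fraction of traffic avoided is 1 - 1/rate. The function name is illustrative.

```python
def rate_to_fraction(rate: float) -> float:
    """Fraction of requested bytes served from the cache.

    Assumption: rate = (total access size) / (transferred size).
    """
    return 1.0 - 1.0 / rate


for rate in (2.35, 1.30, 1.11):
    avoided = 100 * rate_to_fraction(rate)
    print(f"rate {rate:.2f} -> about {avoided:.0f}% of traffic avoided")
```

Under this assumption, rates of 2.35 and 1.30 correspond to roughly 57% and 23%, which matches the percentages reported for normal usage and for the whole study period.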
Figure 8a shows the daily total data reuse size for all nodes in the regional cache.
Data reuse means the re-access of the same data file without transferring it within the same day (i.e., successive cache hits on the same data without a cache miss on that data during one day).
Figure 9a shows the daily data reuse rates for all nodes in the regional cache.
The data reuse rate is the number of times that the files have been reused in a single day, calculated by (total data reused count) / (number of unique reused files).
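A minimal sketch of this calculation for one day follows. It adopts one possible reading of the definition, namely that every cache hit on a file with no cache miss that day counts as a reuse; the paper's exact counting rule may differ, and the function name and sample data are hypothetical.

```python
from collections import Counter


def daily_reuse_rate(accesses):
    """accesses: iterable of (file_id, is_cache_hit) for a single day."""
    accesses = list(accesses)
    # Files that needed a transfer today are excluded from reuse (assumed reading).
    missed = {file_id for file_id, hit in accesses if not hit}
    reuse = Counter(file_id for file_id, hit in accesses if hit and file_id not in missed)
    if not reuse:
        return 0.0
    # (total data reused count) / (number of unique reused files)
    return sum(reuse.values()) / len(reuse)


day = [("a", True), ("a", True), ("a", True), ("b", True), ("c", False), ("c", True)]
rate = daily_reuse_rate(day)
```

Here files "a" (3 hits) and "b" (1 hit) are reused while "c" had a miss, so the rate is 4 reuses over 2 unique files.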
Figure 9b shows the daily data reuse rate of the 7-day moving average for all nodes in the regional cache.
It measures how well the caching system saves traffic on files that are accessed multiple times.
The daily data reuse rate increases gradually from July 2021 to mid-Nov. 2021, and decreases slightly after that.
UTILIZATION To further understand the trends of cache utilization and explore the potential effectiveness of a more general caching mechanism in addition to the dedicated caching system for the specific user community, we next attempt to build machine learning models to investigate the predictability of common cache utilization trends.
More specifically, we use a version of recurrent neural network (RNN) known as Long Short-Term Memory (LSTM) in this work [10, 16].
4.1 LSTM on the Daily Data
We anticipate this modeling effort to be used in an advanced software-defined networking environment for possible resource allocation of a series of in-network caches.
In this context, one useful time frame for considering possible resource allocation might be a few hours or a day.
With this in mind, this work aggregates the cache utilization statistics into daily records.
To construct this daily time series, we need to generate meaningful daily summaries along with other useful features that might support the prediction task.
The daily summary of cache statistics includes the following features:
(a) access counts,
(b) access sizes,
(c) cache hit counts,
(d) cache hit sizes,
(e) cache miss counts,
(f) cache miss sizes,
(g) data reuse counts, and
(h) data reuse sizes.
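The eight daily features listed above could be aggregated from raw access records roughly as follows. The record layout, the feature key names, and the reuse rule (a hit counts as reuse only if the file had no miss on that day) are assumptions made for this sketch.

```python
from collections import defaultdict

FEATURES = ("access_count", "access_size", "hit_count", "hit_size",
            "miss_count", "miss_size", "reuse_count", "reuse_size")


def daily_summary(records):
    """records: iterable of (day, file_id, size, is_cache_hit)."""
    records = list(records)
    # (day, file) pairs with at least one miss are not counted as reuse.
    missed = {(day, f) for day, f, _, hit in records if not hit}
    out = defaultdict(lambda: dict.fromkeys(FEATURES, 0))
    for day, f, size, hit in records:
        row = out[day]
        row["access_count"] += 1
        row["access_size"] += size
        kind = "hit" if hit else "miss"
        row[f"{kind}_count"] += 1
        row[f"{kind}_size"] += size
        if hit and (day, f) not in missed:
            row["reuse_count"] += 1
            row["reuse_size"] += size
    return dict(out)


rows = daily_summary([
    ("2021-07-01", "a", 10, True),
    ("2021-07-01", "a", 10, True),
    ("2021-07-01", "b", 5, False),
    ("2021-07-01", "b", 5, True),
    ("2021-07-02", "a", 10, False),
])
```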
Figure 10 shows the distribution of these daily summaries.
Since these features have widely varying values, we normalize them before giving them to the LSTM models.
As there are many extreme values in the data, we chose z-score normalization [15] instead of the more commonly used min-max normalization.
Due to the limited number of data points available, we allocate the first 80% of the study period as training data and the last 20% as test data.
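The split and the normalization can be sketched together as below. Min-max scaling is determined entirely by the two most extreme observations, whereas the z-score uses the mean and standard deviation of all observations. Whether the paper computes these statistics on the training part only is not stated; the sketch assumes so, and all names and sample values are illustrative.

```python
from statistics import mean, pstdev


def chronological_split(series, train_fraction=0.8):
    """First part of the period for training, the remainder for testing."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def fit_z_score(values):
    """Return (mean, population standard deviation) of the given values."""
    return mean(values), pstdev(values)


def apply_z_score(values, mu, sigma):
    return [(v - mu) / sigma for v in values]


# Hypothetical daily sizes with one extreme day.
daily_sizes = [10, 12, 11, 9, 13, 500, 12, 10, 11, 14]
train, test = chronological_split(daily_sizes)
mu, sigma = fit_z_score(train)          # assumption: statistics from training data only
train_z = apply_z_score(train, mu, sigma)
test_z = apply_z_score(test, mu, sigma)
```

Keeping `mu` and `sigma` around also allows predictions to be mapped back to the original scale later.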
We prepared two different models, one with the above-mentioned eight features and a second one with one additional feature, day-of-the-week.
Because most workplaces follow the workweek schedule, we anticipate seeing a weekly trend and the day-of-the-week feature might improve the prediction accuracy.
The first 8 are the normalized features of the N-th day: data access count, data access size, cache hit count, cache hit size, cache miss count, cache miss size, data reuse count, and data reuse size.
If the N-th day is a Sunday, it is represented as being none of Monday through Saturday.
The output of the LSTM model is a vector of size 8, the predicted normalized features of the (N+1)-th day: data access count, data access size, cache hit count, cache hit size, cache miss count, cache miss size, data reuse count, and data reuse size.
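The day-of-the-week encoding described above can be sketched as six binary indicators for Monday through Saturday, with Sunday encoded as all zeros. The use of exactly six indicators is inferred from the Sunday rule in the text, and the function name is hypothetical.

```python
import datetime


def day_of_week_indicators(day: datetime.date):
    """Six 0/1 flags for Monday..Saturday; Sunday is encoded as all zeros."""
    flags = [0] * 6
    weekday = day.weekday()     # Monday == 0 ... Sunday == 6
    if weekday < 6:
        flags[weekday] = 1
    return flags


sunday = day_of_week_indicators(datetime.date(2021, 8, 1))   # a Sunday
monday = day_of_week_indicators(datetime.date(2021, 8, 2))   # a Monday
```

Appending these flags to the eight normalized features would give the input vector of the model that uses day-of-the-week information.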
Figure 12 shows how the daily LSTM model fits the daily access data.
The model performs well when there are no extreme values, but as shown in Figures 12b, 12d, 12f, and 12h, the model does not fit or predict extreme values well.
The gray shaded area is the predicted variance, defined as 2 standard deviations of the predicted values.
If the actual value is within the predicted variance of the predicted value, we consider the prediction accurate.
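A sketch of this accuracy measure is given below. It assumes the standard deviation is taken over the whole predicted series, giving a single band width; the text does not specify how the deviation is computed, so this detail and the function name are assumptions.

```python
from statistics import pstdev


def interval_accuracy(actual, predicted, k=2.0):
    """Share of days whose actual value lies within k standard deviations
    (taken over the predicted series) of the predicted value."""
    band = k * pstdev(predicted)
    within = sum(abs(a - p) <= band for a, p in zip(actual, predicted))
    return within / len(actual)


# Hypothetical values: the third day is an extreme value the model misses.
actual = [10, 12, 30, 11]
predicted = [10, 11, 12, 11]
accuracy = interval_accuracy(actual, predicted)
```

In this toy case three of the four days fall inside the band, so the accuracy is 0.75.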
The overall accuracy is 0.884, and the accuracies for daily count data are all over 0.9.
Table 4 shows the RMSE of the daily LSTM model for each daily feature, along with the accuracy of the prediction.
Note that the RMSE shown in this table is measured on the scale of the original values, not the normalized values.
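To report the RMSE on the original scale, predictions made in z-score space have to be mapped back first. The sketch below shows this with made-up numbers; `mu` and `sigma` stand for the normalization statistics of one feature and are hypothetical values.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def denormalize(z_values, mu, sigma):
    """Invert z-score normalization."""
    return [z * sigma + mu for z in z_values]


mu, sigma = 100.0, 20.0            # hypothetical statistics of one feature
actual_z = [0.0, 1.0]
predicted_z = [0.5, 1.0]

rmse_normalized = rmse(actual_z, predicted_z)
rmse_original = rmse(denormalize(actual_z, mu, sigma),
                     denormalize(predicted_z, mu, sigma))
```

Because the inverse transform is linear, the original-scale RMSE is the normalized RMSE multiplied by `sigma`, so features with large spread, such as the size features, show large RMSE values on the original scale.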
The difference between the training RMSE and the test RMSE on the size features is due to the model’s inability to fit extreme values.
When the day-of-the-week feature is added to the model for training, the model performance is improved on the daily counts, while the performance improvement in predicting daily sizes is minimal.
The extreme values in the daily sizes make it hard to fit the daily sizes well; thus, adding day-of-the-week information can only improve the performance on the daily counts.
This suggests that there might be a weekly seasonality in the daily data.
Figure 10: Distribution of daily features. Panels: (a) access counts, (b) access sizes, (c) cache hit counts, (d) cache hit sizes, (e) cache miss counts, (f) cache miss sizes, (g) data reuse counts, (h) data reuse sizes.
Table 6: Explored hyper-parameters for MA LSTM model
Figure 11: (a) 2-layer LSTM, (b) 1-layer LSTM
Table 5: Hyper-parameters of the MA LSTM model
# of epochs, activation function, # of LSTM unit, dropout rate
values: 128, tanh, 0.00, 100

4.2 LSTM on the Daily Data with 7-Day Moving Average (MA LSTM Model)
In the previous study, we speculated that LSTM models perform poorly on the size feature because of the extreme values.
To verify this claim, we have smoothed the daily summaries with a 7-day moving average.
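A 7-day moving average can be computed as in the sketch below. It uses a trailing window (each value averages the current day and the six days before it); whether the paper uses a trailing or a centered window is not stated, so this choice is an assumption.

```python
def moving_average(values, window=7):
    """Trailing moving average; the first window-1 days produce no value."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]   # drop the day that left the window
        if i >= window - 1:
            out.append(running / window)
    return out


smoothed = moving_average([1, 2, 3, 4, 5, 6, 7, 8])
```

For the eight sample days the result holds two values, the means of days 1 to 7 and of days 2 to 8. A single extreme day is spread over seven smoothed values, which is why the smoothed series has far fewer spikes.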
The input and output of the MA LSTM model are vectors of size 8, the normalized features of the N-th day and the (N+1)-th day respectively: data access count, data access size, cache hit count, cache hit size, cache miss count, cache miss size, data reuse count, and data reuse size.
The loss function is the root mean squared error (RMSE).
All values in the output vectors are given equal weights in calculating the loss.
The same 3360 combinations of hyper-parameters shown in Table 2 are explored for the MA LSTM model.
The model selection process is the same as the selection process for the daily LSTM model.
The model with the lowest test RMSE is the 1-layer LSTM model shown in Figure 11b; its hyper-parameters are shown in Table 5.
The hyper-parameters and constructions of the daily LSTM model and the MA LSTM model are very similar as they only differ in the dropout rate and the number of training epochs.
This is due to the high similarity between the daily data and the daily data with 7-day moving average, and the limited number of available data points.
Figure 13 shows how the MA LSTM model fits the 7-day moving average on daily data.
The model still deviates considerably on the extreme values in Figure 13f, but it works well in general.
The gray shaded area indicates the predicted variance, which is much smaller than that of the daily LSTM model.
Table 6 shows the RMSE of the MA LSTM model, along with the prediction accuracy.
Overall accuracy is 0.873.
Although the accuracy is about 0.01 lower than that of the daily LSTM model, the predicted variance of the MA LSTM model is much smaller, so the prediction of the MA LSTM model is closer to the actual value.
This shows that the LSTM model fits the daily data with 7-day moving average better than the daily data, which confirms that the extreme values severely affect the LSTM performance.
4.3 Seasonality
Day-of-the-week information improves the performance of the daily LSTM model, which suggests some weekly seasonality in the daily time series data.
We investigate the seasonality using periodograms [17].
Figure 14 shows the periodogram of daily data.
All columns show a relatively strong, if not the strongest, seasonal effect with a 7-day period, confirming that there is a weekly seasonal effect.
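How a periodogram exposes a weekly pattern can be sketched with a plain discrete Fourier transform. The series below is synthetic (twelve repetitions of a made-up weekday/weekend pattern), not the SoCal Repo data, and the function is a textbook periodogram written for illustration rather than the tool used in the paper.

```python
import cmath
import math


def periodogram(series):
    """Return (period in days, power) pairs from a DFT of the mean-removed series."""
    n = len(series)
    mu = sum(series) / n
    centered = [x - mu for x in series]
    result = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        result.append((n / k, abs(coeff) ** 2 / n))
    return result


# Synthetic daily counts: busy weekdays, quiet weekend, repeated for 12 weeks.
week = [30, 31, 32, 31, 30, 12, 10]
series = week * 12
peak_period, peak_power = max(periodogram(series), key=lambda pair: pair[1])
```

For this synthetic series the strongest peak sits at a period of 7 days, with weaker peaks at its harmonics. On real data, peaks at other periods can be stronger than the weekly one.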
Figure 12: Daily LSTM model train and test results vs. true values
Figure 13: MA LSTM model train and test results vs. true values. Panels: (a) access counts, (b) access sizes, (c) cache hit counts, (d) cache hit sizes, (e) cache miss counts, (f) cache miss sizes, (g) data reuse counts, (h) data reuse sizes. Each panel plots the true value, the training set prediction, and the test set prediction from 2021-07 to 2022-02.
Figure 14: Periodogram of daily data. Panels: (a) access counts, (b) access sizes, (c) cache hit counts, (d) cache hit sizes, (e) cache miss counts, (f) cache miss sizes, (g) data reuse counts, (h) data reuse sizes. All eight features show the same peaks at 31 days and 62 days.
physicists in California.
Our analysis shows that the SoCal Repo was able to reduce the network traffic by 57% for a large portion of the period of the study.
However, some periods of the study show streaming access patterns, which use the caching system inefficiently and affect the performance of the backbone network.
Through this study, we developed a number of machine learning models to further explore the predictability of the cache utilization statistics.
Because the regional storage cache could predictably reduce the network utilization, we anticipate that a more general caching mechanism could benefit many more scientific communities beyond the specific physics community studied.
The study also reveals a number of unexpected characteristics worth further investigation.
For example, the cache hit rates decrease significantly during the most recent months of the study, and a larger dataset is needed to train the LSTM models.
ACKNOWLEDGMENTS
This work was supported by the Office of Advanced Scientific Computing Research, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, and also used resources of the National Energy Research Scientific Computing Center (NERSC).
This work was also supported by the National Science Foundation through the grants OAC-2030508, OAC-1836650, MPS-1148698, PHY-1120138 and OAC-1541349.
REFERENCES
[1] L Bauerdick, D Benjamin, K Bloom, B Bockelman, D Bradley, S Dasu, M Ernst, R Gardner, A Hanushevsky, H Ito, D Lesny, P McGuigan, S McKee, O Rind, H Severini, I Sfiligoi, M Tadel, I Vukotic, S Williams, F Würthwein, A Yagil, and W Yang. 2012. Using Xrootd to Federate Regional Storage. Journal of Physics: Conference Series 396, 4 (2012), 042009.
[2] L. Bauerdick, K. Bloom, B. Bockelman, D. Bradley, S. Dasu, J. Dost, I. Sfiligoi, A. Tadel, M. Tadel, F. Wuerthwein, A. Yafil, and the CMS collaboration. 2014. XRootd, disk-based, caching proxy for optimization of data access, data placement and data replication.
[...] In 2013 IEEE 33rd International Conference on Distributed Computing Systems. 62–72. https://doi.org/10.1109/ICDCS.2013.71
[13] Ruth Pordes, Don Petravick, Bill Kramer, Doug Olson, Miron Livny, Alain Roy, Paul Avery, Kent Blackburn, Torre Wenaus, Frank Würthwein, Ian Foster, Rob Gardner, Mike Wilde, Alan Blatecky, John McGee, and Rob Quick. 2007. The open science grid. Journal of Physics: Conference Series 78, 1 (2007), 012057.
[14] Rizzi, Andrea, Petrucciani, Giovanni, and Peruzzi, Marco. 2019. A further reduction in CMS event data for analysis: the NANOAOD format. EPJ Web Conf. 214 (2019), 06021. https://doi.org/10.1051/epjconf/201921406021
[5] A. Dorigo, P. Elmer, F. Furano, and A. Hanushevsky. 2005. XROOTD - A highly scalable architecture for data access. WSEAS Transactions on Computers 4, 4 (2005), 348–353.
[6] X. Espinal, S. Jezequel, M. Schulz, A. Sciabà, I. Vukotic, and F. Wuerthwein. 2020. The Quest to solve the HL-LHC data access puzzle. EPJ Web of Conferences 245 (2020), 04027. https://doi.org/10.1051/epjconf/202024504027
[7] E. Fajardo, A. Tadel, M. Tadel, B. Steer, T. Martin, and F. Würthwein. 2018. A federated Xrootd cache. Journal of Physics: Conference Series 1085 (2018), 032025.
[8] Edgar Fajardo, Derek Weitzel, Mats Rynge, Marian Zvada, John Hicks, Mat Selmeci, Brian Lin, Pascal Paschos, Brian Bockelman, Andrew Hanushevsky, Frank Würthwein, and Igor Sfiligoi.