Jien-De Sui and Tian-Sheuan Chang
Institute of Electronics, National Chiao Tung University, Hsinchu 300, Taiwan

Abstract— This paper presents convolutional neural network based foot motion tracking with only six-axis Inertial Measurement Unit (IMU) sensor data. The presented approach can adapt to various walking conditions by adopting differential and window based input. The training data are further augmented by sliding and random window samplings on IMU sensor data to increase data diversity for better performance. The proposed approach fuses predictions of the three-dimensional output into one model. The proposed fused model achieves an average error of 2.30±2.23 cm in the X-axis, 0.91±0.95 cm in the Y-axis and 0.58±0.52 cm in the Z-axis.

I. INTRODUCTION
Optical based tracking methods can provide accurate tracking results but suffer from high product cost, complex setup and limited test environments [5-8]. On the other hand, IMU (inertial measurement unit) based sensory systems allow the measured subject to be neither constrained in motion nor restricted to any specific environment. Existing IMU based designs assume zero velocity updates and use double integration on IMU data to recreate the trajectory [9-10], which requires per-sensor or personalized calibration to get correct results.
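For context, this conventional pipeline can be sketched as follows; this is a minimal illustration rather than the exact method of [9-10], with hypothetical variable names, and it assumes gravity compensation and stance-phase detection are given.

import numpy as np

def double_integrate(acc, dt, stance):
    # acc:    (T, 3) gravity-compensated acceleration in m/s^2 (assumed given)
    # dt:     sampling interval in seconds
    # stance: (T,) boolean mask, True while the foot is assumed stationary
    vel = np.zeros_like(acc)
    pos = np.zeros_like(acc)
    for t in range(1, len(acc)):
        vel[t] = 0.0 if stance[t] else vel[t - 1] + acc[t] * dt  # zero velocity update
        pos[t] = pos[t - 1] + vel[t] * dt  # second integration yields position
    return pos

Any uncorrected sensor bias in acc accumulates quadratically in pos, which is why such designs depend on careful calibration.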
This paper presents the first trial of foot motion tracking with a convolutional neural network (CNN).
CNN is a popular deep learning approach in recent years due to its excellent performance for speech and computer vision (e.g., image classification, action or gesture recognition).
Several works have applied CNN on IMU data for activity recognition [12-14].
All these works are for classification.
On the other hand, the proposed work addresses regression, predicting foot trajectory with a CNN.
In this paper, the foot trajectory per step is learned in an end-to-end way without explicit calibrations [9] or assumptions of zero velocity update [10].
This model also uses differential input and output to avoid absolute coordinate references, window based input to adapt to the varying length of each step, and two data augmentations to increase data diversity for better robustness.
The rest of the paper is organized as follows.
Section II describes the presented approach.
Section III shows the results and Section IV concludes this paper.
II. METHODOLOGY

A. Sensor

Fig. 1 The sensor system architecture (a) and its PCB (b). (c) The position of the sensor on the foot.
Fig. 1 shows the sensor system architecture, which consists of a 9-axis IMU from InvenSense (MPU-9250), 1 Gb flash memory for storage, and a Bluetooth chip for wireless data transmission.
The IMU consists of tri-axis accelerometer, tri-axis gyroscope and tri-axis magnetometer.
This system is integrated in a PCB of form factor 30
J. D. Sui and T. S. Chang, "Deep Gait Tracking With Inertial Measurement Unit," IEEE Sensors Letters, vol. 3, no. 11, pp. 1-4, Nov. 2019, Art no. 7002404, doi: 10.1109/LSENS.2019.2947625.
This prototype consumes 9.3 mA of current at the highest sampling rate of 1000 Hz, and can ensure all-day-long continuous data recording.
To fit the fixed size input requirement of the CNN model, we use a window based input that consists of 150 sample points per window as shown in Fig. 2, which helps adapt to the varying length of each step.
To further increase training data amounts, we implement two data augmentation methods for the proposed CNN model: sliding window and random window sampling.
The first method, sliding window, splits one step of sensor data into 150-sample-point windows with 10-sample-point overlap between windows, as shown in Fig. 2. Similar operations are also applied to the ground truth, but with 30-sample-point windows and 2-sample-point overlap, since the sampling rate of the Vicon system is roughly five times lower than that of the IMU. With this method, the available data amount becomes 110,856 sample points (744 seconds), which is 2.8 times the original amount.
Fig. 2 Data augmentation with the window based method.
The second data augmentation, random window, randomly chooses five 150-sample-point windows from each step instead of the fixed and overlapped windows of the sliding window method. With this method, the data amount becomes 344,935 sample points (2315 seconds), which is 8.6 times the original amount.
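A sketch of the random window sampling under the same assumptions (the helper name and the 5:1 index alignment are illustrative):

import numpy as np

rng = np.random.default_rng(0)

def random_windows(imu_step, gt_step, n=5, imu_size=150, gt_size=30):
    ratio = len(imu_step) // len(gt_step)   # ~5 for the IMU vs. Vicon rates
    starts = rng.integers(0, len(imu_step) - imu_size + 1, size=n)
    starts -= starts % ratio                # align starts to ground truth samples
    imu_w = np.stack([imu_step[s:s + imu_size] for s in starts])
    gt_w = np.stack([gt_step[s // ratio:s // ratio + gt_size] for s in starts])
    return imu_w, gt_w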
The proposed motion tracking is then applied for each step.
Beyond segmentation, directly applying raw data to the regression model for motion trajectory faces the difficulty of tracking the absolute coordinates of the ground truth data, which is not reasonable since similar trajectories could occur with different baselines. Thus, both input and ground truth are converted to differential values between consecutive samples. Such a differential value approach makes the model more robust.
Thus, the input sensor data become 6 × 149 × 1 (sensors × time segment size × channel), where 6 corresponds to the 6-axis sensor data (tri-axis accelerometer and tri-axis gyroscope) and 149 to the data length after preprocessing. The ground truth data become 3 × 29, where 3 corresponds to the three-dimensional space axes.
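A sketch of this differential preprocessing for one aligned training pair (random placeholder data):

import numpy as np

imu_win = np.random.randn(150, 6)  # one 150-sample window of 6-axis IMU data
gt_win = np.random.randn(30, 3)    # matching 30-sample ground truth window

# First-order differences remove any absolute coordinate reference.
x = np.diff(imu_win, axis=0).T[:, :, np.newaxis]  # (6, 149, 1)
y = np.diff(gt_win, axis=0).T                     # (3, 29)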
E. Network Architecture
Fig. 3 shows the two proposed network architectures based on convolutional layers and fully connected layers.
The first one uses a single model to predict (X, Y, Z) positions at the same time, denoted as the fused model.
The second one uses three separate models to predict (X, Y, Z) positions respectively, denoted as the independent model.
The fused model has shared network layers for (X, Y, Z) predictions, while the independent model can optimize the network tailored for each axis.
All these models use the same input sensor data structure and the same network architecture, which includes nine convolutional layers and two fully connected layers.
For the convolutional layers we set channel number N1 = 64, N2 = 64, N3 = 128, N4 = 128, N5 = 256, N6 = 256, N7 = 512, N8 = 512 and N9 = 1024, with 3×3 filter size and Rectified Linear Unit (ReLU) as the activation function.
In addition, batch normalization and max pooling operations are applied successively after the activation function.
For max pooling, the filter size is set to 1 × 2 (sensors × time, downsampling only in the time axis) so that we obtain a wider receptive field along the time axis rather than the sensor axis, avoiding information loss.
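A minimal Keras sketch of the fused model is given below. The fully connected width and the pooling placement are assumptions: the text specifies 1 × 2 pooling, but nine successive poolings would exhaust the 149-sample time axis, so this sketch pools after every second convolutional layer.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_fused_model():
    channels = [64, 64, 128, 128, 256, 256, 512, 512, 1024]  # N1..N9
    inp = layers.Input(shape=(6, 149, 1))  # sensors x time x channel
    x = inp
    for i, c in enumerate(channels):
        x = layers.Conv2D(c, (3, 3), padding='same', activation='relu')(x)
        x = layers.BatchNormalization()(x)
        if i % 2 == 1:                                    # assumed pooling placement
            x = layers.MaxPooling2D(pool_size=(1, 2))(x)  # time axis only
    x = layers.Flatten()(x)
    x = layers.Dense(512, activation='relu')(x)  # first FC layer (width assumed)
    x = layers.Dense(3 * 29)(x)                  # second FC layer
    out = layers.Reshape((3, 29))(x)             # differential (X, Y, Z) outputs
    return models.Model(inp, out)

The independent model would instead instantiate three such networks, each ending in a 29-unit head for a single axis.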
For the fused model, we combine the RMSE loss values from each axis with weighting factors to balance the different scales of the three axis values.
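A sketch of such a weighted per-axis RMSE loss (the weight values are illustrative, not the paper's):

import tensorflow as tf

def fused_rmse_loss(weights=(1.0, 1.0, 1.0)):  # per-axis weighting factors
    def loss(y_true, y_pred):                  # shapes: (batch, 3, 29)
        total = 0.0
        for axis, w in enumerate(weights):
            mse = tf.reduce_mean(tf.square(y_true[:, axis, :] - y_pred[:, axis, :]))
            total += w * tf.sqrt(mse)          # weighted RMSE per axis
        return total
    return loss

# e.g., model.compile(optimizer='adam', loss=fused_rmse_loss())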
The 3-D fusion model can be regarded as multi-task learning.
With such a multi-task network, we not only gain accuracy from the regularization effect of multi-task learning, which reduces the probability of overfitting [16], but also reduce model complexity to one model instead of three.
III. RESULTS

Table 2 also shows the effectiveness of the two data augmentation methods. When compared to the original design without window based input, window based data augmentation improves the errors significantly. Among the two, sliding window has lower accuracy than random window, which could be due to similar window partitioning between training and testing data in this case. The differential input approach is necessary for model success, as shown in Table 2. The errors with raw data are much higher than those with differential data.

In summary, the proposed approach with the 3D fusion network and combined data augmentations can achieve an average error of 2.30 cm in the X-axis, 0.91 cm in the Y-axis and 0.58 cm in the Z-axis. In this scenario, 10% of the steps from the training set are used as the validation set. The lowest error of the independent walker test is 7.78±5.31 cm, 1.49±1.14 cm and 1.03±0.53 cm for X, Y and Z respectively, which is also achieved by the 3D fusion model with 9 convolution layers.

REFERENCES

[1] R. N. Kirkwood, B. de Souza Moreira, M. L. D. C. Vallone, S. A. Mingoti, R. C. Dias, and R. F. Sampaio, "Step length appears to be a strong discriminant gait parameter for elderly females highly concerned about falls: a cross-sectional observational study," Physiotherapy, vol. 97, pp. 126-131, June 2011.
[2] M. E. Morris, R. Iansek, T. A. Matyas, and J. J. Summers, "Stride length regulation in Parkinson's disease: normalization strategies and underlying mechanisms," Brain, vol. 119, pp. 551-568, April 1996.
[3] C. H. Chen, "Accelerometer Only Gait Parameter Extraction With All Convolutional Neural Network," M.S. thesis, Institute of Electronics, National Chiao Tung University, Hsinchu, Taiwan, 2018.
[4] J. Hannink, T. Kautz, C. F. Pasluosta, K. G. Gaßmann, J. Klucken, and B. M. Eskofier, "Sensor-based gait parameter extraction with deep convolutional neural networks," IEEE Journal of Biomedical and Health Informatics, vol. 21, pp. 85-93, Dec. 2016.
[5] G. Zhao, G. Liu, H. Li, and M. Pietikainen, "3D gait recognition using multiple cameras," in 7th International Conference on Automatic Face and Gesture Recognition (FGR06), Southampton, UK, 2006, pp. 529-534.
[6] R. Urtasun and P. Fua, "3D tracking for gait characterization and recognition," in Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings., Seoul, South Korea, 2004, pp. 17-22.
[7] Q. Cai and J. K. Aggarwal, "Tracking human motion in structured environments using a distributed-camera system," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, pp. 1241-1247, Nov. 1999.
[8] S. L. Dockstader and A. M. Tekalp, "Multiple camera tracking of interacting and occluded human motion," Proceedings of the IEEE, 2001.
[9] S. O. H. Madgwick, A. J. L. Harrison, and R. Vaidyanathan, "Estimation of IMU and MARG orientation using a gradient descent algorithm," in 2011 IEEE International Conference on Rehabilitation Robotics, ETH Zurich Science City, Switzerland, 2011, pp. 1-7.
[10] Y. S. Suh and S. Park, "Pedestrian inertial navigation with gait phase detection assisted zero velocity updating," in 2009 4th International Conference on Autonomous Robots and Agents, Wellington, New Zealand, 2009, pp. 336-341.
[11] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: The MIT Press, 2016.
[12] C. A. Ronao and S.-B. Cho, "Human activity recognition with smartphone sensors using deep learning neural networks," Expert Systems with Applications.
[13] O. Dehzangi, M. Taherisadr, and R. ChangalVala, "IMU-based gait recognition using convolutional neural networks and multi-sensor fusion," Sensors, vol. 17, p. 2735, Nov. 2017.
[14] S. Bhattacharya and N. D. Lane, "From smart to deep: Robust activity recognition on smartwatches using deep learning," in 2016 IEEE International Conference on Pervasive Computing and Communication Workshops (PerCom Workshops), Sydney, NSW, Australia, 2016, pp. 1-6.
[15] Y. S. Liu and K. A. Wen, "High Precision Trajectory Reconstruction for Wearable System Using Consumer-Grade IMU," Tech. Report, May 2018.
[16] G. Sistu, I. Leang, and S. Yogamani, "Real-time joint object detection and semantic segmentation network for automated driving," arXiv preprint arXiv:1901.03912, 2019.