Detecting Handwritten Mathematical Terms with Sensor Based Data
- URL: http://arxiv.org/abs/2109.05594v1
- Date: Sun, 12 Sep 2021 19:33:34 GMT
- Title: Detecting Handwritten Mathematical Terms with Sensor Based Data
- Authors: Lukas Wegmeth, Alexander Hoelzemann, Kristof Van Laerhoven
- Abstract summary: We propose a solution to the UbiComp 2021 Challenge by Stabilo in which handwritten mathematical terms are supposed to be automatically classified.
The input data set contains data of different writers, with label strings constructed from a total of 15 different possible characters.
- Score: 71.84852429039881
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this work we propose a solution to the UbiComp 2021 Challenge by Stabilo
in which handwritten mathematical terms are supposed to be automatically
classified based on time series sensor data captured on the DigiPen. The input
data set contains data of different writers, with label strings constructed
from a total of 15 different possible characters. Each label must first be
split into its separate characters so that these can be classified one by one.
This is achieved by applying a data-dependent, rule-based information
extraction algorithm to the labeled data. Using the resulting data, two classifiers are
constructed. The first is a binary classifier that is able to predict, for
unknown data, whether a sample is part of a writing activity; it consists of a Deep
Neural Network feature extractor followed by a Random Forest that is
trained to classify the extracted features, reaching an F1 score of >90%. The second
classifier is a Deep Neural Network that combines convolution layers with
recurrent layers to predict windows with a single label, out of the 15 possible
classes, at an F1 score of >60%. A simulation of the challenge evaluation
procedure reports a Levenshtein distance of 8 and shows that the chosen
approach still falls short in overall accuracy and real-time applicability.
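The evaluation metric named above, the Levenshtein distance between a predicted and a ground-truth label string, can be sketched with the standard dynamic-programming recurrence. The example strings below are made up for illustration, not taken from the challenge data:

```python
def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance: the minimum number
    of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

# e.g. a predicted term vs. a ground-truth label string
print(levenshtein("1+2=3", "1+23"))  # -> 1 (one deletion)
```

A lower total distance over the evaluation set means predicted label strings are closer to the ground truth, which is how the simulated challenge score of 8 above is obtained.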
Related papers
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z) - Novel Deep Neural Network Classifier Characterization Metrics with Applications to Dataless Evaluation [1.6574413179773757]
In this work, we evaluate a Deep Neural Network (DNN) classifier's training quality without any example dataset.
Our empirical study of the proposed method for ResNet18, trained with CIFAR-10 and CIFAR-100 datasets, confirms that data-less evaluation of DNN classifiers is indeed possible.
arXiv Detail & Related papers (2024-07-17T20:40:46Z) - Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z) - A Self-Encoder for Learning Nearest Neighbors [5.297261090056809]
The self-encoder learns to distribute the data samples in the embedding space so that they are linearly separable from one another.
Unlike regular nearest neighbors, the predictions resulting from this encoding of data are invariant to any scaling of features.
arXiv Detail & Related papers (2023-06-25T14:30:31Z) - A new data augmentation method for intent classification enhancement and its application on spoken conversation datasets [23.495743195811375]
We present the Nearest Neighbors Scores Improvement (NNSI) algorithm for automatic data selection and labeling.
The NNSI reduces the need for manual labeling by automatically selecting highly-ambiguous samples and labeling them with high accuracy.
We demonstrated the use of NNSI on two large-scale, real-life voice conversation systems.
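The exact NNSI scoring procedure is not given in this summary; as a rough illustration of the general idea of selecting highly-ambiguous samples via nearest-neighbor scores, the toy sketch below flags samples whose nearest labeled neighbors disagree (a small gap between the top two label frequencies) and labels them by majority vote. The function name, vote representation, and margin threshold are all assumptions for illustration:

```python
from collections import Counter

def select_ambiguous(neighbors_labels, margin=0.2):
    """Toy nearest-neighbor ambiguity selection.
    neighbors_labels[i] holds the intent labels of sample i's nearest
    labeled neighbors; a small gap between the top two label
    frequencies marks the sample as highly ambiguous."""
    selected = []
    for i, labels in enumerate(neighbors_labels):
        counts = Counter(labels).most_common(2)
        top = counts[0][1] / len(labels)
        second = counts[1][1] / len(labels) if len(counts) > 1 else 0.0
        if top - second < margin:               # votes nearly tied -> ambiguous
            selected.append((i, counts[0][0]))  # keep majority label
    return selected

votes = [["pay", "pay", "pay", "pay"],        # clear-cut, skipped
         ["pay", "refund", "refund", "pay"]]  # ambiguous, selected
print(select_ambiguous(votes))  # -> [(1, 'pay')]
```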
arXiv Detail & Related papers (2022-02-21T11:36:19Z) - Transformers Can Do Bayesian Inference [56.99390658880008]
We present Prior-Data Fitted Networks (PFNs)
PFNs leverage in-context learning, using large-scale machine learning techniques to approximate a large set of posteriors.
We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems.
arXiv Detail & Related papers (2021-12-20T13:07:39Z) - A Unified Generative Adversarial Network Training via Self-Labeling and Self-Attention [38.31735499785227]
We propose a novel GAN training scheme that can handle any level of labeling in a unified manner.
Our scheme introduces a form of artificial labeling that can incorporate manually defined labels, when available.
We evaluate our approach on CIFAR-10, STL-10 and SVHN, and show that both self-labeling and self-attention consistently improve the quality of generated data.
arXiv Detail & Related papers (2021-06-18T04:40:26Z) - Label Inference Attacks from Log-loss Scores [11.780563744330038]
In this paper, we investigate the problem of inferring the labels of a dataset from single (or multiple) log-loss score(s) without any other access to the dataset.
Surprisingly, we show that for any finite number of label classes, it is possible to accurately infer the labels of the dataset from the reported log-loss score of a single carefully constructed prediction vector.
We present label inference algorithms (attacks) that succeed even under addition of noise to the log-loss scores and under limited precision arithmetic.
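The paper's exact single-score construction is not reproduced here; the following toy sketch illustrates the underlying idea for binary labels, under the assumption of sufficient numeric precision and a small dataset. The attacker submits per-sample probabilities q/(1+q) for distinct primes q; the exponentiated log-loss then equals a known numerator divided by the product of the primes at positions labeled 1, so factoring recovers every label from one score:

```python
import math

PRIMES = [2, 3, 5, 7]  # one distinct prime per sample (toy size)

def reported_log_loss(labels, probs):
    """What the scoring server returns: total negative log-likelihood."""
    return sum(-math.log(p if y == 1 else 1 - p)
               for y, p in zip(labels, probs))

def infer_labels(score):
    """Recover all labels from the single reported score.
    With p_i = q_i/(1+q_i): exp(score) = prod(1+q) / prod_{y=1} q,
    so factoring the denominator over the known primes reveals
    which samples carry label 1."""
    numerator = math.prod(1 + q for q in PRIMES)
    denom = round(numerator / math.exp(score))
    labels = []
    for q in PRIMES:
        if denom % q == 0:
            labels.append(1)
            denom //= q
        else:
            labels.append(0)
    return labels

secret = [1, 0, 1, 1]
probs = [q / (1 + q) for q in PRIMES]
print(infer_labels(reported_log_loss(secret, probs)))  # -> [1, 0, 1, 1]
```

This simplified version breaks down once floating-point error exceeds the gap between candidate denominators, which is why the paper's algorithms are designed to tolerate noisy scores and limited-precision arithmetic.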
arXiv Detail & Related papers (2021-05-18T04:17:06Z) - Unsupervised Label Refinement Improves Dataless Text Classification [48.031421660674745]
Dataless text classification is capable of classifying documents into previously unseen labels by assigning a score to any document paired with a label description.
While promising, it crucially relies on accurate descriptions of the label set for each downstream task.
This reliance causes dataless classifiers to be highly sensitive to the choice of label descriptions and hinders the broader application of dataless classification in practice.
arXiv Detail & Related papers (2020-12-08T03:37:50Z) - Classify and Generate Reciprocally: Simultaneous Positive-Unlabelled Learning and Conditional Generation with Extra Data [77.31213472792088]
The scarcity of class-labeled data is a ubiquitous bottleneck in many machine learning problems.
We address this problem by leveraging Positive-Unlabeled (PU) classification and conditional generation with extra unlabeled data.
We present a novel training framework to jointly target both PU classification and conditional generation when exposed to extra data.
arXiv Detail & Related papers (2020-06-14T08:27:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.