Detecting Handwritten Mathematical Terms with Sensor Based Data
- URL: http://arxiv.org/abs/2109.05594v1
- Date: Sun, 12 Sep 2021 19:33:34 GMT
- Title: Detecting Handwritten Mathematical Terms with Sensor Based Data
- Authors: Lukas Wegmeth, Alexander Hoelzemann, Kristof Van Laerhoven
- Abstract summary: We propose a solution to the UbiComp 2021 Challenge by Stabilo in which handwritten mathematical terms are supposed to be automatically classified.
The input data set contains data of different writers, with label strings constructed from a total of 15 different possible characters.
- Score: 71.84852429039881
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this work we propose a solution to the UbiComp 2021 Challenge by Stabilo
in which handwritten mathematical terms are supposed to be automatically
classified based on time series sensor data captured on the DigiPen. The input
data set contains data of different writers, with label strings constructed
from a total of 15 different possible characters. Each label must first be
split into its separate characters so that these can be classified one by one.
This is achieved by applying a data-dependent, rule-based information
extraction algorithm to the labeled data. Using the resulting data, two classifiers are
constructed. The first is a binary classifier that is able to predict, for
unknown data, whether a sample is part of a writing activity; it consists of a Deep
Neural Network feature extractor followed by a Random Forest that is
trained to classify the extracted features, reaching an F1 score of >90%. The second
classifier is a Deep Neural Network that combines convolution layers with
recurrent layers to predict windows with a single label, out of the 15 possible
classes, at an F1 score of >60%. A simulation of the challenge evaluation
procedure reports a Levenshtein distance of 8 and shows that the chosen
approach still falls short in overall accuracy and real-time applicability.
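The evaluation metric named above, the Levenshtein distance between a predicted and a ground-truth label string, can be sketched with the standard dynamic-programming recurrence. The example strings below are made up for illustration, not taken from the challenge data:

```python
def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance: the minimum number
    of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

# e.g. a predicted term vs. a ground-truth label string
print(levenshtein("1+2=3", "1+23"))  # -> 1 (one deletion)
```

A lower total distance over the evaluation set means predicted label strings are closer to the ground truth, which is how the simulated challenge score of 8 above is obtained.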
Related papers
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z) - Novel Deep Neural Network Classifier Characterization Metrics with Applications to Dataless Evaluation [1.6574413179773757]
In this work, we evaluate a Deep Neural Network (DNN) classifier's training quality without any example dataset.
Our empirical study of the proposed method for ResNet18, trained with CIFAR-10 and CIFAR-100 datasets, confirms that data-less evaluation of DNN classifiers is indeed possible.
arXiv Detail & Related papers (2024-07-17T20:40:46Z) - Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z) - A Self-Encoder for Learning Nearest Neighbors [5.297261090056809]
The self-encoder learns to distribute the data samples in the embedding space so that they are linearly separable from one another.
Unlike regular nearest neighbors, the predictions resulting from this encoding of data are invariant to any scaling of features.
arXiv Detail & Related papers (2023-06-25T14:30:31Z) - A new data augmentation method for intent classification enhancement and its application on spoken conversation datasets [23.495743195811375]
We present the Nearest Neighbors Scores Improvement (NNSI) algorithm for automatic data selection and labeling.
The NNSI reduces the need for manual labeling by automatically selecting highly-ambiguous samples and labeling them with high accuracy.
We demonstrated the use of NNSI on two large-scale, real-life voice conversation systems.
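The exact NNSI scoring procedure is not given in this summary; as a rough illustration of the general idea of selecting highly-ambiguous samples via nearest-neighbor scores, the toy sketch below flags samples whose nearest labeled neighbors disagree (a small gap between the top two label frequencies) and labels them by majority vote. The function name, vote representation, and margin threshold are all assumptions for illustration:

```python
from collections import Counter

def select_ambiguous(neighbors_labels, margin=0.2):
    """Toy nearest-neighbor ambiguity selection.
    neighbors_labels[i] holds the intent labels of sample i's nearest
    labeled neighbors; a small gap between the top two label
    frequencies marks the sample as highly ambiguous."""
    selected = []
    for i, labels in enumerate(neighbors_labels):
        counts = Counter(labels).most_common(2)
        top = counts[0][1] / len(labels)
        second = counts[1][1] / len(labels) if len(counts) > 1 else 0.0
        if top - second < margin:               # votes nearly tied -> ambiguous
            selected.append((i, counts[0][0]))  # keep majority label
    return selected

votes = [["pay", "pay", "pay", "pay"],        # clear-cut, skipped
         ["pay", "refund", "refund", "pay"]]  # ambiguous, selected
print(select_ambiguous(votes))  # -> [(1, 'pay')]
```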
arXiv Detail & Related papers (2022-02-21T11:36:19Z) - Transformers Can Do Bayesian Inference [56.99390658880008]
We present Prior-Data Fitted Networks (PFNs)
PFNs leverage in-context learning, using large-scale machine learning techniques to approximate a large set of posteriors.
We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems.
arXiv Detail & Related papers (2021-12-20T13:07:39Z) - A Unified Generative Adversarial Network Training via Self-Labeling and Self-Attention [38.31735499785227]
We propose a novel GAN training scheme that can handle any level of labeling in a unified manner.
Our scheme introduces a form of artificial labeling that can incorporate manually defined labels, when available.
We evaluate our approach on CIFAR-10, STL-10 and SVHN, and show that both self-labeling and self-attention consistently improve the quality of generated data.
arXiv Detail & Related papers (2021-06-18T04:40:26Z) - Label Inference Attacks from Log-loss Scores [11.780563744330038]
In this paper, we investigate the problem of inferring the labels of a dataset from single (or multiple) log-loss score(s) without any other access to the dataset.
Surprisingly, we show that for any finite number of label classes, it is possible to accurately infer the labels of the dataset from the reported log-loss score of a single carefully constructed prediction vector.
We present label inference algorithms (attacks) that succeed even under addition of noise to the log-loss scores and under limited precision arithmetic.
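The paper's exact single-score construction is not reproduced here; the following toy sketch illustrates the underlying idea for binary labels, under the assumption of sufficient numeric precision and a small dataset. The attacker submits per-sample probabilities q/(1+q) for distinct primes q; the exponentiated log-loss then equals a known numerator divided by the product of the primes at positions labeled 1, so factoring recovers every label from one score:

```python
import math

PRIMES = [2, 3, 5, 7]  # one distinct prime per sample (toy size)

def reported_log_loss(labels, probs):
    """What the scoring server returns: total negative log-likelihood."""
    return sum(-math.log(p if y == 1 else 1 - p)
               for y, p in zip(labels, probs))

def infer_labels(score):
    """Recover all labels from the single reported score.
    With p_i = q_i/(1+q_i): exp(score) = prod(1+q) / prod_{y=1} q,
    so factoring the denominator over the known primes reveals
    which samples carry label 1."""
    numerator = math.prod(1 + q for q in PRIMES)
    denom = round(numerator / math.exp(score))
    labels = []
    for q in PRIMES:
        if denom % q == 0:
            labels.append(1)
            denom //= q
        else:
            labels.append(0)
    return labels

secret = [1, 0, 1, 1]
probs = [q / (1 + q) for q in PRIMES]
print(infer_labels(reported_log_loss(secret, probs)))  # -> [1, 0, 1, 1]
```

This simplified version breaks down once floating-point error exceeds the gap between candidate denominators, which is why the paper's algorithms are designed to tolerate noisy scores and limited-precision arithmetic.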
arXiv Detail & Related papers (2021-05-18T04:17:06Z) - Unsupervised Label Refinement Improves Dataless Text Classification [48.031421660674745]
Dataless text classification is capable of classifying documents into previously unseen labels by assigning a score to any document paired with a label description.
While promising, it crucially relies on accurate descriptions of the label set for each downstream task.
This reliance causes dataless classifiers to be highly sensitive to the choice of label descriptions and hinders the broader application of dataless classification in practice.
arXiv Detail & Related papers (2020-12-08T03:37:50Z) - Classify and Generate Reciprocally: Simultaneous Positive-Unlabelled Learning and Conditional Generation with Extra Data [77.31213472792088]
The scarcity of class-labeled data is a ubiquitous bottleneck in many machine learning problems.
We address this problem by leveraging Positive-Unlabeled (PU) classification and conditional generation with extra unlabeled data.
We present a novel training framework to jointly target both PU classification and conditional generation when exposed to extra data.
arXiv Detail & Related papers (2020-06-14T08:27:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.