Hidden Markov Based Mathematical Model dedicated to Extract Ingredients from Recipe Text
- URL: http://arxiv.org/abs/2110.15707v1
- Date: Tue, 28 Sep 2021 14:38:11 GMT
- Title: Hidden Markov Based Mathematical Model dedicated to Extract Ingredients from Recipe Text
- Authors: Zied Baklouti (UP, ENIT)
- Abstract summary: Part-of-speech tagging (POS tagging) is a pre-processing task that requires an annotated corpus.
I built a mathematical model based on Hidden Markov structures and obtained high accuracy for ingredients extracted from recipe text.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural Language Processing (NLP) is a branch of artificial intelligence that gives machines the ability to decode human languages. Part-of-speech tagging (POS tagging) is a pre-processing task that requires an annotated corpus. Rule-based and stochastic methods have shown remarkable results for POS tag prediction. In this work, I built a mathematical model based on Hidden Markov structures and obtained high accuracy for ingredients extracted from recipe text, with performance greater than traditional methods can achieve without considering unknown words.
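The paper itself does not include code, but the kind of tagger the abstract describes is easy to sketch. Below is a minimal, hypothetical Hidden Markov tagger in Python: hidden states are tags (here just ING for ingredient tokens and OTHER), observations are recipe tokens, and Viterbi decoding recovers the most likely tag sequence. The states, vocabulary, and all probabilities are toy values for illustration, not the author's model.

```python
import math

# Toy HMM tagger sketch (not the paper's model): hidden states are tags,
# observations are recipe tokens, Viterbi finds the best tag sequence.
STATES = ["ING", "OTHER"]  # ING = ingredient token

start_p = {"ING": 0.4, "OTHER": 0.6}
trans_p = {
    "ING":   {"ING": 0.5, "OTHER": 0.5},
    "OTHER": {"ING": 0.3, "OTHER": 0.7},
}
emit_p = {
    "ING":   {"flour": 0.3, "sugar": 0.3, "eggs": 0.3, "<unk>": 0.1},
    "OTHER": {"mix": 0.3, "the": 0.4, "and": 0.2, "<unk>": 0.1},
}

def viterbi(tokens):
    """Return the most probable tag sequence for the token list."""
    def e(s, w):  # emission prob with a crude unknown-word fallback
        return emit_p[s].get(w, emit_p[s]["<unk>"])

    # Work in log space to avoid underflow on long sentences.
    V = [{s: (math.log(start_p[s]) + math.log(e(s, tokens[0])), [s])
          for s in STATES}]
    for w in tokens[1:]:
        row = {}
        for s in STATES:
            prob, path = max(
                (V[-1][p][0] + math.log(trans_p[p][s]) + math.log(e(s, w)),
                 V[-1][p][1] + [s])
                for p in STATES)
            row[s] = (prob, path)
        V.append(row)
    return max(V[-1].values())[1]

tokens = "mix the flour and sugar".split()
print(list(zip(tokens, viterbi(tokens))))
# [('mix', 'OTHER'), ('the', 'OTHER'), ('flour', 'ING'), ...]
```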
Related papers
- Batching BPE Tokenization Merges [55.2480439325792]
BatchBPE is an open-source, pure-Python implementation of the Byte Pair Encoding (BPE) algorithm (see the sketch after this list).
It can be used to train a high-quality tokenizer on a basic laptop.
(arXiv: 2024-08-05)
- Extracting Definienda in Mathematical Scholarly Articles with Transformers [0.0]
We consider automatically identifying the defined term within a mathematical definition from the text of an academic article.
It is possible to reach high levels of precision and recall using either the recent (and expensive) GPT-4 or simpler pre-trained models fine-tuned on our task.
(arXiv: 2023-11-21)
- Nonparametric Masked Language Modeling [113.71921977520864]
Existing language models (LMs) predict tokens with a softmax over a finite vocabulary.
We introduce NPM, the first nonparametric masked language model that replaces this softmax with a nonparametric distribution over every phrase in a reference corpus.
NPM can be efficiently trained with a contrastive objective and an in-batch approximation to full corpus retrieval.
(arXiv: 2022-12-02)
- A Transformer Architecture for Online Gesture Recognition of Mathematical Expressions [0.0]
A Transformer architecture is shown to provide an end-to-end model for building expression trees from online handwritten gestures corresponding to glyph strokes.
The attention mechanism was successfully used to encode, learn and enforce the underlying syntax of expressions.
For the first time, the encoder is fed with unseen online-temporal data tokens potentially forming an infinitely large vocabulary.
(arXiv: 2022-11-04)
- Classifiers are Better Experts for Controllable Text Generation [63.17266060165098]
We show that the proposed method significantly outperforms the recent PPLM, GeDi, and DExperts in perplexity (PPL) and in the sentiment accuracy of generated texts, as judged by an external classifier.
At the same time, it is also easier to implement and tune, and has significantly fewer restrictions and requirements.
(arXiv: 2022-05-15)
- Compositional Task-Oriented Parsing as Abstractive Question Answering [25.682923914685063]
Task-oriented parsing (TOP) aims to convert natural language into machine-readable representations of specific tasks, such as setting an alarm.
A popular approach to TOP is to apply seq2seq models to generate linearized parse trees (see the example after this list).
A more recent line of work argues that pretrained seq2seq models are better at generating outputs that are themselves natural language, so they replace linearized parse trees with canonical natural-language paraphrases.
(arXiv: 2022-05-04)
- Syntax-Aware Network for Handwritten Mathematical Expression Recognition [53.130826547287626]
Handwritten mathematical expression recognition (HMER) is a challenging task that has many potential applications.
Recent methods for HMER have achieved outstanding performance with an encoder-decoder architecture.
We propose a simple and efficient method for HMER, which is the first to incorporate syntax information into an encoder-decoder network.
(arXiv: 2022-03-03)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
(arXiv: 2021-11-04)
- Feature Extraction of Text for Deep Learning Algorithms: Application on Fake News Detection [0.0]
It is shown that deep learning algorithms operating on the letter frequencies of a news article's original text, with no information about letter order, can classify fake and trustworthy news with high accuracy (see the feature-extraction sketch after this list).
It seems that letter frequencies contain useful features for understanding the complex context or meaning of the original text.
(arXiv: 2020-10-12)
- Exploring Software Naturalness through Neural Language Models [56.1315223210742]
The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing.
We explore this hypothesis through the use of a pre-trained transformer-based language model to perform code analysis tasks.
(arXiv: 2020-06-22)
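For the BatchBPE entry above, a minimal sketch of the generic Byte Pair Encoding training loop may help. This is the textbook algorithm, not BatchBPE's batched implementation, and the toy corpus is invented:

```python
from collections import Counter

def bpe_train(corpus, num_merges):
    """Generic BPE training sketch: repeatedly merge the most
    frequent adjacent symbol pair across the corpus."""
    # Represent each word as a tuple of symbols, weighted by frequency.
    vocab = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])  # fuse the pair
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] += freq
        vocab = merged
    return merges

print(bpe_train(["lower", "lowest", "newer", "newest"], 4))
```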
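The task-oriented parsing entry contrasts two seq2seq target formats. The toy example below illustrates the difference; the intent and slot labels are written in the style of the TOP datasets and are illustrative, not taken from the paper:

```python
utterance = "wake me up at 7 am tomorrow"

# Target 1: a linearized parse tree (TOP-style bracket notation).
linearized = "[IN:CREATE_ALARM [SL:DATE_TIME at 7 am tomorrow ] ]"

# Target 2: a canonical natural-language paraphrase, as in the
# abstractive-QA framing, where the model generates fluent text.
paraphrase = "create an alarm for 7 am tomorrow"

for name, target in [("linearized", linearized), ("paraphrase", paraphrase)]:
    print(f"{name:>10}: {target}")
```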
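The fake-news entry rests on order-free letter frequencies as features. A minimal sketch of that feature-extraction step follows; the deep-learning classifier itself is omitted and the input string is invented:

```python
from collections import Counter
import string

def letter_frequencies(text):
    """Order-free feature vector: relative frequency of each letter a-z.
    Sketch of the feature-extraction step only; the paper's classifier
    is not reproduced here."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    counts = Counter(letters)
    total = max(len(letters), 1)
    return [counts[c] / total for c in string.ascii_lowercase]

features = letter_frequencies("Breaking: scientists discover ...")
print(len(features), features[:5])  # 26-dimensional vector
```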
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.