Named Entity Extraction with Finite State Transducers
- URL: http://arxiv.org/abs/2006.11548v1
- Date: Sat, 20 Jun 2020 11:09:04 GMT
- Title: Named Entity Extraction with Finite State Transducers
- Authors: Diego Alexander Huérfano Villalba and Elizabeth León Guzmán
- Abstract summary: We describe a named entity tagging system that requires minimal linguistic knowledge.
The system is based on the ideas of Brill's tagger, which makes it very simple.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We describe a named entity tagging system that requires minimal linguistic
knowledge and can be applied to more target languages without substantial
changes. The system is based on the ideas of Brill's tagger, which makes it
very simple. Using supervised machine learning, we construct a series of
automata (or transducers) in order to tag a given text. The final model is
composed entirely of automata, and tagging runs in linear time. It was tested
on the Spanish data set from the CoNLL-2002 shared task, attaining an overall
$F_{\beta = 1}$ measure of $60\%$. Also, we present an algorithm for
the construction of the final transducer used to encode all the learned
contextual rules.
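As a rough illustration of the Brill-style, rule-based approach described in the abstract, the sketch below assigns each token a baseline tag and then rewrites tags with contextual rules. The lexicon, tag set, and rules are hypothetical toy examples; the paper's system learns such rules from data and compiles them into finite state transducers rather than applying them one at a time.

```python
# Minimal sketch of a Brill-style contextual-rule tagger for named entities.
# The lexicon, tag set, and rules are illustrative placeholders, not the ones
# learned by the paper's system, which compiles such rules into transducers.

BASELINE = {"Elizabeth": "B-PER", "Bogotá": "B-LOC"}  # most-frequent tag per word (toy lexicon)
DEFAULT_TAG = "O"

# Contextual rules: (from_tag, to_tag, condition over tokens/tags at position i).
RULES = [
    # Retag a capitalised O token as I-PER when it follows a B-PER token (toy rule).
    ("O", "I-PER", lambda toks, tags, i: i > 0 and tags[i - 1] == "B-PER" and toks[i][0].isupper()),
    # Retag a capitalised O token as B-LOC when it follows the preposition "en" (toy rule).
    ("O", "B-LOC", lambda toks, tags, i: i > 0 and toks[i - 1].lower() == "en" and toks[i][0].isupper()),
]

def tag(tokens):
    # Step 1: baseline tagging from the lexicon.
    tags = [BASELINE.get(t, DEFAULT_TAG) for t in tokens]
    # Step 2: apply each learned contextual rule, in order, over the whole sequence.
    for from_tag, to_tag, cond in RULES:
        for i in range(len(tokens)):
            if tags[i] == from_tag and cond(tokens, tags, i):
                tags[i] = to_tag
    return list(zip(tokens, tags))

print(tag("Elizabeth vive en Bogotá".split()))
# [('Elizabeth', 'B-PER'), ('vive', 'O'), ('en', 'O'), ('Bogotá', 'B-LOC')]
```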
Related papers
- Automating Thought of Search: A Journey Towards Soundness and Completeness [20.944440404347908]
Planning remains one of the last standing bastions for large language models (LLMs).
We automate Thought of Search (ToS) completely taking the human out of the loop of solving planning problems.
We achieve 100% accuracy, with minimal feedback, using LLMs of various sizes on all evaluated domains.
arXiv Detail & Related papers (2024-08-21T04:19:52Z)
- Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass [72.07642648108849]
Superposed Decoding is a new decoding algorithm that generates $k$ drafts at the cost of one autoregressive inference pass.
Superposed Decoding can be combined with other decoding strategies, resulting in universal coverage gains when scaling inference time compute.
arXiv Detail & Related papers (2024-05-28T17:40:48Z)
- Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens [138.36729703589512]
We show that $n$-gram language models are still relevant in this era of neural large language models (LLMs).
This was done by modernizing $n$-gram LMs in two aspects. First, we train them at the same data scale as neural LLMs -- 5 trillion tokens.
Second, existing $n$-gram LMs use small $n$, which hinders their performance; we instead allow $n$ to be arbitrarily large, by introducing a new $\infty$-gram LM with backoff.
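For intuition, a deliberately naive sketch of the $\infty$-gram idea follows: take the longest suffix of the context that occurs in the training corpus and predict from the counts of its continuations. The linear scan below is purely illustrative; the paper relies on suffix-array indexing to make such queries feasible at trillion-token scale.

```python
from collections import Counter

def infty_gram_next(corpus, context):
    """Back off to the longest suffix of `context` that appears in `corpus`,
    then return the empirical distribution over the next token.
    Naive O(len(corpus)) scan per query; real systems use suffix arrays."""
    for start in range(len(context)):          # longest suffix first
        suffix = context[start:]
        counts = Counter(
            corpus[i + len(suffix)]
            for i in range(len(corpus) - len(suffix))
            if corpus[i:i + len(suffix)] == suffix
        )
        if counts:                             # suffix found: stop backing off
            total = sum(counts.values())
            return {tok: c / total for tok, c in counts.items()}
    return {}                                  # no suffix of the context occurs in the corpus

corpus = "the cat sat on the mat and the cat sat on the hat".split()
print(infty_gram_next(corpus, "the cat sat on the".split()))
# {'mat': 0.5, 'hat': 0.5}
```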
arXiv Detail & Related papers (2024-01-30T19:03:49Z)
- Introducing Rhetorical Parallelism Detection: A New Task with Datasets, Metrics, and Baselines [8.405938712823565]
Parallelism is the juxtaposition of phrases which have the same sequence of linguistic features.
Despite the ubiquity of parallelism, the field of natural language processing has seldom investigated it.
We construct a formal definition of it; we provide one new Latin dataset and one adapted Chinese dataset for it; we establish a family of metrics to evaluate performance on it.
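As a toy illustration of the underlying notion only (not the paper's datasets or metrics), two phrases can be checked for parallelism by comparing their sequences of linguistic features, for example coarse part-of-speech tags:

```python
# Toy check for rhetorical parallelism: two phrases are treated as parallel
# here if their part-of-speech sequences match. The POS tags are supplied
# by hand; a real system would take them from a tagger or treebank.

def is_parallel(phrase_a, phrase_b):
    """phrase_a and phrase_b are lists of (token, pos) pairs."""
    return [pos for _, pos in phrase_a] == [pos for _, pos in phrase_b]

a = [("I", "PRON"), ("came", "VERB")]
b = [("I", "PRON"), ("saw", "VERB")]
c = [("I", "PRON"), ("conquered", "VERB"), ("quickly", "ADV")]

print(is_parallel(a, b))  # True: same PRON VERB feature sequence
print(is_parallel(a, c))  # False: lengths and feature sequences differ
```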
arXiv Detail & Related papers (2023-11-30T15:24:57Z)
- Guess & Sketch: Language Model Guided Transpilation [59.02147255276078]
Learned transpilation offers an alternative to manual re-writing and engineering efforts.
Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness.
Guess & Sketch extracts alignment and confidence information from features of the LM, then passes it to a symbolic solver to resolve semantic equivalence.
arXiv Detail & Related papers (2023-09-25T15:42:18Z)
- On the Intersection of Context-Free and Regular Languages [71.61206349427509]
We generalize the Bar-Hillel construction to handle finite-state automata with $\varepsilon$-arcs.
We prove that our construction leads to a grammar that encodes the structure of both the input automaton and grammar while retaining the size of the original construction.
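For background, the sketch below builds the classical Bar-Hillel product of a grammar in Chomsky normal form with an $\varepsilon$-free automaton; handling $\varepsilon$-arcs, which is the paper's contribution, is not reproduced here.

```python
# Textbook Bar-Hillel intersection of a CNF grammar with an epsilon-free FSA.
# Nonterminals of the product grammar are triples (p, A, q), read as
# "A derives a string that drives the automaton from state p to state q".

def bar_hillel(binary_rules, terminal_rules, start_sym, states, arcs, q0, finals):
    """binary_rules: (A, B, C) triples; terminal_rules: (A, a) pairs;
    arcs: (p, a, q) triples. Returns (productions, start symbols)."""
    prods = []
    # A terminal rule A -> a combines with every matching arc p --a--> q.
    for (A, a) in terminal_rules:
        for (p, sym, q) in arcs:
            if sym == a:
                prods.append(((p, A, q), [a]))
    # A binary rule A -> B C is split over every intermediate state q.
    for (A, B, C) in binary_rules:
        for p in states:
            for q in states:
                for r in states:
                    prods.append(((p, A, r), [(p, B, q), (q, C, r)]))
    starts = [(q0, start_sym, f) for f in finals]
    return prods, starts

# Toy grammar S -> A B, A -> 'a', B -> 'b', intersected with an automaton
# that accepts exactly the string "a b". Many of the generated productions
# are useless and would normally be pruned afterwards.
prods, starts = bar_hillel(
    binary_rules=[("S", "A", "B")],
    terminal_rules=[("A", "a"), ("B", "b")],
    start_sym="S",
    states=[0, 1, 2], arcs=[(0, "a", 1), (1, "b", 2)], q0=0, finals=[2],
)
print(starts)       # [(0, 'S', 2)]
print(len(prods))   # 29 = 2 lexical productions + 27 binary productions
```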
arXiv Detail & Related papers (2022-09-14T17:49:06Z)
- Automatic question generation based on sentence structure analysis using machine learning approach [0.0]
This article introduces our framework for generating factual questions from unstructured text in the English language.
It uses a combination of traditional linguistic approaches based on sentence patterns with several machine learning methods.
The framework also includes a question evaluation module which estimates the quality of generated questions.
arXiv Detail & Related papers (2022-05-25T14:35:29Z)
- Breaking Writer's Block: Low-cost Fine-tuning of Natural Language Generation Models [62.997667081978825]
We describe a system that fine-tunes a natural language generation model for the problem of solving Writer's Block.
The proposed fine-tuning obtains excellent results, even with a small number of epochs and a total cost of USD 150.
arXiv Detail & Related papers (2020-12-19T11:19:11Z)
- Automatic Extraction of Rules Governing Morphological Agreement [103.78033184221373]
We develop an automated framework for extracting a first-pass grammatical specification from raw text.
We focus on extracting rules describing agreement, a morphosyntactic phenomenon at the core of the grammars of many of the world's languages.
We apply our framework to all languages included in the Universal Dependencies project, with promising results.
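As a hedged sketch of the general idea (not the paper's framework), one can scan CoNLL-U dependency trees and count, per dependency relation, how often head and dependent carry the same value for a morphological feature; relations where agreement is near-deterministic become candidate rules. The file name in the usage comment is only a placeholder.

```python
from collections import defaultdict

def agreement_stats(conllu_path, feature="Number"):
    """Count, for each dependency relation, how often head and dependent
    agree on `feature`. Returns {relation: (agree, total)}."""
    stats = defaultdict(lambda: [0, 0])

    def flush(sent):
        feats, rows = {}, []
        for cols in sent:
            idx, head, rel = cols[0], cols[6], cols[7]
            feats[idx] = dict(kv.split("=", 1) for kv in cols[5].split("|") if "=" in kv)
            rows.append((idx, head, rel))
        for idx, head, rel in rows:
            dep_v, head_v = feats[idx].get(feature), feats.get(head, {}).get(feature)
            if dep_v and head_v:                 # both sides mark the feature
                stats[rel][1] += 1
                stats[rel][0] += int(dep_v == head_v)

    sent = []
    with open(conllu_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                flush(sent); sent = []
            elif not line.startswith("#"):
                cols = line.split("\t")
                if cols[0].isdigit():            # skip multiword tokens and empty nodes
                    sent.append(cols)
        if sent:
            flush(sent)
    return {rel: tuple(v) for rel, v in stats.items()}

# Hypothetical usage: relations with agree/total close to 1 suggest an agreement rule.
# print(agreement_stats("es_ancora-ud-train.conllu", feature="Gender"))
```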
arXiv Detail & Related papers (2020-10-02T18:31:45Z)
- DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations [4.36561468436181]
We present DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations.
Our approach closes the performance gap between unsupervised and supervised pretraining for universal sentence encoders.
Our code and pretrained models are publicly available and can be easily adapted to new domains or used to embed unseen text.
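As a minimal sketch of the contrastive objective behind this family of unsupervised text encoders, the snippet below implements a generic InfoNCE loss with in-batch negatives; DeCLUTR's actual span-sampling procedure and encoder are described in the paper, and the random tensors here merely stand in for pooled span embeddings.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor_emb, positive_emb, temperature=0.1):
    """InfoNCE with in-batch negatives: row i of `anchor_emb` should be
    closest to row i of `positive_emb` and far from all other rows."""
    a = F.normalize(anchor_emb, dim=-1)
    p = F.normalize(positive_emb, dim=-1)
    logits = a @ p.T / temperature                      # (batch, batch) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)  # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

# Placeholder "encoder": random vectors standing in for pooled span embeddings.
torch.manual_seed(0)
anchor_spans = torch.randn(8, 768)    # embeddings of anchor spans
positive_spans = torch.randn(8, 768)  # embeddings of other spans from the same documents
print(info_nce(anchor_spans, positive_spans).item())
```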
arXiv Detail & Related papers (2020-06-05T20:00:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.