COMBO: State-of-the-Art Morphosyntactic Analysis
- URL: http://arxiv.org/abs/2109.05361v1
- Date: Sat, 11 Sep 2021 20:00:20 GMT
- Title: COMBO: State-of-the-Art Morphosyntactic Analysis
- Authors: Mateusz Klimaszewski, Alina Wróblewska
- Abstract summary: COMBO is a fully neural NLP system for accurate part-of-speech tagging, morphological analysis, lemmatisation, and (enhanced) dependency parsing.
It predicts categorical morphosyntactic features whilst also exposing their vector representations, extracted from hidden layers.
It is an easy-to-install Python package with automatically downloadable pre-trained models for over 40 languages.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce COMBO - a fully neural NLP system for accurate part-of-speech
tagging, morphological analysis, lemmatisation, and (enhanced) dependency
parsing. It predicts categorical morphosyntactic features whilst also exposing
their vector representations, extracted from hidden layers. COMBO is an
easy-to-install Python package with automatically downloadable pre-trained
models for over 40 languages. It maintains a balance between efficiency and
quality: as an end-to-end system whose modules are jointly trained, it trains
competitively fast, and as its models are optimised for accuracy, they often
achieve better prediction quality than SOTA systems. The COMBO library is
available at:
https://gitlab.clarin-pl.eu/syntactic-tools/combo.
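A minimal usage sketch for the package described in the abstract above. It follows the pattern shown in the project README (pip install combo), but the exact method and attribute names are assumptions and may differ between versions.

```python
# Minimal sketch, assuming the loader/prediction interface shown in the
# COMBO README; names may differ between package versions.
from combo.predict import COMBO

# Downloads and caches a pre-trained model; over 40 languages are available.
nlp = COMBO.from_pretrained("polish")

sentence = nlp("To jest przykładowe zdanie.")
for token in sentence.tokens:
    # Each token carries the categorical predictions (lemma, POS tag,
    # morphological features, dependency head and relation).
    print(token)
```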
Related papers
- Open-Source Web Service with Morphological Dictionary-Supplemented Deep Learning for Morphosyntactic Analysis of Czech [1.7871207544302354]
We present an open-source web service for Czech morphosyntactic analysis.
The system combines a deep learning model with rescoring by a high-precision morphological dictionary at inference time.
arXiv Detail & Related papers (2024-06-18T09:14:58Z)
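A toy sketch of the dictionary-rescoring idea in the entry above, not the paper's actual system: the tagger's probabilities are masked so that only analyses licensed by a high-precision morphological dictionary survive. The tag set and dictionary here are hypothetical.

```python
import numpy as np

# Hypothetical tag inventory and dictionary of licensed analyses per word form.
TAGS = ["NOUN", "VERB", "ADJ", "ADV"]
DICTIONARY = {"stavba": {"NOUN"}, "rychle": {"ADV", "ADJ"}}

def rescore(word: str, tag_probs: np.ndarray) -> str:
    """Pick the best tag, restricted to dictionary-licensed tags when known."""
    allowed = DICTIONARY.get(word)
    if allowed:
        mask = np.array([t in allowed for t in TAGS])
        tag_probs = np.where(mask, tag_probs, 0.0)  # veto unlicensed tags
    return TAGS[int(np.argmax(tag_probs))]

# The model slightly prefers ADJ, but the dictionary rules it out for "stavba".
print(rescore("stavba", np.array([0.40, 0.10, 0.45, 0.05])))  # -> NOUN
```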
- UncertaintyPlayground: A Fast and Simplified Python Library for Uncertainty Estimation [0.0]
UncertaintyPlayground is a Python library built on PyTorch and GPyTorch for uncertainty estimation in supervised learning tasks.
The library offers fast training for Gaussian and multi-modal outcome distributions.
It can visualize the prediction intervals of one or more instances.
arXiv Detail & Related papers (2023-10-23T18:36:54Z)
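A generic sketch of the prediction-interval idea from the entry above; this is not UncertaintyPlayground's API, just the standard central-interval computation for a Gaussian predictive distribution.

```python
import numpy as np
from scipy.stats import norm

def prediction_interval(mean, std, coverage=0.95):
    """Central interval of a Gaussian predictive distribution per instance."""
    z = norm.ppf(0.5 + coverage / 2.0)  # ~1.96 for 95% coverage
    return mean - z * std, mean + z * std

# Two instances with predictive means and standard deviations.
lo, hi = prediction_interval(np.array([2.0, -1.0]), np.array([0.5, 0.2]))
print(lo, hi)
```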
- LCE: An Augmented Combination of Bagging and Boosting in Python [45.65284933207566]
lcensemble is a high-performing, scalable and user-friendly Python package for general classification and regression tasks.
Local Cascade Ensemble (LCE) is a machine learning method that further enhances the prediction performance of the current state-of-the-art methods Random Forest and XGBoost.
arXiv Detail & Related papers (2023-08-14T16:34:47Z)
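A usage sketch for the lcensemble package above, assuming the scikit-learn-compatible interface the entry describes; the class and parameter names are taken on trust from the paper and may differ.

```python
# Assumed scikit-learn-style interface (pip install lcensemble).
from lce import LCEClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LCEClassifier(n_estimators=10, random_state=0)
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # held-out accuracy
```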
- Bayesian Optimization of Catalysts With In-context Learning [0.0]
Large language models (LLMs) can perform accurate classification with zero or only a few examples.
We show a prompting system that enables regression with uncertainty for in-context learning with frozen LLMs.
arXiv Detail & Related papers (2023-04-11T17:00:35Z)
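A hypothetical sketch of few-shot prompting for regression with a frozen LLM, in the spirit of the entry above; the prompt format is an assumption, and uncertainty would come from sampling several completions and measuring their spread.

```python
# Hypothetical prompt construction; no specific LLM API is assumed.
examples = [("catalyst A, 300 K", 0.42), ("catalyst B, 350 K", 0.57)]
query = "catalyst C, 325 K"

prompt = "Predict the yield as a number.\n"
for condition, yield_value in examples:
    prompt += f"Input: {condition}\nOutput: {yield_value}\n"
prompt += f"Input: {query}\nOutput:"

# Send `prompt` to a frozen LLM; sample several completions and use their
# spread as an uncertainty estimate for Bayesian optimisation.
print(prompt)
```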
- Enhancing Self-Consistency and Performance of Pre-Trained Language Models through Natural Language Inference [72.61732440246954]
Large pre-trained language models often lack logical consistency across test inputs.
We propose a framework, ConCoRD, for boosting the consistency and accuracy of pre-trained NLP models.
We show that ConCoRD consistently boosts accuracy and consistency of off-the-shelf closed-book QA and VQA models.
arXiv Detail & Related papers (2022-11-21T21:58:30Z)
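A sketch of ConCoRD's core ingredient as described above: checking pairs of model outputs with an off-the-shelf NLI model. The full framework additionally optimises over answer choices under these pairwise constraints; the checkpoint below is one public MNLI model, not necessarily the paper's.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "roberta-large-mnli"  # one public NLI checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

premise = "A sparrow is a bird."
hypothesis = "A sparrow has wings."
inputs = tok(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(-1).squeeze()

# Label order for this checkpoint: contradiction, neutral, entailment.
for label, p in zip(["contradiction", "neutral", "entailment"], probs):
    print(label, float(p))
```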
- Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV).
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled ones.
We show that NPC-LV outperforms supervised methods on image classification on all three datasets in the low-data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z)
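A toy sketch of classification by compression, the recipe behind NPC-LV as summarised above. NPC-LV uses a trained latent-variable model as the compressor; gzip stands in here only to show the distance-plus-nearest-neighbour mechanics.

```python
import gzip

def clen(data: bytes) -> int:
    return len(gzip.compress(data))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# 1-nearest-neighbour classification under NCD, with toy "images" as bytes.
labelled = [(b"aaaaaaaabbbb", "A"), (b"ccccddddcccc", "B")]
query = b"aaaabbbbaaaa"
print(min(labelled, key=lambda item: ncd(query, item[0]))[1])  # -> "A"
```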
- Large-Margin Representation Learning for Texture Classification [67.94823375350433]
This paper presents a novel approach combining convolutional layers (CLs) and large-margin metric learning for training supervised models on small datasets for texture classification.
The experimental results on texture and histopathologic image datasets have shown that the proposed approach achieves competitive accuracy with lower computational cost and faster convergence when compared to equivalent CNNs.
arXiv Detail & Related papers (2022-06-17T04:07:45Z)
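A minimal sketch of large-margin metric learning on convolutional features, the combination named in the entry above; the paper's exact architecture and loss may differ, and a triplet margin loss is used here as a representative large-margin objective.

```python
import torch
import torch.nn as nn

embed = nn.Sequential(                      # tiny conv feature extractor
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> 16-dim embedding
)
loss_fn = nn.TripletMarginLoss(margin=1.0)  # enforces a large margin

anchor, positive, negative = (torch.randn(8, 3, 32, 32) for _ in range(3))
loss = loss_fn(embed(anchor), embed(positive), embed(negative))
loss.backward()  # train so same-texture pairs end up closer by a margin
print(float(loss))
```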
- Embarrassingly Simple Performance Prediction for Abductive Natural Language Inference [10.536415845097661]
We propose a method for predicting the performance of NLI models without fine-tuning them.
We show that the accuracy of the cosine similarity approach correlates strongly with that of the classification approach, with a Pearson correlation coefficient of 0.65.
Our method can lead to significant time savings in the process of model selection.
arXiv Detail & Related papers (2022-02-21T18:10:24Z)
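A sketch of the cosine-similarity approach from the entry above: embed the context and each candidate hypothesis, then pick the more similar one. The embedding model below is a stand-in, not necessarily the paper's choice.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in encoder

context = "The car would not start this morning. It started fine after noon."
hypotheses = ["The battery was recharged at midday.", "The car was sold."]

ctx_emb = model.encode(context, convert_to_tensor=True)
hyp_embs = model.encode(hypotheses, convert_to_tensor=True)
scores = util.cos_sim(ctx_emb, hyp_embs)[0]
print(hypotheses[int(scores.argmax())])  # higher-similarity hypothesis wins
```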
- Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks [86.10875837475783]
Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions.
Existing neural models have been shown to lack this basic ability in learning symbolic structures.
We propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics.
arXiv Detail & Related papers (2021-09-30T16:41:19Z)
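A generic sketch of training with auxiliary sequence prediction, the mechanism the entry above proposes; the paper's specific auxiliary targets (tracking function and argument semantics) are richer than the random stand-ins used here.

```python
import torch
import torch.nn as nn

vocab, hidden = 50, 32
encoder = nn.Embedding(vocab, hidden)   # stand-in for a Transformer encoder
main_head = nn.Linear(hidden, vocab)    # predicts the main output sequence
aux_head = nn.Linear(hidden, vocab)     # predicts the auxiliary sequence
ce = nn.CrossEntropyLoss()

x = torch.randint(0, vocab, (4, 7))       # batch of input token ids
y_main = torch.randint(0, vocab, (4, 7))  # main-task targets
y_aux = torch.randint(0, vocab, (4, 7))   # auxiliary-task targets

h = encoder(x)
loss = (ce(main_head(h).transpose(1, 2), y_main)
        + 0.5 * ce(aux_head(h).transpose(1, 2), y_aux))  # joint objective
loss.backward()
print(float(loss))
```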
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
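A sketch of the word-reordering noise compared in the entry above: each token may drift a bounded distance, so the noised input still resembles a full sentence, unlike a masked one. The exact noise schedule is an illustrative guess.

```python
import random

def local_shuffle(tokens, max_dist=3, seed=0):
    """Reorder tokens so that each moves at most max_dist positions."""
    rng = random.Random(seed)
    keyed = [(i + rng.uniform(0, max_dist), t) for i, t in enumerate(tokens)]
    return [t for _, t in sorted(keyed)]

source = "the quick brown fox jumps over the lazy dog".split()
noised = local_shuffle(source)
print(" ".join(noised))  # the decoder is trained to reconstruct `source`
```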
- Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields results comparable to or better than state-of-the-art zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)
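A speculative sketch of the factorization idea above: parameters are generated from separate task and language latent vectors, so an unseen task-language pair can be composed zero-shot. The paper treats these latents as random variables with inferred posteriors; only the deterministic skeleton, with hypothetical dimensions, is shown here.

```python
import torch
import torch.nn as nn

d_task, d_lang, n_params = 8, 8, 64
task_emb = nn.Embedding(3, d_task)     # latent vectors for 3 tasks
lang_emb = nn.Embedding(30, d_lang)    # latent vectors for 30 languages
generator = nn.Linear(d_task + d_lang, n_params)  # latents -> parameters

# Compose parameters for a task-language pair never seen together in training.
z = torch.cat([task_emb(torch.tensor([0])), lang_emb(torch.tensor([17]))], -1)
theta = generator(z)
print(theta.shape)  # torch.Size([1, 64])
```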
This list is automatically generated from the titles and abstracts of the papers on this site.