Fine-tuning BERT-based models for Plant Health Bulletin Classification
- URL: http://arxiv.org/abs/2102.00838v1
- Date: Fri, 29 Jan 2021 08:14:35 GMT
- Title: Fine-tuning BERT-based models for Plant Health Bulletin Classification
- Authors: Shufan Jiang (CRESTIC, ISEP), Rafael Angarita (ISEP), Stephane Cormier
(CRESTIC), Francis Rousseaux (CRESTIC)
- Abstract summary: French Plant Health Bulletins (BSV) give information about the development stages of phytosanitary risks in agricultural production.
They are written in natural language, so machines and humans cannot exploit them as efficiently as they could.
Recent advancements such as Bidirectional Encoder Representations from Transformers (BERT) inspire us to rethink knowledge representation and natural language understanding in the plant health management domain.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the era of digitization, different actors in agriculture produce
numerous data. Such data already contains latent historical knowledge of the
domain. This knowledge enables us to study natural hazards precisely, at both
global and local scales, and then to improve risk prevention and increase
yields, which helps to tackle the challenge of a growing population and
changing dietary habits. In particular, French Plant Health Bulletins (BSV,
after their French name Bulletin de Santé du Végétal) give information about
the development stages of phytosanitary risks in agricultural production.
However, they are written in natural language, so machines and humans cannot
exploit them as efficiently as they could. Natural language processing (NLP)
technologies aim to automatically process and analyze large amounts of natural
language data. Since the 2010s, with the increases in computational power and
parallelization, representation learning and deep learning methods have become
widespread in NLP. Recent advancements such as Bidirectional Encoder
Representations from Transformers (BERT) inspire us to rethink knowledge
representation and natural language understanding in the plant health
management domain. The goal of this work is to propose a BERT-based approach to
automatically classify the BSV so that their data is easily indexable. We
sampled 200 BSV to fine-tune pretrained BERT language models and classify the
bulletins as pest and/or disease, and we show preliminary results.
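As a concrete illustration of the approach the abstract describes, here is a
minimal fine-tuning sketch, not the authors' implementation. It assumes the
Hugging Face transformers library and camembert-base as the pretrained French
checkpoint; the toy bulletin excerpts, labels, and hyperparameters are
hypothetical, since the paper does not specify them.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical excerpts standing in for sampled BSV text; targets are
# multi-label [pest, disease] because a bulletin can report both risks.
texts = [
    "Presence de pucerons observee sur ble tendre.",     # pest report
    "Symptomes de septoriose sur les feuilles de ble.",  # disease report
]
labels = torch.tensor([[1.0, 0.0], [0.0, 1.0]])

tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "camembert-base",
    num_labels=2,
    problem_type="multi_label_classification",  # sigmoid + BCE loss per label
)

enc = tokenizer(texts, padding=True, truncation=True, max_length=512,
                return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few passes over the toy batch
    loss = model(**enc, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Each label gets an independent sigmoid, so "pest and/or disease" falls out
# of thresholding the two probabilities separately.
model.eval()
with torch.no_grad():
    print(torch.sigmoid(model(**enc).logits))

The multi-label head, rather than a softmax over mutually exclusive classes,
matches the pest and/or disease framing: one bulletin can carry both tags.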
Related papers
- Detecting AI Generated Text Based on NLP and Machine Learning Approaches [0.0]
Recent advances in natural language processing may enable AI models to generate writing that is indistinguishable from human-written text.
This might have profound ethical, legal, and social repercussions.
Our approach includes machine learning methods that can differentiate between machine-generated text and human-written text.
arXiv Detail & Related papers (2024-04-15T16:37:44Z)
- Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z)
- Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models [48.07083163501746]
Clinical natural language processing requires methods that can address domain-specific challenges.
We propose an innovative, resource-efficient approach, ClinGen, which infuses knowledge into the data generation process.
Our empirical study across 7 clinical NLP tasks and 16 datasets reveals that ClinGen consistently enhances performance across various tasks.
arXiv Detail & Related papers (2023-11-01T04:37:28Z)
- UMLS-KGI-BERT: Data-Centric Knowledge Integration in Transformers for Biomedical Entity Recognition [4.865221751784403]
This work contributes a data-centric paradigm for enriching the language representations of biomedical transformer-encoder LMs by extracting text sequences from the UMLS.
Preliminary results from experiments in the extension of pre-trained LMs as well as training from scratch show that this framework improves downstream performance on multiple biomedical and clinical Named Entity Recognition (NER) tasks.
arXiv Detail & Related papers (2023-07-20T18:08:34Z)
- Fine-Tuning Large Neural Language Models for Biomedical Natural Language Processing [55.52858954615655]
We conduct a systematic study on fine-tuning stability in biomedical NLP.
We show that fine-tuning performance may be sensitive to pretraining settings, especially in low-resource domains.
We show that stabilization techniques can substantially improve fine-tuning performance for low-resource biomedical NLP applications.
arXiv Detail & Related papers (2021-12-15T04:20:35Z)
- Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey [67.82942975834924]
Large, pre-trained language models such as BERT have drastically changed the Natural Language Processing (NLP) field.
We present a survey of recent work that uses these large language models to solve NLP tasks via pre-training then fine-tuning, prompting, or text generation approaches.
arXiv Detail & Related papers (2021-11-01T20:08:05Z)
- DRILL: Dynamic Representations for Imbalanced Lifelong Learning [15.606651610221416]
Continual or lifelong learning remains a long-standing challenge in machine learning.
We introduce DRILL, a novel continual learning architecture for open-domain text classification.
arXiv Detail & Related papers (2021-05-18T11:36:37Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z)
- Neural Language Generation: Formulation, Methods, and Evaluation [13.62873478165553]
Recent advances in neural network-based generative modeling have reignited the hopes in having computer systems capable of seamlessly conversing with humans.
High-capacity deep learning models trained on large-scale datasets demonstrate unparalleled abilities to learn patterns in the data, even in the absence of explicit supervision signals.
There is no standard way to assess the quality of text produced by these generative models, which constitutes a serious bottleneck towards the progress of the field.
arXiv Detail & Related papers (2020-07-31T00:08:28Z)
- Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [73.37262264915739]
We show that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains.
Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks (the masked-language-model objective behind such pretraining is sketched below).
arXiv Detail & Related papers (2020-07-31T00:04:15Z)
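The last entry above motivates domain-specific pretraining; in the same
spirit, here is a minimal sketch of the masked-language-model (MLM) objective
such pretraining optimizes, assuming the Hugging Face transformers library.
The paper pretrains from scratch on biomedical text, whereas this sketch, for
brevity, applies one MLM step to placeholder plant-health sentences starting
from the generic bert-base-uncased checkpoint.

import torch
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical in-domain sentences standing in for a large unlabeled corpus.
corpus = [
    "Aphid populations increased on winter wheat this week.",
    "Septoria leaf blotch symptoms were reported in several plots.",
]
enc = tokenizer(corpus, padding=True, truncation=True, max_length=128)
examples = [{"input_ids": ids} for ids in enc["input_ids"]]

# The collator randomly masks 15% of tokens and builds the MLM labels.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
batch = collator(examples)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
loss = model(**batch).loss  # one illustrative optimization step
loss.backward()
optimizer.step()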
This list is automatically generated from the titles and abstracts of the papers in this site.