Named Entity Recognition for Monitoring Plant Health Threats in Tweets:
a ChouBERT Approach
- URL: http://arxiv.org/abs/2310.12522v1
- Date: Thu, 19 Oct 2023 06:54:55 GMT
- Authors: Shufan Jiang (CRESTIC, ISEP), Rafael Angarita (ISEP), Stéphane
Cormier (CRESTIC), Francis Rousseaux (CRESTIC)
- Abstract summary: ChouBERT is a pre-trained language model that can identify Tweets concerning observations of plant health issues with generalizability on unseen natural hazards.
This paper tackles the lack of labelled data by further studying ChouBERT's know-how on token-level annotation tasks over small labeled sets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An important application scenario of precision agriculture is detecting and
measuring crop health threats using sensors and data analysis techniques.
However, the textual data are still under-explored among the existing solutions
due to the lack of labelled data and fine-grained semantic resources. Recent
research suggests that the increasing connectivity of farmers and the emergence
of online farming communities make social media like Twitter a participatory
platform for detecting unfamiliar plant health events if we can extract
essential information from unstructured textual data. ChouBERT is a French
pre-trained language model that can identify Tweets concerning observations of
plant health issues with generalizability on unseen natural hazards. This paper
tackles the lack of labelled data by further studying ChouBERT's know-how on
token-level annotation tasks over small labeled sets.
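To illustrate what a token-level annotation task looks like, here is a minimal sketch that converts entity spans into per-token BIO tags, a common encoding for named entity recognition. It is not taken from the paper: the function name `spans_to_bio`, the example tweet, and the `PEST` and `CROP` labels are all hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert entity spans over token indices into BIO tags.

    Each span is (start, end_exclusive, label). Tokens outside any span get "O".
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span out of range: {(start, end, label)}")
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # continuation tokens
    return tags


# Hypothetical French tweet, already split into tokens (assumption, not paper data).
tokens = ["Attaque", "de", "pucerons", "sur", "mes", "betteraves", "ce", "matin"]
# Hypothetical labels: PEST and CROP are illustrative, not the paper's tag set.
spans = [(2, 3, "PEST"), (5, 6, "CROP")]

tags = spans_to_bio(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A token classifier fine-tuned on such data would predict one tag per token. With a subword tokenizer, the tags would also need to be aligned to subword pieces, which this sketch does not cover.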
Related papers
- UCE-FID: Using Large Unlabeled, Medium Crowdsourced-Labeled, and Small
Expert-Labeled Tweets for Foodborne Illness Detection [8.934980946374367]
We propose EGAL, a deep learning framework for foodborne illness detection.
EGAL uses small expert-labeled tweets augmented by crowdsourced-labeled and massive unlabeled data.
EGAL has the potential to be deployed for real-time analysis of tweet streaming, contributing to foodborne illness outbreak surveillance efforts.
arXiv Detail & Related papers (2023-12-02T21:03:23Z)
- Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z) - Foveate, Attribute, and Rationalize: Towards Physically Safe and
Trustworthy AI [76.28956947107372]
Covertly unsafe text is an area of particular interest, as such text may arise from everyday scenarios and is challenging to detect as harmful.
We propose FARM, a novel framework leveraging external knowledge for trustworthy rationale generation in the context of safety.
Our experiments show that FARM obtains state-of-the-art results on the SafeText dataset, showing absolute improvement in safety classification accuracy by 5.9%.
arXiv Detail & Related papers (2022-12-19T17:51:47Z) - Smart Agriculture : A Novel Multilevel Approach for Agricultural Risk
Assessment over Unstructured Data [0.5735035463793008]
Uncertainty refers to a state of not knowing what will happen in the future.
This paper aims to leverage natural language processing and machine learning techniques to model uncertainties and evaluate the risk level in each uncertainty cluster using massive text data.
arXiv Detail & Related papers (2022-11-22T16:47:47Z) - CHEF: A Pilot Chinese Dataset for Evidence-Based Fact-Checking [55.75590135151682]
CHEF is the first CHinese Evidence-based Fact-checking dataset of 10K real-world claims.
The dataset covers multiple domains, ranging from politics to public health, and provides annotated evidence retrieved from the Internet.
arXiv Detail & Related papers (2022-06-06T09:11:03Z) - TWEET-FID: An Annotated Dataset for Multiple Foodborne Illness Detection
Tasks [14.523433519237607]
Foodborne illness is a serious but preventable public health problem.
There is a dearth of labeled datasets for developing effective outbreak detection models.
We present TWEET-FID, the first publicly available annotated dataset for foodborne illness incident detection tasks.
arXiv Detail & Related papers (2022-05-22T03:47:18Z) - Graph-based Joint Pandemic Concern and Relation Extraction on Twitter [19.7176519744206]
Public concern detection provides potential guidance to the authorities for crisis management before or during a pandemic outbreak.
Detecting concerns in time from massive information in social media turns out to be a big challenge.
We propose a novel end-to-end deep learning model to identify people's concerns and the corresponding relations.
arXiv Detail & Related papers (2021-06-18T06:06:35Z) - Potato Crop Stress Identification in Aerial Images using Deep
Learning-based Object Detection [60.83360138070649]
The paper presents an approach for analyzing aerial images of a potato crop using deep neural networks.
The main objective is to demonstrate automated spatial recognition of a healthy versus stressed crop at a plant level.
Experimental validation demonstrated the ability for distinguishing healthy and stressed plants in field images, achieving an average Dice coefficient of 0.74.
arXiv Detail & Related papers (2021-06-14T21:57:40Z) - Reconciling Security and Utility in Next-Generation Epidemic Risk Mitigation Systems [49.05741109401773]
We present Silmarillion, a system that reconciles users' privacy with rich data collection for higher utility.
In Silmarillion, user devices record Bluetooth encounters with beacons installed in strategic locations.
We describe the design of Silmarillion and its communication protocols that ensure user privacy and data security.
arXiv Detail & Related papers (2020-11-16T16:19:37Z) - Sensitive Information Detection: Recursive Neural Networks for Encoding
Context [0.20305676256390928]
The leak of sensitive information can potentially be very costly.
We show that simplistic, brittle rule sets for detecting sensitive information only find a small fraction of the actual sensitive information.
We develop a novel family of sensitive information detection approaches which only assumes access to labeled examples.
arXiv Detail & Related papers (2020-08-25T07:49:46Z) - Leveraging Multi-Source Weak Social Supervision for Early Detection of
Fake News [67.53424807783414]
Social media has greatly enabled people to participate in online activities at an unprecedented rate.
This unrestricted access also exacerbates the spread of misinformation and fake news online, which might cause confusion and chaos unless detected early and mitigated.
We jointly leverage the limited amount of clean data along with weak signals from social engagements to train deep neural networks in a meta-learning framework to estimate the quality of different weak instances.
Experiments on real-world datasets demonstrate that the proposed framework outperforms state-of-the-art baselines for early detection of fake news without using any user engagements at prediction time.
arXiv Detail & Related papers (2020-04-03T18:26:33Z)