Solution for the EPO CodeFest on Green Plastics: Hierarchical
multi-label classification of patents relating to green plastics using deep
learning
- URL: http://arxiv.org/abs/2302.13784v1
- Date: Wed, 22 Feb 2023 19:06:58 GMT
- Authors: Tingting Qiao, Gonzalo Moro Perez
- Abstract summary: This work addresses hierarchical multi-label classification of patents disclosing technologies related to green plastics.
We first propose a classification scheme for this technology and a way to train a machine learning model that classifies patents into the proposed scheme.
To achieve this, we devise a strategy to automatically assign labels to patents, creating a labeled training dataset that can be used to train a classification model in a supervised learning setting.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This work addresses hierarchical multi-label classification of patents
disclosing technologies related to green plastics. This is an emerging field
for which there is currently no classification scheme, and hence, no labeled
data is available, making this task particularly challenging. We first propose
a classification scheme for this technology and a way to train a machine
learning model that classifies patents into the proposed scheme. To
achieve this, we devise a strategy to automatically assign labels to
patents, creating a labeled training dataset that can be used to train
a classification model in a supervised learning setting. Using this training
dataset, we build two classification models, a SciBERT Neural Network
(SBNN) model and a SciBERT Hierarchical Neural Network (SBHNN) model. Both
models use a BERT model as a feature extractor with a neural
network classifier on top. We carry out extensive experiments and report commonly
used evaluation metrics for this challenging classification problem. The experimental
results verify the validity of our approach and show that our model sets a very
strong benchmark for this problem. We also interpret our models by visualizing
the word importance given by the trained model, which indicates that the model is
capable of extracting high-level semantic information from input documents. Finally,
we highlight how our solution fulfills the evaluation criteria for the EPO
CodeFest and we also outline possible directions for future work. Our code has
been made available at https://github.com/epo/CF22-Green-Hands
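As an illustration of the hierarchical multi-label setting described in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a two-level label tree in which tagging a patent with a child label implies its parent, and it keeps a predicted label only if all of its ancestors are predicted too. The label names, the `PARENT` map, the 0.5 threshold, and the helper functions are all hypothetical and are not taken from the paper or its repository.

```python
# Hypothetical two-level scheme; these label names are illustrative only
# and are NOT the classification scheme proposed in the paper.
PARENT = {
    "recycling": None,
    "bioplastics": None,
    "recycling/mechanical": "recycling",
    "recycling/chemical": "recycling",
    "bioplastics/bio-based": "bioplastics",
}
LABELS = sorted(PARENT)  # fixed label order for the target vector


def with_ancestors(labels):
    """Return the given labels together with all of their ancestors."""
    expanded = set()
    for label in labels:
        while label is not None:
            expanded.add(label)
            label = PARENT[label]
    return expanded


def multi_hot(labels):
    """Encode a label set as a 0/1 vector, switching on ancestors as well."""
    active = with_ancestors(labels)
    return [1 if name in active else 0 for name in LABELS]


def consistent_predictions(scores, threshold=0.5):
    """Keep labels scoring at or above the threshold whose ancestors all do too.

    `scores` maps label name to a per-label probability, such as the sigmoid
    outputs of a multi-label classifier head (assumed here, not confirmed).
    """
    passed = {name for name, score in scores.items() if score >= threshold}
    return {name for name in passed if with_ancestors({name}) <= passed}
```

For example, `multi_hot({"recycling/chemical"})` switches on both the child and its parent `recycling`, and `consistent_predictions` drops a confident child label whose parent falls below the threshold.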
Related papers
- Adaptable Embeddings Network (AEN) [49.1574468325115]
We introduce Adaptable Embeddings Networks (AEN), a novel dual-encoder architecture using Kernel Density Estimation (KDE).
AEN allows for runtime adaptation of classification criteria without retraining and is non-autoregressive.
The architecture's ability to preprocess and cache condition embeddings makes it ideal for edge computing applications and real-time monitoring systems.
arXiv Detail & Related papers (2024-11-21T02:15:52Z) - Language Models are Graph Learners [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, including Graph Neural Networks (GNNs) and Graph Transformers (GTs).
We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art GNNs on node classification tasks.
arXiv Detail & Related papers (2024-10-03T08:27:54Z) - Exploring Beyond Logits: Hierarchical Dynamic Labeling Based on Embeddings for Semi-Supervised Classification [49.09505771145326]
We propose a Hierarchical Dynamic Labeling (HDL) algorithm that does not depend on model predictions and utilizes image embeddings to generate sample labels.
Our approach has the potential to change the paradigm of pseudo-label generation in semi-supervised learning.
arXiv Detail & Related papers (2024-04-26T06:00:27Z) - Label-Retrieval-Augmented Diffusion Models for Learning from Noisy
Labels [61.97359362447732]
Learning from noisy labels is an important and long-standing problem in machine learning for real applications.
In this paper, we reformulate the label-noise problem from a generative-model perspective.
Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets.
arXiv Detail & Related papers (2023-05-31T03:01:36Z) - Neuro-symbolic Rule Learning in Real-world Classification Tasks [75.0907310059298]
We extend pix2rule's neural DNF module to support rule learning in real-world multi-class and multi-label classification tasks.
We propose a novel extended model called neural DNF-EO (Exactly One) which enforces mutual exclusivity in multi-class classification.
arXiv Detail & Related papers (2023-03-29T13:27:14Z) - ELFIS: Expert Learning for Fine-grained Image Recognition Using Subsets [6.632855264705276]
We propose ELFIS, an expert learning framework for Fine-Grained Visual Recognition.
A set of neural network-based experts is trained on the meta-categories and integrated into a multi-task framework.
Experiments show improvements on the SoTA FGVR benchmarks of up to +1.3% accuracy using both CNNs and transformer-based networks.
arXiv Detail & Related papers (2023-03-16T12:45:19Z) - Semi-supervised classification using a supervised autoencoder for
biomedical applications [2.578242050187029]
We create a network architecture that encodes labels into the latent space of an autoencoder.
We classify unlabelled samples using the learned network.
arXiv Detail & Related papers (2022-08-22T13:51:00Z) - The Care Label Concept: A Certification Suite for Trustworthy and
Resource-Aware Machine Learning [5.684803689061448]
Machine learning applications have become ubiquitous. This has led to an increased effort to make machine learning trustworthy.
For those who do not want to invest time into understanding the method or the learned model, we offer care labels.
Care labels are the result of a certification suite that tests whether stated guarantees hold.
arXiv Detail & Related papers (2021-06-01T14:16:41Z) - Highly Efficient Representation and Active Learning Framework for
Imbalanced Data and its Application to COVID-19 X-Ray Classification [0.7829352305480284]
We propose a highly data-efficient classification and active learning framework for classifying chest X-rays.
It is based on (1) unsupervised representation learning of a Convolutional Neural Network and (2) the Gaussian Process method.
We demonstrate that only $\sim 10\%$ of the labeled data is needed to reach the accuracy from training all available labels.
arXiv Detail & Related papers (2021-02-25T02:48:59Z) - Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We achieve new state-of-the-art results in both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)