An Ensemble-based approach for assigning text to correct Harmonized
system code
- URL: http://arxiv.org/abs/2211.04313v1
- Date: Tue, 8 Nov 2022 15:32:36 GMT
- Authors: Shubham, Avinash Arya, Subarna Roy, Sridhar Jonnala
- Abstract summary: The Harmonized System (HS) is the most standardized numerical method of classifying traded products among industry classification systems.
A hierarchical ensemble model comprising a BERT transformer, NER, distance-based approaches, and knowledge graphs has been developed to address scalability, coverage, the ability to capture nuances, automation, and auditing requirements when classifying unknown text descriptions under the HS method.
- Score: 2.365702128814616
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Industries must follow government rules and regulations around the world to
classify products when assessing duties and taxes for international shipment.
The Harmonized System (HS) is the most standardized numerical method of
classifying traded products among industry classification systems. A
hierarchical ensemble model comprising a BERT transformer, NER, distance-based
approaches, and knowledge graphs has been developed to address scalability,
coverage, the ability to capture nuances, automation, and auditing requirements
when classifying unknown text descriptions under the HS method.
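The abstract does not give implementation details, so the following is only a minimal sketch of what a hierarchical ensemble over HS codes could look like. It relies on the standard HS layout (a 2-digit chapter, a 4-digit heading, and a 6-digit subheading). Everything else is an assumption made for illustration: the toy `CATALOGUE`, its descriptions, the two simple scorers that stand in for the paper's BERT, NER, distance-based, and knowledge-graph components, and the equal weights.

```python
from collections import defaultdict

# Hypothetical toy catalogue: 6-digit HS-style codes mapped to reference text.
# The descriptions are illustrative placeholders, not official HS wording.
CATALOGUE = {
    "090111": "raw coffee beans not decaffeinated",
    "090121": "roasted coffee beans not decaffeinated",
    "090210": "green tea in small packings",
    "851712": "telephones for cellular networks mobile phones",
}


def tokens(text):
    """Lowercased whitespace tokens as a set."""
    return set(text.lower().split())


def jaccard_score(query, reference):
    """Distance-style component: token-set overlap in [0, 1]."""
    q, r = tokens(query), tokens(reference)
    return len(q & r) / len(q | r) if q | r else 0.0


def keyword_score(query, reference):
    """Crude stand-in for an entity-based component:
    fraction of query tokens that appear in the reference text."""
    q, r = tokens(query), tokens(reference)
    return len(q & r) / len(q) if q else 0.0


# (scorer, weight) pairs; the equal weights are an arbitrary assumption.
COMPONENTS = [(jaccard_score, 0.5), (keyword_score, 0.5)]


def ensemble_score(query, reference):
    """Weighted sum of all component scores."""
    return sum(weight * scorer(query, reference) for scorer, weight in COMPONENTS)


def classify(query):
    """Narrow level by level: chapter (2 digits), heading (4), subheading (6).

    Returns the chosen 6-digit code and the list of (prefix, score)
    decisions made at each level, which can serve as a simple audit trail.
    """
    candidates = list(CATALOGUE)
    trail = []
    for width in (2, 4, 6):
        best = defaultdict(float)
        for code in candidates:
            prefix = code[:width]
            best[prefix] = max(best[prefix], ensemble_score(query, CATALOGUE[code]))
        chosen = max(best, key=best.get)
        trail.append((chosen, round(best[chosen], 3)))
        candidates = [c for c in candidates if c.startswith(chosen)]
    return trail[-1][0], trail


code, trail = classify("roasted coffee beans")
print(code)                   # 090121
print([p for p, _ in trail])  # ['09', '0901', '090121']
```

Deciding one level at a time keeps each step small and records which chapter and heading were chosen before the final subheading, which is one plausible way to support the auditing requirement the abstract mentions.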
Related papers
- An Open Knowledge Graph-Based Approach for Mapping Concepts and Requirements between the EU AI Act and International Standards [1.9142148274342772]
The EU's AI Act will shift the focus of such organizations toward conformance with the technical requirements for regulatory compliance.
This paper offers a simple and repeatable mechanism for mapping the terms and requirements relevant to normative statements in regulations and standards.
arXiv Detail & Related papers (2024-08-21T18:21:09Z)
- Learnable Item Tokenization for Generative Recommendation [78.30417863309061]
We propose LETTER (a LEarnable Tokenizer for generaTivE Recommendation), which integrates hierarchical semantics, collaborative signals, and code assignment diversity.
LETTER incorporates Residual Quantized VAE for semantic regularization, a contrastive alignment loss for collaborative regularization, and a diversity loss to mitigate code assignment bias.
arXiv Detail & Related papers (2024-05-12T15:49:38Z)
- Towards Standards-Compliant Assistive Technology Product Specifications via LLMs [7.30389619012625]
We introduce CompliAT, a pioneering framework designed to streamline the compliance process of AT product specifications.
CompliAT addresses three critical tasks: checking terminology consistency, classifying products according to standards, and tracing key product specifications to standard requirements.
We propose a novel approach for product classification, leveraging a retrieval-augmented generation model to accurately categorize AT products aligning to international standards.
arXiv Detail & Related papers (2024-04-04T00:10:39Z)
- RulePrompt: Weakly Supervised Text Classification with Prompting PLMs and Self-Iterative Logical Rules [30.239044569301534]
Weakly supervised text classification (WSTC) has attracted increasing attention due to its applicability in classifying a mass of texts.
We propose a prompting PLM-based approach named RulePrompt for the WSTC task, consisting of a rule mining module and a rule-enhanced pseudo label generation module.
Our approach yields interpretable category rules, proving its advantage in disambiguating easily-confused categories.
arXiv Detail & Related papers (2024-03-05T12:50:36Z)
- Gen-Z: Generative Zero-Shot Text Classification with Contextualized Label Descriptions [50.92702206798324]
We propose a generative prompting framework for zero-shot text classification.
GEN-Z measures the LM likelihood of input text conditioned on natural language descriptions of labels.
We show that zero-shot classification with simple contextualization of the data source consistently outperforms both zero-shot and few-shot baselines.
arXiv Detail & Related papers (2023-11-13T07:12:57Z)
- Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation [2.024620791810963]
This study benchmarks the performance of Prompt Tuning and baselines for multi-label text classification.
It is applied to classifying companies into an investment firm's proprietary industry taxonomy.
We confirm that the model's performance is consistent across both well-known and less-known companies.
arXiv Detail & Related papers (2023-09-21T13:45:32Z)
- Using novel data and ensemble models to improve automated labeling of Sustainable Development Goals [0.0]
A number of labeling systems based on text have been proposed to help monitor work on the United Nations (UN) Sustainable Development Goals.
We show that systems differ considerably in their sensitivity (i.e., true-positive rate) and specificity (i.e., true-negative rate).
We then show that an ensemble model that pools labeling systems alleviates some of these limitations, exceeding the labeling performance of all currently available systems.
arXiv Detail & Related papers (2023-01-25T07:44:46Z)
- Hybrid Rule-Neural Coreference Resolution System based on Actor-Critic Learning [53.73316523766183]
Coreference resolution systems need to tackle two main tasks.
One task is to detect all of the potential mentions; the other is to learn the linking of an antecedent for each possible mention.
We propose a hybrid rule-neural coreference resolution system based on actor-critic learning.
arXiv Detail & Related papers (2022-12-20T08:55:47Z)
- Learning Label Modular Prompts for Text Classification in the Wild [56.66187728534808]
We propose text classification in-the-wild, which introduces different non-stationary training/testing stages.
Decomposing a complex task into modular components can enable robust generalisation in such non-stationary environments.
We propose MODULARPROMPT, a label-modular prompt tuning framework for text classification tasks.
arXiv Detail & Related papers (2022-11-30T16:26:38Z)
- Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models [94.30953696090758]
We build compositional end-to-end spoken language understanding systems.
By relying on intermediate decoders trained for ASR, our end-to-end systems transform the input modality from speech to token-level representations.
Our models outperform both cascaded and direct end-to-end models on a labeling task of named entity recognition.
arXiv Detail & Related papers (2022-10-27T19:33:18Z)
- Interpretable Reinforcement Learning with Multilevel Subgoal Discovery [77.34726150561087]
We propose a novel Reinforcement Learning model for discrete environments.
In the model, an agent learns information about the environment in the form of probabilistic rules.
No reward function is required for learning; an agent only needs to be given a primary goal to achieve.
arXiv Detail & Related papers (2022-02-15T14:04:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.