Benchmarking Harmonized Tariff Schedule Classification Models
- URL: http://arxiv.org/abs/2412.14179v1
- Date: Wed, 04 Dec 2024 16:29:05 GMT
- Title: Benchmarking Harmonized Tariff Schedule Classification Models
- Authors: Bryce Judy,
- Abstract summary: The study evaluates several industry-leading solutions, including those provided by Zonos, Tarifflo, Avalara, and WCO BACUDA.
Results highlight areas for industry-wide improvement and innovation.
- Score: 0.0
- License:
- Abstract: The Harmonized Tariff System (HTS) classification industry, essential to e-commerce and international trade, currently lacks standardized benchmarks for evaluating the effectiveness of classification solutions. This study establishes and tests a benchmark framework for imports to the United States, inspired by the benchmarking approaches used in language model evaluation, to systematically compare prominent HTS classification tools. The framework assesses key metrics--such as speed, accuracy, rationality, and HTS code alignment--to provide a comprehensive performance comparison. The study evaluates several industry-leading solutions, including those provided by Zonos, Tarifflo, Avalara, and WCO BACUDA, identifying each tool's strengths and limitations. Results highlight areas for industry-wide improvement and innovation, paving the way for more effective and standardized HTS classification solutions across the international trade and e-commerce sectors.
Related papers
- Scoring Verifiers: Evaluating Synthetic Verification in Code and Reasoning [59.25951947621526]
We introduce benchmarks designed to evaluate the impact of synthetic verification methods on assessing solution correctness.
We analyze synthetic verification methods in standard, reasoning-based, and reward-based LLMs.
Our results show that recent reasoning models significantly improve test case generation and that scaling test cases enhances verification accuracy.
arXiv Detail & Related papers (2025-02-19T15:32:11Z) - TextClass Benchmark: A Continuous Elo Rating of LLMs in Social Sciences [0.0]
The TextClass Benchmark project aims to provide a comprehensive, fair, and dynamic evaluation of LLMs and transformers for text classification tasks.
This evaluation spans various domains and languages in social sciences disciplines engaged in NLP and text-as-data approach.
The leaderboards present performance metrics and relative ranking using a tailored Elo rating system.
arXiv Detail & Related papers (2024-11-30T17:09:49Z) - LocalValueBench: A Collaboratively Built and Extensible Benchmark for Evaluating Localized Value Alignment and Ethical Safety in Large Language Models [0.0]
The proliferation of large language models (LLMs) requires robust evaluation of their alignment with local values and ethical standards.
textscLocalValueBench is a benchmark designed to assess LLMs' adherence to Australian values.
arXiv Detail & Related papers (2024-07-27T05:55:42Z) - Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition [70.60872754129832]
First NeurIPS competition on unlearning sought to stimulate the development of novel algorithms.
Nearly 1,200 teams from across the world participated.
We analyze top solutions and delve into discussions on benchmarking unlearning.
arXiv Detail & Related papers (2024-06-13T12:58:00Z) - Don't Make Your LLM an Evaluation Benchmark Cheater [142.24553056600627]
Large language models(LLMs) have greatly advanced the frontiers of artificial intelligence, attaining remarkable improvement in model capacity.
To assess the model performance, a typical approach is to construct evaluation benchmarks for measuring the ability level of LLMs.
We discuss the potential risk and impact of inappropriately using evaluation benchmarks and misleadingly interpreting the evaluation results.
arXiv Detail & Related papers (2023-11-03T14:59:54Z) - Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation [2.024620791810963]
This study benchmarks the performance of Prompt Tuning and baselines for multi-label text classification.
It is applied to classifying companies into an investment firm's proprietary industry taxonomy.
We confirm that the model's performance is consistent across both well-known and less-known companies.
arXiv Detail & Related papers (2023-09-21T13:45:32Z) - A Review of Benchmarks for Visual Defect Detection in the Manufacturing
Industry [63.52264764099532]
We propose a study of existing benchmarks to compare and expose their characteristics and their use-cases.
A study of industrial metrics requirements, as well as testing procedures, will be presented and applied to the studied benchmarks.
arXiv Detail & Related papers (2023-05-05T07:44:23Z) - An Ensemble-based approach for assigning text to correct Harmonized
system code [2.365702128814616]
Harmonized System (HS) is the most standardized numerical method of classifying traded products among industry classification systems.
A hierarchical ensemble model comprising of Bert- transformer, NER, distance-based approaches, and knowledge-graphs have been developed to address scalability, coverage, ability to capture nuances, automation and auditing requirements when classifying unknown text-descriptions as per HS method.
arXiv Detail & Related papers (2022-11-08T15:32:36Z) - Vote'n'Rank: Revision of Benchmarking with Social Choice Theory [7.224599819499157]
This paper proposes Vote'n'Rank, a framework for ranking systems in multi-task benchmarks under the principles of the social choice theory.
We demonstrate that our approach can be efficiently utilised to draw new insights on benchmarking in several ML sub-fields.
arXiv Detail & Related papers (2022-10-11T20:19:11Z) - fairlib: A Unified Framework for Assessing and Improving Classification
Fairness [66.27822109651757]
fairlib is an open-source framework for assessing and improving classification fairness.
We implement 14 debiasing methods, including pre-processing, at-training-time, and post-processing approaches.
The built-in metrics cover the most commonly used fairness criterion and can be further generalized and customized for fairness evaluation.
arXiv Detail & Related papers (2022-05-04T03:50:23Z) - CRACT: Cascaded Regression-Align-Classification for Robust Visual
Tracking [97.84109669027225]
We introduce an improved proposal refinement module, Cascaded Regression-Align- Classification (CRAC)
CRAC yields new state-of-the-art performances on many benchmarks.
In experiments on seven benchmarks including OTB-2015, UAV123, NfS, VOT-2018, TrackingNet, GOT-10k and LaSOT, our CRACT exhibits very promising results in comparison with state-of-the-art competitors.
arXiv Detail & Related papers (2020-11-25T02:18:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.