RethinkCWS: Is Chinese Word Segmentation a Solved Task?
- URL: http://arxiv.org/abs/2011.06858v2
- Date: Wed, 9 Dec 2020 04:48:04 GMT
- Title: RethinkCWS: Is Chinese Word Segmentation a Solved Task?
- Authors: Jinlan Fu, Pengfei Liu, Qi Zhang, Xuanjing Huang
- Abstract summary: The performance of Chinese Word Segmentation (CWS) systems has gradually reached a plateau with the rapid development of deep neural networks.
In this paper, we take stock of what we have achieved and rethink what's left in the CWS task.
- Score: 81.11161697133095
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The performance of Chinese Word Segmentation (CWS) systems has
gradually reached a plateau with the rapid development of deep neural
networks, especially the successful use of large pre-trained models. In this
paper, we take stock of what we have achieved and rethink what is left in the
CWS task. Methodologically, we propose a fine-grained evaluation for existing
CWS systems, which not only allows us to diagnose the strengths and weaknesses
of existing models (under the in-dataset setting), but also enables us to
quantify the discrepancy between different segmentation criteria and alleviate
the negative transfer problem in multi-criteria learning. Strategically,
although we do not aim to propose a novel model in this paper, our
comprehensive experiments on eight models and seven datasets, together with a
thorough analysis, point to promising directions for future research. We make
all code publicly available and release an interface that can quickly evaluate
and diagnose users' models: https://github.com/neulab/InterpretEval.
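A minimal sketch of one form of fine-grained, attribute-bucketed scoring in the spirit of the evaluation described above: overall segmentation quality is broken down by an interpretable attribute of each gold word (here, word length). The bucketing attribute and helper functions are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of attribute-bucketed evaluation for CWS (illustrative only).
# Gold and predicted segmentations are given as lists of words per sentence.
from collections import defaultdict

def word_spans(words):
    """Convert a segmented sentence (list of words) to character spans."""
    spans, pos = [], 0
    for w in words:
        spans.append((pos, pos + len(w)))
        pos += len(w)
    return spans

def bucketed_recall(gold_sents, pred_sents, attribute=lambda start, end: end - start):
    """Recall of gold words, grouped by an attribute (default: word length)."""
    hit, total = defaultdict(int), defaultdict(int)
    for gold, pred in zip(gold_sents, pred_sents):
        pred_set = set(word_spans(pred))
        for span in word_spans(gold):
            bucket = attribute(*span)
            total[bucket] += 1
            hit[bucket] += span in pred_set
    return {b: hit[b] / total[b] for b in sorted(total)}

gold = [["我们", "喜欢", "自然", "语言", "处理"]]
pred = [["我们", "喜欢", "自然语言", "处理"]]
print(bucketed_recall(gold, pred))  # {2: 0.6} -- recall for 2-character words
```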
Related papers
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
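A minimal sketch of the equivalent-compute comparison described in the entry above: each model's training-token budget is derived from a fixed number of accelerator hours and its measured throughput. The throughput numbers below are made-up placeholders, not values from the paper.

```python
# Illustrative equivalent-compute budgeting (placeholder numbers, not from the paper).
def token_budget(accelerator_hours, tokens_per_second):
    """Tokens a model can train on within a fixed accelerator-hour budget."""
    return int(accelerator_hours * 3600 * tokens_per_second)

budget_hours = 6.0  # the same compute budget is given to every model
throughput = {"gpt2_style_ffn": 40_000, "lstm_10x": 25_000}  # measured tokens/sec (placeholders)
for name, tps in throughput.items():
    print(f"{name}: {token_budget(budget_hours, tps):,} training tokens")
```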
- Efficiently Robustify Pre-trained Models [18.392732966487582]
The robustness of large-scale models in real-world settings is still a relatively unexplored topic.
We first benchmark the performance of these models under different perturbations and datasets.
We then discuss how existing robustification schemes based on full model fine-tuning may not be a scalable option for very large networks.
arXiv Detail & Related papers (2023-09-14T08:07:49Z)
- Language Models for Novelty Detection in System Call Traces [0.27309692684728604]
This paper introduces a novelty detection methodology that relies on a probability distribution over sequences of system calls.
The proposed methodology requires minimal expert hand-crafting and achieves an F-score and AuROC greater than 95% on most novelties.
The source code and trained models are publicly available on GitHub while the datasets are available on Zenodo.
arXiv Detail & Related papers (2023-09-05T13:11:40Z)
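A minimal sketch of novelty detection from a probability distribution over system-call sequences, using a simple add-one-smoothed bigram model as a stand-in for the language models of the entry above; the example traces and the interpretation of scores are illustrative assumptions.

```python
# Illustrative bigram language model over system-call sequences (not the paper's model).
import math
from collections import Counter

class SyscallBigramLM:
    def __init__(self, traces):
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.vocab = set()
        for trace in traces:
            padded = ["<s>"] + trace
            self.vocab.update(padded)
            self.unigrams.update(padded[:-1])
            self.bigrams.update(zip(padded[:-1], padded[1:]))

    def neg_log_likelihood(self, trace):
        """Average negative log-probability per call; high values suggest novelty."""
        padded = ["<s>"] + trace
        v = len(self.vocab) + 1
        nll = 0.0
        for prev, cur in zip(padded[:-1], padded[1:]):
            p = (self.bigrams[(prev, cur)] + 1) / (self.unigrams[prev] + v)  # add-one smoothing
            nll -= math.log(p)
        return nll / len(trace)

normal = [["open", "read", "close"], ["open", "read", "read", "close"]]
lm = SyscallBigramLM(normal)
print(lm.neg_log_likelihood(["open", "read", "close"]))       # low score: familiar behaviour
print(lm.neg_log_likelihood(["socket", "connect", "execve"]))  # high score: likely novel
```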
- Frugal Reinforcement-based Active Learning [12.18340575383456]
We propose a novel active learning approach for label-efficient training.
The proposed method is iterative and aims at minimizing a constrained objective function that mixes diversity, representativity and uncertainty criteria.
We also introduce a novel weighting mechanism based on reinforcement learning, which adaptively balances these criteria at each training iteration.
arXiv Detail & Related papers (2022-12-09T14:17:45Z)
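A minimal sketch of mixing uncertainty, diversity, and representativity scores with adaptive weights when selecting samples to label, as in the entry above; the bandit-style weight update is only a stand-in for the reinforcement mechanism it describes, and all function names and constants are illustrative.

```python
# Illustrative acquisition scoring for active learning (stand-in for the paper's method).
import numpy as np

def select_batch(uncertainty, diversity, representativity, weights, k=8):
    """Pick the top-k unlabeled samples under a weighted mix of criteria."""
    scores = (weights[0] * uncertainty
              + weights[1] * diversity
              + weights[2] * representativity)
    return np.argsort(-scores)[:k]

def update_weights(weights, reward, chosen=0, lr=0.1):
    """Bandit-style update: boost the criterion credited with the observed reward."""
    weights = weights.copy()
    weights[chosen] += lr * reward
    weights = np.clip(weights, 1e-3, None)
    return weights / weights.sum()

rng = np.random.default_rng(0)
n = 100
weights = np.ones(3) / 3
batch = select_batch(rng.random(n), rng.random(n), rng.random(n), weights)
weights = update_weights(weights, reward=0.05, chosen=2)  # e.g. validation gain after labeling
print(batch, weights)
```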
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
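A minimal sketch of estimating a test metric with a Bayesian neural network, here approximated by Monte Carlo dropout: averaging each unlabeled input's mean predicted-class probability gives a rough accuracy estimate when the model is calibrated. The model and sampling scheme are assumptions for illustration, not the ALT-MAS procedure itself.

```python
# Illustrative MC-dropout estimate of accuracy on unlabeled test data (not ALT-MAS itself).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Dropout(0.3), nn.Linear(64, 5))

def estimate_accuracy(model, x_unlabeled, n_samples=50):
    """Expected accuracy = mean over inputs of the predicted class's mean probability."""
    model.train()  # keep dropout active so each forward pass is a posterior sample
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x_unlabeled), dim=-1)
                             for _ in range(n_samples)]).mean(0)
    return probs.max(dim=-1).values.mean().item()

x_unlabeled = torch.randn(200, 20)  # unlabeled test inputs (synthetic here)
print(f"estimated accuracy: {estimate_accuracy(model, x_unlabeled):.3f}")
```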
- 2nd Place Scheme on Action Recognition Track of ECCV 2020 VIPriors Challenges: An Efficient Optical Flow Stream Guided Framework [57.847010327319964]
We propose a data-efficient framework that can train the model from scratch on small datasets.
Specifically, by introducing a 3D central difference convolution operation, we propose a novel C3D neural network-based two-stream framework.
We show that our method achieves promising results even without a model pre-trained on large-scale datasets.
arXiv Detail & Related papers (2020-08-10T09:50:28Z)
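A minimal 2D sketch of central difference convolution (the entry above uses a 3D variant inside a C3D network). Subtracting the centre value from every position in the receptive field reduces to the vanilla convolution minus a 1x1 convolution whose weights are the kernel's per-channel sums; the theta value and layer sizes below are illustrative.

```python
# Illustrative 2D central difference convolution (the paper uses a 3D variant).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralDifferenceConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, theta=0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2, bias=False)
        self.theta = theta  # mixes the vanilla and central-difference terms

    def forward(self, x):
        out = self.conv(x)
        # sum_n w_n * (x(p0+pn) - x(p0)) = conv(x)(p0) - (sum_n w_n) * x(p0)
        kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)  # collapse to a 1x1 kernel
        center = F.conv2d(x, kernel_sum)
        return out - self.theta * center

layer = CentralDifferenceConv2d(3, 16)
print(layer(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 16, 32, 32])
```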
- SpaceNet: Make Free Space For Continual Learning [15.914199054779438]
We propose a novel architecture-based method, referred to as SpaceNet, for the class-incremental learning scenario.
SpaceNet trains sparse deep neural networks from scratch in an adaptive way that compresses the sparse connections of each task into a compact number of neurons.
Experimental results show the robustness of our proposed method against catastrophic forgetting of old tasks and the efficiency of SpaceNet in utilizing the available capacity of the model.
arXiv Detail & Related papers (2020-07-15T11:21:31Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
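A minimal sketch of the transductive prototype refinement described in the entry above: class prototypes computed from the support set are updated with query embeddings weighted by their confidence. In the paper the confidence weights are meta-learned, whereas here a fixed softmax temperature stands in for that learned component; all shapes and constants are illustrative.

```python
# Illustrative confidence-weighted prototype refinement (a fixed temperature stands in
# for the meta-learned confidence of the paper).
import torch

def refine_prototypes(support, support_labels, query, n_classes, temperature=10.0):
    """support: (Ns, d), query: (Nq, d); returns refined (n_classes, d) prototypes."""
    prototypes = torch.stack([support[support_labels == c].mean(0) for c in range(n_classes)])
    dists = torch.cdist(query, prototypes)                 # (Nq, n_classes)
    confidence = torch.softmax(-dists / temperature, -1)   # soft assignment of each query
    weighted_sum = confidence.t() @ query                  # (n_classes, d)
    counts = torch.bincount(support_labels, minlength=n_classes).float().unsqueeze(1)
    return (prototypes * counts + weighted_sum) / (counts + confidence.sum(0).unsqueeze(1))

support = torch.randn(10, 64)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
query = torch.randn(30, 64)
print(refine_prototypes(support, labels, query, n_classes=5).shape)  # torch.Size([5, 64])
```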