Combining Deep Learning and String Kernels for the Localization of Swiss
German Tweets
- URL: http://arxiv.org/abs/2010.03614v1
- Date: Wed, 7 Oct 2020 19:16:45 GMT
- Title: Combining Deep Learning and String Kernels for the Localization of Swiss
German Tweets
- Authors: Mihaela Gaman, Radu Tudor Ionescu
- Abstract summary: We address the second subtask, which targets a data set composed of nearly 30 thousand Swiss German Jodels.
We frame the task as a double regression problem, employing a variety of machine learning approaches to predict both latitude and longitude.
Our empirical results indicate that the handcrafted model based on string kernels outperforms the deep learning approaches.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we introduce the methods proposed by the UnibucKernel team in
solving the Social Media Variety Geolocation task featured in the 2020 VarDial
Evaluation Campaign. We address only the second subtask, which targets a data
set composed of nearly 30 thousand Swiss German Jodels. The dialect
identification task is about accurately predicting the latitude and longitude
of test samples. We frame the task as a double regression problem, employing a
variety of machine learning approaches to predict both latitude and longitude.
From simple models for regression, such as Support Vector Regression, to deep
neural networks, such as Long Short-Term Memory networks and character-level
convolutional neural networks, and, finally, to ensemble models based on
meta-learners, such as XGBoost, our interest is focused on approaching the
problem from a few different perspectives, in an attempt to minimize the
prediction error. With the same goal in mind, we also considered many types of
features, from high-level features, such as BERT embeddings, to low-level
features, such as character n-grams, which are known to provide good results
in dialect identification. Our empirical results indicate that the handcrafted
model based on string kernels outperforms the deep learning approaches.
Nevertheless, our best performance is given by the ensemble model that combines
both handcrafted and deep learning models.
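
The double regression framing with a string kernel can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it swaps in kernel ridge regression for Support Vector Regression so that it runs on the standard library alone, it uses a normalized character n-gram spectrum kernel as one possible string kernel, and the n-gram range, the regularization value `lam`, the class name `KernelRidgeGeolocator`, and the toy texts and coordinates are all invented for the example.

```python
import math
from collections import Counter


def ngram_counts(text, n_min=3, n_max=5):
    """Count every character n-gram of length n_min..n_max in text."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def spectrum_kernel(a, b):
    """Normalized spectrum kernel: cosine similarity of n-gram count vectors."""
    dot = sum(c * b[g] for g, c in a.items() if g in b)
    norm = math.sqrt(sum(c * c for c in a.values()) * sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def solve(A, B):
    """Solve A X = B (B may have several columns) by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]


class KernelRidgeGeolocator:
    """Hypothetical helper: one kernel matrix, two regression targets."""

    def __init__(self, lam=0.1):
        self.lam = lam  # ridge regularization strength (assumed value)

    def fit(self, texts, coords):
        self.feats = [ngram_counts(t) for t in texts]
        n = len(texts)
        # Center both targets so an unseen text with zero similarity
        # falls back to the mean training location.
        self.mean = [sum(c[j] for c in coords) / n for j in range(2)]
        K = [[spectrum_kernel(a, b) + (self.lam if i == j else 0.0)
              for j, b in enumerate(self.feats)]
             for i, a in enumerate(self.feats)]
        Y = [[c[0] - self.mean[0], c[1] - self.mean[1]] for c in coords]
        # Dual coefficients for latitude (column 0) and longitude (column 1).
        self.alpha = solve(K, Y)
        return self

    def predict(self, text):
        f = ngram_counts(text)
        k = [spectrum_kernel(f, g) for g in self.feats]
        return tuple(self.mean[j] + sum(ki * a[j] for ki, a in zip(k, self.alpha))
                     for j in range(2))


# Invented toy data: (text, (latitude, longitude)). Not taken from the Jodel corpus.
texts = [
    "grüezi mitenand wie gahts",
    "grüessech mitenang wie geits",
    "sali zäme wie gohts",
    "hoi zäme was machsch",
]
coords = [(47.37, 8.54), (46.95, 7.45), (47.56, 7.59), (47.42, 9.37)]

model = KernelRidgeGeolocator(lam=0.1).fit(texts, coords)
print(model.predict("grüezi zäme wie gahts"))
```

Both coordinates share the same kernel matrix and differ only in their target column, which is what makes the double regression cheap to set up. An ensemble in the spirit of the abstract would feed predictions like these, together with those of the neural models, into a meta-learner.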
Related papers
- The Languini Kitchen: Enabling Language Modelling Research at Different
Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Analyzing Vietnamese Legal Questions Using Deep Neural Networks with Biaffine Classifiers [3.116035935327534]
We propose using deep neural networks to extract important information from Vietnamese legal questions.
Given a legal question in natural language, the goal is to extract all the segments that contain the needed information to answer the question.
arXiv Detail & Related papers (2023-04-27T18:19:24Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks as a sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state-of-the-art (based on BERT) on average performance by a large margin in few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z)
- Compare learning: bi-attention network for few-shot learning [6.559037166322981]
Metric learning, one family of few-shot learning methods, addresses this challenge by first learning a deep distance metric to determine whether a pair of images belongs to the same category.
In this paper, we propose a novel approach named Bi-attention network to compare the instances, which can measure the similarity between embeddings of instances precisely, globally and efficiently.
arXiv Detail & Related papers (2022-03-25T07:39:10Z)
- UnibucKernel: Geolocating Swiss-German Jodels Using Ensemble Learning [15.877673959068455]
We focus on the second subtask, which is based on a data set formed of approximately 30 thousand Swiss German Jodels.
The dialect identification task is about accurately predicting the latitude and longitude of test samples.
We frame the task as a double regression problem, employing an XGBoost meta-learner with the combined power of a variety of machine learning approaches.
arXiv Detail & Related papers (2021-02-18T14:26:00Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver: generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
- Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)