UnibucKernel: Geolocating Swiss-German Jodels Using Ensemble Learning
- URL: http://arxiv.org/abs/2102.09379v2
- Date: Fri, 19 Feb 2021 08:31:31 GMT
- Title: UnibucKernel: Geolocating Swiss-German Jodels Using Ensemble Learning
- Authors: Mihaela Gaman, Sebastian Cojocariu, Radu Tudor Ionescu
- Abstract summary: We focus on the second subtask, which is based on a data set formed of approximately 30 thousand Swiss German Jodels.
The dialect identification task is about accurately predicting the latitude and longitude of test samples.
We frame the task as a double regression problem, employing an XGBoost meta-learner with the combined power of a variety of machine learning approaches.
- Score: 15.877673959068455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we describe our approach addressing the Social Media Variety
Geolocation task featured in the 2021 VarDial Evaluation Campaign. We focus on
the second subtask, which is based on a data set formed of approximately 30
thousand Swiss German Jodels. The dialect identification task is about
accurately predicting the latitude and longitude of test samples. We frame the
task as a double regression problem, employing an XGBoost meta-learner with the
combined power of a variety of machine learning approaches to predict both
latitude and longitude. The models included in our ensemble range from simple
regression techniques, such as Support Vector Regression, to deep neural
models, such as a hybrid neural network and a neural transformer. To minimize
the prediction error, we approach the problem from a few different perspectives
and consider various types of features, from low-level character n-grams to
high-level BERT embeddings. The XGBoost ensemble that results from combining
the aforementioned methods achieves a median distance of 23.6 km on the test
data, which places us third in the ranking, 6.05 km and 2.9 km behind the
submissions in first and second place, respectively.
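The abstract reports its result as a median distance in kilometres between predicted and true coordinates. A minimal sketch of how such a metric could be computed is shown below. It assumes the haversine (great-circle) formula and a mean Earth radius of 6371 km; the abstract does not state which distance formula the task uses, and the function names `haversine_km` and `median_distance_km` are hypothetical.

```python
import math
from statistics import median

# Mean Earth radius in km. This value is an assumption of the sketch,
# not something stated in the abstract.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def median_distance_km(y_true, y_pred):
    """Median haversine distance over paired (latitude, longitude) tuples."""
    return median(
        haversine_km(t[0], t[1], p[0], p[1]) for t, p in zip(y_true, y_pred)
    )


if __name__ == "__main__":
    # Illustrative coordinates only (roughly Zurich and Bern), not task data.
    truth = [(47.3769, 8.5417), (46.9480, 7.4474)]
    preds = [(47.40, 8.50), (47.00, 7.60)]
    print(f"median distance: {median_distance_km(truth, preds):.2f} km")
```

In a double regression setup, latitude and longitude come from two separate regressors, so `y_pred` here would be built by pairing their outputs sample by sample. The median is less sensitive than the mean to a few predictions that land far from the true location.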
Related papers
- Coupled Laplacian Eigenmaps for Locally-Aware 3D Rigid Point Cloud Matching [0.0]
We propose a new technique, based on graph Laplacian eigenmaps, to match point clouds by taking into account fine local structures.
To deal with the order and sign ambiguity of Laplacian eigenmaps, we introduce a new operator, called Coupled Laplacian.
We show that the similarity between those aligned high-dimensional spaces provides a locally meaningful score to match shapes.
arXiv Detail & Related papers (2024-02-27T10:10:12Z)
- GeoLLM: Extracting Geospatial Knowledge from Large Language Models [49.20315582673223]
We present GeoLLM, a novel method that can effectively extract geospatial knowledge from large language models.
We demonstrate the utility of our approach across multiple tasks of central interest to the international community, including the measurement of population density and economic livelihoods.
Our experiments reveal that LLMs are remarkably sample-efficient, rich in geospatial information, and robust across the globe.
arXiv Detail & Related papers (2023-10-10T00:03:23Z)
- Gpachov at CheckThat! 2023: A Diverse Multi-Approach Ensemble for Subjectivity Detection in News Articles [34.98368667957678]
This paper presents the solution built by the Gpachov team for the CLEF-2023 CheckThat! lab Task2 on subjectivity detection.
The three approaches are combined in a simple majority voting ensemble, resulting in 0.77 macro F1 on the test set and achieving 2nd place on the English subtask.
arXiv Detail & Related papers (2023-09-13T09:49:20Z)
- Geo-Encoder: A Chunk-Argument Bi-Encoder Framework for Chinese Geographic Re-Ranking [61.60169764507917]
The Chinese geographic re-ranking task aims to find the most relevant addresses among retrieved candidates.
We propose an innovative framework, namely Geo-Encoder, to more effectively integrate Chinese geographical semantics into re-ranking pipelines.
arXiv Detail & Related papers (2023-09-04T13:44:50Z)
- Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations.
In the WASPSYN challenge at ISBI 2023, our method ranks first.
arXiv Detail & Related papers (2023-08-31T05:05:53Z)
- Predicting the Geolocation of Tweets Using transformer models on Customized Data [17.55660062746406]
This research aims to solve the tweet/user geolocation prediction task.
The suggested approach implements neural networks for natural language processing to estimate the location.
The proposed models have been fine-tuned on a Twitter dataset.
arXiv Detail & Related papers (2023-03-14T12:56:47Z)
- TopoBERT: Plug and Play Toponym Recognition Module Harnessing Fine-tuned BERT [11.446721140340575]
TopoBERT, a toponym recognition module based on a one-dimensional Convolutional Neural Network (CNN1D) and Bidirectional Encoder Representations from Transformers (BERT), is proposed and fine-tuned.
TopoBERT achieves state-of-the-art performance compared to the other five baseline models and can be applied to diverse toponym recognition tasks without additional training.
arXiv Detail & Related papers (2023-01-31T13:44:34Z)
- Rethinking Spatial Invariance of Convolutional Networks for Object Counting [119.83017534355842]
We try to use locally connected Gaussian kernels to replace the original convolution filter to estimate the spatial position in the density map.
Inspired by previous work, we propose a low-rank approximation accompanied by translation invariance to favorably implement the approximation of massive Gaussian convolution.
Our methods significantly outperform other state-of-the-art methods and achieve promising learning of the spatial position of objects.
arXiv Detail & Related papers (2022-06-10T17:51:25Z)
- Stratified Transformer for 3D Point Cloud Segmentation [89.9698499437732]
Stratified Transformer is able to capture long-range contexts and demonstrates strong generalization ability and high performance.
To combat the challenges posed by irregular point arrangements, we propose first-layer point embedding to aggregate local information.
Experiments demonstrate the effectiveness and superiority of our method on S3DIS, ScanNetv2 and ShapeNetPart datasets.
arXiv Detail & Related papers (2022-03-28T05:35:16Z)
- Meta-Generating Deep Attentive Metric for Few-shot Classification [53.07108067253006]
We present a novel deep metric meta-generation method to generate a specific metric for a new few-shot learning task.
In this study, we structure the metric using a three-layer deep attentive network that is flexible enough to produce a discriminative metric for each task.
We obtain a surprisingly clear performance improvement over state-of-the-art competitors, especially in challenging cases.
arXiv Detail & Related papers (2020-12-03T02:07:43Z)
- Combining Deep Learning and String Kernels for the Localization of Swiss German Tweets [28.497747521078647]
We address the second subtask, which targets a data set composed of nearly 30 thousand Swiss German Jodels.
We frame the task as a double regression problem, employing a variety of machine learning approaches to predict both latitude and longitude.
Our empirical results indicate that the handcrafted model based on string kernels outperforms the deep learning approaches.
arXiv Detail & Related papers (2020-10-07T19:16:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.