Fugu-MT Paper Translation (Abstract): Theedhum Nandrum@Dravidian-CodeMix-FIRE2020: A Sentiment Polarity Classifier for YouTube Comments with Code-switching between Tamil, Malayalam and English

Paper summary: Theedhum Nandrum@Dravidian-CodeMix-FIRE2020: A Sentiment Polarity Classifier for YouTube Comments with Code-switching between Tamil, Malayalam and English

arxiv url: http://arxiv.org/abs/2010.03189v2
Date: Tue, 13 Oct 2020 09:27:35 GMT
Status: Translation complete
Last updated in system: 2022-10-09 22:08:55.639922
Title: Theedhum Nandrum@Dravidian-CodeMix-FIRE2020: A Sentiment Polarity Classifier for YouTube Comments with Code-switching between Tamil, Malayalam and English
Title (reference translation): Theedhum Nandrum@Dravidian-CodeMix-FIRE2020: A Sentiment Polarity Classifier for YouTube Comments with Code-switching between Tamil, Malayalam and English
Authors: BalaSundaraRaman Lakshmanan and Sanjeeth Kumar Ravindranath
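Abstract summary: Theedhum Nandrum is a sentiment polarity detection system that uses two approaches. It uses language features such as emoji use, choice of script, and code mixing. It ranked 4th in Tamil-English with a weighted average F1 score of 0.62 and 9th in Malayalam-English with a score of 0.65.
Reference score (attention score computed by this site): 0.0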
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Theedhum Nandrum is a sentiment polarity detection system using two approaches: a Stochastic Gradient Descent (SGD) based classifier and a Long Short-term Memory (LSTM) based classifier. Our approach utilises language features like use of emoji, choice of scripts and code mixing, which appeared quite marked in the datasets specified for the Dravidian Codemix - FIRE 2020 task. The hyperparameters for the SGD were tuned using GridSearchCV. Our system was ranked 4th in Tamil-English with a weighted average F1 score of 0.62 and 9th in Malayalam-English with a score of 0.65. We achieved a weighted average F1 score of 0.77 for Tamil-English using a Logistic Regression based model after the task deadline. This performance betters the top ranked classifier on this dataset by a wide margin. Our use of language-specific Soundex to harmonise the spelling variants in code-mixed data appears to be a novel application of Soundex. Our complete code is published on GitHub at https://github.com/oligoglot/theedhum-nandrum.
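The rankings in the abstract are based on a weighted average F1 score. As a minimal sketch of how that metric is commonly computed (per-class F1 weighted by the number of true examples of each class), the pure-Python code below may help. The function names `per_class_f1` and `weighted_f1` and the toy labels are illustrative assumptions, not taken from the paper or its repository.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighted by each class's true count."""
    support = Counter(y_true)
    scores = per_class_f1(y_true, y_pred)
    return sum(scores[label] * n for label, n in support.items()) / len(y_true)


# Hypothetical toy labels, for illustration only
y_true = ["positive", "positive", "positive", "negative"]
y_pred = ["positive", "positive", "negative", "negative"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because the weights are the class frequencies, a frequent class dominates the score, which is one reason weighted and macro F1 can differ noticeably on imbalanced sentiment data.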
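Abstract (reference translation): Theedhum Nandrum is a sentiment polarity detection system that uses two approaches: a Stochastic Gradient Descent (SGD) based classifier and a Long Short-term Memory (LSTM) based classifier. Our approach makes use of language features such as emoji use, choice of script, and code mixing, which were quite marked in the datasets specified for the Dravidian Codemix - FIRE 2020 task. The SGD hyperparameters were tuned with GridSearchCV. Our system ranked 4th in Tamil-English with a weighted average F1 score of 0.62 and 9th in Malayalam-English with a score of 0.65. After the task deadline, we achieved a weighted average F1 score of 0.77 for Tamil-English using a logistic regression model. This performance betters the top-ranked classifier on this dataset by a wide margin. Our use of language-specific Soundex to harmonise spelling variants in code-mixed data appears to be a novel application of Soundex. The complete code is published on GitHub at https://github.com/oligoglot/theedhum-nandrum.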
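The abstract names emoji use, choice of script, and code mixing as features. The sketch below shows one way such surface features might be extracted from a comment with Python's standard `unicodedata` module. It is not the authors' implementation: the helper names (`script_of`, `is_emoji`, `language_features`) are hypothetical, the emoji test is a rough heuristic based on the Unicode "So" category, and mixed script is only a proxy for code mixing, since code-mixed Tamil is often written entirely in Latin script.

```python
import unicodedata


def script_of(ch):
    """Coarse script label for a letter, derived from its Unicode name."""
    if not ch.isalpha():
        return None  # digits, spaces, combining signs, emoji, punctuation
    name = unicodedata.name(ch, "")
    for script in ("TAMIL", "MALAYALAM", "LATIN"):
        if name.startswith(script):
            return script.lower()
    return "other"


def is_emoji(ch):
    # Heuristic assumption: most emoji fall in category "So" (Symbol, other),
    # though that category also contains non-emoji symbols such as (c) signs.
    return unicodedata.category(ch) == "So"


def language_features(comment):
    """Return simple script and emoji features for one comment."""
    scripts = {script_of(ch) for ch in comment} - {None}
    return {
        "scripts": sorted(scripts),
        "emoji_count": sum(1 for ch in comment if is_emoji(ch)),
        "mixed_script": len(scripts) > 1,
    }


# Hypothetical comment mixing Tamil script, Latin script, and emoji
print(language_features("படம் super 😍😍"))
```

Features like these could be appended to a bag-of-words vector before training a linear classifier; how the original system combines them is described in the paper and repository, not here.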

Related papers

READIN: A Chinese Multi-Task Benchmark with Realistic and Diverse Input Noises [87.70001456418504]
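We construct READIN, a Chinese multi-task benchmark with Realistic and Diverse Input Noises. READIN contains four diverse tasks and asks annotators to re-enter the original test data using two commonly used Chinese input methods: Pinyin input and speech input. We conducted experiments with strong pretrained language models and robust training methods, and found that these models often suffer significant performance drops on READIN.
Paper reference translation (metadata) (2023-02-14T20:14:39Z)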
Transformer-based Model for Word Level Language Identification in Code-mixed Kannada-English Texts [55.41644538483948]
We propose a transformer-based model for word-level language identification in code-mixed Kannada-English texts. The proposed model achieves a weighted F1-score of 0.84 and a macro F1-score of 0.61 on the CoLI-Kenglish dataset.
Paper reference translation (metadata) (2022-11-26T02:39:19Z)
PSG@HASOC-Dravidian CodeMixFIRE2021: Pretrained Transformers for Offensive Language Identification in Tanglish [0.0]
This paper describes a system for Dravidian-Codemix-HASOC2021: Hate Speech and Offensive Language Identification in Dravidian Languages. The task aims to identify offensive content in code-mixed comments and posts in Dravidian languages collected from social media.
Paper reference translation (metadata) (2021-10-06T15:23:40Z)
IIITG-ADBU@HASOC-Dravidian-CodeMix-FIRE2020: Offensive Content Detection in Code-Mixed Dravidian Text [2.4890053912861654]
This paper presents classification results obtained with SVM and XLM-RoBERTa on the shared task Dravidian-CodeMix-HASOC 2020.
Paper reference translation (metadata) (2021-07-29T21:23:17Z)
Explicit Alignment Objectives for Multilingual Bidirectional Encoders [111.65322283420805]
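This paper proposes a method for training the multilingual encoder AMBER (Aligned Multilingual Bi-directional EncodeR). AMBER is trained on additional parallel data using two explicit alignment objectives that align multilingual representations at different granularities. Experimental results show that AMBER gains 1.1 average F1 score on sequence tagging and 27.3 average accuracy on retrieval over the XLMR-large model.
Paper reference translation (metadata) (2020-10-15T18:34:13Z)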
Gauravarora@HASOC-Dravidian-CodeMix-FIRE2020: Pre-training ULMFiT on Synthetically Generated Code-Mixed Data for Hate Speech Detection [0.0]
This paper addresses Hate Speech and Offensive Content Identification in Dravidian Languages (Tamil-English and Malayalam-English). The task aims to identify offensive language in a code-mixed dataset of comments and posts in Dravidian languages collected from social media.
Paper reference translation (metadata) (2020-10-05T15:25:47Z)
NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching language using a simple deep-learning classifier [63.137661897716555]
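Code-switching is the phenomenon in which two or more languages are used in the same message. We use a standard convolutional neural network model to predict the sentiment of tweets that mix Spanish and English.
Paper reference translation (metadata) (2020-09-07T19:57:09Z)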
Predicting the Humorousness of Tweets Using Gaussian Process Preference Learning [56.18809963342249]
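This paper proposes a probabilistic approach that learns to rank and rate the humorousness of short texts by exploiting human preference judgments and automatically generated linguistic annotations. We report on issues arising from the conversion between the numeric scores in the HAHA@IberLEF 2019 data and the judgment annotations required by the proposed method.
Paper reference translation (metadata) (2020-08-03T13:05:42Z)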
ULD@NUIG at SemEval-2020 Task 9: Generative Morphemes with an Attention Model for Sentiment Analysis in Code-Mixed Text [1.4926515182392508]
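This paper describes the GenMA model sentiment analysis system contributed to SemEval 2020 Task 9 SentiMix. The system aims to predict the sentiment of a given English-Hindi code-mixed tweet without using word-level language tags.
Paper reference translation (metadata) (2020-07-27T23:58:54Z)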
Voice@SRIB at SemEval-2020 Task 9 and 12: Stacked Ensembling method for Sentiment and Offensiveness detection in Social Media [2.9008108937701333]
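We train embeddings and ensembling methods for the Sentimix and OffensEval tasks. We evaluate our models on the datasets using macro F1-score, precision, accuracy, and recall.
Paper reference translation (metadata) (2020-07-20T11:54:43Z)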
Enhanced Universal Dependency Parsing with Second-Order Inference and Mixture of Training Data [48.8386313914471]
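This paper describes the system used for the IWPT 2020 shared task. For the low-resource Tamil corpus, we specially mix the Tamil training data with data from other languages, which significantly improves performance on Tamil.
Paper reference translation (metadata) (2020-06-02T06:42:22Z)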

The related papers list is generated automatically from the titles and abstracts of papers on this site.
