LRWR: Large-Scale Benchmark for Lip Reading in Russian language
- URL: http://arxiv.org/abs/2109.06692v1
- Date: Tue, 14 Sep 2021 13:51:19 GMT
- Title: LRWR: Large-Scale Benchmark for Lip Reading in Russian language
- Authors: Evgeniy Egorov, Vasily Kostyumov, Mikhail Konyk, Sergey Kolesnikov
- Abstract summary: Lipreading aims to identify the speech content from videos by analyzing the visual deformations of lips and nearby areas.
One of the significant obstacles for research in this field is the lack of proper datasets for a wide variety of languages.
We introduce a naturally distributed benchmark for lipreading in the Russian language, named LRWR, which contains 235 classes and 135 speakers.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Lipreading, also known as visual speech recognition, aims to identify the
speech content from videos by analyzing the visual deformations of lips and
nearby areas. One of the significant obstacles for research in this field is
the lack of proper datasets covering a wide variety of languages: so far,
lipreading methods have focused only on English and Chinese. In this paper, we
introduce a naturally distributed large-scale benchmark for lipreading in the
Russian language, named LRWR, which contains 235 classes and 135 speakers. We
provide a detailed description of the dataset collection pipeline and dataset
statistics. We also present a comprehensive comparison of the current popular
lipreading methods on LRWR and conduct a detailed analysis of their
performance. The results demonstrate the differences between the benchmarked
languages and suggest several promising directions for fine-tuning lipreading
models. Building on these findings, we also achieve new state-of-the-art results
on the English LRW benchmark.
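LRWR follows the LRW-style word-classification setup: each clip is labeled with one of 235 target word classes, and models are typically compared by top-1 accuracy. As a rough illustration of the kind of architecture commonly benchmarked in this setting (not the authors' exact models), the minimal PyTorch sketch below combines a 3D-convolutional front end over grayscale mouth crops, a per-frame ResNet-18 trunk, and a BiGRU temporal backend; all module choices and hyperparameters here are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's model): a typical LRW-style
# word-level lipreading classifier.
import torch
import torch.nn as nn
import torchvision.models as models

class WordLipReader(nn.Module):
    def __init__(self, num_classes: int = 235):  # 235 word classes, as in LRWR
        super().__init__()
        # Spatio-temporal front end over grayscale mouth crops (B, 1, T, H, W)
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 64, kernel_size=(5, 7, 7), stride=(1, 2, 2),
                      padding=(2, 3, 3), bias=False),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)),
        )
        # Per-frame 2D trunk: ResNet-18 without its stem and classifier
        resnet = models.resnet18(weights=None)
        self.trunk = nn.Sequential(*list(resnet.children())[4:-1])  # layer1..avgpool
        # Temporal backend over per-frame embeddings
        self.gru = nn.GRU(512, 256, num_layers=2, batch_first=True, bidirectional=True)
        self.head = nn.Linear(512, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, 1, T, H, W)
        b = x.size(0)
        f = self.frontend(x)                              # (B, 64, T, H', W')
        t = f.size(2)
        f = f.transpose(1, 2).reshape(b * t, 64, f.size(3), f.size(4))
        f = self.trunk(f).reshape(b, t, -1)               # (B, T, 512) frame embeddings
        out, _ = self.gru(f)                              # (B, T, 512)
        return self.head(out.mean(dim=1))                 # word logits

if __name__ == "__main__":
    logits = WordLipReader()(torch.randn(2, 1, 29, 88, 88))  # 29 frames of 88x88 crops
    print(logits.shape)  # torch.Size([2, 235])
```

The 3D front end captures short-range lip motion, the 2D trunk encodes each frame, and the recurrent backend aggregates the short clip into a single word prediction; many LRW-style systems instead use a temporal-convolutional or Transformer backend in place of the GRU.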
Related papers
- A Comparative Study of Translation Bias and Accuracy in Multilingual Large Language Models for Cross-Language Claim Verification [1.566834021297545]
This study systematically evaluates translation bias and the effectiveness of Large Language Models for cross-lingual claim verification.
We investigate two distinct translation methods: pre-translation and self-translation.
Our findings reveal that low-resource languages exhibit significantly lower accuracy in direct inference due to underrepresentation.
arXiv Detail & Related papers (2024-10-14T09:02:42Z) - Multilingual Retrieval Augmented Generation for Culturally-Sensitive Tasks: A Benchmark for Cross-lingual Robustness [30.00463676754559]
We introduce BordIRLines, a benchmark consisting of 720 territorial dispute queries paired with 14k Wikipedia documents across 49 languages.
Our experiments reveal that retrieving multilingual documents best improves response consistency and decreases geopolitical bias over using purely in-language documents.
Our further experiments and case studies investigate how cross-lingual RAG is affected by aspects from IR to document contents.
arXiv Detail & Related papers (2024-10-02T01:59:07Z) - End-to-End Lip Reading in Romanian with Cross-Lingual Domain Adaptation
and Lateral Inhibition [2.839471733237535]
We analyze several architectures and optimizations on the underrepresented, short-scale Romanian language dataset called Wild LRRo.
We obtain state-of-the-art results using our proposed method, namely cross-lingual domain adaptation and unlabeled videos.
We also assess the performance of adding a layer inspired by the neural inhibition mechanism.
arXiv Detail & Related papers (2023-10-07T15:36:58Z) - Leveraging Visemes for Better Visual Speech Representation and Lip
Reading [2.7836084563851284]
We propose a novel approach that leverages visemes, which are groups of phonetically similar lip shapes, to extract more discriminative and robust video features for lip reading.
The proposed method reduces the lip-reading word error rate (WER) by 9.1% relative to the best previous method (WER is illustrated in the sketch after this list).
arXiv Detail & Related papers (2023-07-19T17:38:26Z) - FRMT: A Benchmark for Few-Shot Region-Aware Machine Translation [64.9546787488337]
We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation.
The dataset consists of professional translations from English into two regional variants each of Portuguese and Mandarin Chinese.
arXiv Detail & Related papers (2022-10-01T05:02:04Z) - A Multi-level Supervised Contrastive Learning Framework for Low-Resource
Natural Language Inference [54.678516076366506]
Natural Language Inference (NLI) is an increasingly essential task in natural language understanding.
Here we propose a multi-level supervised contrastive learning framework named MultiSCL for low-resource natural language inference.
arXiv Detail & Related papers (2022-05-31T05:54:18Z) - A Closer Look at Debiased Temporal Sentence Grounding in Videos:
Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z) - A Multimodal German Dataset for Automatic Lip Reading Systems and
Transfer Learning [18.862801476204886]
We present the dataset GLips (German Lips) consisting of 250,000 publicly available videos of the faces of speakers of the Hessian Parliament.
The format is similar to that of the English language LRW (Lip Reading in the Wild) dataset, with each video encoding one word of interest in a context of 1.16 seconds duration.
By training a deep neural network, we investigate whether lip reading has language-independent features, so that datasets of different languages can be used to improve lip reading models.
arXiv Detail & Related papers (2022-02-27T17:37:35Z) - IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and
Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z) - Sub-word Level Lip Reading With Visual Attention [88.89348882036512]
We focus on the unique challenges encountered in lip reading and propose tailored solutions.
We obtain state-of-the-art results on the challenging LRS2 and LRS3 benchmarks when training on public datasets.
Our best model achieves 22.6% word error rate on the LRS2 dataset, a performance unprecedented for lip reading models.
arXiv Detail & Related papers (2021-10-14T17:59:57Z) - Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval [51.004601358498135]
Mr. TyDi is a benchmark dataset for mono-lingual retrieval in eleven typologically diverse languages.
The goal of this resource is to spur research in dense retrieval techniques in non-English languages.
arXiv Detail & Related papers (2021-08-19T16:53:43Z)