MEG-MASC: a high-quality magneto-encephalography dataset for evaluating
natural speech processing
- URL: http://arxiv.org/abs/2208.11488v1
- Date: Tue, 26 Jul 2022 19:17:01 GMT
- Title: MEG-MASC: a high-quality magneto-encephalography dataset for evaluating
natural speech processing
- Authors: Laura Gwilliams, Graham Flick, Alec Marantz, Liina Pylkkänen, David
Poeppel and Jean-Rémi King
- Abstract summary: The "MEG-MASC" dataset provides a curated set of raw magnetoencephalography (MEG) recordings of 27 English speakers.
We time-stamp the onset and offset of each word and phoneme in the metadata of the recording and organize the dataset according to the 'Brain Imaging Data Structure' (BIDS).
This data collection provides a suitable benchmark for large-scale encoding and decoding analyses of temporally resolved brain responses to speech.
- Score: 1.345669927504424
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The "MEG-MASC" dataset provides a curated set of raw magnetoencephalography
(MEG) recordings of 27 English speakers who listened to two hours of
naturalistic stories. Each participant performed two identical sessions,
involving listening to four fictional stories from the Manually Annotated
Sub-Corpus (MASC) intermixed with random word lists and comprehension
questions. We time-stamp the onset and offset of each word and phoneme in the
metadata of the recording, and organize the dataset according to the 'Brain
Imaging Data Structure' (BIDS). This data collection provides a suitable
benchmark for large-scale encoding and decoding analyses of temporally
resolved brain responses to speech. We provide the Python code to replicate
several validation analyses of the MEG event-related fields, such as the
temporal decoding of phonetic features and word frequency. All code, together
with the MEG, audio, and text data, is publicly available, in keeping with
best practices for transparent and reproducible research.
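
For concreteness, here is a minimal sketch of the kind of temporal decoding analysis the abstract describes, assuming the dataset has been downloaded to a local BIDS root and is loaded with MNE-BIDS. The BIDS entity values (subject, session, task) and the placeholder word labels are illustrative assumptions; the authors' released Python code, not this sketch, implements the actual validations.

    # Hedged sketch: load one MEG-MASC subject via MNE-BIDS and run a
    # sliding-window decoder on annotation-locked epochs. Entity values
    # and label construction are illustrative assumptions.
    import numpy as np
    import mne
    from mne_bids import BIDSPath, read_raw_bids
    from mne.decoding import SlidingEstimator, cross_val_multiscore
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression

    bids_path = BIDSPath(subject="01", session="0", task="0",
                         datatype="meg", root="path/to/MEG-MASC")  # assumed
    raw = read_raw_bids(bids_path, verbose=False)

    # Word/phoneme onsets ship in the recording metadata; here we take
    # whatever annotations are present and epoch around them.
    events, _ = mne.events_from_annotations(raw)
    epochs = mne.Epochs(raw, events, tmin=-0.2, tmax=0.6,
                        baseline=(None, 0), decim=10, preload=True)

    X = epochs.get_data()                     # (n_events, n_channels, n_times)
    y = np.random.randint(0, 2, len(epochs))  # placeholder binary labels

    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    decoder = SlidingEstimator(clf, scoring="roc_auc", n_jobs=-1)
    scores = cross_val_multiscore(decoder, X, y, cv=5).mean(axis=0)
    print("peak AUC %.2f at %.3f s" % (scores.max(), epochs.times[scores.argmax()]))

In a real validation, y would come from the per-word metadata (e.g., a median split on word frequency) rather than random placeholder labels.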
Related papers
- Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics [0.0]
This report describes a self-directed project analysing models fine-tuned on medical call recordings.
The analysis covers Automatic Speech Recognition (ASR) for speech transcription and a Large Language Model (LLM) for context-aware, professional responses.
arXiv Detail & Related papers (2025-02-18T14:05:13Z) - GigaCheck: Detecting LLM-generated Content [72.27323884094953]
In this work, we investigate the task of generated text detection by proposing GigaCheck.
Our research explores two approaches: (i) distinguishing human-written texts from LLM-generated ones, and (ii) detecting LLM-generated intervals in Human-Machine collaborative texts.
Specifically, we use a fine-tuned general-purpose LLM in conjunction with a DETR-like detection model, adapted from computer vision, to localize AI-generated intervals within text.
arXiv Detail & Related papers (2024-10-31T08:30:55Z) - Algorithms For Automatic Accentuation And Transcription Of Russian Texts In Speech Recognition Systems [0.0]
This paper presents an overview of a rule-based system for automatic accentuation and phonemic transcription of Russian texts.
Two parts of the developed system, accentuation and transcription, use different approaches to achieve correct phonemic representations of input phrases.
The developed toolkit is written in the Python language and is accessible on GitHub for any interested researcher.
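
A toy sketch of the two-stage idea: stress lookup in a tiny dictionary, then a single well-known vowel-reduction rule (akanye). The dictionary and rule set are illustrative assumptions, not the published toolkit's logic.

    # Toy two-stage pipeline: dictionary-based accentuation, then
    # rule-based phonemic transcription. Illustrative only.
    STRESS_DICT = {"молоко": 2, "вода": 1}  # word -> index of stressed vowel
    VOWELS = "аеёиоуыэюя"

    def accentuate(word):
        """Pair each character with a flag: is it the stressed vowel?"""
        stressed = STRESS_DICT.get(word, -1)
        pairs, v = [], 0
        for ch in word:
            is_vowel = ch in VOWELS
            pairs.append((ch, is_vowel and v == stressed))
            if is_vowel:
                v += 1
        return pairs

    def transcribe(word):
        """Apply one classic rule: unstressed 'о' reduces to /a/ (akanye)."""
        return "".join("a" if ch == "о" and not stressed else ch
                       for ch, stressed in accentuate(word))

    print(transcribe("молоко"))  # both unstressed о's reduce; final о keeps /o/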
arXiv Detail & Related papers (2024-10-03T14:43:43Z) - A multimodal LLM for the non-invasive decoding of spoken text from brain recordings [0.4187344935012482]
We propose an end-to-end multimodal LLM for decoding spoken text from fMRI signals.
The proposed architecture is founded on an encoder derived from a transformer, incorporating an augmented embedding layer and an attention mechanism better adjusted than that of the state of the art.
A benchmark is performed on a corpus of human-human and human-robot interactions in which fMRI and conversational signals were recorded synchronously.
arXiv Detail & Related papers (2024-09-29T14:03:39Z) - Introducing Semantics into Speech Encoders [91.37001512418111]
We propose an unsupervised way of incorporating semantic information from large language models into self-supervised speech encoders without labeled audio transcriptions.
Our approach achieves performance similar to that of supervised methods trained on over 100 hours of labeled audio transcripts.
arXiv Detail & Related papers (2022-11-15T18:44:28Z) - SVTS: Scalable Video-to-Speech Synthesis [105.29009019733803]
We introduce a scalable video-to-speech framework consisting of two components: a video-to-spectrogram predictor and a pre-trained neural vocoder.
We are the first to show intelligible results on the challenging LRS3 dataset.
arXiv Detail & Related papers (2022-05-04T13:34:07Z) - Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo
Languages [58.43299730989809]
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data.
We induce a pseudo language as a compact discrete representation, and formulate a self-supervised pseudo speech recognition task.
This process stands on its own, or can be applied as low-cost second-stage pre-training.
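
A hedged sketch of the pseudo-language construction, with the pre-trained speech encoder stubbed out by random features: quantize frames with k-means and collapse consecutive repeats; the real method additionally compresses the token strings with BPE.

    # Sketch: induce pseudo transcripts from frame-level features.
    # Features are random stand-ins; the cluster count is an assumption.
    import numpy as np
    from itertools import groupby
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    utterances = [rng.normal(size=(200, 64)) for _ in range(8)]  # (frames, dims)

    kmeans = KMeans(n_clusters=25, n_init=10, random_state=0)
    kmeans.fit(np.concatenate(utterances))  # learn a discrete codebook

    def pseudo_transcript(feats):
        """Frame labels -> deduplicated pseudo-token sequence."""
        return [tok for tok, _ in groupby(kmeans.predict(feats))]

    targets = [pseudo_transcript(u) for u in utterances]
    print(len(targets[0]), "pseudo tokens for a 200-frame utterance")
    # These sequences would supervise a pseudo speech recognition task
    # that pre-trains both the encoder and the decoder.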
arXiv Detail & Related papers (2022-05-02T17:59:02Z) - Open Source MagicData-RAMC: A Rich Annotated Mandarin
Conversational(RAMC) Speech Dataset [51.75617364782418]
This paper introduces a high-quality rich annotated Mandarin conversational (RAMC) speech dataset called MagicData-RAMC.
The MagicData-RAMC corpus contains 180 hours of conversational speech data recorded from native speakers of Mandarin Chinese over mobile phones with a sampling rate of 16 kHz.
arXiv Detail & Related papers (2022-03-31T07:01:06Z) - Audio-text Retrieval in Context [24.38055340045366]
In this work, we investigate several audio features as well as sequence aggregation methods for better audio-text alignment.
We build our contextual audio-text retrieval system using pre-trained audio features and a descriptor-based aggregation method.
Our proposed system achieves a significant improvement on bidirectional audio-text retrieval across all metrics, including recall and median and mean rank.
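
A minimal sketch of descriptor-based aggregation and bidirectional retrieval, assuming pre-computed frame-level audio features and caption embeddings already live in a shared space; the mean-pooling choice and dimensions are illustrative.

    # Sketch: mean-pool each clip into one descriptor, L2-normalize,
    # and rank by cosine similarity in both directions.
    import numpy as np

    rng = np.random.default_rng(0)
    audio_frames = [rng.normal(size=(n, 128)) for n in (300, 250, 400)]
    text_embs = rng.normal(size=(3, 128))  # one caption per clip

    def l2norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    audio_descs = l2norm(np.stack([f.mean(axis=0) for f in audio_frames]))
    text_descs = l2norm(text_embs)

    sim = audio_descs @ text_descs.T       # (n_audio, n_text) cosine scores
    a2t = (-sim).argsort(axis=1)[:, 0]     # best caption per clip
    t2a = (-sim.T).argsort(axis=1)[:, 0]   # best clip per caption
    print("audio->text top-1:", a2t, "| text->audio top-1:", t2a)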
arXiv Detail & Related papers (2022-03-25T13:41:17Z) - Unsupervised Pattern Discovery from Thematic Speech Archives Based on
Multilingual Bottleneck Features [41.951988293049205]
We propose a two-stage approach, which comprises unsupervised acoustic modeling and decoding, followed by pattern mining in acoustic unit sequences.
The proposed system is able to effectively extract topic-related words and phrases from the lecture recordings on MIT OpenCourseWare.
arXiv Detail & Related papers (2020-11-03T20:06:48Z) - Continuous speech separation: dataset and analysis [52.10378896407332]
In natural conversations, a speech signal is continuous, containing both overlapped and overlap-free components.
This paper describes a dataset and protocols for evaluating continuous speech separation algorithms.
arXiv Detail & Related papers (2020-01-30T18:01:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.