MEG-MASC: a high-quality magneto-encephalography dataset for evaluating
natural speech processing
- URL: http://arxiv.org/abs/2208.11488v1
- Date: Tue, 26 Jul 2022 19:17:01 GMT
- Title: MEG-MASC: a high-quality magneto-encephalography dataset for evaluating
natural speech processing
- Authors: Laura Gwilliams, Graham Flick, Alec Marantz, Liina Pylkkänen, David
Poeppel and Jean-Rémi King
- Abstract summary: The "MEG-MASC" dataset provides a curated set of raw magnetoencephalography (MEG) recordings of 27 English speakers.
We time-stamp the onset and offset of each word and phoneme in the metadata of the recording and organize the dataset according to the 'Brain Imaging Data Structure' (BIDS).
This data collection provides a suitable benchmark for large-scale encoding and decoding analyses of temporally resolved brain responses to speech.
- Score: 1.345669927504424
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The "MEG-MASC" dataset provides a curated set of raw magnetoencephalography
(MEG) recordings of 27 English speakers who listened to two hours of
naturalistic stories. Each participant performed two identical sessions,
involving listening to four fictional stories from the Manually Annotated
Sub-Corpus (MASC) intermixed with random word lists and comprehension
questions. We time-stamp the onset and offset of each word and phoneme in the
metadata of the recording, and organize the dataset according to the 'Brain
Imaging Data Structure' (BIDS). This data collection provides a suitable
benchmark for large-scale encoding and decoding analyses of temporally
resolved brain responses to speech. We provide the Python code to replicate
several validation analyses of the MEG event-related fields, such as the
temporal decoding of phonetic features and word frequency. All code, together
with the MEG, audio, and text data, is publicly available, in keeping with
best practices for transparent and reproducible research.
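
For concreteness, here is a minimal sketch of the kind of temporal decoding analysis the abstract describes, assuming the dataset has been downloaded to a local BIDS root and is loaded with MNE-BIDS. The BIDS entity values (subject, session, task) and the placeholder word labels are illustrative assumptions; the authors' released Python code, not this sketch, implements the actual validations.

    # Hedged sketch: load one MEG-MASC subject via MNE-BIDS and run a
    # sliding-window decoder on annotation-locked epochs. Entity values
    # and label construction are illustrative assumptions.
    import numpy as np
    import mne
    from mne_bids import BIDSPath, read_raw_bids
    from mne.decoding import SlidingEstimator, cross_val_multiscore
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression

    bids_path = BIDSPath(subject="01", session="0", task="0",
                         datatype="meg", root="path/to/MEG-MASC")  # assumed
    raw = read_raw_bids(bids_path, verbose=False)

    # Word/phoneme onsets ship in the recording metadata; here we take
    # whatever annotations are present and epoch around them.
    events, _ = mne.events_from_annotations(raw)
    epochs = mne.Epochs(raw, events, tmin=-0.2, tmax=0.6,
                        baseline=(None, 0), decim=10, preload=True)

    X = epochs.get_data()                     # (n_events, n_channels, n_times)
    y = np.random.randint(0, 2, len(epochs))  # placeholder binary labels

    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    decoder = SlidingEstimator(clf, scoring="roc_auc", n_jobs=-1)
    scores = cross_val_multiscore(decoder, X, y, cv=5).mean(axis=0)
    print("peak AUC %.2f at %.3f s" % (scores.max(), epochs.times[scores.argmax()]))

In a real validation, y would come from the per-word metadata (e.g., a median split on word frequency) rather than random placeholder labels.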
Related papers
- Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics [0.0]
This report describes a self-directed project analysing models fine-tuned on medical call recordings.
The analysis covers Automatic Speech Recognition (ASR) for speech transcription and a Large Language Model (LLM) for context-aware, professional responses.
arXiv Detail & Related papers (2025-02-18T14:05:13Z) - GigaCheck: Detecting LLM-generated Content [72.27323884094953]
In this work, we investigate the task of generated text detection by proposing GigaCheck.
Our research explores two approaches: (i) distinguishing human-written texts from LLM-generated ones, and (ii) detecting LLM-generated intervals in Human-Machine collaborative texts.
Specifically, we use a fine-tuned general-purpose LLM in conjunction with a DETR-like detection model, adapted from computer vision, to localize AI-generated intervals within text.
arXiv Detail & Related papers (2024-10-31T08:30:55Z) - Algorithms For Automatic Accentuation And Transcription Of Russian Texts In Speech Recognition Systems [0.0]
This paper presents an overview of a rule-based system for automatic accentuation and phonemic transcription of Russian texts.
Two parts of the developed system, accentuation and transcription, use different approaches to achieve correct phonemic representations of input phrases.
The developed toolkit is written in the Python language and is accessible on GitHub for any interested researcher.
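
A toy sketch of the two-stage idea: stress lookup in a tiny dictionary, then a single well-known vowel-reduction rule (akanye). The dictionary and rule set are illustrative assumptions, not the published toolkit's logic.

    # Toy two-stage pipeline: dictionary-based accentuation, then
    # rule-based phonemic transcription. Illustrative only.
    STRESS_DICT = {"молоко": 2, "вода": 1}  # word -> index of stressed vowel
    VOWELS = "аеёиоуыэюя"

    def accentuate(word):
        """Pair each character with a flag: is it the stressed vowel?"""
        stressed = STRESS_DICT.get(word, -1)
        pairs, v = [], 0
        for ch in word:
            is_vowel = ch in VOWELS
            pairs.append((ch, is_vowel and v == stressed))
            if is_vowel:
                v += 1
        return pairs

    def transcribe(word):
        """Apply one classic rule: unstressed 'о' reduces to /a/ (akanye)."""
        return "".join("a" if ch == "о" and not stressed else ch
                       for ch, stressed in accentuate(word))

    print(transcribe("молоко"))  # both unstressed о's reduce; final о keeps /o/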
arXiv Detail & Related papers (2024-10-03T14:43:43Z) - A multimodal LLM for the non-invasive decoding of spoken text from brain recordings [0.4187344935012482]
We propose an end-to-end multimodal LLM for decoding spoken text from fMRI signals.
The proposed architecture is founded on an encoder derived from a transformer, incorporating an augmented embedding layer and an attention mechanism better adjusted than that of the state of the art.
A benchmark is performed on a corpus of human-human and human-robot interactions in which fMRI and conversational signals were recorded synchronously.
arXiv Detail & Related papers (2024-09-29T14:03:39Z) - Introducing Semantics into Speech Encoders [91.37001512418111]
We propose an unsupervised way of incorporating semantic information from large language models into self-supervised speech encoders without labeled audio transcriptions.
Our approach achieves performance similar to that of supervised methods trained on over 100 hours of labeled audio transcripts.
arXiv Detail & Related papers (2022-11-15T18:44:28Z) - SVTS: Scalable Video-to-Speech Synthesis [105.29009019733803]
We introduce a scalable video-to-speech framework consisting of two components: a video-to-spectrogram predictor and a pre-trained neural vocoder.
We are the first to show intelligible results on the challenging LRS3 dataset.
arXiv Detail & Related papers (2022-05-04T13:34:07Z) - Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo
Languages [58.43299730989809]
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data.
We induce a pseudo language as a compact discrete representation, and formulate a self-supervised pseudo speech recognition task.
This process stands on its own, or can be applied as low-cost second-stage pre-training.
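
A hedged sketch of the pseudo-language construction, with the pre-trained speech encoder stubbed out by random features: quantize frames with k-means and collapse consecutive repeats; the real method additionally compresses the token strings with BPE.

    # Sketch: induce pseudo transcripts from frame-level features.
    # Features are random stand-ins; the cluster count is an assumption.
    import numpy as np
    from itertools import groupby
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    utterances = [rng.normal(size=(200, 64)) for _ in range(8)]  # (frames, dims)

    kmeans = KMeans(n_clusters=25, n_init=10, random_state=0)
    kmeans.fit(np.concatenate(utterances))  # learn a discrete codebook

    def pseudo_transcript(feats):
        """Frame labels -> deduplicated pseudo-token sequence."""
        return [tok for tok, _ in groupby(kmeans.predict(feats))]

    targets = [pseudo_transcript(u) for u in utterances]
    print(len(targets[0]), "pseudo tokens for a 200-frame utterance")
    # These sequences would supervise a pseudo speech recognition task
    # that pre-trains both the encoder and the decoder.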
arXiv Detail & Related papers (2022-05-02T17:59:02Z) - Open Source MagicData-RAMC: A Rich Annotated Mandarin
Conversational(RAMC) Speech Dataset [51.75617364782418]
This paper introduces a high-quality rich annotated Mandarin conversational (RAMC) speech dataset called MagicData-RAMC.
The MagicData-RAMC corpus contains 180 hours of conversational speech data recorded from native speakers of Mandarin Chinese over mobile phones with a sampling rate of 16 kHz.
arXiv Detail & Related papers (2022-03-31T07:01:06Z) - Audio-text Retrieval in Context [24.38055340045366]
In this work, we investigate several audio features as well as sequence aggregation methods for better audio-text alignment.
We build our contextual audio-text retrieval system using pre-trained audio features and a descriptor-based aggregation method.
Our proposed system achieves a significant improvement on bidirectional audio-text retrieval across all metrics, including recall and median and mean rank.
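
A minimal sketch of descriptor-based aggregation and bidirectional retrieval, assuming pre-computed frame-level audio features and caption embeddings already live in a shared space; the mean-pooling choice and dimensions are illustrative.

    # Sketch: mean-pool each clip into one descriptor, L2-normalize,
    # and rank by cosine similarity in both directions.
    import numpy as np

    rng = np.random.default_rng(0)
    audio_frames = [rng.normal(size=(n, 128)) for n in (300, 250, 400)]
    text_embs = rng.normal(size=(3, 128))  # one caption per clip

    def l2norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    audio_descs = l2norm(np.stack([f.mean(axis=0) for f in audio_frames]))
    text_descs = l2norm(text_embs)

    sim = audio_descs @ text_descs.T       # (n_audio, n_text) cosine scores
    a2t = (-sim).argsort(axis=1)[:, 0]     # best caption per clip
    t2a = (-sim.T).argsort(axis=1)[:, 0]   # best clip per caption
    print("audio->text top-1:", a2t, "| text->audio top-1:", t2a)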
arXiv Detail & Related papers (2022-03-25T13:41:17Z) - Unsupervised Pattern Discovery from Thematic Speech Archives Based on
Multilingual Bottleneck Features [41.951988293049205]
We propose a two-stage approach, which comprises unsupervised acoustic modeling and decoding, followed by pattern mining in acoustic unit sequences.
The proposed system is able to effectively extract topic-related words and phrases from the lecture recordings on MIT OpenCourseWare.
arXiv Detail & Related papers (2020-11-03T20:06:48Z) - Continuous speech separation: dataset and analysis [52.10378896407332]
In natural conversations, a speech signal is continuous, containing both overlapped and overlap-free components.
This paper describes a dataset and protocols for evaluating continuous speech separation algorithms.
arXiv Detail & Related papers (2020-01-30T18:01:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.