Automatic Embedding of Stories Into Collections of Independent Media
- URL: http://arxiv.org/abs/2111.02216v1
- Date: Wed, 3 Nov 2021 13:36:47 GMT
- Title: Automatic Embedding of Stories Into Collections of Independent Media
- Authors: Dylan R. Ashley and Vincent Herrmann and Zachary Friggstad and Kory W.
Mathewson and Jürgen Schmidhuber
- Abstract summary: We look at how machine learning techniques can be used to automatically embed stories into collections of independent media.
We use models that extract the tempo of songs to make a music playlist follow a narrative arc.
- Score: 5.188557858279645
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We look at how machine learning techniques that derive properties of items in
a collection of independent media can be used to automatically embed stories
into such collections. To do so, we use models that extract the tempo of songs
to make a music playlist follow a narrative arc. Our work specifies an
open-source tool that uses pre-trained neural network models to extract the
global tempo of a set of raw audio files and applies these measures to create a
narrative-following playlist. This tool is available at
https://github.com/dylanashley/playlist-story-builder/releases/tag/v1.0.0
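The abstract describes two steps: estimate each track's global tempo, then order the tracks so their tempos trace a narrative arc. The released tool uses pre-trained neural network models for tempo extraction; the sketch below is only an illustration of the ordering step, assuming a simple triangular rise-and-fall tension curve and taking tempos as given. The helper names `narrative_arc` and `order_by_arc` are hypothetical, not from the tool.

```python
# Illustrative sketch of narrative-arc playlist ordering (not the released
# playlist-story-builder implementation, which uses pretrained neural tempo
# models). Tempos are assumed to be precomputed, one value per track.
import numpy as np

def narrative_arc(n, climax=0.75):
    """Target tension curve over n playlist slots: rise to a climax, then fall."""
    x = np.linspace(0.0, 1.0, n)
    return np.where(x <= climax, x / climax, (1.0 - x) / (1.0 - climax))

def order_by_arc(tempos):
    """Return track indices ordered so tempos follow the arc.

    Slots with a higher target tension receive the faster tracks: sort slots
    by target value, sort tracks by tempo, and pair them off in order.
    """
    n = len(tempos)
    arc = narrative_arc(n)
    slot_order = np.argsort(arc, kind="stable")      # slots, lowest target first
    track_order = np.argsort(tempos, kind="stable")  # tracks, slowest first
    playlist = np.empty(n, dtype=int)
    playlist[slot_order] = track_order
    return playlist.tolist()

# Placeholder tempo values (BPM) for four hypothetical tracks.
tempos = [128.0, 90.0, 150.0, 110.0]
print(order_by_arc(tempos))  # → [1, 0, 2, 3]: 90 → 128 → 150 (climax) → 110
```

With four tracks the arc peaks at the third slot, so the fastest track lands there while the slowest tracks open and close the playlist.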
Related papers
- Look, Listen and Recognise: Character-Aware Audio-Visual Subtitling [62.25533750469467]
We propose an audio-visual method that generates a full transcript of the dialogue, with precise speech timestamps, and the character speaking identified.
We evaluate the method over a variety of TV sitcoms, including Seinfeld, Frasier and Scrubs.
We envision this system being useful for the automatic generation of subtitles to improve the accessibility of videos available on modern streaming services.
arXiv Detail & Related papers (2024-01-22T15:26:01Z)
- WikiMuTe: A web-sourced dataset of semantic descriptions for music audio [7.4327407361824935]
We present WikiMuTe, a new and open dataset containing rich semantic descriptions of music.
The data is sourced from Wikipedia's rich catalogue of articles covering musical works.
We train a model that jointly learns text and audio representations and performs cross-modal retrieval.
arXiv Detail & Related papers (2023-12-14T18:38:02Z)
- MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models [54.55063772090821]
MusicAgent integrates numerous music-related tools and an autonomous workflow to address user requirements.
The primary goal of this system is to free users from the intricacies of AI-music tools, enabling them to concentrate on the creative aspect.
arXiv Detail & Related papers (2023-10-18T13:31:10Z)
- Follow Anything: Open-set detection, tracking, and following in real-time [89.83421771766682]
We present a robotic system to detect, track, and follow any object in real-time.
Our approach, dubbed "follow anything" (FAn), is an open-vocabulary and multimodal model.
FAn can be deployed on a laptop with a lightweight (6-8 GB) graphics card, achieving a throughput of 6-20 frames per second.
arXiv Detail & Related papers (2023-08-10T17:57:06Z)
- Separate Anything You Describe [55.0784713558149]
Language-queried audio source separation (LASS) is a new paradigm for computational auditory scene analysis (CASA).
AudioSep is a foundation model for open-domain audio source separation with natural language queries.
arXiv Detail & Related papers (2023-08-09T16:09:44Z)
- Noise2Music: Text-conditioned Music Generation with Diffusion Models [73.74580231353684]
We introduce Noise2Music, where a series of diffusion models is trained to generate high-quality 30-second music clips from text prompts.
We find that the generated audio faithfully reflects key elements of the text prompt, such as genre, tempo, instruments, mood, and era.
Pretrained large language models play a key role in this story -- they are used to generate paired text for the audio of the training set and to extract embeddings of the text prompts ingested by the diffusion models.
arXiv Detail & Related papers (2023-02-08T07:27:27Z)
- Music Playlist Title Generation Using Artist Information [4.201869316472344]
We present an encoder-decoder model that generates a playlist title from a sequence of music tracks.
Comparing the track IDs and artist IDs as input sequences, we show that the artist-based approach significantly enhances the performance in terms of word overlap, semantic relevance, and diversity.
arXiv Detail & Related papers (2023-01-14T00:19:39Z)
- Spectrograms Are Sequences of Patches [5.253100011321437]
We design a self-supervised model that captures a spectrogram of music as a series of patches: Patchifier.
We do not use labeled data for the pre-training process, only a subset of the MTAT dataset containing 16k music clips.
Our model achieves results comparable to those of other audio representation models.
arXiv Detail & Related papers (2022-10-28T08:39:36Z)
- Malakai: Music That Adapts to the Shape of Emotions [0.0]
Malakai is a tool that helps users create, listen to, remix, and share dynamic songs.
Using Malakai, a Composer can create a dynamic song that can be interacted with by a Listener.
arXiv Detail & Related papers (2021-12-03T18:34:54Z)
- Melon Playlist Dataset: a public dataset for audio-based playlist generation and music tagging [8.658926288789164]
We present a public dataset of mel-spectrograms for 649,091 tracks and 148,826 associated playlists annotated by 30,652 different tags.
All the data is gathered from Melon, a popular Korean streaming service.
The dataset is suitable for music information retrieval tasks, in particular, auto-tagging and automatic playlist continuation.
arXiv Detail & Related papers (2021-01-30T10:13:10Z)
- Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning [62.47593143542552]
We describe a subset optimization approach for automatic dataset curation.
We demonstrate that our approach finds videos with high audio-visual correspondence and show that self-supervised models trained on our data, despite being automatically constructed, achieve similar downstream performances to existing video datasets with similar scales.
arXiv Detail & Related papers (2021-01-26T14:27:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.