Search Result Clustering in Collaborative Sound Collections
- URL: http://arxiv.org/abs/2004.03985v1
- Date: Wed, 8 Apr 2020 13:08:17 GMT
- Title: Search Result Clustering in Collaborative Sound Collections
- Authors: Xavier Favory, Frederic Font and Xavier Serra
- Abstract summary: We propose a graph-based approach using audio features for clustering diverse sound collections obtained when querying large online databases.
We show that using a confidence measure for discarding inconsistent clusters improves the quality of the partitions.
- Score: 17.48516881308658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The large size of today's online multimedia databases makes retrieving
their content a difficult and time-consuming task. Users of online sound
collections typically submit search queries that express a broad intent, often
making the system return large and unmanageable result sets. Search Result
Clustering is a technique that organises search-result content into coherent
groups, which allows users to identify useful subsets in their results.
Obtaining coherent and distinctive clusters that can be explored with a
suitable interface is crucial for making this technique a useful complement to
traditional search engines. In our work, we propose a graph-based approach
using audio features for clustering diverse sound collections obtained when
querying large online databases. We propose an approach to assess the
performance of different features at scale, by taking advantage of the metadata
associated with each sound. This analysis is complemented with an evaluation
using ground-truth labels from manually annotated datasets. We show that using
a confidence measure for discarding inconsistent clusters improves the quality
of the partitions. After identifying the most appropriate features for
clustering, we conduct an experiment with users performing a sound design task,
in order to evaluate our approach and its user interface. A qualitative
analysis is carried out including usability questionnaires and semi-structured
interviews. This provides us with valuable new insights regarding the features
that promote efficient interaction with the clusters.
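
The abstract does not name the graph construction, the clustering algorithm, or the confidence measure, so the following is only a minimal sketch under assumed choices, not the authors' implementation. It assumes a mutual k-nearest-neighbour graph over feature vectors, uses connected components as a simple stand-in for graph clustering, and defines a hypothetical confidence score as the fraction of a cluster's nearest-neighbour links that stay inside the cluster. The function names, the toy 2-D "features", and the 0.5 threshold are all illustrative.

```python
import math


def knn_lists(features, k):
    """For each point, return the set of its k nearest neighbours (Euclidean)."""
    n = len(features)
    lists = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(features[i], features[j]))
        lists.append(set(others[:k]))
    return lists


def mutual_knn_clusters(neighbours):
    """Cluster by connected components of the mutual k-NN graph.

    An edge i-j exists only if i is in j's list and j is in i's list.
    This is an assumed stand-in for a graph clustering algorithm.
    """
    n = len(neighbours)
    graph = {i: {j for j in neighbours[i] if i in neighbours[j]} for i in range(n)}
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


def confidence(cluster, neighbours):
    """Hypothetical score: share of members' k-NN links that stay in the cluster."""
    members = set(cluster)
    total = sum(len(neighbours[i]) for i in cluster)
    inside = sum(len(neighbours[i] & members) for i in cluster)
    return inside / total if total else 0.0


def cluster_with_confidence(features, k=2, threshold=0.5):
    """Return only the clusters whose confidence reaches the threshold."""
    neighbours = knn_lists(features, k)
    clusters = mutual_knn_clusters(neighbours)
    return [c for c in clusters if confidence(c, neighbours) >= threshold]


# Toy data: two tight groups and one isolated point (index 6).
toy_features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
print(cluster_with_confidence(toy_features))  # [[0, 1, 2], [3, 4, 5]]
```

In this toy run the isolated point forms a singleton cluster whose neighbour links all point elsewhere, so its score is 0 and it is discarded, which illustrates the general idea of dropping inconsistent clusters.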
Related papers
- A structured regression approach for evaluating model performance across intersectional subgroups [53.91682617836498]
Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups.
We introduce a structured regression approach to disaggregated evaluation that we demonstrate can yield reliable system performance estimates even for very small subgroups.
arXiv Detail & Related papers (2024-01-26T14:21:45Z)
- One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z)
- Representation Learning for the Automatic Indexing of Sound Effects Libraries [79.68916470119743]
We show that a task-specific but dataset-independent representation can successfully address data issues such as class imbalance, inconsistent class labels, and insufficient dataset size.
Detailed experimental results show the impact of metric learning approaches and different cross-dataset training methods on representational effectiveness.
arXiv Detail & Related papers (2022-08-18T23:46:13Z)
- Seeking the Truth Beyond the Data. An Unsupervised Machine Learning Approach [0.0]
Clustering is an unsupervised machine learning methodology where unlabeled elements/objects are grouped together.
This article provides an in-depth description of the most widely used clustering methodologies.
It emphasizes the comparison of these algorithms' clustering efficiency based on 3 datasets.
arXiv Detail & Related papers (2022-07-14T14:22:36Z)
- Twitter Referral Behaviours on News Consumption with Ensemble Clustering of Click-Stream Data in Turkish Media [2.9005223064604078]
This study investigates the readers' click activities in the organizations' websites to identify news consumption patterns following referrals from Twitter.
The investigation is widened to a broad perspective by linking the log data with news content to enrich the insights.
arXiv Detail & Related papers (2022-02-04T09:57:13Z)
- A review of systematic selection of clustering algorithms and their evaluation [0.0]
This paper aims to identify a systematic selection logic for clustering algorithms and corresponding validation concepts.
The goal is to enable potential users to choose an algorithm that best fits their needs and the properties of their underlying data clustering problem.
arXiv Detail & Related papers (2021-06-24T07:01:46Z)
- Connecting Images through Time and Sources: Introducing Low-data, Heterogeneous Instance Retrieval [3.6526118822907594]
We show that it is not trivial to pick features responding well to a panel of variations and semantic content.
Introducing a new enhanced version of the Alegoria benchmark, we compare descriptors using the detailed annotations.
arXiv Detail & Related papers (2021-03-19T10:54:51Z)
- Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning [62.47593143542552]
We describe a subset optimization approach for automatic dataset curation.
We demonstrate that our approach finds videos with high audio-visual correspondence and show that self-supervised models trained on our data, despite being automatically constructed, achieve similar downstream performances to existing video datasets with similar scales.
arXiv Detail & Related papers (2021-01-26T14:27:47Z)
- Boxer: Interactive Comparison of Classifier Results [9.957660146705745]
Boxer is a system to enable machine learning comparisons.
It allows the user to identify interesting subsets of training and testing instances and to compare the performance of the classifiers on these subsets.
We demonstrate Boxer in use cases including model selection, tuning, fairness assessment, and data quality diagnosis.
arXiv Detail & Related papers (2020-04-16T21:05:34Z)
- Mining Implicit Entity Preference from User-Item Interaction Data for Knowledge Graph Completion via Adversarial Learning [82.46332224556257]
We propose a novel adversarial learning approach by leveraging user interaction data for the Knowledge Graph Completion task.
Our generator is isolated from user interaction data, and serves to improve the performance of the discriminator.
To discover the implicit entity preferences of users, we design an elaborate collaborative learning algorithm based on graph neural networks.
arXiv Detail & Related papers (2020-03-28T05:47:33Z)
- Improving Multi-Turn Response Selection Models with Complementary Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments on two public datasets and obtain significant improvement on both datasets.
arXiv Detail & Related papers (2020-02-18T06:29:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.