COVID-19 Multidimensional Kaggle Literature Organization
- URL: http://arxiv.org/abs/2107.08190v2
- Date: Tue, 20 Jul 2021 01:59:41 GMT
- Title: COVID-19 Multidimensional Kaggle Literature Organization
- Authors: Maksim E. Eren, Nick Solovyev, Chris Hamer, Renee McDonald, Boian S. Alexandrov, Charles Nicholas
- Abstract summary: We show that tensor factorization is a powerful unsupervised learning method capable of discovering hidden patterns in a document corpus.
We show that a higher-order representation of the corpus allows for the simultaneous grouping of similar articles, relevant journals, authors with similar research interests, and topic keywords.
- Score: 3.201839066679614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The unprecedented outbreak of Severe Acute Respiratory Syndrome Coronavirus-2
(SARS-CoV-2), or COVID-19, continues to be a significant worldwide problem. As
a result, a surge of new COVID-19 related research has followed suit. The
growing number of publications requires document organization methods to
identify relevant information. In this paper, we expand upon our previous work
with clustering the CORD-19 dataset by applying multi-dimensional analysis
methods. Tensor factorization is a powerful unsupervised learning method
capable of discovering hidden patterns in a document corpus. We show that a
higher-order representation of the corpus allows for the simultaneous grouping
of similar articles, relevant journals, authors with similar research
interests, and topic keywords. These groupings are identified within and among
the latent components extracted via tensor decomposition. We further
demonstrate the application of this method with a publicly available
interactive visualization of the dataset.
Related papers
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- An Information Retrieval and Extraction Tool for Covid-19 Related Papers [0.0]
The main focus of this paper is to provide researchers with a better search tool for COVID-19 related papers.
Our tool has shown the potential to assist researchers by automating a topic-based search of CORD-19 papers.
arXiv Detail & Related papers (2024-01-20T01:34:50Z)
- Exploring the evolution of research topics during the COVID-19 pandemic [3.234641429290768]
We present the CORD-19 Topic Visualizer (CORToViz), a method and associated visualization tool for inspecting the CORD-19 textual corpus of scientific abstracts.
Our method is based upon a careful selection of up-to-date technologies (including large language models) and extraction techniques for temporal topic mining.
Topic inspection is supported by an interactive dashboard, providing fast, one-click visualization of topic contents as word clouds and topic trends as time series.
arXiv Detail & Related papers (2023-10-05T22:16:41Z)
- Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
- A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
International Classification of Diseases (ICD) codes are the de facto standard used globally for clinical coding.
These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information.
Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
arXiv Detail & Related papers (2021-02-26T17:49:58Z)
- Contrastive analysis for scatter plot-based representations of dimensionality reduction [0.0]
This paper introduces a methodology to explore multidimensional datasets and interpret clusters' formation.
We also introduce a bipartite graph to visually interpret and explore the relationship between the statistical variables used to understand how the attributes influenced cluster formation.
arXiv Detail & Related papers (2021-01-26T01:16:31Z)
- Navigating the landscape of COVID-19 research through literature analysis: A bird's eye view [11.362549790802483]
We analyze the LitCovid collection, 13,369 COVID-19 related articles found in PubMed as of May 15th, 2020.
We do that by applying state-of-the-art named entity recognition, classification, clustering and other NLP techniques.
Our clustering algorithm identifies topics represented by groups of related terms, and computes clusters corresponding to documents associated with the topic terms.
arXiv Detail & Related papers (2020-08-07T23:39:29Z)
- CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization [53.67205506042232]
CO-Search is a retriever-ranker semantic search engine designed to handle complex queries over the COVID-19 literature.
To account for the domain-specific and relatively limited dataset, we generate a bipartite graph of document paragraphs and citations.
We evaluate our system on the data of the TREC-COVID information retrieval challenge.
arXiv Detail & Related papers (2020-06-17T01:32:48Z)
- Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
- Target specific mining of COVID-19 scholarly articles using one-class approach [3.4935179780034247]
This paper aims to extract the activity and trends of coronavirus-related research articles using machine learning approaches.
The k-means clustering algorithm, followed by parallel OCSVMs, outperforms other methods for both original and reduced feature space.
arXiv Detail & Related papers (2020-04-24T12:39:54Z)
- Rapidly Bootstrapping a Question Answering Dataset for COVID-19 [88.86456834766288]
We present CovidQA, the beginnings of a question answering dataset specifically designed for COVID-19.
This is the first publicly available resource of its type, and intended as a stopgap measure for guiding research until more substantial evaluation resources become available.
arXiv Detail & Related papers (2020-04-23T17:35:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.