NLP Scholar: An Interactive Visual Explorer for Natural Language
Processing Literature
- URL: http://arxiv.org/abs/2006.01131v1
- Date: Sun, 31 May 2020 17:12:37 GMT
- Title: NLP Scholar: An Interactive Visual Explorer for Natural Language
Processing Literature
- Authors: Saif M. Mohammad
- Abstract summary: We describe several interconnected interactive visualizations (dashboards) that present various aspects of the data.
The interactive visualizations presented here, and the associated dataset of papers mapped to citations, have additional uses as well, including understanding how the field is growing.
- Score: 31.87319293259599
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As part of the NLP Scholar project, we created a single unified dataset of
NLP papers and their meta-information (including citation numbers), by
extracting and aligning information from the ACL Anthology and Google Scholar.
In this paper, we describe several interconnected interactive visualizations
(dashboards) that present various aspects of the data. Clicking on an item
within a visualization or entering query terms in the search boxes filters the
data in all visualizations in the dashboard. This allows users to search for
papers in the area of their interest, published within specific time periods,
published by specified authors, etc. The interactive visualizations presented
here, and the associated dataset of papers mapped to citations, have additional
uses as well including understanding how the field is growing (both overall and
across sub-areas), as well as quantifying the impact of different types of
papers on subsequent publications.
Related papers
- Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities.
Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv Detail & Related papers (2024-10-03T17:49:09Z)
- cPAPERS: A Dataset of Situated and Multimodal Interactive Conversations in Scientific Papers [5.103692331918768]
This work introduces Conversational Papers (cPAPERS), a dataset of conversational question-answer pairs from reviews of academic papers.
We present a data collection strategy to collect these question-answer pairs from OpenReview and associate them with contextual information from source files.
arXiv Detail & Related papers (2024-06-12T16:46:12Z)
- Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions [62.0123588983514]
Large Language Models (LLMs) have demonstrated wide-ranging applications across various fields.
We reformulate the peer-review process as a multi-turn, long-context dialogue, incorporating distinct roles for authors, reviewers, and decision makers.
We construct a comprehensive dataset containing over 26,841 papers with 92,017 reviews collected from multiple sources.
arXiv Detail & Related papers (2024-06-09T08:24:17Z)
- DiscoverPath: A Knowledge Refinement and Retrieval System for
Interdisciplinarity on Biomedical Research [96.10765714077208]
Traditional keyword-based search engines fall short in assisting users who may not be familiar with specific terminologies.
We present a knowledge graph-based paper search engine for biomedical research to enhance the user experience.
The system, dubbed DiscoverPath, employs Named Entity Recognition (NER) and part-of-speech (POS) tagging to extract terminologies and relationships from article abstracts to create a KG.
arXiv Detail & Related papers (2023-09-04T20:52:33Z)
- SciLit: A Platform for Joint Scientific Literature Discovery,
Summarization and Citation Generation [11.186252009101077]
We propose SciLit, a pipeline that automatically recommends relevant papers, extracts highlights, and suggests a reference sentence as a citation of a paper.
SciLit efficiently recommends papers from large databases of hundreds of millions of papers using a two-stage pre-fetching and re-ranking literature search system.
arXiv Detail & Related papers (2023-06-06T09:34:45Z)
- Topic Segmentation of Research Article Collections [4.0810783261728565]
We perform topic segmentation of a paper data collection that we crawled and produce a multitopic dataset of roughly seven million paper data records.
We construct a taxonomy of topics extracted from the data records and then annotate each document with its corresponding topic from that taxonomy.
It is possible to use this newly proposed dataset in two modalities: as a heterogeneous collection of documents from various disciplines or as a set of homogeneous collections, each from a single research topic.
arXiv Detail & Related papers (2022-05-18T15:19:42Z)
- iFacetSum: Coreference-based Interactive Faceted Summarization for
Multi-Document Exploration [63.272359227081836]
iFacetSum integrates interactive summarization together with faceted search.
Fine-grained facets are automatically produced based on cross-document coreference pipelines.
arXiv Detail & Related papers (2021-09-23T20:01:11Z)
- A Graph Representation of Semi-structured Data for Web Question
Answering [96.46484690047491]
We propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations.
Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-14T04:01:54Z)
- Machine Identification of High Impact Research through Text and Image
Analysis [0.4737991126491218]
We present a system to automatically separate papers with a high from those with a low likelihood of gaining citations.
Our system uses both a visual classifier, useful for surmising a document's overall appearance, and a text classifier, for making content-informed decisions.
arXiv Detail & Related papers (2020-05-20T19:12:24Z)
- Let Me Choose: From Verbal Context to Font Selection [50.293897197235296]
We aim to learn associations between visual attributes of fonts and the verbal context of the texts they are typically applied to.
We introduce a new dataset, containing examples of different topics in social media posts and ads, labeled through crowd-sourcing.
arXiv Detail & Related papers (2020-05-03T17:36:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.