Decoding MIE: A Novel Dataset Approach Using Topic Extraction and Affiliation Parsing
- URL: http://arxiv.org/abs/2410.04602v1
- Date: Sun, 06 Oct 2024 19:34:23 GMT
- Title: Decoding MIE: A Novel Dataset Approach Using Topic Extraction and Affiliation Parsing
- Authors: Ehsan Bitaraf, Maryam Jafarpour
- Abstract summary: This study introduces a novel dataset derived from the Medical Informatics Europe (MIE) Conference proceedings.
We extracted and processed metadata and abstracts from 4,606 articles published in the "Studies in Health Technology and Informatics" journal series.
- Score: 0.0
- Abstract: The rapid expansion of medical informatics literature presents significant challenges in synthesizing and analyzing research trends. This study introduces a novel dataset derived from the Medical Informatics Europe (MIE) Conference proceedings, addressing the need for sophisticated analytical tools in the field. Utilizing the Triple-A software, we extracted and processed metadata and abstracts from 4,606 articles published in the "Studies in Health Technology and Informatics" journal series, focusing on MIE conferences from 1996 onwards. Our methodology incorporated advanced techniques such as affiliation parsing using the TextRank algorithm. The resulting dataset, available in JSON format, offers a comprehensive view of bibliometric details, extracted topics, and standardized affiliation information. Analysis of this data revealed interesting patterns in Digital Object Identifier usage, citation trends, and authorship attribution across the years. Notably, we observed inconsistencies in author data and a brief period of linguistic diversity in publications. This dataset represents a significant contribution to the medical informatics community, enabling longitudinal studies of research trends, collaboration network analyses, and in-depth bibliometric investigations. By providing this enriched, structured resource spanning nearly three decades of conference proceedings, we aim to facilitate novel insights and advancements in the rapidly evolving field of medical informatics.
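The abstract names the TextRank algorithm without describing how it was applied. As a rough illustration only, the following is a minimal keyword-ranking sketch of TextRank in plain Python. It is not the Triple-A implementation: the function name `textrank_keywords`, the stop-word list, the window size, and the damping factor are all assumptions chosen for this example.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "and", "for", "from", "with", "this", "that", "are", "was"}


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words by running PageRank-style updates on a co-occurrence graph."""
    # Tokenize, lowercase, and drop stop words and very short tokens.
    words = [
        w for w in re.findall(r"[a-z]+", text.lower())
        if w not in STOP_WORDS and len(w) > 2
    ]

    # Link every pair of distinct words that occur within `window` positions
    # of each other (after filtering); the graph is undirected.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window + 1]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Iterate the TextRank update: each word receives an equal share of the
    # score of every neighbor, mixed with a constant (1 - damping) term.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


sample = (
    "medical informatics dataset medical informatics conference "
    "medical informatics"
)
top_two = textrank_keywords(sample, top_k=2)
```

In this sketch, words that co-occur with many distinct neighbors accumulate higher scores, which is the core idea of TextRank; a production pipeline would typically add part-of-speech filtering and phrase merging.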
Related papers
- A Review on Generative AI Models for Synthetic Medical Text, Time Series, and Longitudinal Data [0.3374875022248865]
This paper presents the results of a novel scoping review on the practical models for generating three different types of synthetic health records (SHRs).
In total, 52 publications met the eligibility criteria for generating medical time series (22), longitudinal data (17), and medical text (13).
Privacy preservation was found to be the main research objective of the studied papers, along with class imbalance, data scarcity, and data imputation as the other objectives.
arXiv Detail & Related papers (2024-11-19T06:53:54Z)
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
- CARE: Extracting Experimental Findings From Clinical Literature [29.763929941107616]
This work presents CARE, a new IE dataset for the task of extracting clinical findings.
We develop a new annotation schema capturing fine-grained findings as n-ary relations between entities and attributes.
We collect extensive annotations for 700 abstracts from two sources: clinical trials and case reports.
arXiv Detail & Related papers (2023-11-16T10:06:19Z)
- Leveraging text data for causal inference using electronic health records [1.4182510510164876]
This paper presents a unified framework for leveraging text data to support causal inference with electronic health data.
We show how incorporating text data in a traditional matching analysis can help strengthen the validity of an estimated treatment effect.
We believe these methods have the potential to expand the scope of secondary analysis of clinical data to domains where structured EHR data is limited.
arXiv Detail & Related papers (2023-06-09T16:06:02Z)
- Application of Transformers based methods in Electronic Medical Records: A Systematic Literature Review [77.34726150561087]
This work presents a systematic literature review of state-of-the-art advances using transformer-based methods on electronic medical records (EMRs) in different NLP tasks.
arXiv Detail & Related papers (2023-04-05T22:19:42Z)
- EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data coming from a subset of DISNET and automatic association extractions from texts has been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
- Multimodal Machine Learning in Precision Health [10.068890037410316]
This review was conducted to summarize this field and identify topics ripe for future research.
We used a combination of content analysis and literature searches to establish search strings and databases of PubMed, Google Scholar, and IEEEXplore from 2011 to 2021.
The most common form of information fusion was early fusion. Notably, predictive performance improved when heterogeneous data fusion was performed.
arXiv Detail & Related papers (2022-04-10T21:56:07Z)
- Deep Learning Schema-based Event Extraction: Literature Review and Current Trends [60.29289298349322]
Event extraction technology based on deep learning has become a research hotspot.
This paper fills the gap by reviewing the state-of-the-art approaches, focusing on deep learning-based models.
arXiv Detail & Related papers (2021-07-05T16:32:45Z)
- What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z)
- Visual Exploration and Knowledge Discovery from Biomedical Dark Data [0.0]
We employ a natural language processing based pipeline to discover knowledge out of the biomedical dark data.
We aim to proffer a potential solution to overcome the problem of analyzing overwhelming amounts of information.
arXiv Detail & Related papers (2020-09-28T04:27:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.