Extracting Explainable Dates From Medical Images By Reverse-Engineering UNIX Timestamps
- URL: http://arxiv.org/abs/2505.11451v2
- Date: Tue, 03 Jun 2025 10:43:24 GMT
- Title: Extracting Explainable Dates From Medical Images By Reverse-Engineering UNIX Timestamps
- Authors: Lee Harris,
- Abstract summary: We show that regular expressions can be created through regular expression synthesis to identify complex dates and date ranges in text transcriptions.<n>To our knowledge, our proposed way of learning deterministic logic by reverse-engineering several many-one mappings and feeding these into a regular expression synthesiser is a new approach.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Dates often contribute towards highly impactful medical decisions, but it is rarely clear how to extract this data. AI has only just begun to be used transcribe such documents, and common methods are either to trust that the output produced by a complex AI model, or to parse the text using regular expressions. Recent work has established that regular expressions are an explainable form of logic, but it is difficult to decompose these into the component parts that are required to construct precise UNIX timestamps. First, we test publicly-available regular expressions, and we found that these were unable to capture a significant number of our dates. Next, we manually created easily-decomposable regular expressions, and we found that these were able to detect the majority of real dates, but also a lot of sequences of text that look like dates. Finally, we used regular expression synthesis to automatically identify regular expressions from the reverse-engineered UNIX timestamps that we created. We find that regular expressions created by regular expression synthesis detect far fewer sequences of text that look like dates than those that were manually created, at the cost of a slight increase to the number of missed dates. Overall, our results show that regular expressions can be created through regular expression synthesis to identify complex dates and date ranges in text transcriptions. To our knowledge, our proposed way of learning deterministic logic by reverse-engineering several many-one mappings and feeding these into a regular expression synthesiser is a new approach.
Related papers
- Is Reuse All You Need? A Systematic Comparison of Regular Expression Composition Strategies [5.503553586086489]
Authors: Are composition tasks unique enough to merit dedicated machinery, or is reuse all we need?<n>We collect a novel dataset of composition tasks mined from GitHub and RegExLib.<n>Our evaluation uses multiple dimensions, including a novel metric, to compare reuse-by-example against two synthesis approaches.
arXiv Detail & Related papers (2025-03-26T14:25:27Z) - Handling Numeric Expressions in Automatic Speech Recognition [56.972851337263755]
We compare cascaded and end-to-end approaches to recognize and format numeric expression.
Results show that adapted end-to-end models offer competitive performance with the advantage of lower latency and inference cost.
arXiv Detail & Related papers (2024-07-18T09:46:19Z) - Real-time Regular Expression Matching [65.268245109828]
This paper is devoted to finite state automata, regular expression matching, pattern recognition, and the exponential blow-up problem.
This paper presents a theoretical and hardware solution to the exponential blow-up problem for some complicated classes of regular languages.
arXiv Detail & Related papers (2023-08-20T09:25:40Z) - Linear-Time Modeling of Linguistic Structure: An Order-Theoretic
Perspective [97.57162770792182]
Tasks that model the relation between pairs of tokens in a string are a vital part of understanding natural language.
We show that these exhaustive comparisons can be avoided, and, moreover, the complexity can be reduced to linear by casting the relation between tokens as a partial order over the string.
Our method predicts real numbers for each token in a string in parallel and sorts the tokens accordingly, resulting in total orders of the tokens in the string.
arXiv Detail & Related papers (2023-05-24T11:47:35Z) - Attributable and Scalable Opinion Summarization [79.87892048285819]
We generate abstractive summaries by decoding frequent encodings, and extractive summaries by selecting the sentences assigned to the same frequent encodings.
Our method is attributable, because the model identifies sentences used to generate the summary as part of the summarization process.
It scales easily to many hundreds of input reviews, because aggregation is performed in the latent space rather than over long sequences of tokens.
arXiv Detail & Related papers (2023-05-19T11:30:37Z) - Improving Structured Text Recognition with Regular Expression Biasing [13.801707647700727]
We study the problem of recognizing structured text that follows certain formats.
We propose to improve the recognition accuracy of structured text by specifying regular expressions (regexes) for biasing.
arXiv Detail & Related papers (2021-11-10T23:12:05Z) - FOREST: An Interactive Multi-tree Synthesizer for Regular Expressions [5.21480688623047]
We present FOREST, a regular expression synthesizer for digital form validations.
Forestry produces a regular expression that matches the desired pattern for the input values.
We also present a new SMT encoding to synthesize capture conditions for a given regular expression.
arXiv Detail & Related papers (2020-12-28T14:06:01Z) - Benchmarking Multimodal Regex Synthesis with Complex Structures [45.35689345004124]
Existing datasets for regular expression (regex) generation from natural language are limited in complexity.
We introduce StructuredRegex, a new synthesis dataset differing from prior ones in three aspects.
arXiv Detail & Related papers (2020-05-02T00:16:09Z) - Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1)
arXiv Detail & Related papers (2020-04-19T08:27:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.