Strategies of Effective Digitization of Commentaries and
Sub-commentaries: Towards the Construction of Textual History
- URL: http://arxiv.org/abs/2201.01693v1
- Date: Wed, 5 Jan 2022 16:43:43 GMT
- Title: Strategies of Effective Digitization of Commentaries and
Sub-commentaries: Towards the Construction of Textual History
- Authors: Diptesh Kanojia, Malhar Kulkarni, Sayali Ghodekar, Eivind Kahrs,
Pushpak Bhattacharyya
- Abstract summary: We use the text of the Kāśikāvṛtti (KV) as a sample text, and with the help of philologists, we digitize the commentaries available to us.
We divide each commentary and sub-commentary into functional units and describe the methodology and motivation behind the functional unit division.
- Score: 26.355399011710944
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes additional aspects of a digital tool called the 'Textual
History Tool'. We describe its various salient features with special reference
to those of its features that may help the philologist digitize commentaries
and sub-commentaries on a text. This tool captures the historical evolution of
a text through various temporal stages, and interrelated data culled from
various types of related texts. We use the text of the Kāśikāvṛtti (KV)
as a sample text, and with the help of philologists, we digitize the
commentaries available to us. We digitize the Nyāsa (Ny), the Padamañjarī
(Pm), and the sub-commentaries on the KV text known as the Tantrapradīpa (Tp)
and the Makaranda (Mk). We divide each commentary and sub-commentary into
functional units and describe the methodology and motivation behind the
functional unit division. Our functional unit division helps generate more
accurate phylogenetic trees for the text, based on distance methods using the
data entered in the tool.
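The distance-based tree construction mentioned in the abstract can be sketched in miniature. The following is a hypothetical illustration, not the paper's actual pipeline: the sigla and variant readings are invented, plain edit distance stands in for whatever distance measure the tool feeds into tree building, and UPGMA is just one simple distance-based clustering method.

```python
# Hypothetical sketch: distance-based tree over variant readings of one
# functional unit. The sigla (KV, Ny, Pm, Tp) and readings are invented.

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def upgma(dist, labels):
    """UPGMA agglomerative clustering; returns a nested-tuple tree."""
    clusters = {i: (labels[i], 1) for i in range(len(labels))}
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    nxt = len(labels)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)                  # closest pair of clusters
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        del d[(i, j)]
        for k in list(clusters):                  # size-weighted average distance
            a = d.pop((min(i, k), max(i, k)))
            b = d.pop((min(j, k), max(j, k)))
            d[(min(nxt, k), max(nxt, k))] = (a * ni + b * nj) / (ni + nj)
        clusters[nxt] = ((ti, tj), ni + nj)
        nxt += 1
    return next(iter(clusters.values()))[0]

readings = {
    "KV": "vṛddhir ādaic",
    "Ny": "vṛddhir ād aic",
    "Pm": "vṛddhir ādaij",
    "Tp": "vrddhir adaic",
}
labels = list(readings)
texts = [readings[s] for s in labels]
matrix = [[levenshtein(a, b) for b in texts] for a in texts]
tree = upgma(matrix, labels)
print(tree)  # nests the most similar readings together first
```

Dividing the text into functional units before computing such distances means each unit contributes an independent, aligned comparison, which is why the authors report more accurate trees than distance computation over undivided running text.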
Related papers
- ParsiPy: NLP Toolkit for Historical Persian Texts in Python [1.637832760977605]
This work introduces ParsiPy, an NLP toolkit to handle phonetic transcriptions and analyze ancient texts.
ParsiPy offers modules for tokenization, lemmatization, part-of-speech tagging, phoneme-to-transliteration conversion, and word embedding.
arXiv Detail & Related papers (2025-03-22T16:21:29Z) - The Learnable Typewriter: A Generative Approach to Text Analysis [17.355857281085164]
We present a generative document-specific approach to character analysis and recognition in text lines.
Taking as input a set of text lines with similar font or handwriting, our approach can learn a large number of different characters.
arXiv Detail & Related papers (2023-02-03T11:17:59Z) - Classifying text using machine learning models and determining
conversation drift [4.785406121053965]
An analysis of various types of texts is invaluable to understanding both their semantic meaning and their relevance.
Text classification is a method of categorising documents.
It combines computer text classification and natural language processing to analyse text in aggregate.
arXiv Detail & Related papers (2022-11-15T18:09:45Z) - Beyond Text Generation: Supporting Writers with Continuous Automatic
Text Summaries [27.853155569154705]
We propose a text editor to help users plan, structure and reflect on their writing process.
It provides continuously updated paragraph-wise summaries as margin annotations, using automatic text summarization.
arXiv Detail & Related papers (2022-08-19T13:09:56Z) - Contrastive Graph Multimodal Model for Text Classification in Videos [9.218562155255233]
We are the first to address this new task of video text classification by fusing multimodal information.
We tailor a specific module called CorrelationNet to reinforce feature representation by explicitly extracting layout information.
We construct a new well-defined industrial dataset from the news domain, called TI-News, which is dedicated to building and evaluating video text recognition and classification applications.
arXiv Detail & Related papers (2022-06-06T04:06:21Z) - Discourse Analysis for Evaluating Coherence in Video Paragraph Captions [99.37090317971312]
We explore a novel discourse-based framework to evaluate the coherence of video paragraphs.
Central to our approach is the discourse representation of videos, which helps in modeling coherence of paragraphs conditioned on coherence of videos.
Our experimental results show that the proposed framework evaluates coherence of video paragraphs significantly better than all the baseline methods.
arXiv Detail & Related papers (2022-01-17T04:23:08Z) - SCROLLS: Standardized CompaRison Over Long Language Sequences [62.574959194373264]
We introduce SCROLLS, a suite of tasks that require reasoning over long texts.
SCROLLS contains summarization, question answering, and natural language inference tasks.
We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.
arXiv Detail & Related papers (2022-01-10T18:47:15Z) - Digital Editions as Distant Supervision for Layout Analysis of Printed
Books [76.29918490722902]
We describe methods for exploiting this semantic markup as distant supervision for training and evaluating layout analysis models.
In experiments with several model architectures on the half-million pages of the Deutsches Textarchiv (DTA), we find a high correlation of these region-level evaluation methods with pixel-level and word-level metrics.
We discuss the possibilities for improving accuracy with self-training and the ability of models trained on the DTA to generalize to other historical printed books.
arXiv Detail & Related papers (2021-12-23T16:51:53Z) - Integrating Visuospatial, Linguistic and Commonsense Structure into
Story Visualization [81.26077816854449]
We first explore the use of constituency parse trees for encoding structured input.
Second, we augment the structured input with commonsense information and study the impact of this external knowledge on visual story generation.
Third, we incorporate visual structure via bounding boxes and dense captioning to provide feedback about the characters/objects in generated images.
arXiv Detail & Related papers (2021-10-21T00:16:02Z) - Aligning Subtitles in Sign Language Videos [80.20961722170655]
We train on manually annotated alignments covering over 15K subtitles that span 17.7 hours of video.
We use BERT subtitle embeddings and CNN video representations learned for sign recognition to encode the two signals.
Our model outputs frame-level predictions, i.e., for each video frame, whether it belongs to the queried subtitle or not.
arXiv Detail & Related papers (2021-05-06T17:59:36Z) - Topical Change Detection in Documents via Embeddings of Long Sequences [4.13878392637062]
We formulate the task of text segmentation as an independent supervised prediction task.
By fine-tuning on paragraphs of similar sections, we are able to show that learned features encode topic information.
Unlike previous approaches, which mostly operate at the sentence level, we consistently use a broader context.
arXiv Detail & Related papers (2020-12-07T12:09:37Z) - Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.