Entities of Interest
- URL: http://arxiv.org/abs/2102.10962v1
- Date: Mon, 22 Feb 2021 13:07:48 GMT
- Title: Entities of Interest
- Authors: David Graus
- Abstract summary: This dissertation revolves around discovery in digital traces, and sits at the intersection of Information Retrieval, Natural Language Processing, and applied Machine Learning.
We propose computational methods that aim to support the exploration and sense-making process of large collections of digital traces.
- Score: 2.609279398946235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the era of big data, we continuously - and at times unknowingly - leave
behind digital traces, by browsing, sharing, posting, liking, searching,
watching, and listening to online content. When aggregated, these digital
traces can provide powerful insights into the behavior, preferences,
activities, and traits of people. While many have raised privacy concerns
around the use of aggregated digital traces, it has undisputedly brought us
many advances, from the search engines that learn from their users and enable
our access to unforeseen amounts of data, knowledge, and information, to, e.g.,
the discovery of previously unknown adverse drug reactions from search engine
logs.
Whether in online services, journalism, digital forensics, law, or research,
we increasingly set out to explore large amounts of digital traces to
discover new information. Consider, for instance, the Enron scandal, Hillary
Clinton's email controversy, or the Panama papers: cases that revolve around
analyzing, searching, investigating, exploring, and turning upside down large
amounts of digital traces to gain new insights, knowledge, and information.
This discovery task is at its core about "finding evidence of activity in the
real world."
This dissertation revolves around discovery in digital traces, and sits at
the intersection of Information Retrieval, Natural Language Processing, and
applied Machine Learning. We propose computational methods that aim to support
the exploration and sense-making process of large collections of digital
traces. We focus on textual traces, e.g., emails and social media streams, and
address two aspects that are central to discovery in digital traces.
Related papers
- Tourists Profiling by Interest Analysis [0.0]
It is now easier to examine behaviors of tourists using digital traces they leave during their travels.
We suggest a study focused on both qualitative and quantitative aspects of digital traces to understand the dynamics governing tourist behavior.
arXiv Detail & Related papers (2025-12-05T18:35:49Z)
- Knowing Unknowns in an Age of Information Overload [1.5229257192293202]
The problem of incomplete information consumption stems from the nature of explicitly ranked information on digital platforms.
We propose an innovative metric that quantifies information completeness.
We find causal evidence that awareness of information completeness while browsing the Internet reduces resistance to factual information.
arXiv Detail & Related papers (2025-10-12T02:31:33Z)
- From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents [96.65646344634524]
Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm termed Agentic Deep Research.
We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn.
We demonstrate that Agentic Deep Research not only significantly outperforms existing approaches, but is also poised to become the dominant paradigm for future information seeking.
arXiv Detail & Related papers (2025-06-23T17:27:19Z)
- SoK: Advances and Open Problems in Web Tracking [71.54586748169943]
Web tracking is a pervasive and opaque practice that enables personalized advertising and conversion tracking.
Web tracking is undergoing a once-in-a-generation transformation driven by shifts in the advertising industry, the adoption of anti-tracking countermeasures by browsers, and the growing enforcement of emerging privacy regulations.
This Systematization of Knowledge (SoK) aims to consolidate and synthesize this wide-ranging research, offering a comprehensive overview of the technical mechanisms, countermeasures, and regulations that shape the modern and rapidly evolving web tracking landscape.
arXiv Detail & Related papers (2025-06-16T23:30:54Z)
- Fingerprinting and Tracing Shadows: The Development and Impact of Browser Fingerprinting on Digital Privacy [55.2480439325792]
Browser fingerprinting is a growing technique for identifying and tracking users online without traditional methods like cookies.
This paper gives an overview by examining the various fingerprinting techniques and analyzes the entropy and uniqueness of the collected data.
arXiv Detail & Related papers (2024-11-18T20:32:31Z)
- Advancing Web Browser Forensics: Critical Evaluation of Emerging Tools and Techniques [6.691341144481509]
Web forensics involves collecting and analyzing browser artifacts, such as browser history, search keywords, and downloads.
This paper defines four browsing scenarios to perform a comprehensive evaluation of popular browsers.
arXiv Detail & Related papers (2024-10-16T14:24:16Z)
- Digital Fingerprinting on Multimedia: A Survey [38.00034058447254]
The survey first introduces the definition, characteristics, and related concepts of digital fingerprints.
It then focuses on analyzing and summarizing the algorithms for extracting unimodal fingerprints of different types of digital content.
The survey elaborates on the various practical applications of digital fingerprints and outlines the challenges and potential future research directions.
arXiv Detail & Related papers (2024-08-26T09:59:45Z)
- An Innovative Tool for Uploading/Scraping Large Image Datasets on Social Networks [9.27070946719462]
We propose an automated approach by means of a digital tool that we created on purpose.
The tool is capable of automatically uploading an entire image dataset to the desired digital platform and then downloading all the uploaded pictures.
arXiv Detail & Related papers (2023-11-01T23:27:37Z)
- A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning [58.107474025048866]
Forgetting refers to the loss or deterioration of previously acquired knowledge.
Forgetting is a prevalent phenomenon observed in various other research domains within deep learning.
arXiv Detail & Related papers (2023-07-16T16:27:58Z)
- Fighting Malicious Media Data: A Survey on Tampering Detection and Deepfake Detection [115.83992775004043]
Recent advances in deep learning, particularly deep generative models, open the doors for producing perceptually convincing images and videos at a low cost.
This paper provides a comprehensive review of the current media tampering detection approaches, and discusses the challenges and trends in this field for future research.
arXiv Detail & Related papers (2022-12-12T02:54:08Z)
- Knowledge-augmented Deep Learning and Its Applications: A Survey [60.221292040710885]
Knowledge-augmented deep learning (KADL) aims to identify domain knowledge and integrate it into deep models for data-efficient, generalizable, and interpretable deep learning.
This survey subsumes existing works and offers a bird's-eye view of research in the general area of knowledge-augmented deep learning.
arXiv Detail & Related papers (2022-11-30T03:44:15Z)
- Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis [55.72674354651122]
We first summarize the scope of person generation, then systematically review recent progress and technical trends in deep person generation.
More than two hundred papers are covered for a thorough overview, and the milestone works are highlighted to witness the major technical breakthrough.
We hope this survey could shed some light on the future prospects of deep person generation, and provide a helpful foundation for full applications towards digital human.
arXiv Detail & Related papers (2021-09-05T14:15:24Z)
- A Search Engine for Scientific Publications: a Cybersecurity Case Study [0.7734726150561086]
This work proposes a new search engine for scientific publications which combines both information retrieval and reading comprehension algorithms.
The proposed solution, although applied to the context of cybersecurity, exhibited great generalization capabilities and can be easily adapted to other knowledge domains.
arXiv Detail & Related papers (2021-06-30T20:10:04Z)
- Visual Exploration and Knowledge Discovery from Biomedical Dark Data [0.0]
We employ a natural language processing based pipeline to discover knowledge out of the biomedical dark data.
We aim to proffer a potential solution to overcome the problem of analyzing overwhelming amounts of information.
arXiv Detail & Related papers (2020-09-28T04:27:05Z)
- A Survey on Knowledge Graphs: Representation, Acquisition and Applications [89.78089494738002]
We review research topics about 1) knowledge graph representation learning, 2) knowledge acquisition and completion, 3) temporal knowledge graph, and 4) knowledge-aware applications.
For knowledge acquisition, especially knowledge graph completion, embedding methods, path inference, and logical rule reasoning, are reviewed.
We explore several emerging topics, including meta learning, commonsense reasoning, and temporal knowledge graphs.
arXiv Detail & Related papers (2020-02-02T13:17:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.