Related papers: Boostlet.js: Image processing plugins for the web via JavaScript injection

Nested Browser-Use Learning for Agentic Information Seeking [60.775556172513014]
Information-seeking (IS) agents have achieved strong performance across a range of wide and deep search tasks, yet their tool use remains largely restricted to API-level snippet retrieval and URL-based page fetching. We propose Nested Browser-Use Learning (NestBrowse), which introduces a minimal and complete browser-action framework that decouples interaction control from page exploration through a nested structure.
arXiv Detail & Related papers (2025-12-29T17:59:14Z)
Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform [104.39464309969253]
We present Visionary, an open, web-native platform for real-time rendering of various forms of Gaussian Splatting. Visionary enables dynamic neural processing while maintaining a lightweight, "click-to-run" browser experience.
arXiv Detail & Related papers (2025-12-09T10:54:58Z)
Development of an Automated Web Application for Efficient Web Scraping: Design and Implementation [0.0]
This paper presents the design and implementation of a user-friendly, automated web application that simplifies and optimizes the web scraping process for non-technical users. The application breaks down the complex task of web scraping into three main stages: fetching, extraction, and execution. This automated tool not only enhances the efficiency of web scraping but also democratizes access to data extraction by empowering users of all technical levels to gather and manage data tailored to their needs.
arXiv Detail & Related papers (2025-10-22T04:56:00Z)
WALT: Web Agents that Learn Tools [66.73502484310121]
WALT is a framework that reverse-engineers latent website functionality into reusable invocable tools. Rather than hypothesizing ad-hoc skills, WALT exposes robust implementations of automations already designed into websites. On VisualWebArena and WebArena, WALT achieves higher success with fewer steps and less LLM-dependent reasoning.
arXiv Detail & Related papers (2025-10-01T23:41:47Z)
ZjsComponent: A Pragmatic Approach to Modular, Reusable UI Fragments for Web Development [0.0]
ZjsComponent is a lightweight and framework-agnostic web component designed for creating modular, reusable UI elements. ZjsComponent does not require build-steps, transpiling, pre-compilation, any specific ecosystem or any other dependency.
arXiv Detail & Related papers (2025-05-04T08:57:31Z)
BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing [86.26405009039868]
BlobCtrl is a framework that unifies element-level generation and editing using a probabilistic blob-based representation. Our approach effectively decouples and represents spatial location, semantic content, and identity information. Experiments show that BlobCtrl excels in various element-level manipulation tasks while maintaining computational efficiency.
arXiv Detail & Related papers (2025-03-17T17:58:05Z)
WebLLM: A High-Performance In-Browser LLM Inference Engine [9.771248136952039]
WebLLM is an open-source framework that enables high-performance LLM inference in web browsers. WebLLM provides an OpenAI-style API for seamless integration into web applications. Evaluations show that WebLLM can retain up to 80% native performance on the same device.
arXiv Detail & Related papers (2024-12-20T11:24:13Z)
Blendify -- Python rendering framework for Blender [31.334130573156937]
Blendify is a Python-based framework that seamlessly integrates with Blender. It automates object creation, handling the colors and material linking, and implementing features such as shadow-catcher objects.
arXiv Detail & Related papers (2024-10-23T13:29:02Z)
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions [66.92809850624118]
PixWizard is an image-to-image visual assistant designed for image generation, manipulation, and translation based on free-form language instructions. We integrate a variety of vision tasks into a unified image-text-to-image generation framework and curate an Omni Pixel-to-Pixel Instruction-Tuning dataset. Our experiments demonstrate that PixWizard not only shows impressive generative and understanding abilities for images with diverse resolutions but also exhibits promising generalization capabilities with unseen tasks and human instructions.
arXiv Detail & Related papers (2024-09-23T17:59:46Z)
Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid [87.09900996643516]
We introduce a Complementary Image Pyramid (CIP) to mitigate semantic discontinuity during high-resolution image processing. We also introduce a Scale Compression Mechanism (SCM) to reduce the additional computational overhead by compressing the redundant visual tokens. Our experiments demonstrate that CIP can consistently enhance the performance across diverse architectures.
arXiv Detail & Related papers (2024-08-04T13:55:58Z)
DistML.js: Installation-free Distributed Deep Learning Framework for Web Browsers [40.48978035180545]
"DistML.js" is a library designed for training and inference of machine learning models within web browsers. We provide a comprehensive explanation of DistML.js's design, API, and implementation, alongside practical applications.
arXiv Detail & Related papers (2024-07-01T07:13:14Z)
AutoScraper: A Progressive Understanding Web Agent for Web Scraper Generation [54.17246674188208]
Web scraping is a powerful technique that extracts data from websites, enabling automated data collection, enhancing data analysis capabilities, and minimizing manual data entry efforts. Existing wrapper-based methods suffer from limited adaptability and scalability when faced with a new website. We introduce the paradigm of generating web scrapers with large language models (LLMs) and propose AutoScraper, a two-stage framework that can handle diverse and changing web environments more efficiently.
arXiv Detail & Related papers (2024-04-19T09:59:44Z)
EasyPhoto: Your Smart AI Photo Generator [11.926387357705712]
We propose a novel WebUI plugin called EasyPhoto, which enables the generation of AI portraits. By training a digital doppelganger of a specific user ID using 5 to 20 relevant images, the finetuned model allows for the generation of AI photos using arbitrary templates.
arXiv Detail & Related papers (2023-10-07T03:16:56Z)
Internet Explorer: Targeted Representation Learning on the Open Web [121.02587846761627]
Modern vision models typically rely on fine-tuning general-purpose models pre-trained on large, static datasets. We propose dynamically utilizing the Internet to quickly train a small-scale model that does extremely well on the task at hand. Our approach, called Internet Explorer, explores the web in a self-supervised manner to progressively find relevant examples that improve performance on a desired target dataset.
arXiv Detail & Related papers (2023-02-27T18:59:55Z)
FeatureBooster: Boosting Feature Descriptors with a Lightweight Neural Network [16.10404845106396]
We introduce a lightweight network to improve descriptors of keypoints within the same image. The network takes the original descriptors and the geometric properties of keypoints as the input. We use the proposed network to boost both hand-crafted (ORB, SIFT) and the state-of-the-art learning-based descriptors.
arXiv Detail & Related papers (2022-11-28T05:06:03Z)
BRIMA: low-overhead BRowser-only IMage Annotation tool (Preprint) [3.523597468588939]
BRIMA is a flexible open-source browser extension for BRowser-only IMage Annotation. It allows the user to easily and efficiently develop and annotate images. It also features cross-browser and cross-platform functionality, thus presenting itself as a neat tool for researchers within the Computer Vision, Artificial Intelligence, and privacy-related fields.
arXiv Detail & Related papers (2021-07-13T19:23:13Z)
ZeroShotCeres: Zero-Shot Relation Extraction from Semi-Structured Webpages [66.45377533562417]
We propose a solution for "zero-shot" open-domain relation extraction from webpages with a previously unseen template. Our model uses a graph neural network-based approach to build a rich representation of text fields on a webpage.
arXiv Detail & Related papers (2020-05-14T16:15:58Z)
TorchIO: A Python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning [68.8204255655161]
We present TorchIO, an open-source Python library to enable efficient loading, preprocessing, augmentation and patch-based sampling of medical images for deep learning. TorchIO follows the style of PyTorch and integrates standard medical image processing libraries to efficiently process images during training of neural networks. It includes a command-line interface which allows users to apply transforms to image files without using Python.
arXiv Detail & Related papers (2020-03-09T13:36:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.