Towards Smart Point-and-Shoot Photography
- URL: http://arxiv.org/abs/2505.03638v1
- Date: Tue, 06 May 2025 15:40:14 GMT
- Title: Towards Smart Point-and-Shoot Photography
- Authors: Jiawan Li, Fei Zhou, Zhipeng Zhong, Jiongzhi Lin, Guoping Qiu,
- Abstract summary: We present a first-of-its-kind smart point-and-shoot (SPAS) system that helps users take good photos. SPAS helps users compose a good shot of a scene by automatically guiding them to adjust the camera pose live on the scene. We present extensive results demonstrating the performance of our SPAS system on publicly available image composition datasets.
- Score: 16.192062592740154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hundreds of millions of people routinely take photos with their smartphones as point-and-shoot (PAS) cameras, yet very few have the photography skills to compose a good shot of a scene. While traditional PAS cameras have built-in functions to ensure a photo is well focused and correctly exposed, they cannot tell users how to compose the best shot of a scene. In this paper, we present a first-of-its-kind smart point-and-shoot (SPAS) system that helps users take good photos. SPAS helps users compose a good shot of a scene by automatically guiding them to adjust the camera pose live on the scene. We first constructed a large dataset containing 320K images with camera pose information from 4000 scenes. We then developed an innovative CLIP-based Composition Quality Assessment (CCQA) model to assign pseudo labels to these images. CCQA introduces a unique learnable text embedding technique that learns continuous word embeddings capable of discerning subtle visual quality differences across the range covered by five quality description words {bad, poor, fair, good, perfect}. Finally, we developed a camera pose adjustment model (CPAM), which first determines whether the current view can be further improved and, if so, outputs an adjustment suggestion in the form of two camera pose adjustment angles. Because the two tasks of CPAM make decisions sequentially and each involves a different set of training samples, we developed a mixture-of-experts model with a gated loss function to train CPAM in an end-to-end manner. We present extensive results demonstrating the performance of our SPAS system on publicly available image composition datasets.
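The CCQA idea of scoring composition quality against five quality-level word embeddings can be illustrated with a minimal sketch. The paper's embeddings are learned end to end and paired with a real CLIP image encoder; here both the image feature and the level embeddings are random stand-ins, and the mapping of the five levels to scores 1..5, the 512-dimensional width, and the softmax temperature are all assumptions for illustration only.

```python
import numpy as np

# Quality levels named in the paper; the numeric scores 1..5 are an assumed mapping.
LEVELS = ["bad", "poor", "fair", "good", "perfect"]
SCORES = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

rng = np.random.default_rng(0)
DIM = 512  # typical CLIP embedding width (assumption)

# In the real CCQA these are *learnable* continuous word embeddings trained to
# discern subtle quality differences; here they are random unit vectors.
level_embeddings = rng.normal(size=(len(LEVELS), DIM))
level_embeddings /= np.linalg.norm(level_embeddings, axis=1, keepdims=True)

def composition_score(image_feature: np.ndarray, temperature: float = 0.07) -> float:
    """Score an image feature as the softmax-weighted expectation over the five
    quality levels, CLIP-style (cosine similarity scaled by a temperature)."""
    f = image_feature / np.linalg.norm(image_feature)
    sims = level_embeddings @ f            # cosine similarity to each level word
    logits = sims / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return float(probs @ SCORES)           # expected quality score in [1, 5]

# Example: a random vector standing in for a CLIP image encoding.
score = composition_score(rng.normal(size=DIM))
assert 1.0 <= score <= 5.0
```

Scores produced this way are continuous, which is what makes them usable as pseudo labels for ranking candidate camera poses rather than as hard five-way class decisions.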
Related papers
- Subjective Camera 0.1: Bridging Human Cognition and Visual Reconstruction through Sequence-Aware Sketch-Guided Diffusion [8.477506348193]
We introduce the concept of a subjective camera to reconstruct meaningful moments that physical cameras fail to capture. We propose Subjective Camera 0.1, a framework for reconstructing real-world scenes from readily accessible subjective readouts. Our approach avoids large-scale paired training data and mitigates generalization issues.
arXiv Detail & Related papers (2025-06-30T10:36:49Z) - ProCrop: Learning Aesthetic Image Cropping from Professional Compositions [57.949730056500634]
ProCrop is a retrieval-based method that leverages professional photography to guide cropping decisions. We present a large-scale dataset of 242K weakly-annotated images, generated by out-painting professional images. This composition-aware dataset generation offers diverse high-quality crop proposals guided by aesthetic principles.
arXiv Detail & Related papers (2025-05-28T15:38:44Z) - Photography Perspective Composition: Towards Aesthetic Perspective Recommendation [8.915832522709529]
Traditional photography composition approaches are dominated by 2D cropping-based methods. Professional photographers often employ perspective adjustment as a form of 3D recomposition. We propose photography perspective composition (PPC), extending beyond traditional cropping-based methods.
arXiv Detail & Related papers (2025-05-27T03:04:48Z) - Towards Understanding Camera Motions in Any Video [80.223048294482]
We introduce CameraBench, a large-scale dataset and benchmark designed to assess and improve camera motion understanding. CameraBench consists of 3,000 diverse internet videos annotated by experts through a rigorous quality control process. One of our contributions is a taxonomy of camera motion primitives, designed in collaboration with cinematographers.
arXiv Detail & Related papers (2025-04-21T18:34:57Z) - IC-Portrait: In-Context Matching for View-Consistent Personalized Portrait [51.18967854258571]
IC-Portrait is a novel framework designed to accurately encode individual identities for personalized portrait generation. Our key insight is that pre-trained diffusion models are fast learners for in-context dense correspondence matching. We show that IC-Portrait consistently outperforms existing state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2025-01-28T18:59:03Z) - Redundancy-Aware Camera Selection for Indoor Scene Neural Rendering [54.468355408388675]
We build a similarity matrix that incorporates both the spatial diversity of the cameras and the semantic variation of the images.
We apply a diversity-based sampling algorithm to optimize the camera selection.
We also develop a new dataset, IndoorTraj, which includes long and complex camera movements captured by humans in virtual indoor environments.
arXiv Detail & Related papers (2024-09-11T08:36:49Z) - Rawformer: Unpaired Raw-to-Raw Translation for Learnable Camera ISPs [53.68932498994655]
This paper introduces a novel method for unpaired learning of raw-to-raw translation across diverse cameras.
It accurately maps raw images captured by a certain camera to the target camera, facilitating the generalization of learnable ISPs to new unseen cameras.
Our method demonstrates superior performance on real camera datasets, achieving higher accuracy compared to previous state-of-the-art techniques.
arXiv Detail & Related papers (2024-04-16T16:17:48Z) - PhotoBot: Reference-Guided Interactive Photography via Natural Language [15.486784377142314]
PhotoBot is a framework for fully automated photo acquisition based on an interplay between high-level human language guidance and a robot photographer. We leverage a visual language model (VLM) and an object manipulator to characterize the reference images. We also use a large language model (LLM) to retrieve relevant reference images based on a user's language query.
arXiv Detail & Related papers (2024-01-19T23:34:48Z) - DISeR: Designing Imaging Systems with Reinforcement Learning [13.783685993646738]
We formulate four building blocks of imaging systems as a context-free grammar (CFG), which can be automatically searched over with a learned camera designer.
We show how the camera designer can be implemented with reinforcement learning to intelligently search over the space of possible imaging system configurations.
arXiv Detail & Related papers (2023-09-25T03:35:51Z) - Point-and-Shoot All-in-Focus Photo Synthesis from Smartphone Camera Pair [25.863069406779125]
We introduce a new task of AIF synthesis from main (wide) and ultra-wide cameras.
The goal is to recover sharp details from defocused regions in the main-camera photo with the help of the ultra-wide-camera one.
For the first time, we demonstrate point-and-shoot AIF photo synthesis successfully from main and ultra-wide cameras.
arXiv Detail & Related papers (2023-04-11T01:09:54Z) - Enhanced Stable View Synthesis [86.69338893753886]
We introduce an approach to enhance the novel view synthesis from images taken from a freely moving camera.
The introduced approach focuses on outdoor scenes where recovering accurate geometric scaffold and camera pose is challenging.
arXiv Detail & Related papers (2023-03-30T01:53:14Z) - Controllable Image Enhancement [66.18525728881711]
We present a semiautomatic image enhancement algorithm that can generate high-quality images with multiple styles by controlling a few parameters.
An encoder-decoder framework encodes the retouching skills into latent codes and decodes them into the parameters of image signal processing functions.
arXiv Detail & Related papers (2022-06-16T23:54:53Z) - Camera View Adjustment Prediction for Improving Image Composition [14.541539156817045]
We propose a deep learning-based approach that provides suggestions to the photographer on how to adjust the camera view before capturing.
By optimizing the composition before a photo is captured, our system helps photographers to capture better photos.
arXiv Detail & Related papers (2021-04-15T17:18:31Z) - PhotoApp: Photorealistic Appearance Editing of Head Portraits [97.23638022484153]
We present an approach for high-quality intuitive editing of the camera viewpoint and scene illumination in a portrait image.
Most editing approaches rely on supervised learning using training data captured with setups such as light and camera stages.
We design a supervised problem which learns in the latent space of StyleGAN.
This combines the best of supervised learning and generative adversarial modeling.
arXiv Detail & Related papers (2021-03-13T08:59:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.