Urban visual perception is important for the human experience in cities, shaped by intertwined characteristics of urban landscapes. By quantifying and explaining these perceptual experiences, researchers can gain insights into human preferences and support decision-making in planning and design. However, past studies have shown inconsistencies in survey design and ambiguities in reporting, leading to concerns about the reliability and reproducibility of results. This study proposes the first comprehensive framework to guide image-based survey design for capturing perceptions of outdoor urban environments across different scenarios, addressing the lack of methodological standardization in current research. We reviewed existing surveys to identify key parameters, conducted comprehensive between-subject and within-subject surveys, and performed statistical analyses to determine best practices for survey design across different contexts. Aiming to set a potential community standard, our study doubles as a blueprint for a reporting protocol for survey designs. Based on the results, we recommend: (1) meeting a minimum of 12 and 22 ratings per image for Likert Scale and Pairwise Comparison studies to reach survey reliability, respectively, and reporting these alongside other survey design parameters to enhance transparency and reproducibility; and (2) when resource allows larger experiments, adopt a ranking method such as Pairwise Comparison to achieve firmer rating results; and (3) using perspective (non-panoramic) images more frequently, as they exhibit comparable overall scores to panoramic images (R mostly >0.7), while being more widely available via crowdsourced sources, supporting their use in large-scale visual perception research.