Can a Large Language Model Assess Urban Design Quality? Evaluating Walkability Metrics Across Expertise Levels

Chenyi Cai, Kosuke Kuriyama, Youlong Gu, Filip Biljecki, Pieter Herthogs

September 2025

Abstract

Urban street environments are vital to supporting human activity in public spaces. The emergence of big data, such as street view images (SVI), combined with multi-modal large language models (MLLMs), is transforming how researchers and practitioners investigate, measure, and evaluate semantic and visual elements of urban environments. Considering the low threshold for creating automated evaluative workflows using MLLMs, it is crucial to explore both the risks and opportunities associated with these probabilistic models. In particular, the extent to which the integration of expert knowledge can influence the performance of MLLMs in evaluating the quality of urban design has not been fully explored. This study presents an initial exploration of how integrating more formal and structured representations of expert urban design knowledge (e.g., formal quantifiers and descriptions from existing methods) into the input prompts of an MLLM (ChatGPT-4) can enhance the model’s capability and reliability in evaluating the walkability of built environments using SVIs. We collect walkability metrics from the existing literature and categorise them using relevant ontologies. We then select a subset of these metrics, used for assessing the subthemes of pedestrian safety and attractiveness, and develop prompts for MLLMs accordingly. We analyse the ability of MLLMs to evaluate SVI walkability subthemes through prompts with multiple levels of clarity and specificity about evaluation criteria. Our experiments demonstrate that MLLMs are capable of providing assessments and interpretations based on general knowledge and can support the automation of image-text multimodal evaluations. However, they generally provide more optimistic scores and can make mistakes when interpreting the provided metrics, resulting in incorrect evaluations. When expert knowledge is integrated, the MLLM’s evaluations exhibit higher consistency and concentration. Therefore, this paper highlights the importance of formally and effectively integrating domain knowledge into MLLMs for evaluating urban design quality.

Type

Journal article

Publication

ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences

Chenyi Cai

Research Fellow

Youlong Gu

Research Engineer

Filip Biljecki

Assistant Professor