Although the concept of virtual spatial audio has existed for almost twenty-five years, only in the past fifteen years has modern computing technology enabled the real-time processing needed to deliver high-precision spatial audio. Furthermore, the concept of virtually walking through an auditory environment did not exist. The applications of such an interface have numerous potential uses. Spatial audio has the potential to be used in various manners ranging from enhancing sounds delivered in virtual gaming worlds to conveying spatial locations in real-time emergency response systems. To incorporate this technology in real-world systems, various concerns should be addressed. First, to widely incorporate spatial audio into real-world systems, head-related transfer functions (HRTFs) must be inexpensively created for each user. The present study further investigated an HRTF subjective selection procedure previously developed within our research group. Users discriminated auditory cues to subjectively select their preferred HRTF from a publicly available database. Next, the issue of training to find virtual sources was addressed. Listeners participated in a localization training experiment using their selected HRTFs. The training procedure was created from the characterization of successful search strategies in prior auditory search experiments. Search accuracy significantly improved after listeners performed the training procedure. Next, in the investigation of auditory spatial memory, listeners completed three search and recall tasks with differing recall methods. Recall accuracy significantly decreased in tasks that required the storage of sound source configurations in memory. To assess the impacts of practical scenarios, the present work assessed the performance effects of: signal uncertainty, visual augmentation, and different attenuation modeling. Fortunately, source uncertainty did not affect listeners’ ability to recall or identify sound sources. The present study also found that the presence of visual reference frames significantly increased recall accuracy. Additionally, the incorporation of drastic attenuation significantly improved environment recall accuracy. Through investigating the aforementioned concerns, the present study made initial footsteps guiding the design of virtual auditory environments that support spatial configuration recall.