Auralizing 3D Locations of Objects using Deep Learning
By Patrick Eickhoff, Augustine Ekweariri, Paul Hölzen, Marius Pierenkemper, and Anton Wiehe
Our project is inspired by the difficulty that blind or visually impaired people face in identifying and locating objects in their environment. We built a system that uses computer vision algorithms to both identify and locate different objects and then guides the user towards them with audio cues. This helps blind or visually impaired people interact with their environment, e.g. by scanning a room for a missing key and then guiding the user in the right direction until they can grasp it. The only requirements for using our system are a mobile RGB camera and some computing power (which could also be provided by a cloud service). This means that our system can be deployed on any smartphone.
Illustration of the subparts of our Auralization system and their interaction.
Our project consists of the following subtasks: object detection, depth estimation (optional), simultaneous localization and mapping (SLAM) and audio generation.
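To illustrate the audio generation subtask, here is a minimal sketch of how a 3D object position could be turned into a directional audio cue. The function name, the camera-frame axis convention (x right, y down, z forward), and the simple pan/gain mapping are our assumptions for illustration; the actual project may use a different spatialization method.

```python
import numpy as np

def position_to_stereo(pos, sample_rate=44100, duration=0.2, freq=880.0):
    """Map a 3D object position (camera frame: x right, y down, z forward)
    to a short stereo beep whose panning encodes direction and whose
    loudness encodes distance. A simplified stand-in for audio generation."""
    x, _, z = pos
    azimuth = np.arctan2(x, z)                # angle left/right of camera axis
    pan = (np.sin(azimuth) + 1.0) / 2.0       # 0 = full left, 1 = full right
    distance = np.linalg.norm(pos)
    gain = 1.0 / (1.0 + distance)             # quieter when farther away
    t = np.arange(int(sample_rate * duration)) / sample_rate
    tone = gain * np.sin(2 * np.pi * freq * t)
    left, right = (1.0 - pan) * tone, pan * tone
    return np.stack([left, right], axis=1)    # shape: (samples, 2)
```

An object to the right of the camera then produces a beep that is louder in the right channel, and the beep fades as the object moves farther away.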
Please take a look at our demonstration video for a better understanding (headphones recommended):
Overview of our main processing pipeline. The main “Auralizer” class retrieves one frame at a time from the video input, initiates tracking, and computes bounding boxes. If depth estimation is enabled, the estimated object distance is used to compute the absolute object position. Finally, the 3D coordinates are translated into an audio output signal.
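The per-frame loop described above could be sketched as follows. The class and method names, the pinhole back-projection, and the hypothetical focal length are our assumptions, not the project's actual implementation; the detector and depth estimator are stand-in interfaces for whichever models are plugged in.

```python
import numpy as np

class Auralizer:
    """Sketch of the main pipeline: one frame in, one 3D object position out.

    `detector` and `depth_estimator` are hypothetical components standing in
    for the project's object detection and (optional) depth estimation models.
    """

    def __init__(self, detector, depth_estimator=None, focal_length=500.0):
        self.detector = detector
        self.depth_estimator = depth_estimator
        self.focal_length = focal_length  # assumed pinhole focal length (px)

    def process_frame(self, frame):
        # 1. Detect the target object; expect a bounding box (x, y, w, h).
        box = self.detector.detect(frame)
        if box is None:
            return None
        x, y, w, h = box
        cx, cy = x + w / 2, y + h / 2  # bounding-box centre in pixels

        # 2. If depth estimation is enabled, read the object's distance
        #    from the predicted depth map at the box centre.
        if self.depth_estimator is not None:
            depth_map = self.depth_estimator.predict(frame)
            distance = float(depth_map[int(cy), int(cx)])
        else:
            distance = 1.0  # fall back to a unit-distance direction

        # 3. Back-project the pixel into a 3D position (pinhole model).
        h_img, w_img = frame.shape[:2]
        direction = np.array([(cx - w_img / 2) / self.focal_length,
                              (cy - h_img / 2) / self.focal_length,
                              1.0])
        return distance * direction / np.linalg.norm(direction)
```

The returned 3D coordinates would then be handed to the audio generation stage, closing the loop from camera input to sound output.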
For further details about our project (algorithms, implementation details and evaluation), please consider reading our full report: Download Full Report (PDF ~10MB)