Depth Sensing Technologies Overview

Choose the Best One for Your Application
Depth Sensing Technologies
3D depth sensing technologies enable devices and machines to perceive their surroundings. In recent years, depth measurement and three-dimensional perception have gained importance across many industries and applications, including industrial process optimization and automation, robotics, and autonomous vehicles. There are many different physical and technological approaches, each suited to different tasks. Below is an overview of the most important methods, to help you understand how they compare to one another and to assess which method works best for your application.

[Illustration: 3D Depth Sensing]

Working Principle Stereo Vision

Stereo vision technology achieves quantifiable depth perception by recording objects with two cameras. In principle, stereo vision mimics human vision: a scene is captured with two cameras located on a common baseline, with a fixed distance between the two lens centers, so that each camera captures the scene at a slightly different angle. The closer an object is, the farther its features are shifted laterally between the two images. This shift is referred to as disparity, and a depth map is calculated from the disparity values between the two images.

Passive systems use available ambient light or artificial lighting to illuminate the object. Active systems employ a light source that projects onto the visible scene, adding texture that improves the matching of features between the two captured images. Most commonly, infrared laser projectors with pseudo-random patterns are used, but structured light is also an option. The benefits of (active) stereo vision are its robustness to changing lighting conditions and easy multi-camera setups, as the cameras do not interfere with one another. Stereo depth cameras come in all price ranges, depending on the required accuracy level and range.

[Illustration: Stereo Vision]
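As a concrete illustration of the triangulation behind this, here is a minimal sketch of depth-from-disparity for a rectified stereo pair; the function name and the focal length, baseline, and disparity values are illustrative, not taken from any specific camera.

    import numpy as np

    def depth_from_disparity(disparity_px, focal_px, baseline_m):
        # Depth Z = f * B / d for a rectified stereo pair:
        #   disparity_px: disparity map in pixels (0 where no match was found)
        #   focal_px:     focal length in pixels
        #   baseline_m:   distance between the two lens centers in meters
        disparity = np.asarray(disparity_px, dtype=np.float64)
        depth = np.full(disparity.shape, np.inf)  # no disparity -> "infinitely far"
        valid = disparity > 0
        depth[valid] = focal_px * baseline_m / disparity[valid]
        return depth

    # Example: f = 700 px, B = 6 cm; larger disparity means a closer object.
    disparity = np.array([[35.0, 20.0, 7.0]])
    print(depth_from_disparity(disparity, focal_px=700.0, baseline_m=0.06))
    # -> [[1.2 2.1 6.0]]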

Working Principle Structured Light

Structured light refers to a known light pattern that is projected onto an object or scene with a projector and recorded by at least one camera. Usually dot, stripe, or color-coded patterns are used in the projection; time-coded patterns are also common. Cameras placed at known angles to the projector pick up the distorted pattern (usually camera and projector are combined in a single device). By calculating the difference between the projected pattern and the distorted pattern observed by the camera, the depth of the scene can be reconstructed and presented as a depth map. This method is not ideal for transparent objects, highly reflective surfaces, or long ranges. Depth reconstruction is also affected when multiple cameras cover a scene with overlapping fields of view, or when external light sources emit in the same wavelength and compete with the projected pattern, causing interference. The benefits of structured light are its high accuracy at short range and its fair cost compared to other technologies.
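To make the idea of time-coded patterns concrete, the sketch below generates and decodes binary-reflected Gray-code stripes, a common choice in structured-light systems. The pattern size and function names are illustrative; a real system would go on to triangulate the decoded projector columns against the observed camera pixels, much like stereo disparity.

    import numpy as np

    def gray_code_patterns(num_cols, num_bits):
        # One stripe image per bit plane: row i is the i-th (MSB-first) bit
        # of the Gray code assigned to each projector column.
        cols = np.arange(num_cols)
        gray = cols ^ (cols >> 1)                 # binary-reflected Gray code
        shifts = np.arange(num_bits - 1, -1, -1)  # MSB first
        return (gray[None, :] >> shifts[:, None]) & 1

    def decode_gray(bit_stack):
        # Recombine the captured bit planes and invert the Gray code to
        # recover the projector column index seen at each pixel.
        n = bit_stack.shape[0]
        weights = 2 ** np.arange(n - 1, -1, -1)
        gray = np.tensordot(weights, bit_stack, axes=1)
        binary = gray.copy()
        shift = 1
        while shift < n:                          # prefix-XOR Gray -> binary
            binary ^= binary >> shift
            shift <<= 1
        return binary

    patterns = gray_code_patterns(num_cols=8, num_bits=3)
    print(patterns)               # the three stripe frames the projector shows
    print(decode_gray(patterns))  # recovers column indices 0..7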


Working Principle Time of Flight


In computer vision, Time of Flight (ToF) refers to the principle of measuring the time light needs to travel a certain distance. Since the speed of light is known and the travel time is directly proportional to the distance, the distance to an object can be calculated from the time the light takes to travel from the emitter to the object and back to the receiver (usually combined in a single device). The light is usually emitted by an LED or laser in the infrared spectrum. Several different implementations are possible; commonly, flash-based Time of Flight cameras are distinguished from scanning-based light detection and ranging (LiDAR) systems, and direct ToF from indirect ToF systems.
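As a minimal numeric sketch of this principle: since emitter and receiver sit in the same device, the measured time covers the distance twice, so d = c · t / 2. The timing value below is illustrative.

    # Direct ToF: distance from the round-trip time of a single pulse.
    C = 299_792_458.0  # speed of light in m/s

    def dtof_distance(round_trip_time_s):
        # The pulse travels to the object and back, hence the division by 2.
        return C * round_trip_time_s / 2.0

    # A reflection arriving ~66.7 ns after the pulse left corresponds to ~10 m.
    print(f"{dtof_distance(66.7e-9):.2f} m")  # -> 10.00 m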
Like structured light, cameras leveraging the ToF principle are susceptible to interference from other cameras or external light sources that emit in the same wavelength. For multi-camera setups, this can be resolved by synchronizing the cameras. Overall benefits are high accuracy, independence from external light sources, and the ability to retrieve depth information from surfaces with little to no texture.

Direct and Indirect Time of Flight



Direct ToF (dToF) refers to emitting a single pulse and calculating the distance from the time difference between the emitted pulse and the received reflection, while indirect ToF (iToF) uses a continuously modulated or coded stream of light and calculates the distance from the phase difference between the emitted and the received reflected light. dToF is mostly implemented in scanning-based LiDARs (see illustration ToF Principle), with few exceptions like the Intel® RealSense™ LiDAR camera L515 (see illustration iToF LiDAR), which leverages iToF, while iToF is the main principle behind flash-based cameras (see illustration iToF Camera).
Especially for LiDAR systems, iToF comes with the benefit of higher accuracy without the need for extreme sampling rates of the laser light pulse. This allows capturing at higher resolutions and wider fields of view with high frame rates at reasonable cost.
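For illustration, the sketch below implements the widely used four-phase (four-bucket) iToF demodulation, assuming correlation samples at 0°, 90°, 180°, and 270° of the modulation period and one common sign convention; the sample values and the 20 MHz modulation frequency are illustrative, not tied to a specific sensor.

    import math

    C = 299_792_458.0  # speed of light in m/s

    def itof_distance(c0, c90, c180, c270, mod_freq_hz):
        # Phase shift between emitted and received light, from four
        # correlation samples (one common sign convention):
        #   phase = atan2(c270 - c90, c0 - c180)
        # then distance = c * phase / (4 * pi * f_mod).
        phase = math.atan2(c270 - c90, c0 - c180) % (2.0 * math.pi)
        return C * phase / (4.0 * math.pi * mod_freq_hz)

    # Example: samples implying a 90-degree phase shift at 20 MHz modulation.
    print(f"{itof_distance(60, 20, 60, 100, 20e6):.3f} m")  # -> ~1.874 m

Note that the modulation frequency also fixes the unambiguous range, c / (2 · f_mod), about 7.5 m at 20 MHz, which is the basic accuracy-versus-range trade-off in iToF designs.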

ToF Flash-Based Cameras



Typically, cameras referred to as flash-based Time of Flight cameras work with classic image sensor arrays and modulated light (indirect ToF) that illuminates the entire visible scene simultaneously, so they can image a scene in just one shot. Their optimal operating range spans short to medium distances, as the maximum range is dictated by the power of the light source. High frame rates and reasonable cost make these cameras interesting for many applications, such as dimensioning and weighing of packages in logistics.
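As a sketch of what such a one-shot depth image enables, the following back-projects a depth map into a 3D point cloud using the pinhole camera model, a typical first step for tasks like package dimensioning; the intrinsics (fx, fy, cx, cy) and depth values are illustrative.

    import numpy as np

    def depth_to_points(depth_m, fx, fy, cx, cy):
        # Back-project an (H, W) depth map in meters to (H*W, 3) XYZ points
        # via the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
        h, w = depth_m.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        x = (u - cx) * depth_m / fx
        y = (v - cy) * depth_m / fy
        return np.stack([x, y, depth_m], axis=-1).reshape(-1, 3)

    # Example: a flat 4x4 depth image of a surface 1.5 m from the camera.
    depth = np.full((4, 4), 1.5)
    print(depth_to_points(depth, fx=500.0, fy=500.0, cx=2.0, cy=2.0)[:3])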


LiDAR

LiDAR (light detection and ranging) is a special scanning-based ToF technology. A common LiDAR setup consists of a laser source that emits the laser pulses, a scanner that deflects the light onto the scene, and a detector that picks up the reflected light. Traditionally, the scanning of the scene is realized through a mirror that mechanically directs the laser beam across the scene, based on dToF. Recently, solid-state iToF designs are gaining traction, implementing a MEMS mirror to guide the laser.

The scanning process is repeated up to millions of times per second and produces a precise 3D point cloud of the environment. General benefits are high precision, reliability, and long range, depending on the laser power.
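As a simplified illustration of how such a point cloud arises, the sketch below converts one horizontal scan line of per-angle range measurements into Cartesian points; a real LiDAR adds elevation angles and precise timing, and all values here are illustrative.

    import numpy as np

    def scan_to_points(angles_rad, ranges_m):
        # Convert polar measurements (beam angle, measured range) into
        # (N, 2) XY points in the sensor frame; a 2D simplification.
        x = ranges_m * np.cos(angles_rad)
        y = ranges_m * np.sin(angles_rad)
        return np.stack([x, y], axis=-1)

    # Example: a 90-degree sweep in 1-degree steps, everything 5 m away.
    angles = np.deg2rad(np.arange(0, 91))
    ranges = np.full(angles.shape, 5.0)
    print(scan_to_points(angles, ranges)[:3])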