iBims-1 (independent Benchmark images and matched scans - version 1) is a new high-quality RGB-D dataset, designed specifically for testing single-image depth estimation (SIDE) methods. A customized acquisition setup, composed of a digital single-lens reflex (DSLR) camera and a high-precision laser scanner, was used to acquire high-resolution images and highly accurate depth maps of diverse indoor scenarios.
Compared to related RGB-D datasets, iBims-1 stands out due to a very low noise level, sharp depth transitions, no occlusions, and high depth ranges.

Our dataset consists of the following components:

  • Core dataset:
    • 100 RGB-D image pairs of various indoor scenes in high and low resolution
    • Masks for invalid, transparent and planar regions (tables, floors, walls)
    • Masks for distinct depth transitions
    • Camera calibration parameters
  • Auxiliary dataset:
    • 56 different color and geometric augmentations for each image of the core dataset
    • Additional hand-held images for testing multi-view stereo (MVS) methods
    • Images of printed patterns and photos posted on a wall to assess performance on textured planar surfaces
    • Several RGB-D image sequences of static scenes with varying illumination


Method                 | Standard Metrics (σi = 1.25^i)          | PE (cm/°)    | DBE (px)     | DDE (%) for d = 3 m
                       | rel    log10  RMS    σ1     σ2     σ3   | εplan  εorie | εacc   εcomp | ε0     ε-     ε+
-----------------------+-----------------------------------------+--------------+--------------+--------------------
Eigen (2014)           | 0.32   0.17   1.55   0.36   0.65   0.84 | 7.70   24.91 | 9.97   9.99  | 70.37  27.42  2.22
Eigen (2015) (AlexNet) | 0.30   0.15   1.38   0.40   0.73   0.88 | 7.52   21.50 | 4.66   8.68  | 77.48  18.93  3.59
Eigen (2015) (VGG)     | 0.25   0.13   1.26   0.47   0.78   0.93 | 5.97   17.65 | 4.05   8.01  | 79.88  18.72  1.41
Laina (2016)           | 0.26   0.13   1.20   0.50   0.78   0.91 | 6.46   19.13 | 6.19   9.17  | 81.02  17.01  1.97
Liu (2015)             | 0.30   0.13   1.26   0.48   0.78   0.91 | 8.45   28.69 | 2.42   7.11  | 79.70  14.16  6.14
Li (2017)              | 0.22   0.11   1.09   0.58   0.85   0.94 | 7.82   22.20 | 3.90   8.17  | 83.71  13.20  3.09
Liu (2018)             | 0.29   0.17   1.45   0.41   0.70   0.86 | 7.26   17.24 | 4.84   8.86  | 71.24  28.36  0.40
Ramamonjisoa (2019)    | 0.26   0.11   1.07   0.59   0.84   0.94 | 9.95   25.67 | 3.52   7.61  | 84.03  9.48   6.49
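
For readers who want to reproduce the standard-metric columns of the table above (rel, log10, RMS, σ1 to σ3) on their own predictions, the following is a minimal sketch. It assumes the definitions commonly used in the depth-estimation literature, with the σ values reported as fractions of pixels. The function name `standard_metrics` and the `valid` argument are illustrative and not part of the dataset's tooling.

```python
import numpy as np

def standard_metrics(pred, gt, valid=None):
    """Global depth-error metrics over the valid pixels of one image.

    Assumption: pred and gt are depth maps in the same unit (e.g. metres)
    and `valid` is an optional boolean mask of pixels to evaluate.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid is None:
        valid = np.ones(gt.shape, dtype=bool)
    # Keep only pixels marked valid with positive depth in both maps
    keep = np.asarray(valid, dtype=bool) & (gt > 0) & (pred > 0)
    p, g = pred[keep], gt[keep]

    # Symmetric ratio used by the threshold accuracies (sigma_i = 1.25^i)
    ratio = np.maximum(p / g, g / p)
    return {
        "rel": float(np.mean(np.abs(p - g) / g)),
        "log10": float(np.mean(np.abs(np.log10(p) - np.log10(g)))),
        "rms": float(np.sqrt(np.mean((p - g) ** 2))),
        "sigma1": float(np.mean(ratio < 1.25)),
        "sigma2": float(np.mean(ratio < 1.25 ** 2)),
        "sigma3": float(np.mean(ratio < 1.25 ** 3)),
    }

# Toy example: a ground-truth value of 0.0 stands for a missing measurement
gt = np.array([[1.0, 2.0], [4.0, 0.0]])
pred = np.array([[1.1, 2.0], [6.0, 3.0]])
m = standard_metrics(pred, gt)
```

In practice the `valid` argument would be filled from the masks for invalid and transparent regions, so that those pixels do not contribute to the scores.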


Contents of the Core Dataset

RGB images and matching depth maps, acquired using a calibrated DSLR camera and a high-quality terrestrial laser scanner. The image pairs are available in two sizes for both RGB and depth maps: an HD version with a resolution of 1500x1000 px and a VGA version with a resolution of 640x480 px. Note that the VGA version is compatible with the popular NYU-v2 dataset and that the low-resolution depth maps are computed directly from the point cloud rather than down-sampled from the HD version.

In addition, pixel-wise masks are provided for transparent and invalid depth areas, for planar regions (in three categories: walls, floors, and tables), and for distinct depth transitions.
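
One way the planar masks and the camera calibration can be combined is to back-project a depth map to 3D and fit a plane to the masked points. The sketch below illustrates this idea only; it is not necessarily the exact protocol behind the benchmark's planarity errors. It assumes a pinhole camera, z-depth values, and hypothetical intrinsics `fx`, `fy`, `cx`, `cy`. All function names are illustrative.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Turn a z-depth map into an (H, W, 3) array of camera-frame points.

    Assumption: ideal pinhole model without lens distortion.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

def fit_plane(points):
    """Least-squares plane through N x 3 points: (unit normal, centroid)."""
    centroid = points.mean(axis=0)
    # The right singular vector of the smallest singular value is the normal
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    return vt[-1], centroid

def plane_residual_std(depth, mask, fx, fy, cx, cy):
    """Fit a plane to the masked pixels and measure how flat they are."""
    pts = backproject(depth, fx, fy, cx, cy)[mask]
    normal, centroid = fit_plane(pts)
    dist = (pts - centroid) @ normal  # signed point-to-plane distances
    return normal, float(np.std(dist))

# Synthetic check: a fronto-parallel wall 2 m in front of the camera
depth = np.full((48, 64), 2.0)
mask = np.zeros(depth.shape, dtype=bool)
mask[10:40, 5:60] = True
normal, spread = plane_residual_std(depth, mask, fx=60.0, fy=60.0, cx=32.0, cy=24.0)
```

For the synthetic wall the fitted normal points along the optical axis and the spread is close to zero. On a predicted depth map, a larger spread inside a wall, floor, or table mask would indicate that the prediction bends a surface that should be flat.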


Number of RGB-D Image Pairs: 100
Resolution (high): 1500x1000 px
Resolution (low): 640x480 px
Depth Range: 0.1m - 50m
Planar Masks: 244 (wall: 140, table: 53, floor: 51)

Auxiliary Datasets

Image Augmentations

In order to assess the robustness of single-image depth estimation methods w.r.t. simple geometric and color transformations and noise, we derived a set of augmented images from our iBims-1 core dataset.
The augmentations include horizontal and vertical flipping, as well as image channel swapping, histogram stretching, hue and saturation changes, image blurring and adding noise to the images.
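
A few of the augmentation types listed above can be sketched as follows. This is an illustration only: the parameter values (percentiles, noise level, channel order) are assumptions and do not correspond to the settings used to generate the augmented images in the dataset, and blurring and hue or saturation changes are left out for brevity.

```python
import numpy as np

def flip_horizontal(rgb, depth):
    # Geometric change: mirror image and depth map together so they stay aligned
    return rgb[:, ::-1].copy(), depth[:, ::-1].copy()

def flip_vertical(rgb, depth):
    return rgb[::-1].copy(), depth[::-1].copy()

def swap_channels(rgb, order=(2, 1, 0)):
    # Color change only; the depth map is unaffected
    return rgb[..., list(order)]

def stretch_histogram(rgb, low_pct=2.0, high_pct=98.0):
    # Map the chosen percentile range linearly onto the full 8-bit range
    lo, hi = np.percentile(rgb, [low_pct, high_pct])
    if hi <= lo:
        return rgb.copy()
    out = (rgb.astype(np.float64) - lo) / (hi - lo) * 255.0
    return np.clip(out, 0, 255).astype(np.uint8)

def add_gaussian_noise(rgb, sigma=10.0, seed=0):
    # Additive zero-mean Gaussian noise, clipped back to valid 8-bit values
    rng = np.random.default_rng(seed)
    noisy = rgb.astype(np.float64) + rng.normal(0.0, sigma, rgb.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

# Tiny stand-in for an RGB-D pair (2 x 3 pixels)
rgb = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)
depth = np.arange(6, dtype=np.float64).reshape(2, 3)
rgb_h, depth_h = flip_horizontal(rgb, depth)
rgb_v, depth_v = flip_vertical(rgb, depth)
stretched = stretch_histogram(rgb)
noisy = add_gaussian_noise(rgb)
```

Note the distinction the sketch makes: geometric augmentations have to be applied to the depth map as well, whereas color augmentations and noise leave the ground truth unchanged.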

Multi-View Stereo Image Pairs

This set comprises additional hand-held images for many scenes of the iBims-1 core dataset. Their viewpoints differ from those of the reference images, which allows multi-view stereo algorithms to be validated against high-quality ground truth depth maps.

Printed Patterns on a Wall

This set of additional images contains special cases that are expected to mislead single-image depth estimation methods. The images show printed samples from the NYU-v2 dataset and printed black-and-white patterns from the Pattern dataset, hung on a wall. They are intended to give valuable insights, as they reveal what kind of image features single-image depth estimation methods exploit. No depth maps are provided for these images, as the region of interest is approximately planar and depth estimates are thus easy to assess qualitatively.

Format and Download

The dataset can be downloaded from our data server or via anonymous FTP.
More information about the download process can be found here.


If this dataset is useful for your research, please consider citing our paper:

Koch, T., Liebel, L., Fraundorfer, F., Körner, M., 2018. Evaluation of CNN-Based Single-Image Depth Estimation Methods. In Proceedings of the European Conference on Computer Vision Workshops (ECCV WS), Springer, pp. 331-348.


author = {Koch, Tobias and Liebel, Lukas and Fraundorfer, Friedrich and K{\"o}rner, Marco},
title = {Evaluation of CNN-Based Single-Image Depth Estimation Methods},
pages = {331--348},
editor = {Leal-Taixé, Laura and Roth, Stefan},
booktitle = {European Conference on Computer Vision Workshop (ECCV-WS)},
publisher = {Springer International Publishing},
year = {2018}}