People often ask several questions when building mosaic datasets: What format should my data be in? Should I have pyramids? How much space will be required? How long will it take to build the mosaic dataset? How long will it take to build the overviews?
So I did a simple test using 24 images in 4 different raster dataset formats; some with pyramids and some without.
The images were standard tiled orthophotos. Each had:
- 5000 rows and 5000 columns
- 8-bit
- 3-band
Below are 3 of the 24 images.
Compression
The original data was in MrSID format. I converted it to the other formats using ArcGIS Desktop 10.0. The following table lists the combinations used. A compression value of 75% was used because it’s an ArcGIS default that provides reasonable compression and maintains image quality. (Note: The compression value denotes image quality, not percent of compression.)
All but two of the formats above use lossy compression, meaning the pixel values can be altered from the original. The two exceptions, the uncompressed TIFF and the TIFF with lossless LZW compression, took up the most space on disk. Lossless compression does not usually save much space unless there are large areas of NoData or of a single value (such as a border).
Notice that the MrSID file is still the smallest due to its proprietary wavelet compression. Some people ask why their MrSID files take so long to add into a geodatabase, such as ArcSDE. This is because the format is highly compressed, and the data gets uncompressed when it is moved to other, less-compressed formats. You can check the estimated uncompressed size in the Raster Dataset Properties dialog box.
Pyramids
Pyramids were generated on some of the TIFF files using bilinear resampling. In the first two examples below, pyramids were generated on the uncompressed TIFF file, one using JPEG compression and the other using lossless LZ77 compression. In these cases, the additional required storage space was 2.2% to 9%, depending on how compressed the pyramids were. The third test generated pyramids using the same compression as the JPEG-compressed TIFF files. Here the percent of additional storage space increased to almost 30%, but the size of each pyramid is the same as those generated for the uncompressed TIFF using the same pyramid compression. The percentage is higher only because the source files themselves are smaller.
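To get a feel for where these percentages come from, here is a rough sketch of an idealized pyramid in which each level halves the rows and columns of the previous one. The stopping size (`min_size=256`) and the function name `pyramid_levels` are assumptions for illustration; the article does not state how many levels ArcGIS actually built. With no compression on either the source or the pyramid, the extra pixels come to about one third of the base image, which is the ceiling that compression on the pyramid pulls down.

```python
import math

def pyramid_levels(rows, cols, min_size=256):
    """Return the (rows, cols) of each reduced level.

    Each level halves the previous one. min_size is an assumed
    stopping point, not a documented ArcGIS setting.
    """
    levels = []
    while min(rows, cols) > min_size:
        rows = math.ceil(rows / 2)
        cols = math.ceil(cols / 2)
        levels.append((rows, cols))
    return levels

# One orthophoto from the test: 5000 rows by 5000 columns.
base_rows, base_cols = 5000, 5000
levels = pyramid_levels(base_rows, base_cols)

# Pixels stored in the pyramid relative to the full-resolution image.
pyramid_pixels = sum(r * c for r, c in levels)
overhead = pyramid_pixels / (base_rows * base_cols)

print(levels)
print(f"uncompressed pyramid overhead: {overhead:.1%}")
```

Because the pyramid size depends only on the pixel dimensions and the pyramid compression, the same pyramid looks proportionally larger next to a smaller, more compressed source file.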
If you’re strictly concerned with the total size of data you need to store, you may choose to keep your raster data stored as MrSID files, or convert them to TIFF files using a JPEG compression.
Time
Next, I built 9 mosaic datasets and added the collection of 24 raster datasets to each accordingly. Although the mosaic dataset does not contain the actual pixel data, and its size on disk is relatively small, it does represent a total mosaicked image that is:
- 30,000 columns and 20,000 rows
- 3-band
- 8-bit
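As a quick sanity check on these dimensions, the sketch below works out the uncompressed size of one tile and of the full mosaicked image from the figures given in this article. The helper name `uncompressed_bytes` is hypothetical, and the check assumes the 24 tiles do not overlap, which the stated dimensions are consistent with.

```python
# Figures taken from the article.
TILE_ROWS, TILE_COLS = 5000, 5000
MOSAIC_ROWS, MOSAIC_COLS = 20_000, 30_000
TILE_COUNT = 24
BANDS = 3
BYTES_PER_SAMPLE = 1  # 8-bit pixels

def uncompressed_bytes(rows, cols, bands=BANDS, bytes_per_sample=BYTES_PER_SAMPLE):
    """Raw pixel storage with no compression, pyramids, or file overhead."""
    return rows * cols * bands * bytes_per_sample

tile_bytes = uncompressed_bytes(TILE_ROWS, TILE_COLS)
mosaic_bytes = uncompressed_bytes(MOSAIC_ROWS, MOSAIC_COLS)

# Assuming no overlap, 24 tiles cover the mosaic exactly.
assert mosaic_bytes == TILE_COUNT * tile_bytes

print(f"one tile:   {tile_bytes / 1e6:.1f} MB")    # 75.0 MB
print(f"full image: {mosaic_bytes / 1e6:.1f} MB")  # 1800.0 MB
```

These raw figures are a useful baseline when comparing the on-disk sizes of the compressed formats.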
Creating each mosaic dataset took 6-8 seconds. This is a very simple operation and should never take very long to complete.
I used the Add Rasters To Mosaic Dataset tool and checked the option to build the overviews and recorded the following information:
From this you can see that it takes very little time, less than 20 seconds, to add the raster datasets to the mosaic dataset and calculate the required properties, such as the pixel size ranges and boundary. You can also see that the total size of the data on disk does not affect the time it takes to add the data to the mosaic dataset. The remaining time is spent determining how many overviews are required and generating them. Raster datasets with internal overviews (such as JPEG 2000 and MrSID) or external pyramids require fewer overviews. Where there were more overviews to generate, the total time increased.
You can see from these numbers that it takes much less time to add rasters to the mosaic datasets than to build the overviews. This is one reason why building overviews at the same time as adding your data to a mosaic dataset is optional.
Space
Finally, I compiled a list of the total space used by the source raster datasets, any pyramids, and the overviews.
You can see from this last table that “How much space should I allow for overviews?” is a difficult question to answer. In these scenarios the answer is anywhere from 1-50%. Yet even though 50% sounds terrible, it may be one of the better options, since the total storage size is one of the smaller values and the format is fast to read.
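The point about percentages versus totals can be illustrated with a small sketch. The sizes below are hypothetical values chosen only to show the arithmetic; they are not the measurements from the tables in this article, and `overview_overhead` is a made-up helper name.

```python
def overview_overhead(source_mb, overview_mb):
    """Return (overhead percent relative to the source, total MB on disk)."""
    percent = 100.0 * overview_mb / source_mb
    total = source_mb + overview_mb
    return percent, total

# Hypothetical sizes in MB, for illustration only.
scenarios = {
    "large source, small overhead": (1800.0, 18.0),
    "small source, large overhead": (100.0, 50.0),
}

for name, (source_mb, overview_mb) in scenarios.items():
    percent, total = overview_overhead(source_mb, overview_mb)
    print(f"{name}: {percent:.0f}% extra, {total:.0f} MB total")
```

A small overhead percentage on a large source can still mean far more total storage than a large percentage on a well-compressed source.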
Recommended raster format
Before you go off and convert all your raster data to a highly compressed format to save space, please note that the more complex the compression, the slower it is to read the data. Therefore, when speed is your #1 concern, we generally recommend that you use a TIFF raster dataset with JPEG compression (which is the second example in the above list). This lets you use compression to save space and, although the compression is lossy, still serves your raster data quickly.
Contributed by: Melanie Harlow