When I mosaic two rasters, the size of the resulting raster is almost always larger than the sum of the sizes of the original rasters. Why is that?

Fellow researchers and open-source GIS enthusiasts,

Welcome to my blog!

I’d like to start with a disclaimer: I may be a researcher in this very area, but that doesn’t mean everything I do or write here will work for you, with your own desktop configuration and package versions. I take no responsibility if you lose data or mess up your installation. I also do not authorize any copies of my content.

Today, I am discussing an effect seen when you mosaic two rasters: the resulting raster almost always uses more disk space than the two original rasters combined. But why?

- Short answer: your GIS software needs to fill the gaps in the covered area with NoData pixels.

- Long answer: I will demonstrate what happens with an example in QGIS 3.18, and show why the final raster tends to be larger than the two originals combined.

I have these two rasters (randomly generated):

[Figure: original rasters]

And I wish to join them by making a mosaic.

The one on the left has these properties:

Dimensions X: 6 Y: 10 Bands: 1

Origin 324836,6.81631e+06

Pixel Size 10000,-10000

And the one on the right has these:

Dimensions X: 10 Y: 8 Bands: 1

Origin 938235,6.59132e+06

Pixel Size 10000,-10000

Current disk space used by the rasters

Both are saved in GeoTIFF (.tif) format.

The one on the left: 600 bytes

The one on the right: 680 bytes

Joining the two raster layers in QGIS

To merge these two raster grids (also known as mosaicking them), you can use the SAGA GIS tool “Mosaic Raster Layers” (searchable in the Processing Toolbox).

I added the two layers as Input Grids.

[Figure: QGIS mosaic tool]

The resulting mosaic, in the same file format, takes 9.04 KB (kilobytes), which is about 9040 bytes. That is much larger than 600 + 680 = 1280 bytes, or about 1.3 KB.

To investigate why, first, let’s look into the properties of the resulting raster:

Dimensions X: 71 Y: 30 Bands: 1

Origin 324836,6.81132e+06

Pixel Size 10000,-10000

The pixel size is the same, and the origin is almost the same as that of the first original raster. But the dimensions show that the resulting raster has many more pixels than the two originals combined.

First raster: 6 x 10 = 60 pixels

Second raster: 10 x 8 = 80 pixels

Total: 140 pixels

Mosaic raster: 71 x 30 = 2130 pixels

The resulting raster has 2130 / 140 = 15.21 times as many pixels as the two originals combined. File size does not scale exactly with pixel count, because a GeoTIFF file carries fixed header and metadata overhead and may also be compressed, so the resulting raster is only 9.04 / 1.28 = 7.06 times larger on disk.
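The arithmetic above can be reproduced with a few lines of Python. This is only a sketch that reuses the dimensions and file sizes reported in this post. The variable and function names are mine, chosen for illustration.

```python
# Dimensions (columns, rows) as reported in the layer properties in this post.
left_dims = (6, 10)
right_dims = (10, 8)
mosaic_dims = (71, 30)


def pixel_count(dims):
    """Number of pixels in a single-band raster with the given (cols, rows)."""
    cols, rows = dims
    return cols * rows


input_pixels = pixel_count(left_dims) + pixel_count(right_dims)
mosaic_pixels = pixel_count(mosaic_dims)
pixel_ratio = mosaic_pixels / input_pixels

# File sizes in bytes, as reported in this post (9.04 KB taken as 9040 bytes).
input_bytes = 600 + 680
mosaic_bytes = 9040
size_ratio = mosaic_bytes / input_bytes

print(f"input pixels:  {input_pixels}")
print(f"mosaic pixels: {mosaic_pixels}")
print(f"pixel ratio:   {pixel_ratio:.2f}")
print(f"size ratio:    {size_ratio:.2f}")
```

The two ratios differ because part of each file is not pixel data at all, so small files look relatively "heavier" per pixel than larger ones.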

Why does the mosaic raster file have so many pixels that are unused?

A raster is a regular grid. When you mosaic two or more rasters that are far from each other, the algorithm has to place them on a single common grid and fill the space between them with NoData pixels.

If I select the mosaic layer and click the “empty space” between the two original rasters using the QGIS Identify Features tool (Ctrl + Shift + I), this appears:

[Figure: Identify Features, showing a “no data” result]

This shows that the selected pixel has a value assigned, and that value is “no data”. Storing it takes space in the raster file.
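To see where the filler pixels come from, here is a rough sketch that builds the smallest grid covering both rasters and counts how many of its cells hold no original data. It rests on several assumptions of mine: the origin is the top-left corner of each raster, the origins are the rounded values shown in the layer properties (6.81631e+06 taken as 6816310, and so on), and the covering grid is rounded up to whole cells. The mosaic tool aligns its grid in its own way, so the result can differ from the reported 71 x 30 by about one cell per axis.

```python
import math


def extent(origin_x, origin_y, cols, rows, pixel):
    """Return (xmin, ymin, xmax, ymax), assuming the origin is the top-left corner."""
    return (origin_x, origin_y - rows * pixel, origin_x + cols * pixel, origin_y)


def covering_grid(extents, pixel):
    """Columns and rows of the smallest regular grid covering all extents."""
    xmin = min(e[0] for e in extents)
    ymin = min(e[1] for e in extents)
    xmax = max(e[2] for e in extents)
    ymax = max(e[3] for e in extents)
    cols = math.ceil((xmax - xmin) / pixel)
    rows = math.ceil((ymax - ymin) / pixel)
    return cols, rows


PIXEL = 10000  # pixel size in map units, from the layer properties

# Origins are approximations read from the rounded values in the properties.
left = extent(324836, 6816310, 6, 10, PIXEL)
right = extent(938235, 6591320, 10, 8, PIXEL)

cols, rows = covering_grid([left, right], PIXEL)
data_pixels = 6 * 10 + 10 * 8          # the two rasters do not overlap
nodata_pixels = cols * rows - data_pixels
nodata_share = nodata_pixels / (cols * rows)

print(f"covering grid: {cols} x {rows}")
print(f"NoData pixels: {nodata_pixels} ({nodata_share:.0%} of the grid)")
```

Under these assumptions, well over nine tenths of the covering grid is filler, which is consistent with the jump in pixel count seen in the mosaic.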

Recommendation

To save disk space, avoid mosaicking rasters that are far from each other, especially if they have a fine resolution (because the gap between them will need more pixels).

For rasters with coarse resolution, such as the ones shown in this post, this is not so important. However, it is very relevant when dealing with large raster sets with fine resolutions, because the mosaic will contain a much larger number of “filling” pixels between the original rasters, and this can add gigabytes to the resulting file. We want to avoid that!
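Before running a mosaic, you can get a rough idea of the output size from the overall extent and the pixel size. The sketch below is a back-of-the-envelope estimate, not what any particular tool does. It assumes a single band, 4 bytes per pixel (for example, 32-bit floats), no compression, and it ignores header overhead. The function name and the extent (about 713 km by 305 km, roughly the area spanned by the two rasters in this post) are illustrative choices of mine.

```python
import math


def estimate_raw_bytes(width, height, pixel, bytes_per_pixel=4, bands=1):
    """Uncompressed pixel-data size for a grid covering width x height map units."""
    cols = math.ceil(width / pixel)
    rows = math.ceil(height / pixel)
    return cols * rows * bytes_per_pixel * bands


# Approximate extent spanned by the two rasters in this post, in map units.
WIDTH, HEIGHT = 713399, 304990

for pixel in (10000, 1000, 100, 10):
    size = estimate_raw_bytes(WIDTH, HEIGHT, pixel)
    print(f"pixel size {pixel:>5}: about {size / 1e6:,.2f} MB")
```

Each tenfold refinement of the pixel size multiplies the estimate by about one hundred, so the same gap that costs a few kilobytes at a 10000-unit pixel size reaches several gigabytes at a 10-unit pixel size. Compression can reduce the size on disk, but the filler pixels still have to be written and read.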

Luísa Vieira Lucchese
Postdoctoral Researcher

Postdoc at University of Pittsburgh
