Blob Blame History Raw
<html>
<head>
<!--
  Support idiotic mobile browsers that are incapable of rendering
  straightforward HTML properly
-->
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Keeping filesystem images sparse</title>
</head>
<body>
<h2>Keeping filesystem images sparse</h2>
<p>
Filesystem images in local files can be used by many PC emulators and
virtual machines
(<a href="http://user-mode-linux.sourceforge.net/">user-mode Linux</a>,
<a href="http://fabrice.bellard.free.fr/qemu/">QEMU</a> and
<a href="http://wiki.xensource.com/xenwiki/">Xen</a>, to name but three).
Typically these filesystems are created as sparse files using
commands like:
<p>
<pre>
   dd if=/dev/zero of=fs.image bs=1024 seek=2000000 count=0
   /sbin/mke2fs fs.image
</pre>
where the enormous <code>seek</code> value causes <code>dd</code> to move
forward by 2GB before writing nothing at all.  This results in the
creation of a sparse file which takes disk space only for blocks which are
actually used:
<p>
<pre>
   $ ls -l fs.image 
   -rw-rw-r--  1 rmy rmy 2048001024 Apr 18 19:10 fs.image
   $ du -s fs.image 
   31692   fs.image
</pre>
As the filesystem is used, more and more of the non-existent blocks are
filled with data and the size of the file on disk grows.  Sometimes it would
be nice to be able to reclaim unused blocks from a filesystem image.  However,
deleting files from the image doesn't return the space to the
underlying filesystem:  even free blocks in the image still consume space.
Reclaiming the space can be achieved in two stages:
<ul>
<li>Fill unused blocks with zeroes
<li>Make the file sparse again
</ul>
<p>
One traditional way to zero unused blocks is to create a file that fills
all the free space:
<p>
<pre>
   dd if=/dev/zero of=junk
   sync
   rm junk
</pre>
<p>
The disadvantage of <code>dd</code> in this context is that it destroys
any sparseness that exists:  free blocks that were originally represented
as holes in the image file are replaced with actual blocks containing
zeroes.  Also, filling up a live filesystem is probably a bad idea.
<p>
As an alternative approach, and as practice in mucking about with ext2
filesystems, I've written a utility which scans the free blocks in an
ext2 filesystem and fills any non-zero blocks with zeroes.
The source, <a href="zerofree-1.1.1.tgz">zerofree-1.1.1.tgz</a>, is
available for download.
<p>
<ul>
<li>A cautious user would run fsck on the filesytem both before and after
running zerofree.
<li>The filesystem to be processed should be unmounted or mounted read-only.
<li>The utility also works on ext3 or ext4 filesystems.
<li>Binary packages are available in the standard repositories for Debian and Fedora.
<li>The <a href="http://www.sysresccd.org/SystemRescueCd_Homepage">SystemRescueCd</a> live distribution includes zerofree.
<li><a href="http://libguestfs.org/">guestfish</a> can run zerofree on many types of virtual machine filesystems.
</ul>
<p>
However, this is only half the story:  the empty free
blocks still consume space in the underlying filesystem, so something
must to be done to reclaim that space.
<p>
A common suggestion is to use the sparse file handling capabilities
of the GNU <code>cp</code> command to take a copy of the filesystem image with
<code>cp --sparse=always</code> (though this does require the original
and sparse files to exist at the same time, which may be inconvenient).
<p>
If your kernel and util-linux are sufficiently modern and you have a supported
filesystem you can use <code>fallocate -d</code> to 'dig holes' in a file.
This makes the file sparse in-place, without using extra disk space.
<p>
<hr>
<address>
<a href="mailto:rmy@pobox.com">Ron Yorston</a><br>
18th April 2004 (updated 19th February 2018)<br>
Some <a href="obsolete.html">obsolete</a> information has been moved to a
separate page.
</address>
</body>
</html>