Friday, July 22, 2011

HFS+ compression

Someone might find this useful. I once urgently needed free disk space and found a way to get it. I wrote this article after reading posts on MacRumors and MacWorld (both written by brkirch, who used Ars Technica's Snow Leopard review as the basis).

Anyone who uses (or used) Windows with the NTFS file system knows that you can set the "compressed" attribute on a folder or file, so that every file moved into that folder is compressed by Windows. The files are decompressed automatically when you open them; compression and decompression happen in the background, unnoticed by the user. Something similar can be done in Mac OS X, with some differences: new files added to a compressed folder are not compressed automatically (a number of commercial programs will do that for you, for example Clusters from LateNiteSoft, formerly named Squeeze, or HDCleanUp), and some files larger than 20 MB cannot be compressed. I have tried to keep the explanation as simple as possible.

Mac OS X supports several file systems. HFS+ (also called Mac OS Extended) supports transparent compression, also known as HFS+ compression, starting with Mac OS X v10.6 (Snow Leopard). Apple uses this feature to make the Snow Leopard installation smaller and to improve file loading performance. However, you can also use it to compress your own files. The instructions below explain how.

To find out your OS version, click the Apple menu in the menu bar and select About This Mac; the version is shown in the "About This Mac" window. Use Disk Utility to find out your file system type. This screenshot illustrates where to find information about the type of your file system.
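The same two checks can also be made from Terminal. The sketch below is only an illustration: sw_vers and diskutil ship with Mac OS X, the helper name supports_hfs_compression is made up for this example, and the exact wording of the diskutil output line is an assumption, so the grep is kept loose.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given version is 10.6 or later.
supports_hfs_compression() {
    major=$(echo "$1" | cut -d. -f1)
    minor=$(echo "$1" | cut -s -d. -f2)
    minor=${minor:-0}
    [ "$major" -gt 10 ] || { [ "$major" -eq 10 ] && [ "$minor" -ge 6 ]; }
}

# sw_vers and diskutil exist only on Mac OS X; skip quietly elsewhere.
if command -v sw_vers >/dev/null 2>&1; then
    version=$(sw_vers -productVersion)
    if supports_hfs_compression "$version"; then
        echo "Mac OS X $version: HFS+ compression is available"
    else
        echo "Mac OS X $version: too old for HFS+ compression"
    fi
    # Command-line counterpart of Disk Utility: show the boot volume's file system.
    diskutil info / | grep -i "file system" || true
fi
```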

Unfortunately, this feature is not as easy to access as in Windows, where you just check the "Compress contents to save disk space" checkbox. You will have to use the Terminal utility. Moreover, to find out whether a file is compressed, you will have to use third-party utilities. Finder takes compressed files into account when it shows free disk space; in other words, Finder will show more free space after you compress a large folder (such as the Applications folder). I have read about two free ways to use HFS+ compression (the commercial programs are listed above): the ditto utility built into Mac OS X, or the afsctool utility. afsctool is an open-source program written by brkirch.
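The free-space figure that Finder shows can also be read in Terminal before and after compressing, which makes the gain easy to measure. This is a rough sketch, assuming df prints the volume on the second line of its output; the function name free_kb is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: free space, in kilobytes, on the volume holding the given path.
free_kb() {
    # -P asks for the one-line-per-volume layout, -k for 1 KB blocks.
    df -Pk "${1:-/}" | awk 'NR == 2 { print $4 }'
}

before=$(free_kb /)
# ... compress a folder here with ditto or afsctool ...
after=$(free_kb /)
echo "Freed $((after - before)) KB"
```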

The ditto utility (full description) copies folders and creates and unpacks archives. Its --hfsCompression option compresses files as they are copied to an HFS+ volume. The documentation warns against using this option for user files, because the compressed files cannot be read in versions of Mac OS X earlier than v10.6 (Snow Leopard): “Since files using HFS+ compression are not readable on versions of Mac OS X earlier than 10.6, this flag should not be used when dealing with non-system files or other user-generated content”. To compress files, you have to copy them with ditto. To do this, run the following in Terminal:
ditto --hfsCompression source_folder new_folder_with_compressed_files
example:
ditto --hfsCompression /Users/alien/moo /Users/alien/moo_new
After the folder has been copied successfully (and its files compressed), you can delete the source folder and rename the new folder to the old name:
mv /Users/alien/moo_new /Users/alien/moo
To avoid typing folder names by hand, you can drag and drop them from Finder into Terminal; the folder's path is inserted at the cursor position.
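The copy, delete and rename steps can be wrapped into one function so that the original folder is removed only if the copy succeeded. This is a minimal sketch, not a tested backup procedure: the function name compress_in_place and the DITTO_CMD variable are made up for this example (the variable lets you rehearse the steps with "cp -R" on a test folder first).

```shell
#!/bin/sh
# Copy command; on Mac OS X 10.6+ this is ditto with HFS+ compression.
# DITTO_CMD is a hypothetical override for dry runs, e.g. DITTO_CMD="cp -R".
DITTO_CMD=${DITTO_CMD:-"ditto --hfsCompression"}

compress_in_place() {
    src=${1%/}            # drop a trailing slash, if any
    tmp="${src}_new"
    [ -d "$src" ] || { echo "not a folder: $src" >&2; return 1; }
    [ ! -e "$tmp" ] || { echo "already exists: $tmp" >&2; return 1; }
    # Copy first; stop here and keep the original if the copy fails.
    $DITTO_CMD "$src" "$tmp" || return 1
    rm -rf "$src"
    mv "$tmp" "$src"
}
```

With the paths from the example above, the call would be compress_in_place /Users/alien/moo.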

The afsctool utility (you can find the latest version here), also written by brkirch, is much more convenient to use. You run it from Terminal, either by its full path (something like this) or, after copying it to the directory that holds other system utilities, by name alone. You can copy it as follows:
sudo cp program_full_name /usr/bin/
(see screenshot for my case) sudo asks for the administrator password in order to get permission to write to the /usr/bin folder.
If you run afsctool without parameters, it lists all available parameters. There are many of them; I will cover only the most useful ones: compressing a folder or file and displaying information about compressed files. Remember that you can drag and drop folders and files from Finder into Terminal instead of typing their names; the path is inserted at the cursor position.
1. Compress a folder that contains files, or compress a single file:
afsctool -c -<level> folder_or_file_name
The -c parameter tells the program to compress the folder or file; the -<level> parameter sets the compression level (from 1 to 9, where 1 is the fastest and 9 gives the best compression; the default is 5). For example, this is how I compress the folder /Developer:
Macintosh:~ alien$ afsctool -c -9 /Developer
2. Display information about compressed files:
afsctool -v folder_or_file_name
(for example, you can see that my /Developer folder was compressed to about half its size, which saved me almost 6 GB of space)
3. Display the list of compressed files in a directory:
afsctool -lv folder_name
(for example, you can see that four files have been compressed in the futurama folder and that 58.6% of space has been freed).
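The compress and report steps can be chained so that the summary appears right after compression finishes. This is a small sketch that uses only the -c, -<level> and -v parameters described above; the function name compress_and_report and its return codes are made up for this example.

```shell
#!/bin/sh
# Hypothetical wrapper: compress a folder or file, then print the afsctool summary.
compress_and_report() {
    target=$1
    level=${2:-5}   # 5 is the default level mentioned above
    case "$level" in
        [1-9]) ;;
        *) echo "level must be between 1 and 9" >&2; return 2 ;;
    esac
    if ! command -v afsctool >/dev/null 2>&1; then
        echo "afsctool not found in PATH" >&2
        return 127
    fi
    # Show the report only if compression succeeded.
    afsctool -c "-$level" "$target" && afsctool -v "$target"
}
```

For the /Developer example above, the call would be compress_and_report /Developer 9.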

The compression ratio of HFS+ compression is almost the same as (1-2 percent worse than) that of the standard Compress command.

To conclude, let me add that HFS+ compression helps not only to free disk space (a shortage of disk space is rare now, except on SSD drives), but also to speed up program launches. There is a perfect explanation in Ars Technica's Snow Leopard review (Installation footprint): “But compression isn't just about saving disk space. It's also a classic example of trading CPU cycles for decreased I/O latency and bandwidth. Over the past few decades, CPU performance has gotten better (and computing resources more plentiful—more on that later) at a much faster rate than disk performance has increased. Modern hard disk seek times and rotational delays are still measured in milliseconds. In one millisecond, a 2 GHz CPU goes through two million cycles. And then, of course, there's still the actual data transfer time to consider.”
