Today’s article demonstrates how to create a tar.gz file in a single pass in Java. While there are a number of websites that provide instructions for creating a gzip or tar archive in Java, few explain how to make a tar.gz file without first writing an intermediate tar file and then compressing it in a second pass.
Reviewing Tar and Gzip Compression
First, download the Apache Commons Compress library. Its tar support originated in Apache Ant, so it suits those performing compression operations that do not require all of Ant’s many features. Below is the code to create a tar archive with Commons Compress and a gzip file with the JDK’s java.util.zip.GZIPOutputStream, respectively.
TarArchiveOutputStream out = null;
try {
    out = new TarArchiveOutputStream(
        new BufferedOutputStream(new FileOutputStream("myFile.tar")));
    // Add data to out and flush stream
    ...
} finally {
    if (out != null) out.close();
}
GZIPOutputStream out = null;
try {
    out = new GZIPOutputStream(
        new BufferedOutputStream(new FileOutputStream("myFile.gz")));
    // Add data to out and flush stream
    ...
} finally {
    if (out != null) out.close();
}
One subtlety in these examples is that we wrap the file stream in a BufferedOutputStream for performance reasons. Archive files are often large, so buffering the output is desirable. Another good practice is to always close your resources in a finally block once you are done with them; closing the outermost stream also closes the streams it wraps.
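On Java 7 and later, a try-with-resources statement is a more compact alternative to the explicit finally block. The sketch below is one way to write the gzip case with it, using only JDK classes. The class and method names (GzipTryWithResources, gzipTo, gunzipFrom) are hypothetical, and readAllBytes assumes Java 9 or newer.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only.
public class GzipTryWithResources {

    // Writes data to target as a gzip file; the stream chain is closed automatically.
    public static void gzipTo(Path target, byte[] data) throws IOException {
        try (OutputStream out = new GZIPOutputStream(
                new BufferedOutputStream(Files.newOutputStream(target)))) {
            out.write(data);
        } // close() writes the gzip trailer, flushes the buffer, and closes the file
    }

    // Reads a gzip file back into memory (readAllBytes requires Java 9+).
    public static byte[] gunzipFrom(Path source) throws IOException {
        try (InputStream in = new GZIPInputStream(
                new BufferedInputStream(Files.newInputStream(source)))) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("example", ".gz");
        byte[] original = "hello, gzip".getBytes(StandardCharsets.UTF_8);
        gzipTo(target, original);
        System.out.println(new String(gunzipFrom(target), StandardCharsets.UTF_8));
        Files.delete(target);
    }
}
```

Because only the outermost stream is declared as a resource, closing it cascades down to the buffered stream and the file, just as the finally block in the earlier examples does.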
The Solution
The solution is to wrap a tar stream around a gzip stream, since data written to a chain of streams flows from the outermost stream to the innermost. In the code below, the data is first formatted as a tar archive, then compressed by the gzip stream, then buffered, and finally written to disk, all in one pass.
TarArchiveOutputStream out = null;
try {
    out = new TarArchiveOutputStream(
        new GZIPOutputStream(
            new BufferedOutputStream(new FileOutputStream("myFile.tar.gz"))));
    // Add data to out and flush stream
    ...
} finally {
    if (out != null) out.close();
}
You can then treat the stream as a tar archive, using the TarArchiveEntry API to add entries and writing each entry’s data directly to the stream. The gzip compression happens automatically as the data is written.
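To make the entry-handling step concrete, here is a minimal sketch of adding entries to the combined stream. It assumes Commons Compress is on the classpath and that the contents are small enough to hold in memory. The class and method names (TarGzSinglePass, writeTarGz, listEntries) and the sample entry names are hypothetical. The read-back method is included only to check the result.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;

// Hypothetical helper class, for illustration only.
public class TarGzSinglePass {

    // Writes each name/content pair as one tar entry, gzip-compressed on the fly.
    public static void writeTarGz(OutputStream sink, Map<String, byte[]> files)
            throws IOException {
        try (TarArchiveOutputStream tar = new TarArchiveOutputStream(
                new GZIPOutputStream(new BufferedOutputStream(sink)))) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                TarArchiveEntry entry = new TarArchiveEntry(file.getKey());
                // The tar header records the size, so set it before writing data.
                entry.setSize(file.getValue().length);
                tar.putArchiveEntry(entry);
                tar.write(file.getValue());
                tar.closeArchiveEntry();
            }
        } // close() finishes the tar archive and then the gzip stream
    }

    // Reads the archive back and returns the entry names in order.
    public static List<String> listEntries(byte[] tarGz) throws IOException {
        List<String> names = new ArrayList<>();
        try (TarArchiveInputStream tar = new TarArchiveInputStream(
                new GZIPInputStream(new ByteArrayInputStream(tarGz)))) {
            ArchiveEntry entry;
            while ((entry = tar.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("a.txt", "alpha".getBytes(StandardCharsets.UTF_8));
        files.put("dir/b.txt", "beta".getBytes(StandardCharsets.UTF_8));

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeTarGz(sink, files);
        System.out.println(listEntries(sink.toByteArray()));
    }
}
```

To write to disk instead of memory, pass a FileOutputStream for "myFile.tar.gz" as the sink. For large files, the single write call could be replaced by a loop that copies from an input stream in chunks, as long as the entry size is set beforehand.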