Exploring Java

Chapter 8: Input/Output Facilities

8.4 Data compression

Java 1.1 includes a new package, java.util.zip, that contains classes you can use for data compression. In this section we'll talk about how to use the classes. We'll also present two useful example programs that build on what you have just learned about streams and files.

The classes in the java.util.zip package support two widespread compression formats: GZIP and ZIP. Both are based on the DEFLATE compression algorithm, as implemented by the ZLIB library. The ZLIB format, the DEFLATE algorithm, and the GZIP format are described in RFC 1950, RFC 1951, and RFC 1952, respectively. These documents are available at ftp://ds.internic.net/rfc/. I don't recommend reading them unless you want to implement your own compression algorithm or otherwise extend the functionality of the java.util.zip package.

Compressing data

The java.util.zip package provides two FilterOutputStream subclasses to write compressed data to a stream: GZIPOutputStream and ZipOutputStream. To write compressed data in the GZIP format, simply wrap a GZIPOutputStream around an underlying stream and write to it. The following is a complete example that shows how to compress a file using the GZIP format.

import java.io.*;
import java.util.zip.*;
public class GZip {
  public static int sChunk = 8192;
  public static void main(String[] args) {
    if (args.length != 1) {
      System.out.println("Usage: GZip source");
      return;
    }
    // Create output stream.
    String zipname = args[0] + ".gz";
    GZIPOutputStream zipout;
    try {
      FileOutputStream out = new FileOutputStream(zipname);
      zipout = new GZIPOutputStream(out);
    }
    catch (IOException e) {
      System.out.println("Couldn't create " + zipname + ".");
      return;
    }
    byte[] buffer = new byte[sChunk];
    // Compress the file.
    try {
      FileInputStream in = new FileInputStream(args[0]);
      int length;
      while ((length = in.read(buffer, 0, sChunk)) != -1)
        zipout.write(buffer, 0, length);
      in.close();
    }
    catch (IOException e) {
      System.out.println("Couldn't compress " + args[0] + ".");
    }
    try { zipout.close(); }
    catch (IOException e) {}
  }
}

First we check to make sure we have a command-line argument representing a file name. Then we construct a GZIPOutputStream wrapped around a FileOutputStream representing the given file name with the .gz suffix appended. With this in place, we open the source file. We read chunks of data from it and write them into the GZIPOutputStream. Finally, we clean up by closing our open streams.
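The same wrapping technique works with any underlying output stream, not just a file. As a minimal sketch, the following compresses a byte array in memory by wrapping a GZIPOutputStream around a ByteArrayOutputStream. The class name GzipBytes and its compress() method are hypothetical names made up for this illustration, and the try-with-resources statement assumes Java 7 or later rather than Java 1.1.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; not part of the GZip example above.
class GzipBytes {
  // Compress a byte array into the GZIP format, entirely in memory.
  static byte[] compress(byte[] data) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    // try-with-resources (Java 7 and later) closes the stream for us.
    // Closing matters: it flushes the remaining compressed data and
    // writes the GZIP trailer to the underlying stream.
    try (GZIPOutputStream zipout = new GZIPOutputStream(bytes)) {
      zipout.write(data, 0, data.length);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] original =
        "hello, hello, hello, hello".getBytes(StandardCharsets.UTF_8);
    byte[] packed = compress(original);
    System.out.println(original.length + " bytes in, "
        + packed.length + " bytes out");
  }
}
```

Note that very short inputs can grow when compressed, because the GZIP header and trailer add a fixed overhead.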

Writing data to a ZIP file is a little more involved, but still quite manageable. While a GZIP file contains only one compressed file, a ZIP file is actually an archive of files, some (or all) of which may be compressed. Each item in the ZIP file is represented by a ZipEntry object. When writing to a ZipOutputStream, you'll need to call putNextEntry() before writing the data for each item. The following example shows how to create a ZipOutputStream. You'll notice it's just like creating a GZIPOutputStream.

ZipOutputStream zipout;
try {
  FileOutputStream out = new FileOutputStream("archive.zip");
  zipout = new ZipOutputStream(out);
}
catch (IOException e) {}

Let's say we have two files we want to write into this archive. Before we begin writing we need to call putNextEntry(). We'll create a simple entry with just a name. There are other fields in ZipEntry that you can set, but most of the time you won't need to bother with them.

try {
  ZipEntry entry = new ZipEntry("First");
  zipout.putNextEntry(entry);
}
catch (IOException e) {}

At this point you can write the contents of the first file into the archive. When you're ready to write the second file into the archive, you simply call putNextEntry() again, which closes the first entry and begins the second:

try {
  ZipEntry entry = new ZipEntry("Second");
  zipout.putNextEntry(entry);
}
catch (IOException e) {}
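Putting the fragments above together, here is one way the whole sequence might look. This is only a sketch: the class name ZipTwoEntries and its build() method are hypothetical, the archive is written to a ByteArrayOutputStream instead of archive.zip so the example has no side effects, and try-with-resources assumes Java 7 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class that builds a two-item ZIP archive in memory.
class ZipTwoEntries {
  static byte[] build(byte[] first, byte[] second) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zipout = new ZipOutputStream(bytes)) {
      // Begin the first item, then write its data.
      zipout.putNextEntry(new ZipEntry("First"));
      zipout.write(first, 0, first.length);
      // Beginning the second item closes the first one.
      zipout.putNextEntry(new ZipEntry("Second"));
      zipout.write(second, 0, second.length);
      // Explicitly finish the last item; close() would also do this.
      zipout.closeEntry();
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] archive = build(
        "contents of the first file".getBytes(StandardCharsets.UTF_8),
        "contents of the second file".getBytes(StandardCharsets.UTF_8));
    System.out.println("Archive is " + archive.length + " bytes long");
  }
}
```

To write a real archive, you could pass a FileOutputStream to the ZipOutputStream constructor instead, as in the earlier fragment.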

Decompressing data

To decompress data, you can use one of the two FilterInputStream subclasses provided in java.util.zip. To decompress data in the GZIP format, simply wrap a GZIPInputStream around an underlying stream and read from it. The following is a complete example that shows how to decompress a GZIP file.

import java.io.*;
import java.util.zip.*;
public class GUnzip {
  public static int sChunk = 8192;
  public static void main(String[] args) {
    if (args.length != 1) {
      System.out.println("Usage: GUnzip source");
      return;
    }
    // Create input stream.
    String zipname, source;
    if (args[0].endsWith(".gz")) {
      zipname = args[0];
      source = args[0].substring(0, args[0].length() - 3);
    }
    else {
      zipname = args[0] + ".gz";
      source = args[0];
    }
    GZIPInputStream zipin;
    try {
      FileInputStream in = new FileInputStream(zipname);
      zipin = new GZIPInputStream(in);
    }
    catch (IOException e) {
      System.out.println("Couldn't open " + zipname + ".");
      return;
    }
    byte[] buffer = new byte[sChunk];
    // Decompress the file.
    try {
      FileOutputStream out = new FileOutputStream(source);
      int length;
      while ((length = zipin.read(buffer, 0, sChunk)) != -1)
        out.write(buffer, 0, length);
      out.close();
    }
    catch (IOException e) {
      System.out.println("Couldn't decompress " + args[0] + ".");
    }
    try { zipin.close(); }
    catch (IOException e) {}
  }
}

First we check to make sure we have a command-line argument representing a file name. If the argument ends with .gz, we figure out what the file name for the uncompressed file should be. Otherwise we just use the given argument and assume the compressed file has the .gz suffix. Then we construct a GZIPInputStream wrapped around a FileInputStream representing the compressed file. With this in place, we open the target file. We read chunks of data from the GZIPInputStream and write them into the target file. Finally, we clean up by closing our open streams.
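The read loop does not depend on files. As a sketch, the following decompresses GZIP data held in a byte array by wrapping a GZIPInputStream around a ByteArrayInputStream. The class name GunzipBytes and its decompress() method are hypothetical, and try-with-resources assumes Java 7 or later. The main() method compresses a sample string first, only so that it has something to decompress.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; not part of the GUnzip example above.
class GunzipBytes {
  // Decompress GZIP data held in memory and return the original bytes.
  static byte[] decompress(byte[] packed) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[8192];
    try (GZIPInputStream zipin =
             new GZIPInputStream(new ByteArrayInputStream(packed))) {
      int length;
      // read() returns -1 once all of the data has been decompressed.
      while ((length = zipin.read(buffer, 0, buffer.length)) != -1)
        out.write(buffer, 0, length);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    // Build some GZIP data to work with.
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream zipout = new GZIPOutputStream(bytes)) {
      zipout.write("to be or not to be".getBytes(StandardCharsets.UTF_8));
    }
    byte[] restored = decompress(bytes.toByteArray());
    System.out.println(new String(restored, StandardCharsets.UTF_8));
  }
}
```

If the input is not valid GZIP data, the GZIPInputStream constructor throws an IOException, which this sketch simply passes on to the caller.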

Again, the ZIP archive presents a little more complexity than the GZIP file. When reading from a ZipInputStream, you should call getNextEntry() before reading each item. When getNextEntry() returns null, there are no more items to read. The following example shows how to create a ZipInputStream. You'll notice it's just like creating a GZIPInputStream.

ZipInputStream zipin;
try {
  FileInputStream in = new FileInputStream("archive.zip");
  zipin = new ZipInputStream(in);
}
catch (IOException e) {}

Suppose we want to read two files from this archive. Before we begin reading, we need to call getNextEntry(). At the least, the entry will give us the name of the item we are about to read from the archive.

try {
  ZipEntry first = zipin.getNextEntry();
}
catch (IOException e) {}

At this point you can read the contents of the first item in the archive. When you come to the end of the item, the read() method will return -1. Now you can call getNextEntry() again to read the second item from the archive.

try {
  ZipEntry second = zipin.getNextEntry();
}
catch (IOException e) {}

If you call getNextEntry() and it returns null, then there are no more items and you have reached the end of the archive.
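When you don't know in advance how many items an archive holds, the two end conditions combine naturally into a pair of nested loops: the outer loop runs until getNextEntry() returns null, and the inner loop runs until read() returns -1. The following is a sketch of that pattern. The class name ZipReadAll and its readAll() method are hypothetical, and the generics and try-with-resources syntax assume Java 7 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class that reads every item in a ZIP archive.
class ZipReadAll {
  // Returns the items in archive order, keyed by entry name.
  static Map<String, byte[]> readAll(InputStream in) throws IOException {
    Map<String, byte[]> items = new LinkedHashMap<>();
    byte[] buffer = new byte[8192];
    try (ZipInputStream zipin = new ZipInputStream(in)) {
      ZipEntry entry;
      // Outer loop: null means the end of the archive.
      while ((entry = zipin.getNextEntry()) != null) {
        ByteArrayOutputStream contents = new ByteArrayOutputStream();
        int length;
        // Inner loop: -1 means the end of the current item.
        while ((length = zipin.read(buffer, 0, buffer.length)) != -1)
          contents.write(buffer, 0, length);
        items.put(entry.getName(), contents.toByteArray());
      }
    }
    return items;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.out.println("Usage: ZipReadAll archive");
      return;
    }
    Map<String, byte[]> items = readAll(new FileInputStream(args[0]));
    for (Map.Entry<String, byte[]> item : items.entrySet())
      System.out.println(item.getKey() + ": "
          + item.getValue().length + " bytes");
  }
}
```

This sketch holds every item in memory, so it is only suitable for small archives. If two entries share a name, the later one replaces the earlier one in the map.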

