Two basic types of data - text and binary - are used in applications to create files such as documents, images, video, text and executables. Certain applications, however, may need to alter a file to make it available to other applications; for example, e-mail requires text and binary data to be encoded before it's sent.
This article discusses a technique used to read and write encoded data using Java I/O streams. We'll define encoding and cover some of its history, examine two I/O stream classes and an interface, then finish by applying this technique to both a text and a binary file using the Base64 encoding scheme. With this technique you can provide encoding in your applications as well as encoded user information for authenticating against HTTP servers. This technique is provided using a standard, familiar group of Java classes: the I/O streams.
What Is Encoding?
Encoding manipulates and reorganizes bytes so they can be understood by other applications (see Figure 1). This is done primarily for Internet e-mail systems, but is also used in places such as HTTP basic authentication, which requires the user ID and password to be encoded using Base64. Although encoding has been around a while, you may never have noticed it. For example, your e-mail and attachments could be encoded before being sent and decoded when received. E-mails specify encoded content by using the Content-Transfer-Encoding header. This header field can have the following values:
- 7Bit
- Quoted-Printable
- Base64
- 8Bit
- Binary
One side effect of encoding is a possible increase in the size of your data. It all depends on the encoding scheme you're using.
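Both points above, the Base64 step in basic authentication and the growth in size, can be seen in a few lines. The sketch below is not one of the article's listings: it assumes Java 8 or later and uses the standard java.util.Base64 class, and the class name BasicAuthSketch and the credentials are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BasicAuthSketch {
    // Builds the value of an HTTP Authorization header for basic authentication:
    // "Basic " followed by the Base64 form of "user:password".
    static String basicAuthHeader(String user, String password) {
        byte[] credentials = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(credentials);
    }

    public static void main(String[] args) {
        // Hypothetical credentials, for illustration only.
        String header = basicAuthHeader("Aladdin", "open sesame");
        System.out.println(header); // Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

        // Size before and after encoding: 19 raw bytes become 28 characters.
        int rawLength = "Aladdin:open sesame".length();
        int encodedLength = header.length() - "Basic ".length();
        System.out.println(rawLength + " -> " + encodedLength);
    }
}
```

Base64 is an encoding, not encryption; anyone who sees the header can decode it.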
Now that we have some basics, let's look at the EncodedInputStream and EncodedOutputStream classes, which are used to read and write encoded data.
EncodedInputStream
The EncodedInputStream takes encoded data and gives it back, decoded, as a byte array. You can then convert this data to any form you wish, such as text (see Listing 1). Its constructor takes two arguments: an InputStream and an EncodingScheme. The InputStream source could be a FileInputStream or even the input stream of a socket.
Base64EncodingScheme scheme = new Base64EncodingScheme();
EncodedInputStream eIn = new EncodedInputStream(new FileInputStream("encoded.txt"),scheme);
byte[] data = eIn.readEncoded();
This class overrides the read method and adds a method called readEncoded, which reads encoded data and returns the decoded result as a byte array. The read method has been overridden to always return -1. This was done because read returns a single byte at a time, whereas decoding may need to work with more than one byte at once.
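The behavior described above could be sketched as follows. This is not the article's Listing 1; it is a guess at the general shape. The EncodingScheme signatures (byte array in, byte array out), the choice of FilterInputStream as the base class, and the stand-in scheme backed by java.util.Base64 (Java 8 or later) are all assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class InputStreamSketch {
    // Assumed shape of the interface: byte arrays in, byte arrays out.
    interface EncodingScheme {
        byte[] encode(byte[] data);
        byte[] decode(byte[] data);
    }

    static class EncodedInputStream extends FilterInputStream {
        private final EncodingScheme scheme;

        EncodedInputStream(InputStream in, EncodingScheme scheme) {
            super(in);
            this.scheme = scheme;
        }

        // A single encoded byte cannot be decoded on its own, so always report -1.
        @Override
        public int read() {
            return -1;
        }

        // Reads everything from the underlying stream, then decodes it in one pass.
        public byte[] readEncoded() throws IOException {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return scheme.decode(buffer.toByteArray());
        }
    }

    // Stand-in scheme for the demo; the article implements its own Base64 scheme.
    static EncodingScheme base64() {
        return new EncodingScheme() {
            public byte[] encode(byte[] data) { return Base64.getEncoder().encode(data); }
            public byte[] decode(byte[] data) { return Base64.getDecoder().decode(data); }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = "SGVsbG8=".getBytes(StandardCharsets.US_ASCII);
        try (EncodedInputStream eIn =
                 new EncodedInputStream(new ByteArrayInputStream(encoded), base64())) {
            System.out.println(new String(eIn.readEncoded(), StandardCharsets.US_ASCII)); // Hello
        }
    }
}
```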
EncodedOutputStream
The EncodedOutputStream writes out data using whatever encoding scheme you specify (see Listing 2). Its constructor takes two arguments: an OutputStream and an EncodingScheme. The OutputStream can be almost any kind of stream, such as a FileOutputStream or the output stream of a socket.
Base64EncodingScheme scheme = new Base64EncodingScheme();
EncodedOutputStream eOut = new EncodedOutputStream(new FileOutputStream("encoded.txt"),scheme);
eOut.write("This is unencoded data".getBytes());
eOut.close(); // close the stream when done, as with any other OutputStream
This class buffers output as it's written to the stream, encodes the data, then writes it out to the actual OutputStream specified in the constructor. Use it as you would any other I/O stream - just write either an integer or a byte array and the data will be encoded using the scheme you passed into the constructor.
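A minimal sketch of that buffer-then-encode behavior follows. It is not the article's Listing 2. In particular, the text does not say when the buffered data is encoded and written; doing so in close() is an assumption, as are the EncodingScheme signatures and the stand-in scheme backed by java.util.Base64 (Java 8 or later).

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class OutputStreamSketch {
    // Assumed shape of the interface: byte arrays in, byte arrays out.
    interface EncodingScheme {
        byte[] encode(byte[] data);
        byte[] decode(byte[] data);
    }

    static class EncodedOutputStream extends FilterOutputStream {
        private final EncodingScheme scheme;
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        private boolean closed;

        EncodedOutputStream(OutputStream out, EncodingScheme scheme) {
            super(out);
            this.scheme = scheme;
        }

        // Writes only collect raw bytes; nothing reaches the target stream yet.
        @Override
        public void write(int b) {
            buffer.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) {
            buffer.write(b, off, len);
        }

        // Assumption: encoding happens once, when the stream is closed.
        @Override
        public void close() throws IOException {
            if (closed) {
                return;
            }
            closed = true;
            out.write(scheme.encode(buffer.toByteArray()));
            out.close();
        }
    }

    // Stand-in scheme for the demo; the article implements its own Base64 scheme.
    static EncodingScheme base64() {
        return new EncodingScheme() {
            public byte[] encode(byte[] data) { return Base64.getEncoder().encode(data); }
            public byte[] decode(byte[] data) { return Base64.getDecoder().decode(data); }
        };
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (EncodedOutputStream eOut = new EncodedOutputStream(sink, base64())) {
            eOut.write("This is unencoded data".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(sink.toString("US-ASCII")); // the Base64 text
    }
}
```

With this design, forgetting to close the stream means nothing is written, so try-with-resources is the safer way to use it.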
EncodingScheme
Let's look at the EncodingScheme interface. Its implementations provide different encodings, such as the Base64 scheme used in this article (see Listing 3). Its two methods are encode and decode. The EncodedInputStream and EncodedOutputStream delegate to this interface when reading and writing the data. Rather than tying the streams to a single encoding scheme, developers can plug in different schemes (Quoted-Printable, 7Bit and Base64) and use familiar methods to read and write data without requiring significant changes to their code.
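To illustrate the plug-in idea, here is a sketch with two interchangeable implementations. It is not the article's Listing 3: the method signatures are assumed, Base64Scheme delegates to the MIME variant of java.util.Base64 (Java 8 or later) rather than a hand-written encoder, and IdentityScheme is a hypothetical pass-through that stands in for an encoding that leaves the bytes unchanged.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class SchemeSketch {
    // Assumed shape of the interface: byte arrays in, byte arrays out.
    interface EncodingScheme {
        byte[] encode(byte[] data);
        byte[] decode(byte[] data);
    }

    // Base64 with MIME line breaks, delegating to the JDK.
    static class Base64Scheme implements EncodingScheme {
        public byte[] encode(byte[] data) { return Base64.getMimeEncoder().encode(data); }
        public byte[] decode(byte[] data) { return Base64.getMimeDecoder().decode(data); }
    }

    // Hypothetical pass-through scheme: the bytes are copied unchanged.
    static class IdentityScheme implements EncodingScheme {
        public byte[] encode(byte[] data) { return data.clone(); }
        public byte[] decode(byte[] data) { return data.clone(); }
    }

    // Calling code depends only on the interface, so schemes can be swapped freely.
    static boolean roundTrips(EncodingScheme scheme, byte[] data) {
        return Arrays.equals(data, scheme.decode(scheme.encode(data)));
    }

    public static void main(String[] args) {
        byte[] data = "plug in any scheme".getBytes(StandardCharsets.UTF_8);
        System.out.println(roundTrips(new Base64Scheme(), data));   // true
        System.out.println(roundTrips(new IdentityScheme(), data)); // true
    }
}
```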
Base64 Encoding Scheme
Before moving to our sample application, we need to implement an encoding scheme; I'll show the Base64 encoding scheme. This scheme basically reorganizes three 8-bit chunks into four 6-bit chunks (see Figure 2). Each 6-bit chunk is represented by one character from a 64-character subset of ASCII: A-Z, a-z, 0-9, "+" and "/". The "=" sign is used to pad the final group when the length of the data isn't a multiple of 3 bytes. You must also break the encoded data into lines of no more than 76 characters each. A more formal explanation is available in RFC 2045. As noted previously, encoding increases the size of your data. Base64 increases the size by approximately one-third.
The basic flow of the encode method is to work with 3-byte chunks at all times. Each iteration of the loop writes 4 bytes out to the buffer. When the loop reaches the end of the data and the last chunk is short, the output is padded with the "=" character, and the encoded byte array is returned. The decode method operates almost the same way, except that it works with 4-byte chunks instead of 3 and ignores the padding character (see Listing 4).
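The encode loop described above can be sketched as follows. This is an illustrative version, not the article's Listing 4: it returns a String rather than a byte array, covers only the encode direction, and leaves out the 76-character line breaks to keep the core loop visible. The class name Base64Sketch is made up.

```java
import java.nio.charset.StandardCharsets;

class Base64Sketch {
    // The 64 characters of the Base64 alphabet, indexed by 6-bit value.
    private static final char[] ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".toCharArray();

    static String encode(byte[] data) {
        StringBuilder out = new StringBuilder((data.length + 2) / 3 * 4);
        for (int i = 0; i < data.length; i += 3) {
            int remaining = Math.min(3, data.length - i);

            // Pack up to three bytes into one 24-bit group; missing bytes count as zero.
            int group = (data[i] & 0xFF) << 16;
            if (remaining > 1) group |= (data[i + 1] & 0xFF) << 8;
            if (remaining > 2) group |= (data[i + 2] & 0xFF);

            // Split the group into four 6-bit values; pad where no input byte existed.
            out.append(ALPHABET[(group >> 18) & 0x3F]);
            out.append(ALPHABET[(group >> 12) & 0x3F]);
            out.append(remaining > 1 ? ALPHABET[(group >> 6) & 0x3F] : '=');
            out.append(remaining > 2 ? ALPHABET[group & 0x3F] : '=');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Man".getBytes(StandardCharsets.US_ASCII))); // TWFu
        System.out.println(encode("Ma".getBytes(StandardCharsets.US_ASCII)));  // TWE=
        System.out.println(encode("M".getBytes(StandardCharsets.US_ASCII)));   // TQ==
    }
}
```

Three input bytes always produce four output characters, which is where the one-third size increase comes from.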
Sample Application
Let's put our encoding scheme to use. Our first example encodes a Java source file, then decodes it (see Listing 5). Compile EncodingSample and then run it, specifying HelloWorld.java as the argument (see Listing 6). Once it's finished running, look at the contents of the encoded.txt file to see what the file looks like in its encoded state.
Now take the HelloWorld Java class file, encode it and then decode it. If you haven't already done so, compile the HelloWorld.java file and then run EncodingSample, specifying HelloWorld.class as the argument. Then look at the encoded.txt file to see what the file looked like encoded. To prove the file was successfully decoded, type "java HelloWorld" - you should see "HelloWorld" printed out.
Enhancements
While EncodedInputStream and EncodedOutputStream allow you to easily read and write encoded data, some enhancements can be made. Buffering large datasets makes it easy to decode all at once but may cause intermittent OutOfMemoryErrors. Alternatively, data can be encoded and decoded in chunks rather than all at once. Due to time constraints I was unable to implement this feature.
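One way the chunked approach could look is sketched below. It does not extend the article's classes; it assumes Java 8 or later and relies on the wrap methods of java.util.Base64, which encode and decode as data flows through the stream, so only a small buffer is held in memory at any time. The class name ChunkedSketch and the buffer size are arbitrary choices.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.Base64;

class ChunkedSketch {
    // Encodes raw bytes to MIME Base64 a chunk at a time.
    static void encode(InputStream raw, OutputStream encoded) throws IOException {
        // Closing the wrapper writes the final padding and closes the underlying stream.
        try (OutputStream b64 = Base64.getMimeEncoder().wrap(encoded)) {
            copy(raw, b64);
        }
    }

    // Decodes MIME Base64 back to raw bytes a chunk at a time.
    static void decode(InputStream encoded, OutputStream raw) throws IOException {
        try (InputStream b64 = Base64.getMimeDecoder().wrap(encoded)) {
            copy(b64, raw);
        }
        raw.flush();
    }

    // Moves data through a fixed-size buffer so memory use stays constant.
    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = new byte[100_000];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) i;
        }

        ByteArrayOutputStream encoded = new ByteArrayOutputStream();
        encode(new ByteArrayInputStream(original), encoded);

        ByteArrayOutputStream decoded = new ByteArrayOutputStream();
        decode(new ByteArrayInputStream(encoded.toByteArray()), decoded);

        System.out.println(Arrays.equals(original, decoded.toByteArray())); // true
    }
}
```

The demo uses in-memory streams for brevity; with FileInputStream and FileOutputStream the same two methods would handle files of any size.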
Summary
It's easy to provide an extensible means to read and write encoded data using ordinary Java I/O streams. You can also provide your own EncodingScheme implementations and plug them into your code without changing the code that uses the streams. For all you sun.misc.BASE64Encoder users, you now have a documented way to use Base64 encoding. Good luck!