Uploading Files with Beans
by Budi Kurniawan
How many times have you wondered how the developers at Hotmail or Yahoo Mail process the attachments to your email? Rest assured that you are not the only one. Too often, Java Internet developers concentrate only on processing strings from an HTML form, and when the boss asks whether they can do file upload, they have to do some research before they can come back with an answer. File upload is rarely discussed, even in respectable Java literature.
With the growth of the Internet, file upload now also plays a significant role beyond email applications. Other Internet and intranet applications, such as Web-based document management systems and the likes of "Secure File Transfer via HTTP", upload files to the server extensively. This article discusses all you need to know about file upload. But first things first. Before you jump too excitedly into coding, you need to understand the underlying theory: the HTTP request. Knowledge of the HTTP request is critical because when you process an uploaded file, you work with raw data not obtainable from an HttpServletRequest object's methods.
The HTTP Request
Each HTTP request from the Web browser or other Web client applications consists of three parts:
- A line containing the HTTP request method, the Uniform Resource Identifier (URI), and the protocol and the protocol version
- HTTP Request headers
- The entity body
These three parts are explained in the following sections.
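The three parts listed above can be pulled out of a raw request with a few lines of Java. The following is a minimal sketch, not the parser a servlet container uses: the class name RawHttpRequest, the host name, and the form fields are made up for illustration, and the entity body is treated as text only to keep the example short (an uploaded file is binary, so real code works on bytes).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: holds the three parts of one raw HTTP request. */
final class RawHttpRequest {
    final String requestLine;        // part 1: method, URI, protocol/version
    final List<String> headerLines;  // part 2: one "Name: value" string per header
    final String entityBody;         // part 3: everything after the blank line

    private RawHttpRequest(String requestLine, List<String> headerLines, String entityBody) {
        this.requestLine = requestLine;
        this.headerLines = headerLines;
        this.entityBody = entityBody;
    }

    /** Splits raw request text into request line, headers, and entity body. */
    static RawHttpRequest parse(String raw) {
        // An empty line (CRLF CRLF) separates the headers from the entity body.
        int blank = raw.indexOf("\r\n\r\n");
        String head = (blank >= 0) ? raw.substring(0, blank) : raw;
        String body = (blank >= 0) ? raw.substring(blank + 4) : "";

        String[] lines = head.split("\r\n");
        List<String> headers = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            headers.add(lines[i]);
        }
        return new RawHttpRequest(lines[0], headers, body);
    }

    public static void main(String[] args) {
        // Illustrative request; the host and form fields are assumptions.
        String raw = "POST /eshop/login.jsp HTTP/1.1\r\n"
                + "Host: www.example.com\r\n"
                + "Content-Type: application/x-www-form-urlencoded\r\n"
                + "Content-Length: 22\r\n"
                + "\r\n"
                + "user=budi&password=abc";

        RawHttpRequest request = parse(raw);
        System.out.println("Request line: " + request.requestLine);
        System.out.println("Headers:      " + request.headerLines);
        System.out.println("Entity body:  " + request.entityBody);
    }
}
```

The only structural rule the sketch relies on is that an empty line ends the header section; everything after it belongs to the entity body.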
The Request Method, URI and Protocol
The first subpart of the first part, the HTTP request method, indicates the method used in the HTTP request. In HTTP 1.0, it could be one of the following three: GET, HEAD, and POST. In HTTP 1.1, in addition to the three methods, there are four more methods: DELETE, PUT, TRACE, and OPTIONS. Among the seven, the two methods that are most frequently used are GET and POST. GET is the default method. You use it, for example, when you type a URL such as http://www.onjava.com in the Location or Address box of your browser to request a page. The POST method is common too. You normally use this as the value of an HTML form tag's method attribute. When uploading a file, you must use the POST method.
The second subpart of the first part, the URI, specifies an Internet resource. A URI is normally interpreted as being relative to the Web server's root directory. Thus, it starts with a forward slash (/) and has the following format.
/virtualRoot/pageName
For example, in a typical JavaServer Pages application the URI could be the following.
/eshop/login.jsp
The third subpart of the first part is the protocol and the protocol version understood by the requester (the browser). The protocol must be HTTP, and the version can be 1.0 or 1.1. Most Web servers understand both versions, so they can serve requests made in either. If you are still using an old Web server that understands only HTTP 1.0, you could be in trouble when your users' modern browsers send requests using HTTP 1.1.
Combining the three subparts, the first part of an HTTP request looks like the following.
POST /virtualRoot/pageName HTTP/version
For example:
POST /eshop/login.jsp HTTP/1.1
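To make the structure of that first line concrete, here is a small sketch that splits it back into the method, the URI, the protocol, and the version. The class name RequestLine is hypothetical, and the sketch assumes the three subparts are separated by single spaces, as in the example above.

```java
/** Hypothetical helper: the parsed first line of an HTTP request. */
final class RequestLine {
    final String method;    // e.g. POST
    final String uri;       // e.g. /eshop/login.jsp
    final String protocol;  // e.g. HTTP
    final String version;   // e.g. 1.1

    private RequestLine(String method, String uri, String protocol, String version) {
        this.method = method;
        this.uri = uri;
        this.protocol = protocol;
        this.version = version;
    }

    /** Parses a line of the form "METHOD URI PROTOCOL/VERSION". */
    static RequestLine parse(String line) {
        String[] parts = line.trim().split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + line);
        }
        // The last subpart carries both the protocol name and its version.
        String[] protocolAndVersion = parts[2].split("/", 2);
        if (protocolAndVersion.length != 2) {
            throw new IllegalArgumentException("Malformed protocol: " + parts[2]);
        }
        return new RequestLine(parts[0], parts[1], protocolAndVersion[0], protocolAndVersion[1]);
    }

    public static void main(String[] args) {
        RequestLine line = parse("POST /eshop/login.jsp HTTP/1.1");
        System.out.println("Method:   " + line.method);
        System.out.println("URI:      " + line.uri);
        System.out.println("Protocol: " + line.protocol);
        System.out.println("Version:  " + line.version);
    }
}
```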
The HTTP Request Headers
The second part of an HTTP request consists of a number of HTTP headers. There are four types of HTTP headers: general, entity, request, and response. The first three are summarized in Tables 1, 2, and 3. Response headers are specific to HTTP responses and are therefore not discussed here.
|Table 1: HTTP General Headers|
|Header|Description|
|Pragma|The Pragma general header is used to include implementation-specific directives that may apply to any recipient along the request/response chain. That is, pragmas tell the servers that relay this request to behave in a certain way. The Pragma header may contain multiple values. For example, the following line tells all proxy servers that relay this request not to use a cached version of the object but to download the object from the specified location: Pragma: no-cache|
|Date|The Date general header represents the date and time at which the message was originated.|
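As an illustration of the two general headers, the sketch below builds a Pragma line and a Date line in Java. The class and method names are hypothetical. The date is formatted with the JDK's DateTimeFormatter.RFC_1123_DATE_TIME, which matches the HTTP date style for two-digit days; note that this formatter does not zero-pad single-digit days, so a production implementation may need a custom pattern.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper that produces the two general headers as text lines. */
final class GeneralHeaders {

    /** Asks proxies along the chain not to answer from their cache. */
    static String noCachePragma() {
        return "Pragma: no-cache";
    }

    /** Formats the moment the message originated, expressed in GMT. */
    static String dateHeader(ZonedDateTime originated) {
        ZonedDateTime gmt = originated.withZoneSameInstant(ZoneOffset.UTC);
        return "Date: " + DateTimeFormatter.RFC_1123_DATE_TIME.format(gmt);
    }

    public static void main(String[] args) {
        // Illustrative timestamp only.
        ZonedDateTime originated = ZonedDateTime.of(2000, 8, 10, 12, 12, 29, 0, ZoneOffset.UTC);
        System.out.println(noCachePragma());
        System.out.println(dateHeader(originated)); // Date: Thu, 10 Aug 2000 12:12:29 GMT
    }
}
```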
|Table 2: HTTP Entity Headers.|
|Header|Description|
|Allow|This header lists the set of methods supported by the resource identified by the requested URL. The purpose of this field is strictly to inform the recipient of valid methods associated with the resource. The Allow header is not permitted in a request using the POST method.|
|Content-Encoding|This header describes the type of encoding used on the entity. When present, its value indicates the decoding mechanism that must be applied to obtain the media type referenced by the Content-Type header.|
|Content-Length|This header indicates the size of the entity body, in decimal number of octets, sent to the recipient or, in the case of the HEAD method, the size of the entity body that would have been sent had the request been a GET.|
|Content-Type|The Content-Type header indicates the media type of the entity body sent to the recipient or, in the case of the HEAD method, the media type that would have been sent had the request been a GET.|
|Expires|The Expires header gives the date and time after which the entity should be considered invalid. This allows information providers to suggest the volatility of the resource or a date after which the information may no longer be accurate. Applications must not cache this entity beyond the date given. The presence of an Expires header does not imply that the original resource will change or cease to exist at, before, or after that time. However, information providers who know that a resource will change by a certain date should include an Expires header with that date.|
|Last-Modified|The Last-Modified header indicates the date and time at which the sender believes the resource was last modified. The exact semantics of this field are defined in terms of how the recipient should interpret it. If the recipient has a copy of this resource that is older than the date given by the Last-Modified field, that copy should be considered stale.|
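Two of the entity headers, Content-Type and Content-Length, describe the entity body directly. The sketch below, with a hypothetical class name and made-up sample data, shows the point that matters most for file upload: Content-Length counts octets (bytes), not characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that derives entity headers from an entity body. */
final class EntityHeaders {

    /** Returns the Content-Type and Content-Length headers for the given body. */
    static Map<String, String> describe(byte[] entityBody, String mediaType) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", mediaType);
        // Content-Length is the number of octets, so count bytes rather than characters.
        headers.put("Content-Length", String.valueOf(entityBody.length));
        return headers;
    }

    public static void main(String[] args) {
        // "caf\u00e9" has four characters but five bytes in UTF-8.
        String text = "caf\u00e9";
        byte[] body = text.getBytes(StandardCharsets.UTF_8);

        Map<String, String> headers = describe(body, "text/plain; charset=UTF-8");
        for (Map.Entry<String, String> header : headers.entrySet()) {
            System.out.println(header.getKey() + ": " + header.getValue());
        }
        System.out.println("Characters: " + text.length() + ", octets: " + body.length);
    }
}
```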
|Table 3: HTTP Request Headers|
|Header|Description|
|From|The From header specifies who is taking responsibility for the request. This field contains the email address of the user submitting the request.|
|Accept|This header contains a comma-separated list of MIME media types that are accepted by the client. The server uses this information to determine which data types are safe to send to the client in the HTTP response. Instead of listing multiple values on one Accept line, the client can also send the Accept line more than once to specify additional types (this has the same effect as specifying multiple types on a single line). If the Accept field is not used in the request, HTTP 1.1 assumes that the client accepts all media types.|
|Accept-Encoding|This header is very similar to the Accept header in syntax. However, it specifies the content-encoding schemes that are acceptable in the response.|
|Accept-Language|This header is also similar to the Accept header. It specifies the preferred response language. The following example specifies English as the accepted language: Accept-Language: en|
|User-Agent|The User-Agent header, if present, specifies the name of the client browser. The first word should be the name of the software followed by a slash and an optional version number. Any other product names that are part of the complete software package may also be included. Each name/version pair should be separated by white space. This field is used mostly for statistical purposes. It allows servers to track software usage and protocol violations.|
|Referer|This header specifies the URI of the document that contained the URI in the request line. In HTML, it would be the address of the page that contained the link to the requested object. Like the User-Agent header, this header is not required and mostly serves the server's statistical and tracking purposes.|
|Authorization|The Authorization header contains authorization information. The first word contained in this header specifies the type of authorization system to use. Then, separated by white space, comes the authorization information, such as a user name and password.|
|If-Modified-Since|This header is used with the GET method to make it conditional. Basically, if the object hasn't changed since the date and time specified by this header, the object is not sent. A local cached copy of the object is used instead. For example, If-Modified-Since: Thu, 10 Aug 2000 12:12:29 GMT|
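The decision a server makes for a conditional GET can be sketched as follows. This is an illustration under stated assumptions, not how any particular server implements it: the class and method names are hypothetical, the resource's modification time is passed in by the caller, and the header value is parsed with the JDK's DateTimeFormatter.RFC_1123_DATE_TIME. When the object has not changed, a real server answers with status 304 (Not Modified) and sends no entity body.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper that evaluates the If-Modified-Since request header. */
final class ConditionalGet {

    /**
     * Returns true when the object must be sent in full, and false when the
     * client's cached copy is still current.
     */
    static boolean mustSend(String ifModifiedSince, ZonedDateTime lastModified) {
        if (ifModifiedSince == null) {
            return true; // unconditional GET: always send the object
        }
        ZonedDateTime since =
                ZonedDateTime.parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME);
        // Send only if the object changed after the date the client supplied.
        return lastModified.toInstant().isAfter(since.toInstant());
    }

    public static void main(String[] args) {
        String header = "Thu, 10 Aug 2000 12:12:29 GMT";

        // Made-up modification times for two resources.
        ZonedDateTime unchanged = ZonedDateTime.of(2000, 8, 9, 8, 0, 0, 0, ZoneOffset.UTC);
        ZonedDateTime changed = ZonedDateTime.of(2000, 8, 11, 8, 0, 0, 0, ZoneOffset.UTC);

        System.out.println("Older resource must be sent: " + mustSend(header, unchanged));
        System.out.println("Newer resource must be sent: " + mustSend(header, changed));
    }
}
```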