Making Media from Scratch, Part 1
by Chris Adamson
QuickTime is often described as a "media creation" API, and that means a lot more than just the ability to edit your audio and video and export it to an arbitrary format. This month I'd like to take the term very literally and show you how to create your movies in Java, one frame at a time, without depending on a pre-existing movie.
To do that, we need to take another look at the format of a QuickTime movie. In "Parsing and Writing the QuickTime File Format," we saw how structures called "atoms" represented this format. For today, let's strip away those details and look at the big picture:
A movie contains metadata (creation and modification time, current selection, preferred volume and rate, etc.) and zero or more tracks.
A track contains metadata (creation and modification time, playback quality), exactly one media object, and an edit list describing which parts of the media are to be used.
A media object contains a data reference that indicates where the audio, video, or other data actually is (in the movie file, in another file, on the network, etc.); information about which QuickTime "media handler" can load, save, and play the data; and a structure called a "sample table" to represent where the sample for a given time can be found in the data.
Graphically, this can be seen as a movie where the references are all to external sources (files, URLs, and other movies), as shown in Figure 1, or a "flattened" one, in which the data is all contained within the same .mov file as the movie's structure, as shown in Figure 2. Either way, the movie is the structure that represents where the samples are, how they're arranged, and what to do with them.
Figure 1. A movie with external sources
Figure 2. A movie with internal sources
By "samples," we mean what is to be seen or heard at some instant of time, in the smallest amount of time relevant to that kind of media. For example, imagine a format where we have totally uncompressed video (equivalent to, say, North American television) and uncompressed CD-quality audio. The video, by our definition, is 30 frames per second, so there are 30 video samples in one second. CD-quality audio is 44.1 kHz, meaning there are 44,100 audio samples in a second.
QuickTime, interestingly, realizes that a player would generally like its data to be organized with regard to time. For example, you don't want to have a file with all of the video data first and then all the audio data, since playing back would require jumping back and forth between the two, and the read/write head on your hard drive would scream in agony. It's easier to mix them, so that the video data for a certain time and the audio data for that time are in the same place. In QuickTime's worldview, this is a process of "chunking": the media data combines video, audio, and any other data into one stream (a long run of bytes), with "chunks" of audio, video, and other samples grouped by time. It's up to the media object to manage several tables, like a time-to-sample table and a sample-to-chunk table, to allow it to find the samples at playback time.
Fortunately, you as a developer aren't responsible for all of that bookkeeping, but it's good to understand how it works.
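To make that bookkeeping a little more concrete, here is a minimal plain-Java sketch of how a time-to-sample lookup might work. It assumes a simplified table in which each entry describes a run of consecutive samples that share one duration. The TimeToSampleSketch class, its Run type, and the sampleIndexAt() method are names I made up for illustration; they are not part of QuickTime for Java, and the real tables carry more information than this.

```java
// Hypothetical, simplified stand-in for a time-to-sample table.
// None of these names come from QuickTime for Java.
final class TimeToSampleSketch {

    // One run of consecutive samples that all share the same duration,
    // measured in units of the media's time scale.
    static final class Run {
        final int sampleCount;
        final int sampleDuration; // assumed to be greater than zero

        Run(int sampleCount, int sampleDuration) {
            this.sampleCount = sampleCount;
            this.sampleDuration = sampleDuration;
        }
    }

    // Returns the zero-based index of the sample that covers mediaTime,
    // or -1 if mediaTime is negative or past the end of the media.
    static int sampleIndexAt(Run[] table, long mediaTime) {
        if (mediaTime < 0) {
            return -1;
        }
        long runStart = 0;   // media time at which the current run begins
        int firstIndex = 0;  // index of the first sample in the current run
        for (Run run : table) {
            long runLength = (long) run.sampleCount * run.sampleDuration;
            if (mediaTime < runStart + runLength) {
                return firstIndex + (int) ((mediaTime - runStart) / run.sampleDuration);
            }
            runStart += runLength;
            firstIndex += run.sampleCount;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Three samples of 100 units each, then two samples of 50 units each.
        Run[] table = { new Run(3, 100), new Run(2, 50) };
        System.out.println(sampleIndexAt(table, 250));
        System.out.println(sampleIndexAt(table, 360));
    }
}
```

Storing runs rather than one entry per sample keeps the table small when most samples have the same duration, which is the common case for fixed-rate audio and video.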
Getting back to the point, to make a movie from scratch, we need to do the following:
- Lay down samples.
- Add these to a media object.
- Add the media to an appropriate track.
- Add the track to a movie.
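Before bringing in the real API, the four steps above can be sketched with a toy object model in plain Java. Everything here (MovieModelSketch and its nested Sample, Media, Track, and Movie classes) is a hypothetical illustration of the containment relationships described earlier, not QuickTime for Java itself, and the real classes are created in a somewhat different order.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the movie / track / media hierarchy. These classes are
// hypothetical and only mirror the structure described in the text.
final class MovieModelSketch {

    static final class Sample {
        final String data;   // a caption string, since this is a text sample
        final int duration;  // in units of the media's time scale

        Sample(String data, int duration) {
            this.data = data;
            this.duration = duration;
        }
    }

    static final class Media {
        final int timeScale;
        final List<Sample> samples = new ArrayList<Sample>();

        Media(int timeScale) {
            this.timeScale = timeScale;
        }

        // Total duration of all samples, in units of the media's time scale.
        int getDuration() {
            int total = 0;
            for (Sample s : samples) {
                total += s.duration;
            }
            return total;
        }
    }

    static final class Track {
        Media media; // a track holds exactly one media object
    }

    static final class Movie {
        final List<Track> tracks = new ArrayList<Track>();
    }

    static Movie buildCaptionMovie(String[] captions, int timeScale, int duration) {
        Media media = new Media(timeScale);
        for (String caption : captions) {
            // Steps 1 and 2: lay down a sample and add it to the media.
            media.samples.add(new Sample(caption, duration));
        }
        Track track = new Track();
        track.media = media;      // Step 3: add the media to a track.
        Movie movie = new Movie();
        movie.tracks.add(track);  // Step 4: add the track to a movie.
        return movie;
    }

    public static void main(String[] args) {
        Movie movie = buildCaptionMovie(new String[] {"Hello", "World"}, 100, 100);
        System.out.println(movie.tracks.get(0).media.getDuration());
    }
}
```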
You may have noticed in the diagrams above that our hypothetical movie contains not just an audio and a video track, but also a "text track." This is exactly what it sounds like: a time-based collection of text, commonly used for providing captions to QuickTime movies. More technically, it is a track where the media samples are ordinary text strings. This is a good place to start with creating our own media, since it doesn't require knowing anything about images or sounds.
The source code for this article is available for download. Its MakeTextTrack sample application creates a movie with a single text track. It starts by creating an empty movie file to write to:
Movie movie = Movie.createMovieFile(file,
    StdQTConstants.kMoviePlayer,
    StdQTConstants.createMovieFileDeleteCurFile |
    StdQTConstants.createMovieFileDontCreateResFile);
Next, it creates an empty text track and a text media object, which it will eventually insert into the track:
Track textTrack = movie.addTrack (TEXT_TRACK_WIDTH, TEXT_TRACK_HEIGHT, TEXT_TRACK_VOLUME);
TextMedia textMedia = new TextMedia(textTrack, timeScale);
The last argument is a time scale for the media. Movies, tracks, and media all have their own time scale, which is the number of time units that pass in one second. For a movie, this value defaults to 600, which has the advantage of being an even multiple of many common frame-rates: 30 (NTSC video), 25 (PAL and SECAM video), and 24 (film). Dean Perry of Abstract Plane also reminds me it's an even multiple of the 60 "ticks" per second that older Macintoshes used for timekeeping. However, you're free to use and abuse the time scales as you see fit. I arbitrarily chose a value of 100 for my media, so my sample durations are measured in hundredths of a second.
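The time-scale arithmetic is easy to check for yourself. The following plain-Java sketch uses two helper methods of my own invention (unitsPerFrame() and convert(), in a hypothetical TimeScaleSketch class, none of which are QuickTime for Java calls) to show why 600 divides evenly by the common rates, and how a duration in one time scale is re-expressed in another.

```java
// Plain-Java arithmetic sketch; the class and method names are hypothetical
// and are not part of QuickTime for Java.
final class TimeScaleSketch {

    // Number of time-scale units that one frame lasts, or -1 if the rate
    // does not divide the time scale evenly.
    static int unitsPerFrame(int timeScale, int framesPerSecond) {
        return (timeScale % framesPerSecond == 0) ? timeScale / framesPerSecond : -1;
    }

    // Re-expresses a duration measured in one time scale in another.
    // Integer division truncates if the result is not a whole number of units.
    static long convert(long duration, int fromScale, int toScale) {
        return duration * toScale / fromScale;
    }

    public static void main(String[] args) {
        for (int rate : new int[] {30, 25, 24, 60}) {
            System.out.println(rate + " per second: " + unitsPerFrame(600, rate) + " units each");
        }
        // A one-second sample in a media time scale of 100,
        // expressed in a movie time scale of 600.
        System.out.println(convert(100, 100, 600));
    }
}
```

With a time scale of 600, a frame lasts 20 units at 30 frames per second, 24 units at 25, and 25 units at 24. A time scale of 100, by contrast, can't represent a single frame at 30 or 24 frames per second as a whole number of units, which is fine for captions but would matter for video.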
Next, we tell the new Media object that we intend to do some editing.
We then get the media handler object, required in this case because it has a method for creating new text samples:
TextMediaHandler handler = textMedia.getTextHandler();
and we create a rectangle that will be used in every sample to describe the shape that the text is to be rendered into when played back:
QDRect textBox = new QDRect(0, 0, TEXT_TRACK_WIDTH, TEXT_TRACK_HEIGHT);
We're finally ready to start adding samples. The sample application uses a static array of Strings, getting a QuickTime-compatible QTPointer to each one and passing that as the first argument to the TextMediaHandler.addTextSample() method. Here's how that call looks:
handler.addTextSample (msgPoint, 0, 12, 0,
    QDColor.yellow, QDColor.black, QDConstants.teJustCenter,
    textBox, 0, 0, 0, 0, QDColor.white, 100);
Obviously, this method has a lot of parameters. In order, they are:
- QTPointerRef text: a pointer to the string.
- int fontNumber: an integer to indicate the font. 0 can always be used as a generic default, or use the QDFont class to get the ID for a font name.
- int fontSize: the font size, in points.
- int textFace: a style, such as bold or italic, defined by style constants.
- QDColor textColor: the foreground color, expressed as a QDColor.
- QDColor backColor: the background color.
- int textJustification: the right/left/center justification. Possible values, such as teJustCenter, are in QDConstants.
- QDRect textBox: a QDRect rectangle describing the box in which the text is to be displayed.
- int displayFlags: zero or more behavior flags, logically OR'd together, describing behavior such as clipping or scaling the text when displayed over other video, etc. These flags are in StdQTConstants, and a list of supported flags is documented for the native TextMediaAddTextSample function.
- int scrollDelay: a time to delay between scrolls if scrolling flags such as dfScrollOut are set. Not useful in this app, with its short samples, but potentially useful for other purposes.
- int hiliteStart: the index of the first character of text to highlight (select), if any.
- int hiliteEnd: the index of the last character of text to highlight.
- QDColor rgbHiliteColor: the color of the highlight, if used.
- int duration: the duration of this sample, expressed in the media's time scale.
The duration is interesting for a couple of reasons. First, it's expressed in terms of the media's time scale. In our case, the time scale is 100 and the duration is 100, so the sample is exactly one second long. Of course, we could have half-second samples by using a duration of 50, or any sample length that can be expressed as a fraction of duration over time scale. Moreover, despite the commonness of fixed rates in audio and video (30 fps video, 44.1 kHz sound, etc.), QuickTime requires no such thing: each sample can be of an arbitrary duration, different from the sample before or after it.
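As a small illustration of variable-length samples, this plain-Java sketch takes a list of per-sample durations and works out where each sample would start and how long the whole run lasts. The SampleDurationSketch class and its methods are hypothetical helpers for this article, not QuickTime for Java calls, and the durations are made-up values.

```java
// Plain-Java sketch of laying out samples with differing durations.
// The class and method names are hypothetical, not QuickTime for Java API.
final class SampleDurationSketch {

    // Start time of each sample, in units of the media's time scale:
    // each sample begins where the previous one ended.
    static int[] startTimes(int[] durations) {
        int[] starts = new int[durations.length];
        int clock = 0;
        for (int i = 0; i < durations.length; i++) {
            starts[i] = clock;
            clock += durations[i];
        }
        return starts;
    }

    // Total length of all samples, converted to seconds.
    static double totalSeconds(int[] durations, int timeScale) {
        int total = 0;
        for (int d : durations) {
            total += d;
        }
        return (double) total / timeScale;
    }

    public static void main(String[] args) {
        int timeScale = 100;               // hundredths of a second
        int[] durations = {100, 50, 250};  // 1 s, 0.5 s, and 2.5 s captions
        int[] starts = startTimes(durations);
        for (int i = 0; i < starts.length; i++) {
            System.out.println("sample " + i + " starts at " + starts[i]);
        }
        System.out.println(totalSeconds(durations, timeScale) + " seconds in total");
    }
}
```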
Wrapping up the application, once the loop is done adding samples, we inform the Media object that we're done editing and insert this media into the text track:
textTrack.insertMedia (0,        // trackStart
    0,                           // mediaTime
    textMedia.getDuration(),     // mediaDuration
    1);                          // mediaRate
after which we save the file to disk as texttrack.mov, in the current directory.
To compile and run the sample code, make sure you've worked through any versioning or classpath issues as covered in our re-introduction to QTJ a few months back. When you're done, the result will look something like this (assuming you have the QT plug-in):
One of the nice things to notice is that we picked up word-wrap automatically, without hand-coding line-breaks.