The Playground: AoW Title Part 1

I loved the Age of Wonders games as a kid. Amazing graphics, beautiful music, and so much attention to detail. I still have my disk, which turned out to be a very fortunate thing for this project. Shockingly, it still installs and runs just fine in 64-bit Windows 10 (albeit with some minor prodding of its settings). The title screen doesn’t particularly like my 1920×1080 screen resolution (the sliding background painting doesn’t loop properly), and the smaller maps are almost completely visible on such a large screen, but other than that it holds up very well.

But enough gushing about my favorite game. That’s not why we’re here. We’re here to learn, and learn I very much did. This project ended up teaching me a number of things about JavaScript that I legitimately did not know before. The main two were the DataView API, which lets JavaScript peek into binary data, and the Web Audio API, which lets the browser play arbitrary sound. I also dabbled in C++ a bit for this project, extracting graphics from the Age of Wonders ILB format thanks to documentation found on the Jongware website along with a little from the AoWRead documentation. That documentation proved invaluable in getting things set up, and eventually I was able to export even complicated composite images as PNG files (via the STB image writing library, which I chose because it’s silly easy to use and I already had it from working with the STB image reading library). Once I’m finished writing the articles, I’ll put the project on my Demos and Examples page.

When I started the project, the first goal I had was to use as many of the original game files as I could, untouched. I decided to start with something I thought would be easy. The music in Age of Wonders is just a set of Impulse Tracker files, which is a semi-standard format, so I figured documentation would be readily available from the many people who have reverse engineered it over the years. I was half right: there were a number of things I ended up having to guess at and play around with until they sounded close enough, and a great deal of the documentation was spread over several pages, each giving only a general idea of what it covered. The two places I found invaluable in deciphering the Impulse Tracker format were the SchismTracker GitHub, which explained the actual file structure and gave a rough overview of many of the effects, and the OpenMPT Wiki, which gave a better understanding of the key details of the different effects (along with the OpenMPT program itself, which let me compare the data I was parsing and the effects I was generating against a known-good reference).

Obviously, the first thing I needed was to be able to fetch the music file from my server. I decided to make my player a simple library with a set of classes for the key concepts of an Impulse Tracker song, so I started with the ITSong class. This class would have methods responsible for loading the file, kicking off the parsing process, keeping track of state while playing and stopping the music, and firing events as it does different things. Loading the file was as simple as setting up a standard XMLHttpRequest, setting the MIME type to “application/octet-stream” so the browser wouldn’t try to parse the data for us, and setting the responseType to “arraybuffer” so the browser would hand us back an ArrayBuffer.
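In sketch form, the loader boils down to something like this (the method names are illustrative; XMLHttpRequest and DataView are the real APIs):

loadFile(url)
{
    var request = new XMLHttpRequest();
    request.open("GET", url, true);
    request.overrideMimeType("application/octet-stream"); // don't let the browser interpret the data
    request.responseType = "arraybuffer";                 // give us the raw binary back

    request.onload = () => this.parse(new DataView(request.response));
    request.send();
}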

Once loading was good, it came time to parse the data. For this, I turned to the DataView class. This is one of the new things I had to learn about, but it’s actually pretty simple: you give it an ArrayBuffer (like we told the browser to give us for our request) and it exposes a number of neat methods that reinterpret the data for us. The vast majority of what I ended up using were the DataView.getUint8 and DataView.getUint16 methods, which read the data as unsigned bytes and shorts. The fact that each of these functions takes an offset to read from means writing the parser was just a matter of keeping track of where in the file we are, and it also makes the various data offsets stored in the file trivial to handle. The one thing I wish the API provided was a method to read C strings, so I ended up having to write one of my own. It was trivial to implement, though; here’s my entire utility class (including the checkFlag method, which is just your bog-standard bitfield-to-bool function):

class ITUtil
{
    static checkFlag(value, bit)
    {
        return (value & (0x01 << bit)) != 0;
    }

    static dvGetString(dataView, byteOffset, byteLength)
    {
        var currentOffset = byteOffset;
        var result = "";
        var ch = 0;

        // If no length is given, read to the end of the view
        if(byteLength === undefined)
            byteLength = dataView.byteLength - byteOffset;

        var endOffset = byteOffset + byteLength;

        do
        {
            ch = dataView.getUint8(currentOffset++);
            // Early out on the NUL terminator
            if(ch == 0)
                break;

            result += String.fromCharCode(ch);
        } while(currentOffset < endOffset);

        return result;
    }
}

Pretty easy, and it could almost certainly be written better. At this point I was ready to start parsing bits of the file. The first thing to check was the file signature; JavaScript makes this trivial: just read the 4-character string and compare it to “IMPM”. Some basic logging and error handling and I was confident I was parsing the file correctly! Eventually I built up to parsing the entire header, with logging to show all the different flags and data to make sure I was on the right track. There were a few slip-ups, but for the most part it all went smoothly. I’m still not entirely sure how to check which channels are actually used by a song; I don’t think AoW ever uses more than 16 even though IT supports up to 64, yet OpenMPT can somehow tell. My guess is that it checks whether a channel is muted in the header.
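As a rough illustration of how little DataView work this takes (the offsets come from the IT format documentation; the variable names are just for this sketch), the signature check and a few of the header counts look like:

var signature = ITUtil.dvGetString(dataView, 0, 4);
if(signature != "IMPM")
    throw new Error("Not an Impulse Tracker file");

// Header counts are little-endian 16-bit values, hence the `true` flag
var orderCount   = dataView.getUint16(0x20, true);
var sampleCount  = dataView.getUint16(0x24, true);
var patternCount = dataView.getUint16(0x26, true);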

Eventually I reached the first non-trivial section of data to parse: the sample table. Thankfully none of the AoW songs use IT instruments, allowing me to ignore that mess for now. Because samples are a non-trivial chunk of data that gets used repeatedly throughout the lifetime of the song, I decided this was another good candidate for a class. This class would be responsible for parsing the sample data from the file, creating Web Audio API nodes to play the sample at various pitches, and adjusting the playback rate of existing nodes based on the sample’s stored data.

One thing that made this slightly more difficult is that samples do not have to be stored such that middle C sits at any particular sample rate. Instead, they store that information in the file, which means that to properly adjust the playback rate of the Audio API node to get the note we want, we have to keep that sample rate somewhere. I decided to keep it with the rest of the sample data and have the sample class do the adjustment. I would have just used the middle-C sample rate to create the API node, but Firefox, at least, did not like such low sample rates, so I went with 44100 samples per second instead. This turned out to be easy to work with: scale the middle-C sample rate by the sample rate we create the sound buffer at, essentially “c5rate / 44100” where c5rate is what is stored in the file, and we get a float containing exactly what needs to go into the API node’s playback rate to play the sample at the song’s expected middle C. From there, we can just multiply in the scale of each note relative to middle C (i.e. one octave up is 2x, one octave down is 0.5x, etc.), which makes playing a specific note a trivial “node.playbackRate.value = this.c5rate * noteRate”. I could have used the note lookup table directly in that code, but realized it would cause problems with note effects like vibrato that depend on a smooth transition between notes. Instead I had the calling code pass the desired note rate to the function, so all the specific calculation stays with the code that actually has that data.
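Put together, the rate handling boils down to something like the following sketch (the constructor and its parameters are illustrative; c5rate and adjustRate match the names used elsewhere in this article):

class ITSample
{
    constructor(audioContext, floatSamples, fileC5Rate)
    {
        // The buffer is always created at 44100Hz; the file's middle-C rate
        // becomes a playback-rate multiplier instead
        this.buffer = audioContext.createBuffer(1, floatSamples.length, 44100);
        this.buffer.copyToChannel(floatSamples, 0);
        this.c5rate = fileC5Rate / 44100.0;
    }

    createNode(audioContext)
    {
        var node = audioContext.createBufferSource();
        node.buffer = this.buffer;
        return node;
    }

    adjustRate(node, noteRate)
    {
        // noteRate is the note's multiplier relative to middle C (2.0 = one octave up)
        node.playbackRate.value = this.c5rate * noteRate;
    }
}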

Parsing the sample data itself is not entirely easy, simply due to the variety of sample formats. I decided to start simple and just support the basic ones: 8- and 16-bit values, mono, signed or unsigned, and uncompressed. This made massaging them into the format needed by the Web Audio API easier, as it was just a matter of deciding what to offset and scale by. The API expects samples in a float buffer, scaled between -1 and 1, which means I just need to make sure the samples are signed, then divide by the maximum value the sample size can hold. Then I can dump them into the buffer and use that to play the instrument.
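For example, a rough sketch of that conversion for the formats I support (the helper name and parameters are just for illustration; IT stores its sample data little-endian):

function convertSamples(dataView, byteOffset, sampleCount, is16Bit, isSigned)
{
    var out = new Float32Array(sampleCount);

    for(var i = 0; i < sampleCount; i++)
    {
        var value;
        if(is16Bit)
            value = isSigned ? dataView.getInt16(byteOffset + i * 2, true)
                             : dataView.getUint16(byteOffset + i * 2, true) - 32768;
        else
            value = isSigned ? dataView.getInt8(byteOffset + i)
                             : dataView.getUint8(byteOffset + i) - 128;

        // Scale into the -1..1 range the Web Audio API expects
        out[i] = value / (is16Bit ? 32768.0 : 128.0);
    }

    return out;
}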

Now, because samples are non-trivial, and because it’s often very easy to tell when you’ve messed one up, I took the time to put together a basic HTML page that displays key information about the song, the samples, and the pattern order. Eventually I’d extend this to show the patterns too, but that came later. Once I could see that I was at least parsing the data and not getting lost in the file, I added buttons to let me play the samples. On the first test, most of them worked wonderfully! It turned out there were two problems: the 16-bit samples were being played back at the wrong rate, because I was under the impression that the sample rate was stored as bytes per second (so I was incorrectly dividing it by 2 for 16-bit samples), and the loops weren’t looping.

The Audio API makes it easy to play basic loops within samples. Unfortunately, IT files support a non-basic loop type as well. I call them bounce loops because, instead of jumping from the last value in the loop back to the first, playback reverses direction and runs backwards until it reaches the loop’s start, where it reverses again, bouncing between the two endpoints. Additionally, the Audio API’s loop start and end values are times, not sample indexes. This meant some data mangling was needed, which took some trial and error to get right. The time/index issue was fairly easy: just divide the index by the sample rate I chose earlier. That left the bounce loops. I figured the easiest way to handle them was to manually wedge a mirrored set of values in and extend the sample, so that I could use the Audio API’s normal loop functionality. It took a couple of tries to get everything settled without pops due to indexing errors, but I was right that this seems to be the best way to handle it.
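Roughly, the flattening looks like this (a sketch, assuming the decoded samples are already in a Float32Array and the loop points are sample indexes; the endpoint handling is exactly where my pops came from):

function flattenBounceLoop(samples, loopStart, loopEnd, sampleRate, node)
{
    var loopLength = loopEnd - loopStart;

    // Keep everything up to the loop end, then append a mirrored copy of the
    // loop interior (skipping both endpoints so the wrap-around doesn't double a sample)
    var extended = new Float32Array(loopEnd + loopLength - 2);
    extended.set(samples.subarray(0, loopEnd));
    for(var i = 1; i <= loopLength - 2; i++)
        extended[loopEnd - 1 + i] = samples[loopEnd - 1 - i];

    // The Web Audio API wants loop points in seconds, not sample indexes
    node.loop = true;
    node.loopStart = loopStart / sampleRate;
    node.loopEnd = extended.length / sampleRate;

    // The extended data then gets copied into the AudioBuffer the node plays
    return extended;
}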

Now that samples were playing properly, it was time to get the file to put them together into a song. This ended up being the most complicated and frustrating part of this whole audio endeavor. Most of this frustration simply came from poor documentation of the IT format and playback method, but there were a few design choices I made that didn’t help things.

Having some familiarity with how tracked formats work, I decided to create two classes: one for the rows, which contain the various notes and commands played on the channels for each part of the song, and one for the patterns, which organize those rows into actual segments of the song. For simplicity I eventually settled on having these classes be little more than data containers with methods to parse their individual bits of the file. A quick adjustment to my test page and I could see that I was successfully getting the data, and that it matched OpenMPT’s pattern viewer nicely.

It was time to see about playing the song. I had the ITSong class store all the state for the current position in the pattern order list, the currently playing pattern, and the current row within the pattern. Additionally, I had it handle all the channel setup and management, which I later separated out into its own ITChannel class. Before I could start playing notes, though, I had to figure out the timing for playing rows; if I played rows as fast as possible, I got a blast of noise followed by silence, which is not a song (at least, not the one I was trying to play). It took a long bit of searching, but I eventually found the answer. There are two variables that control how fast a song plays: the speed and the tempo. The tempo controls how long a tick lasts, and the speed controls how many ticks there are per row. It turns out the formula for the length of a tick is “2.5 / tempo”, where 2.5 is in seconds, so for a song with a tempo of 125, a tick is 20ms. Once I had this, I could easily set up autoplay: just have a function that sets itself as a callback for setTimeout() with a delay of “speed * (2.5 / tempo) * 1000” to get the milliseconds per row.
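In sketch form (the method names are illustrative), that first version of the autoplay loop was essentially:

playNextRow()
{
    this.processRow();  // trigger the notes and commands for the current row

    // speed = ticks per row, (2.5 / tempo) = seconds per tick
    var msPerRow = this.speed * (2.5 / this.tempo) * 1000;
    this.timerId = setTimeout(() => this.playNextRow(), msPerRow);
}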

For simplicity, I decided to start by just playing the basic sample at the note and volume specified, plus handling note cuts to end sample loops. I fired up my test page, hit the play button, and it sounded almost correct! Except for a few weird volume issues… it turned out that when a new note starts without a volume specified, it should be played at full volume, not at the last set volume like I had assumed. Fixing that brought things significantly closer to what I was expecting. Even though the song I was testing with didn’t use it, I also added the volume-column panning code, since it was just another conditional and a variable assignment, so it wasn’t difficult at all.

With the basics down and a framework to build on, it was time to start adding the different effect commands. These are where all of my real headaches came from, especially the pitch slide effect. Some of the effects, however, were ridiculously easy, such as adjusting the song’s speed and tempo, and pattern and row breaks. Some were more complicated but not too bad, since once set they could be treated like any other note, such as the sample start offset (since there apparently isn’t a way to set the sound buffer’s playback position in the Web Audio API, I had to essentially create a new buffer for these notes with the appropriate bits cut off, the loops readjusted, and all that mess). Then there were the tick-based effects. These required a solid restructuring of my code, because many of them meant keeping track of information per channel and updating per tick instead of per row like I had been. This was the point where I factored the channel state management code out into the ITChannel class, which exposes a number of methods to manage the state of the currently playing note. Volume slides were easy and the first thing I added, though the Schism wiki described the effect oddly, which threw me off until I found the OpenMPT wiki (which also described all the other effects much more clearly).
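As an example of the per-tick shape these took, the volume slide handling on a channel works out to something like this sketch (the names and the exact tick-zero behavior are glossed over here):

tick()
{
    if(this.volumeSlide != 0)
    {
        // Dxy: slide the channel volume up or down a little each tick,
        // clamped to IT's 0..64 volume range
        this.volume = Math.min(64, Math.max(0, this.volume + this.volumeSlide));
        this.gainNode.gain.value = this.volume / 64.0;
    }
}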

The biggest pain was the note pitch slide effect (tone portamento), typically labeled effect “G” in most tracker software. This is because the effect is based on the note’s playback period (i.e. sample rate) rather than a simpler pitch adjustment; the latter would have been easier because I store my note pitches as multipliers of the middle-C playback speed. It took me a long time to figure out a period-to-pitch-multiplier conversion that sounded mostly right. The code I ended up using, after a long string of ear-murdering failures:

updateRate()
{
    // Reconstruct the note's "period" from its middle-C rate, semitone, and octave,
    // following what the Schism and OpenMPT sources do
    var notePeriod = (this.sample.c5rate * 44100.0) * (Math.pow(2, ((this.note % 12) / 192)) << Math.trunc(this.note / 12));

    // Scale the note's base playback rate by how far the slide has moved the period
    var newRate = ITNoteRate[this.note] * ((notePeriod + this.period) / notePeriod);
    this.sample.adjustRate(this.sampleNode, newRate);
}

As you can see, that sample rate magic number 44100 makes an appearance again. The way this works: it figures out what the note’s period should be (based on the code of the Schism and OpenMPT trackers), scales the adjusted period by that value to get a playback rate scale, then multiplies that by the note’s actual playback rate. All of the “current period” math is handled in the tick function and is basically just “if the playing note is higher than the final note, increase the period by the effect speed (because a higher period means a longer wavelength, which means a lower tone), otherwise decrease the period by the effect speed.” Admittedly, I have some weird sign handling going on in the code that could be clearer, but the math works out (the this.period adjustment value ends up the opposite of what I described: positive to increase the pitch and negative to decrease it. Fixing it is just a matter of swapping signs and subtracting instead of adding).
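In sketch form, the per-tick side looks roughly like this (written with the sign convention as described above rather than the flipped one my actual code uses; the field names are illustrative):

tickPortamento()
{
    // Slide the current period toward the target note by the effect speed
    if(this.playingNoteIsHigherThanTarget)
        this.period += this.portamentoSpeed;  // higher period = longer wavelength = lower tone
    else
        this.period -= this.portamentoSpeed;

    this.updateRate();
}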

From here, adding vibrato was actually very easy: just add the scaled vibrato sine wave to the sample’s playback rate inside the updateRate function above, before passing it to the sample.adjustRate method. All of this gets recalculated on a per-tick basis, so that needs to be taken into account.
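As a sketch (ignoring the exact IT waveform table and depth/speed scaling), that addition amounts to something like:

// Inside updateRate(), before handing the rate to the sample
// (vibratoPosition advances each tick; vibratoScale folds in the effect's depth)
var vibratoOffset = Math.sin(this.vibratoPosition) * this.vibratoScale;
this.sample.adjustRate(this.sampleNode, newRate + vibratoOffset);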

A few more effects used by other songs in the game later, and I was confident my Impulse Tracker player was done. The songs were playing 99% correctly, and doing so fairly smoothly… as long as it was the only tab open. Lag spikes were definitely noticeable, though the player handled them gracefully; it sounded more like an unconfident band playing the last row until it could catch up. The problem was simply the resolution of setTimeout combined with the single-threaded nature of browser JavaScript. I needed something better, but wasn’t sure what to do. I knew the Web Audio API had a means of scheduling the start and stop of audio buffer nodes, but the effects meant that simply starting and stopping notes wouldn’t be enough. The actual solution I stumbled across by accident: it turns out the various parameters, such as the buffer playback rate, gain and pan node values, and other things, can all have scheduled adjustments as well.
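Concretely, anything exposed as an AudioParam can be told when to change its value, and buffer sources can be told when to start and stop. For example (using a gain node and a StereoPannerNode as the pan node):

// Schedule changes ahead of time instead of applying them "right now"
node.start(noteStartTime);                             // start the sample at an exact time
node.playbackRate.setValueAtTime(newRate, tickTime);   // pitch change on a future tick
gainNode.gain.setValueAtTime(volume / 64.0, tickTime); // volume change on a future tick
panNode.pan.setValueAtTime(panValue, tickTime);        // pan change on a future tick
node.stop(noteCutTime);                                // note cut at an exact time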

Finding this out made me incredibly happy! I could schedule an entire row ahead of time, and instead of hoping the browser could call my callback every 20ms, I could let it chill a bit and only need a call every 120ms (or 6 × 20ms, which is the typical speed × tick length of the various AoW songs). A jitter of 2ms was now only 1/60th of the time period instead of 1/10th under the previous method, and on top of that there were only a sixth as many timeouts for that jitter to affect in the first place.

At first I was afraid I would have to completely rewrite my system to take advantage of this new paradigm, but (always read the documentation fully!) it turned out I wouldn’t really have to change much at all. I adjusted the autotick method on the ITSong class to be an autorow method instead, and had it run through all the ticks for that row ahead of time, passing each one the scheduled time at which the tick should occur. From there, all I had to do was add an extra parameter to my various helper functions to take that scheduled time and pass it along to the Web Audio API, and it was done. The whole thing was drastically smoother, even with several heavy tabs open, including video streaming.
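In sketch form (names are illustrative, and this glosses over how the row bookkeeping interacts with pattern breaks), the reworked loop looks something like:

autoRow()
{
    var tickLength = 2.5 / this.tempo;   // seconds per tick
    var rowStart = this.nextRowTime;     // tracked in AudioContext time

    // Schedule every tick of this row up front; each tick call passes its
    // scheduled time through to start/stop/setValueAtTime on the audio nodes
    for(var t = 0; t < this.speed; t++)
        this.tick(rowStart + t * tickLength);

    this.nextRowTime = rowStart + this.speed * tickLength;

    // Only wake up once per row, a little early so there's headroom to schedule the next one
    var msUntilNextRow = (this.nextRowTime - this.audioContext.currentTime) * 1000;
    this.timerId = setTimeout(() => this.autoRow(), Math.max(0, msUntilNextRow - 10));
}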

With all that working well, it was time to move on to the next part: The graphical side of the title screen! But that’s for the next article…
