Music Engine

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
by on (#28698)
Two days ago, I planned out a music engine that was very, very simple. It was so simple that I made the whole thing yesterday. However, upon looking at it, I realized that it was quite space-consuming. It also didn't allow for things like vibrato without taking up a million bytes. The B flat scale consumed about 33 bytes for the triangle wave's part. To me, that sounds like a HUGE waste of space. The notes are also all the same length (making the lowest and highest notes twice as long would have added about 6 bytes). So I'm going to kind of do away with that idea.

So I want to make a new music engine where the music data takes up less space, and that allows for things like vibrato and sweeps. I see that 33 bytes for a B flat scale is really big for such a small thing, but I don't really have a good idea of how big a song should be. So about how many kilobytes (or bytes) would you say a one-minute song with lots of dynamics, looping over and over, takes up in a good music engine?

by on (#28704)
You could try making a .ned file in NerdTracker II to get a ballpark figure.

by on (#28717)
If you think you could do anything with the results, I'll rip an NSF into a few individual songs for you to compare the data against. I don't know how helpful it would be, but just in case...

by on (#28718)
I already wrote a sound engine that allows a lot of effects, including vibrato and sweeps; it only lacked better envelope control, a few optimizations here and there, and arpeggios. I already sent the source to you back then, so you should have an idea of how to store music while keeping the byte count as low as possible.

I also have a simpler and more space-efficient music engine than the one I sent you, but it doesn't allow vibrato, only allows hardware sweeps (which can lose precision compared to software ones), and only has relatively basic volume control. However, this simpler music player takes very little space in the ROM; I'm using it on my current project, where I want to avoid any PRG bankswitching. If you prefer this version, I could give you its source and documentation.

by on (#28723)
ugetab wrote:
If you think you could do anything with the results, I'll rip an NSF into a few individual songs for you to compare the data against. I don't know how helpful it would be, but just in case...


That would be really really helpful. I would really appreciate it :) .

So Bregalad, I looked at your sound code, and the data for the channels looks pretty big, but maybe that's just because I don't have a good idea of how big music should be. But both of the songs have tons of dynamics, so it looks like your engine would take up way less space than mine. Does music generally take up a lot of space?

by on (#28725)
Sounds like you'd need to track down a multi-track NSF that was written with the system you want to use.

It's not much help to know how much space 1 system takes up when you're not planning to use that system. A lot of the oldest NSFs have been cut down to 3KB, with the header, the code, and the music itself in the file. Other games, like Maniac Mansion, are at the other end of the spectrum taking up 100+ KB of space total.

Pick something to work with, then figure it out using a couple of examples.

by on (#28726)
The largest NSF I found is Dragon Warrior 4, which is 104 KB (though the dump may contain data that isn't vital to the sound code).

I don't think you can do much better than one byte per note: one nybble for the length and one nybble for the note itself, plus a few bytes to give the sound code some information. With my system you can repeat a bunch of data, and have two levels of nesting (so a track can "call" a subtrack just like a program calls a subroutine). I don't know how helpful this really is, but it can save space here and there. Finally, you may want to apply a general-purpose compression algorithm to the whole song to reduce the size even more (but you'd have to decompress the whole song into RAM before playing it, so forget about it if you don't have SRAM, or else you'll only be able to play very small songs). Any scale should be 12 bytes wide (not 33 as you mentioned), regardless of whether you play the first and last notes longer.
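As a rough illustration of the one-byte-per-note layout, here's a minimal Python sketch. The 16-entry length table and the choice of putting the length in the high nybble are assumptions for illustration only. A 4-bit note covers just 16 pitches, so a real engine would also need a command to set the octave or base note.

```python
# Hypothetical table mapping a 4-bit length index to a duration in frames.
LENGTHS = [1, 2, 3, 4, 6, 8, 12, 16, 24, 32, 48, 64, 96, 128, 192, 255]

def pack_note(length_index, note):
    """Pack a length index (high nybble) and a note number (low nybble) into one byte."""
    if not (0 <= length_index < 16 and 0 <= note < 16):
        raise ValueError("both fields must fit in 4 bits")
    return (length_index << 4) | note

def unpack_note(byte):
    """Return (duration in frames, note number relative to the current base note)."""
    return LENGTHS[byte >> 4], byte & 0x0F

# Bb scale up and down, as semitones above a base note of Bb.
scale = [0, 2, 4, 5, 7, 9, 11, 12, 11, 9, 7, 5, 4, 2, 0]
# Index 5 (8 frames) for the lowest and highest notes, index 3 (4 frames) otherwise.
lengths = [5, 3, 3, 3, 3, 3, 3, 5, 3, 3, 3, 3, 3, 3, 5]
song = bytes(pack_note(length, note) for length, note in zip(lengths, scale))
print(len(song))  # 15: one byte per note; the longer notes cost nothing extra
```

Because the length travels with every note, making the first, middle, and last notes longer doesn't add any bytes.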

Many other formats use two bytes per note (which is bad, I think) or a pattern system, where the song just changes instruments and calls patterns which only contain notes (which is good, but maybe not always optimal and not always flexible). NT2 uses the latter, while MCK uses the former (thus wasting a really big amount of ROM per song).
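A minimal Python sketch of why a pattern system saves space: each pattern is stored once, and an order list plays them in sequence. This only shows the general idea, not NT2's actual file format; the pattern contents and the one-byte-per-note, one-byte-per-order-entry cost model are assumptions.

```python
# Hypothetical patterns: each is a list of (note, length) pairs.
PATTERNS = {
    "A": [("Bb3", 4), ("D4", 4), ("F4", 8)],
    "B": [("Eb4", 4), ("G4", 4), ("Bb4", 8)],
}
# The order list says which pattern plays next; each repeat costs one entry.
ORDER = ["A", "A", "B", "A", "A", "B"]

def expand(order, patterns):
    """Flatten the order list into the note stream the player actually plays."""
    stream = []
    for name in order:
        stream.extend(patterns[name])
    return stream

# Assume one byte per note and one byte per order entry (terminators ignored).
flat_size = len(expand(ORDER, PATTERNS))
stored_size = sum(len(p) for p in PATTERNS.values()) + len(ORDER)
print(flat_size, stored_size)  # 18 12
```

The saving grows with the number of repeats, and shrinks if the song rarely repeats a phrase exactly.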

by on (#28729)
Okay, so I'm going to try to do something like Nerd Tracker where you have sections that you can play whenever you want. So I won't waste space if I'm playing the same thing over again. I'll also be able to use bit 7 of the note to check for dynamic changes and stuff. I'll have to think some more. You're right Bregalad, you really can't get better than a byte per note. That I'll keep in mind.

by on (#28742)
See these documents:
If I remember right, NT2 header is about 52 bytes, NT2 .dat is a 16 byte instrument header, 2 bytes per order table entry, and 3 bytes per distinct pattern (offset + length), NT2 order table data uses 4 bytes per row.
NT2 pattern data uses 4 bits per row, 8 bits per note, 4 bits per instrument, 4 bits per effect with parameter 00, and 12 bits per effect with positive parameter.
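To turn those NT2 figures into a size estimate, here's a small Python sketch. It rests on one reading of the numbers above (every row costs 4 bits, plus 8 for a note, 4 for an instrument, and 4 or 12 for an effect depending on whether its parameter is 00); the function names are hypothetical.

```python
import math

def row_bits(has_note, has_instrument, effect_param=None):
    """Bits for one pattern row; effect_param is None when the row has no effect."""
    bits = 4                      # per-row overhead
    if has_note:
        bits += 8
    if has_instrument:
        bits += 4
    if effect_param is not None:
        bits += 4 if effect_param == 0 else 12
    return bits

def pattern_bytes(rows):
    """Total size of a pattern, rounded up to whole bytes."""
    return math.ceil(sum(row_bits(*row) for row in rows) / 8)

# 64-row pattern with a note on every 4th row and nothing else.
rows = [(i % 4 == 0, False) for i in range(64)]
print(pattern_bytes(rows))  # 48
```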

In my own files:

Covers vol. 1: 3 KiB of engine, 24 KiB of 15 songs
Opentris: 3 KiB of engine, 18 KiB of 11 songs

by on (#28743)
You can get better than one byte per note, for most notes anyways. Once you set the initial note, you can set the following notes relative to the initial one with 4 bits. I don't think you need to set the length for every single note.

However, I wouldn't really sweat it too much when it comes to writing a sound engine. FamiTracker and NerdTracker both get by pretty well, unless you have some other specific need of your own.

by on (#28744)
For one thing, you'll need some sort of hook for sound effects. Do the latest Famitracker and NT2 replay code allow for doing this easily?

by on (#28745)
I was thinking about that actually. For instance, could NT2 easily mimic the sound effect for the breaking holy water glass on CV2? It's done with the square wave, and I have to say, it's quite impressive. I don't see it being done with NT2 or FamiTracker, however. I'll be making a game that will have many sound effects like that, so if I were to use NT2, I'd have to handle sound effects myself.

by on (#28746)
tepples wrote:
For one thing, you'll need some sort of hook for sound effects. Do the latest Famitracker and NT2 replay code allow for doing this easily?


Unfortunately no (AFAIK; I haven't worked with the latest FamiTracker engine).

What works, though, is to change all sound register writes in the music engine so they write to a buffer in RAM; then you can easily interrupt any channel for a sound effect.
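The buffered-register approach can be sketched in Python. The register ranges are the NES APU's ($4000-$4003 pulse 1, $4004-$4007 pulse 2, $4008-$400B triangle, $400C-$400F noise); the class and method names are hypothetical, and a dict stands in for the real hardware. The music engine keeps writing its shadow registers even while an effect owns a channel, so the channel resumes in the right state when the effect ends.

```python
# Channel -> the APU register addresses it owns.
CHANNEL_REGS = {
    "pulse1":   range(0x4000, 0x4004),
    "pulse2":   range(0x4004, 0x4008),
    "triangle": range(0x4008, 0x400C),
    "noise":    range(0x400C, 0x4010),
}

class SoundMixer:
    def __init__(self):
        self.music = {}          # shadow registers written by the music engine
        self.sfx = {}            # shadow registers written by sound effects
        self.sfx_active = set()  # channels currently taken over by a sound effect
        self.apu = {}            # stands in for the real APU registers

    def music_write(self, addr, value):
        # The music keeps running even while an effect owns the channel.
        self.music[addr] = value

    def sfx_write(self, channel, addr, value):
        self.sfx_active.add(channel)
        self.sfx[addr] = value

    def sfx_end(self, channel):
        self.sfx_active.discard(channel)
        for addr in CHANNEL_REGS[channel]:
            self.sfx.pop(addr, None)

    def flush(self):
        """Once per frame: copy each channel's shadow registers to the APU.

        Registers the active source has never written keep their old value.
        """
        for channel, regs in CHANNEL_REGS.items():
            source = self.sfx if channel in self.sfx_active else self.music
            for addr in regs:
                if addr in source:
                    self.apu[addr] = source[addr]

mixer = SoundMixer()
mixer.music_write(0x4000, 0xBF)          # music sets pulse 1 duty/volume
mixer.sfx_write("pulse1", 0x4000, 0x3F)  # a sound effect takes over pulse 1
mixer.flush()
print(hex(mixer.apu[0x4000]))            # 0x3f: the effect wins
mixer.sfx_end("pulse1")
mixer.flush()
print(hex(mixer.apu[0x4000]))            # 0xbf: the music comes back
```

On real hardware, writing the pulse channels' high period registers ($4003/$4007) restarts the duty sequence, so an engine would usually skip that write when the value hasn't changed; this sketch doesn't model that.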

by on (#28763)
The holy water sound effect in Castlevania II is exactly the same as in Castlevania I and Castlevania III. It uses both square and noise; the square just plays a few random, very high notes.
It's true that Konami's sound effects often rock, due to their use of hardware sweeps on the square channels. Capcom sound effects are often great too.
I once reverse-engineered the sound code of Dragon Quest I (it was incredibly simple) and Just Breed (this one was really insanely complicated). I think I lost my notes about them, but I'm not completely sure. They also used the note/command approach.

by on (#28779)
I don't think I'll ever use NT2. It seems to take up a lot of RAM. I was thinking about compressing note lengths so I could use 4 bits per length instead of a whole byte. In the bytes that define notes, I only have bit 7 open (there are fewer than 128 notes in the table), but I can still use that to indicate dynamic changes. I want to avoid defining things more than once as much as I can. I want to be able to make the Bb scale take up VERY little over 15 bytes (the Bb scale consists of 15 note changes). I'll come back once I have everything planned out.
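One way the bit-7 idea could look, as a Python sketch. The layout is a hypothetical choice: bits 0-6 hold the note number, and a set bit 7 means the next byte carries a new volume in its low nybble. A dynamic change then costs one extra byte only on the notes where it actually happens.

```python
def parse_stream(data):
    """Return (note, new_volume) pairs; new_volume is None when unchanged.

    Hypothetical layout: bits 0-6 hold the note number; if bit 7 is set,
    the following byte carries the new volume in its low nybble.
    """
    events, i = [], 0
    while i < len(data):
        byte = data[i]
        note, volume = byte & 0x7F, None
        if byte & 0x80:
            volume = data[i + 1] & 0x0F
            i += 2
        else:
            i += 1
        events.append((note, volume))
    return events

# Note 10 with a volume change to 12, then notes 12 and 14 at the same volume.
print(parse_stream(bytes([0x80 | 10, 0x0C, 12, 14])))
# [(10, 12), (12, None), (14, None)]
```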

by on (#28782)
Regarding using a small number of bits to encode things, music playback is good for variable-length compression schemes, since you don't need much random access and the encoding doesn't have to be super-efficient (since much of the time is spent actually applying commands). The only random access is for groups of commands, and these can be separate blocks where the decoding state is reinitialized when starting one. So for example Memblers mentioned encoding notes as only 4 bits. Much of the time the next note is near the current one. When it's farther than +/-7 semitones, you can encode it as -8 along with a second byte indicating the absolute note. Similar could be done for note length, encoding the common values more compactly.
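The relative-note scheme can be sketched in Python: each note is a signed 4-bit delta from the previous one (-7 to +7), and the nybble for -8 is an escape followed by the absolute note in two more nybbles. The note numbers are MIDI-style (46 = Bb2), the starting note is assumed to be set by a separate command, and the decoder is assumed to be told how many notes to read (so a padding nybble at the end is harmless).

```python
ESCAPE = 0x8  # the nybble for -8 in 4-bit two's complement

def encode(notes, start):
    """Encode absolute note numbers (0-255) as 4-bit deltas from the previous note."""
    nybbles, current = [], start
    for note in notes:
        delta = note - current
        if -7 <= delta <= 7:
            nybbles.append(delta & 0x0F)
        else:
            nybbles += [ESCAPE, note >> 4, note & 0x0F]  # escape + absolute note
        current = note
    if len(nybbles) % 2:
        nybbles.append(0)  # pad to a whole byte
    return bytes((hi << 4) | lo for hi, lo in zip(nybbles[0::2], nybbles[1::2]))

def decode(data, count, start):
    """Read back `count` notes from the packed nybble stream."""
    nybbles = [n for b in data for n in (b >> 4, b & 0x0F)]
    notes, current, i = [], start, 0
    while len(notes) < count:
        n = nybbles[i]
        if n == ESCAPE:
            current = (nybbles[i + 1] << 4) | nybbles[i + 2]
            i += 3
        else:
            current += n - 16 if n > 7 else n  # sign-extend the 4-bit delta
            i += 1
        notes.append(current)
    return notes

scale = [46, 48, 50, 51, 53, 55, 57, 58, 57, 55, 53, 51, 50, 48, 46]  # Bb, up and down
packed = encode(scale, start=46)
print(len(packed))  # 8 bytes for 15 notes
assert decode(packed, len(scale), start=46) == scale
```

A jump of more than 7 semitones costs one and a half bytes instead of half a byte, which pays off as long as most steps are small.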

by on (#28784)
Here's a scheme that I just came up with:
Code:
76543210  Musical phrase bytecode values $00-$BF
||||||||
|||+++++- 0-29: Offset in semitones from current base note
|||       30: Don't play a note
|||       31: Release current note
+++------ Wait time after note
          000: 1 row; 001: 2 rows; 010: 3 rows; 011: 4 rows;
          100: 6 rows; 101: 8 rows; 11x: escape for other commands

"Other commands" ($C0-$FF) might include ending a pattern, changing instrument, setting the base note (e.g. octave changes), setting vibrato depth, turning retrigger per row (mandolin tremolo) on or off, etc.
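A decoder for the phrase bytecode above is short. This Python sketch follows the bit layout as written; the function name and the tuple it returns are illustrative choices, and the "other commands" are left undecoded since they aren't specified.

```python
# Wait times for bits 7-5 = 000 through 101; 11x is the command escape.
WAIT_ROWS = [1, 2, 3, 4, 6, 8]

def decode_phrase_byte(byte, base_note):
    """Decode one phrase byte into (kind, note, wait_rows)."""
    if byte >= 0xC0:
        return ("command", None, None)  # $C0-$FF: other commands, not decoded here
    wait = WAIT_ROWS[byte >> 5]
    offset = byte & 0x1F
    if offset == 30:
        return ("no_note", None, wait)
    if offset == 31:
        return ("release", None, wait)
    return ("note", base_note + offset, wait)

print(decode_phrase_byte(0xA5, 46))  # ('note', 51, 8)
print(decode_phrase_byte(0x3E, 46))  # ('no_note', None, 2)
```

With a base note of Bb, every note of the Bb scale up and down is within 0-12 semitones, so the whole scale fits in 15 bytes without any octave commands.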

by on (#28806)
I just had an idea to slightly reduce the note encoding. When a real musician plays a tune, he's told which scale he should play it in, and which flats/sharps this implies. Then he plays notes in that scale until he's specifically told otherwise, with additional flat/sharp symbols before notes. The computer would do the same thing: you'd tell it "play in the Bb scale", and then 0 automatically becomes Bb, 1 becomes C, 2 becomes D, then Eb, F, G, and A. That gives seven possible values for the note tone, and since the scale matches the melody, fewer octave change commands would be needed. If the melody should play a note other than the ones above, a special command could tell the computer so. In the same way, you could have a few programmable lengths for the channel, and when a note isn't one of the ~4 most common note lengths of the track, a special command could tell it to play a note of another length.
This could be more efficient (6 bits per note), but what could you do with the 2 bits left? Of course, you could have a system where 4 notes fit in 3 bytes (4*6 = 3*8 = 24 bits), but this would be somewhat impractical to encode and decode. The best thing would be to define rests and note prolongation, as well as other commands, with those 2 bits left, but that's still 8 bits per note, only a different (and more annoying) use of them. I really cannot think of anything with 4 bits per note, unless you always give the program special commands to play nonstandard notes, which would in fact probably increase the size of the final file.
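The key-signature idea amounts to mapping a scale degree to a semitone. A minimal Python sketch, assuming a major key and MIDI-style note numbers (46 = Bb2); the `accidental` parameter is a hypothetical stand-in for the special command for notes outside the key.

```python
# Semitones above the tonic for each degree of a major key (0 = tonic).
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]

def degree_to_note(degree, tonic, accidental=0):
    """Turn a scale degree into a note number; degree 7 is the tonic an octave up.

    accidental (-1, 0 or +1) stands in for the command for notes outside the key.
    """
    octave, step = divmod(degree, 7)
    return tonic + 12 * octave + MAJOR_STEPS[step] + accidental

BB2 = 46  # MIDI-style note number for Bb (assumption)
print([degree_to_note(d, BB2) for d in range(8)])
# [46, 48, 50, 51, 53, 55, 57, 58]  (Bb C D Eb F G A Bb)
print(degree_to_note(3, BB2, accidental=+1))  # 52, an E natural outside the key
```

Since degrees wrap into octaves automatically, a melody that stays in the key needs only the small degree numbers, which is where the bit savings come from.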

by on (#28810)
Oh wow, are you suggesting having a key signature for your music code, so one bit could represent a note's value? If not, that's what I'm suggesting, because that would be awesome. So you could have this:

#$FF

Since all 8 bits are set, it would play notes 0 - 7, or Bb to Bb. The problem is you'd need to have a control byte that says what dynamic changes are needed, and you'd also have to switch octaves and stuff. But, you could use bit 7 to check for dynamic changes, and just forget about the high note of the scale, since it's the same as the lower one, just an octave higher.

by on (#28816)
Before this gets too wacky: Kolmogorov complexity

It might be wise to build one to throw away. Just do it, and if you can reproduce my Covers vol. 1 in half the space, good for you.

by on (#28828)
All this talk reminds me of things I was thinking of when I was writing my own music engine (about 4 years ago maybe? BTW I never really used it, so it only ever supported the first pulse channel.)

It was set up so all the music tracks were relative notes, but the actual pattern of the song is normal notes (which triggers the relative notes). It lends itself to transposing short phrases (even 2 notes) or larger parts of a song. More music for less space, but it depends entirely on the song.
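The relative-phrase idea can be sketched in Python: a phrase is stored once as offsets, and the song's pattern just says which phrase to trigger on which note. The phrase names, contents, and MIDI-style note numbers are hypothetical; this isn't the engine described above, only the general technique.

```python
# Hypothetical phrase library: each phrase is stored once, as semitone offsets
# from whatever note triggers it.
PHRASES = {
    "turn": [0, 2, 0, -1, 0],
    "leap": [0, 7, 12],
}

def play_pattern(pattern):
    """Expand (phrase name, trigger note) pairs into absolute note numbers."""
    notes = []
    for name, base in pattern:
        notes.extend(base + offset for offset in PHRASES[name])
    return notes

# The same "turn" phrase played on Bb and then on F.
print(play_pattern([("turn", 46), ("turn", 53)]))
# [46, 48, 46, 45, 46, 53, 55, 53, 52, 53]
```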

Other main features I wanted:
1. Automatic echo (just echoing a channel in another, but with configurable priority thresholds so you can play on top of the echo).
2. Multiple channel instruments (configurable priority for each channel).

1. is annoying to do in MML, 2. is annoying to do in trackers! As we all know here, it's kind of a case where if I want it, I have to make it myself. But I get by just fine using trackers, so I never really used my engine for anything.
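A rough Python sketch of the automatic echo feature: one channel plays a delayed, half-volume copy of another, but its own notes win when they're loud enough. The per-frame (note, volume) representation, the halving, and the "louder one wins below the threshold" rule are all assumptions for illustration, not the engine's actual behavior.

```python
def echo_channel(source, own, delay, threshold):
    """Mix a delayed, half-volume copy of `source` into a channel with its own part.

    source, own: one (note, volume) pair per frame, volumes 0-15.
    delay: how many frames the echo lags behind the source.
    threshold: the channel's own note always wins at or above this volume;
    below it, whichever of the own note and the echo is louder plays.
    """
    out = []
    for frame, (own_note, own_vol) in enumerate(own):
        if own_vol >= threshold or frame < delay:
            out.append((own_note, own_vol))
            continue
        echo_note, echo_vol = source[frame - delay]
        echo_vol >>= 1  # the echo plays at half volume
        out.append((echo_note, echo_vol) if echo_vol > own_vol else (own_note, own_vol))
    return out

source = [(46, 15), (46, 15), (48, 8), (0, 0)]
own = [(0, 0), (0, 0), (0, 0), (60, 12)]
print(echo_channel(source, own, delay=1, threshold=10))
# [(0, 0), (46, 7), (46, 7), (60, 12)]
```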

by on (#28829)
But managing 8 virtual channels and mapping those onto the 4 built-in channels might take a lot of the 2 KB of RAM that could be better used for gameplay.

by on (#28849)
Definitely.
I also once thought about automatic echo (but on a single channel). When a note is played, even if the user defines the maximum volume, the actual volume would never go above 8 or so; then, if the key is still held after a while, the volume would suddenly increase to 12, then 14, and finally 15 to simulate echo. When the key is released, the volume would go something like 15 -> 4 -> 2 -> 1 -> 0. That assumes a constant volume, but if an envelope is added (such as a fast decay), you would hear lighter decays after the main one, etc. This would require playing the channel many times internally, and having a mixer that takes the volume of each delayed copy of the channel, takes the strongest one, and adds its volume to all the others that are playing the same note (each one shifted right the number of times it should be). However, this is insane and probably wouldn't sound that good. It would also waste an ENORMOUS amount of RAM.

by on (#28851)
Plus, audio volumes don't really add that way. Instead, they add in root-mean-square space: 11 and 11 combine to sqrt(11^2 + 11^2) ≈ 15.56.

by on (#28852)
Does that mean that volume 5 is equivalent to 3 + 4? That would make it a lot louder than 4, then. I thought the volume would be logarithmic instead (but I'm unsure how to compute this).

by on (#28854)
NES, Game Boy, GBA, and DS audio use a nearly linear function from volume to amplitude. It's SMS/Game Gear that uses an exponential function.

A signal with amplitude 4 and a signal with a different frequency and amplitude 3 will sound as loud as a single signal of amplitude 5. This is easier to see with uncorrelated signals such as two sources of noise than with periodic signals: if the frequencies are close (e.g. C and C#), you'll get beating as constructive interference goes to 7 and then destructive interference goes to 1, but it'll still sound like 5 overall. Adding 4+3 to get 7 assumes that the phases are always set up for constructive interference, which is not true of actual echo.
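The root-mean-square rule can be checked numerically with a short Python sketch: combine the amplitudes directly, then confirm it with one second of two sine waves at different frequencies (440 and 466 Hz, chosen arbitrarily) mixed together.

```python
import math

def rms_sum(*amplitudes):
    """Loudness of uncorrelated signals combined: square root of the sum of squares."""
    return math.sqrt(sum(a * a for a in amplitudes))

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

print(rms_sum(4, 3))              # 5.0
print(round(rms_sum(11, 11), 2))  # 15.56

# One second of two sines with amplitudes 4 and 3 at different frequencies.
RATE = 48000
mix = [4 * math.sin(2 * math.pi * 440 * t / RATE) + 3 * math.sin(2 * math.pi * 466 * t / RATE)
       for t in range(RATE)]
# A sine of peak amplitude A has RMS A / sqrt(2), so convert back to a peak amplitude.
print(round(rms(mix) * math.sqrt(2), 3))  # 5.0
```

The 4 + 3 = 7 case only happens at the instants when both waves peak together; averaged over time, the mix has the loudness of a single amplitude-5 signal.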

by on (#28862)
Oh yeah, those are two different things. The fact that the echoed sound isn't in phase with the original one should be right. This doesn't prove that they are 90° out of phase, though, but I guess that's the best approximation one can make.

What I was talking about is that the NES volume control is linear, but human perception of loudness is compressive, roughly logarithmic (if you hear a song that is 4 times louder, you will think it's 2 times louder). That means volume 8 seems barely less than volume 15, but volume 4 seems a lot less than volume 8 in comparison. (By the way, the SNES has linear volume control too, but optional exponential envelope control.)