Well, it's time for another one of my design idea ramblings. I'm not really expecting many to read all this, but taking the time to write it up has proved helpful with some of my projects as of late. As one may have noticed from some of my recent postings, I've taken on the goal of replicating some common Famicom audio expansion synths with low cost cartridge hardware. This is something I've wanted to do since getting involved with nesdev, but only now am I getting to the point where I can finally realize the idea, having overcome the hurdles below:
Console Support
This is one of the biggest hurdles IMO, because the average NES player uses a front loading NES system lacking built in cartridge audio expansion. Requiring users to modify their consoles by soldering can be considered a deal breaker. I recently came up with and teased a solderless audio expansion dongle for front loading NESs. The prototype was successful and most parts are on hand to support manufacturing and release in the upcoming months.
Armed with this, the biggest thing lacking is support for top loader consoles. Not much can be done about that with a solderless solution; routing audio to EXP9 as Bregalad did is the only option I see. To help support that, my designs output audio to both EXP6 & EXP9 or have a jumper to connect the two.
There's no hope for cheap clones, and IMO any player who values audio quality wouldn't use one of these due to the common duty cycle error and other incompatibilities. I can only hope that high end clones such as the AnalogueNT and AVS were designed with audio expansion support in mind. I haven't been able to confirm that they were; I assume they at least support it via their famicom cartridge connector.
Cartridge Hardware Cost
To date, the common means to replicate traditional audio expansions is with high density expensive FPGAs. This is a non-option for a homebrew game seeking cartridge release. I have some crazy newfangled projects I'm working on that would bring enough logic to the table on a manageable budget; but the fact remains most homebrew targets discrete mappers for good reasons. There are drawbacks of course, but cost of entry is significantly lowered when utilizing mcu based solutions.
The amount of mcu horsepower that can be purchased for under $1 has significantly improved thanks to the low cost of ARM mcus. It can still be a challenge to push the cost down, with a stand alone resistor based DAC running ~$1 itself. The added cost of a separate mcu and DAC effectively makes the only sensible choice an mcu with a built in DAC. Thankfully there are several cost effective mcus with built in resistor based DACs for close to $1 in qty. That gets things on par with the cost of an ASIC mapper such as MMC1/MMC3/FME7, but the fact remains that the majority of homebrew targets discrete mappers.
With desire to push the cost even further I recently opened my mind to the viability of PWM based DACs. Armed with a single resistor and cap any mcu's PWM timer can generate reasonable quality audio. I learned most of what I know of PWM DACs from open music labs, and this project is legitimate enough proof of concept to keep myself from rejecting the idea despite the ugliness of PWM noise.
I've even gone so far as to push things to near free BOM cost by integrating the synth with a multitasked lockout chip, the "CICOprocessor". So at this point I'm mostly at the why not stage, as I see a path to permit publishing of homebrew games with audio expansion at minimal added cost. I have a couple other designs in the works with more capable hardware at their disposal as well. This discussion is not intended to be focused on any specific board, mcu, DAC, etc; that's why I've separated it into a dedicated thread. I am however using the CICOprocessor/STM8 as my starting point. My thought is that if this general synth design is realizable on what is arguably one of the lowest cost 8bit mcus on the market, adapting it to more capable hardware won't be a problem. That, and I got to the point with my CICOprocessor synth where I needed a better sense of the actual compute time the synth requires to know what's possible. Load, add, multiply, and store all take a roughly comparable number of cycles regardless of the selected mcu, so decisions that help one mcu will likely help whichever mcu is chosen; the same is not the case when comparing to a programmable logic implementation.
I realize there are trade offs with this lower cost mcu approach that are unlikely to be acceptable to audiophiles. Arguably satisfying audiophiles can't be done on a budget; they're typically only satisfied with shelling out top dollar for originals or new old stock. I wish them happiness, but in the end the success of lower cost solutions will only help pave the way for higher fidelity future projects. I recognize the battle with aliasing will forever be present. Admittedly I'm also not experienced in audio synthesis; I'm just now starting to feel like I'm getting past noob level. But I've learned quite a bit from several members here in the few discussions I've already had on the topic. I'm thirsty to learn, so don't be shy about telling me where my ideas are flawed or where there may be tricks to take advantage of.
Community & Tool Support
Aside from the actual hardware and firmware implementation, this is the final hurdle. It's safe to say I personally won't go as far as adding support for a new audio expansion to emulators or trackers. Tool support is pretty vital to the success of this project, so the only viable option I see is to bootstrap myself in by mimicking traditional "standard" synths which are already supported by current tools. Some ideas brought up by several members in the CICOprocessor thread have led me to an mcu synth design which can be configured to mimic VRC6 audio. The focus of this thread is the actual mcu implementation, which can be easily adapted to a chosen mcu. The interface details such as NES register structure, R2R/PWM DAC, synth sample frequency, or other mapper details are superfluous. My hope is that this mcu synth structure can be adapted, migrated, and utilized by other designs, including other people's hardware designs. Or perhaps some details such as sample frequency will be synth settings that can be modified in real time.
The underlying synth I've designed thus far replicates the VRC6, but actually utilizes wave tables to do so. I already have a rough idea of how it can be configured to support Sunsoft-5B with minimal additions, barring some questions in my mind on the detailed operation of the YM2149F. I've yet to fully wrap my head around the inner workings of namco-163, FDS, and MMC5; however my current basic understanding of those has me relatively confident they too can be supported with a few necessary additional features.
Being an mcu based solution, the overall goal is fast execution. While avoiding unnecessary computations helps, optimizing the design specifically for one synth detracts from the larger goal of being versatile/universal. At this point it's better to leave those synth specific optimizations as low hanging fruit for the implementation stage. In a way my idea here is more along the lines of creating a super set of the features required to replicate traditional synths. Not every feature has to be implemented in practice, depending on design constraints. With this approach, many of the underlying features are already present in the hardware should an 'advanced user' desire to start tinkering with all the advanced settings instead of only using the vanilla "VRC6" configuration. There are a few design cues I took from namco-163 that you might notice; I got a better picture of how an mcu friendly soft synth might look after gaining a better understanding of namco-163 audio.
Enough rambling, let's get to the design!
As mentioned, my initial goal is VRC6 support. The wiki does a good job of explaining its operation, so the discussion mostly assumes the reader has that entry knowledge. I'll do my best to explain how VRC6 register values translate to mcu variables to help explain operation. On a high level the mcu executes the synth code on a fixed periodic basis in an interrupt service routine (isr). This happens to be the PWM DAC overflow interrupt for me currently, but regardless of the DAC type some method of time keeping is needed so the synth code runs periodically. This gives the first necessary variable:
T_tune = T_isr / T_hardware_synth: This is the number of hardware synth cycles that occur for each soft synth isr. The STM8 runs @ 16MHz +/- 1%. Assuming an 8bit PWM DAC, the timer's top value is 255, so the soft synth isr occurs every 256 STM8 cycles (62.5nsec each) = 16usec. VRC6 runs @ 1.79MHz, T=558.7nsec. T_tune = 16usec / 558.7nsec = 28.64
This value is stored in a 16bit variable; I've chosen 12.4 fixed point to align with subsequent variables. This variable can be adjusted to tune the synth with a 1/16 step size; with T_isr of 16usec that gives 0.2% tuning steps.
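As a rough sketch of the arithmetic above, here is how T_tune could be computed in 12.4 fixed point. The helper name and the exact 1789773Hz figure (the usual NTSC CPU clock, standing in for the "1.79MHz" above) are assumptions for illustration, not part of the STM8 firmware:

```c
#include <stdint.h>

/* 12.4 fixed point: 12 integer bits, 4 fractional bits, so 1.0 == 16. */
#define FRAC_BITS 4

/* Hypothetical helper: hardware synth cycles per soft synth isr, in 12.4.
 * isr_cycles - mcu cycles between isr calls (256 for an 8bit PWM timer)
 * mcu_hz     - mcu clock (16MHz for the STM8 numbers above)
 * synth_hz   - clock of the synth being mimicked (assumed 1789773 for VRC6) */
static uint16_t t_tune_12_4(uint32_t isr_cycles, uint32_t mcu_hz, uint32_t synth_hz)
{
    /* T_isr / T_hardware_synth == isr_cycles * synth_hz / mcu_hz */
    uint64_t scaled = ((uint64_t)isr_cycles * synth_hz) << FRAC_BITS;
    /* adding half the divisor rounds to the nearest 1/16 */
    return (uint16_t)((scaled + mcu_hz / 2) / mcu_hz);
}
```

With the STM8 numbers this lands on 458, i.e. 458/16 = 28.625, and one step of 1/16 is about 0.2% of that value.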
Chan_count: This is the period counter for the channel, it gets reloaded to the current register value when it decrements past zero. The VRC6 has 3 channels @ 12bit, so we'll need 3 of these registers too. Aligning with T_tune above, I'm using 16bit variable with 12.4 fixed point. One thought I had for the lower fractional bits is to allow the main thread to add/subtract 'random' amounts to this register after each rollover to reduce aliasing. Not sure 4bits is enough to matter, but it's effectively all that there is to spare w/o 32bit mcu.
Chan_period: This is the current value stored in the synth's register which gets copied over at each rollover. For VRC6, these registers hold the actual value currently in $9001/2, $A001/2, $B001/2 registers. Must be at least 16bit, saves time to store as 12.4 fixed point so the 4bit shift only has to be performed once when the register is written to.
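A minimal sketch of the "shift once on write" idea, assuming the 12bit period is split across a low byte and the low nibble of a high byte as in the VRC6 period registers. The function name is hypothetical, and whether to fold in the divider's extra cycle (the wiki describes the hardware period as register value + 1) is left open here:

```c
#include <stdint.h>

/* Hypothetical register-write helper, meant to run in the main thread.
 * Combines the low byte and the low nibble of the high byte into a 12bit
 * period, then shifts into 12.4 fixed point so the isr never has to. */
static uint16_t period_to_12_4(uint8_t reg_lo, uint8_t reg_hi)
{
    /* upper nibble of reg_hi holds other flags (e.g. enable), mask it off */
    uint16_t period12 = (uint16_t)(((reg_hi & 0x0F) << 8) | reg_lo);
    return (uint16_t)(period12 << 4);
}
```

The largest 12bit period (0xFFF) becomes 0xFFF0, which still fits an unsigned 16bit variable.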
WAVE_TABLE: 64-256 entry wave table ram implemented as an array of bytes. The max value VRC6 needs to store in this table is 6, so in reality only 3bits are needed per entry. But condensing two nibble entries per byte complicates data processing. VRC6 also doesn't really need 64 entries, but this is a good round number to start off with. Using byte variables to point to the currently selected entry in this table makes expansion up to 256 entries easily feasible. STM8 has 1KB of SRAM; dedicating 25% of SRAM to wave tables is within reason.
Chan_ptr: This is the pointer to the current index in use by the channel in WAVE_TABLE above. Implemented as a single byte, which limits WAVE_TABLE to 256 entries.
Chan_start: This is the starting value of Chan_ptr, this value gets copied to Chan_ptr when it exceeds Chan_last below.
Chan_last: This is the max Chan_ptr value for the channel.
Chan_vol: single byte variable, this equates to the standard volume registers in the hardware synth. If there's need to adjust/shift the volume register for leveling purposes, doing so when written to saves on subsequent computations.
Chan_out: single byte variable, final calculated output of the channel for the current isr cycle.
DAC_out: doesn't necessarily count as a variable, this is simply the value written to the DAC to be output for the current isr.
So for VRC6, with the variables above, each channel requires 9 bytes x 3chan = 27Bytes. Plus 2Bytes for T_tune and a few other hardware register values that aren't accounted for consumes ~30Bytes + WAVE_TABLE ram. Some of these variables will benefit from being stored in STM8 zeropage, but doesn't matter much for most.
VRC6 of course doesn't need wave table ram, but if we initialize the WAVE_TABLE in a way that aligns with the VRC6, we can easily use it to replicate the function of the 2x square channels, and saw channel along with the duty cycle generator. My thought is to initialize the first 16 entries [0-15] to 0x01, and next 15 entries [16-30] to 0x00.
The Chan_start variable gets set based on the selected duty cycle, effectively Chan_start = 15 - D, and Chan_last = Chan_start + 15. So for a VRC6 duty setting of D=0 (6.25%), Chan_start = 15 and Chan_last = 30. The falling edge of the square in the WAVE_TABLE occurs between index 15 and 16, thus giving the proper 6.25% duty cycle in practice. Yes, the wave table is a bit overkill for a simple square, but it works and doesn't need to cost much for computation. Again it's overkill, but when VRC6 M is set for 100% duty, Chan_start simply gets set to 0 and Chan_last = 15. Just as the square duty cycle generator has 16 steps, the square channel steps through 16 entries of WAVE_TABLE per period.
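The square setup above can be sketched as follows. The function names are made up for illustration; the table layout and the Chan_start/Chan_last formulas are the ones described in the text:

```c
#include <stdint.h>

#define WAVE_TABLE_SIZE 64
static uint8_t WAVE_TABLE[WAVE_TABLE_SIZE];

/* Entries [0-15] high, [16-30] low, as described above. */
static void init_square_table(void)
{
    for (uint8_t i = 0; i <= 15; i++)  WAVE_TABLE[i] = 0x01;
    for (uint8_t i = 16; i <= 30; i++) WAVE_TABLE[i] = 0x00;
}

/* duty: VRC6 D field (0-7); mode: VRC6 M bit (nonzero = constant high).
 * Picks the 16 entry window of the table the channel will loop over. */
static void set_square_window(uint8_t duty, uint8_t mode,
                              uint8_t *start, uint8_t *last)
{
    *start = mode ? 0 : (uint8_t)(15 - (duty & 0x07));
    *last  = (uint8_t)(*start + 15);
}

/* Counts the high entries inside [start, last]; out of 16 steps this is
 * the duty cycle in sixteenths. Only here to check the window. */
static uint8_t high_steps(uint8_t start, uint8_t last)
{
    uint8_t n = 0;
    for (uint8_t i = start; i <= last; i++) n += WAVE_TABLE[i];
    return n;
}
```

For D=0 the window covers one high entry out of 16 (6.25%), for D=7 it covers eight (50%), and with M set all 16 entries are high.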
To replicate the VRC6 saw wave, a series of 14 step entries can be used: 0-1=0x00, 2-3=0x01, 4-5=0x02, ... 12-13=0x06. To keep from colliding with the square table, these entries would be well placed at WAVE_TABLE[32-45], but obviously other locations would work just as well. For the saw channel we simply fix Chan_start = 32 and Chan_last = 45. And here, just as the VRC6 saw channel has 7 steps but is only incremented every other cycle, effectively dividing by 14, we've replicated the same functionality with 14 entries of the WAVE_TABLE.
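A small sketch of the saw table, assuming 14 entries starting at index 32 (so the last index used is 45). The helper names are hypothetical. The output helper mimics the hardware's 8bit accumulator by keeping only the low 8 bits of rate x step and then taking the upper 5 of those bits:

```c
#include <stdint.h>

#define SAW_START 32
#define SAW_STEPS 14

static uint8_t WAVE_TABLE[64];

/* Fill the 14 saw entries: 0,0,1,1,2,2,...,6,6 */
static void init_saw_table(void)
{
    for (uint8_t i = 0; i < SAW_STEPS; i++)
        WAVE_TABLE[SAW_START + i] = (uint8_t)(i / 2);
}

/* Saw output for one table position.
 * rate: the 6bit value written to the saw's volume/rate register. */
static uint8_t saw_out(uint8_t rate, uint8_t ptr)
{
    uint16_t acc = (uint16_t)(rate * WAVE_TABLE[ptr]);
    return (uint8_t)((acc & 0x00FF) >> 3);   /* wrap to 8 bits, keep top 5 */
}
```

A rate of 42 tops out at 6 x 42 = 252, giving a peak output of 31. Larger rates wrap past 255, so the same code also reproduces the distorted saw those rates produce.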
To put everything together, here's pseudo/C code for the entire soft synth isr:
Code:
softsynth_isr:
Ch1_count = Ch1_count - T_tune
if (Ch1_count < 0) {
    Ch1_count += Ch1_period
    Ch1_ptr++
    if (Ch1_ptr > Ch1_last) {
        Ch1_ptr = Ch1_start
    }
}
// Repeat above for each channel
Ch1_out = Ch1_vol * WAVE_TABLE[Ch1_ptr]
Ch2_out = Ch2_vol * WAVE_TABLE[Ch2_ptr]
Ch3_out = Ch3_vol * WAVE_TABLE[Ch3_ptr]
// VRC6 requires further manipulation of Saw output:
// keep the low 8 bits (bitwise AND, not logical &&), then drop the bottom 3
Ch3_out = (Ch3_out & 0x00FF) >> 3
DAC_out = Ch1_out + Ch2_out + Ch3_out
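For anyone who wants to try the isr on a PC first, here is a compilable C take on the pseudo code above. It is only a sketch under a few assumptions: the struct layout and names are made up, DAC_out is a plain variable standing in for the DAC/PWM register, Chan_count is widened to 32 bits so the sign test stays valid for the full 12bit period range, and the clamp for periods shorter than T_tune is an extra guard that is not part of the original design:

```c
#include <stdint.h>

#define NUM_CHANS 3

typedef struct {
    int32_t  count;   /* Chan_count, 12.4 fixed point (widened, see above) */
    uint16_t period;  /* Chan_period, 12.4 fixed point */
    uint8_t  ptr, start, last;
    uint8_t  vol;
    uint16_t out;     /* vol * table entry, before any trimming */
} chan_t;

static uint8_t  WAVE_TABLE[64];
static chan_t   chan[NUM_CHANS];   /* chan[2] is the saw */
static uint16_t T_tune;            /* 12.4 fixed point */
static uint8_t  DAC_out;           /* stand-in for the DAC/PWM register */

static void step_channel(chan_t *c)
{
    c->count -= T_tune;
    if (c->count < 0) {
        c->count += c->period;
        if (c->count < 0)
            c->count = 0;          /* guard added for this sketch only */
        c->ptr++;
        if (c->ptr > c->last)
            c->ptr = c->start;
    }
    c->out = (uint16_t)(c->vol * WAVE_TABLE[c->ptr]);
}

static void softsynth_isr(void)
{
    for (int i = 0; i < NUM_CHANS; i++)
        step_channel(&chan[i]);

    /* Saw: keep the low 8 bits like the 8bit accumulator, output the top 5. */
    uint8_t saw = (uint8_t)((chan[2].out & 0x00FF) >> 3);
    DAC_out = (uint8_t)(chan[0].out + chan[1].out + saw);
}
```

As in the pseudo code, a channel advances at most one table step per isr, so periods shorter than T_tune can't be tracked at this isr rate.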
In the main thread, some tasks may include:
Copying newly written values over to synth variables.
If implementing some sort of gaussian anti-aliasing effort, tweak Chan_count if a rollover occurred last cycle.
Maybe adjust T_tune if the synth needs tuning.
Perform any other calculations that are 'nice to have' but not necessary.
There are a few features/nuances of the VRC6 which haven't been accounted for, but it's not hard to imagine how they might be implemented without changing the isr itself. The biggest are the x16/x256 frequency flags, which effectively shift the period registers right by 4/8 bits. For the x16 flag I think it makes more sense to shift T_tune left by 4 bits instead. If the x256 flag is set, then Chan_period would additionally be shifted 4 bits to the right. Both of those shifts can be reversed without data loss when the flags are cleared. Beyond that, care needs to be taken to assure proper functionality when channels are enabled/disabled, duty cycle generators are reset, etc. But these are all things that should be easily handled in the main thread, and don't need to run each soft synth isr cycle.
For completeness, here's a draft of the STM8 assembly needed to implement the above pseudo/C code, to get a sense of cycle count.
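One way the frequency flag handling might look in the main thread, as a sketch only. The struct, function name, and the base T_tune of 458 (28.64 in 12.4, from the STM8 numbers earlier) are assumptions. Recomputing both values from the raw register contents each time is what makes clearing the flags lossless:

```c
#include <stdint.h>

#define T_TUNE_BASE 458u   /* ~28.64 in 12.4 fixed point (assumed base) */

typedef struct {
    uint16_t t_tune;   /* value the isr subtracts each call, 12.4 */
    uint16_t period;   /* Chan_period as the isr sees it, 12.4 */
} tuned_t;

/* period12: raw 12bit period register value
 * x16, x256: the VRC6 frequency flags (nonzero = set) */
static tuned_t apply_freq_flags(uint16_t period12, int x16, int x256)
{
    tuned_t r;
    r.t_tune = T_TUNE_BASE;
    r.period = (uint16_t)(period12 << 4);      /* into 12.4 */
    if (x16 || x256)
        r.t_tune = (uint16_t)(r.t_tune << 4);  /* counter drains 16x faster */
    if (x256)
        r.period = (uint16_t)(r.period >> 4);  /* another 16x, 256x total */
    return r;
}
```

Since T_tune is shared, this assumes the flags apply to all channels at once; 458 << 4 = 7328 still fits comfortably in 16 bits.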
Code:
isr_TIM2_update: ;ISR requires 9cycles for interrupt + 2cyc for jump to ISR + 9cyc return from interrupt.. 20cycle ISR overhead.
    LDW X, Ch1_count ;2cycles
    SUBW X, T_tune ;2
    LDW Ch1_count, X ;2
    JRPL next_channel ;1/2
;Channel update takes 8cycles if no rollover of Ch1_count
    ADDW X, Ch1_period ;2
    LDW Ch1_count, X ;2
    INC Ch1_ptr ;1
    LD A, Ch1_ptr ;1
    CP A, Ch1_last ;1
    JRULE Ch1_output ;1/2 skip the reset while Ch1_ptr <= Ch1_last (unsigned compare)
    MOV Ch1_ptr, Ch1_start ;1
;Channel update takes 16cycles if Chan_count rolls over to step Chan_ptr plus output updates below
Ch1_output:
    LDW X, Ch1_ptr ;2cyc Reserve empty byte for MSB!
    LD A, (WAVE_TABLE, X) ;1
    LDW X, Ch1_vol ;2 Reserve empty byte for MSB!
    MUL X, A ;4
    LDW Ch1_out, X ;2
;Square Channel updates take 27 cycles if Chan_count rolls over
;being a square, optimizations could be made to remove multiply above and use TNZ (WAVE_TABLE, X) instead saving 7 cycles..
next_channel:
;same calculations as above repeated for remaining channels
;saw channel requires all the calculations above for Ch1/2 plus extra volume adjustments:
    LD A, XL ;1
    SRL A ;1
    SRL A ;1
    SRL A ;1
    LD Ch3_out, A ;1
;Saw wave requires 32 total cycles to update if Chan_count rollover occurs
Sum_channels:
    LD A, Ch1_out ;1
    ADD A, Ch2_out ;1
    ADD A, Ch3_out ;1
    LD DAC_out, A ;1
;final output to DAC = 4 cycles fixed
;extra master volume leveling/adjustments may be desired.
;TOTAL ISR cycle count:
;minimum (no rollovers) = 20 + 8 * 3 + 4 = 48cycles
;min + square rollover = 48 + 19 = 67cycles
;max possible all 3chan rollover = 48 + 19 * 3 + 5 = 110cycles
;8bit PWM DAC with top value of 255 would execute this ISR every 256 CPU cycles
;min utilization (no rollovers) = 48/256 = 19%
;nominal util (one rollover) = 67/256 = 26%
;max util (all rollover) = 110/256 = 43%
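When porting to another mcu or isr rate, the same budget arithmetic can be redone with a few lines of C. This is just a sketch; the per-path cycle counts are the ones from the assembly comments above, and the function names are hypothetical:

```c
#include <stdint.h>

enum {
    ISR_OVERHEAD  = 20,   /* interrupt entry + jump + return */
    CHAN_IDLE     = 8,    /* channel with no rollover */
    CHAN_ROLLOVER = 27,   /* channel that steps its pointer and recomputes */
    SAW_EXTRA     = 5,    /* extra shifts for the saw output */
    DAC_SUM       = 4,    /* summing channels and writing the DAC */
    ISR_PERIOD    = 256   /* mcu cycles between isr calls */
};

/* squares_rolled: how many of the 2 square channels rolled over (0-2)
 * saw_rolled:     nonzero if the saw channel rolled over */
static unsigned isr_cycles(unsigned squares_rolled, int saw_rolled)
{
    unsigned rolled = squares_rolled + (saw_rolled ? 1u : 0u);
    unsigned idle   = 3u - rolled;
    return ISR_OVERHEAD
         + idle * CHAN_IDLE
         + rolled * CHAN_ROLLOVER
         + (saw_rolled ? SAW_EXTRA : 0)
         + DAC_SUM;
}

/* CPU utilization in percent, rounded to the nearest whole number. */
static unsigned utilization_pct(unsigned cycles)
{
    return (cycles * 100u + ISR_PERIOD / 2) / ISR_PERIOD;
}
```

Swapping in the cycle counts of a different mcu, or a different ISR_PERIOD for another sample rate, gives a quick feel for how much headroom is left for the main thread.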
In the end things are looking pretty decent. I initially hadn't considered that channel calculations can be significantly reduced when the Chan_ptr isn't incremented, as the last calculated output is still valid. Not every case is covered in this first draft, however, as things like volume changes won't take effect until the next Chan_ptr increment; I suppose that delay wouldn't be a real issue in practice.
Looking ahead to Sunsoft-5B audio, there are only a few features that would need to be added to support the noise channel and envelope generator. Sunsoft-5B square channels are simpler though, without a duty cycle generator. I wouldn't necessarily expect it to be reasonable for a resource constrained mcu to fully implement VRC6 & Sunsoft-5B at the same time, but the overall soft synth design could be modified to do so. In the end this model allows the more complicated mcu soft synth to act as a standard synth to gain tool/emu support. At the same time, the wave table and all the variables that go along with this homebrew soft synth are sitting there ready to be utilized by any motivated individuals. If nothing else, this gives me a good soft synth template to start from when working on projects where there is only interest in a specific replica.
As always, I'm interested in thoughts, feedback, and ideas you all have on my scheme here!