Looking for feedback on new APU Low-Pass FIR Filter

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
by on (#81632)
Hello all! Here are some samples for my new low-pass FIR filter for my APU core. I was hoping I could get some feedback on the quality of the audio. Does it sound excellent/crappy/ok/good/etc?

---
The FIR filter specs are as follows:
Oper Freq = 1.79MHz
Fcutoff = 20kHz
Fsampling = 48kHz
Num_Taps = 256 (255th order)
Input Sample Width = 16
Coefficient Bit Width = 16
---

Super Mario Bros. - Level 1 Song (All Channels):
dead link

Super Mario Bros. - Level 1 Song (Triangle Channel Only):
dead link

Journey to Silius - Title Screen Song (All Channels):
dead link

Journey to Silius - Title Screen Song (Individual Channels):
0:00.0 - 0:23.5 = Pulse 1
0:23.5 - 0:45.0 = Pulse 2
0:45.0 - 1:08.0 = Triangle
1:08.0 - 1:22.0 = Noise
1:22.0 - End = DMC

dead link

UPDATE:
Solstice (All Channels):
dead link

EDIT: Links are now dead but I will be posting new ones later on with my much improved LP FIR filter results. Stay tuned!

Pz!

Jonathon :)

by on (#81645)
So there's obviously something wrong with my FIR filter. In Solstice you can hear that the Noise channel's "tink tink tink" in-game sound is all dorked up. :) After talking to Kevtris I don't think it's the FIR filter logic that's messed up - it's the clock rates that I'm running it at. I'm going to fix the issues and then put up a new sample. :)

by on (#81668)
Do you have 256 multipliers at 1.79 MHz? That seems excessive unless you're filtering reaaaaallllyyy slow information like seismic data. Even the propagation delay through 256 adders would significantly affect performance.

I think I would go with one time-multiplexed multiplier-accumulator at 21.477272 MHz.

by on (#81670)
Heh, what u suggest is essentially verbatim what kevtris suggested. ;)

In case u haven't noticed yet (yeah right) I'm not very knowledgeable when it comes to this audio stuff. Which is why I'm making these silly errors. Hehe. I really appreciate your reply!

by on (#81674)
There's a technique I learned in the DSP class that helps with multiplicative load: You can move the FIR filter after the decimation (integer ratio downsampling). These guys seem to explain it
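To make the decimation trick concrete, here is a small Python sketch (not HDL) of the polyphase identity: running a FIR at the full rate and keeping every M-th output gives exactly the same numbers as splitting the coefficients into M shorter sub-filters that each work on an already-decimated copy of the input. The function names are hypothetical, and integer samples are used only so the two paths can be compared exactly.

```python
def fir_then_decimate(x, h, M):
    """Reference path: run the FIR at the full input rate, then keep every M-th output."""
    y = [sum(h[k] * x[n - k] for k in range(len(h)) if 0 <= n - k < len(x))
         for n in range(len(x))]
    return y[::M]


def polyphase_decimate(x, h, M):
    """Same outputs, with the filtering moved after the decimation.

    h is split into M sub-filters h[p], h[p+M], h[p+2M], ...; branch p only
    ever reads the decimated input x[m*M - p], so every multiply runs at the
    low (output) rate.
    """
    num_out = (len(x) + M - 1) // M
    y = [0] * num_out
    for p in range(M):
        sub = h[p::M]                          # p-th polyphase component of h
        for m in range(num_out):
            for j, coeff in enumerate(sub):
                idx = (m - j) * M - p          # original tap index k = j*M + p
                if 0 <= idx < len(x):
                    y[m] += coeff * x[idx]
    return y


x = [(i * 7) % 11 - 5 for i in range(50)]
h = [1, 3, -2, 4, 0, 5, -1, 2, 6]
print(polyphase_decimate(x, h, 4) == fir_then_decimate(x, h, 4))
```

Each multiply in `polyphase_decimate` is spent on an output sample that is actually kept; none are wasted on outputs the decimator would throw away.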

by on (#81731)
(1789772.727272[...] * 77) / 3125 = 44100 (exactly)

1789772.727272[...] * 77 = 137812500 (source frequency for FIR design)
FilterLength * 77 = Number of taps for FIR design


Pseudocode:

output[n] = 0;
for(i = 0; i < FilterLength; i++)
{
output[n] += input[(n * 3125 / 77) + i] * FIRTable[(77 - ((n * 3125) % 77)) + (i * 77)];
}


I may have gotten some of it wrong/backwards, it's been so long since I really played with this sort of thing, but that should at least give you some ideas.

Edit: 352 and 13125 for 48000 exactly.
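The rate arithmetic above can be checked exactly with rational numbers. This sketch assumes the standard NTSC clocks (236.25 MHz / 11 master clock, CPU/APU at master / 12); `output_rate` and `input_position` are hypothetical helpers, and `input_position` only shows the integer-index-plus-phase split for output sample n, not Mednafen's exact table indexing.

```python
from fractions import Fraction

# NTSC NES: 236.25 MHz / 11 master clock (21.477272... MHz), CPU/APU = master / 12
CPU_HZ = Fraction(236_250_000, 11 * 12)


def output_rate(up, down):
    """Exact output rate after resampling CPU_HZ by the rational factor up/down."""
    return CPU_HZ * up / down


def input_position(n, up, down):
    """Where output sample n falls in the input stream.

    Returns (integer input index, phase); phase / up is the fractional offset
    and selects which of the `up` interleaved coefficient sets would be used.
    """
    pos = n * down
    return pos // up, pos % up


print(float(output_rate(77, 3125)), float(output_rate(352, 13125)))
print([input_position(n, 77, 3125) for n in range(4)])
```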

by on (#81808)
lidnariq wrote:
There's a technique I learned in the DSP class that helps with multiplicative load: You can move the FIR filter after the decimation (integer ratio downsampling). These guys seem to explain it

Yes, one of my major problems was that I was not decimating the output of the FIR filter (simply because I didn't know that I was supposed to be, hehe). I've fixed that issue now and have also fixed another major bug that was in the FIR filter module itself. It's all working now and Solstice sounds like it should with the high-pitched "tink tink tink"! :) I will post some new audio samples (examples of both good and bad) once I've got all the code cleaned up.

With that said, my decimation is at the output of the filter. I would think that down-sampling at the input to the filter would not be a very good idea since you lose so much information that the filter could be using to get a better "answer" (i.e. resulting filtered sample). But I could certainly be wrong. ;) Down-sampling at the input would definitely reduce the multiplicative load...but my question would then be...at what cost?

Mednafen wrote:
(1789772.727272...

Thanks for the tip! I really appreciate you trying to help me out, but unfortunately my emulator is designed in hardware (HDL+FPGA) so your tip doesn't apply to me. :( But still, thanks!

by on (#81848)
jwdonal wrote:
With that said, my decimation is at the output of the filter. I would think that down-sampling at the input to the filter would not be a very good idea since you lose so much information that the filter could be using to get a better "answer" (i.e. resulting filtered sample). But I could certainly be wrong. ;) Down-sampling at the input would definitely reduce the multiplicative load...but my question would then be...at what cost?
The point is that you only run the FIR filter for each of the output samples, not for all the input samples. I remember something funny about splicing the FIR filter appropriately to make a unified FIR-and-decimate object, but the teacher did seem to enjoy being deliberately obtuse sometimes.
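A minimal sketch of this "unified FIR-and-decimate" idea, in Python for readability: the convolution sum is simply evaluated at the output instants that survive decimation, so the cost is len(h) multiplies per output sample rather than per input sample. `decimating_fir` is a hypothetical name, and samples before x[0] are assumed to be zero.

```python
def decimating_fir(x, h, M):
    """Evaluate y[n] = sum_k h[k] * x[n - k] only at the kept outputs n = 0, M, 2M, ...

    Samples before x[0] are treated as zero.
    """
    out = []
    for n in range(0, len(x), M):
        acc = 0
        for k in range(min(len(h), n + 1)):   # stop once x[n - k] would be before x[0]
            acc += h[k] * x[n - k]
        out.append(acc)
    return out


print(decimating_fir([3, -1, 4, 1, -5, 9, 2, -6], [2, -1, 3, 1], 3))
```

The input still arrives at the full rate and the window still slides over every input sample; only the outputs that would be discarded are never computed.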

by on (#81862)
As promised here are my much improved results - now with a low-pass FIR filter that actually does what it's supposed to! Imagine that! Lol. Much credit goes to Kevtris and a DSP guru co-worker at my job. I have some plans on how I can make it sound even better but these results match both Kevtris' Solstice MP3 and Nestopia so I'm happy with it as is. Enjoy!

Solstice In-Game Music (Noise Channel Only) - because the noise channel is the actual stress-test for the filter

Solstice In-Game Music (All Channels) - just because the music is cool :)

Solstice In-Game Music (All Channels - Unfiltered!) - What you hear happening is the ultra-high-frequency noise channel audio getting aliased or "folded" back into the audible frequency range. This is not good! The fix is to low-pass filter the APU output (here with a FIR filter) before it is decimated down to the output sample rate.

EDIT: I updated the above links to Dropbox locations so the downloads should be much faster. You're welcome. ;)

by on (#82036)
Okay, I have another FIR filter/audio question. There must be something that I am just not understanding about the noise channel or possibly something about FIR filters.

Kevtris says his low-pass FIR filter implementation is a 256-tap filter with a sampling frequency of 21.47727M / 256 = 83.9k. And his cutoff frequency is 20kHz. He then takes a sample every 256 iterations of the filter and sends the filtered audio sample to his DAC. He says that this works great for him for filtering out the high-frequency noise.

At first glance his setup seems fine, but from what I understand about FIR filters and the APU's noise channel his implementation simply should not work. And it's driving me bonkers trying to figure out his "magic". :)

The problem I see with his implementation is that the Nyquist frequency of his FIR implementation is (Fsampling/2) 41.9kHz, but the maximum noise channel frequency (using the lookup table on the Wiki) is 1.79MHz/4 = 447kHz. And the Nyquist-Shannon theorem states that any input frequency greater than the Nyquist frequency (Fs/2 = 41.9k) will get "folded"/"aliased" back into the pass-band range. So everything in the noise channel from 41.9k to 447k will get folded back into the audible range. Which completely defeats the purpose of the low-pass filter. Lol.
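The folding argument can be put into numbers. This sketch treats the fastest noise setting as a single tone at 1.79MHz/4, which is a simplification (the noise channel is broadband), and uses the NTSC master clock of 236.25 MHz / 11; `alias_frequency` is a hypothetical helper.

```python
MASTER_HZ = 236.25e6 / 11           # NTSC master clock, about 21.477 MHz
NOISE_TOP_HZ = MASTER_HZ / 12 / 4   # 1.79 MHz / 4, about 447 kHz


def alias_frequency(f, fs):
    """Apparent frequency of a tone at f Hz after sampling at fs Hz,
    folded into the range 0 .. fs/2."""
    f = f % fs
    return fs - f if f > fs / 2 else f


for fs in (MASTER_HZ / 256, MASTER_HZ / 16):
    print(f"fs = {fs / 1e3:8.1f} kHz -> 447 kHz tone appears at "
          f"{alias_frequency(NOISE_TOP_HZ, fs) / 1e3:6.1f} kHz")
```

With fs = 21.47727M/256 the 447 kHz component lands at exactly fs/3, roughly 28 kHz, while at 21.47727M/16 = 1.34 MHz it is below Nyquist and comes through unchanged, matching the two cases described in this post.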

I must be missing a key piece of information but I just can't figure out what it is. I even asked my DSP friends at work and (assuming that I explained Kevtris' setup correctly) they also said that it shouldn't work.

If I implement kevtris' filter in my APU as described the noise channel is all aliased for the higher frequencies. However, if I implement a FIR with a sampling frequency of 21.47727M/16 = 1.34M, then that gives me a Nyquist of 671kHz, and the noise channel output sounds perfect (since Nyquist > 447kHz).

What am I missing??? Does anyone have any ideas at all?

Thanks!

Jonathon

by on (#82047)
Is there any chance he runs the audio and its filter at 1.8MHz instead of 21.5MHz? Otherwise, yes, you're right that it doesn't make sense.

by on (#82050)
lidnariq wrote:
Is there any chance he runs the audio and its filter at 1.8MHz instead of 21.5MHz? Otherwise, yes, you're right that it doesn't make sense.

I really don't think so. I asked him probably three times now and I get the same answer. I don't want to keep bugging him. Also, if he was using the 1.79MHz clock that would just make things worse: the sample rate would drop to 1.79M/256 = 7kHz, for a Nyquist frequency of only about 3.5kHz.

Here are excerpts from the #nesdev IRC channel.

---
Once upon a time in #nesdev...

Code:
<jwdonal> hey kev, what type of filter do you have on your NES' audio output
<@kevtris> I have a 256th order lowpass FIR filter
<jwdonal> is that in the FPGA or in the codec itself?
<@kevtris> result audio is very good and sounds like a real NES then for looped noise and such
<@kevtris> in the FPGA
<@kevtris> I have an external DAC of course, stereo 24 bits
<@kevtris> running at 83.9KHz
<jwdonal> are the coefficients stored as mantissa/exponent format?
<@kevtris> fixed point
<@kevtris> fixed point is how all math works on my FPGAs
<@kevtris> I made a lowpass filter, for around 20KHz cutoff
<@kevtris> my audio's 1/256th the rate of the 21MHz clock
<@kevtris> And I take a sample every 256 iterations of the filter
<@kevtris> I run it at 21.47727MHz so I get a 83.9KHz or so sample rate on the output
<@kevtris> which is what my DAC runs at
<@kevtris> so it's all integrated
<@kevtris> I only used 1 multiplier

Note that I think Kev actually meant a 255th order low-pass filter with 256 taps - people very often get order and tap-number confused. But either way, it's not a big deal.

---

And on some other day on #nesdev...

Code:
<jwdonal> hey kev, what sampling frequency did you say you were using for your NES' low-pass filter (i.e. the frequency at which the samples are entering the filters running sample set)?
<@kevtris> 21.472272/256
<@kevtris> 83KHz
<jwdonal> so if you are downsampling the output of the APU to 83kHz doesn't that mean that all frequencies above the Nyquist (i.e. 83k/2) will get aliased back into the audible range?
<jwdonal> cause the noise channel goes up to 447kHz...
<@kevtris> no it means they are cut out
<@kevtris> my audio is generated in real time, and has no sample rate
<@kevtris> just like a real NES
<@kevtris> it is then filtered, which that has a sample frequency of 21.47727MHz
<@kevtris> and then the output is 83KHz to the DAC
<@kevtris> cutoff's like 20KHz
<@kevtris> on the filter proper
<@kevtris> everything above is cut off
<@kevtris> so it's a lowpass
<jwdonal> ok, i def understand that.  i was just thinking that down-sampling causes aliasing, but you're saying not.
<jwdonal> i'm curious what you mean by "generated in real time and has no sample rate".  you just mean that the samples are generated as the APU creates them, yes?
<trap15> jwdonal: yes
<trap15> sorta
<trap15> basically, the output is always being made

From what I understand about filters and the Nyquist-Shannon theorem, Kevtris' response to my aliasing question (i.e. "<@kevtris> no it means they are cut out") is incorrect. Unless Kevtris has somehow disproven the Nyquist-Shannon theorem. Lol.

But anyway, from the above I gather that Kevtris is shifting in samples at the 83kHz frequency and then running the FIR's multiply-accumulate at 256x that rate. This makes sense since he is only using 1 multiplier and because he must calculate the filtered result for the current 256-sample set before the next sample is shifted in. What he is doing all makes sense but with the noise channel frequency going far past 41.9kHz it just shouldn't work! :-P
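The clock budget in this reasoning is easy to tabulate. A sketch, assuming every hardware multiplier completes one multiply-accumulate per clock; `multipliers_needed` is a hypothetical helper. It only counts multiplies: it shows that one MAC fits when 256 clocks are available per output, but not which input samples those 256 MACs consume, and that is the part that decides whether aliasing occurs.

```python
import math


def multipliers_needed(num_taps, clocks_per_output, folded=False):
    """Hardware multipliers required when each multiplier does one MAC per clock
    and one output sample must be finished every `clocks_per_output` clocks.

    folded=True assumes symmetric coefficients: the two samples sharing a
    coefficient are pre-added, so only ceil(N/2) multiplies are needed per output.
    """
    mults_per_output = math.ceil(num_taps / 2) if folded else num_taps
    return math.ceil(mults_per_output / clocks_per_output)


# 21.47727 MHz clock, one output every 256 clocks (about 83.9 kHz)
print(multipliers_needed(256, 256))
# a new output on every clock (no decimation), plain vs. folded
print(multipliers_needed(256, 1), multipliers_needed(256, 1, folded=True))
```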

---

I really don't want to have to ask Kevtris about this again. I have to be just misunderstanding something simple. Or it might be possible that Kevtris did this stuff so long ago that he has just forgotten exactly how it worked. Whatever the case, it's driving me up the wall. All I want to know is that I'm not crazy - someone care to confirm? :-D

by on (#82076)
If the cutoff is at 20 kHz, hopefully there aren't frequency components above 41 kHz to alias, much less ones able to alias into audible noise.

by on (#82079)
kyuusaku wrote:
...hopefully there aren't frequency components above 41 kHz to alias...

But that's exactly what I'm saying. There _are_ frequency components above 41kHz. Namely those in the Noise channel which go up to 447kHz.

by on (#82089)
They should be filtered, that's the whole point.

by on (#82108)
You either didn't read anything I wrote, or you don't understand FIR filters and the Nyquist-Shannon theorem. Either way, what you're saying is incorrect.

by on (#82132)
For what you describe, the filter's input would have to be running below 2 x 447 kHz, i.e. downsampled before filtering, which I now see you assume after begrudgingly rereading the post. I have no idea why you assume this; he explicitly said he runs the filter at 21.47 MHz and the sample rate (output) is 83 kHz.

It can be assumed that downsampling takes place within the filter module because he doesn't need 21 MHz output / components up to 10.5 MHz. Conveniently this can be implemented with little more than a counter, single MAC and 256 word ROM for the coefficients.

21.47 MHz samples -> FIR (hopefully no components above 20 kHz) -> one of every 256 samples taken (once the accumulation completes)

But durrrrrr FIR n N-S iz hard so wat do i kno :?

by on (#82161)
Yeah, nothing you said makes sense. Sorry to be blunt. Please don't take that the wrong way either - I'm not saying you're "dumb" or any silliness like that. You have actually helped me out a lot in the past. I would just say to study up on FFTs, the Nyquist-Shannon theorem, and implement a couple working FIR filters yourself in HDL and Matlab and then come back if you still want to discuss. But thanks for trying to help!

Anyway, I already spoke to Kevtris at length about this again yesterday on IRC and he verified that with the information that I have (which I posted in detail here) the filter definitely would not work (very bad aliasing would occur). He just said that it's been 7 years so he doesn't remember exactly what he did. And he doesn't want to look at his source because that would make the mystery too easily solved. Lol. :)

I also spoke to a few DSP engineers at work and the filter would not even remotely work as described (again aliasing). And my understanding of Kevtris' description/supposed-implementation was accurate (as confirmed with him yesterday on IRC). The fact is he just doesn't remember everything that he did. :)

I have a working low-pass FIR filter already but it requires more than one multiplier. Which is fine, it just takes a bit longer to implement. I'm trying to bribe Kevtris to take a look at his source and figure out what he did. Hehe, we'll see what happens. ;)

Jonathon

by on (#82190)
jwdonal wrote:
I have a working low-pass FIR filter already but it requires more than one multiplier. Which is fine, it just takes a bit longer to implement.
More than one multiplier per what? I think you can't do better than one per input sample...

by on (#82215)
lidnariq wrote:
More than one multiplier per what? I think you can't do better than one per input sample...

Yep, you are correct. The coefficients for this particular application are always symmetric so I actually only require N/2 multipliers where N is the number of taps (this is referred to as a "folded" FIR btw). So my implementation is not a pure linear FIR requiring one per sample. But still, N/2 multipliers is not as good as 1 multiplier - which is why I'm on a mission to figure out how Kevtris got away with it. ;)
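A sketch of the folded (symmetric) structure in Python, assuming linear-phase coefficients with h[k] == h[N-1-k]: the two input samples that share a coefficient are pre-added, so one output needs ceil(N/2) multiplies instead of N. `folded_fir_output` is a hypothetical name.

```python
def folded_fir_output(x, h, n):
    """Compute y[n] for a symmetric FIR (h[k] == h[N-1-k]) with one multiply per
    coefficient PAIR: the two samples sharing a coefficient are added first."""
    N = len(h)
    if any(h[k] != h[N - 1 - k] for k in range(N)):
        raise ValueError("folded form needs symmetric coefficients")

    def sample(i):                      # zero outside the input record
        return x[i] if 0 <= i < len(x) else 0

    acc = 0
    for k in range(N // 2):             # N // 2 multiplies for the pairs
        acc += h[k] * (sample(n - k) + sample(n - (N - 1 - k)))
    if N % 2:                           # odd N: the centre tap has no partner
        acc += h[N // 2] * sample(n - N // 2)
    return acc


x = [5, -3, 2, 7, -1, 4, 0, 6, -2, 3]
print([folded_fir_output(x, [1, 2, 3, 3, 2, 1], n) for n in range(len(x))])
```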

by on (#82225)
kyuusaku wrote:
For what you describe, the filter's input would have to be running below 2 x 447 kHz, i.e. downsampled before filtering, which I now see you assume after begrudgingly rereading the post. I have no idea why you assume this; he explicitly said he runs the filter at 21.47 MHz and the sample rate (output) is 83 kHz.

It can be assumed that downsampling takes place within the filter module because he doesn't need 21 MHz output / components up to 10.5 MHz. Conveniently this can be implemented with little more than a counter, single MAC and 256 word ROM for the coefficients.

21.47 MHz samples -> FIR (hopefully no components above 20 kHz) -> one of every 256 samples taken (once the accumulation completes)

But durrrrrr FIR n N-S iz hard so wat do i kno :?

Somebody care to explain what is wrong with what kyuusaku is saying? I can't see anything wrong with it.

Input sample rate is 21.47 MHz, multiply input samples with coefficients and accumulate for 256 cycles (needing only one hardware multiplier) and we have one filtered result sample at ~83KHz sampling rate. So we're basically only calculating the filter output y[n] at y[255], y[511], y[767], etc.
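The scheme described above can be sketched as a streaming loop with a single accumulator, assuming (as in this post) that the decimation ratio equals the tap count N. `block_decimating_fir` is a hypothetical name; the point is that y[N-1], y[2N-1], ... come out identical to a full convolution sampled at those indices.

```python
def block_decimating_fir(x, h):
    """One-MAC-per-input-sample sketch of a decimating FIR whose decimation
    ratio equals its tap count N.

    Output y[b*N + N - 1] depends only on x[b*N] .. x[b*N + N - 1], so one
    accumulator can be cleared at each block boundary and no input sample
    ever has to be stored or reused.
    """
    N = len(h)
    out, acc = [], 0
    for i, sample in enumerate(x):
        j = i % N                      # position inside the current block
        acc += h[N - 1 - j] * sample   # x[b*N + j] pairs with tap k = N - 1 - j
        if j == N - 1:                 # block done: acc is y[i]; start a new sum
            out.append(acc)
            acc = 0
    return out


print(block_decimating_fir([3, 1, -4, 1, 5, -9, 2, 6], [1, -2, 3, 4]))
```

Because each output only sees N consecutive input samples, a 256-tap window at 21.47727 MHz spans only about 11.9 µs of input, which is why a sharp 20 kHz cutoff is hard to get out of it.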

I'm not sure how well 256 taps would work though when the sample rate is so high compared to the cutoff frequency.

by on (#82227)
Perhaps you could use a few stages of 2:1 or 3:1 downsampling first; those don't need as many taps.

by on (#82249)
tepples wrote:
Perhaps you could use a few stages of 2:1 or 3:1 downsampling first; those don't need as many taps.

That's def a great idea. One of the DSP guys at work and I actually tested that a short while back. It does help reduce the number of multipliers but not as much as one might think. After looking at the various frequency responses of both 2-stage and 3-stage filter implementations in Matlab we concluded that there was no significant advantage over a single stage. Actually, I concluded that a single stage was overall more advantageous since the HDL implementation was much less complex and less error prone. :) Hehe.

by on (#82309)
thefox wrote:
Input sample rate is 21.47 MHz, multiply input samples with coefficients and accumulate for 256 cycles (needing only one hardware multiplier)

Wow, I totally missed your reply fox - sorry about that! To answer your question, this implementation is not possible on either my or Kevtris' hardware. The reason is that if your filter's sample rate is 21MHz with only one multiply-accumulate core, then you must run the MAC operation at 21MHz*256 (i.e. ~5.5GHz!!) in order to calculate the filtered result before the next sample is shifted in on the next 21MHz clock cycle. This is simply not possible on any FPGA in existence. Lol. Hope that clears things up.

thefox wrote:
I'm not sure how well 256 taps would work though when the sample rate is so high compared to the cutoff frequency.

Yep, you're right. The response totally sux - as can be seen here. The 256-tap filter doesn't reach -45dB until wayyyy out near 200kHz. Lol. You would literally need *thousands* of coefficients to reach the desired cutoff of 20kHz. This is just another of many reasons that Kevtris' filter will not work as described.

Pz!

Jonathon

by on (#82310)
Never was I suggesting that the FPGA be run at 256 x 21 MHz, but rather 21 MHz samples be directly accumulated and decimated simultaneously over 256 iterations. Clearly it's not the traditional topology since the samples aren't buffered and the impulse is non-continuous, but it is some kind of FIR filter.

I took your advice jwdonal and worked out the hardware for a traditional FIR (single-multiplier). Now I understand your conclusion about aliasing; I guess I didn't read your post carefully enough either time. Never mind DSP, big posts are hard to follow. What I needed was a reminder that n taps need n MACs per input sample, so a single MAC doing n iterations per input effectively must be downsampling at the input. The obvious solution to this problem: reduce the filter order... I really doubt 19th order would be all that bad.

by on (#82347)
kyuusaku wrote:
Never was I suggesting that the FPGA be run at 256 x 21 MHz, but rather 21 MHz samples be directly accumulated and decimated simultaneously over 256 iterations. Clearly it's not the traditional topology since the samples aren't buffered and the impulse is non-continuous, but it is some kind of FIR filter.

Okay, I think I see what you're saying. What you're describing is not a FIR filter in any form. The reason is that there is no convolution involved, that is, the "sliding window" of samples being shifted in each clock cycle does not exist - hence there can be no convolution. I have no idea what kind of filter that is and one of my DSP friends at my job didn't know either. I also tried googling it (not really knowing what to call it) and didn't come up with anything.

In any case, I can try this out and see what happens. It would be incredibly interesting if it worked. If it did work, then the misinformation that Kevtris gave me was in telling me that he implemented a "FIR" filter when he actually didn't. I will let you know what happens after I try it...might not be for a couple days though.

by on (#82367)
jwdonal wrote:
kyuusaku wrote:
Never was I suggesting that the FPGA be run at 256 x 21 MHz, but rather 21 MHz samples be directly accumulated and decimated simultaneously over 256 iterations. Clearly it's not the traditional topology since the samples aren't buffered and the impulse is non-continuous, but it is some kind of FIR filter.

Okay, I think I see what you're saying. What you're describing is not a FIR filter in any form.

You're saying Wikipedia's explanation of FIR is wrong?

[Image: Wikipedia's FIR difference equation, y[n] = b0*x[n] + b1*x[n-1] + ... + bN*x[n-N]]
(x = input, y = output, b = coefficients, N = filter order)

Filter output is only dependent on N+1 previous input samples. So we can calculate output at y[255] using x[0]..x[255], y[511] using x[256]..x[511] and so on. As long as the ratio we're downsampling by in the end is the same as the number of taps we don't need to process any of the input samples more than once.

by on (#82382)
thefox wrote:
You're saying Wikipedia's explanation of FIR is wrong?

No, but you also need to read it more carefully. And I also wouldn't rely entirely on Wikipedia for an in-depth explanation of FIR filters. Lol. Wikipedia says: "The top part is an N-stage delay line...". The Wiki doesn't go into detail but the key words there are "delay line". That is, you don't take N new samples at a time, rather you take *1* new sample at a time by shifting in a new sample into the delay chain on each clock cycle. I would go check out some other sites or maybe those old dusty things that no one uses anymore called "books". Hehe. ;)

The number of samples that go into a FIR filter is _always_ equal to the number of samples that come out of the FIR filter. Consider the following (completely plausible) FIR filter design scenario:

- The input to the FIR filter is a continuous set of 256-sample digital waveforms.
- The FIR filter is a 256-tap low-pass filter.

With your implementation you would only get one sample out of your filter for each waveform. LOL. That is not a waveform. On the other hand, with a proper FIR filter using a delay chain you would always get a filtered 256-sample waveform (this is the "convolution" process) on the output.

Now, with that said, you can perform "decimation" on the *output* of the FIR filter - which drops samples. Or you can perform "down-sampling" on the samples _before_ they are input to the FIR which also removes samples. But either way, the number of samples into the FIR is always equal to the number of samples out.

So, again, what you are describing is _not_ a FIR filter. Period.

Here are some more sites to educate you on FIR filter operations/calculations/convolution:
http://www.dspguru.com/dsp/faqs/fir/basics
http://www.netrino.com/Embedded-Systems/How-To/Digital-Filters-FIR-IIR
etc etc etc

Pz!

by on (#82383)
jwdonal wrote:
The number of samples that go into a FIR filter is _always_ equal to the number of samples that come out of the FIR filter. Consider the following (completely plausible) FIR filter design scenario:

- The input to the FIR filter is a continuous set of 256-sample digital waveforms.
- The FIR filter is a 256-tap low-pass filter.

With your implementation you would only get one sample out of your filter for each waveform. LOL. That is not a waveform. On the other hand, with a proper FIR filter using a delay chain you would always get a filtered 256-sample waveform (this is the "convolution" process) on the output.

Now, with that said, you can perform "decimation" on the *output* of the FIR filter - which drops samples. Or you can perform "down-sampling" on the samples _before_ they are input to the FIR which also removes samples. But either way, the number of samples into the FIR is always equal to the number of samples out.


You've managed to really confuse yourself. A unified FIR-and-decimate object IS valid, despite your insistence that it's not. The thing thefox is describing is exactly what I said earlier: You only need to run the FIR filter for each OUTPUT sample. After all, if the decimation just throws away the results for most of the previous FIR results, why calculate them at all?

by on (#82384)
lidnariq wrote:
You've managed to REALLY confuse yourself. A unified FIR-and-decimate object IS valid, despite your insistence that it's not.

Haha. Negative. What I said was that what was being described was not a FIR. And I'm right. What you're describing is what you've coined as a "unified FIR-and-decimate".

"Unified FIR-and-decimate" != "FIR"

In other words, if a customer at my job asked me to code up a FIR filter I would not provide them with this "unified FIR-and-decimate" filter that you have described. Because that's not what they're asking for.

And I still don't think that I agree with the term "unified FIR-and-decimate" but it doesn't matter. I will definitely try out this filter that you have described and if it works I will be extremely happy and will happily let everyone know that it works. And I will have learned something new. If it doesn't work then nothing has been lost. :)

Again, I realize that you guys are trying to help and _please_ understand that I truly appreciate that. I'm just trying to explain the reasoning behind my thoughts.

Thanks!

Jonathon

by on (#82405)
lidnariq wrote:
You only need to run the FIR filter for each OUTPUT sample. After all, if the decimation just throws away the results for most of the previous FIR results, why calculate them at all?

Okay, so reading your post again that does make perfect sense to me. But then what sampling frequency should I use to generate the coefficients? Would it still be 21MHz? Or would it be 21M/256? Or something else? If it's still 21M then even with the unified decimation (decimation does not change a filter's response) the filter response would still be what I've shown here (which is incredibly crappy and is getting nowhere near the cutoff). Ideas?

I need the correct sampling frequency in order to generate the proper coefficient values with matlab.

Fcutoff = 20kHz
Fsampling = ??

by on (#82430)
jwdonal wrote:
But then what sampling frequency should I use to generate the coefficients? Would it still be 21MHz? Or would it be 21M/256? Or something else?

The filter coefficients should still match the input audio rate. That said, I don't understand why you're generating the audio at a higher rate than the NES did originally —you're not generating any genuine data; you're probably just adding a lossy interpolation stage (that's hopefully rejected by whatever FIR).
Quote:
If it's still 21M then even with the unified decimation (decimation does not change a filter's response) the filter response would still be what I've shown here (which is incredibly crappy and is getting nowhere near the cutoff). Ideas?

I'd generate the audio at the same rate the NTSC NES does — 1.789773MHz. At that point, I'd pick a FIR filter that achieved 40dB rejection at your cutoff -- Using http://www-users.cs.york.ac.uk/~fisher/ ... racos.html I've gotten useful response graphs with Fcutoff=16kHz, beta=.1, length=257 and beta=.9, length=501. Per his page, if you want to run the input audio faster, you'll want to increase the length of the FIR filter by the same ratio.
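To illustrate why the coefficients must be designed at the input rate, here is a windowed-sinc low-pass in Python. This is a Hamming-windowed sinc, not the raised-cosine design from the tool linked above; the 257 taps and 16 kHz cutoff are borrowed from this post purely as example values, and `windowed_sinc_lowpass` is a hypothetical name.

```python
import math


def windowed_sinc_lowpass(num_taps, cutoff_hz, sample_rate_hz):
    """Linear-phase low-pass prototype: ideal sinc truncated by a Hamming window.

    The cutoff is normalised to the INPUT sample rate the coefficients run at.
    """
    fc = cutoff_hz / sample_rate_hz          # cutoff in cycles per input sample
    mid = (num_taps - 1) / 2
    h = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        h.append(ideal * window)
    gain = sum(h)
    return [c / gain for c in h]             # normalise to unity gain at DC


h = windowed_sinc_lowpass(257, 16000, 236.25e6 / 11 / 12)
print(len(h), max(h))
```

The transition band of a windowed sinc is roughly proportional to sample_rate_hz / num_taps, so feeding the same design audio at a higher rate widens the transition unless the tap count grows by the same factor, consistent with the advice above.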

In terms of decimation ratios relative to S/PDIF rates, I see ~1% errors for 1789773Hz/41 -> 44.1kHz, 1789773Hz/37 -> 48kHz, and a 0.1% error for 1789773Hz/56 -> 32kHz (although that would need a different cutoff frequency for the filter).
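The quoted errors can be reproduced with a few lines of Python, assuming the NTSC CPU clock of 236.25 MHz / 11 / 12; `nearest_ratio` and `rate_error` are hypothetical helpers.

```python
CPU_HZ = 236.25e6 / 11 / 12        # NTSC CPU/APU clock, about 1789772.7 Hz


def nearest_ratio(target_hz):
    """Integer decimation ratio nearest to CPU_HZ / target_hz."""
    return round(CPU_HZ / target_hz)


def rate_error(ratio, target_hz):
    """Relative error of CPU_HZ / ratio versus target_hz (negative = too slow)."""
    return (CPU_HZ / ratio - target_hz) / target_hz


for target in (44100, 48000, 32000):
    r = nearest_ratio(target)
    print(f"{target} Hz: ratio {r}, error {rate_error(r, target):+.3%}")
```

This gives roughly -1.0% for 41 -> 44.1 kHz, +0.8% for 37 -> 48 kHz and -0.12% for 56 -> 32 kHz.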

One last anecdote that may be helpful: on the NES's audio output stage, there is a first-order analog lowpass at 14kHz.
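If you want to model that analog stage digitally, one common approximation is a one-pole IIR whose coefficient comes from the RC time constant. This is an assumption about how one might model it, not a description of any emulator in this thread; `one_pole_lowpass` is a hypothetical name.

```python
import math


def one_pole_lowpass(x, cutoff_hz, sample_rate_hz):
    """Discrete one-pole (first-order IIR) approximation of an RC low-pass:
    y[n] = y[n-1] + a * (x[n] - y[n-1]), with a = 1 - exp(-2*pi*fc/fs)."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    y, out = 0.0, []
    for sample in x:
        y += a * (sample - y)
        out.append(y)
    return out


step = one_pole_lowpass([1.0] * 100, 14000, 236.25e6 / 11 / 12)
print(round(step[19], 3))
```

At the 1.79 MHz APU rate, a = 1 - exp(-2*pi*14000/1789772.7) is about 0.048, so the step response reaches about 63% after roughly 20 samples, i.e. one RC time constant.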

by on (#82434)
The coefficients at http://pastebin.com/raw.php?i=SgwVDLjL can be used with the algorithm I described earlier, with 256 multiplies per output sample.

Phases: 12, Output rate: 44101.176031, 487 12

by on (#82552)
lidnariq wrote:
You only need to run the FIR filter for each OUTPUT sample. After all, if the decimation just throws away the results for most of the previous FIR results, why calculate them at all?

Hello all! I have an awesome update! What lidnariq described above is something that I had never learned/known about FIR filters before. But after I read it and it made sense to me I ventured to go and try to figure out how to do it. And I did!

Lidnariq has previously referred to his filter description as a "unified FIR-and-decimate". I have come to learn that these types of filters are also very often called "polyphase" FIRs. And dang are these things freakin _awesome_. You can implement insanely efficient FIR filters with very little resource usage.

I still haven't figured out exactly how Kevtris implemented his but I imagine it was some type of polyphase filter. But even if it was, his description still does not make complete sense to me nor can I see how it would work properly given the clock rates he mentions. But regardless, I don't really care anymore at this point since my new FIR implementation is so efficient. I can now implement a 512-tap FIR with only 32 multiplies! :-o And if I cared to increase the filter's operating clock frequency I could do it with even fewer. Not only that but the frequency of the output samples from the FIR is already decimated to the exact frequency that I need. Rockin!

I would like to thank everyone who helped me and also thank them for their persistence in trying to beat this into my noggin. :) I have learned at least twice what I originally knew about FIR filter implementations.

Here are some output samples from Solstice (noise channel only) with my new polyphase FIR filter in case you're interested. It's coolest if you DL them all and then compare them to one another - it's really neat to hear how the increasing number of taps steadily improves the aliasing and pitch.

- Solstice (Noise Channel Only) 16-tap
- Solstice (Noise Channel Only) 32-tap
- Solstice (Noise Channel Only) 64-tap
- Solstice (Noise Channel Only) 128-tap
- Solstice (Noise Channel Only) 256-tap
- Solstice (Noise Channel Only) 512-tap

Thanks again to everyone!

Jonathon :)

by on (#82556)
The pitch changing is very suspicious, and sounds like something else is up.
I'll go whip up a version without any anti-aliasing to see if has the same problem.

edit: Looks like the pitch change with aliasing is correct.

by on (#82589)
Here I ask a (possibly stupid) question: why not use an IIR filter? (Is stability really an issue here?)

by on (#82591)
FIR is easier to parallelize, among other things. (Read here.)