Force Blank timing issue - SNES vs. higan

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
Force Blank timing issue - SNES vs. higan
by on (#136202)
I've been using a single interrupt to extend VBlank. Once the stuff that needs doing is done, I poll the V-counter until it hits the scanline I want, and then poll the H-counter until it passes the dot I want. The idea is to turn on the screen in between when the last blank scanline finishes and when the first non-blank scanline starts.

Code:
; wait for the right moment to turn the screen back on:
   rep #$20          ; 16-bit accumulator
   lda #$2100        ; set direct page to PPU regs
   tcd
blankline:
   sep #$20          ; 8-bit accumulator
   lda $37           ; latch H/V counter
   lda $3D           ; read V-position low byte
   xba               ; flip bytes in accumulator
   lda $3D           ; read V-position high byte
   and #$01          ; discard upper 7 bits
   xba               ; flip accumulator bytes again
   rep #$20          ; 16-bit accumulator
   cmp #$0004        ; check for last blank scanline
   bne blankline     ; repeat if not there yet
blankdot:
   sep #$20          ; 8-bit accumulator
   lda $37           ; latch H/V counter
   lda $3C           ; read H-position low byte
   xba
   lda $3C           ; read H-position high byte
   and #$01          ; discard upper 7 bits
   xba
   rep #$20          ; 16-bit accumulator
   sbc #$00D7        ; check if past end of last blank scanline (minus overhead to finish loop and turn screen on)
   bmi blankdot      ; repeat if not there yet
   
   sep #$20          ; 8-bit accumulator
   lda $0010.w       ; turn the screen back on (brightness stored in RAM)
   sta $2100
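
The two pieces of arithmetic in the loop above can be modelled off-target. The following is a minimal Python sketch, not the poster's code: the function names are hypothetical, and it assumes binary (non-decimal) mode and a 16-bit accumulator at the SBC. It shows how the two successive reads of the counter register combine into a 9-bit position, and how the SBC/BMI test behaves for each value of the incoming carry flag, since the snippet has no SEC before the SBC.

```python
from typing import Tuple


def counter_value(first_read: int, second_read: int) -> int:
    """Combine two successive 8-bit reads of a counter register
    (low byte first, then high byte) into a 9-bit position.
    Only bit 0 of the second read is kept, as in the AND #$01 above."""
    return ((second_read & 0x01) << 8) | (first_read & 0xFF)


def sbc16(a: int, operand: int, carry: int) -> Tuple[int, int]:
    """16-bit binary-mode SBC: A - operand - (1 - carry).
    Returns (result, carry_out); carry_out is 1 when no borrow occurred."""
    diff = a - operand - (1 - carry)
    return diff & 0xFFFF, 1 if diff >= 0 else 0


def loop_repeats(dot: int, carry: int) -> Tuple[bool, int]:
    """Model 'sbc #$00D7 / bmi blankdot': returns (branch_taken, carry_out).
    BMI tests bit 15 of the 16-bit result."""
    result, carry_out = sbc16(dot, 0x00D7, carry)
    return bool(result & 0x8000), carry_out


if __name__ == "__main__":
    print(counter_value(0x04, 0xFE))   # upper bits of the second read are ignored
    print(loop_repeats(215, 1))        # carry set: exits at dot 215
    print(loop_repeats(215, 0))        # carry clear: 215 still loops
```

Under this model the carry is set on the first pass (the preceding CMP matched) and clear after any pass that branched back, so later passes effectively compare against 216 rather than 215. Whether that one-dot shift matters for the behaviour described below is not established here.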

I know I could simply toggle the IRQ and use it twice, and I suspect it would be faster as well as more accurate, but I encountered something very odd with this method that I think might be relevant.

In higan accuracy, the above dot target (215) is just far enough along that in the context of the rest of the code, the screen always gets turned on during HBlank, and no artifacts are seen. I can move it as far forward as 221 before it starts cutting into the next scanline.

On my real SNES, however, while 214 is not quite far enough along to eliminate pixel flutter at the end of the last blank line (and even this happens more than in higan), it is barely far enough along that the screen sometimes isn't turned on until several pixels into the first active line. Which means I can't use this method at all; it apparently has negative wiggle room.

(It worked fine before, but I'm in the middle of adding a lot of functionality, and apparently it disturbs the loop start timing enough to render this technique non-viable. Unless there's a bug I'm missing...)

I'm using all eight HDMA channels, but the screen is de-blanked between scanlines 4 and 5, and none of the HDMA channels have anything to do between line 0 and line 39 or so.

Is this a known issue with higan? I do seem to recall the mode change taking effect at a slightly different horizontal position vs. a real SNES...
Re: Force Blank timing issue - SNES vs. higan
by on (#136205)
Random thoughts (not fully understanding what's going on in the code there -- well, that is to say, I see what's going on but I don't see any register accesses hence not fully understanding):

1. Is there any general DMA going on at the same time or around the time of any of this? I ask because, well, see the dev manual, section titled "Documented Problems". I'm not sure if this is relevant to the situation or not.

2. Could this at all be worked around by using $4207/4208 + $4209/420a + $4211 + bits 5/4 of $4200 to get more precision, rather than sitting around in a loop?
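
As a rough illustration of suggestion 2, the register writes for a combined H/V IRQ can be listed out. This is a minimal Python sketch that only computes (address, value) pairs; the function name and the `other_bits` parameter are hypothetical, and the ordering (timers first, enable last) is an assumption rather than a documented requirement. It assumes the usual layout: $4207/$4208 hold a 9-bit H target, $4209/$420A a 9-bit V target, and bits 5 and 4 of $4200 enable the V and H comparisons.

```python
def hv_irq_writes(h_dot, v_line, other_bits=0x00):
    """Return the CPU register writes that would arm an IRQ at (h_dot, v_line).

    other_bits carries any unrelated $4200 bits the program wants kept
    (for example bit 0, auto-joypad read); bits 5/4 are forced on here.
    """
    if not (0 <= h_dot <= 0x1FF and 0 <= v_line <= 0x1FF):
        raise ValueError("H and V targets are 9-bit values")
    enable = (other_bits & 0xCF) | 0x30   # set bits 5 and 4, keep the rest
    return [
        (0x4207, h_dot & 0xFF),    # H target, low byte
        (0x4208, h_dot >> 8),      # H target, bit 8
        (0x4209, v_line & 0xFF),   # V target, low byte
        (0x420A, v_line >> 8),     # V target, bit 8
        (0x4200, enable),          # enable H+V IRQ
    ]


if __name__ == "__main__":
    for addr, value in hv_irq_writes(215, 4, other_bits=0x01):
        print(f"${addr:04X} <- ${value:02X}")
```

The IRQ handler would then read $4211 to acknowledge the interrupt before returning; that step is not modelled here.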
Re: Force Blank timing issue - SNES vs. higan
by on (#136208)
koitsu wrote:
1. Is there any general DMA going on at the same time or around the time of any of this? I ask because, well, see the dev manual, section titled "Documented Problems".

Could you summarize in your own words?
Re: Force Blank timing issue - SNES vs. higan
by on (#136209)
koitsu wrote:
Random thoughts (not fully understanding what's going on in the code there -- well, that is to say, I see what's going on but I don't see any register accesses hence not fully understanding):

1. Is there any general DMA going on at the same time or around the time of any of this? I ask because, well, see the dev manual, section titled "Documented Problems". I'm not sure if this is relevant to the situation or not.

No. General-purpose DMA is over by the time this loop is entered, and since I'm only refreshing CGRAM and OAM at the moment, it should be over well before HDMA needs to start up; in fact I explicitly turned HDMA off during the DMA period. And I can't use DMA outside my VBlank routine, because HDMA uses all eight channels.

Still, this makes me realize that I should be very careful with how I handle the gameplay screen, as the DMA requirements are quite substantial and the extended blanking period is much larger. I don't want to write a game that doesn't work on a rev. 1 CPU, because unlike with the SNES Jr. it's very hard for a normal user to tell the difference.

Quote:
2. Could this at all be worked around by using $4207/4208 + $4209/420a + $4211 + bits 5/4 of $4200 to get more precision, rather than sitting around in a loop?

Yes. I actually mentioned the possibility right under my code snippet, but when I originally wrote this I was trying to limit myself to one IRQ per frame and at the same time learn how the H/V latch and read functionality worked. (I still think what I've done seems unnecessarily fiddly. Is there really no better way to get that 9-bit value out?)

I will probably end up changing it so the IRQ toggles itself between the pseudo-NMI routine and a short 'turn on the screen and return' deal. Mostly I was wondering about the apparent timing difference between the real system and the most accurate known emulator...

...

I wonder whether the Super Everdrive OS could have disturbed the timing simply by virtue of running before the actual ROM without a reset in between, resulting in the observed discrepancy vs. higan...? But if that were the case, I'd expect a much smaller discrepancy, since the timing is refreshed each frame by the IRQ, and IIUC the IRQ should have no more than a pixel or so of jitter when it interrupts a WAI...

Could this have anything to do with the fact that the artifacting on the left hand side of the first active scanline looks like it comes in 8-pixel chunks on the real system, but definitely doesn't in higan? Or is this just a coincidence (the artifacting on the right side of the last blank scanline isn't in 8-pixel steps on either platform)?
Re: Force Blank timing issue - SNES vs. higan
by on (#136210)
tepples wrote:
Could you summarize in your own words?

Nope, since there is a pair of diagrams that makes it clearer. Look what I found on the web!

But thankfully this isn't the situation that 93143 is running into. I think his "pseudo-NMI vs. screen-on-and-return" methodology sounds best for this situation (easy to implement). The timing concern, though, is still interesting. byuu et al probably have ideas, and I'm curious to know what they might be.
Re: Force Blank timing issue - SNES vs. higan
by on (#136218)
> Is this a known issue with higan?

I've already told you in the other thread that my dot-based renderer is still in the guess-work phase.

It's the first (and so far, only publicly available) dot-based SNES PPU renderer. Everything else public renders the entire line immediately.

Unfortunately, for the two years it existed and I was in a position to do a lot of research and improvements, nobody was interested in what I was working on.

Now that I'm backlogged on 100 dependencies that I have to resolve before I can continue to work on higan further, I'm getting almost weekly reports about rendering issues (this, Smash It demo mosaic, Anthrox split-screen, etc.)

So ... sorry. Yes, it's buggy. Because it's new, and hasn't had much help from anyone else. Yes, you're going to keep finding these bugs if you are trying to do mid-scanline effects. It's going to be at least 6-18 months before I am able to work on higan again. I'm really that backlogged.

> Mostly I was wondering about the apparent timing difference between the real system and the most accurate known emulator...

higan is definitely the most accurate emulator by orders of magnitude, but you have to understand that this is not something to praise higan for: it's a testament to how terribly awful everything else is.

We still have a very, very long way to go in SNES emulation.
Re: Force Blank timing issue - SNES vs. higan
by on (#136234)
byuu wrote:
I've already told you in the other thread that my dot-based renderer is still in the guess-work phase.

Fair enough.

Quote:
It's going to be at least 6-18 months before I am able to work on higan again. I'm really that backlogged.

Don't let me pressure you. I've got enough going on myself (and am new enough at SNES programming) that I doubt I'll be able to do much more than noodle around for at least that long...
Re: Force Blank timing issue - SNES vs. higan
by on (#136251)
Probably a stupid supposition, as I'm not a SNES coder and I'm not sure how the HV counter runs, but can't you miss the #$0004 HV value in your initial wait loop? At least on the Mega Drive we would avoid using a BNE conditional jump here, or else we could miss the value and wait several frames before catching it...

Code:
   sep #$20          ; 8-bit accumulator
   lda $37           ; latch H/V counter
   lda $3D           ; read V-position low byte
   xba               ; flip bytes in accumulator
   lda $3D           ; read V-position high byte
   and #$01          ; discard upper 7 bits
   xba               ; flip accumulator bytes again
   rep #$20          ; 16-bit accumulator
   cmp #$0004        ; check for last blank scanline
   bne blankline   


Edit: just realized that you are only testing the V counter here, so there's no way to miss it ;)
About extending VBlank, I believe using an IRQ would be much easier and also faster!
Re: Force Blank timing issue - SNES vs. higan
by on (#136257)
HV IRQ on the Super NES fires even during forced blanking, right? That was the main problem back on the NES: sprite 0 and MMC3's interval timer refused to fire during forced blanking.
Re: Force Blank timing issue - SNES vs. higan
by on (#136263)
93143 wrote:
Could this have anything to do with the fact that the artifacting on the left hand side of the first active scanline looks like it comes in 8-pixel chunks on the real system, but definitely doesn't in higan? Or is this just a coincidence (the artifacting on the right side of the last blank scanline isn't in 8-pixel steps on either platform)?


Higan's dot-based renderer fetches each layer's nametable and pattern data from VRAM on the exact pixel it's first needed, which depends on the horizontal scroll register for that layer. This obviously can't be and isn't how a real SNES PPU works--if all the layers had the same horizontal scroll it would have to fetch them all at once, but it can only fetch one word from VRAM at a time. A real SNES works like the NES does: it fetches nametable data and pattern data in a round-robin sequence regardless of the scroll registers, and the fine horizontal scroll (the low 3 bits of the scroll register) determines how many pixels the data for each layer is buffered before being output.

On the NES, each byte fetch takes two pixel clocks and the sequence is "one nametable byte, one attribute byte, two pattern bytes" every 8 pixels. On the SNES the sequence also loops every 8 pixels, but each word fetch takes one pixel clock and the sequence is a varying mixture of nametable data, offset-per-tile data and pattern data depending on the BG mode. That's the reason the SNES modes are what they are and why offset-per-tile is different in mode 4 than in mode 2 or 6: if you add up the number of layers (= nametable fetches), the number of offset-per-tile tables (two for mode 2/6, one for mode 4) and the number of pattern data fetches (one word per two bitplanes in normal resolution, one word per bitplane in mode 5/6), every mode adds up to exactly 8, except mode 6, which has a wasted cycle. If mode 6 had two 2bpp layers instead of one 4bpp layer it would use all 8 cycles, but Nintendo's engineers evidently decided two 2bpp layers weren't colorful enough and one 4bpp layer was more useful (even so, I'm not sure if any commercial game uses mode 6).
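
The bandwidth sum described above can be checked with a few lines of arithmetic. This is a minimal Python sketch: the layer depths per mode are the standard BG mode layouts, the offset-per-tile table counts and words-per-bitplane rules are taken from the paragraph above, and the names `BG_MODES` and `fetches_per_8_pixels` are made up for illustration. It does not say anything about the order of fetches, only their count.

```python
# mode: (bits per pixel of each BG layer, offset-per-tile tables, hires?)
BG_MODES = {
    0: ((2, 2, 2, 2), 0, False),
    1: ((4, 4, 2),    0, False),
    2: ((4, 4),       2, False),
    3: ((8, 4),       0, False),
    4: ((8, 2),       1, False),
    5: ((4, 2),       0, True),
    6: ((4,),         2, True),
}


def fetches_per_8_pixels(mode):
    """Count VRAM word fetches per 8-pixel group for one BG mode."""
    layer_bpp, opt_tables, hires = BG_MODES[mode]
    nametable_fetches = len(layer_bpp)           # one per layer
    # one word per two bitplanes normally, one word per bitplane in hires
    pattern_fetches = sum(bpp if hires else bpp // 2 for bpp in layer_bpp)
    return nametable_fetches + opt_tables + pattern_fetches


if __name__ == "__main__":
    for mode in sorted(BG_MODES):
        print(mode, fetches_per_8_pixels(mode))
```

Modes 0 through 5 come out to 8 and mode 6 to 7, matching the "wasted cycle" remark.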

I have a plan for determining what the exact sequence and timing is for each mode by using non-mode-7 EXTBG to spy on what the PPU is fetching, but I don't know when I'll get around to it or whether it'll even work on hardware (I'm not sure if it's possible to display the EXTBG in all modes).

My idea is:

Set up VRAM so that each fetch type returns a different, constant value (i.e. every word in a particular layer's nametable is identical and points to a solid colored tile)

Set up the palette so that each of those values shows up as a different, contrasting color when interpreted as EXTBG data.

Set the TM register so that only the EXTBG layer is visible.

Set the mosaic size on the EXTBG layer to 9 pixels to make the output easier to see.

The main obstacle is, as I said, that I'm not sure exactly how EXTBG works in non-mode-7. Is it always layer 2, or is it the layer after the last "real" layer? (the latter seems more useful if EXTBG really was originally designed to mix in layers generated by expansion chips) If it's always layer 2, does it replace the real layer 2, or is it mixed with it somehow?
Re: Force Blank timing issue - SNES vs. higan
by on (#136267)
Mainly for AWJ: I'm under the impression EXTBG only applies (i.e. is only visible/usable) in mode 7. The only time EXTBG is mentioned, other than in the "mode capabilities chart", is within the "BG SC data (mode 7)" section. Other docs I find online hint at the same thing.
Re: Force Blank timing issue - SNES vs. higan
by on (#136269)
koitsu wrote:
Mainly for AWJ: I'm under the impression EXTBG only applies (i.e. is only visible/usable) in mode 7. The only time EXTBG is mentioned, other than in the "mode capabilities chart", is within the "BG SC data (mode 7)" section. Other docs I find online hint at the same thing.


It's only useful for the purpose of displaying game graphics in mode 7, but I'm sure I remember reading somewhere that you can turn it on in other modes and see a layer of garbage pixels reflecting what is on the VRAM data bus.
Re: Force Blank timing issue - SNES vs. higan
by on (#136275)
I cite http://problemkaputt.de/fullsnes.htm for this detail, which might partially explain what's going on...

Quote:
EXTBG is an "external" BG layer (replacing BG2???) enabled via SETINI.6. On the SNES, the 8bit external input is simply shortcut with one half of the PPUs 16bit data bus. So, when using EXTBG in BG Mode 0-6, one will just see garbage. However, in BG Mode 7, it's receiving the same 8bit value as the current BG1 pixel - but, unlike BG1, with bit7 treated as priority bit (and only lower 7bit used as BG2 pixel color).
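
The mode 7 case in that quote amounts to a bit split. A minimal Python sketch, assuming only what the quoted passage states (bit 7 is the priority bit, the low 7 bits are the BG2 pixel colour); the function name is hypothetical:

```python
def extbg_mode7_pixel(bg1_pixel):
    """Split an 8-bit mode 7 BG1 pixel value into (priority, colour)
    the way the quoted text says EXTBG interprets it."""
    priority = bool(bg1_pixel & 0x80)   # bit 7 treated as the priority bit
    colour = bg1_pixel & 0x7F           # lower 7 bits used as the pixel colour
    return priority, colour


if __name__ == "__main__":
    print(extbg_mode7_pixel(0x85))
    print(extbg_mode7_pixel(0x7F))
```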
Re: Force Blank timing issue - SNES vs. higan
by on (#136305)
> Don't let me pressure you. I've got enough going on myself (and am new enough at SNES programming) that I doubt I'll be able to do much more than noodle around for at least that long...

It's all good. I'm happy that you're digging out problems, and we can add them to the list.

But I'd hate for you to be under the impression that higan is a perfect emulator or anything, or that I'll be able to help out in fixing things quickly like I used to. As long as you're okay with that, then by all means, keep up the good work (at your own pace, of course) :D

> This obviously can't be and isn't how a real SNES PPU works

I agree. And I must say, you sound a lot like TRAC and anomie (which is a very high compliment, if it wasn't clear), in that you have a lot of the theory of operations down very well. More so than I do.

The problem I find is that turning the theory into practice is a lot more difficult than it sounds.

There are definitely a lot of design decisions I don't strongly care for, but they were made because all of my attempts to do things differently didn't really pan out. It's extremely challenging to model massively parallel operations into single threads. The PPU is absolutely nothing like a computer program running. So despite understanding the bandwidth limitations and interleave patterns, turning it into working C++ code has eluded me thus far.

I am certainly hoping you'll be able to help out some more in this regard, and I would be extremely grateful for any additional insight you could provide. I'm still very appreciative of the awesome work on the hires color blending.

> I'm under the impression EXTBG only applies (i.e. is only visible/usable) in mode 7

nocash has said that it works outside of mode 7. I have not actually attempted any tests to see if you can in fact create visible artifacts by attempting to use it, but I trust him on this.
Re: Force Blank timing issue - SNES vs. higan
by on (#136400)
Well, I got it working. (Not just the VBlank thing; the whole code.)

It's a mockup of the title screen and main menu for the shooter I'm going to try to port. VRAM and OAM are essentially full, all 8 HDMA channels are busy making the most of the CGRAM, both mask windows are in use, a good number of the sprites use add+half transparency, and my slightly extended VBlank (using a pair of interrupts now, rather than a single one with H/V polling) is almost entirely occupied by DMA and related operations, at least in certain frames.

(Protip: If your IRQ might be interrupting a subroutine that changes the data bank, it should change it back before doing anything else...)

The CPU is so busy running my inefficient code rendering a ~5 kB chunk of the background (because I ran out of VRAM and OAM) and recalculating 691 colour entries (the HDMA tables and most of CGRAM) that the animation chugs along at 12 frames per second. I could have hit 20 or 30 with a lot more precomputed data in ROM and maybe some smarter coding, but I plan to have the Super FX handle all that stuff in the real thing, and if there are two things that chip is good at, they are bitplane blitting and multiplication. I figure I can get 60 fps with the Super FX.

Peak colour count on screen is somewhere north of 800. The program runs perfectly on real hardware, and on every emulator I tried... except no$sns, which doesn't respond to controller input, and also seems to mess up the HDMA slightly, resulting in garbage pixels in the middle of the screen. (...I guess that's not really helpful without the code, is it?)

...

The only thing that still bugs me (aside from the horrible brittle mess that is this code) is the fact that I eliminated a sporadic, timing-sensitive, apparently data-sensitive (&?*#@$!) colour transform glitch by removing the direct page reset from the start of the IRQ. I hate it when something works without me knowing why, especially when it involves not doing something that I could very well need to do in the future...
EDIT: ARM9 figured it out. Really basic proper IRQ usage - I ended up clobbering B, but only with the DP reset; everything else was pure 8-bit. That's why removing the DP reset fixed the problem (obviously if any of my code had changed DP this wouldn't have been a good idea).
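
The B-clobbering problem in the EDIT can be illustrated with a toy model of the accumulator. This is a minimal Python sketch of a hypothetical handler shape, not the actual IRQ code from this project: it assumes the handler saves and restores the accumulator with PHA/PLA, does its real work in 8-bit mode, and optionally resets the direct page with a 16-bit load followed by TCD.

```python
def run_irq(c_before, save_16bit, reset_dp):
    """Return the 16-bit accumulator (B in the high byte, A in the low byte)
    as seen by the interrupted code after the handler returns."""
    c = c_before & 0xFFFF
    # PHA pushes two bytes in 16-bit mode, but only A (low byte) in 8-bit mode.
    saved = c if save_16bit else (c & 0x00FF)
    if reset_dp:
        # e.g. REP #$20 / LDA #$0000 / TCD: the 16-bit load overwrites B as well.
        c = 0x0000
    # 8-bit work inside the handler only changes the low byte.
    c = (c & 0xFF00) | 0x42
    # PLA restores whatever width was pushed.
    if save_16bit:
        c = saved
    else:
        c = (c & 0xFF00) | saved
    return c


if __name__ == "__main__":
    print(hex(run_irq(0x1234, save_16bit=False, reset_dp=True)))
    print(hex(run_irq(0x1234, save_16bit=False, reset_dp=False)))
    print(hex(run_irq(0x1234, save_16bit=True, reset_dp=True)))
```

In this model, an 8-bit save combined with the DP reset loses B, while either dropping the reset or saving all 16 bits preserves it, which is consistent with the fix described above.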

Even so, it does work. I have demonstrated to myself that this part of the game can be satisfactorily rendered on the Super NES.

...

Interesting technical discussion too. My initial question was kinda soggy, but it seems to have produced useful dialogue...
Re: Force Blank timing issue - SNES vs. higan
by on (#136435)
Your last post sounds very intriguing and exciting! It seems you are really pushing the SNES to its limit; I'm impatient to see the result =)
Re: Force Blank timing issue - SNES vs. higan
by on (#136455)
It makes me wonder what in the world would use that many colors?
Re: Force Blank timing issue - SNES vs. higan
by on (#136457)
I think he was porting a game to the SNES? The source image could have been true color for all we know.
Re: Force Blank timing issue - SNES vs. higan
by on (#136566)
Yeah, the original background image had a lot of colour depth. Rescaled to 256x216 at 15bpp, it still has a couple thousand colours. But it looks almost identical with 548 colours, whereas with 212 the decline in quality is more evident. So, HDMA.

There are 44 colours reserved for sprites. The second big jump in colour count, to over 800, is a result of translucent sprites overlapping the most chromatically diverse area of the image.

Stef wrote:
Your last post sounds very intriguing and exciting! It seems you are really pushing the SNES to its limit; I'm impatient to see the result =)

Well, that's certainly pleasant to hear... But you may have to wait a while - seven months ago I knew nothing, and I'm still very much in the learning phase. And since I'm trying to finish a degree this year, most of my mental resources have to be allocated elsewhere.

Once I start coding the game for real, I might start posting WIPs, but right now I can't guarantee anything, which is why I've been keeping the project details close. (Maybe I should have kept quiet about this test too, but I was so pleased with myself for getting the silly little thing running that I didn't think of that...)

tepples wrote:
HV IRQ on the Super NES fires even during forced blanking, right? That was the main problem back on the NES: sprite 0 and MMC3's interval timer refused to fire during forced blanking.

I forgot about this question. Yes, evidently it does, since the new method works fine.

The more I hear about NES programming, the more glad I am that I'm starting with the Super NES. If I had a plan for a NES game I'd obviously feel differently, but there are enough weird things about that old workhorse that didn't carry over that if I had attempted to use it as a trainer I might have had problems making the transition... This game is going to demand everything the SNES can give it; I figure it's better to go in without preconceptions.