Any research done on writing to OAM during active display?

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
by on (#182816)
It's really killing me as to whether I should go with 8x8 and 16x16 or 16x16 and 32x32 sized sprites, because I want to minimize drop out (per line), but what you can do overall is severely limited and often forces you to sacrifice a BG layer when you otherwise wouldn't have to. I know you can't write to OAM during HBlank (as that's when the data is being pulled to draw sprites? I'm not 100% sure) and that you can't do it during active display because of some sort of internal processing that puts whatever you're trying to send into HiOAM. What I want to know, though, is exactly how this information ends up in HiOAM, and where in HiOAM it ends up. Would there be even a slight possibility of updating OAM in an unorthodox way, by sending data somewhere else at a specific time or something along those lines? I think this has been talked about, but I don't know if anyone has ever tried it.
Re: Any research done on writing to oam durring active displ
by on (#182824)
All I can go off of is Anomie's timing doc and what the official documentation says.

Anomie's timing doc explains where OAM comes into play during HBlank (section "DETAILED RENDERER TIMING"), as well as the quirk about the internal OAM address being reset (normally set by $2102/2103) (see section "OAM RESET"; byuu provided this info).

Everything I've read in official documentation states clearly that updating OAM data (through $2104) can only happen during VBlank or forced blank. The NES is the same way. Outside of VBlank and HBlank, the electron gun is actively "drawing data" being fed by the PPU (from both OAM and VRAM) (and actually, it's even before VBlank starts in one regard, re: what the PPU does in preparation for the first rendered scanline -- again, see "DETAILED RENDERER TIMING"). HBlank doesn't give you enough time cycle-wise to do anything with OAM even if you could.

The only thing I've seen refute this is here: http://problemkaputt.de/fullsnes.htm#sn ... ryaccesses -- specifically the last 2 lines in the "PPU OAM load" section (I believe the part about Mario Kart using forced blanking refers to the blank/empty lines that split the two sections of the screen (top (game) and bottom (map))). You still wouldn't be left with much time at all, and certainly not enough to do large-scale OAM DMA transfers.

Please explain what you mean by "minimise drop out per line". *If* you're talking about the 32 sprites per scanline limitation, then you need to design your game with that in mind; otherwise see chapter 20.3 ("Priority Order Shifting") for how to deal with changing the OBJ priorities so that the "drop out" is only for 1 frame. OBJ size doesn't have any bearing on this situation -- refer to chapter 20.1 and 20.2 (particularly 20.2, "35's Time Over") for why (also see $213e in the official documentation), or for an alternate reading (though very, very badly formatted) refer to https://wiki.superfamicom.org/snes/show/Registers and bit 7 of register $213e. The point I'm trying to convey here is that it doesn't matter if you use 16- or 32-pixel-wide sprites -- the PPU internally understands these to be 8x8 tiles (in effect), and thus they are subject to the same limitation. Amusingly, this subject has already come up in the past -- raised by none other than you: viewtopic.php?f=12&t=13081 :D
Re: Any research done on writing to oam durring active displ
by on (#182838)
koitsu wrote:
Outside of VBlank and HBlank, the electron gun is actively "drawing data" being fed by the PPU (from both OAM and VRAM)

(WARNING: More than likely nonsense, especially after reading the Fullsnes document) I thought I heard that during HBlank, a line buffer is filled out in the PPU (the sprite "layer") so that would be the only time OAM would be clearly off limits. I don't know why you wouldn't be able to write to OAM during active display, even though it's said you can't. I doubt it said anything about being able to write to HiOAM outside of VBlank, which Uniracers apparently does.

Reading that, is the "1364 master cycles of a scanline" including HBlank? I'm assuming so, according to this: "aside from the 256 known/used dot-cycles, there may (or may not) be up to 84 unused dot-cycles... possibly allowing to change OAM during Hblank(?)." Going along with what you said, Hblank goes after (not before) active display, making the first off screen scanline necessary? This seems to further prove what I thought earlier as a bunch of nonsense.

koitsu wrote:
You still wouldn't be left with much time at all, and certainly not enough to do large-scale OAM DMA transfers.

I wasn't expecting (or needing) much. Even if you could only update one sprite per line, that would still be a big help.

koitsu wrote:
Please explain what you mean by "minimise drop out per line". *If* you're talking about the 32 sprites per scanline limitation,

I meant the 34 tile per line limitation. The chances of ever running into the 32 sprite per line limitation before the 34 tile one are near nonexistent.
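To put rough numbers on that, here is a minimal Python sketch. It assumes the two per-line limits mentioned in this thread (32 sprites and 34 8x1 tile slivers) and assumes every sprite on the line is fully on screen; the function name is made up for illustration.

```python
# Per-scanline sprite limits as discussed in this thread.
MAX_SPRITES_PER_LINE = 32
MAX_SLIVERS_PER_LINE = 34  # 8x1 pixel slivers ("tiles") per line


def sprites_before_dropout(sprite_width):
    """Return how many same-width sprites fit on one scanline.

    Assumption: every sprite is fully on screen, so a sprite that is
    sprite_width pixels wide always costs sprite_width // 8 slivers.
    """
    slivers_each = sprite_width // 8
    by_sliver_limit = MAX_SLIVERS_PER_LINE // slivers_each
    return min(MAX_SPRITES_PER_LINE, by_sliver_limit)


for width in (8, 16, 32, 64):
    print(width, sprites_before_dropout(width))
```

Under those assumptions only 8-pixel-wide sprites reach the 32-sprite limit first; anything wider runs out of slivers well before that.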

I'm glad you responded though. It gives me some hope through those potentially 84 unused "dot-cycles". It seems (but probably isn't) easy to try and test. If there are 1364 master cycles in a scanline, then how many are used toward active display and how many are used toward HBlank?
Re: Any research done on writing to oam durring active displ
by on (#182841)
Here's how the 1364 qpels (quarter pixels) in a scanline are split up, as far as I can tell:

4 are for chroma alignment ("long dots")
64 are for preroll (priming the pixel pipeline for background scrolling in modes 0 through 6)
1024 are picture (and presumably finding which sprites cross the next scanline)
272 are for fetching pattern data for sprites on the next scanline

In each pixel (4 qpels), the PPU can read one low byte and one high byte from VRAM. This means it can fetch one 8x1 pixel sliver of a sprite in 8 qpels, hence 272/8 = 34 slivers per line.
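The arithmetic above can be checked with a few lines of Python. This is only a restatement of the figures in this post; the dictionary keys and constant names are made up for illustration.

```python
# Scanline budget in master cycles ("qpels"), using the figures from this post.
QPELS_PER_DOT = 4      # one pixel is 4 qpels
QPELS_PER_SLIVER = 8   # one 8x1 sprite sliver takes 8 qpels to fetch

budget = {
    "chroma alignment": 4,
    "preroll": 64,
    "picture": 1024,
    "sprite pattern fetch": 272,
}

total = sum(budget.values())
picture_dots = budget["picture"] // QPELS_PER_DOT
slivers_per_line = budget["sprite pattern fetch"] // QPELS_PER_SLIVER

print("total qpels per line:", total)
print("picture dots:", picture_dots)
print("sprite slivers per line:", slivers_per_line)
```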
Re: Any research done on writing to oam durring active displ
by on (#182843)
It seems to me that even if there's room in HBlank for OAM writes as nocash describes, it could very well be only on lines with not very many sprites to render, which would limit the practical applicability. Even if that's not true, you apparently can't rewrite the OAM address, so you'd have to rely on it auto-incrementing from the top of HiOAM, which we aren't sure it does. You could force blank and see if that allows you to rewrite the address, but then you'd absolutely be relying on there being few or no sprites (or perhaps just no sprites) on the next line, plus you'd have to re-enable rendering soon enough to not glitch the BG layers... oh, and you can probably forget about using much (if any) HDMA; there's only so much time in HBlank...

If OAM isn't actually locked out during active display (CGRAM isn't), it might be possible to align a DMA precisely enough to piggyback on the PPU's OAM read address to rewrite sprites as the PPU scans them, but that's... really extreme. Even if it worked, it would have to be tested repeatedly on every known model of SNES to make sure it worked reliably. I'd also mention DMA/HDMA conflict, but if your timing is good enough to pull this off you can probably dodge that bug comfortably...

...

Could you not use 8x8 and 32x32 sprites?
Re: Any research done on writing to oam durring active displ
by on (#182856)
93143 wrote:
Could you not use 8x8 and 32x32 sprites?

I've considered it, but ultimately decided it would be even worse. Too many things would be 16x16 for that to work, more than things would be 32x32.

Perhaps I asked this question prematurely, because even if I programmed a test, I don't have a flash cart yet (I'm getting one for Christmas) and I seriously doubt an emulator would be accurate in telling me if it would work or not.

Anyway...

93143 wrote:
It seems to me that even if there's room in HBlank for OAM writes as nocash describes, it could very well be only on lines with not very many sprites to render, which would limit the practical applicability.

If this is the case, do you know if there's a way to stop the SNES from trying to look for sprites to render prematurely (limiting the number of sprites per line, but hopefully not sprite pixels which is more important) to guarantee you can send over data? I thought I heard that sprite multiplexing is done like this on the GBA.

93143 wrote:
Even if that's not true, you apparently can't rewrite the OAM address, so you'd have to rely on it auto-incrementing from the top of HiOAM, which we aren't sure it does.

We aren't sure it auto-increments at all?

93143 wrote:
but that's... really extreme.

How? (I don't really know much about the whole thing.) I do see one problem with this though, and that's DMA during active display, which means you can't use HDMA if you're trying to target a 1/1/1 console.
Re: Any research done on writing to oam durring active displ
by on (#182868)
Espozo wrote:
do you know if there's a way to stop the SNES from trying to look for sprites to render prematurely (limiting the number of sprites per line, but hopefully not sprite pixels which is more important) to guarantee you can send over data?

The problem is the rendering itself, which is done during HBlank. And from fullsnes I get the impression that the behaviour is unknown. If the PPU caches the data it picks up while checking the 128 sprites during the active scanline, OAM should be free during HBlank (but the address doesn't work and cannot be rewritten, according to byuu). If the PPU re-fetches OAM data for the 32 sprites/34 tiles during HBlank, OAM can only be freed by having fewer sprites on the following line (ie: either a fortuitous lull or a reduction in the per-line limits, which is not what you want). However, based on what byuu says I suspect it does cache the data; if this is true, the key question is whether it auto-increments when writing during HBlank.

Quote:
We aren't sure it auto-increments at all?

No, it auto-increments when in VBlank or forced blank. During HBlank, though, the CPU-side OAM address is invalid, and it's not known what the internal behaviour is.

Quote:
93143 wrote:
but that's... really extreme.

How?

DMA alignment is not trivial to cope with (it kicks the timing slightly depending on the length of the most recent instruction cycle and on how long it's been since reset), and if you're running HDMA you have to deal with it happening every line on top of whatever else you're doing... This means that (a) if you're using HDMA you can't rely on having 1324 cycles per line (1364 minus 40 for DRAM refresh), because the HDMA will eat a variable amount, and (b) you can't pick an arbitrary line to do your DMA on, because it will snap to a dot-pair boundary and you have to keep track of the alignment pattern (which flips every two frames) to decide what to write when.

If the OAM requires half-dot timing precision to target a particular byte/word, you'd have to do the writes manually because DMA can't do that. And since IINM there's no way to directly get half-dot position on the CPU side, you'd have to figure out a way to test with actual writes in the middle of an active frame, which may be impossible.

If it requires quarter-dot precision (which I doubt), you're out of luck, because there's no way to do that at all; nothing the S-CPU can do takes an odd number of master cycles...

That's not even getting into DRAM refresh, which seems to have some weirdness around the edges that's different between revisions...

Quote:
I do see one problem with this though, and that's DMA during active display, which means you can't use HDMA if you're trying to target a 1/1/1 console.

Nah, that's not an issue. Probably. The timing requirements for this (assuming it's even possible, which has not been established) would be so stringent that for a programmer capable of pulling it off, failing to avoid the lockup bug as described would be like a veteran bus driver crashing into a house. Now, if there's another way to trigger the bug that isn't known yet, that could give you trouble...
Re: Any research done on writing to oam durring active displ
by on (#182879)
93143 wrote:
ie: either a fortuitous lull or a reduction in the per-line limits, which is not what you want

I don't think a reduction of at least the sprite per line limit would be much of a problem.

I had thought of something unfortunate though, and that's that even if writes to OAM are auto incremented during HBlank, you can't access both sprite tables at the same time.

93143 wrote:
DMA alignment is not trivial to cope with (it kicks the timing slightly depending on the length of the most recent instruction cycle and on how long it's been since reset),

I'd have thought an HIRQ would handle synchronizing, unless the problem is that the DMA transfer needs to be set up before the line starts, so you'd be wasting a whole line waiting, which would be about every line so you wouldn't have any CPU time left over.

...If it isn't evident by now, I have no clue what I'm talking about... :lol:
Re: Any research done on writing to oam durring active displ
by on (#182883)
Espozo wrote:
93143 wrote:
ie: either a fortuitous lull or a reduction in the per-line limits, which is not what you want

I don't think a reduction of at least the sprite per line limit would be much of a problem.

It would be both. If it's reading OAM while rendering, the only way I can think of to stop it is to halt the rendering, which you do with forced blanking.

Quote:
I had thought of something unfortunate though, and that's that even if writes to OAM are auto incremented during HBlank, you can't access both sprite tables at the same time.

Not so bad. All that means is that you can't multiplex small sprites into big ones or vice versa, or move sprites partway off the left-hand side of the screen.

I wonder if there's something I'm not thinking of. This seems too easy, compared to what I thought of the problem last time... not saying it'll work, but if it does it might actually be kinda useful...

Quote:
93143 wrote:
DMA alignment is not trivial to cope with (it kicks the timing slightly depending on the length of the most recent instruction cycle and on how long it's been since reset),

I'd have thought an HIRQ would handle synchronizing

No, an IRQ only has instruction precision, because the CPU finishes what it's doing before responding. Instructions are anywhere from 3 to 18 dots long. This is why my raster split IRQ has an H-counter read and cycle-counted branch, to reduce the range of H-values the BGMODE write can fall within - and I still need to avoid using very long instructions in my main code.

Besides, I'm not talking about that. Even if you have an IRQ handler that reads the position and branches all the way down to the point where you know you're on a particular dot, you still can't align a DMA like that because it will wait between 0.5 and 2 dots before starting, so as to line up with a multiple of 8 master cycles since reset. Notice that a line isn't a multiple of 8 master cycles long - this means that the starting points of a pair of DMA transfers triggered at the exact same horizontal position on two successive lines will be offset horizontally from each other by one dot.
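A toy Python model of that one-dot offset, for illustration only. It assumes every line is exactly 1364 master cycles, assumes the counter starts at zero at the beginning of line 0, and models only the snap to a multiple of 8 master cycles (not the extra fixed overhead, HDMA, or DRAM refresh). The function name is made up.

```python
MASTER_CYCLES_PER_LINE = 1364
MASTER_CYCLES_PER_DOT = 4
DMA_ALIGNMENT = 8  # start snaps to a multiple of 8 master cycles since reset


def dma_start_h(line, trigger_h):
    """H position (in master cycles) where the transfer starts in this toy model."""
    t = line * MASTER_CYCLES_PER_LINE + trigger_h  # absolute time since "reset"
    wait = (-t) % DMA_ALIGNMENT                    # cycles until the next multiple of 8
    return trigger_h + wait


# Trigger at the same H position on two successive lines and compare.
for trigger_h in (0, 100, 101, 102, 103):
    a = dma_start_h(0, trigger_h)
    b = dma_start_h(1, trigger_h)
    print(trigger_h, a, b, abs(a - b) // MASTER_CYCLES_PER_DOT, "dot(s) apart")
```

Because 1364 leaves a remainder of 4 when divided by 8, the two start points in this model always land 4 master cycles (one dot) apart, whatever the trigger position.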

When a DMA ends, it again waits between 0.5 and 2 dots to align with a multiple of the last instruction cycle before the transfer. This means HDMA can and will kick the timing by what is effectively a random amount if you've got free-running mixed-access code going when it fires...
Re: Any research done on writing to oam durring active displ
by on (#182884)
93143 wrote:
It would be both.

Okay. Yeah, that would render this just about useless.

93143 wrote:
All that means is that you can't multiplex small sprites into big ones or vice versa, or move sprites partway off the left-hand side of the screen.

That first one is actually a major blow. The thing is, 512 only 8x8 sprites wouldn't be much better than the pre existing 8x8 and 16x16 sprite option, unless you're making a bullet hell game or something. Any number higher than that (psychopathicteen said 900) would really be absurd, especially on the CPU side.

93143 wrote:
or move sprites partway off the left-hand side of the screen.

Yeah, this is the "least bad" part, as you could easily cover the area with windows. You could even forgo larger than 256 pixel tilemaps then. It is a little ridiculous to double the amount of space being taken up for the tilemap just for a measly 8 pixels.

93143 wrote:
I wonder if there's something I'm not thinking of. This seems too easy, compared to what I thought of the problem last time...

"too easy" :lol: Well, actually, it is just sending data to OAM at the end of HBlank, right? I don't really think this will work (why would there just be unused cycles?) but I figure it's worth a shot, I just can't do anything toward it at the moment unfortunately. (Even if I did have a flash cart though, I'd just be asking you what to do. :lol:)

Yeah though, DMA during active display looks like it's not really an option, unless you want to go the coprocessor route and fully dedicate the CPU to sending data to the video hardware.
Re: Any research done on writing to oam durring active displ
by on (#182903)
Espozo wrote:
512 only 8x8 sprites wouldn't be much better than the pre existing 8x8 and 16x16 sprite option

Keep in mind that HBlank is only big enough to rewrite about 10 sprites in the best case scenario, and it resets to the end of HiOAM every line. This means that only the bottom 7-10 sprites would be available for multiplexing with this method, meaning you'd still have at least 118 sprites that could be any size and position you desired. Plus there'd be no reason you couldn't divide the multiplexed sprites into small and large segments.

I should try this at some point. If it doesn't auto-increment during HBlank, it would be good to know that as soon as possible.
Re: Any research done on writing to oam durring active displ
by on (#182933)
You can write to OAM at any time. But the writes won't go where you want them to. Same thing for reads.

There's both an internal and external OAM address. During Vblank, the external address is used. That's the one you can set with OAMADDR. During active display, and this includes Hblank, the internal address is used. The internal address is adjusted based on what the PPU is fetching. In other words, it's reading the OAM memory for the purpose of rendering sprites, buffering tiles, etc.
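As a rough illustration of the two-address idea, here is a toy Python model. It is not how higan or the hardware implements it: the class and method names are invented, addresses are plain byte offsets, the low table's word buffering is ignored, and the assumption that a mid-frame write does not advance the internal address is just a guess made for the sketch.

```python
class ToyOAM:
    """Toy model only: OAM with a CPU-side and a PPU-side address."""

    SIZE = 544  # 512-byte low table plus 32-byte high table

    def __init__(self):
        self.mem = bytearray(self.SIZE)
        self.external_addr = 0  # what the CPU sets through OAMADDR
        self.internal_addr = 0  # moved around by the PPU while rendering
        self.rendering = False  # True during active display, HBlank included

    def set_oamaddr(self, addr):
        self.external_addr = addr % self.SIZE

    def ppu_fetch(self, addr):
        # The PPU's own reads drag the internal address along with them.
        self.internal_addr = addr % self.SIZE
        return self.mem[self.internal_addr]

    def cpu_write(self, value):
        if self.rendering:
            # Assumption: lands wherever the PPU last looked, no increment.
            self.mem[self.internal_addr] = value
        else:
            self.mem[self.external_addr] = value
            self.external_addr = (self.external_addr + 1) % self.SIZE


oam = ToyOAM()
oam.set_oamaddr(0)
oam.cpu_write(0x11)      # blanking: goes to byte 0, address auto-increments
oam.rendering = True
oam.ppu_fetch(0x210)     # PPU happens to be reading a HiOAM byte
oam.set_oamaddr(4)
oam.cpu_write(0x33)      # lands at 0x210, not at byte 4
print(hex(oam.mem[0]), hex(oam.mem[4]), hex(oam.mem[0x210]))
```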

I have the basic idea of this emulated in higan, but I never fully tested the seek patterns for really complicated setups like with range-tile over, or a smaller number of actually visible sprites, etc.

Bottom line to you: you can't really get away with this. Uniracers is the only game that does it, and I suspect they got really lucky that it worked at all, due to there only being two sprites on the screen.
Re: Any research done on writing to oam durring active displ
by on (#182982)
byuu wrote:
The internal address is adjusted based on what the PPU is fetching.

Isn't this the "extreme" way 93143 was talking about? I thought the other way sounded too good to be true. :lol:

byuu wrote:
you can't really get away with this.

It seems like you can get away with it if the CPU is doing nothing but passing data to the video hardware (making a coprocessor necessary), like in the "phantom bitmap" demo. Of course, I don't really know what I'm talking about.
Re: Any research done on writing to oam durring active displ
by on (#182985)
Yeah, trying to catch the PPU in the act of accessing a specific byte/word/entry or whatever is pretty insane. Almost certainly harder than what I did with the direct colour demo, and potentially completely unachievable. I only mentioned it because we (as far as I'm aware) don't actually know it's technically impossible.

From where I stand, there's only a slim hope that any kind of significant sprite multiplexing could be reasonably possible on the SNES. If (a) the internal/PPU-side OAM address auto-increments after a CPU-side write during HBlank, and (b) the PPU caches the data it reads from OAM during active display, rather than reading it again during HBlank, then you've got a viable method - unless something else goes wrong. If either of those is not true (or something else goes wrong), the applicability is far more limited, not really worth considering unless you're in a very special situation.

(It's kind of annoying that there's no way to write 4 bytes to the same address with an HDMA channel. That would have been super useful for changing sequential entries in CGRAM, and if changing OAM somehow works it would be even more useful there...)
Re: Any research done on writing to oam durring active displ
by on (#182993)
The only viable trick with OAM mid-frame is a trick that some games use to make an entire background from sprites alone: you can change the tiledata address at the halfway point, but leave all the sprite attribute data alone.

It's not very useful. Yet several games do this for no reason: Megalomania OP, Winter Games main menu, etc.

You might find a use in making a static 256x224 background while using Mode7 as a giant single rotating sprite.
Re: Any research done on writing to oam durring active displ
by on (#182995)
93143 wrote:
If (a) the internal/PPU-side OAM address auto-increments after a CPU-side write during HBlank

If it doesn't, you're only writing to sprite 0's X position? :lol: I kind of wonder why you wouldn't think it would though. What would be the point in changing the behavior?

93143 wrote:
(b) the PPU caches the data it reads from OAM during active display, rather than reading it again during HBlank

You're talking about those potentially unused 84 "dot cycles"? As nice as it would be, I don't see this happening. How could "step 1" in the Fullsnes document take 256 cycles, while steps 2 and 3 do not take a single one?

byuu wrote:
Uniracers is the only game that does it, and I suspect they got really lucky that it worked at all

They sent the data for changing sprite 1 and 2's position in HiOAM just as it was being scanned by coincidence?
Re: Any research done on writing to oam durring active displ
by on (#183028)
Well, I tried it, and no joy. At least, not with the scanline basically full of sprites... It seems the PPU does in fact read OAM during HBlank, because on real hardware the writes are ending up all over the place. In higan, they all go to the highest HiOAM address actually used on the scanline (NOT the top of HiOAM like I thought), which implies the same thing if you think about it in context.

(Take this result with a grain of salt - I didn't do too much double-checking, so I may have screwed something up.)

I'd try to check for address incrementing, but it's late. Maybe tomorrow I'll try on a scanline without any sprites and see what it does...

Espozo wrote:
If it doesn't, you're only writing to sprite 0's X position?

No, it wouldn't even get that far, not even if sprite 125 or higher was on that particular scanline. It would repeatedly write to the last HiOAM byte the PPU looked at (assuming sprite rendering was finished - if not, it would go wherever the PPU was reading at the moment).

Quote:
As nice as it would be, I don't see this happening.

Well, it doesn't. It would have been nice if the PPU had turned out to internally cache the OAM data for the 32 sprites on the scanline, rather than having to read it again during HBlank, but I guess the designers figured there was no point adding extra circuitry just so that OAM could sit idle for 85 dot-cycles per line...

Quote:
They sent the data for changing sprite 1 and 2's position in HiOAM just as it was being scanned by coincidence?

More probably they sent the data after sprite rendering was finished, and the internal address had been left at that byte by the PPU.


byuu wrote:
The only viable trick with OAM mid-frame is a trick that some games use to make an entire background from sprites alone: you can change the tiledata address at the halfway point, but leave all the sprite attribute data alone.

It's not very useful. Yet several games do this for no reason: Megalomania OP, Winter Games main menu, etc.

You might find a use in making a static 256x224 background while using Mode7 as a giant single rotating sprite.

Are you serious? Actual games do that?

I thought it was just that one demo, and it didn't even do it right (used HDMA rather than a mid-line IRQ, thus interrupting rendering and causing glitches)... I was so proud of figuring out the right way to do it, too...

My (glacially proceeding) shmup port uses this trick for a Super FX layer on top of a Mode 7 background, which is why I was investigating it.
Re: Any research done on writing to oam durring active displ
by on (#183031)
93143 wrote:
byuu wrote:
You might find a use in making a static 256x224 background while using Mode7 as a giant single rotating sprite.

Are you serious? Actual games do that?

A static background plane behind a mode 7 main playfield is seen in On the Ball/Cameltry. And about half the bosses in Super Mario World, from when mode 7 was a brand new gimmick, likewise are drawn as a mode 7 layer with sprites for terrain.

Quote:
My (glacially proceeding) shmup port uses this trick for a Super FX layer on top of a Mode 7 background, which is why I was investigating it.

The same technique used in the title screen of Yoshi's Island, correct?
Re: Any research done on writing to oam durring active displ
by on (#183038)
Yeah, but what I believe he's talking about is specifically changing OBSEL partway down the screen, so as to access more than 16 KB of sprite data. I'm pretty sure none of the games you mention do that.
Re: Any research done on writing to oam durring active displ
by on (#183039)
I named two games that do it already. Give those a try and see for yourself.

Both of them had single-line errors in my scanline renderer at one point in time because the timing of when OBSEL is cached is really important with this trick.
Re: Any research done on writing to oam durring active displ
by on (#183043)
byuu wrote:
I named two games that do it already.

I know; I was referring to the games tepples named. I looked up the ones you mentioned on YouTube, but it's hard to tell a sprite layer from a BG layer when Mode 7 isn't involved. I believe you, so I don't think I'll bother pirating them to check.

I suppose it could be interesting to see how they do it - whether they make the change during HBlank or active display...