Drawing, Vblank, and NMI - A doc I whipped up

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.

by Disch on 2009-04-15 (#45683)

I wrote this because "how do I use NMI properly" and related questions seem to come up a lot here. And I was bored and needed a break from my other project.

Enjoy:

http://dischmeister.googlepages.com/nesdrawing.html

by Celius on 2009-04-15 (#45689)

Hey, this is pretty good! I'm glad to see that someone took the time to sit down and explain all this. Good work.

by frantik on 2009-04-15 (#45690)

thanks a lot, that was a great, very in-depth review of all the stuff we should know

the only thing i can think of to improve it would be to give examples of buffer loading routines, but i'm sure there's plenty of those about

by Bregalad on 2009-04-15 (#45692)

It's really a great document; it was about time that someone made the effort to explain all of that clearly. It's really perfect, even I learned something from it.

Oh, and where you use the example of a frame being one hour, it would make even more sense if you made it one day. The NMI is your alarm clock, and VBlank is your breakfast time. You can't make breakfast too long because you have to go to work (that is, the work the game engine has to do for one frame). When you're finished, you go back home and sleep until the next frame. This could also explain slowdowns: if you have so much work to do that you haven't finished by morning, your alarm clock will still ring and you should still have breakfast, and after that you return to your unfinished work (rather than starting a new day's work). When done, you'd be so tired that you'd go straight to bed (even if it's not night yet) and wait until the next morning. Finally, it can explain what happens when you read $2002: it just acknowledges the alarm clock. If you keep checking it all night, you may occasionally fail to hear it ring.
That makes sense in my opinion, but you're free to use it or not if you make another version (the 1 frame = 1 hour example is good enough anyway).

This doc should be uploaded on romhacking.net (if it's not already done).

by Jon on 2009-04-15 (#45695)

Nice doc. Still reading through it, but keep getting interrupted at work. doh!

by Disch on 2009-04-15 (#45697)

I actually had a lot of fun with this, and already have ideas for some other topics. I'm thinking of doing a doc series in the same vein as the C++ FAQ LITE, but for nesdev. You can sort of tell I even drifted into that style midway through the doc (the topics start turning into questions)

Anyway glad you guys liked it. I'll probably redraft this and organize it better. This actually started as a .txt document and then I quickly converted to html (which is why there's some oddities -- like **emphasis** instead of emphasis)

by Hamtaro126 on 2009-04-15 (#45701)

Please fix the "6502 Code" stuff in that document, it's a tiny mess!

other than that, it's a nice document

by Banshaku on 2009-04-15 (#45706)

This could be a nice candidate for the getting started section of the new wiki (if it ever sees the light of day).

Would it be an issue to post the content on it once it becomes available?

by Disch on 2009-04-15 (#45708)

Wouldn't be an issue at all. I'm actually reformatting, reorganizing, and touching it up now. Will be a lot nicer, but will take me a while.

Anyway, yeah, go ahead and use that however you want.

by Banshaku on 2009-04-15 (#45709)

Disch wrote:
Wouldn't be an issue at all. I'm actually reformatting, reorganizing, and touching it up now. Will be a lot nicer, but will take me a while.

Anyway, yeah, go ahead and use that however you want.

Great then. I will be more than happy to mention the source, since I think that even though we're fighting right now over the license for the content, mentioning who made it doesn't hurt (my stance on the matter). I think it's important to at least mention the people who made the effort to offer such nice content in the first place.

I'm reading it at the moment while working, eh, I mean "while taking my break at work" (Of course boss! I'm studying some "new" design pattern) and I like it. I like the example data structure for the buffer (even though we don't have to use it exactly as you described), since as a new programmer on the platform, you don't yet know how to do things unless you see samples like these and then try them yourself, or have experience with a similar platform or style of programming. This makes me think differently about how to do things, and I like it. I never saw a "how to" like that for the NES before and this is great news.

Good work!

by noattack on 2009-04-15 (#45711)

Thanks, Disch. This is great for the slow learners like me.

by MetalSlime on 2009-04-16 (#45722)

awesome document! Thanks a lot Disch.

by Disch on 2009-04-16 (#45734)

http://nesdevhandbook.googlepages.com/index.html

It's a start! We'll see how much I can actually get done before getting bored and moving on. XD

by Bregalad on 2009-04-16 (#45739)

Great! What are the handbook's chapters to come?

I guess what I wrote about raster effects could be a chapter (HTML version with demos).

Other chapters should include hardware, PPU fetches, cart types, etc. Probably also basic algorithms for making a game: collision detection, how to encode maps, basic compression stuff.

by Disch on 2009-04-16 (#45755)

Yeah I was thinking about stuff like that. OAM cycling, compression are things I was already thinking about. I like the idea of touching on collision detection too.

by frantik on 2009-04-16 (#45779)

ooo lots of potential here

instead of doing it all in HTML you might consider doing it wiki-style.. especially if it might end up on the future nesdev wiki

by tokumaru on 2009-04-16 (#45780)

Disch wrote:
OAM cycling, compression are things I was already thinking about. I like the idea of touching on collision detection too.

Although compression and collision detection are completely platform-independent. If you do decide to talk about more general concepts like these, I think you should create a very clear distinction between them and the NES-specific stuff. What you have so far (NMI, etc.) falls into the "guidelines" category, while compression and collision detection are more like "algorithms", IMO.

by tepples on 2009-04-16 (#45789)

tokumaru wrote:
Although compression and collision detection are completely platform-independent.

Not entirely. Some algorithms are more practical on some platforms than on others. You use a different sort of compression for 2bpp tiles than for 24bpp photoreal images. And you use a different sort of compression in a machine with a 6502 CPU and 32 KiB of ROM, 2 KiB of RAM, and a parallel port to 2 KiB of VRAM (NES) than in a machine with a 6502 CPU and 64 KiB of unified RAM+VRAM (Apple IIe or C64).

by tokumaru on 2009-04-16 (#45801)

Old tepples and his obsession with finding exceptions... Well, I get what you mean. There certainly are better ways to approach general game programming aspects when we're talking about a specific platform, and I believe that's what Disch is seeking to present.

I guess my point was that those subjects were somewhat beyond the scope of what I thought he was trying to achieve. He tried to explain once and for all a very common newbie problem, which is how to set up the basic structure of a game program. That's a huge problem for someone who has never made a game before, especially in assembly.

Compression, however, is not such a critical subject. One can make a NES game and not bother with any type of compression. The fact that one stores all screens as 1KB blocks of name + attribute data does not prevent a game from being made; it will just not have that many screens.

Collision detection also seems a bit advanced when we consider the initial "lesson". Maybe the person doesn't even know how to make some game entities yet, let alone make them collide. I kinda feel like those subjects are improvements, and that the goal of the document was to help people get off the ground for not understanding how all that technical stuff turns into a game. People gotta have something done before they can improve it.

Maybe there are more critical matters to discuss first, such as the concept of updating the game world a step at a time every frame. Most people have problems with that, even programmers. Most programmers today deal with tasks that are performed in one go, so they have a really hard time understanding that to make something move from point a to point b they can't just use a "for" statement on the coordinate, since (as any game programmer knows) that would freeze all other actions. Most programmers just don't know how to program events that seem to occur simultaneously.
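
To make the one-step-per-frame idea concrete, here is a minimal sketch in 6502 assembly. The variable names and zero-page addresses (obj_x, target_x) are made up for illustration and are not from Disch's document. The only point is that the routine moves the object by a single pixel and returns, instead of looping until it arrives.

```asm
; Hypothetical zero-page variables (addresses chosen arbitrarily)
obj_x    = $20          ; current X position of the object
target_x = $21          ; X position the object is moving toward

; Call once per frame from the main loop. Moves the object at most
; one pixel toward target_x, then returns so the rest of the frame's
; work (input, other objects, music) still gets to run.
StepObject:
    lda obj_x
    cmp target_x
    beq StepDone        ; already at the target: nothing to do
    bcc StepRight       ; obj_x < target_x: move right
    dec obj_x           ; obj_x > target_x: move left one pixel
    rts
StepRight:
    inc obj_x           ; move right one pixel
StepDone:
    rts
```

Calling StepObject every frame moves the object across the screen over many frames, while a "for" loop over the coordinate would finish the whole move before anything else could happen.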

I'm not saying Disch shouldn't talk about those other subjects, but maybe leave them for later, in an "optimizations" chapter for instance.

by Petruza on 2009-04-17 (#45840)

Nice doc!
What about the following: for those of us that will use a generic runtime made in assembler but will code our games in c, thus not being able to write custom NMI routines, would it be fine just updating a zeropage var in NMI which means the start of vblank by being non-zero, and then wait for this to happen in the main thread, do graphics, and then set that var to zero?

by tokumaru on 2009-04-17 (#45850)

Petruza wrote:
would it be fine just updating a zeropage var in NMI which means the start of vblank by being non-zero, and then wait for this to happen in the main thread, do graphics, and then set that var to zero?

This works fine, and is exactly what Dwedit wrote in this post. This is the ideal replacement to $2002 polling (which misses frames).

There is only one serious flaw with this method though: If the frame calculations of your game take longer than a NES frame, the whole program will suffer from slowdown. With a custom NMI you can choose to perform some tasks even when a game frame isn't complete. This is the reason why most games slow down when there are many calculations to do (such as when there are many enemies active at once) but their music still plays normally.

If you are sure your frame calculations will never take too long or you don't want any events to happen at a steady 60Hz (50Hz for PAL), just use that solution.

However, like Disch said, it should be possible to write an NMI routine generic enough to be used through the whole game. If you have any hopes of making complex/fast games in C, you'd better code a very generic and optimized NMI routine in assembly, and provide a set of C functions to fill the buffers and things like that. If you actually code the drawing routines in C, I don't think you'll be able to do much during VBlank.
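
A minimal sketch of the flag approach discussed above, assuming a hypothetical zero-page variable vblank_flag and placeholder routines UpdateGame and DrawGraphics (none of these names come from the thread). Unlike Petruza's description, this version clears the flag just before waiting rather than after drawing, so a flag left set by a frame that ran long is not mistaken for a fresh vblank.

```asm
vblank_flag = $10       ; hypothetical zero-page flag, set by the NMI

; NMI handler: only records that vblank has started.
NMI:
    pha                 ; preserve A, since we use it below
    lda #$01
    sta vblank_flag
    pla
    rti

MainLoop:
    jsr UpdateGame      ; game logic for this frame
    lda #$00
    sta vblank_flag     ; discard any NMI that fired during the logic
WaitVblank:
    lda vblank_flag
    beq WaitVblank      ; spin until the NMI sets the flag
    jsr DrawGraphics    ; vblank has just started: do the PPU updates
    jmp MainLoop

UpdateGame:             ; placeholder: game logic goes here
    rts
DrawGraphics:           ; placeholder: PPU writes go here
    rts
```

As tokumaru notes, everything here runs at the pace of the main loop, so music or anything else that should tick at a steady rate will slow down whenever UpdateGame takes longer than a frame.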

by frantik on 2009-04-17 (#45877)

here is a buffer read routine i came up with based on a simplified version of SMB's routine.. this could be a useful bit of code to include on the wiki about this subject

Code:
    ldy #$00            ; reset buffer index
    jmp +start:
-setADDR:
    sta PPUADDR         ; write PPU address high byte
    iny
    lda buffer, y
    sta PPUADDR         ; write PPU address low byte
-writeData:
    iny
    lda buffer, y       ; load data length
    iny
    tax                 ; use length as loop counter
-DataLoop:
    lda buffer, y       ; load data
    sta PPUDATA         ; store data in PPUDATA
    iny
    dex                 ; decrease counter, loop until done with data
    bne -DataLoop:
+start:
    lda buffer, y       ; load next value in buffer
    bmi -writeData:     ; if value is negative ($FF), keep writing at the current address
    bne -setADDR:       ; if value > 0, it's the high byte of a new PPU address
    lda #$00            ; value is 0: end of buffer, so reset it
    sta buffer

the data in the buffer should be formatted like so:

[ppu addr high] [ppu addr low] [length] [data]
or
[FF] [length] [data]

the final byte in the buffer should be 00

This is a valid buffer:

20 00 05 11 22 33 44 55
FF 03 AA BB CC
3F 00 04 01 02 03 04
00
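
For the producer side of this format, here is a rough sketch of a routine that appends one [ppu addr] [length] [data] run to the buffer. It is not frantik's code: the names buf_end, src_ptr, run_len, ppu_hi and ppu_lo and all the addresses are assumptions. It also assumes run_len is at least 1, that the high address byte is non-zero (a zero there would read as the terminator), that the buffer never grows past 255 bytes, and that the NMI will not consume the buffer while an append is in progress.

```asm
buffer  = $0300         ; assumed location of the update buffer
buf_end = $11           ; index of the current $00 terminator
src_ptr = $12           ; 2-byte pointer to the source data
run_len = $14           ; number of data bytes to append (1 or more)
ppu_hi  = $15           ; PPU address high byte (must be non-zero)
ppu_lo  = $16           ; PPU address low byte

AppendRun:
    ldx buf_end         ; start writing over the old terminator
    lda ppu_hi
    sta buffer,x
    inx
    lda ppu_lo
    sta buffer,x
    inx
    lda run_len
    sta buffer,x
    inx
    ldy #$00
AppendCopy:
    lda (src_ptr),y     ; copy the data bytes
    sta buffer,x
    inx
    iny
    cpy run_len
    bne AppendCopy
    lda #$00
    sta buffer,x        ; write the new terminator
    stx buf_end         ; remember where it is for the next append
    rts
```

Whatever code empties the buffer during vblank would also need to set buf_end back to zero.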

by Celius on 2009-04-17 (#45878)

For max speed, and assurance that I don't spill outside of Vblank, I have individual routines for tile columns, tile rows, attribute columns, attribute rows, the palette, and CHR RAM tiles (if applicable). Each of these routines is given a PPU address and writes a buffer of data to the PPU. So this:

Code:
ldx #$24 ; PPU address high byte
ldy #$00 ; PPU address low byte
jsr WriteTileColumn

Writes a column of tiles located in a specific area of RAM to $2400 (30 tiles, assuming vertical mirroring). To make things even faster, for this routine, I set it up so that the address fed can only be one that adheres to the top of the name table. This way I won't have to deal with calculating different PPU addresses and name table boundary crossing, etc. Though this does require that I position the tile data in the array copied exactly right. Though this isn't that hard. That particular routine is 270 cycles, or if I put the array in zero page, it's 238 cycles. So in terms of Vblank time, it saves a lot. Though it might be kind of a pain to work with (definitely not for calculating PPU addresses though; just positioning tiles in the array exactly right is).

I mainly wrote these routines for extended Vblank code, so I could update 10 different CHR RAM tiles, a row of tiles, a column of tiles, a row of attributes, a column of attributes, the entire palette, and sprite DMA all in one frame. Yeah, this ended up spilling 12 scanlines outside of Vblank, but it's okay. You don't really notice the blanked top of the screen, especially if you're playing on an NTSC TV. I just needed to be sure exactly how long these routines were taking, so by eliminating name table border complications, the cycle count is constant.
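
As a rough idea of what such a column routine could look like, here is a looped sketch. It is not Celius's routine: column_buf, ppuctrl_shadow and their addresses are assumptions, and ppuctrl_shadow is assumed to hold the game's normal PPUCTRL value with the increment bit clear. It takes the PPU address in X (high) and Y (low) as in the call above.

```asm
PPUCTRL   = $2000
PPUSTATUS = $2002
PPUADDR   = $2006
PPUDATA   = $2007

column_buf     = $0340  ; assumed: 30 tile numbers, top to bottom
ppuctrl_shadow = $17    ; assumed RAM copy of the normal PPUCTRL value

; In: X = PPU address high byte, Y = PPU address low byte
WriteTileColumn:
    lda ppuctrl_shadow
    ora #%00000100      ; bit 2 set: address advances by 32 per write
    sta PPUCTRL
    bit PPUSTATUS       ; reset the PPU address write toggle
    stx PPUADDR         ; high byte first
    sty PPUADDR         ; then low byte
    ldx #$00
ColumnLoop:
    lda column_buf,x
    sta PPUDATA         ; each write moves down one tile row
    inx
    cpx #30             ; 30 tiles in a name table column
    bne ColumnLoop
    lda ppuctrl_shadow
    sta PPUCTRL         ; back to the normal increment mode
    rts
```

The loop costs about 15 cycles per tile (LDA abs,X, STA, INX, CPX, BNE), while an unrolled LDA/STA pair costs 8, which is consistent with the 270-cycle figure quoted above for 30 tiles plus setup.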

by tokumaru on 2009-04-18 (#45880)

frantik wrote:
here is a buffer read routine i came up with based on a simplified version of SMB's routine..

The only problem with your code is that it's quite slow... INY, DEX and BNE for every copied byte is terribly inefficient. Unfortunately I don't have a better suggestion that doesn't require much more ROM.

If there's more ROM available, I can always recommend my pseudo-DMA idea. In my current project, I use a variation of it that is limited to 32 bytes at a time, so it doesn't need so much ROM.

by frantik on 2009-04-18 (#45881)

yeah I suppose if you know how long your buffer will be each time, an unrolled loop would be faster.

right now i'm just happy to have some kind of display buffer going now i can work on LOCATE and PRINT type commands to further simplify development

by tokumaru on 2009-04-18 (#45883)

frantik wrote:
yeah I suppose if you know how long your buffer will be each time, an unrolled loop would be faster.

If you pay attention to my code you'll see that the cool thing about it is that it's actually an unrolled loop that copies a variable amount of bytes. Just the total amount of bytes is limited to 128, but you can read variable amounts of data from those 128.

by frantik on 2009-04-18 (#45886)

i have to admit i'm having a hard time understanding all of your code... but thats ok for now.

with the code snippet I posted, I can see a way to output null terminated strings which would save a dex operation each byte... but then you can't use the 00 tile for anything

edit: actually i improved my code so that if the first byte is positive, it writes that byte and the following byte to PPUADDR. If the first byte is negative, it uses that value as the length of the string to write. this will make it much easier to make locate and print commands, and also saves a byte in the buffer when appending strings

edit2: eh.. having to deal with string lengths for everything is a pain.. worth it to have 00 be a null character to save time in a lot of places

by No Carrier on 2009-04-19 (#45946)

Great stuff, Disch! Thanks!

by Drag on 2009-04-20 (#45974)

tokumaru wrote:
frantik wrote:
here is a buffer read routine i came up with based on a simplified version of SMB's routine..

The only problem with your code is that it's quite slow... INY, DEX and BNE for every copied byte is terribly inefficient. Unfortunately I don't have a better suggestion that doesn't require much more ROM.

If there's more ROM available, I can always recommend my pseudo-DMA idea. In my current project, I use a variation of it that is limited to 32 bytes at a time, so it doesn't need so much ROM.

You could kinda compromise and do it both ways at the same time, like StarTropics II.

Assume X is the loop count (the number of bytes remaining that we need to transfer).

Basically, it has an unrolled loop of:
Code:
<subtract 8 from X>
LDA buffer,Y
STA $2007
LDA buffer+1,Y
STA $2007
...
LDA buffer+7,Y
STA $2007
<add 8 to Y>

plus the normal loop:
Code:
LDA buffer,Y
STA $2007
INY
DEX
BNE -

The unrolled loop is the equivalent of 8 iterations of the normal loop, so the program uses the unrolled loop, copying 8 bytes each time (and subtracting 8 from the loop count each time), until the loop count is < 8, at which point it just uses the normal loop for the remaining bytes.

So basically, if you need to transfer 5 bytes, it uses the normal loop with an X of 5.
If you need to transfer 8 bytes, it uses the unrolled loop once.
For 16 bytes, it uses the unrolled loop twice.
For 20 bytes, it's the unrolled loop twice, and the normal loop with an X of 4.

It doesn't rely on huge unrolled loops, indirect jumps, tables, or anything of the sort, yet offers a better efficiency for transfers that are >= 8 bytes in length.
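
Filling in the pseudocode steps, the whole routine might look like the sketch below. This is a reconstruction from the description above, not code taken from StarTropics II, and the buffer address is an assumption.

```asm
PPUDATA = $2007
buffer  = $0300         ; assumed location of the source data

; In: X = number of bytes to copy, Y = starting index into buffer
CopyToPPU:
    cpx #8
    bcc CopyRemainder   ; fewer than 8 bytes: skip the unrolled part
CopyChunk:
    lda buffer,y
    sta PPUDATA
    lda buffer+1,y
    sta PPUDATA
    lda buffer+2,y
    sta PPUDATA
    lda buffer+3,y
    sta PPUDATA
    lda buffer+4,y
    sta PPUDATA
    lda buffer+5,y
    sta PPUDATA
    lda buffer+6,y
    sta PPUDATA
    lda buffer+7,y
    sta PPUDATA
    tya                 ; add 8 to Y
    clc
    adc #8
    tay
    txa                 ; subtract 8 from X
    sec
    sbc #8
    tax
    cpx #8
    bcs CopyChunk       ; 8 or more bytes left: another chunk
CopyRemainder:
    cpx #0
    beq CopyDone        ; nothing left over
CopyByte:
    lda buffer,y
    sta PPUDATA
    iny
    dex
    bne CopyByte
CopyDone:
    rts
```

With X = 20 this runs CopyChunk twice and CopyByte four times, matching the 20-byte case described above.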

by frantik on 2009-04-20 (#45976)

^ hey i like that idea

by Bregalad on 2009-04-20 (#45978)

Hey, just something about the doc...
Disch says it's good to never disable the NMI, but I can see two reasons to disable it:
- When redrawing something while the screen is turned off, if an NMI happens between two $2006 writes, the NMI handler will acknowledge the interrupt by reading $2002, which also resets the PPU's address write toggle, so the second $2006 write is taken as a first write and the result is very bad (this has actually happened to me once).
- When starting a sound effect or a new piece of music, if the music engine is called while the sound effect/music is being initialized, BAD things could happen.

Unless you program the NMI to not even acknowledge the interrupt depending on the state of the main engine. Because NMIs on the 6502 are edge-triggered rather than level-triggered, I guess that would be a workaround for issue 1, although it's not really "clean".

by Disch on 2009-04-20 (#45987)

1) Which is why you don't unconditionally read $2002 or set the scroll. Note that I made both of those conditional in my latest version.

2) The music engine should do all of that anyway. Starting a new track or playing a sound effect should equate to changing a single area of RAM.
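
One way the conditional handling in point 1 could look is sketched below. This is a guess at the general shape, not Disch's actual code: ppu_busy, scroll_x, scroll_y and their addresses are hypothetical, and the main code is assumed to set ppu_busy to non-zero around any of its own $2006/$2007 sequences.

```asm
PPUSTATUS = $2002
PPUSCROLL = $2005

ppu_busy = $18          ; hypothetical: non-zero while main code is mid-way through PPU writes
scroll_x = $19          ; hypothetical scroll position variables
scroll_y = $1A

NMI:
    pha
    lda ppu_busy
    bne NmiSkipPPU      ; main code owns the PPU: leave $2002 and scroll alone
    bit PPUSTATUS       ; safe to reset the write toggle now
    lda scroll_x
    sta PPUSCROLL
    lda scroll_y
    sta PPUSCROLL
NmiSkipPPU:
    ; work that doesn't touch the PPU (e.g. the music engine) can still run here
    pla
    rti
```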

Only real concern I had that was unavoidable was for mappers which require serial writes to a register. MMC3 style address/data regs can be guarded against easily enough:

Code:
swapmmc3:
lda #$81
sta soft8000 ; nmi would then copy this to $8000 before exit
sta $8000
lda pagenumber
sta somewhere ; if this is necessary
sta $8001

However, MMC1-style serial writes are pretty much impossible to guard against, so you'd need to disable NMIs around them.
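
For the MMC1 case, a minimal sketch of disabling NMIs around the five serial writes might look like this. The routine name and ppuctrl_shadow are assumptions; ppuctrl_shadow is assumed to hold the current PPUCTRL value with bit 7 set when NMIs are normally enabled, the bank number is assumed to be below 32, and the routine is assumed to live in a bank that stays mapped during the switch.

```asm
PPUCTRL  = $2000
MMC1_PRG = $E000        ; MMC1 PRG bank register ($E000-$FFFF)

ppuctrl_shadow = $17    ; assumed RAM copy of the current PPUCTRL value

; In: A = PRG bank number (0-31)
SetPRGBank_MMC1:
    pha
    lda ppuctrl_shadow
    and #%01111111      ; clear bit 7: NMI generation off
    sta PPUCTRL
    pla
    sta MMC1_PRG        ; write bit 0
    lsr a
    sta MMC1_PRG        ; bit 1
    lsr a
    sta MMC1_PRG        ; bit 2
    lsr a
    sta MMC1_PRG        ; bit 3
    lsr a
    sta MMC1_PRG        ; bit 4: the fifth write commits the value
    lda ppuctrl_shadow
    sta PPUCTRL         ; restore the previous NMI setting
    rts
```

The window with NMIs off is only a few dozen cycles, so the cost is small compared with letting an NMI land in the middle of the serial sequence.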

by Jarhmander on 2009-04-20 (#45990)

Drag wrote:
You could kinda compromise and do it both ways at the same time, like StarTropics II.


This is exactly Duff's device! http://en.wikipedia.org/wiki/Duff%27s_device

by ehguacho on 2010-03-11 (#58022)

Disch wrote:
...I was bored and needed a break from my other project...

thanks mate! (a.k.a. "you should get bored more frequently")

by No Carrier on 2010-03-14 (#58212)

Disch wrote:
http://nesdevhandbook.googlepages.com/index.html

It's a start! We'll see how much I can actually get done before getting bored and moving on. XD

Since this thread was brought back from the dead, I'll mention that this link isn't working. You can get to the index, but subpage links are dead.

by Disch on 2010-03-14 (#58214)

Yeah apparently googlepages doesn't work anymore. Talk about lame.

I don't have any free webhosting these days =(

by tepples on 2010-03-14 (#58215)

Disch wrote:
I don't have any free webhosting these days =(

wiki.nesdev.com

by Disch on 2010-03-14 (#58226)

Well I didn't mean for nesdev stuff, I meant for general filesharing.

by WJYkK on 2010-03-14 (#58269)

Oh, it works, except you have to download http://nesdevhandbook.googlepages.com/theframe.html (AKA chapter 1 AKA the only chapter).

by comegordas on 2010-08-19 (#66317)

thanks a lot Disch. hats off. it's a great doc this one you whipped up.

by jarrodparkes on 2012-12-29 (#105263)

Does anyone still have access to, or a copy of, this guide?

Thanks!
Jarrod

by Hangin10 on 2012-12-29 (#105270)

I managed to find this, not sure if that's all of it. I also don't have the stylesheet.

Also, it won't let me attach html nor .txt, but zipping it is ok?

EDIT: Sorry koitsu, apparently it's false positive-ing, calling it an attack vector if I don't zip it regardless of extension.

by koitsu on 2012-12-29 (#105271)

.txt attachments should absolutely work. .html or .htm don't work, but I'm planning on fixing that in a moment.

by koitsu on 2012-12-29 (#105273)

Like I said, .txt works just fine. .htm, .html, and .css have been added.