I'm working on a game, and I'm wondering what stuff would you have to do in your NMI that you cannot just do in vblank? My engine is currently set up so that everything you do in a frame is done in the endless loop when bit 7 of $2002 is set. So it's like this:
Loop:
lda $2002
bmi framecrap
jmp Loop
framecrap:
jsr joypad
lda scrollx
sta $2005
.....
jmp Loop
Is that not okay to do? My code functions just fine on an emulator, but will it run okay on the real NES? It's just easier, because I don't have to worry about A/X/Y registers being screwed up. I had a code set up to do stuff in the endless loop and in the NMI, and it crashed so bad, and then I realized it was because stuff like this would happen:
lda SomeVar
(NMI is executed, A register is changed)
sta $2005
Or something like that. With how I have it set up, I don't have to worry about that. I knew it was a problem, because when I changed it so my NMI and my loop weren't running at the same time, it worked.
I don't think anything stops it from running on hardware, but it seems that $2002 is not a very reliable way to check for VBlank. Someone said that a quite high percentage of VBlanks can be missed if you do it like this, and your game will not run as smoothly as it could.
If you have trouble with your NMI routine when stuff is running on the main loop, I suggest you put the whole code inside the NMI then. The main loop will be nothing more than "forever: jmp forever" and there is no way to screw that up. Unless your NMI routine takes longer than a frame, in wich case another NMI would fire while the last one wasn't over and you'd have the same problems you have now.
Maybe you could disable NMI's inside the NMI, and enable them again right before it ends (reading $2002 first to avoid firing an NMI right away in the middle of a VBlank), so that they'll never overlap. You may still miss frames, but only when your frame calculations take too long, and that can't be avoided unless you optimize the code...
But I don't get what the big problem is anyway... if you push everything to the stack at the start of the NMI and get it all back at the end, there is no reason for things to screw up. It must be something else.
As tokumaru said, polling $2002 for VBlank will occasionally make you miss a frame on the system (and even on modern emus, since blargg released test ROMs which expose this behavior). NMIs are far more reliable. If you want to keep your current setup and have minimal changes, it's not too hard to replace $2002 polling logic with NMI logic.
In your NMI routine, you could just set a flag in RAM somewhere and RTI immediately. Then when you wait for vblank, just clear the flag, enable nmis, then wait for the flag to be set, then disable nmis.
But I wouldn't recommend doing that. It's rather silly, and NMIs will be disabled until you specifically wait for VBlank (so if you're doing too much logic to be done in a frame, you will miss an NMI, which would be disaterous if you're doing split screen effects). The NES game design scheme I'd go with would be something like the following:
NMI should just do drawing stuff and/or other things that can only be done in vblank (palette updates, NT updates, spr-ram DMA, etc). You might even want to throw in your music code after all the drawing is complete before you RTI... in case you don't want the music to slow down when the rest of the game does.
The main game logic should only set vars and fill buffers telling the NMI routine what to draw or what to update. NMIs should always be enabled, except for transition events (like when you shut the screen off to fill a whole NT, or something like that). If game logic spills into VBlank, it won't matter, since the NMI will catch it, and only the appropriate stuff will be drawn.
At least... that's what I would do. There are many ways you could go. I hear that SMB does everything inside the NMI routine.
The way Dish covered is the best. I tried several ways of doing game logic vs NMI, and definitely, the NMI routine should handle all PPU and APU related stuff, along with counters updating (your game engine will want to have a set of counter and random numbers that change each frame)m while the non-NMI routine would do the actual logic, and fill buffers.
What should be done in NMI should be set through flags you'll want to set a flag only when a buffer is full, and in NMI when the buffer is read (or just when you'll do it), clear the flag.
When game logic relies on a buffer to be empy to continus, poll the flags and it will clear only when the NMI code will proprely update it in PPU/APU/whatever buffer.
When the game logic just have to wait for next frame, poll a main NMI flag or a counter.
That is not the only way, but by far this is the best way I've found. It works very well if you use it well.
I think you should take a bit care of how your game engine works to know in what way you want to make up your buffers. Sometimes you would even have two levels of buffering. Buffers make programming SO much easier !
Bregalad wrote:
Sometimes you would even have two levels of buffering. Buffers make programming SO much easier !
That's so true. While designing the format of the levels in my platform engine, I came up with a pretty nested format. Tiles were indexed by metatiles that were index by blocks, that were indexed by structures that were indexed by screens that were indexed by the level map. This allowed for very compact maps, resulting in bigger levels.
The first design of a decoding routine worked directly on screens, outputing the tiles to be rendered during the next VBlank to a buffer. That routine was madly complicated. And it took more than 1/4 the frame to execute. If by any chance the game was to update a row and a column of metatiles, I'd surelly miss a frame.
In the new design, the screens are decompressed up to the block level as soon as the player enters them. With 32x32 pixel blocks, 4 screens fit into 256 bytes of RAM, wich is easily indexed. All collision is performed against these blocks, wich also made everything easier. The actual rendering routine has only a few steps left until it reaches the tiles, and everything is much much faster than before.
So I agree that breaking bigger tasks into smaller ones through the use of buffers is definatelly a good thing.
Thanks for your replies!
It seems very untimed and inaccurate, the way I'm doing it. I was wondering if any of you knew how to update the screen without the turning off of the screen being so noticable. Because my engine currently stores whole objects(a series of meta tiles, not just a metatile) on the screen when updating is called for. And you always need to turn off the screen when writing to it, or so I understand, so how could you do this without it being noticable? It is possible, I know, because every game pretty smooth with it's updates, like SMB1 and stuff.
Celius wrote:
I was wondering if any of you knew how to update the screen without the turning off of the screen being so noticable.
Update in VBlank. NMI trips at the start of VBlank -- then you have 20 scanlines (about 2273 cycles) to get all your drawing done. If you write to $4014 for sprite DMA (which you should do every frame, or at the very least, every other frame) then you lose 513 cycles (which is how long the DMA takes).
Time is short in VBlank. This is why you must have everything ready to draw and totally spelled out by your game logic... so the NMI routine can just see what it needs to draw, and quickly draw it.
If you finish all your drawing before VBlank is over, you do NOT have to turn the screen off. You can leave the screen on, it won't be a problem.
Note that actual OAM can only be written to in VBlank (or screen off) -- but your dedicated page of RAM to hold sprite data (which you'll later DMA to OAM) can be written to at any time. So moving sprites and whatnot can all be done by game logic. That way... all your NMI routine has to do is copy it over via DMA.
The idea is, you want to have OTHER things ready-to-draw like that too. The NMI routine shouldn't have to "figure anything out"... that takes too long... VBlank time is too short. Game logic should have it all ready (like it does with the sprite page), then the NMI routine can just copy the needed stuff to the nametables/palette/whatever needs to get done.
If you're updating APU stuff in the NMI routine as well -- don't do it until AFTER you're done with your drawing. APU stuff can be written to any time, it doesn't pay to waste precious VBlank time with APU code.
Okay, so, I'm thinking of completely rewriting my routine, using the same ideas, but I'm going to have lightning fast updates. I'm trying to make an NROM sidescroller to help me learn how to not waste space. I am already going to have $100 bytes in RAM that represent the meta tiles on screen to check for collision and whatnot, so this is looking a little tight. I'm going to have my objects in PRG layed out in raw, not meta-tile form. So this:
80, 80, 80
in not meta tile form would be this:
80, 82, 80, 82, 80, 82
81, 83, 81, 83, 81, 83
It's for lightning fast screen update, because a meta tile decoding routine (If you don't know what I mean, I mean like a routine that turns $80 into $80,$81,$82, and $83) in your NMI is just a disaster waiting to happen. I would have a meta tile decoding routine store all the individual tile IDs in RAM, but I don't have enough space in RAM, and when I think about it, what the heck is the point? Just have your objects in PRG in non-metatile form, because there's no real reason besides saving PRG space to have them in metatile form. But it takes alot more time to process, and I wouldn't have anywhere to put the individual tile IDs anyways.
Have your Object Address, X/Y dimensions of object, and PPU pointers in RAM. Access those in the NMI to store your object (in non-metatile form) on the screen really fast. I think Columns would be a lot faster, but a lot more complicated to keep track of. Do you know what I'm saying?
Celius, there is no problem at all in decoding metatiles. You just have to do it before VBlank, and store the results in a buffer. A full column of 15 metatiles will decode to only 60 tiles (60 bytes), so that's no problem.
When you know a column is to be rendered (camera moved 16 pixels), you read the metatiles from the map you use for collision and stuff, and decode them to the buffer. Later, during VBlank, it's just a matter of direcly copying those 60 bytes from the buffer to the name table. 60 bytes is a fairly safe ammount of data to write.
By using buffers, the first version of the screen drawing routines I wrote was able to update 2 columns of tiles (1 column of metatiles), 2 rows of tiles (1 row of metatiles), update attributes, do sprite DMA, and the rest of the time was alternated between updating the palette and erasing background tiles that needed to be cleared. All within standard VBlank time (no tricks). Of course, I used a bunch of LDA's STA's, with no loops and very little logic, and the whole drawing code used like 2kb.
The key here is: you have to learn how to use buffers, or else you'll always be terribly limited by VBlank time.
But then if you overbuffer, you'll run into limitations where your program will only run if a PRG RAM chip is present on the cart. That can get expensive, as it limits the donor carts that can be used.
tepples wrote:
But then if you overbuffer
That won't happen. There is a fixed space in RAM reserved for a row, another space for a column, and a few slots for random blocks that need to be erased. That adds to less than 256 bytes. That's how it worked in my previous engine, and most likely how it will be on my new one (I rewrite stuff a lot!).
Quote:
you'll run into limitations where your program will only run if a PRG RAM chip is present on the cart. That can get expensive, as it limits the donor carts that can be used.
I don't see why I'd need WRAM. I don't like WRAM. I did what I described above with the stock 2kb of RAM. Unless you are thinking I meant self-modifying code on the LDA's STA's thing, wich I didn't. These are stored in ROM, and always read from the place where the buffer is and write to $2007. 2kb of ROM is not a very high price to pay.
tokumaru wrote:
Celius, there is no problem at all in decoding metatiles. You just have to do it before VBlank, and store the results in a buffer. A full column of 15 metatiles will decode to only 60 tiles (60 bytes), so that's no problem.
Okay, first. My maps are made of objects. By an object, I mean a block consisting of many metatiles. Like Metroid does, I believe. So my map is divided into objects, and would be hard to update it column by column.
If I were to update it column by column, I'd have to somehow keep track of how many columns of the object are on the screen, and I'd have to start drawing were I left off, and this would be very tidious and annoying. If I were to entirely copy the object into RAM, it'd take up alot of space, and this is something I cannot do.
I don't have much room in RAM, like I've said, I've already deticated $100 bytes to Sprite DMA, $100 for collision detection, and a bunch for other dumb stuff. I will not forget you suggested this, but I think just writing one object at a time would be good. Do you think writing like 32 tiles and doing 16-bit addition would take a long time? I don't think so, if you just write one object at a time, that's not bad at all. And also, on my game, you can't scroll the other way. It's like SMB1, it's a one-way trip to the flag, boss, or whatever.
Celius wrote:
Okay, first. My maps are made of objects. By an object, I mean a block consisting of many metatiles. Like Metroid does, I believe. So my map is divided into objects, and would be hard to update it column by column.
Such maps are not a problem. I'm guessing you are already decoding those objects to that 256-byte map you use for collision, right? I'm only suggesting that you read the columns from that same map, not directly from ROM.
Quote:
If I were to update it column by column, I'd have to somehow keep track of how many columns of the object are on the screen, and I'd have to start drawing were I left off, and this would be very tidious and annoying.
Yeah, sure, that'd be madness! Don't ever do that.
Quote:
If I were to entirely copy the object into RAM, it'd take up alot of space, and this is something I cannot do.
Well, not entirely. Just one screen worth of metatiles. Maybe 2, as you can't decode the new one over the old one, as it's still beeing used. BTW, how are you handling collision using only one screen and not decoding column by column anyway?
Quote:
I don't have much room in RAM, like I've said, I've already deticated $100 bytes to Sprite DMA, $100 for collision detection, and a bunch for other dumb stuff. I will not forget you suggested this, but I think just writing one object at a time would be good.
I know RAM is limited. I kinda got around that by using 32x32-pixel metatiles, wich allows me to have 4 screens loaded at a time (essential for a multi-way platformer) using only 1 page of RAM. Collision is also done at that level, but finer collision maps assigned to some of the metatiles would allow to simulate 16x16 pixel blocks, and even smaller ones (preferably not blocky at all, that is why the collision maps are for).
Quote:
Do you think writing like 32 tiles and doing 16-bit addition would take a long time? I don't think so, if you just write one object at a time, that's not bad at all.
One single addition? That'd be pretty quick, I guess. If your objects are arranged in order of increasing Y position, you can definately decode them little by little. If you do 1 by 1, then you may have trouble if the next column to be displayed is composed by more than one object... and decoding them backward could be hard too. Unless each object definition has a fixed size, wich would allow you to just read the object list backwards.
Quote:
And also, on my game, you can't scroll the other way. It's like SMB1, it's a one-way trip to the flag, boss, or whatever.
Oh, then forget what I said about reading the object list backwards... =)
Oh hey, I suppose I could use the same 256 byte section in RAM for updating the screen! I'm going to have to update the whole thing each time I scroll 16 pixels over, though, and that'll really irritate me. How should I do this? I have my objects set up so you can never have half of one object on one nametable and the other half on the other. I'd maybe want to use more bytes that represent what is about to be stored on screen. I could write objects in metatile form to $300-$4FF that would be what's on screen, and what's offscreen. So $300-$3FF would be used for collision and stuff, and $400-$4FF would be what is about to be on screen! If I did this, I'd want arrays to represent columns, you know? I think I explained this in another thread to you, it might have been like the "Level designing" one. I'll have a routine that takes the metatile array, and makes it into a regular tile array in zero page, so it'll be more fast.
Okay, I'm just wondering how I will update $300-$4FF. I think it'll be almost the exact same as the screen. I think I'll scroll in RAM as I'm scrolling the screen. So when the scroll is #$00, read $300-$3FF, and when it is #$10, read $310-$40F, when it is #$20, read $320,$41F, etc. Does that make sense? Are any of these ideas good?
Celius wrote:
I could write objects in metatile form to $300-$4FF that would be what's on screen, and what's offscreen. So $300-$3FF would be used for collision and stuff, and $400-$4FF would be what is about to be on screen!
Yeah, ideally you'd have buffers for 2 full screens as that is the number of name tables, and, as you said, the range of your objects is one screen, meaning you need a full spare screen to load the new one without destroying the current one. But 2 pages of RAM may be to much to spend on this. If you can spare the space, go for it, but normally I'd suggest you increase the metatile size (to 32x16, for example, wich can be very handy when it comes to attributes) so that 2 screens will fit in only one page.
Quote:
If I did this, I'd want arrays to represent columns, you know? I think I explained this in another thread to you, it might have been like the "Level designing" one. I'll have a routine that takes the metatile array, and makes it into a regular tile array in zero page, so it'll be more fast.
That's exactly the idea.
Quote:
Okay, I'm just wondering how I will update $300-$4FF. I think it'll be almost the exact same as the screen. I think I'll scroll in RAM as I'm scrolling the screen. So when the scroll is #$00, read $300-$3FF, and when it is #$10, read $310-$40F, when it is #$20, read $320,$41F, etc. Does that make sense? Are any of these ideas good?
I didn't quite get what you said here. One page equals 1 screen, right? So, every time you complete 256 pixels of scrolling, that means you have to load a new screen, over the one that you just left. So that the process of loading a new screen does not slow the game down, you could limit the number of objects you decode per frame.
If I were to use an object-based level, I'd probably divide it up in screens (as you do with the "objects can not cross to the other name table" rule), each having a 2-byte header: 1 tells wich metatile to use to clear the screen (so that the screens can have different "moods" without you having to define many objects for that), and the other tells how many objects compose the screen. Then each object would follow, with a byte specifying it's type, and another used to specify it's position (4 bits for X and 4 bits for Y).
When you load a new screen, use the "clear" byte to fill the screen. Then you draw the objects over it, in small groups each frame, until the whole screen is up. Clearing 256 bytes may take a while, but there are ways to speed it up, such as partially unrolling the fill loop, wich is just a bunch of STA's.
Anyways, these are just ideas. They are not rules or anything.
I was thinking I could have what's on the tables in meta-tile form stored in 2 pages of RAM, yes, you have that correct. What I meant is like, say you are checking $300-$3FF (What's on the screen when the scroll is #$00) for collision. When 16 pixels are scrolled, You will want to be checking $310-$40F, because it's 1 column ahead of what was just on screen. Do you understand?
Celius wrote:
What I meant is like, say you are checking $300-$3FF (What's on the screen when the scroll is #$00) for collision. When 16 pixels are scrolled, You will want to be checking $310-$40F, because it's 1 column ahead of what was just on screen. Do you understand?
Actually... I don't. The way I see it, the character moves within the 2 screens that are loaded in RAM, as if it was a tiny (2 screen) level. Only that this level is updated as you move ahead.
It's like the player constantly loops through these 2 screens, from the first to the second and back to the first, but by the time you get back to the first screen it has been reloaded with a new map, effectivelly making it the 3rd screen.
I didn't understand what you said, because in my head, all movement, physics and collision if performed against the RAM screens. So, checking if the player has hit a wall is just a matter of converting the coordinates of the point you want to check from pixel units into metatile units (divide by 16, shift right 4 times) and directly check these coordinates in the loaded maps, to verify if the player hit a solid block or not.
I don't see why you'd want to do such a broad check within a range as big as $300-$3FF, as you said. It sounds slow, though.
Unless we are thinking of completelly different collision methods... what are you using?
tokumaru wrote:
The way I see it, the character moves within the 2 screens that are loaded in RAM, as if it was a tiny (2 screen) level. Only that this level is updated as you move ahead.
It's like the player constantly loops through these 2 screens, from the first to the second and back to the first, but by the time you get back to the first screen it has been reloaded with a new map, effectivelly making it the 3rd screen.
That is exactly what I'm trying to say. $300-$4FF act pretty much the same as the Name Tables, exept they have the metatile IDs instead of individual tile IDs.
My collision code (I haven't written it yet, I'm trying to plan as much as possible before coding) is going to check which metatiles the sprite is around. This will be read through RAM. It's like what you were saying, I think.
I just want to say hello!!! This site rocks!!! Keep working in your NES games. I think NES is the greatest game console of all times.