Map rendering for side scroller, coding question

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.

by Banshaku on 2009-05-26 (#47266)

There is one approach I want to try but I'm not sure if it's good or not. I'm aware that I'm not always good at explaining things so I will do my best to make it clear.

This is for a side scroller. I have 2x2 meta-metatiles made of 2x2 metatiles (which means a meta-metatile is 4x4 tiles).

The metatiles are exported in column style. This means the content order will be:

Code:

  1 3
  2 4

The meta-metatiles have the same format. The map data is saved in column style too.

What I want to do is write the map column by column. Let's say we have 7 meta-metatiles per column.

The PPU is put in column mode (increment 32). Now when I read the first meta-metatile, I want to write the first "column" of the meta-metatile on the screen. This means I will write all the A tiles first, as shown in the example below:

Code:

Meta-metatile content

    AB      CD

 1  13   3  13
    24      24

 2  13   4  13
    24      24

This means I will read tiles 1-2 of metatile 1 first, then repeat the same process for metatile 2. Then I will skip to the next meta-metatile, repeating the same process (writing column A) for the 6 meta-metatiles left. Once the first column is over, the PPU address will now be in column 2 of the screen (not really, there are 2 tiles left, but let's assume it is) and I must repeat the same process 3 times (columns B, C, D).

Repeat the same idea for the rest of the columns in the map. The only thing left to think about is when to write the background attributes. My guess is that the end of map rendering is better because this avoids changing the PPU address pointer many times.

Does this approach make sense?
Thanks for any comments.

by Memblers on 2009-05-26 (#47267)

Yeah, that's the best way to do it.

Quote:
My guess is that the end of map rendering is better because this avoids changing the PPU address pointer many times.

Just make sure the map rendering is completely separate from any PPU access, because there's not a lot of vblank time. As long as it's in a buffer that's output during NMI you can do about anything.

by tokumaru on 2009-05-26 (#47268)

This is one way to do it, but I have another suggestion that might make attribute handling easier. Instead of decoding a tile column at a time for every 8 pixels scrolled, you might consider decoding whole meta-metatiles at once.

Sure, the buffer needed to hold the tiles will have to be larger, but handling full blocks at once is easier than skipping and such. And since the area being decoded at once is conveniently as wide as the area covered by an attribute byte, attributes will be easier to code.

So, whenever the camera scrolls 32 pixels (the width of one meta-metatile) you could decode 7 whole meta-metatiles for a total of 112 bytes of name table data plus 7 bytes of attribute data. Just spread that amount over the next few VBlanks and you should be fine. If you arrange your buffers well and use the index registers cleverly, you can have the same piece of code always decode the blocks to the buffers and the same piece of code always send the buffers to the PPU, without having to handle special cases depending on what column you are rendering, and without wasting time going through the same blocks multiple times just to fetch different parts of them each time.

Anyway, this is just a suggestion, an alternative to the way you first proposed, but both ways are fine of course. Since you thought of the other way first you might be more comfortable implementing that.

Most people have a really hard time handling attributes in scrolling games where tile data is updated on the fly, so you might consider it right from the start when designing your rendering engine. Even though it's much easier when the game only scrolls horizontally (no need to worry about attribute byte vertical misalignment) and you might have chosen to define attributes at the meta-metatile (which is conveniently the same size as the area covered by an attribute byte) level, I thought I'd mention it.
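
As a rough illustration of why a 4x4-tile block lines up with one attribute byte, here is a small Python model (not NES code). It assumes the column of blocks starts at tile row 0 of the name table at $2000; the function name `attribute_address` is made up for this sketch.

```python
NAMETABLE_BASE = 0x2000   # first name table
ATTR_OFFSET = 0x3C0       # the attribute table starts 960 bytes into a name table

def attribute_address(tile_x, tile_y, base=NAMETABLE_BASE):
    """Address of the attribute byte covering tile (tile_x, tile_y)."""
    # Each attribute byte covers a 4x4-tile area; there are 8 bytes per row.
    return base + ATTR_OFFSET + (tile_y // 4) * 8 + (tile_x // 4)

# One column of 7 blocks (4x4 tiles each) at block column 2 (tile_x = 8).
block_col = 2
addresses = [attribute_address(block_col * 4, row * 4) for row in range(7)]
tile_bytes = 7 * 4 * 4    # name table bytes for the same column of blocks

print([hex(a) for a in addresses], tile_bytes)
```

Every tile inside one block maps to the same attribute byte, so decoding a whole block at once yields exactly one attribute write per block.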

Banshaku wrote:
Once the first column is over, the PPU address will now be in column 2 of the screen (not really, there are 2 tiles left, but let's assume it is).

Well, even if you did write all 30 tiles of a column the address would not automatically move on to the second column, it would go into the attribute table and then into the next name table, so you'd still have to set the address for every tile column.

by Banshaku on 2009-05-26 (#47270)

Memblers wrote:
Just make sure the map rendering is completely separate from any PPU access, because there's not a lot of vblank time. As long as it's in a buffer that's output during NMI you can do about anything.

Thanks for the comment. I will keep that in mind.

tokumaru wrote:
Instead of decoding a tile column at a time for every 8 pixels scrolled, you might consider decoding whole meta-metatiles at once.

So far I haven't mentioned decoding only 1 tile column for scrolling, since I haven't yet considered in detail how to implement it. What I had in mind is to always decode a column of meta-metatiles, then decode that column into (4) columns of tiles. But this is still only theoretical: I don't have any code, I'm thinking about how to approach the problem. The explanation above is how I would decode the column of meta-metatiles.

tokumaru wrote:
Sure, the buffer needed to hold the tiles will have to be larger, but handling full blocks at once is easier than skipping and such. And since the area being decoded at once is conveniently as wide as the area covered by an attribute byte, attributes will be easier to code.

I fully agree. Skipping columns of tiles inside a metatile seems like a pain.

tokumaru wrote:
without having to handle special cases depending on what column you are rendering, and without wasting time going through the same blocks multiple times just to fetch different parts of them each time.

For now my approach requires checking each meta-metatile 4 times. If you process the meta-metatile once, does the data format have an impact? Should it be column based or row based? What is the impact on the buffer and on the way the buffer is written to the PPU?

tokumaru wrote:
since you thought of the other way first you might be more comfortable implementing that.

Since it's all theoretical and I don't mind about data format because my editor can produce the format I want (or I can program it), either way is fine with me. I'm just trying to find a way that is simpler to develop since my 6502 coding became quite rusty already ^^;;

tokumaru wrote:
Well, even if you did write all 30 tiles of a column the address would not automatically move on to the second column, it would go into the attribute table and then into the next name table, so you'd still have to set the address for every tile column.

Oh. I thought the column increment was a little bit more intelligent than that. I didn't see any mention of it before. For the attribute table, now I can see why since it's at the end of the name table, but why does it jump into the second name table after? Hmm.. I guess it's related to the way the memory is organized. I need to check the wiki or some doc to figure this one out.

Thanks for the comment Tokumaru.

Edit:

I checked and I can see why now. If I use vertical mirroring and write 1 tile at a time starting at $2000 with an increment of 32, once I write the last tile, I end up at $23C0. Once I skip 2 more lines, it becomes $2400, the first column of the second name table. So it just increases the address counter.
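
The arithmetic above can be checked with a tiny Python model of the address counter (a sketch only; the function name is hypothetical and the model simply wraps the address at 14 bits):

```python
def addresses_after_writes(start, count, increment=32):
    """PPU address after each $2007 write, with the +32 increment selected."""
    addr = start
    trail = []
    for _ in range(count):
        addr = (addr + increment) & 0x3FFF   # plain addition, nothing smarter
        trail.append(addr)
    return trail

trail = addresses_after_writes(0x2000, 32)
after_30_tiles = trail[29]   # one full 30-tile column written
after_32_writes = trail[31]  # two more writes, through the attribute area

print(hex(after_30_tiles), hex(after_32_writes))
```

The counter never "wraps" to the top of the next column, which is why the address has to be set again for every tile column.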

by Bregalad on 2009-05-27 (#47272)

What is really hard in scrolling is handling direction changes from the user. If you decode a large portion of metatiles and split the updates into smaller parts, it works fine when scrolling in the same direction, but if you reverse the direction during the update real crap shit will happen and there is no workaround for that.

by tokumaru on 2009-05-27 (#47275)

Bregalad wrote:
If you decode a large portion of metatiles and split the updates into smaller parts, it works fine when scrolling in the same direction, but if you reverse the direction during the update real crap shit will happen and there is no workaround for that.

It doesn't have to be as catastrophic as you make it seem. In his situation, if a column of meta-metatiles is being rendered at the right, and the player does go back to the point where a new column has to be rendered at the left, there is nothing wrong in aborting the previous update.

See, when you go past a certain point (since his blocks are 32 pixels wide, that would be whenever bit 5 of the camera's coordinate changes) a column update is triggered. Say that it takes 4 VBlanks for him to fully update the column. If the player goes back and crosses the point in the opposite direction before those 4 frames are done, there is nothing bad about canceling that update to process the new one. There won't be any visible glitches: for the half-updated section to scroll into view, he'd have to go past the trigger again and keep moving in that direction for more than 4 frames.

Of course I'm assuming that the correctly rendered area is wider than the visible screen by 1 unit in order to hide scrolling glitches. This is the reason why scrolling in both directions is hard with only the stock 2 name tables: in one of the directions it's not possible to correctly render more than 1 screen worth of blocks, so there will be glitches unless you find a way to make the visible area smaller (with IRQs, sprite masks or whatever).

by tokumaru on 2009-05-27 (#47276)

Banshaku wrote:
For now my approach requires checking each meta-metatile 4 times. If you process the meta-metatile once, does the data format have an impact? Should it be column based or row based? What is the impact on the buffer and on the way the buffer is written to the PPU?

Sometimes the way in which data is arranged makes all the difference. I don't know if you'd need to arrange the data differently in the ROM, but the RAM buffers will probably look a bit "unconventional" if you're aiming at performance. Here's how I'd do it:

Each meta-metatile expands to 16 tiles, right? So, for easy indexing, I'd have 16 7-byte buffers to hold the decoded blocks. Something like this:

Code:
Tile00 .dsb 7
Tile01 .dsb 7
Tile02 .dsb 7
Tile03 .dsb 7
Tile10 .dsb 7
Tile11 .dsb 7
Tile12 .dsb 7
Tile13 .dsb 7
Tile20 .dsb 7
Tile21 .dsb 7
Tile22 .dsb 7
Tile23 .dsb 7
Tile30 .dsb 7
Tile31 .dsb 7
Tile32 .dsb 7
Tile33 .dsb 7

The numbers are the coordinates of the tiles inside the meta-metatile ((0,0) to (3,3)). OK, I don't know how you are getting the index of the meta-metatile, but I'd write the decoding routine somewhat like this:

Code:
ldx #$00
-Decode:
;LOGIC TO GET THE INDEX OF THE META-METATILE INTO THE ACCUMULATOR GOES HERE!

;get the index of the 4 metatiles
tay
lda Metatile3, y
pha
lda Metatile2, y
pha
lda Metatile1, y
pha
lda Metatile0, y

;decode the 1st
tay
lda Tile0, y
sta Tile00, x
lda Tile1, y
sta Tile01, x
lda Tile2, y
sta Tile10, x
lda Tile3, y
sta Tile11, x

;decode the 2nd
pla
tay
lda Tile0, y
sta Tile02, x
lda Tile1, y
sta Tile03, x
lda Tile2, y
sta Tile12, x
lda Tile3, y
sta Tile13, x

;decode the 3rd
pla
tay
lda Tile0, y
sta Tile20, x
lda Tile1, y
sta Tile21, x
lda Tile2, y
sta Tile30, x
lda Tile3, y
sta Tile31, x

;decode the 4th
pla
tay
lda Tile0, y
sta Tile22, x
lda Tile1, y
sta Tile23, x
lda Tile2, y
sta Tile32, x
lda Tile3, y
sta Tile33, x

;move on to the next meta-metatile
inx
cpx #$07
bne -Decode
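
For readers more used to high-level languages, the data flow of the routine above can be modeled in Python. This is only a sketch: the table contents are invented sample data, and it follows the routine's row-based ordering (entry 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right) rather than the column-style export described at the start of the thread.

```python
# METATILE[q][block] -> metatile index used in quadrant q of a meta-metatile.
METATILE = [
    [0, 3],
    [1, 2],
    [2, 1],
    [3, 0],
]
# TILE[t][metatile] -> tile number in position t of a metatile (sample data).
TILE = [[0x10 * m + t for m in range(4)] for t in range(4)]

def decode_column(block_ids):
    """Fill 16 buffers keyed by (row, col), one entry per meta-metatile."""
    buf = {(r, c): [0] * len(block_ids) for r in range(4) for c in range(4)}
    for x, block in enumerate(block_ids):          # X register in the routine
        for q in range(4):                         # the 4 metatiles of the block
            meta = METATILE[q][block]
            top, left = (q // 2) * 2, (q % 2) * 2  # quadrant origin in tiles
            for t in range(4):                     # the 4 tiles of the metatile
                buf[(top + t // 2, left + t % 2)][x] = TILE[t][meta]
    return buf

buffers = decode_column([0, 1, 0, 1, 0, 1, 0])
print(buffers[(0, 0)], buffers[(3, 3)])
```

Each of the 16 buffers ends up holding the tile for one fixed position inside the block, for all 7 blocks of the column, which is what lets a single index select the block.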

It's partially unrolled, so it's fast but not too big. I use Y to index the block data because you might want to read it with zero page pointers instead. Oh, I don't know how you are storing the attributes, but they are probably ready as part of the meta-metatile or you'll have to form it by combining the bits of each of the 4 metatiles, but either way you'll have to store it in another array also indexed by X. Anyway, after the buffers are ready, you can write them to VRAM with something like this:

Code:
ldx FirstTile
-Update:
lda Tile00, x
sta $2007
lda Tile10, x
sta $2007
lda Tile20, x
sta $2007
lda Tile30, x
sta $2007

inx
cpx #$07
bne -Update

Of course you can unroll that, arrange the buffer backwards so that you don't need a "cpx" instruction, whatever you want in order to optimize it. Anyway, "FirstTile" would be set beforehand to 0, 7, 14 or 21, depending on which of the 4 columns you want to update.

I know Bregalad will probably say I'm crazy, he always does because he usually doesn't agree with the way I do things. But yeah, I am a fan of data interleaving and moderate levels of unrolling for some extra speed and reduced complexity of the code (less branches, less special cases and such). I hope my ideas give you some good ones.

by Bregalad on 2009-05-27 (#47277)

Well, maybe you found a working workaround for the column update, yet I'm pretty sure that if the user is going right, for example, then goes left a very small amount and goes right again, it's very hard for the scroll engine not to screw up if it decodes larger parts, which is almost always required by the fact that you use metatiles, which are needed unless you have infinite ROM and time to draw your levels.

I remember that I had multidirectional scrolling with a status bar "hiding" the row update glitches (using 1-screen mirroring), and it was impossible for me to have a system that allowed the player to repeatedly change the vertical direction without any major screw-ups.
Anyway I lost all the code that did that, so if I rewrite it I will make a better version of it (but it wasn't for my current game project, it was for another hypothetical future one).

by Dwedit on 2009-05-27 (#47279)

What about fully unrolled code in RAM? I think MC Kids uses that to do scrolling updates. That game also happens to have the 8K WRAM chip, but still...

by tokumaru on 2009-05-27 (#47281)

Of course it's not possible to have clean scrolling in both directions, as we've discussed countless times. But provided there is some "wiggle room" (as is always the case when only one type of scrolling is used), it is possible to flush a buffer small pieces at a time.

The red blocks are blocks that have already been rendered, the yellow box represents the area that is rendered to the screen.

Once the camera moves right and crosses a block boundary, a new column of blocks must be drawn at the far right, after the block that will immediately be displayed (this is why you need some wiggle room).

The data is fully decoded to RAM but only one column is sent to VRAM, so there are 3 more to go during the next few VBlanks.

If the player changes his mind and goes back, there is no problem. If he causes the camera to cross a block boundary, the update in progress is canceled and a new one starts for the left side. If a boundary is not crossed, the update continues normally.

The camera did cross the block boundary after 2 tile columns were updated, but that's not important at all. The new update at the left is processed normally.

If the player doesn't change his mind again, that block will be updated, and that will continue to happen as long as the same direction is maintained. If the camera ever goes back to the right, that partially rendered block will be rendered from the start again.

So, there is no problem in updating blocks little by little if you have something valid to display while the new data is rendered. The valid stuff will keep you from seeing any glitches.
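
The cancel-and-restart behaviour described in this post can be sketched as a small state machine. This is a Python model under simple assumptions (one tile column sent per VBlank, 4 tile columns per block); the class and method names are invented for illustration.

```python
class ColumnUpdater:
    COLUMNS_PER_BLOCK = 4

    def __init__(self):
        self.side = None        # "left" or "right" edge being updated
        self.columns_left = 0   # tile columns still to send
        self.sent = []          # log of (side, column) actually sent

    def trigger(self, side):
        """Camera crossed a block boundary: drop any unfinished update."""
        self.side = side
        self.columns_left = self.COLUMNS_PER_BLOCK

    def vblank(self):
        """Send one tile column per VBlank while an update is pending."""
        if self.columns_left:
            column = self.COLUMNS_PER_BLOCK - self.columns_left
            self.sent.append((self.side, column))
            self.columns_left -= 1

updater = ColumnUpdater()
updater.trigger("right")      # boundary crossed going right
updater.vblank()
updater.vblank()              # 2 of 4 columns sent...
updater.trigger("left")       # ...then the player turns back and crosses again
for _ in range(4):
    updater.vblank()
print(updater.sent)
```

The half-finished block on the right is simply abandoned; it is redrawn from its first column if the camera triggers that side again.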

by tokumaru on 2009-05-27 (#47282)

Dwedit wrote:
What about fully unrolled code in RAM?

Totally unnecessary, in my opinion. It is very much possible to update a great amount of VRAM with partially unrolled code in ROM.

In my scrolling engine I can scroll 16 pixels in both directions if necessary every frame, while also updating sprites, and the bulk of the unrolled code takes only 96 bytes. To update the palette and sparse blocks I need a break from the columns or rows, but I challenge you to find a game (especially a NES game, since not even Sonic on the MD goes that fast) that scrolls 16 pixels in both directions every frame. So chances are there will be time available for other updates besides rows and columns quite often.

If Banshaku really wanted to update all 4 columns during a single VBlank, that would be quite possible with a little extra unrolling of the code I presented. I just assumed he'd want to spread the update in order to have a simpler NMI routine, but it could work both ways.

Quote:
I think MC Kids uses that to do scrolling updates.

I believe I have expressed my opinions on this game a few times on this board already. I feel like it's nothing but a mediocre game that uses mediocre programming solutions, favoring hardware enhancements (such as extra RAM) to clever logic solutions.

by tepples on 2009-05-27 (#47283)

tokumaru wrote:
Quote:
I think MC Kids uses that to do scrolling updates.

I believe I have expressed my opinions on this game a few times on this board already. I feel like it's nothing but a mediocre game that uses mediocre programming solutions, favoring hardware enhancements (such as extra RAM) to clever logic solutions.

But it is a benchmark against which to measure your own programming solutions.

by tokumaru on 2009-05-27 (#47285)

tepples wrote:
But it is a benchmark against which to measure your own programming solutions.

Yes, but just because the author wrote an article about how he made the game many people seem to look at it like the holy bible of platformers.

I have nothing against comparing current solutions to solutions used in the old commercial games, but we have to stop thinking of the programmers of those games as gods, celebrities or something. What they came up with is in no way better than what we can come up with nowadays. The only reason production was better back then is that they were paid to do it. If we were too we'd have much better stuff being released.

Also, this game in particular seems to come up more often than others, probably because of the article, that drew more attention to its internal workings.

by Dwedit on 2009-05-27 (#47289)

I only brought it up because it's one of the few games which I've examined the VRAM update code for. The other games I've looked at are Battletoads, and Monster Max for the GB.

Monster Max used some crazy tricks for its platform, a series of ld hl,XXXX / push hl instructions stored inside RAM that wrote to VRAM.
Battletoads is just plain unroll crazy, even though it screws up by triggering the page crossing 1-cycle penalty many times.

by Bregalad on 2009-05-27 (#47291)

I guess completely unrolled loops are useless when partially unrolled loops can get to about 95% of their performance and waste ridiculously fewer resources.
I can do 2 row/column updates + sprite OAM + palette update with completely rolled loops though.

Tokumaru, your schematics are nice. Yet I fail to see exactly how you "trigger" updates, but I guess it doesn't really matter, as long as you say it works. I'd rather come up with my own project-specific solution anyway. I was just saying that direction changes were often the biggest pain in the ass when working with updates split into small parts.
One cause of this is the combination of multidirectional scrolling and that.
I had a system (the one I lost that I mentioned above) which was the opposite of yours: I updated single 8-pixel columns at a time, and large 4-tile rows by splitting updates into small parts.

The problem was that when updating a column during the split updates of a row, the row was updated in a place where it wasn't supposed to be (where the column was updated). When continuing the scroll in the same direction it would just get updated again so that wasn't a problem, but when changing direction, my system was relying on the rest of the row not being updated in order to go backward, and that resulted in possible garbage when changing direction vertically while scrolling horizontally. The workaround I eventually found was to only allow direction changes at vertical scroll values that are multiples of $10.
Also note that I was trying to minimize updates on that scroller to save them for CHR-RAM.

by Memblers on 2009-05-27 (#47295)

tokumaru wrote:
Of course it's not possible to have clean scrolling in both directions, as we've discussed countless times.

I bet a few people might roll their eyes as they read this, but I wanted to mention that 4-screen mode shouldn't be ignored. Especially if using CHR-RAM already, most of the hardware is already there. But I'd still say it's only worth it if the game (opposed to just to hiding glitches) is helped greatly by having 4 nametables (it's best kept simplified).

Dwedit wrote:
What about fully unrolled code in RAM? I think MC Kids uses that to do scrolling updates. That game also happens to have the 8K WRAM chip, but still...

I was working on a game a while back that just couldn't work without it. It wasn't for the scrolling though, it was needed for all the sprite animations. Rotations and such. It was basically designed around having the highest maximum VRAM throughput.

Bregalad wrote:
I guess completely unrolled loops are useless when partially unrolled loops can get to about 95% of their performance and waste ridiculously fewer resources.

You're right about the resources, but LDA #immediate + write is only 6 cycles. It can copy 256 bytes to VRAM like it's nothing. Even sprite-DMA is only 3 times faster than this. It's overkill for scrolling, of course.

by tokumaru on 2009-05-27 (#47296)

Bregalad wrote:
Yet I fail to see exactly how you "trigger" updates

Code:
lda CameraX
eor OldCameraX
and #%00100000

If the result is not 0, you have to draw a new column. Check the direction of the last movement to tell if the movement was to the left or to the right, and based on that decode the appropriate column from the map and calculate the appropriate destination in VRAM.
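
The same test can be written out in Python as a sketch. It assumes the camera coordinates are plain integers that do not wrap, and that the camera moves less than 32 pixels per frame; the function name is hypothetical.

```python
BLOCK_BIT = 0b00100000  # bit 5 toggles every 32 pixels (one block width)

def column_trigger(camera_x, old_camera_x):
    """Return "left"/"right" when a block boundary was crossed, else None."""
    if ((camera_x ^ old_camera_x) & BLOCK_BIT) == 0:
        return None                      # same block as last frame
    return "right" if camera_x > old_camera_x else "left"

print(column_trigger(32, 31), column_trigger(33, 32), column_trigger(31, 32))
```

Comparing against the previous frame's coordinate means the check fires exactly once per boundary crossing, in either direction.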

Memblers wrote:
I bet a few people might roll their eyes as they read this, but I wanted to mention that 4-screen mode shouldn't be ignored.

I think this is a pretty good solution, and I usually mention that the problem mainly exists because of the lack of 4-screen "mirroring", but the sentences come out too long if I say it all the time. But yeah, what I meant was "it's impossible to have glitchless scrolling in both directions without masking tricks or extra name table memory (4-screen)". Since 4-screen is not so common (and modifying a cart to use it is not so trivial), I usually favor the masking alternative.

Quote:
You're right about the resources, but LDA #immediate + write is only 6 cycles. It can copy 256 bytes to VRAM like it's nothing.

But you also have to consider that buffering the data in that format would be pretty time-consuming. You know, using 1 byte out of every 5 should be pretty hard to index efficiently, so you'll spend much of your game logic time shuffling bytes around.
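
The numbers traded in this exchange can be checked with a little arithmetic, sketched here in Python. The instruction costs are the standard 6502 ones (LDA #imm: 2 cycles, 2 bytes; STA absolute: 4 cycles, 3 bytes); the sprite DMA figure of roughly 513 cycles is an assumption of this sketch, as are the names.

```python
LDA_IMM = (2, 2)        # (cycles, code bytes) for LDA #imm
STA_ABS = (4, 3)        # (cycles, code bytes) for STA $2007
OAM_DMA_CYCLES = 513    # approximate cost of a 256-byte sprite DMA

def unrolled_cost(n_bytes):
    """Cycles and RAM code size for n_bytes sent as LDA #imm / STA $2007 pairs."""
    cycles = n_bytes * (LDA_IMM[0] + STA_ABS[0])
    code_size = n_bytes * (LDA_IMM[1] + STA_ABS[1])
    return cycles, code_size

cycles, code_size = unrolled_cost(256)
ratio = round(cycles / OAM_DMA_CYCLES, 1)
print(cycles, code_size, ratio)
```

Each data byte sits inside 5 bytes of code, which is the "1 byte out of every 5" that makes the buffer awkward to fill from game logic.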

by Celius on 2009-05-27 (#47299)

I agree with tokumaru about copying all of the code to RAM. I thought about it when making my extended Vblank engine, but found that it would be too hard to deal with when trying to update the data in each lda #xx statement. Instead of doing that, I actually stuck with doing this:

lda Array
sta $2007
lda Array+1
sta $2007
lda Array+2
sta $2007
....

With no indexing to save time. I also never check if I'm crossing a name table boundary because I always update the entire column/row of tiles on the name table. It's just up to me to arrange the data in each array so that it is displayed correctly. Though one might argue this takes more time and adds as much complexity as you'd get just copying code to RAM. I'd have to think about that some more though.

I usually stick with vertical mirroring. Though there's no clipping option, it's more noticeable with glitches horizontally as our eyes are side by side rather than in our skulls vertically, and since that's the case we notice things like horizontal symmetry more so than vertical symmetry.

I actually was lazy with my side-scrolling scrolling engine and I don't check for updates. Instead I just tell it to copy the column of tiles that should be displayed at ScreenX + 256 or ScreenX - 1 depending on which direction the screen is scrolled in. It does it every frame. It's okay though as it doesn't take very long for the map decoding. In a bigger game with multi-directional scrolling, I would definitely need to check for updates.

Oh, and I finally checked out MC Kids. I also find it pretty mediocre, and would much rather like to hear what is done in SMB3, or Batman ROTJ, or other more impressive games. I didn't find anything impressive about that game, and actually found the controls really annoying. I would not look to it for advice on anything (maybe I'd check out what it does for diagonal collision, but I think I already know how to do that).

by tokumaru on 2009-05-27 (#47301)

Celius wrote:
Instead of doing that, I actually stuck with doing this:

lda Array
sta $2007
lda Array+1
sta $2007
lda Array+2
sta $2007
....

With no indexing to save time.

I use similar code, but it uses indexing so that it's possible to select what portion of the buffer to read, and also to copy variable amounts of data by entering at different points of the copying code. About speed, it doesn't matter if you're using indexes if the RAM in question is not zero page. For regular RAM, "LDA $XXXX" and "LDA $XXXX, X" take the same time, 4 cycles.

Quote:
Oh, and I finally checked out MC Kids. I also find it pretty mediocre

Read the article and you'll see that there is nothing special about it. In the "Game Levels" section they explain why the 8KB of RAM was needed so that they could have modifiable levels, but seeing as they ended up storing levels uncompressed in that RAM, the levels couldn't be very large anyway, meaning that there wouldn't be a lot to modify (to the point where the modifiable stuff could be represented with objects, IMO). Most of the time they just picked the straightforward solutions, probably because they were easier to implement. The way they handle terrain collisions is pretty bad too; they even left in a bug that results from the way they handle characters going up a hill.

by Banshaku on 2009-05-27 (#47302)

Quite a burst of messages in such a short amount of time! A lot of interesting comments too.

tokumaru wrote:
I hope my ideas give you some good ones.

Of course. For now I have a basic idea of what you're doing. I still need to re-read the code again to understand it well.

From some of the comments, it seems that people sometimes show some "animosity" when others bring up their concepts/ideas. From my point of view, any concept presented is invaluable information, regardless of how big it is, especially when you're learning a system. It doesn't mean that you have to use that concept/code exactly, but it can help you think in ways you may not have on your own.

The first game I want to do is a remake of an old DOS game that I started 15 years ago (I want some closure). It only scrolls in one direction so it should be easier to develop for my first game. The challenge will be to scale down the colors/sprite sizes at first.

The other ones will require scrolling in both directions, and I've already started to write down ideas. By the time I finish the first one it should be easier to implement scrolling in both directions.

by Celius on 2009-05-27 (#47303)

tokumaru wrote:
I use similar code, but it uses indexing so that it's possible to select what portion of the buffer to read, and also to copy variable amounts of data by entering at different points of the copying code. About speed, it doesn't matter if you're using indexes if the RAM in question is not zero page. For regular RAM, "LDA $XXXX" and "LDA $XXXX, X" take the same time, 4 cycles.

I use indexing similarly for a piece of code which copies CHR data from PRG ROM to CHR RAM:

lda $8000,x
sta $2007
lda $8001,x
sta $2007
lda $8002,x
sta $2007
...
lda $800E,x
sta $2007
lda $800F,x
sta $2007

And you just load X with a multiple of $10 to load a tile from $8000-$80FF. I have that piece of code for $8000, $8100, $8200... all the way up to $AF00. It makes tile copying really fast.
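
How a tile number might map onto "which routine" and "which X" can be sketched in Python. This assumes, hypothetically, that tiles are stored back to back from $8000 at 16 bytes each and numbered from 0; the function names are invented.

```python
def tile_source(tile_number, base=0x8000):
    """Return (page_address, x_value) selecting one 16-byte tile."""
    page = base + (tile_number >> 4) * 0x100   # one unrolled routine per page
    x = (tile_number & 0x0F) * 0x10            # multiple of $10 within the page
    return page, x

def bytes_read(tile_number):
    """The 16 source addresses touched by 'lda page+n, x' for n = 0..15."""
    page, x = tile_source(tile_number)
    return [page + offset + x for offset in range(16)]

print([hex(v) for v in tile_source(17)], hex(bytes_read(17)[-1]))
```

Since X never exceeds $F0 and the largest offset is $0F, every read stays inside the routine's own 256-byte page.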

But still, checking for name table boundaries SUCKS. I remember coding a multi-directional scroller as being one of the worst programming experiences of my life (that was primarily because of the attribute table, though). However, I can't say I've had that many programming experiences!

Leaving in bugs is pretty unacceptable, depending on the seriousness of the bug. But I thought horizontal movement in the game was really annoying more than anything. It made me feel pretty okay about the physics I just programmed for my game. I'll have to read more of the article to fully appreciate the game's mediocrity.

by Banshaku on 2009-05-28 (#47309)

Tokumaru, after reviewing the code example, I have a few questions.

First, the tile variables. Let's take Tile01 as an example. The first number represents the row within the meta-metatile. The second one is the column. Is that correct?

For the way of decoding, I'm starting to understand it. The only thing is that it seems that tile0 of all metatiles are located one after another. I can see why, since it seems to make it easier to process. In my case, tile0/1/2/3 are sequential. I will have to see how much of an issue that is with this approach.

Once decoded in memory, I see what you want to do but this part:

Quote:
Anyway, after the buffers are ready, you can write them to VRAM with something like this:

Code:
ldx FirstTile
-Update:
lda Tile00, x
sta $2007
lda Tile10, x
sta $2007
lda Tile20, x
sta $2007
lda Tile30, x
sta $2007

inx
cpx #$07
bne -Update

Of course you can unroll that, arrange the buffer backwards so that you don't need a "cpx" instruction, whatever you want in order to optimize it. Anyway, "FirstTile" would be set beforehand to 0, 7, 14 or 21, depending on which of the 4 columns you want to update.

You say that you can change X to select which column to access. I don't think that's possible in the current example. X only represents which meta-metatile you want to access, 7 possible values only. If you want to access the second column, you would have to change Tile00 to Tile01, Tile10 to Tile11, etc.

One way I thought I could do it is by changing the variable order a little and using both index registers. Maybe it's not a good approach, but trying out algorithms is a good way to improve my knowledge of 6502 ASM. Here's how I would do it.

First, I would change the variable order this way:
Code:
; Column 0
Tile00 .dsb 7
Tile10 .dsb 7
Tile20 .dsb 7
Tile30 .dsb 7

; Column 1
Tile01 .dsb 7
Tile11 .dsb 7
Tile21 .dsb 7
Tile31 .dsb 7

; Column 2
Tile02 .dsb 7
Tile12 .dsb 7
Tile22 .dsb 7
Tile32 .dsb 7

; Column 3
Tile03 .dsb 7
Tile13 .dsb 7
Tile23 .dsb 7
Tile33 .dsb 7

There are 28 bytes per column. In the X register, I need to put the offset of the column I want to access. It can be 0, 28, 56 or 84. I could put them in a LUT, maybe this way:

Code:
ColumnIndex:
.byte 0, 28, 56, 84

So the code becomes

Code:

ldx #$01 ; I want the second column

lda ColumnIndex, x ; Get the index
tax

ldy #$0 ; My counter for meta-metatile to show

-Update:
lda Tile00, x
sta $2007
lda Tile10, x
sta $2007
lda Tile20, x
sta $2007
lda Tile30, x
sta $2007

inx
iny

cpy #$07
bne -Update

This way I can access the column based on an index. It's not perfect though. For example, I don't like that the name used to access the data based on the index is the first column name. I'm trying to find a cleaner way to do it. I guess the more I practice, the more I will be able to think in 6502. This is my current problem at the moment. I'm just used to higher-level languages so the solution doesn't come easily.

Does the modification I made make sense? I didn't test it in code so this is all theoretical.

by tomaitheous on 2009-05-28 (#47314)

tokumaru wrote:
Read the article

Cool link. Love reading articles like that from past developers.

by tokumaru on 2009-05-28 (#47317)

Banshaku wrote:
First, the tiles variables. Lets take tile01 as an example. The first number represent which row of the meta-metatile. The second one is the column. Is it correct?

Yeah. And I've named the RAM buffers TileXX but the ROM labels TileX, I know that sucks, but it was just to get the idea across... =)

Quote:
For the way of decoding, I'm starting to understand it. The only thing is that it seems that tile0 of all metatile are located one after another. I can see why, since it seems to make it easier to process.

Yeah, that's what I was calling interleaving. It makes accessing data easier, because you can read a whole block using the same index. I store most of my data like this. I believe this is what tepples calls "structure of arrays", as opposed to "arrays of structures". The 6502 really works best with the former.

Quote:
You say that you can change X to select which column to access. I don't think that's possible in the current example.

My mistake. It should work as I described, what I got wrong was that "cpx #$07", which will obviously cause problems if you load it with anything besides 0. You used Y to count instead, which fixes the problem.

I assumed the tiles would be numbered like this:

Code:
00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

If X is initially 7, the first address read will be Tile00 + 7, which is the same as Tile01. If it was 14, the first address would be Tile00 + 14, which is the same as Tile02. I was just thinking of it as row-based, while you were doing it column-based, but the idea is exactly the same.

Quote:
For example, I don't like that the name used to access the data based on the index is the first column name. I'm trying to find a cleaner way to do it.

Well, it's one of the prices you pay for using optimized code, it usually isn't as readable as the unoptimized version. You could always document it well, saying that the buffer must be ordered that way.

If what makes you uncomfortable is accessing an array by using another array's name (something that would most likely be an error in a high level language), you could always define a single 112-byte array and use constants and formulas to index it properly. So instead of:

Code:
; Column 0
Tile00 .dsb 7
Tile10 .dsb 7
Tile20 .dsb 7
Tile30 .dsb 7

; Column 1
Tile01 .dsb 7
Tile11 .dsb 7
Tile21 .dsb 7
Tile31 .dsb 7

; Column 2
Tile02 .dsb 7
Tile12 .dsb 7
Tile22 .dsb 7
Tile32 .dsb 7

; Column 3
Tile03 .dsb 7
Tile13 .dsb 7
Tile23 .dsb 7
Tile33 .dsb 7

You'd have:

Code:
BLOCK = 1
ROW = 7
COLUMN = 7 * 4

TileBuffer .dsb 112

And to access tile (3, 3) of meta-metatile 2 you'd use address TileBuffer + BLOCK * 2 + ROW * 3 + COLUMN * 3. If you use an index register to index the blocks just leave the "BLOCK *" part out.

It solves your problem of going into the arrays using the wrong names, but it's not very pretty. Of course you could pre-calculate the 16 constants used to index each tile, and although that would be very little different than having multiple tables, you'd still have the illusion of using a single table. The assembly output will be the same, so there will be no performance issues. I doubt you could maintain the performance and the simplicity of the code if you decided to organize the data in some other way though.
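
As a rough sketch of the pre-calculated constants idea (same assembler syntax as the rest of the thread, using the ROW and COLUMN constants defined above; the TILE_xx names are made up here purely for illustration):

Code:
; Offset of each tile position (row, column) inside TileBuffer
TILE_00 = ROW*0 + COLUMN*0 ; = 0
TILE_10 = ROW*1 + COLUMN*0 ; = 7
TILE_33 = ROW*3 + COLUMN*3 ; = 105

ldx #2 ; meta-metatile 2
lda TileBuffer+TILE_33, x ; reads TileBuffer + 107

Only 3 of the 16 constants are shown; the others follow the same formula.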

Anyway, there will be some implications in assembly that don't exist in high-level languages that we must learn to deal with. Sometimes we will need variables and arrays to be defined in a certain order because of the way we access them. Sometimes we will have to place code at specific locations because of bankswitching, page-crossing penalties and things like that. Sometimes you will have to sacrifice readability if you're aiming for performance. None of those things are "clean", but it should be OK if you document well whatever may be unclear and you'll hardly have problems in the future.

Quote:
Does the modification I made make sense? I didn't test it by code so this is all theoretical.

Yeah, you got the idea perfectly. I think we just had columns and rows switched, and that's why our code is different (and also my "cpx" mistake).

by Banshaku on 2009-05-28 (#47319)

A little bit after you posted, I got it working. I can now print my meta-metatiles on the screen. It was a pain at first since I hadn't done any assembly for a while (you lose it quite fast). I had 1 bug, which I will explain later. On the bright side, it made me use the debugger a lot and learn many things at the same time.

tokumaru wrote:
Yeah, that's what I was calling interleaving. It makes accessing data easier, because you can read a whole block using the same index.

More than easier, actually. It reduces the code a lot. Because I was not using any interleaving, the code was doubled by all the extra asl/asl/tax needed just to find the right position of every metatile.
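
To illustrate the difference, here is a rough side-by-side sketch. The names (MetatileIndex, Metatiles, MetaTile0, MetaTile1) are hypothetical and not taken from the actual project:

Code:
; Not interleaved: the 4 bytes of a metatile are stored together,
; so the index must be multiplied by 4 before use
lda MetatileIndex
asl
asl
tax
lda Metatiles+0, x
sta $2007
lda Metatiles+1, x
sta $2007

; Interleaved: one table per tile position, same index for every table
ldx MetatileIndex
lda MetaTile0, x
sta $2007
lda MetaTile1, x
sta $2007

Note that the first version also limits you to 64 metatiles, since the index times 4 has to fit in the 8-bit X register.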

tokumaru wrote:
I store most of my data like this. I believe this is what tepples calls "structure of arrays", as opposed to "arrays of structures". The 6502 really works best with the former.

I never thought about that way of storing data before, this is quite new to me. If you had to edit the metatiles by hand, that would (maybe) make them a little harder to update. With an editor, you don't have to worry about it. After seeing the benefit of it, tomorrow I will add a quick hack to my editor's export code to interleave the data.

For writing the column, I'm using column-based writes. I first set the PPU address, then write the tiles with the 32 increment. Because of that, I made a little mistake in the code I wrote by hand above. The code for writing a column is wrong since it writes 4 columns at the same time. In my case, I buffer 4 columns and write them one at a time, when required. The code should have been this instead:

Code:

ldx #$01 ; I want the second column

lda ColumnIndex, x ; Get the index
tax

ldy #$0 ; My counter for meta-metatile to show

-Update:
lda Tile00, x
sta $2007
lda Tile01, x
sta $2007
lda Tile02, x
sta $2007
lda Tile03, x
sta $2007

inx
iny

cpy #$07
bne -Update
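
The loop above assumes the PPU address and the increment mode were already set. A minimal sketch of that setup could look like this, where ColumnAddrHi and ColumnAddrLo are hypothetical variables holding the name table address of the top of the column:

Code:
lda #%00000100 ; $2000 bit 2 set: add 32 after each $2007 access (NMI bit is left at 0 here)
sta $2000
lda $2002 ; reset the $2006 write latch
lda ColumnAddrHi ; high byte first, e.g. $20
sta $2006
lda ColumnAddrLo ; then low byte, e.g. $01 for the second tile column
sta $2006

In a real program the value written to $2000 would also carry whatever other bits (NMI enable, pattern tables) are already in use.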

For now I don't mind the name anymore because, like you mentioned, there's not much you can do with optimized code. I will live with it. I'm used to working with one-dimensional arrays for my map data in higher level languages, but I'm starting to like this way of defining interleaved tiles for the buffer after all.

Now it's 2 in the morning here and I will have quite a lot of fun waking up for work tomorrow!... I'm trying to think if there is a way to store my map data that makes it easier to access by column, but I'm too sleepy to think clearly. My map data is column based. This means the first 7 bytes represent the first column of meta-metatiles, the next 7 the next column, etc. I don't know if I can optimize it or if it is already fine that way. I guess I should think about it tomorrow instead.
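
One simple possibility with this layout, as an untested sketch: since every column is 7 bytes, keep a 16-bit pointer to the current column and add 7 to move to the next one. MapPtr is a hypothetical zero page variable:

Code:
; Advance to the next map column (7 meta-metatiles per column)
clc
lda MapPtr
adc #7
sta MapPtr
bcc +
inc MapPtr+1
+

; Read the 7 meta-metatile indexes of the current column
ldy #0
-
lda (MapPtr), y
; ... decode this meta-metatile into the tile buffer here (Y must be preserved) ...
iny
cpy #7
bne -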

Thanks for the comments, that made me learn a lot of new things!

by CartCollector on 2009-06-06 (#47708)

Quote:
Quote:
Of course it's not possible to have clean scrolling in both directions, as we've discussed countless times.
I bet a few people might roll their eyes as they read this, but I wanted to mention that 4-screen mode shouldn't be ignored. Especially if using CHR-RAM already, most of the hardware is already there. But I'd still say it's only worth it if the game (opposed to just to hiding glitches) is helped greatly by having 4 nametables (it's best kept simplified).

OT but:
Actually it is possible to have clean 4-way scrolling with only the NES's built in 2k name table and attribute table RAM. Look at Super C and Jurassic Park. If you use vertical mirroring and disable PPU rendering or set all CHR-ROM to a bank of blank tiles for at least the first and last 8 scanlines (the latter method is used by Jurassic Park), and you disable the rendering of the leftmost 8 pixels, then you'll get clean four-way scrolling for both the background and sprites. Now you could argue, especially if the game you're writing is PAL, that effectively reducing the resolution to 248 x 224 is limiting, but it's better than getting crap on the sides of the screen all the time. Another caveat is that these methods are hard to do without an MMC3 or better, but if you're writing a game that uses four-way scrolling, chances are you're already using a mapper that meets that requirement.

by Bregalad on 2009-06-06 (#47713)

Without an advanced mapper it may be hard to hide the bottom 8 scanlines, but definitely not the top 8 scanlines since those are right after VBlank. As a plus, you can have sprites scrolling smoothly off the screen if you do that (however you'll need to blank 16 lines if you use 8x16 sprites).

by tokumaru on 2009-06-06 (#47728)

Yeah, we know it's possible, it's just not a trivial thing to do. Very few commercial games put actual effort into hiding scrolling glitches. The most notorious ones, IMO, are Jurassic Park, which you mentioned, Alfred Chicken and Felix the Cat. Those last 2 use the very extreme method of masking the leftmost 8-pixel column by hardware and masking the rightmost 8-pixel column with lots of 8x16 sprites (!) while using horizontal mirroring.
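
For reference, the hardware part is just a matter of which bits are written to $2001. A minimal sketch, assuming no color emphasis or grayscale bits are wanted:

Code:
lda #%00011000 ; bits 3 and 4 set: show background and sprites
               ; bits 1 and 2 clear: hide both in the leftmost 8 pixels
sta $2001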

Some games that mask scanlines at the bottom of the screen instead should also be mentioned. Most use an IRQ for this, and that's technically uninteresting for homebrewers that aren't using sophisticated mappers. One very interesting game that manages to do this with a sprite 0 hit is Big Nose Freaks Out. I believe other games by Code Masters did as well, such as Micro Machines. Relying on a sprite hit at the bottom of the screen is a pretty risky thing to do. You must be pretty confident about your timing, because if frame calculations do not finish in time, the resulting glitches will look very unpleasant.
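
The usual polling pattern for a sprite 0 hit looks roughly like the sketch below. It assumes sprite 0 has been placed so that it overlaps an opaque background pixel near the scanline where the masking should start:

Code:
-
bit $2002 ; bit 6 of $2002 (sprite 0 hit) goes into the V flag
bvs - ; wait until the flag from the previous frame is cleared
-
bit $2002
bvc - ; wait for this frame's sprite 0 hit
lda #%00000000
sta $2001 ; turn rendering off for the remaining scanlines

If the hit never happens, the second loop never exits, which is one more reason this approach needs care.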

by Celius on 2009-06-07 (#47740)

Actually, if you have a scanline or cycle based counter, you could wait until scanline 232 (8 pixels from the bottom) and turn off the screen until 8 scanlines outside of the next frame's Vblank to turn the screen back on. I do something similar for my polygonal movie drawing, except it's way more extreme (40 scanlines from the bottom, wrapping around 40 scanlines outside of the next frame's Vblank) and it can be done with sprite 0 hit. It requires very precise timing though.

The thing about 8x16 sprites actually isn't a bad solution if using horizontal mirroring... That would take 15 sprites though, and I guess that's a lot. However if you have not a lot of sprites on the screen at once, I guess it's a pretty good solution.

by tepples on 2009-06-07 (#47743)

tokumaru wrote:
Relying on a sprite hit at the bottom of the screen is a pretty risky thing to do. You must be pretty confident about your timing, because if frame calculations do not finish in time, the resulting glitches will look very unpleasant.

The same is true of any game for Super NES or Game Boy Advance that uses mode 7: hiccups while generating the HDMA tables can cause all sorts of yucky glitches.

by Bregalad on 2009-06-07 (#47744)

Well, as long as the HDMA transfers are initiated correctly during VBlank there is not much risk; the CPU can slow down as much as it wants and the screen won't show any glitches (for the SNES at least, I haven't investigated the GBA yet).