Game project help and progress thread

Re: Game project help and progress thread
by tokumaru on 2015-03-23 (#143602)

Tsutarja wrote:

1. What would be the best way to do a fade in/out with a single routine?

I like to keep a copy of the unmodified palette in RAM, and have a routine generate a darkened copy of it based on a global brightness level. Pretty much what Bregalad said, except I'm not a big fan of the 4-step fade because it looks very jerky unless you animate it really fast, which kinda defeats the purpose of having a fading routine, which is to make transitions smoother. I prefer to simulate more brightness levels in some way, like darkening the colors one frame, moving them towards blue the other frame, or darkening colors of different brightness at different times ($0x colors become black, then $1x colors become $0x, and so on, until they're all black). You can use whatever darkening method you like in the fading function.

Quote:

2. What would be a good way to make main loop recognize the game's state (title, paused, main game, cutscene etc.)? Should I put a short code at the beginning of the main loop that checks a variable for game state and then jumps to that part of the main loop?

I vote for different loops for different states. In fact, all the different states in my programs have an entry point, where any necessary initialization takes place, followed by the loop itself. This way I can switch to another game mode simply by JMPing to its entry point.

Code:

InitializeTitleScreen:
   ;initialize variables, draw the name table, etc.
UpdateTitleScreen:
   ;read controllers, move the cursor, etc.
   jmp UpdateTitleScreen

InitializeGameplay:
   ;initialize variables, decode a level, set up the pattern tables, etc.
UpdateGameplay:
   ;read the controllers, process game objects, etc.
   jmp UpdateGameplay

Depending on how complex your pause screen is, it might not deserve its own state, but if it's a complicated menu or something, you might want to make it a sub-state, since it will interrupt the gameplay and resume it later. The main difference is that its initialization can't be destructive, it has to use RAM that's not in use by the parent state and it has to be aware that modifications to the screen and other things used by the gameplay have to be undone before returning.

In my projects I even allow custom NMI routines for the different states. My NMI checks the high byte of a pointer in RAM, and if that byte isn't 0, it jumps to that address. If the byte is 0, it just sets a flag indicating that VBlank has started, so that simpler game states can use the good old "wait for VBlank" structure.
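
A minimal, untested sketch of that kind of NMI dispatch could look like the code below. CustomNMI (a 2-byte pointer in RAM) and VBlankFlag are names made up for this example, not the ones from my projects:

Code:

NMI:
   pha ;save A, since it's used for the check
   lda CustomNMI+1 ;get the high byte of the custom handler's address
   bne CallCustomNMI ;a non-zero high byte means a handler is installed
   lda #$01 ;no custom handler...
   sta VBlankFlag ;...so just signal that VBlank has started
   pla ;restore A
   rti ;return to the interrupted code
CallCustomNMI:
   pla ;restore A, the custom handler saves whatever it needs
   jmp (CustomNMI) ;the custom handler must end with its own rti

Using the high byte as the "installed" test works because handlers normally don't live in page zero. Keep the pointer away from addresses ending in $FF, since the 6502's indirect JMP doesn't cross pages correctly.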

Re: Game project help and progress thread
by tokumaru on 2015-03-23 (#143603)

Bregalad wrote:

No. I don't even know what is your sleep variable. Why would you want multiple main loops ?

I also have no idea what the sleep variable is, but multiple main loops help keep the program organized and the different parts independent from each other. Personally I'd prefer that over spaghetti code with multiple game states mixed together any day. If separate game states happen to share a lot of code, you can consider merging them or turning the shared code into subroutines.

Re: Game project help and progress thread
by Bregalad on 2015-03-23 (#143605)

Oh, the method tokumaru posted looks quite organized and powerful. Personally I just don't work with "states" at all. To pause the game, well I just read the controller until start is pressed again and then return. Same for the title screen, I have a loop and when start is pressed, it continues down into the main program. It is just logical for me. However, it might not be as organized.

I think what I describe above could be considered multiple game loops. However, all of them are implicit, they are just natural, because I don't leave a routine (or part of a routine) until some event happens that makes me leave it with the rts instruction.

Re: Game project help and progress thread
by tokumaru on 2015-03-23 (#143607)

Bregalad wrote:

Personally I just don't work with "states" at all. To pause the game, well I just read the controller until start is pressed again and then return. Same for the title screen, I have a loop and when start is pressed, it continues down into the main program. It is just logical for me.

I don't think this is bad as long as the states aren't very complex. A pause screen that does nothing but wait for the game to be unpaused certainly doesn't deserve its own state. The same goes for a title screen that only waits for start. But once you have states that allow navigation and so on, I think it becomes important to separate things, otherwise the code might become hell to maintain.

Quote:

I think what I describe above could be considered multiple game loops. However, all of them are implicit, they are just natural, because I don't leave a routine (or part of a routine) until some event happens that makes me leave it with the rts instruction.

Yeah, I guess it's not that different.

Re: Game project help and progress thread
by Tsutarja on 2015-03-23 (#143610)

The sleep variable is what prevents the main loop from running more than once per frame. I have seen a lot of people use this name for it.

Re: Game project help and progress thread
by tokumaru on 2015-03-23 (#143611)

Tsutarja wrote:

The sleep variable is what prevents the main loop from running more than once per frame.

In other words, it's a wait for VBlank.
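
For anyone unfamiliar with the pattern, here's a rough sketch of how it's usually set up. The variable name sleeping follows the common usage mentioned above; the labels and the placement of the game logic are assumptions, not Tsutarja's actual code:

Code:

MainLoop:
   ;read controllers, run game logic, etc.
   lda #$01 ;the frame's work is done...
   sta sleeping ;...so go to sleep
WaitForNMI:
   lda sleeping ;keep checking the variable...
   bne WaitForNMI ;...until the NMI handler clears it
   jmp MainLoop ;VBlank has started, run the next frame

NMI:
   pha ;save A
   ;VRAM updates, sprite DMA, scroll, etc.
   lda #$00 ;wake the main loop up...
   sta sleeping ;...by clearing the variable
   pla ;restore A
   rti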

Re: Game project help and progress thread
by zzo38 on 2015-03-23 (#143632)

When using NESASM, remember these features of its syntax:

For zero page addressing, put < before the address or label (for example LDA <var0 if your variable in the zero page is called var0).
For indirection in all cases, you must use square brackets rather than parentheses.

Some people (including myself) prefer this nonstandard syntax, but keep in mind that it is nonstandard and that many documents don't use it.
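
A few lines to illustrate both points, assuming var0 and pointer are placeholder labels in the zero page (as far as I know, NESASM falls back to absolute addressing when the < is left out):

Code:

 LDA <var0 ;zero page addressing, selected by the <
 LDA var0 ;same variable, but assembled with absolute addressing
 LDA [pointer], y ;indirect indexed, written as (pointer), y in most documents
 JMP [pointer] ;indirect jump, written as JMP (pointer) in most documents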

Re: Game project help and progress thread
by Tsutarja on 2015-03-24 (#143648)

zzo38 wrote:

When using NESASM, remember these features of its syntax:

For zero page addressing, put < before the address or label (for example LDA <var0 if your variable in the zero page is called var0).
For indirection in all cases, you must use square brackets rather than parentheses.

Some people (including myself) prefer this nonstandard syntax, but keep in mind that it is nonstandard and that many documents don't use it.

Is there a syntax guide for NESASM somewhere? I haven't been able to find one. I feel like there might be some functions in NESASM that I don't know about.

Re: Game project help and progress thread
by Alp on 2015-03-24 (#143709)

Tsutarja wrote:

Is there a syntax guide for NESASM somewhere? I haven't been able to find one. I feel like there might be some functions in NESASM that I don't know about.

Depending on where you downloaded NESASM, you may or may not have the text file that's included with it.
I've attached it to this post, for your convenience. (You may need to open it in a programming utility.)

Re: Game project help and progress thread
by Tsutarja on 2015-03-30 (#143974)

I've been trying to get the palette stuff working, but every time I come up with an idea on how to do it, I run into some problem that I can't solve... I may need some example code to further explain the whole fading thing.

Re: Game project help and progress thread
by tokumaru on 2015-03-30 (#143999)

I just coded (which means I haven't tested it!) a sample fading system, which hopefully will give you some ideas. Fading is actually a sequence of events, and the first thing we need to take care of is the animation itself: Are we fading? In or out? Is it time to change the brightness? Here's the code that handles this part:

Code:

   lda #$10 ;assume we're fading in and the step is positive
   ldx CurrentBrightness ;compare the current brightness...
   cpx TargetBrightness ;...against the target brightness
   beq DoneFading ;skip fading if the target was already reached
   bmi UpdateDelay ;skip the next instruction if we're indeed fading in
   lda #$f0 ;oops, we're actually fading out, make the step negative
UpdateDelay:
   dec FadeDelay ;decrement the frame counter
   bne DoneFading ;skip fading if the delay isn't over yet
   clc ;prepare for addition
   adc CurrentBrightness ;change the brightness level...
   sta CurrentBrightness ;...according to the step value
   jsr GenerateModifiedPalette ;generate the modified palette
   lda #$04 ;reset the delay...
   sta FadeDelay ;...to the value of your choice
DoneFading:

The nice thing about this system is that all you need to do in order to fade is set a new brightness target. Every time you change the target, this code will detect that it's different from the current brightness and will automatically animate towards the target. Call this once per frame. Obviously, you have to initialize CurrentBrightness, TargetBrightness and FadeDelay to the correct values on power up, but after that you only need to change TargetBrightness whenever you need to fade.

Note that this also allows you to fade to white, not only black. Here are the common values you'll use for different brightness settings (current and target):

Code:

$C0 (i.e. -$40) - all black;
$00 - normal brightness;
$40 (i.e. +$40) - all white;

The next step is actually modifying the colors. The code above calls the "GenerateModifiedPalette" function, which is this:

Code:

GenerateModifiedPalette:
   lda BackgroundColor ;get the background color
   jsr ModifyColor ;modify its brightness...
   sta ModifiedBackgroundColor ;...and save
   ldx #23 ;repeat the following for the other 24 colors
ModifyPalette:
   lda Palette, x ;get the color
   jsr ModifyColor ;modify its brightness...
   sta ModifiedPalette, x ;...and save
   dex ;move on to the next color
   bpl ModifyPalette ;repeat if there are still colors left
   rts ;return

This is just a loop to modify all 25 colors. Storing the palette like this is a personal choice of mine, because I find it silly to use 32 bytes for colors when the NES can only display 25. You can obviously modify this if you prefer the other way. Anyway, the actual color modification happens here:

Code:

ModifyColor:
   clc ;prepare for addition
   adc CurrentBrightness ;change the brightness
   cmp #$0d ;compare the result against the stupid "forbidden color"
   beq UseBlack ;use black if the result is the "forbidden color"
   cmp #$c0 ;check if the result is too dark
   bcs UseBlack ;use black if the result is too dark
   cmp #$40 ;check if the result is too light
   bcs UseWhite ;use white if the result is too light
   rts ;return the modified color
UseBlack:
   lda #$0f ;use black
   rts ;return black
UseWhite:
   lda #$30 ;use white
   rts ;return white

That's all there is to it. All that's left now is to copy the processed palette to VRAM during VBlank, but I'm sure everyone knows how to do this, so I won't make this post any longer with something so trivial.

Note that this system is pretty versatile. Just as easily as you can fade to black, normal or white, you can fade to the intermediary steps. For example, you could use a target brightness of $E0 to make a room darker but not quite black, for when characters destroy light bulbs or something like that. Another possibility would be to set both CurrentBrightness and TargetBrightness (so that the change is instantaneous, without any animation) to $30, and manually call GenerateModifiedPalette to make everything very bright but not quite white, for effects like lightning storms.
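
Here's a quick, untested sketch of those two uses. PaletteUpdateFlag is a hypothetical flag that the NMI handler would check before copying the modified palette to VRAM; everything else comes from the code above:

Code:

   ;animated: dim the room, the fading code does the rest
   lda #$e0 ;darker, but not quite black...
   sta TargetBrightness ;...is the new target

   ;instantaneous: lightning flash
   lda #$30 ;very bright, but not quite white
   sta CurrentBrightness ;set the current brightness...
   sta TargetBrightness ;...and the target, so nothing animates
   jsr GenerateModifiedPalette ;rebuild the modified palette right away
   lda #$01 ;request the VRAM update...
   sta PaletteUpdateFlag ;...through the hypothetical flag

To end the flash, set TargetBrightness back to $00 and let the normal fading code animate the way back.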

I used the very simple concept of "subtracting or adding $10" to modify the colors, but you could still use the same overall idea with other types of color modification if you wanted. Personally, I prefer something smoother, but the vast majority of games appear to settle for this basic method.

EDIT: Crap, just noticed a mistake. In the first block of code I trashed the status flags when loading the default step value. I fixed it now. Hopefully I didn't make any more mistakes.

EDIT: Oops, another little mistake. I used bpl instead of bmi. Fixed it.

EDIT: Tested everything in the 6502 simulator and it appears to work fine now.

Re: Game project help and progress thread
by Tsutarja on 2015-03-31 (#144038)

Hmm... The fading itself works, but the colours and scrolling positions get messed up. I have no idea what causes this.

Code:
http://pastebin.com/uZVR2379

Re: Game project help and progress thread
by tokumaru on 2015-03-31 (#144045)

The only thing obviously wrong I see is the way the palette is being copied to VRAM. It looks like you first copy the background color, and then copy the remaining colors in groups of 3, but incrementing X 4 times each iteration. You are compensating for the fact that you have only 25 colors instead of 32 the wrong way. You're skipping one entry in the array every 3, causing a misalignment with the hardware palette and not filling all of it. What I do in this case is write the background color again every 3 writes. Something like this:

Code:

 LDA PPUStatus
 LDA #$3F
 STA PPUAddr
 LDA #$00
 STA PPUAddr
 LDX #$00
 LDY bg_colour_modified

NMIPalLoop:
 STY PPUData
 LDA palette_modified, x
 STA PPUData
 INX
 LDA palette_modified, x
 STA PPUData
 INX
 LDA palette_modified, x
 STA PPUData
 INX
 CPX #$18
 BNE NMIPalLoop
 LDA #$00
 STA <palette_update_flag

~~I didn't notice anything that could be affecting the scroll~~ (see EDIT below). Let's see what happens when you fix this.

I did notice a few unnecessary things. For example, I don't see any need to initialize the modified palettes with the same colors as the unmodified palette, considering that in the beginning the modified palette will most likely be all black, so you're filling it just to have it overwritten with blacks soon after. It would make sense to manually call BrightnessModify as part of the palette initialization process though, to make sure that the correct palette will be properly sent to the PPU as soon as possible:

Code:

 ;initialize the brightness
 LDA #$C0
 STA <brightness_current
 LDA #$00
 STA <brightness_target

 ;initialize speed and delay
 LDA #$02
 STA <fade_speed
 STA <fade_delay

 ;generate the data for the VRAM update
 JSR BrightnessModify

 ;request the VRAM update
 LDA #$01
 STA <palette_update_flag

If you don't do something like this, you'll end up with an uninitialized palette on screen for a few frames, instead of the first brightness level. After this, the system is ready to function on its own.

BTW, making the delay configurable was a nice touch. Makes the fading system even more versatile! =)

EDIT: The "ScrollUpdate" is before the palette update. Setting the scroll should be the very last PPU operation in the NMI, because anything you do with $2006/$2007 will mess up the scroll. You're also missing a $2000 write to select which name table rendering should start in. This write should be near (before or after doesn't matter) the $2005 writes.
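
As a rough sketch, the end of the NMI's PPU work could look like this. PPUCtrl and PPUStatus are the names your code already uses; PPUScroll (for $2005), ppu_ctrl_copy, scroll_x and scroll_y are names I'm assuming for the example:

Code:

 ;...palette and all other $2006/$2007 updates go first...

 ;select the name table (and keep NMIs enabled)
 LDA <ppu_ctrl_copy
 STA PPUCtrl

 ;set the scroll as the very last PPU operation
 BIT PPUStatus ;reset the $2005/$2006 write toggle
 LDA <scroll_x
 STA PPUScroll ;first $2005 write: horizontal scroll
 LDA <scroll_y
 STA PPUScroll ;second $2005 write: vertical scroll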

Re: Game project help and progress thread
by Tsutarja on 2015-03-31 (#144047)

Got that one working now. I'm not sure if the fade looks correct now (at least to me it does seem odd somehow)

https://www.youtube.com/watch?v=S28xVlw ... e=youtu.be

I probably should next make the background data compression/decompression system, graphics buffer and background update part for my NMI.

Re: Game project help and progress thread
by tokumaru on 2015-03-31 (#144049)

Tsutarja wrote:

https://www.youtube.com/watch?v=S28xVlwj9Vc&feature=youtu.be

Looks good to me (especially combined with the wobbling effect!), except for the gray(?) screen at the very beginning. Did you do the initialization as I suggested, with forced palette generation and update?

Quote:

I probably should next make the background data compression/decompression system, graphics buffer and background update part for my NMI.

Are you going for a generic NMI that will be used throughout the whole program? Just keep an eye on how much CPU time you're spending on it. For larger data transfers, you should consider some degree of loop unrolling.

Re: Game project help and progress thread
by Tsutarja on 2015-03-31 (#144114)

tokumaru wrote:

Yeah I did try it, but I couldn't get it to work. I don't know whether it's that I did it wrong, or that I don't do it early enough (that's why I posted from the beginning to the palette initialization)

Code:

RESET:
 SEI
 CLD
 LDX #$40
 STX $4017
 LDX #$FF
 TXS
 INX
 STX PPUCtrl         ; Disable NMI
 STX PPUMask         ; Disable rendering
 STX DCMIRQ         ; Disable DPCM

VBwait1:         ; First PPU warm up wait
 BIT PPUStatus
 BPL VBwait1

ClearMem:         ; Clear internal memory
 LDA #$FF
 STA $0200, x         ; Set OAM to #$FF to render sprites off screen
 LDA #$00
 STA $0000, x         ; Clear Zero Page
 STA $0100, x         ; Clear Stack
 STA $0300, x         ; Clear Sound Engine RAM
 STA $0400, x         ; Clear Graphics Buffer RAM
 STA $0500, x
 STA $0600, x
 STA $0700, x
 DEX
 CPX #$00
 BNE ClearMem

 LDA #LOW(BgTitle00)
 STA <pointerLo
 LDA #HIGH(BgTitle00)
 STA <pointerHi

VBwait2:         ; Second PPU warm up wait
 BIT PPUStatus
 BPL VBwait2

 LDX #$00
 LDA BgColour, x
 STA bg_colour_original

PaletteLoad:
 LDA Palette, x
 STA palette_original, x
 INX
 CPX #$17
 BNE PaletteLoad

 LDA PPUStatus
 LDA #$3F
 STA PPUAddr
 LDA #$00
 STA PPUAddr
 LDX #$00
 LDY bg_colour_original

PalInit:
 STY PPUData
 LDA palette_original, x
 STA PPUData
 INX
 LDA palette_original, x
 STA PPUData
 INX
 LDA palette_original, x
 STA PPUData
 INX
 CPX #$18
 BNE PalInit

tokumaru wrote:

Yeah, I will use the same NMI throughout the whole game. I will probably do a tiles to metatiles and metatiles to screens type of compression. That probably doesn't take that much time since many games use it (?). I probably don't need to update more than one or two vertical background rows at once (and that probably split up over a few frames) plus a few tiles caused by other objects. During transitions I'll probably just turn off the PPU and load the background during CPU time and turn on the PPU when done. I've read that some people do that, but I'm not sure if that's recommended or not.

Re: Game project help and progress thread
by koitsu on 2015-03-31 (#144117)

You're missing a bit PPUStatus prior to VBWait1. Reference: http://wiki.nesdev.com/w/index.php/PPU_power_up_state

Small optimisation: get rid of cpx #$00 -- it serves no purpose. dex will set the zero flag when the register/effective address value decremented becomes zero. 65xxx CPUs are fantastic for this sort of thing. :-)
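
Applied to the code above, those two changes would look roughly like this (only the affected lines are shown, the rest stays as it is):

Code:

 BIT PPUStatus         ; Clear a VBlank flag that may be left over from before reset
VBwait1:         ; First PPU warm up wait
 BIT PPUStatus
 BPL VBwait1

 ; ...

 DEX
 BNE ClearMem         ; DEX already set the zero flag, no CPX needed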

Off-the-cuff guess: maybe you get a grey screen because you're spending a lot of CPU time (memory initialisation, etc.) without having pre-set all palette entries to black (and the nametable + attribute tables to something that will use such a palette, e.g. $00 or maybe $FF for everything). That's my guess anyway. Some emulators choose to show things as "grey" until that's done. You might try setting $2006 to $0000 and $2005 as well, to see if maybe that settles things a bit.

That said: I doubt this is the full program. I do not see anywhere where you enable background or sprites by adjusting PPUMASK, so I can't even tell what is going on between the point where you turn off the screen and where you start manipulating the palette in real-time.

It should be very easy to find out what is taking up all the time ("grey screen") using an emulator like FCEUX. Step through your code gradually piece by piece and you'll figure it out.

Re: Game project help and progress thread
by tokumaru on 2015-04-01 (#144133)

Tsutarja wrote:

Well, this code is not what I suggested at all. This is just uploading the original palette to the PPU. Why would you do this, if you want to start out all black? What I suggested was: copy the original palette from ROM to RAM (this part is OK), then initialize all fading variables (current brightness, target brightness, delay and speed) then call BrightnessModify to force the generation of the processed palette and finally set the flag that requests a palette update, to force the modified palette to be uploaded the next VBlank.

These last 2 operations have to be forced because normally they only happen when the delay expires and the system advances to the next brightness level, but you need to get the very first palette up somehow. The alternative would be to initialize the delay to 1 instead of the full value, and call BrightnessControl, so the delay expires immediately (triggering the generation of the modified palette and the request for an update). For that to work you'd also have to initialize the current brightness to one level before what you really want to start with (in this case that would be $b0), otherwise you'll never see the first palette.
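
A sketch of that alternative, reusing your variable names. I'm assuming BrightnessControl decrements fade_delay, steps the brightness when it reaches 0, generates the modified palette and sets palette_update_flag, so adjust it if your routine works differently:

Code:

 ;start one level below the first brightness you want to see
 LDA #$B0
 STA <brightness_current
 LDA #$00
 STA <brightness_target

 ;set the speed normally, but make the first delay expire immediately
 LDA #$02
 STA <fade_speed
 LDA #$01
 STA <fade_delay

 ;the delay hits 0 right away, so this steps to $C0 and requests the update
 JSR BrightnessControl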

Quote:

I will probably do a tiles to metatiles and metatiles to screens type of compression. That probably doesn't take that much time since many games use it (?).

Depending on the size of your metatiles, this could still be a lot of data. If they are the common 16x16-pixel kind, that means each screen will take 240 bytes. An 8KB PRG-ROM bank would only be able to hold 34 such screens. Larger metatiles, such as the ones used in Mega Man (32x32 pixels), are much better in terms of compression. There's always metatiles of metatiles, but the complexity this adds to map decoding sometimes scares people off!

Quote:

During transitions I'll probably just turn off the PPU and load the background during CPU time and turn on the PPU when done. I've read that some people do that, but I'm not sure if that's recommended or not.

That's OK as long as the NMI handler is well structured enough to not get in the way of big updates. It's not advisable to turn NMIs off, mainly because of music, so the NMI handler should be able to detect that it interrupted a large PPU operation and that it's not supposed to touch the PPU at all in that case.
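
One way to do that is a simple flag that the main thread sets around the big transfer. This is only an untested sketch; SkipPPUUpdates, DrawWholeScreen and UpdateSound are hypothetical names:

Code:

   lda #$01 ;tell the NMI handler...
   sta SkipPPUUpdates ;...to keep its hands off the PPU
   lda #$00 ;disable rendering...
   sta $2001 ;...so VRAM can be accessed at any time
   jsr DrawWholeScreen ;big transfer through $2006/$2007
   lda #$00 ;the transfer is over...
   sta SkipPPUUpdates ;...so the NMI handler can resume its PPU work
   ;rendering is turned back on later, during VBlank

NMI:
   ;save registers
   lda SkipPPUUpdates ;is a big transfer in progress?
   bne SkipPPU ;if so, don't touch the PPU at all
   ;sprite DMA, VRAM updates, scroll, etc.
SkipPPU:
   jsr UpdateSound ;the music keeps playing either way
   ;restore registers
   rti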

Re: Game project help and progress thread
by Tsutarja on 2015-04-01 (#144264)

Okay, I managed to get the "grey screen time" at the startup down to 2 frames, which seems to be what many games have.

tokumaru wrote:

There's always metatiles of metatiles, but the complexity this adds to map decoding sometimes scares people off!

Another question is: How much more time does it cost to decode over just one metatile compression, and is it worth it? I think there was an article about data compression and data buffering somewhere, but I can't remember where I found it.

By the way, how are attributes stored? Are they in the background data or are they in a separate table?

Re: Game project help and progress thread
by tokumaru on 2015-04-02 (#144288)

Tsutarja wrote:

Okay, I managed to get the "grey screen time" at the startup down to 2 frames, which seems to be what many games have.

That's probably because of the VBlanks we're required to wait so the PPU can warm up. What you could do is separate the gray frames and the fade in with a few intentional black frames. Half a second (30 frames), maybe. That should help players dissociate the initial flash from the actual fading effect.
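
Something like this would do, assuming the palette is already all black and that you have some routine that waits for the next frame (WaitForVBlank is a hypothetical name, and it's assumed to preserve X):

Code:

   ldx #30 ;about half a second on NTSC
WaitBlackFrames:
   jsr WaitForVBlank ;wait for the next frame
   dex ;one less frame to wait
   bne WaitBlackFrames ;repeat until all 30 frames have passed
   ;now set the brightness target to start the fade in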

Quote:

Another question is: How much more time does it cost to decode over just one metatile compression, and is it worth it?

That depends entirely on how you're doing it. The absolute simplest way would be to decode it all to RAM beforehand, in which case speed would be completely irrelevant. This is only an option if you have extra RAM in the cartridge, though.

If you're decoding the data in real-time, then the amount of optimization in the code will make a big difference. Unrolled code would probably be fast enough to use in real-time without problems.

I think this is something you're gonna have to do the math on and decide the best course of action yourself. Are your levels really big? If not, it might make more sense to use more PRG-ROM and keep the data formats simple.

Quote:

By the way, how are attributes stored? Are they in the background data or are they in a separate table?

This is completely up to you, but to me it makes sense to keep palette attributes as part of the metatile, along with collision information and all other attributes.
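
As an illustration only, with 16x16-pixel metatiles stored as parallel tables, looking one up could go like this. Every name here is made up for the example, and NESASM users would write the indirect access with square brackets:

Code:

   ldy MetatileIndex ;position of the metatile within the screen (0-239)
   lda (ScreenPointer), y ;get the metatile number from the map
   tax ;use it to index the metatile tables
   lda MetatileTopLeft, x ;get the 4 tiles that make up the metatile
   sta TileBuffer+0
   lda MetatileTopRight, x
   sta TileBuffer+1
   lda MetatileBottomLeft, x
   sta TileBuffer+2
   lda MetatileBottomRight, x
   sta TileBuffer+3
   lda MetatilePalette, x ;get the palette attribute (0-3)
   sta AttributeBuffer
   lda MetatileCollision, x ;get the collision type
   sta CollisionBuffer

With one byte per property per metatile, a set of 256 metatiles costs 256 bytes per table, and adding a new property is just a matter of adding a new table.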

Re: Game project help and progress thread
by Tsutarja on 2015-04-05 (#144552)

I think I have now come up with a decent idea for a graphics buffer.
Every time data is written to the buffer, bg_update_flag will be incremented. Data is written in "length, start address, data" order. Every time a byte is written to the buffer, buffer_offset will be incremented. During NMI bg_update_flag is compared to #$00. If true, graphics updates are skipped. If false, data buffer will be read. Byte indicating length is copied to RAM and decremented after every write to the PPU. At the same time buffer_offset_nmi is incremented. When length reaches zero, bg_update_flag is decremented and compared to #$00. Reading the buffer will continue until bg_update_flag reaches zero. At this point both buffer offsets are reset to zero and graphics updates end.

Do you think that this would be a good way to do this? I haven't thought of the data compression, metatiles, etc. much yet, since the data is uncompressed in the buffer anyway. I will get to that once the buffer itself works.

Re: Game project help and progress thread
by tokumaru on 2015-04-05 (#144559)

Tsutarja wrote:

Do you think that this would be a good way to do this?

Sounds like a good system, just a bit slow. If the data transfer loop is like this:

Code:

CopyByte:
  lda Buffer, x ;4 cycles
  sta $2007 ;4 cycles
  dex ;2 cycles
  bne CopyByte ;3 cycles

...it will take 13 CPU cycles to copy each byte. If you were doing only this during VBlank, you'd be able to transfer about 170 bytes. Since this is NOT the only thing you'll be doing (there's always the sprite DMA, setting the scroll and other tasks), you can probably transfer around 100 bytes per VBlank. If that's enough for you (and it should be unless you're scrolling at ridiculous speeds or animating CHR-RAM), no worries. Just keep in mind that you have to store data in the buffer backwards so you can count down and index data using the same register, otherwise the copy loop would be slower than 13 cycles per iteration.

If you are however animating CHR-RAM, then that's definitely too slow, since 100 bytes are less than 7 tiles. In this case you'd need more advanced unrolling techniques.
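
To make the "backwards" part concrete, here's an untested sketch of filling the buffer so that the copy loop above sends the bytes in their natural order. Source and Length are placeholders for wherever your data comes from and how long it is:

Code:

   ldx #Length ;number of bytes in this update (1 to 255)
   ldy #$00 ;start at the beginning of the source data
FillBuffer:
   lda Source, y ;get the next byte in its natural order...
   sta Buffer, x ;...and store it from the end of the buffer backwards
   iny ;move forward in the source
   dex ;move backwards in the buffer
   bne FillBuffer ;repeat until X reaches 0

The first byte ends up at Buffer+Length and the last one at Buffer+1, so the copy loop has to start with X = Length. Buffer+0 is never used, because both loops stop when X reaches 0.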

Quote:

I haven't thought of the data compression, metatiles, etc. much yet, since the data is uncompressed in the buffer anyway. I will get to that once the buffer itself works.

Yeah, ideally these things are completely separate. As long as the decompression routines output data in the format you've chosen for the buffers, everything will connect just fine.

Re: Game project help and progress thread
by Tsutarja on 2015-04-05 (#144562)

If it's not too complex, I could try to use a faster method for reading the buffer. I don't know if I'm going to animate CHR-RAM, but I guess I should make the code fast enough in case I need to use it.

Re: Game project help and progress thread
by tokumaru on 2015-04-05 (#144575)

Tsutarja wrote:

If it's not too complex, I could try to use a faster method for reading the buffer.

The most straightforward way is to use the stack to hold the data, and use an unrolled sequence of PLA STA $2007, so that each byte takes 8 cycles to copy. That way, when processing the buffer, instead of counting down the number of bytes to copy you'd jump somewhere into the middle of this unrolled sequence.

First there's the unrolled code. 32 bytes seems like a good limit, because it's enough to update an entire row or column of tiles. Pattern updates would have to be broken down into blocks of 2 tiles, since each tile is 16 bytes.

Code:

Update32Bytes:
  PLA
  STA $2007
Update31Bytes:
  PLA
  STA $2007
Update30Bytes:
  PLA
  STA $2007
(...)
Update2Bytes:
  PLA
  STA $2007
Update1Byte:
  PLA
  STA $2007
UpdateNothing:

Then there's the jump table, so you know where to jump depending on how many bytes you have to copy:

Code:

JumpTableLo:
  .db <UpdateNothing, <Update1Byte, (...), <Update32Bytes

JumpTableHi:
  .db >UpdateNothing, >Update1Byte, (...), >Update32Bytes

Then, you'd do something like this when processing your update list:

Code:

  LDX LastUpdateIndex
ProcessUpdate:
  LDA UpdateAddressHi, x
  STA $2006
  LDA UpdateAddressLo, x
  STA $2006
  LDY UpdateCount, x
  LDA JumpTableLo, y
  STA Pointer+0
  LDA JumpTableHi, y
  STA Pointer+1
  JMP (Pointer)
Update32Bytes:
(...)
UpdateNothing:
  DEX
  BNE ProcessUpdate

You should also account for PPU address increments when setting up each transfer, so each update can select between increments of 1 or 32 bytes.

You can probably use 192 bytes of the stack for this, and still have 64 bytes left for normal stack use.

Re: Game project help and progress thread
by Tsutarja on 2015-04-06 (#144620)

If I store the data to the stack, doesn't that mess things up when NMI fires because it's pushing the registers to the stack? The data for the background update isn't going to be at the top of the stack during NMI. Or do I move the register data to RAM temporarily? Or is it possible to change where the stack is being read?

Re: Game project help and progress thread
by tokumaru on 2015-04-06 (#144625)

Tsutarja wrote:

The data for the background update isn't going to be at the top of the stack during NMI. Or do I move the register data to RAM temporarily?

Yes, you have to swap between 2 different stack pointers. Say that the normal stack begins at $FF and grows down to $C0 (64 bytes), then the buffer begins at $BF and grows down to $00 (192 bytes). You'd need 2 variables to back up and restore the stack pointers as needed. Something like this:

Code:

   ;initialize the primary stack pointer
   ldx #$ff
   txs

   ;initialize the secondary stack pointer
   lda #$bf
   sta BufferSP

Then, whenever you need to write to the buffer, you switch to the secondary stack pointer:

Code:

   ;switch to the secondary stack pointer
   tsx
   stx NormalSP
   ldx BufferSP
   txs

After you're done, switch back to the normal stack pointer:

Code:

   ;switch back to the normal stack pointer
   tsx
   stx BufferSP
   ldx NormalSP
   txs

During VBlank, before executing the VRAM updates, you also switch stack pointers, and switch back when you're done.

It's not a problem if an NMI or IRQ fires when you're manipulating the buffer, because whatever gets pushed there will be taken back when the interrupt returns, no worries. Pulling values out of the wrong stack would be a problem, but pulling doesn't happen automatically, and you will not do it when the wrong stack is being used.

You do have to detect when an NMI has fired before the frame calculations have ended (lag frame), so that you don't try to use data from buffers that are only half full, but that would have to be done no matter what.
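
Putting the pieces together, writing one update to the buffer might look like the sketch below (untested). Source and Length are placeholders. Since the stack is last in, first out, the data is pushed from the last byte to the first, so that the PLAs in the unrolled code get the bytes in the right order:

Code:

   ;switch to the secondary stack pointer
   tsx
   stx NormalSP
   ldx BufferSP
   txs

   ;push the data backwards, so that the first byte is pulled first
   ldy #Length ;number of bytes in this update
PushByte:
   lda Source-1, y ;get the bytes from last to first
   pha ;put the byte in the buffer
   dey ;one less byte to go
   bne PushByte ;repeat until all bytes were pushed

   ;switch back to the normal stack pointer
   tsx
   stx BufferSP
   ldx NormalSP
   txs

For the same reason, the updates themselves come out in reverse order: the last update written is the first one pulled, which matches a processing loop that counts down from LastUpdateIndex.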

Re: Game project help and progress thread
by Tsutarja on 2015-04-06 (#144629)

tokumaru wrote:

You do have to detect when an NMI has fired before the frame calculations have ended (lag frame), so that you don't try to use data from buffers that are only half full, but that would have to be done no matter what.

Maybe I could use the sleeping (aka vblank_wait) variable to see if main loop has ended. If it's not, NMI will restore the registers and exit (or should I only skip graphical updates and leave sound engine etc. running?).

Then about the code you posted earlier:
I'm assuming that LastUpdateIndex is incremented every time an individual update is requested and the starting addresses are stored to UpdateAddressHi and UpdateAddressLo (giving 16 or so bytes for both of them). I guess I store the UpdateCount and the PPU increment mode in similar style. Do I still need the bg_update_flag variable to see if there are any graphical updates, or does the LastUpdateIndex cover that one too?

Re: Game project help and progress thread
by tokumaru on 2015-04-06 (#144655)

Tsutarja wrote:

Yes, that's one way to do it. If the program isn't "sleeping" when the NMI fires, you can assume the frame logic hasn't finished. Most games update the sound even in lag frames. This is actually an advantage of having the game logic separate from the VBlank handler over having everything in the NMI or everything in the main loop. Besides sound, raster effects (like for status bars) also have to be configured in lag frames, otherwise they'll jump/glitch.
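
A minimal sketch of that NMI structure follows. It assumes a hypothetical sleeping flag that the main loop sets to a nonzero value once the frame logic is done, and hypothetical routines DoVRAMUpdates, SetUpRasterEffects and UpdateSound; none of these names come from the code posted so far.

Code:

NMI:
   pha ;back up the registers
   txa
   pha
   tya
   pha
   lda sleeping ;did the main loop finish this frame?
   beq LagFrame ;no: the buffers may be half full, so leave them alone
   jsr DoVRAMUpdates ;yes: safe to consume the update buffers
   lda #$00
   sta sleeping ;wake the main loop up
LagFrame:
   jsr SetUpRasterEffects ;these run every frame, lag or not
   jsr UpdateSound
   pla ;restore the registers
   tay
   pla
   tax
   pla
   rti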

Quote:

I'm assuming that LastUpdateIndex is incremented every time individual update is requested and the starting addresses are stored to UpdateAddressHi and UpdateAddressLo (giving 16 or so bytes for both of them). I guess I store the UpdateCount and the PPU increment mode in similar style. Do I still need the bg_update_flag variable to see if there are any graphical updates, or does the LastUpdateIndex cover that one too?

Your understanding is correct. You could take advantage of the fact that PPU addresses only go up to $3FFF and use the upper bits of the address for extra information, like the PPU increment. That will save you a little bit of RAM.

The exact implementation is up to you, but I prefer to avoid redundancy as much as possible, so if I can deduce something from one variable I think it's pointless to have the same information in another one.

Re: Game project help and progress thread
by Tsutarja on 2015-04-07 (#144678)

Well, the buffer is already doing something when I input data to it, though it's not doing what I want. It's supposed to draw text on screen vertically, but instead it's doing this:

Attachment:

Kemono-0.png

Here is what I'm loading to the buffer:
Btw, just in case you are not aware of this (if you use some other assembler than NESASM), < is used for zero page addressing mode (at least I've been told so), which many other assemblers seem to use for "get low byte".

Code:

 TSX
 STX normal_sp
 LDX buffer_sp
 TXS

 LDX <bg_update_requests
 LDA #$05
 STA bg_update_count, x
 LDA #$20
 STA bg_update_address_lo, x
 LDA #$A0
 STA bg_update_address_hi, x
 LDA #$18
 PHA
 LDA #$15
 PHA
 PHA
 LDA #$0E
 PHA
 LDA #$11
 PHA
 INX
 STX <bg_update_requests

 TSX
 STX buffer_sp
 LDX normal_sp
 TXS

And here is how I read it in NMI:

Code:

BgUpdate:
 LDX <bg_update_requests
 CPX #$00
 BNE ReadBgBuffer
 JMP PaletteUpdate

ReadBgBuffer:
 TSX
 STX normal_sp
 LDX buffer_sp
 TXS

 LDX <bg_update_requests
ProcessBgUpdate:
 LDA bg_update_address_hi-1, x
 PHA
 AND #%00111111
 STA PPUAddr
 LDA bg_update_address_lo-1, x
 STA PPUAddr
 PLA
 AND #%10000000
 CMP #%10000000
 BNE HorizontalUpdate
 LDA <ppu_ctrl
 AND #%00000100
 CMP #%00000100
 BEQ IncModeDone
 CLC 
 ADC #%00000100
 STA <ppu_ctrl
 STA PPUCtrl
 JMP IncModeDone
HorizontalUpdate:
 LDA <ppu_ctrl
 AND #%00000100
 CMP #%00000100
 BNE IncModeDone
 SEC
 SBC #%00000100
 STA <ppu_ctrl
 STA PPUCtrl
IncModeDone:
 LDY bg_update_count-1, x
 LDA BufferJumpTableLo, y
 STA pointerLo
 LDA BufferJumpTableHi, y
 STA pointerHi
 JMP [pointerLo]
Update32Bytes:
 PLA
 STA PPUData
Update31Bytes:
 PLA
 STA PPUData

( ... )

Update01Bytes:
 PLA
 STA PPUData
Update00Bytes:
 DEX
 BEQ EndBuffer
 JMP ProcessBgUpdate

EndBuffer:
 STX <bg_update_requests
 TSX
 STX buffer_sp
 LDX normal_sp
 TXS

Re: Game project help and progress thread
by tokumaru on 2015-04-07 (#144681)

EDIT: Oh, now I see you're accessing the tables with -1 when updating... let me check everything again. I kept the original answer below, but I'll post a new one if I catch the problem.

I see one big problem. After using the request index, you're incrementing it before saving it:

Tsutarja wrote:

Code:

 INX
 STX <bg_update_requests

...meaning that the saved value isn't pointing to the last written request, it's pointing to the NEXT (empty) slot. If you try to use that when updating you'll read junk. You probably want to change that so this variable points to the last written request. Maybe start it at 0 and increment before using, not after.

To make 0 be just a flag without sacrificing a memory position, you can access the list like this: lda bg_update_count-1, x (do this for all the properties). This is not mandatory, just a tip so you don't lose a byte of RAM at the beginning of each list.

Another tip, that I believe has already been pointed out:

Quote:

Code:

 LDX <bg_update_requests
 CPX #$00
 BNE ReadBgBuffer

There's no need for the CPX #$00. After every load or math operation, the Z flag is already set if the value is 0, there's no need to explicitly compare against 0. If you feel like keeping the instruction for clarity or something, that's fine. When I want to keep something unnecessary just for clarity, I simply comment the redundant instructions (such as a CLC before an addition when I know for sure that the carry is already clear). That way I don't sacrifice readability but I also don't lose the performance.
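
As a small sketch of that habit, applied to the snippet quoted above (same labels and NESASM syntax as the original code), the redundant compare can simply be commented out:

Code:

 LDX <bg_update_requests
 ;CPX #$00                     ; redundant: LDX already set the Z flag
 BNE ReadBgBuffer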

Re: Game project help and progress thread
by tokumaru on 2015-04-07 (#144683)

I still think you should increment the index before using the update slot, and use -1 everywhere the tables are accessed, for consistency. It's confusing to see the tables being accessed one way in one place and another way in another (this is what had me thinking I found the bug). This is up to you though.

The part that selects the increment is unnecessarily complicated. The first optimization you can do is to stop using the stack to back up the high byte of the address, for 2 reasons. First, PHA and PLA (3 + 4 = 7 cycles) are slower than simply loading the value again (4 cycles). Second, you don't even need to mask the high bits of the address before writing to $2006, because the PPU does that automatically.

Also, you don't need all the branching. The PPU increment setting is just a bit, and the increment value you placed in the high byte of the address is also just a bit. Instead of checking values and branching you can simply copy the bit from one position to another, without having to make decisions or branch. After these optimizations you get this:

Code:

   LDA bg_update_address_hi-1, x
   STA PPUAddr
   LDY bg_update_address_lo-1, x
   STY PPUAddr
   AND #%10000000
   ASL
   ROL
   ROL
   ROL
   ORA <ppu_ctrl
   STA PPUCtrl

These are just improvements though, probably not related to the actual bug, which I can't find just from looking at the code. Did you try debugging in FCEUX (stepping through each instruction) to see when things start to go wrong? Is the "secondary stack" being filled correctly? Does the update code read the correct data?

Re: Game project help and progress thread
by Tsutarja on 2015-04-07 (#144688)

As far as I have tested so far, the incorrect background color is caused by the bit that I'm using for the PPU increment mode selection. Also, I noticed that the first byte I'm pushing to the stack is not going there for some reason. The reason for the -1 is so that when bg_update_requests is zero there are no updates, but when it's not zero, it will read the buffer. In other words, its purpose is to prevent the code from skipping over the very first addresses it could read (if it was #$00).

Code:

 TSX
 STX normal_sp
 LDX buffer_sp
 TXS

 LDX <bg_update_requests
 INX
 LDA #$05
 STA bg_update_count-1, x
 LDA #$62
 STA bg_update_address_lo-1, x
 LDA #$20
 STA bg_update_address_hi-1, x
 LDA #$18                          ; This is not going to the stack (?)
 PHA
 LDA #$15
 PHA
 PHA
 LDA #$0E
 PHA
 LDA #$11
 PHA
 STX <bg_update_requests

 TSX
 STX buffer_sp
 LDX normal_sp
 TXS

I made some changes to the buffer reader in the NMI: I fixed some missing -1 offsets and added the optimization you suggested.

Code:

BgUpdate:
 LDX <bg_update_requests
 CPX #$00
 BNE ReadBgBuffer
 JMP PaletteUpdate

ReadBgBuffer:
 TSX
 STX normal_sp
 LDX buffer_sp
 TXS

 LDX <bg_update_requests
ProcessBgUpdate:
 LDA bg_update_address_hi-1, x
 STA PPUAddr
 LDY bg_update_address_lo-1, x
 STY PPUAddr
 AND #%10000000
 ASL A
 ROL A
 ROL A
 ROL A
 ORA <ppu_ctrl
 STA PPUCtrl
 LDY bg_update_count-1, x
 LDA BufferJumpTableLo, y
 STA pointerLo
 LDA BufferJumpTableHi, y
 STA pointerHi
 JMP [pointerLo]
Update32Bytes:
 PLA
 STA PPUData
Update31Bytes:
 PLA
 STA PPUData

( ... )

Update01Bytes:
 PLA
 STA PPUData
Update00Bytes:
 DEX
 BEQ EndBuffer
 JMP ProcessBgUpdate

EndBuffer:
 STX <bg_update_requests
 TSX
 STX buffer_sp
 LDX normal_sp
 TXS

Re: Game project help and progress thread
by tokumaru on 2015-04-07 (#144696)

I don't see anything obviously wrong with the code. Only debugging will solve this, I guess. If you don't mind sharing a ROM I can take a look when I have the time, but if you're familiar with FCEUX's debugger you could debug this yourself, instruction by instruction. It should be easy to tell whether the values are going to the stack or not, and why.

Re: Game project help and progress thread
by Tsutarja on 2015-04-09 (#144812)

Alright. The drawing and palette updates work correctly under the same NMI routine. Next up is the background data compression. I need to look into how to do it when I have more time.

Re: Game project help and progress thread
by tokumaru on 2015-04-09 (#144816)

Tsutarja wrote:

Next up is the background data compression. I need to look into how to do it when I have more time.

The exact compression scheme will heavily depend on the game you're making: the dimensions of the levels, whether there's scrolling or not, how objects interact with the background... There are also technical aspects to consider, such as how much RAM you're willing to use for maps, so think carefully about all of that.

Re: Game project help and progress thread
by Tsutarja on 2015-04-09 (#144870)

tokumaru wrote:

The exact compression scheme will heavily depend on the game you're making: the dimensions of the levels, whether there's scrolling or not, how objects interact with the background...

The levels are always 1 screen high. Width depends on the stage. The stages themselves are broken into "rooms" of various sizes, and you can scroll both left and right within a room as you want, but you cannot go back to previous rooms (there may be exceptions to this). The only interaction between objects and the background is collision with solid objects. There may be some special collisions, such as spikes (or some other tiles that damage the player) and platforms that can be jumped through from below and dropped down from with Down + A. This is the plan for now, but I may group all rooms together as a single room and use "Super Mario Bros. scrolling". This depends on whether I have trouble implementing the scrolling system I originally planned.

tokumaru wrote:

There are also technical aspects to consider, such as how much RAM you're willing to use for maps, so think carefully about all of that.

The amount of RAM depends on how it is used with maps.

Re: Game project help and progress thread
by tokumaru on 2015-04-10 (#144889)

Tsutarja wrote:

Sounds straightforward enough. Not scrolling vertically avoids a lot of trouble. Let us know if you're unsure about how to implement something.

Re: Game project help and progress thread
by Tsutarja on 2015-04-10 (#144902)

I have now made some tile graphics for metatiles. I was thinking of the compression being something like this:

Code:

 ; Metatiles are 32x32 pixels
Metatile00:                    ; e.g. Ground
 .db $80,$81,$80,$81
 .db $82,$83,$82,$83
 .db $80,$81,$80,$81
 .db $82,$83,$82,$83
 ;Attributes and collision data added here later

Metatile01:                    ; e.g. Wall
 .db $84,$85,$84,$85
 .db $86,$87,$86,$87
 .db $84,$85,$84,$85
 .db $86,$87,$86,$87
 ;Attributes and collision (none for a wall) data added here later

Screen00:                    ; Some indoors (or underground) screen with same floor and ceiling graphic
 .db Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00
 .db Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00
 .db Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01
 .db Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01
 .db Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01
 .db Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01
 .db Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00
 .db Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00
  ; Lower half of the last 32x32 metatile row is cut off, right?

Stage00:
 .db Screen00,Screen01,Screen02,Screen03 ; Let's pretend there are more screens than what I listed :P

What would be a good way of decompressing this? I probably need to have some kind of pointers in RAM that keep track of things like current stage, current screen, etc. so the game knows where to look for the metatiles. When scrolling, I will probably update 2 metatiles on every frame in which the VRAM needs to be updated. So, it takes 4 frames to update one vertical row of metatiles. The player movement speed is about the same as in Castlevania, so I guess the updating speed is not too slow.

For now I can probably pretty safely say that I can allocate $0700 - $07FF for map RAM (overkill?). I'm assuming that I need to decompress the data to the map RAM and then push it to the "buffer stack".

Attachment:

File comment: Some graphics put together

kemono.png

Re: Game project help and progress thread
by tokumaru on 2015-04-10 (#144909)

Tsutarja wrote:

I'd do it on the fly... breaking down the coordinates (as I'll explain below) you can tell which screen to load from the level map. Then, using another part of the coordinate, you can tell which column of metatiles in that screen you have to read from. Then you can use the final part of the coordinate to know how many pixels into the metatile that coordinate is, and you can use this information for collision tests and such.

Accessing a level, be it for scrolling/rendering or for collision detection, is about using coordinates and interpreting the "fields" in those coordinates in order to know where to read your data from. You don't need specific variables for "current screen" and such, that information can be deduced from the coordinates you're using to access the map.

Quote:

When scrolling, I will probably update 2 metatiles on every frame in which the VRAM needs to be updated. So, it takes 4 frames to update one vertical row of metatiles. The player movement speed is about the same as in Castlevania, so I guess the updating speed is not too slow.

If it takes you 4 frames to update a 32-pixel wide column, that's equivalent to 8 pixels per frame, which should be enough for most games. Even Sonic, without speed shoes, runs slower than 8 pixels per frame, so you should be safe. Just be sure to have an area with valid graphics wider than the screen, so you don't scroll into incomplete columns. For example, here's a 2-screen section of the map, with the camera hovering over it:

Code:

                  CAMERA/SCREEN
        |-------------------------------|
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| U | P | V | V | V | V | V | V | V | V | P | U |   |   |   |   |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| U | P | V | V | V | V | V | V | V | V | P | U |   |   |   |   |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| U | P | V | V | V | V | V | V | V | V | P | U |   |   |   |   |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| U | P | V | V | V | V | V | V | V | V | P | U |   |   |   |   |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| U | P | V | V | V | V | V | V | V | V | P | U |   |   |   |   |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| U | P | V | V | V | V | V | V | V | V | P | U |   |   |   |   |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| U | P | V | V | V | V | V | V | V | V | P | U |   |   |   |   |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| U | P | V | V | V | V | V | V | V | V | P | U |   |   |   |   |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

The metatiles marked "V" are visible, and once the screen starts scrolling, left or right, the metatiles marked "P" would become partially visible, so they should be properly loaded before that happens. And when it does happen, the metatiles marked "U" should start being updated, so that they're ready when the camera reaches them.

One good way to go about this would be to update two metatiles every time the camera crosses an 8-pixel boundary (look for changes in bit 3 of the camera's coordinate: whenever it changes from 1 to 0, or vice-versa, an 8-pixel boundary was crossed). That way you wouldn't occupy the majority of your VRAM bandwidth with metatiles for 4 consecutive frames (unless the player was running at 8 pixels per frame), giving other updates more opportunities to run (such as pattern table updates, if you ever do that).

I'd use the camera position not only to decide WHEN to load new metatiles, but also to find out WHICH metatiles to update. If you break down the camera position, the bits have the following meanings:

Code:

SSSSSSSS MMMPPPPP
SSSSSSSS: Index of the screen within the stage;
MMM: Index of the column of metatiles within the screen;
PPPPP: Index of the column of pixels within the metatile;

Like I said before, you can detect changes in bit 3 to know when to draw a new pair of metatiles. Now consider that the camera is moving right, and this value is increasing: Since metatiles are 32 pixels wide, the camera will go through 4 8-pixel boundaries before crossing a metatile boundary. This means you can consider bits 3 and 4 of the camera's coordinate as the index of the pair of metatiles to update. For example:

Code:

   lda CameraX+0 ;compare the current position...
   eor OldCameraX+0 ;...against the old position
   and #%00001000 ;keep only the bit of interest
   beq NoMetatileUpdate ;skip if it didn't change
   lda CameraX+0 ;get the position again
   and #%00011000 ;find out how close the camera is to the next metatile boundary
   lsr
   lsr
   lsr
   ;OMITTED: USE THIS VALUE TO KNOW WHICH OF THE 4 PAIRS OF METATILES IN A COLUMN TO UPDATE;
   ;OMITTED: DECODE THE METATILES AND FILL THE UPDATE BUFFER;
NoMetatileUpdate:
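
As a rough sketch of how the three fields themselves could be pulled apart: ScreenIndex, MetatileColumn and PixelColumn are hypothetical variables introduced only for illustration, and CameraX+0 is assumed to be the low byte, as in the code above.

Code:

   lda CameraX+1 ;high byte: SSSSSSSS
   sta ScreenIndex ;index of the screen within the stage
   lda CameraX+0 ;low byte: MMMPPPPP
   lsr a ;shift MMM down to bits 0-2
   lsr a
   lsr a
   lsr a
   lsr a
   sta MetatileColumn ;column of metatiles within the screen (0-7)
   lda CameraX+0
   and #%00011111 ;keep only PPPPP
   sta PixelColumn ;column of pixels within the metatile (0-31)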

Quote:

For now I can probably pretty safely say that I can allocate $0700 - $07FF for map RAM (overkill?). I'm assuming that I need to decompress the data to the map RAM and then push it to the "buffer stack".

Unless you want to compress screens or stages even further in the ROM using something like RLE, I don't think you need any RAM at all for maps. This format is friendly enough to be read straight from the ROM.

Re: Game project help and progress thread
by tokumaru on 2015-04-10 (#144917)

Just noticed something in your proposed level format:

Code:

.db Metatile00

You can't do it exactly like this because Metatile00 is not a byte, but a 16-bit address.

You could use .dw instead, but if you have 256 metatiles or less that would be a waste of space. So one possible solution is to use a list of metatile addresses that you can look up using their indices:

Code:

MetatileAddressLo:
   .db <Metatile00, <Metatile01, (...)

MetatileAddressHi:
   .db >Metatile00, >Metatile01, (...)

Then you could read the top right tile of metatile 25 like this:

Code:

   ldy #25
   lda MetatileAddressLo, y
   sta Pointer+0
   lda MetatileAddressHi, y
   sta Pointer+1
   ldy #03 ;the top right tile is the 4th byte in the metatile data
   lda (Pointer), y

Another option is to make sure that the size of the data is always a power of 2 (i.e. each screen is always 64 bytes/metatiles) and properly aligned in memory, so that you can quickly calculate the address based on the index, without needing a table:

Code:

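   lda #$00
   sta Pointer+0
   lda #17 ;screen 17
   lsr ;multiply by 64: the 2 lowest bits of the index...
   ror Pointer+0 ;...become the 2 highest bits of the low byte
   lsr
   ror Pointer+0
   adc #>Screen00 ;add the base address (carry is clear here; assumes Screen00 starts on a page boundary)
   sta Pointer+1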

Making sure that the size of the data is a power of 2 might not be easy in the case of metatiles, since you plan to add collision and palette info to the 16 bytes you already have. Padding that to 32 bytes would be a monumental waste of space, so you might want to go with the table approach, at least for the metatiles, and go with the calculated address for screens.

Re: Game project help and progress thread
by tokumaru on 2015-04-11 (#145018)

Hey, I was just checking a few Capcom games (Mega Man, Duck Tales and The Little Mermaid, specifically) and they all use 32x32-pixel metatiles, like you're planning to, and they appear to update the background similarly to how we were discussing. Take a look at these games in FCEUX's name table viewer for some inspiration.

These games update only 1 metatile each frame, but every 4 scrolled pixels, not 8, so the final result is the same (with the disadvantage that the maximum scroll speed is reduced to 4 pixels per frame - but none of these games are particularly fast anyway). If you divide 32 (the width of a metatile) by 4 (the distance between boundaries that trigger updates), the result is 8, meaning that there will be 8 updates in the space of 32 scrolled pixels, and there are 8 metatiles per column, so there's a direct relationship between which of the 8 4-pixel boundaries is crossed and which of the 8 metatiles of a column is updated, like I suggested. This is evidenced by columns being drawn from top to bottom when scrolling right, and bottom to top when scrolling left.
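
A sketch of that variant, modeled on the earlier 8-pixel example. MetatileRow and the label NoRowUpdate are hypothetical names, and this is only my reading of how such an update could be triggered, not code taken from those games.

Code:

   lda CameraX+0 ;compare the current position...
   eor OldCameraX+0 ;...against the old position
   and #%00000100 ;bit 2 changes at every 4-pixel boundary
   beq NoRowUpdate ;skip if no boundary was crossed
   lda CameraX+0
   and #%00011100 ;bits 2-4: which of the 8 boundaries was crossed
   lsr a
   lsr a
   sta MetatileRow ;0-7: which metatile of the column to update
   ;OMITTED: DECODE THAT ONE METATILE AND FILL THE UPDATE BUFFER;
NoRowUpdate:

Since this value counts up when scrolling right and down when scrolling left, it would produce the top-to-bottom and bottom-to-top drawing order described above.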

Re: Game project help and progress thread
by Tsutarja on 2015-04-12 (#145030)

Right now I'm trying to get the game to load the background for the first stage's first screen. I was thinking of loading the data to the buffer 120 bytes (one vertical metatile row per frame) at a time (of course while the screen is faded to black) and then loading the attributes (+64 bytes) at the end, which means that the update takes 9 frames. That doesn't seem too long for a transition. Or should I disable rendering and load the whole background as fast as possible? Which one do you think would be easier? Speed doesn't really matter here as long as it doesn't take too long, since there is still other stuff that needs to be set up during the same transition. I still want the subroutine to be usable elsewhere, not just for this one update.

Here is the way I have packed the data (the thing you noted earlier should be fixed now):

Code:

Stg1Screen1Lo:
 .db LOW(MT0F),LOW(MT0F),LOW(MT0F),LOW(MT0F),LOW(MT0F),LOW(MT0F),LOW(MT0F),LOW(MT0F)
 .db LOW(MT0F),LOW(MT0F),LOW(MT0F),LOW(MT0F),LOW(MT0F),LOW(MT0F),LOW(MT0F),LOW(MT0F)
 .db LOW(MT0F),LOW(MT0F),LOW(MT0F),LOW(MT0F),LOW(MT0F),LOW(MT0F),LOW(MT0F),LOW(MT0F)
 .db LOW(MT0F),LOW(MT0A),LOW(MT0D),LOW(MT0F),LOW(MT0F),LOW(MT0F),LOW(MT0F),LOW(MT0F)
 .db LOW(MT0F),LOW(MT08),LOW(MT0C),LOW(MT0F),LOW(MT13),LOW(MT11),LOW(MT0F),LOW(MT12)
 .db LOW(MT13),LOW(MT07),LOW(MT10),LOW(MT05),LOW(MT04),LOW(MT04),LOW(MT04),LOW(MT04)
 .db LOW(MT04),LOW(MT04),LOW(MT04),LOW(MT06),LOW(MT00),LOW(MT00),LOW(MT00),LOW(MT00)
 .db LOW(MT00),LOW(MT00),LOW(MT00),LOW(MT00),LOW(MT00),LOW(MT00),LOW(MT00),LOW(MT00)

Stg1Screen1Hi:
 .db HIGH(MT0F),HIGH(MT0F),HIGH(MT0F),HIGH(MT0F),HIGH(MT0F),HIGH(MT0F),HIGH(MT0F),HIGH(MT0F)
 .db HIGH(MT0F),HIGH(MT0F),HIGH(MT0F),HIGH(MT0F),HIGH(MT0F),HIGH(MT0F),HIGH(MT0F),HIGH(MT0F)
 .db HIGH(MT0F),HIGH(MT0F),HIGH(MT0F),HIGH(MT0F),HIGH(MT0F),HIGH(MT0F),HIGH(MT0F),HIGH(MT0F)
 .db HIGH(MT0F),HIGH(MT0A),HIGH(MT0D),HIGH(MT0F),HIGH(MT0F),HIGH(MT0F),HIGH(MT0F),HIGH(MT0F)
 .db HIGH(MT0F),HIGH(MT08),HIGH(MT0C),HIGH(MT0F),HIGH(MT13),HIGH(MT11),HIGH(MT0F),HIGH(MT12)
 .db HIGH(MT13),HIGH(MT07),HIGH(MT10),HIGH(MT05),HIGH(MT04),HIGH(MT04),HIGH(MT04),HIGH(MT04)
 .db HIGH(MT04),HIGH(MT04),HIGH(MT04),HIGH(MT06),HIGH(MT00),HIGH(MT00),HIGH(MT00),HIGH(MT00)
 .db HIGH(MT00),HIGH(MT00),HIGH(MT00),HIGH(MT00),HIGH(MT00),HIGH(MT00),HIGH(MT00),HIGH(MT00)

The metatiles are as an attachment to save space:

Attachment:

MetatileData.asm

Re: Game project help and progress thread
by tokumaru on 2015-04-12 (#145044)

So you decided to go with pointers instead of indexing? That's fine, it's simpler to decode, just keep in mind that this will consume more ROM, so pay attention to how much space you can spare for maps.

I think it's a waste to write code to decode the first screen. Since you'll have to write the code that renders columns for the purpose of scrolling, you can simply write a loop that calls this very same code, in order to draw all the columns of the first screen.

By now you know that I'm strongly against redundancy, and having 2 different routines to render the map would be very redundant, wasting ROM and making the code more difficult to maintain (e.g. if you change anything about how the level is encoded, you'll have to modify 2 routines instead of 1).

I think you should focus on rendering columns right from the start, even if you haven't implemented any scrolling yet. Make this simple: just give the routine the position of the column to render, and have the routine use that to read data from the map and fill the BG update buffers with the data.

Then you can use this routine for both purposes: when drawing the first screen, increment the column position in a loop until the whole screen is done. When scrolling, calculate the position of the column based on the position of the camera and direction of the movement.

Re: Game project help and progress thread
by Tsutarja on 2015-04-13 (#145248)

The next thing I need to know is how to make the update routine recognize whether the background has been updated far enough (that is, up to the 'U' area), both when drawing the first screen and in case the updates fall behind for some reason. If I only use the camera's position, the routine will probably only update the 'U' column, without checking that the columns before it were updated correctly.

Code:

                  CAMERA/SCREEN
        |-------------------------------|
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| U | P | V | V | V | V | V | V | V | V | P | U |   |   |   |   |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| U | P | V | V | V | V | V | V | V | V | P | U |   |   |   |   |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| U | P | V | V | V | V | V | V | V | V | P | U |   |   |   |   |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| U | P | V | V | V | V | V | V | V | V | P | U |   |   |   |   |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| U | P | V | V | V | V | V | V | V | V | P | U |   |   |   |   |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| U | P | V | V | V | V | V | V | V | V | P | U |   |   |   |   |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| U | P | V | V | V | V | V | V | V | V | P | U |   |   |   |   |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| U | P | V | V | V | V | V | V | V | V | P | U |   |   |   |   |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

I probably need to feed the metatile to the buffer in four parts of four individual tile rows.

Code:

 ; This is just a example I came up with. I don't know if this actually works.

 LDY #$04
Loop:
 LDX bg_update_requests
 INX
 LDA #$04                              ; 4 tiles to update
 STA bg_update_count-1, x
 LDA pre_calculated_address_lo, y      ; pre-calculated from camera position
 STA bg_update_address_lo-1, x
 LDA pre_calculated_address_hi, y      ; pre-calculated from camera position
 STA bg_update_address_hi-1, x
 STX bg_update_requests

 ; push four individual tiles to the stack

 DEY
 BNE Loop

Re: Game project help and progress thread
by tokumaru on 2015-04-14 (#145271)

Tsutarja wrote:

For the first screen you can just use a loop. Start from CameraX - 32 and go all the way up to CameraX + 255 + 32, incrementing 8 and updating 2 metatiles each step. When scrolling, check if the camera moved left or right (CameraX - OldCameraX < 0 = left, CameraX - OldCameraX > 0 = right), and calculate ColumnX based on that: If left, ColumnX = CameraX - 32, if right, ColumnX = CameraX + 255 + 32.
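
A minimal sketch of that first-screen loop. DrawColumnStep (queues 2 metatiles for the current ColumnX and returns without doing anything if the column is out of bounds), WaitForVBlank (lets the NMI empty the update buffer) and StepsLeft are hypothetical names, and CameraX and ColumnX are assumed to be 16-bit variables stored low byte first.

Code:

   sec ;ColumnX = CameraX - 32
   lda CameraX+0
   sbc #32
   sta ColumnX+0
   lda CameraX+1
   sbc #0
   sta ColumnX+1
   lda #40 ;(32 + 256 + 32) / 8 = 40 steps
   sta StepsLeft
DrawNextStep:
   jsr DrawColumnStep ;queue 2 metatiles for this position
   jsr WaitForVBlank ;let the NMI write them to VRAM
   clc ;ColumnX = ColumnX + 8
   lda ColumnX+0
   adc #8
   sta ColumnX+0
   bcc NoColumnCarry
   inc ColumnX+1
NoColumnCarry:
   dec StepsLeft
   bne DrawNextStep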

In the function that decodes columns, you should check whether ColumnX is out of bounds (if ColumnX >= LevelLength, the column is outside the level - remember that -1 is $FFFF, which would also be considered larger in an unsigned comparison), in which case you can simply return and not do anything.
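
That check could be sketched as a 16-bit unsigned comparison, assuming ColumnX and LevelLength are both 16-bit values stored low byte first (the label OutOfBounds is made up for this example):

Code:

   lda ColumnX+0 ;compare the low bytes to set up the carry...
   cmp LevelLength+0
   lda ColumnX+1 ;...then subtract the high bytes with that carry
   sbc LevelLength+1
   bcs OutOfBounds ;carry set: ColumnX >= LevelLength
   ;OMITTED: DECODE THE METATILES AND FILL THE UPDATE BUFFER;
OutOfBounds:
   rts

Because the comparison is unsigned, a ColumnX of $FFFF (-1) also ends up at OutOfBounds.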

If you do have to draw metatiles, you have to decode the individual bits of ColumnX to know what to render and where. Here's a breakdown of what the individual bits of ColumnX mean:

Code:

SSSSSSSS CCCPPPPP
SSSSSSSS: screen within the level;
CCC: column of metatiles within the screen;
PPPPP: column of pixels within the metatile;

With this information you can do everything you need, like calculating the destination address for the tiles in VRAM, or locating the correct pointers necessary to decode the compressed data from PRG-ROM. For example, to know the name table where the metatiles must be written to, you can look at the lowest bit of SSSSSSSS (since name tables are 256 pixels wide, just like level screens).
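
As a sketch of the name table part, assuming two name tables side by side at $2000 and $2400, a level that starts at the top row of the name table, and a hypothetical 16-bit variable NTAddress:

Code:

   lda ColumnX+1 ;SSSSSSSS
   and #%00000001 ;even screens go to $2000, odd screens to $2400
   asl a
   asl a ;0 or 4
   ora #$20
   sta NTAddress+1 ;$20 or $24
   lda ColumnX+0 ;CCCPPPPP
   lsr a
   lsr a
   lsr a ;pixel column / 8 = tile column (0-31)
   and #%00011100 ;align to the left edge of the metatile
   sta NTAddress+0 ;top row of that tile column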

You have to decide whether you're going to update 8 metatiles every 32 scrolled pixels (at the rate of 2 per frame) or 2 metatiles every 8 scrolled pixels. I'd go with the latter, so as to not hog VRAM bandwidth for 4 consecutive frames. It's not such a big difference, but it will affect how you use the bits of ColumnX to calculate the source and destination addresses for the metatiles.

Quote:

Code:

 LDY #$04
Loop:
 LDX bg_update_requests
 INX
 LDA #$04                              ; 4 tiles to update
 STA bg_update_count-1, x
 LDA pre_calculated_address_lo, y      ; pre-calculated from camera position
 STA bg_update_address_lo-1, x
 LDA pre_calculated_address_hi, y      ; pre-calculated from camera position
 STA bg_update_address_hi-1, x
 STX bg_update_requests

 ; push four individual tiles to the stack

 DEY
 BNE Loop

Yeah, pushing the tiles to the buffer could be something like this. You probably won't have the luxury of using Y like this though, since it will most likely be necessary for reading the tile indices from the metatile definitions. So I'd probably calculate the address of the first row of 4 tiles, and add 32 to it each iteration of this loop before pushing it to the update list.
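
Moving the address one row down could look like this, where NTAddress is a hypothetical 16-bit variable holding the address of the current row of 4 tiles:

Code:

   clc
   lda NTAddress+0
   adc #32 ;one row of tiles is 32 bytes in the name table
   sta NTAddress+0
   bcc NoAddressCarry
   inc NTAddress+1 ;carry into the high byte
NoAddressCarry: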

I'd probably do the whole scrolling process like this:

Code:

- skip to the end if no 8-pixel boundary was crossed;
- calculate column position from camera position based on the direction of the movement;
(steps below could be a subroutine, so you can use the same code for the first screen and for scrolling)
- skip to the end if the column is out of bounds;
- calculate the first name table address based on the column position;
- calculate the first attribute table address based on the column position;
- use the column position to read a screen from the map;
- repeat the following twice:
  - use the column position and iteration count to read a metatile from the screen;
  - push AT update to update list;
  - repeat the following 4 times:
    - push new update to update list;
    - read 4 tiles from the metatile and push to buffer;
    - move one tile down (increment NT address by 32);
  - move one metatile down (increment AT address by 8);

Re: Game project help and progress thread
by Tsutarja on 2015-04-16 (#145449)

I made the scrolling update code until the point the buffer needs to be filled. Does this look alright so far?

Code:

 LDA scroll_x
 EOR scroll_x_prev
 AND #%00000100
 BEQ NoMetatileUpdate
 LDA scroll_x
 AND #%00011100
 STA column_inc
 LSR
 LSR
 LSR
 STA row_inc
 LDA screen
 AND #%00000001
 BNE SecondNT

FirstNT:
 LDA #$00
 STA ppu_addr_lo
 LDA #$20
 STA ppu_addr_hi
 JMP CheckScrollDir

SecondNT:
 LDA #$00
 STA ppu_addr_lo
 LDA #$24
 STA ppu_addr_hi

CheckScrollDir:
 LDA scroll_x
 CMP scroll_x_prev
 BCC ScrollingLeft

ScrollingRight:
 LDA ppu_addr_lo
 CLC
 ADC column_inc
 BCS ScrollCarry
 JMP FillBuffer

ScrollingLeft:
 LDA ppu_addr_lo
 SEC
 SBC column_inc
 BCC ScrollCarry
 JMP FillBuffer

ScrollCarry:
 LDA ppu_addr_hi
 ADC #$04
 STA ppu_addr_hi

FillBuffer:

( ... )

NoMetatileUpdate:

Re: Game project help and progress thread
by tokumaru on 2015-04-16 (#145455)

Tsutarja wrote:

Code:

 LDA scroll_x
 EOR scroll_x_prev
 AND #%00000100
 BEQ NoMetatileUpdate

So you decided to update 1 metatile every 4 scrolled pixels? That's probably simpler, but you can't scroll more than 4 pixels per frame or you might miss some metatiles.
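
To make the 4-pixel limit concrete, here is a small Python sketch of the same EOR/AND test. The function name `crossed_4px_boundary` is made up for illustration; it only mirrors the three instructions quoted above and is not part of anyone's engine.

```python
def crossed_4px_boundary(old_x: int, new_x: int) -> bool:
    """Mirror of LDA scroll_x / EOR scroll_x_prev / AND #%00000100."""
    return ((old_x ^ new_x) & 0b00000100) != 0

# A 1-pixel step across a 4-pixel boundary flips bit 2 and is detected.
print(crossed_4px_boundary(3, 4))  # True
# A 5-pixel step from 3 to 8 crosses two boundaries, but bit 2 is 0 in
# both values, so the test sees nothing and the metatiles are skipped.
print(crossed_4px_boundary(3, 8))  # False
```

The test only sees whether bit 2 differs between the two positions, so it cannot count how many boundaries were crossed in between.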

I'm in a bit of a hurry now so I haven't been able to debug the rest of the code in my head, but I did see one thing that doesn't make sense: scroll_x is an 8-bit variable, and you can't possibly make all the decisions you need with that.

An 8-bit value can only track the pixel position within the screen, so you can't possibly select a name table based on that. You also can't detect the direction of the movement using 8-bit coordinates, because when it wraps from 255 to 0 you'll misinterpret that as a movement to the left.

The PPU may treat the scroll as an 8 bit value (or 9, if you count the NT bits in $2000, which you should), but your levels are larger than that, so you absolutely need to keep track of the scroll/camera. Even if you only write 9 bits worth of scroll info to the PPU, the rest is necessary for detecting the direction of the movement and to locate the correct metatiles to draw, since they come from the level and the level is larger than 2 screens.
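
The wrap-around problem is easy to reproduce outside the NES. The following Python sketch compares an 8-bit and a 16-bit direction test; both function names are hypothetical and exist only for this illustration.

```python
def moved_right_8bit(old_lo: int, new_lo: int) -> bool:
    """Direction test that only sees the low byte of the position."""
    return new_lo > old_lo

def moved_right_16bit(old: int, new: int) -> bool:
    """Direction test that sees the full 16-bit camera position."""
    return new > old

old, new = 0x00FF, 0x0100  # the camera moves 1 pixel to the right
print(moved_right_8bit(old & 0xFF, new & 0xFF))  # False: 0 < 255 looks like "left"
print(moved_right_16bit(old, new))               # True
```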

Another important thing is that the camera/scroll position represents the top left corner of the screen, and when calculating addresses of updates at the right side I don't see you compensating for the width of the screen. When scrolling to the right, updates must be offset from the right edge of the camera, so you have to add 255 to the camera's position in order to find that edge, and then you add a few columns to find the actual update position. As an optimization, you can merge the 2 additions into 1, but it's important that you understand what is happening underneath.

There are quite a lot of things you have to do before you can fill the buffer. You are calculating the output addresses, but you haven't located any of the source addresses.

I might be able to write some example code later if you think that's going to help.

Re: Game project help and progress thread
by Tsutarja on 2015-04-16 (#145463)

tokumaru wrote:

An 8-bit value can only track the pixel position within the screen, so you can't possibly select a name table based on that. You also can't detect the direction of the movement using 8-bit coordinates, because when it wraps from 255 to 0 you'll misinterpret that as a movement to the left.

I'm using the screen variable for that. Masking off every bit except bit 0 should tell me which nametable I'm on at the moment, so I know which nametable's start address to use ($2000 or $2400).

tokumaru wrote:

I might be able to write some example code later if you think that's going to help.

I'm sure that'll help. You probably don't need to make example code out of the whole process, just the stuff that I may have forgotten or done incorrectly.

Re: Game project help and progress thread
by tokumaru on 2015-04-16 (#145475)

Tsutarja wrote:

You probably don't need to make example code out of the whole process

Oops, too late! Long ass post ahead! I started coding the important parts but ended up doing the whole thing because I wanted to see if everything fit together. It's all untested though, so follow with caution. You don't have to follow everything I did step by step, this is just an example of how I'd do things considering the compression format you've chosen. You might want to do things differently in some places, but it's important that you understand what's going on. I also separated the actual metatile decoding to its own subroutine, so it can be used for rendering the initial screen too.

Code:

   ;decide whether to render a metatile
   lda OldCameraX+0
   eor CameraX+0
   and #%00000100
   beq Done

   ;detect the direction of the movement
   lda OldCameraX+0
   cmp CameraX+0
   lda OldCameraX+1
   sbc CameraX+1
   bcc MovedRight

MovedLeft:

   ;calculate the position of the column to the left of the camera
   ;sec (omitted because the carry is always set by this point)
   lda CameraX+0 ;subtract 32 ($0020)
   sbc #$20
   sta ColumnX+0
   lda CameraX+1
   sbc #$00
   sta ColumnX+1
   jmp ColumnReady

MovedRight:

   ;calculate the position of the column to the right of the camera
   ;clc (omitted because the carry is always clear by this point)
   lda CameraX+0 ;add 255 + 32 = 287 ($011F)
   adc #$1F
   sta ColumnX+0
   lda CameraX+1
   adc #$01
   sta ColumnX+1

ColumnReady:

   ;go decode the metatile
   jsr DecodeMetatile

Done:

And here's the subroutine:

Code:

DecodeMetatile:

   ;return if the column is out of bounds
   lda ColumnX+1
   cmp LevelLength
   bcc Continue
   rts

Continue:

   ;use the screen index to set up a pointer to the screen we'll read from
   asl
   tay
   lda (LevelPointer), y
   sta ScreenPointer+0
   iny
   lda (LevelPointer), y
   sta ScreenPointer+1

   ;set up the indices needed to access pre-calculated values
   lda ColumnX+0
   and #%11111100
   lsr
   tay ;index for tables of words
   lsr
   tax ;index for tables of bytes

   ;prepare the bit that will be used for name table selection
   lda ColumnX+1
   and #%00000001 ;keep only the bit that selects 1 of the 2 name tables
   asl
   asl
   sta Temp

   ;prepare the name table address
   lda NameTableAddresses+0, y
   sta NameTableAddress+0
   lda NameTableAddresses+1, y
   ora Temp
   sta NameTableAddress+1

   ;prepare the attribute table address
   lda AttributeTableAddresses, x
   sta AttributeTableAddress+0
   lda #%00100011
   ora Temp
   sta AttributeTableAddress+1

   ;get the index of the metatile within the screen
   ldy MetatileIndices, x

   ;set up a pointer to the metatile
   lda (ScreenPointer), y
   sta MetatilePointer+0
   iny
   lda (ScreenPointer), y
   sta MetatilePointer+1

   ;OMITTED: SWITCH TO BUFFER STACK;

   ;OMITTED: SET UP VRAM UPDATE USING THE ATTRIBUTE TABLE ADDRESS;

   ;put the attribute byte in the buffer
   ldy #ATTRIBUTE_OFFSET
   lda (MetatilePointer), y
   pha

   ;prepare to read the first tile of the metatile
   ldy #$00

BufferRow:

   ;OMITTED: SET UP VRAM UPDATE USING THE NAME TABLE ADDRESS;

   ;put 4 tiles in the buffer
   lda (MetatilePointer), y
   pha
   iny
   lda (MetatilePointer), y
   pha
   iny
   lda (MetatilePointer), y
   pha
   iny
   lda (MetatilePointer), y
   pha
   iny

   ;check if all 16 tiles have already been processed
   cpy #$10
   beq Done

   ;move the output position one row down
   clc
   lda NameTableAddress+0
   adc #$20
   sta NameTableAddress+0
   lda NameTableAddress+1
   adc #$00
   sta NameTableAddress+1

   ;process another row if we didn't invade the attribute tables (happens with the last metatile)
   lda NameTableAddress+0
   and #%11000000
   cmp #%11000000
   bne BufferRow
   lda NameTableAddress+1
   and #%00000011
   cmp #%00000011
   bne BufferRow

Done:

   ;OMITTED: SWITCH TO THE NORMAL STACK;

   ;return
   rts
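
The end-of-loop check in BufferRow can be modelled in a few lines of Python. This is only a sketch of the address logic (no tiles are read), and `row_addresses` is a made-up name; it lists which name table rows would receive tiles for one metatile.

```python
def row_addresses(nt_address: int) -> list:
    """Name table addresses of the tile rows buffered for one metatile,
    stopping before the attribute table that starts at $x3C0."""
    rows = []
    for _ in range(4):                  # a metatile is 4 tile rows tall
        rows.append(nt_address)
        nt_address += 0x20              # one tile row down = 32 bytes
        # Same test as the AND/CMP pairs: low byte >= $C0 and the two
        # low bits of the high byte both set.
        if (nt_address & 0x03C0) == 0x03C0:
            break
    return rows

print([hex(a) for a in row_addresses(0x2000)])  # 4 rows for a top metatile
print([hex(a) for a in row_addresses(0x2380)])  # only 2 rows for the bottom one
```

The bottom metatile row starts at tile row 28, and a name table only has 30 tile rows, which is why only two rows are written there.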

Note that this is reading pointers stored as words, not split into bytes like you originally intended. This means you can define levels like this:

Code:

Level0:
   .dw Screen00, Screen01, Screen00, Screen03, Screen03, Screen01

And screens like this:

Code:

Screen00:
   .dw Metatile00, Metatile00, Metatile00, Metatile00, Metatile00, Metatile00, Metatile00, Metatile00
   .dw Metatile00, Metatile00, Metatile00, Metatile00, Metatile00, Metatile00, Metatile00, Metatile00
   .dw Metatile00, Metatile00, Metatile00, Metatile00, Metatile00, Metatile00, Metatile00, Metatile00
   .dw Metatile00, Metatile00, Metatile00, Metatile00, Metatile03, Metatile03, Metatile00, Metatile03
   .dw Metatile00, Metatile00, Metatile00, Metatile00, Metatile00, Metatile00, Metatile00, Metatile00
   .dw Metatile03, Metatile03, Metatile03, Metatile00, Metatile03, Metatile03, Metatile03, Metatile03
   .dw Metatile02, Metatile02, Metatile02, Metatile00, Metatile02, Metatile02, Metatile02, Metatile02
   .dw Metatile02, Metatile02, Metatile02, Metatile00, Metatile02, Metatile02, Metatile02, Metatile02

Which is much easier to write by hand than splitting the high and low bytes of each pointer. You'd probably go crazy trying to design levels the split way!

Also, I owe you some explanations about how the addresses are set up, since I used tables to solve this part. First I'm gonna tell you how we're finding the position of the metatile that will be updated, and then how all the addresses and indices are calculated from that position.

There are only 8 columns of metatiles per screen, but since updating all 8 metatiles of a column would take much more time than available during VBlank, we'll only update 1 metatile at a time. This means we need 8 updates to complete a column, and since each column is 32 pixels wide, we'll have to update one metatile every 32 / 8 = 4 pixels.

We could use a separate counter to keep track of how many metatiles we've already updated, but we already have this info implied in the column coordinate. You can think of the bits in the column coordinate like this:

Code:

SSSSSSSS MMM444PP
SSSSSSSS: index of the screen within the level (0 to 255);
MMM: index of the metatile column (0 to 7);
444: index of the 4-pixel column within the metatile column (0 to 7);
PP: index of the pixel column within the 4-pixel column (0 to 3);

OK, so we already know the column where the metatile we'll be processing is, but we still need to figure out its row. We have absolutely no use for the pixel index, but the 4-pixel column index can be REPURPOSED as the row of the metatile, since it will count from 0 to 7 during the course of one metatile column, which is exactly what we need. We'll be using this to count rows instead of a separate variable, so we'll treat the coordinate like this:

Code:

SSSSSSSS CCCRRR**
SSSSSSSS: index of the screen within the level (0 to 255);
CCC: column of the metatile within the screen;
RRR: row of the metatile within the screen;
**: not used;

Since we now know the column and the row of the metatile being processed, we can use that information to generate all the pointers and indices necessary to access it, we just need to move the bits around. This could be done with bit shifting and bitwise operations, but that could be slow and difficult to understand. To avoid that, I decided to use tables. To index these tables I used the position of the metatile exactly like it's arranged in the low byte of the column index, but shifted according to the size of the entries of each table (0CCCRRR0 for words, 00CCCRRR for bytes).
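
As a sanity check of the bit layout, here is a Python sketch that pulls the fields out of a 16-bit column coordinate and builds both table indices the same way the AND/LSR/LSR sequence in the routine does. The function name `split_column` is an assumption made for this example.

```python
def split_column(column_x: int):
    """Split a 16-bit column coordinate laid out as SSSSSSSS CCCRRR**."""
    screen = (column_x >> 8) & 0xFF        # SSSSSSSS
    low = column_x & 0xFF
    col = (low >> 5) & 0x07                # CCC
    row = (low >> 2) & 0x07                # RRR
    word_index = (low & 0b11111100) >> 1   # 0CCCRRR0, for tables of words
    byte_index = word_index >> 1           # 00CCCRRR, for tables of bytes
    return screen, col, row, word_index, byte_index

# Screen 2, low byte %10100100: column 5, row 1.
print(split_column(0x02A4))  # (2, 5, 1, 82, 41)
```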

To read from the screens, you have to think of the format in which they're stored: a grid of 8x8 words. This means that indices to read screens are in the following format:

Code:

0RRRCCC0

The formula used to find this format is (y * 8 + x) * 2. This is the good old formula used to convert 2D to 1D, and the multiplication by 2 is there because each entry is 2 bytes (a pointer). So there's one table (MetatileIndices) that converts 00CCCRRR into 0RRRCCC0.

Another thing we need to find out is the target address for the metatile in the name tables. Name table addresses are in the following format (this was defined by Nintendo):

Code:

0010YXYY YYYXXXXX

This is a bit more complicated because we have a base address (name tables start at $2000, not $0000), and because the top X and Y bits are separated from the rest and given more relevance, for name table selection. We can ignore that when creating the tables (to keep their size down) and assume that all addresses are in the first name table ($2000), and change the name table bit after reading the address from the table. Anyway, the conversion goes like this:

Code:

0CCCRRR0 (index)
001000RR R00CCC00 (NT address)

The last table we need converts the index into an attribute table address. AT addresses are in the following format (again, defined by Nintendo):

Code:

0010YX11 11YYYXXX

Which means that the conversion goes like this:

Code:

00CCCRRR (index)
00100011 11RRRCCC (AT address)

Like I said before, you could do these conversions in real time instead of using tables, but I wanted to keep the routine fast, and didn't want to confuse you with all the bit shifting. But you can decide to go that route if you don't want to waste space with these tables. The important thing is that the bits end up where they have to be.

I'll give you all the tables, but there might be errors since I wrote them manually instead of writing a script to generate them. Each table has 64 entries, but one uses words while the other 2 use bytes, for a total of 256 bytes worth of tables.

Code:

NameTableAddresses:

   .dw %0010000000000000, %0010000010000000, %0010000100000000, %0010000110000000, %0010001000000000, %0010001010000000, %0010001100000000, %0010001110000000
   .dw %0010000000000100, %0010000010000100, %0010000100000100, %0010000110000100, %0010001000000100, %0010001010000100, %0010001100000100, %0010001110000100
   .dw %0010000000001000, %0010000010001000, %0010000100001000, %0010000110001000, %0010001000001000, %0010001010001000, %0010001100001000, %0010001110001000
   .dw %0010000000001100, %0010000010001100, %0010000100001100, %0010000110001100, %0010001000001100, %0010001010001100, %0010001100001100, %0010001110001100
   .dw %0010000000010000, %0010000010010000, %0010000100010000, %0010000110010000, %0010001000010000, %0010001010010000, %0010001100010000, %0010001110010000
   .dw %0010000000010100, %0010000010010100, %0010000100010100, %0010000110010100, %0010001000010100, %0010001010010100, %0010001100010100, %0010001110010100
   .dw %0010000000011000, %0010000010011000, %0010000100011000, %0010000110011000, %0010001000011000, %0010001010011000, %0010001100011000, %0010001110011000
   .dw %0010000000011100, %0010000010011100, %0010000100011100, %0010000110011100, %0010001000011100, %0010001010011100, %0010001100011100, %0010001110011100

AttributeTableAddresses:

   .db %11000000, %11001000, %11010000, %11011000, %11100000, %11101000, %11110000, %11111000
   .db %11000001, %11001001, %11010001, %11011001, %11100001, %11101001, %11110001, %11111001
   .db %11000010, %11001010, %11010010, %11011010, %11100010, %11101010, %11110010, %11111010
   .db %11000011, %11001011, %11010011, %11011011, %11100011, %11101011, %11110011, %11111011
   .db %11000100, %11001100, %11010100, %11011100, %11100100, %11101100, %11110100, %11111100
   .db %11000101, %11001101, %11010101, %11011101, %11100101, %11101101, %11110101, %11111101
   .db %11000110, %11001110, %11010110, %11011110, %11100110, %11101110, %11110110, %11111110
   .db %11000111, %11001111, %11010111, %11011111, %11100111, %11101111, %11110111, %11111111

MetatileIndices:

   .db %00000000, %00010000, %00100000, %00110000, %01000000, %01010000, %01100000, %01110000
   .db %00000010, %00010010, %00100010, %00110010, %01000010, %01010010, %01100010, %01110010
   .db %00000100, %00010100, %00100100, %00110100, %01000100, %01010100, %01100100, %01110100
   .db %00000110, %00010110, %00100110, %00110110, %01000110, %01010110, %01100110, %01110110
   .db %00001000, %00011000, %00101000, %00111000, %01001000, %01011000, %01101000, %01111000
   .db %00001010, %00011010, %00101010, %00111010, %01001010, %01011010, %01101010, %01111010
   .db %00001100, %00011100, %00101100, %00111100, %01001100, %01011100, %01101100, %01111100
   .db %00001110, %00011110, %00101110, %00111110, %01001110, %01011110, %01101110, %01111110
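
For anyone who would rather generate these tables than type them, here is a minimal Python sketch that rebuilds all three from the bit layouts described above. The helper names and the `emit` formatter are assumptions made for this example; only the table labels and the `.dw`/`.db` directives come from the listing.

```python
def name_table_address(col: int, row: int) -> int:
    """001000RR R00CCC00: metatile (col, row) in the first name table."""
    return 0x2000 | (row << 7) | (col << 2)

def attribute_table_low(col: int, row: int) -> int:
    """Low byte 11RRRCCC of the attribute table address."""
    return 0xC0 | (row << 3) | col

def metatile_index(col: int, row: int) -> int:
    """0RRRCCC0: byte offset of a pointer inside an 8x8 screen of words."""
    return (row * 8 + col) * 2

# All three tables are indexed by CCCRRR, so the column is the outer loop.
cells = [(col, row) for col in range(8) for row in range(8)]
name_table_addresses = [name_table_address(c, r) for c, r in cells]
attribute_table_addresses = [attribute_table_low(c, r) for c, r in cells]
metatile_indices = [metatile_index(c, r) for c, r in cells]

def emit(label: str, directive: str, values: list, bits: int) -> str:
    """Format 64 values as 8 assembler lines of 8 binary literals each."""
    lines = [label + ":"]
    for i in range(0, len(values), 8):
        literals = ", ".join(
            "%" + format(v, "0{}b".format(bits)) for v in values[i:i + 8]
        )
        lines.append("   {} {}".format(directive, literals))
    return "\n".join(lines)

print(emit("NameTableAddresses", ".dw", name_table_addresses, 16))
print(emit("AttributeTableAddresses", ".db", attribute_table_addresses, 8))
print(emit("MetatileIndices", ".db", metatile_indices, 8))
```

Comparing the output against the hand-written tables is a cheap way to catch a mistyped bit.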

Re: Game project help and progress thread
by tokumaru on 2015-04-16 (#145476)

Tsutarja wrote:

Right, but that's not the only situation where you need the screen index. You need it when calculating the scroll direction, when calculating the position of the column you need to render, when generating the target NT and AT addresses, and so on. And the position of the column also needs to be a 16-bit number, since levels are larger than 1 screen.

I really think you should consider implementing a "camera" object, and giving it its own position. The low byte of the camera's position is the scroll value you'll write to $2005, without any modifications, and the lowest bit of the high byte will go to $2000, so there are no added complications. Having a camera entity will allow for advanced scrolling later down the road if you decide to. You can apply physics to the camera, so it moves with a bit of inertia, like real cameras do. You can move it more to the right or to the left of the player depending on which side he's facing. You can easily move the camera around in cutscenes while the player is stopped, to reveal things that are farther away. The camera is a real thing that moves around the stage, like the other game objects do. Pretending it doesn't exist and manipulating the scroll directly is a poor way to treat such an important dynamic entity, IMO.
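
To show how little work the camera adds on the PPU side, here is a Python sketch of the split described above. `scroll_registers` is a hypothetical helper; the other bits of $2000 (NMI enable, pattern table selection and so on) are assumed to be combined in elsewhere.

```python
def scroll_registers(camera_x: int):
    """Split a 16-bit camera X into the $2005 scroll byte and the
    horizontal name table bit (bit 0) that goes into $2000."""
    scroll = camera_x & 0xFF          # low byte, written to $2005 unmodified
    nt_bit = (camera_x >> 8) & 0x01   # lowest bit of the high byte
    return scroll, nt_bit

print(scroll_registers(0x0000))  # (0, 0)
print(scroll_registers(0x01F0))  # (240, 1)
print(scroll_registers(0x0210))  # (16, 0)
```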

Re: Game project help and progress thread
by Tsutarja on 2015-04-17 (#145512)

Okay, here are some of the things I don't understand:

Here you are comparing OldCameraX against CameraX, but you are not doing anything with the result. Does this set one of the processor flags or something?

Code:

   ;detect the direction of the movement
   lda OldCameraX+0
   cmp CameraX+0
   lda OldCameraX+1
   sbc CameraX+1
   bcc MovedRight

What's the point of having SBC here if you are subtracting #$00? Again, does this set a processor flag?

Code:

   lda CameraX+1
   sbc #$00
   sta ColumnX+1

Are these pointers supposed to be memory addresses or do they point to a table? The way you are writing the variable names and labels makes them sometimes hard to distinguish which one you mean. (I use different style for both, example_variable for variables and ExampleTable for lookup tables and labels)

Code:

 lda (LevelPointer), y

( ... )

 lda (ScreenPointer), y

Re: Game project help and progress thread
by tokumaru on 2015-04-17 (#145514)

Nice to see you trying to figure out the code!

Tsutarja wrote:

Here you are comparing OldCameraX against CameraX, but you are not doing anything with the result. Does this set one of the processor flags or something?

Code:

   ;detect the direction of the movement
   lda OldCameraX+0
   cmp CameraX+0
   lda OldCameraX+1
   sbc CameraX+1
   bcc MovedRight

I'm using the result (look at the branch instruction), but since the camera position is unsigned I'm using the C flag instead of the N flag to check the result. After a comparison/subtraction, the CPU indicates a borrow by clearing the C flag. So, if there was a borrow when subtracting CameraX from OldCameraX, that means CameraX is larger than OldCameraX, meaning the camera moved right.

Quote:

What's the point of having SBC here if you are subtracting #$00? Again, does this set a processor flag?

Code:

   lda CameraX+1
   sbc #$00
   sta ColumnX+1

That's how 16-bit math works. This part propagates the borrow onto the next byte. For example, let's subtract $0014 from $0a12. First we do $12 - $14, which is -$02 ($fe). Since the result was smaller than 0, there was a borrow. If you don't do anything with that information, you'll end up with $0afe as your result, which is wrong. You have to do the other half of the subtraction, which is $0a - $00 = $0a, which looks pointless at first, but since there was a borrow, an extra unit is subtracted from $0a and you end up with $09, forming $09fe, which is the correct result.

It's very important that you understand this. Since the 6502 is an 8-bit CPU, it's very convenient to work with 8-bit numbers, but a lot of things in a game need numbers larger than that, so you absolutely must understand how 16-bit math works.
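
If it helps, the worked example above can be replayed in Python. This sketch models SBC in binary mode only (the NES CPU has no decimal mode), and the names `sbc` and `sub16` are made up for the illustration.

```python
def sbc(a: int, operand: int, carry: int):
    """8-bit SBC: A - operand - (1 - carry). Returns (result, carry_out);
    carry_out is 0 when a borrow happened, as on the 6502."""
    diff = a - operand - (1 - carry)
    return diff & 0xFF, (0 if diff < 0 else 1)

def sub16(value: int, amount: int) -> int:
    """16-bit subtraction done one byte at a time, low byte first."""
    lo, carry = sbc(value & 0xFF, amount & 0xFF, 1)               # SEC, then SBC
    hi, carry = sbc((value >> 8) & 0xFF, (amount >> 8) & 0xFF, carry)
    return (hi << 8) | lo

print(sbc(0x12, 0x14, 1))          # (254, 0): $fe with a borrow
print(hex(sub16(0x0A12, 0x0014)))  # 0x9fe
```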

Quote:

Are these pointers supposed to be memory addresses or do they point to a table?

These are ZP variables that point to the current level map, a screen definition and a metatile definition, so you can read the data that composes the level.

Re: Game project help and progress thread
by Tsutarja on 2015-04-17 (#145516)

tokumaru wrote:

So, when the carry flag is set, SBC #$00 subtracts nothing, but if it's cleared, it will subtract #$01 instead? If this is right, does this work with ADC too?

tokumaru wrote:

These are ZP variables that point to the current level map, a screen definition and a metatile definition, so you can read the data that composes the level.

So, they point the locations of the tables in ROM?
LevelPointer points to [1.], ScreenPointer points to [2.] and MetatilePointer points to [3.]

[1.]

Code:

Stage1:
 .dw Stg1Screen1,Stg1Screen2,Stg1Screen3,Stg1Screen4,Stg1Screen5

[2.]

Code:

Stg1Screen1:
 .dw MT0F,MT0F,MT0F,MT0F,MT0F,MT0F,MT0F,MT0F
 .dw MT0F,MT0F,MT0F,MT0F,MT0F,MT0F,MT0F,MT0F
 .dw MT0F,MT0F,MT0F,MT0F,MT0F,MT0F,MT0F,MT0F
 .dw MT0F,MT0A,MT0D,MT0F,MT0F,MT0F,MT0F,MT0F
 .dw MT0F,MT08,MT0C,MT0F,MT13,MT11,MT0F,MT12
 .dw MT13,MT07,MT10,MT05,MT04,MT04,MT04,MT04
 .dw MT04,MT04,MT04,MT06,MT00,MT00,MT00,MT00
 .dw MT00,MT00,MT00,MT00,MT00,MT00,MT00,MT00

[3.]

Code:

MT00:
 .db $29,$2A,$29,$2A
 .db $39,$3A,$39,$3A
 .db $29,$2A,$29,$2A
 .db $39,$3A,$39,$3A
 .db %01010101

Re: Game project help and progress thread
by tokumaru on 2015-04-17 (#145533)

Tsutarja wrote:

So, when the carry flag is set, SBC #$00 subtracts nothing, but if it's cleared, it will subtract #$01 instead?

Exactly. Whenever the carry is clear it will subtract one extra unit.

This is why we need to use SEC before subtractions, we don't want to accidentally subtract one more than we should from the first byte, we only want to do that to the other bytes, to propagate any possible borrow.

Quote:

If this is right, does this work with ADC too?

Yes, the carry affects additions too, but in the opposite way. Whenever it's set, it adds one extra unit.

Again, this is why we use CLC before additions, we don't want to add one more than the actual number we're adding, we only want to propagate the carry to the other places.
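
The addition case can be sketched the same way. The example below adds the $011F offset used for the right edge of the camera to a sample position; `adc` and `add16` are hypothetical names, and decimal mode is ignored.

```python
def adc(a: int, operand: int, carry: int):
    """8-bit ADC: A + operand + carry. Returns (result, carry_out)."""
    total = a + operand + carry
    return total & 0xFF, (1 if total > 0xFF else 0)

def add16(value: int, amount: int) -> int:
    """16-bit addition done one byte at a time, low byte first."""
    lo, carry = adc(value & 0xFF, amount & 0xFF, 0)               # CLC, then ADC
    hi, carry = adc((value >> 8) & 0xFF, (amount >> 8) & 0xFF, carry)
    return (hi << 8) | lo

print(adc(0xF0, 0x1F, 0))          # (15, 1): $0f with a carry out
print(hex(add16(0x00F0, 0x011F)))  # 0x20f
```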

Quote:

So, they point the locations of the tables in ROM?

Yes, exactly like in your example.

The update routine itself sets the ScreenPointer and the MetatilePointer as necessary every time it's called, but you have to set LevelPointer whenever you start a new level.

Re: Game project help and progress thread
by Tsutarja on 2015-04-18 (#145594)

Okay, I've been trying to think of how to use the decoding subroutine as a loop to draw a screen, but it has been quite difficult to think of a good way to make it work. I probably could make it work, but it would be very slow. I want to have the individual screen loader as a subroutine in a way that I can use it anytime by giving the address of the screen. If possible, I'd like it to be able to draw both nametables. I guess a column per frame is not too much to overflow the buffer. How should I be doing this? (I don't know if this is simple or not, but I may be overcomplicating things again)

Re: Game project help and progress thread
by tokumaru on 2015-04-18 (#145605)

If you're doing things as I suggested, drawing the initial screen is as simple as this:

Code:

   sec
   lda CameraX+0
   sbc #$20
   sta ColumnX+0
   lda CameraX+1
   sbc #$00
   sta ColumnX+1
   lda #$50 ;10 columns * 8 metatiles
   sta Counter
Loop:
   jsr DecodeMetatile
   ;OMITTED: WRITE BUFFER TO VRAM
   dec Counter
   beq Done
   clc
   lda ColumnX+0
   adc #$04
   sta ColumnX+0
   lda ColumnX+1
   adc #$00
   sta ColumnX+1
   jmp Loop ;go process the next metatile
Done:

You can't possibly think that reimplementing the entire decoding process is simpler than this loop.

If you make the VRAM updating its own subroutine, the omitted part is just another JSR. You'd use the same routine in the NMI. Subroutines are your friends, so don't repeat yourself unless you absolutely need to.

Re: Game project help and progress thread
by Tsutarja on 2015-04-19 (#145624)

tokumaru wrote:

Code:

   sec
   lda CameraX+0
   sbc #$20
   sta ColumnX+0
   lda CameraX+1
   sbc #$00
   sta ColumnX+1
   lda #$50 ;10 columns * 8 metatiles
   sta Counter
Loop:
   jsr DecodeMetatile
   ;OMITTED: WRITE BUFFER TO VRAM
   dec Counter
   beq Done
   clc
   lda ColumnX+0
   adc #$04
   sta ColumnX+0
   lda ColumnX+1
   adc #$00
   sta ColumnX+1
   jmp Loop ;go process the next metatile
Done:

I just tested the code, but it seems like the SBC #$00 causes ColumnX+1 to underflow to #$FF (because the camera position is #$00 on both the X and Y axes), so the screen isn't drawn because the column is larger than the level length. According to FCEUX's debugger, after the screen draw routine finishes, the game loops the IRQ routine infinitely for some reason (the interrupt disable flag is set, so it shouldn't happen). I tried removing the SBCs, but then the ADCs will mess something up (at least it seems like it as far as my debugging skills go), and the loop obviously doesn't work without them. I did try setting the camera position to #$20 to prevent the carry flag getting cleared and ColumnX+1 underflowing, but that didn't help. I also took the background update routine out of the NMI as a separate subroutine so I can use it at the omitted part like you suggested.

Re: Game project help and progress thread
by tokumaru on 2015-04-19 (#145630)

Hum... it's hard to debug this not knowing how things are organized in the ROM, but this loop is supposed to run before the game loop, with rendering turned off. You also don't want the NMI meddling with things while this runs, so make sure to configure flags as necessary to keep the NMI handler from doing anything PPU related.

The camera must be positioned already, as must the pointer to the level and the level length, all used by the DecodeMetatile subroutine. This is all I can think of at the moment, but look hard for anything that could be interfering with the variables used here.

If you want me to take a look at the ROM, I can.

The underflow you mentioned is expected, which is why we check whether the column is within the boundaries of the level inside the DecodeMetatile routine. ColumnX will be out of bounds for the first few iterations of the loop, due to CameraX being 0, but since ColumnX is incremented each iteration, it will eventually be within bounds.
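
A quick Python model of the loop shows what the debugger should report with the camera at 0. This is a sketch of the arithmetic only; `initial_draw_columns` is a made-up name, and the level length of 5 screens is an arbitrary value picked for the example.

```python
def initial_draw_columns(camera_x: int, level_length: int):
    """Return (column_x, drawn) for each of the 80 loop iterations.
    level_length is the number of screens, compared against the high byte."""
    column_x = (camera_x - 0x20) & 0xFFFF        # one column left of the camera
    result = []
    for _ in range(0x50):                        # 10 columns * 8 metatiles
        drawn = (column_x >> 8) < level_length   # CMP LevelLength / BCC Continue
        result.append((column_x, drawn))
        column_x = (column_x + 4) & 0xFFFF       # next metatile
    return result

steps = initial_draw_columns(camera_x=0x0000, level_length=5)
skipped = [hex(c) for c, drawn in steps if not drawn]
print(skipped)           # the 8 metatiles of the column at $FFE0-$FFFC
print(hex(steps[8][0]))  # 0x0: first metatile that lies inside the level
```

So seeing $FF in ColumnX+1 for the first 8 calls is normal; from the ninth call on the column should be inside the level and get drawn.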

Does the normal column rendering during scrolling work? Since you're reusing code from that, you must make sure that works properly.

Re: Game project help and progress thread
by Tsutarja on 2015-04-26 (#146221)

I think I'm going to make the sound engine next. How difficult would it be to implement "instruments" that have pre-defined volume and pitch envelopes? I think that would save some space over writing the envelopes in the song data.

Re: Game project help and progress thread
by tepples on 2015-04-26 (#146224)

Most NES games' music engines use some sort of "instrument" to represent duty, volume, and pitch changes. Play around in FamiTracker to see one common method.

Re: Game project help and progress thread
by za909 on 2015-04-26 (#146258)

Tsutarja wrote:

There are really only two options that come to my mind:
1. Implementing some sort of ADSR envelope in software, therefore calculating your volume and other changes with countdowns and selectable load values for these counters (Capcom games use this method)
2. Making a bunch of tables with all the raw values intended to be written consecutively to $4000 and $4004, and then writing the next item one after the other until the stream is terminated or halted, until the engine signals that it has read a note cut byte, which then runs the "release" part of the instrument.

As for pitch, you have to apply the deviation to the low period value once you have it, and you can also store this as a data stream. I do my vibrato in a cheap way (I guess) by using 4 different 16-byte long sequences, all containing a stream of signed relative deviations from the previous value. I can simply select which sequence to use by changing the upper 4 bits of my variable, the lower 4 bits are loaded into Y, and I wrap these bits back to 0 when they overflow.
Plus it also has a countdown from the start of the note, so I can make it automatically apply mid-note or whenever I want.
(I realise this is really out of context so it doesn't make much sense but hopefully you can see what I meant)

Code:

  VibHandler:
  ; apply vibrato to a pulse channel
   lda pu1_vibenable,x
   bmi +thereisvib ; if the vibrato is disabled, don't waste time with this
   rts 

   +thereisvib:
   bit temp_0
   bvc +read ; keyon frame has to initialize the value reading vector
   ldy temp_6 ; store 4 high bits for the value reading
   lda pu1_vibtbl,y
   sta pu1_vibphase,x
   +read:
   ldy pu1_vibphase,x
   lda Vibrato_TBL,y
   sta pu1_vibvalue,x
   tya
   and #$0F ; if all 16 items are done, loop back to the first
   cmp #$0F
   bne +notatend
   tya 
   and #$F0
   sta pu1_vibphase,x
   jmp +here ; skip increment here
   
   +notatend:
   inc pu1_vibphase,x
   +here:
   lda pu1_vibtimeleft,x
   beq +timeover
   dec pu1_vibtimeleft,x ; timer is nonzero so no vibrato applies yet
   rts
   
   +timeover:
   lda temp_6
   asl a
   asl a
   tay
   lda pu1_loSHmu,y
   clc
   adc pu1_vibvalue,x ; add relative sine to the low period
   sta pu1_loSHmu,y
   rts

 Vibrato_TBL: ; contains relative waves for modulation
  .db $01,$01,$00,$01,$00,$00,$FF,$00,$FF,$FE,$FF,$00,$FF,$00,$01,$02  ; 16/60 Hz Depth 3
  .db $02,$01,$00,$FF,$FE,$FE,$FF,$03,$02,$01,$00,$FF,$FE,$FE,$FF,$03 ; 8/60 Hz  Depth 3
  .db $03,$FD,$FD,$03,$03,$FD,$FD,$03,$03,$FD,$FD,$03,$03,$FD,$FD,$03 ; 4/60 Hz  Depth 3
  .db $02,$02,$01,$FE,$FD,$FE,$FF,$03,$02,$02,$01,$FE,$FD,$FE,$FF,$03  ; 8/60 Hz  Depth 5
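
Since the table stores relative changes rather than absolute offsets, one property worth checking is that every 16-step sequence adds up to zero, so the pitch ends each cycle where it started. The Python sketch below does that check on the four rows above and also models the phase stepping; the helper names are assumptions for this illustration.

```python
VIBRATO_TBL = [
    [0x01, 0x01, 0x00, 0x01, 0x00, 0x00, 0xFF, 0x00, 0xFF, 0xFE, 0xFF, 0x00, 0xFF, 0x00, 0x01, 0x02],
    [0x02, 0x01, 0x00, 0xFF, 0xFE, 0xFE, 0xFF, 0x03, 0x02, 0x01, 0x00, 0xFF, 0xFE, 0xFE, 0xFF, 0x03],
    [0x03, 0xFD, 0xFD, 0x03, 0x03, 0xFD, 0xFD, 0x03, 0x03, 0xFD, 0xFD, 0x03, 0x03, 0xFD, 0xFD, 0x03],
    [0x02, 0x02, 0x01, 0xFE, 0xFD, 0xFE, 0xFF, 0x03, 0x02, 0x02, 0x01, 0xFE, 0xFD, 0xFE, 0xFF, 0x03],
]

def signed(byte: int) -> int:
    """Interpret a byte as a two's complement value."""
    return byte - 0x100 if byte >= 0x80 else byte

def offsets(sequence: list) -> list:
    """Running pitch offset after each frame of one 16-step sequence."""
    total, out = 0, []
    for byte in sequence:
        total += signed(byte)
        out.append(total)
    return out

def next_phase(phase: int) -> int:
    """Advance the low nibble and wrap it, keeping the sequence select bits."""
    return (phase & 0xF0) | ((phase + 1) & 0x0F)

for sequence in VIBRATO_TBL:
    running = offsets(sequence)
    print(max(running), min(running), running[-1])
```

The last number printed for each row is the net drift after a full cycle, which is 0 for all four sequences.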

Re: Game project help and progress thread
by tepples on 2015-04-26 (#146267)

za909 wrote:

There are really only two options [for defining envelopes] that come to my mind:
1. Implementing some sort of ADSR envelope in software, therefore calculating your volume and other changes with countdowns and selectable load values for these counters (Capcom games use this method)
2. Making a bunch of tables with all the raw values intended to be written consecutively to $4000 and $4004, and then writing the next item one after the other until the stream is terminated or halted, until the engine signals that it has read a note cut byte, which then runs the "release" part of the instrument.

NerdTracker uses mostly 1, and FamiTracker uses 2 exclusively. My own music engine uses a mix of the two: tables at 2 bytes per frame for the attack and early decay, and then a simple volume ramp for the late decay and sustain. This technique of data-heavy attacks and parameter-driven sustains was inspired by Linear Arithmetic synthesis, a technique used in Roland's D-50 synthesizer that combines data-heavy PCM attack transients with parameter-driven digital subtractive synthesis for the rest of the note.

Re: Game project help and progress thread
by Tsutarja on 2015-04-26 (#146326)

I think I'm going with streams of volume, duty, and pitch data that are read every frame. #$00 would mean no volume or pitch envelope, so it's skipped. Duty would be assigned to the notes in the instrument, but can be changed with the duty stream if necessary. I don't think I'll need values higher than #$0F for the envelopes, so I could use the upper bits to set some parameters, like add or subtract for pitch, and a loop flag that, instead of changing the value, would jump X bytes back in the stream and read from there. I attached one of the songs I've made for the game so you get some idea of the style I use (I don't use instruments in FamiTracker because for me it's easier to read when volume, pitch, etc. are visible).

Attachment:

Kemono Boss 900 BPM.ftm [13.87 KiB]

EDIT: Here is how I was planning to make the instruments and envelopes:

Code:

InstrumentSQ_Duty0:
 .db %00110000         ; Duty 12.5%, Length Counter Halt, Constant Volume

InstrumentSQ_Duty1:
 .db %01110000         ; Duty 25%, Length Counter Halt, Constant Volume

InstrumentSQ_Duty2:
 .db %10110000         ; Duty 50%, Length Counter Halt, Constant Volume

InstrumentSQ_Duty3:
 .db %11110000         ; Duty 75%, Length Counter Halt, Constant Volume

InstrumentNoise:
 .db %00110000

 ; %xxx1 xxxx 0 = Add, 1 = Subtract (Pitch only) ($10)
 ; %xx1x LLLL Loop Flag, Subtract 'LLLL' from envelope stream counter  ($2L)
 ; %x1xx WWWW Wait 'WWWW' frames and move on to the next byte ($4W)
 ; %1xxx xxxx Halt Flag, Stop envelope update and leave last updated value ($80)

            ; Square Volume Envelopes

SQVolEnv00:
 .db $07,$07,$07,$06,$06,$05,$80

SQVolEnv01:
 .db $02,$02,$02,$01,$80

SQVolEnv02:
 .db $06,$06,$07,$80

SQVolEnv03:
 .db $02,$02,$02,$00,$00,$01,$80

SQVolEnv04:
 .db $06,$06,$07,$01,$80

SQVolEnv05:
 .db $04,$04,$03,$01,$80

SQVolEnv06:
 .db $02,$02,$03,$01,$80

            ; Noise Volume Envelopes

NoiseVolEnv00:
 .db $05,$05,$03,$03,$01,$00,$80

NoiseVolEnv01:
 .db $03,$03,$02,$02,$01,$00,$80

            ; Square Pitch Envelopes

SQPtcEnv00:
 .db $43,$14,$00,$07,$00,$17,$00,$24

SQPtcEnv01:
 .db $46,$14,$00,$00,$07,$00,$00,$00,$17,$00,$00,$27

SQPtcEnv02:
 .db $4F,$4B,$14,$00,$00,$07,$00,$00,$00,$17,$00,$00,$27
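
As a rough idea of how an engine might step through one of these volume envelopes each frame, here's a minimal sketch. It only handles plain volume bytes and the $80 halt flag (the $2L loop and $4W wait commands are left out), and env_ptr, sq1_vol_env, sq1_vol and sq1_duty are made-up variable names, not something from an existing engine:

Code:

; Assumed variables (hypothetical):
;   env_ptr      2-byte zero-page pointer to the envelope, e.g. SQVolEnv00
;   sq1_vol_env  current offset into the envelope
;   sq1_vol      last volume read from the envelope
;   sq1_duty     instrument byte, e.g. %01110000
StepSq1Volume:
   ldy sq1_vol_env
   lda [env_ptr],y        ; NESASM syntax; other assemblers write (env_ptr),y
   bmi .done              ; bit 7 set = halt: keep the last volume
   sta sq1_vol
   inc sq1_vol_env        ; advance to the next byte for the next frame
.done:
   lda sq1_vol
   ora sq1_duty           ; combine volume with duty/halt/constant-volume bits
   sta $4000
   rts

Since the halt byte never advances the offset, the routine can be called every frame without checking whether the envelope has ended.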

Re: Game project help and progress thread
by Tsutarja on 2015-04-28 (#146414)

I think I have the data format ready. Do you think that this would be too complex/slow to process, and can it be simplified?

I included only the Square 1 channel so the post doesn't get too long.

Code:

            ; Channels in track

MusicBoss:
 .dw BossSQ1,BossSQ2,BossTri,BossNoise,BossDMC

            ; Patterns in channel

BossSQ1:
 .dw BossSQ1_00,BossSQ1_00
 .dw BossSQ1_01,BossSQ1_02,BossSQ1_03,BossSQ1_04
 .dw BossSQ1_01,BossSQ1_02,BossSQ1_03,BossSQ1_04
 .dw BossSQ1_05,BossSQ1_06,BossSQ1_05,BossSQ1_07

            ; Streams in Pattern

BossSQ1_00:
 .dw BossSQ1_00_Note,BossSQ1_00_Volume,BossSQ1_00_Vibrato
 .dw BossSQ1_00_Duty,BossSQ1_00_Len_Lo,BossSQ1_00_Len_Hi

BossSQ1_01:
 .dw BossSQ1_01_Note,BossSQ1_01_Volume,BossSQ1_01_Vibrato
 .dw BossSQ1_01_Duty,BossSQ1_01_Len_Lo,BossSQ1_01_Len_Hi

BossSQ1_02:
 .dw BossSQ1_02_Note,BossSQ1_02_Volume,BossSQ1_02_Vibrato
 .dw BossSQ1_02_Duty,BossSQ1_02_Len_Lo,BossSQ1_02_Len_Hi

BossSQ1_03:
 .dw BossSQ1_03_Note,BossSQ1_03_Volume,BossSQ1_03_Vibrato
 .dw BossSQ1_03_Duty,BossSQ1_03_Len_Lo,BossSQ1_03_Len_Hi

BossSQ1_04:
 .dw BossSQ1_04_Note,BossSQ1_04_Volume,BossSQ1_04_Vibrato
 .dw BossSQ1_04_Duty,BossSQ1_04_Len_Lo,BossSQ1_04_Len_Hi

BossSQ1_05:
 .dw BossSQ1_05_Note,BossSQ1_05_Volume,BossSQ1_05_Vibrato
 .dw BossSQ1_05_Duty,BossSQ1_05_Len_Lo,BossSQ1_05_Len_Hi

BossSQ1_06:
 .dw BossSQ1_06_Note,BossSQ1_06_Volume,BossSQ1_06_Vibrato
 .dw BossSQ1_06_Duty,BossSQ1_06_Len_Lo,BossSQ1_06_Len_Hi

BossSQ1_07:
 .dw BossSQ1_07_Note,BossSQ1_07_Volume,BossSQ1_07_Vibrato
 .dw BossSQ1_07_Duty,BossSQ1_07_Len_Lo,BossSQ1_07_Len_Hi

And here is an example of one pattern's streams:

Code:

BossSQ1_00_Note:
 .db B_2,As2,A_2,Gs2            ; Notes represent a value in the range $00 - $5E
 .db B_2,As2,A_2,Gs2            ; This value is used with X- or Y-indexed addressing
 .db B_2,As2,A_2,Gs2            ; to get the correct period value for the note
 .db B_2,As2,A_2,Gs2

BossSQ1_00_Volume:
 .db SQVolEnv00,SQVolEnv00,SQVolEnv00,SQVolEnv00           ; Volume, pitch and duty envelopes are in my previous post
 .db SQVolEnv00,SQVolEnv00,SQVolEnv00,SQVolEnv00
 .db SQVolEnv00,SQVolEnv00,SQVolEnv00,SQVolEnv00
 .db SQVolEnv00,SQVolEnv00,SQVolEnv00,SQVolEnv00

BossSQ1_00_Vibrato:
 .db SQPtcEnv00,SQPtcEnv00,SQPtcEnv00,SQPtcEnv00
 .db SQPtcEnv00,SQPtcEnv00,SQPtcEnv00,SQPtcEnv00
 .db SQPtcEnv00,SQPtcEnv00,SQPtcEnv00,SQPtcEnv00
 .db SQPtcEnv00,SQPtcEnv00,SQPtcEnv00,SQPtcEnv00

BossSQ1_00_Duty:
 .db SQ_Duty1,SQ_Duty1,SQ_Duty1,SQ_Duty1
 .db SQ_Duty1,SQ_Duty1,SQ_Duty1,SQ_Duty1
 .db SQ_Duty1,SQ_Duty1,SQ_Duty1,SQ_Duty1
 .db SQ_Duty1,SQ_Duty1,SQ_Duty1,SQ_Duty1

BossSQ1_00_Len_Lo:
 .db $06,$07,$06,$07
 .db $06,$07,$06,$07
 .db $06,$07,$06,$07
 .db $06,$07,$06,$07

BossSQ1_00_Len_Hi:
 .db $00,$00,$00,$00                 ; I have these just in case
 .db $00,$00,$00,$00                 ; Not sure if I'll ever need these
 .db $00,$00,$00,$00
 .db $00,$00,$00,$00

BossSQ1_01_Note:
 .db B_2,Cs3,D_3,Fs3,E_3,D_3,A_2

BossSQ1_01_Volume:
 .db SQVolEnv02,SQVolEnv02,SQVolEnv02,SQVolEnv02,SQVolEnv02,SQVolEnv02,SQVolEnv02

BossSQ1_01_Vibrato:
 .db SQPtcEnv00,SQPtcEnv00,SQPtcEnv01,SQPtcEnv01,SQPtcEnv01,SQPtcEnv01,SQPtcEnv01

BossSQ1_01_Duty:
 .db SQ_Duty1,SQ_Duty1,SQ_Duty1,SQ_Duty1,SQ_Duty1,SQ_Duty1,SQ_Duty1

BossSQ1_01_Len_Lo:
 .db $06,$07,$0D,$13,$14,$1A,$0C

BossSQ1_01_Len_Hi:
 .db $00,$00,$00,$00,$00,$00,$00

BossSQ1_02_Note:
 .db B_2

BossSQ1_02_Volume:
 .db SQVolEnv02

BossSQ1_02_Vibrato:
 .db SQPtcEnv02

BossSQ1_02_Duty:
 .db SQ_Duty1

BossSQ1_02_Len_Lo:
 .db $68

BossSQ1_02_Len_Hi:
 .db $00
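
For the note lookup mentioned in the comments above, a minimal sketch could look like this. The pulse period comes from period = CPU clock / (16 × frequency) − 1, with an NTSC CPU clock of about 1789773 Hz. PeriodLo and PeriodHi are hypothetical table names, and the three entries are only there to illustrate the formula; a real table would have one entry per note number:

Code:

; X = note number (index into the period tables)
SetSq1Period:
   lda PeriodLo,x
   sta $4002              ; low 8 bits of the period
   lda PeriodHi,x
   sta $4003              ; high 3 bits; this write also restarts the duty sequence
   rts

; Illustrative entries only: A at 110, 220 and 440 Hz on NTSC
; ($3F8, $1FB and $0FD according to the formula above)
PeriodLo:
 .db $F8,$FB,$FD
PeriodHi:
 .db $03,$01,$00

Because writing $4003 restarts the duty sequence, an engine would probably only want to write it when the high bits actually change.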

Re: Game project help and progress thread
by za909 on 2015-04-30 (#146643)

You're on the right track, but having looked at your plans, I'd advise you to find some other way of representing your note effects, because this produces TONS of redundant data. For example, you are storing the current instrument for every single note. Instead, you should keep a variable like "Pu1Instr" or something, use that as an index to find the table representing this instrument number, and have an effect command to change the Pu1Instr variable. This way you will only add two bytes when you actually need to switch to a different instrument.
I remember in my earlier sound engine I did this by setting up an indirect vector to the desired table, then loading the current volume write number into Y and reading from the table. ~~(Sorry, couldn't find my source code for that)~~
As you can see, I only had 16 pulse instruments; the other bits of p1_flags were bit flags for detune, hardware sweep mode, etc.

Code:

  GetInstrumentId:
; Put instrument ID in Y for index

   lda p1_flags,x 
   and #%00001111
   tay   
   rts

Code:

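; Y is Instrument ID
   sty temp_4                  ; save the instrument ID
   tya
   asl a                       ; ID * 2: each table entry is a 2-byte address
   tay
   lda instrumentaddrTBL,y     ; low byte of the instrument's table address
   sta temp_E
   lda instrumentaddrTBL+1,y   ; high byte
   sta temp_F                  ; temp_E/temp_F now form a zero-page pointer
   ldy p1_patchseq,x           ; current position within the instrument
   lda (temp_E),y              ; fetch the value for this step
   sta p1_shvol,x
   ldy temp_4                  ; restore the instrument ID
   rts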
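
The routine above relies on a table of 2-byte instrument addresses. The original table isn't shown in the thread, so the following is only a guess at its shape; Patch00, Patch01 and their contents are made up for illustration:

Code:

; Hypothetical address table, one .dw entry per instrument ID
instrumentaddrTBL:
 .dw Patch00,Patch01

; Hypothetical instruments stored as raw $4000 values, one per step
Patch00:
 .db $B8,$B6,$B4,$B2      ; 50% duty, constant volume 8, 6, 4, 2
Patch01:
 .db $7F,$7A,$75          ; 25% duty, constant volume 15, 10, 5

With a layout like this, changing instruments only means storing a new ID, which is where the space saving comes from.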

Re: Game project help and progress thread
by Tsutarja on 2015-05-03 (#146861)

za909 wrote:

Instead, you should keep a variable like "Pu1Instr" or something, use that as an index to find the table representing this instrument number, and have an effect command to change the Pu1Instr variable. This way you will only add two bytes when you actually need to switch to a different instrument.

That actually sounds better. I'll have to edit the data later today.

By the way, when I start to actually write the sound engine, which one of these would be better:
1. Run the whole sound engine in NMI
2. Run sound engine outside NMI and copy bytes to the registers during NMI

I have heard of both of these methods, but I'm not sure which one is better and what their possible downsides are.

Re: Game project help and progress thread
by tokumaru on 2015-05-04 (#146873)

Tsutarja wrote:

2. Run sound engine outside NMI and copy bytes to the registers during NMI

This approach makes more sense for PPU updates, since access to VRAM is restricted, but the APU can be accessed at any time, so there's no problem in doing everything at once.

The only advantage I can see in buffering APU updates is that it might make the timing a little steadier, since the time taken to process the audio data won't delay the register writes, but unless your sound engine's execution times vary greatly, I doubt the difference is noticeable.
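
For the first option, a bare-bones NMI handler might look something like this sketch. UpdateSound is a hypothetical name for the sound engine's entry point, and the PPU work is only hinted at in a comment:

Code:

NMI:
   pha                    ; save A, X and Y
   txa
   pha
   tya
   pha
   ; ... sprite DMA and VRAM updates go first, while vblank lasts ...
   jsr UpdateSound        ; hypothetical: process music data and write APU registers
   pla                    ; restore Y, X and A
   tay
   pla
   tax
   pla
   rti

Running the sound code after the PPU updates keeps the time-critical VRAM writes inside vblank, while the music still advances exactly once per frame even if the game logic lags.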

Re: Game project help and progress thread
by za909 on 2015-05-05 (#146921)

By the way, if you are struggling with DPCM samples and the space loss they cause in the fixed bank, you can also try playing your code as a sound! Most of the time it's just random garbage (but you can still use it as a looped wind sound so your noise channel is free to do something else) but I found that an unrolled PLA STA $2007 loop conveniently produces a C note sound with a very wavetable-esque tone to it. I made a little happy loop with it in Famitracker to show what it sounds like but my engine could do it as well if I spent an hour or two with it. (If you are working with CHR-RAM you can also show animated tiles made of code, sometimes you can find a useful flashing tile or something like that)

Edit: As far as I know, Rockman 4 MI uses this method and plays some garbled sounds together with the noise channel.

Re: Game project help and progress thread
by Tsutarja on 2015-05-06 (#146959)

I made a list of sound engine variables that I need to use. Tell me if there are some important ones missing, considering the data type I'm using:

The variables with 'env' are used to keep track of the offset in the envelope stream.

Code:

sound_disable_flag .rs 1
current_song .rs 1
current_channel .rs 1
sq1_pattern .rs 1
sq1_stream .rs 1
sq1_vol .rs 1
sq1_vol_env .rs 1
sq1_pitch_env .rs 1
sq1_note_lo .rs 1
sq1_note_hi .rs 1
sq1_len_lo .rs 1
sq1_len_hi .rs 1
sq1_inst .rs 1
sq2_pattern .rs 1
sq2_stream .rs 1
sq2_vol .rs 1
sq2_vol_env .rs 1
sq2_pitch_env .rs 1
sq2_note_lo .rs 1
sq2_note_hi .rs 1
sq2_len_lo .rs 1
sq2_len_hi .rs 1
sq2_inst .rs 1
tri_pattern .rs 1
tri_stream .rs 1
tri_note_lo .rs 1
tri_note_hi .rs 1
tri_len_lo .rs 1
tri_len_hi .rs 1
noise_pattern .rs 1
noise_stream .rs 1
noise_note .rs 1
noise_len_lo .rs 1
noise_len_hi .rs 1
dmc_pattern .rs 1
dmc_stream .rs 1
dmc_sample .rs 1
dmc_len_lo .rs 1
dmc_len_hi .rs 1

Here is a short compilation of the current structure of the data format:

Code:

; Music Data

            ; Channels in track

MusicBoss:
 .dw BossSQ1,BossSQ2,BossTri,BossNoise,BossDMC

            ; Patterns in channel

BossSQ1:
 .dw BossSQ1_00,BossSQ1_00
 .dw BossSQ1_01,BossSQ1_02,BossSQ1_03,BossSQ1_04
 .dw BossSQ1_01,BossSQ1_02,BossSQ1_03,BossSQ1_04
 .dw BossSQ1_05,BossSQ1_06,BossSQ1_05,BossSQ1_07

BossDMC:
 .dw BossDMC_00,BossDMC_01
 .dw BossDMC_02,BossDMC_02,BossDMC_02,BossDMC_02
 .dw BossDMC_02,BossDMC_02,BossDMC_02,BossDMC_03
 .dw BossDMC_02,BossDMC_02,BossDMC_02,BossDMC_04

            ; Streams in Pattern

BossSQ1_00:
 .dw BossSQ1_00_Note,BossSQ1_00_Inst
 .dw BossSQ1_00_Len_Lo,BossSQ1_00_Len_Hi

BossDMC_00:
 .dw BossDMC_00_Sample,BossDMC_00_Len_Lo,BossDMC_00_Len_Hi

Code:

; Stream Data

            ; Square 1

BossSQ1_00_Note:
 .db B_2,As2,A_2,Gs2
 .db B_2,As2,A_2,Gs2
 .db B_2,As2,A_2,Gs2
 .db B_2,As2,A_2,Gs2

BossSQ1_00_Inst:
 .db SQInstrument0,SQInstrument0,SQInstrument0,SQInstrument0
 .db SQInstrument0,SQInstrument0,SQInstrument0,SQInstrument0
 .db SQInstrument0,SQInstrument0,SQInstrument0,SQInstrument0
 .db SQInstrument0,SQInstrument0,SQInstrument0,SQInstrument0

BossSQ1_00_Len_Lo:
 .db $06,$07,$06,$07
 .db $06,$07,$06,$07
 .db $06,$07,$06,$07
 .db $06,$07,$06,$07

BossSQ1_00_Len_Hi:
 .db $00,$00,$00,$00
 .db $00,$00,$00,$00
 .db $00,$00,$00,$00
 .db $00,$00,$00,$00

            ; DMC

BossDMC_00_Sample:
 .db Kick,Kick,Kick,Kick

BossDMC_00_Len_Lo:
 .db $1A,$1A,$1A,$1A

BossDMC_00_Len_Hi:
 .db $00,$00,$00,$00

Code:

; Instrument Data

SQInstrumentNull:
 .db SQ_Duty0,NoEnv,NoEnv

SQInstrument0:
 .db SQ_Duty1,SQVolEnv0,SQPtcEnv0

SQInstrument1:
 .db SQ_Duty1,SQVolEnv2,SQPtcEnv0

SQInstrument2:
 .db SQ_Duty1,SQVolEnv2,SQPtcEnv1

SQInstrument3:
 .db SQ_Duty1,SQVolEnv2,SQPtcEnv2

SQInstrument4:
 .db SQ_Duty1,SQVolEnv4,SQPtcEnv0

SQInstrument5:
 .db SQ_Duty1,SQVolEnv1,SQPtcEnv0

SQInstrument6:
 .db SQ_Duty2,SQVolEnv3,NoEnv

SQInstrument7:
 .db SQ_Duty2,SQVolEnv5,SQPtcEnv0

SQInstrument8:
 .db SQ_Duty0,SQVolEnv6,SQPtcEnv0


SQ_Duty0:
 .db %00110000         ; Duty 12.5%, Length Counter Halt, Constant Volume

SQ_Duty1:
 .db %01110000         ; Duty 25%, Length Counter Halt, Constant Volume

SQ_Duty2:
 .db %10110000         ; Duty 50%, Length Counter Halt, Constant Volume

SQ_Duty3:
 .db %11110000         ; Duty 75%, Length Counter Halt, Constant Volume

Noise_0:
 .db %00110000

Noise_Duty0:
 .db %00000000

Noise_Duty1:
 .db %10000000

 ; %xxx1 xxxx 0 = Add, 1 = Subtract (Pitch only) ($10)
 ; %xx1x LLLL Loop Flag, Subtract 'LLLL' from envelope stream counter  ($2L)
 ; %x1xx WWWW Wait 'WWWW' frames and move on to the next byte ($4W)
 ; %1xxx xxxx Halt Flag, Stop envelope update and leave last updated value ($80)

NoEnv:
 .db $80

            ; Square Volume Envelopes

SQVolEnv00:
 .db $07,$07,$07,$06,$06,$05,$80

SQVolEnv01:
 .db $02,$02,$02,$01,$80

SQVolEnv02:
 .db $06,$06,$07,$80

SQVolEnv03:
 .db $02,$02,$02,$00,$00,$01,$80

SQVolEnv04:
 .db $06,$06,$07,$01,$80

SQVolEnv05:
 .db $04,$04,$03,$01,$80

SQVolEnv06:
 .db $02,$02,$03,$01,$80

            ; Noise Volume Envelopes

NoiseVolEnv00:
 .db $05,$05,$03,$03,$01,$00,$80

NoiseVolEnv01:
 .db $03,$03,$02,$02,$01,$00,$80

            ; Square Pitch Envelopes

SQPtcEnv00:
 .db $43,$14,$00,$07,$00,$17,$00,$24

SQPtcEnv01:
 .db $46,$14,$00,$00,$07,$00,$00,$00,$17,$00,$00,$27

SQPtcEnv02:
 .db $4F,$4B,$14,$00,$00,$07,$00,$00,$00,$17,$00,$00,$27

Code:

; Note Effects

         ; Effects (Bit 7 indicates effect if it's set)

OctUp = $C0      ; %1100 0000
OctDwn = $A0      ; %1010 0000
Loop = $90      ; %1001 0000
Stop = $80      ; %1000 0000

Code:

; DMC Data

Kick = $00      ; DMC offsets
SnareLo = $01
SnareHi = $02
TomLo = $03
TomMed = $04
TomHi = $05

DMC_Pitch:
 .db $0E,$0C,$0D,$0C,$0D,$0E

DMC_Address:
 .db $FD,$FE,$FE,$FF,$FF,$FF

DMC_Length:
 .db $08,$30,$30,$20,$20,$20

Code:

; DMC Addresses

 .bank 3
 .org $FF40
 .incbin "KickLen$08.dmc"

 .org $FF80
 .incbin "SnareLen$30.dmc"

 .org $FFC0
 .incbin "TomLen$20.dmc"
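
For reference, here is a minimal sketch of how the three DMC tables above could be used to trigger a sample. The register meanings are standard: the $4012 value selects address $C000 + value × 64, and the $4013 value selects a length of value × 16 + 1 bytes. PlayDMC is a hypothetical routine name, and writing $0F/$1F to $4015 assumes the other four channels should stay enabled:

Code:

; X = sample index (Kick, SnareLo, ...)
PlayDMC:
   lda #$0F
   sta $4015              ; DMC bit clear: stop any sample that is still playing
   lda DMC_Pitch,x
   sta $4010              ; rate index; bits 7-6 clear = no IRQ, no loop
   lda DMC_Address,x
   sta $4012              ; start address = $C000 + value * 64
   lda DMC_Length,x
   sta $4013              ; length = value * 16 + 1 bytes
   lda #$1F
   sta $4015              ; DMC bit set: start playback
   rts

By those formulas, $FD in DMC_Address is $FF40 and a DMC_Length of $08 plays 129 bytes, so it's worth double-checking that each length matches the size of the .dmc file placed at that address and that nothing runs into the vectors at $FFFA.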

The post ended up being pretty long, even though I only put in about 10% of the music data.

Re: Game project help and progress thread
by za909 on 2015-05-06 (#147014)

Code:

BossSQ1_00_Inst:
 .db SQInstrument0,SQInstrument0,SQInstrument0,SQInstrument0
 .db SQInstrument0,SQInstrument0,SQInstrument0,SQInstrument0
 .db SQInstrument0,SQInstrument0,SQInstrument0,SQInstrument0
 .db SQInstrument0,SQInstrument0,SQInstrument0,SQInstrument0

BossSQ1_00_Len_Lo:
 .db $06,$07,$06,$07
 .db $06,$07,$06,$07
 .db $06,$07,$06,$07
 .db $06,$07,$06,$07

BossSQ1_00_Len_Hi:
 .db $00,$00,$00,$00
 .db $00,$00,$00,$00
 .db $00,$00,$00,$00
 .db $00,$00,$00,$00

This is the part that's probably going to cause you a huge space issue over time. I have implemented a "popular" note format for myself and I can't begin to tell you how much space it saves in comparison. It pretty much merges the Len_Lo, Len_Hi and Note data into a single byte.
Now, my version is not the most versatile out there because it doesn't support alternating speeds like you would do with Fxx Fxx±1 Fxx Fxx±1 in FamiTracker. You have to introduce a speed variable and octave variables for the 3 tonal channels for it. The data format looks like this:
LLLN NNNN - where L is your note length (L = 0 is reserved for note effects) and N is the positive offset from the current octave of the channel. So if, say, N = %00010 and your octave is set to $03, then it's going to find note ID 3×12 + 2. This limits you to accessing about two and a half octaves at a time, but you can also program a single-byte effect to raise or lower the octave by 2 to make it even more effective. When you are calculating the note ID, you can also add transposition or arpeggio values to find what you're really after at that time.
My engine does this: Octave×12 + N + Arpeggio + Channel transposition + Global transposition = Note ID

And I calculate the length in frames simply by taking the speed value and shifting it left L-1 times. It basically evaluates to a power-of-two number of rows in FamiTracker.
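
A sketch of decoding such a byte might look like the following. This is not the actual engine code; octave_x12, speed, note_id and note_len are hypothetical variables, arpeggio and transposition are left out, and the caller is assumed to have handled L = 0 (effects) already:

Code:

; A = note byte %LLLNNNNN with L > 0
DecodeNote:
   pha                    ; keep the whole byte for the length bits
   and #%00011111         ; N = offset from the channel's current octave
   clc
   adc octave_x12         ; assumed to hold octave * 12, precomputed
   sta note_id
   pla
   lsr a
   lsr a
   lsr a
   lsr a
   lsr a                  ; A = L, 1..7
   tax
   lda speed
.shift:
   dex
   beq .done              ; leave after L-1 shifts
   asl a                  ; double the length
   jmp .shift
.done:
   sta note_len           ; speed << (L-1), assumed to fit in 8 bits
   rts

Keeping the octave pre-multiplied by 12 avoids a multiplication on every note; the octave-change effects would just add or subtract 12 or 24 from that variable.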

Re: Game project help and progress thread
by Tsutarja on 2015-05-06 (#147015)

Instead of giving every note an instrument value separately, I could make it so that the instrument table only has the changes in the instruments and how many notes each instrument plays before changing to the next one.

This would be all for SQ1 in the boss theme I have already made.

Code:

BossSQ1InstTable:
 .db SQInstrument0
 .db SQInstrument1,SQInstrument2,SQInstrument3
 .db SQInstrument1,SQInstrument2,SQInstrument3
 .db SQInstrument4

BossSQ1InstLen:
 .db $10
 .db $02,$05,$01
 .db $02,$05,$01
 .db $40

I could also leave len_hi out and make a tied note effect. I didn't use len_hi in the boss theme at all, so it's just a bunch of $00 in there for no reason. The tied note effect would tell the sound engine to read the next len_lo byte without updating anything else, extending the duration of the note. I probably won't use len_hi a lot, so this would also save a lot of space.
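
The run-length instrument table (BossSQ1InstTable and BossSQ1InstLen) could be read with something like this sketch, called once at the start of every new note. sq1_inst_pos and sq1_inst_left are hypothetical variables that would both start at zero when the song begins, and the table entries are assumed to be instrument numbers that fit in a byte:

Code:

NextNoteInstrument:
   lda sq1_inst_left
   bne .same              ; current instrument still has notes left
   ldx sq1_inst_pos
   lda BossSQ1InstTable,x
   sta sq1_inst           ; switch to the next instrument
   lda BossSQ1InstLen,x
   sta sq1_inst_left      ; number of notes it will play
   inc sq1_inst_pos
.same:
   dec sq1_inst_left      ; count the note that is about to play
   rts

The sketch doesn't handle running past the end of the table or looping the song; both would need an extra check.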

Re: Game project help and progress thread
by tepples on 2015-05-07 (#147027)

za909 wrote:

I have implemented a "popular" note format for myself and I can't begin to tell you how much space it saves in comparison. It pretty much merges the Len_Lo, Len_Hi and Note data into a single byte.
[...]
You have to introduce a speed variable and octave variables for the 3 tonal channels for it. The data format looks like this:
LLLN NNNN - where L is your note length (L = 0 is reserved for note effects) and N is the positive offset from the current octave of the channel.

I do essentially the same thing in my own engine, except I roll the "current octave" into channel transposition, and I reserve N=27-31 for effects so that I can look up L from the table [1, 2, 3, 4, 6, 8, 12, 16] so that I get values between powers of two that commonly occur with dotted notes and the swung time that appears in a lot of my music. Then N=25 is tie (with L being additional length), N=26 is note cut (with L being length of rest), and N=0 to 24 are offsets from the channel transposition (which can be changed with transpose effects).
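
The table lookup for L could be sketched like this. It's only an illustration of the idea, not code from the engine itself; GetNoteLength and NoteLengths are made-up names:

Code:

; A = note byte %LLLNNNNN
GetNoteLength:
   lsr a
   lsr a
   lsr a
   lsr a
   lsr a                  ; A = L, 0..7
   tax
   lda NoteLengths,x      ; A = length for this L
   rts

NoteLengths:
 .db 1,2,3,4,6,8,12,16

Compared with shifting the speed value, the table costs 8 bytes but makes the in-between lengths (3, 6, 12) available.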

Re: Game project help and progress thread
by Tsutarja on 2015-06-26 (#149778)

I just recently got back to this project!
I tried to continue the sound engine, but I can't figure out a good way to run it (which is probably why I have not continued it). I need to get some addresses set up when a song or SFX is requested, and then run the engine itself in such a way that I can use the patterns and instruments. I'm pretty sure I posted the sound format in this topic already. That might be useful. Also, any changes to the sound format that would make running it easier are welcome too.

Then the other stuff!
I made some updates to the graphics (background made in YY-CHR and sprites added afterwards in Paint):
The top will be parallax scrolled, by the way.

I also had an idea where the boss of the stage that takes place in a forest sets the forest on fire on its appearance. The fire will use the same wavy effect that I used in the title screen. I may need to scrap the idea if it creates too much slowdown, though, but we'll see when I get there.

Re: Game project help and progress thread
by Tsutarja on 2015-07-14 (#150954)

Completely remade the player sprites. Instead of using edited Castlevania II Werewolf sprites, I only used them as a reference when I felt like I couldn't make some part look right. IMO, these look better for being the player, as the sprites are in a more upright posture than the enemy Werewolf in CV II.

Attachment:

KemonoPlayerSpr.png [3.61 KiB]

Re: Game project help and progress thread
by Tsutarja on 2015-12-14 (#160653)

After a long break, I came back to this project. I managed to get the sprite assembler one step closer to being fully functional. The sprite appears on screen, however, there are some odd things I can't figure out:
1. The sprite "breaks" when scrolled off screen
2. Flip masks don't work correctly (?), $06D0 = h_flip and $06E0 = v_flip
3. Sprite no. 1 is displayed regardless of the object sprite being set to #$00 instead of #$01 ($0690 in memory)
4. obj_sprite_count shows no changes in RAM search (used to define how many 8x8 sprites to process for the metasprite)

The .nes file is attached, so people can take a look at the problem. I also attached the .asm file that contains all the variables to make debugging easier. If there are any problems understanding where a specific variable is used, feel free to ask c:

Re: Game project help and progress thread
by Memblers on 2015-12-20 (#160965)

I haven't completely followed the previous pages of the thread, so hopefully I didn't miss some important background info.

But just from a quick view of the OAM memory, it looks like you're doing some kind of OAM cycling when the character is on the screen. When you scroll it off the screen, this OAM cycling totally breaks down. The character's sprites stay at the last OAM position used, and the X positions of all 64 sprites seem to gradually settle into a value that depends on the scroll position.

(edit: just noticed my reply is a week late, hope this little observation helps anyways)

Re: Game project help and progress thread
by Myask on 2015-12-23 (#161141)

Tsutarja wrote:

Like in Sonic 3 & Knuckles, Demon's Crest, or some others? (hmm, did the boss in Tails' Adventure do it?)

Re: Game project help and progress thread
by tokumaru on 2015-12-23 (#161144)

Myask wrote:

Like in Sonic 3 & Knuckles, Demon's Crest, or some others? (hmm, did the boss in Tails' Adventure do it?)

None of those games are on the NES though, so there's still merit in this idea.

Re: Game project help and progress thread
by Drew Sebastino on 2015-12-23 (#161145)

Well, now that Myask mentioned forest fires, I imagine a wavy effect like you described would take up pretty much all the CPU time if you weren't using a mapper to change scrolling, because I think it otherwise has to be timed like mid-scanline scrolling on the SNES. I really don't know much about the NES, but wouldn't it be possible to change out chrrom graphics for the BG every couple of frames to actually animate the fire without a special mapper?

tokumaru wrote:

Myask wrote:

Like in Sonic 3 & Knuckles, Demon's Crest, or some others? (hmm, did the boss in Tails' Adventure do it?)

None of those games are on the NES though, so there's still merit in this idea.

Yet another 16bit forest fire:

Re: Game project help and progress thread
by tokumaru on 2015-12-23 (#161149)

Espozo wrote:

Well, now that Myask mentioned forest fires, I imagine a wavy effect like you described would take up pretty much all the CPU time if you weren't using a mapper to change scrolling

Yes, without mapper IRQs the CPU will be pretty busy. Ideally you'd place the effect near the top of the screen, so you'd still have a good portion of the frame left for the game logic.

Quote:

wouldn't it be possible to change out chrrom graphics for the BG every couple of frames to actually animate the fire without a special mapper?

Well.. you need a special mapper for changing CHR-ROM anyway, so...

Games avoiding fancy raster effects would most likely do large fires by cycling the palette, which conveys the idea but doesn't look so cool.
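
A palette-cycling fire can be as simple as rotating three colors in a RAM copy of the palette every few frames. In this sketch, pal_buf is a hypothetical 32-byte buffer that some other code uploads to $3F00 during vblank, and colors 1-3 of the first background palette are assumed to be the fire colors:

Code:

; Rotate colors 1-3 of the first background palette by one step
CycleFire:
   ldx pal_buf+1          ; remember color 1
   lda pal_buf+2
   sta pal_buf+1          ; color 2 -> slot 1
   lda pal_buf+3
   sta pal_buf+2          ; color 3 -> slot 2
   stx pal_buf+3          ; old color 1 -> slot 3
   rts

Calling it every 4 to 8 frames rather than every frame usually reads better as flickering flames, but the right rate is a matter of taste.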

Quote:

Yet another 16bit forest fire:

I'm sure there are plenty of examples, this idea isn't exactly the most original idea ever. Characters jumping around in blocky worlds have also been done to death, but we're still doing that, aren't we? :wink:

Re: Game project help and progress thread
by dougeff on 2015-12-23 (#161151)

What about DMC music channel IRQs?

Re: Game project help and progress thread
by Drew Sebastino on 2015-12-23 (#161152)

tokumaru wrote:

Games avoiding fancy raster effects would most likely do large fires by cycling the palette, which conveys the idea but doesn't look so cool.

That's actually what's done in the example I showed.

tokumaru wrote:

Well.. you need a special mapper for changing CHR-ROM anyway, so...

Doesn't just about any NES game past 1987 change CHR-ROM for different level graphics and stuff like that? What would require a more advanced mapper, swapping out the graphics, or doing row scrolling?

Re: Game project help and progress thread
by tokumaru on 2015-12-23 (#161159)

Espozo wrote:

Doesn't just about any NES game past 1987 change CHR-ROM for different level graphics and stuff like that?

Most mappers with fine CHR-ROM swapping (finer than 4KB) have IRQs too, so there really isn't much to choose in terms of mappers.

Quote:

What would require a more advanced mapper, swapping out the graphics, or doing row scrolling?

Swapping graphics in small chunks would require an advanced mapper, while row scrolling can be done with no mapper if you're willing to dedicate the CPU time necessary. If not, you need IRQs, which are only available in advanced mappers.

Re: Game project help and progress thread
by tepples on 2015-12-23 (#161161)

Espozo wrote:

wouldn't it be possible to change out chrrom graphics for the BG every couple of frames to actually animate the fire without a special mapper?

Like the fire in the first stage of Teenage Mutant Ninja Turtles II: The Arcade Game, which uses plain old MMC3?

Re: Game project help and progress thread
by rainwarrior on 2015-12-23 (#161163)

tokumaru wrote:

Quote:

Yet another 16bit forest fire:

The original plans for my own in-progress game included a forest fire.

Eventually I moved the forest away from the volcano, though, as I revised my plans.

Re: Game project help and progress thread
by tokumaru on 2015-12-23 (#161164)

dougeff wrote:

What about DMC music channel IRQs?

They can be used for raster effects to some extent, but it's quite tricky. You'd think that if you started the same sample at the exact same time every time, that an IRQ would fire exactly the same number of cycles later every time, right? You'd be wrong. The APU has its own rhythm and as far as we know there are only 2 ways to use it for raster effects:

1- Have the IRQ fire a safe number of scanlines (a number larger than the maximum error) before a sprite 0 hit or sprite overflow that will be used to properly sync the CPU with the display.

2- Set a preliminary IRQ and measure how long it takes for it to fire (wasting CPU time), so you can measure the error and compensate for it (wasting more CPU time).
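
A very rough sketch of the first approach follows. The rate and length values are placeholders that would have to be found by experiment, ArmDmcIrq is a made-up name, and the code assumes the frame counter IRQ has been disabled, the other four channels should stay enabled, and sprite 0 is placed so that its hit happens after the IRQ's worst-case arrival:

Code:

; Call at a fixed point in the frame (e.g. at the end of NMI)
ArmDmcIrq:
   lda #$8F               ; bit 7 = IRQ enable, no loop, rate index $F (placeholder)
   sta $4010
   lda #$00
   sta $4012              ; sample address $C000; the data barely matters here
   lda #$01               ; placeholder: length = 1 * 16 + 1 = 17 bytes
   sta $4013
   lda #$1F
   sta $4015              ; start the sample; the IRQ fires when it runs out
   cli
   rts

IRQ:
   pha
   lda #$0F
   sta $4015              ; any write to $4015 acknowledges the DMC IRQ
.wait:
   bit $2002
   bvc .wait              ; spin until the sprite 0 hit flag (bit 6) is set
   ; ... timed scroll writes for the split go here ...
   pla
   rti

The IRQ only has to land somewhere in the safe window before the hit; the sprite 0 hit is what provides the exact timing, so the CPU is only stuck spinning for those few scanlines.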

Re: Game project help and progress thread
by rainwarrior on 2015-12-23 (#161170)

tokumaru wrote:

The APU has its own rhythm...

What do you mean by this? I think the APU is very deterministic?

The primary problem with it as an IRQ timer is that it has a very coarse resolution. You can only change the sample length in 128 sample (16 byte) increments, and there are only 16 available samplerates. This makes it rather difficult to choose a sample-length + speed combination that lands where you want it, but once you choose one it should fire in a rather predictable location. (edit: lidnariq explains why this won't work, below.)

Picking an arbitrary scanline would be difficult. (Probably would require a lookup table of sample length + speed + extra CPU wait time?)

Re: Game project help and progress thread
by lidnariq on 2015-12-23 (#161175)

Tokumaru seems to be referring to the fact that the bit phase of the DMC is in a semi-random state when you write to the registers, isn't changed by writing to the registers, and the only way you can know its value is to wait for an APU DMC interrupt.

Re: Game project help and progress thread
by rainwarrior on 2015-12-23 (#161180)

Isn't that only true if you're interrupting a running sample? If you let a sample finish (or reach the IRQ) it should be in a predictable state, shouldn't it? If you're using it for a raster split, presumably it's still empty from the previous frame.

You can also just set the sample length to 1, and wait enough cycles for the last byte to drain out, if you really need to.

Re: Game project help and progress thread
by lidnariq on 2015-12-23 (#161182)

No, the APU always runs 8 bits at a time... when there's no data, rather than sitting idle it clocks through 8 bits of "do nothing"

Re: Game project help and progress thread
by tokumaru on 2015-12-23 (#161200)

I won't pretend I know a lot about the internals of the APU, but from trying to use DMC IRQs for scanline timing I realized IRQs simply won't fire a constant amount of time after you set them up. They'll vary by several scanlines, and I can only assume this is because the APU has its own pipeline of operation, much like the PPU repeats the whole process of generating video frames over and over. Think about the PPU: if rendering is off and you turn it on, the PPU doesn't immediately start a new frame just because a register was written to... the PPU will finish the current frame, and only then it'll start a new one, because the chip was made that way. I can only assume that something similar happens inside the APU, it does things in a certain order and will only start playing samples at certain times.

rainwarrior wrote:

I think the APU is very deterministic?

Oh, I'm sure it's quite deterministic in its own isolated universe, but it runs in parallel with the PPU while not in sync with it in any way. The CPU can sync with the PPU through NMIs or the status register, and it can sync with the DMC channel by waiting for a DMC IRQ to fire, but in order to use APU IRQs for timing raster effects, you kinda have to sync the CPU to both of them, and that's the hard part.

Quote:

The primary problem with it as an IRQ timer is that it has a very coarse resolution.

I wish! If this was the case I would long ago have assembled a table with parameters (sample length, rate and padding, like you mentioned) for all 240 scanlines and forever forget the NES doesn't have a built-in scanline counter.

lidnariq wrote:

Exactly. You can sync with the APU by setting up a preliminary IRQ, and by counting how much time passes until it fires you can tell how "off" the APU is compared to how you wanted it to be. Once you're synced with the APU, you can then predict when subsequent interrupts will fire, but since those will be relative to the "wrong" time when the first interrupt fired, you have to compensate for this error using the difference you encountered when waiting for that first IRQ, in order to find the correct scanline. I never managed to get this working correctly though... I did get a stable enough split that would jitter by up to 4 tiles (good enough for simple raster effects), but every few seconds it would jump several tiles and I never found out why.

rainwarrior wrote:

Again, I wish!

lidnariq wrote:

No, the APU always runs 8 bits at a time... when there's no data, rather than sitting idle it clocks through 8 bits of "do nothing"

Thanks for clarifying that.