battletoads perfect if 258 scanlines, but slow/shaky if 262

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
by on (#91108)
this is a fun one... what could be causing battletoads to run perfectly if i emulate 258 scanlines, but become shaky and slow if i emulate the proper 262??

these short videos show what i mean:

257 scanlines - http://rubbermallet.org/bt1.avi
262 scanlines - http://rubbermallet.org/bt2.avi


i'd appreciate any help here. i'll post code if there's any part of it that you want to see. thanks again! :)

by on (#91109)
Are you making sure to emulate the penalty when a branch crosses a page?

by on (#91110)
yes, my CPU timings match 100% with the real 6502. all my branch ops have a line that increments the cycle count if ((oldpc & 0xFF00) != (pc & 0xFF00))

btw, i just noticed that the sound cuts in and out in those AVI files. the emu doesn't actually sound like that but i was recording the gameplay to video while in a class at college, and i was doing it on my netbook with an Atom N450. it was a bit too slow to play a game and encode 60 FPS XViD in realtime so it broke the audio. :P

EDIT: after playing the videos again on my desktop, the sound is fine. it was cutting out in playback via VLC on the Atom chip. those CPUs are garbage!
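One common trap with that page-cross check is which PC counts as "old": the offset is added to the address of the instruction after the branch (PC once the opcode and offset bytes have been fetched), so that address, not the branch opcode's own address, is what the target should be compared against. A minimal sketch of the extra-cycle rule, assuming the base cost of 2 cycles is charged elsewhere; branch_extra_cycles is a hypothetical helper, not the poster's code:

```c
#include <stdint.h>
#include <stdbool.h>

/* Extra cycles for a 6502 relative branch (the base 2 cycles are not
   included). next_pc is the address right after the offset byte, which
   is what the signed offset is added to. */
static int branch_extra_cycles(uint16_t next_pc, int8_t offset, bool taken,
                               uint16_t *new_pc) {
    if (!taken) {
        *new_pc = next_pc;
        return 0;                       /* not taken: no penalty */
    }
    uint16_t target = (uint16_t)(next_pc + offset);  /* wraps at 64 KB */
    *new_pc = target;
    /* +1 for a taken branch, +1 more if the target is on another page */
    return ((next_pc & 0xFF00) != (target & 0xFF00)) ? 2 : 1;
}
```

For example, a taken branch with next_pc = 0x10FE and offset +4 lands on 0x1102, crosses into page 0x11, and costs 2 extra cycles.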

by on (#91111)
Did you ever fix scrolling? Get that working first.

by on (#91112)
Dwedit wrote:
Did you ever fix scrolling? Get that working first.


yes, i did. all the games i try scroll properly now. are there any particularly picky games other than battletoads that i should run to double-check that?

for a long time, battletoads vertical scrolling was very jerky, but i fixed it a couple of days ago. i wasn't updating the fine vertical scroll on writes to PPU $2006; fixing that got it working correctly, and it fixed a couple of other games too.
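The reason a $2006 write touches vertical scroll is easiest to see in the usual "loopy" model of the PPU's internal registers: fine Y lives in bits 12-14 of the temporary address t, and the first $2006 write replaces bits 8-13 of t (so fine Y bits 12-13) and clears bit 14. An emulator that keeps fine Y in a separate variable (such as a yscroll field) has to update it on both $2005 and $2006 writes. A minimal sketch; scroll_t and the function names are hypothetical:

```c
#include <stdint.h>
#include <stdbool.h>

/* t and v: 15-bit addresses. Bits 0-4 coarse X, 5-9 coarse Y,
   10-11 nametable select, 12-14 fine Y. x: fine X. w: write toggle. */
typedef struct {
    uint16_t v, t;
    uint8_t x;
    bool w;
} scroll_t;

static void write_2005(scroll_t *s, uint8_t d) {
    if (!s->w) {            /* first write: X scroll */
        s->t = (uint16_t)((s->t & ~0x001F) | (d >> 3));
        s->x = d & 7;
    } else {                /* second write: Y scroll (fine Y and coarse Y) */
        s->t = (uint16_t)((s->t & ~0x73E0) | ((d & 0x07) << 12) | ((d & 0xF8) << 2));
    }
    s->w = !s->w;
}

static void write_2006(scroll_t *s, uint8_t d) {
    if (!s->w) {            /* first write: bits 8-13 of t, bit 14 cleared */
        s->t = (uint16_t)((s->t & 0x00FF) | ((d & 0x3F) << 8));
    } else {                /* second write: low byte of t, then v = t */
        s->t = (uint16_t)((s->t & 0xFF00) | d);
        s->v = s->t;
    }
    s->w = !s->w;
}

/* Reading $2002 resets the toggle shared by $2005 and $2006. */
static void read_2002_toggle(scroll_t *s) { s->w = false; }
```

Writing $3D then $F0 to $2006 leaves v = 0x3DF0, whose fine Y (bits 12-14) is 3, even though $2005 was never written.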

by on (#91115)
Well, probably...

a) your frame is taking more than 29780 CPU cycles, as a result of a wrong calculation or a bad timestamp system;

b) you have a problem clearing the sprite #0 hit (bit $40 of $2002);

c) your PPU/CPU alignment suffers from bad synchronization.
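The 29780 figure follows from NTSC timing: 262 scanlines (240 visible, 1 post-render, 20 vblank, 1 pre-render) of 341 PPU dots each, at 3 PPU dots per CPU cycle, with one dot skipped on odd frames while rendering is enabled. A minimal sketch of that arithmetic; the helper names are made up for illustration:

```c
/* NTSC NES timing: 341 PPU dots per scanline, 3 PPU dots per CPU cycle. */
enum { DOTS_PER_LINE = 341, DOTS_PER_CPU_CYCLE = 3 };

/* PPU dots in a frame of `lines` scanlines. short_frame models an odd
   frame with rendering enabled, where one pre-render dot is skipped. */
static long frame_dots(int lines, int short_frame) {
    return (long)lines * DOTS_PER_LINE - (short_frame ? 1 : 0);
}

/* CPU cycles in that frame; generally not a whole number. */
static double frame_cpu_cycles(int lines, int short_frame) {
    return (double)frame_dots(lines, short_frame) / DOTS_PER_CPU_CYCLE;
}
```

frame_cpu_cycles(262, 0) is about 29780.67 and frame_cpu_cycles(262, 1) about 29780.33, so an even/odd pair averages 29780.5 cycles. A 258-line frame is exactly 29326 cycles, about 455 fewer.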

by on (#91117)
Is it missing the sprite 0 hit every other frame?

by on (#91119)
Yeah, the sprite 0 hit flag only resets at the end of vblank, around there. Check the tech docs for the exact time.
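The flag behavior discussed here, as a minimal sketch with hypothetical names: a CPU read of $2002 clears only the vblank flag (and the $2005/$2006 write toggle), while sprite 0 hit and sprite overflow stay set until dot 1 of the pre-render line, where all three are cleared whether or not rendering is enabled. Because sprite 0 hit is sticky, games commonly wait for bit 6 to go clear before waiting for it to become set, so a flag that is cleared late, or not at all, shifts where the game thinks the split is.

```c
#include <stdint.h>
#include <stdbool.h>

/* PPUSTATUS ($2002) flag state. */
typedef struct {
    bool vblank;   /* bit 7 */
    bool sprzero;  /* bit 6: sprite 0 hit */
    bool sprover;  /* bit 5: sprite overflow */
    bool w;        /* $2005/$2006 write toggle */
} status_t;

/* CPU read of $2002: report the flags (low 5 bits are PPU open bus),
   then clear only vblank and the write toggle. */
static uint8_t read_2002(status_t *s, uint8_t open_bus) {
    uint8_t r = (uint8_t)((s->vblank ? 0x80 : 0) | (s->sprzero ? 0x40 : 0) |
                          (s->sprover ? 0x20 : 0) | (open_bus & 0x1F));
    s->vblank = false;
    s->w = false;
    return r;
}

/* Dot 1 of the pre-render scanline: all three flags clear,
   independent of the $2001 rendering bits. */
static void prerender_dot1(status_t *s) {
    s->vblank = s->sprzero = s->sprover = false;
}
```

Reading $2002 twice with vblank and sprite 0 hit set returns 0xC0 and then 0x40; only after prerender_dot1() does bit 6 read back as 0.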

by on (#91122)
i've been working on it a little, and it's full speed now. the framerate is correct with 262 scanlines, but my vertical scroll is jerky. it kind of looks like sprite 0 is being hit in the wrong places on some frames. this is how it looks now on 262 scanlines:

http://rubbermallet.org/bt3.avi (25 MB)


this is what my code looks like for rendering each frame. should i even be manually clearing vblank in the pre-render scanline? it doesn't appear to make a difference either way. i'd imagine it doesn't since whenever a game reads the status register, it gets cleared anyway.

Code:
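   /* pre-render scanline: this branch only runs when the background is enabled */
   if (PPU->bgvisible) {
      exec6502(86);                 /* about 258 PPU dots into the line */
      PPU->addr = PPU->tempaddr;    /* reload VRAM address from the temp address (t -> v) */
      PPU->vblank = 0;              /* clear the $2002 flags */
      PPU->sprzero = 0;
      PPU->sprover = 0;
      if (totalframes&1) exec6502(27);  /* odd frames: 113 cycles for this line */
      else exec6502(28);                /* even frames: 114 cycles */
   } else exec6502(114);
   /* lines 0-239 visible, 240 post-render, 241-260 vblank */
   for (scanline=0; scanline<261; scanline++) {
      exec6502(86);                 /* about 258 PPU dots into the line */

      if (scanline <240) {
         /* reload coarse X and the horizontal nametable bit from t,
            then OR fine Y from yscroll into bits 12-14 */
         if (PPU->bgvisible) PPU->addr = (PPU->addr & 0xFBE0) | (PPU->tempaddr & 0x041F) | ((PPU->yscroll & 7) << 12);
         renderscanline(scanline);

         /* MMC3: clock the scanline counter once per visible line while rendering */
         if (cartridge.mapper == 4) {
            if (PPU->bgvisible || PPU->sprvisible) {
               map4irqdecrement();
               if ((map4->irqenable) && (map4->irqcounter==0)) irq6502();
            }
         }
      }
      /* rest of the line: 28 more cycles on every third line, 27 otherwise */
      if ((scanline%3)==0) exec6502(28);
      else exec6502(27);

      /* set vblank and fire the NMI once line 241 has finished running */
      if (scanline == 241) {
         PPU->vblank = 1;
         if (PPU->nmivblank) nmi6502();
      }
   }
   totalframes++;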
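For comparison, a minimal sketch of the same frame structure with three timing details changed. As posted, the loop above (a) clears vblank, sprite 0 hit and overflow only when the background is enabled, whereas the PPU clears them at dot 1 of the pre-render line regardless of rendering; (b) raises vblank and the NMI only after all of line 241 has run, rather than at its dot 1; and (c) gives 114 cycles to one line in three, which by a straight count of the exec6502() calls adds up to about 29693 cycles per frame rather than about 29780. The sketch keeps time in PPU dots and converts to CPU cycles with a carried remainder, so 341-dot lines average 113 2/3 cycles. ppu_t, run_dots, dot_debt and the stubbed exec6502/nmi6502 are hypothetical stand-ins, not the poster's code; rendering, the t -> v copies and the MMC3 counter are left out.

```c
#include <stdbool.h>

/* PPU state; field names loosely follow the posted code. */
typedef struct {
    bool vblank, sprzero, sprover;  /* $2002 bits 7, 6, 5 */
    bool bgvisible, sprvisible;     /* $2001 bits 3 and 4 */
    bool nmivblank;                 /* $2000 bit 7 */
} ppu_t;

static ppu_t ppu;
static long cpu_cycles;  /* CPU cycles handed to the stub below */
static int nmi_count;

/* Stubs standing in for the real CPU core. */
static void exec6502(int cycles) { cpu_cycles += cycles; }
static void nmi6502(void) { nmi_count++; }

static int dot_debt;  /* PPU dots not yet turned into CPU cycles (0..2) */

/* Run the CPU for `dots` PPU dots' worth of time (3 dots per CPU cycle),
   carrying the remainder into the next call. */
static void run_dots(int dots) {
    dot_debt += dots;
    exec6502(dot_debt / 3);
    dot_debt %= 3;
}

static void run_frame(long frame) {
    /* simplification: rendering is sampled once per frame */
    bool rendering = ppu.bgvisible || ppu.sprvisible;

    /* Pre-render line (261): flags clear at dot 1 even with rendering off. */
    run_dots(1);
    ppu.vblank = ppu.sprzero = ppu.sprover = false;
    /* With rendering on, odd frames drop one dot from this line. */
    run_dots(((frame & 1) && rendering) ? 339 : 340);

    for (int scanline = 0; scanline < 261; scanline++) {
        if (scanline == 241) {
            /* vblank and NMI at dot 1 of line 241, not at the end of it */
            run_dots(1);
            ppu.vblank = true;
            if (ppu.nmivblank) nmi6502();
            run_dots(340);
        } else {
            /* lines 0-239: rendering and the MMC3 counter would go here */
            run_dots(341);
        }
    }
}
```

With rendering enabled, run_frame(0) followed by run_frame(1) runs 29780 and then 29781 CPU cycles, 59561 in total. Separately, the 0xFBE0 mask in the posted code keeps bits 12-14 of addr, so OR-ing yscroll into them can set those bits but never clear them; whether that matters depends on how renderscanline() advances addr.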

by on (#91124)
I believe that the Vblank flag gets cleared at the prerender scanline.

by on (#91126)
I still insist that you should count the number of CPU cycles actually executed in a frame (between two consecutive VBlanks, for example). You keep hammering on "262 scanlines", but did you count the number of CPU cycles?

Personally, though, I disagree with the run6502(num_of_cycles) approach.

by on (#91130)
Zepper wrote:
I still insist that you should count the number of CPU cycles actually executed in a frame (between two consecutive VBlanks, for example). You keep hammering on "262 scanlines", but did you count the number of CPU cycles?

Personally, though, I disagree with the run6502(num_of_cycles) approach.


i did, and it does come out to ~29780 cycles. some frames may be 1 or 3 cycles off depending on which instructions are running, since instructions take 2 or more cycles and it can't always stop exactly on the target.

my exec6502 function also keeps track of how many cycles it was supposed to run and how many actually ran, so it can make up the difference on the next call.
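A minimal sketch of what that bookkeeping might look like, assuming a runner that executes whole instructions until its budget is used up and carries any overshoot into the next call. step_instruction is a hypothetical stub returning costs of 2 to 7 cycles, like the official 6502 opcodes; a real core would decode an opcode there, and interrupt entry and DMA stalls would need to go through the same counter.

```c
/* Instructions can't be split, so a call may overshoot its budget by up
   to 6 cycles; the overshoot is carried and taken out of the next call. */
static long owed;       /* requested minus executed; in [-6, 0] after each call */
static long total_run;  /* all cycles executed so far */

/* Hypothetical stand-in for decoding and executing one instruction. */
static int step_instruction(void) {
    static unsigned seq;
    return 2 + (int)(seq++ % 6);   /* 2..7 cycles */
}

static void exec6502(long cycles) {
    owed += cycles;
    while (owed > 0) {
        int c = step_instruction();
        owed -= c;
        total_run += c;
    }
}
```

Each call may run a few cycles long, but across a frame's worth of calls the running total never drifts more than one instruction past the requested count, which matches the "1 or 3 cycles off" behavior described above.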

by on (#91134)
Zepper, how do you handle the CPU execution if you don't use what i'm using?

by on (#91140)
miker00lz wrote:
Some frames may be 1 or 3 cycles off depending on which instructions are running, since instructions take 2 or more cycles and it can't always stop exactly on the target.


Yup, that's normal.

miker00lz wrote:
Zepper, how do you handle the CPU execution if you don't use what i'm using?


The PPU is clocked on each CPU memory access (read or fetch), so the main loop is just an infinite CPU loop. Inside it, there's a check for pending IRQ/NMI interrupts or user_requested_int (quitting the emulation, for example).

I don't see your method as "wrong", but make sure you handle the PPU events/requests at the right time.
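A minimal sketch of that style, assuming NTSC's 3 PPU dots per CPU cycle and leaving out the odd-frame dot skip: instead of running the CPU for a batch of cycles per scanline, the PPU is stepped from inside the bus access functions, so every $2002 read sees the PPU within one CPU cycle of where it should be. ppu_t, ppu_tick, cpu_read and cpu_write are hypothetical names, and only internal RAM is mapped.

```c
#include <stdint.h>
#include <stdbool.h>

/* NTSC PPU position: 341 dots x 262 lines; scanline 261 is pre-render. */
typedef struct {
    int dot, scanline;
    bool vblank, nmi_enable, nmi_pending;
} ppu_t;

static ppu_t ppu = { 0, 261, false, false, false };
static uint8_t ram[0x800];  /* 2 KB internal RAM, mirrored */

/* Advance the PPU by one dot; raise/clear vblank at dot 1. */
static void ppu_tick(void) {
    if (++ppu.dot == 341) {
        ppu.dot = 0;
        if (++ppu.scanline == 262) ppu.scanline = 0;
    }
    if (ppu.dot == 1 && ppu.scanline == 241) {
        ppu.vblank = true;
        if (ppu.nmi_enable) ppu.nmi_pending = true;  /* the CPU loop polls this */
    }
    if (ppu.dot == 1 && ppu.scanline == 261) ppu.vblank = false;
}

/* Each CPU bus access is one CPU cycle, i.e. 3 PPU dots on NTSC,
   so the PPU is advanced before the access is performed. */
static uint8_t cpu_read(uint16_t addr) {
    ppu_tick(); ppu_tick(); ppu_tick();
    return ram[addr & 0x7FF];
}

static void cpu_write(uint16_t addr, uint8_t val) {
    ppu_tick(); ppu_tick(); ppu_tick();
    ram[addr & 0x7FF] = val;
}
```

Starting from the pre-render line, 27507 accesses leave the PPU at line 240, dot 340 with vblank still clear; the next access crosses dot 1 of line 241 and sets it.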

by on (#91145)
The difference between 258 and 262 lines is 455 cycles, which is just short of one 513 to 514 cycle OAM DMA. How are you accounting for this time?
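A minimal sketch of charging the OAM DMA stall. If a $4014 handler copies the 256 bytes instantly without adding these cycles, the CPU gets about 513 cycles per frame for free, which is in the same range as the ~455-cycle difference between 258 and 262 lines. cpu_t, oam_dma_cycles and write_4014 are hypothetical names, and the odd-cycle parity convention is an assumption that depends on how the emulator numbers its cycles.

```c
/* CPU state reduced to a running cycle count. */
typedef struct { unsigned long cycles; } cpu_t;

/* OAM DMA stalls the CPU for 513 cycles, or 514 when the $4014 write
   lands on an odd cycle (parity convention assumed, see above). */
static int oam_dma_cycles(unsigned long cycle_at_write) {
    return 513 + (int)(cycle_at_write & 1);
}

/* Adding the stall to the CPU's cycle count means a budget-based runner
   like exec6502() sees that time as used instead of getting it for free.
   The 256-byte copy into OAM is omitted. */
static void write_4014(cpu_t *cpu) {
    cpu->cycles += (unsigned long)oam_dma_cycles(cpu->cycles);
}
```

A DMA triggered at cycle 1000 advances the count to 1513; one triggered at cycle 1001 advances it to 1515.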

by on (#91149)
tepples wrote:
The difference between 258 and 262 lines is 455 cycles, which is just short of one 513 to 514 cycle OAM DMA. How are you accounting for this time?


Good point.