Sprite #0 - NESdev BBS

Sprite #0
by WedNESday on 2005-09-25 (#4899)

I am finding it hard to get my Sprite #0 hit detection to work.

Here are the circumstances under which I set bit 6 of $2002:

if( Sprite == 0 && Pixel != Transparent )
SetFlag;

If the collision is behind the background then I have no problem. But if the sprite is in the foreground (like the 2nd NT of Super Mario) then there seems to be an error. NEStress gives me an error for my code. Anything else that I should know?

by Anes on 2005-09-25 (#4901)

i have problems too, but you should do:

Code:

if( Sprite == 0 && Pixel != Transparent && Background != Transparent)
SetFlag;

by WedNESday on 2005-09-25 (#4904)

Yeah, I am already doing that, but for some unknown reason collision detection is a bit buggy. Games that seem to use a sprite #0 behind the BG are OK, but the other way around seems to fail. I'll keep trying.

by Disch on 2005-09-25 (#4905)

The sprite 0 hit flag is raised when a non-transparent BG pixel "hits" a non-transparent sprite 0 pixel. The logic ANes provided gives the general idea.

As for timing it... you must know that each scanline consists of 341 PPU cycles... the first 256 of which render pixels on the scanline. So if sprite 0 hit happens on, say, pixel 120 of the scanline... then $2002.6 is raised on PPU cycle 120 of the scanline (on an NTSC system, this would translate to CPU cycle 40 of the scanline -- 113.6666667 CPU cycles per scanline).

Rounding that time off to the start or end of the scanline might cause the timing to go off in some games.
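A minimal sketch of that conversion, assuming (as in the example above) that the flag goes up on the PPU cycle equal to the hit pixel's X position, and that the CPU can first see it on the first CPU cycle at or after that point. The helper names are hypothetical, not taken from any emulator.

```c
/* NTSC: 3 PPU cycles per CPU cycle, 341 PPU cycles per scanline. */
enum { PPU_PER_CPU = 3, PPU_PER_LINE = 341 };

/* PPU cycle, counted from the start of scanline 0, at which a hit on
   pixel x of the given scanline raises $2002 bit 6 (simplified model). */
long hit_ppu_cycle(int line, int x)
{
    return (long)line * PPU_PER_LINE + x;
}

/* First CPU cycle at or after the given PPU cycle (rounds up). */
long ppu_to_cpu_cycle(long ppu_cycle)
{
    return (ppu_cycle + PPU_PER_CPU - 1) / PPU_PER_CPU;
}
```

With these, a hit on pixel 120 of scanline 0 maps to CPU cycle 40, matching the figure above.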

by WedNESday on 2005-09-25 (#4907)

Because WedNESday renders a scanline and THEN goes back and draws the sprites, I suppose that this may have an effect on certain games. Would it explain why NEStress gives me an error then?

by Disch on 2005-09-25 (#4910)

How you render doesn't make any difference at all. What matters is that the game gets the flag set when it expects it. I'm not exactly sure what triggers a NEStress error... so I can't really help you with that.

by Anes on 2005-09-25 (#4912)

Be careful with NEStress. When I was starting out I assumed NEStress was very reliable, but it is not. It is fine as a starting point for CPU and PPU emulation, but I remember NEStress constantly reporting that my ADC overflow operation was OK while some games had errors because it wasn't OK at all.
Just a piece of advice.

What Disch means by "when the game expects it" is that you can calculate how many CPU cycles have to pass before the sprite hit flag goes up. If your emulator's PPU timing is correct, or at least nearly exact, calculating the number of cycles until the hit flag is set works well.

But you mentioned before that you are not interested in Brad Taylor's "2C02 Reference", which is a crucial doc for a good understanding of the NES architecture (at least at a functional level, which is fine for emu dev) and for good emulation (although its MMC3 section is bad).
Understanding that doc will clear up a lot of your doubts (which are already explained there).
I'm telling you this because when I was starting my project a lot of things were unclear to me. So I speak from my experience, and thanks to other people who helped me here on the forum.

EOP

by WedNESday on 2005-09-26 (#4924)

After doing some testing, FCE Ultra is the only emulator that seems to get an ok from NEStress, even though there seems to be no issues with other emulators and their hit detection (i.e. games that use it play just fine).

Let's just clarify one thing. If a sprite's pixel (i.e. 1-3 in the palette indices (NOT including att colour?)) is also on the same pixel as a background pixel (i.e. !Transparent) then the hit flag is set. Is this correct?

This is now the only graphical issue I have left with my emulator and I am now very keen to get onto memory mapping and other things. So please help!!!

by Disch on 2005-09-26 (#4925)

The flag is set when a non-transparent BG pixel is to be rendered in the same space as a non-transparent sprite-0 pixel. Given the following graphics:

Code:
sprite:     background:     Sprite 0 would hit on this pixel:
....xxxx    xxxx....        ........
....xxxx    xxx.....        ........
....xxxx    xx......        ........
....xxxx    x.......        ........
xxxxxxxx    ........        ........
xxxxxxxx    ...x....        ...X....
xxxxxxxx    ..xxx...        ........
xxxxxxxx    .xxxxx..        ........

('x' denotes a non-transparent pixel, '.' denotes a transparent pixel)

Sprite 0's background/foreground priority does NOT matter. It can be rendered in front or behind the background.... it will not affect the hit flag at all. Attribute data and contents of the palette also do not matter.
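As a rough sketch of that rule, the search below scans two 8x8 patterns in rendering order (top to bottom, left to right) and reports the first pixel where both are non-transparent. The string representation ('x' opaque, '.' transparent) mirrors the diagram above, and the function name is made up for illustration. Priority, attributes and palette contents deliberately play no part.

```c
/* Finds the first pixel, in rendering order, where both the sprite 0
   pattern and the background pattern are non-transparent.
   Returns 1 and fills *hit_x / *hit_y on a hit, 0 if they never overlap. */
int first_overlap(const char *sprite[], const char *bg[],
                  int *hit_x, int *hit_y)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            if (sprite[y][x] == 'x' && bg[y][x] == 'x') {
                *hit_x = x;
                *hit_y = y;
                return 1;
            }
        }
    }
    return 0;
}
```

Fed the two patterns from the diagram, this reports the pixel at column 3 of the sixth row, the one marked 'X'.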

by tepples on 2005-09-27 (#4965)

Don't set it if the only overlap occurs with x=255. This breaks RTC Demo at the raster bars but that's broken on HW as well.

Don't set it if the only overlap occurs with x in [0..7] if sprites or the BG is masked.
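Putting these two exceptions together with the basic rule, a per-pixel check might look like the sketch below. The struct and function names are hypothetical. The flags correspond to the $2001 bits for BG/sprite enable and for showing BG/sprites in the leftmost 8 pixels, and pixel values are the 2-bit pattern indices where 0 means transparent.

```c
#include <stdbool.h>

/* Rendering switches taken from $2001 (names are illustrative). */
typedef struct {
    bool bg_enabled;      /* $2001 bit 3: background rendering on */
    bool spr_enabled;     /* $2001 bit 4: sprite rendering on */
    bool bg_left_shown;   /* $2001 bit 1: BG shown in x = 0..7 */
    bool spr_left_shown;  /* $2001 bit 2: sprites shown in x = 0..7 */
} PpuMask;

/* Returns true if this pixel should raise the sprite 0 hit flag. */
bool sprite0_hits(const PpuMask *m, int x, bool is_sprite0,
                  int spr_pixel, int bg_pixel)
{
    if (!is_sprite0)
        return false;
    if (!m->bg_enabled || !m->spr_enabled)
        return false;
    /* No hit in the left column if either layer is clipped there. */
    if (x < 8 && (!m->bg_left_shown || !m->spr_left_shown))
        return false;
    /* No hit at the rightmost pixel. */
    if (x == 255)
        return false;
    /* Both pixels must be non-transparent; priority is irrelevant. */
    return spr_pixel != 0 && bg_pixel != 0;
}
```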

by WedNESday on 2005-10-04 (#5151)

Here is the code that sets the hit detection flag.

if( !Transparent )
if( Background && Sprites == Visible )
if( SpriteNo. == 0 )
if( !( !( Sprite Clipping && Spritex < 8 ) )
if( Spritex != 255 )
if( BgPixel != Transparent )

SetFlag

(Phew...)

Can anyone see what is wrong with this code?

by Anes on 2005-10-04 (#5153)

First: DON'T TRUST NESTRESS. You should only trust games that need sprite 0 hit, if they work OK (SMB, Castlevania, etc.). It's difficult (I know) to have an accurate engine for all games.

Second:
This can help with your scanline-based engine:

- ONE scanline takes 341 / 3 = 113.67~ CPU cycles ("~" means approx.)
- You have to take into account how many CPU cycles a scanline takes to be rendered. It should be 256 PPU cycles / 3 = 85.33~ CPU cycles.
- And (341 - 256) PPU cycles / 3 = 28.33~ CPU cycles. You should do NOTHING during these, since it is HBLANK.

Third:
Instead of setting the hit flag while the PPU is rendering, you can calculate how many cycles it will take until it is "set". This can be done this way:

- There are 341/3 CPU cycles per scanline
- There are 240 scanlines rendered (including the dummy one)
- Say you know sprite zero's Y position is scanline 17 (scanlines going from 0 to 240), but its first non-transparent pixels are 2 rows down, which would be scanline 19. You also have to know that the Y position is programmed as one less than the scanline the sprite will appear on, so sprite 0's pixels will render on scanline 20! Now you can do something like this:

20 scanlines * 341 PPU cycles = 6820 PPU cycles pass from the start of scanline 0 until the sprite hit flag is set, which equals 6820 PPU cycles / 3 = 2273.33~ CPU cycles. Once your CPU cycle counter is >= that value you should set the $2002.6 flag.

Please correct me if I'm wrong.

I know it's not the most accurate way of calculating sprite zero hit, but you can take it as a beginning.
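The estimate above could be sketched like this. It assumes the sprite's rows are given as one byte per row with a bit set for every non-transparent pixel (for NES tiles that would be the two bitplanes ORed together), and it ignores the background and the X position just as the calculation above does. Both function names are hypothetical.

```c
#include <stdint.h>

/* opaque[r] has a bit set for each non-transparent pixel of sprite row r.
   Returns the first row containing any such pixel, or -1 if none. */
int first_opaque_row(const uint8_t opaque[8])
{
    for (int r = 0; r < 8; r++)
        if (opaque[r] != 0)
            return r;
    return -1;
}

/* Earliest CPU cycle (counted from the start of scanline 0) at which
   sprite 0 could hit, rounded up. Returns -1 for a fully transparent
   sprite. The sprite appears one scanline below its OAM Y value. */
long earliest_hit_cpu_cycle(int oam_y, const uint8_t opaque[8])
{
    int row = first_opaque_row(opaque);
    if (row < 0)
        return -1;
    long line = (long)oam_y + 1 + row;
    return (line * 341 + 2) / 3;
}
```

For Y = 17 and a sprite whose first two rows are empty, this gives scanline 20 and CPU cycle 2274, i.e. 2273.33 rounded up.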

by WedNESday on 2005-10-04 (#5154)

I know that people say that you should not trust NEStress and all that. One emulator DOES actually pass the test in NEStress and that is FCE Ultra. So it is possible to get a pass. WedNESday does not process graphics in the way that you have described.

Here is how it works for WedNESday;

(Scanline for Scanline basis)

Allow 113 CC of CPU time;
Draw Background;
Go back and Draw Sprites

I know that this method is unconventional but it is very simple to emulate. Could this be affecting the hit detection?

by Disch on 2005-10-04 (#5155)

WedNESday wrote:
Allow 113 CC of CPU time;
Draw Background;
Go back and Draw Sprites

I know that this method is unconventional but it is very simple to emulate. Could this be affecting the hit detection?

If you're setting the sprite 0 hit flag when you draw the sprites (according to that layout)... then yes.

Say, for example, Sprite 0 hit happens on pixel 90 of the scanline. Normally this will mean that the Sprite 0 hit flag will be raised on PPU cycle 90 of the scanline (90/3 = CPU cycle 30). If you're running the CPU for a full scanline before looking for sprite 0 hit.. then the flag will be raised after the scanline is complete... on CPU cycle 113 of the scanline... 83 CPU cycles later than it should.

by blargg on 2005-10-04 (#5157)

I just polished my sprite 0 hit test ROM and verified each result code by breaking my emulator in that manner. It currently only checks with an accuracy of 4 CPU clocks (12 PPU clocks). It tests time it's cleared each frame, time it's set at upper-left corner, time for each PPU pixel, and time for each PPU scanline. This should find most problems with the timing of it.

sprite_hit_timing.zip

I've improved my test framework so it now displays a bit more text on screen. The result codes are listed at the beginning of the asm source code included.

by WedNESday on 2005-10-04 (#5160)

Ok, I want to implement some form of accurate timing on my emulator. However, I don't want to rewrite my CPU for cycle-for-cycle timing. Can I do it this way?

Opcode == 0xA9 (LDA Immediate)

Do Operation...
Clock Cycles = Clock Cycles + 2;

DrawPixel( NumberofPixels ) (NumberofPixels == 2 * 3)
{
...
}

Obviously, this would increase the accuracy of my emulator. But would it be enough for hit detection?

by blargg on 2005-10-04 (#5161)

That would work, though it doesn't match the simplicity and speed of the method described in the thread "timing... (attn: disch)".

The basic idea is very simple: whenever the CPU is just about to do something that might affect PPU rendering, first run the PPU until that time, then carry out the read/write. The only requirement is that the CPU keep track of how many clocks it's executed and make this available when reading and writing I/O memory locations. With this scheme you don't constantly run the PPU every instruction, so it's quite fast.

Code:
void run_ppu( long ppu_time )
{
    ...
}

void write_ppu( long ppu_time, int addr, int data )
{
    run_ppu( ppu_time ); // catch the PPU up to the current time first
    switch ( addr & 0x2007 )
    {
    case 0x2000:
        ...

    case 0x2005:
        ...
    }
}

long cpu_time;
long cpu_end; // CPU will run until or just after this time

void write_memory( int addr, int data )
{
    if ( (addr & 0xe000) == 0x2000 ) // PPU registers, mirrored through $3FFF
        write_ppu( cpu_time * 3, addr, data ); // 3 PPU clocks per CPU clock
    ...
}

void stop_cpu()
{
    cpu_end = 0; // stop CPU execution after current instruction
}

void run_cpu()
{
    while ( cpu_time < cpu_end )
    {
        int opcode = read_memory( pc++ );
        cpu_time += timing_table [opcode];
        switch ( opcode )
        {
        case 0x8D: { // STA abs
            int addr = read_memory( pc + 1 ) * 0x100 +
                    read_memory( pc );
            pc += 2;
            write_memory( addr, a );
            break;
        }
        ...
        }
    }
}

by Disch on 2005-10-04 (#5162)

What you're thinking looks a lot like a pixel-accurate renderer.. although catching up after every instruction would be slow.

One way to go would be to implement a pixel-by-pixel PPU as I describe in this thread... although that would likely require many significant changes.

A good alternative would be to predict the cycle at which sprite 0 hit will occur... and on $2002 reads, see if the CPU is before or after that cycle... if at or after, you would set the sprite0 hit flag without having to do any PPU emulation.

You could predict by rendering sprite 0 into a temporary buffer... then rendering the BG tiles on top of it (up to 6 BG tiles will need to be drawn -- if sprite 0 is 8x16 it can be over at most 6 tiles)... to see where they'd first collide and get the timestamp from that. However you'd need to re-predict every time the circumstances change (CHR swapped, CHR-RAM written to, Sprite/BG enable/disable change, scroll change, etc, etc.. anything that could affect when sprite 0 hit will happen).

Rather than re-predict every time those things change (since they change all the freaking time), you could raise a "NeedRepredict" flag when those things change... and re-predict only on $2002 reads if the NeedRepredict flag is set (of course clearing it after you predict).
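A minimal sketch of that lazy re-prediction, with made-up names throughout. The expensive part (rendering sprite 0 against the BG tiles to find the hit cycle) is replaced here by a stub that returns a fixed cycle and counts how often it runs. The sketch also leaves out clearing the flag at the start of each frame.

```c
#include <stdbool.h>
#include <stdint.h>

#define NO_HIT (-1L)

/* Stand-in for the real sprite-0-vs-BG overlap search (hypothetical). */
static int  predict_calls;
static long stub_hit_cycle = 2274;
static long predict_sprite0_hit(void)
{
    predict_calls++;
    return stub_hit_cycle;
}

typedef struct {
    bool need_repredict;  /* set whenever something relevant changes */
    long hit_cycle;       /* cached CPU cycle of the hit, or NO_HIT */
} Sprite0Predictor;

/* Call on CHR swaps, scroll writes, enable/disable changes, etc. */
void s0_invalidate(Sprite0Predictor *p)
{
    p->need_repredict = true;
}

/* Sprite 0 hit bit ($40) as seen by a $2002 read at CPU time `now`. */
uint8_t s0_status_bit(Sprite0Predictor *p, long now)
{
    if (p->need_repredict) {
        p->hit_cycle = predict_sprite0_hit();
        p->need_repredict = false;
    }
    return (p->hit_cycle != NO_HIT && now >= p->hit_cycle) ? 0x40 : 0x00;
}
```

However many times the state is invalidated between reads, the prediction only runs once per $2002 read that actually needs it.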

I was meaning to put something like that in my emu to speed up games which do wait for sprite 0 loops. You could do something similar to this for the 8-sprite flag, too.

Anyway I hope that makes sense.

edit --- oop too slow... blargg beat me to it... and he linked to that thread as well XD

by blargg on 2005-10-04 (#5163)

Disch described exactly what I've been thinking of for my emulator (though he's going at it from an already-correct implementation, while I'm trying to improve accuracy). Currently in my $2002 read function I check to see if the current time is after the earliest sprite 0 hit could occur (based on its Y position). If so, I just scan however many lines of sprite 0 have been drawn and report a hit when I find one with any non-transparent pixels (i.e. I never look at the background). This works surprisingly well for many games (even Battletoads, except for the snake pit and tower). This also passes the sprite hit timing test ROM I posted earlier (since it uses a sprite that's just a big block of non-transparent pixels).

I haven't yet come up with a way to handle sprite 0 hit without interacting with sprite rendering. I don't want to write a separate mini-renderer because it would be so similar to main rendering and might have subtle differences. The idea I'm working on involves saving the pixels under sprite 0, then comparing those to the pixels after it's drawn. Cheap, but simpler to implement and it doesn't affect low-level pixel rendering (which is done in chunks of one or more scanlines).

by blargg on 2005-10-04 (#5166)

Well, I just implemented the scheme I described above and it works well so far. It came out quite simple and I didn't have to duplicate any of the sprite drawing logic (flipping, etc.). I'm going to be improving the sprite hit timing ROM to test with pixels in the four corners, and writing a second test ROM to test many different situations of transparent and non-transparent pixels, other sprites, non-hit under left clip border and right edge, etc. Hopefully I'll post it tomorrow, if I don't run into any problems.

by Zepper on 2005-10-05 (#5184)

I still need to run a benchmark test on a Pentium III 800MHz, but on my machine (Celeron 2.66GHz), my emu runs at 130~140 FPS in 256x240 windowed mode. At 640x480 stretched, it goes up to 85 FPS. I have no clue if this is a good or bad result, but anyway it uses pixel precision emulation. ^_^;;

by WedNESday on 2005-10-05 (#5188)

Ok, I am going to implement my aforementioned method of rendering, i.e. execute a full CPU instruction, followed by rendering 3 pixels etc.

Draw (Instruction Time - 1) * 3 Pixels
Execute Instruction
Draw 3 Pixels (Remaining Cycle)

With this method am I guaranteed to have an accurate CPU/PPU/APU relation?

I was wondering which methods other people use in their emulators. Quietust, Fx3, blargg, what do yours use?

Also what is the importance of Loopy's scroll document? I have totally ignored the information contained inside (as I also find it totally incomprehensible), but I have had no scrolling issues in my emulator.

by Disch on 2005-10-05 (#5192)

A pixel is not rendered on every PPU cycle. There are 341 PPU cycles per scanline... but only the first 256 of those cycles render pixels. The other cycles do other things.

This method'll work... but as blargg and I have already pointed out, it'll be difficult to get going properly and will be painfully slow (it's basically the same concept as the "catch up" method described in the previously linked thread, only instead of only catching up when needed you're catching up after every instruction).

Quietust, afaik, does things one cycle at a time... as in he runs the CPU for one cycle, then the PPU, then the APU, CPU, PPU, APU, etc... which makes it easier to do things with cycle-perfect accuracy.. however it is DREADFULLY slow, which is why Nintendulator demands a much more powerful computer to run than other emus do. (Feel free to correct me on this Q, that's just my understanding of how Nintendulator works.. I could very well be wrong).

Some games may rely on $2006 and $2005 interaction for split screen effects, so understanding and applying the info in Loopy's docs might be important. The docs are pretty hard to understand at first... but it's not really as complex as it may seem.

There's a PPU address (Loopy_V) which the PPU uses to not only handle $2007 read/writes, but also uses for tile fetching when rendering. There's also a temporary value (Loopy_T) which it uses to refresh Loopy_V with during rendering (like say, to reset the X scroll at the start of a new scanline).

Loopy_V and Loopy_T are both 15 bits... and are referred to as 'v' and 't' in loopy's doc. 'd' in loopy's doc refers to the value being written to the register, and 'x' is the fine X-scroll value.

so in loopy's doc:
Code:
2000 write:
t:0000110000000000=d:00000011

Means the low 2 bits of the value written to $2000, are written to bits 10 and 11 of Loopy_T (other bits in Loopy_T are unaffected).

Code:
2005 first write:
t:0000000000011111=d:11111000
x=d:00000111

means the high 5 bits of the written value get written to the low 5 bits of Loopy_T, and the low 3 bits of the written value set the fine X scroll.

And so on.
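In C, those two entries could be written roughly as follows. This is only a sketch of the two writes quoted above: the struct and function names are invented, and the remaining registers from loopy's doc (second $2005 write, $2006, the copies from t to v) are left out.

```c
#include <stdint.h>

typedef struct {
    uint16_t t;  /* Loopy_T, 15 bits used */
    uint8_t  x;  /* fine X scroll, 3 bits */
} Scroll;

/* 2000 write: t:0000110000000000 = d:00000011
   Low 2 bits of d go to bits 10-11 of t; other bits of t are kept. */
void write_2000(Scroll *s, uint8_t d)
{
    s->t = (uint16_t)((s->t & ~0x0C00) | ((d & 0x03) << 10));
}

/* 2005 first write: t:0000000000011111 = d:11111000, x = d:00000111
   High 5 bits of d go to the low 5 bits of t; low 3 bits set fine X. */
void write_2005_first(Scroll *s, uint8_t d)
{
    s->t = (uint16_t)((s->t & ~0x001F) | (d >> 3));
    s->x = d & 0x07;
}
```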

by Quietust on 2005-10-05 (#5197)

Disch wrote:
Quietust, afaik, does things one cycle at a time... as in he runs the CPU for one cycle, then the PPU, then the APU, CPU, PPU, APU, etc... which makes it easier to do things with cycle-perfect accuracy.. however it is DREADFULLY slow, which is why Nintendulator demands a much more powerful computer to run than other emus do. (Feel free to correct me on this Q, that's just my understanding of how Nintendulator works.. I could very well be wrong).

For the most part, you are correct - the only detail is that while my CPU does emulate individual instruction cycles (and emulates the PPU/APU between each one), it is not capable of stopping in the middle of an instruction. The end result is effectively the same, however.

For example, the instruction "STA $4015" would do the following:
* Read opcode (STA absolute) and update PPU+APU
* Read operand low byte ($15) and update PPU+APU
* Read operand high byte ($40) and update PPU+APU
* Write value in accumulator to $4015 and update PPU+APU

In my current code, I emulate the PPU+APU before the corresponding CPU cycle. The only down side is that this can cause some PPU updates (grayscale, colour emphasis, fine X scroll, palette change) to be up to 3 pixels off, which is negligible.
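To illustrate the idea (this is not Nintendulator's actual code), here is a sketch where every CPU memory access first advances the PPU and APU by one CPU cycle's worth of time. The PPU and APU are stood in for by simple cycle counters, and all names are made up.

```c
#include <stdint.h>

/* Counters standing in for real PPU/APU emulation (hypothetical). */
static long ppu_cycles;
static long apu_cycles;
static uint8_t ram[0x10000];

/* Advance the other chips by one CPU cycle: 3 PPU clocks, 1 APU clock. */
static void tick(void)
{
    ppu_cycles += 3;
    apu_cycles += 1;
}

/* PPU+APU are updated before the corresponding CPU access. */
static uint8_t cpu_read(uint16_t addr)
{
    tick();
    return ram[addr];
}

static void cpu_write(uint16_t addr, uint8_t value)
{
    tick();
    ram[addr] = value;
}

/* STA absolute, 4 cycles; pc points at the opcode byte.
   Returns the address of the next instruction. */
uint16_t sta_absolute(uint16_t pc, uint8_t a)
{
    (void)cpu_read(pc);                      /* cycle 1: opcode */
    uint8_t lo = cpu_read(pc + 1);           /* cycle 2: operand low */
    uint8_t hi = cpu_read(pc + 2);           /* cycle 3: operand high */
    cpu_write((uint16_t)(hi << 8 | lo), a);  /* cycle 4: store */
    return pc + 3;
}
```

Running "STA $4015" through this advances the PPU counter by 12 and the APU counter by 4, one step per access, which is where the "up to 3 pixels off" granularity comes from.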

by WedNESday on 2005-10-06 (#5209)

Disch wrote:
A pixel is not rendered on every PPU cycle. There are 341 PPU cycles per scanline... but only the first 256 of those cycles render pixels. The other cycles do other things.

I know about that. 256 Pixels are rendered and the rest of the CPU time is HBlank (about 28.3 cc's). What do the remaining PPU cycles do then?

by hap on 2005-10-06 (#5210)

Quote:
What do the remaining PPU cycles do then?

A lot. Did you read Brad Taylor's NTSC 2C02 technical reference? If you haven't yet, now's the time. It explains what the PPU cycles 'do'.

by WedNESday on 2005-10-06 (#5211)

hap wrote:
A lot. Did you read Brad Taylor's NTSC 2C02 technical reference ? If you didn't yet, now's the time. It explains what the PPU cycles 'do'.

Should I use that reference or this one: http://www.nesworld.com/dev/ntscpput.txt ?

by Disch on 2005-10-06 (#5212)

That doc seems to say the exact same thing... just cut down (a lot of other not-as-useful-for-emu-development information removed). I'd say either reference is fine... nothing in the two should contradict each other... at least not that I saw.