Visual Nes - C++/C# port of Visual 2A03 + 2C02

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#185919)
Edit: Now also contains the visual 2A03's core - see posts below for more info.

I originally intended to modify the Visual 2C02's code to improve its performance and add a few features, but after a day of trying to optimize every single detail of the JS code, I realized it wasn't going to be fast no matter what I tried.

So here I am, 2 days later with a C++/C# port that runs ~20 times faster than the original JS version does in Chrome (on my computer at least).
On my old i5, it simulates the chip at 6000-9000hz, while tracing a bunch of stuff. (A full frame takes about 30 seconds)
In comparison, I usually got to about 400hz on the JS version with similar tracing (~10 minutes for a frame).

Obviously, this is all based on Quietust's work, which is based on the Visual 6502 - so all credits to him and the folks behind the Visual 6502.
I simply converted the JS code to C++, with relatively few modifications (aside from a few optimizations).

This version adds a few features: logging the trace to a file, selecting nodes to trace from a list, loading/saving state to files, and loading/saving RAM contents.
It also emulates the full $0000-$2FFF memory range for the PPU.
It's still missing some things, though - most notably, I did not port the code that draws/animates the actual chip yet.

Code: https://github.com/SourMesen/VisualNes/
Windows binary: http://www.mesen.ca/VisualNes.zip
Linux: There is a makefile included, it seems to run fine on Mono from the few tests I did.

Hopefully this is useful for someone else!
Let me know if you find any issues.

EDIT: Replaced download link with a statically linked version (no longer requires the VC2015 runtime to be installed to run)

Attachment: Visual2C02.png
Re: Visual 2C02 - C++/C# port
by on (#185929)
Cool! Gonna give it a try. I wanted to look into sprite hits anyway!

Edit: Tried it out, seems to work well! I really like the regex node search, haha. Some things -

The "next scanline" button jumps a whole scanline instead of to the beginning of the next scanline. This goes for all the "next x" buttons. I don't really mind, but it differs from the JS "next x" buttons.

Secondary OAM isn't viewable.

Some sprite values get destroyed during the first cycles of running, but this might be intentional? The JS version does it too.
Re: Visual 2C02 - C++/C# port
by on (#185970)
Yea, I'm aware of the difference between the next pixel/scanline buttons - the way I have it now was just easier to implement, so I was lazy. In the end, I feel like having both options available would probably be best (at least for the scanline/frame options)

I don't think the secondary OAM is visible on the JS version, either? Adding this would probably require finding out which nodes correspond to the secondary OAM (this is how the code works for the sprite & palette ram). I don't know enough about chips to ever hope to find this information myself, though.

The sprite ram being overridden is because of the "Program" (bottom left) that's loaded up by default - it makes some writes to registers and fills up some of the sprite ram with preset values. You can erase the program and then reset to make sure it doesn't do that.

Also, I just finished porting some more features over - the chip is now visible and you can highlight nodes/zoom/pan it.
-Double-click to zoom in, right click to zoom out (or use the mouse wheel for both)
-Click and drag to pan
-Click on a node to get its name/highlight, Shift-click to select a group of connected nodes

I updated the download link in the first post.

Attachment: Visual2C02.png
Re: Visual 2C02 - C++/C# port
by on (#185973)
Sour wrote:
I don't think the secondary OAM is visible on the JS version, either?

Secondary OAM is at S100..S11F in Visual 2C02.
Re: Visual 2C02 - C++/C# port
by on (#185975)
thefox wrote:
Secondary OAM is at S100..S11F in Visual 2C02.
Whoops, you're absolutely correct. Fixed!
Re: Visual 2C02 - C++/C# port
by on (#185990)
Sour wrote:
The sprite ram being overridden is because of the "Program" (bottom left) that's loaded up by default - it makes some writes to registers and fills up some of the sprite ram with preset values. You can erase the program and then reset to make sure it doesn't do that.


That's the thing, some values just get destroyed even if you remove the program. I think it's just intentional, but I don't know why.
Re: Visual 2C02 - C++/C# port
by on (#186018)
About the OAM values being destroyed - the 3rd byte of every entry loses some of the bits, this is normal. But beyond that, I'm not too sure.

I decided to push this whole thing one step further:
Attachment: visualnes.png


A 2A03 & 2C02 running in the same simulation, connected to the same CLK and RESET lines. Still far from being done, though.
Technically, couldn't I just connect some of the data/address buses (not quite sure which!) and have a working "NES"?
If I go that far, wouldn't it be simple to run NROM test roms & get a "perfect" trace of what happens?

It still runs at ~5000Hz even with both chips in the simulation, so it should be able to run about 1 frame per minute. An hour for a second, that's not too bad considering most test roms take only a second or 2 to complete.
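
A rough back-of-the-envelope check of these figures. This is only a sketch: it assumes the "Hz" number counts master clock cycles and uses the usual NTSC figures (341 dots per scanline, 262 scanlines, 4 master clocks per PPU dot). The helper name is made up for illustration.

```cpp
#include <cassert>

// Assumed NTSC constants: 341 dots per scanline, 262 scanlines per frame,
// 4 master clock cycles per PPU dot.
constexpr double kMasterClocksPerFrame = 341.0 * 262.0 * 4.0;

// Hypothetical helper: wall-clock seconds needed to simulate one frame at a
// given simulation rate (master clock cycles simulated per second).
double secondsPerFrame(double simulatedHz) {
    assert(simulatedHz > 0.0);
    return kMasterClocksPerFrame / simulatedHz;
}
```

Under those assumptions, 5000 Hz works out to a bit over 70 seconds per frame, so 60 frames take somewhat more than an hour, which is in the same ballpark as the estimate above.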
Re: Visual 2C02 - C++/C# port
by on (#186025)
fred wrote:
Some sprite values get destroyed during the first cycles of running, but this might be intentional? The JS version does it too.


If you don't fully initialize every byte of Sprite RAM, then this sort of thing will happen because the DRAM cells themselves initialize (and would normally decay) to an "indeterminate" state in which they are neither 0 nor 1 and will acquire a new value during refresh.
Re: Visual 2C02 - C++/C# port
by on (#186028)
I see!

Sour: Haha! That would be quite something.
Re: Visual 2C02 - C++/C# port
by on (#186079)
Still not quite finished, UI-wise, but I've managed to hook up both cores together.

I loaded up one of blargg's NROM tests into it - the code waited for the vblank flag in a loop reading $2002 until the PPU set the flag, at which point the CPU continued its execution. I'd imagine the cores are working properly if this much works.
Re: Visual 2C02 - C++/C# port
by on (#186145)
I've mostly finished integrating the Visual 2A03 core into it. Renamed the whole thing to "Visual NES", since that just makes more sense at this point (the name may very well be taken by something else, but I'm not too worried about it :p)

It supports loading .nes ROMs and is meant to reproduce the NES' environment - $800 ram with mirroring, NT mirroring, etc.
A lot of this hasn't been tested that much, so if you find issues, please let me know.
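
For readers unfamiliar with the mirroring mentioned above, here is a minimal sketch of how the 2 KB of CPU RAM and the two nametables are commonly folded into their address ranges. It is not taken from the Visual NES source; the function and enum names are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// CPU side: the 2 KB ($800) of internal RAM repeats across $0000-$1FFF.
uint16_t cpuRamIndex(uint16_t addr) {
    assert(addr < 0x2000);
    return addr & 0x07FF;
}

enum class Mirroring { Horizontal, Vertical };

// PPU side: fold a nametable address ($2000-$2FFF) into 2 KB of VRAM.
uint16_t nametableIndex(uint16_t addr, Mirroring mode) {
    assert(addr >= 0x2000 && addr < 0x3000);
    uint16_t offset = addr & 0x03FF;       // position within one table
    uint16_t table = (addr >> 10) & 0x03;  // logical table 0-3
    // Vertical mirroring: tables 0/2 and 1/3 share memory (A10 selects).
    // Horizontal mirroring: tables 0/1 and 2/3 share memory (A11 selects).
    uint16_t physical = (mode == Mirroring::Vertical) ? (table & 1) : (table >> 1);
    return static_cast<uint16_t>(physical * 0x400 + offset);
}
```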

Code: https://github.com/SourMesen/VisualNes/
Windows binary: http://www.mesen.ca/VisualNes.zip

Attachment: visualnes.png
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186151)
Didn't think I'd live to see a transistor-level NES emulator. How long before we can do it in realtime? :lol:
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186156)
Rahsennor wrote:
How long before we can do it in realtime? :lol:
Currently runs at about 1/1000th of the speed of the NES. If you could somehow speed up the code 10 times over by splitting the workload onto multiple cores and optimizing the code, and then use a more recent CPU than mine, you might be down to around 1/50th of the speed. So.. I guess somewhere around 2040 we might be able to get it done!

For now I'm mostly interested in using this to compare execution traces with Mesen for the couple of tests it still doesn't pass and try to figure out why.
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186211)
I don't have much to say other than good job, the performance upgrade with this port is fantastic. And I always thought it would be cool to see Visual 2A03 and 2C02 combined.

Seems that it doesn't like loading 16kB NES files, but that's not a big problem (easily solved by making it 32kB).

From what I understand, the NES CPU and PPU have several different clock alignment possibilities determined at power-on (or is it reset?). Maybe that's something that could be included in this? I'm not saying that I need it myself though, or that I know what would be involved exactly.
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186222)
This is exactly the kind of project that would benefit greatly from PGO. Perhaps even 2x or more.
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186228)
Thanks!
Yea, the 16kb loading is a bug I realized after posting, but didn't fix yet. It copies the 16kb bank to $8000-$BFFF, so that obviously doesn't work. Also, I forgot to mention it, but at the moment it's only meant to run with mapper 0 stuff - though I suppose very simple mappers could be added easily with HLE without really having too much impact on the accuracy.
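
To illustrate why a 16 KB image loaded only at $8000-$BFFF cannot boot: on a mapper 0 board the single bank is also visible at $C000-$FFFF, which is where the vectors live. A minimal sketch of that mapping (the function name is hypothetical, not from the Visual NES source):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Map a CPU address in $8000-$FFFF to an offset into the PRG ROM data of a
// mapper 0 (NROM) cartridge. prgSize must be 16 KB or 32 KB.
std::size_t nromPrgOffset(uint16_t cpuAddr, std::size_t prgSize) {
    assert(cpuAddr >= 0x8000);
    assert(prgSize == 0x4000 || prgSize == 0x8000);
    // With a single 16 KB bank, masking by (prgSize - 1) makes the bank
    // appear at both $8000-$BFFF and $C000-$FFFF.
    return (cpuAddr - 0x8000u) & (prgSize - 1);
}
```

With a 16 KB image, the reset vector read at $FFFC resolves to offset $3FFC of the ROM data.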

For the alignment, I just tried changing the soft reset logic to not alter the state of the chip other than putting the reset signal low for a given number of cycles, and it seems to yield 6 (out of a possible 8) different alignments (on a half-master clock level). I was under the impression there were only 4 possible alignments, so maybe I'm doing this wrong.

calima wrote:
This is exactly the kind of project that would benefit greatly from PGO. Perhaps even 2x or more.
A quick test with PGO seems to yield ~15% faster code (4900hz -> 5500hz on my machine), which is pretty similar to what I get on Mesen with PGO, too.
At the moment ~50-60% of the time is spent in this recursive function. I haven't been able to find any way to make it faster though. Converting "group" from a vector to a hashset makes it slower (presumably because "group" is usually very small), and the way it works makes it pretty hard/impossible to split the work across multiple threads without a ton of lock contention.
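
The linked function is not preserved in this archive. As a rough idea of what such a group-building recursion looks like, here is a sketch loosely modelled on the addNodeToGroup routine of the Visual 6502's JavaScript simulator. All type and field names are illustrative assumptions, not the Visual NES code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Transistor { int c1; int c2; bool on; };

struct Circuit {
    std::vector<Transistor> transistors;
    // For each node, the indices of transistors whose c1 or c2 touch it.
    std::vector<std::vector<int>> c1c2s;
    int gnd = 0;  // illustrative: node 0 is ground
    int pwr = 1;  // illustrative: node 1 is power
};

// Collect every node reachable from `node` through transistors that are on.
void addNodeToGroup(const Circuit& c, int node, std::vector<int>& group) {
    assert(node >= 0 && node < static_cast<int>(c.c1c2s.size()));
    // Linear scan: cheap while groups stay small, which is the usual case.
    if (std::find(group.begin(), group.end(), node) != group.end()) return;
    group.push_back(node);
    // Do not walk through the supply rails, or everything would merge.
    if (node == c.gnd || node == c.pwr) return;
    for (int ti : c.c1c2s[node]) {
        const Transistor& t = c.transistors[ti];
        if (!t.on) continue;
        addNodeToGroup(c, t.c1 == node ? t.c2 : t.c1, group);
    }
}
```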
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186231)
What is the range of c1/c2? Perhaps you can use that to optimize it.
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186249)
C1/C2 corresponds to node numbers - between both chips, they range from 0 to 33000 (though some numbers are unused).

In other news, I was trying to get the simulator to display the PPU's actual output (based on the VRAM's content) and found out the writes to the PPU don't seem to be working as expected. Writing to $2000 to enable NMIs doesn't appear to work, and a simple program like:
Code:
LDA #$77
STA $2007
JMP $0000
Ends up writing garbage to VRAM instead of $77. Writing to CPU RAM works as expected, though, so the problem seems to be the communication between both chips. It's probably a silly mistake, but I've been looking at this for a few hours already and I haven't been able to figure it out.

If anyone's willing to check if they see something that's obviously wrong, it'd probably be around here: halfstep()
clk0 is the master clock, cpu_clk0 is the cpu's clock (i.e. clk0 / 12) and io_ce is the chip enable input on the PPU (which is based on the cpu's address bus & phi2)
I'm unsure if the logic I'm using to replicate the 74139's behavior (io_ce) is correct, among other things.
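
For reference, the address decoding usually attributed to the NES's 74LS139 can be written as plain logic. This sketch assumes the commonly described wiring (PPU /CE goes low only while M2 is high and A15..A13 = 001); it is not the code from halfstep(), and the function name is invented.

```cpp
#include <cstdint>

// Returns the level of the PPU's active-low chip enable for a given CPU
// address and M2 (phi2) state: false (low) means the PPU is selected.
// Assumption: /CE is asserted only while M2 is high and the address lies
// in $2000-$3FFF (A15=0, A14=0, A13=1).
bool ppuChipEnableLevel(uint16_t cpuAddr, bool m2) {
    bool selected = m2 && ((cpuAddr & 0xE000) == 0x2000);
    return !selected;
}
```

The point of gating on M2 is that the address bus is only guaranteed stable during that half of the CPU cycle, so an enable that ignores M2 can select the PPU at the wrong moment.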

Everything else is pretty much copy/pasted from the original javascript simulators, though. The major difference being that the original Visual 2C02 uses the function handleIoBus() to emulate a CPU (this used to work in the C# version too, before I integrated the 2A03 into it).
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186250)
Sour wrote:
I was trying to get the simulator to display the PPU's actual output (based on the VRAM's content)

I don't see much use for a high-level interpretation of the VRAM contents... Wouldn't it be MUCH better to decode the composite signal generated by the PPU? Now that would be awesome!

Quote:
and found out the writes to the PPU don't seem to be working as expected.

Doesn't this have to do with the fact that the PPU needs time to "warm up"? Games are required to wait 1 or 2 frames before using the PPU for this reason. I don't know anything about this type of low-level simulation, so this is the only thing I can think of!
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186252)
Sour wrote:
At the moment ~50-60% of the time is spent in this recursive function. I haven't been able to find any way to make it faster though. Converting "group" from a vector to a hashset makes it slower (presumably because "group" is usually very small), and the way it works makes it pretty hard/impossible to split the work across multiple threads without a ton of lock contention.

My first suggestion would be to add an array of booleans big enough to cover every single node (e.g. "vector<bool> groupbool"), then use it to keep track of whether any node is in the list or not ("if (groupbool[i]) return; groupbool[i] = true; group.push_back(i);", and the opposite when removing an element from "group") in order to avoid the delay of searching through the vector each time.

For small sets of node updates, it probably won't help that much, but for a very high-use function, every little bit helps.
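
A minimal sketch of the presence-array idea, assuming node numbers fit in 16 bits. The class is hypothetical and uses one byte per node rather than vector<bool>, which is a swapped-in choice to keep the flag test a plain load.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: a node group with O(1) membership tests.
class NodeGroup {
public:
    explicit NodeGroup(std::size_t nodeCount) : present_(nodeCount, 0) {}

    // Returns true if the node was newly added, false if already present.
    bool add(uint16_t node) {
        assert(node < present_.size());
        if (present_[node]) return false;
        present_[node] = 1;
        members_.push_back(node);
        return true;
    }

    // Reset only the flags that were set, so clearing costs O(group size)
    // rather than O(total node count).
    void clear() {
        for (uint16_t n : members_) present_[n] = 0;
        members_.clear();
    }

    const std::vector<uint16_t>& members() const { return members_; }

private:
    std::vector<uint8_t> present_;   // one flag byte per node
    std::vector<uint16_t> members_;  // nodes currently in the group
};
```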
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186265)
For such a small range, the first thing is to move to unsigned short/uint16_t instead of int, including in your data struct. You're jumping all over memory in that function, so shrinking the data will increase cache hits.

You may also consider dividing the data into two containers/arrays, one just for the hot function and one with the rest. Again for improved cache hits.

The range then enables other things, like using a fixed-size presence array like Quietust said above.
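
One way to picture the narrower index type and the hot/cold split. The field names are invented for illustration and do not come from the Visual NES source; the size check holds on typical platforms but is not guaranteed by the language.

```cpp
#include <cstdint>

// Hot data: only what the group-walking loop touches, in 16-bit fields.
struct HotTransistor {
    uint16_t c1;
    uint16_t c2;
    uint8_t on;
};

// Cold data: everything else (gate node, drawing info), kept in a parallel
// array with the same index so the hot array stays dense in cache.
struct ColdTransistor {
    uint16_t gate;
    int32_t bbox[4];
};

// With int fields the hot record would be at least 12 bytes on most targets.
static_assert(sizeof(HotTransistor) <= 6, "hot record should stay small");
```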
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186273)
tokumaru wrote:
Sour wrote:
I was trying to get the simulator to display the PPU's actual output (based on the VRAM's content)

I don't see much use for a high-level interpretation of the VRAM contents... Wouldn't it be MUCH better to decode the composite signal generated by the PPU? Now that would be awesome!

And/or tap into the intermediate signals (after palette look up, before composite signal generation) to generate a pixel-perfect output.
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186274)
thefox wrote:
And/or tap into the intermediate signals (after palette look up, before composite signal generation) to generate a pixel-perfect output.

Yeah, that'd be pretty useful too!
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186303)
Thanks for the suggestions - I've changed the ints to shorts, removed anything that wasn't actually required from the structs, and made a few other changes. Adding an array of bool to avoid scanning "group" did not make any difference, though (it seemed to be 1-2% slower).
Between these and PGO, it is roughly 50% faster than before (~7500hz instead of ~5000hz). I'm using a pretty old i5, so I'd imagine more recent CPUs should be able to get above 10kHz.

thefox wrote:
And/or tap into the intermediate signals (after palette look up, before composite signal generation) to generate a pixel-perfect output.
This was the first option I wanted to do, but couldn't find any node in the list that seemed to match. I was looking for things along the lines of "pixel" though, not palette. I just took another look at the node list and it seems like this might be what pal_d0_out to pal_d5_out are for - if so, I'll use those to generate the picture.
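
If those six nodes do carry the palette output, turning them into a color index is just bit packing. A minimal sketch, assuming pal_d0_out is the least significant bit (that ordering is a guess based on the names):

```cpp
#include <array>
#include <cstdint>

// Combine six single-bit node states (pal_d0_out is bit 0 ... pal_d5_out
// is bit 5) into a 6-bit NES color index in the range $00-$3F.
uint8_t colorIndexFromNodes(const std::array<bool, 6>& palOut) {
    uint8_t index = 0;
    for (int bit = 0; bit < 6; ++bit) {
        if (palOut[bit]) index |= static_cast<uint8_t>(1u << bit);
    }
    return index;
}
```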

tokumaru wrote:
Doesn't this have to do with the fact that the PPU needs time to "warm up"?
Unfortunately, no. The writes to the registers are ignored during the first frame (due to the warm up period), but once they do start actually having an effect, they aren't working properly. I'm pretty sure it has to do with io_ce not being timed properly, but still haven't figured it out completely.
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186323)
If the profiler still points to the group search, a bloom filter in front could be useful. One of my favorite speedup techniques.
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186363)
calima wrote:
If the profiler still points to the group search, a bloom filter in front could be useful. One of my favorite speedup techniques.
I read up a bit on bloom filters, but I'm not quite sure I see how I could apply them here. The nodes that are in a particular group change constantly as transistors turn on and off, and this is recursive, so a single transistor changing state could make a group go from 2 nodes to 50 nodes.

And I've fixed up some of the issues, it seems, but some remain (e.g: there's an incorrectly displayed sprite at the top left)
Also DK is the only game I found that boots in a reasonable amount of frames (SMB surprisingly takes about 30 frames..)
Attachment: visualnes.png

For fun, here's what it looks like with Quietust's scanline test rom - bg color is wrong, but it actually displays exactly like what Mesen does (and like what Eugene posted a while ago from a real Famicom). Maybe caused by PPU-CPU alignment? I'd have to try to change the alignment and run it some more to see.
Attachment: scanline.png
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186379)
Quote:
(e.g: there's an incorrectly displayed sprite at the top left)

Donkey Kong, as [mostly-]good practice, initializes the Y-values of unused sprites to FF. Anything from F0-FF should be invisible. I wonder if you got a Y-wrap introduced, and where.
For (J)'s .nes, 31bd:FF is the initial value used in the OAM-page initializing loop, if you want to change it to see if that's your issue.
For (U) it should be at 31ae.
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186401)
There's a bug in the Visual 2C02 OAM DMA: viewtopic.php?p=169373#p169373 (it does not actually seem to always corrupt the source address to 0, unlike what I said in that post; instead it seems to depend on the value written and the hibyte of "ab": spr_addr = value_written AND hibyte(ab)).
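
The formula above, written out as a tiny function. This only restates the observed (buggy) simulator behaviour as described in the post; the function name is made up.

```cpp
#include <cstdint>

// Observed behaviour in the simulation, per the description above: the DMA
// source page ends up as value_written AND hibyte(ab).
uint8_t corruptedSprAddr(uint8_t valueWritten, uint16_t ab) {
    return valueWritten & static_cast<uint8_t>(ab >> 8);
}
```

For example, writing $02 while the high byte of ab is $40 yields page $00, so the DMA would read from the zero page instead of $0200.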
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186414)
There are different versions of scanline.nes:
1) black: http://nesdev.com/scanline.zip
2) gray: https://github.com/christopherpow/nes-test-roms (see \scanline\scanline.nes)

I used gray:
viewtopic.php?f=3&t=14833
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186429)
Eugene.S wrote:
There are different versions of scanline.nes:
1) black: http://nesdev.com/scanline.zip
2) gray: https://github.com/christopherpow/nes-test-roms (see \scanline\scanline.nes)

I used gray:
viewtopic.php?f=3&t=14833

The "black" one is, in fact, broken - it's the original version I wrote back in 2003 before I actually had it tested on a real NES.

The "gray" version is the correct one (and is the one taken from my website, as evidenced by the extra build-script input files in the directory).
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186430)
thefox wrote:
There's a bug in the Visual 2C02 OAM DMA: viewtopic.php?p=169373#p169373 (it does not actually seem to always corrupt the source address to 0, unlike what I said in that post; instead it seems to depend on the value written and the hibyte of "ab": spr_addr = value_written AND hibyte(ab)).
Yea, that seems to be what DK is doing - the OAM data matches the zero page. Isn't the high byte of AB always $40 in this particular scenario?
cpow seemed to be convinced these bugs were not there originally - is there any Git/SVN repository available for the Visual 2A03 that I might be able to use to figure out when the bugs appeared?

Eugene.S wrote:
There are different versions of scanline.nes
Ah, I wasn't aware there were 2 versions, thanks for pointing that out. But the version I'm using is definitely supposed to be gray.

The contents of the palette RAM don't match what I get in Mesen for both the scanline test & DK.
My assembly skills are terrible, so I may have messed up, but I expected this to roughly fill palette ram with $01 to $20 (mirroring aside)
Code:
  lda #$3F
  sta $2006
  lda #$00
  sta $2006
  ldx #$01
  ldy #$1F
loop:
  stx $2007
  inx
  dey
  bne loop

Instead I got this:
Code:
17 08 09 0A 1B 0C 0D 0E 1F 10 11 12 13 14 15 16
17 18 19 1A 1B 1C 1D 1E 1F 00 01 02 13 04 05 06
When I look at the trace, I see the PPU setting AB to $3F01 and DB to $01 at one point, which is off by 1 (should be $3F00?) but the content written in the actual RAM doesn't match - it seems to be writing to the AB value that the PPU is set to outside of those "writes" (typically 3F30-3F4Fish range). So I guess there might be something wrong with palette writes, too (although I would have to test the palette writes on the visual 2C02 to make sure this isn't a bug specific to my code)
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186434)
This is really cool stuff Sour! I'm already impressed at the speed this is running at and what you are getting it to do. I'll definitely be keeping track of the progress and look forward to making use of it.

Good Luck!
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186438)
@Sour That code looks fine, but it's not going to write to $3f1f: with dey/bne, the branch is not taken once Y reaches $00, so the loop only performs 31 writes. Using bpl instead should work (the branch keeps being taken until the negative flag in P is set, which happens when Y wraps from $00 to $FF), giving a 32nd write and thus PPU RAM $3f1f=$20. Sorry for the wording of this paragraph. :)

As for the rest of the PPU RAM values: I'm fairly certain PPU palette mirroring plays a role here (with regards to what values you end up seeing in PPU RAM). Can't really help with the "internal behaviour" aspect.
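
The loop count and the mirroring effect can be checked with a small high-level replay of the test code. This is a sketch under stated assumptions: the PPU address increments by 1 per $2007 write, and palette RAM uses the standard mirroring of $3F10/$3F14/$3F18/$3F1C onto $3F00/$3F04/$3F08/$3F0C. It models what an emulator would be expected to do, not the transistor-level behaviour.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Palette RAM index for a PPU address in $3F00-$3FFF: 32 bytes, with
// $3F10/$3F14/$3F18/$3F1C mirroring $3F00/$3F04/$3F08/$3F0C.
uint8_t paletteIndex(uint16_t ppuAddr) {
    assert(ppuAddr >= 0x3F00 && ppuAddr <= 0x3FFF);
    uint8_t i = ppuAddr & 0x1F;
    if ((i & 0x13) == 0x10) i &= 0x0F;
    return i;
}

// Replays the test loop: X starts at $01, Y at $1F, one $2007 write per
// iteration. useBpl selects "bpl loop" instead of "bne loop".
// Returns the number of writes performed.
int runPaletteLoop(std::array<uint8_t, 32>& palette, bool useBpl) {
    uint16_t addr = 0x3F00;
    uint8_t x = 0x01, y = 0x1F;
    int writes = 0;
    for (;;) {
        palette[paletteIndex(addr)] = x;  // stx $2007
        ++addr;                           // assumed +1 address increment
        ++writes;
        ++x;                              // inx
        --y;                              // dey
        bool zero = (y == 0);
        bool negative = (y & 0x80) != 0;
        bool taken = useBpl ? !negative : !zero;
        if (!taken) break;
    }
    return writes;
}
```

With bne the replay performs 31 writes and never reaches $3F1F; with bpl it performs 32. In both cases the write to $3F10 lands on entry $00 because of the mirroring.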
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186472)
Sour wrote:
thefox wrote:
There's a bug in the Visual 2C02 OAM DMA: viewtopic.php?p=169373#p169373 (it does not actually seem to always corrupt the source address to 0, unlike what I said in that post; instead it seems to depend on the value written and the hibyte of "ab": spr_addr = value_written AND hibyte(ab)).
Yea, that seems to be what DK is doing - the OAM data matches the zero page. Isn't the high byte of AB always $40 in this particular scenario?
cpow seemed to be convinced these bugs were not there originally - is there any Git/SVN repository available for the Visual 2A03 that I might be able to use to figure out when the bugs appeared?

IIRC "ab" was related to the value of PC (probably due to the fact that right after the $4014 write the CPU still has time to fetch the first byte of the next instruction before RDY is deasserted by the DMA unit). You can try this by executing LDA #$FF / STA $4014 at $100. spr_addr should start as $FF and then get corrupted to $01.

I don't think there's a repository or change history of Visual 2A03 publicly available, but with some luck Quietust might have one.
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186505)
koitsu wrote:
@Sour That code looks fine, but it's not going to write to $3f1f: with dey/bne, the branch is not taken once Y reaches $00, so the loop only performs 31 writes. Using bpl instead should work (the branch keeps being taken until the negative flag in P is set, which happens when Y wraps from $00 to $FF), giving a 32nd write and thus PPU RAM $3f1f=$20.
Thanks, that probably explains the $00 value in the palette ram. Not quite sure why the addresses are wrong though - maybe my test is wrong, considering DK seems to be able to write to the correct indexes (although about half of the values are incorrect in that case)

thefox wrote:
I don't think there's a repository or change history of Visual 2A03 publicly available, but with some luck Quietust might have one.
I just went ahead and checked on archive.org for both the 2A03 & 2C02 - the oldest copy was from May 1st 2013. The 2A03 had no significant change (all node definitions are unchanged as far as I could tell). The 2C02 had some changes to node definitions, apparently related to sprite position, but that wouldn't explain the DMA bug (since it's a 2A03 issue). Maybe the bug has always been there?
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186510)
One somewhat important thing to consider is that the Visual 2A03 and Visual 2C02 use slightly different versions of ChipSim - I believe Visual 2A03 uses the same version as Visual 6502, but Visual 2C02 uses different logic to resolve groups of "floating" nodes (whereby it considers the area of each node to determine whether the group goes high or low) to fix $2007 writes and Sprite DRAM refreshes.
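
A sketch of what area-weighted resolution of a floating group can look like. The structure follows the general shape of the Visual 6502-style simulators as I understand it; the exact order of checks and all names here are assumptions, not the actual ChipSim code.

```cpp
#include <cassert>
#include <vector>

struct Node {
    bool state;     // value held before this evaluation
    bool pullup;
    bool pulldown;
    double area;    // stand-in for the node's capacitance
};

// Decide the new value for a group of connected nodes.
// gnd and pwr are the indices of the supply nodes.
bool resolveGroup(const std::vector<Node>& nodes, const std::vector<int>& group,
                  int gnd, int pwr) {
    assert(!group.empty());
    for (int n : group) if (n == gnd) return false;
    for (int n : group) if (n == pwr) return true;
    double hiArea = 0.0, loArea = 0.0;
    for (int n : group) {
        const Node& node = nodes[n];
        if (node.pullup) return true;
        if (node.pulldown) return false;
        if (node.state) hiArea += node.area;
        else loArea += node.area;
    }
    // Floating group: the side holding more charge (area) wins.
    return hiArea > loArea;
}
```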

Of course, there are also several bugs in the Javascript versions of Visual 2A03/2C02 that I've simply never gotten around to fixing...
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186522)
Quietust wrote:
whereby it considers the area of each node to determine whether the group goes high or low
At the moment, both chips use this logic. Do you think the CPU would still run correctly with it, since it's technically just an extra level of precision on the simulation?

Also, I decided to try running some CPU test roms to see how it fares:
Code:
branch_timing_tests:
  1.Branch_Basics  -Pass
  2.Backward_Branch  -Pass
  3.Forward_Branch  -Pass

instr_misc:
  01-abs_x_wrap  -Pass
  02-branch_wrap  -Pass

instr_test-v3:
  01-implied  -Pass
  02-immediate  -Fail (both 69 ADC #n and E9 SBC #n failed)
  10-stack  -Pass  (This one took 117m half clocks to complete, probably took over an hour to run)
  11-jmp_jsr  -Pass
  12-rts  -Pass
  13-rti  -Pass
  14-brk  -Pass
  15-special  -Pass
So far it seems pretty good, but not quite perfect. I'll keep running some in the background and try to get most of the CPU-related tests done eventually (some take a long time to run, so it may take a while). At least that way we'll have an idea of what works and what doesn't. Not sure if that would help in actually finding and fixing bugs, though - and unfortunately, I have very little hope of being able to fix these myself.
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186599)
I've been running tests for the past 10 hours or so (running 3 copies of the simulator at once). I didn't run every single test, some take a ridiculously long time to run (1 second test = 1 hour... one of them took about 5-6 hours to complete)

The CPU seems to be working correctly in most cases ($4014 writes aside).
ADC/SBC (and RRA/ISC which reuse their logic) are bugged (I imagine the "carry" part of the operation might not be working properly?) - this has the potential to break other tests if they are used.
The APU seems to be working as well (irq_flag failed, and dmc dma's behavior seems to be slightly incorrect)

The PPU is hard to say - with the $4014 bug, sprite-related tests will all fail.
The palette RAM test passed, but I'm fairly sure there is something wrong with the palette in general.
The background color that gets output seems to always use $3F0F instead of $3F00 (so the lower 4 bits are inverted - incorrect wiring maybe?), among other things.

Hopefully these results can eventually be useful in trying to fix the visual 2A03/2C02 - I'm not sure there is anything I can do beyond this, though.

Edit: Also updated the download link to include the latest build (better speed, fixes, and some UI improvements)

Code:
blargg_apu_2005.07.30:
  01.len_ctr: Pass
  02.len_table: Pass
  03.irq_flag: FAIL ($06 - "Writing $00 or $80 to $4017 doesn't affect flag")
  04.clock_jitter: Pass
  05.len_timing_mode0: Pass
  06.len_timing_mode1: Pass
  07.irq_flag_timing: Pass
  08.irq_timing: Pass
  09.reset_timing: Pass
  10.len_halt_timing: Pass
  11.len_reload_timing: Pass

blargg_ppu_tests_2005.09.15b:
  palette_ram: Pass
  power_up_palette: FAIL (expected it to fail)
  sprite_ram: FAIL ($06 - "$4014 DMA copy doesn't work at all")
  vbl_clear_time: Pass
  vram_access: Pass
 
branch_timing_tests:
  1.Branch_Basics: Pass
  2.Backward_Branch: Pass
  3.Forward_Branch: Pass

cpu_interrupts_v2:
  1-cli_latency: Pass
 
dmc_dma_during_read4:
  dma_2007_read: FAIL? (Outputs: 11 22, 11 22, 11 22, 11 22, 33 44 - 4AEFDE12)
  dma_2007_write: Pass
  double_2007_read: FAIL? (Outputs: 22 33 44 55 66, 02 33 44 55 66, 31D9ED83)
  read_write_2007: Pass

instr_misc:
  01-abs_x_wrap: Pass
  02-branch_wrap: Pass
  03-dummy_reads: Pass
  04-dummy_reads_apu: Pass

instr_test-v3:
  01-implied: Pass
  02-immediate: FAIL (69 ADC, E9 SBC)
  03-zero_page: FAIL (65 ADC, E5 SBC, 67 RRA, E7 ISC)
  04-zp_xy: FAIL (75 ADC, F5 SBC, 77 RRA, F7 ISC)
  05-absolute: FAIL (6D ADC, ED SBC, 6F RRA, EF ISC)
  06-abs_xy: FAIL (7D ADC, 79 ADC, FD SBC, F9 SBC, 7F RRA, FF ISC, 7B RRA, FB ISC)
  07-ind_x: FAIL (61 ADC, E1 SBC, 63 RRA, E3 ISC)
  08-ind_y: FAIL (F1 SBC, 71 ADC, 73 RRA, F3 ISC)
  09-branches: Pass
  10-stack: Pass
  11-jmp_jsr: Pass
  12-rts: Pass
  13-rti: Pass
  14-brk: Pass
  15-special: Pass
 
oam_read: FAIL (Displays mostly stars)

ppu_sprite_hit:
  01-basics: FAIL ("Flag isn't working at all" - Most likely caused by broken $4014 writes)

ppu_sprite_overflow:
  01-basics: FAIL ("Should clear flag at end of VBL" - Not sure what is causing this)
 
read_joy3:
  count_errors_fast: FAIL (because no controller is connected - need to emulate a standard controller and try again)
 
test_apu_2:
  test_1: Pass
  test_2: FAIL (might be normal - apparently can also fail on NES based on cpu-ppu alignment)
  test_3: Pass
  test_4: Pass
  test_5: Pass
  test_6: FAIL (not sure if this is normal - test 6 was originally affected by alignment, but it sounded like it was fixed?)
  test_7: Pass
  test_8: Pass
  test_9: Pass
  test_10: Pass
  test_11: Pass

The OAM read test looked like this:
Attachment: oamread.png
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186602)
All the ADC/SBC operations failing is interesting. Quite possibly someone didn't implement twos-complement correctly? These two opcodes are the #1 pain point, opcode-wise, for emulator authors. Just the first thing that comes to mind.
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186604)
koitsu wrote:
All the ADC/SBC operations failing is interesting. Quite possibly someone didn't implement twos-complement correctly? These two opcodes are the #1 pain point, opcode-wise, for emulator authors. Just the first thing that comes to mind.

There's one distinct possibility: as I originally mentioned when I released the Visual 2A03, the 6502 core I used is a direct copy of the Visual 6502 which has working decimal mode, so if the D flag somehow got set, then I would expect lots of ADC/SBC test failures.
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186608)
The decimal flag isn't set at startup, and the tests don't set it anywhere in their code - so it doesn't look like that would be it.

I just spent 30+ minutes trying a lot of combinations of ADC #$xx (with and without the carry flag set) and couldn't find any that didn't set the flags as expected or gave the wrong result.. I think I'll try to recompile blargg's test with only the ADC portion of the test, trace the value of A and the flags at each step & then compare that with an emulator's trace.
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186623)
Sour wrote:
The decimal flag isn't set at startup, and the tests don't set it anywhere in their code - so it doesn't look like that would be it.

Make sure it also doesn't get set by PLP. It really does seem like the most likely culprit in this case.

(How difficult would it be to disable the decimal mode in Visual 6502 in the same way that they disabled it in 2A03? Wasn't it something like one wire cut or a transistor removed?)
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186625)
thefox wrote:
Make sure it also doesn't get set by PLP. It really does seem like the most likely culprit in this case.
And it looks like you're probably correct - forgot that was even possible.
The test roms seem to be doing this at some point:
Code:
lda #$FF
sta in_p
[...]
lda in_p
pha
[...]
plp
So it's pretty likely the decimal flag is on during some of the tests (although "cld" is called at one point in the code).
I'll replace the $FF with $F7 in the rom and see if that changes anything.
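
As a quick check of why $F7 is the right replacement, here is a small Python sketch of which status flags PLP ends up setting for a given pulled byte. The bit layout is the standard 6502 one (D is bit 3); the names `FLAG_BITS` and `flags_after_plp` are just for this example.

```python
# 6502 status register bits that PLP actually loads
# (bit 4 "B" and bit 5 are not real flags, so they are left out).
FLAG_BITS = {"C": 0x01, "Z": 0x02, "I": 0x04, "D": 0x08, "V": 0x40, "N": 0x80}


def flags_after_plp(pulled):
    """Return the names of the flags that are set after PLP pulls `pulled`."""
    return [name for name, mask in FLAG_BITS.items() if pulled & mask]


print(flags_after_plp(0xFF))  # includes "D"
print(flags_after_plp(0xF7))  # same flags, minus "D"
```

$F7 is $FF with only bit 3 cleared, so every other flag the test relies on stays the same.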

According to this, it sounds like removing transistors t1329, t3212, t2750, t2202 and t2256 would replicate the 2A03's modifications.
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#186672)
Attachment:
immediate.png

Progress? Sort of...

I've forced all 5 transistors mentioned above to be "on" at all times (this is what the link said the 2A03's modifications did) and this is what I'm getting now - the official opcodes pass, but now these unofficial ones apparently don't (I've only tried this test though, since it completes in about an hour, vs 4+ for some of the others).

I had hacked up blargg's test before trying this to make sure the instructions were always performed with the decimal flag off, and that version of the test gave me the same result (I think; unfortunately I did not save the result screen). So I guess decimal mode is correctly disabled by this, but why these opcodes break when decimal mode is off is another story.
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#187056)
Since the wiki says it takes 2-3 and 3-4 CPU cycles for the sequencer counter to reset, I tried to see if I could find out with Visual Nes. I also checked whether half and quarter frames are delayed or not.
I'm not very good at using Visual Nes or especially operating the APU, but here goes...

Code:
test:
lda #0x0F
sta 0x4000 //set volume / envelope to 15 to see when it decreases
lda #0x80
sta 0x4017 //restart things
jmp test


Cpu cycles, starting from the write cycle in STA 0x4017:

0: Write to 0x4017
1: Read
2: Read* (extra cycle that only happens sometimes, depending on whether the write was on an APU cycle; probably when it wasn't)
3: Read - cpu_frm_half and cpu_frm_quarter go low, cpu_sq0_envt3-0 decrements
4: Read - cpu_'frm_/tXX' resets

If I shouldn't post random visual nes testing in this topic, tell me haha...


So I guess that's 2-3 CPU cycles of delay until half and quarter frames reset? The sequencer counter, if I even logged the correct thing, resets one cycle later instead of doing some kind of increment.
In my emulator I increment the sequencer counter every CPU cycle, so resetting the counter (instead of, or as well as, incrementing it) at the same time as the half & quarter frames should work in that case, I think. I say "instead of, or as well as" because resetting *and* incrementing makes my emulator pass blargg's apu jitter test, but that doesn't really mean much.
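
A rough Python sketch of the timing read off the trace above, in case it helps anyone compare against their emulator. The function and key names are made up, and `extra_cycle` simply stands for "the optional Read* cycle happened"; which write alignment triggers it is not settled by this test.

```python
def frame_counter_events(extra_cycle):
    """CPU-cycle offsets from the $4017 write cycle, per the trace above.

    extra_cycle: True if the optional "Read*" cycle occurred.
    """
    clock_offset = 3 if extra_cycle else 2
    return {
        "half_quarter_clock": clock_offset,   # cpu_frm_half / cpu_frm_quarter
        "sequencer_reset": clock_offset + 1,  # counter resets one cycle later
    }


print(frame_counter_events(False))
print(frame_counter_events(True))
```

This only restates the observed offsets (2-3 cycles for the half/quarter signals, 3-4 for the reset); it is not a full frame counter model.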

I also tried "asl 0x4017" just because, it seems to write back 0xFF which wasn't what I was expecting (0x80?) but should work anyway. It doesn't toggle cpu_frm_half and cpu_frm_quarter. It seems to reset the counter based on the first write, which would make sense.

---

The download link has a typo, btw! "VisualNEs.zip"
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#187063)
fred wrote:
I also tried "asl 0x4017" just because, it seems to write back 0xFF which wasn't what I was expecting (0x80?) but should work anyway.

Do we now need to wire up a set of caps representing the CPU data bus, in the same way that the PPU's data bus for CPU communication has the _io_db lines that act as dynamic latches?
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#187561)
Thanks for letting me know about the typo - it's fixed.

Like tepples implied, the data bus' behavior is not part of the simulation (the RAM/VRAM isn't part of the simulation, for example, so these are emulated on a higher level), and their behavior is potentially incorrect.

As far as I can tell, it seems to be writing $00, not $FF ($FF is the bus' value in the first half of the clock, but this switches to $00 when the write is performed, which occurs when cpu_clk0 goes low) - this seems to be the same on the 2A03, though. Like you said, this should probably be $80 - the top 3 bits on $4017 reads are always meant to be open bus. Note that I am far from being qualified to analyze this - so don't assume anything I just said is actually true. :)
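
For reference, one way an expected $80 could come about, as a hedged Python sketch: a 6502 read-modify-write instruction writes the unmodified value first and the modified value second. The $40 read value below is an assumption (open bus left over from the address high byte, no controller data), and the helper names are made up.

```python
def rmw_bus_writes(read_value, operation):
    """Return the two values a 6502 read-modify-write instruction writes."""
    dummy_write = read_value                    # unmodified value is written first
    final_write = operation(read_value) & 0xFF  # then the modified value
    return dummy_write, final_write


def asl(value):
    return value << 1


# Assumption: the $4017 read returns $40 (open bus in the top bits, nothing pressed).
print([hex(v) for v in rmw_bus_writes(0x40, asl)])  # ['0x40', '0x80']
```

Since the data bus is not part of the simulation, whatever the read actually returns in Visual Nes will decide what gets written back.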
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#187562)
Sour wrote:
$FF is the bus' value in the first half of the clock, but this switches to $00 when the write is performed (which occurs when cpu_clk0 goes low) - this seems to be the same on the 2A03, though.
That's ... weird.

During a write, the data bus should be stable for the entire time that the address bus is unchanging, i.e. both φ1 and φ0≈φ2≈M2

Unless maybe you're tickling problems with hold times?
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#187578)
lidnariq wrote:
During a write, the data bus should be stable for the entire time that the address bus is unchanging, i.e. both φ1 and φ0≈φ2≈M2
Thanks, that's good to know, and probably explains why I wasn't able to get the PPU writes working without a bit of a hack. The data bus is $FF for 6 master clocks and then switches to $00 for the last 6 clocks. But I just checked on the Visual 2A03 and it seems to keep the bus at $00 for the entire cycle in this case, like you said. This could be an issue with how the CPU/PPU are connected in the simulation, or some other bug in my code. I'll take a look when I get a chance.
Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02
by on (#187752)
I tried reading 0x2002 around the time when the nmi flag gets set (241.001). Below is the "read from address" cycle of LDA 0x2002... wasn't sure how to present the results, but maybe you guys'll understand.

Code:
  phi1        phi2
  v           v
B 0123456789AB0123456789AB  = 2002.7 set, nmi occurs (A.7 clear)
^        ^       ^       ^
338      339     340     0

  phi1        phi2
  v           v
B 0123456789AB0123456789AB  = 2002.7 set, nmi occurs (A.7 clear)
^        ^       ^       ^
339      340     0       1

  phi1        phi2
  v           v
B 0123456789AB0123456789AB  = 2002.7 never set, nmi doesn't occur (A.7 clear)
^        ^       ^       ^
340      0       1       2

  phi1        phi2
  v           v
B 0123456789AB0123456789AB  = 2002.7 set, nmi doesn't occur (A.7 set)
^        ^       ^       ^
0        1       2       3

  phi1        phi2
  v           v
B 0123456789AB0123456789AB  = 2002.7 set, nmi occurs (A.7 set)
^        ^       ^       ^
1        2       3       4

I had expected the last case to also block the NMI; that's what the wiki says, I think. I'm unsure how much clock alignment matters here, but that can't be changed in Visual Nes as far as I know.
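
The same five results as a small Python lookup table, for anyone who wants to compare against an emulator. The keys are the dot numbers shown under the first marker of each diagram; the names `OBSERVED` and `describe` are just for this sketch, and nothing here goes beyond what the diagrams above show.

```python
# Key: dot number under the first marker of each diagram above.
# Value: (2002.7 ends up set, NMI occurs, bit 7 of the value read into A)
OBSERVED = {
    338: (True, True, False),
    339: (True, True, False),
    340: (False, False, False),
    0: (True, False, True),
    1: (True, True, True),
}


def describe(dot):
    """Format one observed case as text."""
    flag_set, nmi, a7 = OBSERVED[dot]
    return "dot {}: 2002.7 {}, NMI {}, A.7 {}".format(
        dot,
        "set" if flag_set else "never set",
        "occurs" if nmi else "doesn't occur",
        "set" if a7 else "clear",
    )


for dot in OBSERVED:
    print(describe(dot))
```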


Edit:
Brief test of sprite overflow clearing (261.001):

Code:
  phi1        phi2
  v           v
B 0123456789AB0123456789AB  = 2002.5 cleared (A.5 clear)
^        ^       ^       ^
339      340     0       1

With dot 1 barely in on this cycle, it still manages to clear sprite overflow and return a 0 in bit 5. That should mean that if dot 1 is where 340 or 0 is in that diagram, overflow will also be seen as cleared. Haven't tested any other timing, though.