Mesen-S is a SNES emulator written from scratch, based entirely on available documentation (mostly anomie's docs), forum posts and test roms. It aims to (eventually) be a high accuracy emulator with a user friendly UI, a lot of features and debugging tools. Essentially, it aims to be Mesen for the SNES.
That being said, this is still in its infancy (I started writing this last month) and far from finished. Notably, it does not support any of the enhancement chips and a number of planned features are still missing (movies, cheats, netplay, etc.)
It currently uses a "pixel-based" renderer (it catches up the rendering mid-scanline as needed) and the timings should be fairly accurate (but there are definitely a large number of scenarios where they may be off by a few master cycles).
Just to be clear though, this is not bsnes-level accuracy, but in some regards it is probably more accurate than snes9x [citation needed].
A small number of games still have issues (freeze at boot, etc.), and some PPU effects are not perfectly accurate (e.g. the mosaic effect in particular has issues).
With all the caveats out of the way, here's what it does offer at the moment:
-Relatively high compatibility (I would guesstimate that over 90% of games that don't use extra chips appear to boot and run properly)
-Windows/Linux support (Linux support is something I put together this morning and haven't had the chance to test much yet, but it compiles and runs)
-Video filters
-Save states
-Rewinding
-Loading from zip/7z files
-Softpatching of IPS/BPS files
-SNES mouse support (Superscope is not supported yet)
-Recording to AVI/WAV
-Debugger, including:
-Watch expressions
-Breakpoints (w/ conditional breakpoints)
-Call stack
-Memory viewer/editor (including highlighting for recent reads/writes/exec, and data/code highlighting)
-Trace logger
-PPU viewer (tilemaps, tiles, palette)
-Event viewer (the same as Mesen's)
If you've ever used Mesen, you already know your way around Mesen-S. It reuses a lot of Mesen's code (and so looks very similar in terms of both code and UI), which has allowed me to put all of this together very quickly. This also means that some features haven't been tested thoroughly - so bugs are to be expected (I am aware of a number of issues, but if you find any, please report them here!)
Also, performance isn't great at the moment (somewhat slower than Mesen in general), it's something that I hope to improve over time.
Source:
https://www.github.com/SourMesen/Mesen-S/
Download:
https://www.mesen.ca/Mesen-S-0.1.0.zip
Website/Documentation: None for now.
Here's what it looks like at the moment (in terms of debugging tools):
Attachment: MesenS.png
Note: The emulator currently requires the 64-byte SPC bios to be put in Mesen's data folder (i.e the one you picked on startup) and named "spc700.rom". If the file cannot be found, the UI should print an error message about it when trying to load a game.
As usual, feedback, ideas and bug reports are very much welcome!
P.S: This is not an April Fools' joke, but I did release this now on purpose. :p
Wow, that's really exciting and cool! Congratulations on your first release! :D
Quote:
Just to be clear though, this is not bsnes-level accuracy
Someone of your talent will no doubt catch up quickly, and likely exceed what I've been able to.
Nonetheless, if you have any questions on things, feel free to DM me any time and I'll try to help if I can.
Great job, it passes all of my SNES CPU tests, even the STP instruction test which requires a reset to pass (& which crashes snes9x lol):
https://github.com/PeterLemon/SNES/tree/master/CPUTest/CPU
It is already shaping up to be a great SNES emu, thanks so much for this =D
Absolutely wonderful. I don't really have any other words right now, my mouth is still kinda agape. Romhackers incl. myself are going to be very, very happy with this and future improvements. This + bsnes-plus = fantastic.
byuu wrote:
Wow, that's really exciting and cool! Congratulations on your first release! :D
[..]
Nonetheless, if you have any questions on things, feel free to DM me any time and I'll try to help if I can.
Thanks! And thank you for bsnes/higan, too! A lot of this was only possible because I had bsnes-plus to compare against, which wouldn't have existed without bsnes in the first place. I'll most definitely have questions eventually when I get to the more obscure stuff (and probably some of the not-so-obscure stuff, too.).
krom wrote:
Great job, it passes all of my SNES CPU tests
Thanks for those tests, by the way! They were extremely useful when I started this and helped me fix a pretty large amount of CPU bugs (though I did find a few bugs that they don't seem to catch - if you're interested in adding some more test cases to them, let me know and I'll try to come up with a list of things I found so far)
koitsu wrote:
Absolutely wonderful. I don't really have any other words right now, my mouth is still kinda agape.
It's been a bit of a marathon trying to get this into a releasable state by April Fools'! I'll probably slow down my pace a little bit, but I should be able to add a lot of the missing stuff (compared to Mesen) over the next few months.
Less than one month and this. Crazy.
I've only used this a little bit but I have to say, the debugger is super nice. I was able to watch SRW3's event command queue change in real time in the memory viewer, which by itself is really neat. Then I used a breakpoint on write to memory range to find one byte in the ROM and make a change that'd stumped me for a few days, that I'd have otherwise needed to dig through dozens of megabytes of tracelogs to find. You've just saved me a *lot* of trouble
I did, however, hit a few bugs:
I brought up the memory viewer and tried typing in it at 7E:EDA1 I think it was. The whole emulator crashed.
(Not sure if this will repro, but I used Go To from the menu to jump to 7E:EDA0, scrolled up one line so I could see from 7E:ED90, then clicked at 7E:EDA1 and tried to type "5F". It crashed when I hit 5.)
When I brought the program back up, it wouldn't load the savestate I'd created prior to the crash.
I rebound a few keys - swapped the as and zx keys in the default arrow-key configuration, and when I restarted the emulator I noted that while the change appeared to have saved, I was getting strange behavior from the controller inputs. Hard to exactly describe, but it was either that some of the re-bound keys were ignored, or it was trying to use both my new settings and the defaults at the same time.
Those issues aside, this is already a *massive* improvement on all the other debugging-enabled SNES emulators that I've tried. I can easily see this replacing several pieces of my toolkit. Looking forward to seeing where you take it!
Aside from the latest version of NSFPlay, this is the second thing I've been super hyped about this year. Almost speechless.
The debugger is awesome. It is so nice to have a visual overview of all the register writes in the event viewer.
One thing I noticed was a breakpoint on register $2133 was triggering at cycle 340 but the event viewer only goes to 339 so it didn't show up there.
Really liking it so far! Got a couple bugs that might already be on your radar:
1. Mega Man X's copy protection is getting triggered. This is applicable to both 1.0 (CRC32 1033EBA4) and 1.1/Rev 1 (CRC32 DED53C64). It's somewhat random in how it works, but https://tcrf.net/Mega_Man_X#Copy_Protection has a nice write up on it. I actually didn't know Rev 1 HAD copy protection at all, but with both ROMs I ended up being thrown back to the intro stage after the first boss was defeated. I guess this is a cartridge database issue?, but thought I'd mention it anyway in case.
2. Changing the video scale doesn't automatically change the window size like it does in Mesen.
3. I also get a lot of audio clipping if I increase the Master Volume, in fact, 10 seems to be better than default of... 25 I think it was? No such issue in Mesen, which I have at 100% and everything is crystal clear. If I decrease the Master Volume in Mesen-S and just turn up Windows audio, it's also crystal clear. I'm on Windows 10, 1809. I'll see if there's an update to the audio drivers in a little bit too, might be something dumb like that.
geod wrote:
Less than one month and this. Crazy.
To be fair, it's been 7 weeks :p
Gideon Zhi wrote:
I've only used this a little bit but I have to say, the debugger is super nice
Happy to hear it's already useful! There is actually a ton of features still missing when you compare it to Mesen's debugger, I do plan on adding most of them over time.
Thanks for the bug reports - I'll try and see if I can reproduce the crash you got.
For the key bindings, it's possible you doubled up the bindings - when you press "Setup" for the controller, the window has 4 tabs allowing you to bind the same controller button to multiple keys, you might have some leftover bindings in the 2nd tab that are conflicting with the ones you've changed in the first tab?
paulb_nl wrote:
$2133 was triggering at cycle 340 but the event viewer only goes to 339 so it didn't show up there.
Damn, you've already discovered my terrible secret. At the moment the code emulates 341 regular cycles per line, rather than 338 regular cycles + 2 longer cycles, it comes out as the same number of master cycles but does have some implications (both with debug tools & emulation itself). I do want to fix the PPU to properly run the longer cycles, though, it's mostly a matter of figuring out the best way to do it with as little overhead as possible.
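The "same number of master cycles" remark can be checked with a bit of arithmetic. A minimal sketch, assuming a regular dot lasts 4 master clocks and a long dot lasts 6; the helper name is made up for illustration and is not from Mesen-S:

```cpp
// Hypothetical helper: master clocks in one scanline made of
// `normalDots` 4-clock dots and `longDots` 6-clock dots.
constexpr int masterClocksPerLine(int normalDots, int longDots) {
    return normalDots * 4 + longDots * 6;
}

// Simplified model described above: 341 regular dots per line.
static_assert(masterClocksPerLine(341, 0) == 1364, "341 * 4");
// Hardware-like model: 338 regular dots plus 2 long dots (340 dots total).
static_assert(masterClocksPerLine(338, 2) == 1364, "338 * 4 + 2 * 6");
```

Both models add up to 1364 master clocks per line, but the simplified one counts dots 0 to 340 while the other counts 0 to 339, which is consistent with a breakpoint firing at "cycle 340".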
AxlRocks wrote:
I guess this is a cartridge database issue?, but thought I'd mention it anyway in case.
That's good to know, I actually played MMX a bit, but did not get far enough to trigger it (didn't know that protection existed). It's most likely a case of incorrect memory mappings, like you said - the core currently just assumes there are only 2 board types ("lorom" and "hirom") and nothing else. Is there a proper database of cartridges available somewhere? The only thing I found when I looked for one before was a website with relatively vague/incomplete information.
The video scale thing has always been a bit of a pain in Mesen, too - I'll get it working... eventually.. :p
I'll have to see about the volume, I didn't really put much thought into the slider, I think it's pretty much the same as Mesen (25% is 1x, 100% is 4x). It's very possible it clips rather early, I'll have to test and readjust the default/formula a bit.
SuperFamicom.org is the only cart database I know of. Perhaps icarus in higan might help?
For a ROM whose size is not a power of two, you'll want to split it at the largest power of 2 no larger than the ROM and then double up the remainder as big as the first. For example, you'd split a 10 Mbit ROM into 8 and 2 Mbit pieces and then quadruple the 2 Mbit part.
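The split-and-repeat rule above can be sketched as a function that maps an offset in the mirrored image back to an offset in the ROM file. This is only an illustration of the rule as described in this post, not Mesen-S or higan code; `floorPow2` and `mirrorRomOffset` are hypothetical names, and it assumes ROM sizes small enough that `2 * first` does not overflow 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Largest power of two that is <= n (n must be non-zero).
uint32_t floorPow2(uint32_t n) {
    uint32_t p = 1;
    while (p <= n / 2) p *= 2;
    return p;
}

// Hypothetical helper: map an offset in the mirrored ROM image back to
// an offset inside the actual ROM of `romSize` bytes.
uint32_t mirrorRomOffset(uint32_t offset, uint32_t romSize) {
    assert(romSize > 0);
    const uint32_t first = floorPow2(romSize);
    if (first == romSize) {
        return offset % romSize;  // power of two: plain mirroring
    }
    offset %= 2 * first;          // the whole image repeats every 2 * first
    if (offset < first) {
        return offset;            // inside the first (largest) piece
    }
    // The remainder is repeated until it fills a window as big as the
    // first piece; recurse in case the remainder is not a power of two.
    return first + mirrorRomOffset(offset - first, romSize - first);
}
```

With the 10 Mbit example (1 Mbit = 131072 bytes), offsets in the 8 to 16 Mbit range all land inside the final 2 Mbit of the file, which then appears four times.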
Sour wrote:
Is there a proper database of cartridges available somewhere?
The only one I know of is this (and related), which isn't "exceedingly" helpful from an emulation POV, but it might be better than nothing. Possibly Cowering has something?
If you need documentation on what the majority of official PCBs do, address-space-mapping-wise, that information is available, but you might already have it; ask in PM.
Quote:
I do want to fix the PPU to properly run the longer cycles, though, it's mostly a matter of figuring out the best way to do it with as little overhead as possible.
Count every scanline as 1364 clocks, except for:
* NTSC scanline 240 field 1 in non-interlace mode (1360 clocks)
* PAL scanline 311 field 1 in interlace mode (1368 clocks).
When the counters are latched via $2137 reads or clearing $4201.d7 on writes, compute the dot position from the clock position. But don't forget that long dots don't exist in the 1360-clock NTSC scanline. A simple approximation:
Code:
return (hcounter - ((hcounter > 1292) << 1) - ((hcounter > 1310) << 1)) >> 2;
I don't yet know what happens with long dots on the PAL 1368-clock scanline.
It's also worth noting that the long dot start positions are fussy. 323 and 327 are the most likely, but I often see 322 and 326 become the long dots, and sometimes even 321 and 325. It can even change at run-time on the same system, so it's not like the NES startup phase states (the SNES has those too, but you can only get them if you reset the system via the cartridge port, ala the sd2snes.)
The CPU NMI and IRQ timing is based off of the Vblank and Hblank pin outputs from the PPU, and it keeps its own internal counters, which means that IRQ timings are not affected by long dots.
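As a rough sketch of the scanline rules above: the function and enum names here are hypothetical, the dot formula is the approximation quoted in this post (written with `? 2 : 0` instead of `<< 1`, which gives the same value), and the PAL 1368-clock line simply falls through to that formula as a placeholder since its long-dot behaviour is stated to be unknown.

```cpp
#include <cassert>

enum class Region { Ntsc, Pal };

// Master clocks in a scanline, following the two exceptions listed above.
int scanlineClocks(Region region, int scanline, int field, bool interlace) {
    if (region == Region::Ntsc && scanline == 240 && field == 1 && !interlace) {
        return 1360;
    }
    if (region == Region::Pal && scanline == 311 && field == 1 && interlace) {
        return 1368;
    }
    return 1364;
}

// Approximate dot position for a clock position within a scanline.
int dotFromClock(int hcounter, int lineClocks) {
    assert(hcounter >= 0 && hcounter < lineClocks);
    if (lineClocks == 1360) {
        return hcounter >> 2;  // short NTSC line: no long dots
    }
    // Same approximation as the one-liner quoted above.
    return (hcounter - (hcounter > 1292 ? 2 : 0) - (hcounter > 1310 ? 2 : 0)) >> 2;
}
```

A caller would compute `scanlineClocks(...)` once per line and call `dotFromClock` only when the counters are latched, so the per-clock cost stays at a plain counter increment.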
Quote:
Is there a proper database of cartridges available somewhere?
Right into the deep end, I see ^-^;
It's incomplete (I have around 1200 unique games in there so far), but this has been my little niche for the past several years. I personally dumped all of these games, their full 16MiB address range, and recorded how their mirroring worked. Yes, this has been *very* expensive and time consuming ^^;
https://preservation.byuu.org/
The raw database is here:
https://gitlab.com/higan/higan/tree/mas ... s/Database
The first link shows you how each PCB is mapped out on the bus, and the other links give you the PCB IDs of game cartridges to map them to. There are two cases so far where the PCB ID wasn't enough for an exact memory map, so I had to put #A in the PCB IDs.
Board PCB mappings are a huge undertaking. If you would rather use heuristics (which you'd have to for all the games not in my database yet anyway), I have heuristics that work on the entire licensed, released SNES game library here:
https://gitlab.com/higan/higan/blob/mas ... amicom.cpp
If you like, you may use my database or heuristics code linked above under the public domain, no credit necessary. You would have to adapt the heuristics code of course. Think of SNES PCB IDs as being similar to NES mappers. There's hundreds of them, they're just less important to get right and there's lots of overlap (multiple PCB IDs share the same memory mapping.) But there are lots of exceptions: games with coprocessors, games with flash memory, games with Game Boy, Sufami Turbo, and BS-X slots, games with both a BS-X slot and an SA1, etc. Realistically, you'll need at least 30 memory maps to run the entire library.
Or, if you'd rather the short of it, here's a crash course on heuristics:
* LoROM games under 2MB should map SRAM to 70-7d,f0-ff:0000-ffff (Wanderers from Ys requires SRAM in 0000-7fff)
* LoROM games >= 2MB should map SRAM to 70-7d,f0-ff:8000-ffff (Fire Emblem: Thracia 776 requires ROM in 0000-7fff)
* Distinguish the ST010 and ST011 based on the size of the ROM or on the internal game title
* Distinguish the SGB and SGB2 based on size of the ROM or on the internal game title
* For everything else, be as permissive as possible
Many prototype games documented by Evan at snescentral will not work with heuristics, because the SNES header bytes are all blank. You will either need a database for them (I did not make one for prototypes), a way to externally specify board mappings, or tell users to patch their ROMs to fix the bad headers (preferably with a BPS patch.)
Quote:
I'll have to see about the volume, I didn't really put much thought into the slider, I think it's pretty much the same as Mesen (25% is 1x, 100% is 4x). It's very possible it clips rather early, I'll have to test and readjust the default/formula a bit.
Correct. There is a game out there which clips if your volume goes above even 100%. SNES audio volume is just very quiet.
A fancy solution would be to detect clips and dynamically lower the audio volume. But end users may find that annoying.
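A minimal sketch of the "detect clips and lower the volume" idea, assuming 16-bit signed samples and the slider convention mentioned above (25 means 1x gain). `AutoGain` and its one-step back-off policy are made up for illustration; a real implementation would likely smooth the gain changes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical volume stage: `percent` is the slider value, 25 == 1x gain.
struct AutoGain {
    int percent;

    int16_t process(int16_t sample) {
        assert(percent > 0);
        int32_t scaled = static_cast<int32_t>(sample) * percent / 25;
        if (scaled > INT16_MAX || scaled < INT16_MIN) {
            // Clip detected: back the gain off one step for later samples...
            if (percent > 1) --percent;
            // ...and saturate this sample instead of letting it wrap around.
            scaled = scaled > 0 ? INT16_MAX : INT16_MIN;
        }
        return static_cast<int16_t>(scaled);
    }
};
```

At 25% the samples pass through unchanged; at 100% a sample of 10000 would scale to 40000, so it is clamped to 32767 and the gain drops to 99 for the next sample.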
Good to see that it finally came out! I will give it a try on linux but will completely ignore mac testing for now (never restored the partition and don't feel like it)
Some minor things:
- You could mention on the website that spc700.rom can be found in higan's source code archive as ipl.rom
- In the key bindings window, Shift is not differentiated. Also, the program disappears from the Alt+Tab list when the window is open. (Win7)
- Copy protection screen in DKC is triggered.
byuu wrote:
[...] and time consuming ^^;
Yeah...
AxlRocks wrote:
Mega Man X's copy protection is getting triggered.
Looks like this was a mirroring issue, tepples' explanation on mirroring ROMs that aren't powers of 2 seems to have fixed it. Will try to setup appveyor dev builds tonight & commit the fix for it.
byuu wrote:
The CPU NMI and IRQ timing is based off of the Vblank and Hblank pin outputs from the PPU, and it keeps its own internal counters, which means that IRQ timings are not affected by long dots.
This (and well, your whole post really) is actually very important information, thanks!
I was already using a slightly modified older version of your heuristics that you had posted on the zsnes boards like a decade ago. It's been working pretty well (far better than what I had done initially), but I haven't gotten to implementing Ex-HiRom and the like at all yet. The database info will definitely be useful whenever I need to confirm if the default mappings might be an issue or not. I might stick to heuristics for now (especially if they work for the vast majority of games), but might try incorporating the database eventually just for the sake of having accurate mappings for known boards.
SGB is probably not something I will end up supporting (or at least, not anytime soon), Sufami Turbo I hadn't even ever heard of yet!
But still, I think ~30 somewhat complex boards (e.g. coprocessors, etc.) is probably more manageable than the NES' 300+ comparatively simple boards. Or at least, they're more interesting to work on when compared to implementing the 150th slight variation of an MMC3 clone :|
Banshaku wrote:
I will give it a try on linux but will completely ignore mac testing for now
It's essentially the same framework as Mesen uses, so it probably won't work too well on macOS anyway. Still hoping to find a proper solution someday, though.
creaothceann wrote:
- In the key bindings window, Shift is not differentiated. Also, the program disappears from the Alt+Tab list when the window is open. (Win7)
- Copy protection screen in DKC is triggered.
Shift not being differentiated is a bit of a limitation to the way keys are processed in both Mesen & this one. I can't remember the exact details anymore, but I know I did actually try to fix this at some point in the past.. (did not find a viable solution, though)
I had assumed the whole disappearing from Alt-Tab when a tool window is opened to be a Windows issue (but maybe it's a WinForms issue?), I've always found this to be pretty annoying myself, though. I'll try and see if there's a solution for it.
Thanks for the DKC report - based on the tcrf, it's most likely a SRAM mapping issue, will take a look tonight.
Sour wrote:
Thanks for those tests, by the way! They were extremely useful when I started this and helped me fix a pretty large amount of CPU bugs (though I did find a few bugs that they don't seem to catch - if you're interested in adding some more test cases to them, let me know and I'll try to come up with a list of things I found so far)
Yes I would love to make my cpu tests more useful to help new SNES emulation authors, let me know anything you want me to add, & I'll try to implement it =D
Damn this is real?? I was reading this taking it half seriously thinking it was an April Fools' joke. A SNES emulator with both the Mesen and bsnes goodness sounds like a dream coming true!
byuu wrote:
Many prototype games documented by Evan at snescentral will not work with heuristics, because the SNES header bytes are all blank. You will either need a database for them (I did not make one for prototypes), a way to externally specify board mappings, or tell users to patch their ROMs to fix the bad headers (preferably with a BPS patch.)
I always thought prototypes come with blank headers because the authors just didn't bother filling them in on an unreleased game.
Anyway it sounds like sfc ROMs would benefit from an iNES-like format so that databases and heuristics can be avoided. The iNES/NES 2.0 header mess might want to be avoided though so maybe a more complete understanding of licensed games' mappings is needed before something like that is attempted.
Pokun wrote:
Damn this is real?? I was reading this taking it half seriously thinking it was an april fools joke.
Mission accomplished, I've managed to fool at least one person!
RE: Headered roms, I think that's essentially the problem that higan's game folders are meant to fix, without needing a binary format header in the rom itself. I know a lot of people apparently don't like them, but if you just zip them up and consider the zip file as a rom with a header, they're pretty much a functional solution, imo.
krom wrote:
let me know anything you want me to add, & I'll try to implement it
Looking at my commit history, here's a few things the tests don't catch at the moment:
Code:
-The CPUPHL test seems to rely on open bus behavior for $4210 reads, which is a good test in a sense, but also confusing when trying to debug it
-Shift operations when 8-bit memory flag is enabled shouldn't affect the MSB of the A register
-MVP/MVN:
-Check that the destination/source bank are correct (I had inverted them by mistake)
-Check behavior when A is set to $FFFF
-Check that the value of DBR is altered by the instruction
-Check the wrap behavior for X/Y when 8-bit indexes are enabled
-Check that MVP/MVN can be interrupted by an interrupt (this one might be a bit outside the scope of CPU-only tests, though)
-TSC/TDC/TCD always transfer 16 bits, regardless of the 8-bit memory flag
-I think BRK/COP aren't tested at all? Could be wrong - I had forgotten about the signature byte and so RTI wasn't returning to the correct address.
-HDMA:
-Check that the "fixed transfer" flag is ignored by HDMA
-Check that the "decrement" flag is ignored by HDMA
-Check that DMA gets cancelled by HDMA
-Validate some of the DMA restrictions (e.g trying to DMA from work ram to work ram, etc.)
I think those are the main ones that aren't PPU-related at all. The ADC/SBC tests don't catch everything, either, but since blargg made some tests for those particular instructions, it's not a big problem. The majority of these I found and fixed by debugging games that didn't work properly, so having test roms to validate these would have saved me a lot of time (and having them for regression testing would still be nice, too). That being said, please don't feel like you *have* to add these! At the very least maybe this list will someday be useful for someone else trying to debug their emulator :p
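To illustrate one item from the list (shifts with the 8-bit memory flag must leave the upper byte of A alone), here is a minimal sketch of ASL on the accumulator. `CpuState` is a stripped-down, hypothetical structure for this example only and is not how Mesen-S represents the CPU.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, stripped-down CPU state for illustration only.
struct CpuState {
    uint16_t a = 0;        // full 16-bit accumulator
    bool memory8 = false;  // M flag: true = 8-bit accumulator/memory
    bool carry = false;
    bool zero = false;
    bool negative = false;
};

// ASL on the accumulator.
void aslAccumulator(CpuState& s) {
    if (s.memory8) {
        uint8_t lo = static_cast<uint8_t>(s.a & 0xFF);
        s.carry = (lo & 0x80) != 0;
        lo = static_cast<uint8_t>(lo << 1);
        // Only the low byte changes; the high byte of A is preserved.
        s.a = static_cast<uint16_t>((s.a & 0xFF00) | lo);
        s.zero = lo == 0;
        s.negative = (lo & 0x80) != 0;
    } else {
        s.carry = (s.a & 0x8000) != 0;
        s.a = static_cast<uint16_t>(s.a << 1);
        s.zero = s.a == 0;
        s.negative = (s.a & 0x8000) != 0;
    }
}
```

A test ROM for this case would load A with something like $12C0 in 16-bit mode, switch to 8-bit, shift, switch back and verify that A reads $1280 rather than $2580.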
Pokun wrote:
Anyway it sounds like sfc ROMs would benefit from an iNES-like format so that databases and heuristics can be avoided.
On the contrary, we should go back in time and obliterate all traces of ines. Databases aren't copyrighted and are immune to the update issues the NES scene wallows in daily.
I'm going to make this crystal clear: any attempt to "invent a new file format" to relieve whatever absolute nonsensical pedantry is posted by users in this thread will completely and utterly fail. Stop talking about it. The existing widespread formats already in use won (read: what came first is what stuck). 98% of the userbase will not switch fully functional ROMs to whatever you come up with. The methodologies being used to identify this as it stands (re: heuristics and internal database) are completely reasonable.
I will not be commenting on this matter further, as there is well-established history of people trying to invent "better crap" in the past (for both consoles) and failing miserably. Multiple times. Fin.
Sour wrote:
Shift not being differentiated is a bit of a limitation to the way keys are processed in both Mesen & this one. I can't remember the exact details anymore, but I know I did actually try to fix this at some point in the past.. (did not find a viable solution, though)
I had assumed the whole disappearing from Alt-Tab when a tool window is opened to be a Windows issue (but maybe it's a WinForms issue?), I've always found this to be pretty annoying myself, though. I'll try and see if there's a solution for it.
The Shift key(s) seem to confuse the GUI a lot... really just minor issues though.
More notes:
- When enabling "Show FPS", exiting the program and restarting it, the FPS is still displayed but the menu item is unchecked.
- enabling G-Sync shows a frame rate of 60/52. The 60 occasionally becomes a 61 (as expected) and the 52 also changes sometimes. Here's a video. (At the end of it I switched to another program and back to Mesen-S.)
- The Electronics Test in Nintendo's test cart (No-Intro: "World Class Service Super Nintendo Tester (USA).sfc") fails sometimes; also a line at the top is flickering.
- Here's a comparison of the graphics test with bsnes v87 (via BizHawk). bsnes is to the left.
- ActRaiser 2 (USA) doesn't get to the title screen (because it uses Direct Color mode?)
- Axelay level 2 is broken here.
- Chrono Trigger's time warp is broken.
- In Jurassic Park, when you wait on the title screen it transitions via mosaic to the highscore list. This transition seems to be broken. During gameplay the left and right borders aren't cropped.
- In Super Metroid after the title screen, the menu transitions that don't change the screen brightness don't look right (mosaic).
Sour wrote:
Looking at my commit history, here's a few things the tests don't catch at the moment:...
Thanks for the list, I'll see what I can do to improve my tests, using the help you have given me =D
I have come to agree with koitsu. All of my attempts have failed, and I burned up a lot of good will in trying to force the issue, which I regret. My current stance is that we will never replace the current file formats and must support them, but that we can support our alternatives quietly as an *option* for people who care. With the appropriate reverence of insight from blargg that this *does* cause community fragmentation and confusion, and increase complexity for future authors like Sour (see xkcd.com/927). I don't think there's an easy answer here. People are going to be unhappy no matter what.
I sent Sour a DM with detailed information on these (and no requests to support any of it my way, don't worry), but for everyone else ... here's a short list of the complications of SNES emulation:
* coprocessor firmware files, including games that shipped with both DSP1 and DSP1B
* homebrew that wishes to write custom coprocessor firmware
* prototypes with missing or invalid headers
* Rockman X v1.0 has bodge wiring on the PCB to unmap the last 8mbit of address space to avoid triggering copy protection
* Super Game Boy carts that take Game Boy sub-cartridges (and black GBC cartridges)
* Nintendo Super System per-game DIP switches to toggle lives, difficulty, etc
* the Nintendo Super System machine which has three game slots
* the Super Famicom Box machine which has three game slots
* Super Famicom Box cartridges which hold 2-3 games each
* Campus Challenge '92 and Powerfest '94 which hold a menu ROM plus three games (No-Intro stores them as separate ROMs)
* BS-X slotted cartridges that take BS Memory Packs, which are writable and maintain block erase counters
* JRA / SPAT cartridges that have writable flash memory
* the Voicer-kun (official) and MSU1 (unofficial) add ~20 or so additional CD audio tracks to games
* homebrew often makes up new "mappers", like ExLoROM, the Tengai Makyou Zero fan translation, and neviksti's SDD1 removal patch
* Sufami Turbo cartridges that take *two* Sufami Turbo games at once (consider save states in linked mode)
* peripherals like the Super Turbo File store game saves inside of them (consider where to save that memory to)
* IPS patches will always have a 50% failure rate due to copier headers
* and a few more I'm likely forgetting ...
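On the copier header point in the list above: a common convention (an assumption here, not something stated in this thread) is that a copier header is 512 bytes, so a file whose size modulo 1024 equals 512 is treated as headered. A minimal sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kCopierHeaderSize = 512;

// Heuristic: ROM images are multiples of 1 KiB, so 512 leftover bytes
// are assumed to be a copier header.
bool hasCopierHeader(std::size_t fileSize) {
    return fileSize % 1024 == kCopierHeaderSize;
}

// Offset of the actual ROM data inside the file.
std::size_t romDataOffset(std::size_t fileSize) {
    return hasCopierHeader(fileSize) ? kCopierHeaderSize : 0;
}

// Size of the ROM data once any header is skipped.
std::size_t romDataSize(std::size_t fileSize) {
    const std::size_t offset = romDataOffset(fileSize);
    assert(fileSize >= offset);
    return fileSize - offset;
}
```

This only tells you whether the ROM file is headered. It cannot tell you whether a given IPS patch was made against a headered or unheadered ROM, which is where the failure rate comes from.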
Any SNES emulator that wants 100% completeness is going to have to address all of these. In the end, I was able to support most of it with bsnes v107, following traditional and historical ways of distributing games. In fact, bsnes is pragmatic and supports 2-3 alternative methods for most points on that list. A few exceptions require higan, but those cases are not emulated by any other SNES emulators, and are very niche.
If anyone wants to beat a dead horse with me, please switch to DMs or post in a new thread. I don't want to see Sour's thread derailed with this stuff.
Quote:
whatever absolute nonsensical pedantry
The same ASD that led to me spending 15 years perfecting an emulator for a retro video game system is what also drives my obsession with supporting absolutely everything I possibly can. I get huge levels of anxiety when things aren't "perfect" in my mind. I can't really separate the two, nor is this a condition that can be cured. I know how annoying I've been about all of this, and you all have my apologies for that. I'll do my best going forward to act professionally. I'm hopeful that Sour's emulator will be a worthy alternative for people who are tired of putting up with me, and give me some more room to experiment with my emulator.
creaothceann wrote:
More notes:
Thanks for testing!
There was an off-by-1 bug that happened when overscan mode was off (which is the source of the random line showing up on screen & seems to explain most of the differences with the bsnes video), this should be fixed.
Axelay was a bug in the offset-per-tile logic, should be fixed too.
CT was a bug where some sprites were not being shown when they should have been, this is fixed.
The borders in Jurassic Park should be fixed, was caused by color math not being applied to the subscreen pixels in hires mode.
The screen transition issues with Jurassic Park and Metroid are caused by the buggy mosaic implementation - pretty much any game that makes use of mosaic effects doesn't show up quite right.
About G-Sync, I don't actually own a monitor that supports it, so it's hard for me to say. FYI, the first number is the number of frames the emulation core generated in the last second, and the 2nd number is the number of frames that were pushed to the video card in the last second. The only thing I could see that could make it fall below 60 is if the video card was freezing the rendering thread for an abnormal amount of time while waiting on vsync. I'd imagine you'd get the same result on Mesen, though.
ActRaiser 2 seems like it's freezing due to an SPC bug (much like Illusion of Gaia). There are a couple of games that no longer work after I replaced blargg's SPC core with my own (though my core also fixed 20+ games that did have SPC-related issues, too) - ActRaiser 2 looks like it's one of the regressions. I'll try to see if I can figure out why this one freezes, maybe I'll have better luck with it than with Illusion of Gaia.
Haven't had the chance to check DKC yet, will try to figure that one out tomorrow.
Also, I added appveyor dev builds (they're linked in the readme on github), so these fixes should be available in the latest appveyor build.
byuu wrote:
I sent Sour a DM with detailed information on these
Thanks for that, very useful information that will save me from having to rewrite some portions of the code when I inevitably stumble on some hardware I didn't foresee could exist (which will definitely still happen, but hopefully just a little bit less often :p)
::four fixes in a row::
Holy heck, you are on fire!
Quote:
About G-Sync, I don't actually own a monitor that supports it, so it's hard for me to say.
The main challenge of adaptive sync is that you need your audio buffers to be extremely small, eg WASAPI exclusive or ASIO. Since your only synchronization is now based on audio, if they're too big, the stalls waiting on audio buffers to empty will become too inconsistently spaced and scrolling will become choppy.
Quote:
There are a couple of games that no longer work after I replaced blargg's SPC core with my own (though my core also fixed 20+ games that did have SPC-related issues, too)
Speaking of ... blargg's DSP core is almost perfect. The one flaw is that it reuses the 128-byte RAM block for direct register values. On real hardware, the registers and underlying RAM are separate. Magical Drop will hang on game over in endless mode if the underlying RAM is not randomized (ENDX can't equal zero), but if you actually randomize the real registers, then King of Dragons won't work (expects internal KOFF register to be zero.) Well, that and muting the DSP isn't instantaneous on real hardware, but that's getting pedantic.
A simple fix is to add a second 128-byte RAM block. Write to both, but only read from the one you randomize on reset.
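One possible reading of that suggestion, as a minimal sketch rather than blargg's or Mesen-S's actual code: keep one block that the DSP core works from (zeroed on reset, so the internal KOFF is zero) and a second block that register reads are served from (randomized on reset). The class and method names are hypothetical, and registers the DSP itself updates (ENDX, ENVX, OUTX) would also need to be mirrored into the read-side block, which is not modeled here.

```python
import random


class DspRegisterFile:
    """Two 128-byte blocks: internal register state plus a read-back copy."""

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.internal = bytearray(128)  # what the DSP core operates on
        self.readback = bytearray(128)  # what a register read returns
        self.reset()

    def reset(self):
        for i in range(128):
            self.internal[i] = 0                        # real registers start cleared
            self.readback[i] = self.rng.randrange(256)  # underlying RAM is random

    def write(self, addr, value):
        # Writes land in both blocks.
        addr &= 0x7F
        self.internal[addr] = value & 0xFF
        self.readback[addr] = value & 0xFF

    def read(self, addr):
        # Reads only ever come from the randomized block.
        return self.readback[addr & 0x7F]
```

With this layout the core would consult `internal[0x5C]` (KOFF) while a read of `0x7C` (ENDX) returns whatever the randomized block holds until something writes it.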
Quote:
very useful information that will save me from having to rewrite some portions of the code when I inevitably stumble on some hardware I didn't foresee could exist
Thanks for taking it well. I'm trying not to overload you with minutiae and scare you off ^^;
The SNES is endlessly complex, but as long as you take it one step at a time, each task is very manageable. It's a very rewarding system to work on, and I hope Mesen-S will inspire more emudevs to try.
Sour wrote:
About G-Sync, I don't actually own a monitor that supports it, so it's hard for me to say. FYI, the first number is the number of frames the emulation core generated in the last second, and the 2nd number is the number of frames that were pushed to the video card in the last second. The only thing I could see that could make it fall below 60 is if the video card was freezing the rendering thread for an abnormal amount of time while waiting on vsync. I'd imagine you'd get the same result on Mesen, though.
As far as I know the video card shouldn't block programs at all, it just pushes out a frame as soon as it receives one and then just waits for the next frame. It only waits for the monitor's vsync if you push frames faster than the monitor's specifications allow. I'll test Mesen later today. For now I just disable G-Sync.
I played Chrono Trigger up to Heckran Cave (to test the noise effect), didn't notice any other errors.
byuu wrote:
[...] Since your only synchronization is now based on audio [...]
Wouldn't it be possible to sync on the CPU's own high resolution timer (QueryPerformanceCounter in Windows)?
EDIT: Same thing with G-Sync happens in Mesen.
creaothceann wrote:
Wouldn't it be possible to sync on the CPU's own high resolution timer (QueryPerformanceCounter in Windows)?
Yes, this is exactly what I do - high performance counters & sleep calls to regulate frame rate, and dynamically resample the audio to make sure it stays in sync.
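As a rough illustration of that kind of frame pacing (a sketch only, not Mesen's actual code), the limiter below sleeps until a deadline measured on a high-resolution monotonic clock. `FrameLimiter` and its parameters are hypothetical names; the clock and sleep functions are injectable so the logic can be exercised without real delays.

```python
import time


class FrameLimiter:
    """Paces frames against a monotonic high-resolution clock."""

    def __init__(self, fps=60.0, clock=time.perf_counter, sleep=time.sleep):
        self.frame_time = 1.0 / fps
        self.clock = clock
        self.sleep = sleep
        self.next_deadline = clock() + self.frame_time

    def wait_for_next_frame(self):
        """Sleep until the next frame deadline; return the time slept."""
        remaining = self.next_deadline - self.clock()
        if remaining > 0:
            self.sleep(remaining)
            self.next_deadline += self.frame_time
        else:
            # Running late: re-anchor instead of trying to catch up.
            self.next_deadline = self.clock() + self.frame_time
        return max(remaining, 0.0)
```

Calling `wait_for_next_frame()` once per emulated frame keeps the average rate at `fps`. Real code would usually sleep slightly short and spin for the remainder, since OS sleep calls tend to overshoot.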
Thanks for confirming it's the same on Mesen, at the very least it reduces the possible causes, though I'm still not quite sure what might be causing it. I'll keep it in mind and try to review that code eventually, but without a g-sync monitor it may be rather hard to fix (and I don't exactly want to buy a 300+$ monitor that I would never use again just to debug this :p)
Thanks for testing so much, by the way, really helpful to get bug reports like these. Let me know if you find anything else!
byuu wrote:
Speaking of ... blargg's DSP core is almost perfect. The one flaw is that it reuses the 128-byte RAM block for direct register values.
Yea, I don't really plan on rewriting the DSP core anytime soon (maybe much later on, just for the sake of having written my own). I think I saw you mention that register initialization problem recently somewhere else, too (maybe in another thread on here?). I'll add it to my list of things to check/fix, thanks!
byuu wrote:
The SNES is endlessly complex, but as long as you take it one step at a time, each task is very manageable. It's a very rewarding system to work on, and I hope Mesen-S will inspire more emudevs to try.
It seemed pretty daunting at first with the 100+ registers to implement and the seemingly endless number of PPU features, but being able to use Mesen's code as a stepping stone allowed me to waste far less time on non-emulation related code, so it's been surprisingly smooth sailing so far. I've been gathering/reformatting info/documents into a wiki of my own as I go, which helped a lot in getting familiar with everything (and having a reference I'm familiar with to consult whenever I need to confirm some details)
Sour wrote:
I'll keep it in mind and try to review that code eventually, but without a g-sync monitor it may be rather hard to fix (and I don't exactly want to buy a 300+$ monitor that I would never use again just to debug this :p)
Not a problem, but other users might encounter it too. Might just be an issue of my setup, or all G-Sync monitors, or all VRR implementations. Maybe someone else here has one?
Sour wrote:
I've been gathering/reformatting info/documents into a wiki of my own as I go, which helped a lot in getting familiar with everything (and having a reference I'm familiar with to consult whenever I need to confirm some details)
fullsnes EDIT:
krom wrote:
Sour wrote:
Thanks for those tests, by the way! They were extremely useful when I started this and helped me fix a pretty large amount of CPU bugs (though I did find a few bugs that they don't seem to catch - if you're interested in adding some more test cases to them, let me know and I'll try to come up with a list of things I found so far)
Yes I would love to make my cpu tests more useful to help new SNES emulation authors, let me know anything you want me to add, & I'll try to implement it =D
A perfect CPU test set would be quite difficult though... You'd have to test that an emulator's implementation does the right thing, taking the exact amount of cycles as the real hardware, and does it for all valid inputs and under all scenarios (i.e. internal states).
Quote:
Wouldn't it be possible to sync on the CPU's own high resolution timer (QueryPerformanceCounter in Windows)?
Yes, but now you need DRC for audio to keep your buffers half-filled at all times.
Maybe we need adaptive sync for sound cards, a good old-fashioned PCM DAC register that you write the current sound card output sample to (as if :P)
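To make the "keep your buffers half-filled" idea concrete, here is a minimal sketch of dynamic rate control, assuming a simple proportional rule. The function name, the `max_adjust` limit of 0.5% and the linear response are all illustrative assumptions, not how bsnes or Mesen-S actually do it.

```python
def drc_ratio(buffered, capacity, max_adjust=0.005):
    """Resampling ratio that nudges the audio buffer toward half full.

    A result above 1.0 means "generate slightly more output samples"
    (the buffer is draining); below 1.0 means "generate slightly fewer".
    """
    fill = buffered / capacity              # 0.0 (empty) .. 1.0 (full)
    error = max(-0.5, min(0.5, 0.5 - fill))  # positive when under half full
    return 1.0 + 2.0 * error * max_adjust
```

The emulator would multiply its nominal resampling rate by this ratio every time it pushes audio, so small timing drift is absorbed as an inaudible pitch change instead of an underrun or a stall.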
Quote:
Thanks for testing so much, by the way, really helpful to get bug reports like these. Let me know if you find anything else!
Here's my evil games list, in case you wanted to test them, creaothceann:
* Speedy Gonzales (stage 6-1) [requires HDMA to update the open bus latch to break an infinite loop polling an unmapped register]
* Mecarobot Golf [picky on timing in-game]
* Jumbo Osaki no Hole in One Golf [picky on timing at the name entry screen]
* Koushien 2 [requires cycle-accurate DSP or echo RAM trashes SMP code] (if you ever replace blargg's DSP with your own, test this one)
* Air Strike Patrol [plane shadow and level start rotating text, mid-scanline writes]
* Battle Blaze [title screen flame effect and raster effect after winning battles]
* Sink or Swim [levels would break up]
* Magical Drop [endless mode game over screen hanging]
* Super Bonk [the intro sequence sometimes desyncs, but this also happens on real hardware, I never did find out why]
* Bishoujo Janshi Suchie-Pai [character sprites] and Marvelous (SA-1) [dialogue boxes] [very picky hires color add/sub effects]
* Goodbye Anthrox (atx2.sfc) [trainer that changes BGMODE mid-scanline]
* Krusty's Super Fun House [you have to block invalid DMA transfers to prevent palette corruption]
* Bugs Bunny [in-game] [requires that HDMA during DMA stops the DMA]
* Taz-Mania [in-game] [only if you emulate the MUL/DIV cycle states, this game reads the registers early]
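The Speedy Gonzales entry in the list above depends on open bus behaviour, which can be sketched as follows. This is a deliberately simplified model with hypothetical names: a single bus with one latch, where every read or write (including one made by an HDMA transfer) updates the latch, and reads from unmapped addresses return it. The real SNES has more than one bus and more than one latch.

```python
class Bus:
    """Toy bus with an open-bus latch."""

    def __init__(self, mapped):
        self.mapped = dict(mapped)  # address -> byte; stand-in for real devices
        self.open_bus = 0           # last value driven on the data bus

    def read(self, addr):
        if addr in self.mapped:
            self.open_bus = self.mapped[addr]
        # Unmapped addresses fall through and return the stale latch value.
        return self.open_bus

    def write(self, addr, value):
        self.open_bus = value & 0xFF
        if addr in self.mapped:
            self.mapped[addr] = self.open_bus

    def hdma_transfer(self, src, dest):
        # HDMA goes through the same read/write paths, so it updates the latch.
        self.write(dest, self.read(src))
```

A loop polling an unmapped register sees the latch change as soon as an HDMA transfer moves a different byte across the bus, which is what lets such a loop exit.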
Quote:
and the seemingly endless number of PPU features
What makes the PPU so daunting is that people don't really explain how the features all work together. Everything has to be done in a very certain order, and the settings for one thing change the effects of another thing.
With the NES, you get this wonderful break-down of every single cycle. It was trivial to run Battletoads with information that good.
Quote:
A perfect CPU test set would be quite difficult though... You'd have to test that an emulator's implementation does the right thing, taking the exact amount of cycles as the real hardware, and does it for all valid inputs and under all scenarios (i.e. internal states).
Most of my tests were like that. Mesen might be the first other emulator capable of using them.
(un)fortunately, I don't see the test_nmi, test_irq test ROMs up on snescentral ... odd. I always seem to lose those ones.
byuu wrote:
Here's my evil games list, in case you wanted to test them, creaothceann: [...] [you have to block invalid DMA transfers to prevent palette corruption]
Krusty I fixed a few days ago, ASP *almost* works properly, but you can tell the shadow is not quite right (the timings are probably off by a couple of pixels) and the level start text has a glitch on the right hand side. I was aware of the whole Speedy Gonzales thing, but not the specific quirk it relied on. HDMA should be updating open bus like any other regular write, but it's possible open bus itself isn't quite correct yet. The others I don't think I've really tested yet, thanks for the list!
Quote:
What makes the PPU so daunting is that people don't really explain how the features all work together. Everything has to be done in a very certain order, and the settings for one thing change the effects of another thing.
Which is exactly the problem with my mosaic implementation at the moment :p There's something about how it's applied that I'm not quite getting yet.
Quote:
Most of my tests were like that. Mesen might be the first other emulator capable of using them.
(un)fortunately, I don't see the test_nmi, test_irq test ROMs up on snescentral ... odd. I always seem to lose those ones.
Speaking of which, are the ones up on snescentral essentially all the ones you've written? I pass half or so of those I could find there at the moment (the HDMA timing ones fail, but it's probably due to a combination of factors beyond just HDMA). Beyond that I have krom's tests, a number of tests by blargg (adc/sbc, oam, mul timing, both sets of spc tests) and that's more or less it. Am I missing anything major?
creaothceann wrote:
A perfect CPU test set would be quite difficult though
blargg pretty much created a nearly perfect set of test roms for the NES - if you pass all of his tests, you can essentially assume your emulator will run 99% of games without any issue. Really wish the SNES test roms were as complete as those, but beggars can't be choosers. I unfortunately completely lack the skills required to write any meaningful tests, and also lack the equipment needed to run them :p
Edit: Also, DKC should be fixed now, was a save ram mirroring issue as expected (my code was broken when trying to mirror SRAM sizes smaller than 4kb)
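For readers unfamiliar with this kind of bug, a minimal sketch of save RAM mirroring follows. It assumes the simplest possible rule (the chip repeats throughout the window it is mapped into) and uses hypothetical names; the details of the actual Mesen-S fix are not shown in this thread.

```python
class SaveRam:
    """Save RAM that mirrors across a larger mapped window."""

    def __init__(self, size):
        if size <= 0:
            raise ValueError("save RAM size must be positive")
        self.data = bytearray(size)

    def _index(self, offset):
        # Wrapping by the real chip size makes every access land inside the
        # array, however large the mapped window is.
        return offset % len(self.data)

    def read(self, offset):
        return self.data[self._index(offset)]

    def write(self, offset, value):
        self.data[self._index(offset)] = value & 0xFF
```

With a 2 KB chip in a 4 KB window, offsets `0x000` and `0x800` refer to the same byte, so a game that writes through one mirror and verifies through another sees consistent data.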
Quote:
Which is exactly the problem with my mosaic implementation at the moment :p
Mine is currently not perfect either. H mosaic and V mosaic are quite different. V mosaic has odd restart rules for messing with it during rendering, and there's a weird quirk with mode 7 EXTBG and mosaic. But then, EXTBG mode is another thing I don't fully emulate ... that actually has effects on all eight BG modes. Just not useful ones.
Quote:
Speaking of which, are the ones up on snescentral essentially all the ones you've written?
Oh no, it's a small number of them. I put them all on a flash drive for a disk reformat, then lost it for a while, and by the time I found them, they bit-rotted away. I still have all the files and source, but their file sizes are zero bytes.
The tests you most want to pass are demo_nmi, demo_irq, test_nmi, test_irq, test_dma*, test_hdma*. You can pretty much assume all the games with IRQ issues (Battle Blaze, Cu-On-Pa, etc) and HDMA issues (Mecarobot Golf, Energy Breaker, Circuit USA, etc) work if you pass all of those.
blargg's MUL timing test doesn't work on real hardware, but I have memory that it used to. I suspect Evan found an outdated copy of it, maybe?
Quote:
and also lack the equipment needed to run them
You've more than proven yourself worthy. If you'd like the means to test your code on real hardware, I can send you my sd2snes. If you don't want to provide an address to send it to (I wouldn't blame you), I can Paypal you some funding for one.
I feel you on test ROMs and documentation. My efforts were super lazy there.
You probably will want to dig through the bsnes source code though. A lot of my findings I never wrote about. A few off the top of my head:
* the DRAM refresh point plus the HDMA frame initialization point is different between CPUr1 and CPUr2
** (^ likely to fix the CPUr1 crash when performing DMA and HDMA at the same time)
* when OAM interlace is on, and sprite size is 0, OAM sizes 6 and 7 are halved to {width}x16 instead of {width}x32
* HDMA does all of its transfers first, and *then* updates all of the addresses as needed
** (^ otherwise you might run out of Hblank time for really complicated 8-channel HDMAs)
* if the final HDMA transfer on an entire frame is indirect, the last HDMA indirect address only fetches the low byte, and then sets the channel indirect address to lowbyte<<8 (so the channel low byte is 0x00), and it saves one cycle
* if an interrupt occurs during an instruction that is {opcode fetch + I/O cycle}, eg nop, inc, etc: then the I/O cycle is transformed into a bus read cycle from the current PC address instead. If the bus read is from a slow ROM region, this will cost another 2hz of time to pass.
** (^ I spent weeks tracking that down because latching IRQ / NMI PPU counters kept being off slightly from real hardware. It took forever to test every possible thing it could be on my old SNES copier.)
* if you write to VRAM on the *very* last possible cycle before frame rendering starts, it writes the CPU open bus value to VRAM instead of the value you wrote to it
* VIRQs and HVIRQs fire one scanline later than expected on the very first frame after reset (likely the CPU counter is misaligned)
* it takes either 186 or 188 cycles to complete a CPU reset before code starts executing (it varies)
* CPU soft reset acts just like a regular interrupt: it pushes the PC address and P register onto the stack, but there was a gotcha because the CPU switches to emulation mode on reset which causes stack wrapping around $01xx.
But there's definitely more I'm forgetting in there ...
I just wish there was something like the nesdev wiki for the SNES which has very complete information for both emulator authors and homebrew developers alike. Fullsnes is nice and superfamicom.org exists, but it's still not as complete.
Sour wrote:
Pokun wrote:
Damn this is real?? I was reading this taking it half seriously thinking it was an april fools joke.
Mission accomplished, I've managed to fool at least one person!
Hahaha! So it WAS an april fools joke but in reverse... or something!? Ah my head hurts!
byuu wrote:
Quote:
whatever absolute nonsensical pedantry
The same ASD that led to me spending 15 years perfecting an emulator for a retro video game system is what also drives my obsession with supporting absolutely everything I possibly can. I get huge levels of anxiety when things aren't "perfect" in my mind. I can't really separate the two, nor is this a condition that can be cured. I know how annoying I've been about all of this, and you all have my apologies for that. I'll do my best going forward to act professionally. I'm hopeful that Sour's emulator will be a worthy alternative for people who are tired of putting up with me, and give me some more room to experiment with my emulator.
That nonsensical pedantry and research has given us a wealth of information and made the SNES emulation and development scenes go forward (which can't be said about for example Nintendo 64 emulation until much more recently). I wholly understand your sense of perfectionism.
I just have to say that I also agree with Koitsu, and I'm not a fan of making new formats and standards left and right (for the same reason as in that linked xkcd strip). But I think this discussion will come up every time SNES "mappers" are talked about, and like Byuu said, there might not be a good solution to it.
Anyway I tested Mesen-S a bit. I like how similar it is to the great Mesen which has a very intuitive interface. It did hang a few times though when setting up input buttons.
I checked Mesen-S on my problem games and some games have the same issues that I had on my FPGA SNES.
1. Wild Guns, flickering
TSB $66 must be executed before the NMI handler runs.
[Attachment: Снимок.PNG]
https://board.zsnes.com/phpBB3/viewtopic.php?p=77276&sid=bf8f0333cabdc554baa99bf8b3946eb4#p77276
2. Robo Cop Vs Terminator, flickering
If V-IRQ is enabled and the current scanline equals VTIME, the IRQ is set immediately. The scanline 31 IRQ handler finishes on scanline 32 and the scanline 32 IRQ handler should then run immediately, but in Mesen-S the scanline 32 IRQ handler executes on the next frame, so the current frame remains in force vblank mode.
[Attachment: Снимок2.PNG]
3. Uniracers
https://github.com/MiSTer-devel/SNES_MiSTer/issues/26
https://forums.nesdev.com/viewtopic.php?f=12&t=18447
I still have one issue with HV-IRQ in Top Gear 3000, but that one uses the DSP4 chip.
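The Robo Cop Vs Terminator behaviour from item 2 can be sketched like this. It is a simplified model based only on the description in that item: the horizontal position is ignored and the class and method names are hypothetical. The key point is that the comparison is re-run whenever VTIME or the enable bit is written, not only when the scanline counter advances.

```python
class IrqTimer:
    """Toy V-IRQ comparator (horizontal position ignored)."""

    def __init__(self):
        self.scanline = 0
        self.vtime = 0x1FF
        self.virq_enabled = False
        self.irq_pending = False

    def _check(self):
        if self.virq_enabled and self.scanline == self.vtime:
            self.irq_pending = True

    def set_scanline(self, line):
        self.scanline = line
        self._check()

    def write_vtime(self, value):
        self.vtime = value & 0x1FF
        self._check()  # a write that matches the current line fires right away

    def write_enable(self, enabled):
        self.virq_enabled = enabled
        self._check()

    def acknowledge(self):
        # Stands in for the handler reading TIMEUP ($4211).
        self.irq_pending = False
```

In the scenario above, the line 31 handler is still running when the counter reaches 32. When it then writes VTIME = 32, `write_vtime` raises the IRQ immediately instead of waiting for line 32 of the next frame.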
> https://board.zsnes.com/phpBB3/viewtopi ... eb4#p77276
Oh heck, 2005. I remember that now:
https://gitlab.com/higan/higan/blob/mas ... ma.cpp#L20
When DMA/HDMA events fire, it blocks interrupts from triggering in the next IRQ test (which is on the last work cycle of each previous instruction, or the first bus cycle of each next instruction.) Confirmed on hardware.
I don't know if you emulate the psychopathic DMA<>CPU sync yet Sour. If you don't, I can explain it in detail. But basically, to start an H/DMA, it waits one cycle and then the clock aligns itself to a multiple of 8 cycles since reboot. When H/DMA ends, it aligns itself to an even multiple of the last CPU cycle that executed. That alignment probably does odd things that affect IRQs from triggering.
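A very rough sketch of the two alignment steps described above, with several loud assumptions: the "one cycle" wait is taken to be one CPU cycle of `cpu_cycle_len` master clocks, and the end-of-DMA alignment is measured from the point where the DMA started. Neither detail is spelled out in this thread, and the function names are hypothetical.

```python
def align_up(clock, multiple):
    """Round clock up to the next multiple (unchanged if already aligned)."""
    return (clock + multiple - 1) // multiple * multiple


def dma_start_clock(clock, cpu_cycle_len):
    # Assumption: wait one CPU cycle, then align to a multiple of 8 master
    # clocks counted since power-on.
    return align_up(clock + cpu_cycle_len, 8)


def dma_end_clock(dma_start, clock, last_cpu_cycle_len):
    # Assumption: the CPU resumes once the time spent in DMA is a whole
    # multiple of the length of the last CPU cycle it executed.
    elapsed = clock - dma_start
    return dma_start + align_up(elapsed, last_cpu_cycle_len)
```

For example, a DMA requested at master clock 1000 after a 6-clock CPU cycle would start at 1008 under these assumptions, and if the transfer itself finished at clock 1100 the CPU would resume at 1104.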
> 3. Uniracers
Two-player mode writes to OAM during Hblank in the middle of the frame. Not 'supposed' to be allowed. The write ends up going to where the PPU last fetched sprite data, which so happens to be the byte that Uniracers needs to write to move the sprites on the second player half of the screen around.
While we're on the subject, lots of titles also write to CGRAM not just in Hblank, but during screen rendering. Those writes go to whatever palette entry the PPU is currently fetching for display. The easy/lazy hack is to just assume it's CGRAM[0], or the usual backdrop color.
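The CGRAM redirection can be sketched as below. This is a simplification with hypothetical names: real CGDATA writes arrive as two bytes per color, and whether the CPU-side address still advances during rendering is an assumption here. Leaving `ppu_fetch_address` at 0 gives the "easy/lazy hack" mentioned above.

```python
class Cgram:
    """Toy palette RAM where mid-render writes hit the PPU's current entry."""

    def __init__(self):
        self.words = [0] * 256      # 256 15-bit colors
        self.cpu_address = 0        # entry selected by the CPU
        self.ppu_fetch_address = 0  # entry the PPU is fetching right now
        self.rendering = False

    def write_word(self, value):
        if self.rendering:
            target = self.ppu_fetch_address  # write is redirected
        else:
            target = self.cpu_address
        self.words[target] = value & 0x7FFF
        # Assumed: the CPU-side address advances as usual either way.
        self.cpu_address = (self.cpu_address + 1) & 0xFF
```

A more accurate renderer would update `ppu_fetch_address` for every pixel it outputs, so the redirected write lands on whichever color is on screen at that moment.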
Quote:
I just wish there was something like the nesdev wiki for the SNES which has very complete information for both emulator authors and homebrew developers alike. Fullsnes is nice and superfamicom.org exists, but it's still not as complete.
What we really need is someone starting on a new SNES emulator (Mesen's probably too far along already) willing to write up detailed pages about each specific game's quirks that they had trouble emulating. That stuff is pure gold for new emudevs. "Are Wild Guns sprites flickering? Here's what you do to fix it. Here's a test ROM you can use so you don't have to test against the game itself."
Yea, Circuit USA is one I still need to fix, most likely timing related. I'll have to find the nmi/irq ones and try to figure out what they expected. Thanks!
RE sd2snes: As much as I appreciate the offer, I honestly don't have the assembly/hardware-level knowledge to actually write good tests (I've written a couple of basic ones for the NES, but they weren't timing-sensitive things, etc.) Not to mention I usually have my hands full already just coding the emulators in the first place, heh.
Thanks for the list of quirks, I think a couple of them I've seen mentioned in anomie's docs, but a lot of these I don't recall reading anywhere so far.
srg320 wrote:
I checked Mesen-S on my problem games and some games have the same issues that I had on my FPGA SNES
Thanks for testing and for the extremely detailed bug reports! I think the first couple of issues are pretty likely to fix some flickering problems I've encountered in another couple of games. If only all bug reports came with all the information needed to fix the problem, too! :p
byuu wrote:
I don't know if you emulate the psychopathic DMA<>CPU sync yet Sour.
I have a basic implementation of it based on anomie's timing info, but it's definitely not perfectly accurate yet.
byuu wrote:
While we're on the subject, lots of titles also write to CGRAM not just in Hblank, but during screen rendering.
Ah, that's interesting, and something that I didn't expect (unlike the OAM bit). Writing to palette RAM on the NES during rendering is actually impossible, afaik.
byuu wrote:
What we really need is someone starting on a new SNES emulator (Mesen's probably too far along already) willing to write up detailed pages about each specific game's quirks that they had trouble emulating.
I've got you covered! :p
[Attachment: Untitled.png]
I've been keeping track of what games were fixed by what changes in my code as I fix them, so I'm hoping to have a decent list of them eventually.
FYI, the wiki I've been working on is here:
https://snesdev.mesen.ca/wiki/
It's very much incomplete and a work in progress, though.
There was some undefined behavior with the math registers. We might need to wait for Visual5A22 for that...
Quote:
There was some undefined behavior with the math registers.
Unknown/unemulated hardware behaviors:
1. when you start a MUL during a DIV or vice versa with the SNES, both run at the same time and the results are nonsensical. It's sharing logic gates among both paths, basically. It's probably the easiest way to detect an emulator right now.
2. muting the DSP causes a short pulse effect where the audio fades out instead of instantly muting. This seems to happen in the analog space.
3. (partially emulated) the SNES brightness register is analog, not digital. A value of 0 is not actually black. It's like 99% black, similar to a CRT TV's mute function where if you put your ear against the speaker, you can still hear it. Max out your brightness and you can just barely make out an image. 8-bit RGB lacks the fidelity to show this.
4. the SNES brightness register has serious issues on 1CHIP PPUs. It can take several scanlines to update. Games that use it for gradient fades, and Air Strike Patrol's shadow, will render completely wrong.
5. on CPU revision 1, a DMA firing too closely to HDMA (or vice versa) will deadlock the system. I don't know the details as to how, but my presumption is somehow it causes the CPU to increment the program counter one too many times, misaligning things. If this is true, it could be possible to safely DMA during HDMA on CPUr1 by including NOPs immediately after sta $420b?
6. if both the CPU and SMP access the I/O ports (2140-217f / f4-f7) at the same time, a bus conflict occurs and the results get ORed together. I tried to write a test ROM for this and it caused my copier to immediately stop working, which was probably just coincidental, but worth keeping in mind just in case. I've also read 16-bit operations to these ports are dangerous?
7. the PPU uses the PPU MUL functionality during mode 7. If you use the MUL regs then, you'll probably either break mode 7 rendering, or get mode 7 results instead of what you asked for. The results appear to be as fast as you can read them, unlike CPU MUL. But I haven't tried reading the results as instantly as possible.
8. EXTBG can be enabled in any background mode. It does odd things in all of them, not just mode 7. Grouping it here, mosaic is likely not emulated correctly either.
9. there's a very obscure SMP timer glitch that blargg discovered on older SMP revisions. He mapped out a probabilities chart for it.
10. the exact behavior of the bus conflict manager of the SA1 (stalls to the SA1 when the CPU accesses the same memory region at the same time) is unknown. I was able to get the results ~98% correct compared to Vitor Vilela's test ROMs, but there are some incomprehensible numbers in there still. (emulating these bus conflicts is really painful, too.)
11. it's been reported to me that my emulation of the NEC uPD7725 has a bug in the carry flag implementation. Regrettably I lost the message I received about it (I have trouble with organization.)
12. I strongly suspect that an NMI triggering during an IRQ will transform the IRQ into an NMI during certain cycles (as it does on the NES CPU.)
13. finally, exact PPU timing behavior is unknown. When do registers get latched internally? When does the PPU access certain memory locations?
Implement all of that, and my perfectionism will be satisfied that SNES emulation is complete enough. There will always be bugs we don't know about, but that's unavoidable.
Pokun wrote:
Anyway I tested Mesen-S a bit. I like how similar it is to the great Mesen which has a very intuitive interface. It did hang a few times though when setting up input buttons.
Whoops, forgot to reply to this earlier. I haven't been able to reproduce this unfortunately - anything else you might have noticed in terms of steps to get it to crash, etc?
Wild Guns, Robocop vs Terminator & Uniracers should all be fixed - though that unfortunately didn't fix the other 2 games I knew of that have flickering issues (Alien vs Predator + Pocky & Rocky).
General feature request (I've wanted this in Mesen just as bad): File -> Reload ROM (or Reopen ROM). Equivalent to powering off + closing ROM + opening exact same file. I know infiniteneslives has wanted this as well. :-)
byuu wrote:
Regrettably I lost the message I received about it (I have trouble with organization.)
"Only wimps use [...] backup: real men just upload their important stuff [...], and let the rest of the world mirror it"
/s
byuu wrote:
If this is true, it could be possible to safely DMA during HDMA on CPUr1 by including NOPs immediately after sta $420b?
It seems Tepples and lidnariq have a 1/1/1 SNES...
koitsu wrote:
General feature request (I've wanted this in Mesen just as bad): File -> Reload ROM (or Reopen ROM). Equivalent to powering off + closing ROM + opening exact same file. I know infiniteneslives has wanted this as well. :-)
"Power Cycle" in both emulators does this - it'll scrap everything and reload the ROM from disk.
Sour wrote:
Pokun wrote:
Anyway I tested Mesen-S a bit. I like how similar it is to the great Mesen which has a very intuitive interface. It did hang a few times though when setting up input buttons.
Whoops, forgot to reply to this earlier. I haven't been able to reproduce this unfortunately - anything else you might have noticed in terms of steps to get it to crash, etc?
I just went straight into the input options and set up the SNES controller buttons for the keyboard and for a USB joystick (a Wii Classic Controller with an adapter). I noticed that after clicking a controller button to assign, it takes notably longer to bring up the window that prompts for an input device button than it did in Mesen. A few times it took too long and seemingly stopped responding, so I forced Mesen-S to shut down (forcing me to redo all the buttons again as nothing was saved). I'm using the same computer and same joystick in Mesen without problems.
Amazing progress with your emu Sour, & thanks for that wiki you are making. That will be a great resource for new SNES emu authors!
Also thanks byuu for helping out too =D
- DKC copy protection screen: doesn't appear any more
- program disappearing from the Alt+Tab list when the key bindings window is open: doesn't happen on Windows 10
- World Class Service Super Nintendo Tester: seems like electronics test only fails when using Fast Forward?
- Chrono Trigger time warp: fixed
- Speedy Gonzales: the HDMA issue doesn't seem to cause an infinite loop (ironically I don't know where to go after that to finish the level...)
- Mecarobot Golf: seems fine
- Koushien 2: seems fine
- Air Strike Patrol: freezes on this screen unless I go up and press A (savestate). Also when pressing A on the briefing screen there's a glitch (2, 3). There's another one when gameplay starts (4).
- Battle Blaze: looks fine
- Sink or Swim: the water covers the whole screen after placing a bomb and using a ladder. This persists across resets and power cycles.
- Goodbye Anthrox: shows graphical corruption
- Krusty's Super Fun House: some graphical corruption on the left side of the screen, didn't see any palette corruption
- Bugs Bunny: looks good?
Needs to be tested by someone who knows Japanese:
- Bishoujo Janshi Suchie-Pai
- Jumbo Osaki no Hole in One Golf: doesn't seem to display the letters that I'm choosing?
- Magical Drop: not sure what Endless Mode is
Quote:
- World Class Service Super Nintendo Tester: seems like electronics test only fails when using Fast Forward?
That's because it tests range-tile over, but fast forward usually skips frame rendering. Computing the sprites just to pass the electronics test isn't worth the performance penalty when the goal of fast forward is to be as fast as possible.
Quote:
- Speedy Gonzales: the HDMA issue doesn't seem to cause an infinite loop (ironically I don't know where to go after that to finish the level...)
Wait, he got it right without even knowing about that? That is sooooooooo cool!!
Quote:
There's another one when gameplay starts (4).
PPU timing really is a sore point in SNES emulation. There's lots of games that rely on it being right:
* Super Mario World level ... 1-3? 1-4? The one with the water, will show a black line at the top left if timing is off too much
* Megalomania intro will show a black line if object fetch timing is off too much
* Dai Kaijuu Monogatari II will show a distorted line at the top of the HP status bar line in battles
* NHL '94 will flicker on the intro
* Winter Olympics will show one black line when starting a game from the main menu
* and there's many more games like this. Sorry I don't remember them all ...
Quote:
- Sink or Swim: the water covers the whole screen after placing a bomb and using a ladder. This persists across resets and power cycles.
It's an IRQ bug. In that case, F1 Grand Prix would also have a broken HUD.
I'm really sorry, but it's been twelve years since I fixed that bug, so I can't tell you the exact problem anymore ...
If anyone has test_nmi / test_irq to give Sour, I'm sure those test ROMs test the behavior those two games rely on.
Quote:
- Magical Drop: not sure what Endless Mode is
No need to test this one, it won't work. The bug is in blargg's DSP (well, technically in Magical Drop, but ...) Sour will need to patch the code as I mentioned before.
Alien Vs Predator
There are 2 V-IRQ interrupts per frame: on scanlines 16 and 209.
On line 209, every first IRQ handler ends on line ~210. The NMI handler runs on line 225.
[Attachment: Снимок.PNG]
On line 209, every second IRQ handler ends on line ~238-245 (i.e. in VBlank); the NMI handler does not start and one frame is skipped (remains black). The NMI handler should instead start inside the IRQ handler on line ~226, when the IRQ handler enables the NMI interrupt.
[Attachment: Снимок2.PNG]
More tests...
- F-1 Grand Prix: as expected, flickering HUD during a race
- Mouryou Senki Madara 2: intro text looks fine
- Tetris Attack map screen in Vs. mode: looks fine
- Timecop distortion of the ship in the intro: looks fine
- SMW lvl 1-4: looks fine
- Mega Lo Mania intro: shows a black line
- Daikaijuu Monogatari II: there's just a little bit of garbage at the bottom of the screen
- NHL '94: flickering Mode7 (IRQ issue?)
- Winter Olympics: looks good?
byuu wrote:
That's because it tests range-tile over, but fast forward usually skips frame rendering.
Actually it shouldn't in this case. On the NES skipping the screen rendering can easily make pretty much any game lock up, so FF only removes the sleep call between frames, it shouldn't change the results of emulation. This does explain why bsnes-plus fails some stuff when fast forwarding, though! And might actually be worth exploring as an optimization (e.g. process only the sprite flags without rendering any pixels). I'll have to try and figure out why that test is failing.
Quote:
Wait, he got it right without even knowing about that? That is sooooooooo cool!!
Beginners' luck, I guess!
Quote:
PPU timing really is a sore point in SNES emulation. There's lots of games that rely on it being right:
Thanks for the list, I'll have to check them. It should already be pretty close to bsnes-plus' timings (at least as far as comparing trace logs), but there are definitely a few details that aren't properly implemented yet.
Quote:
I'm really sorry, but it's been twelve years since I fixed that bug, so I can't tell you the exact problem anymore ...
If anyone has test_nmi / test_irq to give Sour, I'm sure those test ROMs test the behavior those two games rely on.
No problem! Just knowing it's an IRQ bug is already helpful. But yea, if someone has the test_nmi/test_irq tests that'd be great, I haven't had any luck finding them anywhere (this sounds like the DSP tests all over again...)
creaothceann wrote:
More tests...
Thanks for retesting & the new test results! I'm in the middle of adding SPC tracing/disassembling/stepping at the moment, but will check these over the next couple of days.
srg320 wrote:
On line 209, every second (even) IRQ handler ends on line ~238-245 (i.e. in VBlank); the NMI handler does not start and one frame is skipped (it remains black). The NMI handler should instead start inside the IRQ handler on line ~226, when the IRQ handler enables the NMI interrupt.
Ah! That sounds like the same behavior as the NES has, which I did not implement in this. Thanks for going through the trouble of debugging this for me, should be a pretty easy fix!
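For anyone following along, the behavior srg320 describes can be sketched roughly like this. It is a minimal model and not code from Mesen-S or higan: the `NmiLogic` struct and all of its member names are made up for illustration, and it assumes the NMI seen by the CPU is simply the VBlank flag ANDed with bit 7 of $4200, with an NMI raised on any low-to-high transition of that combined signal.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: names are illustrative only.
struct NmiLogic {
  bool vblankFlag = false;  // set when VBlank starts, cleared when it ends
  bool nmiEnable = false;   // bit 7 of $4200
  bool nmiPending = false;  // the CPU would service this at its next interrupt poll

  // Combined signal: an NMI is raised when this goes from false to true.
  bool line() const { return vblankFlag && nmiEnable; }

  void setVblank(bool inVblank) {
    bool before = line();
    vblankFlag = inVblank;
    if(!before && line()) nmiPending = true;
  }

  void write4200(uint8_t value) {
    bool before = line();
    nmiEnable = (value & 0x80) != 0;
    if(!before && line()) nmiPending = true;  // enabling NMI while the flag is already set
  }
};

// The Alien vs Predator case: NMI is disabled when VBlank starts,
// then enabled by the IRQ handler while VBlank is still in progress.
inline bool nmiFiresWhenEnabledLate() {
  NmiLogic nmi;
  nmi.write4200(0x00);      // NMI disabled
  nmi.setVblank(true);      // VBlank starts, nothing fires yet
  assert(!nmi.nmiPending);
  nmi.write4200(0x80);      // IRQ handler enables NMI during VBlank
  return nmi.nmiPending;    // true under this model
}
```

With this model the "every second IRQ handler" case no longer loses the frame, because the late write to $4200 raises the NMI by itself.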
byuu wrote:
Quote:
- Sink or Swim: the water covers the whole screen after placing a bomb and using a ladder. This persists across resets and power cycles.
It's an IRQ bug. In that case, F1 Grand Prix would also have a broken HUD.
I'm really sorry, but it's been twelve years since I fixed that bug, so I can't tell you the exact problem anymore ...
If anyone has test_nmi / test_irq to give Sour, I'm sure those test ROMs test the behavior those two games rely on.
Like this collection of tests I found one day, trying to go hunting for some other elusive test ROMs that byuu was looking for? They're from 2005-2008, though, and some of them are $200-byte headered... most of them are HDMA/IRQ tests. Two of them are test_nmi and test_irq. I found these in 2016 while I was attempting to go hunting myself.
Should we expect in a few years from now mesen-gb, mesen-sms and mesen-pce?
Just pulling your leg. I still need to test it, didn't have time this week but can't wait to try it. Should be able soon.
But I think Mesen might be "Nes-em[ulator]" backwards, so Mesen-S is "S[uper]-Nes-em[ulator]".
In other words we are expecting Megebo, Mesms and Mepce from now on.
jk
KungFuFurby wrote:
Like this collection of tests I found one day
Thank you! This should be incredibly useful in fixing some of the remaining issues.
Banshaku wrote:
Should we expect in a few years from now mesen-gb, mesen-sms and mesen-pce?
Never say never, but probably not. Only the NES & SNES really interest me in terms of games (though I know very little about the games available on other consoles). At the very least, I'm absolutely sure I won't ever try emulating a 3d console :p
Pokun wrote:
In other words we are expecting Megebo, Mesms and Mepce
You're right, but you're doing it wrong! It should be Meecp and Mebg :p (What's Gebo?)
RE: The crash/freeze, still haven't managed to reproduce it on my end. I'll give it a try on the other 2 PCs I have access to when I get a chance, hopefully I can reproduce it there.
Here's a fun one:
This demo has two different modes beside each other on the same screen. Currently no emulator gets it perfect, the most common issue being that the right mode flickers.
In Mesen-S, the mode on the right does indeed bounce about (It's stationary on hardware) but the more interesting part is that there appears to be a mode 5 background behind the mode 3 one on the left.
In this one I've also used windows to hide the middle section: on hardware it's a mess.
This demo is also a good use for the event viewer, I found. That feature is *really* cool.
What's this one supposed to show? All I get at the moment is this:
Attachment:
test.png [ 8.71 KiB | Viewed 7214 times ]
Which makes me believe the test is telling me it's working properly. In snes9x/bsnes-plus I get "Incorrect behaviour -emulator" screens. But you said Mesen-S fails it, so I tried the 0.1.0 release build instead of the latest commit and got the same result so... I'm confused? :p
From your description, I'm starting to think you might have attached the wrong file, but I'm not sure?
Quote:
- Mega Lo Mania intro: shows a black line
Ah, a full line. Wrong part of Winter Olympics then, they both do the same thing: change the OAM base address halfway through the frame, to render an entire background using only sprite data. At least the latter had a good reason for it, using the BG for a mode 7 foreground.
Quote:
- NHL '94: flickering Mode7 (IRQ issue?)
It changes video mode registers in the middle of the scanline.
Quote:
Like this collection of tests I found one day
Ah, thank you very much! I'll have to point Evan at these this time.
Quote:
Never say never, but probably not. Only the NES & SNES really interest me in terms of games (though I know very little about the games available on other consoles). At the very least, I'm absolutely sure I won't ever try emulating a 3d console :p
I said the same back when bsnes was only an NES+SNES emulator ;)
Look on the bright side, it'll only take you 6-12 months at most to catch up to my 22 emulators :P
Quote:
Here's a fun one:
Ah right, the Megalomania fix landed a while back, I'll take a look at this again now.
Sour wrote:
RE: The crash/freeze, still haven't managed to reproduce it on my end. I'll give it a try on the other 2 PCs I have access to when I get a chance, hopefully I can reproduce it there.
I'm also unable to reproduce it now. It seems the input-wait window now appears much faster as well. The event handler also crashed the emulator the first time I tried it but it works OK now. Maybe my computer was playing tricks on me.
Sour wrote:
Pokun wrote:
In other words we are expecting Megebo, Mesms and Mepce
You're right, but you're doing it wrong! It should be Meecp and Mebg :p (What's Gebo?)
I got Mesms right!
Gebo is short for Game Boy. GB is a hard to pronounce combination of consonants so therefore gebo (remember the "a" in "game" is pronounced [ei] in English), but I guess it should be bego. Meecp is easy to pronounce if the "c" is pronounced as an s-sound rather than a k-sound.
Sour wrote:
At the very least, I'm absolutely sure I won't ever try emulating a 3d console :p
I'm sure we will see a cycle-accurate Me46 in a few years.
Sour wrote:
From your description, I'm starting to think you might have attached the wrong file, but I'm not sure?
Ah, you're right, I did! Sorry for wasting your time.
The test I accidentally attached is actually one to test what happens to sprites as you force blank during h-blank. It should not load them, causing them to disappear and show the correct emulator screen.
However there are many quirks surrounding this and that test doesn't really show those. I'll come up with a better one later.
The one I meant to attach is below. It should look a lot more like I described.
Is this your first time releasing this demo? I don't see any posts with images of what it's supposed to look like.
I see both images stationary in higan, but the colors are wrong on the right. This one sets mode 3 in hblank, and runs that for half the frame, then switches to mode 5.
The right looks the same even if I force the PPU to always use mode 5, so ... if the colors are wrong, then I have no idea why.
Ah, I don't believe I released this one. Sorry.
The colours on the right are supposed to be like that, because they're corrupted by the ones on the left. I haven't worked out how to not make that happen, but it's probably more to do with just my bad palette organisation than anything.
I've attached screenshots from the Bsnes-plus tilemap viewer - they look like that on hardware. Although Higan is stationary, it still has a jagged edge, for some reason.
I can move this into a different thread if you want
Oh, I see. Basically your BGMODE writes are landing at different spots each frame, which is resulting in the change happening immediately in higan, so when you're in the middle of a tile, it ends up misaligning the tiledata shifter.
So, either the BGMODE change only happens every 8 pixels at a tile fetch edge, or the tile shifter logic has to be smarter about mode changes to different bit depths.
https://i.imgur.com/S0r7k8R.png (larger image, so not inlined.)
I think for now, I probably won't emulate this, since I don't know which behavior is actually correct. But it does reveal an interesting detail, so thank you for the second test ROM!
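To make the first of those two hypotheses concrete, here is a tiny sketch of "BGMODE only changes at a tile fetch edge". This is purely illustrative and unverified against hardware: the function name is hypothetical, tile edges are assumed to fall on multiples of 8 screen pixels (horizontal scroll is ignored), and the real PPU may well behave differently.

```cpp
#include <cassert>

// Assumed tile width in pixels for the layers in question.
constexpr int kTileWidth = 8;

// Hypothetical: given the pixel at which a BGMODE write lands,
// return the pixel at which the new mode would start being used
// if the change were delayed to the next 8-pixel boundary.
int bgModeEffectivePixel(int writePixel) {
  assert(writePixel >= 0);
  return (writePixel + kTileWidth - 1) / kTileWidth * kTileWidth;
}
```

Under this assumption, writes landing at pixel 97 on one frame and pixel 103 on the next would both take effect at pixel 104, so the split would stay put and the tile shifter would never be switched mid-tile.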
Is this not the exact thing that Pan of Anthrox's old mode7.smc demo did? Subject discussed here? Or is this something different?
(Apologies for not having those recordings any more, BTW. I can re-do them easily though the same hardware (SNES + SD2SNES) and using OBS Studio + USB3HDCAP, since I just this weekend got my SNES hooked up to my USB3HDCAP for doing 60fps recordings. If needed, just ask.)
Yes, it's the same idea, but instead of mode 0->7, it's mode 3->5. Given the nature of the Anthrox plasma effect, both sides being lowres, and both sides being the same color depth for tiles, there's no obvious misalignment there.
The combination of all of that gets in the way here. It's an easy fix, once we know which behavior is the correct one, in any case. But guessing fixes just results in breaking other games and making it harder to find the real fix later on.
I suspect BGMODE changes only take effect on tile edges, but there's likely a lot more to it than that, as evidenced by both of these demos needing something in the middle of the screen to hide all the corrupted data.
@sour
I was sure that mesen meant "line of sight", from the Japanese 目線 since the icon shows a eye with lines. Maybe I guessed wrong then.
Why not a multilingual pun?
Banshaku wrote:
目線
Actually I remembered this was written verbatim in the original web site, but seems that it's not mentioned anymore.
I just didn't get it being NESEM spelt backwards until now, so I used to think it's a strange random name for an emulator.
(And that it makes sense now why its descendant is called MESEN-S, not SMESEN...)
Ah, mode 3 to 5... yikes. Do we need this tested on actual hardware, or...?
I may not have guessed, then; it may just be my memory playing tricks on me after reading the meaning on the site a long time ago.
Quote:
Ah, mode 3 to 5... yikes. Do we need this tested on actual hardware, or...?
It certainly wouldn't hurt to have, but I'd be very surprised if the right-hand image ends up jagged on a real TV.
If anyone can think of a good secondary test ROM to rule out what the underlying behavior is, I'll fix it. But again, I'd rather we wait for some PPU LA logs before spending too much time on these kinds of tests. It's a bit like the DSPs: people spent nearly a decade bit-perfecting those, and then LLE (with much respect and reverence to the past hard work) undid all that work in a week. The PPU is going to be the same.
Happy to do a recording of this on an actual SNES + put up a YT link of the result tomorrow.
Yay for 17+ hours of power outage (still ongoing..)
I fixed the SplitScreen test yesterday, was caused by a small issue when turning hires modes on or off in the middle of a scanline. As far as I can tell, it looks like the bsnes-plus screenshots you posted.
The hblank test is broken by my "fix" for megalomania, the sprites are loaded 1 ppu cycle after rendering is enabled. Like byuu said, fixing these kinds of things would require knowing the exact way the ppu loads/evaluates the sprites. I'm not too worried about this right now as I have a lot more pressing issues to fix :p
Megalomania, Alien vs Predator and the Krusty issues should be fixed, too.
Also finished the SPC debugger window, which is essentially the same as the regular debugger, but for the SPC. It supports stepping, watch expressions, breakpoints, etc.
Re: the name, it is both Nesem backwards and 目線, that's why the icon is an eye!
Finally was able to get some more fixes done (Thank you freezing rain for that sweet 43 hours straight of power outage!).
-F1 Grand Prix & Sink or Swim are both fixed (was caused by the same IRQ problem)
-Fixed V+H IRQs set for scanline 240, cycle 0 were never triggered (which was breaking Full Throttle)
-Improved the PPU timing logic to remove the 341st cycle hack and the like, which seems to have fixed the small graphical glitch in Daikaijuu, too.
Which I think leaves Circuit USA, Jumbo Ozaki & NHL 94 with known graphical bugs (mosaic issues aside). Then I still have to figure out why Illusion of Gaia and ActRaiser 2 refuse to boot (almost certain this is SPC-related).
Also added frame skipping whenever the speed is set higher than 150% (unless video footage is being recorded), which gave a ~100-150 FPS boost in most games, as far as I can tell, and should have no impact on emulation accuracy.
Both Circuit USA and Jumbo Ozaki rely on extremely accurate HDMA timing.
I see you implemented the early termination oddity with HDMA indirect transfers.
A fairly major issue in your current implementation is running all eight HDMA channels sequentially. You need to run through all eight channels and perform the HDMA transfers, and then go through all eight channels again to perform the HDMA indirect address fetches. Like so:
https://gitlab.com/higan/higan/blob/mas ... ma.cpp#L33
Next, there are actually two parts to HDMA per-frame initialization. Regardless of whether any HDMA channels are enabled, you do this:
Code:
auto CPU::Channel::hdmaReset() -> void {
  hdmaCompleted = false;
  hdmaDoTransfer = false;
}
It takes no CPU time to complete (done in parallel.)
And then only if at least one HDMA channel is enabled:
Code:
auto CPU::hdmaSetup() -> void {
  step(8);  //we assume we've performed CPU->DMA sync already
  for(auto& channel : channels) channel.hdmaSetup();
  status.irqLock = true;
}

auto CPU::Channel::hdmaSetup() -> void {
  hdmaDoTransfer = true;  //note: needs hardware verification
  if(!hdmaEnable) return;
  dmaEnable = false;  //HDMA will stop active DMA mid-transfer
  hdmaAddress = sourceAddress;
  lineCounter = 0;
  hdmaReload();
}
Note the strange case of setting hdmaDoTransfer there. This was a weird edge case that ladida uncovered with HDMA.
That discussion, plus a test ROM, is here:
https://forums.nesdev.com/viewtopic.php?f=12&t=16330
It's extremely odd behavior, I know. If you come up with a better theory, please do share.
Lastly, you should also implement the CPU->DMA->CPU sync timing instead of relying on 18 cycles being the average case. It's definitely annoying, and be careful about the edge case where HDMA triggers during DMA, that avoids the need for the DMA sync portion.
I'm fairly confident the above changes will fix those two bugs.
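As a rough illustration of the two-pass ordering described above (all transfers first, then all table/indirect address fetches), here is a stripped-down sketch. It is not higan's or Mesen-S's code: the `Hdma` struct, the `transfer`/`advance` helpers and the log are hypothetical stand-ins that only record the order of operations, and the conditions on each pass are simplified assumptions.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-channel state, reduced to the flags mentioned in the post.
struct Channel {
  bool hdmaEnable = false;
  bool hdmaCompleted = false;
  bool hdmaDoTransfer = false;
};

struct Hdma {
  std::array<Channel, 8> channels;
  std::vector<std::string> log;  // records the order of bus operations

  // Stand-ins for the real work; they only log what would happen.
  void transfer(int i) { log.push_back("transfer " + std::to_string(i)); }
  void advance(int i)  { log.push_back("advance " + std::to_string(i)); }

  void runScanline() {
    // Pass 1: perform the data transfers for every active channel.
    for(int i = 0; i < 8; i++) {
      Channel& c = channels[i];
      if(c.hdmaEnable && !c.hdmaCompleted && c.hdmaDoTransfer) transfer(i);
    }
    // Pass 2: only then update line counters and fetch table/indirect addresses.
    for(int i = 0; i < 8; i++) {
      Channel& c = channels[i];
      if(c.hdmaEnable && !c.hdmaCompleted) advance(i);
    }
  }
};
```

With channels 1 and 3 active, the log reads "transfer 1", "transfer 3", "advance 1", "advance 3", whereas a sequential implementation would interleave them per channel and shift the timing of every later bus access.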
Quote:
Then I still have to figure out why Illusion of Gaia and ActRaiser 2 refuse to boot (almost certain this is SPC-related).
Check your implementation of PCALL/TCALL very closely, especially related to how they interact with the IPLROM enabled or disabled.
EDIT: well I looked it over, and it looks good as far as I can tell ... darn. I did find something interesting, though.
Code:
switch(addr) {
  case 0xF0:
    if(!CheckFlag(SpcFlags::DirectPage)) {
      ...
      _state.WriteEnabled = value & 0x02;
      _dsp->setEchoWriteEnabled(_state.WriteEnabled);
I'm not sure nocash's fullsnes is entirely accurate here, or based on real hardware testing. I've never heard any confirmation that the SMP TEST register flags affect the DSP. If that's true, then value & 0x04 (RAM disable) would completely cripple the DSP.
I suspect this is not the case. The DSP controls the RAM and clock, and gives the SMP interleaved access to it. It doesn't make sense for the DSP to be inspecting the state of SMP I/O registers for its own RAM accesses.
It really doesn't matter in practice since no games ever write to this register, of course.
This looks great! I couldn’t help myself from starting to muck around with a MacOS port...
The core library was easy enough to compile with a bleeding edge clang and some compiler flags (the drop-in dummy input manager will need to actually do something, at some point). The big issue is the sorry state of Windows.Forms on MacOS; the current mono distribution only ships with a 32-bit Carbon version. So yeah. There are some signs of that being rectified “soon”. I’ll dig around a bit more.
byuu wrote:
A fairly major issue in your current implementation is running all eight HDMA channels sequentially.
Ah yea, I kept meaning to fix that, but forgot about it every time.
I've improved the start/end sync timing as well (wait until end of next cpu cycle, sync to multiple of 8, then sync back to a multiple of the CPU speed when stopping). It's probably still not quite perfect, though, but lo and behold, it actually fixed both Circuit USA and Jumbo Ozaki! (I actually went into this assuming that there was also probably more timing-related stuff that would need to be fixed for those games).
I also implemented the "DoTransfer" flag quirk, which makes that test you linked to work as expected.
FYI (just mentioning this since that other thread gave no firm answer on this), the "DoTransfer" flag does need to be reset to false during HDMA init, not doing so breaks both Ghouls 'n Ghosts and Aladdin.
Thanks for the help! I would never have even guessed that the games relied on near-perfect HDMA timing to work properly.
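For reference, the start/end sync described above boils down to a couple of round-up operations on the master clock. The sketch below is only an approximation of that idea under stated assumptions: the function names are hypothetical, an already-aligned clock is assumed to need no extra wait, and the CPU is assumed to resume a whole number of CPU cycles after the point where it was paused.

```cpp
#include <cassert>
#include <cstdint>

// Round clock up to the next multiple of step (unchanged if already aligned).
uint64_t alignUp(uint64_t clock, uint64_t step) {
  assert(step > 0);
  return (clock + step - 1) / step * step;
}

// DMA begins once the current CPU cycle has ended and the master clock
// has reached a multiple of 8.
uint64_t dmaStartClock(uint64_t endOfCpuCycle) {
  return alignUp(endOfCpuCycle, 8);
}

// After DMA ends, the CPU resumes at the first master clock that is a whole
// number of CPU cycles (cpuSpeed = 6, 8 or 12 master clocks) past the pause point.
uint64_t cpuResumeClock(uint64_t cpuPausedAt, uint64_t dmaEnd, uint64_t cpuSpeed) {
  assert(dmaEnd >= cpuPausedAt);
  return cpuPausedAt + alignUp(dmaEnd - cpuPausedAt, cpuSpeed);
}
```

As a worked example: if the CPU cycle ends at master clock 1006, DMA starts at 1008. If the DMA then finishes at 1032 and the CPU is running at 6 clocks per cycle, the CPU resumes at 1036, so the total overhead depends on where the write landed rather than being a fixed 18 cycles.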
byuu wrote:
I'm not sure nocash's fullsnes is entirely accurate here, or based on real hardware testing
To be honest, I'm unsure why I left that part in. It was most likely part of my attempts to fix SPC issues while I was still rewriting it, though nocash's docs might have been the inspiration for this. Since you're saying the hardware most likely doesn't do this, I'll take it out.
Thanks for taking the time to check the SPC's code, btw. Currently have 4 games that seem to be having SPC-related issues: Illusion of Gaia, ActRaiser 2, Hiouden, Kishin Douji Zenki. Gaia/ActRaiser freeze during their opening sequences, Hiouden produces a screech and then freezes, Kishin doesn't boot at all. If I double the SPC's clock rate, they all boot and play sound normally, and changing the clock rate a little bit can cause some of them to get further or work normally.
Edit: Had forgotten about it, but Tales of Phantasia also has similar SPC-related issues.
Optiroc wrote:
I couldn’t help myself from starting to muck around with a MacOS port...
Yea, the whole Mono situation on macOS is a pity - wish it worked better. There was some related development done early last year, but I'm not sure if anything was actually improved as a result? At the moment the simplest way to use Mesen(-S) on macOS is probably via a VM (either Windows or Linux work, though Windows works better), at least if the main intent is to have access to the debugging tools.
I did manage to get a couple of builds (of Mesen) running in a macOS VM last year (the Linux build via Mono, and a modified Windows build via Wine), but both had issues (Wine had UI performance issues in the debugger tools, Mono just had all-around issues in the UI). If you do find a good way to get it running natively on macOS, please let me know!
Quote:
Ah yea, I kept meaning to fix that, but forgot about it every time.
No worries, it's regrettably something that wasn't known before anomie left, so it's not documented. I forgot which game that fixed, but it was definitely important for something.
Quote:
It's probably still not quite perfect, though, but lo and behold, it actually fixed both Circuit USA and Jumbo Ozaki!
We basically both solved these two games in opposite directions. I had CPU<>DMA sync first, and we found out about the indirect transfer short-circuit later. Once I had both in place, the games started working. Good to know both are required.
If I had all the time in the world, I'd love to know why these games break over such an infinitesimal difference in timing. Theoretically DMA<>CPU sync should average out to the same number of cycles anyway.
Quote:
the "DoTransfer" flag does need to be reset to false during HDMA init, not doing so breaks both Ghouls 'n Ghosts and Aladdin.
Ah, so that's still the case after adding this quirk. Good to know, thanks.
I'm still not 100% certain I have the rule right for when to set DoTransfer=true. It's quite the unfortunate behavior, because it misaligns your HDMA table reads, which is disastrous. But that test ROM clearly shows it's happening somehow, and I can't think of any other conditions that allow that test ROM to work properly.
Quote:
Since you're saying the hardware most likely doesn't do this, I'll take it out.
I can't prove it, and the SNES is often illogical. He could have verified this and just never told me, or I missed it. Mea culpa if that turns out to be the case.
But, with all respect to nocash ... he doesn't always clearly state what's verified on hardware versus what's speculation. He speculates often on unknowns, and some of it goes really far sometimes.
Still, the idea that the SMP is controlling the DSP's interaction to the RAM the DSP is directly connected to seems ... very unlikely.
Quote:
Thanks for taking the time to check the SPC's code, btw.
Sure, no problem. We could try producing trace logs. Log the cycle counts with each instruction, and see where our timings don't match up. I can patch my emulator to make such a trace log if you want to try that.
byuu wrote:
We could try producing trace logs. Log the cycle counts with each instruction, and see where our timings don't match up. I can patch my emulator to make such a trace log if you want to try that.
Sorry for the super late reply!
I actually spent a few days trying to figure this whole thing out, but haven't managed to quite yet. In the process I made a small test rom that runs a bunch of stuff (thousands of DMAs, HDMAs, IRQs, etc.) and checks the H/V counters at various points during the execution to try and catch any major timing issues in my code. I fixed a pretty large number of DMA-related timing issues and some other stuff (like the idle cycle that becomes a read cycle when an IRQ is pending, etc.). I've been getting results pretty close to the latest higan/bsnes (and much closer than the results I get in snes9x/no$sns), but it still hasn't fixed the SPC freeze issues, sadly. Also haven't managed to pass most of your irq/nmi/hdma timing tests either.
If you have a build that I could use to generate CPU+SPC trace logs, that would be helpful - I've been comparing against bsnes-plus' logs, but it'd be much better to be able to compare w/ higan directly.
In the meantime, I ended up putting that problem aside for now and focused on adding some more debugger features (sprite viewer, labels/comments/CA65 integration, improvements to some of the existing tools, etc). Will probably keep working on some debugger tools & misc features for the next 2-3 weeks until I clear out the basic list of features I had written down when I started writing this whole thing, and after that I'll probably get back to trying to fix the remaining emulation issues.
No rush, I've moved onto Sega CD emulation lately, so I've kept busy meanwhile ^-^;;
Quote:
Also haven't managed to pass most of your irq/nmi/hdma timing tests either.
They are absolutely brutal tests. It took me months to get them working myself.
They are based around my creation of a function that, once run, returns with Vcounter=0,Hclock=0,InterlaceField=0 and another function that steps by exactly the cycle count stored in A. You can see how with the two combined, I can make any operation occur at any clock cycle (at 21.47MHz/2). You'll want to make sure the first function works (if it does, so will the second) by looking at a tracer that also logs the H/V counters per instruction. If that doesn't work, most of my tests won't ever pass.
Just know that they're not too important to 100% compatibility, and that I didn't release these tests for years because I didn't want them gamified. There's nothing more displeasing than people judging the value of a complex, 100K-line emulator based on whether it passes an arbitrary edge case test ROM.
I'll try and explain the NMI/IRQ behaviors I discovered when I have some time.
Quote:
If you have a build that I could use to generate CPU+SPC trace logs, that would be helpful
I have a built-in disassembler, and I compile a branch of higan/bsnes with them selectively enabled when I debug games.
Understood, I'll put together a trace logger version for you, please give me a few days.
Quote:
and after that I'll probably get back to trying to fix the remaining emulation issues.
Awesome, keep up the great work, and remember to take rests when you need them ^-^
Am I helping when I say this?...
Anyway:
DKC2's Copy protection screen triggers when you reset the game.
Proof is at the bottom.
Also is there a way to update the emulator? I keep getting an exception "System.Net.WebException".
With all that said, this is the best news I've heard. Sour making a sequel to the most advanced NES emulator? This could be legendary. With time and a lot of work, this could be on par with higan with better debugging than No$sns!
I'm guessing you're using 0.1.0? Both issues should be fixed already (but there hasn't been any other "official" release yet).
You can grab the latest dev builds with the fixes here:
https://ci.appveyor.com/project/Sour/me ... /artifacts
I've fixed a pretty decent number of emulation issues since 0.1.0 (though some still remain) and a lot has been added/improved w/ regards to the debugger tools, too.
Sorry for the long delay between responses.
Yes. I've looked for the latest version and saw the glitch was fixed. Sorry for the mix up.
topspoon wrote:
Kick Off (Europe) - main menu is glitched
Thanks for the report & solution! This is fixed.
Super Famista 5 (Japan)
1. For some reason, you're triggering a copy protection screen after Namco logo.
-- Haven't checked why. Never seen this until now.
2. Main menus are not OPT mode2 animated.
https://github.com/SourMesen/Mesen-S/bl ... u.cpp#L780
bg3ofs = $3ff and 32x32 screen size @ 8x8 tile size. $3ff + $20 >= $400 so wrap around from $000. Then you'll see the special effect.
Or $7fe + $40 >= $800 for *2.
3. When you fix #2, you'll notice that column 1 is stuck and not moving. Visually to me, it looks like tile0 is treated as 16 width but I haven't looked into this.
-- my error: didn't wrap at exactly 0x800 @ 8x8 ts, 32x32 nt.
edit: fixed mistakes
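The wraparound arithmetic from point 2 can be written out as a quick sanity check. This is only a sketch of the numbers quoted above, not the actual Mesen-S code at the linked line: the helper name is hypothetical, and it assumes the wrap is a plain power-of-two mask ($400 in the normal case, $800 for the doubled case).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: add delta to a scroll offset and wrap at wrapSize,
// which must be a power of two (e.g. 0x400 or 0x800).
uint32_t wrapOffset(uint32_t base, uint32_t delta, uint32_t wrapSize) {
  assert(wrapSize != 0 && (wrapSize & (wrapSize - 1)) == 0);
  return (base + delta) & (wrapSize - 1);
}
```

For the values in the report, `wrapOffset(0x3ff, 0x20, 0x400)` gives `0x01f` and `wrapOffset(0x7fe, 0x40, 0x800)` gives `0x03e`, i.e. the fetch lands back near the start of the map instead of running past its end.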
There's a layering demo that does OPT 2,4,6. I think it looks okay but there's lots of visual data to pixel check.
https://forums.nesdev.com/viewtopic.php?t=14467
-- ppubusactivity_rev2_sfc.7z
A few more games glitch out (non-opt). I'll see if I can give some advice or more information.
Thanks a lot for the info! The offset-per-tile mode glitch in famista should be fixed.
I haven't been able to trigger any copy protection screen though - are you running the appveyor dev builds or something else?
Comparing the layering demo to bsnes/higan again, I noticed the colors on some of the layers were wrong - turns out direct color mode had a couple of off-by-1 errors for modes other than mode 7. This is fixed.
The shape of the wave for "mode 6 layer 1" at the bottom is quite different from bsnes, I'm assuming mine is incorrect, but the animation feels broken in bsnes (but maybe it also does this on hardware?)
Just FYI, the issues I am aware of at the moment:
-Air Strike Patrol & NHL 94 have graphical glitches - I'm fairly certain these are caused by the fact the current PPU implementation is fairly primitive and doesn't load the tiles ~8-16(?) ticks ahead of time like it should. This will probably need a fairly major overhaul of the PPU's code to fix properly.
-Mosaic is pretty broken (planning on fixing this after fixing the previous issue)
-These games freeze on power on, probably because of a timing issue between the SPC & CPU: Act Raiser 2, Illusion of Gaia, Hiouden - Mamono-tachi to no Chikai, Kishin Douji Zenki - Tenchi Meidou
-Tales of Phantasia has broken audio (probably same issue that is causing the games above to freeze)
Regarding copy protection, polite reminder: you both need to exchange checksums and filenames of the ROMs you're using. I cannot stress this enough. Many ROMs distributed from olden times were pre-patched to remove such protections (even ucon/ucon64 can sometimes do this automatically), so one of you may be using a completely different ROM or even different revision of dump from the other.
Mighty Morphin Power Rangers - The Fighting Edition (USA) = in-game battle arena is glitched
ex. start game -- trial mode -- pick fighter
Code:
00:8C90 sta $420b hc=1068 vc=223 fc=2167
00:8C93 ldx #$7e80 hc=1098 vc=223 fc=2167
*** error: supposed to run nmi routine first because vc=225 when dma finished
00:8C96 stx $2116 hc=506 vc=225 fc=2168
edit: Running custom build so that might be where I goofed. I get a warning screen after Namco logo, but on other emulators it goes straight to title intro. Look into this sometime.
The demo mode6 layer1 opt effect I was also wondering about but didn't follow up on yet.
edit2:
Super Famista 5 (Japan).sfc
MD5 = C9A6F2140490FBCBA02DDECFD5409EA5
SHA-1 = 276991C3B74E92FF13352025520AD504E644B4AA
It's probably my fault somewhere when compiling. Added a few modifications for own testing, debugging.
Sour wrote:
The shape of the wave for "mode 6 layer 1" at the bottom is quite different from bsnes, I'm assuming mine is incorrect, but the animation feels broken in bsnes (but maybe it also does this on hardware?)
The hardware has the same OpT behavior for all modes and all layers: it should look like "Mode 2 Layer 1" above.
... or at least that's what I think I remember seeing when I had the logic analyzer clips in there...
topspoon wrote:
*** error: supposed to run nmi routine first because vc=225 when dma finished
I might have misinterpreted anomie's docs or higan's code, then - or both.
anomie wrote:
If the CPU is halted (i.e. for DMA) while /NMI goes low, the NMI will trigger after the DMA completes (even if /NMI goes high again before the DMA completes). In this case, there is a 24-30 cycle delay between the end of DMA and the NMI handler, time enough for an instruction or two.
So I guess that 24-30 cycle delay is when the "ldx #$7e80" instruction is executed, and the NMI fires after that one, but before the write to $2116. I'll have to recheck what higan does on this and see if I can remember why I came up with the current delay I have.
You're using the same version of the game as me, so either one of your modifications is causing it, or maybe there's some uninitialized variable somewhere that is causing this to happen somewhat randomly.
lidnariq wrote:
The hardware has the same OpT behavior for all modes and all layers: it should look like "Mode 2 Layer 1" above.
Attachment:
comparison.png [ 74.44 KiB | Viewed 3138 times ]
This is what it looks like at the moment (bsnes-plus and higan 104 are identical). In bsnes, the text scrolls in sync with the mode 5 text, but the mode 6 text's wave effect is diminished (seems like the sine's period is doubled or something akin). In Mesen-S the text's shape is similar to mode 2 & 4, but the scroll speed is halved compared to bsnes (the mode 5 text goes around the screen twice for each time the mode 6 text does). Assuming the scroll speed has a direct impact on this - either mesen-s scrolls too slow or bsnes too fast?
Sour wrote:
Assuming the scroll speed has a direct impact on this - either mesen-s scrolls too slow or bsnes too fast?
Here's a picture from a 1-1-1 SNES:
Attachment:
mIMG_6760.JPG
Other than one mistake I guess I made ("Mode 0 Layer 1" through "Mode 4 Layer 1" are two screens wide, while "Mode 5 Layer 1" and "Mode 6 Layer 1" are only one screen wide ... because they're all 512 pixels wide), "Mode 6 Layer 1" is supposed to look just like "Mode 2 Layer 1".
Quote:
If the CPU is halted (i.e. for DMA) while /NMI goes low, the NMI will trigger after the DMA completes (even if /NMI goes high again before the DMA completes). In this case, there is a 24-30 cycle delay between the end of DMA and the NMI handler, time enough for an instruction or two.
The 24-30 cycle note is a rough ballpark of one CPU instruction. higan implements this with the "IRQ lock" flag. It's just one cycle, but since interrupts are only tested on the last work cycle (one bus cycle before opcode fetch), it effectively bumps things another instruction ahead.
Wild Guns relies on this behavior to not flicker like crazy in-game.
Quote:
Here's a picture from a 1-1-1 SNES:
Jesus. That is a very impressive tech demo to test this but, any chance you could please make that a little (okay, a lot) less noisy?
I have no idea what I'm supposed to be seeing or what's supposed to be happening ^-^;;
I don't know what's supposed to happen but if there's a bug in higan I'd like to fix it~
Managed to fix it - the offset-per-tile for mode 6 had a couple of issues regarding the scroll position, etc.
Attachment:
ppubusact_000.png
This seems to be pretty identical to your screenshot. Although now I noticed that the sprites on top of everything are completely different in your screenshot vs all emulators I've tested? Any ideas on why?
byuu wrote:
it effectively bumps things another instruction ahead.
According to anomie's docs, writing to $420B (DMA enable) allows the CPU to continue up to the point where it reads the next instruction's op code before DMA starts. So is it just that the partially-read instruction resumes after DMA and after that it jumps to NMI/IRQ if needed?
byuu wrote:
I don't know what's supposed to happen but if there's a bug in higan I'd like to fix it~
In bsnes-plus, Mode2 and Mode4 layers do the sine wave up and down. But Mode6 doesn't move like 2/4.
Warlock -- in-game status bar has a flicker.
Code:
2108 (BG2SC) / 210b (BG12NBA)
00:E9F5 LDA $E4 hc=976 vc=0 fc=20
00:E9F7 STA $2108 hc=1000 vc=0 fc=20
00:E9FA LDA $E5 hc=1030 vc=0 fc=20
00:E9FC STA $210B hc=1054 vc=0 fc=20
00:E9FF LDA $1835 hc=1084 vc=0 fc=20
1104 = h-dma
1112 = draw
00:EA02 DEC A hc=1116 vc=0 fc=20
00:EA03 STA $2110 hc=1130 vc=0 fc=20
00:EA06 STZ $2110 hc=1160 vc=0 fc=20
Game changes bg2 nametables mid-scanline during active draw. bsnes-plus executes same timing but does not pixel flicker.
Does real hardware lock nametable addresses at start of scanline during rendering? Wondering why bsnes-plus doesn't have the glitch.
Sour wrote:
Although now I noticed that the sprites on top of everything are completely different in your screenshot vs all emulators I've tested? Any ideas on why?
Because evidently I forgot I'd made a newer version of the same test locally, with the sprites moved so that they don't overlap.
topspoon wrote:
Does real hardware lock nametable addresses at start of scanline during rendering?
I think this might be the same issue as NHL 94 & Air Strike Patrol - if the PPU was loading tile data ~16 dots before rendering it, it would probably be just enough to avoid the glitch in Warlock, I think.
lidnariq wrote:
Because evidently I forgot I'd made a newer version of the same test locally, with the sprites moved so that they don't overlap.
Oh, I guess I can assume my implementation of this is more or less ok then. Thanks!
Sour wrote:
if the PPU was loading tile data ~16 dots before rendering it, it would probably be just enough to avoid the glitch in Warlock, I think.
It looks to be ~16 pixels worth of garbage.
Code:
00:E9E8 LDA #$00
00:E9EA INC A
00:E9EB CMP #$07
00:E9ED BCC $FB
00:E9EF LDA #$00
00:E9F1 LDA #$00
00:E9F3 LDA #$00
00:E9F5 LDA $E4 hc=976 vc=0 fc=20
00:E9F7 STA $2108 hc=1000 vc=0 fc=20
00:E9FA LDA $E5 hc=1030 vc=0 fc=20
00:E9FC STA $210B hc=1054 vc=0 fc=20
00:E9FF LDA $1835 hc=1084 vc=0 fc=20
Although I tried changing cmp #$07 to cmp #$01 and tested bsnes-plus again. No pixel garbage.
American Tail and Rendering Ranger R2 won't boot. I'll look into R2.
topspoon wrote:
Although I tried changing cmp #$07 to cmp #$01 and tested bsnes-plus again. No pixel garbage.
I think you might be using the "performance" build of bsnes-plus? In the accuracy build, the glitches do appear (and are bigger if you switch $07 to $01). I checked on higan, there is a tiny glitch on the right side, too (about 8 or so pixels) - so changing the tile load logic in Mesen-S to fetch the tiles ahead of time will probably give a result similar to what higan gives at the moment, I think.
topspoon wrote:
American Tail and Rendering Ranger R2 won't boot. I'll look into R2.
It looks like both of these are also related to the SPC vs CPU issues that I haven't been able to figure out. Running the SPC 20-40% faster than normal allows them to boot. I've really run out of ideas on this one - I've double-checked and triple-checked a bunch of timing related things, and for the most part it should be pretty close to higan's, but somehow these games refuse to boot. If you manage to find the source of the problem, that'd be amazing.
I took a look at American Tail specifically: the SPC initialization timing being slightly different causes the CPU to take more time to get the SPC init loops done, which causes the PPU to trigger an IRQ too early in the code, which crashes the game.
Rendering Ranger R2:
Code:
bsnes-plus
..ff2a mov $0f4,#$ff A:ff X:ff Y:00 SP:01ff YA:00ff nvpbhizC
-- spc700 writes ff to 2140
36812b cmp #$ff A:0b00 X:0000 Y:0bb6 S:01f2 D:0000 DB:36 nvMxdIZc V:104 H:1248 F:34
36812d bne $8128 [368128] A:0b00 X:0000 Y:0bb6 S:01f2 D:0000 DB:36 nvMxdIzc V:104 H:1264 F:34
368128 lda $2140 [362140] A:0b00 X:0000 Y:0bb6 S:01f2 D:0000 DB:36 nvMxdIzc V:104 H:1286 F:34
36812b cmp #$ff A:0b00 X:0000 Y:0bb6 S:01f2 D:0000 DB:36 nvMxdIZc V:104 H:1316 F:34
-- port 0: smp -> cpu still shows 0x00
-- 65816 reads 2140 = 0x00
..ff2d mov a,$0f4 A:ff X:ff Y:00 SP:01ff YA:00ff nvpbhizC
-- port 0: smp -> cpu now shows 0xff
36812d bne $8128 [368128] A:0b00 X:0000 Y:0bb6 S:01f2 D:0000 DB:36 nvMxdIzc V:104 H:1332 F:34
368128 lda $2140 [362140] A:0b00 X:0000 Y:0bb6 S:01f2 D:0000 DB:36 nvMxdIzc V:104 H:1354 F:34
36812b cmp #$ff A:0bff X:0000 Y:0bb6 S:01f2 D:0000 DB:36 NvMxdIzc V:105 H: 20 F:34
-- 65816 reads updated 2140 = 0xff (good!)
..ff2f mov $0f4,#$fe A:fe X:ff Y:00 SP:01ff YA:00fe NvpbhizC
-- note that A = 0xfe (control code)
Code:
Mesen-S
36:8123 lda #$fe A=0B00 hc=760 vc=103 fc=34
36:8125 sta $2140 A=0BFE hc=776 vc=103 fc=34
-- important error: spc700 never sees 0xfe control code
...FF26 nop A=FF hc=806 vc=103 fc=34
36:8128 lda $2140 A=0BFE hc=806 vc=103 fc=34
...FF27 nop A=FF hc=836 vc=103 fc=34
36:812B cmp #$ff A=0B00 hc=836 vc=103 fc=34
36:812D bne $8128 A=0B00 hc=852 vc=103 fc=34
36:8128 lda $2140 A=0B00 hc=874 vc=103 fc=34
...FF28 nop A=FF hc=904 vc=103 fc=34
36:812B cmp #$ff A=0B00 hc=904 vc=103 fc=34
36:812D bne $8128 A=0B00 hc=920 vc=103 fc=34
36:8128 lda $2140 A=0B00 hc=942 vc=103 fc=34
--- port 0: smp -> cpu still reads 0x00
...FF29 nop A=FF hc=972 vc=103 fc=34
...FF2A mov $0f4,#$ff A=FF hc=972 vc=103 fc=34
--- port 0: smp -> cpu now shows 0xff (too early!)
36:812B cmp #$ff A=0BFF hc=972 vc=103 fc=34
36:812D bne $8128 A=0BFF hc=988 vc=103 fc=34
36:812F lda #$00 A=0BFF hc=1004 vc=103 fc=34
36:8131 sta $2140 A=0B00 hc=1020 vc=103 fc=34
36:8134 lda #$6f A=0B00 hc=1050 vc=103 fc=34
36:8136 sta $2143 A=0B6F hc=1066 vc=103 fc=34
...FF2D mov a,$0f4 A=FF hc=1096 vc=103 fc=34
36:8139 cmp $2143 A=0B6F hc=1096 vc=103 fc=34
...FF2F mov $0f4,#$fe A=00 hc=1126 vc=103 fc=34
36:813C bne $8134 A=0B6F hc=1126 vc=103 fc=34
36:8134 lda #$6f A=0B6F hc=1148 vc=103 fc=34
36:8136 sta $2143 A=0B6F hc=1164 vc=103 fc=34
36:8139 cmp $2143 A=0B6F hc=1194 vc=103 fc=34
36:813C bne $8134 A=0B6F hc=1224 vc=103 fc=34
36:8134 lda #$6f A=0B6F hc=1246 vc=103 fc=34
36:8136 sta $2143 A=0B6F hc=1262 vc=103 fc=34
...FF32 cmp a,#$fe A=00 hc=1292 vc=103 fc=34
...FF34 beq $ff43 A=00 hc=1292 vc=103 fc=34
36:8139 cmp $2143 A=0B6F hc=1292 vc=103 fc=34
** error: should be FF43 (A = FE); spc700 never sees this value and breaks
...FF36 A=00 hc=1322 vc=103 fc=34
Sour wrote:
According to anomie's docs, writing to $420B (DMA enable) allows the CPU to continue up to the point where it reads the next instruction's op code before DMA starts. So is it just that the partially-read instruction resumes after DMA and after that it jumps to NMI/IRQ if needed?
I wonder the same thing now after reading.
1. cpu: sta $420b
2. cpu: 1 cycle delay = fetch next opcode from memory (6,8,12)
3. cpu: halt, bus locked
4. dma: reset = align to next dma cycle boundary (2,4,6,8)
5. dma: controller init = 8 cycles
6. dma: channel init = 8 cycles
7. dma: transfer = 8 cycles * x bytes
8. dma: repeat 6-7 for more channels
9. cpu: reset = align to next cpu cycle boundary (2,4,6)
10. cpu: un-halt, resume bus operation
11. cpu: start internal operations, memory operations, etc.
12. cpu: process pending nmi, irq before next opcode fetch
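To make the arithmetic in the steps above easier to follow, here is a rough sketch of how the clock count could be tallied. `EstimateDmaEnd` and `AlignToNext` are hypothetical names, not taken from higan or Mesen-S. It assumes both alignments are measured against an absolute master clock counter and always move to the next boundary (so an already-aligned clock still waits a full step), which may not match what the hardware or higan actually does.

```cpp
#include <cstdint>
#include <vector>

// Moves clock to the next multiple of step that is strictly after it.
// Assumption: an already-aligned clock still waits one full step.
static uint64_t AlignToNext(uint64_t clock, uint32_t step)
{
    return clock + (step - clock % step);
}

// Hypothetical helper: estimates the master clock at which the CPU resumes.
//   startClock   - master clock right after the write to $420B completes
//   cpuCycleLen  - length (6, 8 or 12) of the CPU cycle that fetches the next opcode
//   channelBytes - number of bytes transferred by each enabled channel
uint64_t EstimateDmaEnd(uint64_t startClock, uint32_t cpuCycleLen,
                        const std::vector<uint32_t>& channelBytes)
{
    uint64_t clock = startClock + cpuCycleLen; // step 2: one more CPU cycle runs
    clock = AlignToNext(clock, 8);             // step 4: align to the DMA clock
    clock += 8;                                // step 5: controller init
    for (uint32_t bytes : channelBytes) {
        clock += 8;                            // step 6: channel init
        clock += 8ull * bytes;                 // step 7: 8 clocks per byte
    }
    return AlignToNext(clock, cpuCycleLen);    // step 9: realign to the CPU cycle
}
```

For example, a 4-byte transfer on one channel started at clock 0 with an 8-clock opcode fetch comes out at clock 72 under these assumptions.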
Axelay -- brightness + subscreen addition saturation problem
-- start new game, stage 1
See here:
https://github.com/snes9xgit/snes9x/issues/512
cgram 2122 writes should be latched -- on high (odd) address, it commits both data.
Thanks for the Rendering Ranger R2 analysis - it does look like the other games I've checked before. The SPC/CPU are somehow just slightly out of sync and it's causing some of the transmission data to be lost in the process. There's definitely something that's causing the timing problem - just need to find what it is. Spent some time yesterday reviewing a lot of the CPU's code to make sure it was timed properly, will keep doing a bit more of that today.
Re: CGRAM, should be fixed now - I had completely overlooked that detail (I thought there was a test rom for this, but I think I might be confusing myself with the OAM test roms?). Thanks!
For the post-DMA irq/nmi timing, changing it to that does fix the bug in Power Rangers, but I'm not sure if it breaks something else, would need to retest some games.
For Axelay, I tried comparing with higan, but can't see a difference? Mesen-S doesn't look like the Snes9x screenshot shown in the issue you linked:
Attachment:
axelay.png
Am I looking at the wrong thing?
Quote:
According to anomie's docs, writing to $420B (DMA enable) allows the CPU to continue up to the point where it reads the next instruction's op code before DMA starts.
That's another thing that also happens. That's part of the DMA<>CPU synchronization.
This is a really complex topic to talk about ... anomie's docs describe it in detail. But yes, one more instruction *cycle* is executed before the DMA begins, and you need to record whether that cycle took 6, 8, or 12 clocks. The clock is then aligned to a multiple of 8, then the DMA transfer runs, then the clock is aligned to a multiple of that last, extra instruction cycle (8, 6, or 12.)
As you can no doubt imagine, emulating this as a state machine is nightmare fuel.
But what I was mentioning, IRQ lock, is a different phenomenon. After (H)DMA, the next instruction cycle won't fire interrupts, which *can* be enough to let one more instruction execute (since, sta $420b writes to $420b as the last cycle.)
Sorry, I have to rely a lot on my imperfect memory, because I implemented this stuff ten or so years ago. I hope I'm doing it all justice, but it's definitely all there in my CPU code.
Quote:
Game changes bg2 nametables mid-scanline during active draw. bsnes-plus executes same timing but does not pixel flicker.
I'm afraid bsnes' PPU timings are not to be relied on. I have 15 years' worth of game bugfixes to get everything commercial games do just right (see eg the Axelay thing), but unfortunately, my test ROMs can't verify things that can't be read back (eg video output), so I don't know when the PPU actually latches settings for rendering.
It is, with no exaggeration, the last real frontier for SNES emulation. If we get someone with a logic analyzer to give us PPU fetch timings, then we can basically have our emulation be more precise and more faithful than the SNES Jr was.
There's about a dozen minor things I know aren't emulated, and probably several dozen I don't know about, but they're indeed very minor.
Quote:
Thanks for the Rendering Ranger R2 analysis - it does look like the other games I've checked before.
FWIW, Rendering Ranger R2 was a game that broke whenever I tried synchronizing the SMP to the CPU after every opcode, rather than after every cycle. It's very tightly timed code.
Sour wrote:
For Axelay, I tried comparing with higan, but can't see a difference? Mesen-S doesn't look like the Snes9x screenshot shown in the issue you linked
I have no idea what happened but it always looks correct now, using same test build when report created. :confused: I must have some really random glitchy stuff going on. :grumble:
byuu wrote:
then the clock is aligned to a multiple of that last, extra instruction cycle (8, 6, or 12.)
Ah. That's something I missed. Assumed it was aligned to the next cpu internal operation cycle.
byuu wrote:
But yes, one more instruction *cycle* is executed before the DMA begins
Something Sour and I were wondering is: that DMA delay of 1 cpu cycle is enough to fetch the next opcode from memory. But not enough to finish CPU internal operations?
So in pseudo-effect, it's..
- sta $420b
- do dma first
- then run (finish) next cpu instruction
- now handle pending interrupts
...?
I should check the code.
byuu wrote:
It is, with no exaggeration, the last real frontier for SNES emulation. If we get someone with a logic analyzer to give us PPU fetch timings,
Speaking of which...
If I have a logic analyzer with 16 clips, what signals do you want recorded? You've previously stated that it would start off with the 9 pins between the two PPUs (CHR0..CHR3, PRIO0..1, COLOR0..2 in jwdonal's redrawn schematic), but what after that?
(Also, what ROMs should be tested against?)
Quote:
I have no idea what happened but it always looks correct now, using same test build when report created. :confused: I must have some really random glitchy stuff going on. :grumble:
Whoever you are, I wish you were around when I was bug-hunting my SNES core, hahah. You're doing a thoroughly awesome job so far.
Quote:
now handle pending interrupts
Well ... interrupt triggering is tested on the start of the last work cycle, or the start of the first bus cycle of the next instruction. So to an emulator without explicit pipelining, it's one cycle before the end of the instruction. If you skip an IRQ test on that last cycle, then it effectively won't fire until the next whole instruction, even though it was only one cycle.
The one cycle does the whole deal, one bus cycle plus one work cycle, in full.
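A minimal sketch of the "skip one interrupt test" idea described above, for an emulator without explicit pipelining. `CpuSketch`, `irqLock` and the other names are hypothetical and this is not higan's actual code. It only models the polling point (the last cycle of each instruction) and a lock flag that masks a single poll.

```cpp
#include <cstdint>

// Hypothetical, heavily simplified CPU model: only interrupt polling is modeled.
struct CpuSketch {
    bool nmiPending = false;  // set when /NMI went low (e.g. while the CPU was halted for DMA)
    bool irqLock = false;     // set when a DMA has just completed
    bool servicePending = false;

    // Interrupts are tested once, on the last cycle of the instruction.
    void PollInterrupts()
    {
        if (irqLock) {
            irqLock = false;        // the lock masks exactly one poll
            servicePending = false;
            return;
        }
        servicePending = nmiPending;
    }

    // Runs one instruction of cycleCount cycles.
    // Returns true if the NMI handler should run before the next opcode fetch.
    bool RunInstruction(int cycleCount)
    {
        for (int i = 0; i < cycleCount; i++) {
            if (i == cycleCount - 1) {
                PollInterrupts();
            }
        }
        bool fire = servicePending;
        if (fire) {
            nmiPending = false;
            servicePending = false;
        }
        return fire;
    }
};
```

With `nmiPending` and `irqLock` both set, the first instruction after the DMA completes without taking the NMI, and the handler runs after the following one. That is the "one more instruction" effect, even though only one poll was skipped.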
byuu wrote:
The one cycle does the whole deal, one bus cycle plus one work cycle, in full.
Oh.
Phew. That whole procedure could get messy. Glad I'm not an emulator author. I could imagine running some derpy code in ram (or maybe worse, running it in system registers).
Code:
sta $420b
dma_target:
lda $18 (dp = $2100)
Where we read code from vram to dma_target to test what the emu does (like pre-fetch). Or set destination to stack and do sta $420b : plp. Oh the pain if we add interrupts.
I didn't get why the DMA controller had that delay sync to 8 cycles. And then always runs everything per 8. Until I realized CLK line probably feeds it at crystal / 8. So it has to wait for every falling / rising edge to actually do something.
And when it's done, CPU CLK line has what.. a multiplexer upline somewhere that chooses 6,8,12? So that's forced to wait for the edge to come and then move on.
I have to test out an SPC idea in Mesen-S to see what happens.
Working solution:
Code:
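uint8_t Spc::CpuReadRegister(uint16_t addr)
{
    // Latch the port value *before* catching the SPC up to the CPU,
    // so the CPU sees the value from before this sync point
    uint8_t val = _state.OutputReg[addr & 0x03];
    Run();
    return val;
}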
Let cpu read old spc700 value.
ActRaiser 2, Rendering Ranger R2, Illusion of Gaia / Time, American Tail, Hiouden - Mamono-tachi to no Chikai, Tales of Phantasia = working.
Kishin Douji Zenki - Tenchi Meidou = broken
Ys III - Wanderers from Ys -- save game doesn't write to sram correctly
IIRC, this uses a custom mapper board. SRAM should be mapped to 70:8000-FFFF.
And I guess it's worth bringing up that there's two non-enhanced games that use the pseudo-512 blending mode:
- Jurassic Park (hud, dialog boxes)
- Bishoujo Senshi Sailor Moon S - Kondo wa Puzzle de Oshioki yo! (main menu)
You could cheat by merging every discrete left + right pair back down to 256.
topspoon wrote:
Working solution:
That's interesting - I don't think the hardware does this, though? But at least it shows that there isn't much missing for it to actually work properly.
RE: Ys III - Wanderers from Ys - thanks, I'll take a look!
I haven't checked the sailor moon game, but the jurassic park one should be displaying properly, I've tested it a few times. Is it displaying wrong on your end?
In other news, I spent the day creating tests that measure the cycles used for every single opcode (and losing my sanity in the process) by running each one 100 times and then calculating how many PPU dots have elapsed using the H/V counters:
Attachment:
test.png
I found a couple of issues along the way, but it matches bsnes-plus now. Sadly, it didn't fix any of the freezes. But at least now I can be confident that my code is timed properly.
Still need to compare with higan and see if I can merge the tests into a single rom (instead of 27 roms). Will post the roms/source to these sometime tomorrow, out of time for the day.
byuu wrote:
Whoever you are, I wish you were around when I was bug-hunting my SNES core, hahah. You're doing a thoroughly awesome job so far.
I absolutely second that! Thank you so much for the time you've spent testing and even giving me outright solutions to the issues you find!
Sour wrote:
I haven't checked the sailor moon game, but the jurassic park one should be displaying properly, I've tested it a few times. Is it displaying wrong on your end?
Does the frontend do the blending? Because I'm using an unofficial one. The raw picture itself looks great, just the TV effect is missing.
Sour wrote:
That's interesting - I don't think the hardware does this, though? But at least it shows that there isn't much missing for it to actually work properly.
Honestly I'm not sure. I just imagined it this way:
- Thread A = cpu. In 1 "sync CLK" pulse, it does "nop : lda $2140"
- Thread B = spc700. In 1 "sync CLK" pulse, it does "sta $f4" (A = B0)
- SMP -> CPU 0 = EA
Last cycle CPU reads from $2140
Last cycle SPC700 writes to $F4
I'm not so sure a CLK pulse can immediately update the CPU in port at the same exact time SPC700 writes out? Guessing SPC700 seems to have a slower bus than the CPU side.
So extra assuming it then takes 1 SPC700 CLK to trickle the port values in/out on bus. And that SPC700 CLK is the guardian that controls this sync relationship; slower chip controls electronic handshaking.
CPU will read stale value of $EA first. Then spin loop and get $B0.
If it's immediate update, that might be too tight. Like how DMA sync can wait 8 cycles, because that CLK edge is that far out, and won't operate on the 0 cycle CLK edge since logic gates aren't built to handle it.
All speculation above.
But Kishin Douji Zenki - Tenchi Meidou could spook everything. Have to get a log running.
edit:
While it might be possible for SPC to write and CPU to read same SMP port at exact same time, I expect you'll get an unstable, indeterminate murky value of old + new.
It would take another CPU read to get the true value.
edit2:
This would be an interesting replacement idea to test, instead of returning old value straight.
topspoon wrote:
Does the frontend do the blending? Because I'm using an unofficial one. The raw picture itself looks great, just the TV effect is missing.
The core supports blargg's NTSC filter - if you turn that on, the blending should look alright.
RE: SPC, without calling Run() before the read, you actually end up reading an old value that the SPC might have set hundreds of cycles ago. The SPC is only executed when the CPU reads/writes to it (or once at the end of each frame). If I change the code to run the SPC in sync with the CPU, reading the value before calling Run() no longer has an impact. Still, it might be a clue as to what is causing the actual problem here.
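To illustrate why the read order matters with this kind of catch-up scheduling, here is a toy sketch. `LazySpc` is a made-up stand-in, not Mesen-S code: its "program" simply stores the low byte of its cycle counter in port 0 on every cycle, so a stale value is easy to spot.

```cpp
#include <cstdint>

// Hypothetical toy model of an SPC that only runs when the CPU touches its ports.
struct LazySpc {
    uint64_t cycle = 0;
    uint8_t outputReg[4] = {0, 0, 0, 0};

    // Stand-in for executing SPC code: one cycle per step, and the
    // "program" writes the low byte of the cycle counter to port 0.
    void Step()
    {
        cycle++;
        outputReg[0] = static_cast<uint8_t>(cycle);
    }

    // Catches the SPC up to the CPU's current position.
    void RunUntil(uint64_t target)
    {
        while (cycle < target) {
            Step();
        }
    }

    // Catch up first, then read: the CPU sees the value as of "now".
    uint8_t ReadAfterCatchUp(uint16_t addr, uint64_t target)
    {
        RunUntil(target);
        return outputReg[addr & 0x03];
    }

    // Read first, then catch up: the CPU sees whatever was in the port
    // at the previous sync point, however long ago that was.
    uint8_t ReadBeforeCatchUp(uint16_t addr, uint64_t target)
    {
        uint8_t val = outputReg[addr & 0x03];
        RunUntil(target);
        return val;
    }
};
```

If the last sync was at cycle 10 and the CPU reads at cycle 300, `ReadBeforeCatchUp` returns the value from cycle 10, which is 290 cycles stale rather than one cycle stale.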
The CPU/SPC can read/write the same port at the same time - I think anomie's documentation said this usually returns the AND of both values (this isn't implemented in any way in my code, though)
--
I finished working on my CPU timing test (attached, with source) and the results from a sd2snes are available here:
https://www.youtube.com/watch?v=3myuKodnw_k (thanks to koitsu for recording this!)
The test goes through almost every single op code on the 65816, runs a small benchmark and displays a value representing the number of PPU dots it took to perform the test. The actual values are meaningless - the important part is that the results should match the hardware values (+/- 1, or sometimes +/- 10 when dram refresh gets in the way)
The rom goes through 54 separate screens (27 without fastrom, and then the same tests, with fastrom turned on), testing most op codes with various combinations of the X/M flags. It takes about 2 minutes to run. Might try to add a few more test cases into it eventually, but this is a pretty decent start - hopefully this is useful to someone else making a SNES core at some point!
I've found and fixed a few timing issues thanks to this, but it still hasn't been enough to fix the games that freeze. Will have to start looking at other stuff (DMA timing, IRQ timing, etc) to see if I can find more issues.
Sour wrote:
The core supports blargg's NTSC filter - if you turn that on, the blending should look alright.
Yup. That does it.
Sour wrote:
RE: SPC, without calling Run() before the read, you actually end up reading an old value that the SPC might have set hundreds of cycles ago. The SPC is only executed when the CPU reads/writes to it (or once at the end of each frame). If I change the code to run the SPC in sync with the CPU, reading the value before calling Run() no longer has an impact.
Oof.
Sour wrote:
The CPU/SPC can read/write the same port at the same time - I think anomie's documentation said this usually returns the AND of both values (this isn't implemented in any way in my code, though)
Oh cool. So it returns a possibly unstable value when it happens. This is worth looking into. Will do that first then.
Sour wrote:
The test goes through almost every single op code on the 65816, runs a small benchmark and displays a value representing the number of PPU dots it took to perform the test. The actual values are meaningless - the important part is that the results should match the hardware values (+/- 1, or sometimes +/- 10 when dram refresh gets in the way)
That's very neat!
edit:
old & new
Rendering Ranger R2, Hiouden = gets farther but hangs
ActRaiser 2, American Tail, Illusion of Gaia = okay
Sour wrote:
The CPU/SPC can read/write the same port at the same time - I think anomie's documentation said this usually returns the AND of both values (this isn't implemented in any way in my code, though)
It's OR:
anomie (timing.txt) wrote:
The SPC700 communicates with the S-CPU via 4 registers. Exact memory access
timings on these registers is not known, however it is possible that the 5A22
will be performing a read at the instant the SPC700 is performing a write. The
5A22 will then read the logical OR of the old and new values of the register.
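Taken literally, the quoted rule is a one-liner. The sketch below only shows the value computation. `ReadPortDuringWrite` is a hypothetical name, and deciding *when* a read counts as colliding with a write is the hard part that this does not address.

```cpp
#include <cstdint>

// Value the 5A22 would read if its read coincides with an SPC700 write,
// following the rule quoted from anomie's timing.txt (OR of old and new).
uint8_t ReadPortDuringWrite(uint8_t oldValue, uint8_t newValue)
{
    return oldValue | newValue;
}
```

With the values used earlier in the thread, an old value of $EA and a new value of $B0 would read back as $FA, while the $43 to $53 transition from the Rendering Ranger R2 log would read back as $53, since every bit set in $43 is also set in $53.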
Sour wrote:
I finished working on my CPU timing test (attached, with source)
I ran the test on my units:
http://www.mediafire.com/folder/dg6s2he ... ng_test_v2
Quote:
Oh cool. So it returns a possibly unstable value when it happens. This is worth looking into. Will do that first then.
Many, many years ago (ten, probably), I wrote a test ROM to start trying to emulate this behavior.
I loaded it onto my SNES copier (Super UFO 8.3j), and it froze the system. The copier stopped working after that. Luckily I had a second one.
I am 99% sure this was just coincidental. But it completely scared me away from trying to test bus conflicts and hardware crashes.
If anyone's willing to give it a go though, this and the SNES CPU revision 1 DMA<>HDMA crash would be very good things to emulate. They're definitely problems homebrew devs could accidentally run into. (and in fact, Parallel Worlds did with the DMA crash.)
Quote:
The actual values are meaningless - the important part is that the results should match the hardware values (+/- 1, or sometimes +/- 10 when dram refresh gets in the way)
The big revolution for bsnes' timings was me writing two key functions that I use in all of my test_* ROMs:
First, write a function that you can call, and once it returns, Vcounter=0, and Hcounter=0 (not Hdot ... the actual counter.)
Second, write a function that, when called, will consume N master clock (21 MHz) cycles, where N is the value in the 16-bit A register.
With these two tests, you can get 100% stable results every time you run a timing test. You can write tests that write to PPU registers at exact moments, and then log the results, etc.
If you're not sure how to do it, some of my test ROMs that Evan rehosted on snescentral.com's homebrew page should have the source with them. But I went with a really gross brute-force code generating method for doing them. I'm sure you could do much better ^-^;
I'd highly recommend you make these two functions for your own tests. Use bsnes or Mesen tracelogs that store Vcounter/Hcounter per CPU instruction to ensure the functions are correct.
Once you have this, DRAM refresh is pretty stable. When you want to get into hopeless pedantry, the actual cycle where DRAM refresh fires each scanline varies between CPU revision 1 and 2, but it's a million times easier than the absolute depraved insanity that is the Sega Genesis' four asynchronous DRAM refresh behaviors ^-^;
Quote:
You could cheat by merging every discrete left + right pair back down to 256.
A better way to do it is to blend every pixel 50% with the *input* (not output) pixel before (or after) it. Here's my code for it (supports 24-bit and 30-bit color):
Code:
if(colorBleed) {
  uint32 mask = depth == 30 ? 0x40100401 : 0x01010101;
  //note: this isn't demanding enough for #pragma omp parallel for
  //unlike with snes_ntsc or HQ2x, OpenMP will just make it slower.
  for(uint y : range(height)) {
    auto target = output + y * width;
    for(uint x : range(width)) {
      auto a = target[x];
      auto b = target[x + (x != width - 1)];
      target[x] = (a + b - ((a ^ b) & mask)) >> 1;
    }
  }
}
I believe the credit for this goes to either anomie or blargg. Definitely blargg's math blending trick in any case.
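For anyone who wants to try the blending trick outside of higan's helper types, here is a small standalone sketch. `Average24` and `BlendRow` are hypothetical names, and this version only covers the 24-bit case, assuming the top byte of each pixel is zero so that `a + b` cannot overflow 32 bits.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Averages two packed 8-bit-per-channel pixels, channel by channel, rounding down.
// (a ^ b) & mask isolates the low bit of each channel where a and b differ;
// subtracting it makes every per-channel sum even, so the shift cannot
// carry a bit down into the neighboring channel.
// Assumption: the top byte of both pixels is zero (24-bit color).
uint32_t Average24(uint32_t a, uint32_t b)
{
    const uint32_t mask = 0x01010101;
    return (a + b - ((a ^ b) & mask)) >> 1;
}

// Blends each pixel of a row 50% with the *input* pixel to its right.
// Writing left to right keeps row[x + 1] unmodified when it is read.
// The last pixel has no right neighbor and is left as-is.
void BlendRow(std::vector<uint32_t>& row)
{
    for (size_t x = 0; x + 1 < row.size(); x++) {
        row[x] = Average24(row[x], row[x + 1]);
    }
}
```

A pure red pixel (0xFF0000) averaged with black gives 0x7F0000, and a row of black, white, black becomes two grey pixels (0x7F7F7F) followed by black.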
Pseudo-hires is used to blend two 256-width layers together, and hires is used to draw one 512-width layer.
However, it is absolutely possible to draw a 512-width layer using pseudo-hires, or two 256-width layers using hires, if you interleave the tile data appropriately. Now make a demo where you switch between the two with a single button press and what you find is that the output looks 100% identical. That means there is light blurring (as a result of analog video) on a real SNES in both pseudo-hires and true hires mode. So, even though it makes true hires 512-width text (eg G.O.D., Marvelous, Rudra no Hihou, etc) look a bit worse ... it should be blended always.
It's perfectly fine to require blargg's snes_ntsc to simulate the hires blending, but if you want the lightest-weight filter that won't distort the image more than necessary to simulate the pseudo-hires translucency effects, the above code should get the job done.
Up to you of course, Sour ^-^;
creaothceann wrote:
I ran the test on my units
Nice! Thanks for taking the time to record that. At first glance it seems like both NTSC recordings are essentially identical (+/- 1 dot on some values). I haven't compared with koitsu's numbers, though, but I'd assume they're about the same too. It looks like the performance gap with PAL is so tiny that it doesn't really alter the numbers much - maybe if I made each test 10x longer it might make it more obvious (might try doing that sometime)
byuu wrote:
If you're not sure how to do it, some of my test ROMs that Evan rehosted on snescentral.com's homebrew page should have the source with them. But I went with a really gross brute-force code generating method for doing them.
Oh, that's right - I do think I have the source for those. Had completely forgotten about them (and I'm actually unsure if they work as intended on Mesen-S at the moment) - will try to dig them up and see if I figure out how to use them next time I try to time some behavior.
For this particular test, I was mostly just trying to confirm all idle cycles and the like - as is, any missing/extra cycles causes the value to jump up/down by ~150, so it makes it pretty obvious to find any major implementation issues.
byuu wrote:
It's perfectly fine to require blargg's snes_ntsc to simulate the hires blending, but if you want the lightest-weight filter that won't distort the image more than necessary to simulate the pseudo-hires translucency effects, the above code should get the job done.
I hadn't even really considered the whole blending & transparency side of things, actually. Definitely might be worth adding another video filter that does just a simple blending like the one you just posted - I'll add it to my list, thanks!
---
I've fixed up the SRAM mappings for Wanderers (based on the info byuu had posted a few pages back) - so it should be working properly now (and hopefully nothing else broke in the process!)
I gave fixing the whole DMA/IRQ timing issue a try, too. Wrote a pretty simple test suite to validate a few scenarios on hardware. Waiting on the hardware results, but for now I'm assuming higan is correct and my implementation gives the same result as higan for the test. It fixes the Power Rangers game, too.
Ranger R2 behavior
Code:
Cpu 1:
36:852d cmp $2143 (in = $43)
spc-pre hc=1072 vc=170 fc=336 || cycle=11472160 target=11472158
36:852D A=0053 hc=1072 vc=170 fc=336
spc-post hc=1102 vc=170 fc=336 || cycle=11472160 target=11472160
Cpu 2:
36:8530 bne $852d
...095d mov a,#$53
spc-pre hc=1102 vc=170 fc=336 || cycle=11472160 target=11472160
36:8530 A=0053 hc=1102 vc=170 fc=336
spc-post hc=1124 vc=170 fc=336 || cycle=11472160 target=11472163
...095D A=06 hc=1124 vc=170 fc=336 || cycle=11472160 target=11472163
Cpu 3:
36:852d cmp $2143 (in = $43 --> $53)
...095f mov $0f7,a
spc-pre hc=1124 vc=170 fc=336 || cycle=11472164 target=11472163
36:852D A=0053 hc=1124 vc=170 fc=336
...095F A=53 hc=1154 vc=170 fc=336 || cycle=11472164 target=11472165
spc-post hc=1154 vc=170 fc=336 || cycle=11472172 target=11472165
** important note: SPC cycle 11472172 is when $0f7 is "officially" committed and done (8 cycle SPC instruction)
Reading any time between 11472164 - 11472171 can be considered stale or unstable
Cpu 4:
36:8530 bne $852d
spc-pre hc=1154 vc=170 fc=336 || cycle=11472172 target=11472165
36:8530 A=0053 hc=1154 vc=170 fc=336
spc-post hc=1170 vc=170 fc=336 || cycle=11472172 target=11472167
** error: now we're exiting spin loop too early
Cpu 5:
36:8532 lda #$59 (error: we shouldn't be here yet)
spc-pre hc=1170 vc=170 fc=336 || cycle=11472172 target=11472167
36:8532 A=0053 hc=1170 vc=170 fc=336
spc-post hc=1186 vc=170 fc=336 || cycle=11472172 target=11472168
Cpu 6:
36:8534 sta $2143 (error: this is bad. spc will not see old $0f7 = $53 value)
spc-pre hc=1186 vc=170 fc=336 || cycle=11472172 target=11472168
36:8534 A=0059 hc=1186 vc=170 fc=336
spc-post hc=1216 vc=170 fc=336 || cycle=11472172 target=11472171
*** note: we could also do something similar for spc in-port.
spc timing port update.
Cpu 7:
36:8537 cmp $2143
...0961 cmp a,$0f7 ($0f7 = $59, bad bad)
spc-pre hc=1216 vc=170 fc=336 || cycle=11472172 target=11472171
36:8537 A=0059 hc=1216 vc=170 fc=336
...0961 A=53 hc=1246 vc=170 fc=336 || cycle=11472172 target=11472174
spc-post hc=1246 vc=170 fc=336 || cycle=11472178 target=11472174
** note: 11472172 SPC cycle reached. Should be safe to update SMP -> CPU port.
But it's way too late now.
Cpu 8:
36:853a bne $8537 (start spin loop)
spc-pre hc=1246 vc=170 fc=336 || cycle=11472178 target=11472174
36:853A A=0059 hc=1246 vc=170 fc=336
spc-post hc=1268 vc=170 fc=336 || cycle=11472178 target=11472176
Cpu 9:
36:8537 cmp $2143
...0963 bne $0961
spc-pre hc=1268 vc=170 fc=336 || cycle=11472178 target=11472176
36:8537 A=0059 hc=1268 vc=170 fc=336
...0963 A=53 hc=1298 vc=170 fc=336 || cycle=11472178 target=11472179
spc-post hc=1298 vc=170 fc=336 || cycle=11472186 target=11472179
Cpu x:
We're fatally deadlocked. The End.
spc-pre hc=1298 vc=170 fc=336 || cycle=11472186 target=11472179
36:853A A=0059 hc=1298 vc=170 fc=336
spc-post hc=1320 vc=170 fc=336 || cycle=11472186 target=11472181
spc-pre hc=1320 vc=170 fc=336 || cycle=11472186 target=11472181
36:8537 A=0059 hc=1320 vc=170 fc=336
spc-post hc=1350 vc=170 fc=336 || cycle=11472186 target=11472184
spc-pre hc=1350 vc=170 fc=336 || cycle=11472186 target=11472184
36:853A A=0059 hc=1350 vc=170 fc=336
spc-post hc=8 vc=171 fc=336 || cycle=11472186 target=11472186
spc-pre hc=8 vc=171 fc=336 || cycle=11472186 target=11472186
36:8537 A=0059 hc=8 vc=171 fc=336
...0961 A=53 hc=38 vc=171 fc=336 || cycle=11472186 target=11472189
spc-post hc=38 vc=171 fc=336 || cycle=11472192 target=11472189
spc-pre hc=38 vc=171 fc=336 || cycle=11472192 target=11472189
36:853A A=0059 hc=38 vc=171 fc=336
spc-post hc=60 vc=171 fc=336 || cycle=11472192 target=11472191
spc-pre hc=60 vc=171 fc=336 || cycle=11472192 target=11472191
36:8537 A=0059 hc=60 vc=171 fc=336
...0963 A=53 hc=90 vc=171 fc=336 || cycle=11472192 target=11472194
spc-post hc=90 vc=171 fc=336 || cycle=11472200 target=11472194
Sour wrote:
I gave fixing the whole DMA/IRQ timing issue a try, too. I wrote a pretty simple test suite to validate a few scenarios on hardware. I'm still waiting on the hardware results, but for now I'm assuming higan is correct, and my implementation gives the same result as higan for the test. It also fixes the Power Rangers game.
You've definitely got the right idea to always confirm on real hardware.
I don't have an actual solid explanation for things like the IRQ lock, and even though I understand the DMA->CPU sync, it makes no sense to me why the CPU would need that to occur. The CPU->DMA sync, sure, I get that part.
I think I made a list of all my known-unknowns already. But it's very possible if you go poking about in the shadier bits, you'll find something I don't emulate or get wrong, so always be suspicious. I'd hate for an error on my part to get repeated everywhere until it becomes 'canon' ^_^
If you or anyone else does find anything I fail, please please let me know.
...
Quote:
Ranger R2 behavior
Try to match the cycle timings to bsnes' trace logs. That should very quickly reveal any timing misalignments.
The big one is going to be DMA<>CPU sync. If Sour has that, you should get a perfect match by ensuring both emulators use the same CPU<>SMP clock frequencies. Even if they don't *actually* run in sync, as long as they sync up before reads/writes to $2140-7f / $fx, it should be fine.
Mesen-S may need the bus hold delays to 100% match: I loosely simulate the analog nature of the bus holding addresses/data for 6-12 cycles by performing the reads 4 cycles before the end of the actual read (so 2,4,8 cycles in) to simulate the observable difference between latching the PPU counters via $2137 reads and $4201 writes.
byuu wrote:
I don't have an actual solid explanation for things like the IRQ lock
My implementation is simple at the moment: after a write to $420B, the next cycle reads the next opcode, then the DMA takes over. Once the DMA is finished, that instruction resumes as normal. At the very least, it gives the same result as higan for the test I wrote (and the hardware result is identical, too). I'm assuming there is something else in the way I keep track of the IRQ flags that makes any additional processing unnecessary (or maybe my current implementation is not 100% accurate, either).
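A minimal sketch of the ordering described here (write to $420B, then one opcode fetch, then the DMA, then the rest of the instruction). `ToyCpu` and its members are hypothetical names for illustration only; this is not the Mesen-S or higan code, and it models nothing but the order of events.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy model: records the order of bus events only.
struct ToyCpu {
    bool dmaPending = false;       // set by a non-zero write to $420B
    std::vector<std::string> log;  // order in which things happened

    void Write(uint32_t addr, uint8_t value) {
        log.push_back("write");
        if(addr == 0x420B && value != 0) {
            dmaPending = true;     // the DMA does not start on this cycle
        }
    }

    void RunDma() { log.push_back("dma"); }

    // One instruction: the opcode fetch happens first, then (if requested)
    // the DMA runs to completion, then the instruction resumes.
    void ExecInstruction() {
        log.push_back("fetch");
        if(dmaPending) {
            dmaPending = false;
            RunDma();
        }
        log.push_back("execute");
    }
};
```

Calling `Write(0x420B, 0x01)` followed by `ExecInstruction()` leaves the log as write, fetch, dma, execute.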
byuu wrote:
Try to match the cycle timings to bsnes' trace logs. That should very quickly reveal any timing misalignments.
Speaking of which, do you have a build of bsnes/higan I could use to produce trace logs? bsnes-plus is different enough that I would rather be able to compare with higan itself. If I could just somehow make them sync perfectly for the first few hundred thousand operations, then I might be able to figure out what the remaining issues are.
The DMA/HDMA sync is one of the last few things I can see as the cause for the SPC issues, at the moment, but I've already spent a lot of time trying to get it right a couple of months ago, so unsure how much still has to be fixed there. I also made a test rom a couple of months ago that tries to catch errors in dma/hdma/irq timings (by running a bunch of DMAs and IRQs, and running some code during several frames filled with HDMA, etc.):
Attachment: compare.png
That's what the results look like atm (right side is koitsu's SNES with a sd2snes). It's not really a conclusive test on anything, but it did help me figure out a number of issues in my DMA logic so far, and it looks like there's something odd about the HDMA timing in mesen-s, but at the same time, I don't think any of the games that freeze really use HDMA before freezing, so...
Quote:
I loosely simulate the analog nature of the bus holding addresses/data for 6-12 cycles by performing the reads 4 cycles before the end of the actual read (so 2,4,8 cycles in) to simulate the observable difference between latching the PPU counters via $2137 reads and $4201 writes.
Well, that's something I didn't expect...
topspoon wrote:
Ranger R2 behavior
Thanks - I'll try comparing some trace logs on my end too over the next few days and see if I can figure out the differences between higan and Mesen-S. There has to be one or two relatively major things that might explain why so many games freeze up at boot.
To me, the main kicker is when bsnes-plus updates the 2143 cpu reads:
Code:
..095f mov $0f7,a A:53 X:06 Y:0c SP:01fb YA:0c53 nvpbhizc
368530 bne $852d [36852d] A:0253 X:0001 Y:0005 S:01f1 D:0000 DB:36 nvMXdIzC V: 90 H: 748 F:59
36852d cmp $2143 [362143] A:0253 X:0001 Y:0005 S:01f1 D:0000 DB:36 nvMXdIzC V: 90 H: 770 F:59
368530 bne $852d [36852d] A:0253 X:0001 Y:0005 S:01f1 D:0000 DB:36 nvMXdIzC V: 90 H: 800 F:59
..0961 cmp a,$0f7 A:53 X:06 Y:0c SP:01fb YA:0c53 nvpbhizc
36852d cmp $2143 [362143] A:0253 X:0001 Y:0005 S:01f1 D:0000 DB:36 nvMXdIzC V: 90 H: 822 F:59
368530 bne $852d [36852d] A:0253 X:0001 Y:0005 S:01f1 D:0000 DB:36 nvMXdIZC V: 90 H: 852 F:59
368532 lda #$59 A:0253 X:0001 Y:0005 S:01f1 D:0000 DB:36 nvMXdIZC V: 90 H: 868 F:59
..0963 bne $0961 A:53 X:06 Y:0c SP:01fb YA:0c53 nvpbhiZC
Notice how spc writes to $f7. Then cpu side still spins a few times afterward. Because it's still reading value $43 from $2143. After 0961 runs, $2143 is updated and cpu can exit.
If you can print the low-level timing logs from bsnes of what happens each cycle in that code block, you'll see exactly when $2143 is updated with the new value $53.
Which is definitely not immediate but enough master cycles away from when "mov $0f7,a" actually starts. By the time "cmp a,$0f7" executes, $2143 is guaranteed to return a clean stable value.
I can't explain this better. Please try getting the low gritty raw timing #s out of bsnes. 1 master cycle at a time. Then you'll see what I'm trying to point out in my previous log.
Note that Mesen-S basically does this:
Code:
mov $0f7,a
cmp $2143
bne $852d
lda #$59
sta $2143
See how emu updates $2143 way too early? CPU doesn't even spin around at all!
You'll need to record a cycle timestamp of when $2143 needs to refresh its new value. And return some stale or mutated value before then.
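A rough sketch of the timestamp idea, assuming the new value only becomes visible to the CPU once the SPC cycle counter reaches a recorded commit point. `DelayedPort` and its members are hypothetical names, not Mesen-S or bsnes code, and returning the old value before the commit point is just one of the "stale or mutated" options.

```cpp
#include <cstdint>

// Hypothetical sketch: an SMP->CPU port whose new value only becomes
// visible to the CPU once the SMP cycle counter reaches a commit timestamp.
struct DelayedPort {
    uint8_t visible = 0;       // value the CPU currently reads
    uint8_t pending = 0;       // value written by the SMP, not yet committed
    uint64_t commitCycle = 0;  // SMP cycle at which 'pending' becomes visible
    bool hasPending = false;

    void SmpWrite(uint8_t value, uint64_t commitAt) {
        pending = value;
        commitCycle = commitAt;
        hasPending = true;
    }

    uint8_t CpuRead(uint64_t smpCycle) {
        if(hasPending && smpCycle >= commitCycle) {
            visible = pending;
            hasPending = false;
        }
        return visible;        // stale value until the commit point
    }
};
```

With the numbers from the earlier log, a write of $53 committed at SPC cycle 11472172 would keep returning $43 for reads at cycles 11472164 through 11472171.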
That's the best of my ability to describe what I'm seeing. Sorry. But I'm confident you'll get a solution put into Mesen-S soon.
edit:
I know there's some Japan games with unusual mappings. Snes9x suggests these but I don't remember the physical names.
Code:
if (strncmp(ROMName, "SOUND NOVEL-TCOOL", 17) == 0 ||
    strncmp(ROMName, "DERBY STALLION 96", 17) == 0)
    Map_ROM24MBSLoROMMap();
else
if (strncmp(ROMName, "THOROUGHBRED BREEDER3", 21) == 0 ||
    strncmp(ROMName, "RPG-TCOOL 2", 11) == 0)
    Map_SRAM512KLoROMMap();
I know others will eventually bring up the multi-carts and other fun stuff in the no$cash archive.
If I see anything else that interests me, I'll chime in. Probably not since the details are getting too low-level technical.
Quote:
Speaking of which, do you have a build of bsnes/higan I could use to produce trace logs?
Ah yes, I still need to make that for you ... can you compile from source? If so, it'll save me some effort making a Windows binary on my other dev box. I'll use bsnes v107, which is identical to higan once you disable all the speed hacks (eg scanline-based renderer.)
Quote:
I've already spent a lot of time trying to get it right a couple of months ago
It'll be essential to match my timings exactly, so I guess we'll find out soon ^-^;
Quote:
Well, that's something I didn't expect...
When my hands feel better, I'm going to write an article about this. But here's the basics:
Say the SNES CPU executes an instruction, and one instruction reads from $7e2000. What actually happens is the CPU sets the 24 address bus pins to $7e2000, and then sets the /RD (read) pin. Since $7e2000 is normal speed ("SlowROM"), it then waits 8 master clocks. Anything on the bus will be watching the address pins, seeing the /RD signal, and if they choose to, they can drive the eight data bus pins. The SNES WRAM chip is what responds in this case (technically the CPU probably does it, but I digress.) After the eight cycles, the CPU stops driving /RD, and then it samples the current data bus pins for the resultant value.
Writes are similar: it sets the address bus, sets the data bus, sets the /WR pin, and then waits eight cycles. Any chip that cares will have eight master cycles to copy the data bus value. And things do: the S-DD1 watches $43xx DMA register writes to do its magic.
The key here is: the CPU doesn't read or write exactly after the 6, 8, or 12 cycles to whatever address is pointed at.
Things come into play like propagation delays, waiting for bus values to stabilize, etc.
If we want to get even more pedantic, the Game Boy and Z80 are known for having behaviors that occur on either the rising or falling edge of clock cycles. bsnes has some careful ordering of operations (DMA, ALU, etc) that is likely hiding what is in reality differences between rising and falling edges (or perhaps just one-clock differences ... internally bsnes steps its CPU two clocks at a time over the 21.47MHz master clock.)
This pedantry comes into play in many cases:
For the SNES, $2137 reads latch the counters four clocks before $4201 writes. If you just emulate the read/write at the end of the 6, 8, 12 clock cycles, the counters would latch the same value, and that would be incorrect.
For the Game Boy, writes to the wave RAM trigger take effect two clock cycles after writes to wave RAM. If you don't emulate that, you will have to do crazy latching to pass blargg's dmg_sound tests.
For the Mega Drive, lots of chips share the same bus and assert "acknowledge" and "refresh" pins (eg /DTACK, /RAS) that cause the CPUs to pause until those pins stop being driven.
Of course, an operation doesn't have to 'complete' by the end of a write cycle, it only has to latch a copy of what's on the data bus by that point, lest the data bus change soon after. So it could be that a read from $2137 happens after 6 cycles, but a write to $4201 latches counters after 10 cycles.
How do we know the difference? We don't! And short of some hardware god analyzing chip decaps, we can't!
All we can really do is emulate these differences as we find ways to observe them. I suspect if and when we get really deep into PPU cycle timings, we're going to find that my current model of reads at cycle_count-4, writes at cycle_count-0 is too simplistic.
But for now, with the knowledge I have, I base all of my timings as stated above. The real hell though is, change those values and all of my carefully constructed timing values break. Perform reads at cycle_count-2, and now when the PPU sets the Vblank and Hblank bits are off by two clocks, etc.
I have no real solution for this mess right now.
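The model described above (reads applied at cycle_count-4, writes at cycle_count-0) can be sketched as two small helpers. The function names are hypothetical and this only restates the arithmetic from the post; it is not bsnes code.

```cpp
#include <cstdint>

// Offset in master clocks, from the start of a bus access, at which the
// access is applied. accessLength is 6, 8 or 12 master clocks.
constexpr uint32_t ReadOffset(uint32_t accessLength) {
    return accessLength - 4;   // reads at cycle_count-4
}

constexpr uint32_t WriteOffset(uint32_t accessLength) {
    return accessLength;       // writes at cycle_count-0
}

// "2,4,8 cycles in" for the three access speeds.
static_assert(ReadOffset(6) == 2, "fast access");
static_assert(ReadOffset(8) == 4, "slow access");
static_assert(ReadOffset(12) == 8, "extra-slow access");
```

For two accesses of the same length that start on the same clock, the read lands four clocks before the write, which is the observable $2137 versus $4201 difference mentioned above.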
Quote:
You'll need to record a cycle timestamp of when $2143 needs to refresh its new value. And return some stale or mutated value before then.
As I stated earlier, Rendering Ranger R2 is one of many games that absolutely requires cycle-level synchronization of the SMP to the CPU. If the CPU is calling the SMP, and the SMP is executing entire instructions before returning, then it's not going to work. It will work in much less accurate emulators by futzing with the timings (as in ZSNES and older Snes9X versions), but it will break when you get more cycle accurate timings (as in bsnes and Mesen-S. I also wrote a cycle-accurate SMP core that Snes9X uses now.)
byuu wrote:
As I stated earlier, Rendering Ranger R2 is one of many games that absolutely requires cycle-level synchronization of the SMP to the CPU. If the CPU is calling the SMP, and the SMP is executing entire instructions before returning, then it's not going to work.
And you sir, were completely right! I'm amazed that something like this has a major impact, but it actually does, heh.
Spent the morning trying to come up with a timestamp solution - I thought it worked, all the games were fixed. Except it broke a ton of other games. So I scrapped that, and figured I'd just give in and rework the code to be able to run a single SPC cycle at a time. And lo and behold, it works! Just doing that fixes Hiouden, Illusion of Gaia, ActRaiser 2 and Tales of Phantasia.
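A toy illustration of why cycle-level stepping matters, assuming the SPC can be advanced exactly one cycle at a time and is caught up to the CPU before every port read. `ToySmp` and its members are hypothetical; this is not the actual Spc class.

```cpp
#include <cstdint>

// Hypothetical toy SMP: each Step() advances exactly one SMP cycle, so the
// CPU side can stop it at any cycle boundary instead of only between
// whole instructions.
struct ToySmp {
    uint64_t cycle = 0;
    uint8_t outputPort = 0x43;
    uint64_t writeCycle = 0;   // cycle on which a port write takes effect
    uint8_t writeValue = 0;

    void Step() {
        cycle++;
        if(cycle == writeCycle) {
            outputPort = writeValue;
        }
    }

    // Catch up to the CPU one cycle at a time, then service the read.
    uint8_t CpuReadPort(uint64_t masterClock, double clockRatio) {
        uint64_t target = (uint64_t)(masterClock * clockRatio);
        while(cycle < target) {
            Step();
        }
        return outputPort;
    }
};
```

If the write lands on SMP cycle 10, a CPU read that maps to cycle 9 still sees the old value, and a read that maps to cycle 10 sees the new one. An instruction-granular core would overshoot and expose the new value early.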
Rendering Ranger R2 boots, but freezes right after the first screen. Kishin Douji Zenki doesn't boot at all. However, overclocking the SPC by just ~2% makes both of these work, so I'm guessing there might be some small details left that need to be fixed to fix these.
With this (mostly) fixed, I can finally start focusing on reworking the PPU code to fix the few games that rely on the tile prefetch and fix the mosaic effect, too.
byuu wrote:
Ah yes, I still need to make that for you ... can you compile from source?
Depends what the requirements are, really, but I can probably manage - if you have a modified source with built-in tracing features that'd be great (or is this just part of the regular code, behind a compiler switch?)
byuu wrote:
For the SNES, $2137 reads latch the counters four clocks before $4201 writes. If you just emulate the read/write at the end of the 6, 8, 12 clock cycles, the counters would latch the same value, and that would be incorrect.
Didn't expect the timing of the read/write within the cpu cycle to be different. Though I can't quite recall if the NES is the same or not.
topspoon wrote:
If I see anything else that interests me, I'll chime in. Probably not since the details are getting too low-level technical.
Thank you so much for all your help until now! If you find anything else, please let me know and I'll take a look!
This could be wrong, but I tried this:
Code:
static uint8_t prev[4];

uint8_t Spc::CpuReadRegister(uint16_t addr)
{
    Run();
    return prev[addr & 0x03] | _state.OutputReg[addr & 0x03];
}

void Spc::Run()
{
    uint64_t targetCycle = (uint64_t)(_memoryManager->GetMasterClock() * _clockRatio);
    while(_state.Cycle < targetCycle) {
        if(_opStep == SpcOpStep::ReadOpCode) {
            _opCode = GetOpCode();
            _opStep = SpcOpStep::Addressing;
            _opSubStep = 0;
        } else {
            for(int lcv = 0; lcv < 4; lcv++) prev[lcv] = _state.OutputReg[lcv];
            Exec();
        }
    }
}
and Rendering Ranger R2 + American Tail also works. But no Kishin Douji Zenki - Tenchi Meidou.
Above hack introduces bad luck?
Sour wrote:
Rendering Ranger R2 boots, but freezes right after the first screen. Kishin Douji Zenki doesn't boot at all. However, overclocking the SPC by just ~2% makes both of these work, so I'm guessing there might be some small details left that need to be fixed to fix these.
Did you have it at 32,000 Hz before, or 32,040?
Code says:
Code:
_clockRatio = (double)2048000 / _console->GetMasterClockRate();
_masterClockRate = _region == ConsoleRegion::Pal ? 21281370 : 21477270;
Don't know how to convert to audio rate. :laughing:
2,048,000 / 64 is 32,000. The audio sample rate was determined to be ~32,040 though:
viewtopic.php?f=12&t=17644
http://helmet.kafuka.org/byuubackup2/vi ... =1455.html
creaothceann wrote:
Did you have it at 32,000 Hz before, or 32,040?
Either value behaves the same, unfortunately.
Also, anomie's docs claim his SPC ran at 1026900Hz = 32090Hz, so it looks like the range on the actual clock speed is pretty wide. I'm fine with changing it to 32040Hz, though, if that's what's been determined to work the best? But was just a bit curious.
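For reference, the arithmetic behind these figures, as a small sketch. The helper names are hypothetical; the dividers (64 for the 2,048,000 Hz figure used in the code, 32 for anomie's 1,026,900 Hz figure) are the ones implied by the numbers quoted in this thread.

```cpp
#include <cstdint>

// Output sample rate for a given clock and a clocks-per-sample divider.
constexpr double SampleRate(double clockHz, double clocksPerSample) {
    return clockHz / clocksPerSample;
}

// Clock needed to hit a target sample rate with the same divider.
constexpr double ClockForSampleRate(double sampleRateHz, double clocksPerSample) {
    return sampleRateHz * clocksPerSample;
}

// Ratio in the style of the _clockRatio line quoted above.
constexpr double ClockRatio(double spcClockHz, double masterClockHz) {
    return spcClockHz / masterClockHz;
}
```

`SampleRate(2048000, 64)` is 32000, `ClockForSampleRate(32040, 64)` is 2050560, and `SampleRate(1026900, 32)` is a little over 32090, so moving from 32000 to 32040 means raising the SPC clock in the ratio by about 0.125%.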
klurey wrote:
This could be wrong, but I tried this:
I'm not sure myself - does anybody (byuu?) know if the OR behavior on simultaneous read+write from the SPC+CPU is something that some licensed games are known to rely on?
Sour wrote:
Also, anomie's docs claim his SPC ran at 1026900Hz = 32090Hz, so it looks like the range on the actual clock speed is pretty wide.
Ceramic resonators, like used with the SPC700, just don't keep time very precisely. We should expect to find SPC700s running at ±0.5% of the nominal rate, and it should drift over time. +90Hz is only 0.3%...
For maximal programmer/ROM hacker adversity, I'd say it should pick a random rate ±160Hz of nominal on emulation start, and should slowly drift over ±32Hz of that while it's running
Quote:
I'm fine with changing it to 32040Hz, though, if that's what's been determined to work the best? But was just a bit curious.
I found that d4s' Breath of Fire II (German) fan translation's streaming HDMA audio engine was extremely timing sensitive, and didn't work well at 32000hz. qwertymodo reverted Snes9X from 32040hz to 32000hz, and it ended up breaking a few titles and had to be reverted back to 32040hz.
It's an annoying wart that only applies to systems with more than one oscillator. But most tests come back at 32040hz, so I went with that rather than the canonical clock rate. I think we've confirmed the quartz CPU NTSC clock rate is good, and judging by that, the CPU PAL clock rate is likely good as well (eg not exact, but nowhere near as fluctuating as the ceramic SMP clock.)
As it stands now, with frequencies that not only vary per console, but depending on their temperature (they change at runtime, by the way), that makes creating any kind of reliable test for behaviors like ORing during simultaneous accesses near impossible.
Something I'm hoping for is to get a modded SNES console that replaces the two oscillators with a single, faster oscillator and then uses dividers to get approximately the original SNES rates.
I would also probably have to bring my USART board out of retirement, because my two 21fx units aren't designed to handle multiple insertion cycles to move them between consoles.
Quote:
I'm not sure myself - does anybody (byuu?) know if the OR behavior on simultaneous read+write from the SPC+CPU is something that some licensed games are known to rely on?
I don't emulate it at this time, and I have 100% (known) compatibility.
I would still like to emulate it anyway. As well as (if it turns out to be true) the errata mentioned around performing 16-bit writes to the ports.
Quote:
For maximal programmer/ROM hacker adversity, I'd say it should pick a random rate ±160Hz of nominal on emulation start, and should slowly drift over ±32Hz of that while it's running :P
I'm seriously running out of headroom, but that's the kind of pedantry I'd love to emulate if computers were faster :c
byuu wrote:
Something I'm hoping for is to get a modded SNES console that replaces the two oscillators with a single, faster oscillator and then uses dividers to get approximately the original SNES rates.
How hard would it be to build an 8:7 PLL to turn the 21.47 MHz master clock into 24.55 MHz to give 31960.2 Hz output?
Silicon Labs makes a part (Si5351) that will divide your choice of UHF oscillator frequency (chosen in the 600-900MHz range, and phase-locked to an external reference clock) by almost any rational number. It even has some amount of nonvolatile storage so you don't need anything to configure it at runtime. Getting the two nominal frequencies for the SNES would be straightforward.
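The 8:7 figures can be checked with a couple of lines. This is only the arithmetic, under the assumption that the audio rate is the APU clock divided by 768 (the nominal 24.576 MHz clock divided by 32000 Hz); the NTSC master clock value is the one from the _masterClockRate line earlier in the thread, and the helper names are hypothetical.

```cpp
// NTSC master clock, as in the _masterClockRate line quoted earlier.
constexpr double kMasterClockNtsc = 21477270.0;
// Assumption: APU clocks per output sample (24.576 MHz / 32000 Hz).
constexpr double kApuClocksPerSample = 768.0;

// Output of a PLL that multiplies by 'mul' and divides by 'div'.
constexpr double PllOutputHz(double inputHz, double mul, double div) {
    return inputHz * mul / div;
}

constexpr double AudioRateHz(double apuClockHz) {
    return apuClockHz / kApuClocksPerSample;
}
```

`PllOutputHz(kMasterClockNtsc, 8, 7)` comes out at roughly 24.545 MHz, and `AudioRateHz` of that is roughly 31960.2 Hz, which matches the numbers in the question.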
byuu wrote:
I found that d4s' Breath of Fire II (German) fan translation's streaming HDMA audio engine was extremely timing sensitive, and didn't work well at 32000hz.
If it's similar to the one in N-Warp Daisakusen, he just dumps the contents of the four ports to the stack as quickly as possible and then twiddles his thumbs to make up 66 cycles. Which is a bit long even for PAL. I can see why a low value for the SMP speed could cause issues, depending on how soon after the first write he starts reading, and on how long the bursts are.
tepples wrote:
How hard would it be to build an 8:7 PLL to turn the 21.47 MHz master clock into 24.55 MHz to give 31960.2 Hz output?
The hard part for me would be performing the soldering work.
Is it possible that Kishin Douji Zenki - Tenchi Meidou is getting stuck because the CPU keeps writing $2141 while the SPC resets the port via $F1 at around the same time? That would be an endless loop on both sides.
https://wiki.superfamicom.org/spc700-reference
Quote:
There will probably be some conflict if the snes writes data at the same time the SPC initiates a port clear function.
Some emulators don't emulate this, BSNES does though.
Sour wrote:
behavior on simultaneous read+write from the SPC+CPU
When cpu reads / spc writes to same port simultaneously, bsnes keeps returning old value.
Code:
da03dc bra $03d9 [da03d9] A:32 H:1200
da03d9 lda $2140 [002140] A:32 H:1218
da03dc bra $03d9 [da03d9] A:32 H:1242
da03d9 lda $2140 [002140] A:32 H:1260
..096c mov $0f4,a A:33
da03dc bra $03d9 [da03d9] A:32 H:1284
da03d9 lda $2140 [002140] A:32 H:1302
da03dc bra $03d9 [da03d9] A:32 H:1326
da03d9 lda $2140 [002140] A:32 H:1344
..096e inc a A:33
da03dc bra $03d9 [da03d9] A:33 H: 4
da03d9 lda $2140 [002140] A:33 H: 22
..096f bra $096c A:34
da03dc bra $03d9 [da03d9] A:33 H: 46
da03d9 lda $2140 [002140] A:33 H: 64
da03dc bra $03d9 [da03d9] A:33 H: 88
da03d9 lda $2140 [002140] A:33 H: 106
..096c mov $0f4,a A:34
da03dc bra $03d9 [da03d9] A:33 H: 130
da03d9 lda $2140 [002140] A:33 H: 148
da03dc bra $03d9 [da03d9] A:33 H: 172
da03d9 lda $2140 [002140] A:33 H: 190
..096e inc a A:34
da03dc bra $03d9 [da03d9] A:34 H: 214
da03d9 lda $2140 [002140] A:34 H: 232
..096f bra $096c A:35
da03dc bra $03d9 [da03d9] A:34 H: 256
da03d9 lda $2140 [002140] A:34 H: 274
da03dc bra $03d9 [da03d9] A:34 H: 298
da03d9 lda $2140 [002140] A:34 H: 316
..096c mov $0f4,a A:35
da03dc bra $03d9 [da03d9] A:34 H: 340
da03d9 lda $2140 [002140] A:34 H: 358
da03dc bra $03d9 [da03d9] A:34 H: 382
da03d9 lda $2140 [002140] A:34 H: 400
..096e inc a A:35
da03dc bra $03d9 [da03d9] A:35 H: 424
da03d9 lda $2140 [002140] A:35 H: 442
..096f bra $096c A:36
da03dc bra $03d9 [da03d9] A:35 H: 466
da03d9 lda $2140 [002140] A:35 H: 484
da03dc bra $03d9 [da03d9] A:35 H: 508
da03d9 lda $2140 [002140] A:35 H: 526
..096c mov $0f4,a A:36
da03dc bra $03d9 [da03d9] A:35 H: 590
da03d9 lda $2140 [002140] A:35 H: 608
..096e inc a A:36
da03dc bra $03d9 [da03d9] A:36 H: 632
da03d9 lda $2140 [002140] A:36 H: 650
..096f bra $096c A:37
da03dc bra $03d9 [da03d9] A:36 H: 674
da03d9 lda $2140 [002140] A:36 H: 692
da03dc bra $03d9 [da03d9] A:36 H: 716
da03d9 lda $2140 [002140] A:36 H: 734
..096c mov $0f4,a A:37
da03dc bra $03d9 [da03d9] A:36 H: 758
da03d9 lda $2140 [002140] A:36 H: 776
da03dc bra $03d9 [da03d9] A:36 H: 800
da03d9 lda $2140 [002140] A:36 H: 818
..096e inc a A:37
da03dc bra $03d9 [da03d9] A:37 H: 842
da03d9 lda $2140 [002140] A:37 H: 860
..096f bra $096c A:38
da03dc bra $03d9 [da03d9] A:37 H: 884
da03d9 lda $2140 [002140] A:37 H: 902
da03dc bra $03d9 [da03d9] A:37 H: 926
da03d9 lda $2140 [002140] A:37 H: 944
..096c mov $0f4,a A:38
da03dc bra $03d9 [da03d9] A:37 H: 968
da03d9 lda $2140 [002140] A:37 H: 986
da03dc bra $03d9 [da03d9] A:37 H:1010
da03d9 lda $2140 [002140] A:37 H:1028
..096e inc a A:38
da03dc bra $03d9 [da03d9] A:38 H:1052
da03d9 lda $2140 [002140] A:38 H:1070
..096f bra $096c A:39
da03dc bra $03d9 [da03d9] A:38 H:1094
da03d9 lda $2140 [002140] A:38 H:1112
da03dc bra $03d9 [da03d9] A:38 H:1136
da03d9 lda $2140 [002140] A:38 H:1154
..096c mov $0f4,a A:39
da03dc bra $03d9 [da03d9] A:38 H:1178
da03d9 lda $2140 [002140] A:38 H:1196
da03dc bra $03d9 [da03d9] A:38 H:1220
da03d9 lda $2140 [002140] A:38 H:1238
..096e inc a A:39
da03dc bra $03d9 [da03d9] A:39 H:1262
da03d9 lda $2140 [002140] A:39 H:1280
..096f bra $096c A:3a
da03dc bra $03d9 [da03d9] A:39 H:1304
da03d9 lda $2140 [002140] A:39 H:1322
da03dc bra $03d9 [da03d9] A:39 H:1346
da03d9 lda $2140 [002140] A:39 H: 0
..096c mov $0f4,a A:3a
da03dc bra $03d9 [da03d9] A:39 H: 24
da03d9 lda $2140 [002140] A:39 H: 42
da03dc bra $03d9 [da03d9] A:39 H: 66
da03d9 lda $2140 [002140] A:39 H: 84
..096e inc a A:3a
da03dc bra $03d9 [da03d9] A:3a H: 108
da03d9 lda $2140 [002140] A:3a H: 126
..096f bra $096c A:3b
da03dc bra $03d9 [da03d9] A:3a H: 150
da03d9 lda $2140 [002140] A:3a H: 168
da03dc bra $03d9 [da03d9] A:3a H: 192
da03d9 lda $2140 [002140] A:3a H: 210
..096c mov $0f4,a A:3b
da03dc bra $03d9 [da03d9] A:3a H: 234
da03d9 lda $2140 [002140] A:3a H: 252
da03dc bra $03d9 [da03d9] A:3a H: 276
da03d9 lda $2140 [002140] A:3a H: 294
..096e inc a A:3b
da03dc bra $03d9 [da03d9] A:3b H: 318
da03d9 lda $2140 [002140] A:3b H: 336
..096f bra $096c A:3c
da03dc bra $03d9 [da03d9] A:3b H: 360
da03d9 lda $2140 [002140] A:3b H: 378
da03dc bra $03d9 [da03d9] A:3b H: 402
da03d9 lda $2140 [002140] A:3b H: 420
..096c mov $0f4,a A:3c
da03dc bra $03d9 [da03d9] A:3b H: 444
Could this log help fix Mesen-S? Mesen-S doesn't behave like this.
lidnariq wrote:
For maximal programmer/ROM hacker adversity, I'd say it should pick a random rate ±160Hz of nominal on emulation start, and should slowly drift over ±32Hz of that while it's running :P
That doesn't sound too hard really, I'm already recalculating the clock ratio between the CPU & SPC once a frame (in case the region setting is switched), so picking a random +- 32hz value (+ another one at power on) really wouldn't be that hard. Like a lot of stuff in Mesen, though, I'd keep this disabled by default and just have it as an advanced option for devs.
byuu wrote:
d4s' Breath of Fire II (German)
Thanks, I'll keep that in mind for testing (though I just tried using this and couldn't get it to boot in any emulator, but I have a feeling I might just have patched the wrong rom)
Quote:
There will probably be some conflict if the snes writes data at the same time the SPC initiates a port clear function. Some emulators don't emulate this, BSNES does though.
I'm not too sure about this one? I took a quick look at higan's SPC code, but didn't really see anything that seemed special about the reset? Just seems to sync up to the CPU and then reset the values, which is essentially equivalent to what I have at the moment.
klurey wrote:
When cpu reads / spc writes to same port simultaneously, bsnes keeps returning old value.
That sounds like it might just be the result of slightly different timing? Not sure. I'll keep it in mind when I get back to trying to fix the SPC. For now I'm half hoping that if I manage to fix whatever is left to fix in terms of timing for DMAs, IRQs and the like, the remaining SPC issues might fix themselves.
Sour wrote:
lidnariq wrote:
For maximal programmer/ROM hacker adversity, I'd say it should pick a random rate ±160Hz of nominal on emulation start, and should slowly drift over ±32Hz of that while it's running
That doesn't sound too hard really, I'm already recalculating the clock ratio between the CPU & SPC once a frame (in case the region setting is switched), so picking a random +- 32hz value (+ another one at power on) really wouldn't be that hard.
Ideally the drift would be a bit slower than that.
I'm hoping to eventually get around to testing an advanced HDMA streaming scheme with sync adjustment. My current concept wouldn't work well with rapid timing changes because it only checks the timer ratio once a frame and I'd want to filter the result to smooth out jitter from timer granularity and polling latency.
Also, kicking the timing around that fast might actually be audible...
The ±32 Hz (audio rate; APU's CPU rate would be 32 times that) is an upper bound for how bad a ceramic resonator is once it's started running. The problem with just using an ordinary random walk is that it's unbounded.
Surely a change that large would take longer than 17 milliseconds? I was under the impression that the signal was pretty regular over the short term, just with thermal drift of the resonant frequency.
Perhaps make the frequency vary with the fraction of cycles that the S-CPU has spent in wai state over the past minute, with some random offset to simulate room temperature.
Sour wrote:
klurey wrote:
When cpu reads / spc writes to same port simultaneously, bsnes keeps returning old value.
That sounds like it might just be the result of slightly different timing? Not sure. I'll keep it in mind when I get back to trying to fix the SPC. For now I'm half hoping that if I manage to fix whatever is left to fix in terms of timing for DMAs, IRQs and the like, the remaining SPC issues might fix themselves.
I wrote another test since it could've been coincidence.
Code:
da03df lda $2140 [002140] A:31 H: 972
..0970 mov $0f4,a A:32
da03e2 lda $2140 [002140] A:31 H: 996
..0972 mov $0f4,a A:33
da03e5 lda $2140 [002140] A:32 H:1020
da03e8 lda $2140 [002140] A:32 H:1044
da03eb lda $2140 [002140] A:32 H:1068
..0974 mov $0f4,a A:34
da03ee lda $2140 [002140] A:33 H:1092
da03f1 lda $2140 [002140] A:33 H:1116
da03f4 lda $2140 [002140] A:33 H:1140
da03f7 lda $2140 [002140] A:33 H:1164
..0976 mov $0f4,a A:35
da03fa lda $2140 [002140] A:34 H:1188
da03fd lda $2140 [002140] A:34 H:1212
da0400 lda $2140 [002140] A:34 H:1236
..0978 mov $0f4,a A:36
da0403 lda $2140 [002140] A:35 H:1260
da0406 lda $2140 [002140] A:35 H:1284
da0409 lda $2140 [002140] A:35 H:1308
da040c lda $2140 [002140] A:35 H:1332
..097a mov $0f4,a A:37
da040f lda $2140 [002140] A:36 H:1356
da0412 lda $2140 [002140] A:36 H: 16
da0415 lda $2140 [002140] A:36 H: 40
..097c mov $0f4,a A:38
da0418 lda $2140 [002140] A:37 H: 64
da041b lda $2140 [002140] A:37 H: 88
da041e lda $2140 [002140] A:37 H: 112
da0421 lda $2140 [002140] A:37 H: 136
..097e mov $0f4,a A:39
da0424 lda $2140 [002140] A:38 H: 160
da0427 lda $2140 [002140] A:38 H: 184
da042a lda $2140 [002140] A:38 H: 208
..0980 mov $0f4,a A:3a
da042d lda $2140 [002140] A:39 H: 232
bsnes is always 1 step behind, like before, regardless of whether there are 1, 3 or 4 CPU opcodes between each SPC opcode.
Does libretro port have input problem? Joypad1 uses both retropads: Retropad1 (A,X,Y,L,R) + Retropad2 (D-Pad,Start,Select,B)
klurey wrote:
Does libretro port have input problem? Joypad1 uses both retropads: Retropad1 (A,X,Y,L,R) + Retropad2 (D-Pad,Start,Select,B)
It seems to be working fine as far as I can tell? I've reset my retroarch configuration to be sure and it seems to be binding the correct buttons.
93143 wrote:
Surely a change that large would take longer than 17 milliseconds? I was under the impression that the signal was pretty regular over the short term, just with thermal drift of the resonant frequency.
I wasn't suggesting that ±32Hz per vsync was reasonable... I was stating that it should never deflect more than ±32Hz from whatever frequency it started with.
In practice, ceramic resonator drift does seem to be mostly thermal, and in turn that appears to
usually be a positive monotonic coefficient among higher-frequency resonators.
But from the point of view of simulating a plausibly adverse environment, it probably shouldn't
only model the thermal coefficient. A random walk, as long as it's slow enough and bounded, is probably more useful.
Sour wrote:
klurey wrote:
Does libretro port have input problem? Joypad1 uses both retropads: Retropad1 (A,X,Y,L,R) + Retropad2 (D-Pad,Start,Select,B)
It seems to be working fine as far as I can tell? I've reset my retroarch configuration to be sure and it seems to be binding the correct buttons.
It seems to be some msvc2017 msbuild optimization problem (libretro|x86).
This does that wonky retropad1+2 reading:
Code:
case 0x4219: case 0x421B: case 0x421D: case 0x421F:
return (uint8_t)(_controllerData[((addr & 0x0E) - 8) >> 1] >> 8);
And this behaves like expected:
Code:
case 0x4219: case 0x421B: case 0x421D: case 0x421F:
printf("%04x: %04x %04x %04x = %04x %04x %04x\n",
addr,
(addr & 0x0E), (addr & 0x0E) - 8, ((addr & 0x0E) - 8) >> 1,
_controllerData[((addr & 0x0E) - 8) >> 1], _controllerData[((addr & 0x0E) - 8) >> 1] >> 8,
(uint8_t) (_controllerData[((addr & 0x0E) - 8) >> 1] >> 8)
);
return (uint8_t)(_controllerData[((addr & 0x0E) - 8) >> 1] >> 8);
Using unmodified git source in both cases. And for libretro.cpp, I can't see core options unless I switch from
Code:
static const char* ==> static constexpr char*
I guess both are somehow unique to me. But that last one is more confusing.
Building latest Retroarch git using msvc2017 project. I'll try mingw building also to see what happens.
lidnariq wrote:
A random walk, as long as it's slow enough and bounded, is probably more useful.
Could probably just do something like +1 per X seconds, until you hit +32, then -1 per X seconds until you hit -32, or something akin.
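A minimal sketch of that bounded ramp, assuming the offset is expressed in Hz relative to the nominal rate and that it moves by 1 every fixed number of frames. `ClockDrift` and `stepFrames` are hypothetical names for illustration, not anything from Mesen-S.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Mesen-S code): a slow, bounded triangle-wave
// drift. The offset moves by 1 every `stepFrames` frames and reverses
// direction whenever it reaches +limit or -limit.
class ClockDrift {
public:
    ClockDrift(int32_t limit, uint32_t stepFrames)
        : _limit(limit), _stepFrames(stepFrames) {
        assert(limit > 0 && stepFrames > 0);
    }

    // Call once per emulated frame; returns the current offset.
    int32_t OnFrame() {
        if(++_frameCounter >= _stepFrames) {
            _frameCounter = 0;
            _offset += _direction;
            if(_offset >= _limit || _offset <= -_limit) {
                _direction = -_direction; // bounce off the bound
            }
        }
        return _offset;
    }

private:
    int32_t _limit;
    uint32_t _stepFrames;
    uint32_t _frameCounter = 0;
    int32_t _offset = 0;
    int32_t _direction = 1;
};
```

A bounded random walk could reuse the same structure by picking the step direction at random and clamping the result to the limit.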
klurey wrote:
And this behaves like expected:
So you're saying adding a printf statement fixes the problem? That's never a good sign... FYI I have not tested the 32-bit builds of Mesen-S at all (libretro or not) - the way I setup the configuration between the UI and the core doesn't like the 32-bit builds, and I haven't had the time to either find a solution or implement a workaround. That being said, I don't see why this would fail in x86 but not x64...
klurey wrote:
I can't see core options unless I switch from
That's my fault - it was an issue on the Mesen core in some configurations too (originally the person said Windows XP, but maybe it's more of a 32-bit issue?), but I forgot to bring over that fix when I made the libretro build of Mesen-S. I've fixed the code - thanks for letting me know.
Sour wrote:
klurey wrote:
And this behaves like expected:
So you're saying adding a printf statement fixes the problem? That's never a good sign... FYI I have not tested the 32-bit builds of Mesen-S at all (libretro or not) - the way I setup the configuration between the UI and the core doesn't like the 32-bit builds, and I haven't had the time to either find a solution or implement a workaround. That being said, I don't see why this would fail in x86 but not x64...
Yeah. It's weird because I had this and it broke:
Code:
case 0x4219: case 0x421B: case 0x421D: case 0x421F:
printf("4129: %04x %04x %04x = %04x %04x %04x\n",
(addr & 0x0E), (addr & 0x0E) - 8, ((addr & 0x0E) - 8) >> 1,
_controllerData[((addr & 0x0E) - 8) >> 1], _controllerData[((addr & 0x0E) - 8) >> 1] >> 8,
(uint8_t) (_controllerData[((addr & 0x0E) - 8) >> 1] >> 8)
);
return (uint8_t)(_controllerData[((addr & 0x0E) - 8) >> 1] >> 8);
but adding the addr then cleared the problem.
Code:
case 0x4219: case 0x421B: case 0x421D: case 0x421F:
printf("%04x: %04x %04x %04x = %04x %04x %04x\n",
addr,
(addr & 0x0E), (addr & 0x0E) - 8, ((addr & 0x0E) - 8) >> 1,
_controllerData[((addr & 0x0E) - 8) >> 1], _controllerData[((addr & 0x0E) - 8) >> 1] >> 8,
(uint8_t) (_controllerData[((addr & 0x0E) - 8) >> 1] >> 8)
);
return (uint8_t)(_controllerData[((addr & 0x0E) - 8) >> 1] >> 8);
What's even more evil is that the printf logs always show the expected, correct values, just not the value that is actually returned.
I also tried splitting the values and it still failed:
Code:
addr = ((addr & 0x0E) - 8) >> 1;
return (uint8_t)(_controllerData[addr] >> 8);
Wondering if it's the MS LTCG, as I've heard it can be buggy often enough. I did try mingw 8.1 32-bit but the thing just hanged on me for an hour on the TraceLogger file and I aborted.
If the buildbot had a 32-bit build, I could've just tried that since it's gcc built.
klurey wrote:
Wondering if it's the MS LTCG, as I've heard it can be buggy often enough. I did try mingw 8.1 32-bit but the thing just hanged on me for an hour on the TraceLogger file and I aborted.
At the very least, I've never had any instances of the MSVC compiler screwing up so far - could be the case here, I'll try x86 on my end when I get a chance (might also try updating to VS2019 soon).
RE: TraceLogger, I'm not sure what GCC doesn't like about that file - GCC 8.x (not sure which one it was) freezes, even with all optimizations off. GCC 9.1 compiles it just fine (clang & MSVC are fine too)
Sour wrote:
lidnariq wrote:
A random walk, as long as it's slow enough and bounded, is probably more useful.
Could probably just do something like +1 per X seconds, until you hit +32, then -1 per X seconds until you hit -32, or something akin.
That makes sense to me.
I was thinking about this in terms of APU instruction cycles per (NTSC) vsync, which turns out to be a comparably small range. 16814 ±17
https://github.com/SourMesen/Mesen-S/bl ... efile#L262
Played around with the Makefile (windows_msvc2017_desktop_x86).
Od = okay
O1 = bad
O2 = bad
Os = bad
Ot = bad
ltcg:off = no effect other than massive slowdown
I wonder what the asm is generating...
Kinda strange but here's what happens with msvc2017.
Code:
O2
return (uint8_t)_controllerData[((addr & 0x0E) - 8) >> 1];
mesen-s_libretro.dll+3A2E9 - 83 E2 0E - and edx,0E { 14 }
mesen-s_libretro.dll+3A2EC - 83 EA 07 - sub edx,07 { 7 }
mesen-s_libretro.dll+3A2EF - D1 FA - sar edx,1
mesen-s_libretro.dll+3A2F1 - 8A 44 56 2C - mov al,[esi+edx*2+2C]
mesen-s_libretro.dll+3A2F5 - E9 B3000000 - jmp mesen-s_libretro.dll+3A3AD
return (uint8_t)(_controllerData[((addr & 0x0E) - 8) >> 1] >> 8);
mesen-s_libretro.dll+3A318 - 8B 45 CC - mov eax,[ebp-34]
mesen-s_libretro.dll+3A31B - 83 E0 0E - and eax,0E { 14 }
mesen-s_libretro.dll+3A31E - 83 E8 06 - sub eax,06 { 6 }
mesen-s_libretro.dll+3A321 - D1 F8 - sar eax,1
mesen-s_libretro.dll+3A323 - 8A 44 46 2D - mov al,[esi+eax*2+2D]
mesen-s_libretro.dll+3A327 - E9 81000000 - jmp mesen-s_libretro.dll+3A3AD
O2 - printf
return (uint8_t)(_controllerData[((addr & 0x0E) - 8) >> 1] >> 8);
mesen-s_libretro.dll+3A368 - 8B 75 CC - mov esi,[ebp-34]
mesen-s_libretro.dll+3A36B - 83 E6 0E - and esi,0E { 14 }
mesen-s_libretro.dll+3A36E - 8D 56 F9 - lea edx,[esi-07]
mesen-s_libretro.dll+3A371 - D1 FA - sar edx,1
mesen-s_libretro.dll+3A373 - 0FB7 44 57 2C - movzx eax,word ptr [edi+edx*2+2C]
Od
return (uint8_t)_controllerData[((addr & 0x0E) - 8) >> 1];
mesen-s_libretro.dll+58463 - 0FB7 45 7C - movzx eax,word ptr [ebp+7C]
mesen-s_libretro.dll+58467 - 83 E0 0E - and eax,0E { 14 }
mesen-s_libretro.dll+5846A - 83 E8 08 - sub eax,08 { 8 }
mesen-s_libretro.dll+5846D - D1 F8 - sar eax,1
mesen-s_libretro.dll+5846F - 8B 4D 54 - mov ecx,[ebp+54]
mesen-s_libretro.dll+58472 - 8A 44 41 2C - mov al,[ecx+eax*2+2C]
mesen-s_libretro.dll+58476 - E9 E0000000 - jmp mesen-s_libretro.dll+5855B
return (uint8_t)(_controllerData[((addr & 0x0E) - 8) >> 1] >> 8);
mesen-s_libretro.dll+58491 - 0FB7 55 7C - movzx edx,word ptr [ebp+7C]
mesen-s_libretro.dll+58495 - 83 E2 0E - and edx,0E { 14 }
mesen-s_libretro.dll+58498 - 83 EA 08 - sub edx,08 { 8 }
mesen-s_libretro.dll+5849B - D1 FA - sar edx,1
mesen-s_libretro.dll+5849D - 8B 45 54 - mov eax,[ebp+54]
mesen-s_libretro.dll+584A0 - 0FB7 44 50 2C - movzx eax,word ptr [eax+edx*2+2C]
mesen-s_libretro.dll+584A5 - C1 F8 08 - sar eax,08 { 8 }
mesen-s_libretro.dll+584A8 - E9 AE000000 - jmp mesen-s_libretro.dll+5855B
Don't know why compiler is so clumsy about that line.
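Reading the O2 listing literally, the high-byte path subtracts 6 instead of 8 while also using displacement 2D instead of 2C, so it appears to land one array element too far. The sketch below only replays the arithmetic of the instructions shown. It assumes that 0x2C is the offset of `_controllerData` within the object; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 0x2C is the offset of _controllerData, as the listing suggests.
constexpr int32_t kControllerDataOffset = 0x2C;

// Byte offset the source expression should read for the high byte:
//   (uint8_t)(_controllerData[((addr & 0x0E) - 8) >> 1] >> 8)
int32_t ExpectedHighByteOffset(uint16_t addr) {
    assert((addr & 0x0E) >= 8); // only $4218-$421F are handled here
    int32_t index = ((addr & 0x0E) - 8) >> 1;
    return kControllerDataOffset + index * 2 + 1; // +1 = high byte (little-endian)
}

// Byte offset produced by the O2 sequence from the listing:
//   and eax,0E / sub eax,06 / sar eax,1 / mov al,[esi+eax*2+2D]
int32_t O2HighByteOffset(uint16_t addr) {
    int32_t eax = addr & 0x0E;
    eax -= 6;
    eax >>= 1;
    return eax * 2 + 0x2D;
}
```

For $4219 this gives 0x2F rather than 0x2D, i.e. the high byte of the next controller's entry, which would be consistent with joypad 1 picking up some buttons from retropad 2.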
Well, I guess it was a compiler bug after all! It only happens in 32-bits, too.
I changed the code to simplify it, which should run just a tiny bit faster (not that it really matters) and avoids the compiler bug.
Hopefully things should be working properly on your end now.
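For reference, one way the index expression could be simplified is sketched below. This is only an illustration of the idea, not necessarily the change that was committed; `ReadAutoJoypad` and its signature are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone version of the $4218-$421F read (not the actual
// Mesen-S code). `data` holds one 16-bit value per controller port.
uint8_t ReadAutoJoypad(const uint16_t (&data)[4], uint16_t addr) {
    assert(addr >= 0x4218 && addr <= 0x421F);
    // Each port owns two consecutive registers: low byte, then high byte.
    uint16_t value = data[(addr - 0x4218) >> 1];
    return (addr & 0x01) ? static_cast<uint8_t>(value >> 8)
                         : static_cast<uint8_t>(value);
}
```

For the four odd addresses, `(addr - 0x4218) >> 1` selects the same element as `((addr & 0x0E) - 8) >> 1`.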
Yes - Thank you for working around all those problems! And I see you've fixed ExHiROM save games.
I'll start testing some oddball games and see how it goes.
Found one.
Kaite Tsukutte Asoberu Dezaemon
-- error = "sram breakdown"
Some searching of byuu's old board
https://151.236.14.55/byuubackup2/viewt ... art=0.html
Code:
board region=ntsc
rom name=program.rom size=0x80000
map address=00-7d,80-ff:8000-ffff mask=0x8000
map address=40-6f,c0-ef:0000-7fff mask=0x8000
ram name=save.ram size=0x20000
map address=70-7d,f0-ff:0000-7fff mask=0x8000
information
region: NTSC
title: Kaite Tsukutte Asoberu Dezaemon (Japan)
sha256: 3ddf81cee32dcf7c8df4367bdae5ea1b0af50b7baab54be1c01f6ae6c3e308a6
note: heuristically generated by icarus
Ongaku Tsukuru Kanadeeru (Japan)
-- error: boots at 00:8000 (rom 0x7ffc?). Should be 00:FF00 (rom 0xfffc)??
note: Deleted old post because I thought it was (32-bit compiler) false-positive. But not so sure anymore.
Code:
[libretro INFO] -----------------------------
[libretro INFO] Game: µÝ¶Þ¸Â¸°Ù
[libretro INFO] Type: HiROM
[libretro INFO] FastROM
[libretro INFO] Map Mode: $31
[libretro INFO] Rom Type: $02
[libretro INFO] File size: 1024 KB
[libretro INFO] ROM size: 1024 KB
[libretro INFO] SRAM size: 32 KB
[libretro INFO] -----------------------------
[libretro INFO] Map [$00:8xxx] to page number 00
[libretro INFO] Map [$00:9xxx] to page number 01
[libretro INFO] Map [$00:Axxx] to page number 02
[libretro INFO] Map [$00:Bxxx] to page number 03
[libretro INFO] Map [$00:Cxxx] to page number 04
[libretro INFO] Map [$00:Dxxx] to page number 05
[libretro INFO] Map [$00:Exxx] to page number 06
[libretro INFO] Map [$00:Fxxx] to page number 07
edit:
I think it's hitting an override? Game's name = B1ZMCJ.
Code:
bool BaseCartridge::MapSpecificCarts(MemoryManager &mm)
{
string name = GetCartName();
if(_cartInfo.GameCode[0] == 'Z' && _cartInfo.GameCode[3] == 'J') {
//BSC-1A5M-02, BSC-1A7M-01
//Games: Sound Novel Tsukuuru, RPG Tsukuuru, Derby Stallion 96
MapBanks(mm, _prgRomHandlers, 0x00, 0x3F, 0x08, 0x0F, 0, true);
MapBanks(mm, _prgRomHandlers, 0x80, 0x9F, 0x08, 0x0F, 0, true, 0x200);
MapBanks(mm, _prgRomHandlers, 0xA0, 0xBF, 0x08, 0x0F, 0, true, 0x100);
if(_saveRamSize > 0) {
MapBanks(mm, _saveRamHandlers, 0x70, 0x7D, 0x00, 0x07, 0, true);
MapBanks(mm, _saveRamHandlers, 0xF0, 0xFF, 0x00, 0x07, 0, true);
}
return true;
}
return false;
}
Dekitate High School (Japan)
-- error: Mash through all the New Game prompts. After you pick the girl, it'll go into story mode. You'll see a black bar flicker on top of the screen after every text box reset.
Looks like 15 pixels for 1 frame with v-crop on. I think it's some dma at 00:9b3d but wouldn't know why.
Here are a few more games, but with no debugging info.
Battle Grand Prix (USA) = black screen of death after title
-- edit: I guess it's kinda random. Sometimes happens after main menu but after load state sometimes works.
Super Famista 5 (Japan) = there's some "Tokyo Yomiuri Giants" logo screen on boot that doesn't show up on snes9x or bsnes. Don't know what the message says.
-- edit: There's some sram check at 89:806F (lda $701B9E). It's supposed to be 0xFFFF. But it fails the check and shows the (piracy?) splash screen.
What's more hilarious is the sram file saved by Retroarch. It reads:
Code:
0.$.. ..archconfigremapsMesen-S__0.rmp. (Japan).opt.........
which looks nowhere near sram produced by other emus for this game.
Screenshots of said messages; we can translate it.
Alright, so I've re-fixed the IRQ/NMI delay after DMA thing, after realizing I had broken wild guns again.
Turns out I suck at assembly and the test I had written for this was flawed - so edited that to fix it, and implemented the 1 cycle "irq lock" like byuu said and now both games work properly.
Kaite Tsukutte should be fixed - it was a sram mapping issue (added an exception for this one)
Ongaku Tsukuru should also be fixed - changed the code for the exception on the other 3 games to avoid using it for this one.
Super Famista 5 -> The screen says "This cassette is a special version for Yomiuri Giants fans". Which is awfully weird, considering it only seems to show up in the libretro core, not the standalone build. Must be something that's not being initialized properly? (or initialized differently, at least). I'll take a closer look tomorrow.
Battle Grand Prix appears to lock up on a loop reading the hblank/vblank flags in $4212. If I change the timing the flags are set/cleared, it fixes it, but I'm assuming the real issue might be elsewhere since the flags should already be getting set and cleared at the right timing - will have to investigate more.
Maybe related to Battle Grand Prix? Mesen-S dma timings are slightly off compared to bsnes. And BGP likes to dma during v-blank while that 4212 bpl / bmi loop is running.
Code:
0088d3 php A:8000 X:0003 Y:0220 S:1fe2 D:0000 DB:00 nvMxdIZc V:228 H:1286 F:30
0088d4 sep #$20 A:8000 X:0003 Y:0220 S:1fe1 D:0000 DB:00 nvMxdIZc V:228 H:1308 F:30
0088d6 rep #$10 A:8000 X:0003 Y:0220 S:1fe1 D:0000 DB:00 nvMxdIZc V:228 H:1330 F:30
0088d8 lda $55 [000055] A:8000 X:0003 Y:0220 S:1fe1 D:0000 DB:00 nvMxdIZc V:228 H:1352 F:30
0088da and #$04 A:8017 X:0003 Y:0220 S:1fe1 D:0000 DB:00 nvMxdIzc V:229 H: 12 F:30
0088dc beq $8903 [008903] A:8004 X:0003 Y:0220 S:1fe1 D:0000 DB:00 nvMxdIzc V:229 H: 28 F:30
0088de lda #$80 A:8004 X:0003 Y:0220 S:1fe1 D:0000 DB:00 nvMxdIzc V:229 H: 44 F:30
0088e0 sta $2115 [002115] A:8080 X:0003 Y:0220 S:1fe1 D:0000 DB:00 NvMxdIzc V:229 H: 60 F:30
0088e3 ldy #$5000 A:8080 X:0003 Y:0220 S:1fe1 D:0000 DB:00 NvMxdIzc V:229 H: 90 F:30
0088e6 sty $2116 [002116] A:8080 X:0003 Y:5000 S:1fe1 D:0000 DB:00 nvMxdIzc V:229 H: 114 F:30
0088e9 ldy #$1801 A:8080 X:0003 Y:5000 S:1fe1 D:0000 DB:00 nvMxdIzc V:229 H: 150 F:30
0088ec sty $4300 [004300] A:8080 X:0003 Y:1801 S:1fe1 D:0000 DB:00 nvMxdIzc V:229 H: 174 F:30
0088ef ldy #$1700 A:8080 X:0003 Y:1801 S:1fe1 D:0000 DB:00 nvMxdIzc V:229 H: 210 F:30
0088f2 sty $4302 [004302] A:8080 X:0003 Y:1700 S:1fe1 D:0000 DB:00 nvMxdIzc V:229 H: 234 F:30
0088f5 stz $4304 [004304] A:8080 X:0003 Y:1700 S:1fe1 D:0000 DB:00 nvMxdIzc V:229 H: 270 F:30
0088f8 ldy #$0800 A:8080 X:0003 Y:1700 S:1fe1 D:0000 DB:00 nvMxdIzc V:229 H: 300 F:30
0088fb sty $4305 [004305] A:8080 X:0003 Y:0800 S:1fe1 D:0000 DB:00 nvMxdIzc V:229 H: 324 F:30
0088fe lda #$01 A:8080 X:0003 Y:0800 S:1fe1 D:0000 DB:00 nvMxdIzc V:229 H: 360 F:30
008900 sta $420b [00420b] A:8001 X:0003 Y:0800 S:1fe1 D:0000 DB:00 nvMxdIzc V:229 H: 376 F:30
008903 plp A:8001 X:0003 Y:0800 S:1fe1 D:0000 DB:00 nvMxdIzc V:229 H: 406 F:30
008904 rts A:8001 X:0003 Y:0800 S:1fe2 D:0000 DB:00 nvMxdIZc V:241 H: 990 F:30
-- Mesen-S:
360, 376, 406, 1002 [-12 = 990]
Super Famista 5: sram is not initialized and has random values.
bsnes default inits 0xff = okay. If you memset with 0x00, you get the special screen.
edit:
For Dekitate High School, I think the screen should be drawing during dma transfer? Mesen-S maybe doesn't do this?
Code:
009b3d sta $420b [00420b] A:1802 X:0084 Y:efda S:01c1 D:0000 DB:00 nvMxdizc V:240 H: 870 F:42
009b40 rep #$30 A:1802 X:0084 Y:efda S:01c1 D:0000 DB:00 nvMxdizc V:240 H: 900 F:42
009b42 ldx #$1000 A:1802 X:0084 Y:efda S:01c1 D:0000 DB:00 nvmxdizc V: 16 H: 34 F:43
009b45 dex A:1802 X:1000 Y:efda S:01c1 D:0000 DB:00 nvmxdizc V: 16 H: 58 F:43
Found what's wrong about Dekitate H.S. It does H-DMA[5] at line 0 to PPU 212c which turns on bg2 mainscreen.
But it's doing this in the middle of a DMA from 240 to 41. Mesen-S can't simulate this yet?
Ken Griffey Jr. also needs SRAM initialized to 0xFF or you get gibberish in the menus.
The only exception is the Power Slide FX demo that must be initialized to 0x00 or it crashes.
I just let Power Slide FX crash currently and use 0xFF.
The more proper solution that people will hate is, games should be distributed with pre-initialized and blank SRAM files. Would also handle the Codemasters device with code stored in the SRAM. But for now, just do 0xFF yeah.
Quote:
But it's doing this in the middle of a DMA from 240 to 41. Mesen-S can't simulate this yet?
Mesen-S can't run HDMA inside a DMA? That ... would break everything. Do you just mean it's not syncing the PPU up during this combination of DMA+HDMA?
Side note: both DMA and HDMA do CPU<>DMA sync. But HDMA inside DMA doesn't need to sync again, and won't. The CPU revision 1 crash starts to make sense why it might be a problem with that in mind.
byuu wrote:
games should be distributed with pre-initialized and blank SRAM files. Would also handle the Codemasters device with code stored in the SRAM
This could also be handled by the manifest which would specify the default content.
byuu wrote:
Mesen-S can't run HDMA inside a DMA? That ... would break everything. Do you just mean it's not syncing the PPU up during this combination of DMA+HDMA?
I'll describe what I'm seeing happen in bsnes.
lda #$02
sta $420b
V:240 -> 016
At V:240, bg2 main screen is off. And right after the dma is finished, bsnes says bg2 main screen is on. There's some h-dma 5 active that supposedly keeps writing to 212c.
With Mesen-S, it shows a big black bar at the top of the screen (missing bg2 layer). snes9x and bsnes do not have this problem.
I don't really get it all. :laughing:
klurey wrote:
Found what's wrong about Dekitate H.S. It does H-DMA[5] at line 0 to PPU 212c which turns on bg2 mainscreen.
Yep, took me a little bit to figure this one out, too. Thanks for taking the time to investigate!
byuu wrote:
Mesen-S can't run HDMA inside a DMA? That ... would break everything.
It wasn't working properly - it would only interrupt DMA to run HDMA if it was on the same channel as DMA. But in this case, HDMA is on another channel, and it ended up running after DMA completed. I guess very few games depend on this? (surprisingly?)
It's fixed now. I also spent some time triple-checking the logic and timing compared to higan and I think it should be pretty close now (including the lack of sync when HDMA interrupts DMA, etc.)
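A toy model of the difference between the two checks, to make the bug concrete. It is not Mesen-S or higan code: it ignores vblank, sync and per-channel overhead, and `kHdmaTriggerClock` is an arbitrary placeholder for "the point in the scanline where HDMA runs".

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kClocksPerLine = 1364;    // master clocks per scanline
constexpr uint32_t kClocksPerByte = 8;       // master clocks per DMA byte
constexpr uint32_t kHdmaTriggerClock = 1104; // placeholder position in the line

// Counts how many times HDMA gets to run while a DMA of `length` bytes is
// in progress. With requireSameChannel = true, HDMA only interrupts the DMA
// when it shares a channel with it (the behavior described as broken).
uint32_t CountHdmaRunsDuringDma(uint32_t length, uint8_t dmaMask, uint8_t hdmaMask,
                                bool requireSameChannel) {
    uint32_t runs = 0;
    uint32_t clock = 0;
    for(uint32_t i = 0; i < length; i++) {
        uint32_t before = clock % kClocksPerLine;
        uint32_t after = before + kClocksPerByte;
        clock += kClocksPerByte;
        bool crossedTrigger = before < kHdmaTriggerClock && after >= kHdmaTriggerClock;
        if(crossedTrigger) {
            uint8_t eligible = requireSameChannel ? (hdmaMask & dmaMask) : hdmaMask;
            if(eligible != 0) {
                runs++;
            }
        }
    }
    return runs;
}
```

With DMA on channel 1 (mask 0x02) and HDMA on channel 5 (mask 0x20), as in the Dekitate High School case, the same-channel check never lets HDMA run during the transfer, while the unrestricted check runs it once per scanline.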
I also implemented the read being 4 master clocks earlier than the writes, and this fixed Rendering Ranger R2. It could be a coincidence, though, but I'll take it!
RE: Super Famista, I added a "power on ram state" option (same as Mesen), which applies to all ram (vram, cgram, oam, work ram, save ram). It defaults to $00, though. Eventually I'd like to default it to random, maybe with some overrides for SRAM for the games that require it? Didn't set it to random just yet because random will require a bit more thought when it comes to recording/playing movies (though movies aren't implemented yet, so I guess I should have gone for random...)
With this, the games that still freeze are down to Battle Grand Prix & Kishin Douji Zenki - Tenchi Meidou. It's likely that there are more, but those are the only 2 I'm aware of.
And then graphic glitches in ASP, Warlock and NHL 94 + mosaic is still broken.
Edit: Also, the emulator now outputs 256x239 frames when there is no high resolution content on that frame, which allows the graphic filters to work a lot better than before.
Sour wrote:
I added a "power on ram state" option (same as Mesen), which applies to all ram (vram, cgram, oam, work ram, save ram). It defaults to $00, though. Eventually I'd like to default it to random, maybe with some overrides for SRAM for the games that require it? Didn't set it to random just yet because random will require a bit more thought when it comes to recording/playing movies (though movies aren't implemented yet, so I guess I should have gone for random...)
They could store the RNG seed.
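A minimal sketch of how a seed could make "random" power-on RAM reproducible, assuming a movie file would store the seed alongside the inputs. `RamPowerOnState` and `InitializeRam` are hypothetical names, not the actual Mesen-S option.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

enum class RamPowerOnState { AllZeros, AllOnes, Random };

// Hypothetical helper: fills a RAM block according to the selected state.
// For Random, the caller supplies the seed, so replaying with the same seed
// reproduces the exact same power-on contents.
void InitializeRam(std::vector<uint8_t> &ram, RamPowerOnState state, uint32_t seed) {
    switch(state) {
        case RamPowerOnState::AllZeros:
            std::fill(ram.begin(), ram.end(), 0x00);
            break;
        case RamPowerOnState::AllOnes:
            std::fill(ram.begin(), ram.end(), 0xFF);
            break;
        case RamPowerOnState::Random: {
            // The raw std::mt19937 output sequence is fixed by the standard,
            // so it does not vary between compilers or platforms.
            std::mt19937 rng(seed);
            for(uint8_t &value : ram) {
                value = static_cast<uint8_t>(rng() & 0xFF);
            }
            break;
        }
        default:
            assert(false && "unknown power-on state");
            break;
    }
}
```

Per-game SRAM overrides (0xFF for most games, 0x00 for the one demo mentioned above) could then be a separate call on the save RAM buffer only.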
creaothceann wrote:
This could also be handled by the manifest which would specify the default content.
A) I don't include unlicensed games in my own database, so it wouldn't help with the one game that needs 0x00.
B) this won't handle any hypothetical cases that need more than a single fixed value (eg the Codemasters example, though that's not an SNES game.)
C) Sour doesn't support manifests (and shouldn't ... at least not yet, that format is obscenely unstable.)
Sour wrote:
Edit: Also, the emulator now outputs 256x239 frames when there is no high resolution content on that frame, which allows the graphic filters to work a lot better than before.
A gimmick I used to do many years ago was sub-divide the screen into lores and hires regions, and pass an array of line widths into my video filters. This would let HQ2x continue to render the game content the same way when hires textboxes showed up in various JRPGs.
Without this, if the user is using bilinear interpolation on video scaling to the screen (eg GL_LINEAR), games will suddenly appear sharper when textboxes appear, and become blurrier when they go away.
But oh boy was that a whole mess of annoying code to maintain. It gets out of hand very fast when you have 15+ video filters, HD mode 7, pixel shaders, etc.
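To make the two approaches concrete, here is a small sketch assuming each scanline records its own width (256 or 512). `Scanline`, `GetOutputWidth` and `NormalizeLine` are hypothetical names; neither emulator is claimed to be structured this way.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical frame layout: each scanline carries its own width.
struct Scanline {
    uint16_t width;               // 256 (lores) or 512 (hires)
    std::vector<uint16_t> pixels; // `width` entries
};

// Frame-level choice: only emit a 512-wide frame if some line needs it.
uint16_t GetOutputWidth(const std::vector<Scanline> &frame) {
    bool anyHires = std::any_of(frame.begin(), frame.end(),
        [](const Scanline &line) { return line.width == 512; });
    return anyHires ? 512 : 256;
}

// Line-level normalization: pixel-double lores lines inside a hires frame.
std::vector<uint16_t> NormalizeLine(const Scanline &line, uint16_t outputWidth) {
    assert(line.pixels.size() == line.width);
    if(line.width == outputWidth) {
        return line.pixels;
    }
    assert(outputWidth == line.width * 2);
    std::vector<uint16_t> out;
    out.reserve(outputWidth);
    for(uint16_t pixel : line.pixels) {
        out.push_back(pixel);
        out.push_back(pixel);
    }
    return out;
}
```

A filter that received the per-line widths directly could instead run its lores path on the 256-wide lines and skip the doubling, which is roughly the gimmick described above.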
Super 4WD - The Baja (Japan)
- error: in-game Mode 7 flickers mad crazy
Code:
$80/EB86 AE 11 42 LDX $4211 [$80:4211] A:0000 X:0008 Y:00EF D:0000 DB:80 S:0B72 P:envmXdIzC HC:0964 VC:030 FC:01 I:01
$80/EB89 CB WAI A:0000 X:00C2 Y:00EF D:0000 DB:80 S:0B72 P:eNvmXdIzC HC:0994 VC:030 FC:01 I:01
$80/EB8A E2 30 SEP #$30 A:0000 X:00C2 Y:00EF D:0000 DB:80 S:0B72 P:eNvmXdIzC HC:1018 VC:030 FC:01 I:01
I think the wai is a hv-irq trigger. But Mesen-S blows right by it for a whole frame or two.
byuu wrote:
creaothceann wrote:
This could also be handled by the manifest which would specify the default content.
A) I don't include unlicensed games in my own database, so it wouldn't help with the one game that needs 0x00.
B) this won't handle any hypothetical cases that need more than a single fixed value (eg the Codemasters example, though that's not an SNES game.)
C) Sour doesn't support manifests (and shouldn't ... at least not yet, that format is obscenely unstable.)
I meant manifests in general. Text files can store anything; it's just a matter of finding a good syntax. (Even the whole ROM library could (theoretically!) be stored like that.)
klurey wrote:
Super 4WD - The Baja (Japan)
- error: in-game Mode 7 flickers mad crazy
Turns out this was the same problem as issues I found during the day today. I implemented a cycle-by-cycle version of the mul/div operations a couple of days ago, and forgot a "break;" somewhere, which broke a few games that used the division registers + H-IRQs. Should be fixed now (also affected Jurassic Park & ASP)
I just finished implementing what should be a fairly cycle-accurate fetch/render sequence for the PPU (for mode 0-6 backgrounds). This fixes the "Good Luck" text in A.S.P. and the glitch at the top right in Warlock.
I haven't implemented this for mode 7 (yet) and sprites - those are the next things I'm hoping to get done. Implementing it for mode 7 should hopefully fix the minor glitches in NHL 94's intro.
I might have broken the rendering logic for some things with all this - I tried testing most of the games I knew that used graphic modes that aren't too common and fixed everything I could find.
byuu wrote:
A gimmick I used to do many years ago was sub-divide the screen into lores and hires regions, and pass an array of line widths into my video filters. This would let HQ2x continue to render the game content the same way when hires textboxes showed up in various JRPGs.
Yea, that sounds like a fair amount of work for the handful of titles that would really benefit from it, heh. Might try doing this someday, but I think this is good enough for now. Though, I might setup xBRZ to run in a few different threads, since it can't even pull off 60fps at the moment in 512x478.
Oh hey if you want to hate me ... :P
https://archive.org/details/Bsnes-emula ... omparisons
These were some older, stubborn bugs I had with bsnes way back in the day.
Quote:
I just finished implementing what should be a fairly cycle-accurate fetch/render sequence for the PPU (for mode 0-6 backgrounds).
That sounds promising! I mostly went off observed cases of real-world games, and made it easy to adjust the timings as we learn more.
We really, really, really need someone with a logic analyzer to work out these timings for us. A person who shall remain nameless promised to do that in return for me helping them with their emulator, but then never released their notes :c
Quote:
Though, I might setup xBRZ to run in a few different threads, since it can't even pull off 60fps at the moment in 512x478.
You could always try the good old-fashioned "#pragma omp parallel for" line above the for(y=0;y<height;y++) line trick ^-^;
I had good results with snes_ntsc and HQ2x, but it ended up hurting speed with all the other filters.
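A minimal sketch of that trick on a stand-in filter. `Scale2x` here is a plain nearest-neighbor doubler used as a placeholder, not xBRZ or HQ2x. The pragma only takes effect when OpenMP is enabled at build time (for example `-fopenmp` with GCC/Clang or `/openmp` with MSVC); otherwise the loop simply runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder filter: 2x nearest-neighbor scaling. Each input row writes to
// its own two output rows, so the rows can be processed independently.
void Scale2x(const std::vector<uint32_t> &src, int width, int height,
             std::vector<uint32_t> &dst) {
    assert(src.size() == static_cast<size_t>(width) * height);
    const size_t outWidth = static_cast<size_t>(width) * 2;
    dst.assign(src.size() * 4, 0);

    #pragma omp parallel for
    for(int y = 0; y < height; y++) {
        for(int x = 0; x < width; x++) {
            uint32_t pixel = src[static_cast<size_t>(y) * width + x];
            size_t o = static_cast<size_t>(y) * 2 * outWidth + static_cast<size_t>(x) * 2;
            dst[o] = pixel;
            dst[o + 1] = pixel;
            dst[o + outWidth] = pixel;
            dst[o + outWidth + 1] = pixel;
        }
    }
}
```

Whether this helps depends on how much work each row does; for cheap filters the threading overhead can outweigh the gain, which matches the mixed results mentioned above.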
byuu wrote:
We really, really, really need someone with a logic analyzer to work out these timings for us.
What precisely do you want recorded?
This is the key problem: I don't really know ... I'm not good at the hardware side at all.
I'm hoping to get something like a cycle-by-cycle guide of when VRAM/OAM/CGRAM fetches occur and where, that we can then reverse into something meaningful. More than likely, one carefully designed test won't cut it, and the fetch patterns will change based on different PPU register settings.
I don't think we're going to be able to see things like PPU register latches, but at least once we have the fetch timing down, we can start devising tests to suss out when the register latching occurs.
Sour managed to dig up the following links on the SNESdev Discord server. What else is needed?
AWJ's trace on A14-A12
lidnariq's additional traces
byuu wrote:
These were some older, stubborn bugs I had with bsnes way back in the day.
Thanks! Looks like I'm ok on those, except NHL 94. I think NHL 94 is probably due to either the matrix calculation being refreshed mid-frame or the tiles not being fetched ahead of time like I just did for modes 0-6.
RE: Fetch timing, I spoke about this w/ lidnariq in PMs, too, and so far the best theory I have is that prefetch for the tiles starts at H=0 (this is assuming anomie's information about the pixels being rendered from H=22 to H=277 is correct):
H=0-263 = tilemap/tile fetch
H=264-271 = 8 idle cycles (like the threads tepples linked mention)
H=272-339 = sprite chr fetches
H=340 = idle
If the rendering starts at H=22, the tilemap fetching must start at H=6 at the latest, or it won't finish in time for the first pixel (so H=0 to 6 are the possibilities). Of those, the only values that don't require the sprite fetching to be split across 2 different scanlines are H=0 and H=1.
This also makes sense compared to anomie's info:
anomie wrote:
The PPU seems to access memory 2-3 tiles ahead of the pixel output. At least, when we disable Force Blank mid-scanline, there is garbage for about 16-24 pixels.
With H=0, the graphical glitch that appears at the top right of the screen in Warlock disappears completely (at H=6, it still shows up on some frames). Both H=0 and H=6 work fine for the "good luck" text in A.S.P. Still need to implement the fetching for sprites and see how that works out with Mega Lo Mania (and that test rom posted earlier in this thread).
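The scanline layout proposed above can be written down as a small sketch, which also checks the "H=0 or H=1" argument: 264 dots of background fetches, 8 idle dots and 68 sprite dots take 340 dots, and a line has 341 positions (H=0 to H=340). The names are hypothetical and the numbers are the theory from this post, not measured hardware behavior.

```cpp
#include <cassert>
#include <cstdint>

enum class FetchPhase { Background, Idle, Sprites };

constexpr int kDotsPerLine = 341; // H = 0..340
constexpr int kBgDots = 264;      // tilemap/tile fetches
constexpr int kIdleDots = 8;      // gap before sprite fetches
constexpr int kSpriteDots = 68;   // sprite chr fetches

// Phase at dot `h`, assuming background fetches start at dot `bgStart`.
FetchPhase GetFetchPhase(int h, int bgStart = 0) {
    assert(h >= 0 && h < kDotsPerLine);
    int rel = h - bgStart;
    if(rel >= 0 && rel < kBgDots) {
        return FetchPhase::Background;
    }
    if(rel >= kBgDots + kIdleDots && rel < kBgDots + kIdleDots + kSpriteDots) {
        return FetchPhase::Sprites;
    }
    return FetchPhase::Idle;
}

// True when the whole sequence fits on one scanline for a given start dot.
constexpr bool FitsOnOneLine(int bgStart) {
    return bgStart + kBgDots + kIdleDots + kSpriteDots <= kDotsPerLine;
}
```

With these numbers, only start dots 0 and 1 keep the sprite fetches on the same scanline; any start from 2 to 6 would spill past H=340.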
My implementation is here atm:
https://github.com/SourMesen/Mesen-S/bl ... u.cpp#L266
The fetch patterns should match the info AWJ and lidnariq provided in the threads linked.
I'd assume that CGRAM fetches are done at the time of the pixel's rendering (which may start a few dots before H=22 if this is similar to the NES which had a 2-3 dot delay between pixel rendering & pixel output? pure speculation, though.). OAM evaluation is a tough one (I'm assuming it's all internal to the PPU?), I'm assuming the process ends up using latches (or a "secondary OAM"?) to hold the tile indexes that need to be fetched after the background data is done?
I think I recall a thread somewhere that had some info about the oam logic, will need to see if I can find it again.
Code:
0: N4 -> N3 -> N2 -> N1 -> P4 -> P3 -> P2 -> P1
1: N3 -> N2 -> N1 -> P3 -> P2 -> P2 -> P1 -> P1
2: N2 -> N1 -> 03 -> 03 -> P2 -> P2 -> P1 -> P1
3: N2 -> N1 -> P2 -> P2 -> P1 -> P1 -> P1 -> P1
4: N2 -> N1 -> 03 -> P2 -> P1 -> P1 -> P1 -> P1
5: N2 -> N1 -> P2 -> P2 -> P1 -> P1 -> P1 -> P1
6: N2 -> N1 -> 03 -> 03 -> P1 -> P1 -> P1 -> P1
[ 0,263] Background sequences? There are 33 of them per line.
[ 16,271] Renderer needs 2 sequences buffered because of how fineX works.
[ 16,271] If BG2 is empty or lower priority, ExtBG can be used instead.
[264,271] But that would mean there is nothing new on the bus for ExtBG to use?! (^_^)
[ 16,271] Mode7 BG1 aligns with ExtBG so Mode7 pattern fetches must align with ExtBG.
[272,339] Final 68 dots are reserved for sprite tile fetches. (x_x)
So a bit of tinkering later, I also changed the code to read the tile data for sprites from H=272 to 339.
I arbitrarily put "sprite evaluation" (i.e. the part that decides which tiles need to be loaded) at cycle 270 (in reality I imagine this runs from early on in the scanline until close to H=272).
This makes Mega Lo Mania work properly, and the "HblankEmuTest" rom correctly displays the "this is correct behavior" message (this was broken until now).
I also partially fixed the mosaic effect - I *think* it should be working properly now for standard resolution modes. (It's completely broken in high res, still).
Sour wrote:
OAM evaluation is a tough one (I'm assuming it's all internal to the PPU?), I'm assuming the process ends up using latches (or a "secondary OAM"?) to hold the tile indexes that need to be fetched after the background data is done?
I think I recall a thread somewhere that had some info about the oam logic, will need to see if I can find it again.
What I observed during my testing was that Sprite evaluation starts at the beginning of the scanline for 256 cycles so there is a gap between evaluation & fetching.
Enabling force-blank during evaluation pauses it. Then if you write the OAM address and disable force-blank, evaluation will start from that OAM address.
Strange things happen when disabling force-blank during evaluation on certain OAM addresses and cycles: sometimes no sprites are in-range anymore after the ones which were in-range before enabling force-blank.
During evaluation it writes the in-range sprite indexes to memory. There is some kind of memory in the middle of PPU1 which is 32x8bits (Hi-OAM) & 32x7bits (in-range sprite indexes).
Sprite fetching reads OAM data of the in-range sprite indexes again. It continues with sprite fetching and decrementing the current sprite index if you enable force-blank. There is some garbage for a few pixels when enabling force-blank during fetching. Other sprites fetched during force-blank are invisible. Writing to the OAM address has no effect.
Writes to 2104 during evaluation and fetching go to OAM & Hi-OAM (Reading both at the same cycle maybe?)
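A simplified sketch of the evaluation step as described above: walk the 128 OAM entries starting from a given index and record up to 32 in-range sprite indexes (7 bits each). It deliberately ignores Y wrap-around, force-blank effects and the per-line tile limit; `SpriteEntry` and `EvaluateSprites` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct SpriteEntry {
    int y;      // top scanline of the sprite
    int height; // height in pixels
};

constexpr size_t kSpriteCount = 128; // OAM entries
constexpr size_t kMaxInRange = 32;   // in-range index slots

// Returns the indexes of the sprites that cover `scanline`, in evaluation
// order, starting at `firstSprite` and wrapping around after entry 127.
std::vector<uint8_t> EvaluateSprites(const std::array<SpriteEntry, kSpriteCount> &oam,
                                     int scanline, uint8_t firstSprite = 0) {
    assert(firstSprite < kSpriteCount);
    std::vector<uint8_t> inRange;
    for(size_t i = 0; i < kSpriteCount && inRange.size() < kMaxInRange; i++) {
        uint8_t index = static_cast<uint8_t>((firstSprite + i) & 0x7F);
        const SpriteEntry &sprite = oam[index];
        if(scanline >= sprite.y && scanline < sprite.y + sprite.height) {
            inRange.push_back(index);
        }
    }
    return inRange;
}
```

The fetch phase would then read OAM again for each stored index, which is why only the index needs to be kept.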
Sour wrote:
This makes Mega Lo Mania work properly, and the "HblankEmuTest" rom correctly displays the "this is correct behavior" message (this was broken until now).
Are you aware HblankEmuTest displays half correct / half incorrect horizontally on an NTSC SNES?
tepples wrote:
Sour managed to dig up the following links on the SNESdev Discord server. What else is needed?
AWJ's trace on A14-A12
lidnariq's additional traces
There's an SNESdev Discord? How many of these things are there? Heh.
Anyway that's really good info, thank you!
Quote:
Are you aware HblankEmuTest displays half correct / half incorrect horizontally on an NTSC SNES?
While we're at it, does Warlock have that glitch on hardware? Should probably verify ...
byuu wrote:
There's an SNESdev Discord? How many of these things are there? Heh.
The SNR on these Discords tends to be extremely poor; be on the lookout for time vampires at all costs.
Thanks a lot for the sprite evaluation info!
Seems a bit similar to the NES, including the part where disabling rendering at any point messes everything up.
This might actually be enough info to change my implementation a bit and try to roughly match this.
I expected the OAM to only be read once (during evaluation), but I guess there is actually enough time to read it again during fetching (2 dots per OAM entry, which is the same rate as the 2 dots per tile fetch it needs). This probably explains some of the 8 idle cycles between background fetches and sprite fetches? It would need to load at least 1 OAM entry ahead of time before the fetching starts.
paulb_nl wrote:
Are you aware HblankEmuTest displays half correct / half incorrect horizontally on an NTSC SNES?
Nope, thanks for pointing it out. Guess it goes back on the "doesn't quite work yet" pile :p
RE: Warlock, validating that the game has no glitches would probably be a pretty good indicator that tile fetching starts at H=0.
I posted captures of HblankEmuTest on SNES consoles here:
viewtopic.php?p=231279#p231279
Apparently it was YourEmuSuxx.sfc that has half correct/incorrect on NTSC & correct on PAL.
HblankEmuTest.sfc has half correct on both NTSC & PAL.
Yep, that's the danger of test ROMs.
Few of us have access to run them on real hardware, and just assume they're right, and in 'fixing' them, end up breaking other things.
Even the best of us with real hardware miss details. Most Game Boy test ROMs only pass on certain CPU revisions, or fail on only certain revisions. The Game Boy scene has begun tagging which systems a given test ROM passes or fails on. The SNES space should probably start doing the same.
byuu wrote:
Even the best of us with real hardware miss details. Most Game Boy test ROMs only pass on certain CPU revisions, or fail on only certain revisions. The Game Boy scene has begun tagging which systems a given test ROM passes or fails on. The SNES space should probably start doing the same.
I have been told on nesdev Discord, re: SNES, that those versions reported via MMIO registers do not accurately reflect actual changes done to the ICs at the transistor level by Nintendo, i.e. there are several "sub-revisions" of, say, PPU1 rev 2, where they all behave slightly differently. True or false?
If true, I'm not sure how one would be able to reliably determine this through software, and requiring testers to disassemble their consoles (and I don't just mean take the shell off; to get at IC silkscreenings (if relevant at all!) you have to take EVERYTHING apart) seems ridiculously extreme...
Nintendo used the revision codes at first.
CPU revision 1 and 2 have differences, a few of which I emulate (DRAM refresh trigger point, HDMA init trigger point), a few of which I don't (DMA<>HDMA crash fix, probably more I don't know about.)
PPU1 is always revision 1.
PPU2 has revisions 1, 2, and 3. 2 is very rare, and I don't own one.
I have yet to find a difference between PPU2 revisions 1 and 3, but they likely exist.
After this, Nintendo stopped bumping the revision register values.
The big revision after this is the 1CHIP. I like to jest that it's more of a clone console than an SNES, ala the Genesis 3.
They completely break the display brightness register (takes multiple lines to stabilize!), and there are several compatibility issues.
The 1CHIP is actually two chips: one is the CPU+PPU1+PPU2, and the other is the SMP+DSP. Probably Sony wouldn't give them the schematics/Verilog/Asic/whatever-have-you for the SMP+DSP, I don't know. It makes a wonderful black box as you know, I wish more systems worked like that (cough Sega, cough.)
The SMP timer glitch disappears with this revision. The SMP never had a version register anyway, though.
The SNES Jr. potentially has more revisions, but I don't know of any differences between the big 1CHIP-(01,02,03) and the Jr boards.
If someone can run a test on real hardware, we could verify the newer revisions by the PPU brightness bug or the SMP timer fix.
It's pedantic, I know, but as you saw, Sour just fixed a test ROM that's supposed to fail. I did as well when this was first reported. So it is a valid concern ...
Alright, so after a couple of days trying to figure out sprite evaluation & fetching, I think I've got something that's starting to be relatively decent (but it doesn't quite 100% match the hardware, according to the tests)
I've based a lot of this on the info in these threads:
viewtopic.php?f=12&t=18447
http://forums.nesdev.com/viewtopic.php?p=231279
As well as this GitHub issue (and test rom):
https://github.com/MiSTer-devel/SNES_MiSTer/issues/69
And also this VHDL implementation:
https://github.com/MiSTer-devel/SNES_Mi ... rc/PPU.vhd
Huge thanks to paulb_nl for all the time you've obviously spent trying to figure out these details! :)
What I have at the moment looks like this (some of these are guesses - not actual hardware-verified behavior):
H=0-255: Sprite evaluation:
Even cycles: Read X+Y (word) + high oam value
Odd cycles: Write sprite index in the 32-entry 7-bit table (if the sprite is visible on the line)
H=0-263: BG tilemap+tile fetching
H=22-278: Rendering
H=270-337: OAM reading to determine which VRAM addresses need to be fetched (starting from the last sprite index stored during evaluation)
Even cycles: Read X+Y (word)
Odd cycles: Read 2nd word (priority, palette, etc), determine the VRAM address to load on the next cycle. If all tiles for this sprite have been processed (e.g based on its width), move on to the next sprite index in the sprite evaluation table.
H=272-339: VRAM fetches for the sprites
Even cycles: Read first word of sprite tile, based on the address calculated on the previous cycle
Odd cycles: Read the 2nd word of the sprite tile
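To make the overlap between those phases easier to see, here is a rough Python sketch of the schedule above. The H ranges and the even/odd split are taken straight from the list; the function name and the label strings are made up for illustration, and none of this is Mesen-S's actual code (or hardware-verified behavior).

```python
# Rough sketch of the per-dot schedule listed above. The H ranges come from
# the post; the function name and the label strings are hypothetical.
def ppu_activity(h):
    """Return what the PPU is (assumed to be) doing at dot h of a scanline."""
    even = (h % 2 == 0)
    active = []
    if 0 <= h <= 255:    # sprite evaluation
        active.append("eval: read X+Y and high OAM" if even
                      else "eval: write sprite index if in range")
    if 0 <= h <= 263:    # background fetching
        active.append("bg: tilemap/tile fetch")
    if 22 <= h <= 278:   # pixel output
        active.append("render")
    if 270 <= h <= 337:  # OAM reads for the in-range sprites
        active.append("oam: read X+Y" if even
                      else "oam: read 2nd word, compute VRAM address")
    if 272 <= h <= 339:  # VRAM fetches for the sprite tiles
        active.append("vram: read 1st word of sprite tile" if even
                      else "vram: read 2nd word of sprite tile")
    return active


if __name__ == "__main__":
    for h in (0, 1, 264, 273, 340):
        print(h, ppu_activity(h))
```

Dumping a few dots this way shows, for example, that around H=270-278 rendering, OAM reads and sprite tile fetches are all active at the same time.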
I've tried to keep track of as many details as I could find (but I'm sure I'm missing a few things).
The HBlankTest & "YourEmuSuxx" tests both look half corrupted in both PAL/NTSC. It's far from perfect, but closer than before.
paulb_nl's sprite_oam_test_short_fblank test rom seems to be behaving pretty close to what is seen in his youtube video.
With these changes, Uniracers now runs properly without having to add a hack-ish solution to make it work.
Also, the "lamp" at the top of the screen in Aladdin is correctly missing the first row of pixels due to the game disabling rendering during hblank (as mentioned here:
http://forums.nesdev.com/viewtopic.php?p=231483#p231483)
byuu wrote:
It's pedantic, I know, but as you saw, Sour just fixed a test ROM that's supposed to fail.
I'm hoping to eventually get around to gathering them up either on a wiki page or github repo and defining their expected results (on different consoles, when it matters). At least that way there would be a central place to get information on the tests that exist, rather than finding them on random websites, repos and forum threads. Might be a while before I can find the time, though.
Sour wrote:
I'm hoping to eventually get around to gathering them up either on a wiki page or github repo and defining their expected results (on different consoles, when it matters). At least that way there would be a central place to get information on the tests that exist, rather than finding them on random websites, repos and forum threads. Might be a while before I can find the time, though.
I don't know that much about SNES emulation, but I am interested in archiving. I've got a test ROM repository with a bunch of the ROMs from this thread, plus some from the tukuyomi archive. SNES Central also has a bunch of test ROMs on the Homebrew page, which I haven't added to my repo yet because I'd have to figure out how to scrape the website.
I'd love to get corrections, suggestions or even contributions. Alternatively, if you want to take the things I've collected and make your own wiki/repo/whatever, that's cool too.
Sour wrote:
Alright, so after a couple of days trying to figure out sprite evaluation & fetching, I think I've got something that's starting to be relatively decent (but it doesn't quite 100% match the hardware, according to the tests)
Great work. Nice to see you adding cycle by cycle sprite evaluation.
My OAM test is basic because I am terrible at SNES coding. Would be great if someone could make a proper OAM test rom that displays the current OAM address/data at every cycle.
Long ago, I made a test that at one point filled out OAM with some sort of pattern, and then hammered writes at every single clock cycle (one at a time of course), read out the OAM, and evaluated it to find where the write ended up going to, which I believe reveals what the PPU was actually evaluating at a given cycle. (And a similar test with CGRAM.)
The Hblank write results are loosely based around those findings, but the issue I had in trying to complete an evaluator this way was that it only handled a simple sprite pattern table. What would happen if FirstSprite was different, if sprites overlapped, if we hit range/tile over, if we toggled force blank in the middle of a scanline, if there were transparent sections in certain sprites, on and on ... there could be millions of combinations, and trying to suss out what actually mattered to timing and what didn't was too daunting.
You could also reveal information from reads in this same manner, but now you're back to that aforementioned issue with "what clock cycle within reads/writes do things happen on?", which if you follow Game Boy emulation, is very far from trivial in practice.
Timing this stuff to cycle accuracy is the easy part. Getting the actually correct cycle positions for these things is the brutally hard part. I didn't see it as worthwhile to guess something that might be a little closer, but was still not correct, unless it has a discernible benefit to emulation. I've never made a secret of it that PPU timings are my Achilles' Heel due to not being able to test most of it.
Of course, there's a real possibility that I've approached the PPU wrong, so it's great to have a new approach here. What you're doing so far is really impressive work! :D
Screwtape wrote:
I've got a test ROM repository with a bunch of the ROMs from this thread
Ah, that's a good start. Good to know something like this exists!
paulb_nl wrote:
Would be great if someone could make a proper OAM test rom that displays the current OAM address/data at every cycle.
Sadly, I'm not really qualified for that!
Also, while I like accuracy as much as anyone, I do need to worry about performance, esp. considering that the debug tools do have a decent overhead when they're opened - and nobody is going to want to use a debugger that can't even run at 60fps. So there are certain things I will probably not realistically be able to add. That being said, there's never any harm in learning more about the hardware's actual behavior.
byuu wrote:
Timing this stuff to cycle accuracy is the easy part. Getting the actually correct cycle positions for these things is the brutally hard part.
I agree - and like I said above, there are limits beyond which I'm not really able to go. I mostly only gave this a shot because I had to implement the bg fetching logic to fix A.S.P's "Good luck" text and I figured I might as well try and see how much of an impact sprite evaluation would have on performance. I haven't been keeping track too closely, but I think between the background and sprites, performance is probably down 10-15%ish. I'll keep it in for now, but eventually might have to make this kind of behavior optional if performance becomes a problem (it's already not that great..)
I edited my previous post just before your reply: I do want to say that I appreciate new approaches here very much. SNES emulation has been stagnant for a long time, and it's been a fear of mine that I might end up holding things back. I've heard several prominent devs state they didn't make an SNES emulator because mine existed, which is the worst thing I could hear ^^;
As for performance ... unfortunately SNES emulation lives in the shadow of ZSNES, and likely always will. It has completely distorted what the general public thinks of how demanding SNES emulation really is. I like the analogy of Nesticle versus Nestopia. We went from needing a 25MHz CPU to an 800MHz CPU, but it was necessary. Thankfully, everyone has an 800MHz+ CPU, so it was never an issue. pNES and Mesen need even more resources, because they do far more. It shouldn't be a surprise that SNES emulation went from needing 200MHz to needing 2-4GHz, but now we're butting up against a nearly stagnant IPC rise over the past decade and a half, and on the other end folks are pushing run-ahead and Raspberry Pis, and so things have only gotten worse, not better, over time. I have never found an effective way to explain to the general public that we're not just throwing cycles away. That we're not just terrible programmers that can't write optimized code.
I'm currently maintaining three separate SNES emulators. I match Snes9X's speed when I match their accuracy. But it is obscenely difficult to compete with them on performance without making those sacrifices. It's not for lack of trying: cooperative threading lets me run the CPU and SMP in huge blocks out of order, and a multi-threaded PPU lets me utilize today's multi-core CPUs, but that only closes the gap. Stuff like CPU ALU cycles, DMA<>HDMA sync, the SMP TEST register, cycle-accurate CPU synchronization, per-byte bus remapping, bus hold delays on memory accesses (the thing you just did for Rendering Ranger), IRQ pin holds, true SA1 memory conflict stalls, etc completely gut performance.
And regrettably for us, including for this PPU research, games just don't need any of this stuff (save of course for Air Strike Patrol, and that only needs the bare minimum), so it's easy to dismiss it. I don't want to cede this ground to the domain of FPGAs, however, so I'll keep trying.
The bsnes/higan split has been the most helpful thing I've done in a long time. I'd recommend you to consider the same, if you were willing. Most of the code can still be shared, possibly even gated behind #ifdefs. If nothing else, just keeping both a pixel-based and a scanline-based PPU will allow you to remain at least fractionally competitive with Snes9X. And even if you don't want to write your own scanline PPU, you're welcome to use mine (hey, free HD mode 7 gimmick!) Or at least lift the idea: I cache all the PPU registers (only 0x34 bytes) + CGRAM once per scanline, then I can render each scanline all at once using OpenMP. The only trick is you have to flush the queued scanlines when games try to force blank to change VRAM, but there's no games that ever turn off the display in the -middle- of the frame, so in practice it's fine.
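For anyone following along, the "cache once per scanline, render in a batch" idea could be sketched roughly like this in Python. This is only an illustration of the description above, not bsnes code: the class and method names are hypothetical, the lines are rendered sequentially instead of in parallel with OpenMP, and it flushes on every VRAM write rather than only when a game uses force blank to change VRAM.

```python
# Hypothetical sketch of a queued scanline renderer. Registers and CGRAM are
# snapshotted once per scanline; rendering is deferred until a flush.
class ScanlineQueue:
    def __init__(self, render_line):
        # render_line(y, regs, cgram, vram) -> rendered line (any object)
        self.render_line = render_line
        self.pending = []   # snapshots waiting to be rendered
        self.output = {}    # y -> rendered line

    def end_of_scanline(self, y, regs, cgram):
        # Copy the state so later register/CGRAM writes can't affect this line.
        self.pending.append((y, bytes(regs), bytes(cgram)))

    def flush(self, vram):
        # A real implementation could render these lines in parallel.
        for y, regs, cgram in self.pending:
            self.output[y] = self.render_line(y, regs, cgram, vram)
        self.pending.clear()

    def write_vram(self, vram, addr, value):
        # Queued lines must be rendered with the old VRAM contents first.
        self.flush(vram)
        vram[addr] = value


if __name__ == "__main__":
    vram = bytearray([5])
    q = ScanlineQueue(lambda y, regs, cgram, vram: regs[0] + vram[0])
    q.end_of_scanline(0, [1], [0])
    q.write_vram(vram, 0, 9)   # line 0 gets rendered before VRAM changes
    q.end_of_scanline(1, [1], [0])
    q.flush(vram)
    print(q.output)
```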
But if you're not willing to maintain two Mesen-S cores ... I'd offer to team up with you on higan? I'd like higan to be a test bench where, if we drop below 60fps, it is what it is. We have to understand the exact cycle-level behavior of the SNES before we can perfectly optimize it. If you'd like a place to emulate absolutely anything, no matter how demanding, we can use my core for that. And then you can take those findings and make the best daily driver for playing games in the scene. Well, just an offer anyway ^-^;
Any approach you take is fine, but I'd hate to see PPU findings drop off due to how resource demanding it is.
My current sore point for the PPU is ... I feel like all BGs, the sprites, etc should be separate threads running in parallel. But that's just extreme overkill. They will likely need to be at least separate state machines, however.
Lastly, a fair warning: the SA-1 (done right with memory conflicts), ST018 (21MHz ARM6), and when overclocked which people love to do, SuperFX ... they will add a whole new world of pain to performance. They generally cut my performance in half =(
Spent the past couple of days optimizing a lot of stuff (mostly the PPU) - managed to boost performance by a good 30% or so on average (went from ~190fps to ~260fps in FF2 on a first gen ryzen 7 CPU). This gets me up to 350fps with frame skipping (rendering approx ~60fps and skipping the rest), which isn't bad for a single thread. It's about as fast as the 0.1.0 release in most scenarios, despite being way more accurate.
I've refactored a lot of the PPU's code for this to reduce the amount of duplicated work (e.g before subscreens and mainscreens were rendered separately, even though you can basically process both of them at the same time with just a couple of conditions). It also reduces a lot of the excessive templating I had originally used for the PPU, which thankfully speeds up compilation as well.
The mosaic effect for high res modes should also be fixed - processing the main & sub screens at the same time actually made it a lot easier to fix. I *think* the mosaic effect should be mostly ok now - it's still not implemented in mode 7, though.
As far as maintaining multiple versions of the core, time will tell, but I'd rather stick to a single core as much as possible, for the sake of simplicity (both mine and users'). If I can manage to get it to run at fullspeed on a RPi4, great, otherwise, one day a RPi5 will come out :p (RPi3 vs 4 boosted the Mesen libretro core's speed by 200%!)
In terms of multithreading, I'm still hoping that I can get away with rendering the entire picture on a separate thread while the emulation core is allowed to continue. I think logging the state of vram/cgram/oam once per frame + all register writes would be enough for this. The only obvious scenario where I can see this being a problem is games that read the sprite range/time flag - but those flags can be calculated on-the-fly (with proper timing) based on the contents of OAM if that ever happens.
If it works, it would split the workload in about half and might get me a 60-80% increase in performance with relatively little change to accuracy. Obviously it can introduce a small amount of additional input lag (only if the system can't reach 60fps on a single thread), but a little input lag is better than running at 50 fps. This is essentially what I do with HD packs on Mesen.
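As a very rough illustration of the "snapshot once per frame + log of register writes" idea, here is a Python sketch. Everything in it is an assumption for the sake of the example: the frame dictionary layout, the function names, and the scanline-level granularity (a real version would need to log writes with dot-level timing and snapshot vram/cgram/oam as well).

```python
import queue
import threading


def regs_per_scanline(initial_regs, writes, line_count):
    """Replay logged (scanline, register, value) writes and return the
    register state seen at each scanline. Hypothetical helper."""
    regs = dict(initial_regs)
    pending = sorted(writes, key=lambda w: w[0])  # stable: keeps write order
    states, i = [], 0
    for y in range(line_count):
        while i < len(pending) and pending[i][0] <= y:
            _, reg, value = pending[i]
            regs[reg] = value
            i += 1
        states.append(dict(regs))
    return states


def render_worker(frames, results):
    """Consume frame snapshots on a separate thread until None is received."""
    while True:
        frame = frames.get()
        if frame is None:
            break
        # Stand-in for actual rendering: just rebuild the per-line state.
        results.append(regs_per_scanline(frame["regs"], frame["writes"],
                                         frame["lines"]))


if __name__ == "__main__":
    frames, results = queue.Queue(), []
    worker = threading.Thread(target=render_worker, args=(frames, results))
    worker.start()
    # The emulation thread would push one of these at the end of every frame.
    frames.put({"regs": {0x2100: 0x0F}, "writes": [(2, 0x2100, 0x80)], "lines": 4})
    frames.put(None)
    worker.join()
    print([state[0x2100] for state in results[0]])
```

The emulation thread only pays for the snapshot and the log, while the replay and the rendering happen on the worker.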
RE: working on higan, I'm probably nowhere near knowledgeable enough about the SNES to really help on higan just yet, unfortunately. I still need to get around to implementing (at least some of) the enhancement chips and the like before I can get a better picture of how everything interacts - at the moment I still don't really know how any of them work. Now that I've fixed most other CPU/SPC/PPU issues, I might try to start working on DSP emulation soon since it sounds like that would be the most simple one.
That being said, I am looking at higan's code fairly often, so if I happen to spot anything that looks like it might be incorrect/incomplete, I'll be sure to mention it!
The videos of op_timing_test_v2 on console show LDA XM $0525 on the Stack Rel Ind Idx Y Timing page but on Mesen-S and Bsnes the result is $0560.
That's my fault - the last 2 stack relative tests were incorrect in the original version. They used the wrong addressing mode, and also gave random results (because it loaded a random address, that could end up being 12-cycle or 6-cycle registers). The updated version is here and koitsu also recorded that version of the test in this video (starting at 0:24)
Just to be clear, bsnes does run the "bugged" version properly, too, it's just that the (random) contents of RAM impact the test, so sometimes you get $525, sometimes you get $560. (Mesen-S should do the same too, if you turn on the random power on ram option)
Great work on speeding up the PPU! I was going to write my concerns with rendering video in a truly separate thread but it seems like you covered it all already. I'd like to do it with bsnes anyway as an option, knowing it will add up to one frame of latency, since it does give a nice speed boost for non-special-chip games.
I'll expose a randomness setting in the next higan release. The core actually has three modes: none (runs a lot of homebrew, including my own older demo_* test ROMs), low (tries to mimic the patterns you see in RAM on real SNES decks, probably not great though), and high (full-on PCG randomization of everything. Clever ROMs can detect an emulator this way, but it's also the best way to suss out uninitialized memory accesses in homebrew short of a debugger keeping track of that and directly informing you. Which, hint hint with the usage map if you implemented that ^-^;)
From my end ... I have optimized the SMP and PPU to its limits, and the DSP from anomie and blargg aren't going to be beaten by my attempts. So what's left is to try optimizing the CPU and GSU. Attempts at things like 16/24-bit block reads didn't work well at all. I think I'll need bigger guns. Opcode decoding's not really a bottleneck on the 65816 like it would be with an ARM7, and there's no prefetcher or 3+-stage pipeline. So it's pretty much IRQ testing and just raw computations. We can't do the 6502 NZ=result delayed computation trick because the 65816 can swap between 8-bit and 16-bit modes for both A and X/Y. I'm using a binary min-heap for scheduling events and testing for them in O(1) time, and some fancy range-testing for IRQ trigger events. Idle loop optimizations seem like the most likely for large gains, but those are notoriously difficult to get right, and they only end up hurting performance in 100% CPU load cases, which are quite common. Hmm :/
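For readers who haven't seen the technique, a binary min-heap event scheduler can be sketched in a few lines of Python with `heapq`. This is a generic illustration, not higan's implementation: the class, the method names and the event names in the example are all made up. The point is that peeking at the earliest event is O(1), so the "is anything due yet?" check in the hot loop stays cheap.

```python
import heapq


class EventScheduler:
    """Minimal min-heap scheduler sketch (hypothetical names throughout)."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so events on the same clock keep their order

    def schedule(self, clock, name):
        heapq.heappush(self._heap, (clock, self._seq, name))
        self._seq += 1

    def next_clock(self):
        # O(1): the earliest event is always at index 0 of the heap.
        return self._heap[0][0] if self._heap else None

    def run_until(self, clock):
        """Pop and return every event due at or before the given clock."""
        fired = []
        while self._heap and self._heap[0][0] <= clock:
            when, _, name = heapq.heappop(self._heap)
            fired.append((when, name))
        return fired


if __name__ == "__main__":
    sched = EventScheduler()
    sched.schedule(1364, "end_of_line")
    sched.schedule(538, "dram_refresh")
    sched.schedule(1096, "hdma")
    print(sched.next_clock())
    print(sched.run_until(1100))
```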
Sour wrote:
That's my fault - the last 2 stack relative tests were incorrect in the original version. They used the wrong addressing mode, and also gave random results (because it loaded a random address, that could end up being 12-cycle or 6-cycle registers). The updated version is here and koitsu also recorded that version of the test in this video (starting at 0:24)
Thanks, indeed the updated version works fine.
With timing_test.sfc I found it strange that it showed random results (56-57 higan / 52 bsnes) at V-pos for the HDMA test but V-pos of the following tests stayed the same.
Oops, it's overwriting the results of the HDMA test here?
https://github.com/SourMesen/SnesTests/ ... ain.s#L370
byuu wrote:
but it's also the best way to suss out uninitialized memory accesses in homebrew short of a debugger keeping track of that and directly informing you. Which, hint hint with the usage map if you implemented that ^-^;)
This is an option in Mesen (Break on uninit memory reads), and it's almost available in Mesen-S - it's only missing a UI option to turn it on and a tiny bit of code to trigger the breakpoint. It's near the top of my list of debugger features I need to finish/add.
I actually haven't paid much attention to the actual CPU core so far (and there are a number of things that aren't really optimal in terms of flag handling, etc.). The majority of the time (other than SPC/PPU) seems to be spent on "clocking" the entire system (e.g incrementing PPU position, checking for IRQs, checking for DMAs, etc).
At the moment SPC seems to be 5%, DSP 5%, too. Clocking + processing CPU reads are ~15% together. Then the PPU tends to be a good 30-40%, I think, would have to profile again and break it down into categories to really know. It's eerily similar to Mesen in general, though. (The saving grace on the SNES is that you can run the PPU in batches, whereas on the NES this becomes a nightmare because cartridges can spy on the VRAM bus)
For the "events" I'm currently just using a boolean array with a few hclock values set to true (and they get updated as needed) and then I process all possible events whenever I hit a true value. The CPU time spent on checking that boolean array is fairly high, though, so I need to see if I can somehow find a better solution for this.
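In Python terms, the boolean array approach looks roughly like the sketch below. The names are hypothetical, and the 1364 master cycles per line only covers a normal scanline. The last method shows one possible alternative (not something Mesen-S does, as far as this post says): cache the hclock of the next event so the hot loop compares against a single integer instead of indexing the array on every step.

```python
CYCLES_PER_LINE = 1364  # master cycles on a normal scanline


class HClockEvents:
    """Sketch of per-scanline event flags (hypothetical names)."""

    def __init__(self):
        self.flags = [False] * CYCLES_PER_LINE

    def set_event(self, hclock):
        self.flags[hclock] = True

    def clear_event(self, hclock):
        self.flags[hclock] = False

    def has_event(self, hclock):
        # The approach described above: one array lookup per clock step.
        return self.flags[hclock]

    def next_event_at_or_after(self, hclock):
        # Possible alternative: compute this only when an event fires or the
        # flags change, then compare the current hclock against the result.
        for h in range(hclock, CYCLES_PER_LINE):
            if self.flags[h]:
                return h
        return CYCLES_PER_LINE  # sentinel: nothing left on this line


if __name__ == "__main__":
    events = HClockEvents()
    events.set_event(538)
    events.set_event(1096)
    print(events.has_event(538), events.next_event_at_or_after(539))
```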
paulb_nl wrote:
Oops, it's overwriting the results of the HDMA test here? :)
And suddenly the results make so much more sense! I've been wondering for weeks why that particular value was different. Thank you!
With this, it looks like Mesen-S matches bsnes/higan pretty closely, but it looks like I might have an issue with the very last test (DMA with fast rom turned on) since the timings diverge quite a bit there compared to higan 106. According to koitsu's run of the test, it looks like higan has it right though, so I'll have to investigate.
Just finished adding LLE DSP support, so mario kart finally works.
It seems to be working for the DSP games I've tested, except Super Bases Loaded 2 (the game screen is broken once gameplay starts). But I'm not sure if that's due to a bug in the DSP code or just another general emulation issue (I haven't really looked into it).
It supports loading the bios files (in the "Bios" subfolder) as a single file or 2 separate files (using what seems to be higan's current implementation + what it used to be before, I think?)
Awesome work! Super Bases Loaded 2 uses a different memory map than the other DSP games.
https://preservation.byuu.org/games/Lj4b/BxxE98BT
https://preservation.byuu.org/boards/SycBSHVC-2B3B-01
Code:
memory type=ROM content=Program
  map address=00-3f,80-bf:8000-ffff mask=0x8000
memory type=RAM content=Save
  map address=70-7d,f0-ff:0000-7fff mask=0x8000
processor architecture=uPD7725
  map address=60-6f,e0-ef:0000-7fff mask=0x3fff
  memory type=ROM content=Program architecture=uPD7725
  memory type=ROM content=Data architecture=uPD7725
  memory type=RAM content=Data architecture=uPD7725
  oscillator
(the mask selects between DR and SR.)
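For reference, here is a small Python sketch of how the processor's `map address=60-6f,e0-ef:0000-7fff mask=0x3fff` line could be decoded. The parsing part just follows the `banks:addresses` syntax shown above. The DR/SR part is my reading of the note about the mask, and is an assumption: the masked bits are ignored and the first address bit above the mask (0x4000 here) picks SR over DR. The function names are made up.

```python
def parse_map(spec):
    """Parse 'BB-BB,BB-BB:AAAA-AAAA' into (bank_ranges, (addr_lo, addr_hi))."""
    banks, addrs = spec.split(":")
    bank_ranges = []
    for part in banks.split(","):
        lo, hi = part.split("-")
        bank_ranges.append((int(lo, 16), int(hi, 16)))
    lo, hi = addrs.split("-")
    return bank_ranges, (int(lo, 16), int(hi, 16))


def is_mapped(spec, address):
    """True if a 24-bit CPU address falls inside the given map spec."""
    bank, offset = address >> 16, address & 0xFFFF
    bank_ranges, (lo, hi) = parse_map(spec)
    return lo <= offset <= hi and any(b0 <= bank <= b1 for b0, b1 in bank_ranges)


def dsp_register(address, mask=0x3FFF):
    # ASSUMPTION: the bit just above the mask selects the register,
    # 0 = DR (data register), 1 = SR (status register).
    select_bit = mask + 1
    return "SR" if address & select_bit else "DR"


if __name__ == "__main__":
    spec = "60-6f,e0-ef:0000-7fff"
    for address in (0x600000, 0x604000, 0x608000):
        if is_mapped(spec, address):
            print(hex(address), dsp_register(address))
        else:
            print(hex(address), "not mapped to the DSP")
```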
The ST011 requires additional uPD96050 instructions that aren't in the 7725, to account for the larger ROM and RAM.
Quote:
It supports loading the bios files (in the "Bios" subfolder) as a single file or 2 separate files (using what seems to be higan's current implementation + what it used to be before, I think?)
Yeah. st011.rom == st011.program.rom + st011.data.rom concatenated.
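In other words, going between the two formats is just a concatenation or a slice. A minimal Python sketch, assuming the program ROM comes first as written above; the function names are made up and the program size is left as a parameter because it depends on the chip and isn't stated here:

```python
def join_firmware(program, data):
    """Combined firmware = program ROM followed by data ROM."""
    return bytes(program) + bytes(data)


def split_firmware(combined, program_size):
    """Split a combined firmware dump back into (program, data).

    program_size is chip-specific and must be supplied by the caller.
    """
    if program_size > len(combined):
        raise ValueError("combined firmware is smaller than the program ROM size")
    return combined[:program_size], combined[program_size:]


if __name__ == "__main__":
    # Dummy bytes only; real firmware is obviously not included here.
    program, data = b"\x01\x02\x03", b"\xAA\xBB"
    combined = join_firmware(program, data)
    print(split_firmware(combined, len(program)) == (program, data))
```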
Another fair warning, very few people have the firmware and there's no way to make this easy for them. After about seven years of trying, I gave in and restored the DSP HLE code for when firmware wasn't available, but that's about 600KiB of code and doesn't emulate the DSP-3, ST-011, or ST-018. It also lacks timing, so every DSP operation completes instantly, meaning games run a bit too fast.
EDIT: removed formats talk. I shouldn't have brought it up in this thread.
byuu wrote:
Awesome work! Super Bases Loaded 2 uses a different memory map than the other DSP games.
That was exactly it, thank you!
And thank you for your previous work on this - would have taken me an exponentially longer amount of time to figure this out if all I had was the data sheet & nocash' docs without any actual implementation to reference.
Just got the ST01X versions working as well (wasted 2 hours on the Shougi game because I accidentally gave the chip 2kb of RAM instead of 4kb...).
Adding the bios at the end of the file seems pretty reasonable - it also sounded like nocash' snes emulator might support this based on his documentation:
Quote:
Ideally, the uPD77C25 ROM-Image should be appended at the end of the SNES ROM-Image. In practice, it's often not there, so there's no way to detect if the game uses this or that uPD77C25 ROM
RE: HLE emulation. I'm not really planning on supporting HLE emulation. Like you said, it's a huge amount of effort in terms of coding and far more error-prone, too. Plus HLE just doesn't mesh well with the debugger tools - kind of hard to support trace logging & debugging the execution when there is no real code running. And with movies (or even netplay), everything can easily fall apart unless you start requiring that a movie recorded with LLE needs the bios to be played back, etc. There are just far too many other things I need to get done to consider putting any time into HLE at this point - maybe in a few years! :p
Quote:
And thank you for your previous work on this - would have taken me an exponentially longer amount of time to figure this out if all I had was the data sheet & nocash' docs without any actual implementation to reference.
The hail mary fix that blew our damned minds at the time was that the uPD7725's left shift instruction shifts in ones, not zeroes! So 3<<2=15, not 12. I was tracking down missing tiles in Top Gear 3000, and after an obscene amount of debugging I started getting desperate when I thought to try that one.
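A quick Python sketch of the difference, using the 3<<2 example above. The helper name is made up, and the 16-bit width is an assumption for the example rather than something taken from this thread:

```python
def shl_ones(value, count, width=16):
    """Left shift that fills the vacated low bits with ones instead of zeroes.

    width=16 is an assumption for this sketch.
    """
    mask = (1 << width) - 1
    return ((value << count) | ((1 << count) - 1)) & mask


if __name__ == "__main__":
    print(3 << 2)          # ordinary shift: zeroes shifted in
    print(shl_ones(3, 2))  # ones shifted in
```

With an ordinary shift 3<<2 is 12; with ones shifted in it becomes 15, matching the example in the post.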
Quote:
Just got the ST01X versions working as well (wasted 2 hours on the Shougi game because I accidentally gave the chip 2kb of RAM instead of 4kb...).
Tangent: there's a Japanese supermarket near me that plays the same song you hear on the Shougi game title screens over their loudspeaker on a loop. Some kind of public domain song I guess. I have that song burned into my mind, so it's amusing when I go there.
...
Okay, moving on ... the HG51B169 in Mega Man X2/X3 is also really quite simple. It gets a little tricky with the program ROM caching that affects the timing in the intro sequence, but it's not much worse than the uPD96050.
The ARM6 in Hayazashi Nidan Morita Shougi 2 ... honestly ... just don't support that, unless you're a masochist :P
I've never found anyone that actually tried to play it. And when I implemented it, it led to me making a GBA emulator since I had it, and now here I am 24 emulators and counting later wondering what the heck happened to my life :P
But, if you really want it ... you're free to use my core, or to write your own, just ... fair warning.
...
Oh, credits! I'd like to credit Talarubi, AWJ, Lord Nightmare for their help with uPD7725/96050 emulation. segher, Overload, Jonas Quinn for their help with HG51B169 emulation. Talarubi for hir help with ARM6 emulation. I couldn't have done it without those folks. Apologies if I forgot anyone! ^-^;;
EDIT: removed formats talk. I shouldn't have brought it up in this thread.
byuu wrote:
If we can convince No-Intro to adopt this format now that apparently three emulators and counting do, I'll remove my HLE code again.
I signed up for the No-Intro forums to politely argue for and request this. The SNES datfile now has separate "combined" and "split" variants, so it's technically possible, but the only game that actually *has* combined and split variants is... PowerFest '94. And it doesn't include the DSP firmware, it just combines (or splits) the control ROM and the three game ROMs.
Maybe another round of polite requesting is in order.
Can DSP firmware be nondestructively dumped through the Game Pak's edge connector? If so, Paul at Infinite NES Lives might be our best hope in making LLE DSP the default by having updated INL-Retro software dump it.
byuu wrote:
It's also easy to figure out what firmware you have [...]
It's also easy to figure out if the ROM has a copier header, and yet we mostly have "cleaned" files now...
I'd like appended firmware much more if there was something else in the file that describes the file layout. For example a directory footer. Just going by file size isn't enough imo.
Cartridge type in the internal header (at $00FFD6 in S-CPU address space, 0x7FD6 in LoROM image, 0xFFD6 in HiROM image, or 0x40FFD6 in ExHiROM image) might help determine the file layout. Value $03 means DSP, and $05 means DSP and battery RAM. Other values denote GSU, OBC1, SA1, S-DD1, or LR35902+ICD2 (SGB chipset). Values $F0 and above mean the value at $00FFBF chooses which third-party coprocessor is used: $01 for ST010/011, $02 for ST018, or $10 for CX4.
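A rough Python sketch of that check, using only the offsets and values listed above. It assumes a headerless image (no 512-byte copier header) and that the LoROM/HiROM/ExHiROM layout has already been detected some other way; the function and table names are made up, and the other cartridge type values are deliberately not decoded.

```python
# File offsets of the cartridge type byte ($00FFD6) for each image layout.
CART_TYPE_OFFSET = {"lorom": 0x7FD6, "hirom": 0xFFD6, "exhirom": 0x40FFD6}
# $00FFBF sits 0x17 bytes before $00FFD6 in the same header.
SUBTYPE_DELTA = 0xD6 - 0xBF
THIRD_PARTY = {0x01: "ST010/011", 0x02: "ST018", 0x10: "CX4"}


def coprocessor_hint(rom, layout):
    """Return a rough coprocessor name based on the internal header.

    Assumes a headerless ROM image and a known layout (hypothetical helper).
    """
    type_offset = CART_TYPE_OFFSET[layout]
    cart_type = rom[type_offset]
    if cart_type in (0x03, 0x05):
        return "DSP"
    if cart_type >= 0xF0:
        return THIRD_PARTY.get(rom[type_offset - SUBTYPE_DELTA], "unknown")
    return "other"  # GSU, OBC1, SA1, etc. are not decoded in this sketch


if __name__ == "__main__":
    rom = bytearray(0x8000)  # blank 32 KiB LoROM-sized image for the demo
    rom[0x7FD6] = 0x05
    print(coprocessor_hint(rom, "lorom"))
```

An emulator could use a hint like this to decide whether it's worth looking for firmware appended to the end of the file before falling back to standalone firmware files.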
Shifting in 1s is definitely on the weird side - really wonder why the data sheet makes no mention of it, too. It sounds like something that people programming for that chip would need to know, heh. But then, data sheets are also always vague about when flags are set and all that, too.
byuu wrote:
Tangent: there's a Japanese supermarket near me that plays the same song you hear on the Shougi game title screens over their loudspeaker on a loop.
Listening to it again, it is a very typically "traditional" Japanese-sounding soundtrack - I don't think I recall ever hearing it in Japan myself, though. (Mind you I've only entered a supermarket there a couple of dozen times at best :p)
RE: putting the bios at the end of the rom files - I'd agree with tepples here, using the bytes in the SNES "header" that determine which chip is needed should be enough to assume that the bios might be present at the end of the file and check for it before falling back on standalone bios files (and HLE after that if it's supported).
Quote:
Well, thank you for that. And in that case, please accept my apologies that I now do.
No need to apologize at all, and feel free to keep HLE support in if you want, too! I agree having the no-intro hashes include the bios in the file would probably be the simplest way for everybody.
Good to know the chip for X2/X3 is relatively simple, too. The Super FX, SA-1 and X2/X3 chips are essentially the ones I want to get done next, since they're also the ones that are used in the more popular games. I might try to tie up some loose ends and release a 0.2.0 version before I try to get started on those, though, since they'll probably take a while to get working.
The ARM6 chip is probably out of scope for me - at that point it's almost like putting a raspberry pi in a NES cart and then asking emulator authors to emulate the chip + its OS. :p
Like you said, I doubt anybody actually wants to play that particular game (and if they do, they can just use higan). Someone in the future will inevitably say "Mesen-S sucks, it doesn't work with X game that works on Y emulator", but after 290+ NES mappers, I've gotten used to it!
Millions of people have downloaded ps1 and ps2 firmwares and placed them in the emulator's firmware dir. It ain't hard and it keeps the data distinct. Blobbing stuff together just makes it harder to isolate checksum and size info, it also lowers awareness that it's even there.
I mean, just take this simple scenario. I dump my SMK cart. Its checksum is xxxxxxx. I compare it to the existing dump in no intro. Turns out the csum is different. Do I have a new dump? Bad dump? Did I forget to remove the copier header? Suddenly something that should be simple even for the layperson becomes a chore involving research and file splitting. Just to verify the part of a file that's actually dumpable.
Yes, heuristics can get commercial games working, and adding/detecting firmware is possible. But internal headers don't have to contain valid data (e.g. prototypes) except for the vectors, and firmware can't be stacked indefinitely. These solutions needlessly (imo) put a lid on future extensibility.
FitzRoy wrote:
Millions of people have downloaded ps1 and ps2 firmwares and placed them in the emulator's firmware dir. It ain't hard and it keeps the data distinct. Blobbing stuff together just makes it harder to isolate checksum and size info, it also lowers awareness that it's even there.
I mean, just take this simple scenario. I dump my SMK cart. Its checksum is xxxxxxx. I compare it to the existing dump in no intro. Turns out the csum is different. Do I have a new dump? Bad dump? Did I forget to remove the copier header? Suddenly something that should be simple even for the layperson becomes a chore involving research and file splitting. Just to verify the part of a file that's actually dumpable.
This, a thousand times over. What is it with certain people and file "formats"/screwing with files? Stop it already.
Other emulators -- example: Mesen (not Mesen-S, but could certainly be added to Mesen-S) -- already have the equivalent of databases of MD5/SHA1 hashes of ROMs for identification. There's your identification model, re: heuristics. For these titles, require an external BIOS/FW/whatever file, selectable somewhere else in the UI, or even just a compile-time hard-coded filename that must exist in the same directory as the ROM (I don't care how it's done) that correlates with said game. Famicom Disk System/FDS BIOS is a good example: when loading an FDS file, Mesen prompts you to the path of the FDS BIOS filename.
The number of games requiring this on the SNES has got to be astoundingly small -- I am guessing in the extremely low 2-digit range. These games are thus the exception to the rule, not the rule itself. There were 1758 official SNES/SFC games released during its lifetime; so 20 games requiring this is literally 1.1% of all the games released for the system. It's not worth it. Just think about it.
As usual, the mere mention of file formats ends up with everybody giving their (incompatible) personal opinions on the matter. No surprise here :p
I've never overly cared too much about file formats (and am not interested in the endless arguments about them), I'll support things if they exist and are helpful for the end user, and won't if they're not, and that's about it.
That being said, let's not end up with another page of file format discussion like what already happened at the beginning of the thread - this is not what this thread is for. Thanks!
Quote:
The Super FX, SA-1 and X2/X3 chips are essentially the ones I want to get done next
I'll just warn you that you're going to become very cynical and jaded when people here complain about the difficulty of the MMC5 once you implement those two. Especially the SA-1. Nintendo threw the kitchen sink at the SA-1, and then games barely used it. There's a good half-dozen major features I try to emulate, but not a single commercial game uses them, so my implementations are likely wrong (eg HVIRQs ... how does it keep track of the H/V counters without H/Vblank pins?)
Just released version 0.2.0
This version is mostly accuracy/timing fixes and a lot of debugger tool improvements (+ DSP support).
Many thanks to everybody in this thread for helping out! I think I've had more help fixing emulation bugs on Mesen-S in 3 months than I have on Mesen in 3 years :p
Next version will probably be focused on adding support for the missing chips and peripherals (e.g multitap, super scope). Might also try to get around to adding options like removing the sprite limit and maybe try to implement the same kind of CPU overclocking that NES emulators use and see how that works out.
byuu wrote:
Especially the SA-1. Nintendo threw the kitchen sink at the SA-1, and then games barely used it. There's a good half-dozen major features I try to emulate, but not a single commercial game uses them,
That's actually good to know - thanks! This makes it sound like the most painful part of the SA-1 is integrating it with the rest of the system in terms of bus accesses, etc.? Since the CPU core is essentially identical, I might try to do the SA-1 first (rather than the Super FX) - that way I can focus on figuring out how to properly integrate it with the rest of the code & managing bus accesses, without having to worry about whether or not my problem is caused by a CPU bug or something else.
But for now, I'm taking a step back from this for a few days - I need a break!
Quote:
Just released version 0.2.0
This version is mostly accuracy/timing fixes and a lot of debugger tool improvements (+ DSP support).
Congratulations! Looks great :D
Quote:
Many thanks to everybody in this thread for helping out! I think I've had more help fixing emulation bugs on Mesen-S in 3 months than I have on Mesen in 3 years :p
It's seriously long overdue that we had a fresh face in SNES emulation. I hope you'll stick around long-term and that we'll find lots of new hardware discoveries out together ^-^
The more we figure out about the hardware, and the more quality open-source implementations we have, the easier it'll get for more new emudevs to enter SNES emulation and quickly get up to speed, as is the case with the NES scene currently.
I dream of the day we have a drop-in PPU core similar to blargg's DSP that has the accuracy we have in NES emulators. I'll seriously consider the SNES well-preserved and be able to retire feeling content once we reach that point.
Quote:
and maybe try to implement the same kind of CPU overclocking that NES emulators use and see how that works out.
I'm interested in this as well. The obvious trick of adding more Vblank scanlines would bias the timing (frame rate) and thus require us to clock the CPU faster to compensate, which may interfere with timed raster effects as with Air Strike Patrol and lots of buggy games that overshoot blanking periods. Only making the CPU faster during Vblank lines is likely the best strategy, right?
Quote:
This makes it sound like the most painful part of the SA-1 is integrating it with the rest of the system in terms of bus accesses, etc.?
The most painful for performance, yeah. Snes9X came up with an approximation that's not too bad if you want to focus on performance. But if you're going for all-out accuracy (^_^) then a different approach will be needed.
This is the gold standard test ROM for it:
https://github.com/VitorVilela7/SnesSpeedTest
The design of the SA1 is ingenious and evil: the CPU cannot be stalled because the SNES CPU has no concept of external wait states (/DTACK on the Genesis, for instance). So instead, the SA1 detects when the SA1 CPU tries to access ROM, BWRAM, or IRAM while the SNES CPU is accessing it, and will insert wait states into the SA1 CPU. The obvious million dollar question is, what happens if the SA1 is in the middle of reading from one of those when the CPU comes in to try? As far as I can tell, the answer is it just lets it finish doing the read and somehow there's enough headroom to let everything still work.
The way I do it is for every CPU read/write to set a "MAR" (memory address register) variable to hold the current state of the address bus pins. This also has to be done for DMA/HDMA accesses. It took a whole lot of trial and error to get the best results in Vitor's test ROM, but I still don't have it perfect.
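To make the "MAR" idea concrete, here is a minimal sketch, not higan's actual code. BusState, Classify and Sa1MustWait are hypothetical names, and the address classification is a simplified approximation of the SA-1 memory map that treats both CPUs as sharing the S-CPU's view of it.

```cpp
#include <cstdint>

enum class Region { None, Rom, Bwram, Iram };

// Last address driven on the S-CPU side (CPU, DMA or HDMA access).
struct BusState {
    std::uint32_t cpuMar;
};

// Simplified, assumed mapping of a 24-bit address to a shared memory region.
inline Region Classify(std::uint32_t addr) {
    const std::uint8_t bank = static_cast<std::uint8_t>(addr >> 16);
    const std::uint16_t offset = static_cast<std::uint16_t>(addr & 0xFFFF);
    if (bank >= 0xC0) return Region::Rom;
    if (bank >= 0x40 && bank <= 0x4F) return Region::Bwram;
    const bool systemBank = (bank <= 0x3F) || (bank >= 0x80 && bank <= 0xBF);
    if (systemBank) {
        if (offset >= 0x8000) return Region::Rom;
        if (offset >= 0x6000) return Region::Bwram;
        if (offset >= 0x3000 && offset <= 0x37FF) return Region::Iram;
    }
    return Region::None;
}

// True when the SA-1 CPU should take a wait state for this access, i.e. the
// S-CPU side is currently addressing the same shared region.
inline bool Sa1MustWait(const BusState& bus, std::uint32_t sa1Addr) {
    const Region target = Classify(sa1Addr);
    return target != Region::None && Classify(bus.cpuMar) == target;
}
```

The S-CPU core would update cpuMar on every read, write and DMA/HDMA access, and the SA-1 core would call Sa1MustWait before each of its own accesses.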
Key findings:
https://github.com/byuu/higan/blob/mast ... ry.cpp#L11
It doesn't seem? to invalidate the address bus pins on idle cycles (but much more likely, the reason is that idle cycles tend to set the address bus to the current program counter, which is still going to stall out the SA1 because it doesn't seem to recognize it's an idle cycle.)
https://github.com/byuu/higan/blob/mast ... ng.cpp#L30
This one's going to suck to emulate. Past logic analyzer traces done on SNES refresh show it actually looks like 5-cycle read + 3-cycle idle, repeated five times for 40 cycles total. Just having that pulse run for 40 clocks gave me less precise timing matches than breaking it into five sections. I rounded to 6-2 x5 because everything else steps by 2, and even forcing 5-3 at a good speed hit (have to step the IRQ counter at 21.4MHz instead of 10.7MHz) didn't improve the results any. It turns out that the SA1 carts connect the DRAM refresh pin to the SA1 CPU, and it really does affect the timing.
https://github.com/byuu/bsnes/blob/mast ... rom.cpp#L2
To avoid destroying performance completely, I added a setting to bypass the memory address stall and synchronizations on ROM, BWRAM, IRAM accesses. This of course is not accurate and fails Vitor's test, but it also kind of acts like an SA1 overclock so ... call it a feature! ^-^;;
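The "6-2 x5" refresh split described above can be sketched as a small predicate. This is only an illustration of the pattern, not the linked code; the constant and function names are hypothetical, and the clock argument is the number of master clocks since the refresh started.

```cpp
// Assumed pattern: five segments of 6 busy clocks followed by 2 idle clocks.
constexpr int kRefreshSegments = 5;
constexpr int kBusyClocks = 6;
constexpr int kIdleClocks = 2;
constexpr int kRefreshLength = kRefreshSegments * (kBusyClocks + kIdleClocks); // 40

// True while the refresh is holding the bus at the given offset.
inline bool RefreshHoldsBus(int clockIntoRefresh) {
    if (clockIntoRefresh < 0 || clockIntoRefresh >= kRefreshLength) return false;
    return (clockIntoRefresh % (kBusyClocks + kIdleClocks)) < kBusyClocks;
}

// Total busy clocks over one refresh (30 with the 6-2 split).
inline int CountBusyClocks() {
    int busy = 0;
    for (int clock = 0; clock < kRefreshLength; ++clock) {
        if (RefreshHoldsBus(clock)) ++busy;
    }
    return busy;
}
```

Switching to the 5-3 split seen in the logic analyzer traces would only mean changing the two constants, at the cost of stepping in single master clocks.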
Once you've implemented this let me know, and then I'll dive into the nitty gritty stuff I haven't fully been able to figure out yet :3
Quote:
I might try to do the SA-1 first (rather than the Super FX)
I went the same route. It's a good strategy even if the SA1 ends up being more complex as a whole.
Quote:
But for now, I'm taking a step back from this for a few days - I need a break!
It's well earned. No rush, see you when you're back.
...
Side note, I just fixed my own bug in Donkey Kong Country 2, it appears the game is messing with sprite registers mid-scanline? Didn't dig too deeply, but I had to revert my tile/item caching so that one line renders from the previous line's cache. Might be worth taking a look with Mesen-S just to make sure the sprite timing changes didn't affect it.
Thanks!
byuu wrote:
Only making the CPU faster during Vblank lines is likely the best strategy, right?
On the NES, the overclocking used is to add scanlines after the end of the picture but BEFORE vblank/nmi.
This means that games have as much time as normal during NMI/vblank, meaning timed code at the top of the screen, for example, will still occur at the correct timing. During those scanlines, the APU is suspended so from the APU's point of view, nothing changes.
This gives the game more time to finish up calculations before NMI occurs, and the majority of games appear to be compatible with this on the NES. Some people have done a lot of testing with as many as 1000 extra scanlines before NMI with no adverse effect on a lot of games - 1000 extra lines is usually enough to remove all traces of slowdown in all games.
I imagine doing the same on the SNES (e.g adding scanlines between scanlines 224 & 225 and pausing the SPC during them) would probably work out relatively well in most scenarios - but games with really tight SPC timings might potentially end up locking up?
You know, the more I think about it, the more I think it's actually pretty easy to implement - I'll give it a try soon and see how that works out :p
Quote:
Thanks for the SA1 info and the link! Hopefully it'll be useful to test basic SA1 behavior at the start when all the games refuse to boot. I think I have a pretty decent idea where to start, just need to figure out how I want to handle having 2 different CPUs with 2 different memory mappings (in terms of code design).
One question though, does bsnes run the SA1 1 cycle at a time like it does for the SPC? I'm guessing it does? But in that case, my next question is, does anything break if you don't? e.g if I just run a full SA1 instruction and then wait X master cycles before running the next, should I expect games to break?
RE: DKC2, it seems to be working properly on my end, as far as I can tell - any specific part of the game that had issues? I took a look in the event viewer, and it doesn't seem like it's writing to the OAM registers outside of vblank, as far as I can tell.
Attachment:
dkc2.png
The HDMA seems to be mostly to adjust BG2's scroll offsets and the only OAM writes I could find are part of the PPU writes at the bottom. I might just be looking at the wrong part of the game, though?
Sour wrote:
I'll give it a try soon and see how that works out
So, did a quick implementation of overclocking as I was describing it. So far it seems to be working pretty well - depending on the game, it looks like some games prefer having the extra scanlines before NMI and others at the end of regular vblank (i.e after NMI).
I've mostly tested with 1000 extra scanlines (to be a little extreme in the timing changes) and it seems to fix the slowdowns in Gradius 3, Super Ghouls & Ghosts and Super R-Type. The emulator only runs at around 100fps (with some frame skipping) with that much overclocking, though.
Some games will lock up if putting the lines before NMI, others will lock up if putting the lines after NMI, but it seems like the majority of games actually run fine in at least one of the 2 modes. Obviously, CPU "timed" code like the triforce animation in Zelda: LTTP is affected by it. Putting 1000 scanlines after NMI in that case makes it go super fast and kinda screws up the last bit of the animation, too. 1000 lines before NMI speeds it up a little bit, but beyond that no difference. A.S.P and even games that have been picky on my SPC timings so far (e.g Tales of Phantasia, Rendering Ranger R2, Illusion of Gaia) seem to work with 1000 extra scanlines, too. Some do break, though - Super Metroid stops booting at ~440 lines after NMI (and less than that when before NMI).
Overall, though, this was pretty easy to implement and doesn't really add any overhead to regular emulation. The SNES itself can already change when the NMI occurs due to overscan mode, so having a couple more settings on top of this and recalculating the values once per frame doesn't change much.
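For anyone curious what "recalculating the values once per frame" could look like, here is a minimal sketch. It is not Mesen-S's actual code: the struct and function names are hypothetical, and it only assumes the usual 262 (NTSC) / 312 (PAL) scanlines per frame with NMI at scanline 225, or 240 in overscan mode.

```cpp
// Hypothetical settings: extra scanlines inserted on either side of NMI.
struct OverclockConfig {
    int extraLinesBeforeNmi;
    int extraLinesAfterNmi;
};

struct FrameTiming {
    int nmiScanline;    // scanline on which vblank/NMI starts
    int totalScanlines; // scanlines per frame, including the extra ones
};

inline FrameTiming ComputeFrameTiming(bool overscan, bool pal, const OverclockConfig& oc) {
    const int baseNmi = overscan ? 240 : 225;
    const int baseTotal = pal ? 312 : 262;
    FrameTiming timing;
    // Lines added before NMI push vblank later; lines added after NMI only
    // lengthen the frame.
    timing.nmiScanline = baseNmi + oc.extraLinesBeforeNmi;
    timing.totalScanlines = baseTotal + oc.extraLinesBeforeNmi + oc.extraLinesAfterNmi;
    return timing;
}

// True for the inserted lines between the end of the picture and NMI, which
// is where an emulator might choose to pause the SPC.
inline bool IsExtraLineBeforeNmi(int scanline, bool overscan, const OverclockConfig& oc) {
    const int baseNmi = overscan ? 240 : 225;
    return scanline >= baseNmi && scanline < baseNmi + oc.extraLinesBeforeNmi;
}
```

With 1000 extra lines before NMI on NTSC, this gives NMI on scanline 1225 of a 1262-line frame; with the same 1000 lines after NMI, NMI stays on 225.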
If anybody wants to play with it, it's in Options->Emulation->Overclocking.
Quote:
I imagine doing the same on the SNES (e.g adding scanlines between scanlines 224 & 225 and pausing the SPC during them) would probably work out relatively well in most scenarios - but games with really tight SPC timings might potentially end up locking up?
It would break streaming audio games like Tales of Phantasia, Breath of Fire II (d4s), etc. Not a large portion of the library but still unfortunate. Well in any case, all we can do is implement things, have people try it out, and add refinements.
Quote:
One question though, does bsnes run the SA1 1 cycle at a time like it does for the SPC? I'm guessing it does? But in that case, my next question is, does anything break if you don't? e.g if I just run a full SA1 instruction and then wait X master cycles before running the next, should I expect games to break?
With libco (my cooperative threading library), I can switch from one CPU to another at any point within any function at any call stack level. So if you turn on that "fast sync" hack I mentioned, the SA1 can execute many instructions before the SNES CPU starts running again, and vice versa.
The thing that hurts libco with the SA1 is that both the SNES CPU and SA1 can simultaneously access BWRAM and IRAM, which are of course volatile, and ROM can be dynamically remapped. So in effect, for perfect synchronization you would have to synchronize to the other component every time ROM, BWRAM, IRAM, and I/O registers were accessed, which is almost every cycle.
So essentially, yes, bsnes' SA1 core is cycle accurate. But Snes9X's SNES CPU and SA1 cores are both opcode-based, and to my knowledge it does not break any games. You won't be able to perfectly pass Vitor's speed tests, but you may consider it a sacrifice worth making for performance and sanity. If not, you will need to have a cycle-granularity state machine, at least for your SA1 CPU core. And truth be told, I'm not entirely certain you can completely pass his test with just cycle-based granularity, I guess we'd find out though ^-^; (you -should- be able to ...)
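The opcode-based option (run a whole SA1 instruction, then wait until the master clock catches up) could look roughly like this. It is a sketch under assumptions, not Snes9X's or bsnes' code: the class name is made up, the callback stands in for a real SA1 core, and it assumes 2 master clocks per SA1 cycle (10.7MHz against a 21.4MHz master clock).

```cpp
#include <cstdint>
#include <functional>
#include <utility>

class OpcodeGranularSa1 {
public:
    // execInstruction runs one whole SA1 instruction and returns its length in
    // SA1 cycles (assumed to be greater than zero).
    explicit OpcodeGranularSa1(std::function<int()> execInstruction)
        : _execInstruction(std::move(execInstruction)) {}

    // Run whole instructions until the SA1 clock reaches the master clock.
    // The SA1 can overshoot by up to one instruction, which is the accuracy
    // trade-off of this approach.
    void CatchUp(std::uint64_t masterClock) {
        while (_sa1Clock < masterClock) {
            _sa1Clock += static_cast<std::uint64_t>(_execInstruction()) * kMasterClocksPerSa1Cycle;
            ++_instructionsRun;
        }
    }

    std::uint64_t Sa1Clock() const { return _sa1Clock; }
    std::uint64_t InstructionsRun() const { return _instructionsRun; }

private:
    static constexpr std::uint64_t kMasterClocksPerSa1Cycle = 2;
    std::function<int()> _execInstruction;
    std::uint64_t _sa1Clock = 0;
    std::uint64_t _instructionsRun = 0;
};
```

The S-CPU side would call CatchUp with its current master clock before touching anything the SA1 can also see (ROM, BWRAM, IRAM, I/O registers).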
What you're going to find out and hate about the SA1 is, in spite of all this power and all of these cool functions, games barely use the chip. It feels more like they put it in there for anti-piracy than any other reason, and it's rather disappointing to emulate.
Quote:
any specific part of the game that had issues?
Every level had flickering sprites, hahah. It looks good, so then there's no issue with your sprite implementation, perfect! :D
Quote:
it doesn't seem like it's writing to the OAM registers outside of vblank, as far as I can tell.
... now that is very weird to hear. I guess I should have researched more instead of reverting my one-line item/tile cache design.
I'll look into it more when I have time and then talk about it again later. Thanks for verifying!
Quote:
Putting 1000 scanlines after NMI in that case makes it go super fast and kinda screws up the last bit of the animation, too.
Call that one ZSNES mode? ^__^
Anyway, sounds good! I'll give it some time to see how it goes for your users before implementing it :3
Sour wrote:
Note: The emulator currently requires the 64-byte SPC bios to be put in Mesen's data folder (i.e the one you picked on startup) and named "spc700.rom". If the file cannot be found, the UI should print an error message about it when trying to load a game.
Where is this "data" folder located? There were no subfolders in the main directory. Just the .exe file
Hey Sour, when you're not busy, can we talk sprite evaluation for a bit? ^-^;
https://github.com/byuu/higan/blob/mast ... u/main.cpp
https://github.com/byuu/higan/blob/mast ... ct.cpp#L34
https://github.com/byuu/higan/blob/mast ... ct.cpp#L94
I implemented the H=0-255 sprite evaluation, easy enough. Had to come up with a rather hacky way of representing the main emulator loop to avoid destroying performance, but it works.
But let's talk about this:
Quote:
H=270-337: OAM reading to determine which VRAM addresses need to be fetched (starting from the last sprite index stored during evaluation)
Even cycles: Read X+Y (word)
Odd cycles: Read 2nd word (priority, palette, etc), determine the VRAM address to load on the next cycle. If all tiles for this sprite have been processed (e.g based on its width), move on to the next sprite index in the sprite evaluation table.
H=272-339: VRAM fetches for the sprites
Even cycles: Read first word of sprite tile, based on the address calculated on the previous cycle
Odd cycles: Read the 2nd word of the sprite tile
If H=270-337 reads from actual OAM, then it would have to update the OAM address, which would alter where writes went during Hblank. Doing so breaks Uniracers.
If all of H=270-339 looks up addresses to fetch VRAM from, then we will break Mega lo Mania's introduction and half of the scanline at the top of the screen and half of the scanline in the middle of the screen will be rendered incorrectly. The game is changing OBSEL right around H=300. If we try caching OBSEL at H=270, then the entire first and middle scanlines get rendered incorrectly.
I have some theories on both, but I'd like to hear your thoughts first, and also to hear if Mega lo Mania looks correct with your current renderer, and if so ... how ^-^;;
SusiKette wrote:
Where is this "data" folder located? There were no subfolders in the main directory. Just the .exe file
It's the folder that gets opened when you click the "Open Mesen-S folder" button in Preferences, at the bottom left.
The SPC bios is no longer required as of 0.2.0, though. And for DSP games, the UI will prompt for the file if it's missing and store it in the Firmware subfolder.
byuu wrote:
If H=270-337 reads from actual OAM, then it would have to update the OAM address, which would alter where writes went during Hblank. Doing so breaks Uniracers.
Based on paulb_nl's posts and the FPGA implementation, the "current" OAM address is (roughly) this:
Code:
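if(_forcedVblank || _scanline >= _vblankStartScanline) {
    //Forced blank or vblank: use the address written to the registers
    return _internalOamAddress;
} else {
    if(_memoryManager->GetHClock() <= 255 * 4) {
        //Sprite evaluation is still running
        return _oamEvaluationIndex << 2;
    } else {
        //Sprite rendering/fetching portion of the scanline
        return _oamTimeIndex << 2;
    }
}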
_internalOamAddress: the OAM address written to the registers
_oamEvaluationIndex: The sprite index currently being evaluated by OAM evaluation (0-255)
_oamTimeIndex: The sprite index being read by the "sprite rendering" portion (270-337). Note that the value is apparently used starting right after sprite evaluation. The value stops at the last OAM entry processed for the scanline (in reverse order from their order in OAM) - if a scanline has no sprites based on sprite evaluation, then this value is retained from one scanline to another, until another scanline with sprites occurs.
So in Uniracers' case, the game writes to OAM on a scanline that has no visible sprites - this means it will write to the address of the last sprite processed on the last scanline that contained sprites (this makes it work). It's supposed to write to BOTH the "low" table of OAM and the "high" table of OAM for the current sprite at once (most likely because the PPU is accessing both addresses on the same cycle). Though I think it still respects the latch for the low table, meaning it would only write to it every other write. Unsure what happens in terms of the OAM address set by the register (e.g: does it get incremented by a write in this scenario?)
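The retention behavior that makes Uniracers work can be sketched in isolation. This only models my description above, not verified hardware behavior; OamFetchState, FinishScanline and HblankOamWriteAddress are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

struct OamFetchState {
    std::uint8_t timeIndex; // plays the role of _oamTimeIndex
};

// spritesInEvalOrder: indexes of the sprites found on this scanline, in the
// order sprite evaluation found them. They are processed in reverse order, so
// the last one processed is the first one found.
inline void FinishScanline(OamFetchState& state, const std::vector<std::uint8_t>& spritesInEvalOrder) {
    if (!spritesInEvalOrder.empty()) {
        state.timeIndex = spritesInEvalOrder.front();
    }
    // With no sprites on the line, the previous value is retained.
}

// OAM address a write would land on after sprite evaluation on a visible line.
inline std::uint16_t HblankOamWriteAddress(const OamFetchState& state) {
    return static_cast<std::uint16_t>(state.timeIndex << 2);
}
```

So a write on a sprite-less scanline ends up at the address of the last sprite processed on the most recent scanline that did have sprites.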
For Mega Lo Mania, the OAM select bit isn't cached, I'm reading it on every odd cycle from 270-337 to calculate the VRAM address. It works because there are a few transparent sprites before the non-transparent ones:
Attachment:
megalo.png
At scanline 128, the PPU will start by loading sprites 42 & 43, which are 8 tiles wide, which will take 16 cycles. (270-285). It also looks like the first 1-2 "tiles" of sprite 41 are fully transparent on that line. So, by the time it processes the address for the first tile with opaque pixels, it's at least on cycle 287, and most likely closer to 289-291. Since the register write occurs at cycle 283 at the latest, there are no missing pixels on the line. In reality, the first few tiles (on the right side) of the scanline are most likely rendering using the first "bank" of tiles and it switches to the next bank after the write occurs - since the pixels on both banks are all transparent, it has no visual impact.
Keep in mind this is just my understanding of it after spending a day re-reading forum posts and reading through the FPGA implementation. I most likely have some details wrong (and some stuff is still unknown), but at the very least it's enough to make all (known) affected games (mega lo mania, aladdin, uniracers) behave correctly.
Hope that helps!
Thanks for the help!
Quote:
_oamTimeIndex: The sprite index being read by the "sprite rendering" portion (270-337).
Quote:
It's supposed to write to BOTH the "low" table of OAM and the "high" table of OAM for the current sprite at once
Oh, from the notes it sounded like it was writing to the low table only. Okay then.
Quote:
It works because there are a few transparent sprites before the non-transparent ones:
Oh, I get it. It doesn't take 34*4 hclock cycles to do the tilefetch, it takes 34*8 hclock cycles.
Before:
Code:
43 @1080 [270]
42 @1096 [274]
41 @1112 [278]
40 @1128 [282]
OBSEL write at 128,1136[284]
39 @1144 [286]
38 @1160 [290]
37 @1176 [294]
36 @1192 [298]
Sprites 43-40 end up using the old OBSEL value, and sprites 39-36 end up using the new OBSEL value.
After:
Code:
43 @1080 [270]
42 @1112 [278]
OBSEL write at 128,1140[285]
41 @1144 [286]
40 @1176 [294]
39 @1208 [302]
38 @1240 [310]
37 @1272 [318]
36 @1304 [326]
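The two listings boil down to a bit of arithmetic on the fetch slots. A small sketch, with made-up function names, assuming the first fetch starts at hclock 1080 and that a fetch starting at or after the OBSEL write sees the new value (that last part is an assumption, not something verified on hardware):

```cpp
constexpr int kFirstFetchHclock = 1080;

// Start of the Nth sprite fetch slot (slot 0 is sprite 43 in the listings).
inline int SlotStartHclock(int slot, int hclocksPerSlot) {
    return kFirstFetchHclock + slot * hclocksPerSlot;
}

// First slot whose fetch starts at or after the OBSEL write.
inline int FirstSlotUsingNewObsel(int writeHclock, int hclocksPerSlot) {
    if (writeHclock <= kFirstFetchHclock) return 0;
    const int delta = writeHclock - kFirstFetchHclock;
    return (delta + hclocksPerSlot - 1) / hclocksPerSlot; // round up
}
```

With 16 hclocks per slot and the write at 1136, the first slot using the new value is slot 4 (sprite 39); with 32 hclocks per slot and the write at 1140, it is slot 2 (sprite 41), which matches the two listings.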
We should probably use Hclocks (0-1363) instead of Hdots (0-340) when the values are known, if that's okay ^-^;
Code:
if(_forcedVblank || _scanline >= _vblankStartScanline) {
    return _internalOamAddress;
} else {
    if(_memoryManager->GetHClock() <= 255 * 4) {
        return _oamEvaluationIndex << 2;
    } else {
        return _oamTimeIndex << 2;
    }
}
So then there's not a way to write to the high two bytes of the lower OAM table with this, huh?
(A0 would come from the latch behavior if $2104 writes, but A1 isn't being kept with <<2)
Also, if writes go to both low and high tables, does that mean reads become an OR/AND bus conflict result of low and high tables? ^-^
Actually it doesn't seem that the write in Uniracers goes to both tables, or that messes up the sprites. Maybe it's based on A9 from OAMADDR?
byuu wrote:
We should probably use Hclocks (0-1363) instead of Hdots (0-340) when the values are known, if that's okay ^-^;
Sorry! Coming from the NES, I'm used to only seeing the PPU in terms of dots rather than master clocks :p
Though in this case, using hclocks can be tricky since you need to adjust for the long dots after 1292/1310, heh.
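For reference, the adjustment could be sketched like this, assuming a normal 1364-clock scanline where dots 323 and 327 take 6 master clocks instead of 4 (so they start at hclocks 1292 and 1310). The function names are made up, and the short/long scanline special cases are ignored here.

```cpp
// Hclock at which a dot (0-339) starts on a normal 1364-clock scanline.
inline int DotToHclock(int dot) {
    int hclock = dot * 4;
    if (dot > 323) hclock += 2; // dot 323 is 6 clocks long
    if (dot > 327) hclock += 2; // dot 327 is 6 clocks long
    return hclock;
}

// Dot (0-339) that contains the given hclock (0-1363).
inline int HclockToDot(int hclock) {
    if (hclock < 1292) return hclock / 4;                // dots 0-322
    if (hclock < 1298) return 323;                       // long dot
    if (hclock < 1310) return 324 + (hclock - 1298) / 4; // dots 324-326
    if (hclock < 1316) return 327;                       // long dot
    return 328 + (hclock - 1316) / 4;                    // dots 328-339
}
```

Everything before hclock 1292 is a plain divide by 4, so this only matters for the last few dots of the line.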
byuu wrote:
So then there's not a way to write to the high two bytes of the lower OAM table with this, huh?
I'm unsure - the OAM address might be (_oamTimeIndex << 2) + 2 on odd cycles, when the PPU is fetching the attribute/palette word.
Internally the PPU is loading both bytes in a single dot, so I'm unsure how the latch behavior for writes works here.
The way I have it implemented atm will actually never write to the low table (hadn't noticed until now) - but I'm fairly certain paulb_nl's posts/tests said the writes can go to both tables at once.
Unsure about the behavior on reads, too - but writing a test rom for this shouldn't be overly hard I think? Just need to fill the low table with something like $77 and the high table with $88, keep reading from the register at scanline ~119+ and see what values come out of it. (presumably would be $77, $88, $00 or $FF)
Well just for fun ...
So to display "correct behavior" on HblankEmuTest/NTSC, any amount of waiting during sprite tile fetching when the display is forced blanked does it:
https://github.com/byuu/higan/blob/mast ... ct.cpp#L98
If we don't do that, but we block tile reads when the display is disabled, then only the first row of text properly splits between incorrect/correct:
https://github.com/byuu/higan/blob/mast ... t.cpp#L149
If we don't do either, then it shows "incorrect behavior".
None of these match the screenshots shown thus far, ah well :P
Here's my current guess for OAM reads/writes during active display. Good enough for Uniracers, but probably very wrong.
https://github.com/byuu/higan/blob/mast ... io.cpp#L31
On the bright side, very easy to change it once someone makes a test ROM to verify correct behavior :D
Quote:
Though in this case, using hclocks can be tricky since you need to adjust for the long dots after 1292/1310, heh.
Oh god I hope those don't actually affect the PPU fetching patterns for sprite tiles.
...
Looking into it again, it would be nice to merge most of the redundant main/subscreen BG rendering:
https://github.com/byuu/higan/blob/mast ... d.cpp#L149
The reason I have it split up is because it hides the lores vs hires differences without code repetition.
...
Another sore point:
https://github.com/byuu/higan/blob/mast ... in.cpp#L24
This would speed up the PPU quite significantly in NTSC games if we could short-circuit at vcounter() >= vdisp(), but of course it's possible to toggle overscan on and off between scanlines 225 and 240 during active display, so if we do, ... stuff's gonna go wrong if we just skipped over the entire line.
Sour wrote:
The way I have it implemented atm will actually never write to the low table (hadn't noticed until now) - but I'm fairly certain paulb_nl's posts/tests said the writes can go to both tables at once.
During evaluation I was able to write to the X,Y position bytes in OAM by writing to $2104 multiple times in a row. I think it was about 16 writes in a row and only one sprite changed.
I don't know if you can write to low OAM after evaluation (PPU dot > 255).
Great work on Mesen-S Sour.
I have some GSU tests for when you want to do Super-FX here:
https://github.com/PeterLemon/SNES/tree/master/CHIP/GSU/GSUTest
These tests helped redguy make his FPGA implementation on the SD2SNES.
Good luck with everything =D
byuu wrote:
So to display "correct behavior" on HblankEmuTest/NTSC
Yea, I haven't quite figured that one out yet, it displays some pretty "corrupted" looking sprites on Mesen-S atm :p
But at the very least, emulating the cycle fetching/evaluation for bg & sprites, even if it's not 100% perfect is enough to make it harder for a homebrew dev or romhacker to do something that the SNES wouldn't allow, which is good. Will probably revisit this whole thing later on after I'm done with the enhancement chips and the like, for now I'm fairly satisfied with the PPU's timings. There are probably still a few IRQ/DMA/CPU-timing related things I need to fix, though.
krom wrote:
I have some GSU tests for when you want to do Super-FX here
Oh, that's awesome, I didn't realize these existed at all! This should make coding the super FX core way easier than I anticipated, thank you!
Sour wrote:
Oh, that's awesome, I didn't realize these existed at all! This should make coding the super FX core way easier than I anticipated, thank you!
Nice one, very glad to help out =D
Took a few days, but finally added SA-1 support (including a debugger window for it). There are still a few SA-1 features that aren't implemented, so 3 or so games will most likely still have some issues - beyond that it should be working relatively well.
Next up on the list will probably be Super FX support.
First off, thank you for making Mesen-S, and thank you all for this awesome thread which was a lot of fun to read even if I didn't understand it all.
I have found an issue where the "CPU Debugger" window is showing the incorrect state due to some stack fiddling by Dragon Quest 3.
Note that the code is different and the PC's are different. (And the JMP is taken if you continue stepping through)
Here's the stack fiddle
So you have a "COP #$4C" and then the value on the stack is decremented twice and when returning it now uses the #$4C as the opcode (JMP).
So the debugger display shows different code on the left and a different PC (CDC59D) than the trace logger (CDC59C).
I know you've mostly been working on accuracy fixes in this thread so is this where you want debugging talk as well? And how do you feel about feature requests? (Such as selective trace logging options like log on certain addresses or address ranges, or the trace mask option bsnes-plus has)
Thanks again!
Edit: Oh, and this is the Mesen-S 0.2.0 release.
Gave the overclocking a try, noticed a few things.
If you run the extra scanlines before / at the start of NMI, it doesn't really have much effect on speed. Most games seem to have all the heavy lifting that lags out in NMI. But, it doesn't hurt to just offer to add extra scanlines to both/either.
If you run the extra scanlines very shortly into NMI, it screws with things like input polling.
Best default spot seems to be on the last scanline of a frame.
If you don't advance time with coprocessors (eg the SA1), those games break almost instantly with any overclocks. This is a real pain for me because my scheduler has one time clock per thread, so either I advance the CPU on all of them, or on none of them, during the overclocked time. I guess I can revert to my older style of thread syncing where each CPU<>CPU relationship had its own separate counter. That turned out to not really work for a system like the Sega CD, but it's fine for the SNES ...
We should probably find a good 'max' overclock value. The SuperFX is fine with up to 800% overclocks, but the CPU definitely doesn't want such heroic increments. Users would probably try to max out any overclocking slider, so we probably want to make clear that hey, even a 30% overclock is pretty heroic for the SNES CPU, without limiting them to only a minor overclock.
ansarya wrote:
First off, thank you for making Mesen-S, and thank you all for this awesome thread which was a lot of fun to read even if I didn't understand it all.
I have found an issue where the "CPU Debugger" window is showing the incorrect state due to some stack fiddling by Dragon Quest 3.
You're welcome! Thanks for the bug report - reporting debugger issues here is fine, too :p
What's screwing up the debugger is that the game is essentially reusing the 2nd byte of the COP instruction for the JMP instruction (unsure why a SNES game would want to save a byte so much, but maybe the developers might not have known about COP being a 2-byte op?)
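To illustrate why a disassembler gets confused here, below is a minimal sketch of decoding the same bytes from two different starting offsets. The `Decoded` struct, the `decodeAt` helper and the JMP operand bytes are made up for the example; only the COP opcode ($02), its $4C operand and the JMP opcode ($4C) come from the report above. The stack manipulation itself is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical result type: disassembly text plus instruction length in bytes.
struct Decoded {
    std::string text;
    size_t length;
};

// Decodes only the two opcodes relevant to this example; anything else is
// shown as a raw data byte.
Decoded decodeAt(const std::vector<uint8_t>& mem, size_t offset) {
    assert(offset < mem.size());
    char buf[32];
    uint8_t op = mem[offset];
    if (op == 0x02 && offset + 1 < mem.size()) {  // COP #imm8 (2 bytes)
        std::snprintf(buf, sizeof(buf), "COP #$%02X",
                      static_cast<unsigned>(mem[offset + 1]));
        return {buf, 2};
    }
    if (op == 0x4C && offset + 2 < mem.size()) {  // JMP abs (3 bytes)
        unsigned target = mem[offset + 1] | (mem[offset + 2] << 8);
        std::snprintf(buf, sizeof(buf), "JMP $%04X", target);
        return {buf, 3};
    }
    std::snprintf(buf, sizeof(buf), ".db $%02X", static_cast<unsigned>(op));
    return {buf, 1};
}
```

With the bytes `02 4C 34 12` (the last two are placeholder operand bytes), decoding at offset 0 yields a 2-byte COP, while decoding at offset 1 yields a 3-byte JMP that overlaps it. A disassembly cache keyed on "where instructions start" has to cope with both views being valid.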
Either way, shouldn't be too hard to fix - I'll take a look soon.
byuu wrote:
If you run the extra scanlines before / at the start of NMI, it doesn't really have much effect on speed.
As far as I remember, "before NMI" scanlines fixed the slowdowns in at least 1 or 2 of the 3 games I've tested. In terms of NES games, "before NMI" has always been the most "compatible" option, but this might not be true for the SNES.
For the "after NMI" scanlines, I put them at the end of vblank, too, rather than right after the NMI scanline, otherwise it tends to break some things, like you said. I'm running all the CPUs and coprocessors during the overclock on my end and only suspending the SPC, since that's actually the simpler solution in my case :p
In terms of max overclock values, on the NES, both FCEUX/Mesen/puNES support up to 1000 lines of either type, iirc, so that's what I've implemented at the moment (some games on the NES actually do need nearly 1000 extra lines to get rid of the slowdown completely, believe it or not...). FYI, AxlRocks has been testing some SNES titles on Mesen-S to see how they behave with different values of the before/after NMI settings (he's done this on the NES for like 100+ games on FCEUX/Mesen in the past, too).
---
In other news, just committed Super FX support (including the %-based overclocking for it, though mine only supports multiples of 100%, up to 1000% atm). Like the SA-1, it also has its own debugger window, breakpoints and the like.
It seems to be working in all the games I've tested. In terms of implementation, I'm cheating with how the super fx pauses itself when it tries to access ROM/RAM while the CPU has access to them. Trying to make the super fx CPU into a state machine quickly turned into a nightmare, so I gave up on that fairly early on - this is definitely one of the scenarios where using something like libco can really simplify the code when that much accuracy is needed.
Will probably do S-DD1 next since it seems like that'll be fairly straightforward to add using the existing public domain implementation. And it won't require any debug tools, too, which certainly helps a lot!
Quote:
In terms of max overclock values, on the NES, both FCEUX/Mesen/puNES support up to 1000 lines of either type, iirc, so that's what I've implemented at the moment
Hmm ... I'm doing my overclocking as a % value. For NTSC, each frame gets:
uint clocks = 262*1364;
uint extraclocks = (clocks*overclock)-clocks; //overclock = 1.0 - 4.0
So 786 extra scanlines per frame max currently.
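A quick check of that arithmetic, as a minimal sketch. The helper names and the choice to express the overclock as an integer percentage are my own assumptions for illustration; only the 262 x 1364 figure and the 1.0 - 4.0 range come from the snippet above.

```cpp
#include <cassert>
#include <cstdint>

// NTSC frame as used in the snippet above: 262 scanlines of 1364 master clocks.
constexpr uint32_t kScanlinesPerFrame = 262;
constexpr uint32_t kClocksPerScanline = 1364;

// Hypothetical helper: extra master clocks to run per frame for a given
// overclock percentage (100 = no overclock, 400 = 4.0x).
uint32_t extraClocksPerFrame(uint32_t overclockPercent) {
    assert(overclockPercent >= 100);
    uint64_t clocks = static_cast<uint64_t>(kScanlinesPerFrame) * kClocksPerScanline;
    // Integer math avoids rounding surprises from a floating-point multiplier.
    return static_cast<uint32_t>(clocks * overclockPercent / 100 - clocks);
}

// The same amount expressed as whole extra scanlines.
uint32_t extraScanlinesPerFrame(uint32_t overclockPercent) {
    return extraClocksPerFrame(overclockPercent) / kClocksPerScanline;
}
```

At 400% this gives 3 x 262 = 786 extra scanlines, matching the number above; a 500% limit would give 1048, slightly more than the 1000-line cap used for the before/after NMI settings.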
I guess I'll make the upper limit 500%, but ... damn that's a lot of overhead to the emulation, heh.
400% already drops max framerate from ~370fps to ~190fps.
Now throw in a 500% SA-1 overclock to go with it, or better yet, a 500% overclock of the ARM6 ... 105MHz 32-bit CPU, anyone? MSU2? :P
Quote:
FYI, AxlRocks has been testing some SNES titles on Mesen-S to see how they behave with different values of the before/after NMI settings (he's done this on the NES for like 100+ games on FCEUX/Mesen in the past, too).
Something I've been doing with bsnes' speed hacks is detecting games that don't like them to selectively disable them (eg pixel renderer for Air Strike Patrol, cycle DSP for Koushien 2, etc.)
I'd like the idea of us building out a database of 'metadata' for games like this, and we could include information like "maximum stable overclock%", etc. Stuff like this is great for the end-user experience, instead of them having to guess and change settings per-game.
Quote:
In other news, just committed Super FX support (including the %-based overclocking for it, though mine only supports multiples of 100%, up to 1000% atm).
The SuperFX overclocks so much better than other CPUs. And the way it sleeps when done means you aren't completely murdering performance like with overclocking the main CPU. You can really go to town and games don't care. The one exception is the Stunt Race FX menus will fail after about 400%, but in-game benefits so much that I'd rather not cap it.
I haven't personally noticed any gains past 800%, but I guess if we're allowing 500% on the CPU, 1000% on the SFX seems reasonable.
Quote:
this is definitely one of the scenarios where using something like libco can really simplify the code when that much accuracy is needed.
The areas where libco was a lifesaver:
1. I prefer to not enslave the other processors to the main CPU. This is because the main CPU can start a DMA that is ten frames in length. I know, no game ever will. But it can. Supporting the bus hold delays on CPU reads plus the DMA/HDMA sync handling to exit the CPU resulted in a nightmarishly complex state machine. If someone wants to be a troll, a test ROM that verifies proper DMA/HDMA sync and then does consecutive 10-frame DMAs would be a good one :P
2. building on 1, the CPU and SMP talk so infrequently, and only over a limited 4-byte range, that you can run them way out of order of each other. It ends up being a decent speed *boost* to use libco to be able to treat the SMP like a regular opcode-based interpreter and just context switch when it's (rarely) needed.
3. as you mentioned, the SuperFX ROM/RAM buffering is very pesky otherwise.
And the areas where libco has proven a hindrance:
1. the SA1 shares all of ROM and RAM, so you can effectively never run one CPU ahead of the other. All of that context switching is unbelievably painful.
2. the CPU and PPU have a lot of trouble, too. Even though there's a limited 64-byte window for communication, there's also the H/Vblank signals. I ended up implementing a PPUcounter class for the CPU to inherit which basically predicts what the PPU H/Vblank statuses would be for any given cycle, because otherwise the CPU wouldn't know if it could run more before the PPU blanking signals would change, and the PPU couldn't run ahead because it wouldn't know if the CPU would write to one of its registers.
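The prediction idea in the last point can be sketched roughly as follows. This is not the actual PPUcounter class: the struct name, its members and the fixed timing constants are assumptions, and it treats every scanline as 1364 master clocks with vblank starting on line 225, which ignores the real hardware's exceptions (short/long lines, overscan, interlace).

```cpp
#include <cassert>
#include <cstdint>

// Simplified NTSC timing constants (assumptions, see the note above).
constexpr uint32_t kClocksPerLine = 1364;
constexpr uint32_t kLinesPerFrame = 262;
constexpr uint32_t kVblankStartLine = 225;
constexpr uint32_t kClocksPerFrame = kClocksPerLine * kLinesPerFrame;

// Hypothetical counter the CPU side keeps so it can predict PPU status
// without running the PPU itself.
struct PpuCounterSketch {
    uint32_t frameClock = 0;  // master clocks elapsed since the start of the frame

    void tick(uint32_t clocks) { frameClock = (frameClock + clocks) % kClocksPerFrame; }

    uint32_t vcounter() const { return frameClock / kClocksPerLine; }
    uint32_t hclock() const { return frameClock % kClocksPerLine; }
    bool inVblank() const { return vcounter() >= kVblankStartLine; }

    // How many master clocks the CPU can run before the vblank status flips.
    uint32_t clocksUntilVblankChange() const {
        assert(frameClock < kClocksPerFrame);
        uint32_t vblankStart = kVblankStartLine * kClocksPerLine;
        return inVblank() ? kClocksPerFrame - frameClock : vblankStart - frameClock;
    }
};
```

The useful part is `clocksUntilVblankChange()`: it gives the CPU thread an upper bound on how far it may run ahead before a blanking signal would change underneath it.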
libco works amazingly when either thread can run well ahead of the other. Only one is enough. But it falls apart when neither thread can, because you just end up context switching every cycle of each thread.
Probably the best idea, if someone were willing, would be to use cooperative threading where it excels, and state machines where it does not.
Further, libco doesn't solve the problem of processors that do multiple things in parallel. Eg the CPU ALU, the PPU running backgrounds and sprites separately, etc. It would if we ran a thread for each of those things, but there is no fricking way we can afford the overhead of that on modern CPUs =(
Quote:
Will probably do S-DD1 next since it seems like that'll be fairly straightforward to add using the existing public domain implementation. And it won't require any debug tools, too, which certainly helps a lot!
One of these days I'd like to simplify Andreas Naive's SDD1 decompression code. Talarubi did that to neviksti's SPC7110 decompression code, and it's probably my favorite code in higan to look at. It's just been a low priority.
The one tricky thing about the SDD1 is that it spies on $4300-437f in order to operate the decompression. If you're not ideologically opposed to crude hacks, this is no problem at all in practice to have the SDD1 core peek inside the CPU core's internal state, but otherwise it's a bit pesky.
(Eg for the NES, mappers are a hundred times easier when you can just steal the NES PPU H/Vcounter.)
byuu wrote:
Something I've been doing with bsnes' speed hacks is detecting games that don't like them to selectively disable them (eg pixel renderer for Air Strike Patrol, cycle DSP for Koushien 2, etc.)
I'd like the idea of us building out a database of 'metadata' for games like this, and we could include information like "maximum stable overclock%", etc. Stuff like this is great for the end-user experience, instead of them having to guess and change settings per-game.
That technique of turning speed hacks on or off based on a ROM hash reminds me of what's described in a patent that has been discussed before.
byuu wrote:
I ended up implementing a PPUcounter class for the CPU to inherit which basically predicts what the PPU H/Vblank statuses would be for any given cycle
If others are interested in this particular tech, see "Prediction" on the wiki.
Being a happy Mesen user I decided to finally take Mesen-S for a spin.
Used latest dev build.
Threw ASP at it.
Nice PPU
SO excited for this emulator now.
Yea, I'm unsure how necessary having up to 1k extra lines on the SNES is - as people test out more games, time will tell I suppose. Regarding having a DB, it's definitely something I've wanted to do before on the NES, too (that and a DB of "ideal" overscan settings to hide the garbage at the edges on the NES). In theory, if AxlRocks continues with his tests, could probably turn his spreadsheet into a database pretty easily - having settings for every single game on the SNES is going to take a long time, though.
I essentially made the Super FX overclock go up to 1000% since it's on the same settings page as the extra scanlines, which also go up to 1000 :p I didn't actually test it too much beyond validating that it sped up starfox's gameplay (and thus was working "as intended")
RE: syncing processors, in my case everything is being "synced" by the master clock counter, pretty much. Every CPU cycle increments it, every dma read/write increments it, every time it's incremented, anything that "needs" high accuracy is updated too (e.g Super FX, SA1, irq checks, etc.) The SPC/DSP/PPU catch up when their registers are read or written (or once per frame at minimum). The PPU only generates the picture at the end of each scanline, unless registers are read/written during the scanline, in which case it'll split up the scanline into as many batches as it needs. So it's a fairly simple system really, nothing fancy, but it does let me do more or less whatever I want with each piece relatively easily. (keeping in mind that everything except the main CPU/SPC run with instruction-level granularity because the only thing I turned into a state machine is the SPC)
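For readers who want a picture of the "catch up on register access" approach, here is a minimal sketch. It is not Mesen-S's code: `LazyComponent`, `ConsoleSketch` and every member name are hypothetical, and the "work" is just a counter standing in for real emulation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical component (think SPC/DSP/PPU) that is only brought up to date
// on demand instead of on every master clock.
class LazyComponent {
public:
    // Run the component until it reaches the given master clock value.
    void catchUp(uint64_t masterClock) {
        assert(masterClock >= _syncedClock);
        _workDone += masterClock - _syncedClock;  // stand-in for real emulation work
        _syncedClock = masterClock;
        _catchUpCalls++;
    }
    uint64_t syncedClock() const { return _syncedClock; }
    uint64_t workDone() const { return _workDone; }
    uint32_t catchUpCalls() const { return _catchUpCalls; }

private:
    uint64_t _syncedClock = 0;
    uint64_t _workDone = 0;
    uint32_t _catchUpCalls = 0;
};

class ConsoleSketch {
public:
    // Every CPU cycle or DMA access advances the shared master clock.
    void advance(uint32_t masterClocks) { _masterClock += masterClocks; }

    // A register access forces the component to catch up first.
    uint8_t readComponentRegister() {
        _component.catchUp(_masterClock);
        return 0;  // register contents are out of scope for this sketch
    }

    // Called once per frame so that nothing lags behind indefinitely.
    void endFrame() { _component.catchUp(_masterClock); }

    const LazyComponent& component() const { return _component; }

private:
    uint64_t _masterClock = 0;
    LazyComponent _component;
};
```

The component does the same total amount of work either way; what changes is that it runs in a few large batches instead of being stepped on every clock, which is where the speed comes from.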
Though now you've given me an idea w/ regards to the PPU counters & h/v irqs checks that I might be able to use to reduce the overhead it takes to check the irq flags every 4 master clocks, will have to try that out.
byuu wrote:
The one tricky thing about the SDD1 is that it spies on $4300-437f in order to operate the decompression.
Ah, so that's what the DMA code in the S-DD1 implementation is - hadn't really looked at it in detail yet. I'll have to see what makes it the simplest on my end, might just replace the read/write handler for the $4xxx registers with the S-DD1's handler and then forward it to the original handler after the S-DD1 is done processing it.
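The "replace the handler, then forward" idea could look something like the sketch below. The `IRegisterHandler` interface and both classes are hypothetical and only show the wrapping pattern; what the S-DD1 actually does with the values it observes is not modeled here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical handler interface; a real emulator's interface may differ.
struct IRegisterHandler {
    virtual ~IRegisterHandler() = default;
    virtual uint8_t read(uint16_t addr) = 0;
    virtual void write(uint16_t addr, uint8_t value) = 0;
};

// Stand-in for the CPU's own DMA register block ($4300-$437F).
struct DmaRegisters : IRegisterHandler {
    uint8_t regs[0x80] = {};
    uint8_t read(uint16_t addr) override { return regs[addr & 0x7F]; }
    void write(uint16_t addr, uint8_t value) override { regs[addr & 0x7F] = value; }
};

// Wrapper that records writes to the DMA registers, then forwards every
// access to the original handler so normal behavior is unchanged.
class SpyingHandler : public IRegisterHandler {
public:
    explicit SpyingHandler(IRegisterHandler& original) : _original(original) {}

    uint8_t read(uint16_t addr) override { return _original.read(addr); }

    void write(uint16_t addr, uint8_t value) override {
        if (addr >= 0x4300 && addr <= 0x437F) {
            _shadow[addr & 0x7F] = value;  // keep a private copy of the DMA setup
        }
        _original.write(addr, value);
    }

    // Last value seen for a DMA register in the $4300-$437F range.
    uint8_t shadow(uint16_t addr) const {
        assert(addr >= 0x4300 && addr <= 0x437F);
        return _shadow[addr & 0x7F];
    }

private:
    IRegisterHandler& _original;
    uint8_t _shadow[0x80] = {};
};
```

Compared with letting the chip peek into the CPU core's internal state, this keeps the CPU core unaware of the coprocessor, at the cost of one extra virtual call per register access.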
In general though, while I do try to keep my code as clean as I can, I've found that often times abstractions that make the code cleaner unfortunately end up also making it slower, esp. since everything in a console tends to be interconnected, heh. Most of the time I tend to favor speed over perfectly isolating the code for each piece of hardware (esp. since I end up with fairly slow code even if I do that :p)
007 wrote:
SO excited for this emulator now.
Glad to hear you like it! Let me know if you happen to find problems.
We might be able to adapt my database editor for storing metadata like overclocking limits and such.
I guess if this moves into serious interest, let me know please ^^;
Yeah, interrupts only need to be polled once every four 21.47MHz clock cycles.
Something else I wanted to do that would reduce some bloat: currently we have to do the NMI/IRQ tests before we execute the last work cycle of each instruction. If we instead did this test on every cycle, and kept a latch of the previous value, eg:
foreach_cycle { last = current; current = testInterrupts(); }
then we could trigger interrupts based on last instead of current.
But I haven't been able to make this work because my testInterrupts() has side effects.
Maybe you'd have better luck, though you might not do it as it might be a touch slower ^-^;
Quote:
In general though, while I do try to keep my code as clean as I can, I've found that often times abstractions that make the code cleaner unfortunately end up also making it slower, esp. since everything in a console tends to be interconnected, heh. Most of the time I tend to favor speed over perfectly isolating the code for each piece of hardware (esp. since I end up with fairly slow code even if I do that :p)
Yep, yep. This is why I've been bugging other emudevs for quite a while now (to no avail :P) to make an accurate SNES emulator.
I've always said you could get about twice the speed of higan without losing any accuracy if you optimized it to its limits. I think you have more headroom in Mesen-S to get more than 200%, but then there's also a few corner cases you're skipping because they're quite frankly ridiculously costly. But on the whole, my original estimate seems to be mostly holding up.
I don't want higan to be the fastest accurate emulator, I want it to be the reference implementation people use to validate the hardware. With higan, I want to preserve the machine more than play games. I am not in any way saying one is more important than the other (if anything, a gaming emulator is way more useful), it's just the kind of emulator I wanted to make is all.
I revived bsnes to try and fill the large gap between higan and Snes9X, because I'd pretty much given up on another serious SNES emulator attempt. I was planning to speed up the accuracy portion inside bsnes, but I think that effort might be a bit redundant now ^-^
RE: overclocking DB, I'll let you know - will probably be a while before AxlRocks plays enough games to get a better idea of what works and whatnot, too.
byuu wrote:
If we instead did this test on every cycle, and kept a latch of the previous value, eg:
foreach_cycle { last = current; current = testInterrupts(); }
then we could trigger interrupts based on last instead of current.
This is actually what I'm doing at the moment - the IRQ test runs every 4 cycles and sets the IRQ signal when needed. Then before each CPU cycle, I check the IRQ signal's value, store it, and then check if we need to jump to the IRQ handler when the instruction ends. It's this way mostly because that's also how I implemented it on the NES, too. I had actually been considering the opposite (e.g having code that runs before the last cycle in each instruction), but didn't really get any further than just thinking about it :p
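A minimal sketch of that two-stage latch, for anyone following along. The class and method names are invented for the example and the IRQ test itself is reduced to a boolean line, so treat it as an illustration of the idea rather than either emulator's actual code.

```cpp
#include <cassert>

class IrqLatchSketch {
public:
    // Called by the (hypothetical) IRQ test that runs every 4 master clocks.
    void setIrqLine(bool asserted) { _irqLine = asserted; }

    // Called before each CPU cycle: shift the two-stage pipeline by one.
    void beginCycle() {
        _prevIrq = _currentIrq;
        _currentIrq = _irqLine;
    }

    // Called when an instruction ends: act on the value sampled one cycle
    // earlier, which has the same effect as polling before the final cycle.
    bool shouldServiceIrq() const { return _prevIrq; }

private:
    bool _irqLine = false;
    bool _currentIrq = false;
    bool _prevIrq = false;
};

// Usage: an IRQ raised during a cycle is sampled on the next cycle and only
// becomes actionable one cycle after that.
inline void irqLatchDemo() {
    IrqLatchSketch latch;
    latch.beginCycle();
    latch.setIrqLine(true);
    latch.beginCycle();
    assert(!latch.shouldServiceIrq());
    latch.beginCycle();
    assert(latch.shouldServiceIrq());
}
```

Because the sampling step here has no side effects, it can safely run on every cycle; a test with side effects is harder to restructure this way.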
Quote:
I revived bsnes to try and fill the large gap between higan and Snes9X, because I'd pretty much given up on another serious SNES emulator attempt.
I guess I decided to write this just a bit too late huh? Haha.
But yea, I'm hoping to be able to squeeze more performance out of it all eventually, though for now I'm mostly focusing on implementing the missing stuff (that being said, I do profile every time I change/add anything major to try and optimize a bit - my efforts are usually rewarded with the exact same FPS no matter how many different ways I write the code)
Quote:
It's this way mostly because that's also how I implemented it on the NES, too.
Yeah, it pretty much is how the hardware works (well, the four-clock test thing is just the effective result anyway, but the two-stage pipeline is definitely a real thing.) Disappointing thing is the NES/PC Engine (HuC6280) CPUs could definitely do that last work cycle test on every cycle, since their tests don't have obvious side effects like the SNES version does.
Quote:
I guess I decided to write this just a bit too late huh? Haha.
Yeah, I wouldn't have made bsnes if Mesen-S were already a thing. It was already ~10 months into development with two releases out before I found out about your project (when you announced it here.) I was worried people would accuse bsnes of being a response to it anyway.
That said, I've been very pleasantly surprised by how well bsnes has gone, and the cool features that have popped up like HD mode 7 and widescreen from DerKoun. Long overdue or not, I definitely think getting the SNES emulation rock solid before making a fork like that was the right move.
So I guess at this point, bsnes will be the gap closer between Mesen-S and Snes9X ^-^;
It's not like any of us are getting paid for this, so more choices are always a good thing.
Quote:
But yea, I'm hoping to be able to squeeze more performance out of it all eventually
If Mesen-S didn't come about, I would have tried to do the same for the accuracy mode of bsnes.
As it stands, I'd like to invest as much resources as possible into speeding up the faster side of bsnes, but it's proving to be a really difficult struggle here. The pain comes from the IRQ edge cases. I have a third SNES emulator (really >_>), and with range-tested IRQs and priority queues, and an opcode-based CPU core, I can match Snes9X's performance (for non-SA1/SFX games at least), but it absolutely does not pass my test_nmi/irq ROMs. The question is, is it possible to make what we do faster without losing accuracy?
Frame rates right now for me are:
higan -> 120fps
Mesen-S -> 240fps [less accurate coprocessors]
bsnes -> 390fps (500fps with frameskip fast forward) [less accurate PPU+coprocessors]
Snes9X -> 800fps (1200fps with frameskip fast forward) [less accurate CPU+PPU+coprocessors]
People keep finding new, innovative ways to make SNES emulation more demanding:
* using rock-bottom hardware like $5 Raspberry Pi Zeroes
* implementing run-ahead to multiply the system requirements by the run-ahead amount (+3 frames = 300% more demanding)
* overclocking all the CPUs by 1000% to reduce slowdown
* wanting to run really demanding software filters like snes_ntsc (shaders are at least mostly free)
* wanting to upscale mode 7 to 3840x2160 widescreen
* wanting GL fencing / flushing per-frame
* and of course, us wanting things emulated more accurately
I hear there's one guy who's even considering hires texture replacement packs for SNES tiles! :p
Sorry for derailing your thread again, exciting times though! I'm happy you're here ^-^
byuu wrote:
I hear there's one guy who's even considering hires texture replacement packs for SNES tiles! :p
To be honest, I'm still not convinced there's much point to it - the SNES has much less restrictions than the NES to start off, giving it more colors isn't that useful, and increasing the resolution tends to make the animations look choppy. HD Packs on the NES are not really that bad though, 95% of the processing is done in another thread, so it doesn't have too much impact on performance in general.
In other news, I just finished adding CX4 support (and I added S-DD1 support a few days ago, too). CX4 took me a while longer than I had hoped, didn't help that none of the emulators appear to be able to produce trace logs for it (this was also the case with the DSP), but it's finally done. Still seems to have a bit of a timing issue because the MMX2 demo sequence desyncs in the middle of the fight (slowing down the CX4 by about 5-10% fixes it) - not quite sure if the issue is the CX4 or something else, though. The stuff in ikari_01's thread should mostly be implemented properly, but I must be missing something somewhere.
With this, there's basically only 3 chips left (OBC-1, SPC7110 and ST018). There's also about 3 SA-1 games that don't work right yet because I still have some features to implement. And then another 2 regular games freeze at boot. So roughly 10 games left that still aren't playable. (excluding stuff like BSX and the like)
OBC1 will take you like five minutes. It's the dumbest chip in the world. Metal Combat however is a pretty fun game.
Wow, you weren't kidding, that chip is simpler than most of Nintendo's NES mappers :p
I was going to wait a bit until I added the OBC1 and SPC7110, but now that there's only the SPC7110 left, I might try and get the SPC7110 done so that I can be (mostly) done with enhancement chips for a while and focus on other things.
Also fixed a CX4 bug that caused issues that I hadn't noticed in MMX2/3.
ansarya wrote:
I have found an issue where the "CPU Debugger" window is showing the incorrect state due to some stack fiddling by Dragon Quest 3.
This should be fixed now, thanks!
Shogi can wait until after Meabg, and Meabg can in turn wait until after Meecp.
Sour wrote:
byuu wrote:
I hear there's one guy who's even considering hires texture replacement packs for SNES tiles! :p
To be honest, I'm still not convinced there's much point to it - the SNES has much less restrictions than the NES to start off, giving it more colors isn't that useful, and increasing the resolution tends to make the animations look choppy. HD Packs on the NES are not really that bad though, 95% of the processing is done in another thread, so it doesn't have too much impact on performance in general.
I too think making SNES HD pack by hand requires too much effort for too little improvement. What I want to see is AI upscale (eg ESRGAN, Waifu2x) but those can't be done in real time yet.
Just recently decided to finally try Mesen and Mesen-S and to my surprise those 2 emulators are pretty advanced, packed with plenty of features and compatibility, and have a pretty nice GUI to boot, awesome.
Mesen is probably the best or 2nd best NES emulator at the moment and I've tried dozens (puNES is Mesen's competition IMHO when it comes to features and compatibility) and Mesen-S is shaping up to be the best SNES emulator around. Currently I consider Snes9X to be the best SNES emulator and Bsnes is in 2nd place but goddamn, Mesen-S is pretty new and is already in 3rd place IMHO. At this rate Mesen-S will become my default SNES emulator because it has (or will have) all the good features the other emulators have but it also has a pretty good GUI that I'm loving so far and the rewind functionality of Mesen and Mesen-S are pretty good as well.
I only have 2 minor issues with Mesen-S:
1 - It doesn't have the option to clear the recent games (Mesen has the option though).
2 - it doesn't have the option to disable the "Game Selection Screen" (Mesen also has this option).
Hopefully you can implement those options, they should be pretty easy to implement for people of your caliber.
Also, I wonder why emulator authors don't implement the save state/load state feature that Snes9X for Mac/OSX has. In my opinion, no emulator has EVER come close to it. Just look at how pretty it looks:
xZabuzax wrote:
Mesen-S is pretty new and is already in 3rd place IMHO
Sweet, 3rd place out of 4! :p Joking aside, thank you!
RE: the missing functionalities vs Mesen - I've been trying to cut down on the number of options a bit (especially those that do not really "add" anything beyond allowing more customization). This was mostly to both save time while coding Mesen-S and also keep code complexity down when I can (all those little options pile up!)
That said, I am keeping track of missing features from Mesen that people request, since that is a good indicator that they are actually used (if only I had usage statistics for this kind of stuff...) So the simple stuff that has low impact (e.g like these) I will probably add once I get a chance (there are a lot of features missing all around, still)
RE: The Snes9x save state screen, I've actually never seen that before. Though it looks somewhat similar to PPSSPP's save state system (or at least, what I remember from using it like 2+ years ago.) It's also essentially the same as Mesen's game selection screen, except for save states (and it shows multiple states instead of just one).
As for why that kind of thing is rare, I would guess that the answer is simply because while it's pretty and possibly useful to some users, it can take a lot of time to code something like this (and increases code complexity, maintenance, etc.). This is especially true when the feature doesn't really add anything beyond potentially a cleaner interface. In cases like these, users will still expect keyboard shortcuts and menus for save states, on top of this additional feature, so it increases the amount of code required for save states, without really allowing anything that wasn't already possible, either.
Sour wrote:
Sweet, 3rd place out of 4! :p Joking aside, thank you!
More like 7 or 8, there are more SNES emulators like SnesGT, no$sns, Zsnes, sneese, and others that I can't remember, so for Mesen-S to come out of nowhere and already surpass most of them, which had years of development, is already a big achievement. This will definitely become the best SNES emulator once the missing features and compatibility are added, and I can't wait to have this one as my default SNES emulator.
Sour wrote:
RE: the missing functionalities vs Mesen - I've been trying to cut down on the number of options a bit (especially those that do not really "add" anything beyond allowing more customization). This was mostly to both save time while coding Mesen-S and also keep code complexity down when I can (all those little options pile up!)
That said, I am keeping track of missing features from Mesen that people request, since that is a good indicator that they are actually used (if only I had usage statistics for this kind of stuff...) So the simple stuff that has low impact (e.g like these) I will probably add once I get a chance (there are a lot of features missing all around, still)
I totally understand mate, those little features make an emulator more "complete" but I'm not in a hurry, keep coding at your own pace, hopefully you can add those later.
Sour wrote:
RE: The Snes9x save state screen, I've actually never seen that before. Though it looks somewhat similar to PPSSPP's save state system (or at least, what I remember from using it like 2+ years ago.) It's also essentially the same as Mesen's game selection screen, except for save states (and it shows multiple states instead of just one).
I was a Mac user in the 90s and early 2000s. I used Snes9X a lot and the save state/load state feature of that emulator was so damn good that I still miss it to this day. The only emulator that had something similar is indeed PPSSPP but IMHO the one from Snes9X is better and it allows for more slots to save the game. The one from Mesen and Mesen-S is a bit similar but it only works when you quit the emulator; I'd rather use that feature with a simple hotkey like the Mac/OS X version of Snes9X had.
Sour wrote:
As for why that kind of thing is rare, I would guess that the answer is simply because while it's pretty and possibly useful to some users, it can take a lot of time to code something like this (and increases code complexity, maintenance, etc.). This is especially true when the feature doesn't really add anything beyond potentially a cleaner interface. In cases like these, users will still expect keyboard shortcuts and menus for save states, on top of this additional feature, so it increases the amount of code required for save states, without really allowing anything that wasn't already possible, either.
I totally understand that but then again, the Mac/Os X version of Snes9X had no trouble maintaining that feature, I haven't used a Mac in decades but I remember that Snes9X had that awesome save state/load state feature in the 90s and it still has it today, so if the Mac version of Snes9X can maintain this awesome feature for decades then I see no reason why this can't be maintained in Windows as well.
Edit: Now that I think about it, since you have something similar with the Mesen and Mesen-S game selection on startup, you could go the extra mile and allow 2 hotkeys to open that same window. The hotkeys would be "Save Window" and "Load Window", and you would just need to allow more "slots" in it for it to be pretty similar to Snes9X. The only difference is that your approach only has 1 slot at the moment, but you could easily shrink the picture to fit more slots in. So the implementation of this feature is basically done in your emulators as well; you just need to go that extra mile to make it prettier and add a few lines of code here and there for the hotkeys and the extra slots.
Hopefully you can take your time to think about this.
Honestly, Mesen's game selection feature would be more useful if you could display more games/screenshots simultaneously like in that example.
Not that I'd use it myself, I exclusively use emulators for testing/development, and tend to just double-click the rom file, rather than going through the program's UI first. But if you were to use it, easily spotting and recognizing a screenshot is way more efficient than flipping through a bunch of them until you find what you need.
The idea with a screenshot overview for save slots is super nice, but barely useful unless you're for some reason keeping multiple threads of a game going, and want to be able to quickly tell them apart. I don't think a lot of people have any use for that, outside of maybe speedrunners who keep a ton of states ready for immediately practicing any individual segment of a game.
Personally I didn't add savestates to my own emulator at all, because my immediate goal was replicating an actual NES, and I don't personally think savestates add any value.
Small Debugger display error:
When showing "unknown" areas of code as code (see the background is red) I noticed that "ADC #xxxx" is correctly using up 3 bytes, but it doesn't display the high byte.
In this example it is displaying "ADC #75"
bytes at $C06905 ---- 18 69 75 38 85 14
Note that the 38 is missing; it should be ADC #3875.
This displays fine when the debugger has run that code before and has marked that area as "code" instead of "unknown".
I am using the Mesen-S 2.0 release.
Yeah there's sadly no guaranteed way to detect what the M/X flags will be before executing the instruction.
You can keep building up heuristics to increase the chances of good disassembly, e.g. tracking rep/sep instructions and treating code immediately after PLP and RTI/RTS/RTL as suspect. During unknown M/X states, you can also apply likelihood heuristics (e.g. BRK and SBC $aaaaaa,x are quite unlikely instructions to appear, so adc #$0000 is the more likely reading) ...
But a particularly evil coder could make one routine that executes differently based on the M/X flags, and make the program dependent on all of the variants. So there's no such thing as perfection.
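To make the rep/sep tracking idea concrete, here is a minimal Python sketch of a linear sweep that carries the M/X bits along as it decodes. It is only an illustration under stated assumptions: the tiny `OPCODES` table and the `sweep` helper are made up for this post and are not how Mesen-S does it, and a real disassembler would also have to follow branches and distrust the state after PLP/RTI/RTS/RTL as described above.

```python
# Sketch: track the M/X flags through REP/SEP during a linear sweep.
# OPCODES is a hypothetical, deliberately tiny table (not Mesen-S's).

M_FLAG = 0x20  # status bit: 1 = 8-bit accumulator/memory
X_FLAG = 0x10  # status bit: 1 = 8-bit index registers

# opcode -> (mnemonic, base operand bytes, flag whose 0 state widens the immediate)
OPCODES = {
    0x18: ("CLC", 0, None),
    0x38: ("SEC", 0, None),
    0x69: ("ADC #", 1, M_FLAG),   # 2 operand bytes when M = 0
    0x85: ("STA dp", 1, None),
    0xC2: ("REP #", 1, None),     # clears the given status bits
    0xE2: ("SEP #", 1, None),     # sets the given status bits
}


def sweep(code, p=M_FLAG | X_FLAG):
    """Decode `code` front to back; return (offset, mnemonic, operand, length) tuples.

    `p` is the assumed status register on entry (8-bit A and X/Y by default).
    """
    out = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        name, size, width_flag = OPCODES[op]
        if width_flag is not None and not (p & width_flag):
            size += 1  # 16-bit immediate
        operand = int.from_bytes(code[pc + 1 : pc + 1 + size], "little")
        if op == 0xC2:
            p &= ~operand
        elif op == 0xE2:
            p |= operand
        out.append((pc, name, operand, 1 + size))
        pc += 1 + size
    return out


# The bytes from the report above, once with a hypothetical REP #$20 in front
# (M = 0) and once without it (M = 1 assumed on entry).
wide = sweep(bytes.fromhex("C220186975388514"))
narrow = sweep(bytes.fromhex("186975388514"))
```

With the REP in front, the sweep reads `69 75 38` as a 3-byte ADC with operand $3875. Without it, the same bytes decode just as validly as ADC #$75 followed by SEC, which is why both readings look plausible until the code has actually run.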
Yeah, I understand that you can't tell until it's run (and as you said, not necessarily even then) although I'm super glad it's trying as it's super useful. DQ3 loves to fiddle with the stack all the time, especially to store data for a routine directly after the JSL that calls it and then change the return address, for instance (which is why my wish-list text file includes "Declare a range of 'unknown' to be data or code").
But in this instance it is using up 3 bytes of the code (pc goes from c06906 to c06909) but only displaying the 2 byte version and the $38 is thrown away. It is interpreting the instruction one way and displaying it another. Not a big issue, but then I've got to go look up the bytes in the hex editor.
On another note, should I be using the Development Build version instead of the 2.0 release? I use the Mesen-S debugging tools every day, so I think I should be testing the latest, and I don't care about the slowdown in the dev builds since I spend most of my time paused anyway.
Quote:
DQ3 loves to fiddle with the stack all the time, especially to store data for a routine directly after the JSL that calls it and then change the return address, for instance
Yeah, horrible as it is for disassembly, I love that trick. I do it all the time in my own 65816 code. It's a cool way to pass constants to functions (integers, strings, you name it) without consuming registers, which are absolutely in dire demand on this architecture. Can be a bit of a hassle without macro support to do the stack relative calculations for you, though.
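For anyone who hasn't seen the trick, here is a small Python model of what the called routine does. It relies on two facts about the 65816: JSL pushes the program bank followed by the 16-bit address of its own last byte, and RTL adds 1 to the pulled address. Everything else (the flat `mem` array, the addresses, the helper names `fetch_inline_args` and `rtl_target`) is made up for the illustration and is not taken from DQ3.

```python
# Sketch: a routine reading constants stored right after the JSL that called
# it, then patching the return address so RTL resumes after the data.

def fetch_inline_args(mem, sp, count):
    """Read `count` inline bytes and bump the stacked return address past them."""
    lo, hi, bank = mem[sp + 1], mem[sp + 2], mem[sp + 3]
    ret = (hi << 8) | lo                      # address of the JSL's last byte
    base = bank << 16
    args = bytes(mem[base + ((ret + 1 + i) & 0xFFFF)] for i in range(count))
    new_ret = (ret + count) & 0xFFFF          # RTL will add 1 to this
    mem[sp + 1], mem[sp + 2] = new_ret & 0xFF, new_ret >> 8
    return args


def rtl_target(mem, sp):
    """Address RTL would resume at, given the stacked return address."""
    lo, hi, bank = mem[sp + 1], mem[sp + 2], mem[sp + 3]
    return (bank << 16) | ((((hi << 8) | lo) + 1) & 0xFFFF)


mem = bytearray(0x10000)                      # bank 0 only, for simplicity
mem[0x8000:0x8006] = bytes.fromhex("220090001234")  # JSL $009000, then 2 data bytes
# State after the JSL at $8000 executed with S = $1FFF:
mem[0x1FFF] = 0x00                            # program bank
mem[0x1FFE] = 0x80                            # return address high
mem[0x1FFD] = 0x03                            # return address low ($8003)
sp = 0x1FFC

args = fetch_inline_args(mem, sp, 2)
resume = rtl_target(mem, sp)
```

A disassembler that walks straight on after the JSL would decode the two data bytes at $8004 as instructions, which is exactly why this pattern is so unfriendly to static disassembly.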
I was debugging some code and I have a few questions about the debugger:
1. What are the "sub start" lines?
2. What is "call stack" used for?
3. Why does the debugger sometimes jump back in code when the opcode is not a jump/branch type opcode? This usually causes the code to get stuck in a short section, and every jump back adds one new line to the call stack.
1. The "sub start" lines indicate the start of a routine. That means that a JSR or JSL command jumped to that location.
2. The call stack allows you to follow the series of JSR/JSL commands (and interrupts) that changed the PC and led you to your current location, so you can see what called what.
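As a rough mental model of answers 1 and 2, a debugger can maintain both pieces of information while it watches instructions execute. The Python sketch below is only an assumption about how such bookkeeping could work; the `CallStack` class and its method names are invented here and are not Mesen-S's code.

```python
# Sketch: call stack and "sub start" bookkeeping driven by executed instructions.

class CallStack:
    CALLS = {"JSR", "JSL"}
    RETURNS = {"RTS", "RTL", "RTI"}

    def __init__(self):
        self.frames = []         # (source address, target address, kind)
        self.sub_starts = set()  # addresses reached by a JSR/JSL

    def on_instruction(self, addr, mnemonic, target=None):
        """Call once per executed instruction."""
        if mnemonic in self.CALLS:
            self.frames.append((addr, target, mnemonic))
            self.sub_starts.add(target)
        elif mnemonic in self.RETURNS and self.frames:
            self.frames.pop()

    def on_interrupt(self, addr, handler, kind):
        """Call when the CPU enters an interrupt handler (NMI, IRQ, BRK, ...)."""
        self.frames.append((addr, handler, kind))


# Hypothetical addresses, purely for the example.
cs = CallStack()
cs.on_instruction(0x8010, "JSR", target=0x8100)
cs.on_instruction(0x8105, "JSL", target=0x018000)
depth_inside = len(cs.frames)
cs.on_instruction(0x018020, "RTL")
cs.on_interrupt(0x8108, 0x9000, "NMI")
```

In this model, an entry is only removed when a return instruction executes, so anything that enters a handler or routine over and over without returning keeps adding lines.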
3. I have no idea what you're talking about with this one. Do you have more information? Like what game/addresses are involved?
Is case #3 jumping to the start of the NMI or IRQ handler?
ansarya wrote:
1. The "sub start" lines indicate the start of a routine. That means that a JSR or JSL command jumped to that location.
Then I'm not sure if it's bugged because in the code that I'm debugging it has:
Code:
LDA #$FF
STA $4305
--- sub start ---
STA $4306
--- sub start ---
LDA #$01
STA $420B
ansarya wrote:
Do you have more information? Like what game/addresses are involved?
Just some code that I've written. I just can't think of a reason why a jump from the subroutine to an earlier point would occur.
Code:
...
LDA #$18
STA A1T0L
LDA #$80
STA A1T0H
STZ A1B0
LDA #$FF ; This is where the code execution ends up
STA DAS0L
STA DAS0H
LDA #%00000001
STA MDMAEN
DEX
BPL MemClear
LDA #$00
TCD
PHA
PLB
TAX
DEX
TXS
REP #$10
.i16
JSR RegisterSetup
... some other code here
RegisterSetup:
LDX #$0000
LDA #$80 ; Code jumps from here
STA inidisp
STA INIDISP
LDA #$03
STA obj_sel
STA OBSEL
STX OAMADDL
LDA #$09
STA bg_mode
STA BGMODE
STZ MOSAIC
... more code
RTS
EDIT: The program keeps doing this for as long as I leave it running.
Are you sure an interrupt vector isn't set to that address?
Nicole wrote:
Are you sure an interrupt vector isn't set to that address?
I'll have to check that, but I'm pretty sure no, since they were defined with labels. All interrupts should be disabled at that point anyway since it's part of the reset code.
EDIT: Interrupts should be ruled out now. The address the program execution jumps to is $00:803A, and none of my interrupt vectors point to that address. Even if it were an interrupt, I can't think of a reason why the code would jump there from exactly the same point every single time. The jump always seems to happen at $00:80B0 according to the call stack, which is LDX #$00.
EDIT 2: Not sure why, but the debugger now displays the code differently. Now I sort of understand why the code jumps. The code was interpreted as having a BRA instruction in it, which causes the jump.
Here is a short section from the start of the subroutine. It shows the hex (from the compiled file), the correct disassembly (from the source code, which is also what the debugger displayed earlier), and what the debugger shows and executes now. I also included the states of the M and X bits from the debugger (which are as intended).
Code:
M = 1    A = 8 bit
X = 0    X/Y = 16 bit

HEX         Correct       Debugger
A2 00 00    LDX #$0000    LDX #$00
A9 80       LDA #$80      BRK
85 08       STA $08       BRA $00803B
8D 00 21    STA $2100     PHP
A9 03       LDA #$03      STA $2100
85 09       STA $09       LDA #$03
8D 01 21    STA $2101     STA $09
                          STA $2101
...         ...           ...
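The two columns can be reproduced by decoding the same bytes twice, once with a 16-bit and once with an 8-bit immediate for LDX. The Python sketch below is just that experiment; the `TABLE` entries cover only the opcodes in this listing, `decode` is a name made up for the example, the accumulator is assumed to stay 8-bit throughout, and the base address $80B0 is the one reported by the call stack above.

```python
# Sketch: decode the listing's bytes under both index-register widths.

# opcode -> (mnemonic, operand bytes, True if the immediate follows the X flag)
TABLE = {
    0x00: ("BRK", 1, False),      # BRK is followed by a signature byte
    0x08: ("PHP", 0, False),
    0x80: ("BRA", 1, False),
    0x85: ("STA dp", 1, False),
    0x8D: ("STA abs", 2, False),
    0xA2: ("LDX #", 1, True),     # 2 operand bytes when X/Y are 16-bit
    0xA9: ("LDA #", 1, False),    # 8-bit accumulator assumed (M = 1)
}

CODE = bytes.fromhex("A20000A98085088D0021A90385098D0121")


def decode(code, base, index_16bit):
    """Return one mnemonic string per instruction, resolving BRA targets."""
    lines = []
    pc = 0
    while pc < len(code):
        name, size, follows_x = TABLE[code[pc]]
        if follows_x and index_16bit:
            size += 1
        next_addr = base + pc + 1 + size
        if name == "BRA":
            offset = code[pc + 1]
            if offset >= 0x80:
                offset -= 0x100   # branch offsets are signed 8-bit
            name = f"BRA ${(next_addr + offset) & 0xFFFF:04X}"
        lines.append(name)
        pc += 1 + size
    return lines


correct = decode(CODE, 0x80B0, index_16bit=True)
shifted = decode(CODE, 0x80B0, index_16bit=False)
```

Treating LDX as a 2-byte instruction shifts everything after it by one byte: the leftover `00` becomes BRK (which swallows the `A9`), and `80 85` becomes a backwards BRA to $803B. The decoder then resynchronizes at `8D 00 21`, which matches the "Debugger" column.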
Will you add support for Mega Man X3 Zero Project? Game doesn't boot sadly.