Hi
I posted in 2011 about a program to disassemble NES games and make SNES roms nearly automatically. It was not finished: the assembler code replacing the PPU calls and the code for the interrupt vectors was not working. Many test roms did not pass, and I stopped there.
Today a friend told me to put something on GitHub before tonight, so I have chosen this project; nothing to lose.
I have released it under the GPL licence and uploaded it to GitHub at
https://github.com/mandraga/upernes.
For now it is mostly a tech demo. The C code works fine; it is the asm files that do not work that much anymore.
Edit: I worked on it for 6 months and I have encouraging results on the SNES.
Edit: Super Mario Bros is nearly complete; it needs proper sprite 0 emulation and a fast background refresh.
Edit: Sprite 0 is ok, sound and bg update in progress.
Edit: Sound is working, and Super Mario Bros, Excite Bike and Balloon Fight are emulated.
Is it possible for you to include windows (32bit and 64bit) binaries as well?
I do not want to have to install flex-mingw or something similar because Sourceforge seems to hate using adblockers, and no other trusted mirrors exist right now for flex-mingw!
Hello
I have MSYS2 on my Windows machine and it will not be difficult to "port" it. I will port it in a week or two, when I can work on my computer with Windows and MSYS2. But you can try MSYS2 (it has an installer) and then add all the tools and libs, like flex and bison, using pacman on the command line. Porting from Linux to Windows has become much easier recently.
This is a great idea. I will build it later tonight and see how it works. If you give it sensitivity to mapper PRG bankswitching, maybe my dream of seeing Gimmick on the SNES can come true.
Hi
Today I ported it to windows64 using MSYS2.
Recompilation works except when a port is accessed with an addressing mode other than absolute. Indirect port accesses may be used by some roms' audio code (SNDSQR1, SNDSQR2).
The binary is called upernes.exe in the source folder.
Usage is: ./upernes.exe romname.nes
The new source code and CHR files are in outsrc/. Copy them to asm/ and build the snes rom.
It looks like the exes compiled with MSYS are not easy to redistribute: when MSYS is not installed, they lack a lot of DLLs, and good luck finding which DLL is needed. But I will try to put the proper DLLs in the bin directory.
I think you can link to libgcc statically with mingw. You should be able to static link to others as well, and this is a good situation for static linking.
Yes, I saw this while resolving the problem yesterday. However, I pushed the needed DLLs to source/binw64/.
Add an outsrc/ folder to binw64 and you should be able to call "upernes.exe romname.nes". What you see in the graphic window is the code already parsed. I do not remember what the colors mean; they should be data/code areas. Maybe I will add audio port and graphic port accesses in other colors.
I do not know whether it is common practice to keep compiled binaries in a git repository. Is it better to put them on another web page? I am used to SourceForge, where you have both source and binaries. Can someone clarify this?
Currently upernes.exe exits if it finds a non-absolute access to a hardware port. If a rom writes to a port using the port's absolute address, the write is transformed into a call to an emulation routine on the SNES. But if a rom writes to the port using indexed addressing, the code to replace this indexed access still needs to be added to the C++ asm rewriting functions.
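A minimal C sketch of that distinction, limited to STA for brevity (the enum and function names are hypothetical, not the ones in recompileIO.cpp): only the absolute form has a port address known at conversion time. Indexed forms are flagged when their base address is a port, which is only a heuristic (a base just below $2000 plus X could also reach a port), and indirect forms can only be resolved at run time.

```c
#include <assert.h>
#include <stdint.h>

enum port_access { NOT_A_PORT, PORT_ABSOLUTE, PORT_INDEXED, PORT_UNKNOWN_INDIRECT };

/* NES registers: PPU $2000-$2007 (mirrored up to $3FFF) and APU/IO $4000-$4017. */
static int is_io_port(uint16_t a)
{
    return (a >= 0x2000 && a <= 0x3FFF) || (a >= 0x4000 && a <= 0x4017);
}

/* Classify an STA by opcode; only PORT_ABSOLUTE can be replaced statically. */
static enum port_access classify_sta(uint8_t opcode, uint16_t operand)
{
    switch (opcode) {
    case 0x8D: /* STA abs */
        return is_io_port(operand) ? PORT_ABSOLUTE : NOT_A_PORT;
    case 0x9D: /* STA abs,X */
    case 0x99: /* STA abs,Y */
        return is_io_port(operand) ? PORT_INDEXED : NOT_A_PORT;
    case 0x81: /* STA (zp,X) */
    case 0x91: /* STA (zp),Y */
        return PORT_UNKNOWN_INDIRECT; /* target only known at run time */
    default:   /* STA zp / zp,X stay inside zero page */
        return NOT_A_PORT;
    }
}
```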
However, for now the basic test roms must pass again. upernes.exe rewrites the code, but the routines it calls in the assembler files are not working properly. So the plan is to fix the asm port routines and then add the non-absolute port reads/writes.
By the way, now that it works on Windows, do you know a SNES emulator with a good assembler trace and breakpoints? This is crucial to work properly on port emulation. Being too often lost in asm routines is what made me stop working on it.
cc65 keeps source on GitHub and binaries on SourceForge.net.
(I keep wanting to type SourceForget.)
I tried to build this under Debian Linux 64-bit. I had to modify the makefile to specify Linux. It would be better to make your different OS choices for cross-compilation different targets, rather than rely on editing the Makefile.
There are some serious permissions issues in your source folder, too. Everything was marked executable, and I was unable to touch any of the source subdirectories from permissions issues. I had to manually fix permissions before I could use this.
When I ran upernes on Donkey Kong (PRG1), a lot of text went by, a window with a strange red smear on it popped up for a second, then I got this output:
Code:
upernes: recompileIO.cpp:133: void CRecompilateur::outReplaceIOport(FILE*, t_pinstr, Copcodes*): Assertion `addressing = Abs' failed.
Aborted
Hi
I will check those permission issues right after this post.
The text is the rom source code plus the counts of port and memory accesses; this is normal, have a look at it. The image is the rom being disassembled, as I said before.
And yes, it fails if it finds an indirect access to a hardware port.
I updated the repository: no more x rights (I liked how they appeared green in the console). I used a trick from a Slashdot post for the makefile that detects Windows using an environment variable.
Thanks for your comments.
I see, thanks for the quick clarification.
Are there any commercial titles which use only absolute port access? Or will we just have to wait for this functionality to be implemented?
I checked those addressing modes and found an old folder where I had already added absolute X-indexed and absolute Y-indexed addressing, and I merged it with the current branch.
Balloon Fight passes the recompilation process (however, without adding the indirect jumps, if any).
Now I must merge the assembler files containing the new routines, and this is far from working. But I have seen that the snes9x debug version was ported to Linux; this may be great for passing the test roms.
NO$SNS also works in Wine 1.6.2 in Xubuntu 14.04 LTS.
I updated the assembler code. I did not remember having programmed so much; it must be the reason why I have no clear recollection of 2010.
The asm is really extended, with use of DMA, and a lot of the PPU is implemented. However, it does not work well; I will try those debuggers. But enough for now.
Feel free to look at the asm files and memap.txt and comment on the design choices.
How can I get the CRC32 to work on Super Mario Bros. 1? Also, it says the indirect addresses are not for my rom (even though they are for my rom!).
I'll list the addresses via PM if needed!
It was a bug: it compared the long int CRC 0xFFFFFFFFECDA4EDD with 0x00000000ECDA4EDD. It should work now; however, I did not recompile the Windows binaries.
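For anyone hitting the same thing in their own tools, here is a minimal C reproduction of that kind of bug, assuming the CRC passes through a signed 32-bit int before being widened (the function names are hypothetical, not upernes' code). Note that converting 0xECDA4EDD to int32_t is implementation-defined before C23; GCC and Clang wrap it as two's complement.

```c
#include <assert.h>
#include <stdint.h>

/* Reproduces the bug: a CRC held in a signed 32-bit int is sign-extended when widened,
   so 0xECDA4EDD becomes 0xFFFFFFFFECDA4EDD. */
static uint64_t widen_signed_crc(int32_t crc)
{
    return (uint64_t)(int64_t)crc;
}

/* Fix: compare only the low 32 bits (or keep every CRC as uint32_t from the start). */
static int crc_equal(uint64_t a, uint64_t b)
{
    return (uint32_t)a == (uint32_t)b;
}
```

Keeping all CRC values as uint32_t end to end avoids the problem without any masking.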
I will recompile the sources on Sunday, but today I am struggling with too much shit.
I could not update the W64 binaries right now, so I compiled a W32 version. I had to remove the graphic window (because of weird dependency problems with SDL). The CRC works.
Hi, I am proud to announce that the automatic conversion of Balloon Fight using upernes can be played on real hardware.
I need to improve the sprite zero emulation, and I am posting today because I want advice on replacing indirect jumps. I have a problem with the JumpEngine routine in Super Mario Bros.
A post on my blog explains it:
http://blog.vreemdelabs.com/category/upernes/
I am trying to find an automatic way to convert such code, and I want to discuss it with you to find a good solution.
Currently upernes recompiles only the code into the code bank and uses the original PRG ROM in the data bank. However, this does not work with SMB1, because the routine addresses are stored as data, and the call to the routines uses the JSR return address as the base address of the table. But upernes moves this JSR instruction, so not only does JumpEngine point to garbage, but the supposedly unknown indirect jump addresses that are displayed are wrong too.
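To make the relocation problem concrete, here is a minimal C model of the JumpEngine dispatch, assuming the routine from the SMB disassembly (ASL, TAY, pull the return address into $04/$05, read the entry through ($04),Y, then JMP ($06)). JSR pushes its own address + 2, so the pointer table starts 3 bytes after the JSR. The function name is hypothetical, and index is assumed below 128 as in the 8-bit original.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of SMB1's JumpEngine; mem is the full 64 KB CPU address space. */
static uint16_t jump_engine_target(const uint8_t mem[65536], uint16_t jsr_addr, uint8_t index)
{
    uint16_t pushed = (uint16_t)(jsr_addr + 2);        /* JSR pushes its address + 2 */
    uint16_t table  = (uint16_t)(pushed + 1);          /* table starts right after the JSR operand */
    uint16_t entry  = (uint16_t)(table + 2u * index);  /* ASL: two bytes per entry */
    return (uint16_t)(mem[entry] | (mem[(uint16_t)(entry + 1)] << 8));
}
```

If the JSR is moved (for example to $9000, where no table follows it), the same arithmetic reads whatever bytes follow the new location, which is why the displayed destinations are wrong.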
Solutions
Case 1: using the BRK instruction.
This way, the original routine addresses and data could be kept, which would handle any indirect jump problem.
The replaced instructions are like: sta/stx/sty to an IO port and lda/ldy/ldx from an IO port, plus the indirect jumps.
BRK takes 2 bytes, but STA/LDA to IO registers and JMP indirect take 3 bytes. We can use the 2 extra bytes to store the replaced opcode: the opcodes LDA, LDX, LDY, STA, STX, STY and JMP could be coded on 4 bits, plus one byte to code the IO port or the indirect jump code. The 3-byte instructions are replaced with BRK XX XX, so the original code keeps the same size.
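A sketch in C of how a converter could write such a patch, assuming the layout described above (opcode kind in the low 4 bits of the first parameter byte, port or jump code in the second). The enum values and function names are hypothetical. It also shows where execution must resume: on the 6502, BRK pushes its own address + 2, which here is the last parameter byte, so the next real instruction starts one byte after the pushed address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-bit codes for the patched instruction kinds. */
enum patched_op { OP_LDA, OP_LDX, OP_LDY, OP_STA, OP_STX, OP_STY, OP_JMP_IND };

/* Replace a 3-byte "op $port" instruction with BRK XX XX, keeping the code size. */
static void encode_brk_patch(uint8_t out[3], enum patched_op op, uint8_t port_code)
{
    out[0] = 0x00;                 /* BRK opcode */
    out[1] = (uint8_t)(op & 0x0F); /* instruction kind on 4 bits */
    out[2] = port_code;            /* IO port index or indirect jump id */
}

/* BRK pushes (BRK address + 2); the instruction after the 3-byte patch is one byte later. */
static uint16_t resume_address(uint16_t pushed_pc)
{
    return (uint16_t)(pushed_pc + 1);
}
```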
The interrupt vectors must then point to an area that is unused on the NES. From there it switches to native mode and changes bank to execute the IO port or jump emulation code (which only checks that the jump address was known at conversion time).
The code in ram could be a native mode change plus a jsl (jump subroutine long) going to another bank.
All the emulation mode interrupts will need to be handled the same way, because the Super Mario data goes up to the interrupt vectors and leaves no room for the native vectors.
Case 2: the recompiled code must be at known addresses.
The JumpEngine calls must be at the same offsets as in the original PRG ROM. Its address will be indicated in the indirect jumps file, and once it is known every call to it will be correct.
But this could not work for other games.
Case 3: a mix of the two, where every routine call address is kept at its original address. The assembler must copy the emulation code into the blank spaces left where the data was. I assume it won't be possible.
Case 1 seems to be mandatory. I do not know if it can be done from bank 1 in emulation mode, because interrupts seem to be handled only in bank 0?
Will it be possible to boot from bank 0 and process interrupts from bank 1 or 2 while the CPU is in emulation mode?
It seems that, in order to run the routines at the same addresses, the patched code must be in bank 0 so that interrupts execute properly. This leaves little space for the SNES init and IO emulation code, so it will be in another bank.
Bank 0: patched PRG ROM
Bank 1: original PRG ROM for data
Bank 2: init and IO snes code
It boots in bank 0, where it goes to code installed in a data 'hole' and then to the NES init vector:
Code:
free@:
sei ; disable interrupts
clc
xce ; native mode
jml SnesInit ; long jump to the other bank
Then it initialises everything in SnesInit from bank 2 and calls jml EndSnesInit:
Code:
EndSnesInit:
sec
xce ; emulation mode
jmp NesInitVector
This will take 12 bytes somewhere in bank 0 (7 bytes for the entry code and 5 for EndSnesInit).
And the BRK code will work on the same principle, jumping to bank 2 in native mode and going back to bank 0 for an RTS in emulation mode.
The other solution, if I want to run the NES PRG from another bank, is to replace the RTI instructions with custom code going back to the proper bank. I found out that all 256 65C816 instructions are available in emulation mode; maybe that can help.
Having the patched PRG ROM in bank 1 or 2 would make interrupts (Reset, NMI and BRK) easier. I prefer this solution, but I must find a clean way to RTI back to the original code once in bank 0.
Edit: When an interrupt occurs in emulation mode from a PRG bank other than bank 0, the original bank is lost. According to the Programmer's manual, a long jump to an RTI in the original bank will work, restoring the bank number.
And for BRK, the RTI returns to the BRK address + 2 (BRK pushes PC + 2), which here is the last "parameter" byte. Therefore, pulling the registers and doing a long jump to the saved return address + 1 should do it. The tricky part is: can this be done while in emulation mode?
Edit2:
I tried this code
Code:
sec
xce ; Emulation
jml next
.ENDS
.BANK 3
.ORG 0
.SECTION "Other"
next:
It compiles, so I assume a long jump also works in emulation mode. And because it does not change the status register, it could be used to go to a static address like the NES reset vector while in emulation mode.
And the "Absolute Indirect Long Addressing" jump could be used to come back from a BRK while in emulation.
This looks promising: it would allow having the SMB1 patched PRG in bank 1 or 2 and running the emulation in bank 0.
Bank 0: init and IO snes code (executed in native mode, no interrupts in native mode)
Bank 1: patched PRG ROM (executed in emulation mode, interrupts go to the bank 0 vector routines which return to bank 1)
Bank 2: original PRG ROM for data
I worked on the BRK patching method. But I believe it will not work because a BRK cannot occur while interrupts are disabled.
Maybe I should use a replacement like this: a jsr stalongjump going to a data area of the code, with a jml there going to the first bank, and then return to the jsr area.
Therefore: emulation code in bank one, patched PRG in bank 2, and original PRG in bank 3.
Edit:
Maybe I do not need two PRG banks: I could use one patched PRG bank with jumps to code in SNES RAM instead of a BRK #Code, NOP.
The patch would replace lda $2000 with something like jsr $08A0 ($08A0 in RAM), and the RAM area would contain the binary code for something like:
Code:
php, pha, lda #signaturecode, jml staticRoutineAddress
It would take 8 bytes and be faster and safer than the BRK solution.
And staticRoutineAddress will call the proper emulation routine given the signaturecode in A.
The return would be made with a long return prepared on the stack: pulling the return address and adding the bank before the RTL.
Second edit:
Each signaturecode must have its own routine, so it will use a lot of RAM (more than 200 bytes), but it will fit in WRAM.
This binary file would be created by upernes.exe, stored in the last bank and loaded at boot.
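A sketch, in C, of how a converter could emit that RAM table, assuming an 8-bit accumulator so that LDA #imm is 2 bytes (PHP = $08, PHA = $48, LDA # = $A9, JML long = $5C). The names emit_stub, build_stub_table and stub_address, and the one-stub-per-signature layout, are assumptions for illustration. Each stub is 8 bytes, so 26 signatures already take 208 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STUB_SIZE 8

/* One stub: PHP; PHA; LDA #signature; JML target (65816 opcodes, 8-bit A assumed). */
static void emit_stub(uint8_t *out, uint8_t signature, uint32_t target)
{
    out[0] = 0x08;                         /* PHP */
    out[1] = 0x48;                         /* PHA */
    out[2] = 0xA9;                         /* LDA #imm */
    out[3] = signature;
    out[4] = 0x5C;                         /* JML absolute long */
    out[5] = (uint8_t)(target & 0xFF);
    out[6] = (uint8_t)((target >> 8) & 0xFF);
    out[7] = (uint8_t)((target >> 16) & 0xFF);
}

/* Build the whole table; returns the number of bytes written. */
static size_t build_stub_table(uint8_t *out, size_t nsig, uint32_t target)
{
    for (size_t i = 0; i < nsig; i++)
        emit_stub(out + i * STUB_SIZE, (uint8_t)i, target);
    return nsig * STUB_SIZE;
}

/* Address patched into the PRG: jsr ram_base + signature * 8. */
static uint16_t stub_address(uint16_t ram_base, uint8_t signature)
{
    return (uint16_t)(ram_base + signature * STUB_SIZE);
}
```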
However SMB and Balloon Fight do not disable interrupts.
Does anyone have advice on running code from RAM?
Just coded the RAM jump approach. And it seems to be much better, but nothing on screen yet. It goes to the emulation code in BANK0 and comes back to the patched PRG in BANK1. It is quicker than the BRK method.
The technique of patching IO accesses with jumps to code in RAM works. It takes a lot of RAM (more than 200 bytes), but it works. I still have a problem somewhere, but it shows the title screen of Balloon Fight.
This should solve the JumpEngine problem for SMB.
Edit: Balloon Fight works, a bit slower but I can optimise later.
Super Mario Bros uses a trick to jump over instructions (code from the complete SMB disassembly on GitHub):
Code:
MoveAllSpritesOffscreen:
ldy #$00 ;this routine moves all sprites off the screen
.db $2c ;BIT instruction opcode
MoveSpritesOffscreen:
ldy #$04 ;this routine moves all but sprite 0
lda #$f8 ;off the screen
The BIT instruction takes the bytes of ldy #$04 as its operand (it becomes BIT $04A0, a harmless read). And because lda sets the flags again, it is like jumping over the ldy instruction.
But when I disassemble it, my program gets confused when it finds a jump to the MoveSpritesOffscreen label, because it is like jumping into the middle of an instruction.
I should use an alternative path to handle it. However because I patch the ROM, it is not critical.
I will assume that every label after this bit instruction is not an indirect jump or an access to an IO port. And just write a warning on the console.
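A minimal C sketch of how that warning could be detected: do a linear sweep, mark the bytes where instructions start, and flag any jump target that is not a start. The opcode length table only covers the opcodes of this example and is an assumption; a real disassembler needs all 256. Offsets are relative to MoveAllSpritesOffscreen (A0 00, 2C, A0 04, A9 F8), so MoveSpritesOffscreen is offset 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Partial 6502 length table (assumption: only the opcodes used here). */
static int op_length(uint8_t op)
{
    switch (op) {
    case 0xA0: /* LDY #imm */
    case 0xA9: /* LDA #imm */
        return 2;
    case 0x2C: /* BIT abs */
        return 3;
    default:
        return 1;
    }
}

/* Linear sweep from offset 0; is_start[i] = 1 where an instruction begins. */
static void mark_starts(const uint8_t *code, size_t n, uint8_t *is_start)
{
    for (size_t i = 0; i < n; i++)
        is_start[i] = 0;
    for (size_t pc = 0; pc < n; pc += (size_t)op_length(code[pc]))
        is_start[pc] = 1;
}

/* A jump target that is not an instruction start lands mid-instruction: warn about it. */
static int lands_mid_instruction(const uint8_t *is_start, size_t n, size_t target)
{
    return target < n && !is_start[target];
}
```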
Super Mario Bros works!
(more or less)
I have to improve speed and the sprite zero hit flag. Every BG tile is updated during VBlank; I must use flags to check whether something changed.
Awesome! I'm glad to see this project is back.
I'm still an idiot when it comes to makefiles, but I gave it a shot, and this is the error I get:
Code:
mingw32-make: *** No rule to make target 'init.asm', needed by 'init.o'. Stop.
I'm on Win7-64, I have the WLA files in the path, I copied my NES file to workdir, ran "convert.sh rom.nes", then "mingw32-make" and I get that error. I'm also running mingw32-make instead of make, because I have some weird old version of make.exe file in my path that I haven't bothered to remove.
The Makefile does seem to depend on a bunch of *nix tools, like rm and cat, so you might have limited success on Windows without some extra work.
If you happen to have Windows 10, I've found Bash on Windows to work quite well in general.
First, I checked whether a fresh clone worked on my system, and it does.
I use MSYS2, and it is heavily scripted: the makefile is called by convert.sh and the file names are passed through the environment. The makefile takes $(ROM_NAME) as a parameter.
The script copies the asm files from ../asm/, then calls make and finally deletes the asm files. This is why calling make directly does not find init.asm.
Running convert.sh from the source/workdir/ directory should do all the work; you should find a 'romname'.fig in the directory. There is no need to call make after convert.sh. It is simpler like this, because otherwise I was sometimes editing the wrong files and compiling a mess from different roms (like running Balloon Fight with the CHR data of SMB1).
I am glad you follow the project, Memblers, because I must add your APU emulator to it.
Now I must find a way to update the nametable data quickly enough. I use a 4 KB RAM buffer to store the nametable data and copy it using DMA during VBlank. This is slow; I would prefer to update it when the IO port is accessed on the NES, but that does not seem to work outside VBlank on the SNES. Plus, the RAM method is clean. But I must find a way to update only what changed (even Donkey Kong is slow). Maybe with an 8-bit bitmask per line x 30 lines x 2 banks, i.e. 60 bytes of bitmask, and take advantage of the DMA.
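A minimal C sketch of that 60-byte dirty map, assuming one bit per group of 4 tile columns and that the caller has already folded the $2007 address into one of the two physical nametable banks plus a 0-$3FF offset (the names are hypothetical). A convenient property: an attribute byte covers exactly 4 tile columns, so each of the 8 attribute columns maps onto one bit, and an attribute write just sets that bit on its 4 tile rows.

```c
#include <assert.h>
#include <stdint.h>

#define NT_ROWS 30

/* Hypothetical dirty map: one byte per tile row, one bit per group of 4 tile columns. */
static uint8_t dirty[2][NT_ROWS];

/* bank: physical nametable (0 or 1) after mirroring; offset: 0..0x3FF inside it. */
static void mark_dirty(int bank, uint16_t offset)
{
    if (offset < 960) {                    /* name byte: 32 x 30 tiles */
        int row = offset / 32, col = offset % 32;
        dirty[bank][row] |= (uint8_t)(1u << (col / 4));
    } else {                               /* attribute byte: covers 4 x 4 tiles */
        int a = offset - 960;              /* 0..63, 8 per attribute row */
        int arow = a / 8, acol = a % 8;
        for (int r = arow * 4; r < arow * 4 + 4 && r < NT_ROWS; r++)
            dirty[bank][r] |= (uint8_t)(1u << acol);
    }
}
```

During VBlank, only the rows (or 4-tile groups) whose bits are set would be sent by DMA, then the map cleared.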
By the way, you need to add a ton of indirect addresses to the 'super mario'.txt file containing the indirect jumps:
Here is my file (the crc is used to assert that we are using the same ROM dump):
crc32: $F2DB8422
IndirectJump: $06
addr: $8231
addr: $8FCF
addr: $8567
addr: $858B
addr: $859B
addr: $8652
addr: $865A
addr: $8693
addr: $86A8
addr: $86E6
addr: $93FC
addr: $88AE
addr: $92DB
addr: $9B0E
addr: $9A2E
addr: $85BF
addr: $85E3
addr: $8643
addr: $86FF
addr: $8732
addr: $8749
addr: $9061
addr: $8245
addr: $9131
addr: $B069
addr: $B0E9
addr: $B35A
addr: $AEDC
addr: $8FE4
addr: $889D
addr: $9071
addr: $AEEA
addr: $B376
addr: $C2F1
addr: $C8E0
addr: $CA77
addr: $98E5
addr: $B36D
addr: $BDD2
addr: $BC85
addr: $B233
addr: $9B01
addr: $BB38
addr: $B269
addr: $9B41
addr: $91CD
addr: $B245
addr: $9B14
If your rom is not the same dump, add each unknown address reported by the converted rom and convert it again until it works.
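A minimal C sketch of a line parser for the file format shown above (crc32:, IndirectJump:, addr:). The struct layout and the 256-address limit are assumptions, not upernes' actual loader.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical in-memory form of the indirect jump file. */
struct ij_file {
    unsigned crc32;
    unsigned ind_jump_ptr;   /* zero-page pointer used by JMP (ptr), e.g. $06 */
    unsigned addrs[256];
    int naddrs;
};

/* Parse one line; returns 1 if it was recognised, 0 otherwise. */
static int ij_parse_line(struct ij_file *f, const char *line)
{
    unsigned v;
    if (sscanf(line, " crc32: $%x", &v) == 1) { f->crc32 = v; return 1; }
    if (sscanf(line, " IndirectJump: $%x", &v) == 1) { f->ind_jump_ptr = v; return 1; }
    if (sscanf(line, " addr: $%x", &v) == 1) {
        if (f->naddrs < 256)
            f->addrs[f->naddrs++] = v;
        return 1;
    }
    return 0;
}
```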
Improving the speed of the nametable writes:
Updating VRAM by column:
We update columns of tiles. 8 columns; they can be updated by DMA using a 32-word increment.
Updating VRAM by line:
We update lines. Same thing, but with a one-by-one increment, 30 lines.
This can be done through DMA (10 times faster than a loop).
However, the line or column must be identified. The background mirroring bit can help.
2 columns or lines will be updated at each NMI, so as not to overload the console. It can be done by DMA or by HDMA.
Bit masks can determine what to update.
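A sketch in C of the two kinds of transfer jobs, assuming a 32x32-word SNES tilemap starting at map_base. On the SNES the VRAM address increment (1 or 32 words) is selected through VMAIN ($2115); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one VRAM DMA job. */
struct vram_job {
    uint16_t word_addr;  /* first VRAM word to write */
    uint8_t  step;       /* VRAM address increment in words: 1 (line) or 32 (column) */
    uint8_t  count;      /* number of words to transfer */
};

/* Line update: 32 consecutive words, increment by 1. */
static struct vram_job row_job(uint16_t map_base, int row)
{
    struct vram_job j = { (uint16_t)(map_base + row * 32), 1, 32 };
    return j;
}

/* Column update: 30 visible rows, increment by 32. */
static struct vram_job column_job(uint16_t map_base, int col)
{
    struct vram_job j = { (uint16_t)(map_base + col), 32, 30 };
    return j;
}
```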
However, I checked whether it was really the BG update that was slowing down the program. I removed it from the NMI routine and it made no difference to the speed; maybe it's the sprite zero emulation? Balloon Fight, which does not use it, is always fluid.
Edit:
By the way, the sprite 0 emulation does not work well on the test rom; something is wrong with it, or with writing to the scrolling registers while rendering. Sometimes it works, sometimes it stays at (0, 0). That's why the screen is shaking so much.
Maybe the scrolling values should be always written using the HDMA.
What do you update if the game changes an attribute byte and nothing else?
It's straightforward for a game to squeeze in full screen updates to the attribute tables of both nametables. I can make an attribute update torture test ROM if you want.
Every name entry must be rewritten when the attributes change, because of the SNES background tile format: the palette is stored in each tilemap entry, so otherwise the tiles would use only one palette instead of 4.
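A C sketch of how one SNES tilemap entry could be built from a NES name byte and its attribute byte. It uses the standard NES attribute layout (2 bits per 2x2-tile quadrant) and the SNES BG entry format vhopppcc cccccccc (tile number in bits 0-9, palette in bits 10-12). Mapping NES BG palette n to SNES palette n and the tile_base offset are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* NES attribute byte -> 2-bit palette for the tile at (col, row). */
static unsigned nes_tile_palette(uint8_t attr, int col, int row)
{
    int shift = ((row & 2) << 1) | (col & 2);   /* 0, 2, 4 or 6 */
    return (attr >> shift) & 3u;
}

/* SNES BG tilemap entry with the palette taken from the attribute byte. */
static uint16_t snes_map_entry(uint8_t name, uint8_t attr, int col, int row, unsigned tile_base)
{
    unsigned tile = (tile_base + name) & 0x3FFu;
    unsigned pal  = nes_tile_palette(attr, col, row);
    return (uint16_t)(tile | (pal << 10));
}
```

This is why an attribute change touches up to 16 entries: every tile under the changed attribute needs its palette bits rewritten.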
However, the full bank copy to VRAM seems to be fast enough. I assumed it was not, and I was wrong: "assume nothing" when optimising. According to the documentation, DMA can copy up to 6 KB during VBlank.
The sprite 0 flag and the changes to the scroll registers during rendering seem to be the main problem.
I checked the conversion of my sprite 0 test rom (t9_sprite0.sh) and it shows the same problem. Scrolling during rendering seems to work, but it is as if it misses sprite zero hits.
And stopping the program at random points shows that it updates the sprite CHR data in VRAM all the time.
I will fix this.
The sprite 0 emulation looks fine; the problem is with the PPU control register: its emulation takes too long to call, and it misses frames when looking for the sprite zero hit and VBlank.
I tried the test rom on the console and the picture is fine, and it is fast, while it flickers on bsnes-plus.
On the other hand, SMB flickers on real hardware just like on the emulator, and it seems slow.
The title screen of SMB does not show, I read that it could be because of some missing delay in PPU reads.
Here are the reads performed by SMB:
.DW rlda_2002
.DW rldx_2002
.DW rlda_2007 <- it works, the latching was tested
The mirroring must be emulated by making a copy of nametable bank 1 after bank 2.
Nametable bank 2 is not very clean; the lower part is missing.
Other tests:
Donkey kong still slow.
Battle city shows something.
Excite bike shows the title screen, need to add more indirect addresses.
Galaga does nothing
Ice climber hangs
So it looks like something is slowing everything down, but it no longer shows on the sprite 0 test rom.
I just fixed the nametable mirroring problem and the palette mirroring.
The nametable mirroring is achieved by copying the first bank after the second one. This is slooow; I must implement the line and column update.
Second correction, I used the proper way of mirroring without adding a copy.
The sprite 0 hit system shakes: it misses VBlanks because of the nametable copy at each NMI. I will update the names only by columns. The mirroring direction is in the rom file header, so it will be vertical/line or horizontal/lines updating.
Enabling the video sync on bsnes plus solves the problem of blinking images on the sprite 0 test ROM.
Another thing to add is a branch tree for the indirect jumps. There are hundreds of indirect jump addresses in SMB1, and testing each one slows everything down.
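A branch tree is essentially a binary search over the sorted destination addresses. Here is the search logic sketched in C (the names are hypothetical); generated 65816 code would unroll the same comparisons, needing about log2(n) tests instead of n, i.e. about 6 for the 48 SMB1 addresses listed earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for 16-bit addresses. */
static int cmp_u16(const void *a, const void *b)
{
    uint16_t x = *(const uint16_t *)a, y = *(const uint16_t *)b;
    return (x > y) - (x < y);
}

/* Index of target in the sorted table, or -1 for an unknown destination. */
static int find_jump(const uint16_t *sorted, size_t n, uint16_t target)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (sorted[mid] < target)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < n && sorted[lo] == target) ? (int)lo : -1;
}
```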
Once a lot of indirect jumps have been detected and every system side of the program is running, it is no longer necessary to check the indirect jump destination addresses. An option would disconnect the indirect jump check routine and leave the indirect jumps unpatched, like adding IndJumpDisable: 1 in the indirect jump file.
And a scan of every byte in the program could count probable missing IO accesses or missing indirect jumps, to check for missing patches.
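A minimal C sketch of such a scan, assuming it only looks for absolute LDA/LDX/LDY/STA/STX/STY with a port operand and for JMP ($6C) indirect. Because it checks every byte offset, data and misaligned operands will produce false positives, so the counts are only an upper estimate; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* NES registers: PPU $2000-$2007 (mirrored up to $3FFF) and APU/IO $4000-$4017. */
static int is_io_port(uint16_t a)
{
    return (a >= 0x2000 && a <= 0x3FFF) || (a >= 0x4000 && a <= 0x4017);
}

/* Absolute-mode loads/stores. */
static int is_abs_load_store(uint8_t op)
{
    switch (op) {
    case 0xAD: case 0xAE: case 0xAC:   /* LDA, LDX, LDY abs */
    case 0x8D: case 0x8E: case 0x8C:   /* STA, STX, STY abs */
        return 1;
    default:
        return 0;
    }
}

/* Count byte patterns that look like port accesses or JMP (ind). */
static void scan_prg(const uint8_t *prg, size_t n, unsigned *io_hits, unsigned *jmp_ind_hits)
{
    *io_hits = 0;
    *jmp_ind_hits = 0;
    for (size_t i = 0; i + 2 < n; i++) {
        uint16_t operand = (uint16_t)(prg[i + 1] | (prg[i + 2] << 8));
        if (is_abs_load_store(prg[i]) && is_io_port(operand))
            (*io_hits)++;
        else if (prg[i] == 0x6C)
            (*jmp_ind_hits)++;
    }
}
```

Comparing these counts with the number of patched accesses gives a rough hint that something was missed.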
Patrick FR wrote:
I will update the Names only by columns. The direction of mirroring is in the rom file header, therefore it will be vertical/line or horizontal/lines updating.
I could make a torture test ROM for that too.
A torture rom is not required to solve the problem, but it could be fun to try to pass it: for example a rom with vertical scrolling and a rom with horizontal scrolling, and of course attribute table changes.
I tried to reduce the cost of updating the background in VRAM, and the column technique is not that great because it does not use DMA. A line approach seems better because it uses DMA.
However, even when reducing the cost of the transfer, the sprite zero emulation glitches, even without handling the indirect jumps. It is faster, like on the original, but it glitches a lot. It is as if it misses the end of VBlank or the collision. It could also be a bug.
I have to look at it.
I had a problem in PPUSTATUS: I assumed that the VBlank bit was not set when NMI was disabled...
I fixed that, but it still glitches.
Edit: it seems to be the values written to the register that cause the problem: one time it is 4 and the other time it is 260 (the correct value). I must find out why incorrect values are written to the PPUSCROLL register. Maybe it is because 260 = $0104 and the highest bit is lost. This bit seems to change a lot.
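For reference, the NES horizontal scroll is effectively 9 bits: the first PPUSCROLL write gives bits 0-7, and PPUCTRL bit 0 (nametable select) acts as bit 8, so 260 = $0104 is X = 4 with the right-hand nametable selected. A small C sketch, assuming the SNES BG uses a 64x32 tilemap with the second NES nametable at x = 256, and that the value is written as two bytes (low, then high) to a BGnHOFS register; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 9-bit NES horizontal scroll: PPUCTRL bit 0 is bit 8, PPUSCROLL X is bits 0-7. */
static unsigned nes_hscroll(uint8_t scroll_x, uint8_t ppuctrl)
{
    return ((unsigned)(ppuctrl & 0x01u) << 8) | scroll_x;
}

/* Split into the two bytes written in sequence to a SNES BGnHOFS register. */
static void snes_hofs_bytes(unsigned hscroll, uint8_t *lo, uint8_t *hi)
{
    *lo = (uint8_t)(hscroll & 0xFF);
    *hi = (uint8_t)((hscroll >> 8) & 0x03);
}
```

With PPUCTRL = $91 and X = 4 this gives 260; if the emulation sees PPUCTRL = $90 at that moment, it gives 4, which matches the two values observed.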
I will try to fix the title screen first. It does not show the "Super Mario Bros." picture, nor the 1 or 2 players selection. Does anyone know what could be the cause? By cause I mean an IO read with a peculiar behaviour.
The title screen of Super Mario Bros. relies on the ability to read CHR ROM while rendering is off.
When you read video memory using $2007, the PPU returns the last value read from video memory and then reads the next value from VRAM. This means if I set the video memory address to $0F00, I have to read once and throw that value away:
Code:
lda #$0F
sta $2006
lda #$00
sta $2006 ; VRAM address = $0F00
lda $2007 ; A := last byte (usually garbage), and last byte := value at $0F00
lda $2007 ; A := last byte (value at $0F00), and last byte := value at $0F01
lda $2007 ; A := last byte (value at $0F01), and last byte := value at $0F02
Because NES and Super NES tiles are in different formats, and because Super NES video memory is word-addressed, literal translation of code that reads CHR ROM will probably fail.
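A minimal C model of that read buffer, assuming a flat 16 KB VRAM array with no mirroring (the struct and function names are hypothetical). One detail: reads in the palette range $3F00-$3FFF are returned immediately, and the buffer is filled with the nametable byte underneath instead. In an emulation routine, the value to return must be saved before the address increment is emulated.

```c
#include <assert.h>
#include <stdint.h>

struct ppu_model {
    uint8_t  vram[0x4000];  /* flat PPU address space, no mirroring (assumption) */
    uint16_t addr;          /* current VRAM address set through $2006 */
    uint8_t  read_buffer;   /* internal $2007 read buffer */
    uint8_t  increment;     /* 1 or 32, from PPUCTRL bit 2 */
};

static uint8_t ppu_read_2007(struct ppu_model *p)
{
    uint16_t a = p->addr & 0x3FFF;
    uint8_t result;
    if (a >= 0x3F00) {                     /* palette: no delay */
        result = p->vram[a];
        p->read_buffer = p->vram[a - 0x1000];
    } else {                               /* everything else: return the old buffer */
        result = p->read_buffer;
        p->read_buffer = p->vram[a];
    }
    p->addr = (uint16_t)((a + p->increment) & 0x3FFF);
    return result;
}
```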
I looked at what my code did (I never fully tested that part): reading the CHR data was ok, but the accumulator value was destroyed by the address increment emulation...
I tend to keep the data in NES format; I am not sure about the CHR data, but it works.
The title screen is all right now. Thanks
I have found why it blinks: SMB's NonMaskableInterrupt code sets the nametable address to $2800 at the beginning.
Then it does its sprite zero detection, but it does not restore it when setting the scrolling values; it is only later, when rendering with PPU_CTRL_REG1 bit 0 set to 1, that the next nametable bank will be used. So it renders ok on the NES, but in my current SNES emulation, the bit in use at the moment of the sprite zero position is used. Therefore I must also update the scrolling registers when the bit changes in PPU_CTRL_REG1.
Edit: I have far fewer glitches on this, but it still goes to horizontal position 0.
It seems that I have a speed problem with what comes before the sprite 0 hit flag. But maybe not: the top still glitches somehow if I make it collide at 96. It means that PPUCTRL1 does not always point to the first nametable bank at the end of VBlank. But it is much better if the HScroll value is below 256 (no glitch at all).
Is there a way to profile SNES code, to know how many cycles and scanlines were used between 2 points?
My way would be to read the V counter when PPUCTRL1 is accessed.
Anyway it seems (from the scroll after 256) that the number of available cycles are not enough to complete the NMI routine before the end of vblank.
What does this "reset flip-flop" at the end of SMB's NMI mean?
Code:
jsr OperModeExecutionTree ;otherwise do one of many, many possible subroutines
SkipMainOper: lda PPU_STATUS ;reset flip-flop
pla
ora #%10000000 ;reactivate NMIs
sta PPU_CTRL_REG1
rti ;we are done until the next frame!
The state of PPU_CTRL_REG1's bit 1 seems to be the cause of the glitches, because the scroll offset changes from 0 to 256 while its value is 00, hence changing banks.
I had many bugs in the Vblank bits, and I had a wait Vblank in my sprite DMA IO emulation.
Finally, instead of a profiler, I used the V line counter in bsnes+ debugger view, to look at how much time was used in the nes NMI routine.
I tried to enable fast mode by writing to the register and setting the rom type in the header ($30). I do not know if it worked. Does anyone know how to test whether a rom is recognised as FastROM?
The glitch seems to come from
Code:
UpdateScreen -> WriteBufferToScreen -> WritePPUReg1 (code from the SMB disassembly project on GitHub)
lda Mirror_PPU_CTRL_REG1 ;load mirror of $2000,
ora #%00000100 ;set ppu to increment by 32 by default
bcs SetupWrites ;if d7 of third byte was clear, ppu will
and #%11111011 ;only increment by 1
SetupWrites: jsr WritePPUReg1 ;write to register
pla ;pull from stack and shift to left again
asl
bcc GetLength ;if d6 of thir
It writes $11 to PPUCTRLREG1 during VBlank, and this "pushes" the score bar to the left, because bit 0 is the high bit of HScroll. It is weird; maybe the screen is disabled on the NES when this UpdateScreen routine is called?
However, the SNES seems to have just enough CPU power to run the emulation. I hope to remove this scrolling glitch.
Hi
I looked at the effect on PPUCTRLREG1 in FCEUX and it behaves the same way (I had not noticed the scanline number until today).
Glitch case:
Code:
Write to PPUCTRL1                          NES line   SNES line
$10 @ Beginning of NMI interrupt routine   239        246 (because of prior DMA copy for sprites and BG)
$11 @ UpdateScreen glitch                  239        290
$11 @ Sprite0 Hit                          31         31
$91 @ End of NMI routine                   95         88 (slightly ahead)
Normal case:
Code:
Write to PPUCTRL1                          NES line   SNES line
$10 @ Beginning of NMI interrupt routine   239        246 (because of prior DMA copy for sprites and BG)
$11 @ Sprite0 Hit                          31         31
$91 @ End of NMI routine                   95         88 (slightly ahead)
UpdateScreen sets the bank to $2400 at line 290, and therefore the score bar does not show up. The score bar is in bank 0; PPUCTRLREG1 should be $10, as configured at line 239 at the beginning of the NMI.
However, the problem does not show up on the NES: the score bar does not disappear.
Can anyone explain why the score bar does not blink on the nes?
It is as if the PPUCTRL1 register write is ignored when rendering is disabled in PPUCTRL2.
Are there writes to $2006 after the write to $2000? It'd overwrite the value set in $2000.
It does both, writing to PPU_ADDRESS and calling WritePPUReg1. It enters the routine at the UpdateScreen label.
Code:
WriteBufferToScreen:
sta PPU_ADDRESS ;store high byte of vram address
iny
lda ($00),y ;load next byte (second)
sta PPU_ADDRESS ;store low byte of vram address
iny
lda ($00),y ;load next byte (third)
asl ;shift to left and save in stack
pha
lda Mirror_PPU_CTRL_REG1 ;load mirror of $2000,
ora #%00000100 ;set ppu to increment by 32 by default
bcs SetupWrites ;if d7 of third byte was clear, ppu will
and #%11111011 ;only increment by 1
SetupWrites: jsr WritePPUReg1 ;write to register <----------------- It writes $11 or $15 here
pla ;pull from stack and shift to left again
asl
bcc GetLength ;if d6 of third byte was clear, do not repeat byte
ora #%00000010 ;otherwise set d1 and increment Y
iny
GetLength: lsr ;shift back to the right to get proper length
lsr ;note that d1 will now be in carry
tax
OutputToVRAM: bcs RepeatByte ;if carry set, repeat loading the same byte
iny ;otherwise increment Y to load next byte
RepeatByte: lda ($00),y ;load more data from buffer and write to vram
sta PPU_DATA
dex ;done writing?
bne OutputToVRAM
sec
tya
adc $00 ;add end length plus one to the indirect at $00
sta $00 ;to allow this routine to read another set of updates
lda #$00
adc $01
sta $01
lda #$3f ;sets vram address to $3f00
sta PPU_ADDRESS
lda #$00
sta PPU_ADDRESS
sta PPU_ADDRESS ;then reinitializes it for some reason
sta PPU_ADDRESS
UpdateScreen: ldx PPU_STATUS ;reset flip-flop
ldy #$00 ;load first byte from indirect as a pointer
lda ($00),y
bne WriteBufferToScreen ;if byte is zero we have no further updates to make here
InitScroll: sta PPU_SCROLL_REG ;store contents of A into scroll registers
sta PPU_SCROLL_REG ;and end whatever subroutine led us here
rts
We have the "reinitializes it for some reason" which looks dubious.
Usually the mirror value of PPUCTRL1 (the copy in RAM) is ANDed with $FE to remove the lower bit at the beginning of the VBlank NMI routine. But here it simply writes it without masking.
I have found this code in the fceux source code (ppu.cpp).
Code:
static DECLFW(B2006) {
    FCEUPPU_LineUpdate();
    PPUGenLatch = V;
    if (!vtoggle) {
        TempAddr &= 0x00FF;
        TempAddr |= (V & 0x3f) << 8;
        ppur._vt &= 0x07;
        ppur._vt |= (V & 0x3) << 3;
        ppur._h = (V >> 2) & 1;
        ppur._v = (V >> 3) & 1;
        ppur._fv = (V >> 4) & 3;
    } else {
        TempAddr &= 0xFF00;
        TempAddr |= V;
        RefreshAddr = TempAddr;
        DummyRead = 1;
        if (PPU_hook)
            PPU_hook(RefreshAddr);
        ppur._vt &= 0x18;
        ppur._vt |= (V >> 5);
        ppur._ht = V & 31;
        ppur.install_latches();
    }
    vtoggle ^= 1;
}
Here _h is the value of PPUCTRLREG1 bit 0 and V is the byte written to $2006. It seems to say that bit 2 of the first write goes to the nametable select bit (bit 0) of PPUCTRL1.
Patrick FR wrote:
It does both: it writes to PPU_ADDRESS and calls WritePPUReg1. It enters the routine at the UpdateScreen label.
Code:
lda Mirror_PPU_CTRL_REG1 ;load mirror of $2000,
ora #%00000100 ;set ppu to increment by 32 by default
bcs SetupWrites ;if d7 of third byte was clear, ppu will
and #%11111011 ;only increment by 1
SetupWrites: jsr WritePPUReg1 ;write to register <----------------- It writes $11 or $15 here
; [snipped several lines]
lda #$3f ;sets vram address to $3f00
sta PPU_ADDRESS
lda #$00
sta PPU_ADDRESS
sta PPU_ADDRESS ;then reinitializes it for some reason
sta PPU_ADDRESS
We have the "reinitializes it for some reason" comment, which looks dubious.
The going theory is that a programmer saw that leaving rendering off with the VRAM address pointed at $3F01-$3F1F caused that color to be sent to the composite output block instead of the color at $3F00. The programmer internalized a (wrong but close enough) model of the hardware in which the CGRAM had a separate address pointer, and writing $3F00 then $0000 initialized both the CGRAM address and the VRAM address, just as it (actually) has separate OAM and VRAM pointers.
In 1999, loopy discovered the skinny on why the last two writes on $2006 keep the status bar from flickering. Let me summarize:
Bits 1-0 of the value written to $2000 get copied into bits 11-10 of t, the top-left corner address. This address is used to reset vertical parts of v, the VRAM address, during the pre-render line's hsync pulse and the horizontal parts of v at the start of hblank. But a pair of writes to $2006 overwrites both t and v. Here, writing $0000 clears both t and v to 0, causing the PPU to read from the first nametable.
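To see why the extra $2006 writes matter, here is a minimal C model of loopy's registers, assuming the bit layout described on the NESdev wiki (15-bit t and v, nametable select in bits 11-10, one write toggle w shared with $2005). Only $2000 writes, $2006 writes and the toggle reset done by reading $2002 are modeled; the type and function names are made up for illustration and do not come from FCEUX or upernes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the PPU's internal scroll/address registers. */
typedef struct {
    uint16_t v;  /* current VRAM address (15 bits) */
    uint16_t t;  /* temporary VRAM address, the "top-left corner" */
    int w;       /* write toggle shared by $2005 and $2006 */
} PpuLoopy;

/* $2000 write: bits 1-0 are copied into t bits 11-10 (nametable select). */
static void ppu_write_2000(PpuLoopy *p, uint8_t value)
{
    p->t = (uint16_t)((p->t & ~0x0C00u) | ((value & 0x03u) << 10));
}

/* Reading $2002 resets the write toggle (SMB's "ldx PPU_STATUS ;reset flip-flop"). */
static void ppu_read_2002(PpuLoopy *p)
{
    p->w = 0;
}

/* $2006 write: the first write sets the high bits of t, the second sets the
 * low byte and copies t into v. */
static void ppu_write_2006(PpuLoopy *p, uint8_t value)
{
    if (!p->w) {
        /* bits 5-0 go to t bits 13-8; bit 14 is cleared by the mask */
        p->t = (uint16_t)((p->t & 0x00FFu) | ((value & 0x3Fu) << 8));
    } else {
        p->t = (uint16_t)((p->t & 0xFF00u) | value);
        p->v = p->t;
    }
    p->w ^= 1;
}
```

Running SMB's sequence through it: writing $15 to $2000 puts 01 in t bits 11-10 (nametable $2400), and the final $00/$00 pair to $2006 clears both t and v, so rendering continues from the first nametable. Bit 2 of a first $2006 write lands in t bit 10, which matches the `_h` field in the FCEUX code quoted above.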
Thanks tepples, this one was rough. The wiki page is detailed; it will help me fix this.
I am currently reworking the transition from the patched PRG to emulation; I will post as soon as it works.
The score bar glitch problem is solved. I have found a speed problem causing a scrolling glitch: the code does not finish before the end of vblank, and it skips a frame while the scroll is at (0, 0). It recovers when the sprite 0 hit flag is set on line 30 of the next frame.
It seems to happen during scrolling when columns are updated in the nametables, but I have also seen it at the end of the map near the flag pole while not moving. An I/O emulation routine can take a lot of time; in one case it takes 35 lines to complete. Maybe I should keep an array of counters, one per I/O call, and reset it at the start of vblank. It would show what is being called.
I checked the calls to the I/O emulation, and sometimes I find 26 writes to PPUMEMDATA or 42 writes to PPUMEMADDR. These calls caused the missed end of vblank.
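The per-call counters described above could look like this in C. It is a sketch under the assumption that each I/O emulation routine calls a hook and the vblank start handler calls another; the enum and function names are hypothetical, and the real emulation code is 65816 assembly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical set of emulated NES I/O registers to profile. */
enum IoReg { IO_PPUCTRL, IO_PPUMASK, IO_PPUSTATUS, IO_PPUADDR, IO_PPUDATA, IO_APU, IO_COUNT };

typedef struct {
    uint16_t calls[IO_COUNT];  /* calls during the current frame */
    uint16_t worst[IO_COUNT];  /* highest per-frame count seen so far */
} IoProfile;

/* Called from each I/O emulation routine. Saturates instead of wrapping. */
static void io_profile_hit(IoProfile *p, enum IoReg reg)
{
    assert(reg < IO_COUNT);
    if (p->calls[reg] != UINT16_MAX)
        p->calls[reg]++;
}

/* Called at the start of vblank: remember the worst frame, then reset. */
static void io_profile_vblank(IoProfile *p)
{
    for (int i = 0; i < IO_COUNT; i++) {
        if (p->calls[i] > p->worst[i])
            p->worst[i] = p->calls[i];
    }
    memset(p->calls, 0, sizeof p->calls);
}
```

Keeping the per-frame maximum, rather than a running total, points directly at the frames that overrun vblank.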
My code was not in bank $80! Therefore FastROM mode was not used. Now it is faster, and the 26 writes to PPUDATA do not cause a glitch. But the 42 PPUMEMADDR writes still cause a missed frame.
The PPUMEMADDR ($2006) write emulation changes the address of the routine called on the next write. Maybe I could put it in RAM and only call the code in the other bank when the toggle is 0 after the write. This will halve the number of calls to bank 0; it should work here.
But I doubt it frees enough cycles to add sound and also get scrolling working, but I will try. The ratio between available cycles and I/O call cost is low.
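For reference, the reason bank $80 matters: on the SNES, cartridge ROM is only read at FastROM speed (6 master cycles per access instead of 8) in banks $80-$FF, and only when bit 0 of MEMSEL ($420D) is set. A minimal C sketch of that rule, restricted to ROM accesses (WRAM, I/O and internal CPU cycles are not modeled; the function names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Master-clock cycles for one CPU access to cartridge ROM in `bank`.
 * Assumes the caller passes a ROM address; other regions are not modeled.
 * memsel_fast is bit 0 of MEMSEL ($420D). */
static int rom_access_cycles(uint8_t bank, bool memsel_fast)
{
    /* Banks $80-$FF can use FastROM timing; $00-$7F always use SlowROM timing. */
    return (bank >= 0x80 && memsel_fast) ? 6 : 8;
}

/* Master cycles spent on `bytes` ROM accesses (e.g. opcode and operand fetches). */
static long rom_fetch_cost(uint8_t bank, bool memsel_fast, long bytes)
{
    assert(bytes >= 0);
    return bytes * rom_access_cycles(bank, memsel_fast);
}
```

So ROM accesses made by code running in bank $80 with MEMSEL set cost 25% fewer master cycles than the same code running in bank $00.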
I moved part of the "sta $2006" emulation into RAM, reducing the calls from 42 to 21. The missing frame is still there, even though it is faster with the latest optimisation. It does not glitch when I return immediately for all 26 writes to PPU RAM, so it is close to working on Super Mario Bros. Maybe a 2KB jump table in the WRAM bank, indexed by PPUADDR, would save a ton of cycles in both the $2006 and $2007 emulation. But it looks like the SNES does not have enough power to emulate scrolling NES games at 100%. I have cut a lot of cycles, and optimisation makes the code less clear. I could get more cycles during vblank by changing the background update to a FIFO instead of a rolling DMA transfer. But all in all, if everything must be optimised, development will be very slow. Also, on the console it shows small rendering mistakes.
Anyway, it works with non scrolling games and Super Mario Bros can be played directly from the conversion.
I will take a look at what Memblers did with the APU emulation, to see how it could fit in, and then I will stop here. It was interesting and it went further than I expected, but it is not an elegant conversion where everything fits (that was my goal if possible). However it is fast: despite the few missing frames, it really feels like the NES. I am not going to squeeze the cycle count of every I/O access. There is no room for a correct PPU emulation and therefore no room for improvement.
Thanks for your help and for the amazing emulators and their integrated debugger. It is impressive that even on such code, bsnes behaves like the real hardware.
edit:
I can't stop thinking... the Super FX or a custom FPGA program should be able to do PPU emulation. ...someday maybe.
I could not stop thinking about this cycle problem, and I found something really effective for PPUADDRESS. I use a table of routines in WRAM (4KB) to jump quickly to the routine for the current PPU address. It removes the cost of the address increment routine. I also moved the PPUADDRESS I/O write access to the RAM code. This removed a shitload of cycles; I gained 20 rendering lines, but it still glitches. 2 more lines are needed.
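The dispatch idea can be illustrated in C with a table of function pointers indexed by the PPU address. This is only a sketch of the technique: the 256-byte granularity, the three handlers and all names are assumptions, and the real table is 65816 code in WRAM whose exact layout is not given here.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*PpuWriteFn)(uint16_t addr, uint8_t value);

/* Hypothetical handlers: counters stand in for the real CHR/nametable/palette updates. */
static unsigned chr_writes, nametable_writes, palette_writes;
static void write_chr(uint16_t addr, uint8_t value)       { (void)addr; (void)value; chr_writes++; }
static void write_nametable(uint16_t addr, uint8_t value) { (void)addr; (void)value; nametable_writes++; }
static void write_palette(uint16_t addr, uint8_t value)   { (void)addr; (void)value; palette_writes++; }

#define PPU_PAGE_SHIFT 8                       /* one entry per 256 bytes of PPU space */
#define PPU_PAGES (0x4000 >> PPU_PAGE_SHIFT)   /* 64 entries cover $0000-$3FFF */

static PpuWriteFn ppu_write_table[PPU_PAGES];

/* Fill the table once, so each $2007 write needs one lookup instead of range checks. */
static void ppu_table_init(void)
{
    for (int page = 0; page < PPU_PAGES; page++) {
        uint16_t base = (uint16_t)(page << PPU_PAGE_SHIFT);
        if (base < 0x2000)
            ppu_write_table[page] = write_chr;        /* pattern tables */
        else if (base < 0x3F00)
            ppu_write_table[page] = write_nametable;  /* nametables and their $3000 mirror */
        else
            ppu_write_table[page] = write_palette;    /* $3F00-$3FFF palette */
    }
}

/* Emulated $2007 write: dispatch through the table, then advance by 1 or 32. */
static void ppu_write_2007(uint16_t *vram_addr, uint8_t value, uint16_t increment)
{
    uint16_t addr = (uint16_t)(*vram_addr & 0x3FFF);
    PpuWriteFn fn = ppu_write_table[addr >> PPU_PAGE_SHIFT];
    assert(fn != 0);                              /* ppu_table_init() must run first */
    fn(addr, value);
    *vram_addr = (uint16_t)((addr + increment) & 0x3FFF);
}
```

With the table, the cost of a write no longer depends on how many address ranges have to be compared.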
I could remove the remaining glitch by moving the sprite 0 hit routine to the RAM code area (using the timers or the DMA to update the flags). This last change will remove the cost of 10 calls and spare 10 rendering lines. It will remove the glitch on SMB1.
However I doubt that I can add sound to it. Any opinion on that topic? The sound takes 5 or 6 calls per frame, the cost of bank-switching is already included in the current code. It goes to the sound routine but the routine does nothing.
Sound emulation could be fine if we can update the registers in the SPC700 when we want.
On SMB1 we have from line 80 to line 239 to do it. The timer will call the update routine on a line given in the romname.txt file, like SoundLine: 120.
LOLZ
I managed to gain enough cycles to eradicate the glitches, by using the IRQ to emulate the PPUSTATUS flag. It was not easy, because it turns out the IRQ must not occur close to the NMI or the program bank will be lost. The alternative is to make SEI the first instruction of the NMI handler, but then the IRQ never fires, so it would need a variable telling that an IRQ is due during the NMI, and then re-enabling interrupts...
I just inserted the PPUSTATUS update in the NMI.
The trace recording was very useful.
I need a way to update a column of tiles from RAM to VRAM. A row update is easy with the DMA, but I do not see how to transfer a column with an increment of 32. With HDMA?
Any idea?
I'm not intricately familiar with the SNES's DMA, but most DMA units will let you change the post-increment value. Does the SNES's DMA truly not allow this?
VMAIN has three increment modes: one adding 1 for horizontal transfers, one adding 32 for columns of mode 0-6 nametables, and one adding 128 for columns of mode 7 nametables. Writing to VRAM uses whatever increment mode is set in VMAIN, regardless of whether it's through PIO or DMA.
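A small C model of how VMAIN's increment setting turns a linear stream of words into a column. It assumes VMAIN bit 7 is set (the address advances after the high-byte write to $2119, so once per word) and no address remapping (bits 3-2 clear); the function names and the flat 32K-word VRAM array are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Word increment selected by VMAIN ($2115) bits 1-0: 1, 32, 128 or 128. */
static uint16_t vmain_step(uint8_t vmain)
{
    static const uint16_t steps[4] = { 1, 32, 128, 128 };
    return steps[vmain & 0x03];
}

/* Model of `count` word writes to $2118/$2119 starting at word address `addr`,
 * whether they come from PIO or DMA. */
static void vram_write_words(uint16_t *vram, uint16_t addr, uint8_t vmain,
                             const uint16_t *src, int count)
{
    uint16_t step = vmain_step(vmain);
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        vram[addr & 0x7FFF] = src[i];   /* VRAM holds 32K words */
        addr = (uint16_t)(addr + step);
    }
}
```

With the 32-word step, 30 consecutive source words land one tilemap row apart, which is exactly a column.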
Writing to VRAM with the 32-word increment would do it, but then my column must be in a contiguous array in RAM.
Maybe it can work with an indirect HDMA mode and tables?
However, copying 30 words to a transfer buffer could be quick: I made a 128-byte copy from the sprite buffer and it took 5 rendering lines. Maybe doing it in software is not that bad. I must try it.
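The software copy can be as simple as a strided gather into a contiguous buffer, which a single DMA can then send with the 32-word VMAIN increment. This sketch assumes a hypothetical RAM copy of the nametable stored row-major as 30 rows of 32 16-bit tilemap entries; the names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define NT_COLS 32
#define NT_ROWS 30

/* Copy column `col` of a row-major nametable (NT_ROWS x NT_COLS entries)
 * into a contiguous buffer of NT_ROWS words. */
static void gather_column(const uint16_t *nt, int col, uint16_t out[NT_ROWS])
{
    assert(col >= 0 && col < NT_COLS);
    for (int row = 0; row < NT_ROWS; row++)
        out[row] = nt[row * NT_COLS + col];   /* source stride: one row of 32 entries */
}
```

Sending `out` with one DMA to $2118/$2119, with VMAIN set to the 32-word increment and the VRAM address at the top of the column, would then write the whole column.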
I added the APU emulation today. I tried to use the source code of Memblers' emulator, but it did not seem up to date on the 65816 side (I could not load the SPC700 code, so I used the routine from the NSF player ROM instead). And I do not have the assembler for the SPC700 code.
Therefore I directly used the binary data extracted from the big NSF player ROM, taken from a memory dump of the SPC700 RAM and cut out with a hex editor.
However I must find out what it does to emulate $4015 and other register updates.
The update_dsp routine works, but I need to understand what it does to the APU registers during the interrupt on the 65816 side. The APU emulation routines from the NSF player ROM and from the source in 2a03-src.zip are different.
If it helps, here is a link to the assembler I used for SPC700:
http://6502.org/tools/asm/tasm301.zip
Attached to this post is the needed instruction table. The command to assemble is "tasm -t700 -b -a spc.asm".
Sorry the source release was a mess, the original project is too (1,944 files all in a single directory).
It tracks register updates in the "detect_changes" subroutine. The RAM at $7F4116 has some flags to tell the SPC when $4003,$4007, etc. were written. The trick is, the value the NES code writes to $4003,$4007,etc. has to be non-zero for it to be detected! This works out because the length counter bits must be non-zero for there to be sound anyways.
The length counters are emulated on the 65816 side, and this adds some extra handling to the $4015 register. The program uploads $4015 to the SPC, masked with the length counter result bits, like this:
Code:
lda $7F4115 ; length counter enable flags, sent to SPC
and $7F4015 ; what the NES program wrote to $4015
sta $7F4115
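A C sketch of how the write detection and the $4015 masking could fit together. It assumes the NES program's APU writes land in a WRAM shadow of $4000-$4015, and that the 65816 side copies and then clears each trigger register once it has seen a non-zero value there; that clearing step is my interpretation of the description above, not Memblers' actual code, and all names are hypothetical (detect_changes only borrows the name of his routine).

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t regs[0x16];      /* $4000-$4015 as written by the NES program */
    uint8_t latched[0x16];   /* last non-zero value seen in each trigger register */
    uint8_t length_enable;   /* length-counter result bits, one per channel (65816 side) */
} ApuShadow;

/* Returns one bit per channel whose $4003/$4007/$400B/$400F was written since
 * the last call. A write of zero cannot be told apart from no write at all. */
static uint8_t detect_changes(ApuShadow *apu)
{
    static const uint8_t trigger_regs[4] = { 0x03, 0x07, 0x0B, 0x0F };
    uint8_t written = 0;
    for (int ch = 0; ch < 4; ch++) {
        uint8_t r = trigger_regs[ch];
        if (apu->regs[r] != 0) {
            written |= (uint8_t)(1u << ch);
            apu->latched[r] = apu->regs[r];   /* keep the value (period high bits, length) */
            apu->regs[r] = 0;                 /* so the next write is seen as new */
        }
    }
    return written;
}

/* $4015 value sent to the SPC: what the game enabled AND what is still playing. */
static uint8_t status_for_spc(const ApuShadow *apu)
{
    assert(apu != 0);
    return (uint8_t)(apu->length_enable & apu->regs[0x15]);
}
```

The zero-write blind spot is harmless in practice for the reason given above: a channel only sounds when its length counter bits are non-zero.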
If I can get upernes to build over here, I'd like to take a shot at improving the sound quality. The main thing I'd like to try is using longer samples for the waveforms, because the short samples are extremely affected by the SPC's interpolation.
Thank you Memblers. I used the disassembly windows of bsnes+, and two of the procedures are exactly the same as in the source zip. Only the code loading the binary into the SPC700 is clearly different. Also, the FastROM init actually sets SlowROM in the source code.
I will check whether the two other routines I have match the source code.
Tasm may help, thanks.
I could not understand what 4116 was for. I thought it was the gamepad data. I was confused by this value.
Your 4 routines are already in Sound.asm. Only two of them work, and my backup register offsets in var.inc are a mess because I have a hardcoded RAM address right after them (not enough reserved space). But once the unused backup registers are removed, it should begin to work.
I will work on this tomorrow.
If you have 64-bit Windows, you do not need to compile upernes. If the exe runs, it should be fine; the development is mostly in the asm files. (If the SDL graphics window showing the progress causes problems, it can be removed.)
About my progress:
Today I tried updating columns in software instead of using the DMA, and even with loop unrolling it takes about 15 rendering lines instead of 4, and it glitches.
A way to improve the nametable transfer would help a lot, maybe by converting vertical scrollers like Xevious first. Row updates are easy using the DMA; column updates are tricky.
Maybe it could work by converting the column data into rows after the sound emulation code, if enough cycles are available there, or by preparing an HDMA table for columns.
My smb1.txt file to disassemble the indirect jump areas:
Code:
##########################################
# smb1.nes
crc32: $F2DB8422
DisableIndJumpPatching
SoundEmuLine: $97
IndirectJump: $06
addr: $8231
addr: $8FCF
addr: $8567
addr: $858B
addr: $859B
addr: $8652
addr: $865A
addr: $8693
addr: $86A8
addr: $86E6
addr: $93FC
addr: $88AE
addr: $92DB
addr: $9B0E
addr: $9A2E
addr: $85BF
addr: $85E3
addr: $8643
addr: $86FF
addr: $8732
addr: $8749
addr: $9061
addr: $8245
addr: $9131
addr: $B069
addr: $B0E9
addr: $B35A
addr: $AEDC
addr: $8FE4
addr: $889D
addr: $9071
addr: $AEEA
addr: $B376
addr: $C2F1
addr: $C8E0
addr: $CA77
addr: $98E5
addr: $B36D
addr: $BDD2
addr: $BC85
addr: $B233
addr: $9B01
addr: $BB38
addr: $B269
addr: $9B41
addr: $91CD
addr: $B245
addr: $9B14
addr: $B27D
addr: $B1E5
addr: $96C5
addr: $9B19
addr: $9A50
addr: $99F2
addr: $B94B
addr: $98AB
addr: $970D
addr: $B206
addr: $9AB7
addr: $999E
addr: $9218
addr: $C8D6
addr: $9806
addr: $D2D9
addr: $C30E
addr: $BDD8
addr: $9968
addr: $9A59
addr: $D311
addr: $B2A4
addr: $B3CF
addr: $B2CA
addr: $D2F2
addr: $D312
addr: $D34E
addr: $D3A2
addr: $9882
addr: $9224
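For readers who want to generate or check such files, here is a hypothetical line parser for the format above, written in C. It is not the upernes parser: it only recognizes the keys visible in this file (crc32, DisableIndJumpPatching, SoundEmuLine, IndirectJump, addr), and what upernes does with each value is not modeled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_JUMP_ADDRS 256

/* Values read from a romname.txt-style file (hypothetical structure). */
typedef struct {
    unsigned long crc32;
    int disable_ind_jump_patching;
    unsigned sound_emu_line;
    unsigned indirect_jump;            /* value of the last IndirectJump line */
    unsigned addrs[MAX_JUMP_ADDRS];    /* addr: entries, in file order */
    int addr_count;
} RomConfig;

/* Parse one line. Returns 0 on success, -1 for an unknown or malformed line. */
static int parse_config_line(RomConfig *cfg, const char *line)
{
    static const char flag[] = "DisableIndJumpPatching";
    char key[32];
    unsigned long value;

    assert(cfg != NULL && line != NULL);
    if (line[0] == '#' || line[0] == '\n' || line[0] == '\r' || line[0] == '\0')
        return 0;                                  /* comment or blank line */
    if (strncmp(line, flag, sizeof flag - 1) == 0) {
        cfg->disable_ind_jump_patching = 1;
        return 0;
    }
    /* everything else has the form "key: $hexvalue" */
    if (sscanf(line, "%31[^:]: $%lx", key, &value) != 2)
        return -1;
    if (strcmp(key, "crc32") == 0) {
        cfg->crc32 = value;
    } else if (strcmp(key, "SoundEmuLine") == 0) {
        cfg->sound_emu_line = (unsigned)value;
    } else if (strcmp(key, "IndirectJump") == 0) {
        cfg->indirect_jump = (unsigned)value;
    } else if (strcmp(key, "addr") == 0) {
        if (cfg->addr_count >= MAX_JUMP_ADDRS)
            return -1;
        cfg->addrs[cfg->addr_count++] = (unsigned)value;
    } else {
        return -1;
    }
    return 0;
}
```

Reading the file with fgets and passing each line to parse_config_line would collect the addresses in the order they appear.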
I updated the variables used in the sound emulation. The routines' code is identical to what is in the ROM (except for 2 or 3 areas where it did not branch and where I do not have the disassembled code). Not tested yet.
I added the last two procedures. It improved the sound a lot.
I have a problem with the noise channel, and weird effects sometimes. Probably wrong offsets in some registers.
I improved the background update by writing directly into VRAM instead of having a buffer+DMA. But it does not update everything because a VRAM write is not possible during rendering. And when this occurs on a screen change where a lot of writes are made, the screen is messed up because the writes after line 0 fail.
I could try the HDMA transfer:
A 4-byte transfer to the fixed VRAM register addresses: the first 2 bytes are the VRAM address and the next 2 bytes are the data. I will take the data from the RAM buffer.
The address and data of every update must be written in the table.
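The table layout described above can be sketched in C. It assumes an HDMA channel in direct mode with transfer mode 4 (DMAPx = $04) and B-bus address $16 (BBADx = $16), so each entry's four bytes go to $2116/$2117 (VRAM address) and $2118/$2119 (data); one update per scanline (line count $01) is an arbitrary choice, and the function name is hypothetical. As noted above, VRAM only accepts these writes when the PPU is not rendering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HDMA_ENTRY_SIZE 5   /* line-count byte + 4 data bytes */

/* Build a direct-mode HDMA table with one VRAM word write per entry.
 * Returns the table size in bytes, terminator included, or 0 if it does not fit. */
static size_t build_vram_hdma_table(uint8_t *table, size_t capacity,
                                    const uint16_t *vram_addr, const uint16_t *data,
                                    size_t count)
{
    size_t needed = count * HDMA_ENTRY_SIZE + 1;
    uint8_t *p = table;

    assert(table != NULL);
    if (needed > capacity)
        return 0;
    for (size_t i = 0; i < count; i++) {
        *p++ = 0x01;                              /* transfer once, next entry on the next line */
        *p++ = (uint8_t)(vram_addr[i] & 0xFF);    /* -> $2116 VMADDL */
        *p++ = (uint8_t)(vram_addr[i] >> 8);      /* -> $2117 VMADDH */
        *p++ = (uint8_t)(data[i] & 0xFF);         /* -> $2118 VMDATAL */
        *p++ = (uint8_t)(data[i] >> 8);           /* -> $2119 VMDATAH */
    }
    *p = 0x00;                                    /* a line count of 0 ends the table */
    return needed;
}
```

Because every entry carries its own address, the updates do not need to be contiguous in VRAM, which is what makes this layout attractive for scattered nametable changes.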
Actually it works well but the attribute table refresh lags because it must use the rolling DMA update.
The BG update works perfectly now on SMB1
The trick was to properly emulate ForceVBLANK. The NES code forces vblank before updating the screen. I was not emulating it because that made the BG data easier to debug, but it was time to add it.
Now it works. The VRAM update is made during the register write emulation and not during vblank. It saves 4 more lines during VBLANK.
Sound seems to have an initialisation problem, I must check the registers initial values.
I was cleaning the scripts and tried to convert other roms like Xevious. And I have a bug in sprite 0 irq handling.
I checked the other mappers, and with some more work any mapper 0 game will pass. Mapper 2 should also pass.
However, mapper 4 games will probably never work well because of the CHR bank switching. It could be done for a specific game, but not for automatic conversion.
I'd like to see your results for Concentration Room, Thwaite, and RHDE. It should be OK to upload videos of those games to major video hosts, as they won't trigger Content ID.
Hello, I tried your ROMs and they did not work because upernes lacked emulation of BIT on I/O registers. The BIT instruction is used on PPUSTATUS. Since then I have improved many things (see the commits), and Thwaite works except for the controller buttons; I do not know why. Croom seems to work, with the same controller problem (weird theme you chose for this game).
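`bit PPU_STATUS` is popular because it tests $2002 without touching A: the vblank flag (bit 7) goes into N and the sprite 0 hit flag (bit 6) into V, so a converted game expects the emulated read to set those CPU flags. A C model of the flag rules (the struct and function names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, v, z; } BitFlags;

/* Flags produced by the 6502 BIT instruction: N and V copy bits 7 and 6 of the
 * operand, Z is set when (A AND operand) is zero. A itself is not modified. */
static BitFlags emulate_bit(uint8_t a, uint8_t operand)
{
    BitFlags f;
    f.n = (operand & 0x80) != 0;
    f.v = (operand & 0x40) != 0;
    f.z = (a & operand) == 0;
    return f;
}
```

For example, a `bit $2002` / `bpl` wait loop only exits once the emulated status value has bit 7 set, regardless of what is in A.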
I passed the following games:
ExciteBike 100%
BalloonFight 100%
Pinball: nametable bug at start, but everything is fine after that.
Pacman: control keys also messed up.
Xevious begins to do something, but it behaves as if it writes to the nametables during rendering. Is that possible?
Battle City does not work well; it looks like it uses sprites from both CHR banks at the same time. Pinball swaps the sprite and background banks between the title screen and the game, but Battle City seems to switch sprite banks during rendering. Weird behaviour; does anyone have information about this?
Thanks for trying them.
Hi-Def NES also had problems with the controller reading routine I use in most of my games the first time Kevin tried it. It interleaves reading bits from controller 1 and controller 2.
Patrick FR wrote:
Battle City does not work well; it looks like it uses sprites from both CHR banks at the same time. Pinball swaps the sprite and background banks between the title screen and the game, but Battle City seems to switch sprite banks during rendering. Weird behaviour; does anyone have information about this?
8x16 sprites on the NES use both pattern banks simultaneously. The sprite bank select in $2000 is ignored, and instead the LSB of each sprite's character index is used as the bank number for that sprite.
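AWJ's rule can be written as a small address calculation. The sketch below (function names hypothetical) returns the NES pattern-table addresses of the two 8x8 halves of an 8x16 sprite; vertical flip, which swaps the halves, is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Pattern-table address of the top half of an 8x16 NES sprite.
 * Bit 0 of the tile index selects the table ($0000 or $1000); the index with
 * bit 0 cleared is the top tile. The sprite table bit of $2000 is ignored. */
static uint16_t sprite8x16_top_addr(uint8_t tile_index)
{
    uint16_t table = (uint16_t)((tile_index & 0x01) ? 0x1000 : 0x0000);
    uint16_t tile = (uint16_t)(tile_index & 0xFE);
    return (uint16_t)(table + tile * 16);   /* 16 bytes per 8x8 tile */
}

/* The bottom half is the next tile in the same table. */
static uint16_t sprite8x16_bottom_addr(uint8_t tile_index)
{
    return (uint16_t)(sprite8x16_top_addr(tile_index) + 16);
}
```

So two sprites can use different pattern tables in the same frame, which explains the "both banks at once" appearance in Battle City.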
I fixed the PAD problem, Thwaite works 100%.
Pacman works, except I have no sound while the sound routine executes.
Thanks for the solution AWJ. I found a way to emulate it by using bit 8 of the object data.
Patrick FR wrote:
I fixed the PAD problem, Thwaite works 100%.
Congratulations. Do you have the mouse that came with Mario Paint?
I do not have the mouse for the snes.
Edit: Besides, it is a rare object. Could it be tried on an emulator?
Using the FCEUX debugger, I found out why PACMAN does not play music:
STA ($F6,X) @ $4000 = #$FF
It writes to the sound port using this indexed indirect mode. X is 0, by the way, and the pointer to $4000 is stored in zero page at $F6. It must be patched by hand, for example by adding an asm file containing a custom routine and calling it from a patch at this address. This cannot be automatic; the address checks would slow everything down.
Edit: doing what Memblers did in the NSF player should also work: executing from WRAM and reading the register values from WRAM at $4000 to $4015.
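This is why the disassembler cannot see the write: the target address only exists at run time, in zero page. A C sketch of the (zp,X) address computation, with a hypothetical function name and the values from the trace above (operand $F6, X = 0, pointer $4000):

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of the 6502 (zp,X) mode: the pointer is read from zero page
 * at (operand + X) and (operand + X + 1), both wrapping inside page zero. */
static uint16_t indexed_indirect_ea(const uint8_t zero_page[256], uint8_t operand, uint8_t x)
{
    uint8_t lo_index = (uint8_t)(operand + x);
    uint8_t hi_index = (uint8_t)(lo_index + 1);
    return (uint16_t)(zero_page[lo_index] | (zero_page[hi_index] << 8));
}
```

The same instruction can hit an APU register on one call and ordinary RAM on the next, depending on what the game stored in zero page, so only a run-time check (or running from WRAM) catches it.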
Hello.
I managed to execute the converted ROM from work RAM. This is how Memblers' emulation works: there is no need to patch the accesses to $4000-$4015, since the code writes directly to WRAM in this address range.
I did this because pacman accesses sound registers in a way that cannot be detected when disassembling.
However, it does not work 100% yet, the sound is weird I must have missed something because pacman hangs.
There is one thing to watch out for when you execute from work RAM. You might have seen this already, but it is surprisingly common to find code that writes to zeropage with absolute mode. So they'll do something like STA $0000 / LDA ($00),Y. But with the direct page always in bank zero, and the data bank register pointing into WRAM, $00 and $0000 are different places. In my NSF player, I had to patch that code manually. In the NSF it was usually just 1 or 2 spots where that would happen, but there was at least one game where every single ZP access was in absolute mode (I didn't even bother with that one).
Thanks Memblers, I was not expecting this behaviour. I must look at this.
As I understand it, that can be worked around by running the NES program from bank $7E, as $000000-$001FFF is a mirror of $7E0000-$7E1FFF.
One plausible scenario is to assign to X and Y the indices of two elements in the same array and access one element with dd,X and the other with aaaa,Y, as the 6502 lacks dd,Y mode for most instructions that aren't LDX/STX.
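The mirror that makes the bank $7E workaround possible can be checked with a small address model. The sketch assumes the standard SNES map (128 KB of WRAM at $7E0000-$7FFFFF, its first 8 KB mirrored at $0000-$1FFF of banks $00-$3F and $80-$BF) and ignores everything else; direct-page operands always resolve in bank $00, absolute data operands in the data bank. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Offset into the 128 KB of WRAM for a 24-bit CPU address, or -1 if not WRAM. */
static long wram_offset(uint32_t addr24)
{
    uint8_t bank = (uint8_t)(addr24 >> 16);
    uint16_t addr = (uint16_t)addr24;
    if (bank == 0x7E || bank == 0x7F)
        return (long)(((uint32_t)(bank - 0x7E) << 16) | addr);
    if ((bank <= 0x3F || (bank >= 0x80 && bank <= 0xBF)) && addr < 0x2000)
        return addr;                           /* mirror of $7E:0000-$1FFF */
    return -1;
}

/* Direct-page operand: D + operand, always in bank $00 (dp,X wrapping not modeled). */
static uint32_t direct_page_addr(uint16_t d, uint8_t operand)
{
    return (uint32_t)(uint16_t)(d + operand);
}

/* Absolute data operand: the data bank register supplies bits 23-16. */
static uint32_t absolute_data_addr(uint8_t dbr, uint16_t addr)
{
    assert(dbr != 0 || addr != 0 || dbr == 0);   /* any bank/address pair is valid */
    return ((uint32_t)dbr << 16) | addr;
}
```

With the data bank at $7E, `sta $00` and `sta $0000` reach the same WRAM byte; with $7F they do not, which is the trap Memblers describes.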
Maybe it could work if using bank $7E. The lower 8KB of bank $7E is the same ram as the $00 bank. And I use $7E as data and program bank. I converted the following code (using the nes palette test rom) and it works:
Code:
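lda $918        ; read of $918, my breakpoint address
ldy #$00
lda #$CC
sta $0000       ; absolute write to "zero page", goes through the data bank ($7E)
lda #$34
sta $00CC       ; value that should be read back
lda #$00
lda [$00], y    ; 65816 indirect long load through the pointer at direct page $00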
It loads $34 from $00CC in A ($918 is my breakpoint address).
I am close to fixing the audio; I may have forgotten some I/O requests in bank $7E.
Pacman passes on WRAM, it plays sound.
I tested upernes from a fresh clone and it works (one needs to list the indirect jumps and then disable the tests on them, but once this is done, it works).
The sound is still a little weird, like when I first integrated it. But it plays more sounds than with the I/O patching method. I assume this is because the disassembler missed some sound I/O accesses, whereas now the registers are read from WRAM at $4000. Maybe I should make a video to show the problem, but not today.
I was thinking about using upernes on NSF files. If they are like mapper 0 ROMs, it should work?
If all the starting bank number bytes are 0, then it's essentially the same as NROM, except:
- Valid data starts at the load address instead of $8000.
- Init ends with RTS instead of forever: JMP forever.
- Play ends with RTS instead of RTI.
- Audio registers and RAM outside the stack need to be initialized.
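Checking the first condition is a matter of reading a few header fields. The offsets below follow the NSF format (128-byte header, $08 load address, $0A init, $0C play, $70-$77 initial bank values); the struct and function names are hypothetical and this is not upernes code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Header fields that matter for treating an NSF like an NROM image. */
typedef struct {
    uint16_t load_addr;   /* offset $08: where the data after the header goes */
    uint16_t init_addr;   /* offset $0A: init routine, ends with RTS */
    uint16_t play_addr;   /* offset $0C: play routine, ends with RTS */
    bool bankswitched;    /* true if any byte at $70-$77 is non-zero */
} NsfInfo;

static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer is not an NSF file. */
static int nsf_parse_header(const uint8_t *file, size_t size, NsfInfo *out)
{
    if (size < 0x80 || memcmp(file, "NESM\x1A", 5) != 0)
        return -1;
    out->load_addr = read_le16(file + 0x08);
    out->init_addr = read_le16(file + 0x0A);
    out->play_addr = read_le16(file + 0x0C);
    out->bankswitched = false;
    for (int i = 0x70; i < 0x78; i++) {
        if (file[i] != 0)
            out->bankswitched = true;
    }
    return 0;
}
```

When bankswitched is false, the data after the 128-byte header can be placed at load_addr in an NROM-style image, with init and play wrapped as described in the list above.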
Thanks tepples. Since it is then effectively mapper 0, upernes could be adapted to work with it. It would help improve upernes's sound accuracy (currently it does not work very well).