Ruby Runner motion test

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
Ruby Runner motion test
by on (#238003)
One of my ambitions is to produce a successor to Boulderdash which features smooth motion, as seen on the Amiga game "Emerald Mine", rather than the tile-at-a-time motion seen on Boulderdash. Preliminary testing on the NES seems promising, though I'm not sure what sort of mapper would be best. This CNROM demo only features two kinds of objects: the PGMs (Placeholder Graphic Monsters) and the PRGs (Placeholder Rock Graphics), but can accommodate dozens of 16x16 or 16x24 moving objects without flicker.

While this could probably be built into an interesting game using CNROM, going beyond that would open up some possibilities. Perhaps the biggest limitation with CNROM is a limit of 64 metatiles. This demo alone uses sixteen for the PGMs, and seven for the PRGs, and some other kinds of monsters I'd like to include could need 24 or more. A mapper that could use one tile set for the upper-left corner of each metatile, one for the upper right, one for the lower left, and one for the lower right, could increase the number of usable metatiles to 256.
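As a rough illustration of the four-tile-set idea, here is a minimal sketch in Python. It assumes the quadrant is picked from the low bit of the nametable column (address bit 0) and the low bit of the nametable row (address bit 5), and that each quadrant gets its own 4 KiB, 256-tile bank. The function names and the bank layout are hypothetical, and attribute fetches are ignored.

```python
def quadrant_bank(nt_addr: int) -> int:
    """Return 0..3 for upper-left, upper-right, lower-left, lower-right."""
    a0 = nt_addr & 1          # low bit of the tile column
    a5 = (nt_addr >> 5) & 1   # low bit of the tile row (32 tiles per row)
    return (a5 << 1) | a0


def pattern_address(nt_addr: int, tile_byte: int, fine_y: int) -> int:
    """CHR address of one pattern row, with one 4 KiB bank per quadrant (assumed layout)."""
    return quadrant_bank(nt_addr) * 0x1000 + tile_byte * 16 + fine_y
```

With this arrangement the same byte could be written to all four cells of a metatile, so one 8-bit value would name one of 256 metatiles.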

Beyond allowing more tiles, such a mapper could also reduce the number of vblank intervals necessary to process each screen update, thus perhaps avoiding the need for overly complex logic to coordinate screen updates with game logic.

Does the idea of such a game appeal to anyone? What sort of mapper should I target?
Re: Ruby Runner motion test
by on (#238005)
Would it play anything like Crystal Mines, Exodus, and Joshua? If so, those were made on the Color Dreams mapper (#11), which is similar to GNROM (#66). These dynamically convert objects between sprites and background.

Another way to allow more metatiles is to let different objects share individual tiles.
Re: Ruby Runner motion test
by on (#238006)
Smooth scrolling is very commonplace on the NES, no need to overthink that aspect. 4-screen (name table RAM on the cartridge) is the easy way out for glitch-free scrolling, but you can also use horizontal mirroring and hide scroll artifacts by masking the sides of the screen (Alfred Chicken, Felix the Cat) or use vertical mirroring and blank some of the top and/or bottom of the screen (Jurassic Park, Big Nose Freaks Out), or just live with a little bit of scroll artifacts, like so many NES games do (even huge hits like SMB 3).

The tile limit on the other hand is not that simple to get around. The only mapper that really did anything to increase the tile count for the entire screen is the MMC5, which has a mode that allows access to 16384 tiles simultaneously through the use of an extra tile map that adds 6 more bits to each name table entry, for a total of 14 bits, and 2 bits for per-tile palette selection. More often than not, the MMC5 is overkill though.
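The arithmetic behind the 16384-tile figure can be sketched as below. This assumes the extra byte holds the six additional tile-number bits in its low bits and the two palette bits in its high bits; the function name is made up for illustration.

```python
def decode_extended_entry(nt_byte: int, ex_byte: int) -> tuple[int, int]:
    """Combine a name table byte with its extended-attribute byte.

    Returns (tile_number, palette): a 14-bit tile number and a 2-bit palette.
    The bit layout of ex_byte is an assumption of this sketch.
    """
    tile_number = ((ex_byte & 0x3F) << 8) | (nt_byte & 0xFF)  # 6 + 8 = 14 bits
    palette = (ex_byte >> 6) & 0x03                           # 2 bits per tile
    return tile_number, palette
```

Fourteen bits give 2^14 = 16384 distinct tile numbers, which is where the figure above comes from.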

If you're up for creating your own mapper, sure, you could use the name table fetch address to select a different CHR bank for each metatile quadrant, but then you'd run into all the drawbacks of issuing a custom mapper, such as having to hack emulators to support it and finding a way to mass produce it if you want to sell cartridges.

Do you really expect to use more than 256 unique tiles for the background? A good artist can do wonders with that amount, and 256 tiles doesn't necessarily equal 64 metatiles... A good artist will often reuse the same tiles in multiple metatiles. If you need different graphics on the same level but not in the same screen, you can also use CHR-RAM and dynamically change patterns as necessary, or use a mapper like the MMC3 and switch patterns in 64-tile chunks.

BTW, I'm not on my PC so I wasn't able to check out your demo yet.
Re: Ruby Runner motion test
by on (#238007)
If you're interested, I have a custom mapper that is based on the FME-7, but supports the extended attributes of the MMC5 and is able to do the bank-per-tile idea you mentioned. I have it working on the PowerPak and in Mesen.
Re: Ruby Runner motion test
by on (#238008)
supercat wrote:
A mapper that could use one tile set for the upper-left corner of each metatile, one for the upper right, one for the lower left, and one for the lower right, could increase the number of usable metatiles to 256.
While the needed hardware – latch nametable A0 and A5 lines and use those to choose banks – certainly is easy enough, nothing like that already exists. (The closest is Mapper 96, which instead latches A8 and A9.) That said, why do you have seven different rocks? (If they're frames of animation, or a theme that changes from level to level, that's what multiple CHR banks are for.)

You may find Dwedit's demake of Chu Chu Rocket inspiring.

Quote:
What sort of mapper should I target?
Any mapper with an IRQ makes shaving off the bottom easy. Battletoads used extended blanking to shave off the top, on a very minimal mapper. But note that if you do use extended blanking, you more-or-less have to use OAMDMA - more than about 20 scanlines of blanking will result in DRAM starting to lose its contents, which will usually move on-screen.
Re: Ruby Runner motion test
by on (#238012)
lidnariq wrote:
That said, why do you have seven different rocks? (If they're frames of animation, or a theme that changes from level to level, that's what multiple CHR banks are for.)


Stationary rock, rock falling out of tile, rock falling into tile, rock sliding left out of tile, rock sliding left into tile, rock sliding right out of tile, and rock sliding right into tile.

The sixteen PGMs are move up/right/down/left out of tile, move up/right/down/left into tile, spin clockwise starting up/right/down/left, and spin counterclockwise starting up/right/down/left.

Using this approach means that it's only necessary to update the name tables once every game tick, rather than once every animation frame.
Re: Ruby Runner motion test
by on (#238014)
tokumaru wrote:
Smooth scrolling is very commonplace on the NES, no need to overthink that aspect. 4-screen (name table RAM on the cartridge) is the easy way out for glitch-free scrolling, but you can also use horizontal mirroring and hide scroll artifacts by masking the sides of the screen (Alfred Chicken, Felix the Cat) or use vertical mirroring and blank some of the top and/or bottom of the screen (Jurassic Park, Big Nose Freaks Out), or just live with a little bit of scroll artifacts, like so many NES games do (even huge hits like SMB 3).


My present plan is to mask the sides of the screen with sprites that could also be used to show score, time left, etc. The only movable sprites would likely be a pair of sprites for a 16x16 player character. Hopefully, setting OAMADDR to 248 and writing 13 bytes would work (setting OAMADDR might corrupt the first two and last two sprites, but rewriting the last two sprites and setting the first two to non-displayed locations should render such corruption irrelevant). The animation technique I'm using requires that name table updates all become effective on the same frame as the tile set switches, which on the NES necessitates double-buffering. Trying to do partial updates in four quadrants seems unworkably complicated.

Quote:
The tile limit on the other hand is not that simple to get around. The only mapper that really did anything to increase the tile count for the entire screen is the MMC5, which has a mode that allows access to 16384 tiles simultaneously through the use of an extra tile map that adds 6 more bits to each name table entry, for a total of 14 bits, and 2 bits for per-tile palette selection. More often than not, the MMC5 is overkill though.

If you're up for creating your own mapper, sure, you could use the name table fetch address to select a different CHR bank for each metatile quadrant, but then you'd run into all the drawbacks of issuing a custom mapper, such as having to hack emulators to support it and finding a way to mass produce it if you want to sell cartridges.


It seems weird that nobody made any provision for that, given how many games are based on metatiles.

Quote:
Do you really expect to use more than 256 unique tiles for the background? A good artist can do wonders with that amount, and 256 tiles doesn't necessarily equal 64 metatiles... A good artist will often reuse the same tiles in multiple metatiles. If you need different graphics on the same level but not in the same screen, you can also use CHR-RAM and dynamically change patterns as necessary, or use a mapper like the MMC3 and switch patterns in 64-tile chunks.


Different tiles are needed for different state transitions. On the game tick where a PRG (placeholder rock graphic) starts falling, it will be replaced with a "rock leaving downward" tile at the same time as the tile below will be replaced with a "rock entering from above" tile; these changes will occur when the tile set switches so the former tile contains most of the rock. In the other three tile sets, the "rock leaving downward" will show decreasing amounts of the rock while "rock entering from above" will show more and more. If the space below the "rock entering" space is vacant, then on the next gametick, the "rock leaving downward" will be replaced with a blank tile, "rock entering from above" will be replaced with "rock leaving downward", and the tile below that will be replaced with "rock entering from above". All of this must be synchronized with switching back to the first tile bank.
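The per-tick state changes described above could be sketched roughly like this. It is a simplified model for falling only (no sliding, no monsters), with made-up single-character tile codes; the in-between animation is assumed to come entirely from switching tile sets between ticks, so it does not appear in the code.

```python
EMPTY, ROCK, LEAVE_DOWN, ENTER_TOP = ".", "O", "v", "^"  # hypothetical tile codes


def tick(grid):
    """Return the grid for the next game tick; every rule reads the old grid."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            cell = grid[r][c]
            if cell == LEAVE_DOWN:
                # The rock has fully left this cell.
                new[r][c] = EMPTY
            elif cell in (ROCK, ENTER_TOP):
                if r + 1 < rows and grid[r + 1][c] == EMPTY:
                    # Start (or continue) falling into the vacant cell below.
                    new[r][c] = LEAVE_DOWN
                    new[r + 1][c] = ENTER_TOP
                else:
                    # Nothing vacant below: the rock comes to rest here.
                    new[r][c] = ROCK
    return new


def column(grid, c=0):
    """Render one column top-to-bottom as a string, for inspection."""
    return "".join(row[c] for row in grid)
```

Starting from a rock above two empty cells, successive ticks give "v^.", then ".v^", then "..O".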

Quote:
BTW, I'm not on my PC so I wasn't able to check out your demo yet.


It's pretty sweet.
Re: Ruby Runner motion test
by on (#238015)
lidnariq wrote:
You may find Dwedit's demake of Chu Chu Rocket inspiring.


Based on the text, that sounds like the effect I'm after, though I wasn't particularly planning on using sprites for anything other than the player and for side-wall masking. The screen-shot links seemed to be broken, though.
Re: Ruby Runner motion test
by on (#238016)
tepples wrote:
Would it play anything like Crystal Mines, Exodus, and Joshua? If so, those were made on the Color Dreams mapper (#11), which is similar to GNROM (#66). These dynamically convert objects between sprites and background.

Another way to allow more metatiles is to let different objects share individual tiles.


Crystal Mines looks a lot like the effect I'm after, though I don't know whether that game would have flicker problems if more than four objects became active at once on a scan line, or 32 became active in total. The approach I'd be using for Ruby Runner wouldn't be affected by such issues.
Re: Ruby Runner motion test
by on (#238018)
supercat wrote:
Quote:
BTW, I'm not on my PC so I wasn't able to check out your demo yet.


It's pretty sweet.

Pretty sweet indeed!

Now I see why scrolling might be a problem, you're using the two name tables to double buffer the background updates... Well, since you have nearly all sprites at your disposal for masking the sides of the screen, I guess you can easily create enough room for the scrolling seam.

As for the tile count, well, I guess there's not much you can do about it other than use a custom mapper. Even the extended graphics mode of the MMC5 wouldn't be a good solution, since it only provides extra attributes for 1 name table, so you wouldn't be able to double buffer the background animations. On the other hand, AFAIK, the MMC5 does require its extended attributes to be updated *during* rendering which means you wouldn't need to double buffer that part. It'd still be a lot more data to manage per frame though.
Re: Ruby Runner motion test
by on (#238024)
tokumaru wrote:
Pretty sweet indeed!


Thanks. Have you ever seen that many smoothly-animated objects on a scan line without flicker? Although the PGM designs are a bit meh, I really like the way they transition between motions except for the way they continuously spin when trapped. Adding another four metatiles for stationary up/right/down/left might help but I didn't think it was necessary for the demo.

Quote:
Now I see why scrolling might be a problem, you're using the two name tables to double buffer the background updates... Well, since you have nearly all sprites at your disposal for masking the sides of the screen, I guess you can easily create enough room for the scrolling seam.


The way the NES handles horizontal scrolling would have been painful. Even if a cart had 8K of nametable RAM usable as double-buffered four-way scrolling, handling the address discontinuity after a variable number of columns would require a lot of extra work. It's simpler to reset the scrolling registers every game tick and draw tiles in the right places to line up with the new scroll values, since all tiles are being redrawn every gametick anyhow.
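A minimal sketch of that redraw-and-reset approach, for the horizontal axis only. It assumes 16-pixel metatiles, a camera position in pixels, and that the few pixels which wrap at the right edge are hidden by the side masks. The names are hypothetical.

```python
METATILE_PX = 16
NT_METATILE_COLS = 16  # 256 pixels / 16 pixels per metatile


def plan_redraw(camera_x: int):
    """Return (scroll_x, columns) for one game tick.

    columns is a list of (nametable_metatile_column, map_column) pairs:
    the hidden buffer is always filled starting at its column 0, and only
    the fine scroll value carries the sub-metatile part of the position.
    """
    first_map_col = camera_x // METATILE_PX
    scroll_x = camera_x % METATILE_PX
    columns = [(nt_col, first_map_col + nt_col) for nt_col in range(NT_METATILE_COLS)]
    return scroll_x, columns
```

Because the buffer always starts at column 0, the scroll value stays in the range 0..15 and there is no seam position to track.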

Quote:
As for the tile count, well, I guess there's not much you can do about it other than use a custom mapper. Even the extended graphics mode of the MMC5 wouldn't be a good solution, since it only provides extra attributes for 1 name table, so you wouldn't be able to double buffer the background animations. On the other hand, AFAIK, the MMC5 does require its extended attributes to be updated *during* rendering which means you wouldn't need to double buffer that part. It'd still be a lot more data to manage per frame though.


I think MMC2 and MMC4 could easily almost double the number of tiles, by having each row start with either a "switch to set 0" or "switch to set 1" tile. Going from 64 tiles to ~120 would be a big improvement. Not sure if I'd find myself wanting more than 120 tiles, but I can easily see 64 as being inadequate.
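The switch-tile idea can be modelled roughly as follows. This sketch assumes tile numbers $FD and $FE act as the two switch tiles, and that a switch affects only the tiles fetched after it; the real latch timing is more subtle than this.

```python
SWITCH_TO_SET0 = 0xFD  # assumed trigger tile for set 0
SWITCH_TO_SET1 = 0xFE  # assumed trigger tile for set 1


def resolve_row(tile_bytes, start_set=0):
    """Return (tile_set, tile_byte) for each tile in one name table row."""
    current = start_set
    resolved = []
    for tile in tile_bytes:
        # The switch tile itself is still drawn from the set active before it.
        resolved.append((current, tile))
        if tile == SWITCH_TO_SET0:
            current = 0
        elif tile == SWITCH_TO_SET1:
            current = 1
    return resolved
```

Each row would spend its first, masked-off tile on selecting the set, so a row could use either set but not both without another switch tile.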

What, if any, research has been done with regard to designing a "universal mapper" emulation VM? I think it should be practical to write a C module which would export only one external symbol and import none, using function pointers for all communication with the parent emulator. Using a bytecode-based VM for each memory fetch would add some overhead, of course, but if the VM converts the bytecode program into a table of structures that describe the different instructions therein, I think it should be possible to limit the performance impact while still keeping the bytecode program completely sandboxed.
Re: Ruby Runner motion test
by on (#238042)
supercat wrote:
What, if any, research has been done with regard to designing a "universal mapper" emulation VM?
I don't think it's a good idea.

A bundled HDL block, regardless of whether "HDL" is "bytecode" or "Verilog" or "netlist" and regardless of whether it's opaque or not, isn't actually for anyone. It's not for emulator authors, because it's too fragile ("bad descriptions", but you can add "wrong API in the original design" and "too much complexity to support and/or too much of a performance hit" ). It's not for any singular emulator author, because it's not going to be easier for them to write the implementation in this HDL than in whatever convenience abstractions they've already implemented in the emulator's native language. It's not for game developers, because the skill to write in HDL is extremely unrelated to developing a game, and also unrelated to every other skill they'll already have and probably ever would have. It's not for end users, who don't care. It's not for DRM, because it's too easily reverse-engineered.

Furthermore, there's nothing that actually constrains things to something that can actually be built. At best one ends up with overhead for no benefit, at worst one ends up with people designing hardware that can't see a hardware release, at which point it's not an NES game.

Partly due to the large number of already-defined mappers, the emulator author already has a big incentive to make new mapper hardware very easy to write, so it's not like writing these descriptions in C or whatever is hard. It's not even like build time would be meaningfully improved: these aren't large code bases.



And yet... at the same time, I have to admit it would be really cool to have something that was an NES emulator connected to some kind of digital logic simulator, to allow one to plug virtual random 74xx and 40xx and ROMs and RAMs in. But I'd also probably never actually use it beyond proofs-of-concept.
Re: Ruby Runner motion test
by on (#238047)
Wow this looks really nice. Interesting set of constraints on how objects move... so any object can only move to a currently empty space, and they all must synchronize to the same 8-frame sequence to start their movements? 8 frames to update the double-buffered nametable, during which the visible one is animated only by 4 CHR flips at 30fps?

As far as the mapper suggestions I'd put MMC3 at the top of the list with its 4 x 1k CHR banking and ubiquitous support. If you restrict yourself to a subset of MMC3 features it can be even more practical to build in hardware too.

Otherwise any mapper with 1k CHR banking would probably be suitable? VRC6, VRC7, FME7, etc.

On the multidirectional scrolling front, VRC6 or N163 with CHR-RAM might be viable as a way to double-buffer 4-screen nametables, since they can both map CHR to nametable. Not sure how well you'd find emulator support for this. Otherwise MMC3 + 4-screen is very well supported and would be suitable for scrolling on 1 axis only, I suppose?
Re: Ruby Runner motion test
by on (#238060)
lidnariq wrote:
supercat wrote:
What, if any, research has been done with regard to designing a "universal mapper" emulation VM?
I don't think it's a good idea.

I think you perhaps misunderstood my intention. I wasn't intending circuit-level emulation, but rather a VM with an instruction set focused on bit manipulation and 2/4/8-way branching. Using ca65, once one includes a file that sets up some macros and labels, something like CNROM, but with RAM at 0x6000, would be something like:


Code:
.word INIT,CPUREAD, CPUWRITE, PPUREAD, PPUWRITE, ... also define labels for debug read/write, etc.
INIT:
    LOADCONST32(8, 0x22000) ; Load virtual register 8 with 0x28000-0x6000
    LOADCONST8(9, 15) ; Load virtual register 9 with 15
    EXIT
CPUREAD:  ; 8-way branch on bits 13-15 of addr
    BRANCH8(CPUADDR, 13, DONE,DONE,DONE, RDRAM, RDROM, RDROM, RDROM, RDROM)
RDRAM:
    ADD(7, CPUADDR, 8) ; Compute R7=R8+CPUADDR
    READMEM(CPUDATA,7) ; CPUData = mem[R7]
    EXIT
RDROM:
    READMEM(CPUDATA,CPUADDR)
    EXIT
DONE:
    EXIT
CPUWRITE:
    BRANCH8(CPUADDR, 13, DONE,DONE,DONE, WRRAM, WRROM, WRROM, WRROM, WRROM)
WRRAM:
    ADD(7, CPUADDR, 8) ; Compute R7=R8+CPUADDR
    WRITEMEM(CPUDATA,7) ; mem[R7] = CPUData
    EXIT
WRROM:
    READMEM(7, CPUADDR)  ; R7 = mem[CPUADDR]
    ANDREG(CPUDATA,CPUDATA,7) ; CPUDATA &= R7
    ANDREG(7,CPUDATA,9)  ; R7 = CPUDATA & R9
    SHL(6,7,13) ; R6 = R7 << 13
    EXIT
PPUREAD:
    BRSET(PPUADDR,13, DONE) ; Go to DONE if PPUADDR.13 is set (nametable fetch, not CHR)
    ADD(7,PPUADDR,6) ; R7 = PPUADDR + R6
    READMEM(PPUDATA,7)  ; PPUDATA = Mem[R7]
    EXIT
PPUWRITE:
    EXIT

Not a hardware description, but rather a description of what an emulator would need to do.
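To make the idea concrete, here is a minimal Python sketch of how a host emulator might interpret the CPUREAD handler above. The tuple encoding, the label table, and the step limit are all assumptions of this sketch, not part of any existing emulator; only the instructions used by the read path are implemented.

```python
CPUADDR, CPUDATA = "CPUADDR", "CPUDATA"  # names of the special registers

# The CPUREAD handler from the listing, encoded as tuples (hypothetical format).
PROGRAM = [
    ("BRANCH8", CPUADDR, 13,
     ["DONE", "DONE", "DONE", "RDRAM", "RDROM", "RDROM", "RDROM", "RDROM"]),
    ("ADD", 7, CPUADDR, 8),         # RDRAM: R7 = CPUADDR + R8
    ("READMEM", CPUDATA, 7),        #        CPUDATA = mem[R7]
    ("EXIT",),
    ("READMEM", CPUDATA, CPUADDR),  # RDROM: CPUDATA = mem[CPUADDR]
    ("EXIT",),
    ("EXIT",),                      # DONE
]
LABELS = {"CPUREAD": 0, "RDRAM": 1, "RDROM": 4, "DONE": 6}


def run_handler(entry, regs, mem, max_steps=64):
    """Interpret one handler; the step limit bounds runaway bytecode."""
    pc = LABELS[entry]
    for _ in range(max_steps):
        op, *args = PROGRAM[pc]
        pc += 1
        if op == "EXIT":
            return
        if op == "BRANCH8":
            src, shift, targets = args
            pc = LABELS[targets[(regs[src] >> shift) & 7]]
        elif op == "ADD":
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        elif op == "READMEM":
            dst, addr_reg = args
            addr = regs[addr_reg]
            if not 0 <= addr < len(mem):  # the only way bytecode touches memory
                raise IndexError(f"bytecode read out of range: {addr:#x}")
            regs[dst] = mem[addr]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("step limit exceeded")


def cpu_read(addr, mem):
    """Run CPUREAD for one bus read; R8 is preset as INIT would leave it."""
    regs = {CPUADDR: addr & 0xFFFF, CPUDATA: 0, 7: 0, 8: 0x22000}
    run_handler("CPUREAD", regs, mem)
    return regs[CPUDATA]
```

Reads below 0x6000 fall through to DONE and leave CPUDATA at its initial value, standing in for "not handled by the cartridge".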


Quote:
A bundled HDL block, regardless of whether "HDL" is "bytecode" or "Verilog" or "netlist" and regardless of whether it's opaque or not, isn't actually for anyone. It's not for emulator authors, because it's too fragile ("bad descriptions", but you can add "wrong API in the original design" and "too much complexity to support and/or too much of a performance hit" ). It's not for any singular emulator author, because it's not going to be easier for them to write the implementation in this HDL than in whatever convenience abstractions they've already implemented in the emulator's native language. It's not for game developers, because the skill to write in HDL is extremely unrelated to developing a game, and also unrelated to every other skill they'll already have and probably ever would have. It's not for end users, who don't care. It's not for DRM, because it's too easily reverse-engineered.


Processing could be pretty simple. More complicated than a typical *single* mapper, but not by a huge amount.

Quote:
Furthermore, there's nothing that actually constrains things to something that can actually be built. At best one ends up with overhead for no benefit, at worst one ends up with people designing hardware that can't see a hardware release, at which point it's not an NES game.


There is a huge range of mappers that can be constructed by making a few cuts and jumps and maybe a "dead-bug" chip to an existing board, or by reprogramming CPLDs.

Quote:
Partly due to the large number of already-defined mappers, the emulator author already has a big incentive to make new mapper hardware very easy to write, so it's not like writing these descriptions in C or whatever is hard. It's not even like build time would be meaningfully improved: these aren't large code bases.


What about the time required by an emulator author to examine someone else's code for a mapper before releasing it to the public, to ensure that it doesn't contain any security weaknesses that would allow a maliciously-constructed ROM to take over the real-world host machine? Using a sandboxed VM, even maliciously-crafted bytecode running a maliciously-crafted ROM file would be powerless to actually do anything to the host machine beyond wasting a lot of CPU cycles.
Re: Ruby Runner motion test
by on (#238070)
supercat wrote:
I think you perhaps misunderstood my intention. I wasn't intending circuit-level emulation, but rather a VM with an instruction set focused on bit manipulation and 2/4/8-way branching.
No, I think I understood what you meant. It's still not for anyone. It's still a solution in search of a problem.

You've invented a novel language that has nothing in common with skills already needed to develop a game. It's like how the SNES has two CPUs with radically different mnemonics albeit similar ISAs, and that posed yet another barrier towards people writing games for the SNES.

In order to take advantage of your VM, a developer needs to learn a novel assembler-like language. The set of people who this benefits is the specific set of people who like machine code enough more than C that they'd rather learn another kind of assembly instead of figuring out any existing emulator code base, and want third parties to maintain an interface so that they can still develop new and novel mappers.

Note that most of the current "best-of-breed" emulators grew support for both GTROM and UNROM512 pretty quickly after they saw hardware release.

Quote:
Not a hardware description, but rather a description of what an emulator would need to do.
Arguably, that's even worse, because it's easier to build things that can't inexpensively exist in hardware.

Quote:
There is a huge range of mappers that can be constructed by making a few cuts and jumps and maybe a "dead-bug" chip to an existing board, or by reprogramming CPLDs.
I'm not talking about what easily-implemented things it covers. I'm talking about what hard to implement things it permits.

There's this phenomenon that's been observed repeatedly: when the emulated hardware is easier to use than the actual hardware, people treat the emulated hardware as authoritative. And if it doesn't enforce what can be built, things get designed that can't be built.

Quote:
What about the time required by an emulator author to examine someone else's code for a mapper before releasing it to the public, to ensure that it doesn't contain any security weaknesses that would allow a maliciously-constructed ROM to take over the real-world host machine?
No historical mapper is complex enough to do that. Even MMC5, which I'd suggest is the most complex, is pretty easy to audit. And honestly, even as grotesque as I find the COOLGIRL sort-of-a-flashcart, its code is pretty easy to audit if you're already familiar with the FCEUX abstractions.

But as far as mappers that people actually use for development? Have you looked at any existing emulator's mapper implementations?

Here's several emulators' implementations of mapper 96, the closest analog to the one you're floating:
FCEUX
Nestopia (note that this implementation is pedantically wrong, but irrelevantly so)
Mesen

Modern emulator and flashcart implementations of GTROM and UNROM512 have problems stemming from the Flash used, which is not something that would be easily solved by your VM.

For the most part, once an emulator author has written mapper support for the core set of licensed games (about 80 mappers), they'll have written all the convenience functions which do this the right way in the first place, and a module that didn't use them would be at best suspicious.

Quote:
Using a sandboxed VM, even maliciously-crafted bytecode running a maliciously-crafted ROM file would be powerless to actually do anything to the host machine beyond wasting a lot of CPU cycles.
At best it wastes a lot of CPU cycles. At worst it also wastes a lot of CPU cycles and involves extra code complexity with its own attack surfaces to achieve something that's more easily solved in the emulator's native programming language.

By the way, here's FCEUX's implementation of CNROM with PRG RAM.
Re: Ruby Runner motion test
by on (#238073)
lidnariq wrote:

I didn't know that was attested. (The name of the game it appears in reminds me of Freudian psychology.)
Re: Ruby Runner motion test
by on (#238143)
lidnariq wrote:
supercat wrote:
I think you perhaps misunderstood my intention. I wasn't intending circuit-level emulation, but rather a VM with an instruction set focused on bit manipulation and 2/4/8-way branching.
No, I think I understood what you meant. It's still not for anyone. It's still a solution in search of a problem.


At present, developers are effectively limited to mappers that include the specific combinations of attributes that people in the past have needed for their games. Need the ability to switch tile sets on certain rows? MMC2 and MMC4 can do that by using "shift" tiles. Need scan-line IRQ support? MMC3 supports that. Need both? Sorry.

The NES has always supported a "language" beyond 6502 assembly language: the language of electrons. Wire in a circuit to do what you want, and it will do what you want. Something like 74hc74 and an RC timing bodge (to delay A12 relative to A0 and A5) could be used to do a two-way or four-way tile split easily. Something like a 74hc253 could be used to add IRQ support for a single split-screen in a side-scroller.

Many numbered mappers represent a number of related circuits, but there can sometimes be confusion as to what all of the parameters are supposed to mean. What I would envision as a typical process would be for a developer to visit a web page for a general family of mappers, specify a number of options, and receive from that some combination (depending upon the web page) of the following:

1. A schematic for a precise circuit (i.e. a boilerplate schematic with a few customization points filled in).

2. A fab-ready board layout for the exact circuit (a boilerplate layout with filled-in gaps for customization)

3. One or more JEDEC file(s) for various CPLD-based carts like INL-ROM.

4. A bytecode source file for the cart description.

5. A bytecode binary file that could be given to an emulator.

The first three parts could be done on a web page fairly straightforwardly using a little Javascript, but they wouldn't be any good for anyone wanting to emulate the game. Further, if any new options get added to the recipe (e.g. wire VRAM A10 to one of the CHR-bank-select bits), that would complicate emulation because it would be unclear whether there should be a new mapper number, or simply another parameter setting for an existing one.

Quote:
You've invented a novel language that has nothing in common with skills already needed to develop a game. It's like how the SNES has two CPUs with radically different mnemonics albeit similar ISAs, and that posed yet another barrier towards people writing games for the SNES.

The only people I'd expect to be writing mapper code directly would be hardware designers, who could either release mapper code for specific configurations, or have a customization page that could generate mapper code customized for the exact hardware options selected.

Quote:
In order to take advantage of your VM, a developer needs to learn a novel assembler-like language. The set of people who this benefits is the specific set of people who like machine code enough more than C that they'd rather learn another kind of assembly instead of figuring out any existing emulator code base, and want third parties to maintain an interface so that they can still develop new and novel mappers.


I would expect that most developers would use pre-existing mapper code or use a tool to customize a mapper for their particular needs. On something like an INL-ROM cart, a more expensive CPLD would be needed to support e.g. MMC3 than to support a mapper that was designed to include a set of features tailored to a specific game. I would guess that InfiniteNesLives could probably make a variation of the INL-ROM board which was just like the original, but with a simpler CPLD and save a couple bucks in parts cost.

Quote:
Quote:
Not a hardware description, but rather a description of what an emulator would need to do.
Arguably, that's even worse, because it's easier to build things that can't inexpensively exist in hardware.


Actually, that could be used as a form of DRM for people who want to allow people to download their games for use on emulators, but don't want people making carts of them. Release a version of the game binary that targets impractical hardware, and sell a cartridge version that uses practical hardware in a copy-protected CPLD. Reverse-engineering CPLDs is non-trivial if they have a few seemingly-spare resources and aren't meant to be duplicated. One may have a game that seems to work, except that on pirate carts the second level boss is unkillable because the real cart does something "interesting" that the pirate carts don't.

Quote:
Quote:
There is a huge range of mappers that can be constructed by making a few cuts and jumps and maybe a "dead-bug" chip to an existing board, or by reprogramming CPLDs.
I'm not talking about what easily-implemented things it covers. I'm talking about what hard to implement things it permits.


Someone could make an emulator-only game, but I'm not sure why they'd target a system with the color and sprite limitations of the NES if they weren't intending to make a cart of it. And someone who does want to make a cart of a game but doesn't understand hardware should stick to a hardware description made using a board vendor's tools.

Quote:
There's this phenomenon that's been observed repeatedly: when the emulated hardware is easier to use than the actual hardware, people treat the emulated hardware as authoritative. And if it doesn't enforce what can be built, things get designed that can't be built.


Such problems most often arise in cases where emulated hardware can be described simply, but real hardware's behavior deviates from those descriptions (e.g. OAM-RAM corruption issues). I don't see that as being so much of an issue if one visits a web page for a board that has a bunch of customization points, picks some options, and receives a mapper emulation file.

More to the point, with the exception of speed issues, there isn't really much one could specify in a VM that couldn't be done in real hardware. There are many things that couldn't be produced for a price many people would want to pay, but it's not the emulator writer's job to pass such judgments.

Quote:
Quote:
What about the time required by an emulator author to examine someone else's code for a mapper before releasing it to the public, to ensure that it doesn't contain any security weaknesses that would allow a maliciously-constructed ROM to take over the real-world host machine?
No historical mapper is complex enough to do that. Even MMC5, which I'd suggest is the most complex, is pretty easy to audit. And honestly, even as grotesque as I find the COOLGIRL sort-of-a-flashcart, its code is pretty easy to audit if you're already familiar with the FCEUX abstractions.


Such auditing needs to be done separately by the maintainers of every emulator. And looking through the code you linked, I wouldn't say auditing seems particularly easy, since it's often not at all clear who's responsible for ensuring that things are in bounds. I haven't found any actual security holes, but I did find a lot of places that violate the "single source of truth" principle. For example, code which allocates memory for the requested RAM sizes, and then uses a switch statement to set up pointers to 8K blocks of it, but doesn't directly check that accesses stay within the allocation. There may be no security holes in the code as written, but the design is brittle, and if someone who doesn't understand it perfectly tries to tweak it to implement a slightly-different banking scheme, that could easily result in subtle security weaknesses if a malicious ROM causes the mapper to allocate less memory than it's going to try to use.

Running all accesses to run-time-computed addresses through centralized validating load/store routines (whether in a VM or simply as a coding practice) imposes a minimal run-time cost on modern hardware (since the code to perform such validation will be used enough to stay in cache), but makes it much easier to guard against malicious ROMs.
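
As a rough illustration of that coding practice, here is a minimal sketch in C. The names (MemRegion, region_load, region_store) are made up for this example and are not taken from any emulator; the only point is that the allocated size is recorded in one place and every computed offset is checked against it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cartridge memory region. The size recorded here is the
   single source of truth for what may be accessed. */
typedef struct {
    uint8_t *data;
    size_t   size;   /* bytes actually allocated */
} MemRegion;

/* Allocate a zero-filled region. Returns 0 on success, -1 on failure. */
int region_init(MemRegion *r, size_t size)
{
    assert(r != NULL);
    r->data = calloc(size ? size : 1, 1);
    r->size = r->data ? size : 0;
    return r->data ? 0 : -1;
}

void region_free(MemRegion *r)
{
    assert(r != NULL);
    free(r->data);
    r->data = NULL;
    r->size = 0;
}

/* Every access to a run-time-computed offset goes through these two
   routines. An out-of-range offset is reported, never dereferenced. */
int region_load(const MemRegion *r, size_t offset, uint8_t *out)
{
    if (offset >= r->size)
        return -1;
    *out = r->data[offset];
    return 0;
}

int region_store(MemRegion *r, size_t offset, uint8_t value)
{
    if (offset >= r->size)
        return -1;
    r->data[offset] = value;
    return 0;
}
```

With this shape, a banking scheme that computes a bad offset (say, bank 3 of a RAM that only has one 8K bank) gets an error return rather than a stray read or write.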

Quote:
But as far as mappers that people actually use for development? Have you looked at any existing emulator's mapper implementations?

Here's several emulators' implementations of mapper 96, the closest analog to the one you're floating:
FCEUX
Nestopia (note that this implementation is pedantically wrong, but irrelevantly so)
Mesen

For the most part, once an emulator author has written mapper support for the core set of licensed games (about 80 mappers), they'll have written all the convenience functions which do this the right way in the first place, and a module that didn't use them would be at best suspicious.


I didn't notice the "convenience functions" using any consistent strategy to ensure that they wouldn't overrun their memory allocations if e.g. code on a cartridge that supports different addressing modes sets parameters which would be valid for the current modes, but then switches to a mode where those parameters aren't valid. Such mistakes could happen in bytecode as well as in C code, but if they happen in bytecode the interpreter would catch them and signal an error, without allowing the code to take over the real-world machine.

Quote:
Quote:
Using a sandboxed VM, even maliciously-crafted bytecode running a maliciously-crafted ROM file would be powerless to actually do anything to the host machine beyond wasting a lot of CPU cycles.
At best it wastes a lot of CPU cycles. At worst it also wastes a lot of CPU cycles and involves extra code complexity with its own attack surfaces to achieve something that's more easily solved in the emulator's native programming language.


What fraction of an emulator's time is spent on mapper emulation, as opposed to all the other things the emulator has to do? I would expect that a moderately-optimized VM would involve one iteration of a loop like
Code:
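    vmProc = vmState->cpuReadProc;  /* first handler for a CPU read */
    stepLimit = 10000;              /* cap on virtual instructions per access */
    do
    {
      /* each handler runs one virtual instruction and returns the next handler */
      vmProc = (*vmProc)(vmProc, vmState);
    } while(vmProc && --stepLimit);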

for each of the virtual instructions to be executed. More CPU cycles than a simple memory fetch, but not so many as to meaningfully affect emulation speed.
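
For anyone who wants to try that loop, here is a self-contained sketch of the same dispatch style. It is only an illustration under some assumptions: the VmState fields and the two toy handlers are invented, the handler pointer is wrapped in a struct because C cannot declare a function type that returns a pointer to itself, and the handlers take only the state pointer rather than also receiving their own address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct VmState;

/* Wrapper so a handler can return "the next handler". */
typedef struct VmStep {
    struct VmStep (*fn)(struct VmState *vm);
} VmStep;

typedef struct VmState {
    uint32_t addr;    /* address being read by the emulated CPU */
    uint32_t bank;    /* hypothetical bank register */
    uint32_t result;  /* translated ROM offset */
} VmState;

static const VmStep VM_HALT = { NULL };

/* Second virtual instruction: add the bank offset, then stop. */
static VmStep op_add_bank(VmState *vm)
{
    vm->result += vm->bank << 14;
    return VM_HALT;
}

/* First virtual instruction: keep the low 14 address bits. */
static VmStep op_mask_addr(VmState *vm)
{
    vm->result = vm->addr & 0x3FFFu;
    return (VmStep){ op_add_bank };
}

/* Entry point of this toy "CPU read" program. */
VmStep vm_cpu_read_entry(void)
{
    return (VmStep){ op_mask_addr };
}

/* Run handlers until one halts or the step budget runs out.
   Returns the number of virtual instructions executed. */
int vm_run(VmStep entry, VmState *vm, int step_limit)
{
    int steps = 0;
    assert(vm != NULL);
    while (entry.fn != NULL && steps < step_limit) {
        entry = entry.fn(vm);
        steps++;
    }
    return steps;
}
```

The step limit plays the same role as stepLimit above: a runaway or malicious program costs at most a bounded number of indirect calls per access.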

Quote:


That's not the actual byte-handling logic, but merely code that sets up some tables for use by the byte-handling logic; it should hardly be surprising that a cart which doesn't really do much beyond basic read/write should be handled using the "handle common case" code.

If you don't like the idea of having a common machine-readable means of describing mappers in a way that emulators can process them, how would you suggest getting around the chicken-and-egg problems of people not wanting to use improved hardware techniques for fear of support, and emulator writers not wanting to support mappers that nobody will ever use? Bundling mapper byte code with an NES file would mean that if one's emulator can process mapper byte code, it will be able to play the NES file regardless of whether the author of the emulator ever heard about the mapper that it's using.
Re: Ruby Runner motion test
by on (#238144)
For what it's worth, I'm really digging supercat's idea for custom mapper support in emulators. I've always wanted something like that. It'll be hard to convince people to do things differently, though...
Re: Ruby Runner motion test
by on (#238145)
I'll ask the harder question (to everyone): is this really worth doing on the NES (or any preexisting classic console of said era)? Don't hastily reply: step away from the keyboard for a day and really think long and hard about it.
Re: Ruby Runner motion test
by on (#238146)
supercat wrote:
At present, developers are effectively limited to mappers that include the specific combinations of attributes that people in the past have needed for their games. Need the ability to switch tile sets on certain rows? MMC2 and MMC4 can do that by using "shift" tiles. Need scan-line IRQ support? MMC3 supports that. Need both? Sorry.
Actually, the JY Company mapper (#209) gives you both.

supercat wrote:
Something like a 74hc253 could be used to add IRQ support for a single split-screen in a side-scroller.
...did you mean some other part number? That's limited to choosing to generate an IRQ based on five inputs (A, B, 1G, 1Cn, 2Cn) ... I assume you meant to type the same '688 you mentioned last time.

Quote:
The NES has always supported a "language" beyond 6502 assembly language: the language of electrons.
And your VM doesn't particularly resemble such language of electrons. Writing things for the language you've specified doesn't resemble "oh, I put a latch here clocked by that thing and the output enable tied to this other thing", but instead implies sequentiality.

Quote:
5. A bytecode binary file that could be given to an emulator.
Which I'm supposed to accept is somehow better than a build of an emulator that supports the mapper directly...? Because emulator choice or something? Nevermind that the bytecode is an opaque box and if the result doesn't do what you think it should, you have to be familiar with the bytecode language and maybe your emulator has a VM debugger so that you can figure out what's going wrong... which gets back to the same starting point: you're arguing that the bytecode is somehow easier to understand than the language the emulator is already written in.

Quote:
The only people I'd expect to be writing mapper code directly would be hardware designers, who could either release mapper code for specific configurations, or have a customization page that could generate mapper code customized for the exact hardware options selected.
You are dramatically overestimating the complexity of adding emulator support in the emulators' native language. Even a bunch of times.

Quote:
Actually, that could be used as a form of DRM
DRM is an antifeature in my book, and any argument that presupposes it being a good thing is going to face extra scrutiny. DRM is nothing more, and nothing less, than a way to take things that could be permanent and make them transient and irrecoverable, excused as a way to make a buck (regardless of whether it actually does) at the cost of all future people forever after.

But even if it weren't: the VM bytecode would be too dense and easily-reverse-engineered of a target to be useful DRM.

Quote:
Someone could make an emulator-only game, but I'm not sure why they'd target a system with the color and sprite limitations of the NES if they weren't intending to make a cart of it.
Because they either don't know, or don't care. People who write music for NSF, because it was defined to support all the different Famicom expansion audio at the same time, don't care that using multiple expansions at the same time is unrealistic. NSF supports it.

People wrote ROM hacks and/or translations that targeted oversize MMC3 before all-new-parts reproductions were available. They didn't know that that wasn't something they could get. Sometimes they didn't understand or didn't care (e.g. the patch that removed this size enforcement from Nintendulator's source).

Quote:
And someone who does want to make a cart of a game but doesn't understand hardware should stick to a hardware description made using a board vendor's tools.
So, I'm supposed to accept that vendor lock-in is a good thing?

Quote:
There are many things that couldn't be produced for a price many people would want to pay, but it's not the emulator writer's job to pass such judgments.
That is exactly what I'm saying it is.

If it can't ever have been affordably reproduced, it isn't an NES game. This isn't some arbitrary definition, this is actually the only thing that's internally consistent.

Quote:
Such auditing needs to be done separately by the maintainers of every emulator.
And every maintainer will have already written the relevant abstractions in support.

Quote:
And looking through the code you linked, I wouldn't say auditing seems particularly easy, since it's often not at all clear who's responsible for ensuring that things are in bounds.
That's the entire point of the abstractions. Use the abstractions, can't get an out-of-bounds access. Don't use the abstractions, and your code is suspect and should be treated with extreme prejudice.

You don't get to disregard the safety of their preexisting abstractions when you're presenting an argument in favor of your own abstraction.

Quote:
I didn't notice the "convenience functions" using any consistent strategy to ensure that they wouldn't overrun their memory allocations if e.g. code on a cartridge that supports different addressing modes sets parameters which would be valid for the current modes, but then switches to a mode where those parameters aren't valid.
... did you look at the same code I did?

what do you think FCEUX's setprg and setchr, and Nestopia's SwapBank, and Mesen's SelectXxxPage functions do?

why would you even think that they would be vulnerable?

why would you even think that's something that should be handled at this level, instead of inside the abstraction?

(it turns out that a bunch of games already rely on the upper bits being discarded)
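
To make the "handled inside the abstraction" point concrete, here is a small sketch of a bank-select helper that discards the upper bits. It is not FCEUX's, Nestopia's, or Mesen's actual code; PrgMap, prg_select, and prg_read are hypothetical names, and the sketch assumes a power-of-two number of 16K banks.

```c
#include <assert.h>
#include <stdint.h>

#define PRG_BANK_SIZE 0x4000u   /* 16 KiB banks, chosen for illustration */

typedef struct {
    const uint8_t *rom;
    uint32_t bank_count;     /* must be a power of two */
    uint32_t window_base;    /* ROM offset currently mapped into the window */
} PrgMap;

/* Hypothetical bank-select helper. Bits of `bank` beyond the ROM size are
   discarded, much as they would be on a board where the extra register
   bits are not wired to any ROM address pin. */
void prg_select(PrgMap *m, uint32_t bank)
{
    assert(m->bank_count != 0 && (m->bank_count & (m->bank_count - 1)) == 0);
    m->window_base = (bank & (m->bank_count - 1)) * PRG_BANK_SIZE;
}

/* Read through the window. The offset stays inside the ROM because
   window_base was masked and addr is reduced to the bank size. */
uint8_t prg_read(const PrgMap *m, uint16_t addr)
{
    return m->rom[m->window_base + (addr & (PRG_BANK_SIZE - 1))];
}
```

A mapper module written on top of helpers like these can pass whatever the game wrote to the register; the masking happens in one place.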

Quote:
Such mistakes could happen in bytecode as well as in C code, but if they happen in bytecode the interpreter would catch them and signal an error, without allowing the code to take over the real-world machine.
My argument is:
A VM to emulate this is
1- fragile
2- slow
3- easy to have subtly wrong "safe" surfaces
4- easy to expose attack surfaces
5- everyone's implementation will be subtly different

In contrast, your argument is that all C ever is forever unsafe. That's not a good argument.

Quote:
What fraction of an emulator's time is spent on mapper emulation, as opposed to all the other things the emulator has to do?
In practice, audio emulation seems to be the heaviest, because resampling or band-limited synthesis. Optionally, postprocessing like the NTSC filter or shaders. After that, all that's basically left is the CPU and PPU emulation, and you'll have the mapper overhead on every single cycle of each... possibly dozens of cycles of VM bytecode.

Quote:
it should hardly be surprising that a cart which doesn't really do much beyond basic read/write should be handled using the "handle common case" code.
That's my entire point. You gave a VM machine code implementation of CNROM with PRG RAM, I pointed out that the abstraction already handles that tidily.

I'll assume you want to talk about mapper 96 or MMC4 instead: the native FCEUX abstraction is still cleaner and easier to understand than the corresponding VM source would be. Cleaner, faster, not any more vulnerable.

Quote:
how would you suggest getting around the chicken-and-egg problems of people not wanting to use improved hardware techniques for fear of support, and emulator writers not wanting to support mappers that nobody will ever use?
By involving someone who can make the hardware in the first place and making sure that what they want can actually be done. Conveniently, at that time, they'll already have a manufacturer lined up.

(this was my 8288th post, the bus controller (permitting multiple masters, e.g. DMA or FPU) in the original IBM PC)
Re: Ruby Runner motion test
by on (#238175)
tokumaru wrote:
For what it's worth, I'm really digging supercat's idea for custom mapper support in emulators. I've always wanted something like that. It'll be hard to convince people to do things differently, though...
In all earnestness, what's keeping you from playing around in the source to existing emulators?
Re: Ruby Runner motion test
by on (#238176)
Although I am a programmer, I have little experience with modifying other people's code (I feel very insecure about making modifications to things I don't have a complete understanding of) and compiling desktop applications (I'm more of a script and web guy). But even if I was used to doing those things, it sounds counterproductive to code the same functionality for several different emulators, each using a different architecture for handling mappers, and having to re-apply those changes every time a new version of each emulator comes out, possibly having to deal with architectural changes.

IMO, implementing a mapper in a single language, only once, certainly beats having to constantly fiddle with several different code bases. And there's also the distribution issue: other people won't be able to play my custom mapper games with their preferred setups, they'll have to use the modified emulators I supply (if their licences allow it), and that may not even include their favorite emulators or operating systems. It's a hassle not only for the programmers but also for their audience.

It is possible that your new mapper is eventually incorporated in the official versions of well-maintained emulators, but it just sounds so much better to have this process completely automated, you know?
Re: Ruby Runner motion test
by on (#238177)
lidnariq wrote:
In all earnestness, what's keeping you from playing around in the source to existing emulators?

As for the NES, not much other than that I don't see how to build a .NET application (Mesen) or Win32 application (FCEUX debugging version) under Linux.

As for the Game Boy, half of SameBoy's debugging features are Mac-only, and bgb is closed source. But then it wouldn't need mappers as much anyway because of its built-in timers and 1-screen mirroring, except perhaps for a multicart.
Re: Ruby Runner motion test
by on (#238178)
The main arguments I see against it are essentially:
A) The odds of any significant number of emulators gaining support for such a concept at this point in time are low; people will keep using Nestopia, FCEUX, and others for a variety of reasons for a very long time. Unless whoever designed such a system decides to implement it themselves in multiple emulators, adoption will remain low.

B) Mapper logic can actually become a large part of CPU usage (e.g. MMC5), especially for everything that watches the VRAM bus and reacts to it. This means the behavior either needs to a) be very roughly approximated for the purpose of super fast emulators (e.g. QuickNES) or b) be implemented with cycle accuracy and as much optimization as possible for cycle-accurate emulators. Having a "VM" in the way here could potentially greatly affect speed-focused emulators, and would likely slow down accuracy-focused emulators some more. Personally if I'm going to make Mesen any slower, I'd rather use the extra CPU time to improve accuracy rather than implement a highly customizable "build-your-own-mapper" system :p

In the end you're designing a potentially complex system, that will have its own bugs in different emulators and which will very likely reduce performance. The benefit is that a handful of people may create their own mappers for it and presumably be able to run it on all emulators (but that's unlikely to happen because the majority of popular NES emulators are barely maintained these days). Not to mention that unless someone goes and correctly reimplements all 350+ existing NES/Famicom mappers using that new system, emulators will still have to implement regular mappers, on top of this new system.

tepples wrote:
not much other than that I don't see how to build a .NET application (Mesen)
All it takes is literally 2 packages (mono-devel & SDL2) and running "make" in the project's root folder :p
Re: Ruby Runner motion test
by on (#238179)
Yeah, I expect this to remain a dream, at least for now. It would be great to have emulators that behave just like an actual console, which can take any mapper and just work, because the mapper is entirely inside the cartridge. Bundling the mapper definition along with the ROM itself would mimic that perfectly, it'd be awesome. But it will take more than a handful of developers wanting to combine a few features from different mappers to make that happen. If this doesn't bring any sensible benefits for the masses, the chances of adoption are pretty low.

Even if you, as a game programmer, create your own emulator and games using a dynamic mapper system, other emulator authors are more likely to just implement hardcoded versions of your mappers (if your games are popular enough) than change how their emulators work. People have been doing emulation the same way for decades now, and while a few people are experimenting with different approaches (e.g. MetroBoy), most are perfectly OK with how things are.
Re: Ruby Runner motion test
by on (#238183)
tokumaru wrote:
IMO, implementing a mapper in a single language, only once, certainly beats having to constantly fiddle with several different code bases. And there's also the distribution issue: other people won't be able to play my custom mapper games with their preferred setups, they'll have to use the modified emulators I supply (if their licences allow it), and that may not even include their favorite emulators or operating systems. It's a hassle not only for the programmers but also for their audience.

My honest opinion is that the sense of scale in this comparison is very disproportionate.

Most NES mappers are a very small amount of code to support. On the order of a few hours or maybe even just minutes to actually write (especially if doing several at once).

Building a verilog simulator or whatever this universal mapper thing has to be, on the other hand, is probably about as complex as writing a whole emulator, in my estimation. Probably weeks of work, plus ongoing maintenance hunting down all the loose ends and bugs that come with an undertaking of this magnitude.

The maintenance work alone will completely overshadow the entire amount of work it would be to just implement the mappers natively like we've always had to do. I guarantee this. You can't just snap your fingers and come up with a system that will be able to handle any mapper you'd ever want to implement, with a spec and implementation that's robust enough to last forever. This is kind of like writing a whole second emulator for your emulator to use, for twice as much work and probably a lot worse performance, with additional collateral learning of some extra domain specific language for anyone who wants to implement a mapper. Not to mention how much of a barrier this is to anyone who wants to implement it from scratch.


My ballpark guess would be that you could write a traditional implementation of every extant NES mapper in a lot less time than it would take to get this dream system working. I could be wrong, but that's my honest estimate.
Re: Ruby Runner motion test
by on (#238187)
Yeah, I totally get that. I'm just pointing out that the current, "easy" way of creating new mappers (modify an emulator yourself, try to get actively developed emulators to support it) has its flaws, and that automating mapper support would be a great way to address those flaws, for good. I do get how hard this would be to implement, and that there would be performance implications.
Re: Ruby Runner motion test
by on (#238249)
I appreciate the responses about mapper emulation, and would like to continue them soon in another thread on the emulation forum, but my main purpose in asking whether any research had been done was to avoid duplicating any existing efforts in sketching out a proposal. My goal isn't to recreate Verilog, but rather a DSL that is geared specifically toward the emulation of mappers. Verilog needs to deal with a large number of individual signals and timing relationships among them. Mappers generally have a small number of groups of signals which may be more sensibly treated as groups within 32-bit integers, and since the "virtual world" is stopped while a mapper is running, a sequential execution model will suffice.

It sounds as though the real-world mapper that's available and would be closest to what I want would be MMC2/MMC4 (either would probably work equally well). Those would get me up to 128 metatiles, but would leave a "load seam" across the top or bottom of the screen on PAL systems, and would also be more complicated (and thus more expensive) than should be necessary for the game. If I were designing a cart to optimally run the game as cheaply as possible without compromising gameplay, using back-in-the-day hardware, it would include:

  • 64K or 128K of PRG-ROM, banked as a unit
  • 2K or 8K of PRG-RAM
  • A 2-bit register that would capture A0 and A5 and use those to select a tile set
  • 2K of PPU-RAM wired for addresses 0x2000-0x2FFF, with address lines A0 and A5 disconnected, but one pin controlled by the CPU banking register
  • A pair of 4-input NAND gates to allow PPU addresses 1 xx11 1xxx xxxx to be served by NIRAM rather than the external RAM (for attribute fetches)
  • 64K of CHR-ROM wired using A0-A12 plus the two latched bits above and two bits from the CPU-side banking register
  • A banking register.

Much simpler mapping hardware than MMC2 or MMC4, but more useful for the needs of the game:

  • Nametable update time could be significantly reduced, from a per-metatile time of 28 cycles down to 11 or maybe even 8.
  • The ability to use almost 256 tiles, rather than only 64 [some tiles would need to be reserved for sprites].

Unfortunately, it sounds like no existing schemes come close.
Re: Ruby Runner motion test
by on (#238366)
Quote:
I think MMC2 and MMC4 could easily almost double the number tiles, by having each row start with either a "switch to set 0" or "switch to set 1" tile. Going from 64 tiles to ~120 would be a big improvement. Not sure if I'd find myself wanting more than 120 tiles, but I can easily see 64 as being inadequate.


I sort of struggle to see what MMC2 / MMC4 would provide that a scanline IRQ couldn't, except for a modest amount of saved CPU cycles.

And if you can live without the DMC channel in your game's music, you may not even need a mapper-based IRQ, but could use the DMC channel's IRQ as explained on the wiki: https://wiki.nesdev.com/w/index.php/APU_DMC
Re: Ruby Runner motion test
by on (#238367)
Bananmos wrote:
And if you can live without the DMC channel in your game's music, you may not even need a mapper-based IRQ, but could use the DMC channel's IRQ as explained on the wiki: https://wiki.nesdev.com/w/index.php/APU_DMC

I'm pretty sure you'd lose a lot of CPU time if you used this trick multiple times per frame, as one of the key elements of this technique is to waste CPU time to compensate for timing errors.
Re: Ruby Runner motion test
by on (#238368)
Bananmos wrote:
I sort of struggle to see what MMC2 / MMC4 would provide that a scanline IRQ couldn't, except for a modest amount of saved CPU cycles.


Using a scanline IRQ would require a relatively complex mapper chip. While MMC2/MMC4 are somewhat fancy, a game designed for MMC2 could be adapted for a discrete-logic board more readily than one that required a scan-line interrupt timed to be suitable for the purpose.
Re: Ruby Runner motion test
by on (#238369)
supercat wrote:
Bananmos wrote:
I sort of struggle to see what MMC2 / MMC4 would provide that a scanline IRQ couldn't, except for a modest amount of saved CPU cycles.


Using a scanline IRQ would require a relatively complex mapper chip. While MMC2/MMC4 are somewhat fancy, a game designed for MMC2 could be adapted for a discrete-logic board more readily than one that required a scan-line interrupt timed to be suitable for the purpose.
Either approach could work, but if the scan line IRQ isn't triggered at the optimal time it may be necessary to waste a fair number of CPU cycles.
Re: Ruby Runner motion test
by on (#238374)
tokumaru wrote:
Bananmos wrote:
And if you can live without the DMC channel in your game's music, you may not even need a mapper-based IRQ, but could use the DMC channel's IRQ as explained on the wiki: https://wiki.nesdev.com/w/index.php/APU_DMC

I'm pretty sure you'd lose a lot of CPU time if you used this trick multiple times per frame, as one of the key elements of this technique is to waste CPU time to compensate for timing errors.


Well, let's try to do the numbers then :)

For playback rate $C, we have a period of $06A = 106 cycles on NTSC. This means 106*8 = 848 cycles until an interrupt hits. Or ~7.46 scanlines.

This leaves us with 0.54 of a scanline, to do the work in our IRQ handler, or 61.3 CPU cycles. This should be plenty of time to enter the IRQ handler, write the bank-switching register, and leave the IRQ handler.

We'll need to repeat this IRQ handling up to 30 times in the frame, which means wasting ~1840 cycles in an NTSC frame, or just 6.1% of CPU cycles. I'd dare to call this "modest", considering that games like SMB1 wasted more than that just waiting for a sprite#0 hit. I'd say anything below 10% is an acceptable trade-off for lower hardware costs.
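
The arithmetic above can be reproduced with a few lines of C. This is only a sketch of the estimate as stated: it assumes the IRQ arrives a fixed number of cycles after playback starts, uses 341/3 CPU cycles per NTSC scanline and 262 scanlines per frame, and the DmcBudget struct and dmc_budget function are names made up for the example.

```c
#include <assert.h>

/* NTSC timing, in CPU cycles. */
#define CPU_CYCLES_PER_SCANLINE (341.0 / 3.0)   /* about 113.67 */
#define SCANLINES_PER_FRAME     262.0

typedef struct {
    double cycles_per_irq;     /* DMC period times 8 bits per sample byte */
    double scanlines_per_irq;
    double slack_cycles;       /* cycles left before the next switch point */
    double wasted_per_frame;
    double wasted_fraction;    /* share of one frame's CPU cycles */
} DmcBudget;

/* dmc_period: CPU cycles per DMC output bit (106 for rate $C on NTSC).
   lines_per_switch: scanlines between bank switches.
   irqs_per_frame: how many times the handler runs per frame. */
DmcBudget dmc_budget(int dmc_period, int lines_per_switch, int irqs_per_frame)
{
    DmcBudget b;
    assert(dmc_period > 0 && lines_per_switch > 0 && irqs_per_frame > 0);
    b.cycles_per_irq    = dmc_period * 8.0;
    b.scanlines_per_irq = b.cycles_per_irq / CPU_CYCLES_PER_SCANLINE;
    b.slack_cycles      = lines_per_switch * CPU_CYCLES_PER_SCANLINE
                        - b.cycles_per_irq;
    b.wasted_per_frame  = b.slack_cycles * irqs_per_frame;
    b.wasted_fraction   = b.wasted_per_frame
                        / (SCANLINES_PER_FRAME * CPU_CYCLES_PER_SCANLINE);
    return b;
}
```

Calling dmc_budget(106, 8, 30) gives 848 cycles per IRQ, about 61.3 slack cycles per IRQ, and roughly 1840 cycles per frame, a little over 6% of the frame.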

Speaking of sprite#0 hit, there is of course a problem with this 1-DMC-IRQ-per-8-scanlines-proposal: After enough of these, the delay introduced when IRQ fires inside an instruction might have skewed your writes and pushed them outside of hblank into the visible portion of the screen.
For these reasons, I'd also suggest having a sprite#0 hit somewhere in the middle of the screen, on an IRQ-firing scanline, which can be waited on inside the half-time IRQ handler, in order to "recalibrate" your writes.

As a bonus, ensuring that sprite#0 always occurs allows you to wait for the end of vblank by polling for sprite#0 hit being cleared. Which can be used to activate/deactivate masking of the top portion of the screen, to avoid scrolling artifacts.

Were this 20 years ago, I'd say you'd have to painstakingly do all this IRQ aligning on a real system. But the Event Viewer Sour added to Mesen makes these things very easy to develop in an emulator, and even highly enjoyable.
(just keep in mind that if you're targeting PAL, then not only are the periods different, but no emulator accurately emulates the DMC cycle steals AFAIK, so you'd need to be back to doing this on real hardware again, like in the old days...)
Re: Ruby Runner motion test
by on (#238375)
supercat wrote:
Bananmos wrote:
I sort of struggle to see what MMC2 / MMC4 would provide that a scanline IRQ couldn't, except for a modest amount of saved CPU cycles.


Using a scanline IRQ would require a relatively complex mapper chip. While MMC2/MMC4 are somewhat fancy, a game designed for MMC2 could be adapted for a discrete-logic board more readily than one that required a scan-line interrupt timed to be suitable for the purpose.


True. And I certainly don't want to dissuade you from either using the MMC2, or designing your own simplified mapper if that's an important part of your project goals. We're all doing this odd hobby for fun, and if thinking up/implementing new hardware boards is what excites you, then by all means go ahead.

In fact, Never-obsolete isn't the only one who's designed an FPGA mapper prototype for extended attributes. I did this as more of a hypothetical "what's the minimum HW you need for 8x8 attributes", which might be a more useful starting point for you. Have a look at this thread, where I've posted sources for both Powerpak and Everdrive: Mapper30 8x8 attributes on Everdrive
While it is made for 8x8 color attributes, it'd be trivial to change it to use the latched A0/A5 to control upper bits for CHR instead of the attribute table.

I haven't actually tried to implement it with discrete logic, but can't imagine it would be a lot of chips or dollars to add it to any discrete logic board. I just didn't pursue that further, because - as lidnariq would have put it - it was more of a solution looking for a problem. Very few game ideas actually *require* 8x8 color attributes.

That said, it still sounds to me like designing a new mapper is a bit overkill for your game idea, and could easily end up being a bit of a distraction from designing the game itself. It's incredibly common with NES development in particular to go down the rabbit hole of hardware selection/design and end up not having time to focus on the software itself. Which is why I think the software-only solution with a DMC IRQ is neater, and a fun challenge to tackle in its own right - if you can live without DMC samples in your game, that is.
Re: Ruby Runner motion test
by on (#238377)
Bananmos wrote:
For playback rate $C, we have a period of $06A = 106 cycles on NTSC. This means 106*8 = 848 cycles until an interrupt hits. Or ~7.46 scanlines.

This leaves us with 0.54 of a scanline, to do the work in our IRQ handler, or 61.3 CPU cycles. This should be plenty of time to enter the IRQ handler, write the bank-switching register, and leave the IRQ handler.

We'll need to repeat this IRQ handling up to 30 times in the frame, which means wasting ~1840 cycles in an NTSC frame, or just 6.1% of CPU cycles.

Unfortunately, I don't think it's that simple. DMC IRQs don't fire a constant number of cycles after you start playback (it would be amazing if that was the case!), what really happens is that there's always an unpredictable delay, which you have to measure and compensate for in subsequent IRQs. It's this error measuring and compensation that wastes CPU time, and you have to compensate the error (which can be several scanlines) on every IRQ.
Re: Ruby Runner motion test
by on (#238379)
tokumaru wrote:
Bananmos wrote:
For playback rate $C, we have a period of $06A = 106 cycles on NTSC. This means 106*8 = 848 cycles until an interrupt hits. Or ~7.46 scanlines.

This leaves us with 0.54 of a scanline, to do the work in our IRQ handler, or 61.3 CPU cycles. This should be plenty of time to enter the IRQ handler, write the bank-switching register, and leave the IRQ handler.

We'll need to repeat this IRQ handling up to 30 times in the frame, which means wasting ~1840 cycles in an NTSC frame, or just 6.1% of CPU cycles.

Unfortunately, I don't think it's that simple. DMC IRQs don't fire a constant number of cycles after you start playback (it would be amazing if that was the case!), what really happens is that there's always an unpredictable delay, which you have to measure and compensate for in subsequent IRQs. It's this error measuring and compensation that wastes CPU time, and you have to compensate the error (which can be several scanlines) on every IRQ.


Oops. I really should have read that wiki more carefully myself!
I was under the impression that you could actually control the phase of this timer by starting a new sample at some variable point in the NMI. But yes, it does appear this was a misunderstanding of mine. To be honest, I only ever tried the simple DMC IRQ variant in practice (use DMC IRQ for coarse timing, then use sprite#0 for sync).

So guess that means the documented method would be closer to wasting ~50% of CPU time to consistently bank-switch every 8th scanline, and the op's suggestion for MMC2 / a purpose-built mapper latching A5 easily wins out.

Though it does make me wonder if it's practically possible to jiggle the phase of the counter a bit in the vblank period to make the CPU use a bit less wasteful...
Because the places where the IRQs happen are reasonably deterministic if you're attempting to trigger them systematically in a frame, I suppose you should also be able to predict what "phase shift" you'll get in each NMI from one frame to the other, and possibly set off a few dummy IRQs, just to align the IRQs in the rendered frame a bit closer to your ideal IRQ firing? And then alternate playback rates $C and $B to keep the delay to your bank-switch write minimal.
Of course, it would make the programming effort way, way bigger and perhaps not very practical to do in anything but a tech demo... though maybe it's a good use-case for going crazy with Mesen's event viewer... :P
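As a rough check of the arithmetic quoted above, and of what alternating rates $C and $B might buy, here is a small Python sketch. It assumes the nominal NTSC figures (341/3 CPU cycles per scanline, 29780.5 per frame) and the NTSC DMC periods of 106 and 128 cycles per bit for rates $C and $B. It takes the quoted premise that the IRQ nominally arrives eight bit periods after the sample starts, and it ignores the start-up delay tokumaru describes. The function names are made up for illustration.

```python
# Nominal NTSC timing: 341 PPU dots per scanline, 3 dots per CPU cycle.
CPU_CYCLES_PER_SCANLINE = 341 / 3      # about 113.67
CPU_CYCLES_PER_FRAME = 29780.5         # average NTSC frame with rendering enabled

# DMC period in CPU cycles per output bit for the two rates discussed.
DMC_PERIOD = {0xB: 128, 0xC: 106}


def irq_interval(rate):
    """Nominal cycles from sample start to IRQ (8 bit periods, no jitter)."""
    return DMC_PERIOD[rate] * 8


def idle_wait(rate, target_scanlines=8):
    """Cycles between the IRQ and the target scanline boundary.

    Positive: the IRQ is early and the handler must burn this many cycles.
    Negative: the IRQ arrives after the boundary has already passed.
    """
    return target_scanlines * CPU_CYCLES_PER_SCANLINE - irq_interval(rate)


interval_c = irq_interval(0xC)                        # 848 cycles
scanlines_c = interval_c / CPU_CYCLES_PER_SCANLINE    # about 7.46 scanlines
wait_c = idle_wait(0xC)                               # about 61.3 cycles
wait_b = idle_wait(0xB)                               # negative: overshoots
frame_cost = 30 * wait_c                              # 30 splits per frame
frame_share = frame_cost / CPU_CYCLES_PER_FRAME

print(f"rate $C: {interval_c} cycles = {scanlines_c:.2f} scanlines, wait {wait_c:.1f}")
print(f"rate $B: {irq_interval(0xB)} cycles, wait {wait_b:.1f}")
print(f"per frame: {frame_cost:.0f} cycles = {frame_share:.1%}")
```

Under these assumptions rate $C falls about 61 cycles short of eight scanlines while rate $B overshoots by about 115, so any mix of the two still drifts and needs correcting.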
Re: Ruby Runner motion test
by on (#238382)
Bananmos wrote:
Oops. I really should have read that wiki more carefully myself!
I was under the impression that you could actually control the phase of this timer by starting a new sample at some variable point in the NMI. But yes, it does appear this was a misunderstanding of mine. To be honest, I have only ever tried the simple DMC IRQ variant in practice (use the DMC IRQ for coarse timing, then use sprite #0 for sync).

So I guess that means the documented method would be closer to wasting ~50% of CPU time to consistently bank-switch every 8th scanline, and the OP's suggestion of MMC2 or a purpose-built mapper latching A5 easily wins out.


I've not really settled on the pros/cons of using interrupts vs MMC2 vs a custom mapper.

The first potential advantage I can see to an interrupt would be the ability to cleanly blank the top and bottom at consistent positions. I don't think a "load seam" would be visible on a typically-calibrated NTSC set, but on an NTSC set configured to show all visible lines it would be. Balancing this might be the need to special-case the scenario where a row character flip would occur just after the top of the screen, but I don't think the game will ever need to scroll by an odd number of scan lines, so I could avoid that issue by picking my scroll amounts vs. the top line placement.

The second advantage to an interrupt, which would be huge *if* it were workable, would be an ability to have pairs of rows use the same bytes of nametable RAM, so as to cut in half the number of writes required to draw the screen. Unfortunately, I don't think there's any way to disable rendering, switch the PPU address, and re-enable rendering, without needing at least one blank scan line.

Using MMC2 would eliminate the need for interrupt-handling code, and would "just work" on NTSC and PAL--a useful attribute since I don't have any real PAL hardware to test with. The downside would be an unclean top and bottom edge in PAL mode.

Using a custom mapper would allow almost 256 tiles rather than 128, though the advantage of going beyond 128 is much smaller than that of going from 64 to 128. Using a CPLD-based mapper could offer a bigger advantage: cutting the time to update each row of metatiles from 448 cycles down to 142. If I use CHR RAM wired so that the same chip also replaces the NTRAM, and all four cells of each metatile map to the same address, then if my understanding of the PPU is correct, writing a row of metatiles would simply be:
Code:
    ; Assume Y is zero
    lda rowZeroAddrH
    sta $2006
    lda rowZeroAddrL
    sta $2006
    lda rowZeroTile+0
    sta $2007,y
    lda rowZeroTile+1
    sta $2007,y
    ; ... 14 more tiles

Each indexed STA would read PPU data, ignoring the result but incrementing the address, so each tile write would be reduced to a 3-cycle load and a 5-cycle store, as distinct from having to use:
Code:
    lda rowZeroTile+0
    sta $2007
    eor #$02
    sta $2007

for the top half of all sixteen tiles and then having to do:
Code:
    lda rowZeroTile+0
    eor #$01
    sta $2007
    eor #$02
    sta $2007

for the bottom half. Using a CPLD for the CPU side as well as the PPU side could improve update speeds even further: I think a pair of even minimal-cost CPLDs, a bunch of resistors to isolate the cartridge buses from those within the NES, a 74HC373, and a 32K WRAM could probably speed up even the enhanced version by another factor of four. With the fast-transfer mode latch enabled, any address in the range $7000-$7FFF would enable the WRAM with the corresponding address in some 4K block until M2 goes high; when M2 goes high, the hardware would strobe the 373 and PPU CPLD and switch the WRAM address to $7000-$7FFF. If the upper range of addresses contains mostly do-something-immediate and jmp instructions that target the next address, the act of running code from those addresses would copy the "parallel" region of WRAM to the PPU-side RAM via the 373. I don't think that latter speedup would be needed for Ruby Runner, but it would make Elite-style graphics practical on NTSC systems.
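The 448 and 142 cycle figures can be reproduced by adding up standard 6502 instruction timings for the three listings in this post. The sketch below assumes the tile and address bytes live in zero page (hence the 3-cycle loads) and that the 448 figure leaves out the $2006 address writes, which seems to be the only way the numbers line up. The table and helper names are invented for illustration.

```python
# Standard 6502 cycle counts for the addressing modes in the listings.
CYCLES = {
    "lda zp": 3,       # assumes the tile/address bytes are in zero page
    "eor imm": 2,
    "sta abs": 4,      # sta $2006 / sta $2007
    "sta abs,y": 5,    # indexed stores always take the extra cycle
}


def cost(ops):
    """Total cycles for a sequence of instructions."""
    return sum(CYCLES[op] for op in ops)


TILES_PER_ROW = 16

# Conventional update: two writes per metatile for the top half of the row,
# then two more (with an extra EOR) for the bottom half.
top_half = cost(["lda zp", "sta abs", "eor imm", "sta abs"])
bottom_half = cost(["lda zp", "eor imm", "sta abs", "eor imm", "sta abs"])
conventional_row = TILES_PER_ROW * (top_half + bottom_half)

# Enhanced update: one address setup, then one load and one indexed store
# per metatile.
address_setup = 2 * cost(["lda zp", "sta abs"])
enhanced_row = address_setup + TILES_PER_ROW * cost(["lda zp", "sta abs,y"])

print(conventional_row, enhanced_row)
```

With those assumptions the conventional row costs 16 × (13 + 15) cycles and the enhanced row costs 14 + 16 × 8 cycles.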
Re: Ruby Runner motion test
by on (#238395)
supercat wrote:
If I use CHR RAM wired so that the same chip also replaces the NTRAM,
A significant number of Famiclones don't let you disable NTRAM. There aren't very many historical games that rely on being able to – roughly 10 – and Memblers decided that compatibility with these consoles wasn't a concern for his new GTROM board. But it could possibly influence your decisions.
Re: Ruby Runner motion test
by on (#238397)
lidnariq wrote:
A significant number of Famiclones don't let you disable NTRAM.

Is there a reliable way to test for these in software? Would it work, say, to write different values to $2000, $2400, $2800, and $2C00, and then try to read them all back? And if so, is there a reliable way to display a message if both the internal nametable memory and the cartridge are responding to reads and writes of $2000-$2FFF?

Code:
THIS GAME IS FOR NINTENDO
ENTERTAINMENT SYSTEM AND
FULLY COMPATIBLE THIRD-PARTY
GAME CONSOLES.

A PIN ON THE GAME PAK
CONNECTOR USED FOR EXPANDING
VIDEO MEMORY IS NOT PRESENT
IN YOUR CONSOLE.  IT WOULD
ALSO SHOW PROBLEMS WITH
GAUNTLET, RAD RACER II,
AND CASTLEVANIA III.
Re: Ruby Runner motion test
by on (#238401)
tepples wrote:
lidnariq wrote:
A significant number of Famiclones don't let you disable NTRAM.

Is there a reliable way to test for these in software? Would it work, say, to write different values to $2000, $2400, $2800, and $2C00, and then try to read them all back? And if so, is there a reliable way to display a message if both the internal nametable memory and the cartridge are responding to reads and writes of $2000-$2FFF?

If internal and external memory are both responding in typical fashion to reads and writes in the range $2000-$2FFF, writing and reading back data in the range $2000-$23FF without any intervening accesses to $2400-$2FFF should behave normally without bus contention. If resistors are used to isolate the cart's internal bus from that of the console, then the console's NTRAM would reliably "win" in case of conflict--a condition that could be detected. Given that the ability to use external NTRAM was part of the design intention of the NES, however, I don't see much reason that a console that can't support it shouldn't be considered "broken".
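The write-four-values-and-read-back probe tepples asks about can at least be tried against a simple software model. The Python sketch below is only that: it models clean mirroring arrangements, with each of the four logical nametables mapped to a physical 1K page, and does not model bus contention between internal and cartridge memory, which is the case actually in question. All names are hypothetical.

```python
# Which physical 1K page backs each logical nametable
# ($2000, $2400, $2800, $2C00) under each arrangement.
PAGE_MAPS = {
    "horizontal": (0, 0, 1, 1),   # $2000=$2400, $2800=$2C00
    "vertical": (0, 1, 0, 1),     # $2000=$2800, $2400=$2C00
    "single": (0, 0, 0, 0),
    "four-screen": (0, 1, 2, 3),  # cartridge supplies the extra memory
}

PROBES = (0x2000, 0x2400, 0x2800, 0x2C00)


def make_ppu(mode):
    """Return (write, read) functions over a toy nametable memory."""
    page_map = PAGE_MAPS[mode]
    ram = {}

    def physical(addr):
        table = (addr - 0x2000) // 0x400
        return page_map[table], addr & 0x3FF

    def write(addr, value):
        ram[physical(addr)] = value

    def read(addr):
        return ram.get(physical(addr), 0)

    return write, read


def probe(write, read):
    """Write 1..4 to the four nametable bases, then read all four back."""
    for i, addr in enumerate(PROBES):
        write(addr, i + 1)
    return tuple(read(addr) for addr in PROBES)


for mode in PAGE_MAPS:
    print(mode, probe(*make_ppu(mode)))
```

In this model only the four-screen arrangement reads back four distinct values, so a cartridge expecting its own nametable memory could treat any other pattern as a sign that the console's internal memory is still answering. Whether real conflicting hardware produces such a clean pattern is exactly what the model cannot say.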
Re: Ruby Runner motion test
by on (#238409)
supercat wrote:
Given that the ability to use external NTRAM was part of the design intention of the NES, however, I don't see much reason that a console that can't support it shouldn't be considered "broken".
Regardless of reasoning, my point is just that a game that relies on disabling the internal nametables has very little company, and will be broken on a significant minority of consoles. I strongly suspect that more famiclones have been sold with this brokenness than licensed PAL consoles. (You could also suspect – I think rightly – that more of those famiclones have been discarded due to being manufactured for disposability.)

It's perfectly ok to choose to be incompatible, but that's all it is.
Re: Ruby Runner motion test
by on (#238413)
A bunch of people including koitsu and myself were discussing this in Kaydus's NESdev Discord server.

Sometimes people try to solve a technical problem with the NES using a custom mapper. The problem often takes this form:

  • NES is just barely too weak for the application.
  • Super NES is too strong, and players would expect production values that a solo or duo can't deliver on time and on a ramen budget.
  • TurboGrafx-16 is just right but too obscure. Goldilocks likes Little Bear's TG16 console but realizes there might not be a chance of actually selling any HuCards.

Attachment: Goldilocks_and_Little_Bear.jpg


The NES sits in a sweet spot:

  • 2C02 PPU is capable enough for things to be recognizable but limited enough to be practical for 1 or 2 people making the assets. The former rules out the Atari 2600, the latter the Super NES.
  • The NES has a substantial installed base, meaning enough consoles in the field that people are willing to dig out of their closets and use. This rules out the TG16.

As for famiclone incompatibility, there are probably more Famicom, NES, and famiclone consoles that are compatible with 4-screen than there are TG16 consoles. And even with the engineering needed to build a custom mapper, you might still sell more copies of an NES game than an equivalent TG16 game.
Re: Ruby Runner motion test
by on (#238437)
And don't forget that most of us do this for fun, not for profit, even when money is involved, so we choose the NES because we love this specific machine, not because we're looking for the ideal retro console to do what we want.

The NES was designed with a pretty versatile cartridge slot, and during the life of the console people designed not only software for the machine, but a multitude of hardware improvements to go along with the games as well. Mapper design has the potential to be pretty fun, provided you know what you're doing and are willing to go through all the steps to get your designs out there and widely supported.
Re: Ruby Runner motion test
by on (#238459)
tepples wrote:
Sometimes people try to solve a technical problem with the NES using a custom mapper. The problem often takes this form:

  • NES is just barely too weak for the application.
  • Super NES is too strong, and players would expect production values that a solo or duo can't deliver on time and on a ramen budget.
  • TurboGrafx-16 is just right but too obscure. Goldilocks likes Little Bear's TG16 console but realizes there might not be a chance of actually selling any HuCards.


The NES was designed to allow for the possibility of custom hardware in cartridges. The routing of /PPUA13 and CIRAM /CE wouldn't make any sense otherwise. Loading up a cartridge with something that would have been impractical in the day (e.g. using an on-cartridge microcontroller to run all the game logic and simply use DMA to feed the CPU and PPU buses while the main CPU simply executes:
Code:
loop:
    lda patchpoint1
    sta patchpoint2
    jmp loop

as needed to operate the controllers, sound, and sprites) would be cheating, but including hardware that could have been built, but wasn't, isn't.

There is much greater satisfaction in designing a game that pushes the limits of what would have been possible back in the day, than in designing a game which would be considered unexceptional on the target platform. If logic can be fit on a couple of CPLDs, that would suggest that it would probably have cost no more to make than something like the MMC3 chip that was, in fact, considered practical for use in production cartridges, with the provisos that RAM and ROM would have cost significant money. If one needed 128KB+32K of ROM and 8K+8K of RAM to produce a super awesome game that couldn't possibly be done with fewer resources, so be it, but a programmer who used such resources on a game that could have been implemented as 8K+0K of ROM and 0K+0K of RAM would not have been appreciated (though I don't know that any historical games ever went below 32K+8K).
Re: Ruby Runner motion test
by on (#238460)
tokumaru wrote:
The NES was designed with a pretty versatile cartridge slot, and during the life of the console people designed not only software for the machine, but a multitude of hardware improvements to go along with the games as well. Mapper design has the potential to be pretty fun, provided you know what you're doing and are willing to go through all the steps to get your designs out there and widely supported.


The reason I'd like to see a "universal emulator" is to facilitate exactly this. A CPLD-based cart could allow programmers a lot of versatility while being reasonably practical and cheap to manufacture; there's no reason a programmer using such a cart should need to be confined to the limits of existing CPLD fusemaps.

With regard to things like the PowerPak, I would expect that if a cart has a reprogrammable FPGA, it would probably be relatively straightforward to produce a web page that could accept CPLD fusemaps targeting a particular cart, along with any necessary jumper configurations, and produce a Verilog or VHDL implementation thereof (assuming the existence of a clock which is sufficiently fast relative to the CPU and PPU clocks). Essentially, one would simply generate a block which, on each clock edge, would compute the level at each macrocell relative to the current or previous values of all macrocells (the converter should apply logic propagation when possible without loops, and introduce a one-clock delay in each loop that's detected). CPLDs should be sufficiently simple relative to the routing capacity of FPGAs as to allow synthesis of any reasonable design.
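The "compute every macrocell on each clock edge" idea can be illustrated in a few lines of Python. This is a simplified sketch, not a fusemap converter: it assumes each macrocell is a plain sum of products, and it gives every macrocell a one-clock delay rather than propagating loop-free logic within the same clock as described above. The `step` function and the `BANK_LATCH` example are hypothetical.

```python
def step(macrocells, inputs, state):
    """Advance one simulation clock.

    macrocells: {name: [product_term, ...]}, where a product term is a list
                of (signal_name, required_level) pairs.
    inputs:     {pin_name: bool} for this clock.
    state:      {macrocell_name: bool} from the previous clock.
    Every macrocell is evaluated against the previous state only.
    """
    signals = {**inputs, **state}

    def term_true(term):
        return all(signals[name] == level for name, level in term)

    return {
        name: any(term_true(term) for term in terms)
        for name, terms in macrocells.items()
    }


# Hypothetical one-bit bank latch: q follows d while we is high,
# otherwise it holds its previous value.
BANK_LATCH = {
    "q": [
        [("we", True), ("d", True)],
        [("we", False), ("q", True)],
    ],
}

state = {"q": False}
state = step(BANK_LATCH, {"we": True, "d": True}, state)    # latch a 1
latched = state["q"]
state = step(BANK_LATCH, {"we": False, "d": False}, state)  # hold
held = state["q"]
state = step(BANK_LATCH, {"we": True, "d": False}, state)   # latch a 0
cleared = state["q"]
print(latched, held, cleared)
```

A generated Verilog or VHDL block would do the same thing in hardware, with the simulation clock running fast enough relative to M2 and the PPU clock that the added delay does not matter.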
Re: Ruby Runner motion test
by on (#238461)
supercat wrote:
If one needed 128KB+32K of ROM and 8K+8K of RAM to produce a super awesome game that couldn't possibly be done with fewer resources, so be it, but a programmer who used such resources on a game that could have been implemented as 8K+0K of ROM and 0K+0K of RAM would not have been appreciated (though I don't know that any historical games ever went below 32K+8K).

The smallest licensed Famicom game is Galaxian: 8K PRG ROM + 8K CHR ROM. The smallest licensed NES games are Donkey Kong and the rest of the launch lineup: 16K PRG ROM + 8K CHR ROM.

The smallest homebrew NES game I'm aware of is Magic Floor: 4K PRG ROM + 0K CHR ROM, using mapper 218 to repurpose half of nametable memory as 64 tiles' worth of CHR RAM. I'd bet Munchie Attack (4K PRG ROM + 8K CHR RAM) and Hot Seat Harry (1K PRG ROM + 8K CHR RAM) could be ported to mapper 218 if someone tried.

On the other hand, using the absolute minimum resources to make a given game isn't always the best option. It can be wise to spend more bytes on presentation to leave a better impression on the player. My Game Boy port of Magic Floor, for instance, adds a bunch of features, such as more detailed character animation, an animated demo, distinct textures for the floor cells and border, and achievements. And though Tetris could have been done in 4K, both Tengen and Nintendo saw fit to add cut scenes and the like to fill 32K PRG ROM + 16K CHR ROM.

One practical problem with fusemap recompilation as a web service is that obtaining synthesis software for old Xilinx FPGAs, such as the one in the PowerPak, is a pain.
Re: Ruby Runner motion test
by on (#238466)
tepples wrote:
One practical problem with fusemap recompilation as a web service is that obtaining synthesis software for old Xilinx FPGAs, such as the one in the PowerPak, is a pain.


For FPGAs, that's likely true, and I don't know what if anything can be done about that unfortunate situation. CPLD fusemaps tend to be fairly simple, though. If one has a fusemap for a mapper, and knows which rows and columns are associated with which nodes, it should be straightforward to compile a device with logic assigned to prevent nodes from being optimized out, blank out the fuses associated with those nodes, and then fill in fuses according to requested configuration options without needing to use vendor tools. If people targeting a CPLD use the CPLD during development and the game is any good, someone with FPGA tools should be able to process the translated version into an FPGA fusemap.