Super Mario Bros. on the PC-Engine?
by cd_vision on 2008-07-24 (#35233)

I've got this rom for the PC-Engine that is a port of Super Mario Bros. Does anybody know the history on this? It seems to also be found in a 6-in-1 fami collection.

Did they use the original code from the famicom game, or is this programmed from the ground up? Because except for the colors and sound being a little off, it's indistinguishable from the original to me.

by mic_ on 2008-07-24 (#35234)

You could try comparing the ROMs and see if you find any chunks of code that are the same in both of them. Anything that accesses the APU, PPU or joypads would've had to be partially or completely rewritten, but the main game "engine" should've been possible to move from the NES to the PCE with few changes.
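A rough sketch of that comparison in Python, assuming both ROMs have already been read into memory with any file headers stripped. The function name and the 16-byte minimum run length are made up for illustration. Relocated code will differ wherever it contains absolute addresses, so expect many short matches rather than one long one.

```python
def common_chunks(a: bytes, b: bytes, min_len: int = 16):
    """Return (offset_a, offset_b, length) for byte runs shared by a and b."""
    # Index every min_len-byte window of `a` by its content.
    index = {}
    for i in range(len(a) - min_len + 1):
        index.setdefault(a[i:i + min_len], []).append(i)

    matches = []
    j = 0
    while j <= len(b) - min_len:
        starts = index.get(b[j:j + min_len])
        if not starts:
            j += 1
            continue
        # Extend the first candidate for as long as the bytes keep agreeing.
        # (A more thorough tool would try every candidate and keep the longest.)
        i = starts[0]
        n = min_len
        while i + n < len(a) and j + n < len(b) and a[i + n] == b[j + n]:
            n += 1
        matches.append((i, j, n))
        j += n  # skip past the matched run
    return matches
```

Feeding it the NES PRG data and the PCE image should list where the same code shows up in each, which is enough to tell a hack from a rewrite.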

by ccovell on 2008-07-24 (#35238)

NES games can be hacked (with considerable effort) to run on the PCE.

Those NES hacks are from Image, a Finnish cracking group. Since Image were Finnish, they used the PAL ROM of Super Mario Bros. for their PCE conversion, thus the colours are a bit off and the speed is wrong on NTSC systems such as the PCE/Turbo. Image also hid a few pictures at the end (?) of their PCE ROMs; go have a look for them in some tile editors.

by doppelganger on 2008-07-24 (#35254)

Very interesting. I had no idea such a port even existed.

by LoneKiltedNinja on 2008-07-24 (#35264)

Out of curiosity, I've never thought seriously about PCE dev work. I'm sure I could pick up the basic ISA, but how different is the theory behind the sound and graphics systems? IIRC PCE uses mainly PCM sound, but is it strictly sampled or can you play arbitrary bitstrings? And how much more complex is it working with the greater range of sprite sizes? FractalEngine looks like it has some direct-draw. Is that for-real, or did you have to rig a clever raster hack?

by ccovell on 2008-07-24 (#35269)

The PC-Engine was clearly designed by Hudson to be as easy to program for as the Famicom, yet much more powerful, in order to lure FC developers to the system.

The graphics system on the PCE is tile-based like the FC, meaning that graphics are always composed of a tilemap of adjustable size, and then tile characters taking up the rest of VRAM. There is enough VRAM on the PCE for an uncompressed/unoptimized 512x240 pixel image, which is what I set up in FractalEngine. My program sets up such a screen, then fills in the tile characters as it renders the fractal. It just so happens that each 8x1 horizontal line in a graphic tile takes up 4 bytes, so to get this "direct draw" thing going, I just had to have it change the VRAM addresses accordingly and write the fractal image to VRAM, 8 pixels at a time. There was no need to read previously-drawn tiles because of the tile bitplane layout.
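To make the address juggling concrete, here is a Python sketch of where an 8x1 strip would land. It assumes the usual PCE background tile layout (16 words per tile, planes 0/1 in words 0-7 and planes 2/3 in words 8-15), a 64x32 BAT at word 0, and tiles stored row-major right after it. Those placement choices and all names are hypothetical; FractalEngine's real layout isn't described here.

```python
TILE_WORDS = 16          # one 8x8, 16-colour BG tile = 32 bytes = 16 VRAM words
TILE_BASE = 0x0800       # assumed: tiles start right after a 64x32 BAT at word 0
TILES_PER_ROW = 64       # assumed: 512-pixel-wide image, tiles stored row-major


def row_addresses(x: int, y: int):
    """VRAM word addresses holding the 8x1 strip that contains pixel (x, y)."""
    tile = (y // 8) * TILES_PER_ROW + (x // 8)
    base = TILE_BASE + tile * TILE_WORDS + (y % 8)
    return base, base + 8   # planes 0/1, then planes 2/3 eight words later


def pack_strip(pixels):
    """Pack eight 4-bit colour indices (leftmost first) into two 16-bit words."""
    planes = [0, 0, 0, 0]
    for p in pixels:
        for bit in range(4):
            planes[bit] = (planes[bit] << 1) | ((p >> bit) & 1)
    return planes[0] | (planes[1] << 8), planes[2] | (planes[3] << 8)
```

Under these assumptions the numbers add up: 64x30 tiles at 16 words each is 30720 words, plus 2048 words of BAT, which is exactly 64 KiB of VRAM. Each strip owns its two words outright, which is why nothing has to be read back before writing.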

Sprites are easy to use and set up for large sizes, the only headache being figuring out the VRAM offset for sprite tiles. Really stupidly, the sprite tile format on the PCE is different from the BG tile format, so sprites and BG can't use the same CHR graphics. I don't know why Hudson did it this way...

Sound on the PCE can be PCM, but it really is just a glorified PSG chip. It is a WSG: a PSG chip with a single user-definable waveform cycle for each sound channel. Most games set up a waveform for each channel, using 4 or 5 channels as regular WSG, with perhaps the 6th channel set up with a CPU timer for PCM drum effects. Very few games use all the channels for PCM, because it is a bit tougher to do, and more CPU intensive. The PCE's sound chip is no SPC-700.

by tepples on 2008-07-24 (#35277)

ccovell wrote:
Sound on the PCE can be PCM, but it really is just a glorified PSG chip. It is a WSG: a PSG chip with a single user-definable waveform cycle for each sound channel

Like the Game Boy's counterpart to the NES triangle channel, right?

by mic_ on 2008-07-25 (#35281)

Yeah, the samples are 4-bit on the GB and 5-bit on the PCE, but it's the same concept.
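A small Python sketch of what such a user-defined waveform might look like: one sine cycle quantised to each sample depth. The 32-sample cycle length is an assumption not stated above, and the function name is made up.

```python
import math


def sine_waveform(bits: int, length: int = 32):
    """One cycle of a sine wave quantised to unsigned `bits`-bit samples."""
    top = (1 << bits) - 1
    return [round((math.sin(2 * math.pi * i / length) + 1) / 2 * top)
            for i in range(length)]


pce_wave = sine_waveform(5)   # samples in 0..31
gb_wave = sine_waveform(4)    # samples in 0..15
```

The chip just loops over the table at a rate set by the channel frequency, so changing the timbre means rewriting the table rather than streaming samples.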

by LoneKiltedNinja on 2008-07-25 (#35291)

ccovell wrote:
My program sets up such a screen, then fills in the tile characters as it renders the fractal. It just so happens that each 8x1 horizontal line in a graphic tile takes up 4 bytes, so to get this "direct draw" thing going, I just had to have it change the VRAM addresses accordingly and write the fractal image to VRAM, 8 pixels at a time. There was no need to read previously-drawn tiles because of the tile bitplane layout.

So let me see if I'm parsing this correctly- you still compose the background of tiles, but you basically fill in the tiles at runtime via the CPU, as opposed to the NES architecture which (to my knowledge) makes it a pain for the CPU to get directly at the CHR data, and doesn't give you enough distinct tiles to fill more than a fraction of the screen anyway?
The best easy hack I could think of for the NES was a premade CHR bank with every possible combination of 4 quarter-tiles x 4 colors...

by MottZilla on 2008-07-25 (#35292)

I'm sure the PC-Engine still has VBlank rules. You update the graphical hardware while it's between frames. I would guess the PC-Engine has a large amount of VRAM that the CPU loads tiles into through memory mapped registers similar to the NES. But I know the PC-Engine can remap its 64K of address space so maybe you can map VRAM into CPU space.

Honestly I don't see what you mean about the NES. The CPU can modify PPU memory well enough for some amazing games. Sure it's not fast enough to compare to how you do things on modern PCs with a frame buffer and tons of bitmaps. But it works pretty damn well when you consider how limited the hardware was back then.

What are you talking about a premade CHR bank with every possible combo of 4 quarter tiles x 4 colors? Hacking NES games to PC-Engine wouldn't require duplicating tiles for each palette. I don't really know that much about PC-Engine, but it's from the same era as the Genesis/MegaDrive and SNES.

But anyway, the NES may seem like a pain to work with, but if you take your time and think about what you are doing I really don't think it's as hard as you think it is.

by ccovell on 2008-07-25 (#35294)

The PCE still has VBlank rules, but the VDC can still be written to during the active display. I'll quote Charles MacDonald:
Quote:
During the active display period of a scanline, the VDC can do one 16-bit
access to VRAM on each cycle of the dot clock. Bits 1-0 of MWR tell the VDC
how to divide this amongst several sources:

1. CPU (reading or writing a word via register $02)
2. Background character pattern generator data (one read is for bitplanes
0 and 1, another is for bitplanes 2 and 3, either one or two are needed
per character)
3. BAT data (character name and palette, one fetch needed per character)
Code:
Bit   Dot     Dot cycles within an 8-dot unit
1 0   Width   0   1   2   3   4   5   6   7
---------------------------------------------
0 0   1       CPU BAT CPU ??? CPU CG0 CPU CG1
0 1   2       --BAT-- --CPU-- --CG0-- --CG1--
1 0   2       --BAT-- --CPU-- --CG0-- --CG0--
1 1   4       ------BAT------ ----CG0/CG1----

CPU - A read or write to register $02
BAT - The palette block and character name from the BAT
??? - Unknown, possibly an unused 'dummy' access
CG0 - Bitplanes 0, 1 from the character generator
CG1 - Bitplanes 2, 3 from the character generator

The default mode all games use is 0, as far as I can tell, modes 1, 2 are
identical, and mode 3 enables the CG mode bit as described later.

I never quite understood this exactly. VRAM can be written to at any time, but the access that the CPU is given is spread between VDC VRAM accesses. At any rate, I got small graphical corruption when I tried writing freely to the VDC in my Fractal Engine program, so I set up writes to occur during HBlank interrupts. I guess this slows down the program a little bit, but fractal computation still takes longer than one scanline, I figured.

I might go back to my program someday and see if I can get it to write to VRAM at any time, speeding up writing.

On the NES, yes CHR-RAM writing is limited, but it can still be optimized -- look at what carts like Videomation do. I think for fractals, your 4x4 CHR bank is good enough, because who wants to watch a fractal on the NES render four times slower than it already is?

by MottZilla on 2008-07-25 (#35299)

I'm not sure what a fractal engine is, but I don't see the point. These machines are meant for games. They are perfectly capable of performing rich and enjoyable games.

As for CHR-RAM and the NES in general, just look at Battletoads. :p You can do a lot with the NES. You just can't be as wasteful as you can with newer platforms. But when you consider how simple most games are, and how much fun simple games can be, the NES can get the job done. And it does quite well if you pair the NES with a beefy memory mapper like the MMC5. Though even with a MMC3, you can accomplish a lot in your games.

by ccovell on 2008-07-26 (#35304)

MottZilla wrote:
I'm not sure what a fractal engine is,

http://www.disgruntleddesigner.com/chri ... ngine.html

MottZilla wrote:
but I don't see the point. These machines are meant for games.

MottZilla wrote:
They are perfectly capable of performing rich and enjoyable games.

True, but they are also capable of running various other kinds of software. You probably also don't see the point of demos, either.

MottZilla wrote:
As for CHR-RAM and the NES in general, just look at Battletoads. :p You can do a lot with the NES. You just can't be as wasteful as you can with newer platforms. But when you consider how simple most games are, and how much fun simple games can be, the NES can get the job done. And it does quite well if you pair the NES with a beefy memory mapper like the MMC5. Though even with a MMC3, you can accomplish a lot in your games.

This, at least, is true.

by tokumaru on 2008-07-26 (#35306)

ccovell wrote:
You probably also don't see the point of demos, either.

To me, the main point of demos is showing uncommon amazing effects, that couldn't be normally used in games because they require too much CPU time/ROM/RAM/whatever.

Demos vs. games
by tepples on 2008-07-26 (#35307)

MottZilla wrote:
I don't see the point. These machines are meant for games.

Are Nintendo platforms also meant just for games? If so, are these games?
Videomation
Mario Paint
Workboy
PictoChat
Electroplankton
Animal Crossing: Wild World
Play-Yan
MoonShell
DSOrganize
All of these run on a Nintendo platform, but for each one, I can think of at least one objection to calling it a game.

tokumaru wrote:
amazing effects, that couldn't be normally used in games because they require too much CPU time/ROM/RAM/whatever.

Have you played Recca? It's a shooter that looks like a demo with sprites on top.

by MottZilla on 2008-07-26 (#35309)

ccovell wrote:
True, but they are also capable of running various other kinds of software. You probably also don't see the point of demos, either.

Oh no, I didn't mean to go that far. Demos are very impressive. It is always neat seeing what a platform can do in other tasks. I've seen a lot of demos you've done and always been quite impressed.

The thing I mainly didn't like about the guy's idea with this fractal engine was this "directdraw" business on PC-Engine really. What's the point if you are just plotting pixels? That's no fun, hardware sprites and BGs are where it's at. For games I mean.

by LoneKiltedNinja on 2008-07-26 (#35312)

Apologies for the long post, but you guys are giving me a lot to work with here

IMO, consoles are designed to play games, which is what makes it more impressive when people write console apps for other things. Resource wastefulness is agreeably a big issue with old CPUs, but it's still an issue today (Did you know an X86 floating-point division can take up to 20 times as long as an addition? How many polygons the PS2 can render without chugging? What a 'page fault' even means?).

The bigger and more interesting challenge for me is making the hardware do things it was never intended to do precisely because it was designed for games, even if those things are cheap enough to be done during games. For instance, as has been stated (and with apologies to those of you who know this cold), the NES and PCE use tile-based graphics systems, and have no official 'direct-draw' capacity. Fractals, by definition, are much friendlier to draw when you have individual control over the color of every pixel. For his Mandelbrot Set demo, ccovell knows some PCE tricks which, whether because I haven't tackled the architecture/terminology the same way he has, or because I'm just being stupid, I'm still wrapping my head around. They are allowing him to indirectly set the color of every pixel in VRAM in spite of seemingly only having direct control over which 8x8(?) tile from memory is copied to which 8x8 square of VRAM. The best I could manage for my Mandelbrot Set on the NES was to use 4x4-pixel blocks (as large as 1/4 CHR tile) as 'pixels', since conveniently 4 colors per block ^ 4 blocks per tile = 256, the total number of tile indices in 1 CHR ROM bank. I 'direct draw' by calculating the color I want for each block, then looking up the tile with those 4 blocks from my bank. I suppose I could use individual writes through the PPU to edit actual tile memory, but to me that goes against the notion of how the "average" cartridge was set up- your CHR banks are in ROM, and unless you sacrifice a chunk of a bank to use as a writeable scratchpad (at added cost and headache for the hardware engineers), all you can do is copy existing tiles into your nametables. My earlier comment on how the NES architecture makes getting at your CHR data difficult revolved around precisely this- the CHR address space is not directly accessible by the CPU, only by reads/writes through the PPU, and even then writes to your actual pixel bitmaps are expected to fizzle under most circumstances.
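The quarter-tile lookup can be sketched in a few lines of Python. The bit order of the tile index (top-left block in the low two bits) is an arbitrary choice made for this example, not necessarily the one used in the actual demo. The bank is written in the standard NES 2bpp layout of 8 plane-0 bytes followed by 8 plane-1 bytes per tile.

```python
def tile_index(tl: int, tr: int, bl: int, br: int) -> int:
    """Tile number for four 4x4 blocks, each a 2-bit colour (assumed bit order)."""
    return tl | (tr << 2) | (bl << 4) | (br << 6)


def build_bank() -> bytes:
    """Generate all 256 tiles in NES 2bpp format: 8 bytes plane 0, 8 bytes plane 1."""
    bank = bytearray()
    for index in range(256):
        tl, tr = index & 3, (index >> 2) & 3
        bl, br = (index >> 4) & 3, (index >> 6) & 3
        for plane in range(2):
            for row in range(8):
                # Top four rows show the upper blocks, bottom four the lower ones.
                left, right = (tl, tr) if row < 4 else (bl, br)
                byte = (0xF0 if (left >> plane) & 1 else 0) | \
                       (0x0F if (right >> plane) & 1 else 0)
                bank.append(byte)
    return bytes(bank)
```

Drawing a "pixel" then comes down to recomputing `tile_index` for the affected tile and writing that one byte to the nametable.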

Similarly, I acknowledge games like Battletoads, Kirby's Adventure, Final Fantasy III, Somari, and others are awesome games, but knowing what the NES "can" do, I can pick out pretty much exactly how the games are working their magic. Games like Recca are far more entertaining for me because I can't always tell how the system does what it's doing. Recca has got to be devoting some seriously anal timing loops to those full-screen raster waving effects, but moreover, I could swear I count more than 64 sprites going at once without much clipping or flicker, even on 'strict' emus like Nestopia. Heck, the train level in Little Nemo had me mildly stumped for a bit with its apparent multi-layered multi-directional scrolling/parallax. Even the MegaMan 2 title uses a vertical parallax effect that "shouldn't be possible." And porting SMB to the PCE? I can see how a good bit of the game code would be a straight shot, but the whole reason I've been curious about the graphics, sound and I/O is to see how much one would need to bend over backwards or come up with creative solutions to make the PCE do something it may not have been intended to.

by tepples on 2008-07-26 (#35316)

LoneKiltedNinja wrote:
the NES and PCE use tile-based graphics systems, and have no official 'direct-draw' capacity. Fractals, by definition, are much friendlier to draw when you have individual control over the color of every pixel. For his Mandelbrot Set demo, ccovell knows some PCE tricks which, whether because I haven't tackled the architecture/terminology the same way he has, or because I'm just being stupid, I'm still wrapping my head around. They are allowing him to indirectly set the color of every pixel in VRAM in spite of seemingly only having direct control over which 8x8(?) tile from memory is copied to which 8x8 square of VRAM.

That's because you edit the tiles themselves. Videomation for NES, Elite for NES, Qix for NES and Game Boy, FaceBall 2000 for Game Boy and Super NES, Mario Paint for Super NES, Super Game Boy's border editor, Game Boy Camera, and the cut scenes of Metroid Fusion for GBA do exactly this. I'd imagine that RPGs for the Super Famicom do the same so that they can put kanji into text boxes. So does that proportional font demo that Blargg and I made for that ebook project. In fact, the menu for the GBA and DS versions of Lockjaw works like this, treating the screen as a pixel-mapped surface despite that it's actually made of tiles, so that it can use a proportional font and still fit in the "little bank" of the DS VRAM.

Quote:
I suppose I could use individual writes through the PPU to edit actual tile memory, but to me that goes against the notion of how the "average" cartridge was set up- your CHR banks are in ROM

So U*ROM, A*ROM, B*ROM, and SNROM aren't "average"? I'll grant that CPROM and TQROM are special cases.

by LoneKiltedNinja on 2008-07-27 (#35334)

Fine, fine, you win. I've only built a NES on a protoboard and hooked carts up to it, never actually built a cart. My impression from the console's end was that the majority of carts used non-writeable memory for CHR, and from a ROM programmer standpoint, that even writeable CHR memory was of limited value since the PPU could only index 256 tiles per layer- enough for some font characters or basic line drawing, but a bit lacking for large-scale rendering.
I can believe that the newer the system, the larger and more prevalent writeable memories become, I'm just surprised so many developers would have gone to the effort/expense in the late 80s.

And that still doesn't contradict the notion that drawing a full-screen fractal by manually rewriting the tiles is a trick on PCE and not really an option on NES.
...unless you, say, manually track which tiles would be identical and hope that in any given render you don't need more than 256 unique tiles... (edit: or, reading up on that e-book thread, 512 tiles provided only 256 were needed for any given set of rows)
Y'know, that may just work, what with all the detail being on relatively thin edges around the nodes. I may need to try it some time. Would I be correct to guess that since writeability is determined by the type of chip in the cart, not the console, most emulators will honor writes to CHR as if the cart did use writeable memory?
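The bookkeeping for that idea might look something like this Python sketch: hand out one tile number per distinct pattern and give up if the render needs more than the limit. The class and method names are hypothetical.

```python
class TileAllocator:
    """Hand out a tile number per distinct 16-byte pattern, up to a fixed limit."""

    def __init__(self, limit: int = 256):
        self.limit = limit
        self.slots = {}          # pattern bytes -> tile number

    def tile_for(self, pattern: bytes) -> int:
        if pattern in self.slots:
            return self.slots[pattern]      # reuse an identical tile
        if len(self.slots) >= self.limit:
            raise OverflowError("render needs more unique tiles than available")
        number = len(self.slots)
        self.slots[pattern] = number
        return number
```

Solid-colour regions collapse to a handful of tiles this way, so only the detailed edges eat into the budget. Whether a given zoom level fits is something the renderer can only find out by trying.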

by tepples on 2008-07-27 (#35349)

LoneKiltedNinja wrote:
My impression from the console's end was that the majority of carts used non-writeable memory for CHR, and from a ROM programmer standpoint, that even writeable CHR memory was of limited value since the PPU could only index 256 tiles per layer- enough for some font characters or basic line drawing, but a bit lacking for large-scale rendering.

The PPU can index tiles from the $0000 half of the pattern table or from the $1000 half. A few games write to PPUCTRL to switch between $0000 and $1000 halfway down the screen, such as Qix. This is more than enough for a 24x20 tile playfield.
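As a quick check of the arithmetic, a Python sketch that works out how many tile rows can draw from each half if every tile on screen is unique. The function name is made up; the constant is the PPUCTRL bit that selects the $1000 half for backgrounds.

```python
PPUCTRL_BG_1000 = 0x10   # PPUCTRL bit 4: background tiles come from $1000


def split_playfield(cols: int = 24, rows: int = 20, tiles_per_half: int = 256):
    """Return (rows drawn from $0000, rows drawn from $1000) for unique tiles."""
    top_rows = min(rows, tiles_per_half // cols)      # whole rows that fit in $0000
    bottom_rows = rows - top_rows
    if bottom_rows * cols > tiles_per_half:
        raise ValueError("playfield needs more than 512 unique tiles")
    return top_rows, bottom_rows
```

For the 24x20 case this gives 10 rows from each half, 240 tiles apiece, so the switch lands exactly halfway down the playfield.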

Quote:
And that still doesn't contradict the notion that drawing a full-screen fractal by manually rewriting the tiles is a trick on PCE and not really an option on NES.

You could be right about fractals. But then, a lot of old 8-bit home computers such as the ZX Spectrum had a similar tiled graphics scheme, needing the same tile-based approach to frame buffers.

Quote:
...unless you, say, manually track which tiles would be identical and hope that in any given render you don't need more than 256 unique tiles [...] Y'know, that may just work, what with all the detail being on relatively thin edges around the nodes.

Nice idea. Now you're thinking like the developers of Color A Dinosaur, another NES game using pixel-level edits to CHR RAM.

Quote:
Would I be correct to guess that since writeability is determined by the type of chip in the cart, not the console, most emulators will honor writes to CHR as if the cart did use writeable memory?

Emulators decide whether to honor CHR writes by looking at the board spec, which makes up the first 16 bytes of an iNES file. The key values for this are the board class (aka "mapper number") and the CHR ROM size. Two board classes (TQROM and some Chinese MMC3-clones) have both CHR RAM and ROM, selected by the CHR bank number. Emulators assign 8 KiB of (writable) CHR RAM after the end of CHR ROM. For everything else, if the CHR ROM size is 0 KiB, emulators assign the appropriate amount of CHR RAM based on the board class: 16 KiB for CPROM and 8 KiB for most everything else.
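A minimal Python sketch of that header check, covering only the common case where a CHR ROM size of zero means CHR RAM. It deliberately ignores the boards that carry both RAM and ROM, and the function and key names are made up for illustration.

```python
def parse_ines_header(header: bytes) -> dict:
    """Decode the fields of a 16-byte iNES header that matter for CHR RAM."""
    if len(header) < 16 or header[:4] != b"NES\x1a":
        raise ValueError("not an iNES file")
    prg_kib = header[4] * 16                 # PRG ROM size, 16 KiB units
    chr_kib = header[5] * 8                  # CHR ROM size, 8 KiB units
    # Mapper number: low nibble in byte 6 bits 4-7, high nibble in byte 7 bits 4-7.
    mapper = (header[6] >> 4) | (header[7] & 0xF0)
    return {
        "mapper": mapper,
        "prg_rom_kib": prg_kib,
        "chr_rom_kib": chr_kib,
        "uses_chr_ram": chr_kib == 0,        # simplification: mixed boards not handled
    }
```

Reading the first 16 bytes of a .nes file and passing them in is enough to see which way an emulator is likely to treat CHR writes.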

by LoneKiltedNinja on 2008-07-29 (#35386)

Thanks, tepples.

tepples wrote:
The PPU can index tiles from the $0000 half of the pattern table or from the $1000 half. A few games write to PPUCTRL to switch between $0000 and $1000 halfway down the screen, such as Qix. This is more than enough for a 24x20 tile playfield.

I noticed that trick in your e-reader. Theoretically, with enough attention to timing, could one flip the banks after any sufficiently large number of rows? (enough, at least, to handle the interrupt and move sprite0?)

tepples wrote:
Emulators decide whether to honor CHR writes by looking at the board spec, which makes up the first 16 bytes of an iNES file. The key values for this are the board class (aka "mapper number") and the CHR ROM size. Two board classes (TQROM and some Chinese MMC3-clones) have both CHR RAM and ROM, selected by the CHR bank number. Emulators assign 8 KiB of (writable) CHR RAM after the end of CHR ROM. For everything else, if the CHR ROM size is 0 KiB, emulators assign the appropriate amount of CHR RAM based on the board class: 16 KiB for CPROM and 8 KiB for most everything else.

I suppose combining multiple memory chips would be nicer with a mapper. Possible without (EE rant omitted), but ugly. Are there really carts out there that use CHR writes as NOPs so emulators need to check the mapper, or does every cart with CHR RAM simply happen to have a mapper that has other functionality that the emulator needs to handle and/or fall into the 1 bank ROM / 1 bank RAM class?

Edit: Bleh. Yeah. Okay. More than 2 banks === uses a mapper. *headsmack* But for mapperless carts, it does sound like the emulator gets to assume that any bank not filled with stuff is writeable.

One of the things I was most relieved about during the make-a-NES project was that all the mapper stuff was cart-side so we didn't actually need to worry much about it. Sadly, while my 6502 seemed to step through code just fine on the carts we plugged in, and the PPU team got everything right graphically, we came together at the last minute and realized that their PPU didn't hold the vblank line for 3 PPU cycles and my 6502 couldn't latch a signal held for only 1/3 of a cycle, so inevitably any commercial game we tried locked up in an NMI wait before anything was drawn.

by tomaitheous on 2008-08-28 (#36503)

Quote:
At any rate, I got small graphical corruption when I tried writing freely to the VDC in my Fractal Engine program, so I set up writes to occur during HBlank interrupts

Hey Chris. If I'm correct, you're not using a double buffer for the image (not enough room for two 512x240 images). If the CPU is writing to the same location that the VDC is reading from, then you'll get an artifact for that row of 8 pixels. I usually trail the VDC raster by one scanline if I'm not using a double buffer system.

That chart Charles put up is for the speed of the VRAM. Slow modes, which aren't used for the PCE, give fewer CPU slots. The slowest mode gives no CPU slots during active display. These unused modes were for the VDC paired with slow and really slow VRAM. It also limits the VDC to 2-bit tiles/sprites instead of 4-bit because of the slower RAM. The slowest mode also limits the sprite scanline limit to half. I guess that's kind of irrelevant.

But the DOT clock chart for normal (fastest VRAM setting) mode shows the number of CPU slots open to word writes in an 8-pixel segment of a scanline during active display. If the CPU writes to the VDC data port and the VDC is busy with another access slot, the VDC will assert /RDY until that slot is free. For 5.37 MHz mode, each slot in the DOT chart is equivalent to 1.33 CPU cycles. So if you write to the latch $0003 ($0002 LSB has no delay), you'll get a small fraction of a CPU cycle delay on any misaligned writes. Faster pixel clock modes have faster slot times and even smaller fractional delays.
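The 1.33 figure falls out of the clock ratios. A Python sketch of the arithmetic, assuming the usual NTSC master clock of about 21.477 MHz, a CPU running at master/3, and dot clocks of master/4, /3 and /2:

```python
from fractions import Fraction

MASTER_HZ = Fraction(21_477_270)      # assumed NTSC master clock
CPU_HZ = MASTER_HZ / 3                # ~7.16 MHz HuC6280 in high-speed mode


def cpu_cycles_per_slot(dot_divider: int) -> Fraction:
    """Length of one VDC access slot (one dot) measured in CPU cycles."""
    dot_hz = MASTER_HZ / dot_divider
    return CPU_HZ / dot_hz


for divider in (4, 3, 2):             # ~5.37, ~7.16 and ~10.74 MHz dot clocks
    print(divider, float(cpu_cycles_per_slot(divider)))
```

So a slot is 4/3 of a CPU cycle in the 5.37 MHz mode, and shorter in the faster modes, which matches the shrinking delays described above.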

So you still have unlimited read and write access to vram during active display.

Setting up the VDC for larger incremental writes with tiles arranged in a certain pattern will allow you to write an entire scanline of pixels into VRAM without changing VRAM position, but for 16 colors you have to do two passes. Still muuuch faster than writing 8 pixels, then repositioning the VRAM pointer, doing the math, etc. If you use low res mode (256) and set up sprites to fill the entire screen, using them as a background, you can do some Amiga-style effects since that would give you 4 independent bit planes (one WORD would write 16 pixels at a time).

On the original topic of the SMB port, I did a compare of the extracted ROM and a headerless NES SMB ROM. Looks like a large chunk was changed. Probably the tilemap format, as the original code looks to be intact, sans a few hacks. The sprites are unchanged though and are padded to 16x16 even though they are still 8x8. A lot of the PPU writes are changed to JSRs. The PCE has ZP in a different address location, so the hack maps the same 8k RAM into both $0000-1fff and $2000-3fff.
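The RAM mirroring trick can be modelled in a few lines of Python. This assumes the HuC6280's eight 8 KiB mapping registers and that work RAM lives in bank $F8. The register contents for the other pages are placeholders, since the hack's full mapping isn't listed above.

```python
RAM_BANK = 0xF8   # assumed: the bank number of the PCE's 8 KiB work RAM


def physical_address(mpr, cpu_addr: int) -> int:
    """Translate a 16-bit CPU address through eight 8 KiB mapping registers."""
    bank = mpr[cpu_addr >> 13]               # top 3 bits pick the register
    return (bank << 13) | (cpu_addr & 0x1FFF)


# Hypothetical mapping for the hack: RAM visible at $0000 and at $2000.
mpr = [RAM_BANK, RAM_BANK, 0, 1, 2, 3, 4, 5]
assert physical_address(mpr, 0x0010) == physical_address(mpr, 0x2010)
```

With both pages pointing at the same bank, the NES code's absolute accesses to $0000-$07FF and the CPU's own zero-page and stack accesses at $2000 end up hitting the same bytes, so the original variable layout keeps working.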