Hello, I'm very much a newcomer to this board, and to NES development in general, so excuse any stupid questions or faux pas. I imagine all this stuff is basic to you guys but I haven't been able to find anyone doing a project like this so I would appreciate some input.
I was disappointed when this project -
https://howchoo.com/g/mti0oge5nzk/pi-ca ... -cartridge - turned out to just be another Raspberry Pi case mod. It got me thinking - could you actually put a Pi in a cart and have it output video through the NES? Running any game that the Pi can on an unmodified NES seemed like a fun idea. I was hoping the NES had a simple composite passthrough or something, but that would be too easy!
I suppose the "proper" way to do it would be to have the Pi fill some RAM attached to the PPU's memory bus, but that limits us to 256 different tiles and 16x16 attribute areas. I guess you could do some kind of pattern-matching thing that squashed similar patterns together, or map the pattern and attribute tables to a larger virtual address space using scanline-synced binary counters or something, but I'm lazy and stupid and it just made my head hurt.
So I thought - since we know the exact sequence the PPU will fetch data, why not ignore the address bus entirely and just feed it bytes in the correct order? Since we can feed it different pattern and attribute bytes for each 8x1 pixel area, that gives us 8x1 attributes and a unique pattern for each tile.
Anyway, long story short, I've (mostly) got it working :
https://www.youtube.com/watch?v=CxBBYujFDYM
Right now I'm squirting pre-generated 40970-byte (170 bytes per scanline * 241 scanlines) packets to the PPU using a high-speed USB-to-parallel-FIFO chip (FT232H). There's a small PRG ROM that sets up the palette and turns on rendering, as well as sending frame-sync commands back to the Pi. Unfortunately the first-generation Pi I'm using isn't quite fast enough to generate the palette-matched and Floyd-Steinberg-dithered frames on-the-fly, but I expect the Pi 3 would have enough power to do it.
I guess my questions are :
1. Has anyone done any work on converting images into NES-friendly formats? I've tried using various homebrew per-frame palette generating algorithms, but nothing seems to work as well as manually hand-picking a palette.
2. What would be the best way of getting controller input back to the Pi? I was thinking a binary comparator on the PRG address lines feeding the chip enable pin of an 8-bit latch, but I'm not really a hardware guy so I'm sure there's a simpler way than that. Keep in mind the process latency on a Pi is far too high to be able to sit directly on a traditional CPU bus.
3. I don't want to reinvent the wheel, so has anyone done any work in this sort of area before? Again I'm a total newcomer to the NES scene (in fact I'd never even played a NES before starting this project, they weren't very common in the UK, haha).
I will of course share source code if anyone's interested.
Oh man, we've been talking about this for years! It's really cool to see someone actually did it.
rasteri wrote:
attribute bytes for each 8x1 pixel area
How do you pick which palette to use?
Quote:
palette-matched and Floyd-Steinberg dithered frames on-the-fly
Temporal dithering will dramatically improve the video quality, too. (
http://www.pouet.net/prod.php?which=60939 )
Quote:
1. Has anyone done any work on converting images into NES-friendly formats? I've tried using various homebrew per-frame palette generating algorithms, but nothing seems to work as well as manually hand-picking a palette.
Unfortunately, the only work I've seen was thefox's
converter.
Quote:
2. What would be the best way of getting controller input back to the Pi? I was thinking a binary comparator on the PRG address lines feeding the chip enable pin of an 8-bit latch, but I'm not really a hardware guy so I'm sure there's a simpler way than that. Keep in mind the process latency on a Pi is far too high to be able to sit directly on a traditional CPU bus.
The NES controllers are basically SPI slaves; the NES CPU manually runs the bus.
A pair of latches, a decoder, and having the code on the 2A03 read and relay the values is probably the simplest option, as you guessed.
lidnariq wrote:
How do you pick which palette to use?
For each pixel in the slice it finds the palette that has the closest color match, then whichever palette has the most matches "wins" that slice. I'm sure there must be better/more efficient algorithms.
Quote:
Temporal dithering will dramatically improve the video quality, too. (
http://www.pouet.net/prod.php?which=60939 )
Awesome, I hadn't heard of that but I see some commercial LCD panels do it too. As I understand it, instead of shifting the error onto nearby pixels a la Floyd-Steinberg, you shift it onto the following frame?
Exactly.
One time I was playing around with temporal dithering, I just used a plain white noise source (LFSR of prime period) and added it to my intended signal. Then I truncated the result to the available output depth. Since the noise source is
stationary, I don't need to explicitly propagate error either spatially or temporally. But I also treated RGB independently, which means that the error signals are 1-dimensional, and the random walk is guaranteed to return average 0.
This was in preparation for trying to use Cypress's EZ-USB parts to bit-bang a VGA subset (24MHz R2G2B2) but never managed to figure out how to write the host support.
While Pi 1 does not have Neon, it does have some simpler SIMD instructions. Using those to optimize your code might make it fast enough.
Heads-up that the link to thefox's converter in that thread is dead.
Here's the correct one:
https://kkfos.aspekt.fi/downloads/nes-i ... -2-v01.zip
It achieves higher colour fidelity by flipping palettes and tables each line and frame, which many HDTVs won't reproduce. The effect could work (with the right screen), though there will be some side effects from the strobe-as-interpolation when the whole screen is changing a lot (like camera movement in a 3D game). It may come across as either smudgy motion blur (perfectly OK, actually it could be flattering to the image), or jitter/stutter if the change is extreme enough, probably eye-straining or worse in the long run, depending on how much the scene changes from frame to frame.
I've implemented temporal dithering in my image encoder, and it seems really flickery on my emulated output, but hopefully it'll look better on the NES. I may have to buy a PVM (good luck finding a consumer NTSC CRT in the UK), since I'm guessing that the crappy 240p deinterlacing my TV does will totally mangle it.
Applying a motion blur effect in virtualdub makes it look great, so I guess my algorithm must be OK.
lidnariq wrote:
Temporal dithering will dramatically improve the video quality, too. (
http://www.pouet.net/prod.php?which=60939 )
Please, no! Flickering looks awful on a real NES + real CRT screen.
Other than that, I'm happy to see someone FINALLY trying to push the NES 2A03 to its "limits" - it's about time!
Have you actually watched Out-oh-mat?
Or the 3840 color VGA tweak mode? (e.g.
http://swag.outpostbbs.net/GRAPHICS/0191.PAS.html )
I don't know if it's possible to use this on your project, but I like tepples' RGB121:
viewtopic.php?f=21&t=13014&start=15
viewtopic.php?f=9&t=13042&start=15
I've tried some of the flicker ROMs made by the penguin game guy, and they looked awful. I couldn't stand that flicker for long.
Cool to see somebody experimenting with this stuff. I have been meaning to do something similar for a long time.
calima wrote:
I've tried some of the flicker ROMs made by the penguin game guy, and they looked awful. I couldn't stand that flicker for long.
By "the penguin game guy" I think you mean Macbee, the member that posted immediately prior to you.
I really like the effect in Lucky Penguin, and on my particular television and hardware it's pretty seamless. But it is an optional setting for a reason (different hardware, different user visual sensitivity, etc.).
Flickering on a CRT and real NES is practically unnoticeable. It is the way to go if you want to simulate more colors. Just use horizontal flicker, never vertical. When running on an emulator with an inconsistent frame rate it can look bad, though.
Anyway, this project is amazing. I couldn't tell whether the input question has been answered yet, but I hope it can work.
I agree that flicker ought to be an option, but it'd be great looking for those who'd be able to view it correctly.
Anyway, I think one last thing you could try to reach higher fidelity is streaming sprite CHR and mapping sprites to places where they can improve colour detail or smooth out overly harsh attribute borders. It would be less complex to write code for it if each sprite were in effect treated as a single pixel (basically a particle) or perhaps an 8x1 sliver (i.e. 1-dimensional), although that would cap some of its potential. And it would need to be optimized so it doesn't waste sprites by exceeding the sprites-per-scanline limit with priorities set wrong.
Unfortunately, OAM is internal, so I don't think it could be streamed into. The CPU would be full-time responsible for updating it between frames, moving around a select number of sprites... possibly updating one half of the selection one frame and the other half the next.
...
It's a fantastic project, congrats!
Could just set the full set of sprites to a 64x128 pixel block somewhere with a fixed palette and let the extra fetches (since rasteri is already providing data for all the fetches) provide a 3-color overlay.
Makes the quantization logic harder, tho.
A (well, potentially) more dynamic approach somewhere in between those two would be to have a few metasprite objects set to different sizes.
Just winging an example:
2 4x4 sprites
6 2x2 sprites
8 objects, 56 sprites. These provide field coverage, as if they were bg tiles (mostly?). I think aligning them to a grid would help.
Lastly, a number of (low-priority?) stand-alone sprites
8 objects, 8 sprites:
These are for details with higher colour fidelity or where the background and/or field coverage objects are too clumsy.
Get both some block coverage and details; update 16 objects between frames.
It avoids the more colourful areas reading as one very square block.
Whether to go 8x8 or 8x16 is, I guess, a coverage vs. complexity balance.
8x16 could either mean larger fields, more details, or something in-between.
I think the bg layer would need to be calculated as if there were no sprite overlay, so it never looks glitchy when sprites get dropped by the scanline limit. Then sprites would be applied to the areas where the posterization has been most severe compared to the original frame.
nesrocks wrote:
Flickering on a CRT and real NES is practically unnoticeable. It is the way to go if you want to simulate more colors.
It is very noticeable. It doesn't look like a new colour, but like two actually flickering colours - which is what it actually is.
Some 18-bit consumer LCD panels use temporal dithering to simulate 24-bit color, but they're flickering between two very similar colors - on a NES, the two colors you're flickering between will have a significant color distance, which I think will make it more noticeable.
I'm gonna try anyway though. I think I've found an NTSC CRT locally.
At least you have, in principle, more leeway with hues than with brightness tiers.
rasteri wrote:
Some 18-bit consumer LCD panels use temporal dithering to simulate 24-bit color, but they're flickering between two very similar colors - on a NES, the two colors you're flickering between will have a significant color distance, which I think will make it more noticeable.
I'm gonna try anyway though. I think I've found an NTSC CRT locally.
Good luck! I think it can look very impressive, even if doing distant hue/brightness which makes the flickering noticeable. The tradeoff is totally worth it, especially on busy moving scenes or even full screen animation/video.
Yeah it'd be great if it can be made to work acceptably.
I have another question - how much current can the NES safely supply through the cartridge port? The raspi will use upwards of 200mA and I don't wanna start blowing up people's 5v regulators.
Not an answer to that, but:
did you use a first-gen RPi before?
The RPi 3 (and 2) are relatively power-hungry, but the RPi Zero is a lot quicker in single-threaded operation than the models A+ and B+, while sitting somewhere in between them in power consumption.
Reference.
The CPU and PPU draw somewhere around 300-400mA, and there's a 1A regulator on the mainboard. You're likely ok, but I'd worry a little about surge current when you turn things on.
lidnariq wrote:
The CPU and PPU draw somewhere around 300-400mA, and there's a 1A regulator on the mainboard.
That's reassuring.
lidnariq wrote:
You're likely ok, but I'd worry a little about surge current when you turn things on.
The raspi doesn't have any huge capacitors on it or anything. If necessary I'll use an NTC. Or just a resistor, haha.
You should run a NES emulator on the Pi
Haha, don't think I haven't considered that
Another question - I'm considering designing cart PCBs that have an FT232H as well as a small PRG eeprom, to make development easier. I haven't been immediately able to find any decent technical drawings of the cart PCB. Does anyone have one, or even better a premade library for Altium/Eagle/whatever?
If you're open to giving designspark a try, I've got PCB outline along with my current component library posted in this
topic. Should have most of what you're looking for in terms of edge connector, EPROM, SRAM, etc.
infiniteneslives wrote:
If you're open to giving designspark a try, I've got PCB outline along with my current component library posted in this
topic. Should have most of what you're looking for in terms of edge connector, EPROM, SRAM, etc.
I'm certainly willing to try designspark long enough to steal your dimensions
I really just need the board outline and edge connector spacing (which I hear is 2.5mm rather than .1"/2.54mm for some bizarre reason)
rasteri wrote:
I really just need the board outline and edge connector spacing (which I hear is 2.5mm rather than .1"/2.54mm for some bizarre reason)
Yes, that's true. Interestingly enough, the (American) NES is metric pitch, whereas the Japanese Famicom is English/standard (2.54mm pitch). Perhaps it has something to do with the custom-designed nature of the NES's ZIF connector, while the Famicom used off-the-shelf 'standardized' female edge connectors.
I'm having a lot of problems keeping the FT232 synced with the NES. It only has a 1K buffer so it needs to be fed data at least every 300us, and once DOOM and the encoding threads start competing for context switches - not to mention the USB overhead - the display starts getting pretty marginal. I'm going to experiment with RTlinux but I suspect it's not going to help.
Possible solutions I've thought of :
1. FIFO chips - may be difficult to frame sync (I'm using _WR during the vblank as a frame sync)
2. Dual-port RAM - write the entire 40970 byte frame to a (potentially double-buffered) dual port ram and use counters to generate the address lines
3. Try a different USB FIFO such as the Cypress FX2LP.
4. Run another OS - this would involve porting DOOM/SDL/USB stack
I really liked the elegant one-component simplicity of the USB FIFO solution, so I don't want to try any of those. It's a shame the only suitable high-speed data bus out of the raspi is USB, with all its inherent problems. I'm gonna keep trying though.
At least, it's pretty cheap to get your hands on a PCB containing Cypress's FX2LP. It still only has 4 KiB of RAM for buffering... I don't know if needing to send new data to the endpoint 1/4 as often will help enough.
lidnariq wrote:
At least, it's pretty cheap to get your hands on a PCB containing Cypress's FX2LP. It still only has 4 KiB of RAM for buffering... I don't know if needing to send new data to the endpoint 1/4 as often will help enough.
It's ALMOST fast enough to keep up with a 1KB buffer, so 4x that much might be enough. The FX2LP, however, is an actual microcontroller requiring code to be written, and I'm already approaching my "silly project time" threshold for this year, so it may have to wait a couple months so I can afford to eat.
Actually I have no idea how the FTDI D2XX library handles blocking (source is not available) so it's possible libftdi or just raw libusb might be more efficient.
Quick update, I'm still working on getting the FX2 interfaced properly, but I'm very impressed by it overall. It's almost like a tiny very-easy-to-program CPLD, in that it has a state machine that can be triggered using combinational logic from any internal or external trigger. And everything is configured from a GUI which is great for someone like me who's not really a hardware guy.
It seems to me that I could even get controller input via the PPU just by writing to PPUDATA - the FX2 could just read and relay it back to the Pi. No need for any external glue logic at all.
BTW my naive attempts at temporal dithering (i.e. just adding the error onto the same pixel in the following frame) look terrible. I really want to get it working (especially since it saves so much processor time vs. Floyd-Steinberg or my favorite Atkinson dither) so I'll look into rgb121 some more.
You definitely can't propagate the entirety of the error signal when using temporal dithering. It oscillates problematically.
Some years back I played around with a bunch of different dithering techniques, as I was looking into emitting 3 or 6bpp VGA using an FX2. (Never got far enough). I did write a couple of simulators using SDL to play around with different dithering modes, with varying levels of visual quality.
Unfortunately I didn't take notes as to how well each one worked.
I tried the following:
* Propagate 1/3 the error into the pixel to the right, and 1/3 into the pixel in the next frame
* Propagate 3/7 the error into the next frame. On half the scanlines, propagate 1/7 the error each up/down/right; then on the other scanlines, propagate 3/7 error left.
* Just adding stationary uniformly-distributed noise before quantization
* 1st-order sigma delta
* 2nd-order sigma delta
The last produced quite nice results, but is probably wholly inapplicable to moving content.
In the video the guy says that he needed to kill an original NES cart for the CIC lockout chip. I'm pretty sure this community has already solved this. The lockout chip for the SNES has been reverse-engineered, and you can get a chip burned with that code on it, making SNES homebrew possible.
Also, to the creator of the video: if you can get a board made where it's easy to hook up the Raspberry Pi, I would love to make an NES game with an enhancement chip!
I don't know if you can increase the sprite count or whatever, but if you can, very advanced NES games would be something cool to see.
Decided to come back to this project, after learning a bit more about the FX2LP and USB in general.
Got frame sync near-100% stable now.
I was wondering about music - I've seen PCM demos for the NES but I'd (probably) need another FX2LP to stream data in and out of the CPU bus.
Maybe the thing to do is make funky 2A03 chiptune versions of the DOOM soundtrack, haha
That would be fun.
Vid here -
https://www.youtube.com/watch?v=FzVN9kIUNxw
rasteri wrote:
I was wondering about music - I've seen PCM demos for the NES but I'd (probably) need another FX2LP to stream data in and out of the CPU bus.
If you want to keep the NES unmodded, try to drive the IRQ line with your external hardware and put the pre-mixed sample for the DAC on a memory-mapped I/O port. To cut the overhead of the IRQ for the NES as short as possible, reading the port should also acknowledge the IRQ and reset whatever time you use to generate the IRQ signal. Make the NES do as little as possible. If the console isn't doing a whole lot, you can expect to get a rate of at least 22kHz.
It's a lovely project so far, though I wonder if the NES itself could run a game that would make use of the Pi for graphics processing only?
za909 wrote:
try to drive the IRQ line with your external hardware and put the pre-mixed sample for the DAC on a memory-mapped I/O port.
Yeah this is sounding like more work than I really have time for
Perhaps someone else can do it when I open-source this project (which I will do).
I did a test arrangement of E1M1 in famitracker, it sounds pretty nice IMO :
https://www.youtube.com/watch?v=c9Ky79Wpg7o
Quote:
It's a lovely project so far, though I wonder if the NES itself could run a game that would make use of the Pi for graphics processing only?
I don't see any reason why not.
So this is basically stable enough to release now.
Source code and build guide can be found here :
https://github.com/rasteri/PiPU
It's still quite basic, and every part of the project could be improved massively, but I'm unlikely to have any more time to work on silly projects until next year.
If anyone feels like tinkering with the code, let me know if you need help setting up a dev environment.
I've also made a video explaining how everything works :
https://www.youtube.com/watch?v=gCWhWBtu0LA
Very cool! Now, you say this is limited to 13 colors on screen at once, but since you're ignoring the address bus, shouldn't it be possible to update a number of consecutive palette entries every hblank, improving the quality of images with more vertical color distribution? Sure this would complicate the graphics conversion process (for each scanline you could maybe pick the least used palette, or the one that caused the most coloring errors, and have it recalculated for the next scanline), and increase the amount of palette data that needs to be sent to the console, but it'd be interesting to see.
tokumaru wrote:
Very cool! Now, you say this is limited to 13 colors on screen at once, but since you're ignoring the address bus, shouldn't it be possible to update a number of consecutive palette entries every hblank, improving the quality of images with more vertical color distribution?
That would be very cool. Feel free to implement it
Or maybe I will some day.
How many palette entries could be updated each hblank?
Since you don't need to restore scroll, you should be able to update one or two (sequential) palette entries every scanline.
0) Write $3F to $2006 early (0 cycles)
1) preload A,X,Y (0 cycles)
2) Write $0 to $2001 to disable rendering (ideally landing on exactly dot 255/6/7) (≈1 cycle during hblank)
3) Write [palette index] to $2006 (4 cycles)
4) Write new color to $2007 (4 cycles)
5a) register pressure (2 cycles maybe)
5b) Write new color to $2007 (4 cycles)
6a) register pressure (2 cycles maybe)
6b) Write $0A to $2001 to re-enable rendering (4 cycles) - must be no later than dot 319. This just barely leaves room for two "register pressure" moments and two new (sequential) colors (1+4+4+2+4+2+4)
It seems possible with this cartridge to get all points addressable, with 8x1 areas of colour (rather than 16x16). (I guess it does this already.)
On the Famicom, you could probably also use the Raspberry Pi for audio (and you could combine this with 2A03 audio if wanted). And then (on an RF Famicom only) you can also use the microphone with it.
For a game using sprites with additional colours, you might also add 6502 code to set up the sprites, in addition to audio (if used) and the controller (including even 2 players and whatever else might be in use; for example, if using a light gun it can include code to transmit the hit-test data).
I looked at the existing code; it is C, although assembly would also do, I think. (Even a game could be made that uses both together, somehow.)