Scanline- vs pixelengine

by oRBIT2002 on 2016-02-15 (#164552)

I've only written scanline-based graphics engines for my emulators so far, but I've been thinking of trying to create a pixel-by-pixel engine. The question is whether it's worth the effort.
The drawback with (my) scanline-based engines is that they are obviously not as accurate as pixel-based ones (but they still work pretty well in most cases).

I'm currently working on a scanline-based engine and have discovered a few problems with it that I want to fix. The question is whether I should partly rewrite my engine (however, it tends to get a bit "hack-ish") or rewrite it from scratch pixel-by-pixel (which seems pretty complex, judging from the sources and wiki info I've checked).

So, is there anyone here with pixel-per-pixel engine experience who'd like to share some wise thoughts/hints if I decide to go that way?

Re: Scanline- vs pixelengine
by Disch on 2016-02-15 (#164553)

It depends what your goals are, and how much you want to stress accuracy. Obviously a scanline-based emulator is never going to be as accurate as a pixel-based one... but do you really care? If you don't... and if you're happy with getting games to run even if the emu itself isn't completely accurate, then don't bother.

As for difficulty -- it's not as difficult as you think. It's just taking things in smaller steps.

Re: Scanline- vs pixelengine
by Dwedit on 2016-02-15 (#164557)

What causes the PPU pixels to change within a scanline?
*Changes in CHR mapping (including automatically triggered stuff from Punch-Out!!/Fire Emblem)
*Changes in PPU control $2000 (which banks it uses for Sprites and BG)
*Changes in PPU mask $2001 (enabling/disabling, color emphasis, monochrome, first column hides)
*Changes in fine scroll ($2005 first write)
*Changes in VRAM address

If you properly handle these changes at the correct timings, you've turned a scanline based engine into a pixel based engine.

Note that the process for the PPU pipeline still matters for getting the pixels correct. But if nothing changes in the scanline, you don't worry about anything.
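
A minimal sketch in C of that idea, under stated assumptions: a scanline renderer that is flushed up to the current pixel column before a register write takes effect. The names (`LinePPU`, `render_until`, `write_2001`) are hypothetical, and the "drawing" only records the $2001 value so the split is visible. The fetch pipeline and its timing offsets are not modeled.

```c
#include <stdint.h>

/* Hypothetical state for a scanline renderer that is flushed lazily. */
typedef struct {
    int     rendered_x;   /* next pixel column not yet drawn on this line */
    uint8_t mask;         /* current value of $2001 */
    uint8_t line[256];    /* output buffer for the current scanline */
} LinePPU;

/* Draw columns [rendered_x, upto) using the state as it is right now.
   Here "drawing" just stores the mask value in the buffer. */
void render_until(LinePPU *p, int upto)
{
    if (upto > 256)
        upto = 256;
    while (p->rendered_x < upto) {
        p->line[p->rendered_x] = p->mask;
        p->rendered_x++;
    }
}

/* A write to $2001 at pixel column x: finish everything that belongs to
   the old value first, then apply the new one. */
void write_2001(LinePPU *p, int x, uint8_t value)
{
    render_until(p, x);
    p->mask = value;
}

/* Finish the line and get ready for the next one. */
void end_of_line(LinePPU *p)
{
    render_until(p, 256);
    p->rendered_x = 0;
}
```

If nothing is written during the line, `end_of_line` draws all 256 columns in one go, which is the plain scanline case.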

Re: Scanline- vs pixelengine
by tokumaru on 2016-02-15 (#164558)

Disch wrote:

It depends what your goals are

Exactly. A pixel-based emulator is really useful for developers trying to time raster effects and such, but players in general couldn't care less.

Quote:

As for difficulty -- it's not as difficult as you think. It's just taking things in smaller steps.

In fact, breaking it down into smaller steps could actually make things easier!

Re: Scanline- vs pixelengine
by oRBIT2002 on 2016-02-15 (#164561)

It seems a bit frightening to calculate which parts of a tile/Sprite to draw on screen..

I may not need total compatibility, but something in between would be nice. Of course, it would also be a nice ego boost to get a pixel-based renderer working.

Re: Scanline- vs pixelengine
by zeroone on 2016-02-15 (#164564)

As mentioned by others above, it's up to you to decide how far down the rabbit hole you wish to travel. When it comes to side projects in general, my advice is to develop them in layers, kind of like an onion. Once a layer of development is completed, you should have something releasable. And by releasable, I mean that each layer is a give-up point. Time is a limited resource and an NES emulator can contain a surprisingly large set of features. Nevertheless, many on this forum have persisted with their efforts on the same side project for many years, and this in a world already saturated with accurate emulators. The primary audience of your side project is yourself.

Re: Scanline- vs pixelengine
by oRBIT2002 on 2016-02-16 (#164615)

I'm very tempted to start this project, but we'll see...

The basic idea is to draw 3 pixels for each CPU clockcycle right? Currently my CPU emulation calls a clockcycle-countdown function at the end of each opcode to be able to calculate the current scanline etc.
Would it be an idea to implement this "pixel-code" in this clockcycle-counter-function or am I getting it wrong?
I've seen emulators not counting clockcycles at all so perhaps there's a better way..?

Re: Scanline- vs pixelengine
by Disch on 2016-02-16 (#164632)

Not necessarily 3 pixels per CPU cycle... but 3 PPU cycles per CPU cycle. Only some PPU cycles actually render pixels.

As far as cycle tallying, I typically do a catchup approach with timestamps to keep track of where each subsystem is... with the timestamp scaled up to some "master clock" setting.

Typically I do:
1 PPU cycle = 5 master clocks
1 NTSC CPU cycle = 15 master clocks
1 PAL CPU cycle = 16 master clocks

So every time the CPU does 1 cycle, you would add 15 to its timestamp (or 16, if PAL)

On the CPU side, 1 cycle always corresponds to either a read or a write. So if you emulate all the dummy reads for all instructions, you don't have to tally cycles. You just increment the timestamp on every read/write.

When the CPU does something that affects the PPU (like reading/writing to a PPU reg), "catch up" the PPU by emulating it until its timestamp reaches the CPU's current timestamp (so the CPU runs ahead of everything, and then the other systems catch up to it when they need to sync for something).

Of course there are countless ways to do this. This is just the way I found that works for me.
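
A minimal sketch in C of the catch-up scheme described above, using the same master-clock scaling (5 per PPU cycle, 15 or 16 per CPU cycle). The struct and function names are hypothetical, and the PPU "work" is just a counter standing in for emulating one dot. Running only whole PPU cycles that fit before the CPU timestamp is one possible choice, not the only one.

```c
#include <stdint.h>

enum {
    MASTER_PER_PPU      = 5,   /* 1 PPU cycle */
    MASTER_PER_CPU_NTSC = 15,  /* 1 NTSC CPU cycle */
    MASTER_PER_CPU_PAL  = 16   /* 1 PAL CPU cycle */
};

typedef struct {
    uint64_t cpu_time;        /* CPU timestamp in master clocks */
    uint64_t ppu_time;        /* PPU timestamp in master clocks */
    uint64_t ppu_cycles_run;  /* stand-in for real PPU work */
    int      master_per_cpu;  /* 15 for NTSC, 16 for PAL */
} Clocks;

/* Called on every CPU read or write: one bus access is one CPU cycle. */
void cpu_tick(Clocks *c)
{
    c->cpu_time += (uint64_t)c->master_per_cpu;
}

/* Called before the CPU touches a PPU register: run whole PPU cycles
   until the PPU has caught up with the CPU's timestamp. */
void ppu_catch_up(Clocks *c)
{
    while (c->ppu_time + MASTER_PER_PPU <= c->cpu_time) {
        c->ppu_time += MASTER_PER_PPU;
        c->ppu_cycles_run++;  /* a real emulator would emulate one dot here */
    }
}
```

With the NTSC setting, two CPU cycles (a NOP) let the PPU run 6 cycles. With the PAL setting, five CPU cycles let it run 16, and the remainder after a single CPU cycle simply carries over in the timestamps.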

Re: Scanline- vs pixelengine
by Zepper on 2016-02-16 (#164635)

oRBIT2002 wrote:

The basic idea is to draw 3 pixels for each CPU clockcycle right?

Right. Remember that 3 pixels = 3 PPU cycles. You know, it's not only a matter of drawing pixels. You must write your PPU timing loop: VBlank, NMI, scanline/cycle switch case, DMC cycles...

Quote:

Currently my CPU emulation calls a clockcycle-countdown function at the end of each opcode to be able to calculate the current scanline etc.
Would it be an idea to implement this "pixel-code" in this clockcycle-counter-function or am I getting it wrong?

Well, my emulator works at the "microcode level": it runs the PPU at every CPU memory access. Running the PPU at the end of each instruction..!? No clue. PPU timing is very sensitive, and even I am having problems with timing (Battletoads, The Simpsons, Mega Man V). So try it yourself, and get ready for long sessions of timing debugging... until it's finally fine-tuned.

Quote:

I've seen emulators not counting clockcycles at all so perhaps there's a better way..?

Good. I don't need any CPU cycle counters. All the timing is driven by the PPU.

Re: Scanline- vs pixelengine
by oRBIT2002 on 2016-02-17 (#164654)

One question (among many others) is how you pixel-rendering guys keep track of sprites and draw them with this approach.
Assume, for example, that you're going to draw a background pixel at position 100,100. You have to know whether there's a sprite there and draw the correct pixel from it. There has to be a shortcut I don't know about?

Re: Scanline- vs pixelengine
by Dwedit on 2016-02-17 (#164663)

The answer to sprites is how the NES itself does it.
During each scanline, it looks for the first 8 sprites that are in Y range. Then during Hblank time, it fetches the 16 graphics bytes for the 1px tall slivers of those sprites. So if you have the X coordinates of the 8 sprite slivers, and the graphics for those sprites, they are easy enough to draw.
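
A minimal sketch in C of the "first 8 sprites in Y range" step, assuming a plain 256-byte OAM array with 4 bytes per sprite and the Y coordinate in byte 0. The function name is hypothetical. It leaves out the sprite overflow flag and the one-line delay between evaluation and display, so which scanline number you pass in depends on how you handle that delay.

```c
#include <stdint.h>

/* Collect the OAM indices of up to 8 sprites whose Y range covers
   `scanline`. `height` is 8 or 16. Returns the number of sprites found. */
int find_sprites_on_line(const uint8_t oam[256], int scanline,
                         int height, int out_index[8])
{
    int count = 0;
    for (int i = 0; i < 64 && count < 8; i++) {
        int row = scanline - oam[i * 4];  /* row inside the sprite */
        if (row >= 0 && row < height)
            out_index[count++] = i;       /* lower index comes first */
    }
    return count;
}
```

For each index returned, the row inside the sprite selects which two pattern bytes to fetch, and the sprite's X byte says where the 8-pixel sliver starts on the line.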

Re: Scanline- vs pixelengine
by tokumaru on 2016-02-17 (#164664)

Isn't it a bad idea to lock the PPU:CPU ratio to 3:1 if you plan to support PAL as well?

Re: Scanline- vs pixelengine
by oRBIT2002 on 2016-02-17 (#164666)

I've been checking the timing diagram on the wiki.

Larger one here:
http://wiki.nesdev.com/w/images/d/d1/Ntsc_timing.png

Just to clarify things, here's an example:
Assume we start on scanline 0. First CPU instruction is a NOP. A NOP is 2 CPU clockcycles. Assuming screen is turned on etc.. during this time 6 pixels will be drawn (2*3 "PPU Cycles"). Am I right so far?

Re: Scanline- vs pixelengine
by Zepper on 2016-02-17 (#164667)

Actually, the PPU is always running. For each CPU cycle, the PPU runs for 3 cycles. Pixels are rendered only if background or sprites are enabled. Keep track of the PPU cycle and scanline - you need it for VBlank, NMI... as I already mentioned here.

Re: Scanline- vs pixelengine
by oRBIT2002 on 2016-02-17 (#164668)

Ok, except for the part about "screen turned on" etc, I was right then..

Re: Scanline- vs pixelengine
by Disch on 2016-02-17 (#164669)

On that graphic, cycles [1..256] are the ones that render pixels. Cycle 0 and cycles 257+ do not. So in your example if you start at cycle 0 scanline 0 and do a NOP, only 5 pixels will be rendered.

EDIT:
I always thought it was easier to use the other numbering system (where pixels start at cycle 0), before it switched to the 2C02 numbering system. If you want to do that, simply shift that entire chart over to the left one space. You'll find that when you do that, more things align with 0, and with multiples of 8... which generally makes them easier to deal with.

Re: Scanline- vs pixelengine
by oRBIT2002 on 2016-02-18 (#164701)

Ok, so the "skipped on bg-odd" and "idle" boxes each count as an entire PPU cycle in the chart then?

Re: Scanline- vs pixelengine
by Disch on 2016-02-18 (#164716)

Yes. Every square box is a PPU cycle. 341 per scanline * 262 scanlines
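
A minimal sketch in C of a position counter built on those numbers (341 PPU cycles per scanline, 262 scanlines per frame). The names are hypothetical, and the cycle that the diagram marks as skipped on odd frames is deliberately ignored here.

```c
enum { DOTS_PER_LINE = 341, LINES_PER_FRAME = 262 };

typedef struct {
    int scanline;  /* 0..261 */
    int dot;       /* 0..340, the PPU cycle within the scanline */
} PpuPos;

/* Advance the PPU position by `cycles` PPU cycles, one box at a time. */
void ppu_advance(PpuPos *p, int cycles)
{
    while (cycles-- > 0) {
        if (++p->dot == DOTS_PER_LINE) {
            p->dot = 0;
            if (++p->scanline == LINES_PER_FRAME)
                p->scanline = 0;
        }
    }
}
```

A 2-cycle NOP on NTSC would be `ppu_advance(&pos, 6)`, and a full frame is 341 * 262 = 89342 calls' worth of cycles.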

Re: Scanline- vs pixelengine
by oRBIT2002 on 2016-02-18 (#164722)

The terminology confuses me here.
You mentioned one "box" is one PPU cycle but as mentioned elsewhere "One CPU cycle = Three PPU cycles", it should be like 24 PPU Cycles then to be correct?
The diagram groups the PPU cycles and calls them "VRAM fetch".
Is "One CPU cycle = 3 VRAM fetch" (group of 8 PPU cycles) more correct? However that sounds a bit strange.

EDIT: I've tried designing some primitive pseudocode on paper for a pixel renderer (yes, I like pseudocode, Disch knows what I mean), but things tend to get very compact and very difficult to grasp. I try to break it down into very small steps but it's still not entirely trivial.
However, I must admit a pixel renderer probably solves a lot of other problems compared to a scanline version (and I don't only mean advanced raster effects), so a pixel version is probably the way to go in the end.

Re: Scanline- vs pixelengine
by Disch on 2016-02-18 (#164735)

Quote:

You mentioned one "box" is one PPU cycle but as mentioned elsewhere "One CPU cycle = Three PPU cycles", it should be like 24 PPU Cycles then to be correct?

On this diagram: http://wiki.nesdev.com/w/images/d/d1/Ntsc_timing.png

One square box = 1 PPU cycle.

On NTSC, 1 CPU cycle = 3 PPU cycles. So 1 CPU cycle would take 3 boxes worth of time. The PPU runs 3x as fast, so there are 3 PPU cycles to every 1 CPU cycle.

I don't know where you got "24 PPU Cycles" from, but no.

Quote:

The diagram groups the PPU cycles and calls them "VRAM fetch".

A VRAM fetch takes 2 PPU cycles, hence why 2 boxes are grouped together.

According to Brad Taylor's original 2C02 doc ( http://nesdev.com/2C02%20technical%20reference.TXT ), the reason it takes 2 cycles is because it updates the address lines on the 1st cycle, then actually performs the read and moves the data off the bus on the 2nd cycle:

Quote:

At the beginning of the access cycle, PPU address lines 8..13 are updated
with the target address. This data remains here until the next time an
access cycle occurs.

The lower 8-bits of the PPU address lines are multiplexed with the data bus,
to reduce the PPU's pin count. On the first clock cycle of the access,
A0..A7 are put on the PPU's data bus, and the ALE (address latch enable)
line is activated for the first half of the cycle. This loads the lower
8-bit address into an external 8-bit transparent latch strobed by ALE
(74LS373 is used).

On the second clock cycle, the /RD (or /WR) line is activated, and stays
active for the entire cycle. Appropriate data is driven onto the bus during
this time.

Quote:

Is "One CPU cycle = 3 VRAM fetch" (group of 8 PPU cycles) more correct? However that sounds a bit strange.

No.

1 CPU cycle = 3 PPU cycles
1 VRAM fetch = 2 PPU cycles

Re: Scanline- vs pixelengine
by tepples on 2016-02-18 (#164736)

Also:
1 PPU cycle = 4 master clocks (NTSC, RGB) or 5 master clocks (PAL NES, Dendy)
1 background column fetch = 8 PPU cycles (1-8, 9-16, 17-24, ...)
1 sprite pattern fetch = 8 PPU cycles (257-264, 265-272, ..., 313-320)

Jailbars, on Control Decks that suffer from them, align with the 8-dot fetch pattern.

EDIT: dot numbers might help others understand

Re: Scanline- vs pixelengine
by Disch on 2016-02-18 (#164737)

Wait what? That's even confusing me and I know how all this works =P

How do you figure 8 cycles for a "column fetch", and what even is a column fetch?

EDIT:

Oh wait I think I get what you're saying. Nevermind.

Re: Scanline- vs pixelengine
by oRBIT2002 on 2016-02-18 (#164741)

Quote:

On that graphic, cycles [1..256] are the ones that render pixels. Cycle 0 and cycles 257+ do not. So in your example if you start at cycle 0 scanline 0 and do a NOP, only 5 pixels will be rendered.

Disch, the above must be wrong then?
1 CPU Cycle = 3 PPU cycles (NOP = 6 PPU cycles)
Since pixels are usually rendered every 8th PPU cycle, there can't be 5 rendered pixels on 2 CPU cycles?

NOP (2 CPU cycles) = 6 PPU Cycles, there's a chance there's not even a single pixel rendered here?

Check my attachment:
Red checkmarks = 1 PPU cycle
Green circles = 1 CPU cycle
Am I getting it right this time?

Re: Scanline- vs pixelengine
by Disch on 2016-02-18 (#164742)

Quote:

Disch, the above must be wrong then?

Nope... what I said before looks right.

Quote:

1 CPU Cycle = 3 PPU cycles (NOP = 6 PPU cycles)

NOP = 6 PPU cycles. That is correct. But there are 341 PPU cycles per scanline, and only 256 of those cycles render pixels. So some cycles will not render pixels.

If you start at cycle 0 and run for 6 cycles, you will only render 5 pixels, because cycle 0 does not render a pixel, but cycles 1-5 do.

Quote:

Since pixels are usually rendered every 8th PPU cycle, there can't be 5 rendered pixels on 2 CPU cycles?

I don't know where you got that 8th PPU cycle from, but that's not the case.

Each cycle in [1..256] renders 1 pixel (in addition to doing some other stuff). 1 pixel = 1 cycle.

Quote:

NOP (2 CPU cycles) = 6 PPU Cycles, there's a chance there's not even a single pixel rendered here?

Yes, but not because of the 8th cycle thing.

Cycles 257-340 do not render any pixels.

EDIT:

Just saw your attachment. We must be online at the same time.

Yes @ checkmarks & circles.

But again note that this 3:1 ratio will only work for NTSC. If you are planning to add PAL support, it might help to think of the ratio as 15:5 instead because the PAL ratio is 16:5.

Re: Scanline- vs pixelengine
by oRBIT2002 on 2016-02-18 (#164743)

Quote:

If you start at cycle 0 and run for 6 cycles, you will only render 5 pixels, because cycle 0 does not render a pixel, but cycles 1-5 do.

Aha! I thought only every 8th PPU cycle (the red ones) rendered a pixel, which is why I was getting it all wrong.
So all PPU cycles render a pixel (when in "camera-range" of course, and not the skip/idle-ones)..

And yes, I'm aware of PAL differences, but I'll focus on NTSC for now..

Re: Scanline- vs pixelengine
by Disch on 2016-02-19 (#164744)

Quote:

So all PPU cycles render a pixel (when in "camera-range" of course, and not the skip/idle-ones)..

"in camera range", yes.

scanlines [0..239] and cycles [1..256]. As long as both of those are true, the PPU is outputting a pixel.
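
A minimal sketch in C of that rule, plus the NOP example from earlier in the thread. The function names are hypothetical, and the check ignores whether rendering is enabled in $2001.

```c
/* Nonzero when the PPU outputs a pixel on this scanline and cycle. */
int outputs_pixel(int scanline, int cycle)
{
    return scanline >= 0 && scanline <= 239 && cycle >= 1 && cycle <= 256;
}

/* Count the pixels output during `n` PPU cycles starting at `start_cycle`,
   without crossing into the next scanline (cycles run 0..340). */
int pixels_in_run(int scanline, int start_cycle, int n)
{
    int count = 0;
    for (int c = start_cycle; c < start_cycle + n && c <= 340; c++)
        count += outputs_pixel(scanline, c);
    return count;
}
```

`pixels_in_run(0, 0, 6)` covers cycles 0 to 5 and counts 5 pixels, matching the NOP example. The same 6 cycles starting at cycle 257 count none.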

Re: Scanline- vs pixelengine
by oRBIT2002 on 2016-02-19 (#164745)

So, which pixels are actually drawn every PPU cycle? I notice that 2 tiles are fetched on line 261 to be the first displayed on top of screen. But PPU Cycles 1,2,3.. they fetch data that are displayed "a little bit later then"? There's some kind of 2 tile buffering going on here, kind of?

EDIT:
Disch, I can hardly ask you to help me out any more here (I'm still extremely grateful for the help with audio pseudocode!), but if you, or anyone else here with the skills, could help out with at least the basics of some pixel-renderer pseudocode, a push in the right direction, it would be awesome.
I've tried it myself on paper a few times but I tend to make it (probably) a lot more complex than it needs to be, so..

Re: Scanline- vs pixelengine
by tepples on 2016-02-19 (#164757)

The two background fetches at the end of horizontal blanking (321-328 and 329-336) are buffered to set up the pixel pipeline. The fetched background pattern bytes are first fed through a pair of 8-bit parallel in serial out (PISO) shift registers, which decode the pattern bytes into 2-bit pixel values. Then they and the attribute value are fed through a set of four 8-tap delay lines, which implement fine horizontal scrolling (bits 2-0 of first $2005 write). Do you need an illustration in order to understand this phase?

Re: Scanline- vs pixelengine
by oRBIT2002 on 2016-02-19 (#164783)

Does this mean there's a slight delay here.. PPU Clocks 1,2,3 etc. are actually displayed 16 pixels later?

Re: Scanline- vs pixelengine
by Disch on 2016-02-19 (#164785)

The delay comes between when the CHR for the tiles is fetched, and when those pixels are actually output.

The tiles are loaded between ~9 and ~16 cycles before they're actually output

EDIT:

To clarify... assuming the fineX scroll is 0, the CHR being fetched on PPU cycles 5-8 will not be displayed until PPU cycle 17. fineX scrolling will make it display sooner than that due to the way that works.

The pixels being rendered on cycles 1-16 are the tile data fetched at the very end of the PREVIOUS scanline. This is part of the reason why there has to be a pre-render scanline before the actual displayed scanlines.

Re: Scanline- vs pixelengine
by oRBIT2002 on 2016-02-19 (#164787)

This is really interesting/fun stuff but I'm expecting to have a brain meltdown anytime now.

I think I understand the basics more or less (probably a little more "less") but the challenge here is to get it down in code. Where do you start?

There are many things to think about but still it doesn't seem that it will require that much code in the end.
Perhaps the start is to have a small pixelbuffer (16-pixels or so) that you keep on filling/emptying... (?)

Re: Scanline- vs pixelengine
by Disch on 2016-02-19 (#164788)

I would say start with the bare minimum. For me, that would be drawing BG only with no attributes. If you can wrap your head around the moving parts, it's not that hard:

The PPU typically has a 4-fetch pattern that spans 8 cycles total (each fetch is 2 cycles). Those fetches are:

A) Nametable byte (cycles 1,2)
B) Attribute byte (3,4)
C) CHR low byte (5,6)
D) CHR high byte (7,8)

- Nametable fetch reads a byte from the current scroll (current PPU address). That NT byte gets stored somewhere temporarily
- Attribute fetch reads a byte -- but for now let's ignore it
- CHR low byte reads a byte based on the previously fetched NT byte. That byte gets moved into the "low shifter"
- CHR high byte reads the other CHR byte, moves it into a "high shifter"

So that's the fetch pattern. Then the rendering pattern is just pulling values from the shifters and outputting them.

When fineX = 0, you use the high bit of the shifter.
When fineX = 1, you use next to high bit
When fineX = 2, you use next to next to high bit
etc

Once you pull a value out of the shifter (no matter where you pulled it from), you shift it left.

1) So you pull out the appropriate bits from the low and high shifters
2) combine them to get your 2bpp color
3) which you would then combine with attributes (but don't worry about that for now)
4) which you would then prioritize with the sprite pixel (but don't worry about that for now)
5) Final value = the pixels you output
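
A minimal sketch in C of steps 1 and 2 above, BG only with no attributes and no sprites. It assumes 16-bit shifters so that the tile being drawn sits in the high byte while the next tile waits in the low byte. That width and the names (`BgShifters`, `bg_load`, `bg_pixel`) are assumptions for illustration, not something stated in the thread.

```c
#include <stdint.h>

typedef struct {
    uint16_t lo, hi;  /* high byte: tile being drawn, low byte: next tile */
    uint8_t  fine_x;  /* 0..7, from the first $2005 write */
} BgShifters;

/* After the CHR high byte fetch: move the fetched row into the low
   8 bits of each shifter. */
void bg_load(BgShifters *s, uint8_t chr_lo, uint8_t chr_hi)
{
    s->lo = (uint16_t)((s->lo & 0xFF00) | chr_lo);
    s->hi = (uint16_t)((s->hi & 0xFF00) | chr_hi);
}

/* Once per rendered cycle: pull the bit selected by fineX out of each
   shifter, combine them into a 2bpp value, then shift both left. */
uint8_t bg_pixel(BgShifters *s)
{
    int bit = 15 - s->fine_x;  /* fineX 0 uses the high bit */
    uint8_t px = (uint8_t)(((s->lo >> bit) & 1) | (((s->hi >> bit) & 1) << 1));
    s->lo = (uint16_t)(s->lo << 1);
    s->hi = (uint16_t)(s->hi << 1);
    return px;
}
```

The caller is expected to call `bg_load` once every 8 cycles and to shift 8 times between the two loads at the end of the previous scanline, so that the first tile is already in the high byte when cycle 1 starts.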

Re: Scanline- vs pixelengine
by oRBIT2002 on 2016-02-19 (#164790)

Hmm, but even in the minimum state I need to deal with "FineY" scroll and with fetching data from another name table (assuming xscroll>0)?

Re: Scanline- vs pixelengine
by Disch on 2016-02-19 (#164792)

If you handle $2000/$2005/$2006 writes properly, and do the appropriate changes to the PPU addr to adjust the scroll during the frame... you need not concern yourself with the scroll. At least not for the NT fetch. The low 12 bits of the PPU address IS the scroll.

This handles coarse scroll:

Code:

NTbyte = ppuread( 0x2000 | (ppu_addr & 0x0FFF) );

Fine-Y is stored in bits 12-14 of the PPU address... and that is only used when fetching CHR:

Code:

// bgpage = 0x0000 or 0x1000 depending on the state of $2000.4
finey = (ppu_addr >> 12) & 7;

CHR_low = ppuread( bgpage | (NTbyte << 4) | finey );
CHR_high = ppuread( bgpage | (NTbyte << 4) | finey | 8 );

And Fine-X scroll determines which bit of the shifter to pull your pixel from.

Re: Scanline- vs pixelengine
by oRBIT2002 on 2016-02-20 (#164810)

What happens if x-scroll is something like 250, for example? What happens with the vram_address then? If I just keep incrementing it, it won't work properly. Will the vram_address magically "jump" to the next nametable? What will then happen at the next scanline?
Hope you get what I mean..

Re: Scanline- vs pixelengine
by tokumaru on 2016-02-20 (#164811)

oRBIT2002 wrote:

Will the vram_address magically "jump" to the next nametable?

Yes, the name table bit will toggle.

Quote:

What will then happen at the next scanline?

At the end of the visible scanline, the X part of the scroll is copied from the temp VRAM register, so the next scanline will start at the same X coordinate as the previous one.

If I'm not mistaken, this information is written in the diagram that was posted in this thread.

EDIT: Yes, the "hori(v) = hori(t)" box indicates when the horizontal bits of the scroll are copied from T to V.
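
A minimal sketch in C of the two operations discussed here: stepping the VRAM address one tile to the right (with the name table bit toggling on wrap) and the "hori(v) = hori(t)" copy. The function names are hypothetical. The bit masks assume coarse X in bits 0-4 and the horizontal name table select in bit 10 of the address.

```c
#include <stdint.h>

/* Step v one tile to the right. When coarse X wraps from 31 to 0,
   the horizontal name table bit toggles. */
uint16_t inc_coarse_x(uint16_t v)
{
    if ((v & 0x001F) == 31) {
        v &= (uint16_t)~0x001F;  /* coarse X back to 0 */
        v ^= 0x0400;             /* switch horizontal name table */
    } else {
        v++;
    }
    return v;
}

/* hori(v) = hori(t): copy coarse X and the horizontal name table bit
   from t into v, leaving the vertical bits of v alone. */
uint16_t copy_hori(uint16_t v, uint16_t t)
{
    return (uint16_t)((v & ~0x041F) | (t & 0x041F));
}
```

With an X scroll of 250, coarse X starts at 31, so the very first increment wraps into the other name table. The copy at the end of the line then puts coarse X back to 31 for the next scanline.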

Re: Scanline- vs pixelengine
by Disch on 2016-02-20 (#164819)

oRBIT: You might want to read some docs on how scrolling actually works on the NES:

http://wiki.nesdev.com/w/index.php/PPU_scrolling

EDIT:

The thing to note is that the NES doesn't really have an X scroll or a Y scroll... or even a nametable select. It just has a PPU address, and a "temporary" PPU address: Often referred to as "loopy v" and "loopy t" respectively (since they were identified/coined in loopy's scroll doc) -- or just "v" and "t".

If you look at that wiki page... notice how $2005 and $2006 change the exact same var ("t"), just in different ways.
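
A minimal sketch in C of how $2005 and $2006 writes both land in "t", following the bit layout documented on that wiki page. The struct and function names are hypothetical, and the $2000 write that sets the name table bits of "t" is left out.

```c
#include <stdint.h>

typedef struct {
    uint16_t v;  /* current PPU address ("loopy v") */
    uint16_t t;  /* temporary PPU address ("loopy t") */
    uint8_t  x;  /* fine X scroll, 0..7 */
    uint8_t  w;  /* write toggle shared by $2005 and $2006 */
} Loopy;

void write_2005(Loopy *r, uint8_t d)
{
    if (!r->w) {  /* first write: coarse X into t, fine X into x */
        r->t = (uint16_t)((r->t & ~0x001F) | (d >> 3));
        r->x = d & 7;
    } else {      /* second write: coarse Y and fine Y into t */
        r->t = (uint16_t)((r->t & ~0x73E0) | ((d & 0x07) << 12)
                          | ((d & 0xF8) << 2));
    }
    r->w ^= 1;
}

void write_2006(Loopy *r, uint8_t d)
{
    if (!r->w) {  /* first write: bits 8-13 of t, bit 14 cleared */
        r->t = (uint16_t)((r->t & 0x00FF) | ((d & 0x3F) << 8));
    } else {      /* second write: low byte of t, then v = t */
        r->t = (uint16_t)((r->t & 0xFF00) | d);
        r->v = r->t;
    }
    r->w ^= 1;
}
```

Note that only the second $2006 write touches `v`. A $2005 write changes `t` and fine X, and the change reaches `v` later through the copies performed during rendering.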

Re: Scanline- vs pixelengine
by oRBIT2002 on 2016-02-20 (#164835)

I think I've got it a little bit wrong again.. According to the PPU diagram it fetches tiles all the time (and not really pixel by pixel as I thought, which I discovered when doing the math...). Correct me if I am wrong.

You learn and discover something new everyday obviously.

Re: Scanline- vs pixelengine
by Disch on 2016-02-20 (#164839)

The PPU performs 4 fetches for a single tile:

- Nametable byte
- Attribute byte
- CHR low
- CHR high

Each of these fetches takes 2 cycles... resulting in 8 cycles total for an 8x1 pixel image.
That 4-fetch pattern is repeated several times throughout the scanline. You can think of this fetch process as "loading" the image, but not actually drawing it.

In addition to fetching, it's also outputting pixels. 1 pixel per cycle means that it'll take 8 cycles to draw an 8x1 image.

So it takes 8 cycles to LOAD an 8x1 image.
And it takes 8 cycles to DRAW an 8x1 image.

So graphics are loaded and drawn at more or less the same speed... just with a slight delay between when they're loaded and when they're actually drawn.

Re: Scanline- vs pixelengine
by tokumaru on 2016-02-20 (#164845)

Disch wrote:

So graphics are loaded and drawn at more or less the same speed...

Not only at the same speed, but also at the same TIME. As it fetches data for future pixels, it also draws pixels using data that was fetched before.

Re: Scanline- vs pixelengine
by oRBIT2002 on 2016-02-21 (#164870)

The pieces are finally starting to fit together(in my head)..

Re: Scanline- vs pixelengine
by Zepper on 2016-02-21 (#164875)

oRBIT2002 wrote:

The pieces are finally starting to fit together(in my head)..

Good. Better still, the wiki should be updated to make this easier to understand.

Re: Scanline- vs pixelengine
by oRBIT2002 on 2016-02-21 (#164898)

Here's tonight's first result: the "Mario Bros" title screen with a pixel renderer ("tile renderer" would be more appropriate, but still..).
It's buggy of course, but I'm still pretty proud I got this far.

Now let's see if we can fix some bugs...

Re: Scanline- vs pixelengine
by tepples on 2016-02-21 (#164900)

The first step toward fixing it is logging which addresses the PPU is reading and seeing whether the log makes sense. For example, the first row of a title screen nametable is likely to be blank, and the first 20 accesses might look like this:

Code:

     nt   attr tiledata
321: 2000 23C0 0240 0248  (starting in pre-render pre-roll)
329: 2001 23C0 0240 0248
  1: 2002 23C0 0240 0248  (first fetch of new line, for third column)
  9: 2003 23C0 0240 0248
 17: 2004 23C1 0240 0248

What do you get?
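
To compare against a log like this, the expected addresses can be derived from the VRAM address. A minimal sketch in C, assuming the usual address layout (bits 0-11 select the name table entry, bits 12-14 hold fine Y). The function names are hypothetical, and `bgpage` and `tile` are passed in by the caller rather than read from PPU memory.

```c
#include <stdint.h>
#include <stdio.h>

/* Name table fetch address for VRAM address v. */
uint16_t nt_addr(uint16_t v)
{
    return (uint16_t)(0x2000 | (v & 0x0FFF));
}

/* Attribute fetch address: one byte covers a 4x4 tile area. */
uint16_t attr_addr(uint16_t v)
{
    return (uint16_t)(0x23C0 | (v & 0x0C00) | ((v >> 4) & 0x38)
                      | ((v >> 2) & 0x07));
}

/* Pattern fetch address; bgpage is 0x0000 or 0x1000. */
uint16_t chr_addr(uint16_t bgpage, uint8_t tile, uint16_t v, int high_plane)
{
    return (uint16_t)(bgpage | (tile << 4) | (high_plane ? 8 : 0)
                      | ((v >> 12) & 7));
}

/* Print one row in the same column layout as the log above. */
void log_fetch(int dot, uint16_t v, uint16_t bgpage, uint8_t tile)
{
    printf("%3d: %04X %04X %04X %04X\n", dot,
           (unsigned)nt_addr(v), (unsigned)attr_addr(v),
           (unsigned)chr_addr(bgpage, tile, v, 0),
           (unsigned)chr_addr(bgpage, tile, v, 1));
}
```

Calling `log_fetch` right before each group of four fetches in your own PPU gives rows that can be checked by eye against what the real fetch code reads.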

Re: Scanline- vs pixelengine
by oRBIT2002 on 2016-02-22 (#164960)

My vram-address is way off, so something is going on here...

Re: Scanline- vs pixelengine
by Disch on 2016-02-22 (#164971)

Tracelogs tracelogs tracelogs.

If your scroll is off, make trace dumps of all parts of your code that's touching the scroll so you can see how it's changing and see where the problem is.

Emu dev is all about analyzing trace logs. You've never seen so many tracelogs

Re: Scanline- vs pixelengine
by oRBIT2002 on 2016-02-22 (#164974)

Current plan is to write a simple test rom that outputs a static nametable and create some logs.

My vram address really likes to go above $4000 for some reason so I'll see what I can find. A little exciting actually.

Re: Scanline- vs pixelengine
by tepples on 2016-02-22 (#164975)

VRAM address in $4000-$7FFF means the PPU is rendering the bottom half of a row of tiles. During nametable fetches, bits 14-12 are ignored, and during pattern table fetches, they're moved down to bits 2-0.

Re: Scanline- vs pixelengine
by tokumaru on 2016-02-22 (#164976)

During rendering, the VRAM address assumes the following structure:

Code:

0yyyNNYY YYYXXXXX

yyy - fine vertical scroll;
NN - name table;
YYYYY - coarse vertical scroll;
XXXXX - coarse horizontal scroll;

Bits 0 to 11 are used in that exact same order to read from the nametables (using a base address of $2000), but bits 12 to 14 are used to select the correct row of pixels from the pattern tables. When reading patterns, an address with this structure is used:

Code:

000BIIII IIIIPyyy

B - base address ($0000 or $1000, as defined by bit 4 of PPUCTRL);
IIIIIIII - tile index read from the name table;
P - low plane or high plane;
yyy - pattern row;
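
A minimal sketch in C that splits a VRAM address into the fields of the first layout and builds a pattern address following the second one. The struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    unsigned fine_y;     /* yyy   */
    unsigned nametable;  /* NN    */
    unsigned coarse_y;   /* YYYYY */
    unsigned coarse_x;   /* XXXXX */
} VFields;

/* Split a VRAM address laid out as 0yyyNNYY YYYXXXXX. */
VFields split_v(uint16_t v)
{
    VFields f;
    f.fine_y    = (v >> 12) & 0x07;
    f.nametable = (v >> 10) & 0x03;
    f.coarse_y  = (v >> 5)  & 0x1F;
    f.coarse_x  =  v        & 0x1F;
    return f;
}

/* Build a pattern address laid out as 000BIIII IIIIPyyy. */
uint16_t pattern_addr(unsigned base_bit, uint8_t index,
                      unsigned plane, unsigned row)
{
    return (uint16_t)(((base_bit & 1u) << 12) | ((unsigned)index << 4)
                      | ((plane & 1u) << 3) | (row & 7u));
}
```

An address of $4000 or above just means `fine_y` is 4 or more, which is why it shows up while rendering the bottom half of a row of tiles.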

Re: Scanline- vs pixelengine
by oRBIT2002 on 2016-02-22 (#164985)

Progress report.

Re: Scanline- vs pixelengine
by oRBIT2002 on 2016-02-23 (#165020)

Fixed a few more issues and probably introduced a few more bugs.. Current status is that the screen tends to "fall down" (like you're flipping pages in a book or so). My gut feeling is that it hopefully won't be that hard to fix..
If I only had more spare time!

EDIT:
Getting a lot of strange midscreen $2005 writes, which probably explains why the screen is moving even when it shouldn't.. To be continued..

Re: Scanline- vs pixelengine
by Dwedit on 2016-02-23 (#165054)

Midscreen $2005 writes only affect fine scroll and X scroll for the next line.

Re: Scanline- vs pixelengine
by Disch on 2016-02-23 (#165063)

Yeah, remember that $2005 only changes fine-X and "ppu temp". It does not ever actually touch the real ppu address.

Re: Scanline- vs pixelengine
by Zepper on 2016-02-23 (#165071)

boring.

let's toast when finished :shock:

Re: Scanline- vs pixelengine
by oRBIT2002 on 2016-02-24 (#165091)

Obviously got issues with $2006/$2007.. Here's the old NESStress-ROM that has problems with positioning "OK"..

Re: Scanline- vs pixelengine
by beannaich on 2016-03-07 (#165909)

Not sure if this will help, or if you're interested, but I recently started writing a half-cycle accurate stand-alone PPU emulator. The goal was for it to be used as a reference implementation for other emulators. If you're anything like me, seeing the code can be more valuable than any number of wiki pages.

Re: Scanline- vs pixelengine
by Aliasmk on 2016-03-21 (#166589)

Sorry to be that guy but what is the difference between Scanline and PixelEngine rendering in this context? Thanks.

Re: Scanline- vs pixelengine
by tokumaru on 2016-03-21 (#166592)

A scanline engine draws entire scanlines at a time, using the same set of PPU parameters (scroll, pattern banks, emphasis bits, etc.) all the way through. This is not how the real console works though... Since the PPU and the CPU run in parallel, most PPU parameters can be changed at any time, so emulating the PPU with finer granularity (pixel by pixel) while simulating the individual memory fetches the actual PPU does, at the correct times, results in more accuracy. Most games don't need this kind of accuracy, but some special effects do, and developers also need it to make sure they're respecting the constraints of the real hardware.