Simple questions

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
Simple questions
by on (#51363)
Hello, I am new here, although I have been looking at this board for a while now. I have been programming for a number of years and I consider myself to be OK at it. But I have decided to create an NES emulator to learn about hardware, which I am a lot less knowledgeable about.

I just have some simple questions about the 2A03, I am the first to admit I know little about CPUs so please don't facepalm yourself at some of my questions.

-From what I gather the program counter is the only thing that should be treated as 16 bit in the nes, am I correct?

-It seems the accumulator should be treated as a signed byte, and it looks like x and y are unsigned, so how do I handle commands like TAY? should I simply copy it bit for bit?

-is the portion of the cpu that code can be run from 0x8000 to 0xFFFF?(just making sure)

- how do you tell where the code on the rom starts (It is probably documented somewhere, but I could not see it)

- when is the status register changed (outside of opcodes that directly tweak its flags such as CLC)

I am sorry about my n00bness, but I am really trying to learn how to do this so thank you to anyone who can clear up any of my questions for me
Re: Simple questions
by on (#51364)
someone_somewhere wrote:
-From what I gather the program counter is the only thing that should be treated as 16 bit in the nes, am I correct?

That and the temporary address in indexed and indirect-indexed addressing modes.

Quote:
-It seems the accumulator should be treated as a signed byte, and it looks like x and y are unsigned, so how do I handle commands like TAY? should I simply copy it bit for bit?

Yes. The registers are actually unsigned; the N flag (used by BMI/BPL) is just a bit 7 flag.
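As a rough illustration of "copy it bit for bit", here is a minimal sketch of TAY for an emulator. The function name `tay` and the idea of passing the status byte in and out are assumptions for illustration; only the flag positions (Z is bit 1, N is bit 7 of the status register) come from the 6502 itself.

```python
FLAG_Z = 0x02  # zero flag, bit 1 of the status register
FLAG_N = 0x80  # negative flag, bit 7 of the status register

def tay(a, p):
    """Hypothetical helper: return (y, p) after TAY.

    Y receives A unchanged. N becomes a copy of bit 7 of the value
    and Z is set only when the value is zero.
    """
    y = a & 0xFF
    p &= ~(FLAG_Z | FLAG_N) & 0xFF   # clear old Z and N
    if y == 0:
        p |= FLAG_Z
    p |= y & FLAG_N                  # N is just bit 7 of the result
    return y, p

y, p = tay(0xFB, 0x00)
print(hex(y), bool(p & FLAG_N))
```

The value is never reinterpreted as signed; the only "sign" handling is the copy of bit 7 into N.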

Quote:
-is the portion of the cpu that code can be run from 0x8000 to 0xFFFF?(just making sure)

Entire address space. Plenty of games copy code to RAM and run it, especially games whose mappers use 32 KiB, such as AOROM, BNROM, GNROM, Color Dreams, and multicarts.

Quote:
- how do you tell where the code on the rom starts (It is probably documented somewhere, but I could not see it)

The first thing the CPU does when reset is JMP ($FFFC), which loads a little-endian address from $FFFC-$FFFD into the program counter. But like any read from cartridge address space ($4018-$FFFF), the actual location in the ROM is subject to bankswitching.
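A minimal sketch of that reset step, assuming the emulator exposes some `read(addr)` callable that already applies any bankswitching (the callable and the dictionary used as memory here are made up for the example):

```python
def read_reset_vector(read):
    """Return the initial program counter from the reset vector.

    `read` is a hypothetical function mapping a 16-bit CPU address
    to the byte visible there.
    """
    lo = read(0xFFFC)        # low byte comes first (little-endian)
    hi = read(0xFFFD)
    return (hi << 8) | lo

# Toy memory holding only the two vector bytes.
memory = {0xFFFC: 0x00, 0xFFFD: 0x80}
pc = read_reset_vector(lambda addr: memory[addr])
print(hex(pc))
```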

Quote:
- when is the status register changed (outside of opcodes that directly tweak its flags such as CLC)

Any good 6502 reference states which instructions modify N, V, Z, and C flags. Other "direct tweak" instructions are PHP, PLP, and to a lesser extent BIT.

by on (#51365)
First of all, welcome! I'll try and answer your questions as best as I can.

Quote:
From what I gather the program counter is the only thing that should be treated as 16 bit in the nes, am I correct?


No, the program counter is not the only thing you should treat as 16 bit. The 6502 allows you to address a 16-bit address not only when loading values (LDA, LDY, LDX, etc.), but also when storing values, jumping, and other things. If you look at a 6502 reference, you'll see that there are many, many opcodes which use an absolute, 16-bit value as the argument.

Quote:
It seems the accumulator should be treated as a signed byte, and it looks like x and y are unsigned, so how do I handle commands like TAY? should I simply copy it bit for bit?


With 6502 code, you can treat numbers as signed, but they aren't necessarily signed. They may affect some flags like signed numbers do, but the accumulator is not always signed. I'm not quite sure what you mean by "treat" it as signed or unsigned, though. If you look at a 6502 reference, it will tell you what math operations affect which status flags, which will probably answer your questions a little more.

Quote:
-is the portion of the cpu that code can be run from 0x8000 to 0xFFFF?(just making sure)

The program counter can exist anywhere from $0000 to $FFFF. Though on the NES, ROM is generally placed in the memory range $8000 to $FFFF. You can run code from RAM, which is found at $0000-$07FF or $6000-$7FFF, if you have 8 kilobytes of additional SRAM.

Quote:
- how do you tell where the code on the rom starts (It is probably documented somewhere, but I could not see it)


The Reset vector at $FFFC-$FFFD of the "fixed" bank of ROM (which bank is fixed depends on the mapper; often it is the last bank of ROM) holds the 16-bit address that the CPU goes to when the Reset button is pressed (or the power button is switched on). For NROM, which you should focus on learning before moving on to more complex mappers (NROM means No Mapper), there are at most two 16-kilobyte banks of program data and no bankswitching, so the reset vector will be at $FFFC at all times.
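For the NROM case, a minimal sketch of mapping a CPU address to an offset in the program ROM might look like this. It assumes `prg` holds only the PRG data (16 KiB or 32 KiB, with any file header already stripped) and that a 16 KiB image appears twice in $8000-$FFFF; the function name is made up for the example.

```python
def nrom_read(prg, addr):
    """Hypothetical NROM read: return the PRG byte visible at `addr`.

    With 32 KiB of PRG the mapping is one to one. With 16 KiB the
    modulo makes $C000-$FFFF show the same bytes as $8000-$BFFF.
    """
    if not 0x8000 <= addr <= 0xFFFF:
        raise ValueError("address is outside the PRG ROM window")
    return prg[(addr - 0x8000) % len(prg)]

# Toy 16 KiB image whose last four bytes before the IRQ vector hold
# the reset vector $1234.
prg16 = bytearray(0x4000)
prg16[0x3FFC] = 0x34
prg16[0x3FFD] = 0x12
print(hex(nrom_read(prg16, 0xFFFC)), hex(nrom_read(prg16, 0xFFFD)))
```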

Quote:
when is the status register changed (outside of opcodes that directly tweak its flags such as CLC)


The status register is changed A LOT, so you should really consult a 6502 reference, because a good one will tell you all of this.

Sorry, I have to get back to class now. I'd be happy to answer any more of your questions though.

by on (#51368)
Thank you both very much, that was helpful.

so, if the registers are unsigned like tepples said, then which of these are true?

I have two numbers, 1 and 6, and I subtract 6 from 1, which would be true:

00000001 - 00000110 would be stored as 11111011

00000001 - 00000110 would be stored as 11111011 and the negative flag in the status register would be set

00000001 - 00000110 would be stored as 00000101 and the negative flag in the status register would be set

or did I miss what you are trying to tell me?

by on (#51369)
All load or arithmetic instructions affect the Zero and Negative flags.

Think of the negative flag as being a copy of the most significant bit (leftmost bit).
The Zero flag is set any time the result is zero.
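Applied to the 1 minus 6 example, a simplified sketch of the subtraction looks like this. It is an illustration only: it ignores the V flag, assumes binary (not decimal) arithmetic, and the function name and the dictionary of flags are made up for the example.

```python
def sbc(a, m, carry):
    """Simplified SBC sketch: compute A - M - (1 - carry).

    Returns the 8-bit result and the resulting N, Z and C flags.
    On the 6502 the carry flag is set before a plain subtraction
    and ends up clear when a borrow was needed.
    """
    diff = a - m - (1 - carry)
    result = diff & 0xFF                 # keep only the low 8 bits
    flags = {
        "N": bool(result & 0x80),        # copy of bit 7
        "Z": result == 0,
        "C": diff >= 0,                  # clear means a borrow happened
    }
    return result, flags

result, flags = sbc(0b00000001, 0b00000110, carry=1)
print(format(result, "08b"), flags)
```

So the stored byte is 11111011 and N is set, which matches the second of the three options asked about.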

by on (#51370)
That makes sense, thanks
Re: Simple questions
by on (#51375)
someone_somewhere wrote:
-From what I gather the program counter is the only thing that should be treated as 16 bit in the nes, am I correct?


The PC counter only has an 8bit adder. So instructions that fall between two pages have an extra cycle added to the opcode. Same with indexing (except for [zp,x] where the MSB isn't affected IIRC). When the cpu calculates the effective address of base plus index, and the index+lsb of the address causes an overflow, an extra cycle is taken to add the carry to the MSB of the base address.
Re: Simple questions
by on (#51382)
tomaitheous wrote:
The PC counter only has an 8bit adder. So instructions that fall between two pages have an extra cycle added to the opcode.

All instructions? I thought it was just branches whose not-taken instruction and taken instruction had their first byte on different pages.
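The branch rule described here can be sketched as follows, using the commonly documented NMOS 6502 timing (2 cycles not taken, 3 taken, 4 taken across a page). The function name and argument layout are assumptions for the example.

```python
def branch_cycles(pc, offset, taken):
    """Cycle count for a relative branch whose opcode sits at `pc`.

    `offset` is the signed displacement (-128..127). The page
    comparison is between the instruction after the branch (the
    not-taken path) and the branch target (the taken path).
    """
    next_pc = (pc + 2) & 0xFFFF          # branches are 2 bytes long
    if not taken:
        return 2
    target = (next_pc + offset) & 0xFFFF
    crossed = (target & 0xFF00) != (next_pc & 0xFF00)
    return 4 if crossed else 3

print(branch_cycles(0x80F0, 0x05, taken=True))   # same page
print(branch_cycles(0x80F0, 0x20, taken=True))   # target in next page
```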
Re: Simple questions
by on (#51383)
tepples wrote:
tomaitheous wrote:
The PC counter only has an 8bit adder. So instructions that fall between two pages have an extra cycle added to the opcode.

All instructions? I thought it was just branches whose not-taken instruction and taken instruction had their first byte on different pages.


Tepples is correct.

Every 6502, 65c02, and 65816 book I have (and that's MANY, for the record) has no mention of an extra cycle being applied to opcodes+operands which are "split" across a page boundary (e.g. for 80FE: LDA $1234, you'd have 80FE=$AD, 80FF=$34, 8100=$12).

I believe the cycle penalty only applies to branches and loads which cross a page boundary.
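For indexed loads, the penalty depends only on whether adding the index to the low byte of the base address carries into the high byte. A minimal sketch for an absolute,X read (names are made up for the example; stores and read-modify-write instructions have different timing and are not modelled):

```python
def absolute_x(base, x):
    """Return (effective_address, extra_cycles) for an absolute,X read."""
    addr = (base + x) & 0xFFFF
    # A carry out of the low-byte addition means the high byte needs fixing.
    crossed = (base & 0xFF) + x > 0xFF
    return addr, 1 if crossed else 0

print(absolute_x(0x12F0, 0x20))   # low byte overflows into the next page
print(absolute_x(0x1200, 0xFF))   # stays inside page $12
```

Note that where the instruction's own bytes sit in memory plays no part in this check.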

It also seems that as of late people have been wanting an accurate opcode chart, including with all the "catches" (cycle penalties, etc.). I might spend some time typing the 6502 chart in this weekend if enough people are interested.
Re: Simple questions
by on (#51404)
koitsu wrote:
It also seems that as of late people have been wanting an accurate opcode chart, including with all the "catches" (cycle penalties, etc.). I might spend some time typing the 6502 chart in this weekend if enough people are interested.

If your typing is free of transcription errors it would be greatly appreciated. Ascending by opcode [for easy comparisons for those of us using a function pointer table] would be most useful for me. 8)
Re: Simple questions
by on (#51432)
tepples wrote:
tomaitheous wrote:
The PC counter only has an 8bit adder. So instructions that fall between two pages have an extra cycle added to the opcode.

All instructions? I thought it was just branches whose not-taken instruction and taken instruction had their first byte on different pages.


Why would it *only* apply to conditional branch instructions? I assume you're not talking about the normal +1 cycle if the branch is taken, but an additional +1 cycle if the taken branch crosses a page boundary.

Quote:
Every 6502, 65c02, and 65816 book I have (and that's MANY, for the record) has no mention of an extra cycle being applied to opcodes+operands which are "split" across a page boundary (e.g. for 80FE: LDA $1234, you'd have 80FE=$AD, 80FF=$34, 8100=$12).


I heard it from more than one source (conversations, cycle testing/emu accuracy related). I thought this was common knowledge. The PCE's related 65x was tested for all these conditions. Has no page boundary penalties. But then again, the PCE has +1 cycle for every instruction that's 2 bytes or longer - hinting to the fact that the processor is updating the MSB of the PC on every opcode *or* address calculation (sans [zp,x]). Bxx's are 2/4 cycles, not 2/3. Etc. Books aren't always 100%. Hell, quite a few patents are incorrect, as are manufacturers' manuals (this I've seen first hand).

Curious, how would you validate absolute addressing with indexing's extra cycle if the indexing regs value puts it into another page boundary (LSB addition overflows) - but *not* for when an instruction crosses a page boundary?

Anyway, only one way to find out for sure :)

by on (#51433)
Technically the PC doesn't have an adder, just an incrementer ;)
Re: Simple questions
by on (#51445)
tomaitheous wrote:
I heard it from more than one source (conversations, cycle testing/emu accuracy related). I thought this was common knowledge. The PCE's related 65x was tested for all these conditions. Has no page boundary penalties. But then again, the PCE has +1 cycle for every instruction that's 2 bytes or longer - hinting to the fact that the processor is updating the MSB of the PC on every opcode *or* address calculation (sans [zp,x]). Bxx's are 2/4 cycles, not 2/3. Etc. Books aren't always 100%. Hell, quite a few patents are incorrect, as are manufacturers' manuals (this I've seen first hand).


No offence intended -- but comparing the PCE's incredibly custom HuC6280A to that of an NMOS 6502 is pretty ballsy, especially since the PCE's CPU is CMOS to boot. Hudson's processor is massively custom in many ways.

tomaitheous wrote:
Curious, how would you validate absolute addressing with indexing's extra cycle if the indexing regs value puts it into another page boundary (LSB addition overflows) - but *not* for when an instruction crosses a page boundary?


What makes you think the CPU logic for page boundary crossing is the same for instruction (opcode/operands) fetching as it is indexed opcodes? We don't know this.

I spent time this morning reading through the six different 6502 books I have, including one which covers NMOS 6502, 65C02, and 65816. All of them mention the cycle penalty for indexed operations, but none of them -- including 3 of the books which include a full copy of the engineering and hardware implementation manual -- mention this penalty for instructions themselves.

The closest thing I can find to your claim is in one of my books, and it's a long shot. In Programming The 6502 by Rodnay Zaks (ISBN 089588-046-6), page 49, section "6502 Hardware Organization", subsection "The Paging Concept", there is the following quote:

"In the case of the 6502, it is important to keep in mind the page organization of the memory. Whenever a page boundary has to be crossed, it will often introduce an extra cycle delay in the execution of the instruction."

This section comes immediately after the section describing the stack, and immediately before the section describing the address and data bus lines on the chip itself. It's hard to conclude if the placement of that section means the concept is universal across the CPU or not.

tomaitheous wrote:
Anyway, only one way to find out for sure :)


I'd offer to contact Bill Mensch via WDC, but I'm of the realisation that it wouldn't do much good.

Someone would really need to bust out an oscilloscope or some kind of logic analyser to figure it out. Bunnyboy? :-)

by on (#51447)
No logic analyzer needed. All one has to do is make some timing tests involving single instructions split across a boundary and run them on a PowerPak (or the TG16 counterpart). Make sure to use JMP in these tests for more predictability, as it doesn't lose a cycle when crossing a page boundary.

by on (#51450)
tepples wrote:
No logic analyzer needed. All one has to do is make some timing tests involving single instructions split across a boundary and run them on a PowerPak (or the TG16 counterpart). Make sure to use JMP in these tests for more predictability, as it doesn't lose a cycle when crossing a page boundary.


I don't know how this would work, or what the PowerPak provides that would allow for detection of a single cycle difference. Clue me in.

by on (#51452)
koitsu wrote:
tepples wrote:
All one has to do is make some timing tests involving single instructions split across a boundary and run them on a PowerPak (or the TG16 counterpart).

I don't know how this would work, or what the PowerPak provides that would allow for detection of a single cycle difference. Clue me in.

How do you think Blargg's CPU timing test ROMs work?

The NES PPU draws one scanline in the time it takes the CPU to execute 106.56 (PAL) or 113.67 (NTSC) cycles. So run a loop 256 times, and it'll take at least two scanlines longer if each iteration is one cycle longer. That's enough to make the loop finish before or after the sprite 0 hit.
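The arithmetic behind "at least two scanlines longer" can be checked with a few lines. The cycles-per-scanline figures are the ones quoted above; the function name is made up for the example.

```python
# CPU cycles per PPU scanline, as quoted in the post above.
CYCLES_PER_SCANLINE = {"NTSC": 113.67, "PAL": 106.56}

def extra_scanlines(iterations, extra_cycles_per_iteration, system):
    """How many scanlines later a loop ends if each pass is slower."""
    extra_cycles = iterations * extra_cycles_per_iteration
    return extra_cycles / CYCLES_PER_SCANLINE[system]

for system in ("NTSC", "PAL"):
    print(system, round(extra_scanlines(256, 1, system), 2))
```

Both results land a little above two scanlines, so one extra cycle per iteration shifts the end of a 256-iteration loop by a visible amount.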

by on (#51454)
tepples wrote:
koitsu wrote:
tepples wrote:
All one has to do is make some timing tests involving single instructions split across a boundary and run them on a PowerPak (or the TG16 counterpart).

I don't know how this would work, or what the PowerPak provides that would allow for detection of a single cycle difference. Clue me in.

How do you think Blargg's CPU timing test ROMs work?


I have no idea because I haven't used them and I have no idea how such a thing would work anyway, hence my "clue me in".

tepples wrote:
The NES PPU draws one scanline in the time it takes the CPU to execute 106.56 (PAL) or 113.67 (NTSC) cycles. So run a loop 256 times, and it'll take at least two scanlines longer if each iteration is one cycle longer. That's enough to make the loop finish before or after the sprite 0 hit.


Hmm, I kind of see where you're going with this. I understand the purpose of said 256-iteration loop is to provide "more evidence" confirming/denying the cycle penalty. But I don't understand what sprite 0 hit has to do with this.

I suppose you could say I'm questioning the accuracy of such a method.

by on (#51456)
From nestech 2.0:
koitsu wrote:
L. Sprite #0 Hit Flag
---------------------
The PPU is capable of figuring out where Sprite #0 is, and stores
it's findings in D6 of $2002. The way this works is as follows:

The PPU scans for the first actual non-transparent "sprite pixel" and
the first non-transparent "background pixel." A "background pixel" is
a tile which is in use by the Name Table. Remember that colour #0
defines transparency.

The pixel which causes D6 to be set *IS* drawn.

D6 turns off at the end of vertical blanking and on once the PPU draws these overlapping opaque pixels. So we can carefully plan the execution of a loop so that it starts with the raster in a known position and finishes before the flag turns on if the CPU does not wait the extra cycles, or after the flag turns on if the CPU waits 256 penalty cycles (one for each iteration).
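A toy model of that test plan, under several simplifying assumptions: the loop starts exactly at the top of scanline 0, the sprite 0 hit happens at the start of a chosen scanline, and NTSC timing applies. The function name and the 10 versus 11 cycle loop bodies are invented for the example and are not taken from any real test ROM.

```python
NTSC_CYCLES_PER_SCANLINE = 113.67   # figure quoted earlier in the thread

def finishes_before_hit(cycles_per_iteration, hit_scanline, iterations=256):
    """True if the timed loop ends before the sprite 0 hit flag turns on."""
    loop_cycles = iterations * cycles_per_iteration
    hit_cycle = hit_scanline * NTSC_CYCLES_PER_SCANLINE
    return loop_cycles < hit_cycle

# Hypothetical loop body: 10 cycles without the penalty, 11 with it.
# Sprite 0 is placed so the hit falls between the two finishing times.
print(finishes_before_hit(10, hit_scanline=24))
print(finishes_before_hit(11, hit_scanline=24))
```

On real hardware the test would read D6 of $2002 right after the loop; which of the two outcomes appears tells you whether the extra cycle was taken.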
Re: Simple questions
by on (#51477)
koitsu wrote:
No offence intended -- but comparing the PCE's incredibly custom HuC6280A to that of an NMOS 6502 is pretty ballsy, especially since the PCE's CPU is CMOS to boot. Hudson's processor is massively custom in many ways.


How is it ballsy? All the normal extra cycles under certain conditions of the original 65c02, are executed in full on the 6280. The 6280 might be custom, but it's still based on Rockwell's 65C02 IP. It's not some microcoded compatible 65x. I'm saying there's more than a passing coincidence.


Quote:
What makes you think the CPU logic for page boundary crossing is the same for instruction (opcode/operands) fetching as it is indexed opcodes? We don't know this.


Well; everything else in the processor is 8bit, it was made as cheaply as possible (and that could be an understatement), having a single 8bit incrementor for the PC fits in with the rest. But, I have read that the effective address calculation of indexing, didn't pass through the ALU - but a separate 8bit ADDER unit. Thus, no extra cycles for indexing (that doesn't cross a page boundary). It would make sense that the PC would also use this second ADDER unit as well, considering all the corners they were cutting. Yes - it's a lot of speculation and hearsay, but then there's the statement of Bxx taking an extra-extra cycle on page crossing and 6280 mysterious additional cycles that match up with 65x if you include page crossing penalties.


Quote:
I'd offer to contact Bill Mensch via WDC, but I'm of the realisation that it wouldn't do much good.


That's not the same guy that stated in a recent online video interview, that any serious project for the 65x should be made with sweet16 - is it? For a man that had a hand in designing this legacy processor (if it's the same guy), he sure knows little about programming on these old processors. Sweet16 is pathetic. I closed out the podcast after hearing that in the interview :/ What a jerk.


Quote:
Someone would really need to bust out an oscilloscope or some kind of logic analyser to figure it out. Bunnyboy? :-)


Logic analyzer would be the best idea. Loops and visual confirmation are inaccurate. Especially for this conversation. I mean, the amount of impact of what we are talking about would be extremely tiny. Charles MacDonald had the right idea. He underclocked the PCE processor with a 600 kHz clock and used a logic analyzer. Same should be done with the NES.
Re: Simple questions
by on (#51479)
tomaitheous wrote:
Well; everything else in the processor is 8bit, it was made as cheaply as possible (and that could be an understatement), having a single 8bit incrementor for the PC fits in with the rest.

More than likely, there's an 8-bit incrementor for both the lower and upper bytes of the PC, and the upper result isn't used unless the old value of the lower byte is $FF. Some behaviors of the unofficial instructions involve ((PC>>8)+1), which is more evidence for this second incrementor's presence.
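A sketch of the arrangement guessed at here, purely as a model of the idea and not as confirmed hardware behavior: the program counter is kept as two bytes, and the high byte is bumped only when the low byte wraps from $FF. The function name is made up for the example.

```python
def increment_pc(pcl, pch):
    """Advance a PC stored as separate low and high bytes by one."""
    new_pcl = (pcl + 1) & 0xFF
    if pcl == 0xFF:                  # low byte wrapped, so carry into high
        pch = (pch + 1) & 0xFF
    return new_pcl, pch

print(increment_pc(0xFF, 0x80))      # crosses from $80FF to $8100
print(increment_pc(0x10, 0x80))      # high byte untouched
```

For an emulator that does not model individual bus cycles, a plain `(pc + 1) & 0xFFFF` gives the same result.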

Quote:
Logic analyzer would be the best idea. Loops and visual confirmation are inaccurate. Especially for this conversation. I mean, the amount of impact of what we are talking about would be extremely tiny.

If the extra cycle happens all the time under those conditions, then in what way would a loop repeatedly triggering those precise conditions be inaccurate?