Handling multi byte reads for page crossing

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
by on (#53896)
I was wondering if there are any efficient ways to handle multibyte reads across page crossings, assuming you've got the endianness sorted out. All I could come up with was keeping the prg banks next to one another in one big array, but that would mean I have to copy the prg rom data every time a mapper makes a bankswitch. I guess one could also put a string of if statements in something like read16()/readxx() to check whether the address is near the page boundary and handle that case separately. Are there any methods that you guys use to read across boundaries without any extra logic?

by on (#53900)
The NES CPU can read only one byte per cycle. For a multibyte read like an instruction or an indirect address, it reads a byte at a time and accumulates the result in internal latches. So should an emulator unless you're trying to run on really limited hardware like a 16.8 MHz ARM.
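
As a minimal sketch of the byte-at-a-time approach described above: the names `mem`, `read8` and `read16` are made up for illustration, and the flat array stands in for whatever address decoding a real emulator does.

```c
#include <stdint.h>

/* Hypothetical flat 64 KiB memory standing in for the real bus decode. */
static uint8_t mem[0x10000];

/* Hypothetical single-byte bus read; a real emulator would decode addr here. */
static uint8_t read8(uint16_t addr) {
    return mem[addr];
}

/* Little-endian 16-bit read built from two separate byte reads. */
static uint16_t read16(uint16_t addr) {
    uint8_t lo = read8(addr);
    uint8_t hi = read8((uint16_t)(addr + 1));  /* address wraps at 0xFFFF */
    return (uint16_t)(lo | (hi << 8));
}
```

Because each byte goes through `read8` on its own, a read that straddles a page boundary needs no special case.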

by on (#53903)
This is more of an optimization thing than an accurate depiction of the hardware, which I'm fine with since I wanted to try new approaches for emus. To be truthful, I asked in here not to try it on the NES, but for other consoles that read more than 1 byte at a time. I wanted to see how you guys would handle such cases efficiently. Sorry I didn't give all the info; I didn't know where to put this question (only one subsection titled *emudev :P)

by on (#53907)
Are you talking about a platform with a data bus wider than a byte,* such as PCs or 68K-based consoles? In that case, the vast majority of reads and writes larger than 1 byte are aligned to a multiple of the data bus size. For example, a 16-bit read on a 16-bit data bus is from an address that is a multiple of 2. Each CPU has its own behavior in case of unaligned access: 68K throws an exception, x86 breaks it up into smaller aligned reads, and ARM barrel-rotates the result in some ABI-defined manner.


* "Byte" is the smallest unit of memory addressable without bit shift instructions. It isn't always 8 bits, though it is 8 bits on the vast majority of platforms that you'll encounter, including all major platforms by Atari, Nintendo, Sega, Sony, and Microsoft. Some old mainframes and minis have 6-bit bytes, and some DSPs are "word addressed" with 16- or 32-bit bytes.

by on (#53915)
Yeah, I am talking about consoles that transfer more than one byte at a time. In this case, I just want to try emulating consoles whose bytes are 8 bits, which brings me back to why I started the thread. I guess I didn't write exactly what I want. :? My question is simply this: assuming the same endianness and the same byte size, what would be the most efficient way to write, say, a u16 read function that returns 2 bytes combined and handles pages that point to different things? For example:

0 to 0xFFF is low prg rom
0x1000 to 0x1FFF is high prg rom
0x2000 to 0x2FFF is IO regs

and the prg rom is bankswitched. So my question is: how do I write such a function that handles these things with maximum efficiency, that is, how do I handle a read that crosses between two different pages? (Ideally with no branching, if possible :P)

by on (#53931)
If you don't care about accuracy, the only way you'll manage to use:
data = *(uint16_t*)address is if you pack the bus together.

In other words, for your example, allocate 0x3001 bytes of RAM and put the data right inside the buffer. The extra last byte should mirror address 0 in PRG ROM, in case the game tries to read 16 bits from $2fff.

You'll complicate your write and I/O functions substantially, but it's the only way short of voodoo mmap() magic (that WILL be slower) that you're going to be able to access two bytes immediately after each other. I doubt it will even really be faster.
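
A rough sketch of that packed layout, using the 0x3000-byte example map. The names `bus`, `sync_mirror`, `map_prg_bank` and `read16` are hypothetical, and `memcpy` is used here instead of the pointer cast so the load is well defined on any host; the result is in host byte order, matching the thread's same-endian assumption.

```c
#include <stdint.h>
#include <string.h>

#define BUS_SIZE 0x3000u

/* One extra byte so a 16-bit read at 0x2FFF stays inside the buffer. */
static uint8_t bus[BUS_SIZE + 1];

/* Must run after anything changes bus[0], to keep the mirror byte current. */
static void sync_mirror(void) {
    bus[BUS_SIZE] = bus[0];
}

/* Bank switch: copy a 4 KiB bank into the low (slot 0) or high (slot 1) window. */
static void map_prg_bank(unsigned slot, const uint8_t *bank) {
    memcpy(&bus[slot * 0x1000u], bank, 0x1000u);
    sync_mirror();
}

/* Branch-free 16-bit read in host byte order; addr must be below BUS_SIZE. */
static uint16_t read16(uint16_t addr) {
    uint16_t v;
    memcpy(&v, &bus[addr], sizeof v);
    return v;
}
```

The read itself has no branch, but every bank switch now costs a 4 KiB copy, and any write that touches `bus[0]` has to remember the mirror byte.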

Snes9X uses an if() test to toggle between byte and word read mode, and only on little endian systems. Since they kept all the extra complexity, I assume it helps a little. Tell that, though, to the people whose program crashes when they play Mega Man X3 because its pointer system is inherently unsafe.

if((addr & 0xfff) == 0xfff) {
    data.lo = read(addr);
    data.hi = read(addr + 1);
} else {
    data.word = readword(addr);
}

Both read and readword decode addr exactly one time to figure out what memory block it should read from.
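
To make the snippet above concrete, here is one way it could look with a table of per-page pointers. This is only a sketch: `page_table`, `read8`, `readword` and `read16` are illustrative names, the IO page is treated as plain memory, and wrapping from 0x2FFF back to 0 is an assumption about the example map.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  0xFFFu
#define NUM_PAGES  3
#define BUS_SIZE   ((unsigned)NUM_PAGES << PAGE_SHIFT)

/* One host pointer per 4 KiB page; a bank switch only swaps a pointer. */
static const uint8_t *page_table[NUM_PAGES];

/* Decode addr once and read a single byte. */
static uint8_t read8(uint16_t addr) {
    return page_table[addr >> PAGE_SHIFT][addr & PAGE_MASK];
}

/* Decode addr once and read both bytes from the same page (little-endian). */
static uint16_t readword(uint16_t addr) {
    const uint8_t *p = &page_table[addr >> PAGE_SHIFT][addr & PAGE_MASK];
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint16_t read16(uint16_t addr) {
    if ((addr & PAGE_MASK) == PAGE_MASK) {
        /* Last byte of a page: the high byte lives in the next page. */
        uint16_t next = (uint16_t)((addr + 1u) % BUS_SIZE);
        return (uint16_t)(read8(addr) | (read8(next) << 8));
    }
    return readword(addr);
}
```

Only one address in every 4096 takes the slow path, so the branch should be easy for the host CPU to predict, and bank switching costs a pointer assignment rather than a copy.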

by on (#53932)
Using u16 pointers is X86 only, because it's one of the few instruction sets that doesn't have word/halfword alignment restrictions.
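
For hosts with alignment restrictions, one portable alternative to the pointer cast is to go through `memcpy` or to assemble the value from bytes. The helper names below are made up for this sketch. In C, converting a misaligned pointer to `uint16_t *` is undefined behavior no matter what the host CPU tolerates, so these forms are safer on x86 as well.

```c
#include <stdint.h>
#include <string.h>

/* Load 16 bits in host byte order from any address, aligned or not. */
static uint16_t load_u16(const uint8_t *p) {
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Load a little-endian 16-bit value regardless of host byte order. */
static uint16_t load_u16_le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

Optimizing compilers typically turn either helper into a single load instruction on hosts where unaligned access is cheap.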