Assembly Question

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
Assembly Question
by on (#276)
Hey peeps, a quick assembly question (x86!!)

According to the x86 docs, a read/write to memory takes 1 cycle (given that the thing being read is in cache).

My question is, if I have to save a register on the stack, would it be faster to use a single temporary int memory location throughout my program for this purpose?

a push takes 1 cycle
a pop takes like 4 cycles

but a read/write to memory location takes 1+1 = 2

So over my code would it be faster to go:

mov TEMP, ecx;
call runPerfectNesEmulator;
mov ecx, TEMP;

vs

push ecx;
call runPerfectNesEmulator;
pop ecx;

The math adds up? :D
Re: Assembly Question
by on (#280)
Unlike on the 6502, you just can't do precise cycle counting on newer CPUs. A guesstimate would be that most normal instructions take about half a cycle, while a complete cache miss might take 100 cycles and a page fault can cost you billions of cycles.

It's virtually guaranteed that the stack is cached (unless you use loads of local data), while a single random memory location probably is not. Worse, spreading out your data might waste a whole cache line on it. A push/pop is slightly more complex than a simple move, but pushes and pops are also heavily optimized by processor manufacturers. On ancient hardware they caused stalls for other instructions accessing the stack pointer.

It's probably a good idea to reserve some stack space among the local variables to hold the value, that way you can still use a move and you won't have to modify the stack pointer in the middle of a function.
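
A minimal sketch of that suggestion, assuming 32-bit code with a conventional ebp frame and NASM syntax (the question's snippets use MASM-style `mov TEMP, ecx`). The label `caller` and the choice of `[ebp-4]` as the save slot are hypothetical and only for illustration. `runPerfectNesEmulator` is the routine named in the question.

```asm
; NASM syntax, 32-bit x86. Sketch only, not a tuned implementation.
extern runPerfectNesEmulator    ; routine from the question, defined elsewhere

section .text
global caller                   ; hypothetical name for the enclosing function
caller:
    push ebp                    ; standard frame setup
    mov  ebp, esp
    sub  esp, 4                 ; reserve one dword local at [ebp-4], done once

    ; ... code that leaves a live value in ecx ...

    mov  [ebp-4], ecx           ; save ecx with a plain move, esp is not touched
    call runPerfectNesEmulator
    mov  ecx, [ebp-4]           ; restore ecx from the same slot

    mov  esp, ebp               ; release the locals
    pop  ebp
    ret
```

The stack pointer changes only in the prologue and epilogue, so the save and restore in the middle of the function are ordinary moves to memory that is almost certainly already in cache.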

The real advantage of using the stack (at least to lazy programmers like me) is that you don't have to worry about race conditions among the writers. This is what often makes it a pain to reuse zeropage registers on the 6502.
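
To illustrate the hazard with a single shared save slot, here is a small sketch in NASM syntax, assuming 32-bit code. The labels `outer`, `inner` and `leaf` are hypothetical. `TEMP` plays the role of the temporary location from the question.

```asm
; NASM syntax, 32-bit x86. Illustrative only.
section .bss
TEMP:   resd 1                  ; one shared save slot for the whole program

section .text
outer:
    mov  ecx, 1
    mov  [TEMP], ecx            ; outer saves its ecx: TEMP = 1
    call inner                  ; inner reuses the same slot
    mov  ecx, [TEMP]            ; ecx is now 2, not the 1 that outer saved
    ret

inner:
    mov  ecx, 2
    mov  [TEMP], ecx            ; overwrites outer's saved value: TEMP = 2
    call leaf
    mov  ecx, [TEMP]            ; inner gets its own 2 back
    ret

leaf:
    ret
```

With push/pop (or a slot in each function's own stack frame), every call level gets a separate location, so nested or recursive calls cannot overwrite each other's saved values.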

by on (#282)
As doynax says, clock cycles can't be counted in the same way on modern CPUs. I would use push/pop for simplicity's sake, but I guess it's a matter of taste.
Either way, if your code really has that call between the push/pop, you shouldn't worry about a clock cycle being lost or gained when using the stack. :)