Yes, exact synchronization with the PPU is pretty much impossible (AFAIK, even blargg's trick only gets you really close, like within 1 pixel), but for a successful scroll change you only have to fit 5 cycles' worth of instructions (the last cycle of st* $2005 plus the 4 cycles of st* $2006) in a ~22-cycle window, so there's plenty of wiggle room to get it right regardless of any timing fluctuation.
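For reference, here's a rough sketch of the kind of write sequence this is about, assuming the usual $2006/$2005/$2005/$2006 order for a full mid-frame scroll change. The variable names (NametableBits, ScrollY, ScrollX, LowAddrByte) are made up for the example, and how you wait for the end of the scanline is left out. The point is just that the last two stores sit back to back, so only their tail end has to land inside the window:
Code:
lda NametableBits ;hypothetical: nametable number << 2
sta $2006 ;1st write, can happen early
lda ScrollY ;hypothetical
sta $2005 ;2nd write, can happen early too
ldx ScrollX ;hypothetical, preloaded so no load sits between the last 2 writes
ldy LowAddrByte ;hypothetical: ((ScrollY & $F8) << 2) | (ScrollX >> 3)
;...wait here until the scanline is about to end...
stx $2005 ;3rd write, only its last cycle matters
sty $2006 ;4th write, all 4 cycles have to fit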
Effects timed from an NMI or an IRQ look jittery because the CPU always finishes the current instruction before calling the interrupt handler. With instructions ranging from 2 to 7 cycles on the 6502, and 3 PPU pixels per CPU cycle (NTSC), the position of a raster effect can vary by up to 7 * 3 = 21 pixels. Effects timed from sprite hits or overflows suffer from this too, because the flag can get set at any point during the wait loop, so it can sometimes take the code longer to notice that the flag has been set (e.g. if it gets set right after $2002 is read, a full iteration of the wait loop will happen before the change is detected). That's why it's important to make these loops as short as possible:
Code:
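;do this (mask loaded once, outside the loop):
lda #%00100000 ;bit 5 of $2002 = sprite overflow
TestOverflow:
bit $2002 ;Z is set from A AND $2002, A is left untouched
beq TestOverflow ;loop while the flag is still clear
;instead of this (mask applied on every iteration):
TestOverflow:
lda $2002
and #%00100000
beq TestOverflow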
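To put numbers on it: assuming the branch doesn't cross a page, the first loop is 7 cycles per iteration (4 for bit, 3 for the taken beq) and the second is 9 (4 for lda, 2 for and, 3 for the taken beq), so the worst-case delay drops from roughly 27 pixels to roughly 21. For sprite 0 hits the same idea can go a bit further, as a sketch: bit copies bit 6 of the value it reads into the V flag, so no mask in A is needed at all. The label name here is just for illustration:
Code:
WaitSprite0Hit:
bit $2002 ;4 cycles, bit 6 of $2002 (sprite 0 hit) goes into V
bvc WaitSprite0Hit ;3 cycles when taken, loop while V is clear
This is still 7 cycles per iteration, but it leaves A free for whatever value you want to write right after the hit.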