Maybe it's a silly question, but if we have a 16-bit shift register where we load the pattern table data, does that mean the first tile load of the scanline (the 3rd tile) happens at cycle 8?
The first tile load (which is indeed the 3rd on-screen tile) starts at cycle 0 and ends at cycle 7, i.e. the 8th cycle. Loading a tile takes 4 memory fetches of 2 cycles each: the nametable byte, the attribute byte, then the low and high pattern-table bytes of the tile.
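To make the timing concrete, here is a minimal sketch of the schedule described above. It counts cycles from 0 as the answer does and assumes the first two on-screen tiles were already prefetched into the shift registers. The names `fetch_at`, `reload_low_byte` and `shift` are hypothetical helpers, and the shift-left convention (pixels taken from bit 15, new data placed in the low byte) is one common way emulators model the 16-bit registers. Descriptions of the hardware vary in which end gets loaded.

```python
# Sketch of the per-tile fetch schedule: 4 fetches x 2 cycles = 8 cycles per tile.
# Cycle numbering follows the answer: cycle 0 is the first cycle of the first
# in-scanline tile load. Assumption: 2 tiles were prefetched before this point.

FETCHES = ("nametable", "attribute", "pattern low", "pattern high")
CYCLES_PER_FETCH = 2
CYCLES_PER_TILE = CYCLES_PER_FETCH * len(FETCHES)  # 8
PREFETCHED_TILES = 2


def fetch_at(cycle):
    """Return (1-based on-screen tile number, fetch kind) for a 0-based cycle."""
    # Tile number is 1-based, so the first load in the scanline is tile 3.
    tile = cycle // CYCLES_PER_TILE + PREFETCHED_TILES + 1
    kind = FETCHES[(cycle % CYCLES_PER_TILE) // CYCLES_PER_FETCH]
    return tile, kind


def reload_low_byte(shift_reg, new_pattern_byte):
    """Put a freshly fetched pattern byte into the low 8 bits of a 16-bit register.

    Assumption: shift-left convention, so the upper byte still holds the
    tile currently being drawn and is left untouched.
    """
    return (shift_reg & 0xFF00) | (new_pattern_byte & 0xFF)


def shift(shift_reg):
    """Shift the 16-bit register left by one pixel (one per rendered dot)."""
    return (shift_reg << 1) & 0xFFFF
```

For example, `fetch_at(6)` returns `(3, 'pattern high')`: the last fetch of the 3rd tile occupies cycles 6 and 7, so its pattern data is complete when cycle 8 begins. After 8 calls to `shift`, the byte loaded earlier into the low half has moved into the upper half, which is why a reload every 8 cycles keeps the register fed.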