I've managed to get down to 102 cycles to convert eight 4-bit pixels, using a long, complicated LUT. I haven't finished typing out the LUT yet, so it isn't posted here. I can't find an algorithm faster than this. Each comment gives the instruction's cycle count followed by the running total.
ldx !input ;3
ldy !input+1 ;3 6
lda LUT+$0000,x ;4 10
ora LUT+$0400,y ;4 14
xba ;2 16
lda LUT+$0100,x ;4 20
ora LUT+$0500,y ;4 24
ldx !input+2 ;3 27
ldy !input+3 ;3 30
ora LUT+$0900,x ;4 34
ora LUT+$0d00,y ;4 38
sta !output+1 ;3 41
xba ;2 43
ora LUT+$0800,x ;4 47
ora LUT+$0c00,y ;4 51
sta !output ;3 54
lda LUT+$0a00,x ;4 58
ora LUT+$0e00,y ;4 62
xba ;2 64
lda LUT+$0b00,x ;4 68
ora LUT+$0f00,y ;4 72
ldx !input ;3 75
ldy !input+1 ;3 78
ora LUT+$0300,x ;4 82
ora LUT+$0700,y ;4 86
sta !output+3 ;3 89
xba ;2 91
ora LUT+$0200,x ;4 95
ora LUT+$0600,y ;4 99
sta !output+2 ;3 102
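Since the LUT itself isn't posted, here is a minimal Python sketch of one table layout that would be consistent with the offsets used above. What the code does show is that the table at `LUT + (4*j + p)*$100` is indexed by input byte `j` and contributes to output byte `p`. Everything else is an assumption for illustration: that each input byte holds two pixels with the left pixel in the high nibble, that output byte `p` is bitplane `p` with bit 7 as the leftmost pixel, and the function names `build_lut`, `convert_with_lut` and `convert_direct`, which are hypothetical.

```python
def build_lut():
    """Build 16 tables of 256 bytes each (assumed layout, not the real LUT)."""
    lut = bytearray(16 * 0x100)
    for j in range(4):                      # which input byte indexes the table
        for p in range(4):                  # which output byte (bitplane) it feeds
            base = (4 * j + p) * 0x100      # matches LUT+$0000 ... LUT+$0f00
            for v in range(256):
                left, right = v >> 4, v & 0x0F   # assumption: high nibble = left pixel
                bits = 0
                if (left >> p) & 1:
                    bits |= 0x80 >> (2 * j)      # assumption: bit 7 = leftmost pixel
                if (right >> p) & 1:
                    bits |= 0x80 >> (2 * j + 1)
                lut[base + v] = bits
    return bytes(lut)


def convert_with_lut(lut, inp):
    """Model of the routine: each output byte is the OR of four table lookups."""
    out = bytearray(4)
    for p in range(4):
        for j in range(4):
            out[p] |= lut[(4 * j + p) * 0x100 + inp[j]]
    return bytes(out)


def convert_direct(inp):
    """Bit-by-bit reference conversion under the same assumptions."""
    pixels = []
    for v in inp:
        pixels += [v >> 4, v & 0x0F]
    out = bytearray(4)
    for p in range(4):
        for i, px in enumerate(pixels):
            if (px >> p) & 1:
                out[p] |= 0x80 >> i
    return bytes(out)


LUT = build_lut()
sample = bytes([0x12, 0x34, 0x56, 0x78])    # pixels 1..8, left to right
assert convert_with_lut(LUT, sample) == convert_direct(sample)
```

Under these assumptions the whole conversion is 16 lookups ORed together, four per output byte, which is the same count as the `lda`/`ora` table accesses in the routine.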
Does anybody know of a way I can bring this down even more?