faster, but less accurate, sprite rotation algorithm

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
View original topic
faster, but less accurate, sprite rotation algorithm
by on (#84580)
A year ago, I posted a code for software sprite rotation, and promised a demo for it, but never actually got finished with it.

The reason being so was:

1) I had school in the way.
2) It wasn't fast enough to do it in realtime.

My old algorithm addresses pixels in a 256 x 256 grid. First it calculates the y-coordinates for a line of 8 pixels using 16 bit values, with the low 8-bits being decimals. Then it does the same with the x-coordinates, and stores them one byte before the y-coordinates, so the top x byte overwrites the bottom y byte, to create the 16-bit address for the pixel.


Last night in the shower, I came up with an idea that can gain a magnitude of speed. Instead of calculating the pixel coordinates for every pixel, I can calculate the coordinates for the middle row, and middle column and use "LDA (dp),y" to add the row coordinate and the column coordinate and access the pixel in one instruction.

Converting from packed pixel to planar takes a lot of work too, so I decided to use 2bpp as opposed to 4bpp. I then realized, since I only need to calculate one row and one collumn of coordinates, I can arrange it so it uses a 128x128 pixel grid, with 2 bytes per pixel. Each byte holds just 1 bit. All it needs to do is "ASL" and "ORA (dp),y" the next pixel.

Code:
ldy !row_0

lda (!collumn_0),y
asl
ora (!collumn_1),y
asl
ora (!collumn_2),y
asl
ora (!collumn_3),y
asl
ora (!collumn_4),y
asl
ora (!collumn_5),y
asl
ora (!collumn_6),y
asl
ora (!collumn_7),y

sta !bitmap


To avoid carring a bit from the x-coordinate into the y-coordinate, a sprite needs to be within a box twice as large on all sides.

For an upright 32x32 sprite the collumn and row coordinates are:

Collumns:
(0,16),(1,16)...(30,16),(31,16)

Rows:
(16,0),(16,1)...(16,30),(16,31)

Notice how the middle row (16,16) and middle collumn (16,16) add up to (32,32), which is right in the middle of the 64x64 box!
Re: faster, but less accurate, sprite rotation algorithm
by on (#160344)
Sorry to bring back the topic from the grave, I am interested in it.

Did you take your idea further ?
I am trying to understand your algorithm, have you posted a full version somewhere ? (Couldn't find anything about it)
Re: faster, but less accurate, sprite rotation algorithm
by on (#160346)
I actually forgot about this thread. I do have a sprite rotation algorithm, but it doesn't use any of the tricks I mentioned in this thread.
Re: faster, but less accurate, sprite rotation algorithm
by on (#160351)
Thank you for your fast reply, have you made your algorithm public or maybe some explanations ?
Am trying to look for good starter ideas on soft sprite rotation and you seemed to have done some work on that.
Re: faster, but less accurate, sprite rotation algorithm
by on (#160373)
This is the code used in Alisha's Adventure. When calling this routine, index X should be already pointing to the metasprite data for the sprite it wants to rotate. I store the rotation information in with the metasprite data for convenience.

It doesn't run in real time though. It has to pre-rotate sprites in RAM first, but it can do it while running the game, and I've set up check points in the level to swap out different sets of rotating sprites.

Code:
rotate_sprite:
phb
php
sep #$20
lda $0004,x
sta {rotation_step}
stz {rotation_angle}
lda $000a,x
asl #3
sta {size}
asl #2
sta {d}
lda $0000,x
stz {x_pixel}
sta {x_pixel_hi}
lda $0001,x
stz {y_pixel}
sta {y_pixel_hi}
lda $0002,x
pha
rep #$20
lda $0008,x
sta {e}

lda $0006,x
tax

plb
phd
lda #$0000
tcd
jsr convert_bitmap
pld
plp
plb
rts

new_rotation_step:

pla
sta.b {y_pixel}
pla
sta.b {x_pixel}

lda.b {rotation_step}
clc
adc.b {rotation_angle}
sta.b {rotation_angle}
cmp #$0080
bcc convert_bitmap
rts


convert_bitmap:
sep #$20
lda #$00
sta $004200
rep #$20

phx
lda.b {rotation_angle}
asl
and #$01fe
tax
lda $000000+sine,x
sta.b {sine}
lda $000000+cosine,x
sta.b {cosine}
plx

lda.b {x_pixel}
pha
lda.b {y_pixel}
pha

lda.b {sine}
clc
adc.b {cosine}
sta.b {a}
sep #$20
sta $00211b
xba
sta $00211b
lda.b {size}
lsr
sta $00211c
rep #$20
lda.b {size}
xba
clc
adc.b {a}
lsr
sec
sbc $002134
clc
adc.b {x_pixel}
sta.b {x_pixel}
lda.b {cosine}
sec
sbc.b {sine}
sta.b {a}
sep #$20
sta $00211b
xba
sta $00211b
lda.b {size}
lsr
sta $00211c
rep #$20
lda.b {size}
xba
clc
adc.b {a}
lsr

sec
sbc $002134
clc
adc.b {y_pixel}
sta.b {y_pixel}
lda.b {size}
sta.b {c}
lsr #3
sta.b {b}
lda.b {size}
asl #2
sta.b {a}

sep #$20
lda #$81
sta $004200
rep #$20


convert_bitmap_loop:
lda.b {c}
bne old_rotation_step
jmp new_rotation_step

old_rotation_step:
lda.b {x_pixel}
pha
lda.b {y_pixel}
pha
convert_line:
sep #$20
jsr convert_pixel
phb
lda.b {e}
pha
plb
rep #$20
lda.b {bitplane_0}
sta $0000,x
lda.b {bitplane_2}
sta $0010,x
plb
txa
clc
adc #$0020
tax
lda.b {b}
bne convert_line
pla
clc
adc.b {cosine}
sta.b {y_pixel}
pla
clc
adc.b {sine}
sta.b {x_pixel}
dec.b {c}
lda.b {size}
lsr #3

sta.b {b}
lda.b {size}
txa
sec
sbc.b {a}
inc #2
tax
bit #$000e
bne convert_bitmap_loop
clc
adc.b {d}
sec
sbc #$0010
tax
jmp convert_bitmap_loop


convert_pixel:
rep #$20
dec.b {b}


lda.b {y_pixel}
sta.b {scratch_pad_ram}+2
sec
sbc.b {sine}
sta.b {scratch_pad_ram}+6
sec
sbc.b {sine}
sta.b {scratch_pad_ram}+10
sec
sbc.b {sine}
sta.b {scratch_pad_ram}+14
sec
sbc.b {sine}
sta.b {scratch_pad_ram}+18
sec
sbc.b {sine}
sta.b {scratch_pad_ram}+22
sec
sbc.b {sine}
sta.b {scratch_pad_ram}+26
sec
sbc.b {sine}
sta.b {scratch_pad_ram}+30
sec
sbc.b {sine}
sta.b {y_pixel}

lda.b {x_pixel}
sta.b {scratch_pad_ram}+1
clc
adc.b {cosine}
sta.b {scratch_pad_ram}+5
clc
adc.b {cosine}
sta.b {scratch_pad_ram}+9
clc
adc.b {cosine}
sta {scratch_pad_ram}+13
clc
adc.b {cosine}
sta {scratch_pad_ram}+17
clc
adc.b {cosine}
sta {scratch_pad_ram}+21
clc
adc.b {cosine}
sta {scratch_pad_ram}+25
clc
adc.b {cosine}
sta.b {scratch_pad_ram}+29
clc
adc.b {cosine}
sta.b {x_pixel}

sep #$20


lda ({scratch_pad_ram}+2)
lsr
rol.b {bitplane_0}
lsr
rol.b {bitplane_1}
lsr
rol.b {bitplane_2}
lsr
rol.b {bitplane_3}
lda ({scratch_pad_ram}+6)
lsr
rol.b {bitplane_0}
lsr
rol.b {bitplane_1}
lsr
rol.b {bitplane_2}
lsr
rol.b {bitplane_3}
lda ({scratch_pad_ram}+10)
lsr
rol.b {bitplane_0}
lsr
rol.b {bitplane_1}
lsr
rol.b {bitplane_2}
lsr
rol.b {bitplane_3}
lda ({scratch_pad_ram}+14)
lsr
rol.b {bitplane_0}
lsr
rol.b {bitplane_1}
lsr
rol.b {bitplane_2}
lsr
rol.b {bitplane_3}
lda ({scratch_pad_ram}+18)
lsr
rol.b {bitplane_0}
lsr
rol.b {bitplane_1}
lsr
rol.b {bitplane_2}
lsr
rol.b {bitplane_3}
lda ({scratch_pad_ram}+22)
lsr
rol.b {bitplane_0}
lsr
rol.b {bitplane_1}
lsr
rol.b {bitplane_2}
lsr
rol.b {bitplane_3}
lda ({scratch_pad_ram}+26)
lsr
rol.b {bitplane_0}
lsr
rol.b {bitplane_1}
lsr
rol.b {bitplane_2}
lsr
rol.b {bitplane_3}
lda ({scratch_pad_ram}+30)
lsr
rol.b {bitplane_0}
lsr
rol.b {bitplane_1}
lsr
rol.b {bitplane_2}
lsr
rol.b {bitplane_3}


rts

Re: faster, but less accurate, sprite rotation algorithm
by on (#160374)
Quote:
Converting from packed pixel to planar takes a lot of work too

It consumes many cycles ??

I said to stef, that the main reason for slow rendering on snes was his planar mode,it take many more cycles than packed pixels and makes software effects more slower than MD .
Re: faster, but less accurate, sprite rotation algorithm
by on (#160409)
Thank you psycopathicteen for your help.