43110 wrote:
I had to read that several times before I got it. While I am aiming for subroutine reuse, I don't think I'll have things logically knitted like that. But you did mention putting subroutines in the same page, and I then saw that the addresses really only have 10 bits for a 4KiB ROM. That means I'm now wasting 6 bits for every address in the list, and that's not too far from having a whole implied byte automatically inserted by the list loader.
But you also want the list loader to be concise.
Without the "falling block puzzle game" side of it, you can have counted vectors anywhere in the ROM without a count byte by using the high bit to flag the beginning of the list, but after working that through, it's a bit more intricate and brittle than I like.
Vectors with an index to their own end in front don't let you play the "sliding block puzzle" games, but it's easier to follow what is going on.
And there really is not any binding constraint with the vector pool approach in terms of how many vectors it supports. If you have so many lists of subroutine calls that you've used up one vector pool, you've saved more than enough space to cover the overhead for a second vector pool page.
Code:
;;
; NB. A "vector pool" is a page holding lists of subroutine address vectors.
; The leading byte of each list is the position in the page of the last byte of the list.
; Each vector is stored as (routine address - 1), because RTS adds one before jumping.
routine1:               ; the first entry in vector pool 1
        ldx #<vector1   ; page offset of this routine's list (vpool1 is page-aligned)
go_pool1:
        stx temp+5      ; remember where the leading count byte sits
        lda vpool1,x    ; fetch the count byte: offset of the last byte of the list
        tax
-
        lda vpool1,x    ; copy the list onto the stack, last byte first,
        pha             ; so the first vector in the list ends up on top
        dex
        cpx temp+5      ; stop when X is back at the count byte
        bne -
        rts             ; call the subroutine vectors in turn from the stack page
routine2:
        ldx #<vector2
        jmp go_pool1
routine3:               ; NB. routine3 is only four ops, so there is no space saving from the vector
        jsr op3_1
        jsr op3_2
        jsr op3_3
        jmp op3_4
routine4:
        ldx #<vector4
        jmp go_pool1
; ... and so on, until & unless the vpool1 page is full up, then (supposing it was 30 routines that filled it up):
routine31:              ; the first entry in vector pool 2
        ldx #<vector31
go_pool2:
        stx temp+5
        lda vpool2,x
        tax
-
        lda vpool2,x
        pha
        dex
        cpx temp+5
        bne -
        rts             ; the same engine again, just pointed at the second pool page
routine32:
        ldx #<vector32
        jmp go_pool2
...
... so the limit on how many distinct vectors fit in a single binary page is not an overall limit on the number of vectors.
And if (as supposed above) 30 vectors filled up a vector page, then using the vector pool saved 30 bytes, since each vector pool call is one byte shorter than passing a full address, which more than covers the 16 or so bytes of the go_pool engine itself.
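Concretely, an unpacked list in the pool could look something like this (the task_* names are dummies of my own, not anything above; note each entry is the routine's address minus one, since RTS adds one):
Code:
vpool1:                                 ; assumed to be page-aligned
vector1:
        .byte v1_end - vpool1 - 1       ; page offset of the last byte of this list
        .word task_a - 1                ; runs first (stored minus one because RTS adds one)
        .word task_b - 1
        .word task_c - 1                ; runs last; its own RTS returns to the original caller
v1_end:
vector2:
        .byte v2_end - vpool1 - 1
        .word task_d - 1
        .word task_a - 1                ; lists can reuse the same routines freely
v2_end: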
But if you want to really crunch ... yeah, the addresses can be packed. EDIT ~ yeah, 4K is 12 bits, so it's not just that I can't be bothered packing more than two together, you CAN'T pack more than two ~ "2 bits" fixed to "4 bits" in place without more edit markers ~ the original was to crunch routines into the 4K "Golden RAM" in the C64, above the ROM in the memory map and below the I/O, but it was reconstructed from memory and I got some of the details off ~ plus multiplied when I should have been dividing (oops). /EDIT
If there are an odd number of routines, it's the address at the head of the list that is left unpacked, so in the vector pool, an odd list would look like this (dummy, so op_# is the position of the op in the sequence, not the actual name; pseudo-assembler syntax):
        .byte                           ; leading byte: page offset of the last byte of the list
        .word op_1-1                    ; the unpacked head address (minus one, since it gets consumed by RTS)
        .byte <(op_2-1), <(op_3-1), ((>(op_2-1) .AND. $0F)*16)+(>(op_3-1) .AND. $0F)
        .byte <(op_4-1), <(op_5-1), ((>(op_4-1) .AND. $0F)*16)+(>(op_5-1) .AND. $0F)
        .byte <(op_6-1), <(op_7-1), ((>(op_6-1) .AND. $0F)*16)+(>(op_7-1) .AND. $0F)
...
... with the packed address business best constructed as a macro in your favorite 6502 macro assembler, to get it right once and avoid lurking typos.
Note that the top four bits of each address (the high nibble of the high byte) are stripped by the packing process and put back by the unpacking routine.
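As a sketch of that macro (ca65 syntax here, and the name pack_pair is mine; swap in your own assembler's macro syntax):
Code:
; Pack two routine addresses into three bytes: both low bytes, then one byte
; holding both high nibbles (earlier routine in the upper four bits).
; Addresses are stored minus one because they will be consumed by RTS.
.macro  pack_pair  earlier, later
        .byte   <(earlier-1), <(later-1)
        .byte   (((>(earlier-1)) & $0F) * 16) | ((>(later-1)) & $0F)
.endmacro

; used like:   pack_pair op_2, op_3
And the matching engine that unpacks those lists onto the stack: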
Code:
;;
; NB. Packed-list engine.  The leading byte is again the page offset of the
; last byte of the list, same as the plain version.  Addresses are stored as
; (routine - 1) and packed two to three bytes: both low bytes plus one byte
; holding both high nibbles.  The top four bits are re-supplied with $80 here,
; which assumes all the routines live in one 4K window at $8000-$8FFF.
routine1:               ; the first entry in vector pool 1
        ldx #<vector1
go_pool1:
        stx temp+5      ; remember where the count byte sits
        lda vpool1,x    ; fetch the page offset of the last byte of the list
        tax
-
        lda vpool1,x    ; packed high nibbles of the next pair (or the plain
        tay             ; high byte of the unpacked head word)
        and #$0F        ; low nibble = high nibble of the later address of the pair
        ora #$80        ; rebuild the high byte
        pha
        dex
        lda vpool1,x    ; low byte of the later address
        pha
        dex
        cpx temp+5
        beq +           ; that was the unpacked head word, so we are done
        tya             ; now the earlier address of the pair
        lsr a
        lsr a
        lsr a
        sec             ; three shifts plus a rotate-with-carry-set:
        ror a           ; high nibble moved down, $80 folded in
        pha
        lda vpool1,x    ; low byte of the earlier address
        pha
        dex
        cpx temp+5
        bne -
+
        rts             ; call the subroutine vectors in turn from the stack page
That works out to about 42 bytes against 16 for the simple engine, so roughly 26 bytes of extra space required by the processing. Each packed pair saves one byte over two plain .word vectors, so somewhere around 26 pairs of subroutine vectors crunched together breaks even.
So that is worth considering if the vector pool page is more than half full.
And since each vector pool has its own engine, if you have one full vector pool page and one only a quarter full, you can put packed lists in the full one, and unpacked lists in the only partly full one.
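As a rough sketch of what that looks like at the call sites (go_packed_pool, go_plain_pool, and the vector names are just stand-ins for whichever engines and lists you end up with):
Code:
task_big:                       ; its list lives packed in the nearly full pool page
        ldx #<vector_big
        jmp go_packed_pool      ; the unpacking engine for that page
task_small:                     ; its list lives as plain words in the half-empty pool page
        ldx #<vector_small
        jmp go_plain_pool       ; the simple engine for that page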
Personally, if I was using any, I'd use the simpler one, because it's easier to just define the routines using ".word" assembler directives with the label for the routine.
This is all a kind of building a Forth-like inner interpreter without having a Forth assembler/compiler system running on the NES.