I've been reverse engineering the firmware inside a very old (well, NES-contemporary) dot matrix printer, and found the following fascinating little chunk of code:
Yield saves A, X, Y, S, and PC of the calling thread and switches to another thread. The scheduler appears to be plain round-robin: it always scans the table from the top and resumes the first thread that hasn't yielded yet, so if thread N yields, thread N+1 is resumed, and so on until all threads have yielded. At that point they're all marked as "allowed to execute" again and it resumes with the first one in the table.
Specifically, this was running on a Mitsubishi 740-class microcontroller, which uses a superset of the original NMOS 6502 instruction set, and has many of the same instructions (although with different encodings) as the WDC 65C02.
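To make the scheduling order concrete, here is a rough Python model of just the flag logic as I read it. It is a sketch, not the firmware: the function name and its parameters are made up for illustration, and it assumes every thread eventually calls Yield and that nothing else touches the flags.

```python
def schedule(num_threads, steps):
    """Return the order in which threads get resumed.

    Models only the "has yielded" flags; each thread is assumed to run
    until it calls Yield.
    """
    yielded = [False] * num_threads   # stands in for bit 0 of $0101,x per thread
    order = []
    while len(order) < steps:
        # Scan from the top of the table for the first thread not yet flagged.
        for n in range(num_threads):
            if not yielded[n]:
                order.append(n)       # resume thread n ...
                yielded[n] = True     # ... until it calls Yield, which sets its flag
                break
        else:
            # Every thread has yielded: clear all flags and rescan from the top.
            yielded = [False] * num_threads
    return order


print(schedule(4, 10))  # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
```

With four threads the order is simply 0, 1, 2, 3 repeated, which matches the round-robin behaviour described above.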
Code:
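CreateThreads:
    ldy #$00
-   lda ThreadStartValues+1,y   ; high byte of entry address; zero ends the table
    beq +
    sta temp5
    lda ThreadStartValues,y     ; low byte of entry address
    sta temp4
    ldx ThreadStartValues+2,y   ; X = top of this thread's stack
    iny
    iny
    iny
    lda #$00
    sta $0101,x                 ; clear this thread's "has yielded" flag
    txs                         ; switch to the thread's own stack
    pha                         ; placeholder; $0100,x later holds the saved stack pointer
    lda temp5
    pha                         ; entry address - 1, high byte (consumed by rts)
    lda temp4
    pha                         ; entry address - 1, low byte
    lda #$00
    pha                         ; initial A
    pha                         ; initial X
    pha                         ; initial Y
    stx temp1
    tsx
    txa
    ldx temp1
    sta $0100,x                 ; record the thread's stack pointer
    bra -                       ; "branch always"
+   cli
--  ldy #$00                    ; scheduler: scan the table from the first thread
-   lda ThreadStartValues+1,y
    beq ++                      ; end of table: every thread has yielded
    ldx ThreadStartValues+2,y
    lda $0101,x
    bne +                       ; flag set: this thread already yielded, skip it
    stx StackPtrOfThreadYieldedFrom
    lda $0100,x                 ; saved stack pointer of the thread to resume
    tax
    txs
    pla
    tay                         ; restore Y
    pla
    tax                         ; restore X
    pla                         ; restore A
    rts                         ; resume the thread
+   iny
    iny
    iny
    bra -
++  ldy #$00                    ; clear every thread's flag
-   lda ThreadStartValues+1,y
    beq +
    ldx ThreadStartValues+2,y
    lda $0101,x
    clb0 a                      ; equivalent to "and #$FE"
    sta $0101,x
    iny
    iny
    iny
    bra -
+   bra --                      ; then rescan from the top
;----------------------
Yield:
    pha                         ; save A, X, Y on the caller's stack
    txa
    pha
    tya
    pha
    ldy StackPtrOfThreadYieldedFrom
    lda $0101,y
    seb0 a                      ; equivalent to "ORA #1"
    sta $0101,y                 ; mark this thread as having yielded
    tsx
    txa
    sta $0100,y                 ; save its stack pointer
    jmp --                      ; back to the scheduler loop
;---------
ThreadStartValues:
    .byte <(Thread1Start-1),>(Thread1Start-1),Stack1Start
    .byte <(Thread2Start-1),>(Thread2Start-1),Stack2Start
    .byte <(Thread3Start-1),>(Thread3Start-1),Stack3Start
    .byte <(Thread4Start-1),>(Thread4Start-1),Stack4Start
    .byte 0,0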
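The table stores each entry point minus one because rts adds 1 to the address it pulls. As a sanity check on the frame CreateThreads builds, here is a small Python model of the bytes it leaves in memory for one thread. This is only a sketch: the function name and the example stack top ($7E) and entry address ($C123) are invented, and it assumes the stack lives in page 1, as the $0100,x addressing in the listing suggests.

```python
def initial_frame(stack_top, entry):
    """Return {address: byte} for the frame CreateThreads builds for one thread.

    stack_top is the thread's StackNStart byte from the table;
    entry is the thread's start address (ThreadNStart).
    """
    ret = (entry - 1) & 0xFFFF        # table stores entry-1 because rts adds 1
    mem = {}
    mem[0x0101 + stack_top] = 0x00    # "has yielded" flag, initially clear
    s = stack_top                     # txs
    pushes = [0x00,                   # placeholder, overwritten below
              ret >> 8, ret & 0xFF,   # return address, high byte first
              0x00, 0x00, 0x00]       # initial A, X, Y
    for byte in pushes:
        mem[0x0100 + s] = byte        # pha stores at $0100+S, then decrements S
        s = (s - 1) & 0xFF
    mem[0x0100 + stack_top] = s       # saved stack pointer replaces the placeholder
    return mem


frame = initial_frame(0x7E, 0xC123)
for addr in sorted(frame, reverse=True):
    print(f"${addr:04X}: ${frame[addr]:02X}")
```

When the scheduler later loads S from $0100,x and does three pla followed by rts, it pulls Y, X, A and then the return address, so the thread starts at its entry point with all three registers zero.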