Here's a quick test program that seems to read those buttons in Nintendulator.
The low four bits of p3curstate/p3oldstate hold the state of the A, X, L, and R buttons, and p1curstate/p1oldstate hold the rest. They're called p3 because this is a really quick mod of code I was using for the Four Score.
The actual joypad reading starts at main; the rest is just standard initialization.
Code:
; iNES header setup
    .inesprg 1          ; one 16 KB PRG bank
    .ineschr 0          ; no CHR ROM
    .inesmir 0
    .inesmap 0          ; mapper 0
    .zp
p1curstate = $D8
p1oldstate = $D9
p3curstate = $DC
p3oldstate = $DD
    .code
    .bank 0
    .org $C000
reset:
    sei
    cld
    bit $2002           ; clear a stale vblank flag
    ldx #$40
    stx $4017           ; disable the APU frame IRQ
    ldx #$00
    stx $2000           ; NMI off
    stx $2001           ; rendering off
    stx $4010           ; DMC IRQ off
    stx $4015           ; silence the sound channels
vb1:
    bit $2002
    bpl vb1             ; wait for the first vblank
    txa                 ; A = 0 (X is still 0)
clearram:
    sta $0000,x
    sta $0100,x
    sta $0200,x
    sta $0300,x
    sta $0400,x
    sta $0500,x
    sta $0600,x
    sta $0700,x
    inx
    bne clearram
    dex                 ; X = $FF
    txs                 ; set up stack
vb2:
    bit $2002
    bpl vb2             ; wait for the second vblank
main:
    bit $2002
    bpl main            ; poll the pad once per frame
    lda <p1curstate
    sta <p1oldstate
    lda <p3curstate
    sta <p3oldstate
    lda #$01
    sta <p1curstate     ; initialize the buffer with a flag (shifts out after 8 reads)
    sta $4016           ; strobe high
    lsr a
    sta $4016           ; strobe low, buttons are latched
    lda #%00010000      ; initialize the buffer with a flag (shifts out after 4 reads)
    sta <p3curstate
p1loop:
    lda $4016
    ror a               ; D0 -> carry
    rol <p1curstate
    bcc p1loop          ; loop if the flag wasn't shifted out yet
p3loop:
    lda $4016
    ror a               ; D0 -> carry
    rol <p3curstate
    bcc p3loop          ; loop if the flag wasn't shifted out yet
mainend:
    jmp main
    .bank 1
    .org $E000
    .org $FFFA
    .dw 0               ; NMI (unused)
    .dw reset
    .dw 0               ; IRQ (unused)
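For what it's worth, here's a rough sketch of how the cur/old pairs could be used to pick out buttons that were newly pressed this frame. None of this is in the test program above: the BTN_* names, the p1new/p3new variables (and their $E0/$E1 addresses), and the no_a label are all made up for illustration. The bit positions assume the usual SNES pad report order (B, Y, Select, Start, Up, Down, Left, Right, A, X, L, R), with the first bit read ending up in the highest bit of each buffer. It would go right before mainend.
Code:
; hypothetical masks for p1curstate (first bit read lands in bit 7)
BTN_B      = %10000000
BTN_Y      = %01000000
BTN_SELECT = %00100000
BTN_START  = %00010000
BTN_UP     = %00001000
BTN_DOWN   = %00000100
BTN_LEFT   = %00000010
BTN_RIGHT  = %00000001
; hypothetical masks for p3curstate (only the low four bits are used)
BTN_A      = %00001000
BTN_X      = %00000100
BTN_L      = %00000010
BTN_R      = %00000001
; hypothetical zero page variables, addresses picked arbitrarily
p1new = $E0
p3new = $E1
    lda <p1oldstate
    eor #$FF            ; invert: 1 = was NOT held last frame
    and <p1curstate     ; 1 = held now and not held last frame
    sta <p1new
    lda <p3oldstate
    eor #$FF
    and <p3curstate
    sta <p3new
    and #BTN_A
    beq no_a            ; skip if A was not newly pressed
    ; ... react to a fresh A press here ...
no_a: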
I'm not sure about the two lsr instructions in your edit. I think the first one shifts out bit 0, which is the bit that carries the button state from a standard controller, without ever checking it, so the code ends up testing bit 1 and would only work with Famicom expansion port controllers.
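To spell out what I mean, here's a small sketch of the variants. It isn't taken from either program in this thread, and buttons is just a placeholder zero page variable at an arbitrary address. One shift leaves D0 in carry, and a second shift replaces it with D1. The last variant is a common way to accept either bit, if supporting both kinds of controller is the goal.
Code:
buttons = $E2           ; hypothetical variable, address picked arbitrarily
    ; one shift: tests D0 (standard controller port)
    lda $4016
    lsr a               ; carry = D0
    rol <buttons
    ; two shifts: tests D1 only (Famicom expansion port controller)
    lda $4016
    lsr a               ; D0 is shifted out and thrown away
    lsr a               ; carry = D1
    rol <buttons
    ; either bit: carry is set if D0 or D1 is set
    lda $4016
    and #%00000011      ; keep only D0 and D1
    cmp #$01            ; carry = 1 if A >= 1, i.e. if either bit was set
    rol <buttons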