Bit reversing

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
View original topic
Bit reversing
by on (#16386)
Hi.

I've been recently looking into things about the 6502 that I never really payed attention to, like comparing if something is greater or less than something else (Yeah, I really didn't know how to do that). And it was quite apparent that I needed to, so I did. Do not fear, I understand 6502 quite well, and I program with it. I'm not a newbie, there are just some dumb things like that that I've never really payed attention to. I've been wondering about reversing a byte, though. It seems like a complicated routine, but I was wondering if there was an easier way... What I mean by reversing a byte is this:

Take this: 10010100

And Make it: 00101001

Does anyone know?

by on (#16387)
Reverse bit order? That's easy:

Code:
    STA temp
    LDX #$08
loop:
    LSR temp
    ROL A
    DEX
    BNE loop


The trick is knowing that ASL and LSR shift the "discarded" bit into the Carry flag, and ROL/ROR will shift the carry flag in at the new location.

Next question? :D

by on (#16388)
Want to make it faster? Unroll the loop:

Code:
  sta temp

  lsr temp
  rol a
  lsr temp
  rol a
  lsr temp
  rol a
  lsr temp
  rol a

  lsr temp
  rol a
  lsr temp
  rol a
  lsr temp
  rol a
  lsr temp
  rol a

Or use a lookup table. The proper method to use depends on why you want to reverse bits in a byte.

by on (#16390)
Thanks for that. I've never really been to good at understanding why some shifting methods work. I looked at a routine that Tokumaru made, it was a hex to decimal routine. I still do not understand why that works. It'd do me well to understand why, I suppose. I was wondering about this for reversing graphics with CHR RAM. As in background tiles, not sprites. I know how to do that. Thanks though!

tepples wrote:
Want to make it faster? Unroll the loop:

Code:

sta temp

lsr temp
rol a
lsr temp
rol a
lsr temp
rol a
lsr temp
rol a

lsr temp
rol a
lsr temp
rol a
lsr temp
rol a
lsr temp
rol a

Or use a lookup table. The proper method to use depends on why you want to reverse bits in a byte.


Oh, but that wastes alot of space... It's like the MMC1 bankswitching technique:

lda #$XX
sta $E000
lsr a
sta $E000
lsr a
sta $E000
lsr a
sta $E000
lsr a
sta $E000

I just do:

ldx #5
lda #$XX
-
sta $E000
lsr a
dex
bne -

I like to save space, beings as we work with such a space-unplentiful system.

by on (#16393)
We also work with a time-unplentiful system, and for things your program does often, those extra 5 cycles of dex/bne per iteration add up. And for reversing relatively big amounts of data such as tile graphics, I would strongly suggest using a lookup table.

by on (#16395)
tepples wrote:
And for reversing relatively big amounts of data such as tile graphics, I would strongly suggest using a lookup table.

This all depend if you buffer the reversed tile or if you do that in NMI.

by on (#16403)
Here's a LUT:
Code:
.db $00, $80, $40, $c0, $20, $a0, $60, $e0
.db $10, $90, $50, $d0, $30, $b0, $70, $f0
.db $08, $88, $48, $c8, $28, $a8, $68, $e8
.db $18, $98, $58, $d8, $38, $b8, $78, $f8
.db $04, $84, $44, $c4, $24, $a4, $64, $e4
.db $14, $94, $54, $d4, $34, $b4, $74, $f4
.db $0c, $8c, $4c, $cc, $2c, $ac, $6c, $ec
.db $1c, $9c, $5c, $dc, $3c, $bc, $7c, $fc
.db $02, $82, $42, $c2, $22, $a2, $62, $e2
.db $12, $92, $52, $d2, $32, $b2, $72, $f2
.db $0a, $8a, $4a, $ca, $2a, $aa, $6a, $ea
.db $1a, $9a, $5a, $da, $3a, $ba, $7a, $fa
.db $06, $86, $46, $c6, $26, $a6, $66, $e6
.db $16, $96, $56, $d6, $36, $b6, $76, $f6
.db $0e, $8e, $4e, $ce, $2e, $ae, $6e, $ee
.db $1e, $9e, $5e, $de, $3e, $be, $7e, $fe
.db $01, $81, $41, $c1, $21, $a1, $61, $e1
.db $11, $91, $51, $d1, $31, $b1, $71, $f1
.db $09, $89, $49, $c9, $29, $a9, $69, $e9
.db $19, $99, $59, $d9, $39, $b9, $79, $f9
.db $05, $85, $45, $c5, $25, $a5, $65, $e5
.db $15, $95, $55, $d5, $35, $b5, $75, $f5
.db $0d, $8d, $4d, $cd, $2d, $ad, $6d, $ed
.db $1d, $9d, $5d, $dd, $3d, $bd, $7d, $fd
.db $03, $83, $43, $c3, $23, $a3, $63, $e3
.db $13, $93, $53, $d3, $33, $b3, $73, $f3
.db $0b, $8b, $4b, $cb, $2b, $ab, $6b, $eb
.db $1b, $9b, $5b, $db, $3b, $bb, $7b, $fb
.db $07, $87, $47, $c7, $27, $a7, $67, $e7
.db $17, $97, $57, $d7, $37, $b7, $77, $f7
.db $0f, $8f, $4f, $cf, $2f, $af, $6f, $ef
.db $1f, $9f, $5f, $df, $3f, $bf, $7f, $ff

by on (#16411)
This table would go well for the super fast but super space consumming method, that you'd typically use if you want to rush reversing a byte when updating tile data during VBlank.

You'd have to be a lot bored to build this table up. Personally, I'd avoid doing such hard work unless I'm really forced to, heh. I just don't see what is so facinating in writing a table with a lot of .db and numbers after that.

by on (#16414)
I already had the table; I use it to dump Turbo Grafx-16 games which have mirrored data pins from a PC-engine ;)

by on (#16417)
Spending 256 bytes to greatly speed up a routine that is critical to a game is a good choice. If it allows avoidance of more time-consuming optimization elsewhere, it's a win. It's not like the table has to be repeated; one copy is all that's needed.

A programmer would obviously let the computer build the above bit reversing table. Even if tedium weren't an issue, building it by hand would be error-prone. But the above doesn't matter, since above we have the table ready for copy-and-paste above.

EDIT: Just to be sure I overuse that word (for whatever reason), I'll write it a few more times: above above above.... above! Back to your regularly scheduled posts.

by on (#16418)
Just to say the idea to make this style of things by hand made me sick. Unfortunately sometimes there is no choise.
And honnestly a loop that just roll shifts two values and repeat 8 times isn't really much a waste of time. In some other condition, such as loops that could repeat much more than 8 times and/or do much more work than just shifting two values, the table solution would be worthy.

by on (#16422)
I agree 100% with blargg. In the grand scheme of things, 256 bytes is nothing. Especially when you consider the time it will save in-game.


Bregalad wrote:
And honnestly a loop that just roll shifts two values and repeat 8 times isn't really much a waste of time.


For just one tile, I'd agree. But if this is going to be a common occurance, that is a major waste of processing time.

Q's loop takes 100 cycles
tepples' unrolled loop takes 59 cycles

using a lookup table with:

Code:
tax
lda table,X


takes a grand total of 6 cycles.

This is not peanuts we're talking about here... the speed difference is astounding. When you consider this process will have to be done SIXTEEN TIMES for a single tile to be flipped... we're talking major slowdowns. using Q's loop and dedicating your entire frame to flipping tiles, you'll be able to convert maybe 18-19 tiles. Whereas using a lookup table... 19 tiles can be converted in roughtly 2000 cycles (~17.5 scanlines)

by on (#16423)
This is true. It really does make a huge difference in speeds. I think the lookup table is a good idea if you need it to be fast, but if you are leaving the screen black for about 1 second (60 frames), I think you can handle saving the space. When you're making an RPG, you MUST conserve as much space as possible. Especially when you have 350+ maps, a huge world map, a ton of graphical data, and all this text and all these events going on in your game, you really don't want to dedicate 256 bytes to anything that could possibly be shrinked down to about 20 or so. Wasting space is not a good thing to do. I say go with the table if you're making a game that moves fast, but if you're making an RPG, dude, save the space, and go with the small routine.

by on (#16424)
If you're trying to conserve space in a program, optimizing tile flipping like this would be one of the last places to look (unless it's an entry to the 2K/4K competition or something). The savings is at most 256 bytes, which wouldn't be worth the time. Much better places are where the savings is multiplied by some large amount of data, like levels, graphics, or text. These are the areas where big savings can often be made as the data are rarely in the most compact form possible. Another possibility is an interpreted scripting language in place of some portions of machine code for games like Maniac Mansion where there's a significant amount of game logic that doesn't have to run continuously.

The general point is that micro-optimization can obscure the big picture and opportunities for major optimization. The most productive areas for optimization are specific to each program. A particular program might arrange data in a way that allows for a major optimization that wouldn't be relevant in any other program.

by on (#16433)
If you're gonna bit-reverse a whole bunch of tiles, you could also generate that table in RAM before doing it. :) Though I guess if you have time to do that, you're in no hurry anyways (or you have RAM to spare, if you keep it persistant).

But yeah, like blargg said, optimize where it counts. Don't sweat the little stuff. Code size (including small-ish lookup tables) is usually nothing compared to data. With 512kB, give yourself a 32kB bank for code, and probably the rest of the ROM for data.

by on (#16434)
I was thinking of 16k for code, and 512k with 16k banks, actually. But I could probably use another bank for code. And you know what, graphics won't be that big of a deal, because I won't have to redraw a single tile in another graphics data file. My game uses CHR RAM, of course, and each map has it's graphics info at the beggining, like a graphics header. It can take parts of different graphics files and store them in the pattern tables. I'll see how much space I have in the end, and I'll see which method I'll use for bit reversing...

by on (#16438)
If you're talking about FF7 (or anything on SUROM) you'd want to have at least 2 times separed 16kb ROM for main code, one fixed bank for the first chunk of 256kb, and one for the segond chunk.
Effectivly, if you're reversing a huge chunk of tiles, the table would be worth it, I haven't really think to that, but I think the other guy above is right.

You'd want to have a few routines common between both fixed banks, and swapping to either one for various department (graphics, field engine, battle engine, music, etc...)