speed optimizations vs development time

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
by on (#69931)
It has been said a gazillion times before that slow code is the result of short development time. However, I think slow code has more to do with a programmer's style than with time constraints.

Here is an example of slow code:

lda #$00
sta $2115
lda #$11
sta $2116 ;; load v-ram address port with some random number
lda #$22
sta $2117
lda #$33 ;; set up the DMA registers with random shit
sta $4300
lda #$44
sta $4301
lda #$55
sta $4302
lda #$66
sta $4303
lda #$77
sta $4304
lda #$88
sta $4305
lda #$99
sta $4306
lda #$01
sta $420b

Now here is a faster version of the same code

rep #$20 ;; 16-bit accumulator (this assumes X is still 8-bit)
ldx #$00
stx $2115
lda #$2211
sta $2116 ;; load v-ram address port with some random number
lda #$4433
sta $4300 ;; set up the DMA registers with random shit
lda #$6655
sta $4302
ldx #$77
stx $4304
lda #$9988
sta $4305
ldx #$01
stx $420b

Okay, which looks like it takes longer to write? The faster one? If developers were concerned about spending the least amount of time writing code, why did a lot of them choose the first method? Probably because it was just their style and they didn't pay attention to it.
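For a rough sense of the difference between the two listings above, the CPU cycles can be tallied from the standard 65816 timings: immediate loads take 2 cycles, absolute stores take 4, each gains one cycle when the register is 16 bits wide, and rep takes 3. This is only a sketch. It assumes an 8-bit accumulator in the first listing and 8-bit index registers in the second, and it ignores how many master clocks each cycle costs on the SNES.

```python
# CPU cycle costs for the two addressing modes used in both listings.
# A 16-bit register adds one cycle to an immediate load or an absolute store.
BASE_CYCLES = {"load_immediate": 2, "store_absolute": 4}
REP_CYCLES = 3  # rep #imm


def pair(wide):
    """Cycles for one 'load immediate, store absolute' pair."""
    extra = 1 if wide else 0
    return (BASE_CYCLES["load_immediate"] + extra
            + BASE_CYCLES["store_absolute"] + extra)


# First listing: eleven 8-bit lda/sta pairs.
slow_cycles = 11 * pair(wide=False)

# Second listing: rep, three 8-bit ldx/stx pairs, four 16-bit lda/sta pairs.
fast_cycles = REP_CYCLES + 3 * pair(wide=False) + 4 * pair(wide=True)

print(slow_cycles, fast_cycles, slow_cycles - fast_cycles)  # 66 53 13
```

Under these assumptions the second listing saves 13 cycles each time it runs, so how much that matters depends on how often the routine is called.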

You're probably thinking, "Oh that is just a little piece of code, that can't possibly cause slowdown by itself." Yes, you're right, it is a little piece of code and it isn't contributing very much to slowdown.

The thing is, when you have millions and millions of little pieces of slow code, it causes slowdown, and you can't fix it by optimizing any single one of those million little pieces of slow code by itself. You'd have to fix all of them. That takes forever and you wouldn't have had that problem if you'd just programmed fast the first time around.

If anybody here ever runs into that problem, instead of waiting till the end to nitpick millions of lines of code, a better way is to make a habit of optimizing your code as you're writing it. It's good to optimize code on autopilot, so you don't have to intentionally think about it.

by on (#69934)
I both agree and disagree with your sentiments. You need to take into consideration the experience level of the programmer. Most people tend to write assembly code that isn't optimal during the first pass (think of it as a first draft) and then go back and adjust things as needed. Hell, I've been doing 65xxx since the late 80s and I still write code this way.

You also need to keep context in mind. Your code example has no context; is this run within a loop? Within NMI? Etc. If it's only done during RESET, who cares? A perfect example is some of the first-generation SNES initialisation code that circulated during the early 90s when the SNES development scene started; it wasn't optimised, but it was an initialisation routine, so who cared? I wrote a "more optimised" version of it, but then I thought about it for a while and determined... why bother? This really isn't optimisation given its context.

It's like running around with x86 assembly changing "mov ax,0" into "xor ax,ax" and claiming you're optimising everything.

Additionally, there's the need to comment changes like the ones in your example. It should be "obvious" to anyone looking at it, but a simple comment can make all the difference when someone who didn't write it has to read it.

I'm not trying to battle with you or argue -- honest. I'm just saying that your above optimisation example isn't a very good validator of your point, given that there's no context surrounding it.

Ease up, don't judge, and remember that different people have different experience levels. :-)

by on (#69937)
koitsu wrote:
remember that different people have different experience levels. :-)

So if I'm with other programmers, how do I keep from being under-leveled?

by on (#70028)
Quote:
You also need to keep context in mind. Your code example has no context; is this run within a loop? Within NMI? Etc. If it's only done during RESET, who cares? A perfect example is some of the first-generation SNES initialisation code that circulated during the early 90s when the SNES development scene started; it wasn't optimised, but it was an initialisation routine, so who cared? I wrote a "more optimised" version of it, but then I thought about it for a while and determined... why bother? This really isn't optimisation given its context.


Are there any other examples of first-gen code that wasn't optimized? I've always wondered why early games slow down so easily, but nobody has given me a straight answer on the specifics.

What percentage of CPU time was used for different routines? Collision, metasprite building, scrolling etc.

Was there any 6502 code left over from the NES days?

Was anything written in C, or translated from 68000 ASM?

Was most of the CPU time spent on AI and physics, or was most of the CPU spent on overhead work that is done every frame regardless of how much is onscreen?

by on (#70030)
psycopathicteen wrote:
Is there any other example of first gen code that wasn't optimized?

I've read that Tetris for Game Boy is really dirty code.

Quote:
Was there any 6502 code left over from the NES days?

The logic in Super Mario All-Stars, I guess. But then taking a program meant to run at 1.8 MHz and running it at 2.7 MHz or 3.6 MHz will probably help alleviate slowdown by itself.

Quote:
Was anything written in C

Probably everything by Koei, and possibly SimCity given that Micropolis (the simulation engine in SimCity) was written in C.

by on (#70424)
Anything from Gradius 3, Super Ghouls 'n Ghosts, etc.?

I want to know how the game engines of these games worked.

by on (#70465)
I had a conversation with Neill Corlett long ago about Gradius 3 and its slowdown, and the going theory was that it was intentional, rather than hardware limitations or "too much stuff going on". If I remember right (Neill if you're lingering please correct me :-) ), Neill had done some disassembly work but wasn't able to pinpoint exactly where all the time was being spent.

by on (#70471)
The slowdown actually helps at some points in the game as it gives you more time to react when all hell is breaking loose. However sometimes the slowdown is not consistent and it can get you killed when speed suddenly returns to normal. If someone could hack Gradius 3 to reduce or remove the slowdown it would be a neat hack to play.

by on (#70472)
Yeah this was so annoying in all Gradius games (not only 3) and Parodius too.
The NES version of Parodius is just crazy; it lags almost constantly if you have any option and there is any enemy on the screen. When the game jumps from lagging 1 frame (most common) to 2 frames, or 0 frames, it causes a sudden change in gameplay speed, and this is pretty terrible to deal with.

Why Konami didn't speed up their routines or place fewer enemies on the screen when they noticed systematic lag is beyond me. You'd expect such a well-known game company to do that.

by on (#70485)
I've noticed that slowdown was mainly a problem with Konami and Capcom. Some lesser known companies (such as Natsume) never had trouble with it.

by on (#70496)
Konami wasn't known for their coding prowess, but for their game designs instead. Even their 68k arcade games (twin 68k at that!) slow down. I know on the PCE, they (Konami) have some poor optimizations in certain aspects of games (like sprite management and such), yet have clever higher-level designs (like dynamic tiles). I think it's safe to say that Konami wasn't worried about speed or slowdown. Maybe it's a Japanese developer thing? (Back then)

by on (#70505)
I don't think anyone understood what I said... or something.

The conversation I had with Neill indicated that the slowdown was intentional -- meaning the game, somewhere/somehow, was intentionally slowing the system down to increase the player's chance of survival. Again: it appears that this behaviour is **intentional**, and not a result of "sloppy coding".

by on (#70507)
Which reminds me of when I played some Gradius game on the PSP. A coworker had brought it with him to show a passage in the game that he was unable to get past (some kind of rotating obstacles, IIRC). I made it through on the first try.

At first he couldn't understand how I did it while he got killed all the time, but then he figured it out: I was pressing the fire button repeatedly even when there were no enemies on the screen - this slowed the game down sufficiently that getting through those obstacles became much easier.

by on (#70508)
koitsu wrote:
I don't think anyone understood what I said... or something.

The conversation I had with Neill indicated that the slowdown was intentional -- meaning the game, somewhere/somehow, was intentionally slowing the system down to increase the player's chance of survival. Again: it appears that this behaviour is **intentional**, and not a result of "sloppy coding".

It's probably better to get a consistent 30 FPS than something that varies randomly between 30 and 60 FPS based on CPU usage. It could be that they noticed some parts of the game were too CPU intensive and put flags in there to deliberately make it 30 FPS in those parts.

by on (#70511)
Some later games have slowdown as an ability. I've seen it in at least the Max Payne series, Amplitude, and D+Pad Hero 2. I've read about it in Enter the Matrix, where the ability is called "focus".

by on (#70512)
I think DKC has intentional slowdown every time there are more than approx 32 16x16 sprite cells onscreen, due to DMA limits. I never studied the code, but it is a possibility.

by on (#70514)
psycopathicteen wrote:
I think DKC has intentional slowdown every time there are more than approx 32 16x16 sprite cells onscreen, due to DMA limits.

Yeah, that's 4 KiB, and I've been told the limit on NTSC is 7 KiB per frame. But it could have just started skipping frames of animation instead of changing the gameplay speed.
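The 4 KiB figure can be checked with a little arithmetic, assuming 4 bits per pixel as used for SNES sprites. The 7 KiB budget below is just the number quoted in the post above, not a verified hardware limit, and the helper name is made up for this sketch.

```python
BITS_PER_PIXEL = 4                                  # SNES sprite tiles
BYTES_PER_8X8_TILE = 8 * 8 * BITS_PER_PIXEL // 8    # 32 bytes


def cell_bytes(width, height):
    """Pattern bytes for one sprite cell of the given pixel size."""
    return (width // 8) * (height // 8) * BYTES_PER_8X8_TILE


bytes_for_32_cells = 32 * cell_bytes(16, 16)
print(bytes_for_32_cells)  # 4096, i.e. 4 KiB

# Budget quoted in the thread ("I've been told"), treated as an assumption.
QUOTED_BUDGET_BYTES = 7 * 1024
print(QUOTED_BUDGET_BYTES // cell_bytes(16, 16))  # 56 cells per frame
```

If the quoted budget were accurate, about 56 cells of 16x16 could be replaced per frame, so 32 cells would fit with room to spare.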

by on (#70516)
Battletoads delayed the animation switching to the next frame when it was updating too many tiles (like what happens in 2 player mode).
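One way to picture that kind of deferral is a per-frame upload budget: requests that don't fit keep their old animation frame and are retried on the next frame. This is a hypothetical sketch, not Battletoads' actual code; the function name, the 128-byte budget and the tile counts are made up for illustration.

```python
NES_TILE_BYTES = 16  # one 8x8 tile at 2 bits per pixel


def plan_uploads(requests, budget_bytes):
    """Split (name, size) requests into those that fit this frame and the rest.

    Requests are considered in order; one that doesn't fit is deferred,
    meaning the object keeps showing its old animation frame for now.
    """
    uploaded, deferred = [], []
    remaining = budget_bytes
    for name, size in requests:
        if size <= remaining:
            uploaded.append(name)
            remaining -= size
        else:
            deferred.append((name, size))
    return uploaded, deferred


# Hypothetical numbers: two players each want 6 new tiles in the same frame.
frame1 = [("player1", 6 * NES_TILE_BYTES), ("player2", 6 * NES_TILE_BYTES)]
uploaded, deferred = plan_uploads(frame1, budget_bytes=128)
print(uploaded, [name for name, _ in deferred])  # ['player1'] ['player2']

# On the next frame the deferred request goes to the front of the queue.
uploaded, deferred = plan_uploads(deferred, budget_bytes=128)
print(uploaded, deferred)  # ['player2'] []
```

The gameplay speed stays constant in this scheme; only the animation of the deferred object lags by a frame.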

by on (#70518)
Well Battletoads (NTSC) is pretty well known for all sorts of clever tricks to get the most out of the NES. I guess on PAL it's less impressive with all the extra VBlank time.

by on (#70519)
Now that we're on the subject of animation, how do "animation slots" work? Most games don't have uniform-sized sprites. Smaller slots mean smaller objects/characters, but bigger slots mean fewer objects/characters onscreen.

by on (#70523)
In fact, aside from Battletoads, it was extremely uncommon to use such a "slot" system on the NES, as opposed to what is seen on other systems. The low ROM->VRAM bandwidth is the reason for this.
99% of games just have banks of graphics which are loaded once per level, or which are switched as animations need to change, such as in SMB3, allowing for a more detailed main character and/or more main characters.

by on (#70525)
If I was designing a VRAM system, I'd use a cache system, where stuff gets loaded in and out as needed.
If you need dynamically created graphics, mark something as locked so you can draw there, but it's still in the table.

But I only like caching systems because I made a caching system for PocketNES GBAMP, and a caching system is my new hammer, so I'm looking for nails. But I still think VRAM management makes a good nail. Caching systems are good any time you have something smaller that's limited, and something less accessible that's big.

A caching system for VRAM is nothing like a slot system, since slot systems tend to keep the locations of sprites fixed, so new graphics overwrite old graphics.

by on (#70526)
NES games that have very detailed animation going on are probably going to use CHR-ROM and, if possible, 1K CHR banks. As Bregalad said, you have too little time to use CHR-RAM and still update all the patterns you need to. Battletoads manages to pull off a lot, but it is at the top of the pack and probably pushes the limits. Most games work at a much slower pace, only loading CHR-RAM between scenes. Or, like 1943, they slowly update CHR-RAM as you progress through the level, so by the time you get to where new tiles are used, they are loaded.

Some NES games, like Crisis Force, use CHR-ROM 1K banking for some really neat effects that happen just because of changing the pattern data banked in. Like in the first level there is a vertical running trench that parallax scrolls.

On SNES with DMA you can probably afford to just DMA a lot of patterns from ROM or from WRAM (banks $7E and $7F). But you'll still have limits. You just have to decide what works best for you and your game.

by on (#70527)
Don't forget Nightmare on Elm Street. It also uses a Battletoads-like system, even with 4-player support.

by on (#70596)
Dwedit wrote:
If I was designing a VRAM system, I'd use a cache system, where stuff gets loaded in and out as needed.
If you need dynamically created graphics, mark something as locked so you can draw there, but it's still in the table.

But I only like caching systems because I made a caching system for PocketNES GBAMP, and a caching system is my new hammer, so I'm looking for nails. But I still think VRAM management makes a good nail. Caching systems are good any time you have something smaller that's limited, and something less accessible that's big.

A caching system for VRAM is nothing like a slot system, since slot systems tend to keep the locations of sprites fixed, so new graphics overwrite old graphics.


How does that work? That sounds really complicated.

by on (#70599)
psycopathicteen wrote:
How does that work? That sounds really complicated.


Fun thing is though that this algorithm was originally designed for page replacement for processor caches, or for paging for virtual memory, but it can be used in many different places too.

So let's say it's just a way of having a huge number of "virtual" tile numbers from some expansive bank of ROM, and far less "physical" tile slots to put them in.
You could do it for the background, so a background could be specified just using a wide range of virtual tile numbers, then they are loaded and unloaded as needed when they become physical tile numbers.
Or you could do it for sprites too.

Data structures used:
* Mapping between virtual tile numbers => physical tile numbers. A lookup or hash table works here, but pure lookup tables might be too big. Needs to be fast, because there are a ton of lookups.
* Mapping between physical tiles => virtual tile numbers. It's so that you know, for each physical tile, which virtual tile is mapped in there.
* Reference Count or "Locked" flag for physical tiles so you know which ones are in use, so they shouldn't be expired.
* If you want, a "this was used recently" bit to give a tile a second chance before it's replaced.
* A cursor (just an integer) telling which tile gets replaced next.

Operations:
* The only real operations are Fetch, Lock/Increment reference, and Unlock/Decrement reference.
* Fetch: You request a given tile, and it returns where it is, or replaces something so you can get it. If you use the second chance bit, indicate that it gets a second chance.
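* Replace: Called from fetch. Starting at the cursor, skip everything that's locked. If a tile has a second chance, it loses that and gets skipped. The first tile that is neither locked nor has a second chance is the one replaced.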
* Pre-Load: Maybe you want stuff to be in the cache even though it isn't currently being used yet.

How to use it for backgrounds:
* When tiles go in (exposing new area for scrolling), check if the tile is in the cache (fetching it if not) and increment its reference count.
* When tiles go out, decrement its reference count.
* For animated tiles, you have two choices, either animate the physical tiles and leave the virtual tiles constant, or change the virtual tiles on the map.

How to use it for sprites:
* Look up everything before you do any replacements, so you don't throw out actively used stuff.
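The data structures and operations listed above resemble the clock (second-chance) page replacement scheme, and can be sketched roughly as follows. This is a minimal illustration, not the PocketNES implementation; the class name, method names and the uploads list (standing in for real VRAM copies) are all assumptions made for the example.

```python
class TileCache:
    """Maps many virtual tile numbers onto a few physical tile slots."""

    def __init__(self, num_physical):
        self.virt_to_phys = {}                     # virtual tile -> physical slot
        self.phys_to_virt = [None] * num_physical  # physical slot -> virtual tile
        self.refcount = [0] * num_physical         # > 0 means locked
        self.recent = [False] * num_physical       # "second chance" bit
        self.cursor = 0                            # next replacement candidate
        self.uploads = []                          # (virtual, physical) copies to make

    def fetch(self, virt):
        """Return the physical slot for a virtual tile, loading it if needed."""
        slot = self.virt_to_phys.get(virt)
        if slot is None:
            slot = self._replace()
            old = self.phys_to_virt[slot]
            if old is not None:
                del self.virt_to_phys[old]
            self.phys_to_virt[slot] = virt
            self.virt_to_phys[virt] = slot
            self.uploads.append((virt, slot))      # a real engine would queue a VRAM copy
        self.recent[slot] = True                   # used recently: grant a second chance
        return slot

    def _replace(self):
        """Pick a victim slot: skip locked slots, clear second-chance bits."""
        n = len(self.phys_to_virt)
        for _ in range(2 * n):                     # first sweep may only clear bits
            slot = self.cursor
            self.cursor = (self.cursor + 1) % n
            if self.refcount[slot] > 0:
                continue
            if self.recent[slot]:
                self.recent[slot] = False
                continue
            return slot
        raise RuntimeError("every physical tile is locked")

    def lock(self, virt):
        """Fetch a tile and increment its reference count."""
        slot = self.fetch(virt)
        self.refcount[slot] += 1
        return slot

    def unlock(self, virt):
        """Decrement the reference count of a tile that is currently mapped."""
        slot = self.virt_to_phys[virt]
        assert self.refcount[slot] > 0
        self.refcount[slot] -= 1


cache = TileCache(num_physical=2)
print(cache.lock(100))   # 0: loaded into the first free slot and locked
print(cache.fetch(200))  # 1: loaded into the second slot
print(cache.fetch(300))  # 1: slot 0 is locked, so tile 200 is evicted
print(cache.fetch(100))  # 0: already resident, no new upload
```

For sprites, one option is to lock every tile needed for the current frame before fetching anything new and unlock them afterwards, so that a replacement never throws out a tile that is about to be drawn.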

by on (#71374)
I just thought up a fairly simple slot-based animation system that would work well on the SNES. Divide the sprite pattern table into 16 slots of 8 16x16 sprites. Any enemy or object can look for one or several unused slots, use them, and retire them when done. Slots can be used for real-time animation updates for enemies as big as 32x64, or slots can be used to hold several needed animation frames for smaller characters and objects that are used within a specific area.
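A rough sketch of how such a slot table might be tracked, assuming the layout described above (16 slots of eight 16x16 cells each). The class, method and owner names are hypothetical, and the sketch only tracks ownership; it does not touch VRAM.

```python
class SlotTable:
    NUM_SLOTS = 16
    CELLS_PER_SLOT = 8  # each cell is one 16x16 sprite

    def __init__(self):
        self.owner = [None] * self.NUM_SLOTS  # None means the slot is unused

    def acquire(self, owner, count=1):
        """Claim `count` unused slots for `owner`; return their indices or None."""
        free = [i for i, o in enumerate(self.owner) if o is None]
        if len(free) < count:
            return None
        taken = free[:count]
        for i in taken:
            self.owner[i] = owner
        return taken

    def retire(self, owner):
        """Release every slot held by `owner`."""
        for i, o in enumerate(self.owner):
            if o == owner:
                self.owner[i] = None


def slots_needed(width, height, frames=1):
    """Slots required to hold `frames` animation frames of a width x height object."""
    cells = (width // 16) * (height // 16) * frames
    return -(-cells // SlotTable.CELLS_PER_SLOT)  # ceiling division


table = SlotTable()
print(slots_needed(32, 64))            # 1: a 32x64 enemy is exactly 8 cells
print(slots_needed(16, 16, frames=6))  # 1: six frames of a small object share a slot
print(table.acquire("boss", slots_needed(64, 64)))  # [0, 1]
print(table.acquire("coin"))                        # [2]
table.retire("boss")
print(table.acquire("bat"))                         # [0]
```

A 32x64 enemy fills one slot exactly, which fits the "real-time update" case, while a small object can keep several frames resident in a single slot.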