Minimizing expansion of CHR converted for real-time updates

by tokumaru on 2012-08-22 (#98448)

In order to maximize the number of patterns I can update during VBlank, I thought of converting the graphics to a series of LDA #$XX; STA $2007 followed by an RTS, which would send tiles to VRAM the fastest way possible (6 cycles per byte: 2 for the LDA, 4 for the STA). Of course this has the inconvenient side effect of expanding the data by a factor of 5 (2 bytes for the LDA plus 3 for the STA, per byte of CHR). In order to minimize that, I implemented a simple compression scheme that uses a 3-byte dictionary (the dictionary is stored in A, X and Y), which usually reduces the expansion to about 4 times rather than 5. Still incredibly huge, though. Can anyone think of better ways to minimize that expansion?
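A minimal sketch of the naive conversion, written as a hypothetical offline tool in Python (the function and constant names are made up for illustration): it emits the LDA #imm / STA $2007 pairs for a tile and totals the size and cycle cost, which makes the 5x expansion concrete.

```python
# Hypothetical converter: raw CHR bytes -> unrolled 6502 chain of
# LDA #imm / STA $2007 pairs ending in RTS, plus its size and timing.
LDA_IMM_BYTES, LDA_IMM_CYCLES = 2, 2
STA_ABS_BYTES, STA_ABS_CYCLES = 3, 4
RTS_BYTES, RTS_CYCLES = 1, 6

def naive_chain(chr_bytes):
    """Return (assembly lines, code size in bytes, cycles excluding the JSR)."""
    lines = []
    for value in chr_bytes:
        lines.append(f"LDA #${value:02X}")
        lines.append("STA $2007")
    lines.append("RTS")
    size = len(chr_bytes) * (LDA_IMM_BYTES + STA_ABS_BYTES) + RTS_BYTES
    cycles = len(chr_bytes) * (LDA_IMM_CYCLES + STA_ABS_CYCLES) + RTS_CYCLES
    return lines, size, cycles

tile = bytes(range(16))  # one 8x8 tile is 16 bytes of CHR data
lines, size, cycles = naive_chain(tile)
print(size, cycles)  # prints: 81 102
```

For one 16-byte tile that is 81 bytes of code and 102 cycles once the RTS is included, versus 16 bytes of plain CHR.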

I imagine that I could reuse longer strings if I made them separate subroutines that could be called as many times as necessary, or maybe do a more advanced analysis of the data and generate algorithms that would produce the desired output (this sounds too complex!). One more thing to worry about is that the generated code can't take longer than a certain threshold to execute, since it has to fit in VBlank along with other tasks, so the converter would have to count cycles and use that information to make decisions.

Obviously, I wouldn't use this for all the graphics, just for the ones that need to be animated more frequently, such as the main character. Other graphics can just be loaded with indexed absolute addressing or even the slow LDA ($XX), Y; STA $2007; INY way.

I'd like to point out that I'm aware of other methods of uploading data to VRAM, such as pulling data from the stack (8 cycles per byte) or loading from ZP (7 cycles per byte), but these are still too slow and require a good chunk of RAM. The only way to achieve 6 cycles per byte is with immediate addressing, and without WRAM, the code that writes the data does have to be in the ROM.
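The per-byte figures above can be tabulated directly. This is just arithmetic over the standard 6502 cycle counts, deliberately ignoring loop overhead; the labels are illustrative only, and the indirect indexed case assumes no page crossing.

```python
# Per-byte cost of each upload method: the load plus STA $2007 (4 cycles).
# Loop/branch overhead is ignored; LDA (zp),Y assumes no page crossing
# (a page crossing would add 1 cycle).
STA_2007 = 4
LOAD_CYCLES = {
    "LDA #imm": 2,          # data embedded in the code itself
    "LDA zp": 3,            # data copied to zero page first
    "PLA": 4,               # data copied to the stack page first
    "LDA (zp),Y + INY": 7,  # 5 + 2, the slow pointer-based way
}
per_byte = {name: load + STA_2007 for name, load in LOAD_CYCLES.items()}
for name, cost in per_byte.items():
    print(f"{name:17} {cost:2} cycles/byte, {256 * cost:5} cycles for 256 bytes")
```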

Re: Minimizing expansion of CHR converted for real-time upda
by Bregalad on 2012-08-22 (#98453)

Why not generate the long lda #$xx / sta $2007 chain in RAM or WRAM in real time? Of course, if you plan to do this while an actual game is playing, WRAM will pretty much be required, but for an intro plain RAM could be enough: there is just enough space in $300-$7ff to store 256 "transfers".

If I remember correctly, I already did this once just to try, and transferring 256 bytes (16 tiles) in a single VBlank plus sprite DMA was no problem (on NTSC; on PAL it would be no problem even with a fully rolled loop).
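A rough check of that claim, assuming 20 VBlank scanlines on NTSC and 70 on PAL, 341 PPU dots per scanline, 3 dots per CPU cycle on NTSC and 3.2 on PAL, and a 513-cycle sprite DMA. The "rolled" cost of about 13 cycles per byte is an assumed LDA abs,X / STA $2007 / INX / BNE loop, not code from the thread, and NMI entry overhead is ignored.

```python
# Rough VBlank budget check in CPU cycles (assumed figures, see above).
DOTS_PER_SCANLINE = 341
VBLANK_SCANLINES = {"NTSC": 20, "PAL": 70}
DOTS_PER_CPU_CYCLE = {"NTSC": 3.0, "PAL": 3.2}
OAM_DMA = 513  # sprite DMA; 514 if it starts on an odd CPU cycle

def vblank_cycles(region):
    return VBLANK_SCANLINES[region] * DOTS_PER_SCANLINE / DOTS_PER_CPU_CYCLE[region]

unrolled = 256 * 6 + 12  # LDA #imm / STA $2007 chain, plus JSR and RTS
rolled = 256 * 13        # about 13 cycles/byte for LDA abs,X / STA / INX / BNE

for region in ("NTSC", "PAL"):
    budget = vblank_cycles(region)
    print(region, int(budget),
          "unrolled fits:", unrolled + OAM_DMA <= budget,
          "rolled fits:", rolled + OAM_DMA <= budget)
```

Under these assumptions the unrolled chain plus sprite DMA takes about 2061 of roughly 2273 NTSC cycles, while the rolled loop only fits in PAL's much longer VBlank.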

Re: Minimizing expansion of CHR converted for real-time upda
by rainwarrior on 2012-08-22 (#98465)

Battletoads does a lot of its CHR-RAM updates this way with code copied to RAM. Obviously you can't practically max out your usage without WRAM, but it could be worth devoting some RAM to it.

Re: Minimizing expansion of CHR converted for real-time upda
by tokumaru on 2012-08-22 (#98467)

Yeah, the obvious solution would be to use WRAM, but that's not always available. I never use WRAM, I'm not sure why. Maybe it's to keep the cost down if I decide to make carts, or even just to see how far the NES can go without such extensions. Without WRAM, there would be no memory left for the actual game.

Bregalad wrote:

if I remember well transferring 256 bytes (16 tiles) in a single VBlank + doing sprite DMA was no problem

Exactly. My game is supposed to execute 2 VRAM updates + sprite DMA every frame, and if both slots are occupied by pattern updates, 256 bytes of CHR (16 tiles) will be transferred. That won't happen so often, since other updates (rows and columns of tiles, palettes, etc.) also have to use those slots.

I know that this is a crazy solution, and I know that the amount of ROM required is a big price to pay. I'm just trying to reduce that cost a bit, I'm not looking for anything miraculous that will make the quick CHR code occupy the same space as regular CHR data.

Re: Minimizing expansion of CHR converted for real-time upda
by 3gengames on 2012-08-22 (#98468)

32KB of WRAM is IMO the best answer; even 8KB or 16KB of it can hold a ton of unrolled code!

And then your game has a ton more RAM to work with in general.

Re: Minimizing expansion of CHR converted for real-time upda
by tokumaru on 2012-08-22 (#98469)

rainwarrior wrote:

but it could be worth devoting some RAM to it.

With just 2KB of RAM, maybe dedicating 128 or 256 bytes to CHR data would be realistic (although in my case I really can't spare that much), but that will only allow you to copy as fast as 7 cycles per byte (if you put the data in ZP, otherwise it would take 8 cycles per byte and there would be no advantage at all). To go as fast as 6 cycles per byte you need 5 times the space, which would be prohibitive with such little RAM.

I'm just looking for a way to make the expanded code a little smaller. If I can't find a way, I'll still use the simple 3-byte dictionary compression I already implemented, and keep the amount of tiles stored this way to a minimum (i.e. just the main character and some animated level objects).

EDIT: Just in case this isn't clear: for this particular game, I'm not using WRAM. I will take the data expanded 5 times rather than using WRAM.

Re: Minimizing expansion of CHR converted for real-time upda
by thefox on 2012-08-22 (#98470)

Well, here's a pipe dream: wait for somebody to come up with a super mapper that allows one to upload PPU updates to the mapper (FPGA blockram mayhaps), and then have the mapper generate the LDA #imm STA pairs on the fly.

Re: Minimizing expansion of CHR converted for real-time upda
by tepples on 2012-08-22 (#98476)

A proper "super mapper" would take full control over the CHR bus and implement what kevtris calls a "stuffer": a FIFO of (address, data) pairs to execute on cycles when the PPU isn't doing anything that matters, such as the fetches for the 34th tile on a row or the nametable fetches between the sprite pattern fetches.

But it'd almost be easier to use mapper 119, which puts CHR ROM in banks 0-63 for the main character and CHR RAM in banks 64-71 for everything else.

Re: Minimizing expansion of CHR converted for real-time upda
by tokumaru on 2012-08-22 (#98478)

tepples wrote:

But it'd almost be easier to use mapper 119, which puts CHR ROM in banks 0-63 for the main character and CHR RAM in banks 64-71 for everything else.

I could very well use the MMC3 (which is common as hell) with CHR-ROM and not have to worry about updating patterns at all. But I still want to see how well the NES can perform when using the same style of pattern updates that's common on other platforms which don't have the possibility of using ROM for storing tiles.

The Master System is an 8-bit console that does extremely well in this area. Since it doesn't have sprite flipping on hardware, it's pretty much mandatory that the patterns are constantly rewritten. Most games animate the main character this way, and a good number of them animate other objects as well.

On the NES this is less common. Sprite flipping makes it possible to keep all animation frames loaded at all times on simpler games, and more complex games usually went with CHR-ROM and bankswitching. The few games that made heavy use of CHR-RAM were either PAL-only or resorted to forced blanking in order to be able to transfer the amount of data necessary for the animations.

According to my calculations, the NES can handle quite a lot of pattern animation without the need for forced blanking, but the speed of 6 cycles per byte would really help.

Re: Minimizing expansion of CHR converted for real-time upda
by Bregalad on 2012-08-23 (#98519)

You're saying you have 2 slots for VRAM updates, and that both can point to palettes, NT/AT or patterns?
It might sound like a silly question, but...
Would your game still work if only one of those slots could update patterns? That way the "worst case" you're thinking about would never happen.

Why would you update 16 tiles in one frame and 0 in the next when you could update 8 tiles in both frames?

Nevertheless, if you have to do it the way you said, even though it wastes a ridiculous amount of ROM, I'd say it's an interesting idea.

I think it could be optimized by doing the following:
- Load the 2 most commonly used bytes into X and Y at the start of the code, and never touch them again. This will save some lda #$xx instructions.
- When there are runs of identical bytes, use only a single lda #$xx.

However, this won't reduce the size of your code much, just a little.
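Bregalad's two suggestions can be sketched as an offline generator (hypothetical names, Python): X and Y are pinned to the two most frequent values, and A is reloaded only when the value changes, so a run costs a single LDA. The byte/cycle table in `size_and_cycles` covers only the instruction forms this particular generator emits.

```python
from collections import Counter

def static_xy_chain(data):
    """X/Y hold the two most common bytes for the whole chain; A is
    reloaded only when the needed value changes (runs cost one LDA)."""
    common = [value for value, _ in Counter(data).most_common(2)]
    pinned = dict(zip("XY", common))
    lines = [f"LD{reg} #${value:02X}" for reg, value in pinned.items()]
    a = None  # value currently in A, unknown at the start
    for value in data:
        reg = next((r for r, v in pinned.items() if v == value), None)
        if reg is not None:
            lines.append(f"ST{reg} $2007")
            continue
        if value != a:
            lines.append(f"LDA #${value:02X}")
            a = value
        lines.append("STA $2007")
    lines.append("RTS")
    return lines

def size_and_cycles(lines):
    """(bytes, cycles) for LDx #imm, STx $2007 and RTS lines."""
    cost = {"LD": (2, 2), "ST": (3, 4), "RT": (1, 6)}
    pairs = [cost[line[:2]] for line in lines]
    return sum(b for b, _ in pairs), sum(c for _, c in pairs)
```

For example, the 8 bytes $00 $00 $7E $7E $42 $00 $81 $81 come out as 33 bytes and 46 cycles, against 41 bytes and 54 cycles for the plain chain.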

Re: Minimizing expansion of CHR converted for real-time upda
by tokumaru on 2012-08-23 (#98525)

Bregalad wrote:

You're saying you have 2 slots for VRAM updates, and that both can point to palettes, NT/AT or Patterns ?

Yes, there can be 2 pattern updates or 2 NT/AT updates in the same frame, but this is not necessary for the palette, since it's completely overwritten (i.e. once a slot has been allocated for a palette update it doesn't matter how many objects modify the palette, it will all be updated at once).

Quote:

It might sound like a silly question, but...
Would your game still work if only one of those slots could update patterns ?

It would work, but since the main character is practically always changing tiles it would be kinda hard to animate anything else. I'm afraid that some animations would become noticeably laggy.

Quote:

Why would you update 16 tiles in a frame to update 0 on the next when you could update 8 tiles on both frames ?

The idea is indeed to update 8 tiles per frame most of the time, while the other slot is used for other things. If the other slot is free however, using it for pattern updates will help keep the animations flowing smoothly, because there might be frames where none of the slots are used for patterns (such as when scrolling diagonally, when one slot is used for updating rows and the other for updating columns - scrolling updates really can't be delayed, because the visual glitches are too noticeable).

Quote:

Nevertheless, if you have to do it the way you said, even though it wastes a ridiculous amount of ROM, I'd say it's an interesting idea.

Yeah, that's what I think. Every unconventional programming solution has its disadvantages, and in this case the downside is the huge amount of ROM it costs to store the graphics. But I can live with that if only some of the graphics use this technique.

Quote:

I think it could be optimized by doing the following:
- Load the 2 most commonly used bytes into X and Y at the start of the code, and never touch them again. This will save some lda #$xx instructions.
- When there are runs of identical bytes, use only a single lda #$xx.

I'm doing something similar, but instead of keeping two values loaded at all times I'm throwing away the value that will take the longest to be used again when a new value has to be loaded. I might try your method and see which one is better.

Re: Minimizing expansion of CHR converted for real-time upda
by tepples on 2012-08-23 (#98526)

Consoles where this was widespread had either DMA to VRAM during vblank or dual-ported VRAM or both. The SMS and Genesis in particular have pseudo-dual-ported VRAM implemented using what kevtris calls a "stuffer". It has two different VRAM addresses instead of the NES PPU's single loopy_v: one used for rendering and one used by the host. Games write to VRAM through the host port, and the VDP holds on to the address and data until the next idle cycle in the scanline.

Now consider how this could be simulated on the NES. There are 341 dots, and it takes about 24 dots to add one item to the FIFO. Thus the FIFO would need to be about 14 units deep. A mapper simulating this would have plenty of time to empty the FIFO. Out of 170 video memory reads per scanline, the data sent to the PPU doesn't matter for 22 reads that occur in or near horizontal blanking: four for the thirty-fourth background tile, two garbage nametable fetches before each of the eight sprite pattern fetches (16 in total), and two at the end of the scanline. So a mapper that controls all 13 CHR RAM address lines could sit between the PPU address bus and the CHR RAM, watch the nametable access pattern, and execute writes from the FIFO at those times.
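The numbers above can be checked with a few lines of arithmetic. The 24-dots-per-write figure is tepples' estimate and is simply taken as given here.

```python
DOTS_PER_SCANLINE = 341
DOTS_PER_FIFO_WRITE = 24           # estimate: CPU time to queue one (address, data) pair
PPU_READS_PER_SCANLINE = 340 // 2  # each fetch takes 2 dots; dot 0 is idle

fifo_writes_per_line = DOTS_PER_SCANLINE / DOTS_PER_FIFO_WRITE  # about 14.2
idle_reads = (
    4        # the unused 34th background tile (NT, AT, pattern low, pattern high)
    + 2 * 8  # two garbage nametable fetches for each of the 8 sprite slots
    + 2      # the two nametable fetches at the end of the scanline
)
print(f"{fifo_writes_per_line:.1f} writes queued per line, "
      f"{idle_reads} of {PPU_READS_PER_SCANLINE} reads free")
```

Since more idle reads are available per scanline (22) than the CPU can queue (about 14), the FIFO drains faster than it fills.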

tokumaru wrote:

throwing away the value that will take the longest to be used again when a new value has to be loaded

The problem of optimizing the use of three registers is equivalent to cache algorithms. What you describe is the clairvoyant algorithm, which can't run in real time but is optimal when run offline: "When a [value] needs to be swapped in, the [compiler] swaps out the [value] whose next use will occur farthest in the future." I wonder to what extent you can save bytes by planning out which values can be calculated with ASL/LSR/ROL/ROR (and thus kept in A) or with DEX/INX/DEY/INY (and thus kept in X or Y).
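A sketch of that clairvoyant policy applied to the three registers, as a hypothetical offline generator in Python (not tokumaru's actual converter). `run_chain` replays the output so the round trip can be checked.

```python
def next_use(data, value, start):
    """Index of the next occurrence of `value` at or after `start`,
    or len(data) if it is never needed again (or the register is empty)."""
    try:
        return data.index(value, start)
    except ValueError:
        return len(data)

def belady_chain(data, registers="AXY"):
    """When no register holds the next byte, overwrite the register whose
    value is needed farthest in the future: Belady's replacement policy."""
    data = list(data)
    held = {r: None for r in registers}
    lines = []
    for i, value in enumerate(data):
        reg = next((r for r in registers if held[r] == value), None)
        if reg is None:
            reg = max(registers, key=lambda r: next_use(data, held[r], i + 1))
            held[reg] = value
            lines.append(f"LD{reg} #${value:02X}")
        lines.append(f"ST{reg} $2007")
    lines.append("RTS")
    return lines

def run_chain(lines):
    """Replay LDx/STx lines and return the bytes written to $2007."""
    regs, out = {}, []
    for line in lines:
        if line.startswith("LD"):
            regs[line[2]] = int(line[6:], 16)
        elif line.startswith("ST"):
            out.append(regs[line[2]])
    return out
```

Since every load costs the same 2 cycles, minimizing the number of loads this way also minimizes cycles; it does not try to exploit INX/DEX/ASL/LSR.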

Re: Minimizing expansion of CHR converted for real-time upda
by Bregalad on 2012-08-23 (#98529)

Quote:

It would work, but since the main character is practically always changing tiles it would be kinda hard to animate anything else. I'm afraid that some animations would become noticeably laggy.

Then give the main character's animation update higher priority.

Does the player change animation frame every frame? Very unlikely: even with very detailed graphics, each frame of animation will last at least 4 hardware frames. So other updates would be delayed by a single frame at most 1/4 of the time, which I think is affordable and won't be that noticeable. That is, unless you use rewritable patterns for all enemies/whatever, but that would not be a good idea on the NES anyway.

Re: Minimizing expansion of CHR converted for real-time upda
by cpow on 2012-08-23 (#98530)

tepples wrote:

I wonder to what extent you can save bytes by planning out which values can be calculated with ASL/LSR/ROL/ROR (and thus kept in A) or with DEX/INX/DEY/INY (and thus kept in X or Y).

Good god this sounds like an extremely interesting technical challenge! So much more so than the fire-drill remote debugging drudgery I'm currently enslaved in at work.

Given any set of bytes (tiles or NT/AT), what is the shortest NES 6502 code segment (bytes *and* cycles) that can achieve copying said set to the PPU? Hah!

I can imagine the 'compiler' would take several phases:

1. Analyze for RLEable groups.
2. Analyze for "nearest neighbor groupings" in the (-2,+2) domain, since more than two INX/DEX might as well just use LDX.
3. Analyze for "nearest neighbor groupings" in the ASL/LSR and ROL/ROR domain.
4. Analyze for "repeat patterns (chains of bytes longer than 7 bytes perhaps?) that don't fit anything already found" that can be written using a loop and a table of $2006 values for starting place.
5. Emit optimized code.
6. $?

Obviously, though, since it has to be lossless there's always the tileset that's going to result in 0% compression over the original suggestion.

Re: Minimizing expansion of CHR converted for real-time upda
by tepples on 2012-08-23 (#98532)

cpow wrote:

2. Analyze for "nearest neighbor groupings" in the (-2,+2) domain, since more than two INX/DEX might as well just use LDX.

As I understand the problem that tokumaru stated, we're trying to minimize cycles first and then break ties by minimizing bytes. Naive application of the clairvoyant method will already minimize cycles and give an upper bound for bytes. A single INX, DEX, INY, DEY, ASL, LSR, ROL, or ROR can save one byte and zero cycles over LDX #, LDY #, or LDA #; more than one will waste cycles.

Re: Minimizing expansion of CHR converted for real-time upda
by tokumaru on 2012-08-23 (#98534)

tepples wrote:

What you describe is the clairvoyant algorithm, which can't run in real time but is optimal when run offline: "When a [value] needs to be swapped in, the [compiler] swaps out the [value] whose next use will occur farthest in the future."

Yeah, that's exactly it!

Quote:

I wonder to what extent you can save bytes by planning out which values can be calculated with ASL/LSR/ROL/ROR (and thus kept in A) or with DEX/INX/DEY/INY (and thus kept in X or Y).

That's a pretty good idea! The difficult part would be deciding whether to use the accumulator or an index register to load a new value, since you'd have to predict what kind of modifications that value would go through in the future.

Bregalad wrote:

Then give the main character's animation update higher priority.

But then the background animations might feel laggy.

Quote:

Does the player change frame every frame ? No, very unlikely, even if you have very detailed graphics frames of animation will last at least 4 hardware frames.

Certainly not every frame, but some effects (quick attacks, visual deformations, etc.) would need new graphics every other frame.

Quote:

Unless you use re-writable patterns for all enemies/whatever, but that would not be a good idea on the NES anyway.

In addition to the main character, I have to animate waterfalls, flowers, power ups, and other background objects. These will probably animate at a steady rate of 15 frames per second, and since there could easily be 4 of these animated items in a level it would be tough if everything had to share 1 slot. Still, I don't have the RAM to hold patterns, so for me this isn't an option anyway.

tepples wrote:

As I understand the problem that tokumaru stated, we're trying to minimize cycles first and then break ties by minimizing bytes.

My rule is that the "compressed" code can't take longer to execute than the raw LDA STA chain. I'm considering making the encoder count how many cycles it saves by not loading values that were already in one of the registers, and use those cycles for extra compression (such as using subroutines for longer repeated patterns). JSR + RTS takes 12 cycles, so only after avoiding 6 load operations (2 cycles each) would I be able to call a subroutine once (obviously, any savings inside the subroutine will be counted as many times as the routine is called).
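The bookkeeping described above can be sketched as a small check (hypothetical names; standard 6502 timings: JSR 6 cycles and 3 bytes, RTS 6 cycles and 1 byte, each avoided immediate load banks 2 cycles). It also shows that a chunk used only once can never save bytes.

```python
JSR_CYCLES, RTS_CYCLES = 6, 6
JSR_BYTES, RTS_BYTES = 3, 1
LOAD_IMM_CYCLES = 2  # what each avoided LDA/LDX/LDY #imm banks

def subroutine_pays_off(body_bytes, calls, banked_cycles):
    """Should a chunk repeated `calls` times become a subroutine?
    body_bytes: inline size of one copy of the chunk.
    banked_cycles: cycles saved so far versus the raw LDA/STA chain
    (savings inside the chunk already counted once per call).
    Returns (worth_it, bytes_saved, cycles_left_in_budget)."""
    inline_bytes = calls * body_bytes
    sub_bytes = body_bytes + RTS_BYTES + calls * JSR_BYTES
    bytes_saved = inline_bytes - sub_bytes
    cycles_left = banked_cycles - calls * (JSR_CYCLES + RTS_CYCLES)
    return bytes_saved > 0 and cycles_left >= 0, bytes_saved, cycles_left

print(6 * LOAD_IMM_CYCLES)               # 12: six avoided loads pay for one call
print(subroutine_pays_off(25, 2, 24))    # (True, 18, 0)
print(subroutine_pays_off(25, 2, 12))    # (False, 18, -12)
```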

Re: Minimizing expansion of CHR converted for real-time upda
by infiniteneslives on 2012-08-23 (#98538)

So what about running with 32KB of VRAM? There is no added cost, because a 32KB SRAM is usually cheaper than an 8KB one. There weren't any NES boards with this configuration, but it's simple enough to just use SRAM instead of ROM. Then you could bankswitch for your animations and still not have tiles stored in ROM.

In regards to the super mapper wouldn't dual ported memory end up being a better and easier route?

Re: Minimizing expansion of CHR converted for real-time upda
by tepples on 2012-08-23 (#98539)

How 32 KiB CHR RAM would work depends on what CHR bank size you were thinking of. 1 KiB? 2? 4? 8?

As for dual ported memory, what part number were you thinking of?

Re: Minimizing expansion of CHR converted for real-time upda
by Shiru on 2012-08-23 (#98540)

A very common use for VRAM updates is the main character graphics, as there is only one copy of the main character on the screen. VRAM bankswitching won't help with this, as 8 frames would not be enough for the main character's animation, and it may in fact make things more complicated if other sprite animation is needed along with the main character animation.

Re: Minimizing expansion of CHR converted for real-time upda
by tokumaru on 2012-08-23 (#98541)

infiniteneslives wrote:

The extra memory may not add to the cost, but a mapper that can bankswitch CHR certainly will. The mapper I'm using is just 1 discrete logic chip, and I don't plan on changing that.

I'm familiar with these other solutions some of you have been suggesting, and they are not what I'm looking for, I'm just trying to slightly improve the one I'm already using (and if I can't improve it, I'll not feel bad about using it as it is). I know this scheme is crazy and most of you would rather use any of the more conventional ways of handling CHR animations, but I think that the advantages of the technique I'm using outweigh the disadvantages in this case.

Re: Minimizing expansion of CHR converted for real-time upda
by lidnariq on 2012-08-23 (#98542)

tokumaru wrote:

The extra memory may not add to the cost, but a mapper that can bankswitch CHR certainly will. The mapper I'm using is just 1 discrete logic chip, and I don't plan on changing that.

With just one IC, I'm guessing you're using a 74'161. I don't know how suppliers work in Brazil, but at least in the US, Mouser is selling 74'377s for 20c/1 and 74'161s for 25c/1, so it is likely that you could add CHR banking if you wanted at no incremental cost.

Re: Minimizing expansion of CHR converted for real-time upda
by infiniteneslives on 2012-08-23 (#98543)

You could still have 32KB of VRAM with a single-chip discrete mapper, but you'd be limited to 8KB bank sizes. Super GNROM style, I guess.

Tepples: we're kinda straying from the topic, but... I was thinking MMC3, so you'd get 1KB/2KB banks, but if you're designing the mapper yourself you could do whatever you please. As for part number, I'm referring to the Lattice MachXO2 CPLD (1200HC) with 7Kb of dual-ported RAM. It could be your mapper, SRAM, save flash and synth, with MMC5-ish capabilities, all for $6-7. There would be some other added costs, but not much more than $1 depending.

Re: Minimizing expansion of CHR converted for real-time upda
by tokumaru on 2012-08-23 (#98544)

lidnariq wrote:

With just one IC, I'm guessing you're using a 74'161.

Yup.

Quote:

I don't know how suppliers work in Brazil, but at least in the US, Mouser is selling 74'377s for 20c/1 and 74'161s for 25c/1

I just checked and I can get the '161 for about the same price and the '377 for twice the price at Mouser.

Quote:

so it is likely that you could add CHR banking if you wanted at no incremental cost.

But not fine CHR banking, which is the kind that's useful for CHR animations. Swapping the whole 8KB or even 4KB is useless for this purpose. Background + main character animation is only feasible with MMC3-class mappers.

Re: Minimizing expansion of CHR converted for real-time upda
by Bregalad on 2012-08-23 (#98546)

Quote:

But then the background animations might feel laggy.

I seriously doubt it.
If, for example, you're animating water and rewrite its tiles every 10 frames, it would sometimes be delayed by a frame if the main character had been updated on the frame where the water was supposed to animate.
So instead of having a pattern like:
10 - 10 - 10 - 10 - 10 - 10 - 10 - 10 - ....

We would have something like:
10 - 10 - 11 - 9 - 10 - 10 - 10 - 11 - 9 - 10 - ....

I doubt any human being would notice the difference.

Of course, if you're animating the water in a 2-frame sequence it's another story, but who really does this?
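A toy simulation of this scenario, assuming a single pattern slot per frame, player updates that always win, and water that keeps its original 10-frame schedule after a slip (all assumptions for illustration, with hypothetical names):

```python
def water_intervals(period, player_frames, total_frames):
    """Frames between water updates when one pattern slot per frame is
    shared and the player's animation always wins. A delayed water update
    slips to the next free frame, but the schedule itself is not shifted."""
    player = set(player_frames)
    updates, due = [], 0
    for frame in range(total_frames):
        if frame >= due and frame not in player:
            updates.append(frame)
            due += period
    return [later - earlier for earlier, later in zip(updates, updates[1:])]

print(water_intervals(10, [20, 70], 100))  # [10, 11, 9, 10, 10, 10, 11, 9, 10]
```

Each collision with a player update turns one 10-frame gap into an 11 followed by a 9, so the average rate is unchanged.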

Re: Minimizing expansion of CHR converted for real-time upda
by infiniteneslives on 2012-08-23 (#98561)

tokumaru wrote:

I agree: if you really don't want to break down and spend the dollar or two on the CPLD (and make a new mapper) that would allow fine bank switching, it's not much of an option. Personally, I don't really consider that much of a cost difference though. That cost is easily recoverable with PCB quantities.

Your goal is to utilize the simpler, more challenging route though and I can understand/admire that as well.

Re: Minimizing expansion of CHR converted for real-time upda
by tokumaru on 2012-08-24 (#98570)

infiniteneslives wrote:

It's not just cost, you know? It's the complexity in general... Mapper design is something I find very interesting, but I'm not as good with hardware as I am with software (by far!). And there's the whole problem of modifying emulators to support the mappers and the fact that I can't manufacture carts myself... I really don't need that kind of complication.

Quote:

Your goal is to utilize the simpler, more challenging route though and I can understand/admire that as well.

It kinda is an experiment to see how complex a game can look (and not necessarily be) without deviating much from the basic setup. That means using a simple mapper just to be able to use more PRG-ROM and keeping raster effects, timed code and forced blanking to a minimum.

Re: Minimizing expansion of CHR converted for real-time upda
by infiniteneslives on 2012-08-24 (#98574)

tokumaru wrote:

Yeah I fully realize that. Coming from the opposite end of the software/hardware spectrum most of my projects are trying to battle those issues.

I think it's good to take your route and try to squeeze every last ounce out of what's available with the NES and a simple discrete mapper. Obviously there is a lot that can be done with that alone. Then, if one should be so lucky as to get to the point of working on a sequel, you can fully appreciate and utilize the hardware expansions. Really, that's what many developers did back in the day with this system; just look at SMB and SMB3.

Re: Minimizing expansion of CHR converted for real-time upda
by tokumaru on 2014-01-28 (#124653)

Sorry for the bump, but today I found myself thinking about this again...

tepples wrote:

I wonder to what extent you can save bytes by planning out which values can be calculated with ASL/LSR/ROL/ROR (and thus kept in A) or with DEX/INX/DEY/INY (and thus kept in X or Y).

How would you go about planning this? I mean, using INX/DEX/INY/DEY/ASL/LSR/ROL/ROR as a "bonus" instead of LDA/LDX/LDY when possible would be easy (and probably wouldn't happen very often), but actually planning which registers to use ahead of time is way beyond my capabilities... Can anyone think of a way to do this?

Ideally, cpow's idea of generating the optimal 6502 sequence that would produce the desired data (even using JSR/RTS for repeated patterns and such) would be used, but an algorithm for doing that would be insane to code!
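One way to plan registers ahead of time is a dynamic-programming search over what A, X and Y hold after each byte, keeping for each state the cheapest (cycles, bytes) chain that reaches it. This is a sketch under stated assumptions, not a known-optimal encoder: only the register about to be stored is ever changed (no preloading or TAX-style transfers), ROL/ROR are skipped because the carry flag isn't tracked, and the number of states grows roughly with the cube of the number of distinct values, so it suits tile-sized chunks rather than whole banks. All names are hypothetical.

```python
REGS = "AXY"

def shortcut(reg, current, target):
    """A 1-byte, 2-cycle op turning `current` into `target` in `reg`, or None.
    ROL/ROR are left out: this sketch does not track the carry flag."""
    if current is None:
        return None
    if reg == "A":
        if (current << 1) & 0xFF == target:
            return "ASL A"
        if current >> 1 == target:
            return "LSR A"
    elif (current + 1) & 0xFF == target:
        return f"IN{reg}"
    elif (current - 1) & 0xFF == target:
        return f"DE{reg}"
    return None

def offer(table, state, cost, code):
    """Keep the cheapest (cycles, bytes) way found so far to reach `state`."""
    if state not in table or cost < table[state][0]:
        table[state] = (cost, code)

def plan(data):
    """Search over what (A, X, Y) hold after each byte; minimizes cycles,
    then bytes. Returns (lines, cycles, bytes) with the RTS included."""
    best = {(None, None, None): ((0, 0), [])}
    for v in data:
        nxt = {}
        for state, ((cyc, byt), code) in best.items():
            if v in state:  # some register already holds v: just store it
                r = REGS[state.index(v)]
                offer(nxt, state, (cyc + 4, byt + 3), code + [f"ST{r} $2007"])
                continue
            for i, r in enumerate(REGS):
                op = shortcut(r, state[i], v)
                step, size = (op, 1) if op else (f"LD{r} #${v:02X}", 2)
                new = state[:i] + (v,) + state[i + 1:]
                offer(nxt, new, (cyc + 6, byt + size + 3),
                      code + [step, f"ST{r} $2007"])
        best = nxt
    (cyc, byt), code = min(best.values(), key=lambda entry: entry[0])
    return code + ["RTS"], cyc + 6, byt + 1

def run_chain(lines):
    """Replay a generated chain and return the bytes written to $2007."""
    regs, out = {}, []
    for line in lines:
        op, reg = line[:2], line[2]
        if op == "LD":
            regs[reg] = int(line[6:], 16)
        elif op == "ST":
            out.append(regs[reg])
        elif op == "IN":
            regs[reg] = (regs[reg] + 1) & 0xFF
        elif op == "DE":
            regs[reg] = (regs[reg] - 1) & 0xFF
        elif line == "ASL A":
            regs["A"] = (regs["A"] << 1) & 0xFF
        elif line == "LSR A":
            regs["A"] >>= 1
    return out
```

For example, the bytes $01 $02 $03 $04 come out as one load followed by three INX (or INY) steps: 30 cycles and 18 bytes, against 30 cycles and 21 bytes for the plain chain.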

Re: Minimizing expansion of CHR converted for real-time upda
by infiniteneslives on 2014-01-29 (#124681)

I'm not sure about that specifically. But revisiting this problem my question would now be, how big is too big? Or how close are you to your limit? What is your limit? That was the main problem with your fast method after all right? There isn't a significant cost difference in chip prices between 128KB and 512KB, and the mapper cost doesn't change for a BNROM mapper of that size.

It's just my opinion, but some pennies just aren't worth pinching. ROM size is one of those, provided you're under 512KB and have already expanded beyond NROM. Not to mention that if you trip and fall while trying to pick up too many pennies, the end result may be that you never complete your adventure.

Re: Minimizing expansion of CHR converted for real-time upda
by tokumaru on 2014-01-29 (#124682)

You're right. I'm already using BNROM, and I'm shooting for 256KB of PRG-ROM. I guess I could easily bump the PRG-ROM to 512KB, but since I don't plan on using these "fast pattern updates" that much (maybe 8KB at most, which would expand to 40KB without optimizations) I doubt that'll be necessary.

infiniteneslives wrote:

Not to mention if you trip and fall while trying to pick up too many pennies, the end result may be that you never complete your adventure.

You're absolutely right... one of the reasons I don't advance with my projects as much as I'd like is that I get caught up in unimportant details like this. I should learn to focus on what really matters.

Re: Minimizing expansion of CHR converted for real-time upda
by Celius on 2014-01-30 (#124713)

It's funny, because my biggest roadblock has been trying to complete my project using NROM specifications. I had very limited space left for data after creating the engine and allotting space for tile data (used by the entire game), and I still had some other game modes to program. I recently decided to up the ROM and CHR to 64k each because I knew it was preventing my project from going anywhere. And now that I'm thinking about it, I may just jump to 128k PRG and use CHR RAM...

But honestly, I think the idea of the fast patterns is a good one. There are several components in my engine where data is stored as code, for the sake of speed, and reducing the complexity of the engine. How many tiles are we talking about? You might also consider a sort of "split" method. You don't have to choose between hardcoded only and updates from RAM only. You might consider having the most frequently updated tiles be hardcoded, and then have space for a few tile updates in RAM (not storing code in RAM, but just tile values).

Re: Minimizing expansion of CHR converted for real-time upda
by tokumaru on 2014-01-30 (#124716)

Celius wrote:

It's funny, because my biggest road block has been trying to complete my project using NROM specifications.

Sometimes we think that having less space to fill will help us finish projects faster, but when you find yourself struggling to save space you can be sure the whole idea backfired. You should definitely increase the storage space in this case.

Quote:

How many tiles are we talking about?

I guessed around 512 tiles before, but I honestly have no idea. I can see the main character and level background animations using this technique.

Quote:

You don't have to choose between hardcoded only and updates from RAM only. You might consider having the most frequently updated tiles be hardcoded, and then have space for a few tile updates in RAM (not storing code in RAM, but just tile values).

Definitely, I never considered using the fast graphics exclusively. My engine can update 6 tiles per update slot (there are 2 slots per VBlank) in normal mode (direct copy from ROM or RAM) and 8 tiles in quick mode.
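A quick check of why those two figures might go together, under an assumption not stated in the thread: if normal mode is an unrolled LDA abs,X / STA $2007 copy (8 cycles per byte), both modes spend the same time per slot. The names below are hypothetical.

```python
BYTES_PER_TILE = 16

def slot_cycles(tiles, cycles_per_byte):
    """Cycles one update slot spends copying `tiles` tiles."""
    return tiles * BYTES_PER_TILE * cycles_per_byte

normal = slot_cycles(6, 8)  # assumed: unrolled LDA abs,X / STA $2007
quick = slot_cycles(8, 6)   # LDA #imm / STA $2007 chain
print(normal, quick, 2 * quick)  # 768 768 1536
```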

Re: Minimizing expansion of CHR converted for real-time upda
by infiniteneslives on 2014-01-30 (#124727)

Glad I could help! Yeah, if you read a few posts back you can see I too got caught up on fancy ideas such as throwing a bunch of hardware at the problem. Back in the 80's there were significant savings to be had from halving the size of your ROM. But now, ROM/RAM bits are dirt cheap. It's actually the packaging, assembly, and other near-flat costs per chip that make up the better part of the memory cost. Back in the day it was all about die size; this may still be somewhat true with mask ROMs today, though. That, and we will never reach the volumes they did back then to realize large savings overall from a small savings per cart.

Not to mention that if the goal is a digital release, there's no benefit at all aside from the enjoyment of optimizing.

Don't get me wrong, there are still savings to be had from smaller ROMs and simpler mappers in production. But even if you save a few dollars on the hardware needed, you still have to actually complete the game before those savings can be realized. Once things are mostly complete, one can always go back and optimize things if there is a need or benefit to do so. Just look at Driar, for example: something like ?128KB? MMC1, down to NROM about a year after it was released.

Celius wrote:

I recently decided to up the ROM and CHR up to 64k each because I knew it was preventing my project from going anywhere. And now that I'm thinking about it, I may just jump to 128k PRG and use CHR RAM...

This doesn't apply if the desire is to build carts on donors or you consider old stock eproms new:
But if you're building a cart with new widely available parts, the cheapest rom size available is 128KB practically speaking. The NROM boards I produce actually use two 128KB flash roms trimmed down to 8/32KB each. So in reality if your mapper already supports 128KB, there is no cost difference between 64KB and 128KB. It ends up being a question if you want to waste/reserve half the chip or not...

Re: Minimizing expansion of CHR converted for real-time upda
by Celius on 2014-01-30 (#124734)

I'll probably just end up going with 128kb PRG and using CHR RAM. I'm not even sure how this game will end up being distributed. I'm kind of dumb when it comes to hardware, but I'm assuming something with these specifications could easily be produced? I'm going to convert the project to MMC3 for now, but I'll treat banks as if they're 16kb and only switch out $8000-$BFFF, and I won't be using anything like the scanline counter. That way the project should be pretty easy to convert to another mapper, if needed.

And I guess part of the problem is that you feel your project has to meet certain standards if it has high enough specs. Like if you up the amount of CHR data, you think "should I really be reusing that one tile all over the place, when I have the ability to make more unique patterns now?" It can sometimes create more work once you realize you have more resources.

Re: Minimizing expansion of CHR converted for real-time upda
by lidnariq on 2014-01-30 (#124737)

infiniteneslives wrote:

Just look at Driar for example, something like ?128KB? MMC1 down to NROM ~year after it was released.

Yeah, it was 128KiB SGROM originally.

Quote:

The NROM boards I produce actually use two 128KB flash roms trimmed down to 8/32KB each.

I was going to hope that maybe that meant ROM+RAM would be better, except that the cheapest 5V ROMs are still cheaper (only by a dime or so) than any size of 5V RAM...

That's funny, that makes it look like the Color Dreams board is currently the cheapest board to manufacture.

Re: Minimizing expansion of CHR converted for real-time upda
by infiniteneslives on 2014-01-30 (#124743)

lidnariq wrote:

I was going to hope that maybe that meant ROM+RAM would be better, except that the cheapest 5V ROMs are still cheaper (only by a dime or so) than any size of 5V RAM...

Yeah that's pretty accurate by my account as well.

Quote:

That's funny, that makes it look like the Color Dreams board is currently the cheapest board to manufacture.

Cheapest non-NROM board, yes. And that assumes there isn't money to be saved by using a 4-bit '161 instead of an 8-bit flip-flop. If you can save money by going with a '161, then GNROM would be cheaper.

Color Dreams certainly is the cheapest if you require all the memory on the chip to be available for use. I've always considered CHR-RAM superior to CHR-ROM. But with 16 × 8KB banks of CHR-ROM, perhaps it's really not that limiting, and it leaves more space in a minimal 128KB PRG-ROM chip. I never thought of this either, but with Color Dreams, the obvious choice for saves on flash would be putting them in a CHR-ROM bank. So on further consideration, unless you're really looking to take advantage of CHR-RAM for graphical effects, there are a lot of benefits to going with mapper #11. Additionally, expansion up to 512KB PRG-ROM is trivial, so you're not limited to 128KB PRG.
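For reference, a minimal sketch (a Python model, not hardware) of how a byte written to $8000-$FFFF is interpreted on a Color Dreams (mapper 11) board, assuming the usual `CCCC LLPP` layout: PP selects a 32KB PRG bank, LL are the lockout-defeat bits, and CCCC selects one of the 16 × 8KB CHR banks. The `extended_prg` flag models a hypothetical board that wires the LL bits as extra PRG address lines, which is one way the 512KB expansion can be done. `decode_color_dreams` is a made-up name.

```python
def decode_color_dreams(value: int, extended_prg: bool = False) -> dict:
    """Decode a byte written to $8000-$FFFF on a Color Dreams (mapper 11) board.

    Bit layout (7..0): CCCC LLPP
      PP   = 32KB PRG bank (4 banks -> 128KB)
      LL   = lockout-defeat bits (reused as PRG bits when extended_prg=True)
      CCCC = 8KB CHR bank (16 banks -> 128KB)
    """
    value &= 0xFF
    chr_bank = (value >> 4) & 0x0F
    # Hypothetical variant: 4 PRG bits -> 16 x 32KB = 512KB of PRG-ROM.
    prg_bank = value & (0x0F if extended_prg else 0x03)
    return {
        "prg_bank": prg_bank,
        "prg_offset": prg_bank * 0x8000,  # byte offset into PRG-ROM
        "chr_bank": chr_bank,
        "chr_offset": chr_bank * 0x2000,  # byte offset into CHR-ROM
    }
```

For example, writing $F3 selects PRG bank 3 and CHR bank 15 (offset $1E000, the last 8KB of a 128KB CHR chip), which could be the bank reserved for save data.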

Re: Minimizing expansion of CHR converted for real-time upda
by Bregalad on 2014-01-31 (#124758)

@Celius: It sounds like you should use UNROM then, unless you need to change mirroring or have SRAM, in which case you should use MMC1 (SGROM/SNROM). You even get the bonus of being able to do single-screen mirroring, which MMC3 can't do. I see no reason to use MMC3 in your case.

Back on topic (well, off-topic): I think it's fun to try to fit as much stuff as possible into as little ROM as possible. It makes us face the same kind of problems that engineers had back in the day. The key is to profile the thing correctly and work hard on optimizing the largest stuff. In my case the largest stuff is maps, which take a hell of a lot of space, even with RLE compression. So I'm going to apply more advanced compression to save more space.
I have already made a fun compression scheme for my metasprites that allows trivial ones to be stored in a few bytes while allowing maximum flexibility (non-grid-aligned sprites, multiple colours, etc.) at the same time. I'm proud of it!
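To show why plain RLE can still leave maps large, here is a minimal sketch of one simple RLE format, assumed for illustration only (it is not the format used in the post): each run is stored as a `(count, value)` byte pair with counts from 1 to 255. Long runs of the same tile compress well, but every isolated tile costs two bytes, which is where more advanced schemes earn their keep.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode as (count, value) pairs; hypothetical format, counts 1..255."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        out += bytes((run, value))
        i += run
    return bytes(out)


def rle_decode(packed: bytes) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    if len(packed) % 2:
        raise ValueError("packed data must contain whole (count, value) pairs")
    out = bytearray()
    for j in range(0, len(packed), 2):
        run, value = packed[j], packed[j + 1]
        out += bytes([value]) * run
    return bytes(out)
```

A map row of 20 empty tiles followed by `5, 5, 7` packs into 6 bytes, while a row with no repeats would double in size.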

@Tokumaru: It's very smart to store the few animated graphics this way, and that doesn't prevent you from applying strong compression to the other graphics, which will probably be more efficient overall than storing all graphics the plain way (uncompressed).
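To make the size/speed trade-off concrete, here is a minimal sketch of a converter that turns pattern bytes into unrolled 6502 machine code writing to $2007, using A, X and Y as a 3-entry dictionary as described at the start of the thread. The eviction policy (reload the least recently used register) is an assumption; the original converter's policy isn't described. A byte already held in a register costs only a store (3 bytes, 4 cycles), and a new byte costs a load plus a store (5 bytes, 6 cycles). Cycle counts include the final RTS but not the JSR used to call the routine. `compile_tile_upload` is a hypothetical name.

```python
LOAD_IMM = {"A": 0xA9, "X": 0xA2, "Y": 0xA0}   # LDA/LDX/LDY #imm: 2 bytes, 2 cycles
STORE_ABS = {"A": 0x8D, "X": 0x8E, "Y": 0x8C}  # STA/STX/STY abs: 3 bytes, 4 cycles
RTS = 0x60                                      # 1 byte, 6 cycles
PPUDATA = 0x2007


def compile_tile_upload(data: bytes):
    """Return (machine_code, cycles) for unrolled code that writes `data` to $2007."""
    regs = {"A": None, "X": None, "Y": None}   # known register contents (None = unknown)
    lru = ["A", "X", "Y"]                      # least recently used register first
    code = bytearray()
    cycles = 0
    for value in data:
        reg = next((r for r in lru if regs[r] == value), None)
        if reg is None:
            reg = lru[0]                       # assumed policy: evict the LRU register
            code += bytes((LOAD_IMM[reg], value))
            regs[reg] = value
            cycles += 2
        code += bytes((STORE_ABS[reg], PPUDATA & 0xFF, PPUDATA >> 8))
        cycles += 4
        lru.remove(reg)
        lru.append(reg)
    code.append(RTS)
    cycles += 6
    return bytes(code), cycles
```

A 16-byte tile with no repeated bytes compiles to 81 bytes (about 5× expansion), while a blank tile compiles to 51 bytes, so the cycle count returned here is what a converter would check against the VBlank budget.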

Re: Minimizing expansion of CHR converted for real-time upda
by tepples on 2014-01-31 (#124768)

More stuff in less ROM helps when you're trying to include contributions from several different developers in one multicart. It's why the 2011 compo was limited to NROM and the 2014 compo is limited to 64K.

Re: Minimizing expansion of CHR converted for real-time upda
by tokumaru on 2014-01-31 (#124771)

tepples wrote:

More stuff in less ROM helps when you're trying to include contributions from several different developers in one multicart.

Yeah, this is indeed a concern. I don't expect this particular project to end up in any multicarts though.

Quote:

It's why the 2011 compo was limited to NROM and the 2014 compo is limited to 64K.

It is? Guess I didn't read the rules very carefully. I probably won't have time to develop anything for the compo anyway, but if I did it would definitely be within the 64KB limit.