So, for mode 0 (or, more generally, anything with 4- or 16-colour palettes), the first big to-do item is some kind of conversion to palettized tiles. If you want to use background-processor as is, note that even with
a single 16-colour palette for the entire image, adding or removing HDMA makes approximately zero difference for photographic content. The image would need to be tiled to expand the palette enough to get more interesting results.
Certainly, generating tiled images is on my distant radar, but getting decent results there is a computationally hard problem that I haven't grappled with yet, and I've got more interesting fish to fry in the meantime. I am aware of the
script Khaz wrote a while back for generating 16-colour palette images, which may be a point of inspiration, but I'd want to pick at the problem enough to get, at least, a more performant solution out of it - I'd
really like to stay below a second or two of total wall time per image, for example, not 2-3 minutes. Hopefully it could also generate slightly higher-quality images. The script's results are pretty good, but there are still plenty of cases where it falls over a bit, with obvious discontinuities between tiles that I wouldn't consider
shipping quality.
(As an aside: even if, e.g., the DKC title screens had a lot of artist manipulation to make them fit 16-colour tiles, by moving elements around or tweaking the colours of objects, that just underscores the need for rapid processing, and hence rapid iteration and turnaround. Mild tweaks are out of the question if you're waiting multiple minutes for each result.)
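To give a rough feel for what a tiled conversion is up against, here's a minimal sketch (not part of background-processor, and not Khaz's script) that counts how many distinct 15-bit colours each 8x8 tile of an image uses, and how many tiles blow past a 16-colour budget. The function names, the nested-list image format, and the `budget` parameter are all made up for illustration; it also assumes the image dimensions are multiples of 8 and ignores the fact that tiles have to share a limited number of palettes, which is where the actual hard part lives.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
TILE = 8  # SNES background tiles are 8x8 pixels


def to_bgr15(rgb: RGB) -> int:
    """Reduce 8-bit-per-channel RGB to 15-bit BGR (5 bits per channel)."""
    r, g, b = rgb
    return ((b >> 3) << 10) | ((g >> 3) << 5) | (r >> 3)


def tile_colour_counts(pixels: List[List[RGB]]) -> List[List[int]]:
    """Return one distinct-colour count per 8x8 tile, in tile-grid order.

    Assumption: width and height are multiples of 8.
    """
    height, width = len(pixels), len(pixels[0])
    counts = []
    for ty in range(0, height, TILE):
        row = []
        for tx in range(0, width, TILE):
            colours = {
                to_bgr15(pixels[y][x])
                for y in range(ty, ty + TILE)
                for x in range(tx, tx + TILE)
            }
            row.append(len(colours))
        counts.append(row)
    return counts


def tiles_over_budget(counts: List[List[int]], budget: int = 16) -> int:
    """Count tiles that use more distinct colours than one palette can hold."""
    return sum(1 for row in counts for n in row if n > budget)


# Tiny synthetic image, two tiles wide: a flat black tile next to a
# gradient tile where every pixel is a different colour.
image = [
    [(0, 0, 0)] * 8 + [(x * 8, y * 8, 0) for x in range(8)]
    for y in range(8)
]
counts = tile_colour_counts(image)
print(counts, tiles_over_budget(counts))
```

A pass like this is cheap (one sweep over the pixels), so it fits comfortably in the "second or two" budget; the expensive part is deciding which tiles share which palette and how to reduce the over-budget ones without visible seams.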