NMI vs IRQ - NESdev BBS

NMI vs IRQ
by qwertymodo on 2016-10-14 (#180813)

I'm writing a simple video player ROM from scratch, and I'm confused about something. With the following code:

Code:

arch snes.cpu

macro seek(variable offset) {
    origin (offset) & $3FFFFF
    base offset
}

// ===Interrupt Handlers===
seek($C00500)
nmi:
// Do NMI stuff
    rti
irq:
// Do IRQ stuff
    rti

// ===Initialization Routine===
seek($C02000)
__init:
// Init stuff here
// . . .
    jml __main

// ===Interrupt Vectors===
seek($C0FF00)
// RESET
    sei
    clc
    xce
    rep #$18
    ldx #$1FFF
    txs
    jml __init

seek($C0FF10)
// NMI
    jml nmi

seek($C0FF14)
// IRQ
    jml irq

seek($C0FF18)
// BRK
-//  lda $ABCDEF
    bra -

seek($C0FFFF)
// EMPTY VECTOR
    rti

seek($C0FFB0)
// ===Internal Header===
snes_header:
    dw $3713            // Maker code
    db 'M','S','U','1'  // Game code
    // Reserved
    db $00,$00,$00,$00,$00,$00,$00
    db $00              // Exp RAM size
    db $00              // Special version
    db $00              // Cartridge type
    // ROM Name
    db 'M','S','U','-','1',' ','V','i','d','e','o'
    db ' ','T','e','s','t',' ','R','O','M',' '
    db $31              // Mapper
    db $02              // ROM type
    db $0C              // ROM size
    db $03              // SRAM size
    db $01              // Country code
    db $33              // Reserved
    db $01              // Version number
    dw $AA00            // Checksum complement
    dw $55FF            // Checksum

// ===Vector Table===
// Native Mode
    dd $FFFFFFFF        // UNUSED
    dw $FFFF            // COP
    dw $FF18            // BRK
    dw $FFFF            // ABORT
    dw $FF10            // NMI
    dw $FFFF            // RESET
    dw $FF14            // IRQ

// Emulation Mode
    dd $FFFFFFFF        // UNUSED
    dw $FF18            // COP
    dw $FFFF            // BRK
    dw $FFFF            // ABORT
    dw $FFFF            // NMI
    dw $FF00            // RESET
    dw $FFFF            // IRQ

seek($D00000)
__main:
    sep #$20        // Set the A register to 8-bit.

    lda #$80
    sta $4200   // Enable NMI

// Loop forever.
-; bra -

Based on the vector table, I would expect NMI interrupts to jump to $C0FF10. However, when I run it in bsnes-plus with a breakpoint on $C0FF10, I never hit it; I hit $C0FF14 instead. Am I doing something wrong, or am I just misunderstanding how the vectors work? For now, I've resorted to putting all of my NMI handling in the IRQ handler, which seems to work. But I'm running into issues where I can only DMA about half of the VRAM data I expect, and I'm curious whether this issue is somehow related. Even if it isn't, I'd at least like to understand what's going on.

Re: NMI vs IRQ
by koitsu on 2016-10-14 (#180818)

I can absolutely assure you with 100% accuracy that VBlank is tied to the NMI vector, not the IRQ vector -- yes, 100% sure.

It looks to me like you're off-by-4 (4 bytes) somewhere, but it's hard to tell.

Is there some reason you aren't providing an assembly listing that contains generated code offsets so you can see what ends up where? Or an actual ROM file so I could go and actually reverse engineer the vectors? An assembly listing would be better.

You've convoluted the process by trying to stick the SNES header + native mode vectors + emulation mode vectors all into one seek($C0FFB0), requiring me to sit down and figure out where each/every byte goes.

Why don't you try putting seek($C0FFE4) in front of your native mode vectors, and seek($C0FFF4) in front of your emulation mode vectors? You can remove the dd $FFFFFFFF statements. And because there's no BRK vector in emulation mode, you can set that to $0000 (it makes it more clear than $FFFF).
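To check where each vector actually lands without working it out by hand, you can dump the table straight from the assembled ROM. Below is a minimal Python sketch. It assumes an unheadered HiROM image where CPU addresses $00:FFE0-$00:FFFF correspond to file offsets $FFE0-$FFFF; a 512-byte copier header would shift everything. The name `read_vectors` and the small synthetic image are illustrative only.

```python
import struct

# Vector offsets in bank $00 (the unused/reserved slots are omitted).
NATIVE = {"COP": 0xFFE4, "BRK": 0xFFE6, "ABORT": 0xFFE8, "NMI": 0xFFEA, "IRQ": 0xFFEE}
EMULATION = {"COP": 0xFFF4, "ABORT": 0xFFF8, "NMI": 0xFFFA, "RESET": 0xFFFC, "IRQ/BRK": 0xFFFE}

def read_vectors(rom, table):
    """Return each vector as a 16-bit bank-$00 address (little-endian word)."""
    return {name: struct.unpack_from("<H", rom, off)[0] for name, off in table.items()}

# Synthetic 64 KiB image filled with $FF, holding the vector values from the first post.
rom = bytearray(b"\xFF" * 0x10000)
for off, value in [(0xFFE6, 0xFF18), (0xFFEA, 0xFF10), (0xFFEE, 0xFF14),
                   (0xFFF4, 0xFF18), (0xFFFC, 0xFF00)]:
    struct.pack_into("<H", rom, off, value)

native = read_vectors(rom, NATIVE)
emulation = read_vectors(rom, EMULATION)
for name, addr in native.items():
    print(f"native    {name:7} -> $00:{addr:04X}")
for name, addr in emulation.items():
    print(f"emulation {name:7} -> $00:{addr:04X}")
```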

Also, you shouldn't use $FFFF as a vector point for anything. Do you know why? Bank $00 is where all the vectors are read from, and to my knowledge bank wrapping doesn't happen on vectors. A vector of $FFFF starts executing at $00FFFF, and the next byte is then fetched from either $010000 or $000000. My guess is the latter.
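To make the "$00FFFF, then where?" question concrete, here is a tiny Python model of instruction-fetch addresses. It assumes, as WDC's 65816 documentation describes, that the 16-bit program counter wraps within the current program bank and that K is never incremented by it. This models only that one rule and is not an emulator; `fetch_addresses` is a hypothetical helper.

```python
def fetch_addresses(bank, pc, count):
    """24-bit addresses of `count` consecutive opcode/operand fetches.

    Assumes the program counter wraps from $FFFF to $0000 inside the same
    bank, leaving the program bank register (K) unchanged.
    """
    addrs = []
    for _ in range(count):
        addrs.append((bank << 16) | pc)
        pc = (pc + 1) & 0xFFFF  # wrap within the bank
    return addrs

# A vector of $FFFF: the byte at $00:FFFF is $FF (SBC long,X), a 4-byte instruction.
print([f"${a:06X}" for a in fetch_addresses(0x00, 0xFFFF, 4)])
```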

Also, you really need to set your emulation mode vector addresses properly. Guess what mode the CPU starts in? Emulation mode. The CPU always reads vectors from bank $00, which is presumably mapped to bank $C0 in the memory model you're using. You should also set your RESET vector (in native mode) to the same place as in emulation mode.

And finally: I can't even determine how your RESET vector is working right now. I don't see how $FFFF (native) or $FF00 (emulation) point to anything in __init. Oh wait, actually, they would point to the rti you have under seek($C0FFFF) (for native). Why aren't you doing things like dw __nmi then making a __nmi label? You've made this troubleshooting process a huge, HUGE pain in the ass.

One more edit: what assembler is this? I really want to know. What you're using looks more like a "patching assembler" (I'm thinking bass or xkas? I forget which tool), and not an "assembler" in the classic sense.

Re: NMI vs IRQ
by Nicole on 2016-10-14 (#180819)

By the way, why are these repeated like this?

Code:

seek($C0FF10)
// NMI
    jml nmi

seek($C0FF14)
// IRQ
    jml irq

seek($C0FF18)
// BRK
-//  lda $ABCDEF
    bra -

seek($C0FFFF)
// EMPTY VECTOR
    rti

seek($C0FF10)
// NMI
    jml nmi

seek($C0FF14)
// IRQ
    jml irq

seek($C0FF18)
// BRK
-//  lda $ABCDEF
    bra -

seek($C0FFFF)
// EMPTY VECTOR
    rti

I don't think it would cause your particular issue, but there's no reason to duplicate these lines.

Re: NMI vs IRQ
by qwertymodo on 2016-10-14 (#180820)

Nicole, that was a stupid copy/paste error; I'll fix it as soon as I'm not on my phone. Koitsu, this is bass v14. Yes, it's a patching assembler, since this is eventually going to be turned into a patch, but I wanted to get the functionality working on its own first. I'll post a ROM when I get home. I copied the vector tables from Chrono Trigger, including the $FFFFs. As far as I know, they didn't have the rti at C0FFFF; I added that because I thought it made sense, but maybe I was wrong about that.

Re: NMI vs IRQ
by koitsu on 2016-10-14 (#180821)

If this is a "patched ROM" (i.e. a patched commercial ROM), please do not upload it here -- it will be deleted by a mod immediately citing copyright. An IPS patch would be fine.

There's nothing wrong with using rti as code at a vector point. But it sounds like the commercial ROM just set unused vectors to $FFFF (this is probably because Nintendo mandated that "unused portions of a ROM be filled with $FF" even though tons of games didn't do this -- it just made it easier on the EPROM and mask ROM process (hardware folks please correct me if I'm wrong)).

Either way, I can assure you that VBlank is tied to NMI, not IRQ. Geiger's SNES9x with debugging capabilities will give you vector information, although it runs extremely wonky on present-day OSes (Windows 7 and up). I think there's also a modified bsnes-classic that has some debugging capability.

Chrono Trigger has a special place in my heart, filed under "games that really made my (technical) life hell" (for a couple reasons), so I'm already gritting my teeth. ;)

Edit: these are the official vectors from Chrono Trigger (MD5 and filename: a2bc447961e52fd2227baed164f729dc Chrono Trigger (U) [!].sfc). "8-bit" means emulation mode, "16-bit" means native mode.

Code:

Vectors:
      8 Bit      16 Bit
ABT   $00:FFFF   $00:FFFF
BRK   $00:FFFF   $00:FF18
COP   $00:FF18   $00:FFFF
IRQ   $00:FFFF   $00:FF14
NMI   $00:FFFF   $00:FF10
RES   $00:FF00

Reviewing the actual code at vectors $FF18 (native mode BRK and emulation mode COP) and $FFFF (a placeholder vector, i.e. one that isn't used) tells me the following:

The $FF18 vector contains an infinite loop that just does an lda $abcdef infinitely (i.e. something you could easily witness/see in a debugger or possibly on a hardware ICE):

Code:

$00/FF18 AF EF CD AB LDA $ABCDEF[$AB:CDEF]   A:0000 X:0000 Y:0000 P:envMXdIzC
$00/FF1C 80 FA       BRA $FA    [$FF18]      A:0000 X:0000 Y:0000 P:envMXdIzC

The $FFFF vector is worthless. Execution would start at $00FFFF, which contains $FF, and continue into $000000, which is RAM (direct page), so its contents will vary. This is what I suspected: the game just fills unused vectors with $FFFF.

The RESET vector at $FF00, however, is legitimate:

Code:

$00/FF00 78          SEI                     A:0000 X:0000 Y:0000 P:EnvMXdIzc
$00/FF01 18          CLC                     A:0000 X:0000 Y:0000 P:EnvMXdIzc
$00/FF02 FB          XCE                     A:0000 X:0000 Y:0000 P:EnvMXdIzc
$00/FF03 5C 00 C0 FD JMP $FDC000[$FD:C000]   A:0000 X:0000 Y:0000 P:envMXdIzC
$FD/C000 C2 10       REP #$10                A:0000 X:0000 Y:0000 P:envMXdIzC

The NMI and IRQ vectors for emulation mode are set to $FFFF because the game *immediately* inhibits IRQs and switches to native mode (see above RESET code). This is important because the NMI and IRQ vectors in native mode jump to code located in RAM:

Code:

$00/FF10 5C 00 05 00 JMP $000500[$00:0500]   A:0000 X:0000 Y:0000 P:envMXdIzC
$00/FF14 5C 04 05 00 JMP $000504[$00:0504]   A:0000 X:0000 Y:0000 P:envMXdIzC

In other words, Chrono Trigger almost certainly writes a long jump (5C xx xx xx) into $000500-$000503 at run time, pointing to a location of its choosing. This lets it switch between different VBlank and IRQ handling routines. It's a common technique.
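You can check the byte layout of such a trampoline against the listings above (5C 00 05 00 and 5C 00 C0 FD). Below is a minimal Python sketch of the encoding and of patching a RAM slot. It assumes JML is stored as opcode $5C followed by the 24-bit target in little-endian order. The handler address and `set_nmi_handler` are hypothetical. On hardware, the game would do this write in 65816 code, ideally while NMI is disabled, so that a half-written jump is never executed.

```python
def encode_jml(target):
    """Encode JML (opcode $5C) to a 24-bit address: opcode, low, high, bank."""
    if not 0 <= target <= 0xFFFFFF:
        raise ValueError("target must be a 24-bit address")
    return bytes([0x5C, target & 0xFF, (target >> 8) & 0xFF, (target >> 16) & 0xFF])

# Model of the low 8 KiB of WRAM as seen from bank $00.
wram = bytearray(0x2000)

def set_nmi_handler(ram, handler, slot=0x0500):
    """Point the RAM trampoline at a new handler (hypothetical helper)."""
    ram[slot:slot + 4] = encode_jml(handler)

set_nmi_handler(wram, 0xC10234)  # hypothetical handler address
print(wram[0x500:0x504].hex(" "))
```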

Finally, breakpoints for vectors often require that you add breakpoints for bank $00 addresses, not higher banks ($80 or $C0). Bank $00 is where the CPU actually loads the vectors from (the vectors are only 16-bit, as you know). Quoting WDC documentation, with the relevant part underlined:

Quote:

When an interrupt is first received, the processor finishes the currently executing instruction and pushes the double-byte program counter (which now points to the instruction following the one being executed when the interrupt was received) and the status flag byte onto the stack. Since the 6502 and 65C02 have only a sixteen-bit program counter, only a sixteen-bit program counter address is pushed onto the stack; naturally, this is the way the 65802 and 65816 behave when in emulation mode as well. The native-mode 65802 and 65816 must (and do) also push the program counter bank register, since it is changed to zero when control is transferred through the bank zero interrupt vectors.

"Program counter bank register" refers to what's known as K, or alternately PCB (program counter bank). This refers to the bank code is currently executing out of. Do not confuse it with B (data bank register).

Re: NMI vs IRQ
by qwertymodo on 2016-10-14 (#180822)

My ROM as it stands is not Chrono Trigger. It is my own code plus some of the tutorial code from the Super Famicom wiki. I only copied a few small things I didn't understand, like that BRK vector and the vector tables themselves. This is a from-scratch ROM to test my own video loading code, which I will eventually turn into a patch for CT, so I wanted things like the mapping to match up to avoid headaches at that stage. I still don't have access to my laptop, but I'll try to post it in an hour or so when I get home.

Re: NMI vs IRQ
by qwertymodo on 2016-10-14 (#180823)

I edited the OP, and uploaded the ROM, which I just now tested again, only to find that the NMI breakpoint at $FF10 was hitting properly. I don't know if it's just a difference between my machines, or if something changed since I posted, or if I was just tired and delirious and thought I was hitting the wrong breakpoint, but whatever it was, it's working now. What *isn't* working, however, is my tile data DMA, which is still only transferring about half of the tiles I queued up (a little over 9,000 bytes per VBLANK over the course of 4 consecutive VBLANKs), but that's another problem for another day.

Re: NMI vs IRQ
by Nicole on 2016-10-15 (#180824)

For the DMA, you are giving it the number of bytes to transfer, right? That's what you need, not the number of units to transfer, because DMA uses DASxL/H as a byte counter (it can stop mid-unit).
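Here is Nicole's point in code form: a small Python sketch that computes the two byte-count register values ($43x5 = DASxL, $43x6 = DASxH) from a byte count. It assumes 64 bytes per 8bpp 8x8 tile and that a count of $0000 means 65536 bytes. `das_bytes` and the tile count are illustrative. If you pass a VRAM word count by mistake, the DMA moves exactly half the data.

```python
def das_bytes(byte_count):
    """Return (DASxL, DASxH) for a DMA of `byte_count` bytes.

    The counter is in bytes regardless of transfer mode; a value of 0
    means 65536 bytes.
    """
    if not 1 <= byte_count <= 0x10000:
        raise ValueError("a single DMA moves 1..65536 bytes")
    value = byte_count & 0xFFFF  # 65536 wraps to 0
    return value & 0xFF, value >> 8

tiles = 144          # hypothetical number of 8bpp tiles
size = tiles * 64    # 64 bytes per 8bpp 8x8 tile, not 32 VRAM words
print([hex(b) for b in das_bytes(size)])
```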

Re: NMI vs IRQ
by koitsu on 2016-10-15 (#180829)

Yeah, what Nicole said. It'd help to just see the code itself (maybe the whole thing, or maybe just relevant snippets including setup procedures done before the transfer, as well as your full NMI routine).

Re: NMI vs IRQ
by tepples on 2016-10-15 (#180830)

qwertymodo wrote:

a little over 9,000 bytes per VBLANK

You appear not to have the Location set in your profile. Are you in Europe? Because unless you're either on a PAL SNES or using the forced blank register ($2100) to disable rendering early and enable it late, you can't fit much more than about 6,000 bytes in a vblank.
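Here is a rough sketch of where a figure like that comes from. It assumes 262 lines per NTSC frame (312 for PAL), a 224-line picture, and the roughly 165.5 bytes of DMA per scanline that 93143 quotes later in this thread. It ignores NMI entry and DMA setup overhead, so real numbers will be somewhat lower.

```python
BYTES_PER_LINE = 165.5  # DMA throughput per scanline, as quoted in this thread

def vblank_dma_budget(total_lines, last_active_line=224):
    """Bytes DMA can move between the end of the picture and the next frame.

    Assumes vblank spans lines last_active_line+1 .. total_lines-1 and
    ignores all CPU overhead.
    """
    vblank_lines = total_lines - (last_active_line + 1)
    return vblank_lines * BYTES_PER_LINE

print("NTSC:", vblank_dma_budget(262))  # 262 lines per frame
print("PAL: ", vblank_dma_budget(312))  # 312 lines per frame
```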

Re: NMI vs IRQ
by qwertymodo on 2016-10-15 (#180835)

I'm in the US, I thought I'd read the number was supposed to be closer to 13K for NTSC, but if that's not correct, that would explain it. My full code is included in the zip attached to the first post (some of the commenting might be incorrect, I'm still figuring out some of these things, and others I might have changed without updating the comment lines, but it should be mostly close).

This is what I'm getting (with 4 VBLANKs).

Re: NMI vs IRQ
by AWJ on 2016-10-15 (#180838)

qwertymodo wrote:

Yeah, those gaps are almost certainly caused by your program trying to write to VRAM past the end of VBlank and failing.

Re: NMI vs IRQ
by qwertymodo on 2016-10-15 (#180843)

Then, can anybody help me understand where these numbers are coming from? I need to transfer ~33K over the course of 4 frames (15fps@60Hz), or else I need to drastically reduce my frame size.

Re: NMI vs IRQ
by tepples on 2016-10-15 (#180844)

93143's post was assuming the use of 28 lines of forced blanking at the top and 28 at the bottom: "With 168 active scanlines". This matches a 16:9 TV's safe area fairly well.

If you need full frame on a 4:3 TV, you'll need to plan on reusing tiles from frame to frame (possibly including flipping), making some tiles 2-bit (for use on BG3), or using HDMA to change vertical scroll after each scanline to halve the vertical resolution. It's also possible to reduce effective vertical background resolution to 448/3 = 149 lines by mixing vertical scroll with interlace mode so that each scanline of tile data is displayed for an average 1.5 lines.

Re: NMI vs IRQ
by qwertymodo on 2016-10-15 (#180846)

The source video is letterboxed to 256x144 (or maybe 160, I don't remember), so I'll look into the use of forced blanking. Not really sure how to do that though. This whole video thing is new for me.

Re: NMI vs IRQ
by suFami on 2016-10-15 (#180847)

Hey qwertymodo, I don't know enough to help with the technical stuff, but I did find this. A user named Ladida made an example MSU-1 Video over at SMWCentral. The ASM is included in the download and might be able to help you out. https://www.smwcentral.net/?p=viewthread&t=73363 Maybe you could use this process instead of making your own player.

Quote:

in case youre wondering how its done (or how I did it):
1. grab relevant video file
2. open in handbrake or something, pick relevant time segment from video (or just whole vid)
3. export as 256x144 (16:9, crop/letterbox if you have to), @ constant 30fps (or 15fps, or 7.5fps, etc)
30fps gives you ~1 hour of video (if you use all 4GB of MSU)
15fps gives you ~2 hours
7.5fps gives you ~3 hours? or 4? i cant into math, sorry
4. virtualdub doesnt like mp4/mkv, so convert to avi
(you could have done steps 1-3 in virtualdub btw)
5. in virtualdub, export image sequence (PNG)
i guess you dont really need virtualdub for this, just something that will export
the processed video as a sequence of PNGs
6. batch convert the PNGs in irfanview to PCX, downsampled to 8bpp
7. batch process the PCXs through pcx2snes using the bat i provided as a sample
8. sticky the files together; have the gfx and pal separate
as in, combine all the gfx into one, and then all the pal into one, and then
stick the pal to the exact end of the gfx. take note of the address
9. you pretty much have your .msu, just use it.

you dont exactly have to follow the steps above. whats important is that you end up with
x9000 byte gfx files and x200 byte palette files for each frame, and it should look good
when animated at a constant rate (either 30fps, 15fps, 7.5fps, 3.75fps, 1.875fps...)

audio is much easier, everyone and their pet goat knows how to do it:
1. extract audio from video file
2. convert audio to 44.1khz 16bit wav
3. open in audacity, cut out audio you dont want (make sure it matches video)
skip this step if the extracted audio already matches the video you processed
4. export as wav, then just run through wav2msu

you can copy and use my source code if you want. it should work with anything you throw at it;
just replace the audio file(s) and the .msu file
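The run-time estimates in that quote are simple arithmetic. Here is a quick Python check. It assumes one $9000-byte tile block plus one $200-byte palette per frame, reads "4GB of MSU" as a 4 GiB data file, and assumes the audio is kept in separate files. By this count, 7.5 fps works out to a little over 4 hours.

```python
GFX_BYTES = 0x9000      # 256x144 at 8bpp, per the quoted steps
PAL_BYTES = 0x200       # 256 colours x 2 bytes
MSU_LIMIT = 4 * 2**30   # "all 4GB of MSU", read as 4 GiB

def hours_of_video(fps):
    """Hours of video that fit in the data file at a constant frame rate."""
    bytes_per_second = (GFX_BYTES + PAL_BYTES) * fps
    return MSU_LIMIT / bytes_per_second / 3600

for fps in (30, 15, 7.5):
    print(f"{fps:>4} fps: {hours_of_video(fps):.2f} h")
```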

Re: NMI vs IRQ
by qwertymodo on 2016-10-15 (#180849)

Thanks, I'll take a look at that, but I'm still planning on writing my own, since I haven't worked with the PPU before, and I'll need to understand it in order to complete the hack (since the videos will interrupt the gameplay and I'll need to restore the video state afterward). Also, understanding how it works makes the reverse engineering of the existing code that much easier. I basically have mine functionally displaying, I'm just having timing issues which it sounds like should be workable with forced blanking (or at least that should get me most of the way there and if I have to shrink the frame, I can do that too, but smkdan's player managed to do it). So, that's the plan.

Re: NMI vs IRQ
by Señor Ventura on 2016-10-17 (#180930)

qwertymodo wrote:

Then, can anybody help me understand where these numbers are coming from? I need to transfer ~33K over the course of 4 frames (15fps@60Hz), or else I need to drastically reduce my frame size.

I think the problem is that you don't have the bandwidth to do that. You only have 22,8 KB every 4 frames.

But by decreasing the resolution to 192 horizontal scanlines, you have 10,88 KB per frame, and I don't know why you wouldn't have enough bandwidth with 256x160 or 256x144.

Re: NMI vs IRQ
by 93143 on 2016-10-17 (#180939)

qwertymodo wrote:

The source video is letterboxed to 256x144 (or maybe 160, I don't remember), so I'll look into the use of forced blanking. Not really sure how to do that though. This whole video thing is new for me.

If you're only transferring ~9 KB per frame, and your visible area is 160 lines high and centered vertically, you can probably get away with just setting the top bit of $2100 at the beginning of NMI, and resetting it at the end, as long as all the tiles above and below the video frame are black. Normal VBlank plus the extra 32 lines at the top of the screen should give you something like 11 KB, with a sufficiently lean and efficient NMI routine.

If you needed more bandwidth, you could use an IRQ set to trigger at the bottom of the video playback window, and disable NMI entirely. This would get you roughly 16 KB per frame.
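Both estimates can be reproduced roughly with the same 165.5 bytes/line figure. This sketch assumes an NTSC frame of 262 lines and a 160-line window centred in the 224-line picture. It does not count line 0 and ignores NMI/IRQ overhead, so real numbers will be somewhat lower.

```python
BYTES_PER_LINE = 165.5  # DMA bytes per scanline, as quoted in this thread
NTSC_LINES = 262
PICTURE_LINES = 224

def dma_budget(blank_lines):
    """Bytes DMA could move in `blank_lines` scanlines, ignoring overhead."""
    return blank_lines * BYTES_PER_LINE

window = 160                                  # video height in lines
margin = (PICTURE_LINES - window) // 2        # 32 black lines above and below
vblank = NTSC_LINES - (PICTURE_LINES + 1)     # lines 225..261 (line 0 not counted)

nmi_plus_top = vblank + margin       # force blank from NMI until the video starts
irq_at_bottom = vblank + 2 * margin  # also force blank below the video (IRQ-driven)

print(f"NMI + top margin: {dma_budget(nmi_plus_top):.0f} bytes")
print(f"IRQ at video end: {dma_budget(irq_at_bottom):.0f} bytes")
```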

You say the original video is 15 fps?

...

You're not using HDMA for anything during video playback, are you?

Señor Ventura wrote:

You only have 22,8 KB every 4 frames.

But by decreasing the resolution to 192 horizontal scanlines, you have 10,88 KB per frame

How are you calculating those numbers? DMA is 165.5 bytes per line. Are you assuming a particular amount of code overhead?

Re: NMI vs IRQ
by tepples on 2016-10-17 (#180943)

22800 bytes/4 frames * 1 line/165.5 bytes = 34.4 out of 38 lines. I'm not sure how much time the S-PPU spends in "pre-render" seeking out sprites for the first line, though.

Re: NMI vs IRQ
by qwertymodo on 2016-10-17 (#180945)

Since the video is letterboxed, I was able to force blank from scanline 184-40 which gave me enough time to upload the frame in 4 chunks. Now I just have a bunch of counter logic code to write and I'll have myself a video player.

Re: NMI vs IRQ
by 93143 on 2016-10-17 (#180949)

That's only 142 lines of video. I assume it's supposed to be 144...

Also, I don't know if you're doing this, but it's probably wise to avoid using HDMA, because it can interact badly with regular DMA on early S-CPUs. I'm not sure if the bug happens on a 'dry fire' (HDMA active but no data for that particular scanline), but the fact that it can happen at the beginning of line 0 suggests that it might, and I wouldn't risk it. You can start HDMA partway through a frame, but it's a bit fiddly and I believe there's a one-line delay.

Then again, if you're starting the main DMA right below the video, it's probably finished well before the top of the screen...

Re: NMI vs IRQ
by qwertymodo on 2016-10-17 (#180957)

I'm not using HDMA. Good call on the line numbers, I probably want to blank 185-39 (I'm using a tilemap with fully black tiles in the letterboxed area, so there won't be any garbage if I blank late/enable early, it just cuts down on the bandwidth a bit). Anyway... it's working (the audio is a bit out of sync because it's muxed in separately, I don't feel like coding up an SPC ROM and upload routine just to disable the DSP mute register, especially since this is going to be a ROM hack anyway, and the original game will handle that for me).

https://www.youtube.com/watch?v=SO0zOXvgk64

Re: NMI vs IRQ
by qwertymodo on 2016-10-18 (#180999)

Thanks for all the help. I've managed to get all 10 FMVs playing now.

https://www.youtube.com/watch?v=kJ7CT6bFPlw

Re: NMI vs IRQ
by 93143 on 2016-10-18 (#181016)

Good job!

I find Color quantizer works better than the GIMP in many cases. It certainly has more options (particularly in the old version 0.6.5.0)... Don't be afraid to fiddle with "Max error"; I find that small patches of colour can come out horribly wrong if it's too small...

Unfortunately I don't see a way to quantize in a 15-bit colourspace. If you use a custom palette, it doesn't seem to pay any attention to the "Number of colors" box... I've tried using a custom palette to quantize to 15-bit with dither, and then re-dithering to 256 colours afterwards, but the first step takes forever for some reason (then again, my computer has been acting strangely of late, so your mileage may vary)...

What's the colour depth of the original video? The above paragraph is only relevant if it's higher than 15-bit...

Some quick test results (image credit: tepples):

GIMP 2.8.10 - reduce to 256 colours with positioned dither, then posterize to 32 levels.

Attachment:

Wii_kids_GIMPpositional256_RGB15.gif [43.44 KiB]

Color quantizer v0.6.5.0 - Adaptive, 256 colours, frame size 2048, max error 15, ordered dither, all other settings default. Load in GIMP and posterize to 32 levels. Generally much smoother, but notice the bit depth issues in the dark areas...

Attachment:

Wii_kids_CQordered256e15_RGB15.gif [41.6 KiB]

Color quantizer v0.6.5.0 - custom RGB555 palette, 8x8 pattern dither (by far the most computationally intensive kind of dither; this step took nearly 3 minutes), tweaked max error and frame size but can't remember settings. Save and load modified image. Adaptive, 256 colours, frame size 4096, max error 14, ordered dither, all other settings default. Finally load in GIMP and posterize to 32 levels.

Attachment:

Wii_kids_CQpatternRGB15_ordered256e14.gif [45.28 KiB]

Has anyone else got any recommendations for a good colour quantization and dither solution?

If you'd rather roll your own, that's not a bad thing and I'm not going to stop you...

...

I don't know what you're using to target SNES format, but note that pcx2snes has a chroma dither feature that screws with hue in an attempt to preserve luminosity, and it is implemented in such a way as to mess up the palette of an image that has already been posterized to approximate 15-bit colour. I find the results can be rather ugly. I prefer to use my own code to convert images to SNES format, because I know it won't pull any weird shenanigans...

Re: NMI vs IRQ
by qwertymodo on 2016-10-18 (#181032)

I wrote my own PNG-to-bitplane converter in C++, so if I implement any of the custom dithering algorithms, like one of these, I'll do it in there. I might also move the resizing in there to avoid that extra step. For now, I'll give Color quantizer a shot. At the very least, I need to try the positional algorithms, since error-diffusion algorithms don't play nice with animation.
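For reference, a positional (ordered) dither decides each pixel from its own value and position alone. That is why it stays stable from frame to frame, unlike error diffusion. Below is a minimal Python sketch that uses the standard 4x4 Bayer matrix to reduce one 8-bit channel to the SNES's 5 bits. This is a generic textbook technique, not qwertymodo's converter.

```python
# Standard 4x4 Bayer threshold matrix (values 0..15).
BAYER4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]

def dither_to_5bit(value, x, y):
    """Map an 8-bit channel value to 0..31 with ordered dithering.

    The result depends only on the value and the pixel position, never on
    neighbouring pixels, so a static area stays identical between frames.
    """
    scaled = value * 31 / 255          # exact position on the 5-bit scale
    base = int(scaled)                 # lower of the two nearest levels
    threshold = (BAYER4[y % 4][x % 4] + 0.5) / 16
    return min(31, base + (1 if scaled - base > threshold else 0))

# A flat mid-grey 4x4 patch: 9 of 16 pixels round up from level 15 to 16.
patch = [[dither_to_5bit(128, x, y) for x in range(4)] for y in range(4)]
print(patch)
```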

Re: NMI vs IRQ
by 93143 on 2016-10-18 (#181038)

qwertymodo wrote:

custom dithering algorithms, like one of these

Hey, those look really good. For 16 colours, anyway... Plainly you're way ahead of me on this.

Quote:

resizing

...wait, I thought you said the source video was 256x144...

Quote:

error-diffusion algorithms don't play nice with animation.

Ah. Good point.

Re: NMI vs IRQ
by qwertymodo on 2016-10-19 (#181083)

93143 wrote:

...wait, I thought you said the source video was 256x144...

The original source is 320x176, letterboxed to 320x240. The final output is running at 256x144, letterboxed to 256x224

Re: NMI vs IRQ
by 93143 on 2016-10-19 (#181097)

I see.

Not to be too much of a back seat driver or anything, but why not 256x152? According to my calculations it would still fit fairly comfortably if you left as much data as possible to the last frame, and the aspect ratio would be slightly less wrong.

Based on the page you linked above on the subject of dither algorithms, I'm going to assume you plan to use a high-quality rescaling algorithm too... man, I really have trouble trusting people to do their hobbies right, don't I?

Re: NMI vs IRQ
by qwertymodo on 2016-10-19 (#181098)

256x152 wouldn't be the right aspect ratio. The correct ratio is actually 256x141.8, so I'm just calling it 142 and adding 1 black line on top and bottom.

Re: NMI vs IRQ
by 93143 on 2016-10-19 (#181099)

An NTSC PSX in 320-wide mode has a 32:35 PAR, meaning the video should display with an aspect ratio of 1.66. An NTSC SNES in 256-wide mode has an 8:7 PAR, meaning the video as you've converted it shows up with an AR of 2.06. You could get an exact match with 256x176, but of course that wouldn't fit in VRAM without tearing; getting the correct aspect ratio at 8bpp would involve narrowing the image. Here's a reasonably good comparison; 240x160 apparently fits, and gives an aspect ratio of 1.71.

(If one were to make the assumption that the original video was prepared without taking the PAR into account, the SNES resolution closest to recovering the animation's natural aspect ratio with full screen width would be 256x161. But based on the shape of the sun in that first frame, I don't think this is true.)
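The figures above follow from displayed AR = (width / height) × PAR. Here is a quick Python check that takes the 32:35 (PSX 320-wide) and 8:7 (SNES 256-wide) pixel aspect ratios quoted here as given.

```python
from fractions import Fraction

PSX_320_PAR = Fraction(32, 35)  # NTSC PlayStation, 320-wide mode (as quoted)
SNES_256_PAR = Fraction(8, 7)   # NTSC SNES, 256-wide mode (as quoted)

def display_aspect(width, height, par):
    """Displayed width/height once the pixel aspect ratio is applied."""
    return width * float(par) / height

source = display_aspect(320, 176, PSX_320_PAR)
for w, h in [(256, 142), (256, 176), (240, 160), (240, 165)]:
    print(f"{w}x{h}: {display_aspect(w, h, SNES_256_PAR):.2f} (source {source:.2f})")

# Full-width SNES height if the source had been prepared with square pixels:
print(256 * float(SNES_256_PAR) / (320 / 176))
```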

Re: NMI vs IRQ
by tepples on 2016-10-20 (#181106)

You wouldn't need the bottom of the video to fit in VRAM twice. You can double buffer all but the last block and single buffer the last block, and you still won't get tearing.

Re: NMI vs IRQ
by qwertymodo on 2016-10-20 (#181112)

That's what I'm doing, it's the last block that's the limiting factor, since you need to upload it all in one frame. The larger your visible window, the more data you need to upload in the last frame, while at the same time having fewer lines to do it. I'm not sure how much more data I can pull off, if any (at least in terms of full lines of tiles).

Re: NMI vs IRQ
by 93143 on 2016-10-20 (#181118)

256x152 (aspect ratio ~16% too wide)

38912 B + 64 B (black tile*) + 2x1344 B (168 high*) = 41664 B for a single frame, including two tilemaps
65536 B - 41664 B = 23872 B max buffer (roughly 8 KB per frame for 3 frames)
38912 B - 23872 B + 512 B = 15552 B to transfer on last frame
15552/165.5 = 93.97 scanlines of DMA, resulting in about 16 scanlines of wiggle room

*not necessary with precise IRQ timing
[...actually, it occurs to me that a blank line of the tilemap could be reused as a black tile...]

256x160 (aspect ratio 10% too wide)

40960 B + Nx1280 B (160 high) = 42240 B (N=1) or 43520 B (N=2) for a single frame, including N tilemaps
65536 B - 42240 B = 23296 B (N=1), or 65536 B - 43520 B = 22016 B (N=2) max buffer
(40960 B + 1280 B) - 23296 B + 512 B (N=1) = 40960 B - 22016 B + 512 B (N=2) = 19456 B to transfer on last frame
19456/165.5 = 117.56 scanlines of DMA, resulting in about 16 scanlines of overload

(also note that it does not matter whether the tilemap is duplicated and switched or overwritten during the final frame)

240x160 (aspect ratio ~3% too wide)

38400 B + 64 B (black tile**) + 2x1408 B (176 high*) = 41280 B for a single frame, including two tilemaps
65536 B - 41280 B = 24256 B max buffer
38400 B - 24256 B + 512 B = 14656 B to transfer on last frame
14656/165.5 = 88.56 scanlines of DMA, resulting in about 13 scanlines of wiggle room

**not necessary with window masking and precise IRQ timing

240x165 (aspect ratio exactly correct)

40320 B (240x168 data) + 2x1344 B = 43008 B for a single frame, including two tilemaps
65536 B - 43008 B = 22528 B max buffer
40320 B - 22528 B + 512 B = 18304 B to transfer on last frame
18304/165.5 = 110.60 scanlines of DMA, resulting in about 14 scanlines of overload

256x176 Quantomatic (aspect ratio exactly correct)

22528 B + 1408 B = 23936 B for a single frame, including one tilemap (Quantomatic requires a tilemap refresh every frame)
(22528 B + 704 B + 256 B)/4 = 5872 B per frame, resulting in about 50 scanlines of wiggle room (ie: it fits in normal VBlank)

with sprites

22528 B + 1408 B = 23936 B for a single frame
(22528 B + 1408 B + 512 B + 544 B)/4 = 6248 B per frame, resulting in about 48 scanlines of wiggle room (ie: it still might fit in normal VBlank if you're careful/clever)
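The 8bpp tallies above can be reproduced with a small Python helper. It only restates that arithmetic under the same assumptions:
- 8bpp tiles with tile rows rounded up to 8 lines
- 64 KiB of VRAM
- a 512-byte palette sent with the last block
- 165.5 bytes of DMA per line
- 262 - height blanked lines per NTSC frame

With a single tilemap, that tilemap also has to be rewritten on the last frame. The name `last_frame_plan` is made up for this sketch.

```python
import math

VRAM = 65536            # bytes of VRAM
CGRAM = 512             # 256-colour palette, uploaded with the last block
BYTES_PER_LINE = 165.5  # DMA bytes per scanline, as quoted in this thread
NTSC_LINES = 262

def last_frame_plan(width, height, tilemap_bytes, tilemaps, extra=0):
    """Return (bytes on last frame, DMA lines needed, spare lines).

    extra: other resident data, e.g. a 64-byte black tile.
    A negative spare value means the last block overruns the blanking time.
    """
    tile_data = width * math.ceil(height / 8) * 8   # 1 byte per pixel at 8bpp
    resident = tile_data + extra + tilemaps * tilemap_bytes
    max_buffer = VRAM - resident
    last = tile_data - max_buffer + CGRAM
    if tilemaps == 1:
        last += tilemap_bytes  # the only tilemap is overwritten on the last frame
    dma_lines = last / BYTES_PER_LINE
    spare_lines = (NTSC_LINES - height) - dma_lines
    return last, dma_lines, spare_lines

for args in [(256, 152, 1344, 2, 64), (256, 160, 1280, 2),
             (240, 160, 1408, 2, 64), (240, 165, 1344, 2)]:
    last, lines, spare = last_frame_plan(*args)
    print(f"{args[0]}x{args[1]}: {last} B, {lines:.2f} lines, spare {spare:+.1f}")
```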

Re: NMI vs IRQ
by qwertymodo on 2016-10-20 (#181119)

I really don't want to do 240x160, the side bars always looked weird to me in smkdan's original. However, I will definitely look into 256x152. I'll have to scroll BG1 to recenter it, but that's pretty straightforward, right?

Re: NMI vs IRQ
by 93143 on 2016-10-20 (#181120)

Yes, scroll registers are single-byte dual write, accessible any time (even during active rendering, though the PPU won't notice for a couple of slivers). Just remember that neutral position is actually FFFFh (or 03FFh, since the underlying value is only 10-bit for non-Mode 7 layers) because line 0 is for sprite caching and isn't displayed.
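Here is a small sketch of the offset arithmetic for recentring. It assumes, hypothetically, that the video starts at tilemap line 0 and that whatever wraps into view above it is black. Per the post above, the register takes two writes, low byte first. `bg_vofs` is an illustrative name.

```python
def bg_vofs(top_margin):
    """Bytes to write (low, then high) to BG1VOFS ($210E) so that tilemap
    line 0 appears `top_margin` lines down the visible picture.

    Neutral (no shift) is $03FF because line 0 is not displayed; the value
    is 10 bits wide for non-Mode 7 layers.
    """
    value = (-1 - top_margin) & 0x3FF
    return value & 0xFF, value >> 8

# Centre a 152-line video in the 224-line picture (36 black lines above it).
margin = (224 - 152) // 2
print([hex(b) for b in bg_vofs(margin)])
```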

To me, the aspect ratio is a bigger issue than the sidebars, but it's your project...

I just wish Quantomatic were at a more sophisticated stage of development; right now it doesn't use sprites and has issues with colour fidelity in certain situations. Mind you, it still wouldn't look quite as good as true 8bpp... I'm thinking that combining it with CGRAM HDMA would be a fun task for a quantum computer...

Re: NMI vs IRQ
by qwertymodo on 2016-10-20 (#181123)

93143 wrote:

To me, the aspect ratio is a bigger issue than the sidebars, but it's your project...

Yes, I understand that objectively 240x160 is better, but in practice, aspect ratio isn't really that noticeable unless 1) it's an actual photograph where you can tell proportions of real objects are off, or 2) you are looking at it side-by-side with the original (or are very familiar with it) or 3) it's completely and horribly off

In this case, it's close enough, and the fact that it's animated lends considerable wiggle room before it starts to look visibly off without the original to compare to, and stretching to 256x152 will get us closer. On the other hand, the sidebars are just *there*, obviously visible by themselves, regardless of whether or not you're looking at the original, which, to me, is a larger issue when viewed on its own, as it will be by 99% of everybody who is playing the game normally. It's like watching a widescreen video on YouTube where the uploader hardcoded the letterboxing into the actual video file so now you can't properly watch that widescreen video on a widescreen monitor.

Quote:

I'm thinking that combining it with CGRAM HDMA would be a fun task for a quantum computer...

Then we'd have to rename it quantumatic, amirite?

Re: NMI vs IRQ
by qwertymodo on 2016-10-21 (#181184)

Ok, 256x152 it is. It definitely looks better, so there was certainly a point to be made about the old AR not looking good, but at this point, I don't think you can really argue that the remaining AR correction would make up for the eyesore that is vertical pillarboxing.

Re: NMI vs IRQ
by tepples on 2016-10-21 (#181186)

Would it be hard to make some tiles 8-bit and others 4-bit using BG1 and BG2 of mode 3? If you can recognize which tiles in each frame don't need the extra bit depth, that can save you some bandwidth.

Re: NMI vs IRQ
by 93143 on 2016-10-21 (#181188)

qwertymodo wrote:

at this point, I don't think you can really argue that the remaining AR correction would make up for the eyesore that is vertical pillarboxing.

Well... they do look good, but the AR is still noticeably off. If the SNES had square pixels, simply correcting for the Playstation's non-square pixels would result in an ideal resolution of 256x154, which is why your screenshots look so good without aspect correction. But the SNES does not have square pixels, as is popularly demonstrated by the famous pic of the full moon over Magus' castle (the developer really lived up to their name with this game). Scaled horizontally by 8:7, those screens look like this:

Attachment:

Yukud5jPAR.png [49.79 KiB]

Attachment:

VzKgDlhPAR.png [170.45 KiB]

And would the sidebars really be all that noticeable in a real use case? On an HDTV, or even a widescreen monitor, a SNES game running fullscreen will have sidebars anyway, and on a CRT the screen typically has a dark or black border that can even cover up some of the picture. And anyway lots of games with FMV and/or software rendering (Out Of This World, Star Fox, etc.) had black borders on all four sides.

Nevertheless I respect your decision. Full width does have its own appeal, and it does look basically perfect in an emulator with square pixels...

...

Also, why is the first image (the sky/sun) only 151 high? It seems to be missing a line off the bottom. The second one is 152 as expected...

qwertymodo wrote:

Then we'd have to rename it quantumatic, amirite?

I wasn't going to say that, but I suppose somebody had to...

tepples wrote:

Yeah, but then you need two tilemaps, and they both need updating. Worst-case bandwidth is actually higher than with straight 8bpp. Unless you got really lucky with the material, the only reason to do that would be to save some space, and this hack is nowhere near the limits of the MSU1.

Re: NMI vs IRQ
by qwertymodo on 2016-10-21 (#181195)

93143 wrote:

tepples wrote:

The hard part is that right now every frame's tilemap is identical, so I can upload frame n, then upload tilemap n+1 and switch to it in the middle of uploading frame n+1, so I can overwrite tilemap n when I get to it and have that much less data to upload in the final frame. If I were to try and optimize with 2 different BG layers, not only is that 2 tilemaps to upload, it's 2 tilemaps worth of space I can't touch until the end of the frame, which is exactly when the bandwidth timing is already the tightest. Not to mention the vast increase in complexity converting the data.