Question about Gameplay speedrun on real hardware

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
View original topic
Question about Gameplay speedrun on real hardware
by on (#75750)
Hello
I hope this topic wasn't discussed here before, but are there any projects that implemented speed-runs / or any other runs to an actual NES ROM? I mean like changing the demo run in SMB to run somebody's actual gameplay (there's the game genie code for enabling the sound in the SMB demo run AOSYLZEI + AUNPAAEY =thanks nesondubois) or, that really interrest me, to implement a Rockman run to actual Rockman.nes - you see my point...
I know there are sync/timing problems with real hardware as discussed here:
http://tasvideos.org/forum/viewtopic.php?t=4288
but nevertheless I would be really interrested in this, even if the succes rate of that run would be only like 50%.
But as a non-programmer I would only be interrested in some already made compiler, just add salt and water and it works -> that's adding rom(SMB and Rockman1 are my real interrest) + run(something like emulator "video") = final product.

EDIT: I know about the extra Famicom hardware (can't remember the name now) that can repeat button presses/ and/or using a PC to do the same thing, but that's not the question...

by on (#75751)
You mean adding the playback stuff in the ROM itself? You won't find a easy solution for that, every game is different. It's possible but making an automated tool isn't an easy task...

I thought about doing a replayer like this for, guess what, PowerPak. :) I think it should be possible by allocating part of the PRG-ROM for controller states, stealing a couple of cycles off NMI (to read the controller data from PRG-ROM and pass it to FPGA) and using the FPGA to watch for controller reading instructions and substituting (like my save state mappers' turbo feature already does). That is if the game actually keeps NMI enabled at all times...

by on (#75779)
Technically, if there's enough free PRG-ROM space, you could easily store the button log (i.e. emulator movie, but compressed to save more space) there and modify the game's controller reading routine to read from that log instead. You'll also need a couple of free RAM bytes, to keep track of the button data.

Since games use vastly different mappers (which means that the way in which the button logs will be stored will vary due to the way PRG-ROM is bankswitched), have their controller reading routines in different places, and use RAM very differently, you can't automate this process. Each game is a particular case.

A tool can even be made for this, but it will only support games that have been explicitly added by the person making the program, it will not automatically support "every ROM ever".

by on (#75782)
It does not address your question, but this guy hooked up an Arduino to the controller port, then fed an FCEUX movie to it so it played through a speed run on a real NES.

Really impressive!

by on (#75783)
I like the button log idea, I had something like that in my mind. Keeping it simple. Updating the finished compilation software due to the differences between games seems normal... But developing something like that would be probably easier for the powerpak first (although I would like the idea better if I had one).

I don't understand the mechanics of storing the emulator videos in current emlators, it isn't just simple time(when) + button(what) log right? So there would have to be also a compiler for those runs. The "special" TAS runs are out, many of those use routines which aren't even supported by the real hardware (AFAIK). But there are many "normal" runs and speed-runs that can be reproduced by a human (understand "real hardware").

The Famicom hardware button-log storage unit was called "Hori Game Repeater". Some info here:
http://www.famicomworld.com/forum/index.php?topic=4847
Pictures here:
http://www.forum.emunes.pl/index.php?topic=2927.0

EDIT: Thanks for the Arduino link, haven't seen that before, nice!

EDIT2: I just found a relevant discussion here:
http://nesdev.com/bbs/viewtopic.php?t=6162

by on (#75784)
TAS' generally work from a cleared save file, from the start of a game, and use no cheats or anything. They're nothing but pure button input. This being true, anything a TAS does should absolutely be possible in the game, on the actual console even if it's not HUMANLY possible. Unless "special" TAS refers to something I haven't seen.

TAS button input may not work on a console, but the cause of this would be that the TAS was made on an inaccurate emulator, and not really the TAS' input itself. In fact this may even break "normal" speed runs done on an emulator. For instance, emulators tend to work from cleared RAM, but NES games might rely on the "random" RAM at startup to seed their random number generators.

In fact there are even several verified on console runs, and now that methods like the Super Mario Bros. video exist, more runs will probably be verified.

by on (#75786)
tokumaru wrote:
Technically, if there's enough free PRG-ROM space, you could easily store the button log (i.e. emulator movie, but compressed to save more space)

Any ideas for a compression/decompression algorithm (I remember you being active in the CHR compression thread)? Trying to do this on PowerPak tickled my interest again. :)

by on (#75788)
Quote:
Any ideas for a compression/decompression algorithm

Wouldn't you be fairly likely to have some data sequences repeated fairly often in a button log from a speed run? So a LZ variant should perform fairly well. aPLib maybe? Or PuCrunch.

by on (#75789)
mic_ wrote:
Quote:
Any ideas for a compression/decompression algorithm

Wouldn't you be fairly likely to have some data sequences repeated fairly often in a button log from a speed run? So a LZ variant should perform fairly well. aPLib maybe? Or PuCrunch.

Hmm, I was thinking about something that could be decompressed easily. If it could be done on the FPGA that would be even better. :) I have to say I'm (almost) completely clueless about compression methods.

by on (#75790)
Here's an aPLib decoder I ported to the 6502: http://jiggawatt.org/badc0de/decrunch/aplib_decrunch_6502.asm
Note that it isn't fully optimized.

I think the PuCrunch archive includes a decoder for the 6510 that could be used pretty much as-is.

by on (#75794)
I worked on an AVR-based solution to play back a TAS over the controller port (never finished :( ). Here's the compression scheme I used for this project. I've converted it to C for readability.


Code:
typedef struct
{
   byte num_frames;   // The number of frames to hold this input byte.
   byte input;         // The litteral bits to load into the latch register
} input_record;

struct input_dictionary
{
   input_record records[128]; // The dictionary of input records
}

typedef union
{
   byte dict_entry;   // If bit 7 is set, bits 0-6 are an index into input_dictionary.
   input_record record; // If bit 7 is not set, this is a litteral input record.
} input;


That way you have your 128 most frequent input records compressed down from two bytes to one. And you can still represent holding down the same inputs for over two seconds without using another input record.

I did write a program to convert FCEU format input movies into this format, but I can't seem to find it.

So with all that said I was working with a very limited platform. The input movie would have been burned into the ROM of the AVR, then read back using slow program reads. I only had 32 bytes of RAM to work with (not counting a return stack), so I never looked into fancy compression and the like.

Ya'll are way beyond me with the PowerPak stuff. I can't even read VHDL or whatever it is you all use :D

Wow, this really makes me want to finish up that AVR input playback device. I'll have to go work on that over the weekend :D

by on (#75797)
thefox wrote:
Any ideas for a compression/decompression algorithm (I remember you being active in the CHR compression thread)? Trying to do this on PowerPak tickled my interest again. :)

In my game I just used RLE (buttons + frame count), because the demos are fairly short. I have a recording routine that outputs to SRAM, so I can later grab the data from the save file and incbin it in the ROM for playback.

How much memory do you have for this on the PowerPak? Keeping track of several minutes (possibly more than an hour) of button logs might be a challenge.

by on (#75799)
mic, I think those algos are too complex for this purpose. Thanks anyways.

tokumaru wrote:
thefox wrote:
Any ideas for a compression/decompression algorithm (I remember you being active in the CHR compression thread)? Trying to do this on PowerPak tickled my interest again. :)

In my game I just used RLE (buttons + frame count), because the demos are fairly short. I have a recording routine that outputs to SRAM, so I can later grab the data from the save file and incbin it in the ROM for playback.

How much memory do you have for this on the PowerPak? Keeping track of several minutes (possibly more than an hour) of button logs might be a challenge.

Well, the idea was to store the states in the the upper half of the PRG-ROM (512K on PowerPak, so game can still use 256K max), hook NMI to a memory area where FPGA dynamically generates code to do BIT $xxxx JMP original_nmi (4+3 cycles). FPGA then grabs the read value from the data bus and returns it the next time game reads $4016/$4017.

So 256K would allow ~72 minutes worth of uncompressed controller states (single player). I don't know how long the longest speedruns are but I'd like the limit to be bit bigger than that.

qbradq wrote:
I did write a program to convert FCEU format input movies into this format, but I can't seem to find it.

Let me know if you find it, I would be especially interested in the compression rate.

by on (#75801)
Good news! I got bored so I whipped up a Python program to compress FCEUX FM2 input movies into the format I documented above. I tested this with a ~18 minute run of SMB1 and a ~6 minute run of SMB1. Both came out to about 6% of the uncompressed (one byte per frame) size.

Download Link (GPL V3, Requres Python 2.5.4 or Higher)

Edit: Oh yea, I forgot to mention. This program only supports first player, only with a Game Pad, and there are *no* sanity checks. I may add that stuff later, or anyone else is free to.

I am very excited to work on my AVR-based playback thing-a-majig now. Only problem is I only have *very* small AVRs with 2K Flash space. I'll have to order bigger ones to see full speed runs :oops:

Here's the Stats! Yay!
Code:
Title:         NES Super Mario Bros (JPN/USA PRG0) "warpless" in 18:41.7 by Lee (HappyLee)
Download URL:   http://tasvideos.org/movies/fm2/happylee3-smbwarpless.zip
Total Frames:   67413
Output Length:  4391 Bytes
Compressed %:   6.513580%

Title:         NES Super Mario Bros (JPN/USA) "walkathon" in 06:47.2 by Lee (HappyLee).
Download URL:   http://tasvideos.org/movies/fm2/happylee4-smb-sidestroller.zip
Total Frames:   24472
Output Length:  1498 Bytes
Compressed %:   6.121281%


Python Program (WARNING: Ugly Code :D )
Code:
'''fm2bin.py

Convert an FCEUX .FM2 Emulator Movie File into a binary form for playback on
NES hardware.

Copyright (C) 2011 Norman B. Lancaster

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.

'''

import os
import sys
import re
import struct


input_bytes = []
rle_words = []
word_count = dict()
dict_words = dict()
output_bytes = []


def main():
   if len(sys.argv) != 3:
      print '''Usage: fm2bin.py input output
input   *MUST* be a .fm2 version 3 file. This is created by FCEUX. Only port 0
        input will be processed. Port 0 must have been a game pad. Save states
      are not supported either.
output  The output binary file. See "fileformat.txt" for details.
'''
      sys.exit(1)
   
   # Load the input file
   input = open(sys.argv[1], "r")
   while True:
      line = input.readline()
      if line == "":
         break
      line = line.strip()
      
      # Input record line
      if line.startswith('|'):
         byte = 0
         match = re.search('\|([^\|]*)\|([^\|]*)\|', line)
         token = match.group(2)
         for char in token:
            byte = byte << 1
            if char != " " and \
            char != ".":
               byte += 1
         input_bytes.append(byte)
   input.close()
   
   # REL-compress the data
   last_byte = -1
   count = 0
   for byte in input_bytes:
      if ( byte != last_byte and \
      count > 0 ) or \
      count >= 128:
         rle_words.append(((count - 1) << 8) | (last_byte & 0xff))
         count = 0
      last_byte = byte
      count += 1
   rle_words.append(((count - 1) << 8) | (last_byte & 0xff))
   
   # Build the word dictionary
   for word in rle_words:
      if not word in word_count:
         word_count[word] = 0
      word_count[word] += 1
   
   # Sort the word dictionary
   tupples = []
   for key in word_count:
      tupples.append((word_count[key], int(key)))
   tupples.sort()
   tupples.reverse()
   
   # Take the top 128 elements as the compression dictionary
   tupples = tupples[0:128]
   for i in range(0, len(tupples)):
      dict_words[tupples[i][1]] = i
      tupples[i] = tupples[i][1]
   
   # Output the dictionary
   # NOTE: tupples is now an array of INTs, should have thought that one through :D
   output_bytes.append(len(tupples))
   for word in tupples:
      # Just like in the record stream, count first, then byte
         output_bytes.append(word >> 8)
         output_bytes.append(word & 0xff)
   
   # Output the input record stream
   for word in rle_words:
      # Compress-able record
      if word in dict_words:
         # Bit 7 is the "this is a dictionary word" flag
         output_bytes.append(dict_words[word] | 0x80)
      # Non-compress-able record
      else:
         # Count first, then byte
         output_bytes.append(word >> 8)
         output_bytes.append(word & 0xff)
   
   # Output the file
   # Note: there may be a much faster way to do this.
   output = open(sys.argv[2], "wb")
   for i in range (0, len(output_bytes)):
      output.write(struct.pack("B", output_bytes[i]))
   output.close()
   
   print "Total Frames:\t%d" % len(input_bytes)
   print "Output Length:\t%d Bytes" % len(output_bytes)
   print "Compressed %%:\t%f%%" % float((float(len(output_bytes)) / float(len(input_bytes))) * float(100))


if __name__ == "__main__":
   main()


File Format Documentation:
Code:

File format description for the binary output file created by fm2bin.py

by Norman B. Lancaster



Offset   Length   Desscription
0      1      (N) Length of the compression dictionary in entries
1      2*N      Compression entries.
            Byte   Meaning
            0      The number of frames to hold this data in the output latch for controller 1
            1      The data that should be loaded into the output latch for controller 1
2*N+1   XXX      Input record entries.
            Byte   Meaning
            0      The number of frames to hold this data in the output latch for controller 1
                  If bit 7 is set this is an index into the compression entries table (expressed
                  as an offset in compression entires, NOT bytes) to load.
            1      The data that should be loaded into the output latch for controller 1
                  If bit 7 of byte one is set, skip this byte.

Results for Reference FM2 Files:

Title:         NES Super Mario Bros (JPN/USA PRG0) "warpless" in 18:41.7 by Lee (HappyLee)
Download URL:   http://tasvideos.org/movies/fm2/happylee3-smbwarpless.zip
Total Frames:   67413
Output Length:  4391 Bytes
Compressed %:   6.513580%

Title:         NES Super Mario Bros (JPN/USA) "walkathon" in 06:47.2 by Lee (HappyLee).
Download URL:   http://tasvideos.org/movies/fm2/happylee4-smb-sidestroller.zip
Total Frames:   24472
Output Length:  1498 Bytes
Compressed %:   6.121281%

by on (#75805)
qbradq wrote:
Good news! I got bored so I whipped up a Python program to compress FCEUX FM2 input movies into the format I documented above. I tested this with a ~18 minute run of SMB1 and a ~6 minute run of SMB1. Both came out to about 6% of the uncompressed (one byte per frame) size.

Wow, thanks, that's a much better ratio than I expected. I guess this means that I could actually use the 32K PRG-RAM for my PowerPak replayer instead of PRG-ROM. Makes some things easier, because recordings could be loaded using the standard PowerPak menus.

EDIT: FYI, I checked tasvideos.org and the longest "speedrun" seems to be DW4 at a bit over 2 hours. I should try couple of more videos with your script to see how well they compress in general.

by on (#75815)
Please post your results! I'd love to see them :D

by on (#75835)
qbradq, that compression scheme is very clever! Too bad that previous button combinations have to be stored in a separate array, because 128 bytes is a little too much RAM to dedicate to demos in a game.

I'm sure this is a great solution for other applications though, the compression ratio is very impressive.

by on (#75843)
tokumaru wrote:
qbradq, that compression scheme is very clever! Too bad that previous button combinations have to be stored in a separate array, because 128 bytes is a little too much RAM to dedicate to demos in a game.

As far as I understand it the dictionary doesn't need to be in RAM though.

EDIT: Here are some results. Overall very good. Lagrange Point is a strange outlier.
Code:
>fm2bin.py adelikat-dragonwarrior4.fm2 adelikat-dragonwarrior4.bin
Total Frames:   464795
Output Length:  19512 Bytes
Compressed %:   4.197980%

>fm2bin.py neo_omegon-lagrangepoint.fm2 neo_omegon-lagrangepoint.bin
Total Frames:   228682
Output Length:  38039 Bytes
Compressed %:   16.634016%

>fm2bin.py acmlm-destinyofanemperor.fm2 acmlm-destinyofanemperor.bin
Total Frames:   292658
Output Length:  20985 Bytes
Compressed %:   7.170486%

>fm2bin.py aglar-startropics.fm2 aglar-startropics.bin
Total Frames:   221542
Output Length:  12832 Bytes
Compressed %:   5.792130%

>fm2bin.py nitrodon1-lolo3.fm2 nitrodon1-lolo3.bin
Total Frames:   242350
Output Length:  13041 Bytes
Compressed %:   5.381060%

>fm2bin.py samioutinen-startopics2.fm2 samioutinen-startopics2.bin
Total Frames:   219654
Output Length:  15508 Bytes
Compressed %:   7.060195

by on (#75844)
From reading the description I thought that the dictionary was composed by the latest 128 button combinations used... You could have the index point to one of the previous 128 bytes in the raw stream, but that would effectively make the dictionary shorter I think...

Maybe I got it wrong and the dictionary is actually static?

by on (#75905)
Quote:
From reading the description I thought that the dictionary was composed by the latest 128 button combinations used... You could have the index point to one of the previous 128 bytes in the raw stream, but that would effectively make the dictionary shorter I think...

Maybe I got it wrong and the dictionary is actually static?


The dictionary is static. It is the 128 words of the input stream that are the most common (so 256 bytes). So when reading the input stream you read one byte, and if that byte's MSB is set, then bits 0-6 are the index into the dictionary / 2. If the MSB is not set, then bits 0-6 are the number of frames to hold the input byte for, and the next byte is the input byte.

I never thought about doing a stack-based compression scheme. That could potentially give smaller results, but the implementation would be much more complex.

I have thought of something that is going to prevent me from implementing this with a controller-port only application. The input data I have is in terms of frames, and I have no visibility to the frame start / end through the controller port. I'll have to rig something up with FCEUX to play and re-record the input movie in terms of what the input state is per strobe of the port or something.

@thefox: Hey man, don't forget that there are many games (like SMB3) that read the control port multiple times per frame and wait until it's the same. You probably already thought of that though :D

by on (#75907)
qbradq, your system for embedding TAS type inputs is pretty slick. I plan to put this into my Yars' Revenge clone, to spice up my "attract mode" (but for both controllers, as my attract mode demos the two-player coop mode).

Thank you for your excellent contribution to the nesdev community.

by on (#75913)
Quote:
qbradq, your system for embedding TAS type inputs is pretty slick. I plan to put this into my Yars' Revenge clone, to spice up my "attract mode" (but for both controllers, as my attract mode demos the two-player coop mode).

Thank you for your excellent contribution to the nesdev community.


Cool man, thanks! I can do the work of porting this thing to support two players if you'd like.

by on (#75914)
qbradq wrote:
I have thought of something that is going to prevent me from implementing this with a controller-port only application. The input data I have is in terms of frames, and I have no visibility to the frame start / end through the controller port. I'll have to rig something up with FCEUX to play and re-record the input movie in terms of what the input state is per strobe of the port or something.

I actually have a similar problem -- my thing needs the controller state per NMI. So I'll have to do some rigging as well. I just hope there aren't too many games that have NMI disabled for menus or stuff, because that would complicate this quite a lot. :)

I guess I could try to make it detect instruction sequences such as LDA $4016.. dynamically patch the following instruction to JMP to my own code and then JMP back when it's done. Man that would be dirty, but kinda fun at the same time. ;)

Quote:
@thefox: Hey man, don't forget that there are many games (like SMB3) that read the control port multiple times per frame and wait until it's the same. You probably already thought of that though :D

Yeah this shouldn't pose a problem.. The FPGA can keep track of elapsed frames by counting CPU cycles.

by on (#75915)
Quote:
guess I could try to make it detect instruction sequences such as LDA $4016.. dynamically patch the following instruction to JMP to my own code and then JMP back when it's done. Man that would be dirty, but kinda fun at the same time.


Your FPGA magic is well over my head at this point, but I'll tell you what I was thinking about, maybe it will be of use to you.

I was basically going to simulate the parallel-load serial shift register inside the controller using the AVR. I would just listen to the clock line.

Anyhow, what would you think about a process that would run the input movie through the emulator and output a similar input stream but in terms of "times the controller port was strobed" instead of frames? Seems like that would reduce the complexity for both of our implementations.

by on (#75932)
qbradq wrote:
Anyhow, what would you think about a process that would run the input movie through the emulator and output a similar input stream but in terms of "times the controller port was strobed" instead of frames? Seems like that would reduce the complexity for both of our implementations.

Yeah that should be easier to implement I guess (if I don't use the NMI way, which is kinda easier for me). I wonder how much it adds to the size of the compressed files?

by on (#75935)
I doubt it will add much to the size. If anything it may reduce the size of some input movies.

Many games do not strobe the controller ports during cut scenes, level / screen transitions, etc. So we should end up with fewer samples. Games like SMB3 that strobe multiple times per frame will always be reading the same input byte when on an emulator, so it should only add one to the repeat count.

I don't know when I can work on this though. Seems like a big pain in the neck as I've never looked at the FCEUX source.

by on (#75936)
qbradq wrote:
I doubt it will add much to the size. If anything it may reduce the size of some input movies.

Many games do not strobe the controller ports during cut scenes, level / screen transitions, etc. So we should end up with fewer samples. Games like SMB3 that strobe multiple times per frame will always be reading the same input byte when on an emulator, so it should only add one to the repeat count.

I don't know when I can work on this though. Seems like a big pain in the neck as I've never looked at the FCEUX source.

Well, I can probably add this to Nintendulator. However first I need to figure out the Nintendulator movie format and how to convert FCEUX files to it...

by on (#75937)
qbradq wrote:
Many games do not strobe the controller ports during cut scenes, level / screen transitions, etc. So we should end up with fewer samples. Games like SMB3 that strobe multiple times per frame will always be reading the same input byte

Unless they hit the DPCM double clock glitch. If they hit it on the second read, there might be four strobes in one frame: one pristine, a second with a deleted bit, and a third and fourth pristine.

by on (#75939)
tepples wrote:
qbradq wrote:
Many games do not strobe the controller ports during cut scenes, level / screen transitions, etc. So we should end up with fewer samples. Games like SMB3 that strobe multiple times per frame will always be reading the same input byte

Unless they hit the DPCM double clock glitch. If they hit it on the second read, there might be four strobes in one frame: one pristine, a second with a deleted bit, and a third and fourth pristine.


Good point. That may make input movies for games like this somewhat larger. We won't know until we see I guess.

Quote:
Well, I can probably add this to Nintendulator. However first I need to figure out the Nintendulator movie format and how to convert FCEUX files to it...


It seems like you could support playback of FCEUX files in Nintendulator. It's a dead-simple format if you ignore things like starting from a save state (which isn't terribly hard, you'd just have to convert it to the Nintendulator format on the fly) and movie text.

by on (#76282)
So.. I got movie playback working on PowerPak. It doesn't use compression yet so the movies are limited to about 11 minutes on PAL NES, but I'll test it a bit and try work the kinks out before adding that.

I modified Nintendulator to output an NMI based controller log to a binary file. Now I just need to convert FCEUX movies to Nintendulator's format or modify FCEUX... or add FCEUX movie playback support in Nintendulator. I have a feeling it might be easiest to just convert FCEUX movies to Nintendulator's format. We'll see...

by on (#76289)
I was able to convert FCEUX FM2 movies to Nintendulator's NMV format.. however they're not playing back correctly. :( I made a short video in SMB and it desyncs after some time. The TASVideos.org SMB speedrun failed miserably. I verified that FCEUX and Nintendulator both update controller state at the first scanline of the vblank so that shouldn't be the issue.

I think there's a small "bug" in Nintendulator, it doesn't update the controller state for the very first emulated frame. I fixed this and StarTropics speedrun played little bit further than the last time but still desynced eventually.

Must be some small detail that's messing it up.

EDIT: I have no idea what's going on... my short SMB movie works without fixing that "bug", StarTropics gets much farther with it fixed.

EDIT: Fixed! I think. I had to skip TWO of the first frames in the FCEUX movies for it to work right in Nintendulator. It didn't help that Nintendulator seems to exhibit some nondeterministic behaviour when frameskip is enabled...

EDIT: Spoke too soon... I haven't gotten any PAL movie working (tho there aren't that many anyways). Some NTSC movies also don't work.