I am currently planning a 6502 compiler for a few reasons:
1) I'm kind of annoyed by different features or the lack thereof with different compilers.
2) I'm in an IB CS class and have to write a program for my final so I figured I'd go with something I might actually use and like.
On the note of the IB CS part, it is recommended that I interview others who have the potential to use it about what they would want in the program.
So what features would you find highly desirable in a compiler? Right now I'm looking at a few features:
1) Basic Optimizations (Kind of obvious but not all compilers do this *cough*NESASM*cough*)
2) Block commenting (I do not know of any compilers that offer this)
Are you making an "assembler", which compiles a dialect of assembly language to an object file, or a "compiler", which is generally understood to compile a higher-level language?
Awesome! I wrote a mini C compiler for my final college project too.
I've often thought about writing a "NES Friendly" compiler for a "C-like" language. I like cc65, but not for NES development.
One feature that I would add, and I recommend to you, is the ability for your language to handle 8, 16, 24 and 32 bit signed and unsigned integers and fixed-point math.
For example, your game's world coordinate system might be 2^16 pixels wide, but you want sub-pixel precision for object movement. So you really want an unsigned 24-bit variable to store the X coordinate. (I'm ignoring multiple dimensions). However, you want to store your object's velocities as single (signed) bytes, where the upper bit is the sign (two's complement of course), the next 4 bits are pixel magnitude, and lower 3 bits are the sub-pixel part. So you want to write in code "X += V" and have the compiler produce the correct bit-shifts, clcs, adcs, etc...
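To make the arithmetic concrete, here is a minimal Python sketch of what "X += V" would have to do. It assumes the 24-bit position is split as 16 pixel bits plus 8 sub-pixel bits (the split is my assumption; only the 24-bit width is given above), and the names `add_velocity` and `sign_extend_8` are made up for illustration.

```python
SUBPIXEL_BITS = 3          # velocity: sign, 4 pixel bits, 3 sub-pixel bits
POSITION_MASK = 0xFFFFFF   # position is an unsigned 24-bit value

def sign_extend_8(byte):
    """Interpret an 8-bit value as a two's complement signed number."""
    return byte - 0x100 if byte & 0x80 else byte

def add_velocity(x, v, x_frac_bits=8):
    """Return x + v with v's binary point aligned to x's, wrapped to 24 bits.

    x_frac_bits=8 is an assumed 16.8 layout for the position.
    """
    delta = sign_extend_8(v) << (x_frac_bits - SUBPIXEL_BITS)
    return (x + delta) & POSITION_MASK
```

On the 6502 the shift and sign extension turn into a handful of ASL/ROL and a three-byte CLC/ADC chain, which is exactly the boilerplate the compiler would be generating for you.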
Same for multiplication. You might want to express "X += V * sin(A)". There should be a way to give the compiler a hint that sin(A) is in range -1 to +1 and that V*sin(A) will not need all of the bits that a less-intelligent compiler would emit code for. (where 'sin' and 'cos' are LUTs).
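A rough Python sketch of the range argument, assuming a 256-entry sine table stored in signed 1.7 fixed point (table size and format are my assumptions, as is the name `scaled_by_sin`). Because every table entry has magnitude at most 127/128, the scaled product never needs more bits than V itself.

```python
import math

# Assumed layout: angle 0..255 covers one full turn, values are signed 1.7
# fixed point, so 127 stands for just under +1.0 and -127 for just over -1.0.
SIN_LUT = [round(127 * math.sin(2 * math.pi * a / 256)) for a in range(256)]

def scaled_by_sin(v, angle):
    """Return v * sin(angle) in the same fixed-point format as v.

    The arithmetic shift drops the 7 fraction bits of the table entry
    (rounding toward negative infinity).
    """
    return (v * SIN_LUT[angle & 0xFF]) >> 7
```

A compiler that knew the table's range could emit an 8x8 multiply and keep a single byte of the result instead of carrying the full 16-bit product around.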
Anyway, just some thoughts of mine about a NES-friendly compiler feature.
tepples wrote:
Are you making an "assembler", which compiles a dialect of assembly language to an object file, or a "compiler", which is generally understood to compile a higher-level language?
I'm writing an assembler. Straight 6502 -> ROM.
67726e wrote:
Basic Optimizations (Kind of obvious but not all compilers do this *cough*NESASM*cough*)
67726e wrote:
I'm writing an assembler. Straight 6502 -> ROM.
You got me a little confused then... What kind of optimizations should assemblers do? Assembly should be a straight conversion from mnemonics to binary code, and I don't know what kind of optimizations you think would be desirable during this process.
For example, on the NES there are times when you write the same value to the same memory location twice in a row, such as when you are resetting the scroll to (0, 0):
Code:
LDA #$00
STA $2005
STA $2005
An assembler that "optimizes" code might think that it's stupid to write the same value to the same location twice if it doesn't know that this address is mapped to a register and that all writes are being watched and have a meaning. I'd be very pissed if my assembler filtered out one of the writes, as that would produce a very hard to find bug.
I don't think there is any case I would want an assembler to monkey around with the code I wrote. The whole point of ASM is that you give the CPU precise instructions that should be followed to the letter, even if they don't always make sense.
tokumaru wrote:
I'd be very pissed if my assembler filtered out one of the writes, as that would produce a very hard to find bug.
Well to be fair, I could program the optimization to recognize necessary double writes, seemingly useless reads, etc.
Also, I'm not necessarily saying the compiler will have carte blanche and just do what it will with your code. I'm thinking more of an option that outputs spots of code that are not needed or could be better written. Just as an example:
Code:
LDA Some_Address
CMP #$00
BEQ Another_Location
The user would be prompted with this possible optimization:
Code:
LDA Some_Address
BEQ Another_Location
Finally, I really had in mind the whole zero-page issue with NESASM3 although I'm not sure if you would consider that an optimization or not.
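The advisory pass for the LDA/CMP example could look roughly like this in Python. It is only a sketch: `lint_redundant_cmp`, `split_line` and the mnemonic list are made-up names and assumptions, it only looks at the instruction directly before the CMP, and it treats a label line as "anything could have jumped here", so it stays quiet in that case.

```python
# Mnemonics assumed to leave N and Z reflecting the value now in A.
SETS_NZ_FROM_A = {"LDA", "AND", "ORA", "EOR", "ADC", "SBC", "PLA", "TXA", "TYA"}
ZERO_OPERANDS = {"#$00", "#0"}

def split_line(line):
    """Return (mnemonic, operand) for a source line, or (None, None) if empty."""
    code = line.split(";", 1)[0].strip()   # drop any ';' line comment
    if not code:
        return None, None
    parts = code.split(None, 1)
    operand = parts[1].strip() if len(parts) > 1 else ""
    return parts[0].upper(), operand

def lint_redundant_cmp(lines):
    """Return (line_number, message) pairs; the source is never modified."""
    warnings = []
    previous = None
    for number, line in enumerate(lines, start=1):
        mnemonic, operand = split_line(line)
        if mnemonic is None:
            continue                       # blank or comment-only line
        if (mnemonic == "CMP" and operand in ZERO_OPERANDS
                and previous in SETS_NZ_FROM_A):
            warnings.append((number,
                "CMP #$00 right after %s: Z and N are already set; "
                "keep it only if later code needs the carry set" % previous))
        previous = mnemonic                # a label line also resets this
    return warnings
```

Since it only returns warnings, the user decides what to do with each one.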
It sounds like someone's trying to make an assembler capable of peephole optimizations. Such an assembler would need a keyword to declare a symbol exempt from such optimization, just as C does with 'volatile'. It might look like this:
Code:
PPUCTRL = volatile $2000
PPUMASK = volatile $2001
PPUSTATUS = volatile $2002
OAMADDR = volatile $2003
OAM_DMA = volatile $4014
PPUSCROLL = volatile $2005
PPUADDR = volatile $2006
PPUDATA = volatile $2007
As for eliminating CMP #$00 after an instruction that loads A, CMP #$00 has the side effect of always setting the carry flag like SEC, and Another_Location might expect the carry to be set for an SBC, ROL, or ROR.
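Here is a small Python sketch of how such a qualifier could gate one peephole rule, the "drop a repeated store" case. The instruction representation (tuples of mnemonic and operand) and the function name are assumptions for illustration, not how any existing assembler works.

```python
def drop_repeated_stores(instructions, volatile_symbols):
    """Remove an STA that immediately repeats the previous STA to the same
    operand, unless that operand names a symbol declared volatile."""
    result = []
    for instr in instructions:
        mnemonic, operand = instr
        is_repeat = bool(result) and result[-1] == instr and mnemonic == "STA"
        if is_repeat and operand not in volatile_symbols:
            continue                       # safe to drop: plain memory
        result.append(instr)
    return result

scroll_reset = [("LDA", "#$00"), ("STA", "PPUSCROLL"), ("STA", "PPUSCROLL")]
```

With `{"PPUSCROLL"}` passed as the volatile set, `scroll_reset` comes back untouched; with an empty set the second write is removed, which is the hard-to-find bug described earlier.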
67726e wrote:
Well to be fair, I could program the optimization to recognize necessary double writes, seemingly useless reads, etc.
I think this severely limits the usefulness of your assembler. You might have a good understanding of the NES, but how much do you know about the hundreds of other computers that have a 6502 in them? I am really glad I can use the same assembler to make NES programs and Atari 2600 programs, something I probably couldn't do with NESASM and its stupid platform-specific things such as 8KB banks.
Quote:
Just as an example:
Code:
LDA Some_Address
CMP #$00
BEQ Another_Location
The user would be prompted with this possible optimization:
Code:
LDA Some_Address
BEQ Another_Location
I think of this kind of thing as unnecessary hand-holding. If you are coding in assembly, you must have some confidence in what you are doing. These little details are things that people usually learn to fix pretty early on, and it's not like they cause programs to be terribly inefficient. I think it's not worth the trouble because you'll spend a lot of time making a program smart enough to identify these situations (and the result is never 100% smart, I'm sure there will be dumb or plain wrong suggestions sometimes) but once newbies get past the "writing stupid code" phase it will rarely present any advantage. You might be spending your time better if you focused on other features related to macros, memory management and things like that, in order to make your assembler a very solid and useful program.
Quote:
Finally, I really had in mind the whole zero-page issue with NESASM3 although I'm not sure if you would consider that an optimization or not.
I guess this is an optimization after all. I'd rather have the option to do it one way or the other though, like using a command to pick the default option but still have the means to select a specific addressing mode for individual instructions when more than one is possible.
tepples wrote:
As for eliminating CMP #$00 after an instruction that loads A, CMP #$00 has the side effect of always setting the carry flag like SEC, and Another_Location might expect the carry to be set for an SBC, ROL, or ROR.
This is the perfect example of how optimizations like these can go terribly wrong. Assembly code is often more than what it appears to be, and making a program that always makes smart and *safe* decisions is not a trivial task.
But what if the instruction is in there for cycle-accuracy? Then it would be taking out a VERY key component. I'd rather have no optimizations, but that's just me. If you want optimized code, it shouldn't be the assembler that does it, it should be you.
tokumaru wrote:
If you are coding in assembly, you must have some confidence in what you are doing.
Some people code in assembly because they have full confidence in what they are doing. (You're likely to agree with me that action games' graphics engines should be left to such people.) Others code in assembly because most of the available high-level languages are nowhere near efficient enough to make even a non-scrolling game. Perhaps an assembler with a peephole optimizer could be a useful step toward making NES programming more accessible, if only as a base on which to build a compiler.
Quote:
I think it's not worth the trouble because you'll spend a lot of time making a program smart enough to identify these situations
Whenever I compile a C program with gcc -Wall -O2, I thank the GCC team for making its compiler smart enough to find type errors for me and to optimize the RTL that the C translation front-end generates.
Quote:
You might be spending your time better if you focused on other features related to macros, memory management and things like that
At this point, would it be worth it to make this project an extension to ca65 (zlib licensed 6502 assembler) instead of a rewrite from scratch?
@3gengames: For a cycle-timed subroutine, one should be able to mark a block of code as not suitable for peephole optimizations, and the assembler won't apply them there.
Let me clarify, the compiler will not optimize the code for you, it will output something saying "Check out this line, you might want to do this".
Now I'm assuming someone who is writing code to be cycle-accurate would know what they are doing and know this would break the code.
The optimization recommendations are also triggered via a switch when the program is called e.g. 'assembler game.asm -optimize'. It is by no means a feature that is run without the user's expressed desire, and even then it will tread lightly.
That said, do you still think it is a bad idea that just plain needs to be scrapped? In that case, for something like zero-page, how would you go about doing that? Would you always do zero-page unless told not to or always do $0001 unless told to do $01?
Also, are there any other things you would want in a compiler? Anything you think I should avoid?
So we introduce two qualifiers 'absolute' and 'volatile', and we introduce a directive '.peephole'.
Code:
somelabel = $F1
anotherlabel = absolute $F2
SNDCHN = volatile $4015
; These generate zero page/direct page addressing mode
lda $F1
lda somelabel
; These generate absolute addressing mode
lda absolute $F2
lda anotherlabel
; This LDA/CMP won't be changed to LDA/SEC.
.peephole off
LDA Some_Address
CMP #$00
BEQ Another_Location
.peephole on
; But if peephole were turned on, the SEC can be slid
; upward until it meets up with another that sets C.
; This LDA won't get removed, even though the values of
; A and flags NZ after the second LDA would ordinarily be
; the same as after the first LDA, because the label has the
; volatile qualifier.
lda #$0F ; instruction setting NZ flags
sta SNDCHN
lda SNDCHN
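The addressing-mode half of that can be sketched in a few lines of Python. The opcodes are the standard 6502 ones for LDA ($A5 zero page, $AD absolute); the function name and the `force_absolute` flag, standing in for the proposed 'absolute' qualifier, are made up for illustration.

```python
LDA_ZERO_PAGE = 0xA5
LDA_ABSOLUTE = 0xAD

def encode_lda(address, force_absolute=False):
    """Return the machine code bytes for LDA at the given address.

    Zero page is chosen whenever the address fits in one byte, unless the
    caller forces the 3-byte absolute form (the 'absolute' qualifier).
    """
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address out of 16-bit range")
    if address < 0x100 and not force_absolute:
        return bytes([LDA_ZERO_PAGE, address])
    # Absolute operands are stored low byte first.
    return bytes([LDA_ABSOLUTE, address & 0xFF, address >> 8])
```

So `encode_lda(0xF1)` gives A5 F1, while `encode_lda(0xF2, force_absolute=True)` gives AD F2 00, matching the two groups of LDA lines above.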
As someone pointed out, you're talking about an assembler, not a compiler.
There are already great assemblers out there. You might want to focus just on the optimizations - instead of generating code and warnings, just generate warnings. Much like lint. Then people could run their code through your tool, fix the warnings they cared about, and then assemble it with ca65 or something else.
atarimike wrote:
As someone pointed out, you're talking about an assembler, not a compiler.
There are already great assemblers out there. You might want to focus just on the optimizations - instead of generating code and warnings, just generate warnings. Much like lint. Then people could run their code through your tool, fix the warnings they cared about, and then assemble it with ca65 or something else.
I just wanted to point out that I am doing this for a class and there are certain criteria my application must meet. Something that just goes through and generates warnings does not sound like it would meet all of my criteria.
Ok, optimizations aside, what are features you would like to see in an assembler?
One of the things I am really set on having is block comments (/* */).
Are there any features you wish a current 6502 assembler had? Anything you really don't like?
GCC has a feature analogous to lint (-Wall); adding such a checker to ca65 should ideally qualify. If organized education fails to recognize that a solution that builds on an existing free software product is just as valid as a solution from scratch, that'd explain a lot of the NIH syndrome I've seen. (For more about NIH, see my previous post discussing definitions of plagiarism.) Are these IB criteria a trade secret?
As for your second question, almost every assembler feature that I've needed I've been able to implement as a preprocessor. See for example my source code shuffler written in Python (implementation and discussion), intended to help discover buffer overflows and to help trace leaked binaries.
I'd like to see a 6502/65816 assembler that supports the very 6502-like SPC700 instruction set with the same feature set that ca65 provides to 6502, 65C02, and 65816.
As it were, IB CS forces students to use Java. Now I'm not 100% certain, but I've got $20 saying that ca65 is not written in Java.
For the curious, the criteria are as follows:
1) Arrays
2) User-Defined Objects
3) Objects As Data Records
4) Simple Selection (if-else)
5) Complex Selection (nested if, if with multiple conditions, or switch)
6) Loops
7) Nested Loops
8) User-Defined Methods
9) User-Defined Methods w/ Parameters
10) User-Defined Methods w/ Return Values
11) Sorting
12) Searching
13) Binary File I/O
14) Use Of External Libraries
15) Use Of Sentinels/Flags
Personally I think the whole IB CS thing is a load of shieße. Had I known what I was getting into with this class, I would have not taken it. To give you an indication of the level of mastery my teacher has, he once told the class you could concatenate two chars into a String in 'System.out.println()' by separating them with a comma.
I told him to compile this:
Code:
public class Bull {
    public static void main(String[] args) {
        System.out.println('B', 'S');
    }
}
If the Java language is the requirement, then a preprocessor might be the perfect project.
Well the one problem with only making a preprocessor is that my teacher wants us to read and/or write a binary file. I can't imagine ever having to do anything involving binary files for a preprocessor.
Anyway regarding a compiler, aside from macros and block commenting, are there any other desirable features out there?
Haha, make it compile, but also output 8-Bit BIN files of the code with another command line switch?
It's not that hard at all with C. Even noobs like me figured it out quite easily, even if not that good.
Well this is sounding very interesting now and has great ideas. I hope you get a good grade and your tool gets used a lot! Sounds like it'll be good.
67726e wrote:
Well the one problem with only making a preprocessor is that my teacher wants us to read and/or write a binary file.
Then add a new directive .incdpcm that takes a .wav file (binary), encodes it to DPCM, and spits out a huge block of .byt statements. I did something similar for the GBAdev community when the GNU assembler didn't have .incbin.
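The "read a binary file, spit out .byt statements" half of that is short enough to sketch in Python. The DPCM encoding step itself is left out here, and the function names are hypothetical.

```python
def bytes_to_byt_lines(data, per_line=8):
    """Format raw bytes as assembler source lines of .byt statements."""
    lines = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append(".byt " + ",".join("$%02X" % b for b in chunk))
    return lines

def include_binary(path, per_line=8):
    """Read a file in binary mode and return it as .byt source lines."""
    with open(path, "rb") as handle:
        return bytes_to_byt_lines(handle.read(), per_line)
```

An .incdpcm-style directive would run the sample data through an encoder before the formatting step, but the binary file I/O requirement is already covered by the read.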
I talked to my teacher about my choice for the final project and he isn't 100% sure even a compiler would fulfill all the requirements. He is emailing some IB brass to find out what they have to say about it.
That sounds like a really retarded teacher.
Just tell him you'll be working with assembly, if that will be fine, and tell us how he answers to that. lol.
Maybe you should try to write the next BASIC compiler for the NES. Make it ridiculously easy to write a "hello world" program on the NES. But provide the ability to stick in pure asm when needed, etc. It would make for an awesome tool for total beginners to programming who want to eventually learn enough to make a game.
67726e wrote:
One of the things I am really set on having is block comments (/* */).
Agreed, I hate having to go line-by-line to comment something out.
cartlemmy wrote:
67726e wrote:
One of the things I am really set on having is block comments (/* */).
Agreed, I hate having to go line-by-line to comment something out.
I do too. When I need to comment out more than a dozen or so related lines, I'll use ".if 0", ".endif" to temporarily disable them.
I've decided that if the compiler isn't enough on its own, I'm going to throw the compiler in with a small 'IDE' that basically just manages the current project you are on and offers syntax highlighting.
Either way I will be writing some kind of compiler for fun at the very least. And FYI, I just wrapped up the code that strips out line comments from the source file so at the very least a preprocessor utility with that feature will happen.
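For anyone curious, a comment stripper that handles both ';' line comments and /* */ block comments can be sketched like this in Python (not my actual code, just the idea; `strip_comments` is a made-up name). It keeps every newline so line numbers in later error messages still match the source, and as a simplification it does not understand ';' or '/*' inside string or character literals.

```python
def strip_comments(source):
    """Remove ';' line comments and /* */ block comments, keeping newlines."""
    out = []
    i = 0
    in_block = False   # inside /* ... */
    in_line = False    # inside ; ... end of line
    while i < len(source):
        ch = source[i]
        if in_block:
            if source.startswith("*/", i):
                in_block = False
                i += 2
                continue
            if ch == "\n":
                out.append(ch)         # preserve line count inside the block
            i += 1
        elif in_line:
            if ch == "\n":
                in_line = False
                out.append(ch)
            i += 1
        elif source.startswith("/*", i):
            in_block = True
            i += 2
        elif ch == ";":
            in_line = True
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

An unterminated /* simply swallows the rest of the file here; a real tool would want to report that as an error.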
3gengames wrote:
That sounds like a really retarded teacher.
Just tell him you'll be working with assembly, if that will be fine, and tell us how he answers to that. lol.
It's not so much the teacher as it is the IB program and their rules on what you have to do to prove you have 'mastery' over the language. I originally was gonna write this in C++ and just use JNI to put it in a Java shell but anything that isn't written in Java is not taken into consideration for the 'mastery' portion.
Gradualore wrote:
Maybe you should try to write the next BASIC compiler for the NES. Make it ridiculously easy to write a "hello world" program on the NES.
Yeah, something along the lines of batari Basic. A lot of people who had no clue about programming are making Atari 2600 games with it (and some are actually good!), and that's a system that's not particularly easy to code for, especially when you are completely oblivious to programming. It abstracts all the hardcore stuff such as screen-drawing kernels (by providing a set of general-purpose built-in kernels) and leaves just the game logic itself and the art for the programmer.
The equivalent of that on the NES would be to radically facilitate video and audio generation. For example, the programmer would interact with the name tables through an array, without ever having to worry about VBlank and NMIs. Behind the scenes, your framework would delay all such updates so that they were performed at the correct times, but the user doesn't have to care about that.
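A toy Python model of that idea, just to show the shape of it: writes go into a queue, and a limited number of them are applied each time the (simulated) VBlank handler runs. The class name, the dictionary standing in for VRAM, and the per-frame budget of 32 are all made-up assumptions, not real hardware numbers.

```python
class NameTableBuffer:
    """Array-like front end that defers name table writes until VBlank."""

    def __init__(self, budget_per_vblank=32):
        self.pending = []                # queued (address, tile) writes
        self.vram = {}                   # stand-in for PPU memory
        self.budget = budget_per_vblank  # assumed writes that fit in one VBlank

    def __setitem__(self, address, tile):
        # The user's "array write" only queues the update.
        self.pending.append((address, tile))

    def on_vblank(self):
        """Apply as many queued writes as the budget allows; return the count."""
        batch = self.pending[:self.budget]
        self.pending = self.pending[self.budget:]
        for address, tile in batch:
            self.vram[address] = tile
        return len(batch)
```

The user just writes `screen[0x2000] = tile` and never sees the queue; anything that doesn't fit in one frame is carried over to the next.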
cartlemmy wrote:
Agreed, I hate having to go line-by-line to comment something out.
If Notepad++ is configured for the language you are using (so that it knows what the comment symbol is), you can select several lines and comment or uncomment them all at once with a single command. Because of that, the lack of multi-line comments doesn't bother me much.
In ASM, if I need to temporarily skip a bunch of code, and timing and code alignment are not important, I just JMP over the code.
clueless wrote:
I'll use ".if 0", ".endif" to temporarily disable them.
OH jeez, why didn't I think of that!?
tokumaru wrote:
...I just JMP over the code.
Or that.
67726e wrote:
Ok, optimizations aside, what are features you would like to see in an assembler?
One of the things I am really set on having is block comments (/* */).
CA65 can do block comments if you enable them with .FEATURE. I thought about using them, but then was like 'meh'.
Also, on CA65 they'll eat any newline characters between the opener and the closer.