Address 0 in zeropage

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
Address 0 in zeropage
by on (#165206)
I noticed something and I'd like to hear your opinion:


When writing a config file for CC65, I declared the zeropage memory like this:
Code:
ZP: type = rw, start = $0000, size = $0100, file = "";


But then I thought about the fact that you can declare NULL pointers in C:
Code:
int *p = NULL;


They are supposed to point to nothing at all. So, wouldn't it be absolutely necessary that the start address of the zeropage gets an offset?
Code:
ZP: type = rw, start = $0001, size = $00FF, file = "";


Because otherwise, the following condition would be true:
Code:
int *p1 = &FirstGlobalZpVariableInTheProgram;
int *p2 = NULL;

if (p1 == p2)
{
    /* This should never happen. */
}

I had a look at Shiru's NES programs for an answer, but they start the zero page at $20 for some reason.

If my assumption is correct, then how many bytes should be reserved for this?
Should we reserve eight bytes, so that accessing the value of a NULL pointer of the biggest standard data type (double) always returns 0?
Or should we reserve only one byte, just to make sure that an address comparison between a NULL pointer and a non-NULL pointer is never true? Dereferencing a NULL pointer is undefined anyway, so it doesn't matter if a dereferenced NULL pointer to int or long would include bytes from other variables.


What do you say about this?
Re: Address 0 in zeropage
by on (#165215)
Null = 0.

Null pointer points to address $0000.

I don't know if the scenario you mentioned is possible. But it seems like an unusual situation... having a valid pointer to the zero page address zero. Hmm.

You would think cc65 would give an error message if you tried to do this.
Re: Address 0 in zeropage
by on (#165216)
dougeff wrote:
Null = 0.

Null pointer points to address $0000.

Yes, I know. That's why I said it's supposed to point to nothing, not that it actually does.

dougeff wrote:
I don't know if the scenario you mentioned is possible. But it seems like an unusual situation... having a valid pointer to the zero page address zero. Hmm.

Yes, it is possible. I tried it out.

Some of CC65's config files even start the zeropage at $0000. (Not in the case of the NES, though.) And rainwarrior's example file does as well.
Re: Address 0 in zeropage
by on (#165217)
What you said is technically true. If you want to stay compliant with C, you should reserve a "null" address that no other object may have. In practice it's a pretty minor problem, given that most of the stuff on NES tends to be statically allocated and you won't need to be doing that much null pointer checking anyways.

I don't see much point in reserving more than one byte. You can even go and reuse that first memory location for something that's not visible from C.
Re: Address 0 in zeropage
by on (#165218)
thefox wrote:
You can even go and reuse that first memory location for something that's not visible from C.

Or just leave the CRT's own allocations there, since your C code will never attempt to create a pointer to those.

This is all incredibly moot though. A NULL pointer is just a special return value for cases like "object not found" or "out of memory". It only matters to code that needs to check for that possibility. Even if you did want to use a pointer to $0000 (and you won't) it's quite valid to simply use a pointer that happens to be equal to 0 in all other cases. The question is "will I ever need to distinguish a pointer to $0000 from NULL as a special return value?" and the answer should simply be "no". This would never happen unless you were doing something highly unusual.
Re: Address 0 in zeropage
by on (#165243)
On a PC, trying to access memory address 0 results in a segfault. The NES has no segfaults or any concept of segmentation, so you can happily access $0000 all you want.

There is a reasonable situation in which this would cause a problem:
Let's say, for example, you have a "print" routine that runs every vblank and, if a pointer is nonzero, copies the pointed-to string to the PPU. Now let's say you've declared an array, and you want to print that array, but the compiler puts the start of that array at $0000 (unknowing to you). You set the print pointer to the start of that array, but oops, the array starts at $0000, so the print routine thinks "pointer is zero, nothing to print", leading to failure. That'd be a pretty awful bug to track down.

You can tell CC65 to skip $0000, or you can reserve that byte somehow, for a variable you know won't cause null pointer confusion.
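
The failure described above can be reproduced on any machine with a small model. This is only a sketch: the `ram` array, `ppu_out` buffer, and all function names are hypothetical stand-ins (not NES or cc65 APIs), and addresses are modelled as 16-bit offsets into the array rather than real pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RAM_SIZE 0x0800u            /* 2 KiB of work RAM, modelled as an array */

static uint8_t ram[RAM_SIZE];       /* hypothetical stand-in for CPU memory */
static uint16_t print_ptr;          /* 0 means "nothing queued", like NULL */
static char ppu_out[32];            /* hypothetical stand-in for the PPU */

/* Copy a NUL-terminated string into the modelled RAM at addr. */
void place_string(uint16_t addr, const char *s)
{
    assert(addr + strlen(s) + 1 <= RAM_SIZE);
    memcpy(&ram[addr], s, strlen(s) + 1);
}

/* Ask the vblank routine to print the string stored at addr. */
void queue_print(uint16_t addr)
{
    assert(addr < RAM_SIZE);
    print_ptr = addr;
}

/* Runs once per vblank. Returns 1 if a string was copied, 0 otherwise. */
int vblank_print(void)
{
    size_t i;

    if (print_ptr == 0) {
        return 0;                   /* looks like NULL, so nothing is printed */
    }
    for (i = 0;
         i + 1 < sizeof ppu_out && print_ptr + i < RAM_SIZE
             && ram[print_ptr + i] != 0;
         ++i) {
        ppu_out[i] = (char)ram[print_ptr + i];
    }
    ppu_out[i] = '\0';
    print_ptr = 0;                  /* request handled */
    return 1;
}
```

In this model, a string placed at address $0010 is printed, while the same string placed at $0000 is silently skipped, even though the calling code is identical.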
Re: Address 0 in zeropage
by on (#165256)
Drag wrote:
the compiler puts the start of that array at $0000 (unknowing to you).

This can't happen. CC65's C compiler won't allocate variables on the zeropage. You can only import zeropage variables from assembly allocations, so you would have had to put it there yourself.

Edit: calima explains how to forcefully allocate on the ZP from C below.
Re: Address 0 in zeropage
by on (#165261)
rainwarrior wrote:
Drag wrote:
the compiler puts the start of that array at $0000 (unknowing to you).

This can't happen. CC65's C compiler won't allocate variables on the zeropage. You can only import zeropage variables from assembly allocations, so you would have had to put it there yourself.


Hey!
Do you know if the ZP is saved for something special other than pointers? It doesn't go completely unused, does it?

Thanks!
Re: Address 0 in zeropage
by on (#165264)
C code doesn't put anything in ZP. Only assembly code can. If you need to, you can allocate ZP blocks in assembly and then import them to be used in C.

The CRT defines a handful of ZP variables in assembly which are expected by the compiler and used extensively by the CRT (something like 10 bytes worth), and can't be used directly by your C code (they're "internal"). Everything else goes in regular RAM. (BTW putting this internal ZP allocation at $00 completely obviates the OP's point.)

Why doesn't it allocate things in ZP? Probably because this would be a special case, require additional extensions to the C language to annotate ZP variables, require additional initialization, and any performance benefit of using ZP is completely trashed by the slowness of C code in the first place. Most of CC65's target platforms have a lot more RAM by default than the NES too, so scarcity isn't as glaring a problem.

Edit: This is not quite true. calima explains how to forcefully allocate on the ZP from C below.
Re: Address 0 in zeropage
by on (#165267)
On most platforms, Null is an address which causes memory protection errors if you try to access it. You can see messages like "Segmentation Fault" or "Bus Error".
Not so much on the NES though.
The C standard defines the zero literal as the null pointer constant, but it does not specify the actual bit pattern used to represent a null pointer, so a compiler is free to pick something other than address zero.
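
As a small illustration of that last point, the three usual spellings of a null test in C source all ask the same question, whatever bit pattern the compiler uses internally. The function names here are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Each function asks "is p the null pointer?" in a different spelling. */
int is_null_by_zero(const void *p)  { return p == 0; }
int is_null_by_macro(const void *p) { return p == NULL; }
int is_null_by_not(const void *p)   { return !p; }

/* The three spellings must agree for every pointer value. */
void check_agreement(const void *p)
{
    assert(is_null_by_zero(p) == is_null_by_macro(p));
    assert(is_null_by_macro(p) == is_null_by_not(p));
}
```

None of these tests says anything about which numeric address the null pointer occupies; that part is left to the implementation.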
Re: Address 0 in zeropage
by on (#165272)
rainwarrior wrote:
This can't happen. CC65's C compiler won't allocate variables on the zeropage. You can only import zeropage variables from assembly allocations, so you would have had to put it there yourself.

Sorry, this is wrong: cc65 will put C variables there if you explicitly tell it to. No need to write asm.

Quote:
Probably because this would be a special case, require additional extensions to the C language to annotate ZP variables, require additional initialization, and any performance benefit of using ZP is completely trashed by the slowness of C code in the first place.

It has those extensions, I regularly use them, because I CBA to write asm for declaring variables. The performance and size benefits certainly apply to C just as well.

Here's a sample. Header:
Code:
extern unsigned char myzp;
// The following lets other files know this is a ZP symbol.
#pragma zpsym("myzp")


File:
Code:
#pragma bss-name (push, "ZEROPAGE")
// All vars in this block go to ZP.
unsigned char myzp;
#pragma bss-name (pop)
Re: Address 0 in zeropage
by on (#165277)
calima wrote:
Sorry, this is wrong: cc65 will put C variables there if you explicitly tell it to. No need to write asm.

Ah, okay. I didn't realize that you could override the "BSS" to use "ZEROPAGE" and then combine with the zpsym pragma. Interesting.

So, okay, you can do it. You wouldn't do it by accident, but it is there, thanks.

What I meant about a special case for initialization is what happens if you put initialized variables in ZP. Does the CRT have an initialize/copy routine that will fill in the variables you just explicitly moved to ZP, or do you have to write your own extension to the CRT for this? (I don't remember seeing anything like this in there...)
Re: Address 0 in zeropage
by on (#165281)
That I don't know, since I never use initialized vars, it's all by code.

edit: Assuming initialized vars in general work, I don't see why ZP would need a special case. After all, the long pointer form can still be used to access ZP (lda $0010 and lda $10 do the same thing, the first is just slower).
Re: Address 0 in zeropage
by on (#165284)
Ok, so you have to explicitly jump through hoops to declare something in zeropage. That makes things a little better, but the null pointer confusion still exists; returning to my print example, let's say you wanted that array to be in zeropage, how would you ensure that it's not declared at $0000? Is it as simple as just declaring something else first?
Re: Address 0 in zeropage
by on (#165287)
My linker script defines the ZEROPAGE segment to run from $0010 through $00FF because it's my standard practice in 6502 assembly to use $0000 through $000F for local variables.
Re: Address 0 in zeropage
by on (#165289)
What do you mean with local variables?
Re: Address 0 in zeropage
by on (#165290)
Local variables are variables that a subroutine/function uses whose values don't need to stick around between calls to the same subroutine/function.
Re: Address 0 in zeropage
by on (#165293)
I know what a local variable is in general. I was asking what you mean in the context of CC65. Because I find a few things strange with your statement:

1. If you actually use local variables:
Code:
void Function(void)
{
    int localVariable;
}
(I don't know how to do this in Assembly syntax apart from using the CC65 functions) wouldn't they be created on the stack? So, they wouldn't occupy the first 16 bytes of the zeropage, would they?

2. If your local variables are just helper variables that you declare once and use in various functions, aren't they still global variables?

3. If you define ZEROPAGE as starting at $10, how do you declare your local variables in the first place? Do you give them absolute address values?
Code:
Local1 = $0000
Local2 = $0001
Local3 = $0002
; etc.
Or do you actually declare a separate segment?
Re: Address 0 in zeropage
by on (#165298)
DRW wrote:
3. If you define ZEROPAGE as starting at $10, how do you declare your local variables in the first place? Do you give them absolute address values?
Code:
Local1 = $0000
Local2 = $0001
Local3 = $0002
; etc.

Yes, this is my practice. The practice of others may differ.
Re: Address 0 in zeropage
by on (#165301)
rainwarrior wrote:
Does the CRT have an initialize/copy routine that will fill in the variables you just explicitly moved to ZP, or do you have to write your own extension to the CRT for this? (I don't remember seeing anything like this in there...)

Pretty sure you'd have to write your own init routine. I don't think the runtime expects anything to be placed there.

One limitation of the bss-name trick is that you can only use the "ZEROPAGE" segment, not one of your own zeropage segments, because it's not possible to specify addressing size for segments from C pragmas. Discussed here: https://github.com/cc65/cc65/issues/261
Re: Address 0 in zeropage
by on (#165307)
tepples wrote:
DRW wrote:
3. If you define ZEROPAGE as starting at $10, how do you declare your local variables in the first place? Do you give them absolute address values?
Code:
Local1 = $0000
Local2 = $0001
Local3 = $0002
; etc.

Yes, this is my practice. The practice of others may differ.

Then those are not local variables, but global variables that you simply use for general purposes. The Wikipedia article that you linked to doesn't talk about this kind of variable.
Re: Address 0 in zeropage
by on (#165312)
Yeah, those are really more like registers (which are normally used for storing local variables, as well as anything the compiler considers temporary, really). The 65816 is quite starved on CPU registers so I imagine it's not uncommon to use a few bytes in ZP as extra "registers". So, just put those starting at 0, then you pretty much ensure no variable will be ever properly assigned to those.
Re: Address 0 in zeropage
by on (#165316)
A "local" is a variable scoped to the lifetime of a function call. The concept doesn't require any particular implementation. (Depending on the compiler and the hardware, locals can be stored in registers, on the stack, or in general RAM.) What makes it a local isn't where you put it but how you use it. Like so many things, with 6502 ASM you have to roll your own. Statically allocating them in zero page could preclude recursion and could potentially cause conflicts if two functions attempt to use the same RAM without agreeing on what should be preserved when, but if the intended use is the lifetime of a function call, well, it's a local.
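
The conflict mentioned above can be sketched in C. This is an illustration only, not code from anyone in this thread: the `scratch` array is a hypothetical stand-in for the shared bytes at $0000 through $000F, and the function names are invented.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared scratch bytes at $0000-$000F. */
static uint8_t scratch[16];

/* Callee: under the assumed convention it may clobber scratch[0]. */
uint8_t double_it(uint8_t v)
{
    assert(v < 128);                /* keep the doubled value inside 8 bits */
    scratch[0] = v;
    scratch[0] = (uint8_t)(scratch[0] + scratch[0]);
    return scratch[0];
}

/* Broken caller: expects scratch[0] to survive the call. */
uint8_t add_doubled_broken(uint8_t a, uint8_t b)
{
    uint8_t doubled;

    scratch[0] = a;                 /* "local" kept in shared scratch */
    doubled = double_it(b);         /* overwrites scratch[0] */
    return (uint8_t)(scratch[0] + doubled);
}

/* Working caller: keeps its live value in a slot the callee leaves alone. */
uint8_t add_doubled_ok(uint8_t a, uint8_t b)
{
    uint8_t doubled;

    scratch[1] = a;
    doubled = double_it(b);
    return (uint8_t)(scratch[1] + doubled);
}
```

With a = 1 and b = 2, the working caller returns 5 and the broken one returns 8, because the callee replaced the saved 1 with 4. Nothing in the language catches this; only the agreement between the functions does.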
Re: Address 0 in zeropage
by on (#165328)
snarfblam wrote:
A "local" is a variable scoped to the lifetime of a function call.

Which is not the case if you simply declare 16 variables and use them throughout your program. They may be helper variables or whatever, but they are not local.

snarfblam wrote:
Statically allocating them in zero page could preclude recursion and could potentially cause conflicts if two functions attempt to use the same RAM without agreeing on what should be preserved when, but if the intended use is the lifetime of a function call, well, it's a local.

This is nonsense. As you demonstrate yourself: You can use the same variable in various functions. Therefore it is not local.

This:
Code:
int X;

void Y()
{
    X = 1;

    // ...
}

void Z()
{
    X = 2;

    // ...
}
has nothing to do with a local variable, even if both functions don't interfere with each other's access of the variable. It's still global.

Just because you say that it is local doesn't mean it actually is.
According to your definition, a public property in a class is a private property as long as nothing outside the class accesses the property:
Code:
class X
{
    public int Y { get; set; }
    // Totally a private property.
}
Re: Address 0 in zeropage
by on (#165332)
DRW wrote:
Which is not the case if you simply declare 16 variables and use them throughout your program. They may be helper variables or whatever, but they are not local.

And in many implementations of C, a function can read what another function left on the stack, though this is undefined by the Standard. Other C implementations allocate CPU registers to local variables and may return a value in the primary CPU register without clearing other registers. A program in mixed asm/C can read these other registers to discern the internal state at the end of a function.

DRW wrote:
This:
Code:
int X;

void Y()
{
    X = 1;

    // ...
}

void Z()
{
    X = 2;

    // ...
}
has nothing to do with a local variable, even if both functions don't interfere with each other's access of the variable. It's still global.

On common operating systems, very little is truly local. In theory, even though it is undefined by the Standard, a program can walk the heap used by malloc() and read blocks of memory that belong to other functions.

DRW wrote:
According to your definition, a public property in a class is a private property as long as nothing outside the class accesses the property:
Code:
class X
{
    public int Y { get; set; }
    // Totally a private property.
}

Not just snarfblam's definition but also Guido van Rossum's. Python doesn't implement anything like C++/Java/C#-style access restrictions on class members. The closest it has are that 1. members whose names begin with an underscore are usually private, and 2. undocumented members are considered private by convention, and getting or setting them causes unspecified behavior.
Re: Address 0 in zeropage
by on (#165333)
DRW wrote:
Just because you say that it is local doesn't mean it actually is.


Not so. You need only "say" it is a local by how you use it within your code.

Even in a higher level language, if you declare a variable as a global and then only use it in a manner that is consistent with the concept of a local, while you may not be using the language's built-in implementation for local variables, and it may not be what the language specification identifies as a "local" variable, it is still, in effect, a local variable. That may seem obtuse in a language like C++, but in 6502 ASM it is a practical implementation of local variables.

I guess the sticking point for you is that the assembler doesn't enforce the restriction that the "local" variable can only be accessed within its intended scope. But there are all manner of protections the assembler does not afford you by virtue of it being an assembler. If that means it doesn't meet your definition of a "local", then it's just a case of some people being strictly theoretical and others being more pragmatic.
Re: Address 0 in zeropage
by on (#165335)
Layne's Law of Debate: As a debate grows longer, the chance of it becoming about the definition of a word approaches 1.

I'm with snarfblam. So long as all programmers attached to a project agree on a calling convention, $0000-$000F on 6502 is as local as EAX, EBX, ECX, EDX, ESI, and EDI on i386.
Re: Address 0 in zeropage
by on (#165336)
tepples wrote:
And in many implementations of C, a function can read what another function left on the stack, though this is undefined by the Standard.

Exactly: It is undefined.
I'm aware that you can do all kinds of stuff if you do some arbitrary memory access. This doesn't mean that a globally allocated variable that you simply choose to use like a local variable is local in the same way as some value that gets added and removed on the stack when accessing and leaving a function.

The difference between the undefined behavior with unexpected stack access and your situation is: If one of your functions accesses one of those 16 variables that it isn't supposed to access at this point, the result is not undefined at all. You just violated your own rule and potentially created a bug in your program, but the result will be well-defined and reproducible on every platform.

snarfblam wrote:
Even in a higher level language, if you declare a variable as a global and then only use it in a manner that is consistent with the concept of a local, while you may not be using the language's built-in implementation for local variables, and it may not be what the language specification identifies as a "local" variable, it is still, in effect, a local variable.

Yeah, in the same way a hair brush is in effect a back scratcher if you choose to use it exclusively like that.

snarfblam wrote:
That may seem obtuse in a language like C++, but in 6502 ASM it is a practical implementation of local variables.

Even in Assembly, you have a stack where you can add and remove variables, making them truly local because you assure that no variable that's still valid is shared with an unrelated function while the memory location is immediately accessible by any function again as soon as the local variable loses its lifetime.

16 persistent variables that can be and are used in many functions throughout the whole program have nothing to do with local-ness.

You might have a point if you specify that variable 1 only gets used in function 1, variable 2 and 3 only in function 2 etc. But if you have a variable that can be and is used everywhere, well, that's a helper variable, but not a local one.

tepples wrote:
I'm with snarfblam. So long as all programmers attached to a project agree on a calling convention, $0000-$000F on 6502 is as local as EAX, EBX, ECX, EDX, ESI, and EDI on i386.

Yeah, and if we agree that "plane" means "bus", then I'll totally take a flight to work on Monday.

http://www.youtube.com/watch?v=r-t7UvIw7YM&t=28s
Re: Address 0 in zeropage
by on (#165339)
DRW wrote:
You might have a point if you specify that variable 1 only gets used in function 1, variable 2 and 3 only in function 2 etc. But if you have a variable that can be and is used everywhere, well, that's a helper variable, but not a local one.

By your definition of local, it is a waste of CPU time to have more than three bytes of local variables in one subroutine on a 6502.

My linker script defines the ZEROPAGE segment to run from $0010 through $00FF because it's my standard practice in 6502 assembly to use $0000 through $000F for helper variables.
Re: Address 0 in zeropage
by on (#165340)
tepples wrote:
Layne's Law of Debate: As a debate grows longer, the chance of it becoming about the definition of a word approaches 1.

It approached 1 right from the get-go.

Holy shit, are we seriously discussing semantics instead of just trying to rephrase it? Like, "use $00-$0F as if they were extra (and slower) registers because the 65816 has a severe lack of real ones"? Because we seriously could have just boiled it down to that and called it a day. How you use those is entirely up to you.

Let's just treat them as temporary variables used for when the code runs out of registers, OK?
Re: Address 0 in zeropage
by on (#165341)
"local" variables can be stored in any way that their side effects remain local to that function. For example:

1. C-stack
2. hardware stack
3. registers
4. dedicated ZP/RAM storage
5. temporary ZP/RAM storage

In the case of 4, you only need to ensure the function is not re-entrant (i.e. it doesn't call itself, directly or indirectly). In fact, CC65 has a compiler flag to do this (--static-locals), which trades RAM for speed.

In the case of 5, you only need to ensure that the value of the temporary is not stored across a function call made within the function. CC65 has several dedicated zeropage allocations for this purpose, and its CRT functions take extensive advantage of this. There is a relevant command line option for CC65 (--register-vars) that provides this in a partial way for regular C code. (If writing in assembly you can micro-manage their overlap and your function calls, and expect some to be preserved across a call, but in general a C compiler is not going to know enough to do that kind of thing.)

The compiler is free to implement local variables in a lot of different ways without causing global side-effects. The reason changing the state of RAM or something like that is not a "global" side effect is that it is internal; you can't access any of these things directly from C, except by its local handle within the function, which is exactly why it's local.

Well, of course you can always access illegal/hidden things if you know the underlying implementation, but that's a breach of the locality contract, if you will, and the compiler is no longer required to generate correct code if you're that kind of bastard. ;P
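
To illustrate case 4, here is a rough sketch of why dedicated storage rules out re-entrancy. It models the effect with the standard `static` keyword rather than with the compiler flag itself, and the function names and the depth limit are made up for the example.

```c
#include <assert.h>

#define MAX_DEPTH 100u              /* arbitrary limit to keep recursion shallow */

/* Automatic locals: every call gets its own copy, so recursion works. */
unsigned sum_auto(unsigned n)
{
    unsigned mine = n;
    unsigned rest;

    assert(n <= MAX_DEPTH);
    if (n == 0) {
        return 0;
    }
    rest = sum_auto(n - 1);
    return rest + mine;
}

/* Dedicated (static) locals: one copy shared by every active call. */
unsigned sum_static(unsigned n)
{
    static unsigned mine;
    static unsigned rest;

    assert(n <= MAX_DEPTH);
    mine = n;
    if (n == 0) {
        return 0;
    }
    rest = sum_static(n - 1);       /* the inner calls overwrite mine */
    return rest + mine;
}
```

Calling `sum_auto(3)` gives 6, while `sum_static(3)` gives 0: by the time the innermost call returns, the single shared `mine` has been overwritten with 0. A function that never calls itself, directly or indirectly, would not notice the difference.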
Re: Address 0 in zeropage
by on (#165345)
I guess dereferencing a NULL pointer leads to undefined behaviour, so writing to address $00 (and the following bytes, depending on the size of the write) is as valid as a segfault or whatever error message on platforms that support them.

Also, a NULL pointer is not necessarily 0; it can be any value that the compiler decides serves as a NULL pointer, even if it is referred to as '0' in the source code. So you could arbitrarily decide that any value is a NULL pointer. I'd say that other than $0000, $ffff would be a great candidate too, as it's unlikely you'd want to point there, and it's as easy to test (just AND the two bytes and compare with $ff).
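
The two byte-wise tests can be written out and checked against every 16-bit address. This is only a sketch of the arithmetic, with invented function names; it says nothing about what any particular compiler actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Null test if null were represented as $0000: OR the two bytes. */
int is_zero_16(uint8_t lo, uint8_t hi)
{
    return (uint8_t)(lo | hi) == 0x00;
}

/* Null test if null were represented as $FFFF: AND the two bytes. */
int is_ffff_16(uint8_t lo, uint8_t hi)
{
    return (uint8_t)(lo & hi) == 0xFF;
}

/* Cross-check both byte-wise tests against a plain 16-bit compare. */
void check_all_addresses(void)
{
    uint32_t a;

    for (a = 0; a <= 0xFFFFu; ++a) {
        uint8_t lo = (uint8_t)(a & 0xFFu);
        uint8_t hi = (uint8_t)(a >> 8);

        assert(is_zero_16(lo, hi) == (a == 0x0000u));
        assert(is_ffff_16(lo, hi) == (a == 0xFFFFu));
    }
}
```

Both tests need one logical operation and one compare, which is the sense in which the $ffff check is "as easy" as the $0000 one.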

NULL pointers are not supposed to be dereferenced, ever (if you do that it's undefined behaviour). They are just a "key" value that serves for comparison purposes: Most of the time as error handling, but also sometimes as a marker to say "this pointer is invalid" in some algorithms.

Even if $0 is used as the value of a null pointer, this doesn't prevent a valid pointer from pointing to $0, as long as it is never checked for NULL.

Quote:
If my assumption is correct, then how many bytes should be reserved for this?

Quote:
I don't see much point in reserving more than one byte.

If there is any risk of dereferencing a NULL pointer, reserving bytes for that is *NOT* the correct approach. Even if you were to reserve bytes, you could still dereference the pointer to write a long, a long long if the C dialect supports it, or even a very large struct. So there is basically no bound on how many bytes could be affected by the faulty store instruction.

Instead, the correct approach is to make sure a very important variable is stored at $00. That way if this variable is erroneously overwritten by a NULL pointer dereferencing, then a bug immediately occurs and you can fix the problem. If this variable is unused, then a dummy write will happen and you risk never figuring the bug out, so this is a bad idea.
Re: Address 0 in zeropage
by on (#165349)
Bregalad wrote:
So you could arbitrarily decide that any value is a NULL pointer. I'd say that other than $0000, $ffff would be a great candidate too, as it's unlikely you'd want to point there, and it's as easy to test (just AND the two bytes and compare with $ff).

Only if you're actually writing the compiler. :) I'm fairly sure cc65 will assume that the null pointer's underlying representation is a bit pattern of all zeroes, and will generate pointer comparison code based on that. There's no way to override this behavior without modifying the compiler.

For anyone interested, here's a good rundown of null pointers in C: http://c-faq.com/null/
Re: Address 0 in zeropage
by on (#165374)
I looked it up, and it seems that since C99, NULL is pretty much guaranteed to be 0 (and if the platform doesn't want that for null, it's up to the compiler to work around that - as far as the program is aware, 0 is null). So that isn't helping matters.

Also, if the initial portion of ZP was purposed as register-like variables, writing to it would be guaranteed to result in a rather spectacular crash - may as well call it a fake segfault =P
Re: Address 0 in zeropage
by on (#165409)
Drag wrote:
Ok, so you have to explicitly jump through hoops to declare something in zeropage. That makes things a little better, but the null pointer confusion still exists; returning to my print example, let's say you wanted that array to be in zeropage, how would you ensure that it's not declared at $0000? Is it as simple as just declaring something else first?

I love how my question went unacknowledged. You guys have a serious issue with stuff like this.
Re: Address 0 in zeropage
by on (#165412)
In C, placement of variables in memory is not explicit, i.e. the compiler is free to do what it wants, so you can't really count on that, no.

In CC65 I think it's the order in which segments are declared within a memory block, followed by the order in which objects are passed to the linker, followed by the order in which they were declared in their individual modules.

So... nothing to be done directly in C, really, but it's pretty easy to do something with your linker config.

Again, though: THIS PROBLEM WILL NEVER COME UP UNLESS YOU'RE TRYING TO MAKE IT HAPPEN. (Which might be why nobody takes your issue "seriously", Drag?)
Re: Address 0 in zeropage
by on (#165415)
Drag wrote:
Drag wrote:
Ok, so you have to explicitly jump through hoops to declare something in zeropage. That makes things a little better, but the null pointer confusion still exists; returning to my print example, let's say you wanted that array to be in zeropage, how would you ensure that it's not declared at $0000? Is it as simple as just declaring something else first?

I love how my question went unacknowledged. You guys have a serious issue with stuff like this.

I implicitly acknowledged it:
tepples wrote:
My linker script defines the ZEROPAGE segment to run from $0010 through $00FF because it's my standard practice in 6502 assembly to use $0000 through $000F for helper variables.

A linker script that never includes $0000 in a MEMORY area will cause the linker never to place an array at $0000.
Re: Address 0 in zeropage
by on (#165463)
rainwarrior wrote:
Again, though: THIS PROBLEM WILL NEVER COME UP UNLESS YOU'RE TRYING TO MAKE IT HAPPEN. (Which might be why nobody takes your issue "seriously", Drag?)

I understand, but regardless of opinions, when you're an engineer, you must take every issue completely seriously, no matter how mundane or unlikely it is, or else you wind up with obscure bugs that'll screw fellow engineers years down the line, after everyone's forgotten about the mundane issue we didn't take seriously. I demonstrated a reasonable* scenario in which correct C code would fail, that's all anyone should need to at least document the problem, if not fix it.

tepples wrote:
I implicitly acknowledged it:
tepples wrote:
My linker script defines the ZEROPAGE segment to run from $0010 through $00FF because it's my standard practice in 6502 assembly to use $0000 through $000F for helper variables.

A linker script that never includes $0000 in a MEMORY area will cause the linker never to place an array at $0000.

That's the obvious solution, yes, and what was proposed by the OP. It's perfectly acceptable, but I'd also like to know if there are other ways around it. As rainwarrior pointed out, there may not be a straightforward thing you can do in C to ensure that something is not placed at $0000, but maybe if you moved CRT's internal "reserved" ZP bytes to $00-$0F+?


*: The example in question isn't intentionally trying to trigger the bug. It's simply a mundane situation in which a particular design would cause a failure, even though the C code was correct. The routine itself is fine, but the ZP array could've been a later addition, and since this problem relies on the linker arranging variables a particular way, this failure could be triggered by changing completely unrelated code elsewhere in the project. Any programmer would be 100% baffled because the problem isn't their code, it's the compiler/linker.
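
One other possible angle, offered only as a sketch: a startup check that compares an object's numeric address against zero, so that an unlucky layout is reported instead of silently breaking a NULL test. The macro, the `message` array, and the function name are hypothetical. It is written as portable C99 and has not been verified on cc65. An optimizing compiler is entitled to assume that no object lives at the null address and may fold the check away, so the linker's map file remains the reliable source.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if the object's numeric address is 0 (hypothetical helper). */
#define AT_ADDRESS_ZERO(obj) \
    ((uintptr_t)(const void *)&(obj) == (uintptr_t)0)

/* Hypothetical array that the linker might place anywhere. */
static char message[8] = "HELLO";

/* Call once at startup, before any code relies on NULL checks. */
int zp_layout_ok(void)
{
    return !AT_ADDRESS_ZERO(message);
}
```

On a hosted platform this always reports success, since nothing is ever placed at address 0 there. The check only has a chance of being meaningful on a target where $0000 is ordinary RAM.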
Re: Address 0 in zeropage
by on (#165471)
Drag wrote:
I understand, but regardless of opinions, when you're an engineer, you must take every issue completely seriously, no matter how mundane or unlikely it is, or else you wind up with obscure bugs that'll screw fellow engineers years down the line, after everyone's forgotten about the mundane issue we didn't take seriously. I demonstrated a reasonable* scenario in which correct C code would fail, that's all anyone should need to at least document the problem, if not fix it.


* It's a remote edge case which requires a number of co-inciding events:

1. forcing CC65 to allocate arrays on the ZP, either through assembly workarounds or deliberate pragmas
2. the allocation made just happened to accidentally be the first thing on the ZP (random combination of link order, etc.)
3. you want to use a pointer to point to one of these allocations
4. this pointer will be used as an argument to a function, or a return value
5. and the use of those functions must also depend on the special-case NULL value

All five of these things have to be simultaneously true to have a conflict, and it's pretty trivial to stop any of them (especially without intending to). Condition 1 is non-trivial, and you're already getting into the underlying implementation (either by mixing assembly allocations with C, or understanding advanced pragmas specific to this compiler). And remember, the only code this breaks is a check for NULL; there's nothing wrong with using the pointer that happens to be 0 if there's something actually allocated there.

All this on a hobbyist compiler with only a small number of people using it worldwide. My honest assessment is this is just about the tiniest nit you could possibly pick with cc65. I'm happy to talk about it on this forum, 'cause it relieves boredom, or whatever, but I'd be lying if I said I thought any of this mattered.

No, not every issue is serious. I don't even want to debate that (because debating the truth of this is not a serious issue either).

...and if you really are worried about it, it's simply 1 line of code in a linker config to reserve a single byte on its own ZP segment. Personally I'd rather have an extra byte free than used as a prophylactic against this apocalyptic scenario. (And even that wasted byte can be resolved with another trivial line somewhere else...)
Re: Address 0 in zeropage
by on (#165510)
All of my previous points still stand, all you did was reiterate how much of a non-serious issue you believe this is. If you want a nice example of what a "bah, nobody'd ever do this so I don't need to fix it" attitude results in, take a look at nesasm, with all of its obscure bugs.

If you release something that the public is going to use, you treat every issue seriously, no matter how minor. Don't like it? Don't write software.
Re: Address 0 in zeropage
by on (#165514)
The Standard fails to state that it is an issue in the first place.

Dereferencing a null pointer is undefined behavior. The Standard states that the developer of a compiler need not worry about the output when compiling a program containing undefined behavior. So unless the developer of a compiler is advertising that the compiler explicitly traps a particular undefined behavior, or unless the Standard requires that a compiler emit a diagnostic when it detects a particular undefined behavior, the developer of a compiler can just ignore it and still conform.
Re: Address 0 in zeropage
by on (#165515)
I stand by everything I've said so far, but it looks like this issue lives another day.
Re: Address 0 in zeropage
by on (#165518)
Drag wrote:
Ok, so you have to explicitly jump through hoops to declare something in zeropage. That makes things a little better, but the null pointer confusion still exists; returning to my print example, let's say you wanted that array to be in zeropage, how would you ensure that it's not declared at $0000? Is it as simple as just declaring something else first?

In cc65 this can be ensured in the linker configuration file. I think the question went unanswered partially because it had already been covered by previous answers.

tepples wrote:
The Standard fails to state that it is an issue in the first place.

Dereferencing a null pointer is undefined behavior. The Standard states that the developer of a compiler need not worry about the output when compiling a program containing undefined behavior. So unless the developer of a compiler is advertising that the compiler explicitly traps a particular undefined behavior, or unless the Standard requires that a compiler emit a diagnostic when it detects a particular undefined behavior, the developer of a compiler can just ignore it and still conform.

I don't think they were talking about dereferencing a null pointer.
Re: Address 0 in zeropage
by on (#165522)
thefox wrote:
I don't think they were talking about dereferencing a null pointer.

Then perhaps I misunderstood what DRW's first post in the topic meant by "accessing the value of a NULL pointer".

Besides, a few posts down, Drag mentioned the solution: "You can tell CC65 to skip $0000". The linker scripts mentioned so far in this topic do exactly this.
Re: Address 0 in zeropage
by on (#165524)
tepples wrote:
thefox wrote:
I don't think they were talking about dereferencing a null pointer.

Then perhaps I misunderstood what DRW's first post in the topic meant by "accessing the value of a NULL pointer".

I was assuming you were commenting on Drag's/rainwarrior's discussion (they were the "they" in my message).
Re: Address 0 in zeropage
by on (#165525)
Drag wrote:
All of my previous points still stand, all you did was reiterate how much of a non-serious issue you believe this is. If you want a nice example of what a "bah, nobody'd ever do this so I don't need to fix it" attitude results in, take a look at nesasm, with all of its obscure bugs.

If you release something that the public is going to use, you treat every issue seriously, no matter how minor. Don't like it? Don't write software.

Okay, just to be clear, I'm not the maintainer of CC65. I'm speaking entirely in the context of this forum. You complained that nobody here took your "problem" seriously, and my point was that you aren't actually having a problem. Nobody's having a problem with this "bug", because it's never come up in the history of CC65. Also, multiple people here, including myself, have explained multiple ways you can solve this (non) issue, so I'm not even sure what your beef is. We actually did take it seriously in every practical sense that we can at this forum.

My last post wasn't just "re-iterating" how non-serious it was, I was explaining 5 critical points that can be used to work around the problem as well, and at this point I feel like you're not taking me "seriously", to be honest.

I was really responding directly to this...
Drag wrote:
I love how my question went unacknowledged. You guys have a serious issue with stuff like this.

...not as a maintainer of CC65, but as a user of this forum, who like everyone else here doesn't owe you a thing w.r.t. explaining CC65, maintaining CC65, or anything of the sort. You asking the most obscure question about CC65 on this niche NES forum is not a "serious issue". The serious issue here, in my view, is that you feel entitled to an answer from us about something we're not responsible for. ...AND I DID ANSWER YOU, AND SO DID MANY OTHERS when you asked again, despite your insulting attitude. The qualifying part of my commentary was merely a reaction to your rudeness.

If you think somebody working on CC65 should address this, there is an appropriate place to report it, and that place is not the NESDev forums.

NESASM's "obscure" bugs actually do come up and irritate people constantly, and it also has no active maintainer. I'm not sure why you think that compares to CC65 having a flaw based on bizarrely specific behaviour that I'm certain has never come up for anybody using it before. Nobody here is complaining that it happened to them, just that it could happen, in some obscure theoretical situation.

I do release software to the public, in my personal projects, and for a living. If you have a complaint about a piece of my software, feel free to submit a comment or report in an appropriate channel. If your complaint is "you don't ideologically think that every complaint is a serious one", well I won't be taking that complaint seriously. ;P
Re: Address 0 in zeropage
by on (#165528)
Drag wrote:
All of my previous points still stand, all you did was reiterate how much of a non-serious issue you believe this is.
tepples wrote:
The Standard fails to state that it is an issue in the first place.

And I think that's the issue. The standards simply assume that null is zero and consider that address to be simply invalid. So this is a flaw in C in the first place. The only option here is to make the toolchain avoid using that address altogether - which is what messing with the linker does. But that's something beyond the scope of the language.

So basically what you want to do is go yell at the standards committee. And they won't hear you because no modern platform puts anything useful for a userspace program at that address (they usually leave vectors there instead).
Re: Address 0 in zeropage
by on (#165532)
rainwarrior wrote:
Ah, okay. I didn't realize that you could override the "BSS" to use "ZEROPAGE"....
What happens if you put initialized variables in ZP?

Answering my own question, in case it's of use to anyone else:

Variables with an initializer go into the DATA segment, rather than BSS. There is another corresponding pragma for this (#pragma data-name), though it can't actually be used with any segment of type zp or bss (both of which dictate uninitialized data), so if your linker configuration is correct you will get an appropriate error for attempting this.

Any segment used for DATA must be of type rw, which is apparently a promise to the compiler that your CRT will correctly initialize it.
Re: Address 0 in zeropage
by on (#165539)
rainwarrior: I owe you some explanations. I asked a question and didn't get a response. I don't think it's unreasonable to expect a response when you ask a question. I've made several posts on nesdev (sometimes things that took time and effort to figure out) that were met with silence, and it happening again is what set me off. That's where my original "you guys have a serious problem with this" quip of entitlement came from, but I'll save that discussion for later. When I asked that question, I was looking for a different solution than just skipping bytes, but I didn't make that clear enough. Your next post somewhat answered my question, but then you said "THIS PROBLEM WILL NEVER COME UP UNLESS YOU'RE TRYING TO MAKE IT HAPPEN." Which started me on a different tangent. Now to explain my mindset, this is the C version of the print example I came up with in my first post:
Code:
#pragma bss-name (push, "ZEROPAGE")
unsigned char zpBuffer[16];
unsigned char *zpPrintBuffer;
#pragma bss-name (pop)

void main() {
  // <code>
  zpPrintBuffer = zpBuffer;
  // <code>
}

void vblank() {
  if (zpPrintBuffer) {
    // <print routine here>
    zpPrintBuffer = 0;
  }
}

This looks completely innocuous and like something someone could easily do, so I was confused by your all-caps statement, leading to my "no, take it seriously" post. Next, I asked if a different solution would be to move reserved CRT bytes to cover $0000 (or if that's even something you can do), which I still haven't gotten an answer on, which is why I'm still posting.
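A minimal host-side sketch of how this check can misfire, assuming a simulated 16-bit address space rather than real cc65 output. The names `addr_t`, `zp_allocator`, `zp_alloc` and `vblank_would_print` are hypothetical and exist only for illustration; the allocator is a stand-in for the linker placing variables in declaration order.

```c
#include <stdint.h>

/* Simulated 6502 address: 16 bits, with 0 doubling as the null value. */
typedef uint16_t addr_t;
#define ADDR_NULL ((addr_t)0)

/* Hypothetical stand-in for the linker: hands out zero page addresses
   in declaration order, beginning at the configured start address. */
typedef struct {
    addr_t next;
} zp_allocator;

static addr_t zp_alloc(zp_allocator *zp, uint16_t size) {
    addr_t addr = zp->next;
    zp->next = (addr_t)(zp->next + size);
    return addr;
}

/* Mirrors the example above: returns 1 if the vblank check would run the
   print routine, 0 if the valid buffer is mistaken for "no buffer". */
int vblank_would_print(addr_t zp_start) {
    zp_allocator zp = { zp_start };
    addr_t zpBuffer = zp_alloc(&zp, 16);   /* unsigned char zpBuffer[16];      */
    addr_t zpPrintBuffer = zpBuffer;       /* main(): zpPrintBuffer = zpBuffer */
    return zpPrintBuffer != ADDR_NULL;     /* vblank(): if (zpPrintBuffer)     */
}
```

With a zero page that starts at $0000, `vblank_would_print(0x0000)` returns 0, so the print is silently skipped; with the start moved to $0001, `vblank_would_print(0x0001)` returns 1.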

Your next post listed 5 things that need to happen to cause the problem (all 5 were satisfied by my example already, and you wouldn't know to change any of those things since this'd be a hard thing to debug), plus you cited that this is even less of a problem because we're hobbyists and that I'm just nit picking, leading to my response.

I couldn't care less about CC65, my problem is the notion that we're allowed to downplay problems because they're obscure or we're hobbyists, as though ignoring problems makes them go away. That's what I was arguing with you about. Yes, you should prioritize problems, but when the problematic code looks as innocuous as the example above, I think that makes it a bigger problem. So in the end, I wasn't even concerned with what the solution was, but rather the severity that the problem was being treated with. I'm sorry for the confusion and anger.


Tepples: I apologize, but I misinterpreted your post about the standards. I'm not talking about dereferencing a null pointer, I'm talking about a legitimate pointer to $0000 being confused with a null pointer.

Tepples and Sik: When coding in C like in my example above, I feel that the C notion of $0000 being "off limits" should be respected and that the linker should avoid allocating any variables there, but I also feel that rather than skipping $0000, it should be reserved by the CRT or something similar. AFAIK, C was not developed with 6502 in mind, but if you're going to have a C compiler, you should still adhere to the C standard. In fact, if the CC65 linker allocates user variables to $0000 (which, again AFAIK, can happen by default unless you tell it not to), CC65 is at fault for doing something nonstandard.
Re: Address 0 in zeropage
by on (#165541)
Drag wrote:
I don't think it's unreasonable to expect a response when you ask a question.

The first post in this topic is by DRW, who on more than one occasion said that he would prefer a week of silence to workarounds that almost but don't quite meet the criteria of the original question. [1] [2]

Quote:
I apologize, but I misinterpreted your post about the standards. I'm not talking about dereferencing a null pointer, I'm talking about a legitimate pointer to $0000 being confused with a null pointer.

In an environment that encourages legitimate pointers to $0000, NULL would have an underlying representation that differs from 0000000000000000. For example, in an environment where valid addresses run from $0000 to $3FFF, NULL may be stored as $8000, which requires flipping bit 15 when casting between pointers and integers. I use this paradigm often in assembly language with pointers into VRAM, where bit 15 true means the pointer is invalid. But as for C, I admit that I haven't studied the relatively more recent intptr_t type. Of course, if you're NVIDIA, you just intentionally violate the C standard and just give NULL the same representation as the first valid pointer.
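As a rough sketch of that paradigm in C, assuming valid addresses run from $0000 to $3FFF so that bit 15 is free to act as an "invalid" flag. The type and function names (`vram_ptr`, `vram_is_valid`, and so on) are made up for illustration and are not taken from any library.

```c
#include <stdint.h>

/* Assumed layout: valid addresses are $0000-$3FFF, so bit 15 never
   appears in a real address and can mark "no pointer". */
typedef uint16_t vram_ptr;
#define VRAM_NULL ((vram_ptr)0x8000u)

/* A pointer is valid when bit 15 is clear, so $0000 stays usable. */
int vram_is_valid(vram_ptr p) {
    return (p & 0x8000u) == 0;
}

/* Converting between pointer and integer flips bit 15, so that the
   integer 0 corresponds to the null representation $8000. */
uint16_t vram_ptr_to_int(vram_ptr p) {
    return (uint16_t)(p ^ 0x8000u);
}

vram_ptr vram_ptr_from_int(uint16_t n) {
    return (vram_ptr)(n ^ 0x8000u);
}
```

Under this scheme a pointer to address $0000 is valid and distinct from `VRAM_NULL`; the cost is the extra flip on every pointer/integer conversion.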

In practice, I agree with you that if you are building an application binary interface (ABI) for C, the easiest thing to do is to have the CRT or linker script reserve some short range starting at $0000.

Quote:
In fact, if the CC65 linker allocates user variables to $0000 (which, again AFAIK, can happen by default unless you tell it not to), CC65 is at fault for doing something nonstandard.

By default, all variables go in segment BSS, which on the included linker scripts doesn't include zero page. If you're using #pragma bss-name, you're already doing something nonstandard.

Do you have a GitHub account? If so, I encourage you to poke around in cc65's issue tracker for a while.
Re: Address 0 in zeropage
by on (#165547)
tepples wrote:
The first post in this topic is by DRW, who on more than one occasion said that he would prefer a week of silence to workarounds that almost but don't quite meet the criteria of the original question. [1] [2]

Yeah, there are situations where having no response is better, but there are other times when it's polite to leave feedback, like if you're asking for suggestions and people are posting them. That way, even if your suggestion was turned down, you still feel appreciated and not like nobody cares.

tepples wrote:
In an environment that encourages legitimate pointers to $0000, NULL would have an underlying representation that differs from 0000000000000000.

As long as it happens transparently to the C coder. However, you wind up with oddball cases like the FDS, where basically everything is RAM. Then it becomes the $8000 confusion problem instead of the $0000 confusion problem.

tepples wrote:
By default, all variables go in segment BSS, which on the included linker scripts doesn't include zero page. If you're using #pragma bss-name, you're already doing something nonstandard.

Ok, you got me there. I still think it's worthwhile to consider having $0000 reserved in a more useful way rather than just skipping it though. Then you wouldn't need to also reconfigure your linker.

tepples wrote:
Do you have a GitHub account? If so, I encourage you to poke around in cc65's issue tracker for a while.

Thanks, but I'll pass. :P I'm not in a condition to take up a project like that.
Re: Address 0 in zeropage
by on (#165551)
tepples wrote:
By default, all variables go in segment BSS, which on the included linker scripts doesn't include zero page. If you're using #pragma bss-name, you're already doing something nonstandard.

The ability to reassign the BSS segment as "ZEROPAGE" doesn't appear to be directly intended (the documents don't appear to consider the possibility), and using this pragma at all has a few documented caveats (e.g. the CRT will zero out "BSS" by default, but not any other segment, same goes for the DATA segment). Being able to assign it to a zeropage segment seems more like a loophole than anything else; i.e. this is advanced-user kinda stuff. It works, but it's kind of a "you're on your own" situation to begin with. The other option is maybe having allocated the array in assembly and using the zpsym pragma, again non-standard and presumes assembly knowledge.

But just to be clear, I think placing an array on the zeropage is unusual. Here are the reasons I'd expect someone to do it:
  1. The performance benefit of zeropage access.
  2. Out of space in regular RAM areas.

So, case 1 is weird for this, because you only get the performance benefit if you're statically accessing this zeropage array. Using a pointer with it doesn't come with the performance benefit, so if this applies we're presuming that you have code in one place that does "fast" static access to it, and code in another place that does "slow" / "generic" pointer access to it; and finally we're presuming that in the slow segment we're testing it for NULL for some reason. I think that's a rather tall order? Sure it could happen, in theory, but I am certain it's never going to be a real problem for anyone using this compiler. (It shouldn't be a problem for anybody who has read this thread, by this point at least.) Most likely case I can think of is somebody placing NMI update data on the ZP in assembly, and filling it with a pointer in C code, but checking it for NULL at this point is a stretch to me; even most C library string functions don't check for NULL.

Case 2 is also a bit unexpected. Usually ZP is very scarce and other RAM is far less so, but performance needs tend to dictate that ZP variables be reserved for things that are accessed statically.

It's all predicated on a test for NULL too, which is also kind of unusual for NES code, which tends to use static allocations. I mean, sure there's probably some object oriented case where you have a bunch of objects and stuff like that, but we're also talking about an architecture where it's usually better to use a structure of arrays instead of an array of structures, and pass indices instead of pointers so you can statically access the arrays.
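To illustrate the structure-of-arrays style mentioned here, a small sketch in plain C. The object fields and function names are hypothetical, and callers are assumed to pass an index below `MAX_OBJECTS`.

```c
#include <stdint.h>

#define MAX_OBJECTS 16

/* Structure of arrays: one array per field, all indexed by object number. */
static uint8_t obj_x[MAX_OBJECTS];
static uint8_t obj_y[MAX_OBJECTS];
static uint8_t obj_active[MAX_OBJECTS];

/* Objects are referred to by index, so no pointer (and no NULL test)
   is involved. Callers must pass i < MAX_OBJECTS. */
void obj_spawn(uint8_t i, uint8_t x, uint8_t y) {
    obj_x[i] = x;
    obj_y[i] = y;
    obj_active[i] = 1;
}

/* Moves an object right by dx, wrapping at 8 bits; inactive slots are
   left untouched. */
void obj_move_right(uint8_t i, uint8_t dx) {
    if (obj_active[i]) {
        obj_x[i] = (uint8_t)(obj_x[i] + dx);
    }
}

uint8_t obj_get_x(uint8_t i) {
    return obj_x[i];
}
```

Index 0 is an ordinary object in this layout, so the question of a valid reference colliding with NULL does not come up.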

The hypotheticals are a bit strange to me. I can think of some, but I would never expect them to happen on the scale of use we're working with (especially in the NESDev subset). It just seems... incredibly obscure to me at face value.


I mean, anything is possible, of course. It's possible. It's just not the kind of usage that I think anybody would think of naturally. The most likely situation in my mind is some advanced user doing a bunch of zeropage optimizations, and then other users using their code or copy-pasting their advanced stuff without understanding how it works, and accidentally tweaking it in some weird way. That kind of thing does happen, and rather frequently I think, though some people are working on these issues. (On that particular problem: calima is forking neslib to fix things that shiru won't, and I'm about to submit a documentation patch to CC65 to explain that CC65 doesn't actually have an 8-bit return value.) I just wouldn't ever expect the NULL pointer snafu to be a significant thing, and if it does happen in a case like that you can usually blame the advanced author's library as much as any other part of the problem. :P
Re: Address 0 in zeropage
by on (#165554)
I agree, libraries are where issues have a better chance of appearing, that and code that gets copy/pasted or tweaked over time. The main scare was that something had the chance of randomly breaking and being very hard to debug, but after all the explanations and CC65 knowledge in here, it really does seem I was blowing it out of proportion. Sorry, my bad. :P

As long as we're talking about unusual things, compiling C code to the 6502 seems really strange to me, since there's plenty of C conventions that don't translate well to 6502 and vice versa, and you'd never know unless you wrote and optimized a lot of 6502 assembly. Especially the lack of a stack frame and not fully understanding how variables are allocated both statically and on the heap (especially when you only have 1.5kb to work with, and then how would you use 8kb wram? And then the FDS?), plus the lack of any memory protection and traps. Basically, I don't understand, and that's where my confusion came from.
Re: Address 0 in zeropage
by on (#165555)
CC65 is probably a lot more suited to FDS stuff than it is regular NES, since the paradigm of having most of the memory space be RAM is much closer to what you'd have on most 6502 computers (e.g. Commodore 64, which is probably its primary target).

Using 8k WRAM is relatively straightforward: just use your linker config to put your BSS/DATA segments there instead of the NES' internal RAM area. Unfortunately then if you want to place anything on the NES' RAM in C you'd have to do that manually (this is really the intended use of the "bss-name" pragma, I think; cases where RAM is divided), and add something to zero it out on startup.
Re: Address 0 in zeropage
by on (#165558)
I'd just add that if you really want the behavior of ld65 placing the zeropage.inc variables first, you'll need to send a patch. Opening a bug report would be a good start, but I doubt anyone else has the same need.
Re: Address 0 in zeropage
by on (#165562)
It's kind of an annoying thing because I don't think there's a clean way to solve it generically/incrementally.

Adding a new segment requirement to the CRT would invalidate a lot of existing linker config files; for a bug that nobody's hit yet, breaking backward compatibility to solve it would be a huge net loss, relatively.

If you put the CRT library before anything else in your linker command line, that would probably be sufficient to prevent the problem, but where does that go? A recommendation in some document somewhere?

Maybe you could have the compiler/linker investigate references, so that any reference to a symbol (not NULL or a literal 0) that evaluates to 0 at link time generates a warning, but I think this would be far from trivial to implement (and would probably generate a lot of false positives). That's a lot of encumbrance to add to the compiler for such a tiny gain.
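For what it's worth, the core of such a check is easy to sketch as a toy; the hard part is gathering accurate reference information in the first place. The `symbol` record below is entirely hypothetical and is not how ld65 represents symbols internally.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol record; not ld65's internal representation. */
typedef struct {
    const char *name;
    uint16_t address;       /* final address assigned at link time      */
    int referenced_as_ptr;  /* nonzero if some code takes its address   */
} symbol;

/* Counts the symbols that would deserve a warning: the address is taken
   somewhere, and it resolved to $0000, which compares equal to NULL. */
size_t count_null_aliases(const symbol *syms, size_t n) {
    size_t hits = 0;
    size_t i;
    for (i = 0; i < n; ++i) {
        if (syms[i].referenced_as_ptr && syms[i].address == 0) {
            ++hits;
        }
    }
    return hits;
}
```

A real implementation would also have to decide which references actually matter (only those that can reach a NULL comparison), which is where the false positives would come from.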

Like I don't really know how I'd suggest solving this problem upstream, even if I wanted to. Really the best you could do is mention the theoretical issue somewhere in the documents, probably? I dunno where it would go, though.
Re: Address 0 in zeropage
by on (#165573)
rainwarrior wrote:
If you put the CRT library before anything else in your linker command line, that would probably be sufficient to prevent the problem, but where does that go? A recommendation in some document somewhere?

Word of mouth is probably the best bet. In the form of example files and tutorials placing the CRT first. Documentation that shows this off can include a footnote stating that it's recommended to place the CRT first.

Most users will just do it and not really care why.

Maybe if that windows template creator project gets off the ground, we can put the "fixed" linker config file in it. ;)