[C/C++]Relying on short to be 2 bytes wide, a good practice?

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
[C/C++]Relying on short to be 2 bytes wide, a good practice?
by on (#58205)
Of course you can't just rely on it, because a short int must be at least 16 bits wide but can be as big as an int, which may be 4 bytes wide.
What I plan on doing is using short for addresses, and making an assert that won't let the emulator run if a short variable is not 2 bytes wide.

This way I can rely on the PC and other addresses wrapping around at $FFFF just as the original NES registers would.

I know that letting short be either 2 or 4 bytes wide is meant as an optimization, because on some processors it may be faster to process a 4-byte number than a 2-byte one. But if I can't rely on the addresses wrapping around, I have to AND them with 0xFFFF every time to make sure they don't overflow the two least significant bytes, and then the original optimization is lost anyway.

What do you think of this?

by on (#58207)
When you make something short, the compiler adds code to truncate it to short size on every operation.

by on (#58208)
So you would suggest ANDing it with 0xFFFF every time its value changes, or is there a more clever option?
Re: [C/C++]Relying on short to be 2 bytes wide, a good pract
by on (#58209)
Petruza wrote:
if I can't rely in the address wrapping around, I have to AND the addresses with 0xFFFF each time to be sure they don't overflow the 2 LSBytes

On many compilers, (unsigned short int)value and (value & 0xFFFF) produce the same machine instruction. It's not worth the time to try to optimize for speed without appropriate metrics. But if you can demonstrate that a change increases speed, make sure to summarize the metrics in code comments: "This change improved frames per second by 3%."
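For illustration, here's a minimal sketch of the two forms (assuming unsigned short is 16 bits on the target; the function names are just made up here). On a typical compiler both produce the same truncation:
Code:
/* two ways to reduce a value modulo 0x10000; with a 16-bit unsigned short
   a typical compiler emits the same instruction for both */
unsigned truncate_by_cast(unsigned value)
{
    return (unsigned short)value;   /* value converted modulo 0x10000 */
}

unsigned truncate_by_mask(unsigned value)
{
    return value & 0xFFFFu;         /* explicit masking */
}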

There are at least four passes where errors in a C or C++ program can be raised: preprocess time, compile time, link time, and run time. I know of assertion mechanisms for three of these: at preprocess time, use #if; at run time, use if(); and at compile time, because sizeof isn't available to the preprocessor, you'll need a variable declaration that acts like a compile-time assertion.
Code:
// Declares a negative sized array only if value is false at compile time.
// Name must be unique.
#define COMPILE_ASSERT(name, value) extern const char name[(value)?1:-1];

#include <limits.h>  // import CHAR_BIT
COMPILE_ASSERT(eight_bit_bytes, 8 == CHAR_BIT)
COMPILE_ASSERT(sixteen_bit_shorts, 2 == sizeof(short))

It is an error to declare an array with a negative number of elements. If this happens, the compiler is supposed to emit a diagnostic and not produce object code for this translation unit. The COMPILE_ASSERT macro is supposed to trigger this if and only if value is false.

by on (#58210)
If you need an int to be exactly 16 bits wide, why not use int16_t / uint16_t instead (defined in <stdint.h>)?
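For instance, a minimal sketch (assuming the target actually provides the exact-width types; the function name is just for illustration):
Code:
#include <stdint.h>

/* uint16_t arithmetic wraps modulo 0x10000 when converted back to uint16_t */
uint16_t step_pc(uint16_t pc)
{
    return (uint16_t)(pc + 1);   /* 0xFFFF + 1 becomes 0x0000 */
}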

by on (#58211)
I recommend simply masking with 0xFFFF. I use plain int for addresses in most of my emulators and don't have to constantly mask them. This is the most portable approach, since it doesn't depend on the hardware doing anything in particular. If you want to support machines with 16-bit int (rare these days), you'll need to use unsigned int. If you want to support non-two's-complement machines, you'll need to use unsigned int or cast to unsigned int before masking (consider subtracting 1 from an address of 0: with a signed int you get -1, but converting that to unsigned int is guaranteed to behave as two's complement, so the mask then gives 0xFFFF).
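A minimal sketch of that masking approach (the function names are just for illustration):
Code:
/* keep addresses in a plain unsigned int and mask only where wrapping matters */
unsigned next_addr(unsigned addr)
{
    return (addr + 1) & 0xFFFFu;   /* 0xFFFF wraps to 0x0000 */
}

unsigned prev_addr(unsigned addr)
{
    return (addr - 1) & 0xFFFFu;   /* 0x0000 wraps back to 0xFFFF; unsigned
                                      arithmetic is well defined modulo 2^N */
}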

If profiling shows that masking is somehow significantly slower than using uint16_t, you can get the latter from <stdint.h>. Just remember that stdint.h isn't part of C89 (it's part of C99), and it isn't part of any C++ standard. Even on C99, the exact-width types aren't guaranteed to exist, because the host hardware might not support such a size.

by on (#58213)
I say typedef it.

Code:
typedef unsigned short u16;
typedef signed short s16;


If you run into a compiler/system where short is a different size -- just change your typedef.

If you run into a compiler/system that doesn't support a 16-bit data type, you can write a class to simulate one (just use a larger type and overload the arithmetic operators to mask afterward). Then change your typedef to be your class.
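A minimal sketch of such a class (only a couple of operators shown; the name is just for illustration):
Code:
// simulate a 16-bit unsigned type on top of a wider one by masking after
// every operation, so the value always stays in 0x0000..0xFFFF
class u16_sim
{
    unsigned long v;   // any built-in type guaranteed to hold at least 16 bits
public:
    u16_sim(unsigned long n = 0) : v(n & 0xFFFFul) { }
    u16_sim& operator+=(unsigned long n) { v = (v + n) & 0xFFFFul; return *this; }
    u16_sim& operator++() { return *this += 1; }
    operator unsigned long() const { return v; }   // read the value back out
};

Then typedef u16_sim u16; and the rest of the code doesn't have to change.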


typedefs are great.

by on (#58218)
- Well, I knew about the alignment thing, but never thought it would have much of an impact.

by on (#58219)
Yes, I actually use typedef unsigned short int word;
Thanks for the tips on how to assert word's length at compile time.

I just want to make the code as portable as possible, but I guess the source would need some modification for machines where shorts are larger than two bytes.

by on (#58228)
Please don't assume short = 16-bit. That simply isn't the case on all platforms. Please use the standard uintXX_t and intXX_t types that are already defined (mostly) universally: specifically uint8_t, uint16_t, uint32_t, and uint64_t. The same goes for the signed versions.

You might also find things like 1ULL useful.

by on (#58246)
It's one thing to use uint16_t to get implicit masking with 0xFFFF, but quite another to use it for a particular format in memory. Many machines store the least significant octet first and the most significant second, but plenty store them in the reverse order. And plenty of machines require that a 16-bit value be aligned on a two-byte boundary, either aborting the program or giving a wrong result when it's not aligned. Unless profiling shows that your program is too slow AND that it is improved enough by using uint16_t instead of accessing the bytes individually and combining them, stick to something you know is portable.
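A minimal sketch of the byte-by-byte approach for a little-endian value in memory (the function names are just for illustration):
Code:
/* read and write a 16-bit little-endian value one octet at a time,
   so host byte order and alignment never matter */
unsigned read16(const unsigned char *p)
{
    return p[0] | ((unsigned)p[1] << 8);
}

void write16(unsigned char *p, unsigned value)
{
    p[0] = value & 0xFF;
    p[1] = (value >> 8) & 0xFF;
}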

by on (#58254)
Well, my main concern right now is not speed optimization, but just having a value that I'm sure won't exceed 0xFFFF.
So it's either enforced by the data type or by a bit mask.
That's why I'm asking.
I also want this code to be as portable as possible.

by on (#58277)
- It makes sense, but it's just annoying to use only int or unsigned int for every value. :) Some values should just be boolean (an unsigned char), for example.

- Anyway, what system/OS/whatever "sees" an unsigned short as anything other than 16 bits long? AFAIK, the "old" (?) Macs use reversed bytes: big endian, not little endian.

by on (#58283)
Portability is great, but how many people who are all about it actually make sure their code runs on all of Windows, *nix, and Mac? Never mind the fact that newer Macs make it so they don't have to care about non-x86 platforms.

I have noticed lately that I need to use long long to get 64-bit "u64" typedefs on both i586 and x86-64 gcc, so I really shouldn't talk. :roll: 8)

by on (#58284)
OK, I've spent half my afternoon finding a solution without using any macros (because macros are evil (C++ FAQ lite) and we want something more portable) and without any execution penalty (in other words, everything is done at compile time). My code is a bit generic, so it may suit other situations.

The key is to use templates to make compile-time decisions about the type behind a typedef. First, we have to make a kind of static linked list of types using templates, and then write a meta-programmed algorithm to find the desired type that has the sizeof we want.

Code:
// we have to make a static linked list of types.
// this is the basic definition; later we'll link it with itself
template<typename T, typename U>
 struct type_list
{
    //these members are optional here, but might be useful
    typedef T head;
    typedef U tail;
};

// but to find the end of such a list, we have to define it
class Empty{};


// and now, we make our list of unsigned integral types
typedef type_list<
        unsigned char, type_list<
            unsigned short, type_list<
                unsigned int, type_list<
                    unsigned long int, type_list<
                        unsigned long long int, Empty>
                    >
                >
            >
        >
    unsigned_integral_type_list;


// this class will perform the actual search algorithm
template<unsigned size_of, typename T>
  class find_type_with_n_bytes;

// anything that doesn't match something below will lead to a compile error.
template<unsigned size_of, typename head, typename tail>
 class find_type_with_n_bytes<size_of,type_list<head, tail> >
{
    // in the private section we have another struct that will actually do the
    // job, using recursion, at compile time.
    template<bool, unsigned, typename, typename>
        struct research_type_by_size_of;

    // this is the case when we have found the type.
    template<unsigned my_size_of,typename _result, typename _any>
        struct research_type_by_size_of<true, my_size_of, _result, _any >
    {
        typedef _result _type;
    };

    //this is the case when the type is not found, and we're at the end of
    // the list. We typedef the struct type_not_found so when trying to
    // actually use the type, it will fail to compile with a fairly explicit
    // error like " illegal use of 'struct type_not_found' "
    template<unsigned my_size_of, typename _result>
        struct research_type_by_size_of<false, my_size_of, _result, Empty >
    {
        typedef struct type_not_found _type;
    };

    // the recursive case, when the type is not found and we're not yet at
    // the end of the list.
    template<unsigned my_size_of,typename _result, typename _head, typename _tail>
        struct research_type_by_size_of<false, my_size_of, _result, type_list< _head, _tail > >
    {
        typedef typename research_type_by_size_of<(sizeof(_head)== my_size_of), my_size_of, _head, _tail >::_type _type;

    };

public:
    // what the user can use.
    typedef typename research_type_by_size_of<(sizeof(head)== size_of), size_of, head, tail >::_type result;


};

int main(int,char *[])
{
     // how to use:
     typedef find_type_with_n_bytes<2, unsigned_integral_type_list>::result myUInt_16_t;

     // do whatever you want with it
     myUInt_16_t a, b, c = b = a = 0;

     return 0;
}


Note: I suspect there's something in the C++ Boost library that already addresses the issue, but I haven't checked it out, probably because I wanted to take on the challenge :)

by on (#58288)
Oh dear, overkill^overkill there with those templates. If you want fully portable code but don't want to have to remember to mask everywhere (forget and you have a bug), use a bitfield:
Code:
struct {
   unsigned pc : 16;
} m;

m.pc++; // equivalent to m.pc = (m.pc + 1) & 0xFFFF

by on (#58291)
~J-@D!~ wrote:
OK, I've spent half my afternoon to find a solution without using any macro (because macros are evil (C++ FAQ lite) and we want something more portable)

C++ FQA has a rebuttal; have you read it? As for portability, even on the same compiler targeting the same CPU, different operating systems have different UI toolkits.

Quote:
Code:
// and now, we make our list of unsigned integral types
typedef type_list<
        unsigned char, type_list<
            unsigned short, type_list<
                unsigned int, type_list<
                    unsigned long int, type_list<
                        unsigned long long int, Empty>
                    >
                >
            >
        >
    unsigned_integral_type_list;

Are you trying to get on TheDailyWTF.com? :shock:

by on (#58337)
tepples wrote:
Are you trying to get on TheDailyWTF.com? :shock:

No... :)? But you can call me a freak anyway...

tepples wrote:
C++ FQA has a rebuttal; have you read it?

No! And I'm really happy to see such a site; it's the first time I've seen a website bashing C++ :) Thanks!!! I'll read it carefully.