When to use signed and unsigned integers in modern C++

The main distinction between signed and unsigned integers in C++ is not, as one could suppose, whether they can represent negative values. They are more distinct than meets the eye, to the point that I think "unsigned integer" is a misleading name.

Unsigned integers are suitable for bitfields, when you really consider every bit independently: bit operators (e.g. ~, &, ...) make sense on them, and so do the overflow and underflow semantics.

But when your integer is an actual mathematical number, used for arithmetic, when should you use unsigned? The answer is quite simple: almost never, and certainly not just because your integer "cannot be negative".

Note: The content of this post is my not so humble opinion. It is slightly biased towards "high level" C++, and one could object to some points when writing system code. I think the points stay valid for the vast majority of modern C++ development.

But if my integer cannot be negative, why not indicate it in its type?

The thing is, not everything fits into types. Types are here to give you compile-time safety. If adding information to a type just gets in your way with no benefit, then there is no point.

As a (dumb) example, consider we add a new integer type: prime_int_t. If I know that an integer is prime, why not indicate it in its type? The more information, the better, right?

First, please consider in-place operators such as +=, ++, etc. With regard to static typing, void f(prime_int_t& x) { ++x; } is broken, as an incremented prime number is generally not prime - 2 being the only exception. So as a first measure, we must make our prime integers immutable.

Now what would be the signature of the addition operator? Probably int operator +(prime_int_t x, int y), since there is no guarantee the result will be prime itself. Actually, pretty much any computation on a prime_int makes the result uncertain, and int does not mean "not prime"; it effectively means maybe_prime_int.

The problem is, we tend to compute a lot (it's called computer science), so we will very often lose this "prime" typing information. And if one gives you a function void f(prime_int e), you will have to check whether your input is prime, maybe raise a type exception at runtime, then cast once you're sure. That's two red flags. Our prime_int_t doesn't bring anything with respect to typing, and greatly complicates the code.

My point is, not all information belongs in a type, and certainly not runtime information. We could stuff arbitrary information into integer types, but we don't even have a type for strictly positive integers and nobody seems to complain about it.

While the prime number example may seem exaggerated, it's not such a stretch. Because many operations on an unsigned integer don't guarantee a non-negative result, the operations themselves need a different meaning so we don't have to make unsigned integers immutable: when going below zero, they wrap around. So marking an integer unsigned is not really about adding information to the type; it's about picking different semantics. For instance, std::string::size() is unsigned, because after all the size of a string cannot be negative. Now look at this fiasco when trying to determine the size difference between two strings:

std::cerr << std::string("foo").size() - std::string("foobar").size() << std::endl;

This prints 18446744073709551613. Gee, thanks, that's exactly what I was looking for! Worse than that, this number could be different on other CPUs. We made a language where the difference between the size of "foo" and "foobar" depends on the size of your CPU data bus. So much for abstraction.

This is because unsigned does not just indicate that those sizes cannot be negative; it makes them behave differently from mathematical numbers, and certainly not like distances, be they the lengths of strings or heights above sea level.

Likewise, if you see a void f(unsigned int x) function, it doesn't mean that your compiler will gently error out if you pass f a negative number, like it does in every other typing error case. Try calling f(-1) and buckle up. Unless you explicitly wanted the "-1 wraps to UINT_MAX" semantics - in which case you're doing bitfields, not arithmetic - the correct way to pass any int to this function is to always check beforehand whether it's indeed non-negative, and raise a runtime error otherwise. Or is it a typing error, since after all it's a type mismatch?! Let's face it: signedness does not belong in types, not in the same sense as all other C++ typing mechanisms. Please, do not use unsigned for arithmetic numbers.

And to top it off, please run this:

#include <cstdint>
#include <cassert>

bool less_than_two(int i)
{
  uint64_t const x = 2; // could be the return of any .size()
  return i < x;
}

int main()
{
  assert(less_than_two(0));
  assert(less_than_two(-1));
}

Modern compilers will kindly issue a warning here because it's such a pitfall, but whether they should is even debatable, since it's perfectly valid, well-defined C++.

But I may need the extra range!

Honestly, you probably don't. Either you check for overflow, or you don't. If your range is not bounded, then a factor of two on the maximum value is not the answer: bounds checking is needed in any case. If going past a certain value is to be rejected, I highly doubt that rejecting 4 billion is acceptable but rejecting 2 billion is not - in a 32-bit situation. And if you need to support arbitrarily high numbers, then you must rely on some big-number library; pushing the fiasco back by a mere factor of 2 is no solution.

As mentioned, I know this point is debatable, if not moot, in some lower-level or embedded situations where size and performance can be crucial. But this is not the domain I'm addressing here.

There is a notable exception to all this: when addressing something that is exactly \(2^{64}\) or \(2^{32}\) in size, such as your address space. When your domain is exactly \([0, 2^{64}[\), although the type-correct way would be to use a signed integer with mathematical semantics, having to rely on something other than uint64_t seems overkill, obscure and impractical. So in this case, use unsigned and proceed carefully with the semantics, because it is after all well defined and doable.

But please, in any other case, don't use unsigned for arithmetic.

But unsigned integers could be faster!

Unsigned integers can be faster in some situations, namely multiplications and divisions by powers of two. But let's be honest: unless you're doing bitfield operations (in which case, remember, you should use unsigned), this is probably a very rare case. Note that the compiler must be able to statically detect that your right operand is a power of two, so apart from extremely specific cases this is not going to happen. Also, if you care this much about performance, you should try to get this sort of optimization in all cases, not only when your integer happens to be unsigned - which I doubt you do.

And in fact, signed integers will probably perform better for you. Because overflow is undefined behavior on signed integers, the compiler can assume it will never happen and perform all kinds of dead-code elimination and condition skipping. Unless you're this concerned and this knowledgeable about optimizations, you should not take performance as a criterion, and you should not use unsigned for arithmetic.

But the standard library uses unsigned?

Yes it does, and it's a pain, although in most places it's unfortunately the most generic choice. Remember that the standard library and the STL are quite old and not necessarily the panacea in terms of design. Also note that those libraries must work for every use case, and std::vector<char>::size has to be able to cover your whole addressable space, because you could get such a vector as big as that space - hence it falls into my special corner case.

Likewise, you could allocate more than half your memory on your 16-bit CPU with a std::string and use it as a pool, which justifies std::string::size being unsigned. But if you're handling a memory pool, I doubt std::string is the adequate type. In the end, we had the choice between a string class that just represents text with sensible typing, or a string class that can also be used as a memory buffer mapping more than half your addressable space but where subtracting sizes yields crazy values unless you carefully convert them first. I really, really regret the former was not picked. There would be room for a std::buffer type that actually represents memory regions and where an unsigned std::size_t as the size would be justified, leaving our strings alone.

But you probably design code that is a bit higher level, and you probably don't handle numbers related to the total size of your addressable space that often. If you're counting the number of entries in a JSON dictionary, and \(2^N\) (for some \(N \ge 16\)) is an acceptable hard maximum, then \(2^{N-1}\) probably is too, so please use a signed integer.

So... it's that simple?

Yes, it is that simple. And that is a good thing: we should not have to think this hard for an operation as simple as declaring an integer. Also, uniformity is a good property; different conventions within the same project cause useless hesitations and complications.

Like all rules, it cannot be absolute, and you can point out exceptions. If you're serializing a string of at most 200 characters, wasting bytes to encode its size would be a shame, so there could be a touch of unsigned to temporarily hold the size. But if you're returning the number of elements in a JSON object, please, int64_t will do. The same goes for almost any other integer, in the mathematical sense of the term.

Of course, old C++ hands know about all this, but it's still a pain, and it's quite a shame to have to explain to newcomers "ha, you compared a string size to a potentially negative integer, you fool!". This makes the language painful for algorithmics. My main point here is that "this integer should not be negative" is not a sufficient justification to use unsigned.

So please, unless you have a specific reason, do not use unsigned for arithmetic.