Thank you, I wish I saw this an year ago when I was learning 2's complement for digital systems subject.
Many articles I read said that 2's complement helps in arithmetic, but I was not able to understand "how".
The naive me was like - why do they complement and set a sign bit? We can just use a sign bit right? So if 0010 means +2, then why not represent -2 as 1010. What's the need for the complement part at all?!
Then only I realized that we are able to retain x + (-x) = 0 using this 2's complement representation. :)
Here's a very brief description of two's complement that also happens to be a complete description:
Two's complement is ordinary integer arithmetic. There is no difference between two's complement arithmetic and unsigned arithmetic. All calculations are literally identical.
Unsigned integer arithmetic is modular arithmetic with equivalence classes having labels between 0 (inclusive) and 2^n (exclusive), while two's complement arithmetic is the same thing, but the equivalence classes have labels between negative 2^{n-1} (inclusive) and 2^{n-1} (exclusive).
Here's a table of three-bit values with "unsigned" and "two's complement" labels:
Applying the unary "negate" operation isn't particularly useful test.
For reference (and note the apostrophe locations):
* sign-magnitude is slightly easier for multiplication/division but makes addition/subtraction marginally harder. It's widely used in bigint libraries, and IEEE binary floats use a variant of it (unlike IEEE decimal floats, which are not at all sane).
* two's complement is optimal for addition/subtraction, but has a weird "most negative" value. Fixed-width multiplication and division have a couple warts but .
* ones' complement is not particularly useful compared to the above two choices, though I suppose it's marginally easier to sort than sign-magnitude
* offset-binary is useful for naive sorting, but casting to/from unsigned is not a nop.
For two's complement at least, to avoid the UB you can just cast to/from unsigned around most operations. Division has to be done with some kind of branches, however.
I rather wish we had an integer `NaN` instead of the most-negative value.
Also, there are several useful overflow behaviors: trap, wrap, saturate, unspecified, and undefined. "unspecified" is particularly useful if you're sometimes transpiling to a JS-like language that doesn't actually support integers.
I don't know about other bigint libraries, but java.math.BigInteger uses two's complement, not sign-magnitude. This class is designed to imitate machine-size two's complement integers.
The most negative value is a wart in two's complement, but negative zero is a wart in sign-magnitude and ones' complement.
The advantage of ones' complement compared to sign-magnitude is that the former needs minimal circuitry to implement subtraction. Namely, it just needs to invert every bit and then feed it into the adder.
You only need to avoid undefined behavior in two's complement in C and C++. UB is not a property of two's complement, but a particular language's design choices. I can assure you that there is no UB in two's complement in Java or x86 assembly.
Note that offset-binary is used in IEEE 754 floating-point exponents.
> You only need to avoid undefined behavior in two's complement in C and C++. UB is not a property of two's complement, but a particular language's design choices. I can assure you that there is no UB in two's complement in Java or x86 assembly.
The entire point of making unsigned arithmetic UB in C was to support different bit representations of negative numbers depending on the HW.
I think the GP's point was that if you want two's complement wraparound behavior in languages where you are not allowed to wrap around signed integers you can get the same result with unsigned arithmetic.
A nit: a lot of uses of multiplication only care about a result type as wide as the input type. For these uses of multiplication, two's complement signed multiplication and reinterpreting the bits as unsigned then performing unsigned multiplication give the same result. When you start caring about i32 x i32 = i64 or i64 x i64 = i128 or bigint multiplies this stops being true because the bits you'd use to sign-extend the inputs can actually show up in the result.
I don't think an integer NaN would be a good idea. People's intuition around the existing float NaNs is not good.
I want to make user defined types like BalancedI32 (32bit signed integer but the magnitude of MAX and MIN are the same) possible in Rust. Today the stdlib is allowed to make NonZeroI32 which as its name suggests lacks zero but you can't make such types yourself except via wrapping these NonZero types with xor or similar.
(In nightly Rust you can just break the rules and do the same thing the stdlib does, but there's no obligation to keep that working and it will never be stabilized)
Because of the Guaranteed Niche Optimisation this means Option<BalancedI32> would have identical size to i32. So you get the same practicality as a normal signed integer with a sentinel value, but the same programming ergonomics as an expensive high level type.
> People's intuition around the existing float NaNs is not good.
I think that, if NaN was defined to be an unconditional trap but can be deferred for efficiency, then it could have worked much better. NaN surprises people because it was exposed in the first place, while it should really have been hidden behind the scene (because the presence of NaN means you are already wrong somewhere, and you are expected to be notified about that fact). In terms of Rust, say, every f32 operation could have returned `Result<f32, FloatError>` (mostly likely to be aliased though) that has the exactly same layout as the current f32. The current split between qNaN and sNaN (among others) is unnecessarily complex with little benefit.
Huh? I think you might have skimmed what I wrote a bit too fast and didn't end up comprehending most of it.
Some Rust types have niches, bit patterns which the type doesn't use. For example booleans take up a whole byte, but only need two bit patterns, leaving 254 unused. Today there is no stable way to conjure into existence a type with a new niche of your choosing.
The example I gave was BalancedI32, a type which is exactly like i32 except that it doesn't use the most negative value. The most negative value of integers in a two's complement system is a weird outlier, so in many ways this is nicer to work with, but the niche also allows such a type to take advantage of the Guaranteed Niche Optimisation.
Rust's String type is basically just a Vec<u8> with extra promises about the bytes (they're guaranteed to be a UTF-8 string) and there's nothing special about it, you can make your own and several people have. You can even use an existing niche to do clever things, as CompactString does.
I agree I've misunderstood things. I thought that the general class of "some bytes that are guaranteed to only contain certain bit patterns" was addressed by a single feature, but actually there's a single feature that addresses this for only primitives and types similar to primitives in some sense. The builtin string type is doing some unrelated thing where it has a thing and it assumes that the thing will contain only a strict subset of the possible bit patterns of the thing or you get nasal demons, and user code is free to do that, but doing that is unrelated to niche optimizations.
IEEE decimal is definitely complicated and confusing, but it does seem to unambiguously be sign-magnitude? The confusing part is how the magnitude works, not how signedness works...
The weird interleaving of the exponent with the mantissa - and the existence of noncanonical values - means you can't just through them into sign-magnitude integer comparison routines.
Isn't that interleaving is mainly used for making better use of bits? To me it is more like the limitation of exact power-of-two bit sizes, for example decimal40 (1 sign bit + 9 exponent bits + 30 mantissa bits) should have been not that bad. In comparison, decimal32 could have the following combinations:
s e m
1 11 20 - 6 mantissa digits, exponent range too large (~10^+/-1000)
1 7 24 - 7 mantissa digits, exponent range too small (~10^+/-60)
1 7.6 23.3 - still 7 mantissa digits, but interleaving gives exponent range of ~10^+/-90
1 4 27 - 8 mantissa digits, exponent range too small to even contain binary32
1 1 30 - 9 mantissa digits but no usable exponent range
I guess it is not the worst trade-off.
Your point about non-canonical values is entirely valid though. I do know it is mostly due to the expensive normalization step, but that only sounds like a reason to always prefer DPD---there is little reason to define a decimal FP standard if it is not for hardwares nowadays.
Ah, I see. By "sign-magnitude" I just meant that the sign and the magnitude are stored independently, and I think that's the usual usage of the term; I wouldn't expect "a sign-magnitude format" to mean one where you can use sign-magnitude integer comparison.
For me two’s complement started making sense when I realized that for example when you subtract 1 from 0, regardless of using signed or unsigned, you get 11111111 (assuming 8 bits). Using two’s complement, we just interpret it as -1, but the binary arithmetic is the same.
The author briefly talks about the HP-16C. I’ve been looking at buying one for a while, but never managed to convince myself I need it enough to spend a couple of hundreds of dollars on a used calculator. I do a lot of 6502 assembly, so it would certainly be nice to have. I did, ironically, spend about the same amount in total on various other calculators that I thought might do the same job. I am now using a Casio CM-100, which is actually quite nice. I see the author has a 16C clone. I got a different clone of the 16C, a PX-16C, which is an impressive project, but ended up not enjoying using it.
To find the two’s complement of 5, we invert all the bits (changing 0s to 1s and 1s to 0s) and then add one to the result.
I am embarrassed by the fact that I wasn't aware of this. I always interpreted the two's complement as the difference between the all the numbers before the high bit, minus the high bit. Aka 10000000 – 01111011 = 0101
When I first learned two's complement I sort of accepted it as something to memorise for a test and didn't really understand (or care) why it worked.
What really made it click for me was thinking of it as modular arithmetic. If you consider 8-bit integers, they range from 0-255 and you're actually working modulo 256. So you can think of 0-127 as your non-negative numbers. The numbers from 128-255 behave as negatives modulo 256 (e.g. -1 mod 256 = 255).
That it's the same thing may be more familiar in base 10: for example with 3-digit numbers, the 10's complement of (say) 428 is 1000-428 = 572, and you get the same result if you "invert" each digit (subtract from 9) and add 1: 571 + 1 = 572. This is just because
(999-x)+1 = 1000-x.
One of the advantages of two's compliment (particularly relevant in early hardware that really needed to minimize gate count) is that you can use an adder to perform subtraction: just flip all of the bits of the argument to subtract, and have an extra carry input to pretend you have a carry-in bit to be added to the least significant bit.
If you want your CPU to efficiently support multi-precision addition, you can have an overflow machine state flag and have an instruction that uses that overflow flag as the carry-in to the least significant bit during addition.
I find it much easier to think of everything in unsigned terms. If I want 8bit -5, I really want 256-5=251. I've done plenty of that in my time but I don't think I've once consciously inverted bits and added 1.
So does this mean that whenever one is working in 8 bits that it’s taken for granted that the upper bounds represent negative integers? Or does it rather depend on the context?
I'm not sure what you're asking. If you're working with n-bit integers and you interpret the value as unsigned, then all values >= 2^(n-1) (i.e. the upper half of the range) represent negative two's complement values. Or put another way, bit n-1 is on iff the value is negative.
It depends on context. Sometimes the bits or bytes represent a signed number, sometimes an unsigned number, sometimes a float, sometimes a string, etc. It's all up to how you choose to interpret it.
Two's complement is just like those spinning number wheel counters, like the km counter in old cars: If you drove more than 999999 km, it would look like you drove 000000. Now, if you had a funny car where driving backwards would subtract from the km counter, starting at 000000 and driving backwards 1 km would result in 999999, therefor 999999 means -1 (you could say 000000 to 499999 is positive, 500000 to 999999 is negative).
Let's say x=000110 for example
Invert all bits of x, call that number inv(x)=111001
If we add them together, "the gears mesh together" and we get x+inv(x)=111111
Adding 1 to that, let the carry just drop off the left side, we get x+inv(x)+1=000000
So it's natural to take inv(x)+1 as a representation of -x.