Semantics of long ints

Tim Peters (tim@ksr.com)
Tue, 31 Dec 91 00:08:15 EST

> ... a few quick remarks about shifting and masking on long ints.
>
> First, long ints have changes to a *one's complement* representation.

That is consistent with some of the samples I posted.

> This was the only way that I could give ~0L the meaning of "all 1 bits".

? Don't see a connection. E.g., it appears that short ints are (as C
greatly favors) represented in 2's-comp, and on this 32-bit machine I get

>>> (~0 = 0xffffffff)
1

That is, ~0 means "all 1's" for short 2's-comp ints, so how could it be
a problem for ~0 to mean "all 1's" for long 2's-comp ints too? I think
I must be missing your meaning badly.

If the underlying question is "if long ints are unbounded 2's-comp
integers, then how can the ~ operator be implemented such that ~0 works
out to being all 1's (and in all other respects does 'the right
thing')?", the tricky-but-simple answer is "in all cases, ~ should add
1, then flip the sign bit". That follows from the 2's-comp identity
-x = ~x + 1
so that
~x = -x - 1 = -(x+1)

This assumes that *internally* a signed-magnitude representation is used
for longs (which I believe is true in Python). Then ~ applied to a 0
will (1) add 1, yielding 1, & (2) flip the sign bit, yielding -1, the
correct 2's-comp way to spell "all 1's". Similarly, ~ applied to -1
will (1) add 1, yielding 0, & (2) flip the sign bit, again yielding 0,
the correct 2's-comp way to spell "all 0's". It's not the least charm
of 2's comp that these "spellings" are unique ...

It's clearly easier on the user if short ints & long ints are (at least
conceptually) represented the same way; since C matches nearly all
widely used machines in favoring 2's comp, and Python is implemented in
C, it makes sense for Python to mirror C's rules for short ints. Using
a different method for long ints will cause inconsistencies, like the
current

>>> 5 & (-1)
5
>>> 5L & (-1L)
4L
>>>

That's not good; e.g., consider the poor user who tries to boost the
precision in their program by switching to long arithmetic and gets
different answers as a result.

> Of course, as soon as you use any arithmetic on it, it changes to 0L,

Ya, another reason to hate 1's comp <grin>.

> and arithmetic operations (including unary "-") will never produce it.
>
> Second, there's a bug in the implementation of long &, | and ^ when
> one operand is negative and larger than the other; e.g., the outcome
> for "1L & ~1000000L" is wrong. Fixed in 0.9.5.
>
> I hope this explains most of what you've discovered.

Let's see ... The "~0L acts like 0 in arithmetic" explains the results
for the "+" and "*" examples. The "~0L acts like all 1's" explains the
result for the "&" example. Believe the "|" and "^" examples remain
mysteries (those don't appear to fit the "one operand is negative and
larger than the other" clause):

>>> 42L | (~0L)
-42L
>>> 42L ^ (~0L)
32725L

Oops, add one more to the pot:

>>> (~0L) << 1 # expected -1L
~0L

> "I told you the documentation was out of date :-)"

Na, documentation is for wimps -- I wouldn't read it anyway <grin>.

bigints-are-such-a-tricky-business-almost-nobody-gets-them-right-ly y'rs
- tim

Tim Peters Kendall Square Research Corp
tim@ksr.com, ksr!tim@uunet.uu.net