Python portability planning ( and prototypes ) [ was: pack/unpack...]

Steven D. Majewski (sdm7g@aemsun.med.Virginia.EDU)
Tue, 4 Feb 92 15:56:25 EST

In discussion with guido ( mostly about pack/unpack from binary records )
I suggested a sys.arch variable be added to module sys to contain machine
or os specific info.

[ Guido: I DO have some more specific followups on some of the other
questions/issues, but I, too, have been busy with other things. I've
been sitting on this os module waiting for time to move all the sources
to my PC and start on the dos end of things. ]

me> The pack/unpack proposal was just meant as something for discussion.
me> I am now serious about 'sys.arch' as a proposal.

To which guido responded:

Guido> But what value would you like sys.arch to have, and how would you
Guido> choose the right value at compile time? As an alternate proposal, I
Guido> can implement an interface to the Posix uname() function; this returns
Guido> a system name, a node (host) name, a release level, a version level
Guido> (of the release), and a machine type. I suppose this is easy enough
Guido> to fake for systems that don't have full Posix libraries yet. For
Guido> now, you can fake this by doing something like
Guido> string.split(posix.popen('uname -a', 'r')), provided you have the
Guido> uname program. Older systems may have a utility called arch (SunOS)
Guido> or machine (Ultrix), I'm sure you can cook up a function which tries
Guido> several of these until it finds one which works.

Well, that's the reason I brought up the subject. I *DIDN'T* in any
sense mean to imply that most of these things couldn't be done in python
( rather than coded in C to provide a python-interface ), but since they
all address portability concerns, they seem to indicate to me that
EITHER they should be "builtins" to the language ( and defined in the
library manual ) OR they should be declared "conventions" ( i.e. "this
is the proper way to do this in python, and therefore you had better not
do *X* because that will make your code horribly un-portable." )

So, I'm only partially asking a "how-to" question.
The more important question is the "how-should" question.

I am also *not* discussing things that should be C-source coded ( rather
than Python-source coded ) for efficiency. There probably are some such
cases, but there it makes sense to try out a python prototype BEFORE
considering a "native code" solution.

But there are some other things that either are not do-able *directly*
in python, or are relatively difficult to provide in python, but easier
to provide in C-source as python "built-in's".

The base level determination now seems to be that you can tell (roughly)
what os you are on by which "import" statement succeeds. [ *Is* this the
only way ? I think so. ] If "import mac" succeeds, then we know we are
on a mac, and can assume that the CPU is a m680x0 type, and we can
have a dictionary of CPU's and endian-ness-flags, etc. But if/when apple
ships Risc-Mac's the library routines need to be changed. It doesn't seem
any more of a bother to require some #define's/#ifdef's in a config.h
file that pre-define some of these. Using the 'successful import' rule
for determining machine variables also means that the convention must
be enforced of: no modules named "dos", "mac", "posix", etc. This means
that you can't provide (limited) posix or dos functionality to another
system by an "external" posix module ( or dos.py ) because that will
make 'import posix' succeed on the wrong machine, and may interfere with
other routines that need it to fail to determine machine specific parameters.
( And forcing "import posix" to really "import mac" seems the only way
to make the library sources backward compatible! ( or "path" & "macpath" ) )
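
The conflict described above can be reproduced in a few lines. This is
only a sketch, written for a present-day Python 3 interpreter rather than
the interpreter discussed in this message. The names detect_by_import,
detect_with_shim and CANDIDATES are hypothetical, and "nt" is added to
the probe list only so the sketch finds something on Windows.

```python
import importlib
import sys
import types

# Hypothetical probe order; "nt" is an addition for present-day Windows.
CANDIDATES = ("mac", "dos", "posix", "nt")


def detect_by_import(candidates=CANDIDATES):
    """Return the name of the first candidate module that imports, else None."""
    for name in candidates:
        try:
            importlib.import_module(name)
        except ImportError:
            continue
        return name
    return None


def detect_with_shim(shim_name):
    """Install an empty stand-in module, rerun the probe, then remove it."""
    saved = sys.modules.get(shim_name)
    sys.modules[shim_name] = types.ModuleType(shim_name)
    try:
        return detect_by_import()
    finally:
        # Restore sys.modules so the stand-in does not leak out of the demo.
        if saved is None:
            del sys.modules[shim_name]
        else:
            sys.modules[shim_name] = saved


honest = detect_by_import()        # what the probe says on this machine
fooled = detect_with_shim("mac")   # what it says once a "mac" shim exists
print(honest, fooled)
```

Once a stand-in called "mac" is importable, the probe reports "mac" no
matter what machine it runs on, which is the interference the paragraph
above worries about.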

Back to the question of *what* needs to be "pre-defined" -
I'm not really sure.
We should start off with a minimal set of what can be easily
determined either by the compiler or edited into a config.h file.
(1) I think (dos|posix|mac|whatever) should be #define-d in, and *not*
    determined by successful import.
(2) I think cpu-family should also be determined at compile time.
    [ Exactly which cpu ( 80286 vs 80386, 68000 vs 68020, etc. )
    CAN NOT be determined by the compiler ( except for the non-downward
    compatible cases - gnu C will *only* produce 32 bit 386,486 code. )
    And I can't really think of a use for this info except as a potential
    warning to folks stuck with only a 640K python on a PC. I expect they'll
    get SOME sort of run time message when alloc/malloc fails. ]
/* the above are my "must" list. The following are increasingly maybe's */
/* although they are so easy to add that I see no reason not to. */
(3) Endian-ness can either be determined at compile time ( by a #define )
    or perhaps better, can be a one-time low-level 'C' test that sets
    a symbol. ( look at (int) 1 as a char[2] )
(4) The sizes of native ints, longs, etc. may be useful for things like
    pack/unpack etc. It is certainly easy to figure out at compile-time.
    ( size.int = sizeof(int) ; size.long = sizeof(long) ... )
(5) Vaxen might need a value to indicate what type of floating point
    number support is compiled into python.
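
For illustration, items (3) to (5) can be sketched in a present-day
Python 3, which is an assumption and not the interpreter discussed here.
The sketch uses the standard struct and sys modules. The names
byte_order_by_test, native_sizes and float_info are hypothetical, and
the byte test is a Python-level stand-in for the one-time C test in (3).

```python
import struct
import sys


def byte_order_by_test():
    """Pack the integer 1 in native byte order and look at its first byte."""
    first = struct.pack("=H", 1)[0]   # "=" means native byte order
    return "little" if first == 1 else "big"


# Item (4): sizes of the native C types, as the C compiler laid them out.
native_sizes = {
    "short": struct.calcsize("h"),
    "int": struct.calcsize("i"),
    "long": struct.calcsize("l"),
    "float": struct.calcsize("f"),
    "double": struct.calcsize("d"),
}

# Item (5): a rough hint about the floating-point format in use.
# 53 mantissa bits is what an IEEE 754 double has.
float_info = {"double_mant_bits": sys.float_info.mant_dig}

print(byte_order_by_test(), native_sizes, float_info)
```

All of these values are fixed when the interpreter is built, so they
could equally well be filled in from #define's in a config.h.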

I propose that a dictionary object { 'cpu':'80x86', 'system':'posix', ... }
be either added to the sys module ( sys.arch, sys.machine, ? ) or that
a new "builtin" module be added ( machine ? ) that (initially) defines
some of these. We can then prototype suggested additions by importing
an additional module ( moresys.py ? ) that adds more mappings to the
dictionary. ( And perhaps module sys should "automagically" try to
do the equivalent of "import moresys" ? )
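
A minimal sketch of that proposal, assuming a present-day Python 3 and
its standard os, platform and sys modules. The functions base_arch and
extended_arch are hypothetical names, and so is the convention that
moresys exposes its extra mappings in a dictionary called "arch".

```python
import importlib
import os
import platform
import sys


def base_arch():
    """The small built-in set of mappings."""
    return {
        "system": os.name,            # e.g. 'posix' or 'nt'
        "cpu": platform.machine(),    # e.g. 'x86_64'; may be '' if unknown
        "byte_order": sys.byteorder,  # 'little' or 'big'
    }


def extended_arch(extra_module="moresys"):
    """Start from base_arch() and merge in an optional extension module.

    Assumption: the extension module, if present, exposes a dict named
    'arch'. A missing module is not an error; the base set is returned.
    """
    arch = base_arch()
    try:
        extra = importlib.import_module(extra_module)
    except ImportError:
        return arch
    arch.update(getattr(extra, "arch", {}))
    return arch


print(extended_arch())
```

Keeping the optional import inside one function means the "automagic"
lookup happens in one place and can be dropped later without touching
callers.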

-------

Below is a module that determines what system it is by successful import.
If it is posix, it tries 'uname' or 'arch' to get a more specific answer.
For mac & dos, it assumes the obvious for cpu.
[ Note: there is not YET a "dosmodule" to be imported. ]

I will probably change the names of things, so this is not a suggested
"standard" method.
[ Who was it that said: programming is largely the art of naming things well ?
And the proper use of namespace is really the question, if we are talking
about defining conventions. ( I admit this is not "well-named" - just a
prototype|exercise that suggested some of the above issues. ) ]

# module os
#
# os.module = appropriate os specific module
# i.e. posix , or its closest functional equivalent.
# ( mac, dos ) - there is SOME common functionality
# between them. ( listdir(), etc. )
# os.isposix = 0 | 1 ( if import posix was successful )
# os.system = { 'cpu':cpu-type, os:( os-tuple ), ... }
#
# thus:
#     import os
#     def ls( ):
#         for file in os.module.listdir(''): print file
#
# SHOULD work on any system.
# ( If we add a directory-specifier argument to the above, we either
# need to require that the user and any-other calling routines
# use the proper local pathname format, or we need a posix-to-local
# pathname converter function. )
#

import string

#
# Order of tests is because I was writing/testing on SunOS,
# and I wanted to make sure that the other paths were tried.
#
def _sysmodule():
    system = None
    try:
        import mac
        system = mac
    except ImportError:
        try:
            import dos
            system = dos
        except ImportError:
            try:
                import posix
                system = posix
            except ImportError:
                pass    # none of the three imported: system stays None
    return system

module = _sysmodule()

#
# Note: this version makes several calls to uname ( if posix ) with
# switches ( -m(achine) -s(ystem) -r(release) ) to get specific info.
# I didn't know whether I could rely on a posix defined order with
# the -a(ll) switch.
# SparcStation 'uname -a' returned: "SunOS aemsun.me 4.1.1 1 sun4c"
# IBM RS6000 'uname -a' returned: "AIX galen 1 3 000019501100"
# I don't know how useful cpu = "000019501100" is !
# Also SunOS and AIX seem to differ on the meaning of the terms
# 'version' and 'release'. AIX appears to do it correctly, returning
# "3" for 'uname -v', SunOS returns '1' . I was writing/testing this
# on a SUN, so I use 'uname -r' to get " 4.1.1"
# ( This was probably the source of my doubts about the posix order. )
#

def _cpu():
    if `module` == '<module \'dos\'>' : return 'i8x86'
    if `module` == '<module \'mac\'>' : return 'm68x00'
    if `module` == '<module \'posix\'>' :
        for cmd in [ 'uname -m', 'mach', 'arch' ]: # add any other likely cmds!
            mach = module.popen( cmd, 'r' ).readline()
            if mach != '' : return string.strip( mach )
    return None

def _sys_generic():
    name = string.splitfields( `module`, '\'' )[1]
    return name

_true = 1
_false = 0

if _sys_generic() == 'posix' :
    isposix = _true
else:
    isposix = _false

if _sys_generic() == 'mac' :
    ismac = _true
else:
    ismac = _false

if _sys_generic() == 'dos' :
    isdos = _true
else:
    isdos = _false

def _system( ):
    generic = _sys_generic()
    if generic == 'posix' :
        sys = module.popen( 'uname -s','r').readline() # system
        ver = module.popen( 'uname -r','r').readline() # release
        return ( generic, string.strip(sys), string.strip(ver) )
    else: return ( generic, ) # may want to add more specific code, e.g.
                              # ( 'ms-dos', '5.01' ), ( 'mac','7.0.1'), etc.

system = { 'cpu':_cpu(), 'os':_system() }

def check():
    print 'module: ', `module`
    print 'system',`system`,':'
    for each in system.keys() :
        print (' '*4)+string.ljust(each,9), ': ', system[each]
    import os # interesting need to import own module!
              # else: dir() => 'each' & dir(os) => error
    for each in dir(os):
        thing = eval('os.'+each)
        print string.ljust(each,12), string.ljust(`type(thing)`,20), thing

check()

#
# needed:
#   machine support:
#       byte_order
#       sizes of native int,long
#       ieee, vax, or other floating-point
#       ... (?)
#   os-support:
#       ???
#   'portable' import:
#       if os.isdos :
#           import dospath
#           path = dospath
#       if os.ismac :
#           import macpath
#           path = macpath
#       if os.isposix :
#           import posixpath
#           path = posixpath
#       etc...
# Note: if the above is in module 'path', then importing path
# will make the function names 'path.path.function()' instead
# of 'path.function'. Probably: "from path import path" will work.
#
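
The 'portable' import idea in the notes above can be sketched as
follows. This assumes a present-day Python 3, where the standard library
ships posixpath and ntpath; it is not the prototype module itself. The
table PATH_MODULES and the function pick_path_module are hypothetical
names, and falling back to posixpath for an unknown system is a choice
made only for this sketch.

```python
import importlib
import os

# Hypothetical lookup table: generic system name -> path module name.
PATH_MODULES = {"posix": "posixpath", "nt": "ntpath"}


def pick_path_module(system=os.name):
    """Import and return the path module for the given generic system name.

    Unknown systems fall back to posixpath (an assumption of this sketch).
    """
    return importlib.import_module(PATH_MODULES.get(system, "posixpath"))


# Callers write path.join(...) and never name the system-specific module.
path = pick_path_module()
print(path.join("lib", "python"))
```

Binding the chosen module to a single name avoids the
'path.path.function()' nesting mentioned in the note above.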

I'll try to cobble out a prototype of a C include file for a
prototype sys.arch when I have more time. Any other suggestions
on what should be included/excluded ? I think sys.arch could be
a dictionary object like 'system' above.

Note:
I hate having to use literals for this sort of thing:
if `module` == '<module \'dos\'>' : return 'i8x86'
if `module` == '<module \'mac\'>' : return 'm68x00'
if `module` == '<module \'posix\'>' :
It is too error prone: especially when there are escapes, as above,
it is easy to misspell and hard to check by any way other than
actually executing the code. This was my same complaint previously
about exceptions. ( I know Guido agrees that the exception namespace
needs to be finer grained - I saw it on his "to-do" list! ). If one
wants to check on a more specific error by the additional argument,
one has to actually cause the exception to find out what the exact
message is! type(''),type([]),etc. work ok as a replacement for
type literals ( type( open('/dev/null','r' ) ) is not as neat, but
I haven't been forced to use it yet! :-). But I have no better
general solution to recommend.
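
One possible way around the quoted literals, sketched for a present-day
Python 3 ( an assumption: it relies on modules carrying a __name__
attribute, which the interpreter discussed here may not offer ). The
constants and the functions generic_name and default_cpu are
hypothetical names.

```python
import os
import types


def generic_name(module):
    """Return the module's own name, e.g. 'posix', without parsing its repr."""
    return module.__name__


# Named constants: a misspelled constant raises NameError as soon as the
# line runs, instead of silently never matching like a misspelled literal.
POSIX, MAC, DOS = "posix", "mac", "dos"

# Default cpu guesses, using the same strings as the prototype above.
DEFAULT_CPU = {MAC: "m68x00", DOS: "i8x86"}


def default_cpu(module):
    """Return the assumed cpu for mac/dos modules, or None otherwise."""
    return DEFAULT_CPU.get(generic_name(module))


stand_in = types.ModuleType(MAC)   # an empty stand-in named 'mac'
print(generic_name(os), default_cpu(stand_in))
```

This only moves the literals into one place; it does not remove the need
to agree on the names themselves.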

Another Note:

* In ICON: *

procedure main( args )
local n
local c
c := &digits ++ &lcase ++ &ucase

write( "Hello World! ", &dateline )
write( "This is " , &version )
write( "&Features include:" )
every write( " <*" , &features, "*>" )

...

}
end

*** produces: ***

Hello World! Tuesday, February 4, 1992 12:06 pm
This is Icon Version 8.0. May 7, 1990
&Features include:
<*UNIX*>
<*ASCII*>
<*co-expressions*>
<*direct execution*>
<*environment variables*>
<*error trace back*>
<*executable images*>
<*expandable regions*>
<*external functions*>
<*large integers*>
<*math functions*>
<*memory monitoring*>
<*pipes*>
<*string invocation*>
<*system function*>

I think this is overkill. ( for python, at least. ) This was due,
I imagine, to the Icon project wanting to provide *an* implementation,
even if limited, on many platforms. I would not expect the same sort
of functional differences in python on different platforms. Here the
problem is NOT whether or not there *are*, for example, pipes, but
how to make programs that use posix.listdir use mac.listdir with a
minimum of effort or source convention. [ Initially, I was going to
use posix.popen & msdos.popen as an example, but I agree with the
argument in comp.lang.icon that they are *not* semantically equal.
BUT close enough for the equivalence to be useful. ]

I also note that there has been some discussion in comp.lang.tcl
about handling UNIX-ism's on Mac-tcl and the converse. Mostly on
the subject of path-name-translation ( automatic or not, etc. )
I have no opinion (yet) on the matter.

Wishing-I-had-something-clever-to-say-here-like-tim-ingly yours,
( But the above has exhausted me! As Tim once said to me:
"If I had more time, I would have made it shorter!" )

- Steve

But to condense & reiterate the major point of the above discussion:
( That's easier than editing & rewriting the whole thing to make it
clearer. :-& <grrr> )

Using the success/failure of modules posix|mac|dos|{whatever} to
determine what machine & system the program is executing on
( which is needed to do un-portable things portably ) is not
a good idea because it will conflict with 'aliasing' the modules
to provide portability. ( for example: making a posix.py module
for mac that attempts to provide some posix compatible emulation. )