I am happy to have you do the work on this question. What, you don't
work, you're an academic???!!! :-)
>
> but seriously: (4) this possibility is what got me excited
> about Python in the first place (after I decided that Common
> Lisp would never work out and dbPerl wasn't such a good idea);
> (5) I've done some halting experimentation in this area (I'm
> working on a relational algebra++ in Python as we speak).
>
Frankly, I wasn't thinking of a BIG can of worms, just trying to bring
some order to the emerging chaos. But if you want to jump in with a
more elegant solution, be my guest.
> [...] I'd like to see
> 1) a direct interface to underlying SQL engines that just takes
> straight SQL strings and returns "query objects".
QED
> 2) a compositional (algebraic/nonSQL) approach that allows queries
> to be constructed and derived directly in python, either with
> no visible means of support or using an SQL engine underneith.
Can you make a suggestion? My sense is that this is "interesting" but
not feasible or practical for the issue at hand.
> 3) some sort of SQL implementation directly in python.
> If we start with (1) and do it well then we could feed the result
> directly into (2) which in turn could be used by (3). Viola!
> Python becomes a killer DB glue language! Interoperability anyone?
>
Definitely not interesting to me, as a practitioner. My current company
chose to use Oracle. My next company may use Sybase. I am skeptical that
there is any value or point, except as an academic exercise, in competing
with these commercial ventures.
> [...] Any row should "look like a dictionary", but python's
> neat OO interface means we can implement it however we like. The only
> reason I can think of for using something more complicated than a
> dict for a row would be for "updatable cursors" which might need to hold more
> information.
>
The other responses that I received indicated a preference for a tuple with
the column name information obtainable elsewhere (or not at all). I agree
with Aaron that a dict is a better choice because it retains the column
heading information on a row-by-row basis, thus allowing each row to be
"detached" from the query without loss of information.
> >3) Should a select return a cursor object that is interrogated for the
> >rows? My implementation does not do this.
>
> Why not? I'd like to see queries be objects that could be
> passed around, duplicated, reset, advanced, indexed, and even algebraically
> combined with other queries. Let's be expansive and general,
> and if for some reason something's too hard to actually implement,
> we can just 'raise "hell"' (Ie, punt using errors). A lot of
> this would be best implemented in a Python layer which talks to
> the C interface something closer to native mode (like Tkinter/tkinter).
>
I don't use a Query object because I can't see any value in it. This is
especially true if I can pass around SQL statements as strings instead
of having to pass around the query object. I toyed with the idea of
having the execution of a query return a Cursor object that could be
treated like a list. I backed off of this because I didn't see any
value in adding the overhead. Nor do I see the point in storing
the full results of the query. If the user wants the full results stored,
(s)he can do it outside of the framework of the module itself. To illustrate,
my solution is a module that looks a little like this:
class Oracle:
def execute(self, query):
# Pass the query to the DBMS and open a cursor for fetching
def nextRow(self):
# get the next row from the cursor and return it
# if there is no next row, return None
I use it as such:
>>> o = Oracle()
>>> o.execute('select * from my_tables')
>>> results = []
>>> row = o.nextRow()
>>> while row: results.append(row); row = o.nextRow()
I could have used list semantics for the nextRow, but I use an internal
cache for the rows and when I need a row that is beyond the end of the
cache, I flush the cache and refill it with the next n rows, so I have
no way of going backwards. However, I solve this problem by creating
my own list and then I can use it however I choose. Thus, the burden
for memory management (i.e. from potentially large queries) is on
the user and not assumed in the module itself.
> >4) How should data be represented? I support strings, ints, and floats
> >as native datatypes. What about BLOBs?
>
> When possible a query object should encapsulate types for attributes.
> Could use a "dunno" type to indicate that the programmer is "on his
> own." I don't see any reason not to use dynamic allocation for big
> stuff like the rest of python. But if it's REALLY big my ODBC ref
> manual notes that "[SQLGetData] can be used to retrieve character
> or binary data in parts" -- so we could wrap big objects in a python
> BLOB/bigstring class, which only materializes the piece we want to
> look at, eventually -- but push that one for the moment. Oracle
> tries to convince customers to buy enough memory so the whole database
> will fit "in core". Maybe we should follow their lead.
I haven't given the idea of non-ASCII, non-numeric types much thought
because I don't use them myself, but I suspect that this is a real issue
for some. The idea of putting the whole database in core is ludicrous.
My database is over 10GB and growing. It might be nice to have the
much memory, but it isn't happening.
>
> On the error messages: come on, that's too practical for me.
> [I'll have to think about it.]
I raise it not because it is interesting, but because a consistent
and documented method for error/exception handling is critical to a
successful standard module.
>
> As an aside, hypothetically if we provide a python interface to
> ODBC (and/or the emerging SQL standard API) programmers could use
> it to talk to a lot more than just db's -- Excel spreadsheets, for
> example. Thank God that Bill Gates (if the two can be distinguished)
> is in a position to enforce some standards on the software community!
>
ODBC support is not high on my list, but it is interesting.
> next step: maybe a moderately formal object architecture for
> python/dbglue?
Go for it!
> Aaron Watters
> Department of Computer and Information Sciences
> New Jersey Institute of Technology
> University Heights
> Newark, NJ 07102
> phone (201)596-2666
> fax (201)596-5777
> home phone (908)545-3367
> email: aaron@vienna.njit.edu
> ===
> [A] Microsoft developer took a piece of IBM code that
> required 33,000 characters ... and rewrote it in 200
> characters.... IBM managers then began complaining....
> Measured in lines of code, they said, Microsoft was
> actually doing _negative_ work....
> -- Paul Carroll, _Big Blues: The Unmaking of IBM_
-- Derek Fields (derek@cc.bellcore.com)