Directory usage tree's in perl & python.

Steven D. Majewski (sdm7g@aemsun.med.Virginia.EDU)
Tue, 21 Jan 92 16:17:19 EST

There has been a series of good-perl/bad-perl exchanges in comp.lang.perl,
that produced (among other things) this exchange:

From: flee@cs.psu.edu (Felix Lee) :

And Perl is definitely awkward with data types. I haven't yet found a
pleasant way of shoving non-trivial data types into Perl's grammar.
Here's a very simple problem that's tricky to express in Perl: process
the output of "du" to produce output that's indented to reflect the
tree structure, and with each subtree sorted by size. Something like:
434 /etc
| 344 .
| 50 install
| 35 uucp
| 3 nserve
| | 2 .
| | 1 auth.info
| 1 sm
| 1 sm.bak

---

And from: Tom Christiansen tchrist@convex.com convex!tchrist:

:Here's a very simple problem that's tricky to express in Perl: process :the output of "du" to produce output that's indented to reflect the :tree structure, and with each subtree sorted by size. Something like: : 434 /etc : | 344 . : | 50 install : | 35 uucp : | 3 nserve : | | 2 . : | | 1 auth.info : | 1 sm : | 1 sm.bak

At first I thought I could just keep one local list around at once, but this seems inherently recursive. Which means I need an real recursive data structure. Maybe you could do it with one of the %assoc arrays Larry uses in the begat programs, but I broke down and got dirty. I think the hardest part was matching Felix's desired output exactly. It's not blazingly fast: I should probably inline the &childof routine, but it *was* faster to write than I could have written the equivalent C program.

--tom

#!/usr/bin/perl # dutree -- tchrist@convex.com

@lines = `du @ARGV`; chop(@lines); &input($top = pop @lines); &output($top); exit $?;

sub input { local($root, *kid, $him) = @_[0,0]; while (@lines && &childof($root, $lines[$#lines])) { &input($him = pop(@lines)); push(@kid, $him); } @kid = &sizesort(*kid); }

sub output { local($root, *kid, $prefix) = @_[0,0,1]; local($size, $path) = split(' ', $root); $path =~ s!.*/!!; $line = sprintf("%${width}d %s", $size, $path); print $prefix, $line, "\n"; $prefix .= $line; $prefix =~ s/\d /| /; $prefix =~ s/[^|]/ /g; local($width) = $kid[0] =~ /(\d+)/ && length("$1"); for (@kid) { &output($_, $prefix); }; }

sub sizesort { local(*list, @index) = shift; sub bynum { $index[$b] <=> $index[$a]; } for (@list) { push(@index, /(\d+)/); } @list[sort bynum 0..$#list]; }

sub childof { local(@pair) = @_; for (@pair) { s/^\d+\s+//g; } index($pair[1], $pair[0]) >= 0; }

---

The above prompted me to try the same problem in python. First attempt was a simple version, which leads me to discover that it is an inherently recursively problem ( as Tom states above... If I could read Perl better without poking into the manual, It might have been more obvious - I didn't have to perl man with me, so I didn't try very hard to read it. )

The simple version has 2 problems: (1) After writing the python script, I re-read the thread, and saw that they had asked for nodes sorted by size, ( at least the terminals, I'm not sure if they meant the whole tree. ) This was just an oversight, BUT I don't see a neat way to patch it into this code. (2) "Dangling "|"'s" : This is what made me realize it needs a recursize solution.

So back to the drawing board for version 2 :-( Happy to hear any suggestions!

Simple (version 1) of dutree and output:

#!/usr/local/python # dutree.py -- Steven D. Majewski (sdm7g@Virginia.EDU) # print a directory tree #

import string,path,posix,sys

# du( root ) ==> [ sorted list of [ path, usage ] ] def du( root ): d = posix.popen('du '+root ,'r' ).readlines() # slurp up 'du' output L = [] for x in d : xx =string.split(x) # split fields: [ usage, path ], xx.reverse() # swap ( [path,usage] sorts better! ) L.append( xx ) # append L.sort() return L

# print L as a tree, where: L = [ sorted list of [ path, usage ] ] def printdu( L ): print L[0] comlen = len(string.splitfields( L[0][0], '/' )) + 1 for x in L[1:]: ea = string.splitfields( x[0], '/' ) + x[-1:] print (' '*6+'|')*(len(ea)-comlen)+'__',string.rjust(ea[-2],4)+':',ea[-1] print '\n'

# tree is defined mainly for interactive use. # Usage: # >>>from dutree import tree # >>>tree( '/usr/local' ) # ( doesn't return anything - only prints tree.)

def tree( root ): printdu(du(root))

for d in sys.argv[1:] : tree( d ) # this is main() for dutree.py

-- 
and sample output: 

('/local/lang/Python', '9761') |__ demo: 542 | |__ scripts: 202 | |__ sgi: 176 | | |__ al: 20 | | |__ audio: 4 | | |__ audio_stdwin: 17 | | |__ gl: 69 | | | |__ glstdwin: 27 | | |__ gl_panel: 64 | | | |__ apanel: 5 | | | |__ flying: 30 | | | |__ nurbs: 9 | | | |__ twoview: 17 | |__ sockets: 17 | |__ stdwin: 145 | | |__ ibrowse: 81 |__ doc: 970 |__ lib: 382 |__ local: 86 |__ misc: 77 |__ readline: 1107 | |__ doc: 256 | |__ examples: 14 |__ src: 2733 |__ stdwin: 3859 | |__ Appls: 227 | | |__ dpv: 84 | | |__ klok: 22 | | |__ miniedit: 45 | | |__ repeat: 9 | | |__ test: 48 | | |__ tetris: 18 | |__ Build: 2858 | | |__ sun4: 2857 | | | |__ x11: 2856 | | | | |__ klok: 291 | | | | |__ lib: 1647 | | | | |__ miniedit: 373 | | | | |__ repeat: 291 | | | | |__ tetris: 253 | |__ Conf: 46 | |__ Doc: 146 | | |__ man: 38 | |__ Gen: 13 | |__ H: 42 | |__ Packs: 112 | | |__ textedit: 49 | | |__ vt: 62 | |__ Ports: 401 | | |__ alfa: 77 | | |__ mac: 98 | | |__ msdos: 33 | | |__ vtrm: 48 | | |__ x11: 144 | |__ Tools: 4

---
Note: ^ the dangling "|" ! :-( 
 -- Steve.