Following submission format
of PyCon -- Deadline: Dec 31
Authors
Yarden Katz, Kendall Clark, Bijan Parsia
Category
Web-based Systems
Contact information
yarden@umd.edu, kendall@monkeyfist.com, bparsia@isr.umd.edu
Requested timeslot
40-45 min. presentation
Summary
Title: Pychinko: A Native Python Rule Engine
Rule processing engines have proven to be useful in several areas of
Computer Science and software engineering, ranging from expert systems
in bioinformatics to serving as backends for web content management
platforms. Pychinko is a forward-chaining rule engine capable of the
generic use rule engines have traditionally received, but geared for
Semantic Web applications. Pychinko, implemented purely in Python,
employs the well-known and scalable Rete algorithm for expert
systems. We present the use and architecture of Pychinko (including a
brief exposition of the Rete algorithm), as well as several of its
compelling use cases, including examples from software engineering, web
applications, and Semantic Web specific tasks.
Outline
- Introduction
- Rules are in widespread usage in computer science and software
engineering. The symbolic approach of
rules has significant advantages over the methodology offered by traditional
procedural programming languages. In cases where queries posed
to a large pool of abstract data need to be answered, rules provide for a more
natural implementation. The widespread use of rules in
knowledge representation (KR) provides for many examples, where
rule processing engines like CLIPS had great
success.
- A particularly successful example is MYCIN, an early expert system for finding treatments to and diagnosing blood-related infections. MYCIN drew on a large knowledge base of relationships and a corresponding IF-THEN rules, entered by doctor(s) (the "expert" in "expert systems"), that allowed for quick remedies (educated guesses) in cases where a formal medical treatment was not possible.
- Rules are also useful in:
- Web applications: processing social networks
created by vocabularies such as FOAF, devising web access policies,
and styling and formatting web pages (use of
rule-based templates for generating HTML)
- Software engineering application: An implementation
of Python analog of Java's ant tool; rules for automating
software production and compilation.
- Pychinko
- A native, hackable, small Python rule engine with easy syntax.
- Facts and rules can be manipulated natively as Python data
structures in Pychinko. A rule of the form:
(?x hasParent ?y) (?y hasParent ?z)
=>
(?y hasGrandparent ?z)
(where ?x, ?y and ?z are variables)
is represented as a Rule object: Rule([Pattern(?x,
hasParent, ?y)], [Pattern(?y, hasParent, ?x)]) where the
first and second lists of the Rule object represent the
left-hand side and right-hand side of the rule, respectively.
- Pychinko employs Charles Forgy's classic AI algorithm, Rete,
proven to be the most efficient way to evaluate large rule sets. The use
of Rete allows Pychinko to process data sets significantly
faster than native Python alternatives such as CWM.
- The Rete algorithm: trading memory for speed, scales
extremely well in environments with many rules. Rete is based
on the observation that newly added facts tend to require only a
small set of rules to be triggered. The Rete network
is a selective network that represents all the rules in a
knowledge base. When a new fact is
added, it is filtered through the Rete network, where only
relevant rules are checked (and if necessary) triggered to infer
new information. The filtering process of Rete prevents
irrelevant rules from being checked or fired.
- Our implementation of Rete is compact (approximately 1200 lines of
code) and makes use of Python dictionaries for indexing facts efficiently.
- Rules and the Semantic Web
- The aim of the Semantic
Web project: give useful semantics to digitally meaningless
documents; allow for intelligent software agents to perform
tasks that would now require a human eye and brain. The areas
we focus on where Semantic Web languages are used are social networks and trust.
- Rules are more light-weight and attractive for certain
applications than Description
Logics (DLs), a computationally tractable subset of
first-order logic that is widely used in Semantic Web applications. The recently released Nokia
mobile phone software development in Python serves as a prime
example. On the phone, resources are limited, and the hardware
limitations prevent one from using heavy-duty DL reasoners such
as Pellet (written in Java.)
- Rules naturally lend themselves to describing relationships
in social networks and in areas of trust.
- Social networks: Processing FOAF documents. FOAF
is a semantic web vocabulary for forming social networks.
Consider a web application performing social network analysis
on large, spidered FOAF documents. An intelligent filtering
mechanism to avoid 'bland' documents (perhaps autogenerated by
a different application) unfit for analysis is needed. A
filtering mechanism of this kind might follow
foaf:knows links, and, based on running a series of
tests (in the form of rules) on the values of other FOAF
properties present, decide whether the document is to be
included in the analysis. Such rules were easily implemented
in Pychinko, being already RDF/XML aware and capable of
handling FOAF data.
- Trust: Devising policies on the web; permissions
and restrictions of the form "I grant access to database X
only to senior employees--all others can only view databases Y
and Z" or "Only those known to be my friends in my FOAF
document can view my picture gallery" are naturally captured
in the form of rules. The rules are easily expressed in
Pychinko, while the terms to which the rules refer
("employee", "database Y", etc.) are likely to be references
to a concept in an OWL
ontology or RDFS document
- Averages computation: Used for inferencing and
analysis of 2004 federal election polling data (scraped to RDF)
- Being an OWL vocabulary, the real gain of FOAF over purely
syntactic descriptions such as XML comes from inference.
Pychinko, unconstrained by the complexities of DL reasoners and
their difficulty with FOAF, is well-suited for performing useful
inference. DL reasoners might be cumbersome for two reasons: 1)
their implementations are too resource expensive for certain
environments, and 2) their required formality in knowledge
representation, while well-suited for many tasks, sometimes
cannot capture the ad-hoc, minimal reasoning of certain
applications.
- Tim Berners-Lee's implementation of a rule engine in Python
for the Semantic Web, cwm, does not use the
Rete algorithm, instead choosing a naive for-loop
implementation, though with some topological sorting
optimizations. Pychinko, on average, processes rules two
times faster than cwm, and can be upto five times faster in certain
cases due to its usage of Rete. There is an active effort to
adopt the Pychinko engine as a replacement for cwm's current algorithm.
- Future work
- Rules for validation of data stored in regular data bases.
A series of tests, in the form of rules, could be run to check
whether data stored in a given data base does not break a formal
specification, whether semantic (an OWL ontology) or syntactic
(an XML Schema.)
- Rules as a styling tool for complex data (displaying, in
HTML form, instances of OWL ontologies.)
Last modified: Fri Jan 7 11:23:32 EST 2005