Transaction In Doubt

My notes on Software Engineering


Polyglot

From its beginning, the objective of the durable_rules project has been to provide a library that can be leveraged in the most popular languages for building internet applications. Even though durable_rules borrows a healthy number of concepts from the Production Business Rules and Expert Systems domains, I consider it critical to tap into the infrastructure and knowledge base that has been built over the last few years with languages such as JavaScript, Python and Ruby.

The initial durable_rules prototypes were built using JavaScript and the node.js environment. As I considered targeting a larger audience, the implementation evolved into a core engine written in C, using JSON as the lingua franca for defining both rulesets and the data to be processed. A set of thin adapters would allow using the engine from JavaScript, Python and Ruby. I borrowed the MongoDB JSON query syntax as the linguistic abstraction for defining rulesets. For optimal performance, the data in Redis would be stored using the MessagePack format (not JSON).

[Figure: architecture]

Because scripting languages have great support for the JSON data model (hash sets, arrays and primitives), I was optimistic (perhaps naively) that this approach would be a one-size-fits-all solution. However, as the project progressed, I found a couple of problems with this idea.

First problem

In durable_rules, the precedence of the logical operators (and, or) and the sequence operators (when_all, when_any) matters. In MongoDB an expression like ‘qty > 100 and price < 9.95’ is represented as ‘{ qty: { $gt: 100 }, price: { $lt: 9.95 } }’. This is nice and simple, but the JSON data model doesn’t imply hash-set ordering. What if I don’t want the engine to evaluate price when qty <= 100? …And then came other features such as correlations, statecharts, flowcharts, parallel, salience, tumbling windows, count windows… So I departed from MongoDB queries. The example below shows how a typical rule is defined in the latest durable_rules implementation. This rule waits for three events: the first of type ‘purchase’; the second with the same ‘ip’ address as the first but a different ‘cc’ number; and the third with the same ‘ip’ address as the first, but a ‘cc’ number different from that of both the first and the second.

JSON.stringify({
    suspect: {
        all: [
            {first: {t: 'purchase'}},
            {second: { 
                $and: [
                    {ip: {first: 'ip'}}, 
                    {$neq: {cc: {first: 'cc'}}} 
                ]
            }},
            {third: { 
                $and: [
                    {ip: {first: 'ip'}}, 
                    {$neq: {cc: {first: 'cc'}}}, 
                    {ip: {second: 'ip' }}, 
                    {$neq: {cc: {second: 'cc'}}}
                ]
            }}
        ],
    }
})

Second problem

The rule above expresses precisely what I want. But I just cannot imagine manually defining tens, hundreds or thousands of rules using the format above. It is not sustainable. Ruby, with its great DSL support, gave me the creative inspiration for making significant improvements to the user model. By leveraging language extensibility I was able to define and follow a set of principles for the three libraries I have implemented so far (JavaScript, Python and Ruby):

  1. Leverage the language
    • Use existing expression language
    • Use existing scoping constructs
    • Use blocks, lambdas and decorators
  2. Enable compact and maintainable code
    • Avoid nesting/scoping when possible
    • Encourage the user to define short phrases, avoid cascading
    • Facilitate troubleshooting

To the extent possible I tried driving uniformity across the library implementations; however, I did not sacrifice usability for uniformity. Below is a summary of the features I used for implementing the libraries.

[Figure: summary of the language features used in each library]

Notes on Ruby

Ruby has the most comprehensive language extensibility: operator overloading for logical and arithmetic expressions; property accessor interception to leverage ‘.’ notation; block support to seamlessly integrate actions as part of the rule definition; and contexts applied to block execution to avoid unnecessary variable qualification in expression rvalues and action implementations. All these features lead to compact and fairly well integrated ruleset definitions.

Durable.ruleset :suspect do
  when_all c.first = m.t == "purchase",
           c.second = (m.ip == first.ip) & (m.cc != first.cc),
           c.third = (m.ip == second.ip) & (m.cc != first.cc) & (m.cc != second.cc) do
    puts "detected " + first.cc + " " + second.cc + " " + third.cc
  end
end

‘durable_rules’ exposes a REST API to post events and assert facts. I used the Sinatra framework to implement that API. It was very easy to integrate, except for the threading model: for each request a synchronous response is expected, and multiple concurrent requests are dispatched on different threads in a single process.
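
To give a sense of how little glue is required, here is a minimal sketch of such an endpoint. The ‘engine_post’ helper is a hypothetical stand-in for the call that hands the event to the engine; this is not the actual durable_rules REST implementation.

require 'sinatra'
require 'json'

# Hypothetical hand-off to the rules engine; the real durable_rules call differs.
def engine_post(ruleset, event)
  puts "post #{event.inspect} to #{ruleset}"
end

post '/:ruleset/events' do
  event = JSON.parse(request.body.read)
  engine_post(params[:ruleset], event)
  content_type :json
  { :outcome => 'queued' }.to_json
end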

One caveat really bothers me: in Ruby it is not possible to overload the logical ‘and’ and ‘or’ operators, so I had to use the bitwise operators instead. The bitwise operators have a different precedence from their logical counterparts, so logical expressions have to be enclosed in parentheses: ‘(m.ip == first.ip) & (m.cc != first.cc)’ as opposed to ‘m.ip == first.ip && m.cc != first.cc’.
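
As a minimal sketch of the mechanics described above (not the durable_rules implementation), the following toy code shows how method_missing provides the ‘.’ notation and how overloading ‘==’, ‘!=’ and ‘&’ builds an expression tree instead of evaluating a boolean; ‘&&’ cannot participate because it is built into the language and always short-circuits.

class Expr
  attr_reader :tree
  def initialize(tree)
    @tree = tree
  end
  def &(other)
    # '&' dispatches to this method, so it can build a conjunction node
    Expr.new({ :and => [tree, other.tree] })
  end
end

class Term
  def initialize(name)
    @name = name
  end
  def ==(other)
    Expr.new({ @name => { :eq => other } })
  end
  def !=(other)
    Expr.new({ @name => { :neq => other } })
  end
end

class MessageProxy
  # method_missing gives the '.' notation: m.ip, m.cc, ... become terms
  def method_missing(name, *args)
    Term.new(name)
  end
  def respond_to_missing?(name, include_private = false)
    true
  end
end

m = MessageProxy.new
expr = (m.ip == 'boston') & (m.cc != '1234')
puts expr.tree.inspect
# => {:and=>[{:ip=>{:eq=>"boston"}}, {:cc=>{:neq=>"1234"}}]}
# Without the parentheses, '&' would bind tighter than '==' and the expression
# would not parse as intended; with '&&' the first Expr (always truthy) would be
# discarded and the conjunction lost.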

Notes on Python

Python also has great language extensibility. As in Ruby, operator overloading for logical and arithmetic expressions and property accessor interception allow using ‘.’ notation for tuple names; function decorators integrate rule actions; and contexts leverage scoping constructs when defining rulesets, statecharts and flowcharts. This results in compact and well integrated ruleset definitions.

with ruleset('suspect'):
    @when_all(c.first << m.t == 'purchase',
              c.second << (m.ip == first.ip) & (m.cc != first.cc),
              c.third << (m.ip == second.ip) & (m.cc != first.cc) & (m.cc != second.cc))
    def detected(c):
        print ('detected {0} {1} {2}'.format(c.first.cc, c.second.cc, c.third.cc))

I used the Werkzeug framework to implement the REST API. I considered using Flask, but I really didn’t need all its features. As with Sinatra in Ruby: for each request a synchronous response is expected, and multiple concurrent requests are dispatched on different threads in a single process.

Python has the same caveat as Ruby regarding logical ‘and’ and ‘or’ overloading. Lambda support in Python is fairly limited, so I resorted to function decorators, still with a very reasonable outcome. In Python ‘=’ is a statement rather than an expression, so I had to use the ‘<<’ operator for naming events and facts in correlated sequences.

Notes on JavaScript

JavaScript does not have operator overloading, so a set of functions for describing logical and arithmetic expressions needs to be provided. Property accessor overloading is not readily available either, so I played a nifty trick to overcome this obstacle (see the note below). Lambda support enables rule action integration. Contexts allow leveraging scoping constructs when defining rulesets, statecharts and flowcharts. While the result seems acceptable, the question still remains: has this crossed the usability tipping point?

with (d.ruleset('suspect')) {
    whenAll(c.first = m.amount.gt(100),
            c.second = m.ip.eq(c.first.ip).and(m.cc.neq(c.first.cc)), 
            c.third = m.ip.eq(c.second.ip).and(m.cc.neq(c.first.cc), m.cc.neq(c.second.cc)),
        function(c) {
            console.log('detected ' + c.first.cc + ' ' + c.second.cc + ' ' + c.third.cc);
        }
    );
}

Node.js provides great support for implementing the REST API. In addition, the single-threaded, continuation-based model is (in my opinion) a beautiful invention: it helps with JavaScript ‘global’ variable management, enables non-blocking IO calls and provides nice concurrency control using multiple processes.

‘m.amount.gt(100)’: it is easy to take it for granted. In order to enable such an expression without requiring the user to define object hierarchies, I leveraged V8 extensibility in C++ (given that the rulesets run in node.js). In V8 it is possible to provide a proxy object that intercepts property Set/Get and calls back into a JavaScript lambda:

C++ class definition:

class Proxy {
public:

  // Keeps persistent handles to the JavaScript get/set callbacks.
  explicit Proxy(Handle<Function> gvalue, Handle<Function> svalue) {
    Isolate* isolate = Isolate::GetCurrent();
    _gfunc.Reset(isolate, gvalue);
    _sfunc.Reset(isolate, svalue);
  }

  // Returns the object whose named property accesses are intercepted.
  Handle<Object> Wrap();

private:

  static Proxy* Unwrap(Local<Object> obj);

  // Named property interceptors: forward Get/Set to the JavaScript callbacks.
  static void Get(Local<String> name, const PropertyCallbackInfo<Value>& info);
  static void Set(Local<String> name, Local<Value> value, const PropertyCallbackInfo<Value>& info);

  Persistent<Function> _gfunc;
  Persistent<Function> _sfunc;
};

From JavaScript:

var s = r.createProxy(
    function(name) {
        // getter callback: invoked when a property is read, e.g. s.amount
        ….
    },
    function(name, value) {
        // setter callback: invoked when a property is assigned
    }
);

Please find the full implementation in the durable_rules code base.

A note on multi-platform support

Unfortunately, a polyglot framework does not imply multi-platform support. In the Unix ecosystem, the ‘durable_rules’ libraries (including the C engine implementation) have a high degree of portability. That is to say: having implemented the framework on the Mac OS X platform, I needed very few changes to make it work on Linux. Integrating the Windows platform has been a lot more difficult. Visual Studio, the premier toolset for Windows, doesn’t fully support C99 (variable length arrays, asprintf, snprintf…); luckily, all these limitations can be overcome without fundamental changes or a separate code fork. Hiredis is not officially supported by Redis on Windows, but the MS Open Tech team has done a nice job porting and maintaining the Redis code base for Windows, and I extracted the Hiredis portion for the ‘durable_rules’ C engine, which heavily depends on it.

 

Rete_D

The past few months have been a time of intense study and creative software implementation, which led to the full development of the durable_rules forward inference algorithm. Before I release what I consider the durable_rules beta version, I would like to present my thoughts on what I have done with the Rete algorithm. It is difficult to talk about an algorithm without naming it, so in this note I refer to the durable_rules forward inference implementation as Rete_D (D for distributed).

As I read blogs and white papers on rules engine implementations, I found interesting similarities, at least in principle, between Rete_D and Drools (ReteOO as well as PHREAK). In particular, the following features are worth mentioning:

  • Alpha node indexing: Rete_D accomplishes this by using a trie written in C (see my previous blog entry).
  • Beta node indexing: in Rete_D, `join` and `not` nodes are indexed using a Lua table.
  • Set oriented propagation: inserts and deletes are queued; during rule evaluation they are evaluated against a set of frames of facts and events stored in Redis.
  • Heap based agendas: the agenda for rule execution is managed with a Redis sorted set and lists ordered by salience (see the sketch after this list).
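
To make the last point concrete, here is a rough sketch of an agenda modeled as a Redis sorted set, using the redis gem directly. This is not the durable_rules code; the key name, the payload format and the convention that a lower score means higher priority are made up for illustration.

require 'redis'
require 'json'

redis = Redis.new   # assumes a local Redis server

# Queue two resolved rules on a per-ruleset agenda, scored by salience.
redis.zadd('agenda:fraud', 10, { :rule => 'suspect', :sid => 1 }.to_json)
redis.zadd('agenda:fraud', 1,  { :rule => 'vip',     :sid => 1 }.to_json)

# Dispatch the highest priority entry and remove it from the agenda.
entry = redis.zrange('agenda:fraud', 0, 0).first
redis.zrem('agenda:fraud', entry) if entry
puts entry   # => {"rule":"vip","sid":1}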

Rete_D has a number of important features that make it unique. To demonstrate them I will use a simple fraud detection rule written in Ruby.

Durable.ruleset :fraud do
  when_all c.first = m.t == "purchase",
           c.second = (m.t == "purchase") & (m.location != first.location) do
    puts "fraud detected " + first.location + " " + second.location
  end
end

Scripting

In the example above, you might have noticed that there is no type or model definition. Events and facts are JSON objects, and rules are written for the JSON type system (sets of name-value pairs). A good analogy is the MongoDB query language, in which the existence of JSON objects is assumed and an explicit structured schema definition is not needed. In fact, in my earliest Rete_D implementation I used the MongoDB syntax for defining rules. I departed from the MongoDB query syntax because expression precedence when using operators such as And, Or, All and Any is ambiguous. In today’s durable_rules implementation our example is translated to:

fraud: {
    all: [
        {first: {t: 'purchase'}},
        {second: {$and: [{t: 'purchase'}, {$neq: {location: {first: 'location'}}}]}}
     ],
}

Events and facts are fully propagated down the network; that is, there are no property extraction constructs in either the library or the Rete_D structure. In a rule definition, events and facts have to be named and are referenced by that name when expressing correlations. Any property name can be used. The results:

  1. Simpler rule definitions. 
  2. Integration with scripting languages, which provide great support for the JSON type system.

Distribution

With Rete_D, the tree evaluation can be distributed among different compute units (call them processes or machines). There are three important reasons for doing so:

  1. Availability: avoid single points of failure.   
  2. Scalability: distribute the work across the available resources.
  3. Performance: consolidate event ingestion with real-time rule evaluation.

To make this viable, pushing evaluation as close to the data as possible is critical. Alpha nodes are evaluated by the Rete_D C library and can be distributed symmetrically across compute units (such as Heroku web dynos). Beta nodes are evaluated by a Lua script (created during rule registration) in the Redis servers where the state is kept. Events and facts can be evaluated in different Redis servers depending on the context they belong to. Finally, actions are executed by the compute units responsible for alpha node evaluation.

[Figure: trie]
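
As a rough illustration of the context-based distribution (again, not the durable_rules code; the server addresses and key layout are made up), the state for a given context can be routed to a fixed Redis server with a stable hash of the context id:

require 'redis'
require 'zlib'

# Hypothetical shard list; in a real deployment these are separate Redis servers.
SHARDS = [
  Redis.new(:url => 'redis://10.0.0.1:6379'),
  Redis.new(:url => 'redis://10.0.0.2:6379')
].freeze

# A stable hash keeps every event and fact for a context on the same server.
def shard_for(sid)
  SHARDS[Zlib.crc32(sid.to_s) % SHARDS.size]
end

# The Lua script evaluating the beta nodes then sees the complete state
# for context 1 on a single server.
shard_for(1).rpush('events:fraud:1', '{"t":"purchase","location":"US"}')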

Events

Events are first class citizens in the Rete_D data model. The difference between events and facts is that an event can only be seen once by any action. This principle implies that events can be deleted as soon as a rule is resolved and before the corresponding action is scheduled for dispatch, which can dramatically reduce the combinatorics of join evaluation and the amount of unnecessary computation in some scenarios.

To illustrate this point, let’s add the following rule to the example above:


  when_start do
    assert :fraud, {:id => 1, :sid => 1, :t => "purchase", :location => "US"}
    assert :fraud, {:id => 2, :sid => 1, :t => "purchase", :location => "CA"}
    assert :fraud, {:id => 3, :sid => 1, :t => "purchase", :location => "UK"}
    assert :fraud, {:id => 4, :sid => 1, :t => "purchase", :location => "GE"}
    assert :fraud, {:id => 5, :sid => 1, :t => "purchase", :location => "AU"}
    assert :fraud, {:id => 6, :sid => 1, :t => "purchase", :location => "MX"}
    assert :fraud, {:id => 7, :sid => 1, :t => "purchase", :location => "FR"}
    assert :fraud, {:id => 8, :sid => 1, :t => "purchase", :location => "ES"}
    assert :fraud, {:id => 9, :sid => 1, :t => "purchase", :location => "BR"}
    assert :fraud, {:id => 10, :sid => 1, :t => "purchase", :location => "IT"}
  end

Because each fact matches both the first and the second rule expression, there are 10 x 10 comparisons, which produce 10 x (10 - 1) = 90 results. In contrast, let’s replace the ‘when_start’ block above with the following one, which posts events instead of asserting facts:


  when_start do
    post :fraud, {:id => 1, :sid => 1, :t => "purchase", :location => "US"}
    post :fraud, {:id => 2, :sid => 1, :t => "purchase", :location => "CA"}
    post :fraud, {:id => 3, :sid => 1, :t => "purchase", :location => "UK"}
    post :fraud, {:id => 4, :sid => 1, :t => "purchase", :location => "GE"}
    post :fraud, {:id => 5, :sid => 1, :t => "purchase", :location => "AU"}
    post :fraud, {:id => 6, :sid => 1, :t => "purchase", :location => "MX"}
    post :fraud, {:id => 7, :sid => 1, :t => "purchase", :location => "FR"}
    post :fraud, {:id => 8, :sid => 1, :t => "purchase", :location => "ES"}
    post :fraud, {:id => 9, :sid => 1, :t => "purchase", :location => "BR"}
    post :fraud, {:id => 10, :sid => 1, :t => "purchase", :location => "IT"}
  end

Remember that each event can only be seen once. Because an event is consumed as soon as it is matched, the 10 events form at most 5 disjoint pairs, so there are 10 comparisons, which produce only 5 results. As you can see, events can dramatically improve performance in some scenarios.

Context References

It is common to provide rule configuration information by asserting facts. In some cases this might lead to increased join combinatorics. Rete_D implements an eventually consistent state cache, which can be referenced during alpha node evaluation, thus reducing the beta evaluation load. Consider the following rule implemented in Ruby:

Durable.ruleset :a8 do
  when_all m.amount > s.id(:global).min_amount do
    puts "a8 approved " + m.amount.to_s
  end
  when_start do
    patch_state :a8, {:sid => :global, :min_amount => 100}
    post :a8, {:id => 2, :sid => 1, :amount => 200}
  end
end

The event is compared against the ‘min_amount’ property of the context state object called ‘global’. This evaluation is done by an alpha node using the state cached in the compute unit, which is refreshed every two minutes (by default). Again, this can lead to a reduction in join combinatorics and significant performance improvements in some scenarios.
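
The sketch below shows the general shape of such a cache (this is not the durable_rules code; the key layout is made up): reads are served from a local copy and refreshed from Redis only after a time-to-live expires, which is what makes it eventually consistent.

require 'redis'
require 'json'

class StateCache
  # ttl_seconds mirrors the default refresh interval mentioned above (two minutes)
  def initialize(redis, ttl_seconds = 120)
    @redis = redis
    @ttl = ttl_seconds
    @entries = {}   # sid => [state_hash, fetched_at]
  end

  # Return the cached state for a context, refreshing it from Redis when stale.
  def get(ruleset, sid)
    state, fetched_at = @entries[sid]
    if state.nil? || Time.now - fetched_at > @ttl
      raw = @redis.hget("state:#{ruleset}", sid.to_s)
      state = raw ? JSON.parse(raw) : {}
      @entries[sid] = [state, Time.now]
    end
    state
  end
end

cache = StateCache.new(Redis.new)
puts cache.get('a8', :global)['min_amount']   # may be up to two minutes stale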

Conclusion

Rete_D provides a number of unique features that I believe make it relevant for solving Streaming Analytics problems, where rules are expected to handle large amounts of data in real time. The following features are not yet implemented in Rete_D; they represent exciting areas for future research:

  • Beta node sharing
  • Lazy rule evaluation
  • Rule based partitioning
  • Backward inference