Transaction In Doubt

My notes on Software Engineering

Boosting Performance with C

Right after posting my last blog entry, I began work on the third iteration of Durable Rules. My main question was: how can I improve the framework's performance by an order of magnitude across all scenarios? As you can see, it has been quite a long time since my last blog post. Well… I have been very busy finding an answer to that question.

Early on, I decided to re-implement the core rules engine in C. Compared with scripting and memory-managed languages, C provides good control over memory allocation and a lot of flexibility in data structure definition. Obviously, control and flexibility come at a price: it took me a lot longer to develop the code.

So, in the end, this exercise was not simply porting code from one language to another. I had to rethink and redesign the data structures. I also had to design and write a JSON parser. And I must say: I also took some time to invent a new feature to enable batch processing for streaming. The detailed explanation is in the following sections.

Data Structure Definition

The most important part of the project was to define a suitable data structure to represent a ruleset. The central principle: the system has to guarantee O(n) (linear) performance as a function of the input (event) size. To illustrate how this is achieved, let’s consider the simple rule definition below:

var d = require('durable');
d.run({
    rating: {
        track: {
            whenAll: {
                a: { event: 'start' },
                b: { event: 'end' }
            },
            run: function(s) { }
        }
    }
});
The code above waits for two events, ‘start’ and ‘end’, to execute the ‘track’ action. Conceptually this rule is represented by a Rete tree, whose root is an alpha node for message based events (type = $m). This node is followed by two alpha nodes with the rules event = ‘start’ and event = ‘end’ respectively. The alpha nodes are joined by the ‘all’ beta node, which is followed by the ‘track’ action node.


In my C implementation the Rete tree is a trie. Every time an event is observed, each event message property is evaluated in the trie. The conceptual counterpart of a trie node is an alpha node in the Rete tree; thus each trie node contains the rule to be applied to a single named property. Each trie node also points to the subsequent properties to be evaluated (typically derived from an ‘and’ expression). The leaves of the trie point to a linked list of beta nodes terminated by an action node.


Very important: the trie node keeps the map of subsequent properties in a hash table. This is a slight deviation from typical trie nodes used in text search, which keep a map of subsequent valid characters in an array. I decided to handle collisions in the hash table with a simple linear probing algorithm. With linear probing I get better memory usage and predictable, near constant time performance, as long as the load on the tables is kept low.

In order to increase memory locality and make better use of the processor cache, the trie structure is packed into two arrays: one array holds all the alpha, beta and action nodes, while a second array contains all the hash table buckets for the trie nodes. Nodes and hash tables never hold pointers to other nodes, only array offsets.
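
As a rough sketch of this layout (the names, sizes and exact structure here are my own assumptions for illustration, not the actual durable rules source), a trie node can address its children through a shared bucket array using offsets and linear probing:

```c
#include <assert.h>
#include <string.h>

#define MAX_BUCKETS 8            /* per-node table size; sketch only */
#define NO_OFFSET 0xFFFFFFFF     /* empty bucket / lookup miss */

/* A trie node stores the rule for one named property. Instead of pointers
 * it holds offsets into two global arrays: one for nodes, one for buckets. */
typedef struct {
    unsigned int hash;           /* hash of this node's property name */
    unsigned int firstBucket;    /* offset of this node's buckets in the bucket array */
    unsigned int betaOffset;     /* for leaves: offset of the beta/action list */
} node;

typedef struct {
    unsigned int hash;           /* hash of the child property name */
    unsigned int nodeOffset;     /* offset of the child node in the node array */
} bucket;

/* Linear probing lookup: start at hash % MAX_BUCKETS within the node's
 * bucket range and walk forward until a hit or an empty bucket. */
unsigned int get_child(bucket *buckets, unsigned int firstBucket,
                       unsigned int hash) {
    for (unsigned int i = 0; i < MAX_BUCKETS; ++i) {
        bucket *b = &buckets[firstBucket + (hash + i) % MAX_BUCKETS];
        if (b->nodeOffset == NO_OFFSET) return NO_OFFSET;  /* miss */
        if (b->hash == hash) return b->nodeOffset;         /* hit */
    }
    return NO_OFFSET;
}

/* Linear probing insert: take the first empty bucket after the home slot. */
void put_child(bucket *buckets, unsigned int firstBucket,
               unsigned int hash, unsigned int nodeOffset) {
    for (unsigned int i = 0; i < MAX_BUCKETS; ++i) {
        bucket *b = &buckets[firstBucket + (hash + i) % MAX_BUCKETS];
        if (b->nodeOffset == NO_OFFSET) {
            b->hash = hash;
            b->nodeOffset = nodeOffset;
            return;
        }
    }
}
```

Because both structures are plain offsets into flat arrays, the whole ruleset stays contiguous in memory and survives being copied or relocated without pointer fixups.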


JSON Parsing

To reduce the cost of data transformation I decided to use JSON as the event type system for the new rules engine. Thus a JSON object can be passed directly from a Web request input stream without any extra processing. C doesn’t have built-in facilities for JSON parsing, so I wrote a lightweight JSON parser following a few strict principles:

  • Avoid Document Object Model (DOM) definition and buffering
  • Avoid using the heap; use only stack memory
  • Optimize for parsing key value pairs in a single pass
  • Calculate property name hashes while parsing

The JSON parser is really just a simple state machine tailored to its specific use in the rules engine. An important aside: I chose the DJB algorithm to hash property names because it is known to have good distribution and it adds minimal overhead while parsing. The property name hash codes, as explained in the section above, are critical for ultra fast Rete tree evaluation.
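
The last two principles can be combined: the hash is computed in place while the parser scans past the property name, so the name is never copied into a buffer. A sketch of how a DJB (djb2) style scan might look (my own illustration, not the actual parser code):

```c
#include <assert.h>

/* Compute the djb2 hash of a JSON property name in place. The parser
 * walks the characters of the name, folding each one into the hash,
 * and stops at the closing quote; no buffer is allocated and the
 * cursor is left where parsing resumes. (Sketch only; the real parser
 * also tracks parse state and value offsets.) */
unsigned int hash_name(const char **first) {
    unsigned int hash = 5381;            /* djb2 initial value */
    const char *current = *first;
    while (*current && *current != '"') {
        hash = hash * 33 + (unsigned char)*current;  /* djb2 step */
        ++current;
    }
    *first = current;                    /* resume right after the name */
    return hash;
}
```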


Batch Processing

Compared with events which don’t match any rule, evaluating a single positive event is expensive because it needs to be recorded in the Redis cache. In some cases it triggers the evaluation of a beta join and in other cases the queueing of an action. Batching helps amortize the cost of all this activity and allows for processing large streams of data. Let’s consider the following snippet of code:

d.run({
    approval: {
        rule: {
            whenSome: { $and: [{ subject: 'approve' }, { $lte: { amount: 1000 }}]},
            run: function(s) {}
        }
    }
}, '', null, function(host) {
    host.postBatch('approval', [{ id: '0', sid: 1, subject: 'approve', amount: 100 },
                                { id: '1', sid: 1, subject: 'approve', amount: 100 },
                                { id: '2', sid: 1, subject: 'approve', amount: 100 },
                                { id: '3', sid: 1, subject: 'approve', amount: 100 },
                                { id: '4', sid: 1, subject: 'approve', amount: 100 }]);
});

The ‘rule’ action will be able to process at least one event that matches the expression at the time of evaluation. The postBatch function allows clients to post an array of event messages for evaluation and dispatching.


Measuring Performance

In order not to lose track of my main objective, I constantly measured performance while developing the project. When talking about performance there is always a lot of confusion, so first I will explain the methodology I used for measuring, then I will present the results of three important tests.

In all benchmarks I used the same equipment: iMac, OS X 10.9.4, 3.4GHz i7, 16GB RAM 1333MHz DDR3. I drove the CPU utilization to 90% across all cores by running the same test concurrently in as many node.js processes as needed. In addition, I added as many Redis servers as required to prevent Redis from becoming a bottleneck. I defined the number of operations to be performed and measured the time it took to perform them. ‘Throughput’ is the number of operations divided by the elapsed time. ‘Cost’ is the elapsed time divided by the number of operations. In the notes below I use the unit ‘K’ to denote thousands and ‘M’ to denote millions.
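
As a small worked example of these two definitions (the numbers here are illustrative only, not benchmark results):

```c
#include <assert.h>

/* Throughput: operations completed per unit of time. */
double throughput(double operations, double seconds) {
    return operations / seconds;
}

/* Cost: time spent per operation (the reciprocal of throughput). */
double cost(double operations, double seconds) {
    return seconds / operations;
}
```

For example, 1M operations completed in 2 seconds gives a throughput of 500K operations per second and a cost of 2 microseconds per operation.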

Reject Events Test

In this test I defined a ruleset with 30 rules and continuously asserted event messages which did not match any rule. I ran several experiments with different event message sizes, and I used two different modes: single events and batch events. The objective was to measure the raw JSON parsing and tree expression evaluation speed. Just as a reference, not as a baseline, I ran the same experiments with the JScript JSON.parse function.


In batch mode durable rules is able to process a little less than 10M small messages (30B) every second; in single message mode it is 25% slower (7.5M messages every second). In both batch and single mode durable rules is able to process 1M larger messages (1KB) every second. Important note: the complexity of the algorithm is linear (O(n)) in the size of the event messages. Interesting factoid: parsing the same messages in JScript (JSON.parse) is also linear, but every byte takes more cycles to parse; it can parse 10M small messages every second but only 0.25M large messages every second.

Accept Events Test

Again I defined a ruleset with 30 rules and continuously asserted event messages. This time all event messages matched a rule in the set and led to queueing an action. I ran experiments with different message sizes and tried two modes: single and batch. The objective was to measure the Redis caching and the beta join calculation done in Redis scripts.


In batch mode durable rules is able to process 250K small event messages (50B) every second, while it can process 60K single small event messages every second. It can process 120K and 40K large event messages (1KB) every second in batch and single mode respectively. Again, the algorithm is linear (O(n)) in the size of the event message.

Rete Cycle Test

Finally I tested the full Rete cycle. I defined a ruleset with 30 rules, but in this case not only were the event messages accepted; the actions were also dequeued, the messages removed and the new associated state re-asserted. All of this was done while keeping the system's consistency guarantees.


In batch mode durable rules can run 100K small event messages (50B) per second through the Rete cycle, while in single mode it can run 18K. For larger event messages (1KB), the results are 15K and 65K per second in single and batch mode respectively. The algorithm is linear (O(n)) in the size of the event message.

In conclusion, the new C implementation provides a significant performance boost over my previous JScript based implementation. The new batch/streaming feature allows for flowing a lot more data through the system, making more efficient use of the Redis cache.

To learn more, please visit:

Rete Meets Redis

In my previous post I briefly touched on the subject of rules and the work I published in NPM as ‘durable’, whose source is publicly available. In this post I explore in detail how to use the Rete algorithm and the Redis server as a powerful combination for processing real time events. A little bit of background first: the Rete algorithm was created a few decades ago and has been used in inference based expert systems for efficient execution of business rules. Redis is a data-structure server, currently used in a large variety of scalable applications.

Ruleset Definition

A ruleset is a set of conditions followed by actions. In ‘durable’ I use rulesets to express conditions about events, that is, things that happen. Let me illustrate with a simple, downright simplistic example: imagine we are rating the effectiveness of a web page, and we establish the following rules:

  1. As soon as a visitor hits our web page, we will start a timer of 5 minutes (assuming the visitor is now reading).
  2. When our visitor clicks on a link within the 5 minute timeout, we are going to increase the page score (yes the page worked!)
  3. When the timeout expires and our visitor did not click on any link, we are going to decrease the page score (the page was abandoned)

I will now translate the rules above to the simple JSON abstraction used in ‘durable’ for expressing rules. Don’t worry if you don’t understand all the details; as long as you see how it relates to the description above, you should be able to follow. If you are curious about the rules definition abstraction, please visit

rating: {
    r1: {
        when: { event: 'start' },
        run: function(s) {
            s.state = 'reading';
            s.startTimer('timeout', 300);
        }
    },
    r2: {
        whenAll: {
            $m: { event: 'click' },
            $s: { state: 'reading' }
        },
        run: function(s) { increaseScore(); }
    },
    r3: {
        whenAll: {
            $m: { $t: 'timeout' },
            $s: { state: 'reading' }
        },
        run: function(s) { decreaseScore(); }
    }
}

The Rete Tree

When running the ruleset above, the first thing the ‘durable’ runtime is going to do is create a Rete tree as shown below:

Rating Ruleset Rete Tree

In a Rete tree there are three different kinds of nodes:

  1. The solid line circles in the diagram above are called alpha nodes. They have one input and can have several outputs. Their main task is routing data into beta or action nodes. They are associated with an assertion about the input. When the assertion is met, the data is pushed into all children nodes.
  2. The dashed line circles are known as beta nodes. They have two or more inputs and one output. They are in charge of doing join operations, that is, pushing data into action nodes when all input criteria have been met.
  3. The squares represent actions. Actions do real work (such as sending email or triggering other events), they can change the state of the tree, negating previous state and asserting new state.

The two top level alpha nodes represent the type of assertions that can be made. In the case of durable there are always two top level nodes: one for event messages and another for user state changes.

Forward Chaining

Now it is time to consider how an event sequence works. Please pay special attention to step 2.iv. Remembering facts about a rule's antecedent state is the key to high performance rule evaluation. When the event in step 3 happens, the system doesn’t need to reevaluate the entire ruleset, nor does it need to deserialize the user state.

1. Let’s start by posting the following message:

    {
        id: 1,
        event: 'start'
    }

i. The message comes through the REST handler and is posted to the ruleset’s Rete tree.
ii. The message is pushed to the alpha node `type = '$m'`. The condition is matched, so it is pushed to the next alpha node.
iii. The message meets the condition `$m.event = 'start'`. It is then pushed to the action node.
iv. An action item is added to the action queue of the ruleset. The REST handler response is sent back to the user at this point.

2. The action is picked up by the background dispatch loop:

i. The action is executed; the state is changed to `s.state = 'reading'`.
ii. The new state is pushed to the alpha node `type = '$s'`. The condition is matched. The state is pushed to the next alpha node.
iii. The state meets the condition `$s.state = 'reading'`, so it is pushed to the beta node.
iv. The ‘all’ join in the beta node is not complete yet. This input is remembered (this fact is stored in memory).

3. Now we assume the user clicked on a link in the page and as a result the following message is posted:

    {
        id: 2,
        event: 'click'
    }

i. The request comes through the REST handler and is posted to the ruleset’s Rete tree.
ii. The message is pushed to the alpha node `type = '$m'`, which pushes the message to the next alpha node.
iii. The message meets the condition `$m.event = 'click'` and is pushed to the beta node.
iv. The ‘all’ join is now satisfied, so the message is pushed to the action node.
v. An action is added to the action queue of the ruleset. The REST handler response is sent back to the user at this point.

4. The action is picked up by the background dispatch loop.

i. The ‘all’ join state is deleted.
ii. The action is executed.

Tree State in Redis

In the previous section you might have noticed I glossed over the Rete tree memory management details (steps 2.iv and 3.iv). In order to scale out, that is, to use several cores to evaluate the ruleset, it is necessary to store the tree state in a cache external to the processes. Managing the beta node state needs careful consideration. Let’s look at a first approach to handling steps 2.iv and 3.iv above:

Get the beta node state from the cache
If the ‘all’ join is not complete
    Store the fact that one beta node side is complete
Else
    Delete the beta node state from the cache
    Add an item to the action queue

The pseudo code above has two fatal flaws: 1. a race condition when checking and storing the beta node state can lead to losing event messages; 2. a failure after deleting the beta node state can lead to dropping the action. To fix both problems we need to use a lock, and the code needs to be idempotent so it can be retried in case of failure.

Acquire a lock with a timeout (in case of failure)
Get the beta node state from the cache
If the ‘all’ join is not complete
    Store the fact that one beta node side is complete
Else
    Add an item to the action queue if it has not been added
    Delete the beta node state from the cache
Release the lock

Now the code above requires up to five cache accesses. In addition, recovering from a failure is delayed by waiting for the lock to time out. In order to achieve real time event processing performance, only one memory cache access can be afforded for each relevant event (I came to this conclusion after studying the performance profiles from my benchmarks). There has to be a better way of doing this.

Redis offers features such as sorted sets, single threaded multi command execution and Lua scripting, which can be used to efficiently address the problem above. The forward chaining algorithm has to be modified for efficient memory management and join evaluation. Specifically:

  1. Push the beta node state all the way down to the action nodes.
  2. Make the action nodes responsible for storing the information in Redis, evaluating upstream joins and enqueuing actions in a single multi execute sequence.

To illustrate this point, for the example above, the following sets are created to store the join information:

  • rating.r1.$m
  • rating.r2.$s.$s
  • rating.r2.$m.$m
  • rating.r3.$s.$s
  • rating.r3.$m.$m

The join evaluation works as follows:

Multi execute sequence for step 2.iv.
    Add an entry identifying the user state to rating.r2.$s.$s.
    Join by testing if rating.r2.$s.$s and rating.r2.$m.$m have an entry.
Multi execute sequence for step 3.iv.
    Add an entry identifying the event message to rating.r2.$m.$m.
    Join by testing if rating.r2.$s.$s and rating.r2.$m.$m have an entry.
    Add action entry to the action queue.
    Remove the entries in rating.r2.$m.$m and rating.r2.$s.$s.

Please note the multi execute sequence runs as a single threaded set of instructions in the Redis server. The use of sets ensures the idempotency of the operations (they can be retried in case of failure).
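
To make the idempotency argument concrete, here is a small in-memory simulation of the two sequences above, with plain flags standing in for the Redis sets rating.r2.$s.$s and rating.r2.$m.$m (a sketch of the algorithm only; the real engine executes these steps inside a single Redis multi execute):

```c
#include <assert.h>

/* In-memory stand-ins for the two join sets and the action queue.
 * Adding to a set twice is a no-op, which is what makes the real
 * multi execute sequence idempotent and therefore safe to retry. */
typedef struct {
    int stateSide;     /* entry present in rating.r2.$s.$s */
    int messageSide;   /* entry present in rating.r2.$m.$m */
    int queuedActions; /* entries added to the action queue */
} joinState;

/* Step 2.iv: remember the user state side, then test the join. */
void assert_state(joinState *j) {
    j->stateSide = 1;                      /* SADD, idempotent */
    if (j->stateSide && j->messageSide) {  /* join test */
        j->queuedActions++;                /* enqueue action */
        j->stateSide = j->messageSide = 0; /* remove both entries */
    }
}

/* Step 3.iv: remember the event message side, then test the join. */
void assert_message(joinState *j) {
    j->messageSide = 1;                    /* SADD, idempotent */
    if (j->stateSide && j->messageSide) {
        j->queuedActions++;
        j->stateSide = j->messageSide = 0;
    }
}
```

Replaying either function after a failure leaves the state unchanged until the opposite side arrives, at which point exactly one action is queued and both sides are cleared.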