<< Previous | Home

Kafka Record Patterns for Data Replication

Imagine going down to your local milkshake bar and signing a contract with the owner so that you could purchase bespoke drinks at a set price. Let's say you agreed on fresh milk with 3.5% fat and one tablespoon of chocolate powder, per 500ml of milk.  Putting that into a table might look like this:

PK contract_number start fat_content chocolate_powder
100 12345678 2021-01-01 3.5% 1 tbsp

After a few weeks, your tastebuds become a little desensitised and you decide you want to add some more chocolate powder. The owner is agile, so he adjusts the contract, meaning we need to add a few columns in order to track validity:

PK contract_number contract_from start end fat_content chocolate_powder
100 12345678 2021-01-01 0001-01-01 2021-01-31 3.5% 1 tbsp
101 12345678 2021-01-01 2021-02-01 9999-12-31 3.5% 2 tbsp

Note two things: 1) this table is not normalised and 2) I used a low date (year 0001) and high date (year 9999) for the start of the first row and the end of the last row.

In reality we would probably normalise this data. For the sake of this example, I won't because it will make it more readable as I add more information below.

The low and high dates are there, so that I can always find data, regardless of the date I use - I don't have to know the contract termination date which is different for every contract, in order to be able to simply ask what the latest recipe is, for a given contract number:

select *
from contracts
where contract_number = '12345678' 
  and '9999-12-31' between start and end;
--> returns row with primary key 101

After a few more weeks, I realise that I need to reduce my calorific intake, but I'm a complete chocoholic. We agree to reduce the fat content:

PK contract_number contract_from start end fat_content chocolate_powder
100 12345678 2021-01-01 0001-01-01 2021-01-31 3.5% 1 tbsp
101 12345678 2021-01-01 2021-02-01 2021-02-28 3.5% 2 tbsp
102 12345678 2021-01-01 2021-03-01 9999-12-31 0.8% 2 tbsp

At some point I get bored of milkshakes and I terminate the contract, but because I never purchased a milkshake with 0.8% fat, the owner lets me terminate it with a date in the past, say 2021-02-14, so that we can delete the last row:

PK contract_number contract_from contract_to start end fat_content chocolate_powder
100 12345678 2021-01-01 2021-02-14 0001-01-01 2021-01-31 3.5% 1 tbsp
101 12345678 2021-01-01 2021-02-14 2021-02-01 9999-12-31 3.5% 2 tbsp

Note that it is a design choice whether or not we "shorten" the end date. We might want to do that in order to make such data not be found after the contract termination date. It depends on requirements more than anything.

What has all this got to do with Kafka, and data replication?

Imagine a self-contained microservice which needs to have an up to date copy of this data, in memory, in order to run lightning fast. Imagine you want that cache to be distributed across all of your service instances (Kubernetes pods). How about the following 7 lines of Kotlin code that use the nifty Kafka Streams API:

val builder = StreamsBuilder()
val globalStore = Materialized.`as`(globalStoreName)
// global, so that every pod has access to all data from all partitions:
builder.globalTable(CONTRACTS_TOPIC, globalStore)
val streams = KafkaStreams(builder.build(), props)
val globalBausteinView = streams.store(fromNameAndType(globalStoreName, ...)

// REST Handler:
val contractJson = globalBausteinView.get(contractNumber)

We need to publish the contract data to the topic used as the input, but before we do that, let's think about the keys we use, in order to have the data survive log compaction. It would be no good to publish three records, each using the contract number as the key, because as soon as the topic were compacted, only the data from the last row would survive, and any service replicating from scratch would have an incomplete dataset. The solution is to include the start date in the key, e.g. "12345678::2021-02-01".

We have a number of options regarding the values (payload). Let's work through the examples.

(Note: initially contracts are valid for 5 years, so the contract_to column always has a value)

1) Denormalised Table, Variation 1 - One Event per Attribute Combination
Use Case PK contract_number contract_from contract_to start end





records emitted
Contract Creation 100 12345678 2021-01-01


0001-01-01 9999-12-31 3.5% 1 tbsp

Key:  12345678::2021-01-01

Value: {cn: 12345678, from: "2021-01-01", to: "2025-12-31", start: "2021-01-01", end: "2025-12-31", fatContent: 3.5, choc: 1}

Change choc powder 101 12345678 2021-01-01 2025-12-31 0001-01-01 2021-01-31 3.5% 1 tbsp

Key:  12345678::2021-01-01

Value: {cn: 12345678, from: "2021-01-01", to: "2025-12-31", start: "2021-01-01", end: "2021-01-31", fatContent: 3.5, choc: 1}

102 12345678 2025-12-31 2025-12-31 2021-02-01 9999-12-31 3.5% 2 tbsp Key:  12345678::2021-02-01

Value: {cn: 12345678, from: "2021-01-01", to: "2025-12-31", start: "2021-02-01", end: "2025-12-31", fatContent: 3.5, choc: 2}

Change fat content 101 12345678 2021-01-01 2025-12-31 0001-01-01 2021-01-31 3.5% 1 tbsp none - no changes made
102 12345678 2021-01-01 2025-12-31 2021-02-01 2021-02-28 3.5% 2 tbsp Key:  12345678::2021-02-01

Value: {cn: 12345678, from: "2021-01-01", to: "2025-12-31", start: "2021-02-01", end: "2021-02-28", fatContent: 3.5, choc: 2}

103 12345678 2021-01-01 2025-12-31 2021-03-01 9999-12-31 0.8% 2 tbsp Key:  12345678::2021-03-01

Value: {cn: 12345678, from: "2021-01-01", to: "2025-12-31", start: "2021-03-01", end: "2025-12-31", fatContent: 0.8, choc: 2}

Contract Termination 101 12345678 2021-01-01 2021-02-14 0001-01-01 2021-01-31 3.5% 1 tbsp

Key:  12345678::2021-01-01

Value: {cn: 12345678, from: "2021-01-01", to: "2021-02-14", start: "2021-01-01", end: "2021-01-31", fatContent: 3.5, choc: 1}

102 12345678 2021-01-01 2021-02-14 2021-02-01 2021-02-14 3.5% 2 tbsp Key:  12345678::2021-02-01

Value: {cn: 12345678, from: "2021-01-01", to: "2021-02-14", start: "2021-02-01", end: "2021-02-14", fatContent: 3.5, choc: 2}



Key: 12345678:2021-03-01

Value: null (tombstone record)

Note how the key and start/end dates are not the ugly technical dates but limited to the atual contract validity. That is a design choice where I chose not to expose technical details.

In this variant, we publish a record for the "lowest common denominators" in terms of validity. There is an event for each time window in which values are constant. Each change, leads to a new record.

Imagine viewing the validities of the values seperately, as they might be if we normalised the table:

Value January February March April...
Milk Fat Content 3.5 0.8
Chocolate Powder 1 2
Resulting Time Windows with constant values 3.5 & 1 3.5 & 2 0.8 & 2

Each change leads to a new row in the denormalised table and hence a new record in Kafka. The three events that are published are visible on that bottom row.

As an alternative, we could publish one event per contract, with validities inside the payload, as follows.

2) Denormalised Table, Variation 2 - One Event per Contract
Use Case records emitted
Contract Creation

Key:  12345678

Value: {cn: 12345678, from: "2021-01-01", to: "2025-12-31",

    fatContent: [ {start: "2021-01-01", end: "2025-12-31", value: 3.5} ],

    choc: [ {start: "2021-01-01", end: "2025-12-31", value: 1} ]


Change chocolate powder Key:  12345678

Value: {cn: 12345678, from: "2021-01-01", to: "2025-12-31",

    fatContent: [ {start: "2021-01-01", end: "2025-12-31", value: 3.5} ],

    choc: [ {start: "2021-01-01", end: "2021-01-31", value: 1},

                 {start: "2021-02-01", end: "2025-12-31", value: 2} ]


With this variation, we end up having to publish a list of values together with their validities.

3) Normalised Table, Each Attribute on its own Topic

The next solution is to publish each attribute on its own topic.

Use Case records emitted
Contract Creation

Topic: Contract

Key:  12345678

Value: {cn: 12345678, from: "2021-01-01", to: "2025-12-31"}

Topic: Fat Content

Key: 12345678::2021-01-01

Value: {start: "2021-01-01", end: "2025-12-31", value: 3.5}

Topic: Chocolate Powder

Key: 12345678::2021-01-01

Value: {start: "2021-01-01", end: "2025-12-31", value: 1}

Change choc powder

Topic: Chocolate Powder

Key: 12345678::2021-01-01

Value: {start: "2021-01-01", end: "2021-01-31", value: 1}

Key: 12345678::2021-02-01

Value: {start: "2021-02-01", end: "2025-12-31", value: 2}

Change fat content

Topic: Fat Content

Key: 12345678::2021-01-01

Value: {start: "2021-01-01", end: "2021-02-28", value: 3.5}

Key: 12345678::2021-03-01

Value: {start: "2021-03-01", end: "2025-12-31", value: 0.8}

Contract Termination

Topic: Contract

Key:  12345678

Value: {cn: 12345678, from: "2021-01-01", to: "2021-02-14"}

Topic: Fat Content

Key: 12345678::2021-01-01

Value: {start: "2021-01-01", end: "2021-02-14", value: 3.5}

Key: 12345678::2021-03-01

Value: null (tombstone record)

Topic: Chocolate Powder

Key: 12345678::2021-01-01 --> no change, so no record emitted

Key: 12345678::2021-02-01

Value: {start: "2021-02-01", end: "2021-02-14", value: 2}

4) Verticalised Table, One Topic for all Attributes

The final solution is to use a verticalised table in order to store the data. This has the advantage that you can dynamically add new attributes, and in fact each contract could have different attributes. This is akin to a schemaless document. The publication of records in Kafka becomes quite generic.

Use Case records emitted
Contract Creation

Key:  12345678::fatContent::2021-01-01

Value: {start: "2021-01-01", end: "2025-12-31", value: 3.5}

Key: 12345678::chocolatePowder::2021-01-01

Value: {start: "2021-01-01", end: "2025-12-31", value: 1}

Change choc powder

Key:  12345678::fatContent::2021-01-01 --> no change, no event emitted

Key: 12345678::chocolatePowder::2021-01-01

Value: {start: "2021-01-01", end: "2021-01-31", value: 1}

Key: 12345678::chocolatePowder::2021-02-01

Value: {start: "2021-02-01", end: "2025-12-31", value: 2}

Change fat content

Key:  12345678::fatContent::2021-01-01

Value: {start: "2021-01-01", end: "2021-02-28", value: 3.5}

Key:  12345678::fatContent::2021-03-01

Value: {start: "2021-03-01", end: "2021-02-28", value: 0.8}

Key: 12345678::chocolatePowder::2021-01-01 --> no change, no event emitted

Key: 12345678::chocolatePowder::2021-02-01 --> no change, no event emitted

Contract Termination

Key:  12345678::fatContent::2021-01-01

Value: {start: "2021-01-01", end: "2021-02-14", value: 3.5}

Key:  12345678::fatContent::2021-03-01

Value: null (tombstone record)

Key: 12345678::chocolatePowder::2021-01-01 --> no change, no event emitted

Key: 12345678::chocolatePowder::2021-02-01

Value: {start: "2021-02-01", end: "2021-02-14", value: 2}

My favourite is the first solution, as I find it to be the closest to the functional business requirements.

Another way to choose which solution to use might be to calculate the effect that the solution has on data volume (storage in Kafka; transport through your landscape; storage in replicates).

If you have other solutions, please get in touch.

Copyright ©2021, Ant Kutschera

Social Bookmarks :  Add this post to Slashdot    Add this post to Digg    Add this post to Reddit    Add this post to Delicious    Add this post to Stumble it    Add this post to Google    Add this post to Technorati    Add this post to Bloglines    Add this post to Facebook    Add this post to Furl    Add this post to Windows Live    Add this post to Yahoo!

Using Reinforcement Learning To Learn To Play Tic-Tac-Toe

About a year ago I set myself the goal of writing an algorithm that could learn to play tic-tac-toe. I didn't want to tell the algorithm what the rules of the game are, nor did I want it to try and use some kind of calculation to look ahead at possible moves which might lead to a win from the current state of the board. I wanted the algorithm to "learn" how to play, by playing against itself. I knew nothing about machine learning, so I spent a bit of time learning about neural networks but didn't get very far and convinced myself that neural networks wouldn't work and I put the project to the back of my mind. A few months ago at a Christmas party, I bumped into an acquaintance, JP, and I ended up telling him about my goal. He suggested reinforcement learning and a few days later I started reading about it. It didn't take long before I was trying to understand Q-learning and failing fast by getting lost in the maths. So with my limited knowledge I went about iteratively designing an algorithm to "learn" to play. Here is the result, which you can play against interactively by clicking on the board or buttons. It contains no code telling it what good or bad moves are and the AI algorithm knows nothing about the rules of the game - it's simply told what the outcome of each move is (unfinished, won, drawn, lost). It doesn't even know that the aim is to get three in a row. All the AI algorithm knows is that X and O make alternate moves and that a square can only contain an X or an O or be emtpy.

The "clear" button clears the board and resets the game. The "reset stats" button clears the score board below the buttons. The "play autonomously" button causes it to start playing against itself and continue learning - it swaps side every hundred games and will occasionally explore randomly, according to its exploration strategy (see below). The "swap roles" button causes it to change sides. The "explore" button can be used to turn off exploration during games against humans.

Tic-tac-toe is played by successively placing 'X's and 'O's (markers) on the board until one side has three in a row, vertically, horizontally or diagonally. To start with, each cell on the board is empty (modelled as a null in Javascript). As the game progresses, the state of the board, a.k.a. "board state", changes. As an example, here is the board state after one move by X in the middle of the top row of the board:

  j=0 j=1 j=2
i=0   X  

Each state can be converted into a "pattern" which is built by concatenating the cell values from left to right starting with the top row and ending with the bottom row, delimiting the rows with a pipe character. Empty cells are represented using a hyphen ('-'). So the example above becomes: "-X-|---|---". The patterns are used as keys, which map to a list of possible moves from that board state (like a dictionary). To make things simple, I chose that X would always start regardless of whether it was the AI or the human moving first. For each possible move, the algorithm stores the final result that was achieved - a win for X, a win for O or a draw. After the first move, there are eight possible moves, after the second, only seven, etc. Results are recorded in JSON similar to the following, and here is an example showing what the algorithm has learned for the pattern where no move has yet been made:
const memory = {
    "---|---|---" : [
        {i: 1, j: 1, xWinRewards: 185510, oWinRewards: 20028, drawRewards: 60161},
        {i: 2, j: 2, xWinRewards:   1390, oWinRewards:  1910, drawRewards:   379},

Each element in the array represents a move which can be made from the board state represented by the key, with 'i' representing the vertical position, and 'j' the horizontal position. So 'xWinRewards' records how well X has done from this board state, in this case an empty board, after moving to the position indicated by i and j. 'oWinRewards' records how well player O has done, and 'drawRewards' shows how likely a draw is. The example above was taken after a hundred thousand games were played. It shows how the algorithm has learned that X is most likely to win if it starts in the centre of the board (position i=1, j=1). It also shows how moving to the centre is more likely to result in a draw than a win for the second player O, because it has gained more rewards for draws than for O winning. You can also see that far fewer games were played where the first player opened in the bottom right (2, 2) because it has gained less rewards for that move.

Notice the term "reward". The numbers recorded above are not the number of games, rather they are rewards (or points) that are given at the end of the game, when the algorithm walks through each move of the game, retroactively awarding points for each move. First it builds the pattern and then it looks that pattern up in its memory. It then adds a reward to the relevant "next move" which was made in the current game, adding it to the relevant field in the object shown above. Take for example the board after an initial move by X to the centre of the board. Let's say that O won later in the game, after moving to position 2,2 as their first move. So the pattern which is looked up is "---|-X-|---". In that memory location, the algorithm finds the object with i=2 and j=2. It adds a reward to the field named "oWinRewards". The algorithm then moves on to the next move which took place in the game and does the same thing for the pattern "---|-X-|--O", since that pattern represents the game after the second move.

Initially I used a reward of just one point. It soon became clear that the algorithm wasn't learning properly and it was still making "stupid" decisions, for example it wouldn't complete two in a row in order to win immediately and would prefer to move elsewhere rather than taking the victory. The reason is that in the beginning it only moves randomly (it has to explore since it has no experience to exploit, see below) and sometimes it would miss the chance to win immediately, but then win later in a game anyway. It learned that it could skip an immediate win and still be successful. The solution was to increase the reward for winning shorter games, so that the incentive was higher.

The next problem I encountered was that the algorithm was giving too much weight to moves which it had already encountered many times, even if a different move was actually better. It resulted in the algorithm doing what appeared to be silly moves, and not really seeming to learn. The solution was to chose the next move based on relative rewards, rather than absolute rewards. That way, the total number of times a move was tried was no longer relevant. Rather the chances of winning were what made the algorithm chose the next move. I later learned that there are different strategies here - you can base the choice of the next move on the average reward, or the best one that has ever been achieved, and the strategy relates to the reinforcement learning algorithm being used (for example Q-learning apparently uses the best reward rather than an average one).

Another factor which influenced how the algorithm learned was whether it learned against a player who always moved randomly or against itself. It became a stronger player when learning against itself, than against a random monkey. That makes sense because if you've only ever had to respond to random moves, you won't have learned how to play someone who knows how to win tic-tac-toe, where more skill is required.

Finally, the last important factor in learning is the trade off between exploration (to try a random move which might lead to learning something new) and exploitation (using existing knowledge to choose a "good" move based on experience). Just like a human, the algorithm starts by only exporing randomly, because it doesn't know better, but the more games it plays, the less it explores and the more it choses moves based on experience. The method used to decide whether to explore or not has quite an effect on how quickly and how well the algorithm learns.

So does the algorithm really learn? Let's take a look by asking some questions.

1) Does the algorithm know where the best starting location is?

Yes, because it knows that starting in the centre gives the best chances of a win. The rewards for the possible moves before any player has had a turn (pattern "---|---|---|"), ordered by reward with the best move at the top and the worst at the bottom are as follows:
{i: 1, j: 1, xWinRewards: 185510, oWinRewards: 20028, drawRewards: 60161} // => 72.0
{i: 2, j: 2, xWinRewards:   1390, oWinRewards:  1910, drawRewards:   379} // => 38.3
{i: 0, j: 0, xWinRewards:   1169, oWinRewards:  1526, drawRewards:   447} // => 38.1
{i: 2, j: 0, xWinRewards:   1291, oWinRewards:  2068, drawRewards:   419} // => 34.7
{i: 0, j: 2, xWinRewards:   1191, oWinRewards:  1920, drawRewards:   495} // => 33.9
{i: 1, j: 2, xWinRewards:   1155, oWinRewards:  2300, drawRewards:   402} // => 30.4
{i: 0, j: 1, xWinRewards:   1122, oWinRewards:  2372, drawRewards:   400} // => 29.2
{i: 1, j: 0, xWinRewards:    844, oWinRewards:  1678, drawRewards:   491} // => 29.0
{i: 2, j: 1, xWinRewards:   1079, oWinRewards:  2292, drawRewards:   474} // => 28.7
The reward that is shown on the right of each line, which is used when choosing the next move to make, is calculated as follows:
winRewards = player === X ? possibleMove.xWinRewards : possibleMove.oWinRewards;
drawRewards = possibleMove.drawRewards;
lossRewards = player === X ? possibleMove.oWinRewards : possibleMove.xWinRewards;
total = winRewards + drawRewards + lossRewards;
finalReward = (100*(winRewards/total)) + (10*(drawRewards/total)) + (-1*(lossRewards/total));

Remember, reward is calculated relatively rather than absolutely, and rewards given at the end of the game are higher if the game is shorter. Note that the above calculation also weights the rewards, so that the algorithm prefers winning over drawing, and drawing over losing.

If you were wondering whether or not the centre is indeed the best opening move, please read my previous article which shows that contrary to common belief, the centre is a better opening move than a corner. Wikipedia, xkcd and Google all get it wrong.

2) Does it know how to respond to an opening move in the centre?

Here are the rewards for the next moves after an opening in the centre (pattern "---|-X-|---|"):
{i: 2, j: 0, xWinRewards: 26920, oWinRewards: 3712, drawRewards:  9333} // => 11.5
{i: 0, j: 0, xWinRewards: 67866, oWinRewards: 8622, drawRewards: 35873} // => 10.8
{i: 0, j: 2, xWinRewards: 21060, oWinRewards: 2616, drawRewards:  5856} // => 10.8
{i: 2, j: 2, xWinRewards: 21014, oWinRewards: 2574, drawRewards:  6730} // => 10.6
{i: 0, j: 1, xWinRewards: 12096, oWinRewards:  668, drawRewards:   603} // =>  5.4
{i: 2, j: 1, xWinRewards: 12502, oWinRewards:  664, drawRewards:   595} // =>  5.2
{i: 1, j: 2, xWinRewards: 11850, oWinRewards:  592, drawRewards:   582} // =>  4.9
{i: 1, j: 0, xWinRewards: 11994, oWinRewards:  580, drawRewards:   589} // =>  4.8

My previous article and other sources like Wikipedias tic-tac-toe article and Martin Gardners book entitled Hexaflexagons and Other Mathematical Diversions agree that the best counter to an opening move in the centre is a move to a corner. The results above agree because all four corners have higher rewards (again shown on the right side) than the edges.

3) Does it know how to win on the next move when it nearly has three in a row?

Yes, it knows how to win instantly, for example with the pattern "OX-|-X-|O--", which looks as follows, where the winning move is the middle of the bottom row (2,1):

  j=0 j=1 j=2
i=0 O X  
i=1   X  
i=2 O    

The rewards for that pattern are:
{i: 2, j: 1, xWinRewards: 7104, oWinRewards:  0, drawRewards: 0} // => 100.0
{i: 1, j: 0, xWinRewards:  224, oWinRewards:  0, drawRewards: 2} // =>  99.2
{i: 1, j: 2, xWinRewards:   16, oWinRewards:  8, drawRewards: 0} // =>  66.3
{i: 0, j: 2, xWinRewards:    0, oWinRewards: 48, drawRewards: 0} // =>  -1.0
{i: 2, j: 2, xWinRewards:    0, oWinRewards: 56, drawRewards: 0} // =>  -1.0
So indeed, it knows the reward is highest for (2,1) and will move there to win. It also knows that it doesn't need to block the opponent's imminent win on the left edge (1, 0).

4) Does it know how to block the opponent from winning on their next turn?

Consider the pattern "XO-|-O-|--X" which looks like this:

  j=0 j=1 j=2
i=0 X O  
i=1   O  
i=2     X

It's Xs turn and the algorithm would need to block the bottom middle (2, 1) in order to stop O winning on their next move. Here are the rewards for that pattern:
1: {i: 2, j: 1, xWinRewards: 0, oWinRewards:  4, drawRewards: 4} // =>  4.5
0: {i: 0, j: 2, xWinRewards: 0, oWinRewards: 32, drawRewards: 0} // => -1.0
2: {i: 2, j: 0, xWinRewards: 0, oWinRewards: 32, drawRewards: 0} // => -1.0
3: {i: 1, j: 2, xWinRewards: 0, oWinRewards: 16, drawRewards: 0} // => -1.0
4: {i: 1, j: 0, xWinRewards: 0, oWinRewards: 32, drawRewards: 0} // => -1.0

The algorithm knows that doing a move at (2,1) is best and avoids the loss. It also knows that X has never won from here and can only draw by stopping the win for O.

5) Does it know about the strategy on Wikipedias tic-tac-toe page which shows how O can best counter opening moves by X?

Yes! The algorithm is clever! But not clever enough to read Wikipedia 😀
It's simply learned through playing, that these moves give the highest rewards. Here is what Wikipedia says:
Player O must always respond to a corner opening with a center mark,...
Try it out on the interactive board at the top of this article! The algorithm does this in three out of the four corner openings, strangely not when starting in top right. I guess it needs a little more practice on that opening move.
...and to a center opening with a corner mark.
Yes, the algorithm does this.
An edge opening must be answered either with a center mark, a corner mark next to the X, or an edge mark opposite the X.
For the bottom and right edges it responds by taking the centre cell. For the left edge it responds by taking the bottom left corner, i.e. next to starting position. For the top, it fails - I guess it needs some more practise here too.

Since we are pretty much able to answer "yes" to these five questions, I claim that it has indeed learned how to play tic-tac-toe. If you like that and think it's impressive, read on, because it gets better.

If you look at the interactive board at the top of this article, you will see a line which says "Unique games seen: 4330". In my previous article I confirmed that there are 255,168 unique games, of which only 29,600 are not classed as "stupid" because of missed wins or blocks. So if there are so many games, why has the algorithm only encountered less than 2% of them and why is it still able to play so well and why has it learned as shown above?

We can get a list of all patterns where the algorithm hasn't experienced every possible next move, by opening the Javascript console for the interactive board (right click on the board and select "inspect", in Chrome), and running the following snippet:
    let len = k.split('').filter(function(l){ return l === 'X' || l === O;}).length;
    if(model.patterns[k] && model.patterns[k].possibleNextMoves &&
        (model.patterns[k].possibleNextMoves.length + len !== 9)) {

2927 patterns have missing solutions, i.e. the large majority of them. Let's randomly take one and have a look, e.g. "X--|OXO|---":

  j=0 j=1 j=2
i=0 X    
i=1 O X O

The algorithm has only experienced one move from there, although there are actually 5.
{i: 0, j: 1, xWinRewards: 8, oWinRewards: 0, drawRewards: 0}
The move to (0, 1) is actually a "stupid" move, because it misses the immediate win which it would get if it moved to (2, 2). But if we take a step back, Os last move was also "stupid" as it should have blocked that potential win. If we look at the rewards for the two patterns where O makes its second move to take the left or right edge, we see the following.

Pattern "X--|OX-|---":
{i: 2, j: 2, xWinRewards: 98, oWinRewards: 12, drawRewards: 21} // => 10.7
{i: 2, j: 0, xWinRewards: 16, oWinRewards:  0, drawRewards:  0} // =>  0.0
{i: 1, j: 2, xWinRewards:  8, oWinRewards:  0, drawRewards:  0} // =>  0.0
Pattern "X--|-XO|---":
{i: 2, j: 2, xWinRewards: 164, oWinRewards: 148, drawRewards: 4} // => 46.5
The algorithm has learned to block the potential win by X, and so it will never ever go down the path which resulted in the above game again. Moreover, if we try and recreate this pattern using the interactive board at the top of this article, we find that we can't because the algorithm has learned much better moves. If the algorithm starts, taking the centre, and a human follows by taking the left edge, the algorithm doesn't take the top left corner, because by taking the top edge it can force either an immediate win or a fork and so it wins in all cases. Equally, if a human starts as X and takes the top left corner, the algorithm will take the centre because it is a better defence, and so the board layout above will never happen. Finally, if a human starts in the centre, the algorithm will take the top right corner rather than an edge, because taking the corner is a better defence to an opening in the centre than an edge is. The algorithm only ever came across the pattern above during the exploration phase of its learning cycle. It has found the strongest moves starting from the opening and as such it doesn't need to bother learning about games where it ends up in a weaker position. Humans do the same if we think about chess - games where we move say the king with our second move don't make sense so we don't keep them in our toolbox of best moves. I personally find it quite impressive how the algorithm has homed in on optimal games and as such reduced its need to know every possible game. Originally I questioned whether or not this type of machine learning was "brute force" because it needs to play so many games before becoming strong. I assumed it needed to play many many unique games, and that after "learning" it was just going to use its massive memory to work out what to do. But it isn't actually playing that many games, it just takes time to work out which of the 4,330 games that it knows about are the best ones. If it needed to encounter all 29,600 games which don't involve stupid moves, or indeed more than that number, then I think one could claim that the algorithm uses brute force in order to become strong. But because that isn't the case, I am confident in saying that this algorithm isn't using brute force to learn. Having done the analysis above, I am a lot more impressed by the algorithm than I was the first time it beat me, which was quite an impressive moment in itself.

So if I am so impressed, am I scared of AI? Not really. It took me a long time to tune this algorithm so that it played reasonably and there are many dimensions which can be tuned:
  • Exploration: how long should it explore for and when should it no longer expore? If it explores too long, it just learns to play randomly. If it doesn't explore enough, my teenage son can easily beat it when he uses some unexpected moves.
  • Rewards: the question is really "what is the best move", and defining what "best" even means can be tricky - see my previous article. In the case of this algorithm, the rewards are two dimensional, because not only do they depend on whether the game is won/drawn/lost, but also how quickly, so that it learns to not miss immediate wins.
  • Experience: how many games does it need to play until it becomes good? I wasn't able to answer "yes" to my five questions until it had played 100,000 games.
Tuning those dimensions is something that takes quite a lot of time and I would say is something that still needs a human to do. As with all uses of software, one needs to work out if there is a business case for it. I must have spent 5-20 hours a week on this project over three months, let's call it a round 20 man days. I'm not sure I'd find someone who would have paid me to write this algorithm because who really cares that much about playing a futile game which will always end in a draw when played by experienced players? My current customer is introducing an algorithm which takes the description of the insurance case being created and based on that text it preselects some fields in the form. It's 80% accurate, it doesn't use reinforcement learning, more likely pattern recognition with neural networks. If it saves each user 10 seconds per insurance case, that's over a man year of work saved every year. Designing, implementing and maintaining the algorithm certainly took less time, so there is a real business case where there is a positive return on investment. As long as companies reinvest this return, rather than laying people off, I have no problem with it. Most of my career has been based on building software to increase productivity and I don't currently believe that AI will increase this productivity by orders of magnitude greater than what we have been achieving without AI and just good old fashion software.

Just a couple of closing remarks now:
  • The algorithm presented here learns by playing against itself. I did play around with training it against more sophisticated players to start with, and it seemed to improve the rate at which it learned. But I find it quite cool that I was able to remove that and get it to learn against itself. It's effectively taking the algorithm with no knowledge, throwing it into a dark room and saying "hey, go play, and I'll tell you the result of each game, you don't need to know the rules, you can learn them along the way". It learns the rules and the strategies to become a strong player!
  • The algorithm can be classed as reinforcement learning. But what about it's subclass, for example is it a q-learning algorithm? Not quite.
  • The title of this article could have been "Using *Deep* Reinforcement Learning To Learn To Play Tic-Tac-Toe", which would have required using a neural network to recognise patterns, rather than just matching known board patterns exactly. I played with this a little but couldn't train a network which worked at all. I believe that the problem is that unlike matching letters or images, similar looking patterns in tic-tac-toe or other games like chess, can actually have exact opposite outcomes, one leading to a win and the other leading to a loss. Imagine a game of chess where a king is moved by just one position so that the opponent could put it into checkmate - the patterns would be very similar, the outcomes very different. This is a different problem to classifying hand written letters. But I am really a novice when it comes to neural networks, so it's best if I don't comment further.
The complete code for this project can be viewed here: http://github.com/maxant/tictactoe/.

What are the next steps? I have spent a lot of time on this little project. I've learned a lot. One of my original thoughts was that a true test would be to introduce a new game, say Three Men's Morris and see if the same algorithm could learn to play that too. One day... but first it's time for something different...

Finally a few links that I used along the way: Copyright ©2018, Ant Kutschera
Social Bookmarks :  Add this post to Slashdot    Add this post to Digg    Add this post to Reddit    Add this post to Delicious    Add this post to Stumble it    Add this post to Google    Add this post to Technorati    Add this post to Bloglines    Add this post to Facebook    Add this post to Furl    Add this post to Windows Live    Add this post to Yahoo!

The best opening move in a game of tic-tac-toe

As part of a machine learning project, I had to understand tic-tac-toe better, and so I have written an algorithm which a) finds all the possible unique games and b) gathers statistical information about those games.

Based on Wikipedia's tic-tac-toe article, consider a board with the nine positions numbered as follows:

  j=0 j=1 j=2
i=0 1 2 3
i=1 4 5 6
i=2 7 8 9

Assume X always starts. As an example, take the game where X moves top left, followed by O moving top right, then X going middle left followed by O going top middle. These first four moves can be written down as "1342". The game could continue and once completed could be written as "134258769". It's not a perfect game because the first player misses a few opportunities to win and in the end it's a draw.

Every possible combination of moves making up unique games of tic-tac-toe are hence found somewhere between the numbers 123456789 and 999999999 (although probably iterating up to 987654321 suffices). Most of the numbers are illegitimate because each cell is only allowed to be filled once, so for example the number 222222222 does not represent a valid combination. In order to find every valid combination we simply start with that lowest number and iterate up to nine nines, attempting to determine if each number is a valid combination and if it is, record the results of the game.

In order to determine if a combination is valid, we use the following Javascript snippet, which checks that the nine digit number does not contains zeros and contains all of the digits 1 to 9:
//gameAsString contains the current number, e.g. '123456798'
let isValidGame = true;
for(idx = 1; idx < 10; idx++) {
    if(gameAsString.indexOf('' + idx) === -1 //only valid if every digit from 1-9 is found
       || gameAsString.charAt(idx-1) === '0') //only 1-9 are valid
        isValidGame = false;
If a number does represent a valid combination, the next step is to iterate through each digit starting at the left, to work out when the game is finished (it can be won after just 5 moves, or it can require all 9 moves, ending in a win or a draw). For example numbers starting with '12437' represent combinations completed after just 5 moves. Although there are 9999 such valid combinations, the game is only recorded once and all other combinations of numbers which start with 12437 are discarded.

A game is analysed using the following algorithm. The empty board is built by creating a two dimensional array of objects using indexes i and j as shown above. An element in an array of cells represents a cell on the board and has attributes 'i', 'j' and 'v' which records the contents of the cell, either null, 'X' or 'O', depending on whether a player has moved in that cell.
let board = buildEmtpyBoard();
let moves = '';
for(idx = 0; idx < gameAsString.length; idx++){
    let c = gameAsString[idx];
    moves += c; //initially e.g. '1', then '12', etc.
    let coords = convertIndexToCoordinates(parseInt(c)); //turns e.g. '3' into [0,2]
        throw new Error("cell " + i + "," + j + " is already selected by " +
    let player = idx % 2 === 0 ? X : O;
    let rival = player === X ? O : X;
    board[coords[0]][coords[1]].v = player;

    let winner = checkFinished(board);
        let o = uniqueGames[moves];

            //it doesnt exist. update stats
            updateStats(gameAsString, winner, idx, relevantStats);
            uniqueGames[moves] = {}; //record this game so it isn't harvested again

The function checkFinished(board) simply checks whether or not the game is over (a draw, or a player has three in a row), and returns 'DRAW', 'X' or 'O' depending upon the outcome of the game. 'uniqueGames' is an object used to track whether or not the current game has already been seen. Using the example above the combination '12437' will be seen 9999 times, but to all intents and purposes it is just one game of tic-tac-toe so we discard all subsequent sightings of it. If the game is finished and has not yet been seen, the statistics are updated.

The full algorithm is available on GitHub and you can run it live in your browser by clicking on this link: stats.html, which takes a few minutes to run.

That page records the outcome of the game from the perspective of the first move. "cornerXWins5" shows how many times X won games that were only 5 moves long by starting in a corner. As part of updating statistics, the results are grouped into games which are:
  • "stupid" - the player has missed winning in the current move or missed stopping the opponent from winning in the next move; grouped under "stupid stats"
  • won with a "fork" - the winning player had more than one position to move on the board in order to win, when they won; grouped under "fork stats"
  • all other games; grouped under "stats"
The functions used to categorise the games are determineIfStupid and determineIfWinningWithFork.

First of all, notice how there are 255,168 unique games (shown near the bottom of the page after computation is complete). This correlates with other results.

The following table is copied from that page and consolidates these raw results to show how often the starting player can win when opening in the relevant location.

14652 14232 14652
14232 15648 14232
14652 14232 14652

For example, the first player wins in 14652 of the 255168 unique games if they start in the top left. This number comes from adding the number of corner wins after 5, 7 and 9 moves (X doesn't win after 6 or 8 moves, rather O does), for each group of game (0+992+1344 from "stats", 0+1984+0 from "fork stats" and 720+19776+33792 from "stupid stats"), and importantly dividing by four, because there are four corners - they are shared out over the four corners of the table above. Results for the edges are also divided by four, but since there is only one centre cell the result is left undivided.

This table demonstrates that the best location to start is in fact the centre of the board, because starting there wins the most amount of games. This is confirmed by Azahar Machwes blog article Tic Tac Toe: Statistical Analysis with arbitrary Grid Size (albeit for different reasons than he gives) which states:
For the first movers the best starting square by far is 5 (right in the middle and opens up 4 winning lines – most of any starting square, all other squares open up to 3!). Second best option then becomes the corners (1, 3, 7, 9).
A similar result is shown when considering how often the first player loses after they start in one of the following locations (this is done by consolidating results of games that end after 6 or 8 moves, i.e. when the second player wins, beating the first player):

 7896 10176  7896
10176  5616 10176
 7896 10176  7896

For example, the first player loses in 10176 of the 255168 unique games when they start in the middle of the top row. This also demonstrates that the opening move for the first player, which leads to the least number of games lost, is also the centre. Finally we can consider how many games are drawn when the first player opens in the relevant cell:

5184 5184 5184
5184 4608 5184
5184 5184 5184

The first player draws least when opening in the centre.

As such, the centre is the best position for the first player to start, as it leads to a) the most number of games won and b) the least number of draws and c) the least number of games lost.

But because we are considering every possible game, this statement does depend upon the players being random in their moves and not necessarily moving so that they win as soon as possible or block an opponent from winning on their next turn. For that reason, the results have been grouped so that we can also analyse what happens when a more clever player plays. Let us disregard all "stupid" games where a move to win instantly is missed or a move to block the opponent winning next is missed. We are now considering a more "perfect player". They are only "more" perfect, rather than exactly "perfect", because the algorithm used to ignore stupid games works as follows. First it checks to see if the player is missing a chance to win immediately. If they are, they count as stupid. Second, it checks to see if the opponent could win in their next move, and if they can, and the current player does not block them, they count as stupid too. Now consider the game "1234567", as it has some special properties. After "12345" the board looks like this:

O X  

O, the second player, needs to move to position 7 or 9 in order to attempt to block the fork created when X moved to position 5. Of course any move which O makes is futile, and as such, the move to 6 which O is about to make does not count as stupid, because there is nothing they can do at this stage to avoid losing. Perhaps Os move to position 4 should have counted as stupid, because it didn't block X from getting a fork in their next move. The algorithm doesn't go that far, and as such this game is marked as being non-stupid and won with a fork by X in 7 moves. Since it isn't as good as it could be, I describe the players as more perfect rather than perfect.

So, when considering these more perfect players, the number of games won by the first player moving to the relevant cell then becomes:

1080 1240 1080
1240  480 1240
1080 1240 1080

The centre no longer leads to the most number of games won, rather starting on an edge improves the chances of a victory. The number of games lost by the first player moving to the relevant cell is:

304 608 304
608 128 608
304 608 304

The centre is still the starting position which results in the least number of losses for the first player. Although the edge has become the starting position which results in the most number of wins (see above), it is also the starting position resulting in the most number of games lost. Draw statistics for these more perfect players look like this:

1668 2144 1668
2144  816 2144
1668 2144 1668

Starting in the centre still results in the least number of draws. The results of the last three tables are a little conflicting - is it best to start in the centre and avoid a draw or start on an edge which leads to most victories but also to most losses? One way of answering this question is to consolidate the three tables by combining the results using a "reward function" - a formula to sum the individual results which can be weighted if required.

Imagine writing an artificial intelligence algorithm which had to determine the best cell to move to. It might be trained to favour a win over a draw and a draw over a loss. The reward function could be:
reward = (100*oddsOfWinning) + (10*oddsOfDrawing) + (-1*oddsOfLosing)
The table of rewards for perfect play, using the above weightings, is:

1,509,144 1,464,864 1,509,144
1,464,864 1,605,264 1,464,864
1,509,144 1,464,864 1,509,144

The result in this case is that a first move to the centre is still the best. Even if we change the weightings, it isn't possible to tune the reward function so that the corner becomes the best opening move.

Maybe it depends on how "best" is defined. The popular webcomic xkcd posted a complete map of optimal tic-tac-toe moves, perhaps in 2010, which is apparently based on the work titled "Fractal Images of Formal Systems", by Paul St. Denis and Patrick Grim, which shows a more complete map, because it does not show just the top left corner where X starts in the top left. For some reason, only the top left corner of their image was reproduced by xkcd. A few people have asked why this is on the xkcd forum and the best explanation I could find is from someone using the name Algrokoz, who states:
Opening in the corner is optimal because of the number of chances it gives O to not play optimally, not because of total potential win scenarios.
That means a corner opening gives the opponent less places to go in order to play optimally, whatever "optimally" means. Assuming that the best counter to an opening in a corner is the centre, and assuming that the best counter to an opening in the centre is a corner, then yes, one could argue that there are four corners and just one centre, so it is better to start in a corner, because a random opponent has a one in eight chance of finding the best counter move. Compare this to opening in the centre where a random opponent has a four in eight chance of finding the best counter move. We will examine this logic in a bit...

Martin Gardner, an American popular mathematics and science writer wrote the following in the 1988 edition of his book Hexaflexagons and Other Mathematical Diversions
If played "rationally" by both sides, the game must end in a draw. The only chance of winning is to catch an unwary opponent in a "trap" where a row can be scored on the next move in two ways, only one of which can be blocked. Of the three possible opening plays- a corner, the center or a side box- the strongest opening is the corner, because the opponent can avoid being trapped at the next move only by one of the eight possible choices: the center. Conversely, center opening traps can be blocked only by seizing a corner.
The "traps" which he is talking about are what Wikipedia and I refer to as forks - plays which result in a winning opportunity in more than one cell, forcing the opponent to lose. In his statement Gardner is saying that the only place the opponent can avoid a fork, after an opening in a corner, is the centre. If we analyse just games that end in a fork, and search for what happens after two moves we find that when X starts in a corner and O takes the centre, there are still 24 games that X can win with a fork, so I am not sure that Gardners statement is correct. However of all the other second moves, countering with the centre results in the least amount of possible forks later in the game (this is easy to calculate by grouping game results by their two opening moves, see the "second move analysis" shown on the statistics page after the computation completes).

His logic however does seem to be similar to that of the xkcd comment. Interestingly, if we consider what happens when X starts in the centre and O takes a corner, there are also 24 forks by which X can win the game, and this too is the best counter move from the point of view of forks.

If we consider only games won by a fork (see the "fork stats" section on the page which calculates them), then we find that starting in a corner results in 496 (1984/4) forks, compared to only 320 when starting in the centre. Finally we have a situation where it is indeed better to start in the corner.

But is it correct to just consider games that end in a fork? I don't believe it is possible to force a player into a situation where they will be confronted with a fork. Think about the game presented earlier, "1234567", where a fork could have been avoided by O moving to the centre (5) instead of 4. If a player is clever enough to avoid a fork then the game will certainly end in a draw. And so we are talking about players who know how to avoid losing from the start. There is no "best" place to go in order to avoid a draw, when perfect players are playing - Wikipedia and my own analysis shows that when perfect players play, the only possible result is a draw. An optimal opening position can surely only exist when playing a player who might make a mistake or a move which is less than ideal. In that case it makes no sense to analyse only games which end in a fork and we end up having to also consider games that end otherwise. As we have already seen, a corner opening is not the best.

Let's return quickly to the argument about the number of places the second player can move to. Maybe Gardners argument is the same, that in total, because there are four corners and just one centre, there are more moves available to the opponent to avoid a fork, when opening in the centre. But in the same quote he has reduced the number of opening moves to just three, because of board symmetries, so such a conclusion wouldn't be logical.

The problem with Gardners statement (and the graphic published by xkcd) is that the Wikipedia tic-tac-toe page uses them to make the following claim:
The first player, who shall be designated "X", has 3 possible positions to mark during the first turn... Player X can win or force a draw from any of these starting marks; however, playing the corner gives the opponent the smallest choice of squares which must be played to avoid losing[16]. This makes the corner the best opening move for X, when the opponent is not a perfect player.
Reference 16 is the Gardner book that I reference above. The last sentence in the Wikipedia quote is problematic because of the phrase "when the opponent is not a perfect player". As argued above, if players are not perfect, we need to consider at least games ending in a fork and otherwise, but probably also some games which involve stupid moves. As shown in this article, an opening in the centre is better than one in a corner.

Why is all of this so important? I've been working on training a machine learning algorithm to play tic-tac-toe and I was surprised how it learned to open in the centre rather than a corner. I assumed Wikipedia was correct... As a result of this study I shall now endevour to get the Wikipedia page corrected, before a generation of children wrongly learn that opening in a corner is the best move!

Copyright ©2018, Ant Kutschera
Social Bookmarks :  Add this post to Slashdot    Add this post to Digg    Add this post to Reddit    Add this post to Delicious    Add this post to Stumble it    Add this post to Google    Add this post to Technorati    Add this post to Bloglines    Add this post to Facebook    Add this post to Furl    Add this post to Windows Live    Add this post to Yahoo!

Java problem with mutual TLS authentication when using incoming and outgoing connections simultaneously

In most enterprise environments some form of secure communication (e.g. TLS or SSL) is used in connections between applications. In some environments mutual (two-way) authentication is also a non-functional requirement. This is sometimes referred to as two-way SSL or mutual TLS authentication. So as well as the server presenting it's certificate, it requests that the client send it's certificate so that it can then be used to authenticate the caller.

A partner of my current client has been developing a server which receives data over MQTT and because the data is quite sensitive the customer decided that the data should be secured using mutual TLS authentication. Additionally, the customer requires that when the aggregated data which this server collects is posted to further downstream services, it is also done using mutual TLS authentication. This server needs to present a server certificate to its callers so that they can verify the hostname and identity, but additionally it must present a client certificate with a valid user ID to the downstream server when requested to do so during the SSL handshake.

The initial idea was to implement this using the standard JVM system properties for configuring a keystore: "-Djavax.net.ssl.keyStore=...", i.e. putting both client and server certificates into the single keystore. We soon realised however that this doesn't work, and tracing the SSL debug logs showed that the server was presenting the wrong certificate, either during the incoming SSL handshake or the outgoing SSL handshake. During the incoming handshake it should present its server certificate. During the outgoing handshake it should present its client certificate.

The following log extracts have been annotated and show the problems:

Upon further investigation it became clear that the problem is related to the default key manager implementation in the JVM. The SunX509KeyManagerImpl class is used for selecting the certificate which the JVM should present during the handshake, and for both client certificate and server certificate selection, the code simply takes the first certificate it finds:

    String[] aliases = getXYZAliases(keyTypes[i], issuers);
    if ((aliases != null) && (aliases.length > 0)) {
        return aliases[0];  <========== NEEDS TO BE MORE SELECTIVE

The aliases returned by the method on the first line simply match key types (e.g. DSA) and optional issuers. So in the case where the keystore contains two or more certificates, this isn't selective enough. Furthermore, the order of the list is based on iterating over a HashMap entry set, so the order is not say alphabetical, but it is deterministic and constant. So while searching for the server certificate, the algorithm might return the client certificate. If however that part works, the algorithm will then fail when the server makes the downstream connection and needs to present its client certificate, as again the first certificate will be presented, namely the server certificate. As such, because it is impossible to create concurrent incoming and outgoing two-way SSL connections, I have filed a bug with Oracle (internal review ID 9052786 reported to Oracle on 20180225, now officially JDK-8199440).

One solution is to use two keystores, one for each certificate as demonstrated here.

A possible patch for the JVM would be to make the algorithm more selective by using the "extended key usage" certificate extensions. Basically the above code could be enhanced to additionally check the extended key usage and make a more informed decision during alias selection, for example:

String[] aliases = getXYZAliases(keyTypes[i], issuers);
if ((aliases != null) && (aliases.length > 0)) {
    String alias = selectAliasBasedOnExtendedKeyUsage(aliases, "");  //TODO replace with constant
    if (alias != null) return alias;

    //default as implemented in openjdk
    return aliases[0];

The method to select the alias would then be as follows:

private String selectAliasBasedOnExtendedKeyUsage(String[] aliases, String targetExtendedKeyUsage) {
    for(String alias : aliases){
        //assume cert in index 0 is the lowest one in the chain, and check its EKU
        X509Certificate certificate = this.credentialsMap.get(alias).certificates[0];
        List ekus = certificate.getExtendedKeyUsage();         for (String eku : ekus) {             if(eku.equals(targetExtendedKeyUsage)){                 return alias;             }         }     }     return null; } 

More details including a fully running example and unit tests are available here.

Copyright ©2018, Ant Kutschera

Tags : ,
Social Bookmarks :  Add this post to Slashdot    Add this post to Digg    Add this post to Reddit    Add this post to Delicious    Add this post to Stumble it    Add this post to Google    Add this post to Technorati    Add this post to Bloglines    Add this post to Facebook    Add this post to Furl    Add this post to Windows Live    Add this post to Yahoo!

Revisiting Global Data Consistency in Distributed (Microservice) Architectures

Back in 2015 I wrote a couple of articles about how you can piggyback a standard Java EE Transaction Manager to get data consistency across distributed services (here is the original article and here is an article about doing it with Spring Boot, Tomcat or Jetty).

Last year I was fortunate enough to work on a small project where we questioned data consistency from the ground up. Our conclusion was that there is another way of getting data consistency guarantees, one that I had not considered in another article that I wrote about patterns for binding resources into transactions. This other solution is to change the architecture from a synchronous one to an asynchronous one. The basic idea is to save business data together with "commands" within a single database transaction. Commands are simply facts that other systems still need to be called. By reducing the number of concurrent transactions to just one, it is possible to guarantee that data will never be lost. Commands which have been committed are then executed as soon as possible and it is the command execution (in a new transaction) which then makes calls to remote systems. Effectively it is an implementation of the BASE consistency model, because from a global point of view, data is only eventually consistent.

Imagine the situation where updating an insurance case should result in creating a task in a workflow system so that a person gets a reminder to do something, for example write to the customer. The code to handle a request to update an insurance case might look like this:
    EntityManager em;

    public void updateCase(Case case) {
        case = em.merge(case);

            long taskId = taskService
                            "Write to customer...");
The call to the task service results in a remote call to the task application, which is a microservice responsible for workflow and human tasks (work that needs to be done by a human).

There are two problems with our service as described above. First of all, imagine that the task application is offline at the time of the call. That reduces the availability of our application. For every additional remote application that our application connects to, there is a reduction in availability of our system. Imagine one of those application has an allowed downtime of 4 hours per month and a second application has one of 8 hours. That could cause our application to be offline for 12 hours per month, in addition to our own downtimes, since there is never a guarantee that the downtimes will occur at the same time.

The second problem with the service design above, comes when there is a problem committing the data to the database after the call to the task application is made. The code above uses JPA which may choose to flush the SQL statements generated by the call to the merge method or the updates to the entity, at some time after those calls, and at latest at commit time. That means a database error could occur after the call to the task application. The database call might even fail for other reasons such as the network not being available. So conceptually we have the problem that we might have created a task asking an employee to send a letter to the customer, but it wasn't possible to update the case, so the employee might not even have the information necessary to write the letter.

If the task application were transaction aware, i.e. capable of being bound into a transaction so that the transaction manager in our application could deal with the remote commit/rollback, it would certainly help to avoid the second problem described above (data consistency). But the increase in downtime wouldn't be handled.

Changing the architecture so that the call to the task application occurs asynchronously will however solve both of those problems. Note that I am not talking about simple asynchronous method invocation but instead I am talking about calling the task application after our application commits the database transaction. It is only at that point that we have a guarantee that no data will be lost. We can then attempt the remote call as often as is necessary until the task is created successfully. At that stage global data is consistent. Being able to retry failed attempts means that the system as a whole becomes more reliable and our downtime is reduced. Note that I am also not talking about non-blocking methods which are often referred to as being asynchronous.

To make this work, I have created a simple library which requires the developer to do two things. More information about the rudimentary implementation used in the demo application is available here. First of all, the developer needs to make a call to the CommandService, passing in the data which is required when the actual command is executed. Secondly, the developer needs to provide an implementation of the command, which the framework will execute. The first part looks like this:
public class TaskService {

    CommandService commandService;

    /** will create a command which causes a task to be
     *  created in the task app, asynchronously, but robustly. */
    public void createTask(long caseNr, String textForTask) {
        String context = createContext(caseNr, textForTask);

        Command command = new Command(CreateTaskCommand.NAME, context);


    private String createContext(long nr, String textForTask) {
        //TODO use object mapper rather than build string ourselves...
        return "{\"caseNr\": " + nr + ", \"textForTask\": \"" + textForTask + "\"}";
The command service shown here takes a command object which contains two pieces of information: the name of the command and a JSON string containing data which the command will need. A more mature implementation which I have written for my customer takes an object as input rather than a JSON string, and the API uses generics.

The command implementation supplied by the developer looks as follows:
public class CreateTaskCommand implements ExecutableCommand {

    public static final String NAME = "CreateTask";

    public void execute(String idempotencyId, JsonNode context) {
        long caseNr = context.get("caseNr").longValue();


    public String getName() { return NAME; }
The execute method of the command is where the developer implements the stuff which needs to be done. I haven't shown the code used to call the task application since it isn't really relevant here, it's just an HTTP call.

The interesting part of such an asynchronous design isn't in the above two listings, rather in the framework code which ensures that the command is executed. The algorithm is a lot more complicated than you might first think because it has to be able to deal with failures, which causes it to also have to deal with locking. When the call to the command service is made, the following happens:
  • The command is persisted to the database
  • A CDI event is fired
  • When the application commits the transaction, the framework is called since it observes the transaction success
  • The framework "reserves" the command in the database, so that multiple instances of the application wouldn't attempt to execute the same command at the same time
  • The framework uses an asynchronous EJB call to execute the command
  • Executing the command works by using the container to search for implementations of the ExecutableCommand interface and using any which have the name saved in the command
  • All matching commands are executed by calling their execute method, passing them the input that was saved in the database
  • Successfully executed commands are removed from the database
  • Commands which fail are updated in the database, so that the number of execution attempts is incremented
As well as that fairly complex algorithm, the framework also needs to do some house keeping:
  • Periodically check to see if there are commands which need to be executed. Criteria are:
    • The command has failed, but has not been attempted more than 5 times
    • The command is not currently being executed
    • The command is not hanging
    • (a more complex implementation might also restrict how quickly the retry is attempted, for example after a minute, two minutes, then four, etc.)
  • Periodically check to see if there are commands which are hanging, and unlock them so that they will be reattempted
Commands might hang if for example the application crashes during execution. So as you can see, the solution isn't trivial and as such belongs in framework code, so that the wheel doesn't keep getting invented. Unfortunately the implementation very much depends on the environment in which it is supposed to run and so that makes writing a portable library very difficult (which is why I have not done more than publishing the classes in the commands package of the demo application). Interestingly it even depends on the database being used because for example select for update isn't properly supported by Hibernate when used with Oracle. For completions sake, commands which fail 5 times should be monitored so that an administrator can resolve the problem and update the commands so that they are reattempted.

The right question at this stage is whether or not changing the architecture to an asynchronous one is the best solution? On the surface it certainly looks as though it solves all our data consistency problems. But in reality there are a few things that need to be considered in detail. Here are a few examples.

A) Imagine that after updating the insurance case, the user wants to close it, and part of the business rules dictating whether or not a case may be closed includes checking whether any tasks are incomplete. The best place to check whether any tasks are incomplete is the task application! So the developer adds a few lines of code to call it. At this stage it already gets complicated, because should the developer make a synchronous call to the task application, or use a command? Advice is given below, and for simplicity, let's assume the call is made synchronously. But what if three seconds ago, the task application was down and so an incomplete command is still in our database, which when executed will create a task. If we just relied on the task application, we'd close the case and at the next attempt to execute the incomplete command we'd save the task even though the case is already closed. It get's messy, because we'd have to build extra logic to re-open the case when a user clicks the task to deal with it. A more proper solution would be to first ask the task application and then check commands in our database. Even then, because commands are executed asynchronously, we could end up with timing issues, where we miss something. The general problem that we have here is one of ordering. It is well known that eventually consistent systems suffer from ordering problems and can require extra compensatory mechanisms, like the one described above where the case gets reopened. These kind of things can have quite complex impacts on the overall design, so be careful!

B) Imagine an event occurs in the system landscape which results in the case application being called in order to create an insurance case. Imagine then that a second event occurs which should cause that case to be updated. Imagine that the application wishing to create and update the case was implemented asynchronously using the commands framework. Finally, imagine that the case application was unavailable during the first event, so that the command to create the case stayed in the database in an incompleted state. What happens if the second command is executed before the first one, i.e. the case is updated before it even exists? Sure, we could design the case application to be smart and if the case doesn't exist, it simply creates it in the updated state. But what do we then do when the command to create the case is executed? Do we update it to its original state? That would be bad. Do we ignore the second command? That could be bad if some business logic depended on a delta, i.e. a change in the case. I have heard that systems like Elastic Search use timestamps in requests to decide if they were sent before the current state, and it ignores such calls. Do we create a second case? That might happen if we don't have idempotency under control, and that would also be bad. One could implement some kind of complex state machine for tracking commands and for example only allow the update command to be executed after the creation command. But that needs an extra place to store the update command until the creation command has been executed. So as you can see, ordering problems strike again!

C) When do we need to use commands, and when can we get away with synchronous calls to remote applications? The general rule appears to be that as soon as we need to access more than one resource in order to write to it we should use commands, if global data consistency is important to us. So if a certain call requires lots of data to be read from multiple remote applications, so that we can update our database, it isn't necessary to use commands, although it may be necessary to implement idempotency or for the caller to implement some kind of retry mechanism, or indeed use a command to call our system. If, on the other hand, we want to write to a remote application and our database in a consistent manner, then we need to use a command to call the remote application.

D) What do we do if we want to call multiple remote applications? If they all offer idempotent APIs, there doesn't appear to be a problem in calling them all from a single command. Otherwise it might be necessary to use one command per remote application call. If they need to be called in a certain order, it will be necessary that one command implementation creates the command that should be called next in the chain. A chain of commands reminds me of choreography. It might be easier or more maintainable to implement a business process as an orchestration. See here for more details.

E) Thread Local Storage (TLS) can cause headaches because commands are not executed on the same thread that creates the command. As such, mechanisms like the injection of @RequestScoped CDI beans also no longer work as you might expect. The normal Java EE rules which apply to @Asynchronous EJB calls also apply here, precisely because the framework code uses that mechanism in its implementation. If you need TLS or scoped beans then you should considering adding the data from such places into the input which is saved with the command in the database, and as soon as the command is executed, restore the state before calling any local service/bean which relies on it.

F) What do we do if the response from a remote application is required? Most of the time we call remote systems and need response data from them in order to continue processing. Sometimes it is possible to separate reads and writes, for example with CQRS. One solution is to break up the process into smaller steps, so that each time a remote system needs to be called it is handled by a new command, and that command not only makes the remote call, but also updates the local data when the response arrives. We have however noticed that if an optimistic locking strategy is in place it can result in errors when the user wants to persist changes that they have made to their data, which is now "stale" compared to the version in the database, even though they might only want to change certain attributes which the command did not change. One solution to this problem is to propagate events from the backend over a web socket to the client so that it can do a partial update to the attributes affected by the command, so that the user is still able to save their data later on. A different solution is to question why you need the response data. In the example above, I put the task ID into the case. That could be one way to track tasks relating to the case. A better way is to pass the case ID to the task application, and get it to store the case ID in the task. If you need a list of tasks related to the case, you query them using *your* ID, rather than tracking their ID. By doing this you eliminate the dependency on the response data (other than to check that the task is created without an error), and as such there is no need to update your data based upon the response from the remote application.

Hopefully I have been able to demonstrate that an asynchronous architecture using commands as described above offers a suitable alternative to the patterns for guaranteeing global data consistency, which I wrote about a few years ago.

Please note that after implementing the framework and applying it to several of our applications we learned that we are not the only ones to have such ideas. Although I have not read up about Eventuate Tram and its transactional commands, it appears to be very similar. It would be interesting to compare the implementations.

Finally, as well as commands, we added "events" on top of the commands. Events in this case are messages sent via JMS, Kafka, choose your favourite messaging system, in a consistent and guaranteed manner. Both sides, namely publication and consumption of the event is implemented as a command, which provides very good at-least-once delivery guarantees. Events inform 1..n applications in the landscape that something has happened, whereas commands tell a single remote application to do something. These, together with websocket technology and the ability to inform clients of asynchronous changes in the backend, complete the architecture required to guarantee global data consistency. Whether or not such an asynchronous architecture is better than say piggy backing a transaction manager in order to guarantee global data consistency, is something that I am still learning about. Both have their challenges, advantages and disadvantages. Probably, the best solution relies on a mix, as is normally the case with complex software system :-)

Copyright ©2018, Ant Kutschera
Social Bookmarks :  Add this post to Slashdot    Add this post to Digg    Add this post to Reddit    Add this post to Delicious    Add this post to Stumble it    Add this post to Google    Add this post to Technorati    Add this post to Bloglines    Add this post to Facebook    Add this post to Furl    Add this post to Windows Live    Add this post to Yahoo!