
Java problem with mutual TLS authentication when using incoming and outgoing connections simultaneously

In most enterprise environments some form of secure communication (e.g. TLS or SSL) is used in connections between applications. In some environments mutual (two-way) authentication is also a non-functional requirement, sometimes referred to as two-way SSL or mutual TLS authentication. So as well as the server presenting its certificate, it requests that the client send its certificate, which can then be used to authenticate the caller.

A partner of my current client has been developing a server which receives data over MQTT and because the data is quite sensitive the customer decided that the data should be secured using mutual TLS authentication. Additionally, the customer requires that when the aggregated data which this server collects is posted to further downstream services, it is also done using mutual TLS authentication. This server needs to present a server certificate to its callers so that they can verify the hostname and identity, but additionally it must present a client certificate with a valid user ID to the downstream server when requested to do so during the SSL handshake.

The initial idea was to implement this using the standard JVM system properties for configuring a keystore, e.g. "-Djavax.net.ssl.keyStore=...", i.e. putting both client and server certificates into a single keystore. We soon realised however that this doesn't work: tracing the SSL debug logs showed that the server was presenting the wrong certificate, either during the incoming SSL handshake or during the outgoing one. During the incoming handshake it should present its server certificate; during the outgoing handshake it should present its client certificate.

The following log extracts have been annotated and show the problems:

Upon further investigation it became clear that the problem is related to the default key manager implementation in the JVM. The SunX509KeyManagerImpl class is used for selecting the certificate which the JVM should present during the handshake, and for both client certificate and server certificate selection, the code simply takes the first certificate it finds:

    String[] aliases = getXYZAliases(keyTypes[i], issuers);
    if ((aliases != null) && (aliases.length > 0)) {
        return aliases[0];  <========== NEEDS TO BE MORE SELECTIVE

The aliases returned by the method on the first line simply match key types (e.g. DSA) and optional issuers. So in the case where the keystore contains two or more certificates, this isn't selective enough. Furthermore, the order of the list is based on iterating over a HashMap entry set, so the order is not, say, alphabetical, although it is deterministic and constant for a given keystore. So while searching for the server certificate, the algorithm might return the client certificate. Even if that part happens to work, the algorithm will then fail when the server makes the downstream connection and needs to present its client certificate, because again the first certificate found will be presented, namely the server certificate. Since this makes it impossible to create concurrent incoming and outgoing two-way SSL connections from a single keystore, I have filed a bug with Oracle (internal review ID 9052786, reported to Oracle on 20180225, now officially JDK-8199440).

One solution is to use two keystores, one for each certificate as demonstrated here.
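A minimal sketch of that two-keystore approach (the helper name and structure here are mine, not the code from the linked example): each direction of communication gets its own SSLContext, backed by the keystore containing only the certificate relevant to that direction.

```java
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class TwoKeyStores {

    // build an SSLContext backed by one specific keystore, so that the
    // incoming (server) and outgoing (client) connections can each be
    // given a keystore containing only the certificate they should present
    static SSLContext contextFor(KeyStore keyStore, char[] keyPassword) throws Exception {
        KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(keyStore, keyPassword);

        TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init((KeyStore) null); // null means: use the JVM's default trust store

        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);
        return ctx;
    }
}
```

The server socket is then created from the context built with the server keystore, while outgoing connections use the context built with the client keystore, so the default key manager in each context only ever finds one candidate certificate.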

A possible patch for the JVM would be to make the algorithm more selective by using the "extended key usage" certificate extensions. Basically the above code could be enhanced to additionally check the extended key usage and make a more informed decision during alias selection, for example:

String[] aliases = getXYZAliases(keyTypes[i], issuers);
if ((aliases != null) && (aliases.length > 0)) {
    String alias = selectAliasBasedOnExtendedKeyUsage(aliases, "");  //TODO replace with constant
    if (alias != null) return alias;

    //default as implemented in openjdk
    return aliases[0];
}

The method to select the alias would then be as follows:

private String selectAliasBasedOnExtendedKeyUsage(String[] aliases, String targetExtendedKeyUsage) {
    for (String alias : aliases) {
        //assume cert in index 0 is the lowest one in the chain, and check its EKU
        X509Certificate certificate = this.credentialsMap.get(alias).certificates[0];
        try {
            List<String> ekus = certificate.getExtendedKeyUsage();
            if (ekus == null) {
                continue; //certificate has no EKU extension
            }
            for (String eku : ekus) {
                if (eku.equals(targetExtendedKeyUsage)) {
                    return alias;
                }
            }
        } catch (CertificateParsingException e) {
            continue; //skip certificates whose extensions cannot be parsed
        }
    }
    return null;
}
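Independently of the JDK internals, the selection idea can be sketched with plain data structures (the class and method names here are mine, not those of a JDK patch; the OIDs are the standard id-kp-serverAuth and id-kp-clientAuth values defined in RFC 5280):

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class AliasSelector {

    // RFC 5280 extended key usage OIDs
    public static final String SERVER_AUTH = "1.3.6.1.5.5.7.3.1"; // id-kp-serverAuth
    public static final String CLIENT_AUTH = "1.3.6.1.5.5.7.3.2"; // id-kp-clientAuth

    /** returns the first alias whose end-entity certificate carries the
     *  wanted EKU, or null so that the caller can fall back to aliases[0].
     *  ekusByAlias maps each alias to the EKU OIDs of its certificate. */
    public static String select(List<String> aliases,
                                Map<String, List<String>> ekusByAlias,
                                String wantedEku) {
        for (String alias : aliases) {
            List<String> ekus = ekusByAlias.getOrDefault(alias, Collections.emptyList());
            if (ekus.contains(wantedEku)) {
                return alias;
            }
        }
        return null;
    }
}
```

With this kind of selectivity, the key manager would pick the server certificate during incoming handshakes and the client certificate during outgoing ones, regardless of the order in which the aliases happen to be returned.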

More details including a fully running example and unit tests are available here.

Copyright ©2018, Ant Kutschera


Revisiting Global Data Consistency in Distributed (Microservice) Architectures

Back in 2015 I wrote a couple of articles about how you can piggyback a standard Java EE Transaction Manager to get data consistency across distributed services (here is the original article and here is an article about doing it with Spring Boot, Tomcat or Jetty).

Last year I was fortunate enough to work on a small project where we questioned data consistency from the ground up. Our conclusion was that there is another way of getting data consistency guarantees, one that I had not considered in another article that I wrote about patterns for binding resources into transactions. This other solution is to change the architecture from a synchronous one to an asynchronous one. The basic idea is to save business data together with "commands" within a single database transaction. Commands are simply facts that other systems still need to be called. By reducing the number of concurrent transactions to just one, it is possible to guarantee that data will never be lost. Commands which have been committed are then executed as soon as possible and it is the command execution (in a new transaction) which then makes calls to remote systems. Effectively it is an implementation of the BASE consistency model, because from a global point of view, data is only eventually consistent.

Imagine the situation where updating an insurance case should result in creating a task in a workflow system so that a person gets a reminder to do something, for example write to the customer. The code to handle a request to update an insurance case might look like this:
    @PersistenceContext
    EntityManager em;

    @Inject
    TaskService taskService;

    public void updateCase(Case aCase) {
        aCase = em.merge(aCase);

        long taskId = taskService.createTask(aCase.getNr(),
                        "Write to customer...");
        aCase.setTaskId(taskId);
    }
The call to the task service results in a remote call to the task application, which is a microservice responsible for workflow and human tasks (work that needs to be done by a human).

There are two problems with our service as described above. First of all, imagine that the task application is offline at the time of the call. That reduces the availability of our application. For every additional remote application that our application connects to, there is a reduction in the availability of our system. Imagine one of those applications has an allowed downtime of 4 hours per month and a second one has an allowed downtime of 8 hours. That could cause our application to be unavailable for up to 12 hours per month, in addition to our own downtimes, since there is never a guarantee that the downtimes will occur at the same time.

The second problem with the service design above comes when there is a problem committing the data to the database after the call to the task application is made. The code above uses JPA, which may choose to flush the SQL statements generated by the call to the merge method or by updates to the entity some time after those calls, and at the latest at commit time. That means a database error could occur after the call to the task application. The database call might even fail for other reasons, such as the network not being available. So conceptually we have the problem that we might have created a task asking an employee to send a letter to the customer, but it wasn't possible to update the case, so the employee might not even have the information necessary to write the letter.

If the task application were transaction aware, i.e. capable of being bound into a transaction so that the transaction manager in our application could deal with the remote commit/rollback, it would certainly help to avoid the second problem described above (data consistency). But the increase in downtime wouldn't be handled.

Changing the architecture so that the call to the task application occurs asynchronously will however solve both of those problems. Note that I am not talking about simple asynchronous method invocation but instead I am talking about calling the task application after our application commits the database transaction. It is only at that point that we have a guarantee that no data will be lost. We can then attempt the remote call as often as is necessary until the task is created successfully. At that stage global data is consistent. Being able to retry failed attempts means that the system as a whole becomes more reliable and our downtime is reduced. Note that I am also not talking about non-blocking methods which are often referred to as being asynchronous.

To make this work, I have created a simple library which requires the developer to do two things (more information about the rudimentary implementation used in the demo application is available here). First of all, the developer needs to make a call to the CommandService, passing in the data which is required when the actual command is executed. Secondly, the developer needs to provide an implementation of the command, which the framework will execute. The first part looks like this:
public class TaskService {

    @Inject
    CommandService commandService;

    /** will create a command which causes a task to be
     *  created in the task app, asynchronously, but robustly. */
    public void createTask(long caseNr, String textForTask) {
        String context = createContext(caseNr, textForTask);
        Command command = new Command(CreateTaskCommand.NAME, context);
        commandService.persistCommand(command); //persisted in the same transaction as the business data
    }

    private String createContext(long nr, String textForTask) {
        //TODO use object mapper rather than build string ourselves...
        return "{\"caseNr\": " + nr + ", \"textForTask\": \"" + textForTask + "\"}";
    }
}
The command service shown here takes a command object which contains two pieces of information: the name of the command and a JSON string containing the data which the command will need. A more mature implementation which I have written for my customer takes an object as input rather than a JSON string, and the API uses generics.
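The TODO in createContext can be addressed even without pulling in a JSON library; the following sketch (my own illustration, not part of the demo code) at least escapes the text properly instead of concatenating it raw:

```java
public class Json {

    /** minimal JSON string escaping, enough for the characters that
     *  typically break naive concatenation */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // other control chars
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    static String context(long caseNr, String textForTask) {
        return "{\"caseNr\": " + caseNr + ", \"textForTask\": \"" + escape(textForTask) + "\"}";
    }
}
```

A real implementation would of course use an object mapper such as Jackson, as the TODO suggests; this just shows the minimum needed to make the hand-built string safe.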

The command implementation supplied by the developer looks as follows:
public class CreateTaskCommand implements ExecutableCommand {

    public static final String NAME = "CreateTask";

    public void execute(String idempotencyId, JsonNode context) {
        long caseNr = context.get("caseNr").longValue();
        String textForTask = context.get("textForTask").textValue();
        //...make the HTTP call to the task application here...
    }

    public String getName() { return NAME; }
}
The execute method of the command is where the developer implements the stuff which needs to be done. I haven't shown the code used to call the task application since it isn't really relevant here: it's just an HTTP call.

The interesting part of such an asynchronous design isn't in the above two listings, rather in the framework code which ensures that the command is executed. The algorithm is a lot more complicated than you might first think because it has to be able to deal with failures, which causes it to also have to deal with locking. When the call to the command service is made, the following happens:
  • The command is persisted to the database
  • A CDI event is fired
  • When the application commits the transaction, the framework is called since it observes the transaction success
  • The framework "reserves" the command in the database, so that multiple instances of the application wouldn't attempt to execute the same command at the same time
  • The framework uses an asynchronous EJB call to execute the command
  • Executing the command works by using the container to search for implementations of the ExecutableCommand interface and using any which have the name saved in the command
  • All matching commands are executed by calling their execute method, passing them the input that was saved in the database
  • Successfully executed commands are removed from the database
  • Commands which fail are updated in the database, so that the number of execution attempts is incremented
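The lifecycle in the list above can be sketched with an in-memory stand-in for the command table (a deliberate simplification for illustration; the real implementation persists via JPA and executes via asynchronous EJB calls):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CommandEngine {

    static class Command {
        final String name;
        final String context;
        int attempts = 0;
        boolean reserved = false;
        Command(String name, String context) { this.name = name; this.context = context; }
    }

    interface Executable {
        void execute(String context) throws Exception;
    }

    final Map<Long, Command> store = new ConcurrentHashMap<>();          // stands in for the database table
    final Map<String, Executable> implementations = new ConcurrentHashMap<>();
    private long nextId = 0;

    /** step 1: persist the command (in the real framework, in the same
     *  transaction as the business data) */
    synchronized long persist(Command command) {
        store.put(++nextId, command);
        return nextId;
    }

    /** steps 3-9: called once the business transaction has committed */
    void executeCommitted(long id) {
        Command command = store.get(id);
        synchronized (command) {
            if (command.reserved) return;   // another instance reserved it first
            command.reserved = true;
        }
        try {
            implementations.get(command.name).execute(command.context);
            store.remove(id);               // success: the command is done
        } catch (Exception e) {
            command.attempts++;             // failure: leave it for the retry housekeeping
            command.reserved = false;
        }
    }
}
```

The essential properties survive even in this toy version: a command only becomes executable after it has been durably recorded, reservation prevents double execution, and a failure leaves the command behind with an incremented attempt count.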
As well as that fairly complex algorithm, the framework also needs to do some housekeeping:
  • Periodically check to see if there are commands which need to be executed. Criteria are:
    • The command has failed, but has not been attempted more than 5 times
    • The command is not currently being executed
    • The command is not hanging
    • (a more complex implementation might also restrict how quickly the retry is attempted, for example after a minute, two minutes, then four, etc.)
  • Periodically check to see if there are commands which are hanging, and unlock them so that they will be reattempted
Commands might hang if, for example, the application crashes during execution. So as you can see, the solution isn't trivial, and as such it belongs in framework code, so that the wheel doesn't keep getting reinvented. Unfortunately the implementation very much depends on the environment in which it is supposed to run, and that makes writing a portable library very difficult (which is why I have not done more than publish the classes in the commands package of the demo application). Interestingly it even depends on the database being used, because for example select for update isn't properly supported by Hibernate when used with Oracle. For completeness' sake: commands which fail 5 times should be monitored, so that an administrator can resolve the problem and update the commands so that they are reattempted.
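The backoff mentioned in parentheses in the list above could, for example, be modelled like this (a hypothetical helper, not part of the published demo code):

```java
import java.time.Duration;

public class RetryPolicy {

    static final int MAX_ATTEMPTS = 5;

    /** delay before retrying a command that has already failed
     *  failedAttempts times: 1, 2, 4, 8 minutes... */
    static Duration delayBeforeAttempt(int failedAttempts) {
        if (failedAttempts >= MAX_ATTEMPTS) {
            // after 5 failures the command should be left for an administrator
            throw new IllegalStateException("command needs administrator attention");
        }
        return Duration.ofMinutes(1L << (failedAttempts - 1));
    }
}
```

The housekeeping job would then only pick up a failed command once its computed delay has elapsed since the last attempt.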

The right question at this stage is whether or not changing the architecture to an asynchronous one is the best solution? On the surface it certainly looks as though it solves all our data consistency problems. But in reality there are a few things that need to be considered in detail. Here are a few examples.

A) Imagine that after updating the insurance case, the user wants to close it, and part of the business rules dictating whether or not a case may be closed includes checking whether any tasks are incomplete. The best place to check whether any tasks are incomplete is the task application! So the developer adds a few lines of code to call it. At this stage it already gets complicated, because should the developer make a synchronous call to the task application, or use a command? Advice is given below, and for simplicity, let's assume the call is made synchronously. But what if three seconds ago the task application was down, so that an incomplete command is still in our database, which, when executed, will create a task? If we just relied on the task application, we'd close the case, and at the next attempt to execute the incomplete command we'd save the task even though the case is already closed. It gets messy, because we'd have to build extra logic to re-open the case when a user clicks the task to deal with it. A more proper solution would be to first ask the task application and then check the commands in our database. Even then, because commands are executed asynchronously, we could end up with timing issues where we miss something. The general problem that we have here is one of ordering. It is well known that eventually consistent systems suffer from ordering problems and can require extra compensatory mechanisms, like the one described above where the case gets reopened. These kinds of things can have quite complex impacts on the overall design, so be careful!

B) Imagine an event occurs in the system landscape which results in the case application being called in order to create an insurance case. Imagine then that a second event occurs which should cause that case to be updated. Imagine that the application wishing to create and update the case was implemented asynchronously using the commands framework. Finally, imagine that the case application was unavailable during the first event, so that the command to create the case stayed in the database in an incomplete state. What happens if the second command is executed before the first one, i.e. the case is updated before it even exists? Sure, we could design the case application to be smart, and if the case doesn't exist, it simply creates it in the updated state. But what do we then do when the command to create the case is executed? Do we update it to its original state? That would be bad. Do we ignore the second command? That could be bad if some business logic depended on a delta, i.e. a change in the case. I have heard that systems like Elastic Search use timestamps in requests to decide if they were sent before the current state, and ignore such calls. Do we create a second case? That might happen if we don't have idempotency under control, and that would also be bad. One could implement some kind of complex state machine for tracking commands, and for example only allow the update command to be executed after the creation command. But that needs an extra place to store the update command until the creation command has been executed. So as you can see, ordering problems strike again!

C) When do we need to use commands, and when can we get away with synchronous calls to remote applications? The general rule appears to be that as soon as we need to write to more than one resource, we should use commands, if global data consistency is important to us. So if a certain call only requires data to be read from multiple remote applications so that we can update our database, it isn't necessary to use commands, although it may be necessary to implement idempotency, or for the caller to implement some kind of retry mechanism, or indeed to use a command to call our system. If, on the other hand, we want to write to a remote application and our database in a consistent manner, then we need to use a command to call the remote application.

D) What do we do if we want to call multiple remote applications? If they all offer idempotent APIs, there doesn't appear to be a problem in calling them all from a single command. Otherwise it might be necessary to use one command per remote application call. If they need to be called in a certain order, it will be necessary that one command implementation creates the command that should be called next in the chain. A chain of commands reminds me of choreography. It might be easier or more maintainable to implement a business process as an orchestration. See here for more details.

E) Thread Local Storage (TLS) can cause headaches because commands are not executed on the same thread that creates them. As such, mechanisms like the injection of @RequestScoped CDI beans also no longer work as you might expect. The normal Java EE rules which apply to @Asynchronous EJB calls also apply here, precisely because the framework code uses that mechanism in its implementation. If you need TLS or scoped beans, then you should consider adding the data from such places to the input which is saved with the command in the database, and, as soon as the command is executed, restoring that state before calling any local service/bean which relies on it.

F) What do we do if the response from a remote application is required? Most of the time we call remote systems and need response data from them in order to continue processing. Sometimes it is possible to separate reads and writes, for example with CQRS. One solution is to break up the process into smaller steps, so that each time a remote system needs to be called it is handled by a new command, and that command not only makes the remote call, but also updates the local data when the response arrives. We have however noticed that if an optimistic locking strategy is in place it can result in errors when the user wants to persist changes that they have made to their data, which is now "stale" compared to the version in the database, even though they might only want to change certain attributes which the command did not change. One solution to this problem is to propagate events from the backend over a web socket to the client so that it can do a partial update to the attributes affected by the command, so that the user is still able to save their data later on. A different solution is to question why you need the response data. In the example above, I put the task ID into the case. That could be one way to track tasks relating to the case. A better way is to pass the case ID to the task application, and get it to store the case ID in the task. If you need a list of tasks related to the case, you query them using *your* ID, rather than tracking their ID. By doing this you eliminate the dependency on the response data (other than to check that the task is created without an error), and as such there is no need to update your data based upon the response from the remote application.

Hopefully I have been able to demonstrate that an asynchronous architecture using commands as described above offers a suitable alternative to the patterns for guaranteeing global data consistency, which I wrote about a few years ago.

Please note that after implementing the framework and applying it to several of our applications we learned that we are not the only ones to have such ideas. Although I have not read up about Eventuate Tram and its transactional commands, it appears to be very similar. It would be interesting to compare the implementations.

Finally, as well as commands, we added "events" on top of the commands. Events in this case are messages sent via JMS, Kafka, or your favourite messaging system, in a consistent and guaranteed manner. Both sides, namely publication and consumption of an event, are implemented as commands, which provides very good at-least-once delivery guarantees. Events inform 1..n applications in the landscape that something has happened, whereas commands tell a single remote application to do something. These, together with websocket technology and the ability to inform clients of asynchronous changes in the backend, complete the architecture required to guarantee global data consistency. Whether or not such an asynchronous architecture is better than, say, piggybacking a transaction manager in order to guarantee global data consistency, is something that I am still learning about. Both have their challenges, advantages and disadvantages. Probably the best solution relies on a mix, as is normally the case with complex software systems :-)


Choosing the right language to write a simple transformation tool

Recently, a colleague asked for help in writing a little tool to transform a set of XML files into a non-normalised single table, so that their content could be easily analysed and compared, using Excel. The requirements were roughly:
  1. Read XML from several files, with the structure shown below,
  2. Write a file containing one row per combination of file and servlet name, and one column per param-name (see example below),
  3. It should be possible to import the output into Excel.
Example input: In the example input above, there can be any number of servlet tags, each containing at least a name, and optionally any number of name-value pairs, representing input parameters to the servlet. Note that each servlet could contain totally different parameters!

The output should then have the following structure. We chose comma separated values (CSV) so that it could easily be imported into Excel. Note how the output contains empty cells, because not every servlet has to have the same parameters. The algorithm we agreed on was as follows:
  1. Read files in working directory (filtering out non-XML files),
  2. For each file:
  3.     For each servlet:
  4.         For each parameter name-value pair:
  5.             Note parameter name
  6.             Note combination of file, servlet, parameter name and value
  7. Sort unique parameter names
  8. Output a header line for the file column, servlet column, and one column for each unique parameter name
  9. For each file:
  10.     For each servlet:
  11.         For each sorted unique parameter name:
  12.             Output a "cell" containing the corresponding parameter value,
                or an empty "cell" if the servlet has no corresponding
                parameter value for the current parameter name
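Steps 7-12 of the algorithm above can be sketched in Java roughly as follows (my illustrative version, operating on the model collected in steps 1-6, mapping file -> servlet -> parameter name/value; CSV escaping is ignored, so values are assumed not to contain commas):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class CsvPivot {

    static List<String> toCsv(Map<String, Map<String, Map<String, String>>> model) {
        // step 7: collect and sort the unique parameter names
        SortedSet<String> paramNames = new TreeSet<>();
        model.values().forEach(file ->
            file.values().forEach(servlet -> paramNames.addAll(servlet.keySet())));

        List<String> lines = new ArrayList<>();
        // step 8: header line with one column per unique parameter name
        lines.add("file,servlet," + String.join(",", paramNames));

        // steps 9-12: one row per file/servlet, empty cell where a
        // servlet has no value for the current parameter name
        model.forEach((fileName, servlets) ->
            servlets.forEach((servletName, params) -> {
                String cells = paramNames.stream()
                    .map(p -> params.getOrDefault(p, ""))
                    .collect(Collectors.joining(","));
                lines.add(fileName + "," + servletName + "," + cells);
            }));
        return lines;
    }
}
```

This is essentially the shape of the solution my colleague built in Java; the interesting question explored below is how much ceremony each language adds around it.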
The next step was to think about how we would implement this. We tried simply importing the data into Excel, but it isn't so good at coping with non-normalised structures, and the varying number of parameters with differing names meant it wasn't possible to use Excel directly. We did not consider writing VBA to manipulate the imported data. Working for a company that has invested heavily in Java, the obvious choice would have been to use that. While we didn't have an XSD defining the XML structure, there is no end of tools which can generate one based solely on the XML files. From there we could have used JAXB to generate some Java classes, import the XML files and deserialise them into a Java model. Another option, which was the one my colleague chose, was to use XStream for deserialising. But while my colleague worked on the Java solution, I asked myself if there wasn't a different, better way to do it. I quite like using Javascript and Node.js for scripting tools, and I've been learning Typescript recently, so I gave that a go.

Typescript solution: Lines 7 and 8 are where I create a very simple model of the input content. I haven't bothered to create any classes which define the content; instead I'm just using objects as dictionaries/maps. They map names to objects, and the JSON corresponding to the output shown at the start of this article is as follows: Lines 12-18 of the Typescript solution are where I read the input files and put their content into the simple model described above. Lines 22-24 are where I write the output file. Notice how I have to use the Promise API on line 22 to wait for all the promises which the handleFile function returns, before writing the output. The promises are there because dealing with I/O in Node.js is normally done asynchronously. So just looking at this first part of the Typescript solution, it quickly becomes obvious that we have to write quite a lot of boilerplate code because Node.js is based on a single-threaded, non-blocking I/O paradigm. While that is nice for writing UI code in the browser [1] and very useful for writing high-performance code in the back end [2], I find it very annoying for writing little tools where that stuff shouldn't matter. In fact over half of lines 12 to 25 are cluttered with code for dealing with these Node.js qualities. Line 12 defines a callback for dealing with the files which are read from the input directory. Lines 14, 16, 17, 22 and 24 contain code for dealing with promises that leak out of the library we use to parse the XML. Callback hell isn't just what happens when you write deeply nested code structures. For me, it's also about having so much of my code influenced by the intricacies of callbacks.

Luckily, Node.js also provides synchronous versions of some of the I/O functions, so when we read the XML file on line 30 of the Typescript solution, or write the output file on line 78, we don't need to put code which must wait for the I/O to be completed, inside a callback. When writing tools like this one, where performance based on I/O doesn't really matter, I prefer to use those functions as it makes the code much more readable, and thus maintainable. Yes, you could argue that you want to parse all the files in parallel and make the program really really fast. The Typescript version of this program runs in 40 milliseconds, so I'm not going to worry about parsing files in parallel, when I'd rather have readable and maintainable code. The important point is that the program runs fast enough for this use case.

Javascript (ES 2017) and Typescript 1.7 introduced the await keyword, which can be used in async functions. The idea is that you can write code without having to deal with promises directly. Note that async/await works with functions that return promises, and unfortunately the library that converts XML to a Javascript object works with so-called error-first callbacks instead of promises. So I chose to hide the XML parsing inside the function shown on lines 45-52, called parseXml, which simply converts from the callback pattern to a promise. See here for more details. The function called handleFile, defined on lines 27-43, shows an example of using the await keyword on line 31.

You can use the await keyword inside any function which is marked with the async keyword, in front of a call to a function which returns a promise. It causes all the code after that line to be executed after the relevant promise completes. So lines 33-41 are executed after the promise which the parseXml function returns has completed. At this stage the code in the handleFile function looks better than when writing it with promises or callbacks, and much of the boilerplate magic has been removed. But the reality is that the abstraction gained by using await leaks out of the handleFile function, because async functions return promises. Without the code on lines 14, 17 and 22, we would start writing the output file before all the file contents had been added to our model on lines 33-41.

These problems of boilerplate code related to leaky abstractions are just enough of a reason for me to continue looking for a better solution, when writing a simple tool like this.

[1] - UI developers shouldn't have to worry about threading issues related to screen refreshing. As such, having no threads to think about alleviates UI developers from unnecessary complexities while concentrating on developing front ends. Calls to servers (using XHR, Web Sockets, etc.) are handled behind the scenes, and UI developers just have to supply a function which is called when the result becomes available, sometime in the future. That is REALLY cool! Try doing the same thing in Java and you soon lose time thinking very hard about threads.

[2] - See my blog post from a few years ago where I showed an example of where Node.js outperforms the JVM because it's tuned for non-blocking scenarios.

The next language choice was Scala, a language that I spent a lot of time exploring in 2012-13. After that I left the language and returned to Java, a little disappointed because I found Scala just a little too complicated for the projects I work on. In those projects it is rarely, if ever, the technology that is the problem. We struggle with problems related to the business, and Java does just nicely in solving 99% of those problems. In my opinion, using Scala doesn't directly help to address the problems we have any more than using Java does. Nonetheless, Scala has a very interesting XML parser and can be used to write some pretty cool code. So I dusted off my Scala keyboard, so to speak, and wrote the following solution to the problem at hand.

Scala Solution: The first thing to note is that the Scala solution is about 25% shorter. That is something the Scala community used to (and perhaps still does?) hail as an advantage over Java. In this case it is more down to the formatting and structuring of the code. Notice how I have everything inside just one function, compared to four in the Typescript solution. Below I introduce a Python solution which has just about as much code as the Scala solution. So let's look at other things.

Line 19 creates a model just as we did in the Typescript solution, and it effectively has the same structure as the JSON model shown above. Lines 23 & 24 read all the files in the working directory; lines 25 & 26 filter out anything that is not a file or does not have the xml extension; lines 27-32 convert each servlet tag into a tuple containing the file model and the servlet xml node. There are a number of noteworthy things going on in that block. First of all, lines 29-30 create a HashMap named fileModel and put it into the main model, keyed by the file name. Then line 31 loads the XML from the file. Line 32 then returns a collection of tuples containing the file model (HashMap) and the servlet XML node (note that the last line of a Scala function has an implicit return statement).

Using tuples is a neat way to ensure that the code on lines 33-36, which iterates over each servlet node, still has access to the file model, i.e. the servlet's parent. There are other ways of doing this, but tuples, combined with a case statement as shown on line 33, which allows the tuple's elements to be renamed, are by far the easiest way. This is something that I really miss in Java, because not only does Java have no native tuples, but there is no way to rename the elements, and so the code becomes unreadable and unnecessarily complex. Line 34 creates a servlet model (also a HashMap) and line 35 puts it into the file model, keyed by the servlet name. Line 36 is similar to line 32 in that it returns a collection of tuples, one for each child node of the XML servlet node. Since the XML files always contain a parameter name before the parameter value, line 39 notes the most recent name it encounters, and that is used on line 40 to put the name-value pair into the servlet model.
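To illustrate the Java point: without native tuples, the closest stand-in is something like Map.Entry, which does keep the parent file model paired with each child, but forces the unreadable getKey()/getValue() accessors instead of Scala's case-statement renaming. A rough sketch, with hypothetical names:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TuplesInJava {

    // pair each "servlet" with its parent file model, as the Scala code does with tuples
    static List<Map.Entry<Map<String, String>, String>> pairWithParent(
            Map<String, Map<String, String>> model, String fileName, List<String> servletNames) {
        Map<String, String> fileModel = new HashMap<>();
        model.put(fileName, fileModel);
        List<Map.Entry<Map<String, String>, String>> pairs = new ArrayList<>();
        for (String servlet : servletNames) {
            // getKey()/getValue() are far less readable than Scala's
            // case (fileModel, servletNode) renaming
            pairs.add(new SimpleEntry<>(fileModel, servlet));
        }
        return pairs;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> model = new HashMap<>();
        for (Map.Entry<Map<String, String>, String> pair
                : pairWithParent(model, "a.xml", List.of("s1", "s2"))) {
            pair.getKey().put(pair.getValue(), "someValue"); // still has access to the parent
        }
        System.out.println(model);
    }
}
```

It works, but the reader has to remember what the key and value of each entry actually mean, which is exactly the readability problem described above.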

Line 45 is then similar to the Typescript solution: we build a unique sorted list of all parameter names so that we can iterate over them to create the columns in the output file, which is done on lines 51-55. The file is written synchronously on line 58.
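That "unique sorted list of parameter names" step is simple enough to sketch in Java too, assuming (hypothetically) that the model is a map of servlet names to parameter maps:

```java
import java.util.Map;
import java.util.TreeSet;

public class Columns {

    // collect a unique, sorted set of all parameter names across all servlets,
    // then use it to build the header row of the output file
    static String headerRow(Map<String, Map<String, String>> servlets) {
        TreeSet<String> paramNames = new TreeSet<>(); // sorted and de-duplicated
        servlets.values().forEach(params -> paramNames.addAll(params.keySet()));
        return "servlet," + String.join(",", paramNames);
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> servlets = Map.of(
            "s1", Map.of("timeout", "30", "path", "/a"),
            "s2", Map.of("timeout", "60", "user", "bob"));
        System.out.println(headerRow(servlets)); // prints servlet,path,timeout,user
    }
}
```

A TreeSet gives both uniqueness and ordering in one step, which is what all three solutions need before emitting columns.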

The solution presented here uses a functional approach in combination with the powerful Scala collections library, and that leads to a solution which I feel is better than what could be done with Java, even when using lambdas. At the same time, I find the Scala solution harder to read. Both Scala and Typescript have a huge number of language features, meaning that the reader needs to know more just to be able to read the code. I have seen several attempts at categorising Scala language features (e.g. here and in this book). I've also seen companies document which features of languages they would like their employees to specifically avoid or treat specially. I wonder if the same should be done with Typescript, which has been growing in terms of the number of language features. This kind of thing becomes ever more important when a team is allowed to make its own technology/language choices and chooses to become polyglot.

The last thing to note about the Scala solution is the speed of execution. While the Typescript solution took around 40 milliseconds to execute (parsing two simple input files on my laptop), the Scala solution took over 900 milliseconds. I have heard that the XML library is slow, but I have not (yet) taken the time to investigate further.

The final solution that I investigated was implemented in Python, a language that I have only just started to learn.

Python solution: The algorithm that has already been implemented twice should again be quite recognisable. Lines 12-13 create empty models. Line 17 finds all the XML files in the working directory. Lines 20-33 parse the files and build up the model, which is used on lines 37-61 to write the output. Line 22 creates a new dictionary (map) in the model, keyed by the file name. Line 23 parses the XML using the "untangle" library. The library builds a Python model of the file contents, which can be accessed in a natural way, using expressions like servlet.name.cdata to access the content of the path /config/servlet/name in the XML tree. This only works because of the dynamic nature of Python. It works like that in the Typescript solution too (see lines 33 & 35), but it wouldn't on the JVM, because Java is statically typed. Line 34 sorts the parameters so that we can iterate over them whilst building the output columns (lines 38 & 51). The output is written on lines 58-61.

This solution is the most script-like. And a script is precisely what is required for this relatively simple problem, and that is what I was searching for when I started my little quest to find something better than a typical Java solution. This script is relatively easy to read and doesn't use any advanced language features (except perhaps the lambda on line 17 used to filter the input files). Even the development environment is very simple, since there is no need to compile. And thanks to PyCharm there is even a community edition of a very powerful IDE (IntelliJ also has a community edition for Scala, but sadly not (yet?) for Typescript). The Python solution even runs the quickest, in just 20 milliseconds. For those reasons the Python solution became my favourite for writing this tool. It lets me write just a small amount of code which doesn't leak technical details like promises everywhere, which is easy to read, and which performs well. I can see now why Python is recommended as a first language to learn (e.g. here).

Copyright ©2017, Ant Kutschera

Global Data Consistency, Transactions, Microservices and Spring Boot / Tomcat / Jetty

We often build applications which need to do several of the following things together: call backend (micro-) services, write to a database, send a JMS message, etc. But what happens if there is an error during a call to one of these remote resources, for example if a database insert fails after you have called a web service? If a remote service call writes data, you could end up in a globally inconsistent state, because the service has committed its data but the call to the database has not been committed. In such cases you will need to compensate for the error, and typically the management of that compensation is complex and hand written.

Arun Gupta of Red Hat writes about different microservice patterns in the DZone Getting Started with Microservices Refcard. Indeed the majority of those patterns show a microservice calling multiple other microservices. In all these cases global data consistency becomes relevant, i.e. ensuring that a failure in one of the latter calls to a microservice is either compensated, or the call is re-attempted, until all the data in all the microservices is again consistent. In other articles about microservices there is often little or no mention of data consistency across remote boundaries, for example the good article titled "Microservices are not a free lunch", where the author just touches on the problem with the statement "when things have to happen ... transactionally ... things get complex with us needing to manage ... distributed transactions to tie various actions together". Indeed we do, but such articles never mention how to actually do that.

The traditional way to manage consistency in distributed environments is to make use of distributed transactions. A transaction manager is put in place to oversee that the global system remains consistent. Protocols like two phase commit have been developed to standardise the process. JTA, JDBC and JMS are specifications which enable application developers to keep multiple databases and message servers consistent. JCA is a specification which allows developers to write wrappers around Enterprise Information Systems (EISs). And in a recent article I wrote about how I have built a generic JCA connector which allows you to bind things like calls to microservices into these global distributed transactions, precisely so that you don't have to write your own framework code for handling failures during distributed transactions. The connector takes care of ensuring that your data is eventually consistent.
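As a reminder of what those protocols do, here is a toy sketch of the two phase commit idea. The names are hypothetical and deliberately much simpler than the real JTA/XAResource API: the coordinator asks every resource to vote in phase one, and only if all vote yes does it commit them all in phase two, otherwise it rolls them all back.

```java
import java.util.List;

public class TwoPhaseCommit {

    // a hypothetical resource that can vote in phase one and commit/rollback in phase two
    interface Resource {
        boolean prepare();   // phase 1: vote yes/no
        void commit();       // phase 2a: make the work durable
        void rollback();     // phase 2b: undo the work
    }

    // returns true if the global transaction committed
    static boolean coordinate(List<Resource> resources) {
        // phase 1: stops at the first "no" vote (presumed abort for the rest)
        boolean allPrepared = resources.stream().allMatch(Resource::prepare);
        // phase 2: the decision is applied to every resource
        if (allPrepared) {
            resources.forEach(Resource::commit);
        } else {
            resources.forEach(Resource::rollback);
        }
        return allPrepared;
    }

    // test helper: a resource with a fixed vote and no-op phase two
    static Resource stub(boolean vote) {
        return new Resource() {
            public boolean prepare() { return vote; }
            public void commit() {}
            public void rollback() {}
        };
    }

    public static void main(String[] args) {
        System.out.println(coordinate(List.of(stub(true), stub(true))));  // prints true
        System.out.println(coordinate(List.of(stub(true), stub(false)))); // prints false
    }
}
```

A real transaction manager additionally logs its decision durably between the phases, which is what makes recovery after a crash possible.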

But you won't always have access to a full Java EE application server which supports JCA, especially in a microservice environment, and so I have now extended the library to include automatic handling of commit / rollback / recovery in the following environments:
  • Spring Boot
  • Spring + Tomcat / Jetty
  • Servlets + Tomcat / Jetty
  • Spring Batch
  • Standalone Java applications
In order to be able to do this, the applications need to make use of a JTA compatible transaction manager, namely Atomikos or Bitronix.

The following description assumes that you have read the earlier blog article.

The process of setting up a remote call so that it is enlisted in the transaction is similar to when using the JCA adapter presented in the earlier blog article. There are two steps: 1) calling the remote service inside a callback passed to a TransactionAssistant object retrieved from the BasicTransactionAssistanceFactory class, and 2) setting up a central commit / rollback handler.

The first step, namely the code belonging to the execution stage (see the earlier blog article), looks as follows (when using Spring):
@Service
@Transactional
public class SomeService {

    @Autowired @Qualifier("xa/bookingService")
    BasicTransactionAssistanceFactory bookingServiceFactory;

    public String doSomethingInAGlobalTransactionWithARemoteService(String username) throws Exception {
        //write to say a local database...

        //call a remote service
        String msResponse = null;
        try(TransactionAssistant transactionAssistant = bookingServiceFactory.getTransactionAssistant()){
            msResponse = transactionAssistant.executeInActiveTransaction(txid->{
                BookingSystem service = new BookingSystemWebServiceService().getBookingSystemPort();
                return service.reserveTickets(txid, username);
            });
        }
        return msResponse;
    }
}
Listing 1: Calling a web service inside a transaction
Lines 5-6 provide an instance of the factory used on line 13 to get a TransactionAssistant. Note that you must ensure that the name used here is the same as the one used during the setup in Listing 3, below. This is because when the transaction is committed or rolled back, the transaction manager needs to find the relevant callback used to commit or compensate the call made on line 16. It is more than likely that you will have multiple remote calls like this in your application, and for each remote service that you integrate, you must write code like that shown in Listing 1. Notice how this code is not that different to using JDBC to call a database. For each database that you enlist into the transaction, you need to:
  • inject a data source (analogous to lines 5-6)
  • get a connection from the data source (line 13)
  • create a statement (line 14)
  • execute the statement (lines 15-16)
  • close the connection (line 13, when the try block calls the close method of the auto-closable resource). It is very important to close the transaction assistant after it has been used, before the transaction is completed.
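The close-before-completion behaviour in that last point comes from try-with-resources: the assistant is auto-closable, so it is closed as soon as the try block exits. A small stand-alone sketch of the pattern, where Assistant is a hypothetical stand-in for TransactionAssistant:

```java
public class AutoCloseDemo {

    // stands in for TransactionAssistant: must be closed before the transaction completes
    static class Assistant implements AutoCloseable {
        final StringBuilder log;
        Assistant(StringBuilder log) { this.log = log; }
        String execute(String work) {
            log.append("execute(").append(work).append(");");
            return "ok";
        }
        @Override
        public void close() {
            log.append("close();");
        }
    }

    // returns the order of events: the assistant is closed as soon as the try
    // block exits, i.e. before the surrounding transaction commits or rolls back
    static String run() {
        StringBuilder log = new StringBuilder();
        try (Assistant assistant = new Assistant(log)) {
            assistant.execute("callRemoteService");
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints execute(callRemoteService);close();
    }
}
```

The compiler guarantees close() runs even if execute throws, which is exactly the guarantee you want before the transaction manager takes over.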
In order to create an instance of the BasicTransactionAssistanceFactory (lines 5-6 in Listing 1), we use a Spring @Configuration:
@Configuration
public class Config {

    @Bean(name="xa/bookingService")
    public BasicTransactionAssistanceFactory bookingSystemFactory() throws NamingException {
        Context ctx = new BitronixContext();
        BasicTransactionAssistanceFactory microserviceFactory =
                          (BasicTransactionAssistanceFactory) ctx.lookup("xa/bookingService");
        return microserviceFactory;
    }
}
Listing 2: Spring's @Configuration, used to create a factory
Line 4 of Listing 2 uses the same name as is found in the @Qualifier on line 5 of Listing 1. The method on line 5 of Listing 2 creates a factory by looking it up in JNDI, in this example using Bitronix. The code looks slightly different when using Atomikos - see the demo/genericconnector-demo-springboot-atomikos project for details.

The second step mentioned above is to set up a commit / rollback callback. This will be used by the transaction manager when the transaction around lines 8-20 of Listing 1 is committed or rolled back. Note that there is a transaction because of the @Transactional annotation on line 2 of Listing 1. This setup is shown in Listing 3:
CommitRollbackCallback bookingCommitRollbackCallback = new CommitRollbackCallback() {
    private static final long serialVersionUID = 1L;
    @Override
    public void rollback(String txid) throws Exception {
        new BookingSystemWebServiceService().getBookingSystemPort().cancelTickets(txid);
    }
    @Override
    public void commit(String txid) throws Exception {
        new BookingSystemWebServiceService().getBookingSystemPort().bookTickets(txid);
    }
};
TransactionConfigurator.setup("xa/bookingService", bookingCommitRollbackCallback);
Listing 3: Setting up a commit / rollback handler
Line 12 passes the callback to the configurator together with the same unique name that was used in listings 1 and 2.

The commit on line 9 may well be empty, if the service you are integrating only offers an execution method and a compensatory method for that execution. This commit callback comes from two phase commit where the aim is to keep the amount of time that distributed systems are inconsistent to an absolute minimum. See the discussion towards the end of this article.

Lines 5 and 9 instantiate a new web service client. Note that the callback handler should be stateless! It is serializable because on some platforms, e.g. Atomikos, it will be serialized together with transactional information so that it can be called during recovery if necessary. I suppose you could make it stateful so long as it remained serializable, but I recommend leaving it stateless.

The transaction ID (the String named txid) passed to the callback on lines 4 and 8 is passed to the web service in this example. In a more realistic example you would use that ID to lookup contextual information that you saved during the execution stage (see lines 15 and 16 of Listing 1). You would then use that contextual information, for example a reference number that came from an earlier call to the web service, to make the call to commit or rollback the web service call made in Listing 1.
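A minimal sketch of that idea - saving contextual information keyed by transaction ID during the execution stage and looking it up again in the compensation callback - might look like this (all names are hypothetical, this is not the generic connector's API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TxContext {

    // contextual information saved during the execution stage, keyed by transaction ID
    static final Map<String, String> referenceNumbersByTxid = new ConcurrentHashMap<>();

    // execution stage: the back end returns a reference number which we remember
    static String execute(String txid) {
        String referenceNumber = "REF-" + txid; // stands in for a real back-end response
        referenceNumbersByTxid.put(txid, referenceNumber);
        return referenceNumber;
    }

    // rollback callback: look up the reference saved earlier and use it to compensate
    static String rollback(String txid) {
        String referenceNumber = referenceNumbersByTxid.remove(txid);
        return "cancelled " + referenceNumber;
    }
}
```

In a real system the context would of course need to survive a crash, which is why the generic connector persists it to disk rather than keeping it in memory.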

The standalone variations of these listings, for example to use this library outside of a Spring environment, are almost identical with the exception that you need to manage the transaction manually. See the demo folder on Github for examples of code in several of the supported environments.

Note that in the JCA version of the generic connector, you can configure whether or not the generic connector handles recovery internally. If it doesn't, you have to provide a callback which the transaction manager can call to find transactions which you believe are not yet completed. In the non-JCA implementation discussed in this article, this is always handled internally by the generic connector. The generic connector writes contextual information to a directory and uses it during recovery to tell the transaction manager what needs to be cleaned up. Strictly speaking, this is not quite right, because if your hard disk fails, all the information about incomplete transactions will be lost. In strict two phase commit, this is why the transaction manager is allowed to call through to the resource to get a list of incomplete transactions requiring recovery. But in today's world of RAID controllers there is no reason why a production machine should ever lose data due to a hard disk failure, and for that reason there is currently no option of providing a callback to the generic connector which can tell it which transactions need recovery.

In the event of a catastrophic hardware failure of a node, where it is not possible to get the node up and running again, you would need to physically copy all the files which the generic connector writes from the old hard disk over to a second node. The transaction manager and generic connector running on the second node would then work in harmony to complete all the hung transactions, by either committing them or rolling them back, whichever was relevant at the time of the crash. This process is no different to copying transaction manager logs during disaster recovery, depending on which transaction manager you are using. The chances that you will ever need to do this are very small - in my career I have never known a production machine from a project/product that I have worked on to fail in such a way.

You can configure where this contextual information is written using the second parameter shown in Listing 4:
MicroserviceXAResource.configure(30000L, new File("."));
Listing 4: Configuring the generic connector. The values shown are also the default values.
Listing 4 sets the minimum age of a transaction before it becomes relevant to recovery. In this case, the transaction will only be considered relevant for cleanup via recovery when it is more than 30 seconds old. You may need to tune this value depending upon the time it takes your business process to execute, and that may depend on the sum of the timeout periods configured for each back-end service that you call. There is a trade-off between a low value and a high value: the lower the value, the less time it takes the background task running in the transaction manager to clean up during recovery after a failure, so the smaller the window of inconsistency is. Be careful though: if the value is too low, the recovery task will attempt to roll back transactions which are actually still active. You can normally configure the transaction manager's timeout period, and the value set in Listing 4 should be greater than or equal to the transaction manager's timeout period. Additionally, the directory where contextual data is stored is configured in Listing 4 to be the local directory. You can specify any directory, but please make sure the directory exists, because the generic connector will not attempt to create it.
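The age check itself is trivial, but worth stating precisely, since getting it wrong means rolling back live transactions. A hypothetical sketch:

```java
public class RecoveryFilter {

    // a transaction only becomes relevant to recovery once it is older than minAgeMillis,
    // otherwise we risk rolling back transactions which are still active
    static boolean isRelevantForRecovery(long startedAtMillis, long nowMillis, long minAgeMillis) {
        return nowMillis - startedAtMillis > minAgeMillis;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        System.out.println(isRelevantForRecovery(now - 40_000L, now, 30_000L)); // prints true: 40s old
        System.out.println(isRelevantForRecovery(now - 10_000L, now, 30_000L)); // prints false: still fresh
    }
}
```

With the default of 30000L from Listing 4, a transaction started 40 seconds ago is eligible for cleanup, while one started 10 seconds ago is left alone.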

If you are using Bitronix in a Tomcat environment, you may find that there isn't much information available on how to configure the environment. It used to be documented very well, before Bitronix was moved from codehaus.org over to Github. I have created an issue with Bitronix to improve the documentation. The source code and readme file in the demo/genericconnector-demo-tomcat-bitronix folder contains hints and links.

A final thing to note about using the generic connector is how the commit and rollback work. All the connector does is piggy-back on top of a JTA transaction, so that when something needs to be rolled back, it is notified via a callback. The generic connector then passes this information over to your code via the callback that is registered in Listing 3. The actual rolling back of the data in the back end is not something that the generic connector does - it simply calls your callback so that you can tell the back end system to roll back the data. Normally you won't roll back as such; rather you will mark the data that was written as no longer valid, typically using states. It can be very hard to properly roll back all traces of data that have already been written during the execution stage. In a strict two phase commit protocol setup, e.g. using two databases, the data written in each resource remains in a locked state, untouchable by third party transactions, between execution and commit/rollback. Indeed that is one of the drawbacks of two phase commit, because locking resources reduces scalability.

Typically the back end system that you integrate won't lock data between the execution phase and the commit phase, and indeed the commit callback will remain empty because it has nothing to do - the data is typically already committed in the back end when line 16 of Listing 1 returns during the execution stage. However, if you want to build a stricter system, and you can influence the implementation of the back end which you are integrating, then the data in the back end system can be "locked" between the execution and commit stages, typically by using states, for example "ticket reserved" after execution and "ticket booked" after the commit. Third party transactions would not be allowed to access resources / tickets in the "reserved" state.
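That reserved/booked idea can be sketched as a tiny state machine (hypothetical names, not part of the generic connector's API):

```java
public class TicketStates {

    enum State { RESERVED, BOOKED, CANCELLED }

    // "locking" via states: between the execution and commit stages a ticket is
    // RESERVED, and third party transactions may not touch reserved tickets
    static State execute()            { return State.RESERVED; }
    static State commit(State s)      { return s == State.RESERVED ? State.BOOKED : s; }
    static State rollback(State s)    { return s == State.RESERVED ? State.CANCELLED : s; }

    // only non-reserved tickets are visible to other transactions
    static boolean accessibleToThirdParties(State s) {
        return s != State.RESERVED;
    }

    public static void main(String[] args) {
        State s = execute();
        System.out.println(accessibleToThirdParties(s)); // prints false: reserved, i.e. "locked"
        s = commit(s);
        System.out.println(s); // prints BOOKED
    }
}
```

The RESERVED state plays the role of the lock in strict two phase commit, but at the application level, where the back end can decide for itself how long reservations may live.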

The generic connector and a number of demo projects are available at https://github.com/maxant/genericconnector/ and the binaries and sources are available from Maven.

Copyright ©2015, Ant Kutschera.

Rules Engine 2.2.0, now with JavaScript (Nashorn) Support

A new version of the Simple Rule Engine is available, so that you can now use JavaScript (Nashorn) for writing your rules (MVEL is still supported because it is so fast!).

New Features:
  • JavaScript based Rule Engine - Use the JavascriptEngine constructor to create a subclass of Engine which is capable of interpreting JavaScript rules. It uses Nashorn (Java 8) as a JavaScript engine for evaluating the textual rules. Additionally, you can load scripts, for example lodash, so that your rules can be very complex. See the testRuleWithIterationUsingLibrary(), testComplexRuleInLibrary() and testLoadScriptRatherThanFile() tests for examples. Nashorn isn't thread-safe, but the rule engine is! Internally it uses a pool of Nashorn engines. You can also override the pool configuration if you need to. See the testMultithreadingAndPerformance_NoProblemsExpectedBecauseScriptsAreStateless() and testMultithreadingStatefulRules_NoProblemsExpectedBecauseOfEnginePool() tests for examples. If required, you can get the engine to preload the pool, or let it fill the pool lazily (the default). Please note, the engine is not completely Rhino (Java 6 / Java 7) compatible - the multithreaded tests do not work as expected for stateful scripts, but the performance of Rhino is so bad that you won't want to use it anyway.
  • You can now override the name of the input parameter - previous versions required that the rules refer to the input as "input", for example "input.people[0].name == 'Jane'". You can now provide the engine with the name which should be used, so that you can create rules like "company.people[0].name == 'Jane'".
  • Java 8 Javascript Rule Engine - If you want to use Java 8 lambdas, then you instantiate a Java8JavascriptEngine rather than the more plain JavascriptEngine.
  • For your convenience, there are now builders for the JavascriptEngine and Java8JavascriptEngine, because their constructors have so many parameters. See the testBuilder() test for an example.
  • Javascript rules can refer to input using bean notation (e.g. "input.people[0].name") or Java notation (e.g. "input.getPeople().get(0).getName()").
The library is available from Maven Central:


Have fun!

Copyright ©2015, Ant Kutschera