Global Data Consistency, Transactions, Microservices and Spring Boot / Tomcat / Jetty

We often build applications which need to do several of the following things together: call backend (micro-) services, write to a database, send a JMS message, etc. But what happens if there is an error during a call to one of these remote resources, for example if a database insert fails, after you have called a web service? If a remote service call writes data, you could end up in a globally inconsistent state because the service has committed its data, but the call to the database has not been committed.
In such cases you will need to compensate the error, and typically the management of that compensation is something that is complex and hand written.

Arun Gupta of Red Hat writes about different microservice patterns in the DZone Getting Started with Microservices Refcard. Indeed the majority of those patterns show a  microservice calling multiple other microservices. In all these cases, global data consistency becomes relevant, i.e. ensuring that failure in one of the latter calls to a microservice is either compensated, or the commital of the call is re-attempted, until all the data in all the microservices is again consistent. In other articles about microservices there is often little or no mention of data consistency across remote boundaries, for example the good article titled “Microservices are not a free lunch” where the author just touches on the problem with the statement “when things have to happen … transactionally …things get complex with us needing to manage … distributed transactions to tie various actions together“. Indeed
we do, but no mention is ever made of how to do this in such articles.

The traditional way to manage consistency in distributed environments is to make use of distributed transactions. A transaction manager is put in place to oversee that the global system remains consistent. Protocols like two phase commit have been developed to standardise the process. JTA, JDBC and JMS are specifications which enable application developers to keep multiple databases and message servers consistent. JCA is a specification which allows developers to write wrappers around
Enterprise Information Systems (EISs). And in a recent article I wrote about how I have built a generic JCA connector which allows you to bind things like calls to microservices into these global distributed transactions, precisely so that you don’t have to write your own framework code for handling failures during distributed transactions. The connector takes care of ensuring that your data is eventually consistent.

But you won’t always have access to a full Java EE application server which supports JCA, especially in a microservice environment, and so I have now extended the library to include automatic handling of commit / rollback / recovery in the following environments:

  • Spring Boot
  • Spring + Tomcat / Jetty
  • Servlets + Tomcat / Jetty
  • Spring Batch
  • Standalone Java applications

In order to be able to do this, the applications need to make use of a JTA compatible transaction manager, namely one of Atomikos or Bitronix.

The following description relies on the fact that you have read the earlier blog article.

The process of setting up a remote call so that it is enlisted in the transaction is similar to when using the JCA adapter presented in the earlier blog article. There are two steps: 1) calling the remote service inside a callback passed to a TransactionAssistant object retrieved from the BasicTransactionAssistanceFactory class, and 2) setting up a central commit / rollback handler.

The first step, namely the code belonging to the execution stage (see earlier blog article), look as follows (when using Spring):

@Service
@Transactional
public class SomeService {

    @Autowired @Qualifier("xa/bookingService")
    BasicTransactionAssistanceFactory bookingServiceFactory;

    public String doSomethingInAGlobalTransactionWithARemoteService(String username) throws Exception {
        //write to say a local database...

        //call a remote service
        String msResponse = null;
        try(TransactionAssistant transactionAssistant = bookingServiceFactory.getTransactionAssistant()){
            msResponse = transactionAssistant.executeInActiveTransaction(txid->{
                BookingSystem service = new BookingSystemWebServiceService().getBookingSystemPort();
                return service.reserveTickets(txid, username);
            });
        }
        return msResponse;
    }
}

Listing 1: Calling a web service inside a transaction

Lines 5-6 provide an instance of the factory used on line 13 to get a TransactionAssistant. Note that you must ensure that the name used here is the same as the one used during the setup in Listing 3, below. This is because when the transaction is committed or rolled back, the transaction manager needs to find the relevant callback used to commit or compensate the call made on line 16. It is more than likely that you will have multiple remote calls like this in your application, and for each remote service that you integrate, you must write code like that shown in Listing 1. Notice how this code is not that different to using JDBC to call a database. For each database that you enlist into the transaction, you need to:

  • inject a data source (analagous to lines 5-6)
  • get a connection from the data source (line 13)
  • create a statement (line 14)
  • execute the statement (lines 15-16)
  • close the connection (line 13, when the try block calls the close method of the auto-closable resource). It is very important to close the transaction assistant after it has been used, before the transaction is completed.

In order to create an instance of the BasicTransactionAssistanceFactory (lines 5-6 in Listing 1), we use a Spring @Configuration:

@Configuration
public class Config {

    @Bean(name="xa/bookingService")
    public BasicTransactionAssistanceFactory bookingSystemFactory() throws NamingException {
        Context ctx = new BitronixContext();
        BasicTransactionAssistanceFactory microserviceFactory = 
                          (BasicTransactionAssistanceFactory) ctx.lookup("xa/bookingService");
        return microserviceFactory;
    }
...

Listing 2: Spring’s @Configuration, used to create a factory

Line 4 of Listing 2 uses the same name as is found in the @Qualifier on line 5 of Listing 1. The method on line 5 of Listing 2 creates a factory by looking it up in JNDI, in this example using Bitronix. The code looks slightly different when using Atomikos – see the demo/genericconnector-demo-springboot-atomikos
project for details.

The second step mentioned above is to setup a commit / rollback callback. This will be used by the transaction manager when the transaction around lines 8-20 of Listing 1 is committed or rolled back. Note that there is a transaction because of the @Transactional annotation on line 2 of Listing 1. This setup is shown in Listing 3:

CommitRollbackCallback bookingCommitRollbackCallback = new CommitRollbackCallback() {
    private static final long serialVersionUID = 1L;
    @Override
    public void rollback(String txid) throws Exception {
        new BookingSystemWebServiceService().getBookingSystemPort().cancelTickets(txid);
    }
    @Override
    public void commit(String txid) throws Exception {
        new BookingSystemWebServiceService().getBookingSystemPort().bookTickets(txid);
    }
};
TransactionConfigurator.setup("xa/bookingService", bookingCommitRollbackCallback);

Listing 3: Setting up a commit / rollback handler

Line 12 passes the callback to the configurator together with the same unique name that was used in listings 1 and 2.

The commit on line 9 may well be empty, if the service you are integrating only offers an execution method and a compensatory method for that execution. This commit callback comes from two phase commit where the aim is to keep the amount of time that distributed systems are inconsistent to an absolute minimum. See the discussion towards the end of this article.

Lines 5 and 9 instantiate a new web service client. Note that the callback handler should be stateless! It is serializable because on some platforms, e.g. Atomikos, it will be serialized together with  transactional information so that it can be called during recovery if necessary. I suppose you could make it stateful so long as it remained serializable, but I recommend leaving it stateless.

The transaction ID (the String named txid) passed to the callback on lines 4 and 8 is passed to the web service in this example. In a more realistic example you would use that ID to lookup contextual information that you saved during the execution stage (see lines 15 and 16 of Listing 1).
You would then use that contextual information, for example a reference number that came from an earlier call to the web service, to make the call to commit or rollback the web service call made in Listing 1.

The standalone variations of these listings, for example to use this library outside of a Spring environment, are almost identical with the exception that you need to manage the transaction manually. See the demo folder on Github for examples of code in several of the supported  environments.

Note that in the JCA version of the generic connector, you can configure whether or not the generic connector handles recovery internally. If it doesn’t, you have to provide a callback which the transaction manager can call, to find transactions which you believe are not yet completed. In the non-JCA implemtation discussed in this article, this is always handled internally by the generic connector.
The generic connector will write contextual information to a directory and uses that during recovery to tell the transaction manager what needs to be cleaned up. Strictly speaking, this is not quite right, because if your hard disk fails, all the information about incomplete transactions will be lost. In strict two phase commit, this is why the transaction manager is allowed to call through to the resource to get a list of incomplete transactions requiring recovery. In todays world of RAID controllers there is no reason why a production machine should ever lose data due to a hard disk failure, and for that reason there is currently no option of providing a callback to the generic connector which can tell it what transactions are in a state that needs recovery. In the event of a catastrophic hardware failure of a node, where it was not possible to get the node up and running again, you would need to physically copy all the files which the generic connector writes, from the old hard disk over to a second node.
The transaction manager and generic connector running on the second node would then work in harmony to complete all the hung transactions, by either committing them or rolling them back, whichever was relevant at the time of the crash. This process is no different to copying transaction manager logs during disaster recovery, depending on which transaction manager you are using. The chances that you will ever need to do this are very small – in my career I have never known a production machine from a project/product that I have worked on to fail in such a way.

You can configure where this contextual information is written using the second parameter shown in Listing 4:

MicroserviceXAResource.configure(30000L, new File("."));

Listing 4: Configuring the generic connector. The values shown are also the default values.

Listing 4 sets the minimum age of a transaction before it becomes relevant to recovery. In this case, the transaction will only be considered relevant for cleanup via recovery when it is more than 30 seconds old. You may need to tune this value depending upon the time it takes your business process to execute and that may depend on the sum of the timeout periods configured for each back-end service that you call. There is a trade off between a low value and a high value: the lower the value, the less time it takes the background task running in the transaction manager to clean up during recovery, after a failure. That means the smaller the value is, the smaller the window of inconsistency is. But be careful though, if the value is too low, the recovery task will attempt to rollback transactions which are actually still active. You can normally configure the transaction manager’s timeout period, and the value set in Listing 4 should be more than equal to the transaction manager’s timeout period. Additionally, the directory where contextual data is stored is configured in Listing 4 to be the local directory. You can specify any directory, but please make sure the directory exists because the generic connector will not attempt to create it. 

If you are using Bitronix in a Tomcat environment, you may find that there isn’t much information available on how to configure the environment. It used to be documented very well, before Bitronix was moved from codehaus.org over to Github. I have created an issue with Bitronix to improve the documentation. The source code and readme file in the demo/genericconnector-demo-tomcat-bitronix folder contains hints and links.

A final thing to note about using the generic connector is how the commit and rollback work. All the connector is doing is piggy-backing on top of a JTA transaction so that in the case that something needs to be rolled back, it gets notification via a callback. The generic connector then passes this information over to your code in the callback that is registered in Listing 3. The actual rolling back of the data in the back end is not something that the generic connector does – it simply calls your callback so that you can tell the back end system to rollback the data. Normally you won’t rollback as such, rather you will mark the data that was written, as being no longer valid, typically using states. It can be very hard to properly rollback all traces of data that have already been written during the execution stage. In a strict two phase commit protocol setup, e.g. using two databases, the data written in each resource remains in a locked state, untouchable by third party transactions, between execution and commit/rollback. Indeed that is one of the drawbacks of two phase commit because locking resources reduces scalability.

Typically the back end system that you integrate won’t lock data between the execution phase and the commit phase, and indeed the commit callback will remain empty because it has nothing to do – the data is typically already committed in the back end when line 16 of Listing 1 returns during the execution stage. However, if you want to build a stricter system, and you can influence the  implementation of the back end which you are integrating, then the data in the back end system can be “locked” between the execution and commit stages, typically by using states, for example “ticket reserved” after execution and “ticket booked” after the commit. Third party transactions would not be allowed to access resources / tickets in the “reserved” state.

The generic connector and a number of demo projects are available at https://github.com/maxant/genericconnector/ and the binaries and sources are available from
Maven.

Copyright ©2015, Ant Kutschera.