Global Data Consistency in Distributed (Microservice) Architectures

UPDATE: Now supports Spring and Spring Boot outside of a full Java EE server. See new article for more details!

UPDATE 2: See this new article for more details about using asynchronous calls to remote applications to guarantee global data consistency.

I’ve published a generic JCA resource adapter on Github, available from Maven (ch.maxant:genericconnector-rar) under an Apache 2.0 licence. It lets you bind things like REST and SOAP web services into JTA transactions which are under the control of Java EE application servers. That makes it possible to build systems which guarantee data consistency with as little boilerplate code as possible. Be sure to read the FAQ.

Imagine the following scenario…

Functional Requirements

  • … many pages of sleep inducing requirements…
  • FR-4053: When the user clicks the “buy” button, the system should book the tickets, make a confirmed payment transaction with the acquirer and send a letter to the customer including their printed tickets and a receipt.
  • … many more requirements…

Selected Non-Functional Requirements

  • NFR-08: The tickets must be booked using NBS (the corporation’s “Nouvelle Booking System”), an HTTP SOAP Web Service deployed within the intranet.
  • NFR-19: Output Management (printing and sending letters) must be done using COMS, a JSON/REST Service also deployed within the intranet.
  • NFR-22: Payment must be done using our Partner’s MMF (Make Money Fast) system, deployed in the internet and connected to using a VPN.
  • NFR-34: The system must write sales order records to its own Oracle database.
  • NFR-45: The data related to a single sales order must be consistent across the system, NBS, COMS and MMF.
  • NFR-84: The system must be implemented using Java EE 6 so that it can be deployed to our clustered application server environment.
  • NFR-99: Due to the MIGRATION’16 project, the system must be built to be portable to a different application server.

Analysis

NFR-45 is interesting. We need to ensure that the data across multiple systems remains consistent, i.e. even during software/hardware crashes. Yet NFR-08, NFR-19, NFR-22 and NFR-34 make things a little harder. SOAP and REST don’t support transactions! – No, that isn’t entirely true. We could very easily use something like the Arjuna transaction manager from JBoss which supports WS-AT. See for example this project (or its source on Github) or this JBoss example or indeed this Metro example. There are several problems with those solutions though: NFR-99 (the APIs used are not portable); NFR-19 (REST doesn’t support WS-AT, although there is something in the pipeline at JBoss); and the fact that the web services we are integrating might not even support WS-AT. I have integrated many internal and external web services in the past but never come across one which supports WS-AT.

Over the years I have worked on projects which have had similar requirements but which have produced different solutions. I’ve seen and heard of companies who end up effectively building their own transaction managers, which bind web services into transactions. I’ve also come across companies who don’t worry about consistency and ignore NFR-45. I like the idea of consistency, but I don’t like the idea of a business project writing a framework which tracks the state of transactions and manually commits or rolls them back trying to stay synchronised with a Java EE transaction manager. So a few years ago I had an idea of how to fulfil all of those requirements yet avoid such a complex solution that it was akin to building a transaction manager. NFR-84 almost comes to the rescue because Java EE application servers support distributed transactions. I wrote “almost” because what is missing is some form of adapter for binding non-standard resources like web services into such transactions. But the Java EE specifications also contain JSR-112, the JCA specification for building resource adapters, that can be bound into distributed transactions. My idea was to build a generic resource adapter that could be used to bind web services and other things into the transaction under the control of the application server, with as little configuration as necessary and with as simple an API as I could design.

Background to Distributed Transactions

To understand the idea better, let’s take a look at distributed transactions and two phase commit which can be used to bind calls made to a database into an XA transaction using SQL. Listing 1 shows a list of statements which are needed to commit data in an XA transaction:

mysql> XA START 'someTxId';

mysql> insert into person values (null, 'ant');

mysql> XA END 'someTxId';

mysql> XA PREPARE 'someTxId';

mysql> XA COMMIT 'someTxId';

mysql> select * from person;
+-----+-------------------------------+
| id  | name                          |
+-----+-------------------------------+
| 771 | ant                           |
+-----+-------------------------------+

Listing 1: An XA transaction in SQL

The branch of a global transaction is started within the database (a resource manager) on line 1. Any arbitrary transaction ID can be used and typically the global transaction manager inside the application server generates this ID. Line 3 is where the “business code” is executed, i.e. all the statements relating to why we are using the database, such as inserting data and running queries. Once all that business work is finished (and between lines 1 and 5 you could also be calling other remote resources), the transaction branch is ended using line 5. Note however that the transaction isn’t yet complete; it just moves to a state where the global transaction manager can start to query each resource manager as to whether it should go ahead and commit the transaction. If just one resource manager decides it does not want to commit the data, then the transaction manager will tell all the others to roll back their transactions. If however all of the resource managers report that they are happy to commit the transaction, and they do so via their response to line 7, then the transaction manager will tell all the resource managers to commit their local transaction using a command like that on line 9. After line 9 runs, the data is available to everyone as the select statement on line 11 demonstrates.

Two phase commit is about consistency in a distributed environment. Rather than just looking at the happy flow, we also need to understand what happens during failure after each of the above commands. If any of the statements up to and including the prepare statement fail, then the global transaction will be rolled back. The resource managers and the transaction manager should all be writing their state to persistent durable logs so that in the event of them being restarted they can continue the process and ensure consistency. Up to and including the prepare statement, the resource managers would roll back the transactions if they failed and were restarted.

If some resource managers report that they are prepared to commit but others report they want to roll back, or indeed others don’t answer, then the transaction will be rolled back. It might take time, if resource managers have crashed and become unavailable, but the global transaction manager will ensure that all resource managers roll back.

Once however all resource managers have successfully reported that they want to commit, there is no going back. The transaction manager will attempt to commit the transaction on all resource managers even if they temporarily become unavailable. The result is that there may temporarily be inconsistencies in the data which other transactions can view: for example, a resource manager that crashed may not yet have been told to commit, even though it has been restarted and is available again. Eventually, though, the data will become consistent. This is an important point, because I have often heard, and even used to claim myself, that the two phase commit protocol guarantees ACID consistency. It doesn’t. It guarantees eventual consistency; only the local transactions, viewed individually, have ACID properties.
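
The voting logic described above can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions: the Participant interface, InMemoryParticipant and TwoPhaseCoordinator are hypothetical names invented for this sketch, everything lives in memory, and a real transaction manager would additionally write its decision to a durable log before phase two and keep retrying participants that are temporarily unavailable.

```java
import java.util.List;

/** A participant in a global transaction (hypothetical interface for this sketch). */
interface Participant {
    boolean prepare(String txId);   // vote: true means "ready to commit"
    void commit(String txId);
    void rollback(String txId);
}

/** In-memory participant that records the last outcome it was told about. */
class InMemoryParticipant implements Participant {
    private final boolean voteYes;
    String outcome = "NONE";

    InMemoryParticipant(boolean voteYes) { this.voteYes = voteYes; }

    @Override public boolean prepare(String txId) { return voteYes; }
    @Override public void commit(String txId) { outcome = "COMMITTED"; }
    @Override public void rollback(String txId) { outcome = "ROLLED_BACK"; }
}

/** Coordinator: phase one collects votes, phase two applies one decision everywhere. */
class TwoPhaseCoordinator {
    static boolean complete(String txId, List<Participant> participants) {
        boolean allPrepared = true;
        for (Participant p : participants) {
            try {
                if (!p.prepare(txId)) { allPrepared = false; break; }
            } catch (RuntimeException e) {
                allPrepared = false;    // no answer is treated like a "no" vote
                break;
            }
        }
        for (Participant p : participants) {
            if (allPrepared) p.commit(txId); else p.rollback(txId);
        }
        return allPrepared;
    }
}
```

With one participant voting "no", complete(...) returns false and every participant, including the ones that voted "yes", ends up rolled back.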

There is one more important step in the two phase commit protocol, namely recovery, which must be implemented for failure cases. When either the transaction manager or a resource manager becomes unavailable, the transaction manager’s job is to keep trying until eventually the entire system again becomes consistent. In order to do this it can query the resource manager to find transactions which the resource manager believes to be incomplete. In MySQL the relevant command is shown in listing 2 together with its result, namely that a transaction is incomplete. I ran this command before the commit command in listing 1. After the commit, the result set is empty, since the resource manager cleans up after itself and removes the successful transaction.

mysql> XA RECOVER ;
+----------+--------------+--------------+----------+
| formatID | gtrid_length | bqual_length | data     |
+----------+--------------+--------------+----------+
|        1 |            8 |            0 | someTxId |
+----------+--------------+--------------+----------+

Listing 2: The XA recover command in SQL
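
To make the recovery step more concrete, here is a minimal in-memory sketch of one recovery pass. RecoverableResource and RecoveryTask are hypothetical classes written for this illustration; the decision log is a plain map, whereas a real transaction manager would keep it in a durable store. The sketch assumes that a branch without a logged commit decision is rolled back.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical resource manager that remembers prepared but unfinished branches. */
class RecoverableResource {
    private final Set<String> prepared = new HashSet<>();
    private final Set<String> committed = new HashSet<>();
    boolean available = true;

    void prepare(String txId) { prepared.add(txId); }

    /** Analogous to XA RECOVER: branches that are prepared but not yet finished. */
    Set<String> recover() {
        checkAvailable();
        return new HashSet<>(prepared);
    }

    void commit(String txId) {
        checkAvailable();
        if (prepared.remove(txId)) committed.add(txId);
    }

    void rollback(String txId) {
        checkAvailable();
        prepared.remove(txId);
    }

    boolean isCommitted(String txId) { return committed.contains(txId); }

    private void checkAvailable() {
        if (!available) throw new IllegalStateException("resource manager unavailable");
    }
}

/** Hypothetical recovery task run periodically by the transaction manager. */
class RecoveryTask {
    /** Decisions logged by the coordinator before phase two: txId -> commit? */
    final Map<String, Boolean> decisionLog = new HashMap<>();

    /** One recovery pass; returns false if the resource could not be reached. */
    boolean runOnce(RecoverableResource resource) {
        try {
            for (String txId : resource.recover()) {
                // No logged decision means the coordinator never decided to commit.
                if (decisionLog.getOrDefault(txId, false)) resource.commit(txId);
                else resource.rollback(txId);
            }
            return true;
        } catch (IllegalStateException e) {
            return false;   // try again on the next pass
        }
    }
}
```

Calling runOnce repeatedly mirrors the behaviour described above: while the resource manager is down nothing happens, and once it is back the prepared branch is completed and no longer reported by recover().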

Design

The JCA spec includes the ability for the transaction manager to retrieve an XA Resource from the adapter, which represents a resource which understands commands like start, end, prepare, commit, rollback and recover. The challenge is to use that fact to create a resource adapter that can call services which have the ability to be committed and rolled back, in a similar fashion to a remote database engine. If we make a few assumptions we can define a simplified contract which such services need to implement, so that we can bind them into distributed transactions.

Consider for example NFR-22 and MMF, the acquirer system. Typically payment systems let you reserve money and then shortly afterwards book the reservation. The idea is that you call their service to ensure there are funds available and you reserve some money, you then complete all your business transactions on your side, and then definitely book the reserved money once your data is committed (see the FAQ for an alternative). The reserving and definitely booking should take no more than a few seconds. Reserving and releasing the reservation in the event of a crash should take no more than a few minutes. The same often goes for ticket booking systems in my experience where a ticket can be booked, and shortly afterwards confirmed, at a time when you are willing to take responsibility for its cost. I will refer to the initial stage as execution and the latter stage as commit. Of course if you cannot complete business on your side, an alternative to stage two, namely rollback, can be run in order to cancel the booking. If you aren’t friendly enough to rollback and you just leave the reservation open the provider should eventually let the reservation timeout so that the reserved “resource” (money or tickets in this example) can be used elsewhere, for example so that the customer can go shopping or the ticket can be bought by someone else. What is the motivation for this timeout? Three phase commit and the experience of the author that back end systems like ticket booking systems and payment systems only reserve resources like tickets and money for a limited amount of time.

It is that kind of system that we want to bind into our transaction. The contract for such services looks as follows:

  1. The provider should make three operations available: execution, commit and rollback (although commit is actually optional, see footnote #1),
  2. The provider may let non-committed and non-rolled-back executions time out, after which any reserved resources may be used in other transactions,
  3. A successful execution guarantees that the transaction manager is allowed to commit or roll back the reserved resources, as long as no timeout has occurred (see footnote #2),
  4. A call to commit or roll back the reservation can be made multiple times without side effects (think about idempotency here), so that the transaction manager may finish the transaction if an initial attempt failed.

Footnote #1: Sometimes web services offer an execution operation and an operation for cancelling the call, e.g. so that money is indeed not taken from the customer’s account. But they don’t offer an operation for committing the execution. If we go back to the discussion around listing 2 where I stated that the transactions are eventually consistent rather than immediately consistent, it becomes clear that it doesn’t matter if a system in a global transaction definitely books resources during the execution stage rather than waiting until the commit stage. Eventually, either all systems will also commit, or all will roll back and the money transaction will be cancelled, freeing up reserved funds on the customer’s account. Note however that a service offering all three operations is cleaner, and if it is possible to influence the system design, it is recommended to ensure the services being integrated offer all three operations: execute, commit and rollback.

Footnote #2: After a successful call to the execute operation, a web service may not refuse to commit or roll back the transaction due to, say, business rules. It may only temporarily fail due to technical problems, in which case the transaction manager may attempt completion again shortly afterwards. It is not acceptable to build business rules into the commit or rollback operations. All validation must be completed during the execution, i.e. before commit or rollback time. The same is true in the database world: during XA transactions the database must check all constraints at the latest during the prepare stage, i.e. definitely before the commit or rollback stage.
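
As an illustration of the contract, here is a minimal in-memory sketch of a service that follows it. ReservationService and its method names are hypothetical and not part of any of the systems mentioned above; the current time is passed in explicitly only to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory service following the execute/commit/rollback contract. */
class ReservationService {
    enum State { RESERVED, BOOKED, CANCELLED }

    private final Map<String, State> reservations = new HashMap<>();
    private final Map<String, Long> reservedAt = new HashMap<>();
    private final long timeoutMillis;

    ReservationService(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    /** Execution: all business validation happens here, never in commit or rollback. */
    void execute(String txId, int amount, long now) {
        if (amount <= 0) throw new IllegalArgumentException("amount must be positive");
        reservations.put(txId, State.RESERVED);
        reservedAt.put(txId, now);
    }

    /** Commit: idempotent, so the transaction manager may safely retry. */
    void commit(String txId, long now) {
        expireIfTimedOut(txId, now);
        State s = reservations.get(txId);
        if (s == State.BOOKED) return;                       // already committed
        if (s != State.RESERVED) {
            throw new IllegalStateException("cannot commit " + txId + " in state " + s);
        }
        reservations.put(txId, State.BOOKED);
    }

    /** Rollback: idempotent; it changes the state rather than deleting the record. */
    void rollback(String txId) {
        State s = reservations.get(txId);
        if (s == null || s == State.CANCELLED) return;       // nothing left to cancel
        if (s == State.BOOKED) {
            throw new IllegalStateException("cannot roll back booked " + txId);
        }
        reservations.put(txId, State.CANCELLED);
    }

    State stateOf(String txId) { return reservations.get(txId); }

    /** Rule 2 of the contract: an unfinished reservation may time out. */
    private void expireIfTimedOut(String txId, long now) {
        Long since = reservedAt.get(txId);
        if (since != null && reservations.get(txId) == State.RESERVED
                && now - since > timeoutMillis) {
            reservations.put(txId, State.CANCELLED);
        }
    }
}
```

Calling commit or rollback twice for the same transaction ID leaves the state unchanged, which is what allows the transaction manager to retry after a technical failure. A commit that arrives after the timeout fails, because the guarantee in rule 3 only holds as long as no timeout has occurred.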


Let’s compare using a contract like this to using a database. Take the acquirer web service: the money that is reserved during execution is really put to one side and is no longer available to other entities trying to create a credit card transaction. But the money also hasn’t been transferred to our account. There are three states: i) the money is in the customer’s credit; ii) the money is reserved and may not be used by other transactions; iii) the money is booked and is no longer available to the customer. This is analogous to a database transaction: i) a row has not yet been inserted; ii) the row is inserted, but not currently visible to other transactions (although that depends on the transaction isolation level); iii) finally the row is committed and visible to everyone. The two are similar but not identical: the transaction which reserves money in the web service is immediately visible to the entire world once the execution stage has completed. It does not remain invisible until after the commit stage, as is the case with the database. The isolation level is different, but of course the web service can be built to hide such information, for example based on the state, if the requirements call for it.

With a contract like this, there are several fundamental differences to the way in which WS-AT and two phase commit are designed. Firstly transactions encapsulated inside a web service are NOT kept open between execution and commit/rollback. Secondly, because the transaction isn’t kept open, resources are not locked, as they might be when using databases. And these two differences lead to a third: rolling back a web service call is normally not about undoing what it did, rather about changing the state of what it did so that from a business point of view, resources again become available.

These differences are what give the generic connector the advantage over traditional two phase commit. What is really going on in this connector is that we are piggy-backing onto the distributed transaction, cherry-picking the best parts, namely execution, commit, rollback and recovery. By doing this in a Java EE application server, we get transaction management for free!

A final stage, namely recovery (see listings 1 & 2) is required, but it does not necessarily need to be implemented by the web service because the adapter can handle that part internally – after all it knows about the state of the transactions since it has been making calls to the web service.

So, with the above assumptions, we can build a generic JCA resource adapter which tracks transaction state and calls the commit/rollback operations on web services at the correct time, when the transaction manager tells the XA resource to do things like start a transaction, execute some business code and commit or rollback the transaction.
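
To illustrate the idea, here is a minimal sketch of an XAResource (the standard javax.transaction.xa interface) which maps the XA commands onto commit and rollback callbacks. This is not the code of the published adapter: CallbackXAResource and SimpleXid are hypothetical names, the state is only kept in memory rather than on disk, and the handling of the recover scan flags is simplified.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/** Minimal Xid implementation, used only to drive the sketch. */
class SimpleXid implements Xid {
    private final byte[] gtrid;
    private final byte[] bqual;
    SimpleXid(byte[] gtrid, byte[] bqual) { this.gtrid = gtrid; this.bqual = bqual; }
    @Override public int getFormatId() { return 1; }
    @Override public byte[] getGlobalTransactionId() { return gtrid; }
    @Override public byte[] getBranchQualifier() { return bqual; }
}

/** Sketch only: an XAResource that delegates commit and rollback to callbacks. */
class CallbackXAResource implements XAResource {
    private final Consumer<String> commitCallback;
    private final Consumer<String> rollbackCallback;
    /** Prepared but unfinished branches (a real adapter would persist these). */
    private final Map<String, Xid> unfinished = new ConcurrentHashMap<>();

    CallbackXAResource(Consumer<String> commitCallback, Consumer<String> rollbackCallback) {
        this.commitCallback = commitCallback;
        this.rollbackCallback = rollbackCallback;
    }

    /** Builds the transaction ID string which is handed to the web service. */
    static String toTxId(Xid xid) {
        StringBuilder sb = new StringBuilder();
        for (byte b : xid.getGlobalTransactionId()) sb.append(String.format("%02x", b));
        sb.append('-');
        for (byte b : xid.getBranchQualifier()) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    @Override public void start(Xid xid, int flags) { /* business calls follow */ }
    @Override public void end(Xid xid, int flags) { /* business calls are done */ }

    @Override public int prepare(Xid xid) {
        unfinished.put(toTxId(xid), xid);   // execution succeeded, so vote "yes"
        return XAResource.XA_OK;
    }

    @Override public void commit(Xid xid, boolean onePhase) throws XAException {
        finish(xid, commitCallback);
    }

    @Override public void rollback(Xid xid) throws XAException {
        finish(xid, rollbackCallback);
    }

    private void finish(Xid xid, Consumer<String> callback) throws XAException {
        String txId = toTxId(xid);
        try {
            callback.accept(txId);          // call the web service
            unfinished.remove(txId);        // forget only after a successful call
        } catch (RuntimeException e) {
            throw new XAException(XAException.XAER_RMFAIL);  // lets the TM retry later
        }
    }

    @Override public Xid[] recover(int flag) {
        // Simplification: report everything at the start of a scan, nothing afterwards.
        if ((flag & XAResource.TMSTARTRSCAN) == 0) return new Xid[0];
        return unfinished.values().toArray(new Xid[0]);
    }

    @Override public void forget(Xid xid) { unfinished.remove(toTxId(xid)); }
    @Override public boolean isSameRM(XAResource other) { return other == this; }
    @Override public int getTransactionTimeout() { return 0; }
    @Override public boolean setTransactionTimeout(int seconds) { return false; }
}
```

If a callback throws, the branch stays in the unfinished map and is reported again by recover, so the transaction manager can attempt completion later. That only works because the contract requires the commit and rollback operations of the service to be idempotent.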

Applicability to Microservice Architectures

Microservice architectures or indeed SOA have one noteworthy problem when compared to monolithic applications, namely that it is hard to keep data consistent in a distributed system. A microservice will typically provide operations for doing work, but should also offer operations to cancel that work. The work doesn’t need to be made invisible, but it does need to be cancelled as far as the business is concerned, so that no more resources (money, time, human effort, etc.) are invested in the work. The adapter presented here can be used inside an “application layer”, i.e. a service which your clients (mobile, rich web clients, etc.) call. That layer should be deployed in a Java EE application server and make use of the generic connector each time one of the microservices in your landscape is called. That way, if something fails, all the microservice calls can be “rolled back” by the transaction manager. The point of the application layer is to control the global transaction so that anything which needs to be done consistently can be monitored and coordinated by the transaction manager, rather than say calling each microservice directly from the client and then having to write code which cleans up and restores consistency.

Using the Adapter

The first requirement I gave myself was to build an API which allows you to add business calls to a web service, inside an existing transaction. Listing 3 shows an example of how to bind the web service call into a transaction using Java 8 lambdas (even though the API is compatible with Java 1.6 – see Github for an example).

@Stateless
public class SomeServiceThatBindsResourcesIntoTransaction {

  @Resource(lookup = "java:/maxant/BookingSystem")
  private TransactionAssistanceFactory bookingFactory;
...
  public String doSomethingInvolvingSeveralResources(String refNumber) {
...
    BookingSystem bookingSystem = new BookingSystemWebServiceService()
                                        .getBookingSystemPort();
...
    try ( ...
      TransactionAssistant bookingTransactionAssistant = 
                                bookingFactory.getTransactionAssistant();
... ) {
      //NFR-34 write sales data to Oracle using JDBC and XA-Driver
      ...

      //NFR-08 book tickets
      String bookingResponse = 
          bookingTransactionAssistant.executeInActiveTransaction(txid -> {

        return bookingSystem.reserveTickets(txid, refNumber);
      });
...
      return response;
    } catch (...) {...}
...
  }
...

Listing 3: Binding a web service call into a transaction

Line 1 designates the class as an EJB which by default uses container managed transactions and requires a transaction to be present on each method call, starting one if none exists. Lines 4-5 ensure that an instance of the relevant class of the resource adapter is injected into the service. Line 9 creates a new instance of a web service client. This client code was generated using wsimport and the WSDL service definition. Lines 13-14 create the “transaction assistant” which the resource adapter makes available. The assistant is then used on line 21 to run line 23 within the transaction. Under the hood, this sets up the XA resource which the transaction manager uses to commit or rollback the connection. Line 23 returns a String which sets the String on line 20 synchronously.

Compare this code to writing to a database: lines 4 and 5 are like injecting a DataSource or EntityManager; lines 9 and 13-14 are similar to opening a connection to the database; finally lines 21-23 are like making a call to execute some SQL.

Line 23 doesn’t do any error handling. If the web service throws an exception it leads to the transaction being rolled back. If you decide to catch such an exception you need to remember to either throw another exception such that the container rolls back the transaction, or you need to set the transaction to roll back by calling setRollbackOnly() on the session context (the demo code on Github shows an example where it catches an SQLException).

So, the overhead of binding a web service call into a transaction is very small and similar to executing some SQL on a database. Importantly the commit or rollback is not visible in the application code above. However we do still need to show the application server how to commit and rollback. This is done just once per web service, as shown in listing 4.

@Startup
@Singleton
public class TransactionAssistanceSetup {
...
  @Resource(lookup = "java:/maxant/BookingSystem")
  private TransactionAssistanceFactory bookingFactory;
...
  @PostConstruct
  public void init() {
    bookingFactory
      .registerCommitRollbackRecovery(new Builder()
      .withCommit( txid -> {
        new BookingSystemWebServiceService()
          .getBookingSystemPort().bookTickets(txid);
      })
      .withRollback( txid -> {
        new BookingSystemWebServiceService()
          .getBookingSystemPort().cancelTickets(txid);
      })
      .build());
...
  }

  @PreDestroy
  public void shutdown(){
        bookingFactory.unregisterCommitRollbackRecovery();
...
  }

Listing 4: One time registration of callbacks to handle commit and rollback

Here, lines 1-2 tell the application server to create a singleton and to do it as soon as the application starts. This is important so that if the resource adapter needs to recover potentially incomplete transactions, it can do so as soon as it is ready. Lines 5-6 are like those in listing 3. Line 11 is where we register a callback with the resource adapter, so that it gains knowledge of how to commit and roll back transactions in the web service. I have used Java 8 lambdas here also, but if you are using Java 6/7 you can use an anonymous inner class instead of the new builder on line 11. Lines 13-14 simply call the web service to book the tickets which were previously reserved, on line 23 of listing 3. Lines 17-18 cancel the reserved tickets, should the transaction manager decide to roll back the global transaction. Very importantly, line 26 unregisters the callback for the adapter instance when the application is shut down. This is necessary because the adapter only allows you to register one callback per JNDI name (web service) and if the application were restarted without unregistering the callback, line 11 would fail with an exception the second time that the callback is registered.
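
The rule of one callback per adapter instance can be pictured with a small sketch. Callbacks and CallbackRegistry are hypothetical names chosen for this illustration and say nothing about how the adapter is implemented internally; the sketch only shows why a second registration under the same name fails unless the first one has been unregistered.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical holder for the callbacks of one web service. */
class Callbacks {
    final Consumer<String> commit;
    final Consumer<String> rollback;

    Callbacks(Consumer<String> commit, Consumer<String> rollback) {
        this.commit = commit;
        this.rollback = rollback;
    }
}

/** Sketch of a registry that allows exactly one set of callbacks per ID. */
class CallbackRegistry {
    private final Map<String, Callbacks> byId = new ConcurrentHashMap<>();

    /** Fails if callbacks are already registered under the given ID. */
    void register(String id, Callbacks callbacks) {
        if (byId.putIfAbsent(id, callbacks) != null) {
            throw new IllegalStateException("callbacks already registered for " + id);
        }
    }

    /** Makes the ID available again, e.g. when the application is shut down. */
    void unregister(String id) { byId.remove(id); }

    boolean isRegistered(String id) { return byId.containsKey(id); }
}
```

Registering twice under the same ID without unregistering in between throws an IllegalStateException, which corresponds to the redeployment problem described above.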

As you can see, binding a web service, or indeed anything which does not naturally support transactions, into a JTA global transaction is very easy using the generic adapter that I have created. The only thing left is to configure the adapter so that it can be deployed together with your application.

Adapter Configuration

The adapter needs to be configured once per web service which it should bind into transactions. To make that a little clearer, consider the code in listing 4 for registering callbacks for commit and rollback. Only one callback can be registered per adapter instance, i.e. per JNDI name. Configuring the adapter is application server specific, but only because of where you put the following XML. In JBoss EAP 6 / WildFly 8 upwards, it is put into <jboss-install-folder>/standalone/configuration/standalone.xml, between the XML tags similar to <subsystem xmlns="urn:jboss:domain:resource-adapters:...>

<resource-adapters>
  <resource-adapter id="GenericConnector.rar">
    <archive>
      genericconnector-demo-ear.ear#genericconnector.rar
    </archive>
    <transaction-support>XATransaction</transaction-support>
    <connection-definitions>
      <connection-definition 
          class-name=
            "ch.maxant.generic_jca_adapter.ManagedTransactionAssistanceFactory" 
          jndi-name="java:/maxant/BookingSystem" 
          pool-name="BookingSystemPool">
        <config-property name="id">
          BookingSystem
        </config-property>
        <config-property name="handleRecoveryInternally">
          true
        </config-property>
        <config-property name="recoveryStatePersistenceDirectory">
          ../standalone/data/booking-tx-object-store
        </config-property>
        <xa-pool>
          <min-pool-size>1</min-pool-size>
          <max-pool-size>5</max-pool-size>
        </xa-pool>
        <recovery no-recovery="false">
          <recover-credential>
            <user-name>asdf</user-name>
            <password>fdsa</password>
          </recover-credential>
        </recovery>
      </connection-definition>
      ... one connection-definition per registered commit/rollback callback
    </connection-definitions>
  </resource-adapter>
</resource-adapters>

Listing 5: Configuring the generic resource adapter

Listing 5 starts with the definition of a resource adapter on lines 2-35. The archive is defined on line 4 – note the hash symbol between the EAR file name and the RAR file name. Note that you may also need to stick the Maven version number in the RAR file name. It depends upon the physical file in your EAR and app servers other than JBoss may use different conventions. Line 6 tells the application server to use the XAResource from the adapter so that it is bound into XA transactions. Lines 8-32 then need to be repeated for each web service which you want to integrate. Lines 9 and 10 define the factory which the resource adapter provides and this value should always be ch.maxant.generic_jca_adapter.ManagedTransactionAssistanceFactory. Line 11 defines the JNDI name used to lookup the resource in your EJB. Line 12 names the pool used for the connection definition. It is recommended to use a unique name per connection definition. Lines 13-15 define the ID of the connection definition. You must use a unique name per connection definition. Lines 16-18 tell the resource adapter to track transaction state internally so that it can handle recovery without help from the web service which is being integrated. The default value is false, in which case you must register a recovery callback in listing 4 – see listing 6 below. Lines 19-21 are required if the resource adapter is configured to handle recovery internally – you must provide the path to a directory where it should write the transaction state which it needs to track. It is recommended to use a directory on the local machine where the application server is running, rather than one located on the network. Lines 22-31 are required for JBoss so that it really does use the XAResource and bind calls into the global transaction. It is possible that other application servers only require line 6 – the deployment to other application servers has not yet been fully tested (more details…).

Recovery

Until now I haven’t said much about recovery. Indeed, the handleRecoveryInternally attribute in the XML in listing 5 means that the application developer doesn’t really need to think about recovery. Yet if we return to listing 2, recovery is a clear part of the two phase commit protocol. Indeed Wikipedia states that “To accommodate recovery from failure (automatic in most cases) the protocol’s participants use logging of the protocol’s states. Log records, which are typically slow to generate but survive failures, are used by the protocol’s recovery procedures.” The participants are the resource managers, e.g. the database or web service, or perhaps the resource adapter if you wish to interpret it so. I have to be honest that I don’t fully understand why the transaction manager cannot do this instead. So that the resource adapter is more flexible, but also in case you are not allowed to let the adapter write to the file system (operations management departments in big corporations tend to be strict like this), it is also possible to provide the resource adapter with a callback so that it can ask the web service for an array of transaction numbers which the web service feels are in an incomplete state. Note, if the adapter is configured as above, then it tracks the state of the calls to the web service itself. The information that the web service’s commit or rollback method was called is saved to disk after a successful response is received. If the application server crashes before the information can be written it isn’t so tragic, since the adapter will tell the transaction manager that the transaction is incomplete, and the transaction manager will attempt to commit or roll back using the web service once again.
Since the web service contract defined above requires that the commit and rollback methods may be called multiple times without causing problems, it should be absolutely no problem when the transaction manager then attempts to re-commit or re-rollback the transaction. That leads me to state that the only reason you would want to register a recovery callback is that you are not allowed to let the resource adapter write to disk. But I should state that I do not fully understand why XA requires the resource manager to provide a list of potentially incomplete transactions, when surely the transaction manager is able to track this state itself.
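
The kind of bookkeeping involved when state is tracked on disk can be sketched as follows. FileTransactionLog is a hypothetical class, and the file naming exec.<txId>.txt is an assumption made for this sketch rather than the exact format used by the adapter. The point is the ordering: the marker file is removed only after the web service has confirmed the commit or rollback, so a crash in between merely leads to a harmless repeated call.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Sketch of file based tracking of potentially incomplete transactions. */
class FileTransactionLog {
    private final Path dir;

    FileTransactionLog(Path dir) throws IOException {
        this.dir = Files.createDirectories(dir);
    }

    /** Records that execution was attempted; the branch may now be incomplete. */
    void markExecuted(String txId) throws IOException {
        Path file = fileFor(txId);
        if (!Files.exists(file)) Files.createFile(file);
    }

    /** Called only after the web service has confirmed the commit or rollback. */
    void markFinished(String txId) throws IOException {
        Files.deleteIfExists(fileFor(txId));
    }

    /** Transaction IDs which still need a commit or rollback. */
    List<String> unfinished() throws IOException {
        List<String> ids = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "exec.*.txt")) {
            for (Path p : stream) {
                String name = p.getFileName().toString();
                ids.add(name.substring("exec.".length(), name.length() - ".txt".length()));
            }
        }
        return ids;
    }

    private Path fileFor(String txId) {
        return dir.resolve("exec." + txId + ".txt");
    }
}
```

During recovery, the list returned by unfinished() is what would be reported to the transaction manager as potentially incomplete.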

Setting up recovery so that the adapter uses the web service to query transactions which it thinks are incomplete, involves first setting the handleRecoveryInternally attribute in the deployment descriptor to false (after which you do not need to supply the recoveryStatePersistenceDirectory attribute) and second, adding a recovery callback, as shown in listing 6.

@Startup
@Singleton
public class TransactionAssistanceSetup {

  @Resource(lookup = "java:/maxant/Acquirer")
  private TransactionAssistanceFactory acquirerFactory;
...
  @PostConstruct
  public void init() {
    acquirerFactory
      .registerCommitRollbackRecovery(new Builder()
      .withCommit( txid -> {
...
      })
      .withRollback( txid -> {
...
      })
      .withRecovery( () -> {
        try {
          List<String> txids = new AcquirerWebServiceService().getAcquirerPort().findUnfinishedTransactions();
          txids = txids == null ? new ArrayList<>() : txids;
          return txids.toArray(new String[0]);
        } catch (Exception e) {
          log.log(Level.WARNING, "Failed to find transactions requiring recovery for acquirer!", e);
          return null;
        }
      }).build());
...    

Listing 6: Defining a recovery callback

Registering a recovery callback is done next to the registration of the commit and rollback callbacks that were set up in listing 4. Lines 18-26 of listing 6 add a recovery callback. Here, we simply create a web service client and call the web service to get the list of transaction IDs which should be completed. Any errors are simply logged, as the transaction manager will soon come by and ask again. A more robust solution might choose to inform an administrator if there is an error here, because firstly errors shouldn’t occur here and secondly the transaction manager calls this callback from a background task, where no user will ever be shown an error. Of course, if the web service is currently unavailable, the transaction manager will receive no transaction IDs, but the hope is that the next time it tries (roughly every two minutes in JBoss WildFly 8), the web service will again be available.

Tests

To test the adapter, a demo application based on the scenario described at the start of this article was built (also available on Github) which calls three web services and writes to the database twice, all during the same transaction. The acquirer supports execution, commit, rollback and recovery; the booking system supports execution, commit and rollback; the letter writer only supports execution and rollback. In the test, the process first writes to the database, then calls the acquirer, then the booking system, then the letter writer and finally it updates the database. This way, failure at several points in the process can be tested. The adapter was tested using the following test cases:

  • Positive test case – here everything is allowed to pass. Afterwards the logs, database and web services are checked to ensure that everything is indeed committed.
  • Failure at end of process due to database foreign key constraint violation – here the web services have all executed their business logic and the test ensures that after the database failure, the transaction manager rolls back the web service calls.
  • Failure during execution of acquirer web service – here, the failure occurs after an initial database insert, to check that the insert is rolled back.
  • Failure during execution of booking web service – here, the failure occurs after an initial database insert and the web service call to the acquirer, to check that both are rolled back.
  • Failure during execution of letter writer web service – here, the failure occurs after an initial database insert and two web service calls, to check that all three are rolled back.
  • During commit, web services are shut down – by setting breakpoints in the commit callbacks, we can undeploy the web services and then let the process continue. The initial commit on the web services fails, but the database is fine and the data is available. Once the web services are redeployed and running again, the transaction manager attempts the commit again, which should be successful.
  • During commit, the database is shut down – also using breakpoints, the database is shut down just before the commit. Commit works on the web services but fails on the database. Upon restarting the database, the next time the transaction manager runs a recovery it should commit the database.
  • Kill application server during prepare, before commit – here we check that nothing is ever committed.
  • Kill application server during commit – here we check that after the server is restarted and recovery runs, consistency across all systems is restored, i.e. that everything is committed.
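The first five cases share one expectation: whatever executed before the failure must end up rolled back, and nothing may be left half done. The toy model below captures just that expectation in plain Java. It is not the demo application and does not use the adapter or a transaction manager; the step names are labels I have chosen for the five stages of the process.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the test matrix: inject a failure at one step, observe the outcome of the rest. */
final class FailureInjectionSketch {

    // Order used in the demo process: database, acquirer, booking system, letter writer, database.
    static final List<String> STEPS = Arrays.asList(
            "db-insert", "acquirer", "booking", "letter", "db-update");

    /**
     * Runs the steps in order. If failAt names a step, that step throws instead of executing.
     * Returns the final outcome of every step that did execute.
     */
    static List<String> run(String failAt) {
        List<String> executed = new ArrayList<>();
        List<String> outcome = new ArrayList<>();
        try {
            for (String step : STEPS) {
                if (step.equals(failAt)) {
                    throw new IllegalStateException("injected failure in " + step);
                }
                executed.add(step);
            }
            // No failure: every participant is committed.
            for (String step : executed) {
                outcome.add(step + ":committed");
            }
        } catch (IllegalStateException e) {
            // Failure: every participant that already executed is rolled back.
            for (String step : executed) {
                outcome.add(step + ":rolled back");
            }
        }
        return outcome;
    }
}
```

For instance, run("booking") yields the database insert and the acquirer call, both rolled back, which is the expectation of the fourth test case.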

Results

The demo application and resource adapter log everything they do, so the first port of call is to read the logs during each test. Additionally, the database writes to disk, so we can use a database client to query the database state, for example: select * from person p inner join address a on a.person_FK = p.id; The acquirer writes to the folder ~/temp/xa-transactions-state-acquirer/. There, a file named exec*txt exists if the transaction is incomplete; otherwise it is named commit*txt or rollback*txt if it was committed or rolled back, respectively. The booking system writes to the folder <jboss-install>/standalone/data/bookingsystem-tx-object-store/. The letter writer writes to the folder <jboss-install>/standalone/data/letterwriter-tx-object-store/. The adapter removes the temporary file named exec*txt once the transaction is committed or rolled back, so the only way to verify the outcome is to read the adapter logs. Checking that the files have been removed still makes sense, although it doesn’t tell you whether there was a commit or a rollback.
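Checking the acquirer's folder by hand gets tedious over many test runs. Here is a small sketch of a helper that counts the files by state, assuming only the naming scheme described above (exec*txt, commit*txt, rollback*txt); the class and method names are mine and not part of the demo application.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.EnumMap;
import java.util.Map;

/** Summarises the acquirer's state folder based on the file naming scheme. */
final class TxStateFiles {

    enum State { INCOMPLETE, COMMITTED, ROLLED_BACK }

    /** Maps a file name to a state, or null if it does not follow the naming scheme. */
    static State classify(String fileName) {
        if (!fileName.endsWith("txt")) return null;
        if (fileName.startsWith("exec")) return State.INCOMPLETE;
        if (fileName.startsWith("commit")) return State.COMMITTED;
        if (fileName.startsWith("rollback")) return State.ROLLED_BACK;
        return null;
    }

    /** Counts the files in the given folder per state; unrelated files are ignored. */
    static Map<State, Integer> summarize(Path dir) throws IOException {
        Map<State, Integer> counts = new EnumMap<>(State.class);
        for (State s : State.values()) {
            counts.put(s, 0);
        }
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                State s = classify(file.getFileName().toString());
                if (s != null) {
                    counts.merge(s, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

After a test run, calling summarize on the acquirer folder (resolved under the user's home directory) and finding a non-zero INCOMPLETE count means recovery still has work to do.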

The results were all positive and as expected, although an ancient bug in MySQL provided a nice little challenge to overcome, which I will write about in a different article. If you have difficulty with your database, take a look at the JBoss manual, which provides tips on getting XA recovery working with different databases.

FAQ

  • The service I am integrating only offers an operation to execute and an operation to cancel. There is no commit operation. No worries – this is acceptable and is covered above, in the discussion of the contract that web services should fulfil. Basically, call the execute operation during normal business processing and the cancel operation only if there is a rollback. During the commit stage, don’t do anything, since data was already committed during the call to the execute operation.
  • What happens if a web service takes a long time to come back online, after a business operation is executed but before the commit/rollback operation has been called? Transactions that require recovery may end up in trouble if the web service takes a long time to come back online, because it is recommended that the systems behind the web service implement a timeout after which they clean up reserved but not booked (committed) resources. Take the example where a seat is reserved in a theatre during the execution but the final booking of the seat is delayed due to a system failure. It is entirely possible that the seat will be released after, say, half an hour so that it can be sold to other potential customers. If the seat is released and some time later the application server which reserved it attempts to book the seat, there could be an inconsistency in the system as a whole, as the other participants in the global transaction could be committed, indicating that the seat was sold and, for example, that money was taken for the seat, yet the seat has been sold to another customer. This case can also occur in normal two phase commit processes. Imagine a database transaction that creates a foreign key reference to a record, but that record is deleted in a different transaction. Normally the solution is to lock resources, which the reservation of the seat is actually doing. But indefinite locking of resources can cause problems like deadlocks. This problem is not unique to the solution presented here.
  • Why don’t you recommend WS-AT? Mostly because the world is full of services which don’t offer WS-AT support. And the adapter I have written here is generic enough that you could be integrating non-web service resources. But also because of the locking and temporal issues which can occur, related to keeping the transaction open between the execution and commit stages.
  • Why not just create an implementation of XAResource and enlist it into the transaction using the enlistResource method? Because doing so doesn’t handle recovery. The generic connector presented here also handles recovery when either the resource or the application server crashes during commit/rollback.
  • This is crazy – I don’t want to implement commit and rollback operations on my web services! WS-AT is for you! Or an inconsistent landscape…
  • I’m in a microservice landscape – can you help me? Yes! Rather than letting your client call multiple microservices and then having to worry about global data consistency itself, say in the case where one service call fails, make the client call an “application layer”, i.e. a service which is running in a Java EE application server. That service should make calls to the back end by using the generic connector, and that way the complex logic required to guarantee global data consistency is handled by the transaction manager, rather than by code which you would otherwise have to write.
  • The system I am integrating requires me to call its commit and rollback methods with more than just the transaction ID. You need to persist the contextual data that you use during the execution stage and use the transaction ID as the key with which you can then look up that data during commit, rollback or recovery. Persist the data using an inner transaction (for example an EJB method annotated with @TransactionAttribute(REQUIRES_NEW)) so that the data is definitely persisted before commit/rollback/recovery commences – this way it is failure resistant.
  • The system I am integrating dictates a session ID and does not take a transaction ID. See the previous answer – map the transaction ID to the session ID of the system you are integrating. Ensure that you do it in a persistent manner so that your application can survive crashes.
  • The payment system I am integrating executes the payment on their own website, but the “commit” occurs over an HTTP call. Can I integrate this? Yes! Redirect to their site to do the payment; when they call back to your site, run your business logic in a transaction and, using the transaction assistant, execute a no-op method in the execution stage, which will cause the commit callback to be called at commit time; in the commit callback, make the HTTP call to the payment system to confirm the payment.
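The FAQ answers about contextual data and session IDs both boil down to a durable map from transaction ID to whatever the back end needs at commit, rollback or recovery time. Here is a minimal sketch of such a store, assuming a flat properties file as the persistence mechanism purely to keep the example self-contained. In a real application this would more likely be a database table written in its own transaction, as described above. TxContextStore and its methods are hypothetical names, not part of the adapter.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Durable map from transaction ID to contextual data (e.g. a back end session ID). */
final class TxContextStore {

    private final Path file;

    TxContextStore(Path file) {
        this.file = file;
    }

    /** Call during the execution stage, before the back end is invoked. */
    synchronized void save(String txId, String context) throws IOException {
        Properties p = load();
        p.setProperty(txId, context);
        store(p);
    }

    /** Call from the commit, rollback or recovery callback; returns null if unknown. */
    synchronized String find(String txId) throws IOException {
        return load().getProperty(txId);
    }

    /** Call once the transaction has been completed on the back end. */
    synchronized void remove(String txId) throws IOException {
        Properties p = load();
        p.remove(txId);
        store(p);
    }

    private Properties load() throws IOException {
        Properties p = new Properties();
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                p.load(in);
            }
        }
        return p;
    }

    private void store(Properties p) throws IOException {
        try (OutputStream out = Files.newOutputStream(file)) {
            p.store(out, "transaction ID -> context");
        }
    }
}
```

Because every read goes back to the file, a fresh instance created after a crash sees the same mappings, which is the property the commit and recovery callbacks rely on.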

Using the generic adapter in your project

To use the adapter in your application you need to do the following things:

  • Create a dependency on the ch.maxant:genericconnector-api Maven module,
  • Write code as shown in listing 3 to execute business operations on the web services that your application integrates,
  • Set up commit and rollback callbacks as shown in listing 4, and optionally a recovery callback as shown in listing 6,
  • Configure the resource adapter as shown in listing 5,
  • Deploy the resource adapter in an EAR by adding a dependency to the Maven module ch.maxant:genericconnector-rar and referencing it as a connector module in the application.xml deployment descriptor.

For more information, see the demo application.

Conclusions

The idea that I had, namely to bind web service calls into JTA transactions using a generic JCA resource adapter, does work. It eliminates the need to build your own transaction management logic and it ensures that there is consistency across the entire landscape, regardless of whether a transaction is committed or rolled back in the application code running in the Java EE application server.

Further Reading

A plain english introduction to CAP Theorem
Eventual consistency and the trade-offs required by distributed development
The hidden costs of microservices
Microservice Trade-Offs
Starbucks Does Not Use Two-Phase Commit

Copyright ©2015, Ant Kutschera, with thanks to Claude Gex for his review.