<< Previous | Home

Parkinsons Law applied to Software Projects

While recently listening to a BBC podcast I learned about Parkinson's law and realised how it simply explains why traditional software project which work to deadlines, fail so often. Simply put, the law states "work expands so as to fill the time available for its completion". This year I had to help work on an estimate used to promise a delivery date to upper management. We use SAFe at this organisation (where I work as a system architect in a troika managing an agile release train), and did an agile estimate, so it was based on previous features and we tried to estimate just their complexity rather than say hours, and we added factors to handle risks and added additional reserves for other unknowns. The date we named was roughly 6 months ahead of the estimation date, based on work done in the 18 months prior. That date was communicated both to our 8 software development teams, and upper management. So the expectation of when to finish had been set. According to my interpretation of Parkinson's law, it doesn't matter how much work there really is, because if there isn't enough time you trim optional features or quality that nobody is really counting on.  On the other hand, if there is too much time, you do things like write more automated tests or little scripts here and there, those things you always wanted to do but never had time to do. I've seen our various teams doing these things, and sometimes both, depending on where they were in the timeline.

Note that the way we implement SAFe means that we don't micro-manage the work done sprint by sprint, rather we let the teams work, we sync once a week to get information about progress, and then wait for the end of the 3-month programme increment (PI) when we check over the PI objectives and overall progress. But anything that wasn't completed, spills over into the next PI, normally with further refinements before the PI-planning in order to better understand what is still left.

Now, two months prior to the deadline, we appear to be heading for roughly an on schedule delivery, although we have started to trim features down to the bear minimum, and we are now suddenly finding more and more things we didn't think about - the devil is in the details, and much of it is only discovered as features are completed and tested, especially together with others which may have been finished for longer. That is quite typical in my experience - if you estimate well, and give teams plenty of time to complete the work, they take the time up front to do a good job, probably "over engineering" the solution.  But as time goes by, you pay for that as you get closer to the deadline, because time that was invested in that higher quality is then missing for tasks that weren't planned.

I've worked on two seriously micro-managed projects - one that I micro-managed as the architect & developer back in 2006 with a team of 3 other developers. The other was with a number of teams, probably about 20 developers, circa 2012. By cracking the whip micro-managing all the tasks, you can ensure that work done early in the project is still finished early, leaving room for the unplanned things that always turn up later in the project. Were either of them actually successful? That depends on how you measure success - they were certainly feature complete, on time, on budget and they appeared to have the required quality.  However both left the impression on those who took care of the work after the project (me included), that they weren't very well engineered. In the first one, we had a lot of duplicate code and didn't take the time to factor out common parts (code / designs). In the second one, we hadn't had the time to make allowances for future projects which were just around the corner. Both can be classed as technical debt. Both can also be classed as problems that should be adressed by future projects, if and when they come (ever worked on solutions which don't make progess because your'e too busy engineering for the future?)

An alternative is to use feature toggles and actually release, when you (everyone: the teams, the business, the users) are ready for it. Often the business will give reasons why this isn't realistic, for example they need to market the project and organise a launch, maybe organising expensive PR or training for users, or doing final acceptance tests, or needing to migrate an entire inventory of legacy data at the time of the release. For me, those are reasons that don't exist in 2021. Big software enterprises (think of the sites we are confronted with daily) don't launch apps like this, rather they go for soft / viral launches. Training is done with self-taught videos and by embedding context relevant tips and tricks into the application and gamifying it to encourage users to discover new functionality. Acceptance testing is something we do on a daily basis anyway, with our continous deployment and releasing. Regarding data migrations - just expose the data so it is usable and decouple the migration and split it, so that it can be partially migrated step by step, mitigating risks associated with big bang releases. It's only really the launch party which suffers if there are delays 😜

One thing is for sure - I stated at the time that I was dead against naming a deadline, and that I never want to do it again. Next time anyone suggests it, I am going to quote Parkinson's law at them!

Copyright ©2021, Ant Kutschera

Social Bookmarks :  Add this post to Slashdot    Add this post to Digg    Add this post to Reddit    Add this post to Delicious    Add this post to Stumble it    Add this post to Google    Add this post to Technorati    Add this post to Bloglines    Add this post to Facebook    Add this post to Furl    Add this post to Windows Live    Add this post to Yahoo!

Eleven Patterns, Problems & Solutions related to Microservices and in particular Distributed Architectures

The wish to fulfil certain system quality attributes lead us to choose microservice architectures, which by their very nature are distributed, meaning that calls between the services are effectively calls to remote processes. Even without considering microservices, we distribute software in order to meet scalability and availabilty requirements, which also causes calls to be remote. By choosing to support such quality attributes, we automatically trade off others, typically data consistency (and isolation), because of CAP theorem. In order to attempt to compensate for that, so that data eventually becomes consistent, we introduce mechanisms to retry remote calls in cases where they fail, for example using the transactional outbox pattern (based either on CDC to tail changes to the command table, or building a polling mechanism such as here). This in turn forces us to make our interfaces idempotent, so that duplicate calls do not result in unwanted side effects. Retry mechanisms can also cause problems related to the ordering of calls, for example if the basis for a call has changed, because while the system was waiting to recover before retrying the call again, another call from a different context was successful. Ordering problems might also occur simply because we add concurrency to the system to improve performance. Data replication is another mechanism that we might choose in order to increase availability or performance, so that the microservice in hand does not need to make a call downstream, and that downstream service does not even need to be online at the same time. Replication can however also be problematic, if changes to the master have not yet been replicated. Replication, ordering and idempotency all have an effect on system complexity, clarity, reproducibility and our ability to undertand and reason about what is happening, or indeed did happen. Mostly, the effect of eventual consistency on isolation is not even addressed in microservice landscapes. At the end of this article, I have created a diagram which sums up these dependencies.

As you can see, there are a number of technical challenges that are introduced, just because a system is distributed.  I recently undertook some self-study training and built "The Milk Factory" which is an attempt at building a event driven microservice architecture based on MVC across the entire landscape, that is, in backend services too. I did this because I wanted to explore the effects of an Event Driven Architecture (EDA) as well their reactive nature. In theory they are more loosely coupled, for example from a temporal point of view, in that the microservices do not all need to be online at the same time as is the case with a hierarchy of services communicating with synchronous HTTP calls.  Not only does the asynchronous nature of an EDA improve availability, but it also has a positive effect in regard to back pressure, because a service that is already struggling with load will only fetch more work from the message infrastructure when it has the capacity to deal with it, rather than being bombarded with requests e.g. via HTTP, from upstream services that have no idea that it is struggling to cope with the load.  While an EDA is often cited as also being more scalable, the question is, with regard to what? I do not address this point further, because HTTP based solutions built on cloud services like Kubernetes also scale well.

A Quick Note on MVC and Event Driven Architectures

In 2008 I wrote a blog article in which I posted the following picture:

Note the clockwise nature of the arrows which is paramount to making MVC work well. Facebook came up with Flux a few years later which "is a system architecture that promotes single directional data flow through an application" [1]. While out running one day I came up with the idea of how to apply such a pattern to the entire landscape:

The Milk Factory Architectural Concept

The diagram above was the basis for my architecture and over the course of several months I developed a small set of Quarkus based microservices to implement a number of business processes around the sale and billing of contracts for ordering bespoke milkshakes. Consider the following diagram which shows the process used by The Milk Factory to create or modify an offer.

The first step is to send a request from the browser to the backend, implemented using an HTTP call (1). The contract service is responsible for the contract and it's contents, so called "components" which are modelled as a tree containing the recipe for the milkshake, but also it's packaging. Each component, say the milk, can have a configuration containing attributes like the fat content of the milk. The components and their configurations can then be referenced by rules in other microservices like the pricing microservice which will contain the rules used to provide a price for each component. The sum of a component's children makes its base price, which can be affected by rules too. Ultimately the root component contains the total price for the contract.  The contract microservice deals with the request made by browser, updates its state in the database, and produces messages in Kafka which downstream microservices can consume. These can either be in the form of commands or events. A command is typically sent from one microservice to another, to get some explicit work done by the receiver, for example to create a human task in the workflow system. An event is typically a simple message to the landscape, that something has happened, for example that the price of an offer was calculated. Microservices may choose to react to such events and may then produce further commands and/or events.

The next step (2) is an example of a command message, where the business logic in the contract service decides that a human task needs to be created, for example to do a background check on a potential customer. This command is consumed and processed by the workflow service which produces an event (3). This event is consumed by the "web" service which is a backend microservice, whose responsibility is to send the events that it receives to the relevant browsers (4). It does this using SSE in lieu of a WebSocket, because often SSE fulfils security requirements better than the latter. In order to be able to send messages to the correct browser instance, a session ID is propagated throughout the landscape with all the messages sent over Kafka or indeed HTTP (in both cases as a header). Individual browsers start a session with that backend service, some time before they make requests to other microservices. Note that this web backend service can be scaled, as long as individual instances subscribe to all Kafka partitions using a unique group ID, so that it does not matter to which instance a specific browser is attached.

At the same time as creating the command to create a human task, the contract service also produces an event to say that it has created or modified a contract (5). Although not shown explicitly on the diagram, this event is also sent to the browser, which can start to render the contract and its components.  The DSC service handles discounts, surcharges and conditions and contains business rules which decide if a discount or surcharge applies and it attaches conditions as required.  It consumes contract events (5) and publishes an event after persisting its data (6). This event is also sent to the browser, although not shown directly on the diagram, so that the browser can append such information to the rendered contract.

The pricing service consumes messages from the DSC service (6) and together with the contract data that was passed on in that same event, it is able to calculate the price of the contract. If say a technical exception were to occur, this could result in the service sending the original message to a "waiting room" service (7) which will forward the message back to the pricing service a short time later, so that it can be retried (8). Assuming the technical problem has been resolved, the pricing service can then recalculate the prices and publish them (9). These too make their way to the browser (10), which can now complete the rendering process by attaching a price and say VAT information to the offer. The user can now continue to modify the offer, or indeed choose to accept it and complete the sale.

Note that all attempts to write data are done using Kafka as the transport mechanism, rather than HTTP, except for the very first one from the browser. This is a design choice made primarily so that downstream services are not unknowingly bombarded with HTTP calls, but also so that the retry mechanism described above can be employed by all services, rather than just an orchestrator at the top of the call hierarchy.

While implementing The Milk Factory, I encountered a number of challenges, which are now listed together with potential solutions.

1. Synchronous albeit Asynchronous

Having a synchronous flow of calls makes the path through the landscape easier to understand, reflect upon and reason about. The term "synchonous" also makes us think of a coordinated and deterministic sequence of steps. This can be achieved by blocking the user from doing more than one thing at a time and waiting for all the relevant responses to arrive back at the browser before letting them continue to work. More precisely, we block the user from doing more than one action at a time (e.g. changing the configuration of a component in the contract), but we let events flow back to the browser as they happen, updating the relevant part of the screen accordingly (e.g. the changes to the contract details; discounts/conditions changed by business rules; prices; a list of human tasks). By updating the screen as events occur, the user is not blocked from viewing the contract until everything is ready, and this allows them to visualise and process the data as early as possible. But at the same time, we don't want the user to start another action, before the model has become globally consistent.

2. Maintaining Order

When building an event driven architeture, it is necessary to design the queues / topics over which messages will flow. One option is to have a Kafka topic per microservice and type of data it publishes, for example a topic for discounts and surcharges, and a second topic for conditions, although these data are all published by the DSC component. Another option is to create a single Kafka topic for a whole group of related data.  I chose to allow the contract, DSC and pricing services to use the same topic, and by using the contract ID as the Kafka record key, this ensures that the order of the records is maintained for a given contract. The browser will never receive pricing data before it receives the contract or discount data to which the price belongs. The same is true for data related to billing, partners, workflow (human tasks), etc.

By making the order in which records arrive determinstic, we reduce the system complexity and make it easier to reason about what is happening.  The trade off that we need to make is that of concurrency and the ability to retry in the case of failure. We cannot have a deterministic order, and have two services process data related to one contract at the same time - in the case of The Milk Factory, this is OK because we need to let the DSC service run its business rules before we can calculate prices. The functional dependencies coming from business requirements automatically limit the amount of concurrency we can add to the system. We also cannot have a deterministic order, if we allow a retry mechanism to send data to "the back of the queue" in order for it to be attempted later.

As an alternative we could block all messages after the current failing one, until it can be processed, perhaps under the assumption that if the current message cannot be processed due to technical difficulties, then none of the others will be able to be processed either.  But is it correct to block all other attempts, even those from other users related to other contracts, and effectively block all users and torpedo the system availability? Not if you choose availability over consistency, which is one of the reasons to distribute the services.

Ordering rears its ugly head in other circumstances too. Imaging a billing system that needs up to date partner and contract data in order to send a bill. If while designing the system we made the assumption that the partner data will be available before we process a contract, but the ordering is not guaranteed because e.g. different topics are used to transport the data, the billing system will end up in an erroneous state if partner data is missing. It can get even worse, e.g. the partner data is stale because an up to date address has not yet been received at the time that a contract message arrives. By guaranteeing the order, and guaranteeing that partner data will be received before contract data, we have less problems to solve and the system is simpler.

More information is available here and here.

3. Error Propagation back to the Source

When using synchronous HTTP, errors are propagated back to the original caller, even over a hierarchy of remote calls (modern Java EE / Microprofile Libraries map HTTP error codes into exceptions and vice versa). This may lead to an inconsistent state from a global perspective, but the important point is that the source of the original call receives an error and if a user is involved, they are informed. An event based architecture should not behave differently. Traditionally message based systems dump messages that cannot be successfully processed into a dead letter queue / topic, but this only puts the problem on hold until an administrator gets involved. In The Milk Factory any errors are sent to a special Kafka topic named "errors", but this topic is subscribed to by the service which is connected to the browsers with SSE, and so the error can be propagated back to the correct browser and user, by means of the session ID (see above).  The user experience is no different than when an error occurs during synchronous HTTP. Problems related to global inconsistencies need to be solved anyway, in distributed landscapes, and suitable mechanisms are listed below.

4. Timeouts and Time-to-Live (TTL)

If the response to a synchronous HTTP call takes too long, a timeout typically occurs, and depending upon whether the problem is handled locally or not, local transactions may be rolled back and the error returned to the client.  Things get a little more complex, when a service which is supposed to process an event is unavailable, but the message has been uploaded to the message server.  In a system which processes everything in the background, this is not necessarily a problem, as it probably doesn't matter if the message is processed immediately, or in 5 hours (or perhaps even 5 days).  In a system where an end user is waiting for something to happen, they are likely to start to retry their action and if they are still blocked, they will probably move on to do something that uses different processes.  What should the system then do, when the service that was down becomes online and is able to process the messages? It is probably best to let it do that because this way, global data consistency can be achieved. There are however cases where it is not worth processing these messages or where doing so it more likely to lead to errors, because the chances are that other processes have already influenced the data upon which the current message is dependent.  It can be worth adding a time-to-live header to messages, after which they are simply disposed. Note that optimistic locking strategies like JPA's version column can help to ensure that the processing of messages which no longer match the underlying data are handled correctly, rather than simply letting the last message that is written "win" over all previous writes. Unfortunately, this topic is too complex to have a standard simple solution, and it really depends on your system and the data and processes being used and implemented, as to what the best solution is. Time-to-live mechanisms are still a handy tool in some cases.

5. Dependency Leaks and Encapsulation

Should a service be given all the data it depends on, to do it's processing, or should it be given the freedom to go and fetch this data from downstream services on which it then depends?  This question is related to the tell don't ask problem, but also depends on your domain, the size of your microservices and non-functional requirements related to performance and resilience.

Imagine that a service has already called a couple of other services and as such already has access to data, or at least it could have, if those services return it during those calls. You might be tempted to pass that data on to a third service if it depends on it, because this will save remote calls from the third service to the others, improving performance and resilience but also saving you having to write adapters in the third service in order to call the others, as well as perhaps cumbersome consumer driven contract tests.

Now imagine a domain in which you are selling products, and some process steps are product specific. If you had made the above optimisations, you will now be asking yourself which data is necessary to provide to the third service. It's dependencies start to leak out to the orchestrator which needs this knowledge. You might just have to provide all possible data, just in case the third service needs it, for example causing you to provide data relating to car insurance bonus protection, even though the product being sold is house insurance which doesn't support a bonus system.  It would actually be nice if the API could document and enforce what data must be provided, but as soon as it becomes product specific or as soon as it depends on other data, this becomes impossible without providing a specific interface, which too is a form of dependency leaking.  So you may choose to trade off the performance, resilience and maintenance wins against the dependency loss.

The problem can become more complex when building an event driven architecture.  If you choose to let services fetch data, do you want to allow them to do that with HTTP, or do you want to enforce all communication via messaging, so that you don't unknowingly bombard services (see above)? I relaxed this design constraint in The Milk Factory and allowed HTTP for reading data, because using messaging to read data is overly complicated - you need to send commands out, receive events as responses, store the pending request somewhere and join all the responses to that request as they come in. Using HTTP to read is much simpler, and if a read fails, we just use retry mechanisms that we need to implement anyway (see below).  Nonetheless, I still sent all of the data from the contract service to the DSC service, and both those sets of data to the pricing service (inside the DSC event), in order to reduce the number of fetches that are required. Every reduction in remote communication, reduces the number of places that errors can occur.  One downside of sending all of a services data on to others, is that it appears to increase the coupling, because now all attributes are available to downstream services. This may not actually be the case if they do not use all that data, but you don't easily know what fields are being used unless you have an overview of the entire landscape, so while initially the downstream service may only use a few, it's free to start using more. You could also reverse this situation and only publish attributes which you know are used (or might be used, depending on say which product is being sold). That way, the coupling is kept to a minimum, but requires exposing further fields, as new requirements with more dependencies are implemented.  The term "shared data model" has been used by colleagues to describe exposing data from one service to many, and been criticised for increasing coupling. While a common database for the landscape, as may have been the case in the 80-90s, is definitely such a shared model and bad, I would consider using a shared model with events, within the bounds of a well defined set of services which are closely coupled anyway.

6. Repair Processes

The moment that you build a distributed system without a global transaction manager, you can have the problem of inconsistent data, for example because two of three services have written their data, but the third cannot, regardless of whether for technical or functional (business) reasons. So the question arises of how the user can get his entities out of this inconsistent state? Typically you end up having to bulid process steps into the system, or explicitly allow for inconsistent erroneous states in the system. These mechanisms introduce extra complexity compared to a monolith.

Typically you will need to allow the user to fetch the entire state from all the microservices, and use validation to indicate to them which problems exist. They will then need to be able to carry out specific actions in order to make corrections. None of this may relate to the actual functional use cases, but is required to get from a globally inconsistent state back to a consistent and usable one, so that the user can continue and complete the process they have started.

In some situations, you may also consider building a self-healing mechanism into the system. Imagine a sales system like The Milk Factory where a user attempted to add a manual discount, which worked, but the pricing service failed to recalculate the latest price. You could build the system to recognise that the prices are not up to date (see below), and recalculate them, when the user reloads the offer. If the pricing failed due to a business exception, the error can then be displayed to the user, which might for example hint that they first need to set a specific attribute to a valid value, for the discount to become usable. When the user does this, not only is the new value for the attribute saved, but the system can then recalculate the prices without running into the business exception.

7. Prevention of further Process Steps

The time it takes for a number of nodes in a distributed system to become consistent is known as the convergence time. We speak about the time it takes for the nodes to converge.  It can be very useful to block the user from executing further process steps while we wait for the system to converge and a globally consistent state to be reached. Imagine not blocking the user, and allowing them to send the customer an offer, before the latest price has been calculated. If there is no guarantee that the action to send the offer will not be processed after all events used to update the price, it is entirely possible that an offer will be sent to the user with the wrong price.

In The Milk Factory, the UI disables buttons once a process step is started, so that the user cannot start a second process step, without the first one being completed. For example, the offer cannot be sent to the customer, if the system is still updating the configuration of a milkshake contract.

This can become related to timeouts and time-to-live (see above), because we are blocking the user, and at some point, if a microservice is no longer responding, we will want to allow the user to continue doing something else. Note that at no time is the user blocked from leaving the current process, and for example searching for a different customer or contract, in order to do some work elsewhere. But what happens if the user leaves the current process because they cannot move forward, and then they return to the current sales case?  How can the system know that the process step that the user wants to execute will be based on a consistent state and should be allowed?  The next section describes the solution.

Note that blocking the user is precisely the kind of locking which occurs in distributed systems with ACID properties where a global transaction manager implementing say two phase commit is used, and is precisely the problem most often cited when convincing you that it is bad, and that you should build an eventually consistent system.  On the one hand they are right, and if you find yourself building mechanisms like those described here, you might have a problem. On the otherhand, it is sometimes functionally necessary to build such mechanisms, because the business requirements demand it. Sending the customer an offer with the wrong price may not be acceptable. It does however get interesting when you start to ask the question, "why is it not acceptable"? If the price is too low, the customer won't complain. They may however not quite trust your IT systems. In more complex systems, you can even argue that the customer may never know that the price offered is wrong. But watch out, if the price is recalculated while booking the contract or during the billing process, you might well end up with unhappy customers because the price they have to pay is not the price they were quoted.

8. Consistency Mechanisms

The Milk Factory uses a timestamp called the "sync timetamp" to allow it to work out if all parts of a contract are synchronized and the data is globally consistent. The timestamp is chosen by the contract service which acts as an orchestrator for all process steps relating to the creation and alteration of contract offers. It is propagated to all microservices within the contract domain, and persisted by them during each process step, even if the microservice doesn't need to actually update its own business data in that process step. By comparing the timestamp across all services, we can ensure that the data is consistent. If we update for example a contract condition in the DSC service, we also update the sync timestamp in the contract and pricing services, even though this has no effect on the data held in the contract service, and might not affect the price.

This mechanism reduces the performance of the system, as well as reducing the resilience, because a simple alteration requires data processing from all microservices. At the same time, it allows us to guarantee that the offer is consistent before sending it to the customer, or booking it when the customer accepts it. This is a trade off, and one made, because global data consistency in particular financial data are more important at The Milk Factory, than performance and resilience. This is a business decision at the end of the day.

9. Isolation and Visibility of Partially Consistent Data

Should data that is not yet globally consistent, be visible to other technical components and processes? Should a user viewing a customer's contract summaries, see an old price next to an updated payment frequency, where the surcharge for monthly billing has not yet been reflected in the price?  Probably not, if they are going to confirm the price to the customer.  Should a user viewing rooms in a hotel see that the room count has been reduced, even though the flight bookings are not yet confirmed? Almost definitely.  So it really depends on the use cases and what the business' requirements are.

Isolation is probably the hardest of the four ACID properties to get right in a distributed system.  You can add mechanisms to the system, in an attempt to get isolation problems resolved, but they often lead to resource locking. There is a well known pattern called "try-confirm-cancel", not too unrelated to two phase commit (2PC), whereby an orchestrating service tries to get work done by downstream services, and then either confirms that work, or cancels it, depending upon whether or not all tries were successful or not. Just like 2PC, if the system is not going to make tried changes visible to other processes, it effectively locks the related resources (data) until the confirmation or cancellation is executed. These locks are the same as those described above, which are typically slated as the reasons to move from an ACID based system to an eventually consistent one, in order to increase resilience, performance, availability, etc.

10. Automated Retry Mechanisms

In cases where technical errors lead to failure, it can be good to build automatic retries into the system.  Netflix famously introduced Hysterix to address fault tolerance and the Microprofile now defines standard mechanisms for dealing with faults. This is one way to make the system self-healing (another is to build algorithms for detecting inconsistencies and then fixing them - something we do at my current customer's site to compensate for an old AS400 system which allows users to persist wrong contract data).

In 2018 I published an article in which I described a mechanism for saving "commands" in a technical table, next to your business data, within the same transaction. This ensures that global data eventually becomes consistent, by ensuring that those commands are executed, and are retried until they succeed.  I have also implemented an event sending mechanism which builds upon that, to guarantee that events are sent out into the landscape, if, and only if, the your microservice's data is persisted.  This mechanism is now known as the polling publisher pattern, an implementation of the transactional outbox pattern.

Combined with Kafka, this ensures that messages are sent to Kafka, but does not ensure that they are successfully processed on the consumer side. If an error happens during message consumption, you need to think about what to do with that incoming message. If you send it to a dead letter topic, what happens then?  If you throw it away, you will certainly never be globally consistent.

The Milk Factory has a service called the waiting room, which I implemented after reading this article while searching for solutions on how to retry the consumption of a message after a technical failure downstream of the consumer, e.g. a database outage. It basically sends messages to a special topic if a technical error is encountered (determined by certain standard Java exceptions). The waiting room service sends the messages back to the component that just had the technical problem, in order to be retried. A retry count is propagated with the message and incremented accordingly and can be used to implement a backoff strategy in order to reduce the load on a service that is currently struggling.  Note how a library is used in the business services, to encapsulate this technical solution, in order to separate business and technical concerns.

This works well, but introduces a further complexity into the system, namely the loss of ordering (see above). Automatic retries also require services to behave idempotently (see below).

11. Idempotency

Due to retry mechanisms, services need to behave idempotently, which here is used to mean that there should be no unexpected side effects due to processing a message multiple times. For example, if a technical error leads to a message about a discount being retried, a second and subsequent attempt to process the message should not result in the same discount being added to a contract multiple times, if the user only wanted it to be added once.

Idempotentcy in microservices is typically implemented by having the client generate a request ID for the current action which is propagated throughout the landscape with the instruction (command) to do something. Each time that command is processed, a service can check if it has already processed the request ID, and if it has, it should respond as though it has just successfully processed it for the first time. In an event driven architecture this will typically mean publishing relevant events, potentially multiple times, which drives the requirement for idempotent services even more, so that downstream services also cause no side effects. Note that you can use a unique database constraint and always attempt to process the event, without having to first check if it already has been processed (a performance gain). But in a Java EE container, the transaction will then be marked for rollback, and the container might write a log entry at say warn level, which is not actually something you want to be warned about, as it is "normal" in idempotent services and no administrator action is required. Note also that while checking the database for the request ID before writing might appear to work, it is not guaranteed to work if two instances of the microservice are concurrently processing the command for whatever reason - so the use of a unique constraint is more precise.

Sometimes you can avoid mechanisms like this request ID, if for example the commands are absolute.  For example "add discount code XA1 to the offer", combined with an implementation that only allows discount codes to exist zero or one time per offer. If the service always deletes the given codes before inserting them, within the same database transaction, then they will never exist more than one time, no matter how often the command is processed. Note again how there is a trade off made, this time between performance (deleting before inserting) and having to implement idempotency.

Why multiple Services?

This is a side note rather than a pattern. Consider The Milk Factory and how you could choose to put the price calculation algorithm into the same microservice as the contract details (and let's ignore discounts, surcharges and contract conditions for the moment). That would have the awesome effect of ensuring that the price is always consistent with the contract details!  Now let's consider the recurring billing process which takes place periodically. Prices can change if they depend on say markets, or say discounts that themselves depend on the customer's age or time.  If the billing process needs to also recalculate prices, you end up with a dependency from the billing service to the contract service (which now includes pricing), and end up with a new discussion, namely can you not lump those two services together? Sure you can, but be aware that you are moving in the direction of a monolith and sticking concerns together that might be better left apart, especially when you think about all the other things that also influence microservice sizing and design. In The Milk Factory the pricing functionality is a common denominator of multiple processes and as such, is outsourced into its own microservice.

Help with Service Dependencies

This is not a pattern, but a very useful design technique, given to me by a wise colleague. Draw a matrix on a piece of paper and start to put your microservices on it. Draw the dependencies between services as arrows from one service requiring something, to another which provides that something. Arrows are only allowed to go from left to right, and from top to bottom. This means that you will need to start moving the services around in order to not break those rules. If you have arrows pointing from right to left, or bottom to top, then those dependencies are illegal and need to be resolved. That can be done by either cutting relevant parts of microservices out into separate components, or joining two services into a single service (although theoretically the same exercise should be repeated for package/class dependency within each service).

Conclusions

The following diagram shows how the non-functional requirements, quality attributes, the mechanisms listed above and side-effects are related to each other.

All of the patterns discussed above, their problems and solutions, and the trade offs that are discussed, are technical in nature. As an developer/architect of software systems that solve complex business problems, I don't actually want to be faced with these technical issues, but I am forced to do so because of non-functional requirements that cause me to design distributed systems.  During the design process we choose how to divide the business problem in to a multitude of (micro-)services.  That "cutting" process can have a large effect on the number of points in a landscape where things like those discussed above, can go wrong. By reducing the number of microservices, we can reduce the number of points in the system where things can break. There are many criteria used in choosing how large to make a microservice, and they can be functional, technical or even organisational in nature.  I recommend considering the following:

  • If a business process step fits into a single microservice, that means that it can have ACID properties.
  • Not all process steps will fit into a single microservice, meaning that you will encounter some of the patterns described above.
  • A team of around 8 people with up to 5 developers, in a large organisation, should be developing somewhere between one and three microservices (and possibly maintaining one or two more, depending upon how often they change and their complexity).
  • Put all of the mechanisms described above, and many more, into your toolbox and use them as required, like skilled craftspeople do with physical tools.

If you have encountered more patterns and/or would like them added, please get in touch by leaving a comment.

Copyright ©2021, Ant Kutschera

 

Social Bookmarks :  Add this post to Slashdot    Add this post to Digg    Add this post to Reddit    Add this post to Delicious    Add this post to Stumble it    Add this post to Google    Add this post to Technorati    Add this post to Bloglines    Add this post to Facebook    Add this post to Furl    Add this post to Windows Live    Add this post to Yahoo!

Kafka Record Patterns for Data Replication

Imagine going down to your local milkshake bar and signing a contract with the owner so that you could purchase bespoke drinks at a set price. Let's say you agreed on fresh milk with 3.5% fat and one tablespoon of chocolate powder, per 500ml of milk.  Putting that into a table might look like this:

PK contract_number start fat_content chocolate_powder
100 12345678 2021-01-01 3.5% 1 tbsp

After a few weeks, your tastebuds become a little desensitised and you decide you want to add some more chocolate powder. The owner is agile, so he adjusts the contract, meaning we need to add a few columns in order to track validity:

PK contract_number contract_from start end fat_content chocolate_powder
100 12345678 2021-01-01 0001-01-01 2021-01-31 3.5% 1 tbsp
101 12345678 2021-01-01 2021-02-01 9999-12-31 3.5% 2 tbsp

Note two things: 1) this table is not normalised and 2) I used a low date (year 0001) and high date (year 9999) for the start of the first row and the end of the last row.

In reality we would probably normalise this data. For the sake of this example, I won't because it will make it more readable as I add more information below.

The low and high dates are there, so that I can always find data, regardless of the date I use - I don't have to know the contract termination date which is different for every contract, in order to be able to simply ask what the latest recipe is, for a given contract number:

select *
from contracts
where contract_number = '12345678' 
  and '9999-12-31' between start and end;
--> returns row with primary key 101

After a few more weeks, I realise that I need to reduce my calorific intake, but I'm a complete chocoholic. We agree to reduce the fat content:

PK contract_number contract_from start end fat_content chocolate_powder
100 12345678 2021-01-01 0001-01-01 2021-01-31 3.5% 1 tbsp
101 12345678 2021-01-01 2021-02-01 2021-02-28 3.5% 2 tbsp
102 12345678 2021-01-01 2021-03-01 9999-12-31 0.8% 2 tbsp

At some point I get bored of milkshakes and I terminate the contract, but because I never purchased a milkshake with 0.8% fat, the owner lets me terminate it with a date in the past, say 2021-02-14, so that we can delete the last row:

PK contract_number contract_from contract_to start end fat_content chocolate_powder
100 12345678 2021-01-01 2021-02-14 0001-01-01 2021-01-31 3.5% 1 tbsp
101 12345678 2021-01-01 2021-02-14 2021-02-01 9999-12-31 3.5% 2 tbsp

Note that it is a design choice whether or not we "shorten" the end date. We might want to do that in order to make such data not be found after the contract termination date. It depends on requirements more than anything.

What has all this got to do with Kafka, and data replication?

Imagine a self-contained microservice which needs to have an up to date copy of this data, in memory, in order to run lightning fast. Imagine you want that cache to be distributed across all of your service instances (Kubernetes pods). How about the following 7 lines of Kotlin code that use the nifty Kafka Streams API:

val builder = StreamsBuilder()
val globalStore = Materialized.`as`(globalStoreName)
// global, so that every pod has access to all data from all partitions:
builder.globalTable(CONTRACTS_TOPIC, globalStore)
val streams = KafkaStreams(builder.build(), props)
streams.start()
val globalBausteinView = streams.store(fromNameAndType(globalStoreName, ...)

// REST Handler:
val contractJson = globalBausteinView.get(contractNumber)

We need to publish the contract data to the topic used as the input, but before we do that, let's think about the keys we use, in order to have the data survive log compaction. It would be no good to publish three records, each using the contract number as the key, because as soon as the topic were compacted, only the data from the last row would survive, and any service replicating from scratch would have an incomplete dataset. The solution is to include the start date in the key, e.g. "12345678::2021-02-01".

We have a number of options regarding the values (payload). Let's work through the examples.

(Note: initially contracts are valid for 5 years, so the contract_to column always has a value)


1) Denormalised Table, Variation 1 - One Event per Attribute Combination

Use Case PK contract_number contract_from contract_to start end

fat

content

chocolate

powder

records emitted
Contract Creation 100 12345678 2021-01-01

2025-12-31

0001-01-01 9999-12-31 3.5% 1 tbsp

Key:  12345678::2021-01-01

Value: {cn: 12345678, from: "2021-01-01", to: "2025-12-31", start: "2021-01-01", end: "2025-12-31", fatContent: 3.5, choc: 1}

                   
Change choc powder 101 12345678 2021-01-01 2025-12-31 0001-01-01 2021-01-31 3.5% 1 tbsp

Key:  12345678::2021-01-01

Value: {cn: 12345678, from: "2021-01-01", to: "2025-12-31", start: "2021-01-01", end: "2021-01-31", fatContent: 3.5, choc: 1}

102 12345678 2025-12-31 2025-12-31 2021-02-01 9999-12-31 3.5% 2 tbsp Key:  12345678::2021-02-01

Value: {cn: 12345678, from: "2021-01-01", to: "2025-12-31", start: "2021-02-01", end: "2025-12-31", fatContent: 3.5, choc: 2}

                   
Change fat content 101 12345678 2021-01-01 2025-12-31 0001-01-01 2021-01-31 3.5% 1 tbsp none - no changes made
102 12345678 2021-01-01 2025-12-31 2021-02-01 2021-02-28 3.5% 2 tbsp Key:  12345678::2021-02-01

Value: {cn: 12345678, from: "2021-01-01", to: "2025-12-31", start: "2021-02-01", end: "2021-02-28", fatContent: 3.5, choc: 2}

103 12345678 2021-01-01 2025-12-31 2021-03-01 9999-12-31 0.8% 2 tbsp Key:  12345678::2021-03-01

Value: {cn: 12345678, from: "2021-01-01", to: "2025-12-31", start: "2021-03-01", end: "2025-12-31", fatContent: 0.8, choc: 2}

                   
Contract Termination 101 12345678 2021-01-01 2021-02-14 0001-01-01 2021-01-31 3.5% 1 tbsp

Key:  12345678::2021-01-01

Value: {cn: 12345678, from: "2021-01-01", to: "2021-02-14", start: "2021-01-01", end: "2021-01-31", fatContent: 3.5, choc: 1}

102 12345678 2021-01-01 2021-02-14 2021-02-01 2021-02-14 3.5% 2 tbsp Key:  12345678::2021-02-01

Value: {cn: 12345678, from: "2021-01-01", to: "2021-02-14", start: "2021-02-01", end: "2021-02-14", fatContent: 3.5, choc: 2}

103

deleted

Key: 12345678:2021-03-01

Value: null (tombstone record)

Note how the key and start/end dates are not the ugly technical dates but limited to the atual contract validity. That is a design choice where I chose not to expose technical details.

In this variant, we publish a record for the "lowest common denominators" in terms of validity. There is an event for each time window in which values are constant. Each change, leads to a new record.

Imagine viewing the validities of the values seperately, as they might be if we normalised the table:

Value January February March April...
Milk Fat Content 3.5 0.8
Chocolate Powder 1 2
Resulting Time Windows with constant values 3.5 & 1 3.5 & 2 0.8 & 2

Each change leads to a new row in the denormalised table and hence a new record in Kafka. The three events that are published are visible on that bottom row.

As an alternative, we could publish one event per contract, with validities inside the payload, as follows.


2) Denormalised Table, Variation 2 - One Event per Contract

Use Case records emitted
Contract Creation

Key:  12345678

Value: {cn: 12345678, from: "2021-01-01", to: "2025-12-31",

    fatContent: [ {start: "2021-01-01", end: "2025-12-31", value: 3.5} ],

    choc: [ {start: "2021-01-01", end: "2025-12-31", value: 1} ]

}

Change chocolate powder Key:  12345678

Value: {cn: 12345678, from: "2021-01-01", to: "2025-12-31",

    fatContent: [ {start: "2021-01-01", end: "2025-12-31", value: 3.5} ],

    choc: [ {start: "2021-01-01", end: "2021-01-31", value: 1},

                 {start: "2021-02-01", end: "2025-12-31", value: 2} ]

}

With this variation, we end up having to publish a list of values together with their validities.


3) Normalised Table, Each Attribute on its own Topic

The next solution is to publish each attribute on its own topic.

Use Case records emitted
Contract Creation

Topic: Contract

Key:  12345678

Value: {cn: 12345678, from: "2021-01-01", to: "2025-12-31"}

Topic: Fat Content

Key: 12345678::2021-01-01

Value: {start: "2021-01-01", end: "2025-12-31", value: 3.5}

Topic: Chocolate Powder

Key: 12345678::2021-01-01

Value: {start: "2021-01-01", end: "2025-12-31", value: 1}

   
Change choc powder

Topic: Chocolate Powder

Key: 12345678::2021-01-01

Value: {start: "2021-01-01", end: "2021-01-31", value: 1}

Key: 12345678::2021-02-01

Value: {start: "2021-02-01", end: "2025-12-31", value: 2}

   
Change fat content

Topic: Fat Content

Key: 12345678::2021-01-01

Value: {start: "2021-01-01", end: "2021-02-28", value: 3.5}

Key: 12345678::2021-03-01

Value: {start: "2021-03-01", end: "2025-12-31", value: 0.8}

   
Contract Termination

Topic: Contract

Key:  12345678

Value: {cn: 12345678, from: "2021-01-01", to: "2021-02-14"}

Topic: Fat Content

Key: 12345678::2021-01-01

Value: {start: "2021-01-01", end: "2021-02-14", value: 3.5}

Key: 12345678::2021-03-01

Value: null (tombstone record)

Topic: Chocolate Powder

Key: 12345678::2021-01-01 --> no change, so no record emitted

Key: 12345678::2021-02-01

Value: {start: "2021-02-01", end: "2021-02-14", value: 2}


4) Verticalised Table, One Topic for all Attributes

The final solution is to use a verticalised table in order to store the data. This has the advantage that you can dynamically add new attributes, and in fact each contract could have different attributes. This is akin to a schemaless document. The publication of records in Kafka becomes quite generic.

Use Case records emitted
Contract Creation

Key:  12345678::fatContent::2021-01-01

Value: {start: "2021-01-01", end: "2025-12-31", value: 3.5}

Key: 12345678::chocolatePowder::2021-01-01

Value: {start: "2021-01-01", end: "2025-12-31", value: 1}

   
Change choc powder

Key:  12345678::fatContent::2021-01-01 --> no change, no event emitted

Key: 12345678::chocolatePowder::2021-01-01

Value: {start: "2021-01-01", end: "2021-01-31", value: 1}

Key: 12345678::chocolatePowder::2021-02-01

Value: {start: "2021-02-01", end: "2025-12-31", value: 2}

   
Change fat content

Key:  12345678::fatContent::2021-01-01

Value: {start: "2021-01-01", end: "2021-02-28", value: 3.5}

Key:  12345678::fatContent::2021-03-01

Value: {start: "2021-03-01", end: "2021-02-28", value: 0.8}

Key: 12345678::chocolatePowder::2021-01-01 --> no change, no event emitted

Key: 12345678::chocolatePowder::2021-02-01 --> no change, no event emitted

   
Contract Termination

Key:  12345678::fatContent::2021-01-01

Value: {start: "2021-01-01", end: "2021-02-14", value: 3.5}

Key:  12345678::fatContent::2021-03-01

Value: null (tombstone record)

Key: 12345678::chocolatePowder::2021-01-01 --> no change, no event emitted

Key: 12345678::chocolatePowder::2021-02-01

Value: {start: "2021-02-01", end: "2021-02-14", value: 2}

My favourite is the first solution, as I find it to be the closest to the functional business requirements.

Another way to choose which solution to use might be to calculate the effect that the solution has on data volume (storage in Kafka; transport through your landscape; storage in replicates).

If you have other solutions, please get in touch.

Copyright ©2021, Ant Kutschera

Tags :
Social Bookmarks :  Add this post to Slashdot    Add this post to Digg    Add this post to Reddit    Add this post to Delicious    Add this post to Stumble it    Add this post to Google    Add this post to Technorati    Add this post to Bloglines    Add this post to Facebook    Add this post to Furl    Add this post to Windows Live    Add this post to Yahoo!

Using Reinforcement Learning To Learn To Play Tic-Tac-Toe

About a year ago I set myself the goal of writing an algorithm that could learn to play tic-tac-toe. I didn't want to tell the algorithm what the rules of the game are, nor did I want it to try and use some kind of calculation to look ahead at possible moves which might lead to a win from the current state of the board. I wanted the algorithm to "learn" how to play, by playing against itself. I knew nothing about machine learning, so I spent a bit of time learning about neural networks but didn't get very far and convinced myself that neural networks wouldn't work and I put the project to the back of my mind. A few months ago at a Christmas party, I bumped into an acquaintance, JP, and I ended up telling him about my goal. He suggested reinforcement learning and a few days later I started reading about it. It didn't take long before I was trying to understand Q-learning and failing fast by getting lost in the maths. So with my limited knowledge I went about iteratively designing an algorithm to "learn" to play. Here is the result, which you can play against interactively by clicking on the board or buttons. It contains no code telling it what good or bad moves are and the AI algorithm knows nothing about the rules of the game - it's simply told what the outcome of each move is (unfinished, won, drawn, lost). It doesn't even know that the aim is to get three in a row. All the AI algorithm knows is that X and O make alternate moves and that a square can only contain an X or an O or be emtpy.


The "clear" button clears the board and resets the game. The "reset stats" button clears the score board below the buttons. The "play autonomously" button causes it to start playing against itself and continue learning - it swaps side every hundred games and will occasionally explore randomly, according to its exploration strategy (see below). The "swap roles" button causes it to change sides. The "explore" button can be used to turn off exploration during games against humans.

Tic-tac-toe is played by successively placing 'X's and 'O's (markers) on the board until one side has three in a row, vertically, horizontally or diagonally. To start with, each cell on the board is empty (modelled as a null in Javascript). As the game progresses, the state of the board, a.k.a. "board state", changes. As an example, here is the board state after one move by X in the middle of the top row of the board:

  j=0 j=1 j=2
i=0   X  
i=1      
i=2      

Each state can be converted into a "pattern" which is built by concatenating the cell values from left to right starting with the top row and ending with the bottom row, delimiting the rows with a pipe character. Empty cells are represented using a hyphen ('-'). So the example above becomes: "-X-|---|---". The patterns are used as keys, which map to a list of possible moves from that board state (like a dictionary). To make things simple, I chose that X would always start regardless of whether it was the AI or the human moving first. For each possible move, the algorithm stores the final result that was achieved - a win for X, a win for O or a draw. After the first move, there are eight possible moves, after the second, only seven, etc. Results are recorded in JSON similar to the following, and here is an example showing what the algorithm has learned for the pattern where no move has yet been made:
const memory = {
    "---|---|---" : [
        {i: 1, j: 1, xWinRewards: 185510, oWinRewards: 20028, drawRewards: 60161},
        {i: 2, j: 2, xWinRewards:   1390, oWinRewards:  1910, drawRewards:   379},
        ...
    ],
    ...
};

Each element in the array represents a move which can be made from the board state represented by the key, with 'i' representing the vertical position, and 'j' the horizontal position. So 'xWinRewards' records how well X has done from this board state, in this case an empty board, after moving to the position indicated by i and j. 'oWinRewards' records how well player O has done, and 'drawRewards' shows how likely a draw is. The example above was taken after a hundred thousand games were played. It shows how the algorithm has learned that X is most likely to win if it starts in the centre of the board (position i=1, j=1). It also shows how moving to the centre is more likely to result in a draw than a win for the second player O, because it has gained more rewards for draws than for O winning. You can also see that far fewer games were played where the first player opened in the bottom right (2, 2) because it has gained less rewards for that move.

Notice the term "reward". The numbers recorded above are not the number of games, rather they are rewards (or points) that are given at the end of the game, when the algorithm walks through each move of the game, retroactively awarding points for each move. First it builds the pattern and then it looks that pattern up in its memory. It then adds a reward to the relevant "next move" which was made in the current game, adding it to the relevant field in the object shown above. Take for example the board after an initial move by X to the centre of the board. Let's say that O won later in the game, after moving to position 2,2 as their first move. So the pattern which is looked up is "---|-X-|---". In that memory location, the algorithm finds the object with i=2 and j=2. It adds a reward to the field named "oWinRewards". The algorithm then moves on to the next move which took place in the game and does the same thing for the pattern "---|-X-|--O", since that pattern represents the game after the second move.

Initially I used a reward of just one point. It soon became clear that the algorithm wasn't learning properly and it was still making "stupid" decisions, for example it wouldn't complete two in a row in order to win immediately and would prefer to move elsewhere rather than taking the victory. The reason is that in the beginning it only moves randomly (it has to explore since it has no experience to exploit, see below) and sometimes it would miss the chance to win immediately, but then win later in a game anyway. It learned that it could skip an immediate win and still be successful. The solution was to increase the reward for winning shorter games, so that the incentive was higher.

The next problem I encountered was that the algorithm was giving too much weight to moves which it had already encountered many times, even if a different move was actually better. It resulted in the algorithm doing what appeared to be silly moves, and not really seeming to learn. The solution was to chose the next move based on relative rewards, rather than absolute rewards. That way, the total number of times a move was tried was no longer relevant. Rather the chances of winning were what made the algorithm chose the next move. I later learned that there are different strategies here - you can base the choice of the next move on the average reward, or the best one that has ever been achieved, and the strategy relates to the reinforcement learning algorithm being used (for example Q-learning apparently uses the best reward rather than an average one).

Another factor which influenced how the algorithm learned was whether it learned against a player who always moved randomly or against itself. It became a stronger player when learning against itself, than against a random monkey. That makes sense because if you've only ever had to respond to random moves, you won't have learned how to play someone who knows how to win tic-tac-toe, where more skill is required.

Finally, the last important factor in learning is the trade off between exploration (to try a random move which might lead to learning something new) and exploitation (using existing knowledge to choose a "good" move based on experience). Just like a human, the algorithm starts by only exporing randomly, because it doesn't know better, but the more games it plays, the less it explores and the more it choses moves based on experience. The method used to decide whether to explore or not has quite an effect on how quickly and how well the algorithm learns.

So does the algorithm really learn? Let's take a look by asking some questions.

1) Does the algorithm know where the best starting location is?

Yes, because it knows that starting in the centre gives the best chances of a win. The rewards for the possible moves before any player has had a turn (pattern "---|---|---|"), ordered by reward with the best move at the top and the worst at the bottom are as follows:
{i: 1, j: 1, xWinRewards: 185510, oWinRewards: 20028, drawRewards: 60161} // => 72.0
{i: 2, j: 2, xWinRewards:   1390, oWinRewards:  1910, drawRewards:   379} // => 38.3
{i: 0, j: 0, xWinRewards:   1169, oWinRewards:  1526, drawRewards:   447} // => 38.1
{i: 2, j: 0, xWinRewards:   1291, oWinRewards:  2068, drawRewards:   419} // => 34.7
{i: 0, j: 2, xWinRewards:   1191, oWinRewards:  1920, drawRewards:   495} // => 33.9
{i: 1, j: 2, xWinRewards:   1155, oWinRewards:  2300, drawRewards:   402} // => 30.4
{i: 0, j: 1, xWinRewards:   1122, oWinRewards:  2372, drawRewards:   400} // => 29.2
{i: 1, j: 0, xWinRewards:    844, oWinRewards:  1678, drawRewards:   491} // => 29.0
{i: 2, j: 1, xWinRewards:   1079, oWinRewards:  2292, drawRewards:   474} // => 28.7
The reward that is shown on the right of each line, which is used when choosing the next move to make, is calculated as follows:
winRewards = player === X ? possibleMove.xWinRewards : possibleMove.oWinRewards;
drawRewards = possibleMove.drawRewards;
lossRewards = player === X ? possibleMove.oWinRewards : possibleMove.xWinRewards;
total = winRewards + drawRewards + lossRewards;
...
finalReward = (100*(winRewards/total)) + (10*(drawRewards/total)) + (-1*(lossRewards/total));

Remember, reward is calculated relatively rather than absolutely, and rewards given at the end of the game are higher if the game is shorter. Note that the above calculation also weights the rewards, so that the algorithm prefers winning over drawing, and drawing over losing.

If you were wondering whether or not the centre is indeed the best opening move, please read my previous article which shows that contrary to common belief, the centre is a better opening move than a corner. Wikipedia, xkcd and Google all get it wrong.

2) Does it know how to respond to an opening move in the centre?

Here are the rewards for the next moves after an opening in the centre (pattern "---|-X-|---|"):
{i: 2, j: 0, xWinRewards: 26920, oWinRewards: 3712, drawRewards:  9333} // => 11.5
{i: 0, j: 0, xWinRewards: 67866, oWinRewards: 8622, drawRewards: 35873} // => 10.8
{i: 0, j: 2, xWinRewards: 21060, oWinRewards: 2616, drawRewards:  5856} // => 10.8
{i: 2, j: 2, xWinRewards: 21014, oWinRewards: 2574, drawRewards:  6730} // => 10.6
{i: 0, j: 1, xWinRewards: 12096, oWinRewards:  668, drawRewards:   603} // =>  5.4
{i: 2, j: 1, xWinRewards: 12502, oWinRewards:  664, drawRewards:   595} // =>  5.2
{i: 1, j: 2, xWinRewards: 11850, oWinRewards:  592, drawRewards:   582} // =>  4.9
{i: 1, j: 0, xWinRewards: 11994, oWinRewards:  580, drawRewards:   589} // =>  4.8

My previous article and other sources like Wikipedias tic-tac-toe article and Martin Gardners book entitled Hexaflexagons and Other Mathematical Diversions agree that the best counter to an opening move in the centre is a move to a corner. The results above agree because all four corners have higher rewards (again shown on the right side) than the edges.

3) Does it know how to win on the next move when it nearly has three in a row?

Yes, it knows how to win instantly, for example with the pattern "OX-|-X-|O--", which looks as follows, where the winning move is the middle of the bottom row (2,1):

  j=0 j=1 j=2
i=0 O X  
i=1   X  
i=2 O    

The rewards for that pattern are:
{i: 2, j: 1, xWinRewards: 7104, oWinRewards:  0, drawRewards: 0} // => 100.0
{i: 1, j: 0, xWinRewards:  224, oWinRewards:  0, drawRewards: 2} // =>  99.2
{i: 1, j: 2, xWinRewards:   16, oWinRewards:  8, drawRewards: 0} // =>  66.3
{i: 0, j: 2, xWinRewards:    0, oWinRewards: 48, drawRewards: 0} // =>  -1.0
{i: 2, j: 2, xWinRewards:    0, oWinRewards: 56, drawRewards: 0} // =>  -1.0
So indeed, it knows the reward is highest for (2,1) and will move there to win. It also knows that it doesn't need to block the opponent's imminent win on the left edge (1, 0).

4) Does it know how to block the opponent from winning on their next turn?

Consider the pattern "XO-|-O-|--X" which looks like this:

  j=0 j=1 j=2
i=0 X O  
i=1   O  
i=2     X

It's Xs turn and the algorithm would need to block the bottom middle (2, 1) in order to stop O winning on their next move. Here are the rewards for that pattern:
1: {i: 2, j: 1, xWinRewards: 0, oWinRewards:  4, drawRewards: 4} // =>  4.5
0: {i: 0, j: 2, xWinRewards: 0, oWinRewards: 32, drawRewards: 0} // => -1.0
2: {i: 2, j: 0, xWinRewards: 0, oWinRewards: 32, drawRewards: 0} // => -1.0
3: {i: 1, j: 2, xWinRewards: 0, oWinRewards: 16, drawRewards: 0} // => -1.0
4: {i: 1, j: 0, xWinRewards: 0, oWinRewards: 32, drawRewards: 0} // => -1.0

The algorithm knows that doing a move at (2,1) is best and avoids the loss. It also knows that X has never won from here and can only draw by stopping the win for O.

5) Does it know about the strategy on Wikipedias tic-tac-toe page which shows how O can best counter opening moves by X?

Yes! The algorithm is clever! But not clever enough to read Wikipedia 😀
It's simply learned through playing, that these moves give the highest rewards. Here is what Wikipedia says:
Player O must always respond to a corner opening with a center mark,...
Try it out on the interactive board at the top of this article! The algorithm does this in three out of the four corner openings, strangely not when starting in top right. I guess it needs a little more practice on that opening move.
...and to a center opening with a corner mark.
Yes, the algorithm does this.
An edge opening must be answered either with a center mark, a corner mark next to the X, or an edge mark opposite the X.
For the bottom and right edges it responds by taking the centre cell. For the left edge it responds by taking the bottom left corner, i.e. next to starting position. For the top, it fails - I guess it needs some more practise here too.

Since we are pretty much able to answer "yes" to these five questions, I claim that it has indeed learned how to play tic-tac-toe. If you like that and think it's impressive, read on, because it gets better.

If you look at the interactive board at the top of this article, you will see a line which says "Unique games seen: 4330". In my previous article I confirmed that there are 255,168 unique games, of which only 29,600 are not classed as "stupid" because of missed wins or blocks. So if there are so many games, why has the algorithm only encountered less than 2% of them and why is it still able to play so well and why has it learned as shown above?

We can get a list of all patterns where the algorithm hasn't experienced every possible next move, by opening the Javascript console for the interactive board (right click on the board and select "inspect", in Chrome), and running the following snippet:
Object.keys(model.patterns).forEach(function(k){
    let len = k.split('').filter(function(l){ return l === 'X' || l === O;}).length;
    if(model.patterns[k] && model.patterns[k].possibleNextMoves &&
        (model.patterns[k].possibleNextMoves.length + len !== 9)) {

        console.log(k);
    }
});
2927 patterns have missing solutions, i.e. the large majority of them. Let's randomly take one and have a look, e.g. "X--|OXO|---":

  j=0 j=1 j=2
i=0 X    
i=1 O X O
i=2      

The algorithm has only experienced one move from there, although there are actually 5.
{i: 0, j: 1, xWinRewards: 8, oWinRewards: 0, drawRewards: 0}
The move to (0, 1) is actually a "stupid" move, because it misses the immediate win which it would get if it moved to (2, 2). But if we take a step back, Os last move was also "stupid" as it should have blocked that potential win. If we look at the rewards for the two patterns where O makes its second move to take the left or right edge, we see the following.

Pattern "X--|OX-|---":
{i: 2, j: 2, xWinRewards: 98, oWinRewards: 12, drawRewards: 21} // => 10.7
{i: 2, j: 0, xWinRewards: 16, oWinRewards:  0, drawRewards:  0} // =>  0.0
{i: 1, j: 2, xWinRewards:  8, oWinRewards:  0, drawRewards:  0} // =>  0.0
Pattern "X--|-XO|---":
{i: 2, j: 2, xWinRewards: 164, oWinRewards: 148, drawRewards: 4} // => 46.5
The algorithm has learned to block the potential win by X, and so it will never ever go down the path which resulted in the above game again. Moreover, if we try and recreate this pattern using the interactive board at the top of this article, we find that we can't because the algorithm has learned much better moves. If the algorithm starts, taking the centre, and a human follows by taking the left edge, the algorithm doesn't take the top left corner, because by taking the top edge it can force either an immediate win or a fork and so it wins in all cases. Equally, if a human starts as X and takes the top left corner, the algorithm will take the centre because it is a better defence, and so the board layout above will never happen. Finally, if a human starts in the centre, the algorithm will take the top right corner rather than an edge, because taking the corner is a better defence to an opening in the centre than an edge is. The algorithm only ever came across the pattern above during the exploration phase of its learning cycle. It has found the strongest moves starting from the opening and as such it doesn't need to bother learning about games where it ends up in a weaker position. Humans do the same if we think about chess - games where we move say the king with our second move don't make sense so we don't keep them in our toolbox of best moves. I personally find it quite impressive how the algorithm has homed in on optimal games and as such reduced its need to know every possible game. Originally I questioned whether or not this type of machine learning was "brute force" because it needs to play so many games before becoming strong. I assumed it needed to play many many unique games, and that after "learning" it was just going to use its massive memory to work out what to do. But it isn't actually playing that many games, it just takes time to work out which of the 4,330 games that it knows about are the best ones. If it needed to encounter all 29,600 games which don't involve stupid moves, or indeed more than that number, then I think one could claim that the algorithm uses brute force in order to become strong. But because that isn't the case, I am confident in saying that this algorithm isn't using brute force to learn. Having done the analysis above, I am a lot more impressed by the algorithm than I was the first time it beat me, which was quite an impressive moment in itself.

So if I am so impressed, am I scared of AI? Not really. It took me a long time to tune this algorithm so that it played reasonably and there are many dimensions which can be tuned:
  • Exploration: how long should it explore for and when should it no longer expore? If it explores too long, it just learns to play randomly. If it doesn't explore enough, my teenage son can easily beat it when he uses some unexpected moves.
  • Rewards: the question is really "what is the best move", and defining what "best" even means can be tricky - see my previous article. In the case of this algorithm, the rewards are two dimensional, because not only do they depend on whether the game is won/drawn/lost, but also how quickly, so that it learns to not miss immediate wins.
  • Experience: how many games does it need to play until it becomes good? I wasn't able to answer "yes" to my five questions until it had played 100,000 games.
Tuning those dimensions is something that takes quite a lot of time and I would say is something that still needs a human to do. As with all uses of software, one needs to work out if there is a business case for it. I must have spent 5-20 hours a week on this project over three months, let's call it a round 20 man days. I'm not sure I'd find someone who would have paid me to write this algorithm because who really cares that much about playing a futile game which will always end in a draw when played by experienced players? My current customer is introducing an algorithm which takes the description of the insurance case being created and based on that text it preselects some fields in the form. It's 80% accurate, it doesn't use reinforcement learning, more likely pattern recognition with neural networks. If it saves each user 10 seconds per insurance case, that's over a man year of work saved every year. Designing, implementing and maintaining the algorithm certainly took less time, so there is a real business case where there is a positive return on investment. As long as companies reinvest this return, rather than laying people off, I have no problem with it. Most of my career has been based on building software to increase productivity and I don't currently believe that AI will increase this productivity by orders of magnitude greater than what we have been achieving without AI and just good old fashion software.

Just a couple of closing remarks now:
  • The algorithm presented here learns by playing against itself. I did play around with training it against more sophisticated players to start with, and it seemed to improve the rate at which it learned. But I find it quite cool that I was able to remove that and get it to learn against itself. It's effectively taking the algorithm with no knowledge, throwing it into a dark room and saying "hey, go play, and I'll tell you the result of each game, you don't need to know the rules, you can learn them along the way". It learns the rules and the strategies to become a strong player!
  • The algorithm can be classed as reinforcement learning. But what about it's subclass, for example is it a q-learning algorithm? Not quite.
  • The title of this article could have been "Using *Deep* Reinforcement Learning To Learn To Play Tic-Tac-Toe", which would have required using a neural network to recognise patterns, rather than just matching known board patterns exactly. I played with this a little but couldn't train a network which worked at all. I believe that the problem is that unlike matching letters or images, similar looking patterns in tic-tac-toe or other games like chess, can actually have exact opposite outcomes, one leading to a win and the other leading to a loss. Imagine a game of chess where a king is moved by just one position so that the opponent could put it into checkmate - the patterns would be very similar, the outcomes very different. This is a different problem to classifying hand written letters. But I am really a novice when it comes to neural networks, so it's best if I don't comment further.
The complete code for this project can be viewed here: http://github.com/maxant/tictactoe/.

What are the next steps? I have spent a lot of time on this little project. I've learned a lot. One of my original thoughts was that a true test would be to introduce a new game, say Three Men's Morris and see if the same algorithm could learn to play that too. One day... but first it's time for something different...

Finally a few links that I used along the way: Copyright ©2018, Ant Kutschera
Social Bookmarks :  Add this post to Slashdot    Add this post to Digg    Add this post to Reddit    Add this post to Delicious    Add this post to Stumble it    Add this post to Google    Add this post to Technorati    Add this post to Bloglines    Add this post to Facebook    Add this post to Furl    Add this post to Windows Live    Add this post to Yahoo!

The best opening move in a game of tic-tac-toe

As part of a machine learning project, I had to understand tic-tac-toe better, and so I have written an algorithm which a) finds all the possible unique games and b) gathers statistical information about those games.

Based on Wikipedia's tic-tac-toe article, consider a board with the nine positions numbered as follows:

  j=0 j=1 j=2
i=0 1 2 3
i=1 4 5 6
i=2 7 8 9

Assume X always starts. As an example, take the game where X moves top left, followed by O moving top right, then X going middle left followed by O going top middle. These first four moves can be written down as "1342". The game could continue and once completed could be written as "134258769". It's not a perfect game because the first player misses a few opportunities to win and in the end it's a draw.

Every possible combination of moves making up unique games of tic-tac-toe are hence found somewhere between the numbers 123456789 and 999999999 (although probably iterating up to 987654321 suffices). Most of the numbers are illegitimate because each cell is only allowed to be filled once, so for example the number 222222222 does not represent a valid combination. In order to find every valid combination we simply start with that lowest number and iterate up to nine nines, attempting to determine if each number is a valid combination and if it is, record the results of the game.

In order to determine if a combination is valid, we use the following Javascript snippet, which checks that the nine digit number does not contains zeros and contains all of the digits 1 to 9:
//gameAsString contains the current number, e.g. '123456798'
let isValidGame = true;
for(idx = 1; idx < 10; idx++) {
    if(gameAsString.indexOf('' + idx) === -1 //only valid if every digit from 1-9 is found
       || gameAsString.charAt(idx-1) === '0') //only 1-9 are valid
    {
        isValidGame = false;
        break;
    }
}
            
If a number does represent a valid combination, the next step is to iterate through each digit starting at the left, to work out when the game is finished (it can be won after just 5 moves, or it can require all 9 moves, ending in a win or a draw). For example numbers starting with '12437' represent combinations completed after just 5 moves. Although there are 9999 such valid combinations, the game is only recorded once and all other combinations of numbers which start with 12437 are discarded.

A game is analysed using the following algorithm. The empty board is built by creating a two dimensional array of objects using indexes i and j as shown above. An element in an array of cells represents a cell on the board and has attributes 'i', 'j' and 'v' which records the contents of the cell, either null, 'X' or 'O', depending on whether a player has moved in that cell.
let board = buildEmtpyBoard();
let moves = '';
for(idx = 0; idx < gameAsString.length; idx++){
    let c = gameAsString[idx];
    moves += c; //initially e.g. '1', then '12', etc.
    let coords = convertIndexToCoordinates(parseInt(c)); //turns e.g. '3' into [0,2]
    if(board[coords[0]][coords[1]].v)
        throw new Error("cell " + i + "," + j + " is already selected by " +
                        board[coords[0]][coords[1]].v);
    let player = idx % 2 === 0 ? X : O;
    let rival = player === X ? O : X;
...
    //move
    board[coords[0]][coords[1]].v = player;

    let winner = checkFinished(board);
    if(winner){
        let o = uniqueGames[moves];

        if(!o){
            ...
            //it doesnt exist. update stats
            updateStats(gameAsString, winner, idx, relevantStats);
            uniqueGames[moves] = {}; //record this game so it isn't harvested again
        }

        break;
    }
}
The function checkFinished(board) simply checks whether or not the game is over (a draw, or a player has three in a row), and returns 'DRAW', 'X' or 'O' depending upon the outcome of the game. 'uniqueGames' is an object used to track whether or not the current game has already been seen. Using the example above the combination '12437' will be seen 9999 times, but to all intents and purposes it is just one game of tic-tac-toe so we discard all subsequent sightings of it. If the game is finished and has not yet been seen, the statistics are updated.

The full algorithm is available on GitHub and you can run it live in your browser by clicking on this link: stats.html, which takes a few minutes to run.

That page records the outcome of the game from the perspective of the first move. "cornerXWins5" shows how many times X won games that were only 5 moves long by starting in a corner. As part of updating statistics, the results are grouped into games which are:
  • "stupid" - the player has missed winning in the current move or missed stopping the opponent from winning in the next move; grouped under "stupid stats"
  • won with a "fork" - the winning player had more than one position to move on the board in order to win, when they won; grouped under "fork stats"
  • all other games; grouped under "stats"
The functions used to categorise the games are determineIfStupid and determineIfWinningWithFork.

First of all, notice how there are 255,168 unique games (shown near the bottom of the page after computation is complete). This correlates with other results.

The following table is copied from that page and consolidates these raw results to show how often the starting player can win when opening in the relevant location.

14652 14232 14652
14232 15648 14232
14652 14232 14652

For example, the first player wins in 14652 of the 255168 unique games if they start in the top left. This number comes from adding the number of corner wins after 5, 7 and 9 moves (X doesn't win after 6 or 8 moves, rather O does), for each group of game (0+992+1344 from "stats", 0+1984+0 from "fork stats" and 720+19776+33792 from "stupid stats"), and importantly dividing by four, because there are four corners - they are shared out over the four corners of the table above. Results for the edges are also divided by four, but since there is only one centre cell the result is left undivided.

This table demonstrates that the best location to start is in fact the centre of the board, because starting there wins the most amount of games. This is confirmed by Azahar Machwes blog article Tic Tac Toe: Statistical Analysis with arbitrary Grid Size (albeit for different reasons than he gives) which states:
For the first movers the best starting square by far is 5 (right in the middle and opens up 4 winning lines – most of any starting square, all other squares open up to 3!). Second best option then becomes the corners (1, 3, 7, 9).
A similar result is shown when considering how often the first player loses after they start in one of the following locations (this is done by consolidating results of games that end after 6 or 8 moves, i.e. when the second player wins, beating the first player):

 7896 10176  7896
10176  5616 10176
 7896 10176  7896

For example, the first player loses in 10176 of the 255168 unique games when they start in the middle of the top row. This also demonstrates that the opening move for the first player, which leads to the least number of games lost, is also the centre. Finally we can consider how many games are drawn when the first player opens in the relevant cell:

5184 5184 5184
5184 4608 5184
5184 5184 5184

The first player draws least when opening in the centre.

As such, the centre is the best position for the first player to start, as it leads to a) the most number of games won and b) the least number of draws and c) the least number of games lost.

But because we are considering every possible game, this statement does depend upon the players being random in their moves and not necessarily moving so that they win as soon as possible or block an opponent from winning on their next turn. For that reason, the results have been grouped so that we can also analyse what happens when a more clever player plays. Let us disregard all "stupid" games where a move to win instantly is missed or a move to block the opponent winning next is missed. We are now considering a more "perfect player". They are only "more" perfect, rather than exactly "perfect", because the algorithm used to ignore stupid games works as follows. First it checks to see if the player is missing a chance to win immediately. If they are, they count as stupid. Second, it checks to see if the opponent could win in their next move, and if they can, and the current player does not block them, they count as stupid too. Now consider the game "1234567", as it has some special properties. After "12345" the board looks like this:

X O X
O X  
     

O, the second player, needs to move to position 7 or 9 in order to attempt to block the fork created when X moved to position 5. Of course any move which O makes is futile, and as such, the move to 6 which O is about to make does not count as stupid, because there is nothing they can do at this stage to avoid losing. Perhaps Os move to position 4 should have counted as stupid, because it didn't block X from getting a fork in their next move. The algorithm doesn't go that far, and as such this game is marked as being non-stupid and won with a fork by X in 7 moves. Since it isn't as good as it could be, I describe the players as more perfect rather than perfect.

So, when considering these more perfect players, the number of games won by the first player moving to the relevant cell then becomes:

1080 1240 1080
1240  480 1240
1080 1240 1080

The centre no longer leads to the most number of games won, rather starting on an edge improves the chances of a victory. The number of games lost by the first player moving to the relevant cell is:

304 608 304
608 128 608
304 608 304

The centre is still the starting position which results in the least number of losses for the first player. Although the edge has become the starting position which results in the most number of wins (see above), it is also the starting position resulting in the most number of games lost. Draw statistics for these more perfect players look like this:

1668 2144 1668
2144  816 2144
1668 2144 1668

Starting in the centre still results in the least number of draws. The results of the last three tables are a little conflicting - is it best to start in the centre and avoid a draw or start on an edge which leads to most victories but also to most losses? One way of answering this question is to consolidate the three tables by combining the results using a "reward function" - a formula to sum the individual results which can be weighted if required.

Imagine writing an artificial intelligence algorithm which had to determine the best cell to move to. It might be trained to favour a win over a draw and a draw over a loss. The reward function could be:
reward = (100*oddsOfWinning) + (10*oddsOfDrawing) + (-1*oddsOfLosing)
            
The table of rewards for perfect play, using the above weightings, is:

1,509,144 1,464,864 1,509,144
1,464,864 1,605,264 1,464,864
1,509,144 1,464,864 1,509,144

The result in this case is that a first move to the centre is still the best. Even if we change the weightings, it isn't possible to tune the reward function so that the corner becomes the best opening move.

Maybe it depends on how "best" is defined. The popular webcomic xkcd posted a complete map of optimal tic-tac-toe moves, perhaps in 2010, which is apparently based on the work titled "Fractal Images of Formal Systems", by Paul St. Denis and Patrick Grim, which shows a more complete map, because it does not show just the top left corner where X starts in the top left. For some reason, only the top left corner of their image was reproduced by xkcd. A few people have asked why this is on the xkcd forum and the best explanation I could find is from someone using the name Algrokoz, who states:
Opening in the corner is optimal because of the number of chances it gives O to not play optimally, not because of total potential win scenarios.
That means a corner opening gives the opponent less places to go in order to play optimally, whatever "optimally" means. Assuming that the best counter to an opening in a corner is the centre, and assuming that the best counter to an opening in the centre is a corner, then yes, one could argue that there are four corners and just one centre, so it is better to start in a corner, because a random opponent has a one in eight chance of finding the best counter move. Compare this to opening in the centre where a random opponent has a four in eight chance of finding the best counter move. We will examine this logic in a bit...

Martin Gardner, an American popular mathematics and science writer wrote the following in the 1988 edition of his book Hexaflexagons and Other Mathematical Diversions
If played "rationally" by both sides, the game must end in a draw. The only chance of winning is to catch an unwary opponent in a "trap" where a row can be scored on the next move in two ways, only one of which can be blocked. Of the three possible opening plays- a corner, the center or a side box- the strongest opening is the corner, because the opponent can avoid being trapped at the next move only by one of the eight possible choices: the center. Conversely, center opening traps can be blocked only by seizing a corner.
The "traps" which he is talking about are what Wikipedia and I refer to as forks - plays which result in a winning opportunity in more than one cell, forcing the opponent to lose. In his statement Gardner is saying that the only place the opponent can avoid a fork, after an opening in a corner, is the centre. If we analyse just games that end in a fork, and search for what happens after two moves we find that when X starts in a corner and O takes the centre, there are still 24 games that X can win with a fork, so I am not sure that Gardners statement is correct. However of all the other second moves, countering with the centre results in the least amount of possible forks later in the game (this is easy to calculate by grouping game results by their two opening moves, see the "second move analysis" shown on the statistics page after the computation completes).

His logic however does seem to be similar to that of the xkcd comment. Interestingly, if we consider what happens when X starts in the centre and O takes a corner, there are also 24 forks by which X can win the game, and this too is the best counter move from the point of view of forks.

If we consider only games won by a fork (see the "fork stats" section on the page which calculates them), then we find that starting in a corner results in 496 (1984/4) forks, compared to only 320 when starting in the centre. Finally we have a situation where it is indeed better to start in the corner.

But is it correct to just consider games that end in a fork? I don't believe it is possible to force a player into a situation where they will be confronted with a fork. Think about the game presented earlier, "1234567", where a fork could have been avoided by O moving to the centre (5) instead of 4. If a player is clever enough to avoid a fork then the game will certainly end in a draw. And so we are talking about players who know how to avoid losing from the start. There is no "best" place to go in order to avoid a draw, when perfect players are playing - Wikipedia and my own analysis shows that when perfect players play, the only possible result is a draw. An optimal opening position can surely only exist when playing a player who might make a mistake or a move which is less than ideal. In that case it makes no sense to analyse only games which end in a fork and we end up having to also consider games that end otherwise. As we have already seen, a corner opening is not the best.

Let's return quickly to the argument about the number of places the second player can move to. Maybe Gardners argument is the same, that in total, because there are four corners and just one centre, there are more moves available to the opponent to avoid a fork, when opening in the centre. But in the same quote he has reduced the number of opening moves to just three, because of board symmetries, so such a conclusion wouldn't be logical.

The problem with Gardners statement (and the graphic published by xkcd) is that the Wikipedia tic-tac-toe page uses them to make the following claim:
The first player, who shall be designated "X", has 3 possible positions to mark during the first turn... Player X can win or force a draw from any of these starting marks; however, playing the corner gives the opponent the smallest choice of squares which must be played to avoid losing[16]. This makes the corner the best opening move for X, when the opponent is not a perfect player.
Reference 16 is the Gardner book that I reference above. The last sentence in the Wikipedia quote is problematic because of the phrase "when the opponent is not a perfect player". As argued above, if players are not perfect, we need to consider at least games ending in a fork and otherwise, but probably also some games which involve stupid moves. As shown in this article, an opening in the centre is better than one in a corner.

Why is all of this so important? I've been working on training a machine learning algorithm to play tic-tac-toe and I was surprised how it learned to open in the centre rather than a corner. I assumed Wikipedia was correct... As a result of this study I shall now endevour to get the Wikipedia page corrected, before a generation of children wrongly learn that opening in a corner is the best move!


Copyright ©2018, Ant Kutschera
Social Bookmarks :  Add this post to Slashdot    Add this post to Digg    Add this post to Reddit    Add this post to Delicious    Add this post to Stumble it    Add this post to Google    Add this post to Technorati    Add this post to Bloglines    Add this post to Facebook    Add this post to Furl    Add this post to Windows Live    Add this post to Yahoo!