
Wordle Tag Clouds

This is really cool - a website for generating tag clouds based on text you give it: www.wordle.net. Here is an example generated for this blog:


JAX-WS Payload Validation, and WebSphere 7 Problems

A WSDL file contains a reference to an XSD document which defines the data structures that can be sent to the service over SOAP. In an XSD, you can define the type of an element, as well as things like the element's cardinality, whether it's optional or required, etc.

When the web server hosting a web service is called, it receives a SOAP envelope which tells it which web service is being called. It could (and you might expect it does) validate the body of the SOAP message against the XSD in the WSDL... but it doesn't.

Is this bad? Well, most clients will be generated from the WSDL, so you can assume that type safety is respected. That said, it's not something the server can guarantee, so it needs to check that, say, a field which is supposed to contain a date really does contain a date, and not some garbled text. More importantly, a client does not guarantee that all required fields in the data structure are actually present. To check this, you can validate incoming SOAP bodies against the XSD.

The way to do this is by using "Handlers". The JAX-WS specification defines two kinds, namely SOAP Handlers and Logical Handlers. The SOAP kind is useful for accessing the raw SOAP envelope, for example to log the actual SOAP message. The logical kind is useful for accessing the payload as an XML document. To configure a handler, you add the HandlerChain annotation to your web service, passing it the name of the handler configuration file:

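import java.text.SimpleDateFormat;

import javax.jws.HandlerChain;
import javax.jws.WebMethod;
import javax.jws.WebService;

@WebService
@HandlerChain(file="handler.xml") // the handler configuration file
public class GislerService {

  private static final String FORMAT_DD_MMM_YYYY_HH_MM_SS = "dd. MMM yyyy HH:mm:ss";

  @WebMethod
  public String formatDate(ObjectModel om){
    // SimpleDateFormat is not thread-safe, so a new instance is created per call
    SimpleDateFormat sdf = new SimpleDateFormat(FORMAT_DD_MMM_YYYY_HH_MM_SS);
    return "Formatted, its " + sdf.format(om.date);
  }
}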


So, in order to validate a body against an XSD (i.e. to ensure that the "ObjectModel" instance in the above method is valid before the WS framework calls your method), you use a logical handler to grab the payload as a javax.xml.transform.Source. You pass it to a javax.xml.validation.Validator, which you create using a javax.xml.validation.SchemaFactory and a javax.xml.validation.Schema based on the XSD from the WSDL. If the validation is successful, your handler returns "true"; otherwise you throw a javax.xml.ws.WebServiceException, passing it the validation exception's text, so that any client making an invalid call can work out why it's invalid. It's something like this:

import java.io.IOException;
import java.net.URL;

import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.ws.LogicalMessage;
import javax.xml.ws.WebServiceException;
import javax.xml.ws.handler.LogicalHandler;
import javax.xml.ws.handler.LogicalMessageContext;
import javax.xml.ws.handler.MessageContext;

import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class MyLogicalHandler implements LogicalHandler<LogicalMessageContext> {

  // A Schema is thread-safe, a Validator is not, so the schema is kept
  // and a fresh validator is created for each message.
  private Schema schema;

  public MyLogicalHandler(){
    try{
      long start = System.nanoTime();
      SchemaFactory schemaFac = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
      schema = schemaFac.newSchema(new URL("http://localhost:8084/Gisler/GislerServiceService?xsd=1"));
      System.out.println("created schema in " + ((System.nanoTime()-start)/1000000.0) + "ms");
    } catch (IOException e) {
      e.printStackTrace();
      throw new WebServiceException(e.getMessage()); //can't validate/initialise
    } catch (SAXException e) {
      e.printStackTrace();
      throw new WebServiceException(e.getMessage()); //can't validate/initialise
    }
  }
  .
  .
  .
  public boolean handleMessage(LogicalMessageContext context) {

    if (((Boolean) context.get(MessageContext.MESSAGE_OUTBOUND_PROPERTY)).booleanValue()) {
      return true; // only validate incoming messages
    }

    try{
      LogicalMessage lm = context.getMessage();
      Source payload = lm.getPayload();
      schema.newValidator().validate(payload);
      System.out.println("validated ok");
    }catch(SAXParseException e){
      System.out.println("validation failed: " + e.getMessage());
      throw new WebServiceException(e.getMessage()); //invalid, so tell the caller!
    } catch (SAXException e) {
      e.printStackTrace();
      throw new WebServiceException(e.getMessage()); //can't validate
    } catch (IOException e) {
      e.printStackTrace();
      throw new WebServiceException(e.getMessage()); //can't validate
    }

    return true;
  }
}
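
The validation step itself can be tried without deploying anything. The following is a minimal sketch, not the code from the attached projects: it assumes a hypothetical inline schema (a required "name" and an optional "date" of type xs:dateTime) in place of the XSD served with the WSDL, and the class and method names are made up for illustration. It shows that a missing required element and a present but invalid optional date are both rejected by the Validator.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

import org.xml.sax.SAXException;

class PayloadValidationSketch {

    // Hypothetical schema: "name" is required, "date" is optional (minOccurs="0").
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='objectModel'>"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='date' type='xs:dateTime' minOccurs='0'/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    // Compiles the inline XSD once; the resulting Schema can be shared.
    static Schema compileSchema() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema could not be compiled", e);
        }
    }

    // Returns null if the payload is valid, otherwise the validation message.
    static String validate(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return String.valueOf(e.getMessage());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = compileSchema();
        String[] payloads = {
            "<objectModel><name>a</name><date>2010-09-01T12:00:00</date></objectModel>",
            "<objectModel><date>2010-09-01T12:00:00</date></objectModel>",  // required name missing
            "<objectModel><name>a</name><date>not a date</date></objectModel>"  // optional date invalid
        };
        for (String payload : payloads) {
            String error = validate(schema, payload);
            System.out.println(error == null ? "valid" : "invalid: " + error);
        }
    }
}
```

In the handler, the same call is made on the Source returned by getPayload() instead of on a string.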


The attached Eclipse Projects were tested using GlassFish v3 (Glassfish Tools Bundle for Eclipse 1.2), and they worked well. To test required but missing elements, I wrote a JUnit test, which is in the client project. To test badly formed dates, I captured the SOAP HTTP Requests from the JUnit using the Eclipse TCP/IP Monitor, and modified them before resending them, with an invalid date string. Note that to make this work, you also need to modify the HTTP header's "content-length" using the monitor, otherwise you get some very strange errors, because the stream terminates early! An alternative is to use a tool like SoapUI.

Compared to GlassFish, WebSphere didn't do so well: it has a nasty bit of functionality built in. Validating using a handler works fine until your first invalid request comes in. Then it still works, until another valid request is processed. After that, it completely ignores invalid elements like dates if they are marked as optional (which is the default!). How come? Well, that took a while to work out, but basically it's optimising the incoming message by arguing that if an element is optional, and happens to be present but invalid, the invalid data can be thrown away, because it's... well, optional... OK, I couldn't believe it either, but that's what it does: in the output of the SOAP handler that logs to System.out, the invalid element was simply missing. A little searching around on the internet, and a little luck, turned up this link, which describes a bug fix in WebSphere 7 fix pack 9. By setting the system property "jaxws.payload.highFidelity=true", WebSphere guarantees that the message passed to the handlers is exactly that which came over the wire. Tests showed that it did indeed fix the problem.

So what does all this mean, in the grand scheme of SOAP things? Well, when designing a service, you need to consider how important it is to have valid data. If you allow optional fields which are strongly typed, such as dates or complex types, then you could have a problem. Without a validating handler, a caller could pass you optional but invalid data and you would never see it: your web service is given a null reference instead of the invalid data! If you don't work with optional data, then you could skip validation and just let null pointer exceptions fly around in cases where a caller has passed invalid required data. Logging the incoming and outgoing communications makes it easier to debug such problems, but it is a little impolite to build a public interface which handles invalid requests like this.
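
If you decide against a validating handler, the service method can at least fail with a clear message rather than with a NullPointerException. A minimal sketch, assuming a stand-in ObjectModel class with a public date field (the real class is not shown in this post) and using IllegalArgumentException purely for illustration:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

class NullGuardSketch {

    // Stand-in for the ObjectModel used by GislerService (assumed shape).
    static class ObjectModel {
        public Date date;
    }

    static String formatDate(ObjectModel om) {
        // A missing element, or one that was dropped before reaching the
        // service, arrives here as a null reference.
        if (om == null || om.date == null) {
            throw new IllegalArgumentException(
                "'date' is missing, or was discarded as invalid before reaching the service");
        }
        SimpleDateFormat sdf = new SimpleDateFormat("dd. MMM yyyy HH:mm:ss");
        return "Formatted, its " + sdf.format(om.date);
    }

    public static void main(String[] args) {
        ObjectModel ok = new ObjectModel();
        ok.date = new Date();
        System.out.println(formatDate(ok));

        try {
            formatDate(new ObjectModel()); // date was never set
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that the service cannot tell a missing date from an invalid one at this point, which is why the message has to mention both.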

The last consideration is performance. Creating a validator for a simple schema like in this demo only took around 20ms on my laptop. The actual validation took less than a millisecond for a valid request, and just over a millisecond for an invalid request. The reader is left to draw their own conclusions :-)
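
To reproduce this kind of measurement on your own machine, something like the following can be used. It is only a rough sketch: the inline schema and the payloads are hypothetical stand-ins for the demo's XSD, and single-shot timings on a JVM include class loading and warm-up, so treat the numbers as indicative only.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

import org.xml.sax.SAXException;

class ValidationTimingSketch {

    // Hypothetical schema with a single required xs:dateTime element.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='objectModel'>"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name='date' type='xs:dateTime'/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    // Returns { schema creation, valid request, invalid request } in milliseconds.
    static double[] measure() {
        try {
            long start = System.nanoTime();
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            double schemaMs = (System.nanoTime() - start) / 1000000.0;

            double validMs = timeValidation(schema,
                    "<objectModel><date>2010-09-01T12:00:00</date></objectModel>");
            double invalidMs = timeValidation(schema,
                    "<objectModel><date>not a date</date></objectModel>");
            return new double[] { schemaMs, validMs, invalidMs };
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Times one validation; a validation failure still counts as a completed run.
    static double timeValidation(Schema schema, String xml) throws IOException {
        long start = System.nanoTime();
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            // invalid payload, expected for the second measurement
        }
        return (System.nanoTime() - start) / 1000000.0;
    }

    public static void main(String[] args) {
        double[] ms = measure();
        System.out.println("schema created in " + ms[0] + "ms");
        System.out.println("valid request validated in " + ms[1] + "ms");
        System.out.println("invalid request validated in " + ms[2] + "ms");
    }
}
```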


© 2010 Ant Kutschera


How Brontosaurs kill Raptors

The board of a large organisation appoints a CIO who delegates technical decisions to his employees. In the Enterprise Architecture (EA) department, an expert on Business Intelligence (BI) decides that the strategy for this company is to have a single Enterprise Data Warehouse (EDW). This makes sense, because experience has shown that in the long run, it is cheaper and more efficient to build one Data Warehouse, rather than start by building "islands" and then later try to merge them together.

In the meantime, some business people working for the company have rallied the budget holders, and started a project to build a new online sales application. They need to generate some reports based on the sales from this application, in order to set themselves targets and measure their performance. But they are not experts when it comes to reporting. The business sector at which their application is aimed is cutting edge and brand new, so they don't know what to expect, they don't really know what kind of reports they will need, and they will certainly need to adapt to the market very quickly.

A project team in the IT department then get the assignment to build the application, including the report generation. Also knowing little about reporting, they start talking to EA, who set up a meeting between themselves, the development team, some people from the Quality Assurance (QA) department and some experts from the BI team. Hours of discussion later, they have reached the following conclusions:

  • Company IT Strategy states all reporting must be implemented in the BI solution, to ensure there is only one EDW,
  • The BI team are the only experts on the BI system,
  • The BI team cost a lot, because they are using SAP and consultants (how many SAP guys aren't freelance?),
  • The BI team dictate a six monthly release cycle, no exceptions,
  • The BI system is a black box, its inner workings stay secret within the BI team.


So, a simple question arises... "Our customer doesn't know exactly what they want, and because of their business requirements, they need us to be agile. What do we do and can the BI team help?", the development team ask. The first answer skirts around the issue and doesn't actually deserve being called an answer, because it doesn't answer the question. The development team make the point that the question hasn't been addressed, to which another attempt to avoid the question is made. Clearly, either these people don't understand their business's needs, or they know they can't help but won't admit it. Either way, the debate moves on.

Days later, the business guys are telling some colleagues about their cool new project over an espresso and a latte. "Hee hee hee, hoo hoo hoo...", laugh their colleagues. "We thought our reports would let us track our progress too, but the IT department screwed us over. They charged us double and delivered nothing useful. We tried to submit some change requests to get it working and all the IT department wanted was more money and they couldn't deliver the changes quick enough anyway!"

The cool business dudes ran over to the IT office, informed their development team about the new information and said, "Guys, we don't want you to use BI. Well not for the first release anyway. Maybe in a year or two. Sure, it's policy and all, but we don't know what we want, and need to gain some experience, so just hack those reports out within budget please!".

So the development team arrange another meeting with the astronauts EAs and QAs to get that unanswered question answered, and they talk about the customer's concerns. Even if it's an overreaction, the point is that another department needs to be involved, meaning more communication, less visibility and hence greater costs and risks. "Sure, no problem if the requirements are stable and known, but this project needs agility", is the point the dev team makes. "Well, no, not a chance!", comes the response.

"In fact," they continue, "you simply need to go back to the business and tell them that their idea is an architecture infringement - it goes against company policies. You won't pass QA, you won't get the go ahead to go into production, and that's the end of it", comes the reply.

A solution oriented response indeed... "Furthermore," continues an EA, "the business doesn't have the right to come into our shop and tell us where to implement things and where not to. We are the experts in the technology domain. We make these choices."

Well, his argument is that the business hired the CIO, who delegated responsibility to a BI expert, who created a non-agile policy. So the business have to swallow the argument and accept that the IT department is shit and won't give them solutions which they need, to compete in new markets... I'm not sure he realises that this is his argument, but it is, if you look at it pragmatically.

I'm not sure the cool business dudes see it that way either. I know I don't see it that way. And I know of no solution oriented architect or developer who sees it that way.

Of course I understand the reasons we have EA, QA, Processes and Policies. For large organisations they are the only way to avoid utter chaos, amongst many other reasons. Most of my working life has been spent with large organisations, even though in many cases I was allowed to implement agile solutions. But, after a long day in the office, with some frustrating decisions, I need to rant, so here is my cynical view of EA, QA, Processes and Policies, albeit somewhat harsh: these are all things which get bigger the larger a company is, and all things which let people hide from their responsibility of providing the business with the solutions it needs. They all cause large corporations to be compared to large dinosaurs. These departments seem to have forgotten one golden rule: if the business fails, those departments won't exist. In competitive markets like we have today, businesses, no matter how large, need to stay competitive, and they can do so by using IT departments which are able to deliver agile solutions. These departments also often remind me of self-perpetuating witch doctors (The Witch Doctors, ISBN 074932645X). They justify themselves by stating that the business learned the hard way and that, now they have been built up, they cannot possibly be removed or have power taken away, as if we somehow never managed to go live before they existed, or as if small companies can't exist because they don't have them.

Copyright © Ant Kutschera
