Word to XML, Then and Now

The Scream by Edvard Munch

I was lucky that last month’s XML Philly meeting didn’t trigger my post-traumatic stress syndrome. Quark’s presentation on their XML Author product took me back to the front lines: I did something similar with Word and SGML over a dozen years ago. Quark says it always produces valid XML for any schema. I can testify that it’s no small feat if true: Although Word now produces XML directly, that XML follows a generic schema that represents formatting, not semantics. Wasn’t this the schema Microsoft wanted to patent as a part of their contribution to “Open Standards”? Anyway, this is still a hard problem with no obvious solution.

Quark’s secret is that the plug-in completely replaces the implementation of the Word data model. The XML is always valid because users are always working in XML; there is no messy conversion between the flat, unstructured Word model and the deep, structured XML model. What XML Author gets from Word is the familiar GUI and a clear list of features to support, like Track Changes. In theory, this gets around several common XML acceptance problems: Users don’t have to learn a new interface, and business owners don’t have to pay for two separate word processors on everybody’s desktop.
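To make the "messy conversion" concrete, here is a minimal sketch of what inferring structure from a flat model involves. It is not how XML Author or Word works internally; the style names, the `FLAT_DOC` data, and the `nest` helper are all hypothetical, and a real document is far less tidy.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat "Word-like" document: a list of (style name, text) pairs.
FLAT_DOC = [
    ("Heading 1", "Installation"),
    ("Normal", "Unpack the archive."),
    ("Heading 2", "Windows"),
    ("Normal", "Run setup.exe."),
    ("Heading 1", "Usage"),
    ("Normal", "Start the program."),
]


def nest(paragraphs):
    """Infer nested <section> elements from flat heading styles."""
    root = ET.Element("doc")
    # Stack of (heading level, element); the root sits at level 0.
    stack = [(0, root)]
    for style, text in paragraphs:
        if style.startswith("Heading "):
            level = int(style.split()[1])
            # Close any open sections at this level or deeper.
            while stack[-1][0] >= level:
                stack.pop()
            section = ET.SubElement(stack[-1][1], "section")
            ET.SubElement(section, "title").text = text
            stack.append((level, section))
        else:
            # Body text attaches to whichever section is currently open.
            ET.SubElement(stack[-1][1], "para").text = text
    return root


xml_text = ET.tostring(nest(FLAT_DOC), encoding="unicode")
print(xml_text)
```

Even this toy version depends on authors applying heading styles consistently. A skipped heading level, or a "Normal" paragraph formatted by hand to look like a heading, quietly produces the wrong tree, which is the kind of problem that working in XML from the start is meant to avoid.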

Both justifications fall apart under closer scrutiny. Authoring XML changes how users work because of its structural requirements; in particular, cut-copy-paste from vanilla Word or between different schemas requires skill and patience because of the always-on validation. Although users won’t have extra icons on their desktops, the business will have to cough up significant licensing fees that will feel like having two separate, high-end products installed. Quark was also pushing their professional services for getting things up and running, which is both an added cost and an indication that things aren’t as simple as they seem.

Then there’s the question that always comes up at these meetings:  What if you share XML documents with people outside your company? There might be something webbie in the future, but for now let’s not even go there.

We didn’t get a live demo of the product, and an acquaintance who evaluated it warns that it’s not ready for prime time if your business depends on complex XML or heavy-duty Word features. I would also be wary of the product constantly lagging behind Word features because it is essentially reverse-engineered. It’s also an acquisition that Quark is still trying to fit into its existing product line. Still, it’s easier than trying to mimic, maintain, and synchronize XML structures in actual Word documents. I have the scars to prove it.