Return to DNJ Online home page

 



Open for business

The new file format introduced with the 2007 Microsoft Office System opens up many business opportunities. Matt Nicholson finds out who is doing what.

Author: Matt Nicholson

Last updated: Jun 2007

The official name for the most recent version of Microsoft’s office productivity suite is not ‘Microsoft Office 2007’ but ‘the 2007 Microsoft Office system’. This is rather more cumbersome, but there is a reason for the change: with this release, Microsoft’s primary focus has been to provide a set of tools not just for producing stand-alone documents but for automating your office as a system.
     In most offices documents are not just written: they are drafted, compiled, corrected, updated, checked, approved, authenticated, tracked, delivered, archived and retrieved. Standard paragraphs are pulled in from other documents; data is inserted from databases or spreadsheets; comments are added, accepted or rejected and then cleaned out. Documents, spreadsheets and slide presentations have always had a lifecycle and the 2007 Microsoft Office system has the management of this lifecycle at its heart.
     A central component here is the Ecma Office Open XML file format that is now native to Office Word, Excel and PowerPoint 2007. As we have seen in our earlier articles, this is an open format, based on established industry standards in XML and ZIP, which has been designed around the notion that third-party applications should be able to access and edit parts within the document that are specific to their needs, without having to parse the whole document.
     Until recently, Microsoft Office used a proprietary data format that could only be manipulated by the associated Office application. Office 2003 did introduce WordprocessingML and SpreadsheetML but these essentially saved the document as a single flat file of XML data, without the internal structure of Open XML. Furthermore, they had yet to be ratified as industry standards, which meant third parties were reluctant to commit fully to their use for fear that Microsoft might later make changes that could damage their applications.
     This was the thinking behind the European Commission’s call in May 2004 for Microsoft to submit its XML formats “to an international standards body of their choice”, stating: “... standardisation initiatives will ensure not only a fair and competitive market but will also help safeguard the interoperability of implementing solutions whilst preserving competition and innovation.” Now that Open XML is an industry standard, developers can work with it with greater confidence.

The internal structure of the Open XML format makes it possible to change the content of a document without having to parse the main document part itself. By editing the appropriate relationship we can swap Part1.xml content for Part2.xml within the document.
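The relationship swap described above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the relationships namespace is the one defined by the Open Packaging Conventions, but the relationship type URI and the sample relationships part are hypothetical, and the part names simply follow the caption.

```python
import xml.etree.ElementTree as ET

# Namespace used by every .rels part in an Open XML package.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def retarget(rels_xml: str, old_target: str, new_target: str) -> str:
    """Point any relationship aimed at old_target to new_target instead."""
    ET.register_namespace("", RELS_NS)  # keep the default namespace on output
    root = ET.fromstring(rels_xml)
    for rel in root.findall(f"{{{RELS_NS}}}Relationship"):
        if rel.get("Target") == old_target:
            rel.set("Target", new_target)
    return ET.tostring(root, encoding="unicode")


# Hypothetical relationships part; the Type URI is made up for illustration.
SAMPLE_RELS = (
    f'<Relationships xmlns="{RELS_NS}">'
    '<Relationship Id="rId1" Type="http://example.com/hypothetical/fragment" '
    'Target="Part1.xml"/>'
    "</Relationships>"
)

updated = retarget(SAMPLE_RELS, "Part1.xml", "Part2.xml")
print(updated)
```

Only the small relationships part is parsed and rewritten here; the main document part is never opened.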

Direct access
The ability to work with Office documents without having to load the associated application can bring huge benefits in itself. Microsoft Program Manager Brian Jones, whose blog is well worth reading, cites a bank that uses Word 2000 to generate the paperwork for loan agreements. These agreements are constructed from a set of document fragments according to a set of rules. The bank currently has an installation of some 70 servers, each running Word 2000, which churns out thousands of such documents a year.
     Running Word or Excel unattended on a server so that you can manipulate Office documents through their APIs using COM Interop has become common practice, although it is fraught with problems. Such a solution is resource-intensive, and the applications can crash, so they require monitoring and automatic restarts. Furthermore, it is not a practice that is supported by Microsoft.
     Using Open XML, the bank’s requirements could be satisfied with a relatively small .NET application. Thanks to the use of relationship parts by Open XML, the application would not necessarily have to know anything about WordprocessingML to generate documents that are perfectly readable from Word 2007 or, if the Compatibility Pack has been installed, any version from Word 97 onwards. Furthermore, the same throughput could probably be handled by a single machine.
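To give a feel for how little is needed to produce a Word 2007 document without Word on the server, here is a hedged sketch in Python rather than .NET. It assumes a package needs only three parts (content types, root relationships and the main document part), and the function name `build_docx` and the sample paragraphs are illustrative. Unlike the relationship-based approach described above, this sketch does write a few WordprocessingML tags directly so that it stays self-contained.

```python
import io
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)

ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)


def build_docx(paragraphs):
    """Return the bytes of a minimal .docx with one paragraph per string."""
    body = "".join(
        f"<w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p>" for text in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as pkg:
        pkg.writestr("[Content_Types].xml", CONTENT_TYPES)
        pkg.writestr("_rels/.rels", ROOT_RELS)
        pkg.writestr("word/document.xml", document)
    return buf.getvalue()


# Hypothetical fragments standing in for the bank's standard paragraphs.
docx_bytes = build_docx(["Loan agreement", "Terms & conditions"])
print(len(docx_bytes), "bytes written to an in-memory package")
```

Writing `docx_bytes` to a file with a `.docx` extension should give something Word 2007 can open, though a production system would add styles, document properties and error handling.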
     BPM Suite 4.5 is the latest version of Bluespring Software’s .NET application for designing, managing and monitoring business processes. Thanks to Open XML, this version no longer requires Microsoft Office to be installed alongside BPM Engine on the server, and users no longer need to include proprietary Bluespring tags in their documents. The suite uses Open XML to create Excel 2007 and Word 2007 documents on the fly and tailor them appropriately.
     Mindjet makes MindManager, a package that you use to create visual ‘maps’ for brainstorming and visualising business processes. MindManager has long had the ability to export its maps as Microsoft Word documents or PowerPoint presentations. However, it has been limited in its ability to take changes made in Word or PowerPoint and incorporate them back into its maps.
     The Open XML format makes this much easier as MindManager can directly generate and read the necessary documents – indeed Richard Barber of Mindjet told us that tests had shown they could create Word 2007 documents ten times faster using Open XML than using the previous COM interface. Furthermore, MindManager Pro 6 is able to use Open XML to create ‘roundtrip’ buttons that are added to the Office 2007 ribbon menu when the document is opened (see ‘Customising the user interface’).
     The Word 2007 Map Editor for Mindjet MindManager actually embeds the MindManager map as a custom XML part within a macro-enabled Open XML document. In addition to this are parts defining the new controls that are to appear in the Word 2007 ribbon, the macros needed to make them work, and an XSL transformation that updates the embedded map by merging it with the Word document.

Separation of parts
CODA develops financial management software for medium and large organisations. CODA applications have linked to Office products quite extensively in the past, and particularly to Excel because it is the tool of choice for most accountants. CODA is currently working on project Neon and is testing Open XML for use in a future release.
     One aspect of Neon that could benefit from Open XML is the procurement process. This process can be initiated by emailing each potential supplier an RFQ (Request for Quote). This email contains a small Excel 2007 spreadsheet in which suppliers enter the prices, delivery dates and so forth that make up their quotes. When the email is returned the application automatically extracts the XML data from the RFQ spreadsheet and adds it to a master spreadsheet that is collating the quotes from each supplier. The user can then choose a supplier by clicking on a custom control in the Excel 2007 ribbon which updates the system with the necessary details.
     “Being able to transport XML data in and out of a spreadsheet means we can use it as a data source without having to do much in terms of translation,” says Tim Tribe, Head of Product Management. “Open XML will make it easier to present the data to users in formats they feel comfortable with, such as Microsoft Excel and Word.”
     The support in Open XML for custom data parts will allow CODA to include tracking information with each RFQ spreadsheet. CODA is also looking to use Word 2007 for order entry. Here a custom data part helps keep the Word document synchronised with the central database. “Having custom XML parts allows us to put information in a format that our application can easily deal with, away from the main data that is displayed to the user, and then reference it when we need to.”
     Here CODA has found the Packaging API which comes with .NET 3.0 very useful. As Tribe told us, “You can read the XML directly but it’s a lot more difficult to parse without the Packaging API. It helps separate out the data you need.”
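The idea of pulling custom XML data out of a package without touching the main content can be sketched as follows. This uses Python's standard zipfile module as a stand-in for the .NET Packaging API, and it assumes the custom parts follow the customXml/itemN.xml naming that Office conventionally uses; a more robust version would follow the package relationships instead. The quote data and the demo package are entirely hypothetical, and the demo package is not a complete spreadsheet.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# Conventional location of custom XML parts (an assumption, not a rule);
# the pattern deliberately excludes the itemPropsN.xml companion parts.
CUSTOM_PART = re.compile(r"customXml/item\d+\.xml")


def read_custom_parts(package_bytes):
    """Return {part name: parsed root element} for each custom XML part."""
    parts = {}
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        for name in pkg.namelist():
            if CUSTOM_PART.fullmatch(name):
                parts[name] = ET.fromstring(pkg.read(name))
    return parts


def make_demo_package():
    """Build a stand-in package holding one hypothetical RFQ quote."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr("xl/workbook.xml", "<workbook/>")
        pkg.writestr(
            "customXml/item1.xml",
            '<quote supplier="Example Ltd"><price currency="GBP">120.50</price></quote>',
        )
        pkg.writestr("customXml/itemProps1.xml", "<props/>")
    return buf.getvalue()


quotes = read_custom_parts(make_demo_package())
for name, root in quotes.items():
    print(name, root.get("supplier"), root.findtext("price"))
```

The workbook part is never parsed, which is the separation Tribe describes: the application reads only the data it put there.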
     Tim Wallis of solution integrator Content and Code also feels that the ability to present information to users in a form that can be read using Microsoft Office applications is a real benefit. Content and Code has recently rolled out an Open XML-based solution which users access through Word 2003, thanks to the Compatibility Pack. “It is transparent to the user.”

Collaborative solutions
Open XML can also open doors when it comes to building collaborative solutions. As Paul Watson, a solutions architect at Edenbrook, points out: “The ownership of style, graphics and different elements of the content may lie in different parts of the organisation, or across different organisations altogether. Regardless of what software they are using, provided it supports XML they can publish those elements back to the central workflow which then aggregates and creates the document. It’s a much more flexible system going forward.”
     Content and Code is currently working with a travel company that is extracting data from a Unix-based service and pushing it into QuarkXPress publishing software running on the Apple Macintosh from where it can be turned into printed brochures and Web pages. The company is looking to replace this with Adobe InDesign, which can understand data in XML format, for the printed brochure. On the Web side they are looking to publish the data as a Word or PDF document.
     Wallis explains: “Imagine you are shopping on-line and you shortlist four or five holidays. You don’t want a big thick brochure which is expensive to print. Instead they can print their own customised ‘My top holidays for 2007’ brochure which they can view at home. It’s that sort of document production that’s really convenient.” The customer would download an Open XML document which they would then open on their desktop. Watson adds, “Once you have your data in an XML format, turning it into Open XML is a nice easy solution.”
     Indeed Open XML can become an alternative to PDF for transferring printed documents across the Internet. In much the same way as you click on a link to download a PDF document, together with the Adobe Reader if not already installed, you can download an Open XML document, together with Word, Excel or PowerPoint Viewer 2007 if you haven’t already got Office 2007 installed. The advantage from the developer’s point of view is that Open XML documents are easier to assemble. Furthermore, it opens up the possibility of the user editing the document and returning it for further processing.
     As Wallis says, “It’s bringing the commonality that HTML has across any type of phone or browser to the document format.” Watson points out, “A lot of workflows are content-driven so, if it is in an XML format, anything that can understand XML knows where to locate the right piece of information or property of that document, and can interact with it, change it, push it to a new end-point. It does really open up the doors for driving solutions around those qualities.”

Here to stay
Standardisation means that the Open XML format is here to stay. Ratification by Ecma International means that Microsoft no longer owns the format, but instead its future will be determined by a Technical Committee whose members include not only Microsoft and other industry leaders but also organisations such as the British Library and the US Library of Congress who have a strong interest in Open XML being around for a long time to come. Ratification by ISO can only strengthen this. Developers can work with Open XML confident that their solutions will work well into the future, and certainly beyond the next version of Microsoft Office.


Small is beautiful
One of the benefits of Open XML that is often touted is its small file size, with statistics suggesting that Word 2007 documents can be a third the size of their equivalent in binary format, and Excel 2007 documents half the size. This is of course largely down to ZIP compression, but it also helps that the tags used by WordprocessingML, SpreadsheetML and PresentationML are very short, often just a single letter.
In a world where even a budget PC can boast a 160GB hard drive, this may seem irrelevant. However, the small file size comes into its own when delivering documents to mobile devices over GPRS, or to clients in places that do not have broadband.
Open XML is also handy where data connections are intermittent or unreliable, such as in the developing world. The usual solution for remote users with low-bandwidth connections is a Web site, linking them to backend services through ASP.NET or the like. However, this does depend on a reliable connection. If the information is instead delivered in Open XML documents then these can be downloaded in bulk when the connection is good, edited off-line and then returned for further processing at the server.
Furthermore, the standardisation of Open XML means that such users could be using OpenOffice.org with an Open XML add-on, so avoiding the cost of an Office 2007 installation.
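As a rough illustration of why terse, repetitive markup compresses so well, the sketch below deflates some synthetic spreadsheet-like XML using zlib, which implements the same DEFLATE algorithm that ZIP normally uses. The markup is invented for the purpose and the resulting ratio says nothing about real documents or about the statistics quoted above.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Return compressed size divided by original size for UTF-8 text."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)


# Synthetic cells with short tag names, loosely in the style of SpreadsheetML.
cells = "".join(f'<c r="A{i}"><v>{i * 2}</v></c>' for i in range(1, 1001))
sheet = f"<sheetData>{cells}</sheetData>"

ratio = compression_ratio(sheet)
print(f"compressed size is {ratio:.0%} of the original")
```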


Customising the user interface
One of the most obvious innovations in Office 2007 is the new ‘ribbon’ menu employed by Word, Excel and PowerPoint 2007. You can customise this ribbon by creating an XML part that is recognised by these applications as defining a ‘customUI’. Add this to your Open XML document, create the necessary relationship, and your customised user interface will appear when the user opens the document. Alternatively you can use the Custom UI Editor Tool which you can download from http://openxmldeveloper.org/articles/customuieditor.aspx.
This facility is not part of the Open XML specification but is an example of how the file format can be extended to include application-specific data. The screenshot below shows how Mindjet has added a MindManager tab to the Word 2007 ribbon.
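The two steps described above, adding the customUI part and creating the relationship, might look like this in Python. Treat it as a sketch under stated assumptions: the namespace and relationship type are the ones Office 2007 is understood to use for ribbon customisation, the part name customUI/customUI.xml follows the usual convention, and the tab, group, button and the `DemoMacro` callback are all hypothetical. It also assumes the package already declares a default content type for the xml extension, and a button that calls a macro would need a macro-enabled document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
UI_REL_TYPE = "http://schemas.microsoft.com/office/2006/relationships/ui/extensibility"

# Hypothetical ribbon definition: one tab holding one button.
CUSTOM_UI = (
    '<customUI xmlns="http://schemas.microsoft.com/office/2006/01/customui">'
    '<ribbon><tabs><tab id="demoTab" label="Demo">'
    '<group id="demoGroup" label="Actions">'
    '<button id="demoButton" label="Round trip" onAction="DemoMacro"/>'
    "</group></tab></tabs></ribbon></customUI>"
)


def add_custom_ui(package_bytes, ui_xml=CUSTOM_UI):
    """Return a copy of the package with a customUI part and relationship."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as src, zipfile.ZipFile(
        out, "w", zipfile.ZIP_DEFLATED
    ) as dst:
        for name in src.namelist():
            data = src.read(name)
            if name == "_rels/.rels":
                # Register the new part in the package-level relationships.
                ET.register_namespace("", RELS_NS)
                root = ET.fromstring(data)
                ET.SubElement(
                    root,
                    f"{{{RELS_NS}}}Relationship",
                    {
                        "Id": "rIdCustomUI",  # hypothetical; must be unique in the part
                        "Type": UI_REL_TYPE,
                        "Target": "customUI/customUI.xml",
                    },
                )
                data = ET.tostring(root)
            dst.writestr(name, data)
        dst.writestr("customUI/customUI.xml", ui_xml)
    return out.getvalue()
```

The function copies every existing part unchanged, so the document content itself is not parsed at any point.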

Copyright © Matt Publishing. All rights reserved. No part of this site may be reproduced without the prior consent of the copyright holder.
