Alright, here we are at the start of a new year - my first as a married individual (very exciting!) - which means it's time for a few more posts to continue my focus on extensibility features in Word.
So far, in my series of posts on content controls, I've tried to describe what they do and why we chose to add them as an important new part of Word 2007. I've done that by focusing on the way they let you control the document editing surface. But there's another part to the controls (which I think is super exciting) that I have only hinted at so far, and that's how content controls let you integrate Word documents with data coming from other sources, providing true data/view separation in Word.
The Data behind the Document
Let's look at an example document that some of you might have seen before – a legal contract to sell a piece of property:
It's pretty obvious that the document has a lot of boilerplate text (that's invariant regardless of who the parties are to the contract), and some dynamic text (the important data for a particular contract, if you will). In fact, I've already marked up the dynamic data with content controls so that it's easy to fill out in the document:
So, we can basically think of the resulting document as having two parts:
- The data
- The view (the boilerplate that surrounds the data)
That data can be handled independently of the rest of the contract – in fact, we can go and move around the content controls to completely redefine what you see on the page and the "data" of the document would be exactly the same. Now, we (as humans) know what the data of the document is just by looking at it, but how do we make it accessible in a way that's easily interoperable with other tools and code that wants to run on top of these Word documents?
To do that, we can take the data of that contract and put it in XML form, like this:
Now we've got a machine readable form (the XML) that's easily shared with any system that understands XML (of which there are many these days) and is well suited to automatic processing, and a human readable form (the document including its content controls) that's good for you and me to look at when filling out the contract. All we need is a way to link these two forms together in order to have the best of both worlds: content that is both machine readable and human readable.
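The contract XML itself isn't reproduced in this text, so as a rough illustration, here is a small Python sketch of what such a data file might look like and how little code another tool needs to read it. The element names (contract, seller, buyer, propertyAddress, purchasePrice) and the values are invented for this example, not taken from the real contract.

```python
import xml.etree.ElementTree as ET

# Hypothetical contract data: element names and values are made up for illustration.
CONTRACT_XML = """<contract>
  <seller>Jane Seller</seller>
  <buyer>John Buyer</buyer>
  <propertyAddress>123 Example Street</propertyAddress>
  <purchasePrice>250000</purchasePrice>
</contract>"""


def read_contract(xml_text):
    """Return the contract fields as a plain dictionary of tag -> text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


fields = read_contract(CONTRACT_XML)
print(fields["seller"], "sells to", fields["buyer"])
```

Nothing in that XML knows anything about Word, which is the point: any XML-aware system can consume it.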
In Word 2007, we do that in two steps:
1. Storing the data in a special space in the document called the XML data store
2. Mapping the data in that custom XML to content controls in the document
Custom XML with Office Open XML Documents
First, we load the data into the document in a way that:
- Doesn't affect the printed page
- Keeps the data in the exact form we created (the XML we see above)
- Makes it easily accessible to any tool that consumes custom XML
To do this, we load the XML into an existing Word Open XML document in one of two ways:
- Using the Word object model – specifically, the Document.CustomXMLParts.Add() method – to pull the XML into the document. The CustomXMLParts collection is the set of all custom XML documents being stored with a document (there can be any number of them – for example, SharePoint properties are also stored this way)
- Directly manipulating the Office Open XML file format and adding the custom XML as a new distinct part (this is what the Word object model does "under the hood")
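To give a feel for the second option, here is a simplified Python sketch that treats the document as the ZIP package it is and stores the custom XML as a separate part. It assumes the part name customXml/item1.xml and a made-up relationship Id (rIdCustomXml1), and it runs against a tiny stand-in package rather than a complete Word document. A real package would also carry an item properties part and matching content-type entries, which are left out here.

```python
import io
import zipfile

CUSTOM_XML_REL_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml"
)
RELS_NAME = "word/_rels/document.xml.rels"


def make_minimal_package():
    """Build a tiny stand-in package for the demo (not a complete Word document)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("word/document.xml", "<document/>")
        z.writestr(
            RELS_NAME,
            '<Relationships xmlns="http://schemas.openxmlformats.org/'
            'package/2006/relationships"></Relationships>',
        )
    return buf.getvalue()


def add_custom_xml_part(docx_bytes, xml_bytes, part_name="customXml/item1.xml"):
    """Return a copy of the package with xml_bytes stored as a separate part."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == RELS_NAME:
                # Link the main document part to the new custom XML part.
                # The Id is a hypothetical value chosen for this sketch.
                rel = '<Relationship Id="rIdCustomXml1" Type="%s" Target="../%s"/>' % (
                    CUSTOM_XML_REL_TYPE, part_name)
                data = data.replace(
                    b"</Relationships>", rel.encode("utf-8") + b"</Relationships>")
            dst.writestr(info.filename, data)
        # The custom XML is written byte for byte, so it keeps its exact form.
        dst.writestr(part_name, xml_bytes)
    return out.getvalue()


contract = b"<contract><seller>Jane Seller</seller></contract>"
package = add_custom_xml_part(make_minimal_package(), contract)
with zipfile.ZipFile(io.BytesIO(package)) as z:
    print(z.namelist())
```

The original parts are copied over untouched, so the printed page is unaffected; the data simply rides along as one more entry in the package.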
What we end up with is a document with a separate storage for our XML data, like this:
The XML still looks exactly the same as when we added it, it just now travels along with the document.
Mapping Content Controls to Custom XML
Finally, we need a way to associate the elements in the data with individual content controls, which is called XML mapping in Word. We do this by establishing a link between the control and an XML element or attribute, supplying one or two pieces of information:
- An XPath expression which uniquely targets the element we want to map to
- (Optionally) a specific piece of custom XML on which to evaluate the XPath. If this is omitted, Word will try each available piece of data in turn until it finds a match.
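The lookup behavior described here can be sketched in a few lines of Python. This is only an illustration of the idea, not how Word implements it: the part ids and XML are hypothetical, and it uses the limited XPath subset in Python's ElementTree (paths relative to each part's root element, elements only), whereas a real mapping would use a full XPath expression.

```python
import xml.etree.ElementTree as ET


def resolve_mapping(xpath, parts, part_id=None):
    """Find the element a mapping points at.

    parts: dict of hypothetical part id -> XML text, in store order.
    xpath: path relative to each part's root element.
    Returns (part id, element), or None if nothing matches.
    """
    # With an explicit part, look only there; otherwise try each part in turn.
    candidates = [part_id] if part_id is not None else list(parts)
    for pid in candidates:
        root = ET.fromstring(parts[pid])
        node = root.find(xpath)
        if node is not None:
            return pid, node
    return None


parts = {
    "properties": "<properties><title>Contract</title></properties>",
    "contract": "<contract><seller><name>Jane Seller</name></seller></contract>",
}
match = resolve_mapping("seller/name", parts)
print(match[0], match[1].text)
```

Naming the part explicitly avoids surprises when two pieces of custom XML happen to share a similar structure.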
To do this, we again:
- Use the Word object model – specifically, the ContentControl.XMLMapping.SetMapping() method – to specify the XPath expression
- Directly manipulate the Office Open XML file format and add the mappings as a property of any content control's <sdtPr> element
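For the file format route, the mapping is recorded as a dataBinding child of the control's <sdtPr> element. As a hedged sketch, the Python below builds such an element with ElementTree. The XPath and the all-zero GUID are placeholders invented for this example; in a real package the store item ID would need to match the ID recorded for the custom XML part.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def build_sdt_properties(xpath, store_item_id):
    """Build a <w:sdtPr> element carrying an XML mapping for one content control."""
    sdt_pr = ET.Element("{%s}sdtPr" % W_NS)
    binding = ET.SubElement(sdt_pr, "{%s}dataBinding" % W_NS)
    # The XPath that targets the mapped node in the custom XML.
    binding.set("{%s}xpath" % W_NS, xpath)
    # Identifies which custom XML part the XPath is evaluated against.
    binding.set("{%s}storeItemID" % W_NS, store_item_id)
    return sdt_pr


# Placeholder values, chosen only for illustration.
sdt_pr = build_sdt_properties(
    "/contract/seller/name", "{00000000-0000-0000-0000-000000000000}")
print(ET.tostring(sdt_pr, encoding="unicode"))
```

The content control's other properties (its title, tag, and so on) would sit alongside the binding inside the same <sdtPr> element.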
Now what we have is a document with distinct data and presentation, but lots of links between the two:
Those links give us that "best of both worlds" I talked about – now, the document can be manipulated from either perspective:
- When the user types into the controls, the corresponding data in the data store is updated in real time (so the custom XML is always live and up to date). This means that finding out the "data" of the document is as simple as pulling out the appropriate XML data store part.
- When the data is updated inside or outside of Word, the corresponding controls are updated – so the contract that you see can be changed simply by editing the custom XML that lives with the document. That custom XML has no Word-specific information in it, and is therefore extremely easy to read and/or write.
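To show how simple "pulling out the data" can be, here is a minimal Python sketch that reads the custom XML parts straight out of a package without touching the document body. It assumes the parts are named customXml/itemN.xml, which is the conventional naming but not a guarantee; more robust code would follow the package relationships instead. The demo package is a tiny stand-in built in memory.

```python
import io
import re
import zipfile


def read_custom_xml_parts(docx_bytes):
    """Return {part name: XML bytes} for the custom XML items in a package."""
    # Matches item1.xml, item2.xml, ... but not the itemPropsN.xml companions.
    item_pattern = re.compile(r"^customXml/item\d+\.xml$")
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return {
            name: package.read(name)
            for name in package.namelist()
            if item_pattern.match(name)
        }


# Build a stand-in package in memory (not a complete Word document).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as z:
    z.writestr("word/document.xml", "<document/>")
    z.writestr("customXml/item1.xml", "<contract><buyer>John Buyer</buyer></contract>")
    z.writestr("customXml/itemProps1.xml", "<datastoreItem/>")

for name, xml_bytes in read_custom_xml_parts(demo.getvalue()).items():
    print(name, xml_bytes.decode("utf-8"))
```

Going the other way, writing a changed XML file back into that part, is the same kind of operation in reverse.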
I know I'm understating it, but if you've ever tried to get data in or out of Word documents, this is a HUGE step forward (along with the Open XML Standard) as it makes getting this information in/out of documents vastly simpler than it was before.
I think I've covered a lot of ground for one post, so I'll stop here. Before I continue the series (in later posts I'll dig into each of the pieces I covered here), some of the other members of the team are going to go through a real solution they built on top of this architecture, which should be a cool way to understand it better.
- Tristan