Tuesday, May 27, 2008

Programmatically Build a Word 2007 Document Part 2

In the previous post, we gained access to the Open XML object model. In this post we will take the individual parts and work with their XML to modify the contents of the document.

Once you have a part of the document, such as the MainDocumentPart or a HeaderPart, you can read its XML through a stream. The approach is the same for every part:

StreamReader sr = new StreamReader(mainDoc.GetStream());

If you want to simply replace a block of text with another block of text, you can pull back the XML as a string:

String docText;
docText = sr.ReadToEnd();

Then create a Regular Expression to search for the text you want to replace:

Regex regexText = new Regex("Hello world!");
docText = regexText.Replace(docText, "Hi Everyone!");

To write back the results of your XML change, use a StreamWriter. Opening the part's stream with FileMode.Create truncates the old content first:

StreamWriter sw = new StreamWriter(mainDoc.GetStream(FileMode.Create));
sw.Write(docText);
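Putting the read, replace, and write steps together, a minimal sketch might look like the following. It assumes the later Open XML SDK (2.0 or newer), where the namespace is DocumentFormat.OpenXml.Packaging; the class name MainPartTextReplacer and its Replace method are hypothetical names used only for illustration.

```csharp
using System.IO;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Packaging; // assumption: Open XML SDK 2.0 or later

public static class MainPartTextReplacer
{
    // Hypothetical helper: replaces every literal occurrence of `find`
    // in the raw XML of the main document part.
    public static void Replace(string docxPath, string find, string replacement)
    {
        // true = open for editing; disposing the document closes the package.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxPath, true))
        {
            MainDocumentPart mainDoc = doc.MainDocumentPart;

            string docText;
            using (StreamReader sr = new StreamReader(mainDoc.GetStream()))
            {
                docText = sr.ReadToEnd();
            }

            // Regex.Escape makes the search literal; "$$" keeps a literal '$' in the replacement.
            Regex regexText = new Regex(Regex.Escape(find));
            docText = regexText.Replace(docText, replacement.Replace("$", "$$"));

            // FileMode.Create truncates the part so no stale bytes are left behind.
            using (StreamWriter sw = new StreamWriter(mainDoc.GetStream(FileMode.Create)))
            {
                sw.Write(docText);
            }
        }
    }
}
```

Two caveats apply to this kind of string replacement. Word often splits a sentence across several runs, so the text you are searching for may not appear as one contiguous string in the XML. The replacement text is also written as raw XML, so characters such as & and < must already be escaped.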

You can also use XPath queries to reach specific nodes, so that you can delete blocks of text or add new ones. XPath is a query language for navigating an XML document and pulling back just the data you want. It is too deep a subject for this post, but I may cover it in another one. With the release of .NET 3.5, Visual Basic also lets you type XML inline as XML literals, so you could build the nodes of your document that way as well.
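As a small taste of the XPath approach, the sketch below selects every w:t (text) element in a part's XML and rewrites its content. It uses only System.Xml; the class name XPathTextSketch and the method ReplaceInTextNodes are hypothetical, and the input is assumed to be the XML string read from the part as shown earlier.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class XPathTextSketch
{
    // Namespace URI of WordprocessingML, bound to the "w" prefix below.
    private const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: replaces `find` inside each w:t element and returns the new XML.
    public static string ReplaceInTextNodes(string partXml, string find, string replacement)
    {
        XmlDocument doc = new XmlDocument();
        doc.PreserveWhitespace = true; // keep the spacing Word wrote
        doc.LoadXml(partXml);

        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("w", W);

        // Copy the matches to a list first so the document can be edited safely while looping.
        List<XmlNode> textNodes = doc.SelectNodes("//w:t", ns).Cast<XmlNode>().ToList();
        foreach (XmlNode t in textNodes)
        {
            if (t.InnerText.Contains(find))
            {
                // Setting InnerText escapes special characters such as & and <.
                t.InnerText = t.InnerText.Replace(find, replacement);
            }
        }
        return doc.OuterXml;
    }
}
```

Because the edit goes through the XML DOM rather than a raw string, the replacement text is escaped for you and markup outside the w:t elements is left untouched.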

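Once you have written the data to the streams, close the StreamWriter so its buffered text reaches the part, then flush the package and close it, in that order, to make sure the data is written to the underlying stream and/or file. Flushing a package that has already been closed will fail.

sw.Close();
Pack.Flush();
Pack.Close();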

Building a Word document programmatically is an involved process, but the benefits of the Office Open XML formats make it worthwhile to learn this new API and take advantage of it.


Monday, May 19, 2008

Programmatically Build a Word 2007 Document Part 1

Today I was assigned the task of building a Word 2007 document programmatically and then uploading that document to a SharePoint document library. The second part was not that hard, as I had already written a block of code that uploads a Word document to a SharePoint library. Building the Word document shouldn't be that hard either, as Office 2007 documents are essentially XML files packaged together. I was able to build a simple document with text in it. The problem came when I tried to add a header to the document.

The new XML format for Office files makes them interesting to build. Each part of the document lies in its own separate XML file, and there are relationship files that determine how all of the files relate to each other within the package. If you rename a typical .docx, .pptx, or .xlsx file to a .zip extension, you can open the file in WinZip or as a Windows compressed folder. The first thing you will notice in a .docx package is that there is a _rels folder and a word folder. The _rels folder contains all of the package-level relationships and determines where the rest of the data files lie. The word folder contains these data files. Within the word folder is a relationship file and at least a document.xml. There could also be a styles.xml and header or footer files such as header1.xml or footer1.xml, among others. The folder structure will look similar to the following drawing.

The relationship file that resides in the word folder relates the various pieces that make up a Word document: the main document, the header, the footer, and any styles related to the content in the document.
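To see these parts and relationships without renaming the file, you can also walk the package in code. The sketch below uses the System.IO.Packaging API that ships with .NET 3.0 and later; the class name PackageLister and its Describe method are hypothetical names chosen for this example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging; // WindowsBase.dll on .NET Framework

public static class PackageLister
{
    // Hypothetical helper: returns one line per package-level relationship and per part.
    public static List<string> Describe(string path)
    {
        List<string> lines = new List<string>();

        // Open read-only so the file on disk is never modified.
        using (Package package = Package.Open(path, FileMode.Open, FileAccess.Read))
        {
            // Package-level relationships come from the _rels folder at the package root.
            foreach (PackageRelationship rel in package.GetRelationships())
            {
                lines.Add("relationship: " + rel.RelationshipType + " -> " + rel.TargetUri);
            }

            // Each part is one file inside the package, such as /word/document.xml.
            foreach (PackagePart part in package.GetParts())
            {
                lines.Add("part: " + part.Uri + " (" + part.ContentType + ")");
            }
        }
        return lines;
    }
}
```

Calling PackageLister.Describe on a .docx file and printing each line should show the same layout you see when browsing the renamed .zip file.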

To work with this package structure more efficiently, you can download the Microsoft SDK for Open XML Formats. The Microsoft.Office.DocumentFormat.OpenXml.Packaging namespace exposes classes that make it easier to construct or modify a Word 2007 document.

The first class you will want to access is the WordprocessingDocument class. You create an object in code similar to this:

WordprocessingDocument myDocument = WordprocessingDocument.Open(myPackage);

where myPackage is a System.IO.Packaging.Package. You can also create a WordprocessingDocument from a Stream or a file.

Once you have the WordprocessingDocument, you can get back the main part of the document through the MainDocumentPart:

MainDocumentPart mainDoc = myDocument.MainDocumentPart;

The MainDocumentPart represents the main part of the document which resides in the document.xml file.

You can also gain access to headers and footers:

IEnumerable<HeaderPart> headerParts = mainDoc.HeaderParts;

The HeaderParts property returns an IEnumerable of the part type, here HeaderPart, which you can then enumerate with a foreach loop. Headers and footers are collections because a document can have multiple headers and footers.
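A minimal sketch of enumerating the header parts and reading the XML of each one might look like this. It assumes the later Open XML SDK namespace DocumentFormat.OpenXml.Packaging and opens the document from a file path; the class name HeaderReader and the method ReadHeaderXml are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using DocumentFormat.OpenXml.Packaging; // assumption: Open XML SDK 2.0 or later

public static class HeaderReader
{
    // Hypothetical helper: returns the raw XML of every header part in the document.
    public static List<string> ReadHeaderXml(string docxPath)
    {
        List<string> headers = new List<string>();

        // false = open read-only; nothing is written back to the file.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxPath, false))
        {
            MainDocumentPart mainDoc = doc.MainDocumentPart;

            // A document can hold several headers, so each one is read in turn.
            foreach (HeaderPart header in mainDoc.HeaderParts)
            {
                using (StreamReader reader = new StreamReader(header.GetStream()))
                {
                    headers.Add(reader.ReadToEnd());
                }
            }
        }
        return headers;
    }
}
```

The same pattern should work for footers by looping over mainDoc.FooterParts instead.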

In the next part I will get back the XML and modify it.
