File formats and structures
Before we begin, note that the familiar “.doc” file extension has been used for many different Microsoft Word file formats over the years. There are in fact only four major file formats, but each has several variations to cater for features introduced across the more than 20 major versions of Microsoft Word to date. The underlying structure of the saved file determines how numbering behaves in the product, and what goes wrong with it.
The four major file formats are:
- Word for DOS
- Word for Windows 1 and 2 and Mac Word 4 and 5
- Word 6 and 95 and Mac Word 6
- Word 97, 2000 and 2002 and Mac Word 98, 2001 and X
Within these formats, there are two basic “structures”: the original “text+formatting” structure, and the newer “OLE structured storage”.
Text + formatting formats
Word for DOS and the first versions of Word for Windows and Mac Word produce files that basically contain the text of the document in plain ANSI, followed by binary data that specifies the formatting. In the early 1980s this was a great advance on the traditional word processor file formats, which put the formatting commands in among the text.
In these formats, the numbering exists in the text as real text and remains stable once created. Unfortunately, people rarely use these versions of Word these days.
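To make the idea concrete, here is a minimal sketch of a “text first, formatting afterwards” layout. It is an illustration only, not the real Word for DOS file layout: the FormatRun record, its fields and the bold_spans helper are all hypothetical names invented for this example. The point it shows is that the paragraph numbers are ordinary characters in the text, so nothing has to be recalculated when the file is opened.

```python
from dataclasses import dataclass


@dataclass
class FormatRun:
    """Hypothetical formatting record stored after the text."""
    start: int          # offset of the first character the run applies to
    length: int         # number of characters the run covers
    bold: bool = False


# The whole document text comes first. The numbers "1." and "2." are
# real characters typed into the text, not calculated fields.
text = "1. Introduction\r2. Scope\r"

# The formatting follows the text and refers back to it by offset.
runs = [FormatRun(start=0, length=15, bold=True)]


def bold_spans(text, runs):
    """Return the pieces of text that the bold runs point at."""
    return [text[r.start:r.start + r.length] for r in runs if r.bold]


print(bold_spans(text, runs))
```

Because the formatting only points at offsets in the text, editing the formatting records can never change a number; the number changes only if someone retypes it.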
OLE Structured Storage
To make full use of linked and embedded objects from other applications (such as pictures, charts, or spreadsheets), Word versions 6 and later use OLE structured storage files. Each file is a series of containers nested one inside another, like Chinese boxes. For example, each paragraph is a container of characters. Each Section Break is a container of paragraphs. Each document is a container of Section Breaks.
Each container contains “something” (it may be a stream of characters, a set of Word drawing objects, or a spreadsheet, picture or other binary object). The container also contains one or more “pointers” that connect it with property tables that determine the formatting and behaviour of the contained “something”.
In Word versions 6 and higher, numbering is expressed as a property: a little container that holds a marker saying where in the text the number is to print, and a pointer to an entry in a list of numbering formats that determines what kind of number it is and how it increments. It is important to understand that this property does not contain the actual number. Word works the number out “on the fly” just before it displays or prints the document.
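The following sketch illustrates the principle under simplifying assumptions. It is not Word's actual data structure or algorithm: ListFormat, Paragraph, list_format_index and render are hypothetical names chosen for this example. Each numbered paragraph stores only a pointer into a table of numbering formats, and the visible number is calculated each time the document is rendered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ListFormat:
    """Hypothetical entry in the table of numbering formats."""
    template: str       # how the number is displayed, e.g. "{n}."
    start_at: int = 1   # first value in the sequence


@dataclass
class Paragraph:
    text: str
    # Pointer into the table of numbering formats; None means the
    # paragraph is not numbered. Note that no number is stored here.
    list_format_index: Optional[int] = None


def render(paragraphs, list_formats):
    """Work out the numbers "on the fly" while producing the output."""
    counters = {}   # next value for each numbering format in use
    lines = []
    for p in paragraphs:
        if p.list_format_index is None:
            lines.append(p.text)
            continue
        fmt = list_formats[p.list_format_index]
        n = counters.get(p.list_format_index, fmt.start_at)
        counters[p.list_format_index] = n + 1
        lines.append(fmt.template.format(n=n) + " " + p.text)
    return lines


formats = [ListFormat(template="{n}.")]
doc = [
    Paragraph("Introduction", 0),
    Paragraph("An unnumbered note"),
    Paragraph("Scope", 0),
]

print(render(doc, formats))
print(render(doc[1:], formats))   # same paragraphs, first one deleted
```

In the second call, “Scope” is displayed as number 1 even though nothing in that paragraph was edited. The displayed number depends on the paragraphs before it and on the format entry the pointer leads to, not on anything saved with the paragraph itself.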