WordPerfect to Word converters (and why none of them are perfect)

Article contributed by Hafiz

I worked in Word Support at Microsoft and saw the evolution of WinWord from v1.1a through Word97. WordPerfect conversion was one of the biggest challenges/headaches that we faced, the others being envelope printing and print-merge. The latter two were eventually worked out, especially as printer manufacturers improved their media handling and wrote printer ROMs to support standard envelope sizes. However, WordPerfect conversion remains a hugely thorny problem.

The root of it (from a Word perspective) is in WordPerfect's document structure, with its embedded codes. This is a completely different theory of what constitutes a document, at a very basic level. Folks, this isn't “apples and oranges,” it's more like “apple” and “triangle.” WordPerfect looks at a document as a stream running from top to bottom; the embedded codes change the behavior of that stream as it passes through a theoretical window, whether that window is the onscreen view or the printer driver (which, for WordPerfect's first Windows versions, was still proprietary because they couldn't shoehorn their format into Windows' printing model very successfully).
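To make the stream idea concrete, here is a minimal sketch in Python. It is an illustration only, not WordPerfect's actual file format: the `Code` class, the attribute names, and `render_stream` are all hypothetical. The point it shows is that the formatting of any piece of text depends on every code that came before it in the stream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Code:
    """A hypothetical embedded formatting code (not a real WordPerfect code)."""
    attr: str
    value: object


def render_stream(stream):
    """Walk the stream top to bottom.

    Codes change the current state; text is emitted with a snapshot of
    whatever state is in force when it passes through the "window".
    """
    state = {}
    out = []
    for item in stream:
        if isinstance(item, Code):
            state[item.attr] = item.value
        else:
            out.append((item, dict(state)))
    return out


# Text and codes are interleaved in one sequence.
stream = ["Dear ", Code("bold", True), "Sir", Code("bold", False), ", hello"]
rendered = render_stream(stream)
```

Note that moving "Sir" elsewhere in this model means moving its surrounding codes along with it, or its formatting changes.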

Word looks at a document as a structure, the basic building blocks being character, paragraph, and section. The text is stored in the file as pretty much plain text (with a very few oddments of control characters and the occasional field code) and the formatting itself is stored in binary form at the bottom of the disk image of the file with pointers into various offsets in the text. (This is why there is no “reveal codes” in Word; there are literally no codes to reveal. It is also an extremely efficient way to define formatting; there are no moving parts, only values to be changed in the formatting tables as the document changes. It also makes the retaining of formatting (or merging of formatting) during drag-and-drop editing (first seen in Word 2) much easier to implement. WordPerfect has to actually move strings of characters – its formatting codes – around in the document, and this imposes a lot of overhead. This has changed somewhat in recent versions, as WordPerfect has moved to a more pointer-based format.)
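The same idea can be sketched in Python. This is a simplified illustration under stated assumptions, not Word's real binary layout: the `Run` class and the helper functions are hypothetical, and real Word formatting tables are far richer. What it shows is that the text stays plain and formatting lives in a separate table of offsets into that text.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical formatting-table entry: properties for text[start:end]."""
    start: int
    end: int
    props: dict


# The text itself carries no codes at all.
text = "Dear Sir, hello"

# The formatting table points into the text by offset.
runs = [Run(5, 8, {"bold": True})]


def props_at(runs, offset):
    """Look up the formatting in force at a character offset."""
    merged = {}
    for run in runs:
        if run.start <= offset < run.end:
            merged.update(run.props)
    return merged


def set_props(runs, start, end, props):
    """Change formatting by adding a table entry; the text is never touched."""
    runs.append(Run(start, end, dict(props)))


set_props(runs, 10, 15, {"italic": True})
```

Changing formatting here only edits the table, which is the "no moving parts" property described above.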

When you try to move between these models, there are inevitable breakdowns. There are things in both products that simply don't have an equivalent in the other. Add to that the fact that a converter must make several passes on the WordPerfect document to resolve all the embedded codes, and it becomes a very difficult process to do accurately. Within the industry, WordPerfect is considered the most difficult format to convert accurately by far; it's not that WordPerfect intended to make it that way – it just is. In conversations we had with (then) Aldus Corp. technicians over this (and other technical support war stories), we knew they had the same kinds of problems. PageMaker has a pointer-based format similar to Word, and Aldus finally just called it good and lived with a limited converter from WordPerfect to PageMaker, citing a lengthy list in a Readme about things that wouldn't convert.
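As a rough sketch of why conversion takes more than one pass, the Python below turns a toy code stream into plain text plus offset-based runs. Everything here is an assumption made for illustration: codes are modeled as hypothetical `(attribute, value)` tuples, with `None` meaning "off", and `stream_to_runs` is not how any real converter is written. A real converter has to handle vastly more cases than this.

```python
def stream_to_runs(stream):
    """Convert a toy code stream into (plain_text, runs).

    Each run is (start, end, attr, value), with offsets into plain_text.
    """
    # Pass 1: separate text from codes, noting the text offset at which
    # each code takes effect.
    text_parts = []
    events = []  # (offset, attr, value)
    offset = 0
    for item in stream:
        if isinstance(item, tuple):
            attr, value = item
            events.append((offset, attr, value))
        else:
            text_parts.append(item)
            offset += len(item)
    text = "".join(text_parts)

    # Pass 2: pair each code with whatever later code (or end of text)
    # terminates it, producing offset ranges.
    runs = []
    open_at = {}  # attr -> (start offset, value)
    for off, attr, value in events:
        if attr in open_at:
            start, old_value = open_at.pop(attr)
            if off > start:
                runs.append((start, off, attr, old_value))
        if value is not None:
            open_at[attr] = (off, value)
    for attr, (start, value) in open_at.items():
        if len(text) > start:
            runs.append((start, len(text), attr, value))
    return text, runs


stream = ["Dear ", ("bold", True), "Sir", ("bold", None),
          ", hello", ("italic", True), "!"]
plain, runs = stream_to_runs(stream)
```

Even in this toy version, a code's extent cannot be known until later codes have been read, and anything the run table cannot express would simply be lost.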

In versions of Word through Word2c, a firm specializing in format conversions wrote the WordPerfect converter for Microsoft. It never quite cut the mustard. During the development of Word6, it was decided that we bail on that converter, since accurate WordPerfect conversions, especially round-trip conversions (which add a whole 'nother magnitude of complexity to an existing nightmare), had to form one of the cornerstones of getting market share from WordPerfect. A huge chunk of money and person-hours was budgeted to making the best possible WordPerfect converter (which was rewritten from scratch by Microsoft itself), and it says something that the WordPerfect converter binary is something like two-and-a-half times the size of any of the others. Word Support collected literally thousands of users' real world WordPerfect documents and put them through the converter to smoke out its bugs and deficiencies, and there are dozens of combinations of formatting that were “special-cased” by the converter. (It helped that the designated liaison between Support and Development was quite knowledgeable about WordPerfect and was quite willing to make Development look at areas where WordPerfect did things better than Word.)

This really is as good as it gets. Believe me.

With WordPerfect documents that are skilfully constructed by knowledgeable WordPerfect users, the converter is pretty durn good, even in round-trip conversions. It is drastically better than any from third parties and far better than the Word converter used by WordPerfect itself. However, WordPerfect's embedded code formatting instructions lend themselves to a degree of convolution in unskilful hands that just makes my head hurt, and there is nothing that can be done about this and about the features idiosyncratic to WordPerfect (often themselves artefacts of their formatting model) that have no near equivalents in Word, and vice versa.

I hope this gives you a better idea of what the issues are. In some cases, this conversion just will not go smoothly. If you have a WordPerfect–Word conversion problem, do contact Word Support about it, especially if you have real world documents that you can submit. I assure you they will be carefully scrutinized.

Word Support, 1990–1997

See also:
How Word differs from WordPerfect
Is there life after "Reveal Codes"?