How do I generate an index in Word?

Article contributed by John McGhie

The Microsoft Word Help suggests that you can automatically generate an index.  Sorry, but you can't (the "result" looks like an index, but the reader can't use it).  You can automatically mark index entries: however, the amount of work required to edit the result into a useable index is usually double the effort required to manually mark the index entries one-by-one.

Instead of automatically generating something that is not useable, the reader would far prefer you to express the document electronically and provide a free text search. A free text search serves the reader's needs far better than a badly-constructed index, and the search engines available these days are smart enough to look for what the reader wanted rather than what he or she asked for.

Making an Index

An experienced technical writer wrote this article. As a technical writer, I produce long documents running to thousands of pages of technical material. Indexes are part of my game. I can't tell you how to produce one automatically, but I can tell you how to produce one easily!

Before 1990-ish, Indexing was a profession of its own; in addition to an Author and an Editor, a large book had an Indexer. Even today, if you are making a book such as a medical encyclopedia that is going to remain in print for many years, it is simply stupid not to use a professional indexer. Really good indexes are an even mix of science and art form, and the quality improvement a professional makes is well worth paying for. Of course, few of us these days work on publications that are going to last long enough to justify this effort. And even fewer of us have the time to produce such an index. If you do have the time, obtain a copy of Indexing, The Art of by G. Norman Knight (Allen & Unwin, ISBN 0-04-029002-6).  Norman Knight is a former President of The Society of Indexers, and his book is simple and charming. Reading it, you will soon realize that indexing is not difficult; it simply takes attention to detail and patience.

Planning the Job

Word has one of the nicest and most powerful index generators around built right in, so you have all the tools you are going to need. You need to allow a week per 500 pages to generate an index in a technical book. Technical publications are fairly information dense. Scholarly monographs and the like are usually quicker to index.

Types of Index

In the old days (say, 1995 or thereabouts!) indexes were all produced by the shoebox method. Indexers literally used a shoebox into which they inserted index cards: three-inch by five-inch cards upon which they wrote the index term and its page number. The Indexer would sit with a large pile of galley proofs (single-page images as they were returned from the typesetter) and go through each one line-by-line, seeking and recording the index terms. At the finish, the Indexer typed the index out with its page numbers and sent it off to the typesetter for publication. There is a software tool specially built for indexing that emulates this process exactly. I tell you this simply because, in certain circumstances, this method is still the best today. If your document is going to be published from a different computer to the one it is being created on, and that machine cannot interpret Microsoft Word XE tags, and you do not know what the page numbers are yet because the other machine is going to do the pagination, then use the shoebox method!

Word will do two forms of index: The Concordance Index and the Mark-up Index. It will also do something half-way in-between, using its Mark All command.

Mark-up Indexes
A Mark-up index is the method I recommend. It's quick, accurate, easy to understand, and easy to correct. With a little care in the planning, it normally results in a very useable index.

As the term implies, you produce a mark-up index by embedding mark-up tags in the Word document. Word automatically looks up the page numbers at Print time and generates and formats the index for you. Study the help topic Create an index and all its sub-topics. This is the way I recommend.  It's the way that all good writers create an index these days. Mark by mark, page by page!  It is explained in detail below.

Concordance Indexes
I implore you not to waste your time with a Concordance Index for most publications. It results in a huge pile of rubbish that is of very little use to the reader. And it takes nearly as long to make as it does to generate an index properly. The Concordance Index is a hangover from the past when people were desperately hoping to produce an automatic index to reduce the labor. Every major word-processor will do them, and no professional writer or editor would, these days, permit one.

To make a Concordance index you make up a table of all the terms you want Word to find in one column, and the index entry you want to see for each term in the other. For more information, see Create a concordance file in the Word help file. But the end result is that you have every term indexed at EVERY place it occurs. Most of the mentions of a term in a book are simply passing references: what the reader wants to see in the index is only one page number; the one that contains the main topic for the term. If you send them on a wild goose chase to 20 other places first, they will think most unkindly of you.
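To see why a concordance sends the reader on that wild goose chase, here is a minimal Python sketch of the idea. It is not how Word implements the feature: the function name, the sample pages and the whole-word, case-sensitive matching are all assumptions made for illustration.

```python
import re
from collections import defaultdict

def concordance_index(pages, concordance):
    """Return {index entry: [page numbers]} for every page that mentions a term.

    pages: hypothetical dict of page number -> page text.
    concordance: dict of search term -> index entry (the two-column table).
    """
    index = defaultdict(list)
    for page_no in sorted(pages):
        text = pages[page_no]
        for term, entry in concordance.items():
            # Assumption: whole-word, case-sensitive match of the search term.
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                if page_no not in index[entry]:
                    index[entry].append(page_no)
    return dict(index)

pages = {
    1: "Follow the procedure below to install the product.",
    5: "If you completed the install step, continue.",
    9: "Before you install updates, back up your files.",
}
concordance = {"install": "Installation procedure"}

result = concordance_index(pages, concordance)
print(result)  # {'Installation procedure': [1, 5, 9]}
```

Only page 1 is the main topic, yet the passing mentions on pages 5 and 9 land in the index with equal weight.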

The concordance mechanism does have its place:  It can often be used to good effect in Reference Books such as Programming Reference Manuals, where each command or function is referred to only in a small section of the text, then rarely mentioned anywhere else in the book.

For the truly adventurous...

Technical writers and other folk who publish seriously-huge documents in HTML may want to spend a little time learning about Concordance Indexes.  In conjunction with VBA, a concordance index is a great way to automatically generate hyperlinks in your document.  You tag every mention of each term with the concordance indexing mechanism, then use VBA to change the tags into hyperlink tags.

Indexing Made Easy

Here are some worthwhile hints I can give you so you do not go mad during the process: 


Print a copy of the book and go through it with a highlighter, marking the items you would like to see in the index. If you are not the subject-matter expert, get someone who is expert in the subject to do this for you (the process is massively easier if you understand the subject well). Mark only places where the reader will get information about each item. For example, if you want to include the installation procedure, you would mark "Follow the procedure below to install..." in Chapter 1, but you would not mark "if you completed the installation procedure..." in Chapter 5. The first is what the reader would expect to see when he looks up 'Installation Procedure'. The second might cause the reader to come and look you up {grin}.


Make some design decisions before you start putting codes in the file. The most important are:


How many levels of entry are you going to allow? If it is more than three, I will personally come and shoot you! Such an Index is both unusable and unmaintainable {grin}.


Are you going to reverse the terms? "Indexing, the art of" or "The art of Indexing"? Normally do the former, but whichever you select, you must do it for every entry.


How will you treat numbers? All as if they were spelled out; or all up the front above the As? In technical books, do the second, but whichever you do, you must do it for every number.


Will you use see references to condense the index? My vote in modern times is: No, don't bother.

See references mean the reader finds the index entry, then has to go find another index entry before they can find the page. It annoys your reader, it doesn't save much paper, and these days paper is not very expensive.


Will you put the Table of Contents in the Index? Debate rages in the more pedantic Indexing circles about this one.

The pedants (sorry, purists) say you should not include in the index terms that are contained in headings in the table of contents. I say: Of course you should. Research shows that some people (about 35 percent) look in Tables of Contents, and some people (about 60 percent) look in Indexes. Few readers these days have a clear picture of the conceptual difference between them, and each reader will secretly thank you if he can find what he wants in both places. I always include an index entry for every heading in the book. So shoot me!


Sort order: Word-by-word or letter-by-letter? By default, Word does the former. Purists like the latter: I don't; I can never find anything in such an index, and most readers hate it. So shoot me again! To produce a letter-by-letter sort, you have to place the generated index in a two-column table (page numbers in one column, text in the other).  Then copy the text column, remove the spaces from it with Find/Replace, then shift that column to all upper-case and sort by it. Then remove the uppercase column and turn the table back into text.
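If the difference between the two sort orders is hard to picture, this small Python sketch shows it. The key functions and sample entries are my own illustration, not anything Word exposes; the letter-by-letter key simply mirrors the table trick of stripping the spaces and shifting to upper-case before sorting.

```python
def word_by_word_key(entry):
    # Compare entries one whole word at a time, ignoring case.
    return entry.lower().split()

def letter_by_letter_key(entry):
    # Ignore spaces entirely and compare as one upper-case string.
    return entry.replace(" ", "").upper()

entries = ["New York", "Newark", "New Zealand", "Newton"]

word_sorted = sorted(entries, key=word_by_word_key)
letter_sorted = sorted(entries, key=letter_by_letter_key)

print(word_sorted)    # ['New York', 'New Zealand', 'Newark', 'Newton']
print(letter_sorted)  # ['Newark', 'Newton', 'New York', 'New Zealand']
```

Word-by-word keeps all the "New ..." entries together; letter-by-letter scatters them among the other "New..." words.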


Avoid the classic hilarity of putting the book in the Index. If you are writing a book called All About Word you may get sued for a laughter-based injury if you include Word as a term in the index. But for your own amusement, have a look in the indexes (not indices!) of a few cheap-and-nasty technical manuals such as are often produced in-house as training manuals. You will be surprised how often you see this classic faux pas. And you may immediately become suspicious that you are looking at an automatically generated index!


Now run through and tag the entries you have highlighted, according to the instructions in the help topic Mark index entries. Unfortunately, if you have made a few indexes, you will know how to do this, and if you haven't, your first attempt will contain errors. Sorry: I had to go through this too {grin}.

I will give you a hint that will save you a bit of time (quite a lot, actually): do not put in the subentries at this stage. By that I mean tag each item as a main term. If the entry does belong as a subentry, you will find that you can add the main term to the tag more simply on your second pass.

A Word About Tagging:

 Word's index tags are both case-sensitive and "space-sensitive".  "Installing" and "installing" are not the same thing: each will appear under its own heading.  "Administration" and " Administration" are not the same thing: one will sort right at the top of the index.  See?  When you are debugging "entries out of sequence" you sometimes have to look extremely closely to ensure that the tags really do match exactly.
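If you can get your entry texts out of the document as a plain list (how you extract them is up to you, and is assumed here), a few lines of Python will flag the likely mismatches. This is only a sketch: the function name and the sample entries are hypothetical.

```python
from collections import defaultdict

def near_duplicate_entries(entries):
    """Group entry strings that differ only by case or stray spaces."""
    groups = defaultdict(set)
    for entry in entries:
        # Normalised form: trimmed, single-spaced, lower-case.
        key = " ".join(entry.split()).lower()
        groups[key].add(entry)
    # Keep only the groups that contain more than one distinct spelling.
    return {key: sorted(variants) for key, variants in groups.items()
            if len(variants) > 1}

entries = ["Installing", "installing", "Administration",
           " Administration", "Printing"]

suspects = near_duplicate_entries(entries)
for key, variants in suspects.items():
    print(key, "->", variants)
```

Each group it prints is a set of tags that look alike to you but would produce separate headings in the index.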

To enter an index tag in a heading, ensure that your headings are formatted by styles, and do not apply any formatting overrides to the heading. If you apply direct formatting to the headings that contain index tags, the direct formatting will be copied through to your Index.

A colon : and a semicolon ; are not the same thing!  You use colons to divide the levels of sub-entry in your index tags.  When you are in a hurry, it is too easy to type the un-shifted character (the semi-colon) instead of the shifted character (the full colon) in the tag.  If you do, you will get some very weird errors in your generated index.  There's no easy way to find these, but the semi-colon will appear in the index.  If you have strange things happening (items that do not appear under their correct entries or sub-entries) try searching your generated index for semi-colons.  If you find any, at least you know "what" is wrong: finding the tag that produced the problem is a real chore (it will not be on the page in the index...).  Try this:  Reveal your hidden text (so you can see your XE tags) then search for a semi-colon with the font format  hidden text.  If you find any, chances are they are in your bad index tags.
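The same hunt can be sketched in Python, assuming you have the field codes as plain strings of the form XE "entry text". The regular expression and the function names are illustrative only, and the sketch ignores any switches that may follow the entry text.

```python
import re

# Assumption: each field code looks like  XE "Main:Sub:Sub-sub"
XE_PATTERN = re.compile(r'XE\s+"([^"]*)"')

def suspicious_semicolons(field_codes):
    """Return the entry text of every XE field that contains a semicolon."""
    hits = []
    for code in field_codes:
        match = XE_PATTERN.search(code)
        if match and ";" in match.group(1):
            hits.append(match.group(1))
    return hits

def levels(entry_text):
    # Colons divide the levels of an entry.
    return entry_text.split(":")

field_codes = [
    'XE "Printing:duplex"',
    'XE "Printing;envelopes"',
    'XE "Fonts:embedding:TrueType"',
]

print(suspicious_semicolons(field_codes))  # ['Printing;envelopes']
print(levels("Fonts:embedding:TrueType"))  # ['Fonts', 'embedding', 'TrueType']
```

Treat the hits as candidates to inspect rather than certain errors.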


Now generate the index. Ignore the formatting at this stage; just print it. Leave it as a single column for ease of reference. If you have a big screen, you can open a second window into the document and look at the index that way (see the Window menu) but for most, it's easier to print the first result.


Now sit down with a colored pen or pencil (you can't see blue or black against black type...) and edit the index.


Mark all the terms that should become sub-entries, and show the term they should be sub-entries of. 


Now run down it, and for each term, ask yourself "What else could the reader possibly call this?" Add an entry for each.


Run down it again, and for each term, ask yourself "Is there anything else the reader would need to know about when looking this up?" Add a "See also" for each one you find.


Go through and edit the tags in the file to implement the changes you have identified.

You can find index tags easily by using the Browse buttons on your vertical scroll bar (see Browse to the next or previous page, table, or other item in the help).

In later versions of Word (2002 and above) you can use Ctrl + G to bring up the "Go To" dialog.  Set "Go to what?" to "Field".  Set the Enter field name box to "XE".  Click Next, then Close.  Your "Previous" and "Next" browse buttons (at the extreme bottom right corner of the Word window, under the vertical scroll bar) will now go to the next or previous index entry fields on each click, until you change to something else.

If you use Find, or Browse by Find, you can specify ^d XE as your Find string to find only index tags.

If you know exactly what the text of the tag is, you can use ^d XE "tag text string" to find exactly that tag.  However, this requires you to work out exactly what the tag content will be, and that's not easy three levels down in an Index. 

So I prefer to use Ctrl + G, Page Number (from the index), then Ctrl + F, ^d (to find the next XE tag). Then keep hitting Browse Next to find the tag you want.


Now regenerate your index. (Click in it and press F9). You can now change it to double-column if you wish. You format an index by using Format>Style to change the styles Index 1 through Index 9. Each style controls the formatting of one level of entry.

Page Number Conflation

Page number conflation is where only the first and last page numbers appear for a topic.  In the index you see 88 - 95 instead of 88, 89, 90... 
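As an illustration of what conflation means (and only that: this is not the mechanism Word uses), here is a short Python sketch that collapses runs of consecutive page numbers into ranges. The function name is hypothetical.

```python
def conflate(pages):
    """Collapse runs of consecutive page numbers into first-last ranges."""
    ordered = sorted(set(pages))
    ranges = []
    for page in ordered:
        if ranges and page == ranges[-1][1] + 1:
            ranges[-1][1] = page          # extend the current run
        else:
            ranges.append([page, page])   # start a new run
    return ", ".join(
        str(first) if first == last else f"{first}-{last}"
        for first, last in ranges
    )

print(conflate([88, 89, 90, 91, 92, 93, 94, 95]))  # 88-95
print(conflate([12, 88, 89, 90, 140]))             # 12, 88-90, 140
```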

I am very tempted to say "don't bother"!  Tag the first instance of each term.  If your reader does not have the brains to see that the information on a topic continues for several pages, they should be kept away from your book in case they hurt themselves...  However, if you absolutely must conflate, this is the way to do it:

See!  It isn't that hard

There! That's the way I do it. If you trust me and do it that way, you will find out why I do it that way. If you don't trust me and do it another way, you will find out why much sooner {grin}.