UNIX Power Tools

Chapter 29: Spell Checking, Word Counting, and Textual Analysis

29.5 Adding Words to ispell's Dictionary

ispell (29.2) uses two lists for spelling verification: a master wordlist and a supplemental personal wordlist.

The master wordlist for ispell is normally the file /usr/local/lib/ispell/ispell.hash. This is a "hashed" dictionary file. That is, it has been converted to a condensed, program-readable form using the buildhash program (which comes with ispell), to speed the spell-checking process.

The personal wordlist is normally a file called .ispell_words in your home directory. (You can override this default with either the -p command-line option or the WORDLIST environment variable (6.1).) This file is simply a list of words, one per line, so you can readily edit it to add, alter, or remove entries. The personal wordlist is normally used in addition to the master wordlist, so ispell does not flag a word if either list permits it.
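Because the personal wordlist is plain text, ordinary UNIX tools can maintain it. Here is a minimal sketch of adding and removing entries. It works on a scratch file rather than your real .ispell_words, and the sample words are made up for illustration.

```shell
# Work on a scratch file rather than the real ~/.ispell_words
# (the file name and the sample words are illustrative only).
words=$(mktemp)
printf '%s\n' awk grep sed teh > "$words"

# Add an entry: append the word on its own line.
echo perl >> "$words"

# Remove an entry: keep every line except the exact word "teh".
grep -vx 'teh' "$words" > "$words.new" && mv "$words.new" "$words"

# Show the result, one word per line.
cat "$words"
```

The -x option makes grep match whole lines only, so removing teh would leave a longer word that merely contains those letters untouched.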

Custom personal wordlists are particularly useful for checking documents that use jargon, or special technical words that are not in the master wordlist, and for personal needs such as holding the names of your correspondents. You may choose to keep more than one custom wordlist, to meet various special requirements.

You can add to your personal wordlist any time you use ispell: simply use the I command to tell ispell that the word it flagged as a misspelling is actually correct and should be added to the dictionary. You can also add a list of words from a file using the ispell -a option. The words must be one to a line, but need not be sorted. Each word to be added must be preceded by an asterisk. (Why? Because ispell -a has other functions as well, and the asterisk marks the line as a request to add the word.) So, for example, we could have added a list of UNIX utility names to our personal dictionaries all at once, rather than one by one as they were encountered during spell-checking.
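The bulk-add procedure might be scripted as follows. This is a sketch under assumptions, not a tested recipe for every ispell version: the word file is scratch data, the -p option points ispell at a scratch personal wordlist so that your real one is left alone, and the final # line is the ispell -a command for saving the personal dictionary (your version may save it on exit anyway).

```shell
# Scratch list of new words, one per line, in no particular order
# (the names here are just sample data).
newwords=$(mktemp)
printf '%s\n' grep awk sed > "$newwords"

# Put an asterisk in front of every word, which ispell -a reads as
# "add this word to the personal dictionary".
commands=$(sed 's/^/*/' "$newwords")
echo "$commands"

# Feed the commands to ispell only if it is installed.
# ASSUMPTION: -p selects a scratch personal wordlist, and the trailing
# "#" line asks ispell -a to save that wordlist.
scratch=$(mktemp)
if command -v ispell >/dev/null 2>&1; then
    { echo "$commands"; echo '#'; } | ispell -a -p "$scratch" > /dev/null
fi
```

To update your real wordlist instead, drop the -p option (or point it at .ispell_words) once you are happy with the list.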

Obviously, though, in an environment where many people are working with the same set of technical terms, it doesn't make sense for each individual to add the same word list to his or her own private .ispell_words file. It would make far more sense for a group to agree on a common dictionary for specialized terms and always to set WORDLIST to point to that common dictionary.
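For instance, each member of the group might put lines like these in a Bourne-shell startup file such as .profile. The path is hypothetical; use whatever location your group agrees on.

```shell
# Hypothetical location of a group-maintained wordlist; adjust to taste.
WORDLIST=/usr/local/share/ispell/project.words
export WORDLIST

# Confirm which personal wordlist ispell will be told to use.
echo "Personal wordlist: $WORDLIST"
```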

If the private wordlist gets too long, you can create a "munched" wordlist. The munchlist script that comes with ispell reduces the words in a wordlist to a set of word roots and permitted suffixes, according to rules described in the ispell(4) reference page that will be installed with ispell from the CD-ROM. This creates a more compact but still editable wordlist.

Another option is to provide an alternative master spelling list using the -d option. This has two problems, though:

  1. The master spelling list should include spellings that are always valid, regardless of context. You do not want to overload your master wordlist with terms that might be misspellings in a different context. For example, perl is a powerful programming language, but in other contexts, perl might be a misspelling of pearl. You may want to place perl in a supplemental wordlist when documenting UNIX utilities, but probably wouldn't want it in the master wordlist unless you are documenting UNIX utilities most of the time that you use ispell.

  2. The -d option must point to a hashed dictionary file. This is a large file, and it is time-consuming to build. What's more, you cannot edit a hashed dictionary directly; you will have to edit the master wordlist it was built from and then use (or have the system administrator use) buildhash to hash the new dictionary into the fast, program-readable form.

To build a new hashed wordlist, provide buildhash with a complete list of the words you want included, one per line. (The buildhash utility can only process a raw wordlist, not a munched wordlist.) The standard system wordlist, /usr/dict/words on many systems, can provide a good starting point. This file is writeable only by the system administrator, and probably shouldn't be changed in any case. So make a copy of this file, and edit or add to the copy. After processing the file with buildhash, you can either replace the default ispell.hash file, or point to your new hashed file with the -d option.
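The steps above can be sketched as a short script. To keep it harmless, it works in a scratch directory and uses a three-word stand-in for the system wordlist; in real use you would copy /usr/dict/words instead. The buildhash argument order (wordlist, affix file, output file) and the affix file path are assumptions here, so check the buildhash documentation that came with your copy of ispell.

```shell
# Scratch directory so nothing on the real system is touched.
work=$(mktemp -d)

# Stand-in for a copy of the system wordlist (in real use:
# cp /usr/dict/words "$work/words.copy"); these words are sample data.
printf '%s\n' apple banana cherry > "$work/words.copy"

# Extra terms to include, one per line (sample data).
printf '%s\n' awk banana grep > "$work/extra.words"

# Merge the two lists into one raw wordlist, dropping duplicates.
LC_ALL=C sort -u "$work/words.copy" "$work/extra.words" > "$work/master.words"
cat "$work/master.words"

# Hash the merged list only if buildhash and an affix file are available.
# ASSUMPTION: buildhash takes the wordlist, an affix file, and the output
# name, in that order; the affix path below is hypothetical.
affix=/usr/local/lib/ispell/english.aff
if command -v buildhash >/dev/null 2>&1 && [ -r "$affix" ]; then
    buildhash "$work/master.words" "$affix" "$work/custom.hash"
    # The new dictionary could then be selected with:
    #   ispell -d "$work/custom.hash" somefile
fi
```

Sorting is not required by the text above; sort -u is used here only as a convenient way to remove duplicate entries.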

- TOR, LK

