Getting the Most out of Dictionaries - Emacs Predictive Completion User Manual


5.6 Getting the Most out of Dictionaries

As it says at the beginning of this chapter, predictive completion is only as good as the dictionary it uses. The English dictionary supplied with the predictive package is trained on a large body of (British) English text, so the words and word weights it contains accurately reflect average English usage. But you are very unlikely to write “average” English (whatever that is!). To get the most out of predictive completion, it is better to train your dictionary on your own writing style, rather than someone else's.

There are two approaches to this. The first is to create a copy of the supplied English dictionary containing all the same words, but with all their weights reset to zero. You can then either use the auto-learn feature to slowly train the dictionary as you write (see Automatic Learning), or better still, kick-start things by training it on text you have already written by learning from existing files (see Learning from Buffers and Files). You can of course still leave auto-learn enabled in order to refine the dictionary, or even use the auto-add feature to automatically add missing words as you type them (see below).

A variant of this approach, if you don't like the supplied English dictionary, is to create the initial dictionary from some other list of words, e.g. the /usr/dict/words file on Unix systems (many modern systems keep it at /usr/share/dict/words instead). You will first need to massage the list into the format required by predictive-create-dict (see Creating Dictionaries), which is the same as the format produced by the dump commands (see Loading and Saving Dictionaries), but this should be easy for even a moderately savvy Emacs user¹!
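If you would rather script the conversion outside Emacs, the following is a minimal sketch of the idea in Python. It assumes the dump format is one entry per line, consisting of the word in double quotes followed by an integer weight. That is an assumption here, so check it against Creating Dictionaries for your version of the package before relying on it. The function names to_dict_lines and convert_file are purely illustrative.

```python
def to_dict_lines(words, weight=0):
    """Turn an iterable of words into dump-style lines.

    ASSUMED format (verify against the Predictive manual): "word" WEIGHT
    """
    seen = set()
    lines = []
    for raw in words:
        word = raw.strip()
        # Skip blank lines, duplicates, and words that would need escaping
        # inside a double-quoted string.
        if not word or word in seen or '"' in word or "\\" in word:
            continue
        seen.add(word)
        lines.append(f'"{word}" {weight}')
    return lines


def convert_file(src_path, dest_path):
    """Read a one-word-per-line list and write the assumed dump format."""
    with open(src_path, encoding="utf-8") as src:
        lines = to_dict_lines(src)
    with open(dest_path, "w", encoding="utf-8") as dest:
        dest.write("".join(line + "\n" for line in lines))
    return len(lines)
```

Every word starts with weight zero, so the resulting dictionary still has to be trained on your own text, as in the first approach.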

The second approach is to start from a completely empty dictionary, and use the auto-add feature to automatically add words as you type them (see Automatic Learning). The auto-add feature adds words when you “accept” them. Since the words aren't already in the dictionary, the easiest way to add new words while typing is to ensure dynamic completion is enabled, and type an end-of-word character (such as a space or punctuation character) at the end of the word (see Dynamic Completion). Alternatively, you can use the fast learning commands predictive-fast-learn-or-add-from-buffer and predictive-fast-learn-or-add-from-file to add words from existing text. Note that you must use the fast learning commands for this; the normal ones will only increment the weights of words that are already in the dictionary.

However you auto-add the words, there is a risk that some words that you don't want will make their way into the dictionary, for example typos and misspellings, or possibly words containing non-letter characters. Typos and misspellings are best dealt with by setting a predictive-auto-add-filter function (see Automatic Learning). Words containing non-letter characters are best dealt with by appropriate entries in completion-dynamic-syntax-alist and completion-dynamic-override-syntax-alist (see Syntax). It's still a good idea to occasionally check which words are in the dictionary by dumping it to a buffer and scanning through it by hand or with ispell (see Loading and Saving Dictionaries).
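As a rough complement to checking a dumped dictionary by hand or with ispell, a short script can pick out the entries that contain non-letter characters. The sketch below is in Python and assumes each line of the dumped file begins with the word in double quotes, which you should verify against your own dump. The name suspicious_words is hypothetical, and the check does not catch misspellings made up entirely of letters.

```python
import re

# ASSUMED entry shape: the line starts with a double-quoted word.
ENTRY = re.compile(r'^\s*"((?:[^"\\]|\\.)*)"')


def suspicious_words(dump_lines):
    """Return the words that contain anything other than letters."""
    flagged = []
    for line in dump_lines:
        match = ENTRY.match(line)
        if match is None:
            continue  # not an entry line in the assumed format
        word = match.group(1)
        if not word.isalpha():
            flagged.append(word)
    return flagged
```

Because str.isalpha accepts any Unicode letter, accented words pass the check, while legitimate words with apostrophes or hyphens are flagged and need a human decision.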

So which approach is better? Each has advantages and disadvantages, and it comes down to personal preference. Training a reset copy of the supplied English dictionary (or one built from another word list) ensures that all the words in the dictionary are spelled correctly (assuming the words in the list were correct in the first place). It also means that predictive mode will provide spelling assistance even when you type an obscure word that you've never used before. On the other hand, the dictionary will contain many words that you will never use, and may lack words that you do use, which will have to be added by hand (unless you enable auto-add).

If you write different types of text (e.g. your novel, academic papers, and emails), the vocabulary you use will differ significantly between them. You will get more out of predictive completion by creating a separate dictionary for each. You can then set up predictive mode to select the appropriate dictionary automatically, either based on the major mode (see Major Modes) or, in the case of LaTeX documents, based on the document class (see LaTeX Support).

Once you've created your dictionaries, you can use the many features of predictive mode to tweak the dictionary training and behaviour to suit your every desire. Using buffer-local dictionaries can help predictive mode adapt faster to the specific vocabulary you are using in an individual document, especially if you set a large predictive-buffer-local-learn-multiplier (see Automatic Learning). Defining sensible prefix relationships between words makes sure predictive completion doesn't “get in your way” when you're typing fast (see Relationships Between Words). The predictive-auto-define-prefixes option and the predictive-define-all-prefixes command make defining prefix relationships very easy.

Finally, having gone to all this effort to create the perfect dictionary, it would be tragic to lose it all! Make sure you occasionally back up your dictionaries by dumping them to a plain text file using predictive-dump-dict-to-file (see Loading and Saving Dictionaries). This is vital before upgrading to a new version of the Predictive package, since there's no guarantee that old dictionaries will be readable in the new version. The dumped plain-text format, by contrast, is usually stable across Predictive package versions. Even if it exceptionally changes, a plain-text file will at the very least always be readable in Emacs, and can be manipulated into the format required for recreating your dictionary in the new version.

Footnotes

[1] Keyboard macros may help here...