3 Worked Example

The interaction of all the different regexp definitions, overlay properties and auto-overlay classes provided by the auto-overlay package can be a little daunting. This section will go through an example of how the auto-overlay regexps could be defined to create overlays for a subset of LaTeX, which is complex enough to demonstrate most of the features.

LaTeX is a markup language, so a LaTeX document combines markup commands with normal text. Commands start with ‘\’, and end at the first non-word-constituent character. We want to highlight all LaTeX commands in blue. Two commands that will particularly interest us are ‘\begin’ and ‘\end’, which begin and end a LaTeX environment. The environment name is enclosed in braces: ‘\begin{environment-name}’, and we want it to be highlighted in pink. LaTeX provides many environments, used to create lists, tables, titles, etc. We will take the example of an ‘equation’ environment, used to typeset mathematical equations. Thus equations are enclosed by ‘\begin{equation}’ and ‘\end{equation}’, and we would like to highlight these equations in yellow. Another example we will use is the ‘$’ delimiter. Pairs of ‘$’s delimit mathematical expressions that appear in the middle of a paragraph of normal text (whereas ‘equation’ environments appear on their own, slightly separated from surrounding text). Again, we want to highlight these mathematical expressions, this time in green. The final piece of LaTeX markup we will need to consider is the ‘%’ character, which creates a comment that lasts till the end of the line (i.e. text after the ‘%’ is ignored by the LaTeX processor up to the end of the line).

LaTeX commands are a good example of when to use word regular expressions (see Overview). The appropriate regexp definition is loaded by

     (auto-overlay-load-definition
      'latex
      '(word ("\\\\[[:alpha:]]*?\\([^[:alpha:]]\\|$\\)"
              (face . (background-color . "blue")))))

We have called the regexp set latex. The face property is a standard Emacs overlay property that sets font properties within the overlay. See Overlay Properties. "\\\\" is the Lisp string for the regexp that matches a single ‘\’. (Note that the ‘\’ character has a special meaning in regular expressions, so to include a literal one it must be escaped: ‘\\’. However, ‘\’ also has a special meaning in Lisp strings, so both ‘\’ characters must be escaped there too, giving \\\\.) [[:alpha:]]*? matches a sequence of zero or more letter characters. The ? makes the match non-greedy, so it matches the shortest sequence of letters consistent with matching the whole regexp, since we want the region to end at the first non-letter character, matched by [^[:alpha:]]. The \| defines an alternative, to allow the LaTeX command to be terminated either by a non-letter character or by the end of the line ($). See Regular Expressions, for more details on Emacs regular expressions.
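
As a quick check of the escaping described above, the regexp can be tried with the standard string-match function, outside any auto-overlay definition. This is only an illustrative sketch, and the sample text is made up:

     (let ((re "\\\\[[:alpha:]]*?\\([^[:alpha:]]\\|$\\)")
           ;; Hypothetical sample text: the characters  see \alpha+1
           (text "see \\alpha+1"))
       (when (string-match re text)
         ;; Whole match: the backslash, the letters, and the
         ;; terminating `+'.
         (match-string 0 text)))
     ;; => "\\alpha+"

The matched text contains the leading ‘\’ and the trailing ‘+’ as well as the letters of the command.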

However, there's a small problem. We only want the blue background to cover the characters making up a LaTeX command. But as we've defined things so far, it will cover all the text matched by the regexp, which includes the leading ‘\’ and the trailing non-letter character. To rectify this, we need to group the part of the regexp that matches the command (i.e. by surrounding it with ‘\(’ and ‘\)’), and put the regexp inside a cons cell containing the regexp in its car and a number indicating which subgroup to use in its cdr:

     (auto-overlay-load-definition
      'latex
      '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
              (face . (background-color . "blue")))))
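
The effect of the subgroup number can be seen with plain string-match and match-string. The sketch below assumes the letters of the command are wrapped in the first ‘\(’ ... ‘\)’ group; the sample text is hypothetical:

     (let ((re "\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)")
           (text "\\alpha+1"))              ; the characters  \alpha+1
       (when (string-match re text)
         (list (match-string 0 text)        ; entire match
               (match-string 1 text))))     ; subgroup 1: the letters only
     ;; => ("\\alpha+" "alpha")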

The ‘$’ delimiter is an obvious example of when to use a self regexp (see Overview). We can update our example to include this (note that ‘$’ also has a special meaning in regular expressions, so it must be escaped with ‘\’ which itself must be escaped in lisp strings):

     (auto-overlay-load-definition
      'latex
      '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
              (face . (background-color . "blue")))))
     
     (auto-overlay-load-definition
      'latex
      '(self ("\\$" (face . (background-color . "green")))))

This won't quite work though. LaTeX maths commands also start with a ‘\’ character, which will match the word regexp. For the sake of example we want the entire equation highlighted in green, without highlighting any LaTeX maths commands it contains in blue. Since the word overlay will be within the self overlay, the blue highlighting will take precedence. We can change this by giving the self overlay a higher priority (any priority is higher than a non-existent one; we use 3 here for later convenience). For efficiency reasons, it's a good idea to put higher priority regexp definitions before lower priority ones, so we get:

     (auto-overlay-load-definition
      'latex
      '(self ("\\$" (priority . 3) (face . (background-color . "green")))))
     
     (auto-overlay-load-definition
      'latex
      '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
              (face . (background-color . "blue")))))
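
The role of the priority property can be illustrated with ordinary Emacs overlays, independently of the auto-overlay package. In this sketch the overlays are created by hand with make-overlay (auto-overlays would normally create them for us), and the buffer text is a made-up example:

     (with-temp-buffer
       (insert "$\\alpha$")                  ; buffer text: $\alpha$
       (let ((maths (make-overlay 1 9))      ; covers the whole $...$
             (cmd   (make-overlay 3 8)))     ; covers "alpha"
         (overlay-put maths 'face '(background-color . "green"))
         (overlay-put maths 'priority 3)
         (overlay-put cmd 'face '(background-color . "blue"))
         ;; Position 4 lies inside both overlays; the face is taken
         ;; from the overlay with the higher priority.
         (get-char-property 4 'face)))
     ;; => (background-color . "green")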

The ‘\begin{equation}’ and ‘\end{equation}’ commands also enclose maths regions, which we would like to highlight in yellow. Since the opening and closing delimiters are different in this case, we must use nested overlays (see Overview). Our example now looks like:

     (auto-overlay-load-definition
      'latex
      '(self ("\\$" (priority . 3) (face . (background-color . "green")))))
     
     (auto-overlay-load-definition
      'latex
      '(nested
        ("\\\\begin{equation}"
         :edge start
         (priority . 1)
         (face . (background-color . "yellow")))
        ("\\\\end{equation}"
         :edge end
         (priority . 1)
         (face . (background-color . "yellow")))))
     
     (auto-overlay-load-definition
      'latex
      '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
              (face . (background-color . "blue")))))

Notice how we've used separate start and end regexps to define the auto-overlay. Once again, we have had to escape the ‘\’ characters, and increase the priority of the new regexp definition to avoid any LaTeX commands within the maths region being highlighted in blue.
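
When writing such delimiters as Lisp strings, a literal ‘\’ in the regexp takes four backslashes. With only two, the regexp would begin with ‘\b’, which in Emacs regexps matches a word boundary rather than the letter ‘b’. A small check using only string-match, assuming the standard syntax table (the sample string is hypothetical):

     (let ((text "\\begin{equation}"))     ; the characters \begin{equation}
       (list
        ;; Two backslashes give the regexp \begin..., i.e. a word
        ;; boundary followed by "egin...", which does not match here.
        (string-match "\\begin{equation}" text)
        ;; Four backslashes give the regexp \\begin..., which starts
        ;; with a literal backslash.
        (string-match "\\\\begin{equation}" text)))
     ;; => (nil 0)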

LaTeX comments start with ‘%’ and last till the end of the line: a perfect demonstration of a line regexp. Here's a first attempt:

     (auto-overlay-load-definition
      'latex
      '(self ("\\$" (priority . 3) (face . (background-color . "green")))))
     
     (auto-overlay-load-definition
      'latex
      '(nested
        ("\\\\begin{equation}"
         :edge start
         (priority . 1)
         (face . (background-color . "yellow")))
        ("\\\\end{equation}"
         :edge end
         (priority . 1)
         (face . (background-color . "yellow")))))
     
     (auto-overlay-load-definition
      'latex
      '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
              (face . (background-color . "blue")))))
     
     (auto-overlay-load-definition
      'latex
      `(line ("%" (face . (background-color
                           . ,(face-attribute 'default :background))))))
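
The backquote in the line definition is what allows the comma-prefixed form to be evaluated while the rest of the list stays quoted. A minimal sketch of the same mechanism, with a hypothetical variable bg standing in for the face-attribute call:

     (let ((bg "white"))                     ; hypothetical colour value
       ;; Only the form after the comma is evaluated; everything else
       ;; is treated as quoted data.
       `(line ("%" (face . (background-color . ,bg)))))
     ;; => (line ("%" (face background-color . "white")))

The printed result is the same structure, written in Lisp's shorter notation for nested dotted pairs.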

We use the standard Emacs face-attribute function to retrieve the default background colour, which is evaluated before the regexp definition is loaded. (This will of course go wrong if the default background colour is subsequently changed, but it's sufficient for this example.)

Let's think about this a bit. We probably don't want anything within a comment to be highlighted at all, even if it matches one of the other regexps. In fact, creating overlays for ‘\begin’ and ‘\end’ commands which are within a comment could cause havoc! If they don't occur in pairs within the commented region, they will erroneously pair up with ones outside the comment. We need comments to take precedence over everything else, and we need them to block other regexp matches, so we boost the overlay's priority and set the exclusive property:

     (auto-overlay-load-definition
      'latex
      `(line ("%" (priority . 4) (exclusive . t)
                  (face . (background-color
                           . ,(face-attribute 'default :background))))))
     
     (auto-overlay-load-definition
      'latex
      '(self ("\\$" (priority . 3) (face . (background-color . "green")))))
     
     (auto-overlay-load-definition
      'latex
      '(nested
        ("\\\\begin{equation}"
         :edge start
         (priority . 1)
         (face . (background-color . "yellow")))
        ("\\\\end{equation}"
         :edge end
         (priority . 1)
         (face . (background-color . "yellow")))))
     
     (auto-overlay-load-definition
      'latex
      '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
              (face . (background-color . "blue")))))

We're well on our way to creating a useful setup, at least for the LaTeX commands we're considering in this example. There is one last type of overlay to create, but it is the most complicated. We want environment names to be highlighted in pink, i.e. the region between ‘\begin{’ and ‘}’. A first attempt at this might result in:

     (auto-overlay-load-definition
      'latex
      `(line ("%" (priority . 4) (exclusive . t)
                  (face . (background-color
                           . ,(face-attribute 'default :background))))))
     
     (auto-overlay-load-definition
      'latex
      '(self ("\\$" (priority . 3) (face . (background-color . "green")))))
     
     (auto-overlay-load-definition
      'latex
      '(nested
        ("\\\\begin{"
         :edge start
         (priority . 2)
         (face . (background-color . "pink")))
        ("}"
         :edge end
         (priority . 2)
         (face . (background-color . "pink")))))
     
     (auto-overlay-load-definition
      'latex
      '(nested
        ("\\\\begin{equation}"
         :edge start
         (priority . 1)
         (face . (background-color . "yellow")))
        ("\\\\end{equation}"
         :edge end
         (priority . 1)
         (face . (background-color . "yellow")))))
     
     (auto-overlay-load-definition
      'latex
      '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
              (face . (background-color . "blue")))))

However, we'll hit a problem with this. The ‘}’ character also closes the ‘\end{’ command. Since we haven't told auto-overlays about ‘\end{’, every ‘}’ that should close an ‘\end{’ command will instead be interpreted as the end of a ‘\begin{’ command, probably resulting in lots of unmatched ‘}’ characters, creating pink splodges everywhere! Clearly, since we also want environment names between ‘\end{’ and ‘}’ to be pink, we need something more along the lines of:

     (auto-overlay-load-definition
      'latex
      `(line ("%" (priority . 4) (exclusive . t)
                  (face . (background-color
                           . ,(face-attribute 'default :background))))))
     
     (auto-overlay-load-definition
      'latex
      '(self ("\\$" (priority . 3) (face . (background-color . "green")))))
     
     (auto-overlay-load-definition
      'latex
      '(nested
        ("\\\\begin{"
         :edge start
         (priority . 2)
         (face . (background-color . "pink")))
        ("\\\\end{"
         :edge start
         (priority . 2)
         (face . (background-color . "pink")))
        ("}"
         :edge end
         (priority . 2)
         (face . (background-color . "pink")))))
     
     (auto-overlay-load-definition
      'latex
      '(nested
        ("\\\\begin{equation}"
         :edge start
         (priority . 1)
         (face . (background-color . "yellow")))
        ("\\\\end{equation}"
         :edge end
         (priority . 1)
         (face . (background-color . "yellow")))))
     
     (auto-overlay-load-definition
      'latex
      '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
              (face . (background-color . "blue")))))

We still haven't solved the problem though. The ‘}’ character doesn't only close ‘\begin{’ and ‘\end{’ commands in LaTeX. All arguments to LaTeX commands are surrounded by ‘{’ and ‘}’. We could add all the commands that take arguments, but we don't really want to bother about any other commands (at least in this example). All we want to do is prevent auto-overlays from incorrectly pairing the ‘}’ characters used for other commands. Instead, we can just add ‘{’ to the list:

     (auto-overlay-load-definition
      'latex
      `(line ("%" (priority . 4) (exclusive . t)
                  (face . (background-color
                           . ,(face-attribute 'default :background))))))
     
     (auto-overlay-load-definition
      'latex
      '(self ("\\$" (priority . 3) (face . (background-color . "green")))))
     
     (auto-overlay-load-definition
      'latex
      '(nested
        ("{"
         :edge start
         (priority . 2))
        ("\\\\begin{"
         :edge start
         (priority . 2)
         (face . (background-color . "pink")))
        ("\\\\end{"
         :edge start
         (priority . 2)
         (face . (background-color . "pink")))
        ("}"
         :edge end
         (priority . 2))))
     
     (auto-overlay-load-definition
      'latex
      '(nested
        ("\\\\begin{equation}"
         :edge start
         (priority . 1)
         (face . (background-color . "yellow")))
        ("\\\\end{equation}"
         :edge end
         (priority . 1)
         (face . (background-color . "yellow")))))
     
     (auto-overlay-load-definition
      'latex
      '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
              (face . (background-color . "blue")))))

Notice how the { and } regexps do not define a background colour (or indeed any other properties), so that any overlays they create will have no effect other than making sure all ‘{’ and ‘}’ characters are correctly paired.

We've made one mistake though: by putting the { regexp at the beginning of the list, it will take priority over any other regexp in the list that could match the same text. And since { will match whenever \begin{ or \end{ matches, environments will never be highlighted! The { regexp must come after the \begin{ and \end{ regexps, to ensure it is only used if neither of them matches (it doesn't matter whether it appears before or after the } regexp, since the latter will never match the same text):

     (auto-overlay-load-definition
      'latex
      `(line ("%" (priority . 4) (exclusive . t)
                  (face . (background-color
                           . ,(face-attribute 'default :background))))))
     
     (auto-overlay-load-definition
      'latex
      '(self ("\\$" (priority . 3) (face . (background-color . "green")))))
     
     (auto-overlay-load-definition
      'latex
      '(nested
        ("\\\\begin{"
         :edge start
         (priority . 2)
         (face . (background-color . "pink")))
        ("\\\\end{"
         :edge start
         (priority . 2)
         (face . (background-color . "pink")))
        ("{"
         :edge start
         (priority . 2))
        ("}"
         :edge end
         (priority . 2))))
     
     (auto-overlay-load-definition
      'latex
      '(nested
        ("\\\\begin{equation}"
         :edge start
         (priority . 1)
         (face . (background-color . "yellow")))
        ("\\\\end{equation}"
         :edge end
         (priority . 1)
         (face . (background-color . "yellow")))))
     
     (auto-overlay-load-definition
      'latex
      '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
              (face . (background-color . "blue")))))

There is one last issue. A literal ‘{’ or ‘}’ character can be included in a LaTeX document by escaping it with ‘\’: ‘\{’ and ‘\}’. In this situation, the characters do not match anything and should not be treated as delimiters. We can modify the { and } regexps to exclude these cases:

     (auto-overlay-load-definition
      'latex
      `(line ("%" (priority . 4) (exclusive . t)
                  (face . (background-color
                           . ,(face-attribute 'default :background))))))
     
     (auto-overlay-load-definition
      'latex
      '(self ("\\$" (priority . 3) (face . (background-color . "green")))))
     
     (auto-overlay-load-definition
      'latex
      '(nested
        ("\\\\begin{"
         :edge start
         (priority . 2)
         (face . (background-color . "pink")))
        ("\\\\end{"
         :edge start
         (priority . 2)
         (face . (background-color . "pink")))
        ("\\([^\\]\\|^\\){"
         :edge start
         (priority . 2))
        ("\\([^\\]\\|^\\)}"
         :edge end
         (priority . 2))))
     
     (auto-overlay-load-definition
      'latex
      '(nested
        ("\\\\begin{equation}"
         :edge start
         (priority . 1)
         (face . (background-color . "yellow")))
        ("\\\\end{equation}"
         :edge end
         (priority . 1)
         (face . (background-color . "yellow")))))
     
     (auto-overlay-load-definition
      'latex
      '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
              (face . (background-color . "blue")))))

The new, complicated-looking regexps will only match ‘{’ and ‘}’ characters if they are not preceded by a ‘\’ character (see Regular Expressions). Note that the alternative [^\]\|^ matches either any character that isn't a ‘\’, or the start of a line. This is required because matches to auto-overlay regexps are not allowed to span more than one line. If ‘{’ or ‘}’ appears at the beginning of a line, there will be no character in front (the newline character doesn't count, since it isn't on the same line), so the [^\] will not match.
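
The behaviour of the new { regexp can be checked with the standard string-match function. This is an illustrative sketch only, and the three sample strings are hypothetical:

     (let ((re "\\([^\\]\\|^\\){"))
       (list (string-match re "{a}")         ; `{' at the start of a line
             (string-match re "x{a}")        ; the match includes the `x'
             (string-match re "\\{a\\}")))   ; `{' is escaped: no match
     ;; => (0 0 nil)

In the second case the match starts at the ‘x’, i.e. one character before the ‘{’ itself.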

However, when it does match, the } regexp will now match an additional character before the }, causing the overlay to end one character early. (The { regexp will also match one additional character before the {, but since the beginning of the overlay starts from the end of the start delimiter, this poses less of a problem.) We need to group the part of the regexp that should define the delimiter, i.e. the }, by surrounding it with \( and \), and put the regexp in the car of a cons cell whose cdr specifies the new subgroup (i.e. the 2nd subgroup, since the regexp already included a group for other reasons; we could alternatively replace the original group by a shy-group, since we don't actually need to capture match data for that group). Our final version looks like this:

     (auto-overlay-load-definition
      'latex
      `(line ("%" (priority . 4) (exclusive . t)
                  (face . (background-color
                           . ,(face-attribute 'default :background))))))
     
     (auto-overlay-load-definition
      'latex
      '(self ("\\$" (priority . 3) (face . (background-color . "green")))))
     
     (auto-overlay-load-definition
      'latex
      '(nested
        ("\\\\begin{"
         :edge start
         (priority . 2)
         (face . (background-color . "pink")))
        ("\\\\end{"
         :edge start
         (priority . 2)
         (face . (background-color . "pink")))
        ("\\([^\\]\\|^\\){"
         :edge start
         (priority . 2))
        (("\\([^\\]\\|^\\)\\(}\\)" . 2)
         :edge end
         (priority . 2))))
     
     (auto-overlay-load-definition
      'latex
      '(nested
        ("\\\\begin{equation}"
         :edge start
         (priority . 1)
         (face . (background-color . "yellow")))
        ("\\\\end{equation}"
         :edge end
         (priority . 1)
         (face . (background-color . "yellow")))))
     
     (auto-overlay-load-definition
      'latex
      '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
              (face . (background-color . "blue")))))
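
The difference between the whole match and the delimiter subgroup in the final } regexp can be checked with string-match and match-beginning. The shy-group variant shown second is the alternative mentioned in the text, not part of the definitions above, and the sample text is hypothetical:

     (let ((text "ab}"))
       (list
        ;; Capturing first group: the delimiter is subgroup 2.
        (progn (string-match "\\([^\\]\\|^\\)\\(}\\)" text)
               (list (match-beginning 0) (match-beginning 2)))
        ;; Shy first group: the delimiter becomes subgroup 1.
        (progn (string-match "\\(?:[^\\]\\|^\\)\\(}\\)" text)
               (list (match-beginning 0) (match-beginning 1)))))
     ;; => ((1 2) (1 2))

In both cases the whole match starts at the ‘b’, while the subgroup starts at the ‘}’ itself, which is where the overlay should end.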

With these regexp definitions, LaTeX commands will automatically be highlighted in blue, equation environments in yellow, inline maths expressions in green, and environment names in pink. LaTeX markup within comments will be ignored. And ‘{’ and ‘}’ characters from other commands will be correctly taken into account. All this is done in “real-time”; it doesn't wait until Emacs is idle to update the overlays. Not bad for a bundle of regexps!

Of course, this could all be done more easily using Emacs' built-in syntax highlighting features, but the highlighting was only an example to show the location of the overlays. The main point is that the overlays are automatically created and kept up to date, and can be given any properties you like and used for whatever purpose is required by your Elisp package.