Revise pattern for Emacs modeline detection

This is a rewrite of the regex that handles Emacs modeline matching. The
current one is a little flaky, causing some files to be misclassified as
"E", among other things.

It's worth noting that malformed modelines can still change a file's
language in Emacs. Provided the -*- delimiters are intact and the mode's
name is decipherable, Emacs will set the appropriate language mode *and*
display a warning about a malformed modeline:

    -*- foo-bar mode: ruby -*-   # Malformed, but understandable
            -*- mode: ruby--*-   # Completely invalid

The new pattern accommodates this leniency, making no effort to validate
a modeline's syntax beyond readable mode names. In other words, if Emacs
accepts certain errors, we should too.
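To see this leniency in practice, the new pattern can be run against the two example modelines above. The following is a minimal Ruby sketch: the regex is reproduced from this commit's diff as a top-level constant, while `mode_name` is a hypothetical helper, not part of Linguist.

```ruby
# The EMACS_MODELINE pattern from this commit (Linguist::Strategy::Modeline),
# reproduced as a top-level constant so the snippet runs on its own.
EMACS_MODELINE = /
  -\*-
  (?:
    # Short form: `-*- ruby -*-`
    \s* (?= [^:;\s]+ \s* -\*-)
    |
    # Longer form: `-*- foo:bar; mode: ruby; -*-`
    (?:
      .*?       # Preceding variables: `-*- foo:bar bar:baz;`
      [;\s]     # Which are delimited by spaces or semicolons
      |
      (?<=-\*-) # Not preceded by anything: `-*-mode:ruby-*-`
    )
    mode        # Major mode indicator
    \s*:\s*     # Allow whitespace around colon: `mode : ruby`
  )
  ([^:;\s]+)    # Name of mode

  # Ensure the mode is terminated correctly
  (?=
    # Followed by semicolon or whitespace
    [\s;]
    |
    # Touching the ending sequence: `ruby-*-`
    (?<![-*])   # Don't allow stuff like `ruby--*-` to match; it'll invalidate the mode
    -\*-        # Emacs has no problems reading `ruby --*-`, however.
  )
  .*?           # Anything between a cleanly-terminated mode and the ending -*-
  -\*-
/xi

# Hypothetical helper, not part of Linguist: returns the captured mode name,
# or nil when the line contains no usable Emacs modeline.
def mode_name(line)
  match = EMACS_MODELINE.match(line)
  match && match[1]
end

p mode_name("-*- foo-bar mode: ruby -*-") # => "ruby"  (malformed, but understandable)
p mode_name("-*- mode: ruby--*-")         # => nil     (completely invalid)
```

In the first line, the stray `foo-bar` token is simply skipped by the lazy `.*?` that precedes `mode`, so the mode name is still captured. In the second, the mode name can't be terminated cleanly before the closing `-*-`, so nothing matches.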
Alhadis
2016-09-17 19:45:43 +10:00
parent 00efd6a463
commit 697380336c
5 changed files with 41 additions and 1 deletion

@@ -1,7 +1,36 @@
 module Linguist
   module Strategy
     class Modeline
-      EMACS_MODELINE = /-\*-\s*(?:(?!mode)[\w-]+\s*:\s*(?:[\w+-]+)\s*;?\s*)*(?:mode\s*:)?\s*([\w+-]+)\s*(?:;\s*(?!mode)[\w-]+\s*:\s*[\w+-]+\s*)*;?\s*-\*-/i
+      EMACS_MODELINE = /
+        -\*-
+        (?:
+          # Short form: `-*- ruby -*-`
+          \s* (?= [^:;\s]+ \s* -\*-)
+          |
+          # Longer form: `-*- foo:bar; mode: ruby; -*-`
+          (?:
+            .*?       # Preceding variables: `-*- foo:bar bar:baz;`
+            [;\s]     # Which are delimited by spaces or semicolons
+            |
+            (?<=-\*-) # Not preceded by anything: `-*-mode:ruby-*-`
+          )
+          mode        # Major mode indicator
+          \s*:\s*     # Allow whitespace around colon: `mode : ruby`
+        )
+        ([^:;\s]+)    # Name of mode
+
+        # Ensure the mode is terminated correctly
+        (?=
+          # Followed by semicolon or whitespace
+          [\s;]
+          |
+          # Touching the ending sequence: `ruby-*-`
+          (?<![-*])   # Don't allow stuff like `ruby--*-` to match; it'll invalidate the mode
+          -\*-        # Emacs has no problems reading `ruby --*-`, however.
+        )
+        .*?           # Anything between a cleanly-terminated mode and the ending -*-
+        -\*-
+      /xi
 
       # First form vim modeline
       # [text]{white}{vi:|vim:|ex:}[white]{options}
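The inline comments in the new pattern each describe a form it should accept or reject. A minimal Ruby sketch that checks those forms is below; the regex is reproduced from the diff, and the `mode_name` helper and `SAMPLES` table are hypothetical illustrations, not Linguist code.

```ruby
# The EMACS_MODELINE pattern from this commit, as a standalone constant.
EMACS_MODELINE = /
  -\*-
  (?:
    # Short form: `-*- ruby -*-`
    \s* (?= [^:;\s]+ \s* -\*-)
    |
    # Longer form: `-*- foo:bar; mode: ruby; -*-`
    (?:
      .*?       # Preceding variables: `-*- foo:bar bar:baz;`
      [;\s]     # Which are delimited by spaces or semicolons
      |
      (?<=-\*-) # Not preceded by anything: `-*-mode:ruby-*-`
    )
    mode        # Major mode indicator
    \s*:\s*     # Allow whitespace around colon: `mode : ruby`
  )
  ([^:;\s]+)    # Name of mode

  # Ensure the mode is terminated correctly
  (?=
    # Followed by semicolon or whitespace
    [\s;]
    |
    # Touching the ending sequence: `ruby-*-`
    (?<![-*])   # Don't allow stuff like `ruby--*-` to match; it'll invalidate the mode
    -\*-        # Emacs has no problems reading `ruby --*-`, however.
  )
  .*?           # Anything between a cleanly-terminated mode and the ending -*-
  -\*-
/xi

# Hypothetical helper, not part of Linguist: captured mode name or nil.
def mode_name(line)
  match = EMACS_MODELINE.match(line)
  match && match[1]
end

# Hypothetical table of sample modelines drawn from the pattern's comments,
# mapped to the mode name the pattern should capture (nil means no match).
SAMPLES = {
  "-*- ruby -*-"                        => "ruby", # short form
  "-*- foo:bar; mode: ruby; -*-"        => "ruby", # longer form
  "-*- foo:bar bar:baz; mode: ruby -*-" => "ruby", # space-delimited variables
  "-*-mode:ruby-*-"                     => "ruby", # nothing between -*- and mode
  "-*- mode : ruby -*-"                 => "ruby", # whitespace around the colon
  "-*- mode: ruby --*-"                 => "ruby", # stray hyphen after whitespace
  "-*- mode: ruby--*-"                  => nil     # stray hyphen touching the mode
}.freeze

# Print each sample with the captured mode and flag any surprises.
SAMPLES.each do |line, expected|
  actual = mode_name(line)
  status = actual == expected ? "ok" : "UNEXPECTED"
  puts "#{status}: #{line.inspect} -> #{actual.inspect}"
end
```

The `(?<![-*])` lookbehind only applies when the mode name touches the closing `-*-`. Once whitespace separates them, the trailing `.*?` absorbs the extra hyphen, which is why `mode: ruby --*-` is accepted while `mode: ruby--*-` is not.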