From 06c049b8c0533add03d2412616bef316da770d8e Mon Sep 17 00:00:00 2001
From: Alhadis
Date: Sat, 28 Nov 2015 04:40:29 +1100
Subject: [PATCH] Change ".ms" heuristic to accommodate MAXScript

Linguist currently uses the presence of "move" commands to differentiate a
GAS file from Groff. This is problematic with MAXScript, which includes a
built-in function of that name. Furthermore, because of the language's
exhaustive vocabulary, case-insensitive nature and flexible syntax, it's
difficult to impose rigid criteria on classifying it.

This commit modifies the heuristic to assume the following flow:

1. If a line contains ".include" or ".global"/".globl" which doesn't follow
   a non-whitespace character, assume GAS.

2. Otherwise, if the line starts with a command like ".LG7E0" with a
   possible string of whitespace before it, assume it's also GAS. UNLESS
   either of the following conditions is true:

   2a. The token is enclosed by a string or /* multiline comment */

   2b. The previous line ends with a backslash to denote a statement broken
       between lines, with possible whitespace and/or comment sequences
       between the backslash and the actual newline.

3. If neither of the above is met, assume the file is MAXScript.

This approach may appear overly inclusive, but given that real-world usage
of MAXScript includes writing brief files with few distinguishing keywords,
it's reasonable to permit this leniency.
---
 lib/linguist/heuristics.rb | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/linguist/heuristics.rb b/lib/linguist/heuristics.rb
index 0a1ab912..7c1c7db1 100644
--- a/lib/linguist/heuristics.rb
+++ b/lib/linguist/heuristics.rb
@@ -238,8 +238,10 @@ module Linguist
     disambiguate ".ms" do |data|
       if /^[.'][a-z][a-z](\s|$)/i.match(data)
         Language["Groff"]
-      elsif /((^|\s)move?[. ])|\.(include|globa?l)\s/.match(data)
+      elsif /(?