Each strategy takes as candidates the languages output by the
previous strategy, if any. This was already the case for the
Classifier and Heuristic strategies, as these couldn't generate new
candidate languages (as opposed to the Modeline, Filename, Shebang,
and Extension strategies).
In practice, this means that if, for example, the Shebang
strategy finds two possible languages for a given file (as is
currently possible with the perl interpreter), the next strategy, the
Extension strategy, will use this information and further reduce the
set of possible languages.
Currently, without this commit, the Extension strategy would discard
the results from the previous strategy and start anew, possibly
returning a different language from those returned by the Shebang
strategy.
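
The chaining described above can be sketched as follows. This is a minimal illustration in Python, not the project's actual code: the lookup tables, the function names (`narrow`, `shebang_strategy`, `extension_strategy`, `detect`) and the fallback rule (keep the previous candidates when a strategy returns nothing) are all assumptions made for the example.

```python
import os

# Hypothetical lookup tables, for illustration only (not real project data).
INTERPRETERS = {"perl": ["Perl", "Pod"]}
EXTENSIONS = {".pl": ["Perl", "Prolog"]}


def narrow(found, candidates):
    """Intersect with the previous strategy's candidates, if there are any."""
    if candidates:
        return [lang for lang in candidates if lang in found]
    return found


def shebang_strategy(name, data, candidates):
    """Look up the interpreter named on a leading '#!' line."""
    lines = data.splitlines()
    if not lines or not lines[0].startswith("#!"):
        return []
    tokens = lines[0][2:].split()
    if not tokens:
        return []
    program = tokens[0].rsplit("/", 1)[-1]
    if program == "env" and len(tokens) > 1:
        program = tokens[1]
    return narrow(INTERPRETERS.get(program, []), candidates)


def extension_strategy(name, data, candidates):
    """Look up the file extension, reusing earlier candidates."""
    extension = os.path.splitext(name)[1].lower()
    return narrow(EXTENSIONS.get(extension, []), candidates)


def detect(name, data, strategies):
    """Run strategies in order, feeding each one the previous candidates."""
    languages = []
    for strategy in strategies:
        result = strategy(name, data, languages)
        if len(result) == 1:
            return result[0]
        if len(result) > 1:
            languages = result  # an empty result keeps the previous set
    return languages[0] if languages else None
```

With these made-up tables, a `script.pl` file starting with `#!/usr/bin/perl` leaves the Shebang step with two candidates, and the Extension step narrows them to one instead of starting anew.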
* Mainly fixing problems with Perl heuristics
And also adding a little bit of text to the README file to help with local use and testing.
* Adds new sample
* Adds a couple more samples, not represented before
* Moves installation instructions to CONTRIBUTING.md
Refs #2309 and also changes github.com to a uniform capitalization.
* Correcting error. Great job, CI
* Moving another file
* Adds samples and new checks for perl/perl6
* Stupid mistake
* Changing regex for perl5 vs perl6
Initial suggestion by @pchaigno, slightly changed to eliminate false positives such as "classes" or "modules" at the beginning of a line in the =pod sections.
BTW, it would be interesting to just exclude these areas from language detection.
* Eliminates Rexfile from Perl6
And adds .pod6
* Followup to #2709
I just found I had this sitting here, so I might as well follow
instructions to fix it.
* Adds example for pod6
* Eliminates .pod because it's its own language
* Removes bad directory
* Reverting changes that were already there
* Restored CONTRIBUTING.md from head
I see installation of cmake is advised in README.md
* Eliminates `.pod6`
To leave way for #3366 or succeeding PRs.
* Removed by request, since we're no longer adding this extension
* Sorting by alphabetical order filenames
* Moved from sample to test fixtures
* Revert "Remove Arduino as a language (#3933)"
This reverts commit 8e628ecc36.
* Revert "Check generated Jest snap file (#3874)"
This reverts commit ca714340e8.
* Add test to demonstrate Perl syntax detection bug
A Perl 5 .pm file containing the word `module` or `class`, even with
an explicit `use 5.*` statement, is recognized as Perl 6 code.
* Improve Perl 5 and Perl 6 disambiguation
The heuristic for disambiguating Perl 5 and Perl 6 `.pm` files
searched for keywords which can appear in both languages (`class` and
`module`) in addition to the `use` statement check.
Due to Perl 6 being tested first, code containing those words would
always be interpreted as Perl 6.
Test order was thus reversed, testing for Perl 5 first. Since Perl 6
code would never contain a `use 5.*` statement, this does no harm to
Perl 6 detection while fixing the problem for Perl 5.
Fixes: #3637
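
Why the test order matters can be shown with a small sketch. The two patterns below are simplified assumptions written for this example, not the heuristics actually used; `disambiguate_pm` is a hypothetical name.

```python
import re

# Assumed, simplified patterns; the real heuristics may differ.
PERL5 = re.compile(r"^\s*use\s+v?5\.", re.MULTILINE)
PERL6 = re.compile(r"^\s*(?:use\s+v6\b|class\b|module\b)", re.MULTILINE)


def disambiguate_pm(data):
    """Check for Perl 5 first: `use 5.*` never appears in Perl 6 code."""
    if PERL5.search(data):
        return "Perl"
    if PERL6.search(data):
        return "Perl 6"
    return None
```

If the two checks were swapped, a Perl 5 file with a line starting with `class` or `module` would hit the Perl 6 branch before its `use 5.*` statement was ever considered.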
* ash: only interpreter, extension is more commonly used for
Kingdom of Loathing scripting, e.g. github.com/twistedmage/assorted-kol-scripts
* dash: only interpreter, extension is more commonly used for
dashboarding-related stuff
* ksh: extension was already present
* mksh
* pdksh
* Separate find_by_extension and find_by_filename
find_by_extension now takes a path as argument and not only the file extension.
Currently only find_by_extension is used as a strategy.
* Add find_by_filename as first strategy
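
A rough sketch of the split, written in Python for illustration: both lookups receive a path, one matching the whole basename and the other the extension. The tables and the longest-extension-first rule are assumptions made for the example, not the library's data or behavior.

```python
import os

# Hypothetical tables, for illustration only.
FILENAMES = {"Makefile": ["Makefile"]}
EXTENSIONS = {".pm": ["Perl", "Perl 6"], ".rs.in": ["Rust"]}


def find_by_filename(path):
    """Match on the exact basename, e.g. 'src/Makefile' -> 'Makefile'."""
    return FILENAMES.get(os.path.basename(path), [])


def find_by_extension(path):
    """Match on the extension, trying the longest one first ('.rs.in' before '.in')."""
    parts = os.path.basename(path).lower().split(".")[1:]
    for start in range(len(parts)):
        extension = "." + ".".join(parts[start:])
        if extension in EXTENSIONS:
            return EXTENSIONS[extension]
    return []
```

Taking the full path rather than a bare extension is what lets the extension lookup see multi-part suffixes such as `.rs.in`.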
This is a rewrite of the regex that handles Emacs modeline matching. The
current one is a little flaky, causing some files to be misclassified as
"E", among other things.
It's worth noting malformed modelines can still change a file's language
in Emacs. Provided the -*- delimiters are intact, and the mode's name is
decipherable, Emacs will set the appropriate language mode *and* display
a warning about a malformed modeline:
-*- foo-bar mode: ruby -*- # Malformed, but understandable
-*- mode: ruby--*- # Completely invalid
The new pattern accommodates this leniency, making no effort to validate
a modeline's syntax beyond readable mode-names. In other words, if Emacs
accepts certain errors, we should too.
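
The leniency described above can be sketched like this. It is a procedural illustration in Python, not the pattern from this commit; `emacs_mode` is a hypothetical helper, and treating a name that ends in `-` as unreadable is an assumption chosen to reproduce the two examples.

```python
import re

DELIMITED = re.compile(r"-\*-(.*?)-\*-")
MODE = re.compile(r"(?:^|[;\s])mode\s*:\s*([^:;\s]+)", re.IGNORECASE)


def emacs_mode(line):
    """Return the mode named in an Emacs modeline, or None."""
    delimited = DELIMITED.search(line)
    if not delimited:
        return None
    body = delimited.group(1)
    named = MODE.search(body)
    if named:
        # Long form; anything before "mode:" is ignored, valid or not.
        mode = named.group(1)
    else:
        # Short form: the body is nothing but the mode name.
        mode = body.strip()
        if not mode or re.search(r"[:;\s]", mode):
            return None
    # Assumption: a name glued to the closing delimiter by a stray "-"
    # (as in "ruby--*-") is not decipherable.
    if mode.endswith("-"):
        return None
    return mode.lower()
```

Under these assumptions `-*- foo-bar mode: ruby -*-` yields `ruby`, while `-*- mode: ruby--*-` yields nothing.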
The current expressions fail to match certain permutations of options:
vim: noexpandtab: ft=javascript:
vim: titlestring=foo\ ft=notperl ft=javascript:
Version-specific modelines are also unaccounted for:
vim600: set foldmethod=marker ft=javascript: # >= Vim 6.0
vim<600: set ft=javascript: # < Vim 6.0
See http://vimdoc.sourceforge.net/htmldoc/options.html#modeline
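
A sketch of how such modelines could be read, in Python for illustration only. It is not the expression from this commit: `vim_filetype` and the three patterns are assumptions, and the sketch is deliberately more lenient than Vim itself.

```python
import re

# Prefix: vi:, vim:, Vim:, ex:, optionally with a version such as vim600: or vim<600:
PREFIX = re.compile(r"(?:^|\s)(?:vi|[vV]im(?:[<=>]?\d+)?|ex):\s*(.*)")
# Options are separated by colons or whitespace unless escaped with a backslash.
SEPARATOR = re.compile(r"(?<!\\)[:\s]+")
FILETYPE = re.compile(r"^(?:filetype|ft|syntax)=(\w+)$")


def vim_filetype(line):
    """Return the filetype set by a Vim modeline, or None."""
    prefix = PREFIX.search(line)
    if not prefix:
        return None
    options = prefix.group(1)
    # "se[t] {options}:" form: options stop at the first unescaped colon.
    set_form = re.match(r"set?\s+(.*?)(?<!\\):", options)
    if set_form:
        options = set_form.group(1)
    result = None
    for option in SEPARATOR.split(options):
        found = FILETYPE.match(option)
        if found:
            result = found.group(1)  # a later setting overrides an earlier one
    return result
```

Splitting on unescaped separators is what keeps `titlestring=foo\ ft=notperl` together as a single option, so only the trailing `ft=javascript` counts.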
* master: (168 commits)
ruby for example
Bumping version
Updating grammars
Grammar for Less from Atom package
Remove Less grammar
Updating to latest perl6 grammar
Adding Perl6-specific grammar.
Grammar for YANG from Atom package
Support for YANG language
Add detection of GrammarKit-generated files
Add .xproj to list of XML extensions
Test submodules are using HTTPS links
Improved vim modeline detection
Heuristic for Pod vs. Perl
Bumping to v4.7.4
Grammar update
Support .rs.in as a file extension for Rust files.
HTTPS links for submodules
Add the LFE lexer as an example of erlang .xrl
Add the Elixir parser as an example of erlang .yrl
...
TLDR: This greatly increases the flexibility of vim modeline detection
to manually set the language of a file.
In vim there are two forms of modelines:
[text]{white}{vi:|vim:|ex:}[white]{options}
examples: 'vim: syntax=perl', 'ex: filetype=ruby'
-and-
[text]{white}{vi:|vim:|Vim:|ex:}[white]se[t] {options}:[text]
examples: 'vim: set syntax=perl:', 'Vim: se ft=ruby:'
As you can see, there are many combinations. These changes should allow
most combinations to be used. The two most important additions are the
use of the keyword 'syntax', as well as the addition of the first form
(you now no longer need to use the keyword 'set' with a colon at the end).
Use of the first form with 'syntax' is very, very common across GitHub:
https://github.com/search?l=ruby&q=vim%3A+syntax%3D&ref=searchresults&type=Code&utf8=%E2%9C%93
This change allows the filetype/language to be retrieved from more complex vim modelines. The current regex only accepts a `set` line that contains the filetype/ft parameter and nothing else.