* Add test to demonstrate Perl syntax detection bug
A Perl 5 .pm file containing the word `module` or `class`, even with
an explicit `use 5.*` statement, is recognized as Perl 6 code.
* Improve Perl 5 and Perl 6 disambiguation
The heuristic for disambiguating Perl 5 and Perl 6 `.pm` files
searched for keywords which can appear in both languages (`class` and
`module`) in addition to checking the `use` statement.
Due to Perl 6 being tested first, code containing those words would
always be interpreted as Perl 6.
The test order is therefore reversed so that Perl 5 is tested first.
Since Perl 6 code would never contain a `use 5.*` statement, this does
no harm to Perl 6 detection while fixing the problem for Perl 5.
Fixes: #3637
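The reordering described above can be sketched roughly as follows. This is
an illustration only, not the actual heuristics code: the method name
`disambiguate_pm` and both regular expressions are assumptions.

```ruby
# Hypothetical sketch: the method name and regexes are illustrative only.
def disambiguate_pm(data)
  # Test Perl 5 first: an explicit `use 5.x` never appears in Perl 6 code.
  if /\buse\s+v?5\./.match?(data)
    "Perl"
  # Only afterwards look for keywords that may also occur in Perl 5 code.
  elsif /^\s*(?:use\s+v6\b|class\b|module\b)/.match?(data)
    "Perl 6"
  end
end

perl5 = "package Foo;\nuse 5.010;\nmodule_loader();\nmodule Bar;\n"
disambiguate_pm(perl5)                     # Perl 5 wins despite `module`
disambiguate_pm("use v6;\nclass Foo { }")  # still detected as Perl 6
```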
* Add failing test for finding with non-String input
Show the failing behaviour of find_by_alias, find_by_name, and []
when non-String input is provided.
* Return nil rather than erroring on non-String input
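A minimal sketch of the guard, assuming a simple type check is used. The
lookup table and the default argument are hypothetical stand-ins, not the
real language index.

```ruby
# Hypothetical sketch: `index` stands in for the real language index.
def find_by_name(name, index = { "ruby" => "Ruby", "perl" => "Perl" })
  # Non-String input (nil, Symbol, Integer, ...) yields nil instead of
  # raising NoMethodError on `downcase`.
  return nil unless name.is_a?(String)
  index[name.downcase]
end

find_by_name("Ruby")  # found via case-insensitive lookup
find_by_name(nil)     # nil, no exception
find_by_name(:ruby)   # nil, no exception
```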
* Support for C++ files generated by protobuf/grpc
This changeset includes a sample generated file.
[grpc](http://grpc.io) is a high performance, open-source universal
RPC framework.
* Account for older gRPC protobuf plugin message
test_classify_ambiguous_languages was not running any test, since
it was looking only for languages that are ambiguous on
filename for known filenames (rather than ambiguous for filename
or extension).
Note the change in test time and assertions.
Before:
Finished in 0.149294s, 40.1892 runs/s, 46.8874 assertions/s.
After:
Finished in 3.043109s, 1.9717 runs/s, 224.7702 assertions/s.
* .xpm and .pm extensions associated with XPM.
* .pm is disambiguated by searching for the `/* XPM */` string.
This is how `file` performs detection and should work with
every XPM3 file (most XPM generated by software later than 1991).
Added XPM samples:
* stick-unfocus.xpm: extracted from Fluxbox (MIT License)
0c13ddc0c8/data/styles/Emerge/pixmaps/stick-unfocus.xpm
* cc-public_domain_mark_white.pm: public domain image from
https://commons.wikimedia.org/wiki/File:Cc-public_domain_mark_white.svg
converted to XPM with ImageMagick (convert input.svg output.xpm).
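The `.pm` check described above amounts to a substring search. A minimal
sketch, assuming a plain `include?` test; the helper name `xpm?` is
hypothetical.

```ruby
# Hypothetical helper: treats any content carrying the XPM3 marker as XPM.
def xpm?(data)
  data.include?("/* XPM */")
end

xpm?("/* XPM */\nstatic char * icon[] = {\n")  # marker present
xpm?("package Foo;\nuse 5.010;\n1;\n")         # no marker, so not XPM
```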
* ash: only interpreter, extension is more commonly used for
Kingdom of Loathing scripting, e.g. github.com/twistedmage/assorted-kol-scripts
* dash: only interpreter, extension is more commonly used for
dashboarding-related stuff
* ksh: extension was already present
* mksh
* pdksh
* Update md5 sums for Ruby 2.4
Ruby 2.4 unified Fixnum and Bignum into Integer, deprecating the old constants. The integers in our tests therefore have a class of Integer instead of Fixnum, which changes their MD5 digests, so the digests need updating specifically for 2.4.
* Use Gem::Version for safer version comparison
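To illustrate why `Gem::Version` is safer than comparing version strings
directly, here is a short sketch. The helper name `integer_unified?` is
hypothetical and only shows the pattern.

```ruby
require "rubygems"

# String comparison is lexicographic and gets multi-digit segments wrong.
"2.10.0" >= "2.4.0"   # false, although 2.10.0 is the newer version

# Gem::Version compares segment by segment as numbers.
Gem::Version.new("2.10.0") >= Gem::Version.new("2.4.0")   # true

# Hypothetical helper: does this Ruby report Integer rather than Fixnum?
def integer_unified?(ruby_version = RUBY_VERSION)
  Gem::Version.new(ruby_version) >= Gem::Version.new("2.4.0")
end
```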
* fix Roff detection in heuristics
This affects extensions .l, .ms, .n and .rno.
Groff was renamed to Roff in 673aeb32b9851cc58429c4b598c876292aaf70c7,
but the heuristic was not updated.
* replace FORTRAN with Fortran
It was already renamed in most places in 4fd8fce08574809aa58e9771e2a9da5d135127be,
but heuristics.rb was missed.
* fix casing of GCC Machine Description
* Remove a few hashes for grammars with BSD licenses
There was an error in Licensee v8.8.2, which caused it to not
recognize some BSD licenses. v8.8.3 fixes it.
* Update submodules
Remove 2 grammars from the whitelist because their licenses were
added to a LICENSE file with a proper format (one that Licensee
detects).
MagicPython now supports all scopes that were previously supported
by language-python.
* Update Licensee hashes for grammar licenses
Licensee v8.8 changed the way licenses are normalized, thus changing the hashes for
some grammars.
* Update Licensee
Prevent automatic updates to major releases
* Return early if no languages supplied
There's no need to tokenise the data when classifying with no languages supplied, as nothing would be scored anyway.
* Add test for empty languages array
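A rough sketch of the early return, under simplifying assumptions:
`tokenize` and the scoring below are placeholder stand-ins, and `languages`
is modelled as a hypothetical hash of name to keyword list rather than the
real classifier database.

```ruby
# Placeholder tokeniser standing in for the real one.
def tokenize(data)
  data.scan(/\w+/)
end

# Hypothetical classifier: `languages` maps a name to a keyword list.
def classify(data, languages)
  # Nothing to score against, so skip tokenisation entirely.
  return [] if languages.nil? || languages.empty?

  tokens = tokenize(data)
  # Placeholder scoring: count tokens that appear in each keyword list.
  languages.map { |name, keywords| [name, (tokens & keywords).size] }
           .sort_by { |_, score| -score }
end

classify("def foo; end", {})  # returns [] without tokenising
```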
* Separate find_by_extension and find_by_filename
find_by_extension now takes a path as argument and not only the file extension.
Currently only find_by_extension is used as a strategy.
* Add find_by_filename as first strategy
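The resulting lookup order can be sketched as below. The tables are
hypothetical miniature examples and `detect` is an illustrative wrapper;
only the method names `find_by_filename` and `find_by_extension` and the
fact that both take a path come from the changes above.

```ruby
# Hypothetical miniature tables; the real data lives in languages.yml.
def find_by_filename(path)
  { "Rakefile" => "Ruby", "Makefile" => "Makefile" }[File.basename(path)]
end

# Takes a full path, not just the extension.
def find_by_extension(path)
  { ".rb" => "Ruby", ".pm" => "Perl" }[File.extname(path).downcase]
end

# Illustrative wrapper: try the exact filename first, then the extension.
def detect(path)
  find_by_filename(path) || find_by_extension(path)
end

detect("lib/Rakefile")  # resolved by filename
detect("src/Foo.pm")    # resolved by extension
```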
* Remove deprecated find_by_shebang
* Remove deprecated ace_modes function
* Remove deprecated primary_extension function
Gists don't have a language dropdown anymore
* Remove deprecated Linguist::Language.detect function
* Remove deprecated search_term field
* Generate language_id from language names
The language_id is generated from the SHA256 hash of the language's name
* Test the validity of language ids
All languages should have a positive 32-bit integer as an id.
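One way to derive such an id is sketched below. How the digest is reduced
to 32 bits is an assumption made for illustration (modulo 2**31 - 1, plus
one to keep the value positive); the actual script may reduce it
differently.

```ruby
require "digest"

# Hypothetical reduction: the modulo-and-offset step is an assumption.
MAX_ID = 2**31 - 1

def language_id(name)
  # Interpret the hex digest as an integer, then fold it into 1..MAX_ID.
  Digest::SHA256.hexdigest(name).to_i(16) % MAX_ID + 1
end

id = language_id("Ruby")
id.between?(1, MAX_ID)  # always a positive 32-bit signed integer
```

Because the id depends only on the name, regenerating it is deterministic and does not depend on the order of entries.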
* Update languages.yml header in set-language-ids