Each strategy takes as candidates the languages output by the
previous strategy, if any. This was already the case for the
Classifier and Heuristic strategies as these couldn't generate new
candidate languages (as opposed to the Modeline, Filename, Shebang,
and Extension strategies).
In practice, this means that if, for example, the Shebang
strategy finds two possible languages for a given file (as is
currently possible with the perl interpreter), the next strategy, the
Extension strategy, will use this information and further reduce the
set of possible languages.
Currently, without this commit, the Extension strategy would discard
the results from the previous strategy and start anew, possibly
returning a different language from those returned by the Shebang
strategy.
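The chaining described above can be sketched as follows. This is a minimal illustration, not Linguist's actual code: the lookup tables, the `file` hash keys, and the `detect` helper are all hypothetical.

```ruby
# Illustrative lookup tables (assumptions, not Linguist's real data).
SHEBANG_LANGS   = { "perl" => %w[Perl Pod] }.freeze
EXTENSION_LANGS = { ".pl" => %w[Perl Prolog] }.freeze

# Each strategy receives the file and the candidates from the previous one.
SHEBANG = lambda do |file, candidates|
  found = SHEBANG_LANGS.fetch(file[:interpreter], [])
  candidates.empty? ? found : candidates & found
end

EXTENSION = lambda do |file, candidates|
  found = EXTENSION_LANGS.fetch(File.extname(file[:name]), [])
  # Narrow the previous candidates instead of starting anew.
  candidates.empty? ? found : candidates & found
end

def detect(file, strategies)
  strategies.reduce([]) do |candidates, strategy|
    result = strategy.call(file, candidates)
    return result if result.size == 1   # a single language settles it
    result.empty? ? candidates : result # keep earlier candidates otherwise
  end
end
```

With these tables, a `.pl` file with a `perl` shebang is narrowed from two shebang candidates to the one language both strategies agree on.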
* Mainly fixing problems with Perl heuristics
And also adding a little bit of text to the README file to help with local use and testing.
* Adds new sample
* Adds a couple more samples, not represented before
* Moves installation instructions to CONTRIBUTING.md
Refs #2309 and also changes github.com to a uniform capitalization.
* Correcting error. Great job, CI
* Moving another file
* Adds samples and new checks for perl/perl6
* Stupid mistake
* Changing regex for perl5 vs perl6
Initial suggestion by @pchaigno, slightly changed to eliminate false positives such as "classes" or "modules" at the beginning of a line in the =pod
BTW, it would be interesting to just eliminate these areas for language detection.
* Eliminates Rexfile from Perl6
And adds .pod6
* Followup to #2709
I just found I had this sitting here, so I might as well follow
instructions to fix it.
* Adds example for pod6
* Eliminates .pod because it's its own language
* Removes bad directory
* Reverting changes that were already there
* Restored CONTRIBUTING.md from head
I see installation of cmake is advised in README.md
* Eliminates `.pod6`
To leave way for #3366 or succeeding PRs.
* Removed by request, since we're no longer adding this extension
* Sorting by alphabetical order filenames
* Moved from sample to test fixtures
* Detect Maven wrapper "mvnw"
* Fix build, filenames must be sorted in the "filenames" section of languages.yml, filenames cannot be grouped by topic
* Remove `mvnw` file from languages/Shell/filenames according to @Alhadis recommendation as we are sure that `mvnw` always starts with the shebang `#!/bin/sh`.
* Remove space chars added by mistake
* Update licensee version
This pulls in Licensed 0.10.0 too.
* Use a full path to the grammars
Licensed now enforces this as it's easier than guessing.
* Ensure full path
* Use new path for FSProject
* Starting to adjust tests
* require licensee again
* Fix grammar tests
* verify -> status
* whitelist -> allowed
* explicitly set cache_path in configuration
default for licensed v1.0 changed from `vendor/licenses` to `.licenses`
* load configuration from file location
default configuration file location changed from `vendor/licenses/config.yml` to `.licensed.yml`
* update gemspec for licensed 1.0.0
* Remove unused license hash
* Add detectable key to languages
This key allows overriding whether a language is included in the
language stats of a repository.
* Make detectable override-able using .gitattributes
* Mention `linguist-detectable` in README
* Remove detectable key from languages
Reverts changes in 0f7c0df5.
* Update commit hash to the one that was merged
PR #3806 changed the commit hash. The original commit was not
actually merged into the test/attributes branch.
* Fix check to ensure detectable is defined
* Add include in language stats tests when detectable set
* Ignore detectable when vendored, documentation or overridden
* Add documentation on detectable override in README
* Improve documentation on detectable override in README
* Revert "Remove Arduino as a language (#3933)"
This reverts commit 8e628ecc36.
* Revert "Check generated Jest snap file (#3874)"
This reverts commit ca714340e8.
* grammars: Update several grammars with compat issues
* [WIP] Add new grammar conversion tools
* Wrap in a Docker script
* Proper Dockerfile support
* Add Javadoc grammar
* Remove NPM package.json
* Remove superfluous test
This is now always checked by the grammars compiler
* Update JSyntax grammar to new submodule
* Approve Javadoc license
* grammars: Remove checked-in dependencies
* grammars: Add regex checks to the compiler
* grammars: Point Oz to its actual submodule
* grammars: Refactor compiler to group errors by repo
* grammars: Cleanups to error reporting
* Add Cocoapods to generated list so it doesn't show in PR diffs
* Removed Cocoapods from vendor.yml
* Enhance regex to match only Cocoapod's Pods folder
* Adds additional test cases for generated Pods folder
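As a rough sketch of what matching only the Pods folder could look like, the pattern below anchors `Pods/` to the start of the path or to a directory boundary. The constant and method names are hypothetical, and the pattern is an assumption rather than the exact regex from this change.

```ruby
# Matches "Pods/" only as a complete directory name (assumed pattern).
PODS_PATH = %r{(\A|/)Pods/}.freeze

def cocoapods?(path)
  path.match?(PODS_PATH)
end
```

Anchoring on a directory boundary keeps paths such as `MyPods/x.m` or `Pods.swift` from being treated as generated.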
* Add test to demonstrate Perl syntax detection bug
A Perl 5 .pm file containing the word `module` or `class`, even with
an explicit `use 5.*` statement, is recognized as Perl 6 code.
* Improve Perl 5 and Perl 6 disambiguation
The heuristic for disambiguating Perl 5 and Perl 6 `.pm` files
searched for keywords that can appear in both languages (`class` and
`module`) in addition to checking the `use` statement.
Due to Perl 6 being tested first, code containing those words would
always be interpreted as Perl 6.
Test order was thus reversed, testing for Perl 5 first. Since Perl 6
code would never contain a `use 5.*` statement, this does no harm to
Perl 6 detection while fixing the problem for Perl 5.
Fixes: #3637
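The effect of reversing the test order can be sketched like this. The patterns are simplified assumptions for illustration and the method name is hypothetical; the real heuristics are more involved.

```ruby
# Simplified, assumed patterns; not the actual heuristics.
PERL5_USE  = /^\s*use\s+5\./
PERL6_HINT = /^\s*(?:use\s+v6\b|class\b|module\b)/

def disambiguate_perl_pm(source)
  # Perl 5 is tested first: an explicit `use 5.*` wins even when the
  # file also contains `class` or `module` at the start of a line.
  return "Perl"   if source.match?(PERL5_USE)
  return "Perl 6" if source.match?(PERL6_HINT)
  nil
end
```

A file starting with `use 5.010;` is reported as Perl even if a later line begins with `module`, which is the case the failing test demonstrated.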
* Add failing test for finding with non-String input
Show the failing behaviour of find_by_alias, find_by_name, and []
when non-String input is provided.
* Return nil rather than erroring on non-String input
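A minimal sketch of the guard, using a hypothetical `LanguageIndex` class as a stand-in for the real lookup methods:

```ruby
# Hypothetical stand-in for a language lookup table.
class LanguageIndex
  def initialize(names)
    @by_name = names.each_with_object({}) { |n, h| h[n.downcase] = n }
  end

  def find_by_name(name)
    # Non-String input (nil, Symbol, Integer...) yields nil instead of
    # raising NoMethodError on `downcase`.
    return nil unless name.is_a?(String)
    @by_name[name.downcase]
  end

  alias_method :[], :find_by_name
end
```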
* Support for C++ files generated by protobuf/grpc
This changeset includes a sample generated file.
[grpc](http://grpc.io) is a high performance, open-source universal
RPC framework.
* Account for older gRPC protobuf plugin message
test_classify_ambiguous_languages was not running any test, since
it was looking only for languages that are ambiguous on
filename for known filenames (rather than ambiguous for filename
or extension).
Note the change in test time and assertions.
Before:
Finished in 0.149294s, 40.1892 runs/s, 46.8874 assertions/s.
After:
Finished in 3.043109s, 1.9717 runs/s, 224.7702 assertions/s.
* .xpm and .pm extensions associated with XPM.
* .pm is disambiguated by searching the /* XPM */ string.
This is how `file` performs detection and should work with
every XPM3 file (most XPM files generated by software newer than 1991).
Added XPM samples:
* stick-unfocus.xpm: extracted from Fluxbox (MIT License)
0c13ddc0c8/data/styles/Emerge/pixmaps/stick-unfocus.xpm
* cc-public_domain_mark_white.pm: public domain image from
https://commons.wikimedia.org/wiki/File:Cc-public_domain_mark_white.svg
converted to XPM with ImageMagick (convert input.svg output.xpm).
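The `.pm` check for XPM amounts to looking for the marker comment in the file contents. A sketch, with a hypothetical method name:

```ruby
# The marker comment the commit message says is searched for.
XPM_MARKER = "/* XPM */".freeze

def xpm?(data)
  data.include?(XPM_MARKER)
end
```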
* ash: only interpreter, extension is more commonly used for
Kingdom of Loathing scripting, e.g. github.com/twistedmage/assorted-kol-scripts
* dash: only interpreter, extension is more commonly used for
dashboarding-related stuff
* ksh: extension was already present
* mksh
* pdksh
* Update md5 sums for Ruby 2.4
Ruby 2.4 unified Fixnum and Bignum into Integer. The integers in our tests therefore have a class of Integer instead of Fixnum, which changes their MD5 digests, so we need to update the digests specifically for 2.4.
* Use Gem::Version for safer version comparison
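Comparing version strings directly orders them lexically, so `"2.10.0"` sorts before `"2.4"`. A sketch of the safer comparison, with a hypothetical helper name:

```ruby
require "rubygems"

# True when the given Ruby version is 2.4 or newer.
def integer_unified?(ruby_version = RUBY_VERSION)
  Gem::Version.new(ruby_version) >= Gem::Version.new("2.4")
end
```

`Gem::Version` compares each numeric segment, so a two-digit minor version is handled correctly.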
* fix Roff detection in heuristics
This affects extensions .l, .ms, .n and .rno.
Groff was renamed to Roff in 673aeb32b9851cc58429c4b598c876292aaf70c7,
but heuristic was not updated.
* replace FORTRAN with Fortran
It was already renamed in most places since 4fd8fce08574809aa58e9771e2a9da5d135127be
heuristics.rb was missing though.
* fix casing of GCC Machine Description
* Remove a few hashes for grammars with BSD licenses
There was an error in Licensee v8.8.2, which caused it to not
recognize some BSD licenses. v8.8.3 fixes it.
* Update submodules
Remove 2 grammars from the whitelist because their licenses were
added to a LICENSE file with a proper format (one that Licensee
detects).
MagicPython now supports all scopes that were previously supported
by language-python.
* Update Licensee hashes for grammar licenses
Licensee v8.8 changed the way licenses are normalized, thus changing hashes for
some grammars
* Update Licensee
Prevent automatic updates to major releases
* Return early if no languages supplied
There's no need to tokenise the data when attempting to classify without a limited language scope as no action will be performed when it comes to scoring anyway.
* Add test for empty languages array
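The early return can be sketched as below. The keyword table, the scoring, and the `tokenizer:` parameter are hypothetical and far simpler than a real classifier; the point is only that tokenisation is skipped when there is nothing to score.

```ruby
# Toy keyword table (assumption for illustration only).
KEYWORDS = { "Ruby" => %w[def end], "Python" => %w[def import] }.freeze

def classify(data, languages, tokenizer: ->(text) { text.split })
  # Nothing to score against, so skip tokenising entirely.
  return [] if languages.nil? || languages.empty?

  tokens = tokenizer.call(data)
  languages
    .map { |lang| [lang, tokens.count { |t| KEYWORDS.fetch(lang, []).include?(t) }] }
    .sort_by { |_, score| -score }
end
```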