* refactor-heuristics: (43 commits)
update docs
Clean up heuristic logic
Allow disambiguate to return an Array
Rename .create to .disambiguate
docs
Remove inactive heuristics
Refactor heuristics
Not going back
docs
Move call method into existing Classifier class
Try strategies until one language is returned
Remove unneeded empty blob check
Add F# and GLSL samples. Add .fs as a Forth and GLSL extension. Add heuristic to disambiguate between F#, Forth, and GLSL.
byebug requires ruby 2.0
Remove test for removed extension
Fix typo in test
add rake interpreter
add python3 interpreter
Remove old wrong_shebang.rb sample
Add byebug
...
Conflicts:
lib/linguist/heuristics.rb
test/test_heuristics.rb
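The F#/Forth/GLSL heuristic for `.fs` can be pictured as a set of line-leading patterns, one per candidate language. The sketch below is hypothetical: `FS_PATTERNS` and `disambiguate_fs` are illustrative names, and the patterns are assumptions about typical source files, not Linguist's actual rules. It returns an Array, in line with "Allow disambiguate to return an Array".

```ruby
# Hypothetical sketch, not Linguist's real heuristic: each candidate language
# gets a pattern for tokens that typically start a line in that language.
# In Ruby, ^ matches at the start of every line, not just the string.
FS_PATTERNS = {
  "Forth" => /^(: |\\ |\( )/,                                   # colon definitions, \ and ( comments
  "F#"    => /^\s*(let|open|module|namespace|type)\b/,          # common top-level keywords
  "GLSL"  => /^\s*(#version|uniform|varying|precision|void\s+main)\b/
}.freeze

# Returns an Array of matching language names, empty when nothing matches,
# so callers never have to special-case nil.
def disambiguate_fs(data)
  FS_PATTERNS.select { |_name, pattern| pattern.match?(data) }.keys
end
```

For example, `disambiguate_fs(": square dup * ;\n")` yields `["Forth"]`, while a file matching several patterns returns every match and leaves the final choice to a later strategy.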
* origin/master:
byebug requires ruby 2.0
Remove test for removed extension
Merge branch 'master' into 1233-local
Removing pry runtime dependency
Moving to fixtures
Language detection test for non-sample files
Refactoring of Language.detect
Try shebang detection if the extension is unknown
Change unknown extension of PHP sample file
We require samples for explicitly defined filenames that match multiple languages. This is generally a good thing, but in this case they will be identical.
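"Try shebang detection if the extension is unknown" depends on pulling an interpreter name out of the first line. Below is a minimal sketch under stated assumptions. `shebang_interpreter`, `detect_by_shebang` and the small `INTERPRETERS` table are hypothetical names. The rake and python3 entries echo the "add rake interpreter" and "add python3 interpreter" commits. The code follows `/usr/bin/env` so that `#!/usr/bin/env python3` resolves to `python3`.

```ruby
# Hypothetical sketch: extract the interpreter from a "#!" first line.
def shebang_interpreter(data)
  first_line = data.lines.first.to_s
  return nil unless first_line.start_with?("#!")

  tokens = first_line[2..-1].strip.split # drop "#!" and split on whitespace
  return nil if tokens.empty?

  interpreter = File.basename(tokens.shift)
  # `env` only forwards to the next word; skip env flags such as -S first.
  if interpreter == "env"
    tokens.shift while tokens.first&.start_with?("-")
    interpreter = tokens.first
  end
  interpreter
end

# Hypothetical interpreter table; the real mapping lives in the language data.
INTERPRETERS = { "python3" => "Python", "rake" => "Ruby", "ruby" => "Ruby" }.freeze

# Returns a language name, or nil when there is no recognised shebang.
def detect_by_shebang(data)
  INTERPRETERS[shebang_interpreter(data)]
end
```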
* origin/master:
Allow mime-types 2.x to be used with Linguist
Upgrade to rugged 0.22.0b1
Mention that languages need to be quite popular
fix vendor/cache
Gemfile.lock is no longer considered generated
Tests for BlobHelper#empty?
remove reference to empty.js
Remove more empty samples
Bail earlier if the file is empty.
Moving comments
Use heuristics earlier to inform the rest of the classification process
Removing inconsistency of `find_by_heuristics` (was sometimes returning nil and sometimes returning an empty array)
Removing unused array of candidate languages.
Reworking most heuristics to only return one match
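Several commits above describe one detection flow. Strategies are tried until exactly one language is returned, heuristics run early so their output can inform later steps, empty files bail out early, and results are always arrays rather than sometimes nil. A hypothetical sketch of that control flow follows. `detect` and the lambda strategy interface are illustrative assumptions, not Linguist's `Language.detect` or `Classifier` API.

```ruby
# Hypothetical sketch of the strategy chain, not Linguist's actual code.
# Each strategy is a callable taking (blob, candidates) and returning an
# Array of language names; Kernel#Array normalises a stray nil to [].
def detect(blob, strategies)
  return nil if blob.empty? # bail early on empty files

  candidates = []
  strategies.each do |strategy|
    result = Array(strategy.call(blob, candidates))
    return result.first if result.size == 1 # one language: done
    candidates = result unless result.empty? # narrow for the next strategy
  end
  nil # no strategy settled on a single language
end
```

Usage, with two toy strategies: an extension-like step that proposes `["PHP", "Hack"]`, then a heuristic that keeps only `"Hack"` when the file starts with `<?hh`.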
Built on top of PR #1447. Adds a simple heuristic check for Hack files vs PHP files (`<?hh` vs other `<?`).
Tested by verifying that the Hack example site was detected as 100% Hack and that Laravel was detected as 100% PHP. (Without the heuristic, Laravel gets detected as about 50% Hack, just by randomness in the classifier since PHP and Hack are very hard to distinguish unless you actually parse the file and look for specific language features.)
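The `<?hh` vs `<?` check described above fits in a few lines. This is a minimal sketch. `disambiguate_hack_php` is a hypothetical name, and a plain substring test stands in for whatever matching the real heuristic uses.

```ruby
# Hypothetical sketch of the Hack-vs-PHP heuristic: a "<?hh" opener means
# Hack; any other "<?" opener is treated as PHP. Returns an Array, empty
# when neither opener is present, so the classifier can take over.
def disambiguate_hack_php(data)
  if data.include?("<?hh")
    ["Hack"]
  elsif data.include?("<?")
    ["PHP"]
  else
    []
  end
end
```

Checking `<?hh` first matters, since every Hack file also contains the more general `<?` opener.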
Hack is Facebook's dialect of PHP: http://hacklang.org/. This adds support for detecting it via the ".hh" file extension. Although that extension technically conflicts with C++ headers, the files look different enough that the existing classifier based on sample code has no trouble distinguishing them.
This diff deliberately does not deal with detecting ".php" as another valid extension for Hack code. That is much trickier, since to the classifier such code looks basically identical to PHP, and it needs a different approach.