Documentation is an important part of a software project but is not
generally thought of as part of the code for that project. Repository
language statistics are used to quantify the project's code, so it makes
sense to exclude documentation from those computations.
Documentation files are recognized similarly to vendored files.
lib/linguist/documentation.yml contains regular expressions to match
common names for documentation files. A new linguist-documentation Git
attribute can be used to override those conventions.
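
The lookup described above can be sketched roughly as follows. This is an illustration, not Linguist's actual code: the patterns are hypothetical stand-ins for the entries in lib/linguist/documentation.yml, and `documentation?` with its `attribute` argument is an assumed helper that models an explicit linguist-documentation setting.

```ruby
# Hypothetical patterns standing in for the entries in
# lib/linguist/documentation.yml; the real file's contents may differ.
DOCUMENTATION_PATTERNS = [
  %r{(^|/)[Dd]ocs?/},
  %r{(^|/)README(\.|$)},
  %r{(^|/)CHANGELOG(\.|$)}
]
DOCUMENTATION_MATCHER = Regexp.union(DOCUMENTATION_PATTERNS)

# `attribute` models an explicit linguist-documentation setting for the
# path: true or false when set, nil when unset (assumed representation).
def documentation?(path, attribute = nil)
  # An explicit attribute always wins over the naming conventions.
  return attribute unless attribute.nil?
  DOCUMENTATION_MATCHER.match?(path)
end

documentation?("docs/index.md")             # matched by convention
documentation?("src/parser.rb")             # not documentation
documentation?("docs/generator.rb", false)  # attribute overrides convention
```

Checking the attribute first keeps the override unconditional, so a path can be forced in or out of the documentation set regardless of its name.
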
Now that FileBlobs with relative paths can still access their files on
disk, we can use relative paths for all FileBlobs in the test. This more
closely matches the behavior in github.com's codebase, where all blobs
use relative paths.
This gives us a consistent test framework across all Ruby versions which
should help avoid errors that are only found when CI runs the tests on
different Rubies. (And this fixes an immediate bug where there's no
`skip` method in the version of test-unit we're currently using only on
Ruby 2.2.)
* origin/master: (31 commits)
Link to Lightshow in CONTRIBUTING.md
Switch to a better F# grammar
Bump Rugged again
Checkout the master for testing
Rugged 0.22.0b3
Reordering
Bump version to 4.0.3
Add some docs for tm_scope
Change NONE to none
Checking other case for Chart.jS
Test that all languages have grammars
Fix RHTML's tm_scope
Chart JS is vendored
Switch to a better grammar for Bro
reorder again…
put cjsx at the top
Use a SQF grammar for SQF files
move cjsx before iced
move cjsx before iced
change component name
...
Conflicts:
test/test_language.rb
We've seen cases where binary files are detected as encodings such as
ISO-8859-8-I. This usually happens when the binary files are short. The
detector is mistaken in these cases, but with so little data available
to the detection algorithm the mistake is understandable.
In these cases, the code to convert ASCII newline characters to
encodings such as ISO-8859-8-I fails because there is no conversion
between them.
We now simply assume that the data is all one line in those cases. In
reality the data is binary, but this is obviously difficult to detect
reliably.
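
A minimal sketch of the fallback described above, assuming the newline sequences are re-encoded with `String#encode` and that a missing conversion surfaces as `Encoding::ConverterNotFoundError`. The `split_lines` helper and its parameters are hypothetical names used for illustration.

```ruby
# Hypothetical helper: `data` is the raw content and `detected_encoding`
# is the encoding name reported by the detector.
def split_lines(data, detected_encoding)
  # Re-encode each newline sequence into the detected encoding, then tag
  # the resulting bytes with the encoding of `data` so they can be matched.
  newlines = ["\r\n", "\r", "\n"].map do |nl|
    nl.encode(detected_encoding, "ASCII-8BIT").force_encoding(data.encoding)
  end
  data.split(Regexp.union(newlines), -1)
rescue Encoding::ConverterNotFoundError
  # No conversion exists for the detected encoding, so the newlines cannot
  # be re-encoded. Treat the data as one big line.
  [data]
end

split_lines("a\nb\r\nc".b, "UTF-8")              # splits into three lines
split_lines("\x00\x01\n\x02".b, "ISO-8859-8-I")  # falls back if no converter exists
```

Returning `[data]` keeps callers working with an array of lines in both cases, at the cost of reporting a single line for content that is really binary.
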