Arfon Smith
0443c4db2d
Merge pull request #1674 from github/rework-heuristics
Rework heuristics
2014-11-18 10:43:01 -06:00
Vicent Marti
4a10b27611
Remove Pygments
2014-11-14 17:37:12 +01:00
Brandon Keepers
df55043500
Bail earlier if the file is empty.
This will change behavior for empty files with unique extensions, returning nil instead of the language.
2014-11-06 14:49:24 -06:00
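The entry above and the two "Do not detect language if it is a binary file" entries further down describe early exits in language detection. Below is a minimal sketch of that ordering. It is not Linguist's code: `detect_language` and `EXTENSION_LANGUAGES` are hypothetical names, and the two-entry table stands in for languages.yml.

```ruby
# Hypothetical stand-in for the extension table built from languages.yml.
EXTENSION_LANGUAGES = {
  ".rb" => "Ruby",
  ".py" => "Python"
}.freeze

# Binary and empty blobs return nil before any extension lookup, so an
# empty file with an otherwise unique extension gets no language.
def detect_language(filename, data, binary: false)
  return nil if binary
  return nil if data.empty?

  EXTENSION_LANGUAGES[File.extname(filename).downcase]
end

p detect_language("script.rb", "puts 1")
p detect_language("script.rb", "")
```

The first call finds "Ruby" through the extension. The second returns nil because the empty check runs before the lookup. That is the behavior change the commit message warns about.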
Vicent Marti
7dcc3b3edf
Add tm_scope to the BlobHelper
2014-10-13 17:19:38 +02:00
Arfon Smith
61faea0298
Fixing up bin/linguist
2014-07-23 11:20:31 -05:00
Arfon Smith
8c7b54d6e3
Fixing BlobHelper loading issue
2014-07-22 12:26:21 -05:00
Vicent Marti
c4260ae681
Use Rugged when computing Repository stats
2014-06-24 17:41:16 +02:00
Brian Lopez
7e8be1293e
Use the :ruby_encoding value from charlock 0.7.2
2014-06-04 15:51:33 -05:00
Andy Lindeman
aa5a94cc3e
Handle case where newline chars don't transcode to detected encoding
We've seen cases where binary files are detected as encodings such as
ISO-8859-8-I. This usually happens when the binary files are short, so
while the detector is mistaken, there is also not very much data for use
in the detection algorithm in the first place so it's understandable
that the detector was wrong.
In these cases, the code to convert ASCII newline characters to
encodings such as ISO-8859-8-I fails because there is no conversion
between them.
We now simply assume that the data is all one line in those cases. In
reality the data is binary, but this is obviously difficult to detect
reliably.
2014-06-03 12:26:23 -04:00
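The fallback described above can be sketched as follows. This is an illustration under assumptions, not the committed code. `split_lines` is a hypothetical helper. It assumes the detected encoding is ASCII-compatible (for example ISO-8859-1), so a Regexp built from the transcoded newlines can match the data. When Ruby has no converter for the detected name, as with the misdetected ISO-8859-8-I case, it returns the whole blob as one line.

```ruby
# Split data on "\r\n", "\r" or "\n", transcoded into the detected encoding.
# Assumes an ASCII-compatible encoding; if Ruby cannot convert the newline
# characters into that encoding, treat the data as a single line.
def split_lines(data, encoding_name)
  newlines = ["\r\n", "\r", "\n"].map { |nl| nl.encode(encoding_name) }
  data.split(Regexp.union(newlines))
rescue Encoding::ConverterNotFoundError
  [data]
end

latin1 = "a\nb\r\nc\rd".encode("ISO-8859-1")
p split_lines(latin1, "ISO-8859-1")
p split_lines("x\ny", "X-NOT-A-REAL-ENCODING")
```

`"\r\n"` comes first in the union, so a CRLF pair is consumed as one separator and does not produce an empty line.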
Andy Lindeman
09a33f8daa
Takes a different approach
2014-05-21 15:11:06 -04:00
Andy Lindeman
185db0e8d5
Makes sure we do not fail if encoding == nil
It looks like it's valid to call this method even if `binary?` is true.
Encoding as 'ASCII-8BIT' should always succeed.
2014-05-21 13:36:39 -04:00
Andy Lindeman
85efbde3f7
Counts the number of lines correctly for files with certain multibyte encodings
2014-05-21 13:36:39 -04:00
Ted Nyman
69bfe73165
Not yet on the additional binary check
2014-02-16 19:43:33 -08:00
Ted Nyman
b0894e20ef
Merge pull request #301 from andyli/binary
Do not detect language if it is a binary file.
2014-02-16 14:55:07 -08:00
Ted Nyman
7e178cc416
Place guards, checks for multiline shell hacks
2013-12-06 22:04:40 -08:00
Ted Nyman
f51c5e3159
Update documentation related to Pygments
2013-07-12 14:10:55 -07:00
Joshua Peek
032125b114
Axe indexable?
2013-06-10 11:06:18 -05:00
Joshua Peek
b1a137135e
Axe colorize_without_wrapper
2013-06-10 10:58:33 -05:00
Joshua Peek
1a53d1973a
ws
2013-06-10 10:39:59 -05:00
Joshua Peek
3e3fb0cdfe
Say why
2013-06-09 21:02:55 -05:00
Joshua Peek
d907ab9940
Kill mac_format check, buggy
2013-06-09 21:02:11 -05:00
Joshua Peek
9c1d6e154c
Always split lines on \n or \r
2013-06-09 21:01:03 -05:00
Joshua Peek
fa797df0c7
Note that BlobHelper is a turd
2013-06-09 20:51:26 -05:00
Joshua Peek
c7100be139
Make mac_format? private
2013-06-09 20:48:45 -05:00
Yaroslav Shirokov
b68732f0c7
Add detection for CSV
2013-04-04 14:01:09 -07:00
Garen Torikian
4148ff1c29
Add PDF detection
2013-03-25 15:45:58 -07:00
Ted Nyman
de94b85c0d
Merge pull request #295 from yandy/patch-1
downcase extname when we determine whether it's an image
2013-03-10 15:39:55 -07:00
Pascal Borreli
70eafb2ffc
Fixed typos
2013-03-03 21:26:31 +00:00
Mike Skalnik
1766123448
Fix typo in comment
2013-02-26 14:00:42 -08:00
Mike Skalnik
5ea039a74e
Remove OBJ files as support solids
2013-02-26 14:00:29 -08:00
Ted Nyman
58a9b56f4d
Merge pull request #253 from Tass/master
Binary mime type override if languages.yml says so
2013-02-21 21:49:09 -08:00
Mike Skalnik
041ab041ae
Add binary & ascii STLs and OBJs
2013-01-17 14:15:01 -08:00
Andy Li
7c9e973082
Do not detect language if it is a binary file.
2012-11-26 21:54:43 +08:00
Michael Ding
97c998946b
determine image with downcase extname
2012-11-22 20:30:59 +08:00
Michael Ding
8529c90a4d
use downcase string for extname
2012-11-22 17:14:45 +08:00
Joshua Peek
31e33f99f2
Ensure lang is skipped on any binary file
2012-09-24 10:51:39 -05:00
Simon Hafner
b954d22eba
Override for binary mime type based on languages.yml
If the extension already exists in languages.yml, it's probably not a
binary, but code.
2012-09-13 14:55:31 -05:00
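Here is a small sketch of the override described above, together with the downcased extension from the nearby extname entries. Both extension lists and the name `binary_mime_type?` are hypothetical stand-ins for the MIME database and languages.yml. `.ts` illustrates a real collision: it is the extension for MPEG transport streams (`video/mp2t`) as well as for TypeScript.

```ruby
# Hypothetical: extensions whose MIME type is treated as binary.
BINARY_MIME_EXTENSIONS = %w[.pdf .png .ts].freeze
# Hypothetical: extensions that languages.yml lists for some language.
LANGUAGE_EXTENSIONS = %w[.rb .ts].freeze

# A binary MIME type is overridden when languages.yml already claims the
# extension, since such a file is probably code rather than binary data.
def binary_mime_type?(filename)
  ext = File.extname(filename).downcase
  return false if LANGUAGE_EXTENSIONS.include?(ext)

  BINARY_MIME_EXTENSIONS.include?(ext)
end

p binary_mime_type?("manual.PDF")
p binary_mime_type?("app.ts")
```

Checking languages.yml first means a code-like extension never reaches the MIME lookup, whatever the MIME database says about it.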
Ryan Tomayko
887a050db9
Only search the first 4K chars for \r
2012-09-10 01:56:08 -07:00
Ryan Tomayko
2e49c06f47
Handle ✨ Mac Format ✨ when splitting lines
2012-09-10 01:05:48 -07:00
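The two entries above handle classic Mac OS line endings (a bare `\r`) and bound the search for `\r` to the first 4K characters. The following is a hypothetical reconstruction of that idea, not the original method. `MAC_FORMAT_SCAN_LIMIT` and `lines` are names chosen for illustration. Later entries ("Kill mac_format check, buggy", "Always split lines on \n or \r") drop this kind of check.

```ruby
# Hypothetical name for the 4K bound mentioned in the log.
MAC_FORMAT_SCAN_LIMIT = 4096

# True when the first "\r" within the first 4K characters is not part of a
# "\r\n" pair. Bounding the scan keeps the check cheap on large blobs.
def mac_format?(data)
  pos = data[0, MAC_FORMAT_SCAN_LIMIT].index("\r")
  !pos.nil? && data[pos + 1] != "\n"
end

# Split on "\r" for Mac-format data, otherwise on "\n".
def lines(data)
  mac_format?(data) ? data.split("\r") : data.split("\n")
end

p lines("a\rb\rc")
p lines("a\nb")
```

The bound is a trade-off. A file whose first `\r` appears after 4K characters is not detected as Mac format.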
Scott J. Goldman
04394750e7
When testing if a blob is safe to colorize, check size first
Similar to e415a13
2012-09-02 00:08:37 -07:00
Scott J. Goldman
e415a1351b
When testing if a blob is indexable, check size first
Otherwise, charlock_holmes will allocate another large binary
buffer for testing the encoding, which is a problem if the binary
blob is many hundreds of MB large. It'll just fail and crash ruby.
2012-08-31 22:47:19 -07:00
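The ordering in the two entries above (check size first, then run encoding detection) can be sketched like this. The 1 MiB cap and the name `MAX_INDEXABLE_SIZE` are hypothetical, since the log does not give the real limit. The expensive detector, charlock_holmes in the original, is represented by a block so the sketch has no dependencies.

```ruby
# Hypothetical size cap; the log does not state the actual limit.
MAX_INDEXABLE_SIZE = 1024 * 1024 # 1 MiB

# Check the cheap size test first. The block (standing in for encoding
# detection) only runs for small blobs, so a huge blob is never copied
# into the detector's buffer.
def indexable?(size)
  return false if size > MAX_INDEXABLE_SIZE

  encoding = yield
  !encoding.nil?
end

p indexable?(500 * 1024 * 1024) { raise "detector should not run" }
p indexable?(2048) { "UTF-8" }
```

For the 500 MB blob the detector block is never called, which avoids the large allocation the commit message describes.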
Joshua Peek
cfe496e9fc
Drop mime type module
Closes #206
2012-08-20 11:40:32 -05:00
Joshua Peek
b85aeaad3e
Inline mime type lookup into blob helper
2012-08-20 11:33:16 -05:00
Joshua Peek
f8df871d85
Only double check binary mime type when lazy loading blob
2012-08-20 11:20:37 -05:00
Joshua Peek
620150d188
Only double check with binary mime type when lazy loading blob
2012-08-20 11:14:45 -05:00
Joshua Peek
047d23862e
Still index .txt
2012-08-03 16:34:53 -05:00
Joshua Peek
804e23e995
Extract separate language detection method
2012-08-03 16:03:06 -05:00
Joshua Peek
41b7d13aa7
Extract generated blob check into its own module
2012-08-03 15:47:50 -05:00
Joshua Peek
16a67cb852
Move shebang detection into classifier
Fixes #203
2012-08-03 15:07:36 -05:00
Joshua Peek
6014bd015e
Change find_by_filename api to return all matching languages
2012-08-03 13:53:12 -05:00