
147 Commits

Author SHA1 Message Date
Andy Lindeman
aa5a94cc3e Handle case where newline chars don't transcode to detected encoding
We've seen cases where binary files are detected as encodings such as
ISO-8859-8-I. This usually happens when the binary files are short, so
while the detector is mistaken, there is also not very much data for use
in the detection algorithm in the first place, so it's understandable
that the detector was wrong.

In these cases, the code to convert ASCII newline characters to
encodings such as ISO-8859-8-I fails because there is no conversion
between them.

We now simply assume that the data is all one line in those cases. In
reality the data is binary, but this is obviously difficult to detect
reliably.
2014-06-03 12:26:23 -04:00
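The fallback described in aa5a94cc3e can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's code. `split_lines` is a hypothetical name. Each newline sequence (CRLF, CR, LF) is transcoded into the detected encoding so that multibyte encodings such as UTF-16 split on the right byte sequences. Any encoding Python cannot look up, or that cannot represent a newline, makes the whole blob a single line. Treating a missing encoding as Latin-1 (a byte-transparent codec) is an assumption standing in for the 'ASCII-8BIT' fallback mentioned in 185db0e8d5.

```python
import re
from typing import List, Optional

NEWLINES = ("\r\n", "\r", "\n")  # CRLF first so it wins over a lone CR


def split_lines(data: bytes, encoding: Optional[str]) -> List[bytes]:
    """Split raw bytes on newlines transcoded into the detected encoding."""
    if encoding is None:
        # Assumption: with no detected encoding, treat the bytes as-is.
        # Latin-1 maps every byte to itself, so this always succeeds.
        encoding = "latin-1"
    try:
        # Some codecs (e.g. "utf-16") emit a BOM on every encode call;
        # encoding "" yields just that BOM, which is stripped off.
        bom = "".encode(encoding)
        encoded = [nl.encode(encoding)[len(bom):] for nl in NEWLINES]
    except (LookupError, UnicodeEncodeError):
        # No conversion from the newline characters to this encoding:
        # assume the data is all one line.
        return [data]
    pattern = re.compile(b"|".join(re.escape(nl) for nl in encoded))
    return pattern.split(data)
```

For example, `split_lines(b"a\nb\r\nc", "utf-8")` yields `[b"a", b"b", b"c"]`, while an unrecognized encoding name returns the input unchanged as a one-element list.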
Andy Lindeman
09a33f8daa Takes a different approach 2014-05-21 15:11:06 -04:00
Andy Lindeman
185db0e8d5 Makes sure we do not fail if encoding == nil
It looks like it's valid to call this method even if `binary?` is true.
Encoding as 'ASCII-8BIT' should always succeed.
2014-05-21 13:36:39 -04:00
Andy Lindeman
85efbde3f7 Counts the number of lines correctly for files with certain multibyte encodings 2014-05-21 13:36:39 -04:00
Ted Nyman
69bfe73165 Not yet on the additional binary check 2014-02-16 19:43:33 -08:00
Ted Nyman
b0894e20ef Merge pull request #301 from andyli/binary
Do not detect language if it is a binary file.
2014-02-16 14:55:07 -08:00
Ted Nyman
7e178cc416 Place guards, checks for multiline shell hacks 2013-12-06 22:04:40 -08:00
Ted Nyman
f51c5e3159 Update documentation related to Pygments 2013-07-12 14:10:55 -07:00
Joshua Peek
032125b114 Axe indexable? 2013-06-10 11:06:18 -05:00
Joshua Peek
b1a137135e Axe colorize_without_wrapper 2013-06-10 10:58:33 -05:00
Joshua Peek
1a53d1973a ws 2013-06-10 10:39:59 -05:00
Joshua Peek
3e3fb0cdfe Say why 2013-06-09 21:02:55 -05:00
Joshua Peek
d907ab9940 Kill mac_format check, buggy 2013-06-09 21:02:11 -05:00
Joshua Peek
9c1d6e154c Always split lines on \n or \r 2013-06-09 21:01:03 -05:00
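Splitting on "\n or \r" can be sketched as below, assuming that a CRLF pair should count as one break rather than two (an assumption; the commit message names only the two characters). `split_text_lines` is a hypothetical name. A regex alternation is used instead of Python's `str.splitlines`, which also breaks on characters such as form feed and U+2028.

```python
import re
from typing import List

# CRLF is tried first so a Windows line ending counts as one break;
# otherwise a lone CR (classic Mac) or a lone LF (Unix) ends a line.
LINE_BREAK = re.compile(r"\r\n|\r|\n")


def split_text_lines(text: str) -> List[str]:
    """Split decoded text on CRLF, CR, or LF only."""
    return LINE_BREAK.split(text)
```

`split_text_lines("a\r\nb\rc\nd")` returns `["a", "b", "c", "d"]`. A trailing newline yields a final empty string, so `"a\n"` gives `["a", ""]`.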
Joshua Peek
fa797df0c7 Note that BlobHelper is a turd 2013-06-09 20:51:26 -05:00
Joshua Peek
c7100be139 Make mac_format? private 2013-06-09 20:48:45 -05:00
Yaroslav Shirokov
b68732f0c7 Add detection for CSV 2013-04-04 14:01:09 -07:00
Garen Torikian
4148ff1c29 Add PDF detection 2013-03-25 15:45:58 -07:00
Ted Nyman
de94b85c0d Merge pull request #295 from yandy/patch-1
downcase extname when we determine whether it's an image
2013-03-10 15:39:55 -07:00
Pascal Borreli
70eafb2ffc Fixed typos 2013-03-03 21:26:31 +00:00
Mike Skalnik
1766123448 Fix typo in comment 2013-02-26 14:00:42 -08:00
Mike Skalnik
5ea039a74e Remove OBJ files as support solids 2013-02-26 14:00:29 -08:00
Ted Nyman
58a9b56f4d Merge pull request #253 from Tass/master
Binary mime type override if languages.yml says so
2013-02-21 21:49:09 -08:00
Mike Skalnik
041ab041ae Add binary & ascii STLs and OBJs 2013-01-17 14:15:01 -08:00
Andy Li
7c9e973082 Do not detect language if it is a binary file. 2012-11-26 21:54:43 +08:00
Michael Ding
97c998946b determine image with downcase extname 2012-11-22 20:30:59 +08:00
Michael Ding
8529c90a4d use downcase string for extname 2012-11-22 17:14:45 +08:00
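A case-insensitive image check along the lines of 97c998946b and 8529c90a4d might look like this. It is a sketch only: `is_image` and the extension set are hypothetical, and the project's real list of image extensions may differ.

```python
import os

# Hypothetical list for illustration; the project's real list may differ.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif"}


def is_image(filename: str) -> bool:
    # Lowercase the extension so "LOGO.PNG" matches the same as "logo.png".
    extension = os.path.splitext(filename)[1].lower()
    return extension in IMAGE_EXTENSIONS
```

Without the lowercasing step, an upper-case extension such as `.PNG` would fall through the set lookup and the file would not be treated as an image.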
Joshua Peek
31e33f99f2 Ensure lang is skipped on any binary file 2012-09-24 10:51:39 -05:00
Simon Hafner
b954d22eba Override for binary mime type based on languages.yml
If the extension already exists in languages.yml, the file is probably
code rather than binary.
2012-09-13 14:55:31 -05:00
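The override in b954d22eba can be sketched as a check that runs before the MIME lookup. Everything here is a hypothetical stand-in: the MIME table, the set of extensions assumed to appear in languages.yml, and the function name. The `.ts` entry illustrates the kind of clash involved, since that extension is used both for TypeScript source and for MPEG transport stream video.

```python
import os
from typing import Dict, Set

# Hypothetical stand-ins for a MIME lookup table and for the extensions
# that languages.yml is assumed to list.
MIME_TYPES: Dict[str, str] = {
    ".png": "image/png",
    ".pdf": "application/pdf",
    ".ts": "video/mp2t",  # MPEG transport stream, a binary format
}
KNOWN_LANGUAGE_EXTENSIONS: Set[str] = {".rb", ".py", ".ts"}  # .ts: TypeScript
BINARY_MIME_PREFIXES = ("image/", "video/", "application/pdf")


def binary_mime_type(filename: str) -> bool:
    extension = os.path.splitext(filename)[1]
    # Override: an extension that languages.yml knows is probably code,
    # whatever its MIME type says.
    if extension in KNOWN_LANGUAGE_EXTENSIONS:
        return False
    mime = MIME_TYPES.get(extension)
    return mime is not None and mime.startswith(BINARY_MIME_PREFIXES)
```

With this ordering, `app.ts` is not reported as binary even though its MIME type is a video type, while `logo.png` still is.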
Ryan Tomayko
887a050db9 Only search the first 4K chars for \r 2012-09-10 01:56:08 -07:00
Ryan Tomayko
2e49c06f47 Handle Mac Format when splitting lines 2012-09-10 01:05:48 -07:00
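A Mac Format check limited to the first 4K characters, as in 887a050db9, could be sketched as follows. The detection rule is an assumption, not confirmed by the log: a file counts as Mac Format if the first carriage return within the first 4096 characters is not followed by a line feed. `mac_format` is a hypothetical name. The log also records that d907ab9940 later removed the mac_format check as buggy.

```python
# Only the first 4096 characters are scanned for a carriage return.
SCAN_LIMIT = 4096


def mac_format(text: str) -> bool:
    # Look for the first CR within the scan window only.
    pos = text.find("\r", 0, SCAN_LIMIT)
    if pos == -1:
        return False
    # Slicing avoids an IndexError when the CR is the last character.
    return text[pos + 1 : pos + 2] != "\n"
```

Bounding the search keeps the check cheap on large files, at the cost of ignoring any carriage returns that appear only after the window.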
Scott J. Goldman
04394750e7 When testing if a blob is safe to colorize, check size first
Similar to e415a13
2012-09-02 00:08:37 -07:00
Scott J. Goldman
e415a1351b When testing if a blob is indexable, check size first
Otherwise, charlock_holmes will allocate another large binary
buffer for testing the encoding, which is a problem if the binary
blob is many hundreds of MB in size: it will simply fail and crash Ruby.
2012-08-31 22:47:19 -07:00
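Checking size first, as in e415a1351b and 04394750e7, can be sketched as a guard in front of encoding detection. All of the following are hypothetical: the blob's size is assumed to be known up front (for example from repository metadata) without reading its contents, `load` and `detector` are placeholder callables standing in for reading the blob and for a detector such as charlock_holmes, and the 1 MiB limit is illustrative.

```python
from typing import Callable, Optional

# Hypothetical limit; the log does not say what threshold was used.
MAX_DETECT_BYTES = 1024 * 1024


def detect_encoding(
    size: int,
    load: Callable[[], bytes],
    detector: Callable[[bytes], Optional[str]],
    limit: int = MAX_DETECT_BYTES,
) -> Optional[str]:
    # Check the size first. Blobs over the limit are never loaded or passed
    # to the detector, so no extra buffer is allocated for them.
    if size > limit:
        return None
    return detector(load())
```

Because the comparison happens before `load` is called, an oversized blob costs only an integer comparison rather than a full read plus the detector's own buffer.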
Joshua Peek
cfe496e9fc Drop mime type module
Closes #206
2012-08-20 11:40:32 -05:00
Joshua Peek
b85aeaad3e Inline mime type lookup into blob helper 2012-08-20 11:33:16 -05:00
Joshua Peek
f8df871d85 Only double check binary mime type when lazy loading blob 2012-08-20 11:20:37 -05:00
Joshua Peek
620150d188 Only double check with binary mime type when lazy loading blob 2012-08-20 11:14:45 -05:00
Joshua Peek
047d23862e Still index .txt 2012-08-03 16:34:53 -05:00
Joshua Peek
804e23e995 Extract separate language detection method 2012-08-03 16:03:06 -05:00
Joshua Peek
41b7d13aa7 Extract generated blob check into its own module 2012-08-03 15:47:50 -05:00
Joshua Peek
16a67cb852 Move shebang detection into classifier
Fixes #203
2012-08-03 15:07:36 -05:00
Joshua Peek
6014bd015e Change find_by_filename api to return all matching languages 2012-08-03 13:53:12 -05:00
Joshua Peek
65d05e02c9 name can be nil 2012-07-23 17:19:11 -05:00
Joshua Peek
6ac9138aed Remove pathname
Closes #207
2012-07-23 16:50:30 -05:00
Joshua Peek
bf944f6d1a Make classify a function on the Classifier 2012-07-23 13:47:15 -05:00
Joshua Peek
80e8ee7ce6 Rename Sample -> Samples 2012-07-23 13:15:27 -05:00
Joshua Peek
0c9a947f39 Load classifier db into sample data hash 2012-07-23 13:13:52 -05:00
Joshua Peek
7292bdc180 Change Classifier to accept language name Strings 2012-07-20 15:52:27 -05:00
Joshua Peek
e58f268258 Associate .module with drupal php 2012-07-20 15:42:21 -05:00
Joshua Peek
0867e7b69b Remove old language disambiguation functions 2012-07-20 15:30:53 -05:00