Andy Lindeman
aa5a94cc3e
Handle case where newline chars don't transcode to detected encoding
We've seen cases where binary files are detected as encodings such as
ISO-8859-8-I. This usually happens when the binary files are short, so
while the detector is mistaken, there is also very little data for the
detection algorithm to work with in the first place, so it's
understandable that the detector was wrong.
In these cases, the code to convert ASCII newline characters to
encodings such as ISO-8859-8-I fails because there is no conversion
between them.
We now simply assume that the data is all one line in those cases. In
reality the data is binary, but this is obviously difficult to detect
reliably.
2014-06-03 12:26:23 -04:00
Andy Lindeman
09a33f8daa
Takes a different approach
2014-05-21 15:11:06 -04:00
Andy Lindeman
185db0e8d5
Makes sure we do not fail if encoding == nil
It looks like it's valid to call this method even if `binary?` is true.
Encoding as 'ASCII-8BIT' should always succeed.
2014-05-21 13:36:39 -04:00
Andy Lindeman
85efbde3f7
Counts the number of lines correctly for files with certain multibyte encodings
2014-05-21 13:36:39 -04:00
Ted Nyman
69bfe73165
Not yet on the additional binary check
2014-02-16 19:43:33 -08:00
Ted Nyman
b0894e20ef
Merge pull request #301 from andyli/binary
Do not detect language if it is a binary file.
2014-02-16 14:55:07 -08:00
Ted Nyman
7e178cc416
Place guards, checks for multiline shell hacks
2013-12-06 22:04:40 -08:00
Ted Nyman
f51c5e3159
Update documentation related to Pygments
2013-07-12 14:10:55 -07:00
Joshua Peek
032125b114
Axe indexable?
2013-06-10 11:06:18 -05:00
Joshua Peek
b1a137135e
Axe colorize_without_wrapper
2013-06-10 10:58:33 -05:00
Joshua Peek
1a53d1973a
ws
2013-06-10 10:39:59 -05:00
Joshua Peek
3e3fb0cdfe
Say why
2013-06-09 21:02:55 -05:00
Joshua Peek
d907ab9940
Kill mac_format check, buggy
2013-06-09 21:02:11 -05:00
Joshua Peek
9c1d6e154c
Always split lines on \n or \r
2013-06-09 21:01:03 -05:00
Joshua Peek
fa797df0c7
Note that BlobHelper is a turd
2013-06-09 20:51:26 -05:00
Joshua Peek
c7100be139
Make mac_format? private
2013-06-09 20:48:45 -05:00
Yaroslav Shirokov
b68732f0c7
Add detection for CSV
2013-04-04 14:01:09 -07:00
Garen Torikian
4148ff1c29
Add PDF detection
2013-03-25 15:45:58 -07:00
Ted Nyman
de94b85c0d
Merge pull request #295 from yandy/patch-1
Downcase extname when we determine whether it's an image
2013-03-10 15:39:55 -07:00
Pascal Borreli
70eafb2ffc
Fixed typos
2013-03-03 21:26:31 +00:00
Mike Skalnik
1766123448
Fix typo in comment
2013-02-26 14:00:42 -08:00
Mike Skalnik
5ea039a74e
Remove OBJ files as supported solids
2013-02-26 14:00:29 -08:00
Ted Nyman
58a9b56f4d
Merge pull request #253 from Tass/master
Binary mime type override if languages.yml says so
2013-02-21 21:49:09 -08:00
Mike Skalnik
041ab041ae
Add binary & ascii STLs and OBJs
2013-01-17 14:15:01 -08:00
Andy Li
7c9e973082
Do not detect language if it is a binary file.
2012-11-26 21:54:43 +08:00
Michael Ding
97c998946b
determine image with downcase extname
2012-11-22 20:30:59 +08:00
Michael Ding
8529c90a4d
use downcase string for extname
2012-11-22 17:14:45 +08:00
Joshua Peek
31e33f99f2
Ensure lang is skipped on any binary file
2012-09-24 10:51:39 -05:00
Simon Hafner
b954d22eba
Override for binary mime type based on languages.yml
If the extension already exists in languages.yml, it's probably not a
binary, but code.
2012-09-13 14:55:31 -05:00
Ryan Tomayko
887a050db9
Only search the first 4K chars for \r
2012-09-10 01:56:08 -07:00
Ryan Tomayko
2e49c06f47
Handle ✨ Mac Format ✨ when splitting lines
2012-09-10 01:05:48 -07:00
Scott J. Goldman
04394750e7
When testing if a blob is safe to colorize, check size first
Similar to e415a13
2012-09-02 00:08:37 -07:00
Scott J. Goldman
e415a1351b
When testing if a blob is indexable, check size first
Otherwise, charlock_holmes will allocate another large binary
buffer for testing the encoding, which is a problem if the binary
blob is many hundreds of MB large. It'll just fail and crash Ruby.
2012-08-31 22:47:19 -07:00
Joshua Peek
cfe496e9fc
Drop mime type module
...
Closes #206
2012-08-20 11:40:32 -05:00
Joshua Peek
b85aeaad3e
Inline mime type lookup into blob helper
2012-08-20 11:33:16 -05:00
Joshua Peek
f8df871d85
Only double check binary mime type when lazy loading blob
2012-08-20 11:20:37 -05:00
Joshua Peek
620150d188
Only double check with binary mime type when lazy loading blob
2012-08-20 11:14:45 -05:00
Joshua Peek
047d23862e
Still index .txt
2012-08-03 16:34:53 -05:00
Joshua Peek
804e23e995
Extract separate language detection method
2012-08-03 16:03:06 -05:00
Joshua Peek
41b7d13aa7
Extract generated blob check into its own module
2012-08-03 15:47:50 -05:00
Joshua Peek
16a67cb852
Move shebang detection into classifier
Fixes #203
2012-08-03 15:07:36 -05:00
Joshua Peek
6014bd015e
Change find_by_filename api to return all matching languages
2012-08-03 13:53:12 -05:00
Joshua Peek
65d05e02c9
name can be nil
2012-07-23 17:19:11 -05:00
Joshua Peek
6ac9138aed
Remove pathname
Closes #207
2012-07-23 16:50:30 -05:00
Joshua Peek
bf944f6d1a
Make classify a function on the Classifier
2012-07-23 13:47:15 -05:00
Joshua Peek
80e8ee7ce6
Rename Sample -> Samples
2012-07-23 13:15:27 -05:00
Joshua Peek
0c9a947f39
Load classifier db into sample data hash
2012-07-23 13:13:52 -05:00
Joshua Peek
7292bdc180
Change Classifier to accept language name Strings
2012-07-20 15:52:27 -05:00
Joshua Peek
e58f268258
Associate .module with drupal php
2012-07-20 15:42:21 -05:00
Joshua Peek
0867e7b69b
Remove old language disambiguation functions
2012-07-20 15:30:53 -05:00