Compare commits

...

12 Commits

Author SHA1 Message Date
Ashe Connor
c9b3d19c6f Lexer crash fix (#3900)
* input() may return 0 for EOF

Stops overrunning into fread when there is nothing left to read.

* remove two trailing contexts

* fix up sgml tokens
2017-11-10 22:11:32 +11:00
Alex Arslan
0f4955e5d5 Update Julia definitions to use Atom instead of TextMate (#3871) 2017-11-09 19:39:37 +11:00
Paul Chaignon
d968b0e9ee Improve heuristic for XML/TypeScript (#3883)
The heuristic for XML `.ts` files could also match TypeScript generics
that start with `TS`.
2017-11-04 11:16:44 +01:00
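
Concretely, the fix swaps a substring test for a word-boundary regex (the hunk appears in lib/linguist/heuristics.rb below). A minimal illustration, with `data` standing in for the blob contents:

    data = "function first<TSource>(xs: TSource[]): TSource { return xs[0]; }"

    data.include?("<TS")  # => true  -- old check: false positive on the generic <TSource>
    /<TS\b/.match(data)   # => nil   -- new check: only a bare <TS tag matches, as at the
                          #            root of Qt Linguist translation files
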
Ashe Connor
1f5ed3b3fe Release v5.3.2 (#3882)
* update grammar submodules

* bump to 5.3.2
2017-11-01 10:01:03 +10:00
Robert Koeninger
297be948d1 Set color for Idris language (#3866) 2017-10-31 16:27:21 +00:00
Charles Milette
b4492e7205 Add support for Edje Data Collections (#3879)
* Add support for Edje Data Collections

Fixes #3876

* Add EDC in grammars list
2017-10-31 16:26:44 +00:00
Paul Chaignon
c05bc99004 Vendor a few big JS libraries (#3861) 2017-10-31 15:12:02 +01:00
Ashe Connor
99eaf5faf9 Replace the tokenizer with a flex-based scanner (#3846)
* Lex everything except SGML, multiline, SHEBANG

* Prepend SHEBANG#! to tokens

* Support SGML tag/attribute extraction

* Multiline comments

* WIP cont'd; productionifying

* Compile before test

* Add extension to gemspec

* Add flex task to build lexer

* Reentrant extra data storage

* regenerate lexer

* use prefix

* rebuild lexer on linux

* Optimise a number of operations:

* Don't read and split the entire file if we only ever use the first/last n
  lines

* Only consider the first 50KiB when using heuristics/classifying.  This can
  save a *lot* of time; running a large number of regexes over 1MiB of text
  takes a while.

* Memoize File.size/read/stat; re-reading a 500KiB file every time `data` is
  called adds up quickly.

* Use single regex for C++

* act like #lines

* [1][-2..-1] => nil, ffs

* k may not be set
2017-10-31 11:06:56 +11:00
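
Two of those optimisations reduce to small, reusable shapes; a hedged sketch under illustrative names (`ExampleBlob` and `sample_data` are not from the codebase; the real changes land in lib/linguist/file_blob.rb, heuristics.rb and classifier.rb below):

    CONSIDER_BYTES = 50 * 1024  # heuristics and classifier read at most 50KiB

    class ExampleBlob
      def initialize(fullpath)
        @fullpath = fullpath
      end

      # Memoized: repeated #data calls no longer re-read the file from disk.
      def data
        @data ||= File.read(@fullpath)
      end

      # Expensive regexes run over a bounded prefix, not the whole blob.
      def sample_data
        data[0...CONSIDER_BYTES]
      end
    end
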
Cesar Tessarin
21babbceb1 Fix Perl 5 and 6 disambiguation bug (#3860)
* Add test to demonstrate Perl syntax detection bug

A Perl 5 .pm file containing the word `module` or `class`, even with
an explicit `use 5.*` statement, is recognized as Perl 6 code.

* Improve Perl 5 and Perl 6 disambiguation

The heuristic for disambiguating Perl 5 and Perl 6 `.pm` files searched
for keywords that can appear in both languages (`class` and `module`),
in addition to the `use` statement check.

Due to Perl 6 being tested first, code containing those words would
always be interpreted as Perl 6.

Test order was thus reversed, testing for Perl 5 first. Since Perl 6
code never contains a `use 5.*` statement, this does no harm to
Perl 6 detection while fixing the problem for Perl 5.

Fixes: #3637
2017-10-23 10:16:56 +01:00
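
Concretely, the reordered check (simplified from the lib/linguist/heuristics.rb hunk below, with `data` standing in for the blob contents) now reads:

    if /\buse\s+(?:strict\b|v?5\.)/.match(data)
      Language["Perl"]    # tested first: Perl 6 code never contains a use 5.* statement
    elsif /^\s*(?:use\s+v6\s*;|(?:\bmy\s+)?class|module)\b/.match(data)
      Language["Perl 6"]  # class/module alone no longer shadows Perl 5
    end
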
Paul Chaignon
15885701cd Tests for Ruby 2.4 must pass (#3862) 2017-10-17 11:08:04 +02:00
Ashe Connor
9b942086f7 Release v5.3.1 (#3864)
* Fix Perl/Pod disambiguation
2017-10-17 19:31:20 +11:00
Ashe Connor
93cd47822f Only recognise Pod for .pod files (#3863)
We uncomplicate matters by removing ".pod" from the Perl definition
entirely.
2017-10-17 19:05:20 +11:00
52 changed files with 5104 additions and 283 deletions

3
.gitignore vendored

@@ -8,3 +8,6 @@ lib/linguist/samples.json
/node_modules
test/fixtures/ace_modes.json
/vendor/gems/
/tmp
*.bundle
*.so

6
.gitmodules vendored

@@ -169,9 +169,6 @@
[submodule "vendor/grammars/Agda.tmbundle"]
path = vendor/grammars/Agda.tmbundle
url = https://github.com/mokus0/Agda.tmbundle
[submodule "vendor/grammars/Julia.tmbundle"]
path = vendor/grammars/Julia.tmbundle
url = https://github.com/JuliaEditorSupport/Julia.tmbundle
[submodule "vendor/grammars/ooc.tmbundle"]
path = vendor/grammars/ooc.tmbundle
url = https://github.com/nilium/ooc.tmbundle
@@ -883,3 +880,6 @@
[submodule "vendor/grammars/wdl-sublime-syntax-highlighter"]
path = vendor/grammars/wdl-sublime-syntax-highlighter
url = https://github.com/broadinstitute/wdl-sublime-syntax-highlighter
[submodule "vendor/grammars/atom-language-julia"]
path = vendor/grammars/atom-language-julia
url = https://github.com/JuliaEditorSupport/atom-language-julia

.travis.yml

@@ -19,10 +19,6 @@ rvm:
  - 2.3.3
  - 2.4.0
matrix:
  allow_failures:
    - rvm: 2.4.0
notifications:
  disabled: true

Rakefile

@@ -1,6 +1,7 @@
require 'bundler/setup'
require 'rake/clean'
require 'rake/testtask'
require 'rake/extensiontask'
require 'yaml'
require 'yajl'
require 'open-uri'
@@ -10,8 +11,14 @@ task :default => :test
Rake::TestTask.new
gem_spec = Gem::Specification.load('github-linguist.gemspec')
Rake::ExtensionTask.new('linguist', gem_spec) do |ext|
  ext.lib_dir = File.join('lib', 'linguist')
end
# Extend test task to check for samples and fetch latest Ace modes
task :test => [:check_samples, :fetch_ace_modes]
task :test => [:compile, :check_samples, :fetch_ace_modes]
desc "Check that we have samples.json generated"
task :check_samples do
@@ -34,12 +41,19 @@ task :fetch_ace_modes do
end
end
task :samples do
task :samples => :compile do
  require 'linguist/samples'
  json = Yajl.dump(Linguist::Samples.data, :pretty => true)
  File.write 'lib/linguist/samples.json', json
end

task :flex do
  if `flex -V` !~ /^flex \d+\.\d+\.\d+/
    fail "flex not detected"
  end
  system "cd ext/linguist && flex tokenizer.l"
end
task :build_gem => :samples do
rm_rf "grammars"
sh "script/convert-grammars"

3
ext/linguist/extconf.rb Normal file

@@ -0,0 +1,3 @@
require 'mkmf'
dir_config('linguist')
create_makefile('linguist/linguist')

ext/linguist/lex.linguist_yy.c (file diff suppressed because it is too large)

ext/linguist/lex.linguist_yy.h Normal file

@@ -0,0 +1,336 @@
#ifndef linguist_yyHEADER_H
#define linguist_yyHEADER_H 1
#define linguist_yyIN_HEADER 1
#line 6 "lex.linguist_yy.h"
#define YY_INT_ALIGNED short int
/* A lexical scanner generated by flex */
#define FLEX_SCANNER
#define YY_FLEX_MAJOR_VERSION 2
#define YY_FLEX_MINOR_VERSION 5
#define YY_FLEX_SUBMINOR_VERSION 35
#if YY_FLEX_SUBMINOR_VERSION > 0
#define FLEX_BETA
#endif
/* First, we deal with platform-specific or compiler-specific issues. */
/* begin standard C headers. */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <stdlib.h>
/* end standard C headers. */
/* flex integer type definitions */
#ifndef FLEXINT_H
#define FLEXINT_H
/* C99 systems have <inttypes.h>. Non-C99 systems may or may not. */
#if defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
/* C99 says to define __STDC_LIMIT_MACROS before including stdint.h,
* if you want the limit (max/min) macros for int types.
*/
#ifndef __STDC_LIMIT_MACROS
#define __STDC_LIMIT_MACROS 1
#endif
#include <inttypes.h>
typedef int8_t flex_int8_t;
typedef uint8_t flex_uint8_t;
typedef int16_t flex_int16_t;
typedef uint16_t flex_uint16_t;
typedef int32_t flex_int32_t;
typedef uint32_t flex_uint32_t;
typedef uint64_t flex_uint64_t;
#else
typedef signed char flex_int8_t;
typedef short int flex_int16_t;
typedef int flex_int32_t;
typedef unsigned char flex_uint8_t;
typedef unsigned short int flex_uint16_t;
typedef unsigned int flex_uint32_t;
#endif /* ! C99 */
/* Limits of integral types. */
#ifndef INT8_MIN
#define INT8_MIN (-128)
#endif
#ifndef INT16_MIN
#define INT16_MIN (-32767-1)
#endif
#ifndef INT32_MIN
#define INT32_MIN (-2147483647-1)
#endif
#ifndef INT8_MAX
#define INT8_MAX (127)
#endif
#ifndef INT16_MAX
#define INT16_MAX (32767)
#endif
#ifndef INT32_MAX
#define INT32_MAX (2147483647)
#endif
#ifndef UINT8_MAX
#define UINT8_MAX (255U)
#endif
#ifndef UINT16_MAX
#define UINT16_MAX (65535U)
#endif
#ifndef UINT32_MAX
#define UINT32_MAX (4294967295U)
#endif
#endif /* ! FLEXINT_H */
#ifdef __cplusplus
/* The "const" storage-class-modifier is valid. */
#define YY_USE_CONST
#else /* ! __cplusplus */
/* C99 requires __STDC__ to be defined as 1. */
#if defined (__STDC__)
#define YY_USE_CONST
#endif /* defined (__STDC__) */
#endif /* ! __cplusplus */
#ifdef YY_USE_CONST
#define yyconst const
#else
#define yyconst
#endif
/* An opaque pointer. */
#ifndef YY_TYPEDEF_YY_SCANNER_T
#define YY_TYPEDEF_YY_SCANNER_T
typedef void* yyscan_t;
#endif
/* For convenience, these vars (plus the bison vars far below)
are macros in the reentrant scanner. */
#define yyin yyg->yyin_r
#define yyout yyg->yyout_r
#define yyextra yyg->yyextra_r
#define yyleng yyg->yyleng_r
#define yytext yyg->yytext_r
#define yylineno (YY_CURRENT_BUFFER_LVALUE->yy_bs_lineno)
#define yycolumn (YY_CURRENT_BUFFER_LVALUE->yy_bs_column)
#define yy_flex_debug yyg->yy_flex_debug_r
/* Size of default input buffer. */
#ifndef YY_BUF_SIZE
#define YY_BUF_SIZE 16384
#endif
#ifndef YY_TYPEDEF_YY_BUFFER_STATE
#define YY_TYPEDEF_YY_BUFFER_STATE
typedef struct yy_buffer_state *YY_BUFFER_STATE;
#endif
#ifndef YY_TYPEDEF_YY_SIZE_T
#define YY_TYPEDEF_YY_SIZE_T
typedef size_t yy_size_t;
#endif
#ifndef YY_STRUCT_YY_BUFFER_STATE
#define YY_STRUCT_YY_BUFFER_STATE
struct yy_buffer_state
	{
	FILE *yy_input_file;

	char *yy_ch_buf;	/* input buffer */
	char *yy_buf_pos;	/* current position in input buffer */

	/* Size of input buffer in bytes, not including room for EOB
	 * characters.
	 */
	yy_size_t yy_buf_size;

	/* Number of characters read into yy_ch_buf, not including EOB
	 * characters.
	 */
	yy_size_t yy_n_chars;

	/* Whether we "own" the buffer - i.e., we know we created it,
	 * and can realloc() it to grow it, and should free() it to
	 * delete it.
	 */
	int yy_is_our_buffer;

	/* Whether this is an "interactive" input source; if so, and
	 * if we're using stdio for input, then we want to use getc()
	 * instead of fread(), to make sure we stop fetching input after
	 * each newline.
	 */
	int yy_is_interactive;

	/* Whether we're considered to be at the beginning of a line.
	 * If so, '^' rules will be active on the next match, otherwise
	 * not.
	 */
	int yy_at_bol;

	int yy_bs_lineno; /**< The line count. */
	int yy_bs_column; /**< The column count. */

	/* Whether to try to fill the input buffer when we reach the
	 * end of it.
	 */
	int yy_fill_buffer;

	int yy_buffer_status;

	};
#endif /* !YY_STRUCT_YY_BUFFER_STATE */
void linguist_yyrestart (FILE *input_file ,yyscan_t yyscanner );
void linguist_yy_switch_to_buffer (YY_BUFFER_STATE new_buffer ,yyscan_t yyscanner );
YY_BUFFER_STATE linguist_yy_create_buffer (FILE *file,int size ,yyscan_t yyscanner );
void linguist_yy_delete_buffer (YY_BUFFER_STATE b ,yyscan_t yyscanner );
void linguist_yy_flush_buffer (YY_BUFFER_STATE b ,yyscan_t yyscanner );
void linguist_yypush_buffer_state (YY_BUFFER_STATE new_buffer ,yyscan_t yyscanner );
void linguist_yypop_buffer_state (yyscan_t yyscanner );
YY_BUFFER_STATE linguist_yy_scan_buffer (char *base,yy_size_t size ,yyscan_t yyscanner );
YY_BUFFER_STATE linguist_yy_scan_string (yyconst char *yy_str ,yyscan_t yyscanner );
YY_BUFFER_STATE linguist_yy_scan_bytes (yyconst char *bytes,yy_size_t len ,yyscan_t yyscanner );
void *linguist_yyalloc (yy_size_t ,yyscan_t yyscanner );
void *linguist_yyrealloc (void *,yy_size_t ,yyscan_t yyscanner );
void linguist_yyfree (void * ,yyscan_t yyscanner );
/* Begin user sect3 */
#define yytext_ptr yytext_r
#ifdef YY_HEADER_EXPORT_START_CONDITIONS
#define INITIAL 0
#define sgml 1
#define c_comment 2
#define xml_comment 3
#define haskell_comment 4
#define ocaml_comment 5
#define python_dcomment 6
#define python_scomment 7
#endif
#ifndef YY_NO_UNISTD_H
/* Special case for "unistd.h", since it is non-ANSI. We include it way
* down here because we want the user's section 1 to have been scanned first.
* The user has a chance to override it with an option.
*/
#include <unistd.h>
#endif
#define YY_EXTRA_TYPE struct tokenizer_extra *
int linguist_yylex_init (yyscan_t* scanner);
int linguist_yylex_init_extra (YY_EXTRA_TYPE user_defined,yyscan_t* scanner);
/* Accessor methods to globals.
These are made visible to non-reentrant scanners for convenience. */
int linguist_yylex_destroy (yyscan_t yyscanner );
int linguist_yyget_debug (yyscan_t yyscanner );
void linguist_yyset_debug (int debug_flag ,yyscan_t yyscanner );
YY_EXTRA_TYPE linguist_yyget_extra (yyscan_t yyscanner );
void linguist_yyset_extra (YY_EXTRA_TYPE user_defined ,yyscan_t yyscanner );
FILE *linguist_yyget_in (yyscan_t yyscanner );
void linguist_yyset_in (FILE * in_str ,yyscan_t yyscanner );
FILE *linguist_yyget_out (yyscan_t yyscanner );
void linguist_yyset_out (FILE * out_str ,yyscan_t yyscanner );
yy_size_t linguist_yyget_leng (yyscan_t yyscanner );
char *linguist_yyget_text (yyscan_t yyscanner );
int linguist_yyget_lineno (yyscan_t yyscanner );
void linguist_yyset_lineno (int line_number ,yyscan_t yyscanner );
/* Macros after this point can all be overridden by user definitions in
* section 1.
*/
#ifndef YY_SKIP_YYWRAP
#ifdef __cplusplus
extern "C" int linguist_yywrap (yyscan_t yyscanner );
#else
extern int linguist_yywrap (yyscan_t yyscanner );
#endif
#endif
#ifndef yytext_ptr
static void yy_flex_strncpy (char *,yyconst char *,int ,yyscan_t yyscanner);
#endif
#ifdef YY_NEED_STRLEN
static int yy_flex_strlen (yyconst char * ,yyscan_t yyscanner);
#endif
#ifndef YY_NO_INPUT
#endif
/* Amount of stuff to slurp up with each read. */
#ifndef YY_READ_BUF_SIZE
#define YY_READ_BUF_SIZE 8192
#endif
/* Number of entries by which start-condition stack grows. */
#ifndef YY_START_STACK_INCR
#define YY_START_STACK_INCR 25
#endif
/* Default declaration of generated scanner - a define so the user can
* easily add parameters.
*/
#ifndef YY_DECL
#define YY_DECL_IS_OURS 1
extern int linguist_yylex (yyscan_t yyscanner);
#define YY_DECL int linguist_yylex (yyscan_t yyscanner)
#endif /* !YY_DECL */
/* yy_get_previous_state - get the state just before the EOB char was reached */
#undef YY_NEW_FILE
#undef YY_FLUSH_BUFFER
#undef yy_set_bol
#undef yy_new_buffer
#undef yy_set_interactive
#undef YY_DO_BEFORE_ACTION
#ifdef YY_DECL_IS_OURS
#undef YY_DECL_IS_OURS
#undef YY_DECL
#endif
#line 118 "tokenizer.l"
#line 335 "lex.linguist_yy.h"
#undef linguist_yyIN_HEADER
#endif /* linguist_yyHEADER_H */

64
ext/linguist/linguist.c Normal file

@@ -0,0 +1,64 @@
#include "ruby.h"
#include "linguist.h"
#include "lex.linguist_yy.h"
/* flex calls yywrap() at EOF to ask for more input; returning 1 means
 * there is none, so the scan ends. */
int linguist_yywrap(yyscan_t yyscanner) {
	return 1;
}

static VALUE rb_tokenizer_extract_tokens(VALUE self, VALUE rb_data) {
	YY_BUFFER_STATE buf;
	yyscan_t scanner;
	struct tokenizer_extra extra;
	VALUE ary, s;
	long len;
	int r;

	Check_Type(rb_data, T_STRING);

	/* Tokenize at most the first 100,000 bytes, mirroring the old
	 * Ruby tokenizer's BYTE_LIMIT. */
	len = RSTRING_LEN(rb_data);
	if (len > 100000)
		len = 100000;

	linguist_yylex_init_extra(&extra, &scanner);
	buf = linguist_yy_scan_bytes(RSTRING_PTR(rb_data), (int) len, scanner);

	/* Drive the scanner; each call may deposit one token in `extra`. */
	ary = rb_ary_new();
	do {
		extra.type = NO_ACTION;
		extra.token = NULL;
		r = linguist_yylex(scanner);
		switch (extra.type) {
		case NO_ACTION:
			break;
		case REGULAR_TOKEN:
			rb_ary_push(ary, rb_str_new2(extra.token));
			free(extra.token);
			break;
		case SHEBANG_TOKEN:
			s = rb_str_new2("SHEBANG#!");
			rb_str_cat2(s, extra.token);
			rb_ary_push(ary, s);
			free(extra.token);
			break;
		case SGML_TOKEN:
			s = rb_str_new2(extra.token);
			rb_str_cat2(s, ">");
			rb_ary_push(ary, s);
			free(extra.token);
			break;
		}
	} while (r);

	linguist_yy_delete_buffer(buf, scanner);
	linguist_yylex_destroy(scanner);

	return ary;
}

__attribute__((visibility("default"))) void Init_linguist() {
	VALUE rb_mLinguist = rb_define_module("Linguist");
	VALUE rb_cTokenizer = rb_define_class_under(rb_mLinguist, "Tokenizer", rb_cObject);
	rb_define_method(rb_cTokenizer, "extract_tokens", rb_tokenizer_extract_tokens, 1);
}

11
ext/linguist/linguist.h Normal file

@@ -0,0 +1,11 @@
enum tokenizer_type {
	NO_ACTION,
	REGULAR_TOKEN,
	SHEBANG_TOKEN,
	SGML_TOKEN,
};

struct tokenizer_extra {
	char *token;
	enum tokenizer_type type;
};

119
ext/linguist/tokenizer.l Normal file

@@ -0,0 +1,119 @@
%{
#include "linguist.h"

#define feed_token(tok, typ) do { \
    yyextra->token = (tok); \
    yyextra->type = (typ); \
  } while (0)

#define eat_until_eol() do { \
    int c; \
    while ((c = input(yyscanner)) != '\n' && c != EOF && c); \
    if (c == EOF || !c) \
      return 0; \
  } while (0)

#define eat_until_unescaped(q) do { \
    int c; \
    while ((c = input(yyscanner)) != EOF && c) { \
      if (c == '\n') \
        break; \
      if (c == '\\') { \
        c = input(yyscanner); \
        if (c == EOF || !c) \
          return 0; \
      } else if (c == q) \
        break; \
    } \
    if (c == EOF || !c) \
      return 0; \
  } while (0)
%}
%option never-interactive yywrap reentrant nounput warn nodefault header-file="lex.linguist_yy.h" extra-type="struct tokenizer_extra *" prefix="linguist_yy"
%x sgml c_comment xml_comment haskell_comment ocaml_comment python_dcomment python_scomment
%%
^#![ \t]*([[:alnum:]_\/]*\/)?env([ \t]+([^ \t=]*=[^ \t]*))*[ \t]+[[:alpha:]_]+ {
	const char *off = strrchr(yytext, ' ');
	if (!off)
		off = yytext;
	else
		++off;
	feed_token(strdup(off), SHEBANG_TOKEN);
	eat_until_eol();
	return 1;
}

^#![ \t]*[[:alpha:]_\/]+ {
	const char *off = strrchr(yytext, '/');
	if (!off)
		off = yytext;
	else
		++off;
	if (strcmp(off, "env") == 0) {
		eat_until_eol();
	} else {
		feed_token(strdup(off), SHEBANG_TOKEN);
		eat_until_eol();
		return 1;
	}
}
^[ \t]*(\/\/|--|\#|%|\")" ".* { /* nothing */ }
"/*" { BEGIN(c_comment); }
/* See below for xml_comment start. */
"{-" { BEGIN(haskell_comment); }
"(*" { BEGIN(ocaml_comment); }
"\"\"\"" { BEGIN(python_dcomment); }
"'''" { BEGIN(python_scomment); }
<c_comment,xml_comment,haskell_comment,ocaml_comment,python_dcomment,python_scomment>.|\n { /* nothing */ }
<c_comment>"*/" { BEGIN(INITIAL); }
<xml_comment>"-->" { BEGIN(INITIAL); }
<haskell_comment>"-}" { BEGIN(INITIAL); }
<ocaml_comment>"*)" { BEGIN(INITIAL); }
<python_dcomment>"\"\"\"" { BEGIN(INITIAL); }
<python_scomment>"'''" { BEGIN(INITIAL); }
\"\"|'' { /* nothing */ }
\" { eat_until_unescaped('"'); }
' { eat_until_unescaped('\''); }
(0x[0-9a-fA-F]([0-9a-fA-F]|\.)*|[0-9]([0-9]|\.)*)([uU][lL]{0,2}|([eE][-+][0-9]*)?[fFlL]*) { /* nothing */ }
\<[[:alnum:]_!./?-]+ {
	if (strcmp(yytext, "<!--") == 0) {
		BEGIN(xml_comment);
	} else {
		feed_token(strdup(yytext), SGML_TOKEN);
		BEGIN(sgml);
		return 1;
	}
}
<sgml>[[:alnum:]_]+=\" { feed_token(strndup(yytext, strlen(yytext) - 1), REGULAR_TOKEN); eat_until_unescaped('"'); return 1; }
<sgml>[[:alnum:]_]+=' { feed_token(strndup(yytext, strlen(yytext) - 1), REGULAR_TOKEN); eat_until_unescaped('\''); return 1; }
<sgml>[[:alnum:]_]+=[[:alnum:]_]* { feed_token(strdup(yytext), REGULAR_TOKEN); *(strchr(yyextra->token, '=') + 1) = 0; return 1; }
<sgml>[[:alnum:]_]+ { feed_token(strdup(yytext), REGULAR_TOKEN); return 1; }
<sgml>\> { BEGIN(INITIAL); }
<sgml>.|\n { /* nothing */ }
;|\{|\}|\(|\)|\[|\] { feed_token(strdup(yytext), REGULAR_TOKEN); return 1; }
[[:alnum:]_.@#/*]+ {
	if (strncmp(yytext, "/*", 2) == 0) {
		if (strlen(yytext) >= 4 && strcmp(yytext + strlen(yytext) - 2, "*/") == 0) {
			/* nothing */
		} else {
			BEGIN(c_comment);
		}
	} else {
		feed_token(strdup(yytext), REGULAR_TOKEN);
		return 1;
	}
}
\<\<?|\+|\-|\*|\/|%|&&?|\|\|? { feed_token(strdup(yytext), REGULAR_TOKEN); return 1; }
.|\n { /* nothing */ }
%%

github-linguist.gemspec

@@ -10,8 +10,9 @@ Gem::Specification.new do |s|
  s.homepage = "https://github.com/github/linguist"
  s.license = "MIT"

  s.files = Dir['lib/**/*'] + Dir['grammars/*'] + ['LICENSE']
  s.files = Dir['lib/**/*'] + Dir['ext/**/*'] + Dir['grammars/*'] + ['LICENSE']
  s.executables = ['linguist', 'git-linguist']
  s.extensions = ['ext/linguist/extconf.rb']

  s.add_dependency 'charlock_holmes', '~> 0.7.5'
  s.add_dependency 'escape_utils', '~> 1.1.0'
@@ -19,6 +20,7 @@ Gem::Specification.new do |s|
  s.add_dependency 'rugged', '>= 0.25.1'

  s.add_development_dependency 'minitest', '>= 5.0'
  s.add_development_dependency 'rake-compiler', '~> 0.9'
  s.add_development_dependency 'mocha'
  s.add_development_dependency 'plist', '~>3.1'
  s.add_development_dependency 'pry'

grammars.yml

@@ -45,8 +45,6 @@ vendor/grammars/Isabelle.tmbundle:
- source.isabelle.theory
vendor/grammars/JSyntax:
- source.j
vendor/grammars/Julia.tmbundle:
- source.julia
vendor/grammars/Lean.tmbundle:
- source.lean
vendor/grammars/LiveScript.tmbundle:
@@ -192,6 +190,9 @@ vendor/grammars/atom-language-1c-bsl:
vendor/grammars/atom-language-clean:
- source.clean
- text.restructuredtext.clean
vendor/grammars/atom-language-julia:
- source.julia
- source.julia.console
vendor/grammars/atom-language-p4:
- source.p4
vendor/grammars/atom-language-perl6:

lib/linguist/blob_helper.rb

@@ -275,10 +275,8 @@ module Linguist
      # also--importantly--without having to duplicate many (potentially
      # large) strings.
      begin
        encoded_newlines = ["\r\n", "\r", "\n"].
          map { |nl| nl.encode(ruby_encoding, "ASCII-8BIT").force_encoding(data.encoding) }
        data.split(Regexp.union(encoded_newlines), -1)
        data.split(encoded_newlines_re, -1)
      rescue Encoding::ConverterNotFoundError
        # The data is not splittable in the detected encoding. Assume it's
        # one big line.
@@ -289,6 +287,51 @@ module Linguist
      end
    end

    def encoded_newlines_re
      @encoded_newlines_re ||= Regexp.union(["\r\n", "\r", "\n"].
        map { |nl| nl.encode(ruby_encoding, "ASCII-8BIT").force_encoding(data.encoding) })
    end

    def first_lines(n)
      return lines[0...n] if defined? @lines
      return [] unless viewable? && data

      i, c = 0, 0
      while c < n && j = data.index(encoded_newlines_re, i)
        i = j + $&.length
        c += 1
      end
      data[0...i].split(encoded_newlines_re, -1)
    end

    def last_lines(n)
      if defined? @lines
        if n >= @lines.length
          @lines
        else
          lines[-n..-1]
        end
      end
      return [] unless viewable? && data

      no_eol = true
      i, c = data.length, 0
      k = i
      while c < n && j = data.rindex(encoded_newlines_re, i - 1)
        if c == 0 && j + $&.length == i
          no_eol = false
          n += 1
        end
        i = j
        k = j + $&.length
        c += 1
      end
      r = data[k..-1].split(encoded_newlines_re, -1)
      r.pop if !no_eol
      r
    end

    # Public: Get number of lines of code
    #
    # Requires Blob#data

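The point of `first_lines`/`last_lines` above is to scan and split only as much of `data` as the caller needs. Assuming a well-formed file, they agree with the old full-split path (a hedged sketch, not a test from the suite):

    blob = Linguist::FileBlob.new("Rakefile", Dir.pwd)

    # Same results as splitting everything, modulo trailing-newline edge
    # cases, but without building the full Array of lines first.
    blob.first_lines(5) == blob.lines.first(5)  # => true
    blob.last_lines(5)  == blob.lines.last(5)   # => true
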
lib/linguist/classifier.rb

@@ -3,6 +3,8 @@ require 'linguist/tokenizer'
module Linguist
  # Language bayesian classifier.
  class Classifier
    CLASSIFIER_CONSIDER_BYTES = 50 * 1024

    # Public: Use the classifier to detect language of the blob.
    #
    # blob - An object that quacks like a blob.
@@ -17,7 +19,7 @@ module Linguist
    # Returns an Array of Language objects, most probable first.
    def self.call(blob, possible_languages)
      language_names = possible_languages.map(&:name)
      classify(Samples.cache, blob.data, language_names).map do |name, _|
      classify(Samples.cache, blob.data[0...CLASSIFIER_CONSIDER_BYTES], language_names).map do |name, _|
        Language[name] # Return the actual Language objects
      end
    end

lib/linguist/file_blob.rb

@@ -23,21 +23,21 @@ module Linguist
    #
    # Returns a String like '100644'
    def mode
      File.stat(@fullpath).mode.to_s(8)
      @mode ||= File.stat(@fullpath).mode.to_s(8)
    end

    # Public: Read file contents.
    #
    # Returns a String.
    def data
      File.read(@fullpath)
      @data ||= File.read(@fullpath)
    end

    # Public: Get byte size
    #
    # Returns an Integer.
    def size
      File.size(@fullpath)
      @size ||= File.size(@fullpath)
    end
  end
end

lib/linguist/heuristics.rb

@@ -1,6 +1,8 @@
module Linguist
  # A collection of simple heuristics that can be used to better analyze languages.
  class Heuristics
    HEURISTICS_CONSIDER_BYTES = 50 * 1024

    # Public: Use heuristics to detect language of the blob.
    #
    # blob - An object that quacks like a blob.
@@ -14,7 +16,7 @@ module Linguist
    #
    # Returns an Array of languages, or empty if none matched or were inconclusive.
    def self.call(blob, candidates)
      data = blob.data
      data = blob.data[0...HEURISTICS_CONSIDER_BYTES]
      @heuristics.each do |heuristic|
        if heuristic.matches?(blob.name, candidates)
@@ -72,6 +74,14 @@ module Linguist
    # Common heuristics
    ObjectiveCRegex = /^\s*(@(interface|class|protocol|property|end|synchronised|selector|implementation)\b|#import\s+.+\.h[">])/

    CPlusPlusRegex = Regexp.union(
      /^\s*#\s*include <(cstdint|string|vector|map|list|array|bitset|queue|stack|forward_list|unordered_map|unordered_set|(i|o|io)stream)>/,
      /^\s*template\s*</,
      /^[ \t]*try/,
      /^[ \t]*catch\s*\(/,
      /^[ \t]*(class|(using[ \t]+)?namespace)\s+\w+/,
      /^[ \t]*(private|public|protected):$/,
      /std::\w+/)

    disambiguate ".as" do |data|
      if /^\s*(package\s+[a-z0-9_\.]+|import\s+[a-zA-Z0-9_\.]+;|class\s+[A-Za-z0-9_]+\s+extends\s+[A-Za-z0-9_]+)/.match(data)
@@ -219,8 +229,7 @@ module Linguist
disambiguate ".h" do |data|
if ObjectiveCRegex.match(data)
Language["Objective-C"]
elsif (/^\s*#\s*include <(cstdint|string|vector|map|list|array|bitset|queue|stack|forward_list|unordered_map|unordered_set|(i|o|io)stream)>/.match(data) ||
/^\s*template\s*</.match(data) || /^[ \t]*try/.match(data) || /^[ \t]*catch\s*\(/.match(data) || /^[ \t]*(class|(using[ \t]+)?namespace)\s+\w+/.match(data) || /^[ \t]*(private|public|protected):$/.match(data) || /std::\w+/.match(data))
elsif CPlusPlusRegex.match(data)
Language["C++"]
end
end
@@ -358,23 +367,15 @@ module Linguist
    end

    disambiguate ".pm" do |data|
      if /^\s*(?:use\s+v6\s*;|(?:\bmy\s+)?class|module)\b/.match(data)
        Language["Perl 6"]
      elsif /\buse\s+(?:strict\b|v?5\.)/.match(data)
      if /\buse\s+(?:strict\b|v?5\.)/.match(data)
        Language["Perl"]
      elsif /^\s*(?:use\s+v6\s*;|(?:\bmy\s+)?class|module)\b/.match(data)
        Language["Perl 6"]
      elsif /^\s*\/\* XPM \*\//.match(data)
        Language["XPM"]
      end
    end

    disambiguate ".pod", "Pod", "Perl" do |data|
      if /^=\w+\b/.match(data)
        Language["Pod"]
      else
        Language["Perl"]
      end
    end

    disambiguate ".pro" do |data|
      if /^[^#]+:-/.match(data)
        Language["Prolog"]
@@ -476,7 +477,7 @@ module Linguist
    end

    disambiguate ".ts" do |data|
      if data.include?("<TS")
      if /<TS\b/.match(data)
        Language["XML"]
      else
        Language["TypeScript"]
lib/linguist/languages.yml

@@ -1144,6 +1144,15 @@ Ecere Projects:
  codemirror_mode: javascript
  codemirror_mime_type: application/json
  language_id: 98
Edje Data Collection:
  type: data
  extensions:
  - ".edc"
  tm_scope: source.json
  ace_mode: json
  codemirror_mode: javascript
  codemirror_mime_type: application/json
  language_id: 342840478
Eiffel:
  type: programming
  color: "#946d57"
@@ -1921,6 +1930,7 @@ IRC log:
  language_id: 164
Idris:
  type: programming
  color: "#b30000"
  extensions:
  - ".idr"
  - ".lidr"
@@ -3313,7 +3323,6 @@ Perl:
- ".ph"
- ".plx"
- ".pm"
- ".pod"
- ".psgi"
- ".t"
filenames:

lib/linguist/strategy/modeline.rb

@@ -109,8 +109,8 @@ module Linguist
    # Returns an Array with one Language if the blob has a Vim or Emacs modeline
    # that matches a Language name or alias. Returns an empty array if no match.
    def self.call(blob, _ = nil)
      header = blob.lines.first(SEARCH_SCOPE).join("\n")
      footer = blob.lines.last(SEARCH_SCOPE).join("\n")
      header = blob.first_lines(SEARCH_SCOPE).join("\n")
      footer = blob.last_lines(SEARCH_SCOPE).join("\n")
      Array(Language.find_by_alias(modeline(header + footer)))
    end

lib/linguist/tokenizer.rb

@@ -1,4 +1,5 @@
require 'strscan'
require 'linguist/linguist'
module Linguist
  # Generic programming language tokenizer.
@@ -15,191 +16,5 @@ module Linguist
    def self.tokenize(data)
      new.extract_tokens(data)
    end

    # Read up to 100KB
    BYTE_LIMIT = 100_000

    # Start state on token, ignore anything till the next newline
    SINGLE_LINE_COMMENTS = [
      '//', # C
      '--', # Ada, Haskell, AppleScript
      '#',  # Ruby
      '%',  # Tex
      '"',  # Vim
    ]

    # Start state on opening token, ignore anything until the closing
    # token is reached.
    MULTI_LINE_COMMENTS = [
      ['/*', '*/'],    # C
      ['<!--', '-->'], # XML
      ['{-', '-}'],    # Haskell
      ['(*', '*)'],    # Coq
      ['"""', '"""'],  # Python
      ["'''", "'''"]   # Python
    ]

    START_SINGLE_LINE_COMMENT = Regexp.compile(SINGLE_LINE_COMMENTS.map { |c|
      "\s*#{Regexp.escape(c)} "
    }.join("|"))

    START_MULTI_LINE_COMMENT = Regexp.compile(MULTI_LINE_COMMENTS.map { |c|
      Regexp.escape(c[0])
    }.join("|"))

    # Internal: Extract generic tokens from data.
    #
    # data - String to scan.
    #
    # Examples
    #
    #   extract_tokens("printf('Hello')")
    #   # => ['printf', '(', ')']
    #
    # Returns Array of token Strings.
    def extract_tokens(data)
      s = StringScanner.new(data)

      tokens = []
      until s.eos?
        break if s.pos >= BYTE_LIMIT

        if token = s.scan(/^#!.+$/)
          if name = extract_shebang(token)
            tokens << "SHEBANG#!#{name}"
          end

        # Single line comment
        elsif s.beginning_of_line? && token = s.scan(START_SINGLE_LINE_COMMENT)
          # tokens << token.strip
          s.skip_until(/\n|\Z/)

        # Multiline comments
        elsif token = s.scan(START_MULTI_LINE_COMMENT)
          # tokens << token
          close_token = MULTI_LINE_COMMENTS.assoc(token)[1]
          s.skip_until(Regexp.compile(Regexp.escape(close_token)))
          # tokens << close_token

        # Skip single or double quoted strings
        elsif s.scan(/"/)
          if s.peek(1) == "\""
            s.getch
          else
            s.skip_until(/(?<!\\)"/)
          end
        elsif s.scan(/'/)
          if s.peek(1) == "'"
            s.getch
          else
            s.skip_until(/(?<!\\)'/)
          end

        # Skip number literals
        elsif s.scan(/(0x\h(\h|\.)*|\d(\d|\.)*)([uU][lL]{0,2}|([eE][-+]\d*)?[fFlL]*)/)

        # SGML style brackets
        elsif token = s.scan(/<[^\s<>][^<>]*>/)
          extract_sgml_tokens(token).each { |t| tokens << t }

        # Common programming punctuation
        elsif token = s.scan(/;|\{|\}|\(|\)|\[|\]/)
          tokens << token

        # Regular token
        elsif token = s.scan(/[\w\.@#\/\*]+/)
          tokens << token

        # Common operators
        elsif token = s.scan(/<<?|\+|\-|\*|\/|%|&&?|\|\|?/)
          tokens << token

        else
          s.getch
        end
      end

      tokens
    end

    # Internal: Extract normalized shebang command token.
    #
    # Examples
    #
    #   extract_shebang("#!/usr/bin/ruby")
    #   # => "ruby"
    #
    #   extract_shebang("#!/usr/bin/env node")
    #   # => "node"
    #
    #   extract_shebang("#!/usr/bin/env A=B foo=bar awk -f")
    #   # => "awk"
    #
    # Returns String token or nil if it couldn't be parsed.
    def extract_shebang(data)
      s = StringScanner.new(data)

      if path = s.scan(/^#!\s*\S+/)
        script = path.split('/').last
        if script == 'env'
          s.scan(/\s+/)
          s.scan(/.*=[^\s]+\s+/)
          script = s.scan(/\S+/)
        end
        script = script[/[^\d]+/, 0] if script
        return script
      end

      nil
    end

    # Internal: Extract tokens from inside SGML tag.
    #
    # data - SGML tag String.
    #
    # Examples
    #
    #   extract_sgml_tokens("<a href='' class=foo>")
    #   # => ["<a>", "href="]
    #
    # Returns Array of token Strings.
    def extract_sgml_tokens(data)
      s = StringScanner.new(data)

      tokens = []
      until s.eos?
        # Emit start token
        if token = s.scan(/<\/?[^\s>]+/)
          tokens << "#{token}>"

        # Emit attributes with trailing =
        elsif token = s.scan(/\w+=/)
          tokens << token

          # Then skip over attribute value
          if s.scan(/"/)
            s.skip_until(/[^\\]"/)
          elsif s.scan(/'/)
            s.skip_until(/[^\\]'/)
          else
            s.skip_until(/\w+/)
          end

        # Emit lone attributes
        elsif token = s.scan(/\w+/)
          tokens << token

        # Stop at the end of the tag
        elsif s.scan(/>/)
          s.terminate

        else
          s.getch
        end
      end

      tokens
    end
  end
end

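The public interface survives the rewrite: `Linguist::Tokenizer.tokenize` keeps its name and output format, but `extract_tokens` is now the C method from ext/linguist/linguist.c, loaded by the new `require 'linguist/linguist'` line. The examples from the deleted comments should still hold:

    require 'linguist/tokenizer'

    Linguist::Tokenizer.tokenize("printf('Hello')")
    # => ["printf", "(", ")"]   -- string and number literals are skipped

    Linguist::Tokenizer.tokenize("#!/usr/bin/env node\nfoo();")
    # => ["SHEBANG#!node", "foo", "(", ")", ";"]
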
lib/linguist/vendor.yml

@@ -81,6 +81,9 @@
# Animate.css
- (^|/)animate\.(css|less|scss|styl)$
# Select2
- (^|/)select2/.*\.(css|scss|js)$
# Vendored dependencies
- third[-_]?party/
- 3rd[-_]?party/
@@ -119,6 +122,15 @@
# jQuery File Upload
- (^|/)jquery\.fileupload(-\w+)?\.js$
# jQuery dataTables
- jquery.dataTables.js
# bootboxjs
- bootbox.js
# pdf-worker
- pdf.worker.js
# Slick
- (^|/)slick\.\w+.js$

lib/linguist/version.rb

@@ -1,3 +1,3 @@
module Linguist
VERSION = "5.3.0"
VERSION = "5.3.2"
end

File diff suppressed because it is too large

102
samples/TypeScript/cache.ts Normal file

@@ -0,0 +1,102 @@
import { DocumentNode } from 'graphql';
import { getFragmentQueryDocument } from 'apollo-utilities';
import { DataProxy, Cache } from './types';
export type Transaction<T> = (c: ApolloCache<T>) => void;
export abstract class ApolloCache<TSerialized> implements DataProxy {
  // required to implement
  // core API
  public abstract read<T>(query: Cache.ReadOptions): T;
  public abstract write(write: Cache.WriteOptions): void;
  public abstract diff<T>(query: Cache.DiffOptions): Cache.DiffResult<T>;
  public abstract watch(watch: Cache.WatchOptions): () => void;
  public abstract evict(query: Cache.EvictOptions): Cache.EvictionResult;
  public abstract reset(): Promise<void>;

  // initializer / offline / ssr API
  /**
   * Replaces existing state in the cache (if any) with the values expressed by
   * `serializedState`.
   *
   * Called when hydrating a cache (server side rendering, or offline storage),
   * and also (potentially) during hot reloads.
   */
  public abstract restore(
    serializedState: TSerialized,
  ): ApolloCache<TSerialized>;

  /**
   * Exposes the cache's complete state, in a serializable format for later restoration.
   */
  public abstract extract(optimistic: boolean): TSerialized;

  // optimistic API
  public abstract removeOptimistic(id: string): void;

  // transactional API
  public abstract performTransaction(
    transaction: Transaction<TSerialized>,
  ): void;
  public abstract recordOptimisticTransaction(
    transaction: Transaction<TSerialized>,
    id: string,
  ): void;

  // optional API
  public transformDocument(document: DocumentNode): DocumentNode {
    return document;
  }
  // experimental
  public transformForLink(document: DocumentNode): DocumentNode {
    return document;
  }

  // DataProxy API
  /**
   *
   * @param options
   * @param optimistic
   */
  public readQuery<QueryType>(
    options: DataProxy.Query,
    optimistic: boolean = false,
  ): QueryType {
    return this.read({
      query: options.query,
      variables: options.variables,
      optimistic,
    });
  }

  public readFragment<FragmentType>(
    options: DataProxy.Fragment,
    optimistic: boolean = false,
  ): FragmentType | null {
    return this.read({
      query: getFragmentQueryDocument(options.fragment, options.fragmentName),
      variables: options.variables,
      rootId: options.id,
      optimistic,
    });
  }

  public writeQuery(options: Cache.WriteQueryOptions): void {
    this.write({
      dataId: 'ROOT_QUERY',
      result: options.data,
      query: options.query,
      variables: options.variables,
    });
  }

  public writeFragment(options: Cache.WriteFragmentOptions): void {
    this.write({
      dataId: options.id,
      result: options.data,
      variables: options.variables,
      query: getFragmentQueryDocument(options.fragment, options.fragmentName),
    });
  }
}

8
test/fixtures/Perl/Module.pm vendored Normal file

@@ -0,0 +1,8 @@
use 5.006;
use strict;
=head1
module
=cut

test/test_grammars.rb

@@ -23,7 +23,6 @@ class TestGrammars < Minitest::Test
"8653305b358375d0fced85dc24793b99919b11ef", # language-shellscript
"9f0c0b0926a18f5038e455e8df60221125fc3111", # elixir-tmbundle
"a4dadb2374282098c5b8b14df308906f5347d79a", # mako-tmbundle
"b9b24778619dce325b651f0d77cbc72e7ae0b0a3", # Julia.tmbundle
"e06722add999e7428048abcc067cd85f1f7ca71c", # r.tmbundle
"50b14a0e3f03d7ca754dac42ffb33302b5882b78", # smalltalk-tmbundle
"eafbc4a2f283752858e6908907f3c0c90188785b", # gap-tmbundle
@@ -44,6 +43,7 @@ class TestGrammars < Minitest::Test
"9dafd4e2a79cb13a6793b93877a254bc4d351e74", # sublime-text-ox
"8e111741d97ba2e27b3d18a309d426b4a37e604f", # sublime-varnish
"23d2538e33ce62d58abda2c039364b92f64ea6bc", # sublime-angelscript
"53714285caad3c480ebd248c490509695d10404b", # atom-language-julia
].freeze
# List of allowed SPDX license names

test/test_heuristics.rb

@@ -1,6 +1,6 @@
require_relative "./helper"
class TestHeuristcs < Minitest::Test
class TestHeuristics < Minitest::Test
  include Linguist

  def fixture(name)
@@ -237,14 +237,6 @@ class TestHeuristcs < Minitest::Test
    })
  end

  # Candidate languages = ["Pod", "Perl"]
  def test_pod_by_heuristics
    assert_heuristics({
      "Perl" => all_fixtures("Perl", "*.pod"),
      "Pod" => all_fixtures("Pod", "*.pod")
    })
  end

  # Candidate languages = ["IDL", "Prolog", "QMake", "INI"]
  def test_pro_by_heuristics
    assert_heuristics({

3
vendor/README.md vendored

@@ -101,6 +101,7 @@ This is a list of grammars that Linguist selects to provide syntax highlighting
- **eC:** [ecere/ec.tmbundle](https://github.com/ecere/ec.tmbundle)
- **Ecere Projects:** [textmate/json.tmbundle](https://github.com/textmate/json.tmbundle)
- **ECLiPSe:** [alnkpa/sublimeprolog](https://github.com/alnkpa/sublimeprolog)
- **Edje Data Collection:** [textmate/json.tmbundle](https://github.com/textmate/json.tmbundle)
- **edn:** [atom/language-clojure](https://github.com/atom/language-clojure)
- **Eiffel:** [textmate/eiffel.tmbundle](https://github.com/textmate/eiffel.tmbundle)
- **EJS:** [gregory-m/ejs-tmbundle](https://github.com/gregory-m/ejs-tmbundle)
@@ -181,7 +182,7 @@ This is a list of grammars that Linguist selects to provide syntax highlighting
- **JSONiq:** [wcandillon/language-jsoniq](https://github.com/wcandillon/language-jsoniq)
- **JSONLD:** [atom/language-javascript](https://github.com/atom/language-javascript)
- **JSX:** [github-linguist/language-babel](https://github.com/github-linguist/language-babel)
- **Julia:** [JuliaEditorSupport/Julia.tmbundle](https://github.com/JuliaEditorSupport/Julia.tmbundle)
- **Julia:** [JuliaEditorSupport/atom-language-julia](https://github.com/JuliaEditorSupport/atom-language-julia)
- **Jupyter Notebook:** [textmate/json.tmbundle](https://github.com/textmate/json.tmbundle)
- **KiCad Layout:** [Alhadis/language-pcb](https://github.com/Alhadis/language-pcb)
- **KiCad Legacy Layout:** [Alhadis/language-pcb](https://github.com/Alhadis/language-pcb)


@@ -1,27 +0,0 @@
---
type: grammar
name: Julia.tmbundle
license: mit
---
Copyright (c) 2012-2014 Stefan Karpinski, Elliot Saba, Dirk Gadsden,
Adam Strzelecki, Jonathan Malmaud and other contributors:
https://github.com/JuliaEditorSupport/Julia.tmbundle/contributors
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@@ -0,0 +1,27 @@
---
type: grammar
name: atom-language-julia
license: mit
---
The atom-language-julia package is licensed under the MIT "Expat" License:
> Copyright (c) 2015
>
> Permission is hereby granted, free of charge, to any person obtaining
> a copy of this software and associated documentation files (the
> "Software"), to deal in the Software without restriction, including
> without limitation the rights to use, copy, modify, merge, publish,
> distribute, sublicense, and/or sell copies of the Software, and to
> permit persons to whom the Software is furnished to do so, subject to
> the following conditions:
>
> The above copyright notice and this permission notice shall be
> included in all copies or substantial portions of the Software.
>
> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
> IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
> CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
> TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
> SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.