linguist

KevinMidboe/linguist

mirror of https://github.com/KevinMidboe/linguist.git synced 2026-07-24 18:41:55 +00:00

Commit Graph

Author

SHA1

Message

Date

Ashe Connor

c9b3d19c6f

Lexer crash fix (#3900)

* input may return 0 for EOF

Stops overruns into fread from nothing.

* remove two trailing contexts

* fix up sgml tokens

2017-11-10 22:11:32 +11:00

Ashe Connor

99eaf5faf9

Replace the tokenizer with a flex-based scanner (#3846)

* Lex everything except SGML, multiline, SHEBANG

* Prepend SHEBANG#! to tokens

* Support SGML tag/attribute extraction

* Multiline comments

* WIP cont'd; productionifying

* Compile before test

* Add extension to gemspec

* Add flex task to build lexer

* Reentrant extra data storage

* regenerate lexer

* use prefix

* rebuild lexer on linux

* Optimise a number of operations:

  * Don't read and split the entire file if we only ever use the first/last n
    lines

  * Only consider the first 50KiB when using heuristics/classifying.  This can
    save a *lot* of time; running a large number of regexes over 1MiB of text
    takes a while.

  * Memoize File.size/read/stat; re-reading in a 500KiB file every time `data` is
    called adds up a lot.

* Use single regex for C++

* act like #lines

* [1][-2..-1] => nil, ffs

* k may not be set

2017-10-31 11:06:56 +11:00

2 Commits
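
The optimisations listed in commit 99eaf5faf9 (memoizing file reads, capping the text handed to heuristics at 50KiB, and reading only the first n lines) might look roughly like the Ruby sketch below. This is an illustration under stated assumptions, not the code from the commit: the class name `CachedBlob`, the constant `HEURISTICS_LIMIT`, and the methods `heuristics_data`, `first_lines`, and `read_count` are hypothetical names chosen for the example.

```ruby
require "tempfile"

# Hypothetical stand-in for a file wrapper; not Linguist's actual class.
class CachedBlob
  # Assumed cap, taken from the commit message ("first 50KiB").
  HEURISTICS_LIMIT = 50 * 1024

  # Counts disk reads so the effect of memoization can be observed.
  attr_reader :read_count

  def initialize(path)
    @path = path
    @read_count = 0
  end

  # Memoized: the file is read from disk at most once per object.
  def data
    @data ||= begin
      @read_count += 1
      File.binread(@path)
    end
  end

  # Memoized size lookup, so repeated calls do not hit the filesystem.
  def size
    @size ||= File.size(@path)
  end

  # Only the leading bytes are passed on to regex-heavy heuristics.
  def heuristics_data
    data.byteslice(0, HEURISTICS_LIMIT)
  end

  # Returns the first n lines without reading and splitting the whole file.
  def first_lines(n)
    File.foreach(@path).first(n)
  end
end

# Usage: a 100,000-byte file is read once and truncated for heuristics.
Tempfile.create("blob") do |f|
  f.write("line\n" * 20_000)
  f.flush
  blob = CachedBlob.new(f.path)
  3.times { blob.data }
  puts blob.read_count
  puts blob.heuristics_data.bytesize
  puts blob.first_lines(2).inspect
end
```

The `||=` idiom is sufficient here because an empty string and a zero size are both truthy in Ruby, so an empty file is still cached after the first call.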