* Lex everything except SGML, multiline, SHEBANG
* Prepend SHEBANG#! to tokens
* Support SGML tag/attribute extraction
* Multiline comments
* WIP cont'd; productionifying
* Compile before test
* Add extension to gemspec
* Add flex task to build lexer
* Reentrant extra data storage
* regenerate lexer
* use prefix
* rebuild lexer on Linux
* Optimise a number of operations:
  * Don't read and split the entire file if we only ever use the first/last n
    lines
  * Only consider the first 50KiB when using heuristics/classifying. This can
    save a *lot* of time; running a large number of regexes over 1MiB of text
    takes a while.
  * Memoize File.size/read/stat; re-reading in a 500KiB file every time `data` is
    called adds up a lot.
* Use single regex for C++
* act like #lines
* [1][-2..-1] => nil, ffs
* k may not be set
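
For illustration only, the optimisations listed above (reading just the lines that are needed, capping heuristic input at 50KiB, and memoizing file reads) could be sketched roughly as follows. `SketchBlob`, `HEURISTICS_LIMIT`, `heuristics_data` and `first_lines` are hypothetical names chosen for this sketch and are not taken from the actual change; only standard Ruby calls are used.

```ruby
# Hypothetical stand-in for a file-backed blob; all names here are assumptions.
class SketchBlob
  # Assumed cap on how much data heuristics/classification look at (50KiB).
  HEURISTICS_LIMIT = 50 * 1024

  def initialize(path)
    @path = path
  end

  # Memoized: the file is read from disk once, no matter how often
  # `data` is called afterwards.
  def data
    @data ||= File.binread(@path)
  end

  # Memoized size lookup, so repeated calls do not hit the filesystem again.
  def size
    @size ||= File.size(@path)
  end

  # Only the leading slice is handed to the (expensive) regex heuristics.
  def heuristics_data
    data.byteslice(0, HEURISTICS_LIMIT)
  end

  # First n lines without reading and splitting the whole file:
  # File.foreach without a block returns a lazy line enumerator,
  # and `first(n)` stops after n lines.
  def first_lines(n)
    File.foreach(@path).first(n).map(&:chomp)
  end
end

# The `[1][-2..-1] => nil` item refers to plain Ruby behaviour: a range
# slice whose start falls outside the array yields nil, not a short array.
short = [1]
short[-2..-1] # => nil
short.last(2) # => [1]
```

`Array#last(n)` is shown as one way to get "up to n trailing elements" without the nil case; whether the real fix used it is not stated here.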