69 Commits

Author SHA1 Message Date
Ashe Connor
0b81b21983 Grammar compiler invocation fix (#3945)
* Correct grammar-compiler invocation in build_gem

/cc @vmg

* || true so we can release with broken grammars
2017-12-12 09:41:21 +01:00
Ashe Connor
c9b3d19c6f Lexer crash fix (#3900)
* input may return 0 for EOF

Stops overruns into fread from nothing.

* remove two trailing contexts

* fix up sgml tokens
2017-11-10 22:11:32 +11:00
Ashe Connor
99eaf5faf9 Replace the tokenizer with a flex-based scanner (#3846)
* Lex everything except SGML, multiline, SHEBANG

* Prepend SHEBANG#! to tokens

* Support SGML tag/attribute extraction

* Multiline comments

* WIP cont'd; productionifying

* Compile before test

* Add extension to gemspec

* Add flex task to build lexer

* Reentrant extra data storage

* regenerate lexer

* use prefix

* rebuild lexer on linux

* Optimise a number of operations:

* Don't read and split the entire file if we only ever use the first/last n
  lines

* Only consider the first 50KiB when using heuristics/classifying.  This can
  save a *lot* of time; running a large number of regexes over 1MiB of text
  takes a while.

* Memoize File.size/read/stat; re-reading in a 500KiB file every time `data` is
  called adds up a lot.

* Use single regex for C++

* act like #lines

* [1][-2..-1] => nil, ffs

* k may not be set
2017-10-31 11:06:56 +11:00
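The optimisations listed in 99eaf5faf9 (memoized reads, a 50KiB cap for classification, reading only the first n lines) can be sketched roughly as below. This is a minimal illustration, not the code from the commit: the `LazyFile` class, its method names, and the `CLASSIFY_LIMIT` constant are hypothetical.

```ruby
# Hypothetical sketch; LazyFile and its methods are illustrative names,
# not Linguist's actual API.
class LazyFile
  # 50KiB, the cap named in the commit message.
  CLASSIFY_LIMIT = 50 * 1024

  def initialize(path)
    @path = path
  end

  # Memoized: the file is read from disk at most once per object.
  def data
    @data ||= File.binread(@path)
  end

  # Memoized size lookup.
  def size
    @size ||= File.size(@path)
  end

  # Only the leading bytes are handed to heuristics/classification.
  def classification_sample
    data.byteslice(0, CLASSIFY_LIMIT)
  end

  # Returns the first n lines without reading and splitting the whole file.
  def first_lines(n)
    File.foreach(@path).first(n)
  end
end
```

Memoizing on the instance keeps repeated calls cheap, at the cost of holding the file contents in memory for the lifetime of the object.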
Martin Nowak
fa6ae1116f better heuristic distinction of .d files (#3145)
* fix benchmark

- require json for Hash.to_json

* better heuristic distinction of .d files

- properly recognize dtrace probes
- recognize \ in Makefile paths
- recognize single line `file.ext : dep.ext` make targets
- recognize D module, import, function, and unittest declarations
- add more representative D samples

D changed from 31.2% to 28.1%
DTrace changed from 33.5% to 32.5%
Makefile changed from 35.3% to 39.4%

See
https://gist.github.com/MartinNowak/fda24fdef64f2dbb05c5a5ceabf22bd3
for the scraper used to get a test corpus.
2017-03-30 18:25:53 +01:00
Arfon Smith
988739d566 Merge branch 'master' into combine-gems 2016-03-09 06:25:35 -06:00
rpavlick
2d392581e2 adding NCL language 2015-07-09 07:17:01 -07:00
Adam Roben
0cfdbfb91c Merge github-linguist-grammars into github-linguist
Now that all our grammars are licensed (or grandfathered in), we can
distribute them as part of the standard github-linguist gem. This makes
it easier for projects to get up and running with Linguist.
2015-01-07 14:47:26 -05:00
Adam Roben
78a0030d46 download-grammars -> convert-grammars
Downloading is only a small part of what this script does. The main
thing it does is convert grammars to JSON.
2015-01-06 13:28:25 -05:00
Garen Torikian
7a57a0b594 What is this, Lisp? 2014-11-28 12:35:42 -08:00
Garen Torikian
be82b55408 Simplify rescue catching 2014-11-28 12:33:43 -08:00
Garen Torikian
526ca1761a This require is no longer used 2014-11-28 12:33:37 -08:00
Garen Torikian
1d4149168d Add Rake task to fetch ace_modes, and skip test if there's no internet 2014-11-28 11:48:26 -08:00
Adam Roben
046fb18980 Add github-linguist-grammars gem
The purpose of this gem is to package up the language grammars that are
used for syntax highlighting on github.com. The grammars are TextMate,
Sublime Text, or Atom language grammars, converted to JSON and given the
filename SCOPE.json, where SCOPE is the language scope that the grammar
defines.

The github-linguist-grammars gem packages up all the grammars, and also
exports a Linguist::Grammars.path method to locate the directory
containing the grammars.

To build the gem, simply run `rake build_grammars_gem`. The grammars.yml
file lists all the repositories we download grammars from, as well as
which scopes are defined by each repository. The
script/download-grammars script takes that list and downloads and
processes the grammars into the format expected by the gem.
2014-11-13 11:03:53 -05:00
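The SCOPE.json naming described in 046fb18980 suggests a lookup along the following lines. This is a sketch under stated assumptions: the `GrammarSketch` module and its `load` method are hypothetical, the `scopeName` key is used only as sample data, and a temporary directory stands in for the directory that `Linguist::Grammars.path` is said to return.

```ruby
require "json"
require "tmpdir"

# Hypothetical stand-in for a grammar lookup. Only the SCOPE.json file
# naming comes from the commit message; everything else is illustrative.
module GrammarSketch
  # Loads the grammar for the given scope from dir/SCOPE.json.
  def self.load(dir, scope)
    JSON.parse(File.read(File.join(dir, "#{scope}.json")))
  end
end

# Usage with a throwaway directory in place of the gem's grammar directory.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "source.ruby.json"),
             JSON.generate("scopeName" => "source.ruby"))
  grammar = GrammarSketch.load(dir, "source.ruby")
  puts grammar["scopeName"]
end
```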
Brandon Keepers
02aeb4f895 Merge remote-tracking branch 'origin/master' into just-yajl
* origin/master: (42 commits)
  its always greener
  that new green shell
  Removing stale extension
  Update README.md
  Add moon interpreter for MoonScript
  Bumping version for 3.4.1 release
  Use text.html.erb scope for HTML+ERB files
  Add sample .dyalog file for file type APL
  Added extra Papyrus sample files.
  Add sample Papyrus script
  Add Papyrus support
  Add LOLCODE support
  Add ProGuard config files to vendored files
  Recognise *.dyalog as APL sources
  Assign a bunch more TextMate scopes
  CI step for samples
  Add .command as a Shell file extension
  CI config
  Vendored gems
  Update cibuild
  ...

Conflicts:
	Rakefile
2014-10-31 18:03:03 -04:00
Brandon Keepers
cd743332f4 Use yajl since it is already a dependency
Both JSON and Yajl were listed as dependencies. Pygments.rb already requires yajl, so let's just use that instead of using both.
2014-10-17 14:45:28 -04:00
Brandon Keepers
85957ecf56 Require "bundler/setup" in rakefile
This ensures that the Rake task will use bundler to manage dependencies and print a warning to run `bundle install` if dependencies are missing.
2014-10-17 14:14:27 -04:00
Arfon Smith
e71eefe8fc Merge branch 'master' into 1515-local 2014-09-30 08:38:26 -05:00
Arfon Smith
2e4e602787 Housekeeping 2014-09-29 15:20:11 -05:00
Arfon Smith
ca59303dba Preferred syntax 2014-09-18 14:25:36 -05:00
Arfon Smith
3284450dc4 Make sure samples.json is present before running tests 2014-09-18 13:56:41 -05:00
Brandon Keepers
e67c1789b8 Generate samples.json before building gem 2014-09-16 10:26:35 -04:00
Brandon Keepers
015af19eaf Move Samples::DATA constant to Samples.cache method 2014-09-16 10:25:30 -04:00
Arfon Smith
5932f5f273 Allow for result to be generated when there are un-committed changes. 2014-09-13 11:06:15 -05:00
Brandon Keepers
dab75f6f97 Rework benchmarking script to avoid git operations
    $ git checkout master
    $ bundle exec rake benchmark:generate CORPUS=~/Downloads/samples-9
    wrote benchmark/results/samples-9-8cdb8ed4.json

    $ git checkout branch-name
    $ bx rake benchmark:generate CORPUS=~/Downloads/samples-9
    wrote benchmark/results/samples-9-8d8020dd.json

    $ bx rake benchmark:compare REFERENCE=benchmark/results/samples-9-8cdb8ed4.json CANDIDATE=benchmark/results/samples-9-8d8020dd.json
    LanguageA changed from 95.9% to 0.0%
    LanguageB changed from 4.0% to 99.9%
2014-09-10 15:47:44 -05:00
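The comparison step shown in dab75f6f97 could look something like the sketch below. The layout of the results files is not shown in the log, so this assumes each one is a JSON object mapping a language name to a percentage. The `compare_results` helper and the `LanguageC` entry are hypothetical.

```ruby
require "json"

# Hypothetical: each results file is assumed to be a JSON object mapping
# language name => percentage. The real file layout is not shown in the log.
def compare_results(reference, candidate)
  languages = (reference.keys | candidate.keys).sort
  lines = languages.map do |lang|
    before = reference.fetch(lang, 0.0)
    after = candidate.fetch(lang, 0.0)
    # Unchanged languages produce no report line.
    next if before == after
    format("%s changed from %.1f%% to %.1f%%", lang, before, after)
  end
  lines.compact
end

reference = JSON.parse('{"LanguageA": 95.9, "LanguageB": 4.0, "LanguageC": 0.1}')
candidate = JSON.parse('{"LanguageA": 0.0, "LanguageB": 99.9, "LanguageC": 0.1}')
puts compare_results(reference, candidate)
```

Languages missing from one side are treated as 0.0%, so a language that appears or disappears between runs is still reported.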
Arfon Smith
417bf7e1c9 Reworking Rake tasks 2014-08-06 19:21:20 +01:00
Arfon Smith
e376fe921b Skipping Text and Binary dirs 2014-07-23 11:30:25 -05:00
Arfon Smith
7d13b9eb99 Formatting 2014-07-23 10:59:10 -05:00
Arfon Smith
6ed0a05b44 Reporting errors in classifications 2014-07-23 10:49:29 -05:00
Arfon Smith
20154eb049 Rework diff slightly 2014-07-23 10:30:54 -05:00
Arfon Smith
84ea710d42 Moving linguist detection into rake task and ignoring diff for now. 2014-07-23 10:30:53 -05:00
Arfon Smith
4d83bf34f3 Ditching IO 2014-07-23 10:26:23 -05:00
Arfon Smith
3a797e2583 Formatting 2014-07-23 10:26:23 -05:00
Arfon Smith
7802030a53 Counting changes 2014-07-23 10:26:22 -05:00
Arfon Smith
e8e1e0ca23 Abort unless files exist 2014-07-23 10:26:22 -05:00
Arfon Smith
973431be40 Breaking comparison step out into separate task 2014-07-23 10:26:22 -05:00
Arfon Smith
7fa1b52497 Benchmark dir 2014-07-23 10:26:22 -05:00
Arfon Smith
a90d21899a Shellwords 2014-07-23 10:26:22 -05:00
Arfon Smith
569058f481 test on all 2014-07-23 10:26:22 -05:00
Arfon Smith
4ecda08f1f Prettier print 2014-07-23 10:26:21 -05:00
Arfon Smith
3b23059c09 Prettier print 2014-07-23 10:26:21 -05:00
Arfon Smith
a474ffc101 Deep diffing 2014-07-23 10:26:21 -05:00
Arfon Smith
f7672b837a Building language indexes 2014-07-23 10:26:21 -05:00
Arfon Smith
9094923de9 Debug statements 2014-07-23 10:26:21 -05:00
Arfon Smith
6454c96e6a Abort 2014-07-23 10:26:21 -05:00
Arfon Smith
0a717f5c81 Gem 2014-07-23 10:26:21 -05:00
Arfon Smith
dab9777621 Branches 2014-07-23 10:26:20 -05:00
Ted Nyman
b5df71950d Minor formatting 2013-12-20 14:55:47 -08:00
Charlie Somerville
bf11900bc9 prefer to load from languages.json if it exists 2013-12-04 15:58:34 +11:00
Joshua Peek
b798e28bfb No warnings 2012-10-07 15:37:09 -05:00
Joshua Peek
bacfd4e832 Fix test task 2012-07-23 16:40:16 -05:00