Merge pull request #3193 from Alhadis/grammar-scripts

Add script to add or replace grammars
Arfon Smith
2016-10-25 07:36:01 -04:00
committed by GitHub
3 changed files with 110 additions and 10 deletions

@@ -17,7 +17,7 @@ To add support for a new extension:
In addition, if this extension is already listed in [`languages.yml`][languages] then sometimes a few more steps will need to be taken:
0. Make sure that example `.yourextension` files are present in the [samples directory][samples] for each language that uses `.yourextension`.
0. Test the performance of the Bayesian classifier with a relatively large number (1000s) of sample `.yourextension` files to ensure we're not misclassifying files (ping @arfon or @bkeepers to help with this).
0. Test the performance of the Bayesian classifier with a relatively large number (1000s) of sample `.yourextension` files to ensure we're not misclassifying files (ping **@arfon** or **@bkeepers** to help with this).
0. If the Bayesian classifier does a bad job with the sample `.yourextension` files then a [heuristic](https://github.com/github/linguist/blob/master/lib/linguist/heuristics.rb) may need to be written to help.
@@ -28,10 +28,7 @@ We try only to add languages once they have some usage on GitHub. In most cases
To add support for a new language:
0. Add an entry for your language to [`languages.yml`][languages]. Omit the `language_id` field for now.
0. Add a grammar for your language. Please only add grammars that have [one of these licenses](https://github.com/github/linguist/blob/257425141d4e2a5232786bf0b13c901ada075f93/vendor/licenses/config.yml#L2-L11).
0. Add your grammar as a submodule: `git submodule add https://github.com/JaneSmith/MyGrammar vendor/grammars/MyGrammar`.
0. Add your grammar to [`grammars.yml`][grammars] by running `script/convert-grammars --add vendor/grammars/MyGrammar`.
0. Download the license for the grammar: `script/licensed`. Be careful to only commit the file for the new grammar, as this script may update licenses for other grammars as well.
0. Add a grammar for your language: `script/add-grammar https://github.com/JaneSmith/MyGrammar`. Please only add grammars that have [one of these licenses][licenses].
0. Add samples for your language to the [samples directory][samples] in the correct subdirectory.
0. Add a `language_id` for your language using `script/set-language-ids`. **You should only ever need to run `script/set-language-ids --update`. Anything other than this risks breaking GitHub search :cry:**
0. Open a pull request, linking to a [GitHub search result](https://github.com/search?utf8=%E2%9C%93&q=extension%3Aboot+NOT+nothack&type=Code&ref=searchresults) showing in-the-wild usage.
@@ -39,7 +36,7 @@ To add support for a new language:
In addition, if your new language defines an extension that's already listed in [`languages.yml`][languages] (such as `.foo`) then sometimes a few more steps will need to be taken:
0. Make sure that example `.foo` files are present in the [samples directory][samples] for each language that uses `.foo`.
0. Test the performance of the Bayesian classifier with a relatively large number (1000s) of sample `.foo` files to ensure we're not misclassifying files (ping @arfon or @bkeepers to help with this).
0. Test the performance of the Bayesian classifier with a relatively large number (1000s) of sample `.foo` files to ensure we're not misclassifying files (ping **@arfon** or **@bkeepers** to help with this).
0. If the Bayesian classifier does a bad job with the sample `.foo` files then a [heuristic](https://github.com/github/linguist/blob/master/lib/linguist/heuristics.rb) may need to be written to help.
Remember, the goal here is to try and avoid false positives!
@@ -82,9 +79,9 @@ Here's our current build status: [![Build Status](https://api.travis-ci.org/gith
Linguist is maintained with :heart: by:
- @arfon (GitHub Staff)
- @larsbrinkhoff
- @pchaigno
- **@arfon** (GitHub Staff)
- **@larsbrinkhoff**
- **@pchaigno**
As Linguist is a production dependency for GitHub we have a couple of workflow restrictions:
@@ -113,5 +110,6 @@ If you are the current maintainer of this gem:
[grammars]: /grammars.yml
[languages]: /lib/linguist/languages.yml
[licenses]: https://github.com/github/linguist/blob/257425141d4e2a5232786bf0b13c901ada075f93/vendor/licenses/config.yml#L2-L11
[samples]: /samples
[new-issue]: https://github.com/github/linguist/issues/new

script/add-grammar Executable file

@@ -0,0 +1,93 @@
#!/usr/bin/env ruby
require "optparse"
ROOT = File.expand_path("../../", __FILE__)
# Break a repository URL into its separate components
def parse_url(input)
  # Single quotes keep the backslashes, so the dots stay literal once
  # this string is interpolated into the patterns below
  hosts = 'github\.com|bitbucket\.org|gitlab\.com'

  # HTTPS/HTTP link pointing to recognised hosts
  if input =~ /^(?:https?:\/\/)?(?:[^.@]+@)?(?:www\.)?(#{hosts})\/([^\/]+)\/([^\/]+)/i
    { host: $1.downcase(), user: $2, repo: $3.sub(/\.git$/, "") }
  # SSH
  elsif input =~ /^git@(#{hosts}):([^\/]+)\/([^\/]+)\.git$/i
    { host: $1.downcase(), user: $2, repo: $3 }
  # provider:user/repo
  elsif input =~ /^(github|bitbucket|gitlab):\/?([^\/]+)\/([^\/]+)\/?$/i
    # Expand the bare provider name to its hostname so the HTTPS URL built later is valid
    domains = { "github" => "github.com", "bitbucket" => "bitbucket.org", "gitlab" => "gitlab.com" }
    { host: domains[$1.downcase()], user: $2, repo: $3 }
  # user/repo - Common GitHub shorthand
  elsif input =~ /^\/?([^\/]+)\/([^\/]+)\/?$/
    { host: "github.com", user: $1, repo: $2 }
  else
    raise "Unsupported URL: #{input}"
  end
end
# Isolate the vendor-name component of a submodule path
def parse_submodule(name)
  name =~ /^(?:.*(?:vendor\/)?grammars\/)?([^\/]+)/i
  path = "vendor/grammars/#{$1}"
  unless File.exist?("#{ROOT}/" + path)
    warn "Submodule '#{path}' does not exist. Aborting."
    exit 1
  end
  path
end
# Print debugging feedback to STDOUT if running with --verbose
def log(msg)
  puts msg if $verbose
end
usage = """Usage:
#{$0} [-v|--verbose] [--replace grammar] url
Examples:
#{$0} https://github.com/Alhadis/language-roff
#{$0} --replace sublime-apl https://github.com/Alhadis/language-apl
"""
$replace = nil
$verbose = false
OptionParser.new do |opts|
  opts.banner = usage
  opts.on("-v", "--verbose", "Print verbose feedback to STDOUT") do
    $verbose = true
  end
  opts.on("-rSUBMODULE", "--replace=SUBMODULE", "Replace an existing grammar submodule.") do |name|
    $replace = name
  end
end.parse!
$url = ARGV[0]
# No URL? Print a usage message and bail.
unless $url
  warn usage
  exit 1
end
# Ensure the given URL is an HTTPS link
parts = parse_url $url
https = "https://#{parts[:host]}/#{parts[:user]}/#{parts[:repo]}"
repo_new = "vendor/grammars/#{parts[:repo]}"
repo_old = parse_submodule($replace) if $replace
if repo_old
  log "Deregistering: #{repo_old}"
  `git submodule deinit #{repo_old}`
  `git rm -rf #{repo_old}`
end
log "Registering new submodule: #{repo_new}"
`git submodule add -f #{https} #{repo_new}`
exit 1 if $?.exitstatus > 0
`script/convert-grammars --add #{repo_new}`
log "Confirming license"
`script/licensed --module "#{repo_new}"`
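
The script accepts several ways of naming a repository (full HTTP(S) link, SSH address, `provider:user/repo`, and bare `user/repo`) and turns each into one HTTPS URL before registering the submodule. The following is a minimal standalone sketch of that normalisation, not the code from this commit: the helper name `normalize_grammar_url` and the `HOSTS` table are assumptions introduced for illustration only.

```ruby
# Hypothetical illustration; names below are not part of the commit.
# Maps short provider names to the hostnames the script recognises.
HOSTS = {
  "github"    => "github.com",
  "bitbucket" => "bitbucket.org",
  "gitlab"    => "gitlab.com",
}.freeze

# Regexp.union escapes the dots; .source yields a plain pattern string
# that can be interpolated without carrying its own flags.
HOST_PATTERN = Regexp.union(HOSTS.values).source

# Return a canonical "https://host/user/repo" URL for a repository reference.
def normalize_grammar_url(input)
  host, user, repo =
    case input
    # HTTP(S) link, with or without the scheme, to a recognised host
    when %r{\A(?:https?://)?(?:www\.)?(#{HOST_PATTERN})/([^/]+)/([^/]+?)(?:\.git)?/?\z}i
      [$1.downcase, $2, $3]
    # SSH address such as git@host:user/repo.git
    when %r{\Agit@(#{HOST_PATTERN}):([^/]+)/([^/]+?)(?:\.git)?\z}i
      [$1.downcase, $2, $3]
    # provider:user/repo
    when %r{\A(github|bitbucket|gitlab):/?([^/]+)/([^/]+?)/?\z}i
      [HOSTS.fetch($1.downcase), $2, $3]
    # user/repo, treated as GitHub shorthand
    when %r{\A/?([^/]+)/([^/]+?)/?\z}
      ["github.com", $1, $2]
    else
      raise ArgumentError, "Unsupported URL: #{input}"
    end
  "https://#{host}/#{user}/#{repo}"
end
```

Under these assumptions, `normalize_grammar_url("Alhadis/language-apl")` and `normalize_grammar_url("git@github.com:Alhadis/language-apl.git")` would both resolve to the same HTTPS address, which is what lets the submodule be registered consistently regardless of how the contributor typed the reference.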

@@ -4,6 +4,7 @@
require "bundler/setup"
require "licensed/cli"
require "optparse"
module Licensed
module Source
@@ -32,7 +33,14 @@ module Licensed
end
end
source = Licensed::Source::Filesystem.new("vendor/grammars/*/", type: "grammar")
module_path = nil
OptionParser.new do |opts|
opts.on("-mPATH", "--module=PATH", "Cache license file for specific grammar") do |p|
module_path = p
end
end.parse!
source = Licensed::Source::Filesystem.new(module_path || "vendor/grammars/*/", type: "grammar")
config = Licensed::Configuration.new
config.sources << source
@@ -43,4 +51,5 @@ else
end
command.run
`git checkout -- vendor/licenses/grammar/` if module_path
exit command.success?
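
The change above lets the license script work on a single grammar when `--module` is given and fall back to every vendored grammar otherwise. A minimal sketch of that option-with-default pattern in isolation, assuming a hypothetical helper `grammar_source_glob` and constant `DEFAULT_GRAMMAR_GLOB` (neither appears in the commit; only `OptionParser` and the glob string come from it):

```ruby
require "optparse"

# Hypothetical constant: the glob used when no specific module is requested.
DEFAULT_GRAMMAR_GLOB = "vendor/grammars/*/"

# Hypothetical helper: parse argv and return the path or glob to scan.
def grammar_source_glob(argv)
  module_path = nil
  OptionParser.new do |opts|
    opts.on("-mPATH", "--module=PATH", "Cache license file for specific grammar") do |path|
      module_path = path
    end
  end.parse!(argv)
  # Fall back to all grammars when --module was not supplied
  module_path || DEFAULT_GRAMMAR_GLOB
end
```

Calling `grammar_source_glob(["--module", "vendor/grammars/language-roff"])` would return the single path, while an empty argument list returns the catch-all glob. Keeping the default in one place makes it harder for the two modes to drift apart.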