Exclude documentation files from language statistics

Documentation is an important part of a software project but is not
generally thought of as part of the code for that project. Repository
language statistics are used to quantify the project's code, so it makes
sense to exclude documentation from those computations.

Documentation files are recognized similarly to vendored files.
lib/linguist/documentation.yml contains regular expressions to match
common names for documentation files. A new linguist-documentation Git
attribute can be used to override those conventions.
This commit is contained in:
Adam Roben
2015-02-12 10:04:44 -05:00
parent e0c1107a25
commit 066052ddd2
7 changed files with 98 additions and 3 deletions

View File

@@ -236,6 +236,21 @@ module Linguist
name =~ VendoredRegexp ? true : false
end
documentation_paths = YAML.load_file(File.expand_path("../documentation.yml", __FILE__))
DocumentationRegexp = Regexp.new(documentation_paths.join('|'))
# Public: Is the blob in a documentation directory?
#
# Documentation files are ignored by language statistics.
#
# See "documentation.yml" for a list of documentation conventions that match
# this pattern.
#
# Return true or false
def documentation?
name =~ DocumentationRegexp ? true : false
end
# Public: Get each line of data
#
# Requires Blob#data

View File

@@ -0,0 +1,18 @@
# Documentation files and directories are excluded from language
# statistics.
#
# Lines in this file are Regexps that are matched against the file
# pathname.
#
# Please add additional test coverage to
# `test/test_blob.rb#test_documentation` if you make any changes.
## Documentation Conventions ##
- ^docs?/
- ^Documentation/
- (^|/)CONTRIBUTING(\.|$)
- (^|/)COPYING(\.|$)
- (^|/)LICEN[CS]E(\.|$)
- (^|/)README(\.|$)

View File

@@ -4,7 +4,7 @@ require 'rugged'
module Linguist
class LazyBlob
GIT_ATTR = ['linguist-language', 'linguist-vendored']
GIT_ATTR = ['linguist-documentation', 'linguist-language', 'linguist-vendored']
GIT_ATTR_OPTS = { :priority => [:index], :skip_system => true }
GIT_ATTR_FLAGS = Rugged::Repository::Attributes.parse_opts(GIT_ATTR_OPTS)
@@ -37,6 +37,14 @@ module Linguist
end
end
def documentation?
if attr = git_attributes['linguist-documentation']
boolean_attribute(attr)
else
super
end
end
def language
return @language if defined?(@language)

View File

@@ -159,7 +159,7 @@ module Linguist
blob = Linguist::LazyBlob.new(repository, delta.new_file[:oid], new, mode.to_s(8))
# Skip vendored or generated blobs
next if blob.vendored? || blob.generated? || blob.language.nil?
next if blob.vendored? || blob.documentation? || blob.generated? || blob.language.nil?
if DETECTABLE_TYPES.include?(blob.language.type)
file_map[new] = [blob.language.group.name, blob.size]