Detect all markup languages when computing language statistics

Originally, only "programming" languages were included in repository
language statistics. In 33ebee0f6a we started detecting a few selected
"markup" languages as well. We didn't
include all "markup" languages because at the time formats like Markdown
and AsciiDoc were labeled as "markup" languages, and we thought that
including those prose (i.e., non-code) languages in repository
statistics on github.com was misleading for repositories that are
largely about code but also contain a lot of documentation (e.g.,
rails/rails).

This hand-picked set of whitelisted "markup" languages can cause strange
categorization for some repositories. For example, it includes CSS (and
some variants) but not HTML. This results in repositories that contain
the source code for a static website being classified as either a
JavaScript (programming) or CSS (markup) repository, with no mention of
HTML anywhere.

Fast-forward to today, and prose languages are no longer "markup"
languages; they're now "prose" languages. So now we can include all
"markup" languages in repository language statistics without worrying
about undesirable effects for documentation-heavy repositories.
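
To make the difference concrete, here is a minimal sketch of the old name-based check next to the new type-based check, applied to the static-website case described above. The `Lang` and `Blob` structs, the `byte_totals` helper, and the sample file sizes are hypothetical stand-ins for illustration, not Linguist's actual classes; the language types assume HTML and CSS are "markup" and Markdown is "prose", as this message describes.

```ruby
# Hypothetical stand-ins for Linguist's language and blob objects.
Lang = Struct.new(:name, :type)
Blob = Struct.new(:path, :language, :size)

# The hand-picked list removed by this commit.
OLD_MARKUP_WHITELIST = ["CSS", "Less", "Sass", "SCSS", "Stylus", "TeX"].freeze
# The type-based list introduced by this commit.
DETECTABLE_TYPES = [:programming, :markup].freeze

# Old rule: programming languages plus whitelisted markup names.
def old_detectable?(language)
  language.type == :programming || OLD_MARKUP_WHITELIST.include?(language.name)
end

# New rule: any language whose type is programming or markup.
def new_detectable?(language)
  DETECTABLE_TYPES.include?(language.type)
end

# Sum blob sizes per language name, keeping only languages the block accepts.
def byte_totals(blobs)
  blobs.each_with_object(Hash.new(0)) do |blob, totals|
    next unless yield(blob.language)
    totals[blob.language.name] += blob.size
  end
end

# Made-up contents of a small static website repository.
site = [
  Blob.new("index.html", Lang.new("HTML", :markup), 5000),
  Blob.new("site.css", Lang.new("CSS", :markup), 2000),
  Blob.new("app.js", Lang.new("JavaScript", :programming), 1000),
  Blob.new("README.md", Lang.new("Markdown", :prose), 3000),
]

old_stats = byte_totals(site) { |lang| old_detectable?(lang) }
new_stats = byte_totals(site) { |lang| new_detectable?(lang) }
```

Under these assumptions, `old_stats` omits HTML entirely while `new_stats` reports it, and Markdown is excluded in both cases because it is a "prose" language.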
Adam Roben
2015-02-10 13:31:11 -05:00
parent ee0b4f96a8
commit b2ee2cc7b8
2 changed files with 3 additions and 9 deletions


@@ -32,13 +32,6 @@ module Linguist
   # Valid Languages types
   TYPES = [:data, :markup, :programming, :prose]

-  # Names of non-programming languages that we will still detect
-  #
-  # Returns an array
-  def self.detectable_markup
-    ["CSS", "Less", "Sass", "SCSS", "Stylus", "TeX"]
-  end
-
   # Detect languages by a specific type
   #
   # type - A symbol that exists within TYPES


@@ -8,6 +8,8 @@ module Linguist
 # Its primary purpose is for gathering language statistics across
 # the entire project.
 class Repository
+  DETECTABLE_TYPES = [:programming, :markup].freeze
+
   attr_reader :repository

   # Public: Create a new Repository based on the stats of
@@ -159,8 +161,7 @@ module Linguist
   # Skip vendored or generated blobs
   next if blob.vendored? || blob.generated? || blob.language.nil?

-  # Only include programming languages and acceptable markup languages
-  if blob.language.type == :programming || Language.detectable_markup.include?(blob.language.name)
+  if DETECTABLE_TYPES.include?(blob.language.type)
     file_map[new] = [blob.language.group.name, blob.size]
   end
 end