From b2ee2cc7b8b72953025fe27116e370ab8df30265 Mon Sep 17 00:00:00 2001
From: Adam Roben
Date: Tue, 10 Feb 2015 13:31:11 -0500
Subject: [PATCH] Detect all markup languages when computing language statistics

Originally, only "programming" languages were included in repository
language statistics. In 33ebee0f6a1e097550df0c7a69c5ffe57223f558 we
started detecting a few selected "markup" languages as well. We didn't
include all "markup" languages because at the time formats like Markdown
and AsciiDoc were labeled as "markup" languages, and we thought that
including those prose (i.e., non-code) languages in repository
statistics on github.com was misleading for repositories that are
largely about code but also contain a lot of documentation (e.g.,
rails/rails).

This hand-picked set of whitelisted "markup" languages can cause strange
categorization for some repositories. For example, it includes CSS (and
some variants) but not HTML. This results in repositories that contain
the source code for a static website being classified as either a
JavaScript (programming) or CSS (markup) repository, with no mention of
HTML anywhere.

Fast-forward to today, and prose languages are no longer "markup"
languages; they're now "prose" languages. So now we can include all
"markup" languages in repository language statistics without worrying
about undesirable effects for documentation-heavy repositories.
---
 lib/linguist/language.rb   | 7 -------
 lib/linguist/repository.rb | 5 +++--
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/lib/linguist/language.rb b/lib/linguist/language.rb
index 7fbf8a96..35c58c30 100644
--- a/lib/linguist/language.rb
+++ b/lib/linguist/language.rb
@@ -32,13 +32,6 @@ module Linguist
     # Valid Languages types
     TYPES = [:data, :markup, :programming, :prose]
 
-    # Names of non-programming languages that we will still detect
-    #
-    # Returns an array
-    def self.detectable_markup
-      ["CSS", "Less", "Sass", "SCSS", "Stylus", "TeX"]
-    end
-
     # Detect languages by a specific type
     #
     # type - A symbol that exists within TYPES
diff --git a/lib/linguist/repository.rb b/lib/linguist/repository.rb
index 41e829c5..3c197fad 100644
--- a/lib/linguist/repository.rb
+++ b/lib/linguist/repository.rb
@@ -8,6 +8,8 @@ module Linguist
   # Its primary purpose is for gathering language statistics across
   # the entire project.
   class Repository
+    DETECTABLE_TYPES = [:programming, :markup].freeze
+
     attr_reader :repository
 
     # Public: Create a new Repository based on the stats of
@@ -159,8 +161,7 @@ module Linguist
           # Skip vendored or generated blobs
           next if blob.vendored? || blob.generated? || blob.language.nil?
 
-          # Only include programming languages and acceptable markup languages
-          if blob.language.type == :programming || Language.detectable_markup.include?(blob.language.name)
+          if DETECTABLE_TYPES.include?(blob.language.type)
             file_map[new] = [blob.language.group.name, blob.size]
           end
         end
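
The static-website case described in the commit message can be illustrated with a small standalone sketch. It does not use Linguist itself: the `Blob` struct, the `stats` helper, and the byte counts are hypothetical stand-ins chosen for illustration. Only the old whitelist and the `DETECTABLE_TYPES` list are taken from the patch.

```ruby
# Hypothetical stand-in for a detected blob; this is not Linguist's real API.
Blob = Struct.new(:language, :type, :size)

# Copied from the patch: the removed whitelist and the new type list.
OLD_MARKUP_WHITELIST = ["CSS", "Less", "Sass", "SCSS", "Stylus", "TeX"].freeze
DETECTABLE_TYPES = [:programming, :markup].freeze

# Made-up byte counts for a small static website with a long README.
BLOBS = [
  Blob.new("HTML", :markup, 6000),
  Blob.new("CSS", :markup, 3000),
  Blob.new("JavaScript", :programming, 1000),
  Blob.new("Markdown", :prose, 9000),
].freeze

# Sum blob sizes per language, counting only blobs the given block accepts.
def stats(blobs)
  blobs.select { |b| yield b }
       .each_with_object(Hash.new(0)) { |b, totals| totals[b.language] += b.size }
end

# Old rule: programming languages plus the hand-picked markup names.
OLD_STATS = stats(BLOBS) do |b|
  b.type == :programming || OLD_MARKUP_WHITELIST.include?(b.language)
end

# New rule: any language whose type is programming or markup.
NEW_STATS = stats(BLOBS) { |b| DETECTABLE_TYPES.include?(b.type) }

puts "before: #{OLD_STATS}"
puts "after:  #{NEW_STATS}"
```

Under these assumed sizes, the old rule omits HTML entirely and the new rule counts it, while Markdown stays excluded in both cases because its type is `:prose`.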