More documentation

Ted Nyman
2013-12-15 20:17:30 -08:00
parent 0626def699
commit 17d0b1e02f
2 changed files with 6 additions and 7 deletions

@@ -14,7 +14,7 @@ For disambiguating between files with common extensions, we first apply
 some common-sense heuristics to pick out obvious languages. After that, we use a
 [Bayesian
 classifier](https://github.com/github/linguist/blob/master/lib/linguist/classifier.rb).
-For an example, this process us tell the difference between `.h` files which could be either C, C++, or Obj-C.
+For an example, this process can help us tell the difference between `.h` files which could be either C, C++, or Obj-C.
 In the actual GitHub app we deal with `Grit::Blob` objects. For testing, there is a simple `FileBlob` API.
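The README text describes a two-stage flow: obvious heuristics first, then a Bayesian classifier as the fallback. A rough sketch of that flow follows. It assumes nothing about the internals of Linguist's `classifier.rb`: `heuristic_guess`, `ToyBayes`, and `detect` are hypothetical names, the classifier is a toy multinomial naive Bayes with add-one smoothing, and the training snippets are made up for illustration.

```ruby
# Stage 1 (hypothetical): one obvious rule, in the spirit of the .h heuristic.
# Returns a language name, or nil when the rule does not fire.
def heuristic_guess(data, candidates)
  return nil unless candidates.include?("Objective-C")
  "Objective-C" if data.include?("@interface")
end

# Stage 2 (hypothetical): toy multinomial naive Bayes, add-one smoothing.
class ToyBayes
  def initialize
    @counts = Hash.new { |h, lang| h[lang] = Hash.new(0) } # lang => token => count
    @totals = Hash.new(0) # lang => tokens seen in training
    @docs = Hash.new(0)   # lang => training samples seen
    @vocab = {}           # every token seen for any language
  end

  def train(lang, text)
    @docs[lang] += 1
    tokenize(text).each do |tok|
      @counts[lang][tok] += 1
      @totals[lang] += 1
      @vocab[tok] = true
    end
  end

  # Returns the candidate with the highest log posterior score.
  def classify(text, candidates)
    n_docs = @docs.values.sum.to_f
    tokens = tokenize(text)
    candidates.max_by do |lang|
      # Smoothed prior, then one smoothed log likelihood term per token.
      log_prior = Math.log((@docs[lang] + 1) / (n_docs + candidates.size))
      tokens.sum(log_prior) do |tok|
        Math.log((@counts[lang][tok] + 1.0) / (@totals[lang] + @vocab.size))
      end
    end
  end

  private

  # Keeps a leading @ or # so "@interface" and "#include" stay distinct tokens.
  def tokenize(text)
    text.scan(/[@#]?\w+/)
  end
end

# Heuristics first; only fall back to the classifier when they return nil.
def detect(data, candidates, classifier)
  heuristic_guess(data, candidates) || classifier.classify(data, candidates)
end

bayes = ToyBayes.new
bayes.train("C", "#include <stdio.h>\nint main(void) { printf(\"hi\"); return 0; }")
bayes.train("C++", "#include <vector>\nnamespace app { template <class T> class Box {}; }")
bayes.train("Objective-C", "#import <Foundation/Foundation.h>\n@interface Foo : NSObject\n@end")

candidates = ["C", "C++", "Objective-C"]
puts detect("@interface Bar : NSObject\n@end", candidates, bayes)
puts detect("namespace util { template <class T> T id(T x); }", candidates, bayes)
```

With these samples, the `@interface` snippet never reaches the classifier, while the `namespace`/`template` snippet is scored and comes out as C++. Summing log probabilities instead of multiplying raw probabilities avoids floating-point underflow on longer inputs.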

@@ -1,16 +1,14 @@
-require 'linguist/tokenizer'
 module Linguist
   # A collection of simple heuristics that can be used to better analysis languages.
   class Heuristics
-    # Public: Given an array of String language names, a
-    # apply all heuristics against the given data and return an array
+    # Public: Given an array of String language names,
+    # apply heuristics against the given data and return an array
     # of matching languages, or nil.
     #
     # data - Array of tokens or String data to analyze.
     # languages - Array of language name Strings to restrict to.
     #
-    # Returns an array of language name Strings, or []
+    # Returns an array of Languages or []
     def self.find_by_heuristics(data, languages)
       if languages.all? { |l| ["Objective-C", "C++"].include?(l) }
         disambiguate_h(data, languages)
@@ -19,6 +17,8 @@ module Linguist
     # .h extensions are ambigious between C, C++, and Objective-C.
     # We want to shortcut look for Objective-C.
+    #
+    # Returns an array of Languages or []
     def self.disambiguate_h(data, languages)
       matches = []
       matches << Language["Objective-C"] if data.include?("@interface")
@@ -26,4 +26,3 @@ module Linguist
     end
   end
 end
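The `data` parameter is documented as either an Array of tokens or a String, and the `.h` heuristic relies on `include?`, which behaves differently on each: `String#include?` matches substrings, while `Array#include?` matches whole elements. A minimal sketch of that difference follows. It assumes a hypothetical `Language` stand-in (a plain `Struct`, whose `[]` is shorthand for `new`) and uses a crude `split` in place of `Linguist::Tokenizer`; only the Objective-C check visible in this diff is reproduced.

```ruby
# Hypothetical stand-in for Linguist's Language class. Struct classes
# already accept Language["name"] as shorthand for Language.new("name").
Language = Struct.new(:name)

# Mirrors the Objective-C check from disambiguate_h in the diff; any
# other checks in the real method are not reproduced here.
def disambiguate_h(data, _languages)
  matches = []
  matches << Language["Objective-C"] if data.include?("@interface")
  matches
end

source = "@interface Foo : NSObject\n@end\n"
tokens = source.split # crude whitespace tokenizer, not Linguist::Tokenizer

p disambiguate_h(source, ["C++", "Objective-C"]).map(&:name) # prints ["Objective-C"]
p disambiguate_h(tokens, ["C++", "Objective-C"]).map(&:name) # prints ["Objective-C"]

# The two forms diverge when "@interface" appears only inside a longer word.
comment = "// see @interfaces below"
p disambiguate_h(comment, []).size       # prints 1 (substring match)
p disambiguate_h(comment.split, []).size # prints 0 (no token equals "@interface")
```

Matching on tokens is stricter and avoids false positives from longer words, while the String form is cheaper because it needs no tokenization pass.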