mirror of
https://github.com/KevinMidboe/linguist.git
synced 2025-12-29 21:31:01 +00:00
More documentation
@@ -14,7 +14,7 @@ For disambiguating between files with common extensions, we first apply
 some common-sense heuristics to pick out obvious languages. After that, we use a
 [Bayesian
 classifier](https://github.com/github/linguist/blob/master/lib/linguist/classifier.rb).
-For an example, this process us tell the difference between `.h` files which could be either C, C++, or Obj-C.
+For an example, this process can help us tell the difference between `.h` files which could be either C, C++, or Obj-C.
 
 In the actual GitHub app we deal with `Grit::Blob` objects. For testing, there is a simple `FileBlob` API.
 
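The README paragraph in this hunk describes a two-stage pipeline: cheap heuristics catch obvious cases, and a Bayesian classifier decides the rest. Below is a minimal Ruby sketch of that idea under stated assumptions: the `HEURISTICS` table, the `NaiveBayes` class, the `detect` helper, the tokenizer regex, and the training snippets are all hypothetical illustrations, not Linguist's API. Linguist's real classifier in `lib/linguist/classifier.rb` has its own tokenizer, training corpus, and scoring details.

```ruby
# Sketch: obvious-marker heuristics first, naive Bayes classifier second.
# All names, the tokenizer, and the training snippets are hypothetical.

# Cheap checks for unmistakable markers, keyed by language name.
HEURISTICS = {
  "Objective-C" => ->(data) { data.include?("@interface") }
}.freeze

# Multinomial naive Bayes over tokens, with add-one (Laplace) smoothing.
class NaiveBayes
  def initialize
    @token_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # language => token => count
    @totals = Hash.new(0) # language => total token count
    @docs = Hash.new(0)   # language => number of training samples
    @vocab = {}           # every token seen in training (used as a set)
  end

  # Identifiers (optionally prefixed by @ or #) or single punctuation marks.
  def self.tokenize(data)
    data.scan(/[@#]?\w+|[^\s\w]/)
  end

  def train(language, data)
    @docs[language] += 1
    self.class.tokenize(data).each do |token|
      @token_counts[language][token] += 1
      @totals[language] += 1
      @vocab[token] = true
    end
  end

  # Ranks candidates by log P(lang) + sum of log P(token | lang).
  # Assumes every candidate was trained on at least one sample.
  def classify(data, languages)
    tokens = self.class.tokenize(data)
    total_docs = @docs.values.sum.to_f
    scores = languages.map do |lang|
      prior = Math.log(@docs[lang] / total_docs)
      likelihood = tokens.sum do |t|
        Math.log((@token_counts[lang][t] + 1.0) / (@totals[lang] + @vocab.size))
      end
      [lang, prior + likelihood]
    end
    scores.sort_by { |_, score| -score }.map(&:first)
  end
end

# Returns the one language a heuristic singles out, else the top-ranked one.
def detect(data, candidates, classifier)
  hits = candidates.select { |lang| HEURISTICS[lang]&.call(data) }
  return hits.first if hits.size == 1
  classifier.classify(data, candidates).first
end

classifier = NaiveBayes.new
classifier.train("C++", "#include <vector>\nclass Widget { public: std::vector<int> items; };")
classifier.train("Objective-C", "#import <Foundation/Foundation.h>\n@interface Widget : NSObject\n@property NSArray *items;\n@end")

detect("@interface Foo : NSObject\n@end", ["Objective-C", "C++"], classifier)
# => "Objective-C" (heuristic path)
detect("template <typename T> class Box { std::vector<T> v; };", ["Objective-C", "C++"], classifier)
# => "C++" (classifier path)
```

Add-one smoothing keeps a token that never appeared in one language's samples from zeroing out that language's score, and summing logarithms instead of multiplying probabilities avoids floating-point underflow on long files.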
@@ -1,16 +1,14 @@
-require 'linguist/tokenizer'
-
 module Linguist
   # A collection of simple heuristics that can be used to better analysis languages.
   class Heuristics
-    # Public: Given an array of String language names, a
-    # apply all heuristics against the given data and return an array
+    # Public: Given an array of String language names,
+    # apply heuristics against the given data and return an array
     # of matching languages, or nil.
     #
     # data - Array of tokens or String data to analyze.
     # languages - Array of language name Strings to restrict to.
     #
-    # Returns an array of language name Strings, or []
+    # Returns an array of Languages or []
     def self.find_by_heuristics(data, languages)
       if languages.all? { |l| ["Objective-C", "C++"].include?(l) }
         disambiguate_h(data, languages)
@@ -19,6 +17,8 @@ module Linguist
 
     # .h extensions are ambigious between C, C++, and Objective-C.
     # We want to shortcut look for Objective-C.
+    #
+    # Returns an array of Languages or []
     def self.disambiguate_h(data, languages)
       matches = []
       matches << Language["Objective-C"] if data.include?("@interface")
@@ -26,4 +26,3 @@ module Linguist
     end
   end
 end
-
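For readers without Linguist installed, here is a standalone Ruby sketch of the two methods touched in the heuristics diff, so their return values can be checked directly. Assumptions: plain language-name Strings replace Linguist's `Language` objects, the module name `HeuristicsSketch` and the constant `H_CANDIDATES` are hypothetical, and `disambiguate_h` is assumed to end by returning `matches` (the diff does not show its remaining lines).

```ruby
# Standalone sketch of the Heuristics methods shown in the diff.
# Assumption: language-name Strings stand in for Language objects,
# so this runs without the linguist gem. Names are hypothetical.
module HeuristicsSketch
  H_CANDIDATES = ["Objective-C", "C++"].freeze

  # Runs the .h check only when every candidate is Objective-C or C++;
  # otherwise the `if` is skipped and the method returns nil.
  def self.find_by_heuristics(data, languages)
    if languages.all? { |l| H_CANDIDATES.include?(l) }
      disambiguate_h(data, languages)
    end
  end

  # An "@interface" directive is a strong Objective-C signal.
  # Assumption: returns the (possibly empty) matches Array.
  def self.disambiguate_h(data, languages)
    matches = []
    matches << "Objective-C" if data.include?("@interface")
    matches
  end
end

p HeuristicsSketch.find_by_heuristics("@interface Foo : NSObject\n@end", ["Objective-C", "C++"])
# => ["Objective-C"]
p HeuristicsSketch.find_by_heuristics("class Foo {};", ["Objective-C", "C++"])
# => []
p HeuristicsSketch.find_by_heuristics(%w[@interface Foo], ["Objective-C", "C++"])
# => ["Objective-C"]
p HeuristicsSketch.find_by_heuristics("int x;", ["C", "C++"])
# => nil
```

Because `data` may be either a String or an Array of tokens, `data.include?("@interface")` works on both: substring search for a String, exact element match for an Array. When any candidate falls outside Objective-C and C++, the method returns nil, a case the "or nil" comment line covers but the "Returns an array of Languages or []" line does not mention.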