Class Cass::Stats

  1. cass/lib/cass/stats.rb
Parent: Object

Collects miscellaneous descriptive statistics methods. These are generally not hooked up to the primary processing stream and must be called on an ad-hoc basis.

Methods

public class

  1. string_tokens
  2. word_count

Public class methods

string_tokens (text, s)

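Count the number of times a given token s occurs in text. Note that s is interpolated into a regular expression, so regex metacharacters in s are treated as pattern syntax, and matches are not limited to whole words.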

# File cass/lib/cass/stats.rb, line 34
    def self.string_tokens(text, s)
      text.scan(/#{s}/).size
    end
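
Because s is placed inside a regex literal, the result depends on whether s contains metacharacters. The following is a minimal sketch of that behaviour; it uses a standalone copy of the method body so it runs without loading Cass, and the sample text is made up for illustration.

```ruby
# Standalone copy of the one-line method body shown above (not loaded from Cass).
def string_tokens(text, s)
  text.scan(/#{s}/).size
end

text = "a.b a.b axb"

# Unescaped: "." is a regex wildcard, so "axb" is counted too.
plain = string_tokens(text, "a.b")

# Escaped with Regexp.escape: "." only matches a literal dot.
literal = string_tokens(text, Regexp.escape("a.b"))

puts plain    # 3
puts literal  # 2
```

If the token should be matched literally, escaping it before the call, as in the second call, is one option.
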
word_count (text, stopwords=nil, save=nil)

Takes a string (or an array of strings, which is joined with spaces) and outputs a list of all words encountered, sorted by frequency count in descending order. Words are split on whitespace only and no other processing is performed, so punctuation attached to a word is counted as part of that word, and differently capitalized forms are counted separately. Preprocess the string before calling this method if that is not what you want. Arguments:

  • text: the string to count token occurrences in.
  • stopwords: optional path to a stopword file, one word per line. Words in the file are excluded from the count.
  • save: the filename to save the results to. If left nil, will print to screen.
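
Since words are split on whitespace only, a preprocessing pass before the call might look like the sketch below. The cleaning rules here (lowercasing, replacing punctuation with spaces) are an assumption chosen for illustration, not something Cass prescribes.

```ruby
raw = "The cat. The CAT, the hat!"

# Without preprocessing, punctuation and case produce distinct "words".
tokens_raw = raw.split(/\s+/)

# Lowercase, replace anything that is not a letter, digit, apostrophe or
# whitespace with a space, then strip so the split yields no empty first token.
cleaned = raw.downcase.gsub(/[^a-z0-9'\s]/, " ").strip
tokens_cleaned = cleaned.split(/\s+/)

p tokens_raw
p tokens_cleaned
```

The cleaned string, rather than the raw one, would then be passed as the text argument.
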
# File cass/lib/cass/stats.rb, line 18
    def self.word_count(text, stopwords=nil, save=nil)
      sw = {}
      text = text.join(" ") if text.class == Array
      # Stopword file is read one word per line; File.readlines closes the file when done
      File.readlines(stopwords).each { |l|  sw[l.strip] = 1 } if !stopwords.nil?
      words = text.split(/\s+/)
      counts = Hash.new(0)
      words.each { |w| counts[w] += 1 if !sw.key?(w) }
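      # Sort by descending count, then format each pair as "word: count" (map keeps the formatted strings)
      counts = counts.sort { |a,b| b[1] <=> a[1] }.map { |l| "#{l[0]}: #{l[1]}" }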
      if save.nil?
        puts counts
      else
        # Block form closes the file once the results are written
        File.open(save, 'w') { |f| f.puts counts }
      end
    end
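
To illustrate the stopword file format and the intended "word: count" output, here is a minimal sketch. The helper word_frequencies is hypothetical and not part of Cass: it follows the same counting steps as the source above but returns the formatted lines instead of printing or saving them, so the result can be inspected. The sample text and stopword list are invented.

```ruby
require "tempfile"

# Hypothetical helper (not part of Cass): same counting steps as word_count,
# but returns the formatted lines rather than printing or saving them.
def word_frequencies(text, stopwords = nil)
  text = text.join(" ") if text.is_a?(Array)
  sw = {}
  File.readlines(stopwords).each { |l| sw[l.strip] = true } unless stopwords.nil?
  counts = Hash.new(0)
  text.split(/\s+/).each { |w| counts[w] += 1 unless sw.key?(w) }
  # Sort by descending count, then format each pair as "word: count".
  counts.sort { |a, b| b[1] <=> a[1] }.map { |word, n| "#{word}: #{n}" }
end

lines = nil
Tempfile.create("stopwords") do |f|
  f.puts "the"   # one stopword per line
  f.flush
  lines = word_frequencies("the cat saw the cat run", f.path)
end

puts lines
```

The most frequent word comes first; the relative order of words with equal counts is not guaranteed by the sort.
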