Contrast Analysis of Semantic Similarity: CASS

text-analytic tools for social scientists



NEWS: The corresponding article was published in March 2011 and is available as a PDF.



Social scientists strive to understand how people are similar and how they differ. One key goal is to understand differences in the way people, or groups of people, think about concepts: one person thinks that Republicans are amazing, while the next thinks that Republicans are horrible. Such associations reveal biases and preferences. To estimate conservative bias in a news station, for example, one might measure its tendency to associate conservative concepts (conservative, republican) with good concepts (good, positive, etc.), compared with its tendency to associate conservative concepts with bad concepts. The resulting difference would indicate the degree of conservatism. This comparison can be further refined by taking into account the associations made for liberal concepts:

[(conservative & good) - (conservative & bad)] - [(liberal & good) - (liberal & bad)]

The output of this equation is an estimate of relative conservative bias. CASS applies this contrast to text data (e.g., television transcripts) by extracting information about the associations among concepts in a user-supplied text file.
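The contrast above is plain arithmetic once association scores are available. The sketch below is a minimal illustration, not the CASS implementation: it assumes a hypothetical similarity function `sim(a, b)` and, as an additional assumption, averages similarities over all word pairs when a concept is represented by several words (e.g., conservative and republican). The scores in the usage example are made up purely to show the calculation.

```python
from itertools import product
from statistics import mean


def mean_similarity(sim, words_a, words_b):
    """Average a pairwise similarity function over every (a, b) word pair.

    Averaging across word lists is an illustrative assumption, not
    necessarily how CASS combines multiple words per concept.
    """
    return mean(sim(a, b) for a, b in product(words_a, words_b))


def cass_contrast(sim, target, foil, pos, neg):
    """[(target & pos) - (target & neg)] - [(foil & pos) - (foil & neg)]."""
    target_diff = mean_similarity(sim, target, pos) - mean_similarity(sim, target, neg)
    foil_diff = mean_similarity(sim, foil, pos) - mean_similarity(sim, foil, neg)
    return target_diff - foil_diff


# Hypothetical association scores, invented for illustration only.
scores = {
    ("conservative", "good"): 0.40, ("conservative", "bad"): 0.20,
    ("republican", "good"): 0.50, ("republican", "bad"): 0.10,
    ("liberal", "good"): 0.30, ("liberal", "bad"): 0.25,
}


def sim(a, b):
    return scores[(a, b)]


bias = cass_contrast(sim, ["conservative", "republican"], ["liberal"], ["good"], ["bad"])
print(round(bias, 2))
```

With these invented scores the conservative terms lean toward "good" by 0.30 on average and the liberal term by 0.05, so the contrast is about 0.25; a positive value indicates relative conservative bias, and a negative value indicates relative liberal bias.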

The CASS method complements text-analytic approaches that make inferences from word frequencies (e.g., how often a person uses the word "conservative"). Using the word "conservative" does not necessarily mean that a person is conservative: it is important to account for the valence of the sentences in which the word appears.
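The difference between counting words and measuring associations can be seen in a toy example. The sketch below is a deliberately simple stand-in, not the semantic-space model CASS uses: each word is represented by how often it occurs in each sentence (a term-by-sentence vector), and association is the cosine similarity between two such vectors. The two texts and the sentence splitter are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def sentences(text):
    """Split text into lowercase token lists, one per sentence (naive splitter)."""
    parts = re.split(r"[.!?]+", text.lower())
    return [re.findall(r"[a-z']+", p) for p in parts if p.strip()]


def occurrence_vector(word, sents):
    """Count how often `word` occurs in each sentence (one term-by-sentence row)."""
    return [s.count(word) for s in sents]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def association(w1, w2, text):
    """Toy association score between two words within one text."""
    sents = sentences(text)
    return cosine(occurrence_vector(w1, sents), occurrence_vector(w2, sents))


# Hypothetical texts: same frequency of "conservative", opposite valence.
text_a = "The conservative plan is good. The conservative senator spoke. Taxes rose."
text_b = "The conservative plan is bad. The conservative senator spoke. Taxes rose."

for name, text in [("A", text_a), ("B", text_b)]:
    count = Counter(w for s in sentences(text) for w in s)["conservative"]
    good = association("conservative", "good", text)
    bad = association("conservative", "bad", text)
    print(name, count, round(good, 3), round(bad, 3))
```

Both texts use "conservative" twice, so a frequency count cannot tell them apart, whereas the association scores reverse: text A links "conservative" with "good" and text B links it with "bad".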

CASS could be useful beyond the study of political slant. We believe that CASS has the potential to extract information about a wide array of topics, such as personality, self-esteem, attitudes, and stereotypes.



To start your own CASS studies, or to try CASS out on the sample text that we provide, please read the documentation.



The software was written by Tal Yarkoni and Nick Holtzman (tyarkoni@gmail.com; nick.holtzman@gmail.com). The CASS method was originally described in a published paper by:

- Nick Holtzman, doctoral candidate at Washington University in St. Louis

- John Paul Schott, doctoral candidate at Washington University in St. Louis

- Mike N. Jones, assistant professor at Indiana University; creator of the BEAGLE model

- Dave Balota, professor at Washington University in St. Louis

- Tal Yarkoni, postdoctoral fellow at Columbia and the University of Colorado at Boulder



CASS is a free software utility. Business consultation (e.g., for software changes) is available for a fee.

Feel free to contact Nick or Tal with a description of your goals, and we will provide a cost estimate.



Last updated: 14 April 2011