Personality can be partially revealed in one’s use of language. In spoken form, language leaves residues of personality in the conversations we hold (Holtzman, Vazire, & Mehl, in press; Mehl, Gosling, & Pennebaker, 2006; Vazire & Mehl, 2008) and the personal narratives we tell (McAdams, 2008). In written form, language leaves residues of personality in our poetry (Stirman & Pennebaker, 2001), essays (Pennebaker & King, 1999), and blogs (Yarkoni, 2010). In and of itself, the finding that language is associated with personality is relatively unsurprising. Since most human communication is mediated through language, the things we say should tell us something about the people doing the saying. But formalizing this intuition in personality science has been quite challenging. The goals of this brief column are to informally review a selection of the recent work exploring the relation between personality and language, to discuss the abundance of resources that cognitive psychology offers for these types of endeavors, and to reference the literature on emerging methods so as to facilitate further exploration.
For many years, personality psychologists have studied the relation between word counts and personality. Most projects have relied on the Linguistic Inquiry and Word Count (LIWC) method developed by Pennebaker and colleagues. This method groups individual words into semantically coherent categories, each made up of many words (Mehl & Gill, 2010; Pennebaker, Francis, & Booth, 2001). LIWC analyses have identified numerous associations between personality and word use. For instance, these studies have shown that Neurotic people tend to use more negative emotion words and Extraverted people tend to use more positive emotion words (Pennebaker & King, 1999). But such studies also have a downside: because each LIWC category aggregates over many different words, researchers using this approach cannot identify more specific associations between personality and individual words.
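To make the category-counting idea concrete, here is a minimal sketch in Python. The two categories and their word lists are hypothetical toy examples, not the actual LIWC dictionary, and the function name category_rates is our own illustration rather than part of any LIWC software.

```python
import re
from collections import Counter

# Hypothetical toy categories; the real LIWC dictionary is far larger
# and is not reproduced here.
CATEGORIES = {
    "posemo": {"happy", "good", "love", "nice"},
    "negemo": {"sad", "bad", "hate", "awful"},
}

def category_rates(text):
    """Return each category's share of all word tokens in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for name, words in CATEGORIES.items():
            if token in words:
                counts[name] += 1
    total = len(tokens) or 1  # avoid division by zero on empty text
    return {name: counts[name] / total for name in CATEGORIES}

rates = category_rates("I love this nice day, but traffic was awful.")
print(rates)
```

Once "love" and "nice" are both counted under the same category, the output no longer distinguishes between them, which is the aggregation cost described above.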
Yarkoni (2010) recently demonstrated the value of word-level analyses. In a large-scale analysis of the relation between bloggers’ personalities and the contents of their blogs, Yarkoni identified thousands of associations between broad and narrow personality traits and individual English words, such as a robust positive correlation between the Neuroticism facet Self-Consciousness and the word “sizes”. It turned out that “sizes” tended to appear in discussions about clothing. That makes sense in hindsight, because self-conscious people are concerned about the way they look, but the finding would be difficult to predict a priori. Moreover, such a finding would be diluted in LIWC analyses that group a single word like “sizes” with other words. Yarkoni (2010) further discusses the complementary roles of LIWC-style and word-level analyses.
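A word-level analysis of this kind can be sketched as a correlation between how often each writer uses a single word and that writer's trait score. The texts and scores below are made up purely for illustration, and the helper names (word_rate, pearson) are assumptions of this sketch; it is not a reproduction of Yarkoni's (2010) procedure.

```python
import math
import re

def word_rate(text, word):
    """Relative frequency of `word` among the tokens of `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return tokens.count(word) / len(tokens) if tokens else 0.0

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

# Made-up texts and made-up trait scores, one pair per writer.
texts = [
    "these sizes never fit and the sizes run small",
    "we went hiking near the lake",
    "i checked the sizes twice before ordering",
    "dinner with friends was fun",
]
self_consciousness = [4.5, 2.0, 3.8, 2.2]

rates = [word_rate(t, "sizes") for t in texts]
r = pearson(rates, self_consciousness)
print(round(r, 2))
```

In a real study the same computation would be repeated over thousands of words and many writers, so correction for multiple comparisons would be needed before interpreting any single correlation.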
For all their benefits, word count methods—whether they count categories or individual words—have limits. How do we capture not only word counts, but also the meaning of utterances embedded in context-rich sentences? What tools can personality psychology add to its toolbox to better capture personality via language use? To address these questions, we suggest that personality psychologists should look further afield, to the disciplines of cognitive psychology and computer science.
Around the time that Pennebaker began establishing his line of research with the LIWC system (Pennebaker et al., 2001), a seemingly unrelated line of work was developing in cognitive psychology. Its goal was simple: Landauer and colleagues wanted to understand the meaning of English words in relation to one another (Landauer & Dumais, 1997; Landauer, Foltz, & Laham, 1998), in effect building a quantitative thesaurus, by exploring patterns of co-occurrence between words within sentences (e.g., “dog” with “bite”; “happy” with “gleeful”). What they eventually developed was a model of semantic space, known as Latent Semantic Analysis (LSA), sophisticated enough to earn passing grades on the TOEFL, an English proficiency exam for prospective college students who speak English as a second language.
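The co-occurrence idea can be illustrated with a small sketch. Each word is represented by its counts across a set of sentences, and two words are compared by the cosine of their count vectors. This is a simplification: LSA additionally applies a dimensionality reduction (singular value decomposition) to a much larger matrix, a step omitted here. The sentences and function names are assumptions made for illustration.

```python
import math
import re
from collections import defaultdict

def word_vectors(sentences):
    """Map each word to its vector of counts across the sentences."""
    vectors = defaultdict(lambda: [0] * len(sentences))
    for i, sentence in enumerate(sentences):
        for token in re.findall(r"[a-z']+", sentence.lower()):
            vectors[token][i] += 1
    return dict(vectors)

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Made-up sentences echoing the examples in the text.
sentences = [
    "the dog will bite the stranger",
    "a dog can bite when scared",
    "the happy child felt gleeful",
    "she was happy and gleeful today",
]
vecs = word_vectors(sentences)
sim_dog_bite = cosine(vecs["dog"], vecs["bite"])
sim_dog_happy = cosine(vecs["dog"], vecs["happy"])
print(sim_dog_bite, sim_dog_happy)
```

Words that appear in the same contexts end up with similar vectors, so “dog” sits closer to “bite” than to “happy” in this toy space.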
Although LSA and related techniques (e.g., Lund & Burgess, 1996) certainly lend themselves to measuring individual differences, these methods have been almost entirely overlooked by personality psychologists. To the best of our knowledge, no studies have modeled the relation between personality and word co-occurrences. We believe that such studies could yield considerable returns. For instance, one could ask questions such as: Do masochists tend to associate “pain” with “good” and “pleasure” with “bad”? Do lesbians tend to associate masculine concepts with words related to oneself? Are Neurotic individuals likely to show a stronger association between self-referential words (“me”, “I”, etc.) and negative terms (“bad” or “stupid”)? The research terrain is wide open here, as virtually no work like this has been done.
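As a rough illustration of how the last question might be operationalized, the sketch below computes, for each person, the share of self-referential sentences that also contain a negative term. The word lists, the sample sentences, and the function self_negative_rate are all hypothetical, and this simple sentence-level rate is a stand-in for, not an implementation of, a full semantic-space model.

```python
import re

# Hypothetical word lists, loosely following the examples in the text.
SELF_WORDS = {"i", "me", "my", "myself"}
NEGATIVE_WORDS = {"bad", "stupid"}

def self_negative_rate(sentences):
    """Share of self-referential sentences that also contain a negative word."""
    self_count = 0
    both_count = 0
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        if tokens & SELF_WORDS:
            self_count += 1
            if tokens & NEGATIVE_WORDS:
                both_count += 1
    return both_count / self_count if self_count else 0.0

# Made-up sentences for two hypothetical people.
person_a = ["I feel stupid today", "My work was bad", "The weather is bad"]
person_b = ["I like my job", "The movie was bad", "It was fun for me"]

rate_a = self_negative_rate(person_a)
rate_b = self_negative_rate(person_b)
print(rate_a, rate_b)
```

Per-person rates like these could then be correlated with Neuroticism scores across a sample, in the same way as any other individual-difference measure.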
As a step toward such investigations, colleagues of ours have joined us in developing tools to explore idiosyncrasies in semantic structure (Holtzman, Schott, Jones, Balota, & Yarkoni, under review). Our research builds on recent work by Jones and Mewhort (2007), who used a model (BEAGLE) similar to LSA to capture semantic relations between individual words. As a sample application, we have used a variant of the BEAGLE model to automate the identification of media bias, but the same method could easily be used to explore individual differences, and we are conducting such research now. To facilitate future use, we offer user-friendly software tools that implement the methods (Contrast Analysis of Semantic Similarity: www.casstools.org). The recent trend in personality psychology toward the collection of large, naturalistic datasets ensures that there will be ample opportunities to explore the relation between semantic structure and personality. Possible datasets for such explorations include Mehl’s (2006) Electronically Activated Recorder (EAR) transcripts, Pennebaker’s extensive set of language corpora (e.g., Stirman & Pennebaker, 2001), Fast and Funder’s (2010) conversation transcripts, Yarkoni’s (2010) blog database, and McAdams’s (2008) life narratives. This is just a sample of the text that has already been collected.
The data are available. The methods are emerging. How much might we learn about personality by exploring individual differences in semantic structure?
Exploring These Topics on the Web
BEAGLE code
http://www.indiana.edu/~clcl/BEAGLE/
Tools for Contrast Analysis of Semantic Similarity
http://casstools.org
Demonstrations: Word usage and personality
http://homepage.psy.utexas.edu/homepage/faculty/pennebaker/home2000/Words.html
Latent Semantic Analysis (LSA)
http://lsa.colorado.edu/
Linguistic Inquiry and Word Count (LIWC)
http://www.liwc.net/
LIWC Description
http://dingo.sbs.arizona.edu/~mehl/eReprints/MehlGill2010ATA.pdf
References
Fast, L. A., & Funder, D. C. (2010). Gender differences in the correlates of self-referent word use: Authority, entitlement, and depressive symptoms. Journal of Personality, 78, 313-338.
Holtzman, N. S., Vazire, S., & Mehl, M. R. (in press). Sounds like a narcissist: Behavioral manifestations of narcissism in everyday life. Journal of Research in Personality.
Holtzman, N. S., Schott, J. P., Jones, M. N., Balota, D. A., & Yarkoni, T. (under review). Individual and group differences in semantic associations: An application to media bias.
Jones, M. N., & Mewhort, D. J. K. (2007). Representing word meaning and order information in a composite holographic lexicon. Psychological Review, 114, 1-37.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: the Latent Semantic Analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104, 211-240.
Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25, 259-284.
Lund, K., & Burgess, C. (1996). Producing high-dimensional spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers, 28, 203-208.
McAdams, D. P. (2008). Personal narratives and the life story. In O. John, R. W. Robins & L. A. Pervin (Eds.), Handbook of personality: Theory and research. NY: Guilford Press.
Mehl, M. R., & Gill, A. J. (2010). Automatic text-analysis. In S. D. Gosling & J. A. Johnson (Eds.), Advanced methods for conducting online behavioral research (pp. 109-127). Washington, DC: American Psychological Association.
Mehl, M. R., Gosling, S. D., & Pennebaker, J. W. (2006). Personality in its natural habitat: Manifestations and implicit folk theories of personality in daily life. Journal of Personality and Social Psychology, 90, 862-877.
Pennebaker, J. W., Francis, M. E., & Booth, R. J. (2001). Linguistic Inquiry and Word Count: LIWC 2001. Mahwah, NJ: Erlbaum.
Pennebaker, J. W., & King, L. A. (1999). Linguistic styles: language use as an individual difference. Journal of Personality and Social Psychology, 77, 1296-1312.
Stirman, S. W., & Pennebaker, J. W. (2001). Word use in the poetry of suicidal and nonsuicidal poets. Psychosomatic Medicine, 63, 517-522.
Vazire, S., & Mehl, M. R. (2008). Knowing me, knowing you: The accuracy and unique predictive validity of self-ratings and other-ratings of daily behavior. Journal of Personality and Social Psychology, 95, 1202-1216.
Yarkoni, T. (2010). Personality in 100,000 words: A large-scale analysis of personality and word use among bloggers. Journal of Research in Personality.