Can a Computer Judge Your Personality Better than Your Friends?

Yesterday, as I was standing in line in my campus bookstore, I heard someone on the radio talk about a new study published in the Proceedings of the National Academy of Sciences (PNAS) showing that a computer algorithm, relying only on the things you “Like” on Facebook, makes more accurate judgments of your personality than your friends do. If you also heard about this study, you probably did not react the way I did yesterday. Having been a reviewer on this study, I had already read the paper. So my reaction was, “Yeah, the study did show that, but it isn’t as simple as this report makes it sound.”

So what does the study show? I personally was intrigued by three things.

1) Clearly there is a sexy news story in saying that computers make better judgments than humans. And that is precisely how this study has been discussed so far.[1] However, the data show that self-other agreement with human judges was about r = .49 (across all Big 5 traits), while self-other agreement with computer-based judgments was about r = .56. Yes, these differences are statistically significant, and NO, we shouldn’t care that they are statistically significant. What these correlations effectively mean is that if you judge yourself to be above the median on a trait, your friends are likely to guess that you are above the median 74.5% of the time, while the computer algorithm guesses correctly 78% of the time (a short sketch of this conversion appears after point 3 below). This is a real difference, so I don’t want to downplay it, but it is important not to oversell it either.

2) To me, and I noted this in my review, one of the most interesting findings from this paper was that both computer-based personality judgments from Facebook Likes *AND* peer judgments of personality predicted self-reports of personality largely independently of each other. This is discussed on p. 3 of the paper, in the first full paragraph under (the beautiful looking) Figure 2. You can also see the results for yourself in Supplemental Table 2. Average self-other agreement with human judgments was r = .42 when controlling for computer judgments. Likewise, average self-other agreement with computer judgments was r = .38 when controlling for human judgments. Both the computer algorithm and human judgments make substantial and unique contributions to self-other agreement (see the partial-correlation sketch after point 3). That is pretty cool if you ask me.

3) Although the paper and the reports make it sound as if computers have some sort of knowledge that we do not, this is of course not true. The computer-based algorithm for making personality judgments is based entirely on the person’s behavior. That is, “Liking” something on Facebook is a behavior. The computer is taking the sum total of those behaviors into account and using them as a basis for “judgment.” And these behaviors came from the person whose personality is being judged. Thus, one could argue that the computer judgments are merely linking self-reports of behavior or preferences (e.g., I like Starbucks) with self-reports of personality.
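For anyone curious where the 74.5% and 78% figures in point 1 come from, here is a minimal sketch in R of the conversion from a correlation to an above-/below-median hit rate (essentially the binomial effect size display); the function name is mine, purely for illustration:

```r
# Binomial effect size display: a correlation r corresponds to an expected
# accuracy of 0.5 + r/2 when guessing whether a target falls above or below
# the median on a trait.
besd_accuracy <- function(r) 0.5 + r / 2

besd_accuracy(0.49)  # human judges:       0.745 (74.5%)
besd_accuracy(0.56)  # computer algorithm: 0.780 (78%)
```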
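And a companion sketch for the partial correlations in point 2: the formula below is the standard first-order partial correlation, and the human-computer agreement value plugged in is an illustrative assumption, not a number reported in the paper:

```r
# First-order partial correlation: the association between x and y after
# removing the variance each shares with z.
partial_r <- function(r_xy, r_xz, r_yz) {
  (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
}

# Self-other agreement for human judges (r = .49), controlling for the
# computer judgments (self-computer r = .56). The human-computer agreement
# of .30 is assumed for illustration only.
partial_r(r_xy = 0.49, r_xz = 0.56, r_yz = 0.30)  # ~.41 with these inputs
```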

I don’t mean to downplay the study here. I thought it was a really interesting and well-conducted study when I reviewed it, and I still do. The study combines a large sample, multiple methodologies, and sophisticated (but appropriate) analytic techniques to examine something really interesting. In those respects, this study is a model for how many of us should be doing psychological research.

[1] All I did was Google “computers are better than humans” and those were the top three stories to appear. I’m told there are many more.

Note: Thanks to David Funder and Simine Vazire for prior comments on this post.

17 thoughts on “Can a Computer Judge Your Personality Better than Your Friends?”

  1. Pingback: A computer can know you better than your friends and relatives (but not as good as your partner) | From experience to meaning...

  2. simine vazire

    great post! another thought i had about this study is that it’s kind of strange for reporters to claim that computers are ‘better than’ human judges, when the criterion they are using for accuracy is the person’s self-report (i.e., a human judge).
    if human judges are the criterion, you’ve already conceded the game to humans.
    i think the true test would be who (self, computer, strangers, close friends, etc.) would do best at predicting an ‘objective’ criterion (e.g., behavior, not as reported/measured by any of these judges).

    1. Ryne Post author

      Thanks Simine! I think you are absolutely right. To the authors’ credit, they did try to look at behavioral prediction in the study in the form of “life outcomes.” Self-reports tended to show higher predictive validity on these, though the authors contend that, because many of those life outcomes came from self-reports themselves, they are not really on equal footing. Actual behavior as reported by an independent third party, or more objective life outcomes (e.g., marriage, divorce, employment status), would be a much better test.

  3. Jonas eklöf

    Both others and computers make decisions based on your behaviour. And the computer can probably sum up more of it, faster, using Likes. Bet a human would do it better if they received the same instructions?

    But it is still below .5, so a coin toss would still be more accurate on the Big 5, yes?

    1. Ryne Post author

      Thanks Jonas! Regarding the .49 and .56 numbers, those are correlation coefficients (http://en.wikipedia.org/wiki/Correlation_coefficient). They can be interpreted in numerous ways, including the way I did in the post. If we ask judges to determine whether the target is above or below the median, then of course 50% accuracy is the base rate. In this case, human judgments had an accuracy of 74.5% compared to 78% for the computer algorithm, both well above the base rate.

  4. David Stillwell

    Hi Ryne, I’m one of the authors of the article – we came across this page when we were looking at how the coverage was perceived on Twitter; I hope this isn’t an intrusion. First, I want to say thank you for such an in-depth and positive peer-review process.

    I think your point about computers and friends predicting different self-report variance is intriguing. We also published a paper on the reliability/validity of status updates as predictors of personality, and Table 4 shows that the same applies to status updates… partialing out the predictions from friends doesn’t much decrease the correlation between status updates and personality. It’s definitely an interesting question as to how all 3 methods are predicting self-reports (see: http://gregorypark.org/assets/Automatic%20Personality%20Assessment_JPSP_OnlineFirst.pdf ). We haven’t yet looked at whether Likes and status updates predict the same variance.

  5. Ryne Post author

    Hi David! No intrusion at all. Your participation in the discussion is very much welcomed and appreciated. Thanks for pointing out Table 4 in the paper you linked. I know I read that paper when it came out, but I definitely didn’t remember that Table. It certainly corresponds closely with what you all found here. And I think looking at the overlap between Likes and Status Updates would be a great idea!

  6. Youyou

    Hi Ryne,

    I am also one of the authors of the study. I very much appreciate your excellent pre- and post-publication reviews. Our paper benefited a lot from your review, and your blog post provides some great insights into the findings.

    I have a comment about point three in the post. I tend to think of Likes as direct observations of behaviour within a natural, online setting, rather than as self-reports of behaviour and preferences. They are certainly subject to users’ motivation to manage their impression, and therefore might not be as naturalistic, but so are other, offline behaviours.

    This is to some extent evident in the fact that Likes-based personality scores predict other Facebook activities better than other self-reported behaviours. A good example would be to contrast Likes with self-reports of sensational interests, an external criterion we used in the study. If Likes are thought of as self-reports of preferences, they should be almost equivalent to self-reports of sensational interests in this case, and we should expect the two to align well. Yet Likes-based personality is still slightly worse at predicting sensational interests than self-reports of personality are. I believe this is at least partially due to Likes sharing more method variance with online activities than self-reports do.

    1. Ryne Post author

      Hi Youyou! Thanks for your comment, and of course, for the great study! Perhaps my point #3 came off a bit too strongly. I don’t personally believe that “Likes” on Facebook are the equivalent of self-reported preferences (i.e., giving someone a survey on their preferences). But I’m also a bit reluctant to call them directly observed behavior. I think of them as somewhere between the two. Something like the diagram below:

      Self-Reported Preference <------------ Mixed ------------> Actual Behavior
      Survey about preferences ------------ Facebook Likes ------------ Credit card purchases

      1. Youyou

        I completely agree with that representation!
        Thanks again for thinking about our study. I’ve been reading your past posts and figured you are an R pro. I’m a big fan of R and I am all up for encouraging all psychologists to switch to R, learn programming, and make use of massive online data. Hopefully our paths will cross somewhere along the conference circuit or through collaboration 🙂

  7. Aaron Weidman

    Very interesting post Ryne, and a fun paper to see. I had a question (really just an open thought) about the implications though: What does this study tell us about the process of people learning about each others’ personalities? If “likes” are trait-diagnostic behavioral residue, then it seems almost necessary that a computer (which can hold thousands of likes “in mind” at once when judging a person) will outperform another person (who can only hold a few likes [perhaps 7 +/- 2] in mind at once when making a judgment). This seems conceptually the same as adding predictors to a regression model; new predictors will always account for more variance, never less (even if a model is over-fit), and a computer’s model simply allows for more predictors than a human’s model (a quick sketch of this point follows below). So, if anything, the humans here outperform the computers in what they can make out of the few predictors in their model, no? And, given that humans will never have the capacity of a computer, I’m not sure from reading the paper what we learn about inferences people make about people–which is really the main goal of personality psychology, no?

    These are not fully formed thoughts by any stretch, so I’d be interested to hear yours!
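    For instance, here is a quick, purely illustrative R sketch of that regression point (made-up data, nothing from the study):

    ```r
    # Illustrative only: R^2 from a linear model never decreases when a
    # predictor is added, even if the added predictor is pure noise.
    set.seed(1)
    n     <- 200
    x1    <- rnorm(n)
    noise <- rnorm(n)                # a predictor unrelated to the outcome
    y     <- 0.5 * x1 + rnorm(n)

    summary(lm(y ~ x1))$r.squared           # baseline R^2
    summary(lm(y ~ x1 + noise))$r.squared   # never lower, usually a touch higher
    ```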

    1. Ryne Post author

      Hi Aaron,

      Sorry for taking so long to reply. When I first read your post I wanted to take some time to think about it. Then I got busy, but I kept thinking about it. First, I actually think the main goal of personality psychology is prediction, which probably explains my descriptive-focused (rather than explanation-focused) research. Beyond that though, I’m not sure the paper demonstrates that the superior memory capacity of computers makes them better judges. As Simine pointed out, if the criterion is a human judge (the self), haven’t we already conceded that humans are the best? (Actually though, I think I’d rather consider the acquaintances the criterion.) While it is certainly true that computers are better at keeping the total number and kind of Likes in memory than humans are, I suspect that humans still have some advantages over computers. Facebook Likes are easy to quantify, but other human behaviors are not. Sarcastic remarks come to mind, but I’m sure there are many others. To be sure, computers are getting better at detecting things that we once thought only humans could measure. Whether they will ever be able to outperform humans on all of those judgment tasks is an interesting question.

  8. aidan

    Ryne,

    This is a very interesting post about a very interesting article. In many respects I would agree with you that the article is impressive for many of the reasons you say (large sample, multi-method, sophisticated technical aspects). However, I also think there are major limitations that I’ve not seen discussed. Several aspects of the paper left me concerned with the methodology, presentation of the findings, and conclusions.

    First, similar to the point Simine made above, it is well known that human self and other ratings of personality are not always based on the same information. I’m not actually concerned that the authors used self ratings as ground truth. But I think it is worth considering the implications of the fact that the computer was trained on self ratings, so it was always doing its best to approximate the self ratings, whereas the other human was not basing their ratings on how the target would rate themselves. I don’t know whether a change in instruction for the other human rater would have much of an impact, but it really isn’t as clean of a comparison as the authors might have hoped for.

    Second, another aspect that I found puzzling in the article is that in Figure 3 the authors report correlations instead of regression coefficients. For the life of me I can’t figure out why this would be a preferred method, especially since there is both shared and non-overlapping variance in the ratings. The fact that any of the three methods of rating outperforms the others in raw correlations is meaningless unless we know that the variance accounted for by the stronger predictor encompasses the variance accounted for by the weaker predictor. How do we know this isn’t just separate variance? How much would the R2 go up if all were entered together in a model? How about just the other and the computer? I can’t help but think this was the way to present the data that made the computer look best relative to other methods. But of course I have no way of knowing that.

    Finally, I think we may differ in what we view as the main focus of personality psychology. I would disagree that it can be boiled down to prediction, although I agree that it is an important aspect of PP. I would argue PP has many goals, which broadly fall under bettering our understanding of human functioning writ large, but include both descriptive and mechanistic aims. This particular study is a marvel in terms of addressing the predictive aims, but I’m not sure it tells us much about personality per se.

    For all these reasons, I can’t really get behind the title of the article. It is of course sensational, but it doesn’t strike me as completely accurate, nor does it tell the full story.

    1. Ryne Post author

      Thanks for your comment Aidan. These are all very good points. I’ll only speak to your second point here though. As I recall, I made this very point in my review and suggested that the regression results (in the supplemental file) be included in the main text. Unfortunately, this change was not made. I completely agree with you that the regression results are more interesting (be sure to look at the supplemental file for them). They did show that both Facebook and peer ratings had substantial and largely unique contributions to predicting the self-ratings. Regarding the decision not to do that, I actually blame the format of PNAS, which has very strict word limits. Tables and Figures count against you tremendously (I’ve attempted to write papers that could go there before, but I simply couldn’t make them short enough). As such, it may have simply been a pragmatic decision to leave those in the supplemental file.

      Best,

      -Ryne

      1. Aidan

        Hi Ryne,
        Thanks for your reply. And glad to hear you raised the issue. I’m open to the possibility that I’m missing or misunderstanding things, but I can’t find any regression coefficients in the supplementary materials, just correlations (e.g., in Table S4). Please correct me if I’m wrong.

        I do think that to the extent this is not overlapping variance, or to the extent the computer is highly overlapping with the self but not with the other ratings in predicting outcomes, it challenges the article’s narrative. But who knows without the presented results.

        Aidan

        1. Ryne Post author

          Hi Aidan,

          I was actually referring to Table S2. It includes both zero-order and partial correlations (controlling for the third variable) for the combinations of ratings. That is, the third line contains the correlations between the self and peer ratings for each trait. The next line contains those same correlations after controlling for the computer ratings. The correlations get smaller, but not dramatically so. The 5th and 6th lines do the same for the self and computer ratings (controlling for peers in line 6). Thus, one can conclude that while there is some overlap between peer and computer judgments, both are largely associated with self ratings in a unique fashion.

  9. Aidan

    Sorry, perhaps it wasn’t clear from my original post and my reply that I was talking about predicting external variables. I did appreciate the partial correlations in the supplementary material and would agree they are fine for supplemental.

    Aidan

