{"id":59,"date":"2014-11-17T04:23:38","date_gmt":"2014-11-17T04:23:38","guid":{"rendered":"http:\/\/rynesherman.com\/blog\/?p=59"},"modified":"2014-11-19T02:01:26","modified_gmt":"2014-11-19T02:01:26","slug":"misinterpreting-confidence-intervals","status":"publish","type":"post","link":"http:\/\/rynesherman.com\/blog\/misinterpreting-confidence-intervals\/","title":{"rendered":"(Mis)Interpreting Confidence Intervals"},"content":{"rendered":"<p>In a recent paper <a href=\"http:\/\/link.springer.com\/article\/10.3758%2Fs13423-013-0572-3\">Hoekstra, Morey, Rouder, &amp; Wagenmakers<\/a> argued that confidence intervals are just as prone to misinterpretation as traditional <em>p<\/em>-values (for a nice summary, see this <a href=\"http:\/\/digest.bps.org.uk\/2014\/11\/reformers-say-psychologists-should.html\">blog<\/a> post). They draw this conclusion based on responses to six questions from 442 bachelor students, 34 master students, and 120 researchers (PhD students and faculty). The six questions were of True \/ False format and are shown here (this is taken directly from their Appendix; please don\u2019t sue me, and if I am breaking the law I will remove this without hesitation):<\/p>\n<p><a href=\"http:\/\/rynesherman.com\/blog\/wp-content\/uploads\/2014\/11\/Bubledorf.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-60 size-full\" src=\"http:\/\/rynesherman.com\/blog\/wp-content\/uploads\/2014\/11\/Bubledorf.png\" alt=\"Bubledorf\" width=\"584\" height=\"673\" srcset=\"http:\/\/rynesherman.com\/blog\/wp-content\/uploads\/2014\/11\/Bubledorf.png 584w, http:\/\/rynesherman.com\/blog\/wp-content\/uploads\/2014\/11\/Bubledorf-260x300.png 260w\" sizes=\"auto, (max-width: 584px) 100vw, 584px\" \/><\/a><\/p>\n<p>Hoekstra et al. note that all six statements are false and therefore the correct response is to mark each as False. [1] The results were quite disturbing. The average number of statements marked True, across all three groups, was 3.51 (58.5%). 
Particularly disturbing is the fact that statement #3 was endorsed by 73%, 68%, and 86% of bachelor students, master students, and researchers, respectively. Such a finding demonstrates that people often use confidence intervals simply to revert to NHST (i.e., if the CI does not contain zero, reject the null).<\/p>\n<p>However, it was questions #4 and #5 that caught my attention when reading this study. They caught my attention because my understanding of confidence intervals told me they are true. The correct interpretation of a confidence interval, according to Hoekstra et al., is \u201cIf we were to repeat the experiment over and over, then 95% of the time the confidence intervals contain the true mean.\u201d Now, if you are like me, you might be wondering: how is that different from a 95% probability that the true mean lies within the interval? Despite the risk of looking ignorant, I asked that very question on Twitter:<\/p>\n<p><a href=\"http:\/\/rynesherman.com\/blog\/wp-content\/uploads\/2014\/11\/TwitterPic.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-61\" src=\"http:\/\/rynesherman.com\/blog\/wp-content\/uploads\/2014\/11\/TwitterPic.png\" alt=\"TwitterPic\" width=\"643\" height=\"325\" srcset=\"http:\/\/rynesherman.com\/blog\/wp-content\/uploads\/2014\/11\/TwitterPic.png 643w, http:\/\/rynesherman.com\/blog\/wp-content\/uploads\/2014\/11\/TwitterPic-300x151.png 300w, http:\/\/rynesherman.com\/blog\/wp-content\/uploads\/2014\/11\/TwitterPic-624x315.png 624w\" sizes=\"auto, (max-width: 643px) 100vw, 643px\" \/><\/a><\/p>\n<p>Alexander Etz (@AlexanderEtz) provided an excellent <a href=\"http:\/\/nicebrain.wordpress.com\/2014\/11\/16\/can-confidence-intervals-save-psychology-part-1\/\">answer to my question<\/a>. 
His post is rather short, but I\u2019ll summarize it here anyway: from a Frequentist framework (under which CIs fall), one cannot assign a probability to a single event, or in this case, a single CI. That is, the CI either contains \u03bc (p = 1) or it does not (p = 0), from a Frequentist perspective.<\/p>\n<p>Despite Alexander\u2019s clear (and correct) explanation, I still reject it. I reject it on the grounds that it is practically useful to think of a single CI as having a 95% chance of containing \u03bc. I\u2019m not alone here. Geoff Cumming also thinks so. In his <a href=\"http:\/\/www.latrobe.edu.au\/psy\/research\/cognitive-and-developmental-psychology\/esci\">book<\/a> (why haven\u2019t you bought this book yet?) on p. 78 he provides two interpretations for confidence intervals that match my perspective. The first interpretation is \u201cOne from the Dance of CIs.\u201d This interpretation fits precisely with Hoekstra et al.\u2019s definition. If we repeated the experiment indefinitely we would approach an infinite number of CIs and 95% of those would contain \u03bc. The second interpretation (\u201cInterpret our Interval\u201d) says the following:<\/p>\n<blockquote><p>It\u2019s tempting to say that the probability is .95 that \u03bc lies in our 95% CI. Some scholars permit such statements, while others regard them as wrong, misleading, and wicked. The trouble is that mention of probability suggests \u03bc is a variable, rather than having a fixed value that we don\u2019t know. Our interval either does or does not include \u03bc, and so in a sense the probability is either 1 or 0. I believe it\u2019s best to avoid the term \u201cprobability,\u201d to discourage any misconception that \u03bc is a variable. 
However, in my view it\u2019s acceptable to say, \u201cWe are 95% confident that our interval includes \u03bc,\u201d provided that we keep in the back of our minds that we\u2019re referring to 95% of the intervals in the dance including \u03bc, and 5% (the red ones) missing \u03bc.<\/p><\/blockquote>\n<p>So in Cumming\u2019s view, question #4 would still be False (because it misleads one into thinking that \u03bc is a variable), but #5 would be True. Regardless, it seems clear that there is some debate about whether #4 and #5 are True or False. My personal belief is that it is okay to mark them both True. I\u2019ve built a simple R example to demonstrate why.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n# First, create 1000 datasets of N=100 each from a normal distribution.\r\nset.seed(5)\r\nsims &lt;- 1000\r\ndatasets &lt;- list()\r\nfor(i in 1:sims) {\r\n  datasets&#x5B;&#x5B;i]] &lt;- rnorm(100)\r\n}\r\n\r\n# Now get the 95% confidence interval for each dataset.\r\nout &lt;- matrix(unlist(lapply(datasets, function(x) t.test(x)$conf.int)), ncol=2, byrow=TRUE)\r\ncolnames(out) &lt;- c(&quot;LL&quot;, &quot;UL&quot;)\r\n# Count the number of confidence intervals containing Mu (here, 0)\r\nres &lt;- ifelse(out&#x5B;,1] &lt;= 0 &amp; out&#x5B;,2] &gt;= 0, 1, 0)\r\nsum(res) \/ sims\r\n# Thus, ~95% of our CIs contain Mu\r\n<\/pre>\n<p>This code creates 1000 datasets of N=100 by randomly drawing scores from a normal distribution with \u03bc = 0 and \u03c3 = 1. It then computes a 95% confidence interval for the mean for each dataset. Lastly, it counts how many of those contain \u03bc (0). In this case, it is just about 95%.[2] This is precisely the definition of a confidence interval provided by Hoekstra et al. If we repeat an experiment many times, 95% of our confidence intervals should contain \u03bc. However, if we were just given one of those confidence intervals (say, at random), there would also be a 95% chance it contains \u03bc. 
So if we think of our study, and its confidence interval, as one of many possible studies and intervals, we can be 95% confident that this particular interval contains the population value.<\/p>\n<p>Moreover, this notion can be extended beyond a single experiment. That is, rather than thinking about repeating the same experiment many times, we can think of all of the different experiments (on different topics with different \u03bcs) we conduct and note that 95% of them will produce a confidence interval containing the relevant \u03bc, but 5% will not. Therefore, while I think I understand and appreciate why Hoekstra et al. consider the answers to #4 and #5 to be False, I disagree. I think that they are practically useful interpretations of a CI. If it violates all that is statistically holy and sacred, then damn me to statistical hell.<\/p>\n<p>Despite this conclusion, I do not mean to undermine the research by Hoekstra et al. Indeed, my point has little bearing on the overall conclusion of their paper. Even if questions #4 and #5 were removed, the results are still incredibly disturbing and suggest that we need serious revisions to our statistical training.<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>[1] The last sentence of the instructions makes it clear that all True and all False are both possibilities. How many people actually believed that instruction is another question.<\/p>\n<p>[2] Just for fun, I also calculated the proportion of times a given confidence interval contains the sample mean from a replication. The code you can run is below, but the answer is about 84.4%, which is close to Cumming\u2019s (p. 
128) CI Interpretation 6 \u201cPrediction Interval for a Replication Mean\u201d of 83%.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n# Now get the sample means\r\nMs &lt;- unlist(lapply(datasets, mean))\r\n# For each confidence interval, determine how many other sample means it captured\r\nreptest &lt;- sapply(Ms, function(x) ifelse(out&#x5B;,1] &lt;= x &amp; out&#x5B;,2] &gt;= x, 1, 0))\r\n# Remove the diagonal (each CI trivially contains its own sample mean)\r\ndiag(reptest) &lt;- NA\r\n# Now summarize it:\r\nmean(colMeans(reptest, na.rm=TRUE)) # So ~84.4% chance of a replication mean falling within the 95% CI\r\n<\/pre>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In a recent paper Hoekstra, Morey, Rouder, &amp; Wagenmakers argued that confidence intervals are just as prone to misinterpretation as traditional p-values (for a nice summary, see this blog post). They draw this conclusion based on responses to six questions from 442 bachelor students, 34 master students, and 120 researchers (PhD students and faculty). 
The [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[4,9,3],"class_list":["post-59","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-psychology","tag-r","tag-statistics"],"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"","_links":{"self":[{"href":"http:\/\/rynesherman.com\/blog\/wp-json\/wp\/v2\/posts\/59","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/rynesherman.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/rynesherman.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/rynesherman.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/rynesherman.com\/blog\/wp-json\/wp\/v2\/comments?post=59"}],"version-history":[{"count":6,"href":"http:\/\/rynesherman.com\/blog\/wp-json\/wp\/v2\/posts\/59\/revisions"}],"predecessor-version":[{"id":67,"href":"http:\/\/rynesherman.com\/blog\/wp-json\/wp\/v2\/posts\/59\/revisions\/67"}],"wp:attachment":[{"href":"http:\/\/rynesherman.com\/blog\/wp-json\/wp\/v2\/media?parent=59"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/rynesherman.com\/blog\/wp-json\/wp\/v2\/categories?post=59"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/rynesherman.com\/blog\/wp-json\/wp\/v2\/tags?post=59"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}