My interview with Joseph Strayhorn, the co-author of a new study on teen birth rates, is up at Her.meneutics. I began the email interview by telling Dr. Strayhorn that I myself had been a teen mother (pregnant at 19) whose child died by suicide. He responded with condolences. That intro was edited out for space. Here’s the lede:
Some social liberals used a recently published study in Reproductive Health that found a strong link between high religiosity and teen pregnancy rates to further their case for why abstinence-only sex education doesn’t work.
Double XX, Mother Jones, Bonnie Erbe at U.S. News & World Report, and Andrew Sullivan at The Atlantic say the results of “Religiosity and Teen Birth Rates” — which found higher teen birth rates in the most religiously conservative states, even after controlling for differences in income and abortions — point to conservatives’ hypocrisy on family values. The researchers, father-daughter team Joseph and Jillian Strayhorn, speculated that perhaps teens in highly religious states are more likely to become pregnant because they are less likely to know about or use contraception. Jillian’s work went toward fulfilling an advanced home-schooling course in statistics, roughly equivalent to a sophomore college course in regression analytics. (Dr. Strayhorn noted that “ironically, one or two of the bloggers I read who used our article to slam religion also slammed home-schooling.”)
After reading such interpretations, Her.meneutics regular Christine A. Scheller decided to interview Dr. Strayhorn, associate professor of psychiatry at Drexel University College of Medicine. …
Also cut was this lengthy response to my question about methodology:
CAS: I found the negative correlation language confusing. Can you explain your results to me in layman’s language? How did you control for abortion and income?
JS: Thanks for raising these methodological questions. Obviously many, many people do not understand the statistical methods we used, but few of them appear to pursue an understanding of these. These techniques are central to the research.
First, what is the correlation coefficient? This is a number computed from a set of ordered pairs representing values of two variables. For example, Alabama has religiosity score x1 and teen birth rate y1. Alaska has religiosity score x2 and teen birth rate y2. And so on, for however many pairs you have. These days, computers and calculators are the ones that spit out the correlation coefficient, which is a number between –1 and +1. The correlation coefficient tells us how predictable one number is from the other, by means of a linear equation (of the form y=mx +b). The higher the absolute value of the correlation, the closer the points on a scatter plot tend to cluster around a line. If the correlation coefficient is +1, that means that you can exactly predict the value of either variable knowing the other. A correlation coefficient of +1 means that if you plot the data on a scatter plot, for example by using the weight of a sample of water as the x coordinate and the volume of the sample of water as the y coordinate, and do that for a number of samples of different size at the same temperature measured with great accuracy, you get a perfectly straight line. A correlation of 0, by contrast, means that knowing the value of one variable helps you not at all in predicting the value of the second. Scatter plots with correlations of 0 look like a bunch of dots randomly scattered out, without a linear pattern apparent. If you and I were to each take a few dice and roll them, and we were to plot as data point 1 the sum of the scores for mine as the x coordinate and the sum of the scores for yours as the y coordinate, and do that say 50 times, the scatter plot would probably come out looking random and the correlation coefficient would come out pretty close to 0.
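For readers who want to see the arithmetic behind the number a computer "spits out," here is a minimal Python sketch of the correlation coefficient, applied to the two illustrations above (water samples and dice). The sample weights, the water density of about 0.998 g/mL, the choice of three dice per person, and the random seed are all assumptions made for illustration; none of them come from the study.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the two means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Water example: volume is an exact linear function of weight,
# so the points fall on a straight line and r is +1 (up to rounding).
weights_g = [10.0, 25.0, 40.0, 80.0, 150.0]   # hypothetical samples
volumes_ml = [w / 0.998 for w in weights_g]   # assumed density, g/mL
r_water = pearson_r(weights_g, volumes_ml)

# Dice example: two people each roll three dice, 50 times.
# The two sums are unrelated, so r should land somewhere near 0.
rng = random.Random(0)  # arbitrary fixed seed so the run is repeatable

def roll_sum(n_dice=3):
    return sum(rng.randint(1, 6) for _ in range(n_dice))

mine = [roll_sum() for _ in range(50)]
yours = [roll_sum() for _ in range(50)]
r_dice = pearson_r(mine, yours)

print("water:", r_water)
print("dice: ", r_dice)
```

With only 50 rolls, the dice correlation will not be exactly 0; it wanders a little above or below zero from one run to the next, which is what "pretty close to 0" means in practice.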
Positive correlation coefficients mean that as one variable increases, the other also increases. Negative correlation coefficients mean that as one variable increases, the other decreases. A correlation of –1 is one where perfect prediction is also possible, only the line that relates the two variables slopes from upper left to lower right rather than lower left to upper right. When we say that religiosity is positively correlated with teen birth rate, we mean that more religious states tend to have higher birth rates. When we say that religiosity is negatively correlated with income, we mean that more religious states tend to have lower incomes.
In social science, many people refer to correlations of 0.1 as low, of 0.3 as moderate, and of 0.5 or greater as high. Correlations among averages, as in our state data, are often higher than correlations among individual data.
That’s the easy part, so far. Now how do we statistically control for a variable like abortion or income? This is the purpose of the partial correlation coefficient. The partial correlation is meant to estimate what the correlation of two variables would be without the influence of a certain third variable. For example: if we just compute a raw correlation between basketball skill and shoe size, we get a very high correlation. However, if we compute a partial correlation of basketball skill and shoe size with height controlled, we get a much smaller or zero correlation.
Here’s how “partialling” works. Let’s make a scatter plot with abortion rate as the x coordinate and teen birth rate on the y axis, and get the computer to draw the best possible line that relates the two variables (with best possible defined as minimizing the sum of the squared deviations of the points from the line). The vertical distance of each point from the line is called the residual. This can be thought of as the part of teen birth that isn’t accounted for by abortion. Now we do the same thing, plotting abortion on the x axis and religiosity on the y axis. Now when we compute the residuals, we get the amount that the religiosity score for each state deviates from that which would be predicted by abortion. These residual scores can be thought of as scores that are an answer to the question, “What would the scores of religiosity and teen birth look like if all states were the same with respect to abortion rates?” Now, if we compute the regular correlation between the residual scores, we get the partial correlation coefficient. It may be thought of as a correlation between two variables, with the effects of a third possibly confounding variable statistically removed. If the only reason for higher teen birth rates in more religious states were differences in abortion rates, the partial correlation between teen birth and religiosity with abortion controlled for would have been close to zero.
Of course, these days people simply ask computers to calculate partial correlations and get the numbers almost instantly, but the process of first getting residuals and then correlating them will give you the same number the computer gets, and this process helps one understand partialling.
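The residual-based process described above can be sketched in a few lines of Python. This is an illustration only, not the study's code or data: it uses made-up numbers patterned on the basketball example, in which a hypothetical "height" score drives both "shoe size" and "skill." It then checks that correlating the residuals gives the same number as the standard textbook formula for a first-order partial correlation. The function names, sample size, noise levels, and seed are all assumptions.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def pearson_r(xs, ys):
    """Ordinary (raw) correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def residuals(ys, xs):
    """Vertical distances of each point from the least-squares line of ys on xs."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def partial_r_via_residuals(xs, ys, zs):
    """Correlate what is left of xs and ys after each is predicted from zs."""
    return pearson_r(residuals(xs, zs), residuals(ys, zs))

def partial_r_formula(xs, ys, zs):
    """Textbook first-order partial correlation, for comparison."""
    rxy, rxz, ryz = pearson_r(xs, ys), pearson_r(xs, zs), pearson_r(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Synthetic, hypothetical data: height influences both other variables,
# and shoe size and skill have no direct link to each other.
rng = random.Random(1)  # arbitrary fixed seed
height = [rng.gauss(0, 1) for _ in range(500)]
shoe = [h + rng.gauss(0, 0.5) for h in height]
skill = [h + rng.gauss(0, 0.5) for h in height]

raw = pearson_r(shoe, skill)
partial_a = partial_r_via_residuals(shoe, skill, height)
partial_b = partial_r_formula(shoe, skill, height)

print("raw correlation:        ", raw)
print("partial (via residuals):", partial_a)
print("partial (via formula):  ", partial_b)
```

In this made-up data set the raw correlation between shoe size and skill comes out high, while the partial correlation with height controlled falls close to zero, and the two ways of computing the partial correlation agree up to rounding error.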