Bias by Design - By the Numbers

Most of the numbers made sense to me: the number of libraries that held a particular title, the number of titles reviewed, the total number of titles in the survey. When I got to the final set of tables, I was flummoxed. There were little boxes titled "Mean=". And in those boxes, after the mathematical operator (or equal sign), there were numbers: bigger numbers and smaller numbers. It took me forever to figure out what those numbers really meant. In fact, it wasn't until this morning that I had an epiphany and all became clear.

For those of you who are math geeks and understood what they signified: stop rolling your eyes and shut up. I'm willing to bet that Stephen Hawking was quite proud of me when I managed to get 4 pages into A Brief History of Time, 20 pages into A Briefer History of Time, and nearly a quarter of the way through The Universe in a Nutshell. However, feel free to correct any of my assumptions or mathematical errors. Just be nice - I can find out where you live and I'll have your mom cut off the power to the basement.

At first glance those numbers are potentially damning. Take a look at the last few columns. Total Mean for favorable books is 214. The same for unfavorable books is 633. Which leads Tomeboy to his conclusion that for every 3 libraries that have a title critical of ID there is 1 that has a title that is supportive.

[NOPE (Note Of Personal Embarrassment) - when I saw that number I jumped all over it. AHA! If you add the mean for the top 21 favorable titles (389) to the mean for what I thought was the bottom 18 titles (214) you end up with a total of 603 and a rough proportion of 1 to 1. The condensed version of Tomeboy's response was: "Nice try, but no go". Evidently the phrase "Total" in "Total Mean" meant just that. It was the total mean for all 39 titles. Who knew?]

However, I was left with an even more perplexing paradox. If you compare the means for the top 21 unfavorable titles (633) with the top 21 favorable titles (389) you end up with a proportion of 1.63 to 1. Using Tomeboy's analysis, and being generous, that would mean 2 libraries having an unfavorable title to every library that has a favorable title. That is where I got confused. When you add in the bottom 18 favorable titles, you end up with a proportion of 3 to 1. If libraries had stopped at acquiring the top 21 titles, the chances of finding a library that carried a pro-ID book were better? The more titles you bought that support the idea of ID, the less chance you had of finding a library that carried one? WTF?

That's when I finally got smart and started to look at what was being measured. The numbers provided by Tomeboy measure the average number of libraries that buy a particular book in each of the three categories (Balanced, Not Favorable, Favorable). Comparing those numbers means that rather than comparing the number of libraries that have an unfavorable book to the number of libraries that possess a favorable book, Tomeboy is comparing the average number of libraries that own a particular title, unfavorable to favorable. I don't know what that number means, or what its importance is. [Math geeks: now would be a good time to jump in.]

If you want to compare the number of libraries that have unfavorable titles to the number of libraries that have favorable titles, just look at the raw numbers: Unfavorable: 13,298; favorable: 8236. No, I didn't double check my addition. Feel free. Which leaves us with an average of 1.61 libraries with a title that "pooh-pooh" ID to 1 library with a title that supports ID. At least our mythical patron's search isn't quite as onerous.
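
One way to see why adding the bottom 18 favorable titles drags the mean down: a mean is total holdings divided by the number of titles. The sketch below back-calculates approximate totals from the rounded means quoted above (389 for the top 21 favorable titles, 214 for all 39), so its results are estimates and not figures taken from Tomeboy's tables.

```python
# Back-of-the-envelope check using the rounded means quoted in the post.
# Totals recovered this way are approximations, not figures from the tables.

def total_from_mean(mean, n_titles):
    """A mean is total / count, so the total is roughly mean * count."""
    return mean * n_titles

top21_fav = total_from_mean(389, 21)    # top 21 favorable titles
all39_fav = total_from_mean(214, 39)    # all 39 favorable titles
bottom18_fav = all39_fav - top21_fav    # what the bottom 18 add in total
bottom18_mean = bottom18_fav / 18       # average holdings per bottom-18 title

unfav_mean = 633
ratio_top21 = unfav_mean / 389          # unfavorable vs top 21 favorable
ratio_all39 = unfav_mean / 214          # unfavorable vs all 39 favorable

print(top21_fav, all39_fav, bottom18_fav)
print(round(bottom18_mean, 1), round(ratio_top21, 2), round(ratio_all39, 2))
```

On these estimates the bottom 18 titles add holdings, so the favorable total goes up, but they average only about ten libraries each, which pulls the per-title mean down from 389 to 214. The total and the per-title mean move in opposite directions.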

Comments

Means to an end

Robert - So you took my advice? ; )

(for those interested, Robert and I have been discussing statistics via personal email)

Where to begin?

If you want to compare the number of libraries that have unfavorable titles to the number of libraries that have favorable titles, just look at the raw numbers: Unfavorable: 13,298; favorable: 8236. No, I didn't double check my addition. Feel free. Which leaves us with an average of 1.61 libraries with a title that "pooh-pooh" ID to 1 library with a title that supports ID. At least our mythical patron's search isn't quite as onerous.

First, you are using 13,298 to represent a total sample of all WorldCat libraries. You can't do that. You are conflating holdings with individual libraries. If memory serves I recall a total worldwide membership of WorldCat being around 9000. In other words, the 13,298 holdings for Not Favorable titles is not tantamount to 13,298 different libraries. For example, of the 1077 libraries that hold Traipsing into Evolution perhaps only 50 hold the Tower of Babel. Or of the 1087 that hold The Case for a Creator, maybe 1000 also have The Language of God. Simply put we do not know the total number of libraries that own at least 1 ID book. (Be my guest) Make sense?
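
The distinction between holdings and libraries can be sketched with sets. The library IDs and holdings below are made up purely for illustration (they are not WorldCat data); the point is only that summing holdings counts a library once per title it owns, while a set union counts it once.

```python
# Hypothetical data: each title maps to the set of library IDs holding it.
# None of these IDs or counts come from WorldCat.
holdings = {
    "Title A": {1, 2, 3, 4, 5},
    "Title B": {4, 5, 6},
    "Title C": {1, 5, 7, 8},
}

# Sum of holdings: a library is counted once per title it owns.
total_holdings = sum(len(libs) for libs in holdings.values())

# Unique libraries: a library is counted once, however many titles it owns.
unique_libraries = set().union(*holdings.values())

print(total_holdings)         # 12
print(len(unique_libraries))  # 8
```

Without the per-library lists, the second number cannot be recovered from the first, which is the gap described above.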

If our total sample was 13,298 libraries, then you could say there was a "definitive", as opposed to mean, ratio of 1.61. But that's not the case here which is precisely why I choose to work with means. Shake these 9000 or so libraries in the biggest book drop you can imagine and begin pulling them out one at a time. On average (important) for every 3 libraries holding a title Not Favorable to ID you will find 1 Favorable title. Or back to that irritating 2.95:1.

For comparison, we do know (assuming I didn't miss any) the total sample for determining the ratio between Not Favorable and Favorable reviewed books and could have said the ratio is (15/68)/(6/68) or 2.5:1. However this assumes equal distribution of Not Favorable and Favorable titles, or 34 for each category, which is not the case here and would be misleading. (See how messy working with aggregates can be?) A more statistically accurate number to convey the disparity is to compare the number of reviewed titles within each category, i.e. 72% v 29%, to arrive at an average for each.

Averages Robert. Averages.

Question about your stats

Did you take into account the proportion of books which are being published in both categories and how that compares to the proportion found in libraries? I would think that if the proportions are within statistical tolerances, no one can possibly make a case for discriminatory selection.

Re:Question about your stats

Did you take into account the proportion of books which are being published in both categories and how that compares to the proportion found in libraries?

I don't follow your question.

In proportion to all libraries in WorldCat? Every book taken from my sample, except "The Signature of the Artist", had at the time at least one library holding (we can assume a holding is forthcoming). That established, I am unaware of any way to discern the sum of all published "Balanced" books (any books, for that matter) outside of WorldCat for a given time period. Since WorldCat is the world's largest database, comprising 70 million records, I think it safe to infer that we didn't miss too many under the radar. Using the numbers given in WorldCat, if by "both categories" you mean Balanced, then yes, we can determine a mean ratio of Not Favorable to Balanced at almost 1.9:1.

As I told Robert, means are the currency of statistical analysis. Determining a proportion found in libraries, if I understand your question, would require knowing the total sample.

Re:Means to an end

"If our total sample was 13,298 libraries, then you could say there was a "definitive", as opposed to mean, ratio of 1.61. But that's not the case here which is precisely why I choose to work with means. Shake these 9000 or so libraries in the biggest book drop you can imagine and begin pulling them out one at time. On average (important) for every 3 libraries holding a title Not Favorable to ID you will find 1 Favorable title. Or back to that irritating 2.95:1."

So what you are saying is that if a library owned 1, and only 1 book on ID, that it would fall into the 2.95 to 1 mean. In short, 3 libraries would own a book critical of ID to every 1 library that owned a book supportive.

Re:Question about your stats

I think what FF was asking was: Even if there is a difference in the proportions of various kinds of titles per collection, or across the whole of WorldCat, are those differences statistically significant?

Re:Means to an end

"First, you are using 13,298 to represent a total sample of all WorldCat libraries." Actually, it's even worse than that.   13,298 equals only the Not Favorable titles.   If you also add in the Favorable (8363) and the Balanced (2668) you end up with a grand total of 24,329.  You can't do that. You are conflating holdings with individual libraries. If memory serves I recall a total worldwide membership of WorldCat being around 9000. In other words, the 13,298 holdings for Not Favorable titles is not tantamount to 13,298 different libraries. For example, of the 1077 libraries that hold Traipsing into Evolution perhaps only 50 hold the Tower of Babel. Or of the 1087 that hold The Case for a Creator, maybe 1000 also have The Language of God. Simply put we do not know the total number of libraries that own at least 1 ID book. (Be my guest) Make sense? Absolutey.  You've actually anticipated my next point.  That libraries don't own just 1 ID title, on average they will own several.  If several titles are owned isn't there also a chance that the library will have a mix of Favorable and Unfavorable ID titles? If our total sample was 13,298 libraries, then you could say there was a "definitive", as opposed to mean, ratio of 1.61. But that's not the case here which is precisely why I choose to work with means. Shake these 9000 or so libraries in the biggest book drop you can imagine and begin pulling them out one at time. On average (important) for every 3 libraries holding a title Not Favorable to ID you will find 1 Favorable title. Or back to that irritating 2.95:1. I think there is some serious, if not fatal, flaws in your method.  First of all, it seems to ignore other, contradicting, facts, and secondly it is too easily manipulated by extremes in the data.  Lets compare the average of "Balanced" titles to "Favorable" titles. 334 to 214, or 1.56:1.   
Using your method, I would conclude that our patron, searching for ID books, will find that for every 16 libraries containing a Balanced book, they will find 10 libraries containing a Favorable book.  Right?  Despite the fact that the raw numbers show a  3 to 1 advantage in the number of Favorable titles owned by libraries compared to Balanced titles owned by libraries (8363 to 2668).  How can it be that despite a greater number of Favorable titles, a patron is still more likely to find a book carrying a Balanced ID title? One other question.  Is it possible to infer from the data that we could be looking at an imbalance within the collection of a library.  That, on average, a particular library that owns ID titles will have 3 Unfavorable titles to every 1 Favorable title?
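
The two ratios in this comment can both be true at once, because a ratio of means is a ratio of totals rescaled by the number of titles in each category. A rough sketch: the holdings totals and the 39 Favorable titles are figures quoted in this thread, while the count of 8 Balanced titles is inferred from 2668 / 334 and is an assumption.

```python
# Totals as quoted in the thread. The Balanced title count (8) is inferred
# from 2668 / 334 and is an assumption, not a quoted figure.
categories = {
    "Favorable": {"holdings": 8363, "titles": 39},
    "Balanced": {"holdings": 2668, "titles": 8},
}

def mean_holdings(cat):
    """Average number of libraries holding one title in the category."""
    return cat["holdings"] / cat["titles"]

fav = categories["Favorable"]
bal = categories["Balanced"]

# Favorable : Balanced, comparing total holdings.
ratio_of_totals = fav["holdings"] / bal["holdings"]

# Balanced : Favorable, comparing average holdings per title.
ratio_of_means = mean_holdings(bal) / mean_holdings(fav)

print(round(ratio_of_totals, 2))  # a little over 3
print(round(ratio_of_means, 2))   # about 1.56
```

Favorable wins on totals because there are many more Favorable titles; Balanced wins on the per-title mean because each of its few titles is widely held. Neither number says how many distinct libraries hold at least one title from a category.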

Re:Means to an end

Despite the fact that the raw numbers show a 3 to 1 advantage in the number of Favorable titles owned by libraries compared to Balanced titles owned by libraries (8363 to 2668).

Robert, friend, you are mixing...again... mean ratios with what you incorrectly assume to be an aggregate sample of unique libraries, e.g. 8363 to 2668. One single, solitary, randomly selected Balanced title will on average be held in 334 libraries. Another single, solitary, randomly selected Favorable title will have an average holding of 214 libraries. This is a one-to-one average correlation.

Let's try an analogy here. (I'm out of ideas to explain frankly)

Consider that big book return with the aggregate (total) sample of 9000 WorldCat libraries. Your library, my library, they are all in the bin. Ok? Now suppose I tell you, "Hey Robert, you want to make a wager on ID books in libraries? Maybe the title to your house or car? Consider that Atheistic Universe (Not Favorable) has holdings in only 88 libraries while The Language of God (Favorable) has holdings in 1141. Will you wager that for every randomly selected library holding a title Not Favorable to ID, we will select 13 that have a Favorable?"

(If you'll take the wager leave your cell number with an email)

Re:Question about your stats

My question was for Robert, not you, but since you've responded I'll examine the issues you've raised.

I'll accept that 70 million records will provide a sufficient statistical universe if you will define the territory those records comprise. Books in print? Or in library collections?

Secondly, define "balance" and "not favorable". Both terms have the stench of subjectivity about them. I doubt seriously that you and I would consider the same material balanced, inasmuch as you validate "information" by where it is published. Perhaps you would care to use the word "favourable" instead of balanced? Unless you can point to criteria by which material can be judged using objective scientific method?

As for what my question means, I'll refer you to Chuck's reply to your confusion. What do you suppose it does to your statistical analysis if the proportion of favourable to critical books in the market of ideas is also 1.9:1? Suppose it is higher? What then?

Re:Means to an end

Actually I understand what you mean. I think we have run into a problem of semantics. When you said in your post, "For nearly every 3 libraries holding a title pooh-poohing Intelligent Design, your patrons will find only 1 library with the temerity to rebut those who find nothing intelligent about Intelligent Design," I took that to mean ANY title. Your statistics refer to the average chances of finding a PARTICULAR title. Thus, the chances of finding any ONE particular title on the shelf of a library are 3 times greater for unfavorable ID books than for any ONE particular favorable book. That, however, does not translate into the chances of finding an unfavorable title vs. a favorable title (any title) from the lists you provided.

So what you are saying, then, and based upon your averages, is that if I were to wager that for every library, randomly selected, that owned a balanced book that we would pull 3 or more libraries that owned an unfavorable book, that you would take that wager?

And we both get to count libraries that own at least 1 of each.

Why isn't that, up to this point, reflected in the real world? After you found an Oregon library that carried an anti-ID title, but no pro-ID titles, I actually went through and checked title holdings on WorldCat. Out of 50 libraries, 44 carried both pro and con. 6 carried only unfavorable. I didn't check to see how many carried only con. As far as I can tell, there is a possibility of 2, but I have not followed up. Does this mean that Oregon is an anomaly? It's nice to know that our patrons will find they would have to search pretty hard to find a library that carried only unfavorable books.

Re:Means to an end

"Robert, friend, you are mixing...again... mean ratios with what you incorrectly assume to be an aggregate sample of unique libaries e.g. 8363 to 2668. One, single solitary randomly selected Balanced title will have on average be held in 334 libraries. Another single solitary randomly selected Favorable title will have an average holding of 214 libraries. This is a one-to-one average correlation."

Which is not to say that there is a 1 to 1 correlation of which kind a library will hold. Looking at all libraries, we are more likely to find one that carries a favorable, any favorable, title, than one that carries any balanced title. Is that right?

Re:Question about your stats

Actually, your question is for Tomeboy.

A new question

As you've pointed out, since there are 24,000 titles there is no one-to-one correlation between titles held and the libraries that hold them. However, can't we work with percentages? By the way, I'm truly grateful for the patience you've shown.

If I figured the numbers out correctly, 55% of all titles held by libraries are unfavorable, 34% are favorable, and 11% are balanced. As our poor patron wanders around in a daze, just trying to find some books on ID, that means if she wanders to 10 different libraries, six will have an unfavorable book, three will have a favorable book, and one will have a balanced book. That works, doesn't it?
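
The percentages can be checked from the holdings totals quoted earlier in the thread. This is only a sketch of the arithmetic; it works out shares of holdings, and whether those carry over to shares of libraries depends on how much the libraries overlap, which nobody in this thread knows.

```python
# Holdings totals as quoted earlier in the thread.
holdings = {"Not Favorable": 13298, "Favorable": 8363, "Balanced": 2668}

grand_total = sum(holdings.values())

# Share of all holdings in each category, rounded to whole percent.
shares = {
    name: round(100 * count / grand_total)
    for name, count in holdings.items()
}

print(grand_total)  # 24329
print(shares)       # {'Not Favorable': 55, 'Favorable': 34, 'Balanced': 11}
```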

Library Journal

You include Library Journal under the heading of "Reviewed in an ALA Publication." Library Journal is not an ALA publication.
