Note: A reply to each of the points in Payscale.com’s comments on this article has been relocated to a separate post here. Problems #4 and #8 were also updated to clarify the issues in question.
But how much can you MAKE???
Friday’s New York Times article on Payscale.com and today’s in the Huffington Post show the website’s growing influence in the national conversation about higher education outcomes, and, in particular, the role of “average earnings” in that debate. The Seattle-based company, which built its business model around providing a glimpse at wages to potential applicants in exchange for their own current salary and providing industry-level aggregate salary data to employers, has gotten increasing press recently for disaggregating salaries based on the college attended and undergraduate major of the respondent. In addition to Payscale’s own rankings, available on its website, its numbers are now a central part of the Forbes Rankings (where they outweigh “graduation rate”) and reputable sites like the Gates Foundation-funded / Chronicle-sponsored CollegeRealityCheck.com.
Here’s a spoiler alert- this post is NOT about why we shouldn’t include salary data as part of the national outcomes conversation (there are clear arguments for why salary data alone shouldn’t be the metric by which colleges are judged, but neither Payscale nor those citing its data have suggested otherwise); to the contrary, I’m assuming here that salary information, and the type of data that necessarily underlies it, is a vital part of the national conversation on the outcomes of higher education.
Nor is this about Payscale.com having bad intentions or being bad at what it does. Their motivation in creating the rankings is clear and reasonable: Parents, students, researchers, and non-profits really want to know how much a graduate can expect to earn, and with no real national-level competition for this data and a big pile of salary records in the vault, Payscale is filling a niche where there is money to be made. And while I think Payscale.com is a great resource in its original wheelhouse of industry-level salary information, there are some serious problems with using its college and major data for anything beyond cocktail conversation and some very limited comparisons between a small subset of institutions. Here are 9-
The Problems with Payscale’s College Rankings
1. The form and setting where the data is collected aren’t conducive to accuracy. Payscale collects all of its wage information from a quick survey required of visitors who want to access salary information about a particular company or job. It’s entirely fine for what it is, but as with any similar survey there is no way to check for accuracy (of either college attended or salary earned), and it is typically filled out quickly in an attempt to get to the information of interest. On top of that, and perhaps even more importantly, self-reported salary data is just notoriously unreliable.
2. Sites like Payscale oversample young workers new to the job market. Users of any online salary comparison tool are more likely to be young, white-collar workers. That’s fine if that’s all you want to know about, but the ratings purport to indicate “mid-career” earnings and include colleges that serve much wider demographics. Looking at the response pool for any college, typically only about 10-15% of respondents have 10+ years of experience.
3. Payscale in particular may over- and under-sample lots of things (like region), but we can’t know what. Payscale.com isn’t the internet’s top purveyor of user-reported salary information: competitor Glassdoor.com receives about 315,000 visits per day compared to Payscale’s 35,000. Because the data is proprietary (see #8) we can’t know whether users are skewed by region or anything else, but it seems likely that Payscale.com, with its lower name recognition, might, for example, be more popular near its west coast base than in other parts of the country.
4. Payscale logically but problematically excludes anyone with an advanced degree. This is a challenge for looking at outcomes broadly: when students have graduate degrees, how do we parse out how much of their current earnings to attribute to their undergraduate institution vs. their graduate institution? The imperfect solution almost always used is to exclude anyone with more than a B.A. entirely. Payscale.com noted in their response to this article that nationally only 11% of all individuals in the U.S. 25 or older have an M.A. or higher, but remember that this sample includes only college graduates. How does that change the relative percentage? Well, it was big news a year back that B.A.-or-higher attainment in the 25+ group topped 30% for the first time. That means that about 37% of the intended sample population (11 ÷ 30) hold an advanced degree. You can also derive that number using data here. That percentage is growing quickly (and, as such, is higher if you look at the younger half of the workforce from which most Payscale data is derived), at least in part because those with M.A.’s, Ph.D.’s, and professional degrees end up earning more than those with only a B.A. On top of that, this 37% is obviously not evenly distributed across all undergrad institutions (see the response to Payscale’s 3rd comment for why the number may be close to 75% for many of the colleges on the Payscale list), and the same unequal distribution applies to majors- a real bias problem. An entirely separate issue is that undergraduate institutions would argue, not unreasonably, that these students are admitted to and succeed in graduate school because of what they learned in undergrad, and if that’s even partly true then institutions where these students represent a high percentage of all graduates are likely misrepresented for that reason as well.
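To make the back-of-envelope math above explicit, here’s a quick sketch (the 11% and 30% figures are the rounded national attainment rates quoted above, not exact Census values; the variable names are mine):

```python
# Share of adults 25+ with an M.A. or higher vs. with a B.A. or higher,
# using the approximate figures cited in the text.
adults_with_advanced = 0.11   # M.A., Ph.D., or professional degree
adults_with_ba_or_more = 0.30  # B.A. or higher

# Everyone with an advanced degree also holds a B.A., so the share of
# college graduates a "bachelor's only" filter excludes is the ratio:
advanced_share_of_grads = adults_with_advanced / adults_with_ba_or_more
print(f"{advanced_share_of_grads:.0%}")  # → 37%
```

In other words, the filter doesn’t trim an 11% sliver off the sample of graduates- it removes more than a third of it, and not at random.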
5. Payscale rankings don’t (and can’t) weight by major. It won’t be news to anyone that mean earnings differ by undergraduate major- Payscale even promotes this fact on a separate part of its site. So you would think they might try to account for this in some way in their rankings. Just a quick glance is all it takes to notice that the vast majority of the top schools are those where enrollment is skewed towards a certain subset of majors. You could perhaps make the case that a student on the margin about what they wanted to do with their life might go to one of these colleges and would be more likely to choose a technical profession and go immediately into the workforce rather than graduate school… but as you might imagine, most students who go to these types of schools go because they already know that’s exactly the sort of thing they want to do- representing not a value added, but an input. More problematically, this means that schools with a more even distribution of majors, or a skew towards other types of majors, rank low (take a few minutes and try to find the first “school of the arts” on the lists), even though a student going into one of the very few fields that Payscale reports correctly here (again, largely types of engineering and technical work) who attends the University of Virginia or a small liberal arts college might actually earn more than a similar student at top-ranked Harvey Mudd (or one of the tiny number of humanities majors at MIT or other technical schools might vastly under-earn their counterparts at other schools). This isn’t what these rankings purport to measure, but it’s a large part of what they do.
6. There are very few responses for many colleges, and Payscale uses this limited data to make questionable inferences. Payscale already admits in its sparse methodology that the confidence interval for liberal arts colleges is 10%, but with thousands of graduates represented by fewer than 100 salaries, even this is likely too conservative and could lead to false conclusions. In one troubling example, the NYTimes article above questions a “gender gap” in the rankings by pointing to low ranks for Wellesley and Bryn Mawr. In response, an unnamed Payscale.com representative explains that “women’s colleges still don’t produce enough graduates in engineering, science and technology, the fields that draw the highest salaries.” Yet National Science Foundation data from a nearly complete dataset of U.S. Ph.D.s shows that Wellesley, which ranks 304 on the Payscale list, is 33rd in the country in production of science and engineering Ph.D.s per capita. Bryn Mawr, #562 on Payscale, is 12th- 9.7% of its alums go on to get Ph.D.s in science or engineering- a healthy margin above Johns Hopkins, Yale, Rensselaer Polytechnic Institute, UC-Berkeley, and just about every other institution. Grinnell, a small co-ed liberal arts college also called out in the article for being #366 on Payscale, comes in at #8. What’s more likely is that these institutions rank so low because a) they have a very small set of responses in Payscale’s database, b) they produce a disproportionate number of graduate students, and c) they may produce fewer of a certain type of graduate who goes immediately into the workforce for a certain type of firm. One easy test (though impossible with the data Payscale makes available) would be to look at the relative variability of rankings plotted against institutional size and response counts.
7. More accurate (but also imperfect) state-level data sets suggest that even the Payscale data on large universities may be way off. A handful of state-level systems collect alumni salary data based on actual state tax records (Arkansas, Tennessee, Virginia, Colorado, and Texas). Just a few quick comparisons to actual wage data from Texas show that Payscale overestimates Texas A&M’s starting salary by almost $10,000 ($51,900 vs. $42,662), or 22%. Meanwhile, it underestimates initial-year earnings at Texas Woman’s University by $3,694, or 8%. These state-level systems are limited too- like Payscale, they only look at earnings for students with only a B.A. and can’t track alumni working out of state- but both the number of records and their accuracy are vastly higher for the state-level institutions they report. (See why Texas is used as the example here.)
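For anyone who wants to check the size of those gaps, here’s the arithmetic behind the Texas A&M comparison (figures are the ones quoted above; the state tax-record number is treated as the baseline):

```python
# Payscale's reported starting salary vs. the Texas wage-record figure
# for Texas A&M, as cited in the text.
payscale_estimate = 51_900
tax_record_actual = 42_662

# Dollar gap and the overestimate relative to the actual figure.
dollar_gap = payscale_estimate - tax_record_actual
pct_overestimate = dollar_gap / tax_record_actual
print(f"${dollar_gap:,} high, or {pct_overestimate:.0%}")  # → $9,238 high, or 22%
```

Note the choice of denominator: the error is expressed relative to the tax-record figure, which is why a “$10,000-ish” gap works out to 22% rather than 18%.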
8. We can’t know what else is wrong (or right) with the data because the Payscale data is proprietary. In many ways, Payscale is an advertisement for the value of the private sector. If you want a great sense of what people make in certain industries, or a ballpark sense of what they make at a particular large company, sites like Payscale, Glassdoor, and Salary.com are appealing and helpful because the companies they report on may be tight-lipped about salary. On top of that, their presentation of the data is well-organized, visually appealing, and user-friendly in a way that outdoes anything you’ll find on the Department of Education website. But the data in question here has big implications and comes with a huge risk of bias. While Payscale.com has indicated in a response (found in the comments below) that they are “very open to research requests,” they also note, entirely reasonably, that they must consider data releases in light of their business interests. Because their reputation and market value, not just their data quality, are threatened by challenges to that quality, it’s simply impossible to fully explore the bias in the data, ensure the quality of the methodology, and ask more important questions of the information in ways that are publicly available- and publicly accountable.
9. There could be a MUCH better option. To be clear, I don’t begrudge Payscale.com one bit- they’re a for-profit company filling an unoccupied market for which there is clearly demand. On top of that, one can imagine that, even given the limitations above, Payscale data might actually be quite accurate as a metric for the early-career salaries of white-collar/tech workers from large institutions who do not go to graduate school. Nor can we reasonably criticize groups that have cited the rankings in a search for something, anything, that can tell us about institutional post-graduate outcomes at the national level. That’s why our question shouldn’t be “why is Payscale.com data so bad,” but, rather, “why don’t we have something better?”
So why don’t we have something better?
You may be thinking to yourself- “But wait- the federal government has attendance and graduation records for every student who has ever gotten any federal financial aid (just about all of them, in some form or another), and the IRS has the nation’s best records on what people make. Both have an ID number that could be used to link them. Why can’t we just use that?”
One answer, offered by former director of the National Center for Education Statistics (NCES) Mark Schneider in a chapter of Accountability in Higher Education, is that we tried. Already concerned about growing levels of student loan debt and the questionable amount of student aid going to new for-profit institutions, the NCES proposed a student-level record system and issued a report demonstrating its feasibility while protecting student-level privacy, yet progress was cut short after political pressure from the higher education lobby. Leading the charge was the National Association of Independent Colleges and Universities, or NAICU, which ironically represents many of the small and medium-sized private schools noted as likely misrepresented in #6.
In 2008, NAICU successfully argued that because a federal student-level record database would compromise student privacy (although similarly confidential information is collected en masse by both the FAFSA and the IRS), and because institutions were already participating in voluntary accountability processes (largely re-reporting already-available or incomplete data to a limited audience), a student-level database wasn’t just a bad idea- it should be impossible. As a result, the 2008 reauthorization of the Higher Education Act specifically prohibits federal collection of student-level data on graduation and subsequent earnings. Instead, each state has been asked to create separate databases (like those already functional in the states noted in #7), and in some states, including Texas and Wisconsin, private colleges have successfully lobbied to create their own databases separate from public colleges and replete with additional restrictions on access.
The potential of a national student-level dataset relates not only to post-graduate earnings, but to essential, universally valued metrics in higher education- graduation and retention rates. Our current method asks hundreds of individual colleges to report on who finishes college in ways that entirely miss 1) transfer students, 2) those who enter as part-time students, and 3) those who take more than 6 years to graduate (or 3 years at a 2-year institution); over a third of all students fit the first category alone. Meanwhile, nearly every summary statistic you have ever read relies upon the antiquated federal IPEDS reporting system, which collects data for a subset of students that made sense in 1950, but not in 2013. That isn’t because the government doesn’t care about the large (and growing) group the metrics miss- indeed, understanding their success will be central to the future success of financial aid and higher education in the U.S. We ignore these students because we have passed a law that makes it impossible for us to track them.
If the Obama Administration, non-profits, and education policy advocates are really serious about wanting to better understand the outcomes of college, they should make legalizing and facilitating the creation of a highly secure but highly comprehensive student-level data system a priority- and a yardstick against which to measure their own success.
“If you want a fair opinion of dogs, don’t just ask the fire hydrants.”
There’s some meaning in this old survey-research mantra for the Payscale.com data- we need to seriously consider the limitations we know about, and the many we do not- many of which come back to the simple question of who is really getting asked.
But, broadly, this is about voices being heard- Parents, students, and taxpayers deserve more intelligent answers to questions about the broader return on their investment in higher education. It’s the very sort of idea that might be debated and applauded within the hallowed halls of the higher education institutions whose representatives have fought the hardest against it. While it can be a little depressing to the ol’ idealism, the politics of higher education are real, getting a counter-argument heard will require time and funding, and good ideas without strong advocates often remain only that.
This one is worth getting right.