(The following post is a set of responses to the comments posted by Payscale.com on the post “8 Problems with Payscale.com’s College Rankings (and One Solution)”. The original comments are reproduced verbatim in italics, and the Around Learning responses follow each. This post first appeared as an addendum to the original article.)
Many thanks to the folks at Payscale.com for their reply, included in full in the comments section and italicized below; I prefer a conversation to a soapbox any day! As noted in the article, these responses should not be viewed as a critique of the work and style of Payscale.com writ large; they focus narrowly and specifically on the Payscale.com college rankings, and suggest that, particularly in light of efforts to gather accurate data on outcomes for all colleges, Payscale.com is not the right solution as currently constructed. That does not mean the rankings (or, perhaps more appropriately, a non-ranked comparison system) can’t be improved, and, as some colleagues have noted, they may be better than nothing.
Here are my thoughts on your thoughts on my thoughts:
1. Accuracy is of utmost importance to our business. Every salary survey submitted to PayScale is validated through a number of accuracy checks both automated and manual. They are further judged to see if they are the result of attempted data fraud. Lastly, our team of data scientists does regular validity tests comparing PayScale data to other sources of compensation data (both publically and privately available). We have more than 2,500 business customers who rely on our data to set compensation for their employees.
Absolutely, and sorry if that was unclear; I’m glad to tweak the language there. The article is intended to assume that Payscale.com, Glassdoor.com, and other salary websites have some spot-check mechanisms in place, triggered by responses like “clown school” for undergrad, “underwater basket weaver” as profession, and $1,000,000,000 as salary. It is, of course, impossible to know whether the institutions respondents say they have “graduated” from are the same ones where they started, which raises the same problem as graduate degrees: how much credit do we give to an individual’s first institution versus the one where they finished (see my #4 and your #3 below)? There are also myriad well-researched challenges with individuals misreporting their own salary and graduation data with no intent to mislead; because of taxes, bonuses, and another dozen factors, salary is incredibly hard to standardize when relying on self-reported data, a limitation that holds even for the best and most thoroughly vetted surveys. There is also some great research out there on how inaccurately folks self-report even simpler things, like birthdays and marital status. Re: your 2,500 business customers, please note that #7 acknowledges that Payscale is probably a great source of industry-level data (at least for many industries), and that’s because of an assumption that your overall sample is much larger than your sample of data linked to specific undergraduate institutions (especially the small ones). You likely also draw upon other industry data (which may explain the confusion over the 40 million profiles vs. the 1.4 million noted in my response to your next question); that data is likely incredibly valuable in examining sector-level trends, but of course would not be linked to undergraduate institution.
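As an aside on those spot-checks, here is the kind of automated filter I have in mind. This is a purely hypothetical sketch; I have no knowledge of Payscale’s actual validation pipeline, and every rule, field name, and threshold below is invented for illustration:

```python
# Hypothetical sanity checks on a self-reported salary record.
# Nothing here reflects Payscale's actual system; it only illustrates
# the category of automated filtering any salary site would likely need.

KNOWN_SCHOOLS = {"Grinnell College", "Texas A&M University"}  # stand-in list

def looks_plausible(record: dict) -> bool:
    salary = record.get("salary", 0)
    # Flag absurd values like $1,000,000,000 (or negative entries)
    if not (10_000 <= salary <= 5_000_000):
        return False
    # Flag schools that don't match a canonical institution list
    if record.get("school") not in KNOWN_SCHOOLS:
        return False
    # Flag graduation years outside a believable window
    if not (1940 <= record.get("grad_year", 0) <= 2014):
        return False
    return True

print(looks_plausible({"salary": 1_000_000_000,
                       "school": "Clown School",
                       "grad_year": 2008}))  # False
```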
2. PayScale actually is the top purveyor of not only user-reported salary information but of salary information in general. Our data is more accurate, more up-to-date and broader than any other salary information available. PayScale has a database of more than 40 million individual compensation profiles. Glassdoor claims to have “nearly 3 million salaries and reviews.”
It sounds like we may be mixing metrics here. The NYTimes article cited in the post notes, “PayScale says its rankings are based on data from 1.4 million college graduates.” It would be helpful if you could clarify the difference between salary profiles and user-reported salaries: do both include undergraduate information? If not, then the Glassdoor citation may actually make my case (with a reminder that the focus of the article is not Payscale’s broader work as an aggregator of salary data across industries, only those records associated with colleges). The citations in that section are based on the overall number of user-reported salaries linked to individual colleges and the average daily visit rate; if the latter is significantly different from 35,000, I would just need another source to correct it. I think, though, that even if Payscale does have the largest dataset of salaries linked to colleges, the particular argument the post makes there is less about market dominance than about the existence of a market at all, the small number of samples per college, the potential for regional variation, and the inability of outsiders to ask questions about such variation and other potential biases.
3. We do exclude master’s degrees and above but for good reason. The intention of our report is to meet the needs of the majority of prospective students researching college choice. Because only 11% of those aged 25 and older hold a master’s degree, according to US Census data, the majority of prospective students will only complete a bachelor’s degree. We sought to offer the best comparison of post-grad outcomes for that population. If someone wants to use the data for another purpose, we’d need to create a unique, specific dataset for that purpose, which we’re more than happy to do.
As the post notes, this is absolutely the standard, logical (read: “with good reason”) practice with this sort of thing, but that does not make it unproblematic. The 11% figure you cite for advanced degrees is roughly correct for the entire U.S. population, but remember, by the nature of these rankings we are interested only in a subset of the population: college graduates. It was big news a year back that BA-or-higher attainment in the 25+ group topped 30% for the first time in 2012. That means that while only 11% of the total population holds an advanced degree, roughly 37% of the intended sample population (11 ÷ 30) holds one (the arithmetic is spelled out in the sketch below). That number can also be derived independently using data here. That percentage is growing quickly (and, as such, is higher if you look at the younger half of the workforce, from which most of the Payscale data is derived), and the 37% is not evenly distributed across all institutions. In fact, many of the colleges most highly valued by consumers (at least in part) for their graduates’ eventual earning potential send a disproportionate number of their alums into graduate programs. The original post has been updated to clarify this point. Unfortunately, as I note, while institutions can track this information using a tedious National Student Clearinghouse workaround, there is no good public way to track graduate degrees at the national level across institutions except for PhDs, so let’s use those as an example:
Looking at Grinnell again, data derived from the NSF WebCASPAR system (which tracks about 92% of all PhDs) indicates that 15% of Grinnell alums get PhDs. That excludes M.A.s, M.D.s, J.D.s, and every other non-PhD graduate or professional degree. Nationally, only 2% of all individuals and roughly 5% of college grads earn doctorates of any kind (that is, including the M.D., J.D., Ed.D., etc.; most estimates put PhDs specifically at closer to 2% of college grads), so Grinnell alums earn doctorates at 3 to 7 times the average rate, even among only B.A. grads. Overall, we know that nationally another ~31% will earn a non-doctorate advanced degree (law school, M.A.s, med school, and a dozen other potential programs). Even if Grinnell’s comparative rate of attainment for these other degrees is much lower than its PhD rate, say, only 2 times the national rate for B.A. grads, you’re now talking about three quarters of the alumni base (the sketch below walks through the math), and that might even be a conservative estimate. But we don’t and can’t know for sure, because we don’t have a good federal tracking system, which is of course the real point of the article. Ideally, one of the contextual pieces you would want for this sort of data would be the percentage of alums earning advanced degrees (perhaps broken down in some way) by institution. Again, you can’t access that information, but you could include the percentage earning PhDs (using the source I cite) and the percentage with advanced degrees in your own database. You could then disaggregate salaries along those lines for institutions with more than a certain number of cases; alternately, you could exclude schools (largely small schools and highly selective schools) above a certain threshold; you could even compare the PhD rate to your own internal distribution to check for bias. Your own data will still be highly subject to self-report bias along other lines, and it would be difficult if not impossible to provide rankings this way, although I could imagine a more useful comparison system built on this data. Alternately, you could do some sort of ranking focused on schools with an explicit mission (which some have) of sending graduates directly into the workforce without additional advanced training (and which have data to suggest that’s what happens in practice as well), but of course that loses some of the cultural wow-factor.
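To spell out both the 37% figure and the Grinnell estimate (a back-of-envelope sketch only; the 2× multiplier for Grinnell’s non-doctorate degrees is an assumption chosen for illustration, not a measured figure):

```python
# Advanced-degree share among college grads, from the Census figures above
advanced_share = 0.11        # adults 25+ with an advanced degree
ba_or_higher_share = 0.30    # adults 25+ with a BA or higher (2012)
print(f"{advanced_share / ba_or_higher_share:.0%}")  # -> 37%

# Grinnell estimate: NSF-tracked PhD rate plus an assumed 2x the national
# rate of non-doctorate advanced degrees (law, M.A., med school, etc.)
grinnell_phd_rate = 0.15       # from NSF WebCASPAR
national_non_phd_rate = 0.31   # national rate among B.A. grads
assumed_multiplier = 2         # hypothetical; Grinnell's true rate is unknown
estimate = grinnell_phd_rate + assumed_multiplier * national_non_phd_rate
print(f"{estimate:.0%}")       # -> 77%, i.e. roughly three quarters
```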
4. State data not tracking alumni working out of state is a big problem When comparing our data to the state data, you make the assumption that the state data is more accurate even though they can’t track salary data for alumni working out of state. We don’t have this constraint, so I’d argue that it’s the state data that is likely inaccurate. We’ve researched the question of where alumni end up after graduation — near their alma mater or not. Around 50% of grads work in a different state than the state in which they attended school.
I entirely agree that the state data is imperfect as well, and I note that in the article; that’s exactly the reason (or at least that plus the inefficiency of replicating collection systems in 50 states, plus the even more valuable questions of graduation, retention, and post-grad degree attainment) that we need a good national system. However, it would be helpful to know more about the 50% number you cite. I would caution that if it comes only from the folks in the Payscale system, then the same selection challenges of the dataset broadly (young, white-collar, actively searching for jobs using a national database of salaries) may be skewing results. Even if it is based on something external to the site, part of the reason the post uses Texas is that the state’s large (and growing) population centers, diverse economy (with a growing tech sector), and lack of comparable job-market competition in its immediate border states mean that its out-of-state mobility is almost certainly lower than nearly every other state’s; and remember that the comparison point here was two public universities. Certainly, some meaningful percentage of Texas public college alums will take jobs out of state, but Texas A&M has graduated about 49,000 students since Payscale.com was founded in 2006, and Payscale has records for about 2,000 students who would have graduated during that range of years. So even if the Texas data includes only 50% of graduates (although I expect it includes many more), it’s hard to believe it isn’t much more accurate for the (frustratingly small) number of schools in its system, as the rough comparison below suggests.
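The coverage gap is easy to see in rough numbers (a quick sketch using the approximate figures above; both counts are estimates, and the 50% state-coverage figure is the deliberately conservative assumption from the paragraph):

```python
tamu_grads = 49_000        # approximate Texas A&M graduates since 2006
payscale_records = 2_000   # approximate Payscale records for those years
state_coverage = 0.50      # conservative assumption for the Texas system

print(f"Payscale coverage: {payscale_records / tamu_grads:.1%}")  # ~4.1%
print(f"State coverage:    {state_coverage:.0%} or more")
```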
5. We’re very open to research requests. No one has ever asked, but we’ve actually considered releasing some of our data publically so that it can be scrutinized by researchers. We just recently put together a board of advisors that includes economists and others in the academic world, including Mark Schneider of AIR, to talk about how to do things like this in the best way. We are a for-profit company, and our data is our largest asset, so we do have to be careful about what we make available to protect our business interests, but we’re very interested in helping to further the discussion around college outcomes. We are the largest and best source of this data.
That’s great to hear, and thank you in particular for replying to this portion; #8 has now been reworded a bit to focus more on the core issue it was intended to address. There is of course no way for me to know definitively about your responses to all requests, and I’m sure you may get many, so I’ll also remove the anecdotal note about the experience of some colleagues; and if you have interest in working with some university-based research groups that explore these sorts of data issues all the time, I’m glad to suggest some. However, limited engagement with outside researchers and work with individuals in advisory or consultant roles still falls short of the larger point that you allude to in your comment: Payscale.com is a for-profit company and the data is (entirely reasonably!) proprietary. What that means, as you suggest, is that releasing more than a certain carefully policed amount of data is contrary to Payscale’s business interests; what it also means is that evidence pointing to bias in the data is problematic for the company in ways not only methodological but financial. That’s an entirely understandable, necessary limitation of any for-profit company doing this sort of work. Thus the post concludes by noting that the real question isn’t about the quality of, or access to, Payscale data, but about why there isn’t a federal, publicly accountable option, especially given that the government is already collecting this data through the FAFSA system and the IRS; the agencies just are not legally able to merge it. This option would allow us to address not only earnings outcomes but, more importantly, issues around graduation, retention, and (as our conversation around advanced degrees suggests) post-grad degree attainment. A growing national chorus, including the President, non-profits, taxpayers, and students, argues that even as we laud the value of education, we can acknowledge that it is also an expensive and stratified enterprise, and that, as such, we have the right to ask tough questions of our colleges and universities, and to make that conversation public and accountable. I suggest here simply that we owe it to ourselves and the many schools “doing it right” to be able to hold the data we use in that process to the same publicly accountable standard.
Many thanks again for the reply. I’m happy to continue the conversation, would love to learn more about your efforts to make even a restricted dataset open to analysis, and will gladly add any follow-ups you have to the post.