16 October 2017

Open Data for Education

There’s a global crisis in learning, and we need to learn more about how to address it. Whilst data collection is costly, developing countries have millions of dollars’ worth of data about learning just sitting around unused on paper and in spreadsheets in government offices. It’s time for an Open Data Revolution for Education.

The 2018 World Development Report makes clear the scale of the global learning crisis. Fewer than 1 in 5 primary school students in low income countries can pass a minimum proficiency threshold. The report concludes by listing 3 ideas on what external actors can do about it:
  1. Support the creation of objective, politically salient information
  2. Encourage flexibility and support reform coalitions
  3. Link financing more closely to results that lead to learning
The first of these, generating new information about learning, can be expensive. Travelling up and down countries to sit and test kids for a survey can cost a lot of money. The average RCT costs in the range of $0.5m. Statistician Morten Jerven added up the costs of establishing the basic census and national surveys necessary to measure the SDGs — coming to a total of $27 billion per year, far more than is currently spent on statistics.

And as expensive as they can be, surveys have limited value to policymakers, as they cover only a limited sample and can only provide data about trends and averages, not individual schools. As my colleague Justin Sandefur has written: “International comparability be damned. Governments need disaggregated, high frequency data linked to sub-national units of administrative accountability.”

Even for research, much of the cutting edge education literature in advanced countries makes use of administrative rather than survey data. Professor Tom Kane (Harvard) has argued persuasively that education researchers in the US should abandon expensive and slow data collection for RCTs, and instead focus on using existing administrative testing and data infrastructure, linked to data on school inputs, for quasi-experimental analyses that can be done quickly and cheaply.

Can this work in developing countries?
My first PhD chapter (published in the Journal of African Economies) uses administrative test score data from Uganda, made available by the Uganda National Exams Board at no cost, saving data collection that would have cost hundreds of thousands of pounds and probably been prohibitively expensive. We’ve also analysed the same data to estimate the quality of all schools across the country, so policymakers can look up the effectiveness of any school they like, not just the handful that might have been in a survey (announced last week in the Daily Monitor).

Another paper I’m working on is looking at the Public School Support Programme (PSSP) in Punjab province, Pakistan. The staged roll-out of the programme provides a neat quasi-experimental design that lasted only for the 2016–17 school year (the control group have since been treated). It would be impossible to go in now and collect retrospective test score data on how students would have performed at the end of the last school year. Fortunately, Punjab has a great administrative data infrastructure (though not quite as open as the website makes out), and I’m able to look at trends in enrolment and test scores over several years, and how these trends change with treatment by the programme. And all at next to no cost.
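
To give a flavour of how a staged roll-out can be analysed with data like this, here is a minimal difference-in-differences sketch in Python. It is only an illustration of one common approach, not the method used in the paper: the function name, the grouping into "early" and "late" schools, and all of the scores are made up.

```python
from statistics import mean


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group's mean minus the change in the control group's mean."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical school-average test scores (invented numbers, not PSSP data).
# "Early" schools joined the programme in the first wave; "late" schools
# had not yet joined and so act as the comparison group for that year.
early_before = [48.0, 52.0, 50.0]
early_after = [55.0, 59.0, 57.0]
late_before = [47.0, 51.0, 49.0]
late_after = [50.0, 54.0, 52.0]

effect = diff_in_diff(early_before, early_after, late_before, late_after)
print(f"Estimated programme effect: {effect:.1f} points")
```

The estimate is only as good as the assumption that early and late schools would have followed similar trends without the programme, which is exactly why having several years of earlier administrative data is so useful.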

For sure there are problems associated with using administrative data rather than independently collected data. As Justin Sandefur and Amanda Glassman point out in their paper, official data doesn’t always line up with independently collected survey data, likely because officials may have a strong incentive to report that everything is going well. Further, researchers don’t have the same level of control over, or even understanding of, what questions are asked and how data is generated. Our colleagues at Peas have tried to use official test data in Uganda but found the granularity of the test is not sufficient for their needs. In India there is not one but several test boards, who end up competing with each other and driving grade inflation. But not all administrative data is that bad. To the extent that there is measurement error, this only matters for research if it is systematically associated with specific students or schools. If the low quality and poor psychometric properties of an official test just mean that scores are noisy estimates of true learning, this isn’t such a huge problem.
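
The difference between random and systematic measurement error can be seen in a small simulation. This is a sketch under made-up assumptions (two groups of students, a true gap of 5 points, a noisy test, and a hypothetical 3 points of score inflation in one group); none of the numbers come from real exam data.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

n = 20000
true_gap = 5.0  # assumed true difference in learning between the two groups

# True learning levels for two hypothetical groups of students
group_a = [random.gauss(50, 10) for _ in range(n)]
group_b = [random.gauss(50 + true_gap, 10) for _ in range(n)]

# A poor-quality test adds a lot of random noise to everyone's score
noisy_a = [score + random.gauss(0, 15) for score in group_a]
noisy_b = [score + random.gauss(0, 15) for score in group_b]
noisy_gap = mean(noisy_b) - mean(noisy_a)

# Systematic error: scores in group B are also inflated by 3 points
inflated_b = [score + 3.0 for score in noisy_b]
biased_gap = mean(inflated_b) - mean(noisy_a)

print(f"True gap: {true_gap:.2f}")
print(f"Gap measured with random noise only: {noisy_gap:.2f}")
print(f"Gap measured with inflation in group B: {biased_gap:.2f}")
```

Random noise makes the estimate less precise, so you need more students to pin it down, but it does not push the estimated gap in any particular direction. Inflation that is tied to one group shifts the estimate by the full amount of the inflation, however many students you observe.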

Why isn’t there more research done using official test score data? Data quality is one issue, but another big part is the limited accessibility of data. Education writer Matt Barnum wrote recently about “data wars” between researchers fighting to get access to public data in Louisiana and Arizona. When data is made easily available it gets used; a Google Scholar search for the UK “National Pupil Database” finds 2,040 studies.

How do we get more Open Data for Education?
Open data is not a new concept. There is an Open Data Charter defining what open means (Open by default, timely and comprehensive, accessible and usable, comparable and interoperable). The Web Foundation ranks countries on how open their data is across a range of domains in their Open Data Barometer, and there is also an Open Data Index and an Open Data Inventory.

Developing countries are increasingly signing up to transparency initiatives such as the Open Government Partnership, attending the Africa Open Data conference, or signing up to the African data consensus.

But whilst the high-level political backing is often there, the technical requirements for putting together a National Pupil Database are not trivial, and there are costs associated with cleaning and labelling data, hosting data, and regulating access to ensure privacy is preserved.

There is a gap here for a set of standards to be established in how governments should organise their existing test score data, and a gap for financing to help establish systems. A good example of what could be useful for education is the Agriculture Open Data Package: a collaboratively developed “roadmap for governments to publish data as open data to empowering farmers, optimising agricultural practice, stimulating rural finance, facilitating the agri value chain, enforcing policies, and promoting government transparency and efficiency.” The roadmap outlines what data governments should make available, how to think about organising the infrastructure of data collection and publication, and further practical considerations for implementing open data.

Information wants to be free. It’s time to make it happen.

11 October 2017

Why don’t parents value school effectiveness? (because they think LeBron’s coach is a genius)


A new NBER study exploits the NYC centralised school admissions database to understand how parents choose which schools to apply for, and finds (shock!) parents choose schools based on easily observable things (test scores) rather than very difficult to observe things (actual school quality as estimated (noisily!) by value-added).

Value-added models are great: they’re a much fairer way of judging schools than just looking at test scores. Whilst test scores conflate what the student’s home background does with what the school does, value-added models (attempt to) control for a student’s starting level (and therefore all the home inputs up to that point), and just look at the progress that students make whilst at a school.
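
As a rough illustration of the idea, here is a minimal value-added sketch in Python. It assumes the simplest possible setup: one prior score and one final score per student, a single pooled regression of final on prior score, and each school's value-added taken as the average gap between its students' actual and predicted final scores. Real value-added models are considerably richer than this, and the function name, schools, and scores below are all invented.

```python
from collections import defaultdict
from statistics import mean


def school_value_added(records):
    """records: list of (school, prior_score, final_score) tuples.

    Fits one regression line of final score on prior score across all
    students, then averages each school's residuals (actual minus predicted).
    """
    priors = [prior for _, prior, _ in records]
    finals = [final for _, _, final in records]
    prior_mean, final_mean = mean(priors), mean(finals)

    # Ordinary least squares with a single regressor
    slope = sum(
        (p - prior_mean) * (f - final_mean) for p, f in zip(priors, finals)
    ) / sum((p - prior_mean) ** 2 for p in priors)
    intercept = final_mean - slope * prior_mean

    residuals = defaultdict(list)
    for school, prior, final in records:
        residuals[school].append(final - (intercept + slope * prior))
    return {school: mean(values) for school, values in residuals.items()}


# Hypothetical data: school A takes in stronger students who gain 2 points,
# school B takes in weaker students who gain 10 points.
records = [
    ("A", 60, 62), ("A", 70, 72), ("A", 80, 82), ("A", 90, 92),
    ("B", 40, 50), ("B", 50, 60), ("B", 60, 70), ("B", 70, 80),
]

raw_means = {
    school: mean(final for s, _, final in records if s == school)
    for school in ("A", "B")
}
va = school_value_added(records)

print("Average final score:", raw_means)
print("Value-added:", va)
```

In this toy example school A has the higher average final score, but school B comes out ahead on value-added, because its students make more progress than their starting level would predict.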

David Leonhardt put it well:
“For the most part, though, identifying a good school is hard for parents. Conventional wisdom usually defines a good school as one attended by high-achieving students, which is easy to measure. But that’s akin to concluding that all of LeBron James’s coaches have been geniuses.”
Whilst value-added models are fairer on average, they’re typically pretty noisy for any individual school, with large and overlapping confidence intervals. Here’s the distribution of school value-added estimates for Uganda (below). There are some schools at the top and bottom that are clearly a lot better or worse than average (0), but there are also a lot of schools around the middle that are pretty hard to distinguish from each other, and that is using an econometric model to analyse hundreds of thousands of data points. A researcher or policymaker who can observe the test score of every student in the country can’t distinguish between the actual quality of many pairs of schools, and we expect parents to be able to do so on the basis of just a handful of datapoints and some kind of econometric model in their head??




Making school quality visible

If parents don’t value school effectiveness when it is invisible, what happens if we make it visible by publishing data on value-added? There are now several studies looking at the effect of providing information to parents on test score levels, finding that parents do indeed change their behaviour, but there are far fewer studies directly comparing the provision of value-added information with test score levels.

One study from LA did do this, looking at the effect on local house prices of releasing value-added data compared to just test score levels, and finding no additional effect of providing the value-added data. But this might just be because not enough of the right people accessed the online database (indeed, another study from North Carolina found that providing parents with a 1-page sheet of information that had already been online for ages still caused a change in school choice).

It is still possible that publishing and properly targeting information on school effectiveness might change parent behaviour.

Ultimately though, we’re going to keep working on generating value-added data with our partners because even if parents don’t end up valuing the value-added data, there are two other important actors who perhaps might: the government when it is considering how to manage school performance, and the schools themselves.