09 November 2011

Using "reshape" to generate country-year data in Stata

The other day, I observed a colleague creating a country-year dataset by hand--using Excel to type out a list of countries and then manually add years. It took her eight or ten hours.

This is a little inefficient.

So I thought I'd give a very quick tutorial on how to do this in 10 seconds.

First, open Stata and create a new file. (For convenience, I'll refer to this as "country.dta".)

Create one new string variable, called "country".

Populate this with some arbitrary number of country names--"Belgium", "France", "Germany", whatever. Since this is an example, four or five will be fine.
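
If you'd rather type the names at the command line than in the Data Editor, one way is the input command. This is only a sketch: the three names and the str20 width are arbitrary choices for illustration.

```stata
* Start from an empty dataset
clear

* str20 reserves up to 20 characters per name; the width is an arbitrary choice
input str20 country
"Belgium"
"France"
"Germany"
end
```

Add or swap names between the input line and end as needed.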

Next, create some number of years, like so:

gen year1960=1960
gen year1961=1961
gen year1962=1962
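
For more than a handful of years, typing one gen line per year gets tedious. A forvalues loop can stand in for the three lines above (run one or the other, not both, since Stata will refuse to create a variable that already exists). A minimal sketch, using the same 1960-1962 range:

```stata
* Creates year1960, year1961, year1962, each constant across all rows
forvalues y = 1960/1962 {
    gen year`y' = `y'
}
```

Change 1962 to whatever final year you need and the rest of the tutorial works unchanged.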

You should now have four variables--"country", "year1960", "year1961", and "year1962"--of which the latter three each hold a single value (1960, 1961, or 1962) in every row. To see your data, use a command such as list or browse.


Now, type

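* stack year1960, year1961, year1962 into one "year" variable, one row per country-year
reshape long year, i(country)
* _j holds the suffix of each old variable name, which here just duplicates year
drop _j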

Once again, use a command such as list or browse to see your data.

You'll see that you now have your data arrayed in country-year format, with one row for each country in each year.
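
If you want more than a visual check, a few standard commands can confirm the structure. This is a sketch that assumes the variables are named country and year, as in this example:

```stata
* isid exits with an error unless country and year jointly identify each row
isid country year

* Each country should show the same frequency: one row per year
tabulate country

* Print the rows grouped by country
sort country year
list, sepby(country)
```

If isid returns without complaint, every country-year pair appears exactly once.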

This is a toy example, but the advantage is obvious: the same few commands work for any number of countries and years. For more on the tools that went into this, see the UCLA computing site or type

help reshape

from the Stata command line.

1 comment:

  1. Thanks, you just saved me 10 hours of my life :-)