Linking the SHRUG to Additional Data

The Population and Economic Censuses (among other administrative data sources in India) contain much more potential data than we are able to include in the SHRUG. Some of the data that can be linked to SHRUG via the raw Population and Economic Censuses include:

  • Disaggregated data about firms, including firm size, source of finance, and public ownership.
  • Additional village characteristics, including post offices, health centers, train stations, and characteristics of agricultural production.
  • Additional town characteristics, including district capitals, transportation, and electricity infrastructure.

To make it easy to link the SHRUG to the underlying data, we include keys that link shrids to each Economic and Population Census in a single step. See the page on the SHRUG keys for details. The keys are unique on Economic and Population Census identifiers but are not necessarily unique on shrids, because a single shrid can correspond to several census units. Researchers wishing to match the SHRUG to multiple rounds of data will need to decide how to deal with these duplicates. We advise collapsing external data sources to the SHRUG geographic unit of interest (shrid, for example) before merging to the core SHRUG. Stata code to link the SHRUG to additional data in the 1991 and 2001 Primary Census Abstracts (PCAs) would take the following form:

/* 1. open the SHRUG PCA (one row per shrid) */
use pc01_pca_shrid.dta, clear

/* 2. prefix shrug data so it does not duplicate */
ren * sh_*
ren sh_shrid shrid

/* 3. merge to the additional population census data.
      This assumes PCA2001.dta already carries shrid (attached via the SHRUG key);
      it may hold several rows per shrid, hence 1:m rather than 1:1 */
merge 1:m shrid using PCA2001.dta, keepusing(...)

/* 4. collapse PCA back to the shrid level, but don’t recollapse SHRUG data */
collapse (sum) pc01_pca_* (firstnm) sh_*, by(shrid)

/* 5. reset names to original format */
ren sh_* *

/* 6. Go back to step 2 in order to merge to additional data */
ren * sh_*
ren sh_shrid shrid
merge 1:m shrid using PCA1991.dta
[etc...]
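
The advice above to collapse external data to the shrid level before merging can be sketched as follows. This is a minimal illustration, not the SHRUG authors' code: the file names my_village_data.dta and shrid_pc01_key.dta, the identifiers pc01_state_id and pc01_village_id, and the variable n_post_offices are all hypothetical placeholders to be replaced with the names in your own data and in the SHRUG keys.

```stata
/* All file and variable names except pc01_pca_shrid.dta and shrid are hypothetical */

/* 1. open the external village-level data, keyed on census identifiers */
use my_village_data.dta, clear

/* 2. attach shrids; the key is unique on census identifiers, so m:1 is safe */
merge m:1 pc01_state_id pc01_village_id using shrid_pc01_key.dta, keepusing(shrid) keep(match) nogenerate

/* 3. collapse to one row per shrid, since several villages can share a shrid */
collapse (sum) n_post_offices, by(shrid)

/* 4. merge to the core SHRUG, which has one row per shrid */
merge 1:1 shrid using pc01_pca_shrid.dta
```

Because the external data are already unique on shrid at step 4, the final merge can be 1:1 and the SHRUG variables never need to be recollapsed. Note that collapse (sum) returns 0 for a shrid whose values are all missing, so missing values may need separate handling.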

There are many administrative and private datasets in India linked to a Population Census year. Users can merge any of these data into the SHRUG by merging their data to shrids via the SHRUG keys. Merging into SHRUG data at higher aggregation levels like district requires an additional merge via the shrid-aggregate location key, then collapsing.
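
The two-step aggregation described above might look like the sketch below. It assumes a shrid-to-district key in which each shrid maps to exactly one district; the file name shrid_district_key.dta and the variable district_id are hypothetical, and pc01_pca_tot_p is assumed to be the total population field in the SHRUG PCA.

```stata
/* shrid_district_key.dta and district_id are hypothetical names;
   pc01_pca_tot_p is assumed to exist in the SHRUG PCA */

/* 1. open shrid-level data */
use pc01_pca_shrid.dta, clear

/* 2. attach the district identifier; m:1 errors out if the key
      is not unique on shrid, which is a useful check */
merge m:1 shrid using shrid_district_key.dta, keepusing(district_id) keep(match) nogenerate

/* 3. aggregate from shrid to district */
collapse (sum) pc01_pca_tot_p, by(district_id)
```

Summing is appropriate for counts such as population; shares or averages would instead need a weighted mean or a recomputation from the summed components.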

For example, the complete PCAs, Town and Village Directories, and Economic Censuses have been made openly available online by the Indian government. The SHRUG contains a subset of fields available in these broader datasets, because creating data fields that are consistently described and aggregated across time requires careful attention and cleaning of the raw data.

The included keys make it easy to conduct analysis at the shrid level using any additional fields. But we caution users bringing in additional data to carefully examine the raw data for miscodings, missing values, outliers, and inconsistent definitions across years, which are common in the raw census data.
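
A few basic checks of this kind could be run on raw data before linking it to the SHRUG. This is only a starting point, and the file name my_village_data.dta, the identifiers pc01_state_id and pc01_village_id, and the variable n_post_offices are hypothetical.

```stata
/* All file and variable names here are hypothetical */
use my_village_data.dta, clear

/* confirm the census identifiers uniquely identify rows */
duplicates report pc01_state_id pc01_village_id

/* count missing values for each numeric variable */
misstable summarize

/* inspect percentiles and extremes for implausible values */
summarize n_post_offices, detail

/* counts should never be negative */
count if n_post_offices < 0
```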