Town and Village Identifiers (aka shrids)

The backbone of the SHRUG is a set of keys that link all the Indian Population Censuses to each other from 1991 to 2011 at the smallest consistent geographic unit possible. These keys were developed by matching towns and villages across population censuses, with close attention to splits, merges, and other realignments. Prior to SHRUG, we are aware of no dataset that allows accurate linking across all these datasets at the level of the town and village.

Linking these multiple survey rounds has necessitated merging units at different levels of aggregation depending on how those units have changed. The unit of aggregation in the SHRUG is a SHRUG identifier, or a shrid. In many cases, no aggregation was required and shrids can be matched to single towns and villages in all underlying datasets. However, when two units merge in any population census period, we have merged these units in all periods to allow consistent analysis of the unit. Some of the largest units are Delhi and Chandigarh, for which we were not able to retain any aggregations below the entire metropolis because of changes in unit identifiers across the censuses.
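
The aggregation rule above can be illustrated with a small union-find sketch. This is not the SHRUG construction code: the unit labels and the `links` list are hypothetical, and the example only shows why a merge in any one census period forces the same grouping in every period.

```python
# Hypothetical illustration: group census units that are ever linked by a
# merge or split, so each group can be treated as one consistent unit.

def build_groups(units, links):
    """Return a dict mapping each unit to the representative of its group."""
    parent = {u: u for u in units}

    def find(u):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller label as the representative.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    return {u: find(u) for u in units}


# Made-up units: villages A and B are separate in early censuses and appear
# as one merged unit later; village C never changes.
units = ["A", "B", "C"]
links = [("A", "B")]  # A and B are merged in some census period
groups = build_groups(units, links)
```

Here A and B end up in the same group in every period, while C stays a single-unit group, which mirrors the case where no aggregation is required.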

The shrids in SHRUG versions 2.0 and later have been re-defined and improved. Although the vast majority (over 99%) of shrids in SHRUG v1 and v2 represent the same geographic area, we refer to them as "shrid1" and "shrid2" to avoid confusion.

A unique characteristic of location identifiers in the SHRUG is that we treat villages and towns equivalently in cases where similar data is recorded for both. As a result, each SHRUG dataset contains both villages and towns. In contrast, the Indian Population and Economic Censuses use an arbitrary distinction between villages and towns that has no basis in governance, and results in several villages with population over 50,000, and several towns with only a few hundred people. For each shrid, we have included a weights key of rural (from villages), urban (from towns), and total population. Shrid land area is included in the same sectoral divisions. Users can use weights to determine the relative urbanization of each shrid.
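
As a minimal sketch of how the weights can be used, the snippet below computes the urban share of population for two shrids. The shrid labels and population figures are made up, and the function name `urban_share` is hypothetical; the actual variable names in the weights key may differ.

```python
def urban_share(rural_pop, urban_pop):
    """Fraction of a shrid's population that comes from towns."""
    total = rural_pop + urban_pop
    if total == 0:
        return None  # no population recorded, so the share is undefined
    return urban_pop / total


# Made-up weights for two illustrative shrids.
shares = {
    "shrid_a": urban_share(rural_pop=2000, urban_pop=0),     # villages only
    "shrid_b": urban_share(rural_pop=1500, urban_pop=4500),  # mixed shrid
}
```

A share of 0 marks a fully rural shrid and a share of 1 a fully urban one; intermediate values flag shrids that mix villages and towns.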

Note that when rural and urban fields were identical in a survey (for example, "number of employees" in the Economic Census), then we have aggregated the rural and urban data into a single value in shrids. In other cases where rural and urban data were not directly comparable, we package urban and rural shrid data separately. For this reason, the town and village directories are packaged separately, because they contain many sector-specific variables (e.g. rural population, or power supply for agricultural use). Users should be very cautious when attempting to combine sector-specific data or analyze shrids in which urbanization rate changes dramatically across time. One dangerous example is a shrid that was 2 villages in 1991, then re-classified to 1 town and 1 village in 2001, then consolidated into 1 town in 2011. The example shrid would show a large decline in rural population from 1991 to 2001, but this would reflect reclassification, not population loss. However, the main population field shrid_pcYY_pca_tot_pp will accurately track total population across the entire sample.
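
The reclassification example above can be worked through with made-up numbers. The figures below are purely illustrative (they are not SHRUG data) and the dictionary layout is an assumption chosen for clarity.

```python
# Illustrative shrid: 2 villages in 1991, 1 town + 1 village in 2001,
# and a single town in 2011. All population figures are invented.
panel = {
    1991: {"rural": 3000, "urban": 0},
    2001: {"rural": 1100, "urban": 2300},
    2011: {"rural": 0, "urban": 3800},
}

# Total population is comparable across years regardless of classification.
totals = {year: v["rural"] + v["urban"] for year, v in panel.items()}

# Sector-specific change versus total change between 1991 and 2001.
rural_change_91_01 = panel[2001]["rural"] - panel[1991]["rural"]
total_change_91_01 = totals[2001] - totals[1991]
```

In this sketch the rural series drops by 1,900 people between 1991 and 2001 even though total population rises by 400, which is exactly the pattern that reflects reclassification rather than population loss.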

How SHRUG identifiers are named

For shrids containing any towns:

  • PC91: YY-SS-DD-00-TTTTT
  • PC01: YY-SS-DDD-00-TTTTTTTT
  • PC11: YY-SS-DDD-sssss-TTTTTT

For shrids that only contain villages:

  • PC91: YY-SS-DD-ssss-VVVV
  • PC01: YY-SS-DDD-ssss-VVVVVVVV
  • PC11: YY-SS-DDD-sssss-VVVVVV

In all shrids, YY indicates the latest census year to which the shrid is matched. If observations are matched to 2011 PC locations, we use 11. If observations are matched to 2001 PC locations but not 2011, we use 01, and so on. SS indicates the state identifier corresponding to PC year YY. DD/DDD indicates the district code. ss indicates the subdistrict code. TT/VV indicates the census code of the most populous town/village in the shrid, based on PC year YY.
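
The naming scheme above can be unpacked with a short parser. This is a minimal sketch, not an official SHRUG utility: the function `parse_shrid`, the `ShridParts` type, and the example ID are all hypothetical, and the code assumes the five dash-separated fields described in this section. Codes are kept as strings so that leading zeros are preserved.

```python
from typing import NamedTuple

# Two-digit prefixes used in shrid IDs, mapped to full census years.
CENSUS_YEARS = {"91": 1991, "01": 2001, "11": 2011}


class ShridParts(NamedTuple):
    census_year: int
    state: str
    district: str
    subdistrict: str
    unit: str  # census code of the most populous town or village


def parse_shrid(shrid: str) -> ShridParts:
    """Split a shrid ID into its five dash-separated components."""
    parts = shrid.split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 dash-separated fields, got {len(parts)}")
    yy, state, district, subdistrict, unit = parts
    if yy not in CENSUS_YEARS:
        raise ValueError(f"unknown census year prefix: {yy!r}")
    return ShridParts(CENSUS_YEARS[yy], state, district, subdistrict, unit)


# Made-up ID following the PC11 pattern YY-SS-DDD-sssss-VVVVVV.
example = parse_shrid("11-03-045-00231-800123")
```

The parser does not try to decide whether a shrid contains towns, because in the PC11 pattern town and village shrids share the same layout.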

SHRUG location names

We set location names (state, district, subdistrict, town/village) for each shrid. The location names come from the Primary Census Abstract of the census year in the shrid ID; that is, if a shrid starts with “11-”, then location names come from the 2011 PCA. In shrids that contain multiple towns/villages, names are assigned from the town/village with the highest population. To avoid confusion, shrids that combine multiple locations have location names ending with “”. For example, if a shrid combines 2 villages named “Attawa” (population 2,000) and “Badheri” (population 1,000), then the shrid village_name variable will be “Attawa”. The same is true in the rare case that a shrid has locations in different subdistricts or districts. The Delhi and Chandigarh shrids, each of which spans the whole state, are exceptions in which the shrid name is just the state name.

Differences in shrids between v1 and v2

Change 1: Removed the Economic Census from the definition of shrids. The biggest change between shrid1 and shrid2 is that shrids in v1 represented stable geographic units across all Population Censuses and all Economic Censuses (1990, 1998, 2005, 2013). Shrids in v2 are stable geographic units across all Population Censuses. By removing the EC, shrids in v2 are simpler, and are less prone to over-grouping into very large shrids.

Change 2: Added districts and subdistricts as location identifiers.

Substantive change: This change between SHRUG1 and SHRUG2 only matters in the special edge cases where a single town/village is split across district or subdistrict boundaries.

Naming change: The more semantic change is that the shrid ID variable in SHRUG2 is longer. Shrids in v1 had names like “[year]-[state]-[district sometimes]-[town/village]”. Shrids in v2 always have a space for district and subdistrict IDs.

Change 3: Broke 5 very large city shrid1s into smaller parts. In SHRUG v1, several mega-cities (Ahmedabad, for example) were single shrids. In SHRUG v2, they have been broken into hundreds or thousands of smaller shrids, to prevent over-aggregation of data.