SHRUG open-source shapefile

In SHRUG v2, we include a set of open-source maps of town/village, subdistrict, district, state, and shrid boundaries. All shapes except shrids represent 2011 Population Census locations. These maps are stitched together from multiple open-source maps, each of which was incomplete on its own. Sources include the SEDAC data center at Columbia (which hosts 1991 and 2001 maps that we carried forward using our SHRUG town and village keys), Bharatmaps, DataMeet, and the Administrative Atlas of India. We linked these sources through common location codes, georeferenced them when necessary, and geometrically harmonized them to the best of our ability.

While DDL uses constituency boundary maps in our own calculations, those maps are not published under an open-source license. Users seeking open-source constituency maps are encouraged to check out DataMeet's open-source Assembly Constituency files on GitHub.

We extensively reviewed the open-source maps against our proprietary geographic data (which are not published under an open-source license). We found that, while the locality boundaries for India do not agree across sources, the misalignments are not drastic; we conclude that the administrative boundaries in our proprietary maps are slightly more accurate. Location coverage, measured as the match rate of location identifiers between official 2011 Population Census localities and the open-source map, is near perfect for all states.

We believe that, at this time, these maps are the most accurate and complete publicly available source for India's town and village boundaries.

SHRID2 open shapefile

Please note that the open PC11 town/village shapefile has a different number of polygons from the open shrid-level shapefile. The latter is constructed by dissolving village/town polygons based on the relationship between population census units and shrids. The open PC11 town/village shapefile has 649618 unique polygons, while the open shrid-level shapefile has 575153 unique polygons. For a more concrete example, consider the Delhi shrid (11-07-090-00431-800441). This shrid comprises 246 unique population census units (polygons), which are aggregated into a single shrid.

[Figure: Delhi SHRID]
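
The relationship between the two polygon counts can be illustrated with a toy example. This is a minimal sketch rather than the DDL build code: the unit names and the `unit_to_shrid` mapping are hypothetical stand-ins for the SHRUG town/village keys, and a real dissolve would also union the geometries in a GIS library instead of only counting them.

```python
from collections import defaultdict

# Hypothetical key: each PC11 town/village polygon maps to exactly one shrid.
unit_to_shrid = {
    "village_a": "shrid_1",
    "village_b": "shrid_1",
    "town_c": "shrid_1",
    "village_d": "shrid_2",
}

def group_units_by_shrid(mapping):
    """Collect the PC11 units that would be dissolved into each shrid."""
    groups = defaultdict(list)
    for unit, shrid in mapping.items():
        groups[shrid].append(unit)
    return dict(groups)

groups = group_units_by_shrid(unit_to_shrid)
n_pc11_polygons = len(unit_to_shrid)   # polygons before dissolving
n_shrid_polygons = len(groups)         # polygons after dissolving
print(n_pc11_polygons, n_shrid_polygons)
```

Because several census units can share one shrid, the dissolved layer can never have more polygons than the town/village layer.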

The polysource variable in the shrid-level shapefile describes how polygons in the PC11 town/village shapefile were aggregated into shrids.

  • 1 shrid = 1 pc polygon refers to a 1:1 match between one PC11 town/village polygon and the shrid (n = 570666)
    • Example: 11-09-178-00908-167603 [Figure: example polygon]
  • 1 shrid = 1 contiguous set of pc polygons implies that the shrid is made up of multiple PC11 polygons, which have been dissolved into one contiguous polygon (n = 4495)
    • Example: 11-08-113-00567-084285 [Figure: example polygon]
  • 1 shrid = 1 largest land pc polygon implies that the largest piece of the PC11 town/village multipolygon, by land area, was selected as the shrid polygon (n = 84)
  • 1 shrid = 1 largest pop pc polygon implies that the largest piece of the PC11 town/village multipolygon, by population, was selected as the shrid polygon (n = 128)
  • 1 shrid = 1 largest pop & land pc polygon implies that the largest piece of the PC11 town/village multipolygon, by both land area and population, was selected as the shrid polygon (n = 116)
  • 1 shrid = multi pc polygons within 10km implies that the shrid is a multipolygon but the distance between the farthest pieces is less than 10 km. In this case, we leave the shrid in pieces (n = 659)
  • 1 shrid = 1 manually picked polygon from far multipolygons, shrid pop > 5k implies that we cross-referenced the location of the town/village in Google Maps and manually selected the one polygon in the right location (n = 5)
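
Because a GeoPackage is a SQLite database, the polysource categories can be tabulated without any GIS software. The following is a minimal sketch under stated assumptions: the feature table name is discovered from the standard `gpkg_contents` table rather than assumed, polysource is assumed to be stored as the text labels listed above, and the `shrid2` column and `shrid2_demo` table in the in-memory stand-in are hypothetical.

```python
import sqlite3
from collections import Counter

def feature_tables(conn):
    """List the feature tables registered in a GeoPackage."""
    rows = conn.execute(
        "SELECT table_name FROM gpkg_contents WHERE data_type = 'features'"
    )
    return [r[0] for r in rows]

def count_polysource(conn, table):
    """Count shrids per polysource value in the given feature table."""
    # Table names cannot be bound as parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    rows = conn.execute(f"SELECT polysource FROM {quoted}")
    return Counter(r[0] for r in rows)

def demo_connection():
    """Build a tiny in-memory stand-in for shrid2.gpkg (attributes only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gpkg_contents (table_name TEXT, data_type TEXT)")
    conn.execute("INSERT INTO gpkg_contents VALUES ('shrid2_demo', 'features')")
    conn.execute("CREATE TABLE shrid2_demo (shrid2 TEXT, polysource TEXT)")
    conn.executemany(
        "INSERT INTO shrid2_demo VALUES (?, ?)",
        [
            ("demo-1", "1 shrid = 1 pc polygon"),
            ("demo-2", "1 shrid = 1 pc polygon"),
            ("demo-3", "1 shrid = multi pc polygons within 10km"),
        ],
    )
    return conn

conn = demo_connection()  # for the real file: sqlite3.connect("shrid2.gpkg")
table = feature_tables(conn)[0]
counts = count_polysource(conn, table)
for value, n in counts.most_common():
    print(f"{n:>3}  {value}")
```

The same counts could be used to restrict an analysis to shrids with a 1:1 polygon match.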

Note

Satellite data released as part of SHRUG v2 are estimated from our proprietary geographic data, not from the open shapefiles. Estimates that you compute from the open shapefiles may therefore differ from the released values.

File formats

All maps are available in two file formats, ESRI shapefile (.shp) and GeoPackage (.gpkg). Due to variable name length constraints in shapefiles, 2011 Population Census location identifiers have been shortened to less than 10 characters (e.g. pc11_state_id is shortened to pc11_s_id). The GeoPackage versions keep the full location ID variable names, so they merge directly with the rest of the SHRUG.
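
If you work from the shapefile versions, the shortened identifiers need to be renamed before merging with other SHRUG tables. A minimal sketch of such a rename follows; only the pc11_s_id to pc11_state_id pair is documented here, and the other entries in the mapping are assumptions that should be checked against the actual shapefile fields.

```python
# Only pc11_s_id -> pc11_state_id is documented; the other entries are
# hypothetical and should be verified against the shapefile's field list.
SHORT_TO_FULL = {
    "pc11_s_id": "pc11_state_id",
    "pc11_d_id": "pc11_district_id",      # assumed
    "pc11_sd_id": "pc11_subdistrict_id",  # assumed
}

def restore_names(record, mapping=SHORT_TO_FULL):
    """Return a copy of an attribute record with shortened keys expanded."""
    return {mapping.get(key, key): value for key, value in record.items()}

# Hypothetical attribute record read from the shapefile version.
row = {"pc11_s_id": "07", "pc11_d_id": "090", "polysource": "1 shrid = 1 pc polygon"}
restored = restore_names(row)
print(restored)
```

Keys that are not in the mapping, such as polysource, pass through unchanged.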

Limitations

  • Village and town boundaries are best understood to represent true boundaries with 0–1 km of measurement error. This seems to be the state of play with Indian village maps — every village map (open or proprietary) that we have come across has had at least this level of inaccuracy. As such, we suggest caution in using these boundaries to identify differences along narrow spatial dimensions, like neighboring village boundaries.

  • A number of locations in India appear to be represented in official data only as points, not as polygons — for instance, in forest areas and in the northeast. Where we only had point geometries, we generated boundaries using Thiessen polygons, constraining the size of each unit to its spatial area according to the village directory. We validated boundaries against several external sources including satellite imagery as well as OpenStreetMap data. However, India is huge and of course we could not validate every village; some errors are likely to remain.
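
To illustrate what a Thiessen polygon is: each point is assigned every location that is closer to it than to any other point. The sketch below approximates the resulting cell areas by sampling a regular grid. It is only an illustration of the basic idea, with hypothetical planar coordinates and village names, and it does not reproduce the area constraint from the village directory described above.

```python
import math

def nearest_seed(x, y, seeds):
    """Return the name of the seed point closest to (x, y)."""
    return min(seeds, key=lambda name: math.dist((x, y), seeds[name]))

def thiessen_cell_areas(seeds, width, height, step=1.0):
    """Approximate each seed's Thiessen cell area by sampling a regular grid."""
    areas = {name: 0.0 for name in seeds}
    nx, ny = int(width / step), int(height / step)
    for i in range(nx):
        for j in range(ny):
            # Sample at the centre of each grid square.
            cx, cy = (i + 0.5) * step, (j + 0.5) * step
            areas[nearest_seed(cx, cy, seeds)] += step * step
    return areas

# Two hypothetical point-only villages in a 10 x 10 study area.
seeds = {"village_a": (2.0, 5.0), "village_b": (8.0, 5.0)}
areas = thiessen_cell_areas(seeds, width=10.0, height=10.0)
print(areas)
```

With two points placed symmetrically, the dividing line is the perpendicular bisector between them, so each cell covers half of the study area.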

Centroid alignment

The boundary misalignment measures above were effective at detecting misalignment for larger administrative units. To quantify the overall agreement of locality boundaries, we calculated the distance between town/village polygon centroids in the open-source and proprietary maps. One third of all proprietary map centroids lie within 500 meters of their open-source equivalents, and two thirds lie within 1 kilometer. Only 99 localities have centroids more than 50 kilometers apart.
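
A comparison of this kind can be sketched as follows. This is an illustration under assumptions, not the calculation used for the figures above: it takes centroids as latitude/longitude pairs, uses the haversine great-circle distance, and the coordinate pairs are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def share_within(distances_km, threshold_km):
    """Fraction of distances at or below the threshold."""
    return sum(d <= threshold_km for d in distances_km) / len(distances_km)

# Hypothetical centroid pairs: (open lat, open lon, proprietary lat, proprietary lon)
pairs = [
    (28.6000, 77.2000, 28.6000, 77.2000),  # identical centroids
    (28.6000, 77.2000, 28.6030, 77.2000),  # a few hundred metres apart
    (28.6000, 77.2000, 28.6080, 77.2000),  # just under 1 km apart
    (28.6000, 77.2000, 28.7000, 77.2000),  # roughly 11 km apart
]
distances = [haversine_km(*p) for p in pairs]
print(share_within(distances, 0.5), share_within(distances, 1.0))
```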

Data quality measures

The table shrid2_spatial_stats.dta contains a variety of data quality measures, including flags about how the individual shrid polygon was generated, distances to towns, and a variety of other checks. The variables within this file are catalogued on the metadata page of this website. Some variables (such as polysource) are available both in shrid2_spatial_stats.dta and the shrid2.gpkg geometries themselves.