Legislative Constituency Identifiers

SHRUG includes socioeconomic data aggregated to the legislative constituency level, using boundaries from both before and after the 2007 redelimitation. Parliamentary constituency boundaries will be available in a future version of the SHRUG.

For each legislative constituency, we create a location identifier that is consistent for the entire period of a delimitation. These are labeled ac07_id for the 1976–2007 delimitation and ac08_id for the 2008–present delimitation. The identifier takes the form SS-AAA, where SS is the 2011 Population Census state code and AAA is either (i) the last assembly number used internally by the Election Commission under the 1976–2007 delimitation; or (ii) the first assembly number used by the Election Commission after the 2007 delimitation.
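As an illustration of the SS-AAA form, the sketch below builds and parses such identifiers. It assumes a zero-padded two-digit state code and a zero-padded three-digit assembly number, as in the example ID 07-001 used later on this page; the function names make_ac_id and parse_ac_id are hypothetical helpers and are not part of the SHRUG.

```python
import re

# Assumed layout: two digits, a hyphen, three digits (e.g. "07-001").
_AC_ID = re.compile(r"(\d{2})-(\d{3})")


def make_ac_id(state_code: int, assembly_number: int) -> str:
    """Build an SS-AAA identifier from numeric codes, zero-padding both parts."""
    if not 0 < state_code < 100:
        raise ValueError(f"state code out of range: {state_code}")
    if not 0 < assembly_number < 1000:
        raise ValueError(f"assembly number out of range: {assembly_number}")
    return f"{state_code:02d}-{assembly_number:03d}"


def parse_ac_id(ac_id: str) -> tuple[int, int]:
    """Split an SS-AAA identifier into (state_code, assembly_number)."""
    match = _AC_ID.fullmatch(ac_id)
    if match is None:
        raise ValueError(f"not an SS-AAA identifier: {ac_id!r}")
    return int(match.group(1)), int(match.group(2))


print(make_ac_id(7, 1))       # 07-001
print(parse_ac_id("07-001"))  # (7, 1)
```

Because ac07_id and ac08_id refer to different delimitations, the same SS-AAA string in the two variables does not necessarily denote the same constituency.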

Limitations of constituency data

First, the Election Commission of India (ECI) has not used constituency identifiers consistently, which sometimes makes it challenging to link constituencies over time. Our approach makes it easy to link a constituency in Jharkhand to the same constituency when it was part of Bihar, but it causes some discrepancies with the constituency numbers used by the ECI in some years. We do not include the 20 constituencies in Uttar Pradesh that were reformed into the 70 constituencies of Uttarakhand in 2001, because we could not obtain a high quality map of the prior UP constituencies. However, the 70 Uttarakhand constituencies are included. We also do not include post-delimitation Jharkhand because our constituency map had errors in this state. A future version will correct this.

Please note that constituency identifiers are extremely inconsistent across data sources; often one set of numeric identifiers overlaps well, while others within the same state do not. While the numeric codes can be useful for matching, the name matches should always be verified. Kerala, Goa, Tripura and Sikkim are missing from the 2007 constituency SHRUG because our constituency maps for them were of particularly low quality. It was particularly difficult to assign villages to constituencies in these four states because Kerala has very large villages and the other three have very small constituencies and misaligned shapefiles.
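The advice to verify name matches after matching on numeric codes could be sketched as follows. This is only an illustration: the two input dictionaries, the placeholder constituency names, the 0.8 similarity threshold, and the helper names are all assumptions, and the string comparison uses Python's standard difflib rather than any method used to build the SHRUG.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case a name and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized constituency names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_with_name_check(left, right, threshold=0.8):
    """Match on ID, then split matched IDs into (verified, suspect) by name.

    left and right map constituency ID -> constituency name.
    The threshold is an arbitrary illustrative cut-off.
    """
    verified, suspect = [], []
    for ac_id, name in left.items():
        if ac_id not in right:
            continue  # no code match at all; handle separately
        if name_similarity(name, right[ac_id]) >= threshold:
            verified.append(ac_id)
        else:
            suspect.append(ac_id)
    return verified, suspect


# Placeholder names, not real constituencies.
source_a = {"07-001": "Northtown", "07-002": "Eastfield"}
source_b = {"07-001": "NORTH TOWN", "07-002": "Westbrook"}

verified, suspect = match_with_name_check(source_a, source_b)
print("verified:", verified)
print("check by hand:", suspect)
```

IDs that land in the suspect list share a numeric code but not a name, which is the pattern to inspect manually before merging.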

Second, in constituencies that contain or overlap with large cities, our constituency population data and population-weighted variables may be distorted. The constituency SHRUG was assembled by aggregating shrid data. This requires a key matching shrids to constituencies; a detailed description of how we made the shrid-constituency keys is included below. Because the maximum electorate of a constituency is lower than the population of India's largest cities, a small subset of large city shrids span multiple constituencies. To calculate constituency population, we aggregate the population of shrids (or pieces of shrids) that intersect with the constituency. Without information on the spatial distribution of urban populations, however, we must assume (however incorrectly) that population is evenly distributed within each shrid. In the small minority of constituencies that contain large cities, this assumption leads to substantial under-estimation of population in constituencies with small land area and over-estimation in constituencies with large land area. The population distortion biases any population-weighted variables (means and counts) in the same directions.

In the SHRUG, we have already addressed the most biased data by dropping constituency population and population-weighted variables in the 10% of cases where publicly available electorate counts and our population estimates disagreed most severely, in either direction.

The shrid-constituency keys

As described above, a small minority of large urban shrids span multiple constituencies. For these shrids, we take an approach that breaks shrids into component pieces, called "fragments." A fragment is the piece of one shrid contained entirely in a single constituency. The fragment approach prevents over-aggregation of data across multiple constituencies spanned by single shrids. Thus, the shrid-constituency keys link shrid fragments to constituencies.

Within each shrid, we make a shrid fragment weight, the share of the shrid contained in the fragment. The fragment weight is normalized to sum to 1 within shrids, and indicates the distribution of the shrid across constituencies (e.g. a 30-70 split). Fragment weights are derived from two sources. First, for a small subset of 2008 constituencies containing large towns, we use the ECI's database of constituency population by town/ward ID, which is linkable to both shrids and constituencies. The ECI population database is the most reliable source, but it is only available for a small number of urban constituencies. Second, for all remaining constituencies, we overlay the shrid and constituency maps and assign each fragment weight as the share of shrid area lying in each constituency. Importantly, we do not mix population and area weights within a shrid; each shrid's fragment weights are based either all on population or all on land area. Thus, fragment weight is a general measure of each constituency's "importance" or "share" of the shrid, without specifying whether that share is measured in population or in land area. For example, if the weight of a shrid fragment linked to constituency 07-001 is 0.10, it indicates that 10% of the shrid lies within 07-001.
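The normalization described above can be sketched as follows. The input rows are hypothetical: each gives a shrid, a constituency ID, and a positive measure of the overlap, which would be either population or land area but never a mix of the two within one shrid. The function name fragment_weights and the shrid labels are illustrative assumptions, and the sketch assumes one row per shrid-constituency pair.

```python
from collections import defaultdict

# Hypothetical overlay output: (shrid, constituency ID, overlap area in sq km).
overlaps = [
    ("shrid_A", "07-001", 3.0),
    ("shrid_A", "07-002", 7.0),
    ("shrid_B", "07-002", 5.0),
]


def fragment_weights(rows):
    """Return {(shrid, constituency): weight}, normalized to sum to 1 per shrid.

    Assumes every measure is positive and each pair appears only once.
    """
    totals = defaultdict(float)
    for shrid, _constituency, measure in rows:
        totals[shrid] += measure
    return {
        (shrid, constituency): measure / totals[shrid]
        for shrid, constituency, measure in rows
    }


weights = fragment_weights(overlaps)
for (shrid, constituency), weight in weights.items():
    print(shrid, constituency, round(weight, 2))
```

Here shrid_A is split 30-70 between two constituencies, while shrid_B lies entirely in one constituency and so has a single fragment with weight 1.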

In the shrid-constituency keys, shrid fragments do have IDs, but these are simply numeric IDs generated by DDL and do not correspond to any administrative unit. The final shrid-constituency keys are unique on shrid ID and fragment ID.

Constituency population and land area

We estimate constituency land area as the area of the polygon in the constituency map.

Estimating constituency population is much more complicated. We estimate constituency population by aggregating the 2001 Primary Census Abstract population of shrids (or shrid fragments) intersecting with the constituency. To calculate shrid fragment population, we multiply shrid population by fragment weight. Unfortunately, this implicitly assumes that population is uniformly distributed within shrids, but in the absence of very granular population density data, this assumption is necessary.
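A minimal sketch of this aggregation, using made-up numbers, is shown below. The dictionary of shrid populations, the key rows, and the function name constituency_population are hypothetical; only the arithmetic (shrid population times fragment weight, summed by constituency) follows the description above.

```python
from collections import defaultdict

# Hypothetical shrid populations.
shrid_pop = {"shrid_A": 1000, "shrid_B": 400}

# Hypothetical key rows: (shrid, fragment ID, constituency ID, fragment weight).
key = [
    ("shrid_A", 1, "07-001", 0.25),
    ("shrid_A", 2, "07-002", 0.75),
    ("shrid_B", 1, "07-002", 1.0),
]


def constituency_population(shrid_pop, key):
    """Sum fragment populations (shrid population x weight) by constituency."""
    population = defaultdict(float)
    for shrid, _fragment_id, constituency, weight in key:
        population[constituency] += shrid_pop[shrid] * weight
    return dict(population)


print(constituency_population(shrid_pop, key))
# {'07-001': 250.0, '07-002': 1150.0}
```

In this example a quarter of shrid_A's population is assigned to 07-001 regardless of where its residents actually live, which is the uniform-distribution assumption in action.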

We then validate our constituency population estimates by comparing them to electorate data from the Trivedi Centre for Political Data. We calculate the ratio of our estimate to the Trivedi electorate, and set constituency population to missing in constituencies where the ratio falls outside the 5th to 95th percentile range. Effectively, we set population to missing in the 10% of constituencies where Trivedi and SHRUG disagree most severely. We do this for the pre-2007 and post-2007 data separately. In pre-2007 data, we drop outliers where the ratio falls outside the range [0.5, 3], and in post-2007 data, outside the range [0.6, 1.6]. As a consequence, all population-weighted variables in the SHRUG are missing in these constituencies.
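The outlier rule can be illustrated with the sketch below, which applies the post-2007 bounds [0.6, 1.6] quoted above to made-up estimates and electorates. The function name, the input dictionaries, and the use of None to represent a missing value are assumptions for illustration only.

```python
def validated_population(estimate, electorate, low, high):
    """Return the estimate if estimate / electorate lies in [low, high].

    Returns None (missing) when the ratio is out of range or cannot be computed.
    """
    if estimate is None or not electorate:
        return None
    ratio = estimate / electorate
    return estimate if low <= ratio <= high else None


# Hypothetical post-2007 values.
estimates = {"07-001": 250000.0, "07-002": 90000.0}
electorates = {"07-001": 200000, "07-002": 180000}

cleaned = {
    ac_id: validated_population(estimates[ac_id], electorates.get(ac_id), 0.6, 1.6)
    for ac_id in estimates
}
print(cleaned)
# {'07-001': 250000.0, '07-002': None}
```

The first constituency has a ratio of 1.25 and is kept; the second has a ratio of 0.5, below the lower bound, so its population is set to missing. For pre-2007 data the same function would be called with the bounds 0.5 and 3.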