Contribution Checklist

What form should my data take?

Your data should be in tabular format linked to SHRUG identifiers. Most often, data will be contributed at the shrid level. Occasionally other dimensions (such as time) will be required to uniquely identify the rows, or data will be contributed at other levels, such as the subdistrict or district.

You can link your data from the various census identifiers included in the SHRUG keys, e.g. the PC11 village-town or shrid key. For example, if you have village- or town-level data that you would like to contribute, you will first need to convert from PC11 identifiers to the shrid.
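
The conversion from PC11 identifiers to shrid can be sketched as a lookup against a key followed by aggregation. This is a minimal sketch under stated assumptions: apart from `pc11_state_id`, the column names (`pc11_village_id`, `num_schools`), the sample values, and the choice to sum within each shrid are all hypothetical and do not describe the actual SHRUG key layout. Summing only makes sense for count variables; other variables may need a different aggregation rule.

```python
from collections import defaultdict

# Hypothetical key rows mapping PC11 village identifiers to a shrid.
# Column names and values are illustrative, not the real SHRUG key.
key_rows = [
    {"pc11_state_id": "01", "pc11_village_id": "000001", "shrid": "shrid_a"},
    {"pc11_state_id": "01", "pc11_village_id": "000002", "shrid": "shrid_a"},
    {"pc11_state_id": "01", "pc11_village_id": "000003", "shrid": "shrid_b"},
]

# Hypothetical village-level data to be contributed.
village_rows = [
    {"pc11_state_id": "01", "pc11_village_id": "000001", "num_schools": 2},
    {"pc11_state_id": "01", "pc11_village_id": "000002", "num_schools": 1},
    {"pc11_state_id": "01", "pc11_village_id": "000003", "num_schools": 4},
    {"pc11_state_id": "01", "pc11_village_id": "000009", "num_schools": 3},
]


def link_to_shrid(data_rows, key_rows, value_col):
    """Sum value_col within each shrid; return (totals, unmatched rows)."""
    lookup = {
        (k["pc11_state_id"], k["pc11_village_id"]): k["shrid"] for k in key_rows
    }
    totals = defaultdict(int)
    unmatched = []
    for row in data_rows:
        shrid = lookup.get((row["pc11_state_id"], row["pc11_village_id"]))
        if shrid is None:
            # Keep unmatched rows for inspection instead of dropping them silently.
            unmatched.append(row)
            continue
        totals[shrid] += row[value_col]
    return dict(totals), unmatched


totals, unmatched = link_to_shrid(village_rows, key_rows, "num_schools")
print(totals)
print(len(unmatched))
```

Returning the unmatched rows separately makes it easy to see how much of the source data failed to link before you submit.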

If you have spatial data or other data that does not link easily to shrid, or data that exists at a higher level of aggregation but would still make a meaningful open-source contribution, please contact us at info@devdatalab.org and we may work with you to develop a plan on how to incorporate it.

Data Format and Content

Please ensure that your data meets these requirements before submitting:

  • Data is in tabular format linked to SHRUG identifiers (usually but not necessarily at the shrid level)
  • Data is linked from census identifiers included in SHRUG keys (e.g., PC11 village-town or shrid key)
  • Each row is uniquely identifiable (usually by geographic variables alone, but sometimes other dimensions, e.g. time, are required)
  • All variable names are lowercase with underscores, no camelCase (see note 1)
  • Descriptive variable labels are included in the dataset if submitting data in .dta format
  • A key is provided for any data that is not at the shrid level (see note 2)
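
Two of the requirements above, variable naming and row uniqueness, are easy to check before submitting. The following is a rough sketch, not an official validator: the regular expression is one plausible reading of "lowercase with underscores", and the function names and sample rows are hypothetical.

```python
import re
from collections import Counter

# Lowercase letters and digits, starting with a letter, with words joined by
# single underscores. This pattern is an assumption about the convention.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def bad_variable_names(columns):
    """Return the column names that do not follow the naming convention."""
    return [c for c in columns if not SNAKE_CASE.match(c)]


def duplicate_ids(rows, id_cols):
    """Return the identifier tuples that appear in more than one row."""
    counts = Counter(tuple(row[c] for c in id_cols) for row in rows)
    return [ids for ids, n in counts.items() if n > 1]


# Hypothetical shrid-by-year panel with one repeated row.
rows = [
    {"shrid": "shrid_a", "year": 2011},
    {"shrid": "shrid_a", "year": 2012},
    {"shrid": "shrid_a", "year": 2011},
]

print(bad_variable_names(["pc11_state_id", "pc11StateID", "pc11-state-id"]))
print(duplicate_ids(rows, ["shrid", "year"]))
```

If `duplicate_ids` returns anything for the identifiers you intend to use, either the data contains repeated rows or another dimension, such as time, is needed to identify each row.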

Metadata and Documentation

One of the most important elements of making a contribution to the SHRUG is to create the necessary documentation that describes geographic and temporal coverage, dataset source, variable construction and usage notes, and any other information relevant to end users. We have a metadata template that ensures this information is collected and available to users. You will include this metadata sheet in your ZIP archive and will also upload it to the SHRUG Data Contribution Form.

If fields aren't relevant to your data (e.g. sampling or weighting), enter "N/A". If you have any questions, please reach out to us at info@devdatalab.org.

Code

As appropriate, we expect contributors to provide source code that creates their data. We also expect contributors to provide example code scripts for analysis and integration with the SHRUG.

  • Source code that creates your data
  • Example code script(s) for analysis and integration with the SHRUG

File Structure

Your contribution must include a ZIP archive named archive.zip with the following structure:

archive.zip
 |-- code/
 |------- source_code/
 |-------------- source code that creates your data
 |------- usage_example/
 |-------------- example code script(s) for analysis and integration with the SHRUG
 |-- data/
 |------- all data files
 |-- documentation/
 |------- all supporting documentation files, e.g. from original producer
 |-- metadata/
 |------- metadata.xlsx (variable, table, and dataset-level, following the DDL template)
 |-- README.md (included in the template, contains license)
 |-- citation.tex (citation for your data)
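
Before uploading, the archive layout can be checked with a short script. This is a sketch only: it assumes the folders sit at the root of the ZIP file (not inside an extra top-level directory) and that each required folder contains at least one file. The helper names and the sample file names such as `build.py` are hypothetical.

```python
import io
import zipfile

# Required entries, taken from the layout above.
REQUIRED_DIRS = [
    "code/source_code/",
    "code/usage_example/",
    "data/",
    "documentation/",
    "metadata/",
]
REQUIRED_FILES = ["metadata/metadata.xlsx", "README.md", "citation.tex"]


def missing_entries(names):
    """Return required folders and files absent from a list of member names."""
    missing = [d for d in REQUIRED_DIRS if not any(n.startswith(d) for n in names)]
    missing += [f for f in REQUIRED_FILES if f not in names]
    return missing


def check_archive(path_or_file):
    """Open a ZIP archive (path or file-like object) and list what is missing."""
    with zipfile.ZipFile(path_or_file) as zf:
        return missing_entries(zf.namelist())


# Build a small in-memory archive that lacks citation.tex, then check it.
sample_names = [
    "code/source_code/build.py",
    "code/usage_example/example.py",
    "data/mydata.csv",
    "documentation/notes.pdf",
    "metadata/metadata.xlsx",
    "README.md",
]
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for name in sample_names:
        zf.writestr(name, "")
buffer.seek(0)
print(check_archive(buffer))
```

For a real submission, `check_archive("archive.zip")` would run the same check on the file on disk; an empty list means every required folder and file was found.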

Once you've met all these requirements, you're ready to submit your contribution.

SHRUG Contribution Form


  1. Good: pc11_state_id. Bad: pc11StateID. Bad: pc11_State_ID. Bad: pc11-state-id

  2. A key is a table that matches one set of ID variables to another. For example, an Economic Census 2013 to Population Census 2011 district-level key would match EC13 district IDs to PC11 district IDs. Similarly, if you are contributing RBI bank branch data, you should provide a key that matches bank branch IDs to shrid.