Contribution Checklist
What form should my data take?
Your data should be in tabular format linked to SHRUG identifiers. Most often, data will be contributed at the shrid level. Occasionally other dimensions (such as time) will be required to uniquely identify the rows, or data will be contributed at other levels, such as the subdistrict or district.
You can link your data through the census identifiers included in the SHRUG keys, e.g. the PC11 village-town to shrid key. For example, if you have village- or town-level data that you would like to contribute, you will first need to convert from PC11 identifiers to the shrid.
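As a rough illustration of the village-to-shrid conversion, the Python sketch below maps PC11 village rows to shrids through a key and sums a count variable within each shrid. Everything specific here is an assumption: the function name `villages_to_shrid`, the key's column names (`pc11_state_id`, `pc11_village_id`, `shrid`), the placeholder shrid values, and the choice to sum, which only suits additive variables such as counts (rates or averages need weights or another rule). Check the column names against the SHRUG key you actually use.

```python
from collections import defaultdict


def villages_to_shrid(village_rows, key_rows, value_vars):
    """Aggregate PC11 village-level rows to shrid level by summing value_vars.

    Assumes each (pc11_state_id, pc11_village_id) pair maps to exactly one
    shrid in the key, and that summing is the right aggregation for every
    variable in value_vars (true for counts, not for rates or averages).
    """
    # Build a lookup from the PC11 village identifier to its shrid
    lookup = {}
    for row in key_rows:
        vid = (row["pc11_state_id"], row["pc11_village_id"])
        if vid in lookup and lookup[vid] != row["shrid"]:
            raise ValueError(f"village {vid} maps to more than one shrid")
        lookup[vid] = row["shrid"]

    # Sum each value variable within shrid; collect villages missing from the key
    totals = defaultdict(lambda: {v: 0 for v in value_vars})
    unmatched = []
    for row in village_rows:
        vid = (row["pc11_state_id"], row["pc11_village_id"])
        shrid = lookup.get(vid)
        if shrid is None:
            unmatched.append(vid)
            continue
        for v in value_vars:
            totals[shrid][v] += row[v]

    # One row per shrid, with the ID variable first
    shrid_rows = [{"shrid": s, **vals} for s, vals in sorted(totals.items())]
    return shrid_rows, unmatched


# Hypothetical key and village data; the shrid values are placeholders
key = [
    {"pc11_state_id": "09", "pc11_village_id": "000101", "shrid": "shrid_a"},
    {"pc11_state_id": "09", "pc11_village_id": "000102", "shrid": "shrid_a"},
    {"pc11_state_id": "09", "pc11_village_id": "000103", "shrid": "shrid_b"},
]
villages = [
    {"pc11_state_id": "09", "pc11_village_id": "000101", "num_schools": 2},
    {"pc11_state_id": "09", "pc11_village_id": "000102", "num_schools": 1},
    {"pc11_state_id": "09", "pc11_village_id": "000103", "num_schools": 4},
    {"pc11_state_id": "09", "pc11_village_id": "000999", "num_schools": 3},
]
shrid_rows, unmatched = villages_to_shrid(villages, key, ["num_schools"])
print(shrid_rows)  # [{'shrid': 'shrid_a', 'num_schools': 3}, {'shrid': 'shrid_b', 'num_schools': 4}]
print(unmatched)   # [('09', '000999')]
```

Unmatched villages are returned rather than silently dropped, so that you can investigate them and describe any losses in your documentation.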
If you have spatial data or other data that does not link easily to shrid, or data that exists at a higher level of aggregation but would still make a meaningful open-source contribution, please contact us at info@devdatalab.org and we may work with you to develop a plan to incorporate it.
Data Format and Content
Please ensure that your data meets these requirements before submitting:
- Data is in tabular format linked to SHRUG or PC11 identifiers (usually, but not necessarily, at the 2011 Population Census village-town or shrid level)
- Each row is uniquely identifiable (usually by geographic variables alone, but sometimes other dimensions, e.g. time, are required)
- All variable names and filenames are lowercase with underscores, with no camelCase[1]
- Descriptive variable labels are included in the dataset if submitting data in .dta format
- A key is provided for any data that is not at the shrid level[2]
- Any variables found elsewhere in the SHRUG have the same names (e.g. if your dataset has PC11 village codes, that variable must be pc11_village_id, not something like village_code)
- Any variables found elsewhere in the SHRUG are in the same format (e.g. pc11_village_id is a string variable with leading zeroes, not an integer)
- All irrelevant and intermediate variables have been removed
- ID variables have been moved to the front of the data (i.e. they are the first variables in the dataset)
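Several items on this checklist can be checked mechanically before you submit. The sketch below is a hypothetical helper, `check_dataset`, written against plain Python rows rather than any particular file format; it covers variable naming, ID ordering, row uniqueness, and whether PC11 IDs are stored as strings. The regular expression is one reading of "lowercase with underscores", and the string check only looks at ID variables whose names start with `pc11_`. It does not cover labels, keys, or irrelevant variables.

```python
import re

# One reading of the naming rule: lowercase letters, digits and underscores,
# starting with a letter (so no camelCase, capitals or hyphens).
NAME_RE = re.compile(r"^[a-z][a-z0-9_]*$")


def check_dataset(columns, rows, id_vars):
    """Return a list of checklist problems found in the data.

    columns: variable names in their column order
    rows:    list of dicts keyed by variable name
    id_vars: variables that should uniquely identify each row
    """
    problems = []

    # Variable names must be lowercase with underscores
    for name in columns:
        if not NAME_RE.match(name):
            problems.append(f"bad variable name: {name}")

    # ID variables should be the first columns of the data
    if columns[:len(id_vars)] != list(id_vars):
        problems.append("ID variables are not at the front of the data")

    seen = set()
    for row in rows:
        # Each row must be uniquely identified by the ID variables
        key = tuple(row[v] for v in id_vars)
        if key in seen:
            problems.append(f"duplicate ID: {key}")
        seen.add(key)
        # PC11 IDs should be zero-padded strings, not integers
        for v in id_vars:
            if v.startswith("pc11_") and not isinstance(row[v], str):
                problems.append(f"{v} is not a string: {row[v]!r}")

    return problems


columns = ["pc11_state_id", "pc11_village_id", "num_schools"]
rows = [
    {"pc11_state_id": "09", "pc11_village_id": "000101", "num_schools": 2},
    {"pc11_state_id": "09", "pc11_village_id": 102, "num_schools": 1},
]
problems = check_dataset(columns, rows, ["pc11_state_id", "pc11_village_id"])
print(problems)  # ['pc11_village_id is not a string: 102']
```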
Metadata and Documentation
One of the most important elements of making a contribution to the SHRUG is to create the necessary documentation that describes geographic and temporal coverage, dataset source, variable construction and usage notes, and any other information relevant to end users. We have a metadata template that ensures this information is collected and available to users. You will include this metadata sheet in your ZIP archive and will also upload it to the SHRUG Data Contribution Form.
The methodology used to construct your dataset must be made clear within the metadata. Which files were downloaded from what source, and on what date? How were the downloaded files used and/or modified? If a field isn't relevant to your data (e.g. sampling or weighting), enter "N/A". If you have any questions, please reach out to us at info@devdatalab.org.
- All three tabs of the metadata spreadsheet have been filled out
- All variables in the data have variable-level metadata rows, and the variable names in the metadata exactly match those in the data
- All template fields still exist in the spreadsheet (i.e. leave a field blank if necessary, do not delete it)
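A simple way to confirm that every variable has a metadata row is to compare the two lists of names. This hypothetical `compare_metadata` helper assumes you have already read the variable names from your data and from the variable-level tab of metadata.xlsx (for example with a spreadsheet library such as openpyxl; the template's column layout isn't described here, so that step is left out). It reports data variables without metadata, metadata rows that name no variable in the data, and names listed more than once.

```python
from collections import Counter


def compare_metadata(data_vars, metadata_vars):
    """Compare variable names in the data with the variable-level metadata rows.

    Returns (missing, extra, duplicates):
      missing    - variables in the data with no metadata row
      extra      - metadata rows naming no variable in the data
      duplicates - names that appear in more than one metadata row
    """
    data_set, meta_set = set(data_vars), set(metadata_vars)
    missing = sorted(data_set - meta_set)
    extra = sorted(meta_set - data_set)
    counts = Counter(metadata_vars)
    duplicates = sorted(v for v, n in counts.items() if n > 1)
    return missing, extra, duplicates


# Hypothetical names: one metadata row has a typo and one is repeated
data_vars = ["shrid", "num_schools", "num_clinics"]
metadata_vars = ["shrid", "num_schools", "num_school", "shrid"]
missing, extra, duplicates = compare_metadata(data_vars, metadata_vars)
print(missing, extra, duplicates)  # ['num_clinics'] ['num_school'] ['shrid']
```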
Code
As appropriate, we expect contributors to provide source code that creates their data. We also expect contributors to provide example code scripts for analysis and integration with the SHRUG.
- Source code that creates your data
- Example code script(s) for analysis and integration with the SHRUG
File Structure
Your contribution must include a ZIP archive named archive.zip with the following structure:
archive.zip
|-- code/
|------- source_code/
|-------------- source code that creates your data
|------- usage_example/
|-------------- example code script(s) for analysis and integration with the SHRUG
|-- data/
|------- all data files
|-- documentation/
|------- all supporting documentation files, e.g. from original producer
|-- metadata/
|------- metadata.xlsx (variable, table, and dataset-level, following the DDL template)
|-- README.md (included in the template, contains license)
└-- citation.tex (citation for your data)
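Before uploading, you can check archive.zip against the structure above with Python's standard `zipfile` module. This sketch assumes the folders and files sit at the root of the ZIP (not inside an extra wrapper folder) and that README.md and citation.tex are top-level files; the required paths are read off the tree above, and `check_archive` is a hypothetical name. The example archive is built in memory with made-up file names.

```python
import io
import zipfile

# Paths required by the archive structure. Assumes everything sits at the
# root of the ZIP, with README.md and citation.tex as top-level files.
REQUIRED_DIRS = [
    "code/source_code/",
    "code/usage_example/",
    "data/",
    "documentation/",
    "metadata/",
]
REQUIRED_FILES = ["metadata/metadata.xlsx", "README.md", "citation.tex"]


def check_archive(path_or_file):
    """Return the required paths that are missing from a ZIP archive."""
    with zipfile.ZipFile(path_or_file) as zf:
        names = zf.namelist()
    missing = []
    for d in REQUIRED_DIRS:
        # A folder counts as present if any entry's path starts with it
        if not any(n.startswith(d) for n in names):
            missing.append(d)
    for f in REQUIRED_FILES:
        if f not in names:
            missing.append(f)
    return missing


# Build a small, incomplete example archive in memory (file names are made up)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("code/source_code/build_data.do", "")
    zf.writestr("data/example_shrid_data.csv", "shrid\n")
    zf.writestr("metadata/metadata.xlsx", b"")
    zf.writestr("README.md", "")

missing = check_archive(buf)
print(missing)  # ['code/usage_example/', 'documentation/', 'citation.tex']
```

On a real submission you would call `check_archive("archive.zip")` and expect an empty list.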
Once you've met all these requirements, you're ready to submit your contribution.
[1] Good: pc11_state_id. Bad: pc11StateID. Bad: pc11_State_ID. Bad: pc11-state-id.
[2] A key is a table that matches one set of ID variables to another. For example, an Economic Census 2013 to Population Census 2011 district-level key would match EC13 district IDs to PC11 district IDs. Similarly, if you are contributing RBI bank branch data, you should provide a key that matches bank branch IDs to shrid.