Data Collection Preparation Tips
When preparing the data for archiving, please remember the following:
- Information that could identify an individual should not be included in the data submitted to ICPSR.
- If the dataset consists of two or more related files, the variables that link the files should be included in each file, along with documentation that clearly explains the relationship among the files and the variables needed to link them.
- Each variable should have a set of exhaustive, mutually exclusive codes. These codes should be thoroughly documented in the codebook. Variable labels and value labels should clearly describe the information or question recorded in that variable.
- Document blanks whenever they are used as codes.
- Use separate codes to distinguish cases where information was not applicable from other types of missing data such as "don't know" or "refused to answer".
- Investigators are urged to check for out-of-range codes and for codes that conflict with skip patterns or are otherwise internally inconsistent.
- If the data include transformed variables or variables derived from other variables, formulas or other details should be provided that explain how those variables were computed.
- Include weights in the data files if they were developed. Explain how they were generated and how they should be used.
- Variables should be defined as numeric whenever possible. A wider range of analyses can be performed with numeric variables.
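
Several of the checks above (out-of-range codes, skip patterns, linking variables, and a separate "not applicable" code) can be automated before deposit. The following is a minimal sketch in Python, not an ICPSR tool. The variable names (CASEID, EMPLOYED, HOURS, JOBNUM), the code values, and the helper functions are all hypothetical and would need to be replaced with the contents of the actual codebook.

```python
# Hypothetical codebook: variable names and code values are illustrative only.
NOT_APPLICABLE, DONT_KNOW, REFUSED = 997, 998, 999

VALID_CODES = {
    "EMPLOYED": {1, 2, 8, 9},  # 1 = yes, 2 = no, 8 = don't know, 9 = refused
    "HOURS": set(range(0, 169)) | {NOT_APPLICABLE, DONT_KNOW, REFUSED},
}


def out_of_range(records):
    """Return (case id, variable, value) for every value not listed in the codebook."""
    problems = []
    for rec in records:
        for var, valid in VALID_CODES.items():
            if rec[var] not in valid:
                problems.append((rec["CASEID"], var, rec[var]))
    return problems


def skip_violations(records):
    """Return case ids where HOURS disagrees with the skip on EMPLOYED."""
    bad = []
    for rec in records:
        skipped = rec["EMPLOYED"] == 2            # not employed: HOURS is skipped
        coded_na = rec["HOURS"] == NOT_APPLICABLE
        # Only test cases with a substantive EMPLOYED answer (yes or no).
        if rec["EMPLOYED"] in (1, 2) and skipped != coded_na:
            bad.append(rec["CASEID"])
    return bad


def unlinked_ids(child_records, parent_records, key="CASEID"):
    """Return sorted key values in the child file with no match in the parent file."""
    parent_keys = {rec[key] for rec in parent_records}
    return sorted({rec[key] for rec in child_records} - parent_keys)


respondents = [
    {"CASEID": 1, "EMPLOYED": 1, "HOURS": 40},
    {"CASEID": 2, "EMPLOYED": 2, "HOURS": 997},
    {"CASEID": 3, "EMPLOYED": 2, "HOURS": 35},    # violates the skip pattern
    {"CASEID": 4, "EMPLOYED": 5, "HOURS": 200},   # both codes out of range
]
jobs = [{"CASEID": 1, "JOBNUM": 1}, {"CASEID": 7, "JOBNUM": 1}]

print(out_of_range(respondents))        # out-of-range codes
print(skip_violations(respondents))     # skip-pattern conflicts
print(unlinked_ids(jobs, respondents))  # job records with no matching respondent
```

Keeping "not applicable", "don't know", and "refused" as distinct numeric codes, as in this sketch, is what makes the skip-pattern check possible and also keeps the variables numeric.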
