9 Data Output
The main outputs provided to the Scottish Government are SPSS data files, delivered annually at the end of the survey. Three separate SPSS data files are provided:
- Respondent file (RF);
- Victim form file (VFF);
- Self-completion file (SCF).
9.1.1 Respondent file
The RF data file is produced at the level of the individual respondent and contains all questionnaire data and associated variables, excluding information that is collected in the victim form or the self-completion questionnaire. The file also contains additional variables such as geo-demographic variables from the sample data and the derived variables for incidence and prevalence measures. Data for all respondents is provided in the RF file, irrespective of whether they were victims or non-victims.
9.1.2 Victim form file
The VFF data file is produced at the level of the individual incident and contains all the data collected in the victim form. Thus, an individual respondent who reported three separate incidents and completed three victim forms would have three separate records in the VFF data file.
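The one-record-per-incident structure can be illustrated with a minimal Python sketch. The records and the `form_number` field are hypothetical and used only for illustration; only the identifier SERIAL is taken from the data files.

```python
from collections import Counter

# Hypothetical VFF records: one dict per victim form (incident).
# "form_number" is an illustrative field name, not a variable in the files.
vff_records = [
    {"SERIAL": 100001, "form_number": 1},
    {"SERIAL": 100001, "form_number": 2},
    {"SERIAL": 100001, "form_number": 3},
    {"SERIAL": 100002, "form_number": 1},
]


def forms_per_respondent(records):
    """Count how many incident-level records share each SERIAL."""
    return Counter(record["SERIAL"] for record in records)


counts = forms_per_respondent(vff_records)
print(counts[100001])  # 3: three incidents, so three records
```

A respondent with no victim forms does not appear in the VFF data file at all, so any respondent-level count has to start from the RF data file.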
All victim forms are included in the file, including cases where the incident occurred outside of the reference period or outside of Scotland. These records were not used for analysis and contain very little information (the victim form questionnaire is terminated in these cases - section 3.4.1), but are retained on the file for use by researchers who may wish to examine this data. Similarly, victim forms which were assigned a non-valid offence code (and therefore were not used in the production of the 'all SCJS crime' statistics from the survey) are also retained (section 7.1).
9.1.3 Self-completion file
The SCF data file is produced at the level of the respondent and contains all of the data and associated variables in the self-completion questionnaire (illicit drug use, stalking and harassment, partner abuse and sexual victimisation) as well as the key demographic variables from the RF data file. The file can also be linked to the RF data file for analysis purposes using the variable SERIAL.
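As a rough illustration of linking on SERIAL, the Python sketch below joins respondent-level rows from the two files. It assumes the rows have already been read into dictionaries; `rf_var` and `scf_var` are hypothetical placeholder names, not variables in the data files.

```python
def link_on_serial(rf_rows, scf_rows):
    """Attach RF variables to each SCF row that has a matching SERIAL."""
    # SERIAL is unique in the RF file, so it can key a lookup table.
    rf_by_serial = {row["SERIAL"]: row for row in rf_rows}
    linked = []
    for scf_row in scf_rows:
        rf_row = rf_by_serial.get(scf_row["SERIAL"])
        if rf_row is None:
            continue  # no matching respondent record; skip this row
        merged = dict(rf_row)
        merged.update(scf_row)  # SCF values win if a name is in both files
        linked.append(merged)
    return linked


# Hypothetical rows for illustration only.
rf_rows = [
    {"SERIAL": 100001, "rf_var": "a"},
    {"SERIAL": 100002, "rf_var": "b"},
]
scf_rows = [{"SERIAL": 100002, "scf_var": 1}]

linked_rows = link_on_serial(rf_rows, scf_rows)
print(linked_rows)
```

The sketch keeps only respondents present in both files. Other choices, such as keeping every RF respondent, may suit a particular analysis better.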
The questions in the illicit drugs section of the SCF data file do not contain responses for respondents who said they had ever taken semeron (a fictitious drug - section 3.6.1). These respondents (nine for the 2009/10 survey) are identified in the variable SEMERON.
9.2 Content of SPSS data files
The SPSS data files delivered to the Scottish Government and available on the UK Data Archive contain different types of variables, including:
- Questionnaire variables (all files). SPSS variable names correspond to question labels from the questionnaire. Variable names are also repeated in variable labels;
- Incidence and prevalence variables (RF and SCF data files - Chapter 7);
- Geo-demographic variables (all data files). All cases have a set of pre-specified geo-demographic variables attached to them, including Police Force Area (PFA), Community Justice Authority Area (CJAA), National Criminal Justice Board Area (NCJBA), Local Authority (LA), Health Board Area (HBA), 2009 Scottish Index of Multiple Deprivation (SIMD) and 2007-2008 Scottish Government Urban Rural classification;
- Coding variables (RF and VFF data files). On the RF data file, SOC2000 and NS-SEC codes (based on SOC2000) are included for the respondent (see section 6.3);
- Offence coding variables (all files). On the VFF data file, a full set of offence codes, including the history, are attached as outlined in section 6.1.2. The RF and SCF data files contain the final offence code assigned to each respondent's victim forms (section 6.1.2);
- Derived variables (all files). Many derived variables are also added to the file. There are two main types of derived variables:
  - Flag variables that identify, for example, the date of interview, the month of issue, a partial or full interview, a victim or non-victim etc. On the VFF data file, flag variables include whether an incident was in-scope or non-valid (section 7.1.1), whether it was a series or a single incident (section 3.3.2), whether the respondent had an unmet support need etc;
  - Classificatory variables derived from the data. These include standard classifications such as banded age groups, household composition, tenure, etc;
- Interviewer and observational variables (all files). All interviews had a small amount of observational data collected by interviewers in the CAPI script, such as whether the respondent required any help with the self-completion section of the questionnaire;
- Weighting variables (all files). See section 8.10 for further information on what these variables are and how they should be used.
9.3 Conventions used in SPSS data files
Consistency was retained between the SCJS 2008/09 and 2009/10 data files. In the majority of cases, SPSS variable names correspond to question labels from the questionnaire.
9.3.1 Case identifiers
There are two types of case identifiers in the data files: SERIAL (all files) and VSERIAL (VFF data file only).
The identifier SERIAL consists of up to six digits. It is unique in the RF data file, where each record represents an individual respondent. It is also present in the VFF data file, where it is no longer unique because a respondent can have more than one victim form.
In the VFF data file, where each record represents a victim form, the unique case identifier (VSERIAL) is SERIAL with the victim form number (01 to 05) appended. This gives each victim form a unique identifier.
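The relationship between the two identifiers can be sketched in Python as follows. This assumes both identifiers are stored as integers, which the text does not state. The function names are hypothetical.

```python
from typing import Tuple


def make_vserial(serial: int, form_number: int) -> int:
    """Append a two-digit victim form number (01 to 05) to SERIAL."""
    if not 1 <= form_number <= 5:
        raise ValueError("victim form number must be between 1 and 5")
    return serial * 100 + form_number


def split_vserial(vserial: int) -> Tuple[int, int]:
    """Recover (SERIAL, victim form number) from a VSERIAL value."""
    return divmod(vserial, 100)


print(make_vserial(123456, 3))   # 12345603
print(split_vserial(12345603))   # (123456, 3)
```

Splitting VSERIAL in this way recovers SERIAL, which is the key needed to link a victim form back to its respondent in the RF data file.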
9.3.2 Don't know and refused values
Don't know and refused codes are standard on most questions. They have been assigned standard values in SPSS to aid data analysis:
Don't Know: -1
For multicode variables in the SPSS data files, the variable relating to the don't know code has a name ending in '_dk' and the variable relating to the refused code has a name ending in '_rf'.
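When working with single-code variables outside SPSS, the Don't Know value usually needs to be set aside before calculating summary statistics. The Python sketch below is one possible approach, using only the Don't Know value listed above. The function name and the example values are hypothetical.

```python
DONT_KNOW = -1  # standard Don't Know value listed in this section


def mask_dont_know(values):
    """Replace the Don't Know code with None so it is not treated as data."""
    return [None if value == DONT_KNOW else value for value in values]


# Hypothetical answers to a numeric question.
answers = [3, -1, 5]
print(mask_dont_know(answers))  # [3, None, 5]
```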
9.3.3 Multiple response variables
Multiple response variables were set up as a set of variables equal to the total number of answers possible (including Don't Know and Refused and any additional codes added in the coding process - see section 6.2). Multiple response variables generally follow the format <question label><_><01> with the underscore denoting a multiple response variable and the number incrementing with each additional variable. Each variable was then given a value of '1' or '0', depending on whether the respondent gave that particular answer or not.
An example of a multiple response variable where there are seven possible answer categories, and so seven separate variables, is shown below:
QMAGE: How old were the people who did it? Would you say they were … READ OUT. MULTICODE OK.
1. Under school age (QMAGE_01)
2. Of school age (QMAGE_02)
3. Aged between 16 and 24 (QMAGE_03)
4. Aged between 25 and 39 (QMAGE_04)
5. Aged 40 or over? (QMAGE_05)
6. Don't Know (QMAGE_dk)
7. Refused (QMAGE_rf)
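The naming convention and 1/0 coding in the QMAGE example can be sketched in Python as follows. The function name and its input format (integers for substantive codes, 'dk' or 'rf' for the other two) are assumptions made for illustration, not part of the survey's processing.

```python
def expand_multicode(label, n_substantive, selected):
    """Build the 1/0 variables for one multiple response question.

    label: question label, for example "QMAGE"
    n_substantive: number of answer codes excluding Don't Know and Refused
    selected: answers given, as integers (1, 2, ...) or "dk" / "rf"
    """
    chosen = set()
    for answer in selected:
        if isinstance(answer, int):
            chosen.add(f"{label}_{answer:02d}")  # e.g. QMAGE_02
        else:
            chosen.add(f"{label}_{answer}")      # e.g. QMAGE_dk
    names = [f"{label}_{i:02d}" for i in range(1, n_substantive + 1)]
    names += [f"{label}_dk", f"{label}_rf"]
    return {name: 1 if name in chosen else 0 for name in names}


# A respondent who said the offenders were of school age and aged 16 to 24.
qmage = expand_multicode("QMAGE", 5, [2, 3])
print(qmage)
```

For this respondent, QMAGE_02 and QMAGE_03 take the value 1 and the other five variables take the value 0.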