2009/10 Scottish Crime and Justice Survey: Technical Report

8 Weighting

8.1 Introduction

The rationale for weighting, a description of the methodology used and the weighting characteristics are provided in the sections below.

The SCJS, like the British Crime Survey (BCS), technically consists of two closely related but separate surveys: at various times in the survey the respondent provides information on behalf of the household as a whole and on behalf of themselves as an individual.

There are three main units of analysis used on the SCJS:

  • Households;
  • Individuals;
  • Incidents of victimisation.

Different weights are used depending upon the unit of analysis (and what data file is being analysed):

  • Household weights were constructed for use with variables where the household is the main unit of analysis. Some crimes are considered household crimes (e.g. burglary, vandalism to household property, theft of and from a car - see section 7.2.1 and Annex 15 for a full list) and therefore the main unit of analysis is the household. Similarly, analysis for certain questions in the survey is also conducted at the household level (for example, accommodation type). In these cases the household weight would apply. The household weight is present in the respondent file (RF) data file.
  • Individual weights were constructed for use with variables where the individual is the main unit of analysis. Analysis of crimes which are considered personal crimes (assault, robbery, sexual offences etc. - see section 7.2.1) is undertaken using the individual weight. The individual weight would also be used when analysing personal feelings of safety when walking alone after dark in the local area and other questions where the respondent is asked for their personal opinion or information about themselves. The individual weight is present in the RF data file.
  • Incident weights are used when analysing the characteristics of types of crime. The incident weight is only present in the victim form file (VFF) data file. The incident weight is based on the corresponding household or individual weight (depending on whether the crime is classed as a household or personal crime) and additionally incorporates an expansion factor reflecting whether the victim form represents a single incident or a series of incidents (see section 3.3.2). The incident weights are used for all analysis conducted on the VFF data file when 'all SCJS crime' or any of the published statistics are being analysed.

The questionnaire included a self-completion section (sections 3.6 and 5.8). However, not all respondents to the main part of the questionnaire completed the self-completion section (section 4.6). Therefore, an additional set of individual weights was necessary for use when analysing this sub-sample. 75 The self-completion weights were calculated in a similar way to the main individual and household weights but were based only on respondents who had answered the self-completion section of the questionnaire. These are described in section 8.6.

The variable names used for each weight and their descriptions are presented in section 8.10.

8.2 Rationale for weighting

There are a number of reasons why weights are calculated for the SCJS sample. These include:

1. Correction of the sample for unequal probabilities of selection that arose from various aspects of the sample design. These included:

  • The requirement for a final sample in each Police Force Area (PFA) equivalent to a simple random sample of 1,000. Consequently, PFAs with smaller populations were over-sampled relative to other PFAs;
  • The requirement for a final sample in each Local Authority (LA) equivalent to a simple random sample of 250 (with the exception of Orkney, Shetland and Western Isles). Consequently, LAs with smaller populations were over-sampled relative to other LAs;
  • The number of dwellings at an address differed from the number on the Postcode Address File (PAF) sample frame, despite the fact that PAF was expanded by the multiple occupation indicator (MOI). 76 This resulted in an unequal probability of selection;
  • Since only one adult respondent (aged 16 or over) was selected from each household, the selection probability differed according to the number of adults in the household.

These corrections are known as design weights (or design correction weights).

2. Differing response rates by subgroups within the sample. Response rates can differ by household type, age, and gender (for example, a young adult male living alone may be less likely to respond to the survey than one living with a partner and child).

Corrections for this are often referred to as non-response corrections or, more recently, as calibration weighting.

3. The results from the survey are reported in terms of the population of Scotland. Therefore, an expansion factor is required to gross up the sample data to allow the results to be expressed as population values.

8.3 Weighting method

A two-stage approach to weighting was used for the SCJS. The first stage calculated a set of design weights. For the household weights, these corrected for the unequal probabilities of selection arising from inaccuracies in the PAF MOI. For the individual-level weights, the product of the adult household size and the household weight was used. These design weights were used as pre-weights, or initial weights, at the start of the calibration weighting, which formed the second stage. Correction for disproportional sampling by PFA was achieved within the calibration weighting.

Calibration weighting is a relatively new name for a practice that has been employed for many years. In outline, the method is to weight sample data to population estimates across a number of variables. This, in effect, corrects for non-response bias and in the SCJS grosses the results up to population levels in the same operation.

A procedure often employed to do this, and used for SCJS, is usually known as 'rim weighting'. The population data are entered as targets for a series of 'rims', each rim relating to a variable or combination of variables, and the sample is weighted to each set of targets in turn. The weights after weighting to the targets of one rim are then input to weighting the next rim. The process continues to weight to each rim in turn until the weights of each component of every rim are consistent within a predefined criterion of the target (population) values. This gives a weighted sample whose profile is the same as the population profile for all of the dimensions included in the weighting rims. It permits weighting to allow for many characteristics when population data are not available for the complete interlinking of the various rim characteristics.
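The iteration described above can be sketched in a few lines of Python. This is only an illustration of the general rim weighting idea, not the software or settings used for the SCJS: the function names, the two rims, the four respondents and the target totals are all hypothetical, and the convergence test (largest relative gap between weighted totals and targets) is an assumed form of the 'predefined criterion'.

```python
def weighted_totals(rows, weights, rim, categories):
    """Sum the current weights within each category of one rim."""
    totals = {category: 0.0 for category in categories}
    for row, weight in zip(rows, weights):
        totals[row[rim]] += weight
    return totals


def rim_weight(rows, pre_weights, targets, tolerance=1e-9, max_passes=200):
    """Weight to each rim in turn until all rims match their targets.

    rows        : one dict per respondent, keyed by rim name
    pre_weights : starting (design) weights, one per respondent
    targets     : {rim name: {category: population total}}
    """
    weights = list(pre_weights)
    for _ in range(max_passes):
        for rim, rim_targets in targets.items():
            totals = weighted_totals(rows, weights, rim, rim_targets)
            # Scale each weight so this rim's categories hit their targets.
            weights = [
                weight * rim_targets[row[rim]] / totals[row[rim]]
                for row, weight in zip(rows, weights)
            ]
        # Largest relative gap between any weighted total and its target.
        worst_gap = max(
            abs(weighted_totals(rows, weights, rim, rim_targets)[category] - target) / target
            for rim, rim_targets in targets.items()
            for category, target in rim_targets.items()
        )
        if worst_gap <= tolerance:
            break
    return weights


# Hypothetical sample and made-up population targets (not SCJS figures).
rows = [
    {"household": "single", "age": "young"},
    {"household": "single", "age": "old"},
    {"household": "multi", "age": "young"},
    {"household": "multi", "age": "old"},
]
pre_weights = [1.0, 1.0, 1.0, 1.0]
targets = {
    "household": {"single": 40.0, "multi": 60.0},
    "age": {"young": 30.0, "old": 70.0},
}
final_weights = rim_weight(rows, pre_weights, targets)
```

Because the targets are population totals, the resulting weights gross the sample up to the population in the same operation as they correct its profile. Each rim needs targets that sum to the same overall total, and every category needs at least one respondent, otherwise the scaling step cannot be applied.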

8.4 Household weights

8.4.1 Occupancy Correction - Pre-weight

In some cases the number of dwellings at an address may differ from that shown by the MOI given on the PAF. In those cases a correction was made for the changed probability of selection. The correction applied was the ratio of the actual number of households at the dwelling to the MOI value. The correction was used as a pre-weight to the rim weighting.
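As a small illustration of the ratio described above, the following Python sketch computes the occupancy pre-weight for one address. The function name and the example values are hypothetical and are not taken from the survey data.

```python
def occupancy_pre_weight(actual_households, moi):
    """Ratio of the households actually found to the MOI recorded on the PAF."""
    if actual_households < 1 or moi < 1:
        raise ValueError("both counts must be at least 1")
    return actual_households / moi


# Hypothetical address listed with MOI = 1 but found to contain 3 households:
# the selected household was less likely to be sampled, so it is weighted up.
under_listed = occupancy_pre_weight(actual_households=3, moi=1)

# Hypothetical address listed with MOI = 4 but containing only 2 households:
# the selected household was more likely to be sampled, so it is weighted down.
over_listed = occupancy_pre_weight(actual_households=2, moi=4)
```

Where the MOI is accurate the ratio is one and the pre-weight has no effect.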

8.4.2 Weighting rims

There are two criteria that should be applied to determine the characteristics of rims to be used in calibration weighting:

  • They should be characteristics related to the measurement. That is, for the SCJS they should be related to levels and type of crime experienced by both households and individuals;
  • Robust and up-to-date estimates of the populations should be available for those characteristics.

Statistical modelling has shown that levels of victimisation and crime are related to household type, with single parent households being a particularly important group (Kershaw and Tseloni, 2005). Population data available for households in Scotland are limited; however, data are published by the General Register Office for Scotland (GROS) for four household types:

  • One adult, no children;
  • One adult, one or more children;
  • Two or more adults, no children;
  • Two or more adults, one or more children.

As sub-national data from the SCJS are to be reported at PFA and Community Justice Authority Area (CJAA) levels, the second rim used for household weighting was for the eleven combined PFA / CJAA areas by the household types shown above.

The age group of the head of household has also been shown to be related to levels of crime (Kershaw and Tseloni, 2005). GROS publishes data for households by age of the head of household at the PFA / CJAA level and therefore that classification was used as a rim employed in the weighting.

The 2009/10 sample design was disproportional by Local Authority (LA) within PFA and by urban and rural areas within LA, and therefore this was also used as a rim in the weighting.

Thus, the rims selected for use in the weighting were:

  • Household type within PFA / CJAA;
  • Age of head of household within PFA / CJAA;
  • Urban / rural areas within LA.

The application of these rims in the weighting procedure produced a single household weight for each record. Details of the targets for the components of the household weighting rims, together with their sources, are given in Annex 11.

8.5 Individual weights

8.5.1 Variation in selection probabilities - pre-weight

The probability of selection of an adult respondent varied from household to household according to the number of adults in the household. Respondents in single adult households were certain to be selected whereas those in two adult households would be selected one time in two. Similarly the selection probabilities changed for households containing more than two adults. Weights were applied corresponding to the number of adults in the household to correct for these variations in selection probabilities.

8.5.2 Household characteristics - pre-weight

The characteristics of respondents and their experience of levels and types of crime are related to the characteristics of the households in which they live. For this reason the SCJS household weights were carried forward into the individuals' weighting as part of the individual pre-weights.

The actual pre-weights used in calculating individual weights were the product of the inverse of an adult's probability of selection (that is, the number of adults in the household) and their household weight.
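Section 8.3 describes the individual pre-weight as the product of the adult household size and the household weight. A minimal Python sketch of that calculation follows; the function name and the example household weight are hypothetical.

```python
def individual_pre_weight(adults_in_household, household_weight):
    """Household weight multiplied by the number of adults in the household.

    One adult is selected per household, so each adult's chance of selection
    is 1 / adults_in_household; multiplying by the number of adults is the
    same as dividing by that probability.
    """
    if adults_in_household < 1:
        raise ValueError("a responding household contains at least one adult")
    return adults_in_household * household_weight


# Hypothetical household weight of 150 applied to different household sizes.
single_adult = individual_pre_weight(1, 150.0)
two_adults = individual_pre_weight(2, 150.0)
three_adults = individual_pre_weight(3, 150.0)
```

A respondent from a two adult household therefore starts the calibration with twice the pre-weight of a respondent from an otherwise identical single adult household.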

8.5.3 Age and gender

The final stage in calculating individual weights was to ensure that the weighted profile of the adults in the sample was consistent with the population profile for Scotland.

A single age by gender by PFA / CJAA rim was used after applying pre-weights as outlined in section 8.5.2. In surveys prior to the SCJS the age by gender rim was applied at the national level. As with the SCJS 2008/09, weighting at the sub-national level was carried out due to the survey design requirement to produce representative data at the PFA and CJAA level.

This weighting procedure produced a single weight for each adult respondent. Details of the weighting targets for age and gender and the sources are given in Annex 12.

8.6 Self-completion weights

Not all individuals responding to the SCJS agreed to complete the self-completion questionnaire: 84% of respondents to the main survey completed it (section 4.6.1). If this proportion were uniform across all subgroups within the total sample there would be no need to re-calculate weights other than to apply a factor to allow for the smaller sample size and gross up the estimates to the population. However, analysis of response highlighted a difference in response to the self-completion section among different subgroups of respondents, with age demonstrating the greatest difference (section 4.6.1).

In order to correct for this differential response, the individual weights were re-calculated using the same approach as the main survey. The weight grosses the survey estimates from the self-completion data to the adult population of Scotland.

8.7 Weighting Efficiency

Weights have been applied to the SCJS data in order to minimise any bias in the data resulting from sample design effects and response bias. The variation in the size of the weights introduces sample inefficiency. This can be illustrated by considering two respondents: the first has a weight of two and the second a weight of 0.5, where the average weight is one. In this example the second respondent has one quarter of the influence of the first. The greater the variation in the weights, the lower the weighting efficiency.

The weighting efficiency for the SCJS household based data is 79%. This means that while the total achieved sample size is 16,036, the effective sample size is 12,667.

Similarly the effective sample size for the individual based data is less than the actual achieved sample. The weighting efficiency is 67%, the achieved sample size is 16,036 and the effective sample size is thus 10,820.
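The report does not state the formula behind these figures. A commonly used definition, assumed here, is Kish's approximation: the weighting efficiency is the squared sum of the weights divided by the sample size times the sum of the squared weights, and the effective sample size is the achieved sample size multiplied by that efficiency. The Python sketch below applies this assumed definition to a small hypothetical set of weights and checks the published figures for consistency.

```python
def weighting_efficiency(weights):
    """Kish approximation: (sum of w)^2 / (n * sum of w^2). Assumed formula."""
    n = len(weights)
    total = sum(weights)
    sum_of_squares = sum(w * w for w in weights)
    return (total * total) / (n * sum_of_squares)


def effective_sample_size(weights):
    """Achieved sample size multiplied by the weighting efficiency."""
    return len(weights) * weighting_efficiency(weights)


# Hypothetical weights with a mean of one, including the 2 and 0.5 used in
# the illustration above.
example_weights = [2.0, 0.5, 1.0, 0.5]
example_efficiency = weighting_efficiency(example_weights)

# Efficiencies implied by the published achieved and effective sample sizes.
household_efficiency = 12667 / 16036
individual_efficiency = 10820 / 16036
```

Equal weights give an efficiency of exactly one under this definition, and the efficiency falls as the weights become more variable. The published effective sample sizes correspond to efficiencies of 79% and 67% when rounded to the nearest whole per cent.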

The effective sample sizes by PFA for both household and individuals' based data are given in Annex 13. Annex 14 provides the minimum, maximum and mean weights by PFA.

8.8 Weighted and unweighted sample profiles

Table 8.1 and Table 8.3 show the achieved sample profiles for the main and self-completion questionnaires respectively compared to the weighted sample profile. As with all sample surveys, the achieved profile does not exactly match the population profile, despite the strict procedures which are followed to ensure a random sample and respondent selection. Sample surveys are not precisely representative of a cross-section of the population due to a variety of reasons including whether potential respondents were available for interview and their willingness to participate in the survey. In the SCJS 2009/10, the achieved sample under-represented younger adults and over-represented older adults. This pattern is fairly common in large scale social surveys of this type, and calibration weighting was applied to correct for differences in the level of response among groups of individuals on key attributes (section 8.5.3).

Table 8.1: Main questionnaire unweighted and weighted sample profiles by age and gender

SCJS 2009/10 & GROS Mid-2009 Population Estimates Scotland (Annex 2).

Base: All respondents (16,036).

| | Unweighted sample % | Weighted sample % |
|---|---|---|
| Men | | |
| 16-24 | 8.5 | 15.5 |
| 25-34 | 12.4 | 15.8 |
| 35-44 | 16.5 | 17.5 |
| 45-54 | 18.2 | 17.8 |
| 55-64 | 18.1 | 15.3 |
| 65+ | 26.2 | 18.0 |
| Base | 7,061 | 2,048,250 |
| Women | | |
| 16-24 | 8.0 | 13.7 |
| 25-34 | 12.7 | 14.4 |
| 35-44 | 17.4 | 17.4 |
| 45-54 | 17.0 | 17.5 |
| 55-64 | 17.4 | 14.7 |
| 65+ | 27.5 | 22.4 |
| Base | 8,975 | 2,233,400 |
| ALL MEN | 44.0 | 47.8 |
| ALL WOMEN | 56.0 | 52.2 |
| Base | 16,036 | 4,281,650 |

The differential response of younger and older respondents to the self-completion section of the questionnaire discussed in section 4.6.1 brought the unweighted sample profile for the self-completion questionnaire slightly closer to the adult population profile.

Table 8.3: Self-completion section unweighted and weighted sample profiles by age and gender

SCJS 2009/10 & GROS Mid-2009 Population Estimates Scotland (Annex 2).

Base: All respondents to the self-completion section (13,418).

| | Unweighted sample % | Weighted sample % |
|---|---|---|
| Men | | |
| 16-24 | 8.8 | 15.5 |
| 25-34 | 12.6 | 15.8 |
| 35-44 | 17.3 | 17.5 |
| 45-54 | 18.6 | 17.8 |
| 55-64 | 18.5 | 15.3 |
| 65+ | 24.3 | 18.0 |
| Base | 5,913 | 2,048,250 |
| Women | | |
| 16-24 | 8.5 | 13.7 |
| 25-34 | 12.9 | 14.4 |
| 35-44 | 18.4 | 17.4 |
| 45-54 | 17.6 | 17.5 |
| 55-64 | 18.0 | 14.7 |
| 65+ | 24.6 | 22.4 |
| Base | 7,505 | 2,233,400 |
| ALL MEN | 44.1 | 47.8 |
| ALL WOMEN | 55.9 | 52.2 |
| Base | 13,418 | 4,281,650 |

8.9 Victim form expansion factor / incident weight

Most victim forms collect details of only a single occurrence of an incident. However, respondents can also experience series of incidents, where the same thing was done under the same circumstances and probably by the same people (see section 3.3.2). In these cases, only one victim form is completed, collecting details of the latest incident only. However, the number of incidents that occurred in the reference period is recorded and this number, capped at five incidents (see section 8.9.2), is used in the crime statistics produced from the survey.

Weighted incident values were calculated for each victim form. The values are the products of the appropriate household or individual weight and the number of incidents (the incident count), capped at five, represented by that victim form. 77 This is common practice in other victimisation surveys such as the BCS and National Crime Victimisation Survey (NCVS) in the USA.

This weight should be applied when analysing incident details in the victim form file (VFF) data file - for example, when analysing who the offender(s) were for 'all SCJS crime' and any subgroups of 'all SCJS crime' so that data from series incidents are represented in the correct proportion of incidents overall (section 7.2.2).

8.9.1 Calculating the incident counts

Respondents could complete up to five victim forms. The incident count differed according to the characteristics of each victim form:

  • Whether the incident detailed in the victim form was assigned an in-scope offence code (i.e. the incident was in Scotland, in the reference period and given one of the 33 offence codes included in the 'all SCJS crime' definition - section 7.1.4);
  • Whether the victim form represented a single incident or a series of incidents (section 3.3.2).

The following rules were applied:

1. Where the victim form was not assigned an in-scope offence code the household or individual weight was multiplied by zero;

2. Where the victim form was for a single incident the appropriate weight was multiplied by one;

3. Where the victim form represented a series of incidents, the appropriate weight was multiplied by the number of incidents represented, up to a maximum of five. 78

In the cases where the multiplier was zero, the number of weighted incidents clearly also became zero, effectively removing those cases from weighted analysis of 'all SCJS crime'. This enabled estimates of the incidence of 'all SCJS crime', and of specific types of crimes within that, to be calculated. 79
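The three rules above can be expressed as a short Python sketch. The function and parameter names are hypothetical, as is the example weight of 250; `base_weight` stands for whichever of the household or individual weights applies to the crime type recorded on the victim form.

```python
MAX_SERIES_INCIDENTS = 5  # cap on the incidents counted for one victim form


def incident_multiplier(in_scope, is_series, incidents_in_series=1):
    """Number of incidents a victim form contributes to 'all SCJS crime'."""
    if not in_scope:
        return 0  # rule 1: no in-scope offence code
    if not is_series:
        return 1  # rule 2: single incident
    # Rule 3: series of incidents, counted up to the cap.
    return min(incidents_in_series, MAX_SERIES_INCIDENTS)


def incident_weight(base_weight, in_scope, is_series, incidents_in_series=1):
    """Household or individual weight multiplied by the incident count."""
    return base_weight * incident_multiplier(in_scope, is_series, incidents_in_series)


# Hypothetical victim forms, each with a base weight of 250.
out_of_scope = incident_weight(250.0, in_scope=False, is_series=False)
single = incident_weight(250.0, in_scope=True, is_series=False)
short_series = incident_weight(250.0, in_scope=True, is_series=True, incidents_in_series=3)
long_series = incident_weight(250.0, in_scope=True, is_series=True, incidents_in_series=12)
```

In this sketch a series of twelve incidents contributes the same weighted total as a series of five, which is the effect of the cap discussed in section 8.9.2.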

8.9.2 Capping the incident counts

The restriction to the first five incidents in a series is applied to ensure that survey estimates of incidence are not affected by a very small number of respondents reporting an extremely high number of incidents. The number of incidents reported without the cap can be highly variable between survey years and the inclusion of all of these incidents could undermine the ability to measure trends consistently (Smith and Hoare, 2009). On the other hand, the practice of capping series incidents has been shown to underestimate the incidence of survey crime (Farrell and Pease, 2007; Planty and Strom, 2007). The convention of capping does not affect estimates of the risk of victimisation.

In the SCJS 2009/10, two per cent (72) of all in-scope victim forms (3,326) were for a series of more than five similar incidents and one per cent (18) were for a series of more than 10.

8.10 Weighting and expansion variables in SPSS data files

Table 8.4 and Table 8.5 list the weighting variables which are contained in the SCJS 2009/10 SPSS data files.

There are two sets of weights - grossed weights and scaled weights. Grossed weights (Table 8.4) include an expansion factor so that data can be expressed as a proportion of the population of Scotland. When using the gross weight to analyse individual based data for a question asked of the entire sample, the weighted sample size would be 4,281,650 (the total number of adults in Scotland).

Table 8.4: Gross weighting variables in the SCJS SPSS data files

| Weighting variable | Data File 80 | Description |
|---|---|---|
| WGTGHHD | RF & VFF | Household weight |
| WGTGINDIV | RF & VFF | Individual weight |
| WGTGHHD_SC | SCF | Self-completion household weight |
| WGTGINDIV_SC | SCF | Self-completion individual weight |

Scaled weights (Table 8.5) do not include this expansion factor and can be used when undertaking advanced statistical analysis. When using the scaled weight to analyse individual based data for a question asked of the entire sample, the weighted sample size would be 16,036 (the total number of respondents interviewed). The scaled versions of the household and individual weights (including those in the self-completion file) are denoted by the addition of _SCALE at the end of the weighting variable names listed in Table 8.4. More information on scaled weights is provided in the SCJS 2008/09 User Guide (available from the survey website).

Table 8.5: Scaled weighting variables in the SCJS SPSS data files

| Weighting variable | Data File 80 | Description |
|---|---|---|
| WGTGHHD_SCALE | RF & VFF | Scaled household weight |
| WGTGINDIV_SCALE | RF & VFF | Scaled individual weight |
| WGTGINC_SCJS_SCALE | VFF | Scaled gross incident weight for SCJS crimes |
| WGTGHHD_SC_SCALE | SCF | Scaled self-completion household weight |
| WGTGINDIV_SC_SCALE | SCF | Scaled self-completion individual weight |
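The relationship between grossed and scaled weights can be illustrated with a short Python sketch. It assumes that a scaled weight is simply the grossed weight multiplied by a constant so that the weights sum to the number of respondents; the report does not spell out the construction (the User Guide is the authoritative source), and the function name and example weights are hypothetical.

```python
def scale_weights(gross_weights):
    """Rescale grossed weights so they sum to the number of cases (assumed method)."""
    n = len(gross_weights)
    total = sum(gross_weights)
    return [w * n / total for w in gross_weights]


# Hypothetical grossed weights for four respondents.
gross = [400.0, 200.0, 100.0, 300.0]
scaled = scale_weights(gross)

# Under the same assumption, the constant implied for the individual weights
# is the achieved sample size divided by the adult population total.
implied_individual_factor = 16036 / 4281650
```

Rescaling by a constant leaves weighted percentages unchanged; only the weighted base differs between the two sets of weights.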

When analysing the respondent file (RF), individual weights should be used as respondents provide details of their own circumstances, experiences, attitudes and opinions. In a small number of cases, respondents are asked to provide information on behalf of the entire household (for example, the way in which the household occupies the accommodation, whether anyone in the household has owned or had regular use of a car, whether there is anyone in the household who requires care etc.). These questions / variables are listed in Annex 15 and the household weight should be used when conducting analysis of these questions / variables.

In addition, when analysing incidence and prevalence variables for household crimes or crime groups (section 7.2.1) in the RF data file the household weight should be used. A list of household crimes is provided in Annex 15. Users should note that, following conventions used on the BCS, where crime groups contain both household and personal crimes, the individual weights are used in the calculation of published incidence and prevalence rates. 81

8.10.1 Calculating rates per 10,000

Past surveys have included weights that incorporate a calculation to display incidence statistics as rates per 10,000 households or individuals (and rates per 10,000 are presented in the Main Findings report, Annex 1, Table A1.4). These are not included in the SCJS data files. They can be created by users if necessary by using the following syntax which simply divides the gross weights by the total population (household or individual) divided by 10,000:

* Rate weights: gross weight divided by (total population / 10,000).
compute WGTGINDIVRATE = WGTGINDIV / (4255000 / 10000).
compute WGTGHHDRATE = WGTGHHD / (2331250 / 10000).
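For users working outside SPSS, the same calculation can be sketched in Python. The two population denominators are copied from the syntax above; the function and constant names are hypothetical.

```python
# Denominators copied from the SPSS syntax above.
INDIVIDUAL_POPULATION = 4_255_000
HOUSEHOLD_POPULATION = 2_331_250


def rate_weight(gross_weight, population, per=10_000):
    """Convert a gross weight into a weight expressing rates per 10,000."""
    return gross_weight / (population / per)


# A hypothetical respondent whose gross individual weight is 425.5 carries a
# rate weight of one, i.e. represents one person in every 10,000.
individual_rate = rate_weight(425.5, INDIVIDUAL_POPULATION)
household_rate = rate_weight(233.125, HOUSEHOLD_POPULATION)
```

Summing these rate weights over the cases of interest gives the corresponding rate per 10,000 individuals or households.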