**GLOSSARY OF STATISTICAL TERMS**

*(Terms in bold type are themselves the subject of definitions in this glossary)*

*analysis of covariance*

A type of **regression** analysis in which data representing two groups (here, recipients and comparison respondents) are analysed in such a way as effectively to generate separate results for each group. This method permits estimation of the average difference in the outcome of interest (for example, some measure of health) between the groups.

*confidence interval*

A range of values - expressed as a lower and an upper limit - within which the unknown 'true' value of an estimated quantity (such as an average) is expected to fall. Confidence intervals are expressed in terms of specific levels of confidence. For example, a 95% confidence interval is constructed so that, if the sampling were repeated many times, about 95% of the intervals obtained would contain the true value; informally, one can be 95% confident that the true value lies within the stated lower and upper limits. Interpretation of the confidence interval depends on the nature of the analysis which generated it. For example, in a **regression** analysis (including **analysis of covariance**), if the confidence interval around the estimated value of a predictor includes the value zero, the result is considered not to be statistically significant. However, in a **logistic regression**, the inclusion of the value one in the confidence interval around an estimated **odds ratio** indicates a nonsignificant result.

*logistic regression*

A special form of **regression** used when the outcome of interest is binary; that is, it can take only one of two possible values (e.g. the presence or absence of some specific disease). When logistic regression is used, the **parameter estimates** are expressed as **odds ratios**.

*odds ratio*

A measure of the likelihood of a binary factor (such as the presence or absence of some disease) being observed in one group relative to the corresponding likelihood for a second group. An odds ratio of one indicates equal likelihood for both groups; an odds ratio greater than one indicates that the factor is more likely in the first group, and an odds ratio lower than one that it is less likely. An odds ratio may be presented with an associated **confidence interval**. A full understanding of odds ratios is important for the results of the evaluation to be interpreted correctly. Recognising this, a simple worked example of odds ratio calculation is given at the end of this section.

*p value*

The probability of obtaining a result at least as extreme as the one observed if there were in fact no effect in the population of interest; informally, a measure of how readily the result of a statistical test could be attributed to the random play of chance rather than to the presence of an actual effect. All *p* values fall within a range bounded by zero and one. Large *p* values (e.g. 0.2) are interpreted as indicating that the observed result could plausibly have arisen due merely to chance, while small *p* values (e.g. 0.01) suggest that the result reflects an effect which is actually present in the population from which the sample is drawn. A value of *p* = 0.05 is commonly regarded as an informal 'threshold' of statistical significance, values of 0.05 or lower being considered significant (i.e. indicative of a real effect) while values greater than 0.05 are treated as nonsignificant. While this is a useful guideline, it can be misleading - it is incorrect to place a completely different interpretation on the result of a statistical test simply because the observed *p* value is (say) 0.06 rather than 0.05.

*parameter estimate*

The 'result' obtained from a statistical model (such as a **regression** analysis), estimating - on the basis of a sample - the unknown 'true' value of some quantity in the population under investigation. The parameter estimate embodies a degree of uncertainty as to how accurately it represents the true (population) value; this uncertainty may be quantified by showing a **confidence interval**.

*regression*

A family of statistical techniques which seek to predict the value of some quantity (e.g. a measure of health) from one or more other variables (e.g. gender, age).

*Odds ratio - worked example*

Suppose there are two groups of individuals. The first consists of 150 central heating recipients, of whom 32 have some condition of interest - for example, they report that their heating always keeps them sufficiently warm. The *odds* of the condition being present among these individuals are calculated as the number who **do** have the condition divided by the number who **do not**, i.e.

$$\text{odds (recipients)} = \frac{32}{150 - 32} = \frac{32}{118} \approx 0.271$$

The second group consists of 144 comparison group respondents (i.e. individuals not receiving central heating), of whom 17 indicate that their heating always provides adequate warmth. The odds of the condition being present among these people are given by

$$\text{odds (comparison)} = \frac{17}{144 - 17} = \frac{17}{127} \approx 0.134$$

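The odds ratio - that is, the odds of the condition being present among heating recipients relative to the odds of its presence among comparison group members - is given by

$$\text{odds ratio} = \frac{32 / 118}{17 / 127} \approx 2.03$$

In other words, the odds of reporting adequate warmth are roughly twice as high among heating recipients as among comparison group members.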
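The arithmetic of this worked example can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `odds` and the variable names are invented here and do not come from the evaluation itself.

```python
def odds(with_condition: int, total: int) -> float:
    """Odds = number who do have the condition / number who do not."""
    without_condition = total - with_condition
    return with_condition / without_condition


# Counts taken from the worked example in the text.
recipient_odds = odds(32, 150)    # 32 / 118
comparison_odds = odds(17, 144)   # 17 / 127

# Unadjusted odds ratio: recipients relative to the comparison group.
odds_ratio = recipient_odds / comparison_odds

print(f"Recipient odds:  {recipient_odds:.3f}")
print(f"Comparison odds: {comparison_odds:.3f}")
print(f"Odds ratio:      {odds_ratio:.2f}")
```

Note that the odds (32/118) differ from the simple proportion (32/150): odds compare those with the condition to those without it, not to the whole group.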

The above example illustrates the calculation of a 'raw' or unadjusted odds ratio. In the statistical models used to generate the results of the evaluation, the odds ratios are adjusted for the effects of other factors which might plausibly be relevant - see Appendix B, Section B.13.
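As noted under **confidence interval**, an odds ratio is treated as nonsignificant when its interval includes the value one. As a rough illustration for the unadjusted example only, the sketch below computes an approximate 95% interval using the standard large-sample (Wald) formula on the log odds ratio. Both the choice of method and the function name `wald_odds_ratio_ci` are assumptions made for illustration: the glossary does not state how the evaluation's intervals were derived, and the adjusted models would give different values.

```python
import math
from statistics import NormalDist


def wald_odds_ratio_ci(a, b, c, d, level=0.95):
    """Approximate confidence interval for an unadjusted odds ratio.

    a, b: first group, counts with / without the condition
    c, d: second group, counts with / without the condition
    Assumes all four counts are greater than zero.
    """
    log_or = math.log((a / b) / (c / d))
    # Large-sample standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Counts from the worked example: 32 of 150 recipients, 17 of 144 comparisons.
lower, upper = wald_odds_ratio_ci(32, 150 - 32, 17, 144 - 17)
includes_one = lower <= 1.0 <= upper
print(f"95% CI: {lower:.2f} to {upper:.2f}; includes one: {includes_one}")
```

Under these assumptions the interval runs from roughly 1.07 to 3.84. Because it excludes one, the unadjusted odds ratio would be read as statistically significant at the conventional 5% level.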