LGRAS ( 03 ) 2 - EXCEL SPREADSHEET VERSIONS OF ROAD ACCIDENTS STATISTICS TABLES AVAILABLE FROM THE SCOTTISH EXECUTIVE WEB SITE
1.1 This paper informs Members of the Group of the forthcoming availability of Excel spreadsheet versions of road accident statistics tables on the Scottish Executive (SE) Web site, and invites their views on the need for expanded spreadsheet versions, and time-series spreadsheet versions, of some tables.
1.2 The previous paper referred to the review of the dissemination of and access to road accident statistics, conducted by Professor Richard Allsop of University College London. This paper describes those of Professor Allsop's recommendations regarding access to the statistics via the Web which relate to the Scottish Executive, and sets out SE's proposed responses. It also refers briefly to the recommendations addressed to the Department for Transport (DfT).
2 Excel spreadsheet versions of the tables from the publications
2.1 Professor Allsop's report recommended (section 8.3 of Part 2 of the 2002 Quality Review report) that the tables in the current issues of the annual publications be made available as Excel spreadsheets as soon as is practicable.
2.2 All the tables in "Road Accidents Scotland" have been produced in "camera-ready" form using Excel spreadsheets with effect from "Road Accidents Scotland 1998". ( "Key Road Accident Statistics" tables have been produced in that way longer.) However, as yet, these spreadsheets have not been made available on the SE Web site for users to "download", principally because SE did not have the kind of "Web publishing" facilities that would enable this to be done quickly and easily. SE has made available on its Web site "HTML" and "PDF" versions of the publications. Users can "cut and paste" the numbers from the HTML versions (but not from the PDF versions) into their own spreadsheets - but the appearance and formatting of the results is not as good as in the original. Therefore, when requested, the Transport Statistics branch has sent copies of its Excel spreadsheets to users of the statistics.
2.3 A new version of the SE Web site will "go live" later this year. It will have a revised SE Statistics section, including several pages of information about Transport statistics. These will be created and updated using new SE "Web publishing" facilities. The Transport Statistics branch will use these pages to make available on the SE Web site Excel spreadsheet versions of all the tables in the latest editions of "Road Accidents Scotland", "Key Road Accident Statistics" and its other publications, plus some spreadsheet versions of tables from some earlier editions. Members of the Group, and non-members who receive copies of LGRAS papers, will be notified when this has been done, and will be told how to get to the relevant Web address. Thereafter, Transport Statistics publications will refer to the availability on the SE Web site of Excel spreadsheet versions of the tables in the publications, and give a Web address which can be used to access them.
2.4 Members of the Group are asked to note the position, and to raise any queries they may have on the forthcoming availability on the SE Web site of Excel spreadsheet versions of tables from Transport Statistics publications.
3 Expanded Excel spreadsheet versions of some tables from the publications
3.1 Among the comments made by Scottish users of the statistics who responded to the Quality Review survey were the following:
- information on casualties by age is given in age-bands [which] prevents detailed comparisons within age bands used by this Police Force and a number of Local Authorities. If age tables were available in single years of age, users could set their own bandings;
- being able to download tables of casualties by age would permit local customising of age bands;
- numbers for the current year [should be] presented rather than [the latest five year] averages, in order to give the number of fatalities etc in any one year;
- I get spreadsheets containing unpublished data emailed to me by SE on request. It would be useful to be able to include some data in studies etc without having to input it myself or continually correspond with SE.
3.2 Although there was no recommendation specifically on this point, the Scottish Executive appreciates that some users would find it useful to have available expanded Excel spreadsheet versions of some of the tables in "Road Accidents Scotland" and "Key Road Accident Statistics" - for example, spreadsheets giving figures for each individual year of age, or (in the case of a table which provides just the latest five years' averages) for each individual calendar year. Such expanded spreadsheets could be far too large to appear in a publication. Of course, users would have to take care when analysing such figures, because there would be cases where an expanded spreadsheet included many numbers whose year-to-year variability was large relative to their values (e.g. the number of casualties aged, say, 18 for a particular mode of transport may be in single figures in most years, so could, by chance, have large year-to-year percentage changes).
3.3 The Scottish Executive will, therefore, make some expanded Excel spreadsheet version of some of the tables in "Road Accidents Scotland" and "Key Road Accident Statistics" available on its Web site in due course. The Executive has not prepared a proposed list of expanded spreadsheet tables: it would welcome the advice of Members of the Group as to which tables they would consider most useful to have in expanded spreadsheet form. Suggestions should specify clearly and unambiguously the changes that would be required to the published table to produce the expanded spreadsheet version - for example:
- "Table 32, but with the rows giving individual years of age (rather than age-groups)"; or
- "Table 28, but with the figures for each of the latest five years appearing in a separate spreadsheet (rather than one spreadsheet giving the five year averages)".
The Executive will consider all the suggestions that are made, and the views that Members of the Group may have on their priority, and then arrange to make available as many as can be produced within the limited staff time that will be available for this work.
3.4 Finally, it should be noted that expanded spreadsheet versions of tables might not include the percentage distributions, or the various sets of percentage changes, that appear in some of the published tables. The main purpose of the expanded spreadsheet versions of the tables will be to make available the basic numbers (of accidents, drivers or casualties), for use by others in their own analyses. It may not be worthwhile for SE staff to spend a lot of time including percentage distributions and percentage changes in the expanded spreadsheets. Members of the Group suggesting expanded spreadsheet versions of any tables which include percentages should therefore indicate the importance of the percentages to them. SE can then consider each case individually, in the light of users' views on the value of the percentages and the amount of work that might be required to provide them.
3.5 Members of the Group are invited to identify the "Road Accidents Scotland" and "Key Road Accident Statistics" tables for which they would like to have expanded Excel spreadsheet versions available on the SE Web site. Their suggestions should specify precisely how they would like the tables' rows and/or columns expanded, and how important it is that the expanded spreadsheets include the kinds of percentages that appear in the original table.
4 Time-series spreadsheet versions of tables from the publications
4.1 The Quality Review survey included a question specifically on users' needs to download time-series versions of tables that appear year-by-year in publications (asking which publications? and for how many years?). In their responses, Scottish users of the statistics said (for example):
- It would be useful to download any or all tables and charts … 10 years time series would be sufficient for most purposes, but should also show comparisons to government target baseline data;
- All publications concerned with road accident, traffic and population data. All commencing 1981, which is the date from which local records were computerised;
- 1981 to present (two other Scottish respondents);
- … for all publications … as long a time series as possible;
- For five years, or from 94-98 stats, to compare against base for … targets;
- possibly to 5 years;
- this would be a great time saver … sometimes I use 3, 5 or 10 year time series … sure that it would be relatively simple to allow the user to select … data … tailored to the chosen period;
- would be extremely useful for input to reports etc … normally deal with accident comparisons over 3, 5 and 10 year periods. It is sometimes useful to refer back 20-25 years to emphasise improvements made.
Respondents from other parts of GB made similar points.
4.2 Professor Allsop's report recommended (in section 8.3) that time-series of tables from the annual publication be made available in two phases: Phase 1 - all tables back to 1994 in order of relevance to monitoring of progress with the road safety strategy; and Phase 2 - key tables back to 1981, selected for relevance of monitoring of progress towards the targets. This could involve a lot of work. The Scottish Executive would therefore welcome the advice of the Group on which tables are needed in time-series spreadsheet form, or in "extended" time-series spreadsheet form (an "extended" spreadsheet version of a table would cover more years than the published version). Views are also sought on the periods for which the figures would be useful, and on what should have priority in the provision of the spreadsheets.
4.3 In terms of the feasibility of producing time-series (or extended time-series) spreadsheet versions, the tables which appear in "Road Accidents Scotland" and "Key Road Accident Statistics" fall into a number of categories, which are discussed below.
4.4 Time-series tables, providing information which comes solely from the SE road accident statistics database and which is available on a sufficiently consistent basis for many years . It would be easy to produce extended time-series spreadsheet versions of such tables, covering a greater number of years. An example is Table 3 of "Road Accidents Scotland". It would be easy to produce an extended spreadsheet version of this, providing figures all the way back to, say, 1981. All the numbers could be extracted easily from the SE's database, and saved as a spreadsheet. It should be noted that, although the current local authorities were only established in 1996, time-series for their areas can be produced back to 1979. This is because the records of accidents back to 1979 were allocated retrospectively to the code for the relevant current local authority, in a special exercise undertaken some years ago (see section 2 of Annex E of "Road Accidents Scotland").
4.5 Time-series tables, providing information which comes solely from the SE road accident statistics database but which is not available on a sufficiently consistent basis for many years . In such cases, extended time-series spreadsheet versions could only cover those years for which information was available on a sufficiently consistent basis, which will depend on the topics covered by the table and when (e.g.) the relevant parts of "Stats 19" were changed. An example is Table 4 of "Road Accidents Scotland". Because of the changes to the trunk road network at the time of local government reorganisation (1st April 1996), it is not possible to produce an extended time-series table giving figures for the "current" trunk road network all the way back to, say, 1981. Accidents are normally classified on the basis of whether or not they occurred on a road which was part of the trunk road network at the time they occurred. In addition, the records of accidents which occurred between the start of 1992 and 31st March 1996 were classified retrospectively on the basis of whether or not the road formed part of the "post-1 April 1996" trunk road network, in a special exercise carried out some years ago (see section 5 of Annex E of "Road Accidents Scotland"). So, the Scottish Executive can produce figures on the basis of the "post-1 April 1996" trunk road network for the years back to 1992, but not for any of the years prior to 1992.
4.6 "Single period" tables, providing information which comes solely from the SE road accident statistics database, and which is available on a sufficiently consistent basis for many years. An example would be Table 8 of "Road Accidents Scotland", which gives just the averages for the latest five year period. A time-series spreadsheet version of this table could be produced - the main question would be how to incorporate "time" in the spreadsheet. There are a number of options:
- a separate spreadsheet for each period for which figures are produced - e.g. one spreadsheet for the 1997-2001 average, one for the 1996-2000 average, one for the 1995-1999 average, and so on;
- a separate spreadsheet for each single year - e.g. one for 2001, one for 2000, one for 1999, and so on;
- incorporate "five year averages" into the rows of the spreadsheet. For example, to produce a time-series spreadsheet version of Table 8, the first line of figures (for "Built-up, more than 20m from junction") would be replaced by a series of rows, each giving the five year averages for "Built-up, more than 20m from junction" for a different period (e.g. 1997-2001, 1996-2000, 1995-1999, etc) - and similarly for the rest of the categories in the rows of the original table;
- incorporate individual years into the rows of the spreadsheet. For example, to produce a time-series spreadsheet version of Table 8, the first line of figures (for "Built-up, more than 20m from junction") would be replaced by a series of rows, each giving the figures for "Built-up, more than 20m from junction" for a different year (e.g. 2001, 2000, 1999, etc) - and similarly for the rest of the categories in the rows of the original table;
- have separate blocks of figures for each period, each with the same layout as the published table - for example, the spreadsheet might start with the figures that were published for the latest period (e.g. 1997-2001 averages) as they appeared in the publication, below which would be the figures for another period (e.g. 1996-2000 averages) with the same layout, and so on;
- etc - there will be other options.
- The "best" option for producing a time-series spreadsheet version of a particular table would depend mainly upon how people would use the figures. For example:
- if they are likely to be mainly interested in the figures for individual years, the spreadsheet should give these;
- if they are likely to want to look at the year-to-year changes, or annual average changes, the spreadsheet should have the different years (or the different annual averages) on consecutive rows;
- if they just want to see how a few figures for the latest period compare with those for an earlier period (e.g. ten years ago), they might find it easier to use a set of small spreadsheets, each containing a table which looks like the published one but covers a different period. These could be provided together in a workbook, with each spreadsheet's "tab" specifying the year(s) to which it related. However, there might be little benefit in adopting this approach, when people could often just as well compare the figures which they could get from the tables in the relevant editions of the publication;
- etc - there will be other possibilities.
Subject to the views of Members of the Group, it is suggested that an approach along the lines shown in the attached example would be best. It provides separate figures for each year on consecutive rows, so that year to year changes can be seen easily. Those who want annual averages (be they for three years, five years or some other length of period) can calculate them from the figures for the relevant individual years.
4.7 "Single period" tables, providing information which comes solely from the SE road accident statistics database, but which is not available on a sufficiently consistent basis for many years. The options are similar to those of the above category, but any time-series spreadsheet could be produced only for those years for which the data were sufficiently consistent, which (e.g.) will depend upon the topics covered by the table, and when the relevant parts of the "Stats 19" specification were changed.
4.8 Tables which use information which comes, wholly or in part, from sources other than the SE road accident statistics database. Whether time-series spreadsheet versions could be produced would depend upon the availability (or otherwise) of the information from the other sources, and what would have to be done to prepare a time-series spreadsheet from it. The tables in "Road Accidents Scotland" which use information from other sources are:
- Tables A to D - comparisons with England and Wales;
- Tables E and F - international comparisons;
- Table 5 (b) and (c) - accident rates, which are calculated using DfT traffic estimates;
- Tables 9 to 12 - accident costs, which are calculated using GB average costs per accident and per casualty;
- Tables 18, 31, 32, 34, 37, 38 - the "rates per thousand population" in these tables are calculated using the population estimates provided by GRO(S);
- Table 22 - drink-drive estimates, produced by DfT.
In some cases, it should be quite straightforward to produce a time-series (or an extended time-series) spreadsheet version of a table. For example, DfT's average costs per accident and per casualty (at "2001 prices") could be applied easily to the numbers of accidents and casualties for all years back to 1981, producing a time-series of accident costs (at "2001 prices") back to 1981. On the other hand, we do not know whether all the "international comparisons" data would be available for earlier years, particularly in the case of Table F's figures for age-groups (which have only been included in the publication for a few years). So, the Scottish Executive cannot guarantee to produce time-series (or extended time-series) spreadsheet versions of these tables. However, it will look into the possibilities, in the case of those tables for which Members of the Group identify a clear need for time-series (or extended time-series) spreadsheet versions.
4.9 Finally, it should be noted that time-series (or extended time-series) spreadsheet versions of tables might not include the percentage distributions, or the various sets of percentage changes, that appear in some of the published tables. The main purpose of the spreadsheet versions of the tables will be to make available the basic numbers (of accidents, drivers or casualties), for use by others in their own analyses. It may not be worthwhile for SE staff to spend a lot of time including percentage distributions and percentage changes in the spreadsheets. Members of the Group suggesting spreadsheet versions of any tables which include percentages should therefore indicate the importance of the percentages to them. SE can then consider each case individually, in the light of users' views on the value of the percentages and the amount of work that might be required to provide them.
4.10 The views of Members of the Group on these matters would be welcomed. For example:
- for which tables do they wish to have time-series (or extended time-series) spreadsheet versions? and for how many years?
- for what purposes would they expect to use such spreadsheets?
- would they agree that the approach shown in the attached example would be the most appropriate in the case of the tables which (in the publication) give only figures for the latest year or the latest five years' average?
- are there cases where they feel that another approach would be better?
- are there cases where they feel that it is important that time-series spreadsheet versions include percentages?
- what priority would they give to each of the tables?
4.11 The Scottish Executive will then consider what time-series (and extended time-series) spreadsheet versions to make available on the SE Web site, in the light of the views of the users of the statistics who are represented on the Group.
5 Other recommendations
5.1 The Quality Review report made a number of other recommendations on access to data via the Web. This section refers to them, and describes the position in respect of those which relate to the Scottish Executive.
5.2 At the start of section 8, the report recommended that Web sites providing access to the national road accident statistics should have links to Web sites containing other relevant data, such as data about exposure to risk, traffic offences and health, and should themselves be accessible by links from the ONS Web site. The SE Web site includes "Scottish Transport Statistics", which contains some statistics of motor vehicle offences recorded by the police (information which is obtained from SE Criminal Justice Statistics, whose publications are also available on the SE Web site). The revised SE Statistics Web site will have links to, for example, the DfT and Scottish Health Service Information Services Division Web sites, which contain data on exposure to risk (e.g. traffic estimates) and health statistics.
5.3 Professor Allsop also recommended (in section 8.3) that the feasibility of providing matching time series of exposure data from road traffic data and the National Travel Survey be examined. The report recognised that only some of the exposure data that can be provided for Great Britain as a whole can be provided also for Scotland, Wales or local authority areas. There appears to be little that the Scottish Executive can do on this recommendation. As indicated in the previous paper, SE publishes the main figures from the DfT traffic estimates, for areas, in "Scottish Transport Statistics", and will probably include some traffic estimates in "Road Accidents Scotland". Assuming that DfT will make its road traffic estimates available on its own Web site, that will be the definitive source of all the figures for Scotland as a whole, and for local authority areas, that DfT considers to be sufficiently reliable to "publish" in that way. There would be little point in the Scottish Executive making a copy of the DfT traffic estimates available on the SE Web site. Because the National Travel Survey's sample size is small (up to 2001: only 300 or so households per year; roughly triple that from 2002), the results for Scotland have traditionally been produced only for three-year periods, and are not suitable for analysis in detail, so provide very little "exposure data" for Scotland.
5.4 The report made some other recommendations (in sections 7 and 8 of Part 2) on access via the Web, and access to data at the individual record level, which were addressed to DfT. These related to the arrangements regarding the anonymised copy of the GB data held by the UK Data Archive, the presentation on the Web of the publications of the data for GB, telephone enquiries to DfT, and suggestions that DfT might consider arranging for a reduced dataset for use in schools, and for Web query language access to anonymised datasets.
5.5 Members of the Group may wish to comment on, or ask questions about, the above recommendations, or to raise other points on the report and its recommendations.