The project aims to analyse route environment factors that increase or reduce injury risk for people cycling in the UK. We are doing this through a ‘case-crossover’ analysis, which compares the sites at which cyclists were injured with other ‘control’ sites chosen at random from the same cyclists’ routes before the point of injury.

Having control sites is important because it allows us to understand risk. For instance, if 75% of cycle injuries take place on major roads, it’s important to know whether more or less than 75% of cycling takes place on major roads. If it’s more, major roads might actually be safer; if less, more risky.
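The reasoning in this example can be made concrete. Suppose roads are split into just two groups (major and other). The injury rate per unit of cycling on major roads, relative to other roads, is then the ratio of injury share to exposure share on major roads, divided by the same ratio for other roads. The function name and the 80% and 60% exposure figures below are illustrative assumptions, not project data.

```python
def relative_risk(injury_share: float, exposure_share: float) -> float:
    """Injury rate per unit of cycling on major roads relative to other roads.

    injury_share: fraction of cycle injuries that occur on major roads.
    exposure_share: fraction of cycling (e.g. km ridden) that takes place on major roads.
    """
    major = injury_share / exposure_share              # injuries per unit of exposure, major roads
    other = (1 - injury_share) / (1 - exposure_share)  # same quantity, all other roads
    return major / other


# 75% of injuries on major roads in both hypothetical scenarios:
print(relative_risk(0.75, 0.80))  # below 1: major roads relatively safer
print(relative_risk(0.75, 0.60))  # above 1: major roads relatively riskier
```

With 80% of cycling on major roads the ratio is about 0.75; with 60% it is about 2. The same injury share therefore points in opposite directions depending on exposure, which is exactly why control sites are needed.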

The problem is that we usually don’t know much about where people ride. In the UK, only London has a cyclist flow model, so for London we can, for instance, estimate how much cycling takes place on major roads. This approach was used in a pilot study for this project, which led to a recently published paper.

Outside London, we can’t do this. So the method in this study uses individual-level data: the police injury records held by the Department for Transport (DfT). These give us cyclist injury locations, and we also have, from the DfT, the home postcodes of the injured cyclists. Cyclists who were injured in the weekday morning peak, within a ‘cycleable’ distance of their home, are highly likely to have been travelling from home (in most cases, going to work).
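The selection rule above can be sketched as a simple filter. This is a minimal illustration, not the study’s code: the 07:00 to 10:00 peak window, the 10 km straight-line cut-off and the `Casualty` record layout are all hypothetical, since the actual definitions of ‘morning peak’ and ‘cycleable distance’ are not given here. Distance is measured from the home postcode centroid to the injury location with the haversine formula.

```python
import math
from dataclasses import dataclass
from datetime import datetime, time
from typing import Tuple

# Hypothetical thresholds, chosen for illustration only.
PEAK_START = time(7, 0)
PEAK_END = time(10, 0)
MAX_CYCLEABLE_KM = 10.0


@dataclass
class Casualty:
    injured_at: datetime
    home: Tuple[float, float]    # (lat, lon) of home postcode centroid
    injury: Tuple[float, float]  # (lat, lon) of injury location


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def likely_trip_from_home(c: Casualty) -> bool:
    """Weekday, morning peak, and within a cycleable straight-line distance of home."""
    is_weekday = c.injured_at.weekday() < 5  # Monday=0 ... Friday=4
    in_peak = PEAK_START <= c.injured_at.time() < PEAK_END
    near_home = haversine_km(c.home, c.injury) <= MAX_CYCLEABLE_KM
    return is_weekday and in_peak and near_home


# Hypothetical casualty: Tuesday 08:15, injured about 2.2 km from home.
commuter = Casualty(datetime(2023, 3, 7, 8, 15), (51.50, -0.10), (51.52, -0.10))
print(likely_trip_from_home(commuter))  # True
```

Straight-line distance understates the distance actually ridden, so any real cut-off would need to allow for that.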

We then have start and end points, but not the route. So we are using CycleStreets (a journey planner for cyclists) to generate routes. Initial testing against a sample of actual routes suggests that CycleStreets ‘fast routes’ capture the types of routes people follow well, much better than a shortest-path algorithm that we created for comparison.
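For readers unfamiliar with the comparison baseline, a shortest-path route simply minimises distance. Below is a minimal Dijkstra sketch over a toy street graph; the node names and edge lengths (in metres) are hypothetical, and this is not the project’s own implementation. Because it minimises length alone, it takes no account of factors such as road type or traffic that may shape the routes people actually choose, which is one plausible reason a cycling-specific planner could match real routes better.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Directed graph: node -> list of (neighbour, edge length in metres).
Graph = Dict[str, List[Tuple[str, float]]]


def shortest_path(graph: Graph, source: str, target: str) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm: return (total length, node sequence), or (inf, []) if unreachable."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            break  # target's distance is now final
        for nbr, length in graph.get(node, []):
            nd = d + length
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return math.inf, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical network: a 1,000 m main-road option via A, a 900 m side-street option via B and C.
graph: Graph = {
    "home": [("A", 400.0), ("B", 300.0)],
    "A": [("work", 600.0)],
    "B": [("C", 300.0)],
    "C": [("work", 300.0)],
}
print(shortest_path(graph, "home", "work"))
```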

The next step is to route each cyclist using CycleStreets. We then randomly select a point from each cyclist’s route, which becomes their control site. We also plan to select a ‘control junction’ for cyclists injured at junctions, inspired by Kay Teschke’s study (and as advised by our stakeholder group).
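One way to pick a control site is to sample a distance uniformly along the route and take the point at that distance, so every metre of the route is equally likely. The sketch below assumes the route is a polyline in projected coordinates in metres (for example British National Grid) that has already been cut at the injury location, as ‘routes prior to injury’ suggests. The junction IDs are hypothetical, and drawing the control junction uniformly from the junctions passed is an assumption, not necessarily the project’s rule.

```python
import math
import random
from typing import Sequence, Tuple

Point = Tuple[float, float]  # projected (x, y) in metres


def route_length(coords: Sequence[Point]) -> float:
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def point_at_distance(coords: Sequence[Point], d: float) -> Point:
    """Point lying d metres along the polyline from its first vertex."""
    remaining = d
    for a, b in zip(coords, coords[1:]):
        seg = math.dist(a, b)
        if remaining <= seg:
            t = remaining / seg if seg > 0 else 0.0
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        remaining -= seg
    return (float(coords[-1][0]), float(coords[-1][1]))  # guard against rounding past the end


def sample_control_point(coords: Sequence[Point], rng: random.Random) -> Point:
    """Uniform by distance, so longer segments are proportionally more likely."""
    return point_at_distance(coords, rng.uniform(0.0, route_length(coords)))


def sample_control_junction(junction_ids: Sequence[str], rng: random.Random) -> str:
    """Pick one of the (hypothetical) junction IDs passed before the injury site."""
    return rng.choice(list(junction_ids))


rng = random.Random(42)  # fixed seed so the selection is reproducible
route = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0)]  # hypothetical route up to the injury site
print(sample_control_point(route, rng))
print(sample_control_junction(["J1", "J2", "J3"], rng))
```

Sampling by distance rather than by vertex matters because routing output often has many closely spaced vertices on bends and few on long straight roads.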

With injury and control sites identified, we can then carry out spatial matching to create a dataset of route environment variables, ranging from mode-specific infrastructure (cycle tracks, bus lanes) to more general built environment variables (road classification, car parking provision), to motor traffic speeds and volumes.
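A simple form of spatial matching is a nearest-segment join: each site takes the attributes of the closest road segment, provided it lies within a tolerance. The sketch below assumes projected coordinates in metres; the 20 m tolerance, the `Segment` layout and the attribute names are hypothetical. A real analysis on a national road network would use a GIS library with a spatial index rather than this linear scan.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]  # projected (x, y) in metres


@dataclass
class Segment:
    start: Point
    end: Point
    attrs: Dict[str, object]  # hypothetical route environment attributes


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the line segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    len2 = dx * dx + dy * dy
    if len2 == 0:
        return math.dist(p, a)  # degenerate segment
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def match_site(site: Point, segments: Sequence[Segment], max_dist: float = 20.0) -> Optional[Dict[str, object]]:
    """Attributes of the nearest segment, or None if nothing lies within max_dist metres."""
    best = min(segments, key=lambda s: point_segment_distance(site, s.start, s.end), default=None)
    if best is None or point_segment_distance(site, best.start, best.end) > max_dist:
        return None
    return best.attrs


segments = [
    Segment((0.0, 0.0), (100.0, 0.0), {"road_class": "A road", "bus_lane": True}),
    Segment((0.0, 50.0), (100.0, 50.0), {"road_class": "unclassified", "bus_lane": False}),
]
print(match_site((30.0, 5.0), segments))
```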

With this data, as in the pilot, we can construct a regression model to explore which characteristics are associated with elevated or reduced risk (including interactions with injury-specific variables, such as vehicle involved in the collision).
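Case-crossover data like these are commonly analysed with conditional logistic regression, which compares each injury site only with that cyclist’s own control site(s); whether the project uses exactly this model is an assumption here. Its simplest special case, one binary site characteristic and one control per injury, has a closed form: the odds ratio is the number of cyclists whose injury site alone has the feature divided by the number whose control site alone has it. The pair counts below are hypothetical.

```python
import math
from typing import Iterable, Tuple


def matched_pair_odds_ratio(pairs: Iterable[Tuple[bool, bool]], z: float = 1.96):
    """Matched-pair odds ratio and approximate 95% CI for one binary site characteristic.

    Each pair is (injury_site_has_feature, control_site_has_feature) for one cyclist.
    Concordant pairs (both or neither) carry no information about the odds ratio.
    """
    pairs = list(pairs)
    b = sum(1 for case, ctrl in pairs if case and not ctrl)  # only the injury site has it
    c = sum(1 for case, ctrl in pairs if ctrl and not case)  # only the control site has it
    if b == 0 or c == 0:
        raise ValueError("need discordant pairs in both directions")
    odds_ratio = b / c
    se_log = math.sqrt(1 / b + 1 / c)  # standard error of log(odds_ratio)
    lo = math.exp(math.log(odds_ratio) - z * se_log)
    hi = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, (lo, hi)


# Hypothetical data for one feature, e.g. "site is on a major road".
pairs = [(True, False)] * 6 + [(False, True)] * 3 + [(True, True)] * 10 + [(False, False)] * 5
print(matched_pair_odds_ratio(pairs))
```

An odds ratio above 1 suggests the feature is associated with elevated risk. Models with several variables and interactions (such as vehicle involved) need full conditional logistic regression software, for example R’s `survival::clogit`.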