Pain and physical function belong to the core set of outcomes for phase III trials in osteoarthritis (Bellamy 1997). Short-term (post-intervention) effects were analysed. Outcome measures were extracted by the principal author (MJJ). Two reviewers (MJJ and AFL) extracted information about the different intervention components. For each study and outcome measure, the effect size was calculated as the difference between the mean change in the intervention group and the mean change in the control group, divided by the pooled baseline standard deviation. Positive values indicate that the intervention group improved on average more than the control group. Effect sizes of 0.2 to 0.5 can be interpreted as small,
0.5 to 0.8 as moderate, and greater than 0.8 as large effects. To calculate the standard error of the effect size estimates, the pre-test post-test correlation of the pain and function measurements must be known for each study. Because this information was not available for any of the studies, we assumed a correlation of 0.6. All of the analyses were repeated using
assumed correlations of 0.4 and 0.8, yielding essentially identical results.

A meta-analysis was then conducted to obtain the average effect for the different intervention types and to compare these effects against each other. We anticipated that we might find no trials directly comparing any of the three interventions. We therefore pre-planned a mixed-effects meta-regression model for this purpose, using restricted maximum likelihood estimation to estimate the amount of (residual) heterogeneity and using appropriate
dummy variables for the different intervention codes. To examine potential effect modification, we repeated this analysis including the following covariates in the model: type of control group (education/usual care/ultrasound vs none), study quality (EBRO score), treatment delivery mode (individual vs group), duration of treatment period (in weeks), treatment frequency per week, duration of treatment period × frequency, sex (% females), mean age of the sample, measurement instrument (WOMAC pain/function vs other), and type of weight-bearing exercise used (non-weight-bearing, weight-bearing, or both). All analyses were carried out in R (version 2.10.1) using the ‘metafor’ package (Viechtbauer 2010).

Of the 153 trials identified by the literature search, 21 were relevant. Twelve of these relevant studies were randomised controlled trials that met the inclusion and exclusion criteria. Figure 1 outlines the flow of studies through the review. Reasons for exclusion of the studies were: no non-exercise control group (Deyle et al 2005, Diracoglu et al 2005, McCarthy et al 2004, Veenhof et al 2006); no or only light strengthening exercises used in the intervention (Bautch et al 1997, Kovar et al 1992); and not possible to classify under one of the three codes.
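
As a rough illustration of the effect size calculation described in the data analysis above, the following Python sketch computes the difference in mean change divided by the pooled baseline standard deviation, and shows how the assumed pre-test post-test correlation enters the standard error. The function names, the `lower_is_better` flag, and the trial numbers are hypothetical. The variance is a first-order large-sample approximation assumed here for illustration only, and may differ from the formula used in the review, whose analyses were run in R with metafor.

```python
from math import sqrt


def pooled_sd(sd_t, n_t, sd_c, n_c):
    """Pooled baseline standard deviation across the two trial arms."""
    return sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))


def effect_size(pre_t, post_t, pre_c, post_c, sd_pre_t, sd_pre_c, n_t, n_c,
                lower_is_better=True):
    """Difference in mean improvement (intervention minus control),
    divided by the pooled baseline SD. Positive values mean the
    intervention group improved more than the control group."""
    # Hypothetical flag: for pain scales a lower score is an improvement.
    sign = -1.0 if lower_is_better else 1.0
    improvement_t = sign * (post_t - pre_t)
    improvement_c = sign * (post_c - pre_c)
    return (improvement_t - improvement_c) / pooled_sd(sd_pre_t, n_t, sd_pre_c, n_c)


def approx_variance(d, n_t, n_c, rho=0.6):
    """Assumed large-sample approximation of the sampling variance.
    First term: variance of the difference in mean change scores when
    pre and post scores share one SD and correlate at rho.
    Second term: uncertainty from estimating the SD."""
    return 2 * (1 - rho) * (1 / n_t + 1 / n_c) + d**2 / (2 * (n_t + n_c - 2))


def interpret(d):
    """Thresholds from the text; assigning exactly 0.5 to 'moderate'
    is an assumption, since the text leaves the boundary open."""
    size = abs(d)
    if size > 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"


# Made-up trial: pain on a 0-100 scale, 40 participants per arm.
d = effect_size(pre_t=50, post_t=38, pre_c=50, post_c=46,
                sd_pre_t=20, sd_pre_c=20, n_t=40, n_c=40)
print(f"effect size = {d:.2f} ({interpret(d)})")

# Sensitivity analysis over the assumed correlations 0.4, 0.6 and 0.8.
for rho in (0.4, 0.6, 0.8):
    se = sqrt(approx_variance(d, 40, 40, rho))
    print(f"rho = {rho}: SE = {se:.3f}")
```

The point estimate does not depend on the assumed correlation; only the standard error does, and under this approximation it shrinks as the correlation rises. This is why repeating the analyses with correlations of 0.4 and 0.8 changes the weights given to the studies rather than the effect sizes themselves.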