Core Topics in General & Emergency Surgery: Companion to Specialist Surgical Practice

2
Outcomes and health economic issues in surgery

Sharath C.V. Paravastu and Jonathan A. Michaels

Introduction

Evidence-based medicine demands that all those making decisions regarding clinical management, either on an individual patient basis or at a policy level, consider existing evidence in order to maximise the chance of favourable outcomes and optimise the use of available resources. However, such evidence is rarely clear-cut and there may be conflicting advice because of differences in the way that outcomes are measured, the way in which costs are assessed or the perspective from which an economic evaluation is carried out.

This chapter deals with some of the issues around the measurement of outcomes, the calculation of costs and the methods of economic evaluation. The available outcome measures are considered, drawing the distinctions between disease-specific and generic measures and explaining concepts such as health-related quality of life, quality-adjusted life-years (QALYs) and utilities. The differences between costs, charges and resource use are highlighted, followed by a discussion of issues such as discounting, sensitivity analysis and marginal costing. Finally, a section on economic evaluation describes the different techniques available – cost minimisation, cost-effectiveness, cost–utility and cost–benefit analysis – and discusses the use of cost-effectiveness league tables. The intention is not to provide a full reference work on these subjects but to raise awareness of some of the important issues to be considered when evaluating evidence on specific interventions that may rely on differing outcome measurements or methods of economic evaluation.

Outcome measures

Clinicians routinely tend to consider health outcomes in terms of clinical or biomedical measures such as blood pressure, blood sugar level or bone mineral density. Process-based outcomes such as readmission, reintervention or complication rates are obvious alternatives. Such data are seen as readily available, easily measured, objective and comparable between different settings. However, the present healthcare environment requires the healthcare professional to consider more than the treatment of the condition alone. Greater emphasis is now placed on the patient's actual quality of life: beyond assessing the value of an intervention and the effectiveness, or otherwise, of drug regimens, there should also be an assessment of the patient's physical, mental and social well-being. In line with these interests, there has been considerable research into, and greater emphasis on applying, subjective non-biomedical measures, and the development of such tools (or ‘instruments’) has been substantial since the early 1980s.

When choosing from the plethora of instruments now available, the user should first consider carefully what is to be measured. Before applying a measure to a patient population, it is essential to decide whether it will measure what we are interested in and whether it will answer the questions we wish to have answered.

All too often, assessment tools are applied to patients in the wrong circumstances or used when there is no realistic opportunity to measure what we wish to measure. These are important considerations because administering and analysing these measures can be costly and takes up valuable time for both patients and clinicians. In addition, unsuitable measures applied in the wrong context may yield results that are plausible but wrong, leading to erroneous conclusions. The implications of such findings for patients or the health service can be substantial.

Instruments should therefore be carefully selected for their appropriateness (ability to answer the research question), acceptability (to patients), feasibility (ease of administration), reliability (reproducibility), validity (whether they measure the outcome they are meant to measure), responsiveness (ability to detect change), precision (of scores) and interpretability (ease of understanding the results).[1] In particular, attention should be given to reliability and validity.
Reliability refers to whether an instrument is reproducible: if it is applied in different settings or circumstances to the same, unchanged population, the same results should be obtained. This has particular implications for studies that use instruments to collect longitudinal data on a sample of patients, because we need to be confident that changes observed over a given period reflect actual change. Test–retest reliability is an important consideration; it is assessed by repeating the assessment under the same circumstances at different points in time and comparing the results using correlations or differences. Similarly, for instruments administered by interviewers, there needs to be a high level of agreement between different raters assessing the same patients, albeit at different points in time. For example, Collin et al. found a high level of agreement between the patients, a trained nurse and two skilled observers when the Barthel Index (which assesses patients' ability to carry out daily activities) was applied to the same group of patients.[2] In another example, Aissaoui et al. found a high level of agreement between doctors and nurses applying the Behavioural Pain Score (BPS) to the same group of critically ill patients.[3]

Another psychometric criterion to consider is validity: whether an instrument measures precisely what it sets out to measure. It should be borne in mind that a measure can be reliable without being valid, but cannot be valid without being reliable. Three types of validity are described. The first is content validity, which relates to the choice, appropriateness and representativeness of the instrument's content. Judging content validity involves assessing whether all of the relevant concepts are represented. For example, a representative sample of asthmatic patients could be used to develop an asthma questionnaire to ensure that it captures all the domains of interest for that patient population. The second is criterion validity, the degree to which the measure gives results comparable to some kind of ‘gold standard’. Although this is theoretically a simple concept, very few such gold standards exist. The third is construct validity, which relates to whether the expected patterns of relationships are observed. For example, if a method of valuing outcomes predicts that a patient prefers option A to option B, one would expect this preference to be reflected in the patient's behaviour when faced with genuine clinical choices. Construct validity is normally assessed using multitrait–multimethod techniques,[4] which map the correlations between alternative approaches to measuring the same construct and between measures of different constructs.

As can be seen, the choice of an outcome measure is not always as straightforward as it may first appear. In addition to considerations regarding the patient group, there are important considerations regarding the psychometric properties of the instruments. Different outcome measures suit different patient groups. For example, a biomedical measure such as blood pressure alone might be suitable for comparing two similar drug regimens to assess which gives the ‘best’ control of blood pressure. However, a study comparing renal transplantation with dialysis would need to consider a much broader picture and would be likely to require quality of life as well as mortality as outcome measures.

It is extremely important to choose the right outcomes for the purpose in question, as applying different outcome measures in the same study can lead to different conclusions. For example, a study of vascular patients compared exercise training with angioplasty for stable claudication, with results expressed in terms of ankle–brachial pressure index and walking distance.[5] In the short term, angioplasty improved the pressure index but not the walking distance, whereas exercise improved the walking distance but not the pressure index. This example shows that different outcome measures do not always change in the same direction and, used in isolation, could lead to opposite conclusions.

The following sections examine some of the issues involved in the evaluation and application of some common specific outcome measures.

Mortality

The mortality rate expresses the incidence of death in a population of interest over a given period of time. It is calculated by dividing the number of deaths in that population during the period by the size of the population.
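The calculation is simple arithmetic; the hypothetical helper below makes explicit that the numerator and denominator must refer to the same population and period, and expresses the rate per chosen number of people. The figures in the example are invented for illustration.

```python
def mortality_rate(deaths, population, per=1000):
    """Crude mortality rate: deaths during the period of interest divided by
    the population of interest, expressed per `per` people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population * per


# Hypothetical example: 12 deaths within 30 days of surgery among 800 patients.
rate = mortality_rate(12, 800, per=100)
print(f"30-day mortality: {rate:.1f} per 100 patients")  # 1.5 per 100
```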

Mortality is often used as an outcome measure in studies as an indication of the effectiveness, or otherwise, of a treatment. It is usually easy to derive and is therefore a readily accessible outcome measure. While mortality can indeed provide much useful information, results based on it should always be interpreted with caution. First, procedure- and diagnosis-related mortality rates often refer to inpatient deaths only, or to mortality over a given postoperative period such as 30 days. Variations in short-term survival rates might therefore simply reflect different discharge practices between hospitals or settings. Longer-term survival rates are frequently reported for cancer and other chronic conditions. In interpreting these, it must be borne in mind that distortion may occur as a result of the starting point or the choice of time frame. For example, survival may appear longer in a screened population simply because the starting point is earlier,[6] and comparisons between surgical and medical treatments may be very sensitive to the length of follow-up, because early excess operative mortality with surgical treatment may be offset by better long-term survival. For these reasons, it is often necessary to compare survival curves rather than total survival at a specific time point. This may raise further issues regarding the possible need for discounting, to take account of a preference for survival in the earlier years after treatment (see below).
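To illustrate why a single time point can mislead, the sketch below uses the Kaplan-Meier estimator, a standard method for estimating a survival curve from follow-up data that include censored patients (the text does not prescribe a particular method). The two arms are hypothetical and were constructed so that the surgical arm has early operative deaths while the medical arm loses patients steadily, so the curves cross.

```python
def kaplan_meier(times, events):
    """Return the Kaplan-Meier survival curve as [(time, survival)] steps.

    times  -- follow-up time for each patient (here, months)
    events -- 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = [(0, 1.0)]
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all patients with the same follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Patients censored at t count as at risk at t, then leave the risk set.
        at_risk -= deaths + censored
    return curve


def survival_at(curve, t):
    """Read the step function at time t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Hypothetical arms of ten patients each, followed for up to 60 months.
surgical_curve = kaplan_meier([1, 1] + [60] * 8, [1, 1] + [0] * 8)
medical_curve = kaplan_meier([12, 24, 36, 48] + [60] * 6, [1, 1, 1, 1] + [0] * 6)

for month in (6, 60):
    print(f"month {month}: surgical {survival_at(surgical_curve, month):.2f}, "
          f"medical {survival_at(medical_curve, month):.2f}")
```

With these invented data, survival at 6 months favours the medical arm (1.00 vs 0.80), whereas survival at 60 months favours the surgical arm (0.80 vs 0.60), which is the sensitivity to follow-up period described above.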

Second, mortality can only be a partial measure of quality, and in many situations it is not the most appropriate outcome measure. Many studies report mortality but ignore other important outcomes such as morbidity and quality of life, which are more complex to quantify and are not routinely collected. Mortality is of particularly limited use in studies of low-risk procedures, whose quality can only be assessed accurately with more sensitive measures. For example, mortality alone would not be an appropriate outcome measure for parathyroidectomy, which carries an extremely low mortality; improvement in symptoms or quality of life would be more appropriate.
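One way to see why mortality is insensitive for low-risk procedures is to estimate the sample size needed to detect a difference in mortality. The sketch below uses the standard normal-approximation formula for comparing two proportions (two-sided test); the mortality rates, significance level and power are illustrative assumptions, not figures for parathyroidectomy or any other operation.

```python
from math import ceil
from statistics import NormalDist


def patients_per_group(p1, p2, alpha=0.05, power=0.8):
    """Approximate number of patients needed in each arm to detect a change in
    mortality from p1 to p2 (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Illustrative assumptions: mortality doubling in a low-risk and a high-risk setting.
low_risk = patients_per_group(0.001, 0.002)
high_risk = patients_per_group(0.05, 0.10)
print(f"low-risk: {low_risk} per arm; high-risk: {high_risk} per arm")
```

Under these assumptions, detecting a doubling of mortality from 0.1% to 0.2% needs more than 20,000 patients per arm, compared with a few hundred when mortality doubles from 5% to 10%.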

Differences in case mix can also have a substantial effect on the mortality rate. For example, studies relating workload to outcome tend to report results for the whole sample of patients. Results reported in this way may be misleading, because they take no account of the diversity of patient characteristics within the sample. Differences both in severity of illness and in the risk of adverse outcomes related to comorbidity can significantly affect the interpretation of mortality rates. Sowden and Sheldon illustrate this problem with examples from coronary artery bypass grafting and intensive care, demonstrating the importance of adjusting for case mix.[7] For coronary artery bypass grafting, they report that the relationship between low volume and increased mortality is weaker in studies that adjust for differences in risk among the patients treated. For adult intensive care, they cite a study by Jones and Rowan[8] in which the apparently higher mortality associated with smaller intensive care units ceased to be significant once the data were adjusted for the higher average severity of illness among patients admitted to small units. These examples demonstrate that, to minimise bias in such studies, account must be taken of all factors (beyond workload) that are likely to affect patient outcomes.
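Indirect standardisation is one common way of adjusting mortality for case mix; it is shown here as an illustration and is not necessarily the method used in the studies cited. Each unit's observed deaths are compared with the deaths expected if its patients had experienced reference death risks for their severity band, giving a standardised mortality ratio (SMR). The severity bands, reference risks and unit data are hypothetical.

```python
# Hypothetical reference (e.g. national) death risk for each severity band.
REFERENCE_RISK = {"low": 0.05, "high": 0.30}


def standardised_mortality_ratio(admissions, observed_deaths, reference=REFERENCE_RISK):
    """Indirect standardisation: observed deaths divided by the deaths expected
    if the unit's patients had experienced the reference risks."""
    expected = sum(n * reference[band] for band, n in admissions.items())
    return observed_deaths / expected


units = {
    # name: (admissions by severity band, observed deaths)
    "small unit": ({"low": 20, "high": 80}, 25),
    "large unit": ({"low": 80, "high": 20}, 10),
}

for name, (admissions, deaths) in units.items():
    crude = deaths / sum(admissions.values())
    smr = standardised_mortality_ratio(admissions, deaths)
    print(f"{name}: crude mortality {crude:.0%}, SMR {smr:.2f}")
```

In this invented example the small unit's crude mortality (25%) is much higher than the large unit's (10%), but both have an SMR of 1.00 because the difference is explained entirely by the small unit's sicker case mix.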

Condition-specific outcome measures

The term ‘condition specific’ describes instruments designed to measure health outcomes of specific interest to patients whose health problems are attributable to a particular disease or to other processes. Such instruments are often referred to as ‘disease specific’, but ‘condition specific’ is the more general term, as it also encompasses areas such as natural ageing, trauma and pregnancy, which are not diseases.[9]

The measurement of health status is not restricted to broad generic measures. Researchers and clinicians are often interested in assessing the health status of individuals with a particular condition or disease. As might be anticipated, many tools have been designed for this purpose, primarily aimed at measuring changes that are of importance to clinicians. For example, Spilker et al. identified over 300 such instruments in 1987,[10] and many more are now available. Examples include measures for arthritis, such as the Arthritis Impact Measurement Scales;[11] measures for heart disease, such as the Specific Activity Scale;[12] measures of pain, such as the McGill Pain Questionnaire;[13] and measures for varicose veins, such as the Aberdeen Varicose Vein Questionnaire.[14] These instruments vary in their number of dimensions and items. They are generally self-completed or interviewer-administered, though some also involve professional assessment and clinical interview. Such questionnaires are usually scored simply: most use a numerical scale, such as 1 to 5, and the item scores are usually summed for each dimension or across all items.
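The simple summative scoring described above can be sketched as follows. The item names, their allocation to dimensions and the 1 to 5 range are hypothetical; real instruments define their own items, scoring direction and any weighting or rescaling in their scoring manuals.

```python
# Hypothetical questionnaire: each item is scored 1-5 and belongs to one dimension.
ITEM_DIMENSION = {
    "pain_walking": "pain",
    "pain_night": "pain",
    "stairs": "mobility",
    "shopping": "mobility",
    "low_mood": "mood",
}


def score_questionnaire(responses, item_dimension=ITEM_DIMENSION, low=1, high=5):
    """Sum item scores within each dimension and across all items."""
    dimension_scores = {}
    for item, value in responses.items():
        if not low <= value <= high:
            raise ValueError(f"{item}: score {value} outside {low}-{high}")
        dimension = item_dimension[item]
        dimension_scores[dimension] = dimension_scores.get(dimension, 0) + value
    total = sum(dimension_scores.values())
    return dimension_scores, total


# One hypothetical patient's responses.
dims, total = score_questionnaire(
    {"pain_walking": 4, "pain_night": 3, "stairs": 5, "shopping": 2, "low_mood": 1}
)
print(dims, total)  # {'pain': 7, 'mobility': 7, 'mood': 1} 15
```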

Advantages of condition-specific measures include their relevance and their greater responsiveness to changes in health.[15] Their disadvantages are that they often exclude items relevant to potential complications of treatment, as well as symptoms that do not easily fit the medical model of disease. Generic measures have tended to be preferred because they can assess the benefits of different treatments or conditions in a common, exchangeable currency. This supports decisions about allocative efficiency between healthcare programmes within the total healthcare budget, rather than helping to establish the technical efficiency of producing health benefits for a specific condition.[9]
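Responsiveness is often quantified with indices such as the standardised response mean (SRM), the mean change in score divided by the standard deviation of the change. The SRM is used here as an assumed example of such an index, not as a measure named in the text, and the before and after scores for a condition-specific and a generic instrument are invented to show how the comparison works.

```python
from statistics import mean, stdev


def standardised_response_mean(before, after):
    """Mean change in score divided by the standard deviation of the change.
    A larger SRM indicates a more responsive instrument."""
    changes = [b - a for a, b in zip(before, after)]
    return mean(changes) / stdev(changes)


# Hypothetical scores for the same five patients before and after treatment.
specific_before = [20, 25, 30, 22, 28]
specific_after = [30, 34, 41, 31, 38]
generic_before = [60, 62, 58, 65, 61]
generic_after = [63, 61, 62, 66, 64]

specific_srm = standardised_response_mean(specific_before, specific_after)
generic_srm = standardised_response_mean(generic_before, generic_after)
print(f"condition-specific SRM {specific_srm:.1f}, generic SRM {generic_srm:.1f}")
```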

Patient-reported outcome measures (PROMs)

Because of the increasing need to assess the effectiveness of care from the patient's perspective, a number of PROMs have been developed. For example, in the NHS, patients undergoing primary hip or knee replacement, hernia surgery or varicose vein surgery are asked to complete questionnaires assessing their symptoms and disability before and after surgery. Some NHS trusts record outcome data electronically, for example using the electronic Personal Assessment Questionnaire (ePAQ) in gynaecology. The advantage of PROMs is that they minimise observer bias, because the patient's experience is assessed directly. The results can be useful for informing patients, redesigning service provision and improving quality. Disadvantages can include the response rate and, with electronic data collection, patients' access to and use of the technology needed to complete the questionnaires.
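A minimal sketch of summarising paired before-and-after PROM scores is given below. It reports the response rate alongside the mean change, because results based only on patients who returned both questionnaires may be biased if non-responders differ from responders. The patient identifiers, scores and the assumption that higher scores mean better health are hypothetical.

```python
from statistics import mean


def prom_summary(pre, post):
    """Summarise paired PROM scores.

    pre, post -- dicts mapping a (hypothetical) patient ID to a questionnaire
    score; a patient missing from `post` did not return the follow-up form.
    Higher scores are assumed to mean better health.
    """
    paired = [pid for pid in pre if pid in post]
    if not paired:
        raise ValueError("no patients returned both questionnaires")
    changes = [post[pid] - pre[pid] for pid in paired]
    return {
        "response_rate": len(paired) / len(pre),
        "mean_change": mean(changes),
        "proportion_improved": sum(c > 0 for c in changes) / len(changes),
    }


# Four patients completed the preoperative form; patient C did not return the follow-up.
summary = prom_summary(
    pre={"A": 20, "B": 25, "C": 18, "D": 30},
    post={"A": 35, "B": 24, "D": 40},
)
print(summary)
```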
