Chapter 4

Finding and summarizing evidence

Finding and evaluating existing evidence

There is an enormous amount of information available to clinicians and patients. Finding, sifting, and turning it into evidence is a major challenge.

In this chapter we introduce the skills that are needed to find existing evidence, read papers, assess the quality of the literature, and then summarize it in a systematic review and meta-analysis.

Finding evidence: overview

Evidence can be found in a wide range of sources. PubMed, a leading portal to databases such as Medline, includes >20 million citations to journal articles and books. Unless you use a systematic method to find your evidence, you will waste a lot of time and potentially miss a great deal of important work.

Defining the question

You must begin with a clearly defined clinical or research question and some idea of which information source you will use to search for evidence. Even if you define your research question thoroughly, you may have to modify your search strategy after your initial review.

Searching

Use a bibliographic database such as PubMed and enter your search terms.
Use specific search terms such as MeSH terms (Medical Subject Headings, a controlled vocabulary maintained by the US National Library of Medicine).
Combine terms with Boolean operators ‘AND’, ‘OR’, and ‘NOT’.
Limit your search (by year of publication, language, only those that are full text, review articles, etc.).

If you are not familiar with how to do this then make use of the online tutorials (PubMed), or contact your local librarian.
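
Searches can also be scripted, which makes them reproducible and easy to record. Below is a minimal sketch using Biopython's Entrez module to run a PubMed query from Python; the email address is a placeholder, and Biopython is assumed to be installed.

from Bio import Entrez  # Biopython; assumed installed (pip install biopython)

Entrez.email = "you@example.org"  # placeholder; NCBI asks users to identify themselves

# Boolean operators and field tags work just as they do on the PubMed website
query = "cholangiocarcinoma AND photodynamic AND Review[Publication Type]"

handle = Entrez.esearch(db="pubmed", term=query, retmax=20)
record = Entrez.read(handle)
handle.close()

print(record["Count"], "citations found")
print("PubMed IDs:", record["IdList"])

Saving the query string, the returned IDs, and the date of the search also covers much of the record keeping described below.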

Refining the search

If your search produces 2000 articles, you may need to refine it to make the results more manageable or focused; if it finds only 5, you may want to broaden your search terms. Always go back to your defined research question and use its details to guide your search strategy and criteria.

Keeping records

This is perhaps the most important adjunct to your literature review. Accurate bibliographic details, search histories, critique details, key information from papers, etc. will help you find things again quickly. Reference Manager, EndNote, and Mendeley are useful electronic systems.

Finding and evaluating evidence: worked example

A 76-year-old man has been diagnosed with inoperable intra-hepatic cholangiocarcinoma and has been advised that his prognosis is poor, 6 months at best. He asks you (his GP) whether he should have photodynamic therapy (PDT) which he has heard may prolong his survival. How do you find out whether it would be appropriate for him?

Define the question (see p.74)

‘In a 76-year-old man who has been diagnosed with inoperable intra-hepatic cholangiocarcinoma, will PDT increase his survival compared with palliative care alone?’

Search

In a search engine such as PubMed, type your keywords, e.g.:

Cholangiocarcinoma AND Photodynamic.

The search engine will then translate these:

(("cholangiocarcinoma"[MeSH Terms] OR "cholangiocarcinoma"[All Fields]) AND "photodynamic"[All Fields])

This produces 109 results (search performed March 2011). If you are looking for a quick answer, you are likely to want a review article. You can limit your results to reviews, or add ‘systematic review’ to the search, producing 9 articles.

Select

If you are doing a more thorough search (e.g. if you are doing a research project, or preparing a journal club) you should look at all 109 of these titles and abstracts to select those that are relevant. In a formal systematic review (see p.88) you would define inclusion and exclusion criteria for the papers.

Appraise

You then read the selected papers in detail to see if they are:

Relevant (to your question).
Valid (methodologically sound, well analysed).
Applicable (to your patient).

Methods for assessing these are covered in detail in this chapter. In this case, a brief search identifies a recent, high-quality systematic review that concludes there is some limited evidence that PDT may improve survival,1 but this is based on 2 small randomized controlled trials (RCTs) and some observational studies of varied quality. A firm conclusion was not possible.

Apply

If you find appropriate evidence then you should apply it. In this case, current evidence is not sufficient to make a strong recommendation. However, you would discuss this with your patient, and contact the oncologist to see if there might be ongoing trials that he could enter.

Review

As a GP you are unlikely to have another patient with this condition in the future, but you may still be interested and return to the literature in a couple of years to see if the evidence has changed.

Electronic resources

There are numerous e-resources available to help you identify information and search for up-to-date medical evidence. Any library will be able to provide a list of the key resources that you can use, including details of which ones they have free access to.

Reference databases

Some databases are simply a way of organizing information on all relevant publications in the field. Medline, Embase, and PsycINFO include articles from biomedical journals and some books and other publications. Where they cover the same broad topic, e.g. Medline and Embase, the majority of their content will overlap, but with some key areas of difference. It is always advisable to search at least 2 databases to ensure that you are not missing articles.

Some databases help to do this for you. For example, PubMed is a way of accessing several databases, and it also provides links to the publishers and to full-text articles where these are available.

Citation data

A citation is a reference to another article or source of information. Information about citations can be useful in looking for the most influential articles, which are generally cited many times. There are many ways of accessing citation data, but the most comprehensive are the Citation Indices published in the ISI Web of Science. Citation information is also provided through Google Scholar.

Conference abstracts

Information is sometimes published first at conferences where it may appear in official conference proceedings. These can be accessed through another index, the Conference Proceedings Citation Index, available through the Web of Knowledge.

Open access resources

Publicly funded research is increasingly published in open-access journals to ensure that everyone benefits from the findings. For example, UK PubMed Central (UKPMC) provides free access to peer-reviewed research papers in the medical and life sciences. It includes over 2 million full-text journal articles, access to 24 million abstracts, and clinical guidelines (from the NHS).

Evidence-based medicine resources

If you are looking for answers to a specific medical query it is best to see if the literature has already been reviewed by a reputable group. Evidence-based reviews can be found in a number of places including:

The Cochrane Library: a collection of databases with results of systematic reviews, clinical trials, health technology assessments, economic evaluations, etc.
BMJ Clinical Evidence: a journal publishing systematic reviews, which also links to various resources designed to help clinicians in their evidence-based practice.
TRIP Database: another database of EBM resources available on the Internet.

Bibliographic software

When you search for articles it is essential to keep good records and be able to manage the data. Bibliographic software packages allow you to manage all the referenced evidence you find by enabling you to store it in your own personal database or library. In general, these packages are designed to assist in the following tasks:

Manual cataloguing of bibliographic references relating to particular research areas/topics.
Automated collection and organization of references from bibliographic databases, library catalogues, etc.
Quick searches for a particular reference.
Search and retrieval of bibliographic subsets.
Printing or saving a list of references.
Integration with word-processing software to automatically insert and format citations and bibliographies.
Formatting of references according to particular bibliographic styles (e.g. Chicago, Harvard, individual journals’ styles) and also formats for exporting to other packages and for data-sharing.
Finding, importing, and saving full-text articles, and accessing them from anywhere online.

There are many packages, including:

Reference Manager: http://www.refman.com
EndNote: http://www.endnote.com
Mendeley: http://www.mendeley.com

Critical appraisal

Critical appraisal is the process of assessing the validity of research and deciding how applicable it is to the question you are seeking to answer. This section will cover how to read a paper with these aims in mind.

Validity: are the results of the study valid? Are the conclusions justified by the description of the methodology and the findings? Is the methodology sound, have the authors made reasonable assumptions, are there confounding factors they have failed to consider? If they are using a sample, has it been selected in a way that avoids bias?
Applicability: will the results help locally? Are the problems I deal with sufficiently like those in the study to extrapolate the findings? Can I generalize from this study to my clinical practice?

In subsequent sections we describe tools available to support appraisal of papers reporting different types of study.

Appraising a paper: summary checklist


Summarize the evidence you have read:
Why did they do it?
What did they do?
What did they find?
What did they conclude?
Consider the following:
Question.
Design.
Population.
Methods.
Data management.
Analysis.
Confounders.
Bias.
Ethics.
Patient engagement.
Interpretation.
Applicability.

Question

What is the question the researchers are trying to answer?
How does the question relate to evidence from earlier studies? Is it original or a ‘me too’ study (asking a question that has been asked and answered before in other populations perhaps)?
Is there a hypothesis and is it clearly stated?
Is the question relevant, focused, and carefully formulated?

Design

What type of study design was used? Is it a case report/case series, ecological, time trend, cross-sectional, case–control, cohort, RCT, or is it a systematic review or meta-analysis?
Is that study design appropriate to the question? (See p.194.)
Where does the study design fit in the hierarchy of evidence? (See p.147.)

Population

Which population was used? Is it relevant to my question? Are results generalizable to other populations? E.g. findings from a clinical trial conducted only in men, or adults, or people with a particular stage of disease may not be generalizable to women, children, or people with different disease stages.
Sample size: how many people were included? Has a power calculation been conducted (see p.240) and did the researchers reach the numbers required? If not, then the study may not have sufficient power to detect differences even when they exist (see p.240).
How were the participants recruited?
Were all people in the target population invited to participate or a random sample of these?
What was the response rate? What evidence has been provided to show how responders differed from non-responders? Could this non-response have introduced bias?
Participation of volunteers without reference to a target population may introduce bias, as only people with a particular interest in the research question may be motivated to respond.
Inclusion criteria: these will define the population to which results can be extrapolated.
Exclusion criteria: these refine the target population and remove avoidable sources of bias, e.g. excluding patients with a coexisting illness that may make results difficult to interpret.
Cases: how were cases defined and where were they recruited?
Controls: how were they defined and where were they recruited? Are they representative of people without the disease?
Setting: were the subjects studied in ‘real life’ circumstances such as a clinical setting, or at home? These factors will affect whether results can be replicated in other settings.

Methods

What specific intervention was being considered and what was it being compared with? Which exposure/risk factor was being studied in association with which outcome?
What exposures and confounders were measured and how? Were they measured in the same way in all groups? Is there any potential for bias?
What outcome was measured and how?
Was the outcome obtained by an objective measure, e.g. biochemical tests, or a more subjective method, e.g. symptoms, pain, psychological measures through a questionnaire?
Was measurement the same in all groups? Were the researchers blinded to the exposure/treatment allocation?
Follow-up:
Has the study continued for long enough to detect the effect of the intervention, or followed the cohort for long enough for cases of disease to accrue? Has there been loss to follow-up which may bias the results?

Data management

Were data managed appropriately?
Was data entry checked (e.g. by double entry) and cleaned prior to analysis (see p.220)?
How were data stored? Was this secure and confidential?
Who had access to the data?
Is there any possibility that personal information could be disclosed?
Is any data linkage described and was it likely to miss links?

Analysis

Were primary and secondary end-points described in advance?
Was there an explicit framework for the analysis—based on hypothesis testing, and were subgroup analyses planned, or were the data ‘mined’ for any associations?
Were appropriate statistical tests carried out to evaluate probabilities of a chance finding and to adjust for confounding?
Were statistical methods adequately described? If multivariate analysis was carried out were the steps clearly defined (e.g. how regression models were constructed and how decisions were made about which variables to include or exclude)?

Confounders

Were potential confounders (see p.148) measured and adjusted for?

Bias

Is there a systematic error in the study design, data collection/measurement procedures, analyses, reporting, or a combination of these factors that has led to conclusions that are systematically different from the truth?
e.g. measurement bias could be introduced if a thermometer was incorrectly calibrated 3° lower than the actual temperature.
e.g. allocation bias could be introduced in an RCT if participants with more severe disease were allocated non-randomly to treatment vs placebo groups.
e.g. selection bias could occur if there was high participation by health-conscious people in voluntary cancer screening, or if there was low participation by heavy smokers in studies investigating the association between smoking and cancer.
Completeness of follow-up:
Has any assessment been made of those who dropped out of the study? If, for example, drop-out rates were related to treatment e.g. due to adverse side effects associated with the treatment, this might bias the results.
Was assessment blinded?
In interventional studies (RCTs) it is preferable that both the participants and the researchers are unaware of whether or not the participant has received the trial drug or placebo. If, for example, patients were applying cream to a wound and nurses were measuring their improvement, knowledge of who got cream with active ingredients could potentially bias the recorded results.

Ethics

Has the study been reviewed and approved by an independent Ethics Committee or Institutional Review Board (IRB)?
Has appropriate consent been obtained from participants?
Are there any obvious ethical challenges, e.g. if people were tested (screened) for a condition were they provided with counselling, results, and appropriate management?

Patient engagement

Were any patients consulted or involved in any way in the design, management, analysis, or interpretation of the study?

Interpretation

Do the authors interpret their findings appropriately?
Are there other possible interpretations?
Do the authors situate their results appropriately alongside other research findings?
Do they make a causal inference? Consider the Bradford Hill criteria (see p.156).

Applicability

Applicability is, in some ways, the hardest to judge in a rigidly scientific manner and decisions in this area may still be an art.

Ask yourself: are the problems I deal with sufficiently like those in the study to extrapolate the findings? Can I generalize from this study to my own practice?

Examples to consider:

If a doctor were to locate a paper that is scientifically faultless, they may still be left pondering questions like: if the selection criteria only included patients between 70 and 80 years old, can I use the conclusions for patients in the 65 to 70 age group? And what about the relatively fit and ‘biologically young’ 81-year-olds?

Can studies on urban Americans be extrapolated from, say, Birmingham, Alabama, to Birmingham, West Midlands, and are rural practices in Norway different to those in Wales?

Similarly a teacher might ask ‘Are teaching practices that have been shown to be effective in 5- to 8-year-olds also of value in pre-school age children?’ or a social worker might enquire ‘Would counselling techniques used successfully in Asian youths apply equally well to those from an Afro-Caribbean background?’.

Appraisal checklists

Studies vary in design, and there are key factors to look out for when appraising each type. Specific checklists have been developed for a wide range of studies and can be useful when writing or reading papers.

These checklists are all available through the EQUATOR network website, where they are regularly reviewed and updated, and new ones are added.1,2 It is therefore worth checking the website for updates. As an example, we have included the checklist and flow diagram for RCTs (the CONSORT statement) later in this chapter (see pp.84–86).

Other guidelines are listed in Table 4.1; check the websites for details.

Systematic reviews and meta-analyses

Consider:

Did the review address a clearly defined and important clinical or public health question?
Were all relevant studies identified through a thorough search?
Was methodological quality assessed and studies weighted accordingly?
Was it appropriate to combine the results of the included studies?
Was the possibility of publication bias assessed?
Were the results interpreted appropriately and with consideration of the broader picture?

Diagnostic test accuracy studies

Consider:

Is the study aim clearly defined? Does the study aim to estimate diagnostic accuracy or compare accuracy between tests or across patient groups?
Has the appropriate reference test or gold standard test been chosen for comparison?
Has the study included an appropriate spectrum of participants?
Is the disease status of the tested participants clearly established?
Were the methods for performing the test described in sufficient detail?
Has work-up bias been avoided?
Has observer bias been avoided?
Were confidence intervals calculated for sensitivity, specificity, and the positive and negative predictive values of the test? (See p.16, and the sketch after this list.)
Could the results for the diagnostic test of interest have been influenced by the results of the reference/gold standard test?
Have the study findings regarding the test been placed in the wider context of other potential tests in the diagnostic process?
Is this test relevant to my clinical practice?
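
As a concrete illustration of the confidence-interval item above, the following minimal sketch computes sensitivity, specificity, and predictive values from a 2×2 table of test results against the gold standard, with approximate 95% intervals from the normal approximation to a proportion. All counts are invented.

import math

def proportion_ci(successes, total, z=1.96):
    # Point estimate and approximate 95% CI (normal approximation)
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Invented 2x2 counts: test result vs gold standard
tp, fp, fn, tn = 90, 15, 10, 185

measures = {
    "Sensitivity": (tp, tp + fn),  # positives detected among the diseased
    "Specificity": (tn, tn + fp),  # negatives among the disease-free
    "PPV": (tp, tp + fp),          # disease among test positives
    "NPV": (tn, tn + fn),          # no disease among test negatives
}
for name, (num, denom) in measures.items():
    p, low, high = proportion_ci(num, denom)
    print(f"{name}: {p:.2f} (95% CI {low:.2f} to {high:.2f})")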

Qualitative research studies

Consider:

Was there a clearly defined question?
Was a qualitative method appropriate for the research question?
What was the researcher’s perspective and how did this influence the methods they used for collection of data?
Was the recruitment strategy appropriate for the aims of the research?
Were all ethical issues duly considered?
Were the data transcribed and analysed in a rigorous manner?
Are the conclusions drawn justified by the findings?
Are the findings transferable to other clinical settings?
Do the findings provide me with useful insight for my clinical practice?

Table 4.1 Study appraisal guidelines

Acronym | Study design to be appraised | Website
CONSORT | RCTs | http://www.consort-statement.org
MOOSE | Meta-analyses of observational studies | http://www.equator-network.org
PRISMA | Systematic reviews and meta-analyses | http://www.prisma-statement.org
STARD | Diagnostic test accuracy studies | http://www.stard-statement.org
STREGA | Genetic association studies | http://www.equator-network.org
STROBE | Observational studies | http://www.strobe-statement.org

CONSORT statement

See Table 4.2 for the CONSORT statement, and Fig. 4.1 for flow diagram and additional recommendations.

Table 4.2 CONSORT statement*

Section/topic: checklist item(s)

Title and abstract
Identification as a randomized trial in the title
Structured summary of trial design, methods, results, and conclusions (see CONSORT for abstracts)

Introduction
Background and objectives:
Scientific background and explanation of rationale
Specific objectives or hypotheses

Methods
Trial design:
Description of trial design (such as parallel, factorial) including allocation ratio
Important changes to methods after trial commencement (such as eligibility criteria), with reasons
Participants:
Eligibility criteria for participants
Settings and locations where the data were collected
Interventions:
The interventions for each group with sufficient details to allow replication, including how and when they were actually administered
Outcomes:
Completely defined pre-specified primary and secondary outcome measures, including how and when assessed
Any changes to trial outcomes after the trial commenced, with reasons
Sample size:
How sample size was determined
When applicable, explanation of any interim analyses and stopping guidelines

Randomization
Sequence generation:
Method used to generate the random allocation sequence
Type of randomization; details of any restriction (such as blocking and block size)
Allocation concealment mechanism:
Mechanism used to implement the random allocation sequence (such as sequentially numbered containers), describing any steps taken to conceal the sequence until interventions were assigned
Implementation:
Who generated the random allocation sequence, who enrolled participants, and who assigned participants to interventions
Blinding:
If done, who was blinded after assignment to interventions (for example, participants, care providers, those assessing outcomes) and how
If relevant, description of the similarity of interventions
Statistical methods:
Statistical methods used to compare groups for primary and secondary outcomes
Methods for additional analyses, such as subgroup analyses and adjusted analyses

Results
Participant flow (a diagram is strongly recommended):
For each group, the numbers of participants who were randomly assigned, received intended treatment, and were analysed for the primary outcome
For each group, losses and exclusions after randomization, together with reasons
Recruitment:
Dates defining the periods of recruitment and follow-up
Why the trial ended or was stopped
Baseline data:
A table showing baseline demographic and clinical characteristics for each group
Numbers analysed:
For each group, number of participants (denominator) included in each analysis and whether the analysis was by original assigned groups
Outcomes and estimation:
For each primary and secondary outcome, results for each group, and the estimated effect size and its precision (such as 95% confidence interval)
For binary outcomes, presentation of both absolute and relative effect sizes is recommended
Ancillary analyses:
Results of any other analyses performed, including subgroup analyses and adjusted analyses, distinguishing pre-specified from exploratory
Harms:
All important harms or unintended effects in each group (for specific guidance see CONSORT for harms)

Discussion
Limitations:
Trial limitations, addressing sources of potential bias, imprecision, and, if relevant, multiplicity of analyses
Generalizability:
Generalizability (external validity, applicability) of trial findings
Interpretation:
Interpretation consistent with results, balancing benefits and harms, and considering other relevant evidence

Other information
Registration:
Registration number and name of trial registry
Protocol:
Where the full trial protocol can be accessed, if available
Funding:
Sources of funding and other support (such as supply of drugs), role of funders

*CONSORT strongly recommends reading this statement (previous pages) in conjunction with the CONSORT 2010 Explanation and Elaboration for important clarifications on all the items.

If relevant, they also recommend reading CONSORT extensions for cluster randomized trials, non-inferiority and equivalence trials, non-pharmacological treatments, herbal interventions, and pragmatic trials. Additional extensions are forthcoming; for those and for up-to-date references relevant to this checklist, see: http://www.consort-statement.org


Fig. 4.1 CONSORT Statement 2010 flow diagram. From Altman DG, Moher D, for the CONSORT Group. CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. BMJ 2010; 340:c332.

Systematic review

A systematic review is a method of providing a summary appraisal of existing evidence. Unlike traditional literature reviews, which are subjective and selective, this method is designed to be rigorous and reproducible, reducing bias.

Steps in a systematic review

Define the question.
Search the literature.
Appraise the studies for relevance and quality.
Extract the data.
Synthesize the data.
Report and apply the findings.

A systematic review should be based on a protocol that is produced in advance and includes the following elements:

Clear statement of the question.
Detailed description of the search strategy including databases to be searched and search terms to be used, and other methods for finding evidence such as use of reference lists in articles, contacting key researchers or organizations to find unpublished reports.
Explicit inclusion and exclusion criteria for studies; e.g. define terms in advance, specify relevant populations (e.g. age range), types of study.
Details of who will perform the search; ideally this should be done by more than one person to check that the same studies are identified.
Methods for assessing the quality of studies: this may be through a formal scoring against pre-defined criteria, or may be a qualitative review. Again this should be done by more than one person.
A description of data to be extracted, e.g. setting, participants, response rates, outcome.
A method for synthesizing results; this may be a qualitative or narrative report of the findings, tables summarizing the study populations and findings, or it may lead into a quantitative synthesis through a meta-analysis (see p.90).
A plan for reporting and applying the results.

Reporting a systematic review

Use the checklist in PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) at http://www.prisma-statement.org. Describe the methods as in the protocol, including the details of the search. The results of the search should be presented in a standard flow-chart (Fig. 4.2). Details of the included papers should be in tables, as should key extracted data.


Fig. 4.2 PRISMA 2009 flow diagram. From Moher D, et al. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Med 2009: 6(6):e1000097. (Reproduced under the terms of the Creative Commons Attribution Licence.)

Meta-analysis

What is a meta-analysis?

A meta-analysis is a statistical analysis of a collection of studies, often assembled through a systematic review. A systematic review provides a summary appraisal of the evidence in the systematically collected studies in narrative form. In contrast, the statistical techniques of a meta-analysis enable a quantitative review, combining the results from all the individual available studies to estimate an overall summary, or average, finding across studies. The studies themselves are the primary units of analysis and can include any of the epidemiological study designs that we discuss throughout this book.

Why conduct a meta-analysis?

As a result of limited study size, biases, differing definitions or quality of data on exposures, disease, and potential confounders, and other study limitations, individual studies often have inconsistent findings. They are therefore often insufficient in themselves to answer a research question definitively or to provide clear enough evidence on which to base clinical practice. A meta-analysis, however, has several advantages, including:

Making sense of an inconsistent body of evidence by contrasting and combining results from different studies with the aim of identifying consistent patterns.
Including more people than any single constituent study, producing a more reliable and precise estimate of effect.
Identifying differences (heterogeneity) between individual studies.
Exploring whether, for the question under investigation, studies with positive findings are more likely to have been published than studies with negative findings (publication bias).
Providing an evidence base for clinical decisions.

What are the steps in a meta-analysis?

There are 4 steps in a meta-analysis:

1. Extraction of the main result or study effect estimate from each individual study (calculation is sometimes necessary), e.g. odds ratio, relative risk, etc., together with an estimate of the probability that the effect estimate is due to chance. Every study effect estimate is accompanied by its standard error (SE), describing the variability in the study estimate due to random error. Sometimes this is expressed as the variance, which is the SE squared; alternatively, the level of certainty we have in the estimated study result may be shown through a confidence interval (see p.229).

2. Checking whether it is appropriate to calculate a pooled summary/average result across the studies. Appropriateness depends on just how different the individual studies are that you are trying to combine. Sometimes, it is not appropriate to combine the studies at all. If it is appropriate, you must decide which method to use before you begin calculation (see p.92).

3. Calculation of the summary result as a weighted average across the studies, using either a random effects or a fixed effects model (see p.93); a minimal sketch follows these steps. The weight usually takes into account the variance of the study effect estimate, which reflects the size of each study's population. This weighted average gives greater weight to the results from studies which provide more information (usually larger studies with smaller variances), as these are usually more reliable and precise, and less weight to less informative studies (often smaller ones).

4. Presentation of summary results—you will often see these as forest plots (Fig. 4.3). This is a graphical representation of the results from each study included in a meta-analysis, together with the combined meta-analysis result.
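
The sketch promised in step 3: a minimal fixed effects (inverse-variance) pooling of invented log odds ratios and standard errors. Each study's weight is the reciprocal of its variance, so larger, more precise studies dominate the average.

import math

log_or = [0.42, 0.10, 0.55, 0.31]  # invented study estimates (log odds ratios)
se = [0.20, 0.15, 0.35, 0.25]      # their standard errors

weights = [1 / s ** 2 for s in se]  # weight = 1 / variance
pooled = sum(w * y for w, y in zip(weights, log_or)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"Pooled OR {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(low):.2f} to {math.exp(high):.2f})")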

In the forest plot

The overall estimate from the meta-analysis is usually shown at the bottom, as a diamond.
The centre of the diamond and dashed line corresponds to the summary effect estimate.
The width of the diamond represents the confidence interval around this estimate.
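
A minimal sketch of a forest plot drawn with matplotlib (assumed installed), reusing the invented estimates from the pooling sketch; each study is a square with a horizontal confidence interval, and the pooled result appears at the bottom.

import math
import matplotlib.pyplot as plt

labels = ["Study A", "Study B", "Study C", "Study D"]  # hypothetical names
log_or = [0.42, 0.10, 0.55, 0.31]
se = [0.20, 0.15, 0.35, 0.25]

weights = [1 / s ** 2 for s in se]
pooled = sum(w * y for w, y in zip(weights, log_or)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

rows = range(len(labels), 0, -1)  # one row per study, plotted top to bottom
for row, est, s in zip(rows, log_or, se):
    plt.errorbar(est, row, xerr=1.96 * s, fmt="s", color="black", capsize=3)

# Summary estimate on the bottom row; a diamond marker stands in for the
# conventional diamond whose width spans the confidence interval
plt.errorbar(pooled, 0, xerr=1.96 * pooled_se, fmt="D", color="black")
plt.axvline(0, linestyle=":")        # line of no effect (log OR = 0)
plt.axvline(pooled, linestyle="--")  # dashed line at the summary estimate
plt.yticks(list(rows) + [0], labels + ["Pooled"])
plt.xlabel("Log odds ratio (95% CI)")
plt.show()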

Fig. 4.3 Understanding the results of a meta-analysis presented in a forest plot.

Analysis and interpretation of a meta-analysis

It is not always appropriate to pool results into a statistical meta-analysis.

How do I check if it’s sensible to pool results?

There are 2 elements in the decision: clinical and statistical.

Clinical

This requires a somewhat subjective judgement as to whether the collected studies address the same question closely enough that a summary of their results would be informative and meaningful.
Pooling may not be appropriate if differences in study participants, interventions, or outcomes suggest different underlying research questions.
For example: if you were considering undertaking a meta-analysis of the effect of a particular treatment of depression, it would not be sensible to pool studies investigating treatment efficacy in teenagers with studies in the elderly; nor should you combine studies reporting outcomes at 6 weeks with those reporting outcomes at 12 months.
If you have reason to believe that the studies are not estimating the same effects, do not pool the results.

Statistical

Studies should be assessed to see if there is significant variation with respect to the populations, interventions/exposures, outcomes, clinical settings, and designs used.
These differences, or heterogeneity, can be explored via Galbraith (radial) plots.

Assessing heterogeneity via Galbraith plots

Galbraith (radial) plots facilitate the examination of heterogeneity, including detection of outliers (Fig. 4.4):

The plot shows the standardized intervention/exposure effect (the effect estimate divided by its standard error) against the reciprocal of the SE.
The regression line through the origin represents the pooled effect estimate, with 95% boundaries on either side.
Where there is little heterogeneity, the majority (95%) of studies should fall within these lines.
The vertical spread describes the extent of heterogeneity and reveals outliers.
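
Heterogeneity can also be quantified. A common numerical complement to the Galbraith plot (not described in the text above, so named here explicitly) is Cochran's Q with the derived I² statistic. A minimal sketch, reusing the invented estimates from the pooling example:

log_or = [0.42, 0.10, 0.55, 0.31]  # invented study estimates (log odds ratios)
se = [0.20, 0.15, 0.35, 0.25]      # their standard errors

weights = [1 / s ** 2 for s in se]
pooled = sum(w * y for w, y in zip(weights, log_or)) / sum(weights)

# Q: weighted sum of squared deviations of each study from the pooled effect
q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_or))
df = len(log_or) - 1

# I-squared: percentage of total variation across studies attributable to
# heterogeneity rather than chance
i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
print(f"Q = {q:.2f} on {df} df; I^2 = {i_squared:.0f}%")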

Calculating a pooled result

The method depends on whether there is statistical heterogeneity. If no evidence of heterogeneity is found, a fixed effects model can be used to pool the effect estimates. If some heterogeneity exists, and the underlying assumption of a fixed effects model (i.e. that the diverse studies are estimating a single effect) is too simplistic, the heterogeneity can be allowed for by using an alternative approach, known as the random effects model, to pool the effect estimates.


Fig. 4.4 Interpreting the Galbraith (radial) plot.

The fixed effects model

For use where there is no evidence of heterogeneity.
Assumes every study evaluates a common treatment/exposure effect.
Assumes there is a single ‘true’ or ‘fixed’ underlying effect.

The random effects model

Assumes that the true treatment/exposure effects in the individual studies may be different from each other.
Assumes there is no single effect to estimate but a distribution of effects (due to between-study variation), from which the meta-analysis estimates the mean (and standard deviation) of the different effects.
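
The text does not prescribe a particular random effects method, so as one common example here is a minimal sketch of the DerSimonian and Laird approach: estimate the between-study variance (tau²) from Cochran's Q, add it to each study's own variance, and re-weight. It reuses the invented estimates from the earlier sketches.

import math

log_or = [0.42, 0.10, 0.55, 0.31]  # invented study estimates (log odds ratios)
se = [0.20, 0.15, 0.35, 0.25]      # their standard errors

w = [1 / s ** 2 for s in se]       # fixed effects (inverse-variance) weights
fixed = sum(wi * yi for wi, yi in zip(w, log_or)) / sum(w)
q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_or))  # Cochran's Q
df = len(log_or) - 1

# Method-of-moments (DerSimonian-Laird) estimate of between-study variance
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (q - df) / c)

# Re-weight, adding the between-study variance to each study's own variance
w_star = [1 / (s ** 2 + tau2) for s in se]
pooled = sum(wi * yi for wi, yi in zip(w_star, log_or)) / sum(w_star)
pooled_se = math.sqrt(1 / sum(w_star))
print(f"Random effects pooled OR {math.exp(pooled):.2f} (SE {pooled_se:.2f})")

Because tau² inflates every study's variance, the pooled standard error grows and the confidence interval widens, which is why random effects results tend to be more conservative than fixed effects ones (see the points below).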

If specific sub-groups of studies display heterogeneity, these can be pooled separately in sub-analyses: e.g. in the earlier depression example in this topic, pooling findings for young patients together, or pooling studies focusing on longer-term outcomes together.

Important points to remember:

Simply adding up data from individual studies is inappropriate and is not what a meta-analysis does.
If the studies are too heterogeneous, it may be inappropriate, even misleading, to statistically pool the results from separate studies.
Random effects meta-analysis will tend to be more conservative than fixed effects, as it allows for an extra source of variation (between study).

Evaluating meta-analyses: what to watch out for

Providing an overall summary measure of association/effect can give a false impression of consistency across individual study results.
Always look out for systematic variations in findings across studies.
Was it appropriate to pool the studies—how well was heterogeneity explored?
Bear in mind that no meta-analysis can compensate for the inherent limitations of the trial or observational data being combined.*
Consider the possibility that all studies have suffered a common systematic error, particularly before making inferences about causality.
Publication bias—the results of a meta-analysis may be biased if the included studies are a biased sample of studies in general (this is the meta-analytic analogue of selection bias in other epidemiological study designs).

Exploring publication bias

Publication bias refers to the greater likelihood that research with statistically significant results will be published in the peer-reviewed literature in comparison to those with null or non-significant results.

When evaluating a meta-analysis, you need to consider the following:

Failure to include all relevant data in a meta-analysis may mean the effect of an intervention/exposure is over- (or under-) estimated.
Publication bias is caused when only a subset of the relevant data is available.
Null or non-significant findings (especially in small studies) are less likely to be reported/published than statistically significant findings.
Publication bias in meta-analyses can be explored using funnel plots.
Funnel plots show whether there is a link between study size (or precision) and the effect estimate.
A funnel plot which is symmetric about the mean effect and shaped like an upside-down funnel indicates no publication bias.
A funnel plot with the lower right or left hand corner of the plot missing indicates that publication bias is present, as shown in Fig. 4.5.
Detecting publication bias is not straightforward and nor is correcting for it, but the funnel plot helps us to estimate how big an impact such bias might be having on the results of the meta-analysis.
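
A minimal sketch of a funnel plot using numpy and matplotlib (both assumed installed). The study effects are simulated around a single true value, so the resulting plot should look roughly symmetric, as in Fig. 4.5(a).

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
true_effect = 0.3
se = rng.uniform(0.05, 0.5, size=40)   # a mix of small and large studies
effects = rng.normal(true_effect, se)  # smaller studies scatter more widely

plt.scatter(effects, 1 / se)
plt.axvline(true_effect, linestyle="--")  # vertical line at the true effect
plt.xlabel("Effect estimate (e.g. log odds ratio)")
plt.ylabel("Precision (1 / standard error)")
plt.title("Simulated funnel plot: roughly symmetric, no publication bias")
plt.show()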


Fig. 4.5 Interpreting funnel plots: the lack of publication bias (a) and the presence of publication bias (b).

Further reading

Egger M, Davey Smith G, Altman DG (eds). Systematic Reviews in Health Care: Meta-analysis in Context. London: BMJ Publishing Group; 2001.

Greenhalgh T. How to read a paper: Papers that summarise other papers (systematic reviews and meta-analyses). BMJ 1997; 315:672–75.

The Cochrane Collaboration: http://www.cochrane-net.org


* Note that meta-analysis is only as good as the source data. It is no substitute for undertaking high quality original studies and trials which can then feed in to the decision-making process.