Development of the Instrument to assess the Credibility of Effect Modification Analyses (ICEMAN) in randomized controlled trials and meta-analyses
Abstract
BACKGROUND:
Most randomized controlled trials (RCTs) and meta-analyses of RCTs examine effect modification (also called a subgroup effect or interaction), in which the effect of an intervention varies by another variable (e.g., age or disease severity). Assessing the credibility of an apparent effect modification presents challenges; therefore, we developed the Instrument for assessing the Credibility of Effect Modification Analyses (ICEMAN).
METHODS:
To develop ICEMAN, we established a detailed concept; identified candidate credibility considerations in a systematic survey of the literature; together with experts, performed a consensus study to identify key considerations and develop them into instrument items; and refined the instrument based on feedback from trial investigators, systematic review authors and journal editors, who applied drafts of ICEMAN to published claims of effect modification.
RESULTS:
The final instrument consists of a set of preliminary considerations, core questions (5 for RCTs, 8 for meta-analyses) with 4 response options, 1 optional item for additional considerations and a rating of credibility on a visual analogue scale ranging from very low to high. An accompanying manual provides rationales, detailed instructions and examples from the literature. Seventeen potential users tested ICEMAN; their suggestions improved the user-friendliness of the instrument.
INTERPRETATION:
The Instrument for assessing the Credibility of Effect Modification Analyses offers explicit guidance for investigators, systematic reviewers, journal editors and others considering making a claim of effect modification or interpreting a claim made by others.
Investigators who conduct randomized controlled trials (RCTs) and meta-analyses of RCTs often perform analyses of effect modification to assess whether intervention effects might vary by another variable such as age, disease severity or, in a meta-analysis, study setting or year of study.1–14 The terminology varies; Box 1 presents the alternatives currently in use.
Investigators sometimes make claims that an effect modification is present. Literature surveys suggest that 14%–26% of RCTs and meta-analyses emphasize at least 1 potential effect modification in their abstract or discussion.4–9,11
The interest in effect modification is understandable: if patients with differing characteristics respond differently to the same intervention, the overall effect estimate is misleading for some, if not all, patients. Identifying situations in which true variation in effects exists is important, and the notion of tailoring therapy to patients has enormous appeal. Moreover, the opportunities for analyzing effect modification grow with the increasing number of newly developed diagnostic and genomic markers.
However, mistaken claims of effect modification may compromise optimal patient care, and many claims of effect modification have subsequently proved spurious.15–24 Applying a mistaken claim of effect modification may cause harms through administration of ineffective treatment or may lead to patients’ being denied beneficial therapies, and will likely increase health care costs.
Numerous theoretical analyses and simulation studies show that the fundamental reason for misleading claims of effect modification is chance:25 even if the treatment effect is the same for all patients, examining a sufficiently large number of candidates will inevitably reveal an apparent, but misleading, effect modification. Other reasons that contribute to spurious claims include selective reporting,5,7 lack of background knowledge and prior evidence,5,7,26 and failure to use a proper statistical analysis.5,8,27–29
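The role of chance described above is easy to demonstrate with a toy simulation (our illustration, not code from the paper): even when no true effect modification exists, each interaction test is "significant" at the 0.05 level about 5% of the time, so testing many candidate modifiers almost guarantees at least one spurious finding.

```python
import random

def spurious_interaction_rate(n_modifiers, n_sims=2000, alpha=0.05, seed=7):
    """Simulate trials with NO true effect modification and count how often
    at least one of n_modifiers independent interaction tests falls below
    alpha. Under the null hypothesis, each test's p value is uniform on
    [0, 1], so any single test is 'significant' with probability ~alpha."""
    random.seed(seed)
    hits = 0
    for _ in range(n_sims):
        p_values = [random.random() for _ in range(n_modifiers)]
        if min(p_values) < alpha:
            hits += 1
    return hits / n_sims
```

With 10 candidate modifiers, the expected rate of at least one spurious "significant" interaction is 1 - 0.95**10, roughly 0.40, i.e., a 40% chance of a misleading apparent effect modification.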
Nevertheless, some claims of effect modification — likely a minority1 — will be true. Because most claims will never undergo replication to determine their veracity, stakeholders, including health care providers, clinical investigators, systematic review authors, guideline developers and journal editors, need criteria to differentiate spurious from real claims.
Methodologists first suggested credibility criteria for effect modification in the early 1990s.30,31 Since then, 30 groups have suggested sets of 3–21 criteria.25 Aside from the variability in these criteria, previous sets have suboptimal presentation, which results in ambiguity in their application.25 Some criteria — for instance, whether the effect modifier was one of a small number tested32 — are subjective, and users could benefit from more detailed guidance. Most important, none of the previous sets of criteria involved a rigorous development process or underwent user testing before publication.25
To address these limitations, we developed the Instrument for assessing the Credibility of Effect Modification Analyses (ICEMAN).
Methods
Development of ICEMAN consisted of 4 steps:33,34 clarifying the scope and measurement concept of the instrument; conducting a systematic literature survey to identify existing instruments and candidate credibility criteria; conducting a consensus study among experts to identify key criteria and design the instrument; and performing formal user testing.
Concept
Members of the core group (S.S., G.H.G., M.B., X.S., M.W., L.T.) began with the following concept:
Effect modification means that the effect of an intervention on an outcome varies by levels of another variable.
The aim of the new instrument is to assist users in assessing the credibility of claims that effect modification is present (rather than claims that effect modification is absent, which would require different criteria).
An effect modification is credible if it is unlikely to be the result of chance or bias.
We also clarified that patient importance is not part of credibility; that an effect modification primarily defines an association between the modifier and the causal effect of the intervention on the outcome (i.e., a causal relation between the modifier and the outcome is not necessary for the claim to be valid); and that effect modification can be assessed on any scale (e.g., risk ratio or risk difference).
Target users include health care providers, trial investigators, systematic review authors, health technology assessment practitioners, journal editors, guideline developers and health policy-makers.
The instrument will address individual RCTs and meta-analyses of RCTs (including aggregate data and individual participant data meta-analyses).
The core instrument will consist of no more than 8–12 questions to keep both the demands of application and the cognitive burden manageable and will provide explicit response options for each question.
Responses to individual criteria should vary when applied to different claims of effect modification. An overly strict or lenient criterion that does not vary is useless for distinguishing more from less credible claims.
The instrument should conclude with a summary assessment that expresses the overall credibility of the apparent effect modification.
Systematic literature survey
The objectives of the systematic literature survey, presented in a separate publication,25 were to identify existing instruments for assessing the credibility of effect modification, candidate credibility criteria and leading experts in the field. Based on a comprehensive search of journal articles and textbooks, we identified 150 eligible publications, from which we abstracted 36 candidate criteria (Appendix 1, available at www.cmaj.ca/lookup/suppl/10.1503/cmaj.200077/-/DC1), 30 previous sets of criteria (none reflected our concept sufficiently) and 40 experts.25
Consensus study
The aim of the consensus study was to identify key criteria to assess the credibility of effect modification claims and use them to develop a user-friendly instrument. The steering committee randomized the order of the 40 experts identified in the literature survey and invited the first 18 to participate, of whom 9 agreed, 6 declined, and 3 did not respond. The final group included 15 members (the core group and 9 experts: R.V., C.H.S., R.A.H., J.G., M.B., G.J.M.G.V., I.J.D., W.S. and J.P.A.I.). The consensus study included elements of the Delphi method complemented by interactive video conferences.
In a first step, S.S. created a list of the 36 candidate criteria identified in the systematic survey, their frequency of reporting and reported rationales for their use (Appendix 2, available at www.cmaj.ca/lookup/suppl/10.1503/cmaj.200077/-/DC1). The members of the group (excluding S.S.) independently rated the importance of each criterion for credibility assessment from 1 (not important at all) to 7 (highly important). They also provided written suggestions to drop criteria, merge criteria or add new criteria. During the first video conference, the group discussed the importance ratings and identified 20 criteria that should be included (some of which we later combined), 8 that should be excluded and 8 that were considered optional (Appendix 2).
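The sorting of candidate criteria into "include," "optional" and "exclude" bins can be caricatured in a few lines. The decision rule and thresholds below are hypothetical (the group reached its classification through discussion, not a fixed cutoff) and serve only to illustrate how median importance ratings on the 1–7 scale might drive such a triage:

```python
from statistics import median

def classify_criterion(ratings, include_at=5.5, optional_at=4.0):
    """Classify a candidate criterion from panel importance ratings (1-7).
    The thresholds are illustrative, not those used by the ICEMAN group."""
    m = median(ratings)
    if m >= include_at:
        return "include"
    if m >= optional_at:
        return "optional"
    return "exclude"
```

For example, a criterion rated [7, 6, 6, 5, 7] would be included under this hypothetical rule, while one rated [2, 3, 2, 1, 3] would be excluded.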
Based on the initial criteria selected, the core group developed a first draft of the instrument. Initially, we planned to create a single instrument applicable to individual RCTs, aggregate data meta-analyses and meta-analyses of individual participant data. Because a single version proved excessively complex, the group decided to create separate versions for RCTs and meta-analyses (of any type). We drafted preliminary considerations, explicit items (each with 4 response options), optional considerations and a final item to assess overall credibility by means of a visual analogue scale. Where possible, we used a format similar to that of other research assessment instruments such as the Cochrane risk-of-bias tool35 and Grading of Recommendations Assessment, Development and Evaluation (GRADE).36
We held a second video conference to reach consensus on the general structure of the instrument, including preliminary considerations, core items and format of the overall rating. Main discussion points included issues of threshold selection (e.g., for p values and number of analyses) and framing of optional considerations.
In the last part of the consensus study, we created a detailed manual that provides, for each item, a justification of its importance for assessing the credibility of effect modification claims (Appendix 3, available at www.cmaj.ca/lookup/suppl/10.1503/cmaj.200077/-/DC1) (the justifications for excluding candidate items are provided in Appendix 2). For each response option, we sought a supporting example of an effect modification claim from the medical literature.
Throughout the consensus study, we periodically circulated summaries of the discussions and updated versions of the instrument to the experts, inviting them to provide comments. Appendix 2 documents major developments.
User testing
The aim of user testing was to identify challenges experienced by potential users in applying ICEMAN to a claim of effect modification that we provided. Each user received the full text of an RCT or meta-analysis in which the authors claimed an effect modification, and drafts of ICEMAN and the manual. We selected 17 claims specifically to introduce variation across the range of possible credibility (4 very low, 5 low, 4 moderate and 4 high, based on our judgement) and across designs (9 RCTs, 4 aggregate data meta-analyses and 4 meta-analyses of individual participant data) (Appendix 4, Supplemental Table S1, available at www.cmaj.ca/lookup/suppl/10.1503/cmaj.200077/-/DC1).
We recruited 17 potential users from 3 sources: corresponding authors of Cochrane reviews (n = 7), authors of published RCTs (n = 5) and journal editors from personal networks (n = 5). The users varied with respect to gender, background and familiarity with issues of effect modification (Appendix 4, Supplemental Table S2). We continued to enrol users until they did not identify any new major limitations of the instrument.
One of 2 investigators (S.S. or N.D.) interviewed users immediately after they had applied ICEMAN. The investigators followed a semistructured interview guide that included open-ended questions (e.g., “What was your experience when you applied the first item?”) and allowed expansion on topics that emerged during the interview. The interviews lasted 25–70 (median 37) minutes and were video-recorded. The interviewers transcribed the interviews and extracted suggestions for improvement using Dedoose software (www.dedoose.com). We updated the instrument after 7, 12 and 15 interviews, before the consensus group finalized the instrument and manual. The users’ comments and resulting changes are summarized in Appendix 4, Supplemental Table S3.
Ethics approval
The Hamilton Integrated Research Ethics Board approved the user-testing study.
Results
The version of ICEMAN for individual RCTs is presented in Appendix 5 and that for meta-analyses of RCTs in Appendix 6 (both available at www.cmaj.ca/lookup/suppl/10.1503/cmaj.200077/-/DC1). The material, including potential updates, can also be downloaded from https://www.iceman.help.
The instrument can be used by investigators performing RCTs or meta-analyses who are planning analyses of effect modification; by investigators evaluating the credibility of claims they are considering making; and by those who are critically appraising effect modification claims in the literature.
Both versions start with a set of preliminary considerations that link ICEMAN to a specific study, specify the effect modification claim under consideration and alert users that ICEMAN may not apply to effect modifiers measured after randomization (see manual [Appendix 3] for more details).
The version for RCTs includes 5 core questions and that for meta-analyses, 8 core questions (4 questions overlap) (Table 1). For each core question, ICEMAN provides 4 response options that differ in wording but have the same order and logic: response options on the left indicate definitely or probably reduced credibility, and response options on the right indicate probably or definitely increased credibility. We combined the response option “probably no” with “unclear” to cover situations with insufficient information.
Table 1: Core questions of the two versions of ICEMAN

| Core question | Question no., RCT version | Question no., meta-analysis version |
| --- | --- | --- |
| Is the analysis of effect modification based on comparison within rather than between trials? | – | 1 |
| For within-trial comparisons, is the effect modification similar from trial to trial? | – | 2 |
| For between-trial comparisons, is the number of trials large? | – | 3 |
| Was the direction of effect modification correctly hypothesized a priori? | 1 | 4 |
| Was the effect modification supported by prior evidence? | 2 | – |
| Does a test for interaction suggest that chance is an unlikely explanation of the apparent effect modification? | 3 | 5 |
| Did the authors test only a small number of effect modifiers or consider the number in their statistical analysis? | 4 | 6 |
| Did the authors use a random-effects model? | – | 7 |
| If the effect modifier is a continuous variable, were arbitrary cut points avoided? | 5 | 8 |

Note: – = not applicable.
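Questions 1, 2 and 7 of the meta-analysis version concern within-trial interaction estimates and random-effects pooling. As an illustration (our sketch, not code from the instrument), per-trial interaction estimates, such as differences in log odds ratios between subgroups, can be pooled with a DerSimonian–Laird random-effects model; a large between-trial variance tau² then signals that the effect modification is not similar from trial to trial:

```python
import math

def pool_interactions(estimates, variances):
    """DerSimonian-Laird random-effects pooling of per-trial interaction
    estimates (e.g., differences in log odds ratios between subgroups).
    Returns the pooled estimate, its standard error, and tau^2, the
    between-trial variance of the interaction."""
    if len(estimates) < 2:
        raise ValueError("need at least 2 trials to pool")
    # Fixed-effect (inverse-variance) estimate and Cochran's Q
    w = [1.0 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # method-of-moments estimate
    # Re-weight including the between-trial variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2
```

When the per-trial interaction estimates agree closely, tau² collapses to 0 and the pooling reduces to the fixed-effect estimate, consistent with question 2's focus on trial-to-trial similarity.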
One optional question allows additional considerations that can reduce or increase credibility, such as results from sensitivity analyses, a dose–response relation, or other considerations that are difficult to ascertain, are less relevant or do not universally apply (see manual [Appendix 3] for examples).
The instrument concludes with an overall credibility assessment rated on a visual analogue scale divided into 4 areas (very low credibility, low credibility, moderate credibility and high credibility). The 4 areas correspond roughly to probabilities of less than 25%, 25%–50%, 51%–75% and greater than 75%, respectively, that the effect modification truly exists. To aid interpretation, the final item provides suggestions — rather than an algorithm — for judging overall credibility.
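As a reading aid, the correspondence between a mark on the scale and the 4 credibility areas with their suggested probability ranges can be sketched as follows; the 0–100 numeric coding of the scale is our assumption for illustration (the instrument itself uses an unnumbered visual scale):

```python
def credibility_area(vas_position):
    """Map a mark on a 0-100 coding of ICEMAN's visual analogue scale to
    the four credibility areas and the rough probability that the effect
    modification truly exists, per the ranges suggested in the paper."""
    if not 0 <= vas_position <= 100:
        raise ValueError("VAS position must lie between 0 and 100")
    if vas_position < 25:
        return ("very low", "< 25%")
    if vas_position <= 50:
        return ("low", "25%-50%")
    if vas_position <= 75:
        return ("moderate", "51%-75%")
    return ("high", "> 75%")
```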
The manual (Appendix 3) provides detailed explanations, key references, examples for each response option, suggestions for use and presentation, and elaboration on conceptual considerations.
Interpretation
Using a systematic approach involving both experts and potential users, we developed ICEMAN. The instrument provides versions for individual RCTs and meta-analyses, is short (5 core items for RCTs, 8 for meta-analyses), is structured (preliminary considerations, core questions, overall rating) and provides a detailed manual.
One of the benefits of ICEMAN is that it may help to reduce over-reliance on the p value for interaction when assessing credibility. The p value carries no more weight than the other items. Instead of a single threshold, the response options are based on 3 thresholds, thus emphasizing the continuous character of the concept. The expectation is therefore that claims of effect modification can be reasonably credible despite borderline p values, whereas claims that are based exclusively on very low p values may have low credibility.
Limitations
A possible limitation of ICEMAN is that formulating 4 response options, to optimize reliability, required specifying threshold values for the number of studies in a meta-analysis, p values and the number of candidate effect modifiers. These thresholds are arbitrary, and experts initially disagreed on the specific threshold values and on whether thresholds should be used at all. Thresholds for interaction p values were particularly controversial within our group, although the group finally found a compromise acceptable to all (thresholds of 0.05, 0.01 and 0.005). It is perhaps reassuring that no participant in the user-testing study raised concerns about the chosen thresholds, and those who commented on them appreciated their explicitness. Nevertheless, some users may disagree with the chosen thresholds.
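To make the compromise concrete, here is one way to band an interaction p value using the agreed thresholds (0.05, 0.01 and 0.005), together with the standard z test for interaction between two subgroups. The band labels are our paraphrase of the response logic, not wording quoted from the instrument:

```python
import math

def interaction_test(est1, se1, est2, se2):
    """Two-sided z test comparing effect estimates on a log scale
    (e.g., log risk ratios) between two subgroups -- the standard
    test for interaction. Returns the p value."""
    z = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))

def credibility_band(p):
    """Band an interaction p value against the thresholds 0.05, 0.01
    and 0.005; labels are our illustrative reading."""
    if p >= 0.05:
        return "chance remains a likely explanation (reduced credibility)"
    if p >= 0.01:
        return "borderline support against chance"
    if p >= 0.005:
        return "chance an unlikely explanation (increased credibility)"
    return "chance a very unlikely explanation (definitely increased credibility)"
```

Note that under this logic a p value just below 0.05 lands in a middle band rather than flipping a claim from "not credible" to "credible," which is exactly the over-reliance on a single threshold that the instrument aims to avoid.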
Another potential limitation is that the core questions may not include all credibility considerations that experienced analysts may deem relevant, in particular for complex analyses such as modelling of continuous effect modifiers37,38 and data-driven algorithms for subgroup discovery.39,40 For instance, some analysts may question the appropriateness of statistical models underlying tests for interaction,26,41,42 may differ in their approach to multiple testing,43 may consider 3-way or 4-way interactions, may correct for exaggerated magnitude of effect modification44 or may want to consider the correlation structure between multiple effect modifiers.45 Even for such users, ICEMAN will provide a useful starting point. For instance, if the core questions suggest low or very low credibility, it is unlikely that investing in additional, more complex analyses could increase credibility substantially; however, if the core questions suggest moderate credibility, users can use ICEMAN’s optional item to incorporate additional considerations.
Some properties of ICEMAN remain uncertain. We plan to investigate the reliability of ICEMAN ratings when applied by different raters to claims of effect modification. Another open question is the validity of ICEMAN ratings. We are unsure, however, whether there will ever be sufficient data available to investigate validity if we consider independent replication the reference standard for establishing the credibility of an effect modification claim. A recent analysis showed that attempts to replicate effect modification findings in RCTs are extremely rare.24 Therefore, we invite ICEMAN users to share their ratings with us so we can start building a database of more or less credible claims of effect modification and, at a later time, potentially assess the extent to which the claims were replicated. This will also allow better calibration of the 4 credibility areas of the overall credibility assessment and the corresponding ranges of percent credibility that we suggest. In addition, we will continue to evaluate ICEMAN’s performance in practice. We invite users to report difficulties or suggestions for improvement for consideration in future modifications of the instrument, by contacting the corresponding author.
Conclusion
In summary, ICEMAN provides a systematically developed and thoroughly user-tested instrument for judging the credibility of apparent effect modification. We expect that both investigators and readers of RCTs and meta-analyses, and other groups including journal editors, will find the structured assessment of credibility of proposed effect modification helpful.
Footnotes
Competing interests: None declared.
This article has been peer reviewed.
Contributors: Stefan Schandelmaier, Matthias Briel, Xin Sun, Michael Walsh, Lehana Thabane and Gordon Guyatt conceived the study. Stefan Schandelmaier and Gordon Guyatt drafted the manuscript. All of the authors revised the manuscript critically for important intellectual content, gave final approval of the version to be published and agreed to be accountable for all aspects of the work.
Funding: Stefan Schandelmaier was supported by grant P300PB_16475 from the Swiss National Science Foundation, the Gottfried and Julia Bangerter-Rhyner Foundation and the Freiwillige Akademische Gesellschaft Basel. Issa Dahabreh was supported by Methods Research Awards ME-1306-03758 and ME-1502-27794 from the Patient-Centered Outcomes Research Institute.
Disclaimer: The content of this manuscript does not represent the official views of the Patient-Centered Outcomes Research Institute, its Board of Governors or the Methodology Committee.
Data sharing: All relevant data except the names of the participants in the user-testing study and the transcripts of the interviews are provided in the publication, appendices and a related publication, and are available for use by other researchers. Additional requests regarding data sharing may be directed to the corresponding author.
References
Articles from CMAJ : Canadian Medical Association Journal are provided here courtesy of Canadian Medical Association