Definition of Meditation
The National Center for Complementary and Alternative Medicine defines meditation as a “mind-body” method. This category of complementary and alternative medicine includes interventions that employ a variety of techniques that facilitate the mind’s capacity to affect bodily function and symptoms. In meditation, a person learns to focus attention. Some forms of meditation instruct the student to become mindful of thoughts, feelings, and sensations, and to observe them in a nonjudgmental way. Many believe this practice evokes a state of greater calmness, physical relaxation, and psychological balance.1
Current Practice and Prevalence of Use
Many people use meditation to treat stress and stress-related conditions, as well as to promote general health.2,3 A national survey in 2008 found that the number of people meditating is increasing, with approximately 10 percent of the population having some experience with meditation.2 A number of hospitals and programs offer courses in meditation to patients seeking alternative or additional methods to relieve symptoms or to promote health.
Forms of Meditation
Meditation training programs vary in several ways, including the emphasis on religion or spirituality, the type of mental activity promoted, the nature and amount of training, the use of an instructor, and the qualifications of an instructor, which may all affect the level and nature of the meditative skills learned. Some meditative techniques are integrated into a broader alternative approach that includes dietary and/or movement therapies (e.g., ayurveda or yoga).
Researchers have categorized meditative techniques as emphasizing “mindfulness,” “concentration,” and “automatic self-transcendence.” Popular techniques such as transcendental meditation (TM) emphasize the use of a mantra in such a way that one “transcends” to an effortless state where there is no focused attention. Other popular techniques, such as mindfulness-based stress reduction (MBSR), are classified as “mindfulness” and emphasize training in present-focused awareness. Uncertainty remains about the extent to which these distinctions actually influence psychosocial stress outcomes.
Psychological Stress and Well-Being
Researchers have postulated that meditation programs may affect a range of outcomes related to psychological stress and well-being. The research ranges from the rare examination of positive outcomes, such as increased well-being, to the more common approach of examining reductions in negative outcomes, such as anxiety or sleep disturbance. Some studies address symptoms related to the primary condition (e.g., pain in patients with low back pain or anxiety in patients with social phobia), whereas others address similar emotional symptoms in clinical groups of people who may or may not have clinically significant symptoms (e.g., anxiety or depression in individuals with cancer).
Evidence to Date
Reviews to date have demonstrated that both “mindfulness” and “mantra” meditation techniques reduce emotional symptoms (e.g., anxiety, depression, and stress) and improve physical symptoms (e.g., pain) to a small to moderate degree.4-23 These reviews have largely included uncontrolled studies or studies that used control groups that did not receive additional treatment (i.e., usual care or wait list). In wait-list controlled studies, the control group receives usual care while “waiting” to receive the intervention at some time in the future, providing a usual-care control for the purposes of the study. Thus, it is unclear whether the apparently beneficial effects of meditation training are a result of the expectations for improvement that participants naturally form when obtaining this type of treatment. Additionally, many programs involve lengthy and sustained efforts on the part of participants and trainers, possibly yielding beneficial effects from the added attention, group participation, and support participants receive, as well as the suggestion that symptoms will likely improve with these increased efforts.24,25
The meditation literature has significant limitations related to inadequate control comparisons. An informative analogy is the use of placebos in pharmaceutical trials. The placebo is typically designed to match the “active intervention” in order to elicit the same expectations of benefit on the part of both provider and patient, but not contain the “active” ingredient. Additionally, placebo treatment includes all components of care received by the active group, including office visits and patient-provider interactions. These nonspecific factors are particularly important to control when the evaluation of outcome relies on patient reporting. In this situation, in which double-blinding has not been feasible, the challenge to execute studies that are not biased by these nonspecific factors is more pressing.25 Thus, there is a clear need to examine the specific effects of meditation in randomized controlled trials (RCTs) in which expectations for outcome and attentional support are controlled.
Clinical and Policy Relevance
There is much uncertainty regarding the differences and similarities between the effects of different types of meditation.26,27 Given the increasing use of meditation across a large number of conditions, it is important for patients, clinicians, and policymakers to understand the effects of meditation, types and duration of meditation, and settings and conditions for which meditation is efficacious. While some reviews have focused on RCTs, many, if not most, of the included studies involved wait-list or usual-care controls. Thus, there is a need to examine the specific effects of meditation interventions relative to conditions in which expectations for outcome and attentional support are controlled.
The objectives of this systematic review are to evaluate the effects of meditation programs on affect, attention, and health-related behaviors affected by stress, pain, and weight among people with a medical or psychiatric condition in RCTs with appropriate comparators.
Scope and Key Questions
This report reviews the efficacy of meditation programs on psychological stress and well-being among those with a clinical condition. “Affect” refers to emotion or mood. It can be positive, such as the feeling of well-being, or negative, such as anxiety, depression, or stress. Studies usually measure affect through self-reported questionnaires designed to gauge how much someone experiences a particular affect. “Attention” refers to the ability to maintain focus on particular stimuli; clinicians measure this directly. Studies measure substance use as the amount consumed or smoked over a period of time, and include alcohol consumption, cigarette smoking, and use of other drugs such as cocaine. They measure sleep as the amount of time spent asleep versus awake or as overall sleep quality. Studies measure sleep time through either polysomnography or actigraphy, and sleep quality through self-reported questionnaires. They measure eating using food diaries to calculate how much energy or fat a person has consumed over a particular period of time. They measure pain similarly to affect, by a self-reported questionnaire to assess how much pain an individual is experiencing. Studies measure pain severity on a numerical rating scale from 0 to 10 or by using other self-reported questionnaires. The studies measure weight in pounds or kilograms.
The Key Questions are as follows:
Key Question 1. What are the efficacy and harms of meditation programs on negative affect (e.g., anxiety, stress) and positive affect (e.g., well-being) among those with a clinical condition (medical or psychiatric)?
Key Question 2. What are the efficacy and harms of meditation programs on attention among those with a clinical condition (medical or psychiatric)?
Key Question 3. What are the efficacy and harms of meditation programs on health-related behaviors affected by stress, specifically substance use, sleep, and eating, among those with a clinical condition (medical or psychiatric)?
Key Question 4. What are the efficacy and harms of meditation programs on pain and weight among those with a clinical condition (medical or psychiatric)?
Figure A. Analytic framework for meditation programs conducted in clinical and psychiatric populations
Figure A illustrates our analytic framework for the systematic review. The figure indicates the populations of interest, the meditation programs, and the outcomes that we reviewed. This figure depicts the Key Questions (KQs) within the context of the population, intervention, comparator, outcomes, timing, and setting (PICOTS) framework described in Table A. Adverse events may occur at any point after the meditation program has begun.
KQ = Key Question
Literature Search Strategy
We searched the following databases for primary studies through November 2012: MEDLINE®, PsycINFO®, Embase®, PsycArticles, SCOPUS, CINAHL, AMED, and the Cochrane Library. We developed a search strategy for MEDLINE, accessed via PubMed®, based on medical subject headings (MeSH®) terms and text words of key articles that we identified a priori. We used a similar strategy in the other electronic sources. We reviewed the reference lists of included articles, relevant review articles, and related systematic reviews (n=20) to identify articles that the database searches might have missed. We did not impose any limits based on language or date of publication.
Two trained investigators independently screened articles at the title-and-abstract level and excluded them if both investigators agreed that the article met one or more of the exclusion criteria (Table A). We resolved differences between investigators regarding abstract eligibility through consensus.
Paired investigators conducted a second independent review of the full-text article for all citations that we promoted on the basis of title and abstract. We resolved differences regarding article inclusion through consensus.
Paired investigators conducted an additional independent review of full-text articles to determine if they adequately addressed the KQs and should be included in this review.
We included RCTs in which the control group was matched in time and attention to the intervention group for the purpose of matching expectations of benefit. The inclusion of such trials allowed us to evaluate the specific effects of meditation programs separately from the nonspecific effects of attention and expectation. Our team thought this was the most rigorous way to determine the efficacy of the interventions. We did not include observational studies because they are likely to have a high risk of bias due to problems such as self-selection of interventions (since people who believe in the benefits of meditation or who have prior experience with meditation are more likely to enroll in a meditation program) and use of outcome measures that can be easily biased by participants’ beliefs in the benefits of meditation.
For inclusion in this review, we required that studies report on participants with a clinical condition, whether medical or psychiatric. Although meditation programs may have an impact on healthy populations, we limited our evaluation of these meditation programs to clinical populations. Since trials study meditation programs in diverse populations, we have defined clinical conditions broadly to include mental health/psychiatric conditions (e.g., anxiety or stress) and physical conditions (e.g., low back pain, heart disease, or advanced age). Additionally, since stress was of particular interest in meditation studies, we also included trials that studied stressed populations even though they may not have a defined medical or psychiatric diagnosis. We excluded studies of otherwise healthy populations.
|PICOTS Element||Inclusion||Exclusion|
|Population and Condition of Interest||||
|Interventions||Structured meditation programs (any systematic or protocolized meditation programs that follow predetermined curricula) consisting of at least 4 hours of training with instructions to practice outside the training session||Meditation programs in which the meditation is not the foundation and majority of the intervention|
|Comparisons of Interest||Active control is defined as a program that is matched in time and attention to the intervention group for the purpose of matching expectations of benefit. Examples include “attention control,” “educational control,” or another therapy, such as progressive muscle relaxation, that the study compares with the intervention.||Studies that evaluate only a wait-list/usual-care control or do not include a comparison group|
|Outcomes||See Figure A||All other outcomes|
|Study Design||RCTs with an active control||Nonrandomized designs, such as observational studies|
|Timing and Setting||Longitudinal studies that occur in general and clinical settings||None|
ACT = acceptance and commitment therapy; DBT = dialectical behavioral therapy; MBCT = mindfulness-based cognitive therapy; MBSR = mindfulness-based stress reduction; PICOTS = population, intervention, comparison, outcome, timing, and setting; RCT = randomized controlled trial; TM = transcendental meditation
Note: We excluded articles with no original data (reviews, editorials, and comments), studies published in abstract form only, and dissertations.
Data Abstraction and Data Management
We used DistillerSR (Evidence Partners, 2010) to manage the screening process. DistillerSR is a Web-based database management program that manages all levels of the review process. We uploaded all the citations our search identified to this system.
We created standardized forms for data extraction and pilot tested them. Reviewers extracted information on general study characteristics, study participants, eligibility criteria, interventions, and outcomes. Two investigators reviewed each article for data abstraction. For study characteristics, participant characteristics, and intervention characteristics, the second reviewer confirmed the first reviewer’s data abstraction for completeness and accuracy. For outcome data and risk-of-bias scoring, we used dual and independent review. Reviewer pairs included personnel with both clinical and methodological expertise. We resolved differences between investigators regarding data through consensus.
For each meditation program, we extracted information on measures of intervention fidelity, including dose, training, and receipt of intervention. We measured duration and maximal hours of structured training in meditation, amount of home practice recommended, description of instructor qualifications, and description of participant adherence, if any.
For each KQ, we created a detailed set of evidence tables containing all information abstracted from eligible studies.
To display the outcome data, we calculated relative difference-in-change scores (i.e., the change from baseline in an outcome measure in the treatment group minus the change from baseline in the outcome measure in the control group, divided by the baseline score in the treatment group). However, many studies did not report enough information to calculate confidence intervals for the relative difference-in-change scores. When we evaluated point estimates and confidence intervals for just the postintervention or end-of-study differences between groups and compared these with the point estimates for the relative difference-in-change scores for those time points, some of the estimates that did not account for baseline differences appeared to favor a different group (e.g., treatment or control) when compared with the estimates that accounted for baseline differences. We therefore used the relative difference-in-change scores to estimate the direction and approximate magnitude of effect for all outcomes. For the purpose of generating an aggregate quantitative estimate of the effect of an intervention and the associated 95-percent confidence interval, we performed meta-analysis using standardized mean differences (effect sizes) calculated by Cohen’s method (Cohen’s d). We also used these to assess the precision of individual studies, which we factored into the overall strength of evidence (SOE). For each outcome, we displayed the resulting effect-size estimate according to the type of control group and duration of followup. Some studies did not report enough information to be included in meta-analysis. For that reason, we decided to display the relative difference-in-change scores along with the effect-size estimates from meta-analysis so that readers can see the full extent of the available data.
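The two quantities described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the review team's analysis code: the function names, the sign convention, and the example scores are all assumptions made here, and the pooled-standard-deviation form of Cohen's d is the common textbook version, which may differ in detail from the calculation used in the review.

```python
from math import sqrt


def relative_difference_in_change(treat_base, treat_end, ctrl_base, ctrl_end):
    """(Treatment change minus control change) divided by treatment baseline."""
    if treat_base == 0:
        raise ValueError("treatment-group baseline score must be nonzero")
    treat_change = treat_end - treat_base
    ctrl_change = ctrl_end - ctrl_base
    return (treat_change - ctrl_change) / treat_base


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference (a minus b) using the pooled SD."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / sqrt(pooled_var)


# Hypothetical symptom scores (lower is better); not data from any included trial.
rdc = relative_difference_in_change(
    treat_base=20.0, treat_end=15.0, ctrl_base=20.0, ctrl_end=18.0
)
# Assumed convention: control minus treatment, so a positive d favors meditation.
d = cohens_d(mean_a=18.0, sd_a=6.0, n_a=30, mean_b=15.0, sd_b=6.0, n_b=30)

print(f"relative difference-in-change: {rdc:+.1%}")
print(f"Cohen's d: {d:.2f}")
```

In this made-up example the treatment group improves by 5 points and the control group by 2, so the relative difference-in-change is -3/20, a 15-percent greater reduction relative to the treatment group's baseline. Because the score uses baseline values, it reflects baseline imbalance between groups, which an end-of-study comparison alone does not.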
We considered a 5-percent relative difference-in-change score to be potentially clinically significant, since these studies were looking at short interventions and relatively low doses of meditation. In synthesizing the results of these trials, we considered both statistical and clinical significance. Statistical significance is determined according to study-specific criteria; we reported p-values and confidence intervals for these where present.
Trials used either nonspecific active controls or specific active controls (Table A, Figure A). Nonspecific active controls (e.g., education control or attention control) are used to control for the nonspecific effects of time, attention, and expectation. Comparisons against these controls allow for assessments of the specific effectiveness of the meditation program above and beyond the nonspecific effects of time, attention, and expectation. Such a comparison is similar to a comparison against a placebo pill in a drug trial, where one is concerned with the nonspecific effects of interacting with a provider, taking a pill, and expecting the pill to work. Specific active controls are therapies (e.g., exercise or progressive muscle relaxation) known or expected to change clinical outcomes. Comparisons against these controls allow for assessments of comparative effectiveness and are similar to comparing one drug against another known drug in a drug trial. Since these study designs using different types of controls are expected to yield quite different conclusions (effectiveness vs. comparative effectiveness), we separated them in our analyses.
Assessment of Methodological Quality of Individual Trials
We assessed the risk of bias in studies independently and in duplicate based on the recommendations in the Evidence-based Practice Center “Methods Guide for Effectiveness and Comparative Effectiveness Reviews” (Methods Guide).28 We supplemented these tools with additional assessment questions based on the Cochrane Collaboration’s risk-of-bias tool.29,30 While many of the tools to evaluate risk of bias are common to behavioral as well as pharmacologic interventions, some items are more specific to behavioral interventions. After discussion with experts in meditation programs and clinical trials, we emphasized four major and four minor criteria. We assigned 2 points each to the major criteria, weighting them more than the minor criteria in assessing risk of bias. We assigned 1 point each to the minor criteria. Studies could therefore receive a maximum of 12 points. If studies met a minimum of three major criteria and three minor criteria (9–12 points), we classified them as having “low risk of bias.” We classified studies receiving 6–8 points as having “medium risk of bias,” and studies receiving 5 or fewer points as having “high risk of bias.”
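The point scheme described above can be expressed as a short scoring function. The sketch below is illustrative and rests on one assumption that the text does not settle: it classifies studies by the point bands alone (9–12, 6–8, 5 or fewer), so a study meeting four major criteria but only one minor criterion (9 points) is labeled low risk even though it does not meet three minor criteria. The function and constant names are hypothetical.

```python
MAJOR_POINTS = 2  # points per major criterion met
MINOR_POINTS = 1  # points per minor criterion met
CRITERIA_PER_TIER = 4  # four major and four minor criteria


def risk_of_bias(major_met, minor_met):
    """Return (score, label) from counts of major and minor criteria met.

    Assumption: the label depends only on the total score band.
    """
    for count in (major_met, minor_met):
        if not 0 <= count <= CRITERIA_PER_TIER:
            raise ValueError("criteria counts must be between 0 and 4")
    score = MAJOR_POINTS * major_met + MINOR_POINTS * minor_met
    if score >= 9:
        label = "low"
    elif score >= 6:
        label = "medium"
    else:
        label = "high"
    return score, label


for major, minor in [(4, 4), (3, 3), (2, 2), (1, 3)]:
    score, label = risk_of_bias(major, minor)
    print(f"{major} major, {minor} minor -> {score} points, {label} risk of bias")
```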
Assessment of Potential Publication Bias
We planned to use funnel plots to assess potential publication bias if numerous studies reported on an outcome of interest. We also searched for any trials on clinicaltrials.gov that completed recruitment 3 or more years ago and did not publish results, or listed outcomes for which they did not report results.
Strength of the Body of Evidence
Two reviewers graded the strength of evidence for each outcome for each of the KQs using the grading scheme recommended by the Methods Guide. In assigning evidence grades, we considered four domains: risk of bias, directness, consistency, and precision. We classified evidence into four basic categories: (1) “high” grade, indicating high confidence that the evidence reflects the true effect, and further research is very unlikely to change our confidence in the estimate of the effect; (2) “moderate” grade, indicating moderate confidence that the evidence reflects the true effect, and further research may change our confidence in the estimate of the effect and may change the estimate; (3) “low” grade, indicating low confidence that the evidence reflects the true effect, and further research is likely to change our confidence in the estimate of the effect and is likely to change the estimate; and (4) “insufficient” grade, indicating that evidence is unavailable or inadequate to draw a conclusion.
List of major and minor criteria in assessing risk of bias
- Was the control matched for time and attention by the instructors?
- Was there a description of withdrawals and dropouts?
- Was attrition <20% at the end of treatment? As several studies did not calculate attrition starting from the original number randomized, we recalculated the attrition from the original number randomized.
- Were those who collected data on the participants blind to the allocation?
- Was the method of randomization described in the article? To answer yes for this question, the trials had to give some description of the randomization procedure.
- Was allocation concealed?
- Was intent-to-treat analysis used? To answer yes for this question, the trial must impute noncompleter or other missing data, and it must do this from the original number randomized.
- Did the trial evaluate the credibility, and if so, was it comparable? If the trial did not evaluate credibility, or if it evaluated credibility but did not find it comparable, then we did not give the trial a point.
Note: We assigned 2 points each to the major criteria in assessing risk of bias, and 1 point each to the minor criteria.
We assessed applicability separately for the different outcomes of benefit and harm for the entire body of evidence guided by the PICOTS framework, as recommended in the Methods Guide.28 We assessed whether findings were applicable to various ethnic groups, and whether race, ethnicity, or education limited the applicability of the evidence.
Literature Search Results
The literature search identified 17,801 unique citations. During the title-and-abstract screening, we excluded 16,177 citations. During the article screening, we excluded 1,447 citations. During KQ applicability screening, we excluded an additional 136 articles that did not meet one or more of the inclusion criteria. We included 41 articles in the review.31-71
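The screening counts above can be checked with simple arithmetic. The sketch below uses only the numbers reported in this paragraph; the function name and stage labels are illustrative choices, not part of the review's tooling.

```python
def screening_flow(identified, exclusions):
    """Return (stage, remaining) pairs after each exclusion step."""
    remaining = identified
    flow = []
    for stage, excluded in exclusions:
        remaining -= excluded
        if remaining < 0:
            raise ValueError(f"more citations excluded than remain at: {stage}")
        flow.append((stage, remaining))
    return flow


# Counts as reported in the literature search results.
flow = screening_flow(
    17_801,
    [
        ("title-and-abstract screening", 16_177),
        ("article screening", 1_447),
        ("KQ applicability screening", 136),
    ],
)

for stage, remaining in flow:
    print(f"after {stage}: {remaining:,} remaining")
```

The final figure matches the 41 articles included in the review.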
Most trials were short term, but they ranged from 4 weeks to 9 years in duration. Since the amount of training and practice in any meditation program may affect its results, we collected this information and found a fair range in the quality of information. Not all trials reported on amount of training and home practice recommended. MBSR programs typically provided 20–27.5 hours of training over 8 weeks. The mindfulness meditation trials typically provided about half this amount. TM trials provided 16–39 hours over 3–12 months, while other mantra meditation programs provided about half this amount. Only five of the trials reported the trainers’ actual meditation experience (ranging from 4 months to 25 years), and six reported the trainers’ actual teaching experience (ranging from 0 to 15.7 years).
Of the 41 trials we reviewed, 15 studied psychiatric populations, including those with anxiety, depression, stress, chronic worry, and insomnia. Five trials studied substance-abusing populations such as smokers and alcoholics, 5 studied chronic pain populations, and 16 studied diverse medical populations, including those with heart disease, lung disease, breast cancer, diabetes, hypertension, and HIV.
The strength of evidence on the outcomes of our review is shown in Tables B and C. Since there were numerous scales for the different measures of affect, we organized the scales to best represent the clinically relevant aspects of each affect. For this review, the comparisons with nonspecific active controls provided efficacy data, whereas comparisons with specific active controls provided comparative effectiveness data. We found it difficult to draw comparative effectiveness conclusions from comparisons with specific active controls due to the large heterogeneity of type and strength of control groups. Therefore, we presented our results first for all the comparisons with nonspecific active controls in Table B (efficacy), and then for the specific active controls in Table C (comparative effectiveness).
The direction and magnitude of effect are derived from the relative difference between groups in the change score. In our efficacy analysis (Table B) we found low SOE of no effect or insufficient evidence that mantra meditation programs had an effect on any of the psychological stress and well-being outcomes we examined in these diverse adult clinical conditions.
Mindfulness meditation programs had moderate SOE for improvement in anxiety (effect size [ES], 0.40; confidence interval [CI], 0.08 to 0.71 at 8 weeks; ES, 0.22; CI, 0.02 to 0.43 at 3–6 months); depression (ES, 0.32; CI, −0.01 to 0.66 at 8 weeks; ES, 0.23; CI, 0.05 to 0.42 at 3–6 months); and pain (ES, 0.33; CI, 0.03 to 0.62); and they had low SOE for improvement in stress/distress and mental health–related quality of life. We found either low SOE of no effect or insufficient SOE of an effect of meditation programs on positive mood, attention, and weight. We also found insufficient evidence that meditation programs had an effect on health-related behaviors affected by stress, including substance use and sleep.
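One way to read the intervals reported above is to check whether each one excludes zero. The sketch below does this for the reported estimates. It assumes the intervals are 95-percent confidence intervals (as stated in the methods) and, for the rough standard-error calculation, that they are symmetric normal-approximation intervals; that second assumption and the function names are introduced here for illustration.

```python
# (outcome, effect size, CI lower bound, CI upper bound) as reported in the text.
ESTIMATES = [
    ("anxiety, 8 weeks", 0.40, 0.08, 0.71),
    ("anxiety, 3-6 months", 0.22, 0.02, 0.43),
    ("depression, 8 weeks", 0.32, -0.01, 0.66),
    ("depression, 3-6 months", 0.23, 0.05, 0.42),
    ("pain", 0.33, 0.03, 0.62),
]


def excludes_zero(lower, upper):
    """True when the whole interval lies on one side of zero."""
    return lower > 0 or upper < 0


def approx_standard_error(lower, upper, z=1.96):
    """Rough SE, assuming a symmetric 95% normal-approximation interval."""
    return (upper - lower) / (2 * z)


for outcome, es, lower, upper in ESTIMATES:
    verdict = "excludes zero" if excludes_zero(lower, upper) else "includes zero"
    se = approx_standard_error(lower, upper)
    print(f"{outcome}: ES {es:.2f}, CI {lower:.2f} to {upper:.2f}, "
          f"SE ~{se:.2f}, {verdict}")
```

Only the 8-week depression interval includes zero, which is consistent with the description of that effect elsewhere in this summary as marginally significant.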
In our comparative effectiveness analyses (Table C), we found low SOE of no effect or insufficient SOE that meditation programs were more effective than exercise, progressive muscle relaxation, cognitive-behavioral group therapy, or other specific comparators in changing any outcomes of interest.
Harm Outcomes for All Key Questions
Few trials reported on potential harms of meditation programs. Of the nine trials that reported on harms, none reported any harms of the intervention. One trial specified that the researchers looked for toxicities of meditation to hematologic, renal, and liver markers and found none. The remaining eight trials did not specify the type of adverse event they were looking for. Seven reported that they found no significant adverse events, while one did not comment on adverse events. The remaining 32 trials did not report whether they monitored for adverse events.
Assessment of Potential Publication Bias
We could not conduct any reliable quantitative tests for publication bias since few studies were available for most outcomes, and we were unable to include all eligible studies in the meta-analysis due to missing data. Consequently, funnel plots were unlikely to provide much useful information regarding the possibility of publication bias. We reviewed the clinicaltrials.gov registration database to assess the number of trials that had been completed 3 or more years ago and that prespecified our outcomes but did not publish at all, or published but did not publish all outcomes that were prespecified. We found five trials on clinicaltrials.gov that appeared to have been completed before January 1, 2010, and were published but did not publish the results of all outcomes they had prespecified on the registration Web site. We also found nine trials that appeared to have been completed before January 1, 2010, and had prespecified at least one of our outcomes but for which we could not find any publication. Ten registered trials had prespecified one or more KQ1 outcomes but did not publish them, two registered trials had prespecified attention as an outcome but did not publish, five registered trials prespecified one or more KQ3 outcomes but did not publish, and five registered trials prespecified one or more KQ4 outcomes but did not publish. It was not possible to determine whether eight of the nine registered trials for which we could not find a publication had actually been conducted or completed. Among 109 outcomes in 41 trials, trials did not give enough information to calculate a relative difference-in-change score (our primary analysis) for 6 outcomes due to statistically nonsignificant findings. Trials did not give enough information to conduct a meta-analysis on 16 outcomes. Our findings from the primary analysis are therefore less likely to be affected by publication bias than those from the meta-analysis.
Forty-one RCTs included in this review tested the effects of meditation programs in clinical conditions relative to active controls. Ten programs tested mantra meditation, and 31 programs tested mindfulness meditation. Active control groups included nonspecific controls, as well as specific controls that offer an opportunity to examine the comparative effectiveness of meditation programs.
Our review finds that the mantra meditation programs do not appear to improve any of the outcomes we examined, but the strength of this evidence varies from low to insufficient. We find that, compared with nonspecific active controls, the mindfulness meditation programs show small improvements in anxiety, depression, and pain with moderate SOE, and small improvements in stress/distress, negative affect, and the mental health component of health-related quality of life with low SOE. The remaining outcomes had insufficient SOE to draw any level of conclusion for mindfulness meditation programs. We were unable to draw a high-grade SOE for either type of meditation program for any of the psychological stress and well-being outcomes. We also found no evidence for any harms, although few trials reported on this.
We found 32 trials for KQ1: 4 evaluating TM, 2 evaluating other mantra meditation, and 26 evaluating mindfulness meditation. In general, we found no evidence that mantra meditation programs improve psychological stress and well-being. Compared with a nonspecific active control, mindfulness meditation programs improve multiple dimensions of negative affect, including anxiety, depression, and perceived stress/general distress, and the mental health component of quality of life, with a low to moderate SOE. Well-being and positive mood are positive dimensions of mental health. While meditation programs generally seek to improve the positive dimensions of health, the available evidence from a very small number of studies did not show any effects on positive affect or well-being. Both analytic methods—the difference-in-change estimates (which accounted for baseline differences between groups) and the meta-analyses (which compared only end-line differences)—generally showed consistent but small effects for anxiety, depression, and stress/distress. However, there are a number of observations that help in interpreting and giving context to our conclusions.
First, very few mantra meditation programs were included in our review, significantly limiting our ability to draw inferences about the effects of mantra meditation programs on psychological stress-related outcomes. These conclusions did not change when we evaluated TM separately from other mantra meditation programs. Apart from the paucity of trials, another reason for seeing null results may be the type of populations studied; for example, three TM trials enrolled cardiac patients, while only one enrolled anxiety patients. In addition, it is not known whether these study participants had high levels of a particular negative affect to begin with.
Second, among mindfulness trials, the effects were significant for anxiety and marginally significant for depression at the end of treatment, and these effects continued to be significant at 3–6 months for both anxiety and depression.
Third, when we combine each outcome that is a subdomain of negative affect (anxiety, depression, and stress/distress), we see a small and consistent signal that any domain of negative affect is improved in mindfulness programs when compared with a nonspecific active control.
Fourth, the effect sizes are small. Over the course of 2–6 months, mindfulness meditation program effect-size estimates ranged from 0.22 to 0.40 for anxiety symptoms and 0.23 to 0.32 for depressive symptoms, and were statistically significant.
Fifth, there may be differences between trials for which these outcomes are a primary versus secondary focus, although we did not find any evidence for this. Some trials that had an outcome as a primary focus did not recruit based on high symptom levels of that outcome. Thus, the samples included in these trials more closely resemble a general primary care population, and there may not be room to measure an effect if symptom levels were low to start with (i.e., a “floor” effect).
Sixth, studies found an improvement in outcomes among the mindfulness groups (compared with control) only when they made comparisons against a nonspecific active control. In comparisons against a known treatment or therapy, mindfulness generally did not outperform the control for any outcome, and the same pattern held for every form of meditation and every KQ. Out of 53 comparisons with a specific active control, we found only 2 that showed a statistically significant improvement: mindfulness-based cognitive therapy improved quality of life in comparison with use of antidepressant drugs among depressed patients, and mindfulness therapy reduced cigarette consumption in comparison with the Freedom from Smoking program. However, we also found five comparisons for which the specific active control performed statistically significantly better than the meditation programs. The comparisons with specific therapies led to highly inconsistent results for most outcomes (Figure B2) and indicated that meditative therapies were no better than the specific therapies they were being compared with. These specific therapies include exercise, yoga, progressive muscle relaxation, cognitive behavioral therapy, and medications.
One RCT compared a meditation program with a nonspecific active control on the outcome of attention. Trends on the Attentional Network Test favored the meditation program, but the difference between groups did not reach statistical significance. This finding indicates the need for more comprehensive trials with a variety of clinical populations (e.g., people with disorders in which attention may be compromised) to provide a clearer understanding of the impact of meditation programs on attention.
Among the 13 trials evaluating the effects of meditation programs on health-related behaviors affected by stress, 4 evaluated the effect of meditation on substance use,33,34,54,67 2 evaluated eating,43,50 and 7 evaluated sleep.31,41,42,49,55,61,70 Overall, there is insufficient evidence to indicate that meditation programs alter health-related behaviors affected by stress. Our findings are consistent with those of previous reviews in this area, in which uncontrolled studies have usually found a benefit for the effects of meditation programs on health-related behaviors affected by stress, while very few controlled studies have found a similar benefit.14-16
Among the 14 RCTs evaluating the effect on pain and weight, we found moderate SOE that MBSR reduces pain severity to a small degree when compared with a nonspecific active control. This finding is based on four trials, of which two were conducted in musculoskeletal pain patients, one in patients with irritable bowel syndrome, and one in a nonpain population. Visceral pain had a large and statistically significant relative 30-percent improvement in pain severity, while musculoskeletal pain showed 5- to 8-percent improvements that were considered nonsignificant. We also found low SOE that MBSR was not superior in reducing pain severity when compared with various specific active controls (including massage). Two mindfulness trials evaluated weight as an outcome, and it was a primary outcome for both. Three TM trials evaluated weight as a secondary outcome. Due to consistently null results, there was low SOE to suggest that TM and MBSR do not have an effect on weight.
The comparative effectiveness of an intervention obviously depends heavily on what is done for the comparison group. A strength of our review is our focus on RCTs with nonspecific active controls, which should give us greater confidence that the reported benefits are not due to having a flawed comparison group that does not control for nonspecific effects, as seen in trials using a wait-list or usual-care control.
Limitations of the Primary Studies
Although we collected information on amount of training provided, the trials did not provide enough information to make use of the data. We could not draw definitive conclusions about effect modifiers, such as dose and duration, because of the limited amount of data.
Specific outcome measurement scales may be more relevant for some forms of meditation than for others. Many studies assessed only certain measures, and the scales may have been limited in their ability to detect an effect.
We intended to evaluate the effects of meditation programs on a broad range of medical and psychiatric conditions, since psychological stress outcomes are not limited to any particular medical or psychiatric condition. Despite our focus on RCTs with active controls, we were unable to detect a specific effect of meditation on most outcomes, with the majority of our evidence grades being insufficient or low. This was mostly driven by two important evaluation criteria: risk of bias and inconsistencies in the body of evidence. The reasons for such inconsistencies may include differences in the particular clinical conditions, as well as the type of control groups that studies used. We could not easily compare studies in which a meditation program was compared with a specific active control versus trials that used a nonspecific active control. We therefore separated these comparisons in order to be able to evaluate the effects against a relatively homogeneous nonspecific active control group. In general, comparing trials that used one specific active control with trials that used another specific active control led to large inconsistencies that could be explained by differences in the control groups.
Another possibility is that programs had no real effect on many of the outcomes that had inconsistent findings. While some of the outcomes were primary outcomes, many were secondary outcomes, and the studies may not have been appropriately powered to detect changes in secondary outcomes.
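To give a sense of what "appropriately powered" means for effects of the size reported in this review, the following is a minimal sketch of a standard normal-approximation sample-size calculation for a two-arm comparison of means. The function name `n_per_arm`, the two-sided alpha of 0.05, and the 80-percent power target are illustrative assumptions, not parameters taken from the included trials; the effect sizes 0.22 and 0.40 are the ends of the anxiety range quoted earlier.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to detect a standardized
    mean difference with a two-sided test (normal approximation).

    alpha and power defaults are conventional choices, assumed here.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Effect sizes at the ends of the anxiety range reported in this review.
for d in (0.22, 0.40):
    print(f"effect size {d:.2f}: about {n_per_arm(d)} participants per arm")
```

Under these assumptions, a trial would need on the order of 100 to 325 participants per arm to detect such effects on a primary outcome, which illustrates why smaller trials could easily miss changes in secondary outcomes.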
Limitations of the Review
Our assessment of a 5-percent relative difference between groups in change scores as being potentially clinically significant needs to be interpreted in the context of heterogeneous scales reporting on various measures. The literature does not clearly define the appropriate threshold for what is clinically significant on many of these scales. Some may consider a higher threshold as being clinically relevant.
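As an illustration of the quantity discussed above, the sketch below computes a relative difference between groups in change scores and checks it against the 5-percent threshold. The choice to express the difference as a fraction of the meditation group's baseline mean is an assumption of this sketch, as are the function names and the example scores, which are hypothetical and not drawn from any included trial.

```python
def relative_difference_in_change(med_baseline, med_followup,
                                  ctl_baseline, ctl_followup):
    """Between-group difference in change scores, as a fraction of the
    meditation group's baseline mean.

    The denominator is an assumption of this sketch.
    """
    med_change = med_followup - med_baseline
    ctl_change = ctl_followup - ctl_baseline
    return (med_change - ctl_change) / med_baseline


def exceeds_threshold(relative_difference, threshold=0.05):
    """True when the magnitude reaches the 5-percent threshold named in the text."""
    return abs(relative_difference) >= threshold


# Hypothetical mean symptom scores (lower is better); not taken from any trial.
rel = relative_difference_in_change(20.0, 16.0, 20.0, 18.0)
print(f"relative difference in change: {rel:+.0%}")
print("meets 5-percent threshold:", exceeds_threshold(rel))
```

In this hypothetical case the meditation group improves by 4 points and the control group by 2, so the relative difference is a 10-percent greater reduction. Because scales differ in range and sensitivity, the same percentage can correspond to quite different amounts of clinical change, which is the caveat raised in the paragraph above.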
While this review sought to assess the effectiveness of meditation programs above and beyond the nonspecific effects of expectation and attention, it did not assess the preferences of patients. Even though one therapy may not be better than another, many patients may still prefer it for personal or philosophical reasons.
We were limited in our ability to determine the overall applicability of the body of evidence to the broad population of patients who could benefit from mindfulness meditation because the studies varied so much in many ways other than just the specific targeted population; that is, they also varied in characteristics of the intervention, comparator, outcomes, timing, and setting. Also, the studies generally did not provide enough information to be able to determine whether the effectiveness of mindfulness meditation varied by race, ethnicity, or education.
Further research in meditation would benefit by addressing several remaining methodological and conceptual issues. First, all forms of meditation, including both mindfulness and mantra, imply that more time spent meditating will yield larger effects. Most forms, but not all, also present meditation as a skill that requires expert instruction and time dedicated to practice. Thus, more training with an expert and practice in daily life should lead to greater competency in the skill or practice, and greater competency or practice would presumably lead to better outcomes. When compared with other skills that require training, the amount of training afforded in the trials included in our review was quite small, and generally the training was offered over a fairly short period of time. Researchers should account for or consider the level of skill in meditation and how variation in skill may affect the effectiveness of meditation when designing studies, collecting data, and interpreting data. To facilitate this, better measurement tools are needed. Research has not adequately validated currently available mindfulness scales, and the scales do not appear to distinguish between different forms of meditation.26 Thus, we need further work on the operationalization and measurement of the particular meditative skill. For meditation programs that do not consider themselves to be training students in a skill, such as TM and certain mindfulness programs, there is still a need to transparently assess whether a student has attained a certain mental state or is correctly executing the recommended mental activities (or absence of activities).
Second, trials need to document the amount of training instructors provide and patients receive, along with the amount of home practice patients complete. This information gives an indication of how effective the program is at delivering training and how adherent participants were. This will allow us to address questions around “dosing.”
Third, studies should report on teacher qualifications in detail. The range of experience in meditation and competence as a teacher of the skill or practice likely plays a role in outcomes.
Fourth, when using a specific active control, if one finds no statistically significant superiority over the control, one is left with the issue of whether the meditation is equivalent to or not inferior to the control, or whether the trial was just underpowered to detect any difference. Conducting comparative effectiveness trials requires prior specification of the hypothesis (superiority, equivalence, noninferiority) and appropriate determination of the margins of clinical significance and minimum important difference.72 In the case of equivalence and noninferiority, trials also need to have appropriate assay sensitivity. Most of the trials did not show statistically significant effects against a specific active control, nor did they appear adequately powered to assess noninferiority or equivalence. These issues leave considerable uncertainty in such trial designs.
Fifth, positive outcomes are a key focus of meditative practices. However, most trials did not include positive outcomes as primary or even secondary outcomes. Future studies should expand on these domains.
Sixth, we were unable to review biological markers of stress for meditation programs. A comprehensive review would benefit meditation research and also allow for a cross-validation of psychological and biological outcomes.
Future trials should appropriately report key design characteristics so we can accurately assess risk of bias. Investigators should register each trial in a national registry, standardize training using trainers who meet specified criteria, specify primary and secondary outcomes a priori, power the trial based on the primary outcomes, use CONSORT (CONsolidated Standards of Reporting Trials) recommendations for reporting results, and operationalize and measure the practice of meditation by study participants.
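The fourth point above distinguishes between failing to show superiority and demonstrating noninferiority. A minimal sketch of that distinction follows, using a confidence interval for the between-group difference and a prespecified margin. The function name `classify`, the sign convention (meditation minus control on a symptom scale where lower is better), the normal-approximation interval, and all numbers are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def classify(diff, se, margin, alpha=0.05):
    """Classify a comparison against a specific active control.

    diff   : mean symptom score, meditation minus control (negative favors meditation)
    se     : standard error of diff
    margin : largest disadvantage still considered clinically acceptable (> 0),
             which must be specified before the trial
    """
    if se <= 0 or margin <= 0:
        raise ValueError("se and margin must be positive")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    upper = diff + z * se  # upper confidence bound: worst plausible disadvantage
    if upper < 0:
        return "superior"
    if upper < margin:
        return "noninferior"
    return "inconclusive"


# Hypothetical trials with the same margin of 2 scale points.
print(classify(-3.0, 1.0, 2.0))  # precise estimate favoring meditation
print(classify(0.5, 0.5, 2.0))   # precise estimate near zero
print(classify(0.5, 2.0, 2.0))   # same estimate, but a wide interval
```

The second and third cases have the same point estimate and neither shows superiority, yet only the more precise trial supports a noninferiority claim. This simplified sketch does not separately flag cases in which the control is statistically better.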
Our review found moderate SOE that mindfulness meditation programs are beneficial for reducing anxiety, depression, and pain severity, and low SOE that they may lead to improvement in any dimension of negative affect when compared with nonspecific active controls. There was no advantage of meditation programs over specific therapies they were compared with. Otherwise, much of the evidence was insufficient to address the comparisons for most of the questions.
There are several reasons why a large number of outcomes lacked sufficient evidence. While we sought to review behavioral RCTs of the highest standard that controlled for nonspecific factors, there was wide variation in risk of bias among these trials. Another reason for a lack of sufficient evidence is that we found a limited number of trials for most outcomes, resulting in limited data available for meta-analysis or descriptive synthesis. For example, there were so few trials of TM that we could not draw meaningful conclusions from them. In addition, the reasons for a lack of significant reduction of stress-related outcomes may be related to the way the research community conceptualizes meditation programs, the difficulties of acquiring meditation skills or meditative states, and the limited duration of RCTs. Historically, the general public has not conceptualized meditation as a quick fix toward anything. It is a skill or state one learns and practices over time to increase one’s awareness, and through this awareness gain insight and understanding into the various subtleties of one’s existence. Training the mind in awareness, nonjudgmentalness, and the ability to become completely free of thoughts or other activity are daunting accomplishments. While some meditators may feel these tasks are easy, they likely overestimate their own skills due to a lack of awareness of the different degrees to which these tasks can be done or an inability to objectively measure their own progress. Since becoming an expert at simple skills such as swimming, reading, or writing (which can be objectively measured by others) takes a considerable amount of time, it follows that meditation would also take a long period of time to master. However, many of the studies included in this review were short term (e.g., 2.5 hours a week for 8 weeks), and the participants likely did not achieve a level of expertise needed to improve outcomes that depend on a mastery of mental and emotional processes.
The short-term nature of the studies, combined with the lack of an adequate way to measure meditation competency, could have contributed substantially to these results.