Supplementary Appendix
This appendix has been provided by the authors to give readers additional information about their work.
Supplement to: Turner EH, Matthews AM, Linardatos E, et al. Selective publication of antidepressant trials and
its influence on apparent efficacy. N Engl J Med 2008;358:252-60.
SELECTIVE PUBLICATION OF ANTIDEPRESSANT TRIALS AND ITS INFLUENCE ON APPARENT EFFICACY
SUPPLEMENTARY APPENDIX
This document provides details that supplement the material in the body of the article but could not be included due to space constraints. The headings used below mostly follow
those in the main article.
• In order to approve a drug, the FDA requires "substantial evidence" of effectiveness
from two positive "adequate and well controlled studies".1 The language that the FDA uses in labeling reflects this focus on positive trials. For example, the labeling for
sertraline states, "The efficacy of Zoloft as a treatment for major depressive disorder was
established in two placebo-controlled studies in adult outpatients meeting DSM-III
criteria for major depressive disorder."
DATA FROM FDA REVIEWS
Data procurement – FDA reviews
• Review documents for many approved drug-indication combinations have been
electronically available in the public domain since enactment of the Electronic Freedom of Information Act Amendments of 1996 (eFOIA).2 Reviews for drugs approved before
then are not available electronically. For reasons that are unclear, the FDA has not, as
of this writing, posted the reviews of three antidepressants approved after eFOIA was
enacted: bupropion extended-release (Wellbutrin XL®), mirtazapine orally disintegrating (Remeron Soltabs®), and transdermal selegiline (EMSAM®).
• Fluoxetine studies 62-a and 62-b are referred to in the FDA review with a single study
number (62). However, the review indicates that these were identically designed but that
they were separate and nonoverlapping studies. The first study involved patients with mild depression (but who still met DSM criteria for major depression), while the other
involved patients with moderate depression.
Data extraction – FDA reviews
• The FDA uses statistical superiority (P<.05) to a comparator, usually placebo, to
determine whether the study is positive3 (aka "a win"). Studies are pooled or meta-
analyzed only if that is specified in the original protocol and agreed to in advance by the FDA.
• Failed versus negative studies: Active comparator treatment arms are sometimes
included in study designs along with study drug and placebo. Because active comparators
are approved antidepressants, they are expected to beat (demonstrate statistical
superiority to) placebo. When that does not happen and the study drug also does not
beat placebo, the FDA "excuses" the study drug for not beating placebo and deems the study inconclusive or "failed".4 On the other hand, when an active comparator does beat
placebo (as expected) but the study drug does not, the study is judged negative. When
the sponsor elects to omit an active comparator from the design and the study drug does not beat placebo, the study is also deemed negative.1 However, the validity of
distinguishing between negative and failed studies has been questioned.5
• If the reviewer's overall judgment regarding a study's outcome was not clearly stated, we
used that of the team leader or division director.
• Double data extraction and entry – FDA reviews: This was performed first by ET, AM,
and EL. A second extraction and entry of the FDA data was performed by RT and SR,
who were blind to the results of the first extraction/entry process. The values obtained
in the second process were compared to those obtained in the first. Any discrepancies
were resolved by consensus.
DATA FROM JOURNAL ARTICLES
Data procurement – journal articles
• Literature search: Literature searches were originally conducted by ET, AM, and EL.
Subsequently, an academic reference librarian (AH), who was blind to the results of the original searches, conducted independent searches of Ovid Medline and the Cochrane
Central Register of Controlled Trials, which identified no new journal articles. A repeat
search of reference lists revealed one additional article not indexed in the searched
databases.
• We did not consider forms of data disclosure such as conference proceedings (including
published conference abstracts), clinical trial registries, book chapters, or newspaper articles.
• We allowed one exception to our rule of including only stand-alone publications of
single studies, a paper pooling the results from two identically designed studies of
paroxetine controlled-release (Appendix Table A).
• Matching of studies in FDA reviews to journal articles was completed initially by ET,
AM, and EL. It was later repeated by RT, who was blind to the results of the original
matching process. Any discrepancies between the first and second matching procedures were resolved by consensus.
Data extraction – journal articles
• Double data extraction and entry: Journal data extraction and entry was initially
performed by ET. Subsequently, we provided the matched journal articles (clean
1 Unwritten policy learned during the first author's tenure as a reviewer in the division of the FDA handling
approval of psychotropic drugs, still applicable according to communications with current employees and apparent from review documents. This may not apply to other review divisions within the FDA.
unmarked copies) to TB and NM, who were blind to the results of the original
extraction/entry process. They extracted the results on the apparent primary endpoints, which were entered by NM and RT. The results of this second extraction/entry process
were compared with those from the first process. Discrepancies were resolved by consensus.
STATISTICAL ANALYSIS
Continuous data (effect size)
• Using effect size permitted us to combine data from trials using different primary rating
scales in a standardized way.
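As a concrete illustration, Hedges' g for a single trial can be computed from each arm's summary statistics roughly as follows. This is a minimal sketch with illustrative names, not the authors' actual code (the paper derives g from test statistics where raw summary data were unavailable):

```python
import math

def hedges_g(m_drug, m_placebo, sd_drug, sd_placebo, n_drug, n_placebo):
    """Standardized mean difference with the small-sample (Hedges) correction."""
    df = n_drug + n_placebo - 2
    # Pooled standard deviation across the two treatment arms
    sd_pooled = math.sqrt(((n_drug - 1) * sd_drug**2 +
                           (n_placebo - 1) * sd_placebo**2) / df)
    d = (m_drug - m_placebo) / sd_pooled   # Cohen's d
    j = 1 - 3 / (4 * df - 1)               # small-sample bias-correction factor
    return j * d
```

Because g is unitless, trials scored on different instruments (e.g., HAMD-17 vs. MADRS change scores) can be placed on a single scale.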
• The meta-analyses described were performed twice, once based on the first data
extraction/entry process and again based on the second such process (see above). The
overall mean weighted ES values (Figure 3) resulting from the two sets of meta-analyses
were within ±.01 of one another. The percentage difference, between the overall
weighted mean ES value obtained from the FDA data and the value obtained from the journal data, was unchanged.
• For purposes of the paired (signed-rank) analyses of the g values, we conducted a
preliminary "mini-meta-analysis" of the FDA data for paroxetine studies 448 and 449 (Appendix Table A). This provided a single FDA result, which we compared with the
single pooled result reported in the journal article.
• As stated in the Methods section of this article, within the FDA dataset, we calculated a
mean g value for each drug's published studies and a mean g value for each drug's unpublished studies. However, 2 of the 12 drugs had no unpublished studies; thus a
value for g_unpublished could not be calculated for these two drugs. We first analyzed the 10
complete pairs of g_published and g_unpublished values. We followed this with an ancillary analysis,
using 12 pairs by imputing g_unpublished = g_published for these two drugs, assuming the null
hypothesis held.
Handling of imprecise P values
• Precise P values were not always available for the calculation of effect size. In cases
where they were not available, we looked for data from which to calculate precise P
values. We found means and standard deviations (or standard errors or confidence
intervals) in the FDA reviews for 9 treatment arms within 6 studies and for 17 treatment arms within 10 journal articles. These instances are shown in Appendix Table A with the
superscript "M". For one article6 whose apparent primary endpoint was based on the
proportion of responders, we calculated a precise P value by using the reported number
of responders in a replication of the authors' chi-squared test. This instance is shown in Appendix Table A with the superscript "R".
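For a 2x2 responders table, a Pearson chi-squared test (df = 1) can be replicated along these lines; the counts below are hypothetical, not the data of the cited article:

```python
import math

def chi2_p_from_responders(resp_drug, n_drug, resp_placebo, n_placebo):
    """Pearson chi-squared test (df = 1, no continuity correction) on a
    2x2 responders table; returns (chi2 statistic, two-tailed P value)."""
    a, b = resp_drug, n_drug - resp_drug
    c, d = resp_placebo, n_placebo - resp_placebo
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For a chi-squared variate with df = 1: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```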
• If no such data were available, we set the P value equal to the precise P value obtained
from the other data source. The purpose of this approach was to create a conservative
bias in the direction of the null hypothesis, i.e., of not finding a difference between FDA and journal results reporting. As an example, if P_sponsor was stated as <.05, but P_FDA was
given as .03, we set P_sponsor=.03 and used that to calculate g_sponsor. Likewise, if P_sponsor was
reported as "NS" while P_FDA was .24, we set P_sponsor=.24, also. Each such instance is
shown in Appendix Table A with the superscript "F" or "J".
• If the precise P value did not lie within the P value range provided by the other source,
we set the P value equal to the top of that range, thus bringing the FDA and journal P
values as close to equality as reasonably possible. For example, in the case of P_sponsor<.10
but P_FDA=.50, we set P_sponsor=.10. These instances are shown in Appendix Table A with the
superscript "T".
• Additionally, we set three pairs of matching P value ranges equal to the values at the top
of their respective ranges. These are shown in Appendix Table A with the superscripts "T" and "J" on the FDA side and "T" and "F" on the journal side. (In one of these
cases, the FDA review reported a P value of ".00", which we interpreted as P<.01.)
• Nonsignificant P values: The FDA reviews reported nonsignificant results for 56
treatment arms. From these we obtained 45 precise P values. (Precise values were reported for 41 nonsignificant P values, and we calculated 4 others from data provided
in the reviews using the method noted above.) The FDA reported on an additional 11 P
values as "NS" but no other data from which we could calculate precise P values. One of
these we set equal to the precise (nonsignificant) P value obtained from the corresponding journal publication, following the procedure mentioned above. For the
remaining 10 instances (18% of the total number of nonsignificant results), there was no
corresponding journal publication. To exclude these nonsignificant P values from the
analyses would have created a bias in the proportion of studies found nonsignificant, and it would also have biased the estimates of effect size. Instead, we assigned a precise P
value for these 10 nonrecorded nonsignificant P values by transforming the above-
mentioned 45 precise P values into their standard normal deviates Z, finding their
median (.954), and transforming that back into P =.34. (The upper and lower quantile Z values were 1.254 and 0.628, corresponding to P values of .21 and .53, respectively.)
Thus we used P=.34, derived from 45 precise nonsignificant P values, as the precise
value for the 10 P values reported only as "NS". These instances are shown in Appendix
Table A with the superscript "N".
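The imputation just described (two-tailed P values mapped to |Z|, the median taken, then mapped back to a P value) can be sketched as follows; the input list here is illustrative, not the actual 45 precise P values:

```python
from statistics import NormalDist, median

def impute_ns_p(precise_ns_pvalues):
    """Map two-tailed P values to |Z|, take the median Z, map back to P."""
    nd = NormalDist()
    zs = [nd.inv_cdf(1 - p / 2) for p in precise_ns_pvalues]  # two-tailed P -> |Z|
    z_med = median(zs)
    return 2 * (1 - nd.cdf(z_med))  # back to a two-tailed P value

# Round-trip sanity check: a median Z of about 0.954 corresponds to P ~ .34,
# matching the value derived in the text from the 45 precise P values.
```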
• Studies of escitalopram and some citalopram studies used the Montgomery-Åsberg
Depression Rating Scale (MADRS). All other studies employed the Hamilton
Depression Rating Scale, usually the 17-item version.
• For antidepressant studies, the FDA defines intent-to-treat (ITT) patients as all
randomized patients who return for at least one on-drug post-baseline visit.
• In enumerating the numbers of patients within the studies, we included only those
patients whose data we used in the other analyses. Thus we excluded patients
randomized to active comparator treatment arms and to doses that were eventually not approved. Therefore the actual numbers of patients participating were greater than the numbers we report.
STUDY OUTCOME AND PUBLICATION STATUS
Below is the 2x2 table corresponding to the risk ratio (RR) presented in the print version of
the article:
[2x2 table cells not recoverable in this copy; surviving column headers: nonpositive, Total]
(CI95% 6.2 – 22.0)
To check our analyses of categorical data for robustness, we excluded questionable studies. FDA-positive studies were approximately 8 times more likely to be published in a way that
agreed with the FDA than FDA-negative studies:
(CI95% 4.3 – 14.1)
Additionally, we ignored whether the publications agreed or conflicted with the FDA and
simply compared published to unpublished studies. FDA-positive studies were approximately 3 times more likely to be published than FDA-negative studies:
(CI95% 2.0 – 4.3)
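Risk ratios such as those above, with Wald-type 95% confidence intervals computed on the log scale, can be obtained from 2x2 cell counts along these lines; the counts shown are hypothetical (the actual cell counts appear in the print-version table):

```python
import math

def risk_ratio_ci(a, n1, c, n2, z=1.96):
    """Risk ratio of event proportions a/n1 vs. c/n2, with a Wald
    confidence interval computed on the log scale."""
    rr = (a / n1) / (c / n2)
    # Standard error of log(RR) for a 2x2 table
    se_log = math.sqrt(1/a - 1/n1 + 1/c - 1/n2)
    lo = math.exp(math.log(rr) - z * se_log)
    hi = math.exp(math.log(rr) + z * se_log)
    return rr, lo, hi
```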
NUMBER OF PATIENT PARTICIPANTS IN STUDIES
Below is the 2x2 table corresponding to the risk ratio (RR) presented in the print version of
this article:
[2x2 table cells not recoverable in this copy; surviving column headers: nonpositive, Total]
(CI95% 25.6 – 28.8)
LISTING OF DATA USED IN ANALYSES (APPENDIX TABLES A, C)
Appendix Table A lists the raw data used in the analyses presented here. Each FDA-
registered study is shown alongside the corresponding stand-alone journal publication (with
reference information), unless the study in question was not published. The calculated effect
size and standard error values are shown by study and data source in Appendix Table C.
QUALITATIVE DESCRIPTION OF SELECTIVE REPORTING WITHIN TRIALS (APPENDIX TABLE B)
• Of the 11 publications listed in Appendix Table B (also shown with gray shading in
Appendix Table A), 7 highlighted results that did not appear in FDA reviews as either
primary or secondary endpoints, suggesting that these analyses were conducted
post hoc. While the FDA reviews and the journal articles agreed as to the primary rating scale, they
differed in how the data derived from these scales were analyzed.
• Among the ways in which the methodology of the journal articles differed from that of
the FDA, there were differences as to which data were included in and excluded from the analyses. For example, at the patient level, there were deviations from the intent-to-treat (ITT) principle,7,8 specifically by invoking an "efficacy subset" of the ITT
population meeting additional criteria or by using an observed cases approach, which omits data from patients who drop out due to lack of efficacy or adverse events. At the
site level, there were two journal articles that presented positive data from single sites
within multicenter studies, whose overal results, according to the FDA, were
nonsignificant. (See footnotes to Appendix Table B for details.)
COMPARISONS OF EFFECT SIZE (APPENDIX TABLE D)
The results of the analyses comparing the sets of effect size values are listed in
Supplementary Appendix Table D. These are the same as those mentioned in the text of the Results section of the main article. However, the table additionally includes the result of the
ancillary analysis described in the supplementary methods. Like the others, this result was
statistically significant (P=0.003).
• Regarding the method of handling dropouts, the Committee for Proprietary Medicinal
Products (CPMP) of the European Agency for the Evaluation of Medicinal Products
(EMEA) states, "There is no universally applicable method of handling missing values, and different approaches may lead to different results. As such it is essential to pre-
specify the selected methods in the statistical section of the study protocol."9
REFERENCES
1. Food and Drug Administration, Center for Drug Evaluation and Research. Guidance for Industry: Providing Clinical Evidence of Effectiveness for Human Drug and Biological Products; 1998:6, 9. Available at: http://www.fda.gov/cder/guidance/1397fnl.pdf
2. FOIA Update. The Freedom of Information Act 5 USC 552: Electronic Freedom of Information Act Amendments of 1996. Vol XVII. Available at: http://www.usdoj.gov/oip/foia_updates/Vol_XVII_4/page2.htm
3. Temple R. Government viewpoint of clinical trials of cardiovascular drugs. Med Clin North Am 1989;73(2):495-509.
4. Temple R, Ellenberg SS. Placebo-controlled trials and active-control trials in the evaluation of new treatments. Part 1: ethical and scientific issues. Ann Intern Med 2000;133(6):455-463.
5. Otto MW, Nierenberg AA. Assay sensitivity, failed clinical trials, and the conduct of science. Psychother Psychosom 2002;71(5):241-243.
6. Cohn CK, Robinson DS, Roberts DL, Schwiderski UE, O'Brien K, Ieni JR. Responders to antidepressant drug treatment: a study comparing nefazodone, imipramine, and placebo in patients with major depression. J Clin Psychiatry 1996;57 Suppl 2:15-18.
7. International Conference on Harmonisation (ICH), European Medicines Agency (EMEA). Statistical Principles for Clinical Trials. Topic E9. 1998. Available at: http://www.fda.gov/cder/guidance/iche3.pdf
8. Lachin JM. Statistical considerations in the intent-to-treat principle. Control Clin Trials 2000;21(3):167-189.
9. Committee for Proprietary Medicinal Products (CPMP). Points to Consider on Missing Data. European Agency for the Evaluation of Medicinal Products (EMEA); 2001.
Appendix Table A. Antidepressant study results as analyzed by FDA
and as presented in corresponding journal publications.
[Appendix Table A body not recoverable in this copy. Surviving fragments include journal names (e.g., Acta Psychiatr Scand, Int Clin Psychopharm, J Clin Psychopharm) and paired FDA/journal P values annotated with the superscripts defined below.]
Numbers in second column correspond to those shown in Figure 2a.
Dark blue pattern: unpublished studies. (Colors are same as in Figure 2.)
Medium blue pattern: studies deemed negative or questionable by FDA but presented as positive in journal articles. Details provided in Appendix Table B.
Light blue: studies published in agreement with FDA conclusion.
All P values are two-tailed unless noted otherwise.
Parentheses indicate precise P value used in analysis; please see methods.
Bold numbers: statistically nonsignificant results
Italics: "failed" studies (see text)
(-) following P value indicates study drug performed worse than placebo.
PMID = PubMed ID Number
F FDA precise P value used as sponsor precise P value
J Journal precise P value used as FDA precise P value
M Mean used with SD, SE, or CI to calculate precise P value
R Responders counts in article used to replicate sponsor's analysis and obtain precise P value
T Top of reported P value range used as precise P value
N Nonsignificant; precise P value derived as detailed in supplementary methods
Appendix Table B. How apparent primary methods in journal articles differed from protocol-prespecified primary methods per FDA.
[Table B header and body not recoverable in this copy. Surviving column-header fragments indicate the table compared journal data with FDA data on: the article's apparent "primary" method; patients included/excluded (data from one site, "efficacy subset" of ITT patients); and data-analytic details (baseline differences accounted for, continuous data rendered otherwise, test other than prespecified).]
Footnotes to Appendix Table B
article. The review's conclusion noted "the disappointing results
Number used as column header in this table also used in Figure
from study 86141".
2a and Appendix Tables A and C.
The introduction of the article states, "Hereby presented results
Although the two higher dose groups were nonsignificant on
are obtained in one of three participating centres; the results of
the FDA's usual primary outcome (endpoint), the Agency seemed
the multicentre study are going to be presented elsewhere."
to consider this study weakly supportive of the efficacy of
However, we were unable to find any other publications with
sertraline. "For the HAMD total score and CGI (Clinical Global
mirtazapine in the title and with the principal investigator (PI ) of
Impression) items, the LOCF [last observation carried forward]
this multicenter study as an author. The FDA review does not
analyses showed only scattered significant comparisons. The OC
present results separately by site, only the results of the entire
[observed cases] analyses for the same items were more
study, i.e. with data from all sites combined.
consistently significant for the higher sertraline doses (i.e., not for
the 50-mg form). This study did not reveal any evidence for a
G Article does not mention the results of the two studies, 448 and
dose response relationship. Nevertheless, this study provides
449, individually. See footnote V for details.
evidence for the efficacy of sertraline." Elsewhere, in a separate
review document by the same FDA statistician, "Study 103 does
The article's results section, after reporting significant results
not provide, in itself, compelling evidence of the efficacy of
with observed cases, provides raw scores for LOCF but does not
sertraline." We therefore classified this study as questionable.
state whether they were significant or nonsignificant.
The P value for low-dose group was not reported in the journal
The journal article noted a lack of significance using ANCOVA
article. Lack of significance on this is evident only from noting
and the LOCF method of handling dropouts, but it did not
that there is no footnoted P value for that dose group in Figure
report the P value from that analysis.
1a. FDA review shows that, in addition to the LOCF analysis
J Results section states, "In the intention-to-treat analyses (all
being nonsignificant, the observed cases analysis (presented as
significant in the journal article) was also nonsignificant (P=.381) for
patients)... In the sertraline 100 mg/day group, significant
this dose group.
improvement was noted in all parameters except the HAMD
Total and CGI Severity scores. In the 200 mg/day group,
D (-) sign indicates that study drug performed numerically worse
significant improvement was observed in all but the HAMD
Total and CGI Improvement scores." No P values were
provided, and the fact that the HAMD Total was the primary
E Methods section states, "The intention-to-treat and endpoint
efficacy measure was not reported.
[LOCF] analyses, which are not reported here, yielded results
similar to those of the efficacy analysis." The result reported first
Results section states, "The primary efficacy analysis was
by the FDA, involving the HAMD and the LOCF method of
carried out on a last observation carried forward basis." Other
handling dropouts (P=.316), was not reported in the journal
details were unclear. Please see related footnotes for this study.
L The word "primary" was used in the article, but only in
article how dropouts were handled in this analysis. It appears that
reference to the rating scale, not to the statistical analytic
LOCF approach was not used (footnote E), suggesting that an
observed cases (OC) approach was used. However, the results
highlighted in the journal article do not agree with the OC
M Methods section of article states, "The a priori primary
analysis conducted by FDA. The results section of the journal
treatment comparison was the contrast between duloxetine 120
article states, "The mean total scores decreased at week 2 and
mg/day and placebo on change from baseline at week 8 (visit 8)
onwards in both the citalopram and placebo groups (P<.05)."
on the HAMD17 total score. Longitudinal efficacy outcomes
The FDA's OC analysis found that the difference between
were primarily analyzed using a likelihood-based mixed-effects
citalopram and placebo was nonsignificant at weeks 2 and 4
repeated measures approach (MMRM)." However, see footnote
(P=.374 and .709, respectively) and became significant only at
Methods section states, "Primary efficacy outcome measures
The FDA did not conduct any analyses using the "efficacy
specified in the study protocol included the HAMD-17 and the
subset" definition put forth in journal article. (See also footnote
Clinical Global Impressions-Improvement (CGI-I) scale ratings."
It did not state how the protocol said that the data obtained from
these scales should be analyzed. Results section states, "The
The result obtained using mixed model repeated measures
primary efficacy measure
used in these analyses is the number of
(MMRM) analysis was not reported in the FDA's review of this
depressed patients classified as treatment responders." [italics
study, which only reported the result using the LOCF analysis.
added for emphasis] Thus it was unclear from article's wording
(See also footnote BB.)
alone whether the analyses reported were the same as or different
T The FDA review states, "The results for the weekly [OC]
from those prespecified in protocol. (FDA reported
analysis were inconsistent. For some variables, fluoxetine was
nonsignificant result according to primary method, which did not
significantly better than placebo for some variables [sic] toward
involve treatment responders.)
the latter weeks while placebo was occasionally nonsignificantly
O "Primary" used in reference to rating scale but not in reference
better than at earlier weeks." The FDA review did not report the
to details of statistical analysis.
P value from this analysis, which was the one highlighted in the journal article. (See footnote DD.)
P FDA review shows "all efficacy patients" analysis using LOCF
technique with P=.04, consistent with the P<.05 result reported
The "evaluable patients" analysis reported on in journal article
in the journal article. However, this gave different results from
(see footnote AA) was not reported on in FDA review.
the P=.25 obtained using the primary intention-to-treat (ITT)
V Two similarly designed 20-site studies in the US and Canada,
analysis in the FDA review (see footnote Z).
were pooled into a single analysis in the journal article, with an
Q "Efficacy analysis" result reported on in the journal article
overall positive result. As shown in Appendix Table A, the FDA
could not be found in the FDA review. It was unclear from the
review indicates that one of these two studies was negative
AA Article states, "Evaluable patients were classified as those who
(P=.254). See also footnote LL.
took study medication on or after the 11th day of the double-
W This study involved two centers, both in Houston, Texas.
blind phase, who had efficacy assessments on or after study day
Center 1 did not show a significant difference for nefazodone
11, and who were not major protocol violators." Analyses based
(P=.97) or for imipramine (P=.53) vs. placebo. Center 2 showed
on "evaluable patients" are shown in Figure 1 and Table 1, while
a trend (P=.09) when analyzed according to protocol (ANOVA
those based on ITT patients appear later in Table 5.
on change-from-baseline scores). The FDA reviewer argued that
BB The FDA statistical review of duloxetine states that sponsor
this study could be considered positive by ignoring Center 1 (as a
proposed MMRM as the primary analytic method in the protocol
failed sub-study) and by considering that the secondary outcomes
it submitted. However, the review also states, "Based on the letter
provided "strong supportive evidence in favor of nefazodone
issued by the Agency to the sponsor (dated January 11, 2002),
with both the LOCF and OC results in agreement." The Division
this reviewer considered the LOCF analysis as the primary
Director's memo stated agreement with this. The PI of Center 2
statistical analysis to evaluate the efficacy of duloxetine." (We did
authored the paper shown in Appendix A. Its highlighted analysis
not find this letter among the FDA review documents.)
was based on percentage of responders, different from the prespecified method noted above, and gave a significant result.
CC Same as for BB above.
The other center was not mentioned.
DD An observed cases (completers analysis) was apparently used.
X The FDA's LOCF analysis included 147 patients, while the
Though not explicit in the text, Table 1 shows that the
article's "efficacy evaluation group" included 133.
highlighted P value is at Week 5, where the Ns show significant
attrition compared to the Ns at earlier weeks. Also, the
Post-baseline depression ratings were conducted at Week 1 and
Discussion section states, "Because of the slow onset of action,
Week 3 according to both FDA review and journal article. FDA
endpoint [LOCF] analyses were considered not appropriate for
review reports definition of ITT was those with any postbaseline
this data set."
data, i.e. those returning at Week 1. Journal defined "evaluable
patients" as those with data recorded after at least 2 weeks of
EE Result highlighted in Figure 1 was obtained with random-
data. But since the next visit was at Week 3, this analysis required
effects mixed model (REMM) analysis.
patients to have 3 weeks of data. Thus, patients dropping out
between the Week 1 and Week 3 visits were excluded in the
Efficacy findings presented first obtained using observed cases
journal article but included in the FDA analysis.
(i.e. completers only) analysis. Mean scores obtained using the
LOCF method were reported but without associated P values or
Z According to the FDA review, the ITT Ns were 40 and 40 for
mention of their lack of significance. However, the LOCF scores
paroxetine and placebo groups, respectively, vs. 35 and 36 for the
were used in a "trend analysis" reported (as significant)
"all efficacy" analysis. The FDA review also states, "The
subsequently in article.
discrepancy between the ITT and the All Efficacy data derives
from the 5 paroxetine and 4 placebo patients excluded."
GG t-test was noted as the analytic method in the FDA review, so
of unblinding at that site. An audit by the sponsor found that the
we assume that this was according to protocol. This is typically
site had provided patients with a copy of the scale during the
not the case, because it cannot account for possible effects of site
rating, contrary to instructions given at the investigator meeting.
or of treatment-by-site interaction.
Including this site, P=.021; excluding this outlier site, P=.254. In
the analysis presented in the journal article, data from this site
HH The article presents results of a nonparametric two-sample
Wilcoxon rank sum test. The FDA review was not clear on what
test was used to analyze the data, but because it presented P values alongside mean change scores, it appears that the FDA
used a parametric test (e.g. ANOVA, ANCOVA, or t-test).
II Though this result did not meet our criteria for apparent primary result, a positive dose-response effect was presented in
the abstract, in the text of results section, and in Table 2. This finding, obtained using the LOCF method of handling dropouts,
was contradicted by the relative performance of the doses shown
in Figure 1, which showed results obtained using the observed
cases approach (see footnote FF). This figure showed that the 50-75mg dose was outperformed on both the HAMD and MADRS
scales by the 25-mg dose (a lower dose not FDA-approved as
effective).
JJ According to the journal article, the highlighted result of P<.10 one-tailed (equivalent to two-tailed P<.20) "should be interpreted in the light of the small sample size available." For comparison,
the usual criterion for statistical significance is two-tailed P<.05
(equivalent to one-tailed P<.025).
KK The Ns shown in the journal article were larger than the Ns shown in the FDA review and in a report of the same study on the sponsor's website, www.lillytrials.com. The reason for this
discrepancy is unclear.
LL The FDA analysis excluded Center 2/4, which had a remarkably high (compared to the other 19 sites) drug-placebo
difference. The FDA reviewer stated that this raised the question
Appendix Table C. Effect size and standard error for individual FDA-registered
studies according to the FDA and to journal publications.
[Appendix Table C body not recoverable in this copy; surviving column headers: Drug_name, Number, Study Number, g_journal, SE_journal.]
Number column: Studies are listed in same order as in Appendix Table A. Period indicates missing data due to study not being published as a stand-alone article. g: Hedges' g. SE: standard error. pub: published version.
Appendix Table D. Comparisons of Effect Size
[Appendix Table D body not recoverable in this copy; contrasts compared unpublished vs. published studies and, within published studies, FDA vs. journal results.]
a compare with Figure 3, Panel A. b compare with Figure 3, Panel B. c ancillary analysis with imputation; please see supplementary methods. d signed-rank test. e rank-sum test.
N = number of studies or drugs analyzed. g = Hedges's g (effect size), unweighted mean. s = standard deviation around mean of g.