
Supplementary Appendix
This appendix has been provided by the authors to give readers additional information about their work.
Supplement to: Turner EH, Matthews AM, Linardatos E, et al. Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med 2008;358:252-60.
SELECTIVE PUBLICATION OF ANTIDEPRESSANT TRIALS AND ITS INFLUENCE ON APPARENT EFFICACY

SUPPLEMENTARY APPENDIX

This document provides details that supplement the material in the body of the article but could not be included due to space constraints. The headings used below mostly follow those in the main article.

• In order to approve a drug, the FDA requires "substantial evidence" of effectiveness from two positive "adequate and well controlled studies".1 The language that the FDA uses in labeling reflects this focus on positive trials. For example, the labeling for sertraline states, "The efficacy of Zoloft as a treatment for major depressive disorder was established in two placebo-controlled studies in adult outpatients meeting DSM-III criteria for major depressive disorder."

DATA FROM FDA REVIEWS
Data procurement – FDA reviews
• Review documents for many approved drug-indication combinations have been electronically available in the public domain since enactment of the Electronic Freedom of Information Act Amendments of 1996 (eFOIA).2 Reviews for drugs approved before then are not available electronically. For reasons that are unclear, the FDA has not, as of this writing, posted the reviews of three antidepressants approved after eFOIA was enacted: bupropion extended-release (Wellbutrin XL®), mirtazapine orally disintegrating (Remeron Soltabs®), and transdermal selegiline (EMSAM®).

• Fluoxetine studies 62-a and 62-b are referred to in the FDA review with a single study number (62). However, the review indicates that these were identically designed but separate and nonoverlapping studies. The first study involved patients with mild depression (who still met DSM criteria for major depression), while the other involved patients with moderate depression.

Data extraction – FDA reviews
• The FDA uses statistical superiority (P<.05) to a comparator, usually placebo, to determine whether a study is positive3 (aka "a win"). Studies are pooled or meta-analyzed only if that is specified in the original protocol and agreed to in advance by the FDA.

• Failed versus negative studies: Active comparator treatment arms are sometimes included in study designs along with study drug and placebo. Because active comparators are approved antidepressants, they are expected to beat (demonstrate statistical superiority to) placebo. When that does not happen and the study drug also does not beat placebo, the FDA "excuses" the study drug for not beating placebo and deems the study inconclusive or "failed".4 On the other hand, when an active comparator does beat placebo (as expected) but the study drug does not, the study is judged negative. When the sponsor elects to omit an active comparator from the design and the study drug does not beat placebo, the study is also deemed negative.1 However, the validity of distinguishing between negative and failed studies has been questioned.5

• If the reviewer's overall judgment regarding a study's outcome was not clearly stated, we used that of the team leader or division director.

• Double data extraction and entry – FDA reviews: This was performed first by ET, AM, and EL. A second extraction and entry of the FDA data was performed by RT and SR, who were blind to the results of the first extraction/entry process. The values obtained in the second process were compared to those obtained in the first. Any discrepancies were resolved by consensus.

DATA FROM JOURNAL ARTICLES
Data procurement – journal articles
Literature search: Literature searches were originally conducted by ET, AM, and EL.
Subsequently, an academic reference librarian (AH), who was blind to the results of the original searches, conducted independent searches of Ovid MEDLINE and the Cochrane Central Register of Controlled Trials, which identified no new journal articles. A repeat search of reference lists revealed one additional article not indexed in the searched databases.

• We did not consider forms of data disclosure such as conference proceedings (including published conference abstracts), clinical trial registries, book chapters, or newspaper articles.

• We allowed one exception to our rule of including only stand-alone publications of single studies: a paper pooling the results from two identically designed studies of paroxetine controlled-release (Appendix Table A).

• Matching of studies in FDA reviews to journal articles was completed initially by ET, AM, and EL. It was later repeated by RT, who was blind to the results of the original matching process. Any discrepancies between the first and second matching procedures were resolved by consensus.

Data extraction – journal articles
• Double data extraction and entry: Journal data extraction and entry was initially performed by ET. Subsequently, we provided the matched journal articles (clean unmarked copies) to TB and NM, who were blind to the results of the original extraction/entry process. They extracted the results on the apparent primary endpoints, which were entered by NM and RT. The results of this second extraction/entry process were compared with those from the first process. Discrepancies were resolved by consensus.

1 Unwritten policy learned during the first author's tenure as a reviewer in the division of the FDA handling approval of psychotropic drugs, still applicable according to communications with current employees and apparent from review documents. This may not apply to other review divisions within the FDA.

STATISTICAL ANALYSIS
Continuous data (effect size)
• Using effect size permitted us to combine, in a standardized way, data from trials using different primary rating scales.

• The meta-analyses described were performed twice, once based on the first data extraction/entry process and again based on the second such process (see above). The overall mean weighted ES values (Figure 3) resulting from the two sets of meta-analyses were within ±.01 of one another. The percentage difference between the overall weighted mean ES value obtained from the FDA data and the value obtained from the journal data was unchanged.

• For purposes of the paired (signed-rank) analyses of the g values, we conducted a preliminary "mini-meta-analysis" of the FDA data for paroxetine studies 448 and 449 (Appendix Table A). This provided a single FDA result, which we compared with the single pooled result reported in the journal article.

• As stated in the Methods section of this article, within the FDA dataset we calculated a mean g value for each drug's published studies and a mean g value for each drug's unpublished studies. However, 2 of the 12 drugs had no unpublished studies; thus a value for g_unpublished could not be calculated for these two drugs. We first analyzed the 10 complete pairs of g_published and g_unpublished values. We followed this with an ancillary analysis using 12 pairs, imputing g_unpublished = g_published for these two drugs, assuming the null hypothesis held.

Handling of imprecise P values
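Where only a two-tailed P value and the group sizes are available, an effect size can be recovered along the lines used in these analyses. The sketch below is illustrative only (the numbers are not study data) and assumes a standard two-sample t framework with the usual small-sample (Hedges) correction:

```python
from math import sqrt
from scipy import stats

def g_from_p(p_two_tailed, n_drug, n_placebo):
    """Approximate Hedges' g from a two-tailed P value and group sizes.

    Illustrative reconstruction: recover |t| implied by the P value,
    convert t to Cohen's d, then apply the small-sample correction J.
    """
    df = n_drug + n_placebo - 2
    t = stats.t.isf(p_two_tailed / 2, df)      # |t| implied by the P value
    d = t * sqrt(1 / n_drug + 1 / n_placebo)   # Cohen's d
    j = 1 - 3 / (4 * df - 1)                   # Hedges' correction factor J
    return j * d

# Hypothetical example: P = .03 with 100 patients per arm
print(g_from_p(0.03, 100, 100))
```

Smaller P values (for fixed group sizes) map to larger effect sizes, which is why the conservative substitutions described in the next section bias the FDA and journal effect sizes toward each other.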
• Precise P values were not always available for the calculation of effect size. In cases where they were not available, we looked for data from which to calculate precise P values. We found means and standard deviations (or standard errors or confidence intervals) in the FDA reviews for 9 treatment arms within 6 studies and for 17 treatment arms within 10 journal articles. These instances are shown in Appendix Table A with the superscript "M". For one article6 whose apparent primary endpoint was based on the proportion of responders, we calculated a precise P value by using the reported number of responders in a replication of the authors' chi-squared test. This instance is shown in Appendix Table A with the superscript "R".

• If no such data were available, we set the P value equal to the precise P value obtained from the other data source. The purpose of this approach was to create a conservative bias in the direction of the null hypothesis, i.e. of not finding a difference between FDA and journal results reporting. As an example, if P_sponsor was stated as <.05 but P_FDA was given as .03, we set P_sponsor=.03 and used that to calculate g_sponsor. Likewise, if P_sponsor was reported as "NS" while P_FDA was .24, we set P_sponsor=.24 also. Each such instance is shown in Appendix Table A with the superscript "F" or "J".

• If the precise P value did not lie within the P value range provided by the other source, we set the P value equal to the top of that range, thus bringing the FDA and journal P values as close to equality as reasonably possible. For example, in the case of P_sponsor<.10 but P_FDA=.50, we set P_sponsor=.10. These instances are shown in Appendix Table A with the superscript "T".

• Additionally, we set three pairs of matching P value ranges equal to the values at the top of their respective ranges.
These are shown in Appendix Table A with the superscripts "T" and "J" on the FDA side and "T" and "F" on the journal side. (In one of these cases, the FDA review reported a P value of ".00", which we interpreted as P<.01.)

• Nonsignificant P values: The FDA reviews reported nonsignificant results for 56 treatment arms. From these we obtained 45 precise P values. (Precise values were reported for 41 nonsignificant P values, and we calculated 4 others from data provided in the reviews using the method noted above.) The FDA reported an additional 11 P values only as "NS", with no other data from which we could calculate precise P values. One of these we set equal to the precise (nonsignificant) P value obtained from the corresponding journal publication, following the procedure mentioned above. For the remaining 10 instances (18% of the total number of nonsignificant results), there was no corresponding journal publication. To exclude these nonsignificant P values from the analyses would have created a bias in the proportion of studies found nonsignificant, and it would also have biased the estimates of effect size. Instead, we assigned a precise P value for these 10 unrecorded nonsignificant P values by transforming the above-mentioned 45 precise P values into their standard normal deviates Z, finding their median (.954), and transforming that back into P=.34. (The upper and lower quartile Z values were 1.254 and 0.628, corresponding to P values of .21 and .53, respectively.) Thus we used P=.34, derived from 45 precise nonsignificant P values, as the precise value for the 10 P values reported only as "NS". These instances are shown in Appendix Table A with the superscript "N".

• Studies of escitalopram and some citalopram studies used the Montgomery–Åsberg Depression Rating Scale (MADRS). All other studies employed the Hamilton Depression Rating Scale, usually the 17-item version.
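The median-Z imputation for "NS" results described above can be sketched as follows. The listed P values are hypothetical stand-ins for the 45 precise nonsignificant values (which are in the FDA reviews); the study's actual values gave median Z = 0.954:

```python
from scipy.stats import norm

# Hypothetical stand-ins for the 45 precise nonsignificant two-tailed P values
precise_ns_p = [0.21, 0.34, 0.53, 0.12, 0.45]

# Transform each two-tailed P to a standard normal deviate Z,
# take the median, and back-transform -- the procedure described above.
z = sorted(norm.isf(p / 2) for p in precise_ns_p)
median_z = z[len(z) // 2]          # odd-length list: middle element
imputed_p = 2 * norm.sf(median_z)  # back-transform to a two-tailed P

# The study's actual median Z = 0.954 corresponds to P = .34:
print(round(2 * norm.sf(0.954), 2))  # → 0.34
```

Because the transform is a monotone round trip, the imputed P is simply the median of the precise nonsignificant P values; working on the Z scale matches the procedure as the supplement describes it.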
• For antidepressant studies, the FDA defines intent-to-treat (ITT) patients as all randomized patients who return for at least one on-drug post-baseline visit.

• In enumerating the numbers of patients within the studies, we included only those patients whose data we used in the other analyses. Thus we excluded patients randomized to active comparator treatment arms and to doses that were eventually not approved. Therefore the actual numbers of patients participating were greater than the numbers we report.

STUDY OUTCOME AND PUBLICATION STATUS
Below is the 2x2 table corresponding to the risk ratio (RR) presented in the print version of
the article:
[2×2 table: study outcome per FDA (positive vs. nonpositive) by publication status; the table body did not survive text extraction. 95% CI for the risk ratio, 6.2–22.0.]

To check our analyses of categorical data for robustness, we excluded questionable studies. FDA-positive studies were approximately 8 times more likely to be published in a way that agreed with the FDA than FDA-negative studies (95% CI, 4.3–14.1).

Additionally, we ignored whether the publications agreed or conflicted with the FDA and simply compared published to unpublished studies. FDA-positive studies were approximately 3 times more likely to be published than FDA-negative studies (95% CI, 2.0–4.3).

NUMBER OF PATIENT PARTICIPANTS IN STUDIES
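Each of the risk ratios in this section comes from a 2×2 table of study outcome by publication status. A minimal sketch of the standard computation, using the log-RR normal approximation and hypothetical counts (not the study's data):

```python
from math import exp, log, sqrt

def risk_ratio(a, b, c, d):
    """Risk ratio and 95% CI from a 2x2 table.

    a = positive studies published,    b = positive studies not published
    c = nonpositive studies published, d = nonpositive studies not published
    Standard log-RR normal approximation for the confidence interval.
    """
    rr = (a / (a + b)) / (c / (c + d))
    se = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lo = exp(log(rr) - 1.96 * se)
    hi = exp(log(rr) + 1.96 * se)
    return rr, lo, hi

# Hypothetical counts for illustration only:
print(risk_ratio(36, 2, 11, 22))
```

With equal publication proportions in both rows the ratio is 1; the further the rows diverge, the larger the RR and the further the CI moves away from 1.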
Below is the 2x2 table corresponding to the risk ratio (RR) presented in the print version of
this article:
[2×2 table: numbers of patients in positive vs. nonpositive studies by publication status; the table body did not survive text extraction. 95% CI, 25.6–28.8.]

LISTING OF DATA USED IN ANALYSES (APPENDIX TABLES A, C)
Appendix Table A lists the raw data used in the analyses presented here. Each FDA-
registered study is shown alongside the corresponding stand-alone journal publication (with
reference information), unless the study in question was not published. The calculated effect size and standard error values are shown by study and data source in Appendix Table C.

QUALITATIVE DESCRIPTION OF SELECTIVE REPORTING WITHIN JOURNAL ARTICLES (APPENDIX TABLE B)
• Of the 11 publications listed in Appendix Table B (also shown with gray shading in Appendix Table A), 7 highlighted results that did not appear in FDA reviews as either primary or secondary endpoints, suggesting that these analyses were conducted post hoc. While the FDA reviews and the journal articles agreed as to the primary rating scale, they differed in how the data derived from these scales were analyzed.

• Among the ways in which the methodology of the journal articles differed from that of the FDA, there were differences as to which data were included in and excluded from the analyses. For example, at the patient level, there were deviations from the intent-to-treat (ITT) principle,7,8 specifically by invoking an "efficacy subset" of the ITT population meeting additional criteria or by using an observed cases approach, which omits data from patients who drop out due to lack of efficacy or adverse events. At the site level, two journal articles presented positive data from single sites within multicenter studies whose overall results, according to the FDA, were nonsignificant. (See footnotes to Appendix Table B for details.)

COMPARISONS OF EFFECT SIZE (APPENDIX TABLE D)
The results of the analyses comparing the sets of effect size values are listed in
Supplementary Appendix Table D. These are the same as those mentioned in the Results section of the main article. However, the table additionally includes the result of the ancillary analysis described in the supplementary methods. Like the others, this result was statistically significant (P=0.003).

• Regarding the method of handling dropouts, the Committee for Proprietary Medicinal Products (CPMP) of the European Agency for the Evaluation of Medicinal Products (EMEA) states, "There is no universally applicable method of handling missing values, and different approaches may lead to different results. As such it is essential to pre-specify the selected methods in the statistical section of the study protocol."9

REFERENCES
1. Food and Drug Administration, Center for Drug Evaluation and Research. Guidance for Industry: Providing Clinical Evidence of Effectiveness for Human Drug and Biological Products; 1998:6, 9.
2. FOIA Update. The Freedom of Information Act 5 USC 552: Electronic Freedom of Information Act Amendments of 1996. Vol XVII.
3. Temple R. Government viewpoint of clinical trials of cardiovascular drugs. Med Clin North Am 1989;73(2):495-509.
4. Temple R, Ellenberg SS. Placebo-controlled trials and active-control trials in the evaluation of new treatments. Part 1: ethical and scientific issues. Ann Intern Med 2000;133(6):455-463.
5. Otto MW, Nierenberg AA. Assay sensitivity, failed clinical trials, and the conduct of science. Psychother Psychosom 2002;71(5):241-243.
6. Cohn CK, Robinson DS, Roberts DL, Schwiderski UE, O'Brien K, Ieni JR. Responders to antidepressant drug treatment: a study comparing nefazodone, imipramine, and placebo in patients with major depression. J Clin Psychiatry 1996;57 Suppl 2:15-18.
7. International Conference on Harmonisation (ICH), European Medicines Agency (EMEA). Statistical Principles for Clinical Trials. Topic E9. 1998.
8. Lachin JM. Statistical considerations in the intent-to-treat principle. Control Clin Trials 2000;21(3):167-189.
9. Committee for Proprietary Medicinal Products (CPMP). Evaluation of Medicines for Human Use: Points to Consider on Missing Data. European Agency for the Evaluation of Medicinal Products (EMEA); 2001.

Appendix Table A. Antidepressant study results as analyzed by FDA
and as presented in corresponding journal publications.
[The body of Appendix Table A did not survive text extraction. For each FDA-registered study it listed the P value according to the FDA review and according to the corresponding journal publication (with the precise P value used in analysis in parentheses), together with the first author and journal of publication; superscripts F, J, M, N, R, and T indicate how precise P values were derived, as explained in the legend below.]

Numbers in the second column correspond to those shown in Figure 2a.
Dark blue pattern: unpublished studies. (Colors are same as in Figure 2.)
Medium blue pattern: studies deemed negative or questionable by FDA but presented as positive in journal articles. Details provided in Appendix Table B.
Light blue: studies published in agreement with FDA conclusion.
All P values are two-tailed unless noted otherwise.
Parentheses indicate precise P value used in analysis; please see methods.
Bold numbers: statistically nonsignificant results
Italics: "failed" studies (see text)
(-) following P value indicates study drug performed worse than placebo.
PMID = PubMed ID Number
F FDA precise P value used as sponsor precise P value
J Journal precise P value used as FDA precise P value
M Mean used with SD, SE, or CI to calculate precise P value
R Responder counts in article used to replicate sponsor's analysis and obtain precise P value
T Top of reported P value range used as precise P value
N Nonsignificant; precise P value derived as detailed in supplementary methods
Appendix Table B. How apparent primary methods in journal articles differed from protocol-prespecified primary methods per FDA.

[The body of Appendix Table B did not survive text extraction. Its columns included drug name, study number, the article's apparent "primary" result vs. that reported by the FDA, patients included/excluded (e.g. "efficacy subset" of ITT patients, baseline differences), and data-analytic details (continuous measures, test used, highlighted site data, handling of dropouts).]

Footnotes to Appendix Table B

A Number used as column header in this table also used in Figure 2a and Appendix Tables A and C.

• […] article. The review's conclusion noted "the disappointing results from study 86141".

• The introduction of the article states, "Hereby presented results are obtained in one of three participating centres; the results of the multicentre study are going to be presented elsewhere." However, we were unable to find any other publications with mirtazapine in the title and with the principal investigator (PI) of this multicenter study as an author. The FDA review does not present results separately by site, only the results of the entire study, i.e. with data from all sites combined.

• Although the two higher dose groups were nonsignificant on the FDA's usual primary outcome (endpoint), the Agency seemed to consider this study weakly supportive of the efficacy of sertraline. "For the HAMD total score and CGI (Clinical Global Impression) items, the LOCF [last observation carried forward] analyses showed only scattered significant comparisons. The OC [observed cases] analyses for the same items were more consistently significant for the higher sertraline doses (i.e., not for the 50-mg form). This study did not reveal any evidence for a dose response relationship. Nevertheless, this study provides evidence for the efficacy of sertraline." Elsewhere, in a separate review document by the same FDA statistician, "Study 103 does not provide, in itself, compelling evidence of the efficacy of sertraline." We therefore classified this study as questionable.

G Article does not mention the results of the two studies, 448 and 449, individually. See footnote V for details.

• The article's results section, after reporting significant results with observed cases, provides raw scores for LOCF but does not state whether they were significant or nonsignificant.

• The P value for the low-dose group was not reported in the journal article. Lack of significance on this is evident only from noting that there is no footnoted P value for that dose group in Figure 1a. The FDA review shows that, in addition to the LOCF analysis being nonsignificant, the observed cases analysis (presented as significant in the journal article) was also nonsignificant (P=.381) for this dose group.

• The journal article noted a lack of significance using ANCOVA and the LOCF method of handling dropouts, but it did not report the P value from that analysis.

J Results section states, "In the intention-to-treat analyses (all patients) . . . In the sertraline 100 mg/day group, significant improvement was noted in all parameters except the HAMD Total and CGI Severity scores. . . . in the 200 mg/day group, significant improvement was observed in all but the HAMD Total and CGI Improvement scores." No P values were provided, and the fact that the HAMD Total was the primary efficacy measure was not reported.

D (-) sign indicates that study drug performed numerically worse than placebo.

E Methods section states, "The intention-to-treat and endpoint [LOCF] analyses, which are not reported here, yielded results similar to those of the efficacy analysis." The result reported first by the FDA, involving the HAMD and the LOCF method of handling dropouts (P=.316), was not reported in the journal article.

• Results section states, "The primary efficacy analysis was carried out on a last observation carried forward basis." Other details were unclear. Please see related footnotes for this study.

L The word "primary" was used in the article, but only in reference to the rating scale, not to the statistical analytic method.

M Methods section of article states, "The a priori primary treatment comparison was the contrast between duloxetine 120 mg/day and placebo on change from baseline at week 8 (visit 8) on the HAMD17 total score. Longitudinal efficacy outcomes were primarily analyzed using a likelihood-based mixed-effects repeated measures approach (MMRM)." However, see footnote […].

• Methods section states, "Primary efficacy outcome measures specified in the study protocol included the HAMD-17 and the Clinical Global Impressions-Improvement (CGI-I) scale ratings." It did not state how the protocol said that the data obtained from these scales should be analyzed. Results section states, "The primary efficacy measure used in these analyses is the number of depressed patients classified as treatment responders." [italics added for emphasis] Thus it was unclear from the article's wording alone whether the analyses reported were the same as or different from those prespecified in protocol. (FDA reported a nonsignificant result according to the primary method, which did not involve treatment responders.)

O "Primary" used in reference to rating scale but not in reference to details of statistical analysis.

P FDA review shows "all efficacy patients" analysis using LOCF technique with P=.04, consistent with the P<.05 result reported in the journal article. However, this gave different results from the P=.25 obtained using the primary intention-to-treat (ITT) analysis in the FDA review (see footnote Z).

Q "Efficacy analysis" result reported on in the journal article could not be found in the FDA review. It was unclear from the article how dropouts were handled in this analysis. It appears that the LOCF approach was not used (footnote E), suggesting that an observed cases (OC) approach was used. However, the results highlighted in the journal article do not agree with the OC analysis conducted by FDA. The results section of the journal article states, "The mean total scores decreased at week 2 and onwards in both the citalopram and placebo groups (P<.05)." The FDA's OC analysis found that the difference between citalopram and placebo was nonsignificant at weeks 2 and 4 (P=.374 and .709, respectively) and became significant only at […].

• The FDA did not conduct any analyses using the "efficacy subset" definition put forth in the journal article. (See also footnote […].)

• The result obtained using mixed model repeated measures (MMRM) analysis was not reported in the FDA's review of this study, which only reported the result using the LOCF analysis. (See also footnote BB.)

T The FDA review states, "The results for the weekly [OC] analysis were inconsistent. For some variables, fluoxetine was significantly better than placebo for some variables [sic] toward the latter weeks while placebo was occasionally nonsignificantly better than at earlier weeks." The FDA review did not report the P value from this analysis, which was the one highlighted in the journal article. (See footnote DD.)

• The "evaluable patients" analysis reported on in the journal article (see footnote AA) was not reported on in the FDA review.

V Two similarly designed 20-site studies in the US and Canada were pooled into a single analysis in the journal article, with an overall positive result. As shown in Appendix Table A, the FDA review indicates that one of these two studies was negative (P=.254). See also footnote LL.

W This study involved two centers, both in Houston, Texas. Center 1 did not show a significant difference for nefazodone (P=.97) or for imipramine (P=.53) vs. placebo. Center 2 showed a trend (P=.09) when analyzed according to protocol (ANOVA on change-from-baseline scores). The FDA reviewer argued that this study could be considered positive by ignoring Center 1 (as a failed sub-study) and by considering that the secondary outcomes provided "strong supportive evidence in favor of nefazodone with both the LOCF and OC results in agreement." The Division Director's memo stated agreement with this. The PI of Center 2 authored the paper shown in Appendix A. Its highlighted analysis was based on percentage of responders, different from the prespecified method noted above, and gave a significant result. The other center was not mentioned.

X The FDA's LOCF analysis included 147 patients, while the "efficacy evaluation group" included 133.

• Post-baseline depression ratings were conducted at Week 1 and Week 3 according to both FDA review and journal article. The FDA review reports the definition of ITT was those with any postbaseline data, i.e. those returning at Week 1. The journal defined "evaluable patients" as those with data recorded after at least 2 weeks. But since the next visit was at Week 3, this analysis required patients to have 3 weeks of data. Thus, patients dropping out between the Week 1 and Week 3 visits were excluded in the journal article but included in the FDA analysis.

Z According to the FDA review, the ITT Ns were 40 and 40 for paroxetine and placebo groups, respectively, vs. 35 and 36 for the "all efficacy" analysis. The FDA review also states, "The discrepancy between the ITT and the All Efficacy data derives from the 5 paroxetine and 4 placebo patients excluded."

AA Article states, "Evaluable patients were classified as those who took study medication on or after the 11th day of the double-blind phase, who had efficacy assessments on or after study day 11, and who were not major protocol violators." Analyses based on "evaluable patients" are shown in Figure 1 and Table 1, while those based on ITT patients appear later in Table 5.

BB The FDA statistical review of duloxetine states that the sponsor proposed MMRM as the primary analytic method in the protocol it submitted. However, the review also states, "Based on the letter issued by the Agency to the sponsor (dated January 11, 2002), this reviewer considered the LOCF analysis as the primary statistical analysis to evaluate the efficacy of duloxetine." (We did not find this letter among the FDA review documents.)

CC Same as for BB above.

DD An observed cases (completers) analysis was apparently used. Though not explicit in the text, Table 1 shows that the article's highlighted P value is at Week 5, where the Ns show significant attrition compared to the Ns at earlier weeks. Also, the Discussion section states, "Because of the slow onset of action, endpoint [LOCF] analyses were considered not appropriate for this data set."

EE Result highlighted in Figure 1 was obtained with random-effects mixed model (REMM) analysis.

• Efficacy findings presented first were obtained using an observed cases (i.e. completers only) analysis. Mean scores obtained using the LOCF method were reported but without associated P values or mention of their lack of significance. However, the LOCF scores were used in a "trend analysis" reported (as significant) subsequently in the article.

GG t-test was noted as the analytic method in the FDA review, so we assume that this was according to protocol. This is typically not the case, because it cannot account for possible effects of site or of treatment-by-site interaction.

HH The article presents results of a nonparametric two-sample Wilcoxon rank sum test. The FDA review was not clear on what test was used to analyze the data, but because it presented P values alongside mean change scores, it appears that the FDA used a parametric test (e.g. ANOVA, ANCOVA, or t-test).

II Though this result did not meet our criteria for apparent primary result, a positive dose-response effect was presented in the abstract, in the text of the results section, and in Table 2. This finding, obtained using the LOCF method of handling dropouts, was contradicted by the relative performance of the doses shown in Figure 1, which showed results obtained using the observed cases approach (see footnote FF). This figure showed that the 50-75mg dose was outperformed on both the HAMD and MADRS scales by the 25-mg dose (a lower dose not FDA-approved as effective).

JJ According to the journal article, the highlighted result of P<.10 one-tailed (equivalent to two-tailed P<.20) "should be interpreted in the light of the small sample size available." For comparison, the usual criterion for statistical significance is two-tailed P<.05 (equivalent to one-tailed P<.025).

KK The Ns shown in the journal article were larger than the Ns shown in the FDA review and in a report of the same study on the sponsor's website, www.lil[…]. The reason for this discrepancy is unclear.

LL The FDA analysis excluded Center 2/4, which had a remarkably high (compared to the other 19 sites) drug-placebo difference. The FDA reviewer stated that this raised the question of unblinding at that site. An audit by the sponsor found that the site had provided patients with a copy of the scale during the rating, contrary to instructions given at the investigator meeting. Including this site, P=.021; excluding this outlier site, P=.254. In the analysis presented in the journal article, data from this site […].

Appendix Table C. Effect size and standard error for individual FDA-registered
studies according to the FDA and to journal publications.

[The body of Appendix Table C did not survive text extraction. Its columns included Drug_name, Number, Study Number, g_journal, and SE_journal.]

Number column: Studies are listed in same order as in Appendix Table A. Period indicates missing data due to study not being published as stand-alone article.
g: Hedges' g
SE: standard error
pub: published version

Appendix Table D. Comparisons of Effect Size
[The body of Appendix Table D did not survive text extraction. It compared effect sizes for unpublished vs. published studies and, within published studies, journal vs. FDA results.]

a compare with Figure 3, Panel A
b compare with Figure 3, Panel B
c ancillary analysis with imputation -- please see supplementary methods
d signed-rank test
e rank-sum test
N = number of studies or drugs analyzed
g = Hedges's g (effect size), unweighted mean
s = standard deviation around mean of g
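The signed-rank and rank-sum comparisons reported in Appendix Table D, including the ancillary 12-pair analysis with null-hypothesis imputation, can be reproduced in outline as below. The g values are hypothetical stand-ins, not the study's data (which are in Appendix Table C):

```python
from scipy.stats import wilcoxon, mannwhitneyu

# Hypothetical per-drug mean Hedges' g values for the 10 drugs that
# had both published and unpublished studies:
g_published   = [0.41, 0.38, 0.52, 0.31, 0.47, 0.35, 0.44, 0.39, 0.50, 0.28]
g_unpublished = [0.12, 0.05, 0.20, -0.03, 0.16, 0.10, 0.18, 0.01, 0.22, 0.07]

# Paired (signed-rank) comparison across the 10 complete pairs
stat, p_paired = wilcoxon(g_published, g_unpublished)

# Ancillary 12-pair analysis: impute g_unpublished = g_published for the
# two drugs with no unpublished studies (assumes the null hypothesis holds);
# "pratt" keeps the zero differences in the ranking.
g_pub12 = g_published + [0.33, 0.26]
g_unpub12 = g_unpublished + [0.33, 0.26]
stat12, p12 = wilcoxon(g_pub12, g_unpub12, zero_method="pratt")

# Unpaired (rank-sum) comparison of the two sets of g values
u, p_ranksum = mannwhitneyu(g_published, g_unpublished, alternative="two-sided")
print(p_paired, p12, p_ranksum)
```

With these stand-in values the published g's uniformly exceed the unpublished ones, so all three P values come out small, mirroring the direction (though not the exact magnitudes) of the results in Appendix Table D.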


