1 a. Reaction to NRC article, April 29, 2014 (JF)
1 b. Reaktion auf den NRC Artikel, der am 29. April 2014 im Niederländischen NRC erschien (JF)
2. Letter by Prof. Dr. Liberman, May 4, 2014
3. Reaction to LOWI Report, May 11, 2014
4. Reaction to Science Article, May 29, 2014
6. Christmas Letter to Friends and Colleagues, December 24, 2014
7 a. Humboldt-Professorship, April 20, 2015
7 b. Humboldt-Professur, vom 20. April, 2015
8. Reaktion auf UvA-Notiz vom 03. Juni 2015/Reaction to UvA note June 3, 2015
9. Reaction to UvA Report, June 17, 2015
10 a. Erklärung Ehrengerichtsurteil
10 b. Statement Judgement by the Court of Honor
11. Criticism against Peeters, Klaassen and van de Wiel (2015) with respect to PSPB (2009)
11. Criticism against Peeters, Klaassen and van de Wiel (2015) with respect to PSPB (2009)
Dear colleagues and friends,
Recently, the University of Amsterdam approached the editors of PSPB and asked them to retract my paper:
Förster, J., Epstude, K. & Özelsel, A. (2009), Why love has wings and sex has not: how reminders of love and sex influence creative and analytic thinking. Personality and Social Psychology Bulletin, 35, 1479-1491.
The retraction request was based on an analysis authored by Peeters, Klaassen, & van de Wiel, referred to below as the PKW-report. The PKW-report flagged our paper as showing "strong evidence for low scientific veracity".
In a recent post, PSPB published a note of concern with regard to this paper and refrained from retracting it.
I agree with the decision and will summarize here some major concerns that my colleagues and I have with the methods used in the PKW-report in general and with regard to the specific PSPB paper.
I. Criticism of the PKW-report
Ideally, statistics used to detect “low scientific veracity” should be valid and conclusive. In the following, I will 1) summarize the criticism raised against the statistical method used in the PKW-report and 2) describe specific shortcomings of the analysis of the PSPB paper in question.
1) Criticism of the statistics used in the PKW-report
The large number of studies examined in the PKW-report made it possible to look deeper into the analyses that were used. My co-authors Prof. Dr. Nira Liberman and Prof. Dr. Markus Denzler published an extensive criticism of the report (https://errorstatistics.files.wordpress.com/2015/06/june-2015-update-j-forster.pdf; see also attachment 2).
I also responded to the PKW-report in my blog (http://www.socolab.de/main.php?id=66), drawing mainly on Liberman & Denzler’s work.
Let me summarize the criticism here. This summary is followed by an analysis that focuses specifically on the PSPB paper. Most of the points summarized here, however, also apply to the paper in question.
To summarize, in the PKW-report two methods are used:
1) A method newly developed by Prof. Dr. Klaassen, who is also one of the authors of the report (in the following the V-method).
2) A method that had been introduced by an anonymous complainant (in the following the C-method).
Note that in this report the “evidence” remains purely statistical. No further investigation (e.g., hearings or raw data analysis) took place.
One major problem is that the V-method has never been validated.
New methods should be scrutinized by independent experts, and in our field this is usually achieved through peer review. Please note that in 2012 the first ethics committee of the UvA that investigated my case explicitly asked the anonymous complainant to seek a scientific discussion of the analyses by publishing his/her methods in refereed journals. However, within 3 years neither of the methods has been published!
One may argue that nowadays publication on websites such as arXiv counts. Klaassen (2015), for example, published his V-method on this site. The problem, however, is that such papers are not scrutinized by experts before publication, although they may be criticized after the fact. Indeed, one is welcome to visit https://pubpeer.com/publications/5439C6BFF5744F6F47A2E0E9456703 to see very serious criticisms of the Klaassen (2015) paper.
To fill this void, my co-authors included a validation attempt, examining other papers that are unlikely to contain "low veracity" data. They chose, for example, a study by Blanken, van de Ven, Zeelenberg, & Meijers (2014) that was part of the "open science framework" replication attempt, and found a number of V values that would classify these findings as having "low veracity". There is probably no situation in which faking is more unlikely.
Liberman & Denzler’s analysis includes a long list of detailed criticism against the methods. They show that the methods cannot be used to detect fraud or “low scientific veracity”. I strongly recommend reading this paper. I will only briefly summarize the main arguments here:
a. The V-method is portrayed as "standard procedure in Bayesian forensic inference." In fact, it diverges from Bayesian inference on a critical point. A core characteristic of Bayesian inference is that evidence can either increase or decrease confidence in prior beliefs, whereas with the new method evidence can only strengthen the belief in “low veracity”. That is, the more studies a paper contains, the more likely the judgment becomes “low veracity”. Dr. Hannes Matuschek (2015) confirms this observation. He recently published a criticism of the V-method (unfortunately, again on arXiv) and came to similar conclusions: “A third problem is the assumption that the product of the evidence provided by every single experiment in an article can serve as a metric of evidence for data manipulation in this article. As mentioned above as well as in the comments to the article at pubpeer.com and in a response by Denzler and Liberman, this assumption implies that the evidence for data manipulation grows exponentially with the number of experiments even under H0. The probability of V ≥ 2 for a single experiment is about p ≈ 0.25. Thus, about every 4th good experiment will double the evidence for data manipulation.”
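To make the arithmetic behind Matuschek's remark concrete, here is a minimal sketch in Python. It uses only the figure he quotes (a chance of about 0.25 that a single sound experiment yields V ≥ 2) and assumes, purely for illustration, that experiments are independent. The function names are hypothetical, and the sketch does not compute V itself.

```python
# Per-experiment chance of V >= 2 under H0, as quoted from Matuschek (2015).
P_V_AT_LEAST_2 = 0.25


def expected_flagged(k, p=P_V_AT_LEAST_2):
    """Expected number of sound experiments with V >= 2 among k experiments."""
    return k * p


def prob_at_least_one(k, p=P_V_AT_LEAST_2):
    """Chance that at least one of k independent sound experiments has V >= 2.

    Independence between experiments is an assumption of this sketch.
    """
    return 1.0 - (1.0 - p) ** k


for k in (1, 2, 4, 8, 12):
    print(f"{k:2d} experiments: expected with V >= 2 = {expected_flagged(k):.2f}, "
          f"P(at least one) = {prob_at_least_one(k):.2f}")
```

Under these assumptions the expected number of such experiments grows in proportion to the number of experiments in a paper, which is the sense in which longer papers are more exposed to a “low veracity” judgment.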
b. The criteria for fraud are too inclusive. In Denzler, Förster & Liberman (2009), for example, 2 out of 17 V values were above the critical value of 6. There is, however, a 0.40 chance of wrongly finding two or more values above 6 among 17 computed values. Matuschek (2015) confirms the problem “that the critical value of V ∗ = 6 chosen by the authors, which implies (asymptotically) p ≈ 0.08. Arguably, this is a rather high probability of falsely accusing a colleague of data manipulation”.
c. Both methods rest on the wrong assumption that dependence of variances necessarily indicates fraud, whereas in real experimental settings many (benign) reasons may contribute to such dependence. Here is an example. Suppose you have a paper-and-pencil experiment with two conditions, A and B, and plan to run 20 participants in each condition. Your assistant prints out all 40 questionnaires in alternating order (ABABAB), then gives each of four experimenters a quarter of the pile, thereby forcing exactly five participants per condition for each experimenter. If there is an experimenter effect (which is fairly common in our field, for example if you measure willingness to help and the experimenters differ in how friendly and attractive they appear), this procedure would increase the likelihood of getting correlated errors in the two conditions. As a result, this procedure would increase the “evidential value” (V coefficient) for “low veracity” as formulated by Klaassen (2015).
This procedure is unproblematic, because it does not increase the likelihood of obtaining significant differences between the means of the groups. Nevertheless, it does in fact violate random assignment, and creates dependency between measurement errors.
Here are a number of additional relatively common and benign practices (and accompanying sources of systematic error) that may contribute to the correlation of errors between experimental groups, without invalidating any findings regarding differences between the means of those groups:
(1) You make sure to run an equal number of men and women in all groups (and there is a gender effect).
(2) You run an equal number of participants from each condition in each room (and there is a room effect).
(3) You want to look at the results half-way through the study and force the conditions to have an equal n at that stage (and the time of running the study is a source of variance, a well-documented effect in psychology. For example, more conscientious participants typically participate in experiments earlier in the semester).
(4) You run an equal number of participants from each group in a certain period of time (each day, each week) and time introduces variance, e.g., due to changes in weather (which might affect mood), political events (people get worried and distracted in time of war) or other events (e.g., students become impatient/anxious before exams).
Needless to say, one could easily come up with many other examples. The important point is that in real research settings there are many (benign) violations of random assignment, any one of which, and even more so their combination, would introduce dependence between errors of the measured dependent variable in experimental groups and would therefore inflate the value of V.
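The experimenter example can be illustrated with a small simulation. The following is a minimal sketch in Python under stated assumptions, not a reproduction of the V-method: every number in it (effect sizes, standard deviations, number of simulated blocks) is hypothetical, and it only shows that a shared experimenter effect makes the errors of conditions A and B move together when each experimenter runs the same number of participants in both conditions.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def simulate_block_means(experimenter_sd, n_blocks=8000, per_cell=5, seed=1):
    """Correlate the A and B cell means across many experimenter blocks.

    Each block is one experimenter running `per_cell` participants in A and
    `per_cell` in B. All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    means_a, means_b = [], []
    for _ in range(n_blocks):
        # The experimenter's shift is shared by both conditions in the block.
        shift = rng.gauss(0.0, experimenter_sd)
        a = [shift + rng.gauss(0.0, 1.0) for _ in range(per_cell)]
        # Condition B has a true effect of 0.5; it does not affect the correlation.
        b = [0.5 + shift + rng.gauss(0.0, 1.0) for _ in range(per_cell)]
        means_a.append(sum(a) / per_cell)
        means_b.append(sum(b) / per_cell)
    return pearson(means_a, means_b)


corr_with_effect = simulate_block_means(experimenter_sd=1.0)
corr_without_effect = simulate_block_means(experimenter_sd=0.0)
print(f"correlation with experimenter effect:    {corr_with_effect:.2f}")
print(f"correlation without experimenter effect: {corr_without_effect:.2f}")
```

In this sketch the true difference between A and B is the same in both runs. Only the dependence between the errors of the two conditions changes, which is the kind of pattern the text argues can arise without any manipulation.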
d. In addition to these points, I would like to suggest that the file-drawer effect (i.e., publishing only the strongest experiments) could further inflate V values (as well as indexes of “fraud” in the C-method). Indeed, back when the studies in question were done, I chose to report only my best studies. I would run a few studies testing the same basic prediction, and if more than one of them worked, that is, yielded the predicted result, I would report only the strongest. This has been common research practice among many of my colleagues. Reviewers of papers would sometimes ask authors to exclude some studies to avoid repetition and redundancy. Some editors, too, would even ask authors explicitly to take out the weaker studies. You can check this by simply reading almost any article published at that time: you will not find many studies with null or marginal results. The norms may have changed in the last few years, but it was definitely a prevalent practice some years ago. Would it be reasonable to retract all papers from that period that appear suspicious because of the file-drawer effect? Methods to detect unlikely patterns should be adjusted for this aspect. Moreover, they should be adjusted to reflect combinations of practices that can potentially inflate indications of low scientific veracity (e.g., file drawer plus study practices that could have introduced dependency of errors between conditions).
Last but not least, note that UvA did not analyze all the papers I published during my employment there. This inflates V values as well. Table 17.4 of the PKW-report lists 9 papers but fails to list the 13 papers that were published together with Dutch colleagues at UvA. This amounts to purposefully selecting papers that yield high V values, which is methodologically questionable. Needless to say, it inflates the critical V values of the papers under consideration.
To sum up, the methods as used in the PKW-report likely overestimate “low veracity”. In general they are not valid “forensic statistics”, and they cannot be used to detect fraud or data manipulation. To reiterate: the UvA asks for a retraction of the PSPB paper merely on the basis of these methods, which are questionable not only in my view but also in the view of colleagues and statisticians. Note that the Honor Court of the German Psychological Association (Ehrengericht der Deutschen Gesellschaft für Psychologie), which recently investigated my case, qualified these statistics as premature (see below 10.a and 10.b).
2) Specific shortcomings in the analyses used in the PSPB study
The general criticisms I raised above apply to the PSPB paper in question. In addition, a few other problems present themselves:
The paper includes two studies. Study 1 shows that priming with love enhanced creativity, whereas priming with sex decreased it. Conversely, love priming decreased analytic thinking and sex priming increased it. Study 2 was a replication of Study 1.
Let me list the reasons why I think that the PKW-report's evaluation of this paper as showing “low scientific veracity” is unjustified.
a) Sample size: how many studies have to be analyzed in a paper to reliably indicate “low veracity”?
Matuschek’s (2015) independent criticism against the V measure mentions that “the reliability of the V value for a small n remains unknown (as well as how large a large n must be to be considered large)” (note: n here refers to the number of analyzed experiments in the paper, not the number of participants in each study). He further says: “Thus, the V-value approach can serve as a test for sample correlations, if it is applied across several identical or at least similar experiments. In this case one is also able to decide whether the variability in the results is suspiciously small or not.”
It is unquestionable that 2 studies are not sufficient to judge a paper as “low scientific veracity”.
b) Dependency of measures
The analyzed studies had two or three within-subjects measures, creative and analytic performance (and in Study 2 global-local-processing), which were dependent on each other (all are performance measures and some dependence is to be expected). Therefore, one cannot treat the two high V values as independent from each other. Statistical analyses have to be adjusted for this natural dependency, but in the PKW report they were not. The analysis used is simply not suited for dependent measures that are correlated.
Five V values were computed for this paper. Two of them came out as high, exceeding the threshold of 6. I calculated the probability of obtaining two or more “substantial” V values among 5 calculated values purely by chance, given the assumptions of linearity and independence of errors (based on Klaassen et al.’s figure of 0.0809 per study; http://stattrek.com/online-calculator/binomial.aspx). This false alarm rate is .05549. That is, adopting the UvA recommendation would mean falsely retracting around 5.5% of published papers.
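The false alarm rates quoted in this letter can be checked with a few lines of Python instead of an online calculator. This is a minimal sketch that assumes, as the calculation above does, that the V values are independent and that each exceeds the threshold of 6 with probability 0.0809. The function name is hypothetical.

```python
from math import comb

# Per-value chance of exceeding the threshold of 6 (Klaassen et al.'s figure).
P_SINGLE = 0.0809


def prob_at_least(k, n, p):
    """Binomial tail: chance of k or more exceedances among n independent values."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# PSPB paper: two or more of 5 V values above the threshold.
pspb_rate = prob_at_least(2, 5, P_SINGLE)
# Denzler, Förster & Liberman (2009): two or more of 17 V values above the threshold.
denzler_rate = prob_at_least(2, 17, P_SINGLE)

print(f"PSPB paper, 5 values:  {pspb_rate:.5f}")
print(f"Denzler et al., 17 values: {denzler_rate:.2f}")
```

Under the independence assumption this should reproduce the two rates given in the text, roughly .055 for 5 values and roughly .40 for 17 values. Dependence between the V values would change these numbers, which is the point of the list that follows.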
The actual rate of false alarm, however, is probably much higher, because of the following:
1. The two V values were actually dependent.
2. The V values were likely inflated by the file-drawer effect.
3. The V values were inflated by the selection of papers (the UvA report examined only some of my papers).
4. The V values could have been inflated by benign violations of random assignment (I.1.c).
Is this "strong evidence" for low data veracity? I do not think so.
I do not think that the analyses provide any clear evidence that the findings are unreliable, or any evidence for fraud. The PKW-report relies on invalid statistics and draws biased conclusions, and should thus not be the basis for any retraction.
Regards, Jens Förster
10 a. Erklärung Ehrengerichtsurteil
Sehr geehrte Damen und Herren, liebe Freunde und Kollegen,
da über das Verfahren, das gegen mich im Jahr 2015 vor dem Ehrengericht der Deutschen Gesellschaft für Psychologie geführt wurde, verschiedene missverständliche und falsche Äußerungen getätigt wurden, möchte ich dazu folgende Erklärungen abgeben:
Das Ehrengericht der Deutschen Gesellschaft für Psychologie hat die gegen mich erhobenen Manipulationsvorwürfe nicht bestätigt, sondern das Verfahren eingestellt. Vor dem Ehrengericht wurde in einem rechtsstaatlichen Verfahren die Aussagekraft der statistischen Verfahren, die angeblich Manipulationen reflektieren, genauso diskutiert wie die Tatsache, dass ich aggregierte Daten gespeichert hatte – letzteres auf dem Hintergrund der Archivierungs-Regeln, die zur Zeit der Datenerhebung galten. An dem Verfahren waren renommierte Experten der Statistik und Psychologen anderer Subdisziplinen beteiligt. Gegenstand waren vor allem zwei Artikel, die im Journal of Experimental Psychology: General im Jahre 2009 und 2011 erschienen waren. Eine Analyse von Peeters, Klaassen und van de Wiel bezüglich weiterer Artikel wurde nicht mit einbezogen, weil sie von internationalen Experten stark kritisiert worden war. Die Gutachter konnten diese Kritikpunkte nachvollziehen.
Das Ehrengericht kam nach eingehender Prüfung zu dem Schluss, dass die statistischen Befunde nicht ausreichten, um die Schlussfolgerung zuzulassen, die Daten seien durch mich oder irgendeinen meiner Mitarbeiter manipuliert worden. Es wurde kein Schuldvorwurf gegen mich erhoben.
Das Verfahren endete, ohne dass irgendwelche Sanktionen gegen mich verhängt wurden. Als Sanktionen wären eine Verwarnung, ein Verweis mit oder ohne Geldbuße, ein Ausschluss auf Zeit oder ein dauerhafter Ausschluss in Betracht gekommen (§ 9 der Ehrengerichtsordnung).
In der mündlichen Verhandlung vor dem Ehrengericht am 19. November 2015 wurde vom Gericht schließlich folgender Vergleich vorgeschlagen:
1. Das Ehrengerichtsverfahren der Deutschen Gesellschaft für Psychologie gegen Herrn Prof. Dr. Jens Förster wird eingestellt.
2. Prof. Förster verpflichtet sich, bei den Herausgebern des Journal of Experimental Psychology darauf hinzuwirken, die in dem genannten Journal 2009, S. 88-111 und 2011, S. 364-389 veröffentlichten Beiträge des Autors zurückzuziehen.
3. Mit diesem Vergleich sind weder ein Schuldeingeständnis von Prof. Förster noch ein Schuldvorwurf seitens des Ehrengerichts verbunden.
4. Die Deutsche Gesellschaft für Psychologie stimmt mit der Ruhr-Universität Bochum eine gemeinsame Verlautbarung ab, in der sie in einer Pressemitteilung den Text des gerichtlichen Vergleichs veröffentlichen.
Dieser Vergleich wurde sowohl von mir als auch von der Präsidentin der Deutschen Gesellschaft für Psychologie angenommen. Insofern bleibt noch einmal festzuhalten, dass die gegen mich erhobenen Manipulationsvorwürfe sich nicht bestätigt haben, sondern das Verfahren eingestellt wurde, ohne dass es zu einer Sanktionierung kam.
Mit freundlichen Grüßen, Jens Förster
10 b. Statement Judgement by the Court of Honor
Dear friends and colleagues,
I noticed that some wrong or misleading statements about the outcome of the proceedings against me before the Court of Honor of the German Society of Psychology (Deutsche Gesellschaft für Psychologie) were published on the internet. Let me clarify:
The Court of Honor did not confirm the accusations made against me, but rather discontinued the proceedings. Before the Court of Honor, in accordance with the rule of law, both the validity of the statistical analyses that were said to reflect manipulation and the fact that I had stored only aggregated data were discussed, the latter against the background of the archiving rules that existed at the time the data were collected. Renowned experts in statistics and colleagues from other sub-disciplines of psychology participated. The Court considered in particular two articles that appeared in 2009 and 2011 in the Journal of Experimental Psychology: General. A further analysis by Peeters, Klaassen, and van de Wiel of other articles that I (co-)authored was not considered, because of the criticism that international experts have raised against the methods. The reviewers found this criticism comprehensible.
After thorough examination of the case, the Court of Honor concluded that the statistical results are not sufficient to allow for the conclusion that the data were manipulated by me or by my co-workers. No accusation of guilt was raised against me.
The proceedings were discontinued and no sanctions were imposed. Possible sanctions would have been an admonishment, a reprimand with or without a fine, temporary exclusion, or permanent exclusion (§ 9 of the rules of the Court of Honor).
A settlement was reached between both parties on November 19, 2015 in an oral hearing (see here a translation of the official summary):
1. The Court of Honor suit against Prof. Dr. Jens Förster will be discontinued.
2. Prof. Förster undertakes to urge the editors of the Journal of Experimental Psychology to withdraw the two publications in question.
3. This mutual agreement represents neither an admission of fault by Prof. Förster nor an accusation of fault by the Court of Honor.
4. The German Society of Psychology will agree on a joint official statement with the Ruhr-Universität Bochum, in which they publish the content of the settlement in a press release.
The settlement was accepted by the president of the German Society of Psychology and by myself.
It remains to be said that the accusations of data manipulation made against me were not confirmed; rather, the proceedings were discontinued and no sanctions were imposed on me.
Kind regards, Jens Förster
9. Reaction to UvA Report, June 17, 2015
Dear Friends and Colleagues,
In the following, I would like to comment on the recent accusations by UvA’s Executive Committee. I will first describe the procedure and then criticize the methodology used. I am indebted to my co-authors, especially Prof. Dr. Nira Liberman and Prof. Dr. Markus Denzler, and to experts in the field of statistics who helped me to refute the accusations.
Background: What happened?
On 1 April 2015 around 2 p.m., the legal affairs department of the University of Amsterdam (UvA) sent me a 100-page report (authored by Prof. Dr. Carel Peeters, Prof. Dr. Chris Klaassen and Dr. Mark van de Wiel), containing an investigation of some of my studies using so-called “forensic statistics”. I was asked to respond before 2 April, 3 p.m. This was not possible, because I was out of office; my out-of-office reply was active. In addition, on 2 April my co-authors were contacted by Prof. Dr. De Groot, who told them that UvA intended to ask editors to retract articles that the commission had flagged with “strong statistical evidence for fabrication” or “questionable veracity”. I received a similar letter from the legal affairs department.
My co-authors asked for the report, which had not been attached to their email; they wanted to examine it themselves. After an intense email exchange with Prof. Dr. De Groot, they eventually received the report. However, they were also explicitly asked not to comment on it. The co-authors protested again and eventually we were all given 2.5 weeks to respond. I sent my response in time, and the co-authors independently wrote a letter to UvA that I attached to my response. Note that in order to re-analyze some of the data, we needed code used by the statisticians. We received this code only four days before the deadline. Nevertheless, even under this immense time pressure we were able to show that the method used in the investigation was fundamentally flawed. We told UvA explicitly that the report was biased and that the statistics used were not valid. The new method provided no evidence for any violation of academic integrity. Both my co-authors and I explicitly said that publication of this flawed report could damage our reputation.
While UvA was still processing our response, on 20 April, I received requests from the journalist Frank van Kolfschooten, asking me about my reaction to the “Klaassen-report”. Thus the press was informed about the report even before we were given a chance to respond to it. We received a modified report and a response letter from the statisticians on 2 June. In the email that accompanied the letter, I was informed that on the basis of the new report, UvA would immediately ask editors to retract 8 of my papers and to consider retraction of 3 more.
On the same day, a press release appeared on UvA's official web page announcing the report and summarizing its conclusions: UvA recommended that 11 of my papers be retracted or considered for retraction. The same evening Frank van Kolfschooten published an article on this report in Science magazine. The UvA report was not sent to us ahead of time. Neither the co-authors nor I had the time to prepare a response. In no time, the UvA report was leaked and appeared on the internet. Due to other urgent obligations (workshops, teaching, the approaching and fixed deadline for my new book), this letter needed some time. However, note that even if this report had arrived during vacation time, a response to a 109-page report needs more than a day. Again, UvA put us under severe pressure.
The co-authors and I discussed the response by the authors of the report to our criticisms. However, we all agreed that our main criticisms were not refuted at all by their response. The main change in the new report concerned some softening of style (e.g., the old version contained sentences like: “eradication of the tumors that fraudulent publications are, is of the utmost importance”). The authors also toned down their conclusions. For example, the authors agreed that the former category of “strong statistical evidence for fabrication” was inappropriate and changed it to “strong evidence for low data veracity”, and the category of “questionable veracity” was changed to “inconclusive evidence for low veracity”. This change reflects the authors' acknowledgment that their methods cannot distinguish between fabrication and QRPs. This change, however, was not reflected in UvA's decisions. They still announced that they would ask for retraction of papers in the first category, and ask journals to consider retraction of articles that were found to show "inconclusive evidence for low veracity".
Even though our initial response letters helped to improve the tone of the new report, I cannot accept its conclusions. Like my co-authors, I do not think that this analysis calls for retraction of any of the articles in question.
To repeat, I never manipulated data and never motivated my collaborators to do anything that is ethically questionable. I saw blogs saying that I admitted having used ethically questionable research practices. This, however, is not true. I never said that and I never did that. You can check my entire blog below; it contains all I said about the affair.
Needless to say, the entire course of events raises many questions about the ethical standards of the procedure. I would regard it as a minimum standard that an accused person is informed about the investigation, the authority that conducts it and the people who participate in this commission (Psychologists? Statisticians? Administrators? A competitor on grants? A former student?). The report sent on 2 June was not accompanied by an official letter, and the announcement of retractions was communicated in an email signed “On behalf of the Executive Board”. There should also be room for communication, reaction and feedback. I think, for example, that my request to include an international social psychologist in the team writing the report should have been respected. Neither I nor my co-authors had adequate time to read the long report and the statistical paper on which it was based. Please also note that in 2014 I had sent a letter to UvA asking to start an official re-appeal procedure (which is not clearly described in UvA’s rules) and they never responded to it. Again, the press and the rector of Ruhr-University Bochum were informed about the report without my permission being asked, violating normal standards of data protection.
Let me now share my doubts with you about the methods used in the report.
Concerns about the Methods Used
Let me now summarize why the co-authors and I think that the analyses are flawed and do not allow for any conclusions with regard to violations of academic integrity. Experts in statistics helped tremendously and allowed for a fresh view on the accusations. Moreover, the large number of studies examined now made it possible to look deeper into the analyses that were used.
For me the conclusion is clear: The methods are flawed, biased and do not allow for any conclusions related to data manipulation. Interestingly, as a reaction to our criticism, in the new version of the report the authors added the following paragraph:
“Note that the methods employed cannot demarcate witting practices (such as fraud and manipulation) from unwitting practices (such as erroneous or questionable research practices) leading to low veracity of the reported data. The question is if the veracity of the data on which a given publication is based can be deemed sufficient. If the data patterns are, from a statistical standpoint, extremely unlikely, the veracity of the reported data is in doubt. Whether such data patterns are due to witting or unwitting practices then, is of secondary importance: Of main import is that the data are to be met with distrust, calling into question the scientific value of the publication.”
Even that disclaimer, however, is insufficient. My co-authors suggest a long list of benign violations of random assignment of participants to study conditions, which could have inflated indexes of "low veracity" without invalidating any findings about differences between the means of the study conditions (see Point 3 in the letter by Liberman and Denzler, https://errorstatistics.files.wordpress.com/2015/06/june-2015-update-j-forster.pdf).
After almost 3 years of investigation there is no evidence of any violation of academic integrity.
2. The Methods Used Were Neither Peer-reviewed Nor Validated
Basically, the report uses two methods:
1) A newly developed method by Prof. Dr. Klaassen (in the following the V-method) who is also one of the authors of this report.
2) A method that had been introduced by the complainant of the first complaint (in the following the C-method). These measures are introduced as a multiple methods tests that would increase confidence in the conclusions, preventing method-specific errors. Multi-methodological testing is indeed a basic requirement given the severity of the accusations and its consequences, because statistical methods, like any other measurement, could have a variety of problems and suffer from at least measurement errors. The two methods, however, are very similar to each other. Indicators of fraud in the C method, high values of pΔF, tend to agree with high “evidential values” of fraud, V, in the V method. The Spearman correlation between the two metrics is no less than .96 in the target studies (and .98 in the comparison studies).
Again, in this report, the “evidence” remains purely statistical. One major problem is that the V-method has never been validated; both methods are still unpublished – this is uncommon in science. New methods should be scrutinized by independent experts and this is usually achieved in our field through a peer-review process. Please note that the first ethic commission by UvA investigating my case explicitly asked the authors of the analyses to seek a scientific discussion via publishing their methods in refereed journals. This might have been a reason why the V-method, even though it was used in the first complaint as well, never played a role in the former investigation. Rather, it was basically ignored in the first evaluation by UvA (which was a “not guilty” decision). Now, the same analysis is reason for retraction of 8 or more papers? Why? Nothing has been added in the meantime to improve the V method! No further evidence has been delivered so far! Within 3 years neither of the methods was published!
One may argue that nowadays publication on websites such as Arxiv, where Klaassen (2015) is placed, counts. However, the problem is that such papers are not scrutinized by experts before publication (although they might be criticized after the fact. Indeed, one is welcome to visit https://pubpeer.com/publications/5439C6BFF5744F6F47A2E0E9456703 to see very serious criticisms of the Klaassen (2015) paper. ) Note that the entire discussion of my case occurred on blogs. I am not sure that this is an appropriate way to discuss our research, especially if it remains the only way of discussion.
To fill this void, my co-authors included a validation attempt, investigating other papers that are unlikely to contain "low veracity" data. They chose, for example, a study by Blanken, van de Ven, Zeelenberg, & Meijers (2014) that was part of a replication attempt and found null results; there is probably no situation in which faking is less likely. Notably, the V-method produced false alarms, flagging some result patterns as suspicious!
The V-method is invalid – it likely produces false alarms.
3. Selection of Criteria: what counts as "suspicious"
The authors of the report claim that the analyses were found useful in prior investigations of Diederik Stapel and Dirk Smeesters. My co-authors looked into these investigations and found that in the Smeesters case, the threshold for questionable results was a V value of 9. For my studies, this threshold has been lowered to 6. If one used the value of 9, many of my papers would not have been flagged with “low veracity”. For example, for the Förster 2009 JEP G paper, a threshold of 9 as used in other investigations would have lowered the number of suspicious cases from 7 to 2; the paper would not have been classified as “strong evidence for low data veracity” if the threshold had been 9.
But there are further problems with the method and its application to my case.
4. Doubts on the Strength of Conclusions
Please remember that the main experts of the original complaint did not state that the patterns of the means and the raw data, which they examined at length, allow for strong conclusions. In his evaluation of our 2012 SPPS paper, which was under accusation of data manipulation, the major (and only) reviewer for the LOWI said: “It is of course possible that the observed pattern was obtained by measurements”, and “In fact, the numbers in the data files represent possible values for each individual data point, and these are the numbers that lead to the observed pattern”. He concludes: “I emphasize that from the data files one can in no way infer that […] adjustments have actually been done. Nor can be said when and by whom such adjustments would have been done.” Again, the reviewer I cite here checked the raw data of the study in question.
Moreover, Prof. Dr. Uli Schimmack concluded that the results could have been the outcome of QRPs (rather than fabrication) that were rather common at that time. I never changed any data so the only conclusion left based on these analyses is that collaborators in my lab might have used methods that were not considered unethical back then.
Admittedly, Schimmack later claimed that it is highly problematic that I cannot say what happened to the data, and that this alone would call my academic integrity into question. However, I do not think that the fact that I do not know which ethically appropriate methods people in my lab could have used warrants the conclusion that I am unethical, or that the possibility Schimmack proposed should be refuted. I do not know what produced the strange data patterns. I am willing to collaborate, and I have collaborated, with any balanced investigation of these patterns. I am willing to accept Schimmack's hypothesis as a possibility, but I simply cannot, based on the knowledge that I have, confidently affirm or refute it. I know only what I did and did not do; I do not know what my lab workers did. I can only hypothesize and guess.
However, I have no reason to mistrust my collaborators – and I am simply not convinced by the so-called “forensic statistics” used to test my studies.
5. Severe Criticism against the Methods Used
The co-authors’ letter includes detailed criticism of the methods. They show that the methods cannot be used to detect fraud and do not show any evidence for “low veracity”. I strongly recommend reading this letter (https://errorstatistics.files.wordpress.com/2015/06/june-2015-update-j-forster.pdf). I will only briefly summarize the main arguments here:
1. The V-method is portrayed as "standard procedure in Bayesian forensic inference." In fact, it diverges from Bayesian inference on a critical point. A core characteristic of Bayesian inference is that evidence can either increase or decrease confidence in prior beliefs, whereas with the new method evidence can only strengthen the belief in “low veracity”. That is, the more studies are in a paper, the more likely the judgment will become “low veracity”.
2. The criteria for fraud are too inclusive. In Denzler, Förster & Liberman (2009) for example, 2 out of 17 V values were above the critical value 6. There is however a 0.40 chance to wrongly find two or more values above 6 in 17 computed values.
3. Both methods rest on a wrong assumption that dependence of variances necessarily indicates fraud, whereas in real experimental settings many (benign) reasons may contribute to such dependence.
4. Between-subjects designs of 3 × 2 (or more) cannot be treated as two (or more) independent experiments with three levels. Sometimes experimental conditions are simply not independent, as was pointed out in (3), even in between-subjects designs. Note also that it cannot be avoided that the same participant takes part in different (similar) studies, because for reasons of confidentiality and protection of privacy we cannot archive lists of participants; if a participant takes part several times, the studies are not independent of one another.
5. The committee often applied the two methods to “control variables,” for which experimental effects were neither predicted nor found. This application is wrong, and may give rise to an especially high rate of false indications of fraud.
6. The new method appears to be too sensitive to minute changes in values (changes that are within the boundaries of rounding). If, for example, you have the three means 0.25, 0.13, and 0.02, this results in V = 7.28 ("low veracity"). Had you instead chosen 0.254, 0.125, and 0.02 (note that the difference is just a difference in rounding), the value would be V = 3.09 (no evidence of low veracity at all).
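The arithmetic behind point 2 can be made concrete with a minimal sketch. It treats each computed V value as an independent trial that exceeds the threshold of 6 by chance with some fixed false-alarm rate. Neither the independence nor the rate itself is stated in this letter: the value 0.08 used below is an assumption, chosen only because it reproduces the quoted 0.40 chance for 17 values. The function name `prob_at_least` is likewise hypothetical.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p).

    n: number of V values computed for a paper
    p: chance that a single V value exceeds the threshold by accident
    k: number of threshold crossings we ask about
    """
    # Sum the probabilities of 0 .. k-1 crossings and take the complement.
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


# ASSUMPTION: per-value false-alarm rate, not given in the letter.
ASSUMED_RATE = 0.08

# Chance of two or more V values above 6 among 17 computed values.
print(f"{prob_at_least(2, 17, ASSUMED_RATE):.2f}")  # prints 0.40
```

Under the same assumed rate, `prob_at_least(2, 27, ASSUMED_RATE)` comes to about 0.65, which is in line with the 65% figure quoted further below for 27 computed values. The sketch is only meant to show how quickly chance findings accumulate as more values are computed per paper; it is not the calculation used in the co-authors' letter.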
Adjusting the V-method to file-drawer effects. In addition to the above points raised by my co-authors, I would like to suggest that the file-drawer effect (i.e., publishing only the strongest experiments) could in fact inflate V values (as well as the indexes of fraud in the previous method). Indeed, back when the studies in question were done, I chose to report only my best studies. I would run a few studies testing the same basic prediction, and if more than one of them worked, that is, yielded the predicted result, I would report only the strongest. This has been common research practice among many of my colleagues. Reviewers would sometimes ask authors to exclude some studies to avoid repetition and redundancy. Some editors, too, would even ask authors explicitly to take out the weaker studies. You can check this by simply reading almost any article published at that time: you will not find many studies with null or marginal results. The norms may have changed in the last few years, but it was definitely a prevalent practice some years ago. The methods used need to be adjusted for this file-drawer effect. They should also be adjusted to reflect combinations of practices that can potentially inflate indications of fraud (e.g., file drawer plus study practices that could have introduced dependency of errors between conditions). The methods as they are used now likely overestimate “low veracity”.
For more detail on most of these arguments, see the letter of the co-authors (https://errorstatistics.files.wordpress.com/2015/06/june-2015-update-j-forster.pdf). The co-authors also discuss several analyses of specific papers in the report, and show that these analyses are erroneous.
6. Selection of Studies
It is remarkable that in this report no papers were investigated that are co-authored by colleagues at UvA. The UvA has now announced that it will investigate all my other papers as well, probably because of my previous request. Putting the "suspicious" papers in context is important, because one might ask, for example, whether the "suspicious" papers were cherry-picked as especially "linear" among other papers. Statistics have to be adjusted to reflect this selection.
7. Examples of Inappropriate Biases in the Report
Prof. Dr. Nira Liberman observed some remarkable biases in the analyses.
It seems that with good old frequentist hypothesis testing came good old p-hacking. The authors of the UvA report use their own criteria very liberally. Here are a few examples of what they do when a study falls short of showing enough “incriminating” evidence:
1. Take in Vs lower than 6 as “substantive” (on p. 57, they refer to a V of 5.05 as “substantive”)
2. Take the higher-bound value of V instead of the lower-bound (on p. 82, a V between 3.84 and 12.38 is considered substantive).
3. Find a “suspicious” element post hoc. For example, the paper K.JF.D10 has only one V above 6, but is listed in the “consider retraction” category because there is an SD=0 in one of the conditions and the authors felt it is “peculiar” (p. 78). (By the way, there is nothing “peculiar” about this SD=0. In this study, participants classified metaphors and literal sentences as “metaphors” and “non-metaphors”. In the condition in question, each of the 15 participants simply correctly classified five out of five non-metaphors as “non-metaphors”.)
4. If reported means do not provide enough high V values, try pooling together different conditions (e.g., p. 58).
To illustrate the approach, let me zoom into two studies:
Förster, Liberman, & Kuschel (2008) has one value of 18.05, which pertains to a null result. It seems that after the authors of the report found only one V higher than 6 among the 20 values they computed for this paper, they resorted to "pooled results": they added together conditions to look for additional V values. Among the seven pooled results, they found one value of 18.13. Although the means are not provided, it obviously refers, once again, to null results. What could the authors do now? They decided to also count a V value of 5.05 (which also pertains to null results) as "substantive", relaxing their own criteria according to which only Vs higher than 6 should count as substantive (a p-hacking strategy). The probability of obtaining two Vs higher than 6 among 27 computed values is 65%.
Kuschel, Förster, & Denzler (2010) has only one "substantial" V. It ranges between 116 and infinity. This is indeed a very high number. As mentioned before, it is high because the authors computed the V when they shouldn't have: there is an SD of zero in one of the conditions (see the discussion between Prof. Dr. Nira Liberman and Prof. Dr. Richard Gill) due to discrete data. The SD=0 pertains to an accuracy rate on a fairly easy task, on which all 15 participants in one condition correctly classified five out of five items. This was intended, since accuracy was not the main DV in this study; the main DV was reaction time. Reaction times do not show suspicious linearity. Instead of counting this as zero evidence for low veracity, the authors of the report count it twice: once for the high V value (which they should not have computed to begin with) and once for the SD=0, which they unjustifiably deem "peculiar" (inventing a post hoc criterion for "low veracity" and demonstrating yet another p-hacking strategy).
Please also note that both papers fell into the category of “inconclusive evidence for low veracity”. Ignoring for a moment all the problems listed with the V-method: is it fair that UvA risked damaging our reputation because of papers that are labeled as showing “inconclusive (!) evidence for low veracity”? Apparently, for some strange reason, UvA thinks that it is in a position to do this, and that all this is just fine. I disagree.
And what does this imply? Will UvA now continue asking editors to retract papers without even communicating this to the authors? We examined some papers in the literature using the V-method and identified some high V values (that we think are false alarms). So, for example, the above-mentioned replication attempt by Blanken, van de Ven, Zeelenberg, & Meijers (2014) that produced null (!) results would have to be counted as an “inconclusive evidence for low veracity” case. Will there now be an automatism of the kind: “If papers are tested with the V-method and if the tests reflect ‘strong or inconclusive evidence of low veracity’ then the UvA Executive Board will simply ask editors to retract these papers”?
It also seems that UvA does not even shy away from retracting papers first authored by people that never worked at UvA!
In sum, I find the report and the entire procedure that led to its writing and publication unacceptable.
There is no concrete evidence of data manipulation; the statistical methods used are error-prone, flawed, and most probably invalid.
UvA has no reason for asking journal editors to retract my papers.
In the first letters that my co-authors and I sent to UvA in reference to the first version of the report, we explicitly pointed to the very high probability of false alarms and to the biases and inappropriate use of methods in the report. The authors and UvA decided to ignore these concerns.
Publication of the flawed report clearly and irresponsibly damaged my reputation and the reputation of my co-authors.
Regards, Jens Förster
P.S. As a reaction to the UvA report, I received more than 160 supportive emails from colleagues and friends who were upset about the procedure and encouraged me to hold on.
I have no time to respond to each of you but let me say here THANK YOU ALL. It feels good not to be alone.
8. Reaktion auf UvA-Notiz vom 03. Juni 2015/Reaction to UvA note June 3, 2015
Liebe Freundinnen, Freunde, Kolleginnen und Kollegen,
gestern veröffentliche die UvA eine Notiz, in dem von einem statistischen Bericht über einige meiner wissenschaftlichen Publikationen die Rede ist. Auf dieser Grundlage kündigte die UvA an, die Herausgeber der betreffenden wissenschaftlichen Zeitschriften anzuschreiben und sie zu bitten, einige Artikel zurückzuziehen. Meine Ko-Autoren und ich prüften kürzlich eine Vorversion dieses Berichts, den wir alle als voreingenommen und irreführend bewerteten und der keinerlei Beweise für Datenmanipulation liefert. Ich benötige ein wenig Zeit, um den neuen Bericht zu prüfen, den ich gestern Nachmittag das erste Mal zu sehen bekam. Zudem muss ich herausfinden wie ich mich überhaupt inhaltlich verteidigen kann, da ich zur strikten Geheimhaltung bezüglich der Email und des Berichts angehalten wurde.
Im Moment möchte ich nur meine tiefe Empörung über dieses Verfahren äußern, in dem über eine Analyse berichtet wird, auf die mir nicht einmal die Zeit gegeben wurde, angemessen zu reagieren. Die Intention der UvA ist mir vollkommen schleierhaft; ich weiß noch nicht einmal, aus welchen Personen sich die Kommission zusammensetzt.
Mit besten Grüßen, Jens Förster
Dear friends and colleagues,
Yesterday afternoon, the UvA published a note citing a statistical analysis of some of my articles. On the basis of the results, UvA announced that it will ask editors to retract some of my papers. My co-authors and I saw a previous version of the statistical report, which we all found biased, misleading, and lacking any evidence of data manipulation.
I will need some time to process the new report that I saw yesterday afternoon for the first time. Because I was sworn to secrecy with respect to the report and the email I received, I also need to figure out how I can defend myself without referring to the contents.
For now, I would like only to express my outrage at the procedure, by which the present report is published without allowing me time to prepare a response. UvA’s intention is completely unclear to me; I do not even know the names of the members of the commission who decided this.
Regards, Jens Förster
7 b. Humboldt-Professur
Sehr geehrte Damen und Herren, liebe Freundinnen und Freunde,
vor einiger Zeit habe ich mich dazu entschieden, die Alexander-von-Humboldt-Professur, für die ich im letzten Jahr auserwählt wurde, an die Stiftung zurückzugeben.
Diese Entscheidung traf ich in einer entspannten Lage. Ich hoffe, dass kaum noch jemand denkt, dass ich etwas Unethisches getan habe; Zweiflern empfehle ich die hoch-karätigen alternativen Erklärungen für meine zur Diskussion stehenden Befunde, die ihre Entstehung für ethisch unproblematisch halten. Weitere statistische Analysen werden folgen.
Dennoch befürchte ich, dass mich die Auseinandersetzung mit der Universität Amsterdam (UvA) weiterhin Kraft kosten wird. Zwar bewältige ich momentan den enorm erhöhten Arbeitsaufwand aufgrund der konstanten, unfairen Angriffe gut – ich habe trotz der Attacken ein Buch vollendet, ich habe ein innovatives Lehrprogramm an den Start gebracht, ich erhalte mehr Einladungen für Vorträge, Bücher und Artikel und Gutachten, als je zuvor. Ich meistere dies alles.
Die Organisation eines 5-Millionenprojektes mit ca. 50 Mitarbeitern ist unter diesen Umständen allerdings schwer zu bewältigen. Ich befürchte auch, dass mein Lebensrhythmus weiterhin von der UvA bestimmt sein wird. Die letzten drei Jahre über erhielt ich Anfragen von niederländischen Ethikkommissionen am 24. Dezember, an meinem Geburtstag, kurz vor Ostern.... Ich denke, dass ich auch in der Zukunft Urlaubszeiten damit verbringen werde, Briefe zu schreiben. Ich habe mich damit abgefunden, dass ich weiterhin ein beliebtes Forschungsobjekt niederländischer Statistiker sein werde. Ich habe mich daran gewöhnt. Ich werde ihre Anfragen zu ihrer Zufriedenheit beantworten und ich werde es überleben.
Beim Wandern in den Bergen mit meinem Mann zu meinem 50. Geburtstag, oben auf einem Gipfel, abseits von all der Hektik, stellte ich fest, wie viel Energie ich hatte. Ich hatte bei diesem Weg meine Höhenangst bezwungen, ich war wie ein Dreißigjähriger hochgeklettert, ich strahlte von innen, fühlte mich belebt, inspiriert und energetisiert.
In der Rückblende, einige Jahre zuvor, als ich noch an der Universität von Amsterdam arbeitete, war ich nervlich am Ende gewesen, unzufrieden, krank und traurig, selbst wenn ich im Urlaub war. Dieser Zustand hatte sich eingestellt, lange bevor an irgendwelche Anfeindungen zu denken gewesen wäre. Damals dachte ich, ich würde älter und das Leben hätte nicht mehr viel zu bieten. Auf diesem Berg atmete ich tief ein. Wie hatte mich doch das Leben in Köln und die Arbeit an der Ruhr-Universität wiederbelebt!
Ich will diese Energie für neue Projekte einsetzen, ich werde besser sein als je zuvor, ich habe ehrgeizige Pläne. Ich habe viel gelernt. Ich wurde mit Dreck beworfen, ich musste mich dort herausarbeiten. Es war eine grässliche Erfahrung, aber es war nützlich am eigenen Leibe zu erleben, wie viele Angriffe und Beleidigungen man ertragen kann, ohne daran zu Grunde zu gehen. Selbst in dem Dreck habe ich Gold für zukünftige Projekte gefunden und Erfahrungen gesammelt, die mir in meiner Forschung zur Selbstregulation und bei meiner Tätigkeit als Coach nützen.
Und ich kann Ethik-Kommissionen beraten, ihre Prozeduren menschlich und fair zu gestalten. Mir wurde bewusst, wie glücklich ich mich schätzen kann, Manfred, meinen Mann, an meiner Seite gehabt zu haben. Treffen ähnliche Angriffe jedoch auf Menschen in weniger günstigen Lebenssituationen, kann mit heftigen Reaktionen, einschließlich gesundheitlicher Beeinträchtigungen bis hin zu Suizidgedanken gerechnet werden. Als Psychologen müssen wir Schlimmeres verhindern – durch faire, demokratische Prozeduren und rechtsstaatliche Abläufe.
Meine wissenschaftlichen Pläne kann ich allesamt an der RUB umsetzen. Ich brauche dafür nicht viel – ich habe ja sogar in der unerträglichen Arbeitsatmosphäre an der UvA viel geschafft – und das selbst ohne Hilfskräfte, ohne eigenes Büro, ohne Budget und ohne eigene Sekretärin.
An der RUB habe ich viel bessere Bedingungen. Die Kolleginnen und Kollegen bieten mir hoch interessante Kooperationen an, meine Sekretärin, meine Mitarbeiterinnen und meine Hilfskräfte sind hervorragend, acht großartige Studierende arbeiten unentgeltlich als Forschungspraktikant/innen in meinem Labor, die Versuchspersonen erwarten keine finanziellen Belohnungen, weil sie uns unterstützen wollen und unsere Forschung für sie interessant ist. Hier lässt es sich leben: Menschen sprechen und kooperieren miteinander, die Hierarchien sind flach, alle Probleme werden schnell geklärt. Ich könnte hier ohne Probleme auch weiterhin Massen an Artikeln publizieren.
Aber hier kommt der springende Punkt: Das will ich gar nicht mehr. Nicht um jeden Preis.
Durch meine Arbeit an meinem neuen Forschungsthema „Haben und Sein“ „habe ich meine Lebenseinstellung maßgeblich geändert. Ich will die Jagd nach Publikationen, so wie es andernorts gelebt wird, hier nicht wiederholen. Ich will stattdessen aus der Breite meines Wissens schöpfen, ich will in die Tiefe gehen. Ich will mit meinen Arbeiten anregen und inspirieren und will die Dinge tun, die mich wirklich interessieren. Die Sozialpsychologie ist mehr als andere eine Disziplin, die bahnbrechende Theorien hervorbringen kann. Dafür benötige ich Zeit, Kommunikation mit anderen, und ein Wiedererwecken meiner Risikobereitschaft, jenseits der Trends und pragmatischen Erwägungen zu denken.
Ich will Sein statt Haben.
Ich lasse nun die immer materialistischere und seelenlose Produktionsweise hinter mir, die in unserer Wissenschaft vorherrscht. Und sage „Adieu“ zu grausamen zehn Jahren, in denen ich vor allem fremdbestimmt war. Ich mache jetzt mein eigenes Ding.
Ich fühle mich außerordentlich geehrt, überhaupt in die Reichweite dieses Preises gekommen zu sein und danke allen Mitarbeiterinnen und Mitarbeitern der AvH für die immerwährend wertschätzende Unterstützung und Betreuung. Ich danke ebenfalls allen Freundinnen und Freunden und allen Kolleginnen und Kollegen, die mich in meiner Arbeit bis hierher unterstützt haben.
Mit den besten Wünschen, Jens Förster
7 a. Humboldt-Professorship
Dear colleagues, dear friends,
Some time ago, I decided to return the Alexander von Humboldt Professorship, for which I was selected a year ago, to the Alexander-von-Humboldt-Stiftung.
I made this decision in a rather relaxed state of mind. I hope that nobody still thinks that I did something unethical, and I would urge those who still have doubts to read the recent excellent alternative explanations for my results under discussion, which show that such result patterns can be obtained by methods that are not problematic. More statistical analyses will follow.
Still, I am afraid that the conflict with the University of Amsterdam will continue to cost me a lot of energy. Surprisingly, I am coping quite well with the enormous workload caused by the constant and unfair attacks: despite the assaults, I finished my book, I set up an innovative teaching program, and I accepted more invitations for talks, books, articles, and reviews than ever. Indeed, I am managing all of this.
However, in such a situation, organizing a 5-million project with 50 co-workers is impossible. I am also afraid that the rhythm of my life will continue to depend on the UvA. Over the last 3 years, letters from Dutch ethics commissions arrived on December 24, on my birthday, and shortly before the Easter holidays…. I think that I will continue to spend my holidays writing letters. I have made peace with the idea that I will continue to be the most interesting research project of Dutch statisticians. I have gotten used to this. I will of course answer their questions thoughtfully and in detail. I will survive it.
Recently, when I was standing on top of a mountain after an exciting hike with my husband, away from the bustle of my daily life, I realized how strong I was and how much energy I had. During the climb, I had finally overcome my fear of heights, I had climbed the mountain as quickly as a 30-year-old, I was radiating, and I felt refreshed, inspired, and energized.
Looking back, some years ago, when I was still working at the University of Amsterdam, I was a bundle of nerves, dissatisfied with my life, sick and depressed, even when I was on vacation. I was in this miserable state long before there was any sign of any accusations. Back then I thought that I was just getting older and that life did not have anything more to offer me.
On top of this mountain, I took a deep breath. How much had changed in my life since I started living in Cologne and working in Bochum! I would like to invest this energy in my new projects. I will be better than ever; I have ambitious plans. Digging deeper into the dirt that I was confronted with, I actually found gold for future projects. It was an awful experience, but it was useful to observe how many insults and assaults a person can bear without collapsing. This experience is helpful for my research on self-regulation and for my work as a coach.
And I can help ethics commissions set up procedures that are humane and fair. I realized how privileged I am to have Manfred, my husband. In less favorable life situations, however, such unfair accusations can damage a person's health and even lead to suicidal thoughts, and we as psychologists have to prevent this. We need to install fair, democratic procedures that are in accordance with the rule of law.
My scientific plans can all be implemented at the Ruhr-Universität Bochum. I do not need much for this; I achieved a lot even in the unbearable work atmosphere of UvA, without an office of my own, without a budget of my own, and without a secretary or research assistants. At RUB I have much better conditions. My colleagues offer me fascinating collaborations; my secretary, my co-workers, and my research assistants are excellent; eight brilliant interns currently work in my lab for free; and students participate in our studies without asking for financial compensation because they want to support us and because they find our research important. This is a place where one can live: people speak to each other, they cooperate with each other, hierarchies are flat, and problems are solved quickly and efficiently. Here, I could easily write tons of papers per year.
But here comes the point: I do not want this anymore. Not at any price.
During my work on my new research project on “what having does to being”, I changed my approach to life completely. I no longer want to chase after publications as was the rule elsewhere. I would rather create theories from the breadth of my knowledge. I want to dig deeper. I would like to inspire others with my work and do the things that I am really interested in. More than other disciplines, social psychology can create groundbreaking theories. This needs time and communication with others, and it requires taking risks in thinking beyond trends and pragmatic considerations.
I will spend the rest of my life on BEING rather than on HAVING.
Thus, I will leave behind the materialistic and soulless production approach in science. And I want to say “Adieu” to 10 cruel years in which my life was almost completely determined by others. I am going my own way now.
I feel very honored to have come within reach of this award. I would like to thank the people at AvH for their constant and appreciative support and guidance through difficult times. I also want to thank all the friends and colleagues who supported me in my work.
Regards, Jens Förster
6. Christmas Letter to Friends and Colleagues
Dear friends and colleagues,
this year I am so afraid of leaving someone out of my Christmas greetings that I decided to use this blog.
I would like to thank you all for your support and your warm wishes during this year. On the one hand, it was a terrible year. The accusations against me and the way people tried to insult and break me were scary and unheard-of. I had heard about shit storms before, but experiencing one is really different. I never thought that people could do such evil things to me. But on the other hand, this is only part of reality. It is good to see that all the bad stuff remains on the net, and I followed your advice to stay away from the dirt. Fortunately, nobody criticized me in person and nobody wrote bad emails. In stark contrast, whenever I went to conferences or met colleagues, they were very supportive, hugged me, and encouraged me not to give up and to fight for it. They even praised me, and this was also new to me: I never thought that I had so many friends, and I never thought that so many people liked my work. I thank you for all this.
One of my papers was retracted, and as I said in my other blogs below, I left the decision to the editors. I accept the decision. I still do not think that data was manipulated, since I have neither reason nor evidence to believe so. But these events have unforeseen dynamics, and I understand this as an act of precaution that might have been necessary for a very young journal. Please note that the study itself was a replication of Friedman et al. 2003, so that all in all the theory predicting such effects is still intact. However, I am very sorry that all this happened, and I know that words are not really enough to help repair the reputation. Thus, I will actively work for the field.
Among other things, I accepted talks on “Improving Ethics Procedures in Science” and will thus share my experiences with a broader audience. Research ethics has always been important to me. We have to protect ethics commissions from biases in decision making, witch-hunting practices, and dilettantism. A medieval understanding of justice hurts rather than helps our reputation as scientists.
Thus, what could have been a mere “annus horribilis” turned into something that I would regard as a challenge. Doors closed but many more doors opened.
I eventually moved back to Germany, and this was a good thing. In this way I finally corrected a huge mistake that I had made. I truly regret moving to Amsterdam: even though I met some wonderful colleagues, had great neighbors, and got some work done, the clique that worked against me was simply too strong. As a whole, personally, the University of Amsterdam was hell for me. The workload, the competition, the ice-cold pragmatism: this is not healthy for me.
I do like Bochum and the students and colleagues here. I feel privileged to work at one of the best psychology departments in Germany. You are fantastic researchers and great colleagues. I thank you so much for all you have done for me so far: for creating a warm place and a friendly atmosphere, a place where people approach each other with respect. I enjoy the international, multicultural atmosphere and feel safe because you accept me the way I am. For you, being different means something positive, and this is where I can grow.
The next year will again be a challenge. I will have to wait for decisions, and I have no control over the decision-making process. I will wait patiently and will just do my work. I learned that in such times the next day is the focal goal. I became mindful during the last year, and it helped me a lot. This is also what I teach people who have started approaching me for advice on coping with shit storms and unfair accusations. I am glad that my experiences are a source of further help. This way I can turn the bad stuff into something useful and good.
And there is a lot of university work ahead of me. I will have to prepare all the lectures for the next semester, I will start a new line of research, my new book will appear, I was invited to writing skills workshops for PhD students, and I will give more talks.
You see, there is a lot of light in the darkness and you do not have to worry. I never thought that I could be strong like this but the truth is that I am strong because of you.
I love you and wish you all a wonderful, peaceful 2015 and a happy holiday season.
5 b. Letter by Jens Förster, September 10, 2014
Dear Sir or Madam, dear colleagues, dear friends,
It has now been some weeks since the Dutch newspaper NRC published an article claiming that I had manipulated data; at the same time, the LOWI (the Dutch ethics commission) published a report on my “case” on its website. Other publications followed, and I have answered the questions here, on my website.
In the meantime, I have taken up a substitute professorship (Vertretungsprofessur) at the Ruhr-Universität Bochum, and I am relieved that the tension I experienced daily over 2 years is slowly subsiding. I now have the feeling that I can sort things out a little and also put them into perspective. I am very grateful that I have received so much support from my discipline (and also from scientists in other disciplines).
On the other hand, I owe you answers to some questions that are still open, and I would like to tackle them in this letter. This is not an easy matter, because the questions go in different directions, but I will try to do my best. For each question I have written a short answer and, where necessary, a long one. The long answers sometimes contain further detailed questions.
Before I begin, however, I would like to apologize to you with all my heart for the turbulence this matter has caused. The whole thing is not only a nightmare for me; it also unsettles many colleagues, my friends, and my family. I have published data that, in the opinion of some experts, appear problematic. I take responsibility for this, although I have neither falsified nor manipulated data.
To this day I cannot explain the anomalies in the data, but I can try to put things into context and can offer some possible alternative explanations. None of the paths I pursued in search of clarification was successful, and there are some open questions and possibilities that I will now describe.
Wie andere Kollegen und Kolleginnen an der UvA auch habe ich Fragebögen entsorgt, nachdem das ganze Department von großen in sehr kleine Büros umgezogen war. Der damalige Chair des Departments, Prof. Dr. Agneta Fischer, hat mich dazu aufgefordert. Aus heutiger Sicht bedaure ich, dass ich dieser Anweisung gefolgt bin. Ich habe allerdings alle Daten in SPSS gespeichert.
Im Jahre 2007 zog ich an die UvA und brachte alle meine relevanten Fragebögen an die UvA mit. Nach dem Umzug in zu kleine Zimmer betonte der Chair des Departments, dass man Fragebögen an der UvA nicht aufheben müsse, dass ein Büro wohnlich aussehen müsse und man mit Raum haushalten sollte. Heute bedaure ich, dass ich, wohl im Stress eines Abteilungsumzugs, zu schnell und unbedacht auf eine Vorgesetzte reagiert habe.
Ich habe Kollegen (u.a. Prof. Dr. Gerben van Kleef), die bezeugen können, dass eine solche Aufforderung ausgesprochen wurde und ich war natürlich nicht der einzige, der Fragebögen aufgrund dieser Mitteilung wegwarf. Einige Kollegen erinnern sich daran (u.a. Prof. Dr. Joop van der Pligt), dass eine gleichklingende Anweisung auch während einer Teamsitzung gefallen war. Zudem wurde eine frühere Kollegin, Dr. Anja Zimmermann, ebenfalls gebeten, ihre Fragebögen aus vorigen Jahren wegzuwerfen, als sie 2007 an die UvA kam. Auch wenn diese Praxis heute merkwürdig erscheint so weist jedoch Prof. Dr. Liberman in einem offenen Brief darauf hin (siehe unten, Blog 2), dass vor den großen Betrugsskandalen in den Jahren 2011 und 2012 die Archivierung in der Psychologie insgesamt international suboptimal war. Unter „Rohdaten“ verstanden viele Kollegen und Kolleginnen diejenigen Daten, die in einem per-participant SPSS-Datenfile zusammengestellt werden; die Papiervorlagen wurden wegen Platzmangels oder aus anderen Gründen häufig entsorgt. Das erschien den meisten logisch, denn inzwischen werden Hauptvariablen häufig am Computer erhoben (so auch bei mir). Und nachdem ein Fragebogen in einem File kodiert wurde, so dachte man, warum sollte man die Bögen aufbewahren, wenn sie keine zusätzlichen Information enthalten? Ich sollte dennoch betonen, dass ich alle kodierten SPSS-Daten, mit denen man die Ergebnisse nachvollziehen kann, an die UvA Kommission geschickt habe (und zur Erinnerung: Statistiker konnten selbst nach monatelanger Arbeit daran nur feststellen, dass ich richtig gerechnet habe, und dass die Daten auf Rohdatenniveau keine Auffälligkeiten oder etwa typische Signaturen für Manipulation zeigen).
Nichtsdestotrotz bedaure ich mein Verhalten. Es ist keine Frage, dass ich in der Zukunft alles sorgfältig archivieren werde. Ich werde alle erdenklichen Vorsichtsmaßnahmen unternehmen, um weiteren Datenverlust zu verhindern und etwaige Eingriffe in die Daten ausschließen zu können.
I.2. Habe ich die Regelmäßigkeiten (die linearen Muster) sehen können, die der Kläger in seiner Klage an die LOWI bemängelt?
Ich habe die regelmäßigen Muster damals nicht bemerkt. Allerdings wurde auch weder vom Herausgeber der Zeitschrift, noch von unabhängigen Gutachtern (peer reviews) etwas Auffälliges bemerkt. Prof. Dr. Nira Liberman von der Tel Aviv University verfasste einen offenen Brief zu diesem Thema (siehe unten, Blog 2), in dem sie beteuert, dass auch sie nichts bemerkt hatte. Als mich die Klage erreicht hatte, hatte ich ihr die Daten geschickt, ohne ihr vorher zu sagen, welches Muster sie beinhalten, und sie gebeten, den Anlass für die Klage zu finden. Sie hatte selbst unter dieser Instruktion nichts Auffälliges bemerkt. Zwei LOWI- und UvA-Gutachter bemerkten ebenfalls in ihren Gutachten, dass die Besonderheiten der Daten schwierig für Wissenschaftler oder Wissenschaftlerinnen zu erkennen sind, die nicht hochspezialisierte Statistikerinnen oder Statistiker sind.
I.3. Wurden die Rohdaten von Experten analysiert und überprüft?
Ja. Ich habe alle SPSS Datenfiles an die UvA geschickt und Statistiker haben daran länger als ein Jahr gearbeitet. Ich denke, dass sie mittlerweile alle Files sorgfältig geprüft haben. Bis jetzt haben sie keine Anzeichen für Manipulationen entdecken können, auch wenn die Homogenität der Befunde und die Linearität merkwürdig erscheinen. Dabei sollte berücksichtigt werden, dass Manipulation normalerweise leicht in Rohdaten zu entdecken ist; in meinem Fall wurde jedoch kein Muster, das typisch für Fehlverhalten wäre, gefunden.
Der Hauptgutachter sagt explizit: „Es ist selbstverständlich möglich, dass Ergebnisse durch normale Erhebungen zustande gekommen sind“. Und: „Tatsächlich beinhalten die Zahlen in den Datenfiles mögliche Werte für jeden individuellen Datenpunkt, und dies sind die Zahlen, die zu den beobachteten Resultaten führen.“ Er schließt: „Ich betone, dass aus den Datenfiles keine Schlussfolgerung gezogen werden kann, dass Anpassungen (Manipulationen) stattgefunden haben. Es kann auch nicht gesagt werden, wann und von wem solche Anpassungen durchgeführt wurden.“
I.4. Was ist mit dem „Smoking Gun“, auf den der Science Artikel und der LOWI-Report Bezug nehmen?
Die LOWI fand vor allem ein Argument des Klägers überzeugend, nämlich dass die Linearität, die in der Gesamtstichprobe beobachtet werden kann, nicht in Subgruppen aufträte. Im Science Artikel wird dies als „smoking gun“ bezeichnet, also ein unausweichlicher Beweis für Fehlverhalten.
Tatsächlich wurde das diesbezügliche Gutachten niemals im Netz veröffentlicht. Das gesamte Argument ist, wie Experten sagen, fragwürdig und dies ist offensichtlich, wie die unten zugefügten Graphiken zeigen. Natürlich ist es normal, dass Subgruppen vom Gesamtmuster abweichen. Zudem sind die Subgruppen in dem SPPS-Artikel, um den es geht, überhaupt nicht so verschieden vom Gesamtsample. Das Urteil der LOWI ist also unbegründet.
I.5. Werden die Studien repliziert?
Ja. Eine internationale „Replication Group“ wurde gegründet. Ich bin sehr gerührt von diesem Zeichen der Wertschätzung und bin sehr dankbar. Ergebnisse dieser Gruppe, der Wissenschaftler und Wissenschaftlerinnen aus mehreren Teilen der Welt angehören, werden für Herbst erwartet. Ich werde auch selbst Experimente unter kontrollierten Bedingungen replizieren, wenn mein neues Labor im Herbst die Arbeit aufnehmen kann. Lassen Sie mich bitte betonen, dass die Studien selbst bereits Replikationen früherer Artikel sind. Darum habe ich keinerlei Zweifel an der Validität der Theorie. Mittlerweile liegen zwei erfolgreiche Replikationen aus Israel und den Niederlanden vor.
II. Verdachtsmomente, falls die Daten doch manipuliert worden sind.
II.1. Ist es möglich, dass Mitarbeiter oder Mitarbeiterinnen oder Hilfskräfte die Daten gefälscht haben? Habe ich einen Verdächtigen oder eine Verdächtige im Kopf und habe ich versucht, ihn oder sie zu finden?
Ich habe niemals Daten gefälscht und habe auch keinen meiner Mitarbeiterinnen oder Mitarbeiter oder Hilfskräfte dazu ermutigt, Daten zu fälschen oder zu schönen. Die Experimente wurden zwischen 1999 und 2008 durchgeführt, ich hatte weit über 100 verschiedene Mitarbeiterinnen, Mitarbeiter, Hilfskräfte, Praktikanten und studentische Volontäre zu dieser Zeit (5-20 zu einem Zeitpunkt). Es ist jedoch nicht vollends ausgeschlossen, dass jemand Daten gefälscht hat. Ich habe nach möglichen Verdächtigen geforscht, aber keinen Täter gefunden.
1.1. Ist es möglich, dass die Hilfskräfte genug Kenntnisse besaßen, um Daten zu fälschen und wie kommt es, dass sie Zugang zu den Daten hatten?
Unter den Hilfskräften an den Deutschen Universitäten befanden sich einige, die hervorragend in Mathematik ausgebildet worden waren. Einige waren exzellente Statistiker. Zudem habe ich einigen ganze Sets von Datenfiles überlassen, um etwa zu entscheiden, welche Kontrollfaktoren über verschiedene Studien hinweg eine Rolle spielten, welche Experimente noch einmal durchgeführt werden müssten, und wo Unregelmäßigkeiten zu sehen waren, um nur einiges zu nennen. Hier kann es eventuell zu Eingriffen gekommen sein, auch wenn ich das damals streng kontrolliert habe und auch heute nicht daran glauben kann.
1.2. Warum würde eine Hilfskraft so etwas tun?
Es gibt tatsächlich Anreize für Mitarbeiter und Hilfskräfte dies zu tun, wie wir z.B. auch aus den LOWI-Protokollen der Amsterdamer Hilfskräfte erfahren haben, die angaben, dass man sich durch „gute“ Ergebnisse die Aufmerksamkeit des Professors sichern kann. Damals - vor den Betrugsskandalen - habe ich meinen Mitarbeiterinnen und Mitarbeitern schlichtweg vertraut, so wie viele andere Kolleginnen und Kollegen auch; wir haben es nicht für sehr wahrscheinlich gehalten, dass jemand Daten verfälscht. Ich habe Daten immer nachgeprüft, aber vielleicht habe ich nicht genug kontrolliert. Wenn ich zurückblicke, bedaure ich dies und meine naive Einstellung. Ich denke jedoch, dass Vertrauen in unserer Disziplin auch in der Zukunft dazugehört. Ich vertraue noch stets meinen Mitarbeitern und Mitarbeiterinnen und ich kann mir keine Arbeitsatmosphäre vorstellen, in der niemand dem anderen traut.
1.3. Habe ich mich bemüht, einen möglichen Schuldigen zu finden?
Ja, das habe ich tatsächlich getan, aber einige Aspekte meiner damaligen Labororganisation erschweren die Angelegenheit. Zunächst einmal liegen die Experimente lange zurück. Zudem wurden damals aufgrund von Speicher- und Rechnerkapazitätsproblemen Daten von Laborcomputern grundsätzlich nach der Erhebung gelöscht. Dadurch gibt es keine Originalspuren zurück zu den Files, also den Files, die direkt auf dem Computer erhoben wurden, wenn computerisierte Studien im Labor durchgeführt wurden. Ich habe privat einen Ermittler zu Rate gezogen, um zu ergründen, ob ich mögliche Täter unter den gegebenen Umständen überhaupt identifizieren könnte. Dies sei – so die Auskunft - nach so vielen Jahren (7-15 Jahre) nahezu unmöglich.
1.4. Habe ich Mitarbeiterinnen und Mitarbeiter gefragt, ob sie jemanden verdächtigen?
Ja. Niemand hatte einen Verdacht.
Das habe ich. Mit einigen Mitarbeiterinnen und Mitarbeitern habe ich auch versucht, eine Liste zusammenzustellen, derjenigen Mitarbeiter, Praktikanten, Studierenden, Volontäre, die damals in dem Labor gearbeitet haben. An viele konnten wir uns erinnern, aber an manche nur schlecht. Es ist zudem schwierig, sie zurückzuverfolgen, weil viele in ihre Heimatländer zurückgezogen sind. Diejenigen Mitarbeiter, die ich kontaktiert habe, wiesen jegliches Fehlverhalten von sich.
Zudem führten wir sehr viele Studien durch, die sich nur in spezifischen Details voneinander unterschieden. Wenn wir z.B. Ähnlichkeiten erforschten, baten wir in einigen Experimenten die Versuchsteilnehmer, Haie und Delphine miteinander zu vergleichen und in anderen verschiedene Nachrichtensendungen. Die Mitarbeiterinnen und Mitarbeiter erinnerten sich im Allgemeinen daran, dass wir solche Studien durchgeführt haben, aber sie konnten sich natürlich nicht mehr daran erinnern, welche spezifische Studie sie durchgeführt hatten (z.B. die mit den Fischen oder die mit den Sendungen). Dadurch ist es schwierig, nach all der Zeit festzustellen, wer an welchem Experiment beteiligt war.
II. 2. Gibt es andere Artikel, in denen lineare Muster auftauchen?
Ja, aber das ist kein Problem, weil sich in anderen Artikeln die Linearität nicht massiert, oder in allen Studien zeigt. Statistiker sagen, dass Linearität an sich kein Problem darstellt; sie ist unwahrscheinlich aber nicht unmöglich. Es gibt einige Artikel in der psychologischen Literatur (und wahrscheinlich auch in anderen Disziplinen), in denen lineare Muster auftauchen. Linearität ist unwahrscheinlicher, wenn sie massiert über Studien in einem bestimmten Artikel hinweg auftaucht und das scheint nicht der Fall zu sein. Jedoch scheinen Befunde an der UvA weniger häufig lineare Muster aufzuweisen.
Hier einige Artikel mit unproblematischen „linearen“ Befunden:
Kanten, A. B. (2011). The effect of construal level on predictions of task duration. Journal of Experimental Social Psychology, 47(6), 1037-1047.
Lerouge, D. (2009). Evaluating the Benefits of Distraction on Product Evaluations: The Mind‐Set Effect. Journal of Consumer Research, 36(3), 367-379.
Malkoc, S. A., Zauberman, G., & Bettman, J. R. (2010). Unstuck from the concrete: Carryover effects of abstract mindsets in intertemporal preferences. Organizational Behavior and Human Decision Processes, 113(2), 112-126.
III. Veröffentlichung des Artikels; Zurückziehung
III. 1. Warum habe ich den Artikel nicht zurückgezogen?
Viele Kolleginnen und Kollegen haben mich gebeten, den Artikel nicht zurückzuziehen. Ich habe jedoch einen Brief an die Herausgeber von SPPS geschickt, mit der Bitte, dies selbst zu entscheiden. Tatsächlich wäre es für mich in dieser Situation viel leichter gewesen, einen Artikel, der in einem Journal mit geringem Impact factor veröffentlicht wurde, zurückzuziehen und damit meine Ruhe zu haben. Ich habe das nicht getan, weil ich nicht sicher bin, ob das korrekt ist.
Man mag sich fragen, warum ich die Artikel nicht sofort zurückgezogen habe. Manche Kolleginnen und Kollegen denken gar, dass eine Zurückziehung per se ein Zeichen für wissenschaftliche Integrität ist. Für eine Zurückziehung sind jedoch konkrete Beweise für Manipulation nötig. Pure Zweifel reichen nicht aus, um einen wissenschaftlichen Artikel zurückzuziehen. Ich habe immer signalisiert, dass ich vollkommen einverstanden mit einer Zurückziehung bin, wenn Expertinnen oder Experten einhellig der Meinung sind, dass die Daten falsch sind, oder wenn ein „Täter“ gefunden würde. Dies ist aber nicht der Fall. So wurde zum Beispiel ein spezieller Anklagepunkt, der behauptet, die beobachtete Regelmäßigkeit trete bei Untergruppen nicht auf, inzwischen von Statistikern kritisiert: Es wäre völlig normal, dass Untergruppen vom Muster des gesamten samples abweichen und im Übrigen folgten in der SPPS-Studie viele Untergruppen dem Gesamtsample, wie auf den Grafiken unten auch unschwer zu sehen ist. Der LOWI standen diese Files, die genau dieses Problem zeigen, zur Verfügung.
Andere Kolleginnen und Kollegen warnten davor, aufgrund von statistischen Verfahren Artikel zurückzuziehen. Sie argumentieren, dass Artikel, die bspw. nach modernen Vorstellungen falsch ausgewertet wurden, ja auch nicht zurückgezogen werden. Andere wiesen darauf hin, dass momentan zahlreiche psychologische Artikel einer Kritik durch Statistiker ausgesetzt sind, und auch diese Artikel nicht zurückgezogen würden. Die angewandten statistischen Verfahren seien selbst Gegenstand einer Diskussion und eine leichtfertige Zurückziehungspraxis würde einen Dammbruch einleiten. Zu guter Letzt stört es viele, dass die vorgelegte Kritik an meinen Befunden nicht publiziert werden konnte und die LOWI und die UvA dieses Dokument trotzdem zum Ausgangspunkt ihrer Entscheidung machten. Eine wissenschaftliche Diskussion erfordere eine Auseinandersetzung in wissenschaftlichen Zeitschriften, in denen unabhängige Gutachterinnen und Gutachter über die Veröffentlichung entscheiden.
Zusammenfassend bin ich also in einem Konflikt. Ich bin durchaus gewillt, den Artikel zurückzuziehen, sehe mich aber einem Heer von Kollegen gegenüber, das mich eindringlich davor warnt, durch Zurückziehung einen Dammbruch zu bewirken und der Disziplin nachhaltig zu schaden. Da ich keine konkreten Beweise für eine Datenmanipulation habe, dränge ich nicht auf eine Zurückziehung und überlasse die Entscheidung den Herausgebern, die, im Falle von SPPS, die Sache Expertinnen und Experten übergeben haben.
IV. 1. Habe ich mit der Presse zusammengearbeitet? Oder warum habe ich nicht mehr mit Journalisten zusammengearbeitet?
Bisher haben mich gerade einmal drei Journalisten um ein Gespräch zu den Vorwürfen gebeten. Einem habe ich ein umfangreiches Interview gegeben. Ich habe nicht auf Blogs und Internetseiten reagiert, weil sie zu viele Fehler beinhalten.
Manche Kolleginnen und Kollegen wunderten sich darüber, warum ich nicht auf Journalistenanfragen geantwortet hätte, andere rieten mir dagegen, niemals auf solche Anfragen zu antworten. Ich habe manche Tipps nicht berücksichtigen können und habe damit einige Leute vor den Kopf gestoßen, weil ich ihren offensichtlichen und dringlichen Ratschlägen nicht gefolgt bin.
Manche vermuteten wohl, dass ich zahlreiche Presseanfragen erhalten und alle abgewehrt hatte. Tatsache ist, dass ich von insgesamt 3 Journalisten zu den Vorwürfen angesprochen wurde, einem preisgekrönten Wissenschaftsjournalisten habe ich ein Interview gegeben.
In Wahrheit habe ich mich tatsächlich geweigert, mit Frank van Kolfschooten zu kommunizieren, der fast alle Originalartikel in Zeitungen und Zeitschriften über mich geschrieben hat. Prominente Niederländische Wissenschaftler verschiedener Disziplinen, darunter Mitglieder der Königlichen Niederländischen Akademie der Wissenschaften (KNAW), rieten mir, meist unaufgefordert, nicht mit ihm zu kommunizieren, da er ein starkes finanzielles Interesse an Berichten über „Betrugsskandale“ und eine recht negative Einstellung den Wissenschaften gegenüber habe. Ich war ein Fremder in diesem Land und bin den Empfehlungen wichtiger Kolleginnen und Kollegen gefolgt. Einige Kollegen, die er anschrieb, fühlten sich durch ihn belästigt (s.a. http://www.professorpruijm.com/2014/07/is-van-kolfschooten-een-stalker.html).
Zudem fielen mir einige Voreingenommenheiten auf, die mich auch jetzt nicht überzeugen würden, mit ihm zu kommunizieren.
*Van Kolfschooten zitiert die UvA- und LOWI-Berichte falsch und selektiv (z.B. titelt er in seinem ersten Bericht „UVA-Professor manipulierte Untersuchungsdaten“ – unterstellt also eine gesicherte Beteiligung meinerseits, die das UvA-Urteil explizit ausschließt. Da heißt es, man könne, wenn etwas manipuliert wurde, überhaupt nicht sagen, wer es getan hätte).
*Seine Hypothese, die er im letzten Science Artikel äußert, dass die Studien nicht in Deutschland, sondern in Amsterdam durchgeführt worden wären, basiert auf der gleichlautenden Unterstellung des Klägers und einem oberflächlichen Lesen meiner Artikel, die Deutsches Stimulusmaterial beinhalten.
*Man kann zudem davon ausgehen, dass er in dieser Frage meine klärenden Antworten auf die Vorwürfe an die LOWI kannte. Man könnte daraus schließen, dass er diese Antworten nicht zitiert hat, weil er sonst den Artikel nicht in Science hätte veröffentlichen können.
Mit den sozialen Medien habe ich nicht kommuniziert. Es wäre eine Sisyphos-Arbeit gewesen, die falschen, anonymen und oftmals wohl nicht ernst gemeinten Kommentare zu beantworten. Auch Internetseiten wie „retraction watch“, die für manche eine gewisse Wissenschaftlichkeit ausstrahlen, enthalten viele inhaltliche Fehler und sollten in einer wissenschaftlichen Diskussion mit großer Vorsicht genossen werden. Auch auf diesen Seiten können Benutzer anonym schreiben, was sie wollen – verlieren kann hier lediglich der „Angeklagte“.
So wurden z.B. Gerüchte über einen „computer crash“ in die Welt gesetzt, was nie ein Thema in meinem Verfahren war. Zwar habe ich in meiner 20-Jährigen Karriere sicherlich Computer-Crashs gehabt und dabei auch Daten verloren, aber ich habe niemals behauptet, ich könne keine Daten liefern, weil sie bei einem Crash verlorengegangen seien. Tatsächlich habe ich Daten an die Kommissionen geschickt, sodass die Ergebnisse der Artikel rekonstruiert werden konnten. Ich vermute, dass manche eifrige Bloggerin oder mancher Leser LOWI-Berichte verwechselt hat, die anonymisiert im Internet zu finden sind. Es gibt dort viele Fehler im Netz, die offenbar auf solchen Verwechslungen beruhen.
Zu guter Letzt sollte ich erwähnen, dass mittlerweile drei weitere Artikel in Zeitschriften erschienen sind, die sich allerdings allein auf meine Forschung konzentrieren.
V. Leben an der Ruhr-Universität-Bochum (RUB)
V. 1. Waren die RUB und die Alexander von Humboldt-Stiftung (AvH) eigentlich informiert, bevor der Fall an die Öffentlichkeit kam?
V. 2. Ist die AvH Professur ausgesetzt?
Ja. Die Alexander-von-Humboldt-Stiftung setzte ihre Förderentscheidung zur Verleihung der Humboldt-Professur entsprechend ihrer Stiftungsregeln sowie auf meine Bitte bis zur Klärung des Sachverhaltes aus.
V. 3. Wann bewarb ich mich für die Stelle an der RUB?
Mein Bewerbungsgespräch fand im April 2012 statt, lange bevor mich die Klage erreichte. Zu diesem Zeitpunkt war auch keine Rede von einer Bewerbung bei der AvH. Die Berufung wurde zunächst zurückgestellt, bis die Untersuchung beendet ist. Im Moment habe ich eine Vertretungsprofessur.
V. 4. Bin ich wieder voll im Geschäft?
Ja. Ich begutachte Manuskripte und Anträge für wissenschaftliche Zeitschriften und Forschungsorganisationen, ich habe angefangen, ein Manuskript über meine neue motivationspsychologische Theorie zu schreiben, ich habe ein Buch angefangen, ich halte Vorträge und bereite meine Lehre an der RUB vor. Natürlich werde ich auch zusammen mit der replication group einige meiner Befunde replizieren, allerdings glaube ich, dass es zunächst gut ist, wenn die Replikation von anderen Laboren durchgeführt werden.
Ich liebe meine Arbeit, ich bin sehr glücklich über meine neuen Kolleginnen und Kollegen und die neuen Studierenden und ich hoffe das Beste.
Ich danke Ihnen für Ihre Aufmerksamkeit und sende herzliche Grüße an alle Freunde und Freundinnen.
Ich fühle mich jetzt sehr wohl in Bochum. Ich fühle, dass ich gebraucht werde und hoffe, dass ich mich nützlich mache, ich hoffe, dass ich einen guten Job mache, ich genieße die warme und freundliche Atmosphäre, ich treffe Kolleginnen und Kollegen zum Essen und zu Arbeitstreffen, und meine Begeisterung für mein Fach, die übrigens nie gewichen war und mich über die Zeit getragen hat, steckt andere an.
Zu guter Letzt muss ich sagen, dass mir die Kenntnisse in der Sozialpsychologie und der Selbstregulationsforschung entscheidend geholfen haben, diese ungeheuerliche Phase meines Lebens zu überstehen. Unsere Wissenschaft hat mir geholfen, zu verstehen was passiert ist, und ich habe sie gebraucht um zu wissen wie man mit Situationen umgeht, die vollkommen außer Kontrolle geraten sind. Unsere Forschung ist unglaublich hilfreich.
Mit herzlichem Dank und freundlichen Grüßen,
For graphs, please copy this link and go to: http://www.ruhr-uni-bochum.de/soc-psy/Misc/Linearity%20in%20subgroups.pdf
5 b. Letter by Jens Förster, September 10, 2014
Dear colleagues and friends,
some weeks have passed since the Dutch newspaper NRC published an article suggesting that I manipulated data; at the same time the LOWI (the national ethics commission of the Netherlands) published a report on my “case” on their website. A few other publications followed, and I responded to them on my web site.
Meanwhile, I started my new job as a visiting professor at the Ruhr-University-Bochum (RUB) and I am relieved of some of the tension that was part of my daily life in the last two years. I have the feeling that I can start sorting things out. I am grateful for the support I receive from people in the field.
At the same time, I understand clearly that I owe you all answers to many questions and this is what I try to do in this letter. The questions branch out in a complicated structure so it is not an easy task, but I will try to do my best. For each question, I provide a short answer and – if necessary – a long answer. In the long answer I also answer sub-questions that might follow.
Before I do that, however, I would like to apologize wholeheartedly for the trouble that I caused. This is a nightmare not only for me but also for the field, colleagues, friends and family. I published data that according to some experts is problematic. I take responsibility for that, although I did not fake or manipulate any data. I still cannot explain the unlikely patterns in the data, but I can try to explore some possible, alternative explanations. None of the directions I examined were conclusive and there are many open questions and possibilities, as I will write next.
I. Data issues
I.1. Archiving: Why did I not keep the questionnaires?
Like other colleagues, I dumped them after the entire department at the University of Amsterdam (UvA) moved from larger individual offices to very small, shared ones. The Chair of the Social Psychology Department at that time, Dr. Agneta Fischer, asked me to do this. I regret that I followed these instructions. Of course, all the original data had been stored in SPSS files.
I would like to note that in 2007 I had moved all the questionnaires with me that appeared relevant to me. After the move to much smaller rooms (that I had to share with a colleague), Dr. Fischer pointed out that one does not keep questionnaires here, that offices should look “livable”, and that one needs to take care of space. With hindsight, I think that I reacted too quickly and too thoughtlessly to the instructions by the Chair of the Social Psychology department.
I meanwhile have colleagues who can bear witness to the fact that this suggestion was made (Prof. Dr. Gerben van Kleef), and I was of course not the only one who dumped questionnaires based on this advice. Some colleagues (for example Prof. Dr. Joop van der Pligt) even remember that she said this during a team meeting. Also, a former postdoc, Anja Zimmermann, was told to dump the questionnaires from previous years when moving into the offices at the UvA building in 2007 (by a different person). One may find this practice odd; however, Dr. Nira Liberman, in an open letter (see below, Blog 2), pointed out that before the scandals in the years 2011 and 2012, archiving was underdeveloped in psychology internationally. Colleagues understood that it was enough to keep the SPSS files with per-participant data, and paper material was often dumped because of space limitations or for other reasons. This seemed logical to researchers also because many studies (like some of mine) were run on computers and did not have paper questionnaires at all. We thought: once the data of a questionnaire is coded into a file, why keep the original questionnaire unless it has extra information that is not coded into the file? I should also repeat that I sent all relevant participant-level SPSS files, with which one can reanalyze the findings, to the UvA commission. (Note that statisticians, after months of investigation, noted that I analyzed the data correctly and that it shows no oddities or “signatures of fraud”, even on the level of raw data.)
In any event, I deeply regret my behavior. There is no question that I will carefully store my data in the future. I will do anything to avoid future data loss and will install even more control mechanisms to avoid any possibility for data manipulation.
I. 2. Could I have seen the regularities (linear relationships) that the complainant points to in his accusation?
I did not see the regular patterns back then. However, neither the editor nor the independent peer reviewers noticed anything odd. Dr. Liberman from Tel Aviv University published a letter on this topic (see below, Blog 2), in which she reaffirms that she also did not notice anything, even after being informed that the papers were the basis of an accusation. Two LOWI and UvA reviewers say in their reviews that the regularities are difficult to see for people not highly specialized in statistics.
I. 3. Was the raw data carefully analyzed by experts?
Yes. I had sent all the relevant SPSS files to the UvA and statisticians have been working on them for more than a year. I believe they must by now have examined all files. So far, they have not detected any concrete signature of fraud, even though they found the homogeneity of variance and the linearity odd. Note that usually, fraud can be detected easily in raw data, but in my case no patterns typical for fraudulent behavior could be identified.
The main reviewer explicitly states: “It is of course possible that the observed pattern was obtained by measurements”, and “In fact, the numbers in the data files represent possible values for each individual data point, and these are the numbers that lead to the observed pattern”. He concludes: “I emphasize that from the data files one can in no way infer that […] adjustments have actually been done. Nor can be said when and by whom such adjustments would have been done.”
I. 4. What about the “smoking gun” the Science article and the LOWI report are referring to?
Answer: The LOWI ethics commission found especially one argument by the complainant compelling, namely, that the linearity that can be seen in the whole sample, does not show up in the subgroups of gender. In a Science article, the journalist Frank van Kolfschooten referred to this “finding” as the “smoking gun” – an inevitable sign of fraud.
In fact, the respective LOWI review, presenting this argument, has never been published on the net. The entire argument however is disputable as other experts say and as can be easily seen in the graphs I attach below. First, it is of course normal that subgroups deviate from the pattern of the larger sample. Second, in the SPPS paper, the gender subgroups are in fact not that different from each other. Consequently, the main basis for the LOWI argument is invalid.
I. 5. Will the studies be replicated?
Yes. An international replication group was established. I was overwhelmed by this sign of appreciation for my work and I am very thankful. Results of this replication group, involving researchers from many different countries, can be expected in the fall. I will also replicate studies under controlled conditions when my lab starts operating in the fall. Let me repeat that the studies themselves were replications of former papers. I have no doubts about the validity of my theory. Meanwhile two independent studies from Israel and the Netherlands already show successful replications.
II. Suspicions, in case the data was manipulated
II. 1. Is it possible that co-workers or research assistants manipulated data? Do I have a suspect in mind and did I try to find him/her?
I never manipulated data nor did I encourage my co-workers or research assistants to “clear” or “massage” data. The studies were conducted between 1999 and 2008, and I had more than 100 lab assistants during this period (5-20 at any one time). I cannot entirely rule out the possibility that one or some of them manipulated data. I searched for potential suspects but could not find a perpetrator.
1.1. Is it possible that RAs knew enough stats to fake the data and how come they had access to data sets that combined a number of studies?
Among the research assistants at the German Universities, I had students that were highly educated in mathematics. Some of them were excellent statisticians. In addition, I gave entire sets of data files to some, to let them decide which control factors are important across studies, which experiments should be redone, and which showed irregularities, to name just a few questions that I would have normally asked RAs to take care of. It is possible that during this process (or at the time of data collection) some intervention took place, even though I used to strictly control my lab and even though I cannot believe today that this is true.
1.2. Why would an RA do that?
There are incentives for research assistants, as we learned during the LOWI interviews from the former Dutch research assistants, such as for example getting attention or approval from the instructor by providing “good” results. However, back then, before we learned about the scandals in our field, like many other colleagues I trusted my co workers in general and the possibility that they would do something with the data did not really occur to me – I thought this would be rather unlikely. Of course, I always checked the data they provided to me, but maybe I did not control enough. Looking back, I of course regret it, and regret my naïve attitude. I think, however, that trust will need to be part of the field in the future as well. I still do trust my former co workers, and I cannot envision a situation in which nobody trusts anybody else.
1.3. Did I make an effort to find out who could have been the person?
I did, but a few aspects of the way my lab was organized make it very difficult. First and foremost, the experiments were conducted a long time ago. To complicate matters, back then files were as a rule taken from the computer and deleted immediately after data collection in order to free up hard disk space, which was an issue at the time. As a result, there are no traces of the original files, that is, the files that are created on the computer when a computerized experiment is run. Privately, I asked a professional investigator for advice, in order to learn whether it would be possible to identify possible fraudsters under these conditions. His professional evaluation was that after 7-14 years, this seems to be impossible.
1.4. Did I ask co-workers if they have a suspect in mind?
I did. There were no suspicions.
With some of my colleagues and former RAs, I tried to compile a list of RAs, interns, students, and volunteers that worked in the lab. Many of them we remembered, and some of them we remembered only vaguely and perhaps incorrectly. Moreover, it is difficult to trace them because most of them moved back to their different home countries. Of course, the colleagues themselves denied any kind of misconduct.
Please also note that we did many studies that differed only in very specific details from one another. For example when investigating similarities, we asked in some experiments participants to compare sharks with dolphins and in others to compare two tv shows. Co workers in general remembered that we did such studies but of course do not remember who ran specific studies (e.g. whether they did the study with sharks or tv shows). Thus it is difficult in any event to decide who participated in which specific study.
II. 2. Are there other papers showing linearity?
Yes, but this is not a problem, because in other papers linearity does not accumulate across studies, and as statisticians say, linearity per se is not a problem; it is unlikely but not impossible. Note that there are many other papers in psychology (and probably also in other disciplines) that show linearity or other regular patterns. Linearity is more unlikely if it accumulates across studies within a certain paper, and this does not seem to be the case. Notably, however, linearity seems to be less pronounced in studies conducted at UvA.
Some examples for non-problematic linear findings:
Kanten, A. B. (2011). The effect of construal level on predictions of task duration. Journal of Experimental Social Psychology, 47(6), 1037-1047.
Lerouge, D. (2009). Evaluating the Benefits of Distraction on Product Evaluations: The Mind‐Set Effect. Journal of Consumer Research, 36(3), 367-379.
Malkoc, S. A., Zauberman, G., & Bettman, J. R. (2010). Unstuck from the concrete: Carryover effects of abstract mindsets in intertemporal preferences. Organizational Behavior and Human Decision Processes, 113(2), 112-126.
III. Publication Issues
III. 1. Why did I not simply retract the article?
I have been asked by many colleagues not to retract the paper. However, I sent a letter to the SPPS editors to decide themselves. In a way, for me, in this situation, it would have been much easier to retract the paper of a low impact journal and to find closure and peace. I did not retract it because I am not sure that this is procedurally right.
One may wonder why I have not simply retracted the article. Some people think that retraction would in itself reflect academic integrity. However, for retraction, concrete proof of fraud is necessary. Simple doubts are not sufficient to justify retraction. I have signaled that I would completely agree with a retraction if experts come to the conclusion that the data is fake, or if we learn about a wrongdoer. However, note that the conclusions drawn from the statistical analyses are not totally clear. To give an example, the argument that the observed regularities would not show up in subgroups of the sample has been criticized by other statisticians. They claim that it is normal that subgroups deviate from the pattern of the larger sample. In the SPPS paper in question, it looks as if they are in fact not that different from each other, and rather show linearity (see graphs attached). The LOWI had the files that show these patterns for inspection. Other colleagues warned against retracting papers on the basis of statistical analyses. They argue that former studies that, according to new insights in statistics, were analyzed in a wrong way also do not get retracted. Others mentioned that currently, many studies are criticized by statisticians in the field and argue that their papers also do not get rejected. Moreover, the statistical analyses used are themselves part of a scientific discussion, and retraction of my papers could lead to a “breach in a dyke” that would affect many other publications in psychology. Finally, many are upset that “the complaint” itself is obviously unpublishable in a refereed journal, and that both LOWI and UvA still used it as an argument for a crucial judgment. A scientific discussion, however, requires a discussion in refereed outlets, in which expert peer reviewers decide on publication.
In summary, I do experience a conflict. On the one hand, I am willing to retract the article, however on the other hand, I am confronted with many requests from colleagues that ask me not to do so. Since I do not have any concrete proof for data manipulation, I handed over the decision to the editors, who, I understand, in case of SPPS have handed over the case to experts.
IV. Press and Media Work
IV. 1. Did I cooperate with the press? Or why didn’t I cooperate with it more?
So far, only three journalists have approached me about this issue, and I gave one of them an interview. I did not respond to blogs on websites because of the many mistakes they contained.
Some colleagues complained that I did not respond to journalists’ approaches, while many more people recommended to never respond. I therefore violated some advice, and inevitably made some people feel that I fail to do what seems to them an obvious thing that they clearly told me to do.
Most probably, people assume that I received tons of requests and that I rejected them all.
The truth is, that I was approached by only three journalists about this issue, and with one of them, who is an award winning science journalist, I talked an entire afternoon for an interview.
However, it is true that I refused to talk to Frank van Kolfschooten, who wrote almost all original newspaper articles on this case. Prominent Dutch scientists from different disciplines (including members of the Royal Dutch Academy of Sciences, KNAW), warned me and advised me not to talk to him, because of his alleged strong financial interest in reporting about “fraud scandals” and because according to them he holds a rather negative attitude against sciences. I was a stranger in this country and followed the recommendations. Colleagues he approached felt molested (see also http://www.professorpruijm.com/2014/07/is-van-kolfschooten-een-stalker.html).
In addition, I also noticed some biased tendencies in his writings that still would not convince me to communicate with him in the future:
· Van Kolfschooten misquotes UvA and LOWI reports (e.g. he writes that these reports conclude that I manipulated data even though they really only say that it is not clear who did it if it was done).
· His hypothesis, published in his recent Science article and following the argument by the complainant, that the studies were conducted in Amsterdam is based on a sloppy reading of my articles, which report procedures with German stimulus material (see Blog 4 below).
· Moreover, it could be assumed that he also knew my clarifying answers to the LOWI commission on this issue. One may infer that he did not report these answers, because it would have made his manuscript unpublishable.
I did not communicate via social media. It would have been Sisyphean work to respond to the often incorrect, anonymized and sometimes obviously not serious comments. Internet pages like “retraction watch” that may have, at least in the eyes of some people, a scientific appearance should also be read with caution, because of the many mistakes they include. Bloggers can anonymously post whatever they have in mind. The only person that can lose here is the “accused”.
To give an example, rumors about a computer crash that I ostensibly claimed were disseminated even though this has never been a topic during the investigations. Of course, during my 20-year career I had crashes and I lost data, but I never claimed that I cannot provide the data in question due to a computer crash, and in fact I provided data files from which the reported results of the papers could be reconstructed. I suppose that some eager readers and bloggers confused LOWI reports that can be found in anonymized versions on the internet. There are many mistakes that are supposedly due to confusion with other cases.
Finally, I might add that in the meantime three articles in journals and magazines appeared that focus only on my research.
V. Present Life at Ruhr-Universität-Bochum
V. 1. Were the AvH and the Ruhr-University-Bochum informed before the case was public?
V.2. Was the AvH professorship suspended?
Yes. The AvH and I came to the agreement to suspend the AvH-professorship and to hand it over to experts for further examination.
V. 3. When did I apply for the position?
I had my job interview in April 2012, long before I received the complaint. At that time, there was also no mention of the AvH-award. The permanent position has been suspended until the examination of the case is finished. I currently have a visiting professor position.
V. 4. Am I fully back to business?
I am currently reviewing again for grants and journals, I started writing up a manuscript for my new theory on motivation, I started writing a book, I give talks and prepare for teaching at RUB. I of course plan to be part of the replication effort of my work. I think, however, that it is better that at this point the replications are run by other labs. I love my work, I love the colleagues and the students and I hope for the best.
Let me end this letter with sending out my warmest wishes to my friends all over the world.
I am now very happy in Bochum. I hope that I am being useful, I hope that I am doing a good job, I am enjoying a warm and friendly atmosphere, I meet colleagues for lunch and for lab meetings, and my enthusiasm for my discipline, that never really left me and helped me to survive during this tough time, is contagious again.
Finally, let me say that my expertise in Social Cognition and Self Regulation helped me a lot to survive this devastating period, since it helped me to understand what happened and it taught me how to cope with situations that are completely out of control. Our research is incredibly helpful.
For graphs please copy this link and go to: http://www.ruhr-uni-bochum.de/soc-psy/Misc/Linearity%20in%20subgroups.pdf
4. Reaction to Science Article, May 29, 2014
Dear colleagues, dear friends,
As you may have heard, Frank van Kolfschooten (the journalist who published the first NRC article on my "case" some hours after the UvA report was published, and who also wrote the articles that appeared in Science magazine and the Süddeutsche Zeitung) continues investigating my case, citing in his recent Science magazine article from May 29 an email conversation between me and a former research assistant. In his new article, the author presents his idea that the studies that I reported had not been done in Germany, but rather in Amsterdam.
Even though printing a conversation with a student in public might be seen as questionable behavior, I am glad that he published this because it illustrates the Kafkaesque situation I am in, in which everything, even standard and viable ideas and decisions, is turned against me. As social psychologists we know that if people are convinced about certain facts, this shapes their world views and biases their information search and interpretation (also known as confirmation bias). Eventually, they perceive everything as consistent with their hypotheses. This, however, can happen unconsciously.
However, the concerns the article might raise can be easily addressed.
First, let me say that all these emails and indications (that are actually based on private investigations by the complainant who had a similar hypothesis in mind) had been examined by the National Ethics Commission (the LOWI). Apparently, all these concerns were unwarranted; this is the reason why they do not even show up in the final evaluation.
If you again wonder about the procedures, I can only tell you: Yes, it is true that the complainant who at the same time was the major expert in statistics during the investigation also interviewed my former research assistants secretly. And yes, it is true that we do not know how s/he asked questions and what questions exactly s/he asked. These conversations are presented out of context.
Second, I do not understand why conducting experiments at UvA logically excludes the possibility that I had done similar ones in Germany. Note that the JEP:G 2009 paper was submitted in March 2008, and I arrived in Amsterdam in summer 2007 – this would have been a rather short time to do all the studies. Note also that in the studies participants had to compare, for example, “heute” and “tagesschau” – two German news shows that are rather unfamiliar to Dutch participants. Note further that the 2012 SPPS paper contains 390 solutions for a creativity task written in German. Finally, in the Appendix of the 2011 JEP: G paper you find a list with German words. I conclude that the articles that are criticized were not read carefully, and that search biases might have led to wrong conclusions.
Third and most importantly, let me repeat what I expressed in my statement #3 below: I conducted the published studies between 1999 and 2008 in Germany. For outsiders who are not familiar with research in Psychology, it might appear to be strange that I developed stimulus material that allegedly had been used previously.
However, I wanted to conceptually replicate and extend the results I had previously obtained in Germany with a different (i.e., Dutch) population in a different (i.e., Dutch) language. This requires stimulus material that is suited to test hypotheses with a different population. Just to illustrate this point: Imagine you conducted a certain type of study with children and want to conduct it later with adults. Of course, you would have to prepare stimulus material for the adults that is different from the material for children. Applying the journalist’s logic to this example, he would wonder why adults would require different stimulus material.
More specifically, study 1C (from JEP: G 2011), using a “Moldavian” nonsense poem, had been done in Germany. It included a poem for which I changed the vowels and consonants to create a fantasy language. The original poem was an old Transylvanian song.
In Amsterdam, I first thought Moldavian would be associated with negative stereotypes (I sensed strong prejudice against East Europeans) and that Malaysian was both more neutral and more believable. Moreover, changing the language would count as yet another conceptual rather than straight replication; something we are looking for. Eventually, however after discussions with the research assistant I decided to take again Moldavian, among others because the poem sounded also to Dutch students more East European than Malaysian, and students considered Moldavians a rather neutral group.
Thus, it is true that I wanted to do similar studies at UvA that included both replications and extensions. I sent basic plans and designs to my research assistant. Obviously the journalist received these emails and files and misinterpreted them. Actually, these files were the beginning of the task for the research assistant: “Let me know what you think, how can this be done, what do you think works best for Dutch students – and if it is impossible for you to figure this out, I can take over”. I wanted his fresh creative “Dutch” input with regard to this paradigm. My experience told me that I cannot simply transport the studies from a German to a Dutch context, rather, some cultural differences (such as food preferences or contents of stereotypes) would apply. I wanted to obtain an unbiased view on materials using the logic from the old basic study set ups to see how they fitted the new environment. Creativity research shows that you block creative thought if you tell too much in advance. In addition, telling a research assistant that using similar paradigms in Germany had already led to many successful studies would have produced tremendous pressure on him. Such pressure could produce unwanted behavior (e.g. experimenter biases) that social psychologists aim to control for. As a matter of fact, such strategy of “not telling too much” is also used in other disciplines and I teach it whenever I teach methods in Social Psychology. However, note that the studies we did at UvA were slightly different (we added modalities to the basic one modality design).
Finally, the SPSS data file from February 2013 contains the original data. Laypeople might not know this, but SPSS files get constantly updated. This, however, does not mean that the original values are changed. Rather, if you translate variable labels from German to English (like “Geschlecht” to “gender”), the file receives a new time stamp while still containing the unchanged, original values. In fact, I translated variable labels from German to English in order to make re analyses (for the investigation committee) easier. And please let me use this example to illustrate the unfortunate situation: I wanted and still want to contribute to clarifying the situation. Therefore, I changed the names of the variables from German to English. This, or at least the change in the time stamp, is now held against me. If I had not translated the variables, of course, one could have argued that, had I fabricated data, I would have used German and not English variable names (to suggest that the studies had been conducted in Germany). In any event – with German or English variable names – I would have been found “guilty”. Confirmation Bias in action!
As I said, I gave all these answers to the commissions, and I wonder why the person who passed the material to the journalist did not pass my answers as well - or why the journalist, in case he had the material, did not talk about these simple, unspectacular responses in his article. It is hard for me to believe that this selection happened in the unconscious.
In the end, the lengthy article does not convey any new relevant information. Still, there is no concrete evidence whatsoever of violation of academic integrity. However, this accumulation of negative conclusions, unintended or not, certainly affects my reputation. Note also that some concerns raised in the article were already addressed in my letters and reactions below. I explained in text #3 how I treated outliers and I reported in #1 that an UvA authority figure asked me to dump the questionnaires. Meanwhile I have witnesses for this. Moreover, a former PhD student wrote to me that s/he was asked to dump questionnaires by yet a different person from the department. Ignoring such information is another typical result of a confirmation bias.
In general, I wonder why people publish doubts about my studies that are so obviously unwarranted and that certainly harm my reputation. Many times, misrepresentations are of course due to a lack of expertise to judge the facts (how do we prevent demand characteristics? how do we prevent experimenter biases? what do we tell experimenters and why?). However, please also note that for some people my case could be profitable.
3. Response to the LOWI Report Published Last Wednesday
Jens Förster, May 11, 2014
Dear colleagues, some of you wonder how I am doing, and how I will address the current accusations. You can imagine that I have a lot of work to do, now. There are many letters to write, there are a number of emails, meetings, and phone calls. I also started the moving process. And there is my daily work.
I keep going because of the tremendous support that I experience. This is clearly overwhelming!
The publication of the LOWI report came unexpectedly, so forgive me that I needed some time to write this response. Another reason is that I still hesitate to share certain insights with the public, because I was asked to remain confidential about the investigation. It is hard for me to decide how far I can go to reveal certain reviews or results. This is especially difficult for me because the Netherlands is a foreign country to me and norms differ from my home country. In addition, this week, the official original complaint was posted to some chatrooms. Both papers raise questions, especially about my Förster et al. 2012 paper published in SPPS.
First and foremost let me repeat that I never manipulated data and I never motivated my co workers to manipulate data. My co author of the 2012 paper, Markus Denzler, has nothing to do with the data collection or the data analysis. I had invited him to join the publication because he was involved generally in the project.
The original accusation raises a few specific questions about my studies. These concerns are easy to alleviate. Let me now respond to the specific questions and explain the rules and procedures in my labs.
Origin of Studies and Lab-Organization During that Time
The series of experiments was run 1999 – 2008 in Germany, most of them in Bremen, at Jacobs University; I no longer know the specific dates of single experiments. Many studies were run with a population of university students that is not restricted to psychology students. This is how we usually recruited participants. Sometimes, we also tested guests, students in the classrooms or business people that visited. This explains why the gender distribution deviates from the distribution of Amsterdam psychology students. This distribution closely resembles the one reported in my other papers. Note that I never wrote that the studies were conducted at the UvA; this was an unwarranted assumption by the complainant. Indeed, the SPSS files on the creativity experiments for the 2012 paper include the 390 German answers. This was also explicitly noted by the expert review for the LOWI who re analyzed the data.
During the 9 years I conducted the studies, I had approximately 150 co-workers (research assistants, interns, volunteers, students, PhDs, colleagues). Note that the LOWI interviewed two research assistants that worked with me at UvA, their reports however do not reflect the typical organization at for example Bremen, where I had a much larger lab with many more co workers. However, former co workers from Bremen invited by the former UvA commission basically confirmed the general procedure described here.
At times I had 15 research assistants and more people (students, interns, volunteers, PhDs, etc.) who would conduct experimental batteries for me. They (those could be different people) entered the data when it was paper and pencil questionnaire data and they would organize computer data into workable summary files (one line per subject, one column per variable). For me to have a better overview of the effects in numerous studies, some would also prepare summary files for me in which multiple experiments would be included. The data files I gave to the LOWI reflect this: To give an example for the SPPS (2012) paper, I had two data files, one including the five experiments that included atypicality ratings as the dependent variable, and one including the seven experiments that included the creativity/analytic tasks. Coworkers analyzed the data, and reported whether the individual studies seemed overall good enough for publication or not. If the data did not confirm the hypothesis, I talked to people in the lab about what needs to be done next, which would typically involve brainstorming about what needs to be changed, implementing the changes, preparing the new study and re-running it.
Note that the acknowledgment sections in the papers are far from complete; this has to do with space limitations and with the fact that, during the long time of running the studies, some names unfortunately got lost. Sometimes I also thanked research assistants who worked with me on similar studies around the time I wrote a paper.
Amount of Studies
The organization of my lab also explains the relatively large number of studies: 120 participants were typically invited for a session of 2 hours that could include up to 15 different experiments (some of them obviously very short, others longer). This gives you 120 X 15 = 1800 participants. If you only need 60 participants this doubles the number of studies. We had 12 computer stations in Bremen, we used to test participants in parallel. We also had many rooms, such as classrooms or lecture halls that could be used for doing paper and pencil studies or studies with laptops. If you organize your lab efficiently, you would need 2-3 weeks to complete this “experimental battery”. We did approximately 30 of such batteries during my time in Bremen and did many more other studies. Sometimes, people were recruited from campus, but most of them were recruited from the larger Bremen area, and sometimes we paid their travel from the city center, because this involved at least half an hour of travel. Sometimes we also had volunteers who helped us without receiving any payment.
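The arithmetic in the paragraph above can be made explicit with a minimal sketch. It uses only the figures stated there (120 participants per session, up to 15 experiments per session, and 60 participants for a smaller study); the function name and variable names are illustrative assumptions. Note that the figure of 1800 counts participant-by-experiment data sets, not 1800 distinct people.

```python
def battery_capacity(participants_per_session, experiments_per_session,
                     participants_per_study):
    """Return (data sets collected, number of studies they can fill).

    A "data set" here means one participant completing one experiment.
    """
    data_sets = participants_per_session * experiments_per_session
    studies = data_sets // participants_per_study
    return data_sets, studies


# Figures taken from the text: 120 participants, up to 15 experiments.
full = battery_capacity(120, 15, 120)   # every study uses all 120 participants
half = battery_capacity(120, 15, 60)    # each study needs only 60 participants

print(full)  # (1800, 15)
print(half)  # (1800, 30)
```

Under these assumptions, halving the number of participants needed per study doubles the number of studies a single battery can accommodate, from 15 to 30.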
None of the Participants Raised Suspicions and Outliers
The complainant also presumes that the participants are psychology students, typically trained in psychological research methods who are often quite experienced as research participants. He finds it unlikely that none of the participants in my studies raised suspicions about the study. Indeed, at the University of Amsterdam (UvA) undergraduates oftentimes know a lot about psychology experiments and some of them might even know or guess some of the hypotheses. However, as noted before, the participants in the studies in question were neither from UvA nor were they entirely psychology students. Furthermore, the purpose of my studies and the underlying hypotheses are oftentimes difficult to detect. For example, a participant who eats granola and is asked to attend to its ingredients is highly unlikely to think that attending to the ingredients made him or her less creative. Note also that the manipulation is done between participants: other participants, in another group eat the granola while attending to its overall gestalt. Participants do not know and do not have any way to know about the other group: they do not know that the variable that is being manipulated is whether the processing of the granola is local versus global. In those circumstances it is impossible to guess the purpose of the study. Moreover, a common practice in social psychological priming studies is to use “cover stories” about the experiments, which present the manipulation and the dependent measure as two unrelated experiments. We usually tell participants that for economic reasons, we test many different hypotheses for many different researchers and labs in our one to three hour lasting experimental sessions. Each part of a study is introduced as independent from the other parts or the other studies. Cover stories are made especially believable by the fact that most of the studies and experimental sessions indeed contain many unrelated experiments that we lump together. 
And in fact, many tasks do not look similar to each other. All this explains, I think, why participants in my studies do not guess the hypothesis. That being said, it is possible that the research assistants who actually run the studies and interview the participants for suspicion, do not count as “suspicion” if a participant voices an irrelevant idea about the nature of the study. For example, it is possible that if a participant says “I think that the study tested gender differences in perception of music” it would be counted as “no suspicion raised” – because this hypothesis would not have led to a systematic bias or artifact in our data.
Similarly, the complainant wonders how come the studies did not have any dropouts. Indeed, I did not drop any outliers in any of the studies reported in the paper. What does happen in my lab, as in any lab, is that some participants fail to complete the experiment (e.g., because of computer failure, personal problems, etc.). The partial data of these people is, of course, useless. Typically, I instruct RAs to fill up the conditions to compensate for such data loss. For example, if I aimed at 20 participants per condition, I will make sure that these will be 20 full-record participants. I do not report the number of participants who failed to complete the study, not only because of journals’ space limitations, but also because I do not find this information informative: when you exclude extreme cases, for example, it could be informative to write what the results would look like had they not been excluded. But you simply have nothing to say about incomplete data.
Size of Effects
The complainant wonders about the size of the effects. First let me note that I generally prefer to examine effects that are strong and that can easily be replicated in my lab as well as in other labs. There are many effects in psychology that are interesting but weak (because they can be influenced by many intervening variables, are culturally dependent, etc.) - I personally do not like to study effects that replicate only every now and then. So, I focus on those effects that are naturally stable and thus can be further examined.
Second, I do think that theoretically, these effects should be strong. In studying global/local processing, I thought I was investigating basic effects that are less affected by moderating variables. It is a common wisdom in psychology that perceptual processes are less influenced by external variables than, for example, achievement motivation or group and communication processes. All over the world people can look at the big picture or at the details. It is what we call a basic distinction. Perception is always the beginning of more complex psychological processes. We perceive first, and then we think, feel, or act. Moreover, I found the global/local processing distinction exciting because it can be tested with classic choice or reaction time paradigms and because it is related to the neurological processes. I expected the effects to be big, because no complex preconditions have to be met (in contrast to other effects, that occur, for example, only in people that have certain personality traits). Finally, I assume that local (or global) processing styles are needed for analytic (or creative) processing; without them there is no creativity or analytic thought. If I trigger the appropriate processing style versus the antagonistic processing style, then relatively large effects should be expected. Note also that the same effect can be obtained by different routes, or processes that could be potentially provoked by the experimental manipulation. My favorite one is that there are global versus local systems that are directly related to creativity. However, others suggested that a global processing style triggers more intuitive processing – a factor that is known to increase creativity in its own right. Yet others suggested that global processing leads to more fluid processing, yet a third factor that could produce our effects. Thus, the same manipulation of global (vs. local) processing could in principle trigger at least three processes that may produce the same effect in concert. From this perspective too, I believe that one would expect rather big effects.
Moreover, the sheer replicability of the effects further increased my confidence. I thought that the relatively large number of studies secures against the possibility of artifacts. My confidence explains why I did not question the results nor did I suspect the data. Of course I do thorough checks, but I could not see anything suspicious in the data or the results. Moreover, a large number of studies conducted in other labs found similar effects. The effects seem to (conceptually) replicate in other labs as well.
Dependent Measure of Analytic Task in the 2012 SPPS Paper
The complainant further wonders why performance on the analytic tasks was in general so poor for undergraduates and fell below chance level. The complainant probably assumes that because the task is given in a multiple-choice format with five alternatives, there is a 0.2 probability of answering each single question correctly by chance. However, in our experiment, participants had only 4 minutes to do the task. If a participant was stuck on the first question, did not solve it correctly, and did not even attempt questions 2-4 (which happened a lot), then we considered all 4 responses incorrect, and the participant received a score of 0. In other words, participants were not forced to circle an answer for every question, but could leave questions unanswered; we counted these as “not solved” and thus “incorrect”. I think that there is no meaningful way to compute the chance level of answering the questions in these studies.
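The point about chance level can be illustrated with a minimal sketch. It assumes a deliberately simplified, hypothetical model that is not part of the original studies: every attempted question is answered by blind guessing among five alternatives, and every unattempted question is scored as incorrect. The function name and parameters are invented for illustration only.

```python
from fractions import Fraction


def expected_proportion_correct(attempted, total=4, options=5):
    """Expected share of correct answers under pure guessing.

    Hypothetical model: each attempted question is guessed blindly among
    `options` alternatives; unattempted questions count as incorrect.
    """
    if not 0 <= attempted <= total:
        raise ValueError("attempted must be between 0 and total")
    # Share of questions attempted, times the per-question guessing rate.
    return Fraction(attempted, total) * Fraction(1, options)


# Expected proportion correct for 0 to 4 attempted questions.
for k in range(5):
    print(k, expected_proportion_correct(k))
```

Under this model the familiar 0.2 baseline is reached only when all four questions are attempted. Since the number of questions each participant attempted within the 4 minutes is not fixed, no single chance level applies to the whole sample.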
The LOWI found the statistical analyses by the experts convincing. However, note that after almost 2 years of meticulous investigation, they did not find any concrete or behavioral evidence for data manipulation. The LOWI expert who did the relevant analysis always qualifies his methods, even though he is concerned about odd regularities, too. However, after having described his analysis, he concludes:
“Het is natuurlijk mogelijk dat metingen het waargenomen patroon vertonen.”
---->It is of course possible that the observed pattern was obtained by measurements.
This reviewer simply expresses an opinion that I have kept repeating ever since my first letter to the UvA commission: statistical methods are not error-free. The choice of methods determines the results. One statistician wrote to me: “Lottery winners are no fraudsters, even though the likelihood of winning the lottery is 1 in 14 million.”
I understand from the net that many agree with the analyses. However, I have also received emails from statisticians and colleagues criticizing the fact that such analyses are the major basis for this negative judgment.
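The statistician's lottery remark can be made concrete with a small sketch. It assumes a "6 out of 49" lottery, which matches the quoted figure of roughly 1 in 14 million but is not named in the quote, and it assumes independently and randomly chosen tickets. Both function names are hypothetical.

```python
import math


def lottery_combinations(pool=49, picks=6):
    """Number of equally likely draws (assumed 6-out-of-49 format)."""
    return math.comb(pool, picks)


def prob_at_least_one_winner(n_tickets, pool=49, picks=6):
    """Probability that at least one of n independent random tickets wins."""
    p_single = 1 / lottery_combinations(pool, picks)
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(n_tickets * math.log1p(-p_single))


print(lottery_combinations())
print(prob_at_least_one_winner(14_000_000))
```

The sketch shows only that an event which is extremely unlikely for one given ticket becomes quite likely to happen to someone once many tickets are in play. It makes no claim about the analyses discussed in this letter.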
I even received more concrete advice suggesting that the methods the complainant used are problematic.
To give some examples, international colleagues wonder about the following:
1) They wonder whether the complainant selected the studies he compared my studies with in a way that would help the low likelihoods to come out.
2) They wonder whether the chosen comparison studies are really comparable with my studies. My answer is “no”. I do think that the complainant is comparing “apples with oranges”. This concern has been raised by many in personal emails to me. It concerns a general criticism with a method that made sense a couple of years ago; now many people consider the choice of comparison studies problematic.
3) They are concerned about hypothesis derivation. There are thousands of hypotheses in the world, why did the complainant pick the linearity hypothesis?
4) They complain that no justification whatsoever was provided for the methods used in the analyses, and that alternatives are not discussed (as one would expect from any scientific paper). They also wonder whether the data met the typical requirements for the analyses used.
5) They mentioned that the suspicion is repeatedly raised based on unsupported assumptions: data are simply considered “not characteristic for psychological experiments” without any further justification.
6) They find the likelihood of 1:trillion simply rhetorical.
7) Last but not least, in the expert reviews, only some questionable research practices (QRPs) were examined. Some people wondered whether this list is exhaustive and whether “milder” practices than fraud could have led to the results. Note, however, that I never used QRPs; if they were used, I unfortunately have to assume that co-workers in the experiments used them.
Given that there exist deviating opinions, and that many experts raise concerns, I am concerned that the analyses conducted on my paper need to be examined in more detail before I would retract the 2012 paper. I just do not want to jump to conclusions now. I am even more concerned that this statistical analysis was the main basis to question my academic integrity.
Can I Exclude Any Conceivable Possibility of Data Manipulation?
Let me cite the LOWI reviewer:
“Ik benadruk dat uit de datafiles op geen enkele manier is af te leiden, dat de bovenstaande bewerkingen daadwerkelijk zijn uitgevoerd. Evenmin kan gezegd worden wanneer en door wie deze bewerkingen zouden zijn uitgevoerd.”
---->I emphasize that from the data files one can in no way infer that the above adjustments have actually been done. Nor can it be said when and by whom such adjustments would have been done.
Moreover, asked whether there is behavioral evidence for fraud in the data, the LOWI expert answers:
“Het is onmogelijk, deze vraag met zekerheid te beantwoorden. De data files geven hiertoe geen nieuwe informatie.”
---->It is not possible to answer this question with certainty. The data files give no new information on this issue.
Let me repeat that I never manipulated data. However, I can also not exclude the possibility that the data has been manipulated by someone involved in the data collection or data processing.
I still doubt it and hesitated to elaborate on this possibility because I found it unfair to blame somebody, even if in this non-specific way. However, since I have not manipulated data, I must say that in principle it could have been done by someone else. Note that I taught my assistants all the standards of properly conducting studies and fully reporting them. I always emphasized that the assistants are not responsible for the results, but only for conducting the study properly, and that I would never accept any “questionable research practices”. However, theoretically, it is possible that somebody worked on the data. It is possible, for example, that some research assistants want to please their advisors or want to get their approval by providing “good” results; maybe I underestimated such effects. For this project, it was obvious that ideally, the results would show two significant effects (global > control; control > local), so that both experimental groups would differ from the control group. Maybe somebody adjusted data so that they would better fit this hypothesis.
The LOWI expert was informative with respect to the question of how this could have been done. S/he said that it is easy to adjust the data, by simply lowering the variance in the control groups (deleting extreme values) or by replacing values in the experimental groups with more extreme values. Both procedures would perhaps bring the data closer to linearity and are easy to do. One may speculate that, for example, a co-worker might have run more subjects than I requested in each condition and replaced or deleted “deviant” participants. To suggest another possibility, maybe somebody reran control groups or picked control groups with low variance out of a pool of control groups. Of course this is all speculation and there might be other possibilities that I cannot even imagine or cannot see from this distance. Obviously, I would never have tolerated any behavior such as this, but it is possible that something has been done with the goal in mind of having significant comparisons to the control group, thereby inadvertently arriving at linear patterns.
Theoretically, such manipulation could have affected a series of studies, since, as I described above, we put different studies into summary files in order to see differences, to decide what studies we would need to run next or which procedural adjustments (including different control variables etc.) we would have to make for follow ups. Again, I repeat that this is all speculation, I simply try to imagine how something could have happened to the data, given the lab structure back then.
During the time of investigation I tried to figure out who could have done something inappropriate. However, I had to accept that there is no chance to trace this back; after all, the studies were run more than 7 years ago and I am not even entirely sure when, and I worked with too many people. I also do not want to point to people just because they are for some reason more memorable than others.
Responsibility for Detecting Odd Patterns in my Data
Finally, one point of accusation is:
“3. Though it cannot be established by whom and in what way data have been manipulated, the Executive Board adopts the findings of the LOWI that the authors, and specifically the lead author of the article, can be held responsible. He could or should have known that the results (`samenhangen`) presented in the 2012 paper had been adjusted by a human hand.”
I did not see the unlikely patterns; otherwise I would not have sent these studies to the journals. Why would I take such a risk? I thought that they were unproblematic and reflected actual measurements.
Furthermore, in her open letter, Prof. Dr. Nira Liberman (see on this page #2) says explicitly how difficult it is to see the unlikely patterns. I gave her the paper without telling her what might be wrong with it and asked her to find a mistake or an irregularity. She did not find anything. Moreover, the reviewers, the editor and many readers of the paper did not notice the pattern. The expert review also says on this issue:
Het kwantificeren van de mate waarin de getallen in de eerste rij van Tabel A te klein zijn, vereist een meer dan standaard kennis van statistische methoden, zoals aanwezig bij X, maar niet te verwachten bij niet- specialisten in de statistiek.
---->Quantifying the degree to which the numbers in the first row of Table A are too small requires a more than standard knowledge of statistical methods, a knowledge that X has, but that one cannot expect from non-specialists in statistics.
I can only repeat: I did not see anything odd in the pattern.
This is a very lengthy letter and I hope it clarifies how I did the study, and why I believe in the data. Statisticians asked me to send them the data and they will further test whether the analyses used by the expert reviewer and by the complainant are correct. I am also willing to discuss my studies within a scientific setting. Please understand that I cannot visit all chatrooms that currently discuss my research. It would also be simply too much to respond to all questions there and to correct all the mistakes. Many people (also in the press) confuse LOWI reports or even combine several ones; and some postings are simply too personal.
This is also the reason why I will not post the data on the net. I thought about it, but my current experience with “the net” prevents me from doing this. I will share the data with scientists who want to have a look at it and who are willing to share their results with me. But I will not leave it to an anonymous crowd that can post whatever it wants, including incorrect conclusions and insults.
I would like to apologize to everyone that I caused so much trouble with my publication. I hope that in the end we can only learn from this. I definitely learned my lesson and will help to work on new rules and standards that make our discipline better. I would like to go back to work.
Regards, Jens Förster
2. Letter by Prof. Dr. Nira Liberman, Tel Aviv, May 4, 2014
Brief von Prof. Dr. Nira Liberman, Tel Aviv, 4. Mai 2014
Let me first identify myself as a friend and a collaborator of Jens Förster. If I understand correctly, in addition to the irregular pattern of data, three points played a major role in the national committee’s conclusion against Jens: That he could not provide the raw data, that he claimed that the studies were actually run in Germany a number of years before submission of the papers, and that he did not see the irregular pattern in his results. I think that it would be informative to conduct a survey among researchers on these points before concluding that Jens’ conduct in these regards is indicative of fraud. (In a similar way, it would be useful to survey other fields of science before concluding anything against social psychology or psychology in general.) Let me volunteer my responses to this survey.
Providing raw data
Can I provide the original paper questionnaires of my studies published in the last five years or the original files downloaded from the software that ran the studies (e.g., Qualtrics, Matlab, Direct-Rt) dated with the time they were run? No, I cannot. I asked colleagues around me, they can’t either. Those who think they can would often find out upon actually trying that this is not the case. (Just having huge piles of questionnaires does not mean that you can find things when you need them.) I am fairly certain that I can provide the data compiled into workable data files (e.g., Excel or SPSS data files). Typically, research assistants rather than primary investigators are responsible for downloading files from running stations and/or for coding questionnaires into workable data files. These are the files that Jens provided the investigating committees upon request. It is perhaps time to change the norm, and request that original data files/original questionnaires are saved along with a proof of date for possible future investigations, but this is not how the field has operated. Until a few years ago, researchers in the field cared about not losing information, but they did not necessarily prepare for a criminal investigation.
Publishing old data
Do I sometimes publish data that are a few years old? Yes, I often do. This happens for multiple reasons: because students come and go, and a project that was started by one student is continued by another student a few years later; because some studies do not make sense to me until more data cumulate and the picture becomes clearer; because I have a limited writing capacity and I do not get to write up the data that I have. I asked colleagues around me. This happens to them too.
The published results
Is it so obvious that something is wrong with the data in the three target papers for a person not familiar with the materials of the accusation? I am afraid it is not. That something was wrong never occurred to me before I was exposed to the argument on linearity. Excessive linearity is not something that anybody checks the data for.
Let me emphasize: I read the papers. I taught some of them in my classes. I re-read the three papers after Jens told me that they were the target of accusation (but before I read the details of the accusation), and after I read the “fraud detective” paper by Simonsohn (2013; “Just Post It: The Lesson From Two Cases of Fabricated Data Detected by Statistics Alone”), and I still could not see what was wrong. Yes, the effects were big. But this happens, and I could not see anything else.
The commission concluded that Jens should have seen the irregular patterns and thus can be held responsible for the publication of data that includes unlikely patterns. I do not think that anybody can be blamed for not seeing what was remarkable with these data before being exposed to the linearity argument and the analysis in the accusation. Moreover, it seems that the editor, the reviewers, and the many readers and researchers who followed up on this study also did not discover any problems with the results or, if they discovered them, did not regard them as problematic.
And a few more general thoughts: The studies are well cited and some of them have been replicated. The theory and the predictions it makes seem reasonable to me. From personal communication, I know that Jens is ready to take responsibility for re-running the studies and I hope that he gets a position that would allow him to do that. It will take time, but I believe that doing so is very important not only personally for Jens but also for the entire field of psychology. No person and no field are mistake proof. Mistakes are no crimes, however, and they need to be corrected. In my career, somehow anything that happens, good or bad, amounts to more work. So here is, it seems, another big pile of work waiting to be done.
1a. Dies ist meine Reaktion auf den NRC Artikel, erschienen am 29. April 2014
Man möge mir verzeihen, dass ich auf Deutsch schreibe, aber dies ist nun einmal die Sprache, in der ich mich am besten ausdrücke. Es folgt alsbald eine Englische Übersetzung. Ich bitte um Verzeihung, dass ich derzeit keine Niederländische Übersetzung liefern kann.
In der Niederländischen Zeitschrift NRC ist heute ein Artikel erschienen, der ein Ethikverfahren gegen mich zusammenfasst, das an der Universität van Amsterdam (UvA) im September 2012 gegen mich eröffnet wurde. Das Verfahren hatte ein Kollege aus der Methodenlehre gegen mich eingeleitet, weil er auffällige Regelmäßigkeiten in drei meiner Veröffentlichungen festgestellt hatte. In einem ersten, vorläufigen Urteil der UvA wurde kein wissenschaftliches Fehlverhalten festgestellt, jedoch wurde ich gebeten, „Notes of concern“ an die entsprechenden Herausgeber zu schicken, um auf die Muster hinzuweisen. Der Kläger reichte darauf eine Klage bei der nationalen Ethikkommission, der LOWI ein, weil ihm das Urteil zu milde ausfiel. Diese kam kürzlich zu einem negativeren Urteil und geht von wissenschaftlichem Fehlverhalten aus, vor allem weil die Muster, so die statistischen Analysen des Klägers, zu unwahrscheinlich wären. Konkrete Evidenz für Manipulation gäbe es allerdings nicht. Die LOWI schlägt vor, einen Artikel aus dem Jahre 2012 zurückzuziehen. Letzte Woche schloss sich die UvA diesem Urteil weitgehend an, weist aber nachdrücklich darauf hin, dass niemand sagen kann, wer das getan haben könnte und wie die Daten manipuliert wurden. Ich sei allerdings verantwortlich für die Veröffentlichung und hätte die Auffälligkeiten sehen müssen oder können. Die UvA will versuchen, einen Artikel, erschienen 2012, zurück zu ziehen; als Grund wird die statistische Analyse angeführt.
Die rasche Veröffentlichung des Untersuchungsergebnisses der UvA und der LOWI kamen vollkommen überraschend, genauso wie die negative Bewertung meines Verhaltens. Da die LOWI kaum neue Informationen gehabt hat als die vorige Kommission, und da ich nichts Betrügerisches getan habe, erwartete ich einen glatten Freispruch. Das jetzige Urteil ist ein entsetzliches Fehlurteil und für mich absolut nicht zu begreifen. Ich bezweifle auch, dass Kollegen dies nachvollziehen können.
Ich fühle mich als das Opfer einer aus dem Ruder gelaufenen Hexenjagd, die auf Psychologinnen und Psychologen nach der Stapel-Affäre ausgerufen wurde.
Stapel hatte vor drei Jahren Daten frei erfunden und verständlicherweise besonders in den Niederlanden eine wahre Hysterie bewirkt, eine Situation, in der jeder jeden verdächtigt.
Um es klar zu sagen: ich habe weder Daten manipuliert noch meine Mitarbeiter dazu angehalten, Daten zu schönen. Der Ko-Autor Markus Denzler hat mit der Datensammlung und der Analyse nichts zu tun. Ich hatte ihn eingeladen, an der Veröffentlichung teilzunehmen, weil er im Allgemeinen am Thema beteiligt war.
Dementsprechend liegt auch nach über eineinhalb Jahren akribischer Untersuchungstätigkeit von Seiten der LOWI oder der UvA überhaupt kein einziger konkreter Beweis für Fälschung vor. Das einzige, was man mir vorwerfen kann, und das habe ich mehrere Male bereut, ist, dass ich Fragebogen (die übrigens älter als 5 Jahre waren und allesamt in Datenfiles übertragen worden waren) nach einem Umzug in ein viel zu kleines Zimmer weggeworfen habe. Dies geschah auf Anraten eines Kollegen, der mit den Niederländischen Gepflogenheiten vertraut ist. Dies alles geschah, bevor bekannt wurde, dass Diederik Stapel seine Fragebogen erfunden hatte. Es war eine Zeit voll Vertrauen und es galt: wenn Du die kodierten Daten im Computer hast, ist das mehr als genug. Zu Erklärung: die meisten Daten werden sowieso am Computer erhoben, d.h. sie werden direkt in analysebereite Datenformate übertragen. Jedoch hätte mir in meinem Fall auch das Vorweisen von Fragebögen wenig geholfen: Der Kläger ist so überzeugt von der Richtigkeit seiner Analysen, dass er mir hätte unterstellen müssen, ich hätte die Fragebögen gefälscht. Meine Daten wurden bereits nachgerechnet und akribisch überprüft. Die Ergebnisse der Untersuchung, auf die sich die Urteile der LOWI und der UvA ebenfalls stützen (den Namen des Gutachters nenne ich aus Geheimhaltungsgründen nicht) sind folgende:
*die Daten sehen vollkommen realistisch aus
*ich habe alle Analysen richtig gerechnet und wahrheitsgemäß berichtet
*alle Informationen der Fragebögen sind in dem Datenfile enthalten
*die Befunde sind tatsächlich unwahrscheinlich, aber können ebensogut durch tatsächliche Erhebungen zu Stande gekommen sein
*es ist immer möglich, so der Gutachter, dass ungewöhnliche Befunde in der Psychologie erst später erklärt werden können
*falls manipuliert wurde, was nicht mit Sicherheit gesagt werden kann, dann ist überhaupt nicht deutlich, wer es getan hat und wie das geschehen ist
Aufgrund dieser Bewertung hatte ich fest mit einem Freispruch gerechnet und kann die Urteile der LOWI und der UvA nicht nachvollziehen.
Nach dem großen Skandal vor drei Jahren hat sich vieles in der Psychologie geändert, zu Recht. Wir haben andere Standards entwickeln müssen, archivieren, versuchen, so transparent wie möglich zu sein. An der UvA herrschen bald die strengsten Regeln für das Durchführen, das Analysieren und das Archivieren von Daten und das ist auch richtig so.
Man kann dieses Urteil also streng und ahistorisch nennen. Zumindest lässt die Härte der Beurteilung Fragen offen. Auch die Schlussfolgerung, dass das Vernichten von Bögen auf Täuschung hinweist, ist absurd. Es kann auch schlichtweg damit zu tun haben, dass man aufräumen wollte, oder keinen Platz hatte, oder die Archivierung für irrelevant hielt, oder dass man kaum Ressourcen für diese Arbeit übrig hatte. Trotzdem bereue ich mein Verhalten. Ich werde in der Zukunft strengste Kontrolle über die in meinen Labors ablaufenden Prozesse haben. Absolute Transparenz ist für unsere Disziplin schnell zur Selbstverständlichkeit geworden. Mein Fall macht das wieder einmal deutlich.
Der zweite Punkt betrifft die statistischen Analysen des Klägers, die nahelegen, dass die Resultate „zu gut“ aussehen. Seine Analysen und späteren Schreiben klingen so, als gäbe es überhaupt keine andere Interpretation als die, dass Daten manipuliert wurden. Diese starken Schlussfolgerungen sind nicht adäquat. Methodenlehre ist eine Wissenschaft, d.h. Methoden werden wissenschaftlich diskutiert, es gibt immer bessere und weniger gute und die meisten sind fehlerbehaftet. Die vom Kläger verwendeten Methoden sind auch durchaus Teil einer gerade lebhaft stattfindenden wissenschaftlichen Diskussion. Methoden sind zudem auch immer abhängig vom Inhalt der Forschung und hier ließ der Kläger an mehreren Stellen sehen, dass er überhaupt nicht verstand, um was es in meiner Forschung geht. Andere Gutachten kommen zu ganz anderen, stärker qualifizierenden Bewertungen (s.o.): Die Ergebnisse sind unwahrscheinlich aber möglich.
Kurzum, die Schlussfolgerung, dass ich Daten manipuliert habe, wurde nicht bewiesen, sondern bleibt eine Schlussfolgerung auf der Basis von Wahrscheinlichkeiten.
Ich ging davon aus, dass die Unschuldsvermutung gilt und dass die Beweislast beim Kläger liegt. So verstehe ich Recht. LOWI und UvA stützen sich auf die hohe Unwahrscheinlichkeit, errechnet durch Analyseverfahren, die morgen schon obsolet sein können. Die UvA räumt dann auch ein, dass nicht klar ist, wer, wenn überhaupt Hand angelegt hat. Sie hält mich jedoch für verantwortlich. Ich hätte sehen können oder müssen, dass etwas an den Daten merkwürdig ist.
Dem widerspreche ich: ich habe die Regelmäßigkeiten nicht gesehen. Zwei Gutachten der LOWI und der UvA bestätigen das auch: Sie sagen, dass es schwierig bis unmöglich wäre, die Auffälligkeiten zu erkennen, wenn man nicht ein Experte auf diesem Gebiet wäre. Zudem wurde weder vom Herausgeber der Zeitschrift, noch von unabhängigen Gutachtern (peer reviews) etwas Auffälliges bemerkt.
Zudem sprechen externe Merkmale für die Echtheit der von mir gefundenen Phänomene. Die Befunde wurden in internationalen Labors repliziert, was bedeutet, dass die Phänomene, die ich zeige, wiederholbar sind und Substanz haben. Viele Wissenschaftler haben ihre eigenen Untersuchungen auf meinen aufgebaut. Meine Arbeiten leiden nicht an der Replication Crisis (e.g., die Unmöglichkeit, Daten zu replizieren).
UvA wie LOWI schlagen nun vor, den in 2012 erschienenen Artikel zurückzuziehen. Ich habe im Prinzip keine Probleme damit, Artikel zurückzuziehen, stimme inhaltlich aber keineswegs zu. Wenn ich mich meines eigenen Verstandes bediene, und das sollte ich als Wissenschaftler, dann sehe ich für mich keinen ausreichenden Grund zu der Annahme, dass etwas falsch ist. Die statistischen Gutachten sagen eindeutig, dass die Befunde möglich, aber unwahrscheinlich sind. Allein die Analysen des Klägers sind voller übertriebener Schlussfolgerungen und diese sind schlichtweg nicht ausreichend, um einen Artikel zurückzuziehen. Ich überlasse dem Herausgeber der Zeitschrift, dies zu entscheiden.
Zusammenfassend begreife ich das Urteil nicht. Ich kann mir auch nicht vorstellen, dass es viele andere, inhaltlich arbeitende Psychologen verstehen werden. Für mich ist das Verfahren deshalb nicht abgeschlossen. Ich werde einen Brief an die UvA und die LOWI schreiben (Artikel 13 erlaubt dies), in dem ich an Informationen aus den Gutachten erinnern möchte, die entweder vergessen wurden oder zu wenig Beachtung fanden. Zudem gilt es nun, die statistischen Analysen des Klägers, von denen er selbst sagt, sie seien nicht zu veröffentlichen, weil zu provokativ, eingehend zu prüfen. Ich habe mich bisher dabei zurückgehalten, andere mit einzubeziehen, weil ich die Geheimhaltungspflicht sehr Ernst genommen habe. Einige Punkte aus der Analyse des Klägers lassen sich nämlich vermutlich widerlegen. Aus dem Munde des Klägers klingt es dabei so, als wären die Gutachter alle gleicher Meinung, sie sind jedoch weit entfernt davon. Während der Kläger in seiner Ursprungsklage die Linearität für absolut unwahrscheinlich hält, wird später behauptet, dass Linearität in der wirklichen Welt wohl vorkomme.
Ich finde es sehr störend, dass durch die vorschnelle Veröffentlichung des UvA-Berichts (es war eigentlich mehrere Male zur Geheimhaltung angehalten worden, um meinen Ruf zu schützen) nun eine Dynamik ins Rollen kommt, die nur nachteilig für mich sein kann. Meine Möglichkeiten, gegen ein Urteil, das meiner Meinung nach ein Fehler ist, sensibel und in einem geeigneten Kommunikationsrahmen vorzugehen, werden nun beschränkt. Ich kann das Urteil nicht akzeptieren und hoffe, dass es auch in der derzeitigen, sehr angespannten Stimmung, die durch die Veröffentlichung sicherlich noch angepeitscht wird, möglich sein wird, den Dialog mit LOWI und UvA wieder aufzunehmen.
Hochachtungsvoll, Jens Förster
P.S. Zu guter Letzt möchte ich mich bei allen bedanken, die mich in den letzten eineinhalb Jahren unterstützt und begleitet haben. Es war keine leichte Zeit, aber ich habe sie überstanden. Ich liebe Euch.
1 b. This is an English translation of my reaction to a newspaper article that appeared in the Dutch newspaper NRC about me.
Today, an article appeared in the Dutch newspaper “NRC” summarizing an investigation into my academic integrity that was opened in September 2012. The case was opened because a colleague from the methodology department at the University of Amsterdam (UvA) observed some regularities in the data of three articles that are supposedly highly unlikely. The UvA commission decided in a first, preliminary evaluation that there was no evidence of academic misconduct, but that I should send “cautionary notes” to the respective editors of the journals pointing to these unlikely regularities. The complainant then filed a further complaint with the national ethics board, the LOWI, because he found the evaluation too mild. Recently, the LOWI finished its investigation, arriving at a more negative evaluation and finding that academic misconduct must have taken place, mainly because the patterns are so unlikely. Concrete evidence for fraud, however, has not been found. The LOWI also recommended retracting one of the papers, which was published in 2012. Last week, the UvA accepted this advice but pointed to the fact that nobody could say who manipulated the data and how this could have taken place. However, I would be responsible for the data I published because I should or could have seen the odd pattern. The UvA will try to have the 2012 paper retracted based on the statistical analyses provided during the investigation.
The rapid publication of the results of the LOWI and UvA case happened quite unexpectedly, and the negative evaluation came unexpectedly, too. Note that we were all sworn to secrecy by the LOWI, so please understand that I have to write this letter at very short notice. Because the LOWI, from my point of view, did not receive much more information than was available for the preliminary UvA evaluation, and because I never did anything even vaguely related to questionable research practices, I expected a verdict of not guilty. The current judgment is a terrible misjudgment; I do not understand it at all, and doubt that my colleagues will understand it.
I do feel like the victim of an incredible witch hunt directed at psychologists after the Stapel affair. Three years ago, we learned that Diederik Stapel had invented data, leading to an incredible hysteria, and understandably, this hysteria was especially strong in the Netherlands. From this point on, everybody looked suspicious to everybody.
To be as clear as possible: I never manipulated data and I never motivated my co-workers to manipulate data. My co-author of the 2012 paper, Markus Denzler, has nothing to do with the data collection or the data analysis. I had invited him to join the publication because he was involved generally in the project.
Consistently, no concrete evidence for manipulation could be found by LOWI or UvA, even after a meticulous investigation lasting one and a half years. The only thing that can be held against me is the dumping of questionnaires (which, by the way, were older than 5 years and were all coded in the existing data files) because I moved to a much smaller office. I expressed my regret about this several times in front of the commissions. However, this was suggested by a colleague who knew the Dutch standards with respect to archiving. I have to mention that all this happened before we learned that Diederik Stapel had invented many of his data sets. This was a time of mutual trust and the general norm was: “if you have the data in your computer, this is more than enough”. To explain: most of the data is collected at the computer anyway and is directly transferred to summary files that can be immediately analyzed. Note, however, that having the questionnaires would not have helped me in this case: the complainant is so confident that he is right that he would have had to argue that I faked the questionnaires. In this way, I am guilty in any event. My data files were sent to the commissions and have been re-analyzed and tested in detail. The results of the re-analysis and the investigations were:
*the data is real
*the analyses I did are correct and are correctly reported
*all information from the questionnaires is in the data files
*the results are indeed unlikely but possible and could have been obtained by actual data collection
*it is always possible (according to the reviewer) that we will understand odd patterns in psychology at a later point in time
*if data manipulation took place, something that cannot even be decided on the basis of the available data, it cannot be said who did it and how it was done.
Based on this evaluation, I expected a verdict of not guilty; I cannot understand the judgments by LOWI and UvA.
After the big scandal three years ago, many things in psychological research have changed, and this is a good thing. We had to develop new standards for archiving and for conducting our research in the most transparent manner. At UvA we now have the strictest rules one can imagine for conducting, analyzing, and archiving research, and this is a good thing.
One can consider the judgment by LOWI and UvA as very strict and ahistorical. At the very least, the harshness of the judgment makes one wonder. Moreover, the conclusion that dumping questionnaires necessarily indicates fraud is absurd. It can simply have happened because one wanted to clean up the room, or because one wanted to make room, or because archiving was considered less relevant, or because there were no resources left. Nonetheless, I regret my behavior. I will of course keep strict control over the procedures in my lab in the future. Absolute transparency is self-evident for our discipline. My case is a cruel example of why this is important.
The second basis for the judgment is the statistical analyses by the complainant, suggesting that the results look “too good to be true”. His analyses and later writings sound as if there were no other interpretation of the data than data manipulation. These stark conclusions are inadequate. Methodology is a science; that is, methods are discussed scientifically, there are always better and worse methods, and many of them are error-prone. The methods used by the complainant are currently the subject of a lively scientific discussion. Moreover, methods always depend on the content of the research. During the investigation, I observed several times that the complainant had no idea whatsoever what my research was actually about. Other reviewers came to different, more qualified conclusions (see above): the results are unlikely but possible.
In short: The conclusion that I manipulated data has never been proved. It remains a conclusion based on likelihoods.
I assumed that the presumption of innocence prevails and that the burden of proof is on the accuser. This is how I understand justice. LOWI and UvA base their evaluation on analyses that could be obsolete as early as tomorrow (and they are currently being discussed). The UvA states that it is not clear who could have manipulated data if this had been done. But UvA thinks that I am still responsible: I should have or could have seen that something is odd with the data. Note that I did not see these regularities. Two reviews sent to LOWI and UvA explicitly state that it is difficult or impossible for a non-expert in statistics to see the regularities. Moreover, neither the editor of the journal nor the independent peer reviewers noticed anything odd.
In addition, external markers speak for the validity of the phenomena that I discovered. The results were replicated in many international laboratories, meaning that the phenomena I found can be repeated and have some validity. Many scientists have built their research on the basis of my findings. My work is a counterexample to the current “replication crisis”.
UvA and LOWI suggest retracting the 2012 article. In principle, I have no problem with retracting articles, but in terms of content I do not agree with this at all. I do not see any sufficient reason for doing so. The last statistical review says explicitly that the results are possible yet unlikely. Only the analyses by the complainant are full of exaggerated conclusions, and I simply cannot take them as a valid basis for retracting an article. I will leave it to the editor of the journal to decide whether he wants to retract the paper.
In summary, I do not understand the evaluation at all. I cannot imagine that psychologists who work on theory and actual psychological phenomena will understand it. For me, the case is not closed. I will write a letter to UvA and LOWI (see article 13 of the regulations), and will remind the commissions of the information that they overlooked or that did not get their full attention. Moreover, now is the time to test the statistical analyses by the complainant. Note that he stated that those would not be publishable because they would be “too provocative”. Until now I hesitated to hand his analysis over to other statisticians, because confidentiality in this case was important to me. I believe some of his arguments will be challenged. Listening to the complainant, one gets the impression that the current reviews all confirm his views. However, this is not the case. For example, in his original complaint he stated that linearity is completely unlikely, and later we learned that linearity does exist in the real world.
I find it disturbing that the premature publication of the UvA report set in motion a dynamic that is clearly disadvantageous to me. My options for arguing, in a sensible way and within an appropriate communication frame, against an evaluation that is wrong from my perspective are now drastically limited. However, I cannot accept the evaluation, and I hope that in the current tense atmosphere, which has been partly fueled by the premature publication, it will still be possible to restart the dialogue with UvA and LOWI.
Regards, Jens Förster
P.S. Finally, I would like to thank all the friends who supported me through the last one and a half years. It has been quite a difficult time for me but see, I survived. I love you.