1 a. Reaction to NRC article, April 29, 2014 (JF)
1 b. Reaction to the NRC article that appeared in the Dutch NRC on April 29, 2014 (JF)
2. Letter by Prof. Dr. Liberman, May 4, 2014
3. Reaction to LOWI Report, May 11, 2014 (JF)
4. Reaction to Science Article, May 29, 2014 (JF)
4. Reaction to Science Article, May 29, 2014
Dear colleagues, dear friends,
As you may have heard, Frank van Kolfschooten (the journalist who published the first NRC article on my "case" some hours after the UvA report was published, and who also wrote the articles that appeared in Science magazine and the Süddeutsche Zeitung) continues to investigate my case, citing in his recent Science magazine article from May 29 an email conversation between me and a former research assistant. In his new article, the author presents his idea that the studies that I reported had not been done in Germany, but rather in Amsterdam.
Even though printing a conversation with a student in public might be seen as questionable behavior, I am glad that he published this because it illustrates the Kafkaesque situation I am in, in which everything, even standard and viable ideas and decisions, is turned against me. As social psychologists we know that if people are convinced about certain facts, this shapes their world views and biases their information search and interpretation (also known as confirmation bias). Eventually, they perceive everything as consistent with their hypotheses. This, however, can happen unconsciously.
However, the concerns the article might raise can be easily addressed.
First, let me say that all these emails and indications (that are actually based on private investigations by the complainant who had a similar hypothesis in mind) had been examined by the National Ethics Commission (the LOWI). Apparently, all these concerns were unwarranted; this is the reason why they do not even show up in the final evaluation.
If you again wonder about the procedures, I can only tell you: Yes, it is true that the complainant who at the same time was the major expert in statistics during the investigation also interviewed my former research assistants secretly. And yes, it is true that we do not know how s/he asked questions and what questions exactly s/he asked. These conversations are presented out of context.
Second, I do not understand why conducting experiments at UvA logically excludes the possibility that I had done similar ones in Germany. Note that the JEP:G 2009 paper was submitted in March 2008, and I arrived in Amsterdam in the summer of 2007 – this would have been a rather short time to do all the studies. Note also that in the studies participants had to compare, for example, “heute” and “tagesschau” – two German news shows that are rather unfamiliar to Dutch participants. Note further that the 2012 SPPS paper contains 390 solutions for a creativity task written in German. Finally, in the Appendix of the 2011 JEP:G paper you find a list of German words. I conclude that the articles that are criticized were not read carefully, and that search biases might have led to wrong conclusions.
Third and most importantly, let me repeat what I expressed in my statement #3 below: I conducted the published studies between 1999 and 2008 in Germany. For outsiders who are not familiar with research in Psychology, it might appear to be strange that I developed stimulus material that allegedly had been used previously.
However, I wanted to conceptually replicate and extend the results I had previously obtained in Germany with a different (i.e., Dutch) population in a different (i.e., Dutch) language. This requires stimulus material that is suited to test hypotheses with a different population. Just to illustrate this point: Imagine you conducted a certain type of study with children and want to conduct it later with adults. Of course, you would have to prepare stimulus material for the adults that is different from the one for children. Applying this example to the journalist’s logic, he would wonder why adults would require different stimulus material.
More specifically, study 1C (from JEP:G 2011), which used a “Moldavian” nonsense poem, had been done in Germany. It included a poem for which I changed the vowels and consonants to a fantasy language. The original poem was an old Transylvanian song.
In Amsterdam, I first thought Moldavian would be associated with negative stereotypes (I sensed strong prejudice against East Europeans) and that Malaysian was both more neutral and more believable. Moreover, changing the language would count as yet another conceptual rather than straight replication, which is something we are looking for. Eventually, however, after discussions with the research assistant, I decided to use Moldavian again, among other reasons because the poem sounded more East European than Malaysian to Dutch students as well, and students considered Moldavians a rather neutral group.
Thus, it is true that I wanted to do similar studies at UvA that included both replications and extensions. I sent basic plans and designs to my research assistant. Obviously the journalist received these emails and files and misinterpreted them. Actually, these files were the beginning of the task for the research assistant: “Let me know what you think, how can this be done, what do you think works best for Dutch students – and if it is impossible for you to figure this out, I can take over”. I wanted his fresh creative “Dutch” input with regard to this paradigm. My experience told me that I cannot simply transport the studies from a German to a Dutch context; rather, some cultural differences (such as food preferences or contents of stereotypes) would apply. I wanted to obtain an unbiased view on materials using the logic from the old basic study setups to see how they fitted the new environment. Creativity research shows that you block creative thought if you tell too much in advance. In addition, telling a research assistant that using similar paradigms in Germany had already led to many successful studies would have produced tremendous pressure on him. Such pressure could produce unwanted behavior (e.g. experimenter biases) that social psychologists aim to control for. As a matter of fact, such a strategy of “not telling too much” is also used in other disciplines and I teach it whenever I teach methods in Social Psychology. However, note that the studies we did at UvA were slightly different (we added modalities to the basic one-modality design).
Finally, the SPSS data file from February 2013 contains the original data. Laypeople might not know this, but SPSS files get constantly updated. This, however, does not mean that the original values are changed. Rather, if you translate variable labels from German to English (like “Geschlecht” to “gender”), the file receives a new time stamp while still containing the unchanged, original values. In fact, I translated variable labels from German to English in order to make reanalyses (for the investigation committee) easier. And please let me use this example to illustrate the unfortunate situation: I wanted and still want to contribute to clarifying the situation. Therefore, I changed the names of the variables from German to English. This, or at least the change in the time stamp, is now held against me. If I had not translated the variables, of course, one could have argued that someone who fabricates data would have used German and not English variable names (to demonstrate that the studies had been conducted in Germany). In any event – with German or English variable names – I would have been found “guilty”. Confirmation bias in action!
As I said, I gave all these answers to the commissions, and I wonder why the person who passed the material to the journalist did not pass my answers as well - or why the journalist, in case he had the material, did not talk about these simple, unspectacular responses in his article. It is hard for me to believe that this selection happened in the unconscious.
In the end, the lengthy article does not convey any new relevant information. Still, there is no concrete evidence whatsoever of violation of academic integrity. However, this accumulation of negative conclusions, unintended or not, certainly affects my reputation. Note also that some concerns raised in the article were already addressed in my letters and reactions below. I explained in text #3 how I treated outliers and I reported in #1 that an UvA authority figure asked me to dump the questionnaires. Meanwhile I have witnesses for this. Moreover, a former PhD student wrote to me that s/he was asked to dump questionnaires by yet a different person from the department. Ignoring such information is another typical result of a confirmation bias.
In general, I wonder why people publish doubts about my studies that are so obviously unwarranted and that certainly harm my reputation. Many times misrepresentations are of course due to a lack of the expertise needed to judge the facts (how do we prevent demand characteristics? how do we prevent experimenter biases? what do we tell experimenters and why?). However, please also note that for some people my case could be profitable.
3. Response to the LOWI Report Published Last Wednesday
Jens Förster, May 11, 2014
Dear colleagues, some of you wonder how I am doing, and how I will address the current accusations. You can imagine that I have a lot of work to do, now. There are many letters to write, there are a number of emails, meetings, and phone calls. I also started the moving process. And there is my daily work.
I keep going because of the tremendous support that I experience. This is clearly overwhelming!
The publication of the LOWI report came unexpectedly, so forgive me that I needed some time to write this response. Another reason is that I still hesitate to share certain insights with the public, because I was asked to remain confidential about the investigation. It is hard for me to decide how far I can go to reveal certain reviews or results. This is especially difficult for me because the Netherlands is a foreign country to me and norms differ from my home country. In addition, this week, the official original complaint was posted to some chatrooms. Both papers raise questions, especially about my Förster et al. 2012 paper published in SPPS.
First and foremost, let me repeat that I never manipulated data and I never motivated my co-workers to manipulate data. My co-author of the 2012 paper, Markus Denzler, had nothing to do with the data collection or the data analysis. I had invited him to join the publication because he was involved generally in the project.
The original accusation raises a few specific questions about my studies. These concerns are easy to alleviate. Let me now respond to the specific questions and explain the rules and procedures in my labs.
Origin of Studies and Lab-Organization During that Time
The series of experiments was run between 1999 and 2008 in Germany, most of them in Bremen, at Jacobs University; I no longer know the specific dates of single experiments. Many studies were run with a population of university students that is not restricted to psychology students. This is how we usually recruited participants. Sometimes, we also tested guests, students in the classrooms or business people who visited. This explains why the gender distribution deviates from the distribution of Amsterdam psychology students. This distribution closely resembles the one reported in my other papers. Note that I never wrote that the studies were conducted at the UvA; this was an unwarranted assumption by the complainant. Indeed, the SPSS files on the creativity experiments for the 2012 paper include the 390 German answers. This was also explicitly noted by the expert review for the LOWI, whose author reanalyzed the data.
During the 9 years I conducted the studies, I had approximately 150 co-workers (research assistants, interns, volunteers, students, PhDs, colleagues). Note that the LOWI interviewed two research assistants who worked with me at UvA; their reports, however, do not reflect the typical organization at, for example, Bremen, where I had a much larger lab with many more co-workers. However, former co-workers from Bremen invited by the former UvA commission basically confirmed the general procedure described here.
At times I had 15 research assistants and more people (students, interns, volunteers, PhDs, etc.) who would conduct experimental batteries for me. They (those could be different people) entered the data when it was paper and pencil questionnaire data and they would organize computer data into workable summary files (one line per subject, one column per variable). For me to have a better overview of the effects in numerous studies, some would also prepare summary files for me in which multiple experiments would be included. The data files I gave to the LOWI reflect this: To give an example for the SPPS (2012) paper, I had two data files, one including the five experiments that included atypicality ratings as the dependent variable, and one including the seven experiments that included the creativity/analytic tasks. Coworkers analyzed the data, and reported whether the individual studies seemed overall good enough for publication or not. If the data did not confirm the hypothesis, I talked to people in the lab about what needs to be done next, which would typically involve brainstorming about what needs to be changed, implementing the changes, preparing the new study and re-running it.
Note that the acknowledgment sections in the papers are far from complete; this has to do with space limitations and with the fact that, during the long time of running the studies, some names unfortunately got lost. Sometimes I also thanked research assistants who worked with me on similar studies around the time I wrote a paper.
Amount of Studies
The organization of my lab also explains the relatively large number of studies: 120 participants were typically invited for a session of 2 hours that could include up to 15 different experiments (some of them obviously very short, others longer). This gives you 120 × 15 = 1800 participants. If you only need 60 participants per study, this doubles the number of studies. We had 12 computer stations in Bremen, which we used to test participants in parallel. We also had many rooms, such as classrooms or lecture halls, that could be used for doing paper and pencil studies or studies with laptops. If you organize your lab efficiently, you would need 2-3 weeks to complete this “experimental battery”. We did approximately 30 such batteries during my time in Bremen and did many other studies as well. Sometimes, people were recruited from campus, but most of them were recruited from the larger Bremen area, and sometimes we paid their travel from the city center, because this involved at least half an hour of travel. Sometimes we also had volunteers who helped us without receiving any payment.
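The arithmetic in this paragraph can be spelled out in a few lines. This is only an illustrative sketch using the figures given above; the variable names are hypothetical, and it assumes that "1800 participants" is read as participant-by-experiment data sets from one battery rather than 1800 distinct people.

```python
# Figures taken from the paragraph above; names are illustrative only.
participants_per_session = 120
experiments_per_battery = 15

# One data set per participant per experiment in a single battery.
participant_slots = participants_per_session * experiments_per_battery

# If a study needs only 60 participants, the same slots cover twice as many studies.
studies_at_60 = participant_slots // 60

print(participant_slots, studies_at_60)
```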
None of the Participants Raised Suspicions and Outliers
The complainant also presumes that the participants are psychology students, typically trained in psychological research methods, who are often quite experienced as research participants. He finds it unlikely that none of the participants in my studies raised suspicions about the study. Indeed, at the University of Amsterdam (UvA) undergraduates oftentimes know a lot about psychology experiments and some of them might even know or guess some of the hypotheses. However, as noted before, the participants in the studies in question were neither from UvA nor were they entirely psychology students. Furthermore, the purpose of my studies and the underlying hypotheses are oftentimes difficult to detect. For example, a participant who eats granola and is asked to attend to its ingredients is highly unlikely to think that attending to the ingredients made him or her less creative. Note also that the manipulation is done between participants: other participants, in another group, eat the granola while attending to its overall gestalt. Participants do not know and do not have any way to know about the other group: they do not know that the variable that is being manipulated is whether the processing of the granola is local versus global. In those circumstances it is impossible to guess the purpose of the study. Moreover, a common practice in social psychological priming studies is to use “cover stories” about the experiments, which present the manipulation and the dependent measure as two unrelated experiments. We usually tell participants that, for economic reasons, we test many different hypotheses for many different researchers and labs in our experimental sessions, which last one to three hours. Each part of a study is introduced as independent from the other parts or the other studies. Cover stories are made especially believable by the fact that most of the studies and experimental sessions indeed contain many unrelated experiments that we lump together.
And in fact, many tasks do not look similar to each other. All this explains, I think, why participants in my studies do not guess the hypothesis. That being said, it is possible that the research assistants who actually run the studies and interview the participants for suspicion, do not count as “suspicion” if a participant voices an irrelevant idea about the nature of the study. For example, it is possible that if a participant says “I think that the study tested gender differences in perception of music” it would be counted as “no suspicion raised” – because this hypothesis would not have led to a systematic bias or artifact in our data.
Similarly, the complainant wonders how it is that the studies did not have any dropouts. Indeed, I did not drop any outliers in any of the studies reported in the paper. What does happen in my lab, as in any lab, is that some participants fail to complete the experiment (e.g., because of computer failure, personal problems, etc.). The partial data of these people are, of course, useless. Typically, I instruct RAs to fill up the conditions to compensate for such data loss. For example, if I aimed at 20 participants per condition, I will make sure that these will be 20 full-record participants. I do not report the number of participants who failed to complete the study, not only because of journals’ space limitations, but also because I do not find this information informative: when you exclude extreme cases, for example, it could be informative to write what the results would look like had they not been excluded. But you simply have nothing to say about incomplete data.
Size of Effects
The complainant wonders about the size of the effects. First let me note that I generally prefer to examine effects that are strong and that can easily be replicated in my lab as well as in other labs. There are many effects in psychology that are interesting but weak (because they can be influenced by many intervening variables, are culturally dependent, etc.) - I personally do not like to study effects that replicate only every now and then. So, I focus on those effects that are naturally stable and thus can be further examined.
Second, I do think that theoretically, these effects should be strong. In studying global/local processing, I thought I was investigating basic effects that are less affected by moderating variables. It is common wisdom in psychology that perceptual processes are less influenced by external variables than, for example, achievement motivation or group and communication processes. All over the world people can look at the big picture or at the details. It is what we call a basic distinction. Perception is always the beginning of more complex psychological processes. We perceive first, and then we think, feel, or act. Moreover, I found the global/local processing distinction exciting because it can be tested with classic choice or reaction time paradigms and because it is related to neurological processes. I expected the effects to be big, because no complex preconditions have to be met (in contrast to other effects that occur, for example, only in people who have certain personality traits). Finally, I assume that local (or global) processing styles are needed for analytic (or creative) processing; without them there is no creativity or analytic thought. If I trigger the appropriate processing style versus the antagonistic processing style, then relatively large effects should be expected. Note also that the same effect can be obtained by different routes, or processes that could potentially be provoked by the experimental manipulation. My favorite one is that there are global versus local systems that are directly related to creativity. However, others suggested that a global processing style triggers more intuitive processing, a factor that is known to increase creativity in its own right. Yet others suggested that global processing leads to more fluid processing, a third factor that could produce our effects. Thus, the same manipulation of global (vs. local) processing could in principle trigger at least three processes that may produce the same effect in concert. From this perspective too, I believe that one would expect rather big effects.
Moreover, the sheer replicability of the effects further increased my confidence. I thought that the relatively large number of studies secures against the possibility of artifacts. My confidence explains why I did not question the results nor did I suspect the data. Of course I do thorough checks, but I could not see anything suspicious in the data or the results. Moreover, a large number of studies conducted in other labs found similar effects. The effects seem to (conceptually) replicate in other labs as well.
Dependent Measure of Analytic Task in the 2012 SPPS Paper
The complainant further wonders why performance on the analytic tasks was in general so poor for undergraduates and fell below chance level. The author probably assumes that because the task is given in a multiple-choice format with five alternatives, there is a 0.2 probability of answering each single question correctly by chance. However, in our experiment, participants had only 4 minutes to do the task. If a participant was stuck on the first question, did not solve it correctly, and did not even attempt questions 2-4 (which happened a lot), then we consider all 4 responses as incorrect, and the participant receives a score of 0. In other words, participants were not forced to just circle an answer for every question, but rather could leave questions unanswered, which we counted as “not solving it” and thus “incorrect”. I think that there is no meaningful way to compute the chance level of answering the question in these studies.
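To illustrate why a single chance level is hard to pin down here, the following sketch computes the expected share of correct answers under pure guessing when unanswered questions are scored as incorrect. It assumes four questions with five alternatives each, as described above; the function name and the pure-guessing model are assumptions made for illustration only and are not part of any analysis carried out in the investigation.

```python
def expected_correct_by_guessing(attempted: int, alternatives: int = 5) -> float:
    """Expected number of correct answers if every attempted question is
    guessed at random and every unanswered question counts as incorrect."""
    return attempted / alternatives


TOTAL_QUESTIONS = 4  # assumption based on the four questions mentioned above

for attempted in range(TOTAL_QUESTIONS + 1):
    share = expected_correct_by_guessing(attempted) / TOTAL_QUESTIONS
    print(f"attempted {attempted}: expected share correct = {share:.2f}")
```

Under these assumptions, the share of 0.2 is reached only by a participant who answers all four questions; a participant who attempts fewer falls below it even when guessing.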
The LOWI found the statistical analyses by the experts convincing. However, note that after almost 2 years of meticulous investigation, they did not find any concrete or behavioral evidence for data manipulation. The LOWI expert who did the relevant analysis always qualifies his methods, even though he is concerned about odd regularities, too. However, after having described his analysis, he concludes:
“Het is natuurlijk mogelijk dat metingen het waargenomen patroon vertonen.”
---->It is of course possible that the observed pattern was obtained by measurements.
This reviewer simply expresses an opinion that I kept repeating from my first letter to the UvA-commission on: Statistical methods are not error free. The choice of methods determines the results. One statistician wrote to me: “Lottery winners are no fraudsters, even though the likelihood is 1: 14 Millions to win the lottery.”
Even though I understand from the net that many agree with the analyses, I also received emails from statisticians and colleagues criticizing the fact that such analyses are the major basis for this negative judgment.
I even received more concrete advice suggesting that the methods the complainant used are problematic.
To give some examples, international colleagues wonder about the following:
1) They wonder whether the complainant selected the studies he compared my studies with in a way that would help the low likelihoods to come out.
2) They wonder whether the chosen comparison studies are really comparable with my studies. My answer is “no”. I do think that the complainant is comparing “apples with oranges”. This concern has been raised by many in personal emails to me. It concerns a general criticism with a method that made sense a couple of years ago; now many people consider the choice of comparison studies problematic.
3) They are concerned about hypothesis derivation. There are thousands of hypotheses in the world, why did the complainant pick the linearity hypothesis?
4) They complain that no justification whatsoever was provided for the methods used in the analyses, and that alternatives are not discussed (as one would expect from any scientific paper). They also wonder whether the data met the typical requirements for the analyses used.
5) They mentioned that the suspicion is repeatedly raised based on unsupported assumptions: data are simply considered “not characteristic for psychological experiments” without any further justification.
6) They find the likelihood of 1:trillion simply rhetorical.
7) Last but not least, in the expert reviews, only some questionable research practices (QRP) were examined. Some people wondered whether this list is exhaustive and whether “milder” practices than fraud could have led to the results. Note, however, that I never used QRP; if they were used, I unfortunately have to assume that co-workers in the experiments did so.
Given that there exist deviating opinions, and that many experts raise concerns, I am concerned that the analyses conducted on my paper need to be examined in more detail before I would retract the 2012 paper. I just do not want to jump to conclusions now. I am even more concerned that this statistical analysis was the main basis to question my academic integrity.
Can I Exclude Any Conceivable Possibility of Data Manipulation?
Let me cite the LOWI reviewer:
“Ik benadruk dat uit de datafiles op geen enkele manier is af te leiden, dat de bovenstaande bewerkingen daadwerkelijk zijn uitgevoerd. Evenmin kan gezegd worden wanneer en door wie deze bewerkingen zouden zijn uitgevoerd.”
---->I emphasize that from the data files one can in no way infer that the above adjustments have actually been done. Nor can it be said when and by whom such adjustments would have been done.
Moreover, asked, whether there is behavioral evidence for fraud in the data, the LOWI expert answers:
“Het is onmogelijk, deze vraag met zekerheid te beantwoorden. De data files geven hiertoe geen nieuwe informatie.”
---->It is not possible to answer this question with certainty. The data files do not give new information on this issue.
Let me repeat that I never manipulated data. However, I can also not exclude the possibility that the data has been manipulated by someone involved in the data collection or data processing.
I still doubt it and hesitated to elaborate on this possibility because I found it unfair to blame somebody, if even in this non-specific way. However, since I have not manipulated data, I must say that in principle it could have been done by someone else. Note that I taught my assistants all the standards of properly conducting studies and fully reporting them. I always emphasized that the assistants are not responsible for the results, but only for conducting the study properly, and that I would never accept any “questionable research practices”. However, theoretically, it is possible that somebody worked on the data. It is possible that for example some research assistants want to please their advisors or want to get their approval by providing “good” results; maybe I underestimated such effects. For this project, it was obvious that ideally, the results would show two significant effects (global > control; control > local), so that both experimental groups would differ from the control group. Maybe somebody adjusted data so that they would better fit this hypothesis.
The LOWI expert was informative with respect to the question of how this could have been done. S/he said that it is easy to adjust the data, by simply lowering the variance in the control groups (deleting extreme values) or by replacing values in the experimental groups with more extreme values. Both procedures would perhaps bring the data closer to linearity and are easy to do. One may speculate that, for example, a co-worker might have run more subjects than I requested in each condition and replaced or deleted “deviant” participants. To suggest another possibility, maybe somebody reran control groups or picked control groups out of a pool of control groups that had low variance. Of course this is all speculation and there might be other possibilities that I cannot even imagine or cannot see from this distance. Obviously, I would have never tolerated any behavior such as this, but it is possible that something has been done with the goal in mind of having significant comparisons to the control group, thereby inadvertently arriving at linear patterns.
Theoretically, such manipulation could have affected a series of studies, since, as I described above, we put different studies into summary files in order to see differences, to decide what studies we would need to run next or which procedural adjustments (including different control variables etc.) we would have to make for follow-ups. Again, I repeat that this is all speculation; I simply try to imagine how something could have happened to the data, given the lab structure back then.
During the time of investigation I tried to figure out who could have done something inappropriate. However, I had to accept that there is no chance to trace this back; after all, the studies were run more than 7 years ago and I am not even entirely sure when, and I worked with too many people. I also do not want to point to people just because they are for some reason more memorable than others.
Responsibility for Detecting Odd Patterns in my Data
Finally, one point of accusation is:
“3. Though it cannot be established by whom and in what way data have been manipulated, the Executive Board adopts the findings of the LOWI that the authors, and specifically the lead author of the article, can be held responsible. He could or should have known that the results (`samenhangen`) presented in the 2012 paper had been adjusted by a human hand.”
I did not see the unlikely patterns, otherwise I would not have sent these studies to the journals. Why would I take such a risk? I thought that they were unproblematic and reflected actual measurements.
Furthermore, in her open letter, Prof. Dr. Nira Liberman (see on this page #2) says explicitly how difficult it is to see the unlikely patterns. I gave her the paper without telling her what might be wrong with it and asked her to find a mistake or an irregularity. She did not find anything. Moreover, the reviewers, the editor and many readers of the paper did not notice the pattern. The expert review also says on this issue:
Het kwantificeren van de mate waarin de getallen in de eerste rij van Tabel A te klein zijn, vereist een meer dan standaard kennis van statistische methoden, zoals aanwezig bij X, maar niet te verwachten bij niet- specialisten in de statistiek.
---->Quantifying the degree to which numbers in the first row of Table A are too small requires a more than standard knowledge of statistical methods, a knowledge that X has, but that one cannot expect in non-specialists in statistics.
I can only repeat: I did not see anything odd in the pattern.
This is a very lengthy letter and I hope it clarifies how I did the study, and why I believe in the data. Statisticians asked me to send them the data and they will further test whether the analyses used by the expert reviewer and by the complainant are correct. I am also willing to discuss my studies within a scientific setting. Please understand that I cannot visit all chatrooms that currently discuss my research. It would also be simply too much to respond to all questions there and to correct all the mistakes. Many people (also in the press) confuse LOWI reports or even combine several ones; and some postings are simply too personal.
This is also the reason why I will not post the data on the net. I thought about it, but my current experience with “the net” prevents me from doing this. I will share the data with scientists who want to have a look at it and who are willing to share their results with me. But I will not leave it to an anonymous crowd that can post whatever it wants, including incorrect conclusions and insults.
I would like to apologize to everyone for the trouble my publication has caused. I hope that in the end we can learn from this. I have definitely learned my lesson and will help to work on new rules and standards that make our discipline better. I would like to go back to work.
Regards, Jens Förster
2. Letter by Prof. Dr. Nira Liberman, Tel Aviv, May 4, 2014
Brief von Prof. Dr. Nira Liberman, Tel Aviv, 4. Mai 2014
Let me first identify myself as a friend and a collaborator of Jens Förster. If I understand correctly, in addition to the irregular pattern of data, three points played a major role in the national committee’s conclusion against Jens: That he could not provide the raw data, that he claimed that the studies were actually run in Germany a number of years before submission of the papers, and that he did not see the irregular pattern in his results. I think that it would be informative to conduct a survey among researchers on these points before concluding that Jens’ conduct in these regards is indicative of fraud. (In a similar way, it would be useful to survey other fields of science before concluding anything against social psychology or psychology in general.) Let me volunteer my responses to this survey.
Providing raw data
Can I provide the original paper questionnaires of my studies published in the last five years or the original files downloaded from the software that ran the studies (e.g., Qualtrics, Matlab, Direct-Rt) dated with the time they were run? No, I cannot. I asked colleagues around me; they can’t either. Those who think they can would often find out upon actually trying that this is not the case. (Just having huge piles of questionnaires does not mean that you can find things when you need them.) I am fairly certain that I can provide the data compiled into workable data files (e.g., Excel or SPSS data files). Typically, research assistants rather than primary investigators are responsible for downloading files from running stations and/or for coding questionnaires into workable data files. These are the files that Jens provided to the investigating committees upon request. It is perhaps time to change the norm and to request that original data files and original questionnaires be saved along with a proof of date for possible future investigations, but this is not how the field has operated. Until a few years ago, researchers in the field cared about not losing information, but they did not necessarily prepare for a criminal investigation.
Publishing old data
Do I sometimes publish data that are a few years old? Yes, I often do. This happens for multiple reasons: because students come and go, and a project that was started by one student is continued by another student a few years later; because some studies do not make sense to me until more data accumulate and the picture becomes clearer; because I have a limited writing capacity and I do not get to write up the data that I have. I asked colleagues around me. This happens to them too.
The published results
Is it so obvious that something is wrong with the data in the three target papers for a person not familiar with the materials of the accusation? I am afraid it is not. That something was wrong never occurred to me before I was exposed to the argument on linearity. Excessive linearity is not something that anybody checks the data for.
Let me emphasize: I read the papers. I taught some of them in my classes. I re-read the three papers after Jens told me that they were the target of accusation (but before I read the details of the accusation), and after I read the “fraud detective” paper by Simonsohn (2013; “Just Post It: The Lesson from Two Cases of Fabricated Data Detected by Statistics Alone”), and I still could not see what was wrong. Yes, the effects were big. But this happens, and I could not see anything else.
The commission concluded that Jens should have seen the irregular patterns and thus can be held responsible for the publication of data that include unlikely patterns. I do not think that anybody can be blamed for not seeing what was remarkable about these data before being exposed to the linearity argument and the analysis in the accusation. Moreover, it seems that the editor, the reviewers, and the many readers and researchers who followed up on this study also did not discover any problems with the results or, if they discovered them, did not regard them as problematic.
And a few more general thoughts: The studies are well cited and some of them have been replicated. The theory and the predictions it makes seem reasonable to me. From personal communication, I know that Jens is ready to take responsibility for re-running the studies, and I hope that he gets a position that would allow him to do that. It will take time, but I believe that doing so is very important not only personally for Jens but also for the entire field of psychology. No person and no field is mistake-proof. Mistakes are not crimes, however, and they need to be corrected. In my career, somehow anything that happens, good or bad, amounts to more work. So here is, it seems, another big pile of work waiting to be done.
1a. Dies ist meine Reaktion auf den NRC Artikel, erschienen am 29. April 2014
Man möge mir verzeihen, dass ich auf Deutsch schreibe, aber dies ist nun einmal die Sprache, in der ich mich am besten ausdrücke. Es folgt alsbald eine Englische Übersetzung. Ich bitte um Verzeihung, dass ich derzeit keine Niederländische Übersetzung liefern kann.
In der Niederländischen Zeitschrift NRC ist heute ein Artikel erschienen, der ein Ethikverfahren gegen mich zusammenfasst, das an der Universität van Amsterdam (UvA) im September 2012 gegen mich eröffnet wurde. Das Verfahren hatte ein Kollege aus der Methodenlehre gegen mich eingeleitet, weil er auffällige Regelmäßigkeiten in drei meiner Veröffentlichungen festgestellt hatte. In einem ersten, vorläufigen Urteil der UvA wurde kein wissenschaftliches Fehlverhalten festgestellt, jedoch wurde ich gebeten, „Notes of concern“ an die entsprechenden Herausgeber zu schicken, um auf die Muster hinzuweisen. Der Kläger reichte darauf eine Klage bei der nationalen Ethikkommission, der LOWI ein, weil ihm das Urteil zu milde ausfiel. Diese kam kürzlich zu einem negativeren Urteil und geht von wissenschaftlichem Fehlverhalten aus, vor allem weil die Muster, so die statistischen Analysen des Klägers, zu unwahrscheinlich wären. Konkrete Evidenz für Manipulation gäbe es allerdings nicht. Die LOWI schlägt vor, einen Artikel aus dem Jahre 2012 zurückzuziehen. Letzte Woche schloss sich die UvA diesem Urteil weitgehend an, weist aber nachdrücklich darauf hin, dass niemand sagen kann, wer das getan haben könnte und wie die Daten manipuliert wurden. Ich sei allerdings verantwortlich für die Veröffentlichung und hätte die Auffälligkeiten sehen müssen oder können. Die UvA will versuchen, einen Artikel, erschienen 2012, zurück zu ziehen; als Grund wird die statistische Analyse angeführt.
Die rasche Veröffentlichung des Untersuchungsergebnisses der UvA und der LOWI kamen vollkommen überraschend, genauso wie die negative Bewertung meines Verhaltens. Da die LOWI kaum neue Informationen gehabt hat als die vorige Kommission, und da ich nichts Betrügerisches getan habe, erwartete ich einen glatten Freispruch. Das jetzige Urteil ist ein entsetzliches Fehlurteil und für mich absolut nicht zu begreifen. Ich bezweifle auch, dass Kollegen dies nachvollziehen können.
Ich fühle mich als das Opfer einer aus dem Ruder geratenen Hexenjagd, die auf Psychologinnen und Psychologen nach der Stapel-Affäre ausgerufen wurde.
Stapel hatte vor drei Jahren Daten frei erfunden und verständlicherweise besonders in den Niederlanden eine wahre Hysterie bewirkt, eine Situation, in der jeder jeden verdächtigt.
Um es klar zu sagen: ich habe weder Daten manipuliert noch meine Mitarbeiter dazu angehalten, Daten zu schönen. Der Ko-Autor Markus Denzler hat mit der Datensammlung und der Analyse nichts zu tun. Ich hatte ihn eingeladen, an der Veröffentlichung teilzunehmen, weil er im Allgemeinen am Thema beteiligt war.
Dementsprechend liegt auch nach über eineinhalb Jahren akribischer Untersuchungstätigkeit von Seiten der LOWI oder der UvA überhaupt kein einziger konkreter Beweis für Fälschung vor. Das einzige, was man mir vorwerfen kann, und das habe ich mehrere Male bereut, ist, dass ich Fragebogen (die übrigens älter als 5 Jahre waren und allesamt in Datenfiles übertragen worden waren) nach einem Umzug in ein viel zu kleines Zimmer weggeworfen habe. Dies geschah auf Anraten eines Kollegen, der mit den Niederländischen Gepflogenheiten vertraut ist. Dies alles geschah, bevor bekannt wurde, dass Diederik Stapel seine Fragebogen erfunden hatte. Es war eine Zeit voll Vertrauen und es galt: wenn Du die kodierten Daten im Computer hast, ist das mehr als genug. Zu Erklärung: die meisten Daten werden sowieso am Computer erhoben, d.h. sie werden direkt in analysebereite Datenformate übertragen. Jedoch hätte mir in meinem Fall auch das Vorweisen von Fragebögen wenig geholfen: Der Kläger ist so überzeugt von der Richtigkeit seiner Analysen, dass er mir hätte unterstellen müssen, ich hätte die Fragebögen gefälscht. Meine Daten wurden bereits nachgerechnet und akribisch überprüft. Die Ergebnisse der Untersuchung, auf die sich die Urteile der LOWI und der UvA ebenfalls stützen (den Namen des Gutachters nenne ich aus Geheimhaltungsgründen nicht) sind folgende:
*die Daten sehen vollkommen realistisch aus
*ich habe alle Analysen richtig gerechnet und wahrheitsgemäß berichtet
*alle Informationen der Fragebögen sind in dem Datenfile enthalten
*die Befunde sind tatsächlich unwahrscheinlich, aber können ebensogut durch tatsächliche Erhebungen zu Stande gekommen sein
*es ist immer möglich, so der Gutachter, dass ungewöhnliche Befunde in der Psychologie erst später erklärt werden können
*falls manipuliert wurde, was nicht mit Sicherheit gesagt werden kann, dann ist überhaupt nicht deutlich, wer es getan hat und wie das geschehen ist
Aufgrund dieser Bewertung hatte ich fest mit einem Freispruch gerechnet und kann die Urteile der LOWI und der UvA nicht nachvollziehen.
Nach dem großen Skandal vor drei Jahren hat sich vieles in der Psychologie geändert, zu Recht. Wir haben andere Standards entwickeln müssen, archivieren, versuchen, so transparent wie möglich zu sein. An der UvA herrschen bald die strengsten Regeln für das Durchführen, das Analysieren und das Archivieren von Daten und das ist auch richtig so.
Man kann dieses Urteil also streng und ahistorisch nennen. Zumindest lässt die Härte der Beurteilung Fragen offen. Auch die Schlussfolgerung, dass das Vernichten von Bögen auf Täuschung hinweist, ist absurd. Es kann auch schlichtweg damit zu tun haben, dass man aufräumen wollte, oder keinen Platz hatte, oder die Archivierung für irrelevant hielt, oder dass man kaum Ressourcen für diese Arbeit übrig hatten. Trotzdem bereue ich mein Verhalten. Ich werde in der Zukunft strengste Kontrolle über die in meinen Labors ablaufenden Prozesse haben. Absolute Transparenz ist für unsere Disziplin schnell zur Selbstverständlichkeit geworden. Mein Fall macht das wieder einmal deutlich.
Der zweite Punkt betrifft die statistischen Analysen des Klägers, die nahelegen, dass die Resultate „zu gut“ aussehen. Seine Analysen und späteren Schreiben klingen so, als gäbe es überhaupt keine andere Interpretation als die, dass Daten manipuliert wurden. Diese starken Schlussfolgerungen sind nicht adäquat. Methodenlehre ist eine Wissenschaft, d.h. Methoden werden wissenschaftlich diskutiert, es gibt immer bessere und weniger gute und die meisten sind fehlerbehaftet. Die vom Kläger verwendeten Methoden sind auch durchaus Teil einer gerade lebhaft stattfindenden wissenschaftlichen Diskussion. Methoden sind zudem auch immer abhängig vom Inhalt der Forschung und hier ließ der Kläger an mehreren Stellen sehen, dass er überhaupt nicht verstand, um was es in meiner Forschung geht. Andere Gutachten kommen zu ganz anderen, stärker qualifizierenden Bewertungen (s.o.): Die Ergebnisse sind unwahrscheinlich aber möglich.
Kurzum, die Schlussfolgerung, dass ich Daten manipuliert habe, wurde nicht bewiesen, sondern bleibt eine Schlussfolgerung auf der Basis von Wahrscheinlichkeiten.
Ich ging davon aus, dass die Unschuldsvermutung gilt und dass die Beweislast beim Kläger liegt. So verstehe ich Recht. LOWI und UvA stützen sich auf die hohe Unwahrscheinlichkeit, errechnet durch Analyseverfahren, die morgen schon obsolet sein können. Die UvA räumt dann auch ein, dass nicht klar ist, wer, wenn überhaupt Hand angelegt hat. Sie hält mich jedoch für verantwortlich. Ich hätte sehen können oder müssen, dass etwas an den Daten merkwürdig ist.
Dem widerspreche ich: ich habe die Regelmäßigkeiten nicht gesehen. Zwei Gutachten der LOWI und der UvA bestätigen das auch: Sie sagen, dass es schwierig bis unmöglich wäre, die Auffälligkeiten zu erkennen, wenn man nicht ein Experte auf diesem Gebiet wäre. Zudem wurde weder vom Herausgeber der Zeitschrift, noch von unabhängigen Gutachtern (peer reviews) etwas Auffälliges bemerkt.
Zudem sprechen externe Merkmale für die Echtheit der von mir gefundenen Phänomene. Die Befunde wurden in internationalen Labors repliziert, was bedeutet, dass die Phänomene, die ich zeige, wiederholbar sind und Substanz haben. Viele Wissenschaftler haben ihre eigenen Untersuchungen auf meinen aufgebaut. Meine Arbeiten leiden nicht an der Replication Crisis (e.g., die Unmöglichkeit, Daten zu replizieren).
UvA wie LOWI schlagen nun vor, den in 2012 erschienenen Artikel zurückzuziehen. Ich habe im Prinzip keine Probleme damit, Artikel zurückzuziehen, stimme inhaltlich aber keineswegs zu. Wenn ich mich meines eigenen Verstandes bediene, und das sollte ich als Wissenschaftler, dann sehe ich für mich keinen ausreichenden Grund zu der Annahme, dass etwas falsch ist. Die statistischen Gutachten sagen eindeutig, dass die Befunde möglich, aber unwahrscheinlich sind. Allein die Analysen des Klägers sind voller übertriebener Schlussfolgerungen und diese sind schlichtweg nicht ausreichend, um einen Artikel zurückzuziehen. Ich überlasse dem Herausgeber der Zeitschrift, dies zu entscheiden.
Zusammenfassend begreife ich das Urteil nicht. Ich kann mir auch nicht vorstellen, dass es viele andere, inhaltlich arbeitende Psychologen verstehen werden. Für mich ist das Verfahren deshalb nicht abgeschlossen. Ich werde einen Brief an die UvA und die LOWI schreiben (Artikel 13 erlaubt dies), in dem ich an Informationen aus den Gutachten erinnern möchte, die entweder vergessen wurden oder zu wenig Beachtung fanden. Zudem gilt es nun, die statistischen Analysen des Klägers, von denen er selbst sagt, sie seien nicht zu veröffentlichen, weil zu provokativ, eingehend zu prüfen. Ich habe mich bisher dabei zurückgehalten, andere mit einzubeziehen, weil ich die Geheimhaltungspflicht sehr Ernst genommen habe. Einige Punkte aus der Analyse des Klägers lassen sich nämlich vermutlich widerlegen. Aus dem Munde des Klägers klingt es dabei so, als wären die Gutachter alle gleicher Meinung, sie sind jedoch weit entfernt davon. Während der Kläger in seiner Ursprungsklage die Linearität für absolut unwahrscheinlich hält, wird später behauptet, dass Linearität in der wirklichen Welt wohl vorkomme.
Ich finde es sehr störend, dass durch die vorschnelle Veröffentlichung des UvA-Berichts (es war eigentlich mehrere Male zur Geheimhaltung angehalten worden, um meinen Ruf zu schützen) nun eine Dynamik ins Rollen kommt, die nur nachteilig für mich sein kann. Meine Möglichkeiten, gegen ein Urteil, das meiner Meinung nach ein Fehler ist, sensibel und in einem geeigneten Kommunikationsrahmen vorzugehen, werden nun beschränkt. Ich kann das Urteil nicht akzeptieren und hoffe, dass es auch in der derzeitigen, sehr angespannten Stimmung, die durch die Veröffentlichung sicherlich noch angepeitscht wird, möglich sein wird, den Dialog mit LOWI und UvA wieder aufzunehmen.
Hochachtungsvoll, Jens Förster
P.S. Zu guter Letzt möchte ich mich bei allen bedanken, die mich in den letzten eineinhalb Jahren unterstützt und begleitet haben. Es war keine leichte Zeit, aber ich habe sie überstanden. Ich liebe Euch.
1 b. This is an English translation of my reaction to a newspaper article that appeared in the Dutch newspaper NRC about me.
Today, an article appeared in the Dutch newspaper “NRC” summarizing an investigation into my academic integrity that was opened in September 2012. The case was opened because a colleague from the methodology department at the University of Amsterdam (UvA) observed regularities in the data of three articles that are supposedly highly unlikely. In a first, preliminary evaluation, the UvA commission decided that there was no evidence of academic misconduct, but that I should send “cautionary notes” to the respective journal editors pointing to these unlikely regularities. The complainant then filed a further complaint with the national ethics board, the LOWI, because he found the evaluation too mild. Recently, the LOWI finished its investigation with a more negative evaluation and found that academic misconduct must have taken place, mainly because the patterns are so unlikely. Concrete evidence for fraud, however, has not been found. The LOWI also recommended retracting one of the papers, published in 2012. Last week, the UvA accepted this advice but pointed out that nobody can say who manipulated the data or how this could have taken place. Nevertheless, it holds me responsible for the data I published because I should or could have seen the odd pattern. The UvA will try to have the 2012 paper retracted based on the statistical analyses provided during the investigation.
The rapid publication of the results of the LOWI and UvA case came quite unexpectedly, as did the negative evaluation. Note that we were all sworn to secrecy by the LOWI, so please understand that I have had to write this letter in almost no time. Because the LOWI, from my point of view, did not receive much more information than was available for the preliminary UvA evaluation, and because I never did anything even vaguely related to questionable research practices, I expected a verdict of not guilty. The current judgment is a terrible misjudgment; I do not understand it at all, and I doubt that my colleagues will understand it.
I feel like the victim of an incredible witch hunt directed at psychologists after the Stapel affair. Three years ago, we learned that Diederik Stapel had invented data, which led to an incredible hysteria that was, understandably, especially strong in the Netherlands. From that point on, everybody looked suspicious to everybody.
To be as clear as possible: I never manipulated data and I never encouraged my co-workers to manipulate data. My co-author of the 2012 paper, Markus Denzler, had nothing to do with the data collection or the data analysis. I invited him to join the publication because he was generally involved in the project.
Accordingly, LOWI and UvA found no concrete evidence of manipulation, even after a meticulous investigation lasting one and a half years. The only thing that can be held against me is that I discarded questionnaires (which, by the way, were more than 5 years old and had all been coded into the existing data files) because I moved to a much smaller office. I expressed my regret about this several times before the commissions. However, discarding them was suggested by a colleague who knew the Dutch standards with respect to archiving. I have to mention that all this happened before we learned that Diederik Stapel had invented many of his data sets. This was a time of mutual trust, and the general norm was: “if you have the data in your computer, this is more than enough”. To explain: most of the data are collected at the computer anyway and are transferred directly to summary files that can be analyzed immediately. Note, however, that having the questionnaires would not have helped me in this case: the complainant is so convinced that he is right that he would have had to argue that I faked the questionnaires. In this way, I am guilty in any event. My data files were sent to the commissions and have been reanalyzed and tested in detail. The results of the reanalysis and the investigations were:
*the data is real
*the analyses I did are correct and are correctly reported
*all information of the questionnaires is in the data files
*the results are indeed unlikely but possible and could have been obtained by actual data collection
*it is always possible (according to the reviewer) that we will understand odd patterns in psychology at a later point in time
*if data manipulation took place, something that cannot even be decided on the basis of the available data, it cannot be said who did it and how it was done.
Based on this evaluation, I expected a verdict of not guilty; I cannot understand the judgments by LOWI and UvA.
After the big scandal three years ago, many things in psychological research have changed, and this is a good thing. We had to develop new standards for archiving and for conducting our research in the most transparent manner. At UvA we now have the strictest rules one can imagine for conducting, analyzing, and archiving, and this is a good thing.
One can consider the judgment by LOWI and UvA very strict and ahistorical. At the very least, the harshness of the judgment makes one wonder. Moreover, the conclusion that discarding questionnaires necessarily indicates fraud is absurd. It can simply have happened because one wanted to clean up the room, or because one wanted to make room, or because archiving was considered less relevant, or because there were no resources left. Nonetheless I regret my behavior. In the future I will, of course, keep strict control over the procedures in my lab. Absolute transparency is self-evident for our discipline. My case is a cruel example of why this is important.
The second basis for the judgment is the statistical analyses by the complainant, which suggest that the results look “too good to be true”. His analyses and later writings sound as if there were no other interpretation of the data than data manipulation. Such strong conclusions are not warranted. Methodology is a science; that is, methods are discussed scientifically, there are always better and worse methods, and many of them are error-prone. The methods used by the complainant are currently part of a lively scientific discussion. Moreover, methods always depend on the content of the research. During the investigation, I observed several times that the complainant had no idea whatsoever what my research was actually about. Other reviewers came to different, more qualified conclusions (see above): the results are unlikely but possible.
In short: The conclusion that I manipulated data has never been proved. It remains a conclusion based on likelihoods.
I assumed that the presumption of innocence prevails and that the burden of proof is on the accuser. This is how I understand justice. LOWI and UvA base their evaluation on analyses that could be obsolete tomorrow (and that are currently under discussion). The UvA states that it is not clear who could have manipulated the data, if this was done at all. But the UvA thinks that I am still responsible: I should have or could have seen that something is odd about the data. Note that I did not see these regularities. Two reviews sent to LOWI and UvA explicitly state that it is difficult or impossible for a non-expert in statistics to see the regularities. Moreover, neither the editor of the journal nor the independent peer reviewers noticed anything odd.
In addition, external markers speak for the validity of the phenomena that I discovered. The results were replicated in many international laboratories, meaning that the phenomena I found can be repeated and have some validity. Many scientists have built their research on my findings. My work is a counterexample to the current “replication crisis”.
UvA and LOWI suggest retracting the 2012 article. In principle, I have no problem with retracting articles, but on the substance I do not agree with this at all. I do not see any sufficient reason for doing so. The last statistical review says explicitly that the results are possible yet unlikely. Only the analyses by the complainant are full of exaggerated conclusions, and I simply cannot take them as a valid basis for retracting an article. I will leave it to the editor of the journal to decide whether he wants to retract the paper.
In summary, I do not understand the evaluation at all. I cannot imagine that psychologists who work on theory and actual psychological phenomena will understand it. For me, the case is not closed. I will write a letter to UvA and LOWI (see article 13 of the regulations) to remind the commissions of the information that they overlooked or that did not get their full attention. Moreover, now is the time to test the statistical analyses by the complainant; note that he stated that those would not be publishable because they would be “too provocative”. Until now I hesitated to hand his analysis over to other statisticians, because confidentiality in this case was important to me. Some of his arguments, I believe, can be challenged. Listening to the complainant, one gets the impression that the current reviews all confirm his views. However, this is not the case. For example, in his original complaint he stated that linearity is completely unlikely, and later we learned that linearity does exist in the real world.
I find it disturbing that the premature publication of the UvA report has set in motion a dynamic that is clearly disadvantageous to me. My options for arguing, in a sensible way and within an appropriate communication frame, against an evaluation that is wrong from my perspective are now drastically limited. However, I cannot accept the evaluation, and I hope that in the current tense atmosphere, which has been partly fueled by the premature publication, it will still be possible to restart the dialogue with UvA and LOWI.
Regards, Jens Förster
P.S. Finally, I would like to thank all the friends who supported me through the last one and a half years. It has been quite a difficult time for me, but as you see, I survived. I love you.