The Latest Cornell Food And Brand Lab Correction: Some Inconsistencies And Strange Data Patterns

[Update 2018-05-12 20:40 UTC: The study discussed below has now been retracted.]

The Cornell Food and Brand Lab has a new correction. Tim van der Zee already tweeted a bit about it.
"Extremely odd that it isn't a retraction"? Let's take a closer look.

Here is the article that was corrected:
Wansink, B., Just, D. R., Payne, C. R., & Klinger, M. Z. (2012). Attractive names sustain increased vegetable intake in schools. Preventive Medicine, 55, 330–332. http://dx.doi.org/10.1016/j.ypmed.2012.07.012

This is the second article from this lab in which data were reported as having been collected from elementary school children aged 8–11, but it turned out that they were in fact collected from children aged 3–5 in daycares.  You can read the lab's explanation for this error at the link to the correction above (there's no paywall at present), and decide how convincing you find it.

Just as a reminder, the first such article, which I blogged about, was definitively retracted in October 2017.

I'm going to concentrate on Study 1 of the recently-corrected article here, because the corrected errors in this study are more egregious than those in Study 2, and also because there are still some very substantial problems remaining.  If you have access to SPSS, I also encourage you to download the dataset for Study 1, along with the replication syntax and annotated output file, from here.

By the way, in what follows, you will see a lot of discussion about the amount of "carrots" eaten.  There has been some discussion about this, because the original article just discussed "carrots" with no qualification. The corrected article tells us that the carrots were "matchstick carrots", which are about 1/4 the size of a baby carrot. Presumably there is a U.S. Standard Baby Carrot kept in a science museum somewhere for calibration purposes.

So, what are the differences between the original article and the correction? Well, there are quite a few. For one thing, the numbers in Table 1 now finally make sense, in that the number of carrots considered to have been "eaten" is now equal to the number of carrots "taken" (i.e., served to the children) minus the number of carrots "uneaten" (i.e., counted when their plates came back after lunch).  In the original article, these numbers did not add up; that is, "taken" minus "uneaten" did not equal "eaten".  This is important because, when asked by Alison McCook of Retraction Watch why this was the case, Dr. Brian Wansink (the head of the Cornell Food and Brand Lab) implied that it must have been due to some carrots being lost (e.g., dropped on the floor, or thrown in food fights). But this makes no sense for two reasons. First, in the original article, the number of carrots "eaten" was larger than the difference between "taken" and "uneaten", which would imply that, rather than being dropped on the floor or thrown, some extra carrots had appeared from somewhere.  Second, and more fundamentally, the definition of the number of carrots eaten is (the number taken) minus (the number left uneaten).  Whether the kids ate, threw, dropped, or made sculptures out of the carrots doesn't matter; any that didn't come back were classed as "eaten". There was no monitoring of each child's oesophagus to count the carrots slipping down.

When we look in the dataset, we can see that there are separate variables for "taken" (e.g., "@1CarTaken" for Monday, "@2CarTaken" for Tuesday, etc), "uneaten" (e.g., "@1CarEnd", where "End" presumably corresponds to "left at the end"), and "eaten" (e.g., "@1CarEaten").  In almost all cases, the formula ("eaten" equals "taken" minus "uneaten") holds, except for a few missing values and two participants (#42 and #152) whose numbers for Monday seem to have been entered in the wrong order; for both of these participants, "eaten" equals "taken" plus "uneaten". That's slightly concerning because it suggests that, instead of just entering "taken" and "uneaten" (the quantities that were capable of being measured) and letting their computer calculate "eaten", the researchers calculated "eaten" by hand and typed in all three numbers, doing so in the wrong order for these two participants in the process.
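For readers without SPSS, the consistency check described above can be sketched in a few lines of Python. This is only an illustration: the function name and the toy rows are hypothetical (they are not values from the dataset), and the three arguments stand for one day's "taken", "uneaten", and "eaten" values, such as "@1CarTaken", "@1CarEnd", and "@1CarEaten" for Monday.

```python
def classify_row(taken, uneaten, eaten):
    """Classify one child-day against the rule eaten = taken - uneaten."""
    if taken is None or uneaten is None or eaten is None:
        return "missing"
    if eaten == taken - uneaten:
        return "consistent"
    if eaten == taken + uneaten:
        # The pattern described in the text for participants #42 and #152.
        return "sum_instead_of_difference"
    return "inconsistent"


# Hypothetical toy rows (NOT real data): (participant id, taken, uneaten, eaten).
toy_rows = [
    (1, 10, 3, 7),
    (2, 7, 3, 10),
    (3, None, 2, 5),
    (4, 8, 1, 5),
]

for pid, taken, uneaten, eaten in toy_rows:
    print(pid, classify_row(taken, uneaten, eaten))
```

Had "eaten" been computed from the other two variables rather than typed in, the "sum_instead_of_difference" category could never occur.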

Another major change is that whereas in the original article the study was run on three days, in the correction there are reports of data from four days.  In the original, Monday was a control day, the between-subject manipulation of the carrot labels was done on Tuesday, and Thursday was a second control day, to see if the effect persisted. In the correction, Thursday is now a second experimental day, with a different experiment that carries over to Friday. Instead of measuring how many carrots were eaten on Thursday, between two labelling conditions ("X-ray Vision Carrots" and "Food of the Day"; there was no "no-label" condition), the dependent variable was the number of carrots eaten on the next day (Friday).

OK, so those are the differences between the two articles. But arguably the most interesting discoveries are in the dataset, so let's look at that next.

Randomisation #fail


As Tim van der Zee noted in the Twitter thread that I linked to at the top of this post, the number of participants in Study 1 in the corrected article has mysteriously increased since the original publication. Specifically, the number of children in the "Food of the Day" condition has gone from 38 to 48, an increase of 10, and the number of children in the "no label" condition has gone from 45 to 64, an increase of 19.  You might already be thinking that a randomisation process that leads to only 22.2% (32 of 144) of the participants being in the experimental condition might not be an especially felicitous one, but as we will see shortly, that is by no means the largest problem here.  (The original article does not actually discuss randomisation, and the corrected version only mentions it in the context of the choice of two labels in the part of the experiment that was conducted on the Thursday, but I think it's reasonable to assume that children were meant to be randomised to one of the carrot labelling conditions on the Tuesday.)
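The arithmetic in the paragraph above is easy to verify. The short Python sketch below uses only the counts quoted there; the variable names are mine, and the number of children in the experimental ("X-ray Vision Carrots") condition is derived as whatever remains of the 144 participants after the other two conditions are subtracted.

```python
# Per-condition counts quoted in the text (original vs. corrected article).
original = {"Food of the Day": 38, "no label": 45}
corrected = {"Food of the Day": 48, "no label": 64}
total_corrected = 144

# Increase in each condition between the two versions.
increases = {name: corrected[name] - original[name] for name in original}

# Children left over for the experimental condition, and their share.
xray_vision = total_corrected - sum(corrected.values())
share = xray_vision / total_corrected

print(increases)
print(xray_vision, f"{100 * share:.1f}%")
```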

The participants were split across seven daycare centres and/or school facilities (I'll just go with the authors' term "schools" from now on).  Here is the split of children per condition and per school:


Oh dear. It looks like the randomisation didn't so much fail here, as not take place at all, in almost all of the schools.

Only two schools (#1 and #4) had a non-zero number of children in each of the three conditions. Three schools had zero children in the experimental condition. Schools #3, #5, #6, and #7 only had children in one of the three conditions. The justification for the authors' model in the corrected version of the article ("a Generalized Estimated Equation model using a negative binominal distribution and log link method with the location variable as a repeated factor"), versus the simple ANOVA that they performed in the original, was to be able to take into account the possible effect of the school. But I'm not sure that any amount of correction for the effect of the school is going to help you when the data are as unbalanced as this.  It seems quite likely that the teachers or researchers in most of the schools were not following the protocol very carefully.
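A school-by-condition tabulation of this kind can be sketched as follows. This is a minimal illustration, not the analysis that was actually run: the function names are hypothetical and the toy rows are invented to show the mechanics, so they do not reproduce the real split.

```python
from collections import defaultdict

CONDITIONS = ("no label", "Food of the Day", "X-ray Vision Carrots")


def crosstab(rows):
    """Count children per school and condition from (school, condition) pairs."""
    counts = defaultdict(lambda: dict.fromkeys(CONDITIONS, 0))
    for school, condition in rows:
        counts[school][condition] += 1
    return dict(counts)


def fully_crossed_schools(table):
    """Schools with at least one child in every condition."""
    return sorted(s for s, c in table.items() if all(n > 0 for n in c.values()))


# Hypothetical toy rows (NOT real data): (school id, condition).
toy_rows = [
    (1, "no label"), (1, "Food of the Day"), (1, "X-ray Vision Carrots"),
    (2, "no label"), (2, "no label"),
    (3, "Food of the Day"),
]

table = crosstab(toy_rows)
print(table)
print(fully_crossed_schools(table))
```

In a properly randomised design, every school of reasonable size would be expected to appear in the second list.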

At school #1, thou shalt eat carrots


Something very strange must have been happening in school #1.  Here is the table of the numbers of children taking each number of carrots in schools #2-#7 combined:

I think that's pretty much what one might expect.  About a quarter of the kids took no carrots at all, most of the rest took a few, and there were a couple of major carrot fans.  Now let's look at the distribution from school #1:


Whoa, that's very different. No child in school #1 had a lunch plate with zero carrots. In fact, all of the children took a minimum of 10 carrots, which is more than 44 (41.1%) of the 107 children in the other schools took.  Even more curiously, almost all of the children in school #1 apparently took an exact multiple of 10 carrots - either 10 or 20. And if we break these numbers down by condition, it gets even stranger:

So 17 out of 21 children in the control condition ("no label", which in the case of daycare children who are not expected to be able to read labels anyway presumably means "no teacher describing the carrots") in school #1 chose exactly 10 carrots. Meanwhile, every single child (12 out of 12) in the "Food of the Day" condition selected exactly 20 carrots.
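One simple way to quantify this kind of "round number" pattern is to compute, for each group, the share of children whose count is a positive multiple of 10. The sketch below is an assumption-laden illustration: the function name and both toy lists are hypothetical, and only the final line uses a figure from the text (17 of 21).

```python
def share_positive_multiples_of_ten(counts):
    """Fraction of non-missing counts that are 10, 20, 30, ... (zero excluded)."""
    values = [c for c in counts if c is not None]
    if not values:
        return None
    hits = sum(1 for c in values if c > 0 and c % 10 == 0)
    return hits / len(values)


# Hypothetical toy counts (NOT real data) for two imaginary groups.
toy_group_a = [10, 10, 20, 20, 22, 10]
toy_group_b = [0, 3, 5, 0, 12, 7]

print(share_positive_multiples_of_ten(toy_group_a))
print(share_positive_multiples_of_ten(toy_group_b))

# From the text: 17 of the 21 "no label" children in school #1 took exactly 10.
print(f"{100 * 17 / 21:.1f}%")
```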

I don't think it's necessary to run any statistical tests here to see that there is no way that this happened by chance. Maybe the teachers were trying extra hard to help the researchers get the numbers they wanted by encouraging the children to take more carrots than they otherwise would (remember, from schools #2-#7, we could expect a quarter of the kids to take zero carrots). But then, did they count out these matchstick carrots individually, 1, 2, 3, up to 10 or 20? Or did they serve one or two spoonfuls and think, screw it, I can't be bothered to count them, let's call it 10 per spoon?  Participants #59 (10 carrots), #64 (10), #70 (22), and #71 (10) have the comment "pre-served" recorded in their data for this day; does this mean that for these children (and perhaps others with no comment recorded), the teachers chose how many carrots to give them, thus making a mockery of the idea that the experiment was trying to determine how the labelling would affect the kids' choices?  (I presume it's just a coincidence that the number of kids with 20 carrots in the "Food of the Day" condition, and the number with 10 carrots in the "no label" condition, are very similar to the number of extra kids in these respective conditions between the original and corrected versions of the article.)

The tomatoes... and the USDA project report


Another interesting thing to emerge from an examination of the dataset is that not one but two foods, with and without "cool names", were tested during the study.  As well as "X-ray Vision Carrots", children were also offered tomatoes. On at least one day, these were described as "Tomato Blasts". The dataset contains variables for each day recording what appears to be the order in which each child was served with the tomatoes or carrots.  Yet, there are no variables recording how many tomatoes each child took, ate, or left uneaten on each day. This is interesting, because we know that these quantities were measured. How? Because it's described in this project report by the Cornell Food and Brand Lab on the USDA website:

"... once exposed to the x-ray vision carrots kids ate more of the carrots even when labeled food of the day. No such strong relationship was observed for tomatoes, which could mean that the label used (tomato blasts) might not be particularly meaningful for children in this age group."

This appears to mean that the authors tested two dependent variables, but only reported the one that gave a statistically significant result. Does that sound like readers of the Preventive Medicine article (either the original or the corrected version) are being provided with an accurate representation of the research record? What other variables might have been removed from the dataset?

It's also worth noting that the USDA project report that I linked to above states explicitly that both the carrots-and-tomatoes study and the "Elmo"/stickers-on-apples study (later retracted by JAMA Pediatrics) were conducted in daycare facilities, with children aged 3–5.  It appears that the Food and Brand Lab probably sent that report to the USDA in 2009. So how was it that by March 2012 (the date on this draft version of the original "carrots" article), everybody involved in writing "Attractive Names Sustain Increased Vegetable Intake in Schools" had apparently forgotten about it, and was happy to report that the participants were elementary school students?  And yet, when Dr. Wansink cited the JAMA Pediatrics article in 2013 and 2015, he referred to the participants as "daycare kids" and "daycare children", respectively; so his incorrect citation of his own work actually turns out to have been a correct statement of what had happened.  And in the original version of that same "Elmo" article, published in 2012, the authors referred to the children (who were meant to be aged 8–11) as "preliterate". So even if everyone had forgotten about the ages of the participants at a conscious level, this knowledge seems to have been floating around subliminally. This sounds like a very interesting case study for psychologists.

Another interesting thing about the March 2012 draft that I mentioned in the previous paragraph is that it describes data being collected on four days (i.e., the same number of days as the corrected article), rather than the three days that were mentioned in the original published version of the article, which was published just four months after the date of the draft:


Extract from the March 2012 draft manuscript, showing the description of the data collection period, with the Portable Document Format header information (from File/Properties) superposed.

So apparently at some point between drafting the original article and submitting it, one of the days was dropped, with the second control day being moved up from Friday to Thursday. Again, some people might feel that at least one version of this article might not be an accurate representation of the research record.

Miscellaneous stuff


Some other minor peculiarities in the dataset, for completeness:

- On Tuesday (the day of the experiment, after a "control" day), participants #194, #198, and #206 were recorded as commenting about "cool carrots"; it is unclear whether this was a reference to the name that was given to the carrots on Monday or Tuesday.  But on Monday, a "control" day, the carrots should presumably have had no name, and on Tuesday they should have been described as "X-ray Vision Carrots".

- On Monday and Friday, all of the carrots should have been served with no label. But the dataset records that five participants (#199, #200, #203, #205, and #208) were in the "X-ray Vision Carrots" condition on Monday, and one participant (#12) was in the "Food of the Day" condition on Friday. Similarly, on Thursday, according to the correction, all of the carrots were labelled as "Food of the Day" or "X-ray Vision Carrots". But two of the cases (participants #6 and #70) have the value that corresponds to "no label" here.

These are, again, minor issues, but they shouldn't be happening. In fact there shouldn't even be a variable in the dataset for the labelling condition on Monday and Friday, because those were control-only days.
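The labelling checks above amount to comparing each recorded condition with the set of conditions that the design allows on that day. A minimal sketch, assuming the data have been reshaped into (participant, day, label) records; that layout, the names ALLOWED and label_violations, and the one compliant row are my own assumptions, while the other three rows restate cases mentioned in the list:

```python
# Labels the design (as described in the correction) allows on each day.
# Tuesday is omitted because all three conditions are possible on that day.
ALLOWED = {
    "Monday": {"no label"},
    "Thursday": {"Food of the Day", "X-ray Vision Carrots"},
    "Friday": {"no label"},
}


def label_violations(records):
    """Return the (participant, day, label) records that break the design."""
    return [
        (pid, day, label)
        for pid, day, label in records
        if day in ALLOWED and label not in ALLOWED[day]
    ]


records = [
    (199, "Monday", "X-ray Vision Carrots"),  # case mentioned in the text
    (12, "Friday", "Food of the Day"),        # case mentioned in the text
    (6, "Thursday", "no label"),              # case mentioned in the text
    (1, "Monday", "no label"),                # hypothetical compliant row
]

for violation in label_violations(records):
    print(violation)
```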

Conclusion


What can we take away from this story?  Well, the correction at least makes one thing clear: absolutely nothing about the report of Study 1 in the original published article makes any sense. If the correction is indeed correct, the original article got almost everything wrong: the ages and school status of the participants, the number of days on which the study was run, the number of participants, and the number of outcome measures. We have an explanation of sorts for the first of these problems, but not the others.  I find it very hard to imagine how the authors managed to get so much about Study 1 wrong the first time they wrote it up. The data for the four days and the different conditions are all clearly present in the dataset.  Getting the number of days wrong, and incorrectly describing the nature of the experiment that was run on Thursday, is not something that can be explained by a simple typo when copying the numbers from SPSS into a Word document (especially since, as I noted above, the draft version of the original article mentions four days of data collection).

In summary: I don't know what happened here, and I guess we may never know. What I am certain of is that the data in Study 1 of this article, corrected or not, cannot be the basis of any sort of scientific conclusion about whether changing the labels on vegetables makes children want to eat more of them.

I haven't addressed the corrections to Study 2 in the same article, although these would be fairly substantial on their own if they weren't overshadowed by the ongoing dumpster fire of Study 1.  It does seem, however, that the spin that is now being put on the story is that Study 1 was a nice but perhaps "slightly flawed" proof-of-concept, but that there is really nothing to see there and we should all look at Study 2 instead.  I'm afraid that I find this very unconvincing.  If the authors really do have confidence in their results, I think they should retract the article and resubmit Study 2 for review on its own. It would be sad for Matthew Z. Klinger, the then high-school student who apparently did a lot of the grunt work for Study 2, to lose a publication like this, but if he is interested in pursuing an academic career, I think it would be a lot better for him not to have his name on the corrected article in its present form.
