Having written about statistical ethics (or ethical statistics, if you will), I was pleased to run across an article by Andrew Gelman on the topic recently. It turns out that at the beginning of this year he started writing a column in Chance magazine on just this subject. The inaugural article was on data sharing, and in it Gelman recounts an experience he had as a doctoral student:
The ethics violation, as I see it, by Blackman and his statistician colleague came not in their design, data collection, or even their flawed analysis, but when they had the opportunity to subject their data to an outside analysis.
Having been supplied free travel and housing to that conference and having spent several days more reading a key source article and analyzing its summary statistics, I felt both an obligation and an inclination to help. So I looked up Blackman’s address in North Carolina and sent him a polite letter saying I was a statistician who had attended a conference in which his work was mentioned, that I had two ideas of how he could analyze his data better (I gave some details here and maybe a graph or two), and that I would like to see his raw data so I could do more. I used Harvard letterhead, but was careful not to identify myself as a PhD student—I think I called myself a “researcher”—and I ran the letter by some of my fellow students to make sure I was being sufficiently polite.
A few days later, I followed up the letter with a phone call… at which point Blackman told me he had discussed the matter with his statistician and they decided their analysis was just fine and it would be too much trouble for them to copy the data from their logbooks and send it to me.
That was the unethical step. Refusing to share your data is improper, and the lead researcher and his statistician should have realized that, given their lack of expertise in statistics, it was at least plausible that an outsider could improve on their analysis.
Gelman carefully identifies the point at which Blackman’s behavior became unethical, which is important. I am a big supporter of open data, but there are legitimate cases where someone might not be able to make data available. (One of the most commonly heard explanations, quite understandably, is that it takes a lot of time to put files into a nice, orderly form for public use.) But withholding data because you suspect someone might overturn your analysis is inexcusable.