Uncategorized


23
Dec 13

The week in stats (Dec. 23rd edition)


16
Dec 13

The week in stats (Dec. 16th edition)


10
Dec 13

Prize for statistics students?

In order to promote work on statistical simulations, as well as thinking about deeper issues in data analysis, I’m considering starting a prize for students.

Here are my ideas:

* One prize would be for the most innovative use of Monte Carlo methods to model a problem in pure or applied statistics. This prize would be offered in two divisions: undergraduate and graduate.

* One prize would be for an essay that explores the foundations of probability theory or statistics with an emphasis on epistemological issues. This would be open to all students.

* Prizes would be in the $3,000 – $6,000 range.

* The judging committee would be drawn from professors, students and industry.

What are your thoughts? Specifically:

* If you’re a student, is this something you’d apply for?

* If you’re a professor or instructor, do you think your students would be interested in this? Would you pass along the information to them?

* If you represent a company, could you see advantages to sponsoring one of the prizes?

* What changes or suggestions do you have?


9
Dec 13

The week in stats (Dec. 9th edition)

  • The problems with using a p-value as a fixed cutoff for hypothesis testing are well known. Probabilities and P-Values is another article that discusses the weaknesses of the p-value. However, like every other author who claims the p-value is horrible, this one does not offer a satisfactory substitute.
  • PirateGrunt is currently producing a series of 24 articles called 24 Days of R. In every post, he shares a few neat R tricks and explains how you can use them. You can find his first post here and the subsequent ones on his blog.
  • Coursera – an online education startup – has rapidly expanded its curriculum of statistics and data analysis courses. There are now 33 modules directly linked to the field, excluding the courses where statistics and data science are used as a supporting tool (e.g. finance). These courses make use of several software tools, such as Python, MATLAB and, of course, R. Here’s the complete list of Coursera courses using R, ranked by “popularity”.
  • For those interested in machine learning, a preview of Data Mining Applications with R by Yanchang Zhao and Yonghua Cen is available here.
  • A tutorial on the R package Plotly, and how to make beautiful visuals and graphs with it.
  • A recent article by Matt Asay claims that “Python is displacing R as the language for data science.” David Smith of Revolution Analytics shares his thoughts on the competition between R and Python.
  • Consider n points uniformly distributed on a sphere. What is the probability that all the points lie in the same hemisphere (not necessarily the northern or southern hemisphere)? Arthur Charpentier of Freakonometrics presents a simulation-based solution, along with some very nice visuals.
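
The hemisphere question in the last item lends itself to a quick Monte Carlo check. The sketch below is not taken from Charpentier's post; it is a minimal Python illustration, and the function names are made up for this example. It assumes the points are in general position, in which case a common hemisphere exists exactly when some great circle through two of the points has all the remaining points on one side. The closed form (n² − n + 2)/2ⁿ, a known result usually attributed to Wendel, does not appear in the item above and is included only as a sanity check.

```python
import itertools
import random


def random_unit_vector(rng):
    """Draw a point uniformly on the unit sphere by normalising a 3D Gaussian."""
    while True:
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = (x * x + y * y + z * z) ** 0.5
        if norm > 0.0:
            return (x / norm, y / norm, z / norm)


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def in_common_hemisphere(points):
    """True if some closed hemisphere contains every point.

    Assumes general position: tries each great circle through a pair of
    points and checks whether all other points fall on one side of it.
    """
    if len(points) <= 2:
        return True
    for i, j in itertools.combinations(range(len(points)), 2):
        normal = cross(points[i], points[j])
        sides = [dot(normal, p) for k, p in enumerate(points) if k not in (i, j)]
        if all(s >= 0.0 for s in sides) or all(s <= 0.0 for s in sides):
            return True
    return False


def estimate_probability(n, trials=20000, seed=0):
    """Monte Carlo estimate of P(n uniform points share a hemisphere)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        points = [random_unit_vector(rng) for _ in range(n)]
        hits += in_common_hemisphere(points)
    return hits / trials


def wendel_probability(n):
    """Closed-form probability for a sphere in 3D, used as a sanity check."""
    return (n * n - n + 2) / 2 ** n
```

For example, `wendel_probability(4)` evaluates to 0.875, and `estimate_probability(4)` should land close to that value, with the gap shrinking as `trials` grows.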

25
Nov 13

The week in stats (Nov. 25th edition)


18
Nov 13

The week in stats (Nov. 18th edition)


11
Nov 13

The week in stats (Nov. 11th edition)


4
Nov 13

The week in stats (Nov. 4th edition)


28
Oct 13

The week in stats (Oct. 28th edition)


22
Oct 13

The disgrace of the mandatory census

In 2011, Audrey Tobias refused to provide Statistics Canada with a filled-out copy of her census form, as mandated by law. Her refusal, and her decision to stand by it, led to a trial in which the 89-year-old faced jail time. Although Tobias stated that her act was a protest against the use of US military contractor Lockheed Martin to process the forms, and not against the mandatory nature of the census itself, this was really a trial of the government’s power to compel citizens to provide it with private information. As Tobias’ lawyer, Peter Rosenthal, argued, compelling Tobias to fill out the form on threat of jail was a violation of the Canadian Charter of Rights and Freedoms, and its provisions for freedom of conscience and expression.

The judge in the case, Ramez Khawly, rejected Rosenthal’s argument, but found a way to acquit Tobias anyway, on the basis of his doubt about her intent in not filling out the form. Perhaps sensing the outrage that might ensue over punishing an octogenarian for a non-violent act of civil disobedience, Khawly was nevertheless too fearful, or obtuse, to uphold an argument that would set a highly inconvenient precedent from the standpoint of the state. The judge both justified and exposed his particular mix of cowardice and compassion by asking, “Could they [the Crown] not have found a more palatable profile to prosecute as a test case?”

I suppose I shouldn’t be surprised by the judge’s politically expedient decision. What shocks me is the reaction of many regular citizens, and in particular of some fellow statisticians. Let me be as clear as possible about this: support for the mandatory census is a moral abomination and a professional disgrace. It should go without saying that informed consent is a baseline, a bare minimum for morality when conducting experiments with human subjects. Forcing citizens to divulge information they would otherwise wish to keep private, on pain of throwing them in a locked cage, does not qualify as informed consent!

There is no point here in arguing that what’s being requested is a minor inconvenience, or an inconsequential imposition. Informed consent doesn’t mean “what we think you should consent to.” More than anything else, statistics is about understanding the inherent uncertainties in measurement, prediction, and extrapolation. The fact that you might not object to answering certain questions gives you no reason to assume that your preferences are universal. Finally, note that to at least a small group of revolutionaries, the right not to divulge certain information to authorities was so important that it was written right into the US Bill of Rights.

Besides the argument that the census is minimally invasive, I’ve also heard it argued that the value of obtaining complete data outweighs concerns of privacy and choice. To this I say that our desire, as statisticians, for complete and reliable data isn’t some ethical trump card, nor is it the scientific version of a religious indulgence that purifies our transgressions.

Dealing with incomplete and imprecise data isn’t some unique problem that can be overcome at the point of a gun; it’s the very heart and soul of statistics! In the real world, there is no such thing as indisputably complete or infinitely precise data. That’s why we have confidence intervals, likelihood estimates, rules for data cleaning, and a wide variety of sampling procedures. In fact, these sampling procedures, if properly chosen and well executed, can be more accurate than a census.
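
The claim that a well-run sample can beat a census can be illustrated with a toy simulation. This is only a sketch under invented assumptions, not a model of any real survey: the population is a made-up log-normal variable, the attempted census suffers non-response that depends on the value being measured (response rates of 0.6 above the median and 0.9 below it are arbitrary choices), and the much smaller simple random sample is assumed to get an answer from every sampled unit. The function name `compare_census_and_sample` is hypothetical.

```python
import random
import statistics


def compare_census_and_sample(seed=0, population_size=100_000, sample_size=2_000):
    """Return (true_mean, census_mean, sample_mean) for one simulated population."""
    rng = random.Random(seed)

    # Hypothetical population: a right-skewed variable such as income.
    population = [rng.lognormvariate(10.0, 0.75) for _ in range(population_size)]
    true_mean = statistics.fmean(population)
    median = statistics.median(population)

    # Attempted census with non-response. ASSUMPTION: units above the median
    # answer with probability 0.6, everyone else with probability 0.9.
    respondents = [
        x for x in population
        if rng.random() < (0.6 if x > median else 0.9)
    ]
    census_mean = statistics.fmean(respondents)

    # Simple random sample. ASSUMPTION: every sampled unit responds.
    sample = rng.sample(population, sample_size)
    sample_mean = statistics.fmean(sample)

    return true_mean, census_mean, sample_mean
```

Under these assumptions the census average is pulled below the true mean, because high values are under-represented among respondents, while the sample average differs from the truth only by sampling noise. How the comparison comes out in practice depends entirely on the actual non-response pattern and on how well the sample is executed.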

I call on all those who work for StatsCan or other organizations to refuse to participate in any non-consensual surveys, to stand up for their own good name and the good name of the profession, and to focus their energies on finding creative, scientifically sound, non-coercive ways to obtain high-quality data.