- A historian, a data scientist, a programmer, a mathematician, and a philosopher discuss the question *How likely it is that a lottery draw (6 out of 49) contains two consecutive numbers.*
- Suppose that A, B, and C are uniformly distributed on [0, 1]. What is the probability that the equation [latex] Ax^2 + Bx + C = 0 [/latex] has real root(s)?
- Dimiter Toshkov of *Rules of Reason* presents Predicting movie ratings with IMDb data and R and suggests a different way of awarding the Academy Awards based on statistics.
- Visualization-related articles are always popular with our readers. This week, we have: Plotting an Odd number of plots in single image, Beautiful table outputs in R, Visualizations on the Monopoly board, and Basketball movements visualized.
- Xi’an reviews *Bayesian Programming* by Pierre Bessière, Emmanuel Mazer, Juan-Manuel Ahuactzin, and Kamel Mekhnacha.
- Ever wonder how popular your favorite R functions are? Check out the Function Counter for R.
- And finally, Rasmus Bååth shares easy ways to create matrices in R.
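
The two probability puzzles in this list can be checked numerically. The following is a minimal sketch, not the solution from the linked posts: it assumes A, B, and C are independent, and the function names are made up for illustration. The lottery answer uses the standard count of k-subsets of {1, ..., n} with no two adjacent numbers, C(n - k + 1, k); the quadratic puzzle is estimated by Monte Carlo and compared with the commonly quoted closed form (5 + 6 ln 2) / 36.

```python
import random
from math import comb, log


def p_consecutive_pair(n=49, k=6):
    """Exact probability that a k-of-n draw contains at least one pair of consecutive numbers."""
    # Number of k-subsets of {1..n} with no two consecutive numbers is C(n - k + 1, k).
    no_adjacent = comb(n - k + 1, k)
    return 1 - no_adjacent / comb(n, k)


def p_real_roots_mc(trials=200_000, seed=0):
    """Monte Carlo estimate of P(B^2 >= 4AC) for A, B, C ~ Uniform[0, 1].

    Independence of A, B, and C is an assumption; the puzzle does not state it.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a, b, c = rng.random(), rng.random(), rng.random()
        # Ax^2 + Bx + C = 0 has real roots when the discriminant is non-negative.
        if b * b >= 4 * a * c:
            hits += 1
    return hits / trials


# Closed form for the independent-uniform case, roughly 0.2544.
P_REAL_ROOTS_CLOSED_FORM = (5 + 6 * log(2)) / 36

print(f"lottery, at least one consecutive pair: {p_consecutive_pair():.4f}")
print(f"real roots, Monte Carlo estimate:       {p_real_roots_mc():.4f}")
print(f"real roots, closed form:                {P_REAL_ROOTS_CLOSED_FORM:.4f}")
```

Under these assumptions, a 6-out-of-49 draw contains consecutive numbers a little under half the time, and the quadratic has real roots roughly a quarter of the time.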

## Uncategorized

10

Mar 14

## The week in stats (Mar. 10th edition)

3

Mar 14

## The week in stats (Mar. 3rd edition)

- Like almost every week, R articles attract lots of attention from readers. This week, we have: Quick and dirty notes on General Linear Mix Models, How to Make a Bad Password with R, rMaps and the Mexico map, How to Read Histograms and Use Them in R, Useful Functions in R for Manipulating Text Data, and Simply creating various scatter plots with ggplot.
- r4stats.com publishes a detailed report on various ways of measuring the popularity or market shares of approximately 30 software packages for analytics, including well-known names such as R, Matlab, SAS, SPSS, Stata, and Python.
- Quintuitive discusses his experience and thoughts after using RStudio for one year.
- Xi’an reviews two new books this week: Nonlinear Time Series by Randal Douc, Éric Moulines and David Stoffer, and Foundations of Statistical Algorithms by Claus Weihs, Olaf Mersmann and Uwe Ligges.
- If you are an active stock investor, you should consider Using CART for Stock Market Forecasting.
- And finally, Nathan Yau of FlowingData explains the statistical reasoning behind why you should buy the bigger pizza.

24

Feb 14

## The week in stats (Feb. 24th edition)

- Last Monday (Feb. 17th) was R.A. Fisher’s birthday. To honor him, Deborah G. Mayo, Professor of Philosophy at Virginia Tech, publishes *Fisher and Neyman after anger management* and R.A. Fisher: ‘Two New Properties of Mathematical Likelihood’.
- A few articles related to visualization and graphics received lots of attention: ggplot2: Cheatsheet for Visualizing Distributions, Automatically coloring your R output in the terminal using colorout, A visual explanation of conditional probability, R: Fun with surf3D function and No need for SPSS – beautiful output in R.
- Given that you have a five-card hand with ♠K and ♡K, what is the probability that you have all four Kings?
- Coursera offers a new MOOC called Data Analysis for Genomics. The course starts on April 7, 2014.
- And finally, Arthur Charpentier (aka Freakonometrics) publishes a technical article called Identification of ARMA processes.
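
The Kings puzzle in this list comes down to counting hands. As a minimal sketch (the function name is hypothetical, and a standard 52-card deck is assumed): conditioning on two *specific* Kings fixes two cards of the hand, so the sample space is every way to choose the remaining three cards from the other 50.

```python
from fractions import Fraction
from math import comb


def p_four_kings_given_two(deck=52, hand=5):
    """P(all four Kings | hand contains the King of spades and the King of hearts)."""
    # Hands containing both named Kings: choose the other 3 cards from the remaining 50.
    given = comb(deck - 2, hand - 2)
    # Hands that also hold the other two Kings: choose the single remaining card from 48 non-Kings.
    favourable = comb(deck - 4, hand - 4)
    return Fraction(favourable, given)


print(p_four_kings_given_two())         # 3/1225
print(float(p_four_kings_given_two()))  # about 0.00245
```

The answer would be different if the condition were "at least two Kings" rather than these two named Kings, which is part of what makes the puzzle instructive.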

17

Feb 14

## The week in stats (Feb. 17th edition)

- Professor Roger Peng of the Johns Hopkins Bloomberg School of Public Health discusses the meaning of Reproducible Analysis, why it is important, and how to ensure that your R analysis is reproducible.
- A recent survey by Revolution Analytics shows that R language skills attract median salaries in excess of $110,000 in the United States.
- Last week, many helpful R articles attracted attention from readers: A million ways to connect R and Excel, efficiency of Importing Large CSV Files in R, R framework with Object-Oriented Programming, ggplot Fit Line and Lattice Fit Line in R, and Interactive maps with R.
- Big Data is a popular term that everyone in almost every field discusses. However, Stephen Turner, assistant professor of public health sciences and director of the Bioinformatics Core at the University of Virginia, argues that There is no Such Thing as Biomedical “Big Data”.
- Welcome to the age of Databall – the rise of analytics usage in the NBA.
- And finally, suppose that you pick a random integer from 0 to 1000. Given that this integer is divisible by 4, what is the probability that it is also divisible by 3?
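
The divisibility puzzle above is small enough to settle by brute-force enumeration. This is a minimal sketch with a hypothetical function name; it assumes the integer is drawn uniformly and that both endpoints, 0 and 1000, are included, which the puzzle does not spell out.

```python
from fractions import Fraction


def p_div3_given_div4(lo=0, hi=1000):
    """P(n divisible by 3 | n divisible by 4) for n uniform on the integers lo..hi inclusive."""
    div4 = [n for n in range(lo, hi + 1) if n % 4 == 0]
    # Divisible by both 3 and 4 is the same as divisible by 12.
    both = [n for n in div4 if n % 3 == 0]
    return Fraction(len(both), len(div4))


print(p_div3_given_div4())         # 84/251
print(p_div3_given_div4(1, 1000))  # 83/250 if 0 is excluded
```

The result is close to, but not exactly, 1/3, because the range does not contain a whole number of blocks of 12.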

10

Feb 14

## The week in stats (Feb. 10th edition)

- The latest survey conducted by RedMonk shows that R ranks 15th among the top programming languages.
- Simplex Regression (a technique that minimizes the absolute error of residuals rather than squared error) is an alternative to traditional least squares because it is resistant to outliers in the data, and helpful in studies where outliers may be safely and effectively ignored. This week, WenSui (文穗) teaches how to fit simplex regressions in R.
- Does sexual activity change with age?
- Eran Raviv continues the R vs. Matlab comparison. This week, R wins the second round and we are tied at 1-1.
- A brief review of R Studio and “Advanced R Development”
- And finally, Joseph Rickert of Revolution Analytics presents a tutorial on analyzing weather data using his new R package weatherData.

3

Feb 14

## The week in stats (Feb. 3rd edition)

- The Odds Ratio is a confusing but unavoidable statistic which comes up in both scientific and non-scientific articles. In a recent short paper published in the British Medical Journal, Robert Grant explains why it confuses people and how it should be interpreted.
- Last week, many helpful R articles attracted attention from readers: Comparisons of R vs. Matlab, and R vs. Python, how to compare multiple (g)lm in one graph, working with time series data sources, Princeton’s guide to linear modeling and logistic regression with R, and A First Look at rxDForest() – an R classification and regression tree package.
- Xi’an discusses a recent paper by Chris Drovandi and Tony Pettitt called Bayesian indirect inference.
- What are your chances of making it to the big leagues? Ryan Sleeper created an interactive visualization to show the odds for different sports. Choose wisely: for a high school athlete your chances can be as high as 1 in 170 or as low as 1 in 19,056.

27

Jan 14

## The week in stats (Jan. 27th edition)

- If you see a good plot and want the dataset, what should you do? Wiekvoet presents a tutorial on how you can convert graphs into datasets via PlotDigitizer and Engauge Digitizer (and of course R as well).
- When statistics meets rhetoric: A text analysis of “I Have a Dream” in R.
- If you use R and frequently work with business datasets, you may find the following articles useful: Using Scatterplots and Models to Understand the Diamond Market, Estimating a nonlinear time series model in R, Easy data maps with R: the choroplethr package, Database Reflection using dplyr, and Fast and easy data munging, with dplyr.
- PirateGrunt publishes the first article of his new series called An idiot learns Bayesian analysis. As the title suggests, these articles explain key concepts of Bayesian analysis to readers without much background in probability and statistics.
- Wish you had a girlfriend? Learn how to use data to find one.

20

Jan 14

## The week in stats (Jan. 20th edition)

- If you do your statistical work in R, but need to present results in slides, read up on how to make your R figures legible in Powerpoint/Keynote presentations.
- We have a collection of good R tips and tricks this week: How to see source code of built-in functions in R, Calling Python from R with rPython, Some good R programming tips, Averaging R Datasets By Group, and An introduction to dplyr (a set of tools for efficiently manipulating datasets).
- Andrew Gelman gives some advice on writing research articles.
- Xi’an discusses a recent paper on accelerated ABC (approximate Bayesian computation), presented during MCMSki 4.
- And finally, show that for any random variables X and Y, and a constant c, we have P(X + Y > c) ≤ P(X > c/2) + P(Y > c/2).
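
One way to see why the inequality holds is the standard union-bound argument, sketched here (not necessarily the proof the original puzzle had in mind). If both X ≤ c/2 and Y ≤ c/2, then X + Y ≤ c, so the event X + Y > c forces at least one of the two variables to exceed c/2:

```latex
\begin{align*}
\{X + Y > c\} &\subseteq \{X > c/2\} \cup \{Y > c/2\}
  && \text{since } X \le c/2 \text{ and } Y \le c/2 \text{ imply } X + Y \le c, \\
P(X + Y > c) &\le P\bigl(\{X > c/2\} \cup \{Y > c/2\}\bigr)
  && \text{by monotonicity of } P, \\
&\le P(X > c/2) + P(Y > c/2)
  && \text{by the union bound.}
\end{align*}
```

No independence or distributional assumption on X and Y is needed for this argument.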

13

Jan 14

## The week in stats (Jan. 13th edition)

- This week, we recommend two books on machine learning to our readers: *Machine Learning with R* by Brett Lantz (reviewed by Alvaro “Blag” Tejada Galindo), and *An Introduction to Statistical Learning with Applications in R* by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani (a pdf version of this book is available on Gareth James’ website).
- Patrick Burns gives a short tutorial for Excel users who want to start using R called *From spreadsheet thinking to R thinking*.
- Andrew Gelman shares his recent debugging experience.
- Two articles on data visualization: using ggplot2 to help with barplots, and creating whale charts for visualizing customer profitability.
- Arthur Charpentier (aka Freakonometrics) wants to know the research interests (in statistics) of different universities. He studies 35 journals in statistics, probability and econometrics, and creates a series of really cool maps and visuals to present his findings.
- And lastly, some interesting results on the amount of time people spend watching porn videos in the UK.

6

Jan 14

## The week in stats (Jan. 6th edition)

- Revolution Analytics publishes a number of useful R articles: 15 tips on computing with Big Data for those R users who need to handle large datasets efficiently, Combining the Power of DeployR, rCharts, and AngularJS for data visualization, K-means Clustering 86 Single Malt Scotch Whiskies for clustering analysis, and How to ask for R help when you need it.
- Should there be a Nobel prize in statistics? Xi’an and Gelman discuss their views and thoughts on this.
- Radford Neal of the University of Toronto has released a new version of his pqR (pretty quick R). The biggest improvement in this version is that vector operations are sped up using task merging, and the software now has a new logo and its own website.
- Rasmus Bååth, a PhD student at Lund University in Sweden, designs three mascots for Bayesian Statistics. Have a look at them and let him know which one is your favorite! In another post, Rasmus admits that the confidence interval was a tricky concept for him to grasp when he was a student, and he has created an animation of the construction of a confidence interval for those who are also not 100% sure where this concept comes from.
- Statistics Done Wrong – the woefully complete guide to the most popular statistical errors and slip-ups committed by scientists every day.
- Last, but not least, an introduction to integrating R with Google Maps via the R package *ggmap*.

And finally, Statistics Blog wishes everyone Happy New Year!