An overview of linear algebra libraries in Scala/Java

September 19, 2015 in Programming, Statistics

This semester I’m taking a course in big data computing using Scala/Spark, and we have been asked to complete a course project related to big data analysis. Since statistical modeling relies heavily on linear algebra, I investigated some existing Scala/Java libraries that implement matrix operations and linear algebra algorithms.

1. Set-up

Scala/Java libraries are usually distributed as *.jar files. To use them in Scala, we can create a directory to hold them and set an environment variable so that Scala knows about this path.
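As a minimal sketch of that set-up step, the Java program below collects every *.jar file in a directory into a single classpath string, joined with the platform's path separator. That is the format the JVM's CLASSPATH environment variable and `-classpath` option expect. The directory name `lib` and the class name `JarClasspath` are hypothetical; the post does not say which directory or variable it uses.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class JarClasspath {
    /**
     * Joins the absolute paths of all *.jar files directly inside dir,
     * separated by File.pathSeparator (':' on Unix, ';' on Windows).
     * Files.list gives no ordering guarantee, so the paths are sorted.
     */
    static String classpathFor(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files
                    .filter(p -> p.getFileName().toString().endsWith(".jar"))
                    .map(p -> p.toAbsolutePath().toString())
                    .sorted()
                    .collect(Collectors.joining(File.pathSeparator));
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical library directory; pass a different path as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "lib");
        System.out.println(classpathFor(dir));
    }
}
```

The printed string can then be assigned to CLASSPATH in a shell profile. The Java launcher (Java 6 and later) also accepts a wildcard entry such as `lib/*`, which avoids listing each jar by hand.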

A conversation with Hadley Wickham

September 27, 2013 in Statistics, R

Dr. Hadley Wickham is the Chief Scientist of RStudio and Assistant Professor of Statistics at Rice University. He is the developer of the famous R package ggplot2 for data visualization and the author of many other widely used packages such as plyr and reshape2. On September 13, 2013, he gave a talk at the Department of Statistics, Purdue University, and afterwards I (Yixuan) had a conversation with him (Hadley) about his experience and his interests in data visualization, data tidying, R programming, and other related topics.

Is Normal normal?

September 18, 2012 in Statistics

Rumor has it that the Normal distribution is everything. It would take a very long time to discuss the Normal distribution thoroughly, so today I will focus on a (seemingly) simple question, stated below: If $X$ and $Y$ are univariate Normal random variables, will $X+Y$ also be Normal? What’s your reaction to this question? Well, at least for me, when I first saw it I said, “Oh, it’s stupid. Of course it is Normal.”
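For reference, one standard counterexample (not necessarily the one this post goes on to discuss) shows that the answer is no in general: $X$ and $Y$ can each be Normal while $X+Y$ is not. The construction below introduces an independent random sign $S$, which does not appear in the original text.

$$X \sim N(0,1), \qquad S \text{ independent of } X, \quad P(S=1)=P(S=-1)=\tfrac12, \qquad Y = SX.$$

$$P(Y \le t) = \tfrac12 P(X \le t) + \tfrac12 P(-X \le t) = \tfrac12\Phi(t) + \tfrac12\Phi(t) = \Phi(t), \quad \text{so } Y \sim N(0,1),$$

where $P(-X \le t) = \Phi(t)$ follows from the symmetry of $N(0,1)$. However,

$$X + Y = (1+S)X = \begin{cases} 2X, & S = 1,\\ 0, & S = -1,\end{cases} \qquad P(X+Y=0) = \tfrac12.$$

A Normal distribution with positive variance assigns zero probability to every single point, and $X+Y$ is not constant, so $X+Y$ is not Normal. Here $X$ and $Y$ are even uncorrelated, since $E[XY] = E[S]\,E[X^2] = 0$; the sum is guaranteed to be Normal when $X$ and $Y$ are jointly Normal, for example when they are independent.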

Handwriting recognition using R

December 18, 2011 in Programming, R, Statistics

This title is a bit of an exaggeration, since handwriting recognition is an advanced topic in machine learning involving complex techniques and algorithms. In this post I’ll show you a simple demo illustrating how to recognize a single digit (0 to 9) using R. The overall process is as follows: you draw a digit in an R graphics device using your mouse, and then the program “guesses” what you have drawn. It is just for FUN.
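To make the “guess” step concrete, here is a toy sketch of one simple technique, a 1-nearest-neighbour classifier over binary pixel grids. It is a swapped-in illustration written in Java, not the post's R demo: it assumes the mouse drawing has already been rasterized into a small grid of 0/1 pixels, and the 3×3 templates, the class `NearestTemplate`, and its methods are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class NearestTemplate {
    /** A labelled example: the digit it shows and a flattened binary grid (1 = ink, 0 = blank). */
    static final class Template {
        final int digit;
        final int[] pixels;

        Template(int digit, int[] pixels) {
            this.digit = digit;
            this.pixels = pixels;
        }
    }

    /** Hamming distance: the number of positions where two equally sized grids differ. */
    static int distance(int[] a, int[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("grid sizes differ");
        int d = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) d++;
        }
        return d;
    }

    /** Returns the digit of the closest template; on a tie the earlier template wins. */
    static int guess(int[] drawn, List<Template> templates) {
        Template best = null;
        int bestDist = Integer.MAX_VALUE;
        for (Template t : templates) {
            int d = distance(drawn, t.pixels);
            if (d < bestDist) {
                bestDist = d;
                best = t;
            }
        }
        if (best == null) throw new IllegalArgumentException("no templates given");
        return best.digit;
    }

    public static void main(String[] args) {
        // Hypothetical 3x3 templates: "1" is a vertical bar, "0" is a ring.
        List<Template> templates = Arrays.asList(
                new Template(1, new int[] {0, 1, 0, 0, 1, 0, 0, 1, 0}),
                new Template(0, new int[] {1, 1, 1, 1, 0, 1, 1, 1, 1}));
        // A sloppy "1" with one stray pixel in the top-left corner.
        int[] drawn = {1, 1, 0, 0, 1, 0, 0, 1, 0};
        System.out.println(guess(drawn, templates));
    }
}
```

Running `main` prints `1`: the sloppy drawing differs from the “1” template in one pixel and from the “0” template in six. A practical recognizer would use larger grids and many labelled examples per digit.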

How to run regression on large datasets in R

October 2, 2011 in Programming, R, Statistics

It’s well known that R is memory-based software, meaning that datasets must be loaded into memory before they can be manipulated. For small or medium-scale datasets, this doesn’t cause any trouble. However, when you need to deal with larger ones, for instance financial time series or log data from the Internet, memory consumption is always a nuisance. As a simple illustration, you can enter the following code into R to allocate a matrix named x and a vector named y.
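One common way around this memory limit, sketched here in Java and not necessarily the method the post goes on to describe, is to read the data in chunks and keep only the $p \times p$ matrix $X^\top X$ and the length-$p$ vector $X^\top y$; the least-squares estimate then solves $(X^\top X)\beta = X^\top y$. Memory use depends on the number of predictors $p$, not on the number of rows. The class `ChunkedOLS` and the toy data are hypothetical.

```java
public class ChunkedOLS {
    private final int p;
    private final double[][] xtx; // running sum of X^T X over all chunks (p x p)
    private final double[] xty;   // running sum of X^T y over all chunks (length p)

    ChunkedOLS(int p) {
        this.p = p;
        this.xtx = new double[p][p];
        this.xty = new double[p];
    }

    /** Adds one chunk of rows; only this chunk has to be held in memory. */
    void addChunk(double[][] x, double[] y) {
        if (x.length != y.length) throw new IllegalArgumentException("row counts differ");
        for (int r = 0; r < x.length; r++) {
            double[] row = x[r];
            for (int i = 0; i < p; i++) {
                xty[i] += row[i] * y[r];
                for (int j = 0; j < p; j++) xtx[i][j] += row[i] * row[j];
            }
        }
    }

    /** Solves (X^T X) beta = X^T y by Gaussian elimination with partial pivoting. */
    double[] solve() {
        // Work on a copy so more chunks can still be added afterwards.
        double[][] a = new double[p][p + 1];
        for (int i = 0; i < p; i++) {
            System.arraycopy(xtx[i], 0, a[i], 0, p);
            a[i][p] = xty[i];
        }
        for (int col = 0; col < p; col++) {
            int piv = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[piv][col])) piv = r;
            }
            double[] tmp = a[col]; a[col] = a[piv]; a[piv] = tmp;
            if (Math.abs(a[col][col]) < 1e-12) throw new ArithmeticException("X^T X is (nearly) singular");
            for (int r = col + 1; r < p; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= p; c++) a[r][c] -= f * a[col][c];
            }
        }
        // Back substitution on the upper-triangular system.
        double[] beta = new double[p];
        for (int i = p - 1; i >= 0; i--) {
            double s = a[i][p];
            for (int j = i + 1; j < p; j++) s -= a[i][j] * beta[j];
            beta[i] = s / a[i][i];
        }
        return beta;
    }

    public static void main(String[] args) {
        // Toy data lying exactly on y = 1 + 2t; the first column is the intercept.
        ChunkedOLS ols = new ChunkedOLS(2);
        ols.addChunk(new double[][] {{1, 0}, {1, 1}}, new double[] {1, 3});
        ols.addChunk(new double[][] {{1, 2}, {1, 3}}, new double[] {5, 7});
        double[] beta = ols.solve();
        System.out.printf("intercept = %.3f, slope = %.3f%n", beta[0], beta[1]);
    }
}
```

This prints `intercept = 1.000, slope = 2.000`. Forming $X^\top X$ squares the condition number of $X$, so for ill-conditioned data an incremental QR update is numerically safer; the normal-equation form is used here only because it is short.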

Yixuan Qiu