A few days ago a friend asked me the following question: how can one efficiently extract specific lines from a large text file, possibly compressed with Gzip? He mentioned that he had tried some R functions such as read.table(skip = ...), but found that reading the data was too slow, so he was looking for alternative ways to extract the data. This is a common task when preprocessing large data sets, since in data exploration we very often want to peek at a small subset of the whole data to gain some insight.
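One possible approach, sketched here in base R, is to stream the file through a connection in chunks and keep only the requested lines. The helper name `extract_lines` and its `chunk_size` parameter are hypothetical and chosen for illustration; this is a minimal sketch under those assumptions, not a benchmarked solution.

```r
# Hypothetical helper: return the lines whose numbers are in `wanted`,
# reading the file chunk by chunk so it is never held in memory as a whole.
extract_lines <- function(path, wanted, chunk_size = 10000L) {
  wanted <- sort(unique(as.integer(wanted)))
  # gzfile() reads gzip-compressed files and also plain uncompressed ones
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  out <- character(0)
  offset <- 0L  # number of lines consumed so far
  while (length(wanted) > 0L) {
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0L) break  # end of file reached
    last <- offset + length(chunk)
    hit <- wanted[wanted <= last]   # requested lines inside this chunk
    out <- c(out, chunk[hit - offset])
    wanted <- wanted[wanted > last]
    offset <- last
  }
  out
}

# Small demonstration on a temporary gzipped file
tmp <- tempfile(fileext = ".gz")
con <- gzfile(tmp, open = "wt")
writeLines(paste("line", 1:50000), con)
close(con)

res <- extract_lines(tmp, c(3, 25000, 49999))
print(res)
```

The loop stops as soon as the last requested line has been read, so extracting lines near the top of a large file does not require scanning the rest of it. Results come back in increasing line-number order.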
This is a pretty old topic in R graphics. A classical article in R NEWS, Non-standard fonts in PostScript and PDF graphics, describes how to use and embed system fonts in the PDF/PostScript device. More recently, Winston Chang developed the extrafont package, which makes the procedure much easier. A useful introduction can be found on the readme page of extrafont, and also on the Revolution blog. Now we have another choice: the showtext package.
This title is a bit of an exaggeration, since handwriting recognition is an advanced topic in machine learning involving complex techniques and algorithms. In this post I’ll show you a simple demo illustrating how to recognize a single digit (0 to 9) using R. The process is as follows: you draw a digit in an R graphics device using your mouse, and then the program will “guess” what you have drawn. It is just for FUN.
It’s well known that R is memory-based software, meaning that datasets must be loaded into memory before being manipulated. For small or medium-scale datasets, this doesn’t cause any trouble. However, when you need to deal with larger ones, for instance financial time series or log data from the Internet, memory consumption is always a nuisance. As a simple illustration, you can enter the following code into R to allocate a matrix named x and a vector named y.
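A minimal sketch of such an allocation follows. The dimensions `n` and `p` are assumptions chosen only for illustration, and `object.size()` is used to show roughly how much memory each object occupies.

```r
# Assumed sizes, for illustration only
n <- 1e5  # number of rows (observations)
p <- 10   # number of columns (variables)

set.seed(123)
x <- matrix(rnorm(n * p), nrow = n, ncol = p)  # n-by-p numeric matrix
y <- rnorm(n)                                  # numeric vector of length n

# Each double takes 8 bytes, so x needs about 8 * n * p bytes
print(object.size(x), units = "MB")
print(object.size(y), units = "MB")
```

Scaling `n` up by a few orders of magnitude makes the memory cost noticeable very quickly, since the whole matrix has to live in RAM.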