A few days ago a friend asked me the following question: how can one efficiently extract specific lines from a large text file, possibly compressed by Gzip? He mentioned that he had tried some R functions such as read.table(skip = ...), but found that reading the data was too slow, so he was looking for alternative ways to extract the data. This is a common task when preprocessing large data sets, since in data exploration we very often want to peek at a small subset of the whole data to gain some insights.
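As a rough illustration of the task (not necessarily the approach the full post takes), here is a minimal base R sketch that streams the file through a connection in chunks and keeps only the requested lines. The helper name `extract_lines` and its `chunk_size` argument are hypothetical, introduced only for this example.

```r
# Hypothetical helper (not from the original post): return the lines of
# `path` whose 1-based line numbers appear in `wanted`.
extract_lines <- function(path, wanted, chunk_size = 10000L) {
  wanted <- sort(unique(as.integer(wanted)))
  wanted <- wanted[wanted > 0L]
  if (length(wanted) == 0L) return(character(0))

  # gzfile() reads gzip-compressed files and also plain uncompressed text
  con <- gzfile(path, open = "r")
  on.exit(close(con))

  out <- character(0)
  offset <- 0L            # number of lines consumed so far
  last <- max(wanted)     # no need to read past the last requested line

  while (offset < last) {
    chunk <- readLines(con, n = min(chunk_size, last - offset))
    if (length(chunk) == 0L) break   # file ended before `last`
    # Requested line numbers that fall inside this chunk, as local indices
    idx <- wanted[wanted > offset & wanted <= offset + length(chunk)] - offset
    out <- c(out, chunk[idx])
    offset <- offset + length(chunk)
  }
  out
}

# Small demonstration on a temporary gzip file
tmp <- tempfile(fileext = ".gz")
con <- gzfile(tmp, open = "w")
writeLines(paste("row", 1:50000), con)
close(con)
extract_lines(tmp, c(7, 20001, 49999))
```

The sketch returns lines in increasing line-number order and drops duplicate requests. It stops reading once the largest requested line has been seen, so lines near the top of a large file are cheap to fetch, while lines near the end still require decompressing everything before them.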


Have you ever tried to find a lightweight yet nice theme for R Markdown documents, like this page?

Themes for R Markdown

With the powerful rmarkdown package, we can easily create a nice HTML document by adding some meta information to the header, for example

---
title: Nineteen Years Later
author: Harry Potter
date: July 31, 2016
output:
  rmarkdown::html_document:
    theme: lumen
---

The html_document engine uses the Bootswatch theme library to support different document styles.


A Quick View of Recommender System

The main task of a recommender system is to predict the unknown entries in the rating matrix based on the observed values, as shown in the table below: each cell with a number in it is the rating given by some user to a specific item, while the cells marked with question marks are unknown ratings that need to be predicted. Elsewhere in the literature, this problem is also called collaborative filtering, matrix completion, matrix recovery, etc.


Introduction

I have seen several conversations on the Rcpp-devel mailing list asking how to do numerical integration or optimization in Rcpp. While R does have the functions Rdqags, Rdqagi, nmmin, vmmin, etc. in its API to accomplish such tasks, using them with Rcpp is not so straightforward. For my own research projects I need to do a lot of numerical integration, root finding and optimization, so to make my life a little easier, I created the RcppNumerical package, which simplifies these procedures.


In January 2016, I was honored to receive an “Honorable Mention” of the John Chambers Award 2016. This article was written for R-bloggers, whose builder, Tal Galili, kindly invited me to write an introduction to the rARPACK package.

A Short Story of rARPACK

Eigenvalue decomposition is a commonly used technique in numerous statistical problems. For example, principal component analysis (PCA) essentially performs an eigenvalue decomposition of the sample covariance matrix of a data matrix: the eigenvalues are the component variances, and the eigenvectors are the variable loadings.
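The PCA statement above can be checked numerically. The following is a minimal sketch using only base R on simulated data (an assumption for illustration; it does not use rARPACK itself): it compares `eigen()` applied to the sample covariance with the output of `prcomp()`.

```r
set.seed(123)

# Simulated data matrix: 200 observations of 4 variables (illustrative only)
x <- matrix(rnorm(200 * 4), nrow = 200, ncol = 4)

# Eigenvalue decomposition of the sample covariance matrix
e <- eigen(cov(x), symmetric = TRUE)

# PCA via prcomp(), which centers the columns by default
p <- prcomp(x)

# Eigenvalues should match the component variances
stopifnot(isTRUE(all.equal(e$values, p$sdev^2)))

# Eigenvectors should match the loadings; each column is only
# determined up to sign, so compare absolute values
stopifnot(isTRUE(all.equal(abs(e$vectors), abs(unname(p$rotation)))))
```

Here `eigen()` computes all eigenpairs, which is fine for a 4 x 4 covariance matrix but becomes costly when the matrix is large and only a few leading components are needed.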


Using showtext in knitr

Thanks to the issue report by yufree and Yihui’s kind work, starting from version 1.6.10 (development version), knitr supports using showtext to change fonts in R plots. To demonstrate its usage, this document itself serves as an example. (Rmd source code) We first do some setup work, mainly setting options that control the appearance of the plots. Note that if you create plots in PNG format (the default format for HTML output), it is strongly recommended to use the CairoPNG device rather than the default png, since the latter can produce quite ugly plots when used with showtext.



Yixuan's Blog

Statistics, Programming, and …?