Updates on RSpectra: new "center" and "scale" parameters for svds()

November 29, 2019 in R, Statistics, Programming

Per the suggestion by @robmaz, RSpectra::svds() now has two new parameters, center and scale, which support implicit centering and scaling of matrices in partial SVD. The minimum version for this new feature is RSpectra >= 0.16-0. These two parameters are very useful for principal component analysis (PCA) based on the covariance or correlation matrix, because neither matrix has to be formed explicitly. Below we simulate a random data matrix and use both R’s built-in prcomp() and the svds() function in RSpectra to compute PCA.
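
To make "implicit" concrete, here is a minimal sketch of the idea in Python with NumPy/SciPy. It is not RSpectra's implementation. A partial SVD solver only needs the products Av and Aᵀu. For the centered and scaled matrix A = (X − 1μᵀ)D⁻¹, both products can be computed from X itself: Av = X(D⁻¹v) − (μᵀD⁻¹v)1 and Aᵀu = D⁻¹(Xᵀu − (1ᵀu)μ). Here μ holds the column means and D the column standard deviations. The following are assumptions of this sketch: the helper name implicit_pca, the use of scipy.sparse.linalg.svds on a LinearOperator, and the n − 1 denominator for the standard deviations (the convention of R's scale()).

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, svds


def implicit_pca(X, k, center=True, scale=False):
    """Top-k PCA of X via partial SVD of A = (X - 1 mu^T) D^{-1}.

    Hypothetical helper for illustration, not RSpectra's code. The SVD
    solver never sees an explicitly centered/scaled copy of X; it only
    calls the two product functions defined below.
    """
    n, p = X.shape
    mu = X.mean(axis=0) if center else np.zeros(p)
    # Assumed convention: n - 1 denominator; root-mean-square about zero
    # when center=False, as R's scale() does.
    sd = np.sqrt(((X * X).sum(axis=0) - n * mu**2) / (n - 1)) if scale else np.ones(p)

    def matvec(v):
        w = np.ravel(v) / sd                    # D^{-1} v
        return X @ w - mu @ w                   # X w - (mu^T w) 1

    def rmatvec(u):
        u = np.ravel(u)
        return (X.T @ u - mu * u.sum()) / sd    # D^{-1} (X^T u - (1^T u) mu)

    A = LinearOperator((n, p), matvec=matvec, rmatvec=rmatvec, dtype=np.float64)
    _, s, vt = svds(A, k=k)
    order = np.argsort(s)[::-1]                 # sort singular values, largest first
    s, vt = s[order], vt[order]
    sdev = s / np.sqrt(n - 1)                   # component standard deviations
    rotation = vt.T                             # loadings, one column per component
    return sdev, rotation


# Compare with explicit centering/scaling followed by a full SVD.
rng = np.random.default_rng(123)
X = rng.standard_normal((500, 30)) @ rng.standard_normal((30, 30)) + 5.0
sdev, rotation = implicit_pca(X, k=3, center=True, scale=True)

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
_, d, vt_ref = np.linalg.svd(Z, full_matrices=False)
sdev_ref = d[:3] / np.sqrt(X.shape[0] - 1)
print(np.allclose(sdev, sdev_ref))
```

The component standard deviations are the singular values divided by √(n − 1), which is how prcomp() reports sdev. The loadings agree with the explicit computation up to sign.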

recosystem: recommender system using parallel matrix factorization

July 15, 2016 in Programming, R, Statistics

A Quick View of Recommender Systems: The main task of a recommender system is to predict unknown entries in the rating matrix based on observed values, as shown in the table below. Each cell with a number in it is the rating given by some user to a specific item, while those marked with question marks are unknown ratings that need to be predicted. In the literature, this problem is also called collaborative filtering, matrix completion, matrix recovery, etc.
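
The matrix factorization idea can be sketched in a few lines. Each rating is modeled as the inner product of a user factor and an item factor, R ≈ PQᵀ, and the factors are fitted only on the observed cells. The sketch below uses plain, single-threaded stochastic gradient descent in Python. It is not the parallel LIBMF algorithm that recosystem wraps. The toy ratings, the helper names factorize and rmse, and all tuning values (dim, lr, reg, epochs) are made up for illustration.

```python
import numpy as np


def factorize(ratings, n_users, n_items, dim=2, lr=0.01, reg=0.02, epochs=1000, seed=0):
    """Fit R ~ P Q^T by SGD on observed (user, item, rating) triples only.

    Hypothetical helper; hyperparameters are illustrative, not tuned.
    """
    rng = np.random.default_rng(seed)
    P = rng.uniform(0.0, 1.0, size=(n_users, dim))   # user factors
    Q = rng.uniform(0.0, 1.0, size=(n_items, dim))   # item factors
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):    # visit ratings in random order
            u, i, r = ratings[idx]
            err = r - P[u] @ Q[i]
            pu = P[u].copy()
            # Gradient step on the squared error plus an L2 penalty.
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q


def rmse(P, Q, ratings):
    """Root mean squared error over the observed ratings."""
    return np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in ratings]))


# Toy 5 x 4 rating matrix; unlisted (user, item) cells are the "?" entries.
observed = [(0, 0, 5.0), (0, 1, 3.0), (0, 3, 1.0),
            (1, 0, 4.0), (1, 3, 1.0),
            (2, 0, 1.0), (2, 1, 1.0), (2, 3, 5.0),
            (3, 0, 1.0), (3, 3, 4.0),
            (4, 1, 1.0), (4, 2, 5.0), (4, 3, 4.0)]

P, Q = factorize(observed, n_users=5, n_items=4)
R_hat = P @ Q.T      # predictions for every cell, including the unknown ones
print(round(rmse(P, Q, observed), 3))
```

The unknown entries are simply read off R_hat once the factors are fitted.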

Large scale eigenvalue decomposition and SVD with rARPACK

February 20, 2016 in Programming, Statistics, R

In January 2016, I was honored to receive an “Honorable Mention” for the John Chambers Award 2016. This article was written for R-bloggers, whose founder, Tal Galili, kindly invited me to write an introduction to the rARPACK package. A Short Story of rARPACK: Eigenvalue decomposition is a commonly used technique in numerous statistical problems. For example, principal component analysis (PCA) essentially performs an eigenvalue decomposition of the sample covariance matrix of a data matrix: the eigenvalues are the component variances, and the eigenvectors are the variable loadings.
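
The PCA statement above can be checked numerically. The following sketch is in Python. It uses scipy.sparse.linalg.eigsh, an ARPACK-based solver standing in here for rARPACK's functions, to compute only the top three eigenpairs of the sample covariance matrix. The simulated data and the choice k = 3 are arbitrary. Projecting the centered data onto the eigenvectors gives component scores whose sample variances equal the eigenvalues.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

rng = np.random.default_rng(2016)
n, p, k = 1000, 25, 3
# Correlated data: independent normals mixed by a random p x p matrix.
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))

Xc = X - X.mean(axis=0)                 # center each variable
C = Xc.T @ Xc / (n - 1)                 # sample covariance matrix (p x p)

# Partial eigendecomposition: only the k largest eigenpairs of C.
vals, vecs = eigsh(C, k=k, which="LA")
order = np.argsort(vals)[::-1]          # sort eigenvalues in decreasing order
vals, loadings = vals[order], vecs[:, order]

scores = Xc @ loadings                  # principal component scores
print(vals)                             # component variances
print(scores.var(axis=0, ddof=1))       # same values, up to rounding
```

Only the leading eigenpairs are requested, which is the point of an ARPACK-style solver when p is large and a few components suffice.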

An overview of linear algebra libraries in Scala/Java

September 19, 2015 in Programming, Statistics

This semester I’m taking a course in big data computing using Scala/Spark, and we are asked to complete a course project related to big data analysis. Since statistical modeling relies heavily on linear algebra, I investigated some existing Scala/Java libraries that deal with matrices and linear algebra algorithms. 1. Set-up: Scala/Java libraries are usually distributed as *.jar files. To use them in Scala, we can create a directory to hold them and set up an environment variable so that Scala knows about this path.

How to run regression on large datasets in R

October 2, 2011 in Programming, R, Statistics

It’s well known that R is memory-based software, meaning that datasets must be loaded into memory before being manipulated. For small or medium-scale datasets, this doesn’t cause any trouble. However, when you need to deal with larger ones, for instance financial time series or log data from the Internet, memory consumption is always a nuisance. As a simple illustration, you can run the following code in R to allocate a matrix named x and a vector named y.

Yixuan Qiu