cor and hist functions of R not working in Binder

I was trying to find some correlations between different columns of a dataset and store them in a matrix to form a correlation matrix. For this, I wanted to use the cor function of R.
The link to the dataset: https://bit.ly/3i4rbWl
What I have done so far:

df<-read.csv('SampleSuperstore.csv',header=T)    #Reading the dataset
df<-df[!duplicated(df),]    #removing duplicate entries

Now, when I apply the cor function like this:

cor_mat<-round(cor(df[,sapply(df,class)!='character']),3)

Binder throws an error:

Error in cor(df[, sapply(df, class) != "character"]): ‘x’ must be numeric
Traceback:

  1. cor(df[, sapply(df, class) != "character"])
  2. stop("'x' must be numeric")

But on my local machine, the same command with the same data and the same conditions works flawlessly.

Not only that, I also ran my IPYNB file in Google Colaboratory, and no errors came up there either.

Further, for my work, I had to plot some histograms of this dataset and I went on to do those as follows:

par(mfcol=c(3,2))
for(i in 1:ncol(df))
{
    if (class(df[,i])!='character')
    {
        hist(df[,i],xlab=names(df)[i],main=paste('Histogram of ',
                                                 names(df)[i]),col='#313b69')
    }
}

Here also, Binder threw an error:

Error in hist.default(df[, i], xlab = names(df)[i], main = paste("Histogram of ", : ‘x’ must be numeric
Traceback:

  1. hist(df[, i], xlab = names(df)[i], main = paste("Histogram of ",
    . names(df)[i]), col = "#313b69")
  2. hist.default(df[, i], xlab = names(df)[i], main = paste("Histogram of ",
    . names(df)[i]), col = "#313b69")
  3. stop("'x' must be numeric")

But on my local machine and in Google Colaboratory, things worked flawlessly again.

From the error, I understand that a variable must be numeric before cor and hist can be applied to it. However, I already check for exactly this and only apply the functions to columns that pass the check. My local system and Google Colab show that there is nothing wrong with my code, but Binder says otherwise.

Can someone help me understand why this discrepancy occurs?

Edit 1: As I have moved up one trust level, I can now put multiple images in my posts, so I have updated the post with the two images that I thought were relevant.


Can you find out what version of R (and any related packages) are running on your local machine, Google Colab and Binder respectively?


@sgibson91 I’m running R version 4.1.0 on my local system, but I don’t know what’s running on Binder and Google Colab. Can you guide me on how to find that out?

You can use the version variable that is available in R.
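A minimal sketch of what that check could look like when run in a cell on each platform. The variable names r_ver and is_r4_or_newer are just illustrative; R.version.string, R.version$nickname and getRversion() are part of base R.

```r
# Print the full version string, something like "R version 4.1.0 (...)"
print(R.version.string)

# The release nickname is stored separately
print(R.version$nickname)

# getRversion() returns a version object that can be compared directly
r_ver <- getRversion()
is_r4_or_newer <- r_ver >= "4.0.0"
print(is_r4_or_newer)
```

Running the same cell locally, on Google Colab and on Binder makes any mismatch between the three environments visible right away.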

@krassowski Google Colab is using version 4.1.0 (Camp Pontanezen), the same as on my laptop, whereas Binder is using version 3.6.3 (Holding the Windsock).


Great! That confirms that it’s not Binder that throws this error, but R, and that’s easier to solve :wink: The 4.0 release of R brought some long-requested changes, including a change to the default value of the stringsAsFactors argument of the read.csv function (and friends) from TRUE to FALSE. On R 3.6, your text columns are therefore read as factors, which pass your class != 'character' check and then fail in cor and hist. This means that you need to either:

  • adapt your code to run with an older version, or
  • tell binder to use a specific version of R

I would strongly recommend the second option. You can specify the version using a runtime.txt file as described in the binder-examples/r repository, or use conda (see binder-examples/r-conda). In any case, pinning the R version will be useful for you in the long term. Also, make sure to read the excellent From Zero to Binder in R! article if you want to make better use of Binder’s capabilities.

As for the first option (adjusting your code to run with an older version of R), this can be done by passing stringsAsFactors=FALSE to your read.csv call. Have a look at the screenshot where I highlighted how R 3.6 converted your columns to factors when you did not specify this argument (red highlight), and how those remained characters after adding it (blue highlight):
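The same effect can be reproduced in text form with a small sketch. The inline CSV below is a hypothetical miniature stand-in for the real dataset (the column names and values are assumptions made for illustration), and stringsAsFactors = TRUE is passed explicitly to mimic the pre-4.0 default on any R version.

```r
# Hypothetical miniature of the dataset: two text columns and two numeric ones
csv_text <- "Segment,Region,Sales,Profit
Consumer,South,261.96,41.91
Corporate,West,14.62,6.87
Consumer,South,957.58,-383.03"

# Mimic the pre-4.0 default: strings are converted to factors
df_old <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)
# Set the argument explicitly so the result is the same on any R version
df_new <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

classes_old <- sapply(df_old, class)
classes_new <- sapply(df_new, class)
print(classes_old)
print(classes_new)

# Factor columns slip through the class != 'character' filter
kept_old <- names(df_old)[classes_old != "character"]
kept_new <- names(df_new)[classes_new != "character"]
print(kept_old)
print(kept_new)
```

With factors, all four columns survive the filter and cor would be handed non-numeric data; with stringsAsFactors = FALSE only the numeric columns remain.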

If you do decide to use 3.6 (again, I would recommend just telling Binder to use the newer version, one way or another), please make sure to read R’s changelog to note what other things have changed and what effect this might have on your analyses.

As far as I remember, stringsAsFactors was the biggest change in R 4.0 and it broke a lot of code (though I think it was a good decision overall, see this explanation). So I think it is a good thing that Binder did not jump to R 4.0 as the default as soon as it became available. I guess it will eventually make the switch too, so it’s best not to rely on an old version being there by default and instead pin the version that suits your needs :).
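In the same spirit of not relying on defaults, one further option (a suggestion beyond the two listed above, not something tested against the original notebook) is to select columns with is.numeric instead of comparing class against 'character'. The data frame df_demo below is hypothetical and only serves to illustrate the idea.

```r
# Hypothetical data frame with one factor, one character and two numeric columns
df_demo <- data.frame(
  Segment = factor(c("Consumer", "Corporate", "Consumer", "Home Office")),
  Region = c("South", "West", "South", "East"),
  Sales = c(261.96, 14.62, 957.58, 22.37),
  Profit = c(41.91, 6.87, -383.03, 2.52),
  stringsAsFactors = FALSE
)

# is.numeric() is FALSE for both factors and character vectors
numeric_cols <- sapply(df_demo, is.numeric)
cor_mat <- round(cor(df_demo[, numeric_cols]), 3)
print(cor_mat)
```

Because the check asks for what cor and hist actually require, it should behave the same whether strings were read as factors or as characters.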


Many, many thanks @krassowski. I think it’s better for me to just use the second option. :grinning:
However, I am confused about how to do this by following the link you gave.
I wrote r-4.1-2021-07-24 in place of the default that was there previously and saved the file, but nothing seems to happen.

If you are going with the second option, I strongly encourage you to use binder-examples/r-conda as your template. I haven’t had luck updating binder-examples/r (Using R with Jupyter / RStudio on Binder) beyond 3.6 using runtime.txt, despite trying a lot. There’s something else in it that is not compatible. At least that is my experience; your mileage may vary. Things get updated very regularly, but I tried this recently before going the conda route, see here. At that time, even with conda I could only get things to go to 4.0.5, see commit note here.

In regard to your comment above, you have to fork the examples, using them as a template for your own repository, and then change the contents in your fork. Then you trigger a launch from your fork to build a new image with your specifications, and finally a session will start. Importantly, you also need to change the URL in the launch badge to match your new fork. Otherwise the launch badge URL still refers to the original you forked from and won’t launch with your changes incorporated.

Look at my recent example, here, to see roughly what I am describing. Note how I edited the URL in the launch badges to point MyBinder.org at my repo. You can hover over the launch badges in the README to see the URL they are using and see that they point MyBinder to my repository. (The fact that they are custom launch badges is moot in this case.) Note, though, that I actually used one of my own repos as a template and not the example. If I had to do it all again, I would just use binder-examples/r-conda as my template from the start, because I ended up needing conda in the end. You can see all the steps and my attempt to use R 4.1 in the commits. Maybe things have changed and it will allow 4.1 now.


Thanks @fomightez. Will surely try this out…