Jupyter and R with OpenShift

UPDATE – Visit SparkR with OpenShift for the latest r-notebook/OpenShift solution. – UPDATE

The cool kids are using Jupyter notebooks, but we are going to step it up a notch by hosting an R-enabled Jupyter notebook on OpenShift. Max Bugger, may he rest in peace, will show you how. This lab is another in the OpenShift MiniLabs series.

Objectives

Let’s demonstrate hosting a Jupyter notebook instance as a container managed by OpenShift. Pulling down a prebuilt r-notebook image means we can get started very easily without any messy local workstation configuration. Moreover, the user work directory is mapped to an external volume such that any installed packages and scripts are preserved on container restart.

Setup

Initial Attempt

This tutorial assumes you have completed the OpenShift MiniLabs installation procedure and then refreshed your environment before continuing.

Repeat Attempt

To reset your environment and repeat this tutorial, do the following:

$ cd ~/containersascode
$ ./oc-cluster-wrapper/oc-cluster up containersascode
$ oc login -u system:admin
$ oc delete persistentvolumeclaim workclaim
$ oc delete persistentvolume workvolume
$ rm -rf ~/.oc/profiles/containersascode/volumes/workvolume
$ oc login -u developer -p developer
$ oc delete project jupyter
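
If you repeat the lab often, the reset steps can be wrapped in a small script. The following is only a sketch: `run` and `reset_lab` are hypothetical helper names, the cluster is assumed to be up already, and the script defaults to a dry run that prints the commands instead of executing them.

```shell
# Print each command in dry-run mode (the default here); otherwise run it
# and carry on with a warning if it fails, e.g. because the resource is gone.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@" || echo "warning: '$*' failed, continuing" >&2
  fi
}

# Hypothetical wrapper around the reset commands listed above, in the same order.
reset_lab() {
  run oc login -u system:admin
  run oc delete persistentvolumeclaim workclaim
  run oc delete persistentvolume workvolume
  run rm -rf "$HOME/.oc/profiles/containersascode/volumes/workvolume"
  run oc login -u developer -p developer
  run oc delete project jupyter
}

reset_lab
```

Set `DRY_RUN=0` before calling `reset_lab` to actually perform the reset. Continuing after a failed command is a design choice for this sketch, so that a resource that was never created does not abort the rest of the cleanup.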

Instructions

This demonstration begins by creating a persistent volume that can later be claimed by a container instance. This step is typically done by an administrator.

Create the Persistent Volume

Replace $VOLUMEPATH below with your preferred host-path location.

$ oc login -u system:admin
$ oc get pv
$ oc create -f - << EOF!
apiVersion: v1
kind: PersistentVolume
metadata:
  name: workvolume
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Recycle
  hostPath:
    path: $VOLUMEPATH
EOF!

$ oc get pv
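
Because the heredoc delimiter above is unquoted, the shell expands `$VOLUMEPATH` before `oc` ever sees the manifest, so setting the variable first is one way to do the replacement. Here is a minimal sketch of that idea, which also lets you inspect the manifest before creating anything. The helper name `render_pv` is hypothetical, and the default path is only an example borrowed from the reset steps.

```shell
# Hypothetical helper: print the PersistentVolume manifest for a given host path.
render_pv() {
  local volume_path="$1"
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: workvolume
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Recycle
  hostPath:
    path: ${volume_path}
EOF
}

# Example location only (an assumption); any directory on the host will do.
VOLUMEPATH="${VOLUMEPATH:-$HOME/.oc/profiles/containersascode/volumes/workvolume}"

# Inspect the rendered manifest first...
render_pv "$VOLUMEPATH"

# ...then, once it looks right, pipe it to oc as system:admin:
# render_pv "$VOLUMEPATH" | oc create -f -
```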

Create Project

Let’s create a project for our new application. The Jupyter r-notebook container needs some extra privileges, so we will assign them as follows:

$ oc login -u developer -p developer
$ oc new-project jupyter --display-name='Jupyter' --description='Jupyter' 
$ oc login -u system:admin
$ oc project jupyter
$ oc adm policy add-scc-to-user anyuid -z default
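
The `-z default` flag refers to the `default` service account of the current project, which is the identity the pod runs under unless the deployment specifies another one. The extra privilege is presumably needed because the image expects to run as its own user rather than the arbitrary UID OpenShift assigns by default. As a small sketch, the hypothetical helper `sa_user` below shows the fully qualified name that `-z` is shorthand for.

```shell
# Build the fully qualified user name of a service account.
# Usage: sa_user <project> <service-account>
sa_user() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

sa_user jupyter default
# prints: system:serviceaccount:jupyter:default

# The long form of the grant above would then be:
# oc adm policy add-scc-to-user anyuid "$(sa_user jupyter default)"
```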

Create Application

You can create an OpenShift container straight from an image on Docker Hub.

$ oc login -u developer -p developer
$ oc project jupyter
$ docker pull jupyter/r-notebook 
$ oc new-app jupyter/r-notebook -l name='r-notebook' --name='r-notebook'
$ oc expose service r-notebook

Claim the Storage

The set volume command will automatically trigger a new deployment of the r-notebook container.

$ oc login -u developer -p developer 
$ oc project jupyter
$ oc set volume dc/r-notebook --add \
    --overwrite \
    --name=work \
    --type=persistentVolumeClaim \
    --mount-path=/home/jovyan/work \
    --claim-size=1Gi \
    --claim-name=workclaim \
    --containers=r-notebook
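
Before logging in, it can help to confirm that the claim was bound and that the new deployment has finished. This is a sketch only: `check_claim` is a hypothetical helper, and it assumes your `oc` client provides the `rollout status` subcommand. If `oc` is not on the PATH, it does nothing.

```shell
# Hypothetical helper: show the claim, then wait for the redeployment to finish.
check_claim() {
  if ! command -v oc >/dev/null 2>&1; then
    echo "oc not found on PATH; nothing checked" >&2
    return 0
  fi
  oc get pvc workclaim             # STATUS should read Bound
  oc rollout status dc/r-notebook  # waits until the new deployment completes
}

check_claim
```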

Log in to the r-notebook

To log in to the r-notebook instance, we need to recover the token. Copy it from the container’s ($PODID) log using the commands below, or from the Console. You can then open your shiny new Jupyter notebook at a URL such as http://r-notebook-jupyter.127.0.0.1.nip.io/?token=$TOKEN

$ oc login -u developer -p developer
$ oc project jupyter
$ oc get pods | grep r-notebook
$ oc logs -f $PODID
# Copy the token ($TOKEN) from the log output
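
If you would rather not copy the token by hand, it can be pulled out of the log text with standard tools. The sketch below assumes the log contains a URL ending in `?token=<hex string>`. The helper names `extract_token` and `notebook_url` are hypothetical, and the sample log line uses a made-up token in place of real `oc logs` output.

```shell
# Hypothetical helper: read pod logs on stdin, print the first token found.
extract_token() {
  grep -o 'token=[0-9a-f]\{1,\}' | head -n 1 | cut -d= -f2
}

# Hypothetical helper: build the notebook URL from the route host and a token.
notebook_url() {
  printf 'http://%s/?token=%s\n' "$1" "$2"
}

# Illustrative log line with a made-up token, standing in for `oc logs $PODID`.
sample_log='    http://localhost:8888/?token=0123456789abcdef'
TOKEN="$(printf '%s\n' "$sample_log" | extract_token)"
notebook_url r-notebook-jupyter.127.0.0.1.nip.io "$TOKEN"

# Against a live cluster the same helper would be fed from the pod log:
# TOKEN="$(oc logs "$PODID" | extract_token)"
```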

Verify Lab Success

Here are a couple of tests to verify that your r-notebook can save state.

Install a Package

Create a notebook and then install a new package into the work directory. Restart the container and verify that the package still appears in the installed list. Because the work directory lives on the persistent volume, you can keep adding R packages without losing them on restart.

[1]: install.packages('plyr', lib='/home/jovyan/work')
[2]: installed.packages(lib.loc='/home/jovyan/work')

Create a Folder

Now try something more creative. For example, create a new folder and rename it to “iris”. Then create a new notebook called “sample” inside the “iris” folder and paste the code below into the first cell [1]. Run that cell, then Save and Checkpoint the file; the run should finish with the graphic shown below. Restart your container and confirm that your new “iris” folder and “sample” are preserved.

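library(dplyr)

# Print the full iris data set
iris

# Mean sepal width per species, sorted in ascending order
iris %>%
  group_by(Species) %>%
  summarise(Sepal.Width.Avg = mean(Sepal.Width)) %>%
  arrange(Sepal.Width.Avg)

library(ggplot2)

# Scatter plot of sepal length against sepal width, colored by species
ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) + geom_point(size=3)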

[Figure: scatter plot of Sepal.Length against Sepal.Width, colored by Species]

Trivia

Knock yourself out at http://jupyter.org/
