Colab Jupyter Notebook Tips


Magics

Manage environment variables

# Set or inspect an environment variable
%env NUM_THREAD

Get details

%timeit?
list?

Measure execution time

# Measure an entire cell (must be the first line of the cell)
%%time

# Measure a single statement
%timeit func()
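%timeit is a thin wrapper over the standard-library timeit module, so the same measurement can be reproduced outside IPython; a minimal sketch, with a stand-in func:

```python
import timeit

def func():
    # Stand-in workload for the func() in the example above
    return sum(range(1000))

# Total wall time for 1000 calls, roughly `%timeit -n 1000 func()`
elapsed = timeit.timeit(func, number=1000)
print(f"{elapsed / 1000:.2e} s per call")
```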

Autoreload

If you edit the code of an imported module or package, you usually need to restart the notebook kernel or use reload() on the specific module. That can be quite annoying. With the autoreload magic command, modules are automatically reloaded before any of their code is executed.

%load_ext autoreload
%autoreload 2
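What autoreload automates can be done by hand with the standard importlib.reload; a sketch using a throwaway module (mymod is made up for the example):

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True  # avoid stale bytecode caches in this demo

# Create a throwaway module on disk, standing in for the package being edited
tmpdir = tempfile.mkdtemp()
mod_path = pathlib.Path(tmpdir) / "mymod.py"
mod_path.write_text("def answer():\n    return 1\n")

sys.path.insert(0, tmpdir)
import mymod
print(mymod.answer())  # 1

# Simulate editing the module's source...
mod_path.write_text("def answer():\n    return 2\n")

# ...and re-import it, which is what %autoreload 2 does automatically
importlib.invalidate_caches()
importlib.reload(mymod)
print(mymod.answer())  # 2
```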

Execute other python/notebook

# Execute the code in demo.ipynb (also works with .py files)
%run ./demo.ipynb
# Load content in demo.py to the cell
%load ./demo.py
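%run behaves much like the standard library's runpy.run_path, which executes a script in a fresh namespace and returns that namespace; a sketch (the demo.py here is a stand-in file created just for the example):

```python
import pathlib
import runpy
import tempfile

# Create a stand-in demo.py
script = pathlib.Path(tempfile.mkdtemp()) / "demo.py"
script.write_text("result = 21 * 2\n")

# Execute the script in its own namespace, similar to `%run ./demo.py`
ns = runpy.run_path(str(script))
print(ns["result"])  # 42
```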

Store data between notebooks

data = 'this is the string I want to pass to a different notebook'
%store data
del data

# Read `data` in another notebook
%store -r data # variable name must be the same
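Under the hood, %store pickles the variable into IPython's per-profile database; the same round trip can be sketched with the standard pickle module (the file path is made up for the example):

```python
import pathlib
import pickle
import tempfile

store = pathlib.Path(tempfile.mkdtemp()) / "store.pkl"

# "Store" the variable, as `%store data` does
data = "this is the string I want to pass to a different notebook"
store.write_bytes(pickle.dumps(data))
del data

# "Restore" it elsewhere, as `%store -r data` does
data = pickle.loads(store.read_bytes())
print(data)
```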

Write cell code to python file

%%writefile pythoncode.py
import numpy

def append_if_not_exists(arr, x):
    if x not in arr:
        arr.append(x)

def some_useless_slow_function():
    arr = list()
    for i in range(10000):
        x = numpy.random.randint(0, 10000)
        append_if_not_exists(arr, x)

Display file content

%pycat pythoncode.py

Profiling

# Profile run time
%prun func()

# Profile run time line by line
# !pip install line_profiler
%load_ext line_profiler
%lprun -f func func()

# Profile memory line by line (the profiled function must be defined in a file, not in the notebook)
# !pip install memory_profiler
%load_ext memory_profiler
%mprun -f func func()

# Measure the memory use of a single statement
%memit func()
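%prun is a front end for the standard cProfile module, so the same report can be produced in plain Python; here profiling the some_useless_slow_function from the %%writefile example (re-declared inline, with the standard random module in place of numpy to stay dependency-free):

```python
import cProfile
import io
import pstats
import random

def append_if_not_exists(arr, x):
    if x not in arr:
        arr.append(x)

def some_useless_slow_function():
    arr = list()
    for _ in range(10000):
        append_if_not_exists(arr, random.randint(0, 10000))

profiler = cProfile.Profile()
profiler.enable()
some_useless_slow_function()
profiler.disable()

# Print the ten most expensive calls by cumulative time, like `%prun`
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
print(buf.getvalue())
```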

Use other kernels for the cell

%%python2
%%python3
%%ruby
%%perl
%%bash
%%R

Tools

Write fast code using Cython

!pip install cython
%load_ext Cython

%%cython
def multiply_by_2(float x):
    return 2.0 * x

multiply_by_2(23.) # 46.0

Install Jupyter Contrib Extensions

pip install https://github.com/ipython-contrib/jupyter_contrib_nbextensions/tarball/master
pip install jupyter_nbextensions_configurator
jupyter contrib nbextension install --user
jupyter nbextensions_configurator enable --user

RISE: jupyter notebook presentation

conda install -c damianavila82 rise # recommended
pip install RISE # less recommended

Display media

from IPython.display import display, Image
display(Image('demo.jpg'))

Connect GitHub to Binder

Connect your GitHub repository to Binder to share executable notebooks with others.

Upload / Download Files

If you want to upload or download files via the browser, you can use Colab's Python API.

from google.colab import files
files.download('file_path')
files.upload() # will display an upload button

Use Google Drive as Data Storage

Every time you open a notebook, Google starts a new Docker container, so everything is lost once the notebook tab is closed. Since you usually want to retain your training results, the good news is that you can mount your Google Drive as an external disk on the Colab container. Even better, you can upload datasets to Google Drive and use them directly from Colab after mounting.

To mount Google Drive, execute the following script in a Colab notebook (it will ask for access permission to your Google Drive):

# Setup authentication to mount Google Drive
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse

from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

After granting permission, you can mount Google Drive with the following script:

# Execute when running on colab
# Mount Google Drive
!mkdir -p drive
!google-drive-ocamlfuse drive

import os
os.chdir('drive/Colab Notebooks')

Now your working directory is the Colab Notebooks directory in your Google Drive.

Tensorboard

Run TensorBoard with: tensorboard --logdir path/to/logdir

tf.summary.histogram('pred', output)
tf.summary.scalar('loss', loss) # add loss to scalar summary
merge_op = tf.summary.merge_all()
writer = tf.summary.FileWriter('./log', sess.graph) # write to file

for step in range(100):
    # train and get net output
    _, result = sess.run([train_op, merge_op], {tf_x: x, tf_y: y})
    writer.add_summary(result, step)

Debugging Python code using breakpoint() and pdb

%debug # launch the ipdb post-mortem debugger on the last exception
%pdb # toggle automatic debugger entry whenever an exception is raised

Syntax:

breakpoint() # Python 3.7 and later

# OR

import pdb; pdb.set_trace() # Python 3.6 and below

Python 3.7 added the built-in breakpoint() function, which does the same thing as pdb.set_trace() does in Python 3.6 and below. The debugger pauses the program at the line where you placed the breakpoint, so you can inspect state line by line, find the bug, fix it, and run the code again.

The easiest way to debug a Jupyter notebook is to use the %debug magic command. Whenever you encounter an error or exception, just open a new notebook cell, type %debug and run the cell. This will open a command line where you can test your code and inspect all variables right up to the line that threw the error.

Type "n" and hit Enter to run the next line of code (the → arrow shows your current position). Use "c" to continue until the next breakpoint, and "q" to quit the debugger and stop code execution.

Since I mostly use Python 3.7, I will only discuss breakpoint() here.

def debugger(a, b):
    breakpoint()
    result = a / b
    return result

print(debugger(5, 0))

Running this drops you into the debugger prompt. To resume execution, type c and press Enter (here the call will then raise a ZeroDivisionError, since b is 0).

Commands for debugging :

c -> continue execution until the next breakpoint
q -> quit the debugger and abort execution
n -> run the next line, stepping over function calls
s -> run the next line, stepping into function calls
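The built-in breakpoint() dispatches through sys.breakpointhook, so you can swap in your own hook, for example to record or silence breakpoints in automated runs; a small sketch:

```python
import sys

calls = []

# Custom hook: record the breakpoint instead of dropping into pdb
def recording_hook(*args, **kwargs):
    calls.append("hit")

sys.breakpointhook = recording_hook

def debugger(a, b):
    breakpoint()  # triggers recording_hook instead of pdb
    return a / b

print(debugger(10, 2))  # 5.0
sys.breakpointhook = sys.__breakpointhook__  # restore the default
```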

The IPython debugger is another great option. Import it and use set_trace() anywhere in your notebook to create one or more breakpoints. When executing a cell, it will stop at the first breakpoint and open the command line for code inspection. You can also set breakpoints in the code of imported modules, but don't forget to import the debugger there as well.

from IPython.core.debugger import set_trace
set_trace()

PixieDebugger

As a prerequisite, install PixieDust with pip and import it in its own cell:

pip install pixiedust
import pixiedust

To invoke the PixieDebugger for a specific cell, simply add the %%pixie_debugger magic at the top of the cell and run it. A detailed tutorial is available at https://medium.com/codait/the-visual-python-debugger-for-jupyter-notebooks-youve-always-wanted-761713babc62

Original article: https://j0e1in.github.io/dev_notes/ml/tools/jupyter-colab.html#use-google-drive-as-data-storage
