It is best to define data variables in separate data functions, so that they can be retrieved and reused from different test functions:
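For illustration, here is a sketch of what these data functions could end up looking like once converted; the concrete values 1 and 2 are assumptions chosen so that they match the assertion in the test below:

# Sketch of the generated test data functions (values are assumed so that a + b == 3,
# matching the assertion in test_add_function below).
def test_first_data():
    a = 1
    return a

def test_second_data():
    b = 2
    return b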
Then we can define our test function. This function uses the variables defined in our previous data functions:
And we can call add_function in our test function:
Let’s look at the resulting implementation of our test function:
def test_add_function():
    a = test_first_data()
    b = test_second_data()
    c = add_function (a, b)
    assert c==3
If we try to define the same variable in another data function, we will get an error:
def test_tutorial_pipeline (test=False, load=True, save=True, result_file_name="test_tutorial_pipeline"):
    # load result
    result_file_name += '.pk'
    path_variables = Path ("test_tutorial") / result_file_name
    if load and path_variables.exists():
        result = joblib.load (path_variables)
        return result
    b, a = data ()
    c = add_function (b, a)
    print_result (c)
    # save result
    result = Bunch (c=c,b=b,a=a)
    if save:
        path_variables.parent.mkdir (parents=True, exist_ok=True)
        joblib.dump (result, path_variables)
    return result
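As a side note, here is a hedged sketch (not part of the generated module) of how the pipeline's load/save behaviour could be exercised, assuming data, add_function and print_result are defined in the notebook and joblib, Path and Bunch are imported as below:

# First call: compute the result and save it to test_tutorial/test_tutorial_pipeline.pk
result = test_tutorial_pipeline(load=False, save=True)
# Second call: the saved file now exists, so the result is loaded from disk instead of being recomputed
cached = test_tutorial_pipeline(load=True, save=False)
assert cached.c == result.c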
from sklearn.utils import Bunch
from pathlib import Path
import joblib
import pandas as pd
import numpy as np
def test_test_tutorial_pipeline (test=True, prev_result=None, result_file_name="test_tutorial_pipeline"):
    result = test_tutorial_pipeline (test=test, load=True, save=True, result_file_name=result_file_name)
    if prev_result is None:
        prev_result = test_tutorial_pipeline (test=test, load=True, save=True, result_file_name=f"test_{result_file_name}")
    for k in prev_result:
        assert k in result
        if type(prev_result[k]) is pd.DataFrame:
            pd.testing.assert_frame_equal (result[k], prev_result[k])
        elif type(prev_result[k]) is np.ndarray:
            np.testing.assert_array_equal (result[k], prev_result[k])
        else:
            assert result[k]==prev_result[k]
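To see the comparison in action, the generated test can also be called directly; this is just a sketch, since in a real project it would typically be collected by a test runner:

# Runs the pipeline (loading the stored result if it exists) and compares it
# against the previously stored test result, raising an AssertionError on mismatch.
test_test_tutorial_pipeline()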
['Untitled.ipynb',
'.ipynb_checkpoints',
'test_tutorial',
'debugging.ipynb',
'test_tutorial.ipynb']
We see that there is a new folder called test_tutorial. Let's look at its contents:
['test_tutorial_pipeline.pk',
'test_test_tutorial_pipeline.pk',
'test_add_function.pk',
'test_add.pk']
Two of these pickle files relate to our pipeline: test_tutorial_pipeline.pk stores the result of running the test_tutorial_pipeline function, and test_test_tutorial_pipeline.pk stores the result of testing that pipeline.
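We can also peek at what is stored in the pipeline's pickle file; this is a sketch, and the keys simply follow from the Bunch saved by the generated pipeline:

# The stored object is the Bunch saved by test_tutorial_pipeline, holding c, b and a.
stored = joblib.load("test_tutorial/test_tutorial_pipeline.pk")
list(stored.keys())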
If the results of the test run are not the same as the results from the previously run pipeline, the test fails. We can check this by storing different results for the pipeline:
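What follows is a hypothetical sketch of this step (the replacement values are made up); note that joblib.dump returns the list of files it wrote, which is the kind of output shown below:

# Keep a copy of the original result so we can restore it later,
# then overwrite the stored file with deliberately wrong values.
path = Path("test_tutorial") / "test_tutorial_pipeline.pk"
original_result = joblib.load(path)
joblib.dump(Bunch(c=-1, b=-1, a=-1), path)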
['test_tutorial/test_tutorial_pipeline.pk']
Now we change it back, to see that the test passes:
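Restoring the stored result could then look like this (again a sketch, reusing path and original_result from the sketch above):

# Put the original result back so the stored and recomputed results match again.
joblib.dump(original_result, path)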
['test_tutorial/test_tutorial_pipeline.pk']
Let's revisit the first example, but this time we don't add the function print_result. As a result, add_function won't have any output, since there is no other function in the notebook using its result.
could not remove c
could not remove a
could not remove b
Before trying to test a previous function, we need to ensure that its output is the required one:
def add_function(b, a):
    c = a+b
As we can see, add_function still doesn't return anything, because there are no other functions depending on it. The way to create such a dependency is to use any of its created variables in another cell function. Since we won't be needing such a function for the time being, we can just manually add this dependency with the add_to_signature magic:
Now add_function has the required output:
def add_function(b, a):
    c = a+b
    return c
Now we can finally add our test function:
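As a rough sketch, the body of such a test cell might contain something like the following (the concrete values are illustrative, and the cell would be marked as a test in the same way as the earlier test cells):

# Illustrative test body: add_function now returns c, so we can assert on it.
c = add_function(1, 2)
assert c == 3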
To avoid loading pre-existing results, we can set the flag override to True. By doing so, the global load flag is overridden with False, unless we explicitly pass --load in the command line.
We can also set the global load flag to False:
changing global load flag to False