This is the second post in the series TDD in Python with pytest where I develop a simple project following a strict TDD methodology. The posts come from my book Clean Architectures in Python and have been reviewed to get rid of some bad naming choices of the version published in the book.

You can find the first post here.

Step 7 - Division

The requirements state that there shall be a division function, and that it has to return a float value. This is a simple condition to test, as it is sufficient to divide two numbers that do not give an integer result

tests/test_main.py
def test_div_two_numbers_float():
    calculator = SimpleCalculator()

    result = calculator.div(13, 2)

    assert result == 6.5

The test suite fails with the usual error that signals a missing method. The implementation of this function is very simple as the operator / in Python performs a float division

simple_calculator/main.py
class SimpleCalculator:
    [...]

    def div(self, a, b):
        return a / b

Git tag: step-7-float-division
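As a quick aside, the difference between the two division operators of Python can be checked directly in the interpreter. The snippet below is only an illustration of the language behaviour and is not part of the project code.

```python
# True division always returns a float, even when the result is whole.
true_division = 13 / 2      # 6.5
whole_result = 12 / 2       # 6.0, still a float

# Floor division rounds down and keeps integers as integers.
floor_division = 13 // 2    # 6

print(true_division, whole_result, floor_division)
```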

If you run the test suite again all the tests should pass. There is a second requirement about this operation, however, that states that division by zero shall return inf.

I already mentioned in the previous post that this is not a good requirement, and please don't go around telling people that I told you to create functions that return either floats or strings. This is a simple requirement that I will use to show you how to deal with exceptions.

The test that comes from the requirement is simple

tests/test_main.py
def test_div_by_zero_returns_inf():
    calculator = SimpleCalculator()

    result = calculator.div(5, 0)

    assert result == float('inf')

And the test suite fails now with this message

__________________________ test_div_by_zero_returns_inf ___________________________

    def test_div_by_zero_returns_inf():
        calculator = SimpleCalculator()

>       result = calculator.div(5, 0)

tests/test_main.py:70:  
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <simple_calculator.main.SimpleCalculator object at 0x7f0b0b733990>, a = 5, b = 0

    def div(self, a, b):
>       return a / b
E       ZeroDivisionError: division by zero

simple_calculator/main.py:17: ZeroDivisionError

Note that when an exception happens in the code and not in the test, the pytest output changes slightly. The first part of the message shows where the test fails, but then there is a second part that shows the internal code that raised the exception and provides information about the value of local variables on its first line (the one that starts with self =).

We might implement two different solutions to satisfy this requirement and its test. The first one is to prevent b from being 0

simple_calculator/main.py
    def div(self, a, b):
        if not b:
            return float('inf')

        return a / b

and the second one is to intercept the exception with a try/except block

simple_calculator/main.py
    def div(self, a, b):
        try:
            return a / b
        except ZeroDivisionError:
            return float('inf')

Both solutions make the test suite pass, so both are correct. I leave to you the decision about which is the best one, syntactically speaking.
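To compare the two approaches side by side, here is a minimal sketch that rewrites them as standalone functions. The names div_check_first and div_catch are made up for this example and do not appear in the project.

```python
def div_check_first(a, b):
    # Test the divisor before dividing.
    if not b:
        return float('inf')
    return a / b


def div_catch(a, b):
    # Divide and handle the failure only if it happens.
    try:
        return a / b
    except ZeroDivisionError:
        return float('inf')


for div in (div_check_first, div_catch):
    assert div(13, 2) == 6.5
    assert div(5, 0) == float('inf')
    assert div(5, 0.0) == float('inf')
```

For plain numbers the two versions behave the same. They differ on unusual inputs: div_check_first(5, None) returns inf because None is a false value, while div_catch(5, None) raises a TypeError.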

Git tag: step-7-float-division

Step 8 - Testing exceptions

A further requirement is that multiplication by zero must raise a ValueError exception. This means that we need a way to test if our code raises an exception, which is the opposite of what we did until now. In the previous tests, the condition to pass was that there was no exception in the code, while in this test the condition will be that an exception has been raised.

Again, this is a requirement I made up just for the sake of showing you how to deal with exceptions, so if you think this is a silly behaviour for a multiplication function you are probably right.

Pytest provides a context manager named raises that runs the code contained in it and passes only if the given exception is produced by that code.

tests/test_main.py
import pytest

[...]

def test_mul_by_zero_raises_exception():
    calculator = SimpleCalculator()

    with pytest.raises(ValueError):
        calculator.mul(3, 0)

In this case, thus, pytest runs the line calculator.mul(3, 0). If the method doesn't raise the exception ValueError the test will fail. Indeed, if you run the test suite now, you will get the following failure

________________________ test_mul_by_zero_raises_exception ________________________

    def test_mul_by_zero_raises_exception():
        calculator = SimpleCalculator()

        with pytest.raises(ValueError):
>           calculator.mul(3, 0)
E           Failed: DID NOT RAISE <class 'ValueError'>

tests/test_main.py:81: Failed

which signals that the code didn't raise the expected exception.
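If you are curious about how a context manager can turn "an exception was raised" into a passing condition, the following toy class sketches the idea. It is not the implementation used by pytest, and the name expect_raises is invented for this example.

```python
class expect_raises:
    """Toy stand-in for the idea behind pytest.raises (not pytest's real code)."""

    def __init__(self, expected):
        self.expected = expected

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            # The block finished without raising anything: that is a failure.
            raise AssertionError(f"DID NOT RAISE {self.expected!r}")
        # Returning True swallows the expected exception; returning False
        # lets any other exception propagate.
        return issubclass(exc_type, self.expected)


with expect_raises(ValueError):
    int("not a number")  # raises ValueError, so the block passes
```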

The code that makes the test pass needs to test if one of the inputs of the function mul is 0. This can be done with the help of the built-in function all, which accepts an iterable and returns True only if all the values contained in it are true. Since in Python 0 is a false value, we may write

simple_calculator/main.py
    def mul(self, *args):
        if not all(args):
            raise ValueError
        return reduce(lambda x, y: x*y, args)

and make the test suite pass. The condition checks that there are no false values in the tuple args, that is, there are no zeros.
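The behaviour of all can be explored on its own with a few assertions. The sketch below uses a standalone copy of mul (a plain function instead of a method) so that it can be run outside the project.

```python
from functools import reduce


def mul(*args):
    # Standalone copy of the method body, for experimenting.
    if not all(args):
        raise ValueError
    return reduce(lambda x, y: x * y, args)


assert all((3, 4, 5)) is True
assert all((3, 0, 5)) is False    # a single zero makes all() return False
assert all((3, 0.0)) is False     # the float 0.0 is a false value too
assert mul(3, 4, 5) == 60

try:
    mul(3, 0)
except ValueError:
    print("mul(3, 0) raised ValueError")
```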

Git tag: step-8-multiply-by-zero

Step 9 - A more complex set of requirements

Until now the requirements were pretty simple, and it was easy to map each of them directly into tests. It's time to try to tackle a more complex problem. The remaining requirements say that the class has to provide a function to compute the average of an iterable, and that this function shall accept two optional upper and lower thresholds to remove outliers.

Let's break these two requirements into a set of simpler ones

  1. The function accepts an iterable and computes the average, i.e. avg([2, 5, 12, 98]) == 29.25
  2. The function accepts an optional upper threshold. It must remove all the values that are greater than the threshold before computing the average, i.e. avg([2, 5, 12, 98], ut=90) == avg([2, 5, 12])
  3. The function accepts an optional lower threshold. It must remove all the values that are less than the threshold before computing the average, i.e. avg([2, 5, 12, 98], lt=10) == avg([12, 98])
  4. The upper threshold is not included when removing data, i.e. avg([2, 5, 12, 98], ut=12) == avg([2, 5, 12])
  5. The lower threshold is not included when removing data, i.e. avg([2, 5, 12, 98], lt=5) == avg([5, 12, 98])
  6. The function works with an empty list, returning 0, i.e. avg([]) == 0
  7. The function works if the list is empty after outlier removal, i.e. avg([12, 98], lt=15, ut=90) == 0
  8. The function outlier removal works if the list is empty, i.e. avg([], lt=15, ut=90) == 0

As you can see, a requirement can produce multiple tests. Some of these are clearly expressed by the requirement (numbers 1, 2, 3), some of these are choices that we make (numbers 4, 5, 6) and can be discussed, some are boundary cases that we have to discover thinking about the problem (numbers 6, 7, 8).

There is a fourth category of tests, which are the ones that come from bugs that you discover. We will discuss those later in this chapter.

Now, if you followed the posts coding along, it is time to try to tackle a problem on your own. Why don't you try to go on and implement these features? Each of the eight requirements can be directly mapped into a test, and you know how to write tests and code that passes them. The next steps show my personal solution, which is just one of the possible ones, so you can compare what you did with what I came up with to solve the tests.
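If you want a quick way to check your own attempt, the eight requirements can also be written down as data. This is only a sketch: the names AVG_CASES and failing_requirements are invented here, the parameter names lt and ut are the ones used in the list above, and the helper uses math.isclose to compare floats with a small tolerance.

```python
import math

# (positional args, keyword args, expected result), one entry per requirement.
AVG_CASES = [
    (([2, 5, 12, 98],), {}, 29.25),
    (([2, 5, 12, 98],), {"ut": 90}, 19 / 3),
    (([2, 5, 12, 98],), {"lt": 10}, 55),
    (([2, 5, 12, 98],), {"ut": 12}, 19 / 3),
    (([2, 5, 12, 98],), {"lt": 5}, 115 / 3),
    (([],), {}, 0),
    (([12, 98],), {"lt": 15, "ut": 90}, 0),
    (([],), {"lt": 15, "ut": 90}, 0),
]


def failing_requirements(avg):
    """Return the numbers of the requirements that `avg` does not satisfy."""
    failures = []
    for number, (args, kwargs, expected) in enumerate(AVG_CASES, start=1):
        try:
            result = avg(*args, **kwargs)
        except Exception:
            # Any exception counts as a failed requirement.
            failures.append(number)
            continue
        if not math.isclose(result, expected, abs_tol=1e-9):
            failures.append(number)
    return failures
```

Passing your own function or bound method to failing_requirements returns an empty list when all eight requirements are satisfied.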

Step 9.1 - Average of an iterable

Let's start adding a test for requirement number 1

tests/test_main.py
def test_avg_correct_average():
    calculator = SimpleCalculator()

    result = calculator.avg([2, 5, 12, 98])

    assert result == 29.25

We feed the function avg a list of generic numbers, whose average we calculated with an external tool. The first run of the test suite fails with the usual complaint about a missing function, and we can make the test pass with a simple use of sum and len, as both built-in functions work on lists

simple_calculator/main.py
class SimpleCalculator:
    [...]

    def avg(self, it):
        return sum(it)/len(it)

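Here, it stands for iterable, as this function is meant to work with anything that supports the loop protocol. Strictly speaking, this first version needs an object that also supports len, such as a list or a tuple.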

Git tag: step-9-1-average-of-an-iterable

Step 9.2 - Upper threshold

The second requirement mentions an upper threshold, but we are free with regard to the API, i.e. the requirement doesn't specify how the threshold is supposed to be specified or named. I decided to call the upper threshold parameter ut, so the test becomes

tests/test_main.py
def test_avg_removes_upper_outliers():
    calculator = SimpleCalculator()

    result = calculator.avg([2, 5, 12, 98], ut=90)

    assert result == pytest.approx(6.333333)

As you can see the parameter ut=90 is supposed to remove the element 98 from the list and then compute the average of the remaining elements. Since the result has an infinite number of digits I used the function pytest.approx to check the result.
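The standard library offers a similar tolerance-based comparison, math.isclose, which helps to see why an exact equality would not work here. This is only an illustration of the idea: pytest.approx has its own defaults and a richer interface, and the tolerance 1e-6 below is picked just for this example.

```python
import math

result = (2 + 5 + 12) / 3

# The exact comparison fails: the float carries many more digits than 6.333333.
assert result != 6.333333

# A tolerance-based comparison accepts the value.
assert math.isclose(result, 6.333333, rel_tol=1e-6)
```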

The test suite fails because the function avg doesn't accept the parameter ut

_________________________ test_avg_removes_upper_outliers _________________________

    def test_avg_removes_upper_outliers():
        calculator = SimpleCalculator()

>       result = calculator.avg([2, 5, 12, 98], ut=90)
E       TypeError: avg() got an unexpected keyword argument 'ut'

tests/test_main.py:95: TypeError

There are two problems now that we have to solve, as it happened for the second test we wrote in this project. The new ut argument needs a default value, so we have to manage that case, and then we have to make the upper threshold work. My solution is

simple_calculator/main.py
    def avg(self, it, ut=None):
        if not ut:
            ut = max(it)

        _it = [x for x in it if x <= ut]

        return sum(_it)/len(_it)

The idea here is that ut is used to filter the iterable keeping all the elements that are less than or equal to the threshold. This means that the default value for the threshold has to be neutral with regard to this filtering operation. Using the maximum value of the iterable makes the whole algorithm work in every case, while for example using a big fixed value like 9999 would introduce a bug, as one of the elements of the iterable might be bigger than that value.
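The problem with a fixed default can be shown with a short experiment. Both functions below are standalone sketches written for this comparison; avg_fixed_default is a hypothetical variant that is not part of the project.

```python
def avg_fixed_default(it, ut=9999):
    # Hypothetical variant: a large constant as the default threshold.
    kept = [x for x in it if x <= ut]
    return sum(kept) / len(kept)


def avg_max_default(it, ut=None):
    # Same logic as the method above: the default is the largest element.
    if not ut:
        ut = max(it)
    kept = [x for x in it if x <= ut]
    return sum(kept) / len(kept)


data = [1, 10000]
print(avg_fixed_default(data))  # 10000 is silently dropped
print(avg_max_default(data))    # both values are kept
```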

Git tag: step-9-2-upper-threshold

Step 9.3 - Lower threshold

The lower threshold is the mirror of the upper threshold, so it doesn't require many explanations. The test is

tests/test_main.py
def test_avg_removes_lower_outliers():
    calculator = SimpleCalculator()

    result = calculator.avg([2, 5, 12, 98], lt=10)

    assert result == pytest.approx(55)

and the code of the function avg now becomes

simple_calculator/main.py
    def avg(self, it, lt=None, ut=None):
        if not lt:
            lt = min(it)

        if not ut:
            ut = max(it)

        _it = [x for x in it if x >= lt and x <= ut]

        return sum(_it)/len(_it)

Git tag: step-9-3-lower-threshold

Step 9.4 and 9.5 - Boundary inclusion

As you can see from the code of the function avg, the upper and lower thresholds are included in the comparison, so we might consider the requirements as already satisfied. TDD, however, pushes you to write a test for each requirement (as we saw it's not unusual to actually have multiple tests per requirement), and this is what we are going to do.

The reason behind this is that you might get the expected behaviour for free, like in this case, because some other code that you wrote to pass a different test provides that feature as a side effect. You don't know, however, what will happen to that code in the future, so if you don't have tests that show that all your requirements are satisfied you might lose features without knowing it.

The test for the fourth requirement is

tests/test_main.py
def test_avg_upper_threshold_is_included():
    calculator = SimpleCalculator()

    result = calculator.avg([2, 5, 12, 98], ut=98)

    assert result == 29.25

Git tag: step-9-4-upper-threshold-is-included

while the test for the fifth one is

tests/test_main.py
def test_avg_lower_threshold_is_included():
    calculator = SimpleCalculator()

    result = calculator.avg([2, 5, 12, 98], lt=2)

    assert result == 29.25

Git tag: step-9-5-lower-threshold-is-included

And, as expected, both pass without any change in the code. Do you remember rule number 5? You should ask yourself why the tests don't fail. In this case we reasoned about that before, so we can accept that the new tests don't require any code change to pass.
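To see why these two tests are worth keeping even if they pass immediately, consider a hypothetical future rewrite of avg that excludes the thresholds. The function avg_exclusive and the two helpers below are invented for this sketch.

```python
import math


def avg_exclusive(it, lt=None, ut=None):
    # Hypothetical future edit: values equal to a threshold are dropped.
    kept = list(it)
    if lt:
        kept = [x for x in kept if x > lt]
    if ut:
        kept = [x for x in kept if x < ut]
    return sum(kept) / len(kept)


def earlier_tests_pass(avg):
    # The checks of steps 9.1, 9.2 and 9.3.
    return (
        avg([2, 5, 12, 98]) == 29.25
        and math.isclose(avg([2, 5, 12, 98], ut=90), 19 / 3)
        and avg([2, 5, 12, 98], lt=10) == 55
    )


def boundary_tests_pass(avg):
    # The checks of steps 9.4 and 9.5.
    return (
        avg([2, 5, 12, 98], ut=98) == 29.25
        and avg([2, 5, 12, 98], lt=2) == 29.25
    )


print(earlier_tests_pass(avg_exclusive), boundary_tests_pass(avg_exclusive))
```

With this rewrite the first three tests keep passing, and only the boundary tests reveal that a feature was lost.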

Step 9.6 - Empty list

Requirement number 6 is something that wasn't clearly specified in the project description so we decided to return 0 as the average of an empty list. You are free to change the requirement and decide to raise an exception, for example.

The test that implements this requirement is

tests/test_main.py
def test_avg_empty_list():
    calculator = SimpleCalculator()

    result = calculator.avg([])

    assert result == 0

and the test suite fails with the following error

_______________________________ test_avg_empty_list _______________________________

    def test_avg_empty_list():
        calculator = SimpleCalculator()

>       result = calculator.avg([])

tests/test_main.py:127:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <simple_calculator.main.SimpleCalculator object at 0x7feeb7098a10>, it = [], lt = None, ut = None

    def avg(self, it, lt=None, ut=None):
        if not lt:
>           lt = min(it)
E           ValueError: min() arg is an empty sequence

simple_calculator/main.py:26: ValueError

The function min that we used to compute the default lower threshold doesn't work with an empty list, so the code raises an exception. The simplest solution is to check for the length of the iterable before computing the default thresholds

simple_calculator/main.py
    def avg(self, it, lt=None, ut=None):
        if not len(it):
            return 0

        if not lt:
            lt = min(it)

        if not ut:
            ut = max(it)

        _it = [x for x in it if x >= lt and x <= ut]

        return sum(_it)/len(_it)

Git tag: step-9-6-empty-list

As you can see the function avg is already pretty rich, but at the same time it is well structured and understandable. This obviously happens because the example is trivial, but cleaner code is definitely among the benefits of TDD.
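The failure seen in this step comes from min, which refuses an empty sequence unless it is given a fallback. The built-in functions min and max accept a keyword-only argument default for this purpose. The snippet shows that behaviour in isolation; it is not the solution adopted in this post.

```python
try:
    min([])
except ValueError as error:
    print(error)

# With a default, min and max return it instead of raising on empty input.
assert min([], default=None) is None
assert max([], default=None) is None
assert min([12, 98], default=None) == 12
```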

Step 9.7 - Empty list after applying the thresholds

The next requirement deals with the case in which the outlier removal process empties the list. The test is the following

tests/test_main.py
def test_avg_manages_empty_list_after_outlier_removal():
    calculator = SimpleCalculator()

    result = calculator.avg([12, 98], lt=15, ut=90)

    assert result == 0

and the test suite fails with a ZeroDivisionError, because the length of the iterable is now 0.

________________ test_avg_manages_empty_list_after_outlier_removal ________________

    def test_avg_manages_empty_list_after_outlier_removal():
        calculator = SimpleCalculator()

>       result = calculator.avg([12, 98], lt=15, ut=90)

tests/test_main.py:135:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <simple_calculator.main.SimpleCalculator object at 0x7f9e60c3ba90>, it = [12, 98], lt = 15, ut = 90

    def avg(self, it, lt=None, ut=None):
        if not len(it):
            return 0

        if not lt:
            lt = min(it)

        if not ut:
            ut = max(it)

        _it = [x for x in it if x >= lt and x <= ut]

>       return sum(_it)/len(_it)
E       ZeroDivisionError: division by zero

simple_calculator/main.py:36: ZeroDivisionError

The easiest solution is to introduce a new check on the length of the iterable

simple_calculator/main.py
    def avg(self, it, lt=None, ut=None):
        if not len(it):
            return 0

        if not lt:
            lt = min(it)

        if not ut:
            ut = max(it)

        _it = [x for x in it if x >= lt and x <= ut]

        if not len(_it):
            return 0

        return sum(_it)/len(_it)

And this code makes the test suite pass. As I stated before, code that makes the tests pass is considered correct, but you are always allowed to improve it. In this case I don't really like the repetition of the length check, so I might try to refactor the function to get a cleaner solution. Since I have all the tests that show that the requirements are satisfied, I am free to try to change the code of the function.

After some attempts I found this solution

simple_calculator/main.py
    def avg(self, it, lt=None, ut=None):
        _it = it[:]

        if lt:
            _it = [x for x in _it if x >= lt]

        if ut:
            _it = [x for x in _it if x <= ut]

        if not len(_it):
            return 0

        return sum(_it)/len(_it)

which looks reasonably clean, and makes the whole test suite pass.

Git tag: step-9-7-empty-list-after-thresholds
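A refactoring like this one is safe only as long as the old and the new code agree on the tested behaviour. As a minimal sketch, the two versions can be copied into standalone functions (named avg_before and avg_after just for this example) and compared on the inputs used by the tests so far.

```python
def avg_before(it, lt=None, ut=None):
    # Version with the repeated length check.
    if not len(it):
        return 0
    if not lt:
        lt = min(it)
    if not ut:
        ut = max(it)
    _it = [x for x in it if x >= lt and x <= ut]
    if not len(_it):
        return 0
    return sum(_it) / len(_it)


def avg_after(it, lt=None, ut=None):
    # Refactored version.
    _it = it[:]
    if lt:
        _it = [x for x in _it if x >= lt]
    if ut:
        _it = [x for x in _it if x <= ut]
    if not len(_it):
        return 0
    return sum(_it) / len(_it)


cases = [
    ([2, 5, 12, 98], {}),
    ([2, 5, 12, 98], {"ut": 90}),
    ([2, 5, 12, 98], {"lt": 10}),
    ([], {}),
    ([12, 98], {"lt": 15, "ut": 90}),
]

for values, thresholds in cases:
    assert avg_before(values, **thresholds) == avg_after(values, **thresholds)
```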

Step 9.8 - Empty list before applying the thresholds

The last requirement checks another boundary case, which happens when the list is empty and we specify one or both of the thresholds. This test will check that the outlier removal code doesn't assume the list contains elements.

tests/test_main.py
def test_avg_manages_empty_list_before_outlier_removal():
    calculator = SimpleCalculator()

    result = calculator.avg([], lt=15, ut=90)

    assert result == 0

This test doesn't fail. So, according to the TDD methodology, we should provide a reason why this happens and decide if we want to keep the test. The reason is that the two list comprehensions used to filter the elements work perfectly with empty lists. As for the test, it comes directly from a corner case, and it checks a behaviour which is not already covered by other tests. This makes me decide to keep the test.

Git tag: step-9-8-empty-list-before-thresholds

Step 9.9 - Zero as lower/upper threshold

This is perhaps the most important step of the whole chapter, for two reasons.

First of all, the test added in this step was added by two readers of my book about clean architectures (Faust Gertz and Michael O'Neill), and this shows a real TDD workflow. After you publish your package (or your book, in this case) someone notices a wrong behaviour in some use case. This might be a big flaw or a tiny corner case, but in any case they can come up with a test that exposes the bug, and maybe even with a patch to the code, but the most important part is the test.

Whoever discovers the bug has a clear way to show it, and you, as an author/maintainer/developer can add that test to your suite and work on the code until that passes. The rest of the test suite will block any change in the code that disrupts the behaviour you already tested. As I already stressed multiple times, we could do the same without TDD, but if we need to change a substantial amount of code there is nothing like a test suite that can guarantee we are not re-introducing bugs (also called regressions).

Second, this step shows an important part of the TDD workflow: checking corner cases. In general you should pay a lot of attention to the boundaries of a domain, and test the behaviour of the code in those cases.

This test shows that the code doesn't manage zero-valued lower thresholds correctly

tests/test_main.py
def test_avg_manages_zero_value_lower_outlier():
    calculator = SimpleCalculator()

    result = calculator.avg([-1, 0, 1], lt=0)

    assert result == 0.5

The reason is that the function avg contains a check like if lt:, which fails when lt is 0, as that is a false value. The check should be if lt is not None:, so that part of the function avg becomes

simple_calculator/main.py
        if lt is not None:
            _it = [x for x in _it if x >= lt]
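The difference between the two checks is easy to reproduce in isolation. The two functions below are reduced sketches of avg that handle only the lower threshold; their names are chosen for this example.

```python
def avg_truthiness(it, lt=None):
    # Buggy check: lt=0 is a false value, so the filter is skipped.
    kept = list(it)
    if lt:
        kept = [x for x in kept if x >= lt]
    return sum(kept) / len(kept)


def avg_identity(it, lt=None):
    # Fixed check: only a missing threshold (None) skips the filter.
    kept = list(it)
    if lt is not None:
        kept = [x for x in kept if x >= lt]
    return sum(kept) / len(kept)


print(avg_truthiness([-1, 0, 1], lt=0))  # -1 is not removed
print(avg_identity([-1, 0, 1], lt=0))    # -1 is removed
```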

It is immediately clear that the upper threshold has the same issue, so the two tests I added are

tests/test_main.py
def test_avg_manages_zero_value_lower_outlier():
    calculator = SimpleCalculator()

    result = calculator.avg([-1, 0, 1], lt=0)

    assert result == 0.5


def test_avg_manages_zero_value_upper_outlier():
    calculator = SimpleCalculator()

    result = calculator.avg([-1, 0, 1], ut=0)

    assert result == -0.5

and the final version of avg is

simple_calculator/main.py
    def avg(self, it, lt=None, ut=None):
        _it = it[:]

        if lt is not None:
            _it = [x for x in _it if x >= lt]

        if ut is not None:
            _it = [x for x in _it if x <= ut]

        if not len(_it):
            return 0

        return sum(_it)/len(_it)

Git tag: step-9-9-zero-as-lower-upper-threshold

Step 9.10 - Refactoring for generators

One of the readers of this series, Dmitry Labazkin, was following the series and noticed that the final implementation has some drawbacks, namely:

  • According to the requirements, this method should accept any iterable, but the implementation can't process generators (which are iterators and also iterables). For example, the function len() cannot be used with generators.
  • The iterable is copied, which is something we try to avoid to reduce memory usage.
  • Globally, the iterator is read 4 times, which affects performance.

These are interesting points, and he provides an implementation that solves them all. It's important to mention that the first point is closely related to requirements, so it should be represented by a unit test, while the other two are connected with performance and cannot be tested with pytest. However, any refactoring that produces code we consider better (for example from the performance point of view) can be tested by the existing tests. In other words, we can provide an alternative implementation and still make sure it works correctly.
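The limitations of generators mentioned in the first point can be verified with a few lines of plain Python. The snippet is an illustration only and does not use the calculator.

```python
numbers = (i for i in [2, 5, 12, 98])

# len() needs a sized container, which a generator is not.
try:
    len(numbers)
except TypeError as error:
    print(error)

# Slicing, used to copy the input, is not supported either.
try:
    numbers[:]
except TypeError as error:
    print(error)

# A generator can be consumed only once: the second pass finds nothing.
first_pass = sum(numbers)
second_pass = sum(numbers)
print(first_pass, second_pass)
```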

Dmitry adds a test to check that generators are supported

tests/test_main.py
def test_avg_accepts_generators():
    calculator = SimpleCalculator()
    result = calculator.avg(i for i in [2, 5, 12, 98])
    assert result == 29.25

His implementation of the function avg() passes that test and the previous ones we wrote

simple_calculator/main.py
    def avg(self, it, lt=None, ut=None):
        count = 0
        total = 0

        for number in it:
            if lt is not None and number < lt:
                continue
            if ut is not None and number > ut:
                continue
            count += 1
            total += number

        if count == 0:
            return 0

        return total / count

One might argue that this implementation is less pythonic as it doesn't use fancy list comprehensions, but again, that is a matter of style (and performance). The point about generators is correct, but if that wasn't included in the requirements we might accept either implementation. I personally believe this new implementation is much better than the previous one, as I like to keep a low memory footprint, but if we were sure the calculator is used only on small sequences the concern might be overkill.
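Since the single-pass version only relies on looping, it should work with any kind of iterable. The sketch below copies the method into a standalone function, to keep the example self-contained, and feeds it a few different iterables.

```python
def avg(it, lt=None, ut=None):
    # Standalone copy of the single-pass method above.
    count = 0
    total = 0

    for number in it:
        if lt is not None and number < lt:
            continue
        if ut is not None and number > ut:
            continue
        count += 1
        total += number

    if count == 0:
        return 0

    return total / count


assert avg(i for i in [2, 5, 12, 98]) == 29.25      # generator expression
assert avg(range(1, 100_001)) == 50000.5            # range object, never copied
assert avg(iter([]), lt=15, ut=90) == 0             # empty iterator
assert avg(map(float, ["1", "2", "3"]), ut=2) == 1.5  # lazy map object
```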

Git tag: step-9-10-refactoring-for-generators

Recap of the TDD rules

Through this very simple example we learned 6 important rules of the TDD methodology. Let us review them, now that we have some experience that can make the words meaningful

  1. Test first, code later
  2. Add the bare minimum amount of code you need to pass the tests
  3. You shouldn't have more than one failing test at a time
  4. Write code that passes the test. Then refactor it.
  5. A test should fail the first time you run it. If it doesn't, ask yourself why you are adding it.
  6. Never refactor without tests.

How many assertions?

I am frequently asked "How many assertions do you put in a test?", and I consider this question important enough to discuss it in a dedicated section. To answer this question I want to briefly go back to the nature of TDD and the role of the test suite that we run.

The whole point of automated tests is to run through a set of checkpoints that can quickly reveal that there is a problem in a specific area. Mind the words "quickly" and "specific". When I run the test suite and an error occurs I'd like to be able to understand as fast as possible where the problem lies. This doesn't (always) mean that the problem will have a quick resolution, but at least I can be immediately aware of which part of the system is misbehaving.

On the other hand, we don't want to have too many tests for the same condition. On the contrary, we want to avoid testing the same condition more than once, as tests have to be maintained. A test suite that is too fine-grained might result in too many tests failing because of the same problem in the code, which might be daunting and not very informative.

My advice is to group together assertions that can be executed after running the same setup, if they test the same process. For example, you might consider the two functions add and sub that we tested in this chapter. They require the same setup, which is to instantiate the class SimpleCalculator (a setup that they share with many other tests), but they are actually testing two different processes. A good sign of this is that you would have to name the combined test something like test_add_or_sub, and a failure in this test would require a further investigation in the test output to check which method of the class is failing.

If you have to test that a method returns positive even numbers, instead, you might consider running the method and then writing two assertions, one that checks that the number is positive, and one that checks it is even. This makes sense, as a failure in one of the two means a failure of the whole process.

As a rule of thumb, then, consider if the test is a logical AND between conditions or a logical OR. In the former case go for multiple assertions, in the latter create multiple test functions.
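The rule of thumb can be sketched with a couple of tests. Everything in this example is hypothetical: next_even_after is a made-up function that returns positive even numbers, and Calculator is a tiny stand-in for SimpleCalculator.

```python
def next_even_after(n):
    # Made-up function under test: smallest positive even number greater than n.
    candidate = max(n, 0) + 1
    return candidate if candidate % 2 == 0 else candidate + 1


class Calculator:
    # Minimal stand-in for SimpleCalculator, just for this example.
    def add(self, a, b):
        return a + b

    def sub(self, a, b):
        return a - b


def test_next_even_after_returns_positive_even_number():
    result = next_even_after(3)

    # Logical AND: both conditions describe the same behaviour.
    assert result > 0
    assert result % 2 == 0


# Logical OR: add and sub are different processes, so each gets its own test.
def test_add_two_numbers():
    assert Calculator().add(4, 5) == 9


def test_subtract_two_numbers():
    assert Calculator().sub(10, 3) == 7
```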

How to manage bugs or missing features

In this chapter we developed the project from scratch, so the challenge was to come up with a series of small tests starting from the requirements. At a certain point in the life of your project you will have a stable version in production (this expression has many definitions, but in general it means "used by someone other than you") and you will need to maintain it. This means that people will file bug reports and feature requests, and TDD gives you a clear strategy to deal with those.

From the TDD point of view both a bug and a missing feature are cases not currently covered by a test, so I will refer to them collectively as bugs, but don't forget that I'm talking about the latter as well.

The first thing you need to do is to write one or more tests that expose the bug. This way you can easily decide when the code that you wrote is correct or good enough. For example, let's assume that a user files an issue on the project SimpleCalculator saying: "The function add doesn't work with negative numbers". You should definitely try to get a concrete example from the user who wrote the issue and some information about the execution environment (as it is always possible that the problem comes from a different source, like for example an old version of a library your package relies on), but in the meanwhile you can come up with at least 3 tests: one that involves two negative numbers, one with a negative number as the first argument, and one with a negative number as the second argument.

You shouldn't write down all of them at once. Write the first test that you think might expose the issue and see if it fails. If it doesn't, discard it and write a new one. From the TDD point of view, if you don't have a failing test there is no bug, so you have to come up with at least one test that exposes the issue you are trying to solve.
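The three candidate tests for the hypothetical issue about negative numbers might look like the following sketch. The class below is a stand-in with an assumed sum-based add, defined here only to make the example runnable on its own, and the test names are invented.

```python
class SimpleCalculator:
    # Stand-in with an assumed sum-based add, so the example runs on its own.
    def add(self, *args):
        return sum(args)


def test_add_two_negative_numbers():
    assert SimpleCalculator().add(-4, -5) == -9


def test_add_negative_first_argument():
    assert SimpleCalculator().add(-4, 5) == 1


def test_add_negative_second_argument():
    assert SimpleCalculator().add(4, -5) == -1
```

None of these tests fails against the stand-in, which means that, from the TDD point of view, the issue has not been reproduced yet and more information from the user is needed.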

At this point you can move on and try to change the code. Remember that you shouldn't have more than one failing test at a time, so start doing this as soon as you discover a test case that shows there is a problem in the code.

Once you reach a point where the test suite passes without errors, stop and try to run the code in the environment where the bug was first discovered (for example sharing a branch with the user who created the ticket) and iterate the process.

The problem of types

Besides contributing to the TDD steps, Dmitry Labazkin asked some relevant questions about types, which I will summarise here. You can read his original questions in issue #11 and issue #12.

The question of type checking is thorny, and since this is an introductory series I will discuss it briefly and give some pointers. Don't get me wrong, though. As I will say later, this is one of the most important topics we can discuss in computer science.

Overall the problem Dmitry raises is that operators like addition and multiplication are valid for types other than integers (like floats) and also non-numeric ones (like strings). In Python, it is possible to multiply a string by a number and obtain a concatenation of that number of copies of the original string. At the same time, however, subtraction and division are not defined for strings, so some of the questions we can ask are:

  • can SimpleCalculator be used on non-integer numeric types?
  • can SimpleCalculator be used on non-numeric types?
  • shall we explicitly check in the code that the input values belong to a certain type?
  • shall we write tests to rule out other types?

As I said, such questions are deceptively simple, so let's tackle them step by step.

Let's assume it makes sense for our class to work with numeric types. In Python there is no way to prevent a program from calling SimpleCalculator().add("string1", "string2"), which would fail as the current implementation uses the built-in function sum, which starts from the integer 0 and therefore cannot add strings (it also explicitly rejects a string as the initial value). However, calling SimpleCalculator().mul("abc", 3) would result in "abcabcabc", as the internal implementation quietly supports strings.
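The inconsistency between the two methods can be reproduced with a small stand-in class. This is a sketch under assumptions: add is assumed to simply wrap sum, as described above, while mul is the version shown in Step 8.

```python
from functools import reduce


class SimpleCalculator:
    # Stand-in: add is assumed to wrap sum; mul matches the version in Step 8.
    def add(self, *args):
        return sum(args)

    def mul(self, *args):
        if not all(args):
            raise ValueError
        return reduce(lambda x, y: x * y, args)


calculator = SimpleCalculator()

try:
    calculator.add("string1", "string2")
except TypeError as error:
    print(error)  # sum starts from the integer 0 and cannot add a str to it

print(calculator.mul("abc", 3))
```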

Given the inconsistency, we might be tempted to rule out non-numeric types explicitly. In other words, we might want to add code to our calculator that actively checks if we are passing a non-numeric type. In that case we shall also add tests for those types, according to the TDD methodology, as no code can be added without tests.

The reason why this topic is thorny is that Python relies heavily on polymorphism, which means that it is more interested in the behaviour of an object than in its nature. In other words, an object can be considered a number because it is an instance of int or float, for example, but it could just be an instance of a class we made up that behaves like one of those types. Using the Abstract Base Classes in the module numbers is useful to check if an object is an instance of one of the types encompassed by the hierarchy (again, types such as int and float) but doesn't automatically include everything that behaves like a number. We can create a class that behaves like int without belonging to the hierarchy of numbers.
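The gap between "is a number according to the hierarchy" and "behaves like a number" can be illustrated with the module numbers of the standard library. The class Metres is made up for this sketch and supports only the two operators needed here.

```python
import numbers
from decimal import Decimal
from fractions import Fraction


class Metres:
    # Made-up class that supports + and * but is not part of the numbers hierarchy.
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return Metres(self.value + other.value)

    def __mul__(self, factor):
        return Metres(self.value * factor)


assert isinstance(3, numbers.Number)
assert isinstance(2.5, numbers.Number)
assert isinstance(Fraction(1, 3), numbers.Number)
assert isinstance(Decimal("0.1"), numbers.Number)

# Behaves like a number for + and *, yet the check rejects it.
assert not isinstance(Metres(2), numbers.Number)
assert (Metres(2) + Metres(3)).value == 5
```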

Ultimately, this is the reason why Python programmers have to remember that the operator + can be used with types like int, string, and list, but cannot be used with dictionaries. Conversely, len can be used on dictionaries and lists, but cannot be used on integers. We need to remember it, as these operators are polymorphic (there is no operator int+ or float+) but don't make sense or are not implemented for some types.

Those basic operators and functions raise an exception when the wrong type is passed, so we might be tempted to do the same and explicitly raise an exception when the wrong type is passed to SimpleCalculator. Again, the focus is on behaviour and implementation. If our implementation doesn't work with instances of certain classes an exception will occur already, and we don't need to do it explicitly. The aforementioned snippet SimpleCalculator().add("string1", "string2") would raise a TypeError because the underlying sum doesn't like strings.

In conclusion, my answers to the questions above are:

Can SimpleCalculator be used on non-integer numeric types? Probably, given the implementation is not specific to integers, but if we want to be sure we should add some tests to expose the functionality. So far, according to TDD, the class is certified to work with integers only. In this case, I might want to add some tests to show that it works with floats. But if someone feeds the class float-like objects that for some reason do not support the operator / some part of the calculator won't work, and there is no way to test all those conditions.

Can SimpleCalculator be used on non-numeric types? Yes, to a certain extent. mul can be used on sequences, for example. It is a calculator, though, so it doesn't make much sense to try to use it on non-numeric types. Users can feed the calculator any sort of non-numeric types and we cannot do anything to prevent it.

Shall we explicitly check in the code that the input values belong to a certain type? This goes against the nature of Python: if a certain function or method doesn't work with a specific type an exception will be raised.

Shall we write tests to rule out other types? Since it is basically impossible to write code that narrows the set of accepted types it is also impossible to write useful tests to check this. We can check that it doesn't work on strings, but what about other sequences? We can check it doesn't work with classes that inherit from Sequence, but what about classes that do not and behave the same?

In a dynamically typed language like Python, polymorphism and operator overloading are embedded in the language. I think the deeply polymorphic nature of Python is one of the most important aspects any user of this language should understand. It is an incredibly sharp double-edged sword, as it is at the same time extremely powerful and dangerous. "Everything is an object" might sound very simple at first, but it hides a degree of complexity that sooner or later has to be faced by those who want to be proficient with the language.

I wrote some posts that might help you to understand these topics. You can find them grouped here.

Final words

I hope you found the project entertaining and that you can now appreciate the power of TDD. The journey doesn't end here, though. In the next post I will discuss the practice of writing unit tests in depth, and then introduce you to another powerful tool: mocks.

Updates

2021-01-03: George fixed a typo, thanks!

2023-09-03: Dmitry Labazkin provided a new test for the method avg and a better implementation. He also asked relevant questions about type checking that I addressed in a new section. Thanks Dmitry!

Feedback

Feel free to reach me on Twitter if you have questions. The GitHub issues page is the best place to submit corrections.