My blog

3 January 2022

Proof of concept of behave framework compared to shunit2

by Alex Harvey


I spent some time over the new year break learning the behave integration testing framework and decided to do a little proof of concept to compare this to my own Bash plus shunit2 integration test patterns. In this post, I create a very simple CloudFormation stack and write integration tests in both frameworks to see which one seems simpler. I conclude that I still prefer Bash plus shunit2 for its simplicity.

Sample code

To test out this framework I have created a simple CloudFormation stack as follows:

Parameters:
  RetentionInDays:
    Type: Number
    Default: 1

Resources:
  LogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      RetentionInDays: 1

Outputs:
  LogGroupName:
    Value: !Ref LogGroup
    Export:
      Name: !Sub "${AWS::StackName}-LogGroup"

I am going to write tests for this in behave and then rewrite them in bash and shunit2.


What is behave

behave is a behaviour-driven development (BDD) framework. It is essentially Cucumber for Python, and it uses Cucumber’s Gherkin syntax. Gherkin is described as a naturally readable language for expressing business logic without reference to implementation detail.

Here is a sample of Gherkin:

Feature: Fight or flight

  In order to increase the ninja survival rate,
  As a ninja commander
  I want my ninjas to decide whether to take on an
  opponent based on their skill levels

  Scenario: Weaker opponent
    Given the ninja has a third level black-belt
     When attacked by a samurai
     Then the ninja should engage the opponent

  Scenario: Stronger opponent
    Given the ninja has a third level black-belt
     When attacked by Chuck Norris
     Then the ninja should run for his life

This lets us express a feature as the Agile “story” that motivated it, followed by a number of “scenarios” (test cases) that express and test the business logic.
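To make the feature/scenario/step structure concrete, here is a minimal sketch that groups the lines of the sample above into scenarios and their steps. It is an illustration only and not how behave parses feature files; the names `parse_scenarios`, `STEP_KEYWORDS` and `FEATURE` are made up for this example.

```python
# Minimal sketch: group Gherkin lines into scenarios and their steps.
# Illustration only; behave's real parser handles far more of the syntax.
from typing import Dict, List, Optional

STEP_KEYWORDS = ("Given", "When", "Then", "And", "But")


def parse_scenarios(feature_text: str) -> Dict[str, List[str]]:
    """Map each scenario name to its list of step lines."""
    scenarios: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw_line in feature_text.splitlines():
        line = raw_line.strip()
        if line.startswith("Scenario:"):
            current = line[len("Scenario:"):].strip()
            scenarios[current] = []
        elif current is not None and line.split(" ", 1)[0] in STEP_KEYWORDS:
            # Only lines that begin with a step keyword count as steps.
            scenarios[current].append(line)
    return scenarios


FEATURE = """\
Feature: Fight or flight

  Scenario: Weaker opponent
    Given the ninja has a third level black-belt
     When attacked by a samurai
     Then the ninja should engage the opponent

  Scenario: Stronger opponent
    Given the ninja has a third level black-belt
     When attacked by Chuck Norris
     Then the ninja should run for his life
"""

if __name__ == "__main__":
    for name, steps in parse_scenarios(FEATURE).items():
        print(name, "->", len(steps), "steps")
```

Each scenario ends up as an ordered list of step lines, and it is those step lines that a BDD framework has to bind to executable code.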

Installing behave

To get started, I create an empty project and add a requirements file:

▶ tree .
├── loggroup.yml
└── requirements.txt

In there I add behave and boto3:

▶ cat requirements.txt
behave
boto3

Create the virtualenv:

▶ virtualenv env
▶ . env/bin/activate

And install requirements:

▶ pip install -r requirements.txt

What to test

One of the advantages of the Gherkin syntax is that, if properly written, it is plain English and should not require additional explanation. So, I am going to test the following:

Feature: Log Group Stack

  In order to test out the behave framework,
  As a crazy tech blogger,
  I want a CloudFormation log group stack,
  so that I can test stuff in it.

  Scenario: Create Stack
    Given the stack 'test-stack1' does not exist
     When the user creates stack 'test-stack1' with template 'loggroup.yml' and parameters 'RetentionInDays=1'
     Then the stack 'test-stack1' will exist
     And the stack 'test-stack1' will have RetentionInDays of '1'

Create the feature file

So I take that text and add it in a file features/loggroup.feature:

▶ mkdir features
▶ tree .
├── features
│   └── loggroup.feature
├── loggroup.yml
└── requirements.txt

Writing steps

Now I need some steps. A key insight in understanding the behave framework (and Cucumber) is that every step line in a feature file (each Given, When, Then or And) must be implemented as a decorated step_impl function in a steps file inside the steps directory. There can be any number of steps files, each one containing Python code and named *.py. So I provide the following implementation for each step of the feature file I wrote:

import boto3
from behave import given, when, then
from typing import List, Dict, Optional

client = boto3.client("cloudformation")

@given("the stack '{stack_name}' does not exist")
def step_impl(_, stack_name: str) -> None:
    # Deleting a stack that is already gone succeeds silently.
    client.delete_stack(StackName=stack_name)
    _wait(stack_name, "stack_delete_complete")

@when("the user creates stack '{stack_name}' with template '{template}' and parameters '{params_csv}'")
def step_impl(_, stack_name: str, template: str, params_csv: str) -> None:
    client.create_stack(
        StackName=stack_name,
        TemplateBody=_template_body(template),
        Parameters=_parameters(params_csv),
    )
    _wait(stack_name, "stack_create_complete")

@then("the stack '{stack_name}' will exist")
def step_impl(_, stack_name: str) -> None:
    response = client.list_stacks()
    found = False
    for summary in response["StackSummaries"]:
        # list_stacks also reports recently deleted stacks, so skip those.
        if (summary["StackName"] == stack_name
                and summary["StackStatus"] != "DELETE_COMPLETE"):
            found = True
    assert found

@then("the stack '{stack_name}' will have RetentionInDays of '{expected_retention}'")
def step_impl(_, stack_name: str, expected_retention: str) -> None:
    log_group_name = _get_output(stack_name, "LogGroupName")
    retention = _get_retention(log_group_name)
    assert retention == expected_retention

def _wait(stack_name: str, state: str) -> None:
    # Poll every 5 seconds, giving up after 10 attempts.
    waiter = client.get_waiter(state)
    waiter.wait(
        StackName=stack_name,
        WaiterConfig={"Delay": 5, "MaxAttempts": 10}
    )

def _template_body(template: str) -> str:
    with open(template) as file_handle:
        return file_handle.read()

def _parameters(params_csv: str) -> List[Dict[str, str]]:
    # Turn "Key1=Val1,Key2=Val2" into the list format create_stack expects.
    return_val = []
    for param in params_csv.split(","):
        key, value = param.split("=")
        return_val.append({"ParameterKey": key, "ParameterValue": value})
    return return_val

def _get_output(stack_name: str, output_name: str) -> Optional[str]:
    response = client.describe_stacks(StackName=stack_name)
    for output in response["Stacks"][0]["Outputs"]:
        if output["OutputKey"] == output_name:
            return output["OutputValue"]
    return None

def _get_retention(log_group_name: str) -> Optional[str]:
    logs_client = boto3.client("logs")
    response = logs_client.describe_log_groups(
        logGroupNamePrefix=log_group_name
    )
    for log_group in response["logGroups"]:
        if log_group["logGroupName"] == log_group_name:
            return str(log_group["retentionInDays"])
    return None

Some things to pay attention to in this code:

Otherwise, I think this code is mostly self-explanatory for anyone familiar with Python and Boto3.
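The binding between step text and decorated functions can be illustrated with a small standard-library sketch. This is a simplified stand-in, not behave's implementation: behave relies on the third-party parse library for its default matcher, and it also passes a context object as the first argument, which is omitted here. The names `step`, `run_step` and `check_retention` are hypothetical.

```python
# Simplified stand-in for step matching, using only the standard library.
import re
from typing import Callable, Dict, List, Tuple

_registry: List[Tuple["re.Pattern[str]", Callable[..., object]]] = []


def step(pattern: str) -> Callable:
    """Register a function for step text, turning {name} into a capture group."""
    parts = re.split(r"\{(\w+)\}", pattern)
    # Even indexes hold literal text, odd indexes hold placeholder names.
    regex = "".join(
        f"(?P<{part}>.+?)" if index % 2 else re.escape(part)
        for index, part in enumerate(parts)
    )
    compiled = re.compile(f"^{regex}$")

    def decorator(func: Callable[..., object]) -> Callable[..., object]:
        _registry.append((compiled, func))
        return func

    return decorator


def run_step(text: str) -> object:
    """Call the first registered function whose pattern matches the text."""
    for compiled, func in _registry:
        match = compiled.match(text)
        if match:
            return func(**match.groupdict())
    raise LookupError(f"no step implementation for: {text}")


@step("the stack '{stack_name}' will have RetentionInDays of '{expected}'")
def check_retention(stack_name: str, expected: str) -> Dict[str, str]:
    # A real step would query AWS; this one just echoes what was captured.
    return {"stack_name": stack_name, "expected": expected}


if __name__ == "__main__":
    print(run_step("the stack 'test-stack1' will have RetentionInDays of '1'"))
```

The point of the sketch is that the quoted values in the feature file arrive in the function as strings, which is why the retention value is compared as a string in the steps file.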

And my project structure now looks like this:

▶ tree .
├── features
│   └── loggroup.feature
├── loggroup.yml
├── requirements.txt
└── steps

Running the tests

To run the tests:


Bash and shunit2

Now I am going to rewrite all of this in Bash and shunit2 and compare the result.

Create the test file

▶ mkdir shunit2

In there I create shunit2/ with the following content:

export AWS_DEFAULT_OUTPUT="text"

# Stack under test (same name as used in the behave feature).
stack_name="test-stack1"

_delete_stack() {
  local stack_name="$1"
  echo "Cleaning up $stack_name if it exists"
  aws cloudformation delete-stack \
    --stack-name "$stack_name"
  aws cloudformation wait stack-delete-complete \
    --stack-name "$stack_name"
}

_create_stack() {
  local stack_name="$1"
  local template="$2"
  local retention="$(cut -f2 -d= <<< "$3")"
  echo "Creating $stack_name"
  aws cloudformation create-stack \
    --stack-name "$stack_name" \
    --template-body "file://$template" \
    --parameters "ParameterKey=RetentionInDays,ParameterValue=$retention" \
    --output "json"
  aws cloudformation wait stack-create-complete \
    --stack-name "$stack_name"
}

_get_output() {
  local stack_name="$1"
  local output_name="$2"
  aws cloudformation describe-stacks \
    --stack-name "$stack_name" \
    --query \
      "Stacks[0].Outputs[?OutputKey=='$output_name'].OutputValue"
}

_get_retention() {
  local log_group_name="$1"
  aws logs describe-log-groups \
    --log-group-name-prefix "$log_group_name" \
    --query \
      "logGroups[?logGroupName=='$log_group_name'].retentionInDays"
}

oneTimeSetUp() {
  _delete_stack "$stack_name"
}

testRetention() {
  local actual_retention log_group_name
  _create_stack "$stack_name" "loggroup.yml" "RetentionInDays=1"
  log_group_name="$(_get_output "$stack_name" "LogGroupName")"
  actual_retention="$(_get_retention "$log_group_name")"
  # shunit2 expects the expected value first, then the actual value.
  assertEquals "1" "$actual_retention"
}

oneTimeTearDown() {
  _delete_stack "$stack_name"
}

. shunit2

My project structure now is:

▶ tree .
├── features
│   └── loggroup.feature
├── loggroup.yml
├── requirements.txt
├── shunit2
│   └──
└── steps

Running the tests

To run the tests:



As a programmer with equally strong knowledge of Bash and Python, I prefer the Bash and shunit2 patterns. Everything has ended up in a single file, which means fewer moving parts and code that is, to me, easier to understand and maintain. The shunit2 framework seems simpler and more flexible, and the code follows the straightforward top-to-bottom organisation that I expect.

I am sure that someone who strongly prefers Python might be drawn to the behave framework. I feel that having the human-readable Gherkin syntax is a nice idea in theory, but I doubt that anyone other than the developers and maintainers will read those tests, so having business logic expressed in English does not seem to be a benefit in practice. Instead, we would spend quite a bit of time maintaining the code that implements those English sentences in Python.

From a user experience and output point of view, I feel that behave wins a little bit. The output is cleaner and more readable and this might be a reason to choose behave.


This concludes my post comparing the behave BDD framework with the Bash and shunit2 approach that I personally prefer. I have shown how to set behave up and given a very simple example. I did not cover all of its features, but the remainder should be easy to pick up from here. I then rewrote the tests in Bash and shunit2 and found that, in my opinion, the Bash code ended up simpler.

tags: behave - shunit2