Exercise: Finding Bugs with Atheris in Python

Goal

In this exercise, you will use Atheris, a coverage-guided fuzzer for Python, to find a bug in a small date conversion utility.

Unlike ordinary unit tests, Atheris repeatedly mutates inputs and uses coverage feedback to explore new execution paths. Your job is to build a fuzz target, run it, and use the discovered failure to debug the program.

Setup

Install Atheris:

1
pip install atheris

Create a working directory for this exercise and place the files below inside it.


Starter Code

Create a file called date_utils.py:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
from datetime import datetime

def normalize_date(date_str: str) -> str:
"""
Convert a user-provided date string into YYYY-MM-DD format.

Supported input formats:
- YYYY-MM-DD
- MM/DD/YYYY
- DD-MM-YYYY
"""
if "/" in date_str:
dt = datetime.strptime(date_str, "%m/%d/%Y")
elif "-" in date_str:
parts = date_str.split("-")
if len(parts[0]) == 4:
dt = datetime.strptime(date_str, "%Y-%m-%d")
else:
dt = datetime.strptime(date_str, "%m-%d-%Y")
else:
raise ValueError("Unsupported date format")

return dt.strftime("%Y-%m-%d")

Warm-Up

Before fuzzing, manually try a few examples:

1
2
3
4
5
from date_utils import normalize_date

print(normalize_date("2024-03-01"))
print(normalize_date("03/01/2024"))
print(normalize_date("01-03-2024"))

Now try this:

1
2
3
from date_utils import normalize_date

print(normalize_date("25-12-2024"))

What will happen? Can you fix the bug?

Now let’s use Atheris to try to expose more issues.


Atheris Primer

Atheris fuzzes a function usually named TestOneInput(data: bytes).
It repeatedly generates and mutates byte strings, then feeds them to your target.

A typical Atheris fuzzer has this shape:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
import sys
import atheris

with atheris.instrument_imports():
import target_module

def TestOneInput(data):
# Parse or transform data
# Call code under test
pass

def main():
atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()

if __name__ == "__main__":
main()

In many cases, raw bytes are not directly useful.
Atheris provides FuzzedDataProvider to convert bytes into structured values such as integers, strings, and choices.


Task 1: Build a Minimal Fuzzer

Create a file called fuzz_date_utils_basic.py. You need to fill in the missing parts to make it work.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
import sys
import atheris

with atheris.instrument_imports():
import date_utils

def TestOneInput(data):

## FILL ME!

try:
## FILL ME!
except ValueError:
# Invalid inputs are expected sometimes.
return

def main():
atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()

if __name__ == "__main__":
main()

Run it like this:

1
2
mkdir -p corpus
python fuzz_date_utils_basic.py corpus/

You may notice this fuzzer spend a lot of time exploring obviously invalid strings.
We’ll fix it in the next step. See if the current version generates any interesting inputs
that crash your program.


Task 2: Write a Structured Fuzzer

The previous fuzzer is a good start, but it is not very targeted.
Now write a better fuzzer that generates structured date inputs and checks a semantic property.

Create a file called fuzz_date_utils_structured.py and implement this fuzzer.
We’ve already provided some helper functions.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
import sys
from datetime import datetime
import atheris

with atheris.instrument_imports():
import date_utils

FORMATS = [
"YMD_DASH", # YYYY-MM-DD
"MDY_SLASH", # MM/DD/YYYY
"DMY_DASH", # DD-MM-YYYY
]

def build_date_string(year, month, day, fmt):
if fmt == "YMD_DASH":
return f"{year:04d}-{month:02d}-{day:02d}"
elif fmt == "MDY_SLASH":
return f"{month:02d}/{day:02d}/{year:04d}"
elif fmt == "DMY_DASH":
return f"{day:02d}-{month:02d}-{year:04d}"
else:
raise AssertionError("Unknown format")

def TestOneInput(data):

## FILL ME!

expected = dt.strftime("%Y-%m-%d")
actual = date_utils.normalize_date(s)

assert actual == expected, (
f"Mismatch found!\n"
f"Input: {s}\n"
f"Expected: {expected}\n"
f"Actual: {actual}"
)

def main():
atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()

if __name__ == "__main__":
main()

Run it:

1
2
mkdir -p corpus_structured
python fuzz_date_utils_structured.py corpus_structured/

Task 3: Investigate the Failure

If your structured fuzzer is effective, it should eventually report a failure.

When it does:

  1. inspect the crashing input
  2. reproduce it manually
  3. explain why the property failed

Task 4: Fix the Bug

Modify date_utils.py so that it correctly supports all three documented formats:

  • YYYY-MM-DD
  • MM/DD/YYYY
  • DD-MM-YYYY

After fixing it, rerun the structured fuzzer.

Expected behavior examples

Input Output
2024-03-01 2024-03-01
03/01/2024 2024-03-01
01-03-2024 2024-03-01
25-12-2024 2024-12-25

After your fix, does the structured fuzzer still find failures?


Task 5: Apply the fuzzer to real-world python applications

Could you write a fuzzer and find real-world bugs in these popular python packages?

If you find any, create a PR and contribute to the open-source community!