Designing a fuzzer for Knowledge Graph Language

I have been writing a test suite for Knowledge Graph Language (KGL), a concise syntax for querying knowledge graphs. My test suite ensures that, given a specific input, the language execution engine returns the correct response. Specific functionalities are tested, too, such as data imports and class methods that allow someone to manipulate a knowledge graph in Python.

With that said, as with any system that accepts arbitrary text input, there is a wide surface area for errors. Thus, I decided to write a fuzzer for the test suite. A fuzzer is a type of testing engine that manipulates pre-defined inputs in ways that may cause errors. The goal is to uncover errors that the programmer's hand-written tests may not have caught. Such errors could cause bugs or, potentially, security vulnerabilities.

For example, a fuzzer could test how a system performs when random letters in an input are replaced by random Unicode characters, or by characters with different encodings – behaviours that may not have been tested explicitly.

In this blog post, I am going to walk through how I designed the fuzzer for KGL. Of note, fuzzing is an extensive subject, with a wide range of methods and practices that vary depending on what you are testing – programming languages, databases, query languages, file formats, and more. This post is not a full guide to fuzzing, but rather documentation of how I am using fuzzing for my use case.

The components

My fuzzing system has four components:

  • Seeds
  • The mutation engine
  • The query execution engine
  • The fuzzing definitions

This is a lot of jargon, but no prior knowledge of fuzzing is required to understand this guide. I discuss each of the above components, what they mean, and how they work below.

The seeds

Seeds are valid input strings that the fuzzer can then mutate into potentially invalid ones.

For example, here is a valid KGL query:

{ coffee -> is }

This query retrieves the is property of the coffee node in a knowledge graph.

I defined seeds for all different query types in KGL. This ensures that the fuzzer is able to test the full surface area of the language.
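To illustrate, a seed list might look like the following sketch. Only the first query appears in this post; the other entries are hypothetical placeholders for KGL's other query types, not the exact seeds in my test suite.

# A sketch of a seed list. Only the first query is taken from this post;
# the others are hypothetical placeholders for other query types.
SEEDS = [
    "{ coffee -> is }",  # retrieve a property of a node
    "{ coffee }",  # hypothetical: retrieve a node
    "{ coffee -> is -> is }",  # hypothetical: a chained traversal
]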

The fuzzer

The fuzzer itself takes a seed and manipulates it according to some criteria. One way to think about a fuzzer is that it replaces characters with different characters according to rules.

Before I defined my fuzzer, I defined a set of characters that could replace characters in the seeds. I did so with the following code:

import random
import string

import emoji
from nltk.corpus import stopwords

# NLTK provides a stopwords list for each supported language.
supported_languages = stopwords.fileids()

# Map each language to its stopword list, so whole words can be
# substituted into a seed.
character_ranges = {
    file_id: list(stopwords.words(file_id)) for file_id in supported_languages
}

character_ranges["alphanumeric"] = list(string.ascii_letters + string.digits)
# chr() accepts code points from 0 to 0x10FFFF inclusive.
character_ranges["unicode"] = [chr(i) for i in range(0x110000)]
character_ranges["numbers"] = [str(random.randint(1, 10_000_000)) for _ in range(1000)]
character_ranges["long_numbers"] = [
    str(random.randint(10_000_000_000_000, 10_000_000_000_000_000)) for _ in range(1000)
]
character_ranges["emojis"] = list(emoji.EMOJI_DATA.keys())

Here, I define a dictionary that contains ranges of characters. The dictionary contains:

  • All alphanumeric characters.
  • All words in all stopwords lists from NLTK. This contains words in over a dozen languages.
  • All possible Unicode code points.
  • 1,000 random numbers in the range of 1 to 10 million.
  • 1,000 random numbers in the range of 10 trillion to 10 quadrillion.
  • All emojis.

While there is some overlap between the above lists – all emojis are Unicode characters – I preferred to define explicit lists for clarity.

With these character lists ready, I defined a mutation function. This function takes a string, replaces characters with ones drawn from a given character range, then returns the new string. Each character is mutated at random according to a chance function, change(), which encodes my preference for how often the fuzzer should mutate a character.

Here is how the mutation and change functions are coded:

CHANGE_RATE = 0.1

def change():
    # Return True with probability CHANGE_RATE, deciding whether a
    # given character should be mutated.
    return (
        random.choices(
            population=[["do not change"], ["change"]],
            weights=[1 - CHANGE_RATE, CHANGE_RATE],
            k=1,
        )[0][0]
        == "change"
    )

def mutate(
    seed, characters_to_skip=["{", "}", "-", ">", "<"], character_range="unicode"
):
    # Convert to a list so individual characters can be replaced.
    seed = list(seed)

    for i in range(len(seed)):
        if change() and seed[i] not in characters_to_skip:
            seed[i] = random.choice(character_ranges[character_range])

    return "".join(seed)

I decided that the fuzzer should have a 10% chance of changing any given character in a string. This is encoded in the CHANGE_RATE variable.
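For the 16-character seed above, that works out to an average of about 1.6 changed characters per run – slightly fewer when the syntactic characters are skipped.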

The mutate() function accepts a seed, a list of characters to skip, and a character range.

The list of characters to skip contains characters that should not change; the default can be overridden. I added this because, for most tests, I want to preserve the characters with syntactic meaning so that I can ensure the fuzzer tests different language features. With that said, this implementation is naive. If I were testing a programming language, I might want to first parse a query and then manipulate the resulting Abstract Syntax Tree (AST). This would allow me to manipulate only specific strings without changing strings with syntactic meaning.

The character range defines what characters can be used if the change() function decides that any given character in a string should be changed.
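For example, here is how mutate() could be called with one of the other ranges defined earlier (the output shown is illustrative; results vary on every run):

# Use the emojis range instead of the default Unicode range.
mutate("{ coffee -> is }", character_range="emojis")
# e.g. '{ c☕ffee -> 🚀s }'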

Let’s walk through an example of a test generated by the code above.

Here is an input seed:

{ coffee -> is }

Here are a few fuzzed variations of the input seed from different tests:

{ coffee -> i棄 }
{ 瞛o𦋢fee -> is }
{ coffee\U000d3580-> is }
{ coffeуҳа ->шояд киis }
{ woffeeW-> is }

Several characters in all of the above examples have been manipulated. The characters with syntactic meaning are untouched. In production, I generate tests that both do and do not manipulate characters with syntactic meaning.
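Generating the latter is a matter of passing an empty skip list, as in this sketch:

# Allow every character, including syntax, to be mutated.
mutate("{ coffee -> is }", characters_to_skip=[])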

Here are some examples where characters with syntactic meaning are manipulated:

{ coffee diye> iya }
{ cdiyeffee -siz is }
{hcQQfee -> is X

Query execution

I now had a fuzzer ready for use! The next step was to define how queries would be run with my language, and what would constitute a passed or a failed test.

I came up with the following logic:

  1. Any test that raises an UnexpectedCharacters or ValueError is a pass because it means that the language successfully caught malformed input (i.e. a query cannot be parsed into a valid Abstract Syntax Tree);
  2. Any test that raises any other Exception is a fail because it means that an exception for which I had not accounted was raised, and;
  3. Any test that raises no exception is a pass because it means that the language successfully executed the query.

The first statement is true only because UnexpectedCharacters and ValueError are used internally to describe particular errors. If either is raised, it means the language execution engine noticed something was not as expected, which is the intended behaviour. But if another error is raised, it means there is a bug.

Here is my function for running queries:

import lark

def execute_query(query):
    # kg is a knowledge graph instantiated once at the start of fuzzing.
    try:
        kg.evaluate(query)
    except (lark.exceptions.UnexpectedCharacters, ValueError):
        # The program has successfully detected an invalid input.
        return "pass"
    except Exception:
        # An unknown error has been raised, which indicates a bug.
        return "fail"

    # The query executed without an exception, which is also a pass.
    return "pass"

I instantiate a single knowledge graph at the start of fuzzing and use it throughout. This is appropriate because the language only supports querying. With that said, if the language supported any mutation operations (e.g. deleting a node), I would instantiate a new knowledge graph for every test.
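For context, the setup looks roughly like the sketch below. Only kg.evaluate() appears in this post; the constructor and loading step shown are assumptions for illustration, not KGL's documented API.

# Hypothetical one-time setup; only evaluate() is shown in this post.
kg = KnowledgeGraph()  # hypothetical constructor
kg.load("graph.kgl")  # hypothetical loading step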

Test definition

We now have seeds, a fuzzer, a query execution engine, and a definition as to what constitutes a passed and a failed test. There is one step left: to define tests.

I defined several test cases, including:

  • Fuzz all input seeds without changing characters with syntactic meaning using Unicode characters;
  • Fuzz all input seeds with permission to change characters with syntactic meaning using Unicode characters, and;
  • Fuzz all input seeds with permission to change characters with syntactic meaning using all character_ranges (words in various languages, Unicode characters, numbers of various sizes).

For each of the above cases, 100 tests are generated. In total, thousands of tests run, in large part because the final case repeats for every range in the character_ranges dictionary.

The tests are run, and a record is kept of how many tests have passed and failed. A pass rate is calculated and returned.
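Putting it all together, the test loop looks roughly like the following sketch. The SEEDS list is the hypothetical one from earlier, and the case structure is my reading of the list above; the bookkeeping mirrors the pass rate calculation described in this section.

TESTS_PER_CASE = 100

results = {"pass": 0, "fail": 0}

for seed in SEEDS:
    for _ in range(TESTS_PER_CASE):
        # Case 1: Unicode mutations that preserve syntactic characters.
        results[execute_query(mutate(seed))] += 1
        # Case 2: Unicode mutations that may change syntactic characters.
        results[execute_query(mutate(seed, characters_to_skip=[]))] += 1
        # Case 3: mutations drawn from every defined character range.
        for character_range in character_ranges:
            fuzzed = mutate(seed, characters_to_skip=[], character_range=character_range)
            results[execute_query(fuzzed)] += 1

pass_rate = results["pass"] / (results["pass"] + results["fail"])
print(f"Pass rate: {pass_rate:.2%}")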

Conclusion

The Knowledge Graph Language fuzzer is open source and is part of the main language test suite. It can be run locally with pytest, and runs in the GitHub Actions workflow triggered on each commit to the repository.

Before I designed the fuzzer, I had done limited testing on data with different Unicode characters. With the fuzzer, I can now validate how queries execute when characters from many different languages are used. I will need to investigate further to see whether I can test characters in different encodings, too.

This project was a pleasure to build. I am excited to learn more about fuzzers as I learn more about testing methodologies.
