Graph-based Network Forensics

We introduce the Granef toolkit, which provides a new approach to exploratory network data analysis based on associations stored in a graph database. This representation mirrors the way people naturally perceive data.

Toolkit Design

The Granef toolkit combines various operations representing the processing of network traffic captures to enable graph-based analysis in a web interface. These operations are implemented as Docker or Podman container modules whose interconnection and sequential execution are controlled by the Granef control script.

The core of the toolkit is a data handling module built on the Dgraph graph database, which stores the transformed network traffic captures. In the default configuration, the data transformation is performed by a custom module that converts Zeek logs, extracted from the input PCAP file, into RDF triples. The data analysis operation consists of an API built on top of the Dgraph Query Language (DQL) and a custom web interface supporting graph-based exploratory data analysis.
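To illustrate the transformation step, the following minimal Python sketch converts one Zeek conn.log record (JSON format) into RDF triples in the N-Quad syntax that Dgraph can load. The field names uid, id.orig_h, id.resp_h, id.resp_p, and proto are standard Zeek conn.log fields, but the predicate names (host.ip, host.originated, host.responded, connection.uid, and so on) and the helper functions are assumptions made for illustration and may differ from what the actual Granef transformation module produces.

```python
import json
import re


def blank_node(prefix, value):
    """Build a blank-node label, replacing characters that are unsafe in labels."""
    return f"_:{prefix}_" + re.sub(r"[^A-Za-z0-9]", "_", str(value))


def literal(value):
    """Render a value as a quoted and escaped RDF string literal."""
    return json.dumps(str(value))


def conn_to_triples(record):
    """Convert one Zeek conn.log record (a dict) into a list of N-Quad lines.

    All predicate names below are hypothetical, not the real Granef schema.
    """
    conn = blank_node("conn", record["uid"])
    orig = blank_node("host", record["id.orig_h"])
    resp = blank_node("host", record["id.resp_h"])

    triples = []
    # One Host node per IP address seen in the connection.
    for node, ip in ((orig, record["id.orig_h"]), (resp, record["id.resp_h"])):
        triples.append(f'{node} <dgraph.type> "Host" .')
        triples.append(f"{node} <host.ip> {literal(ip)} .")

    # One Connection node carrying the connection attributes.
    triples.append(f'{conn} <dgraph.type> "Connection" .')
    triples.append(f"{conn} <connection.uid> {literal(record['uid'])} .")
    for field, predicate in (("proto", "connection.proto"),
                             ("id.resp_p", "connection.resp_p")):
        if field in record:
            triples.append(f"{conn} <{predicate}> {literal(record[field])} .")

    # Directed edges from both hosts to the connection.
    triples.append(f"{orig} <host.originated> {conn} .")
    triples.append(f"{resp} <host.responded> {conn} .")
    return triples


if __name__ == "__main__":
    sample = {"uid": "CHhAvVGS1DHFjwGM9", "id.orig_h": "10.0.0.1",
              "id.resp_h": "10.0.0.2", "id.resp_p": 80, "proto": "tcp"}
    print("\n".join(conn_to_triples(sample)))
```

Using the same blank-node label for every occurrence of an IP address within one load lets the database resolve repeated mentions of a host to a single node, which is what makes the association-based exploration possible.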

The following diagram summarizes the Granef operations and the modules that implement them.

[Diagram: Input → Extraction (extraction modules) → Transformation (transformation modules) → Indexing and Handling (data handling module) → Analysis (API module, web module)]

Database Schema

The Granef toolkit uses a database schema based on the format of Zeek logs. The schema follows the work of Niese and Leichtnam et al. and extends their proposals with new edges and simplified naming that reflects how people commonly think a computer network works. See the simplified schema below or use the interactive diagram displaying all nodes and attributes.

The database schema is defined in the Transformation modules, which produce a Dgraph schema file with the definitions of node and attribute types. A Host node represents a device on the network with a given IP address. Host nodes can be associated with Host-data nodes containing information extracted from application data related to the host or provided by external data sources. Connection nodes contain information about a network connection, such as its duration, the number of bytes transferred, the relevant ports, and the protocol used. Application nodes contain application data extracted from the Connection and can be connected to each other by an additional edge. All edges are directed but can also be traversed in reverse, so a query can start from an arbitrary node regardless of its type.

Database model

A detailed description including all database nodes and corresponding attributes is available in the interactive diagram.
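The reverse traversal of directed edges described above corresponds to the @reverse directive of the Dgraph schema language. The following Python sketch renders a small schema file under that assumption. The type and predicate names are hypothetical placeholders, not the actual Granef schema, which is produced by the Transformation modules.

```python
# Hypothetical types and predicates used only for illustration; the real
# definitions are generated by the Granef transformation modules.
SCHEMA_TYPES = {
    "Host": {
        "host.ip": "string @index(exact)",
        "host.originated": "[uid] @reverse",
    },
    "Connection": {
        "connection.uid": "string @index(exact)",
        "connection.produced": "[uid] @reverse",
    },
}


def build_schema(types):
    """Render predicate definitions followed by type definitions in Dgraph schema syntax."""
    lines = []
    # Predicates are declared globally, one per line, terminated by " ."
    for predicates in types.values():
        for name, definition in predicates.items():
            lines.append(f"{name}: {definition} .")
    # Each type lists the predicates that its nodes may carry.
    for type_name, predicates in types.items():
        lines.append(f"type {type_name} {{")
        lines.extend(f"    {name}" for name in predicates)
        lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_schema(SCHEMA_TYPES))
```

With @reverse set on an edge such as the hypothetical host.originated, a DQL query that starts at a Connection node can follow ~host.originated back to the originating Host, which is what allows querying from an arbitrary node.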

Usage

The Granef toolkit is designed to be versatile while remaining easy to use. All operations are handled by the main granef.py script, and the user does not need to change any configuration. Advanced users may nevertheless modify the configuration file to suit their specific needs.

The key part of the Granef toolkit is an interactive web interface available after data transformation at http://127.0.0.1:8000 (using the default configuration). This interface allows the analyst to query and analyze stored data using predefined queries and interactive visualizations.

Requirements

The Granef toolkit runs its modules in Docker or Podman containers managed by the Python script granef.py. Therefore, apart from the container engine, the only requirements are Python and the modules needed to run the script. The complete list of requirements is below:

  • Docker or Podman (requires approximately 1.9 GB disk space for container images in the default configuration)
  • Python 3 (the script is created in Python 3.12 but should work in earlier versions as well)
  • Python 3 modules in requirements.txt.
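The requirements above can be checked before installation with a short script. This is only a sketch: the function names are hypothetical, it is not part of the Granef toolkit, and it merely tests whether a docker or podman executable is on the PATH and whether the interpreter is Python 3.

```python
import shutil
import sys


def find_container_engine(candidates=("docker", "podman")):
    """Return the name of the first container engine found on PATH, or None."""
    for name in candidates:
        if shutil.which(name) is not None:
            return name
    return None


def check_prerequisites():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info.major < 3:
        problems.append("Python 3 is required")
    if find_container_engine() is None:
        problems.append("neither docker nor podman was found on PATH")
    return problems


if __name__ == "__main__":
    found = check_prerequisites()
    if found:
        for problem in found:
            print(f"Missing requirement: {problem}")
        sys.exit(1)
    print(f"Container engine: {find_container_engine()}")
```

The check does not verify the available disk space or the Python modules listed in requirements.txt; pip reports missing modules during installation.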

The Granef toolkit can be installed using the following commands (the -e argument installs the package in editable mode, so that changes in the configuration files take effect without reinstalling):

$ git clone https://gitlab.ics.muni.cz/granef/granef.git
$ pip3 install -e ./granef/
$ granef --help  # Verify the installation

Alternatively, the toolkit can be installed in a Python virtual environment using the following commands:

$ git clone https://gitlab.ics.muni.cz/granef/granef.git
$ python3 -m venv granef-venv
$ source granef-venv/bin/activate
(granef-venv)$ pip3 install -e ./granef/
(granef-venv)$ granef --help  # Verify the installation
(granef-venv)$ deactivate  # Deactivate the environment when no longer needed

Data Processing

The Granef script can fully automate the preparation of the container environment (pulling container images and creating a network) and run all data processing operations specified in the configuration file. Use the following command to start the automated data processing:

$ granef -a -i <INPUT_FILE_OR_DIRECTORY_PATH>

With the -a argument, the Granef script automatically performs all of the required data processing steps listed below. A step is skipped if the corresponding Granef objects or generated data already exist.

  1. Pull all container images from the Granef container registry to the local container registry.
  2. Create a new container network in which all modules will be run.
  3. Run all data processing operations and their modules.
    • Each module will create a new directory in the parent of <INPUT_FILE_OR_DIRECTORY_PATH> named according to the operation and module name.
    • Modules with the attribute "detached" are run asynchronously and will remain running even after the script ends.
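When the automated processing needs to be started from another Python program, the command above can be wrapped as in the following sketch. It only uses the -a and -i arguments shown in this section; the function names are hypothetical, and the wrapper assumes that the granef command is available on the PATH.

```python
import subprocess
from pathlib import Path


def build_granef_command(input_path, automatic=True):
    """Build the argument list for the granef command line tool."""
    command = ["granef"]
    if automatic:
        command.append("-a")  # perform all required processing steps
    command += ["-i", str(input_path)]
    return command


def run_granef(input_path):
    """Run the automated processing for an existing input file or directory."""
    path = Path(input_path)
    if not path.exists():
        raise FileNotFoundError(f"Input path does not exist: {path}")
    # check=True raises CalledProcessError if granef exits with an error code.
    return subprocess.run(build_granef_command(path), check=True)


if __name__ == "__main__":
    print(" ".join(build_granef_command("capture.pcap")))
```

Passing the arguments as a list avoids shell quoting problems with input paths that contain spaces.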

After a successful run of all operations, the services of the data handling, API, and web modules will be available, including the web interface at http://127.0.0.1:8000 (using the default configuration).

The network address and ports can be changed in the Granef configuration file (the ports key of the granef-handling-alpha, granef-analysis-api, and granef-analysis-web operation modules).

To completely clean up the container environment and remove all Granef objects, use the following command (the script will not delete generated data):

$ granef -r all

All Granef toolkit actions can also be called separately. The following commands demonstrate the common steps: pull container images, create a new container network, run an operation, and remove Granef objects such as the running containers of an operation.

$ granef -p  # Pull all container images to the local containers registry
$ granef -n  # Create a new containers network
$ granef -o <OPERATION_NAME> -t <TASK> -i <INPUT_FILE_OR_DIRECTORY_PATH>  # Handle specified Granef operation
$ granef -r <OBJECT>  # Remove specified Granef objects

Use the -h argument to see a description of all script arguments and options.

Data Analysis

Once all Granef toolkit operations have completed successfully and the data is prepared, the web user interface is accessible at http://127.0.0.1:8000 (using the default configuration). The analyst can use predefined queries to get an overview of the data or use the interactive visualization to follow the displayed associations.
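Besides the web interface, the stored data can in principle be queried with DQL directly. The sketch below prepares such a query as an HTTP request for the Dgraph /query endpoint. It rests on several assumptions: the URL uses Dgraph's default HTTP port 8080, which may differ in the Granef configuration, the predicate names are hypothetical, and the request bypasses the Granef API module, whose endpoints are not described here.

```python
import json
import urllib.request

# Assumption: Dgraph Alpha HTTP endpoint on its default port.
DGRAPH_QUERY_URL = "http://127.0.0.1:8080/query"

# Hypothetical predicates; eq() requires an index on host.ip in the schema.
HOST_CONNECTIONS_QUERY = """
query hostConnections($ip: string) {
  hosts(func: eq(host.ip, $ip)) {
    host.ip
    host.originated {
      connection.uid
      connection.proto
    }
  }
}
"""


def build_query_request(ip, url=DGRAPH_QUERY_URL):
    """Prepare (but do not send) a POST request carrying the query and its variable."""
    body = json.dumps({"query": HOST_CONNECTIONS_QUERY, "variables": {"$ip": ip}})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(ip):
    """Send the request and return the "data" part of the JSON response."""
    with urllib.request.urlopen(build_query_request(ip), timeout=10) as response:
        return json.load(response).get("data", {})


if __name__ == "__main__":
    request = build_query_request("10.0.0.1")
    print(request.get_method(), request.full_url)
```

Passing the IP address as a query variable instead of formatting it into the query text keeps user input out of the DQL string.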

The following video shows an example of graph analysis that starts with a connection search and continues with an investigation of the extracted information (including the experimental MISP threat sharing extension).

Publications and Presentations

  • Milan Cermak, Tatiana Fritzová, Vít Rusňák and Denisa Sramkova. Using relational graphs for exploratory analysis of network traffic data. Forensic Science International: Digital Investigation. 2023.
  • Milan Cermak. (Keynote) Incident Investigation: From Packets to Graph-Based Analysis. In: International Workshop on Graph-based network Security (GraSec) in conjunction with IEEE/IFIP Network Operations and Management Symposium NOMS. 2022.
  • Milan Cermak. Toward Graph-Based Network Traffic Analysis and Incident Investigation. In: DFRWS EU. 2022.
  • Milan Cermak. Graph-based Network Traffic Analysis for Incident Investigation. In: The 16th International Conference on Availability, Reliability and Security. 2021.
  • Milan Cermak and Denisa Sramkova. GRANEF: Utilization of a Graph Database for Network Forensics. In: Proceedings of the 18th International Conference on Security and Cryptography. SCITEPRESS, 2021. ISBN 978-989-758-524-1. DOI: 10.5220/0010581807850790.

Our Team

We are part of the security team of Masaryk University (CSIRT-MU), which has been responsible for developing and maintaining proper ICT security at the university for more than 12 years. The team is also involved in various research projects ranging from network traffic analysis to digital forensics and criminal investigation.

Milan Cermak, Ph.D.
Granef idea, project leader, and the main contributor
Denisa Sramkova (former member)
Prototyping, data processing, and analysis modules

Other contributions:

Contact

Pull requests and new modules are welcome! For significant changes, please open an issue or contact us via the form below to discuss what you would like to add or change. You can also use the form for technical questions, but please be patient, as it may take us some time to respond.

We are also open to discussing possible research collaboration with you, so don't hesitate to contact us!