1. Choosing Traffic Tools/Techniques

The tools Skaion provides can be (and usually are) used in combination to build full cyber environments. To help choose which tools to include, the roles and advantages of each are discussed below. Some other options are also considered.

The types of traffic considered are described in the subsections that follow.

1.1. Live Traffic/Data

The most realistic data possible is, of course, the actual data in the real environment. Using real data from a live environment raises both policy and technical challenges.

Pros

  • Most realistic for that environment at that time

  • Contains the fullest set of data

  • Captures the full complexity of user interactions (with services and other users)

  • Generally easy to start using

Cons

  • Often difficult to acquire

  • May contain unexpected or unwanted traffic, such as novel malware

  • May contain PII or copyrighted material

  • Can be difficult to establish ground truth in the data

  • Cannot easily change the environment or the captured data in a controlled way to experiment

  • When the data is repeated to get more volume, the same content recurs, which can result in more regularity than would actually be expected

  • Limited ability to vary the data brings a risk of overfitting or overtraining if the captured data is not large enough or rich enough

Example use case: searching for anomalies in a real environment.
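The repetition drawback listed in the cons above can be illustrated with a small sketch. The record layout (source host, destination port, payload size) and the helper names `replay` and `distinct_ratio` are hypothetical stand-ins chosen for illustration, not part of any Skaion tool or capture format. The point is only that looping a capture multiplies the volume while the number of distinct records stays the same.

```python
import random

def replay(capture, times):
    """Concatenate the same capture several times, as a naive way to add volume."""
    return capture * times

def distinct_ratio(records):
    """Fraction of records that are unique; 1.0 means nothing is repeated."""
    return len(set(records)) / len(records)

rng = random.Random(7)

# Hypothetical stand-in for captured records:
# (source host, destination port, payload size in bytes).
capture = [
    (f"host{rng.randint(1, 20)}", rng.choice([53, 80, 443]), rng.randint(40, 1500))
    for _ in range(200)
]

looped = replay(capture, 5)

print("records:", len(capture), "->", len(looped))
print("distinct:", len(set(capture)), "->", len(set(looped)))
print("distinct ratio:", distinct_ratio(capture), "->", distinct_ratio(looped))
```

A detector trained on the looped data sees five times the volume but no new behavior, which is one way the overfitting risk mentioned above can arise.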

1.2. Packet Generation

This type of traffic uses tools that, often at very high speeds, craft packets that meet some criteria and dump them on the wire. Many of these tools are focused on the load they can achieve and do not bother to try to maintain any realism.
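As a rough illustration of what such tools do, the sketch below builds the bytes of a single IPv4/UDP packet with a random payload. It is a minimal example under stated assumptions, not how any particular generator is implemented: the function names, addresses (taken from documentation ranges), and ports are all illustrative, the UDP checksum is left at zero (which IPv4 permits), and nothing is actually sent on the wire.

```python
import os
import socket
import struct

def ipv4_checksum(header: bytes) -> int:
    """Internet checksum: ones' complement of the ones' complement sum of 16-bit words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def craft_udp_packet(src_ip, dst_ip, src_port, dst_port, payload_len):
    """Return raw bytes for an IPv4/UDP packet filled with random data."""
    payload = os.urandom(payload_len)  # random filler, no meaningful content
    udp_len = 8 + payload_len
    # UDP header: source port, destination port, length, checksum (0 = not computed).
    udp_header = struct.pack("!HHHH", src_port, dst_port, udp_len, 0)
    total_len = 20 + udp_len
    # IPv4 header with the checksum field zeroed for the first pass.
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                 # version 4, header length 5 words
        0,                    # type of service
        total_len,
        0,                    # identification
        0,                    # flags and fragment offset
        64,                   # time to live
        socket.IPPROTO_UDP,   # protocol number 17
        0,                    # checksum placeholder
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
    )
    checksum = ipv4_checksum(ip_header)
    ip_header = ip_header[:10] + struct.pack("!H", checksum) + ip_header[12:]
    return ip_header + udp_header + payload

packet = craft_udp_packet("192.0.2.1", "198.51.100.2", 40000, 9999, 64)
print(len(packet), "bytes crafted")
```

Because the payload is random, a packet like this is syntactically valid but carries no application content, which is why this style of traffic is a poor fit for content-aware testing.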

Pros

  • Able to generate large volumes of traffic

  • Often easy to set up

Cons

  • Limited realism of stateful traffic

  • Packets are often filled with random data, which makes them ill suited to deep packet inspection or content-aware use cases

Example use case: load testing a new switch/router.

1.3. Modeling Networks

With this approach, general statistics about a network of interest are specified, and tools attempt to generate traffic that reproduces those statistics. Skaion’s MPTGS tool largely uses this approach.

In the MPTGS, an operator specifies how much of each activity they expect to see in each time slice, and the traffic generator starts and stops instances of “users” doing that activity to meet those targets.
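The start/stop behavior described above can be sketched as a simple planning step. This is only an illustration of the idea, not the MPTGS configuration format or algorithm: the function `plan_adjustments`, the activity names, and the assumption that targets are expressed as counts of concurrent user instances are all hypothetical.

```python
def plan_adjustments(targets, running=None):
    """For each time slice, compute how many user instances to start (+) or stop (-).

    targets: list of dicts mapping activity name to desired instance count.
    running: optional dict of instance counts already active before the first slice.
    """
    running = dict(running or {})
    plan = []
    for slice_targets in targets:
        delta = {}
        # Consider activities that are running now or wanted in this slice.
        for activity in sorted(set(running) | set(slice_targets)):
            want = slice_targets.get(activity, 0)
            have = running.get(activity, 0)
            if want != have:
                delta[activity] = want - have
            running[activity] = want
        plan.append(delta)
    return plan

# Hypothetical targets for three consecutive time slices.
targets = [
    {"web": 10, "email": 4},
    {"web": 25, "email": 4, "ftp": 2},
    {"web": 5, "email": 0},
]

plan = plan_adjustments(targets)
for index, delta in enumerate(plan):
    print(f"slice {index}: {delta}")
```

A real generator would also need to decide which specific instances to stop and how quickly to ramp, which this sketch leaves out.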

Pros

  • Generates live, stateful traffic matching desired properties

  • Random choices create unique though similar data for each run

  • Produces moderate levels of traffic from each source

  • Can change user profiles/models and rerun the test

Cons

  • Requires enough resources to produce the right amounts of traffic from each part of the test environment

  • Can be time consuming to configure and set up

  • Traffic is limited to the supported types

Example use case: testing a cyber defense tool in a lab environment.

1.4. Modeling Users

All of the above strategies assume, among other things, that there is no direct monitoring of the hosts that users are working on. This assumption does not hold in many situations, either because host-based sensors are in use or because there will be live interactions with the hosts (for example, when a red team actually compromises a host). While real hosts can be added to environments built with the other traffic generation options, if those hosts are to be more than passive landing points they will also need activity generation.

It is also worth noting that not all host activity results in network activity. For example, editing a document may not produce any associated network traffic, but pasting plagiarized or malicious content into it might be important to monitor.

Skaion’s ConsoleUser tool provides a way to do this. It interacts with a target host by controlling the keyboard and mouse and monitoring the screen (unless other control structures, such as the accessibility layer, are used), which allows it to carry out human-like activity. This approach models the users themselves, and the resulting traffic is a side effect of user activity, just as in the real world.
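To make the idea of modeling users concrete, the sketch below simulates only the decision layer of a user: which activity comes next and how long a human-scale pause precedes it. It does not control a keyboard, mouse, or screen, and it is not the ConsoleUser API; the activity names, transition weights, delay range, and the function `simulate_user` are all hypothetical. Each activity carries a flag for whether network traffic would be expected as a side effect, reflecting the point that some host activity is invisible on the network.

```python
import random

# Hypothetical activities; the flag marks whether network traffic is an expected side effect.
ACTIVITIES = {
    "idle": False,
    "edit_document": False,
    "browse_web": True,
    "read_email": True,
}

# Hypothetical transition weights from the current activity to the next one.
TRANSITIONS = {
    "idle": {"edit_document": 3, "browse_web": 4, "read_email": 3},
    "edit_document": {"edit_document": 5, "browse_web": 2, "idle": 1},
    "browse_web": {"browse_web": 4, "read_email": 2, "edit_document": 2, "idle": 1},
    "read_email": {"read_email": 2, "browse_web": 3, "edit_document": 2, "idle": 1},
}

def simulate_user(steps, seed=None):
    """Return a timeline of (seconds elapsed, activity, produces network traffic)."""
    rng = random.Random(seed)
    state = "idle"
    clock = 0.0
    timeline = []
    for _ in range(steps):
        options = TRANSITIONS[state]
        state = rng.choices(list(options), weights=list(options.values()))[0]
        clock += rng.uniform(2.0, 30.0)  # human-scale pause before the next action
        timeline.append((round(clock, 1), state, ACTIVITIES[state]))
    return timeline

for when, activity, on_network in simulate_user(8, seed=42):
    note = "network traffic expected" if on_network else "host-only activity"
    print(f"{when:7.1f}s  {activity:<14} {note}")
```

Changing the weights or the seed gives a different but statistically similar run, which mirrors the ability to rerun a test with altered user profiles without producing an exact replay.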

Pros

  • Supports host activities

  • Can enact human activities in controlled ways without being exact replays

  • Captures all real interactions between the software/platform and other computers in the environment (such as SMB discovery and software update attempts)

  • Can change user profiles/models and rerun the test

Cons

  • Requires a real host endpoint for each host that is observed in the resulting traffic

  • Works at human speed, so waiting for random events can be slow

  • Sensitive to visual changes on the screen (with the typical connector)

Example use case: testing a cyber defense tool that includes host-based sensors.