Choosing Traffic Tools/Techniques
=================================

The tools Skaion provides can be (and usually are) used in combination to
build full cyber environments.  To help you choose which tools to
include, the sections below discuss the role and advantages of each.
Some other options are also considered.

The types of traffic we will consider are:

- :ref:`live_traffic` (Not available from Skaion)
- :ref:`packet_generation` (Mostly not available from Skaion)
- :ref:`network_modeling` (Available as MPTGS_)
- :ref:`user_modeling` (Available as ConsoleUser_)

.. _live_traffic:

Live Traffic/Data
-----------------

The most realistic data possible is, of course, the actual data in the
real environment.  There are both policy and technical challenges when
using real data from a live environment.


    +-----------------------------------+-----------------------------------+
    | Pros                              | Cons                              |
    +===================================+===================================+
    | * Most realistic for that         | * Often difficult to acquire      |
    |   environment at that time        |                                   |
    +-----------------------------------+-----------------------------------+
    | * Contains the fullest set of data| * May contain unexpected or       |
    |                                   |   unwanted traffic like novel     |
    |                                   |   malware                         |
    +-----------------------------------+-----------------------------------+   
    | * Captures full complexity of     | * May contain PII or copyrighted  |
    |   user interactions (with services|   material                        |
    |   and other users)                |                                   |
    +-----------------------------------+-----------------------------------+
    | * Generally easy to start using   | * Can be difficult to establish   |
    |                                   |   ground truth in the data        |
    +-----------------------------------+-----------------------------------+
    |                                   | * Cannot change the environment   |
    |                                   |   or captured data easily or in a |
    |                                   |   controlled way to experiment    |
    +-----------------------------------+-----------------------------------+
    |                                   | * When it is repeated to get more |
    |                                   |   volume, the data itself is      |
    |                                   |   repeated which can result in    |
    |                                   |   more regularity than would      |
    |                                   |   actually be expected.           |
    +-----------------------------------+-----------------------------------+
    |                                   | * Limited ability to vary the     |
    |                                   |   data risks overfitting or       |
    |                                   |   overtraining if the capture is  |
    |                                   |   not large or rich enough        |
    +-----------------------------------+-----------------------------------+


Example use case: searching for anomalies in a real environment.

.. _packet_generation:

Packet Generation
-----------------

This type of traffic comes from tools that craft packets meeting some
criteria, often at very high speeds, and dump them on the wire.  Many of
these tools focus on the load they can achieve and make no attempt to
maintain realism.
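
As a minimal sketch of the idea (not a Skaion tool, and not modeled on any
particular generator), the following builds UDP datagrams with random
payloads using only the Python standard library.  The helper names
``build_udp_datagram`` and ``build_burst``, the port numbers, and the payload
size are all assumptions chosen for illustration.  The code only constructs
bytes in memory; it does not send anything.

```python
import os
import struct

UDP_HEADER_LEN = 8  # bytes: source port, destination port, length, checksum


def build_udp_datagram(src_port: int, dst_port: int, payload_len: int) -> bytes:
    """Build a UDP header followed by random payload bytes (hypothetical helper)."""
    payload = os.urandom(payload_len)  # random filler, not real protocol content
    length = UDP_HEADER_LEN + payload_len
    checksum = 0  # for UDP over IPv4, zero means no checksum was computed
    header = struct.pack("!HHHH", src_port, dst_port, length, checksum)
    return header + payload


def build_burst(count: int, payload_len: int = 64) -> list:
    """Build `count` datagrams aimed at port 53 from varying source ports."""
    return [
        build_udp_datagram(40000 + (i % 1000), 53, payload_len)
        for i in range(count)
    ]


if __name__ == "__main__":
    burst = build_burst(3)
    src, dst, length, _ = struct.unpack("!HHHH", burst[0][:UDP_HEADER_LEN])
    print(src, dst, length)
```

The headers here are well formed, but the payloads addressed to port 53 are
random bytes rather than valid DNS messages, which illustrates why this kind
of traffic is a poor fit for content-aware testing.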


    +-----------------------------------+-----------------------------------+
    | Pros                              | Cons                              |
    +===================================+===================================+
    | * Able to generate large volume   | * Limited realism of stateful     |
    |                                   |   traffic.                        |
    +-----------------------------------+-----------------------------------+
    | * Often easy to set up            | * Packets are often filled with   |
    |                                   |   random data, which makes them   |
    |                                   |   ill-suited to deep packet       |
    |                                   |   inspection or content-aware     |
    |                                   |   use cases.                      |
    +-----------------------------------+-----------------------------------+


Example use case: load testing a new switch/router.

.. _network_modeling:

Modeling Networks
-----------------

With this approach, general statistics about a network of interest are
specified, and tools attempt to generate traffic that reproduces those
statistics.  Skaion's MPTGS_ tool largely uses this approach.

In MPTGS, an operator specifies how much of each activity they expect
to see in each time slice, and the traffic generator starts and stops
instances of "users" performing that activity to meet those targets.
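
The start/stop behavior described above can be sketched roughly as follows.
This is an illustration only: it is not the MPTGS configuration format or
algorithm.  It assumes each target is a count of concurrent simulated users
per activity, and the names ``reconcile`` and ``run_schedule`` and the
activity labels are hypothetical.

```python
from collections import Counter


def reconcile(running, targets):
    """Return per-activity deltas: positive means start, negative means stop.

    Activities missing from `targets` are treated as having a target of zero.
    """
    activities = set(running) | set(targets)
    return {a: targets.get(a, 0) - running.get(a, 0) for a in sorted(activities)}


def run_schedule(schedule):
    """Walk the time slices in order and record the deltas applied in each."""
    running = Counter()  # simulated users currently active, per activity
    log = []
    for slice_index, targets in enumerate(schedule):
        deltas = reconcile(running, targets)
        for activity, delta in deltas.items():
            running[activity] += delta  # start or stop that many "users"
        log.append((slice_index, deltas))
    return log


if __name__ == "__main__":
    # Hypothetical schedule: one dict of targets per time slice.
    schedule = [
        {"web": 3, "email": 1},
        {"web": 1, "email": 2},
        {"email": 0},
    ]
    for slice_index, deltas in run_schedule(schedule):
        print(slice_index, deltas)
```

A real generator would also need to decide which specific instances to stop
and how to pace the changes within a slice; this sketch only computes the
counts.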


    +-----------------------------------+-----------------------------------+
    | Pros                              | Cons                              |
    +===================================+===================================+
    | * Generate live, stateful traffic | * Requires enough resources to    |
    |   matching desired properties     |   produce the right amounts of    |
    |                                   |   traffic from each part of the   |
    |                                   |   test environment                |
    +-----------------------------------+-----------------------------------+
    | * Random choices create unique    | * Can be time consuming to        |
    |   though similar data for each    |   configure/set up                |
    |   run                             |                                   |
    +-----------------------------------+-----------------------------------+
    | * Produce moderate levels of      | * Traffic types are limited to    |
    |   traffic from each source        |   supported types                 |
    +-----------------------------------+-----------------------------------+
    | * Can change user profiles/models |                                   |
    |   and rerun the test              |                                   |
    +-----------------------------------+-----------------------------------+


Example use case: testing a cyber defense tool in a lab environment.

.. _user_modeling:

Modeling Users
--------------

All of the above strategies assume, among other things, that there is
no direct monitoring of the hosts that users are using.  This doesn't
hold in many situations, where either host-based sensors are in use or
there will be live interactions with the hosts (such as when a red team
actually compromises a host).  While real hosts can be added to
environments with the other traffic generation options, if those hosts
are to be more than passive landing points they will also need
activity generation.

It is also worth noting here that not all host activity results in
network activity.  For example, editing a document may not produce any
associated network traffic, but pasting plagiarized or malicious content
into it might be important to monitor.

Skaion's ConsoleUser_ tool provides a way to do this.  It interacts
with a target host by controlling the keyboard and mouse and monitoring
the screen (unless other control structures like the accessibility layer
are used), which allows it to carry out human-like activity.  With this
approach we attempt to model users, and the resulting traffic is a side
effect of user activity, just as in the real world.
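
To make the idea of modeling a user concrete, here is a small sketch that
plans a session of weighted random activities separated by human-scale
pauses.  It does not use or represent the ConsoleUser API.  The profile
contents, the pause range, and the function name ``plan_session`` are
assumptions made for illustration, and the code only produces a timeline
rather than driving a real host.

```python
import random

# Hypothetical profile: (activity name, relative weight, produces network traffic?)
PROFILE = [
    ("browse_web", 5, True),
    ("read_email", 3, True),
    ("edit_document", 2, False),  # host-only activity with no network traffic
]


def plan_session(profile, n_actions, seed=None, min_pause=2.0, max_pause=30.0):
    """Return a list of (start_time_seconds, activity, uses_network) tuples."""
    rng = random.Random(seed)  # a seed makes a given run repeatable
    names = [entry[0] for entry in profile]
    weights = [entry[1] for entry in profile]
    uses_network = {entry[0]: entry[2] for entry in profile}

    clock = 0.0
    timeline = []
    for _ in range(n_actions):
        clock += rng.uniform(min_pause, max_pause)  # human-scale pause
        activity = rng.choices(names, weights=weights, k=1)[0]
        timeline.append((round(clock, 1), activity, uses_network[activity]))
    return timeline


if __name__ == "__main__":
    for start, activity, networked in plan_session(PROFILE, 5, seed=1):
        print(start, activity, "network" if networked else "host only")
```

Changing the seed gives a different but statistically similar session, and
entries such as ``edit_document`` show how some planned activity would be
visible only to host-based sensors.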


    +-----------------------------------+-----------------------------------+
    | Pros                              | Cons                              |
    +===================================+===================================+
    | * Supports host activities        | * Requires a real host endpoint   |
    |                                   |   for each host observed in the   |
    |                                   |   resulting traffic               |
    +-----------------------------------+-----------------------------------+
    | * Can enact human activities in   | * Works at human speed, so waiting|
    |   controlled ways without being   |   for random events can be slow   |
    |   exact replays                   |                                   |
    +-----------------------------------+-----------------------------------+
    | * Get all real interactions       | * Sensitive to visual changes on  |
    |   between software/platform and   |   the screen (with typical        |
    |   all other computers in the      |   connector)                      |
    |   environment (like SMB discovery |                                   |
    |   and attempts to update software)|                                   |
    +-----------------------------------+-----------------------------------+
    | * Can change user profiles/models |                                   |
    |   and rerun the test              |                                   |
    +-----------------------------------+-----------------------------------+


Example use case: testing a cyber defense tool that includes host-based
sensors.


.. _MPTGS: https://docs.skaion.com/mptgs/index.html
.. _ConsoleUser: https://docs.skaion.com/cu/index.html