.. _mptgs_config:

Configuring MPTGS
=================

There are several files that shape how the MPTGS will run. These files
describe the environment it will use (e.g., which machines comprise the
different elements of the system) and what it will do (e.g., volume and types
of traffic). Those files are discussed in detail below.

One related topic that is not covered in this section is configuring the
users who will be used to produce the MPTGS traffic. That subject is covered
in detail in :ref:`configuring_users`.

Some files are relatively likely to need changing, and many are unlikely to
need change. The ones that are likely to be changed are listed first below
and include:

* tg.config
* db.info
* assorted schedule files
* gui.conf
* internal_sites.conf

Files that can be changed, but that most likely won't have to be, include:

* errorStrings.conf
* motd-no-scenario and motd-scenario-running
* verify.conf
* actuator_map.conf (Managers only)
* logpatterns/*

Some files are derived from the user configuration, and there are tools
available to build these automatically, so hand editing is unlikely to be
required. These include:

* distributed.IP-map

General Configuration
---------------------

The main configuration file that describes the MPTGS system is
``/usr/local/tg/etc/tg.config``. This is a ``bash`` script that simply
defines variables that other scripts will make use of. As such, the syntax of
this file is very strict, and stray whitespace can break a line. While
familiarity with ``bash`` is a benefit, here we describe the two types of
lines found in this file.

Simple Assignments
^^^^^^^^^^^^^^^^^^

Many assignments in this file are simple: a single name or key receives a
single value. In this case the format must be ``name=value`` with no spaces
around the equals sign. Some examples from the real tg.config include:

.. code-block:: bash

    TG_LOG_ID=demo
    SEED=43243534

These define the string used as the base of each TG log file and the seed
passed to the random number generator.

Multiple Value Assignments
^^^^^^^^^^^^^^^^^^^^^^^^^^

Some values are lists of things rather than a single value. Values like this
are defined in a bash array. The syntax for this in the config file generally
uses the form ``name[index]=value`` so you can assign the values to each index
in the array as you go. As with many programming languages, the index for the
first element is ``0``. For example, to specify all of the Linux manager
hosts, i.e., the managers from which bot traffic will be run, we populate the
``lmanager`` array. If we have an inside and an outside manager the
specification might look like the following:

.. code-block:: bash

    lmanager[0]=mptgs_inside
    lmanager[1]=mptgs_outside

This assumes ``mptgs_inside`` and ``mptgs_outside`` are names of hosts that
the orchestrator can reach.

Below is the table of expected names and what the values mean.

.. table:: Configuration Values in tg.config

    +---------------+-------+-----------------------------------------------+
    | Name          | S/M*  | Meaning                                       |
    +===============+=======+===============================================+
    | MASK          | S     | This is a Perl regex that is compared against |
    |               |       | source IPs to determine if it counts as part  |
    |               |       | of the inside (true if this regex matches)    |
    +---------------+-------+-----------------------------------------------+
    | TG_LOG_ID     | S     | This is a string used as a prefix to logfiles |
    |               |       | that bots produce during this run             |
    +---------------+-------+-----------------------------------------------+
    | SEED          | S     | A seed that will be passed to random number   |
    |               |       | generators during this run.                   |
    +---------------+-------+-----------------------------------------------+
    |MANAGER_RETRIES| S     | The number of times to attempt to retry       |
    |               |       | connecting to a Manager node before giving up |
    +---------------+-------+-----------------------------------------------+
    | monitor       | M     | A list of hosts that run "monitoring" software|
    |               |       | that should be started and stopped with each  |
    |               |       | run.                                          |
    +---------------+-------+-----------------------------------------------+
    | lmanager      | M     | A list of the Linux hosts running the MPTGS   |
    |               |       | Manager nodes.                                |
    +---------------+-------+-----------------------------------------------+
    | apache_hosts  | M     | A list of hosts expected to be running a web  |
    |               |       | server that may be used by traffic during a   |
    |               |       | run.                                          |
    +---------------+-------+-----------------------------------------------+
    | misc_hosts    | M     | A list of hosts that must be ready to go for  |
    |               |       | a run to begin. ``/root/prep_host.sh`` will   |
    |               |       | be executed at the start of the run and       |
    |               |       | ``/root/stop_host.sh`` at the end of the run. |
    +---------------+-------+-----------------------------------------------+
    | cu_hosts      | M     | A list of hosts running ConsoleUser that have |
    |               |       | instances to start and stop with MPTGS runs   |
    +---------------+-------+-----------------------------------------------+
    | si_hosts      | M     | A list of hosts making up the Internet and    |
    |               |       | whose status should be reflected in the GUI   |
    +---------------+-------+-----------------------------------------------+

\*The "S/M" column indicates whether the value is Single or Multi valued.

In addition to those values, tg.config may contain the following
now-deprecated values, which can be ignored.

.. table:: Deprecated values in tg.config

    +---------------+-------+-----------------------------------------------+
    | Name          | S/M*  | Meaning                                       |
    +===============+=======+===============================================+
    | WINDOWSDB     | S     | The location of a database of WinUser data    |
    +---------------+-------+-----------------------------------------------+
    | NETDB         | S     | A value superseded by the db.info file        |
    +---------------+-------+-----------------------------------------------+
    | WIN_TRAFFIC   | S     | A value indicating which removed Windows-based|
    |               |       | components should be started and stopped      |
    +---------------+-------+-----------------------------------------------+
    | WINLOGS       | S     | A value indicating whether to collect logs    |
    |               |       | from the removed Windows-based components     |
    +---------------+-------+-----------------------------------------------+
    | BASELOGDIR    | S     | The root directory into which logs from the   |
    |               |       | P2INGS program would be gathered              |
    +---------------+-------+-----------------------------------------------+
    | CRUFTHOSTS    | M     | The list of hosts that ran the removed "cruft"|
    |               |       | components                                    |
    +---------------+-------+-----------------------------------------------+
    | SCAN_SCALING  | S     | The multiplier from the base scan rate that   |
    |               |       | the scanner "cruft" components used           |
    +---------------+-------+-----------------------------------------------+
    | ATTACK_SCALING| S     | The multiplier from the base attack rate that |
    |               |       | the attacker "cruft" components used          |
    +---------------+-------+-----------------------------------------------+
    | lvumanager    | M     | A list of hosts that managed the now removed  |
    |               |       | Linux virtual user software (not bots)        |
    +---------------+-------+-----------------------------------------------+
    | wmanager      | M     | A list of hosts that managed the now removed  |
    |               |       | Windows managers (not related to ConsoleUser) |
    +---------------+-------+-----------------------------------------------+
    | collector     | M     | A list of custom collectors that would have   |
    |               |       | specific shell scripts installed to manage    |
    |               |       | data collection during a run.                 |
    +---------------+-------+-----------------------------------------------+
    | incollector   | M     | A list of custom collectors that would have   |
    |               |       | specific shell scripts installed to manage    |
    |               |       | data collection during a run. This differs    |
    |               |       | from the collector property in that these     |
    |               |       | are expected to be contacted "in-band" so they|
    |               |       | are not polled during a run so as to avoid    |
    |               |       | polluting the data.                           |
    +---------------+-------+-----------------------------------------------+
    | lcollector    | M     | A list of hosts from which Linux logs were    |
    |               |       | collected during P2INGS                       |
    +---------------+-------+-----------------------------------------------+
    | pcap_hosts    | M     | A list of hosts that would collect raw PCAP   |
    |               |       | data during a run.                            |
    +---------------+-------+-----------------------------------------------+
    | mysql_hosts   | M     | A list of hosts that are expected to provide  |
    |               |       | MySQL databases that will be used during a run|
    +---------------+-------+-----------------------------------------------+
    | samba_hosts   | M     | A list of hosts that are expected to provide  |
    |               |       | Samba services during a run                   |
    +---------------+-------+-----------------------------------------------+
    | bind_hosts    | M     | A list of hosts that run named and should have|
    |               |       | it restarted before a run.                    |
    +---------------+-------+-----------------------------------------------+
    | ttl_mangle\_\ | M     | A list of hosts running the old version of the|
    | hosts         |       | TTL mangler that needed to be started and     |
    |               |       | stopped.                                      |
    +---------------+-------+-----------------------------------------------+

\*The "S/M" column indicates whether the value is Single or Multi valued.

Database Configuration
----------------------

The MPTGS relies on a PostgreSQL database for many of its user-based
configuration settings.
Since many components need to access that database, the connection
information is stored in a common file. This file is also in the form of a
bash script, so spaces around the equals sign are not allowed. There are no
multi-valued fields in this file.

.. table:: Database Connection Fields

    +---------------+-----------------------------------------------+
    | Field Name    | Meaning                                       |
    +===============+===============================================+
    | hostaddr      | This is the name or address of the DB server  |
    +---------------+-----------------------------------------------+
    | dbname        | This is the name of the database to access    |
    +---------------+-----------------------------------------------+

Schedule Files
--------------

Schedule files tell the MPTGS how many sessions of each bot type should be
running at different points during the run. These files can be complicated or
simple. They can have a single level that is used throughout the run, or they
can vary the levels as the run progresses. There are 3 main types of lines in
a schedule file. Combined, they describe all the traffic the MPTGS will
attempt to produce.

SERVICE line
^^^^^^^^^^^^

The first line is special and only appears once. This line details all the
bot types that will be used at any point in the schedule file, though any of
them could be set to 0 to not appear during specific parts of the run. The
first line must begin with the word ``SERVICE``. Each other word on the line
is treated as the name of a bot type. The order in which the bots are
specified on this line is meaningful on later lines, where each position on
the line corresponds to the bot in the same position on this line.

Direction lines
^^^^^^^^^^^^^^^

The MPTGS was originally designed with a concept of an interesting "inside"
network and the rest of the Internet and other networks being "outside".
The direction lines in a schedule file specify the ratio of traffic that
originates in the inside (defined by the IP matching the ``MASK`` value in
tg.config above) vs the outside. The value is a float between 0 and 1. The
value 0 means all of the bots will originate from the inside, and a 1 means
all the bots will originate from the outside. The format of a direction line
is ``<bot name> direction <value>`` where the bot name must exactly match one
of the names given in the SERVICE line. These lines can appear anywhere in
the file to change the ratio throughout the run. There must be a direction
specification for each bot type before the first time line.

Time lines
^^^^^^^^^^

Time lines specify the volume of bot activities at various times within a
run. There must be at least one of these lines in a file, but there can be as
many more as desired. Each line begins with a time, given as the number of
seconds into the run at which the line takes effect. For example, a line that
begins with 3600 would start 1 hour into the run. The only exceptions to this
are the first line, which starts from the beginning of the run, and the last
line, which is repeated until the run ends.

The remainder of the line consists of colon-delimited pairs. Each pair
defines how bots should be active during each timeslice for which this line
applies. The first number in the pair is the total number of bots that should
be active, and the number after the colon is the number that should start new
in each iteration, as opposed to those whose activity is continuing from a
previous iteration. To know which bot type a given pair defines, its location
in the time line is matched to the location of the bot definitions in the
SERVICE line. So the first service maps to the first pair, the second to the
second, and so on.

Some types of bots do a single activity, so there is nothing to continue.
These so-called "one shot" bots will always have a "0" for the total number
of bots, but the "new arrivals" number determines how many get started each
time.

Example Schedule File
^^^^^^^^^^^^^^^^^^^^^

Let's look now at an example file that shows a simple version of the lines
above::

    SERVICE ftp smtp http
    smtp direction 0.7
    ftp direction 0.2
    http direction 0.5
    120 10:4 0:5 30:8
    http direction 1.0
    240 10:4 0:5 90:24

As we can see, this file defines 3 bot types: ftp, smtp and http. The
direction lines do not have to match the order given in the SERVICE line. We
set the smtp bots to send 70% of the mail from outside bots, while ftp will
be 80% inside sources, and http starts as a 50/50 split.

We start out the traffic looking to have 10 active ftp bots in each iteration
of decisions, with about 4 of those being new each time. For smtp, once an
email is sent there's no lingering activity, so we don't have any carry-over
bots, but we expect 5 emails during each pass. For http we are looking to
have 30 sources browsing the web with 8 of them being new.

After a couple of minutes in the run, we switch the ratio for http traffic so
that it originates entirely from outside sources. There is no new direction
line for the other bot types, so those remain at their previous ratios.

At 240 seconds into the run we change the number of active bots. This line
must specify each bot type, even though in this example we only want to
change one of the values. The ftp and smtp bots remain at the levels
described above, but only because we explicitly say so. We are turning up the
knob on http, though, by looking to have 90 active bots with 24 new ones
during each iteration.

GUI Configuration
-----------------

The GUI is not required to operate the MPTGS, but for those who choose to use
it there is a config file with a few options that affect how it operates.
This config file is by default ``/usr/local/tg/etc/gui.conf``.
All the values are in the "key = value" style, but unlike some of the config
files above this is not a bash file, so spaces around the equals sign are
acceptable but not required.

.. table:: GUI Configuration Options

    +-------------------+-----------------------------------------------+
    | Key/Config Option | Meaning/Expected values                       |
    +===================+===============================================+
    | orchestrator      | If "true" then the GUI will act as though it  |
    |                   | is running on the orchestrator host directly  |
    |                   | and can access files locally, for example.    |
    +-------------------+-----------------------------------------------+
    | host              | This is the address of the orchestrator. When |
    |                   | the GUI attempts to execute commands, those   |
    |                   | will be done on the host identified here. If  |
    |                   | no other value is given it assumes localhost. |
    +-------------------+-----------------------------------------------+
    | schedule          | The path to the default schedule file the GUI |
    |                   | should use unless the operator selects another|
    +-------------------+-----------------------------------------------+
    | monitor           | If a file defining a set of monitors to launch|
    |                   | exists, and this parameter points to it, those|
    |                   | monitors will be loaded as the GUI starts.    |
    +-------------------+-----------------------------------------------+
    | duration          | The default duration a run should last when   |
    |                   | controlled from the GUI unless the operator   |
    |                   | specifies otherwise.                          |
    +-------------------+-----------------------------------------------+
    | prefix            | The default log base prefix that should be    |
    |                   | used if the operator doesn't specify something|
    |                   | different.                                    |
    +-------------------+-----------------------------------------------+
    | attack            | DEPRECATED - NOT CURRENTLY USED.              |
    |                   | This parameter is a multiplier for the base   |
    |                   | rate of attacks from the (retired) cruft      |
    |                   | component of the MPTGS.                       |
    +-------------------+-----------------------------------------------+
    | scan              | DEPRECATED - NOT CURRENTLY USED.              |
    |                   | This parameter is a multiplier for the base   |
    |                   | rate of scans from the (retired) cruft        |
    |                   | component of the MPTGS.                       |
    +-------------------+-----------------------------------------------+
    | experiment        | NOT USED - this was an ID to track different  |
    |                   | runs sharing a log prefix, but this is not    |
    |                   | currently used.                               |
    +-------------------+-----------------------------------------------+

Internal Websites
-----------------

It can be useful to add messages to the collected logs that mark the start of
the run, so that log messages falling outside the bounds of the run can be
ignored. The MPTGS will do an HTTP GET with a run starting marker to each
host listed in ``/usr/local/tg/etc/internal_sites.txt``.
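
The marker request described above can be pictured with the following minimal
sketch. It is not the MPTGS implementation: the one-host-per-line file
layout, the ``#`` comment handling, the ``/mptgs-run-start`` path, the
``run`` query parameter, and the ``marker_urls`` helper are all assumptions
made purely for illustration.

.. code-block:: bash

    # Hypothetical helper: print one marker URL per host in a sites file.
    # Assumes one host per line; blank lines and "#" comments are skipped.
    marker_urls() {
        sites_file=$1
        run_id=$2
        while IFS= read -r host || [ -n "$host" ]; do
            host=${host%%#*}                                 # drop any comment
            host=$(printf '%s' "$host" | tr -d '[:space:]')  # drop whitespace
            [ -z "$host" ] && continue
            # The path and query string are invented for this sketch.
            printf 'http://%s/mptgs-run-start?run=%s\n' "$host" "$run_id"
        done < "$sites_file"
    }

Each printed URL could then be fetched with a tool such as
``curl -s -o /dev/null``, which would leave an entry in that host's web
server access log that marks where the run began.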