3.4. Spider

Spider is Skaion’s tool for spinning a new Web: it generates all of the parts required for a complete Skaion Synthetic Internet. These parts include DNS mapping IPs to host names with BIND zone files, content for an interconnected web, and the Apache configuration needed to serve those sites and content.

3.4.1. Installation

The following instructions install Spider on a minimal CentOS 8 system; since the scripts are generally Perl or Python, similar instructions should work on most GNU/Linux systems. A consolidated command sketch follows the list.

#. Install the following packages:

   ##. python2
   ##. tar
   ##. perl
   ##. vim
   ##. screen
   ##. gcc
   ##. bind (especially if content will be installed)
   ##. httpd (especially if content will be installed)
   ##. wget (especially if content will be installed)
   ##. bind-utils (especially if content will be installed)
   ##. telnet (especially if content will be installed)

#. Add the extra Perl packages using perl -MCPAN -e 'install <pkgname>' for

   ##. Math::Random
   ##. Sort::Array
   ##. Switch
   ##. DBI

#. Copy the source repo/package onto the host
#. cd into that directory, e.g., cd syntheticInternet
#. Untar Images.tgz: tar -xf Images.tgz
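The install steps above can be collapsed into a short shell session. The following is a minimal sketch, assuming dnf is the package manager (as on CentOS 8) and that the source package has already been copied to the host and unpacks into syntheticInternet; adjust the package manager and paths for other distributions:

    # Install OS packages (dnf assumed; substitute your distribution's package manager)
    dnf install -y python2 tar perl vim screen gcc \
        bind httpd wget bind-utils telnet

    # Add the extra Perl modules from CPAN
    perl -MCPAN -e 'install Math::Random'
    perl -MCPAN -e 'install Sort::Array'
    perl -MCPAN -e 'install Switch'
    perl -MCPAN -e 'install DBI'

    # Unpack the bundled images inside the source directory
    cd syntheticInternet
    tar -xf Images.tgz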

3.4.2. Operation

The following steps will create an entire Synthetic Internet with a defined number of sites that are cross-linked throughout the web content; a consolidated command sketch follows the list.

#. Produce a list of domain names for the script to use and store it in domains.txt

   ##. Get or make a list of words to use as parts of host names (one option is to search for lists of common nouns) and store them in a file, one entry per line
   ##. Generate a list of domain names to use. A sample Python script that reads a file listing name parts one per line and puts all of the hosts into a single TLD is available as make_names.py; it can be adapted to generate names with other characteristics as needed

#. Change to the spider subdirectory
#. Make the output directory for all the web content: mkdir web
#. Run spider.pl to create the web: spider.pl -b -m ../domains.txt 1G 250, using whatever size you want in place of 1G and the number of sites you want in place of 250 (though there must be enough names in domains.txt for all of them)
#. Generate the standard index pages for the SI: ./make-index-pages.sh web
#. If integrating with an existing SI (or SIs), get a list of all used IPs with a command similar to ip addr show | awk '/inet/ {print $2}' | cut -f1 -d/ > used_ips.txt
#. Generate a list of class C (/24) networks that new IPs will inhabit. A simple script to help with this is available and can be run as ./gen_c_list.py > new_addrs

   ##. This script takes as input a file listing network spaces that need to be avoided, either because the testbed wants to reserve them for other uses or because they are non-routable (like the 127.0.0.0/8 space)

#. Assuming the above commands (or similar) have been run to generate lists of the IPs used by MediaGoblin in mg.txt, the existing Synthetic Internet in si.txt, and the MPTGS clients in mptgs.txt, the following command will generate a file that can be used with the IP fakes system to load the addresses: ./gen_fakeips.pl -i mg.txt,si.txt,mptgs.txt new_addrs > avail_ips
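Tying the steps together, the following is a minimal sketch of the whole sequence, assuming it is run from the spider subdirectory with domains.txt one level up, and that mg.txt, si.txt, and mptgs.txt have already been gathered from the relevant hosts as described above; the size (1G), site count (250), and file names are illustrative:

    cd spider                      # the spider subdirectory of the source tree
    mkdir web                      # output directory for all the web content

    # Build the web: 1G of content across 250 sites (domains.txt must contain
    # at least as many names as the number of sites requested)
    ./spider.pl -b -m ../domains.txt 1G 250

    # Generate the standard index pages for the SI
    ./make-index-pages.sh web

    # When integrating with an existing SI, record the IPs already in use
    ip addr show | awk '/inet/ {print $2}' | cut -f1 -d/ > used_ips.txt

    # Pick /24 networks for the new hosts, avoiding reserved/non-routable space
    ./gen_c_list.py > new_addrs

    # mg.txt, si.txt, and mptgs.txt are assumed to list the IPs used by
    # MediaGoblin, the existing SI, and the MPTGS clients respectively
    ./gen_fakeips.pl -i mg.txt,si.txt,mptgs.txt new_addrs > avail_ips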