3.1. Add a new site to the SI
This document describes how to add another site to the Synthetic Internet (SI). The site can be hosted on the existing SI VM or on a separate machine placed at an appropriate point in the network.
3.1.1. Get the site(s)
First, make sure you have permission to scrape/use the site.
Second, use wget or a similar tool to create a mirror of the site with a command like:
wget -m -k -K -E https://www.irs.gov
Be aware that this may take a very long time, depending on the size of the site you are mirroring.
Next, host the mirrored files: either on a new VM (see 3.1.2) or, more commonly, by updating the SI to include an entry for the new site (see 3.1.3).
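The mirroring step can be wrapped in two small shell helpers, sketched below under stated assumptions: mirror_site and mirror_dir are hypothetical names, not part of the SI, and mirror_dir assumes a URL of the form scheme://host/path with no explicit port or user name. It relies on wget's default behaviour of saving a recursive download under a directory named after the host, which is where the files for the later steps will be found.

```shell
#!/bin/sh
# Hypothetical helpers for the mirroring step (not part of the SI tooling).
# Source this file, then call mirror_site with the site's URL.

# Print the directory that `wget -m` creates for a URL. By default wget
# stores recursive downloads under a directory named after the host.
# Assumes a URL like scheme://host/path (no port or user info).
mirror_dir() (
    host=${1#*://}      # drop the scheme, e.g. "https://"
    host=${host%%/*}    # drop everything from the first "/"
    printf '%s\n' "$host"
)

# Mirror a site with the same options as the command above.
mirror_site() {
    # -m  mirror: recursive, unlimited depth, with timestamping
    # -k  convert links in downloaded pages for local viewing
    # -K  keep the unconverted original of each converted file as *.orig
    # -E  add .html/.css extensions where the URL lacks them
    wget -m -k -K -E "$1"
    # Keep wget's exit status (it may be non-zero if some links fail,
    # e.g. with 404s) but report the output directory either way.
    rc=$?
    printf 'Mirror of %s stored in ./%s\n' "$1" "$(mirror_dir "$1")"
    return "$rc"
}
```

For example, after sourcing the file, running mirror_site https://www.irs.gov leaves the copy in ./www.irs.gov.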
3.1.2. Hosting on a new VM
- Create or clone a VM to act as the web host for the new site; any web server will do.
- Copy the files to the web server's default document root, e.g., /var/www/html for Apache on RHEL-type systems.
Profit!
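On a RHEL-type VM running Apache, the new-VM route might look like the sketch below. It rests on assumptions the steps above do not state: the VM uses dnf, systemd and a running firewalld, SELinux is enforcing (so copied files need the httpd content label), the commands run as root, and copy_site and setup_web_host are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers for hosting a mirror on a fresh RHEL-type VM.
# Assumes dnf, systemd, firewalld and SELinux tools, and root privileges.

# Copy a mirrored site (e.g. ./www.irs.gov) into a document root.
# "src/." copies the directory's contents, including dotfiles,
# rather than the directory itself.
copy_site() (
    src=$1
    dest=${2:-/var/www/html}   # Apache's default DocumentRoot on RHEL
    mkdir -p "$dest"
    cp -R "$src"/. "$dest"/
)

# Install Apache, publish the mirror, and allow HTTP through the firewall.
setup_web_host() (
    set -e                      # stop at the first failing command
    dnf install -y httpd
    copy_site "$1" /var/www/html
    # Give the copied files the default label for web content so that
    # SELinux lets httpd read them.
    restorecon -R /var/www/html
    systemctl enable --now httpd
    firewall-cmd --permanent --add-service=http
    firewall-cmd --reload
)
```

For example, setup_web_host ./www.irs.gov publishes the mirror fetched in 3.1.1. Keeping copy_site separate makes the copy reusable for other document roots.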
3.1.3. Updating SI to include new site
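- Create a directory for the new site under /var/www/html.
- Copy the mirrored files into that directory.
- Edit /etc/httpd/conf.d/vhosts.conf and add a virtual host entry for the new site, with DocumentRoot pointing at the new directory:

    <VirtualHost hostname.domain.tld:80>
        ServerName hostname.domain.tld
        DocumentRoot /path/to/where/the/files/are
    </VirtualHost>

- Reload Apache so the change takes effect, e.g., systemctl reload httpd.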
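The SI update can be scripted in the same way. The sketch below assumes the RHEL paths used above, root privileges, SELinux tools, and that the site's hostname doubles as its directory name under /var/www/html; vhost_block and add_site are hypothetical helpers. It appends an entry in the same form as the template above, then runs apachectl configtest before reloading, so a typo in the new entry does not reach the running server.

```shell
#!/bin/sh
# Hypothetical helpers for adding a mirrored site to the SI's Apache.
# Assumes RHEL paths, SELinux tools and root privileges.

# Print a VirtualHost entry in the same form as the template above.
# Usage: vhost_block HOST DOCROOT
vhost_block() {
    printf '<VirtualHost %s:80>\n' "$1"
    printf '    ServerName %s\n' "$1"
    printf '    DocumentRoot %s\n' "$2"
    printf '</VirtualHost>\n'
}

# Usage: add_site HOST SRC_DIR
# Copies SRC_DIR to /var/www/html/HOST, registers a virtual host for HOST,
# checks the configuration and reloads Apache.
add_site() (
    set -e                                  # stop at the first failure
    host=$1
    src=$2
    docroot=/var/www/html/$host
    conf=/etc/httpd/conf.d/vhosts.conf

    mkdir -p "$docroot"
    cp -R "$src"/. "$docroot"/
    restorecon -R "$docroot"                # label files as web content
    vhost_block "$host" "$docroot" >> "$conf"
    # If the syntax check fails, set -e stops before the reload; the
    # faulty entry stays in vhosts.conf for you to fix.
    apachectl configtest
    systemctl reload httpd
)
```

For example, add_site www.irs.gov ./www.irs.gov publishes the mirror from 3.1.1 as its own virtual host.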