You Will Need #
- Discord (to get in touch with administrators and support).
- Some knowledge of HTML code.
- A web browser on your primary computer for testing, such as RetroZilla.
- Various web browsers to test recovered websites, as needed.
Create account #
- First, register for this website. To register, you will need to contact the Admins on Discord for an authorization key. Once you have the authorization key, click Register at the top of this page and enter the key when asked. Your account on the website allows you to write blog posts on restored web sites.
- Once registered on the website, go to appserv.protoweb.org/newuser.php and enter the same username and password there. Use the authorization key you received.
- Notify admins so your user account can be approved.
- Now you have created logins for the Website, the Development Portal and the Development Proxy Server (accessed via PuTTY). Congratulations!
Website access #
The website allows you to customize your user profile, write blog posts and edit pages.
- Click Login at the top of this page, and use your website credentials.
- If you are able to log on, your account has been created and approved.
Development portal access #
The development portal allows you to create new restoration projects as well as work on current projects. It also allows you to publish projects once they are completed. The development portal can also be used to modify pre-existing websites. The administrators will be able to assist with access requests.
- Open appserv.protoweb.org
- Log in with your username and password.
- If you are able to log on, your account is now working with the development portal and you will be able to start creating restoration projects.
- For more information on how to use the Development Portal, see this article.
Development proxy server access #
- Install and run the latest version of PuTTY.
- In PuTTY, under Host Name, type in appserv.protoweb.org. Under Port, type in 2269.
- On the navigation bar to the left, change to SSH and select the subcategory Tunnels.
- For Source Port, type in 8080.
- For Destination, type in the IP address of the development proxy server. It changes over time, so you will need to confirm it with one of the admins. Ask for it in the Discord channel. Type in the address in the following format: <IP ADDR>:8080, where <IP ADDR> is the current IP address of the development server. So if the server IP is 126.96.36.199, then enter 126.96.36.199:8080 here.
- Make sure the option Local is selected, then click Add.
- On the navigation bar to the left, go back to the Session at the very top.
- You may now save your settings by typing in a name under Saved Sessions and clicking Save. This way you do not have to type everything in again.
- Click Open to log in. Type in your username and password.
- Once you have logged in, you will be greeted with a simple welcome screen with instructions on how to set up a tunnel, if you have not already done so.
- KEEP THIS WINDOW OPEN. It is needed to keep the SSH tunnel active.
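For readers who do not use PuTTY, the same tunnel can probably be expressed as a single OpenSSH command. The sketch below only builds that command from the values in the steps above; it assumes the gateway accepts a standard OpenSSH client the same way it accepts PuTTY, which this guide does not confirm. The username `alice` and the address `192.0.2.10` are placeholders, not real values.

```python
import shlex


def tunnel_command(user, dev_ip, gateway="appserv.protoweb.org",
                   ssh_port=2269, local_port=8080, remote_port=8080):
    """Build an OpenSSH command mirroring the PuTTY tunnel settings.

    local_port  corresponds to PuTTY's "Source Port".
    dev_ip:remote_port corresponds to PuTTY's "Destination".
    """
    forward = f"{local_port}:{dev_ip}:{remote_port}"
    # -p selects the SSH port, -L sets up a local port forward.
    return ["ssh", "-p", str(ssh_port), "-L", forward, f"{user}@{gateway}"]


if __name__ == "__main__":
    # "alice" and 192.0.2.10 are placeholders; ask the admins for the real IP.
    print(shlex.join(tunnel_command("alice", "192.0.2.10")))
```

As with PuTTY, the SSH session started by this command has to stay open for as long as the tunnel is needed.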
Using the development proxy server #
- Once you have connected to the development proxy server via PuTTY (see above), you can start using it.
- Install and open the web browser you will use for testing. For the purposes of this tutorial, I recommend installing RetroZilla.
- Run the browser, and set your proxy parameters as follows:
- HTTP proxy: localhost, Port 8080
- FTP proxy: localhost, Port 8080
- This will set the proxy server to the development server (your localhost:8080 was redirected earlier in this process by PuTTY), so you can use the advanced features that the development server offers.
Note! Each time you wish to connect to the Development Server, you will need to first connect with PuTTY via SSH. If you saved the connection profile in PuTTY, this should be a straightforward process.
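If the browser cannot reach anything through the proxy, it can help to check whether the tunnel is actually listening before changing browser settings. The following is a minimal sketch using only the Python standard library. It assumes the tunnel was set up on localhost:8080 as described above; the function names are made up for this example.

```python
import socket
import urllib.request

PROXY_HOST = "localhost"
PROXY_PORT = 8080  # the "Source Port" chosen in the PuTTY tunnel settings


def tunnel_is_listening(host=PROXY_HOST, port=PROXY_PORT, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def dev_proxy_opener(host=PROXY_HOST, port=PROXY_PORT):
    """Build a urllib opener that sends HTTP and FTP requests via the tunnel."""
    proxy = f"http://{host}:{port}"
    handler = urllib.request.ProxyHandler({"http": proxy, "ftp": proxy})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    if tunnel_is_listening():
        print("Tunnel is up; point the browser at localhost:8080.")
    else:
        print("Nothing is listening on localhost:8080; open the PuTTY session first.")
```

The opener returned by `dev_proxy_opener()` uses the same two proxy settings as the browser, so it can be used to fetch a page of an unpublished site from a script while the tunnel is open.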
How to add new web sites to Protoweb #
The standard workflow is as follows:
- Decide which website you’d like to archive. Follow the Content Guidelines to determine if the website you are planning to restore fits our guidelines.
- Go to the Internet Archive Wayback Machine and type in a web site name. You will be presented with the Wayback Machine’s Timeline. Browse the timeline to track down a good copy of the website domain. You will need to find a home page with the fewest broken links, missing pictures, and other artifacts. Write down the timestamp of the website copy you’ve chosen. You will use it as the basis for the mirror operation of the Protoweb archiver.
- Open the Site Rebuilder’s Toolbox (see Wiki manual on this site for in-depth instructions) and select “Archive web site”.
- If, for some reason, you are having trouble opening the portal website, please refer to the Wiki manual for troubleshooting steps.
- Under Domain Name to Save, type in the website domain in the format “www.example.com”.
- Under Targeted Date, type in the timestamp you noted earlier in step 2.
- Under Link Traversal Depth, select a link depth that is realistic for the website. For example, if it’s a large website, with relevant pages 6 links deep starting from the home page, you will select 6. This will mirror all pages up to 6 links deep starting from the home page. It could take days to complete with higher values, so always choose an appropriate depth for the website you are mirroring. Usually the default 6 is fine for most sites. If the website has an initial landing page or a “welcome” page with just one link to enter the main site, you may add +1 to the depth value of your choice.
- Click Start Archiver. If everything checks out, this will start a job on the server which mirrors the given domain at the given timestamp from the Internet Archive.
- During the mirror process, you will see your page appear under “Running Jobs”.
- Once archiving is completed, you will see it appear under “My Sites in Development”.
- If the archive job failed for whatever reason, feel free to delete it and start over.
- Your website is not yet publicly available. It will be when you publish the site, once you are ready to do so – see below.
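Two of the inputs above can be illustrated with a short sketch. Wayback Machine snapshot URLs carry a 14-digit timestamp (YYYYMMDDhhmmss) directly after `/web/`, so the timestamp can be copied from the address bar or extracted as shown. The second function models link depth as a breadth-first walk in which the home page counts as depth 0; whether the Protoweb archiver counts depth exactly this way, and which timestamp format its form accepts, are assumptions. The function names and sample pages are invented for the example.

```python
import re
from collections import deque

# Snapshot URLs look like https://web.archive.org/web/<14 digits>/<original URL>;
# the optional letters after the digits cover modifiers such as "id_" or "im_".
WAYBACK_URL = re.compile(r"^https?://web\.archive\.org/web/(\d{14})[a-z_]*/(.+)$")


def split_wayback_url(url):
    """Return (timestamp, original_url) from a Wayback Machine snapshot URL."""
    match = WAYBACK_URL.match(url)
    if match is None:
        raise ValueError(f"not a Wayback snapshot URL: {url}")
    return match.group(1), match.group(2)


def pages_within_depth(links, start, depth):
    """Breadth-first walk: all pages reachable in at most `depth` link hops."""
    seen = {start: 0}  # page -> number of hops from the start page
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if seen[page] == depth:
            continue  # depth limit reached; do not follow this page's links
        for target in links.get(page, []):
            if target not in seen:
                seen[target] = seen[page] + 1
                queue.append(target)
    return set(seen)


if __name__ == "__main__":
    # A tiny hypothetical site whose home page is a "welcome" page with one link.
    site = {"/": ["/enter"], "/enter": ["/news", "/about"], "/news": ["/news/1"]}
    print(sorted(pages_within_depth(site, "/", 2)))
    print(sorted(pages_within_depth(site, "/", 3)))
```

In the sample site, the welcome page uses up one hop before the real content starts, which is why adding one to the depth is suggested for sites with a landing page.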
How to edit the imported web site #
The standard workflow is as follows:
- Once you have mirrored a web site, it will appear as a line under “My Sites in Development”. You are now ready to start editing and repairing the mirrored web site. Next to your completed archival job, you will see the “Edit” button. This button takes you to the File Manager so you can access the files of the web site.
- The File Manager allows you to copy, move, create new files and edit files in place. It has an advanced Syntax Highlighting text editor and recognizes several programming languages. The File Manager supports file compression and bulk operations.
- Any changes you make can be immediately previewed using the Development Proxy Server, as long as you have set it up (see above section). The Development Proxy Server works just like all the other production proxy servers, with a couple of notable differences:
- The development server allows full access to browse an unpublished web site. The regular servers do not allow browsing unpublished web sites.
- The development server provides more information on the nature of errors your web site might encounter. This is useful for debugging your site.
- To further aid debugging, you will have access to the server logs. To access these, go to the iNode gateway and click Server Stats, then Development Debug Log.
- The development server does not throttle download speeds.
- When you are browsing your site, and you encounter a page or image that is missing, the development server will try to locate and download the missing document from the Internet Archive on the fly. This may be useful if you are validating the links on a page and notice missing pages – automatic downloading from Internet Archive assists you so you do not have to manually upload the missing file using the File Manager.
- The development server can also source missing files from an internal file repository.
- When testing your site, the development server highlights missing images with an ugly pink image so it is readily apparent what images are missing or broken.
- Use the File Manager and RetroZilla together to adjust, repair and fix a web site. Pay close attention to broken links and missing images.
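When checking a page for broken links and missing images, it can be useful to list everything the page refers to before clicking through it in the browser. The sketch below is not part of the Protoweb tooling; it is a small standard-library helper, with invented names, that collects link and image references from a page's HTML so each one can then be checked on the development server.

```python
from html.parser import HTMLParser


class ReferenceCollector(HTMLParser):
    """Collect the targets of <a href=...> and <img src=...> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so old-style
        # upper-case markup such as <A HREF=...> is handled as well.
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])


def collect_references(html_text):
    """Return (links, images) found in the given HTML text."""
    collector = ReferenceCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links, collector.images


if __name__ == "__main__":
    sample = '<A HREF="news.html">News</A> <IMG SRC="images/logo.gif">'
    links, images = collect_references(sample)
    print("links:", links)
    print("images:", images)
```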
How to publish your web site #
The standard workflow is as follows:
- If this is your first web site restoration, you will probably want to contact the Admins on Discord to vet your page before publishing it. They will be happy to offer guidance on what to fix and how to tackle some more complicated issues the web site may have.
- Give your web site one final glance. Make sure there are no broken links or missing images.
- Once you are done with the site, return to the Site Rebuilder’s Toolbox list of pages and, next to your page, hit the “Publish” button. The website will be marked as published and will be viewable by everyone immediately.
- If you have signed up on the Protoweb.org web site, you can now write a blog entry of your restoration efforts and show off your work! Let the Admins know, and they will also be happy to feature the page on social media.
Questions Answered #
Q: How do I know if my job has completed or failed?
A: You can view running jobs in the Site Rebuilder’s Toolbox. The job logs of your project will usually indicate whether a failure has occurred. If you made an error, you can delete the running job and start over.
Q: What if a recovered site is broken?
A: This depends. If the page is too far gone and you cannot reconstruct the start page, we recommend you delete the site and find an alternative date with fewer broken links or images. If most of the pages are fine, you can fix some problems on a site manually, and missing graphics can be reconstructed. Sometimes you may find that a file or a graphic is missing but an alternative copy is available on archive.org or somewhere else on the net. In this case, you can use the Upload URL feature in the File Manager, which fetches a file from the Internet to the directory you specify. If portions of the website are not available anywhere, the links leading to the broken areas may be commented out so that the user is not presented with broken links. Do not delete the HTML code; leave it in place, commented out. The hope is that eventually some broken areas can be restored with new restoration techniques.
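Commenting out a link by hand is easy in the File Manager's editor, but if the same dead link appears many times on a page, a small script can do it on a local copy. The following is a rough sketch under stated assumptions: it uses a regular expression rather than a full HTML parser, it assumes the anchor text does not itself contain `--` (which would break the HTML comment), and it does not check whether a link is already commented out. The function name is invented.

```python
import re


def comment_out_links(html_text, broken_href):
    """Wrap every <a> element pointing at broken_href in an HTML comment."""
    pattern = re.compile(
        r"<a\b[^>]*\bhref\s*=\s*([\"']?)"   # opening tag up to the href value
        + re.escape(broken_href)
        + r"\1(?=[\s>])[^>]*>"              # matching quote, rest of the tag
        + r".*?</a\s*>",                    # link text and closing tag
        re.IGNORECASE | re.DOTALL,
    )
    # Keep the original markup intact inside the comment so it can be
    # restored later if the missing area is recovered.
    return pattern.sub(lambda m: "<!-- " + m.group(0) + " -->", html_text)


if __name__ == "__main__":
    page = ('<li><a href="news.html">News</a></li>'
            '<li><A HREF="lost/area.html">Lost area</A></li>')
    print(comment_out_links(page, "lost/area.html"))
```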
Q: In the logs, it looks like archiving has slowed down. Is it stuck?
A: Toward the end of an archival process, the archiver goes through a lot of files and links trying to find any files that it may have missed. This is probably what you are seeing. Give it time – it will complete.
Q: I cannot access the web site I just crawled on the development server. The job is Complete, but I’m always getting a 500 server error!
A: The development server expects exact addresses. “www.site.com” is not the same as “site.com”, so make sure you are accessing the site with the URL you mirrored. In other words, if you mirrored “www.site.com”, access the site as “www.site.com”. If you mirrored “site.com”, access the site as “site.com”. Redirects are added only after publishing, so that “site.com” goes to the primary site “www.site.com” or vice versa.
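The rule can be pictured as a plain string comparison on the host part of the URL. This sketch only illustrates the idea; how the development server actually matches hosts is not documented here, and the function name is invented.

```python
from urllib.parse import urlsplit


def matches_mirrored_host(url, mirrored_host):
    """Return True only if the URL's host is exactly the mirrored domain."""
    # urlsplit lowercases the hostname; no "www." is added or removed.
    host = urlsplit(url).hostname or ""
    return host == mirrored_host.lower()


if __name__ == "__main__":
    print(matches_mirrored_host("http://www.site.com/index.html", "www.site.com"))
    print(matches_mirrored_host("http://site.com/index.html", "www.site.com"))
```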
Q: I prefer working on site files on my own computer with my own file editors. Is this possible?
A: Yes! Once you have archived a website, go back to the job queue and click “Edit” to access the File Manager. Inside the File Manager, you can compress a website into a zip file by selecting all files and then choosing “Zip”. This starts a background process, and the File Manager will be unavailable until the compression is done. Zipping may take a few minutes. Once the process is complete, the file will appear in the active directory. You may then download the zip file and unzip it on your computer to work on the files. Once you are done, create another zip archive and upload it back to the server. Then, inside the File Manager, open the archive you uploaded and click UnZip. You do not need to zip up a site every time you make a change. You may also upload individual edited files if you prefer. If you’d like to test the site on your local computer, you may use a locally running HTTP server, or you can upload files to the development server and test the site there.
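On the local side, the unzip and re-zip steps can be done with any archive tool. As one possibility, here is a minimal sketch using Python's standard `zipfile` module; the function names and paths are placeholders, and it assumes the File Manager accepts an ordinary zip file with paths relative to the site root.

```python
import zipfile
from pathlib import Path


def unpack_site(zip_path, work_dir):
    """Extract a site archive downloaded from the File Manager."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(work_dir)


def pack_site(work_dir, zip_path):
    """Zip the edited files again, with paths relative to work_dir.

    Keep zip_path outside work_dir, otherwise the archive would try to
    include itself.
    """
    work_dir = Path(work_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(work_dir.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(work_dir).as_posix())


if __name__ == "__main__":
    # Placeholder file names; adjust to wherever the download was saved.
    if Path("site.zip").exists():
        unpack_site("site.zip", "site_work")
        pack_site("site_work", "site_edited.zip")
```

For local testing, Python's built-in `http.server` module is one way to run a simple HTTP server from the unpacked directory.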
Q: I would like to back up the site I crawled. Is that possible?
A: Yes, you can always back up your entire site, even after you fix and edit it. Just log on to the Site Rebuilder’s Toolbox, go to the File Manager of your website using the Edit button, select all files, and click the ZIP or TAR button to create an archive of the selected files. You can then download the archive to your computer.
Q: Can I capture specific URLs such as website subdirectories or specific files?
A: This feature is planned but not currently available. If you need to add specific files to an existing site, you can upload them using the File Manager, or use the Upload URL feature in the File Manager to fetch a working copy of the file from a link. If you need further assistance, contact one of the admins on Discord, and they will be able to modify the site files any way you need.