First things first…
You Will Need
- Discord (to stay in touch with the administrators and get support; see the bottom of the page for a link to Discord).
- A web browser for testing that runs on your primary computer, such as RetroZilla.
- PuTTY or another SSH client with tunneling support.
- Various web browsers to test recovered websites, as needed.
Create account
- First, contact an admin on Discord for an authorization key, which you will need in order to register. Once you have the key, create your user account at https://protoweb.org/register/ (or click Register at the top of this page) and enter the authorization key when asked. Your account on the website allows you to write blog posts about restored web sites.
- Once registered on the website, go to appserv.protoweb.org/newuser.php and enter the same username and password, along with the authorization key you received earlier.
- Let one of the admins know that you have registered so that your user account can be approved and activated.
- Now you have created logins for the Website, the Development Portal and the Development Proxy Server (accessed via PuTTY). Congratulations!
Website access
This website allows you to customize your user profile, write blog posts and edit pages.
- Click Login at the top of this page, and use your website credentials.
- If you can log in, your account has been created and successfully approved.
Development portal access
The development portal allows you to create new restoration projects as well as work on current projects. It also allows you to publish projects once they are completed. The development portal can also be used to modify pre-existing websites. The administrators will be able to assist with access requests.
- Open appserv.protoweb.org
- Log in with your username and password.
- If you can log in, your account is working with the development portal and you can start creating restoration projects.
- For more information on how to use the Development Portal, see this article.
Using the development proxy server
- Log in to the Development Portal. Once you have logged in, you can start using the Development Proxy Server.
- If you haven’t done so already, install the web browser you will use for testing. On a modern Windows computer, we recommend installing RetroZilla or using an emulator with a suite of web browsers, such as the Windows 98 Protoweb installation available here. You can also test on real hardware with a vintage computer; in that case we usually recommend Internet Explorer 5.x and Netscape Navigator 4.79. Installation files for both browsers can be obtained from Protoweb.
- Run the browser and set your proxy parameters as follows:
  - HTTP proxy: wayback.protoweb.org, Port 27851
  - FTP proxy: wayback.protoweb.org, Port 27851
  - Gopher proxy: wayback.protoweb.org, Port 27851
- This routes your browser through the development server, so you can use the advanced features it offers.
Note! If you receive the error “401 – Unauthorized”, your IP address has probably changed, or you are using a VPN, and the server no longer recognizes your connection attempts. Log in to the Development Portal again to update your IP address. If you are using a VPN, you may need to disconnect from it, log in to the Development Portal, and then try again.
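If you would rather script quick checks against the development server than click through a browser, the same proxy settings also work from Python's standard library. The sketch below is a minimal example and not an official Protoweb tool. It assumes the development proxy accepts ordinary HTTP proxy requests once your IP address has been registered by logging in to the Development Portal. The names `DEV_PROXY`, `make_dev_opener` and `fetch_via_dev_proxy` are made up for illustration. Gopher is left out because Python's `urllib` has no gopher support.

```python
import urllib.error
import urllib.request

# Development proxy from the settings above; HTTP and FTP use the same host and port.
DEV_PROXY = "http://wayback.protoweb.org:27851"


def make_dev_opener(proxy_url: str = DEV_PROXY) -> urllib.request.OpenerDirector:
    """Build an opener that sends http:// and ftp:// requests through the proxy."""
    proxies = {"http": proxy_url, "ftp": proxy_url}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


def fetch_via_dev_proxy(url: str, timeout: float = 30.0) -> bytes:
    """Fetch a URL through the development proxy and return the response body."""
    opener = make_dev_opener()
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 401:
            # The proxy did not recognize this IP address (see the note on 401 errors).
            raise RuntimeError(
                "401 Unauthorized: log in to the Development Portal again, "
                "and disconnect from any VPN if the error persists"
            ) from err
        raise
```

For example, `fetch_via_dev_proxy("http://www.example.com/")` would return the raw HTML of a hypothetical mirrored site; use the exact domain you mirrored.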
How to add new web sites to Protoweb
The standard workflow is as follows:
- Decide which website you’d like to archive. Follow the Content Guidelines to determine if the website you are planning to restore fits our guidelines. If you are unsure, contact the Administrators. We would be happy to help!
- Go to the Internet Archive Wayback Machine and type in a web site name. You will be presented with the Wayback Machine’s Timeline. Browse the timeline to track down a good copy of the website: look for a home page with the fewest broken links, missing pictures, and other artifacts. Write down the timestamp of the copy you’ve chosen; the Protoweb archiver will use it as the basis for the mirror operation.
- Open the Site Rebuilder’s Toolbox (see the Wiki manual on this site for in-depth instructions) and select “Archive web site”.
- If for some reason you have trouble opening the portal website, refer to the Wiki manual for troubleshooting steps.
- Under Domain Name to Save, type in the website domain in the format “www.example.com”.
- Under Targeted Date, type in the timestamp you had noted earlier in step 2.
- Under Link Traversal Depth, select a link depth that is realistic for the website. For example, if it’s a large website with relevant pages 6 links deep from the home page, select 6; this mirrors all pages up to 6 links deep from the home page. Higher values can take days to complete, so always choose a depth appropriate for the website you are mirroring. The default of 6 is fine for most sites. If the website has an initial landing page or “welcome” page with just one link into the main site, you may add 1 to the depth you chose.
- Click Start Archiver. If everything checks out, this will start a job on the server which mirrors the given domain at the given timestamp from the Internet Archive.
- During the mirror process, your job will appear under “Running Jobs”.
- Once archiving is complete, it will appear under “My Sites in Development”.
- If the archive job failed for whatever reason, feel free to delete it and start over.
- Your website is not yet publicly available; it will be once you publish it (see below).
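To picture what the Link Traversal Depth setting counts, and why a welcome page calls for one extra level, think of depth as the number of clicks from the starting page. The sketch below is a toy breadth-first model and not the archiver's actual code; it assumes the starting page is depth 0 and that each followed link adds 1. The page names are hypothetical.

```python
from collections import deque


def pages_within_depth(links: dict[str, list[str]], start: str, max_depth: int) -> dict[str, int]:
    """Breadth-first walk from the start page, recording each page's link depth.

    The start page is depth 0, pages it links to are depth 1, and so on.
    Links found on pages already at max_depth are not followed.
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(page, []):
            if target not in depth:  # first visit is always the shortest path in BFS
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical site: a welcome page with a single "Enter" link in front of the real home page.
site = {
    "welcome.html": ["index.html"],
    "index.html": ["news.html", "about.html"],
    "news.html": ["news/1998.html"],
    "news/1998.html": ["news/1998/jan.html"],
}

print(sorted(pages_within_depth(site, "welcome.html", 3)))
print(sorted(pages_within_depth(site, "welcome.html", 4)))
```

With the welcome page in front, `news/1998/jan.html` sits 4 clicks from the start, so a depth of 3 misses it while a depth of 4 picks it up. Each extra level can also multiply the number of pages visited, which is why high values can take days.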
How to edit the imported web site
The standard workflow is as follows:
- Once you have mirrored a web site, it will appear as a line under “My Sites in Development”. You are now ready to start editing and repairing the mirrored web site. Next to your completed archival job is the “Edit” button, which takes you to the File Manager so you can access the web site’s files.
- The File Manager allows you to copy, move, create new files and edit files in place. It has an advanced Syntax Highlighting text editor and recognizes several programming languages. The File Manager supports file compression and bulk operations.
- Any changes you make can be previewed immediately using the Development Proxy Server, as long as you have set it up (see the section above). The Development Proxy Server works just like the production proxy servers, with several notable differences:
  - The development server allows full access to browse an unpublished web site. The regular servers do not allow browsing unpublished web sites.
  - The development server provides more information about the errors your web site might encounter, which is useful for debugging your site.
  - To further aid debugging, you have access to the server logs: go to the iNode gateway and click Server Stats, then Development Debug Log.
  - The development server does not throttle download speeds.
  - When you are browsing your site and encounter a page or image that is missing, the development server will try to locate and download the missing document from the Internet Archive on the fly. This is useful when you are validating the links on a page: missing pages are downloaded automatically, so you do not have to upload them manually with the File Manager.
  - The development server can also source missing files from an internal file repository.
  - When you test your site, the development server highlights missing images with an ugly pink image, so it is readily apparent which images are missing or broken.
- Use the File Manager and RetroZilla together to adjust, repair and fix a web site. Pay close attention to broken links and missing images.
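When checking a page for broken links and missing images, it can help to list every link and image the page references. Below is a minimal sketch using Python's built-in `html.parser`. The class and function names are hypothetical, and it only collects `<a href>` and `<img src>` URLs; frames, image maps and background images are not covered. Each collected URL could then be requested through the development proxy to see which ones fail.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

CHECKABLE_SCHEMES = {"http", "https", "ftp"}


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href> and <img src> in one HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self.images: list[str] = []

    def _resolve(self, value):
        """Make a URL absolute; return None for mailto:, javascript: and similar."""
        url = urljoin(self.base_url, value)
        return url if urlsplit(url).scheme in CHECKABLE_SCHEMES else None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "a" and attrs.get("href"):
            url = self._resolve(attrs["href"])
            if url:
                self.links.append(url)
        elif tag == "img" and attrs.get("src"):
            url = self._resolve(attrs["src"])
            if url:
                self.images.append(url)


def collect_links(html: str, base_url: str) -> tuple[list[str], list[str]]:
    """Return (links, images) referenced by a page located at base_url."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, parser.images
```

Old pages often use uppercase tags such as `<A HREF=...>`; the parser normalizes those, so they are collected too.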
How to restore and publish your web site
The standard workflow is as follows:
- If you are new, it is always a good idea to contact the Admins on Discord to let them know which website you are planning to restore. They will help you identify possible issues before you get too involved with the website.
- When you have finished restoring a website, contact the Admins to assess it. This way, issues can be caught and addressed before the site goes live. If you are a new team member, the Admins will publish the website for you. Once we see that the websites you restore match or exceed our expectations with regard to our Content Guidelines, we will give you permission to publish sites on your own. This is part of how we guide new team members toward successful website restorations. Admins are more than happy to offer guidance on which issues to fix and how to fix them, and can give step-by-step instructions for some of the more complicated issues you may encounter in your web site projects. Feel free to reach out!
- When you think your website is ready, give it one final glance. Make sure there are no broken links or missing images. If you are unable to fix certain pages or images, you might opt to comment out links to missing pages and images. Again, this is a good time to ask for assistance in the #admin-chat channel in Discord. Other users in the channel will be happy to give you guidance and help you check your website before it goes live.
- Write a Protoblog entry about your website when it is almost ready for release. The entry can be short or long. Try to at least give a general description of the site. If you want, you can also elaborate with more pictures, what the site means to you, what hurdles you overcame, its notable features, and the pages on it that you think might be interesting. This helps Protoweb users see which pages are coming online, and helps us describe the pages when we make announcements in the #announcements Discord channel.
- Once you are done with the site and have written a ProtoBlog entry, return to the Development Portal list of pages. If you are a new team member, ask an admin to publish your page. If you have been given publish rights, click the Publish button. After you confirm, the website will be marked as published and will be viewable by everyone on Protoweb immediately.
- Whether you are a new or experienced team member, please remember to let the admins know that you have published a website! This way we can add the website to the iNode website directory, add it to search indexing, and announce your new publication to all Protoweb users on social media.
- If you have signed up on the Protoweb.org website, we highly encourage you to write a blog entry of your restoration efforts so that you can show off your work and let everyone know that your new website is accessible. If you want, let one of the administrators know, and they will be happy to feature your work on social media.
How to create a Protoblog post
Once you have published a website, we ask you to write something about the website you restored and your work restoring it. This gives you a platform to show the work and care you put into the restoration, and it also serves as an announcement to the community that a new website is available on Protoweb, so that others will know about it and visit.
- First log on to Protoweb’s website using the link at the top of the page.
- When logged on, you should see a Contributor action menu on the top of the page.
- Now you can choose to submit a contribution which will open up a form where you can type in a freeform article about your web site restoration.
- From here, you can type up your article. To upload more than one image, first select a featured image of the website, then under the text editor you’ll be able to paste more pictures if needed.
Questions Answered
Q: How do I know if my job has completed or failed?
A: You can view running jobs in the Development Portal; the job logs of your project will usually indicate whether a failure has occurred. If you made an error, you can delete the running job and start over.
Q: What if a recovered site is broken?
A: It depends. If the page is too far gone and you cannot reconstruct the start page, we recommend deleting the site and finding an alternative date with fewer broken links or images. If most of the pages are fine, you can fix some problems on the site manually, and missing graphics can be reconstructed. Sometimes a missing file or graphic is available from an alternative source on archive.org or elsewhere on the net. In this case you can use the Upload URL feature in the File Manager, which fetches a file from the Internet into the directory you specify. If portions of the website are not available anywhere, the links leading to the broken areas may be commented out so that users are not presented with broken links. Leave the HTML code in place, but comment it out: the hope is that some broken areas can eventually be restored with new restoration techniques.
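Commenting out dead links by hand works fine for a few pages; for many pages, a small script can help. The sketch below is a hypothetical helper built on a simple regular expression, so it assumes anchors are not nested and `href` values are quoted; review its output before uploading. It wraps the whole `<a ...>...</a>` element, link text included, in an HTML comment and leaves working links untouched.

```python
import re

# Matches <a ... href="..." ...>text</a>; assumes quoted hrefs and no nested anchors.
ANCHOR_RE = re.compile(
    r"""<a\s[^>]*?href\s*=\s*(["'])(?P<href>.*?)\1[^>]*>.*?</a>""",
    re.IGNORECASE | re.DOTALL,
)


def comment_out_links(html: str, missing: set[str]) -> str:
    """Wrap every anchor whose href is in `missing` inside an HTML comment."""

    def replace(match: re.Match) -> str:
        if match.group("href") not in missing:
            return match.group(0)  # working link: leave it as it is
        # "--" is not allowed inside an HTML comment, so split it up.
        body = match.group(0).replace("--", "- -")
        return f"<!-- {body} -->"

    return ANCHOR_RE.sub(replace, html)
```

Because `--` is not allowed inside an HTML comment, any `--` in the commented-out markup becomes `- -`; keep that in mind if you later restore the link.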
Q: In the logs, it looks like archiving has slowed down. Is it stuck?
A: Toward the end of a mirroring process, the archiver goes through a lot of files and links trying to find any files that it may have missed. This is probably what you are seeing. This is by design. Give it time – the process will complete eventually.
Q: I cannot access the web site I just crawled on the development server. The job is Complete, but I’m always getting a 500 server error!
A: The development server expects exact addresses: “www.site.com” is different from “site.com”. Make sure you are accessing the site with the URL you mirrored. If you mirrored “www.site.com”, access the site as “www.site.com”; if you mirrored “site.com”, access it as “site.com”. Redirects are added only after publishing, so that “site.com” goes to the primary site “www.site.com” or vice versa.
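Before digging into a 500 error, it can be worth checking which host name you are actually requesting. This small sketch (the function name and messages are made up) compares a URL's host with the domain you entered under Domain Name to Save and flags the “www.” mix-up described above.

```python
from urllib.parse import urlsplit


def check_dev_url(url: str, mirrored_domain: str) -> str:
    """Compare a URL's host with the exact domain that was mirrored."""
    host = urlsplit(url).hostname or ""  # .hostname is already lowercased
    mirrored = mirrored_domain.lower()
    if host == mirrored:
        return "ok"
    if host.removeprefix("www.") == mirrored.removeprefix("www."):
        # Same site, other variant: this only works after publishing adds redirects.
        return f"use {mirrored} instead of {host} on the development server"
    return "different site"
```

For instance, checking `http://site.com/` against a mirror of `www.site.com` reports that `www.site.com` should be used instead.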
Q: I prefer working on site files on my own computer with my own file editors. Is this possible?
A: Yes! Once you have archived a website, go back to the job queue and click “Edit” to access the File Manager. Inside the File Manager, you can compress a website into a zip file by selecting all files and choosing “Zip”. This starts a background process, and the File Manager will be unavailable until compression is done, which may take a few minutes. Once the process is complete, the file will appear in the active directory. You can then download the zip file and unzip it on your computer to work on the files. When you are done, create another zip archive and upload it back to the server; then, inside the File Manager, open the archive you uploaded and click UnZip. You do not need to zip up the site every time you make a change; you can also upload individual edited files if you prefer. To test the site on your local computer, you can use a locally running HTTP server, or upload the files to the development server and test the site there.
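If you edit locally, it is a good idea for the archive you upload to store paths relative to the site's top-level folder rather than your computer's full path, so the folder layout survives the round trip. A minimal sketch using Python's standard `zipfile` module; the function name and folder names are hypothetical, and it assumes the File Manager's UnZip keeps the stored relative paths.

```python
import zipfile
from pathlib import Path


def zip_site(site_dir: Path, zip_path: Path) -> list[str]:
    """Zip every file under site_dir, storing paths relative to site_dir.

    Returns the archive member names. The zip file itself is skipped if it
    happens to sit inside site_dir.
    """
    site_dir = Path(site_dir)
    zip_path = Path(zip_path)
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(site_dir.rglob("*")):
            if path.is_file() and path.resolve() != zip_path.resolve():
                arcname = path.relative_to(site_dir).as_posix()  # e.g. "images/logo.gif"
                zf.write(path, arcname)
                names.append(arcname)
    return names
```

For example, `zip_site(Path("www.example.com"), Path("upload.zip"))` would store `index.html` and `images/logo.gif` rather than the full local paths; presumably the archive should then be unzipped in the site's top-level folder so the paths line up.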
Q: I would like to back up the site I crawled. Is that possible?
A: Yes, you can always back up your entire site, even after you fix and edit it. Log on to the Site Rebuilder’s Toolbox, open your website’s File Manager using the Edit button, select all files, and click the ZIP or TAR button to create an archive of the selected files. You can then download the archive to your computer.
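The ZIP and TAR buttons in the File Manager are the way to back up the copy on the server. If you also keep a working copy on your own computer, a dated local archive is easy to make. This sketch uses Python's standard `tarfile` module; the function name and the naming scheme are only suggestions.

```python
import tarfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def backup_site(site_dir: Path, backup_dir: Path, now: Optional[datetime] = None) -> Path:
    """Write site_dir to a dated .tar.gz in backup_dir and return the archive path."""
    site_dir = Path(site_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    archive = backup_dir / f"{site_dir.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tf:
        # arcname stores the site folder as the top-level entry, not the full local path
        tf.add(site_dir, arcname=site_dir.name)
    return archive
```

A timestamp in the file name keeps several backups side by side, so you can go back to the state before a risky bulk edit.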
Q: Can I capture specific URLs, such as website subdirectories or specific files?
A: While this feature is planned to be added in the future, it is not currently available. If you need to add specific files to an existing site, you can upload them using the File Manager, or use the Upload URL feature in the file manager to upload a link to a working file. If you need further assistance, contact one of the admins in Discord, and they will be able to modify the site files any way you need.
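Both the timestamp you note in the Wayback Machine and the links you might use with Upload URL are Wayback addresses of the form `https://web.archive.org/web/<timestamp>/<original URL>`, where the timestamp is 14 digits (`YYYYMMDDhhmmss`). Adding `id_` right after the timestamp asks the Wayback Machine for the file as originally captured, without its toolbar or rewritten links, which can be handy when fetching a single file. The helpers below are hypothetical, and whether Upload URL needs the `id_` form is an assumption worth confirming with an admin.

```python
import re
from datetime import datetime

WAYBACK_PREFIX = "https://web.archive.org/web/"


def parse_timestamp(ts: str) -> datetime:
    """Validate a 14-digit Wayback timestamp (YYYYMMDDhhmmss) and return it as a datetime."""
    if not re.fullmatch(r"[0-9]{14}", ts):
        raise ValueError(f"expected 14 digits (YYYYMMDDhhmmss), got {ts!r}")
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def timestamp_from_wayback_url(url: str) -> str:
    """Pull the timestamp out of a Wayback address copied from the browser."""
    match = re.search(r"/web/([0-9]{14})", url)
    if match is None:
        raise ValueError(f"no 14-digit timestamp found in {url!r}")
    return match.group(1)


def raw_capture_url(timestamp: str, original_url: str) -> str:
    """Wayback address for the original bytes of a capture (the id_ flag)."""
    parse_timestamp(timestamp)  # reject malformed timestamps early
    return f"{WAYBACK_PREFIX}{timestamp}id_/{original_url}"
```

For example, a capture noted as `19991013220000` and the hypothetical file `http://www.example.com/images/logo.gif` give `https://web.archive.org/web/19991013220000id_/http://www.example.com/images/logo.gif`.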