Imagine the situation: you are creating a website. You hire a webmaster or build it yourself, spending a lot of money and personal time on it. You host your brainchild and lovingly fill it with information, never thinking about saving a copy of the site to protect against data loss.
Then one day, not a great one for you, you go to your site and it does not work. You start to find out what's the matter and, oh horror, the data center has burned down or the hosting has crashed. Or maybe a virus got in and destroyed your data. Losing the information on a website is comparable to losing the information on a computer. So how do you keep a copy of the site?
Let's deal with the definition first. Website archiving is the process of preserving the current version of a page or site in an archive so that you can work with it later. Specialized software is used for this purpose. The largest web archiving organization in the world is the Internet Archive, which we will discuss below.
For a private archive, you can use offline browsers, which are designed for viewing sites without a connection. They help create local copies of individual web pages or entire sites. These include, for example:
- HTTrack, a cross-platform offline browser that supports 29 languages, can resume interrupted downloads, and can update an existing site mirror.
- Offline Explorer, a shareware program that can download not only files or pages but entire sites via FTP, HTTP, HTTPS, RTSP, MMS, and BitTorrent.
- Free Download Manager, a download manager that integrates with browsers, has a built-in FTP client, supports the BitTorrent protocol, can create torrent files, and can pick up links from the clipboard.
- Teleport Pro, a closed-source program for Windows that can download entire sites.
- Wget, a free, non-interactive console program for downloading files and sites from the Internet. It supports HTTPS, HTTP, and FTP, and can also work through an HTTP proxy server. It is well suited to Linux.
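As an illustration of the last item, here is a minimal Python sketch that assembles a Wget command for mirroring a site. The options shown are standard GNU Wget options, but the function names, the example.com address, and the "site-mirror" folder are placeholders chosen for this example, and the right set of options will depend on the site you are copying.

```python
import shlex
import subprocess


def build_wget_mirror_command(url, target_dir):
    """Return the wget argument list for mirroring `url` into `target_dir`."""
    return [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--convert-links",     # rewrite links so the copy works offline
        "--page-requisites",   # also fetch images, CSS, and scripts
        "--adjust-extension",  # save HTML pages with an .html suffix
        "--no-parent",         # never climb above the starting directory
        f"--directory-prefix={target_dir}",
        url,
    ]


def mirror_site(url, target_dir):
    """Run wget; raises CalledProcessError if wget exits with an error."""
    subprocess.run(build_wget_mirror_command(url, target_dir), check=True)


if __name__ == "__main__":
    # Placeholder address and folder: print the command instead of running it.
    command = build_wget_mirror_command("https://example.com/", "site-mirror")
    print(shlex.join(command))
```

Printing the command first lets you check it before any download starts. Calling mirror_site() runs it and requires Wget to be installed.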
Creating a backup on the hosting
You can set up site backups with your hosting provider. To do this, go to the hosting admin panel and find the section for creating backups. Every hosting provider has its own admin panel, so it is hard to say exactly where yours puts this section. If you can't figure it out, write to technical support.
Creating a backup with plugins
If your site runs on a CMS such as WordPress, you can save a copy by installing the wp-db-backup plugin (www.wordpress.org/plugins/wp-db-backup/) or a similar one. Note that wp-db-backup backs up the site's database rather than its files. Once the plugin is configured properly, you will receive a backup every day or every week, as you wish.
How to save a copy of the site to your computer
You can save the site to your computer using an FTP client. If you use FileZilla, create a "Backup" folder on your computer (the folder name can be anything). Connect to the server and drag the site's files from the server into the "Backup" folder to make a full copy of them.
Alternatively, you can use the Site2ZIP service, which packs a site into an archive, or the WinHTTrack WebSite Copier download program. How do you view the saved copy of the site? Open the folder where the site was saved and open the index.html file.
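The same copy can be scripted instead of dragged by hand. The sketch below uses Python's standard ftplib module to copy a remote folder into a local one. The function names and all arguments (host, credentials, folder paths) are hypothetical placeholders, and it assumes the server lists plain file names for the current directory, which is common but not guaranteed.

```python
import os
from ftplib import FTP, error_perm


def is_directory(ftp, name):
    """Return True if `name` is a directory we can change into."""
    original = ftp.pwd()
    try:
        ftp.cwd(name)
    except error_perm:
        return False  # the server refused, so treat it as a file
    ftp.cwd(original)
    return True


def download_tree(ftp, remote_dir, local_dir):
    """Recursively copy `remote_dir` on the server into `local_dir`."""
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    try:
        names = ftp.nlst()
    except error_perm:
        names = []  # some servers report an empty directory as an error
    for name in names:
        if name in (".", ".."):
            continue
        local_path = os.path.join(local_dir, name)
        if is_directory(ftp, name):
            download_tree(ftp, name, local_path)
            ftp.cwd("..")  # step back out of the subdirectory
        else:
            with open(local_path, "wb") as handle:
                ftp.retrbinary("RETR " + name, handle.write)


def backup_site(host, user, password, remote_dir, local_dir="Backup"):
    """Log in and copy the whole site folder (all arguments are placeholders)."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        download_tree(ftp, remote_dir, local_dir)
```

A call such as backup_site("ftp.example.com", "user", "secret", "/public_html") would fill the "Backup" folder. Plain FTP sends the password unencrypted, so if your hosting supports it, ftplib.FTP_TLS is the safer choice.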
Internet Archive
In 1996, in San Francisco, Brewster Kahle founded the nonprofit Internet Archive. It collects copies of web pages, audio and video recordings, graphics files, and programs. The collected material is stored for the long term, and access to its databases is free for everyone.
If you are wondering how to open a saved copy of a site, go to archive.org/web/ and enter the address of the site or page in the appropriate field. At the end of 2012, the Internet Archive held 10 petabytes of data, which is 10,000 terabytes! And by the middle of 2016, it had accumulated 502 billion copies of web pages.
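Saved copies can also be looked up from a script. The sketch below builds the two kinds of address involved: a direct Wayback Machine link of the form web.archive.org/web/<timestamp>/<address>, and a query to the Internet Archive's availability service, which answers in JSON. The function names, example.com, and the dates are placeholders for illustration, and the JSON handling assumes the response layout the service normally returns.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def wayback_snapshot_url(page_url, timestamp):
    """Link to the saved copy nearest to `timestamp` (YYYYMMDDhhmmss or a prefix)."""
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


def wayback_availability_url(page_url, timestamp=None):
    """Address of the availability query for `page_url`."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


def closest_snapshot(response):
    """Return the URL of the closest saved copy from a parsed reply, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_saved_copy(page_url, timestamp=None):
    """Ask the service over the network (not called in this sketch)."""
    with urlopen(wayback_availability_url(page_url, timestamp)) as reply:
        return closest_snapshot(json.load(reply))


if __name__ == "__main__":
    print(wayback_snapshot_url("example.com", "2016"))
    print(wayback_availability_url("example.com", "20160601"))
```

Only find_saved_copy() touches the network. The other functions just build strings, so they are safe to try offline.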
Caching the site by search engines
A saved copy of a site in Google is nothing more than a cache of the site's pages made by the search engine. Any user can use the cached copy of a page for their own needs at any time. Storing these copies on search engine servers takes a lot of resources and money, but the effort pays for itself, since it keeps us coming back to search engines. True, this method only works for sites that still exist or were removed recently. If a site disappeared a long time ago, the search engine will have erased the data.
Specialized search engine
Besides searching for cached pages manually in Google or Yandex, you can use the specialized search engine cachedview.com. It has an analogue: cachedpages.com.
If you want to save a copy of a site or an individual page, you can do it yourself and for free at archive.is. The service also offers a global search across the versions that users have saved.
Creating a web archive in national libraries
Today, national libraries are faced with the task of creating archives of Internet documents that are part of the scientific, cultural, and historical heritage of mankind. But this is a very difficult task.
Studies have shown that the number of documents on the Web is growing exponentially, and that a document lives from one to four months on average. The most convenient unit for a web document archive is the website. Building the collection means creating a copy, or "mirror", of each site. Since the information on a site changes over time, the library needs to create mirrors of the same website at regular intervals.
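Taking mirrors at regular intervals needs two small pieces of bookkeeping: a separate place to store each mirror and a rule for when the next one is due. The sketch below shows one possible way to do both. The folder layout, the function names, and the 30-day interval are assumptions for illustration, not a description of how any library does it.

```python
from datetime import datetime, timedelta
from pathlib import Path


def snapshot_dir(archive_root, site, taken_at):
    """Folder for one mirror: <archive_root>/<site>/<timestamp>."""
    return Path(archive_root) / site / taken_at.strftime("%Y%m%dT%H%M%S")


def is_mirror_due(last_taken_at, now, interval):
    """A mirror is due if none exists yet or the interval has elapsed."""
    return last_taken_at is None or now - last_taken_at >= interval


if __name__ == "__main__":
    interval = timedelta(days=30)  # assumed interval, shorter than a document's lifetime
    last = datetime(2016, 6, 1, 12, 0, 0)
    now = datetime(2016, 7, 5, 9, 30, 0)
    if is_mirror_due(last, now, interval):
        print("New mirror goes to", snapshot_dir("archive", "example.com", now))
```

Because every mirror gets its own timestamped folder, earlier versions of the site are never overwritten and can be compared later.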
For example, there are 60,000 websites in Sweden, which is 20 times the number of traditional print publications. Printed documents added to the national library of Sweden occupy 1.7 km of shelves per year. A web archive in printed form would fill 25 km of shelves! Their archive now contains 138 million files with a total size of 4.5 terabytes.
The Internet is growing every day. There are many companies and sites that take care to keep copies of web pages in their archives. But don't rely on them alone. Make timely backups and you will never lose your site.