Project on many Offline Internet-Archives on the subject of Scientology, Freezone, Critics
Contents
- 1 Plan
- 2 Situation
- 3 Solution
- 4 The steps of the project
- 4.1 We collect collections of URLs
- 4.2 Entering all Scn-URLs in a table
- 4.3 Consolidation of the data in tbl_URLs
- 4.4 Check out the software and hardware used
- 4.5 Document our findings on the steps until now for the public
- 4.6 Create a Mailing List for the Scn-Internet-Archive
- 4.7 Regular download sessions of all the Scn-related sites
- 4.8 Further project: An online-Scientology-Webarchive
- 5 Who is interested to work with me on this?
Plan
Mirroring of all websites and newsgroups on the subject of Scientology (including critics, Church sites and independents) on many independent computer systems of the library network, so that these data remain available even if the original publisher takes them offline. This is necessary for scientific work and to safeguard freedom of speech.
Situation
There is no guarantee that anything once published on the internet will stay there forever! See a current example of this here: Truth revealed: Website suppressed, which showed the truth about the take-over of Scientology by the US-government
I know that a project already exists to archive the whole internet, including our subject: the Wayback Machine, here: http://www.archive.org/web/web.php
But there are several reasons why this is not enough:
- You never know whether this archive will remain available to us in the future. Perhaps the whole service will be shut down one day.
- You don't know who is really behind this service. A mirror of this archive is hosted in Alexandria, Egypt. There is a CIA headquarters in Egypt. Perhaps they will someday support censorship of the internet to our disadvantage, and we will not get any data on suppressed websites out of it.
- It is very convenient to have all downloaded pages on your local hard drive and to be able to run a full-text search on only these downloaded Scientology websites. This would give better results than googling the whole internet (including non-Scn sites), because you could search for, e.g., a well-known name and find only references in our context of Scientology. This would not be possible with www.google.com
- In addition, you would also find pages which HAVE BEEN there once but are no longer available.
Solution
Even the big national libraries don't rely on the Wayback Machine but decided to make a copy of the internet for their own purposes.
The various Yahoo Groups etc. should also be included.
We should do this for ourselves. Not as a central project, but as a cooperative one: once the main work is done, everyone interested who has a flat-rate internet connection and a spare hard drive would be able to get his own copy of this archive.
There are valuable free tools for this, see for example: http://de.wikipedia.org/wiki/HTTrack
Free of charge and easy.
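To give an idea of how such a tool could be driven from a script, here is a minimal sketch in Python. It assumes the HTTrack command-line program is installed as `httrack` and uses its `-O` option to set the output directory. The helper names and the `mirror/...` directory layout are only assumptions for illustration, not part of the project so far.

```python
import shutil
import subprocess

def httrack_command(url, target_dir):
    """Build the command line for mirroring one site with HTTrack."""
    # "-O" tells HTTrack where to store the mirror.
    return ["httrack", url, "-O", target_dir]

def mirror_site(url, target_dir):
    """Run HTTrack if it is installed; return True if it finished without error."""
    if shutil.which("httrack") is None:
        print("httrack not found on PATH; skipping", url)
        return False
    result = subprocess.run(httrack_command(url, target_dir))
    return result.returncode == 0

# Hypothetical target directory; the URL is the site root of one of the link lists.
cmd = httrack_command("http://www.freiescientologen.de/", "mirror/freiescientologen.de")
print(" ".join(cmd))
```

Calling `mirror_site(...)` with the same arguments would start the actual download, so it is left out of the example run.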
And I expect that a full backup of all Scientology & FZ resources and sites will need no more space than a normal 500 GB drive.
The steps of the project
We collect collections of URLs
First we need a collection of URLs of all the sites which should be downloaded. A lot of people (churchies, freezoners and critics) have spent many hours putting together lists of links on their favorite subjects and have published these links on the internet.
To give you an idea, I will mention some of these collections here. If you know of further lists, PLEASE insert them directly here or send me an email: Special:Contact
- http://www.einet.net/directory/974656/Opposing_Views.htm
- http://www.altreligionscientology.org
- http://home.snafu.de/tilman/prolinks/
- http://home.snafu.de/tilman/index.htmlcos_link_anti
- http://www.freiescientologen.de/links.htm
- http://www.freiescientologen.de/links2.htm
- http://dmoz.org/Society/Religion_and_Spirituality/Scientology/Free_Zone
- http://dmoz.org/Society/Religion_and_Spirituality/Scientology
With this handful of links we start a meta-collection of Scn links: a collection of link collections!
I created a database table for this: tbl_collections
From there we come to the next step:
Entering all Scn-URLs in a table
From each of the above-named collections we enter every single URL into another database table: tbl_URLs
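Entering every URL by hand would be tedious, so this step could be partly automated. The following is a minimal sketch, using only the Python standard library, of how the links could be pulled out of one downloaded collection page. The class and function names and the sample HTML are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collects the targets of all <a href="..."> tags as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                absolute = urljoin(self.base_url, value)
                # Keep web links only; skip mailto:, javascript: and similar.
                if absolute.startswith(("http://", "https://")):
                    self.links.append(absolute)

def extract_links(html, base_url):
    """Return all web links found in the HTML text, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links

# Made-up sample page standing in for a downloaded link collection.
sample_html = """
<ul>
  <li><a href="http://www.freiescientologen.de/links.htm">Links</a></li>
  <li><a href="links2.htm">More links</a></li>
  <li><a href="mailto:someone@example.org">Mail</a></li>
</ul>
"""
links = extract_links(sample_html, "http://www.freiescientologen.de/")
print(links)
```

Relative links are resolved against the address of the collection page, so every entry that reaches tbl_URLs is a complete URL.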
We do not sort out any duplicates at this point; we just take every URL we can find and attach any available data on these URLs:
- owner of the site,
- name of the site,
- language of the site,
- category of the site: freezone, critic, church, church-member, mass-media, ...
- the collection from which these data stem
- and some more
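The two tables could look roughly like the following sketch, here with SQLite through Python's standard `sqlite3` module. Only the table names tbl_collections and tbl_URLs come from the project; all column names, the choice of SQLite and the example.org rows are assumptions for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS tbl_collections (
    collection_id INTEGER PRIMARY KEY,
    url           TEXT NOT NULL,   -- address of the link collection
    note          TEXT
);
CREATE TABLE IF NOT EXISTS tbl_URLs (
    url_id        INTEGER PRIMARY KEY,
    url           TEXT NOT NULL,   -- duplicates are allowed at this stage
    owner         TEXT,
    site_name     TEXT,
    language      TEXT,
    category      TEXT,            -- freezone, critic, church, church-member, mass-media, ...
    collection_id INTEGER REFERENCES tbl_collections(collection_id)
);
"""

conn = sqlite3.connect(":memory:")   # use a file name instead to keep the data
conn.executescript(SCHEMA)

# Placeholder rows; example.org stands in for real sites.
cur = conn.execute(
    "INSERT INTO tbl_collections (url) VALUES (?)",
    ("http://example.org/links.htm",),
)
collection_id = cur.lastrowid
conn.executemany(
    "INSERT INTO tbl_URLs (url, language, category, collection_id) VALUES (?, ?, ?, ?)",
    [
        ("http://example.org/site/", "en", "critic", collection_id),
        ("http://example.org/site/", "en", "critic", collection_id),  # duplicate kept on purpose
    ],
)
count = conn.execute("SELECT COUNT(*) FROM tbl_URLs").fetchone()[0]
print(count)
```

There is deliberately no uniqueness constraint on `url`, because duplicates are only sorted out in the consolidation step.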
Consolidation of the data in tbl_URLs
Now we can use the power of a database and sort all information by either site owner or URL. This lets us start consolidating the data by adding further information to each record:
- duplicate record (= don't use it); the original is record number so-and-so
- offline site (perhaps we can recover it with the Wayback Machine)
- this "site" is a subdirectory of another one, so use that one and enter the number of the main-site entry here
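Part of this consolidation could be prepared automatically. The sketch below marks duplicate records and records that are subdirectories of another entry. The field names `duplicate_of` and `part_of` and the normalization rules are assumptions; a real run would still need a manual check (for example, "www." prefixes are not handled here).

```python
from urllib.parse import urlsplit

def normalize(url):
    """Reduce a URL to lower-case host plus path without a trailing slash."""
    parts = urlsplit(url.strip())
    return parts.netloc.lower() + parts.path.rstrip("/")

def consolidate(records):
    """Add 'duplicate_of' and 'part_of' to each record (dicts with 'id' and 'url')."""
    first_seen = {}  # normalized URL -> id of the first record with that URL
    for rec in records:
        key = normalize(rec["url"])
        rec["duplicate_of"] = first_seen.get(key)  # None for the original
        rec["part_of"] = None
        first_seen.setdefault(key, rec["id"])

    originals = [(normalize(r["url"]), r["id"]) for r in records
                 if r["duplicate_of"] is None]
    for rec in records:
        if rec["duplicate_of"] is not None:
            continue
        key = normalize(rec["url"])
        # A parent is another original whose URL is a path prefix of this one.
        parents = [(k, i) for k, i in originals if key.startswith(k + "/")]
        if parents:
            # Take the closest parent, i.e. the one with the longest URL.
            rec["part_of"] = max(parents, key=lambda p: len(p[0]))[1]
    return records

# Placeholder records; example.org and example.net stand in for real sites.
records = consolidate([
    {"id": 1, "url": "http://example.org/site/"},
    {"id": 2, "url": "http://EXAMPLE.org/site"},
    {"id": 3, "url": "http://example.org/site/links/"},
    {"id": 4, "url": "http://example.net/"},
])
for rec in records:
    print(rec["id"], rec["duplicate_of"], rec["part_of"])
```

Whether a site is offline cannot be decided from the table alone; that mark has to come from an actual download attempt.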
Check out the software and hardware used
Before we start, we should perhaps compare several competing software tools like WinHTTrack. Even if some of them cost money, this could be well invested if it saves working time and hard drive space.
For example, I think it is necessary, as with the Wayback Machine, to check and download every site at least monthly. But we should not store identical pages again, only new or changed pages. Is this possible with our tool? This would save a lot of hard drive space and make it possible to check and download sites very frequently.
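Whether a given download tool can do this has to be checked for each tool. Independent of the tool, the idea of keeping only new or changed pages can be sketched with content hashes: a page is stored again only if its checksum differs from the last stored version. The directory layout, the function name and the dates below are assumptions for illustration and do not describe how HTTrack works internally.

```python
import hashlib
import tempfile
from pathlib import Path

def store_if_changed(archive_dir, url, content, date):
    """Store the page (bytes) only if it differs from the last stored version.

    Returns True if a new copy was written, False if the page was unchanged.
    """
    digest = hashlib.sha256(content).hexdigest()
    # One folder per URL, named after a hash of the URL (assumed layout).
    page_dir = Path(archive_dir) / hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    page_dir.mkdir(parents=True, exist_ok=True)
    latest = page_dir / "latest.sha256"
    if latest.exists() and latest.read_text() == digest:
        return False  # identical to the last stored copy, nothing to do
    (page_dir / f"{date}.html").write_bytes(content)
    latest.write_text(digest)
    return True

# Three monthly checks of the same placeholder page; the second one is unchanged.
with tempfile.TemporaryDirectory() as tmp:
    results = [
        store_if_changed(tmp, "http://example.org/", b"<p>version 1</p>", "2009-01-01"),
        store_if_changed(tmp, "http://example.org/", b"<p>version 1</p>", "2009-02-01"),
        store_if_changed(tmp, "http://example.org/", b"<p>version 2</p>", "2009-03-01"),
    ]
print(results)  # [True, False, True]
```

With this approach a monthly check of a stable site costs download time but almost no additional disk space.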
Then we should run a pilot and find out how much disk space we will need. Perhaps one drive is enough for everything?
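A pilot of this kind could be evaluated with a few lines of Python: measure the size of the pilot download and scale it up to the number of sites in tbl_URLs. This is only a rough sketch. It assumes that the pilot sites are typical of all sites, and the file sizes and site counts below are made up.

```python
import os
import tempfile

def directory_size_bytes(root):
    """Sum the sizes of all files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total

def estimate_total_bytes(pilot_bytes, pilot_sites, all_sites):
    """Scale the pilot size linearly to the full number of sites."""
    return pilot_bytes / pilot_sites * all_sites

# Made-up pilot: two tiny files standing in for two downloaded sites.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.html"), "wb") as f:
        f.write(b"x" * 1000)
    os.mkdir(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "sub", "b.html"), "wb") as f:
        f.write(b"y" * 500)
    pilot_bytes = directory_size_bytes(tmp)

estimate = estimate_total_bytes(pilot_bytes, pilot_sites=2, all_sites=200)
print(pilot_bytes, estimate)
```

A few very large sites can dominate the total, so the pilot should include at least one of the biggest sites on the list.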
Document our findings on the steps until now for the public
We should document our findings on the steps so far for the public. This way we will get further feedback (missing URLs, better tools, new ideas) and improve our project.
We also make it possible for others, who don't have the time to do all these steps, to create their own archive. The more archives exist, the safer our access to the data is: hard drives can fail, and with colleagues we can access their copies without too much overhead!
Create a Mailing List for the Scn-Internet-Archive
That is why we will create a mailing list for the Scn-Internet-Archive: to be able to cooperate with each other easily. None of us needs a backup of his hard drive, as we are in comm with each other, and if one of the hard drives fails, we exchange data among us. But of course people can also just use our data and stay in hiding. No problem.
Regular download sessions of all the Scn-related sites
After we have downloaded the sites a few times, we will see which sites are down, which are stable (not many changes) and which are updated very actively. We mark this information in our tbl_URLs so that we can make more frequent backups of the more active sites and less frequent backups of the stable sites.
With a group of people doing this, we could also share the work among us without much risk: I would, e.g., download a more stable site just once a year if I know that in an emergency (the site goes offline within this year and perhaps there are some updates I missed) I could get a copy from one of my friends that is closest to the date of going offline.
In this way 12 friends could coordinate their downloads of such sites so that each month a different one of them downloads them all. Each friend downloads everything once a year. But in case we need a current copy, we know who may have it. This saves download time, working time and hard drive space.
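The coordination described above can be sketched in a few lines of Python. The friend names, the assumption that each download happens at the start of its month, and the thresholds for "active" and "stable" sites are all made up for illustration.

```python
from collections import Counter

def rotation_schedule(friends, months=12):
    """Map each month (1..months) to the friend who downloads the stable sites."""
    return {month: friends[(month - 1) % len(friends)]
            for month in range(1, months + 1)}

def who_has_latest_copy(schedule, offline_month):
    """Friend holding the newest copy if a site goes offline during that month.

    Assumes each download takes place at the start of its month.
    """
    return schedule[offline_month]

def download_interval_months(times_changed, times_checked):
    """Suggest a download interval from how often a site changed (assumed thresholds)."""
    if times_checked == 0:
        return 1            # unknown site: check monthly until we know more
    ratio = times_changed / times_checked
    if ratio >= 0.5:
        return 1            # very active site
    if ratio >= 0.1:
        return 3            # occasionally updated
    return 12               # stable site: once a year is enough

friends = ["friend_%02d" % i for i in range(1, 13)]   # placeholder names
schedule = rotation_schedule(friends)
print(who_has_latest_copy(schedule, 7))
print(Counter(schedule.values()).most_common(1))
```

With 12 friends and 12 months, every friend appears exactly once in the schedule, which is the "each friend downloads everything once a year" rule.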
Further project: An online-Scientology-Webarchive
Similar to the Wayback Machine, we could consider putting our collection online and giving access to everyone. But this will need some more resources and could be done later. My intention is to collect everything in time, as long as it is still available.
Who is interested to work with me on this?
I am looking for friends who are willing to cooperate on this project. Do you know of someone working on this or interested in this?
yours
Andreas