A Productive Desktop Environment for Scientists and Engineers - Part IV
Lost Worlds on Desktop
When we were kids, there was a time when we referred to computers as electronic brains; and though we had never seen one for real, we revered the concept. After all, we were brought up in the world of HAL 9000 and R2-D2 and believed that in just a few years passing the Turing test would be peanuts for real-world computers. Well... the reality? Twenty-odd years have passed and computers are nowhere near the level of doing intelligent things. The strong AI claims of the eighties and nineties were basically flops (though in science we can never claim that something will not be invented in the future).
A very rough estimate of the number of characters stored in a 200,000-book library can be made as follows:
- An average page has about 400 words. [1]
- A good approximation of the average word size is six characters.
- An average book has around 300 pages. [2]
- In ASCII encoding, one byte holds one character.
- 200,000 x 300 x 400 x 6 / 1,000,000,000 = 144 gigabytes.
- Hard disks of this order of magnitude are readily available for today's (2006) computers.
- Potentially, our computers can store the information contained in an average library quite easily!
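The estimate above can be reproduced in a few lines of Python. This is only a sketch: the names are mine, and the four figures are the rough assumptions from the list, not measured values. Note that books x pages x words alone gives 24 billion words; multiplying in the six characters per word brings the total to about 144 gigabytes.

```python
# Back-of-the-envelope estimate of the text held in a library.
# All four figures are the rough assumptions listed above.
BOOKS = 200_000          # books in the library
PAGES_PER_BOOK = 300     # average pages per book
WORDS_PER_PAGE = 400     # average words per page
CHARS_PER_WORD = 6       # average word size; one byte per character in ASCII


def library_size_bytes(books=BOOKS, pages=PAGES_PER_BOOK,
                       words=WORDS_PER_PAGE, chars=CHARS_PER_WORD):
    """Return the estimated number of bytes needed to store the text."""
    return books * pages * words * chars


if __name__ == "__main__":
    size = library_size_bytes()
    print(f"{size / 1e9:.0f} gigabytes")  # prints "144 gigabytes"
```

Changing any one of the four assumptions (say, a smaller library or shorter books) scales the result proportionally, which makes it easy to see how sensitive the estimate is.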
Was everything about computers disappointing and boring? Well... they certainly did not live up to our expectations, but at the same time they gave us an entirely different set of tools and skills that we never expected from them. Who thought the computer would be the centerpiece of a communication revolution which allows us to write a letter, post it to a destination halfway around the globe and get the reply within a few minutes? Or did we imagine that computers in Japan would show on their screens what is stored in a library in the United States? Do we appreciate the fact that the most primitive of today's (2006) computers can store the total amount of information contained in a good-sized library? While computers did not deliver the goods that we expected of them, they have opened up a whole set of new possibilities that have the potential to make the world better connected, more equal and more efficient. (Here, I used the word potential!)
I have been using my own 'personal' computer since around 1996. (That means I had a machine that was used only by me, though strictly speaking it was not mine.) From that day onwards, I started collecting useful and not-so-useful information on the computer. In the beginning it was mostly computer programs -- things that were associated with computers even in the '50s. However, soon afterwards I started collecting e-mail messages -- after all, they come to your computer, so why not archive them there? (In fact, this was sometimes quite useful for digging up past events to check what happened and when.) Little by little, the scope of my collecting became wider and wider. By around 2001, I was keeping almost all of my research-related notes, publications, presentation slides and many other things permanently on my computer.
Now, computers have a built-in mechanism for organizing things: the hierarchical file system, consisting of directories and files. At first glance, this is a very good system for getting organized, and indeed it is. With the traditional 8.3 restriction on PC file names (a name of at most eight characters with a three-character extension) done away with, one can make folder and file names very descriptive.
However, on second thought, one may like to keep more information about one's files readily visible, searchable and simply more interactive. In database jargon this information about data is called metadata, and it can be quite useful. About a year ago I became interested in organizing my information in ways other than those the traditional file system allows.
Ready-made solutions
My colleague Dr. Lee Jin introduced me to a piece of software called MultiCentrix, which is an excellent, ready-made tool for organizing information. I was impressed by the capability of the software and had to agree that it is an ingenious product. However, before we go and buy MultiCentrix, there are a few issues with this product to consider. It is commercial software produced by a not-so-well-known company, and this combination has several adverse effects. First, the product, like many others produced by specialist software houses (like ESRI, Wolfram, MathWorks etc.), tends to be pretty expensive. While the cost is not prohibitive for corporate use, it is so for personal use. Second, (at least when I had a look at the product in early 2006) the user interface was not well polished. Finally, there is the issue of sustainability. There is always the danger of such an enterprise being bought out by a (sometimes competing) software giant. It's all well and good if the buyer continues to offer the product. But sometimes (as was initially the case when Microsoft gobbled up the Interix software) they buy simply to get rid of a niche competitor and 'hide away' the product.
Now, what is the relevance of these arguments to a personal information organizing product? Here, we are not discussing something like a web browser or a word processor, which we can easily replace with a competing product and to which our level of commitment is always minor. Once we have committed to a solution for organizing all the results of our work (even the products of our work with other software!), god help us if the tool goes belly-up in the future! The financial issue has similarly strong arguments behind it. For example, I like to have the (innocent?) right to recommend what I am using to a friend in the belief that it will not break their wallet! Finally, on aesthetics: while nobody will die of a lack of visual appeal, I prefer my everyday software companion (that is its purpose, by the way) to be good-looking.
There is a host of other products, some of which I tried in quick succession. They include Inspiration, The Brain, Lucid Fried Eggs, thoughtstream, etc. (See this page for a good shopping list of products.) But none of them fit the bill as a solution that is:
- Cheap
- Looking decent
- Sustainable over the long-haul
- (Most importantly) Flexible enough to accommodate my purpose.
This was a golden opportunity for wasting time on a half-baked solution! Who could miss one? So, I started figuring out (I use this phrase deliberately over inventing or designing, because I did not do those things) a way to get the work done.
The needs
My major needs were the following:
- I needed to keep large amounts of text,
- easy to edit,
- easily searchable from a central location.
- The ability to include graphics easily.
- Seamless links to relevant computer files from within the text.
- Most importantly, linking pieces of information in a non-hierarchical way, so that I can start at one point and trace most of the necessary information by following these links.
Now, to the experienced it will be clear that I am writing a list of features of the World Wide Web! Web pages, web editors, hyperlinks and Google make up the above list of requirements. But I was not exactly trying to reinvent the wheel here: I needed a framework similar to the World Wide Web on my desktop.
I implemented this system by hacking the famous MediaWiki system, the heart and brain behind Wikipedia.
Software pieces
Several pieces of software are needed to implement a Wikipedia on the desktop. In the old days (five years ago) this used to be a hard task, but not any more. Here is the list to scare the reader, but the initiated will learn later in this section that we can leapfrog through this process rather easily and quickly.
- A web server - Apache
- A relational database management system - MySQL
- A programming language to glue the database to the web service - PHP
- Finally, the MediaWiki system.
In practice, doing this involves only two steps and is laughably simple, thanks to an organization called apachefriends.org. What they have done (and keep on doing) is collect the first three pieces of software (Apache+MySQL+PHP) and make them into a single coherent system that can be installed in one shot. In fact it contains much more than AMP; according to the web site of the product it currently (15:07, 6 September 2006 (JST)) contains:
... Apache, MySQL, PHP + PEAR, Perl, mod_php, mod_perl, mod_ssl, OpenSSL, phpMyAdmin, Webalizer, Mercury Mail Transport System for Win32 and NetWare Systems v3.32, JpGraph, FileZilla FTP Server, mcrypt, eAccelerator, SQLite, and WEB-DAV + mod_auth_mysql ...
To install, simply download the software from http://www.apachefriends.org/en/xampp-windows.html (for Windows; it is available for other operating systems like Linux too) and install it as you would any other Windows application.
Security
It is important to follow the instructions given at http://www.apachefriends.org/en/xampp-windows.html (See: Method A: Installation with the Installer). It is particularly critical to finish off the installation by closely following "The XAMPP Security console" part of that document. DO NOT postpone this step.
During this step you will give a name and a password to your MySQL database administrator. You will also set an access password for the xampp folder. Note these down and keep them safe. We will need them later.
Make it a point to give the administrator a simple name (like root, admin or sys) and a good, strong password.
Testing
If everything is OK, you should be able to open the following URL in the browser and get the XAMPP greeting page (shown on right): http://localhost/xampp/index.php.
The Document Root
We are simplifying things quite a bit here. Quite a number of factors determine how the file system under the document root, or sometimes elsewhere, is reflected through the web server. [See the .htaccess tutorial on the Apache web site for details related to the Apache web server.]
A bit of web server jargon is in order here. Every web server has a point in the file system called the Document Root. Each sub-folder or file under the Document Root will normally be accessible through the web server using a web browser and will appear as a sub-directory or file.
File System | Web server |
---|---|
C:\HTDOCS | http://www.foo-bar.net/ |
C:\HTDOCS\index.html | http://www.foo-bar.net/index.html |
C:\HTDOCS\photos.html | http://www.foo-bar.net/photos.html |
C:\HTDOCS\documents | http://www.foo-bar.net/documents |
C:\HTDOCS\documents\report1.html | http://www.foo-bar.net/documents/report1.html |
C:\HTDOCS\downloads | http://www.foo-bar.net/downloads |
C:\HTDOCS\downloads\opensource | http://www.foo-bar.net/downloads/opensource |
C:\HTDOCS\downloads\opensource\shebang.zip | http://www.foo-bar.net/downloads/opensource/shebang.zip |
Similarly our default document root is C:\Program Files\xampp\htdocs. So, if there is a file C:\Program Files\xampp\htdocs\myfiles\foo-bar.html in the file system, we can access that as http://localhost/myfiles/foo-bar.html
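The mapping from a file under the Document Root to its URL can be sketched in Python as follows. This is a simplified illustration, not how Apache itself resolves requests: the function name `file_to_url` is mine, and it ignores aliases, .htaccess rules and anything else that can change what the server exposes.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def file_to_url(file_path, document_root, base_url="http://localhost"):
    """Map a file under the document root to the URL it would appear at.

    Raises ValueError if file_path is not located under document_root.
    """
    relative = PureWindowsPath(file_path).relative_to(
        PureWindowsPath(document_root))
    # Convert back-slashes to forward slashes and escape characters
    # (such as spaces) that are not allowed verbatim in a URL path.
    url_path = quote(relative.as_posix())
    if url_path == ".":
        url_path = ""  # the document root itself maps to the site root
    return base_url.rstrip("/") + "/" + url_path


if __name__ == "__main__":
    print(file_to_url(
        r"C:\Program Files\xampp\htdocs\myfiles\foo-bar.html",
        r"C:\Program Files\xampp\htdocs"))
    # prints http://localhost/myfiles/foo-bar.html
```

`PureWindowsPath` is used so that the sketch handles Windows-style paths even when it is run on another operating system.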
How does Apache know the document root? There is a file called httpd.conf at C:\Program Files\xampp\apache\conf. Examine that file using a good text editor.
    #
    # ServerRoot: The top of the directory tree under which the server's
    # configuration, error, and log files are kept.
    #
    # Do not add a slash at the end of the directory path. If you point
    # ServerRoot at a non-local disk, be sure to point the LockFile directive
    # at a local disk. If you wish to share the same ServerRoot for multiple
    # httpd daemons, you will need to change at least LockFile and PidFile.
    #
    ServerRoot "C:/Program Files/xampp/apache"
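The excerpt above sets ServerRoot, the directory that holds the server's own configuration and log files. The DocumentRoot directive, a little further down in the same file, is where Apache learns what our intended Document Root is.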
Take this section on Security very seriously, and always check and make absolutely sure that your server cannot be accessed from outside.
Security
HTTP servers are meant to be accessed. So, if you run a server on your computer and leave it at that, people around you (and sometimes others on another continent!) will be able to access your server pages (and there goes your privacy).
First, see whether you have a section like the following in your configuration file, immediately below the DocumentRoot directive.
    DocumentRoot "C:/Program Files/xampp/htdocs"

    #
    # Each directory to which Apache has access can be configured with respect
    # to which services and features are allowed and/or disabled in that
    # directory (and its subdirectories).
    #
    # First, we configure the "default" to be a very restrictive set of
    # features.
    #
    <Directory />
        Options FollowSymLinks
        AllowOverride None
        Order deny,allow
        Deny from all
    </Directory>

    #
    # Note that from this point forward you must specifically allow
    # particular features to be enabled - so if something's not working as
    # you might expect, make sure that you have specifically enabled it
    # below.
    #
Never, ever change anything within this section:

    <Directory />
        Options FollowSymLinks
        AllowOverride None
        Order deny,allow
        Deny from all
    </Directory>

This section first denies access from all IP addresses (computers) to the root of the file system (/) and everything below it. Little point in running a server, eh? But we can later selectively give access to individual directories. By doing this one can reduce the chance of inadvertently 'opening up' to the world.
Security: The Easy-way out
However, there is an easy way for the novice user to shut out the world and open up everything to the local computer. If you are the only user of your PC (which is the case for most of us), this is a perfectly acceptable way of doing things as far as security is concerned.
- Find the following line in the configuration file

    Listen 80

and change it to the following (the comment goes on a line of its own, because Apache does not accept a comment on the same line as a directive):

    #Listen 80
    # listen only to the local computer.
    Listen localhost:80

- Then scan all the files in the conf folder and its sub-folders for any other Listen directives. The scan can be done by executing the following command from Cygwin, inside the conf folder:

    fgrep -lri listen *

- Edit the files listed by the above command and comment out all currently unnecessary Listen directives.
- Restart the web server.
- Make sure that your server is not accessible from outside your computer. Try the following addresses from another computer in the same subnet as yours (in English: if your computer's IP address is xyz.abc.pqr.stu and the other computer's is xyz.abc.pqr.cde, then they are in the same subnet), substituting your own computer's IP address for xyz.abc.pqr.stu:

    http://xyz.abc.pqr.stu
    https://xyz.abc.pqr.stu

Neither address should bring up a page. This (testing from an outside computer) is by no means an exhaustive test, but we are only just making sure!

Finding your IP address: type the following command at the command prompt or in Cygwin:

    ipconfig

and in the output, your IP address will be listed as:

    IP Address. . . . . . . : xyz.abc.pqr.stu
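For readers without Cygwin, the scan for Listen directives described above can also be sketched in Python. This is an illustrative helper rather than part of XAMPP or Apache: the function names are mine, and the check for "local only" simply looks for an argument that starts with localhost: or 127.0.0.1:, which is an assumption about how you wrote your directives.

```python
import os
import re

# Matches an active (not commented-out) Listen directive; Apache directive
# names are case-insensitive. "ListenBacklog" and similar do not match.
LISTEN_RE = re.compile(r"^\s*listen\s+(\S+)", re.IGNORECASE)


def find_listen_directives(conf_dir):
    """Return (file, line number, argument) for each active Listen directive
    found in conf_dir and its sub-folders."""
    found = []
    for folder, _subdirs, files in os.walk(conf_dir):
        for name in sorted(files):
            path = os.path.join(folder, name)
            try:
                with open(path, encoding="latin-1") as handle:
                    for number, line in enumerate(handle, start=1):
                        match = LISTEN_RE.match(line)
                        if match:
                            found.append((path, number, match.group(1)))
            except OSError:
                continue  # unreadable file: skip it
    return found


def is_local_only(argument):
    """True if a Listen argument is bound to the local computer only."""
    return argument.lower().startswith(("localhost:", "127.0.0.1:"))


if __name__ == "__main__":
    conf = r"C:\Program Files\xampp\apache\conf"
    for path, number, argument in find_listen_directives(conf):
        note = "local only" if is_local_only(argument) else "CHECK THIS ONE"
        print(f"{path}:{number}: Listen {argument} ({note})")
```

Unlike the fgrep command, this lists only directives that are still active, so lines that have already been commented out do not show up again.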
Summary
Now we have a web server that:
- Listens to port 80 (HTTP port) requests originating from the local computer.
- Does not listen to anything coming from outside.
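The two points above can be checked with a small script as well as with a browser. The following is a minimal sketch using only Python's standard library; `can_connect` is a hypothetical helper name, and, like the browser test, it is in no way an exhaustive security check.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out or unreachable: nothing is answering there.
        return False


if __name__ == "__main__":
    # Run on the server itself: should report True while Apache is running.
    print("local access:", can_connect("127.0.0.1", 80))
```

Run from another computer in the same subnet with your server's IP address in place of 127.0.0.1, `can_connect` should return False for both port 80 (http) and port 443 (https).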
Installing Mediawiki
Some of the steps below are meant to be carried out in Cygwin, so we assume that you have it installed on your computer. Please install Cygwin if you have not already done so.
MediaWiki installation is explained in detail here. For our case (where we install on the local computer) only the 'installation' part applies. For our project, here is what needs to be done:
    # on a default Cygwin install the C: drive is mounted at /cygdrive/c rather than /c
    cd "/c/Program Files/xampp/htdocs/"
    # the mirror site (jaist.dl.sourceforge.net) and the version (1.7.1) may be different
    wget http://jaist.dl.sourceforge.net/sourceforge/wikipedia/mediawiki-1.7.1.tar.gz
    tar -xzf mediawiki-1.7.1.tar.gz
- Create a simple link (shortcut)
ln -s mediawiki-1.7.1 mywiki #or any other name you like
- Then access the installation page from your browser.
http://localhost/mywiki/
- Follow the instructions that appear in the browser, together with the ones given here. You need the following information to complete this task.
- Database administrator username and password. (We specified these during the XAMPP security step above.)
- A name for your wiki (something short), name for your database (short, without spaces or special characters e.g. wikistore) and a username (e.g. storeuser).
- At the end of installation (which is automatic, once we specify correct database and other parameters above) you will be asked to move the file LocalSettings.php which is created in the "config" directory. Official installation instructions [4] say:
This file contains all the information needed by MediaWiki to run. If it does not find the file in the main folder, it will launch the installation script to create a new one in the "config" directory.
In our context this means:
    cd "/c/Program Files/xampp/htdocs/mywiki"
    mv ./config/LocalSettings.php .
    rm -rf ./config

Now we are all done and the brand new wiki can be accessed at:
http://localhost/mywiki/
Initial configuration
Before we go ahead and use the system, it is advisable to make some changes in LocalSettings.php. Mainly this is done to remove some restrictions that make perfect sense in the hostile environment of the Internet, but are of little use, and indeed a hindrance, for our task. Again, please be advised not to apply these settings blindly to a wiki hosted on the Internet! These are only for personal wikis!
For brevity, only the relevant settings are shown: the defaults first, followed by the changed settings.

Original file:

    ## To enable image uploads, make sure the 'images' dir
    ## is writable, then set this to true:
    $wgEnableUploads = false;
    $wgUseImageResize = true;
    # $wgUseImageMagick = true;
    # $wgImageMagickConvertCommand = "/usr/bin/convert";

Modified file:

    #Following will make the wiki respond faster.
    $wgUseFileCache = true;
    $wgFileCacheDirectory = "{$wgUploadDirectory}/cache";
    $wgUseGzip = true;

    ## To enable image uploads, make sure the 'images' dir
    ## is writable, then uncomment this:
    $wgEnableUploads = true;
    $wgUseImageResize = true;
    $wgUseImageMagick = true;
    $wgImageMagickConvertCommand = "C:\_program_files\imag

    $wgVerifyMimeType= false;
    #
Customizing your wiki
There are several customizations that will make your wiki quite useful as a personal metadata system.
File Link Extension
This extension (explained here) allows files and folders to be linked directly as hyperlinks in your wiki. For example, suppose you have written a letter in a word processor like Microsoft Word and want to link it to the page explaining the background of the letter and its related objects (e.g. other letters). Such files can be linked to your wiki using two methods. The first is so-called uploading. This is the traditional method used in wikis on the Internet and is very safe. There is little chance of the file (your letter) moving or vanishing in the future and thus of your link becoming obsolete. However, in the personal-wiki context it is sometimes useful to have a direct link to the file system, particularly when the target file needs to be modified often in the future. This can be done using the File Link Extension. There are pros as well as cons to this approach, however.
- When you click on the direct link, the corresponding file will be opened directly, whereas with traditionally uploaded objects you have to download, open, modify, save and then re-upload a file to update it!
- If you move your file from the original location to somewhere else after linking it to your wiki, the wiki has no way of knowing it has moved! There will be a broken link.
Therefore it is important to use this extension carefully. My strategy is this: I have a separate disk partition (let's say drive m:) that I always use to keep my work files. And I keep these files in locations that are unlikely to be moved. For example, if I write a paper for the "Annual conference on Computer Games and Other ways to kill time, 2006", I keep my manuscript in the m:\Publications\accgowkt2006\paper\submission\ folder. Then it is very unlikely to be moved in the future. When I change computers, I create a new m: partition and put all my work data there. I have done this anyway for years, and it works, whether you use a wiki or not!
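Since the wiki cannot notice a moved file by itself, an occasional check of the linked locations can catch broken links early. The sketch below assumes you keep your own list of linked paths; how to extract them from the wiki pages is not covered here, and the file name in the example is hypothetical.

```python
import os


def find_missing(paths):
    """Return the paths from the given list that no longer exist on disk."""
    return [path for path in paths if not os.path.exists(path)]


if __name__ == "__main__":
    # Hypothetical list of files and folders linked from the wiki.
    linked = [
        "m:\\Publications\\accgowkt2006\\paper\\submission\\",
        "m:\\Publications\\accgowkt2006\\paper\\submission\\manuscript.doc",
    ]
    for path in find_missing(linked):
        print("broken link:", path)
```

Keeping work files on a dedicated partition, as described above, means this check should rarely report anything.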