A Productive Desktop Environment for Scientists and Engineers - Part I

From assela Pathirana
Revision as of 09:08, 3 April 2006

(THIS IS STILL NOT IN A FORM USEFUL TO ANYBODY)

Introduction

Over years of using computers for my work, I have settled on certain practices that make my tasks a bit easier, whether I am analyzing data, rigging up a crude model, editing computer programs, writing a paper, creating a presentation or trying to keep my things organized. (Here, 'things' are strictly limited to 'things' that are stored on computer storage media. Anyone who has been to my office knows the situation with other 'things'.) In these pages I go through my 'tools of the trade' in the hope that someone in a similar situation will find them useful.


The little geeky program: Cygwin

Excel or excel: Data processing without getting your hands dirty

This section makes use of basic UNIX utilities like awk, sed and the bash shell. You may want to go briefly through the following resources before continuing with this section.

  1. Bash guide
  2. GNU awk (gawk) guide
  3. Sed and sed FAQ.

Spreadsheets are useful programs for performing small analyses and testing ideas that involve a bit of computation. The problem comes when one wants to handle a dataset with about one million rows. (One never has to handle files THAT big? How long is an hourly rainfall series covering 100 years? 876600 rows. That's pretty close to one million.) Many spreadsheets have strict limits on the number of rows and columns they can handle. (Microsoft Excel can have 65536 rows and 256 columns in one worksheet.)

Of course it is possible to break your data into parts, say of 60,000 rows each, and process them in different worksheets. But, look. We are using a computer, and this darn thing is supposed to save our time!
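The arithmetic behind these numbers is easy to check from the shell. The sketch below is a minimal illustration, assuming 365.25 days per year to allow for leap years and ignoring any header rows; `hours_per_century` and `sheets_needed` are hypothetical helper names, not part of any tool.

```shell
#!/bin/sh
# Hours in 100 years of hourly data, assuming 365.25 days/year (leap years averaged in).
hours_per_century() {
  awk 'BEGIN { printf "%d\n", 100 * 365.25 * 24 }'
}

# Number of sheets (or parts) needed when each holds at most $2 rows:
# integer ceiling division of $1 by $2. Header rows are ignored here.
sheets_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

rows=$(hours_per_century)
echo "rows: $rows"                                       # rows: 876600
echo "Excel worksheets: $(sheets_needed "$rows" 65536)"  # Excel worksheets: 14
echo "60000-row parts: $(sheets_needed "$rows" 60000)"   # 60000-row parts: 15
```

So a single century of hourly rainfall would already be spread over 14 full-size worksheets, or 15 parts of 60,000 rows.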

There are two extremely useful Unix utilities for processing long text files: awk and sed.
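To give a flavour of the two tools, here is a minimal sketch on a made-up three-line file. The column layout (date, time, rainfall depth in mm), the file name `sample.txt` and the helper names `make_sample`, `total_rain` and `slash_dates` are all assumptions for illustration; the real dataset may be laid out differently, so the field numbers would need adjusting.

```shell
#!/bin/sh
# Write a tiny hypothetical sample: date, time, rainfall depth (mm).
make_sample() {
  cat > "$1" <<'EOF'
1993-09-30 13:20 0.2
1993-09-30 13:25 0.0
1993-09-30 13:30 1.4
EOF
}

# awk processes the file one line (record) at a time: add up column 3,
# then report the record count (NR) and the total in the END block.
total_rain() {
  awk '{ total += $3 } END { printf "rows=%d total=%.1f\n", NR, total }' "$1"
}

# sed edits each line as it streams past: replace every '-' with '/',
# which turns 1993-09-30 into 1993/09/30.
slash_dates() {
  sed 's|-|/|g' "$1"
}

make_sample sample.txt
total_rain sample.txt    # rows=3 total=1.6
slash_dates sample.txt   # first line: 1993/09/30 13:20 0.2
```

Because neither command loads the whole file into memory at once, the same one-liners work unchanged on a file with a million rows.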

In the following brief introductions we use a real-world dataset to demonstrate some of the possibilities of these programs. See the section 'Download and prepare test data' on how to download and prepare the data before proceeding.

In comes AWK

The wget command can download anything on the web from the command line, without using a browser.

You need a good web browser

Simple! Just download the Firefox browser and be happy ever after :-) (at least for the foreseeable future!). I am not just trying to be different from the 'masses' here. Simply trust me on this one: install it, load a web page and press 'Ctrl+T'. Tabbed browsing is one of those simple improvements that endear a tool to its user. I have been using this feature in my browser (first Mozilla and then Mozilla Firefox) since 2001, and I cannot imagine browsing the internet without it.

Spell checker for your browser

When was the last time you typed into one of those large text areas on a web page? (A good example is writing an email in a webmail service like Gmail or Yahoo! Mail.) Before pressing the 'Go' button, it is better to correct any misspelled words. This can be done by copying the text into your word processor (e.g. Microsoft Word), checking the spelling and pasting it back. But that is a lot of work! It is better to have a spell checker built into the browser, always at your service.

Spellbound is an extension for Mozilla Firefox that does just that. Follow the link below to learn how it works and how to install it.



Download and prepare test data

We use a rainfall dataset covering 1993-09-30 13:20 to 1999-12-08 10:39, downloaded from the National Technical University of Athens, Greece. It has

If you don't have the wget command, install the wget package using the Cygwin setup program.
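A quick way to confirm that wget, and bunzip2 for the expansion step, are installed is the POSIX `command -v` builtin. This is a sketch: `check_tool` is a hypothetical helper name, and the "missing" message assumes you are working under Cygwin as described in this page.

```shell
#!/bin/sh
# Print whether a command can be found on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing, install it with the Cygwin setup program"
  fi
}

# Check the two tools used to fetch and expand the data.
for tool in wget bunzip2; do
  check_tool "$tool"
done
```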

  • Download data.
  • Expand the compressed file. A text file named 'Qim4.txt' is the result.

Here are the commands needed to do that:

wget http://assela.pathirana.net/images/f/fe/Sample-rainfall.bz2
bunzip2 Sample-rainfall.bz2
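Before processing the expanded file, it is worth a quick look at it. The sketch below counts the lines with `wc -l` and prints the first few with `head`. The helper name `inspect_file` is hypothetical, and the file name is a parameter: bunzip2 names its output by dropping the .bz2 suffix, so check which file name you actually end up with.

```shell
#!/bin/sh
# Print the number of lines in file $1, then its first $2 lines (default 5).
inspect_file() {
  # tr strips the padding some wc implementations put before the count
  printf 'lines: %s\n' "$(wc -l < "$1" | tr -d ' ')"
  head -n "${2:-5}" "$1"
}

# Only inspect the data file if it is actually present.
if [ -f Qim4.txt ]; then
  inspect_file Qim4.txt 5
else
  echo "Qim4.txt not found in $(pwd)"
fi
```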