A Productive Desktop Environment for Scientists and Engineers - Part III

From assela Pathirana

Completed Chapters

  • Part I : Processing simple datasets, shell scripting, awk and sed, spatial data.
  • Part II: Map and other geographical/technical drawings.
  • Part III: Editors, web browsers.
  • Part IV: Create a Desktop database for storing everything!

Text data, Text files & Text Editors

Text editors were discussed briefly before. This article gives a much more thorough explanation.

I have met at least one person who used to edit source code using Microsoft Word! While this is certainly possible, there are far easier ways of doing the same thing. In the Windows environment, perhaps the most basic tool is the Notepad application. Before we dive into the world of text editors on different platforms, it is beneficial to have some background knowledge of text files.

What exactly is text data?

All the information handled by a computer is stored as binary digits (bits). These are usually grouped into blocks called bytes or words, and such a block is normally the smallest unit of binary data on which a meaningful calculation can be done. Each block is essentially a numeric value (e.g. an 8-bit byte can represent a number from 0 to 255).

In order to represent human-language characters using these numbers, there are generally agreed-upon conventions known as character encodings. For example, the ASCII encoding defines 128 characters, mapped to the numbers 0 to 127. These include printable characters (alphanumerics and some symbols) and control characters (e.g. the line feed character). Another popular encoding system is Unicode. Most of the control characters have become largely obsolete, except for carriage return and line feed. And these two cause a problem when we move text files between operating systems!
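To see this mapping between characters and numbers on your own machine, here is a small sketch for the Cygwin shell. The function name byte_codes is made up for this illustration; od is the standard dump utility, asked here to print every byte as an unsigned decimal number.

```shell
# byte_codes is a hypothetical helper name used only for this illustration.
# It prints the decimal value of every byte arriving on standard input.
byte_codes() {
    # -An: no offset column, -tu1: one unsigned decimal per byte,
    # -v: never abbreviate repeated output lines with '*'.
    # The unquoted $(...) lets the shell collapse od's padding into single spaces.
    echo $(od -An -tu1 -v)
}

printf 'A'    | byte_codes    # 65
printf 'Hi!'  | byte_codes    # 72 105 33
printf '\r\n' | byte_codes    # 13 10 (carriage return, line feed)
```

The last line shows the two control characters that matter for the rest of this article: carriage return is number 13 and line feed is number 10.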

Carriage return, line feed and newline

First there were typewriters!

In an old manual typewriter, there is a lever that the typist uses to end the current line and start a new one (i.e. the newline operation). The lever has two functions: to feed a line by rotating the cylinder carrying the paper, and to move the cylinder horizontally so that typing starts again at the left margin of the paper. Early computer designs adopted the typewriter system via teletype input/output terminals and hence adopted the same convention, namely using a carriage return followed by a line feed (CR+LF) to represent a newline. Later, different operating systems adopted different newline conventions. CP/M, MS-DOS and hence all versions of Microsoft Windows retained the CR+LF convention, UNIX uses LF, and Apple's Mac OS used CR until Mac OS X switched to LF.

Due to these discrepancies, issues arise when text files are exchanged between different operating systems. This can be easily demonstrated using our Cygwin system.

This example was written in 2006 for Microsoft Notepad version 5.1. The program may change in the future, in which case this demonstration may no longer hold true.

  1. Use nedit to create a file named foo.txt containing:
line one
line two 
line three
  2. Then open it with the Notepad program. What we see is something like:
line oneline twoline three

This happens because Notepad fails to separate lines that end with LF only. However, it should be noted that more and more utilities on either side of the fence (UNIX, Windows) handle this difference gracefully.
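If you want to check which line endings a file really contains, counting the two control characters is one way to do it. The following is only a sketch: count_endings is a name invented here, and the printf commands stand in for the files the two editors would save.

```shell
# count_endings is a hypothetical helper name used only for this illustration.
# It counts the CR (13) and LF (10) bytes arriving on standard input.
count_endings() {
    od -An -tu1 -v |
        awk '{ for (i = 1; i <= NF; i++) { if ($i == 13) cr++; if ($i == 10) lf++ } }
             END { printf "CR=%d LF=%d\n", cr, lf }'
}

# Stand-in for foo.txt as nedit saves it: every line ends in LF only.
printf 'line one\nline two\nline three\n' | count_endings          # CR=0 LF=3

# The same text as Notepad would save it: every line ends in CR+LF.
printf 'line one\r\nline two\r\nline three\r\n' | count_endings    # CR=3 LF=3
```

On the real file, count_endings < foo.txt should report no CR at all, which is exactly what Notepad 5.1 cannot cope with.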

Similarly,

  1. Type the following with Notepad into a file named bar.txt:
head1
head2
head3
  2. Then in Cygwin do the following:
cat bar.txt | sed 's/$/tail/g'

This is supposed to add the word tail to the end of each line, so that we would get

head1tail
head2tail
head3tail

But instead, we get something like:

tail1
tail2
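head3tail

Each of the first two lines still ends with a carriage return, so sed places tail after it. When the terminal displays the line, the carriage return moves the cursor back to the left margin and tail is printed over head. The last line apparently carries no carriage return (nothing was typed after head3), so it comes out as expected.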
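The stray carriage return is invisible on the terminal, but it can be made visible. In the sketch below, notepad_sample is a made-up stand-in for bar.txt (assuming Notepad put CR+LF between the lines and nothing after the last one), and cat -v is used because it prints a carriage return as ^M.

```shell
# notepad_sample is a hypothetical stand-in for bar.txt as Notepad saves it:
# CR+LF between lines, nothing after the last line.
notepad_sample() {
    printf 'head1\r\nhead2\r\nhead3'
}

# cat -v prints each otherwise invisible carriage return as ^M.
shown=$(notepad_sample | sed 's/$/tail/' | cat -v)
printf '%s\n' "$shown"
# head1^Mtail
# head2^Mtail
# head3tail
```

So sed did add tail to the end of every line; the end of the line was simply not where we thought it was.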

Take care

It is obvious that one has to be pretty careful when handling text files across operating systems. There are two points that we should remember.

  1. Always try to create and use data within the same operating system. For example, if you need to write a small script for Cygwin, use an editor like nedit rather than Notepad.
  2. When a dataset was written (or you suspect it was written) on another operating system, convert it. Cygwin has two small utilities to do this.
dos2unix bar.txt

removes the carriage return characters from the Windows/DOS text file bar.txt.

unix2dos foo.txt

adds a carriage return before each line feed in the file foo.txt. Try these with the files above.
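On a system where these two utilities are not installed, roughly the same effect can be had from standard tools. This is a sketch only: to_unix and to_dos are names invented here, they work as filters rather than rewriting the file in place as the real utilities do, and to_unix deletes every carriage return, which is cruder than what dos2unix actually does.

```shell
# Hypothetical filter stand-ins; the real dos2unix and unix2dos
# rewrite the named file in place.

# Roughly like dos2unix: delete every carriage return.
to_unix() {
    tr -d '\r'
}

# Roughly like unix2dos: drop any existing CR first,
# then end every line with CR+LF.
to_dos() {
    tr -d '\r' | awk '{ printf "%s\r\n", $0 }'
}

# With the carriage returns gone, the earlier sed command behaves as intended.
printf 'head1\r\nhead2\r\nhead3\r\n' | to_unix | sed 's/$/tail/'
# head1tail
# head2tail
# head3tail

# cat -v shows the CR that to_dos adds as ^M.
printf 'line one\nline two\n' | to_dos | cat -v
# line one^M
# line two^M
```

To convert a file this way, redirect it through the filter into a new file, for example to_unix < bar.txt > bar_unix.txt (the output file name is just an example).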

Editors for Writing Programs

Some fun links showing hackers' devotion to their editors.

Well, there are hundreds of them, including the one we are familiar with from these pages, nedit. Perhaps most of the text editors today are variants of the model on which nedit is based. These cover a range of products from Windows Notepad and Linux gedit to the built-in editors of many Integrated Development Environments like Eclipse. One salient feature of these editors is that they are very easy to start using.

However, when one starts spending considerable time writing on the computer (especially programs and other structured text), it becomes increasingly profitable to learn one of the fast editors. There are basically two modern alternatives: Vim or Emacs. These editors have steep learning curves, making them hard to learn in the beginning. However, once one spends several hours learning the basics, the long-term reward is an editing speed and efficiency that is nearly impossible to achieve with the other category of editors.

I use Vim. However, one should respect others' religious convictions, so it is my duty to say that both editors are equally good. (Though I don't know a damn thing about Emacs!)

And in my opinion, if one does not spend at least an hour a week with source code (programming, HTML, shell scripting or something similar), perhaps it is not worthwhile to spend time learning the big brothers Vim or Emacs. But please don't use WordPad or Notepad to edit your scripts. Cygwin comes with a good editor called nedit. One can instead use some simple editor like Crimson Editor or another one from this list.

My experience with Vim

Vim editor of cygwin X11, with several files opened.

My first exposure to the vi editor was when I started fiddling with Unix computer systems (mostly SunOS) during my graduate school years. At that time I considered it more of a hassle that one has to go through occasionally, when faced with a situation where only text mode is working (no X11 graphics). A few years later I was faced with a project that involved editing a number of source code files (OK, I was modifying the MM5 atmospheric model source code). My desktop was often cluttered with editor windows and my wrist was painful from moving back and forth between the keyboard and mouse, so I decided to give a serious go at learning this weird editor called Vim.

I used the book Vi iMproved (VIM) (ISBN 0735710015) to learn the editor. For me it was an exercise like learning to touch type, where most of the effort is spent on practicing. However, to make a long story short, in no time I was happily editing six or seven source code files simultaneously, compiling them and running various UNIX commands, all without removing my hands from the keyboard! You really don't have to master Vim to reap its benefits. Just learning six or seven basic commands and practicing them is enough to start doing serious jobs with it. Then learn new stuff as you go.

In my opinion the biggest rewards one can get from learning VIM (or Emacs) include:

  • Constantly keeping your hands on the keyboard. (While editing, testing your edits by compiling and running programs and even running shell commands!)
  • Navigating through really large files and arriving at specific line numbers or text strings.
  • Editing multiple files, again without moving your hand from the keyboard at all.
  • Powerful search and replace functions.
  • Automatic syntax highlighting and formatting based on the type of the program you are editing (e.g. C, PHP, HTML, etc.)

These may sound pretty trivial, but together they increase productivity on the computer more than any other piece of software.

You need a good web browser

Simple! Just download the Firefox browser and be happy ever after :-) (at least for the foreseeable future!). I am not just trying to be different from the 'masses' here. Simply trust me on this one: install it, load a web page and press 'Ctrl+T'. Tabbed browsing is one of those simple improvements that endear a tool to the user. I have been using this feature in my browser (first Mozilla and then Mozilla Firefox) since 2001 and I cannot imagine browsing the internet without it.

Spell checker for your browser

When was the last time you typed into one of those large text areas on a web page? (A good example is a webmail service like Gmail or Yahoo! Mail.) Before pressing the 'Go' button, it is better to correct those misspelled words. This can be done by copying and pasting the text into your word processor (e.g. Microsoft Word), checking the spelling and pasting it back. But that is a lot of work! It is better to have a built-in spell checker in the browser, always at your service.

Spellbound is an extension for Mozilla Firefox that does just that. Follow the link below to learn how it works and how to install it.