Mirroring LAMP Website

From assela Pathirana
Jump to: navigation, search
DSC 00652.JPG

Introduction

Cheap web hosting companies do not provide

  1. SSL support (https),
  2. Digest authentication (a poor-man's alternative to true SSL to introduce a modicum of security.)
  3. Admin access to MySQL database servers.
Green info.gif

This web site runs on Mediawiki with MySQL backend (Read details here: Mediawiki for personal webs. This is the story of setting up automatic syncrhonisation between my development machine (DM) and the (cheap, no-frills) web hosting account.

Running a web-based content creation system like a wiki off such a platform exposes your passwords being transmitted in clear-text each time you login to the site to change something.

The solution I wanted to to implement is to run the `editable' version of the web site in a different server (my home server, that is not 24h up and running) then to `mirror' the changes to a `read-only' server on my hosting site. After several false-stats, I was successfully in doing this. Following is the story.

There a a number of steps to this process: (Before anything else make a backup of the database, all the static files of the web site and keep them safely.)

  1. Use myseldump to backup the database from the host (Where the web site is served from) and restore it in the development machine (DM). The idea here is to have the identical databases in host and DM. It is OK to have other databases on either of the machines.
  2. Copy all the required files (e.g. php, skins, images, etc.) from host to DM.
  3. Get the web site working on DM (say https://mydm.net/develop/mysite)
  4. Now we have to set the databases and files on the host to be automatically updated by the versions on DM periodically. If you have admin access on DM, you can set database replication, But, we don't, so I use a round-about way of getting this done. File synching can be done using rsync.
  5. Have a crontab on DM to do the above step periodically. (say once a day)
  6. I had to add a url rewriting rule in the host machine to get the site fully functional.
  7. A few more niceties to tidy things up.

Replication without Admin Rights

MySQLs `proper' database replication is a powerful too. But, it has some restrictions making it unsuitable for our purpose. Instead I used a neat utility called mk-table-sync -- a part of the Maatkit family of tools. "Maatkit is a toolkit for users, developers, and administrators of open-source databases. It is high-quality, formally tested software that is well documented and has an active open-source developer community.[1]". While it is possible to install Maatkit as a package in popular linux systems, that is not strictly necessary. Each tool in the toolkit is an independant toolkit. So just downloading the tool we need by

wget http://www.maatkit.org/get/mk-table-sync

At the time of writing (01:32, 10 April 2011 (JST)), there is was a bug in mk-table-sync tool that leads to errors when admin rights in the target database is not available. But it was easy to patch it as follows (just by commenting out two `offending' blocks:

6065,6080c6065,6081
<    my ( $self, $dbh ) = @_;
<    if ( !$self->{not_a_slave}->{$dbh} ) {
<       my $sth = $self->{sths}->{$dbh}->{SLAVE_STATUS}
<             ||= $dbh->prepare('SHOW SLAVE STATUS');
<       MKDEBUG && _d($dbh, 'SHOW SLAVE STATUS');
<       $sth->execute();
<       my ($ss) = @{$sth->fetchall_arrayref({})};
< 
<       if ( $ss && %$ss ) {
<          $ss = { map { lc($_) => $ss->{$_} } keys %$ss }; # lowercase the keys
<          return $ss;
<       }
< 
<       MKDEBUG && _d('This server returns nothing for SHOW SLAVE STATUS');
<       $self->{not_a_slave}->{$dbh}++;
<    }
---
> #   my ( $self, $dbh ) = @_;
>    #if ( !$self->{not_a_slave}->{$dbh} ) {
>      # my $sth = $self->{sths}->{$dbh}->{SLAVE_STATUS}
>      #       ||= $dbh->prepare('SHOW SLAVE STATUS');
>      # MKDEBUG && _d($dbh, 'SHOW SLAVE STATUS');
> 	 # assela commented. 
>    #   $sth->execute();
>    #   my ($ss) = @{$sth->fetchall_arrayref({})};
> #
> #      if ( $ss && %$ss ) {
> #         $ss = { map { lc($_) => $ss->{$_} } keys %$ss }; # lowercase the keys
> #         return $ss;
> #      }
> #
> #      MKDEBUG && _d('This server returns nothing for SHOW SLAVE STATUS');
> #      $self->{not_a_slave}->{$dbh}++;
> #   }
8565,8570c8566,8571
<    # Ensure statement-based replication.
<    # http://code.google.com/p/maatkit/issues/detail?id=95
<    $sql = '/*!50105 SET @@binlog_format="STATEMENT"*/';
<    MKDEBUG && _d($dbh, $sql);
<    $dbh->do($sql);
< 
---
>   # # Ensure statement-based replication.
>   # # http://code.google.com/p/maatkit/issues/detail?id=95
>   # $sql = '/*!50105 SET @@binlog_format="STATEMENT"*/';
>   # MKDEBUG && _d($dbh, $sql);
>   # $dbh->do($sql);
>   # assela commented above lines.

Then a command like:

./mk-table-sync --print   --execute --ignore-tables=site_stats \
h=localhost,S=<some_path>/mysql.sock,u=<dbuser>,p=$passwd \
--databases <database_name> h=<host>,P=${port}

would do the replication.

Full Script

The full script to do, updating of files and databases is as follows:

#!/bin/bash
# sendUpdatedContent.bash 
host=<yyyy>
rsync -avz  --delete ../www/* ${host}:~/www/. --exclude 'LocalSettings.php' --exclude '.htaccess'
passwd=<xxxx>
port=23248
for name in <db1> <db2> ...<dbn> ; do 
    ssh -c flowfish -C -f -L -i ~/.ssh/id_dsa -L ${port}:localhost:3306 user@host.name sleep 3
    ./mk-table-sync --print   --execute --ignore-tables=site_stats \
h=localhost,S=<path>/mysql/mysql.sock,u=<db_user>,p=$passwd \
--databases $name h=127.0.0.1,P=${port}  >& ${name}.log
sleep 10
done

Making it Automatic

Now, I don't want to do this synchronization too often. A crontab entry can be used to run it once in 24h or so. That's is what I did, but at the same time, I needed a way to run it whenever I like it. For example, when I added something to the web site that I want to be visible quickly, I don't want to wait until the next replication cycle which may be up to 24h away! I wrote a small php script to create a file in a specified location:

<?php 
# index.php in some directory accessible to the DM web site. 
if (touch ("/<somepath>/scripts/touch/updatesite")){
    echo "Site update will be started in a minute...\n";
}else{
    echo "Failed to initiate update!\n";
}
?>

This script creates a file /<somepath>/scripts/touch/updatesite.

Then I created the following shell script:

#!/bin/bash
#runSend.bash 
cd <somepath>/scripts
date +'%Y%m%d_%H:%M:%S'
if [ -f ./touch/updatesite ]; then 
      echo "running send"
      cmd=sendUpdatedContent.bash
      if ps -ef|grep "$cmd"|grep -v grep >/dev/null 
      then
          echo "But, send script is already running. I quit this time!"
      else
          echo "I checked, but script is not running. So, here I go.."
          cat runSend.log|tail -n5000 > $$ 
          rm -f ./touch/*
          mv -f $$ runSend.log
          ./${cmd}
      fi
fi

This script checks

  1. Whether the file updatesite is present in the <somepath>/scripts/touch directory and
  2. Whether the sendUpdatedContent.bash script is not already running.

If the answer is yes to both questions, then it will run sendUpdatedContent.bash script.

I have yet another simple script:

 
#!/bin/bash
touch /<somepath>/scripts/touch/updatesite

Finally I added the following crontab entry:

14 23 * * * /<somepath>/scripts/initiate.bash
* * * * * cd /<somepath>/scripts/&& ./runSend.bash >> ./runSend.log 2>&1

The first entry creates the file touch/updatesite once a day (every night at 11:14). The second entry runs the runSend.bash command once each minute.

However, runSend.bash initiates the sendUpdateContent.bash only if the file touch/updatesite is present. This happens

  1. just after 11:15 each night
  2. just after running the php script.

When I want instant updates, I call the php script via web, just after I finish editing the site in DM.

Url Re-writing on Host

In my case I had to add the following two rewriting rules to my host to get the system running flowlesslly.

RewriteRule ^foo/bar/[a-z].*/(images/?.*)$ /$1 [L,R=301]
RewriteRule ^foo/bar/[a-z].*/(/?.*)$ /$1 [L,R=301]

Lock-down the database on host

Finally, I locked the database on the host. This can be done by adding a $wgReadOnlyFile entry to the LocalSettings.php:

$wgReadOnlyFile="$IP/../thiswikiislocakedxxxx";

Then the named file should be created with a text message explaining why the database is locked. (e.g. This site is not meant for editing.)