Difference between revisions of "Mirroring LAMP Website"
(27 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
[[File:DSC_00652.JPG|thub|600px|right|]] | |||
__TOC__ | |||
==Introduction== | ==Introduction== | ||
Cheap web hosting companies do not provide | Cheap web hosting companies do not provide | ||
# SSL support (https), | # SSL support (https), | ||
# Digest authentication (a poor-man's alternative to true SSL to introduce a modicum of security.) | # [[wikipedia:Digest authentication|Digest authentication]] (a poor-man's alternative to true SSL to introduce a modicum of security.) | ||
# Admin access to MySQL database servers. | # Admin access to MySQL database servers. | ||
{{box|This web site runs on [[wikipedia:Mediawiki|Mediawiki]] with MySQL backend (Read details here: [[Mediawiki for personal webs]]. This is the story of setting up automatic syncrhonisation between my development machine (DM) and the (cheap, no-frills) web hosting account.}} | |||
Running a web-based content creation system like a wiki off such a platform exposes your passwords being transmitted in clear-text each time you login to the site to change something. | Running a web-based content creation system like a wiki off such a platform exposes your passwords being transmitted in clear-text each time you login to the site to change something. | ||
Line 78: | Line 82: | ||
Then a command like: | Then a command like: | ||
<source lang=bash> | <source lang=bash> | ||
./mk-table-sync --print --execute --ignore-tables=site_stats \ | |||
h=localhost,S=<some_path>/mysql.sock,u=<dbuser>,p=$passwd \ | |||
--databases <database_name> h=<host>,P=${port} | |||
</source> | |||
would do the replication. | |||
==Full Script== | |||
The full script to do, updating of files and databases is as follows: | |||
<source lang=bash> | |||
#!/bin/bash | |||
# sendUpdatedContent.bash | |||
host=<yyyy> | |||
rsync -avz --delete ../www/* ${host}:~/www/. --exclude 'LocalSettings.php' --exclude '.htaccess' | |||
passwd=<xxxx> | |||
port=23248 | |||
for name in <db1> <db2> ...<dbn> ; do | |||
ssh -c flowfish -C -f -L -i ~/.ssh/id_dsa -L ${port}:localhost:3306 user@host.name sleep 3 | |||
./mk-table-sync --print --execute --ignore-tables=site_stats \ | |||
h=localhost,S=<path>/mysql/mysql.sock,u=<db_user>,p=$passwd \ | |||
--databases $name h=127.0.0.1,P=${port} >& ${name}.log | |||
sleep 10 | |||
done | |||
</source> | |||
==Making it Automatic== | |||
Now, I don't want to do this synchronization too often. A [[wikipedia:crontab|crontab]] entry can be used to run it once in 24h or so. That's is what I did, but at the same time, I needed a way to run it whenever I like it. For example, when I added something to the web site that I want to be visible quickly, I don't want to wait until the next replication cycle which may be up to 24h away! I wrote a small php script to create a file in a specified location: | |||
<source lang=php> | |||
<?php | |||
# index.php in some directory accessible to the DM web site. | |||
if (touch ("/<somepath>/scripts/touch/updatesite")){ | |||
echo "Site update will be started in a minute...\n"; | |||
}else{ | |||
echo "Failed to initiate update!\n"; | |||
} | |||
?> | |||
</source> | |||
This script creates a file <tt>/<somepath>/scripts/touch/updatesite</tt>. | |||
Then I created the following shell script: | |||
<source lang=bash> | |||
#!/bin/bash | |||
#runSend.bash | |||
cd <somepath>/scripts | |||
date +'%Y%m%d_%H:%M:%S' | |||
if [ -f ./touch/updatesite ]; then | |||
echo "running send" | |||
cmd=sendUpdatedContent.bash | |||
if ps -ef|grep "$cmd"|grep -v grep >/dev/null | |||
then | |||
echo "But, send script is already running. I quit this time!" | |||
else | |||
echo "I checked, but script is not running. So, here I go.." | |||
cat runSend.log|tail -n5000 > $$ | |||
rm -f ./touch/* | |||
mv -f $$ runSend.log | |||
./${cmd} | |||
fi | |||
fi | |||
</source> | |||
This script checks | |||
# Whether the file <tt>updatesite</tt> is present in the <tt><somepath>/scripts/touch</tt> directory and | |||
# Whether the sendUpdatedContent.bash script is not already running. | |||
If the answer is yes to both questions, then it will run sendUpdatedContent.bash script. | |||
I have yet another simple script: | |||
<source lang=bash> | |||
#!/bin/bash | |||
touch /<somepath>/scripts/touch/updatesite | |||
</source> | |||
Finally I added the following crontab entry: | |||
<source lang=bash> | |||
14 23 * * * /<somepath>/scripts/initiate.bash | |||
* * * * * cd /<somepath>/scripts/&& ./runSend.bash >> ./runSend.log 2>&1 | |||
</source> | |||
The first entry creates the file <tt>touch/updatesite</tt> once a day (every night at 11:14). | |||
The second entry runs the <tt>runSend.bash</tt> command once each minute. | |||
However, <tt>runSend.bash</tt> initiates the sendUpdateContent.bash only if the file <tt>touch/updatesite</tt> is present. This happens | |||
# just after 11:15 each night | |||
# just after running the php script. | |||
When I want instant updates, I call the php script via web, just after I finish editing the site in DM. | |||
==Url Re-writing on Host== | |||
In my case I had to add the following two rewriting rules to my host to get the system running flowlesslly. | |||
<source lang=apache> | |||
RewriteRule ^foo/bar/[a-z].*/(images/?.*)$ /$1 [L,R=301] | |||
RewriteRule ^foo/bar/[a-z].*/(/?.*)$ /$1 [L,R=301] | |||
</source> | |||
==Lock-down the database on host== | |||
Finally, I locked the database on the host. This can be done by adding a <tt>$wgReadOnlyFile</tt> entry to the <tt>LocalSettings.php</tt>: | |||
<source lang=php> | |||
$wgReadOnlyFile="$IP/../thiswikiislocakedxxxx"; | |||
</source> | </source> | ||
Then the named file should be created with a text message explaining why the database is locked. (e.g. This site is not meant for editing.) | |||
[[Category:Computing]][[Category:Unix]] | [[Category:Computing]][[Category:Unix]] |
Latest revision as of 09:57, 12 April 2011
Introduction
Cheap web hosting companies do not provide
- SSL support (https),
- Digest authentication (a poor-man's alternative to true SSL to introduce a modicum of security.)
- Admin access to MySQL database servers.
This web site runs on Mediawiki with MySQL backend (Read details here: Mediawiki for personal webs. This is the story of setting up automatic syncrhonisation between my development machine (DM) and the (cheap, no-frills) web hosting account.
Running a web-based content creation system like a wiki off such a platform exposes your passwords being transmitted in clear-text each time you login to the site to change something.
The solution I wanted to to implement is to run the `editable' version of the web site in a different server (my home server, that is not 24h up and running) then to `mirror' the changes to a `read-only' server on my hosting site. After several false-stats, I was successfully in doing this. Following is the story.
There a a number of steps to this process: (Before anything else make a backup of the database, all the static files of the web site and keep them safely.)
- Use myseldump to backup the database from the host (Where the web site is served from) and restore it in the development machine (DM). The idea here is to have the identical databases in host and DM. It is OK to have other databases on either of the machines.
- Copy all the required files (e.g. php, skins, images, etc.) from host to DM.
- Get the web site working on DM (say https://mydm.net/develop/mysite)
- Now we have to set the databases and files on the host to be automatically updated by the versions on DM periodically. If you have admin access on DM, you can set database replication, But, we don't, so I use a round-about way of getting this done. File synching can be done using rsync.
- Have a crontab on DM to do the above step periodically. (say once a day)
- I had to add a url rewriting rule in the host machine to get the site fully functional.
- A few more niceties to tidy things up.
Replication without Admin Rights
MySQLs `proper' database replication is a powerful too. But, it has some restrictions making it unsuitable for our purpose. Instead I used a neat utility called mk-table-sync -- a part of the Maatkit family of tools. "Maatkit is a toolkit for users, developers, and administrators of open-source databases. It is high-quality, formally tested software that is well documented and has an active open-source developer community.[1]". While it is possible to install Maatkit as a package in popular linux systems, that is not strictly necessary. Each tool in the toolkit is an independant toolkit. So just downloading the tool we need by
wget http://www.maatkit.org/get/mk-table-sync
At the time of writing (01:32, 10 April 2011 (JST)), there is was a bug in mk-table-sync tool that leads to errors when admin rights in the target database is not available. But it was easy to patch it as follows (just by commenting out two `offending' blocks:
6065,6080c6065,6081
< my ( $self, $dbh ) = @_;
< if ( !$self->{not_a_slave}->{$dbh} ) {
< my $sth = $self->{sths}->{$dbh}->{SLAVE_STATUS}
< ||= $dbh->prepare('SHOW SLAVE STATUS');
< MKDEBUG && _d($dbh, 'SHOW SLAVE STATUS');
< $sth->execute();
< my ($ss) = @{$sth->fetchall_arrayref({})};
<
< if ( $ss && %$ss ) {
< $ss = { map { lc($_) => $ss->{$_} } keys %$ss }; # lowercase the keys
< return $ss;
< }
<
< MKDEBUG && _d('This server returns nothing for SHOW SLAVE STATUS');
< $self->{not_a_slave}->{$dbh}++;
< }
---
> # my ( $self, $dbh ) = @_;
> #if ( !$self->{not_a_slave}->{$dbh} ) {
> # my $sth = $self->{sths}->{$dbh}->{SLAVE_STATUS}
> # ||= $dbh->prepare('SHOW SLAVE STATUS');
> # MKDEBUG && _d($dbh, 'SHOW SLAVE STATUS');
> # assela commented.
> # $sth->execute();
> # my ($ss) = @{$sth->fetchall_arrayref({})};
> #
> # if ( $ss && %$ss ) {
> # $ss = { map { lc($_) => $ss->{$_} } keys %$ss }; # lowercase the keys
> # return $ss;
> # }
> #
> # MKDEBUG && _d('This server returns nothing for SHOW SLAVE STATUS');
> # $self->{not_a_slave}->{$dbh}++;
> # }
8565,8570c8566,8571
< # Ensure statement-based replication.
< # http://code.google.com/p/maatkit/issues/detail?id=95
< $sql = '/*!50105 SET @@binlog_format="STATEMENT"*/';
< MKDEBUG && _d($dbh, $sql);
< $dbh->do($sql);
<
---
> # # Ensure statement-based replication.
> # # http://code.google.com/p/maatkit/issues/detail?id=95
> # $sql = '/*!50105 SET @@binlog_format="STATEMENT"*/';
> # MKDEBUG && _d($dbh, $sql);
> # $dbh->do($sql);
> # assela commented above lines.
Then a command like:
./mk-table-sync --print --execute --ignore-tables=site_stats \
h=localhost,S=<some_path>/mysql.sock,u=<dbuser>,p=$passwd \
--databases <database_name> h=<host>,P=${port}
would do the replication.
Full Script
The full script to do, updating of files and databases is as follows:
#!/bin/bash
# sendUpdatedContent.bash
host=<yyyy>
rsync -avz --delete ../www/* ${host}:~/www/. --exclude 'LocalSettings.php' --exclude '.htaccess'
passwd=<xxxx>
port=23248
for name in <db1> <db2> ...<dbn> ; do
ssh -c flowfish -C -f -L -i ~/.ssh/id_dsa -L ${port}:localhost:3306 user@host.name sleep 3
./mk-table-sync --print --execute --ignore-tables=site_stats \
h=localhost,S=<path>/mysql/mysql.sock,u=<db_user>,p=$passwd \
--databases $name h=127.0.0.1,P=${port} >& ${name}.log
sleep 10
done
Making it Automatic
Now, I don't want to do this synchronization too often. A crontab entry can be used to run it once in 24h or so. That's is what I did, but at the same time, I needed a way to run it whenever I like it. For example, when I added something to the web site that I want to be visible quickly, I don't want to wait until the next replication cycle which may be up to 24h away! I wrote a small php script to create a file in a specified location:
<?php
# index.php in some directory accessible to the DM web site.
if (touch ("/<somepath>/scripts/touch/updatesite")){
echo "Site update will be started in a minute...\n";
}else{
echo "Failed to initiate update!\n";
}
?>
This script creates a file /<somepath>/scripts/touch/updatesite.
Then I created the following shell script:
#!/bin/bash
#runSend.bash
cd <somepath>/scripts
date +'%Y%m%d_%H:%M:%S'
if [ -f ./touch/updatesite ]; then
echo "running send"
cmd=sendUpdatedContent.bash
if ps -ef|grep "$cmd"|grep -v grep >/dev/null
then
echo "But, send script is already running. I quit this time!"
else
echo "I checked, but script is not running. So, here I go.."
cat runSend.log|tail -n5000 > $$
rm -f ./touch/*
mv -f $$ runSend.log
./${cmd}
fi
fi
This script checks
- Whether the file updatesite is present in the <somepath>/scripts/touch directory and
- Whether the sendUpdatedContent.bash script is not already running.
If the answer is yes to both questions, then it will run sendUpdatedContent.bash script.
I have yet another simple script:
#!/bin/bash
touch /<somepath>/scripts/touch/updatesite
Finally I added the following crontab entry:
14 23 * * * /<somepath>/scripts/initiate.bash
* * * * * cd /<somepath>/scripts/&& ./runSend.bash >> ./runSend.log 2>&1
The first entry creates the file touch/updatesite once a day (every night at 11:14). The second entry runs the runSend.bash command once each minute.
However, runSend.bash initiates the sendUpdateContent.bash only if the file touch/updatesite is present. This happens
- just after 11:15 each night
- just after running the php script.
When I want instant updates, I call the php script via web, just after I finish editing the site in DM.
Url Re-writing on Host
In my case I had to add the following two rewriting rules to my host to get the system running flowlesslly.
RewriteRule ^foo/bar/[a-z].*/(images/?.*)$ /$1 [L,R=301]
RewriteRule ^foo/bar/[a-z].*/(/?.*)$ /$1 [L,R=301]
Lock-down the database on host
Finally, I locked the database on the host. This can be done by adding a $wgReadOnlyFile entry to the LocalSettings.php:
$wgReadOnlyFile="$IP/../thiswikiislocakedxxxx";
Then the named file should be created with a text message explaining why the database is locked. (e.g. This site is not meant for editing.)