I recently got to take on the task of migrating a social networking website from an aging system & platform to a new cloud-hosted platform. The old system consisted of Windows 2000 Server, ColdFusion 5, MSSQL 2000, and IIS 5. The new platform is running Ubuntu 12.04 LTS Linux, MySQL 5.5, Railo 3, and Apache 2.2.
The first catch was that all of this had to be done with only FTP access to the source tree: no SSH, no Remote Desktop, no SQL Server Management Studio, no ColdFusion admin access, etc. Just FTP.
The second catch was that the site was still running ColdFusion 5, from 2001 - which meant a lot of missing features.
I was also in a hurry to get it done, as it was all being done on a volunteer basis, and I was moving out of my house in a couple of weeks …
This post serves to document the process a bit for future developers and technically inclined users of the site that might be curious about what was involved.
The website under discussion was originally built by a single developer, starting in 2003, and grew to have thousands of active members across the world, mostly in the USA and Europe. I did some development for the website in 2007, adding some cool new features, fixing numerous bugs, and patching security holes. I stopped active development later that year as I became busy with other work, and because I was basically crippled as a developer due to inadequate access (FTP only).
Getting the source code was the easy part. The site has been in existence since 2003, but did not have any source control, so the first task was to put everything into git. This involved removing or ignoring duplicate & unnecessary files, but was rather straightforward.
The database migration was the most painful part, and the one that caused the most issues after the cutover. As I was not able to access the database through standard management tools, I was left using Unix tools and ColdFusion to (re)create the DB schema and dump the data.
I had to refresh my ColdFusion knowledge from 5 years ago, but that didn’t take long (as it didn’t take too long to learn then, either). I just popped open the PDF file for the CFML Reference and was on my way.
Getting the Table Names
I extracted the table names from the SQL statements embedded in the CFML files using this command:
egrep -r "(FROM|INTO|UPDATE) " * | perl -p -e 's/.*(FROM|INTO|UPDATE) //i' | sed 's/ .*//' | sort | uniq
Since all the SQL was capitalized, this got me all of the table names plus a bit of extra junk which I threw out.
I wrote a simple ColdFusion script to read in my list of table names, query the database, and get the column names.
From this I generated a text dump of MySQL CREATE TABLE statements. I defaulted all of the column types to TEXT to avoid any truncation of data later during the DB load.
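A minimal CFML sketch of the approach (the datasource name, file name, and output formatting here are hypothetical):

```cfml
<!--- Read the list of table names extracted with egrep, one per line --->
<cffile action="READ" file="#ExpandPath('tables.txt')#" variable="tableList">
<cfloop list="#tableList#" index="tableName" delimiters="#Chr(13)##Chr(10)#">
  <!--- Fetch a single row just to learn the column names --->
  <cfquery name="q" datasource="site_db">
    SELECT TOP 1 * FROM #tableName#
  </cfquery>
  <cfoutput>CREATE TABLE #LCase(tableName)# (<br></cfoutput>
  <cfloop list="#q.ColumnList#" index="col">
    <!--- TEXT everywhere keeps the later data load lossless --->
    <cfoutput>&nbsp;&nbsp;#col# TEXT,<br></cfoutput>
  </cfloop>
  <cfoutput>);<br><br></cfoutput>
</cfloop>
<!--- (trailing commas were cleaned up during the hand-editing pass) --->
```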
I uploaded the CFML script to the server via FTP, loaded it in the browser, and copied the results into a SQL file on my local development box.
Unfortunately I was not able to use features introduced in ColdFusion 7 to get column metadata, nor was I able to use ColdFusion 8’s CFDBInfo to get the columns and tables.
Once I had the SQL schema, I had to manually edit the file to add AUTO_INCREMENT to the ID columns, specify primary keys, and add indexes. I changed all primary keys and indexed columns to concrete types such as VARCHAR, depending on the values.
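As an illustration, a hand-edited table ended up looking something like this (table and column names hypothetical):

```sql
CREATE TABLE members (
    member_id   INT          NOT NULL AUTO_INCREMENT,
    username    VARCHAR(50)  NOT NULL,
    created_at  DATETIME,
    bio         TEXT,
    PRIMARY KEY (member_id),
    KEY idx_username (username)
) CHARACTER SET utf8;
```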
I used a number of sources to figure out which indexes to add to the DB:
- SQL queries inside the most accessed pages
- knowledge of the site
- previous emails to the site creator from when I was doing active development
- slow page loads during testing
Dumping the Data
I wrote another script (using this one as reference) to dump out the data for each of the database tables.
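The gist of it was a page that selected everything from a table and emitted INSERT statements; a rough CFML sketch (datasource and URL parameter names hypothetical):

```cfml
<!--- invoked as e.g. dump.cfm?table=members&RequestTimeout=3600 --->
<cfquery name="data" datasource="site_db">
  SELECT * FROM #URL.table#
</cfquery>
<cfset cols = data.ColumnList>
<cfloop query="data">
  <cfset vals = "">
  <cfloop list="#cols#" index="col">
    <!--- escape embedded quotes so the INSERTs reload cleanly --->
    <cfset vals = ListAppend(vals,
        "'" & Replace(data[col][data.CurrentRow], "'", "''", "ALL") & "'")>
  </cfloop>
  <cfoutput>INSERT INTO #URL.table# VALUES (#vals#);<br></cfoutput>
</cfloop>
```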
Dumping out all of the data using this method took over 30 hours. Some requests failed for whatever reason and I had
to re-run them. At one point I think I brought down the old site because I tried to run multiple DB dumps in parallel.
I had to add the RequestTimeout parameter to the URL with a large value to get complete dumps of most tables.
Loading into MySQL
Loading the database into MySQL was accomplished with a bash script:
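A sketch of the idea, assuming one dump file per table in a dumps/ directory and a database named sitedb (both names hypothetical); the character-set flag shown is one way to declare the dumps' encoding:

```shell
#!/bin/bash
# load_dumps: pipe each per-table dump into MySQL, declaring the dumps'
# latin1 encoding so accented characters convert correctly into UTF-8 tables.
load_dumps() {
    local db=$1 dir=$2 f
    for f in "$dir"/*.sql; do
        [ -e "$f" ] || continue          # no dumps present, nothing to do
        echo "Loading $f ..."
        mysql --default-character-set=latin1 "$db" < "$f" || echo "FAILED: $f" >&2
    done
}
# usage: load_dumps sitedb dumps
```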
The CHARACTER SET clause was something I actually missed the first time around, which caused issues with any entries that had accent marks or special characters (mostly French), because the target tables were set to UTF-8. This proved to be a bit of a mess to clean up, since it wasn't caught until after the site went live. I ended up reloading the DB dumps into a second DB and then using a combination of INSERT statements with WHERE clauses based on the migration date and/or last modification times of each row.
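The repair ran along these lines (database names, table, column, and date are all illustrative):

```sql
-- Copy back only the rows that predate the botched migration,
-- from the correctly-loaded scratch copy into the live table.
REPLACE INTO live_db.members
SELECT * FROM scratch_db.members
WHERE last_modified < '2012-06-01';
```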
Railo was installed via the command line, and the default options were used for most parameters.
An entry in the etc/webdefault.xml config file had to be changed.
I configured the datasources in the Railo web admin to match the names referenced in the existing source code.
I switched the Zero DateTime behavior datasource option to convertToNull to prevent errors, and set the mail server to localhost in the SMTP configuration.
A custom error template is used by the site, so I updated the path to that as well.
In addition to the dynamic CFML pages, the website hosts a lot of images for member profiles and events. Although the image directory path on the server was available via FTP, a flat directory structure had been employed: tens of thousands of images or more were stored in a single directory, making it impractical to list or download them via FTP.
To get around this, I wrote another set of scripts, this time in PHP and using the Zend DB adapter (since I’ve been working with both recently, and it was fastest & easiest).
The scripts read the DB connection parameters from an INI file.
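A rough PHP sketch of the idea (all file, table, and host names hypothetical): read the DB parameters, fetch the image filenames, and download each file into a manageable directory tree.

```php
<?php
// db.ini holds host/username/password/dbname in parse_ini_file format
$config = parse_ini_file('db.ini');

require_once 'Zend/Db.php';
$db = Zend_Db::factory('Pdo_Mysql', $config);

// One filename per row in a hypothetical images table
$files = $db->fetchCol('SELECT filename FROM member_images');

foreach ($files as $file) {
    // Bucket files into subdirectories by their first two characters so no
    // directory ends up with tens of thousands of entries again
    $dir = 'images/' . substr($file, 0, 2);
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true);
    }
    file_put_contents("$dir/$file",
        file_get_contents("http://www.example.com/images/$file"));
}
```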
Source Code updates
Various source code updates were of course needed after the migration. A combination of manual edits and a lot of Perl one-liners like this were used:
perl -pi -e 's/ORIG/NEW/' $(ack-grep -l ORIG)
This let me recursively search the source code tree for files with certain strings to replace and then update those files in place with a regex substitution.
I probably could have used an IDE for this (see next section), but this was fast & familiar, and I could pipe the grep to limit files to be edited, etc.
For editing the source code I chose to use CFEclipse. Eclipse tends to be bloated and slow in my experience, but for ColdFusion, this seemed like the best option. I use Eclipse with the viPlugin (which I purchased years ago) most of the time, but will use Gvim at times if I'm having issues with Eclipse.
Of course, the migration from a Windows + MSSQL platform to a Linux + MySQL system required updates to the SQL throughout the code. Luckily, the SQL statements did not use a lot of vendor-specific extensions or syntax, so this wasn’t nearly as bad as it could have been.
One of the first things to fix was the capitalization of table names in the SQL statements to match the proper table names in MySQL.
perl -pi -e 's/(FROM|INTO|UPDATE) (<table_name>)/\1 <fixed_name>/i' $(ack-grep -l "(FROM|INTO|UPDATE) <table_name>")
Next, all of the GETDATE() functions had to be changed to NOW(), and the first D parameter removed from the DATEDIFF() function calls.
A handful of queries used SELECT @@IDENTITY ... to get the key for newly inserted rows. The result parameter for <cfquery> and the GENERATED_KEY property were used to replace this. These edits were done manually.
I manually updated all of the usages of TOP to use MySQL's LIMIT clause.
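For example (hypothetical query):

```sql
-- MSSQL original:
--   SELECT TOP 10 * FROM events ORDER BY event_date DESC
-- MySQL replacement:
SELECT * FROM events ORDER BY event_date DESC LIMIT 10;
```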
One thing that I didn't do was update all of the usages of MaxRows to use the MySQL LIMIT clause instead.
- MAXROWS Attribute - Not as good as I thought
- The dangers of using cfquery maxRows | Evagoras Charalambous
Image file paths & uploads
Due to the restructuring of the image directory tree, all of the URLs for images had to be updated in the code.
This update was mostly accomplished with perl one-liners with regular expressions as well.
The old version of the site used the CFX_Image custom tag for image processing. Because of the introduction of the
<cfimage> tag in ColdFusion 8, I was able to update the code to use this tag instead. Tag parameters had to
be renamed, and file paths had to be updated (Windows -> Linux). This change was done manually.
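As an illustration, a resize call migrated roughly like this (the old custom tag's attributes are reconstructed from memory, so treat them as approximate):

```cfml
<!--- Before: CFX_Image custom tag with a Windows path (attributes approximate) --->
<cfx_image action="resize" file="C:\inetpub\images\photo.jpg" width="120" height="90">

<!--- After: the built-in tag (ColdFusion 8+ / Railo) with a Linux path --->
<cfimage action="resize" source="/var/www/images/photo.jpg" width="120" height="90"
         destination="/var/www/images/photo.jpg" overwrite="true">
```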
There were a handful of other updates that had to be done.
I replaced absolute URLs containing the website domain with relative ones throughout the code (using Perl again). I probably should have used #CGI.HTTP_HOST# instead, but I thought of that later. An easy fix with Perl or an IDE later …
There were unused custom CF tags that were removed from the source code. This was done manually.
Various include paths that used absolute paths were changed via Perl one-liners to use relative paths.
Hosting & Migration
Once I had a copy of the website up & running successfully on my local system, I needed to find a new home for the site. I’ve been using Amazon EC2 for hosting my blog and a few applications, and I’ve been pleased with it. I also wanted to see if we could get away with using the AWS free tier for hosting.
I set up a micro instance first, including two EBS volumes: one for the MySQL DB and another for the member-uploaded images.
I installed the necessary packages on the barebones Ubuntu instance to start:
sudo apt-get install apache2 mysql-server
I installed Railo with Tomcat using the installer.
EBS volume setup:
# parted /dev/xvdf
GNU Parted 2.3
Using /dev/xvdf
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel gpt
(parted) print
Model: Xen Virtual Block Device (xvd)
Disk /dev/xvdf: 32.2GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start  End  Size  File system  Name  Flags

(parted) mkpart primary 1 32.2GB
(parted) quit
# mkfs.ext4 /dev/xvdf1
An elastic IP was also added to the account and assigned to the EC2 instance later when reverse DNS was configured.
Source Control & Deployment
I set up a bare git repository on the server and pushed all of the code to it. I cloned the source code into a directory in the ubuntu user's home directory and created a symlink for the Railo web root.
On the server:
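The server-side setup was essentially the following (paths illustrative), wrapped here in a function: create the bare repo, clone a working copy, and symlink it into the web root.

```shell
# One-time server setup: bare repo, working clone, web-root symlink
setup_site() {
    local bare=$1 worktree=$2 webroot=$3
    git init --bare "$bare" &&
    git clone "$bare" "$worktree" &&
    ln -sfn "$worktree" "$webroot"
}
# usage: setup_site /var/git/myrepo /home/ubuntu/website /var/www/site
```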
git remote add --track master origin email@example.com:/var/git/myrepo
git push origin master
I added a Makefile target for deployment:
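The target was along these lines (host and paths hypothetical):

```make
# Push, then update the server's checkout
deploy:
	git push origin master
	ssh ubuntu@www.example.com 'cd website && git pull'
```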
Development Site & Testing
Once the site was up & running on EC2, and I had fixed all of the known issues and tested that most pages loaded without error, I published a URL for the site so that others could test it.
Due to the fact that this was a volunteer effort, and my time was already very limited, I was forced to push the more extensive testing off to users of the site. Another reason to push the testing to the users was that I was no longer an active user of the site myself, so it was not clear which features were currently being used by members, and should receive the highest priority for testing.
Testing by a few users helped reveal a bunch of bugs & issues which I fixed. However, testing was not adequately thorough, as we found out after the site went live. Lacking experienced (paid) software testers with enough time left us with a bit of a Catch-22:
To get people to test the site & find bugs, we pretty much had to switch the live site to the new site, so that they would be forced to use it.
Due to the shortage of time & resources, and the need to get the new site up & running so that other developers could continue from this point, it was decided to go live with the new site.
For the migration, I disabled access to the old website for all but my IP address, via an edit to the main CFML include file. In place of the regular site, I put up a notice about the site being down for maintenance. This downtime was necessary to avoid losing updates to the database during the migration.
Next, I re-ran the script to dump the DB data, downloaded it via FTP to my box, loaded it back up locally, dumped it again with mysqldump, uploaded it to the EC2 instance, and restored the DB via mysql. I re-ran the scripts to pull the images from the live server, and ran rsync to push them back up to the EC2 instance.
Due to the method used to dump the DB, the site was offline for about 30 hours.
Once all of this was complete, I updated the DNS record for the domain, and the site was live.
Upgrade to Small instance
We very quickly realized that a micro instance
wasn’t going to cut it. Railo and Tomcat would hang, MySQL struggled,
and Apache even got stuck. Luckily, upgrading to a larger instance was a minimal amount of work. I repeated the installs of the Ubuntu packages, re-installed Railo, used rsync to copy over the data in the home directory, shut down the old instance, and detached the EBS volumes and re-attached them to the new instance. Total downtime was about 10 minutes this time around.
More Help, Misc. Updates
About this time, two others with technical expertise joined the effort. I became rather unavailable as I moved out of my house and prepared for some travel, and they took over where I had left off - fixing bugs, getting the email delivery system working, and cleaning up the site.
Railo/Tomcat seemed to hang or get stuck once or twice a day for an as-yet-unknown reason. To address this, a simple restart script was hacked together and added to the crontab:
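The script was presumably little more than a wrapper around the Railo control script (the railo_ctl path below is what the Railo installer typically provides; the actual name on the server may differ):

```shell
#!/bin/bash
# /home/ubuntu/restartRailo -- blunt-instrument restart for the
# hanging Railo/Tomcat service (control-script path illustrative)
if [ -x /opt/railo/railo_ctl ]; then
    /opt/railo/railo_ctl restart
fi
```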
0 8,20 * * * /home/ubuntu/restartRailo
As the system is a social networking site, it must send a lot of emails. This effort was led by one of the team members that joined the effort after the migration.
One of the first things to do was to remove the limit on EC2 for outgoing emails, and set up reverse DNS.
An SPF entry was added to DNS.
The source code was updated so that the email address in the From: header was valid. The SMTP port was opened in the Amazon firewall. Aliases were set up in Postfix so that bounced email could be received and routed.
We verified that both the domain and IP address were not listed in the Spamhaus blacklist.
Most issues seemed to be related to the DB schema and incomplete email configuration at the time of migration. Most of these problems were taken care of by the new team members.
Now that we have multiple cooks in the kitchen, we’ve moved to Trello for tracking issues and requests.
The new site has full root SSH access, making just about anything possible.
Visual improvements and usability issues are already being addressed.
Despite some frustration, my impression is that many members are happy with and supportive of the changes and progress being made. It is my belief - which I think is shared by many others - that this migration was sorely needed so that further progress could be made before the site died or became completely irrelevant (replaced by Facebook).
There are still huge code duplication issues in the existing codebase, and other cleanup that needs to be done, so a rewrite seems likely once currently used features are fixed & stable and high-priority bugs and usability issues have been addressed. Of course, I'll leave that decision up to the new team.
I wish them luck!