This post will track our ongoing migration to the new 23 data center in the Netherlands.
17:34 UTC Monday: We’re still in the process of copying data.
18:38 UTC Monday: I estimate that the copying process will take at least a couple more hours. If it’s done by midnight, I’ll announce downtime starting at 09:00 UTC on Tuesday.
18:46 UTC Monday: We’re currently pushing through 16 Mbps between our Cologne and Amsterdam data centers. Earlier I measured 33 Mbps. I love the Internet.
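To put those rates in perspective, here is a rough Python sketch that turns a data volume and a sustained rate into hours. The 100 GB figure is a hypothetical example, not the actual size of our data. The sketch also assumes decimal gigabytes and a constant rate, which real transfers rarely sustain.

```python
def transfer_hours(gigabytes: float, megabits_per_second: float) -> float:
    """Rough time, in hours, to move `gigabytes` (decimal GB) at a constant rate."""
    bits = gigabytes * 1e9 * 8                    # GB -> bytes -> bits
    seconds = bits / (megabits_per_second * 1e6)  # Mbps -> bits per second
    return seconds / 3600


# Hypothetical 100 GB, at the two rates measured above.
for rate in (16, 33):
    print(f"{rate} Mbps: {transfer_hours(100, rate):.1f} hours")
```

With these assumptions, 100 GB takes about 13.9 hours at 16 Mbps and about 6.7 hours at 33 Mbps, so halving the throughput roughly doubles the wait.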
19:20 UTC Monday: You guys sure have a lot of photos! If we ever do this again, we’d probably have to move data the old-fashioned way.
22:00 UTC Monday: Copying won’t be done today. The copy job will continue through the night, and I’m announcing downtime starting at 09:00 UTC tomorrow.
22:13 UTC Monday: Here’s the service window announcement.
06:02 UTC Tuesday: Copying is almost done. We’re on track to start the service window at 09:00 UTC.
08:45 UTC Tuesday: We’ve started the service window 15 minutes early to take advantage of the light traffic.
10:44 UTC Tuesday: I’ve started the final rsync run to ensure that the old and new systems have identical files.
13:14 UTC Tuesday: I had copied part of the data using scp without preserving permissions and modification dates, so rsync was re-transferring those files. I’m now re-copying those files (around 18 GB) with my original copy job (tar piped through ssh), after which I’ll need to re-run rsync.
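Why did the scp copies trip up rsync? By default, rsync decides which files need transferring with a "quick check" that looks for changes in size or last-modified time, so a copy that drops the original timestamps looks changed. The Python sketch below only approximates that check (it is not rsync's code, and it compares modification times in whole seconds for simplicity). It uses `shutil.copyfile` as a stand-in for a copy that drops timestamps, like scp without `-p`, and `shutil.copy2` for one that keeps them; the file names are hypothetical.

```python
import os
import shutil
import tempfile
import time


def quick_check_matches(src: str, dst: str) -> bool:
    """Approximate rsync's default quick check: treat the file as unchanged
    only if size and modification time (whole seconds) both match."""
    s, d = os.stat(src), os.stat(dst)
    return s.st_size == d.st_size and int(s.st_mtime) == int(d.st_mtime)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "photo.jpg")  # hypothetical file name
    with open(src, "wb") as f:
        f.write(b"\xff\xd8 fake jpeg bytes")
    # Backdate the source so a fresh copy gets a clearly different mtime.
    an_hour_ago = time.time() - 3600
    os.utime(src, (an_hour_ago, an_hour_ago))

    plain = os.path.join(tmp, "plain.jpg")
    shutil.copyfile(src, plain)  # data only: mtime becomes "now"
    kept = os.path.join(tmp, "kept.jpg")
    shutil.copy2(src, kept)      # data plus metadata: mtime preserved

    print(quick_check_matches(src, plain))  # mtime differs: rsync would process it
    print(quick_check_matches(src, kept))   # size and mtime match: rsync would skip it
```

Running it prints `False` for the `copyfile` copy and `True` for the `copy2` copy, which mirrors why re-copying with tar (which keeps modification times) lets the next rsync pass skip those files.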
13:16 UTC Tuesday: People have asked in the comments whether everything will be up right away or whether it will take more time after the service window has ended. The answer is that everything should be back up by 16:00 UTC, but it’s always possible that I’ve made mistakes.
14:21 UTC Tuesday: The 18 GB I mentioned earlier has now been copied, and I’ve started the rsync run again.
15:09 UTC Tuesday: I just learned that chown changes the ownership of the file a symbolic link points to, not the link itself, unless you pass the -h option. That means restarting rsync, which will take another hour. This is starting to get embarrassing.
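The same distinction shows up in Python on a POSIX system: `os.chown` follows symbolic links by default, while passing `follow_symlinks=False` changes the link itself, which is what `chown -h` does. The `chown_tree` helper below is a hypothetical illustration, not the script we used. It walks a directory without descending into linked directories and is demonstrated on a dangling link, where following the link fails outright.

```python
import os
import tempfile


def chown_tree(root: str, uid: int, gid: int) -> int:
    """Hypothetical helper: recursively chown `root`, changing symbolic links
    themselves (like `chown -h`) instead of their targets.
    Returns the number of entries changed, including `root`."""
    count = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # follow_symlinks=False acts on the link itself (lchown semantics).
            os.chown(path, uid, gid, follow_symlinks=False)
            count += 1
    os.chown(root, uid, gid, follow_symlinks=False)
    return count + 1


with tempfile.TemporaryDirectory() as tmp:
    link = os.path.join(tmp, "dangling")
    os.symlink("/nonexistent/target", link)
    uid, gid = os.getuid(), os.getgid()  # chown to ourselves: no root needed

    print(chown_tree(tmp, uid, gid))  # the directory plus the link: 2
    try:
        os.chown(link, uid, gid)  # default behaviour: follows the link
    except FileNotFoundError:
        print("plain chown followed the link to a missing target")
```

Chowning to our own user and group keeps the demo runnable without root; in a real migration the uid and gid would be those of the service account on the new machine.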
16:08 UTC Tuesday: The end is in sight. rsync is now transferring photos that were uploaded today.
16:15 UTC Tuesday: rsync is done. Now the rest depends on how fast I can type.
16:35 UTC Tuesday: The web server is set up again. I’m running a vacuum of the database. We should be up in a few minutes, apart from mail upload.
18:59 UTC Tuesday: The last problems with PostgreSQL 8.1 should be fixed now. The service is up again. It may be a bit slow for the next 12 hours because requests that hit the old IP address will be proxied to Amsterdam.
19:23 UTC Tuesday: The web server process is crashing every few minutes. Trying to work this out.
10:10 UTC Wednesday: The blog has been inaccessible for some people for at least 12 hours because I made a mistake in the DNS configuration. After the web server came up, the server process kept dying with a segmentation fault (“fatal signal 11”) every few minutes. I tried a number of things and will try to identify the precise cause later, but it seems to be either an insufficient stack size or the version of tDOM that we are running. I set the stack size to 2 MB and upgraded to the CVS version of tDOM, and things seem to be running fine now. Zip downloads and URL uploads will remain deactivated, probably for the rest of the week, until we get the second web server online.
14:26 UTC Thursday: We’ve had some serious performance issues since the transition. We traced the cause to a maxthreads setting in AOLserver that was way too high. It’s been lowered from 50 to 35, and things seem to be faster now. The second web server will probably be online this evening, which will improve performance while many users are uploading.