11.17.07

If you’ve ever tried to download a file larger than 2GB, you quickly learned that on the vast majority of Linux servers, you can’t. This has to do with a limitation imposed by the 32-bit file interface you likely have, even if you have a 64-bit processor: a file offset stored in a signed 32-bit integer tops out just under 2GB. So for all intents and purposes, if you have a site that is larger than 2GB tarred/zipped/whatevered, you can’t move it in one big chunk. This article will explain how to move it anyway, and then some. Skip the steps you already know, as I’m going to try to make this easy enough for someone who has never used any of these commands before.

For this article to make sense to you, you really ought to have some experience with SSH. To actually do this, you’re going to need shell access to both the server you’re moving your site from and the server you’re moving your site to.

Archive your site using tar :
The first thing you need to do is archive your site. Since I’m a Plesk man, I would navigate to the site folder with cd /var/www/vhosts/sitename.com/httpdocs and use tar -cvzf sitename.tar ./* to back up my site. The c switch creates an archive, v lists each file as it is added, z compresses with gzip and f names the archive file. Let’s assume that this is an arcade site and the archive ends up at around 10GB, since this happens to be the situation I found myself in.
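Here is a small sketch of the same idea that you can try safely before touching a real site. The temporary folder and the two fake files are stand-ins I made up for the demo, not the real Plesk layout. It also uses -C and a single dot instead of ./*, which is a swapped-in variation on the command above: ./* does not match top-level hidden files such as .htaccess, while a dot picks up everything in the folder.

```shell
# Build a throwaway stand-in for a site folder (all paths here are examples).
workdir=$(mktemp -d)
mkdir -p "$workdir/httpdocs/images"
echo '<h1>hello</h1>' > "$workdir/httpdocs/index.html"
echo 'fake image data' > "$workdir/httpdocs/images/logo.png"

# -c create, -z gzip, -f archive name.
# -C changes into the site folder first, so the archive itself can live
# outside the directory that is being archived.
tar -czf "$workdir/sitename.tar" -C "$workdir/httpdocs" .

# -t lists the contents of the archive without extracting anything.
listing=$(tar -tzf "$workdir/sitename.tar")
echo "$listing"
```

Listing the archive right after creating it is a cheap way to confirm you used the create switch and that the files you expect are really in there.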

Splitting it into manageable chunks using split :
This is going to take a calculator, so open up calc or get one handy. For me, I wanted to split my 10GB file into 1GB chunks for easy movement. The way I like to use split is by telling it exactly how big in bytes I want the files (the last file created will be whatever is left over, so don’t worry about trying to be spot on with your calculations), so I use split -b 1073741824 sitename.tar sitename since I know that 1GB is equal to 1073741824 bytes (1KB is 1024 bytes, 1MB is 1024KB, etc.). The -b switch sets the size of each chunk in bytes; the similar-looking -C switch is meant for text files and tries to keep whole lines together, which is not what you want for an archive. This will split the file sitename.tar into 1GB chunks and call the new files sitenameaa, sitenameab, sitenameac, etc. Depending on the size of the file you’re trying to split, this could take a while.
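If you don’t have a calculator handy, the shell can do the arithmetic for you. The sketch below works out the byte count for 1GB and then runs split on a tiny 2500-byte stand-in file with 1000-byte chunks, so you can see the naming and the leftover last chunk without waiting on 10GB. The temporary folder and the sizes are made up for the demo.

```shell
# 1GB in bytes: 1024 bytes * 1024 * 1024.
chunk_bytes=$((1024 * 1024 * 1024))
echo "$chunk_bytes"   # prints 1073741824

# Small-scale demo: a 2500-byte file split into 1000-byte chunks.
workdir=$(mktemp -d)
head -c 2500 /dev/zero > "$workdir/sitename.tar"
split -b 1000 "$workdir/sitename.tar" "$workdir/sitename"

# Two-letter suffixes are appended to the prefix: sitenameaa, sitenameab, ...
ls -l "$workdir"
chunk_count=$(ls "$workdir"/sitename?? | wc -l)
last_chunk_bytes=$(wc -c < "$workdir/sitenameac")
```

On the real archive you would pass the computed value, as in split -b "$chunk_bytes" sitename.tar sitename. GNU split also accepts a size suffix such as -b 1G, if your version supports it.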

Change the ownership of these files so you can access them through the web using chown :
The first thing you want to do is find out what owner and group your files normally belong to. Right now, the files you created probably belong to the user root and the group root. Use ls -l to find the user and group that are given to files uploaded via FTP, and give your new files the same ones; otherwise you probably won’t be able to access them over HTTP. You’re looking for the two words that come after the drwx- characters and the number that follows them. For me it is ftpusername psacln (this is Plesk specific). To change the ownership of the files I just created I would use chown -R ftpusername.psacln file1 file2 file3. The -R switch will go into directories recursively; you don’t need it here but it’s a habit. file1, file2 and file3 can be the name of a file or folder (as long as you use -R) and you can have as many as you want. Now our files are ready to be downloaded.
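If you would rather not read the owner and group off ls -l by eye, something like the following sketch can copy them from a file that already has the right ownership. It assumes GNU stat (the -c switch) and uses made-up file names in a temporary folder; on a real server the reference file would be anything that was uploaded via FTP. I use numeric IDs and the user:group form here, which chown accepts just like the names and the dot form above.

```shell
# Throwaway files standing in for an FTP-uploaded file and two chunks.
workdir=$(mktemp -d)
touch "$workdir/uploaded-via-ftp.html" "$workdir/sitenameaa" "$workdir/sitenameab"

# Read the numeric owner and group of the reference file (GNU stat syntax).
# Using '%U' and '%G' instead would print the names, as ls -l does.
owner_id=$(stat -c '%u' "$workdir/uploaded-via-ftp.html")
group_id=$(stat -c '%g' "$workdir/uploaded-via-ftp.html")

# Give every chunk the same owner and group as the reference file.
chown "$owner_id:$group_id" "$workdir"/sitename??

# Confirm the result.
chunk_owner=$(stat -c '%u:%g' "$workdir/sitenameaa")
echo "$chunk_owner"
```

In this demo the chunks already belong to you, so nothing visibly changes; on a server where you created the chunks as root, this is where they would switch to the FTP user.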

Move the files to your new server with wget :
Navigate to the site folder of your site on your new server. For me it’s always Plesk of course, so it’s cd /var/www/vhosts/sitename.com/httpdocs. For you it will depend on what OS and control panel you use. Now it is as easy as wget sitename.com/filename, run once for each chunk (sitenameaa, sitenameab and so on). wget downloads a file just as your browser would if you went to that address. After grabbing all your files, you’re ready for the next step.
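Typing the wget command ten times gets old, so here is a rough sketch of a loop that builds one command per chunk. It only prints the commands so you can check them first. The function name print_fetch_commands is my own invention, and the address, prefix and suffix list are examples you would adjust to match your own chunks.

```shell
# Example values: change these to your own site and chunk prefix.
base_url="http://sitename.com"
prefix="sitename"

# Hypothetical helper: print one wget command per chunk suffix.
# -c (--continue) lets wget resume a partial download instead of restarting.
print_fetch_commands() {
    for suffix in "$@"; do
        echo "wget -c $base_url/$prefix$suffix"
    done
}

# Dry run: show what would be downloaded.
print_fetch_commands aa ab ac
```

Once the printed list looks right, piping it to the shell with print_fetch_commands aa ab ac | sh would run the downloads one after another.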

Putting it all back together with cat :
For our purpose, cat is the opposite of split. It takes split files and puts them back together. Because split creates files in the format filenameaa, filenameab, filenameac, filenamead, etc., and the shell expands filename* in alphabetical order, that is the order in which cat will put them back together. The command to do this is cat filename* > filename.tar. This will take all the split files that begin with filename and create filename.tar from them. This could take a while as well, depending on how big your site is.
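To convince yourself that split followed by cat really gives back the same file, here is a small round trip you can run anywhere. The 2500-byte random file is a made-up stand-in for the archive. I use the pattern filename?? instead of filename* in this sketch so that a leftover filename.tar from an earlier attempt can never be swept up as input.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for the real archive: 2500 bytes of random data.
head -c 2500 /dev/urandom > original.tar

# Split into 1000-byte chunks: filenameaa, filenameab, filenameac.
split -b 1000 original.tar filename

# The shell expands the pattern in alphabetical order, so the chunks
# are concatenated in the same order split wrote them.
cat filename?? > filename.tar

# cmp -s is silent and succeeds only if the two files are byte-for-byte equal.
if cmp -s original.tar filename.tar; then
    result="identical"
else
    result="different"
fi
echo "$result"
```

Across two servers you don’t have the original sitting next to the copy, so comparing the output of a checksum tool such as sha256sum on both machines serves the same purpose.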

Final steps using tar and chown :
We’re almost there. Now that you have your file all transferred and put back together, you’ll want to use tar with the extract switch instead of the create switch to un-tar your file. This is as easy as tar -xvzf filename.tar. When you’re done, use chown to make sure the ownership of your files is correct, otherwise there’s a good chance you won’t be able to view your site. The easiest way to do this is from within the root folder of your site with chown -R user.group ./*. Make sure you do not forget the period before the forward slash. If you do, the command will change the ownership of every file on the server instead of just your site, and your server is going to be useless to you.
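If you want to rehearse the extract step first, the sketch below builds a tiny archive in a temporary folder, lists it, and then extracts it into a separate target folder with -C. The folder names and the single index.html file are made up for the demo.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/httpdocs"
echo 'hello' > "$workdir/src/index.html"

# Create a small gzip-compressed archive to practice on.
tar -czf "$workdir/filename.tar" -C "$workdir/src" .

# -t lists what is inside without writing any files.
tar -tzf "$workdir/filename.tar"

# -x extracts; -C picks the folder the files are extracted into.
tar -xzf "$workdir/filename.tar" -C "$workdir/httpdocs"

restored=$(cat "$workdir/httpdocs/index.html")
echo "$restored"
```

Listing before extracting is a quick way to spot an archive that was put back together wrong, since tar will complain instead of printing the file names.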

It’s a lot to take in if you’re new to SSH, but when you go through the process it’s actually fairly simple. Make sure you back everything up before doing this, as a single misplaced character can wreak havoc on your system. I had to have a server re-imaged because I once used chown -R user.group /* instead of chown -R user.group ./*, so be careful. Don’t forget your database(s) either. That’s another article in itself.
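One way to make that chown mistake harder to repeat is to pass the site folder as a full path and check it before doing anything. This is only a sketch of the idea: is_safe_target is a helper name I made up, the checks are deliberately simple, and the script prints the chown command instead of running it.

```shell
# Hypothetical helper: succeed only for an existing directory that is
# not empty, not "/" and not the literal string "/*".
is_safe_target() {
    case "$1" in
        ""|/|/\*) return 1 ;;
        *) [ -d "$1" ] ;;
    esac
}

# The site folder used throughout this article; change it to your own.
site_root="/var/www/vhosts/sitename.com/httpdocs"

if is_safe_target "$site_root"; then
    echo "would run: chown -R user.group $site_root"
else
    echo "refusing to touch: $site_root"
fi
```

Because the target is a full path held in a variable, there is no ./* to mistype, and an empty variable gets refused rather than turning into a chown on the whole server.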

Next I’ll show you how to get around that pesky SQL limit you’ve run into when trying to upload databases larger than 2MB.


What filesystem are you using? ext3 supports files of at least 16GiB without any magic tricks, and up to 2TiB with an 8KB block size. I’d assume you’re using HFS or FAT16 (the only two filesystems I know of with a 2GB file limit). It has nothing to do with the 32-bit interface. Minix, back in the 8086 days, supported files up to 1GiB, back when 2MB of RAM was hot and 8-bit computing was considered lavish. Don’t get confused by Microsoft’s poor programming into linking 32/64 bits with memory and file size limits.

I’m not entirely sure, James. I know enough to get around and perform the tasks I need to in Linux, and that’s about it. I do know it’s not just me though; from the searching I did, it seems like it’s still a very common problem.

Thank you for your input and information though.

I’ll back Greg up on that. My Linux host also has the limitation of not being able to display or access anything larger than 2GB through the web, though the file itself can reside on the host without a problem. Perhaps it is a problem of Apache then?

Hi Greg, I always linked this limitation to the size of a signed 32-bit int. I’m not sure why in the era of cheap terabyte disks we’re still discussing this.

By the way, you can probably gzip or bzip2 the file before split’ing it; that will save you tons of bandwidth and transfer time between servers.

Regards,
