Developer Techno: Administering MySQL - Transferring Large Files

Copying, compressing, and decompressing huge files (often across a network) are common tasks when administering MySQL, initializing servers, cloning slaves, and performing backups and recovery operations. The fastest and best ways to do these jobs are not always the most obvious, and the difference between good and bad methods can be significant. It’s common to begin with an uncompressed file, such as one server’s InnoDB tablespace and log files. You also want the file to be decompressed when you finish copying it to the destination, of course. The other common scenario is to begin with a compressed file, such as a backup image, and finish with a decompressed file. If you have limited network capacity, it’s usually a good idea to send the files across the network in compressed form. You might also need to do a secure transfer, so your data isn’t compromised; this is a common requirement for backup images.

Copying Files
The task, then, is to do the following efficiently:
1. (Optionally) compress the data.
2. Send it to another machine.
3. Decompress the data into its final destination.
4. Verify the files aren’t corrupted after copying.

A Naive Example
We begin with a naïve example of how to send an uncompressed file securely from one machine to another, compress it en route, and then decompress it. On the source server, which we call server1, we execute the following:

server1$ gzip -c /backup/mydb/mytable.MYD > mytable.MYD.gz
server1$ scp mytable.MYD.gz root@server2:/var/lib/myql/mydb/

And then, on server2:

server2$ gunzip /var/lib/mysql/mydb/mytable.MYD.gz

This is probably the simplest approach, but it’s not very efficient because it serializes the steps involved in compressing, copying, and decompressing the file. Each step also requires reads from and writes to disk, which is slow. Here’s what really happens during each of the above commands: the gzip performs both reads and writes on server1, the scp reads on server1 and writes on server2, and the gunzip reads and writes on server2.

A One-Step Method
It’s more efficient to compress and copy the file and then decompress it on the other end in one step. This time we use SSH, the secure protocol upon which SCP is based. Here’s the command we execute on server1:

server1$ gzip -c /backup/mydb/mytable.MYD ssh root@server2
"gunzip -c - > /var/lib/mysql/mydb/mytable.MYD"

This usually performs much better than the first method, because it significantly reduces disk I/O: the disk activity is reduced to reading on server1 and writing on server2. This lets the disk operate sequentially.

You can also use SSH’s built-in compression to do this, but we’ve shown you how to compress and decompress with pipes because they give you more flexibility. For example, if you didn’t want to decompress the file on the other end, you wouldn’t want to use SSH compression.

You can improve on this method by tweaking some options, such as adding –1 to make the gzip compression faster. This usually doesn’t lower the compression ratio much, but it can make it much faster, which is important. You can also use different compression algorithms. For example, if you want very high compression and don’t care about how long it takes, you can use bzip2 instead of gzip. If you want very fast compression, you can instead use an LZO-based archiver. The compressed data might be about 20% larger, but the compression will be around five times faster.

Avoiding Encryption Overhead
SSH isn’t the fastest way to transport data across the network, because it adds the overhead of encrypting and decrypting. If you don’t need encryption, you can just copy the “raw” bits over the network with netcat. You invoke this tool as nc for noninteractive operations, which is what we want.

Here’s an example. First, let’s start listening for the file on port 12345 (any unused port will do) on server2, and uncompress anything sent to that port to the desired data file:
server2$ nc -l -p 12345 gunzip -c - > /var/lib/mysql/mydb/mytable.MYD
On server1, we then start another instance of netcat, sending to the port on which
the destination is listening. The -q option tells netcat to close the connection after it
sees the end of the incoming file. This will cause the listening instance to close the
destination file and quit:

server1$ gzip -c - /var/lib/mysql/mydb/mytable.MYD nc -q 1 server2 12345

An even easier technique is to use tar, so filenames are sent across the wire, eliminating another source of errors and automatically writing the files to their correct locations. The z option tells tar to use gzip compression and decompression. Here’s the command to execute on server2:

server2$ nc -l -p 12345 tar xvzf -

And here’s the command for server1:

server1$ tar cvzf - /var/lib/mysql/mydb/mytable.MYD nc -q 1 server2 12345

You can assemble these commands into a single script that will compress and copy lots of files into the network connection efficiently, then decompress them on the other side.

Other Options
Another option is rsync. rsync is convenient because it makes it easy to mirror the source and destination and because it can restart interrupted file transfers, but it doesn’t tend to work as well when its binary difference algorithm can’t be put to good use. You might consider using it for cases where you know most of the file doesn’t need to be sent—for example, for finishing an aborted nc copy operation.

You should experiment with file copying when you’re not in a crisis situation, because it will take a little trial and error to discover the fastest method. Which method performs best will depend on your system. The biggest factors are how many disk drives, network cards, and CPUs you have, and how fast they are relative to each other. It’s a good idea to monitor vmstat -n 5 to see whether the disk or the CPU is the speed bottleneck.

If you have idle CPUs, you can probably speed up the process by running several copy operations in parallel. Conversely, if the CPU is the bottleneck and you have lots of disk and network capacity, omit the compression. As with dumping and restoring, it’s often a good idea to do these operations in parallel for speed. Again, monitor your servers’ performance to see if you have unused capacity. Trying to over-parallelize may just slow things down.

Source of Information : OReIlly High Performance MySQL Second Edition

Developer Techno

Administering MySQL - Transferring Large Files

0 comments

My Favorite Links

Categories

About Me

Live Traffic Feed

Archives

Blogger Communities