How To Efficiently Copy Files To Multiple Destinations

May 8th, 2024

I bought two new hard drives recently and I'm currently doing a lot of data shuffling to get the drives into the configuration I want. I wanted to copy the files from one drive to two others so there would be an extra backup before I formatted the original drive. The set weighs about 6 TiB, so it had to be done as efficiently as possible.

I needed something that would read the data once, and copy it to both destinations in parallel. Neither rsync nor cp could do what I wanted, but then I found this great solution by Kamil Maciorowski on Stack Exchange.

tar -c /source/dirA/ /source/file1 |
	tee >(cd /foo/destination3/ && tar -x) >(cd /bar/destination2/ && tar -x) \
		>(cd /foobar/destination1/ && tar -x) > /dev/null

How does it work? First tar converts directories and files to a single bitstream that can be used in a pipe. The tee command forks that stream; every copy but one is extracted by tar in proper destination. The last copy moves down the pipe; it is discarded into /dev/null. (One may use the last copy for destination0 but the syntax would be different so I decided to keep it simple with tee only).
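To convince yourself before pointing this at real data, here is a small throwaway sketch of the same idea using a scratch directory. Everything under $work is made up for the demo. The explicit -f - and the trailing | cat are my additions, not part of the original answer: -f - just spells out that tar should use stdin/stdout, and | cat is one way to make the script wait until both extractions have finished, since bash does not wait for >(...) processes on its own.

```shell
#!/usr/bin/env bash
# Throwaway demo: read a small tree once, extract it into two destinations.
set -euo pipefail

work=$(mktemp -d)   # scratch area; every path below is hypothetical
mkdir -p "$work/src/dirA" "$work/dest1" "$work/dest2"
printf 'hello\n' > "$work/src/dirA/a.txt"
printf 'world\n' > "$work/src/file1"

# -C changes directory before archiving, so stored paths are relative.
# The >(...) processes inherit the pipe to cat as their stdout, so cat
# only returns once both tar -x processes have exited.
tar -C "$work/src" -cf - . |
	tee >(cd "$work/dest1" && tar -xf -) >(cd "$work/dest2" && tar -xf -) >/dev/null |
	cat

# diff -r prints nothing and exits 0 when the trees have the same content.
diff -r "$work/src" "$work/dest1"
diff -r "$work/src" "$work/dest2"
echo "both copies match the source"
```

If either diff reports a difference, the script stops there because of set -e.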

Genius! I wanted to see my progress, so I inserted pv in the pipeline. Mine ended up looking something like this.

# "." instead of "*" so hidden files at the top level are included too
tar -cS . | pv -s "$(du -bs --apparent-size . | cut -f 1)" -m 300 |
	tee >(cd /foobar/destination1/ && tar -x) >(cd /bar/destination2/ && tar -x) >/dev/null

-S (--sparse) is very important in the tar command because it makes tar handle sparse files properly. On the first run, I had forgotten that a few of my disk images were sparse files, and tar was writing their empty regions out to disk in full.
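The effect of -S is easy to see on a small scale. This is a rough sketch, assuming GNU tar and coreutils, with a made-up 16 MiB file that is one big hole. It copies the file once without -S and once with it, then prints how much space each copy actually takes. The exact numbers depend on the filesystem (one that compresses zeros may show little difference), so treat the output as indicative only.

```shell
#!/usr/bin/env bash
# Compare allocated space of a sparse file copied with and without tar -S.
set -euo pipefail

work=$(mktemp -d)   # scratch directory; all names here are hypothetical
mkdir "$work/src" "$work/plain" "$work/sparse"

# 16 MiB apparent size, but almost no blocks allocated on disk.
truncate -s 16M "$work/src/disk.img"

tar -C "$work/src" -cf - .  | tar -C "$work/plain"  -xf -   # without -S
tar -C "$work/src" -cSf - . | tar -C "$work/sparse" -xf -   # with -S

# du -k reports allocated space in KiB, not the apparent file size.
plain_kib=$(du -k "$work/plain/disk.img" | cut -f 1)
sparse_kib=$(du -k "$work/sparse/disk.img" | cut -f 1)
echo "allocated without -S: ${plain_kib} KiB"
echo "allocated with -S:    ${sparse_kib} KiB"

# The content is byte-for-byte identical either way; only disk usage differs.
cmp "$work/src/disk.img" "$work/sparse/disk.img"
```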

pv is telling me there's about 14 hours left in the transfer. I'd better get a cup of tea.
