I bought two new hard drives recently and I'm currently doing a lot of data shuffling to get the drives into the configuration I want. I wanted to copy the files from one drive to two others so that there would be an extra backup before I formatted the original drive. The set weighs about 6 TiB, so it had to be done as efficiently as possible.
I needed something that would read the data once, and copy it to both destinations in parallel. Neither rsync nor cp could do what I wanted, but then I found this great solution by Kamil Maciorowski on Stack Exchange.
tar -c /source/dirA/ /source/file1 | tee >(cd /foo/destination3/ && tar -x) >(cd /bar/destination2/ && tar -x) >(cd /foobar/destination1/ && tar -x) > /dev/null
How does it work? First tar converts directories and files to a single bitstream that can be used in a pipe. The tee command forks that stream; every copy but one is extracted by tar in proper destination. The last copy moves down the pipe; it is discarded into /dev/null. (One may use the last copy for destination0 but the syntax would be different so I decided to keep it simple with tee only).
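To see the fan-out on a small scale, here is a minimal sketch that runs the same tar and tee pattern against throwaway directories. The temp directory layout and file names are made up for illustration, and it assumes bash (for process substitution) and GNU tar. The trailing `| cat` is an addition that is not part of the original command: bash does not wait for `>(...)` processes on its own, so this is one way to hold the pipeline open until both extractions have exited.

```bash
#!/usr/bin/env bash
# Toy demonstration of the tar | tee fan-out.
# Every path below is a hypothetical throwaway temp directory.
set -euo pipefail

work=$(mktemp -d)
src="$work/source"
dest1="$work/destination1"
dest2="$work/destination2"
mkdir -p "$src/dirA" "$dest1" "$dest2"
printf 'hello\n' > "$src/dirA/a.txt"
printf 'world\n' > "$src/file1"

# One read of the source, two extractions running in parallel.
# The >(...) processes inherit the pipe to `cat` as their stdout, so `cat`
# sees end-of-file (and the pipeline returns) only after both have exited.
tar -C "$src" -c . \
  | tee >(cd "$dest1" && tar -x) >(cd "$dest2" && tar -x) >/dev/null \
  | cat

# Compare each destination with the source before trusting the copies.
diff -r "$src" "$dest1" && diff -r "$src" "$dest2" && echo "copies match"
```

Without something like the trailing `cat`, the prompt can come back a moment before the last blocks are written, which matters if the next step is formatting the source drive.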
Genius! I wanted to see my progress, so I inserted pv in the pipeline. Mine ended up looking something like this.
tar -cS * | pv -s "$(du -bs --apparent-size . | cut -f 1)" -m 300 | tee >(cd /foobar/destination1/ && tar -x) >(cd /bar/destination2/ && tar -x) >/dev/null
-S (--sparse) is very important in the tar command because it makes tar handle sparse files properly. On the first run, I forgot that a few of my disk images were sparse files, and I realized tar was writing out the empty parts to the disk.
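The effect of -S is easy to check on a small scale. The sketch below creates a hypothetical 10 MiB sparse file named disk.img in a temp directory, then compares how many bytes tar puts into the pipe with and without -S. It assumes GNU tar and GNU coreutils (truncate, du -b) and a filesystem that supports sparse files.

```bash
#!/usr/bin/env bash
# Compare tar stream sizes for a sparse file, with and without -S.
# disk.img is a made-up stand-in for a real disk image.
set -euo pipefail

work=$(mktemp -d)
truncate -s 10M "$work/disk.img"   # 10 MiB long, but one big hole

apparent=$(du -b "$work/disk.img" | cut -f 1)     # logical length in bytes
allocated=$(du -B1 "$work/disk.img" | cut -f 1)   # bytes actually allocated on disk

plain=$(tar -C "$work" -c disk.img | wc -c)       # holes written out as zeros
sparse=$(tar -C "$work" -cS disk.img | wc -c)     # holes recorded, not written

echo "apparent=$apparent allocated=$allocated"
echo "tar -c: $plain bytes, tar -cS: $sparse bytes"
```

The same gap affects the progress bar: the size passed to pv comes from the apparent size, while tar -S sends fewer bytes than that when sparse files are present, so the estimated time remaining will probably run on the pessimistic side.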
pv is telling me there are about 14 hours left in the transfer. I'd better get a cup of tea.