I’ve got files. I want to send those files to another computer that I have ssh access to. What do I do?
Secure File Copy
I’d assume most who have used ssh in some capacity are familiar with the program scp.
Excerpt from man page scp(1):
scp copies files between hosts on a network.
scp uses the SFTP protocol over a ssh(1) connection for data transfer, and uses the same authentication and provides the same security as a login session.
scp will ask for passwords or passphrases if they are needed for authentication.
Say you want to transfer a folder of files to a folder in some remote user’s home directory. Say the folder is called x, the remote machine is called remote, and the remote user is called user. The invocation of scp would look like
scp -r x user@remote:~/folder
where -r stands for recursive. The right-hand side is a classic URL-style construction that is both intuitive and universal across the *nixes.
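Copying in the other direction works just as well; swap the two arguments. A quick sketch with the same hypothetical names:
scp -r user@remote:~/folder/x .
This pulls the remote copy of x into the current directory.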
Why I hate SCP
Well actually I don’t really? It’s an acceptable program for what it’s meant to do. It’s simple and only requires SSH. But it’s not very smart, which makes it wasteful.
It craps itself when your connection drops
If the connection drops partway through a transfer, whether your internet cuts out or the remote machine dies, SCP will just stop. If you invoke it again, it will start right from the beginning of the file.
It has no sense of caching in files
Let’s say I have the same file on my machine as on the remote machine. I add a newline, then copy it over to the remote. scp will copy the entire file, byte by byte, to the remote machine even though the actual difference between the two is only one byte.
It has no sense of caching in folders
Let’s say I have the same folder of files on my machine as on the remote machine. I add a file to the folder. scp will copy the entire folder, file by file, byte by byte, to the remote machine despite only realistically needing to copy one file.
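To make the waste concrete, here’s a sketch using the same hypothetical names as before:
touch x/new-file
scp -r x user@remote:~/folder
The second command re-copies every file in x, not just new-file.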
You may argue that this is entirely the caller’s fault; if they want to only copy the specific new file they should do so manually. To that I say: why can’t the program figure that out for me?
With that, I introduce…
Rsync: The only copy program you’ll ever need
Rsync is another file copy program for the *nixes. It usually uses an SSH connection to send and receive data, but it can work over other transports. It can even transfer files on the same machine [1]. It requires the rsync program to be present on both the origin and the remote, but that’s a small requirement for the returns this program can bring.
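As a rough sketch, the earlier scp invocation translates almost one-to-one (same hypothetical names as before; -a is rsync’s “archive” flag, which implies recursion and preserves most file metadata):
rsync -a x user@remote:~/folder
One gotcha worth knowing: rsync treats trailing slashes specially, so x copies the folder itself into ~/folder, while x/ copies only its contents.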
Rsync’s claim to fame is the delta-transfer algorithm. When transferring some file, rsync can calculate the difference between the origin’s and the remote’s versions of the file. Rsync will only transfer this delta (the updated portions) over the wire. This, of course, speeds up transfers. The speedup is most apparent with large folders, where deltas are recursively computed for each member of the folder, such that only new changes across the entire folder are sent over the wire. These deltas can even be compressed, with a choice of compression algorithms such as zstd or lz4, to make the transfers even smaller.
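As an example, something like the following should work, assuming rsync ≥ 3.2 on both ends (the version that introduced --compress-choice):
rsync -az --compress-choice=zstd x user@remote:~/folder
Here -z enables compression and --compress-choice picks the algorithm; older versions fall back to plain zlib.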
You can see clearly why rsync needs to be installed on both sides of the transfer: the program both computes a diff between the origin’s and the remote’s versions of a file and applies that diff on the remote. Having the program on both sides also means transfers can be paused or dropped halfway, then resumed later: figuring out what’s left to transfer of a file is the same as computing the remaining delta to send.
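A minimal sketch of a resumable transfer, again with the hypothetical names from earlier:
rsync -a --partial --progress x user@remote:~/folder
--partial keeps half-transferred files around instead of deleting them, so if the connection dies, re-running the same command picks up roughly where it left off. (-P is the common shorthand for --partial --progress.)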
You may say that this would’ve been useful in ye olde days when internet speeds were very slow, while nowadays internet speeds and storage sizes are so large that this is a negligible speed increase. I argue that it doesn’t matter if internet speeds are faster; transferring a 1GB folder will always be noticeably slower than transferring the 10MB delta.
[1] This seems pointless, but with delta transfer it’s actually quite fast at doing rather large copies.