User:Imz/research on HOW TO backup an IMAP account

From ALT Linux Wiki

(Better and worse ways to back up an IMAP account: a survey of the available options.)

I want to maintain a backup of an "IMAP account" (I mean the collection of mailboxes/folders accessed via IMAP by a single user on a remote server), with history.

So I'm assessing the ways of doing this and choosing the best combination of tools and of individual decisions about the subtasks of this larger task.

(I've also:

Subtasks and goals/desirable requirements

Tracking the history

I want to be able to revert the mailboxes to a certain past state, or to inspect past states, because I do not want to rely on trust that no valuable message ever gets deleted by accident.

-- I'll probably use git to store the historical changes, since it is a robust tool that I am familiar with.

Alternatives:

  • "darcs" might suit the purpose well, perhaps better than git. Its model is the tracking of sets of commutative changes (cf. sets of messages: deletions and additions commute), with one repository per branch (cf.: we do not need several branches for a mail backup; but when it comes to #tracking the authorship of the changes, we might indeed want one branch per person (i.e., one repository per person, as in darcs), and these branches will need to be synced). Which one (git or darcs) copes better with the collisions that can happen to messages (where a message is deleted or added as a whole)?
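To make the "set of messages" view concrete, here is a minimal Python sketch. The function names and the sample file names are made up for illustration; a snapshot is assumed to be simply the set of Maildir file names in a folder, and a change between two snapshots is a pair of sets (added, deleted).

```python
def message_key(filename):
    """Drop the Maildir ':2,FLAGS' suffix, so that a flag change is not seen as a new message."""
    return filename.split(":2,", 1)[0]


def diff_snapshots(old, new):
    """Return (added, deleted) message keys between two snapshots (sets of file names)."""
    old_keys = {message_key(name) for name in old}
    new_keys = {message_key(name) for name in new}
    return new_keys - old_keys, old_keys - new_keys


# Hypothetical snapshots of one folder on two consecutive days.
monday = {"1001.host:2,S", "1002.host:2,"}
tuesday = {"1001.host:2,RS", "1003.host:2,"}
added, deleted = diff_snapshots(monday, tuesday)
print(sorted(added), sorted(deleted))
```

Since the result consists of two disjoint sets, the order in which the additions and deletions are applied does not matter, which is the commutativity referred to above.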

Merging independent backups

(In other words, it's almost merging intermediate snapshots into the history.)

The situation: one person did a backup at one place (repository) at a certain moment, and another person did a backup at another copy of the repo at another moment.

The maximum information is the union of the information in the two backups. But how can the histories be merged correctly with the chosen history-tracking tool?
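As for the content alone (not the history, which is the open question here), the union can be sketched in Python as follows. This is only an illustration under the assumption that two messages are "the same" when their raw bytes are identical; the function names and sample messages are made up.

```python
import hashlib


def index_by_digest(messages):
    """Map the SHA-256 digest of each raw message (bytes) to the message itself."""
    return {hashlib.sha256(raw).hexdigest(): raw for raw in messages}


def merge_backups(backup_a, backup_b):
    """Union of two backups; byte-identical messages are stored only once."""
    merged = index_by_digest(backup_a)
    merged.update(index_by_digest(backup_b))
    return merged


# Hypothetical backups made by two people at different moments.
alice = [b"Subject: one\r\n\r\nfirst", b"Subject: two\r\n\r\nsecond"]
bob = [b"Subject: two\r\n\r\nsecond", b"Subject: three\r\n\r\nthird"]
merged = merge_backups(alice, bob)
print(len(merged))
```

The result does not depend on the order of the two arguments, so at the level of content the merge is unproblematic; the difficulty lies in recording it sensibly in the history.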

Git ...

Alternatives:

darcs ...

The format for storing the mailboxes

For git to handle and show the history in the most sensible way, it is very desirable to use a format that stores one message per file.

-- Maildir is such a well-known format. Moreover[???], the content of a message is not allowed to change in IMAP or Maildir: the meta-information is stored in the filenames, which will make the history tracking (the git way) especially lucid.

Contra: Maildir might not be supported by all the software.
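To show what "meta-information in the filenames" means in practice, here is a small Python sketch that splits a Maildir file name into its unique part and its flags. The flag letters are the standard Maildir ones; the function name and the sample file name are made up for illustration.

```python
# The standard single-letter flags of the Maildir ":2," info suffix.
MAILDIR_FLAGS = {
    "D": "draft",
    "F": "flagged",
    "P": "passed",
    "R": "replied",
    "S": "seen",
    "T": "trashed",
}


def split_maildir_name(filename):
    """Split 'unique:2,FLAGS' into (unique, sorted list of flag names)."""
    unique, sep, flags = filename.partition(":2,")
    if not sep:
        return unique, []  # no info suffix, which is typical for files in new/
    return unique, sorted(MAILDIR_FLAGS[f] for f in flags if f in MAILDIR_FLAGS)


print(split_maildir_name("1204680122.27c448.host:2,RS"))
```

With such a layout, marking a message as read shows up in git as a rename of an unchanged file, not as a change of content.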

Alternatives:

  • the formats that are not "one file per message" (mbox, etc.) -- not desirable;
  • the [??? mix] format: although it can use multiple files per mailbox, it arbitrarily groups several messages into one file [1], which is not nice for tracking the history in the most lucid way with git.

Transferring the mailboxes from the IMAP server

Tools that could do that (listed in [2] and found through other Google searches):

Forcing read-only access

It's desirable to ensure that the access to the server is effectively read-only; that's important if the tool does bi-directional sync.

...
movemail
(yes, unidirectional and has an option to keep messages remotely)
mailutil transfer
(yes)
imapsync
(yes?)
offlineimap
no (the feature has been requested)
...
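Independently of the tools above, read-only access can also be requested at the protocol level. The following Python sketch uses the standard imaplib module: select(..., readonly=True) opens the folder with EXAMINE instead of SELECT, and BODY.PEEK[] fetches a message without setting its \Seen flag. The function name, host and credentials are placeholders, and this is only an illustration, not a replacement for the tools being compared.

```python
import imaplib  # only needed for the usage sketch in the comment below


def fetch_mailbox_readonly(conn, mailbox="INBOX"):
    """Return {uid: raw message bytes} without modifying the mailbox.

    conn is an already authenticated imaplib.IMAP4 (or IMAP4_SSL) object.
    """
    # readonly=True makes imaplib send EXAMINE instead of SELECT.
    status, _ = conn.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError("cannot open %s read-only" % mailbox)
    status, data = conn.uid("SEARCH", "ALL")
    messages = {}
    for uid in data[0].split():
        # BODY.PEEK[] fetches the message without setting the \Seen flag.
        status, parts = conn.uid("FETCH", uid, "(BODY.PEEK[])")
        messages[int(uid)] = parts[0][1]
    return messages


# Usage sketch (hypothetical host and credentials):
# conn = imaplib.IMAP4_SSL("imap.example.org")
# conn.login("user", "password")
# backup = fetch_mailbox_readonly(conn, "INBOX")
```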

Transferring the whole account (many folders at once, with one command)

...
movemail
(no)
mailutil transfer
yes
imapsync
(?)
offlineimap
yes
...

Being able to write to the preferred format -- Maildir

...
movemail
(yes)
mailutil transfer
yes, with a patch -- there are several variants of a Maildir patch for the c-client library:
imapsync
(indirectly yes) (a feature request to do it in a cleaner way)
offlineimap
yes, it's even the only supported format (also writing to IMAP is supported)
...

Partial transfers (only updates) of the mailboxes

Obviously, when maintaining the local backup copy of a remote IMAP mailbox with a certain utility, I want the second and later runs of this utility to transfer only the new and updated messages.

...
movemail
no: in my case, "in reality, 'movemail' creates second copies of the already downloaded messages. <...> 'movemail' has an option with a semantics close to what I want; <...> So, in principle, a mechanism analogous to UIDL is present in IMAP, and is even more efficient than its POP counterpart, so 'movemail' could use it to download only new messages. But in practice, this is not so."
(The mechanism meant is unique ID (UID[3]) of a message in a folder on an IMAP server.)
"Perhaps, storing UIDs is even a problem of the design of the Maildir format, and that makes it more difficult to implement what I wanted" -- [4] -- "But there still seem to exist tools which work with Maildirs and that do track UIDs: http://isync.sourceforge.net/ ; another one: http://mailsync.sourceforge.net/ , but this one tracks Message-ID fields instead" in order to allow partial updating transfers.
mailutil transfer
no: "it also duplicates the messages <...>, but that simply seems to conform to how it was intended to work."
imapsync
(yes, by means other than UID)
offlineimap
(yes, by means of UID)
...
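The UID-based kind of partial update can be sketched in a few lines of Python. This is a generic illustration, not the code of any of the tools above; the function name is made up, and it assumes that the backup remembers the UIDs it has already downloaded together with the folder's UIDVALIDITY value.

```python
def uids_to_fetch(server_uids, local_uids, server_uidvalidity, local_uidvalidity):
    """Return the sorted UIDs that still have to be downloaded.

    If UIDVALIDITY has changed, the server has renumbered the folder and
    the locally recorded UIDs mean nothing, so everything is fetched again.
    """
    if local_uidvalidity is not None and server_uidvalidity != local_uidvalidity:
        return sorted(server_uids)
    return sorted(set(server_uids) - set(local_uids))


# Normal run: only the new message is transferred.
print(uids_to_fetch([1, 2, 3, 5], [1, 2, 3], 42, 42))
# The server has reset its UIDs: the whole folder is transferred again.
print(uids_to_fetch([1, 2, 3, 5], [1, 2, 3], 43, 42))
```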

Viability under incorrect UID support on the server

...
movemail
N/A
mailutil transfer
N/A
imapsync
?
offlineimap
? (probably not quite viable w.r.t. partial updates: if the UIDs on the server get reset, it will re-download the whole folder (all the messages) with the new UIDs)[1]
...
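When the UIDs cannot be trusted, an alternative is to recognize already downloaded messages by their Message-ID header (the approach attributed to mailsync above). A minimal Python sketch with the standard email module follows; the function names and sample messages are made up, and messages without a Message-ID are conservatively treated as new.

```python
from email import message_from_bytes


def message_id(raw):
    """Return the Message-ID header of a raw message, or None if it has none."""
    value = message_from_bytes(raw)["Message-ID"]
    return str(value).strip() if value else None


def new_messages(server_messages, known_ids):
    """Yield the raw messages whose Message-ID is not in known_ids."""
    for raw in server_messages:
        mid = message_id(raw)
        if mid is None or mid not in known_ids:
            yield raw


# Hypothetical folder content; the first message is already in the backup.
raws = [b"Message-ID: <a@example.org>\r\n\r\nbody", b"Message-ID: <b@example.org>\r\n\r\nbody"]
print(list(new_messages(raws, {"<a@example.org>"})))
```

The price is that at least the headers of all messages have to be examined on every run, whereas with reliable UIDs a list of numbers is enough.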

Speed

...
movemail
quite slow (since partial updates are not available)
mailutil transfer
quite slow (since partial updates are not available)
imapsync
?
offlineimap
said to be quite fast
...

Accessing the backup data

The two big options:

Working with the local filesystem raises the issue of accessing the meta-information associated with the messages: the backup tool and the MUA might use different file formats for storing the meta-information; also, there will probably be no simple way of connecting some MUA's internal meta-information (stored internally in the MUA's "cache") associated with the messages on the old IMAP server with their backup copies. (Simply switching the IMAP server from the old one to a new one preserving the UIDs might be a solution to the second issue.)

Saving keywords, flags

(I'm almost sure that every tool does this; that is why this section comes after the others: I don't worry much about this issue.)

...
movemail
(?)
mailutil transfer
yes (there is a command option for this)
imapsync
(?)
offlineimap
(?)
...

Making the new server serve the old keywords

uw-imap
(???)
dovecot
uses a certain means to store the extra keywords in Maildirs[5][6], so if one has saved them, one could prepare them for dovecot.

Saving UIDs (to seamlessly switch to a backup IMAP server)

...
movemail
no, as follows from my experiments with movemail+Maildir
mailutil transfer
no, because there is no support for UIDs in the Maildir driver available through Chappa's patch. But possibly yes with other patches? This must be tested.
imapsync
?; maybe no, because IMAP doesn't support setting UIDs AFAIU (remember that imapsync works via IMAP on both ends: it reads via IMAP and writes via IMAP); maybe yes: perhaps it keeps a correspondence list of the UIDs...[2]
offlineimap
yes, in the filenames inside Maildir[7]; with some additional information in a special "metadata" directory (e.g., "UID validities"--does a new IMAP server need this?..)
...
http://isync.sourceforge.net/
(yes[8])
http://mailsync.sourceforge.net/
(no, "this one tracks Message-ID fields instead"[9])
...
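If the UIDs are kept in the filenames, they can be read back with a simple pattern. The Python sketch below assumes the ",U=<uid>" marker that, as far as I understand, offlineimap puts into its Maildir file names; both this layout and the sample file name are assumptions to be checked against a real offlineimap Maildir, and the function name is made up.

```python
import re

# Assumed marker: ",U=<digits>" somewhere in the file name.
UID_IN_NAME = re.compile(r",U=(\d+)")


def uid_from_filename(filename):
    """Return the UID embedded in a Maildir file name, or None if absent."""
    match = UID_IN_NAME.search(filename)
    return int(match.group(1)) if match else None


print(uid_from_filename("1200000000_0.4242.host,U=17,FMD5=abc:2,S"))
```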

Making the new server serve the old UIDs

uw-imap
the Maildir patch (by Chappa) won't do this, because it doesn't try at all to preserve the UIDs between sessions: it doesn't store them [10]
dovecot
has a certain method to store UIDs (in a special file) [11][12]; so if one has the UIDs, one can translate the list of the UIDs into the format used by dovecot.
courier-imap
also stores the UIDs in a special file of its own[13][14], similarly to dovecot (but with differences in the format).
uw-imap with another Maildir patch[15]
handles UIDs and stores them in the filename before the flags (extending the standard Maildir format), so in principle it could also be a solution if I want a server to serve the backup copy.
uw-imap with yet another Maildir patch[16]
also stores the UIDs in the filenames, but I guess differently from the previous patch, and the author considers switching to a format compatible with courier-imap.
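The translation of a saved UID list into dovecot's format, mentioned above, could look roughly like the Python sketch below. It assumes the version 3 layout of the dovecot-uidlist file as I understand it from the Dovecot documentation: a header line "3 V<uidvalidity> N<next uid>", followed by one "<uid> :<file name>" line per message. Real files also carry a mailbox GUID field in the header, which is left out here, and I have not tested whether dovecot accepts the result. The function name and sample values are made up.

```python
def render_uidlist(uidvalidity, uid_to_filename):
    """Render the text of a dovecot-uidlist file (assumed version 3 layout)."""
    uids = sorted(uid_to_filename)
    next_uid = uids[-1] + 1 if uids else 1
    lines = ["3 V%d N%d" % (uidvalidity, next_uid)]
    lines += ["%d :%s" % (uid, uid_to_filename[uid]) for uid in uids]
    return "\n".join(lines) + "\n"


# Hypothetical saved UIDs for two messages of one folder.
print(render_uidlist(1275660208, {17: "1200000000.M1P2.host", 18: "1200000050.M3P4.host"}))
```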

Restoring the content on the primary IMAP server

Restoring with the UIDs preserved must be impossible if the access is only through IMAP, but otherwise one can, of course, use the following tools for this task:

...
offlineimap
How can one do this? (todo: study the experience of other people! How does sync in offlineimap work at all?)
...

Conclusion

???

A presentation for technical non-specialists

[???] (todo: explain how it will function and the main features)

Additional features and more uses

Syncing the copies of a mailbox

(which are all being used by users) at different locations (hosting, office, home):

Which tools can be used for this?
  • offlineimap,
  • imapsync(?),
  • Lotus Domino(?),
  • ..., ???, ... .

How are collisions (conflicts) resolved?

...
offlineimap
todo: how does sync in offlineimap work at all: what are the rules that determine the direction? How are collisions (conflicts) resolved? (Perhaps, if the locally stored statuses of IMAP folders are cleared, offlineimap will think that the messages have been added locally, and will start uploading in the reverse direction?..)
...
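Until this is studied, here is a generic Python sketch of how a two-way sync can decide the direction from three sets of message identifiers: the local copy, the remote copy, and a "status" record of what both sides contained after the previous sync. This is an assumption about how such tools may work in general, not offlineimap's actual algorithm; the function name and sample values are made up.

```python
def plan_sync(local, remote, status):
    """Plan a two-way sync from three sets of message identifiers."""
    return {
        # Present on one side only and unknown to the status: new, so copy it.
        "upload": local - status - remote,
        "download": remote - status - local,
        # Known to the status but gone from one side: deleted there, so
        # delete it on the other side as well.
        "delete_remote": (status - local) & remote,
        "delete_local": (status - remote) & local,
    }


status = {1, 2, 3}
local = {1, 2, 4}   # 3 was deleted locally, 4 was added locally
remote = {1, 3, 5}  # 2 was deleted remotely, 5 was added remotely
print(plan_sync(local, remote, status))
# With the status cleared, nothing is deleted and everything missing is copied.
print(plan_sync(local, remote, set()))
```

Under this scheme, clearing the status record would indeed make every message that exists on one side only look newly added there, which matches the guess in the parentheses above.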

How could collisions (conflicts) be resolved with a VCS?

???

tracking the authorship of the changes

Similar to #Syncing the copies of a mailbox, but we want the VCS (git, darcs) to track the authors by distinguishing the sources (locations; locations are associated with users) the changes are pulled from.

How good will git (or darcs) be in resolving collisions (conflicts) in this case?

Syncing only a part of the mail

This might be useful if there is a quota on an Internet-connected IMAP server, and we want to make reasonable use of it, say, by keeping only the most recent part of the messages of our mail account, or by keeping only some of the folders there.

...
offlineimap
Can it do something like this?.. (todo: find out!)
...
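At the protocol level, "the most recent part of the messages" can be selected with the IMAP SEARCH key SINCE, which takes a date in the DD-Mon-YYYY form. The Python sketch below only builds that search criterion; the function name is made up, and whether offlineimap offers anything equivalent remains to be found out.

```python
import datetime

# IMAP month names are always English, regardless of the locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def since_criterion(days, today=None):
    """Build an IMAP SEARCH criterion for messages of the last `days` days."""
    today = today or datetime.date.today()
    cutoff = today - datetime.timedelta(days=days)
    return "SINCE %d-%s-%d" % (cutoff.day, MONTHS[cutoff.month - 1], cutoff.year)


print(since_criterion(30, datetime.date(2010, 3, 15)))

# Usage sketch with an authenticated imaplib connection `conn`:
# status, data = conn.uid("SEARCH", since_criterion(30))
```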

Reading from (backing up) a Gmail account

Uploading more folders

  • "mailutil transfer" must be a good tool for this.

Footnotes

  1. But how does it do IMAP-to-IMAP sync? Does it track the UIDs separately and still rely on correct UIDs?
  2. You see, offlineimap, which is based on UIDs, still has support for IMAP-to-IMAP operation.