User:Imz/research on HOW TO backup an IMAP account
(Better and worse ways to back up an IMAP account: a survey of the available options.)
I want to maintain a backup of an "IMAP account" (by which I mean the collection of mailboxes/folders that a single user accesses via IMAP on a remote server), with history.
So I'm assessing the ways of doing this, looking for the best combination of individual decisions (for the subtasks of this larger task) and tools.
(I've also:
- used these notes to write an answer to "Any file-based email client application? (With the purpose of putting the mailbox under VCS.)";
- been busy with a more specific, similar task (migrating from MS Exchange to dovecot+SOGo) and written down some notes in Russian: ru:Участник:IvanZakharyaschev/Репликация почтового ящика ("Mailbox replication").)
Subtasks and goals/desirable requirements
Tracking the history
I want to be able to revert to a certain past state of the mailboxes, or to inspect past states, because I do not want to rely on trust that no valuable message will ever be accidentally deleted.
-- I'll probably use git to store the history, since it is a robust tool that I'm familiar with.
Alternatives:
- "darcs" might suit the purpose well. Its model may fit our task better: it tracks sets of commutative changes (cf. sets of messages: deletions and additions commute), and it has one repository per branch (cf. we do not need several branches for a mail backup; but when it comes to #tracking the authorship of the changes, we might indeed want one branch per person, i.e., one repository per person as in darcs, and these branches will need to be synced). Which one (git or darcs) will cope better with the collisions that can happen to messages (where a message is deleted or added as a whole)?
Merging independent backups
(In other words, this is almost the same as merging intermediate snapshots into the history.)
The situation: one person made a backup into one repository at a certain moment, and another person made a backup into another copy of the repository at a different moment.
The maximum information is the union of the two. But how does one merge the histories correctly with the chosen history-tracking tool?
Git ...
Alternatives:
darcs ...
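The "union" of two backups can be made concrete by viewing a snapshot as a set of Maildir filenames, which is close to darcs's view of commuting changes. The following minimal Python sketch (all names are hypothetical; it assumes the standard ":" info separator in Maildir filenames) merges two snapshots by the unique part of each filename. The result does not depend on the merge order; a message that shows up under two different filenames is exactly the case where the snapshots disagree about its flags, and a real conflict rule would be needed there.

```python
def unique_name(filename: str) -> str:
    """Strip the Maildir info suffix (":2,FLAGS"), leaving the unique part."""
    return filename.split(":", 1)[0]


def merge_snapshots(a: set[str], b: set[str]) -> dict[str, set[str]]:
    """Union of two snapshots, keyed by each message's unique name.

    Each value is the set of filenames (i.e. flag states) seen for that
    message; more than one entry means the snapshots disagree on flags.
    """
    merged: dict[str, set[str]] = {}
    for name in a | b:
        merged.setdefault(unique_name(name), set()).add(name)
    return merged


snap1 = {"100.M1.host:2,S", "101.M2.host:2,"}
snap2 = {"101.M2.host:2,RS", "102.M3.host:2,"}
merged = merge_snapshots(snap1, snap2)
# "101.M2.host" maps to two filenames: a flag conflict to resolve.
```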
The format for storing the mailboxes
For git to handle and show the history in the most sensible way, it is highly desirable to use a format that stores one message per file.
-- Maildir is such a well-known format. Moreover[???], in IMAP and in Maildir the content of a message never changes: the meta-information (flags) is stored in the filenames -- this makes tracking the history (the git way) especially lucid.
Contra: not all software supports Maildir.
Alternatives:
- the non-"1 file per 1 msg" formats (mbox, etc.) -- not desirable;
- the [??? mix] format: although it can use multiple files per mailbox, it arbitrarily groups several messages into one file [1], which prevents git from tracking the history in the most lucid way.
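Because Maildir keeps the flags in the filename, marking a message as read is a rename of an unchanged file, which git records as a pure rename. A minimal Python sketch of reading that meta-information back, assuming the standard ":2,FLAGS" suffix from the Maildir specification (the helper names are hypothetical):

```python
# Standard Maildir flag letters (the "2," info part), in ASCII order.
MAILDIR_FLAGS = {
    "D": "draft", "F": "flagged", "P": "passed",
    "R": "replied", "S": "seen", "T": "trashed",
}


def parse_maildir_name(filename: str) -> tuple[str, str]:
    """Split a Maildir filename into (unique part, flag letters).

    Assumes the usual ":" separator and the "2," info prefix; messages
    in new/ normally have no info part at all.
    """
    unique, sep, info = filename.partition(":")
    if not sep or not info.startswith("2,"):
        return unique, ""
    return unique, info[2:]


def describe_flags(flags: str) -> list[str]:
    """Name the standard flags; other letters (e.g. keywords) pass through."""
    return [MAILDIR_FLAGS.get(letter, letter) for letter in flags]


# Marking a message as read only renames the file; its content is unchanged.
before, after = "1234.M5P6.host:2,R", "1234.M5P6.host:2,RS"
assert parse_maildir_name(before)[0] == parse_maildir_name(after)[0]
print(describe_flags(parse_maildir_name(after)[1]))  # ['replied', 'seen']
```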
Transferring the mailboxes from the IMAP server
Tools that could do this (listed in [2] or found through other Google searches):
- ...
- movemail from mailutils (present in Sisyphus!)
- mailutil from uw-imap/pine/alpine -- http://www.washington.edu/imap/ (present in Sisyphus! (in pine))
- offlineimap (present in Sisyphus!)
- imapsync (present in Sisyphus!)
- http://blog.hoopycat.com/2009/07/imap2maildir-a-tool-for-mirroring-imap-t , http://github.com/rtucker/imap2maildir
- http://en.wikipedia.org/wiki/Getmail
- fetchmail (present in Sisyphus!)
- ...
- isync (not present in Sisyphus!)
- mailsync (not present in Sisyphus!)
- ...
- http://packages.debian.org/unstable/mail/imapcopy
- ... .
Forcing read-only access
It is desirable to ensure that access to the server is effectively read-only; this is especially important if the tool does bidirectional sync.
- ...
- movemail: (yes, unidirectional, and has an option to keep the messages on the server)
- mailutil transfer: (yes)
- imapsync: (yes?)
- offlineimap: no (the feature has been requested)
- ...
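Independently of the tool, read-only access can be enforced at the protocol level: EXAMINE opens a folder read-only, and fetching with BODY.PEEK[] does not set the \Seen flag (both from RFC 3501). A minimal sketch with Python's standard imaplib, whose select(..., readonly=True) sends EXAMINE; the function name is hypothetical, and the connection and login are assumed to be set up elsewhere.

```python
import imaplib


def fetch_folder_readonly(conn: imaplib.IMAP4, folder: str) -> dict[bytes, bytes]:
    """Download every message of one folder without modifying the server.

    Assumption: imaplib does not quote mailbox names itself, so a name with
    spaces must be passed already quoted, e.g. '"Sent Items"'.
    """
    typ, _ = conn.select(folder, readonly=True)  # EXAMINE, not SELECT
    if typ != "OK":
        raise RuntimeError(f"cannot examine {folder!r}")
    typ, data = conn.uid("SEARCH", None, "ALL")
    if typ != "OK":
        raise RuntimeError(f"UID SEARCH failed in {folder!r}")
    messages: dict[bytes, bytes] = {}
    for uid in data[0].split():
        # BODY.PEEK[] (unlike BODY[]) leaves the \Seen flag untouched.
        typ, parts = conn.uid("FETCH", uid, "(BODY.PEEK[])")
        for part in parts:
            if isinstance(part, tuple):  # (response line, message bytes)
                messages[uid] = part[1]
    return messages
```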
Transferring the whole account (many folders at once, with one command)
- ...
- movemail: (no)
- mailutil transfer: yes
- imapsync: (?)
- offlineimap: yes
- ...
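Transferring the whole account starts with enumerating its folders via the IMAP LIST command. A minimal sketch that parses the lines returned by imaplib's IMAP4.list() and keeps only folders that can be opened (those without the \Noselect attribute). It is an assumption-laden simplification: names sent as IMAP literals are skipped, and non-ASCII names (modified UTF-7) are not decoded.

```python
import re
from typing import List, Optional, Tuple

# One LIST reply line as returned by imaplib, e.g.
#   b'(\\HasNoChildren) "/" "INBOX"'
LIST_RE = re.compile(rb'\((?P<flags>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.+)')


def parse_list_line(line: bytes) -> Tuple[List[str], Optional[str], str]:
    """Return (attributes, hierarchy delimiter or None, folder name)."""
    m = LIST_RE.match(line)
    if m is None:
        raise ValueError(f"unexpected LIST line: {line!r}")
    flags = m.group("flags").decode().split()
    delim = m.group("delim")
    delim_str = None if delim == b"NIL" else delim[1:-1].decode()
    name = m.group("name")
    if name.startswith(b'"') and name.endswith(b'"'):
        # Quoted string: drop the quotes and undo backslash escapes.
        name = re.sub(rb'\\(.)', rb'\1', name[1:-1])
    return flags, delim_str, name.decode()


def selectable_folders(lines: list) -> List[str]:
    """Names of all folders that can be opened (skips \\Noselect ones)."""
    result = []
    for line in lines:
        if not isinstance(line, bytes):
            continue  # names sent as IMAP literals (tuples) are not handled
        flags, _, name = parse_list_line(line)
        if "\\noselect" not in {f.lower() for f in flags}:
            result.append(name)
    return result
```

Each returned name could then be fed to a per-folder download routine, so that one command covers the whole account.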
Being able to write to the preferred format -- Maildir
- ...
- movemail: (yes)
- mailutil transfer: yes, with a patch; there are several variants of a Maildir patch for the c-client library:
  - Eduardo Chappa's one[???] (on the UID support in it, see below)
  - ...'s one[???]
  - ....
- imapsync: (indirectly yes) (there is a feature request to do it in a cleaner way)
- offlineimap: yes; it is even the only supported local format (writing to IMAP is also supported)
- ...
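For comparison, writing into a Maildir is also easy to do directly with Python's standard mailbox module. A minimal sketch (the helper name and its parameters are hypothetical) that stores raw message bytes in cur/ with the given flags, optionally in a Maildir++ subfolder:

```python
import mailbox
from typing import Optional


def store_in_maildir(root: str, raw: bytes, flags: str = "",
                     folder: Optional[str] = None) -> str:
    """Write one raw message into a Maildir (created if missing); return its key.

    The message goes to cur/ with a ":2,FLAGS" suffix. If a folder name is
    given, it is created as a Maildir++ subfolder (a ".folder" directory).
    """
    md = mailbox.Maildir(root, factory=None, create=True)
    if folder is not None:
        md = md.get_folder(folder) if folder in md.list_folders() else md.add_folder(folder)
    msg = mailbox.MaildirMessage(raw)
    msg.set_subdir("cur")
    msg.set_flags(flags)  # letters are stored sorted, e.g. "SR" -> "RS"
    return md.add(msg)
```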
Partial transfers (only updates) of the mailboxes
Obviously, when maintaining a local backup copy of a remote IMAP mailbox with some utility, I want the second and subsequent runs of the utility to transfer only new and updated messages.
- ...
- movemail: no. In my case, "in reality, "movemail" creates second copies of the already downloaded messages. <...> "movemail" has an option with a semantics close to what I want; <...> So, in principle, a mechanism analogous to UIDL is present in IMAP, and is even more efficient than its POP counterpart, so "movemail" could use it to download only new messages. But in practice, this is not so."
  - (The mechanism meant is the unique ID (UID[3]) of a message in a folder on an IMAP server.)
  - "Perhaps, storing UIDs is even a problem of the design of the Maildir format, and that makes it more difficult to implement what I wanted" -- [4] -- "But there still seem to exist tools which work with Maildirs and that do track UIDs: http://isync.sourceforge.net/ ; another one: http://mailsync.sourceforge.net/ , but this one tracks Message-ID fields instead" -- in both cases, to allow partial (incremental) transfers.
- mailutil transfer: no: "it also duplicates the messages <...>, but that simply seems to conform to how it was intended to work."
- imapsync: (yes, by means other than UIDs)
- offlineimap: (yes, by means of UIDs)
- ...
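UID-based incremental transfer only works together with UIDVALIDITY, which the server reports when a folder is selected (RFC 3501): if it changes, all previously stored UIDs for that folder are meaningless. A minimal sketch of the decision a UID-tracking tool has to make on each run; the state layout and names are hypothetical, not offlineimap's actual metadata format.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class FolderState:
    """What a local backup remembers about one remote folder (hypothetical)."""
    uidvalidity: int
    uids: Set[int] = field(default_factory=set)


def plan_update(state: Optional[FolderState], uidvalidity: int,
                server_uids: Set[int]) -> Tuple[bool, Set[int], Set[int]]:
    """Return (full_resync, uids_to_fetch, uids_gone_from_server).

    With no stored state, or a changed UIDVALIDITY, every message must be
    fetched again. Otherwise only UIDs unknown locally are fetched, and
    UIDs missing on the server were expunged there.
    """
    if state is None or state.uidvalidity != uidvalidity:
        return True, set(server_uids), set()
    return False, server_uids - state.uids, state.uids - server_uids
```

The full-resync branch is what makes UID-based tools re-download a whole folder when the server resets its UIDs.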
Viability under incorrect UID support on the server
- ...
- movemail: N/A
- mailutil transfer: N/A
- imapsync: ?
- offlineimap: ? (probably not quite viable w.r.t. partial updates: if the UIDs on the server get reset, it will re-download the whole folder, i.e. all the messages, with the new UIDs)[1]
- ...
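A way to stay viable when the server's UIDs cannot be trusted is to identify messages by content, as mailsync does with Message-ID fields. A minimal sketch of such a scheme; it is hypothetical and is not claimed to be what imapsync or mailsync actually do. It uses the Message-ID header when present and otherwise a hash of the given bytes (which, in a real tool, might be only the fetched header, to avoid downloading full bodies). Note that distinct messages can share a Message-ID (e.g. a copy in Sent and a copy from a mailing list), which this sketch would treat as one.

```python
import hashlib
from email import message_from_bytes


def message_key(raw: bytes) -> str:
    """Identify a message without relying on server UIDs."""
    msg = message_from_bytes(raw)
    message_id = msg.get("Message-ID")
    if message_id:
        return "mid:" + message_id.strip()
    # Fallback for messages without a Message-ID header.
    return "sha256:" + hashlib.sha256(raw).hexdigest()


def select_missing(local_keys: set, remote: list) -> list:
    """Return the (uid, raw) pairs whose key is not stored locally yet."""
    return [(uid, raw) for uid, raw in remote if message_key(raw) not in local_keys]
```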
Speed
- ...
- movemail: quite slow (since partial updates are not available)
- mailutil transfer: quite slow (since partial updates are not available)
- imapsync: ?
- offlineimap: said to be quite fast
- ...
Accessing the backup data
There are two main options:
- an MUA that accesses the Maildir on the local filesystem (the capabilities of individual MUAs need to be studied);
- serving the backup through a new IMAP server.
Working with the local filesystem raises the issue of accessing the meta-information associated with the messages: the backup tool and the MUA may use different file formats for it. Also, there will probably be no simple way to connect the MUA's internal meta-information (kept in the MUA's "cache") about the messages on the old IMAP server with their backup copies. (Switching from the old IMAP server to a new one that preserves the UIDs might solve this second issue.)
Saving keywords, flags
(I'm almost sure every tool does this; that's why this section comes after the others: I don't worry much about this issue.)
- ...
- movemail: (?)
- mailutil transfer: yes (there is a command-line option for this)
- imapsync: (?)
- offlineimap: (?)
- ...
Making the new server serve the old keywords
- uw-imap: (???)
- dovecot: uses a particular mechanism to store extra keywords in Maildirs[5][6], so if one has saved the keywords, one could convert them for dovecot.
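A sketch of that conversion, based on my reading of the Dovecot documentation ([5][6]) and to be checked against it before use: keywords appear in the Maildir filename as lowercase letters a..z after the ":2," flags, and a per-folder dovecot-keywords file maps each letter's index (a = 0) to a keyword name, one "index name" pair per line, which limits a folder to 26 keywords. All function names are hypothetical.

```python
import string
from typing import Dict, Iterable, Tuple


def dovecot_keyword_mapping(all_keywords: Iterable[str]) -> Tuple[Dict[str, str], str]:
    """Assign letters a..z to keywords; return (keyword -> letter, file text).

    Assumed dovecot-keywords format: one "<index> <keyword>" line per keyword.
    """
    names = sorted(set(all_keywords))
    if len(names) > 26:
        raise ValueError("only 26 keyword letters (a..z) are available")
    mapping = {kw: string.ascii_lowercase[i] for i, kw in enumerate(names)}
    text = "".join(f"{i} {kw}\n" for i, kw in enumerate(names))
    return mapping, text


def with_keywords(filename: str, keywords: Iterable[str], mapping: Dict[str, str]) -> str:
    """Add keyword letters to the ":2," flags of a Maildir filename (ASCII order)."""
    unique, _, info = filename.partition(":")
    flags = info[2:] if info.startswith("2,") else ""
    letters = {mapping[kw] for kw in keywords}
    return f"{unique}:2,{''.join(sorted(set(flags) | letters))}"
```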
Saving UIDs (to seamlessly switch to a backup IMAP server)
- ...
- movemail: no, as follows from my experiments with movemail+Maildir
- mailutil transfer: no, because the Maildir driver available through Chappa's patch does not support UIDs. But possibly yes with other patches? This must be tested.
- imapsync: ?; maybe no, because AFAIU IMAP does not allow setting UIDs (remember that imapsync works via IMAP on both ends: it reads via IMAP and writes via IMAP); maybe yes: perhaps it keeps a correspondence list of the UIDs...[2]
- offlineimap: yes, in the filenames inside the Maildir[7], with some additional information in a special "metadata" directory (e.g., "UID validities"; does a new IMAP server need these?..)
- ...
- mailsync (http://mailsync.sourceforge.net/): (no, "this one tracks Message-ID fields instead"[9])
- ...
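If offlineimap's filenames are the source of the UIDs, they have to be extracted before they can be handed to a new server. A minimal sketch, assuming offlineimap-style names that carry ",U=<uid>" in the unique part (e.g. "1263211262_0.13049.host,U=5,FMD5=...:2,S"); the helper name is hypothetical. The resulting map is only meaningful together with the folder's UID validity, which offlineimap keeps separately in its metadata directory.

```python
import re
from typing import Dict, Iterable

UID_RE = re.compile(r",U=(\d+)")


def uid_map(filenames: Iterable[str]) -> Dict[int, str]:
    """Map UID -> filename for offlineimap-style Maildir names."""
    result: Dict[int, str] = {}
    for name in filenames:
        unique = name.split(":", 1)[0]  # ignore the ":2,FLAGS" part
        m = UID_RE.search(unique)
        if m:  # files without ",U=" (e.g. not yet uploaded) are skipped
            result[int(m.group(1))] = name
    return result
```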
Making the new server serve the old UIDs
- uw-imap: the Maildir patch (by Chappa) won't do this, because it does not try to preserve the UIDs between sessions at all; it doesn't store them [10]
- dovecot: has a particular method of storing UIDs (in a special file) [11][12]; so if one has the UIDs, one can translate the list of UIDs into the format dovecot uses.
- courier-imap: also stores the UIDs in its own special file[13][14], similarly to dovecot (but with formal differences).
- uw-imap with another Maildir patch[15]: handles UIDs and stores them in the filename before the flags (extending the standard Maildir format), so in principle it could also be a solution if I want a server to serve the backup copy.
- uw-imap with yet another Maildir patch[16]: also stores the UIDs in the filenames, but I guess differently from the previous patch; the author is considering switching to a format compatible with courier-imap.
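For dovecot, that translation could look like the following sketch. The file format here is an assumption based on my reading of the Dovecot documentation ([11][12]) and must be verified there: a version-3 dovecot-uidlist starts with a header "3 V<uidvalidity> N<next uid>" (real files may carry more header fields) followed by "<uid> :<filename>" lines in ascending UID order. The ":2,FLAGS" part is left out, since flags change over time. The function name is hypothetical; the input could be a UID -> filename map such as one extracted from offlineimap's filenames, plus the UID validity of the original folder.

```python
from typing import Dict


def dovecot_uidlist_v3(uidvalidity: int, uid_to_name: Dict[int, str]) -> str:
    """Render a (assumed) version-3 dovecot-uidlist from known UIDs."""
    next_uid = max(uid_to_name, default=0) + 1
    lines = [f"3 V{uidvalidity} N{next_uid}"]
    for uid in sorted(uid_to_name):
        base = uid_to_name[uid].split(":", 1)[0]  # drop ":2,FLAGS"
        lines.append(f"{uid} :{base}")
    return "\n".join(lines) + "\n"
```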
Restoring the content on the primary IMAP server
Preserving the UIDs must be impossible if access is only through IMAP; otherwise, one can of course use the following tools for this task:
- ...
- offlineimap: How can one do this? (todo: study the experience of other people! How does sync in offlineimap work at all?)
- ...
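Restoring through plain IMAP means APPENDing each message with its flags and an internal date (RFC 3501); the server then assigns new UIDs, which is why the UIDs cannot be preserved this way. A minimal sketch with Python's imaplib and mailbox modules (the function name is hypothetical; this is not how offlineimap does it). It maps the standard Maildir flag letters to IMAP system flags, drops "P" (no IMAP system flag) and deliberately drops "T" (trashed, i.e. \Deleted), and uses the file's mtime as the internal date.

```python
import imaplib
import mailbox

# Maildir flag letters -> IMAP system flags ("P" and "T" intentionally omitted).
IMAP_FLAGS = {"S": "\\Seen", "R": "\\Answered", "F": "\\Flagged", "D": "\\Draft"}


def restore_folder(conn: imaplib.IMAP4, folder: str, md: mailbox.Maildir) -> int:
    """APPEND every message of a local Maildir to one IMAP folder."""
    count = 0
    for key in md.keys():
        msg = md.get_message(key)  # MaildirMessage; its date is the file mtime
        flags = " ".join(IMAP_FLAGS[c] for c in msg.get_flags() if c in IMAP_FLAGS)
        date = imaplib.Time2Internaldate(msg.get_date())
        typ, _ = conn.append(folder, f"({flags})" if flags else None, date,
                             md.get_bytes(key))  # original bytes, not re-serialized
        if typ != "OK":
            raise RuntimeError(f"APPEND to {folder!r} failed")
        count += 1
    return count
```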
Conclusion
???
A presentation for technical non-specialists
[???] (todo: explain how it will function and the main features)
Additional features and more uses
Syncing the copies of a mailbox
(all of which are actively used) at different locations (hosting, office, home):
- Which tools can be used for this?
  - offlineimap,
  - imapsync(?),
  - Lotus Domino(?),
  - ..., ???, ... .
How are collisions (conflicts) resolved?
- ...
- offlineimap: todo: How does sync in offlineimap work at all: what are the rules that determine the direction? How are collisions (conflicts) resolved? (Perhaps, if the locally stored statuses of the IMAP folders are cleared, offlineimap will think that the messages have been added locally and will start uploading them in the reverse direction?..)
- ...
How could collisions (conflicts) be resolved with a VCS?
???
tracking the authorship of the changes
Similar to #Syncing the copies of a mailbox, but we want the VCS (git, darcs) to track the authors by distinguishing the sources (locations, each associated with a user) from which the changes are pulled.
How good will git (or darcs) be in resolving collisions (conflicts) in this case?
Syncing only a part of the mail
This might be useful if the Internet-connected IMAP server imposes a quota and we want to use it sensibly, say, by keeping only the most recent messages of our mail account there, or only some of the folders.
- ...
- offlineimap: Can it do something like this?.. (todo: find out!)
- ...
Reading from (backing up) a Gmail account
- offlineimap has some support.
- ...
Uploading more folders
- "mailutil transfer" should be a good tool for this.
Footnotes
- But how does it do IMAP-to-IMAP sync? Does it track the UIDs separately and still rely on correct UIDs?
- Note that offlineimap, which is based on UIDs, still supports IMAP-to-IMAP operation.