mu-index

Language: en

Version: 252159 (debian - 07/07/09)

Section: 1 (User commands)

NAME

mu-index - index the contents of a Maildir or file

SYNOPSIS

mu-index [options] [maildir]

DESCRIPTION

mu-index scans email messages and stores the information it finds. This stored information can then be used for searching emails. Usually, a set of Maildir folders is scanned recursively for all messages, but single messages can be scanned as well.

mu-index understands Maildirs as defined by Dan Bernstein for qmail. It also understands recursive Maildirs (Maildirs within Maildirs), and the VFAT-version of Maildir, as used by Tinymail/Modest.

The first run of mu-index can take some time; on the author's 4-year-old machine using mu version 0.2, scanning 10000 messages takes around 92 seconds, or about 110 messages per second. Fortunately, a full scan has to be done only once; after that, it suffices to index the changes, which is much faster.

Work on optimizing mu-index is ongoing, so future versions will hopefully be faster (note that version 0.2 is almost three times faster than version 0.1).

mu-index stores the information it finds in the mu-directory in two databases:

The first one is mu-sqlite.db-<version>, which is an SQLite3 database with metadata about the indexed messages. One can access this database with the sqlite3(1) command line tool. In the mu source code, you can find the database schema in index/storage.sql.

The second one is mu-xapian.db-<version>, which is a Xapian database and contains the content data of the indexed messages.

The database schema might change in future versions; note, however, that <version> refers to the database version, which does not necessarily follow the version of mu itself.
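Since the schema may differ between database versions, it can be safer to inspect the SQLite file than to assume particular table names. The following is a minimal sketch using Python's standard sqlite3 module; the function name list_tables is hypothetical and not part of mu, and the query only relies on SQLite's built-in sqlite_master catalog.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite database at db_path.

    Hypothetical helper for illustration; it makes no assumption about
    mu's actual schema (see index/storage.sql in the mu sources for that).
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

To use it, pass the full path of the mu-sqlite.db-<version> file in the mu home directory, with <version> replaced by whatever suffix exists on disk. Note that sqlite3.connect creates an empty file if the path does not exist.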

NOTE: It is probably not a good idea to run multiple instances of mu-index that write to the same database at the same time. No data loss should occur, but one or more of the instances might experience errors due to database locks.

Also, before indexing has completed, searches for messages may fail even if those messages have already been indexed, as some of the essential database information is only written at the end of the indexing.

OPTIONS

(NB: see CONFIGURATION as well for using the configuration file)

--maildir=, -m <maildir> set the full path to the maildir; note that you can also specify this path as a non-option argument to mu-index; if you use both, the non-option argument wins.

--quiet, -q makes mu-index not print any progress information during indexing. Progress output is on by default because indexing may take quite some time, and a silent run might confuse novice users.

General options

--home=, -h <dir> sets the mu home directory; the default is ~/.mu. This directory is where the message database is stored, as well as configuration files and logs.

--log-stderr, -s write logging information to standard error instead of to <mu-home-directory>/mu-find.log, which is the default.

--log-append, -a append to the log file instead of overwriting it for every run, which is the default.

--debug, -d add a lot of logging for debugging purposes.

Performance tuning

Even though the defaults should be fine for most use cases, it is possible to tune the indexing process. The following options are available:

--tune-sqlite-transaction-size= --tune-xapian-transaction-size= <size> set the number of messages that are updated/inserted per transaction, for SQLite and Xapian respectively. A higher number can improve performance, but also increases memory usage very significantly. And of course, in case of problems, up to <size> changes will be rolled back (this is very rare though). For SQLite, the default is 100 and for Xapian it is 1000. Raising the values beyond these defaults does not seem to improve performance very much.
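The effect of the transaction size can be sketched as follows: rows are inserted one by one, but a commit only happens after every transaction_size rows, so a failure loses at most that many uncommitted changes. This is an illustration in Python's sqlite3 module, not mu's actual code; the table name message and the function insert_in_batches are hypothetical.

```python
import sqlite3


def insert_in_batches(conn, paths, transaction_size=100):
    """Insert paths into a hypothetical 'message' table, committing once
    per transaction_size rows. Returns the number of commits performed."""
    commits = 0
    pending = 0
    for path in paths:
        # sqlite3 opens a transaction implicitly before the first INSERT.
        conn.execute("INSERT INTO message (path) VALUES (?)", (path,))
        pending += 1
        if pending == transaction_size:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        # Commit the final, partially filled batch.
        conn.commit()
        commits += 1
    return commits
```

With 250 messages and a transaction size of 100, this performs three commits (100, 100 and 50 rows).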

--tune-synchronous= <sync> Determines whether we wait for changes to be written to disk before continuing; possible values for sync are 2 (full synchronization), 1 (normal), 0 (no synchronization), with the latter being the fastest. Default is 0 (no synchronization). In mu's case, that is safe, as we don't lose any data, even if the indexing operation fails for some reason.

This option corresponds to PRAGMA synchronous in http://www.sqlite.org/pragma.html. Values other than 0, 1, 2 for sync are ignored.

--tune-temp-store= <store> Determines where temporary data is stored; options are 0 (the default for SQLite), 1 (file) or 2 (memory). The latter is much faster, but consumes more memory. The default value (for mu) is 2.

This option corresponds to PRAGMA temp_store in http://www.sqlite.org/pragma.html. Values other than 0, 1, 2 for store are ignored.
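To see what these two settings mean at the SQLite level, the sketch below applies the corresponding PRAGMA statements to a connection and reads them back. It uses Python's sqlite3 module; the function apply_tuning is hypothetical, and the rule of skipping values other than 0, 1, 2 simply mirrors the behaviour described above rather than reproducing mu's code.

```python
import sqlite3

VALID_LEVELS = (0, 1, 2)


def apply_tuning(conn, synchronous=0, temp_store=2):
    """Apply PRAGMA synchronous and PRAGMA temp_store to conn.

    Values outside 0, 1, 2 are skipped, leaving the current setting
    untouched. Returns the (synchronous, temp_store) values in effect.
    """
    if synchronous in VALID_LEVELS:
        conn.execute(f"PRAGMA synchronous = {int(synchronous)}")
    if temp_store in VALID_LEVELS:
        conn.execute(f"PRAGMA temp_store = {int(temp_store)}")
    return (
        conn.execute("PRAGMA synchronous").fetchone()[0],
        conn.execute("PRAGMA temp_store").fetchone()[0],
    )
```

The defaults of the function match the mu defaults stated above (0 for synchronous, 2 for temp_store).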

--tune-sort-inodes determines whether we sort the inodes before scanning a directory. When using an ext3 filesystem with the dir_index option enabled, this gives significant speedups. It does not seem to affect other filesystems very much. The parameter takes the values 0 (turn off) or 1 (turn on). It is on by default.
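The idea behind inode sorting can be sketched in a few lines: read the directory entries first, order them by inode number, and only then open the files, which on some filesystems reduces disk seeks. The following Python sketch assumes a POSIX-like system; the function name entries_sorted_by_inode is hypothetical and this is not mu's implementation.

```python
import os


def entries_sorted_by_inode(directory):
    """Return the paths of regular files in directory, ordered by inode."""
    with os.scandir(directory) as it:
        files = [e for e in it if e.is_file(follow_symlinks=False)]
    # DirEntry.inode() comes from the directory listing, so sorting
    # usually needs no extra stat call per file.
    files.sort(key=lambda e: e.inode())
    return [e.path for e in files]
```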

Again, we stress that in most cases there should be no reason to touch these tuning options. However, if you are in a resource-constrained environment, you could opt for using less memory, at a considerable speed penalty.

CONFIGURATION

Instead of specifying the options on the command line, you can also specify them in the mu.conf configuration file, in the mu home directory (by default, ~/.mu). The General options go in the section [mu], while the mu-index specific options go under [mu-index]. For example, your configuration file could look something like this:
 [mu]
 debug=false
 
 [mu-index]
 maildir=~/MyMaildir
 

Note that command line arguments take precedence over the configuration file.
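The precedence rule can be illustrated with a small sketch that merges the two configuration sections and then lets command line values override them. It assumes the file can be read as a plain INI file by Python's configparser, which is an assumption about the format based on the example above; the function effective_options and the dictionary of command line options are hypothetical.

```python
import configparser

SAMPLE = """\
[mu]
debug=false

[mu-index]
maildir=~/MyMaildir
"""


def effective_options(config_text, cli_options):
    """Merge [mu] and [mu-index] settings; cli_options take precedence.

    cli_options maps option names to values; None means "not given".
    All values from the file are returned as strings.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    merged = {}
    for section in ("mu", "mu-index"):
        if parser.has_section(section):
            merged.update(parser.items(section))
    merged.update({k: v for k, v in cli_options.items() if v is not None})
    return merged
```

For instance, effective_options(SAMPLE, {"maildir": "/tmp/Other"}) keeps debug from the file but takes maildir from the command line.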

MAILDIR SUPPORT

mu-index supports an extended version of maildir(5); in particular, it supports (a) a tree of Maildirs (strictly, the maildir specification does not allow this, but it is useful and widely supported), and (b) '!' in addition to ':' as separator in mail filenames, which some e-mail programs (such as modest(1) and the Maildir module in python(1)) use to support VFAT filesystems, which don't allow ':' in filenames.

mu-index ignores messages it cannot read or stat(2), but the failure to read or stat will be logged. Files starting with '.' are ignored, but directories are not. Thus, if there is a message .dotdir/new/mymsg1234, it will be indexed. This allows indexing Maildir++ directories, as used by CourierIMAP and Dovecot.

mu-index processes messages in cur/ and new/ leaf directories; it will ignore messages in tmp/.
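As an illustration of the two separators, the sketch below splits a message filename into its unique part and its flags, accepting either ':' or '!'. It assumes the usual maildir convention that the flags follow a "2," marker after the separator, which comes from the maildir specification rather than from this page; the function split_maildir_name is hypothetical.

```python
import re

# Unique part, then ':' or '!', then the "2," marker and the flag letters.
_INFO = re.compile(r"^(?P<unique>.+?)[:!]2,(?P<flags>[A-Za-z]*)$")


def split_maildir_name(filename):
    """Return (unique_part, flags); flags is '' when no info suffix exists."""
    match = _INFO.match(filename)
    if match is None:
        return filename, ""
    return match.group("unique"), match.group("flags")
```

Both "1234.host:2,RS" and "1234.host!2,RS" yield the same unique part and flags, so the same message is recognised on VFAT and non-VFAT filesystems.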

Thus, [....]/tmp/msg02 will be ignored, while [....]/new/msg01 won't.

On the other hand, [....]/tmp/cur/msg03 would not be ignored, while [....]/cur/tmp/msg04 would.

Note: single messages that are added by providing their full pathname to mu-index will not have their path checked.
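The path rules above can be summarised in a short sketch: only the directory that directly contains the file matters, and dot-files are skipped while dot-directories are not. This is a reading of the rules as stated here, not mu's actual code; the function is_indexable is hypothetical and applies only to messages found while scanning, not to single messages given by full pathname.

```python
from pathlib import PurePosixPath


def is_indexable(path):
    """Return True if a scanned file at path would be indexed."""
    p = PurePosixPath(path)
    if p.name.startswith("."):
        # Files starting with '.' are ignored (directories are not).
        return False
    # Only the leaf directory counts: it must be cur/ or new/.
    return p.parent.name in ("cur", "new")
```

Applied to the examples above, msg01 and msg03 are indexable, while msg02 and msg04 are not.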

ENVIRONMENT

mu-index uses MAILDIR to find the user's Maildir if it has not been specified explicitly (in either the configuration file or on the command line). In that case, if MAILDIR is not set, mu-index will try $HOME/Maildir.
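Putting the lookup order from this page together (non-option argument, then --maildir, then the configuration file, then MAILDIR, then $HOME/Maildir), a minimal sketch might look like this. The function resolve_maildir and its parameters are hypothetical, and no tilde expansion is attempted since this page does not say how mu handles it.

```python
import os


def resolve_maildir(positional=None, option=None, config=None, environ=None):
    """Pick the Maildir path using the precedence described on this page."""
    environ = os.environ if environ is None else environ
    # Earlier candidates win: argument, --maildir, config file, $MAILDIR.
    for candidate in (positional, option, config, environ.get("MAILDIR")):
        if candidate:
            return candidate
    # Last resort: $HOME/Maildir.
    return os.path.join(environ.get("HOME", ""), "Maildir")
```

Passing environ explicitly makes the function easy to try out without changing the real environment.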

BUGS

Please report bugs when you find them: http://code.google.com/p/mu0/issues/list

AUTHOR

Dirk-Jan C. Binnema <djcb@djcbsoftware.nl>

SEE ALSO

mu-find(1), sqlite3(1), maildir(5)