mount.aufs
Language: en
Version: \*[AUFS_VERSION] (ubuntu - 08/07/09)
Section: 8 (Administrator commands)
Contents
- NAME
- DESCRIPTION
- MOUNT OPTIONS
- Module Parameters
- Branch Syntax
- External Inode Number Bitmap and Translation Table (xino)
- Pseudo Link (hardlink over branches)
- User's Direct Branch Access (UDBA)
- Linux Inotify Limitation
- Policies to Select One among Multiple Writable Branches
- Exporting Aufs via NFS
- Dentry and Inode Caches
- Compatible/Incompatible with Unionfs Version 1.x Series
- Incompatible with an Ordinary Filesystem
- EXAMPLES
- DIAGNOSTICS
- COPYRIGHT
- AUTHOR
NAME
aufs - another unionfs. version
DESCRIPTION
Aufs is a stackable unification filesystem, like Unionfs, which unifies several directories and provides a single merged directory. In its early days, aufs was an entirely re-designed and re-implemented Unionfs Version 1.x series. After many original ideas, approaches and improvements, it has become totally different from Unionfs while keeping the basic features. See the Unionfs Version 1.x series for the basic features. Recently, the Unionfs Version 2.x series has begun taking some of the same approaches as aufs.
MOUNT OPTIONS
At mount-time, the options are interpreted in the following order:
- simple flags, except xino/noxino, udba=inotify and dlgt
- branches
- xino/noxino
- udba=inotify
- dlgt
At remount-time, the options are interpreted in the given order, e.g. left to right, except dlgt. The 'dlgt' option is disabled while the options are interpreted.
- create or remove whiteout-base() and whplink-dir() if necessary
- re-enable dlgt if necessary
- br:BRANCH[:BRANCH ...] (dirs=BRANCH[:BRANCH ...])
- Adds new branches. (cf. Branch Syntax).
Aufs rejects a branch which is an ancestor or a descendant of another branch. Such branches are called overlapped. When the branch is a loopback-mounted directory, aufs also checks the source fs-image file of the loopback device. If the source file is a descendant of another branch, the branch is rejected too.
After mounting aufs or adding a branch, if you move a branch under another branch and make it a descendant of another branch, aufs will not work correctly.
- [ add | ins ]:index:BRANCH
- Adds a new branch. The index begins with 0. Aufs creates whiteout-base() and whplink-dir() if necessary.
If a file with the same name exists on a lower branch (larger index), aufs hides the lower file. You can only see the highest file. The result may be confusing if the added branch has whiteouts (including diropq), since they may or may not hide the lower entries.
If a process has once mapped a file by mmap(2) with MAP_SHARED and a file with the same name exists on the lower branch, the process still refers to the file on the lower (hidden) branch after the branch is added. If you want to update the contents of the process address space after adding, you need to restart your process or open/mmap the file again. (cf. Branch Syntax).
- del:dir
- Removes a branch. Aufs does not remove whiteout-base() and whplink-dir() automatically. For example, when you add a RO branch which was previously unified as RW, you will see whiteout-base or whplink-dir on the added RO branch.
If a process is referencing a file or directory on the branch being deleted (by open, mmap, current working directory, etc.), aufs returns the error EBUSY.
- mod:BRANCH
- Modifies the permission flags of the branch. Aufs creates or removes whiteout-base() and/or whplink-dir() if necessary.
If the branch permission is being changed from 'rw' to 'ro' while a process is mapping a file on the branch by mmap(2), the process may or may not be able to modify its mapped memory region after the branch permission flags are modified. (cf. Branch Syntax).
- append:BRANCH
- Equivalent to 'add:(last index + 1):BRANCH'. (cf. Branch Syntax).
- prepend:BRANCH
- Equivalent to 'add:0:BRANCH'. (cf. Branch Syntax).
- xino=filename
- Use external inode number bitmap and translation table. It is set to <FirstWritableBranch>/ by default, or . A comma character in the filename is not allowed.
The files are created per aufs mount and per branch filesystem, and then unlinked. So you cannot find these files, but they exist and are read and written frequently by aufs. (cf. External Inode Number Bitmap and Translation Table).
- noxino
- Stop using external inode number bitmap and translation table.
If you use this option, some applications will not work correctly. (cf. External Inode Number Bitmap and Translation Table).
- trunc_xib
- Truncate the external inode number bitmap file. The truncation is done automatically when you delete a branch unless you specify the 'notrunc_xib' option. (cf. External Inode Number Bitmap and Translation Table).
- notrunc_xib
- Stop truncating the external inode number bitmap file when you delete a branch. (cf. External Inode Number Bitmap and Translation Table).
- create_policy | create=CREATE_POLICY
- copyup_policy | copyup | cpup=COPYUP_POLICY
- Policies to select one among multiple writable branches. The default values are 'create=tdp' and 'cpup=tdp'. The link(2) and rename(2) system calls are an exception: in aufs, they try to keep their operations in the branch where the source exists. (cf. Policies to Select One among Multiple Writable Branches).
- verbose | v
- Print some information. Currently, this covers only the busy files (or inodes) found when deleting a branch.
- noverbose | quiet | q | silent
- Disable the 'verbose' option. This is the default.
- dirwh=N
- Watermark for actually removing a dir at rmdir(2) and rename(2).
If the target dir which is being removed or renamed (destination dir) has a huge number of whiteouts, i.e. the dir is empty logically but not physically, the cost of removing or renaming the single dir may be very high, because all the whiteouts must be unlinked internally before rmdir/rename is issued to the branch. To reduce the cost of a single system call, aufs renames the target dir to a whiteout-ed temporary name and invokes a pre-created kernel thread to remove the whiteout-ed children and the target dir. The rmdir/rename system call returns just after kicking the thread.
When the number of whiteout-ed children is less than the value of dirwh, aufs removes them in a single system call instead of passing them to another thread. This value is ignored when the branch is NFS. The default value is .
- plink
- noplink
- Specifies whether to use the 'pseudo link' feature or not. The default is 'plink', which means the feature is used. (cf. Pseudo Link)
- clean_plink
- Removes all pseudo-links in memory. In order to make pseudo-links permanent, use the 'auplink' script just before one of these operations: unmounting aufs, using the 'ro' or 'noplink' mount option, deleting a branch from aufs, adding a branch into aufs, or changing your writable branch to readonly. If you installed both /sbin/mount.aufs and /sbin/umount.aufs, your mount(8) and umount(8) support them, and /etc/default/auplink is configured, the 'auplink' script will be executed automatically and flush pseudo-links. (cf. Pseudo Link)
- udba=none | reval | inotify
- Specifies the level of UDBA (User's Direct Branch Access) test. (cf. User's Direct Branch Access and Inotify Limitation).
- diropq=whiteouted | w | always | a
- Specifies whether mkdir(2) and rename(2) (in the dir case) make the created directory 'opaque' or not. In other words, whether or not to create '' under the created or renamed directory. When you specify diropq=w or diropq=whiteouted, aufs will not create it if the directory was not whiteouted or opaqued. If the directory was whiteouted or opaqued, the created or renamed directory will be opaque. When you specify diropq=a or diropq=always, aufs will always create it regardless of whether the directory was whiteouted/opaqued or not. The default value is diropq=w, which means it is not created when unnecessary. If you define CONFIG_AUFS_COMPAT at aufs compile time, the default will be diropq=a. You need to consider this option if you are planning to add a branch later, since 'diropq' affects the same named directory on the added branch.
- warn_perm
- nowarn_perm
- When adding a branch, aufs issues a warning about the uid/gid/permission of the branch directory being added when they differ from those of the existing branch. This difference may or may not pose a security risk. If you are sure that there is no problem and want to stop the warning, use the 'nowarn_perm' option. The default is 'warn_perm' (cf. DIAGNOSTICS).
- coo=none | leaf | all
- Specifies the copyup-on-open level. When you open a file which is on a readonly branch, aufs opens the file after copying it up to the writable branch according to this level. When the keyword 'all' is specified, aufs copies up the object being opened even if it is a directory. In this case, a simple 'ls' or 'find' causes copyup and your writable branch will end up with a lot of empty directories. When the keyword 'leaf' is specified, aufs copies up the object being opened unless it is a directory. The keyword 'none' disables copyup-on-open. The default is 'coo=none'.
- dlgt
- nodlgt
- If you do not want your application to access branches through aufs or to be traced strictly by task I/O accounting, you can use the kernel threads in aufs. If you enable CONFIG_AUFS_DLGT and specify the 'dlgt' mount option, then aufs delegates its internal access to the branches to the kernel threads.
When you define CONFIG_SECURITY and use any type of Linux Security Module (LSM), for example SUSE AppArmor, you may meet some errors or warnings from your security module. Because aufs accesses its branches internally, your security module may detect, report, or prohibit it. The behaviour depends heavily upon your security module and its configuration. In this case, you can use the 'dlgt' mount option, too. Your LSM will see the aufs kernel threads accessing the branch, instead of your application.
The delegation may have a negative impact on performance, since it involves a task switch (scheduling) and waits for the thread to complete the delegated access. You should consider increasing the number of kernel threads by specifying the aufs module parameter 'nwkq'.
Currently, aufs does NOT delegate at mount and remount time. The default is nodlgt, which means aufs does not delegate the internal access.
- shwh
- noshwh
- By default (noshwh), aufs does not show the whiteouts; they just hide the same named entries in the lower branches, and the whiteout itself never appears. If you enable CONFIG_AUFS_SHWH and specify the 'shwh' option, aufs will show you the names of whiteouts while keeping their ability to hide the lower entries. Honestly speaking, I am rather confused by these 'visible whiteouts'. But a user who originally requested this feature wrote a nice how-to document about it. See the Tips file in the aufs CVS tree.
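As a rough illustration of the 'br:' option and the overlap rule described at the top of this option list, the following sketch builds a 'br:' option string and refuses overlapped directories. The helper names is_overlapped and build_br_option are hypothetical and not part of aufs or its utilities. The check is purely textual, so it does not resolve symlinks or loopback image files as aufs itself does.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical, not shipped with aufs.

# Succeeds when one path is an ancestor or a descendant of the other
# (or both are the same path). Textual comparison only.
is_overlapped() {
    a=${1%/}/ b=${2%/}/
    case "$a" in "$b"*) return 0 ;; esac
    case "$b" in "$a"*) return 0 ;; esac
    return 1
}

# Prints 'br:SPEC[:SPEC ...]' for the given dir_path[=permission]
# arguments, or fails when any two directories overlap.
build_br_option() {
    for x in "$@"; do
        n=0
        for y in "$@"; do
            if is_overlapped "${x%%=*}" "${y%%=*}"; then n=$((n + 1)); fi
        done
        # Every directory matches itself once; more than one match is a clash.
        if [ "$n" -gt 1 ]; then
            echo "overlapped branch: ${x%%=*}" >&2
            return 1
        fi
    done
    opt=br
    for x in "$@"; do opt="$opt:$x"; done
    echo "$opt"
}

build_br_option /dev/shm/rw=rw /dev/shm/ro=ro   # br:/dev/shm/rw=rw:/dev/shm/ro=ro
```

The printed string could then be passed to mount(8), for instance as 'mount -t aufs -o "$(build_br_option /dev/shm/rw=rw /dev/shm/ro=ro)" none /dev/shm/u', which needs root and a kernel with aufs.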
Module Parameters
- nwkq=N
- The number of kernel threads named .
Those threads stay in the system while the aufs module is loaded, and handle the special I/O requests from aufs. The default value is .
The special I/O requests from aufs include a part of copy-up, lookup, directory handling, pseudo-link, xino file operations and the delegated access to branches. For example, Unix filesystems allow you to rmdir(2) a directory which has no write permission bit, if its parent directory has the write permission bit. In aufs, the directory being removed may or may not have a whiteout or a 'dir opaque' mark as its child, and aufs needs to unlink(2) them before rmdir(2). Therefore aufs delegates the actual unlink(2) and rmdir(2) to another kernel thread which has already been created and has superuser privilege.
If you enable CONFIG_SYSFS, you can check this value through <sysfs>/module/aufs/parameters/nwkq.
So how many threads are enough? You can check it through <sysfs>/fs/aufs/stat, if you also enable CONFIG_AUFS_SYSAUFS (for linux-2.6.24 and earlier) or CONFIG_AUFS_STAT (for linux-2.6.25 and later). It shows the maximum number of works enqueued at a time per thread. Usually they are all small numbers or 0. If your workload is heavy and you feel the response is slow, then check these values. If none of them is zero and any of them is larger than 2 or 3, you should set the 'nwkq' module parameter greater than the default value. But if the reason for the bad response lies in your branch filesystem, increasing the number of aufs threads will not help you.
The last number in <sysfs>/fs/aufs/stat, after the comma, is the maximum number of 'no-wait' works enqueued at a time. Aufs enqueues such work to the system global workqueue called 'events', but does not wait for its completion. Usually it does not harm the time-performance of aufs.
- brs=1 | 0
- Specifies to use the branch path data file under sysfs or not.
If the number of your branches is large or their paths are long and you meet the limitation of mount(8) or /etc/mtab, you need to enable CONFIG_SYSFS and set the aufs module parameter brs=1. If your linux version is linux-2.6.24 or earlier, you need to enable CONFIG_AUFS_SYSAUFS too.
When this parameter is set to 1, aufs does not show the 'br:' (or dirs=) mount option through /proc/mounts, and /sbin/mount.aufs does not put it in /etc/mtab. So you can keep yourself from the page limitation of mount(8) or /etc/mtab. Aufs shows branch paths through <sysfs>/fs/aufs/si_XXX/brNNN. Actually the file under sysfs also has a size limitation, but I don't think it is harmful.
The default is brs=0, which means <sysfs>/fs/aufs/si_XXX/brNNN does not exist and the 'br:' option will appear in /proc/mounts, and in /etc/mtab if you install /sbin/mount.aufs. If you did not enable CONFIG_AUFS_SYSAUFS (for linux-2.6.24 and earlier), this parameter will be ignored.
There is one more side effect of setting this parameter to 1. If you rename your branch, the branch path written in /etc/mtab becomes obsolete and a future remount will meet some error due to the unmatched parameters (remember that mount(8) may take the options from /etc/mtab and pass them to the system call). If you set 1, /etc/mtab will not hold the branch path and you will not meet such trouble. On the other hand, /proc/mounts, which holds the branch path, is updated dynamically, so it never becomes obsolete. But I don't think users want to rename branches so often.
- sysrq=key
- Specifies the MagicSysRq key for debugging aufs. You need to enable both CONFIG_MAGIC_SYSRQ and CONFIG_AUFS_DEBUG. If your linux version is linux-2.6.24 or earlier, you need to enable CONFIG_AUFS_SYSAUFS too. Currently this is for developers only. The default is 'a'.
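The rule of thumb given for 'nwkq' above (raise it when no per-thread maximum is zero and at least one exceeds 2 or 3) can be sketched as a small helper. The function name nwkq_advice is hypothetical, and so is the choice of 3 as the threshold. The sketch takes the per-thread numbers as arguments rather than parsing <sysfs>/fs/aufs/stat, because the exact layout of that file is not described here.

```shell
#!/bin/sh
# Sketch only: nwkq_advice is a hypothetical helper.
# Arguments: the per-thread maxima taken from <sysfs>/fs/aufs/stat.
# Prints "raise" when no value is 0 and at least one exceeds the
# threshold, otherwise "keep".
nwkq_advice() {
    threshold=3   # assumption: the text only says "larger than 2 or 3"
    has_zero=no
    over=no
    for v in "$@"; do
        if [ "$v" -eq 0 ]; then has_zero=yes; fi
        if [ "$v" -gt "$threshold" ]; then over=yes; fi
    done
    if [ "$has_zero" = no ] && [ "$over" = yes ]; then
        echo raise
    else
        echo keep
    fi
}

nwkq_advice 1 4 2   # raise
nwkq_advice 0 5 1   # keep
```

When the answer is "raise", the parameter would be given at module load time, for example 'modprobe aufs nwkq=N' with N chosen by the administrator.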
Branch Syntax
- dir_path[ =permission [ + attribute ] ]
- permission := rw | ro | rr
- attribute := wh | nolwh
- dir_path is a directory path. The keyword after 'dir_path=' is the permission flags for that branch. Commas, colons and the permission flags string (including '=') are not allowed in the path. Any filesystem can be a branch, except aufs, sysfs, procfs and unionfs. If you specify such a filesystem as an aufs branch, aufs will return an error saying it is unsupported. If you enable CONFIG_AUFS_ROBR, you can use aufs as a non-writable branch of another aufs.
Cramfs in the linux stable release has strange inodes which confuse aufs. For example,
$ mkdir -p w/d1 w/d2
$ > w/z1
$ > w/z2
$ mkcramfs w cramfs
$ sudo mount -t cramfs -o ro,loop cramfs /mnt
$ find /mnt -ls
76 1 drwxr-xr-x 1 jro 232 64 Jan 1 1970 /mnt
1 1 drwxr-xr-x 1 jro 232 0 Jan 1 1970 /mnt/d1
1 1 drwxr-xr-x 1 jro 232 0 Jan 1 1970 /mnt/d2
1 1 -rw-r--r-- 1 jro 232 0 Jan 1 1970 /mnt/z1
1 1 -rw-r--r-- 1 jro 232 0 Jan 1 1970 /mnt/z2
Both directories and both files have the same inode number, with one as their link count. Aufs cannot handle such inodes correctly. Currently, aufs includes a tiny workaround for such inodes. But some applications may not work correctly, since the aufs inode number for such an inode will change silently. If you do not have any empty files, empty directories or special files, the inodes on cramfs will all be fine.
A branch should not be shared as the writable branch between multiple aufs. A readonly branch can be shared.
The maximum number of branches is configurable at compile time. The current value is which depends upon configuration.
When an unknown permission or attribute is given, aufs sets ro to that branch silently.
Permission
- rw
- Readable and writable branch. Set as default for the first branch. If the branch filesystem is mounted as readonly, you cannot set it to 'rw'.
- ro
- Readonly branch which has no whiteouts on it. Set as default for all branches except the first one. Aufs never issues either a write operation or a lookup operation for whiteouts to this branch.
- rr
- Real readonly branch, a special case of 'ro', for a natively readonly branch. Assuming the branch is natively readonly, aufs can optimize some internal operations. For example, if you specify the 'udba=inotify' option, aufs does not set inotify for the things on an rr branch. Set by default for a branch whose fs-type is 'iso9660', 'cramfs', 'romfs' or 'squashfs'.
Attribute
- wh
- Readonly branch which has or might have whiteouts on it. Aufs never issues a write operation to this branch, but does look up whiteouts. Use this as '<branch_dir>=ro+wh'.
- nolwh
- Usually, aufs creates a whiteout as a hardlink on a writable branch. This attribute prohibits aufs from creating hardlinked whiteouts, including the source file of all hardlinked whiteouts (.) If you do not like hardlinks, or your writable branch does not support link(2), then use this attribute. But I am afraid a filesystem which does not support link(2) natively will fail in other places such as copy-up. Use this as '<branch_dir>=rw+nolwh'. You may also want to try the 'noplink' mount option, although it is not recommended.
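The branch syntax 'dir_path[ =permission [ + attribute ] ]' and its defaults can be sketched with plain parameter expansion. The function name parse_branch is hypothetical. The sketch assumes that an unknown permission or attribute resets the whole branch to plain 'ro', and it does not model the 'rr' default for natively readonly filesystem types, nor does it check which attribute fits which permission.

```shell
#!/bin/sh
# Sketch only: parse_branch is a hypothetical helper.
# Usage: parse_branch SPEC [INDEX]
# Prints "dir permission attribute", with "-" when there is no attribute.
parse_branch() {
    spec=$1 index=${2:-0}
    dir=${spec%%=*}
    attr=-
    case "$spec" in
        *=*) flags=${spec#*=} ;;
        *)   # No flags given: rw for the first branch, ro for the others.
             if [ "$index" -eq 0 ]; then flags=rw; else flags=ro; fi ;;
    esac
    case "$flags" in
        *+*) perm=${flags%%+*} attr=${flags#*+} ;;
        *)   perm=$flags ;;
    esac
    # Unknown permission or attribute: fall back to ro silently
    # (assumption: the attribute is dropped as well).
    case "$perm" in rw|ro|rr) ;; *) perm=ro attr=- ;; esac
    case "$attr" in -|wh|nolwh) ;; *) perm=ro attr=- ;; esac
    echo "$dir $perm $attr"
}

parse_branch /branch/rw 0            # /branch/rw rw -
parse_branch /branch/old=ro+wh 1     # /branch/old ro wh
parse_branch /branch/x=bogus 0       # /branch/x ro -
```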
External Inode Number Bitmap and Translation Table (xino)
Aufs uses one external bitmap file per aufs and one external inode number translation table file per branch filesystem by default. The bitmap is for recycling aufs inode numbers and the others are tables for converting an inode number on a branch to an aufs inode number. The default path is 'first writable branch'/. If there is no writable branch, the default path will be .
Those files are always open and are read and written frequently by aufs. If your writable branch is on a flash memory device, it is recommended to put the xino files somewhere other than flash memory by specifying the 'xino=' mount option.
The maximum file size of the bitmap is, basically, the total number of files on all branches divided by 8 (the number of bits in a byte). For example, on a 4KB page size system, if you have 32,768 (or 2,599,968) files in the aufs world, then the maximum file size of the bitmap is 4KB (or 320KB).
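The two figures in the example above can be reproduced with shell arithmetic. The function name xib_max_bytes is hypothetical, and rounding up to whole pages is an inference from the 320KB figure (2,599,968 / 8 is slightly less than 320KB), not something stated explicitly.

```shell
#!/bin/sh
# Sketch only: xib_max_bytes is a hypothetical helper.
# Usage: xib_max_bytes FILE_COUNT [PAGE_SIZE]
xib_max_bytes() {
    files=$1 page=${2:-4096}
    bytes=$(( ($files + 7) / 8 ))                     # one bit per file
    echo $(( ($bytes + $page - 1) / $page * $page ))  # round up to whole pages
}

xib_max_bytes 32768      # 4096   (4KB)
xib_max_bytes 2599968    # 327680 (320KB)
```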
The maximum file size of the table will be 'max inode number on the branch x size of an inode number'. For example in a 32bit environment,
$ df -i /branch_fs
/dev/hda14 2599968 203127 2396841 8% /branch_fs
and /branch_fs is a branch of the aufs. When the inode numbers are assigned contiguously (without 'holes'), the maximum xino file size for /branch_fs will be 2,599,968 x 4 bytes = about 10 MB. But not all of its disk blocks may be allocated. When the inode numbers are assigned discontiguously, the maximum size of the xino file will be the largest inode number on the branch x 4 bytes. Additionally, the file size is limited to LLONG_MAX or the s_maxbytes in the filesystem's superblock (s_maxbytes may be smaller than LLONG_MAX). So the largest supportable inode number on a branch is less than 2305843009213693950 (LLONG_MAX/4-1). This is the current limitation of aufs. In a 64bit environment, this limitation becomes stricter and the largest supported inode number is less than LLONG_MAX/8-1.
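The table size and the inode number limits above follow from simple arithmetic, sketched here. The function names are hypothetical. The sketch assumes a shell with 64bit arithmetic, and it ignores s_maxbytes, which may lower the limit further.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical.

# Usage: xino_table_bytes MAX_INODE_NUMBER [INODE_NUMBER_SIZE]
xino_table_bytes() {
    echo $(( $1 * ${2:-4} ))
}

# Usage: xino_ino_limit [INODE_NUMBER_SIZE]
# Prints LLONG_MAX / size - 1, the bound quoted in the text.
xino_ino_limit() {
    llong_max=9223372036854775807
    echo $(( $llong_max / ${1:-4} - 1 ))
}

xino_table_bytes 2599968   # 10399872, about 10 MB
xino_ino_limit 4           # 2305843009213693950
xino_ino_limit 8           # 1152921504606846974
```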
The xino files are always hidden, i.e. removed. So you cannot do 'ls -l xino_file'. If you enable CONFIG_SYSFS, you can check this information through <sysfs>/fs/aufs/<si_id>/xino (for linux-2.6.24 and earlier, you need to enable CONFIG_AUFS_SYSAUFS too). The first line in <sysfs>/fs/aufs/<si_id>/xino shows the information of the bitmap file, in the format of,
<blocks>x<block size> <file size>
Note that a filesystem usually has a feature called pre-allocation, which means a number of blocks are allocated automatically, and then deallocated silently when the filesystem thinks they are unnecessary. Do not be surprised by sudden changes in the number of blocks when the filesystem on which the xino files are placed supports the pre-allocation feature.
The remaining lines show the hidden xino file information in the format of,
<branch index>: <file count>, <blocks>x<block size> <file size>
If the file count is larger than 1, it means some of your branches are on the same filesystem and the xino file is shared by them. Note that the file size may not be equal to the blocks actually consumed, since the xino file is a sparse file, i.e. it may contain holes which do not consume any disk blocks.
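To compare the allocated blocks with the file size, the two line formats above can be parsed with awk. This is only a sketch: the function name xino_report is hypothetical, it reads 'blocks x block size' as the allocated bytes, and the sample input is made up for illustration rather than taken from a real system.

```shell
#!/bin/sh
# Sketch only: xino_report is a hypothetical helper.
# Reads the xino information on stdin and prints allocated bytes next
# to the (possibly sparse) file size.
xino_report() {
    awk '
    # Splits a "<blocks>x<block size>" field and returns allocated bytes.
    function alloc(field,   p) { split(field, p, "x"); return p[1] * p[2] }
    NR == 1 { printf "bitmap: allocated=%d size=%d\n", alloc($1), $2; next }
    {
        idx = $1; sub(/:$/, "", idx)
        cnt = $2; sub(/,$/, "", cnt)
        printf "branch %s: shared_by=%d allocated=%d size=%d\n", idx, cnt, alloc($3), $4
    }'
}

# Made-up sample input, not real measurements.
xino_report <<'EOF'
8x4096 4096
0: 1, 24x4096 10399872
1: 2, 8x4096 65536
EOF
```

On a real system the input would come from <sysfs>/fs/aufs/<si_id>/xino instead of the here-document.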
Once you unmount aufs, the xino files for that aufs are totally gone. It means that the inode number is not permanent.
The xino files should be created on a filesystem other than NFS. If your first writable branch is NFS, you will need to specify an xino file path which is not on NFS. Also, if you are going to remove the branch where the xino files exist, or change that branch's permission to readonly, you need to use the xino option before you del/mod the branch.
The bitmap file can be truncated. For example, if you delete a branch which has a huge number of files, many inode numbers will be recycled and the bitmap will be truncated to a smaller size. Aufs does this automatically when a branch is deleted. You can truncate it any time you like by specifying the 'trunc_xib' mount option. But when the accessed inode number was not deleted, nothing will be truncated. If you do not want to truncate it (it may be slow) when you delete a branch, specify 'notrunc_xib' after the 'del' mount option.
If you do not want to use xino, use the noxino mount option. Use this option with care, since the inode number may be changed silently and unexpectedly at any time, for example by an rmdir failure, a recursive chmod/chown/etc. on a large and deep directory, or anything else. And some applications will not work correctly. If you want to change the xino default path, use the xino mount option.
After you add branches, the persistence of inode numbers may not be guaranteed. At remount time, cached but unused inodes are discarded, and a newly appeared inode may have a different inode number at the next access time. The inodes in use keep their persistent inode numbers.
When aufs has assigned an inode number to a file and you then create a file with the same name on the upper branch directly, then the next time you access the file, aufs may assign another inode number to the file even if you use the xino option. Some applications may treat a file whose inode number has changed as a totally different file.
Pseudo Link (hardlink over branches)
Aufs supports the 'pseudo link', which is a logical hard-link over branches (cf. ln(1) and link(2)). In other words, it covers a file copied up by link(2) and a copied-up file which was hard-linked on a readonly branch filesystem.
When you have files named fileA and fileB which are hardlinked on a readonly branch, and you write something into fileA, aufs copies up fileA to a writable branch and write(2)s the originally requested data to the copied-up fileA. On the writable branch, fileA is not hardlinked. But aufs remembers that it was hardlinked, and handles fileB as if it existed on the writable branch, by referencing fileA's inode on the writable branch as fileB's inode.
Once you unmount aufs, the plink info for that aufs kept in memory is totally gone. It means that the pseudo-link is not permanent. If you want to make plinks permanent, run the 'auplink' script just before one of these operations: unmounting your aufs, using the 'ro' or 'noplink' mount option, deleting a branch from aufs, adding a branch into aufs, or changing your writable branch to readonly.
This script reproduces all real hardlinks on a writable branch by linking them, and removes the pseudo-link info in memory and the temporary links on the writable branch. Since this script accesses your branches directly, you cannot hide them by 'mount --bind /tmp /branch' or something similar.
If you intend to rebuild your aufs with the same branches later, you should use the auplink script before you umount your aufs. If you installed both /sbin/mount.aufs and /sbin/umount.aufs, your mount(8) and umount(8) support them, and /etc/default/auplink is configured, the 'auplink' script will be executed automatically and flush pseudo-links.
The /etc/default/auplink file is a simple shell script which does nothing but define $FLUSH. If your aufs mount point is set in $FLUSH, 'auplink' flushes the pseudo-links on that mount point. If $FLUSH is set to 'ALL', 'auplink' will be executed for every aufs.
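The $FLUSH rule can be sketched as follows. This is not the logic of the real scripts: the function name should_flush is hypothetical, and the sketch assumes that $FLUSH holds either 'ALL' or a whitespace-separated list of aufs mount points, which the text above does not spell out.

```shell
#!/bin/sh
# Sketch only: should_flush is a hypothetical helper.
# Usage: should_flush MOUNT_POINT   (reads $FLUSH)
# Succeeds when the pseudo-links on MOUNT_POINT would be flushed.
should_flush() {
    if [ "$FLUSH" = ALL ]; then return 0; fi
    # Assumption: $FLUSH is a whitespace-separated list of mount points.
    for m in $FLUSH; do
        if [ "$m" = "$1" ]; then return 0; fi
    done
    return 1
}

# What /etc/default/auplink might define under that assumption.
FLUSH="/your/aufs/root"
if should_flush /your/aufs/root; then echo "flush"; fi   # flush
```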
The 'auplink' script uses the 'aulchown' binary, so you need to install it too. The 'auplink' script executes 'find' and 'mount -o remount', which may take a long time and impact the later system performance. If you did not install /sbin/mount.aufs, /sbin/umount.aufs or /sbin/auplink, but you want to flush pseudo-links, then you need to execute 'auplink' manually. If you installed and configured them, but do not want to execute 'auplink' at umount time, then use the '-i' option of umount(8).
# auplink /your/aufs/root flush
# umount /your/aufs/root
or
# auplink /your/aufs/root flush
# mount -o remount,mod:/your/writable/branch=ro /your/aufs/root
or
# auplink /your/aufs/root flush
# mount -o remount,noplink /your/aufs/root
or
# auplink /your/aufs/root flush
# mount -o remount,del:/your/aufs/branch /your/aufs/root
or
# auplink /your/aufs/root flush
# mount -o remount,append:/your/aufs/branch /your/aufs/root
The plinks are kept both in memory and on disk. When they consume too many resources on your system, you can run the 'auplink' script at any time and safely throw away the unnecessary pseudo-links.
Additionally, the 'auplink' script is very useful for some security reasons. For example, suppose you have a directory whose permission flags are 0700, and a file whose flags are 0644 under the 0700 directory. Usually, all files under the 0700 directory are private and no one else can see the file. But when the directory is 0711 and someone else knows the name of the 0644 file, they can read the file.
Basically, the aufs pseudo-link feature creates a temporary link under a directory whose owner is root and whose permission flags are 0700. But when the writable branch is NFS, aufs sets the directory to 0711. When the 0644 file is pseudo-linked, the temporary link, whose contents are of course totally equivalent to the file's, will be created under the 0711 directory. The filename is generated from its inode number. While it is hard to know the generated filename, someone else may try peeping at the temporary pseudo-linked file with a software tool which tries the names from one to MAX_INT or something. In this case, the 0644 file will be read unexpectedly. I am afraid that leaving the temporary pseudo-links can be a security hole. It makes sense to execute 'auplink /your/aufs/root flush' periodically when your writable branch is NFS.
When your writable branch is not NFS, or all users are careful enough to set 0600 to their private files, you do not have to worry about this issue.
If you do not want this feature, use the 'noplink' mount option; then you do not need to install the 'auplink' script and the 'aulchown' binary.
The behaviours of plink and noplink
This sample shows that the 'f_src_linked2' with 'noplink' option cannot follow the link.
none on /dev/shm/u type aufs (rw,xino=/dev/shm/rw/.aufs.xino,br:/dev/shm/rw=rw:/dev/shm/ro=ro)
$ ls -li ../r?/f_src_linked* ./f_src_linked* ./copied
ls: ./copied: No such file or directory
15 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked
15 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked2
22 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ./f_src_linked
22 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ./f_src_linked2
$ echo abc >> f_src_linked
$ cp f_src_linked copied
$ ls -li ../r?/f_src_linked* ./f_src_linked* ./copied
15 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked
15 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked2
36 -rw-r--r-- 2 jro jro 6 Dec 22 11:03 ../rw/f_src_linked
53 -rw-r--r-- 1 jro jro 6 Dec 22 11:03 ./copied
22 -rw-r--r-- 2 jro jro 6 Dec 22 11:03 ./f_src_linked
22 -rw-r--r-- 2 jro jro 6 Dec 22 11:03 ./f_src_linked2
$ cmp copied f_src_linked2
$
none on /dev/shm/u type aufs (rw,xino=/dev/shm/rw/.aufs.xino,noplink,br:/dev/shm/rw=rw:/dev/shm/ro=ro)
$ ls -li ../r?/f_src_linked* ./f_src_linked* ./copied
ls: ./copied: No such file or directory
17 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked
17 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked2
23 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ./f_src_linked
23 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ./f_src_linked2
$ echo abc >> f_src_linked
$ cp f_src_linked copied
$ ls -li ../r?/f_src_linked* ./f_src_linked* ./copied
17 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked
17 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked2
36 -rw-r--r-- 1 jro jro 6 Dec 22 11:03 ../rw/f_src_linked
53 -rw-r--r-- 1 jro jro 6 Dec 22 11:03 ./copied
23 -rw-r--r-- 2 jro jro 6 Dec 22 11:03 ./f_src_linked
23 -rw-r--r-- 2 jro jro 6 Dec 22 11:03 ./f_src_linked2
$ cmp copied f_src_linked2
cmp: EOF on f_src_linked2
$
If you add a branch which has fileA or fileB, aufs does not follow the pseudo link. The file on the added branch has no relation to the same named file(s) on the lower branch(es). If you use the noxino mount option, pseudo link will not work after the kernel shrinks the inode cache.
This feature will not work for squashfs before version 3.2 since its inodes are tricky. When an inode is hardlinked, the squashfs inodes have the same inode number and the correct link count, but the in-memory inode objects are different. Squashfs inodes (before v3.2) are generated for each name, even if they are hardlinked.
User's Direct Branch Access (UDBA)
UDBA means a modification to a branch filesystem made manually or directly, i.e. bypassing aufs. While aufs is designed and implemented to be safe after UDBA, UDBA can confuse both you and your aufs, and some information such as the aufs inode will be incorrect. For example, if you rename a file on a branch directly, the file on aufs may or may not be accessible through both the old and the new name, because aufs caches various information about the files on branches and the cache still remains after UDBA.
Aufs has a mount option named 'udba' which specifies the level of testing, at access time, whether UDBA has happened or not.
- udba=none
- Aufs trusts the dentry and the inode cache on the system, and never tests for UDBA. With this option, aufs runs fastest, but it may show you incorrect data. Additionally, if you often modify a branch directly, aufs will not be able to trace the changes of inodes on the branch. This can cause wrong behaviour, deadlock or anything else.
It is recommended to use this option only when you are sure that nobody accesses a file on a branch. It might be difficult for you to achieve a real 'no UDBA' world when you cannot stop your users from doing 'find / -ls' or something similar. If you really want to forbid all of your users from UDBA, here is a trick for it. With this trick, users cannot see the branches directly and aufs runs with no problem, except for the 'auplink' script. But if you are not familiar with aufs, this trick may confuse you.
# d=/tmp/.aufs.hide
# mkdir $d
# for i in $branches_you_want_to_hide
> do
> mount -n --bind $d $i
> done
When you unmount the aufs, delete/modify a branch by remount, or want to show the hidden branches again, unmount the bound /tmp/.aufs.hide.
# umount -n $branches_you_want_to_unbound
If you use a FUSE filesystem which supports hardlinks as an aufs branch, you should not set this option, since FUSE makes an inode object for each hardlink (at least in linux-2.6.23). When your FUSE filesystem maintains them at link/unlink time, it is equivalent to 'direct branch access' for aufs.
- udba=reval
- Aufs tests only whether a file which existed still exists. If the file was removed on the branch directly, aufs discards the cache for the file and looks it up again, so the data will be updated. This test is the minimum level which ensures the existence of a file while keeping the performance. This is the default, and aufs still runs fast.
This rule leads to some unexpected situations, but I hope they are harmless. They depend entirely upon the cache. Here are just a few examples.
-
- •
- If the file is cached as negative (non-existent), aufs does not test it, and the file is still handled as negative after a user creates it on a branch directly. If the file is not cached, aufs will look it up normally and find it.
- •
- When the file is cached as positive (existing) and a user creates a file of the same name directly on the upper branch, aufs detects that the cached inode of the file still exists and will show you the old (cached) file which is on the lower branch.
- •
- When the file is cached as positive (existing) and a user renames the file by rename(2) directly, aufs detects that the inode of the file still exists. You may or may not see both the old and the new file. Todo: if aufs also tested the name, it could detect this case.
If your outer modification (UDBA) is rare and you can ignore the temporary and minor differences between the virtual aufs world and the real branch filesystem, then try this mount option.
-
- udba=inotify
- Aufs sets an inotify watch on every accessed directory on its branches and receives events about the directory and its children. This consumes resources (cpu and memory), and I am afraid that the performance will be hurt, but it is the strictest test level. Linux inotify has some limitations, see also Linux Inotify Limitation. So it is recommended to leave udba at its default usually, and to set it to inotify by remount when you need it.
When a user accesses a file for which UDBA was notified before, the cached data about the file is discarded and aufs looks it up again, so the data will be updated. When an error condition arises between UDBA and an aufs operation, aufs will return an error, including EIO. To use this option, you need linux-2.6.18 or later, with CONFIG_INOTIFY and CONFIG_AUFS_UDBA_INOTIFY enabled.
Renaming or removing a directory on a branch directly may reveal the directory of the same name on the lower branch. Aufs tries to look up the renamed directory and the revealed directory again and to assign different inode numbers to them. But the inode numbers, including those of their children, can be a problem: they will be changed silently, and aufs may produce a warning. If you rename a directory repeatedly and reveal/hide the lower directory, aufs may confuse their inode numbers too. It depends upon the system cache.
When you make a directory in aufs and mount another filesystem on it, the directory in aufs cannot be removed, as expected, because it is a mount point. But the directory of the same name on the writable branch can be removed if someone wants, since there it is just an empty directory, not a mount point. Aufs cannot stop such a direct rmdir, but produces a warning about it.
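The recommendation above (keep the default level and switch to inotify by remount only when needed) can be sketched as a small shell helper. This is only an illustrative sketch, not part of aufs: the function name udba_remount_cmd and the mount point /aufs are hypothetical, and the helper prints the mount(8) command instead of running it, so it needs no privilege.

```shell
#!/bin/sh
# Hypothetical helper: print the remount command for a given udba level.
# Only the three levels named in this manual are accepted.
udba_remount_cmd() {
    level=$1
    mntpnt=$2
    case $level in
        none|reval|inotify)
            printf 'mount -o remount,udba=%s %s\n' "$level" "$mntpnt"
            ;;
        *)
            echo "unknown udba level: $level" >&2
            return 1
            ;;
    esac
}

# Switch to the strictest level while branches are modified directly,
# then go back to the default level afterwards.
udba_remount_cmd inotify /aufs
udba_remount_cmd reval /aufs
```

Run the printed commands as root on a real aufs mount point.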
Linux Inotify Limitation
Unfortunately, current inotify (linux-2.6.18) has some limitations, and aufs inherits them. I am going to address some harmful cases.
IN_ATTRIB, updating atime
When a file/dir on a branch is accessed directly, the inode atime (access time, cf. stat(2)) may or may not be updated. In some cases, inotify does not fire this event. So the aufs inode atime may remain old.
IN_ATTRIB, updating nlink
When the link count of a file on a branch is incremented by link(2) directly, inotify fires IN_CREATE to the parent directory, but does not fire IN_ATTRIB to the file. So the aufs inode nlink may remain old.
IN_DELETE, removing file on NFS
When a file on an NFS branch is deleted directly, inotify may or may not fire the IN_DELETE event. It depends upon the status of the dentry (DCACHE_NFSFS_RENAMED flag). In this case, the file on aufs still seems to exist. Aufs and any user can see the file.
IN_IGNORED, deleted rename target
When a file/dir on a branch is unlinked by rename(2) directly, inotify fires IN_IGNORED, which means the inode is deleted. Actually, in some cases the inode survives, for example when the rename target is linked or opened. In this case, the inotify watch set by aufs is removed by VFS and inotify, and aufs cannot receive the events anymore. So aufs may show you incorrect data about the file/dir.
Policies to Select One among Multiple Writable Branches
Aufs has some policies to select one among multiple writable branches when you are going to write/modify something. There are two kinds of policies: one for creating something new, and the other for internal copy-up. You can select them by specifying the mount option 'create=CREATE_POLICY' or 'cpup=COPYUP_POLICY'. These policies have no meaning when you have only one writable branch; if they have any effect at all, it is to hurt the performance.
Exceptions for Policies
In every case below, even if the policy says that the branch where a new file should be created is /rw2, the file will be created on /rw1.
- •
- If there is a readonly branch with the 'wh' attribute above the policy-selected branch, and the parent dir is marked as opaque or the target (to be created) file is whiteouted on the ro+wh branch, then the policy is ignored and the target file is created on the nearest writable branch above the ro+wh branch.
-
/aufs = /rw1 + /ro+wh/diropq + /rw2
/aufs = /rw1 + /ro+wh/wh.tgt + /rw2
-
- •
- If there is a writable branch above the policy-selected branch, and the parent dir is marked as opaque or the target file is whiteouted on that branch, then the policy is ignored and the target file is created on the highest of the upper writable branches which has the diropq or the whiteout. In the case of a whiteout, aufs removes it as usual.
-
/aufs = /rw1/diropq + /rw2
/aufs = /rw1/wh.tgt + /rw2
-
- •
- The link(2) and rename(2) systemcalls are exceptions in every policy. They try to select the branch where the source exists whenever possible, since copying up a large file takes a long time. If that is not possible, i.e. the branch where the source exists is readonly, they follow the copyup policy.
- •
- There is an exception for rename(2) when the target exists. If the rename target exists, aufs compares the indices of the branches where the source and the target exist and selects the higher one. If the selected branch is readonly, then aufs follows the copyup policy.
Policies for Creating
- create=tdp | top-down-parent
- Selects the highest writable branch where the parent dir exists. If the parent dir does not exist on a writable branch, then the internal copyup will happen. The policy for this copyup is always 'bottom-up.' This is the default policy.
- create=rr | round-robin
- Selects a writable branch in round robin. When you have two writable branches and create 10 new files, 5 files will be created on each branch. The mkdir(2) systemcall is an exception: when you create 10 new directories, all of them are created on the same branch.
- create=mfs[:second] | most-free-space[:second]
- Selects the writable branch which has the most free space. In order to keep the performance, you can specify a duration ('second') during which aufs holds the index of the last selected writable branch. The first time you create something in aufs after the specified seconds have expired, aufs checks the amount of free space on all writable branches by an internal statfs call and updates the held branch index. The default value is seconds.
In this mode, a FUSE branch needs special attention. The struct fuse_operations has a statfs operation. That is fine, but its parameter is struct statvfs* instead of struct statfs*, so almost all user-space implementations will call statvfs(3)/fstatvfs(3) instead of statfs(2)/fstatfs(2). In glibc, [f]statvfs(3) issues [f]statfs(2), open(2)/read(2) for /proc/mounts, and stat(2) for the mountpoint. In this situation, a FUSE branch will cause a deadlock when you create something in aufs. Here is a sample scenario,
- •
- create a file just under the aufs root dir.
- •
- aufs will acquire a write-lock for the parent directory.
- •
- aufs may call statfs internally for each writable branch to decide which branch has the most free space.
- •
- FUSE in kernel-space converts and redirects the statfs request to the user-space.
- •
- the user-space statfs handler will call [f]statvfs(3).
- •
- the [f]statvfs(3) in glibc will access /proc/mounts and issue stat(2) for the mountpoint. But those require a read-lock for the aufs root directory.
- •
- Then a deadlock occurs.
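In order to avoid this deadlock, I would suggest not calling [f]statvfs(3) from the statfs handler. Here is some sample code which does this.
struct statvfs stvfs;

main()
{
	/* call [f]statvfs(3) only once, at startup, and keep the result */
	[f]statvfs(..., &stvfs);
}

statfs_handler(const char *path, struct statvfs *arg)
{
	struct statfs stfs;

	/* [f]statfs(2) touches neither /proc/mounts nor the mountpoint */
	[f]statfs(..., &stfs);
	/* start from the saved values, then refresh the free counts */
	memcpy(arg, &stvfs, sizeof(stvfs));
	arg->f_bfree = stfs.f_bfree;
	arg->f_bavail = stfs.f_bavail;
	arg->f_ffree = stfs.f_ffree;
	arg->f_favail = /* any value */;
}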
- create=mfsrr:low[:second]
- Selects a writable branch in most-free-space mode first, and then in round-robin mode. If the selected branch has less free space than the specified value 'low' (in bytes), then aufs retries in round-robin mode. To specify 'low', try the shell arithmetic expansion defined by POSIX, for example $((10 * 1024 * 1024)) for 10M. You can also specify the duration ('second'), which has the same meaning as in 'mfs' mode.
- create=pmfs[:second]
- Selects a writable branch where the parent dir exists, as in tdp mode. When the parent dir exists on multiple writable branches, aufs selects the one which has the most free space, as in mfs mode.
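As a worked example of the 'low' value for create=mfsrr, the arithmetic expansion mentioned above can be used to build the option string. This is a minimal sketch: the variable names, the 30-second duration and the branch paths /rw1, /rw2 and /ro are assumptions chosen for illustration, and the mount command is only printed, not executed.

```shell
#!/bin/sh
# 'low' is given in bytes; let the shell compute 10M.
low=$((10 * 1024 * 1024))
second=30    # illustrative duration, not a recommended value

# Two writable branches and one readonly branch (hypothetical paths).
opts="br:/rw1=rw:/rw2=rw:/ro=ro,create=mfsrr:${low}:${second}"

echo "$low"
echo "mount -t aufs -o $opts none /aufs"
```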
Policies for Copy-Up
- cpup=tdp | top-down-parent
- Equivalent to the same named policy for create. This is the default policy.
- cpup=bup | bottom-up-parent
- Selects the nearest writable branch above the copyup-source on which the parent dir exists.
- cpup=bu | bottom-up
- Selects the nearest writable branch above the copyup-source, regardless of whether the parent dir exists.
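The 'bottom-up' rule can be illustrated with a toy model in shell. This only sketches the selection rule as described above and is not aufs code: the function name select_bu, the zero-based branch index and the PATH=PERM argument form are assumptions made for illustration.

```shell
#!/bin/sh
# Toy model of cpup=bu. Branches are listed from the top (index 0) down,
# each as PATH=rw or PATH=ro. SRC_INDEX is the index of the branch
# which holds the copyup-source.
select_bu() {
    src_index=$1
    shift
    i=0
    nearest=
    for b in "$@"; do
        # Stop at the copyup-source; only the branches above it count.
        [ "$i" -ge "$src_index" ] && break
        case $b in
            *=rw) nearest=${b%=rw} ;;   # the last writable one seen is the nearest
        esac
        i=$((i + 1))
    done
    # Fails (non-zero status) when no writable branch is above the source.
    [ -n "$nearest" ] && printf '%s\n' "$nearest"
}

# The source is on /ro (index 2): /rw2 is nearer to it than /rw1.
select_bu 2 /rw1=rw /rw2=rw /ro=ro
```

A model of cpup=bup would additionally require the parent dir to exist on the candidate branch before remembering it.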
Exporting Aufs via NFS
Aufs supports NFS-exporting in linux-2.6.18 and later. Since aufs has no actual block device, you need to add the NFS 'fsid' option when exporting. Refer to the NFS manual for the details of this option.
In linux-2.6.23 and earlier, it is recommended to export your branch filesystems once before exporting aufs. By exporting once, the branch filesystem's internal pointer named find_exported_dentry is initialized. After this initialization, you may unexport them. Additionally, this initialization should be done per filesystem type: if your branches are all of the same filesystem type, you need to export just one of them once. If you have never exported a filesystem which is used in your branches, aufs will initialize the internal pointer with the default value and produce a warning. While it will work correctly, I am afraid it will be unsafe in the future. In linux-2.6.24 and later, this exporting is unnecessary.
Additionally, there are several limitations or requirements.
-
- •
- The version of linux kernel must be linux-2.6.18 or later.
- •
- You need to enable CONFIG_AUFS_EXPORT.
- •
- The branch filesystem must support NFS-exporting. For example, tmpfs in linux-2.6.18 (or earlier) does not support it.
- •
- NFSv2 is not supported. When you mount the exported aufs from your NFS client, you will need some NFS options such as v3 or nfsvers=3, especially if it is nfsroot.
- •
- If the size of the NFS file handle on your branch filesystem is large, aufs will not be able to handle it. The maximum size of an NFSv3 file handle for a filesystem is 64 bytes. Aufs uses 24 bytes on a 32bit system, plus 12 more bytes on a 64bit system. The rest is room for the file handle of the branch filesystem.
- •
- The External Inode Number Bitmap and Translation Table (xino) is required, since the NFS file handle is based upon the inode number. The mount option 'xino' is enabled by default.
- •
- The branch filesystems must be accessible, which means 'not hidden.' It means you need to 'mount --move' when you use initramfs and switch_root(8), or chroot(8).
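The file handle budget in the list above works out as follows. This is a small worked calculation in shell which uses only the sizes stated in this section; the variable names are illustrative, and it assumes that 'plus 12 bytes' means 12 bytes on top of the 24.

```shell
#!/bin/sh
nfsv3_max=64         # maximum NFSv3 file handle size in bytes
aufs_32bit=24        # bytes used by aufs on a 32bit system
aufs_64bit_extra=12  # additional bytes used on a 64bit system (assumed reading)

room_32bit=$((nfsv3_max - aufs_32bit))
room_64bit=$((nfsv3_max - aufs_32bit - aufs_64bit_extra))

echo "room for the branch file handle on 32bit: $room_32bit bytes"
echo "room for the branch file handle on 64bit: $room_64bit bytes"
```

Under that reading, a branch filesystem whose own file handle is larger than the computed room cannot be exported through aufs.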
Dentry and Inode Caches
If you want to clear caches on your system, there are several tricks for that. If your system ram is low, try 'find /large/dir -ls > /dev/null'. It will read many inodes and dentries and cache them, and then old caches will be discarded. But when you have a lot of ram or do not have such a large directory, it is not effective.
If you want to discard the cache within a certain filesystem, try 'mount -o remount /your/mntpnt'. Some filesystems may return an error such as EINVAL, but VFS still discards the unused dentry/inode caches on the specified filesystem.
Compatible/Incompatible with Unionfs Version 1.x Series
If you compile aufs with -DCONFIG_AUFS_COMPAT, the dirs= option and the =nfsro branch permission flag are available. They are interpreted as the br: option and the =ro flag respectively.
The 'debug', 'delete' and 'imap' options are ignored silently. When you compile aufs without -DCONFIG_AUFS_COMPAT, these three options are also ignored, but a warning message is issued.
Since it ignores the 'delete' option, and in order to keep filesystem consistency, aufs tries to write to only one branch in a single systemcall. It means aufs may copyup even if the copyup-src branch is specified as writable. For example, suppose you have two writable branches and a large regular file on the lower writable branch. When you issue rename(2) to the file on aufs, aufs may copyup it to the upper writable branch. If this behaviour is not what you want, then you should rename(2) it on the lower branch directly.
There is also a simple shell script 'unionctl' under the sample subdirectory, which is compatible with unionctl(8) in Unionfs Version 1.x series, except for the --query action. This script executes mount(8) with the 'remount' option and uses the add/del/mod aufs mount options. If you are familiar with Unionfs Version 1.x series and want to use unionctl(8), you can try this script instead of using mount -o remount,... directly. Aufs does not support the ioctl(2) interface. This script depends heavily upon mount(8) in the util-linux-2.12p package, and you need to mount /proc to use it. If your mount(8) version differs, you can try modifying this script. It is very easy. The unionctl script is just a sample usage of the aufs remount interface.
Aufs uses the external inode number bitmap and translation table by default.
The default branch permission for the first branch is 'rw', and the rest is 'ro.'
The whiteout is for hiding files on lower branches. It is also applied to stop readdir from going to lower branches; the latter case is called an 'opaque directory.' Any whiteout is an empty file, which means a whiteout is just a mark. In the case of hiding lower files, the name of the whiteout is '<filename>.' And in the case of stopping readdir, the name is '.opq' or '__dir_opaque.' The name depends upon your compile configuration CONFIG_AUFS_COMPAT. All whiteouts are hardlinked, including '<writable branch top dir>/.'
A hardlink on an ordinary (disk based) filesystem does not consume a new inode. But in linux tmpfs, the number of free inodes is decremented by link(2). It is recommended to specify the nr_inodes option to your tmpfs if you meet ENOSPC. Use this option after checking with 'df -i.'
When you rmdir or rename-to a dir which has a number of whiteouts, aufs renames the dir to a temporary whiteouted-name like '<dir>.<random hex>.' and then removes it after the actual operation. cf. mount option 'dirwh.'
Incompatible with an Ordinary Filesystem
stat(2) returns the inode info from the first existing inode among the branches, except for the directory link count. Aufs usually computes a directory link count larger than the exact value, in order to keep UNIX filesystem semantics, or in order to shut find(1)'s mouth up. The size of a directory may be wrong too, but it should do no harm. The timestamp of a directory will not be updated when a file is created or removed under it on a lower branch.
The test for permission bits has two cases: one for a directory and the other for a non-directory. In the case of a directory, aufs checks the permission bits of all existing directories, which means you need the correct privilege for the directories on the lower branches too. The test for a non-directory is simpler: it checks only the topmost inode.
statfs(2) returns the first branch info except namelen. The namelen is decreased by the whiteout prefix length.
Remember, seekdir(3) and telldir(3) are not defined in POSIX. They may not work as you expect. Try rewinddir(3) or re-open the dir.
The whiteout prefix () is reserved on all branches. Users should not handle filenames which begin with this prefix. In order to allow a future whiteout, the maximum filename length is limited to the longest value - . It may be a violation of POSIX.
If you dislike the difference between the aufs entries in /etc/mtab and /proc/mounts, and if you are using mount(8) in the util-linux package, then try the ./mount.aufs script. Copy the script to /sbin/mount.aufs. This simple script tries to update /etc/mtab. If you do not care about /etc/mtab, you can ignore this script. Remember that this script depends heavily upon mount(8) in the util-linux-2.12p package, and you need to mount /proc.
Since aufs uses its own inodes and dentries, your system may cache a huge number of inodes and dentries. It can be twice as many as all of the files in your union. It means that unmounting or remounting readonly at shutdown time may take a long time, since mount(2) in VFS tries to free all of the cache on the target filesystem.
When you open a directory, aufs will open several directories internally. It means you may reach the limit on the number of file descriptors. And when a lower directory cannot be opened, aufs will close all the opened upper directories and return an error.
A sub-mount under a branch on a local filesystem is ignored. For example, if you have mounted another filesystem on /branch/another/mntpnt, the files under 'mntpnt' will be ignored by aufs. It is recommended to mount the sub-mount under the mounted aufs. For example,
# sudo mount /dev/sdaXX /ro_branch
# d=another/mntpnt
# sudo mount /dev/sdbXX /ro_branch/$d
# mkdir -p /rw_branch/$d
# sudo mount -t aufs -o br:/rw_branch:/ro_branch none /aufs
# sudo mount -t aufs -o br:/rw_branch/${d}:/ro_branch/${d} none /aufs/another/$d
There are several characters which are not allowed in a branch directory path or an xino filename. See Branch Syntax and Mount Options for details.
The file-lock which means fcntl(2) with F_SETLK, F_SETLKW or F_GETLK, flock(2) and lockf(3), is applied to virtual aufs file only, not to the file on a branch. It means you can break the lock by accessing a branch directly. TODO: check 'security' to hook locks, as inotify does.
The fsync(2) and fdatasync(2) systemcalls return 0 which means success, even if the given file descriptor is not opened for writing. I am afraid this behaviour may violate some standards. Checking the behaviour of fsync(2) on ext2, aufs decided to return success.
If you want to use disk-quota, you should set it up to your writable branch since aufs does not have its own block device.
When your aufs is the root directory of your system, and your system tells you that some of the filesystems were not unmounted cleanly, try this procedure when you shut down your system.
# mount -no remount,ro /
# for i in $writable_branches
# do mount -no remount,ro $i
# done
If your xino file is on a hard drive, you also need to specify the 'noxino' option or 'xino=/your/tmpfs/xino' when remounting the root directory.
rename(2) on a directory may return EXDEV even if both src and tgt are on the same aufs. When the rename-src dir exists on multiple branches and the lower dir has child(ren), aufs has to copyup all of its children, which can be a recursive copyup. Current aufs does not support such a huge copyup operation at one time in kernel space; instead it produces a warning and returns EXDEV. Generally, mv(1) detects this error and tries mkdir(2) and rename(2) or copy/unlink recursively, so the result is harmless. If your application which issues rename(2) for a directory does not support EXDEV, it will not work on aufs. This specification also applies to the case where the src directory exists on a lower readonly branch and has child(ren).
EXAMPLES
The mount options are interpreted from left to right at remount-time. These examples show how the options are handled (assuming /sbin/mount.aufs was installed).
# mount -v -t aufs -o br:/day0:/base none /u
none on /u type aufs (rw,xino=/day0/.aufs.xino,br:/day0=rw:/base=ro)
# mount -v -o remount,\
prepend:/day1,\
xino=/day1/xino,\
mod:/day0=ro,\
del:/day0 \
/u
none on /u type aufs (rw,xino=/day1/xino,br:/day1=rw:/base=ro)
# mount -t aufs -o br:/rw none /u
# mount -o remount,append:/ro /u
different uid/gid/permission, /ro
# mount -o remount,del:/ro /u
# mount -o remount,nowarn_perm,append:/ro /u
#
(there is no warning)
When you use aufs as the root filesystem, it is recommended to consider excluding some directories. For example, /tmp and /var/log do not need to be stacked in many cases, since they do not usually need copyup or whiteout. Also, a swapfile on aufs (a regular file, not a block device) is not supported.
There is also a good sample for network booted diskless machines. See sample/ for details.
DIAGNOSTICS
When you add a branch to your union, aufs may warn you about the privilege or security of the branch, that is, the permission bits, owner and group of the top directory of the branch. For example, when your upper writable branch has a world writable top directory, a malicious user can create any file on the writable branch directly, imitating copyup and modifying it manually. I am afraid it can be a security issue.
When you mount or remount your union without the common mount option -o ro and without a writable branch, aufs will warn you that the first branch should be writable.
When you set udba other than inotify and change something on your branch filesystem directly, later aufs may detect some mismatches to its cache. If it is a critical mismatch, aufs returns EIO and issues a warning saying 'try udba=inotify.'
When an error occurs in aufs, aufs prints the kernel message with 'errno.' The priority of the message (log level) is ERR or WARNING which depends upon the message itself. You can convert the 'errno' into the error message by perror(3), strerror(3) or something. For example, the 'errno' in the message 'I/O Error, write failed (-28)' is 28 which means ENOSPC or 'No space left on device.'
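The conversion described above can be sketched in shell. This is only an illustration: the function names aufs_errno and errno_name are hypothetical, the message format is assumed to be the one quoted in the example, and the small table covers only a few Linux errno values mentioned in this manual.

```shell
#!/bin/sh
# Extract the errno from an aufs kernel message such as
# 'I/O Error, write failed (-28)'.
aufs_errno() {
    printf '%s\n' "$1" | sed -n 's/.*(-\([0-9][0-9]*\)).*/\1/p'
}

# Name a few Linux errno values; anything else is printed as a number.
errno_name() {
    case $1 in
        5)  echo EIO ;;
        18) echo EXDEV ;;
        22) echo EINVAL ;;
        28) echo ENOSPC ;;
        *)  echo "errno $1" ;;
    esac
}

n=$(aufs_errno 'I/O Error, write failed (-28)')
errno_name "$n"
```

For a complete mapping, strerror(3) or perror(3) remain the authoritative sources.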
COPYRIGHT
Copyright © 2005-2008 Junjiro Okajima
AUTHOR
Junjiro Okajima