Name:      nss_memcache
Author:    Will Drewry
Copyright: 2006 Will Drewry
License:   GPLv2
Date:      23 Oct 2006

1. Summary
2. Rationale
   1. Background
   2. So why nss_memcache?
3. Use cases
4. Scope
5. Design
   1. NSS module
   2. Caching daemon
6. Implementation
   1. Components
   2. Plan
   3. Code
   4. Security Concerns
7. Status
8. Future work

Summary
=======
nss_memcache is an NSS (Name Service Switch) module that caches
responses sourced from any other modules used on the system. The cache
data is stored in a local memcached instance, separated by user.

Rationale
=========

Background
----------
The NSS system is an approach for cleanly extending name services using
modules. It is used on both GNU-based systems and Solaris.

Services using the NSS system are usually handled by modules in
first-in-first-out order. If the required data is resolvable by the
first module listed in /etc/nsswitch.conf, that data will be returned
from that module and all other modules will be skipped.

In general, the FIFO approach works quite well, but it gets complicated
when external name resolution systems get involved. When the NSS
modules are mapped into a process's memory and resolution takes place,
a distinct connection is made to the external system per process and,
at worst, per request. Because of this, it is highly desirable to add
some form of caching.

Enter `nscd', the name service cache daemon.

nscd is the existing solution to this problem. It caches module data.
It is multi-threaded. It fails a lot.

nscd works outside the NSS system design by intercepting data via a
UNIX socket. Not only does it not follow the general design, it is also
rather complex. Problems on both Linux and Solaris have been seen for
years.

So why nss_memcache?
--------------------
With the advent of easy-to-use thread local storage (TLS), it is easier
to write a caching solution that works within the NSS system itself.
It becomes easy for the systems administrator to select which services
to cache within nsswitch.conf instead of in a separate config. In
addition, it allows for a cleaner separation of caching code and the
main NSS system.

Use cases
=========
* User A works in a large environment that relies on nss_ldap for
  passwd and group data. User A types "ls -la" on the prompt but
  currently has to wait for each nss_ldap lookup even though most
  files are owned by "root" or "usera". (rel1)

* User B administers a small shell server which relies on a postgresql
  database using nss_pgsql and uses nscd to avoid constant lookups.
  Currently, nscd fails, blocking the progress of many processes on the
  system. Users are complaining. (rel1)

* User C administers a high load internal mail server which uses
  nss_mysql and nscd. When mysql blocks or nscd hangs, mail gets
  delayed or even rejected. User C is considering moving to nss_db but
  doesn't want to deal with the fuss. (rel1)

* User D works in a large environment that relies on nss_mysql for
  passwd data and attempts to use bash username completion (cd ~).
  User D doesn't want to get coffee while waiting for getpwent() to
  iterate. (rel2: per-user, controlled cache dumps)

* User E runs a large scale install of OpenLDAP for account management
  and is tired of setting up replication slaves. User E would prefer a
  limited number of slaves and more automatically configured caches.
  (rel3: kerberized remote caches)

Scope
=====
The scope of this project is to supply a caching subsystem similar to
"nscd" but without relying on any special libc hooks. Ideally, it
should be a stable, robust drop-in replacement for "nscd".

Future work may expand this scope to consider a distributed caching
subsystem.

Design
======
The nss_memcache project consists of two components:
- NSS system module: nss_memcache
- A caching daemon

NSS module: nss_memcache
------------------------
The nss_memcache component has several constraints which must be met:
- must work within the NSS system
- thread-safe
- will not block on failure

The largest difference between nss_memcache and nscd is that
nss_memcache will not rely on libc to provide a special interface for
caching. Like nscd, there is a caching daemon, as described below, but
unlike nscd, there is an NSS component. In particular, the module not
only performs lookups against the cache daemon, it also performs
updates. This is done by falling through to the other NSS modules on
the system, such as NIS or LDAP. Once data is returned, the cache
daemon is updated and the value is returned to the caller.

The basic algorithm for this module is as follows:
1. Receive request from client code
2. Query cache daemon for match
3. On match, return
4. On no match, call the same function that was originally called,
   e.g. getpwuid()
5. When the call reaches nss_memcache again, return unavailable so
   that the lookup moves on to the next module
6. If the call returns a match, update the cache server and return
7. If not, update the cache server with a negative match and return

This approach introduces challenges around proper locking and
thread-safe behavior. In particular, calls back into the same function,
e.g. getpwuid(), should fall through only if a fallthrough is actually
in progress for the calling thread, and in a threaded environment this
cannot be decided from process-wide state. To get around this problem,
a static thread local "fallthrough" variable is used to note when the
called function has fallen through to the other NSS modules.

Updates to this thread local variable are then done using libc_locks.
The locks ensure that user-level threaded applications, which do not
benefit from thread-local storage, do not cause unexpected updates.
However, this means that user-level threaded applications may not
benefit as much from nss_memcache.
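
The seven steps and the thread local "fallthrough" flag can be
sketched in plain C. This is only an illustration under stated
assumptions, not the project's code: the one-entry cache stands in for
the memcached daemon, chain_lookup() stands in for libc walking the
modules listed in nsswitch.conf, backend_lookup() stands in for a
module such as ldap, and every name below is hypothetical. The
libc_locks mentioned above are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the NSS status codes (hypothetical names). */
enum status { ST_SUCCESS, ST_NOTFOUND, ST_UNAVAIL };

/* Per-thread flag: nonzero while this thread is falling through. */
static _Thread_local int in_fallthrough = 0;

/* Hypothetical one-entry cache standing in for the cache daemon. */
struct cache_entry { int valid; int negative; unsigned uid; const char *name; };
static struct cache_entry cache;

int backend_calls = 0;  /* counts how often the slow backend is hit */

enum status chain_lookup(unsigned uid, const char **name);
enum status memcache_lookup(unsigned uid, const char **name);
enum status backend_lookup(unsigned uid, const char **name);

/* Stand-in for a slow external module such as ldap. */
enum status backend_lookup(unsigned uid, const char **name)
{
    backend_calls++;
    if (uid == 1000) {
        *name = "userb";
        return ST_SUCCESS;
    }
    return ST_NOTFOUND;
}

/* The caching module: steps 1 to 7 of the algorithm. */
enum status memcache_lookup(unsigned uid, const char **name)
{
    enum status st;

    assert(name != NULL);
    if (in_fallthrough)                     /* step 5: re-entered */
        return ST_UNAVAIL;
    if (cache.valid && cache.uid == uid) {  /* steps 2 and 3 */
        if (cache.negative)
            return ST_NOTFOUND;
        *name = cache.name;
        return ST_SUCCESS;
    }
    in_fallthrough = 1;                     /* step 4: call back in */
    st = chain_lookup(uid, name);
    in_fallthrough = 0;

    cache.valid = 1;                        /* steps 6 and 7 */
    cache.uid = uid;
    cache.negative = (st != ST_SUCCESS);
    cache.name = (st == ST_SUCCESS) ? *name : NULL;
    return (st == ST_SUCCESS) ? ST_SUCCESS : ST_NOTFOUND;
}

/* Stand-in for libc trying each module in order: cache, then backend. */
enum status chain_lookup(unsigned uid, const char **name)
{
    enum status st = memcache_lookup(uid, name);
    if (st == ST_UNAVAIL)
        st = backend_lookup(uid, name);
    return st;
}
```

In this sketch, a second chain_lookup() for the same uid is answered
from the cache entry and never reaches backend_lookup(), and a miss is
remembered as a negative entry.
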
(More attention must be paid to user-level threaded applications.)

With the first two constraints addressed, the third is somewhat
straightforward. nss_memcache will avoid blocking program execution on
failure primarily through the use of non-blocking I/O. The only
acceptable place for nss_memcache to block is on callbacks into other
NSS modules.

With this design in mind, nss_memcache should be listed ahead of all
modules to be cached on the system, e.g.:

    passwd: memcache [NOTFOUND=return] ldap nis compat

Caching daemon: memcached
-------------------------
As the name of the project indicates, memcached was chosen as the
caching daemon for use with nss_memcache. Initially, this project was
to be called nss_cache and have a custom caching daemon written around
libevent.

Enter memcached.

memcached is a generic object caching system written around libevent.
Not only does it meet the basic requirement of a fast, single-threaded
caching daemon, it also already supports UNIX domain sockets and has a
simple protocol and several existing client APIs for multiple
languages. In addition, memcached has been used behind production web
services, like LiveJournal, for several years. It is actively supported
and has a good track record of robustness.

memcached is not perfect, however. It lacks any awareness of users and
has no way to enforce security. This limitation, however, is easy
enough to overcome.

The memcached daemon will be extended to retrieve peer credentials
(e.g. using SO_PEERCRED) over UNIX sockets. This extension will allow
the separation of user cache spaces with a minimal amount of added
work. When nss_memcache connects to memcached over the socket,
memcached will check the peer credentials and prefix the specified key
with the user's uid number. All "get", "set", "incr", and "decr"
requests over the UNIX socket will have the key prefixed with the
caller's uid. This is true in every case except when the uid number is
0. In those cases, no prefix will be enforced.
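
A minimal sketch of the prefixing rule, assuming the peer uid has
already been obtained from the socket (for example via SO_PEERCRED).
The function name build_cache_key() and the "uid:key" layout with a
colon separator are assumptions made for illustration; the document
does not fix a key format. The 250-byte limit is the key length limit
of memcached's text protocol.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* memcached's text protocol limits keys to 250 bytes. */
#define MAX_KEY_LEN 250

/*
 * Write the key that the daemon would actually use into out.
 * uid 0 keeps the key as given; every other uid gets a "uid:" prefix.
 * Returns 0 on success, -1 if the result does not fit in out or would
 * exceed MAX_KEY_LEN.
 */
int build_cache_key(uid_t peer_uid, const char *key, char *out, size_t outlen)
{
    int n;

    assert(key != NULL && out != NULL);
    if (strlen(key) > MAX_KEY_LEN)
        return -1;
    if (peer_uid == 0)
        n = snprintf(out, outlen, "%s", key);
    else
        n = snprintf(out, outlen, "%lu:%s", (unsigned long)peer_uid, key);
    if (n < 0 || (size_t)n >= outlen || n > MAX_KEY_LEN)
        return -1;
    return 0;
}
```

With this sketch, a request for "pw_uid_1000" from uid 1000 is stored
under "1000:pw_uid_1000", while the same request from uid 0 is stored
under "pw_uid_1000".
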
Leaving root's keys unprefixed will allow the root user to easily
prepopulate caches or even forcibly clear user-specific caches. In
addition, "get" requests using user-separation will first check for a
non-prefixed key set by the superuser prior to searching for a
user-specific key. (This functionality may be offloaded to
nss_memcache, but how is still unclear.)

Implementation
==============
The entire project will be written in C with an eye towards readability
and extensibility.

Components
----------
The memcached patches should be as simple and unobtrusive as possible.
It is highly desirable to get them accepted upstream by the maintainers
of memcached.

nss_memcache should be templated as much as possible. Given that
memcached accepts arbitrary objects, it seems that the largest
challenge will be coming up with ways to marshal the data in a generic
fashion.

The libmemcache API was originally slated to be used in nss_memcache,
but due to a high number of libmemcache-induced bugs, it has been set
aside. Instead, a very lightweight memcache C API will be used.

Plan
----
The initial goal of this project is to create a robust drop-in for
passwd and group caching. Once this goal is achieved, this project will
be showcased to a few potential test users for feedback and comments.
From there, the code will be polished further and any future
enhancements, such as distributed (kerberized) caching, will be added.

Code
----
This code will live on code.google.com for now.

Security Concerns
-----------------
- abuse
  - one user claiming all the memcache daemon's allocated memory: ???
    - may be solvable with user bucketing?
- user cache tainting: memcached should separate by UID #
- sensitive data: avoid caching passwords
- bad code: :-(

Status
======
- proof-of-concept nss_memcache using TLS and libc locks (may be able
  to ditch the libc locks since TLS is in use...maybe)
- nss_memcache getpwuid_r() working
- libmemcache sucks - need to replace with libmemcache-lite

Future work
===========
Now:
- Make flags 32-bit and hold the UID of the caller ??
- Make nss_memcache do two lookups per request (0, getuid())
- Add key pattern match *get() to memcache()
- Make lookups (get, bget) check a master entry then a user-specific
  entry: e.g. client requests should first check the root-owned
  (non-prefixed) "key" then look up ":key".

Future:
- libmemcache-lite -- add hashing, auth, etc
- nss_dmemcache:
  - network based memcache with buckets for distributed servers
  - kerberos or ssh-agent used for user cache separation on
    updates/lookups
  - admin credentials can push "master" value updates
  - replace LDAP replication with server-per-office plus nss_kmemcache
    servers
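
The two-step lookup listed under "Now" (master entry first, then the
user-specific entry) could look roughly like the sketch below. It is
an illustration only: the in-memory table stands in for memcached
"get" requests, and store_get(), lookup_master_then_user(), and the
"uid:key" layout are hypothetical names and formats chosen for the
example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct kv { const char *key; const char *value; };

/* Hypothetical stand-in for a memcached "get": linear table search. */
const char *store_get(const struct kv *store, size_t n, const char *key)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(store[i].key, key) == 0)
            return store[i].value;
    return NULL;
}

/*
 * Check the root-owned (non-prefixed) key first; only if that misses,
 * check the caller's own "uid:key" entry. Returns NULL on a miss.
 */
const char *lookup_master_then_user(const struct kv *store, size_t n,
                                    unsigned long uid, const char *key)
{
    char prefixed[256];
    const char *hit;
    int len;

    assert(store != NULL && key != NULL);
    hit = store_get(store, n, key);
    if (hit != NULL)
        return hit;
    len = snprintf(prefixed, sizeof prefixed, "%lu:%s", uid, key);
    if (len < 0 || (size_t)len >= sizeof prefixed)
        return NULL;
    return store_get(store, n, prefixed);
}
```

Because the master entry wins, a value that a user stores under their
own prefix cannot shadow a value pushed by root, and one user's
entries are never visible to another uid.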