It's been a while since I jumped on a nice steep learning curve, and cfengine is suiting me nicely to that end. Many administrators probably know of cfengine as something that sounds incredibly useful but a bit of work to get rolling, which ends up being true. You can head to
www.cfengine.org to get a pretty good basic introduction to what it is and what it does. I'll also try to describe it a bit here.
Basically, cfengine is a suite of daemons that run in a client-server type configuration. They are intended to compare the state of a given computer system to the ideal state described by a configuration file, and take steps to correct or warn of any problems it finds. This might take the form of:
- files that should be the same for every box in the environment (e.g. /etc/resolv.conf)
- processes that should or shouldn't be running
- ownership and permissions on files
- configuration of some system daemons
The basic gist is that if you have a network of over 200 virtual machines like I manage, and you need to make a change to a system configuration file, cfengine provides a method to do this without having to manually log into each machine and make the change, or alternately running a dangerous non-interactive shell script on every host.
One of the powerful aspects of this is that cfengine is aware of the differences between, say, how a Linux system does things and how a Solaris system does things. So for many tasks, the same line in the configuration file could cause a vastly different sequence of events depending on the system architecture. The system administrator doesn't have to care about this, she or he simply states in the high-level cfengine configuration language how the system should behave, and cfengine will figure out the details.
Another powerful aspect is the notion of "classes". This is great if you have a QA environment closely modeled on your production environment, but with some key changes. The cfgengine client daemon (cfagent) will automatically assign itself to several classes based on system variables such as subdomain or subnet. So if you have a "prod.acme.com" subdomain and a "qa.acme.com" subdomain for example, you can define different versions of key files for those subdomains.
At Lijit, we're tying it in to Subversion to make management simpler. We have a repository of cfengine configuration files (which get propogated to the clients in the same way as other files), and a repository of other files we wish to be distributed by cfengine. I can check out these repos to my PC, make the edits I need to make, commit the changes, and within 15 mintutes all of my clients will have the new configuration and, presumably, will have taken whatever steps they need to get in sync. There are methods for "pushing" updates from the master server as well, but I haven't gotten that far yet.
One stumbling block that I'm just figuring out now is how to use "define" statements to cause chains of events to take place. Take /etc/syslog.conf. If I update that file, the syslog daemon needs to be restarted. Cfengine can restart running processes in a myrad of ways, so that's no problem, but I don't want syslogd restarted every 15 minutes; that'd just be a waste, and could lead to dropped messages. So I add a "define" statement to the configuration line for the syslog.conf file. Like so:
copy:
qa_acme_com::
/foo/bar/qa/syslog.conf dest=/etc/syslog.conf mode=0644 owner=0 group=0 server=cfengine.acme.com define=new_syslog_conf
So, this line (in the "copy" section of the file) says that if you are a member of the qa_acme_com class, make sure your /etc/syslog.conf matches the master version and copy it over if it doesn't. If you copy the file, then define yourself as a member of the class "new_syslog_conf". I can then later on in the config say that members of the "new_syslog_conf" class should restart their syslog daemon.
This is just barely scratching the surface of cfengine. There are good tutorials and documentation out there for anyone wishing to learn about it. I suspect it will become an invaluable tool for us at lijit as the number of hosts we manage continues to grow. If you're managing more than a couple dozen hosts I definitely recommend having a look.