Code Librarian

Contents

*Abstract
*Feel the source, Luke!
*Design - a db oriented briefing
*The indexer
*- Trigger mode
*- Full sweep mode
*The web interface


Abstract

Code Librarian is a generic tool for keeping track of updates to CVS repositories and presenting them to site visitors in a friendly CVS browser web environment, conceptually similar to tools such as Bonsai or ViewCVS. It also presents nice commit graphs, such as those shown on this site. The main goals for the project include scalability, usability, extensibility and configurability, and in trying to achieve those goals, much of the dirty work is delegated to MySQL and Roxen WebServer.

Feel the source, Luke!

At present, the repository is hosted by the Lysator Academic Computer Society and can be downloaded via anonymous CVS from cvs.lysator.liu.se. More information on setting it all up will be available here on the home page and in an INSTALLING file (not yet written). For now, you can browse its source code on Lysator via the tool itself, though the layout template in use there is not as thoroughly worked through as the one on this site. Stay tuned for updates. If you want to check out the code and its docs to play with it and try it out, issue these two commands:

cvs -d:pserver:anonymous@cvs.lysator.liu.se:/cvsroot/code_librarian login
cvs -d:pserver:anonymous@cvs.lysator.liu.se:/cvsroot/code_librarian co code_librarian


Design - a db oriented briefing

The system design is fairly straightforward; the core of the Code Librarian is its database, where all data about monitored repositories reside. This database is shared among the different components, which may or may not run across multiple machines simultaneously. Where needed, synchronization is implemented using the MySQL cooperative locking scheme (GET_LOCK() / RELEASE_LOCK()).
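
The locking details (lock names, timeouts) are internal to Code Librarian and not documented here, but the general pattern is easy to illustrate. The sketch below uses the stock mysql command-line client; the lock name cl_indexer and the database name code_librarian are made up for the example.

mysql -N code_librarian <<'EOF'
-- Take a named cooperative lock, waiting at most 10 seconds
-- (returns 1 on success, 0 on timeout).
SELECT GET_LOCK('cl_indexer', 10);

-- ... statements that must not run concurrently with other
-- components would go here ...

-- Release the lock explicitly; it is also dropped automatically
-- when the connection closes.
SELECT RELEASE_LOCK('cl_indexer');
EOF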

The database is initialized by the Code Librarian indexer, whose primary function is to produce the main contents of the database. Once set up, the database keeps track of which repositories are monitored, their host machine, CVS access scheme and cvsroot, and how they should be referred to in the user interface. The bulk of the database, however, is the data collected about these repositories.
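
The actual table layout is created and maintained by the indexer and is not described here; purely as an illustration of the kind of per-repository configuration mentioned above, a registry table could look roughly like this (the table and column names are made up for the example):

mysql code_librarian <<'EOF'
-- Illustrative sketch only; the real Code Librarian schema may
-- differ. One row per monitored repository.
CREATE TABLE IF NOT EXISTS repositories (
  id            INT AUTO_INCREMENT PRIMARY KEY,
  display_name  VARCHAR(64)  NOT NULL,  -- name shown in the user interface
  host          VARCHAR(128) NOT NULL,  -- machine hosting the repository
  access_scheme VARCHAR(32)  NOT NULL,  -- e.g. pserver, ext, local
  cvsroot       VARCHAR(255) NOT NULL   -- path to the repository root
);
EOF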

The indexer

The Code Librarian indexer is the main producer of content for the database. For one CL setup, you may run multiple indexers, typically one per repository host. Each indexer may monitor multiple repositories, provided it has filesystem (read) access to their cvsroots. The indexer can run in two modes (interchangeably; you can switch between them at will, at any time, without them interfering with one another), depending on the nature of your setup and your personal preference.

- Trigger mode

The recommended (most responsive, least resource-intensive) mode of operation is the incremental (notify, trigger, daemon) mode. This mode requires setting up commitinfo and taginfo hooks for the CVS repositories; the hooks notify the indexer of repository events as they happen, through an event log directory on the same machine. Then start the indexer in daemon mode (supply the -l flag and the path of the event log directory). It will perform an initial sweep of all of its repositories at start-up, and then continuously monitor the log directory for updates (new commits and tags). This way, repository changes take effect and show up in the UI almost instantly, with a minimum of strain on the system.
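
The exact hook setup and event log format depend on your installation; the sketch below only shows the general shape. The notify script path, the event log directory and the indexer binary name are assumptions made up for the example - only the -l flag comes from the description above.

# In CVSROOT/commitinfo and CVSROOT/taginfo of each monitored
# repository, hook in a small notify script (hypothetical name)
# that drops an event record into the event log directory:
#
#   ALL /usr/local/libexec/cl-notify /var/spool/cl-events
#
# Then start the indexer in daemon mode, pointing it at the same
# event log directory (the binary name is an assumption):
cl-indexer -l /var/spool/cl-events &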

- Full sweep mode

If, for some reason, incremental mode is not a viable option, there is also the cruder full sweep mode, suited to being run from crontab, for instance (a sample crontab entry is sketched after the list below). Full sweep mode performs a single sweep through all of its repositories and then exits back to the shell. The advantages over daemon mode are that it takes a minimum of setup and that it only runs when you choose to (whether that is good or bad is, of course, a matter of taste). The disadvantages of this mode are:

*the unavoidable latency - The system is less responsive to repository changes, and the UI lags further behind the less often you run the indexer.
*excessive disk activity - Since each invocation has to traverse all directories of all repositories, your disk arrays could take on a massive load of stat() / get_dir() calls if you monitor reasonably big repositories.
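
As an example of the crontab-driven setup mentioned above, an entry along these lines would run a full sweep once per hour. The binary name is the same made-up one as in the trigger mode sketch, and the assumption that full sweep mode is simply the indexer run without the -l flag should be checked against the indexer's own documentation.

# Hypothetical crontab entry: run a full sweep at ten past every hour.
10 * * * *  /usr/local/bin/cl-indexer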


The web interface

The web frontend is provided by a set of Roxen modules (okay, currently only one) and a set of RXML templates. The look and feel is entirely up to these templates, since the modules only provide the tools for extracting the relevant data from the database in a convenient manner that stays consistent over time (the database design is more likely to change than the RXML access methods). The web server (or servers; you may run the frontend on multiple machines, should you want to) need not run on the same machine(s) as your repositories, but it does need read and write access to the database. So far, though, write access is strictly needed only for caching some operations (the colorized view of file contents, for instance).