What Is An Audio Server
Linux users are most familiar with the concept of a server in the form of a separate machine dedicated to performing a specific task, such as a standalone mail server or a file server, or in the form of server software running on the same machine as its client software, such as the X Window System. A server provides particular resources to an entire system, whether one machine or many, eliminating the need to directly access such services at the application level. For example, an X server manages access to and control of the video services of the machine's graphics chipsets, relieving user-level application developers of the burden of programming for those services directly.
An audio server manages access to the capabilities and services of the installed audio devices. These devices include soundcards, on-board audio chipsets, and any other hardware with audio capabilities (telephony hardware, combined A/V cards, television and radio boards). Common instances of standalone audio servers include machines dedicated to streaming audio over networks and machines dedicated to serving audio files. A streaming media server would be employed in a netcasting station, while an audio file server would typically be used in a recording or post-production studio.
The graphic Linux desktop depends upon X for its graphics and other video services. Thus, your favorite KDE or GNOME applications access those services through the X application programming interface (API): their programmers code for the hardware via a generalized API instead of addressing the hardware directly. Alas, the sonic Linux desktop lacks a single standardized solution for serving audio resources system-wide. Instead, a variety of solutions have appeared, including the artsd, esd, NAS, and JACK systems. Before considering these systems, let's take a closer look at the hardware they address.
The audio industry distinguishes between consumer and professional grade audio devices. Consumer-grade devices include PCI and USB audio interfaces, on-board chipsets for integrated desktop and laptop sound support, and more advanced hardware such as Creative's SB Live! and Audigy cards. These devices normally provide channels for a master volume control, PCM and CD audio output, and inputs for microphone and line-level signals. The master volume, CD, mic, and line channels are self-explanatory. The PCM channel is a general digital audio playback channel providing volume control for programs playing WAV, AIFF, OGG, MP3, and other soundfile types.
Depending on the audio chipset, these basic services may be expanded to include channels for on-board synthesizer output, digital audio connections, surround sound channels, and bass/treble tone controls. Software mixers such as alsamixer poll the audio hardware for its capabilities and configure the mixer to display the available channels and switches. For example, my laptop's CS4232 audio chipset supplies little more than the basic services, while my desktop machine's SBLive provides a much larger array.
Professional-grade audio boards such as the RME Hammerfall or the M-Audio Delta cards are designed to satisfy different needs, providing higher-quality audio connectivity such as AES/EBU and balanced 1/4" plugs, a greater number of audio I/O channels, higher sampling rates, and hardware synchronization capabilities. They may or may not include hardware MIDI connectivity, and they do not usually include consumer-grade amenities such as an on-board synthesizer or a connector to your CD drive's audio output.
The distinction between the grades is blurred by some more advanced devices intended for the consumer market, and it is certainly possible to achieve high-quality results with some of the more modern soundcards. However, for truly professional requirements you'll need professional-grade audio hardware.
Audio Servers: Tasks And Solutions
Storing and transmitting high-quality audio data files and streams requires non-trivial disk space and bandwidth. Managing even a single CD-quality (16-bit stereo, 44.1 kHz sample rate) audio stream requires considerable computation, and in Ye Olden Times hardware imposed the severest limitations on digital audio I/O. Today huge storage is cheap, bandwidth is broad, and CPUs are increasingly powerful, so the performance burden has shifted to the software and how it coordinates and synchronizes audio data reception and delivery.
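To put "considerable" in perspective, the raw data rate of one CD-quality stream follows directly from the figures in parentheses above. The short Python sketch below just does that arithmetic; the function name is illustrative and not part of any audio library.

```python
def pcm_data_rate(sample_rate_hz, bits_per_sample, channels):
    """Return the raw PCM data rate as (bytes per second, kilobits per second)."""
    bits_per_second = sample_rate_hz * bits_per_sample * channels
    return bits_per_second // 8, bits_per_second / 1000


if __name__ == "__main__":
    # CD quality: 44.1 kHz sample rate, 16 bits per sample, two channels
    bytes_per_s, kbps = pcm_data_rate(44100, 16, 2)
    print(f"{bytes_per_s} bytes/s, {kbps} kbit/s")    # 176400 bytes/s, 1411.2 kbit/s
    # Uncompressed, that is roughly 10.6 MB for every minute of audio
    print(f"{bytes_per_s * 60 / 1e6:.1f} MB per minute")
```

Multiply by the number of simultaneous tracks in a recording session and it is easy to see why coordinating many such streams without dropouts is demanding.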
Common desktop sound services include CD/DVD audio playback, soundfile playback (WAV, MP3, OGG), rendering via a MIDI synthesizer, and so on. Modern desktop sound services also include audio multiplexing, meaning that you can listen to your OGG files and hear your instant messaging chimes without interruption. Without multiplexing only one sound source at a time can be active, hardly a tolerable situation for the modern sonic desktop. Managing these services is the job of your audio server software.
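At its core, multiplexing means mixing: the server adds together the samples of every active stream and keeps the sum inside the range of the sample format. The following Python sketch shows the idea for 16-bit signed samples held in plain lists; it is a simplified illustration, assuming clipping as the overflow strategy, whereas real servers do this work in optimized native code.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_streams(*streams):
    """Mix several 16-bit PCM streams by summing samples and clipping.

    Shorter streams are treated as silence once they run out.
    """
    length = max((len(s) for s in streams), default=0)
    mixed = []
    for i in range(length):
        total = sum(s[i] for s in streams if i < len(s))
        # Clamp to the 16-bit range instead of letting the value wrap around
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


if __name__ == "__main__":
    music = [1000, 20000, -30000, 500]
    chime = [0, 20000, -10000]
    print(mix_streams(music, chime))  # [1000, 32767, -32768, 500]
```

Clipping keeps loud overlaps from wrapping into harsh noise, at the cost of some distortion when the summed streams are too hot.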
For triggering system sounds, listening to CD and DVD audio, and simple recording, the demands on an audio server are relatively light. Managing several audio streams at this level requires no sample-accurate synchronization, nor does it need a highly flexible client routing system. Most users simply want to be able to play recorded audio without blocking other audio streams.
Artsd and esd are sound servers designed to meet at least these requirements for the KDE and GNOME desktops respectively, serving system audio cues and providing transparent audio multiplexing. The Network Audio System (NAS) is a venerable, network-friendly client/server audio delivery system that its Web site describes as the audio equivalent of the X video server. Like artsd and esd, NAS was not designed for professional audio needs, but it remains in development and is still a good choice for serving audio over a LAN. Within their limits all these systems function effectively. However, none of them provides sample-accurate synchronized I/O of multiple audio data streams, nor were they designed for performance within low-latency systems. If your audio needs require those capabilities, you have ventured into the domain of professional audio, and now you need to know JACK.
From the brief descriptions above we see that the existing audio servers for Linux do not meet the demands of low-latency, high-bandwidth audio systems. This gap is crippling for developers of professional sound software, forcing them either to work around an existing server or to create an application-specific audio service layer.
Fortunately, developer Paul Davis has created one of the most remarkable pieces of open-source audio software, the JACK Audio Connection Kit, known simply as JACK to its developers and users. JACK has been designed specifically to work in low-latency environments, meeting the need for reliability and sample-accurate synchronization between clients. Later I'll discuss its possibilities as a general-purpose sound server à la artsd or esd, but my primary focus in this article is on JACK's utility to desktop and professional musicians and sound workers.
JACK targets systems tuned for low latency and high demand. Professional audio recording systems cannot afford audible delays or dropouts (in JACK-speak, the buffer overruns and underruns that cause dropouts are called xruns), and such systems are expected to support the synchronous operation of multiple clients in a low-latency environment.
By the way, lowering the performance latency of a computer dedicated to professional audio is a complex task. Hardware and software capabilities must be carefully considered. Typically the Linux kernel itself must be patched and recompiled to accommodate optimizations that dramatically lower system latency. Your hardware should be selected for high performance under sustained and possibly massive data throughput, so your soundcard, CPU, and disks must all be up to the task at hand. As this article is not intended to explain the details of setting up a low-latency Linux system, interested readers should study the relevant material listed in this article's Resources.
JACK's primary task involves the management of multiple audio data streams, coming from and going into a variety of applications with synchronized I/O, while avoiding data delay and dropout. It needs to do this in an environment that can reasonably expect its applications to perform at professionally acceptable levels.
In addition to its robust performance capabilities, JACK's attractions include the following features for normal users:
Building & Installing JACK
JACK is available as a basic package in the AGNULA/Demudi and Planet CCRMA audio-optimized systems. A source tarball of the latest public release is available from the JACK Web site. The site also provides instructions for building JACK from CVS sources for those of us who want to keep up with the latest development. No special build requirements are needed for JACK itself beyond Erik de Castro Lopo's libsndfile audio file I/O library.
According to the JACK FAQ, you must have a recent Linux kernel (2.4 or higher) with the tmpfs file system turned on. Most modern distributions will have this turned on by default, but you can check for it by running 'cat /proc/filesystems'. The FAQ also states that you must mount a shared memory filesystem on /dev/shm, advising that the following line be added to /etc/fstab:
shmfs /dev/shm shm defaults 0 0
The FAQ further notes that you may have to create the /dev/shm directory yourself. The mkdir command is your friend here.
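The tmpfs and /dev/shm checks described above can also be scripted. The Python sketch below assumes the standard Linux layouts of /proc/filesystems (an optional "nodev" flag followed by the filesystem type) and /proc/mounts (device, mount point, type, options, ...); the helper names are illustrative only.

```python
def has_filesystem(proc_filesystems_text, name="tmpfs"):
    """Return True if `name` appears in /proc/filesystems-style text."""
    for line in proc_filesystems_text.splitlines():
        fields = line.split()
        # Each line is "[nodev] <fstype>"; the type is always the last field
        if fields and fields[-1] == name:
            return True
    return False


def shm_mount(proc_mounts_text, mount_point="/dev/shm"):
    """Return the type of the first filesystem mounted on `mount_point`, or None."""
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # Format: device mount-point fstype options dump pass
        if len(fields) >= 3 and fields[1] == mount_point:
            return fields[2]
    return None


if __name__ == "__main__":
    try:
        with open("/proc/filesystems") as f:
            print("tmpfs available:", has_filesystem(f.read()))
        with open("/proc/mounts") as f:
            print("/dev/shm mounted as:", shm_mount(f.read()))
    except OSError as err:
        print("Could not read /proc (not a Linux system?):", err)
```

If the second check reports None, the fstab entry has not taken effect yet; mounting it (or rebooting) should fix that.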
After unpacking the sources, enter the new JACK directory, read the README for up-to-date instructions, then run './configure --help' to see the available configuration options. JACK is built with the familiar autotools utilities, so for most users the compile process is as easy as running './configure [your options here] && make && make install' (the last step normally as root).
Installing JACK from an RPM or other package also requires no special support. Follow the basic installation procedure for your system, and voila, you have a fresh JACK system ready for use.
The JACK server is launched with either jackd or jackstart. The JACK manual page (man jackd) tells us that jackd invokes the JACK server daemon and that jackstart is used when relying on JACK's built-in support for Linux capabilities, which lets JACK and its clients obtain realtime privileges without the user being root. All options are the same for either invocation. For most users working with a 2.4 kernel patched for capabilities, jackstart will be the preferred method of starting the server. Users working with 2.6 kernels should use jackd.
Here's a relatively simple beginning:
jackd -R -d alsa -d hw:0
In this example JACK has been started with realtime capability, selecting the ALSA back-end and addressing the first hardware device in the audio system. Note that the two -d switches mean different things: the first is a server option that chooses the driver, while everything after the driver name is passed to the ALSA back-end, whose own -d switch names the device. The '-d hw:0' switch is actually unnecessary, since the hardware selection defaults to hw:0 anyway. Obviously you would use a different number for a different card or chipset in a system with multiple sound devices.
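On a jackd command line, options placed before -d alsa belong to the JACK server, and options placed after the driver name belong to the ALSA back-end. The small Python helper below makes that ordering explicit. It is hypothetical (not part of JACK) and only assembles the argument list; it does not start anything.

```python
def jackd_command(server_opts, driver, driver_opts, program="jackd"):
    """Assemble a jackd argument list.

    Server options must come before "-d <driver>"; everything after the
    driver name is handed to that back-end.
    """
    return [program, *server_opts, "-d", driver, *driver_opts]


if __name__ == "__main__":
    import shlex

    # The simple example from the text: realtime server, ALSA back-end, first card
    cmd = jackd_command(["-R"], "alsa", ["-d", "hw:0"])
    print(shlex.join(cmd))  # jackd -R -d alsa -d hw:0
```

Passing the resulting list to subprocess.run would actually launch the server, so try that only on a machine where JACK is installed.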
Here's a slightly more complex example for my SBLive soundcard:
jackstart -R -d alsa -d hw:1 -p 512 -r 48000 -z s
Once again we see the realtime and ALSA options. The device selector is numbered hw:1 because the SBLive is the second card in that particular machine. I've added options for the buffer size (-p), for the JACK sample rate (-r), and for dithering (-z, where 's' selects shaped dither). Note that the -p option sets the software buffer size. As Jack O'Quin points out, this is the buffer size seen by all JACK clients.
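The option summary below quotes three formulas: input latency is --period divided by --rate, output latency is --nperiods times --period divided by --rate, and the JACK buffer size in bytes is period times nperiods times four. Plugging the settings from the command above into them gives a feel for the numbers. The Python sketch simply encodes those formulas; the function name is illustrative, and the default of two periods is assumed because the command does not set -n.

```python
def jack_latency(period, nperiods, rate):
    """Return (input_latency_s, output_latency_s, buffer_bytes).

    input latency  = period / rate
    output latency = nperiods * period / rate
    buffer size    = period * nperiods * 4 bytes
    """
    input_latency = period / rate
    output_latency = nperiods * period / rate
    buffer_bytes = period * nperiods * 4
    return input_latency, output_latency, buffer_bytes


if __name__ == "__main__":
    # -p 512 -r 48000 as in the SBLive example, with the default -n 2
    inp, out, nbytes = jack_latency(512, 2, 48000)
    print(f"input {inp * 1000:.2f} ms, output {out * 1000:.2f} ms, {nbytes} bytes")
    # input 10.67 ms, output 21.33 ms, 4096 bytes
```

Halving -p halves both latencies, which is why the period size is the first knob to turn when chasing low latency.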
When you first meet JACK you may be confused by some of its options, so I've prepared a summary of those options with user-level descriptions. First we'll look at the parameter settings.
-R, --realtime Starts JACK with realtime scheduler priority. Normally you will want this option activated, but be aware that it works only if you have root status or are running a kernel that grants such status to a normal user. Kernels from AGNULA/Demudi and Planet CCRMA are prepared for such status, but any kernel can be patched for low latency and configured to grant the necessary root-level capabilities. Jack O'Quin indicated to me that in fact JACK needs root privileges only for realtime scheduling and memory locking; other root privileges are not required. For information on configuring and building your own low-latency kernel please see this article's Resources.
I asked members of the Linux Audio Users mail list whether there might be good reasons to *not* use the realtime option. I learned that JACK is still useful on systems without realtime capability, hence the option. Additionally, realtime capabilities might be unwanted while testing an application whose realtime performance needs to be debugged.
-m, --no-mlock Signals JACK to keep memory unlocked. Paul Davis explained that it could be useful when running JACK in realtime on a system whose physical RAM is largely consumed by JACK and its clients, since locked memory cannot be swapped out. Generally speaking, most users will not need to activate this option.
-u, --unlock Unlocks memory claimed by graphics toolkits (GTK, QT, FLTK, WINE). Again, this option is useful for machines with low amounts of memory (physical RAM), but it is especially useful for users running VST/VSTi plugins and other WINE-dependent applications. In some cases such applications may not run at all until this option is selected. You might also want to unlock memory held by GTK or QT if you're running many graphics-intensive applications in a memory-hungry environment.
-s, --softmode Ignores xruns reported by an ALSA driver, making JACK less likely to disconnect unresponsive ports when run without realtime status. You might select this option to avoid too-copious error reports. This option might also be valuable for live performance to keep JACK's connection state no matter what happens.
-S, --shorts Forces JACK's I/O to 16 bits. As Lee Revell pointed out, JACK's internal processing is always carried out at 32 bits, and by default it will attempt to set the bit resolution at its input and output stages to 32, 24, and 16, in that order, reporting success or failure with each attempt. Users with cards known to work optimally at 16 bits might want to use this option just to avoid the error reports. Clemens Ladisch also noted that due to audio chipsets with limited bandwidth some devices trade off the sample rate against the number of I/O channels, providing more channels at a lower sample rate.
-H, --hwmon Enables hardware monitoring of ALSA's capture ports, providing zero-latency monitoring of audio input. Requires hardware and device driver support. The jackd man page says this about hardware monitoring:
"When enabled, requests to monitor capture ports will be satisfied by creating a direct signal path between audio interface input and output connectors, with no processing by the host computer at all. This offers the lowest possible latency for the monitored signal."
Note that this option is currently ALSA-only and is available for the RME Hammerfall cards and cards based on the ICE1712 chipset (M-Audio Delta 44/66/1010, Terratec, others). The ALSA soundcard database lists cards that support hardware monitoring; see this article's Resources for a link to the database.
-M, --hwmeter Another ALSA-only option. Enables hardware metering if your soundcard supports it. Paul Davis notes that this option is used only rarely and that it is likely to be removed in future releases.
-z, --dither Dithering is a process that minimizes the unwanted side effects of reducing an audio signal's bit depth. Low-level noise is mixed into a signal to randomize digital audio quantization errors, turning audible and unpleasant digital distortion into something more closely resembling analog noise. According to Paul Davis, dithering is especially helpful when your soundcard's output is less than 24-bit resolution and you run JACK at the hardware's real sample rate. Paul further noted that it's a good idea to choose dithering for almost any consumer-grade hardware, though he added that the sonic difference might not be very noticeable on the speaker systems usually associated with consumer-grade soundcards.
-P, --realtime-priority Sets the realtime scheduler priority. Normally you can leave this setting at its default value of 10. If your kernel includes realtime preemption you might want to set it to at least 70 to keep JACK running ahead of interrupt handlers.
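-p, --port-max Sets the maximum number of ports the JACK server can manage. This option is especially valuable for people using a lot of tracks in Ardour. The default of 128 should be enough for most users. QjackCtl lets you select up to 512 ports, but more are available with sufficient memory. Don't confuse this server option, which goes before -d on the command line, with the ALSA driver's -p (--period) option, which goes after it.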
-d, --driver Selects the hardware driver. In fact, you're selecting the audio system back-end with this option, and every option that follows it on the command line is passed to that back-end rather than to the server. Currently supported systems include ALSA, OSS/Linux, CoreAudio, PortAudio, FreeBob, and a dummy system (useful for testing). Most Linux users will want to choose either ALSA or OSS.
-r, --rate Sets JACK's sample rate. The default is 48000 Hz, but you may need to experiment to determine the best sample rate for your system. Lower-powered systems may find it necessary to bring down the sample rate to improve performance, but generally you want a rate of at least 44100 Hz for high-quality sound. Note too that some soundcards (e.g., the SBLive) work well only at a single sample rate.
-p, --period Specifies the number of frames between JACK's process() function calls. The default value is 1024, but for low latency you should set -p as low as possible without producing xruns. Larger periods yield higher latency, but also make xruns less likely, so you may have to experiment to find the optimal setting for your hardware. Incidentally, 'man jackd' tells us that JACK's input latency (measured in seconds) is --period divided by --rate.
The input and output channel settings determine the number of audio I/O channels. The default is the maximum number supported by your hardware, so for most purposes you can just use the default values.
-n, --nperiods Specifies the number of periods in the hardware buffer. The default value is 2. The period size (-p) times --nperiods times four (the number of bytes in a 32-bit sample) equals the JACK buffer size in bytes. By the way, JACK's output latency (again in seconds) is the number of periods (-n) times the period size (-p) divided by the sample rate (-r).
The duplex settings set JACK to record-only, playback-only, or full-duplex operation (simultaneous playback and recording). This setting can be very important: some cards simply do not perform well in duplex mode but work quite well in the simplex modes. For example, on my laptop JACK reports a steady stream of xruns if I run the CS4232 chipset in duplex mode. The xruns disappear if JACK is set for either record-only or playback-only mode.
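Dithering, the -z option described above, is easier to grasp in code. The following Python sketch adds triangular (TPDF) dither while converting floating-point samples in the range -1.0 to 1.0 to 16-bit integers, similar in spirit to JACK's triangular mode. It illustrates the principle only and is not JACK's implementation; JACK's shaped mode (-z s) additionally shapes the spectrum of the added noise.

```python
import random


def float_to_int16(samples, dither=True, rng=None):
    """Convert floats in [-1.0, 1.0] to 16-bit integers.

    With dither=True, triangular (TPDF) noise of up to +/-1 LSB is added
    before rounding, so the quantization error becomes noise-like rather
    than correlated with the signal.
    """
    rng = rng or random.Random()
    out = []
    for x in samples:
        scaled = x * 32768.0
        if dither:
            # The sum of two uniform values in [-0.5, 0.5] has a triangular PDF
            scaled += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        # Round to the nearest step and clamp to the 16-bit range
        out.append(max(-32768, min(32767, round(scaled))))
    return out


if __name__ == "__main__":
    import math

    # A sine wave whose peak is only 0.4 of one 16-bit step
    quiet = [0.4 / 32768 * math.sin(2 * math.pi * n / 100) for n in range(1000)]
    plain = float_to_int16(quiet, dither=False)
    dithered = float_to_int16(quiet, rng=random.Random(1))
    print("undithered nonzero samples:", sum(1 for s in plain if s))  # 0: the tone vanishes
    print("dithered nonzero samples:", sum(1 for s in dithered if s))
```

Without dither the very quiet tone rounds away to digital silence; with dither it survives as a noisy but still present signal, which is the trade described in the option summary.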
When trying to find the best settings for lowest latency and fewest xruns you'll want to focus your adjustments on the period size, the sample rate, and the number of periods. You may need to experiment to find the best overall settings.
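One way to reason about this trade-off is as a deadline: each period gives the JACK graph period/rate seconds to finish its process() work, and any cycle that overruns it is an xrun. The Python simulation below is a deliberately simplified model, assuming processing cost proportional to the number of frames plus a hypothetical list of occasional system stalls, one per cycle. It is not a description of JACK's actual scheduler.

```python
def count_xruns(period_frames, rate_hz, us_per_frame, stalls_ms):
    """Count cycles whose processing misses the period deadline.

    us_per_frame is the (assumed constant) client processing cost per frame;
    stalls_ms lists extra delays in milliseconds, one per cycle, e.g. from
    other system activity. A cycle whose total time exceeds the period is an xrun.
    """
    deadline_ms = 1000.0 * period_frames / rate_hz
    work_ms = period_frames * us_per_frame / 1000.0
    return sum(1 for stall in stalls_ms if work_ms + stall > deadline_ms)


if __name__ == "__main__":
    stalls = [0, 1, 0, 4, 0, 8, 0, 2]  # hypothetical per-cycle delays in ms
    for period in (64, 256, 1024):
        latency_ms = 1000.0 * period / 48000
        xruns = count_xruns(period, 48000, 5.0, stalls)
        print(f"-p {period}: {latency_ms:.2f} ms per period, {xruns} xruns")
    # -p 64: 1.33 ms, 3 xruns; -p 256: 5.33 ms, 1 xrun; -p 1024: 21.33 ms, 0 xruns
```

Because the processing cost grows with the period but a stall does not, larger periods leave more slack to absorb stalls, at the price of higher latency.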
GUIs For JACK
We have already seen JACK in action at the command prompt. However, when working in an X environment it's nicer to have a GUI for JACK's setup and configuration, and thanks to developer Rui Nuno Capela we have the wonderful QJackCtl [Figure 1]. This most helpful utility provides an all-in-one graphic interface for configuring and controlling all of JACK's operations. In addition to the convenient Setup dialog [Figure 2] QJackCtl supplies an audio connections panel for JACK clients and a set of basic JACK transport controls (if you want to use QJackCtl as the JACK transport master). QJackCtl further supplies messaging and status display panels, controls to start and stop the server, and play/pause controls for JACK's transport control system.
Figure 1: QJackCtl [01-qjackctl.png]
Figure 2: QJackCtl Setup Dialog [02-qjc-setup.png]
QJackCtl also includes a MIDI connections panel for ALSA sequencer clients, letting users manage audio and MIDI connectivity from a single control interface. You can save and load your total connections graph as a Profile in QJackCtl's Patchbay [Figure 3]. The Patchbay's operation isn't quite automatic, but it is a real time-saver if your connections are many and complex.
Figure 3: QJackCtl Patchbay [03-qjc-patchbay.png]
QJackCtl is my favorite standard tool for controlling JACK, but there are at least two other GUIs for managing JACK connectivity. Dave Robillard's Patchage is a patchbay for both JACK audio and ALSA MIDI connectivity via its unique visual interface [Figure 4]. Matthias Nagorni's QJackConnect [Figure 5] is a nice JACK-only QT-based patchbay, but it appears that project development is on hold.
Figure 4: Patchage [04-patchage.png]
Figure 5: QJackConnect [05-qjackconnect.png]
Applications Using JACK
JACK support has become an expected feature in new Linux audio software. As a result, the list of implementations has become too lengthy to print here, but its domains of implementation include hard-disk recording systems (Ardour, ecasound, Wired), drum machines/rhythm programmers (Hydrogen), software sound synthesis environments (Csound5, SuperCollider3), audio/MIDI sequencers (Rosegarden, MusE, seq24), soundfile editors (Snd, Audacity, mhWaveEdit, ReZound), and standalone softsynths (AMS, Om, ZynAddSubFX). Other significant JACK-savvy projects include the LinuxSampler and Specimen sampler projects and the various schemes for supporting VST/VSTi audio plugins under Linux (these schemes also require the WINE software). Linux media playback systems such as MPlayer, XMMS, and AlsaPlayer also provide JACK support.
Readers should note that these applications vary in the scope of their JACK support. Some use only its audio connectivity, some use only partial implementations of its transport control, and a few already take more complete advantage of JACK's features. Please consult the documentation for any JACK-aware application to determine the extent of its support.
The basic JACK package includes a number of useful command-line tools such as jack_connect/jack_disconnect (manage client connections), jack_metro (a configurable metronome), jack_lsp (lists JACK ports, their connections and properties), and jack_transport (manages JACK transport control status). Thanks to its appeal to applications programmers, JACK has also inspired a wave of cool tools and utilities. Bob Ham's JACK-Rack [Figure 6] is a very useful container for LADSPA plugins that lets you build a virtual rack of audio processing modules with MIDI control of plugin parameters. Steve Harris's JAMin [Figure 7] is the result of a collective effort by Linux audio professionals to create a pro-quality stereo mastering interface based on LADSPA audio signal processing plugins. Timemachine [Figure 8] is another treat from Steve Harris. It's essentially a recorder that always maintains a buffer of the last ten seconds of incoming audio. When triggered, Timemachine writes that buffer to disk and continues recording in realtime. Fons Adriaensen's JAAA (JACK and ALSA Audio Analyser, Figure 9) is a professional-grade signal generator and spectrum analyser designed for accurate audio measurement. And just to show that there's no absolute need for a fancy GUI, Florian Schmidt's jack_convolve is a JACK-based command-line convolution engine, very handy for creating high-quality reverb effects and other interesting sounds.
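Timemachine's trick of always holding the most recent few seconds can be modeled with a classic ring buffer. The Python sketch below uses collections.deque with a maxlen, assuming audio arrives in blocks once per JACK period and that the "disk" is just a list. The class and method names are hypothetical; this is not Timemachine's code.

```python
import math
from collections import deque


class PreRollRecorder:
    """Keep the last `seconds` of audio; once triggered, keep everything."""

    def __init__(self, seconds, rate, period):
        # Number of period-sized blocks needed to cover the requested history
        max_blocks = math.ceil(seconds * rate / period)
        self.history = deque(maxlen=max_blocks)
        self.recording = []
        self.triggered = False

    def process(self, block):
        """Called once per period with that period's samples."""
        if self.triggered:
            self.recording.append(block)
        else:
            # When the deque is full, the oldest block is dropped automatically
            self.history.append(block)

    def trigger(self):
        """Flush the buffered history into the recording and keep going."""
        self.recording.extend(self.history)
        self.history.clear()
        self.triggered = True


if __name__ == "__main__":
    rec = PreRollRecorder(seconds=10, rate=48000, period=1024)
    for n in range(1000):  # about 21 seconds of blocks before the trigger
        rec.process([n] * 1024)
    rec.trigger()
    print(len(rec.recording), "blocks kept")  # 469 blocks, just over 10 s
```

The fixed-size buffer means memory use stays constant no matter how long the recorder waits before being triggered.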
Figure 6: The JACK-Rack [06-jack-rack.png]
Figure 7: JAMin [07-jamin.png]
Figure 8: The Timemachine [08-timemachine.png]
Figure 9: The JACK and ALSA Audio Analyzer [09-jaaa.png]
URLs for all these and other neat JACK applications are listed on the Linux Sound & MIDI Software site at http://linux-sound.org/jack.html.
JACK In Action
Figures 10 and 11 show off JACK in two typical uses here at Studio Dave. Figure 10 illustrates the simpler use in an audio-plus-MIDI network combining the seq24 MIDI sequencer, the QSynth soundfont-based synthesizer, and the JACK-Rack, all operating on my PII 366 Omnibook and its humble Crystal Sound CS4232 chipset. Figure 11 demonstrates a more ambitious set of routing and connections with JACK managing I/O between my M-Audio Delta 66, Ardour, and Hydrogen.
Figure 10: JACK Simple [10-jack-simple.png]
Figure 11: JACK Complex [11-jack-complex.png]
There's little more to say about using JACK in these scenarios. Once I've configured JACK its performance is completely transparent. All I have to do is make my connections and make my music.
Programming With JACK
Programming with the JACK API is beyond the intended scope of this article. Interested readers can find excellent instructional material in the JACK source code (see simple_client.c in the example_clients directory) and on various Web sites. James Shuttleworth's tutorial is a well-written introduction to adding JACK to a simple audio application, Lewis Berman has contributed a PDF on writing a JACK audio recorder, and of course the JACK API can be read and studied in the well-commented jack.h header file.
If you build JACK yourself and you have the doxygen software installed, you can generate JACK's developer documentation. This documentation is also available on the JACK Web site, but it is out of date as of September 15, 2005.
In 2004 JACK won a well-deserved Bronze award in the Merit Awards granted by the Open Source Initiative. At that point JACK's development was at version 0.9x. JACK is now at version 0.103, moving steadily towards its 1.0 release, and the future is looking good for JACK.
Stephane Letz has successfully ported JACK to OSX. Support for OSX has become a more common feature in new Linux audio software, and programmers using JACK may find it easier to plan for cross-platform audio support. Incidentally, an implementation for Java has already appeared, further enhancing JACK's cross-platform availability.
MIDI musicians are familiar with time code implementations not currently supported by JACK, and a coordination of synchronization capabilities would be most welcome. Some work in that direction has already begun, so a blend of MIDI and JACK is likely to evolve. [NB: JackMIDI has been incorporated since version 0.102]
Development of artsd and esd has slowed recently, and the matter of a standardized audio server for the Linux desktop remains problematic. Some recent Linux distributions have enabled ALSA's dmix plugin by default, resolving at least the audio multiplexing issue. I hope all distributions will enable dmix's software mixing by default and give Linux users the same transparent service enjoyed by Windows users. However, other important issues regarding audio services are not addressed by dmix, and the need remains for a standard audio server for the Linux desktop. The resolution of this need would be a win for all Linux users. JACK's attractions may seem irresistible, but it may not be the best solution for common desktop audio services. Unlike ALSA, JACK is not planned for inclusion with the Linux kernel sources, so its presence in any Linux distribution results from a decision made by the distro's producer. Also, JACK is not as transparent to the user as the artsd and esd servers, and more configuration is required to obtain the best performance. Nevertheless, JACK is a very flexible system and may yet become the de facto audio server for the Linux desktop.
For the more professionally inclined, JACK is a true blessing. Its performance stability has already been tested and verified in high-demand, real-world audio applications, and its applicability can be seen in an expanding suite of increasingly powerful JACK-based programs. JACK's API has paved the way for a new wave of high-quality Linux audio applications and capabilities for the normal Linux desktop and for the recording professional. Whether you need rock-solid audio system performance for Ardour or you just want to have some fun routing the output from XMMS through some LADSPA effects in the JACK-Rack, you need to know JACK.
You knew I'd say it, didn't you?
First and foremost I want to thank Paul Davis not only for his work on JACK but for all his contributions to the Linux audio community. Many users and developers have benefited from Paul's dedication to the cause and from his willingness to help whenever possible.
While writing this article I solicited input from members of the Linux Audio Users mail list. I learned a lot from their replies, and I thank them all for their assistance. In particular I must thank Paul Davis, Jack O'Quin, and Steve Harris for taking the time to proofread this article. Of course, I am responsible for any remaining errors.
The ALSA Soundcard Matrix
Interview with Paul Davis at Builder.com
The Low-latency Mini-HOWTO
Florian Schmidt's notes on building a low-latency 2.6 kernel
The JACK page at linux-sound.org