Sunday, September 30, 2012

Doing it wrong: application preferences

One thing that's been always bothering me is the lack of a unified way to keep persistent settings.

When you're installing a shiny new piece of software, you never know which surprises it brings. The location and syntax of the configuration files varies between applications and environments. Let's take a look at the most commonly used persistent storage formats and analyze their strong and weak points. Note that I only want to discuss the issue only in the aspect of desktop usage, where usability and maintainability should not be sacrificed over performance as we do in the embedded world. Also, note that I'm mainly focusing on the formats used in the desktop *NIX distributions because that's what I'm mostly using.

One key difference between many formats is whether they allow text data with arbitrary formatting and spacing or have a fixed binary layout. The former is obviously human-readable (and sometimes, editable), the latter can in most cases be easier parsed by the machine.

Plain text
Pro:

  • Human-readable

Contra:

  • No unified syntax (everyone invents their own)
  • No way to contol the correctness of the file contents (as opposed to xml schema for which there exists a multitude of tools and libraries)

XML
+

  • Can be edited by a human (to some extent)
  • Strict structure definition is a part of a standard (user can define custom types and limits for variable values)
  • Supported by many tools

-
  • Hard to read by a human
  • Inefficient to parse (though, better than a plain text without schema)

GConf is an XML-based configuration system for the GNOME and GTK applications. Basically, it inherits all the pluses and minuses of the XML.

JSON
+

  • Easy to read and edit by a human
  • Easy API in most libraries (we can think of it either as of a tree of objects or as hashmap of hashmaps of hashmaps...)

-

  • Lack of built-in schema checking (i.e., it has a formal syntax and we can check that a JSON is valid, but we cannot verify that the data structures are initialized with correct values)

Now, let's discuss binary formats.

Custom formats

Windows Registry
+

  • It's fast.

-

  • Most system-critical data is kept inside a single file (well, actually around 6 files in %windir%\System32\config). A single failure means the system is unable to boot.
  • All applications store their data inside a single database. If an application is running with 'Administrator' (something like root, but for retards), it can easily modify or read the data of other applications. This is a huge security issue.
  • Application activity is not logged (at least, by default). It means mischievous software leaves trails even after it is uninstalled which is especially true of shareware and other crapware.
  • Application and system registry are not isolated. You cannot just reinstall the system without keeping all preferences (like, keep the home dir but replace the rootfs in *NIX)
  • There are only several basic data types (DWORD, Text, Binary) and the user cannot introduce their own


DConf is a replacement for GConf from the GNOME project. The only real difference is using a binary format instead of XML. While it is arguably faster, it is completely incomprehensible to a human-being and when things go bad, I have no idea what to do.

BSON is a binary version of JSON.
+

  • It is faster
  • It consumes less space

-

  • It introduces custom types (like UUID) which means that any JSON can be converted to BSON, but not the other way round.


MessagePack is a binary serialization format that is aiming to be compatible with BSON while being optimized for network usage
+

  • It offers high compression ratio reducing network traffic
  • Comes with the IDL/serialization code for most mainstream languages (C++, ruby, python)
  • Offers type checking when used with statically typed languages
  • Comes with an RPC implementation which can be useful for network applications


-
  • Performance can theoretically be a bit lower than BSON due to bit-packing 
There of course exists a multitude of other config formats, but the most used on a linux destkop are plain text and GConf. Therefore, I'm lazy to describe every other thing under the sun.


Now, here is what I think application developers should do

  • Use glib/qt4/java built-in APIs for keeping preferences. Don't reinvent the wheel
  • Don't break the config format with time (ideally, you don't know anything about the config format if you use the existing API)
  • Make it easy for the users to backup the configuration data
Ideally, I would like that all apps (at least, all the free and open-source apps on linux) would use the same API for keeping preferences so that the system administrator can choose various backends, be it xml, json, sql or what not without modifying the applications.