So what is Metadata, anyway?

Writing about metadata is risky business, since every post and every tweet potentially starts the same discussion: what exactly is metadata, anyway? So here's my ambitious attempt to cut to the chase. And then open the can of worms again.

Why would you care, anyway? Isn't this just some highly technical or theoretical debate? Well, to some extent it is, but the fact remains that for any content technology, metadata is essential. In a way, metadata is what allows us to use a system to manage content in the first place. And even if you take the brute force approach of using enterprise search, rather than meticulously organizing all this content with metadata, you'll find results will be disappointing, at best. (In fact, if there's no useful metadata available, search engines will have to create it themselves.) Metadata is so important that we now even get court rulings to define it.

Of course, the essence is easily defined. Metadata is data about data.

The examples are abundant: a document's author, the date content was created or published, the name of a database column, even the filename is metadata. You can see it in any system dealing with content, and often, helpfully, it will actually be marked as "metadata." There are standards for what metadata you could have (like Dublin Core, or EXIF) or how to store it in a document itself (like XMP). If that's all you want to know, now might be a good time to stop reading. Because from there, it starts getting tricky.

Some argue that the concept of metadata is just not very intuitive, because it's artificial, something we're not used to "in real life." I doubt it. (You need to look no further than the cover of a book to understand why.) In fact, we're quite used to those meta-levels of looking at things. We need them to communicate. ("The color of my car is green.") We're so used to them, in fact, that you could argue that any kind of content is metadata, since it always describes something else. (Even a picture of a chair is not really a chair, just a reference to it; and this blog post is not just a text -- it's about...) The problem is that, in the end, you can't really define the distinction between data and metadata.

So in content management, we actually define metadata by its use or purpose, rather than its nature. Something is metadata because we want to use it as metadata.

A developer wants to sort on a date field; your resident taxonomist or knowledge manager wants to classify the content; users need facets to refine the results in their search interface. However, those uses can be quite different, and sometimes at odds with each other.
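To make two of those uses concrete, here's a minimal sketch in Python. The content items and their field names ("published", "topic") are invented for illustration and don't come from any particular system. The same records serve the developer's sort and the search user's facets.

```python
from collections import Counter
from datetime import date

# Hypothetical content items; the field names are illustrative assumptions.
items = [
    {"title": "Annual report", "published": date(2011, 3, 1), "topic": "finance"},
    {"title": "Press release", "published": date(2012, 6, 15), "topic": "news"},
    {"title": "Budget memo", "published": date(2010, 11, 30), "topic": "finance"},
]

# The developer's use: sort on a date field, newest first.
newest_first = sorted(items, key=lambda item: item["published"], reverse=True)

# The search user's use: count items per topic to offer as facets.
topic_facets = Counter(item["topic"] for item in items)
```

Neither use cares whether "published" or "topic" is formally labeled metadata. Each field becomes metadata the moment someone sorts or filters on it.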

Your records manager may want to keep all the metadata together with the data, as one "document." A developer would often prefer a system to treat metadata just as it does any data (because then it's accessible through the APIs in a uniform way, and the developer doesn't need to jump through hoops to get to it). On the other hand, for performance purposes, you might want to keep metadata and data separate (store the "about" stuff in the database, and the huge video itself on the filesystem). But a web editor will often wonder why some important fields (their distinction will often seem entirely arbitrary) are marked "metadata" and hidden two tabs and several clicks away.
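The first and third of those preferences can be sketched side by side. This is only an illustration under assumptions: the table name, column names, and file name are made up, an in-memory SQLite database stands in for "the database," and a kilobyte of zeros stands in for the huge video.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

video_bytes = b"\x00" * 1024  # stand-in for a huge video file

# Records manager's view: metadata and data travel together as one "document".
# (The payload is hex-encoded only to keep it JSON-safe in this sketch.)
document = json.dumps({
    "metadata": {"title": "Launch video", "author": "Kim"},
    "data": video_bytes.hex(),
})

# Performance view: the "about" stuff in the database, the payload on the filesystem.
workdir = Path(tempfile.mkdtemp())
video_path = workdir / "launch.bin"
video_path.write_bytes(video_bytes)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE assets (title TEXT, author TEXT, path TEXT)")
db.execute(
    "INSERT INTO assets VALUES (?, ?, ?)",
    ("Launch video", "Kim", str(video_path)),
)

# Querying the metadata never touches the large file itself.
title, path = db.execute(
    "SELECT title, path FROM assets WHERE author = ?", ("Kim",)
).fetchone()
```

Both arrangements hold exactly the same information. What differs is who finds it convenient: the single document is easy to archive as a unit, while the split version lets you search and sort without ever opening the video.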

You're unlikely to resolve those conflicts by arguing who's right. Some of these particular debates have been raging for thousands of years. Plato would say that you should consider metadata to be external to what it describes. Aristotle would tell you that these are inherent attributes of a file or record. A point excellently illustrated by Raphael's painting in the Vatican, with Plato, at left, pointing to The Cloud, obviously, and Aristotle controlling the files.

You may want to hire several expert philosophers to argue on your behalf, while you get on with the job of actually managing content. Because in the end, everybody is going to disagree on what metadata is, and nobody is going to be right. So it's important to understand this is not a simple given.

But for any content management project, you'll want to be clear on what everybody needs. And that's what should define your metadata.

(And by the way, if you completely disagree with me on this -- have your philosopher contact my philosopher, and they can work out the epistemological and ontological fine print.)
