We want to automate the classification of parliamentary information. Is that possible? What do we need to succeed? Are there examples of successful automatic classification?
Bert,
Take a look at LOCKSS, which is developed and maintained at Stanford. You may find it useful. The link is below.
In terms of auto-classification, I will tell you from my experience that there simply is no "silver bullet". The software industry (which, by the way, I am a part of) tells us that it can interpret "data" as "information" in "context". That is exactly what is needed to "automatically" classify the bits that make up "data" and yield "information": the information has to be understood not only from its own "context" but within the "context" in which it is presented, and perhaps more, such as information (data) derived from its originating application or collected as a result of its preservation. As much as I wish that capability were there, it’s just not, at least not today. Anything to the contrary would be interesting; I would personally love to see it and show how easily it can be broken.
Now, that said, it’s not without hope. Again, it all depends on the degree of complexity (or, better put, simplicity) and on the ability to keep manual "exceptions" at an acceptable level.
Someone mentioned SPAM filters, and it’s an interesting and appropriate comparison. Note, though, that SPAM filters do not take "context" or "information" into account; they search for "patterns". Newer SPAM technologies that detect anomalies relative to an environment, combined with "pattern" detection, are probably a good first step toward "auto" classification, but the technologies needed sit in the wrong vertical buckets today. So it will be some time before the industry really attacks this requirement.
Again, today’s solution is truly based on human intelligence, not computer intelligence.
Auto-classification can be done; it just has to be based on very simple parameters. Start from the reliable, "well known" system information that is available outside the content of the document itself, for example: time and date stamps, IP octets, binary versions, document size, number of pages, format, and language. Such well-known parameters can begin to form a foundation of reliability for "simple" auto-classification. The key is to stay clear of errors at all times.
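To make that concrete, here is a rough sketch of what I mean by "simple" rules over well-known parameters. Everything in it is an assumption for illustration only: the `DocMeta` fields, the page thresholds, and the labels "report" and "notice" are made up and would have to come from your own collection. Note that it never looks inside the document, and anything it cannot place with confidence is handed back as a manual exception instead of being guessed at.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DocMeta:
    """Hypothetical record of system-level metadata for one document."""
    created: date
    file_format: str   # e.g. "pdf"
    language: str      # e.g. "en"
    pages: int
    size_bytes: int


def classify(meta: DocMeta) -> Optional[str]:
    """Return a class label, or None to send the document to manual review."""
    # Unreliable metadata: do not guess, treat it as an exception.
    if meta.pages <= 0 or meta.size_bytes <= 0:
        return None
    # Illustrative rules only; the thresholds and labels are assumptions.
    if meta.file_format == "pdf" and meta.pages >= 50:
        return "report"
    if meta.file_format == "pdf" and meta.pages <= 2:
        return "notice"
    # No rule matched: leave it for a human.
    return None


long_pdf = DocMeta(date(2009, 1, 15), "pdf", "en", 120, 2_000_000)
short_pdf = DocMeta(date(2009, 1, 15), "pdf", "en", 1, 40_000)
word_doc = DocMeta(date(2009, 1, 15), "doc", "fr", 10, 80_000)

for doc in (long_pdf, short_pdf, word_doc):
    print(doc.file_format, doc.pages, classify(doc) or "manual review")
```

The share of documents that fall through to "manual review" is the number to watch: if it stays at an acceptable level, the rules are simple enough to trust.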
Good Luck,
Peter
January 2009
* http://www.lockss.org/lockss/About_Us
