Library Classification Systems
Tuesday, November 14th, 2006
One of the topics of my Computational Linguistics class is the Semantic Web project. From the very beginning, it is apparent that such processing would yield better results if it is specific to one domain: exempli gratia, a separate Semantic Web for Chemistry and a separate Semantic Web for Computer Science. I was hoping that a library classification system would be applicable to the more general case of “knowledge” classification. For example, this particular posting includes information about (using Library of Congress Subject Headings) “Semantic Web”, “Computational Linguistics”, “Cataloguing”, and “Classification--Books”. This could be encoded in some “classification identifier” stating what kind of information this post conveys.
Currently, there are two mainstream cataloguing & classification models in use: the Library of Congress Classification system, administered by the Library of Congress of the United States of America, which is a work of the federal government of the United States (thus public domain), and the Dewey Decimal System, administered by the OCLC, which is copyrighted. Note that access to the full on-line database for either system requires a costly subscription.
Initially, I was inclined towards Dewey, since I’m familiar with it and it is the preponderant system in Greece. However, in a conversation with one of the College’s librarians, I was told that one of the main deficits of Dewey is that there might be numerous books listed under the same call number, while LC avoids that by adding all kinds of specifiers and whatnot to a call number. After some research, it turns out that one of the specifiers LC adds is the author’s name encoded in a three-character alphanumerical identifier. Exempli gratia, the book “Semantic Web Primer” written by G. Antoniou and F. van Harmelen, a book about the Semantic Web project, has been described with:
- an LC class descriptor of “TK5105.88815”. “T” stands for Technology, “TK” for Electrical engineering, and “TK5105” for Telecommunications. Note that its call number would be “TK5105.88815 .A58 2004”: “.A58” is the Cutter-encoded version of “Antoniou” and “2004” is the year this book was published in.
- a Dewey class descriptor of “025.04”. “000” stands for Generalities, “025” for Library operations, and “025.04” for “Information storage and retrieval systems”. Its call number would be “025.04 .A58”.
“Manheimer’s cataloging and classification” by J. D. Saye and A. Bohannan is classified under “Z693” (Cataloging) in LC, and under “025.3/076” (Generalities, Library operations) in Dewey. The LC call number would be “Z693 .S28 1999 Alc”, and its Dewey call number “025.3/076 .S28”.
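To make the anatomy of the LC call numbers above concrete, here is a minimal Python sketch that splits one into class letters, class number, Cutter, and year. The pattern is my own assumption, derived only from the two examples in this post; real LC call numbers have more variations (double Cutters, volume and copy markers) that it does not handle. The names `LCCallNumber` and `parse_lc_call_number` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class LCCallNumber(NamedTuple):
    letters: str          # class letters, e.g. "TK"
    number: str           # class number, e.g. "5105.88815"
    cutter: str           # Cutter for the author, e.g. "A58"
    year: Optional[int]   # publication year, if present


# Assumed shape: letters, number, ".Cutter", optional 4-digit year.
# Anything after the year (such as "Alc") is ignored by this sketch.
_PATTERN = re.compile(
    r"^(?P<letters>[A-Z]{1,3})"
    r"(?P<number>\d+(?:\.\d+)?)"
    r"\s+\.(?P<cutter>[A-Z]\d+)"
    r"(?:\s+(?P<year>\d{4}))?"
)


def parse_lc_call_number(text: str) -> LCCallNumber:
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised LC call number: {text!r}")
    year = match.group("year")
    return LCCallNumber(
        letters=match.group("letters"),
        number=match.group("number"),
        cutter=match.group("cutter"),
        year=int(year) if year else None,
    )


print(parse_lc_call_number("TK5105.88815 .A58 2004"))
print(parse_lc_call_number("Z693 .S28 1999 Alc"))
```

The Cutter and the year are what make the call number nearly unique per book, while the letters and number alone carry the subject.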
Personally, I think it can be argued that the “Semantic Web Primer” might be misclassified, but this is beside the point and I am no expert.
Dewey requires access to proprietary copyrighted information that can be rather costly. LC, strangely enough for a public-domain work, also requires a subscription in order to access its on-line database, and the books are costly as well.
LC seems to be more interested in assigning a unique identifier to each book, id est identifying the books instead of classifying them into distinct categories.
DDC, as it claims, seems to be more interested in classifying all known knowledge. For example, within 700 (“Arts”) we have 790 (“Recreational and Performing Arts”), and within that, 795 for “Games of chance”. After the three leading digits, decimals can be used for as much further subdivision as needed, so 795.4 is “Card games”, 795.41 is “Card games based chiefly on skill”, 795.415 is “Contract bridge”, and 795.4152 is the “Bidding process (auction) in contract bridge”. Of course, there’s a corresponding LC identifier, “GV1282.4”, but to me, the DDC process seems more intuitive.
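The nesting described above can be read straight off the digits. As a rough sketch (the function name `ddc_ancestors` is hypothetical, and it ignores DDC extras such as the “/” segmentation mark), the following Python lists every broader class that contains a given DDC number:

```python
from typing import List


def ddc_ancestors(notation: str) -> List[str]:
    """Return the chain of classes from the main class down to `notation`."""
    digits = notation.replace(".", "")
    if len(digits) < 3 or not digits.isdigit():
        raise ValueError(f"not a plain DDC number: {notation!r}")

    # The first digit gives the main class, the first two the division;
    # both are padded with zeros to three digits.
    chain = [digits[0] + "00", digits[:2] + "0"]

    # Every further digit narrows the subject by one level.
    for end in range(3, len(digits) + 1):
        head, tail = digits[:3], digits[3:end]
        chain.append(head if not tail else f"{head}.{tail}")

    # Drop repeats such as "700" appearing twice for the number "700".
    unique: List[str] = []
    for item in chain:
        if item not in unique:
            unique.append(item)
    return unique


print(ddc_ancestors("795.4152"))
```

For “795.4152” this walks through 700, 790, 795, 795.4, 795.41, 795.415 and finally 795.4152, which is exactly the path from “Arts” down to bridge bidding.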
Also, the LC subcategories are developed by different groups of experts and thus lack consistency. DDC, on the other hand, is maintained and updated biweekly by a small group of people, which keeps it consistent. Furthermore, the 10 main classes have an equal range of numbers, as opposed to LC’s tendency to spread related categories over different letters (for example, History can be D, E, or F). The fact that DDC is decimal also greatly helps with search queries, since every subdivision of a topic starts with that topic’s number. In LC, by contrast, to find all the Diplomatics-related books, one would need to look into CD1, CD2, CD3, …, CD511, a rather inefficient practice when it comes to on-line library systems that do not support ranges.
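The search argument can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the helper names are hypothetical, and the class numbers in the usage lines are the ones quoted in this post plus the CD1 to CD511 span mentioned above. A DDC subject query is a single string-prefix test, whereas an LC query needs either one exact key per number or a system that can parse the number and compare it against a range.

```python
import re
from typing import List


def ddc_under(notation: str, prefix: str) -> bool:
    """DDC: everything under a topic starts with the topic's number."""
    return notation.startswith(prefix)


def lc_exact_keys(letters: str, low: int, high: int) -> List[str]:
    """LC without range support: one exact lookup key per class number."""
    return [f"{letters}{n}" for n in range(low, high + 1)]


_LC_CLASS = re.compile(r"^([A-Z]{1,3})(\d+)(?:\.\d+)?")


def lc_in_range(lc_class: str, letters: str, low: int, high: int) -> bool:
    """LC with range support: parse the number, then compare."""
    match = _LC_CLASS.match(lc_class)
    if match is None or match.group(1) != letters:
        return False
    return low <= int(match.group(2)) <= high


print(ddc_under("795.4152", "795.4"))       # one prefix covers all card games
print(len(lc_exact_keys("CD", 1, 511)))     # number of exact keys needed
print(lc_in_range("CD215", "CD", 1, 511))   # needs numeric parsing instead
```

The prefix test works with any plain text index, which is why the decimal notation is so convenient for simple on-line catalogues.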
A nifty variation of the DDC is the Universal Decimal Classification system, but it seems too complicated for my purposes.
Now, if only there were a free (free as in free speech) DDC alternative that I could get my hands on without paying money…
Related reading:
- “As We May Think” by Vannevar Bush, The Atlantic Monthly, July 1945.
- “Straight Dope Staff Report: What’s so great about the Dewey Decimal System?” by SDSTAFF Dex.
