Posts Tagged ‘Web’

Library Classification Systems

Tuesday, November 14th, 2006

One of the topics of my Computational Linguistics class is the Semantic Web project. From the very beginning, it is apparent that such processing would yield better results if it is specific to one domain: exempli gratia, a separate Semantic Web for Chemistry and a separate Semantic Web for Computer Science. I was hoping that a library classification system would be applicable to the more general case of “knowledge” classification. For example, this particular posting includes information about (using Library of Congress Subject Headings) “Semantic Web”, “Computational Linguistics”, “Cataloguing”, and “Classification — Books”. This could be encoded in some “classification identifier” stating what kind of information this post conveys.

Currently, there are two mainstream cataloguing and classification models in use: the Library of Congress Classification (LC) system, administered by the Library of Congress of the United States, which is a work of the federal government (thus public domain), and the Dewey Decimal Classification (DDC) system, administered by the OCLC, which is copyrighted. Note that access to the full on-line database for either system requires a costly subscription.

Initially, I was inclined towards Dewey, since I’m familiar with it and it is the preponderant system in Greece. However, one of the College’s librarians told me that one of the main shortcomings of Dewey is that numerous books may be listed under the same call number, while LC avoids that by adding all kinds of specifiers to a call number. After some research, it turns out that one of the specifiers LC adds is the author’s name, encoded as a three-character alphanumeric identifier (a Cutter number). Exempli gratia, the book “Semantic Web Primer” written by G. Antoniou and F. van Harmelen, a book about the Semantic Web project, has been described with:

  • an LC class descriptor of “TK5105.88815”. “T” stands for Technology, “TK” for Electrical engineering, and “TK5105” for Telecommunications. Note that its call number would be “TK5105.88815 .A58 2004”: “.A58” is the Cutter-encoded version of “Antoniou” and “2004” is the year the book was published.
  • a Dewey class descriptor of “025.04”. “000” stands for Generalities, “025” for Library operations, and “025.04” for “Information storage and retrieval systems”. Its call number would be “025.04 .A58”.

“Manheimer’s cataloging and classification” by J. D. Saye and A. Bohannan is classified under “Z693” (Cataloging) in LC, and under “025.3/076” (Generalities, Library operations) in Dewey. The LC call number would be “Z693 .S28 1999 Alc”, and its Dewey call number “025.3/076 .S28”.
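To make the structure of these call numbers concrete, here is a minimal Python sketch that splits one into its class descriptor, Cutter number, and optional year. The regular expression and the names `CallNumber` and `parse_call_number` are my own assumptions, fitted only to the examples above; real call numbers have more variants than this handles.

```python
import re
from typing import NamedTuple, Optional


class CallNumber(NamedTuple):
    class_mark: str       # e.g. "TK5105.88815" (LC) or "025.04" (Dewey)
    cutter: str           # e.g. "A58", the encoded author name
    year: Optional[int]   # publication year, if the call number carries one


# Assumed layout: class mark, whitespace, ".X99"-style Cutter, optional 4-digit year.
# Anything after the year (such as the "Alc" suffix) is ignored by this sketch.
CALL_NUMBER = re.compile(
    r"^(?P<class_mark>\S+)\s+\.(?P<cutter>[A-Z]\d+)(?:\s+(?P<year>\d{4}))?"
)


def parse_call_number(text: str) -> CallNumber:
    """Split a call number into class mark, Cutter number, and year."""
    match = CALL_NUMBER.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised call number: {text!r}")
    year = match.group("year")
    return CallNumber(
        match.group("class_mark"),
        match.group("cutter"),
        int(year) if year else None,
    )


if __name__ == "__main__":
    for example in ("TK5105.88815 .A58 2004", "025.04 .A58", "Z693 .S28 1999 Alc"):
        print(parse_call_number(example))
```

Both books written by an author whose name Cutter-encodes to “A58” on the same Dewey subject would collide here, which is exactly the ambiguity the librarian pointed out; the LC year narrows it further.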

Personally, I think it can be argued that the “Semantic Web Primer” might be misclassified, but this is beside the point and I am no expert.

Dewey requires access to proprietary, copyrighted information that can be rather costly. Strangely enough for a public-domain work, LC also requires a subscription in order to access its on-line database, and the books are costly as well.

LC seems to be more interested in assigning a unique identifier to each book, id est identifying the books instead of classifying them into distinct categories.

DDC, as it claims, seems to be more interested in classifying all knowledge. For example, within 700 (“Arts”) we have 790 (“Recreational and Performing Arts”), and within that, 795 for “Games of chance”. After the three leading digits, decimals can be used for as much further subdivision as needed, so 795.4 is “Card games”, 795.41 is “Card games based chiefly on skill”, 795.415 is “Contract bridge”, and 795.4152 is the “Bidding process (auction) in contract bridge”. Of course, there’s a corresponding LC identifier, “GV1282.4”, but to me, the DDC process seems more intuitive.
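The nesting described above can be read straight off the digits. As a rough Python sketch (the function name `ddc_ancestors` is made up for illustration, and it assumes a plain notation of at least three digits with an optional decimal part), every broader class is just a shorter prefix of the same number:

```python
from typing import List


def ddc_ancestors(notation: str) -> List[str]:
    """Return the chain of DDC classes containing `notation`, broadest first.

    Assumes a plain notation such as "795.4152": three leading digits,
    optionally followed by a decimal point and more digits.
    """
    digits = notation.replace(".", "")
    if len(digits) < 3 or not digits.isdigit():
        raise ValueError(f"not a plain DDC notation: {notation!r}")

    # Main class (700), division (790), section (795).
    chain = [digits[0] + "00", digits[:2] + "0", digits[:3]]

    # Each extra decimal digit is one more level of subdivision.
    for end in range(4, len(digits) + 1):
        chain.append(digits[:3] + "." + digits[3:end])

    # Drop repeats (e.g. "700" is its own main class and division), keep order.
    return list(dict.fromkeys(chain))


if __name__ == "__main__":
    print(" > ".join(ddc_ancestors("795.4152")))
```

For “795.4152” this walks from 700 through 790, 795, 795.4, 795.41 and 795.415 down to the full number, matching the chain of headings listed above.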

Also, the LC subcategories are developed by different groups of experts and thus lack consistency. DDC, on the other hand, is maintained and updated biweekly by a small group of people, which keeps it consistent. Furthermore, the ten main DDC classes each have an equal range of numbers, as opposed to LC’s tendency to span related categories over different letters (for example, History can be D, E, or F). The fact that DDC is decimal also greatly helps with search queries: every subdivision of a class shares that class’s number as a prefix. With LC, for one to find all the Diplomatics-related books, one would need to look into CD1, CD2, CD3, …, CD511, a rather inefficient practice when it comes to on-line library systems that do not support ranges.
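The difference in query effort can be sketched in Python. This is only an illustration under my own assumptions (the helper names are hypothetical, and real LC class marks have more forms than “letters followed by a number”): a DDC subtree is a simple prefix match, while the CD1 to CD511 span mentioned above needs the class mark parsed and compared numerically.

```python
import re

# Assumed LC shape for this sketch: capital letters, then a (possibly decimal) number.
LC_CLASS = re.compile(r"^([A-Z]+)(\d+(?:\.\d+)?)")


def in_ddc_class(notation: str, prefix: str) -> bool:
    """True if a DDC notation falls under the class given by `prefix`."""
    return notation.startswith(prefix)


def in_lc_range(class_mark: str, letters: str, low: float, high: float) -> bool:
    """True if an LC class mark has the given letters and a number in [low, high]."""
    match = LC_CLASS.match(class_mark)
    if match is None:
        return False
    return match.group(1) == letters and low <= float(match.group(2)) <= high


if __name__ == "__main__":
    # Everything under "Card games based chiefly on skill" is one prefix test.
    print(in_ddc_class("795.4152", "795.41"))
    # The LC equivalent needs a parsed, numeric range comparison.
    print(in_lc_range("CD511", "CD", 1, 511))
```

A catalogue that only offers exact or prefix matching can answer the first kind of query directly; the second kind needs range support or one lookup per number.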

A nifty variation of the DDC is the Universal Decimal Classification system, but it seems too complicated for my purposes.

Now, if only there were a free (free as in free speech) DDC alternative that I could get my hands on without paying money…

Related reading:

  1. “As We May Think” by Vannevar Bush, The Atlantic Monthly, July 1945.
  2. “Straight Dope Staff Report: What’s so great about the Dewey Decimal System?” by SDSTAFF Dex.

Force Firefox to Remember Passwords

Friday, August 11th, 2006

In order to make Mozilla Firefox remember a password, even if the website vendor is using autocomplete="off" to disable this feature, drag and drop the following bookmarklet to your Bookmarks Toolbar: Save Passwords.

Every time you click the link, the script will run on the active tab, find any form elements that have their autocomplete attribute set to off, toggle it, and add a smiley emoticon. Of course, Firefox will still ask for confirmation before saving the password.

Upside-Down-Ternet

Monday, July 31st, 2006

Kudos to Mernion for pointing out Upside-Down-Ternet.

Simple and Nice Index File

Tuesday, July 18th, 2006

Simple and Nice Index File

“Server generated directory indices are ugly. OK, they work everywhere, but they’re still ugly. If you’d like your download directory to be maintainable without creating and changing huge HTML files, just put snif as its index file into the directory and here you go!”

Protest Against the IExplorer Box Model

Tuesday, June 27th, 2006

I intend to organize a protest (I now have the relevant vocabulary down too, words like “cop palace”) against the non-standards-compliant box model of Internet Explorer. I don’t care that people say it is not as secure as other browsers, I don’t care that it is Microsoft’s; I care that it is not standards compliant and that I have to put hacks into my otherwise gorgeous CSS2 and XHTML 1.1 code.

And I hope that one day a market will emerge that lets me put messages on clients’ sites that are shown only to Internet Explorer users and tell them something like “download Mozilla Firefox! Web developers around the world will love you a little more.” or “download Mozilla Firefox! Internet Explorer contains radioactive waste.” Anything to stop this curse.

If you are not using Mozilla Firefox, you are no friend of mine.

CSS2: How To Clear Floats Without Structural Markup

Saturday, June 17th, 2006

Clearing a float container by simply using the “:after” CSS pseudo-element… No source markup needed!

