Libris Magnus- I
For a long time I have been working on a project that I'm going to describe in the next few blog entries.
I began collecting e-books some time ago. At first it was just a few, but as I found more and more sources for them, my e-book collection became like a snowball rolling downhill, getting bigger and gathering momentum.
It wasn't too long before the management of this collection became a real chore. This library is now several hundred gigs in size and contains over 100,000 distinct volumes.
There are a great many headaches involved. First, formats: there is pdf, djv, txt, html, compiled html and more. Second, some volumes are teasers: they are incomplete and so ineligible for inclusion. Third, some books don't fit neatly into one subject area. For example, does An Atlas of Gravity belong in the reference section, the physics section or the astrophysics section? Fourth, duplicates are a big problem and are not easily detected.
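To give a feel for the duplicates headache, here is a minimal sketch of the easy half of it: finding files that are byte-for-byte identical by hashing their contents. The function names and the idea of walking a single root folder are assumptions made for illustration, not a description of my actual tooling. It will not notice that a pdf and a djv hold the same book, which is exactly why duplicates are not easily detected.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to spare memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Group files under `root` (a hypothetical library folder) by content hash.

    Returns a list of groups; each group is a list of paths whose bytes are
    identical. Files with unique content are left out.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_exact_duplicates("/path/to/library")` (the path is a placeholder) returns the groups of identical files. Anything subtler, such as two scans of the same title or the same text in two formats, needs metadata or text comparison on top of this.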
As a person who has worked in "Information Systems" for 20+ years, I have been working on the challenge of managing unstructured data for a very long time. While many applications are neatly and cleanly dealt with in the tabular format of spreadsheets and relational databases, knowledge is not so neat or clean. It comes in blobs and often overlaps onto completely different genres. The breakthrough that I have accomplished is not making knowledge fit into a nice neat package or application. It is building the application to accommodate the knowledge.
Various schemes have been devised to accomplish this. The oldest and best known is the Dewey Decimal System, which is a reasonable and efficient way to organize paper libraries. When IT is applied to the problem, however, Dewey's decimals don't add up. The DDS depends on numbers like 431.11c, 536.32 and 702.122. The problem is that computers see those numbers as REAL (floating-point) numbers, stored as a binary fraction and an exponent. Most decimal fractions have no exact binary representation, and a value like 431.11c is not a number at all. Because of rounding, the value stored for 702.122 may differ slightly from 702.122, so tests for equality can fail. This dawg just won't hunt.
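The rounding problem is easy to see for yourself. The sketch below uses only Python's standard library. It shows that the float stored for 702.122 is not exactly 702.122, that 431.11c cannot be read as a number at all, and that one workaround is to keep call numbers as text and sort them by their parts. The `parse_call_number` helper and its pattern are my own assumptions for illustration. Real Dewey call numbers have more variations than this handles.

```python
import re
import struct
from decimal import Decimal

# The binary double nearest to 702.122 is not exactly 702.122.
stored_exactly = Decimal(702.122) == Decimal("702.122")  # False

# Squeezing the value through a 32-bit REAL loses even more precision.
as_float32 = struct.unpack("f", struct.pack("f", 702.122))[0]
survives_float32 = as_float32 == 702.122  # False

# A call number with a letter suffix is not a number at all.
try:
    float("431.11c")
    parses_as_number = True
except ValueError:
    parses_as_number = False

# Hypothetical pattern: three-digit class, optional decimal digits, optional letters.
CALL_NUMBER = re.compile(r"^(\d{3})(?:\.(\d+))?([a-z]*)$")


def parse_call_number(text):
    """Split a call number into (class, decimal digits, suffix), all kept as text."""
    match = CALL_NUMBER.match(text)
    if match is None:
        raise ValueError(f"unrecognized call number: {text!r}")
    main, digits, suffix = match.groups()
    return (main, digits or "", suffix)


shelf = ["702.122", "536.4", "536.32", "431.11c"]
in_order = sorted(shelf, key=parse_call_number)
```

Because the parts are compared as strings, digit by digit, 536.32 files before 536.4 just as it would on a paper shelf, and no value is ever rounded.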
Another scheme, developed in the 1960s for organizing library data with info technology in mind, is called MARC (MAchine-Readable Cataloging). MARC kicks butt for cataloging libraries. The problem is that it was expanded to encompass ALL of the different things that modern libraries are keeping: records, CDs, software, etc. MARC has become so complex that it is unwieldy and difficult to work with.
Other methods, like XML and XML-enabled databases, are being tried, but the central problem remains: how best to organize and structure an electronic library?