Internet Archive: Modern Day Library of Alexandria, or Unnecessary Expense?

AMY SEGELBAUM

On a recent prospective-student tour of the University of Iowa, our tour guide walked the group past a glass-enclosed conference room and proudly pointed out the electronic tablets and floor-to-ceiling whiteboards. With eighteen “private study rooms,” two “large group rooms,” and six “open group areas” on the library’s main floor, our group wondered where all of the books had been moved to make space for these high-tech areas.[1] We were right to question this: David Streitfeld reports that a librarian he recently interviewed explained that “a lot of libraries are doing pretty drastic weeding.”[2] Are librarians responding to public requests for spaces and slowly losing sight of what libraries were created to do? As we scan books to make room for spaces that use smart boards and video screens, we need to protect printed books and not pack them away or discard them. Scanning projects like the “Million Book Project” and “Google Books” have attempted (and are close to achieving) the historic goal of compiling a comprehensive digital global book collection. However, this goal has been controversial because Google Books tracks readers for marketing purposes, saves error-laden scans, and monetizes the world’s literature. Google publicizes that its main goal is to “improve access to books – not to replace them.”[3] Brewster Kahle does not trust this, and because of his concerns, he created the “Internet Archive” storage facility for books; but his project duplicates existing efforts. Kahle’s costly system of closed containers is not needed because the Library of Congress has already been archiving books for centuries on publicly accessible bookshelves.

Kahle wants to ensure that every printed book in the public domain is saved, and to that end he built a facility where books are scanned and stored in large temperature-controlled metal shipping containers.[4] Kahle considers his facility a fundamental resource, and he compares it to the historic “Great Library of Alexandria” (300 B.C.), which compiled everything ever written at the time.[5] Kahle’s containers of boxed books do create a great amount of storage, but without convenient access his system is not a library but rather an insurance plan in case of a disaster. The Library of Congress, in contrast, provides public access to its entire collection on site (although checking out books is possible only for elected officials), and patrons can request book transfers to their local libraries for branch-only viewing.

Brewster Kahle has been criticized in the media, and Streitfeld takes a less than admiring tone toward his project. He mockingly refers to Kahle as a “latter-day Noah” and cynically points out that he is funding the project with money made from selling a data-mining company to Amazon.com.[6] Notably, Kahle does not specifically address a plan for future funding, which could be an issue when he is no longer at the helm of the project. The financial stability of the donation-dependent Archive could affect the future of the stored books.

The Internet Archive facility (built for $3 million) contains approximately six million scanned titles. Google Books, by contrast, has a library of over 150 million electronic books and works in partnership with over forty libraries worldwide.[7] Google has been a large part of the earliest digital archive projects, and although some of its practices are questionable, its system does provide access to a vast library for the masses. Kahle’s lackluster online platform is not user-friendly, and his system of packing up public domain materials and removing them from circulation seems to banish them to darkness, where they will never be seen again.

Although Google Books has succeeded in building the world’s largest online library, we should use caution and not depend on Google to hold the exclusive copy of a book. Skepticism toward Google’s practices should motivate our librarians to keep hard copies of books on the shelves. The Google Books project continues to use controversial methods of targeted advertising and electronic tracking, through a service called “Google Analytics,” to generate profit from both public domain material and copyright-protected material.[8] A reader simply accessing literature online is tracked and observed, and the information can be used for marketing and advertising purposes. Google uses “remarketing,” which helps “find customers who have shown an interest in…products and services, then show them relevant ads.”[9] Reading literature should be a personal experience and not an opportunity for being electronically “followed.” Kahle feels his archive is the better solution because in reality, “Google’s search tool has become a digital bookstore.”[10] While readers are accessing literature, Google provides “clickable” links on each page for the purchase of books. Rory Litwin, a frequent critic of book digitization, in his book Library Juice Concentrate (2006) quotes Google co-founder Larry Page as saying he was a “firm believer in academic libraries being able to ‘monetise’ the information they hold.”[11] This is a questionable opinion for someone in control of the world’s largest online book source. Libraries should not be connected to the business of bookselling, but should continue to be a free resource for all citizens.

With each generation of a scan, errors are occasionally made and permanently archived. Illustrations are a vital element of printed literature, and their details are sometimes muddled, out of focus, or pixelated. In addition, pages are sometimes scanned while crumpled or folded, resulting in unreadable type or images of gloved fingers. Many of these images are collected and displayed on blogs by people who document errors made through scans. One such collection is located on the micro-blogging site “Tumblr” (http://theartofgooglebooks.tumblr.com/). Krissy Wilson, creator of “The Art of Google Books,” sifts through scanned pages in Google Books searching for errors made during the digitizing process. She collects and chronicles these images and sees “book digitization as re-photography and…worthy of documentation and study.”[12] I see these “Google hands” as an important reminder that the Google Books project is executed by human beings who make errors. This is another important reason that our printed versions should not be discarded or put aside. They will continue to be needed for reference and viewing, and her blog highlights the things that commonly go wrong.

Our largest book archive, the Library of Congress, is preserving the printed book on 838 miles of bookshelves, and adds 12,000 items to its collection daily.[13] Although every book that receives a U.S. copyright is required to be provided to the Library, books that are not chosen for the collection are shared with various libraries around the United States. The Library of Congress keeps 158 million items in three buildings, and it is also using scanning technology to protect its collection and provide more public access. The Library scans books to create digital copies for safekeeping, and “uses the full range of traditional methods of conservation and binding as well as newer technologies such as the deacidification of paper and the digitization of original materials to preserve its collections.”[14] According to its website, 234,333 books are available online through its system, and the historic and rare books continue to be protected on the shelves of this collection.

Our libraries and the printed books on their shelves are a precious resource.  Although digitizing books is an important part of creating access for the public, how we handle the printed editions after the scans is just as important. Google has amassed an extensive digital library, and the curated system of the Library of Congress (which has existed since the year 1800) is the best system we have so far for archiving our print literature. Vital information will be lost without access to printed books, and Kahle’s solution seems to be a redundant and unsustainable “band-aid” approach to the preservation of our collections.

NOTES

[1] University of Iowa Libraries; Learning Commons; “Spaces/Hardware.”

[2] Streitfeld, “In a Flood Tide of Digital Data.”

[3] Google; About Google Books; “Perspectives.”

[4] Kahle, “Why Preserve Books?” (blog).

[5] Internet Archive; About Us; “The Bibliotheca Alexandrina.”

[6] Streitfeld, “In a Flood Tide of Digital Data.”

[7] Google; About Google Books; “Library Partners.”

[8] Google; Google Analytics; “Features.”

[9] Ibid.

[10] Kahle, “A Book Grab by Google.” Google Books provides a “button” for accessing pricing for most books that have been scanned into its system. Price comparisons are provided between several online booksellers, and with the click of a mouse, a book can be ordered and shipped directly to the reader.

[11] Litwin, Library Juice Concentrate, 61.

[12] Wilson; Tumblr “The Art of Google Books.”

[13] Library of Congress; About the Library; “Fascinating Facts about the Library of Congress.”

[14] Library of Congress; About the Library; “Frequently Asked Questions.”

Bibliography

Google; About Google Books; “Library Partners,” accessed April 8, 2015, https://www.google.com/googlebooks/library/partners.html.

———; About Google Books; “Perspectives,” accessed April 5, 2015, https://www.google.com/googlebooks/perspectives.

———; Google Analytics; “Features,” accessed April 1, 2015, http://www.google.com/analytics/features/.

Internet Archive; “The Bibliotheca Alexandrina,” accessed March 28, 2015, http://archive.org/about/bibalex_p_r.php.

Kahle, Brewster, “A Book Grab by Google,” The Washington Post, May 19, 2009.

———; “Why Preserve Books? The New Physical Archive of the Internet Archive,” Internet Archive Blogs, June 6, 2011. http://blog.archive.org/2011/06/06/why-preserve-books-the-new-physical-archive-of-the-internet-archive/.

Library of Congress; About the Library; “Fascinating Facts about the Library of Congress,” accessed April 3, 2015, http://www.loc.gov/about/fascinating-facts/.

Library of Congress; About the Library; “Frequently Asked Questions,” accessed March 31, 2015, http://www.loc.gov/about/frequently-asked-questions/.

Litwin, Rory. Library Juice Concentrate. Duluth, Minnesota: Library Juice Press, 2006.

Streitfeld, David, “In a Flood Tide of Digital Data, an Ark Full of Books and Film,” New York Times, New York, NY, March 4, 2012.

University of Iowa Libraries; Learning Commons; “Spaces/Hardware,” accessed April 11, 2015, http://www.lib.uiowa.edu/commons/spaces-hardware/.

Wilson, Krissy; Tumblr; “The Art of Google Books” (blog), accessed March 24, 2015, http://theartofgooglebooks.tumblr.com/.

7 Comments

  1. Amy, I really love your posting about the issue of preserving printed books, whether by digital archive or by storage. I think you make a great point about the objective of Kahle’s project: it really does seem like he’s duplicating the efforts of the Library of Congress and at the same time not making the printed versions of the books he’s collecting available to the public, so Kahle’s point about his facility resembling a modern-day Library of Alexandria is not valid. In reality, Kahle’s Internet Archive does satisfy two quite different purposes. For one, it is, as Streitfeld points out, a sort of literary Noah’s Ark with Kahle shepherding the books into the Archive two by two (as the metaphor goes). The Archive is a good backup for now, a safe house that would at least have some printed books saved in case of an extreme emergency, or any possible devastation of the Library of Congress. For another, the Archive’s digital preservation is a way for readers to gain access to these volumes (presumably) without Google looking over their shoulders and categorizing reading habits in order to capitalize financially. Kahle’s project comes with a lot of issues, including who will be the guardian of the Archive when Kahle is gone and where future funding will come from to continue the project. But the biggest issue I think Kahle has is that he has mislabeled his project. While the Archive is no “Great Library of Alexandria,” it is an intriguing way to preserve printed texts for future generations in case of emergency, and it is a way of digitally accessing texts in a more private manner than Google allows.

  2. Amy,
    Your comparison between Kahle’s project and the Library of Congress addresses such an important point: books are meant to be read. While Kahle’s project may be motivated by his desire to preserve the written word, his storage plans, like you note, are nothing more than “an insurance plan in case of a disaster.” The Library of Congress provides some type of public access to their collection, which allows these books to be read rather than simply stored.

    You also note how literature online is “tracked and observed” for monetization purposes (like Google Books), which caused me to question the price that we, as readers, are willing to pay to make sure that these books are read rather than stored away. Libraries seem to be moving toward an online-only system. For example, the Digital Public Library of America (DPLA) opened in 2013 as the United States’s first public, online-only library, which forces us to rethink what “reading” means and whether “private reading” can be accomplished in the digital age. The DPLA, which seems like a great online resource, also tracks a user’s information like Google Books. They state on their Privacy Policy webpage, “DPLA may retain all data and content collected through the Services for restorative, archival, or research purposes. Editing or deleting your User Content will alter the public availability of the User Content, but may not permanently delete the content from the Services.” They appear to be collecting data from their readers over which the reader has limited control. I think that this movement towards online libraries and book digitization forces us to evaluate this relationship between reading and privacy that we perhaps took for granted.

    If you want to explore the DPLA website (it’s pretty great), here’s the link to their homepage and to their Privacy Policies:
    http://dp.la/
    http://dp.la/info/terms/privacy/

  3. Chelcy Walker May 4, 2015 — 2:42 am

    Amy, your article helped me understand what kind of efforts are currently being made to safeguard information. It seems that Brewster Kahle’s efforts come from a deep value for the written word, history, and knowledge. He seems to fear what might be left behind or lost in the process of modernization. I think this is a perfect example of the clash between technology and tradition. Technology encourages us to discard the book after we scan it; tradition asks us to keep it. While I agree with your conclusion that his efforts are unnecessary, I would say that there need to be individuals like Kahle who consider what we might leave behind in the digital age. In principle, I would agree with him. I do worry what rapidly changing formats, error, and data corruption could mean for the future of knowledge. I also worry what could happen when that knowledge is not accessible to the masses, and so in that respect, I like that he is striving to privatize this operation. But in practice, I believe there are better ways we can safeguard the future of books. And I believe that technology can be part of that solution.

  4. Josh McCaffrey May 12, 2015 — 2:59 am

    In theory, I think Google scanning millions of books is terrific. The fact that a person living in the remotest corner of the world has relatively easy access to obscure, out-of-print texts is amazing. In practice, however, I am deeply uncomfortable with the idea that it is OK for books to only exist in digital form with no hard copy as a back-up. For this reason, I fully support the work of Kahle, the Library of Congress, and anyone else working to preserve the print versions of books.

    While I agree that it would be nice if book collections were accessible to the general public, that isn’t ideal either. Amy mentions that books can be “checked out” from the Library of Congress and shipped to library branches across the country, but then the book is at risk of being lost, stolen or damaged. As much as I dislike the thought of keeping books away from the public, at least it would keep them safe and preserved.

    This whole discussion reminds me of a news story I saw on television several years ago about Norway’s Svalbard Global Seed Vault (https://www.regjeringen.no/en/topics/food-fisheries-and-agriculture/agriculture/svalbard-global-seed-vault/id462220/), where more than 4,000 plant species are stored in case they are ever needed after a global disaster or catastrophe. For me, this is what makes a book archive so valuable: even if nobody ever opens the vault, it’s just comforting to know that it’s there.

  5. Selena Efthimiou May 14, 2015 — 2:00 pm

    You know, I think it is interesting that you ended your posting with the idea that Kahle’s work is a “Band-Aid” approach. Google Books and other scanning companies have created a scar amidst the pages of texts, yet I’m not sure if this is just an approach to help heal what has been torn open. Honestly, at first I was going to comment on how I think Kahle’s concept of storage is a great idea. It does feel like a Noah’s Ark type of scenario where he is saving the books from destruction. However, the more I think about it, the more I am noticing that it is just a band-aid trying to heal what can’t be healed.

    As you stated, “Kahle’s costly system of closed containers is not needed because the Library of Congress has already been archiving books for centuries on publicly accessible bookshelves.” It does seem a little redundant to be fixing something that is already in the process of being fixed. How much is Kahle’s approach really helping? I also question how many of the same editions, books, and stories are kept locked away. Does Kahle keep only one edition and only one copy of each book? How is he able to keep track? Also, how does he know that the books he is storing aren’t the same books that the Library of Congress has already stored? There could be unnecessary redundancy worth exploring, because eliminating it would save time, space, and money. Amy, at first glance of your posting I completely agreed with what Kahle was doing, but now I’m not so sure because I’m seeing flaws in the concept.

  6. Amy,
    I always thought that Kahle sounded like an incredibly… interesting person. He truly believes that he is building the second Library of Alexandria and he actually wants to ‘beat’ the Greeks. When we read about him in class I really thought that he was a bit of a nut and that his attempt was kind of insane; a modern-day literary Noah was pretty funny to me. And then we went to the Andersen Library at the U. Now, their intention is a lot different than Kahle’s. They are not bringing all of their books together because they are paranoid that their literature is going to be lost if it gets put online and the material is left to the dust. But it was still an incredible sight to walk through those underground caverns and see all of those millions of books on the shelves. I have never seen even pictures of Kahle’s ‘New Age Alexandria,’ but I have to believe that his library is at the very least as impressive as the collection at Andersen. When you only think about the lines of books, all in one place, just waiting for someone to read them, it makes literary nerds like me, who really appreciate solid pages over digital ones, so excited for the possibilities. Having a place like that in the world gives me hope that we will save the integrity of the physical book.

  7. Arianne Peterson May 14, 2015 — 7:16 pm

    Amy, it was such a relief to read your updated version of this post and find that indeed, the Library of Congress is fulfilling its responsibility in keeping a publicly accessible collection of all published books. Thanks for researching this. Even though theoretically something could go wrong with the Library of Congress in terms of its being able to sustain this collection (after all, this country is less than 250 years old and anything can happen), I really value the idea that this work is being done in what is at least supposed to be a democratically-controlled way. I think it was Erica who mentioned in her presentation that digital storage codes actually erode over time just like print books—this does make sense when I consider that I would never be able to read a file that had been created with software 20 years ago and not painstakingly updated. Why would we expect that a PDF of today will be easily opened 20 years in the future? This insight made me value print editions even more.

    While Kahle’s archive project does seem a bit redundant with the Library of Congress, I have to agree with Josh in feeling that any effort toward book preservation is commendable. He is doing this with private funds, and though it may not be necessary to the survival of the material in the end, at least it doesn’t hurt.

    On a different note, because of my work in the anti-nuclear field, I can’t help but compare these huge book repositories with radioactive waste repositories, especially after our visit to the U of M. Both are underground, climate-controlled, meticulously cataloged sites designed for the kind of long-term storage that considers time on a deeper scale than we are used to. We haven’t yet created a nuclear waste repository that can reliably maintain these conditions for the long term. How can we realistically expect that a book archive will be different? And if we are able to keep these books around for centuries, who will read them and how will they be used? Will the language and/or the ideas even be intelligible to generations far in the future?
