A field trip to the Internet Archive

Thanks to Rob Pegoraro for the very nice article that appeared Tuesday in the Washington Post.

A snippet from the article:

A field trip to the Internet Archive by Rob Pegoraro
“Many people think of the Internet Archive only as the home of the Wayback Machine, the site that lets you see what pages looked like years ago.

But the archive is also of the real world, a 501(c)(3) nonprofit organization that makes its home in a former church in the Richmond neighborhood here. Archive founder Brewster Kahle took an hour to show me around the place and talk about its work — an increasing amount of which has little to do with old Web pages.”

You can read the full article at: http://voices.washingtonpost.com/fasterforward/2010/05/a_field_trip_to_the_internet_a.html

-Jeff Kaplan

Posted in News | 4 Comments

NASA Images selected as one of MARS Best Free Reference Web Sites of 2010

From NASA Images blog:

NASA Images has been selected as one of the MARS Best Free Reference Web Sites of 2010, an annual series initiated under the auspices of the Machine-Assisted Reference Section (MARS) of the Reference and User Services Association (RUSA) of the American Library Association (ALA) to recognize outstanding reference sites on the World Wide Web. This year’s list consists of 30 sites recognized by MARS as outstanding for reference information; view the list here.

Kudos to the NASA Images team: Jon Hornstein, Jake Johnson, Greg Williamson and Samantha O’Connell.

-Jeff Kaplan

Posted in Image Archive, NASA Images, News | 4 Comments

Community: A New Name for “Open Source” Collections

Internet Archive has changed the names of the Open Source Audio, Open Source Books, and Open Source Movies collections.  We have chosen to replace the term “Open Source” with “Community,” because we feel it better reflects the purpose of these collections and the people who donated the content.

Open Source typically refers to free access to source code and open distribution. Wikipedia has a detailed definition of the term at http://en.wikipedia.org/wiki/Open_Source_Definition.

Please be assured that none of the URLs for collections or items have changed – if you have links to our site for any content, those links will continue to work.

We hope you agree that this change better reflects their purpose.  Please visit our newly renamed collections:
Community Audio
Community Texts
Community Video

-Jeff Kaplan

Posted in News | 28 Comments

An Old-fashioned Book Drive! Please help make the Open Library Book collection even bigger.

From Open Library:

The Internet Archive has been scanning books for some years now, and we’re always looking for more. In addition to the 1,000,000+ eBooks available to anyone through Open Library, we’ve announced the release of modern books for the print-disabled community in a special format called DAISY. It’s a brand-new collection, one of the largest available online. For too long, print-disabled people have been denied access to the full breadth of contemporary books, and we’d like to help tip that balance back to where it should be: universal access for all readers.

We are sponsoring the scanning of the first 10,000 books

Please help us by donating books to be scanned or with financial support for the scanning process. Based on existing foundation funding, we are sponsoring the scanning of the first 10,000 books that are donated in this Book Drive. We’re looking for wonderful and important books for this first 10,000 and even more books and money to keep it going. We will make these digital books as available to the world as we can, including the print-disabled, and will preserve the physical book for the long term.

How Does The Book Drive Work?

You can simply send up to 100 books or drop them off in person at our headquarters:

Internet Archive Book Drive
300 Funston Avenue
San Francisco, CA 94118

If you’d like to make a donation of more than 100 books, wow! That would be wonderful, but please give us a call on +1 415-561-6767 to arrange shipping and handling.

We’d like to recognize the generosity of everyone who donates a book to the book drive. It is simplest for us to do this if you include an “Ex Libris” bookplate inside the front cover of each book you donate. That way, when we scan your donation, we will simply photograph your bookplate. This will become part of the permanent digitized version of your donation.

For additional information: http://openlibrary.org/bookdrive

Posted in Audio Archive, Books Archive, News, Open Library | 3 Comments

Open Library redesign is live!

Yesterday I posted about the new Open Library Accessible Books service. Today I’d like to bring some attention to the wonderful redesign of Open Library that was also launched yesterday.

Besides the great new look, the redesign brings some cool features (from a post on the Open Library blog, http://blog.openlibrary.org/2010/03/17/announcing-the-open-library-redesign/):

Works
The previous version of Open Library was only aware of editions of books, or “manifestations” in FRBR-speak. We’re excited to release Works, which gathers all editions of the same book under one umbrella. Each work also has its own URI; we’re hoping these propagate.

Note that our representation of Works is imperfect. We’re the first to acknowledge that there are lots of duplicate edition records in Open Library, and these dupes clog up our ability to derive or create works from editions. That means that we might have 25 Jane Eyres for a while, and that the next logical feature to release is a way for people to help merge things.

Subject pages
We wanted to find a way to help people browse the catalog rather than having to know what they’re looking for before they start. So, we’ve gone through a process of breaking down and reconstructing the subject headings on our records, giving each heading a URL, and displaying a whole bunch of data about each heading: works about that subject, publishing history, related subjects, authors who write about it, and publishers who publish in that subject area.

Revamped search
We’ve rewritten search from scratch and upgraded to SOLR 1.4. Our ranking is very basic for now, so “relevance” doesn’t mean a lot yet. We can’t wait to improve on it, and in the meantime, you can also sort your searches by the number of editions, when things were published, or filter using facets.

UI Improvements
The whole site’s user interface has had an overhaul. All the major operations (editing, searching, adding covers, etc.) have been redesigned. Even the new size and position of the Edit button will hopefully make it clearer that these records are open to correction. We’ll be blogging over the coming weeks with specifics about the user interface enhancements.

Links, links, links
Another major component of the redesign is to begin the process of connecting our records to other references out there on the interwebs. If you get to an Edit Edition page, you’ll notice that you can add different identifiers from a variety of systems to the Edition record, and even add a new type of identifier to the system. The more IDs we can collect, the more connections there’ll be into and out of Open Library.

The redesign is just out of the oven, so it’s important to be clear that there are still things missing, unclear, coming soon, or potentially even broken:

A lot of the revisions we’ve made to the API are undocumented. We’re looking forward to changing that, and will update you as we do. We’d also like to expand the range of ways you can write to Open Library via the API.

The Data
Now that we’ve improved on the ways to browse the Open Library catalog, we’ve exposed a lot of the corners and content in there that may never have seen the light of day, or are just plain wrong.

It might be odd to say, but we sympathize with Google’s recent position on metadata quality. Trying to merge records from lots of different catalogs means there will be duplicates, and that any errors in those different catalogs are imported as well. That’s not to say we’re not happy with what we’ve got at this first stage. Edward has done a fantastic job to get this far, and we’re looking forward to continual improvement of the dataset.

The fun thing — the best thing? — about Open Library is that you can correct any errors you come across, and those corrections can be propagated.

Please go and explore the new Open Library. This is just the beginning!

Kudos to the entire team:
Core Dev Team
Lance Arthur: HTML & Pixel Wrangler
Edward Betts: Chief Data Munger
Anand Chitipothu: Chief Web Programmer
George Oates: Project Lead, Designer

Karen Coyle: Metadata Czar
Brewster Kahle: Overseer

Winnie Chen: QA Master
Daniel Giffin: Programmer
Rebecca Malamud: Designer
Alexis Rossi: Manager
Aaron Swartz: Former Project Leader
solrize: Search Programmer

A.S.L. Devi
Werner Popken
Tommi Raivio
Allan Jardine
Simon Chetrit

More information on each team member’s contribution is at http://openlibrary.org/about/people

-Jeff Kaplan

Posted in Open Library | 2 Comments

Over 1 Million Digital Books Now Available Free to the Blind and Print-Disabled


Open Library Accessible Books

The Washington Post is carrying the story of the new service launched today, Open Library Accessible Books. We’re really excited about this.

The Open Library team has been working very hard to create a fantastic way to bring books to the blind and print-disabled. There will be over 1 million books available free in the open DAISY format, with more to come.

Jessie Lorenz, an associate director at the Independent Living Resource Center San Francisco, with a talking book device

“Every person deserves the opportunity to enhance their lives through access to the books that teach, entertain and inspire,” said Brewster Kahle, founder and Digital Librarian of the Internet Archive. “Bringing access to huge libraries of books to the blind and print disabled is truly one of the benefits of the digital revolution.”

The collection of books for the print-disabled is now available through the Internet Archive’s newly redesigned Open Library site, which serves as a gateway to information about millions of hardcopy books and more than 1 million electronic books.

Kahle also announced that the Internet Archive will be investing in the growth of its virtual bookshelf by funding the digitization of the first 10,000 books donated. Individuals and organizations are welcome to donate their favorite book or a collection of books, and books in all languages are welcome. To donate books, visit: http://openlibrary.org/bookdrive

To read more go to: http://www.archive.org/iathreads/post-view.php?id=305502
Open Library Accessible Books: http://openlibrary.org/subjects/accessible_book
Open Library: http://openlibrary.org

Posted in Books Archive, Open Library | 4 Comments

New Firefox add-ons for Internet Archive and NASA Images Search

We’ve developed two more Firefox add-ons that let you search these sites directly from your browser:
NASA Images Search addon: https://addons.mozilla.org/en-US/firefox/addon/156140/
Internet Archive Search addon: https://addons.mozilla.org/en-US/firefox/addon/155131/
-Jeff Kaplan

Posted in NASA Images, News | 2 Comments

better mp4 (h.264) derivatives at archive.org!

Late last week we pushed live a new video derivation technique and, in the process, updated our audio/video file reader, ffmpeg.

New items will benefit from this newer method, and existing items can be re-derived by users if they wish (we will probably re-derive all our movies automatically by the end of the year).

The video will have significantly less “noise”, a higher PSNR (peak signal-to-noise ratio) and less “blocking”, all at similar or faster derive speed and with the same bitrate and file size!
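PSNR compares an encoded frame K with its source frame I through their mean squared error (MSE) and is reported in decibels; higher means the encode is closer to the source. For reference, the standard definition for 8-bit video (maximum pixel value 255) over a W×H frame is below. The worked number assumes a hypothetical MSE of 25 and is not a measurement of our derivatives.

```latex
\mathrm{MSE} = \frac{1}{WH}\sum_{x=1}^{W}\sum_{y=1}^{H}\bigl(I(x,y) - K(x,y)\bigr)^{2},
\qquad
\mathrm{PSNR} = 10\,\log_{10}\!\left(\frac{255^{2}}{\mathrm{MSE}}\right)\ \mathrm{dB}
% Worked example with a hypothetical MSE of 25:
\mathrm{PSNR} = 10\,\log_{10}\frac{65025}{25} = 10\,\log_{10} 2601 \approx 34.15\ \mathrm{dB}
```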

example new derivative frame

example old derivative frame

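We now open the source video with ffmpeg, resize and convert it to raw video, and pipe it to the most recent build of the x264 encoder (using the baseline profile for compatibility with the iPhone and similar devices). The audio is encoded separately to AAC, the video is encoded in two passes, and the two streams are then muxed into an MP4 file with mp4creator.
For the very curious (and the very geeky 😉), here is how we make our H.264 MPEG-4 video files now: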

• ffmpeg -i camels.avi -vn -acodec libfaac -ab 64k -ac 2 temp.aac
• ffmpeg -an -deinterlace -i camels.avi -s 320x240 -r 20 -vcodec rawvideo -pix_fmt yuv420p -f rawvideo - 2>/dev/null | ffmpeg -an -f rawvideo -s 320x240 -r 20 -i - -f yuv4mpegpipe - 2>/dev/null | x264 --bitrate 512 --vbv-maxrate 768 --vbv-bufsize 1024 --profile baseline --pass 1 /dev/stdin --demuxer y4m -o temp.h264
• ffmpeg -an -deinterlace -i camels.avi -s 320x240 -r 20 -vcodec rawvideo -pix_fmt yuv420p -f rawvideo - 2>/dev/null | ffmpeg -an -f rawvideo -s 320x240 -r 20 -i - -f yuv4mpegpipe - 2>/dev/null | x264 --bitrate 512 --vbv-maxrate 768 --vbv-bufsize 1024 --profile baseline --pass 2 /dev/stdin --demuxer y4m -o temp.h264
• mp4creator -c temp.h264 -r 20 t2.mp4
• mp4creator -c temp.aac -interleave t2.mp4
• ffmpeg -i t2.mp4 -acodec copy -vcodec copy -metadata title="Camels at a Zoo - http://www.archive.org/details/camels" -metadata year="2004" -metadata comment="license:http://creativecommons.org/licenses/by-nc/3.0/" camels_512kb.mp4
• mp4creator -optimize camels_512kb.mp4
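The two x264 lines differ only in `--pass`, so the steps above can be captured in a small generator that takes the frame size, frame rate and bitrate as parameters. This is a sketch, not the Archive's actual derive code: the function name `gen_derive_cmds` is hypothetical, scaling the VBV settings from the 512/768/1024 ratio above is an assumption, and the metadata-tagging step is left out for brevity. It only prints the commands, so you can inspect them before running them. Note that `libfaac`, `-deinterlace` and mp4creator reflect the tools of the time and may be missing from current builds.

```shell
#!/usr/bin/env bash
# Sketch only: prints (does not run) the derive steps listed above.
# Hypothetical: the gen_derive_cmds name, and VBV values scaled as
# maxrate = 1.5 x bitrate, bufsize = 2 x bitrate (inferred from 512/768/1024).
set -euo pipefail

gen_derive_cmds() {
  local src out size fps kbps
  src=$(printf '%q' "$1")             # shell-quote paths so the output can be re-executed
  out=$(printf '%q' "$2")
  size="${3:-320x240}"
  fps="${4:-20}"
  kbps="${5:-512}"
  local maxrate=$(( kbps * 3 / 2 ))   # 512 -> 768
  local bufsize=$(( kbps * 2 ))       # 512 -> 1024

  # Audio: extract a 64 kbit/s stereo AAC track.
  echo "ffmpeg -i $src -vn -acodec libfaac -ab 64k -ac 2 temp.aac"

  # Video: raw YUV 4:2:0 -> y4m -> x264, once per pass; only --pass changes.
  local pass
  for pass in 1 2; do
    echo "ffmpeg -an -deinterlace -i $src -s $size -r $fps -vcodec rawvideo -pix_fmt yuv420p -f rawvideo - 2>/dev/null" \
         "| ffmpeg -an -f rawvideo -s $size -r $fps -i - -f yuv4mpegpipe - 2>/dev/null" \
         "| x264 --bitrate $kbps --vbv-maxrate $maxrate --vbv-bufsize $bufsize --profile baseline --pass $pass /dev/stdin --demuxer y4m -o temp.h264"
  done

  # Mux video and audio into one MP4, then optimize the file layout.
  echo "mp4creator -c temp.h264 -r $fps $out"
  echo "mp4creator -c temp.aac -interleave $out"
  echo "mp4creator -optimize $out"
}
```

For example, `gen_derive_cmds camels.avi camels_512kb.mp4 | bash` would run the same steps as the list above, except that it muxes straight into the final file instead of going through `t2.mp4` and the metadata step.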

–Tracey Jaquith

Posted in Video Archive | 2 Comments

Nuclear Summit and Marionettes

In light of the recent nuclear security summit in Washington, D.C., I thought I’d bone up on precautions in case things don’t work out.

Today’s question is: can papier-mâché marionettes survive the bomb? Let’s find out:
Rural Civil Defense TV Spots 1965: http://www.archive.org/details/rural_civil_defense_tv_spots_1965

A good fact to remember: your livestock can survive fallout. Mmmm…steaks and chops that glow in the dark.
And, in case you were wondering, fertilizer can act as protection from the bomb.

Some of my favorite snarky reviewer comments:

“A series of helpful[?] ads to instruct farmers how to survive a nuclear attack. Not only would this not help the farmers, considering how slowly he goes down the stairs, it doesn’t even help puppets.”

“Filmed in less-than-super marionation. Puppet design by Mrs. McGreevy’s Third Grade Class.”

“Well, when these PSAs were designed, obviously the creepiness factor was considered to get people to pay attention.”

“So…. did anyone notice the random squirrel (8:27)? I couldn’t stop laughing!”

“Aside from the random squirrel, the best part is when that guy falls down the stairs and the camera just lingers as he lies motionless. Its almost as if you’re waiting for him to get back up but then he just lies there and you say ‘No, he’s dead…’.”

“See it to believe it, and even then who can believe it?”

-Jeff Kaplan

Posted in Video Archive | Leave a comment

“Houston, we’ve had a problem”

These now-famous words were spoken by Jim Lovell in 1970 during the ill-fated Apollo 13 flight. Astronauts and mission control crew recently held a reunion to celebrate the 40th anniversary. NASA Images has many great photos and videos from the flight. Here are a few of my favorites.

The news bulletin.
The duct tape fix!
Re-entry and recovery!

Tense ground control.
Success celebrated on the ground!

Check out more at NASAimages.org.

-Jeff Kaplan

Posted in NASA Images | Leave a comment

New Open Library Search Engine add-on for Firefox

We’ve developed a Firefox add-on that allows you to directly search Open Library from your browser’s toolbar search field.

To install it:
1. Go to: https://addons.mozilla.org/en-US/firefox/addon/144222/
2. Check the “Let me install this experimental add-on.” box
3. Click “Add to Firefox” button
4. Click “Add” in the pop-up window (check “Start using it right away” if you want to use it immediately.)
5. Lastly, if you’re registered with Mozilla please log in and write a review of it here: https://addons.mozilla.org/en-US/firefox/addon/144222#reviews

I hope you find it useful. Please use it often.

-Jeff Kaplan

Posted in Cool items, Open Library | 4 Comments

NASA partners with Internet Archive to archive digital imagery

From Jon Hornstein at the Internet Archive’s NASA Images:

NASA gave a nice shout-out to the Internet Archive for helping them address their Open Government Initiative requirements. http://www.nasa.gov/open/plan/records-management.html

Here are a couple of choice quotes:

“. . . (the Internet Archive) serves as custodian of much of NASA’s current and legacy digital imagery records. In addition, IA will help digitize NASA’s historically significant, analog images for inclusion on the Web site, enabling digital archiving with the National Archives and greater public access to these records via the IA Website.”

“Strictly on its own initiative, IA recently began to capture NASA’s publicly posted social media content. NASA is considering exploration of how this activity might be leveraged for records management purposes.”

There’s always cool stuff to be discovered at NASA images: http://nasaimages.org

-Jeff Kaplan

Posted in Cool items, Education Archive, Image Archive, News, Video Archive | 1 Comment

Millions of documents from over 350k federal court cases now freely available

Princeton University’s Center for Information Technology Policy, working with the Internet Archive and volunteers, has launched RECAP, a project to make US federal court documents available to the public at no cost.

RECAP is an extension for the Firefox web browser. When a PACER user requests a document the Archive already has, RECAP gives them a free copy of a document they would normally pay for; when the Archive does not have it yet, RECAP automatically donates the document to the Archive after the user purchases it from PACER, so future users can get it for free. The repository on the Internet Archive therefore grows as people use PACER with this plug-in. We are currently receiving more than one document a minute, and some large holdings are being uploaded. We hope that the government will eventually put all of these documents in an open archive, but until then this repository will grow with use.

We find this an exciting project because it takes public information and automatically builds a second archive of these materials as they are used. This may be the first automatic archiving system of its kind, and we hope it could become a model for preserving public domain collections. Technically, the Archive is using its new S3-like interface to make automated uploads easy.
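To make the "S3-like interface" concrete, a single automated upload can be sketched with curl, following the header conventions in the Archive's S3 API documentation (item metadata travels as `x-archive-meta-*` headers, and `x-amz-auto-make-bucket:1` creates the item on first upload). This is an illustration only, not RECAP's client code: the function name `ia_put`, the `IA_ACCESS`/`IA_SECRET` variables, the `texts` mediatype and any identifier or collection you pass are assumptions. Setting `DRY_RUN=1` prints the request instead of sending it.

```shell
#!/usr/bin/env bash
# Hedged sketch of an upload via the Internet Archive's S3-like API.
# Assumptions: IA_ACCESS / IA_SECRET hold your archive.org S3 keys;
# ia_put, the "texts" mediatype and the arguments are hypothetical.
set -euo pipefail

ia_put() {
  local identifier="$1" file="$2" collection="$3"
  local url="https://s3.us.archive.org/${identifier}/$(basename "$file")"
  local args=(
    --location
    --header "x-amz-auto-make-bucket:1"               # create the item if it does not exist
    --header "x-archive-meta-mediatype:texts"         # item metadata is sent as headers
    --header "x-archive-meta-collection:${collection}"
    --header "authorization: LOW ${IA_ACCESS}:${IA_SECRET}"
    --upload-file "$file"
    "$url"
  )
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf 'curl %s\n' "${args[*]}"                   # show the request only
  else
    curl "${args[@]}"                                 # perform the PUT
  fi
}
```

An automated client would call something like `ia_put some-item-id ./document.pdf some-collection` after each purchase, which is the "grow with use" pattern described above.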

PACER (Public Access to Court Electronic Records) provides online access to U.S. appellate, district, and bankruptcy court records and documents. Although it is available to the general public, it is difficult for non-lawyers to use, and users must pay significant fees for the documents they request. RECAP improves the PACER user experience while contributing these documents to the Internet Archive, which makes them available to the public. RECAP users are also alerted when a document they are searching for is already available from this repository. Since its August launch, RECAP has continued to attract interest from the legal profession and to use the capabilities of the web to increase government transparency.

Posted in News | 1 Comment

New York City

Just back from a sightseeing trip to New York City. After visiting so many famous landmarks, I decided to check out what we have at the Archive.

I dig the riveting machine in this movie of the construction of the Empire State Building: http://www.archive.org/details/making_a_skyscraper_empire_state_bldg

There was a tragic airplane crash into the Empire State Building in 1945: http://www.archive.org/details/Pa2107Empire

I loved the historic structures all over the city. Here’s a book about some of the more notable historical buildings: http://www.archive.org/stream/oldbuildingsofne00newy#page/142/mode/2up

I spent a weekend in Woodbury, Connecticut. It’s a very old area with many houses and buildings over 200 years old. I found this book about the history of the area: http://www.archive.org/stream/historyofancient01cothr#page/n3/mode/2up

When we returned to San Francisco, one of our party discovered a dreaded deer tick on her leg. I found this document on Lyme disease: http://www.archive.org/stream/lymediseasediagn00unit#page/10/mode/2up

Nasty little bugs to remove. Perhaps it’s time to consider an antibiotic cocktail…

-Jeff Kaplan

Posted in Books Archive, Video Archive | Leave a comment

Two Million Free Texts Now Available

The Internet Archive is pleased to announce that an important manuscript, Homiliary on Gospels from Easter to first Sunday of Advent, is its 2,000,000th free digital text. The Internet Archive has been scanning books and making them available for free on archive.org to researchers, historians, scholars, people with disabilities, and the general public since 2005.

“This 1,000 year old book which has only been seen by a select few people, can, with the technology of today, be shared with millions tomorrow,” said Robert Miller, Director of Books of the Internet Archive. “Selecting this title for the 2 millionth text is a fitting tribute to the team of scanners who have been carefully working for the past 5 years.”

The Homiliary manuscript was copied on parchment by at least three different scribes at the important medieval Abbey of St. Martin in Tours less than 100 years after having been composed by Heiric of Auxerre and is the oldest known copy of Heiric’s original text.

“Handwritten in Latin by a number of scribes in a script inspired by the court of Charlemagne, this rare and beautiful treasure from the first millennium of Christianity, is one of the gems in the renowned collection of the Pontifical Institute of Mediaeval Studies. The Institute is dedicated to transmitting the inheritance of the Middle Ages to new generations; to deepening our understanding of the life and ideals of Western culture in the time of its first youth,” said Jonathan Bengtson, Director of Library and Archives, University of St. Michael’s College in the University of Toronto & Pontifical Institute of Mediaeval Studies.

About the Internet Archive:

The Internet Archive is a 501(c)(3) nonprofit digital library based in San Francisco that specializes in offering broad public access to digitized and born-digital books, music, movies and Web pages.

The Internet Archive partners with the University of Toronto and over 150 libraries and universities around the world to create a freely accessible archive representing a wide range of texts, including non-fiction and fiction books, research and academic texts, popular books, children’s books and historical texts.

Posted in Books Archive, News | 7 Comments