I’ve been in Washington, DC for four weeks now. I accepted a summer internship at the Library of Congress, which ends on August 1st. It all happened at once, before I even knew what was going on. Grad school ended so suddenly! So here I am, working in the American Folklife Center at the Library of Congress. My project is to reformat a medium-sized collection of tangible media held in the Local Legacies collection. Most of the media are 3.5″ floppy disks, but there are also Zip disks and CDs that require preservation reformatting. I was really eager to get started on this work, and so far things are going really well!
There are some challenges. My boss works remotely, and I was disappointed to find out that I would not be seeing much of him during the summer. However, he’s a great boss! He’s very knowledgeable about the tools we use for data extraction and ingest, and he always walks me through the whole process whenever we start a new workflow. Once I feel comfortable, I’m on my own. I have found that this kind of hands-off management works really well for me. I like to take ownership of the projects I’m working on, and I like that he trusts me to do my job well.
Another challenge is that the project timeline is very short. Extracting the data doesn’t take very long. I use very simple tools like BagIt to package the files from each disk and save them to a hard disk drive. The bottleneck of the whole project is ingesting the data into the preservation storage environment. Although the content transfer system used by the Library of Congress is capable of handling a multiple-content workflow, I am only ingesting bags (a bag is a collection of related files, really) one at a time. It is a slow and monotonous task.
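To make the idea of a bag a little more concrete, here is a minimal sketch in Python of what the BagIt layout looks like on disk: the files go into a `data/` directory, a `bagit.txt` file declares the bag, and a manifest lists a checksum for every file. This is not the tooling used at the Library. The function names `make_minimal_bag` and `verify_bag` are made up for illustration, SHA-256 is just an assumed choice of checksum, and real BagIt tools also write metadata and tag manifests that are left out here.

```python
import hashlib
import shutil
from pathlib import Path


def make_minimal_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy source_dir into bag_dir/data and write the two required tag files.

    Hypothetical helper for illustration only; bag_dir/data must not exist yet.
    """
    data_dir = bag_dir / "data"
    shutil.copytree(source_dir, data_dir)  # payload files live under data/

    # Bag declaration: tells a reader this directory is a bag.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<checksum>  <relative path>" line per file.
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bag_dir).as_posix()}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir


def verify_bag(bag_dir: Path) -> bool:
    """Recompute each checksum in the manifest and compare (hypothetical helper)."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True
```

Calling `make_minimal_bag(Path("disk_042"), Path("bags/disk_042"))` (both paths are placeholders) would produce a self-describing folder, and `verify_bag` can be run again after every copy to confirm that nothing changed along the way. That fixity check is the main reason to bother with bags at all.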
The fun part of the project will be looking at the data itself. The collection contains 472 pieces of physical media. We have no idea how many files we will recover from it. Most of the time I find textual documents, but there are also images and webpages saved on those 3.5″ floppy disks. I met with someone who introduced me to the idea of topic modeling and the tool MALLET. We’re hoping to analyze these texts with MALLET so we can get a very high-level view of what the Local Legacies physical media collection contains. This is my first foray into the Digital Humanities and I’m excited to see what happens.