Showing posts with label digitization. Show all posts

Tuesday, September 8, 2015

On Institutional Repository Success: Discovery, Search, Metadata

Over the summer I was asked to talk about institutional repositories and how to define what makes them successful as part of a job interview in an academic library. The text of what I said, along with some of the accompanying images, is below.


DIGITAL REPOSITORY SUCCESS

I've been asked to present my thoughts on what it means for a digital repository, an institutional repository, to be successful, and how to measure that success.

Very few people I know go into an institutional repository (IR) to look for something. It's not the way that search and discovery work. What I propose we do is to link the IR to our current search and discovery workflows, that is, link the IR to things that people already use.

It's not about making the repository more visible, it's about making the stuff in the repository more visible.

The IR is nothing without the things inside it; we need to have things that people want, and people need to know that they want those things, those items. Those items need to be where people can find them.

Don't have an IR just for the sake of having one. I turn here to one of my favorite library and information science theorists, Frank Zappa.

Thanks, Zappa estate.
Zappa once said that if a country wanted to be taken seriously, it needed two things: a beer and an airline. For Zappa, these are symbols of modernity. I want to make sure that an institutional repository isn't just a symbol of modernity, that we don't have one just because everyone else does, or because it's what academic libraries "should" have, but because it will be used. And for sure, having one is nice. On its own, an IR sends a positive signal concerning open access initiatives to faculty, to an academic community, and that's good, but it shouldn't be the main reason for having one.

Furthermore, we shouldn't have an IR because it's seen as a solution to non-existent or undefined problems. In organization theory, this is known as the "garbage can model" of decision making.

Not sure why PBS hosts this smushed image.
If we're going to have an IR, it should solve existing problems. It should help, not hinder, and it shouldn't exist for its own sake.

So with that in mind, we have an IR here, and an open access initiative and policy. We can improve the IR, and more importantly the stuff in it, in two ways: discovery and search.

For discovery, there are a few options. At my former place of work, we used widgets as well as a tab in our discovery search box.


Note the widgets, circled. (And yes, this is called burying the lede.) 
If possible, add a facet in the discovery layer search results. We already teach the use of these facets; we may as well make the IR, and thus the stuff inside it, more visible.

Note: no IR facet here. 
Results can also be expressed such that the IR is more visible. In "bento box" results, there could be an IR section of results, for example.

And of course, if we don't have strong metadata for items in an IR, none of this will matter. Application Programming Interfaces (APIs) are a good way to bring robust metadata into discovery; Omeka has one, for example. Digital Commons supports the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), which is also workable. There's certainly room for collaboration with vendors here.
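For repositories that expose the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), getting metadata into a discovery layer starts with a ListRecords request. Here is a minimal sketch, assuming a hypothetical endpoint at https://ir.example.edu/oai and the simple Dublin Core (oai_dc) format. It parses one page of a made-up response and ignores resumption tokens, which a real harvester would need to follow.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces from the OAI-PMH 2.0 and oai_dc specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"

def parse_records(xml_text):
    """Pull identifier, title, and creators out of one page of oai_dc records."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:  # deleted records carry a header but no metadata
            continue
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            "creators": [c.text for c in dc.findall("dc:creator", NS)],
        })
    return records

# Hypothetical one-record response, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:ir.example.edu:article-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Article</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""
```

In practice you would fetch `list_records_url(...)` with an HTTP client and hand the response body to `parse_records`; the resulting dictionaries are what a discovery layer would index.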

Metadata is also important in searching outside the library. Plenty of us, and faculty, use Google Scholar. With a link resolver we can bring faculty back to the library site, to the IR.

What success can look like. 
The library isn't a gateway, isn't always a starting point, so we need to bring what we have to where our users are. The library may not function as publisher, but it can certainly act as distributor.

Why is metadata so important here? Because Google Scholar works better with some schemas, some formats, than others. It doesn't play nicely with Dublin Core, for example. Without that robust metadata, we might come across our friend the paywall.

We've all seen one of these before, right? 
Ahhhh, the paywall: simultaneously too expensive ("you want how much for that paper?") and insultingly inexpensive given all the work that goes into research and publishing. Poor metadata will send people to a paywall instead of to an IR for the same paper.
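One way to give Google Scholar metadata it handles well: its inclusion guidelines describe Highwire Press-style `citation_*` meta tags and prefer them over Dublin Core. A minimal sketch that renders a few of those tags for an IR item's landing page; the item values are hypothetical, and the guidelines list more tags (journal title, volume, pages, and so on) that are worth checking before relying on this.

```python
from html import escape

def scholar_meta_tags(title, authors, publication_date, pdf_url):
    """Render Highwire Press-style <meta> tags for an IR item's landing page."""
    tags = [("citation_title", title)]
    tags += [("citation_author", a) for a in authors]  # one tag per author
    tags += [
        ("citation_publication_date", publication_date),
        ("citation_pdf_url", pdf_url),
    ]
    # Escape values so characters like & and " can't break the HTML attribute.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in tags
    )
```

These tags would go in the `<head>` of the item page, so a crawler finds the paper in the IR rather than only the publisher's paywalled copy.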

So discovery and search are two ways to build on IRs, to expand their capabilities. But if these methods work, how will we know? How can we track the output and measure the impact of an IR?

Traditionally, we use bibliometrics: citation tracking, pageviews, downloads, and the like. Our good friend COUNTER fits the bill. As the number of digital-only items grows, altmetrics become more important. Are articles being shared on LinkedIn or Twitter? I know that one organization has tried to measure the effects of "#icanhazpdf," article sharing on social media, with mixed results. And increasingly, the line between biblio- and altmetrics is blurring.
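Download counts only mean something if they're counted consistently. COUNTER's Code of Practice, for example, filters "double clicks," so repeated requests for the same item by the same user within a short window count once (Release 5 uses 30 seconds; check the current release). A simplified sketch of that idea, not a COUNTER-compliant implementation; the event format is hypothetical.

```python
from collections import Counter

DOUBLE_CLICK_WINDOW = 30  # seconds; assumed, modeled on COUNTER Release 5

def count_downloads(events, window=DOUBLE_CLICK_WINDOW):
    """Count downloads per item, ignoring repeat clicks.

    `events` is an iterable of (user_id, item_id, timestamp_seconds) tuples.
    A click is counted only if the same user's previous click on the same
    item was more than `window` seconds earlier (or there was none).
    """
    last_seen = {}
    counts = Counter()
    for user, item, ts in sorted(events, key=lambda e: e[2]):
        previous = last_seen.get((user, item))
        if previous is None or ts - previous > window:
            counts[item] += 1
        last_seen[(user, item)] = ts
    return counts
```

Without some filter like this, one impatient reader double-clicking a PDF link looks like two downloads, which inflates exactly the numbers we'd show administrators.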

Return on investment is another way to measure IR success, albeit crudely. Back to that paywalled article: we know that Elsevier thinks it's worth $36. Could we then write, in an annual report, that we added x articles to our IR in 2015, for a fair market value of x times the median article price? That might be an effective way to tell a story to academic administration.
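The calculation is simple enough to sketch. Only the $36 figure comes from the paywalled article above; any other prices you feed in are hypothetical. Using the median rather than the mean keeps a few very expensive articles from inflating the estimate.

```python
from statistics import median

def estimated_value(articles_added, article_prices):
    """Crude 'fair market value': articles added times the median per-article price."""
    return articles_added * median(article_prices)
```

For example, `estimated_value(100, [36, 30, 40])` gives 3600, that is, 100 articles at a median price of $36.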

Qualitative methods could also prove useful. Interview faculty, either individually or in focus groups, and ask how IRs work, or don't, for them.

Speaking of faculty, this doesn't work without buy-in from them. It's why open access policies and initiatives are so important. Open access papers tend to get cited, get read, and get used more than those that are paywalled. Academic publishing looks like a moral hazard at times; faculty publish stuff and then we in the library have to buy it back from publishers.

Want one? Buy one!
We're asking a lot from faculty here, with the open access policy and the repository. We're asking them to trust us with their research, their work, and we librarians need to continually earn that trust. And that trust is part of success.

So to recap, institutional repository success is, to me, when you find the stuff, whether you notice the repository or not. When the repository is
  • Easy to use. 
  • Useful.
  • Interoperable, in that it works with what we have in terms of discovery platforms and search.  
  • Smooth and seamless, reducing friction so we don’t have to search in multiple places. That is, the IR can be unseen and still work! 
  • Consistently branded and marketed. Branding can be useful, so be consistent.
Thank you very much for the opportunity to present, and I look forward to your questions and comments. 


Take this with a grain of salt because I did not get the job.

Friday, May 16, 2014

The New York Times' Digital Strategy and "The Future of Libraries."

Last week the higher-ups at The New York Times did a bang-up job of reminding everyone that institutional sexism is real and pervasive. In addition, someone on The Times' payroll leaked to BuzzFeed a digital strategy document, titled Innovation Report 2014, that librarians would be wise to read.

To wit, The Times has a metadata problem: they lack both a controlled vocabulary and informal systems to tag stories behind the scenes, making it hard for reporters, writers, and digital content staff to make and promote connections.
“Without better tagging, we are hamstrung in our ability to allow readers to follow developing stories, discover nearby restaurants that we have reviewed or even have our photos show up on search engines.” (Page 41 of the report)
It took the Times seven years to come up with a “September 11th” tag, and there's still no “Benghazi” tag (41).
“Just adding structured data, for example, immediately increased traffic to our recipes from search engines by 52 percent.” (44)
That's the price of bad, or non-existent, metadata.
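The report doesn't say which structured-data format the Times used for its recipes. One common option is a schema.org Recipe object embedded in the page as JSON-LD; the sketch below assumes that format, and the recipe itself is hypothetical.

```python
import json

def recipe_json_ld(name, author, ingredients, steps):
    """Build a schema.org Recipe object wrapped in a JSON-LD <script> element."""
    data = {
        "@context": "https://schema.org",
        "@type": "Recipe",
        "name": name,
        "author": {"@type": "Person", "name": author},
        "recipeIngredient": ingredients,
        "recipeInstructions": [{"@type": "HowToStep", "text": s} for s in steps],
    }
    # Escape "</" so a value containing "</script>" can't end the element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">' + body + "</script>"
```

The same idea applies to library records: a machine-readable description of the thing, not just a human-readable page about it.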

There's more. The full Times report is hosted by the Nieman Journalism Lab, which also has excerpts. All images below come from that page.


Does this sound familiar, librarians? Do you think library websites are "gateways?" What is the role of content and discoverability?



The stuff that we, libraries and archives, have is valuable. But do we recognize opportunities when we see them? Gawker did. Phelps did. In reporting on the firing of executive editor Jill Abramson, The New Yorker did, scooping the Times on events that happened in the Times' own building.


Do we let the perfect be the enemy of the good? How afraid of mistakes, of failure, are we, even when we're surrounded by it?


Altmetrics: it's not just for scholarly communication.


Listen to your communities. Be responsive.

Your silos? They stink. They're often a product of organizational culture. They have implications for staff, and for communities.

The Times' Twitter account is run by its newsroom, while the business side of the Times handles its Facebook page, making for a confusing, incoherent public face for the paper.


“Because that's how we've always done it!”


Be curious. Seek continual improvement. Talk to people elsewhere, and steal their ideas. It's flattery. This is what conferences are for.


Again, it is okay to fail. I fail all the time, often in spectacular fashion. Failure is normal. Failure is natural. Try to create a culture where it is okay to take chances and okay to fail. And if something is failing, recognize it.


/Laughing
/Sobbing


Exit, Voice, and Loyalty.

The full report is worth a read.


Elsewhere on this site:
Glass Houses, Pots, Kettles
The End of "The End of Libraries"

Wednesday, April 16, 2014

Confessions of a Book Killer

Gather round, and I'll tell you a story.



In 2001 I worked for a large Midwestern research university on a grant from the National Science Foundation. I was tasked with digitizing a collection of books on non-Euclidean geometry.

Hang on. I'll wait here.

You were going to go to this page anyway, right? 

Do you have any questions?

Via Reddit.
Didn't think so.

Some of the books were old, dating back to the seventeenth century. Most were published in the nineteenth century, when non-Euclidean geometry was first recognized as a field worthy of study in Europe.

Back in those heady days, digitization was also called "digital conversion" or "digital preservation," though how these texts were preserved made those phrases sound rather Orwellian. I separated content from container, meaning, I took the books apart. I removed the pages from the covers and spine, and then I took the pages over to a book guillotine, which is exactly what you think it is.

Something like this, via Reddit's r/oddlysatisfying
When the blade of a book guillotine presses down, the middle pages of a book sometimes bulge out, and text too close to the spine can be lost. I learned the hard way to break the books up into more manageable batches of pages. Text literally cut off by the guillotine had to be obtained via interlibrary loan.

After cutting, I shrinkwrapped the pages, and shipped them to Nogales, Arizona. Then they were trucked across the border to a land that labor and environmental standards forgot, the "other" Nogales in Sonora, Mexico.

Weeks later, I'd get the pages back, along with a CD-ROM full of .tiff (Tagged Image File Format) files. One page per tiff, as you might imagine. Sometimes there was enough room between the text of the cut pages and the spine to rebind the books, but not always. And not usually. And once some of the mathematics faculty found out, they were concerned.

I would perform quality control on these tiff files, making sure they were legible and level, which sometimes included holding a protractor up to a computer monitor. Really. From there, I sent the tiffs to colleagues who ran optical character recognition (OCR) on them, making them text-searchable, or, in today's parlance, discoverable. It took multiple passes through OCR to turn these files into text-searchable files, and the process was fraught with errors. Umlauts, for example, turned any letter below them into two "i"s. Other accent marks turned "e"s into "6"s. It wasn't always pretty. And once some of the mathematics faculty found out, they were even more concerned.
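The protractor check boils down to measuring how far a line of text tilts from horizontal. A minimal sketch, assuming you can pick two points on one baseline of a scanned page; the coordinates and the half-degree tolerance are hypothetical, not what the project actually used.

```python
import math

def skew_degrees(x1, y1, x2, y2):
    """Angle, in degrees, between a text baseline and the horizontal.

    (x1, y1) and (x2, y2) are two points on the same line of text, in pixel
    coordinates. Image y usually grows downward, so the sign only tells you
    which way the page tilts under that convention.
    """
    return math.degrees(math.atan2(y2 - y1, x2 - x1))

def is_level(x1, y1, x2, y2, tolerance_deg=0.5):
    """True if the baseline's tilt, in either direction, is within tolerance."""
    return abs(skew_degrees(x1, y1, x2, y2)) <= tolerance_deg
```

A baseline that drops 5 pixels over 1,000 tilts about 0.29 degrees and passes; one that drops 20 pixels tilts about 1.15 degrees and would go back for a rescan.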

However, no one was as concerned as Nicholson Baker, who was so concerned he wrote a book about the seemingly haphazard ways in which libraries digitized material without regard for the source. Baker's book, Double Fold: Libraries and the Assault on Paper, was published as I was chopping up these texts. In Double Fold, Baker cited my boss' boss multiple times, often, according to my boss' boss, out of context. Have a look.


Baker's book sparked a firestorm in the library and information science fields, culminating in an appearance at the American Library Association Annual Meeting in San Francisco that June. It was the first ALA Annual I attended.

Anyway, if you come across some nineteenth-century non-Euclidean material in a database, that was probably me. You're welcome.

Where are they now?
  • My then-boss' boss is now head of the preservation department at the University of Maryland. 
  • It later came out that Baker was storing archival materials in a high-humidity environment, a you-store-it warehouse site next to a river in New Hampshire. According to one library listserv, he also used Post-it Notes as bookmarks. He also wrote a book on pacifism and World War II, Human Smoke, that was widely criticized. The Association of Research Libraries website maintains a page on preservation that exists, in large part, because of Baker. 
  • I felt bad about cutting up some of the books, so I set aside several third-edition texts by Isaac Newton and a first-edition Gottfried Wilhelm von Leibniz. 
  • Technological advances: the spread of sophisticated book mounts and cameras, and declining costs associated with them, have limited the above practices.


Thursday, October 31, 2013

Data and the Surveillance State: Toward a New Ecology of Libraries

Image from the film The Lives of Others. It's excellent. Go see it.
Years from now, we're going to need someone to help us make some sense of the surveillance state (b. 2001), which collects vast amounts of our data, which begets more data about that data.

In short, we're going to need librarians and archivists.

The data that the state collects can and will be used against it later. History has borne this out. Truth and Reconciliation commissions, court cases, oral histories... archives are sites of contestation, of resistance. Archives are an opportunity to build new power structures, to speak truth to official versions of events.

And to ensure that future generations have access to this data, we'll need librarians and archivists right now, too. Privacy is now a good, a commodity, and it's one that information professionals can offer.

Last year I visited the Baltimore Aquarium and was impressed with how conservation was embedded into the building. It's not just a place to see fish, but a place to learn about how to keep those fish around. We need to do this for privacy, for sensible copyright law, and for open access materials, among others.

The ecology of libraries should look more like that of the aquarium.
  • Secure browsers, search engines and email platforms, to the extent that these are possible.
  • In library instruction "one-shot" sessions, educate patrons not just on how to select sources for a particular task, because: 
our teaching must go beyond tools and skills, so that we can help students understand how information fundamentally works. This means exploring the moral, economic, and political context within which we create and share ideas. Access to information, she writes, is not enough. Our students need to see themselves in the context of "individuals and groups of people actively shaping the world as knowledge producers in a way that renders the consumer-producer dichotomy irrelevant." (The incomparable Barbara Fister quoting Christine Pawley)
  • Discovery platforms that take open access, embargoes, and paywalls into account; educating people while they search.
  • Notifications in the stacks and the catalog concerning
      • banned and challenged books, and 
      • items that are affected by copyright extensions.
  • Organizations and member institutions that fight for privacy, like the American Library Association (ALA) and the International Federation of Library Associations (IFLA). 
Source is the above link. Glorious, isn't it? 
And more.

We're going to need someone, sometime in the future, to remind us how and when we lost our damn minds. Let's build for this now.

Elsewhere on this site, related:
The Library as Aquarium, or, The SOPA Post

Friday, October 18, 2013

Your Special Collections Won't Save You


There are people who need a unique item to do research, but those people won't save your library. The same is true of your special collections, your unique items.

Here's how this will go down: the vast majority of researchers who use your special collections are going to publish in their niche subjects, read by a handful of their peers, most likely in closed access journals. No fame. No fortune. The same old, same old.

All libraries, regardless of who they serve, have to prioritize. No doubt every patron is valuable. Are the above patrons the most valuable in an academic library? I'm skeptical.

Why is this the case? Because those collections are special for a variety of reasons, which cut both ways. There are reasons an item you have is the only copy; often the demand isn't there for more of them. Can we librarians and archivists drive demand? To some extent, yes. To the extent that special collections are what makes a specific academic library desirable for large segments of the communities we serve? Again, I'm skeptical. But hey, try it, because the following scenario might happen.
Special collections moved to an area of prominence, no longer behind closed doors. Unique books and manuscripts were of immense interest, a catalyst for research, integration into the curriculum, student internships, and user-driven content. (Source)
Yes, a modern-day Alexandria, where people come from miles around to use your special collections and your expertise. Except there's this thing called satisficing, where researchers find items and sources that are "good enough," (pdf) as opposed to the perfect item, which may be tucked away in an archive or special collection. And yes, even tenured faculty practice this behavior at times. And those special collections are special to us, but maybe not to our communities.

At my place of work (MPOW), we're using Omeka to help us digitize some of our unique collections. We're not doing it because we think doing so will boost our dwindling circulation or use of the library's physical items. We're doing it because we want to preserve our past, our heritage, who we are, for the future. That is part of our mission within our community. We have these resources, and we want to make them discoverable. However, it has never been our top priority, nor do I foresee a time or situation in which that will be the case. We spend far more time, more productive time, "buying love," as Rick Anderson might say.

What I'm particularly concerned about here is that, per usual, discussions of what R1 institutions should do are driving discourse in academic librarianship. Why is this? The vast majority of academic librarians don't work at an institution with multiple Associate University Librarians, yet those places seem to dominate the conversation around what academic libraries should be doing. Much of this is because R1 libraries have the staff and budgets that, in theory, allow them not only to have these conversations, but to implement them. The money is necessary, but not sufficient, perhaps. And yes, I'm jealous of those staffs and budgets.

I'm not saying special collections are worthless. I'm saying they're worth less than you think they are and less than the current literature says they are. I'm saying that this probably isn't the hill you want to die on. Go look someplace else. At MPOW, we'll be looking to aquariums and zoos for inspiration.


My favorite response to Rick Anderson's "Can't Buy Us Love" is from Steven Harris.

Tuesday, December 18, 2012

New Year, New Library: Digital Preservation on the Cheap, Book Mount Edition

As we've gained staff this year, we're able to work on some projects that had been on the back burner in previous years, such as taking stock of our woefully neglected rare books room. As far as I can tell, there hasn't been a proper accounting of what's in there, and we now have eager part-time librarians and interns who want to learn about preservation, digitization, and the original cataloging that comes with those topics. I repeat: you have not properly preserved or digitized an item until there is robust metadata to go with it. End rant.

One of our part-time librarians has a nice camera, and has been trained to catalog, so what's left is to train this staff member to preserve and digitize. Book mounts can be remarkably expensive for being a few pieces of foam, but we have a trick up our sleeve: Michaels. Yes, the craft store.

My first library job was in preservation for a small theological library in New York City, where I blew a bunch of money on Mylar, and my third was guillotining and digitizing books at a large Midwestern university (yes, I was there for Double Fold, in which two of my bosses are quoted out of context), so I have some background in this area of librarianship. If you are into preservation, Michaels should be your best friend.

That large foam board is $5.99. The four cones are $3.99 each. Place two cones on each side and slide them closer together or further apart depending on the angle you want. Those three sheets of felt, with sticker backing, are $0.99 each. Use them as needed, on the cones and on the board, to reduce slippage and to protect the book or pamphlet. The sum of these materials is $26.42, including Maryland state tax. A book mount costs approximately ten times this. Budgets are tight. Get creative.
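The arithmetic behind the $26.42 figure, as a small sketch. The 6 percent rate is assumed (Maryland's general sales tax rate), and it reproduces the total above. `Decimal` keeps the cents exact, which floats would not.

```python
from decimal import Decimal, ROUND_HALF_UP

MD_SALES_TAX = Decimal("0.06")  # assumed Maryland general sales tax rate

# (item, unit price, quantity), from the prices listed above
SUPPLIES = [
    ("large foam board", Decimal("5.99"), 1),
    ("foam cone", Decimal("3.99"), 4),
    ("adhesive felt sheet", Decimal("0.99"), 3),
]

def total_with_tax(line_items, rate=MD_SALES_TAX):
    """Return (subtotal, tax rounded to the cent, total)."""
    subtotal = sum(price * qty for _, price, qty in line_items)
    tax = (subtotal * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return subtotal, tax, subtotal + tax
```

Here the subtotal is $24.92, tax is $1.50 (rounded up from $1.4952), and the total is $26.42, so a roughly tenfold commercial book mount would run somewhere around $260.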