Digitizing the Archives of the Afro-American Newspapers

Photos: Afro Newspaper/Gado/Getty Images

In 1892, formerly enslaved person, Civil War veteran, and Baltimore citizen John H. Murphy Sr. had a vision for a new kind of news source: one that would serve the city’s African American population. Merging several church newsletters, and borrowing $200 to buy printing presses, he launched the Afro-American Newspaper.

Murphy spent the rest of his life scaling up the paper. He later passed it down to his family, who stewarded it through multiple generations and continued its expansion. In its heyday, the paper had nine editions in 13 East Coast cities.

Today, more than 120 years after its founding, the Afro continues to operate. Even more remarkably, it has remained in the Murphy family. It’s still a major news source in Baltimore and D.C. and holds the title of the longest-operating family-owned African American paper in the United States.

Murphy led a vibrant and storied life, serving in the United States Colored Troops during the Civil War and crusading against Jim Crow laws. The Afro’s history is equally storied. Its reporters and photographers documented Plessy v. Ferguson, bore witness to riots and lynchings, and rubbed elbows with the likes of Jackie Robinson, Martin Luther King Jr., and Malcolm X.

Perhaps most remarkable, though, is the history the paper has amassed in its century of operation. In an era when many newspapers have actively sold off their photographic archives to raise revenue, the Afro has aggressively preserved and carefully stewarded its own. The result is that its archives contain a collection of 1.5 million photographs dating back to the paper’s founding. It’s likely one of the best Black history collections in the world and has been called a “national treasure.”

I first encountered the Afro’s collection while working on an oral history project in East Baltimore. I tagged along with a colleague and visited the Afro’s archive, looking for a historical photo of the neighborhood I was studying. What I found there blew me away.

In a meandering series of rooms filling the back portion of a nondescript building on Baltimore’s North Charles Street sat thousands upon thousands of boxes, floor to ceiling, filled with 8x10 photographs. History literally spilled from these boxes, with photos covering tables, desks, even walls. You could open a box and find original photos of Aretha Franklin, a 1930s wedding, or a protest — basically any event, large or small, personal or national, of the past century.

As a techie (and especially one in 2010), my first thought was, “This has to be digital.” The Afro had preserved its collection through two world wars, the civil rights era, and even a sinkhole that threatened to swallow the paper’s headquarters. But like most papers at the time, the Afro had just begun the process of digitizing its holdings. With the brashness of a young programmer, I proposed an ambitious project to rapidly digitize the paper’s holdings. The Afro accepted, and together we sought — and won — grant funding to get started.

The challenge before us was vast. The remarkable diversity of the Afro’s photos made them incredibly challenging to scan. You could open a folder and find a century-old photographic print, a glossy promotional photo from the 1960s, a contact sheet from a staff photographer — even a solid metal printing plate.

Most of the paper’s photos were annotated, an incredible rarity in the archives world and a big part of the collection’s historical value. But the way they were annotated was another big challenge. Most had typewritten or handwritten notes from the original photographer, copied onto onion skin (an impossibly thin paper that was used to make duplicates in typewriters and that happens to age extremely poorly) and pasted onto the back of the photo.

All this meant that we couldn’t simply run the Afro’s materials through an auto-feed scanner. The onion skin would tear off, and a single paper jam could destroy a century-old historical artifact. But we also didn’t have the resources to take the standard approach — hiring tons of people and having them flip photos all day long on a flatbed scanner.

Physical security was also a concern — the Afro’s materials are extremely valuable, and thus a target for theft. Entering the archives meant being buzzed through three levels of secure doors, like entering a bank vault. Nothing could leave the building, so hiring an outside bulk-scanning contractor was out.

Our solution, naturally, was to build a robot. Initially, this was a giant, hefty, improvised three-axis CNC machine that I hacked together in a local machine shop using the open-source prototyping platform Arduino, medium-density fiberboard, and parts ordered from McMaster-Carr. It used a suction cup, vacuum cleaner, and servo motors to lift photos onto a flatbed scanner, digitize them, and gently put them down in a pile. It was devilishly slow and finicky, but it proved the concept.

Eventually, with help from outside contractors, students and support from my alma mater, Johns Hopkins University, and a friendly group of Finnish research scientists, this evolved into a sleek, 3D-printed, open-source arm (still Arduino-powered, but with a custom circuit board for a brain) that could lift photos and scan them on a flatbed at a rate of about one photo every two minutes.

Note to other creators: People love robots — I’m reluctant to even discuss it here, since I know doing so will restart a stream of people reaching out and attempting to buy it. I got to present it at the popular Python conference PyCon and fly to California to test it in libraries. It was featured in Forbes and the Wall Street Journal. I did numerous live demos, which seem insanely risky and ambitious now but felt like a good idea at the time (and generally went okay). The robot eventually found a home as part of a digitization tech initiative at Aalto University, where it helped educate a new generation of Finnish industrial designers.

But before all that, we put it to work in the Afro’s archive. Under the supervision of an eternally patient attendant, the robot moved and scanned photos for more than a year. In an interview with the Baltimore Sun, Afro archivist John Gartrell said, “I probably scanned in 5,000 images myself. [The robot has] taken out so many steps.” We also built Python software to trigger a manual scanner directly through its TWAIN driver, allowing for fast manual scanning without the fussiness of user interfaces. Ultimately, this worked just as well as the robot, though no major media outlets were knocking down our door to write about it.

After a year in the archive, we had scanned about 120,000 images. That’s roughly 10% of the overall archive and probably 80% or more of its most historically significant materials. John Oliver Jr., CEO of the Afro at the time, wrote, “For many years, our archives were largely inaccessible.” While the paper had created a number of new archival products, he said, “We are most excited about the potential of Tom Smith and [Gado]. Their work… this past year has given a dynamic visual element to our history.”

Soon, though, we discovered we had another, much bigger problem. Many digitization initiatives at the time — ours included — took an “if you build it, they will come” approach. The assumption was that if you create a fantastic digitized collection, people would line up to use it in droves.

This turns out not to be true. The internet is littered with broken links to mid-2010s initiatives that created wonderful digital collections and failed to find an audience for them. Unless you actively process and promote a collection, it ends up sitting on a digital shelf that’s just as dusty as the physical one it was created to supplement (or, in truly tragic cases, replace).

After the digitization initiative, the Afro had a wonderful collection of digital history that was barely being used outside the paper itself. Recognizing that this was a common problem — and having just finished another venture — I co-founded a new company, Gado Images, to help address it. The Afro was our first customer.

My team now faced a challenge much larger than the original one of digitizing the paper. Using the limited resources of a startup, we had to make sense of the content of every photo in the Afro’s digital collection. Each needed keywords, associated text, and other metadata to make it searchable. People in each photo needed to be identified.
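To make "searchable" concrete, here is a minimal sketch of what a per-photo metadata record and a keyword lookup might look like. It is an illustration only, not the system we built: the `PhotoRecord` fields, the `build_index` and `search` helpers, and the sample records are all hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PhotoRecord:
    """Hypothetical metadata for one scanned photo."""
    photo_id: str
    keywords: list = field(default_factory=list)
    caption_text: str = ""  # e.g. text recovered from the back of the print
    people: list = field(default_factory=list)


def build_index(records):
    """Map each lowercase term to the set of photo IDs whose metadata mentions it."""
    index = defaultdict(set)
    for rec in records:
        terms = rec.keywords + rec.people + rec.caption_text.split()
        for term in terms:
            # Lowercase and trim trailing punctuation so "Church," matches "church".
            index[term.lower().strip(".,;:")].add(rec.photo_id)
    return index


def search(index, term):
    """Return the sorted IDs of photos whose metadata contains the term."""
    return sorted(index.get(term.lower(), set()))


# Placeholder records, invented purely for illustration.
records = [
    PhotoRecord("demo-001", keywords=["wedding", "church"],
                caption_text="Wedding party outside church, 1930s."),
    PhotoRecord("demo-002", keywords=["protest"],
                caption_text="Marchers carry signs.",
                people=["Example Person"]),
]
index = build_index(records)
print(search(index, "wedding"))
print(search(index, "Example Person"))
```

The point of the sketch is the shape of the problem: a photo with no keywords, caption text, or named people contributes nothing to the index, so no search can ever surface it.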

To accomplish this, we leaned heavily into A.I., at a time when it was just starting to affect the visual media world. Initially, this meant building everything ourselves, which in turn meant spending untold hours working through the intricacies of the Open Source Computer Vision Library (OpenCV) and dissecting open-source OCR engines like Tesseract.

As the deep learning revolution ramped up and big companies started offering A.I. as a service, it meant (and still means) integrating with an ungodly number of APIs. We started out by running the paper’s entire collection through Google Vision’s OCR — including those vitally important onion-skin sheets of original captions and photographer notes. That gave us tens of thousands of snippets of text. We mined these using IBM’s Watson, pulling out concepts, references to people and places, and the like.

We then used three different auto-tagging A.I. platforms to tag the Afro’s photos based on their visual content, comparing the platforms’ outputs to each other to increase accuracy. With a pretrained model from IBM, we identified historical figures in the photos based on their faces. (IBM recently announced it would exit facial recognition altogether.) We considered tagging people based on gender and age, using face-derived data, but as many others have realized, facial recognition tech, being trained primarily on Caucasian faces, does poorly when analyzing the faces of people of color. The face analysis algorithms we tested consistently misidentified races and genders and often missed people’s true ages by a decade or more. Faced with this algorithmic bias, we decided to drop these tools from our process.
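One way to picture the cross-platform comparison is as a simple vote: a tag survives only if at least two of the three services propose it. The sketch below is a minimal illustration under that assumption, not a record of our production pipeline. The platform names, the sample tags, and the `min_votes` threshold are hypothetical.

```python
from collections import Counter


def consensus_tags(platform_tags, min_votes=2):
    """Return tags proposed by at least `min_votes` platforms, sorted alphabetically."""
    votes = Counter()
    for tags in platform_tags.values():
        # Normalize case and whitespace; the set lets each platform vote once per tag.
        votes.update({t.strip().lower() for t in tags})
    return sorted(tag for tag, n in votes.items() if n >= min_votes)


# Hypothetical outputs for a single photo from three unnamed tagging services.
example = {
    "platform_a": ["Crowd", "protest", "street"],
    "platform_b": ["protest", "sign", "crowd"],
    "platform_c": ["parade", "Street", "crowd "],
}

print(consensus_tags(example))  # ['crowd', 'protest', 'street']
```

Raising `min_votes` to 3 trades recall for precision: only tags that every service agrees on are kept.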

We then fed all this data into a massive neural network, with the goal of having it automatically write descriptions for each photo. The initial output was dismal. I remember the system captioning a photo of a local dignitary as “a nun with a fish.” Unless we were planning to restart the Dada movement, that wasn’t going to cut it.

Over time, though, the system improved. Except in very limited cases, it’s never gotten good enough to write descriptions totally unsupervised. But it creates fantastic finding aids that human researchers can use to home in on the juiciest parts of the collection for professional captioning. We also received access to folder-level data on the entire collection from a sister project and integrated much of this into our system. And the neural network itself served as the basis for a project to map historical Black social networks with the Black Press Research Collective.

Armed with several thousand of the Afro’s best photos — now with full captions and a solid set of metadata — we struck deals with several media marketplaces (including Getty Images) to act as our distributors and represent the collection to media buyers.

Today, more than five years down the line, the effects have been dramatic. The Afro’s photos have been used in more than 7,000 news stories, documentary films, blogs, advertisements, textbooks — even a video game. Playwright Alonzo Lamont even used them to create an original play, East Side Story, which was performed in East Baltimore. Through licensing, the photos provide an important source of revenue to support the paper’s reporting.

With the recent murder of George Floyd, the Afro’s photos provide an incredibly essential element: context. They show that racially prejudiced policing and the murder of Black citizens have a distressingly long history.

The paper has images of young African American men being arrested for minor crimes, such as carrying an unapproved type of knife, dating back to the 1930s. The paper also has powerful and affecting images of lynchings, from both its staff photographers and partners. These include documentation of the murder of Emmett Till, an unarmed Black teenager, which has haunting similarities to George Floyd’s death today. Rosa Parks reportedly had Till in mind when she refused to give up her bus seat, and his death is one of the turning points in the civil rights movement — just as many hope Floyd’s death will be a turning point today.

The Afro’s archive also captures the long history of Black activism. Photos dating to nearly the turn of the 20th century show Black citizens picketing, marching, protesting racial injustice, demanding unencumbered voting rights, fighting for equal pay, and more.

These photos are a powerful reminder that today’s events are not simply another turn of the news cycle. Many have undercurrents and roots that extend back a century or more, which the world is long overdue in addressing. With issues this endemic, change won’t happen through some blacked-out Instagram feeds and corporate platitudes. It will require sweeping changes to policy and structural elements of our society, as well as the continued voice and advocacy of organizations like the Afro.

I love the Afro’s intimate images of Jackie Robinson with his family, and I find the paper’s images of protests deeply affecting. But some of my favorite images from the Afro’s archives are those that show daily life: weddings, civic awards ceremonies, local businesses launching, Elks conventions, Alpha Kappa Alpha galas, and the like.

They’re a powerful reminder that on most days in the Black community — as with all communities — not much happened. And it’s an important reminder since most mainstream papers of the 20th century covered only Black exceptionalism: civil rights marches, violence, famous athletes, celebrities. To a large extent, that’s true today, too.

As scholar Kim Gallon of the Black Press Research Collective writes, “The Black Press was founded in response to the distortions and ugly untruths that white newspapers often published about African Americans.” She quotes journalists Samuel Cornish and John Russwurm: “Too long have others spoken for us. Too long has the public been deceived by misrepresentation.” That sounds like it could easily be a Twitter post by an activist today, responding to mainstream coverage of Floyd’s death. But those words were written in 1827.

As Gallon explains, papers like the Afro were created because “the only way to respond to the power of the early nineteenth-century white press and its regular false and negative depictions of African Americans was to develop Black news outlets that would counteract and offer an alternative to these images.”

Throughout its history, the Afro has done exactly that. Its depictions of daily life — thriving small businesses, civic engagement, city beautification, even society and gossip — counteract coverage that focuses only on negatives or the sensational. They’re the “alternative images” that Gallon discusses, and they’re a vitally important part of the historical record.

Its images are also a powerful way to connect communities to their past. While sitting in on rehearsals for East Side Story, I vividly remember when the Afro’s images were projected behind the stage for the first time. One of the actors nearly broke down. He was from Baltimore’s Middle East neighborhood. The photo happened to be of his father.

This also speaks to the power and importance of tech-driven mass digitization. The standard approach to scanning a commercial archive is to focus on the most valuable 1% to 2% of the collection. Almost invariably, this means capturing images that cover famous people and major events. The everyday, being less profitable, is left out.

Digitizing a whole archive (or at least a massive sample of it) affords the opportunity to capture both the iconic, highly profitable images and those that document daily experience. Today, this is much more doable than when we began at the Afro a decade ago. Modern scanning tech like the Phase One overhead camera can scan hundreds of images per hour, and sheet-feed scanners today can scan delicate materials without damaging them. For institutions that can afford the tech, there’s no excuse not to digitize everything.

Likewise, our work with the Afro is a reminder that A.I., big data, and even facial recognition need not be forces for domination and evil. The stories we hear about these technologies often focus on the Clearviews and Palantirs of the world. Deep learning and A.I. can certainly be mobilized for nefarious purposes — and the world needs to remain vigilant about these applications. But deep learning and A.I. also provide an increasingly powerful toolkit, and one that is becoming more accessible even to small organizations like ours.

The first step, though, is to embrace these technologies and apply them. It’s a testament to the innovation of the Afro that the paper has remained both viable and independent in an age when many news outlets are banding together into massive conglomerates just to keep existing.

It helps that innovation and outside-the-box thinking are in the paper’s DNA. The MDDC Press Association called the Afro’s early newsrooms “incubators” for their journalistic innovation, and the paper has been credited with launching “the careers of nationally recognized journalists.” Early on, the paper centralized its printing facilities and bought state-of-the-art equipment to ensure its ability to continue printing six days per week. I saw that innovation firsthand when the paper’s leadership listened to a fanciful pitch from an unknown twentysomething about using robots and A.I. on their century-old archive, to which they said, “Okay, let’s give it a shot.”

I’m not of the community that the Afro primarily serves. I can’t purport to know what it’s like to be Black in America today — or at any other time in history — and the daily struggles that entails. But in working with the Afro, I’ve always felt that there was a shared ethos — innovation, risk-taking, family business, a willingness to think differently from the mainstream — that binds across other lines.

At its worst, tech can be a divisive force that leads to suppression and alienation. But at its best, it can be a unifying set of practices and beliefs that connects people and organizes their actions. When John H. Murphy Sr. launched the Afro, the word “startup” was still a solid 80-plus years from entering the lexicon. But the process of recognizing a problem in the world, pouring your heart and soul into solving it, maintaining independent thought, and actively challenging conventional wisdom is one that would be familiar to any startup founder today. And even more than 120 years later, that aspect of Murphy’s spirit is still alive and thriving at the paper.

I know that the Afro’s images will prove to be a valuable resource as we attempt to come to grips with events like George Floyd’s death. I hope they provide a crucial context and background to inspire change. But I also hope they’re a reminder of the resilience of communities and institutions. Murphy’s paper has served the world well for the past 128 years. In the difficult times we now face, the Afro and its archive will continue to serve it well today.

This article originally appeared in Onezero.
