Friday, January 31, 2025

Large-scale infrastructural changes to WWA site

Much has changed since we last posted on the Whitman Archive Changelog. In March 2024 we unveiled our redesigned website, the work of several years and many individuals, made possible by the generous support of the National Endowment for the Humanities. Users may notice changes, large and small, including reorganization of the site's major sections; improvements to image browsing, to searching and faceting of results, and to support for mobile devices; and new content. Other changes and improvements will be less obvious but are important for the sustainability of the project.

The most substantial change was the migration of the site architecture from the Apache Cocoon framework to the Ruby on Rails web application framework. The new site also makes use of Datura, a storage and processing system developed by Nebraska's Center for Digital Research in the Humanities (CDRH). Datura is a set of scripts that pre-generates HTML from the TEI XML and stores the derived HTML along with the data in a defined structure on the server. This has several advantages: it is less resource intensive for the production site, it allows us to track changes to the front end (generated) code as well as the back end, and it allows us to make our transformed HTML easily available to others who don’t have the technical infrastructure to work with TEI XML. The scripts use several programming languages, among them XSLT, which means we can reuse much of our existing custom programming. The scripts also pull data from the TEI XML and create an intermediary JSON file, which can be used to push data to a search engine such as Elasticsearch.
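To make the pre-generation idea concrete, here is a minimal sketch in Python. It is not Datura's code (Datura relies on its own scripts and XSLT); the function names, the sample TEI document, the HTML class names, and the JSON field names are all assumptions made for illustration. It shows one TEI source yielding two derived outputs: static HTML for display and a JSON record that could be sent to a search index.

```python
import json
import xml.etree.ElementTree as ET
from html import escape

# TEI documents live in this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
# ElementTree exposes the xml:id attribute under this expanded name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, heavily simplified TEI file (not a real Archive document).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="sample.00001">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A Sample Poem</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <lg><l>First line of verse</l><l>Second line of verse</l></lg>
    </body>
  </text>
</TEI>"""


def verse_lines(root):
    """Return the text of every <l> (verse line) in the TEI body."""
    return [(line.text or "").strip()
            for line in root.iterfind(".//tei:body//tei:l", TEI_NS)]


def tei_to_html(tei_source):
    """Pre-generate a static HTML fragment from the TEI source."""
    root = ET.fromstring(tei_source)
    spans = [f'<span class="line">{escape(text)}</span>'
             for text in verse_lines(root)]
    return '<div class="poem">\n' + "\n".join(spans) + "\n</div>"


def tei_to_json(tei_source):
    """Build an intermediary JSON record suitable for a search index."""
    root = ET.fromstring(tei_source)
    record = {
        "identifier": root.get(XML_ID),
        "title": root.findtext(".//tei:titleStmt/tei:title",
                               default="", namespaces=TEI_NS),
        "text": " ".join(verse_lines(root)),
    }
    return json.dumps(record)


if __name__ == "__main__":
    print(tei_to_html(SAMPLE_TEI))
    print(tei_to_json(SAMPLE_TEI))
```

Because both outputs are derived ahead of time, the production site only has to serve files and query the index; it never transforms XML on request.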

Our infrastructure work also involved the creation of an Application Programming Interface (API) for all the WWA data, and it is from the data in the API that the new site is built. The API draws data from an Elasticsearch instance and reformats it into a generalized format, which will be described using the OpenAPI specification, an emerging standard for web APIs and generated documentation. For the front end, the CDRH has created a Rails Engine which connects to the API, creates browse and search interfaces, and displays pre-generated HTML. These three pieces together (a storage and processing system, an API, and a front end) constitute the building blocks of our new, sustainable infrastructure for making the Whitman Archive’s rich data available, searchable, and browsable.
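The reformatting step can be pictured with a short sketch, again in Python and again hypothetical: the function name and the output keys ("count", "items") are assumptions, not the schema of the actual API. The input follows the general shape of an Elasticsearch search response, in which matching documents appear under hits.hits with their stored fields in _source.

```python
def reformat_search_response(es_response):
    """Flatten an Elasticsearch-style search response into a simpler,
    engine-neutral structure that a front end can consume."""
    hits = es_response.get("hits", {})
    total = hits.get("total", 0)
    # Newer Elasticsearch versions report the total as an object with a
    # "value" key; older versions report a plain integer.
    count = total.get("value", 0) if isinstance(total, dict) else total
    items = []
    for hit in hits.get("hits", []):
        item = {"identifier": hit.get("_id")}
        item.update(hit.get("_source", {}))  # stored document fields
        items.append(item)
    return {"count": count, "items": items}


# Hypothetical response containing a single matching document.
SAMPLE_RESPONSE = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {
                "_id": "sample.00001",
                "_score": 1.0,
                "_source": {"title": "A Sample Poem", "category": "poetry"},
            }
        ],
    }
}

if __name__ == "__main__":
    print(reformat_search_response(SAMPLE_RESPONSE))
```

Keeping the search engine's own response structure behind the API means the front end depends only on the generalized format, so the engine could change without the site's templates changing.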

On the front end, the site will likely appear much as it did before; however, some of the major site sections have been renamed or reorganized. The site's search pages, both the full-site search page and the separate, section-specific search pages, have been updated and improved. We have added a new sub-section containing an "Index of Works," which allows users to see all instances (either manuscript or print) of a given work (a poem or other Whitman-authored text) across the Archive, as well as pages for several more Whitman "Disciples."

- Kevin
