
By Toby Steiner, Javier Arias, Rupert Gatti, Ross Higman, Hannah Hillen, Amanda Ramalho and Vincent W.J. van Gerven Oei
DOI: https://doi.org/10.70950/jlke1310
Over the past three years, Thoth Open Metadata has continuously grown as community-led not-for-profit infrastructure for open access books. With this blog post, we revisit recent activities ranging from platform development and metadata research to engagement with publishers, library collaborations, multiple infrastructure partnerships, and collective work with others active within the Copim community – with a particular focus on the Open Book Futures project, while also considering our activities within the Barcelona Declaration community of practice, the COMET initiative, the OPERAS network, and many other areas of work.
Open access books do not thrive through openness alone. They also need high-quality, reusable metadata that can travel well: across publishing platforms, library systems, repositories, catalogues, and knowledge bases. Without that, even openly available books remain hard to find, hard to integrate, and difficult to sustain.
Metadata, in other words, is not a technical side issue. It is part of the core infrastructure that helps open books circulate. Finding open solutions to the challenge of metadata management, dissemination, and archiving has been at the core of Thoth’s work over the last three years. Over the last six years, we have worked hard to establish a fully open, community-led infrastructure for the management and dissemination of metadata for open access books, while also contributing to wider conversations, establishing new collaborations, and expanding standards work – all of which is needed to integrate OA books more fully within the wider discourse that still remains focused on journals.
A major context for this work has been Thoth’s key stakeholder role within Open Book Futures. As part of the wider Copim Community that has grown out of the first project phase of COPIM (2019-23) and continued through Open Book Futures (2023-26), Thoth has contributed to the development of non-profit, community-led infrastructures for open access books.
Within that work, Thoth has played an important role in establishing open workflows that ensure that open access monographs are not only published, but properly disseminated across platforms, supply chains, and discovery systems. As Open Book Futures reaches its conclusion at the end of April 2026, that feels like an important moment. But it is not an endpoint. Through Thoth, the infrastructures, partnerships, and shared practices developed as part of this work will continue beyond the lifetime of the grant.
Over the past two years, Thoth Open Metadata has continued to strengthen its open platform as shared infrastructure for smaller and medium-sized independent scholar-led and institutional university and library publishers. A central aim here has been to reduce duplication of effort at the publisher’s end: the platform now serves as a one-stop shop in which a publisher only needs to enter data once, enabling publishers to manage rich metadata from a single open source of truth that exports it in multiple industry-standard formats for wider dissemination.
That work has included the establishment of automated export workflows that generate ONIX, MARC, Crossref XML, KBART, JSON, CSV, and BibTeX in multiple specifications, while also enabling the direct integration of persistent identifiers such as ORCID and ROR, and controlled subject vocabularies such as Thema. Alongside this, we have continued developing documentation and standards guidance, and have updated our supply-chain mapping to make metadata practices more transparent, interoperable, and easier to adopt.
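To illustrate the "enter once, export many" idea described above, here is a minimal sketch in Python. It is not Thoth's implementation: the record, its field names, the citation key, and the `export` helper are all hypothetical, and only three of the simpler formats (JSON, CSV, BibTeX) are shown, using the standard library alone.

```python
import csv
import io
import json

# Hypothetical record: field names and values are illustrative,
# not Thoth's actual schema.
record = {
    "title": "An Example Open Access Book",
    "author": "Doe, Jane",
    "publisher": "Example Press",
    "year": "2026",
    "doi": "10.1234/example.0001",
}


def to_json(rec):
    """Serialise the record as pretty-printed JSON."""
    return json.dumps(rec, ensure_ascii=False, indent=2)


def to_csv(rec):
    """Serialise the record as a header row plus one data row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rec))
    writer.writeheader()
    writer.writerow(rec)
    return buffer.getvalue()


def to_bibtex(rec, key="doe2026example"):
    """Serialise the record as a single BibTeX @book entry."""
    fields = ",\n".join(f"  {name} = {{{value}}}" for name, value in rec.items())
    return f"@book{{{key},\n{fields}\n}}"


# One record, many output formats: each exporter reads the same source data.
EXPORTERS = {"json": to_json, "csv": to_csv, "bibtex": to_bibtex}


def export(rec, fmt):
    """Render the record in the requested format."""
    return EXPORTERS[fmt](rec)


if __name__ == "__main__":
    for fmt in EXPORTERS:
        print(f"--- {fmt} ---")
        print(export(record, fmt))
```

The design point is that every exporter reads from the same record, so a correction made once propagates to all output formats. Richer formats such as ONIX or MARC would need dedicated serialisers, but could slot into the same dispatch table.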
An important milestone in this trajectory came with the release of Thoth 1.0 on 1 April 2026, which introduced substantial changes and improvements across the codebase, schema, data model, workflows, migrations, and underlying infrastructure.
In practical terms, Thoth 1.0 brings together several strands of development that the team had been working on for quite some time. The release introduces a completely new GUI and backend, extended metadata workflows, as well as new metadata categories, automated ingest workflows, a new metrics dashboard and book-level usage widget, and a corresponding set of new services.
Alongside that, Thoth 1.0 introduces multilingualism as a key element across both the data model and the user-facing layers, rather than treating it as an afterthought or localisation patch. At the metadata layer, Unicode support has been foundational since Thoth’s inception: it removes the historical constraint of Latin-only encoding and enables the system to natively represent any script (e.g. CJK, Cyrillic) without lossy transliteration. This is not merely about display; it ensures semantic fidelity of bibliographic data across linguistic contexts. Building on that, the metadata model itself has now become explicitly multilingual. Core descriptive fields (such as titles, abstracts, and contributor biographies) can be instantiated in multiple languages within a single record. These are not separate records or translations bolted on externally; they coexist structurally, allowing, for example, English, Hungarian, and Dutch variants of a title to be stored, queried, and rendered as part of the same canonical object. This design supports parallel-language publishing workflows and improves discoverability across linguistic boundaries.
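As a rough illustration of what "multiple languages within a single record" can look like, the following Python sketch stores language variants of a title on one object and falls back to a canonical language when a variant is missing. The `Work` class, its fields, the language codes, and the sample titles are assumptions made for this example and do not reflect Thoth's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Work:
    """Hypothetical single record holding every language variant of a title."""

    doi: str
    # Maps a language code (assumed here: "en", "hu", "nl") to a title.
    titles: dict[str, str] = field(default_factory=dict)
    canonical_language: str = "en"

    def title_for(self, language: str) -> str:
        """Return the title in the requested language, else the canonical one."""
        return self.titles.get(language, self.titles[self.canonical_language])


# Illustrative data only: one record, three coexisting title variants.
work = Work(
    doi="10.1234/example.0002",
    titles={
        "en": "The Open Library",
        "hu": "A nyitott könyvtár",
        "nl": "De open bibliotheek",
    },
)

if __name__ == "__main__":
    print(work.title_for("hu"))  # Hungarian variant stored on the record
    print(work.title_for("de"))  # no German variant, falls back to English
```

Because the variants live on the same object, a catalogue can render the title in a reader's language without creating or reconciling separate records per language.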
All of this is complemented by multilingual UX and documentation (in Spanish, Brazilian Portuguese, and German alongside English), the integration of accessibility metadata, improved handling of JATS XML, self-service bulk ingest capabilities, enhanced catalogue services, direct content upload, and the provision of a dedicated OAI-PMH endpoint. Taken together, these changes mark a significant step in Thoth’s maturation: they introduce not only a more robust technical platform, but also a broader and more flexible infrastructure for publishers working across complex metadata and dissemination environments.
Alongside the Thoth 1.0 release, Thoth Open Metadata has also introduced a clearer service structure built around four main service packages. In keeping with the ancient Egyptian theme of our portfolio, we have named these Oasis, Obelisk, Sphinx, and Pyramid. In this setup, Thoth Oasis constitutes the always-free self-service entry point that enables publishers to access Thoth’s full metadata management capabilities, while also providing export in multiple open formats, including ONIX, JSON, MARC, KBART, CSV, BibTeX, and Crossref XML.
The three other service packages include added-value subscription-based extensions to the free provision of Oasis. Thoth Obelisk is positioned as the dissemination package, adding Crossref membership with automated DOI deposit, automated distribution and archiving, file hosting via CDN, ebook distribution to multiple aggregators, metadata distribution to indexes, and permanent ebook archiving through Thoth’s Open Archiving workflows. Thoth Sphinx expands this by adding a dedicated usage statistics offer, including privacy-first analytics across all platforms that Thoth distributes to, title-level usage widgets and a publisher metrics dashboard drawing on OPERAS Metrics and other sources. Thoth Pyramid extends this further still towards full-scale website and catalogue hosting, with branded publisher sites, an integrated catalogue, detailed book and chapter landing pages utilising Thoth metadata, and delivery through a global CDN – all of which can be made available through a publisher’s own web domain.
What this new structure does, above all, is make clearer how Thoth’s infrastructure can support publishers at different stages and with different needs: from free self-service metadata management, through dissemination and automated DOI registration and management, to usage metrics and fully hosted website and catalogue environments.
The distribution service has been a game-changer for us, with most of the quilt-like process of matching records to platform handled automatically. It was important to mediastudies.press, too, that Thoth was generating revenue from the community (us included) to take a step toward sustainability. The paid distribution service was worth it several times over for the value delivered alone; that mediastudies.press could also support shared community infrastructure was an important-to-us bonus. Jeff Pooley, mediastudies.press
The newly-introduced pricing structure reflects this substantial change in service layout: the switch from our previous per-title fee for dissemination to an annual subscription fee structure applicable across the extended set of services became necessary to better account for the standing operational costs that Thoth incurs. Our fee calculations have sought to implement a like-for-like coverage of fees compared to the previous per-title fee structure, while also seeking to enable Thoth to remain operational over time.
Alongside platform development and the introduction of a new service portfolio, Thoth Open Metadata has continued contributing to wider sector research on metadata standards and dissemination requirements for open access books. A major outcome of this work was the release in January 2026 of a joint scoping exercise titled International Metadata Recommendations, and Platform-Specific Requirements for Open Access Books and Chapters (announcement & full report).
This report responds to a persistent problem in the OA books landscape: metadata fragmentation. Following established industry practice, it distinguishes between essential and desirable metadata elements and maps platform-specific requirements, offering practical guidance for publishers, libraries, and service providers alike.
The proposed metadata framework is designed to empower publishers to adopt good metadata practice themselves, an approach that represents a key tenet of Thoth’s work and ethos. Thoth exists to make publishing of open access long-form publications easier – we are putting in the legwork of finding infrastructural solutions so individual publishers don’t have to. This is not only about implementing standards in technical systems. It is also about interpreting them, documenting them, and making them usable in real publishing contexts, especially for smaller scholar-led and institutional presses working with limited capacity.
[Our] ability to provide clean, rich book metadata and to make our OA books widely and easily available to readers everywhere is orders of magnitude better than it was before we partnered with Thoth. James Rice, White Horse Press
Thoth Open Metadata’s participation in Copim’s Archiving & Digital Preservation Group has also led to exciting new research into what it means to preserve the open nature of a scholarly publication, as compared to locking it away in a closed archive. This has resulted in the definition of Copim’s Open Archiving Criteria, and also led to the publication of a substantive research report titled Preserving Openness: A Comparative Review of Archiving Solutions for Open Access Books, which applies these criteria to five widely-used archiving solutions. The results in turn helped us expand and adapt the archiving workflows available for publishers through Thoth – and there are now two dedicated options available that enable publishers to easily implement open archiving within their day-to-day practices. As we outline in our Archiving service package, publishers can either take a self-service approach and do this themselves, or task Thoth with facilitating the dissemination to open repositories on their behalf.
Libraries have played an important role throughout the development of Thoth Open Metadata. During the last three years, Thoth has worked with library partners not simply as recipients of metadata, or as financial supporters of our work via the Open Book Collective – and we are tremendously grateful to the more than 30 international research libraries that are already doing so – but also as collaborators in shaping better metadata practice for open access books across the wider ecosystem.
Ongoing collaborations between librarians and the Thoth team have been key to enabling the integration of high-quality metadata exports in various MARC formats, and metadata librarian experts have contributed substantially to the above-mentioned industry report on International Metadata Recommendations. Next to that, our work with libraries became particularly visible in Thoth’s recent contribution to the CILIP Metadata and Discovery Group Conference 2026. Our presentation, “Fixing the Leaky Pipeline: Metadata Challenges & Open Community Solutions for Open Access Books”, jointly prepared with Emma Booth, University of Manchester Library, focused on a persistent challenge: open access books may be available online, but if their metadata is fragmented, incomplete, or difficult to reuse, they remain hard for libraries and discovery platforms to ingest and expose properly. The presentation argued for closer collaboration between publishers, libraries, platforms, and infrastructures, and pointed toward open community approaches as part of the solution. That same concern has run through Thoth’s wider work during this period: how to help open books move more reliably into catalogues, repositories, knowledge bases, and research workflows, while reducing friction for both publishers and libraries.
Alongside its direct infrastructure and research work, Thoth has also been active in wider sector initiatives concerned with open research information and collaborative improvement of scholarly metadata. This has included participation in the Barcelona Declaration on Open Research Information and in COMET.
To us, these initiatives matter because the problems facing OA book metadata do not exist in isolation. Questions around openness, reuse, interoperability, authority control, data stewardship, and shared responsibility run across scholarly communication as a whole. Thoth’s engagement in these spaces reflects a commitment to ensuring that books are part of those larger-scale conversations, and that the needs of long-form publishing, which plays an essential role in the otherwise neglected research areas of Social Sciences and the Humanities, are represented within broader work on open research information and metadata enhancement.
Another important strand of this period has been Thoth’s contribution to the formation of the OPERAS Open Infrastructures for Open Access Books Working Group. Through this initiative, Thoth has worked alongside like-minded organisations to strengthen communication and coordination across the growing landscape of open infrastructures that focus specifically on open access books. The aim is not just to facilitate dialogue, but to foster closer collaboration across institutions: implementing open data practices and interoperability across open systems, and working on overarching issues such as collective impact on policy development and the uptake of good metadata practice.
This reflects a broader strand in Thoth’s work over the past years. The goal is not only to develop and improve one singular platform, but to lay the groundwork across systems with a focus on interoperability and the free flow of open metadata through open interfaces/APIs, which ultimately helps to foster the emergence of an open ecosystem in which multiple community-led, open infrastructures can collaborate.
That wider infrastructural perspective has shaped Thoth’s collaborations throughout the period. Rather than building in isolation, we have continued contributing to a broader ecosystem in which metadata can move more effectively across open scholarly systems. This has been manifest in our close collaboration with the Open Book Collective, for which Thoth is now providing a collective catalogue for all publishers represented on the OBC. It has also included a deepening relationship with OAPEN and DOAB, through strategic partnership, trusted-platform work, and continued technical collaboration around automated metadata and content exchange workflows.
Thoth has also deepened its collaboration with the Public Knowledge Project (PKP) to establish a metadata exchange between PKP’s Open Monograph Press and the Thoth platform, which now empowers presses to reuse and improve metadata across multiple dissemination channels. At the same time, early exploratory work with Pressbooks points toward the possibility of extending these interoperable workflows further into adjacent open publishing environments.
One of the outcomes of the 2026 Copim Conference that we are particularly excited about has been the establishment of closer ties with Pressbooks, the open-source publishing platform that is widely used to publish open textbooks and OER more broadly. In a similar fashion to the approach taken with Open Monograph Press, and utilising open data and open APIs available at both ends, Thoth is now working closely with Pressbooks to develop a programmatic connection between the two open systems that facilitates the exchange and subsequent improvement of metadata, particularly for open textbooks.
Across these collaborations, the underlying goal has remained consistent: to contribute to a federated, open infrastructure landscape in which open metadata is shared more effectively and manual duplication is reduced, so that community-led publishing can thrive.
Of course, all of the above would have been in vain if our key user group of publishers had been left out of those developments. Thoth is proud of its origins as a publisher-led initiative, and we are grateful for the many conversations we have had with publishers across the globe, and for the feedback they have kindly given to help us better understand their existing pain points – which in turn has enabled us to collaboratively come up with creative open solutions to issues that have long persisted in the open access book publishing space.
Bokförlaget Stolpe publishes high-quality non-fiction in the humanities and social sciences, with a commitment to craftsmanship, scholarship, and quality. As part of the Axel and Margaret Ax:son Johnson Foundation for Public Benefit, making knowledge freely available is a natural extension of our mission. With Linné: Natur och nation by Lisbet Rausing we saw an opportunity to pilot open access publishing and explore how it could broaden the reach of our work to a global audience. As we began our journey into open access publishing, finding smooth and efficient ways to distribute our titles was essential. For a press focused primarily on print production, navigating the digital open access supply chain involves considerable administrative effort. Thoth offered a straightforward solution, handling wide distribution without that overhead. Through its service, Linné has been made available across a wide range of libraries, repositories, and discovery services — from OAPEN and DOAB to Google Play Books and Internet Archive. We look forward to adding more titles to our Thoth library as our open access catalogue grows. Alexandra Spendler, Bokförlaget Stolpe
As we approach the end of the Open Book Futures grant, we are proud to highlight that the platform, which just two years ago had been used by fewer than ten publishers, has now been adopted by more than 100 independent scholar-led as well as institutional university and library publishers across the globe, including presses from Latin America, the US, the UK, and Europe.
Fig. 1: Applications received from publishers and publishing initiatives between May 2025 and April 2026: Geographical spread.
Going beyond the numerous exchanges with individual publishers, our team has also been engaging with associations and consortia such as SciELO Livros, Open Institutional Publishers Association (OIPA), the Irish Open Publishers Association (IOPA), the Working Group of German University Publishers (Arbeitsgemeinschaft Universitätsverlage), Netherlands University Presses (NUPs.nl), the African Platform for Open Scholarship (APOS), and many more.
Thoth Open Metadata is also proud to be a member of a number of relevant associations and industry bodies such as the Association of European University Presses (AEUP), UKSG, OASPA, OPERAS, and COUNTER, and is listed in open infrastructure indices such as the OPERAS Pathfinder, Invest in Open Infrastructure’s InfraFinder, and the European Diamond Capacity Hub registry.
Throughout this period, Thoth has remained active in wider conversations around open infrastructure, metadata quality, bibliodiversity, and the future of community-led scholarly communication. This has included participation in the Copim Community, the OPERAS network, the Barcelona Declaration, COMET, and related cross-sector discussions around metadata openness and interoperability.
These engagements are not separate from Thoth’s technical work. They are part of the same effort: making the case that metadata should be treated as shared infrastructure, and helping to build the conditions in which open access books can circulate more fully, more equitably, and with less dependence on closed systems.
Another example that I think has been very successful is Thoth. They provide DOI minting services to ensure that books have accurate and high quality metadata to make them easier to find. They’re pros at making sure works are discoverable and the metadata is reusable and high quality. Kyle Demes, OpenAlex
As Open Book Futures comes to a close at the end of this month and Thoth 1.0 marks a new stage in the platform’s development, the past three years have made the truth of the Copim community’s early key tenet acutely apparent: open access needs open infrastructure. Applied to our specific context, we are convinced that open access long-form publishing needs open metadata infrastructures to ensure that open access books in any shape or form remain genuinely discoverable, reusable, and sustained over time. For us, that sketches out the work ahead, and the foundation Thoth will continue building on.
The end of Open Book Futures also marks the closure of Thoth Open Metadata’s project-funded phase. We are confident that we have established an attractive and sustainable service model that will sustain Thoth’s operations in the coming years, and are looking forward to continuing our collaboration with the publisher and library communities we serve.
If you are a publisher interested in working with us, please don’t hesitate to get in touch via info@thoth.pub or our website at https://thoth.pub.
If you are a librarian interested in contributing to the good cause of open not-for-profit infrastructures such as ours, please consider supporting the emerging ecosystem via the Open Book Collective, our sister organisation that has also been established by the Copim community. And if you’re keen to think collectively about how metadata workflows can be opened up across the different systems in use within the wider book supply chain, we’d love to hear from you!
Fig. 2: Copim slogan “No open access without open infrastructure”
Header image by Jr Korpa on Unsplash.