Thoth Open Metadata and the Principles of Open Scholarly Infrastructure (POSI)

by Toby Steiner

DOI: https://doi.org/10.70950/sllm9969

With this post, we share the results of our recent self-assessment of Thoth Open Metadata against version 2.0 of the Principles of Open Scholarly Infrastructure (POSI). POSI is an important reference point for us: it articulates key elements of what it means for an infrastructure to be open, community-governed, and sustainable over the long term.

Thoth Open Metadata (referred to below as “Thoth”) is a UK-registered Community Interest Company (CIC) and a community-led, non-profit metadata management and dissemination platform for open access (OA) books and chapters. From its inception, Thoth has been conceived as a not-for-profit, community-owned infrastructure that cannot be sold, whether to large commercial actors or to anyone else. With this self-assessment, we want to highlight the following aspects:

Where we believe we are already strong

  • Stakeholder-governed, community-owned structure. Thoth is a CIC limited by guarantee under a large membership model. Members control the company and appoint a board made up of representatives from community-led OA initiatives such as the Open Book Collective, Open Book Publishers, and punctum books.
  • Open source software and open (CC0) metadata. Thoth’s codebase is released as open source (under a permissive Apache-2.0 license), with openly documented APIs and technical resources. All metadata records are released openly under a CC0 public-domain dedication via our GraphQL and Export APIs, the public catalogue, and through our OAI-PMH endpoint.
  • Mission-aligned revenue models. Rather than charging for access to metadata, our funding model combines free provision of the core Thoth platform and all open metadata outputs with revenue from added-value services (Thoth dissemination packages, bespoke services) and library support via the Open Book Collective.
  • Interoperability and preservation. Thoth implements and documents a wide range of standards (ONIX 2.1/3.0/3.1, MARC, KBART, BibTeX, Crossref XML, etc.) and integrates with infrastructures such as Crossref, OAPEN, DOAB, OMP/PKP and many others. Thoth’s approach to Open Archiving supports multi-venue preservation of publishers’ content.

Where we know we have more work to do

  • Documenting a formal “living will” and transition plan. We already work to ensure that content and metadata live on in external preservation venues, but we have yet to publish a clear statement about how governance, services, and assets would be transferred if Thoth were to face a wind-down or transition of operations.
  • Formalising financial reserves and surplus policies. As a relatively young CIC (founded in 2022), we are still building up financial resilience. Going forward, we want to articulate more clearly how we think about surplus, reserves, and long-term sustainability.
  • Making governance and membership rules more visible. Our membership model and board structure are described in several places (including Thoth’s Companies House entry and the Infra Finder directory of open infrastructures). Making a consolidated governance charter and membership policy available directly on Thoth’s website will further improve transparency.
  • Clarifying data protection, security, and patent non-assertion. Our public information strongly emphasises openness, but we still need to publish concise statements on how we protect operational data (accounts, logs, analytics) and how we approach patents (e.g. a simple non-assertion statement).

The rest of this blog post dives a bit deeper into how we read POSI from Thoth Open Metadata’s perspective, and outlines concrete next steps.

Why POSI matters to Thoth Open Metadata

Rooted in the Copim community’s approach to open infrastructuring (see e.g. [1] and [2]), Thoth exists to make it easier for small-to-medium-sized independent scholar-led publishers, as well as university and library-based publishers, of open access (OA) books to participate more fully in the global book supply chain. We do this by providing publishers with free and open tools to create, manage, and disseminate high-quality, standards-compliant, open and reusable metadata for books and chapters that adheres to the multiple industry standards in use in publishing (e.g. ONIX 3, MARC, KBART).

A portfolio of added-value services complements the free provision of the Thoth platform and associated open metadata. Through it, publishers can, for example, task Thoth with disseminating and archiving their valuable contributions to the scholarly record on their behalf, or make use of open website and catalogue hosting and usage metrics solutions.

Our mission has always been tightly connected to the idea of “open infrastructure”:

  • Thoth as a platform is built on fully open source software that produces open metadata.
  • Governance is community-based, via a CIC model and a board drawn from stakeholder organisations.
  • Our data, interfaces, and services are designed to complement and strengthen other open infrastructures – not to replace or enclose them.

POSI gives us a shared language to explain what this means in practice and a checklist of questions we should ask ourselves. With POSI v2.0 now available, and Thoth set to launch its new service portfolio in the coming weeks, this felt like a good moment to run a self-assessment of our own, so as to share publicly where we stand.

How we approached this self-assessment

The self-assessment summarised in this blog post is based on:

  • Public information available from the Thoth website.
  • Public documentation and descriptions from the Copim Community (COPIM and Open Book Futures projects), the Open Book Collective, Invest in Open Infrastructure’s Infra Finder database, and Thoth Open Metadata’s Articles of Association (available via our UK Companies House entry).

It is important to emphasise that this is a self-assessment rather than an external audit. Where POSI asks for explicit, published policies and we currently rely more on practice than documentation, we err on the side of treating these as “work in progress.”

Governance

Coverage across the scholarly enterprise

Thoth is intentionally focused on open access books and chapters, with particular attention to small-to-medium-sized independent as well as institutional presses and the “long tail” of OA book publishing.

Within that scope, the following details seem noteworthy:

  • At the end of Q4 2025, more than 80 independent, scholar-led, university and library publishers from around the world were using Thoth’s platform and APIs.
  • Thoth is actively engaged in communities such as Copim Open Book Futures, OPERAS, OASPA, the Barcelona Declaration community of practice, and the Collaborative Metadata (COMET) initiative.
  • We connect with numerous key infrastructures active in the open access book publishing space, including not-for-profit stakeholders such as Open Book Collective, OAPEN, DOAB, Project MUSE, JSTOR, and the Public Knowledge Project.

In POSI terms, we do not aim to cover the entire scholarly enterprise, but our goal is to serve a diverse and international set of stakeholders within the distinct space of open access long-form publishing, and we believe we are on a good trajectory there.

Stakeholder-governed and non-discriminatory participation

Thoth is a Community Interest Company Limited by Guarantee under a large membership model. Members have overall control over the activities of the company and approve the appointment of the Board of Directors. The board is drawn from organisations that are themselves community-led OA initiatives, including Open Book Collective, Open Book Publishers, and punctum books.

This is strongly aligned with POSI’s “stakeholder-governed” principle. Where we would like to improve is clarity and visibility, which is why we plan to publish a concise governance page on the Thoth website in due course that explains:

  • How membership works (who can become a member and how).
  • How directors are appointed and for how long.
  • How conflicts of interest are handled.

We also want to make it easier for new communities (for example, regional publisher consortia) to understand how they might become more directly involved in governance.

Transparent governance and “cannot lobby”

Our current governance arrangements are described in several places (Open Book Collective materials, Copim documentation, IOI’s Infra Finder platform, Thoth’s own “About” pages, and our Articles of Association).

In practice, Thoth does not lobby for regulatory or policy changes that would primarily serve to entrench Thoth itself. Our public advocacy focuses first and foremost on working with other infrastructures to establish an open ecosystem based on open (meta)data and interoperability through open APIs, with the goal of improving the position of OA books in the scholarly ecosystem.

Looking at POSI, the “Cannot Lobby” principle has long been the one that we tended to quibble with the most, as we strongly believe in a need for collective advocacy for change across open infrastructures to foster the uptake of an open, alternative non-profit and community-led ecosystem for long-form publishing. With that in mind, we welcome POSI’s recent introduction of the “advocating for policy change” section to the “Cannot Lobby” principle in its version 2.0, and believe Thoth to be well-aligned with that notion.

What POSI reminds us to do is to write this down more explicitly. A short “policy and advocacy” statement that clarifies that we will not engage in self-entrenching lobbying is on our to-do list.

Living will and regular review of purpose

Two related areas where we still have work to do are:

  1. Living will. Through our open archiving workflows, we already work to ensure that publishers’ content and metadata are preserved in multiple locations. What we do not yet have is a public “what if Thoth disappeared?” document that sets out how our data and code (all already open) would be stewarded, and how the transfer to the organisation identified as an asset-locked body in our Articles of Association would be managed. A related question also deserves a more easily identifiable answer: how governance and responsibilities would be handed over or wound down.
  2. Regular review of purpose and community value. Our direction is shaped by ongoing engagement with publishers, libraries, and partner infrastructures through the Copim and OPERAS communities, the Open Book Collective, and many others. However, we do not yet have a clearly signposted “recurring POSI report” or similar process. Our plan is therefore to move towards a regularly updated community report that explicitly revisits our mission, highlights progress, and identifies POSI-related priorities for the following year. This POSI self-assessment can be seen as a first step towards that community report.

Sustainability

Transparent operations

Some aspects of our operations are already very transparent:

  • Thoth’s pricing and service offers are fully documented on the website, including what is available for free and what is included in Thoth’s added-value services.
  • As a UK CIC, we file statutory annual accounts and reports that are publicly accessible via the Companies House website.
  • Infra Finder records provide additional detail about our mission, technical stack, standards support, and relationships with other infrastructures.

Where we would like to go further is:

  • Summarising key financial indicators and governance updates in a Thoth-hosted annual report, rather than relying solely on external registries and the Companies House website.
  • Making it clearer how community input shapes our roadmap, for example by highlighting how particular features or integrations arose from engagement with publishers and / or libraries.

Funding model, surplus, and reserves

Thoth’s sustainability model combines several elements:

  • Thoth Free: open access to the core metadata management platform, APIs, and export features, free forever.
  • Thoth services: dissemination services, plus website/catalogue and usage statistics services to be formally introduced in Q1/2026. Fees are currently charged per book and cover DOI registration, distribution, and archiving, including partnerships with Crossref, OAPEN, DOAB, Internet Archive, and other platforms.
  • Bespoke services: additional, optional services such as metadata creation, bespoke workflows, or white-label catalogues.
  • Library support via Open Book Collective membership, where libraries contribute to sustaining Thoth as shared infrastructure rather than purchasing subscription access to proprietary data.

This is strongly aligned with the POSI principle that stipulates that revenue should come from services, not from restricting access to data.

However, going forward, it would be good to more fully articulate:

  • Our goal to generate surplus beyond strict cost recovery (for example, to invest in new features, community support, and to bolster infrastructural resilience).
  • Our policy on financial reserves: how much we aim to hold, what those reserves are for (e.g. wind-down / transition, major infrastructure investments), and how they are governed.

These are topics we intend to tackle in collaboration with our board and supporting institutions in the next planning cycle.

Volunteer labour and transition planning

Thoth’s development has benefitted significantly from community contributions, including community input on code, documentation, and standards research facilitated via the Copim Community, as well as through close collaboration with partner infrastructures and publishers on testing new features and integrations.

At the same time, we are aware that some aspects of Thoth’s operation still rely on a small team and a limited number of key individuals, which creates a key-person risk.

Our next steps here are to continue to document and distribute critical operational knowledge across the team and board, so that the organisation becomes less vulnerable to the potential loss of any one staff member’s capacity.

These actions align with POSI’s emphasis on both volunteer labour planning and transition planning.

Insurance

Open source

Thoth’s codebase is openly available on GitHub under a permissive Apache-2.0 licence.

This fully meets the POSI expectation that the software required to operate the infrastructure should be open source, making it possible in principle for the community to replicate or take over the service if necessary.

Open and secure data accessibility

On the openness side: all metadata records in Thoth are released under a public-domain CC0 dedication and are accessible via the main Thoth catalogue (“By book” and “By publisher”), and programmatically via our GraphQL API explorer and Export API. A soon-to-be-launched OAI-PMH endpoint will complement these access routes.
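To make the programmatic route more concrete, here is a minimal sketch of how a client might request a handful of records over GraphQL. The endpoint URL, the `works` query, and the field names `fullTitle` and `doi` are assumptions made for illustration and should be checked against the GraphQL API explorer; `build_payload` and `fetch_works` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: endpoint URL and schema fields are illustrative only.
# Verify both in the GraphQL API explorer before relying on them.
GRAPHQL_ENDPOINT = "https://api.thoth.pub/graphql"

WORKS_QUERY = """
query RecentWorks($limit: Int!) {
  works(limit: $limit) {
    fullTitle
    doi
  }
}
"""


def build_payload(limit: int) -> bytes:
    """Encode the GraphQL request body as UTF-8 JSON."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    body = {"query": WORKS_QUERY, "variables": {"limit": limit}}
    return json.dumps(body).encode("utf-8")


def fetch_works(limit: int = 5, endpoint: str = GRAPHQL_ENDPOINT) -> list:
    """POST the query and return the list found under data.works.

    This performs a network call and is not exercised in the sketch itself.
    """
    request = urllib.request.Request(
        endpoint,
        data=build_payload(limit),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        result = json.load(response)
    if "errors" in result:
        raise RuntimeError(result["errors"])
    return result["data"]["works"]
```

Because the records are CC0, a client like this needs no API key in this sketch; whether rate limits or other conditions apply is not covered here.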

On the preservation side: Thoth’s open archiving workflows provide automated, multi-venue archiving (e.g. Internet Archive, Zenodo) for content and metadata, adding redundancy beyond Thoth’s own infrastructure.

Where we want to improve is in the “secure” part of the POSI principle. As is noted on Infra Finder, we rely on AWS for hosting and AWS-level security standards therefore apply.

Going forward, we would like to complement this with a concise Thoth-specific statement on:

  • How we handle user accounts and authentication.
  • How we treat logs and analytics data.
  • How we respond to security incidents.

Patent non-assertion

Thoth’s position as a community-led CIC, combined with open source code and CC0 metadata, already makes patent-based enclosure extremely unlikely.

Even so, POSI encourages infrastructures to be explicit. We do not yet have a published patent non-assertion statement; drafting a simple commitment not to use patents to block community replication of the infrastructure is therefore another concrete action item emerging from this review.

Interoperability and open standards

Interoperability is one of the areas where Thoth is most closely aligned with POSI – and where we invest a significant proportion of our development effort.

  • Thoth supports a wide range of metadata formats (ONIX variants, MARC21/MARCXML, KBART, JSON, CSV, BibTeX, Crossref XML, etc.) and makes these available both through the user interface and via open protocols and APIs.
  • We actively collaborate with open infrastructures and platforms such as PKP’s Open Monographs Press, Crossref, OAPEN, DOAB, JSTOR, Project MUSE, and Jisc NBK, while also engaging with key stakeholders active in the book supply chain such as Google Books, Amazon, ProQuest, EBSCO, OCLC, and others to keep up with platform-specific requirements and to reduce duplication of effort for publishers.

From a POSI perspective, this work is central to ensuring continuity and resilience beyond Thoth itself.
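As a rough illustration of what multi-format support means for a publisher, the sketch below serialises one metadata record into two of the formats mentioned above, BibTeX and CSV. It is not Thoth’s implementation: the record, its field names, the placeholder DOI, and the helper names `to_bibtex` and `to_csv` are all hypothetical.

```python
import csv
import io

# Hypothetical record: field names and values are illustrative,
# not Thoth's schema. The DOI is a placeholder.
record = {
    "title": "An Example Open Access Monograph",
    "author": "Doe, Jane",
    "publisher": "Example Press",
    "year": "2025",
    "doi": "10.0000/example.0001",
}


def to_bibtex(rec: dict, key: str) -> str:
    """Render one record as a BibTeX @book entry."""
    fields = ",\n".join(f"  {name} = {{{value}}}" for name, value in rec.items())
    return f"@book{{{key},\n{fields}\n}}"


def to_csv(records: list) -> str:
    """Render records as CSV with a header row taken from the first record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0]), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(to_bibtex(record, "doe2025"))
print(to_csv([record]))
```

The point of the sketch is that the record is entered once and each output format is derived from it, which is what reduces duplication of effort when different platforms ask for different formats.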

What happens next

This self-assessment confirms that Thoth is already strongly aligned with POSI in several core areas, particularly open source, open metadata, stakeholder governance, interoperability, and preservation.

At the same time, POSI remains a useful reminder that “open by default” is not enough: we also need transparent governance documents, clear financial and transition planning, and explicit commitments around topics such as lobbying and patents.

Over the coming months, we plan to:

  1. Publish a governance and membership overview on thoth.pub, consolidating how our CIC structure, membership model, and board appointments work.
  2. Draft and share a living will and transition plan, including how data, code, and governance would be handled in a wind-down or transfer scenario.
  3. Develop and approve a reserves and surplus policy, working with our board and supporting institutions.
  4. Publish a concise data-protection and security statement, complementing our openness commitments with clarity on how we handle operational data.
  5. Add a short patent non-assertion statement confirming that we will not use IP rights to prevent community replication of Thoth.
  6. Move towards a regular POSI-aware community report, revisiting these principles on a regular basis and documenting our progress.

We see POSI not as a one-off compliance exercise, but as an ongoing conversation with the communities that rely on Thoth. If you have feedback on this assessment, or if you are interested in collaborating on any of the work outlined above, we would be very happy to hear from you.


Header image by Sunny Tank on Unsplash. Modifications by Thoth Open Metadata.

UK registered social enterprise and Community Interest Company (CIC).

Company registration 14549556

Copyright © 2025 Thoth Open Metadata. Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 4.0 International license.