by Toby Steiner
DOI: https://doi.org/10.70950/sllm9969

With this post, we share the results of our recently conducted self-assessment of Thoth Open Metadata against the current version (v2.0) of the Principles of Open Scholarly Infrastructure (POSI). POSI is an important reference point for us: it articulates key elements of what it means for an infrastructure to be open, community-governed, and sustainable over the long term.
Thoth Open Metadata is a UK-registered Community Interest Company (CIC) and a community-led, non-profit metadata management and dissemination platform for open access (OA) books and chapters. From its inception, Thoth Open Metadata (hereafter “Thoth”) has been conceived as a not-for-profit, community-owned infrastructure that cannot be sold, whether to large commercial actors or to anyone else. With this self-assessment, we wanted to highlight the following aspects:
Where we believe we are already strong
Where we know we have more work to do
The rest of this blog post dives a bit deeper into how we read POSI from Thoth Open Metadata’s perspective, and outlines concrete next steps.
Rooted in the Copim community’s approach to open infrastructuring (see e.g. [1] and [2]), Thoth exists to make it easier for small-to-medium-sized independent scholar-led publishers, as well as university and library-based publishers, of open access (OA) books to participate more fully in the global book supply chain. We do this by providing publishers with free and open tools to create, manage, and disseminate high-quality, standards-compliant, open, reusable metadata for books and chapters that adheres to the multiple industry standards in use in publishing (e.g. ONIX 3, MARC, KBART).
This free provision of the Thoth platform and its associated open metadata is complemented by a portfolio of added-value services: publishers can, for example, task Thoth with disseminating and archiving their valuable contributions to the scholarly record on their behalf, or make use of open website and catalogue hosting and usage metrics solutions.
Our mission has always been tightly connected to the idea of “open infrastructure”:
POSI gives us a shared language to explain what this means in practice and a checklist of questions we should ask ourselves. With POSI v2.0 now available, and Thoth set to launch its new service portfolio in the coming weeks, this felt like a good moment to run a self-assessment of our own, so as to share publicly where we stand.
The self-assessment summarised in this blog post is based on:
It is important to emphasise that this is a self-assessment rather than an external audit. Where POSI asks for explicit, published policies and we currently rely more on practice than documentation, we err on the side of treating these as “work in progress.”
Thoth is intentionally focused on open access books and chapters, with particular attention to small-to-medium-sized independent as well as institutional presses and the “long tail” of OA book publishing.
Within that scope, the following details seem noteworthy:
In POSI terms, we do not aim to cover the entire scholarly enterprise, but our goal is to serve a diverse and international set of stakeholders within the distinct space of open access long-form publishing, and we believe we are on a good trajectory there.
Thoth is a Community Interest Company Limited by Guarantee under a large membership model. Members have overall control over the activities of the company and approve the appointment of the Board of Directors. The board is drawn from organisations that are themselves community-led OA initiatives, including Open Book Collective, Open Book Publishers, and punctum books.
This is strongly aligned with POSI’s “stakeholder-governed” principle. Where we would like to improve is clarity and visibility, which is why we plan to publish a concise governance page on the Thoth website in due course that brings together more information on questions of:
We also want to make it easier for new communities (for example, regional publisher consortia) to understand how they might become more directly involved in governance.
Our current governance arrangements are described in several places (Open Book Collective materials, Copim documentation, IOI’s Infra Finder platform, Thoth’s own “About” pages, and our Articles of Association).
In practice, Thoth does not lobby for regulatory or policy changes that would primarily serve to entrench Thoth itself. Our public advocacy focuses first and foremost on collaborating with other infrastructures to establish an open ecosystem based on open (meta)data and interoperability facilitated through open APIs, with the goal of improving the position of OA books in the scholarly ecosystem.
Looking at POSI, the “Cannot Lobby” principle has long been the one that we tended to quibble with the most, as we strongly believe in a need for collective advocacy for change across open infrastructures to foster the uptake of an open, alternative non-profit and community-led ecosystem for long-form publishing. With that in mind, we welcome POSI’s recent introduction of the “advocating for policy change” section to the “Cannot Lobby” principle in its version 2.0, and believe Thoth to be well-aligned with that notion.
What POSI reminds us to do is to write this down more explicitly. A short “policy and advocacy” statement that clarifies that we will not engage in self-entrenching lobbying is on our to-do list.
Two related areas where we still have work to do are:
Some aspects of our operations are already very transparent:
Where we would like to go further is:
Thoth’s sustainability model combines several elements:
This is strongly aligned with the POSI principle that stipulates that revenue should come from services, not from restricting access to data.
However, going forward, it would be good to more fully articulate:
These are topics we intend to tackle in collaboration with our board and supporting institutions in the next planning cycle.
Thoth’s development has benefitted significantly from community contributions, including community input on code, documentation, and standards research facilitated via the Copim Community, as well as through close collaboration with partner infrastructures and publishers on testing new features and integrations.
At the same time, we are aware that some aspects of Thoth’s operation still rely on a small team and a limited number of key individuals, which creates a key-person risk.
Our next steps here are to continue to document and distribute critical operational knowledge across the team and board, so that the organisation becomes less vulnerable to the potential loss of any one staff member’s capacity.
These actions align with POSI’s emphasis on both volunteer labour planning and transition planning.
Thoth’s codebase is openly available on GitHub under a permissive Apache-2.0 licence.
This fully meets the POSI expectation that the software required to operate the infrastructure should be open source, making it possible in principle for the community to replicate or take over the service if necessary.
On the openness side: all metadata records in Thoth are released under a public-domain CC0 dedication and are accessible via the main Thoth catalogue (“By book” and “By publisher”), and programmatically via our GraphQL API explorer and Export API. A soon-to-be-launched OAI-PMH endpoint will complement these routes to the open data.
On the preservation side: Thoth’s open archiving workflows provide automated, multi-venue archiving (e.g. Internet Archive, Zenodo) for content and metadata, adding redundancy beyond Thoth’s own infrastructure.
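For readers curious what programmatic access to open metadata over GraphQL could look like, here is a minimal Python sketch. It is an illustration under stated assumptions, not Thoth's documented interface: the endpoint URL is a placeholder, and the query shape and field names (`works`, `fullTitle`, `doi`) are hypothetical, so the GraphQL API explorer remains the place to check the actual schema.

```python
import json
from urllib import request

# Assumption: placeholder endpoint, not taken from this post.
GRAPHQL_ENDPOINT = "https://example.org/graphql"

# Assumption: hypothetical query shape and field names for illustration only.
QUERY = """
query ($limit: Int!) {
  works(limit: $limit) {
    fullTitle
    doi
  }
}
"""


def build_request(endpoint, query, variables):
    """Build a standard GraphQL-over-HTTP POST request with a JSON body."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_records(response_text, root_field):
    """Return the list under data[root_field], raising if the server reported errors."""
    payload = json.loads(response_text)
    if payload.get("errors"):
        raise RuntimeError(f"GraphQL errors: {payload['errors']}")
    return payload["data"][root_field]


# Usage (commented out so the sketch runs without network access):
# req = build_request(GRAPHQL_ENDPOINT, QUERY, {"limit": 5})
# with request.urlopen(req) as resp:
#     for record in extract_records(resp.read().decode("utf-8"), "works"):
#         print(record["fullTitle"], record["doi"])
```

Because the records carry a CC0 dedication, a script along these lines could reuse the returned metadata without licence restrictions, once it is pointed at the real endpoint and schema.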
Where we want to improve is in the “secure” part of the POSI principle. As is noted on Infra Finder, we rely on AWS for hosting and AWS-level security standards therefore apply.
Going forward, we would like to complement this with a concise Thoth-specific statement on:
Thoth’s position as a community-led CIC, combined with open source code and CC0 metadata, already makes patent-based enclosure extremely unlikely.
Even so, POSI encourages infrastructures to be explicit. We do not yet have a published patent non-assertion statement; drafting a simple commitment not to use patents to block community replication of the infrastructure is therefore another concrete action item emerging from this review.
Interoperability is one of the areas where Thoth is most closely aligned with POSI – and where we invest a significant proportion of our development effort.
From a POSI perspective, this work is central to ensuring continuity and resilience beyond Thoth itself.
This self-assessment confirms that Thoth is already strongly aligned with POSI in several core areas, particularly around open source, open metadata, stakeholder governance, interoperability, and preservation.
At the same time, POSI remains a useful reminder that “open by default” is not enough: we also need transparent governance documents, clear financial and transition planning, and explicit commitments around topics such as lobbying and patents.
Over the coming months, we plan to:
We see POSI not as a one-off compliance exercise, but as an ongoing conversation with the communities that rely on Thoth. If you have feedback on this assessment, or if you are interested in collaborating on any of the work outlined above, we would be very happy to hear from you.
Header image by Sunny Tank on Unsplash. Modifications by Thoth Open Metadata.