Hybrid Consortium Research Plan 2013/15. Dynamic Publishing - New Platforms, New Readers!



10.5281/zenodo.18807


1. Executive summary

Open infrastructures for publishing

Dynamic publishing is a digital workflow in which the publication processes (layout, multi-format conversion, distribution, rights clearance, translation workflows and payments) are automated and made available on request for reuse, giving access to new audiences and revenues. Currently, the majority of publishers are excluded from the dynamic publishing arena. The software systems on offer are either too expensive at the high end of the product range or not ‘fit for purpose’ at the low end.

The infrastructure would be supported by a variety of industry partners, open source communities and research groups. HPC’s contribution to this network is to connect the scholarly publishing community with the long-standing work on open standards in digital publishing by industry and the open source community.

Free at the point of reading!

The HPC supports open intellectual property rights (IPR) for reading and learning, not just in academia but in publishing in general. At the same time, the HPC understands the need to economically support the skills that authors and publishers lend to the process. We see dynamic publishing playing a key role in a reconfiguration of publishing and knowledge institutions, such as the university, as it allows easy, direct relationships with readers, cutting out costly intermediaries. In a near future where content is available for free via file sharing, economic models that move payment away from the consumer and up the chain become essential: to the network provider, funder or collection agency, or via some other route.

Bold policy moves are needed to accompany the development of the related technology. The first is to support technology infrastructures as part of the public domain, a part that can in turn support independent enterprise, publishing, research and knowledge institutions.

Single source

In the consortium our focus is on single source publishing. Single source is the structuring of documents to make them machine-readable and available for multi-format conversion. During 2013 we will release a toolset called Typesetr for single source conversion. We will run a series of rapid prototyping projects exploring how best to fit Typesetr into workflows, aiming to make publishers’ lives a little more sane and enjoyable.

2. Mission

Publishing

Reliable figures on the size of the publishing industry are murky and hard to come by, but estimates put the EU industry at 64,000 companies, employing 750,000 people and worth 0.5% of EU GDP, which is currently €12.894 trillion.1 This puts the value of the EU publishing industry alone at roughly €64 billion in revenue per annum. But this lucrative industry is skewed, with approximately 20% of the companies taking 73% of the profits.
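Multiplying out the two figures quoted above (a 0.5% share of an EU GDP of €12.894 trillion) shows the annual value this estimate implies:

```latex
\[
  0.5\% \times 12.894\ \text{trillion EUR}
  = 0.005 \times 12{,}894\ \text{billion EUR}
  \approx 64.5\ \text{billion EUR per annum}
\]
```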

The scholarly and independent publishers that the consortium focuses on sit within the less profitable majority of publishers, the 80%. Their ability to experiment with new forms of digital publishing is held back by limited budgets, and the technology industry does not supply them with services, because serving a cash-strapped sector is unprofitable.

The consortium has two roles to play in bridging this technology gap. First, we can carry out research into models of dynamic publishing that an individual small publisher could never afford. Second, by providing innovations in low-cost technology infrastructure that is free for all to use, we can make it affordable for tech and design companies to provide services to small publishers.

Dynamic publishing

Figure: Dynamic Publishing. How the automation of workflow processes leads to dynamic publishing, although this automation can only ever be partial.

Dynamic publishing means root-and-branch change for publishing. The number of places where a publication is available increases suddenly and dramatically, with new reading platforms and distribution channels. New types of contracts and permissions are needed so that re-publishing can be as speedy as the flow of books around digital networks and providers. With the form of the book broken up, from a static bound object into a set of ‘content components’ available for reuse, ways of authoring also need to adapt. Introducing these new dynamic publishing workflows into the publisher’s production environment is our core challenge.

The consortium will explore dynamic publishing models for new forms of scholarly dissemination through a series of rapid prototyping exercises. One model to follow is the US educational publisher Flat World Knowledge,2 which provides textbooks free to read online and at 80% price reductions to institutions through bulk ‘bring your own device’ (BYOD) sales. Another is the way Pearson3 distributes publications via an application programming interface (API), which essentially means publishing on request. If another publisher wants to use Pearson content, its systems make an automatic request through the API and the publication is supplied instantly, on the fly. Previously such a re-publishing request would have gone through lengthy contractual arrangements, which could take several months.
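To make ‘publishing on request’ concrete, here is a minimal Python sketch of the idea. It is not Pearson’s actual API: the `Licence` record, the `ContentService` class and its `request` method are hypothetical names chosen for illustration, and the automatic licence check stands in for what would otherwise be a contractual negotiation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Licence:
    """Hypothetical machine-readable permission: who may reuse which work, and how."""
    licensee: str
    work_id: str
    uses: frozenset  # e.g. frozenset({"display", "excerpt"})


class ContentService:
    """Toy 'publishing on request' service; an illustration, not Pearson's real API."""

    def __init__(self):
        self._works = {}     # work_id -> content
        self._licences = []  # Licence records checked on every request

    def add_work(self, work_id, content):
        self._works[work_id] = content

    def grant(self, licence):
        self._licences.append(licence)

    def request(self, licensee, work_id, use):
        """Check permission automatically, then supply the content immediately."""
        if work_id not in self._works:
            raise KeyError(f"unknown work: {work_id}")
        permitted = any(
            lic.licensee == licensee and lic.work_id == work_id and use in lic.uses
            for lic in self._licences
        )
        if not permitted:
            raise PermissionError(f"{licensee} holds no '{use}' licence for {work_id}")
        return self._works[work_id]


# Usage: another publisher's system asks for a chapter it is licensed to excerpt.
service = ContentService()
service.add_work("econ-101/ch1", "Chapter 1 text")
service.grant(Licence("partner-press", "econ-101/ch1", frozenset({"excerpt"})))
print(service.request("partner-press", "econ-101/ch1", "excerpt"))  # Chapter 1 text
```

The design point is that permissions become data the system can check on every request, which is what allows re-publishing in seconds rather than months.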

Open infrastructures for publishing

In the networked society there is an ongoing battle between private and public infrastructures. Unsurprisingly, the consortium is in the public infrastructures camp. In publishing, we are researching dynamic publishing and how to place high-quality technology in the public domain, allowing for open innovation with reasonably priced service provision to the professional user. Many parts of the infrastructure have existed for a long time, provided by a wide spectrum of players, from companies like IBM to research bodies such as the Hasso-Plattner-Institut and media activist service providers like DocumentCloud. The consortium’s role is to speed up the process of addressing dynamic publishing by coordinating existing projects into coherent and reliable workflows that publishers can trust.

Policy and technology

Publishing is in the midst of such fundamental change due to digital disruption that providing infrastructural technologies to lower barriers to market entry and streamline production costs will not be enough on its own. Accompanying policy developments are also needed in many areas. Significant moves have been made in open licensing: Creative Commons for general users and Open Access (OA) publishing in academia. A game-changing policy move is that the EU’s main research fund for 2014-20, Horizon 2020,4 directs STEM5 research publishing to make Open Access the norm.

The areas that the consortium has identified for further policy reform are: IPR on content permissions, piracy (the need to differentiate personal use and commercial exploitation), pricing monopolies, new economic models that move payment away from the consumer, public service broadcast and media infrastructure provision. The consortium will be actively looking to see how its technology work can be connected to other policy researchers working in this area.

Values and objectives

Milestones

2.1. Defining Hybrid Publishing

Figure: Hybrid Publishing - From print to digital and onto dynamic publishing

Hybrid Publishing is where the book is broken apart, a move beyond the habitual way of imagining digital publishing, which has consisted of copying the bound book in digital form. Hybrid publishing also reflects the challenges our age is facing: austerity, neo-liberalisation and digital disruption.

Hybrid publishing is the further innovation of the unbound book and the subsequent reorganisation of knowledge institutions, with the interesting twist that the cost of accessing knowledge falls, so that the dream of universal education can take a step forward.

The development of the conventional book was always accompanied by parallel experiments with an unbound book form. In the 2011 book ‘Paper Machines’,7 media historian Markus Krajewski traces a history of the European unbound book, from library records created in the 16th century by the Swiss librarian Konrad Gessner, to the US Dewey Decimal System of the 19th century, to the transfer of the library record-keeping system to businesses in the early 20th century as the card index system. This history partly defines Hybrid Publishing. To define it fully we need to add the agent of digital disruption or, in the words of the economist Joseph Schumpeter,8 creative destruction. Creative destruction is a process in which a new economy emerges out of the destruction of a previous order. Technological innovation is the agent of this change, and Schumpeter describes the entrepreneur as the one who exploits it.

Ironically, it is the unbound book and its progeny, the card index system, that led to the punch card, the early data packet of what are now packet networks. In a packet network, all media can be broken down into common data packets and sent to any device. It is this technology, the basis of the internet and mobile networks, that acts as the agent of creative destruction and has made the concept of the unbound book finally realisable.

It is the scaling-up of packet networks that puts the fundamentals of publishing in flux. As the unbound book develops and print books are replaced, economic models crumble, institutions of knowledge lose their relevance and copyright law becomes unenforceable. Readership data changes from a bi-annual sales report to ‘reader analytics’, where a publisher can know what people read at a granular level, following single readers to know which chapter they are reading at a given time and place. How people read and write changes. Re-skilling an industry quickly becomes necessary.

Hybrid publishing innovation

The innovation process of digital disruption brings a careful examination of the institutions and conventions of publishing, showing not only how new technologies can make improvements, but also exposing inherent failings in the status quo. Take, for example, Open Access academic publishing. The main barrier here is not technological but self-interest: the vested interests of academics and publishers resisting the universal access agenda.

What has recently been proclaimed as innovation in online educational publishing, such as MOOCs, rather shows how hard it is to innovate. Most MOOCs simply replicate conventional educational formats and make the material available online, as distributed textbooks and embedded videos, adding an automated (factory-style) student grading system to the mix. There is neither rethinking nor innovation of new types of education or forms of learning.

With this in mind, the consortium would advocate more exploratory approaches that use digital innovations and the capacities of information and communication technology (ICT) to rethink learning, education and publishing.

Fundamentals in flux

The following projects are tackling the changes to the fundamentals that underpin publishing, from new economic models to re-skilling of the industry and the reconfiguration of educational institutions.

The disruption caused to industry and institutional interests means it is necessary to explore and develop new forms of publishing and, along with it, the development of new policies in IPR, economic models and societal goals.

2.2. Making the case

The Consortium has a two-part strategy to address the goals of its research remit to support digital scholarly publishing.

Single source. Why here? Why now?

There is an undeniable case for the publishing industry moving towards digital and multi-platform distribution, so the challenges of Hybrid Publishing have to be faced head on. All of the affordable commercial and open source software available is not cost-effective, efficient or error-free enough (within a reasonable tolerance) to carry out multi-format publishing. There are industry-level tools and providers who can carry out this work, but their entry prices are prohibitive.

Current multi-format publishing systems

Single source

Single source means you have one master document that is available for dynamic publishing, meaning it can be automatically converted into multiple formats. Further automation covers styles; reuse (breaking the document up and recombining it); rights management; remuneration; synchronisation; revisioning; reading metrics; long-term preservation; distribution; and conversion into any new format, such as eBook, Print-on-Demand (PoD), app, library system or OER. This means only one edit and proof, whereas in current systems each format needs its own edit and proof.
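A minimal Python sketch of the single source principle, assuming a document is stored as a list of (role, text) pairs with no layout information. The names `MASTER`, `to_html`, `to_markdown` and `publish` are hypothetical, and this is not how Typesetr works internally; it only shows why one edit to the master propagates to every output format.

```python
import html

# Hypothetical master document: structural roles only, no layout information.
MASTER = [
    ("title", "Hybrid Publishing"),
    ("heading", "Single source"),
    ("para", "One master document, many outputs & formats."),
]


def to_html(doc):
    """Render each structural role as an HTML element, escaping the text."""
    tags = {"title": "h1", "heading": "h2", "para": "p"}
    return "\n".join(
        f"<{tags[role]}>{html.escape(text)}</{tags[role]}>" for role, text in doc
    )


def to_markdown(doc):
    """Render the same structure as Markdown."""
    prefixes = {"title": "# ", "heading": "## ", "para": ""}
    return "\n\n".join(prefixes[role] + text for role, text in doc)


# Adding an output format means adding one renderer, not re-editing the text.
RENDERERS = {"html": to_html, "md": to_markdown}


def publish(doc):
    """Convert the single source into every registered format in one pass."""
    return {fmt: render(doc) for fmt, render in RENDERERS.items()}


outputs = publish(MASTER)
print(outputs["html"])
print(outputs["md"])
```

Because every format is generated from `MASTER`, a correction is made and proofed once; the next `publish` call carries it into all outputs.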

Why here? Why now?

Single source is not new. Turning unstructured paper and digital documents into a machine-readable format is a large part of the history of computing. Take, for example, the computing pioneer Charles Babbage, who in the 1820s sought to remove human error from manual logarithmic calculations, such as those used for nautical navigation, with his Difference Engine,9 an early concept for a computer. It is worth mentioning that Babbage also recognised in the 1820s that scholarly publishers were even then restricting access to knowledge with unfair practices, and ran his own Open Access campaign of sorts, resulting in his books being censored by publishers and banned from sale.10

Schemas for structuring documents have been built over the last forty years, but now, with the consumerisation of technology and the Bring Your Own Device (BYOD) phenomenon, the grip of the paper book has been broken and digital reading has taken off.

The consortium

Open innovation is now the default in the blue chip technology sector. The setting up of the consortium is our way of doing open research by creating a network of industry, open source communities, research groups and other stakeholders with a shared purpose of implementing open infrastructures for publishing.

The consortium aims to cover many aspects of publishing, supporting reliable, high-quality software and a vibrant economy to drive the technology innovators, developers and providers.

The consortium is based on free software and open standards that form the foundations for innovation on the net and across new platforms, like mobile phones and tablets.

Many parts of the open source infrastructure for publishing are in place: for example, in journalism there is DocumentCloud11 for referencing documents, and for eReading there is Calibre,12 a desktop reader and shareable home library. We use Android as a mobile, tablet and eReader operating system.

The consortium contributes to these communities and acts as a meeting place for the different stakeholders: media activists, educators, publishers, technologists, creators, universities and entrepreneurs.

2.3. Roadmap

The roadmap is based on the development of our core technology, Typesetr, for ‘single source publishing’, aimed at efficient multi-format conversion, layout and distribution. The final goal is transmedia publications, including rich media, social media and open learning environments. This goal is reached by moving up through layers of increasing complexity, starting with Typesetr Academic in Google Docs and ending in open learning systems.

A series of rapid prototyping exercises and workshops with partners will take place on an accompanying timeline.

Milestones

Phase A - Conversion infrastructure

1. Open Research Portal - Dec 2013

In partnership with a number of technology providers, research groups and publishers an open research portal will be launched.

2. InfoMesh Technologies UG - Dec 2013

InfoMesh Technologies UG is founded in Lüneburg by Simon Worthington as part of the EU-funded Innovations Inkubator programme. The company will carry out commercial development and client support for digital presses with universities, museums and publishers, seeking further business to generate revenues for product development.

3. Typesetr Academic - private beta - Feb 2014

A web-based software platform for multi-format conversion, released with an accompanying user and developer support environment: a Google Docs based toolset with multi-format publishing, collaborative writing, customisable templates and full academic markup to accommodate research papers, journals and monographs.

4. Typesetr Academic Open Source Release - Feb 2014

Typesetr Academic will be released as fully documented and supported, free-software-compliant code, with the source in a software repository and publishing lab staff carrying out community support as part of their research process. The core algorithmic engine of Typesetr Academic will also be free-software compliant, so that new inputs and outputs can be added.

Phase B - prototyping and workshops

5. Typesetr Academic (TA) integration into Open Journal Systems (OJS) and Open Monograph Press (OMP) - May 2014

Integration of the TA document creation workflow into OJS and OMP, with additional document input and output types.

6. Typesetr - publishing partner prototypes and hacklabs

All of the prototypes would have the theme of ‘Bring your own device (BYOD) distribution’, to academia, institutions, the wider public and in search contexts.

BYOD is key to multi-format publishing, as it is the phenomenon driving development in publishing, with readers moving to new devices alongside the book and the library.

Each workshop complements the related prototype and involves publishing research content and videos.

Prototypes

  1. Merve - search/BYOD distro 2014 Q1
  2. Leonardo - academic BYOD distro 2014 Q2
  3. AEE/Mute/Booksa/Saarbrucken - BYOD distro to OER 2014 Q3
  4. Public Library/McLuhan - public BYOD (iPad/tablet) distro 2014 Q3

Hacklabs

  1. Merve - search/BYOD distro/post-processing book scanning May 2014 Q2
  2. Leonardo - academic BYOD distro July 2014 Q3
  3. AEE/Mute/Booksa/Saarbrucken - BYOD distro to OER Sept 2014 Q3
  4. Public Library/McLuhan - public BYOD (iPad/tablet) distro Nov 2014 Q4

Phase C - post HP planning and partner projects

7. Hybrid Publishing Consortium Stiftung - Feb 2014

Based in Lüneburg, Mute Publishing and partners will form a Stiftung (foundation) to develop the consortium. It is important that the consortium is formed as a vendor- and institution-independent organisation to allow open and fair collaboration. Further grant funding for the project will be sought.

8. Indy Portal (Partner Project, external to HPC) - April 2014

Indy Portal is a partner project headed by Nätverkstan and Mute Publishing together with a collection of other publishers and publishing networks. It is a distribution and sales web portal for small publishers, using ePub 3.0 for books in browsers and adopting the open, mixed IPR model of the US publisher Flat World Knowledge. The publications would be curated and would facilitate transmedia publishing. The objective of the Indy Portal is to offer fair prices for digital books and to bypass the monopolies of online book distributors.

Phase D - further development

9. Distribution API - June 2014

The Distribution API would add distribution functionality to Typesetr Academic. API-based distribution allows whole documents, or specific sections of documents, to be distributed automatically. Automated distribution is particularly important for Open Access repositories. It also allows documents to be synchronised across different systems and supports interaction with features like annotation and usage analytics.
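The whole-or-section distribution and synchronisation described above could look roughly like the following in-memory Python sketch. Everything here is an assumption for illustration: `Document`, `DistributionAPI`, the content-hash fingerprint and the version counter are hypothetical, not the planned Typesetr interface.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    sections: dict  # section_id -> text; dicts preserve insertion (reading) order
    version: int = 1

    def fingerprint(self):
        """Content hash a downstream system can compare to decide whether to re-sync."""
        joined = "\n".join(f"{sid}:{text}" for sid, text in self.sections.items())
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()


class DistributionAPI:
    """Hypothetical in-memory stand-in for a distribution endpoint."""

    def __init__(self):
        self._docs = {}

    def publish(self, doc):
        self._docs[doc.doc_id] = doc

    def get(self, doc_id, section_id=None):
        """Return the whole document, or a single section if section_id is given."""
        doc = self._docs[doc_id]
        if section_id is None:
            body = "\n\n".join(doc.sections.values())
        else:
            body = doc.sections[section_id]
        return {"doc_id": doc_id, "version": doc.version,
                "fingerprint": doc.fingerprint(), "body": body}

    def update_section(self, doc_id, section_id, text):
        doc = self._docs[doc_id]
        doc.sections[section_id] = text
        doc.version += 1

    def needs_sync(self, doc_id, known_fingerprint):
        """True if a copy held elsewhere (e.g. in an OA repository) is out of date."""
        return self._docs[doc_id].fingerprint() != known_fingerprint


api = DistributionAPI()
api.publish(Document("paper-1", {"abstract": "Summary.", "intro": "Opening."}))
whole = api.get("paper-1")            # what a repository harvests
intro = api.get("paper-1", "intro")   # a single section for reuse
api.update_section("paper-1", "intro", "Revised opening.")
print(api.needs_sync("paper-1", whole["fingerprint"]))  # True
```

A content hash is used here so that a receiving system can detect change without downloading the full text; a real service might use timestamps or revision IDs instead.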

Other projects, workshops, publications and prototypes

Completed

In progress

Postponed or shelved

3. Design Research (Hacking methodologies)

Design research can be summarised as learning through a process of making. Our version of Design Research also combines cutting-edge Open Innovation and Service Design: a version of Open Innovation borrowed from the Free Culture13 activist movement, and Service Design drawn from hyper-capitalist product development, with its need to differentiate products in an ever-increasing ‘product flood’ and to mix online and offline consumer relations.

Figure: The diagram lists the methodologies or design processes that the Consortium uses in its research, technology prototyping and product development work

The consortium mantras

Two key goals for the project have been in place since its outset:

Goals

Single source and dynamic publishing

Single source and dynamic publishing are important, as they enable efficient workflows, bring new publishers into the market and allow new types of hybrid publishing to emerge easily, e.g. social reading, fluid curriculum building and new editorial groupings. Without single source publishing, the failures of the multimedia age will be repeated: high-cost, time-intensive authoring environments for products that quickly become redundant.

The Hybrid Publishing Consortium

The consortium has to balance collaborative transdisciplinary research with the rigours of product development. The single source can be seen as an infrastructural component that allows many products to be plugged in, to emerge from it or to be built on top of it. The consortium is a place for these cooperations, conversations and dialogues to take place with experts in other parts of the publishing chain.

What can be achieved within the timeframe?

We have divided the tasks into four areas put together from the team’s combined expertise:

  1. Typesetr software product - Typesetr is the software tool that the consortium will develop and implement as a single source publishing architecture performing multi-format publication conversion. The program will be developed, tested and launched during 2013.
  2. Qualitative research - Studying patterns of scholarly research behaviour: how stakeholders collect, organise and annotate their work. What tools are they using, and what measurements are available? What kind of metrics might be useful for authors?
  3. Prototype dynamic publishing, future scenarios of publishing - A single source publishing architecture allows publications to be distributed easily and to have further algorithmic processing applied to their content, such as language translation and analytics.
  4. Open research - a consortium web portal with open research and community outreach to industry, the open source community, publishers and other research groups. The objective is to foster long-term partnerships to improve the numerous technical components of publishing infrastructure that need it, and to share visions, inspiration and the latest developments.

Hybrid Methodologies for Transdisciplinary Research and Design

The Hybrid Publishing Consortium is currently a group of publishers, academics, artists, designers and programmers working across multiple disciplinary fields and research interests, while also weaving across theory and practice. These activities will require mixing, matching and hybridizing methods as appropriate. However, there are a few overarching principles and methods we will employ:

Systems Thinking. Simply put, systems thinking is the process of understanding how things, regarded as systems, influence one another within a whole.

Participatory Design.15 In the design of new open-source tools for publishing, the HPC will rely on, and be informed by, users as co-creators and end-users, following participatory design’s history of Democratizing Innovation16 (von Hippel, 2005). The group will carry out a wide variety of research activities, events and workshops with our stakeholders to develop future scenarios and ‘publishing futures.’ Due to the complex nature of our research area, we aim to make the design process open and transparent to interested parties via our Open Research Platform.

Design Thinking17 and ‘Research through Design.’ The process will integrate various tools and methods of Design Thinking to inform our process and outcomes. Complexity mapping, modeling, and rapid prototyping are simply means for us to learn about our given subjects through the process of making.

Service Design. The design process will entail the development of new product-service systems with the intent of developing future business endeavours. Various tools for defining and redefining areas of innovation will be used to develop future publishing scenarios, business models and relationships within a given system.

These overarching processes may be combined with additional theoretical frameworks, tools and methodologies we have identified below:

  1. Systems mapping - a tool to understand relationships within any given system, e.g. standards, stakeholders, comparators, tech stack etc.
  2. Publication analysis - creates a taxonomy of publications, traditional and emerging, by breaking down academic documents into their constituent parts: content, structure and layout. This allows us to find commonalities between documents and ensure that documents are machine-readable and thus computable.
  3. Knowledge galaxies18 - as publishing moves from print to digital and on to hybrid, its technologies and conceptual models change. Part of our work is to map out the new forms of digital and hybrid publishing.
  4. Workflows - workflows are the steps needed to create, use and read a publication. They can cover conventional publishing (commissioning, editing and layout) as well as new platforms like Bob Stein’s Social Book project, which features group annotation. Since we consider workflows to include the creation and use of published materials, a reader or a librarian can be part of a workflow.
  5. T-PINC - Technology, Power, Ideology, Normativity and Communication19 - discourse analysis related to how technology is developed and used.
  6. Software requirements - a process to build up clear requirements for software development, primarily based on user journeys.
  7. DSDM20 - Dynamic Systems Development Method - Agile software project management.
  8. Project management - aligning mission/objectives with projects and available resources and constraints.
Figure: Design research processes. The core processes can be repeatedly queried with new project questions. May 2013
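Publication analysis (item 2 in the list above) can be illustrated with a short Python sketch, assuming each document has already been broken down into (role, content) pairs. The functions `structure_profile`, `shared_structure` and `apply_layout` and both sample documents are hypothetical; the point is only that once structure is separated from layout, commonalities between document types can be computed, and layout applied separately.

```python
from collections import Counter


def structure_profile(doc):
    """Count structural roles in a document, ignoring its content and layout."""
    return Counter(role for role, _content in doc)


def shared_structure(doc_a, doc_b):
    """Roles used by both documents: a first hint at a common schema."""
    return set(structure_profile(doc_a)) & set(structure_profile(doc_b))


def apply_layout(doc, layout):
    """Attach presentation from a separate layout table keyed by role."""
    return [(layout.get(role, "body text"), content) for role, content in doc]


# Illustrative documents reduced to (role, content) pairs.
article = [("title", "On Packets"), ("abstract", "A short summary."),
           ("section", "Introduction"), ("para", "Text."), ("reference", "Ref 1")]
chapter = [("title", "Card Indexes"), ("section", "Origins"),
           ("para", "Text."), ("footnote", "Note 1"), ("reference", "Ref 1")]

print(sorted(shared_structure(article, chapter)))  # ['para', 'reference', 'section', 'title']
```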

3.1. TPINC

- Technology, Power, Ideology, Normativity and Communication

TPINC is a method of discourse analysis developed by Jochen Koubek (University of Bayreuth), based on Werner Patzelt’s PINC (Technische Universität Dresden). TPINC was developed to examine decisions about technology development, usage and context. The stakeholders involved in a given situation are examined along the TPINC vectors (Technology, Power, Ideology, Normativity and Communication) to see how and why they take part in shaping a technology’s use and development.

TPINC forms part of the consortium’s design research processes and is employed at various stages: to identify the different stakeholders, to examine more specific questions and to communicate the project’s overall understanding of Hybrid Publishing.

TPINC has been chosen to frame questions about technological infrastructures, what they can enable or facilitate, and how this translates into social change, because there are systematic failures in the institutions of publishing, knowledge and learning: failures in terms of ICT, economics, media diversity, skills and efficacy.

As an example, focusing on the institution of the university, one failure is in the realm of ICT. Universities have become trapped by IPR concerns and controlled by software vendor lock-in, resulting in restrictive approaches to content use. The alternative of open and exploratory approaches has been neglected, making their systems almost redundant in a Bring Your Own Device (BYOD)21 environment, or what can be described as a consumerisation of infrastructure, with smartphones and tablets and the abandonment of the desktop. In publishing, universities have allowed themselves to be exploited through double-dip payments to publishers: paying their own scholars to contribute to publications, then buying back those same publications. In terms of staff training, scholars have close to zero digital scholarly skills. University publishing has narrowed its focus to a select range of peers, not addressing digital opportunities or new audiences. To sharpen the focus even further, in Open Access (OA) publishing the barrier to take-up is not technological but, to a large extent, the individualised self-interest of scholars: their personal or departmental positions and standing are prioritised over a wider dissemination of their work. Hence TPINC is needed to help unpick these complex relations.

A TPINC exercise:

The question: ‘Open Access v closed publishing’

- the failure of publishing by knowledge institutions (universities) and actors/stakeholders

Figure: TPINC Schema - ‘Open Access v closed publishing’

The actors/stakeholders

Scholar

Publisher

Librarian

Student

Teacher

University

Funder

Technology provider

State

Politician

Public(s)

Media activists

News media

Standards bodies

Publishing employee

Broadcasters

Industry

Mapping the actors on the TPINC schema
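One way to record the mapping is a simple table of actors against the five TPINC vectors. The Python sketch below is a hypothetical bookkeeping aid, not part of Koubek’s method: `Vector`, `Actor` and `schema_table` are illustrative names, and the two sample observations paraphrase points made earlier in this section.

```python
from dataclasses import dataclass, field
from enum import Enum


class Vector(Enum):
    TECHNOLOGY = "T"
    POWER = "P"
    IDEOLOGY = "I"
    NORMATIVITY = "N"
    COMMUNICATION = "C"


@dataclass
class Actor:
    name: str
    notes: dict = field(default_factory=dict)  # Vector -> list of observations

    def observe(self, vector, note):
        self.notes.setdefault(vector, []).append(note)

    def gaps(self):
        """Vectors not yet examined for this actor."""
        return [v for v in Vector if v not in self.notes]


def schema_table(actors):
    """One row per actor, one column per vector: number of recorded observations."""
    return {a.name: {v.value: len(a.notes.get(v, [])) for v in Vector} for a in actors}


scholar = Actor("Scholar")
scholar.observe(Vector.POWER, "personal or departmental standing prioritised over wider dissemination")
university = Actor("University")
university.observe(Vector.TECHNOLOGY, "controlled by software vendor lock-in")

print(schema_table([scholar, university]))
```

Listing `gaps()` per actor is a quick way to see which vectors still need discussion before the schema is considered complete.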

3.2. Workflows and stakeholders

Workflows in the publishing process are at the heart of our research. Examining workflows has a threefold benefit: they show what software requirements need to be addressed, what efficiencies can be made and, most importantly, the preferences and habits of the user.

In order to articulate the preconditions and potential of dynamic publishing, we will engage with a variety of stakeholders, ranging from individuals to scholarly institutions and indy publishers. We will build a series of engagement activities to strengthen tools, resources, skills and knowledge for single source document structuring and multi-format, Open Access publishing.

We start by analysing the workflows of our partners and stakeholders through a mixture of formal and informal processes. We want to find out how knowledge circulates in a collaborative, co-editing group and how the group is organised around its publishing goal. By workflow we mean all aspects of a publication’s life, from the research process before authoring to the publication going into circulation and use, and finally its long-term preservation. For building software (Typesetr) in particular, the study of workflows is indispensable: the software needs to integrate into a chain of interlocking processes, technical constraints, real-world situations and users’ personal habits, causing minimal disruption. In a broader sense, studying workflows will give us an overview of skill levels, economics and efficiencies. Beyond the workflows of our stakeholders, we will investigate the changes and challenges of post-digital scholarship and the transition from conventional book publishing to unbound, hybrid forms of knowledge dissemination.
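As a hedged illustration of what a workflow study can surface, the Python sketch below records a workflow as ordered steps and lists the hand-overs where the file format changes, which is where manual conversion, and another edit and proof, tends to creep in. The `Step` record, the `conversion_points` function and the sample workflow are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    stage: str  # where in the publication's life this happens
    actor: str  # who does it, including readers and librarians
    tool: str   # the software or medium actually used
    fmt: str    # the file format handed on to the next step


def conversion_points(workflow):
    """List every hand-over where the file format changes between consecutive steps."""
    return [
        (a.stage, b.stage, a.fmt, b.fmt)
        for a, b in zip(workflow, workflow[1:])
        if a.fmt != b.fmt
    ]


# A hypothetical partner workflow, as it might be recorded during a workflow study.
workflow = [
    Step("authoring", "author", "word processor", "docx"),
    Step("editing", "editor", "word processor", "docx"),
    Step("layout", "designer", "DTP software", "indd"),
    Step("circulation", "reader", "eReader", "epub"),
    Step("preservation", "librarian", "repository", "epub"),
]

for hand_over in conversion_points(workflow):
    print(hand_over)
```

Each reported hand-over is a candidate point where a single source approach could remove a manual conversion step.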

Our approach is designed to focus on each stakeholder’s singular engagement in processes that allow for co-creative development of new tools and strategies (see section 3. Design Research (Hacking methodologies)). This will simplify the work process, making works available in multiple formats, envisioning new forms of publishing and finding new readers. The design research process will introduce a reciprocal knowledge exchange platform, hosting a number of workshops and prototyping sessions. Aiming to discover spaces of opportunity (e.g. alternative tools, resources, networks and skills), we will facilitate the imagination of a speculative publishing landscape, its potential futures and practical implications. The outcome of our research will guide the development of our single source conversion platform, Typesetr, as well as other research projects within the Hybrid Publishing Lab. We will make both our research results and our approach transparent by publishing them on the Consortium’s Open Research Platform.

In concrete terms:

Stakeholders

3.3. Open research - a digital strategy

The Hybrid Publishing Consortium will partner with Nätverkstan, Leonardo Electronic Almanac, Eurozine, Mute, SourceFabric, and a variety of other publishers to produce a shared open research platform. The platform is a way for the consortium to carry out open innovation, with the objective of supporting the creation of an open source infrastructure for publishing.

Address: http://consortium.io

Start date: Aug 2013

Digital strategy

A digital strategy is based on clearly defining your values, message and audience. It determines how you communicate these values to your audience and engage it with them, the audience being made up of staff/members, community and stakeholders. A digital strategy then helps select which tools and platforms are used and what content is produced, as well as the methods of measurement and evaluation used to continually develop the strategy.

Open research

By its nature the Hybrid Publishing Consortium needs to partner with a variety of vendors and research groups to enable the expansive technology stack needed for open publishing systems.

An open research strategy allows us to create a greater profile for our research, gain contributions to our research through questionnaires and enter into collaborations with commercial vendors, open source projects and research institutions.

Examples of open research platforms

The portal would publish the following:

3.4. Single source - system requirements

Figure: Single source workflow diagram

Multi-format publishing based on a single source publishing technical model?

The system requirements process is one of our design research processes. These also include the TPINC process, which is used to identify the stakeholders who then take part in the system requirements process. Stakeholders are vitally important for gathering the constraints placed on software that is meant to solve real-world problems.

The process has three phases that are applied repeatedly to produce a finished requirements document. Once this process is complete, the document enters an agile development process in which the design of the software is reworked again at various stages.

Three repeated phases

These three phases are applied repeatedly as we move through the following stages of planning refinement:

  1. Identifying a real-world problem that can be solved and testing this assumption - in our case we begin from the problem that current multi-format publishing systems are ‘inefficient, costly and error prone’.
  2. Consulting with stakeholders/users
  3. Feasibility study - gathering other areas of knowledge to ensure a functioning system can be made; these include establishing our goals, domain knowledge and stakeholders.
  4. A variety of modelling exercises are undertaken at different stages to address different issues: conflicts of interest in requirements, identifying problems to be solved, sorting out system requirements from software requirements etc.
  5. Translation and validation - final documentation has to be made, including schedule, costs and risks
Figure: Sommerville, 2011, requirements process diagram

Preliminary single source system and software requirements (summary)

  1. Outputs - ePub, mobi, designed PDF (screen and print-ready, with covers for the print-ready version), HTML5, TEI, DITA, XML and master document.
  2. Inputs - Google Docs, docx, database, InDesign, Scribus, MS SharePoint, and real-time text editors such as OX Documents https://ox.io/ox_text.
  3. Metadata - Distribution metadata: the ability to create a master document, or single authoritative source, for all of a publication's distribution metadata, and to output the metadata required by each output format. Usage metadata: consider requirements for usage metadata such as learning and reading analytics.
  4. Metadata inputting - allow metadata to be entered at any time by different workflow members, each contributing the knowledge they hold.
  5. Markup - academic markup, styling, citation and reference management.
  6. Structured document - machine readable document, semantic markup, TEI, OAI, DITA.
  7. User feedback for structured document editing - comments on the document prompt users to make corrections and re-input them into the system. These comments should be two-tier, generated by two separate methods: automatically by the machine and manually by an admin operator.
  8. Comments and annotation - what standard do we use for user/reader annotation? W3C Annotea, EU Framework Programme 7 (FP7) projects, industry use (Amazon, MS, OO, Adobe).
  9. Checking-in/out (round tripping) - the system can be treated as a versioning system, where users can check documents in and out and roll them back to any version. Could different document types (docx, InDesign, Google Docs etc.) be round-tripped, accepting the changes made in any of these editing environments?
  10. Editing templates - for example, editing LaTeX templates with ShareLaTeX https://www.sharelatex.com/. Multiple templates are needed if we're outputting multi-format publications, with a permissions system restricting them to groups and users.
  11. Heuristic conversion features - the system learns about formatting and conversion for specific publishers and users, as well as common shared rule sets.
  12. API distribution - automatic distribution as well as translation to different formats, for example from TEI to OAI.
  13. Workflow, users/group access control - because documents will be used by a variety of different users, for example when outsourcing proofing or layout, we need a set of workflow user/group access control functionality.
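Requirement 9 treats the system as a versioning system. A minimal Python sketch of the check-in/check-out and rollback behaviour might look like the following, assuming an in-memory store and plain-text document content. The class and method names are hypothetical; a production system would more likely build on an existing version control backend.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VersionedDocument:
    """Hypothetical model of check-in/check-out with rollback (requirement 9)."""

    name: str
    versions: List[str] = field(default_factory=list)
    checked_out_by: Optional[str] = None

    def check_out(self, user: str) -> str:
        # Only one workflow member may edit the document at a time.
        if self.checked_out_by is not None:
            raise RuntimeError(f"{self.name} is checked out by {self.checked_out_by}")
        self.checked_out_by = user
        # Hand back the latest version (empty for a new document).
        return self.versions[-1] if self.versions else ""

    def check_in(self, user: str, content: str) -> int:
        # Only the user holding the check-out may store a new version.
        if self.checked_out_by != user:
            raise PermissionError(f"{user} has not checked out {self.name}")
        self.versions.append(content)
        self.checked_out_by = None
        return len(self.versions)  # 1-based number of the new version

    def rollback(self, version: int) -> int:
        # Restore an earlier version by re-appending it, so no history is lost.
        if self.checked_out_by is not None:
            raise RuntimeError(f"{self.name} is checked out by {self.checked_out_by}")
        self.versions.append(self.versions[version - 1])
        return len(self.versions)


doc = VersionedDocument("chapter-1")
doc.check_out("author")
doc.check_in("author", "First draft")         # version 1
doc.check_out("editor")
doc.check_in("editor", "Copy-edited draft")   # version 2
doc.rollback(1)                               # version 3 restores the first draft
print(doc.versions)
```

Rolling back by appending rather than truncating keeps the full history, which matters when several workflow members (author, editor, proofreader) touch the same document.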

4. Technology

4.1. Open infrastructure for publishing

A technology stack is a layered set of components and services needed to perform a given task. A classic open source technology stack for a web server is known as LAMP (Linux [operating system], Apache [HTTP server], MySQL [database] and PHP [scripting language]). In a stack, components can be swapped out for a different version or vendor; for example, Linux could be replaced with Windows. Doing so, of course, would mean the stack was no longer fully open source.
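To make the idea of swappable layers concrete, the following Python sketch models a stack as a mapping from layer to component, each flagged as open source or not. The data structure and the function names are illustrative assumptions, not part of any real stack tooling.

```python
# Each layer maps to (component, is_open_source).
LAMP = {
    "operating system": ("Linux", True),
    "HTTP server": ("Apache", True),
    "database": ("MySQL", True),
    "scripting language": ("PHP", True),
}


def swap(stack: dict, layer: str, component: str, is_open_source: bool) -> dict:
    """Return a copy of the stack with one layer's component replaced."""
    if layer not in stack:
        raise KeyError(f"unknown layer: {layer}")
    new_stack = dict(stack)
    new_stack[layer] = (component, is_open_source)
    return new_stack


def fully_open_source(stack: dict) -> bool:
    """A stack is only open source if every one of its layers is."""
    return all(is_open for _, is_open in stack.values())


windows_stack = swap(LAMP, "operating system", "Windows", False)
print(fully_open_source(LAMP))           # True
print(fully_open_source(windows_stack))  # False
```

Because `swap` returns a copy, alternative stack selections can be compared side by side without altering the reference stack.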

In the case of the consortium, the technology stack is core to what we do, as it is how an open source public infrastructure for digital publishing is delivered. The consortium looks to identify, support and partner with third parties to enable an open infrastructure for publishing, which requires us to choose a stack and determine how it can be applied to scholarly publishing.

The publishing stack is vital for our scholarly publishers, since nearly all of them belong to the 80% of publishers who are priced out of the opportunities the stack offers: access to new markets and revenues. The annual end-to-end cost of running a single source, XML first workflow is in the region of 100,000€. Hence it is the consortium's top priority to ensure publishers can access this technology at an affordable rate. The dynamic publishing market means that publishers can offer content in bulk subscriptions to institutions, distribute title listings to online repositories and distributors, and pursue a wide variety of other publishing options to reach readers. Currently most publishers use cheap off-shore conversion providers to make digital publications like eBooks. This is satisfactory as a low-quality, short-term measure while the digital publishing market matures, but unless publishers make use of the single source publishing model they will continue to see falling profits and diminishing resources, and risk closure.

The stack is provided by a variety of vendors, research groups, open source communities, industry and standards bodies. The consortium’s contribution is in the specialisation of single source multi-format conversion.

Publishing stack - pricing comparison - proprietary v open source

This stack shows the basic requirements for multi-format publishing. The pricing is based on three years of end-to-end costs. The costing does not include publication production costs.

The overall publishing stack can be extended in many ways depending on the specific workflow, for example if social reading or OER are introduced. As with the LAMP web server stack mentioned above, vendors and services can be swapped out for alternatives.

The stack can have various workflows applied to it to make custom stack selections, or paths through the stack.

Figure: proprietary v. open publishing stack price comparison

4.2. Single source and dynamic publishing

There are a variety of competing approaches to single source publishing; below is one route that we have identified as tried and tested.

XML first workflows

Single source publishing or 'XML first workflows' enable what is known as dynamic content publishing. Dynamic content publishing is vital for publishers to enter new markets and reach their target audience in a time of digital disruption. It allows publications to become fully digital so they can be automatically converted, styled, broken up and distributed.

Dynamic publishing is where content can be queried and specific components brought together and published in automated workflows, in multiple formats, to multiple channels. An example convention in dynamic publishing is known as 'topic based authoring'. For example, for a reading group, a set of essays, chapters and book sections could be automatically collated and published both as a full-text multi-format publication and as a feed of titles and links for a mobile app. Topic based authoring is commonly used in customer help systems or large technical systems such as aeronautical engineering.
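The reading-group example can be sketched in Python as follows, assuming each content component (topic) carries a set of tags. The Topic class, the tag names and the base URL are hypothetical; real topic-based systems such as DITA select and assemble topics through maps rather than ad hoc queries like this one.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Topic:
    topic_id: str
    title: str
    body: str
    tags: FrozenSet[str]


def query(topics: List[Topic], tag: str) -> List[Topic]:
    """Select the content components carrying a given tag."""
    return [t for t in topics if tag in t.tags]


def full_text(topics: List[Topic]) -> str:
    """Collate the selected topics into one full-text publication."""
    return "\n\n".join(f"{t.title}\n{t.body}" for t in topics)


def app_feed(topics: List[Topic], base_url: str) -> List[dict]:
    """Publish the same selection as a feed of titles and links."""
    return [{"title": t.title, "link": f"{base_url}/{t.topic_id}"} for t in topics]


topics = [
    Topic("essay-1", "On Hybrid Publishing", "Body of the essay.", frozenset({"reading-group"})),
    Topic("ch-2", "Single Source", "Body of the chapter.", frozenset({"reading-group", "technical"})),
    Topic("appendix", "Glossary", "Glossary entries.", frozenset({"reference"})),
]
selection = query(topics, "reading-group")
print(full_text(selection))
print(app_feed(selection, "https://example.org/topics"))
```

The same query result feeds both outputs, which is the point of the convention: the collection is defined once and published to several channels.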

The ‘XML first workflow’ approach to single source has been around for over ten years and was preceded by other languages such as SGML. The objective is to structure the content into content components, as opposed to leaving documents as single unstructured entities, as Word or PDF documents commonly are.

The majority of large companies and publishers have adopted this XML first workflow model, but the costs and resources involved are far beyond the means of most publishers, academics or modestly resourced universities. System costs start in the hundreds of thousands of euros, the systems take around a year to build, and then there are the annual running costs. Since 80% of publishers have turnovers under €4 million and too few titles per annum to justify the entry cost, these types of dynamic publishing systems have stayed out of their reach.

This architecture involves the following building blocks: a CCMS (Content Component Management System), an XML editing/authoring environment, a content legacy system, a choice of XML output and a templating engine.

Architecture component | Example provider | Example standards
CCMS (Content Component Management System) | Alfresco (overlaid) or DitaToo http://ditatoo.com/ | Not applicable
XML editing/authoring environment | DITA tool kit (open source) http://en.wikipedia.org/wiki/DITA_Open_Toolkit; list of editors http://en.wikipedia.org/wiki/Comparison_of_XML_editors | Not applicable
Content legacy system | Stilo - Migrate tool http://www.stilo.com/products/conversion/ | Custom pipelines
XML output standard | DITA tool kit for publishers http://dita4publishers.sourceforge.net/ | DITA 3.0
Templating engine | Freemarker http://freemarker.org/fmVsXSLT.html | XSLT

Structured writing and DITA

Darwin Information Typing Architecture (DITA) is a framework and tool set made open source by IBM in 2005, and it appears to have taken over from older formats such as DocBook. DITA allows for what is known as structured writing, where content is marked up as it is created, either by the author inputting into templated fields or by a follow-up process. With structured writing, content can easily be reused, opening up new publishing opportunities.
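As an illustration of structured writing, the Python sketch below parses a small DITA concept topic with the standard library's ElementTree and pulls out its title, short description and paragraphs, ready for reuse elsewhere. The topic content is invented for the example, and the DOCTYPE declaration that a real DITA file would carry is omitted to keep the sketch self-contained.

```python
import xml.etree.ElementTree as ET

# A minimal, illustrative DITA concept topic (DOCTYPE omitted).
TOPIC = """<concept id="single-source">
  <title>Single source publishing</title>
  <shortdesc>One master document, many output formats.</shortdesc>
  <conbody>
    <p>Content is converted automatically into each format.</p>
  </conbody>
</concept>"""


def summarise(topic_xml: str) -> dict:
    """Extract the reusable, semantically marked-up parts of a topic."""
    root = ET.fromstring(topic_xml)
    return {
        "id": root.get("id"),
        "type": root.tag,                       # e.g. concept, task, reference
        "title": root.findtext("title"),
        "shortdesc": root.findtext("shortdesc"),
        "paragraphs": [p.text for p in root.iter("p")],
    }


print(summarise(TOPIC))
```

Because the title and short description are explicit elements rather than formatting, they can be lifted into a table of contents, a search index or an app feed without any guesswork.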

DITA community http://www.ditawriter.com/

History http://www.digplanet.com/wiki/Darwin_Information_Typing_Architecture

Our target users and the consortium's contribution to this architecture

For the smaller publisher these technologies have not been affordable or might seem irrelevant to their needs. The task of the consortium is to make these architectures affordable for the majority of smaller publishers, which make up around 80% of the market. Our research project makes the case for the relevance of this approach by exploring ways in which these small publishers can be enhanced by dynamic publishing.

5. The Hybrid Publishing Consortium

The objective of the Hybrid Publishing Consortium is to support the building of a Free Software compliant software infrastructure for digital publishing. An infrastructure for publishing would cover core issues as well as adopt new components as it develops. The consortium would be formed of a number of partnerships, with components of the infrastructure being provided by a variety of publishers, technology vendors, open source communities, research groups and industry bodies.

Core publishing infrastructure issues are:

Quality assurance, Free Software, open standards and open IPR

As well as being Free Software compliant, the infrastructure would be based on open standards and advocate open IPR content models.

Within the area of open source, the consortium's priority is to encourage quality assurance, and together with partners we would look to use our own design research processes and partner expertise to contribute to this important issue.

Open innovation

The infrastructure project is part of the consortium's open research strategy, allowing for sharing and partnership between different partners and the wider community. The consortium's specialisation in single source and dynamic publishing also finds a place within its infrastructural approach. This is especially valuable as the technology we are focusing on can be applied in many different areas, and it is in our interest to have partners create a variety of products based on our technology.

Governance and long term plans

The consortium has to balance the interests of the different partners, and this requires that the consortium is a neutral body with transparent decision making. Independence from institutions and vendors can be ensured by constituting the consortium as a Stiftung (foundation), or other non-profit, in late 2013.

The consortium is currently based at the Lüneburg Innovation Inkubator, a major EU project within Leuphana University of Lüneburg, financed by the European Regional Development Fund and co-funded by the German federal state of Lower Saxony. The Inkubator project comes to an end in July 2015, and over this period the consortium will develop plans for continued operation. Part of these long-term plans is to create spin-off companies and commercial collaborations in the Lüneburg region and internationally, as part of our business incubator funding remit.

History

The consortium had its beginnings as a Mute Publishing project led by Simon Worthington, Mute Publishing's Director of Digital, and Pauline van Mourik Broekman, Mute Publishing Director, when Mute magazine adopted an OA policy in 2004. Subsequently, in 2007, the issue of single source publishing was first approached with the help of Mute's systems administrator Darron Broad at a meeting with the Open Source Publishing [23] organisation in Brussels. The work continued with support from the London Development Agency and Angle PLC, leading in 2009 to the launch of a POD production company, PicoPress. Later, over 2012, it continued in a Technology Strategy Board funded metadata R&D project on eBook conversion. In 2012 the consortium project became part of the Hybrid Publishing Lab, Innovations Inkubator, Leuphana University.

5.1. Lab and people

The Hybrid Publishing Lab

The Hybrid Publishing Consortium is a project of the Hybrid Publishing Lab. The wider lab is staffed by eighteen people: professors, visiting professors, doctoral and post-doctoral researchers, and research associates. The Lab covers the research areas of Open Access, open learning and the future university press, and is also creating a press for the Centre for Digital Cultures.

The consortium is an integrated part of the lab's research, with all lab members giving input and feedback on the consortium's work. Lab team members periodically move between projects or work on joint collaborations like workshops and colloquiums.

It is also important to emphasise that the lab sits within the Innovations Inkubator, which specialises in digital media and combines research areas covering industry topics such as gamification and public service broadcasting with an approach informed by the humanities and critical discourse.

People

Working team

Simon Worthington - project leader

More than twenty years of experience as an independent publisher. Co-founder of Mute magazine. As Mute's Director of Digital, Simon has been involved in managing numerous independent and activist media technology projects, funded by the London Development Agency, the UK Technology Strategy Board and the Open Society Institute. His experience covers business, organisational and innovation management. Most recently, until 2012, he led 'Art of Digital London (AoDL)', a digital strategy training project for Arts Council England with two hundred and seventy-five London clients. Simon studied at UCL, London and CalArts, California.

Agata Królikowski - information architect

Agata Królikowski studied law and informatics at the Humboldt University in Berlin, where she subsequently worked in the field of informatics in education and society. In 2012 she joined the Innovation Inkubator at the Leuphana University Lueneburg. As a PhD student her research focuses on the intersection between informatics and laws regarding privacy protection, copyright and the control of information in general. She is the spokesperson for the working group "Internet and Society" of the German Informatics Society.

Minuette Le - strategic design

Minuette is a designer and researcher with an interest in open innovation systems and re-thinking public services. She has a background in both business and design and completed her MFA in Transdisciplinary Design at Parsons the New School in New York, an experimental curriculum focused on design-led research, systems thinking, and service design. Her practice involves design across disciplinary fields ranging from community-based projects and public services to emergent digital futures. Her work at the Hybrid Publishing Lab is researching the Post-Digital Scholar and shifting research and learning paradigms.

Ulrike Gollner - interaction designer/researcher

Ulrike is an interaction designer and researcher with a background in computer science and media arts. She completed her MA in Interface Cultures at the University of Art and Design Linz and BSc in Engineering in Media Technology and Design at the University of Applied Sciences Upper Austria. She brings many years of experience from Ars Electronica Linz and the Design Research Lab of the University of the Arts Berlin, most recently designing and developing hardware and software prototypes in the context of disability-inspired interaction design. Her work at the Hybrid Publishing Lab explores emerging scholarly practices in today’s paradigm of 'open everything'.

Johannes Amorosa - developer

Johannes studied at the Academy of Media Arts in Cologne, Germany, and finished his diploma in audiovisual media in 2012. Since then he has worked as a pipeline developer at the Berlin-based visual effects company Celluloid VFX. At the consortium and the Hybrid Publishing Lab he is responsible for engineering the IT infrastructure. He is committed to open source and Linux in particular.

Christina Kral - stakeholder coordinator/blue sky facilitation strategist.

As an artist she focuses on forms of participation and utopian idea development. At Leuphana University she researches the future of learning, concentrating on alternative forms of knowledge production and on visual as well as contextual translations of existing knowledge to reach new publics. The approach raises the questions of what these forms will look like and how they will influence the way we learn and ultimately alter educational institutions.

5.2. Partners

The consortium is able to carry out publishing research that is beyond the scope of many publishers and academics because of their workloads and resource constraints. Through our Innovations Inkubator we look to share this research using open research methods and Free Software IPR models, providing real-world solutions to the publishing issues these publishers and academics face.

In relation to industry groups, which are much more advanced in the development of single source and XML first workflows, our work plays a role in translating how these approaches can be used by the smaller scholarly publisher.

By its expansive nature, the consortium's publishing infrastructure project is made up of collaborations between domain specialists, so partnership is an integral part of this work.

Current partnerships and associates

Partnerships

Merve Verlag

Hogeschool - Amsterdam and Rotterdam

Centre for Disruptive Media, Coventry University

Leonardo Electronic Almanac

Associates

Sourcefabric

Xm:lab Saarbrucken University

Publishers

The consortium is working with a series of publishers that cover a wide spectrum of academic publication types to ensure that we can capture the requirements of their workflows. We work with the publishers in a variety of ways, from analysing the workflow of a specific publication type, such as the academic monograph, to in-depth collaboration on issues like BYOD bulk sales to institutions.

The consortium also works with a number of publisher support agencies such as Eurozine, Nätverkstan and Independent Art Publishers, who have a combined publisher list of over three hundred publishers to consult with.

Current publisher list

Autonomedia, b_books, Dienadel, Digital Culture in Education, Enrica Picarelli, ISEA Leonardo Electronic Almanac, Liquid Books, Merve Verlag, Mute, Post-Media Lab, r0g_agency, Kuda, Booksa and Berliner Gazette.

Under consideration as part of Eurozine

5.3. Comparators

Single source publishing has its origins in two technical worlds. Firstly, technical documentation, engineering, legal documentation and help system publishing. Secondly, networked computing and the development of the Internet, as personified by the Web as a universal document system. The application of single source in the humanities and literature is less common. This makes finding comparators more difficult and requires a degree of translation and interpretation of relevant projects.

We have chosen two categories of comparable projects, single source and publishing infrastructure consortiums, and divided these into industry groups and research groups.

Single source

Industry

Commercial providers

Woodwing http://www.woodwing.com/

Stilo http://www.stilo.com/

Le-tex http://www.le-tex.de/en/

Open source, open standards

DITA Toolkit, IBM http://en.wikipedia.org/wiki/DITA_Open_Toolkit

Sphinx - python document generator http://sphinx-doc.org/

LaTeX http://www.neodoc.biz/fr/calenco/index.html

Research

Consortium

Industry

IDPF (International Digital Publishing Forum) http://idpf.org/

OASIS (Organization for the Advancement of Structured Information Standards) https://www.oasis-open.org/

W3C (World Wide Web Consortium) http://www.w3.org/

Research

LGM (Libre Graphics Meeting) http://libregraphicsmeeting.org/2013/

Digital Publishing Toolkit http://digitalpublishingtoolkit.org/

IfBook - The Institute for the Future of the Book http://www.futureofthebook.org/blog/

Related

There are a variety of projects which do not correspond exactly to the Hybrid Publishing Consortium, but whose values or leadership in their field create a strong affinity.

Open Archives Initiative, Cornell University http://www.openarchives.org/

Open Knowledge Foundation http://okfn.org/

Data Liberation Front http://www.dataliberation.org/

Appendix

Glossary

Dynamic publishing - Dynamic publishing is a digital workflow where the publication processes are automated–layout, multi-format conversion, distribution, rights management, file transfer, translation workflows, document updates, payments and reading metrics. Secondly, dynamic publishing gives access to new audiences and revenues, because the publication is available on request for reuse and recombination, and can be automatically supplied to libraries, repositories and Web 2.0 reading platforms. Dynamic publishing is enabled by machine-readable, structured document formats, such as IBM's FLOSS framework Darwin Information Typing Architecture (DITA).

Publishing API - An Application Programming Interface (API) is how one computer program talks to another. For publishing, the API acts as an automated place where distribution and its related activities happen: orders, file transactions, metrics, payments etc. A publishing API allows for free and paid content to be distributed. With an API a publisher can connect directly to a library information system, or a new Web 2.0 reading service can initiate its own connection to the API and start interacting with content. Example publishing APIs are Pearson and The Guardian, see: http://developer.pearson.com/ and http://www.theguardian.com/open-platform. API listings site http://www.programmableweb.com/apitag/?q=publishing
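A publishing API can be sketched, without any web framework, as a function that maps a request path to a status code and a JSON body. The routes (/titles and /titles/<id>), the catalogue entries and the prices below are hypothetical and are not modelled on the Pearson or Guardian APIs.

```python
import json

# Hypothetical catalogue; IDs, titles and prices are illustrative only.
CATALOGUE = {
    "book-001": {"title": "Example Monograph", "formats": ["epub", "pdf"], "price_eur": 0.0},
    "book-002": {"title": "Example Reader", "formats": ["epub", "html"], "price_eur": 9.99},
}


def handle_get(path: str) -> tuple:
    """Return (status, JSON body) for a read-only GET request."""
    parts = [p for p in path.split("/") if p]
    if parts == ["titles"]:
        # Title listing, e.g. for a library system or repository to harvest.
        listing = [{"id": key, "title": item["title"]} for key, item in CATALOGUE.items()]
        return 200, json.dumps(listing)
    if len(parts) == 2 and parts[0] == "titles" and parts[1] in CATALOGUE:
        # Full record for one title, including formats and price (free or paid).
        return 200, json.dumps({"id": parts[1], **CATALOGUE[parts[1]]})
    return 404, json.dumps({"error": "not found"})


print(handle_get("/titles"))
print(handle_get("/titles/book-002"))
print(handle_get("/titles/unknown"))
```

Keeping the handler independent of a server makes it easy to test; in practice it would sit behind an HTTP framework, with orders and payments added as further routes.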

Single source - single source is a technical architecture for multi-format digital publishing. It means you have one master document that is automatically converted into any required digital publication format, e.g. eBook, POD, PDF etc. This type of architecture is required because current parallel formatting workflows are inefficient and low quality. With single source, any document update can be automatically distributed to all the output formats. In current digital workflows each output format needs to be updated individually, making digital publishing prohibitively expensive and almost impossible.
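The single source principle can be illustrated with a minimal Python sketch: one master document held as structured data, and a set of renderers that regenerate every output format from it. The dictionary layout and the two renderers (HTML and plain text) are assumptions chosen for brevity; a real pipeline would render formats such as EPUB or PDF, typically via XSLT or a templating engine.

```python
from html import escape

# Hypothetical master document: the single source for every output format.
master = {
    "title": "Dynamic Publishing",
    "sections": [
        ("New Platforms", "Content is structured once."),
        ("New Readers", "Outputs are generated automatically."),
    ],
}


def to_html(doc: dict) -> str:
    parts = [f"<h1>{escape(doc['title'])}</h1>"]
    for heading, text in doc["sections"]:
        parts.append(f"<h2>{escape(heading)}</h2>\n<p>{escape(text)}</p>")
    return "\n".join(parts)


def to_text(doc: dict) -> str:
    parts = [doc["title"].upper()]
    for heading, text in doc["sections"]:
        parts.append(f"{heading}\n{text}")
    return "\n\n".join(parts)


RENDERERS = {"html": to_html, "txt": to_text}


def publish(doc: dict) -> dict:
    """Regenerate every output format from the one master document."""
    return {fmt: render(doc) for fmt, render in RENDERERS.items()}


before = publish(master)
master["title"] = "Dynamic Publishing, 2nd edition"  # one update to the master...
after = publish(master)                               # ...reaches every format
print(after["txt"].splitlines()[0])
```

Adding a new output format means adding one renderer to `RENDERERS`; no existing format has to be edited by hand.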

XML first workflows - an alternative approach to document management to the dominant model of word processing technologies like MS Word. What differentiates XML first workflows and makes them superior is that the document is structured and machine-readable. This allows information and knowledge to be used more quickly and more extensively. Most large publishing companies use XML first workflows.

Reading analytics - Reading analytics means the collection of detailed data on reader activity and the methods used to make sense of this data. Traditionally a publisher would only have one dimension of readership, for example a quarterly book sales report. Knowledge about reading activity is now undergoing a revolutionary expansion, with the dimensions of data on the reader vastly increasing. Data about a reader can go down to location, time and even a wide spectrum of behavioural characteristics captured via mobile devices, such as heart rate, body temperature and more.
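As a small illustration of the second step, making sense of reading data, the Python sketch below aggregates hypothetical reading events of the form (reader ID, chapter, seconds read) into per-chapter reader counts and average reading time per reader. The event format is an assumption; real reading platforms define their own schemas.

```python
from collections import defaultdict


def chapter_summary(events):
    """Aggregate (reader_id, chapter, seconds_read) events per chapter."""
    seconds = defaultdict(float)
    readers = defaultdict(set)
    for reader_id, chapter, secs in events:
        seconds[chapter] += secs
        readers[chapter].add(reader_id)   # count each reader once per chapter
    return {
        chapter: {
            "readers": len(readers[chapter]),
            "avg_seconds_per_reader": seconds[chapter] / len(readers[chapter]),
        }
        for chapter in seconds
    }


events = [("r1", "ch1", 120), ("r2", "ch1", 60), ("r1", "ch2", 30), ("r1", "ch1", 30)]
print(chapter_summary(events))
```

Even this simple aggregation adds a dimension that a quarterly sales report cannot provide: which parts of a publication are actually read, and for how long.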

http://linkme2.net/tz

Copyright the authors

Creative Commons Attribution-ShareAlike 3.0 Germany (CC BY-SA 3.0 DE) http://creativecommons.org/licenses/by-sa/3.0/de/deed.en

Legend: This deed is used in the absence of an intellectual property framework that represents the authors' respective position on copyright.

The Hybrid Publishing Consortium is a project of the Hybrid Publishing Lab in collaboration with partners and associates. The Hybrid Publishing Lab is part of the Leuphana University of Lüneburg Innovation Incubator, financed by the European Regional Development Fund and co-funded by the German federal state of Lower Saxony.

http://consortium.io

http://cdc.leuphana.com/structure/hybrid-publishing-lab/

http://hybridpublishing.org/

References

[1] Eurostat http://epp.eurostat.ec.europa.eu/tgm/refreshTableAction.do?tab=table&plugin=1&init=1&pcode=tec00001&language=en

[2] Flat World Knowledge, open IPR commercial academic publisher http://catalog.flatworldknowledge.com/

[3] Pearson publishing API http://developer.pearson.com/

[4] Horizon 2020 http://ec.europa.eu/research/horizon2020/index_en.cfm

[5] STEM fields–Science, Technology, Engineering, and Mathematics

[6] MOOC+ refers to going beyond the MOOC. The reason to go beyond the MOOC is that it is a flawed model driven by the economic needs of institutions. What is missed in the MOOC model is the potential of digital media for learning and for reconfiguring institutions of learning. See the book ‘We're all Game Changers Now!’, https://curve.coventry.ac.uk/open/file/c04530ce-d16a-46ca-b359-a905195a76cb/1/Open%20education.pdf

[7] Paper Machines About Cards & Catalogs, 1548-1929. 2011 by Markus Krajewski, Bauhaus University, Weimar. http://mitpress.mit.edu/books/paper-machines

[8] Capitalism, Socialism and Democracy. 1942 by Joseph Schumpeter. http://www.scribd.com/doc/32097500/Capitalism-Socialism-and-Democracy-Copy

[9] Difference Engine http://en.wikipedia.org/wiki/Difference_engine

[10] On the Economy of Machinery and Manufactures. Charles Babbage, 842. Chp. 31. ‘Combinations of Masters Against the Public’, pp 312. https://archive.org/stream/oneconomyofmachi00babbrich#page/312/mode/2up

[11] Document Cloud https://www.documentcloud.org

[12] Calibre http://calibre-ebook.com/

[13] Free Culture Movement http://en.wikipedia.org/wiki/Free_culture_movement

[14] Based on publisher consultations with six cultural publishers in 2012 by Mute and LShift as part of a UK funded project by The Technology Strategy Board (Metadata strand) for automatic eBook creation

[15] DiSalvo, C. and LeDantec A. (2013), Infrastructuring and the formation of publics in participatory design, Social Studies of Science, Sage UK, 3-5. http://ledantec.net/wp-content/uploads/2013/02/Le-Dantec-sss-2013.pdf

[16] von Hippel, E. (2005). Democratizing Innovation. Cambridge, MA: MIT Press. http://libros.metabiblioteca.org/bitstream/001/183/7/0-262-00274-4.pdf

[17] Buchanan, Richard, "Wicked Problems in Design Thinking", Design Issues, vol. 8, no. 2, Spring 1992. https://coop2012.files.wordpress.com/2012/01/buchanan_wicked_problems.pdf

[18] The Gutenberg Galaxy: The Making of Typographic Man, McLuhan. http://www.worldcat.org/title/gutenberg-galaxy-the-making-of-typographic-man/oclc/428949?page=citation

[19] Jochen Koubek, University of Bayreuth, based on PINC by Werner Patzelt, TU Dresden

[20] DSDM, Dynamic Systems Development Method - http://www.dsdm.org/

[21] BYOD, Bring Your Own Device - http://www.zdnet.com/topic-byod-and-the-consumerization-of-it/

[22] Transmission - video activist infrastructure building network http://transmission.cc/

[23] Open Source Publishing http://osp.constantvzw.org/