Dossier: Standards in Digital Publishing - Practitioners' Viewpoints

Dossier Introduction

In this research dossier we have brought together a variety of perspectives, all from people working on real-world productions. The contributors offer their professional insights and come from different corners of publishing: a W3C expert committee, the scholarly library, professional training and development in digital publishing, eBook production and typographic Web design. This is the first of a planned series of dossiers covering topics in digital publishing. In the transformation of publishing from print to digital and hybrid publishing, many questions remain unresolved. The choice of the dossier format - pulling together a number of shorter contributions on a single question - suits the field of digital publishing as a way to carry findings from production contexts to wider audiences. The importance of issues arising from working practice can be seen clearly in the area of standards, in their use and effects. It is the standards that emerge from working practice, de facto standards, that have the greatest sway, as opposed to formally mandated requirements, known as de jure standards. Our hope is that the five contributions below give others working in publishing a set of questions they can apply to the new contexts they will inevitably face in developing multi-format digital publications.

Does Digital Publishing Need Standards?

A History of Text File Standards and Outlook Into the Future

By Dr Johannes Wilm

Whenever files come into play, there is the question of how they are structured and whether that structure follows a logic defined by an inter-organizational body. In the world of publishing, too, standards have been an important issue.

Discussions about standards have taken place at different layers of digital abstraction, and the meaning of standards has shifted through the evolution of digital media. The most recent developments point to a world in which the publishing industry, as the industry concerned with the editing and promulgation of long-format texts, will be a minor part of a larger Web content industry. This change will have such an impact on the meaning of standards for digital publishing that the need to continue developing certain standards will be called into question altogether.

From ASCII to graphic user interfaces

From the 1960s to the 1980s, the American Standard Code for Information Interchange (ASCII) was under development. As the name suggests, it focused on the US, and it provided the 128 characters needed for English writing. It was the most common file encoding standard until 2008, when it was overtaken by the Unicode Transformation Format-8 (UTF-8)1. Unicode, under development since the 1980s, provides for more than 1.1 million characters.

Even before Unicode, digital content could be created in languages other than US English. In part this was made possible through the creation of ASCII-like standards for specific languages, such as the ISO 8859-1 standard commonly used for many Western European languages.

Another method was to represent certain characters without relying on the character encoding of the file at all. For example, in an HTML 2.0 file the German/Swedish character Ö can be represented by the character entity reference &Ouml;.
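
To illustrate (a minimal HTML 2.0-style fragment with invented heading text), the named entity and its numeric equivalent both produce the same character, whatever the file's own encoding:

    <!-- Both references render as the letter Ö, even in a plain ASCII file -->
    <H1>&Ouml;sterreich</H1>    <!-- named character entity reference -->
    <H1>&#214;sterreich</H1>    <!-- numeric character reference (decimal 214) -->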

Rich text file standards

When computers started dealing not only with textual content, but also styling, there was a shift in terms of what it meant for a file to follow a standard. Instead of just having to be encoded with a specific character encoding, files now needed to contain specific contents in a certain order.

The earliest file formats describing styling information, such as LaTeX (1984), were designed to be readable and editable in non-graphical text applications.

The ability to edit files directly "by hand" changed bit by bit. The Rich Text Format (RTF, 1987)2 was less readable in non-graphical editors, but because RTF is encoded in ASCII, most command-line text editors could still be used for minor fixes.

Recent word processor formats such as OpenDocument Text (ODT, 2005)3 and Office Open XML (OOXML, 2006)4 cannot be edited by hand as easily as RTF. The main difference is that they are XML files wrapped in zip archives, which allows for the inclusion of graphics and the like.
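
As an illustration of why hand editing became harder, the single word "Hello" in a .docx file is no longer a line in a flat file but an XML fragment (simplified here; a real word/document.xml declares more namespaces and sits inside the zip archive next to several other required parts):

    <!-- word/document.xml inside the zipped .docx container (simplified) -->
    <w:document
        xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
      <w:body>
        <w:p>                       <!-- paragraph -->
          <w:r>                     <!-- run of text sharing one set of properties -->
            <w:rPr><w:b/></w:rPr>   <!-- run properties: bold -->
            <w:t>Hello</w:t>
          </w:r>
        </w:p>
      </w:body>
    </w:document>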

While plain text files could be used both for editing and for display, file standards for rich text have focused either on editing or on display. While RTF, ODT and OOXML are mainly used by graphical editing applications, the Hypertext Markup Language (HTML, 1993) and the Portable Document Format (PDF, 1993/2008)5 focus on display.

PDF files can seldom be edited in a pure text editor. And formats that once did provide for manual editing have become more difficult to handle in recent versions. While HTML files were initially simple, most current HTML files use so many different tags and features that editing them with a text editor is cumbersome.

The development of rich text formats can be summarized as a movement from text files containing basic additional character and styling information, editable by a skilled editor, to files so full of styling information that only complex programs with massive development teams behind them can interpret and write them correctly.

One reason for the shift was the advent of graphic user interfaces (GUI). The types of text files that existed could not express the kind of graphic styling users expected: files that merely followed character encoding standards lacked ways of specifying font sizes or bold text.

The shift meant that standardized files needed to be interpreted by programs. Reading the characters contained in a file according to an encoding is a feature available either through a programming language itself or through a generic library available to any program written in that language. In contrast, application teams had to build HTML interpretation capabilities themselves.

The publishing industry's own file formats

With the advent of modern word processors installed on most computers, the publishing industry has come to rely on its authors using these programs for content production.

And indeed, these programs can be used both for the long texts that the traditional publishing industry deals with and for the smaller texts written by lay people in personal communication, and the file formats can express much of the styling information needed for books.

Access to these generic file formats and programs has been an advantage for an industry that otherwise would have had to invest in their own tools. But it has also meant that the standards and programs have not been perfect matches.

For example, word processors are great for creating simple documents like birthday cards and student essays. But they are really terrible at providing semantically structured output: the ever-present font size and typeface menus mean that users can change inline formatting on the fly, and the publisher then has to employ human beings to spend a lot of time interpreting such formatting and pinning down its semantic meaning so that the same text can be exported to different formats easily.

Another side effect is that HTML, as presented by web browsers, is not well suited for long format texts, as it lacks capabilities such as pages with page numbers to which the user can return.

The advent of ebook readers brought, with the Electronic Publication (EPUB) format in 2007, a format more specific to the needs of the publishing industry. It consists of simplified HTML code inside a Zip file, together with metadata and images. The format was read by small, Web-browser-like programs created by small teams on limited budgets under severe time pressure, running on machines with limited processing power. The result was that even the limited features defined by the standard were seldom available to the end user.
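
As a sketch of what such a package contains once unzipped (the file names under OEBPS/ are illustrative; only the mimetype file and META-INF/container.xml are fixed by the specification):

    mimetype                   <- the plain string "application/epub+zip"
    META-INF/container.xml     <- points to the package document
    OEBPS/content.opf          <- metadata, list of files, reading order
    OEBPS/toc.ncx              <- table of contents (EPUB 2)
    OEBPS/chapter1.xhtml       <- the simplified (X)HTML content
    OEBPS/styles.css
    OEBPS/cover.jpg

    <!-- META-INF/container.xml -->
    <container version="1.0"
        xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
      <rootfiles>
        <rootfile full-path="OEBPS/content.opf"
            media-type="application/oebps-package+xml"/>
      </rootfiles>
    </container>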

More abstractions or less standardization?

As ebook reading devices became more powerful and merged with tablets, it has been predicted that the next generation of ebook reading programs will contain open source browser engines that interpret an EPUB. If these programs could make use of browser engines for most things, they would have access to much larger development teams for that part, and therefore fewer bugs in everything that EPUBs have in common with HTML files.

But EPUBs need to do more than browsers in some respects: pagination, page numbers, tables of contents, display of metadata, etc. Such extra functionality could be provided through modifications of the browser engines. A disadvantage of such an approach is that those making the changes would end up having to maintain patches to the browser engine source code separately from the engine's own development team. Maintaining patches to third-party software usually means a lot of extra work.

A second option would be to use the browser's JavaScript engine and Cascading Style Sheets (CSS) interpreter to add such features. This would mean a piece of JavaScript would have to interpret (a part of) the file format (in this case the part relating to pagination), which opens up a new way of dealing with standards: it is now a piece of scripting language running inside the program that deals with the standard. Vivliostyle.js takes exactly that approach for the more complex print-related styling information that browsers cannot handle by themselves.
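
As a rough sketch of the kind of print-related styling at stake (illustrative CSS, not taken from Vivliostyle's documentation), the Paged Media rules below declare a page size, margins and a page-number counter in a margin box; browser engines alone do not lay out the margin box, which is precisely the gap such a script steps in to fill:

    <style>
      /* Paged Media rules that a browser engine alone will not lay out */
      @page {
        size: A5;
        margin: 20mm 15mm;
        @bottom-center { content: counter(page); }  /* running page number */
      }
      h1 { break-before: page; }  /* start each chapter on a new page */
    </style>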

If this approach gains more support, it will signify a shift almost as fundamental as the movement from file encoding standards to file type standards interpreted by individual programs.

Similarly, access to lower level features in browsers could allow for the creation of other more specific tools for the publishing industry, including semantic content editors, such as Fidus Writer, and HTML/CSS renderers that add page related features for print, such as Vivliostyle.js.

At the same time, the need for standards is cast into doubt. Scripting languages do not have to be compiled and can be programmed rapidly. Programs written in scripting languages can be updated every time a user visits a specific website. So is there even a need for a common standard for the files they read? A few months ago there seemed to be some doubt as to whether the W3C Digital Publishing Interest Group (DPUB-IG) should go ahead with the next version of the EPUB standard, labeled EPUB-WEB, or whether each publisher can simply create its own JavaScript app to display content written in the company's own closed or open format.

Concluding remarks

Whether or not standards for script-interpreted formats are important is a discussion that has not yet been concluded.

Not working with a standard has some advantages, notably that each publishing application can be developed more rapidly, inventing new format features as they are needed and implemented, without having to worry about implementing features few users will use merely to comply with the standard.

In favor of standards is the potentially higher longevity of content. Related to this is the rapid development of new technology, combined with the fact that valuable long-form texts often continue to be valuable several centuries after they were written. Interoperability between different file reading systems is also better assured through standards.

The arguments are not new, but their relevance may have shifted given the rapid development and updating of JavaScript-based programs. A possible compromise could lie in the terms laid out in The Extensible Web Manifesto6, a 2013 document signed by several of the leading participants in W3C standards groups. It calls for new standards to be created by first giving JavaScript developers access to lower-level features, letting them implement higher-level features in script while the standards continue to be worked out. Once a particular feature gains maturity and is in general use, the code is moved into the lower-level program itself. This would allow both for the flexibility of JavaScript-based development and for the stability that an agreed standard can provide.


Dr Johannes Wilm

mail (AT) johanneswilm.org

@johanneswilm

http://www.fiduswriter.org

http://www.vivliostyle.com

I am a member of the “W3C Task Force on Editing” as an invited expert, I represent the Japanese company Vivliostyle in the “W3C CSSWG”, “W3C Houdini” and “W3C DPUB IG” groups, and I was one of the founders of Fidus Writer.

Notes:

1 Unicode Transformation Format-8 (UTF-8) http://googleblog.blogspot.se/2008/05/moving-to-unicode-51.html

2 The Rich Text Format (RTF, 1987) was introduced with Microsoft Word 3.0 for Macintosh.

3 The OpenDocument Text (ODT, 2005) format is the native format of LibreOffice Writer.

4 The Office Open XML (OOXML, 2006) format is the native format of Microsoft Word.

5 The Portable Document Format (PDF) was released as a proprietary format in 1993 and an open standard in 2008.

6 The Extensible Web Manifesto https://extensiblewebmanifesto.org/

(e-)Book as a Network

By Catherine Lenoble

Image: Lire+Écrire cover

This article is an account of both a collective and personal experience engaging with the editorial coordination and circulation of an electronic publication. It is also a chance to retrace the context and conditions of its creation, as well as the unexpected journey of this e-Book, a year after its release.

[READ+WRITE]DIGITAL : INFRASTRUCTURES AT STAKE

In 2013, I co-initiated a series of professional meetings called [Read+Write]digital, upon the invitation of a dedicated books and reading service emanating from the regional authority of Pays de la Loire, France1. Together with Guénaël Boutouillet, book mediator, we had the chance to develop and push forward a « new event format » mainly addressed to book professionals (publishers, authors, librarians, cultural operators and students). Usually when attending a conference or seminar on the digital mutation of publishing, one is informed about the latest developments and hears a wide range of new technical terms, and so, in sum, gains knowledge; but one is left, in the end, with no handy way to practice using these promising tools.

Based on this observation, we conceived a simple « template » combining morning conferences and afternoon workshops, relating the topics discussed in the conferences and activating them through the workshops. Proposed in a collaborative and nomadic setup, we curated four meetings during 2013, starting and ending in an auditorium and classroom (University of La Roche-sur-Yon) and hosted in between in a public library (Médiathèque Diderot, Rezé) and an art center (Maison des Arts, Saint-Herblain). The conferences were filmed and archived on a dedicated blog2; the same blog was rerouted from its communicational function to be used for the workshops as a « digital writing laboratory » or (now, looking back) as a sandbox3.

[READ+WRITE]DIGITAL : OPEN RECORD AND COLLABORATIVE DESIGN

Referring to the sandbox in its gaming sense, applied to our case, this meant participants could access the blog afterwards to rewrite, update and proofread their production. We tried to build an « open learning environment » where participants could join whenever they wanted, without feeling behind for not having followed the whole cycle. We talked about the history of the Internet and the evolution of libraries, copy-culture and the commons, electronic and algorithmic literature, and, last but not least, digital publishing and the making of e-Books. We had in mind, indeed, to produce an electronic publication titled ‘Lire+Écrire’ instead of a « symposium report » traditionally delivered in print or in PDF format. An EPUB – or the idea of a « liquid book » – perfectly matched the networked cycle we had designed.

The last and final meeting focused on digital publishing, with the participation of the young digital craftsmen of the Chapal&Panoz studio4, who delivered a tablet-only unconference. One thing I did not mention: working on bridging topics, we would, in ones and twos, invite the lecturers to think up and run the afternoon workshop with us. That is what we did with Roxane Lecomte & Jiminy Panoz (as we were considering working with them on the EPUB). We questioned the format and contents together with the participants by opening up the design process, in a technical way (lifting the hood, looking at how the text « flows » in the Sigil software) and in an editorial way (calling for a close reading of the blog and discussing, for instance, the table of contents without thinking in terms of pagination any more). This is also where we talked about Creative Commons licensing, as a « sustainable » condition for this e-Book to flow freely among peers and, hopefully, to a wider audience.

[READ+WRITE]DIGITAL : ONLINE AND OFFLINE

In terms of editorial coordination, this first experience with digital publishing was purposely a meta-minded one: [Read+Write]digital5 is a self-reflective object – an electronic publication on networked publishing, reading and writing. Although what follows is anecdotal, it is revealing about the workflow. I carried out the work from Brussels with the Barcelona-based studio Chapal&Panoz; the e-Book was published by the French publisher Publie.net (whose director lives in Bangkok) for a backer in Nantes. It worked pretty well. And for the first time, I did not face any printer's deadline. This was also the weirdest moment: it was hard to accept that the work was « finished » when the ending had almost no tangible effect (like receiving full cardboard boxes when you are managing your new stock).

That is exactly what interests me now that the work is behind me (without a paper and cardboard copy at home): it is always available, accessible, shareable. According to the latest statistics on publie.net, [Read+Write]digital has been downloaded 1829 times. That is certainly more than we could have expected if we had chosen a print publication. And I particularly appreciate always having a copy « within easy reach ». I have been working recently with offline networks, especially the LibraryBox project6. This is where and how I keep on spreading this resourceful e-Book among librarians as well as computer scientists, graphic designers and digital culture mediators.


Catherine Lenoble – author, digital culture mediator, http://litteraturing.net

catherinelenoble (AT) gmail.com @cathsign

Publication Details

Lire+Écrire

Un livre numérique sur l’édition, la lecture et l’écriture en réseau

Free e-Book released under a CC-BY-NC-SA license, http://www.publie.net/livre/lireecrire/

Notes:

1 Original title in French: [Lire+Écrire]numérique. The cycle and publication were made possible thanks to the Centre Régional du Livre de la Région Pays de la Loire, France, restructured and renamed Mobilis in 2014: http://www.mobilis-paysdelaloire.fr/

3 « A sandbox is a kind of game in which minimal character limitations are placed on the gamer, allowing the gamer to roam and change a virtual world at will.(…) Instead of featuring segmented areas or numbered levels, a sandbox game usually occurs in a “world” to which the gamer has full access from start to finish. » (src: Techopedia, https://www.techopedia.com/).

5 Lire+Écrire publication. Free e-Book released under a CC-BY-NC-SA license, with the contributions of Guénaël Boutouillet, Olivier Ertzscheid, Antoine Fauchié, Roxane Lecomte, Lionel Maurel, An Mertens, Laurent Neyssensas and Jiminy Panoz. Access & download the publication in the following formats (EPUB – Kindle Mobipocket – Web version): http://www.publie.net/livre/lireecrire/

6 LibraryBox is a fork of Piratebox for the TP-Link MR 3020, customized for educational, library, and other needs, conceived by Jason Griffey. It is an open source, portable digital file distribution tool based on inexpensive hardware that enables delivery of information to individuals off the grid. http://librarybox.us/

Notes on Standards in Digital Publishing for Academic Libraries

By Corinna Haas

Introduction

As a contribution to a research dossier on different views on standards in digital publishing, this paper highlights the point of view of the academic library. First, (1) I will outline the transformation of the publishing process and its impact on libraries. Then, (2) I will present approaches to the standardization of publishing services offered by libraries, and finally, (3) I will point to further challenges.

1.) The Transformation of Publishing and its Impact on Libraries1

Traditionally, academic libraries dealt with published materials, but were not involved in the publishing process. The classic publishing life-cycle was characterized by a clear division of roles between the actors: the author, the publisher, the library, and the reader (and potential author) at the end (and new beginning) of the cycle - each one had their own clearly defined task areas. In the classic publishing process, the library acted as a customer for academic publications, made scholarly publications available, increased their use by indexing them, and finally took care of their long-term preservation.

In the last twenty years, the publishing process has undergone a fundamental transformation, caused by digital development and other factors. The division of roles has blurred, since with the internet and electronic means of dissemination, in principle each actor is now able to perform each step of the process. Libraries have experienced profound changes because of the digital shift: they now have to compete with other information providers, tackle the increase in electronic dissemination of information, and adjust their services to changing user expectations and needs. Libraries entered into publishing activities in answer to the “serials crisis” that hit them in the mid-nineties, when spiralling journal prices and static or declining acquisition budgets forced them to cancel journal subscriptions – while scholarly output was constantly rising. The serials crisis catalyzed the Open Access (OA) movement and other alternative publication models, as a possible way out of the crisis and a move towards a re-commodification of knowledge. Libraries are contributing substantially to OA publishing.

The transformation of the publishing process is still going on. The present situation is characterized by the coexistence of print and digital publishing, and of both subscription models and Open Access. Between the actors new roles are still to be negotiated, according to new requirements, and to their respective skills and competencies. Agreements and cooperation are required, as well as the development and implementation of new standards.

2.) Developing Standards for Publishing Services of Libraries

Today, most libraries in higher education institutions offer publishing services, usually in cooperation with IT departments and media services. As an example, I’ll briefly introduce the document and publication server Edoc of the Humboldt University Berlin.2 With Edoc, university writings can be uploaded and made available to the public. As the Edoc guidelines point out3, the document and publication server offers an organizational and technical framework for the electronic publication of scholarly documents to all members of the university. Within the framework of this joint offering of the Computer and Media Service (CMS) and the University Library, highly relevant scholarly documents are provided for teaching and research, according to quality standards. The Edoc server serves as a technological platform for the realization of the ideas formulated in the Open Access Declaration of Humboldt University4. Each electronic document receives a persistent identifier (URN) and is indexed by national and international library catalogues, search engines and other reference tools. Specific measures such as digital signatures and time stamps are taken in order to protect the documents against falsification and deletion, thus ensuring their authenticity and integrity. Further, the Edoc services ensure the long-term preservation of the electronic documents. The operation and further development of the Edoc server are integrated into national and international initiatives and projects such as the DINI network5, the Confederation of Open Access Repositories (COAR)6, the Networked Digital Library of Theses and Dissertations (NDLTD)7, and the Open Archives Initiative (OAI)8. In order to highlight some areas of standardization, I want to refer to the DINI Certificate 2013 for Open Access Repositories and Publication Services.9

The DINI Certificate

The German Initiative for Network Information (DINI e.V.) was founded by a consortium of German University Media Centers, the German Library Association dbv e.V., and an association of German University Computing Centers. DINI aims to enhance information and communication services and thereby to support information infrastructures in higher education institutions. Since 2004, DINI has offered certification of document and publication servers. The certification process follows the minimum requirements and recommendations set out in the certificate.10 The list of criteria includes eight topics:


  1. Visibility of the Service

  2. Policy

  3. Support of Authors and Publishers

  4. Legal Aspects

  5. Information Security

  6. Indexing and Interfaces

  7. Access Statistics

  8. Long-Term Availability.

Beyond these criteria, the certificate provides information about the certification process11 as well as an appendix with the interface guidelines of the Open Archives Initiative. OAI develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content.
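
For example, a repository that implements the OAI Protocol for Metadata Harvesting (OAI-PMH) answers plain HTTP requests with XML records; the repository and record below are invented, but the verb, parameters and Dublin Core wrapper follow the protocol:

    http://edoc.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc

    <!-- abbreviated response -->
    <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
      <ListRecords>
        <record>
          <header>
            <identifier>oai:edoc.example.org:12345</identifier>
            <datestamp>2015-06-01</datestamp>
          </header>
          <metadata>
            <oai_dc:dc
                xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                xmlns:dc="http://purl.org/dc/elements/1.1/">
              <dc:title>Example Dissertation Title</dc:title>
              <dc:creator>Doe, Jane</dc:creator>
              <dc:identifier>urn:nbn:de:example-12345</dc:identifier>
            </oai_dc:dc>
          </metadata>
        </record>
      </ListRecords>
    </OAI-PMH>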

Standards for Authors

At this point, I'd like to point to an aspect of the DINI criteria which had not been an issue in the traditional publishing process: the co-responsibility of the author for the indexing and long-term accessibility of his/her work. The certificate not only proposes that authors add abstracts and assign keywords to their work, but also that they use non-proprietary, open file formats like PDF/A, ODF, or TXT for writing, in order to facilitate long-term availability.12

To this, Peter Schirmbacher, the Director of the CMS at Humboldt Universität Berlin, adds another requirement for authors (Schirmbacher 2009, 17f): he urgently recommends that structured documents be created using machine-readable markup languages based on SGML or XML, defined by a DTD (document type definition)13. As he argues, the migration and preservation of documents through recurrent hardware and software changes and updates can be much better ensured if the files are non-proprietary (thus, open source) and machine-readable. Moreover, markup languages allow for more detailed indexing and better retrieval, since not only author, title and keywords can be indexed, but also chapters, headlines, illustrations, citations and references, as well as descriptive metadata.
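
A minimal sketch of what such machine-readable structure can look like, using an invented document type rather than any DTD actually in use at Humboldt University:

    <!-- thesis.dtd : an invented, illustrative document type definition -->
    <!ELEMENT thesis   (title, author, abstract, chapter+)>
    <!ELEMENT chapter  (heading, (para | citation | figure)*)>
    <!ELEMENT title    (#PCDATA)>
    <!ELEMENT author   (#PCDATA)>
    <!ELEMENT abstract (#PCDATA)>
    <!ELEMENT heading  (#PCDATA)>
    <!ELEMENT para     (#PCDATA)>
    <!ELEMENT citation (#PCDATA)>
    <!ELEMENT figure   EMPTY>
    <!ATTLIST figure src CDATA #REQUIRED>

    <!-- a conforming document: every part is explicitly named and indexable -->
    <thesis>
      <title>On Electronic Publishing</title>
      <author>Doe, Jane</author>
      <abstract>…</abstract>
      <chapter>
        <heading>Introduction</heading>
        <para>…</para>
        <citation>Schirmbacher 2009</citation>
      </chapter>
    </thesis>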

More standards for long-term archiving are developed and promoted by the nestor competence network for long-term archiving.14

To summarise, my presentation has addressed the ongoing transformation of the publishing process and the changing role of the library within it. Using the example of the publication and document server at Humboldt University Berlin, I addressed new publication models and forms of cooperation. Although there is still a lack of established standards for publishing services (Schirmbacher 2009), important approaches have been made, as shown by means of an important contribution from Germany, the DINI Certificate.

3.) Further Challenges

Some crucial issues of publishing, like the integration of multimedia, have not been tackled in my presentation. But as Raffael Ball (2013) points out, there are far more fundamental challenges to publishing. As Ball elaborates, not just publishing but scholarly communication as a whole has changed fundamentally. He stresses two points in particular. The first is the dissolution of the boundaries between informal and formal (i.e. published) scholarly communication, which have become a continuous process. To this I would like to add that there are many different modes of consumption and dissemination of research today: interactive charts, linked-data blogs, social media (including Facebook, LinkedIn, Academia.edu and ResearchGate), visualisations and more.15 This does not exactly facilitate the selection and aggregation of “relevant” materials, speaking from the perspective of libraries. The second point Ball stresses is the development from static to dynamic (or fluid) documents. Ball actually sees the traditional mission of libraries as being called into question when he asks: what will a “publication” be in the future, and don’t we need a new definition for it? What will “collection building” mean; isn’t it already obsolete? And finally: what is and will be worth collecting or preserving, and what is not? To the latter fundamental question, Ball emphasises, a decision will be required not only from libraries, but also on a broader societal level.


Corinna Haas, Academic Librarian, ICI Berlin Institute for Cultural Inquiry

corinna.haas (AT) ici-berlin.org

Bibliography

Ball, Raffael (2013): Das Ende eines Monopols: Was von Bibliotheken wirklich bleibt. Ein Lesebuch, Wiesbaden 2013

CMS-Journal Nr. 32, (2009): Wissenschaftliches Publizieren im digitalen Zeitalter

Mittler, Elmar (2012): Wissenschaftliche Forschung und Publikation im Netz. Neue Herausforderungen für Forscher, Bibliotheken und Verlage. In: Füssel, Stephan (Hg.): Medienkonvergenz – Transdisziplinär / Media Convergence – across the Disciplines. Berlin: De Gruyter, 2012, S. 31-80.

Schirmbacher, Peter (2009): Möglichkeiten und Grenzen des elektronischen Publizierens, in: CMS-Journal, 32, S. 14-19

Notes:

1 For a concise description of the publishing process and its transformation from the viewpoint of the library see Mittler 2012 (49-54)

2 Many aspects of electronic publishing and the edoc services of HU Berlin are addressed in CMS-Journal 32, 2009, at http://linkme2.net/v9

10 The current version DINI Certificate 2013 Open Access Repositories and Publishing Services is available at http://dini.de/dini-zertifikat/english/

11 Current examples for the DINI certification process can be found in presentations of library specialist conferences as Deutscher Bibliothekartag 2015; see, for instance: Markus Putnings (2015): DINI-Zertifikat @ OPUS FAU, or Signe Weihe (2015): DINI ready – Was bedeutet das?, at https://opus4.kobv.de/opus4-bib-info/solrsearch/index/search/searchtype/collection/id/16253

12 Certificate (2013), 28

13 A document type definition (DTD) is a set of markup declarations that define a document type for an SGML-family markup language (SGML, XML, HTML). A DTD defines the legal building blocks of an XML document, and the document structure with a list of legal elements and attributes.

The Standard Generalized Markup Language (SGML; ISO 8879:1986) is a standard for defining generalized markup languages for documents. XML stands for Extensible Markup Language.

XML defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.

15 In May 2015, the John Crerar Library at the University of Chicago addressed The Changing Ecosystem of Scholarly Communication: http://www.lib.uchicago.edu/e/crerar/zar/. The workshop documentation is expected to go online in July 2015.

The Aesthetic of Standards

By Emeline Brulé

Standards (Oxford Dictionary)
A required or agreed level of quality or attainment
Principles of conduct informed by notions of honour and decency
Something used as a measure, norm or model in comparative evaluations

I began tinkering around with digital editions in 2009, for a school assignment. I was wondering to what extent an edition could be augmented or diminished, when digitally remediated. I tagged 20,000 Leagues Under the Sea, word by word, to allow customizations of the text, by setting the level of detail or choosing which character to follow. I transformed books into coloured patterns. I made versions of novels where each word was replaced by found images, and photography books stripped of their pictures and reduced to bot-written descriptions. I generated texts or covers. And I investigated the EPUB format, as I was beginning to realize the intertwinings and entanglements between tools, texts, processes, formats, screens, encoding, material and standards.

For my graduation internship in 2012 at Subjectile1, a paper/digital publisher, I was in charge of designing two digital collections and building the first volume. Four versions were needed: EPUB 2, EPUB 3, interactive PDF and Mobi. I had no idea what I was getting into.

As usual, I began with visual drafts of the layout. Once validated, I designed the interactive PDF version. Although it presented several challenges – how do we give a grasp of the document, as the spine, dimensions and weight did for printed books? Should we use a horizontal format, which seems to be better adapted to laptop screens? How may we rethink the table of contents? – it did not approach the difficulty of the EPUB version.

EPUB is the open standard for digital books. Basically, it uses HTML and CSS, encapsulated with metadata and media. It defines an architecture of content. It is a highly resilient format, built upon solid web standards. Its current version is EPUB 3, which allows the inclusion of multimedia files as well as precise semantic description of content. But at the time, there were few comprehensive guides to the design of EPUBs. Documentation was lacking or difficult to understand. I also had to face a cruel lack of tools: InDesign would generate crappy files, and Sigil2 was the only “native” EPUB software, but it did not support EPUB 3. You could use Calibre to generate your EPUB file on the basis of your HTML and CSS files, but it obfuscated the process, which made bug tracking difficult. Everybody was tinkering around, sharing information on how the different e-reader devices or software interpreted this or that bit of code. The answer was that each one behaves differently, so you have to test each individual scenario, which is still the case. There was no tool to emulate the various readers, except for the Kindle Previewer3, so you had to get your hands on a wide variety of devices if you wanted to do your testing right. The quality of commercial EPUBs was very low. On top of that, we also quickly realized that few people were inclined to install an EPUB file reader when they weren't equipped with some sort of tablet. In short: EPUB design was a nightmare.

Actually, it still is, because there is no tool – and especially no open source tool – able to facilitate or accompany the whole design process of digital editions.

But EPUB also has its virtues. Because it has to accommodate an immense variety of technical apparatuses (e-ink readers, tablets, phones, laptops, every piece of software that might be used to open it, and users' personal reading parameters), it defines a truly hybrid book. I saw designers trying to avoid the problem by designing for iBooks4 only, or by designing fixed-layout books. My take on the issue is different. I've been wondering how to make the most of the materials at hand. When the possibilities are so restrained, what is left to designers? How do you design with this rough aesthetic of standards in mind?

EPUB strips design to its bare bones. EPUB design is, first and almost only, a matter of structure. A structure that “designates”5 its contents. A structure based on printed books, now flexibly displayed. A structure that switched from a paradigm of the page to a paradigm of the block. A structure that takes into account, and accepts, that the reading experience will be completely different from one screen to another. A structure that should provide the best experience for all. A structure allowing for various remediations of content, relying on the possibilities of the display.
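
As a sketch of what that bare, structural approach can look like in an EPUB 3 content document (the chapter text itself is invented), the elements below are “designated” through the epub:type vocabulary, leaving their presentation to whatever the reading system can manage:

    <?xml version="1.0" encoding="utf-8"?>
    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:epub="http://www.idpf.org/2007/ops">
      <head><title>Chapter I</title></head>
      <body>
        <section epub:type="chapter">
          <h1>Chapter I</h1>
          <p>The structure, not the styling, carries the meaning…</p>
          <aside epub:type="footnote" id="note-1">
            <p>A note, which a reading system may render as a pop-up.</p>
          </aside>
        </section>
      </body>
    </html>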

I believe this structure is what defines EPUB aesthetic. It may mean poor typefaces, terrible kerning or out-of-control word spacing. It is often far from offering a great reading experience. But the standard itself is beautiful: lightweight, accessible, resilient and adaptable.

Where it has failed, though, is in giving birth to communities of practice. Most EPUBs are generated from PDF, HTML or Word files without correct prepping. Not many designers establish a correct structure for their digital editions. Few play with the constraints of devices' displays – for example, sections that may appear only in certain cases, or the visual display of hyperlink URLs which may not be accessible on the same device as the book. Rare are the people proposing a real remediation of video or audio files on e-ink readers, or exploring what those could be. But to be fair, as long as EPUB rendering is clumsy, as long as the most basic HTML elements and CSS rules are not correctly displayed by the various EPUB reading applications, as long as the major publishers won't agree upon the standard for “the required or agreed level of quality or attainment6,” I don't expect to see that situation change.

There is so much EPUB can learn from the history of the Web, its cousin. Just like the Web aesthetic was oscillating between Flash and pure default semantic layouts for a long time, digital editions are oscillating between pure structure and unthoughtful app design. We could say that they haven't found their authentic form7 yet. So ... keep it simple. Focus on the specificity of the medium. Invest in the community. Propose frameworks. Build your own tools.

As a designer, I certainly dream of an electronic tool. A tool that would allow forms of writing specific to multiform digital documents. A tool that would allow digital editions to live in various formats, designs and screens. A tool focusing on structure, both at the level of the document (blocks and their relationships) and at the level of the content (bricks of content, described and hierarchized). A tool that would produce resilient digital editions, lasting longer than the few years of life of a piece of software or an OS. Editions that could be apps, websites, EPUBs, PDFs, or even generic XML files. Designs that wouldn't hide their relationship to their medium by simulating paper textures, but would invent new forms of interaction.

That tool would need to simulate various devices and to provide basic structures and style sheets. The edition's structure could be visualized as a graph, much as on Twine8. Every brick of this structure could be described by additional metadata, compatible with XML. The content should first be made of structured text, but you could insert multimedia or interactive blocks. You could set up alternative content for the different types of export. The structure's “first-rank” elements (such as chapters) could be used to define how to break down the content, marking the switch from one page to another. Bricks should be identified by a link, which could be used in an interactive table of contents.
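
As a purely hypothetical illustration (none of the element names below come from an existing tool), such a brick structure might be serialized along these lines, with per-format alternatives attached to each block:

    <!-- hypothetical brick description; element names are invented -->
    <edition title="Lire+Écrire">
      <brick id="ch-1" type="chapter" rank="first">
        <content src="chapters/01.html"/>
        <meta author="…" keywords="…"/>
      </brick>
      <brick id="video-1" type="media">
        <content src="media/interview.mp4" for="tablet web"/>
        <alternative src="media/interview-transcript.html" for="e-ink pdf"/>
      </brick>
    </edition>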

As the content would be completely structured, the style sheets would only need to describe the properties of every type of brick, depending on the format of export. One could use an existing CSS template or customize it through a form. There should also be blank files listing every element to be styled. Of course, this is only one of the approaches that could be taken to the design of digital editions. But we do need to think about the design of tools and tools for design, exploring how standards, aesthetics and publication processes intertwine, to envisage what digital editions actually are, and what they may become.


Emeline Brulé

Emeline Brulé is a PhD student and a designer. After passing through the Ecole de Recherche Graphique of Brussels and the EnsadLab in Paris, and working as an interaction designer and in a couple of other roles, she is now pursuing research into wearables, design and embodiment. http://emelinebrule.net/

hello (AT) emelinebrule.net @e_mln_e

Notes:

2 Sigil is an open source EPUB editor.

3 Kindle Previewer allows for the conversion of EPUB files into Mobi files and for checking how they display on the various Kindle devices and apps.

4 iBooks is Apple's default EPUB reading app.

5 Design comes from the Latin designare, to put a mark/a sign on something.

6 Yes, Amazon, I'm particularly thinking about you and your horrendous reading apps.

7 Following Walter Benjamin's A Short History of Photography, 1931.

8 Twine is an open-source tool for telling interactive, nonlinear stories. http://twinery.org

Tonight We’re Gonna Publish Like it’s 1999

By Eric Schrijver

Image: NEGENDE JAARGANG - VOL 9 / #1 / DECEMBER 1995 http://ttypp.nl/TYP01/

Visiting www.ttypp.nl/arch/ is a time machine. This is where the Web pages of TYP/Typografisch papier are archived. TYP was a forward-thinking magazine initiated by the Dutch graphic designer Max Kisman and published on paper, on floppy disks and online. Surfing through the online editions, one is transported twenty years back, to a period when the World Wide Web had only just begun to arrive on people's computers. The enthusiasm displayed by designers looking for ways to exploit the new medium is contagious. In the transient space of the Internet, it seems like a small miracle that this cultural moment is still accessible.

Surfing around websites from the nineties on the Internet Archive, one notices that quite a few no longer display, among them many sites that use technologies like Shockwave or Flash. Websites using HTML tags, the native tongue of the browser, have generally aged much better. This is no accident: backwards compatibility has always been important for web browser vendors. Since browser vendors have so little control over the markup people write, browsers are very forgiving about what they accept. With HTML5, this tradition has been codified, as the parsing of malformed or non-conforming HTML has been standardised.1
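
For instance, the HTML5 parsing algorithm specifies exactly how to build a document tree from loose, nineties-style markup such as this invented fragment, so every conforming browser recovers the same structure from it:

    <!-- no doctype, unclosed tags, deprecated elements: still parsed predictably -->
    <TITLE>My homepage</TITLE>
    <CENTER>
      <FONT COLOR="red">Welcome!
      <P>Under construction
      <P>Sign my guestbook <A HREF="guestbook.html">here
    </CENTER>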

In recent years there has been a return to creating solutions built on static HTML files. This is because hosting HTML files is easier, cheaper and more secure than hosting a dynamic system. Since the website does not need to store information in a database, the ‘attack surface’ of a website hosting static files is much smaller. The website does not need to expose an editing interface that can be hacked. Also, a dynamic system has to be kept up to date to fix known security holes. No such maintenance is needed for HTML files, and because they do not use any specific capacities of the server, the cheapest hosting solution will generally suffice.

For many of the first websites, HTML was not just the format in which they were delivered — it was the format in which they came about. The first websites were ‘hand-crafted HTML’: created as a series of HTML pages, with occasional updates (the person designing the site might have then charged for each update!). This did not mean coding was necessary: tools like Adobe Dreamweaver provided a visual view and a code view. The democratisation of Content Management Systems (CMS) like WordPress and Joomla changed the equation. In these systems, a general design is encoded into a template and the contents for individual pages are stored in a database that is easily editable by the user. For clients this saves time and money. The downside is that a CMS requires shoehorning every page into templates: those early HTML pages offered much more freedom in this respect, as potentially every page could be modified and changed to the designer’s whims.

This suggests that HTML has additional properties which not only make it the right format for delivering and archiving web sites: it looks like HTML files also provide a very powerful authoring format. The logic of CMSs (and indeed, the intended logic of CSS) is to pull form and content apart. Yet traditionally, the intelligence of designers has resided in creating links between form and content. Moving beyond the template, and allowing authors and designers to modify the design of each specific page is what working in separate HTML files enables.

If such an approach were to be viable today, new tools would have to be developed. With tools like Dreamweaver having fallen out of favour, it looks like the only tool we have left to edit HTML files is the code editor. Yet the popularity of database-driven CMSs stems from the fact that they can provide different interfaces for the different people involved in creating a website. A developer might need a specific view, an editor might require a specific angle, as might a designer — even if one person combines these roles. New tools will have to be able to provide different views for editing the HTML document. Instead of generating the HTML, as conventional CMSs do, these tools should work on the files themselves.

Even if there is something of a revival of HTML-based websites, these are often built with tools that do not exploit the full authorial potential of HTML. At the time of writing, some 398 ‘static site generators’ were listed on the staticsitegenerators.net registry. These generators suffer from some of the same drawbacks as conventional CMSs: they often work with a template through which all content has to be pushed. The advantage, then, is that such tools are always well equipped to generate indexes. The question of how to syndicate, index and provide navigation for a collection of static HTML files — that is the second challenge for HTML as an authoring format.

Having HTML files as the source upon which tools can be used also has implications for interoperability. The workflows for creating websites today show an unprecedented fragmentation. Every web project has its own toolchain, where the development team has needed to pick a backend language, database and a specific set of front end ‘frameworks’ and ‘pre-processors’ to implement the front end design. This means the content will not be encoded in HTML, but in some abstraction of HTML specific to the project. Fashions change quickly and getting up to speed with the technologies used within a specific project can prove daunting. Skills do not transfer as easily and tools developed for a project will work for that project only.

This is a pressing issue in digital publishing. Whereas traditional publishing workflows are often built around bespoke XML formats, a new generation of technologists is discovering the flexibility of re-purposing web technologies for creating hybrid publications: available as websites, EPUBs and printable PDFs. Currently, many parties make their own database driven solutions. The design will be encoded in a custom template format. The text will be encoded in a custom markup format — often based on Markdown, but never exactly the same. This makes it very hard for third-party service providers to interact with these systems, which means it is harder for an economy to form around digital publishing.

There are counter-examples. Atlas, the platform for creating technical publications created by the American publisher O’Reilly, goes a long way in basing a workflow on de facto standards. The source is a series of HTML files, stored in a Git repository.2 Atlas proposes a visual editor for the HTML, and a review and editing workflow built on top of Git’s built-in capacity to split and merge parallel versions of files (branches). Because their solution is built upon Git and the file system, one can use any other program to deal with the HTML files. The only downside to Atlas’ approach is that it deals with snippets of HTML rather than complete files. These files will not display correctly by themselves, and they will not validate unless one wraps them in some boilerplate code.
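
A snippet like the single paragraph below is not a complete document; to view or validate it on its own, one has to wrap it in minimal boilerplate along these lines (an illustration, not Atlas's actual output):

    <p>A paragraph stored as a bare HTML snippet.</p>

    <!-- the same content wrapped so that it is a complete, valid document -->
    <!DOCTYPE html>
    <html lang="en">
      <head>
        <meta charset="utf-8"/>
        <title>Chapter snippet</title>
      </head>
      <body>
        <p>A paragraph stored as a bare HTML snippet.</p>
      </body>
    </html>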

Another interesting product is the OERPUB Textbook Editor, which has the advantage of being fully Open Source. It embraces the EPUB standard. In most workflows, EPUBs are generated artifacts, but in this case they are the source. This makes sense, since at base EPUBs are collections of XHTML5 files with a mandatory set of metadata.3 This means that EPUBs as a source format combine the expressiveness of HTML with some of the rigour demanded by conventional publishing workflows. The OERPUB editor requires an EPUB to be stored in a Git repository, available on GitHub. The editor reads in the EPUB and lets the user edit the text with a WYSIWYG editor, create chapters and sections, and edit metadata. Because the file structure is fully standard, existing tools like the IDPF EpubCheck validation software can easily be used alongside the tool.

A decoupling of tool and content can be beneficial for the ecosystem of digital publishing. Having the content stored in a more standardised way allows tools to be more specialised. This has multiple advantages. First, standardisation will make it easier for newcomers to navigate between projects, lowering the barrier to entry and making it easier for a more diverse set of practitioners to enter a field that currently seems technocratic. At the same time, tools can become more specialised, allowing for the more rapid development of new tools in specific areas, and allowing experienced practitioners to focus on one particular aspect of the craft of digital publishing and thus advance the state of the art in that area. Freed from the obligation to provide a monolithic solution that handles publishing from start to finish, service providers will be able to concentrate on the part of the chain where they see their maximum potential added value.

What such solutions have in common is the separation of the editing tool from the content being edited, as well as building on existing, well-known technologies. Even the technologically advanced Git software, chosen by both projects as the place to store and exchange the project contents, re-uses the tried and tested abstraction of the file system. It is low-tech but ubiquitous.

It is in this sense that I think that when imagining the future of digital publishing, we can take inspiration from the past, and more precisely from the cultural moment so lovingly described by Olia Lialina and Dragan Espenschied in the ‘Digital Folklore Reader’. In the late 1990s, at the height of the Dotcom bubble, the nature of online publishing was of course shaped by companies backed by billions in venture capital. Yet it was shaped at least in equal measure by passionate amateurs like the denizens of GeoCities who were the true authors behind the vernacular of the internet. The situation also provided ample opportunity for curious professionals like the graphic designer Max Kisman to experiment with and re-imagine the new medium. What levelled the playing field for all involved was that there was a lingua franca: HTML. Browsers could display the source code for a page, and authors could learn by copy-pasting. I think our success in imagining new digital publishing depends on whether we can enable such a hands-on approach, and envision solutions that allow for such bottom-up creativity. Taking a cue from the Web’s formative moment, we are gonna publish like it is 1999.


Eric Schrijver is an author, designer and developer born in Amsterdam, and part of the Brussels-based collective Open Source Publishing. He edits the blog I like tight pants and mathematics, which aims to motivate artists and designers to get more involved in the subcultures of software development. Eric teaches at the Royal Academy of Art in The Hague.
http://ericschrijver.nl/

eric (AT) ericschrijver.nl @ericschrijver

Eric Schrijver’s contribution was developed in the context of ‘Life after the template’, a research project in collaboration with the Hybrid Publishing Group made possible in part by a grant from the Creative Industries Fund NL.

Notes:

1 Some misplaced puritanism has caused HTML standards writers and browser vendors to remove the blink tag. This might have to do with a narrative in which GeoCities-style, amateur-driven webdesign had created a chaos from which we all had to be saved by standards-loving professionals — in this sense, the blink tag becomes a pars pro toto for an approach to webdesign built on Comic Sans and MIDI files, which the ‘professional’ web users suspect they can kill off by sacrificing <blink>. But it stands as a curious omission in what has otherwise been a technology remarkably careful about its past.

2 Git is software for dealing with multiple versions of files. Git tracks the versions and allows users to merge their changed files. Initially available only as a command line utility, it is starting to be built into content creation tools.

3 XHTML5 is HTML5 with additional restrictions to make sure the HTML is also valid XML. XML is a more generic standard for markup languages. This makes it possible to re-use tools developed for XML with HTML.