By pasilii - Posted on 04 January 2013

The art of digitization has blossomed over the last decade in particular. It is therefore fair to say that it is still taking baby steps: techniques and documentation criteria are still being polished, technical means are still being built and funding is still being sought.

At this dawn of digitization, an inquisitive mind should be asking several questions. What exactly is being digitized? Why should we even bother? Who will benefit? Who are the players, or organizations, in this field, and how are they connected to each other? How are these digitization efforts funded, and what are the possible pitfalls?

What, then, is being digitized? The answer is almost anything made by humans or by nature. Digitizing machinery, be it scanners, cameras or other technical means, devours herbarium sheets, insect specimens, artwork, literature, sounds, photographs and films.

Why? The philosophical answers, “open knowledge itself is valuable” or even “this is a means of saving a heritage for future generations”, may not satisfy a utilitarian. One short answer that would is that large masses of data increase the reliability of analysis. As a practical example, a biodiversity or climate change researcher can start to draw more accurate conclusions when she can access digitized, accurately georeferenced and highly concentrated insect and plant specimen data from numerous natural history museum collections that were previously hard to reach.

The truly relevant questions, when it comes to unraveling the global setup of digitization, are these: who are the main players in this new art of digitization, and is there an underlying network among them?

First of all, the organizations can be divided into two groups: those that receive and collect digitized data and thus form large databases, and those that work on implementation. In this context, implementation means developing standards and analysis methods for the digitized data, and developing interoperability. In this blog entry we focus only on the collectors and distributors of digitized data. These collecting and distributing nodes can be divided further by the nature of the digitized data: the line is simply drawn between objects made by humans and objects made by nature.

This global network is best explained by following a thread that starts at the local level. Take us, Digitarium, as an example. Digitarium digitizes natural history collections, which puts it on the “made by nature” path, separate from the path of man-made artifacts. Digitized data from Digitarium, without photographs, is transferred to the Global Biodiversity Information Facility (GBIF), which has its home base in Denmark. GBIF also has national nodes, so there is an intermediary station on the pathway: GBIF Finland. Besides accepting our data, this Finnish station also collects nature observation data from Hatikka, BirdLife Finland and the Insect Database at the Finnish Museum of Natural History. This is how the system currently operates for natural history data within Finland. It will change, however. In the future the national launch pad on the way to any of the global databases will be the Finnish Biodiversity Information Centre.

GBIF currently has 56 countries and 47 international organizations as its members, which certainly makes it a major global player. It is worth noting, however, that GBIF members come primarily from the Western, and thus the wealthy, world. In the same breath it has to be acknowledged that GBIF has recently gained a good footing in central Africa.

GBIF, which plays on at least two fields (as a data collector and distributor, and as a developer of analysis methods), is interested only in metadata. What, then, happens to the photographs of herbarium sheets and insect specimens? They end up in the vast database of another global player, Morphbank, which is physically located in Florida, USA. A major, though not the only, American feed to Morphbank comes from Integrated Digitized Biocollections (iDigBio). The Australian distributor to Morphbank is the Atlas of Living Australia.
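For readers who like to see such a network written down, here is a minimal sketch of the data flows described so far as a directed graph. It includes only the flows stated in this entry; the names `FLOWS`, `downstream` and `feeders` are our own illustrative choices, not part of any of these organizations' systems, and the sketch makes no distinction between metadata and photographs.

```python
from collections import deque

# Directed "sends data to" edges, limited to flows stated in this blog entry.
# This is an illustration, not a complete map of any real system.
FLOWS = {
    "Digitarium": ["GBIF Finland"],
    "Hatikka": ["GBIF Finland"],
    "BirdLife Finland": ["GBIF Finland"],
    "Insect Database (Finnish Museum of Natural History)": ["GBIF Finland"],
    "GBIF Finland": ["GBIF"],
    "iDigBio": ["Morphbank"],
    "Atlas of Living Australia": ["Morphbank"],
}


def downstream(source, flows=FLOWS):
    """Return every node reachable from source, in breadth-first order."""
    seen = []
    queue = deque(flows.get(source, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.append(node)
        queue.extend(flows.get(node, []))
    return seen


def feeders(target, flows=FLOWS):
    """Return the nodes that send data directly to target, sorted by name."""
    return sorted(src for src, dests in flows.items() if target in dests)


if __name__ == "__main__":
    print(downstream("Digitarium"))
    print(feeders("GBIF Finland"))
    print(feeders("Morphbank"))
```

Following the thread from `"Digitarium"` with `downstream` walks through the national node to the global one, while `feeders` answers the reverse question of who supplies a given database.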

As a new twist in this global digitization setup, Europe has recently established its own “European version of Morphbank”: OpenUp!, within Europeana. (Europeana’s principal role is to operate as a collector and distributor of digitized cultural history data, and that role is discussed further in a separate blog entry.) So it is safe to say that it was not considered safe to put all one's eggs in one basket. As if this were not enough, yet another “basket for one’s digitized eggs” is the Encyclopedia of Life (EOL), which gathers digital knowledge (photographs included) about Earth’s organisms.

The guardian angel, or overseer, of this global digitization realm on the natural sciences side is the Convention on Biological Diversity (CBD), formed under the United Nations Environment Programme (UNEP). The CBD follows how the digitization of biological samples is developing, and it lists some current pitfalls. One concern is the lack of sufficient and accurate information on given species, together with the fact that persistent identifiers are not yet widely used. Another concern is that the global digitization realm is project-based, which usually means short-term funding and thus quite possibly a short-term existence. In addition, there is clearly some competition between the existing systems, yet none of them is able to meet all the needs. It can be concluded that players who plan to survive in the field of biological digitization should aim to cover a broad range of identifiers, while prioritizing their work for optimal efficiency in time and cost. Data accuracy, transnational collaboration, development of tools and methods, global interoperability and long-term funding solutions are also factors that determine a forerunner in the field.

Pasi Liimatta, Eini Nyyssönen, Hanna Pennanen and Pertti Pulkkanen