Commerce, search engines, social media and other actors of the digital world meticulously record in their databases the purchases we make and the websites we visit. They already know a lot about us and our preferences, and would be more than happy to add register data on us to their databases, if only legislation allowed. For these actors, managing and systematically analysing large data sets is an everyday activity, and a seemingly productive one as well.
In scientific research, the natural and medical sciences have long experience in managing large data sets. In the human sciences, linguistics has been a pioneer in Finland through the national FIN-CLARIN consortium, which is part of the European CLARIN network. Attitudes in the human sciences towards the data resources known as big data have not been unreservedly positive. The new way of producing information that big data processing requires, with its computational models and artificial intelligence, has even been regarded as a threat to hermeneutic research. Advocates of big data, on the other hand, have highlighted the opportunities these same methods offer for analysing large data sets.
Within the human sciences, there is agreement that the volume of big data is growing exponentially in tandem with the digitalisation of our society and the digitisation of previously created historical materials. The usability of these data sets, in turn, is promoted by investments in open science, which multiply the number of potential users. Big data cannot be ignored or hushed up in the human sciences either, as evidenced by the appearance of themed issues in the field's journals and the emergence of publication series devoted to big data, for example Big Data & Society from 2014 onwards. Semi Purhonen and Arho Toikka provided an excellent analysis of the significance of big data for social science research in an article published in Sosiologia (1/2016). According to them, the new methods that allow the use of large data sets, and the resulting change in the ratio between quantitative and qualitative methods, are revolutionary. Computational methods enhance transparency and repeatability in the analysis of text materials. Having presented this view, a radical one in the context of the human sciences, the authors note that the new computational methods do not undermine the significance of context-related understanding and substance-related expertise.
The fact that mastery of computational methods is not part of the basic methodological toolkit of social scientists, or of humanists for that matter, is largely overlooked in the discussion relating to digital humanism and big data. To mine relevant and in-depth information from big data, the researcher must be well versed in the phenomenon and its contextualisation, and must also have expertise in collecting and analysing large data sets. For an individual researcher, the challenge seems, and is, too great to overcome. In-depth mastery and analysis of big data in the social sciences call for multidisciplinary collaboration involving at least computer science and statistics. Big data challenges the paradigms of producing scientific information in the human sciences, and also calls for our educational system to renew itself. The utilisation of exponentially growing data sets must not remain the privilege of the few, but needs to be harnessed to serve the development of society at large.