10
Retrieving

Carolin Gerlitz

Introduction

Digital and social media have opened up new avenues for data collection about social, cultural and political life. Since the advent of digital online media, their data have been of interest to a range of disciplines which have approached digital data with their respective questions and methodologies. Within sociology, online and social media data initially inspired the hope that these new data formats – unlike traditional sociological methods such as interviews or questionnaires – might not be ‘contaminated’ by the interferences of researchers and their methodologies (Savage and Burrows 2009). Engaging with the making and technicity of digital and social media data, however, soon confronted researchers with multiple inscriptions of media, methods and their tools (Ruppert, Law and Savage 2013). Various methodological approaches to access such data preformatted by media have emerged in the context of media and communication studies, sociology and Science and Technology Studies (STS). In the field of digital research methods, digital media are treated as research devices, capable of structured data production (Rogers 2013; Weltevrede 2015). STS have embraced digital data as defined by the actors themselves rather than by researchers (Callon 2006), and contributions from the field of Actor Network Theory (Latour et al. 2012) have pointed out that digital data offer both a granular view on individual actions and an aggregated overview, allowing the possibility to cut across the micro/macro distinction that has been so central to sociological debates.

Across disciplines, different ways of accessing data from online and social media have emerged, most notably scraping, that is the extraction of preformatted data from user interfaces (Marres and Weltevrede 2013), but also retrieval, the extraction of data via application programming interfaces (APIs) offered by digital and social media platforms. APIs are software interfaces that enable researchers and other third parties to connect to associated databases in order to produce content for, or extract data from, platforms. This access is usually highly structured, standardized and regulated by the associated platform, offering data access to developers, business partners and researchers among others. As scraping is limited by what can be extracted from user interfaces, retrieving has gained increasing relevance in the context of ever-growing volumes of data.

Interdisciplinary methodological debates have drawn attention to the various inscriptions at stake when working with digital and social media data and have attended to possible bias built into the data by the media that pre-structure them for their own purposes (Marres and Gerlitz 2016). Twitter data, for instance, may well be used to study public debates, but is originally structured according to Twitter’s own valuation logic, which largely focuses on identifying popularity and trending topics. Data retrieval through APIs has emerged as an interdisciplinary methodological approach which enables access to such actor-defined data, allowing researchers to attune their methods to their research object. However, when tracing the inscriptions of retrieved data, it becomes apparent that retrieval not only confronts researchers with the self-categorization of the medium that provides APIs in the first place, but with a larger cascade of inscriptions, as the data accessible from one platform might not necessarily have their origin in that same platform. Retrieval, this chapter suggests, poses particular challenges to inventive methods that seek to account for the ongoing happening of social life (Lury and Wakeford 2012) while attending to the mutual inscription of method and problem. In the context of complex data ecologies and interoperability between platforms, the question emerges as to what data retrieval makes accessible in the first place and to which inscription or bias methods need to attune. To account for the inventiveness in retrieving, we need to attend to its technicity, namely APIs, first.

Application programming interfaces and the grammatization of data

Many digital and social media platforms offer (a variety of) APIs to build upon, produce content or extract data, enabling structured access to platform databases. In the case of Twitter, for example, the social media platform offers a so-called REST API for discrete queries, a Streaming API for continued data-capture in real time and an Advertising API.1 The data available via APIs can be considered pre-structured on many levels. On a first level, they are the result of standardized platform activities – or grammars of action, to draw on the work of Philip Agre (1994) – which enable users to act in particular ways and platforms to instantly capture data about these actions in standardized form. In the context of Twitter, these grammars focus on organizing user relations (friending, following, muting), comments, likes, retweets, status updates and posts among others, as well as their metadata. The grammatization of user action in the front-end is met with another layer of grammatization in the back-end in the form of API commands and regulations that determine which data can be accessed by whom in what quantities. To remain with the case of Twitter, API access is organized through OAuth,1 a personalized access token for third-party API access. Once access is granted, input to or retrieval from the database is pre-structured through the platform’s developer-facing grammars. In the case of Twitter’s REST API, grammars are organized around query-related GET commands, which retrieve data, and activity-focused POST commands, which allow the API user to post content. These commands largely mirror the grammars of front ends, but also offer additional data and are policed through extensive documentation, good practice cases and Twitter’s rules of conduct.
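This developer-facing grammar can be illustrated with a minimal sketch. The endpoint and parameter names follow the REST API (v1.1) as documented at dev.twitter.com during the period discussed here; the bearer token is a placeholder, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode

# Base URL of Twitter's REST API (v1.1); the Streaming API
# used a separate host and a persistent connection instead.
REST_BASE = "https://api.twitter.com/1.1"

def build_search_request(query, count=100, token="YOUR-OAUTH-BEARER-TOKEN"):
    """Construct (but do not send) a GET request against the
    search/tweets endpoint. The parameter names mirror the
    platform's developer-facing grammar: what can be asked for,
    and in what quantities, is fixed in advance."""
    params = urlencode({"q": query, "count": count, "result_type": "recent"})
    url = f"{REST_BASE}/search/tweets.json?{params}"
    # Access is granted via an OAuth token passed in the request headers.
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers

url, headers = build_search_request("#example", count=10)
```

The point of the sketch is that every element of the request – endpoint, query parameter, result count, access token – is defined by the platform, not the researcher.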

Looking at data retrieval through the lens of grammatization brings to attention that organizing data into categories and units is increasingly distributed, at least between the researcher, users realizing these grammars and the media, as platforms define what data formats users can generate and retrieve through APIs. This redistribution of ordering capacities away from the researcher towards media platforms was initially perceived as the promise of transactional digital data resulting from the direct capture of user activities. Following Callon, ‘One way of testing the relevance and robustness of a proposed categorization is to allow the entities studied to participate in the enterprise of classification’ (2006: 8). Indeed, the grammatization of API data suggests that researchers should attune their methods to the categorizations, inferences and objections made by the medium.

Retrieval and realism

However, by attending only to the categorizations suggested by a medium, data research enabled through API retrieval runs the risk of re-enacting a specific form of realism. If we follow Alain Desrosieres (2001), there are different degrees of data realism: metrological realism, which assumes an unproblematic relationship between the world and its measure; accounting realism, which establishes the trustworthiness of metrics through standardized practices; and proof-in-use realism, in which realities are defined by the databases that promise to describe them, while little attention is paid to how the data are made, captured and animated.

For users in this third group, ‘reality’ is nothing more than the database to which they have access. Normally, such users do not want to (or cannot) know what happened before the data entered the base. They want to be able to trust the ‘source’ (here the database) as blindly as possible to make their arguments.

Desrosieres 2001: 346

Such trust in a data source is akin to the hopes and phantasies characteristic of those early advocates of big data debates who claimed digital transactional data would be more ‘raw’ than qualitative research data. Despite considerable caution being expressed about this approach, such proof-in-use realism may still inform today’s investment in data as pre-structured by or specific to a medium. Desrosieres’ analysis suggests that while the increasing valorization of transactional and actor-categorized data in the context of social media research might initially have been driven by an interest in attuning methods to objects, it might yet end up contributing to a proof-in-use realism if it remains inattentive to the question of how platform databases actually count, capture and compose data in the first place. To understand what digital data is animated by and which other actors participate in its categorization, it is relevant to attend to the wider infrastructures in which retrieval operates.

The ecosystem of data retrieval

APIs not only lend themselves to data retrieval, but also enable a variety of third parties to build on top and produce content for respective platforms, allowing a variety of actors to participate in data production. Let us return to the case of Twitter. Since its launch in 2006, Twitter has offered APIs for developers to extract and input data. The POST commands have resulted in a proliferating ecosystem of third-party Twitter clients, sources and access points, which allow Twitter users to engage with the content and grammars of Twitter via alternative interfaces (Gerlitz and Rieder 2014). Among these access points are Twitter-specific clients concerned with the de- and re-composition of topical streams by offering multiple timelines, professional clients focused on team tweeting, follower growth, journalistic or marketing practices such as Hootsuite, as well as automators such as If This Then That and cross-syndication apps that allow the sharing of content of one platform with another. Each of these clients is built on Twitter’s front-end and back-end grammars, but they also extend them, as they are informed by different ideas of ‘being on Twitter’ (Gerlitz and Rieder 2014). Such clients not only provide alternative interfaces to Twitter grammars, they also might come with a re-interpretation or expansion of these grammars. Take the example of Twitter’s previous favourite and current like button: while some third-party apps interpreted favourites as a means to bookmark and save tweets into collections, others treated them as signs of appreciation and collected favourites received into rankings of popularity (Passmann and Gerlitz 2014). Such divergent interpretations of the same action enabled even more activities to fold into the same grammar and thus the same data point. In doing so, third-party clients contributed to realizing the ‘interpretative flexibility’ of platform grammars (Bijker, Hughes and Pinch 1987), which may come fixed in form, but offer users a certain flexibility to define what a tweet, a favourite/like or a @reply stands for.

In addition, data retrieval has to face a third layer of grammatization, as clients are not only able to reinterpret grammars, but are also able to fold the data of one platform into the grammars of another platform. In the case of cross-platform syndication, that is the automatic posting of content from one platform to another, the grammars of one platform (hashtags/posts/images on Instagram or Facebook) are transposed into the grammars of another platform (Twitter). The data researchers retrieve from the Twitter API might seem comparable and countable through their standardized forms, but might not even have been created for Twitter in the first place: they may have been cross-syndicated from Facebook, transformed from RSS feeds or automatically created from news postings. Is a hashtag produced within Twitter’s web interface comparable to a hashtag cross-syndicated from Instagram or a hashtag automatically selected through software? The grammatization of a platform suggests that the data units it generates are comparable if not similar entities, while at the same time creating the conditions for third parties and users to fold heterogeneous interpretations of these grammars into the platform.
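What such a cross-syndication app does can be sketched as follows. The function, its name and the sample post are hypothetical, invented for illustration; the 140-character limit was Twitter’s tweet grammar at the time.

```python
# Hypothetical sketch of cross-platform syndication: the grammars of one
# platform (an Instagram-style post with a caption and hashtags) are
# transposed into the grammars of another (a tweet of at most 140
# characters, the limit at the time, with a link back to the source).

def syndicate_to_tweet(caption, hashtags, permalink, limit=140):
    """Fold a post from one platform into another platform's grammar."""
    tags = " ".join(f"#{t}" for t in hashtags)
    text = f"{caption} {tags}".strip()
    suffix = f" {permalink}"
    # Truncate so that the caption plus the link fits the tweet grammar.
    if len(text) + len(suffix) > limit:
        text = text[: limit - len(suffix) - 1] + "…"
    return text + suffix

tweet = syndicate_to_tweet(
    "Morning light over the harbour",     # invented sample caption
    ["nofilter", "harbour"],              # invented sample hashtags
    "https://instagr.am/p/abc123",        # invented sample permalink
)
```

The resulting tweet carries hashtags that were never typed into Twitter, which is precisely why a hashtag retrieved from the Twitter API cannot simply be assumed to be a Twitter-native unit.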

Lively metrics

Looking at the proliferating client-ecosystem thus challenges a proof-in-use realism of retrieved data by leading us to ask: what do we actually count when retrieving data through APIs? The activity of categorizing is not only distributed between the researcher and the platform, but is realized through users, their practices and interpretations, third-party clients and cross-platform syndication. What API data retrieval gives access to, therefore, are ‘lively metrics’, that is data categories that are internally dynamic, situated, localized and alive. Their liveliness – as opposed to mere currency or liveness (Marres and Weltevrede 2013) – refers to the multiple ways in which platform grammars can be realized and interpreted. It is hence not only the platform that categorizes and grammatizes the data that can be retrieved; the lively metrics available via APIs are animated by the entire ecosystem of users, practices and clients. Hence, the moment researchers retrieve data through APIs, the data have already been pre-composed in dynamic, local and distributed ways. Or, put the other way around, expanding Agre’s work in the context of social media platforms, grammatization not only enables capture, but establishes avenues for new, dynamic and thus lively forms of data composition, which are often made invisible through standardized data and its retrieval infrastructures.

This opens up new avenues for inventive methods that seek to let objects pose their own problems (Lury and Wakeford 2012).

  1. Lively metrics refuse a single interpretation. Aggregated data units provided through API data retrieval are not comparable from the outset, but need to be made comparable through additional interpretation of the wider ecosystem of actors and practices in which the data are produced. The retrieval of pre-structured data is thus not a discrete process but invites an attentiveness as to how the capture and composition of data are entangled on many levels.
  2. Retrieving data from a single platform means working with data from a multiplicity of sources. The proliferation of clients and cross-platform syndication allows the grammar of one medium to fold into the grammar of another. API retrieval thus gives access to data formats that are themselves already composed as distributed accomplishments. To attune methods to the data retrieved requires the researcher to move beyond a single medium perspective and advance the notion of medium-specificity (Rogers 2013) to include the distributed ecologies of platforms.
  3. Retrieving lively data confronts researchers with the insight that it is not only social life that can be regarded as ‘happening’ (Lury and Wakeford 2012), but data, their capture and composition, are equally subject to such happening. The process of retrieval contributes to the happening of the categorization of data, as it creates specially composed samples.
  4. The liveliness of metrics should not be addressed as a matter of data cleaning but be part of the quest to engage with the messiness and internal heterogeneity of data. Rather than seeking to retain only data formats that are fully comparable and rely on the same interpretation of grammars, data retrieval asks us to attend to the wider dynamics through which data formats are animated. Many platform APIs offer cues for such approaches, as they allow researchers to retrieve the source or client from which platform data was produced in the first place (Gerlitz and Rieder 2014).
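The fourth point can be sketched concretely. Twitter’s REST API returned, for each tweet, a ‘source’ field identifying the client from which it was posted, delivered at the time as an HTML anchor string; the sample tweets below are invented for illustration.

```python
import re
from collections import Counter

def client_name(source_html):
    """Extract the client name from a tweet's 'source' field, which
    the Twitter REST API delivered as an HTML anchor string, e.g.
    '<a href="http://instagram.com" rel="nofollow">Instagram</a>'."""
    match = re.search(r">([^<]+)</a>", source_html)
    return match.group(1) if match else source_html

# Invented sample of retrieved tweets, reduced to the 'source' field.
sample = [
    {"source": '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>'},
    {"source": '<a href="http://instagram.com" rel="nofollow">Instagram</a>'},
    {"source": '<a href="http://instagram.com" rel="nofollow">Instagram</a>'},
    {"source": '<a href="https://hootsuite.com" rel="nofollow">Hootsuite</a>'},
]

# Count which clients animated the retrieved data, rather than
# treating all tweets as equal, interchangeable units.
clients = Counter(client_name(t["source"]) for t in sample)
```

Grouping retrieved data by client in this way keeps the heterogeneity of sources visible in the analysis instead of cleaning it away.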

As data retrieval through APIs is enabled by platforms that seek to open themselves to multiple stakeholders, retrieval should be expanded to capture these relations, foldings and distributed accomplishments of grammars. Engaging with the assembly of capture and composition at stake in platform data suggests that data retrieval has more in common with established methods such as interviews and surveys than with big data hopes for raw and unfiltered transactional data, as the data come with cascades of inscriptions. On the other hand, data retrieval confronts researchers with an even more distributed process of categorization, objection and inscription, which is made partly invisible by data infrastructures and which expands beyond the data source and the researcher to involve all of a platform’s stakeholders. In the context of platform media, data categorization itself can be understood as happening, and inventive retrieval infrastructures can be called to account for its liveliness.

Note

1 https://dev.twitter.com/overview/documentation

References

Agre, P. E. (1994). Surveillance and capture: two models of privacy. The Information Society, 10(2): 101–127.

Bijker, W. E., Hughes, T. P. and Pinch, T. J. (1987). The Social Construction of Technological Systems: New Directions in the Sociology and History of Technology. Cambridge, MA: MIT Press.

Callon, M. (2006). Can methods for analysing large numbers organize a productive dialogue with the actors they study? European Management Review, 3(1): 7–16.

Desrosieres, A. (2001). How real are statistics? Four possible attitudes. Social Research, 68(2): 339–355.

Gerlitz, C. and Rieder, B. (2014). Tweets Are Not Created Equal. Intersecting Devices in the 1% Sample. Presentation at the AoIR conference, Daegu, South Korea.

Latour, B. et al. (2012). ‘The whole is always smaller than its parts’: a digital test of Gabriel Tardes’ monads. The British Journal of Sociology, 63(4): 590–615.

Lury, C. and Wakeford, N. (2012). Inventive Methods: The Happening of the Social. London: Routledge.

Marres, N. and Gerlitz, C. (2016). Interface methods: renegotiating relations between digital social research, STS and sociology. The Sociological Review, 64(1): 21–46.

Marres, N. and Weltevrede, E. (2013). Scraping the social? Journal of Cultural Economy, 6(3): 313–335.

Passmann, J. and Gerlitz, C. (2014). ‘Good’ platform political reasons for ‘bad’ platform data. Zur sozio-technischen Geschichte der Plattformaktivitäten Fav, Retweet und Like. Datenkritik. Retrieved March 2018 from: www.medialekontrolle.de/wp-content/uploads/2014/09/Passmann-Johannes-Gerlitz-Carolin-2014-03-01.pdf

Rogers, R. (2013). Digital Methods. Cambridge, MA: The MIT Press.

Ruppert, E., Law, J. and Savage, M. (2013). Reassembling social science methods: the challenge of digital devices. Theory, Culture & Society, 30(4): 22–46.

Savage, M. and Burrows, R. (2009). Some further reflections on the coming crisis of empirical sociology. Sociology, 43(4): 762–772.

Weltevrede, E. (2015). Repurposing Digital Methods: The Research Affordances of Platforms and Engines. PhD dissertation, University of Amsterdam.