New research published by the Oxford Internet Institute (OII) reveals that determining the geography and languages of Twitter users is fraught with complications, which in turn can undermine the quality of reports produced by marketers, governments, activists and researchers.
The OII collected 111,143,814 tweets and studied samples drawn from them to determine the most effective way to identify a user's language and geographical location.
The study discovered:
- A user's geographical location as written on their Twitter profile frequently conflicted with their device location (which is discoverable through individual tweets). The device location also varied depending on whether it was detected through an IP address or GPS coordinates. A large number of users had invalid place names in their profile, or place names covering too wide a geographic area (e.g. London, Washington or UK);
- In a separate sample of 19.6 million tweets, only 0.7% carried structured geolocation information;
- There are a number of significant challenges in determining the language of tweets automatically (colloquial phrases, the 140-character limit, the use of multiple languages in a single tweet, etc.). Another issue is that Twitter offers only thirty-three user-interface languages, which leaves out key African, Middle Eastern, Indian and Asian languages.
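To make the 0.7% figure concrete, the sketch below counts how many tweets in a collection carry structured location data. It assumes Twitter API v1.1 tweet JSON, in which the `coordinates` field holds a GeoJSON Point (longitude first) and the `place` field holds a tagged place, both being null when no location was attached. The function names are hypothetical, and treating either field as "structured geolocation" is our assumption; the study's exact criterion may differ. At the reported rate, roughly 137,000 of the 19.6 million tweets (19,600,000 × 0.007 = 137,200) would pass such a check.

```python
from typing import Any, Iterable, Mapping


def has_structured_geolocation(tweet: Mapping[str, Any]) -> bool:
    # Assumption: v1.1 tweet JSON, where "coordinates" (GeoJSON Point) and
    # "place" are None/null when the user attached no location to the tweet.
    # The free-text profile field (tweet["user"]["location"]) is deliberately
    # ignored, since it is not structured data.
    return tweet.get("coordinates") is not None or tweet.get("place") is not None


def geolocated_share(tweets: Iterable[Mapping[str, Any]]) -> float:
    # Fraction of tweets carrying structured location data (0.0 if empty).
    total = 0
    geolocated = 0
    for tweet in tweets:
        total += 1
        if has_structured_geolocation(tweet):
            geolocated += 1
    return geolocated / total if total else 0.0


# Hypothetical sample: only the first tweet has GPS coordinates; the others
# rely on a free-text (and often ambiguous) profile location, or none at all.
sample = [
    {"coordinates": {"type": "Point", "coordinates": [-0.1276, 51.5072]},
     "place": None, "user": {"location": "London"}},
    {"coordinates": None, "place": None, "user": {"location": "London"}},
    {"coordinates": None, "place": None, "user": {"location": "Washington"}},
    {"coordinates": None, "place": None, "user": {"location": "UK"}},
]

print(geolocated_share(sample))  # 0.25
```

Note that the second and third tweets still name a place in the profile, but "London" and "Washington" are exactly the kind of broad or ambiguous entries the study flags, which is why they are not counted here.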
Unfortunately, because detecting language on Twitter is so difficult, it isn't yet possible to use language for geographical detection, although it could have been a useful proxy alongside device and profile location information.
Organisations all over the world rely on Twitter data to understand trends and patterns, map customer feedback or predict constituency vote swings. They usually analyse that data with technology less advanced than the OII's, and their reports are often compiled by people with a limited understanding of spatial and language detection.
This study is a useful reminder to anyone who uses Twitter data that more work still needs to be done to improve location accuracy on Twitter. This might be achieved by asking users to supply more detailed location information in their profiles, or the problem may ease over time as mobile usage increases (and with it the number of GPS-enabled devices in use).
However, the report does state, “Although this paper highlights the challenges associated with accurately understanding the geography of information in Twitter, this should not lead us to discount the usefulness of profile locations as a means of geolocating content”.
This post is only a short overview of the study. Far more detail is available in the preprint of the forthcoming article: Graham, M., Hale, S. A., and Gaffney, D. (2013). "Where in the World are You? Geolocation and Language Identification in Twitter." Professional Geographer. Forthcoming.