Mining spatial data from Twitter is fairly easy. You just need to download the open-source R software and geoffjentry's twitteR package from GitHub. I had never used R before, but this manual by Julian Hillebrand is fantastic and makes getting started really easy.

After these preparations, all you need is the following code to mine tweets containing a certain hashtag from a certain location. In constructing the code, I got help from fellow R users on the R-help site and the R-help mailing list. This is really the coolest feature of the whole open data scene: everyone is willing to help and share their knowledge and information for nothing. Motivated by this culture, I will publish the data I mined in the next blog post (I have it on another computer). Feel free to use it if you are interested in Twitter activity in European metropolitan areas.

Here is the code if you want to use different areas or different hashtags:

searchTwitter('innovation', n=30000, geocode='48.8566,2.3509,30mi')

Here n is the expected number of tweets from the last two weeks. The smaller the number, the faster the software mines the data, so for smaller metropolises I used n=1000, and if the command actually returned 1000 tweets, I raised the number. For the largest metropolises it was necessary to use n=30000. geocode gives the latitude and longitude of the location the tweets are mined from; the coordinates have to be given to four decimal places. I looked the coordinates up one by one on the website Find latitude and longitude. After the coordinates, you can define the radius within which tweets are mined; the radius must be given in miles. I assumed a 30-mile radius to be roughly the average size of a metropolitan area. By changing these parameters in the code, I mined the list of tweets city by city. I will upload the database soon.
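To spare you editing the geocode string by hand for every city, the city-by-city loop above can be sketched roughly as below. This is a minimal sketch, not my exact script: the city names and coordinates here are only illustrative examples, the `make_geocode()` helper is something I name here for convenience, and the twitteR calls are commented out because they only work after you have registered a Twitter app and authenticated.

```r
# Assumes the twitteR package is installed; uncomment once it is:
# library(twitteR)

# Build the "lat,long,radius" string that searchTwitter() expects,
# with coordinates to four decimal places and the radius in miles.
make_geocode <- function(lat, lon, radius_mi = 30) {
  sprintf("%.4f,%.4f,%dmi", lat, lon, radius_mi)
}

# Illustrative coordinates only; look yours up city by city.
cities <- data.frame(
  name = c("Paris", "London"),
  lat  = c(48.8566, 51.5074),
  lon  = c(2.3509, -0.1278)
)

# Authenticate first with your own keys from a Twitter developer account:
# setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

for (i in seq_len(nrow(cities))) {
  gc <- make_geocode(cities$lat[i], cities$lon[i])
  # Start with a small n and raise it if the full amount comes back:
  # tweets <- searchTwitter('innovation', n = 1000, geocode = gc)
  # df <- twListToDF(tweets)   # flatten to a data frame for saving
  print(gc)
}
```

The same helper also makes it easy to experiment with a different radius per city, since the 30-mile default is only my rough average for a metropolitan area.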
Author: Juho Kiuru, geographer living in Helsinki, Finland.