The availability of geographical information in social media data such as Twitter makes it possible to identify geographical diversity in how information spreads across a country. This document reports an analysis of the hashtags contained in geo-enabled tweets collected in France over a two-week period, with the aim of identifying their geographical characteristics. Topic modelling methods are used to cluster the hashtags into meaningful topics. The geo-topical distribution of hashtags is then computed for each French department, from which the local or global character of each topic can be identified. The analysis shows that hashtag clusters containing geographically self-referential information and punctual events are largely local, whereas clusters related to entertainment and pets are mostly global. Interestingly, some clusters related to TV shows and entertainment turn out to have local preferences.
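The summary above does not state how the local or global character of a topic is measured. The following is a minimal sketch under stated assumptions: each geo-tagged hashtag occurrence has already been mapped to a department code, and each hashtag to a single topic. The per-department topic distribution is taken as topic shares. A topic's locality is scored by the normalized entropy of its spread across departments (0 = concentrated in one department, 1 = spread evenly). The entropy score, the input format and the helper names are hypothetical, not the notebooks' code. The score also ignores differences in tweet volume between departments, so comparing each topic against the overall tweet distribution would be a natural refinement.

```python
import math
from collections import Counter, defaultdict


def geo_topic_distribution(pairs, hashtag_topic):
    """Share of each topic among the topic-labelled hashtags of every department.

    pairs: list of (department, hashtag) occurrences (hypothetical input format);
    hashtag_topic: dict mapping a hashtag to its topic id.
    """
    counts = defaultdict(Counter)
    for dept, tag in pairs:
        topic = hashtag_topic.get(tag)
        if topic is not None:  # skip hashtags that were not assigned to a topic
            counts[dept][topic] += 1
    distribution = {}
    for dept, c in counts.items():
        total = sum(c.values())
        distribution[dept] = {topic: n / total for topic, n in c.items()}
    return distribution


def topic_locality(pairs, hashtag_topic):
    """Normalized Shannon entropy of each topic's occurrences over departments.

    0.0 means the topic appears in a single department (local);
    1.0 means it is spread evenly over all observed departments (global).
    This entropy score is an illustrative assumption, not the notebooks' measure.
    """
    departments = {dept for dept, _ in pairs}
    by_topic = defaultdict(Counter)
    for dept, tag in pairs:
        topic = hashtag_topic.get(tag)
        if topic is not None:
            by_topic[topic][dept] += 1
    # Entropy of a uniform spread over all departments, used for normalization.
    max_entropy = math.log(len(departments)) if len(departments) > 1 else 1.0
    scores = {}
    for topic, c in by_topic.items():
        total = sum(c.values())
        entropy = sum(-(n / total) * math.log(n / total) for n in c.values())
        scores[topic] = entropy / max_entropy
    return scores
```

For example, with occurrences `[("75", "#paris"), ("75", "#paris"), ("75", "#chat"), ("13", "#chat")]` and topics `{"#paris": 0, "#chat": 1}`, topic 0 only appears in department 75 and scores 0.0, while topic 1 is split evenly between 75 and 13 and scores 1.0.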
This repository hosts the code used to collect, clean, explore and analyze the data.
twitter_stream.ipynb: collects tweets via the Twitter Standard Streaming API
preprocessing.ipynb: after an anonymization process, NLP cleaning of the textual contents, including tokenization, stopword removal, POS tagging, ... (not used further due to technical difficulties), and geo-enrichment based on the geo-tags provided by the API
Hashtags_analysis.ipynb: coarse exploration of the hashtags contained in the tweets and first analyses
LDA.ipynb: LDA topic modelling on hashtags, including results and visualizations
LGLDA_tune.ipynb: tuning procedure for the hyperparameters of the LGLDA model
LGLDA.ipynb: LGLDA topic modelling on hashtags, including results and visualizations
Network.ipynb: topic modelling through a graph-theory approach
Louvain_viz.ipynb: Louvain community topic modelling on hashtags, including results and visualizations
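The notebooks' exact graph construction is not described here. A common choice, assumed in the sketch below, is a hashtag co-occurrence graph in which two hashtags are linked with a weight equal to the number of tweets containing both. The Louvain method greedily searches for a partition of such a graph that maximizes Newman modularity Q. The sketch does not implement the Louvain optimization itself (libraries such as NetworkX provide `louvain_communities`); it only builds the graph and scores a given partition. It assumes tweets are stored as raw Twitter API v1.1 JSON objects, where hashtags are listed under `entities.hashtags`. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def extract_hashtags(tweet):
    """Sorted, lower-cased, de-duplicated hashtags of one Twitter API v1.1 tweet (dict)."""
    # In the v1.1 streaming payload, long tweets carry their full entities in "extended_tweet".
    source = tweet.get("extended_tweet", tweet)
    tags = source.get("entities", {}).get("hashtags", [])
    return sorted({"#" + h["text"].lower() for h in tags})


def cooccurrence_graph(tweets):
    """Weighted undirected edges: (tag_a, tag_b) -> number of tweets containing both."""
    edges = Counter()
    for tweet in tweets:
        # Hashtags are sorted, so each pair is stored once as (a, b) with a < b.
        for a, b in combinations(extract_hashtags(tweet), 2):
            edges[(a, b)] += 1
    return edges


def modularity(edges, community):
    """Newman modularity Q of a partition (community: tag -> label) of the weighted graph."""
    m = sum(edges.values())  # total edge weight
    strength = Counter()  # weighted degree of each node
    internal = Counter()  # edge weight inside each community
    for (a, b), w in edges.items():
        strength[a] += w
        strength[b] += w
        if community[a] == community[b]:
            internal[community[a]] += w
    community_strength = Counter()
    for node, s in strength.items():
        community_strength[community[node]] += s
    # Q = sum over communities of (internal weight / m) - (community strength / 2m)^2
    return sum(internal[c] / m - (community_strength[c] / (2 * m)) ** 2
               for c in community_strength)
```

With two tweets tagging `#Paris` and `#OM` and one tweet tagging `#chat` and `#chien`, splitting the graph into {`#om`, `#paris`} and {`#chat`, `#chien`} gives Q = 4/9. Putting every hashtag in a single community gives Q = 0.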
The main results and visualizations are also provided in two folders.
Interactive HTML figures are provided for deeper exploration:
data_filtered.html: map of the number of tweets for selected departments
hashtags_count.html: map of the hashtag counts for selected departments
finalMap.html: final visualization associating a word cloud with each of the studied departments
Louvain.html: hashtag network with nodes colored according to their Louvain community (topic)
CSV files of the topics found are provided under:
df_lda_short.csv: topics found with LDA, with their 10 top words
df_lglda_short.csv: topics found with LGLDA, with their 10 top words
df_louvain.csv: topics found with the network approach (communities with at least 5 hashtags)
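The CSV summaries above keep the 10 top words of each LDA/LGLDA topic and only the Louvain communities with at least 5 hashtags. A minimal sketch of both selections follows. It assumes a topic-word weight matrix given as nested lists, one row per topic over a shared vocabulary (for instance the matrix returned by gensim's `LdaModel.get_topics()`), and a hashtag-to-community mapping. Both helpers are hypothetical, not the notebooks' code.

```python
from collections import defaultdict


def top_words(topic_word, vocab, n=10):
    """The n highest-weighted vocabulary entries of each topic (one row of weights per topic)."""
    return [
        [vocab[i] for i in sorted(range(len(row)), key=row.__getitem__, reverse=True)[:n]]
        for row in topic_word
    ]


def large_communities(community, min_size=5):
    """Group hashtags by community label, keeping communities with at least min_size hashtags."""
    groups = defaultdict(list)
    for tag, label in community.items():
        groups[label].append(tag)
    return {label: sorted(tags) for label, tags in groups.items() if len(tags) >= min_size}
```

Equal weights keep their vocabulary order because Python's sort is stable, so ties are broken deterministically.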