Twitter is a fantastic news source and a provider of billions of noisy, needle-in-a-haystack tweets that confound data scientists and delight analysts and commercial marketers. Interactivity with billions of data items is key to developing, understanding, and validating an analysis. (Validation is emphasized because Google was recently called out by IEEE Spectrum for biased testing of self-driving cars.) Visualizing Twitter data geographically is challenging because billions of tweets simply wipe out the background map. Even more challenging is the problem captured by the tagline I use for graph analysis: “A laptop can represent a billion node graph. People don’t understand billion node graphs, or million, or thousand, or hundred, or even graphs many more than ten nodes.” For these reasons, the interactivity and display capability of the distributed GPUdb database is interesting and potentially a “deal-maker” for social analytics types.
Following is an introductory video by GPUdb:
They have even made a web-enabled interactive demo available for evaluation.
While visualization is key, so are validation and the ability to distill information down to a simple set of metrics. See my GTC 2013 talk, where the click-together framework processes a billion tweets interactively on a commodity workstation. The ability to process a billion tweets quickly was key to developing the idea of “sociolects”, which identify groups in social media purely by word usage (see here to read the peer-reviewed “Sociolect-based Community Detection”), to enable analysis and validation (via triangulation and numerical methods) for social media.
Start at 39 minutes into the video. The slides can be viewed here starting at slide 34.
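To give a feel for what grouping "purely by word usage" can mean, here is a toy Python sketch. It is not the method from the talk or from the “Sociolect-based Community Detection” paper. It simply assumes each user can be summarized by relative word frequencies, compares users with cosine similarity, and groups them greedily. The function names (`word_profile`, `cosine_similarity`, `group_by_word_usage`) and the `threshold` value are hypothetical choices made for illustration only.

```python
from collections import Counter
import math


def word_profile(tweets):
    """Return the relative frequency of each word across a user's tweets."""
    counts = Counter(word for tweet in tweets for word in tweet.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse word-frequency profiles."""
    dot = sum(p[word] * q[word] for word in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def group_by_word_usage(user_tweets, threshold=0.5):
    """Greedy grouping (an illustrative assumption, not the published method).

    Each user joins the first group whose seed profile is at least
    `threshold` similar; otherwise the user seeds a new group.
    """
    groups = []  # list of (seed_profile, member_names)
    for user, tweets in user_tweets.items():
        profile = word_profile(tweets)
        for seed, members in groups:
            if cosine_similarity(seed, profile) >= threshold:
                members.append(user)
                break
        else:
            groups.append((profile, [user]))
    return [members for _, members in groups]
```

Calling `group_by_word_usage` with a dictionary that maps user names to lists of tweet strings returns a list of groups, each a list of user names. At a billion tweets, a pairwise pure-Python loop like this would be far too slow, which is why interactive, GPU-backed processing matters for this kind of analysis.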
