Social Media Analytics (SOMA) is the practice of analyzing social media sentiment to take the current pulse of a topic and gauge what the online world is saying about it. To do this, a computer needs Natural Language Processing (NLP) capabilities to make sense of the enormous volume of messages being exchanged and shared.
My 12-month Specialist Diploma in Business Analytics is coming to an end soon, and the final semester's topic was SOMA, the subject that interests me the most. Over the past few weeks, I was introduced to this field of study, a build-up to the finale of a year of evening classes and homework for projects, assignments and tests. Over the last 11 months at Temasek Poly, we have completed modules in business intelligence fundamentals (QlikView dashboards), quantitative statistics, and data mining (SAS Enterprise Miner, predictive models).
This course has opened my eyes to a world where technology empowers us to do big data analysis on a scale never before thought possible. The quantity of data is no longer an issue: we can use software that crunches hundreds of thousands of data points in seconds, and build models that look for trends to make sense of past and current data.
The art lies in using these tools to reach conclusions and recommendations with a high level of confidence, backed by data analysis. Every person will approach the analysis from a different angle. I am trying to leverage my past bank-marketing experience to pivot into the business analytics sector with a value-added edge.
In the past three weeks, we learned that social media platforms like Facebook and Twitter allow users to extract data easily from their databases. Our assignment was to crawl 500 tweets for each of two companies from Twitter using RStudio, and then compare the sentiment scores of the results.
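The sentiment-score comparison can be sketched as a simple word-list count: score each tweet as (positive words) minus (negative words), then average per company. The assignment used RStudio; the sketch below shows the same idea in Python, and the word lists and sample tweets are purely illustrative, not the assignment's actual data.

```python
# Illustrative word lists -- a real analysis would use a curated lexicon.
POSITIVE = {"great", "love", "fast", "excellent", "happy"}
NEGATIVE = {"slow", "bad", "terrible", "hate", "broken"}

def sentiment_score(tweet: str) -> int:
    """Score = (# positive words) - (# negative words) in the tweet."""
    words = tweet.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def average_score(tweets: list[str]) -> float:
    """Mean sentiment score across a company's tweets."""
    return sum(sentiment_score(t) for t in tweets) / len(tweets)

# Hypothetical tweets for two companies being compared.
company_a = ["love the fast service", "great app and happy customer"]
company_b = ["slow delivery and bad support", "terrible experience"]
print(average_score(company_a))  # positive on average
print(average_score(company_b))  # negative on average
```

Comparing the two averages gives a rough read on which company is viewed more favorably, which is the essence of the assignment's comparison.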
On NLP, I learned how computers mine information from documents and phrases. Of all the words used in a document, it has been shown that statistically about 80% are noise, while roughly 18% carry the important information. The remaining 2% are "stop-list" words that hold no meaning and should be discarded (e.g., prepositions and articles such as a/the/at/on). By slowly refining the word list and "training" the software, we can then extract the main messages within the document.
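The stop-list step described above can be sketched in a few lines: drop the stop-list words, then rank what remains by frequency to surface the key terms. The stop list here is a tiny illustrative sample, not a real NLP resource.

```python
from collections import Counter

# Tiny illustrative stop list -- real tools ship much larger ones.
STOP_WORDS = {"a", "an", "the", "at", "on", "in", "of", "to", "and", "is"}

def key_terms(document: str, top_n: int = 3) -> list[str]:
    """Discard stop-list words, then rank the rest by frequency."""
    words = [w for w in document.lower().split() if w not in STOP_WORDS]
    return [word for word, _ in Counter(words).most_common(top_n)]

doc = "the bank launched a new savings account and the savings rate is the best rate on offer"
print(key_terms(doc))
```

After filtering, the repeated content words ("savings", "rate") rise to the top, which is the basic mechanism behind finding the main messages in a document.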
Our final project will be to use SAS Enterprise Miner to analyze 290 recipes for Thai, Indian and Italian dishes and train a model (decision tree or regression) to accurately identify the cuisine of any dish from its method of preparation.
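To give a feel for what such a model learns, the toy sketch below hand-codes the kind of splits a trained decision tree might make on words from a recipe's preparation text. This is not SAS Enterprise Miner and the keyword splits are illustrative assumptions, not rules learned from the actual 290 recipes.

```python
def classify_cuisine(recipe_text: str) -> str:
    """Toy decision-tree-style classifier on preparation-text keywords."""
    words = set(recipe_text.lower().split())
    # Root split: wok / lemongrass-style preparation suggests Thai.
    if words & {"wok", "lemongrass", "stir-fry"}:
        return "Thai"
    # Next split: tandoor / simmered-spice preparation suggests Indian.
    if words & {"tandoor", "masala", "ghee"}:
        return "Indian"
    # Fallback leaf for everything else in this three-class toy.
    return "Italian"

print(classify_cuisine("stir-fry in a wok with lemongrass"))  # Thai
print(classify_cuisine("simmer the masala in ghee"))          # Indian
print(classify_cuisine("boil the pasta al dente"))            # Italian
```

A real tree would learn these splits from word frequencies in the training recipes rather than from hand-picked keywords, but the branching structure is the same.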
So much to learn, and so little time left to digest it all before producing the work, complete with a video presentation, within the next two weeks. Oh, and did I mention we also have a test next week?