Archive for the ‘internet programming’ Category
I have recently entertained a whole series of connected thoughts inspired by the idea of the sentiment ticker. To summarise, this is a device that uses keyword analysis to read the emotions of people who write content online. Because there is so much content online, it may be possible to gauge the moods and feelings of whole groups, nations or even the world. The technique is in its infancy, but leading hedge funds already use it to try to predict stock prices, so perhaps where the money goes the rest will follow.
Coincidentally, I started listening to an audiobook of Isaac Asimov’s Foundation at the same time, and I soon noticed a synchronicity. Author’s note: I don’t believe such coincidences are magical or psychic in origin, as Jung thought, but they are still an interesting mental event and even a literary tool.
The sentiment ticker, and opinion mining in general, would definitely qualify as embryonic Asimovian psychohistorical tools. Governments and organisations could use them for prediction and other research. A way to formalise events in current affairs and link them with observed sentiment trends would be very powerful. If connections could be found, you might have the beginning of actual equations.
The other connected subject is the real, as opposed to SF, discipline of Psychohistory. It uses psychotherapy techniques to try to understand the motivation of nations, groups and particularly political leaders. It is fascinating, and its primary conclusion is that child rearing is critical for the future of the species. This is because psychological damage to children propagates into damaged, maladapted adults who act neurotically, create conflict and perpetuate their damage in the world. The sentiment mining approach could be a very powerful tool for a modern de Mausian psychohistorian, because huge volumes of textual data could be sifted for emotion words and phrases which correspond to psychohistorical patterns. Incidentally, the baroque violence and depraved imagery of newspaper political cartoons are currently one of the richest veins for sentiment mining by psychohistorians, but an image processor that could do this job would currently be very hard to make.
The sentiment ticker is a small widget that sits on your computer’s desktop, much like the conventional stock ticker. This widget connects to a central server cluster run by the provider. It provides a graphical and numerical display of various indices of market, political and consumer mood and feeling. The central servers constantly poll blogs, tweets, journalistic articles and press releases and perform sentiment analysis on this online content. Sentiment mining means looking for clusters of mood and emotion words in certain contexts. This has been shown to be useful, most notably when the Arab Spring was “predicted” (in hindsight) by analysing the sentiments of millions of bloggers, tweeters etc. A statistically significant spike was found in Middle East mood just before the Spring started.
A stock ticker is a small program that stays open on your screen with real-time information on stock prices and the states of the various markets. It usually shows a graph or two and the price information you have set it to track. Traders keep them open all the time.
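To make the keyword idea concrete, here is a minimal sketch of how a server might turn text into a mood index. It is only an illustration under stated assumptions: the tiny lexicon, the weights and the function names `sentiment_score` and `mood_index` are all made up for this example, and a real service would need a far larger lexicon and some handling of context and negation.

```python
import re

# Hypothetical mini-lexicon: words and weights are illustrative assumptions,
# not taken from any real sentiment service.
LEXICON = {
    "hope": 1, "happy": 1, "calm": 1, "proud": 1,
    "angry": -1, "fear": -1, "afraid": -1, "despair": -1,
}

def sentiment_score(text):
    """Sum the lexicon weights of the words and divide by the word count.

    Returns 0.0 for text with no words.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    total = sum(LEXICON.get(word, 0) for word in words)
    return total / len(words)

def mood_index(documents):
    """Average the per-document scores into a single index value."""
    scores = [sentiment_score(doc) for doc in documents]
    return sum(scores) / len(scores) if scores else 0.0
```

Feeding `mood_index` a fresh batch of posts at regular intervals would give the kind of time series a ticker widget could plot.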
I am slightly fascinated by Google Sets because of its connection to ontologies, knowledge extraction, data mining and so on. I came up with a script that automatically queries Sets to build families of keywords that all share a very approximate subject domain. I used this for some personal research on a way to use page metadata keywords to assign pages to a given subject domain. For each subject (“Physics…Chemistry…Biology…” etc.) I had the subject name as a dictionary key, and the corresponding dictionary value was a list of keywords. If a page’s meta keywords contained one of those words, the page was more likely to be about that subject.
In the small print Google says no use for commercial purposes, but they lock you out if you automate more than a handful of queries anyway, so I don’t think I was being that evil.
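The dictionary described above can be sketched roughly as follows. This is not my original script: the keyword lists and the `classify` function are assumptions made up for illustration, and matching is simplified to exact, case-insensitive equality between a meta keyword and a subject keyword.

```python
# Hypothetical subject dictionary: subject name -> indicative keywords.
SUBJECT_KEYWORDS = {
    "Physics": ["quantum", "particle", "relativity", "momentum"],
    "Chemistry": ["molecule", "reaction", "acid", "polymer"],
    "Biology": ["cell", "species", "protein", "evolution"],
}

def classify(meta_keywords, subject_keywords=SUBJECT_KEYWORDS):
    """Return the subject sharing the most keywords with the page, or None.

    Ties go to whichever subject appears first in the dictionary.
    """
    meta = {keyword.strip().lower() for keyword in meta_keywords}
    best_subject, best_hits = None, 0
    for subject, keywords in subject_keywords.items():
        hits = len(meta & {keyword.lower() for keyword in keywords})
        if hits > best_hits:
            best_subject, best_hits = subject, hits
    return best_subject
```

A page whose meta keywords match nothing in the dictionary is left unclassified rather than forced into a subject.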
I wondered if the meta keywords could provide a sufficiently accurate way to classify documents, so I fed my system some dummy pages and it got the subject right about 80% of the time. In my code I started with a seed list of 10 or so keywords, which were then ‘grown’ in Sets to populate the list with more words from a similar domain. I believe the algorithm behind Sets has been used in some ontology work elsewhere, including NELL and Google Squared.
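The ‘growing’ step might look something like the sketch below. The lookup itself is passed in as a function, `related_terms`, which is a hypothetical stand-in for whatever service returns similar words; the toy table here is invented purely so the example runs, and the `rounds` and `limit` parameters are my own assumptions.

```python
def grow_keywords(seeds, related_terms, rounds=2, limit=50):
    """Expand a seed list by repeatedly asking for terms related to the newest words.

    related_terms is any callable mapping one term to a list of similar terms.
    """
    grown = list(dict.fromkeys(seed.lower() for seed in seeds))  # de-duplicate, keep order
    frontier = list(grown)
    for _ in range(rounds):
        next_frontier = []
        for term in frontier:
            for candidate in related_terms(term):
                candidate = candidate.lower()
                if candidate not in grown and len(grown) < limit:
                    grown.append(candidate)
                    next_frontier.append(candidate)
        frontier = next_frontier  # only newly found words are queried next round
    return grown

# Toy stand-in for the lookup service (made-up data).
TOY_RELATED = {
    "quantum": ["particle", "photon"],
    "particle": ["quark", "quantum"],
    "photon": ["laser"],
}

def toy_related(term):
    return TOY_RELATED.get(term, [])
```

Capping the list with `limit` matters in practice, because each round can drift further from the seed subject.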
I visualised my document collection as a bag of rice grains scattered onto the floor. My subject categories were like playing cards thrown onto the grains. In a taxonomic system you want maximum coverage by the cards and minimum overlap among the terminal categories at the bottom level. There are two kinds of overlap: true overlap, where two cards partly cover each other but each still has a part the other does not cover, and enclosure, where a bigger card completely covers smaller ones. For a taxonomy that covers every instance you want the terminals to have no overlap. Parent and ancestor categories can cover by inclusion, but overlap among the terminals causes ambiguity, which is bad for the system. If a user is looking for “biophysics” papers and some of those have been placed, unbeknownst to her, under “physical biology”, then the user has been failed by the system.
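The cards-and-rice picture maps quite naturally onto sets of document IDs, so a rough check for coverage and terminal overlap could be sketched like this. The category names, the document IDs and the functions `relation` and `audit` are assumptions for illustration only.

```python
from itertools import combinations

def relation(a, b):
    """Classify how two categories (sets of document IDs) relate."""
    a, b = set(a), set(b)
    if not (a & b):
        return "disjoint"
    if a <= b or b <= a:
        return "enclosure"   # one card lies entirely on top of the other
    return "overlap"         # the cards partly cover each other

def audit(terminals, all_documents):
    """Return (coverage fraction, list of non-disjoint terminal pairs)."""
    documents = set(all_documents)
    covered = set().union(*terminals.values()) & documents
    coverage = len(covered) / len(documents) if documents else 0.0
    clashes = []
    for first, second in combinations(terminals, 2):
        kind = relation(terminals[first], terminals[second])
        if kind != "disjoint":
            clashes.append((first, second, kind))
    return coverage, clashes
```

For terminal categories, any pair reported by `audit` is a place where a user could look under one heading and miss papers filed under the other.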