Pythonism

code and the oracular

New Concept – The Sentiment Ticker


The Concept

The sentiment ticker is a small widget that sits on your computer’s desktop, much like the conventional stock ticker. The widget connects to a central server cluster run by the provider and gives a graphical and numerical display of various indices of market, political and consumer mood. The central servers constantly poll blogs, tweets, journalistic articles and press releases and perform sentiment analysis on this online content. Sentiment mining looks for clusters of mood and emotion words in certain contexts. This has been shown to be useful, most notably when the Arab Spring was “predicted” (in hindsight) by analysing the sentiments of millions of bloggers, tweeters and others: a statistically significant spike was found in Middle East mood just before the Spring started.

Background

A stock ticker is a small program that stays open on your screen with real-time information on stock prices and the state of the various markets. It usually shows a graph or two and the price information you have set it to track. Traders use them constantly.

Data mining is any process whereby a program gleans useful information from large collections of data, in this case the textual content of the web. Sentiment mining is a particular kind of data mining in which psychological and linguistic techniques are used to infer human emotion from text on the web. It can be used to estimate things like “consumer confidence”, “willingness to spend”, “confidence in a certain stock” and “positivity about the future”.

So the ticker is the program that displays useful and concise information, and the mining is the means by which this information is obtained. One part of the system mines perpetually and sends the resulting data to every user’s ticker, where it is displayed for the user to take advantage of.

Leading hedge funds are using sentiment mining already. The ticker would democratise this process, because it would make sentiment-mined knowledge available for a price to any investor who wanted it, not just the pioneers and the elite rich enough to pay for a bespoke service.

Elaboration

We know that constant updates on actual market price levels are now seen as essential for financial decision makers. The sentiment ticker will enhance the knowledge that anyone in the industry has of what is happening in the world, in a given country, or in a given sector. Gauging the psychological element in financial processes has always been something leading analysts do, but the sentiment ticker will make a valuable extra insight into those processes available. The data can be sliced in a number of ways: by region, country, chosen time period or even down to an individual company. Knowing the world’s mood will become an essential metric in many areas. Macroeconomics will also benefit from the ticker, which will let investors and their associates be that much better informed about what is buzzing in the “planetary mind”.

The central sentiment server will be based on a set of Natural Language Processing algorithms, most of which are quite simple because they involve measuring the frequencies of emotion-indicator keywords. The implementers will expand and refine the system as the markets build the new service into their operations. A survey of human emotional modes will lead to isolating keywords and key phrases at first. Much of this work has already been started, but the process will no doubt give rise to new insights which will be added as the knowledge base of the ticker service provider develops.
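As a rough illustration of the keyword-frequency idea, here is a minimal sketch in Python. The two word lists and the function name `keyword_sentiment` are invented for this example; a real service would need a far larger, curated lexicon and would probably lean on NLTK for tokenising.

```python
import re
from collections import Counter

# Hypothetical emotion-indicator keywords, for illustration only.
POSITIVE = {"confidence", "optimistic", "growth", "rally", "hope"}
NEGATIVE = {"fear", "panic", "crisis", "doubt", "slump"}


def keyword_sentiment(text):
    """Return a score in [-1, 1] based on emotion-keyword frequencies."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    total = pos + neg
    if total == 0:
        return 0.0  # no emotion keywords found: treat as neutral
    return (pos - neg) / total
```

A text containing only positive keywords scores 1.0, one containing only negative keywords scores -1.0, and a text with no listed keywords scores 0.0.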

The programming work can be done mainly with open source tools such as Python and the NLTK, which form a leading platform for NLP work. It is recommended that the service be Windows-based until the means to hire Linux and Mac developers emerge. The cost of renting servers for the platform will be low, and services like Amazon’s EC2 are cheap and flexible enough to suit the problem. A spider will constantly collect text from the web and the language processing will run concurrently, so that the ticker can update at short intervals. Individual blocks of text will be ranked for importance using a spread of factors such as the author’s position in their company, the influence of the author and their employer, location and so on. The sentiment indices will be calculated using a weighting system that takes advantage of the rank given to each text.
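One simple way to read “a weighting system that takes advantage of the rank” is a weighted mean of per-text sentiment scores. The sketch below assumes each text already has a score in [-1, 1] and a non-negative rank. How the rank is derived from author position, influence and location is left open here, and the function name `sentiment_index` is made up for the example.

```python
def sentiment_index(scored_texts):
    """Weighted mean of sentiment scores.

    scored_texts: iterable of (score, rank) pairs, where score is in
    [-1, 1] and rank is a non-negative importance weight (an assumed
    representation, not a fixed design).
    """
    pairs = list(scored_texts)
    total_weight = sum(rank for _, rank in pairs)
    if total_weight == 0:
        return 0.0  # nothing to weigh: report a neutral index
    return sum(score * rank for score, rank in pairs) / total_weight
```

With this scheme a single highly ranked source can outweigh several minor ones: a positive text of rank 3 and a negative text of rank 1 give an index of 0.5.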

The widget will be a small, attractively designed desktop window that stays active in the background while other work is being done. At first there may be a number of packages offered on a subscription basis, perhaps starting with a bare-bones free service and building up to a platinum service with full functionality.

Spoofing

It is anticipated that many people will find it in their interests to try to manipulate the output of sentiment tickers by spoofing. This can be as simple as getting Facebook friends to say they believe a given stock will rise, or as complex as hiring armies of content authors to blog for you. It may even give birth to a whole industry, in the same way that web content farms try to spoof Google’s keyword mechanisms to earn pay-per-click money. This will likely lead to an arms race between sentiment developers and spammers, the exact unfolding of which will need to be observed and adapted to. The possibility does not invalidate the usefulness of the device, though, in the same way that keyword search on the web still stands in the face of spammers.

Development Issues

Initially it would seem sensible to let Google do a lot of the work for you. PageRank, Google’s index of the influence of a web page, will be a useful way to weight the importance of a sentiment source. Developers will start with a handful of stocks and get recent mentions of them by automatically querying Google News, then try to make the system “read” the content and develop a means of assessing confidence in the given stock from the text. Many developers are quite used to stripping data from unstructured text, and at first will just try to do it with keywords. Ambiguity will come into it a lot though, e.g.:

“…investor shows high confidence in Volkswagen…” : positive
“…investors are wrong to place high confidence in Volkswagen…” : negative
“…is an open question whether to place high confidence in Volkswagen…” : neutral

These three snippets might look the same to a system that couldn’t make very good judgements about context. So a way of discerning opinions has to be quite subtle, more so than just reading lots of text and trying to judge based on the occurrence of phrases. If you develop a way to make a good-enough system by testing on, say, 5 stocks, then scaling up to hundreds should not be substantially harder.
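To make the problem concrete, here is a small sketch of a purely phrase-based scorer applied to the three Volkswagen snippets. The function name `naive_confidence` is invented for the example.

```python
import re


def naive_confidence(text, phrase="high confidence"):
    """Count occurrences of a phrase, ignoring all surrounding context."""
    return len(re.findall(re.escape(phrase), text.lower()))


snippets = [
    "investor shows high confidence in Volkswagen",
    "investors are wrong to place high confidence in Volkswagen",
    "is an open question whether to place high confidence in Volkswagen",
]

# Every snippet contains the phrase exactly once, so all three get the
# same score even though their meanings differ.
scores = [naive_confidence(s) for s in snippets]
```

All three snippets score the same, which is exactly the failure described above.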

It is about looking for patterns in language, so developers will need to collect loads and loads of these patterns, with an interpretation for each. Existing mining techniques are developing to resolve this kind of linguistic problem and much progress has been made. When I was making the Essigned skills search we found that automation wasn’t very reliable because our system filled itself up with irrelevant words, so we hacked through thousands of words by hand, which was punishing work. Much of this kind of work has already been done in the sentiment mining community.

Language is very subtle, but with big data even getting a “buy” instruction right 51% of the time gives a one-percentage-point edge over random choice and might even get you a steady profit… This is what the pioneering hedge funds are already discovering.
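The arithmetic behind the 51% figure can be sketched as an expected value per trade. This assumes, purely for illustration, that a correct call gains one unit, a wrong call loses one unit, and there are no transaction costs. None of these assumptions comes from real trading data.

```python
def expected_return_per_trade(p_correct, gain=1.0, loss=1.0):
    """Expected profit per trade, in units of the stake.

    Assumes a correct call earns `gain` and a wrong call costs `loss`;
    fees and slippage are ignored (an illustrative simplification).
    """
    return p_correct * gain - (1.0 - p_correct) * loss


coin_flip = expected_return_per_trade(0.50)    # break-even
slight_edge = expected_return_per_trade(0.51)  # small positive expectation
```

Under these assumptions a coin flip breaks even, while a 51% hit rate earns about 0.02 units per trade on average. That only adds up to a steady profit over many trades, and only if costs do not eat the edge.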

The best way to proceed is to imagine that you had to solve this problem by hand, so you then read lots of financial news and observe how you are interpreting the text to make judgements. Then, when you see patterns emerge in what you do, develop a heuristic and then turn it into code and let the computer do the work for you. This is how sentiment mining is already moving forward.
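As a hedged sketch of what “develop a heuristic and then turn it into code” might look like, the example below encodes three hand-observed cues as ordered regular-expression rules. The cue patterns and the `classify` function are hypothetical, written only to handle phrasing like the Volkswagen snippets quoted earlier. A real rule set would be far larger and would need testing against real news text.

```python
import re

# Hypothetical cue patterns of the kind one might notice while reading
# by hand. Order matters: the first matching rule wins.
RULES = [
    (re.compile(r"\b(wrong|mistaken|foolish) to place\b.*\bconfidence\b"), "negative"),
    (re.compile(r"\bopen question\b.*\bconfidence\b"), "neutral"),
    (re.compile(r"\b(shows?|has|have)\b.*\bhigh confidence\b"), "positive"),
]


def classify(text):
    """Label a snippet using the first rule whose pattern matches."""
    lowered = text.lower()
    for pattern, label in RULES:
        if pattern.search(lowered):
            return label
    return "unknown"  # no heuristic covers this phrasing yet
```

Snippets that fall through to “unknown” are the ones to go back and read by hand, which is where the next heuristic comes from.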

The Sentiment Ticker – The Visionary Bit

The markets are a Chaotic Phenomenon (yes, caps). In the early days we knew that because of the Butterfly Effect there was a limit to the accuracy of our predictions. What an individual CEO ate for breakfast might critically alter future outcomes. Now we can move the fine granularity of our perceptions down one stage further because there is more data to trawl. The causal nexus of mass world opinion is a door that is starting to open with sentiment mining. The more honest data that people pour into the vastness of the web, the more of a handle we have on generating good predictions. Yes, the Butterfly Effect still holds, but there has been progress and refinement.

It is in the interests of the burgeoning global mind that people share personal data online. As an aside, the issues of online privacy and data misuse still remain, of course, and there has never been a clearer need for ethical practices, fair government and responsible use of data than now.

People still watch conventional media news, like ITN News or the BBC, because they want to drill down deeper into world situations for planning and foresight. A service that provides a real-time breakdown of sentiment across the planet is a global evolution of the very concept of news, taking us one stage nearer to the retreating goal of perfect information. To be truly “in tune” or to have “one’s ear to the ground”, investors need to surf the rising tsunami of information with better tools and finer analysis. Competitiveness accelerates too with greater and more timely information. The sentiment ticker provides this because it is a microscope into the previously invisible but nonetheless critical machinations and complex flows of world opinion and human sentiment.

Once one has “got” the concept, nothing less will satisfy the data omnivore. People around the world are desperate for any opportunity to scope deeper into the maelstrom of what we are doing as a species.

Markets will accelerate and diversify. Customer relations, sales, branding, indeed any enterprise dependent on rich data, will evolve irreversibly. Politics too will change as voter opinion becomes a 3D, graphically rich continuum into which we can gaze, and through which we can interactively test our ideas. These changes will rock our world-views, but it must also be stressed that they will demand greater responsibility, as increased knowledge always does.

The evolving economy places power in the hands of innovators and those who can exploit novelty, but the sentiment ticker will eventually democratise further too, because anyone with a network connection may achieve further knowledge for empowered decision making. Open information liberates intellectual capital into the ownership of humanity, and the species itself becomes wealthier. As Sentiment Developers we hope to facilitate this process while still profiting through a small appropriate return on the investment of our faith and vision in a revolutionary concept.


Written by Luke Dunn

December 6, 2011 at 1:53 pm

2 Responses


  1. Cool. I want one. So is there publicly available software to do something like this? I wouldn’t even need a spider to crawl the web. Just a program I could insert my own documents into, the ones I considered imp. (say, Federal Reserve announcements) to tell me sentiment from them….

    LRK

    March 5, 2012 at 9:56 pm

  2. Thanks for commenting, LRK. This post represents where I have got to with the project, which so far is just to agree that *there should be one*. Bear in mind that any such system would likely improve the more documents it saw; if you want to extract sentiment from a very small number of documents, human reading might still be a good option.

    I thought up the idea and did some early planning, but since then I have been struck with illness. I knew I didn’t have the energy to follow my idea to the end, so I just tried to describe what I had and set it free. Do share the article with anyone you feel is interested, because if it can get around and someone else picks up the ball, that will be a result for me.

    Trip Technician

    March 6, 2012 at 10:50 am

