Completed New Features, Ongoing and Future Work

There are a number of things we are thinking of adding to our system -

Scripting Language

Typical usage of StockWave leads to some quite repetitive tasks - we can set up a batch job system for automating these. But we can go a step further and introduce a full-featured (i.e. Turing-complete) 'proper' programming language.

With this facility we can create 'cyber-traders' which can, e.g., mine the stocks database for interesting stock picks, do analyses for these stocks, select trading opportunities, make trades and monitor the outcomes; furthermore, we can incorporate learning into this stock-trading 'meta-algorithm', so that our automated trader can improve its own performance over time.

In view of this idea it is fascinating to look forward at the possible future of the market system itself; it is quite likely that in less than 10 years' time there will be no traders, no analysts and no fund managers anymore, simply a collection of supercomputer AI systems run by the central banks, investment banks and hedge funds, all trading against each other at super high frequencies. (You can bet that a lot of money is being spent by these big players on super-secret behind-the-scenes technology to give themselves whatever small trading advantage can make them money; just don't expect them to tell you about it!)

If such a situation ever arises it will be interesting to see what happens - markets could become completely 'efficient', or we could see a total financial meltdown, followed by economic collapse; Adam Smith or Karl Marx - take your pick!

Update Nov 2008: After considering various options we have decided to use Python as the programming interface for StockWave; a StockWave API module will be created which can be imported into the Python system. Note that Python is a full-featured programming language, with extensive libraries and a very easy syntax - it can be downloaded for free, as can high quality programming tools like Eclipse; we believe this will give users who feel the need to develop their own custom systems the maximum flexibility possible. Apologies to those who had hoped we might provide an interface with Visual Basic, EasyLanguage, C# or something else - trust us, Python is better than any of these. Learn Python!

Nov 2010: Python has been added using SWIG.
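A hypothetical sketch of what such a scripted 'cyber-trader' loop might look like in Python. Every function below is a stub defined in the snippet itself; the names and the 0.6 threshold are illustrative stand-ins, not the real SWIG-generated StockWave API.

```python
# Hypothetical cyber-trader loop: scan, analyse, filter, trade.
def scan_database():                 # stub: mine the database for candidates
    return ["VOD.L", "BP.L"]

def analyse(ticker):                 # stub: run an analysis, return a score
    return {"VOD.L": 0.8, "BP.L": 0.4}[ticker]

def place_trade(ticker):             # stub: would submit an order
    return "bought " + ticker

THRESHOLD = 0.6                      # illustrative trading cutoff

picks = [t for t in scan_database() if analyse(t) > THRESHOLD]
orders = [place_trade(t) for t in picks]
print(orders)
```

The learning element would then adjust the scoring and the threshold over time based on the monitored outcomes of past trades.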

Natural Language Processing

Having news analysis alongside the usual charting tools is quite unusual in a stockmarket program; news articles are aggregated into news events, which are then assigned MFI values. The MFI scale is a classification system for news events; scores are assigned by comparing a candidate event against a list of known types - the final assignment of values is done by various ad hoc algorithms based on pattern matching over keywords and keyphrases. Having an accurate, by which we really mean 'good enough', MFI score allows us to examine share price response to news, and to do so quantitatively.
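The keyword-matching style of assignment described above can be sketched in a few lines. The phrases and scores here are invented for illustration; they are not StockWave's actual MFI tables or scale.

```python
# Minimal sketch of ad hoc keyword-based event scoring.
# Phrases and scores are illustrative assumptions only.
MFI_PATTERNS = {
    "profit warning": -3,
    "takeover bid": 2,
    "ceo resigns": -2,
    "record profits": 2,
    "share buyback": 1,
}

def mfi_score(headline):
    """Sum the scores of all known phrases found in a headline."""
    text = headline.lower()
    return sum(score for phrase, score in MFI_PATTERNS.items() if phrase in text)

print(mfi_score("Acme issues profit warning as CEO resigns"))  # -5
```

This is exactly the kind of shallow matching that works most of the time but 'gets things wrong' when wording is unusual, which motivates the NLP work below.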

Because the MFI assignment algorithm doesn't really 'read the news' it sometimes gets things wrong; what we need to do is add a genuine natural language parsing engine which can more accurately judge the senses of the words in an article, and also account for the kind of thing humans find easy but computers find hard, i.e. resolving ambiguity.

NLP techniques can also be used to sharpen up the Web Agent. When we go deep-digging for specific information about a company we start with the search engine query results - mostly we can find what we want, but sometimes, due to the nature of search engines, we miss out on highly relevant content because of keyword-rich commercial spam. Search engines are all about keywords, and commercial webmasters have become very skilled at producing pages which rank highly on these engines (- some might say it is killing the art of writing); the engines don't have much understanding of related concepts or of context - although some seem to have added a thesaurus to their algorithms and use clustering to relate things together, it doesn't seem that good, and it makes things slower. NLP can help sharpen our heuristics for what is relevant to our queries, helping us home in on specific issues - for example, legal cases where a company is the plaintiff; a naive search engine result would throw up a lot of cases where the company was the defendant, documents relating to legal cases where the company was mentioned in passing but not actually involved, plus a whole lot of other irrelevant material.
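The plaintiff/defendant example can be sketched with a toy rule based on word order around the company name, rather than keyword presence alone. A real NLP engine would parse the sentence properly; this illustrative regex version (all patterns are assumptions) only hints at the idea.

```python
import re

# Toy role detector: is the company doing the suing, or being sued?
def legal_role(sentence, company):
    s, c = sentence.lower(), re.escape(company.lower())
    # Company name appears before the suing verb -> likely plaintiff.
    if re.search(rf"{c}\b.*\b(sues|is suing|filed suit against)", s):
        return "plaintiff"
    # Suing verb appears before the company name -> likely defendant.
    if re.search(rf"\b(sues|sued|filed suit against)\b.*\b{c}", s):
        return "defendant"
    return "unknown"

print(legal_role("MegaCorp sues rival over patents", "MegaCorp"))
print(legal_role("Regulators sued MegaCorp last year", "MegaCorp"))
```

Even this crude ordering heuristic beats pure keyword search for the plaintiff query, which is the point: relevance needs structure, not just keywords.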

Jan 2011: The Web Agent now has access to a dictionary, a thesaurus and a part-of-speech tagger; with these we can build Concept Models from our raw textual extractions. We are also looking to add an interface for simple "common sense" reasoning.

News Event Modelling

When you start reading the news archives you begin to get a feel for the kinds of things that happen in the financial world - the range and character of financial discourse, how one thing follows another, and so on. Common events of note include takeovers, mergers, resignations, lawsuits, strikes, new products, share buybacks, currency devaluations, interest rate changes, terrorism, wars, natural disasters and defaults - these are all the kind of 'surprises' which can cause jumps in the share price. What is more, there are often interrelationships between events - sometimes strongly causal, sometimes loose and probabilistic, sometimes absent altogether. In the financial world, that is to say, a subset of the human world, things do not normally happen in a vacuum; they follow on from previous events. For example: if profits are bad the share price might tank, then there is a shareholders' revolt at the AGM, at which the CEO gets forced out, and the share price goes back up again; or, while the share price was low after the poor results, a hostile bid was launched by a competitor, which was referred to the competition regulators, just as the Bank of England raised interest rates ...

Our goal is to create a probabilistic event transition model directly relating to real-world news events which is calibrated specifically for our current company of interest, i.e. a detailed model for the kinds of news events we have referred to as 'trendbreakers' elsewhere in our documentation.
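A first cut at such a model is a Markov chain over event types, echoing the poor-results chain above. The states and probabilities here are invented for illustration; they are not calibrated 'trendbreaker' figures.

```python
import random

# Toy first-order event-transition model over news-event types.
TRANSITIONS = {
    "poor_results":       {"ceo_ousted": 0.3, "hostile_bid": 0.3, "no_event": 0.4},
    "ceo_ousted":         {"share_recovery": 0.6, "no_event": 0.4},
    "hostile_bid":        {"regulator_referral": 0.5, "takeover": 0.3, "no_event": 0.2},
    "regulator_referral": {"no_event": 1.0},
    "share_recovery":     {"no_event": 1.0},
    "takeover":           {"no_event": 1.0},
    "no_event":           {"no_event": 1.0},
}

def sample_path(start, steps, rng):
    """Sample a sequence of events from the transition table."""
    path = [start]
    for _ in range(steps):
        r, acc = rng.random(), 0.0
        for event, p in TRANSITIONS[path[-1]].items():
            acc += p
            if r < acc:
                path.append(event)
                break
        else:                         # guard against float round-off
            path.append(event)
    return path

print(sample_path("poor_results", 3, random.Random(42)))
```

Calibrating the table per company, from the news archive, is the real work; conditioning on richer histories than the last event is the obvious next refinement.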

In the field of risk management it has long been acknowledged that there are two types of randomness to deal with - let's just call them 'nice' and 'nasty'; the nasty variety is what causes big jumps in the share price and leads to the 'fat tail' phenomenon (- or leptokurtosis if you want to use the correct jargon); this is the type of phenomenon which gives financial managers fits and sleepless nights as they try to hedge their portfolios while generating acceptable returns. Quants try to deal with the nasty randomness using variations of their mathematical models - 'jump diffusion', 'Levy processes' - but these models are crude.
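The nice/nasty distinction is easy to demonstrate: mix rare jumps into plain Gaussian returns and the excess kurtosis (the fat tails) appears. All parameters below are arbitrary illustrative values, not fitted to any market.

```python
import random
import statistics

def simulate_returns(n, jump_prob, jump_scale, rng):
    out = []
    for _ in range(n):
        r = rng.gauss(0.0, 0.01)          # everyday 'nice' diffusion noise
        if rng.random() < jump_prob:      # rare 'nasty' jump
            r += rng.gauss(0.0, jump_scale)
        out.append(r)
    return out

def excess_kurtosis(xs):
    """Excess kurtosis: 0 for a Gaussian, large and positive for fat tails."""
    m, s = statistics.fmean(xs), statistics.pstdev(xs)
    return sum(((x - m) / s) ** 4 for x in xs) / len(xs) - 3.0

rng = random.Random(1)
plain = simulate_returns(20000, 0.0, 0.0, rng)     # pure diffusion
jumpy = simulate_returns(20000, 0.02, 0.10, rng)   # 2% chance of a big jump
print(excess_kurtosis(plain), excess_kurtosis(jumpy))
```

The jumpy series shows an excess kurtosis orders of magnitude above the plain one, which is exactly the effect that breaks Gaussian-based hedging models.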

Feb 2011: Basic transition models have been created; these can be visualised in 3D.

Data Fusion

Data fusion is the act of taking different data - of possibly different types, origins and formats - and squeezing it all together to produce a collective, overall inference; you use all the data to produce a single number which tells you what you want to know and, consequently, the right thing to do.

When you have a system which has share price timeseries, news event archives and company fundamentals, obviously you would like to somehow aggregate all this data together and express it as a unified belief about the future share price; we have developed two approaches to achieving this, but of course we cannot get started until we have all the elements in place.

Currently in the field of financial data analysis, when something happens that we don't know how to deal with within the methodology of our model, we label it an 'externality', an 'outlier', 'exogenous', ... and then proceed to ignore it! But this isn't good enough - we need to take all the data, even different types of data, process it, and express it as a single probabilistic prediction - a heatmap with all relevant information factored into it.
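One simple fusion rule - not necessarily the approach StockWave will use - is to combine independent probabilistic signals in log-odds space. The three input probabilities below (price model, news model, fundamentals model) are invented for illustration.

```python
import math

def fuse(probabilities):
    """Naive-Bayes style fusion assuming independent signals and a 0.5 prior."""
    log_odds = sum(math.log(p / (1.0 - p)) for p in probabilities)
    return 1.0 / (1.0 + math.exp(-log_odds))

signals = [0.60, 0.70, 0.55]   # price, news, fundamentals (illustrative)
print(round(fuse(signals), 3))
```

Three mildly bullish signals fuse into a belief stronger than any one of them - the payoff of using all the data instead of discarding the awkward parts.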

Improved Neural Networks for Time Series Prediction

Neural nets are great at learning from data, but finding a suitable net can be difficult, and if you want to find the very best for the job, almost impossible. StockWave includes networks with 3 different architectures, each of which can be further parametrised by the number of nodes and/or hidden layers they use.

Having a choice of the common architectures hopefully covers most of the bases for our practical application, but it is entirely possible that there is some wild and wacky architecture, completely different from the usual types, which is best for your problem; actually discovering this specific configuration is less likely than finding a specific grain of sand on Bondi Beach.

Building your own specialised, custom-job neural networks to solve a specific problem is tedious in the extreme - if you want to try it out, then download the Stuttgart Neural Network Simulator and play around with it - see how long your patience lasts.

One approach to the structural optimisation problem is to resort to genetic algorithms - the classic solution to "global search when we don't have a clue"; a prototype neurogenetic analyzer has been developed but is much too slow for the mass market - you really need a multiprocessor system for it to become viable. It is interesting to note that the most fruitful efforts in doing this kind of thing have relied on customised hardware, mostly clusters of FPGA-based systems (- i.e. racks of specialised computer chips) - and before you ask, the answer is NO - you cannot buy this kind of kit down at PC World for £399. (Or at 10 or 100 times this price - the Pentagon might object!)
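The genetic-algorithm mechanics can be sketched with each genome being a tuple of hidden-layer sizes. The fitness function here is a cheap stand-in - in a real neurogenetic analyzer it would be the (expensive) validation error of a fully trained network, which is exactly why the approach is too slow without parallel hardware.

```python
import random

def fitness(genome):
    capacity = sum(genome)
    return -abs(capacity - 48) - 2 * len(genome)  # stand-in objective

def mutate(genome, rng):
    g = list(genome)
    i = rng.randrange(len(g))
    g[i] = max(1, g[i] + rng.choice([-4, 4]))     # nudge one layer size
    return tuple(g)

def evolve(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [tuple(rng.randint(1, 64) for _ in range(2)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]           # elitist selection
        pop = survivors + [mutate(rng.choice(survivors), rng)
                           for _ in survivors]
    return max(pop, key=fitness)

best = evolve()
print(best, fitness(best))
```

Swap the stand-in objective for "train this network, measure validation error" and each fitness evaluation becomes minutes or hours of compute - hence the appeal of FPGA clusters.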

We have another approach to this problem we call the SSONN - the structurally self-optimising neural network, but this is at a very early stage of development and cannot be guaranteed to bear fruit. (It's interesting, though ...)


This is a variation of the Bayesian Correlator analyzer, another timeseries technique.

Company Fundamentals Database

We haven't really touched on the matter of fundamentals so far - not because we don't think it's important; we don't believe in throwing away any potentially useful information just because it doesn't fit in with some preconceived notions about what matters and what doesn't. Although from moment to moment it is the share price itself, and the current market sentiment from recent news events, that seem to have the most effect on the evolution of the share price, we do believe fundamentals can give us something, if we approach them in the right way.

Currently investors look at key ratios of fundamentals to decide whether a company is 'good' or 'bad' with respect to some pre-defined stock-picking criterion, buy the shares if they seem cheap, and then hold on to them for the long term, arguing that eventually the company's good fundamentals will be reflected in the share price.

But individually, key ratios may not mean very much (- even when not subject to gross and fraudulent manipulation) - to get anything from fundamentals we have to look at them collectively, across a sample of companies. The fashionable buzzword in this context is data-mining - there is a mature variety of algorithms for doing this (- clustering, k-means, self-organizing maps, decision trees, maximum entropy).

What this approach gives you is the ability to identify, should they exist, the 'signals', i.e. defining characteristics of good and bad companies; with such a filter one can then go on to classify hitherto little-known companies to see if they have the right stuff, whatever it happens to be. (Note that this 'right stuff' may be a lot more complicated than anything which can reasonably be described by classical stock-picking rules.)

We know exactly what we want to do here - we have a good design, but the main problem lies in gathering the data to begin with. Again, as you might expect, selling investors access to proprietary databases of company fundamentals is a lucrative business (- see e.g. Hemscott, Dun & Bradstreet) - you cannot get such data for free, but what you can do is gather it from open sources which are common and easily accessible on the internet. The problem lies in taking unstructured, often very messy, raw data downloaded from web sources and processing it into the kind of structured format you need for a 'proper' database; although a level of automation is possible, some human intervention is often required, which is very slow. Since we need at least several hundred, and more likely several thousand, company data files for the database to be viable, you can imagine the problems; the database also needs to be updated at least every quarter.

Once gathered into a proper database and allied with our data-mining algorithms, we can potentially do a great deal with our data. Still, finding genuinely useful information can be tricky; the fact is that interacting with databases currently is, well, ... a bit rubbish. Although there are some very expensive visualisation packages which allow you to graphically 'browse' your data, mostly these do little more than the usual tedious barcharts/piecharts/scatterplots, or they let you do some fairly baffling and rather pointless things like plotting arbitrary quantities along the xyz axes and representing other data with coloured shapes and so forth. We, on the other hand, are developing a novel and informative visualisation which uses advanced graphics to uncover hidden relationships within the company data.

Aug 2009 - this is a nice idea, but fundamentals are so often compromised that it is hard to see how this could be anything other than a source of noise; prices are real, as are news stories, with very little scope for interpretation.

Nov 2010: What could be viable is using fundamentals sifting as a basic stock scanner.

Mar 2011: A 3D browser for all the companies has been added; it gives the ability to watch/scan the whole market - we can also "sift" companies based on financial ratios, or execute queries on a database; datamining/clustering algorithms are available.

Data Feed Support

With StockWave you get streaming intra-day data for free, with no subscriptions, but because we rely on free sources this data has a (small) delay on it. Streaming realtime data is expensive and will remain so - it is highly unlikely that anyone will ever give it away for free; in the financial world the culture is that one must "pay through the nose for everything".

For some users delayed data is unacceptable - we quite understand this; currently, with some coaxing and the crafting of an appropriate configuration file, we can actually take prices from an ASCII (i.e. simple text) file, which may be an adequate solution if your current datafeed can export data in such a way.
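Importing from such a text file amounts to parsing CSV-style rows into typed price bars. The column layout below is an assumption for illustration; the actual StockWave configuration file may map columns differently.

```python
import csv
import io

# Illustrative sample of an exported price file (layout is assumed).
SAMPLE = """date,open,high,low,close,volume
2011-03-01,101.5,103.0,100.8,102.2,15000
2011-03-02,102.2,104.1,101.9,103.7,18000
"""

def load_prices(text):
    """Parse CSV text into a list of typed price bars."""
    bars = []
    for row in csv.DictReader(io.StringIO(text)):
        bars.append({
            "date": row["date"],
            "open": float(row["open"]),
            "high": float(row["high"]),
            "low": float(row["low"]),
            "close": float(row["close"]),
            "volume": int(row["volume"]),
        })
    return bars

bars = load_prices(SAMPLE)
print(len(bars), bars[-1]["close"])
```

If your feed exports a different column order or separator, the configuration file is what tells the importer how to remap it.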

But again, you may need more than this, and wish for direct support for e.g. eSignal, Reuters DataLink, Bloomberg, MarketEye, etc.

Update April 2007 - we are going to add the OpenTick datafeed.

2009 - OpenTick have packed it in! - and their parent, Xasax, has too. Datafeeds will be supported on an ad hoc basis - there is far too much variation among them anyway. Call us with your requirements; in time we will cover the common feeds which publish a public API for their systems; really high-spec or low-latency feeds will be a custom job.

Nov 2010 we can support some feeds through the Python interface.

Mac and Linux Users

StockWave is a Windows-only program, but we would like to support Mac and Linux users; although StockWave may work under some emulation programs, a true cross-platform solution, e.g. using Qt, would be preferable.

The introduction of the new version of Windows - Vista - adds further complication to the matter.

Update April 2007 - it seems we don't need to bother with this - BootCamp from Apple and the WINE project for Linux are compatible with StockWave.