Completed New Features, Ongoing and Future Work
There are a number of things we are thinking of adding to our system -
Scripting Language
Typical usage of StockWave leads to some quite repetitive tasks - we can set up a batch job system for automating these. But we can go a step further than this and introduce a full-featured (i.e. Turing-complete) 'proper' programming language.
With this facility we can create 'cyber-traders' which can, for example, mine the stocks database for interesting stock picks, run analyses on those stocks, select trading opportunities, make trades and monitor the outcomes; furthermore we can incorporate learning into this stock-trading 'meta-algorithm', so that our automated trader can improve its own performance over time.
In view of this idea it is fascinating to look forward at the possible future of the market system itself; it is quite likely that in less than 10 years' time there will be no traders, no analysts and no fund managers any more, simply a collection of supercomputer AI systems run by the central banks, investment banks and hedge funds, all trading against each other at super high frequencies. (You can bet that a lot of money is being spent by these big players on super-secret behind-the-scenes technology to give themselves whatever small trading advantage can make them money; just don't expect them to tell you about it!)
If such a situation ever arises it will be interesting to see what happens - markets could become completely 'efficient', or we could see a total financial meltdown followed by economic collapse; Adam Smith or Karl Marx - take your pick!
Update Nov 2008: After considering various options we have decided to use Python as the programming interface for StockWave; a StockWave API module will be created which can be imported into the Python system. Note that Python is a full-featured programming language, with extensive libraries and a very easy syntax - it can be downloaded for free, as can high-quality programming tools like Eclipse; we believe this will give users who wish to develop their own custom systems the maximum flexibility possible. Apologies to those who had hoped we might provide an interface for Visual Basic, EasyLanguage, C# or something else - trust us, Python is better than any of these. Learn Python!
Nov 2010: Python support has been added, using SWIG.
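To give a flavour of the 'cyber-trader' idea, a script driven through the Python interface might look something like the sketch below; note that the stockwave module name and every function and attribute on it are hypothetical stand-ins for illustration, not the actual StockWave API.

    # Hypothetical sketch of a 'cyber-trader' driven through the Python
    # interface; the stockwave module and all names on it are invented
    # stand-ins, not the actual StockWave API.
    import stockwave as sw  # hypothetical module

    def run_cyber_trader(capital):
        # Mine the stocks database for candidate picks.
        picks = sw.scan_database(min_volume=100000)
        for ticker in picks:
            analysis = sw.analyse(ticker)  # e.g. a heatmap prediction
            if analysis.expected_return > 0.02 and analysis.confidence > 0.8:
                order = sw.place_trade(ticker, size=capital * 0.05)
                sw.monitor(order)          # track the outcome for learning

    if __name__ == "__main__":
        run_cyber_trader(capital=10000.0)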
Natural Language Processing
Having news analysis alongside the usual charting tools is quite unusual in a stockmarket program; news articles are aggregated into news events and assignments made of their MFI values. The MFI scale is a classification system for news events; the scores are assigned by comparing a candidate event against a list of known types - the final assignment of values is done by various ad hoc algorithms based around pattern matching on keywords and keyphrases. Having an accurate, by which we really mean 'good enough', MFI score allows us to examine share price response to news, and to do so quantitatively.
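For illustration only, a crude version of this kind of keyword-based assignment might look like the following sketch; the event phrases and scores are invented for the example and are not StockWave's actual classification tables.

    # Illustrative sketch of ad hoc MFI assignment by keyphrase matching.
    # The phrases and scores below are invented examples.
    MFI_PATTERNS = {
        "profit warning":  -3,
        "ceo resigns":     -2,
        "takeover bid":    +2,
        "record profits":  +3,
    }

    def assign_mfi(headline):
        text = headline.lower()
        # Return the score of the first matching pattern, else neutral.
        for phrase, score in MFI_PATTERNS.items():
            if phrase in text:
                return score
        return 0

    print(assign_mfi("Acme issues profit warning after weak quarter"))  # -3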
Because the MFI assignment algorithm doesn't really 'read the news' it sometimes gets things wrong; what we need to do is add a genuine natural language parsing engine which can more accurately judge the senses of the words in an article, and also account for the kind of thing humans find easy, but computers find hard, i.e. resolving ambiguity.
NLP techniques can also be used to sharpen up the Web Agent. When we go deep-digging for specific information about a company we start with the search engine query results - mostly we can find what we want, but sometimes, due to the nature of search engines, we miss high-relevance content because of keyword-rich commercial spam. Search engines are all about keywords, and commercial webmasters have become very skilled at producing pages which rank highly on these engines (some might say it is killing the art of writing); the engines don't have much understanding of related concepts or of context - although some seem to have added a thesaurus to their algorithms and use clustering to relate things together, it doesn't seem that good, and it makes things slower. NLP can help to sharpen our heuristics for what is relevant to our queries, helping us home in on specific issues - for example, legal cases where a company is the plaintiff; a naive search engine result would throw up a lot of cases where the company was the defendant, or documents relating to legal cases where the company was mentioned in passing but not involved in that particular case, plus a whole lot of other irrelevant material.
Jan 2011: The Web Agent now has access to a dictionary, a thesaurus and a part-of-speech tagger; with these we can build Concept Models from our raw textual extractions. Furthermore, we are looking to add an interface for simple "common sense" reasoning.
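As a toy example of the kind of building block a part-of-speech tagger provides (using the NLTK library here purely for illustration; StockWave's own components may differ), consider distinguishing the plaintiff from the defendant in a sentence:

    # Toy example: part-of-speech tagging as a building block for
    # judging roles in a sentence; NLTK is used purely for illustration.
    # Requires: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
    import nltk

    sentence = "Acme Corp filed a lawsuit against its former supplier."
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    print(tagged)
    # [('Acme', 'NNP'), ('Corp', 'NNP'), ('filed', 'VBD'), ...]
    # A simple heuristic: the proper-noun phrase before an active verb
    # like 'filed' is likely the plaintiff rather than the defendant.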
News Event Modelling
When you start reading the news archives you begin to get a feel for the kinds of things that happen in the financial world - the range and character of financial discourse, how one thing follows another, and so on. Common events of note include takeovers, mergers, resignations, lawsuits, strikes, new products, share buybacks, currency devaluations, interest rate changes, terrorism, wars, natural disasters and defaults - these are all the kind of 'surprises' which can cause jumps in the share price. What is more, there are often interrelationships between events - sometimes strongly causal, sometimes loose and probabilistic, sometimes absent altogether. In the financial world, that is to say, a subset of the human world, things do not normally happen in a vacuum; they follow on from previous events. For example: if profits are bad the share price might tank, then there is a shareholders' revolt at the AGM, at which the CEO gets forced out, and then the share price goes back up again; or, when the share price was low after the poor results, a hostile bid was launched by a competitor, which was referred to the competition regulators, simultaneously as the Bank of England raised interest rates ...
Our goal is to create a probabilistic event transition model, directly relating to real-world news events, which is calibrated specifically for our current company of interest, i.e. a detailed model for the kinds of news events we have referred to as 'trendbreakers' elsewhere in our documentation.
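A minimal sketch of what such a model amounts to, treated as a Markov chain over event types, is shown below; the event labels and probabilities are invented for illustration - a real model would be calibrated from the news archive for the company of interest.

    # Minimal sketch: a news-event transition model as a Markov chain.
    # Event labels and probabilities are invented for illustration.
    import random

    TRANSITIONS = {
        "poor_results":       {"shareholder_revolt": 0.4, "hostile_bid": 0.3, "no_event": 0.3},
        "shareholder_revolt": {"ceo_ousted": 0.6, "no_event": 0.4},
        "hostile_bid":        {"regulator_referral": 0.5, "bid_succeeds": 0.3, "no_event": 0.2},
    }

    def next_event(current):
        # Sample the next event from the current event's distribution.
        outcomes = TRANSITIONS.get(current, {"no_event": 1.0})
        r, total = random.random(), 0.0
        for event, p in outcomes.items():
            total += p
            if r < total:
                return event
        return "no_event"

    print(next_event("poor_results"))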
In the field of risk management it has long been acknowledged that there are two types of randomness to deal with - let's just call them 'nice' and 'nasty'; the nasty variety is what causes big jumps in the share price and leads to the 'fat tail' phenomenon (or leptokurtosis if you want to use the correct jargon); this is the type of phenomenon which gives financial managers fits and sleepless nights as they try to hedge their portfolios while generating acceptable returns. Quants try to deal with the nasty randomness using variations of their mathematical models - 'jump diffusion', 'Levy processes' - but these models are crude.
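To make the 'nice'/'nasty' distinction concrete, here is a sketch in the spirit of Merton's jump-diffusion model: ordinary Gaussian diffusion plus occasional Poisson-driven jumps, which is exactly what produces the fat tails; all parameter values are invented for illustration.

    # Sketch of Merton-style jump diffusion: 'nice' Gaussian diffusion
    # plus 'nasty' Poisson-driven jumps. Parameter values are invented.
    import math, random

    def jump_diffusion_path(s0=100.0, mu=0.05, sigma=0.2,
                            jump_rate=0.5, jump_mean=-0.05, jump_std=0.1,
                            dt=1/252, steps=252):
        prices = [s0]
        for _ in range(steps):
            # The 'nice' part: standard lognormal diffusion increment.
            diffusion = (mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * random.gauss(0, 1)
            # The 'nasty' part: a rare jump, Poisson arrival, random size.
            jump = 0.0
            if random.random() < jump_rate * dt:
                jump = random.gauss(jump_mean, jump_std)
            prices.append(prices[-1] * math.exp(diffusion + jump))
        return prices

    print(jump_diffusion_path()[-1])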
Feb 2011: Basic transition models have been created; these can be visualised in 3D.
Data Fusion
Data fusion is the act of taking different data, possibly of different types, origins and formats, and squeezing it all together to produce a collective, overall inference; you use all the data to produce a single number which tells you what you want to know, and consequently, the right thing to do.
When you have a system which has share price timeseries, news event archives and company fundamentals, obviously you would like to somehow aggregate all this data and express it as a unified belief about the future share price; we have developed two approaches to achieving this, but of course we cannot get started until we have all the elements in place.
Currently in the field of financial data analysis, when something happens that we don't know how to deal with within the methodology of our model we label it as an 'externality'/'outlier'/'exogenous' ... then proceed to ignore it! But this isn't good enough - we need to take all the data, even different types of data, process it, and express it as a single probabilistic prediction - a heatmap with all relevant information factored into it.
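One textbook way to fuse beliefs of this kind (not necessarily either of our two approaches) is precision-weighted combination of independent Gaussian estimates; in the sketch below, the source estimates and variances are invented for illustration.

    # Minimal sketch of data fusion: combine independent Gaussian beliefs
    # about tomorrow's return by precision weighting (precision = 1/variance).
    # The source estimates and variances are invented for illustration.
    def fuse(beliefs):
        total_precision = sum(1.0 / v for _, v in beliefs)
        fused_mean = sum(m / v for m, v in beliefs) / total_precision
        return fused_mean, 1.0 / total_precision

    sources = [
        (0.010, 0.0004),   # timeseries model
        (-0.005, 0.0009),  # news-event model
        (0.002, 0.0025),   # fundamentals model
    ]
    mean, var = fuse(sources)
    print(f"fused belief: {mean:.4f} +/- {var**0.5:.4f}")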
Improved Neural Networks for Time Series
Prediction
Neural nets are great at learning from data, but
finding a suitable net can be difficult, and if you
want to find the very best for the job, almost
impossible. StockWave includes networks with 3
different architectures, each of which can be
further parametrised by the number of nodes and/or
hidden layers they use.
Having a choice of the common architectures hopefully covers most of the bases for our practical application, but it is entirely possible that there is some wild and wacky architecture, completely different from the usual types, which is best for your problem; actually discovering this specific configuration is less likely than finding a specific grain of sand on Bondi beach.
Building your own specialised, custom-job neural networks to solve a specific problem is tedious in the extreme - if you want to try it out, then download the Stuttgart Neural Network Simulator and play around with it - see how long your patience lasts.
One approach to the structural optimisation problem is to resort to genetic algorithms - the classic solution to "global search when we don't have a clue"; a prototype neurogenetic analyzer has been developed but is much too slow for the mass market - you really need a multiprocessor system for it to become viable. It is interesting to note that the most fruitful efforts in doing this kind of thing have relied on customised hardware, mostly clusters of FPGA-based systems (i.e. racks of specialised computer chips) - and before you ask, the answer is NO - you cannot buy this kind of kit down at PC World for £399. (Or at 10 or 100 times this price - the Pentagon might object!)
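In outline, a neurogenetic search of this kind mutates and selects over encoded architectures; the skeleton below uses a stand-in fitness function for illustration - the real fitness would be the validation error of a network trained with the encoded layout, which is exactly why the search is so slow.

    # Skeleton of a genetic search over network architectures. The
    # fitness function is a stand-in; in reality it would be the
    # validation error of a trained network with this layout.
    import random

    def random_architecture():
        # Encode an architecture as a list of hidden-layer sizes.
        return [random.randint(2, 32) for _ in range(random.randint(1, 3))]

    def mutate(arch):
        arch = list(arch)
        i = random.randrange(len(arch))
        arch[i] = max(2, arch[i] + random.choice([-4, -2, 2, 4]))
        return arch

    def fitness(arch):
        # Stand-in: prefer smaller nets; real version trains and validates.
        return -sum(arch)

    def evolve(generations=50, pop_size=20):
        population = [random_architecture() for _ in range(pop_size)]
        for _ in range(generations):
            population.sort(key=fitness, reverse=True)
            survivors = population[: pop_size // 2]
            population = survivors + [mutate(random.choice(survivors))
                                      for _ in range(pop_size - len(survivors))]
        return max(population, key=fitness)

    print(evolve())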
We have another approach to this problem we call the SSONN - the structurally self-optimising neural network, but this is at a very early stage of development and cannot be guaranteed to bear fruit. (It's interesting, though ...)
HFS
This is a variation of the Bayesian Correlator
analyzer, another timeseries technique.
Company Fundamentals Database
We haven't really touched on the matter of fundamentals so far - not because we don't think it's important; we don't believe in throwing away any potentially useful information just because it doesn't fit in with some preconceived notions about what matters and what doesn't. So although, from moment to moment, it is the share price itself, and the current market sentiment from recent news events, that seems to have the most effect on the evolution of the share price, we do believe fundamentals can give us something, if we approach them in the right way.
Currently investors look at key ratios of fundamentals to decide whether a company is 'good' or 'bad' with respect to some pre-defined stock-picking criterion, buy the shares if they seem cheap, and then hold on to them for the long term, arguing that eventually the company's good fundamentals will be reflected in the share price.
But individually, key ratios may not mean very much (even when not being subject to gross and fraudulent manipulation) - to get anything from fundamentals we have to look at them collectively, for a sample of companies. The fashionable buzzword to use in this context is data-mining - there is a mature variety of algorithms for doing this (clustering, k-means, self-organizing maps, decision trees, maximum entropy).
What this approach gives you is the ability to identify, should they exist, the 'signals', i.e. defining characteristics of good and bad companies; with such a filter one can then go on to classify hitherto little-known companies to see if they have the right stuff, whatever it happens to be. (Note that this 'right stuff' may be a lot more complicated than anything which can be reasonably described by classical stock-picking rules.)
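As a sketch of the clustering idea, the snippet below groups companies by key ratios with k-means; scikit-learn is used purely for illustration, and the ratio values are invented.

    # Sketch of clustering companies by key ratios with k-means;
    # scikit-learn is used for illustration and the values are invented.
    import numpy as np
    from sklearn.cluster import KMeans

    # Columns: P/E ratio, debt/equity, return on equity.
    ratios = np.array([
        [12.0, 0.4, 0.15],
        [35.0, 2.1, 0.02],
        [ 9.5, 0.3, 0.18],
        [40.0, 1.8, 0.01],
    ])
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(ratios)
    print(labels)  # e.g. [0 1 0 1] - candidate 'good' vs 'bad' clusters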
We know exactly what we want to do here - we have a good design, but the main problem lies in gathering the data to begin with. Again, as you might expect, selling investors access to proprietary databases of company fundamentals is a lucrative business (see e.g. Hemscott, Dun & Bradstreet) - you cannot get such data for free, but what you can do is gather it from open sources which are common and easily accessible on the internet. The problem lies in taking unstructured, often very messy, raw data downloaded from web sources and processing it into the kind of structured formats which you need for a 'proper' database; although a level of automation is possible, some human intervention is often required, which is very slow. Since we need at least several hundred, and more likely several thousand, company data files for the database to be viable, you can imagine the problems; the database also needs to be updated at least every quarter.
Once gathered into a proper database and allied with our data-mining algorithms, we can potentially do a great deal with our data; still, finding genuinely useful information can be tricky. The fact is that interacting with databases currently is, well ... a bit rubbish, and although there are some very expensive visualisation packages which allow you to graphically 'browse' your data, mostly these do little more than the usual tedious barcharts/piecharts/scatterplots etc, or they let you do some fairly baffling and rather pointless things like plotting arbitrary quantities along the xyz axes and representing various other data by coloured shapes and so forth; we, on the other hand, are developing a novel and informative visualisation which uses advanced graphics to uncover hidden relationships within the company data.
Aug 2009: This is a nice idea, but fundamentals are so often compromised that it is hard to see how this could be anything other than a source of noise; prices are real, as are news stories, with very little scope for interpretation.
Nov 2010: What could be viable is to use a sifting of fundamentals as a basic stock scanner.
Mar 2011: A 3D browser for all the companies has been added; it gives the ability to watch/scan the whole market - it can also "sift" companies based on financial ratios, or execute queries on a database; data-mining/clustering algorithms are available.
Data Feed Support
With StockWave you get streaming intra-day data for
free, with no subscriptions, but because we rely on
free sources, this data has a (small) delay on it.
Streaming realtime data is expensive and will remain so; it is highly unlikely that anyone will ever give it away for free - in the financial world the culture is that one must "pay through the nose for everything", so this is unlikely to change.
For some users delayed data is unacceptable - we quite understand this; currently, with some coaxing and the crafting of an appropriate configuration file, we can actually take prices from an ASCII (i.e. simple text) file, which may be an adequate solution if your current datafeed can export data in such a way.
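For example, a datafeed export in a simple comma-separated text format could be read with something like the sketch below; the date,open,high,low,close,volume column layout is an assumed example, not StockWave's required format.

    # Example of reading prices from a simple ASCII (text) file; the
    # date,open,high,low,close,volume layout is an assumed example.
    import csv

    def read_prices(path):
        rows = []
        with open(path, newline="") as f:
            for date, o, h, low, c, vol in csv.reader(f):
                rows.append((date, float(o), float(h), float(low),
                             float(c), int(vol)))
        return rows

    # e.g. a line in prices.txt: 2007-04-02,101.5,103.0,100.9,102.4,54300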
But again, you may need more than this, and wish for direct support for e.g. eSignal, Reuters DataLink, Bloomberg, MarketEye, etc.
Update April 2007: We are going to add the OpenTick datafeed.
2009: OpenTick have packed it in! - and their parent, Xasax, has too. The datafeed issue will be supported on an ad hoc basis - there is far too much variation among feeds anyway. Call us with your requirements; in time we will cover the common feeds which publish a public API for their systems; really high-spec or low-latency feeds will be a custom job.
Nov 2010: We can support some feeds through the Python interface.
Mac and Linux Users
StockWave is a Windows-only program, but we would like to support Mac and Linux users; although StockWave may work under some emulation programs, a true cross-platform solution, e.g. using Qt, would be preferable.
The introduction of the new version of Windows, Vista, adds further complication to the matter.
Update April 2007: It seems we don't need to bother with this - Boot Camp from Apple and the WINE project for Linux are compatible with StockWave.