Using Advanced Analyzers — choosing their
Parameter Values
Cutting your coat according to your cloth ...
All these rules of thumb assume a 1,000 point
data set and a 3.6 GHz Intel Pentium 4 with
1 GB RAM (SPECfp2000 ~ 1,900). If you do not have
a fast machine, or you need an answer in minutes
rather than hours, forget about using the
advanced methods. Sorry! It's just the way it is.
(Go to http://www.spec.org/ and get the floating
point benchmark for your machine; dividing 1,900
by that figure gives the adjustment factor for
timings relative to the test case.)
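
The adjustment can be written out as a small
calculation. This is only a sketch: the function
names are made up for illustration, and it assumes
the factor is simply multiplied into a reference
timing, which is how we read "adjustment factor"
here.

```python
# Benchmark score of the reference machine quoted in the text.
REFERENCE_SPECFP = 1900.0


def adjustment_factor(your_specfp):
    """Return 1,900 divided by your machine's floating point score.

    A value above 1 means your machine is slower than the
    reference machine; below 1 means it is faster.
    """
    if your_specfp <= 0:
        raise ValueError("benchmark score must be positive")
    return REFERENCE_SPECFP / your_specfp


def adjusted_hours(reference_hours, your_specfp):
    """Scale a timing quoted for the reference machine.

    Assumption (not stated outright in the text): the factor
    is applied by straight multiplication.
    """
    return reference_hours * adjustment_factor(your_specfp)


if __name__ == "__main__":
    # Hypothetical machine scoring 950, i.e. half the reference.
    print(adjustment_factor(950))    # 2.0
    print(adjusted_hours(3, 950))    # 6.0
```

So a study quoted at three hours would take
roughly six on a machine with half the benchmark
score.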
For all studies we suggest 5,000 runs MINIMUM and
at least 10,000 for serious work.
Ordinary random walks run in minutes.
- BPN analyzer
  - Number of inputs ~ 50-80
  - Number of hiddens ~ 120-200
  - 3 layer network, i.e.:
    - 1 input layer
    - 1 hidden layer
    - 1 output layer
We would describe this as, say, a 50-120-1
feedforward network.
Using a four layer network (e.g., 50-120-30-1)
MAY be better, but not by much and will take a
lot longer to train. More than four layers is
almost certainly a waste of time.
This kind of study can be done in a couple of
hours, say two to four.
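
To get a feel for why the extra layer costs
training time, you can count the adjustable
weights in each layout. The sketch below is
illustrative only: `count_parameters` is a made-up
helper, it assumes fully connected layers with one
bias per non-input node (the text does not say how
the BPN analyzer handles biases), and the weight
count is only a rough proxy for training cost.

```python
def count_parameters(layer_sizes, use_bias=True):
    """Count weights in a fully connected feedforward network.

    layer_sizes lists the node count of each layer, input first,
    e.g. [50, 120, 1] for a 50-120-1 network.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out      # connection weights
        if use_bias:
            total += n_out         # one bias per receiving node
    return total


three_layer = count_parameters([50, 120, 1])
four_layer = count_parameters([50, 120, 30, 1])

if __name__ == "__main__":
    print(three_layer)   # 6241
    print(four_layer)    # 9781
```

Under these assumptions the 50-120-30-1 network
has about half as many weights again as the
50-120-1 network.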
To filter or not to filter?
Filtering can help with speed of training and
accuracy (i.e., the error measures of the trained
network), but if you overdo it you will, instead
of uncovering the 'true signal', be introducing
spurious order into the problem: you will be
training your network to "see things which
aren't really there". This is bad enough, but if
you subsequently overtrain the network as well,
you will probably get some really crazy looking
results.
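
The "spurious order" effect is easy to demonstrate
on pure noise. The sketch below does not use the
program's own filters; as a stand-in it applies a
plain moving average (an assumption made purely
for illustration) to random numbers and compares
the lag-1 autocorrelation before and after.

```python
import random


def moving_average(values, window):
    """Simple moving average; returns len(values) - window + 1 points."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def lag1_autocorrelation(values):
    """Sample correlation between each point and its successor."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    cov = sum((values[i] - mean) * (values[i + 1] - mean)
              for i in range(n - 1))
    return cov / var


random.seed(42)
# 1,000 independent Gaussian draws: no structure to find by construction.
noise = [random.gauss(0.0, 1.0) for _ in range(1000)]
smoothed = moving_average(noise, window=10)

raw_ac = lag1_autocorrelation(noise)
smooth_ac = lag1_autocorrelation(smoothed)

if __name__ == "__main__":
    print("raw noise:", round(raw_ac, 3))
    print("smoothed :", round(smooth_ac, 3))
```

The raw series shows a lag-1 autocorrelation close
to zero, while the heavily smoothed one is
strongly positive, even though nothing but noise
went in. A network trained on the smoothed series
would be learning the filter, not the market.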
Experimentation will be necessary to find
suitable filters, so if you are in a hurry and
don't like "fiddling about" too much, restrict
yourself to the MCT algorithm and RAW input data.
(From a researcher's point of view, the interest
of the advanced filtering is in seeing how
wavelet transforms — a very exciting
mathematical technique — can be used in
stock market analysis; as far as I know this is
still pretty much virgin territory.)
- RNN analyzer
- SOM analyzer
- BCOR analyzer
Data Resolution and Sampling
1,000 points is about:
- 3 years of DAILY data
- 3 months of HOURLY data, and so on...
If you want to use, say, 5,500 data points, or
only 700 (I would use at least 500 as a minimum
dataset size), then the timings quoted above scale
linearly with the number of points.
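
As a worked example of the linear scaling, here is
a small sketch. `scale_timing` is a hypothetical
helper, and it takes the 1,000 point reference
size and the 500 point minimum at face value from
the text.

```python
REFERENCE_POINTS = 1000   # data set size the rules of thumb assume
MINIMUM_POINTS = 500      # smallest data set suggested in the text


def scale_timing(reference_hours, n_points):
    """Scale a timing quoted for 1,000 points to n_points, linearly."""
    if n_points < MINIMUM_POINTS:
        raise ValueError("fewer points than the suggested minimum")
    return reference_hours * n_points / REFERENCE_POINTS


if __name__ == "__main__":
    # A study quoted at two to four hours, rerun on 5,500 points.
    print(scale_timing(2, 5500), scale_timing(4, 5500))   # 11.0 22.0
```

So a study that takes two to four hours on 1,000
points becomes an overnight job at 5,500 points,
before any adjustment for the speed of your
machine.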
How far back you go in time with your input data
depends on how far forward you want to get with
your predictions, and on how far into the past
you consider the data to be relevant to the
current state. (Note that, unlike physical
systems, markets change their behaviour over
time; they have no universal laws which govern
them.) How much data you can use for input is
severely limited by your computational resources,
as stated above. Obviously, selecting your input
data (interval and sampling) is of prime
importance.
Data drawn at different samplings will have
different statistical properties, i.e., their
apparent randomness can vary, and this is to
your advantage. That is to say, using lower
resolution data (but not too low) may produce
better results than using very high resolution
data, which will tend to be very noisy indeed.
This is interesting from the point of view of the
small investor, who may feel disadvantaged at
being unable to afford a real-time, non-delayed,
high frequency, second by second datafeed; in
fact, having such a service would not necessarily
do them any good anyway.
In choosing a sampling interval suitable for our
advanced techniques we can use the histogram for
help; if you see 'fat tails' in the histogram for
your data at a particular resolution, it means
that there are significant correlations within
the data at this sampling level — which
means the advanced, and very processor hungry,
algorithms will have a reasonable chance of
working well.
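
If you would rather have a number than eyeball
the histogram, one common stand-in for "fat
tails" is the excess kurtosis of the returns at
each sampling interval. This is a sketch under
assumptions: the program itself shows a histogram,
not a kurtosis figure; `log_returns` and
`excess_kurtosis` are illustrative helpers; and
the test data here is synthetic, not market data.

```python
import math
import random


def log_returns(prices, step=1):
    """Log returns of a price series sampled at every step-th point."""
    sampled = prices[::step]
    return [math.log(b / a) for a, b in zip(sampled, sampled[1:])]


def excess_kurtosis(values):
    """Fourth standardised moment minus 3 (zero for a Gaussian).

    Clearly positive values indicate fatter tails than a Gaussian.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2) - 3.0


random.seed(0)
# Synthetic comparison: plain Gaussian draws versus a mixture in
# which one draw in ten comes from a much wider distribution.
gaussian = [random.gauss(0.0, 1.0) for _ in range(5000)]
fat_tailed = [random.gauss(0.0, 5.0 if random.random() < 0.1 else 1.0)
              for _ in range(5000)]

if __name__ == "__main__":
    print("gaussian  :", round(excess_kurtosis(gaussian), 2))
    print("fat tailed:", round(excess_kurtosis(fat_tailed), 2))
```

To compare resolutions on your own data, you
would call `log_returns(prices, step=k)` for a few
values of `k` and look for the intervals where the
excess kurtosis is clearly above zero.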
Copyright © 2006,2007 StockWave Software Ltd. All
Rights Reserved.