Multi-Layered AI-Powered Neural Network System

Neural networks are systems composed of units, each of which represents a mathematical expression. The whole system is designed for a predetermined number of numerical or Boolean inputs and outputs. The units, whose design is inspired by biological neurons, are connected to the inputs and outputs directly or indirectly, and each connection carries a weight coefficient to be multiplied by the related neuron's value.

Neural networks consist of layers of neurons, as seen in Figure 1. Each neuron in a layer is called a hidden neuron; it is connected to each input with a weight coefficient w and has a bias b. Each neuron applies a predetermined transfer function to its input, commonly a sigmoid function.
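The computation of one such layer can be sketched in a few lines; a minimal illustration (function and variable names assumed), with the sigmoid transfer function mentioned above:

```python
import math

def layer_forward(inputs, weights, biases):
    """Compute one hidden layer: for each neuron, a weighted sum of the
    inputs plus a bias b, passed through a sigmoid transfer function."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid
    return outputs

# Two inputs feeding three hidden neurons.
x = [0.5, -1.0]
W = [[0.2, 0.8], [-0.4, 0.1], [0.7, 0.3]]  # one row of weights per neuron
b = [0.0, 0.1, -0.2]
print(layer_forward(x, W, b))
```

Each output lies in (0, 1) because the sigmoid squashes the unbounded weighted sum into that range.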

Figure 1: The representation of a layer of hidden neurons in a neural network

Source: Demuth, Howard, Mark Beale, and Martin Hagan. Neural Network Toolbox 6: User's Guide (2008): p. 2-9.

Neural networks are built with different approaches and strategies, which give rise to the various types of neural networks under active research. These types include:

- Feed-forward neural networks
- Recurrent neural networks
- Convolutional neural networks
- Regulatory feedback neural networks

Neural networks have a wide range of use cases, since they fundamentally function as data-in, data-out systems. The main purposes are as follows:

- Classification
- Regression
- Compression
- Prediction
- Retro-diction
- Image recognition
- Anomaly detection

Backpropagation is the method of iteratively finding the best weight and bias values according to the presented training data. Its purpose is to determine weights and biases that mimic and approximate the unknown function which produces the output when the input data is fed in.

Backpropagation essentially uses the gradient descent algorithm to reduce the error in a multi-dimensional space (each input is a dimension). A neural network properly trained by backpropagation gives reasonable results on new inputs in the light of the training data set. For this reason, the training data set should be a representative subset of the whole space in order to achieve more accurate approximations.
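The gradient descent step at the heart of backpropagation can be illustrated on a toy one-weight model; this is a sketch of the update rule only, not the full multi-layer algorithm, and the names and learning rate are assumptions:

```python
def train_weight(samples, lr=0.1, epochs=100):
    """Fit a single weight w so that w*x approximates y by repeatedly
    stepping against the gradient of the squared error."""
    w = 0.0
    for _ in range(epochs):
        for x, y in samples:
            error = w * x - y      # prediction error
            grad = 2 * error * x   # d(error**2)/dw
            w -= lr * grad         # gradient-descent step
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # underlying function: y = 2x
print(train_weight(data))  # converges toward 2.0
```

With representative training samples, the weight converges to the coefficient of the underlying function, which is exactly the approximation goal described above.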

Overfitting and underfitting are the most common phenomena encountered in neural network training. Underfitting is the state in which the network is not yet accurate enough (still hungry for data); overfitting, on the other hand, is the state in which the network has been trained for so long that the model has memorized the training data set instead of learning it for general purposes.

Figure 2: A sample performance diagram over the iterations of a training process

Figure 2 shows the training process of a neural network over iterations (epochs) using the MATLAB Neural Network Toolbox. As seen in the figure, the error on the test data decreases with each epoch until the 8th epoch. From the 0th to the 7th epoch, the network is in the underfitting state.

After the 8th epoch, the error on the training data continues decreasing, but the error on the test and validation data starts to increase. The continuing reduction on the training data may seem profitable, but overfitting decreases the generality of the model on new data that was not presented to the neural network during training.
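The boundary between underfitting and overfitting visible in the figure is commonly detected with early stopping on the validation error; a minimal sketch, where the function name and patience value are assumptions:

```python
def best_epoch(val_errors, patience=3):
    """Early-stopping sketch: return the epoch with the lowest
    validation error, stopping once it has not improved for
    `patience` consecutive epochs."""
    best, best_ep = float("inf"), 0
    since_improve = 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, best_ep = err, epoch
            since_improve = 0
        else:
            since_improve += 1
            if since_improve >= patience:
                break
    return best_ep

# Validation error falls until epoch 4, then rises (overfitting).
errors = [0.9, 0.7, 0.5, 0.4, 0.35, 0.4, 0.45, 0.55]
print(best_epoch(errors))  # → 4
```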

The reasons for overfitting & underfitting may include:

- Too little or too much data
- Too small a neural network model
- Not enough hidden layers
- Insufficient hidden neurons
- Algorithmic flaws in the training functions
- Insensitivity in the error function
- Mismatch between the complexity of the model and the problem

In order to clarify the technical process, a simplifying metaphor is utilized.

- Students with a wide range of characteristics and educational backgrounds are trained on prepared datasets corresponding to certain assets.
- The assistants constantly observe and examine the students. The students are graded and distributed into different classrooms.
- Each classroom (cluster) has its own corresponding aggregate grading. The investment decision is made through the clusters by smartly averaging the output estimations of the students in the classrooms.

Figure 3: Simplified general process flow chart

The figure shows the value generation process from a wider perspective. The data is fed into training in order to generate AI units for the AI units pool. The units in the pool are assessed, scored and clustered by a second layer of classifying machine learning units.

The trading decisions are made through the evaluation of the clusters to reduce the risk of individual unit mistakes. The performance of the trading ideas is continuously measured with real-time tracking, and the corresponding feedback is used at the previous stages of training AI units and at the clustering layer. The general process flow, including the student & assistant metaphor, is shown in more detail below.

Figure 4: Relatively-detailed process flow chart

The main purpose of collecting artificial intelligence units in a pool is to avoid relying on individual models alone. This approach aims to avoid getting stuck in a local minimum error of a single machine learning model & method. To overcome the problem, we apply an ensemble solution by collecting a massive number of various ML models.

In the same direction of reducing the risk of individual model mistakes, another layer of classifying ML units is laid on top of the AI units pool. The AI units are categorized and gathered into corresponding clusters. These clusters might or might not have a descriptive definition, due to the unsupervised clustering methods.

The final trading decisions are obtained from the clusters. The clustering process enables the system to diversify the risk of individual models, ML methods & configurations by establishing an ensemble solution.
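The cluster-based decision can be sketched as follows; the exact "smart averaging" is not specified in this document, so a plain two-stage mean over hypothetical unit predictions is assumed here:

```python
from statistics import mean

def cluster_decision(predictions, clusters, threshold=0.0):
    """Ensemble sketch: average each cluster's unit predictions, then
    average the cluster means into one trading signal. `predictions`
    maps unit id -> predicted return; `clusters` maps cluster name ->
    list of unit ids."""
    cluster_means = [mean(predictions[u] for u in units)
                     for units in clusters.values()]
    signal = mean(cluster_means)
    return "long" if signal > threshold else "short"

preds = {"u1": 0.02, "u2": 0.03, "u3": -0.01, "u4": 0.01}
clusters = {"A": ["u1", "u2"], "B": ["u3", "u4"]}
print(cluster_decision(preds, clusters))  # → long
```

Averaging per cluster first prevents one large cluster of similar units from dominating the final signal, which is one simple way to diversify individual-model risk.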

Figure 5: The multi-layered structure of the production mechanism

Figure 6: The AI units ecosystem explained in detailed process flow

Making use of an Azure server farm, we continuously execute machine learning processes to generate AI units that fit the new financial market data with a lower error rate. The generation of new units is crucial to enable the system to evolve with, and keep explaining, the always-evolving financial markets.

Automation of the parametrization of the training is the core of the value created by the system. In order to converge to the minimum error in evolving markets, new AI units are created in the same direction as the best-performing previous AI units. The direction of a unit is transferred to the new units in two ways:

- Starting the training with a similar initial seed
- Setting the training configurations with similar parameters

After the initial generation and the performance check of the AI units pool, the best-performing units are selected by the errors accumulated over defined time intervals; the error functions included in the operation are listed below. For each stock, each error function and each time interval, the top-performing units are selected as the ideal models.
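This per-key selection can be sketched as follows; a minimal illustration with hypothetical unit ids, accumulated errors and a single stock:

```python
def top_units(scores, k=1):
    """Selection sketch: `scores` maps (stock, error_fn, interval) ->
    {unit id: accumulated error}; return the k lowest-error unit ids
    per key as the ideal models."""
    return {key: sorted(errs, key=errs.get)[:k]
            for key, errs in scores.items()}

scores = {
    ("AAPL", "mse", "30d"): {"u1": 0.12, "u2": 0.08, "u3": 0.20},
    ("AAPL", "mae", "30d"): {"u1": 0.05, "u2": 0.09, "u3": 0.07},
}
print(top_units(scores))
```

Keeping a separate winner per (stock, error function, interval) triple means different objectives can each contribute ancestors to the next generation.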

The next step is generating units similar to the previous top-performing units. In this process, the learning does not start from an arbitrary random seed; instead, the initial seed is a randomized version of the previously successful neural network's seed. In mathematical terms, for a simple 1-hidden-layer neural network as shown in Figure 3, each weight of the new seed is the corresponding weight of the top-performing ancestor plus a Gaussian perturbation:

w_new = w_top + e, where e ~ N(μ, σ²)

In order to preserve the information in the initial W matrix, μ is taken as 0. σ² is the parameter for the randomization factor.

Too low a value of σ² results in bulkiness of the system: the evolution speed of the AI units might be slower than that of the stock markets. Too high a value of σ² results in big variation and loss of the effect of the top-performing ancestor. In order to limit the variation of each weight by its own value w_top with a probability of 99.7%, sigma is calculated by the following formula (for more information, see the three-sigma rule):

σ = |w_top| / 3

Note: The same approach is applied to the bias array in order to comprehensively mimic the top-performing previous network's training.
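The seed-inheritance step above can be sketched as follows; a minimal illustration assuming the weight matrix is held as a plain nested list (function and variable names are hypothetical):

```python
import random

def perturbed_seed(weights, rel_sigma=1 / 3):
    """Seed-inheritance sketch: each new weight is the ancestor weight
    plus zero-mean Gaussian noise with sigma = |w| / 3, so with ~99.7%
    probability the perturbation stays within the weight's own value
    (three-sigma rule)."""
    return [[w + random.gauss(0.0, abs(w) * rel_sigma) for w in row]
            for row in weights]

random.seed(0)
W_top = [[0.6, -1.2], [0.3, 0.9]]
print(perturbed_seed(W_top))
```

A zero ancestor weight gets zero sigma and therefore stays exactly zero, which is consistent with scaling the variation by the weight's own value.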

N new seeds are fed into the deep-learning process and become N new neural networks. The training configurations are also set similarly to those of the previous network, with a randomizing factor.

The training configurations cover a wide range of the available parameter space. The parameter space consists of the following:

- Data Processing Parameters
  - Data Normalization
    - Standardization
    - Min-max Scaling
  - Data Partitioning Parameters
    - Training Data Percentage
    - Validation Data Percentage
    - Test Data Percentage
  - Data Division Types
    - Random Shuffling
    - Chronological Partitioning
- Network Parameters
  - Activation Function

An artificial neuron calculates a weighted sum of its inputs, adds a bias value and decides whether the neuron itself should be activated or not. Since the weighted sum of the inputs can take any value between minus infinity and plus infinity, a function is required to map these values into a specific range. Activation functions are used in this context and are as follows:

- Step Function
- Linear Function
- Logistic (Sigmoid) Function
- Symmetric Logistic Function
- Tanh Function
- ReLU Function
- Sinusoidal Function
- Gaussian Function
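
Several of the listed functions can be sketched in a few lines; a minimal illustration (dictionary name assumed), showing how each maps an unbounded weighted sum z into its characteristic range:

```python
import math

# A sample of the listed activation functions, each mapping an
# unbounded weighted sum z into a bounded (or rectified) range.
ACTIVATIONS = {
    "step":     lambda z: 1.0 if z >= 0 else 0.0,
    "linear":   lambda z: z,
    "sigmoid":  lambda z: 1.0 / (1.0 + math.exp(-z)),
    "tanh":     lambda z: math.tanh(z),
    "relu":     lambda z: max(0.0, z),
    "gaussian": lambda z: math.exp(-z * z),
}

for name, f in ACTIVATIONS.items():
    print(f"{name}: f(0.5) = {f(0.5):.4f}")
```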

  - Number of Hidden Layers & Neurons

The literature proposes several formulas for the structural design of neural networks. However, no exact analytical formula exists that can be applied generally. In order to find the best design structure for each asset and data set, we execute Monte Carlo simulations around the widely accepted formulas.
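As an illustration of this Monte Carlo search, the sketch below samples hidden-layer sizes around one widely cited heuristic, the geometric mean of the input and output counts; the heuristic choice, function name and spread parameter are assumptions, not the document's exact procedure:

```python
import math
import random

def candidate_hidden_sizes(n_inputs, n_outputs, n_samples=5, spread=0.5):
    """Monte Carlo sketch: sample hidden-layer sizes around a
    rule-of-thumb center rather than trusting any single formula."""
    center = math.sqrt(n_inputs * n_outputs)  # geometric-mean heuristic
    sizes = set()
    while len(sizes) < n_samples:
        factor = 1.0 + random.uniform(-spread, spread)
        sizes.add(max(1, round(center * factor)))
    return sorted(sizes)

random.seed(1)
print(candidate_hidden_sizes(61, 1))  # centered near sqrt(61) ≈ 7.8
```

Each candidate size would then be trained and scored, and the best-performing structure kept per asset and data set.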

  - Training Function

The training functions define the mathematical operations used to iterate the hidden layer matrices. Most of the available training functions are variations of backpropagation, a widely proven iteration algorithm.

- trainb: Batch training with weight & bias learning rules
- trainbfg: BFGS quasi-Newton backpropagation
- trainbr: Bayesian regularization
- trainc: Cyclical order incremental training w/learning functions
- traincgb: Powell-Beale conjugate gradient backpropagation
- traincgf: Fletcher-Powell conjugate gradient backpropagation
- traincgp: Polak-Ribiere conjugate gradient backpropagation
- traingd: Gradient descent backpropagation
- traingdm: Gradient descent with momentum backpropagation
- traingda: Gradient descent with adaptive lr backpropagation
- traingdx: Gradient descent w/momentum & adaptive lr backpropagation
- trainlm: Levenberg-Marquardt backpropagation
- trainoss: One step secant backpropagation
- trainr: Random order incremental training w/learning functions
- trainrp: Resilient backpropagation (Rprop)
- trains: Sequential order incremental training w/learning functions
- trainscg: Scaled conjugate gradient backpropagation
  - Learning Parameters

  - Error Function

The objective functions that foundationally require maximization (the first three items below) are fed to the training software as reciprocals, so that minimizing the reciprocal maximizes the original objective. The following error functions are sorted by their frequency of usage in the training process:

- Reciprocal of Compound Growth
- Reciprocal of Hit Ratio
- Reciprocal of Sharpe Ratio
- Coefficient of Determination (R²)
- Cross-Entropy
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- Mean Absolute Percentage Error (MAPE)
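
The reciprocal trick can be sketched as follows; a minimal illustration (function name assumed), noting that it presumes positive objective values such as a positive Sharpe ratio:

```python
def as_minimization(objective_value, eps=1e-12):
    """Sketch of the reciprocal trick: objectives that should be
    maximized (compound growth, hit ratio, Sharpe ratio) are handed
    to the trainer as 1/value, so the trainer's minimizer maximizes
    the original objective. `eps` guards against division by zero."""
    return 1.0 / (objective_value + eps)

sharpe_a, sharpe_b = 1.2, 2.4
# The better Sharpe ratio yields the smaller training error.
print(as_minimization(sharpe_a), as_minimization(sharpe_b))
```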

We categorize stocks & the AI units trained to predict the corresponding stock according to the stock's market cap (and category), shares outstanding, free float rate and average daily volume (last 20 days and 3 months).

Initial Classification Criteria

- Market Cap
- Market Cap Category
- Shares Outstanding
- Free Float Rate
- Average Volume (last 20 days)
- Average Volume (last 3 months)

After reaching the initial categories of AI unit clusters, we evaluate each unit against its peers under its corresponding stock's sector and industry. Instead of each stock's nominal value on a certain criterion, its position within the sector and industry is prioritized by comparing nominal values to sector and industry averages. We group the stocks & AI units into 12 main sectors and 135 industries.
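The peer-relative comparison described above can be sketched as follows; a minimal illustration with hypothetical tickers and a made-up P/E metric:

```python
from statistics import mean

def relative_to_sector(sector_of, metric):
    """Peer-comparison sketch: replace each stock's nominal metric
    value with its ratio to the average of its sector, so that
    position within the sector, not the nominal value, drives the
    grouping."""
    by_sector = {}
    for stock, value in metric.items():
        by_sector.setdefault(sector_of[stock], []).append(value)
    averages = {sec: mean(vals) for sec, vals in by_sector.items()}
    return {stock: value / averages[sector_of[stock]]
            for stock, value in metric.items()}

sector_of = {"AAA": "Technology", "BBB": "Technology", "CCC": "Energy"}
pe = {"AAA": 30.0, "BBB": 10.0, "CCC": 8.0}
print(relative_to_sector(sector_of, pe))
```

A ratio above 1 means the stock sits above its sector average on that criterion, which is the positional signal the clustering uses.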

Sector List

- Basic Industries
- Capital Goods
- Consumer Durables
- Consumer Non-Durables
- Consumer Services
- Energy
- Finance
- Health Care
- Miscellaneous
- Public Utilities
- Technology
- Transportation

After training the neural network models on technical indicator values, which are the primary source of input data, the fundamental data of stocks is used as a categorical variable for the neural networks. Fundamental values, which change quarterly, are not directly used for daily trade decisions in terms of risk/return analysis, valuation metrics and stop-loss levels.

Fundamental Criteria

- Earnings Per Share (EPS)
- 3-year EPS Growth
- Price to Earnings Ratio (P/E)
- Dividend Per Share (DPS)
- Dividend Yield
- 3-year Dividend Yield
- 3-year Dividend Growth
- 3-year Sales Growth
- EBITDA Margin
- EBITDA Growth
- EV/EBITDA
- Price to Book Ratio (P/B)
- Price to Sales Ratio (P/S)
- Return On Equity (ROE)
- Net Profit Margin
- Debt/Total Capital
- Debt/Equity Ratio
- Current Ratio
- Acid Test Ratio
- Cash Ratio

Input Data

Initially, the trade time is from market open to market close, which means no positions are held overnight. During the day, the model suggests opening positions in 10-20 stocks on average. Positions might be either long or short. Each trade has an equal weight but a different stop-loss level, depending on the expected return and risk.
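The equal-weight book described above can be sketched as follows; a minimal illustration with hypothetical signals, where the stop-loss level is taken as a given input because the document does not specify its formula:

```python
def build_positions(capital, signals):
    """Position-sizing sketch: allocate equal capital to each suggested
    trade, take the direction from the signal, and carry the per-trade
    stop-loss level through unchanged."""
    weight = capital / len(signals)
    return [{"symbol": s["symbol"],
             "side": s["side"],        # "long" or "short"
             "notional": weight,
             "stop_loss": s["stop_loss"]}
            for s in signals]

signals = [
    {"symbol": "AAA", "side": "long",  "stop_loss": 0.02},
    {"symbol": "BBB", "side": "short", "stop_loss": 0.01},
]
print(build_positions(100_000, signals))
```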

Since positions get closed in couple of hours, technical indicators are the biggest source of data. Complex neural networks differentiate from linear neural networks by not giving access to learn the coefﬁcients of the input data. Since the effects of the variables on the output is unknown, in this manner it could be seen as black-box trading. 61 technical indicators data is used at training, validation and testing steps at neural networks.