Abstract:
Today’s world is highly dependent on financial markets. These markets are very dynamic, which makes prices difficult to predict. Accurate predictions could yield financial gains at reduced risk. Two main categories of data can be used to predict market prices: historical market data and textual data. Textual analysis of sources such as news, social media, and reports is increasingly popular among researchers. This process is known as sentiment analysis: determining the opinion behind a text, that is, whether its polarity is positive, negative, or neutral. This research focuses on sentiment-analysis-based financial market prediction using deep learning. Market prediction using sentiment analysis is a very challenging task: there are complex linguistic issues to solve, and using a microblog dataset such as Twitter makes the prediction task even more difficult. As a result, current prediction approaches rarely exceed seventy percent accuracy. This research is based on Task 5 of SemEval-2017 and uses the dataset shared by the SemEval organizers. This thesis presents an improvement over the baseline reported for that task. We experiment with different techniques from both the machine learning and deep learning domains. Because the dataset is small, with 1,693 training instances and 793 test instances, lexicon-based dictionaries are used heavily in every model. We had to enlarge the dataset as much as possible to achieve good accuracy. We built four main models based on machine learning and deep learning techniques. The machine learning model uses support vector regression. The deep learning models use convolutional neural network (CNN), long short-term memory (LSTM), and gated recurrent unit (GRU) architectures, which performed better than any of the baseline models on this dataset.
Our deep learning models achieved higher similarity scores than any single system. Finally, we experimented with three main ensemble techniques: averaging, linear regression, and a multilayer perceptron. The averaging ensemble model gave the best results.
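As an illustration of the averaging ensemble, the following minimal sketch averages the per-instance sentiment scores of several models and compares the result with gold scores. It assumes that the similarity score is cosine similarity between the predicted and gold score vectors, which the abstract does not state. The function names and all numeric values are hypothetical and are not results from this thesis.

```python
import math


def average_ensemble(predictions):
    """Average per-instance scores across models.

    `predictions` holds one list of scores per model; all lists must
    have the same length (one score per instance).
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_instances = len(predictions[0])
    if any(len(p) != n_instances for p in predictions):
        raise ValueError("all models must score the same instances")
    # zip(*predictions) groups the scores that belong to one instance.
    return [sum(scores) / len(predictions) for scores in zip(*predictions)]


def cosine_similarity(gold, pred):
    """Cosine similarity between gold and predicted score vectors.

    Assumed evaluation metric; returns 0.0 if either vector is all zeros.
    """
    dot = sum(g * p for g, p in zip(gold, pred))
    norm_gold = math.sqrt(sum(g * g for g in gold))
    norm_pred = math.sqrt(sum(p * p for p in pred))
    if norm_gold == 0.0 or norm_pred == 0.0:
        return 0.0
    return dot / (norm_gold * norm_pred)


if __name__ == "__main__":
    # Hypothetical sentiment scores in [-1, 1] for three instances.
    cnn_scores = [0.6, -0.2, 0.1]
    lstm_scores = [0.4, -0.4, 0.3]
    gru_scores = [0.5, -0.3, 0.2]
    gold_scores = [0.5, -0.5, 0.2]

    ensemble = average_ensemble([cnn_scores, lstm_scores, gru_scores])
    print("ensemble scores:", ensemble)
    print("cosine similarity:", cosine_similarity(gold_scores, ensemble))
```

Averaging needs no additional training, unlike the linear regression and multilayer perceptron ensembles, which must learn how to combine the model outputs from held-out predictions.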