Predicting NEPSE Stock Prices with Machine Learning: A Step-by-Step Guide

Predicting stock prices is hard, especially in emerging markets like Nepal, where data quality and availability can be limited. Still, as more historical price data becomes available, machine learning models can be trained on it and their forecasts measured against held-out data. In this post, we will explore how to predict NEPSE stock prices using machine learning, covering data collection, preprocessing, model implementation, and evaluation. By the end, you will understand how to build and evaluate a predictive model for the Nepal Stock Exchange (NEPSE) from historical data.

Key Takeaways

Collecting and preprocessing historical stock price data is crucial for building an accurate prediction model.
Machine learning algorithms such as LSTM and Prophet can be used to predict stock prices.
Evaluating the performance of the model using metrics such as mean absolute error and mean squared error is essential.

The Problem

Predicting stock prices is a real-world problem that affects investors, financial institutions, and the overall economy. In Nepal, the Nepal Stock Exchange (NEPSE) is the primary stock exchange, and predicting its stock prices can help investors make informed decisions. However, building an accurate prediction model requires high-quality historical data and a robust machine learning algorithm.

Data and Sources

The data used in this post is the historical daily closing prices of the NEPSE index, which can be obtained from the NEPSE website or through APIs such as Alpha Vantage. The data was accessed on 2024-09-16; results will vary with the quality and completeness of the data you retrieve. For more information, see the NEPSE website or the Alpha Vantage API documentation.

Loading the Data

To load the data, we will use the Alpha Vantage API, which provides free and paid APIs for historical stock price data. We will use the `requests` library to send a GET request to the API and retrieve the data in JSON format.

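import requests

# Pass query parameters separately so requests handles the URL encoding
params = {"function": "TIME_SERIES_DAILY", "symbol": "NEPSE", "apikey": "YOUR_API_KEY"}
response = requests.get("https://www.alphavantage.co/query", params=params, timeout=30)
response.raise_for_status()  # stop early on HTTP errors
data = response.json()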
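
A request can succeed at the HTTP level and still carry no price data, for example when the symbol is unknown or a rate limit is hit. Below is a minimal sketch of a guard for that case. It assumes the API reports problems under keys such as `Error Message`, `Note`, or `Information`; those key names and the helper `extract_daily_series` are assumptions for illustration, so check them against the API documentation.

```python
def extract_daily_series(payload):
    """Return the daily time-series mapping from a decoded API response.

    Raises ValueError when the expected key is missing, surfacing any
    message the API included (the message key names are assumptions).
    """
    series = payload.get("Time Series (Daily)")
    if series:
        return series
    for key in ("Error Message", "Note", "Information"):
        if key in payload:
            raise ValueError(f"API returned no data: {payload[key]}")
    raise ValueError("Response has no 'Time Series (Daily)' entry")


# Usage with a hand-written payload (values are made up for illustration)
sample = {"Time Series (Daily)": {"2024-09-16": {"4. close": "100.0"}}}
series = extract_daily_series(sample)
print(sorted(series))
```

Calling the helper on the decoded JSON before building a DataFrame turns an opaque `KeyError` into a message that says what the API actually returned.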

Data Preprocessing

After loading the data, we need to preprocess it to prepare it for the machine learning algorithm. This includes handling missing values, converting the data to a suitable format, and normalizing the data.

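import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Convert the data to a DataFrame with one row per trading day
df = pd.DataFrame(data['Time Series (Daily)']).T

# Values arrive as strings; cast to float and order the rows oldest first
df = df.astype(float)
df.index = pd.to_datetime(df.index)
df = df.sort_index()

# Handle missing values by carrying the last known value forward
df = df.ffill()

# Normalize using the range of the first 80% of rows (the training period)
# so that statistics from the evaluation period do not leak into training
cols = ['1. open', '2. high', '3. low', '4. close', '5. volume']
split = int(len(df) * 0.8)
scaler = MinMaxScaler()
scaler.fit(df.iloc[:split][cols])
df[cols] = scaler.transform(df[cols])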
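
To make the normalization step concrete, here is the arithmetic that min-max scaling performs on a single column, sketched with NumPy on made-up prices. It takes the minimum and maximum from the earliest 80% of values only, a common precaution so that values later used for evaluation do not influence the scaling.

```python
import numpy as np

prices = np.array([100.0, 110.0, 120.0, 130.0, 150.0])  # made-up values
train = prices[:4]                                       # earliest 80% of the series

lo, hi = train.min(), train.max()
scaled = (prices - lo) / (hi - lo)   # the formula MinMaxScaler applies per column
restored = scaled * (hi - lo) + lo   # inverse transform back to price units

print(scaled)    # the last value exceeds 1 because 150 lies outside the training range
print(restored)
```

Scaled values above 1 (or below 0) in the later rows are expected with this approach; they simply mean the price moved outside the range seen during training.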

Model Implementation

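For this example, we will use an LSTM (Long Short-Term Memory) network, a type of recurrent neural network suited to sequence data such as time series. The model reads a window of the previous 30 closing prices and predicts the next one. We will use the `keras` library to implement it.

import numpy as np
from keras.models import Sequential
from keras.layers import Input, LSTM, Dense

# Build (window, 1) input sequences and next-day targets from the close column
WINDOW = 30
close = df['4. close'].astype(float).sort_index().to_numpy()  # oldest first
X = np.array([close[i:i + WINDOW] for i in range(len(close) - WINDOW)])[..., np.newaxis]
y = close[WINDOW:]

# Chronological split: targets in the first 80% of rows are used for training
split = int(len(close) * 0.8) - WINDOW
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Create the LSTM model
model = Sequential()
model.add(Input(shape=(WINDOW, 1)))
model.add(LSTM(units=50, return_sequences=True))
model.add(LSTM(units=50, return_sequences=False))
model.add(Dense(25))
model.add(Dense(1))

# Compile and train the model (epochs and batch size are untuned starting values)
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=0)

Model Evaluation

After training the model, we evaluate it on the held-out test windows, which it did not see during training, using mean absolute error and mean squared error.

from sklearn.metrics import mean_absolute_error, mean_squared_error

# Predict the held-out closing prices
predictions = model.predict(X_test).ravel()

# Evaluate the model (errors are in the same scaled units as the close column)
mae = mean_absolute_error(y_test, predictions)
mse = mean_squared_error(y_test, predictions)

print(f'MAE: {mae}')
print(f'MSE: {mse}')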
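
An error metric is easier to judge against a reference point. One simple reference for daily prices is a persistence forecast, which predicts that the next close equals the current one. The sketch below computes its MAE with NumPy on made-up values; `persistence_mae` is a hypothetical helper name, not part of any library.

```python
import numpy as np


def persistence_mae(series, n_test):
    """MAE of predicting each of the last n_test values with the value before it."""
    series = np.asarray(series, dtype=float)
    actual = series[-n_test:]
    predicted = series[-n_test - 1:-1]  # previous day's value used as the forecast
    return float(np.mean(np.abs(actual - predicted)))


closes = [100.0, 102.0, 101.0, 105.0, 104.0]  # made-up values
baseline = persistence_mae(closes, n_test=3)
print(f"Persistence MAE: {baseline}")
```

If the model's MAE over the same days is not lower than this baseline, it has not learned anything beyond repeating the last price.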

Complete Script

The full runnable script combining all steps:

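#!/usr/bin/env python3
import numpy as np
import pandas as pd
import requests
from keras.layers import Input, LSTM, Dense
from keras.models import Sequential
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.preprocessing import MinMaxScaler

COLS = ['1. open', '2. high', '3. low', '4. close', '5. volume']
WINDOW = 30        # number of past closes the model sees per prediction
TRAIN_FRAC = 0.8   # share of rows, oldest first, used for training

def load_data():
    params = {"function": "TIME_SERIES_DAILY", "symbol": "NEPSE", "apikey": "YOUR_API_KEY"}
    response = requests.get("https://www.alphavantage.co/query", params=params, timeout=30)
    response.raise_for_status()
    data = response.json()
    if 'Time Series (Daily)' not in data:
        raise ValueError(f"No daily series in API response: {data}")
    # Values arrive as strings; cast to float and order the rows oldest first
    df = pd.DataFrame(data['Time Series (Daily)']).T.astype(float)
    df.index = pd.to_datetime(df.index)
    df = df.sort_index().ffill()
    # Fit the scaler on the training rows only to avoid leaking test statistics
    scaler = MinMaxScaler()
    scaler.fit(df.iloc[:int(len(df) * TRAIN_FRAC)][COLS])
    df[COLS] = scaler.transform(df[COLS])
    return df

def make_windows(df):
    # Each sample is WINDOW consecutive closes; the target is the next close
    close = df['4. close'].to_numpy()
    X = np.array([close[i:i + WINDOW] for i in range(len(close) - WINDOW)])[..., np.newaxis]
    y = close[WINDOW:]
    split = int(len(close) * TRAIN_FRAC) - WINDOW
    if split <= 0:
        raise ValueError("Not enough rows to build training windows")
    return X[:split], y[:split], X[split:], y[split:]

def train_model(X_train, y_train):
    model = Sequential()
    model.add(Input(shape=(WINDOW, 1)))
    model.add(LSTM(units=50, return_sequences=True))
    model.add(LSTM(units=50, return_sequences=False))
    model.add(Dense(25))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mean_squared_error')
    model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=0)
    return model

def main():
    df = load_data()
    X_train, y_train, X_test, y_test = make_windows(df)
    model = train_model(X_train, y_train)
    # Evaluate on the held-out, most recent windows only
    predictions = model.predict(X_test).ravel()
    mae = mean_absolute_error(y_test, predictions)
    mse = mean_squared_error(y_test, predictions)
    print(f'MAE: {mae}')
    print(f'MSE: {mse}')

if __name__ == "__main__":
    main()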

Expected Output

When you run the script, you should see the mean absolute error and mean squared error printed to the console. Because the prices were normalized before training, both values are in scaled units rather than index points. Use them to compare model variants as you refine the model.
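
Errors measured on min-max scaled prices can be converted back to price units with a little arithmetic: an absolute error scales with the range (max minus min) used by the scaler, and a squared error scales with the square of that range. The sketch below assumes the default 0-to-1 feature range; `to_price_units` is a hypothetical helper, and the minimum and maximum would come from the fitted scaler (`MinMaxScaler` exposes them as `data_min_` and `data_max_`).

```python
def to_price_units(mae_scaled, mse_scaled, price_min, price_max):
    """Convert errors measured on min-max scaled prices back to price units."""
    span = price_max - price_min
    return mae_scaled * span, mse_scaled * span ** 2


# Made-up numbers: a close that ranged from 1800 to 2200 while the scaler was fitted
mae_points, mse_points = to_price_units(0.05, 0.004, 1800.0, 2200.0)
print(f"MAE in points: {mae_points:.2f}")
print(f"MSE in squared points: {mse_points:.2f}")
```

Reporting the MAE in index points makes it easier to judge whether the error is small relative to typical daily moves.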

Limitations and Tradeoffs

This approach has several limitations and tradeoffs. First, the quality of the data is crucial, and any errors or inconsistencies in the data can affect the model's performance. Second, the model is sensitive to the choice of hyperparameters, and tuning them can be time-consuming. Finally, the model is a simplification of the complex relationships between stock prices and other economic factors, and it should not be used as the sole basis for investment decisions.
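
When comparing hyperparameter settings on time series, each candidate should be scored on data that comes after its training data. One way to do that is a walk-forward split, where the training range grows and the test range always follows it. The sketch below is a plain-Python illustration under that assumption; `walk_forward_splits` and its parameters are hypothetical names, not part of the pipeline above.

```python
def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_indices, test_indices) with training always before testing."""
    fold_size = (n_samples - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        test_end = train_end + fold_size
        yield range(0, train_end), range(train_end, test_end)


for train_idx, test_idx in walk_forward_splits(n_samples=10, n_folds=3, min_train=4):
    print(list(train_idx), list(test_idx))
```

Averaging a metric over the folds gives a steadier basis for choosing between settings than a single split, at the cost of training the model once per fold.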

Frequently Asked Questions

What is the best machine learning algorithm for stock price prediction?

The best algorithm depends on the specific problem and dataset. However, LSTM and Prophet are popular choices for time series forecasting.

How do I handle missing values in the data?

There are several ways to handle missing values, including forward filling, backward filling, and interpolation. The choice of method depends on the nature of the data and the specific problem.
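
A small pandas sketch of the three options on made-up values shows how they differ. Note that backward filling and interpolation both use values that come after the gap, so for forecasting they can leak future information into the past; forward filling does not.

```python
import numpy as np
import pandas as pd

closes = pd.Series([100.0, np.nan, np.nan, 106.0, 108.0])  # made-up values

forward = closes.ffill()        # 100, 100, 100, 106, 108
backward = closes.bfill()       # 100, 106, 106, 106, 108
linear = closes.interpolate()   # 100, 102, 104, 106, 108

print(pd.DataFrame({"ffill": forward, "bfill": backward, "interpolate": linear}))
```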

What is the importance of data preprocessing in machine learning?

Data preprocessing is crucial in machine learning, as it helps to prepare the data for the algorithm and improve its performance. This includes handling missing values, normalizing the data, and transforming the data into a suitable format.

What I'd Change

Building a stock price prediction model requires careful choices about the data, the algorithm, and the hyperparameters. This approach is a starting point, and there are several things I would change: try other model families such as Prophet, incorporate additional economic indicators as input features, and tune hyperparameters such as the number of LSTM units and layers. I would also iterate on each of these against held-out data rather than treating the first working model as final.
