Forex data in Python

Forex data in Python

2022, Oct 05    

Foreign exchange of currencies is known as forex. Foreign exchange is the process of changing one currency into another in the foreign exchange market. This process which is also called forex trading is how individuals, businesses, central banks, and governments pay for products and services in other economies. You are trading forex whenever you make a purchase in a different currency or exchange money to travel. Due to the fact that currencies are usually traded in pairs, the foreign exchange market sets a currency’s relative value by determining the market price of one currency when purchased with another. Most forex trades aren’t made for practical reasons. Speculative traders bet on whether a currency’s value will increase or decrease relative to another in an effort to profit from changes in exchange rates.

The forex markets typically rank as the biggest and most liquid asset markets in the world due to the global reach of trade, business, and finance. This liquidity and volatility make the forex market very attractive for algorithmic traders. The first step to start with this market for an Algo trader is to get forex historical data. This data are usually available through Banks and financial companies, however, due to the decentralized nature of this market, the historical data might be slightly varied, depending on the source of the data. Also, typically, the forex historical data is not as clean as stock market data. Therefore before using it in any algorithm, we need to do some preprocessing on it.

For this post, I downloaded the historical data for the Euro-Pound exchange rates from Dukascopy. You may need to create an account and log in to be able to get their data. It is easy and completely free. You can download the data as a CSV file, and import it. If you are using Google Colab, as I do, you might need to mount your google drive first.

from google.colab import drive #for google colab users
drive.mount('/content/gdrive')
import pandas as pd 

pd.options.display.float_format = '{:,.5f}'.format
data = pd.read_csv('/content/gdrive/MyDrive/EURGBP_Candlestick_1_M_BID_31.01.2021-13.12.2021.csv')
data.head()
Gmt time Open High Low Close Volume
0 01.02.2021 00:00:00.000 0.88502 0.88505 0.88498 0.88500 75.31000
1 01.02.2021 00:01:00.000 0.88500 0.88505 0.88497 0.88505 66.06000
2 01.02.2021 00:02:00.000 0.88504 0.88504 0.88498 0.88500 73.19000
3 01.02.2021 00:03:00.000 0.88502 0.88503 0.88496 0.88502 95.50000
4 01.02.2021 00:04:00.000 0.88502 0.88503 0.88499 0.88499 23.50000
data_forex = data.copy()  
data_forex.columns = ['date', 'open', 'high', 'low', 'close', 'volume']             #change the columns' names
data_forex.date = pd.to_datetime(data_forex.date, format='%d.%m.%Y %H:%M:%S.%f')    #conver dates to Pandas' datetime objects
data_forex = data_forex.set_index(data_forex.date)
data_forex = data_forex.drop(['date'], axis =1)         #dropping the 'date' column
data_forex = data_forex.drop_duplicates(keep=False)     #dropping the duplicates
data_forex.head()
open high low close volume
date
2021-02-01 00:00:00 0.88502 0.88505 0.88498 0.88500 75.31000
2021-02-01 00:01:00 0.88500 0.88505 0.88497 0.88505 66.06000
2021-02-01 00:02:00 0.88504 0.88504 0.88498 0.88500 73.19000
2021-02-01 00:03:00 0.88502 0.88503 0.88496 0.88502 95.50000
2021-02-01 00:04:00 0.88502 0.88503 0.88499 0.88499 23.50000

For the forex data, it is realy nessecary to look at it with more details to be sure that you have a useful and intact data. For example, let’s export the trading days to check if we have a correct number of trading days, for 2021 year!

trading_days_series = pd.Series([], dtype='string')
for day, indx_data in data_forex.groupby(data_forex.index.date):
    trading_days_series = trading_days_series.append(pd.Series([day.strftime("%Y-%m-%d")]),ignore_index= True)

trading_days_series
0      2021-02-01
1      2021-02-02
2      2021-02-03
3      2021-02-04
4      2021-02-05
          ...    
267    2021-12-09
268    2021-12-10
269    2021-12-12
270    2021-12-13
271    2021-12-14
Length: 272, dtype: object
day_number =  data_forex.index.get_loc('2021-12-14')     #looking at a specific trading day in 2021
data_forex.iloc[day_number] 
open high low close volume
date
2021-12-14 00:00:00 0.85392 0.85397 0.85388 0.85393 50.64000
2021-12-14 00:01:00 0.85394 0.85398 0.85391 0.85396 40.90000
2021-12-14 00:02:00 0.85397 0.85400 0.85395 0.85396 33.14000
2021-12-14 00:03:00 0.85394 0.85398 0.85392 0.85394 40.87000
2021-12-14 00:04:00 0.85395 0.85396 0.85389 0.85396 35.29000
... ... ... ... ... ...
2021-12-14 23:55:00 0.85085 0.85085 0.85082 0.85082 6.44000
2021-12-14 23:56:00 0.85086 0.85088 0.85085 0.85087 18.94000
2021-12-14 23:57:00 0.85087 0.85089 0.85085 0.85086 22.85000
2021-12-14 23:58:00 0.85089 0.85090 0.85088 0.85089 21.24000
2021-12-14 23:59:00 0.85089 0.85089 0.85086 0.85087 10.69000

1431 rows × 5 columns

As it has been shown. above, our data is the 1-minute chart, as the traders call it. In another world, we have the summary of price fluctuations, in each one minute. What if we want to have the 5-minute chart? We simply can change the time frame by the blow lines of code.

data_forex_5min = data_forex.groupby(pd.Grouper(freq='5min')).agg({'open': 'first', 
                                                 'high': 'max', 
                                                 'low': 'min', 
                                                 'close': 'last',
                                                 'volume': 'sum'})
data_forex_5min.columns = ["open", "high", "low", "close", "volume"]
data_forex_5min.head()
open high low close volume
date
2021-02-01 00:00:00 0.88502 0.88505 0.88496 0.88499 333.56000
2021-02-01 00:05:00 0.88502 0.88513 0.88498 0.88509 174.09000
2021-02-01 00:10:00 0.88510 0.88511 0.88504 0.88507 243.92000
2021-02-01 00:15:00 0.88507 0.88511 0.88505 0.88507 243.87000
2021-02-01 00:20:00 0.88506 0.88507 0.88497 0.88500 185.69000

The Panadas.Grouper has combined all the trading data for every 5 minutes and picked the opening, closing, high and low prices from the combination for each 5-minutes time interval. However, it creates these time intervals uniformly for all trading days. It does make some Null value in our data for those days that the market was partially active (the market is closed on Fridays afternoon).

data_forex_5min['close'].isnull().any()
True

There are several methods to deal with Null data, but for now, I will drop them.

data_forex_5min = data_forex_5min.dropna()  #dropping the null values