
Forex data in Python
Foreign exchange of currencies is known as forex. It is the process of changing one currency into another in the foreign exchange market. This process, also called forex trading, is how individuals, businesses, central banks, and governments pay for products and services in other economies. You are trading forex whenever you make a purchase in a different currency or exchange money to travel. Because currencies are usually traded in pairs, the foreign exchange market sets a currency’s relative value by determining the market price of one currency when purchased with another. Most forex trades, however, are not made for practical reasons: speculative traders bet on whether a currency’s value will rise or fall relative to another, aiming to profit from changes in exchange rates.
The forex market typically ranks as the biggest and most liquid asset market in the world, thanks to the global reach of trade, business, and finance. This liquidity and volatility make the forex market very attractive for algorithmic traders. The first step for an algo trader entering this market is to obtain forex historical data. This data is usually available through banks and financial companies; however, because of the decentralized nature of the market, the historical data can vary slightly depending on its source. Forex historical data is also typically not as clean as stock market data, so before using it in any algorithm we need to do some preprocessing.
For this post, I downloaded the historical data for the Euro-Pound exchange rate from Dukascopy. You may need to create an account and log in to be able to get their data; it is easy and completely free. You can download the data as a CSV file and import it. If you are using Google Colab, as I do, you might need to mount your Google Drive first.
from google.colab import drive #for google colab users
drive.mount('/content/gdrive')
import pandas as pd
pd.options.display.float_format = '{:,.5f}'.format
data = pd.read_csv('/content/gdrive/MyDrive/EURGBP_Candlestick_1_M_BID_31.01.2021-13.12.2021.csv')
data.head()
| | Gmt time | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|---|
| 0 | 01.02.2021 00:00:00.000 | 0.88502 | 0.88505 | 0.88498 | 0.88500 | 75.31000 |
| 1 | 01.02.2021 00:01:00.000 | 0.88500 | 0.88505 | 0.88497 | 0.88505 | 66.06000 |
| 2 | 01.02.2021 00:02:00.000 | 0.88504 | 0.88504 | 0.88498 | 0.88500 | 73.19000 |
| 3 | 01.02.2021 00:03:00.000 | 0.88502 | 0.88503 | 0.88496 | 0.88502 | 95.50000 |
| 4 | 01.02.2021 00:04:00.000 | 0.88502 | 0.88503 | 0.88499 | 0.88499 | 23.50000 |
data_forex = data.copy()
data_forex.columns = ['date', 'open', 'high', 'low', 'close', 'volume'] #change the columns' names
data_forex.date = pd.to_datetime(data_forex.date, format='%d.%m.%Y %H:%M:%S.%f') #convert dates to Pandas' datetime objects
data_forex = data_forex.set_index(data_forex.date)
data_forex = data_forex.drop(['date'], axis =1) #dropping the 'date' column
data_forex = data_forex.drop_duplicates(keep=False) #dropping the duplicates
data_forex.head()
| date | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2021-02-01 00:00:00 | 0.88502 | 0.88505 | 0.88498 | 0.88500 | 75.31000 |
| 2021-02-01 00:01:00 | 0.88500 | 0.88505 | 0.88497 | 0.88505 | 66.06000 |
| 2021-02-01 00:02:00 | 0.88504 | 0.88504 | 0.88498 | 0.88500 | 73.19000 |
| 2021-02-01 00:03:00 | 0.88502 | 0.88503 | 0.88496 | 0.88502 | 95.50000 |
| 2021-02-01 00:04:00 | 0.88502 | 0.88503 | 0.88499 | 0.88499 | 23.50000 |
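A couple of quick checks on the cleaned frame (just a sketch, not part of the original workflow) can confirm that the index is sorted, that no duplicate timestamps remain, and that no prices are missing.
print(data_forex.index.is_monotonic_increasing) #the timestamps should be in ascending order
print(data_forex.index.duplicated().sum())      #duplicated timestamps left after cleaning, ideally 0
print(data_forex.isnull().sum())                #missing values per column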
For forex data, it is really necessary to look at it in more detail to make sure you have useful and intact data. For example, let’s extract the trading days to check whether we have the correct number of trading days for 2021.
trading_days = [] #collect each trading day as a string (Series.append was removed in pandas 2.0, so we build a list first)
for day, day_data in data_forex.groupby(data_forex.index.date): #group the 1-minute bars by calendar day
    trading_days.append(day.strftime("%Y-%m-%d"))
trading_days_series = pd.Series(trading_days)
trading_days_series
0 2021-02-01
1 2021-02-02
2 2021-02-03
3 2021-02-04
4 2021-02-05
...
267 2021-12-09
268 2021-12-10
269 2021-12-12
270 2021-12-13
271 2021-12-14
Length: 272, dtype: object
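As a quick sanity check (a sketch on top of the post's workflow), you could compare these dates against a plain Monday-to-Friday business-day calendar: any weekday missing entirely from the data would be suspicious. Note that Dukascopy also reports Sunday sessions, so a few extra dates beyond the weekday calendar are expected.
observed_days = pd.DatetimeIndex(pd.to_datetime(trading_days_series)) #dates we actually have data for
weekday_calendar = pd.bdate_range(observed_days.min(), observed_days.max()) #Monday-to-Friday calendar over the same span
missing_weekdays = weekday_calendar.difference(observed_days) #weekdays with no data at all, ideally empty
print(missing_weekdays)
We can also zoom in on a single trading day: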
day_number = data_forex.index.get_loc('2021-12-14') #looking at a specific trading day in 2021
data_forex.iloc[day_number]
| date | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2021-12-14 00:00:00 | 0.85392 | 0.85397 | 0.85388 | 0.85393 | 50.64000 |
| 2021-12-14 00:01:00 | 0.85394 | 0.85398 | 0.85391 | 0.85396 | 40.90000 |
| 2021-12-14 00:02:00 | 0.85397 | 0.85400 | 0.85395 | 0.85396 | 33.14000 |
| 2021-12-14 00:03:00 | 0.85394 | 0.85398 | 0.85392 | 0.85394 | 40.87000 |
| 2021-12-14 00:04:00 | 0.85395 | 0.85396 | 0.85389 | 0.85396 | 35.29000 |
| ... | ... | ... | ... | ... | ... |
| 2021-12-14 23:55:00 | 0.85085 | 0.85085 | 0.85082 | 0.85082 | 6.44000 |
| 2021-12-14 23:56:00 | 0.85086 | 0.85088 | 0.85085 | 0.85087 | 18.94000 |
| 2021-12-14 23:57:00 | 0.85087 | 0.85089 | 0.85085 | 0.85086 | 22.85000 |
| 2021-12-14 23:58:00 | 0.85089 | 0.85090 | 0.85088 | 0.85089 | 21.24000 |
| 2021-12-14 23:59:00 | 0.85089 | 0.85089 | 0.85086 | 0.85087 | 10.69000 |
1431 rows × 5 columns
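Notice that this day has 1,431 one-minute bars rather than the full 1,440, so a few minutes are missing. A quick way to see how complete each day is (a small sketch, not from the original post) is to count the bars per calendar day:
bars_per_day = data_forex.groupby(data_forex.index.date).size() #number of 1-minute bars per day
print(bars_per_day.describe()) #a complete 24-hour day would have 1440 bars
print(bars_per_day[bars_per_day < 1000]) #days where the market was only partially active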
As shown above, our data is the 1-minute chart, as traders call it. In other words, we have a summary of the price fluctuations within each one-minute interval. What if we want the 5-minute chart? We can simply change the time frame with the lines of code below.
data_forex_5min = data_forex.groupby(pd.Grouper(freq='5min')).agg({'open': 'first',
'high': 'max',
'low': 'min',
'close': 'last',
'volume': 'sum'})
data_forex_5min.columns = ["open", "high", "low", "close", "volume"]
data_forex_5min.head()
| date | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2021-02-01 00:00:00 | 0.88502 | 0.88505 | 0.88496 | 0.88499 | 333.56000 |
| 2021-02-01 00:05:00 | 0.88502 | 0.88513 | 0.88498 | 0.88509 | 174.09000 |
| 2021-02-01 00:10:00 | 0.88510 | 0.88511 | 0.88504 | 0.88507 | 243.92000 |
| 2021-02-01 00:15:00 | 0.88507 | 0.88511 | 0.88505 | 0.88507 | 243.87000 |
| 2021-02-01 00:20:00 | 0.88506 | 0.88507 | 0.88497 | 0.88500 | 185.69000 |
The pandas Grouper has combined the trading data for every 5 minutes and picked the opening, closing, high, and low prices for each 5-minute interval. However, it creates these intervals uniformly across all trading days, which produces some null values for the days when the market was only partially active (the market closes on Friday afternoons).
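As an aside, the same aggregation can also be written with resample, pandas’ more common idiom for time-based regrouping; this is just an equivalent sketch, not a change to the workflow above.
data_forex_5min_alt = data_forex.resample('5min').agg({'open': 'first',
                                                       'high': 'max',
                                                       'low': 'min',
                                                       'close': 'last',
                                                       'volume': 'sum'}) #same result as pd.Grouper(freq='5min')
Either way, we can check whether the uniform 5-minute grid really did introduce null values: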
data_forex_5min['close'].isnull().any()
True
There are several ways to deal with null values, but for now, I will simply drop them.
data_forex_5min = data_forex_5min.dropna() #dropping the null values
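As an alternative to dropping the rows (just a sketch of one of the other options mentioned above, not what this post uses), you could forward-fill the price columns and treat missing volume as zero. Since data_forex_5min has already been reduced by dropna above, the sketch recomputes the 5-minute frame from the 1-minute data first.
data_forex_5min_filled = data_forex.groupby(pd.Grouper(freq='5min')).agg({'open': 'first',
                                                                          'high': 'max',
                                                                          'low': 'min',
                                                                          'close': 'last',
                                                                          'volume': 'sum'}) #fresh 5-minute frame, nulls included
price_cols = ['open', 'high', 'low', 'close']
data_forex_5min_filled[price_cols] = data_forex_5min_filled[price_cols].ffill() #carry the last known prices forward
data_forex_5min_filled['volume'] = data_forex_5min_filled['volume'].fillna(0) #no trades recorded in those intervals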