Cryptocurrency time series feature extraction

In this notebook I follow the methodology from "Visualising Forecasting Algorithm Performance using Time Series Instance Spaces" to generate time series features. I then apply PCA and plot the components in an interactive HoloViews graph to look for interesting cryptocurrencies.

Resources:

In [1]:
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
import rpy2.robjects as robjects
import pandas as pd
import numpy as np
#get R's ts constructor as a python object
ts=robjects.r('ts')

import holoviews as hv
hv.extension('bokeh', 'matplotlib')
#run this so bokeh plots are output when the notebook is downloaded as html
from bokeh.io import output_notebook
output_notebook()

import seaborn as sns
import matplotlib.pyplot as plt
%pylab inline
sns.set(style="whitegrid")

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from datetime import datetime

from coinmarketcap import Market


#load the ForeCA package in R, installing it (and forecast) if needed
try:
    ForeCA=importr('ForeCA')
except Exception:
    #if ForeCA doesn't load, install ForeCA and forecast first
    utils = importr('utils')
    packnames = ('ForeCA','forecast')
    from rpy2.robjects.vectors import StrVector
    utils.install_packages(StrVector(packnames))
    ForeCA=importr('ForeCA')
Populating the interactive namespace from numpy and matplotlib

Getting the Data

This function returns a dictionary whose keys are the crypto ticker symbols and whose values are the cryptocurrency names, for a given number of symbols.

In [2]:
def get_top_x_symbols(n_symbols=500):
    """
    Small function to return a dict of the top n symbols from the coinmarketcap API.
    
    keys are symbols
    values are names of cryptocurrencies
    """
    #math is imported explicitly because math.ceil is used below
    import math
    from coinmarketcap import Market
    coinmarketcap = Market()
    
    n_iter = math.ceil(n_symbols/100)
    start = 0
    limit = 100
    
    symbol_name_dic = {}
    #paginate through results
    for page in range(n_iter):
        dic = coinmarketcap.ticker(start=start, limit=limit, convert='USD')
        #loop through each json block
        for i in dic['data']:
            symbol_name_dic[dic['data'][i]['symbol']] = dic['data'][i]['name']
        
        #count up
        start += 100
        limit += 100
        
        
    return symbol_name_dic
    
    
In [3]:
symbols = get_top_x_symbols()
#symbols
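
The pagination pattern used above can be sketched without touching the network. This is a minimal illustration, not the coinmarketcap client itself: fake_fetch_page is a hypothetical stand-in for the API call, and the page size of 100 is an assumption carried over from the function above.

```python
import math


def paginate(fetch_page, n_items, page_size=100):
    """Collect results by calling fetch_page(start, limit) once per page."""
    results = {}
    n_pages = math.ceil(n_items / page_size)
    for page in range(n_pages):
        # only the offset advances; the page size stays fixed
        start = page * page_size
        results.update(fetch_page(start=start, limit=page_size))
    return results


def fake_fetch_page(start, limit):
    """Hypothetical stand-in for an API call: returns {symbol: name} for one page."""
    return {f"SYM{i}": f"Coin {i}" for i in range(start, start + limit)}


symbols_demo = paginate(fake_fetch_page, n_items=250)
print(len(symbols_demo))  # 300: the request is rounded up to three full pages
```

Because the number of pages is rounded up, asking for 250 symbols returns 300 in this sketch, which mirrors how the function above always works in whole pages.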

This is our workhorse function for making requests to coinmarketcap.com to get historical data for the coins

In [4]:
def get_historical_price_df(cryptocurrency='bitcoin', start_date = '20160501', end_date = datetime.now().strftime('%Y%m%d')):
    
    """Gets historical prices and market cap of a particular cryptocurrency as a pandas df

        Args:
            cryptocurrency: name of the cryptocurrency as it appears in the coinmarketcap URL
            start_date: first date to pull, as a 'YYYYMMDD' string
            end_date: last date to pull, as a 'YYYYMMDD' string (defaults to today)

        Returns:
            historical_price_df: pandas df of historical data for the cryptocurrency, indexed by date
    """
    
    #by default we take from 2016-05-01 up until the current date
    url_string = "https://coinmarketcap.com/currencies/" + cryptocurrency + '/historical-data/?start=' + start_date + '&end=' + end_date
    historical_price_df = pd.read_html(url_string,parse_dates=['Date'])[0]
    historical_price_df = historical_price_df.set_index('Date')
    
    return historical_price_df

get_historical_price_df().head()
Out[4]:
Open High Low Close Volume Market Cap
Date
2018-05-23 8037.08 8054.66 7507.88 7557.82 6491120000 137024000000
2018-05-22 8419.87 8423.25 8004.58 8041.78 5137010000 143534000000
2018-05-21 8522.33 8557.52 8365.12 8418.99 5154990000 145264000000
2018-05-20 8246.99 8562.41 8205.24 8513.25 5191060000 140556000000
2018-05-19 8255.73 8372.06 8183.35 8247.18 4712400000 140689000000
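
One detail worth knowing about the function above: Python evaluates default argument values once, when the function is defined, so the default end_date is fixed at the moment the cell runs. In a short notebook session that is harmless. The sketch below shows one common alternative, resolving the date at call time; build_history_url is a hypothetical helper that only builds the URL and makes no request.

```python
from datetime import datetime

BASE_URL = "https://coinmarketcap.com/currencies/"


def build_history_url(slug, start_date="20160501", end_date=None):
    """Return the historical-data URL for a coin slug and a YYYYMMDD date range."""
    if end_date is None:
        # resolved on every call, so a long-running session still gets today's date
        end_date = datetime.now().strftime("%Y%m%d")
    return f"{BASE_URL}{slug}/historical-data/?start={start_date}&end={end_date}"


url = build_history_url("bitcoin", end_date="20180523")
print(url)
```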

We need to do some manual mapping because of the way the URLs on coinmarketcap work for historical data: some coins use a URL name that differs from both their listed name and their symbol.

In [5]:
#special cases that have different crypto names/symbols than what's found in url of coinmarketcap:
symbols['SXDT'] = 'spectre-dividend'
symbols['GTC'] = 'game'
symbols['IOC'] = 'iocoin'
symbols['ETP'] = 'metaverse'
symbols['MRK'] = 'mark-space'
symbols['TRST'] = 'trust'
symbols['HTML'] = 'html-coin'
symbols['POLY'] = 'polymath-network'
symbols['GBYTE'] = 'byteball'
symbols['DBET'] = 'decent-bet'
symbols['DLT'] = 'agrello-delta'
symbols['RMC'] = 'russian-mining-coin'
symbols['LBC'] = 'library-credit'
symbols['AMB'] = 'amber'
symbols['ECC'] = 'eccoin'
symbols['SAN'] = 'santiment'
symbols['BCN'] = 'bytecoin-bcn'
symbols['GUP'] = 'guppy'
symbols['KICK'] = 'kickico'
symbols['NET'] = 'nimiq'
symbols['ATM'] = 'attention-token-of-media'
symbols['MTN'] = 'medical-chain'
symbols['ELEC'] = 'electrifyasia'
symbols['DDD'] = 'scryinfo'
symbols['BCPT'] = 'blockmason'
symbols['BLT'] = 'bloomtoken'
symbols['NAV'] = 'nav-coin'
symbols['POE'] = 'poet'
symbols['MUSE'] = 'bitshares-music'
symbols['CFI'] = 'cofound-it'
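
The lookup logic can be summarised as "use the override if there is one, otherwise lower-case the name and replace spaces with hyphens". Here is a small sketch of that rule. The helper name_to_slug is hypothetical, and the coin names passed in are illustrative; the notebook itself simply overwrites entries in the symbols dict.

```python
def name_to_slug(symbol, name, overrides=None):
    """Return the URL slug for a coin, preferring a manual override when present."""
    overrides = overrides or {}
    if symbol in overrides:
        return overrides[symbol]
    # default rule: lower-case the listed name and hyphenate the spaces
    return name.lower().replace(" ", "-")


overrides_demo = {"NAV": "nav-coin", "POE": "poet"}
print(name_to_slug("BTC", "Bitcoin", overrides_demo))       # bitcoin
print(name_to_slug("BCH", "Bitcoin Cash", overrides_demo))  # bitcoin-cash
print(name_to_slug("NAV", "NavCoin", overrides_demo))       # nav-coin
```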

This for loop builds a dataframe with a sorted datetime index and the crypto ticker symbols as the columns.

In [6]:
dfs = {}
for i in symbols.items():
    #try by name of cryptocurrency first, then by symbol
    try:
        #spaces in names are replaced by hyphens
        df = get_historical_price_df(cryptocurrency=i[1].lower().replace(' ','-'))
        dfs[i] = df['Close'].sort_index()
    except Exception:
        try:
            df = get_historical_price_df(cryptocurrency=i[0].lower())
            dfs[i] = df['Close'].sort_index()
        except Exception:
            print(i,'not found')
            
#sometimes we get bad entries from the API, so delete these; could retry the data pull in future
bad_keys = []
for i in dfs.items():
    if 'No data was found for the selected time period.' in i[1]:
        #delete dictionary entries with no data
        bad_keys.append(i[0])
        
for i in bad_keys:
    del dfs[i]
            
final_df = pd.DataFrame.from_dict(dfs)
final_df.columns = final_df.columns.droplevel(level=1)
final_df.head()
Out[6]:
$PAC 1ST ABT ACAT ACT ADA ADT ADX AE AEON ... ZAP ZCL ZCO ZEC ZEN ZIL ZOI ZPT ZRX ZSC
Date
2016-05-01 1.200000e-08 NaN NaN NaN NaN NaN NaN NaN NaN 0.009937 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2016-05-02 1.200000e-08 NaN NaN NaN NaN NaN NaN NaN NaN 0.011248 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2016-05-03 1.100000e-08 NaN NaN NaN NaN NaN NaN NaN NaN 0.010092 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2016-05-04 1.200000e-08 NaN NaN NaN NaN NaN NaN NaN NaN 0.010078 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2016-05-05 1.100000e-08 NaN NaN NaN NaN NaN NaN NaN NaN 0.009417 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 496 columns
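
The NaN values at the top of most columns come from index alignment: when a dict of Series is turned into a DataFrame, pandas takes the union of all the date indexes and fills the gaps with NaN. A toy sketch with two made-up series (the symbols AAA and BBB and their prices are invented for illustration):

```python
import pandas as pd

idx_a = pd.date_range("2016-05-01", periods=4, freq="D")
idx_b = pd.date_range("2016-05-03", periods=3, freq="D")

series_by_symbol = {
    "AAA": pd.Series([1.0, 1.1, 1.2, 1.3], index=idx_a),
    "BBB": pd.Series([5.0, 5.5, 6.0], index=idx_b),
}

# rows are the union of both date ranges; missing dates become NaN
aligned = pd.DataFrame(series_by_symbol)
print(aligned)
```

In the notebook the dict keys are (symbol, name) tuples, which gives two-level column labels; that is why the name level is dropped afterwards.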

We'll need to filter out series with less than two full years of data, or else we can't use STL decomposition with a yearly period (frequency 365), which needs more than two complete periods.

In [7]:
column_keep_list = []
for column in final_df:
    if len(final_df[column][final_df[column].notna()].values) > 365*2:
        column_keep_list.append(column)
In [8]:
df = final_df[column_keep_list]
df.head()
Out[8]:
$PAC AEON AMP BAY BCN BITB BITCNY BITUSD BLK BLOCK ... VTC XCP XDN XEM XLM XMR XPM XRP XVG XWC
Date
2016-05-01 1.200000e-08 0.009937 0.045625 0.000463 0.000036 0.000045 0.150680 1.000000 0.029345 0.089024 ... 0.045914 1.27 0.000088 0.001606 0.001883 0.914293 0.085990 0.006791 0.000051 0.000266
2016-05-02 1.200000e-08 0.011248 0.050868 0.000439 0.000034 0.000046 0.150792 0.938603 0.029702 0.071552 ... 0.050688 1.25 0.000086 0.001574 0.001868 0.964395 0.082418 0.006714 0.000049 0.000351
2016-05-03 1.100000e-08 0.010092 0.048197 0.000503 0.000033 0.000046 0.149262 0.946518 0.029422 0.073155 ... 0.046547 1.16 0.000088 0.001578 0.001862 0.908237 0.082778 0.006560 0.000051 0.000342
2016-05-04 1.200000e-08 0.010078 0.050100 0.000478 0.000034 0.000045 0.153581 0.971128 0.029284 0.096075 ... 0.044714 1.10 0.000083 0.001535 0.001865 0.916455 0.080244 0.006323 0.000049 0.000371
2016-05-05 1.100000e-08 0.009417 0.050564 0.000450 0.000032 0.000045 0.152491 0.996995 0.029149 0.108424 ... 0.045264 1.12 0.000079 0.001535 0.001840 0.906841 0.079167 0.006193 0.000049 0.000323

5 rows × 83 columns
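
The same filter can be written without an explicit loop by counting the non-missing values per column. This is a sketch on synthetic data; the column names LONG and SHORT and the threshold variable min_obs are assumptions made for the example.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2016-01-01", periods=1000, freq="D")
prices = pd.DataFrame(
    {
        "LONG": np.linspace(1.0, 2.0, 1000),
        # only the last 400 days have data
        "SHORT": [np.nan] * 600 + list(np.linspace(1.0, 2.0, 400)),
    },
    index=dates,
)

min_obs = 365 * 2
# boolean mask over columns: True where there are enough observations
enough_history = prices.notna().sum() > min_obs
filtered = prices.loc[:, enough_history]
print(list(filtered.columns))  # ['LONG']
```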

Extracting the features for Closing Price of each cryptocurrency series

The function below takes a time series and returns the six features defined in the paper (linked in the docstring):

In [9]:
def tsfeatures(time_series, freq=12):
    
    """Generates features from a time series
    (https://www.monash.edu/business/econometrics-and-business-statistics/research/publications/ebs/wp10-16.pdf)
    
        
        Args:
            time_series: pandas Series of observations (leading/trailing NaNs are trimmed)
            freq: frequency of time series (12 is monthly)

        Returns:
             tuple of (Entropy, Trend_stength, Seasonal_strength, Period, ACF1, optim_lambda)

    """    
    #find the start of the time series
    start_ts = time_series[time_series.notna()].index[0]
    #find the end of the time series
    end_ts = time_series[time_series.notna()].index[-1]
    #extract actual time series
    time_series = time_series.loc[start_ts:end_ts]
    #interpolate any missing values
    time_series = time_series.interpolate()
    #converts to ts object in R
    time_series_R = robjects.FloatVector(time_series)
    rdata=ts(time_series_R,frequency=freq)

        
    rstring="""
     function(rdata){
     library(forecast)
     library(ForeCA)

     stl_decomp_res = stl(rdata, s.window = "periodic")
     trend = stl_decomp_res$time.series[,2]
     seasonal = stl_decomp_res$time.series[,1]
     remainder = stl_decomp_res$time.series[,3]

     detrend_ts = rdata - trend
     deseasonal_ts = rdata - seasonal

     #Entropy
     F1 = spectral_entropy(na.contiguous(rdata))[1L]
     #Strength_Trend
     F2 = 1- (var(remainder)/(var(deseasonal_ts)))
     #Strength_Seasonal
     F3 = 1- (var(remainder)/(var(detrend_ts)))
     #Period
     F4 = frequency(rdata)
     #First Order Autocorrelation
     F5_temp = Acf(rdata)
     F5 = F5_temp$acf[2]
     #optimal lambda parameter for transformation
     F6 = BoxCox.lambda(rdata,lower=0,upper=1)

     return(list(Entropy=F1,Trend=F2,Seasonal=F3,Period=F4,ACF1=F5,lambda=F6))
     }
    """
    
    rfunc=robjects.r(rstring)
    #computes the six features in R
    Entropy,Trend_stength,Seasonal_strength,Period,ACF1,optim_lambda=rfunc(rdata)
    #convert to python objects
    Entropy = pandas2ri.ri2py(Entropy)
    Trend_stength = pandas2ri.ri2py(Trend_stength)
    Seasonal_strength = pandas2ri.ri2py(Seasonal_strength)
    Period = pandas2ri.ri2py(Period)
    ACF1 = pandas2ri.ri2py(ACF1)
    optim_lambda = pandas2ri.ri2py(optim_lambda)

    return Entropy[0],Trend_stength[0],Seasonal_strength[0],Period[0],ACF1[0],optim_lambda[0]
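
To build some intuition for two of these features, here is a rough NumPy-only sketch of lag 1 autocorrelation and a simplified spectral entropy. It is not what ForeCA computes: the entropy here uses a raw periodogram rather than a smoothed spectral estimate, so the numbers will differ from the R output. The function names and the synthetic series are assumptions for the example.

```python
import numpy as np


def lag1_autocorrelation(x):
    """Sample autocorrelation at lag 1."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float(np.sum(centered[1:] * centered[:-1]) / np.sum(centered ** 2))


def spectral_entropy(x):
    """Shannon entropy of the normalized raw periodogram, scaled to [0, 1]."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    power = np.abs(np.fft.rfft(centered)) ** 2
    power = power[1:]  # drop the zero-frequency term
    p = power / power.sum()
    p = p[p > 0]  # avoid log(0)
    return float(-np.sum(p * np.log(p)) / np.log(len(power)))


rng = np.random.default_rng(0)
noise = rng.normal(size=2000)   # hard to forecast
random_walk = np.cumsum(noise)  # strongly autocorrelated, price-like

for label, series in [("noise", noise), ("random walk", random_walk)]:
    print(label, lag1_autocorrelation(series), spectral_entropy(series))
```

White noise should come out with autocorrelation near 0 and entropy near 1, while the random walk should show autocorrelation near 1 and a much lower entropy, which is the same contrast seen later between returns and raw prices.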

Now we'll loop through each column in the dataframe of cryptocurrencies, extract the time series features using the function above, and append each feature for each series to a list.

In [10]:
F1 = []
F2 = []
F3 = []
F4 = []
F5 = []
F6 = []
index_ts = []
for i in df:
    try:
        Entropy,Trend_stength,Seasonal_strength,Period,ACF1,optim_lambda = tsfeatures(df[i], freq=365)
        F1.append(Entropy)
        F2.append(Trend_stength)
        F3.append(Seasonal_strength)
        F4.append(Period)
        F5.append(ACF1)
        F6.append(optim_lambda)
        index_ts.append(i)
    except:
        print('error: ',i)

Now we can combine all these lists of features into a dataframe which has the features as columns and the coins as the row index

In [11]:
#all time series have same period here, so won't use here, as will become 0 in princomp analysis
df_features = pd.DataFrame(
    {'Entropy': F1,
     'Trend_stength': F2,
     'Seasonal_strength': F3,
     #'Period': F4,
     'ACF1': F5,
     'optim_lambda': F6
    },
    index=index_ts)
df_features.head()
Out[11]:
ACF1 Entropy Seasonal_strength Trend_stength optim_lambda
$PAC 0.992199 0.593229 0.500636 0.571272 0.000066
AEON 0.988175 0.439630 0.524733 0.732923 0.025587
AMP 0.983993 0.519536 0.391988 0.512149 0.000066
BAY 0.977061 0.510515 0.494084 0.589987 0.171018
BCN 0.925474 0.500465 0.451360 0.687967 0.366804

Below we use seaborn's PairGrid to plot scatterplots and density plots of the features across all the time series.

In [12]:
scatter_graph = sns.PairGrid(df_features, diag_sharey=False,size=2)
scatter_graph.map_lower(sns.kdeplot, cmap="Blues")
scatter_graph.map_upper(plt.scatter)
scatter_graph.map_diag(sns.kdeplot, shade=True,lw=3);

We can see that, similar to the research paper, there is correlation between trend strength and entropy, between lag 1 autocorrelation and entropy, and possibly a slight correlation between seasonal strength and entropy. This is intuitive: lower entropy suggests a more forecastable series, and high trend strength, seasonal strength, or lag 1 autocorrelation also suggests a series that is easier to forecast.

Performing PCA on the features

Now we can use principal component analysis (PCA) to reduce this feature space to two components that can be plotted in 2-D and viewed in a more sensible way.

In [13]:
pca = PCA(n_components=2)
pca.fit(df_features)

explained_ratios = pca.explained_variance_ratio_
print('PC1 variance explained: ', round(explained_ratios[0],4)*100,'%','\n', 
      'PC2 variance explained: ',round(explained_ratios[1],4)*100,'%')
PC1 variance explained:  52.56999999999999 % 
 PC2 variance explained:  40.300000000000004 %
In [14]:
df_PCA = pca.transform(df_features)
df_PCA[:5]
Out[14]:
array([[-0.15476091,  0.17618149],
       [-0.16914364, -0.02879959],
       [-0.14814567,  0.18691083],
       [ 0.00309484,  0.09693518],
       [ 0.18581921, -0.01336701]])
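
Note that PCA here is fitted on the raw features, even though StandardScaler is imported at the top. PCA is sensitive to scale, so a feature with a much larger variance can dominate the first component. Whether to standardize is a judgment call; the NumPy sketch below only illustrates the effect on synthetic data and is not a rerun of the analysis above. The function name and the two made-up features are assumptions.

```python
import numpy as np


def pca_explained_variance_ratio(X, standardize=False):
    """Explained variance ratio of each principal component, computed via SVD."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)  # PCA always centers the data
    if standardize:
        X = X / X.std(axis=0)  # put every feature on unit variance
    singular_values = np.linalg.svd(X, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


rng = np.random.default_rng(1)
# two independent synthetic features on very different scales
features_demo = np.column_stack([
    rng.normal(scale=1.0, size=200),
    rng.normal(scale=0.01, size=200),
])

raw_ratio = pca_explained_variance_ratio(features_demo)
scaled_ratio = pca_explained_variance_ratio(features_demo, standardize=True)
print("raw:   ", raw_ratio)
print("scaled:", scaled_ratio)
```

On the raw data almost all of the variance lands on the first component simply because one feature has a larger scale; after standardizing, the two independent features share the variance roughly evenly.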

Now we'll add the principal component scores for each row back to the features dataframe

In [15]:
df_plotting = df_features.copy()
df_plotting['PC1'] = df_PCA[:,0]
df_plotting['PC2'] = df_PCA[:,1]
#we are going to take the absolute value of the autocorrelation at lag 1, 
#because for plotting we don't really care about the direction, only the strength
df_plotting['ACF1'] = abs(df_plotting['ACF1'].values)
df_plotting.head()
Out[15]:
ACF1 Entropy Seasonal_strength Trend_stength optim_lambda PC1 PC2
$PAC 0.992199 0.593229 0.500636 0.571272 0.000066 -0.154761 0.176181
AEON 0.988175 0.439630 0.524733 0.732923 0.025587 -0.169144 -0.028800
AMP 0.983993 0.519536 0.391988 0.512149 0.000066 -0.148146 0.186911
BAY 0.977061 0.510515 0.494084 0.589987 0.171018 0.003095 0.096935
BCN 0.925474 0.500465 0.451360 0.687967 0.366804 0.185819 -0.013367

This is a plotting detail for HoloViews: we'll reset the index so we can select the time series as a column

In [16]:
df_plotting_no_index = df_plotting.reset_index()
df_plotting_no_index.rename(columns={'index':'time_series'}, inplace=True)
df_plotting_no_index.head()
Out[16]:
time_series ACF1 Entropy Seasonal_strength Trend_stength optim_lambda PC1 PC2
0 $PAC 0.992199 0.593229 0.500636 0.571272 0.000066 -0.154761 0.176181
1 AEON 0.988175 0.439630 0.524733 0.732923 0.025587 -0.169144 -0.028800
2 AMP 0.983993 0.519536 0.391988 0.512149 0.000066 -0.148146 0.186911
3 BAY 0.977061 0.510515 0.494084 0.589987 0.171018 0.003095 0.096935
4 BCN 0.925474 0.500465 0.451360 0.687967 0.366804 0.185819 -0.013367

Plotting the features for Closing Price across the instance space

Now we create a dictionary where the keys are the feature names, and the values are the individual PCA plots for the features with each time series as a point

In [17]:
scatter_dic = {}
for feature in df_features.columns:
    #print(feature)
    scatter_plot_x = hv.Scatter(df_plotting_no_index, 'PC1',['PC2','time_series',feature],label=feature)
    scatter_plot_x = scatter_plot_x.options(width=300,height=200,tools=['hover'],color_index=feature,cmap='RdYlGn',colorbar=True,size=5,show_grid=True)
    scatter_dic[feature] = scatter_plot_x
    

Finally, we can plot whichever features we want from our dict onto a holoviews interactive chart. These graphs show the distribution of features across the instance space. You can hover over the individual points to see which cryptocurrency you are looking at.

Series towards the bottom left are easier to forecast, with low entropy and high trend strength, seasonal strength, and lag 1 autocorrelation.

In [18]:
feature_scatter = (scatter_dic['ACF1'] + scatter_dic['Entropy'] + scatter_dic['Seasonal_strength'] + 
 scatter_dic['Trend_stength'] + scatter_dic['optim_lambda']).cols(2)
feature_scatter
Out[18]:

It's no surprise to see BTC at the bottom left, since it's the most established coin, with a lot of data on historical price to look at

In [19]:
df[['BTC','BITUSD','LEO']].plot(subplots=True,title='Closing Price of BTC,BITUSD,LEO over time in USD',figsize=(10,8));

Extracting the features for Returns of each cryptocurrency series

Now we'll do the same analysis on daily returns, since forecasting returns is more interesting than forecasting the price of a coin

In [20]:
returns_df = df.pct_change()
returns_df.head()
Out[20]:
$PAC AEON AMP BAY BCN BITB BITCNY BITUSD BLK BLOCK ... VTC XCP XDN XEM XLM XMR XPM XRP XVG XWC
Date
2016-05-01 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2016-05-02 0.000000 0.131931 0.114915 -0.051836 -0.055556 0.022222 0.000743 -0.061397 0.012166 -0.196262 ... 0.103977 -0.015748 -0.022727 -0.019925 -0.007966 0.054799 -0.041540 -0.011339 -0.039216 0.319549
2016-05-03 -0.083333 -0.102774 -0.052508 0.145786 -0.029412 0.000000 -0.010146 0.008433 -0.009427 0.022403 ... -0.081696 -0.072000 0.023256 0.002541 -0.003212 -0.058231 0.004368 -0.022937 0.040816 -0.025641
2016-05-04 0.090909 -0.001387 0.039484 -0.049702 0.030303 -0.021739 0.028936 0.026001 -0.004690 0.313307 ... -0.039380 -0.051724 -0.056818 -0.027250 0.001611 0.009048 -0.030612 -0.036128 -0.039216 0.084795
2016-05-05 -0.083333 -0.065588 0.009261 -0.058577 -0.058824 0.000000 -0.007097 0.026636 -0.004610 0.128535 ... 0.012300 0.018182 -0.048193 0.000000 -0.013405 -0.010490 -0.013422 -0.020560 0.000000 -0.129380

5 rows × 83 columns
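
For reference, pct_change computes the simple return r_t = p_t / p_(t-1) - 1, which is why the first row is always NaN. A tiny sketch with made-up prices:

```python
import pandas as pd

close = pd.Series(
    [100.0, 110.0, 99.0],
    index=pd.date_range("2016-05-01", periods=3, freq="D"),
)

returns = close.pct_change()
# the same calculation written out by hand
manual = close / close.shift(1) - 1

print(returns)
```

The price goes up 10% and then down 10%, yet ends at 99 rather than 100, a reminder that simple returns do not add up symmetrically.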

In [21]:
F1 = []
F2 = []
F3 = []
F4 = []
F5 = []
F6 = []
index_ts = []
for i in returns_df:
    try:
        Entropy,Trend_stength,Seasonal_strength,Period,ACF1,optim_lambda = tsfeatures(returns_df[i], freq=365)
        F1.append(Entropy)
        F2.append(Trend_stength)
        F3.append(Seasonal_strength)
        F4.append(Period)
        F5.append(ACF1)
        F6.append(optim_lambda)
        index_ts.append(i)
    except:
        print('error: ',i)
In [22]:
#all time series have same period here, so won't use here, as will become 0 in princomp analysis
df_features = pd.DataFrame(
    {'Entropy': F1,
     'Trend_stength': F2,
     'Seasonal_strength': F3,
     #'Period': F4,
     'ACF1': F5,
     'optim_lambda': F6
    },
    index=index_ts)
df_features.head()
Out[22]:
ACF1 Entropy Seasonal_strength Trend_stength optim_lambda
$PAC -0.001845 0.999976 0.500556 0.005920 0.000066
AEON -0.063987 0.986756 0.482791 0.001990 0.661200
AMP -0.029132 0.988044 0.518090 0.002758 0.999934
BAY 0.040579 0.984648 0.521432 0.011581 0.999934
BCN -0.056184 0.980589 0.428525 0.005559 0.800159
In [23]:
scatter_graph = sns.PairGrid(df_features, diag_sharey=False,size=2)
scatter_graph.map_lower(sns.kdeplot, cmap="Blues")
scatter_graph.map_upper(plt.scatter)
scatter_graph.map_diag(sns.kdeplot, shade=True,lw=3);
In [24]:
pca = PCA(n_components=2)
pca.fit(df_features)
df_PCA = pca.transform(df_features)

explained_ratios = pca.explained_variance_ratio_
print('PC1 variance explained: ', round(explained_ratios[0],4)*100,'%','\n', 
      'PC2 variance explained: ',round(explained_ratios[1],3)*100,'%')
PC1 variance explained:  92.91 % 
 PC2 variance explained:  6.6000000000000005 %
In [25]:
df_plotting = df_features.copy()
df_plotting['PC1'] = df_PCA[:,0]
df_plotting['PC2'] = df_PCA[:,1]
#we are going to take the absolute value of the autocorrelation at lag 1, 
#because for plotting we don't really care about the direction, only the strength
df_plotting['ACF1'] = abs(df_plotting['ACF1'].values)

df_plotting_no_index = df_plotting.reset_index()
df_plotting_no_index.rename(columns={'index':'time_series'}, inplace=True)
df_plotting_no_index.head()
Out[25]:
time_series ACF1 Entropy Seasonal_strength Trend_stength optim_lambda PC1 PC2
0 $PAC 0.001845 0.999976 0.500556 0.005920 0.000066 0.667272 -0.103818
1 AEON 0.063987 0.986756 0.482791 0.001990 0.661200 0.011952 0.003180
2 AMP 0.029132 0.988044 0.518090 0.002758 0.999934 -0.328388 -0.008320
3 BAY 0.040579 0.984648 0.521432 0.011581 0.999934 -0.333078 -0.077918
4 BCN 0.056184 0.980589 0.428525 0.005559 0.800159 -0.127136 0.004397
In [26]:
scatter_dic = {}
for feature in df_features.columns:
    #print(feature)
    scatter_plot_x = hv.Scatter(df_plotting_no_index, 'PC1',['PC2','time_series',feature],label=feature)
    scatter_plot_x = scatter_plot_x.options(width=300,height=200,tools=['hover'],color_index=feature,cmap='RdYlGn',colorbar=True,size=5,show_grid=True)
    scatter_dic[feature] = scatter_plot_x
    
feature_scatter = (scatter_dic['ACF1'] + scatter_dic['Entropy'] + scatter_dic['Seasonal_strength'] + 
 scatter_dic['Trend_stength'] + scatter_dic['optim_lambda']).cols(2)
feature_scatter
Out[26]:

As you might expect, the returns of a cryptocurrency are extremely difficult to forecast: none has a spectral entropy below 0.9, and the lag 1 autocorrelations and trend strengths are much lower than for the raw closing price data.

One interesting note is that DimeCoin is an anomaly in this data, with one of the highest relative trend strengths and lowest entropy. I'm not sure exactly why this is the case, but from a quick glance at the coin's website, the coin does not have a max supply. In fact it is inflationary, so perhaps this affects the trend of the returns. Let's take a look below.

In [27]:
print('summed inter-day percent returns for BTC: ',returns_df['BTC'].sum())
print('summed inter-day percent returns for ETH: ',returns_df['ETH'].sum())
print('summed inter-day percent returns for XRP: ',returns_df['XRP'].sum())
print('summed inter-day percent returns for SAFEX: ',returns_df['SAFEX'].sum())
print('summed inter-day percent returns for BITUSD: ',returns_df['BITUSD'].sum())
print('summed inter-day percent returns for DIME: ',returns_df['DIME'].sum())

returns_df[['DIME','BTC','ETH','XRP','SAFEX','BITUSD']].plot(subplots=True,title='Percent Returns for DIME and other cryptocurrencies',figsize=(10,12));
returns_df[['DIME','BTC','ETH','XRP','SAFEX','BITUSD']].plot(kind="hist",bins=100,subplots=True,title='Percent Returns Distribution for DIME and other cryptocurrencies',figsize=(10,12));
summed inter-day percent returns for BTC:  3.558026986553376
summed inter-day percent returns for ETH:  5.792606414178634
summed inter-day percent returns for XRP:  7.8092982745607
summed inter-day percent returns for SAFEX:  21.757847554128794
summed inter-day percent returns for BITUSD:  1.1496666064952974
summed inter-day percent returns for DIME:  336.3137364214372

The returns do not seem to be normally distributed around 0, with early 2018 seeing a large increase, and some outliers in the right tail of the distribution.

I think this may be an artifact of small market cap coins that had a big pop in late 2017 and early 2018. We can see this if we take the 5 coins with the highest summed returns. It's possible these coins haven't been observed over a long enough time scale to have normally distributed returns. We tried to filter these coins out earlier, but some coins have long left tails with prices very close to 0, and it's hard to know if this is their actual price or just an anomaly in the data collection process. Without diving individually into each coin and when it was officially launched, it's hard to know.
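
A caveat when reading summed daily percent returns: the sum of simple returns is not the total return over the period, and it tends to be inflated for very volatile series. The sketch below uses a made-up price path that ends exactly where it started; the variable names are assumptions for the example.

```python
import numpy as np

# a price that doubles, halves, doubles, halves: it ends where it began
prices = np.array([1.0, 2.0, 1.0, 2.0, 1.0])

simple_returns = prices[1:] / prices[:-1] - 1
summed = simple_returns.sum()               # adds +100%, -50%, +100%, -50%
compounded = np.prod(1 + simple_returns) - 1  # true return over the whole path
log_sum = np.log(prices[1:] / prices[:-1]).sum()  # log returns do add up

print(summed, compounded, log_sum)  # 1.0 0.0 0.0
```

So a very large summed return for a small, volatile coin may partly reflect big swings in both directions rather than a correspondingly large net gain.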

In [28]:
top_5_return_coins = np.array(returns_df.sum().sort_values(ascending=False)[:5].index)
top_5_return_coins
Out[28]:
array(['NYC', '$PAC', 'DIME', 'ECC', 'MOON'], dtype=object)
In [29]:
df[top_5_return_coins].plot(subplots=True,title='Top 5 coins by percent return: Closing price over time in USD',figsize=(10,10));

Analyzing Closing Price in Bitcoin terms

All the analysis we've done up to this point has been in terms of USD, but as you may know, a lot of these coin prices can look very different on a BTC price scale. Since many altcoins still follow movements in BTC relatively closely, it would be interesting to repeat the same analysis with prices expressed in Bitcoin instead of USD. We can do this by using the .div method to divide each row of our dataframe by the BTC column

In [30]:
df_BTC = df.div(df['BTC'].values,axis='rows')
df_BTC.head()
Out[30]:
$PAC AEON AMP BAY BCN BITB BITCNY BITUSD BLK BLOCK ... VTC XCP XDN XEM XLM XMR XPM XRP XVG XWC
Date
2016-05-01 2.655572e-11 0.000022 0.000101 1.024608e-06 7.966717e-08 9.958396e-08 0.000333 0.002213 0.000065 0.000197 ... 0.000102 0.002810 1.947420e-07 0.000004 0.000004 0.002023 0.000190 0.000015 1.128618e-07 5.886519e-07
2016-05-02 2.698630e-11 0.000025 0.000114 9.872490e-07 7.646120e-08 1.034475e-07 0.000339 0.002111 0.000067 0.000161 ... 0.000114 0.002811 1.934018e-07 0.000004 0.000004 0.002169 0.000185 0.000015 1.101941e-07 7.893494e-07
2016-05-03 2.442816e-11 0.000022 0.000107 1.117033e-06 7.328448e-08 1.021541e-07 0.000331 0.002102 0.000065 0.000162 ... 0.000103 0.002576 1.954253e-07 0.000004 0.000004 0.002017 0.000184 0.000015 1.132578e-07 7.594937e-07
2016-05-04 2.686246e-11 0.000023 0.000112 1.070021e-06 7.611032e-08 1.007342e-07 0.000344 0.002174 0.000066 0.000215 ... 0.000100 0.002462 1.857987e-07 0.000003 0.000004 0.002052 0.000180 0.000014 1.096884e-07 8.304979e-07
2016-05-05 2.455467e-11 0.000021 0.000113 1.004509e-06 7.143176e-08 1.004509e-07 0.000340 0.002226 0.000065 0.000242 ... 0.000101 0.002500 1.763472e-07 0.000003 0.000004 0.002024 0.000177 0.000014 1.093799e-07 7.210143e-07

5 rows × 83 columns
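
A small sketch of what the division does, on made-up prices (the ALT column and all values are invented). It also shows a side effect: BTC priced in BTC is always 1, so its BTC-denominated returns are all 0. A constant series has zero variance, which is probably why BTC fails feature extraction in the BTC-denominated run further down.

```python
import pandas as pd

usd = pd.DataFrame(
    {"BTC": [100.0, 200.0, 400.0], "ALT": [1.0, 2.0, 2.0]},
    index=pd.date_range("2016-05-01", periods=3, freq="D"),
)

# divide every column by the BTC price on the same date
in_btc = usd.div(usd["BTC"], axis="index")
btc_returns = in_btc.pct_change().iloc[1:]

print(in_btc)
print(btc_returns)
```

ALT doubles in USD on the second day but is flat in BTC terms, and then halves in BTC terms on the third day even though its USD price did not move.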

What we'd be interested in seeing here is if any coin's price has been rising relative to BTC price over time.

To see this, let's take the top 10 coins (plus BTC itself, which is always 1) by summed BTC-denominated price over the last 30 days and then plot them in Bitcoin terms

In [31]:
top_10_by_sum_price = np.array(df_BTC.iloc[-30:,:].sum().sort_values(ascending=False)[:11].index)
top_10_by_sum_price
Out[31]:
array(['BTC', 'ETH', 'DASH', 'DGD', 'XMR', 'LTC', 'UNO', 'BTCD', 'DCR',
       'SLS', 'OMNI'], dtype=object)
In [32]:
df_BTC[top_10_by_sum_price].plot(title='top 10 Coins: Closing Price relative to BTC price over time',figsize=(8,5));
df_BTC[top_10_by_sum_price[1:]].plot(title='top 10 Coins: Closing Price relative to BTC price over time',figsize=(8,5));

You can see that at some points in time, coins like Ethereum have gained value in Bitcoin terms, but generally coins decrease or remain constant in BTC value over time. However, the trend around April 2017 is interesting, because ETH and DASH jumped a level in BTC value and have remained relatively constant around that level since. Time will tell if it stays that way given the increasing ease of buying other coins directly with USD

Feature Analysis with Returns in BTC

We're now going to do the same analysis as before, but in terms of BTC return now

In [33]:
returns_df_BTC = df_BTC.pct_change()
#remove first row
returns_df_BTC = returns_df_BTC.iloc[1:,:]
returns_df_BTC.head()
Out[33]:
$PAC AEON AMP BAY BCN BITB BITCNY BITUSD BLK BLOCK ... VTC XCP XDN XEM XLM XMR XPM XRP XVG XWC
Date
2016-05-02 0.016214 0.150285 0.132993 -0.036462 -0.040242 0.038797 0.016970 -0.046178 0.028577 -0.183230 ... 0.121877 0.000211 -0.006882 -0.004034 0.008119 0.071901 -0.025999 0.004692 -0.023637 0.340944
2016-05-03 -0.094794 -0.113992 -0.064355 0.131460 -0.041547 -0.012503 -0.022522 -0.004175 -0.021812 0.009620 ... -0.093177 -0.083603 0.010462 -0.009993 -0.015675 -0.070006 -0.008189 -0.035153 0.027803 -0.037823
2016-05-04 0.099652 0.006616 0.047814 -0.042086 0.038560 -0.013899 0.037182 0.034223 0.003286 0.323832 ... -0.031681 -0.044125 -0.049260 -0.019454 0.009638 0.017135 -0.022843 -0.028404 -0.031516 0.093489
2016-05-05 -0.085912 -0.068217 0.006423 -0.061225 -0.061471 -0.002813 -0.009890 0.023748 -0.007410 0.125361 ... 0.009453 0.015318 -0.050870 -0.002813 -0.016180 -0.013274 -0.016196 -0.023315 -0.002813 -0.131829
2016-05-06 -0.211365 -0.017313 -0.106188 -0.001456 0.005177 0.083019 -0.018475 -0.022345 -0.009667 0.222118 ... -0.043371 0.018231 0.061084 0.009642 -0.031110 -0.056857 -0.018684 0.007140 -0.084959 -0.022265

5 rows × 83 columns

In [34]:
F1 = []
F2 = []
F3 = []
F4 = []
F5 = []
F6 = []
index_ts = []
for i in returns_df_BTC:
    try:
        Entropy,Trend_stength,Seasonal_strength,Period,ACF1,optim_lambda = tsfeatures(returns_df_BTC[i], freq=365)
        F1.append(Entropy)
        F2.append(Trend_stength)
        F3.append(Seasonal_strength)
        F4.append(Period)
        F5.append(ACF1)
        F6.append(optim_lambda)
        index_ts.append(i)
    except:
        print('error: ',i)
        
#all time series have same period here, so won't use here, as will become 0 in princomp analysis
df_features_BTC = pd.DataFrame(
    {'Entropy': F1,
     'Trend_stength': F2,
     'Seasonal_strength': F3,
     #'Period': F4,
     'ACF1': F5,
     'optim_lambda': F6
    },
    index=index_ts)
df_features_BTC.head()
error:  BTC
Out[34]:
ACF1 Entropy Seasonal_strength Trend_stength optim_lambda
$PAC -0.001668 0.999980 0.500449 0.005925 0.000066
AEON -0.062367 0.985983 0.476802 0.001579 0.662381
AMP 0.001175 0.990853 0.530192 0.000830 0.999934
BAY 0.034377 0.985683 0.494087 0.005782 0.999934
BCN -0.065953 0.979393 0.420715 0.003728 0.827699
In [35]:
scatter_graph = sns.PairGrid(df_features_BTC, diag_sharey=False,size=2)
scatter_graph.map_lower(sns.kdeplot, cmap="Blues")
scatter_graph.map_upper(plt.scatter)
scatter_graph.map_diag(sns.kdeplot, shade=True,lw=3);
In [36]:
pca = PCA(n_components=2)
pca.fit(df_features_BTC)
df_PCA = pca.transform(df_features_BTC)

explained_ratios = pca.explained_variance_ratio_
print('PC1 variance explained: ', round(explained_ratios[0],4)*100,'%','\n', 
      'PC2 variance explained: ',round(explained_ratios[1],4)*100,'%')

df_plotting = df_features_BTC.copy()
df_plotting['PC1'] = df_PCA[:,0]
df_plotting['PC2'] = df_PCA[:,1]
#we are going to take the absolute value of the autocorrelation at lag 1, 
#because for plotting we don't really care about the direction, only the strength
df_plotting['ACF1'] = abs(df_plotting['ACF1'].values)

df_plotting_no_index = df_plotting.reset_index()
df_plotting_no_index.rename(columns={'index':'time_series'}, inplace=True)
df_plotting_no_index.head()
PC1 variance explained:  91.93 % 
 PC2 variance explained:  7.28 %
Out[36]:
time_series ACF1 Entropy Seasonal_strength Trend_stength optim_lambda PC1 PC2
0 $PAC 0.001668 0.999980 0.500449 0.005925 0.000066 0.716588 0.085274
1 AEON 0.062367 0.985983 0.476802 0.001579 0.662381 0.058550 -0.011531
2 AMP 0.001175 0.990853 0.530192 0.000830 0.999934 -0.281511 0.030646
3 BAY 0.034377 0.985683 0.494087 0.005782 0.999934 -0.283627 0.065239
4 BCN 0.065953 0.979393 0.420715 0.003728 0.827699 -0.106768 -0.022063
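
One thing worth checking for the PCA above: as far as this section shows, the features go into PCA unscaled, so a feature with a much larger spread can dominate PC1. In the rows shown, optim_lambda spans nearly 0 to 1 while Trend_stength stays below 0.01. The following is a minimal sketch, on synthetic data rather than df_features_BTC, of how standardising with the already imported StandardScaler changes the explained variance ratios. The column names and distributions are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200
# synthetic stand-in for a feature table (NOT the notebook's data):
# one wide-ranging feature and two narrow ones, all independent
toy_features = pd.DataFrame({
    'wide_feature': rng.uniform(0, 1, n),
    'narrow_a': rng.normal(0.5, 0.01, n),
    'narrow_b': rng.normal(0.99, 0.005, n),
})

def pc_ratios(X):
    """Return the explained variance ratios of the first two components."""
    return PCA(n_components=2).fit(X).explained_variance_ratio_

raw_ratios = pc_ratios(toy_features)
scaled_ratios = pc_ratios(StandardScaler().fit_transform(toy_features))

print('unscaled PC1/PC2:', raw_ratios.round(4))
print('scaled   PC1/PC2:', scaled_ratios.round(4))
```

If the scaled ratios on the real features look very different from the unscaled ones, the components plotted below are probably tracking the widest feature rather than a mix of all five.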
In [37]:
scatter_dic = {}
for feature in df_features_BTC.columns:
    #print(feature)
    scatter_plot_x = hv.Scatter(df_plotting_no_index, 'PC1',['PC2','time_series',feature],label=feature)
    scatter_plot_x = scatter_plot_x.options(width=300,height=200,tools=['hover'],color_index=feature,cmap='RdYlGn',colorbar=True,size=5,show_grid=True)
    scatter_dic[feature] = scatter_plot_x
    
feature_scatter = (scatter_dic['ACF1'] + scatter_dic['Entropy'] + scatter_dic['Seasonal_strength'] + 
 scatter_dic['Trend_stength'] + scatter_dic['optim_lambda']).cols(2)
feature_scatter
Out[37]:
In [38]:
print('summed inter-day percent returns for BTC: ',returns_df_BTC['BTC'].sum())
print('summed inter-day percent returns for ETH: ',returns_df_BTC['ETH'].sum())
print('summed inter-day percent returns for XRP: ',returns_df_BTC['XRP'].sum())
print('summed inter-day percent returns for SAFEX: ',returns_df_BTC['SAFEX'].sum())
print('summed inter-day percent returns for BITUSD: ',returns_df_BTC['BITUSD'].sum())
print('summed inter-day percent returns for DIME: ',returns_df_BTC['DIME'].sum())

returns_df_BTC[['DIME','BTC','ETH','XRP','SAFEX','BITUSD']].plot(subplots=True,title='Percent Returns for DIME and other cryptocurrencies',figsize=(10,12));
returns_df_BTC[['DIME','BTC','ETH','XRP','SAFEX','BITUSD']].plot(kind="hist",bins=100,subplots=True,title='Percent Returns Distribution for DIME and other cryptocurrencies',figsize=(10,12));
summed inter-day percent returns for BTC:  0.0
summed inter-day percent returns for ETH:  2.7529019223345004
summed inter-day percent returns for XRP:  4.9369467527694155
summed inter-day percent returns for SAFEX:  18.77286915638824
summed inter-day percent returns for BITUSD:  -1.032585436467464
summed inter-day percent returns for DIME:  329.20120971044344
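
Note that these figures are sums of daily percent changes, which is not the same as the return from holding a coin, because simple returns compound multiplicatively. The following is a minimal sketch with made-up numbers, assuming the returns are simple returns expressed as fractions (0.5 for +50%), which is how pandas' pct_change reports them.

```python
import pandas as pd

# hypothetical daily simple returns: +100% then -50%,
# so the price ends exactly where it started
toy_returns = pd.Series([1.0, -0.5])

# adding the daily changes suggests a gain
summed_return = toy_returns.sum()

# compounding the daily changes gives the actual holding-period return
compounded_return = (1 + toy_returns).prod() - 1

print('summed:', summed_return)
print('compounded:', compounded_return)
```

A volatile coin can therefore show a large summed return even when its price ends near where it began, which may be part of why spiky series rank so highly here.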
In [39]:
top_5_return_coins = np.array(returns_df_BTC.sum().sort_values(ascending=False)[:5].index)
top_5_return_coins
df_BTC[top_5_return_coins].plot(subplots=True,title='Top 5 coins by percent return: Closing price over time in BTC',figsize=(10,10));
In [40]:
top_5_return_coins = np.append(top_5_return_coins,'BTC')
df_BTC[top_5_return_coins].plot(title='top 5 Coins by returns: Closing Price relative to BTC price over time',figsize=(8,5));
df_BTC[top_5_return_coins[:-1]].plot(title='top 5 Coins by returns: Closing Price relative to BTC price over time (BTC excluded)',figsize=(8,5));

The preceding graphs show that the analysis looks much the same whether prices are expressed in BTC or in USD. I think the only reason PacCoin ranks so high is that it used to be one of the lowest market cap coins, and some YouTubers pushed it as something to buy for the return multiple because it was the lowest-priced coin at the time. That turned out to be true if you were lucky enough to buy before around Jan 2018, when its price spiked.

This may be the case for many of these low market cap coins over a short time horizon: pump and dumps shoot the price up and produce large returns, and some residual value remains from those left holding the bags and hoping the price will recover. Over time, though, the price drops back down dramatically, as it did with PacCoin.

Conclusion

In this post, I used time series feature extraction followed by principal components analysis as a data exploration tool to find interesting cryptocurrencies to look at in terms of price or returns. We saw that over a shorter time horizon some coins have seen their price grow dramatically, possibly due to pump and dump schemes, followed by a rapidly declining residual value.

We also saw what others have found: almost all cryptocurrencies do not increase in BTC terms over time, since BTC is still the main gateway into cryptocurrencies. However, a few cryptocurrencies have recently become marginally more valuable relative to BTC. This is also reflected here at the bottom of the page, which shows BTC's percentage of total market cap dominance. These two facts reflect the increasing ease of getting into other coins directly with USD or other national currencies, which removes the need for BTC as the gateway into cryptocurrencies.

Lastly, we found one anomaly that we can't fully explain: DIME. This coin has a long right tail in its returns distribution. The only factor I could find that differentiates it from other coins is that it has no capped supply, so it is designed to be inflationary. However, it's still unclear why this would affect its returns distribution. If anything, you would expect an inflationary coin to be worth less, since a continually increasing supply decreases value.
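
One way to put a number on a long right tail is the sample skewness of the returns. The following is a minimal sketch on simulated data (the two series are not DIME or BTC returns). On the notebook's data the equivalent call would be something like returns_df_BTC.skew(), assuming returns_df_BTC holds one column of returns per coin as it is used above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1000
toy_returns = pd.DataFrame({
    # roughly symmetric daily returns
    'symmetric_coin': rng.normal(0, 0.05, n),
    # mostly small moves with occasional large positive spikes (long right tail)
    'spiky_coin': rng.lognormal(mean=0, sigma=1, size=n) * 0.05 - 0.05,
})

# sample skewness per column: near 0 for symmetric data,
# clearly positive when the right tail is long
skewness = toy_returns.skew()
print(skewness)
```

Sorting the per-coin skewness would be a quick way to check whether DIME is really alone in having this shape, or whether other coins share it.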