CAC 40 correlation

29 September 2021

In this article, we walk through a correlation analysis of an index's components, using the CAC 40 (the French index 🇫🇷 on Euronext) as a use case and leveraging several of the APIs we provide.

Stocks that belong to a major index are more likely to be very liquid💧 simply because they are part of that index. Trading very liquid stocks helps keep costs to a minimum.

📈 For a typical pairs trading strategy, you want to find highly correlated stocks. Whenever one stock moves without the other, you sell one and buy the other, wait for both stocks to fall back in line, and then flatten your position.
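
As a rough illustration of the idea above, here is a minimal sketch that turns the spread between two price series into a signal. The function names, the toy prices and the entry threshold of 2 standard deviations are all assumptions made for this example, not part of our APIs or a recommended strategy.

```python
from statistics import mean, stdev

def spread_zscore(prices_a, prices_b):
    """Z-score of the latest spread (A - B) relative to the spread's own history."""
    spread = [a - b for a, b in zip(prices_a, prices_b)]
    mu, sigma = mean(spread), stdev(spread)
    if sigma == 0:
        return 0.0  # flat spread: nothing to trade on
    return (spread[-1] - mu) / sigma

def pair_signal(prices_a, prices_b, entry=2.0):
    """Hypothetical signal: trade when the spread is far from its average."""
    z = spread_zscore(prices_a, prices_b)
    if z > entry:
        return "sell A / buy B"   # A looks expensive relative to B
    if z < -entry:
        return "buy A / sell B"   # A looks cheap relative to B
    return "no trade"

# toy prices (made up): both stocks move together until A jumps on the last bar
stock_a = [100.0, 100.5, 101.0, 100.8, 101.2, 101.0, 100.9, 104.0]
stock_b = [50.0, 50.4, 51.1, 50.7, 51.3, 50.9, 51.0, 51.1]

signal = pair_signal(stock_a, stock_b)
print(signal)
```

The position would then be flattened once the z-score comes back towards zero.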

Today's article will focus on computing the correlation between each pair of index components using several APIs within Ganymede, our web and cloud-based JupyterLab environment.

Getting started

The overall approach followed in this article is as follows:

  • Select all CAC 40 components
  • Get 5-min bars for each component over a 50-day period
  • Compute correlations and, for each stock, pick the most closely correlated one

We will be using a set of Systemathics modules in addition to open-source modules. For this sample, we chose to use Python with the Systemathics package available on PyPI.

Select all CAC 40 components

  • CAC 40 index
  • Stocks trading on Euronext Paris and Amsterdam

# set index
index = 'CAC 40'

# generate static data request
request = static_data.StaticDataRequest( 
    asset_type = static_data.AssetType.ASSET_TYPE_EQUITY
)

request.index.value = index # add index as per filter value
request.count.value = 1000 # by default the count is set to 100

We call the StaticData method to return all equities in the CAC 40. It returns a StaticDataResponse, referred to as response in the code below.

# define a method to handle the equities response using a pandas dataframe
def get_equities_dataframe(response):
    exchange = [equity.identifier.exchange for equity in response.equities]
    ticker = [equity.identifier.ticker for equity in response.equities]
    name = [equity.name for equity in response.equities]
    primary = [equity.primary for equity in response.equities]
    index = [equity.index for equity in response.equities]
    
    # Create pandas dataframe
    d = {'Index': index, 'Name': name, 'Ticker': ticker, 'Exchange': exchange, 'Primary':primary}
    df = pd.DataFrame(data=d)
    return df

data = get_equities_dataframe(response)
# filter index component according to exchange
data = data[(data['Exchange'] == "XPAR") | (data['Exchange'] == "XAMS")]

Figure: CAC 40 components
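
The exchange filter above can also be written with isin, which stays readable if more exchanges are added later. This is a small sketch on a toy dataframe; the names and tickers are placeholders, only the exchange codes match the ones used in this article.

```python
import pandas as pd

# toy listing table (names and tickers are placeholders)
toy = pd.DataFrame({
    "Name": ["Stock A", "Stock A", "Stock B", "Stock C"],
    "Ticker": ["AAA", "AAA", "BBB", "CCC"],
    "Exchange": ["XPAR", "XLON", "XAMS", "XBRU"],
})

# keep only listings on Euronext Paris and Amsterdam
kept = toy[toy["Exchange"].isin(["XPAR", "XAMS"])].reset_index(drop=True)
print(kept)
```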

Request 5-min bars for each component over a 50-day period

To get a standard set of data shared by all components, we will use a 5-min bar period and compute the bars from trade prices.
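
To make the notion of a 5-min bar concrete, here is a minimal local sketch that aggregates a handful of trade ticks into open/high/low/close bars with pandas. The ticks are made up, and the way the service itself builds and labels bars may differ; this only illustrates the principle.

```python
import pandas as pd

# toy trade ticks (timestamps and prices are made up for illustration)
ticks = pd.DataFrame(
    {"price": [10.0, 10.2, 10.1, 10.4, 10.3, 10.6]},
    index=pd.to_datetime([
        "2021-09-01 09:00:05", "2021-09-01 09:02:30", "2021-09-01 09:04:59",
        "2021-09-01 09:05:00", "2021-09-01 09:07:10", "2021-09-01 09:11:45",
    ]),
)

# group trades into 5-minute buckets and keep open/high/low/close per bucket
bars = ticks["price"].resample("5min").ohlc()
print(bars)
```

Only the close of each bar is used in the rest of this article.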

We can define a function get_request that builds a bars request covering the last 50 days for an index component. We then build a list, requests, in which each request is paired with its component name.

# set the bar duration
sampling = 5 * 60

# set the bar calculation field
field = tick_bars.BAR_PRICE_TRADE 

# create time intervals (we are using Google date format)
today = datetime.today()
start = today - timedelta(days=50)

date_interval = dateinterval.DateInterval(
    start_date = date.Date(year = start.year, month = start.month, day = start.day), 
    end_date = date.Date(year = today.year, month = today.month, day = today.day)
)

# generate constraints based on the previous date selection
constraint = constraints.Constraints(
    date_intervals = [date_interval]
)

# function to generate tick bars request from exchange and ticker
def get_request(exchange, ticker):
    return tick_bars.TickBarsRequest(
                identifier = identifier.Identifier(exchange = exchange, ticker = ticker),
                constraints = constraint,
                sampling = duration.Duration(seconds = sampling),
                field = field)

# get all requests with name in same tuple
requests = [ (row['Name'], get_request(row['Exchange'],row['Ticker'])) for index, row in data.iterrows() ]

We can now request 5-min bars for each stock using TickBarsService. All results are aggregated in a single pandas dataframe.

# instantiate the tick bars service
service = tick_bars_service.TickBarsServiceStub(channel)

# initialize results dataframe
results = pd.DataFrame({'Date': []})
results = results.set_index('Date')

# execute for each request
for name, request in requests :
    display(name)
    bars = []
    # process the tick bars request
    for bar in service.TickBars(request=request, metadata=metadata):
        bars.append(bar)
    
    # create temporary dataframe
    dates = pd.to_datetime([b.time_stamp.seconds for b in bars], unit='s')  # timestamps kept in UTC
    closes = [b.close for b in bars]
    df = pd.DataFrame(data ={'Date': dates, f'{name}': closes})
    df = df.set_index('Date')
    # merge temporary dataframe
    if (results.size == 0):
        results = df
    else:
        results = pd.merge(results, df, on="Date")

The previous code snippet gives us the pandas dataframe containing 5-min bars for each CAC 40 component:

Figure: CAC 40 bars

Note that times are in UTC.
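
One point worth keeping in mind: pd.merge defaults to an inner join, so a timestamp only survives in the aggregated dataframe if every component has a bar at that time. The toy sketch below (made-up values, placeholder stock names) shows the difference with an outer join, which keeps every timestamp and leaves gaps as NaN.

```python
import pandas as pd

idx_a = pd.to_datetime(["2021-09-01 09:00", "2021-09-01 09:05", "2021-09-01 09:10"])
idx_b = pd.to_datetime(["2021-09-01 09:00", "2021-09-01 09:10"])  # 09:05 bar missing

a = pd.DataFrame({"Date": idx_a, "Stock A": [10.0, 10.1, 10.2]}).set_index("Date")
b = pd.DataFrame({"Date": idx_b, "Stock B": [20.0, 20.4]}).set_index("Date")

inner = pd.merge(a, b, on="Date")               # same call as in the loop above
outer = pd.merge(a, b, on="Date", how="outer")  # keeps every timestamp

print(len(inner), len(outer))
print(outer)
```

Which one is preferable depends on the use case: the inner join gives a clean matrix, the outer join avoids silently discarding bars.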

Highlight most closely correlated stocks

Computing correlations is very simple with the pandas correlation function; we then display the first rows of the correlation matrix.


corr = results.corr()

Figure: CAC 40 correlation
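
By default, the pandas corr function computes the Pearson correlation coefficient. For readers who want to see what that number means, here is a minimal hand-written version on toy series; the pearson helper is written for this example only.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

up = [1.0, 2.0, 3.0, 4.0]
r_same = pearson(up, [2.0, 4.0, 6.0, 8.0])      # moves exactly with `up`
r_opposite = pearson(up, [8.0, 6.0, 4.0, 2.0])  # moves exactly against `up`
print(r_same, r_opposite)
```

A value close to +1 means the two series move together, and a value close to -1 means they move in opposite directions.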

We will now get the most closely correlated stocks. We will replace every "1" on the diagonal of the correlation matrix with a "0", because each stock is perfectly correlated with itself, which carries no information 😀.

            
corr = corr.replace(1,0)
final = pd.DataFrame({ "Stock": corr.index, "Closest correlated stock" : corr.idxmax(), "Correlation value": corr.max() })
# filter only correlation above 0.90
final = final[final["Correlation value"] > 0.90]
final.sort_values(by="Correlation value",ascending =False)

Figure: CAC 40 best correlation

As expected, some of the pairs are stocks in the same sector:

  • Banking with Credit Agricole and Societe Generale
  • Luxury market with LVMH and Hermes
  • Industrial solution with Legrand and Schneider

Some others are less obvious (Danone and Michelin in Consumer Goods? 😅).
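
A side note on the diagonal: replacing the value 1 works here, but it matches by value rather than by position. A possible variant, sketched below on a toy matrix with made-up values, hides the diagonal by position so that only self-correlations are excluded.

```python
import numpy as np
import pandas as pd

# toy correlation matrix (values are made up for illustration)
names = ["Stock A", "Stock B", "Stock C"]
corr = pd.DataFrame(
    [[1.00, 0.95, 0.40],
     [0.95, 1.00, 0.30],
     [0.40, 0.30, 1.00]],
    index=names, columns=names,
)

# hide the diagonal by position (NaN) instead of replacing the value 1
masked = corr.mask(np.eye(len(corr), dtype=bool))

final = pd.DataFrame({
    "Stock": masked.index,
    "Closest correlated stock": masked.idxmax(),  # NaN cells are skipped
    "Correlation value": masked.max(),
})
print(final)
```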

It would be interesting to compute these correlations over longer and shorter time periods and plot them to detect any seasonality. This is an exercise left to the reader.💸
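
As a possible starting point for that exercise, here is a minimal sketch of a rolling correlation between two series with pandas. The prices are synthetic (generated with a fixed random seed) and the window of 50 bars is an arbitrary choice; with real data, the two series would be two columns of the bars dataframe.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2021-09-01 09:00", periods=200, freq="5min")

# synthetic prices: B follows A plus independent noise
a = pd.Series(100 + rng.normal(0, 1, 200).cumsum(), index=dates, name="Stock A")
b = pd.Series(a.values + rng.normal(0, 0.5, 200), index=dates, name="Stock B")

window = 50  # number of 5-min bars per window (arbitrary choice)
rolling_corr = a.rolling(window).corr(b)

print(rolling_corr.dropna().describe())
# in a notebook, rolling_corr.plot() would chart it (requires matplotlib)
```

Changing the window length gives the longer and shorter horizons mentioned above.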

Reach out to try our solutions

In this article, we requested tick data and index components by calling dedicated API services within Ganymede, our web and cloud-based JupyterLab environment. You can use it as-is and/or call our API directly from your internal tools and immediately start retrieving on-demand financial data.

To get the full sample and discover more data analytics samples and building blocks, navigate to our public GitHub →