Intraday Credit Index Trading Behaviour as Revealed by Post-Trade Reports

For many OTC derivative products, every participant sees only a corner of the whole market. The better-connected market players have a much better view of the market, although even their picture is unlikely to be comprehensive. This creates a degree of information asymmetry that successive regulatory reforms have aimed to address. One of the policy initiatives is to set up transaction data repositories for public dissemination. This blog is a taster for the microstructure analysis of credit indices that we can conduct on these post-trade records.

Itraxx Europe and Crossover indices (denoted as ITX and XO) are the two credit indices under study. The raw data consists of SDR and MIFID post-trade reports from Oct 2020 to Mar 2021 for the ITX and XO series 34 5-year indices, republished via Bloomberg. The underlying reference entities of these credit indices are European corporate entities, and the indices are actively traded by both US Persons [1] and non-US Persons [2]. Depending on the location of the trading venue and the jurisdiction of the trader, some of the regulatory trade reporting obligations fall under EU rules and others under US rules. While regulatory regimes are far from my area of interest or specialism, I am going to dabble briefly in some of these matters when trying to garner all the relevant information.

OTC Trade Reports Aim for Public Dissemination

The collapse of Lehman Brothers in 2008 was a wake-up call for the regulators. They were caught off guard by the hidden scale of the OTC derivative exposure amongst major financial institutions and thus demanded more timely and detailed reporting after the crisis. The reform was implemented in phases. The earlier stage was about submitting transaction records to global trade repositories or warehouses. The second stage was about mandatory clearing for the more liquid products, more timely reporting to the relevant regulators and the release of post-trade data to the investing public. It is the last of these that piques my interest.

Yet when I started looking into the reports, they did not seem straightforward to follow. I think there are two main reasons. First, the regulatory regimes have not been harmonised across the Atlantic. Each regulator may choose an agency or a principal reporting regime. Along the same lines, there are inconsistencies on some basic issues, e.g. who should report (buyer, seller or the trading venue). Second, many OTC markets are not very liquid or are dominated by large players. Forcing instantaneous disclosure of all information for every order could upset the normal functioning of the markets. To mitigate the reporting burden, the regulators introduced deferral mechanisms for data release along with various exemptions. In doing so, they made the reporting more complicated.

For an OTC derivative traded in the EU, three sets of reports are typically required: EMIR reporting to Trade Repositories at T+1, publication through an Approved Publication Arrangement (APA) in near real-time for public consumption, and transaction reporting through an Approved Reporting Mechanism (ARM) at T+1 for the relevant authorities. APA is the focus here as it is the set of information that would eventually be disclosed to the public. Trading is often done either through a Multilateral Trading Facility (MTF) or with an investment bank through its own electronic trading platform. An MTF is a recognised market established through EU regulations. It is a form of “exchange lite”. A client can see the quotes posted by the multiple dealers it has a trading relationship with and trade directly with the one that offers the best price. The venue then handles the post-trade reporting requirements. Trading on an investment bank's own electronic platform is a typical example of transacting outside a recognised trading venue. In such a case, one of the parties is responsible for the APA trade reporting, depending on who is the buyer or seller and whether one of the parties is a Systematic Internaliser (SI). On these platforms, the bank would act as an SI with obligations to handle APA reporting and disclose its firm quotes [ESMA70-156-2756 Sec 3.1.8] to its other clients in order to improve market transparency.

For a non-equity OTC derivative traded in the US, the CFTC would be the regulator. In the US, OTC derivatives are normally traded on a Swap Execution Facility (SEF). A participant can send an RFQ to multiple other participants. After a trade is done, the facility is responsible for sending the post-trade report to a Swap Data Repository (SDR) as soon as technologically practicable. The trade will then be available for real-time public dissemination. Off-venue trading is possible in limited cases.

One complication is that a US Person can trade on an MTF. Similarly, an EU financial institution can trade on a SEF. As the reporting frameworks have not been harmonised, duplicate dissemination of the same transaction at different times can happen. The ISDA guide highlights that a US Person trading on an MTF would have its trade reported through the venue to the EU side, but it still has to send the same transaction information to the US side as an “off-facility” trade. On the other hand, for an EU Person trading on a SEF, the requirement to republish the transaction via an APA is waived. The implication is that we should ignore the off-facility trades in the SDR reports when aggregating the trades in order to avoid double counting.
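The double-counting rule can be sketched in a few lines of Python. The record layout here is hypothetical: the field names `source`, `venue` and `notional_eur`, and the `"OFF-FACILITY"` label, are assumptions for illustration rather than the actual schema of the SDR or MIFID feeds.

```python
# Hypothetical, simplified post-trade records; field names and labels are assumptions.
trades = [
    {"source": "SDR", "venue": "SEF", "notional_eur": 25_000_000},
    {"source": "SDR", "venue": "OFF-FACILITY", "notional_eur": 50_000_000},
    {"source": "MTF", "venue": "MTF", "notional_eur": 50_000_000},
    {"source": "APA", "venue": "OTC", "notional_eur": 10_000_000},
]


def drop_off_facility(records):
    """Drop SDR records flagged as off-facility, which may duplicate MTF/APA prints."""
    return [
        r for r in records
        if not (r["source"] == "SDR" and r["venue"] == "OFF-FACILITY")
    ]


deduped = drop_off_facility(trades)
total_volume = sum(r["notional_eur"] for r in deduped)
print(len(deduped), total_volume)
```

In this toy example the 50MM trade appears once (via the MTF report) rather than twice in the aggregate.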

While reports intended for public dissemination are collected in real time as prescribed by regulators on both sides of the Atlantic, only the SDR on the US side releases such information (execution time, traded size and price) to the public right away. However, the SDR withholds the exact trade size for any trade with a notional larger than $100M. On the EU side, credit indices are deemed not sufficiently liquid and the public dissemination of trade-by-trade data can be deferred. Each Tuesday, the aggregate trading volume for the prior week is released, with the trade-by-trade data made available to the public only after four extra weeks. In other words, tick data is not real-time but delayed for weeks. Training a model to spot the actual transactions across EU trading venues could be an interesting machine learning project.

Trading Activities by Venue

For Itraxx Europe and Crossover indices, there are three sets of reports in the data set (SDR, MTF and APA; both MTF and APA refer to trades covered under MIFID rules, but APA trades are those executed outside a recognised market). The average daily volumes were €7.4bn and €2.9bn for Itraxx Europe and Crossover respectively across the different trading venues. On average, 160 Itraxx Europe and 190 Crossover trades were done per day. Assuming a 9-hour trading day (540 minutes), that is one transaction roughly every 3 minutes in each index. No wonder many credit index trading desks can still cope with the workload when they are only semi-automated. In terms of trading venue, SEF is the most popular, followed by MTF. While many US financial institutions are very active on both sides of the market and SEF trading is not exclusive to US Persons, the predominance of SEF is still a bit of a surprise.
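As a quick back-of-the-envelope check, the figures quoted above can be turned into an average time between trades and an average trade size. This is only arithmetic on the quoted averages; the 9-hour trading day is the same assumption as in the text.

```python
TRADING_MINUTES = 9 * 60  # assumed 9-hour trading day

# Average daily figures quoted in the text (volume in EUR millions)
stats = {
    "ITX": {"trades_per_day": 160, "volume_mm": 7_400},
    "XO": {"trades_per_day": 190, "volume_mm": 2_900},
}


def summarise(s):
    """Return (minutes between trades, average trade size in EUR MM)."""
    minutes_between = TRADING_MINUTES / s["trades_per_day"]
    avg_size_mm = s["volume_mm"] / s["trades_per_day"]
    return minutes_between, avg_size_mm


for name, s in stats.items():
    gap, size = summarise(s)
    print(f"{name}: one trade every {gap:.1f} min, average size EUR {size:.1f}MM")
```

This gives roughly 3.4 and 2.8 minutes between trades for ITX and XO, with average sizes of about €46MM and €15MM across all venues.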

Trade statistics for the Itraxx Europe & Crossover S34-5Y on-the-run indices between Oct20-Mar21

Brexit can be a counter-intuitive factor accounting for the popularity of SEF. The market share of SEF jumped from about 40% to above 60% when the Brexit transition ended on 31 Dec 2020. Most of the MTF and APA trades used to be booked in London. While the leading MTF and APA operators have opened new facilities in the EU to accommodate their EU-based clients post Brexit, liquidity could fall as the market is split into two. Since many clients operating in Europe already have access to SEF, shifting more trading activity to the already popular SEF can be a rational response.

Change in market share of SEF trading venue between Oct20-Mar21

Comparing the average trading volume with the number of trades per day gives us an indication of the relative trade size. The trades reported through APA tended to be the largest and the SEF ones the smallest. This is plausible, as many packaged index trades (e.g. as part of a CDS-index basis package, or the index delta leg of option or tranche trades) are booked through APA and they tend to be large. In the case of SEF, price transparency and the level of automation are the highest amongst the different trading venues, and it may thus attract traders running more nimble and higher-frequency strategies.

Arrival Time Between Successive Trades

The histogram of the arrival time between successive trades shows an exponential-like drop-off. The log-frequency plot shows that the relationship is reasonably linear, at least for arrival times of less than around 10 minutes. The pattern is similar for both ITX34 and XO34. An exponential distribution is thus a reasonable first model for the arrival time between successive trades.

Arrival Time Distribution for ITX34 and XO34

For arrival times beyond about 10 minutes, the rate of decrease becomes slower. One possibility is that there exist two sub-populations representing the busier and quieter trading days. If we define a cut-off arrival time of 10 minutes, there were only 9 trading days on which more than 30% of all trades were above the cut-off, and these days were the Thursday and Friday around Thanksgiving and days in the second half of December. A more advanced arrival time model might classify dates into busier and quieter days and fit a separate exponential distribution to each subset of the data.
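A minimal sketch of such a two-regime model is given below. It uses the 10-minute cut-off and the 30% threshold from the text, and the maximum-likelihood estimate of an exponential rate (one over the mean gap). The function names and the sample gaps are made up for illustration; this is not the code behind the charts.

```python
from statistics import fmean

CUTOFF_SECONDS = 600  # 10-minute cut-off used in the text


def fit_exponential_rate(gaps):
    """Maximum-likelihood rate of an exponential distribution: 1 / mean gap."""
    return 1.0 / fmean(gaps)


def is_quiet_day(gaps, cutoff=CUTOFF_SECONDS, threshold=0.30):
    """Flag a day on which more than `threshold` of the gaps exceed the cut-off."""
    share_long = sum(g > cutoff for g in gaps) / len(gaps)
    return share_long > threshold


def fit_by_regime(gaps_by_day):
    """Pool gaps into busy and quiet days, then fit one rate per regime."""
    pooled = {"busy": [], "quiet": []}
    for gaps in gaps_by_day.values():
        pooled["quiet" if is_quiet_day(gaps) else "busy"].extend(gaps)
    return {k: fit_exponential_rate(v) for k, v in pooled.items() if v}


# Made-up inter-trade gaps in seconds, for illustration only
gaps_by_day = {
    "2020-11-24": [60, 120, 90, 150, 180],
    "2020-11-27": [700, 900, 200, 1200, 300],
}
rates = fit_by_regime(gaps_by_day)
print(rates)
```

The fitted rate is in trades per second, so its reciprocal is the mean waiting time in each regime.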

Trade Size Distribution

The distribution of trade size is not smooth. Most participants still prefer trading in multiples of 5MM or 10MM, with 25MM and 10MM being the most popular trade sizes for Itraxx Europe and XO respectively. Note that in the case of SEF, the exact size of any trade larger than USD100MM is not publicly disclosed and is capped at the EUR equivalent in the reports. Depending on the EURUSD exchange rate, the size of these large trades might be reported as €82MM+, €84MM+, €90MM+ etc. This would slightly distort the results when we aggregate the data (especially for ITX, which tends to be traded in larger sizes).

Trade Size Distribution for ITX34 and XO34

Seasonality Effect – Intraweek and Intraday

Trading activity in many markets often follows cyclical patterns. This is known as the seasonality effect. For those who intend to design a trade execution model, the intraweek and intraday time scales are the most relevant. First, we examine the intraweek seasonality effect. Tuesday tended to be the quietest day and the market got busier towards the end of the week. Nevertheless, the difference is less than 10-15%, measured either by the average number of trades or by volume. The intraweek effect was therefore not that apparent for credit index trading during the observation period.

Seasonality Effect – Intraweek for ITX34 and XO34

Unless there are major economic events, there is not much trading in ITX or XO before 0730 or after 1730 (London time). OTC credit trading does not have an official open or close time. I thus pick 0730 to 1730 as the quasi trading hours and aggregate the trades into half-hour slots. In terms of the intraday pattern, there is an early peak around 0800 to 0830 and trading activity gradually slows down towards noon. The market heats up again when the US traders come into the office and reaches its peak between 1530 and 1630. The intraday seasonality pattern is quite pronounced, with the average trading volume in the peak hours easily twice that of the quiet hours or more.
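The half-hour bucketing described above could be sketched as follows. The trade tuples are made up, and the timestamps are assumed to be already converted to London time; the function names are illustrative only.

```python
from collections import defaultdict
from datetime import datetime, time

SESSION_START = time(7, 30)
SESSION_END = time(17, 30)


def half_hour_slot(ts):
    """Return the start of the half-hour slot containing `ts`, e.g. 08:00 or 08:30."""
    return time(ts.hour, 0 if ts.minute < 30 else 30)


def volume_by_slot(trades):
    """Sum traded notional per half-hour slot within the quasi trading hours."""
    totals = defaultdict(float)
    for ts, notional in trades:
        if SESSION_START <= ts.time() < SESSION_END:
            totals[half_hour_slot(ts)] += notional
    return dict(sorted(totals.items()))


# Made-up (timestamp, notional in EUR MM) pairs; timestamps assumed to be London time
trades = [
    (datetime(2020, 11, 9, 8, 5), 25.0),
    (datetime(2020, 11, 9, 8, 20), 10.0),
    (datetime(2020, 11, 9, 15, 45), 50.0),
    (datetime(2020, 11, 9, 18, 10), 25.0),  # outside the window, ignored
]
for slot, vol in volume_by_slot(trades).items():
    print(slot.strftime("%H:%M"), vol)
```

Averaging these per-slot totals across trading days would give the intraday profile.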

Seasonality Effect – Intraday for ITX34 and XO34

Intraday Return Analysis

The transactions are aggregated into a number of different time intervals, with lengths of 7.5, 15, 30, 60 and 120 minutes. The spread movement in each time interval is taken as a proxy for the return. The first 4 moments of the returns (mean, standard deviation, skewness and excess kurtosis) are calculated for each return interval. A time series of less than 5 months is not that long, even for intraday return analysis. The reader should bear in mind that this is more of a taster than a robust study.
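For readers who want to reproduce this kind of table, here is a minimal sketch of the moment calculation. It assumes an evenly sampled spread series and uses population (rather than sample) formulas; both are my assumptions, and the spread levels are made up.

```python
import math


def return_moments(returns):
    """Mean, standard deviation, skewness and excess kurtosis (population formulas)."""
    n = len(returns)
    mean = sum(returns) / n
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    std = math.sqrt(m2)
    return mean, std, m3 / std ** 3, m4 / m2 ** 2 - 3.0


def interval_returns(spreads, step):
    """Spread change over every `step` observations of an evenly sampled series."""
    return [spreads[i + step] - spreads[i] for i in range(0, len(spreads) - step, step)]


# Made-up spread levels in bp, assumed to be sampled every 7.5 minutes
spreads = [50.0, 49.5, 49.8, 49.0, 48.7, 48.9, 48.0, 47.6, 47.9]
for step in (1, 2, 4):
    print(step, return_moments(interval_returns(spreads, step)))
```

With `step` set to 1, 2, 4, 8 and 16 on 7.5-minute samples, the same function covers the 7.5 to 120 minute intervals.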

For the period under examination (Oct20 to Mar21), the market rallied on the back of the successful launch of the first covid vaccines. Spreads tightened during the period, so naturally the means are negative. Also, the large movements of the market during this period tended to be driven by risk-on news, which explains the negative skewness.

The standard deviations tend to increase as a power of the time interval. The underlying random process matters and it is a topic discussed in the scaling law section later on.

Moment for Different Return Intervals for ITX34 and XO34

Excess kurtosis is often (extremely) unstable and much larger than zero (i.e. the tails are fatter than those of the normal distribution). In the case of ITX34, the excess kurtosis falls from a very high level as the time interval for the return calculation increases. This is similar to many other financial series.

In the case of XO34, the kurtosis seems to be all over the place, with the kurtosis calculated using 15-minute and hourly returns much higher than the rest.

Examining the actual data shows that this was caused by a genuine sharp movement in spread. At around 11:30 am on 9Nov20, Pfizer announced the success of its third-stage covid vaccine trial. XO34 tightened by more than 23bp (or 10x the hourly standard deviation) in the subsequent hour. As the market rally was driven by incessant buy orders for high yield risk (rather than by a sudden jump), the kurtosis calculated at the shorter return intervals is not affected as much. If this data point were removed from XO34, the kurtoses would fall substantially across the board. Perhaps a data set with a longer sampling period should be used. Alternatively, kurtosis is just that unstable by nature.
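To see how a single observation can dominate the kurtosis, consider the toy example below. The returns are synthetic (100 ordinary moves of ±1bp plus one 10bp tightening), not the XO34 data, and the population formula is an assumption on my part.

```python
def excess_kurtosis(xs):
    """Population excess kurtosis: fourth central moment / variance^2 - 3."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0


# Synthetic returns in bp: 100 ordinary moves of +/-1bp plus one -10bp shock
ordinary = [1.0, -1.0] * 50
with_shock = ordinary + [-10.0]

print(excess_kurtosis(ordinary))
print(excess_kurtosis(with_shock))
```

Because the fourth power is involved, one shock of ten times the typical move is enough to push the statistic from negative to well above 10 in this example.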

Effect of Removal of Just One Extreme Data Point Upon Kurtosis

Scaling Law

There is an element of randomness in returns. Building upon Mandelbrot’s initial analysis, [3] suggests an empirical scaling law between return volatility (as measured by { E (|r|)}) and a power of the time interval, {\Delta t ^{D}}. {D} is termed the drift exponent. If the return follows a Gaussian random walk, the drift exponent would be 0.5. If it follows a more trend-following process, with large movements tending to be clustered with other large movements, the drift exponent would be larger than 0.5 with {0.5 <D <1}. For more mean-reverting processes, the drift exponent would be smaller than 0.5 with {0 <D <0.5}. Since the focus is intraday behaviour, the time interval is limited to a maximum of 240 min.

\displaystyle E(|r|)=c\,\Delta t ^{D}

The expected absolute return ({E(|r|)}) is plotted against the time interval {\Delta t} in a log-log plot. It seems more appropriate to fit the data with two separate models, one for the shorter end and one for the longer end, with the transition point picked at 900 seconds (15 minutes). Below that, the drift exponent is above 0.8 for both indices, suggesting the spreads tend to trend when the time interval is short. Beyond this point, the drift exponent falls to nearly 0.5, suggesting the return is not too different from a random walk over longer horizons. This seems to be corroborated by empirical observation: after an actual transaction goes through, the market quotes tend to trend until the orders from other participants who are trading in the same direction but with higher private reserve prices all get filled.
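Taking logs of the scaling law gives {\log E(|r|) = \log c + D \log \Delta t}, so the drift exponent is simply the slope of a straight-line fit in the log-log plot. A minimal sketch of the two-segment fit is shown below, assuming ordinary least squares and a 900-second transition point; the data is synthetic and built to have known exponents, and the function names are my own.

```python
import math


def fit_power_law(intervals, mean_abs_returns):
    """OLS fit of log E(|r|) = log c + D * log(dt); returns (D, c)."""
    xs = [math.log(t) for t in intervals]
    ys = [math.log(v) for v in mean_abs_returns]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    d = sxy / sxx
    return d, math.exp(y_bar - d * x_bar)


def fit_two_regimes(intervals, mean_abs_returns, transition=900.0):
    """Fit separate exponents below and above the transition point (in seconds)."""
    pairs = list(zip(intervals, mean_abs_returns))
    short = [p for p in pairs if p[0] <= transition]
    long_ = [p for p in pairs if p[0] >= transition]
    return fit_power_law(*zip(*short)), fit_power_law(*zip(*long_))


# Synthetic data built to have D = 0.8 up to 900s and D = 0.5 beyond it
intervals = [60, 120, 300, 450, 900, 1800, 3600, 7200, 14400]
values = [0.01 * t ** 0.8 if t <= 900 else 0.01 * 900 ** 0.3 * t ** 0.5 for t in intervals]
(short_d, _), (long_d, _) = fit_two_regimes(intervals, values)
print(round(short_d, 3), round(long_d, 3))
```

On real data the transition point itself is a modelling choice, so it is worth checking how sensitive the two exponents are to moving it.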

Scaling Law Analysis for ITX34 and XO34


Note 1 US Person is defined as a US resident, a partnership or corporation formed under US laws, or various types of accounts held for the benefit of a US Person.

Note 2 Non-US Person: largely refers to market participants that fall within EU jurisdiction. Brexit can be a complication here. Given that there is still no agreement on the equivalence of financial regulation between the EU and the UK, the UK’s FCA has temporarily adopted EU rules after the end of the Brexit transition period on 31 Dec 2020. The situation could change pending further negotiations.

Note 3 “An Introduction to High-Frequency Finance, Dacorogna, Gencay, Muller, Olsen”, 2001. Ch 5.5

Credit Rating Assignment by Supervised Learning

The aim of this study is to see how well we can assign credit ratings to corporate entities by applying supervised learning techniques to widely available financial data.

Starting from the top 1000 US listed companies, the financial and real estate companies are first removed (as a separate model may be required due to the substantial difference in their financial metrics compared with other corporates). Those with a debt-to-asset ratio of less than 20% or listed for less than 3 years are also removed, leaving 450 companies as the universe for this study. The S&P long-term corporate rating is chosen as the raw target label, and widely available financial data (total assets, debt, cash, net income, FCF etc. for the last year, annual sales and EBITDA for the last 6 years, along with a limited amount of share price data: market cap and realised price volatility for the last year) are taken as the raw feature data set.

The data preprocessing and feature engineering procedures performed are pretty standard. New features are generated from the raw feature input. They are mostly financial ratios (such as profit margin, EV/EBITDA ratio, interest coverage) or sales/EBITDA growth rates across different time frames. The sectoral averages are also compared with the metrics of individual companies to form new features. After feature engineering, missing data are replaced by the group median of the respective fields, highly skewed variables are smoothed by a Box-Cox transformation (necessary for some algorithms, such as Lasso regression, that are sensitive to highly skewed data) and the categorical variable (only the sector field in this data set) is converted into indicator variables. For the raw labels, the S&P rating watch indicators are stripped before the ratings are converted to an ordinal scale (AAA: 1, AA+: 2, …, BBB-: 10, BB+: 11, …, CCC+: 17). In this way, the learning task can be formulated as a regression problem.
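The label conversion can be sketched as below. The ordinal values follow the scale quoted above; how the rating watch indicator appears in the raw label is an assumption (here, anything after the first space, e.g. a hypothetical `"BBB- *+"`), so the stripping logic would need adapting to the actual data source.

```python
# S&P long-term rating scale from AAA down to CCC+, as used in the text
RATING_SCALE = [
    "AAA", "AA+", "AA", "AA-", "A+", "A", "A-",
    "BBB+", "BBB", "BBB-", "BB+", "BB", "BB-",
    "B+", "B", "B-", "CCC+",
]
RATING_TO_NOTCH = {rating: i + 1 for i, rating in enumerate(RATING_SCALE)}


def rating_to_notch(raw_label):
    """Strip any trailing watch indicator and map the rating to its ordinal value.

    Assumes the watch indicator, if present, follows the rating after a space
    (a hypothetical format).
    """
    rating = raw_label.strip().split()[0]
    return RATING_TO_NOTCH[rating]


print(rating_to_notch("AAA"), rating_to_notch("BBB- *+"), rating_to_notch("CCC+"))
```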

The data set is not large. We use 5-fold cross-validation on the training data (67% of the data set) when training the models (including tuning the hyperparameters). We run a randomised grid search when tuning the hyperparameters. The remaining 33% of the data (about 150 companies) are retained as the test set. The Root Mean Squared Error (RMSE) is chosen as the performance metric.

The performance of the following supervised learning algorithms will be examined.

  • Logistic Regression
  • Lasso Regression
  • Kernel Ridge Regression
  • Elastic Net
  • Gradient Boosting
  • Random Forest
  • Stacking Averaged Model

As much of this supervised learning study is based upon a standard machine learning workflow, it is sufficient to leave the implementation details in the Jupyter notebook linked below. There are a couple of interesting points related to feature generation and the stacked averaged model that are discussed separately before examining the results.

Feature generation for financial ratios

Financial ratios are often more useful than raw accounting numbers. For instance, it is difficult to tell whether the debt is too high unless we also know the company's ability to repay it. The leverage ratio, calculated as debt/EBITDA, is thus a much better reflection of debt affordability. When communicating with other people about financial ratios, most tend to stick to conventions, e.g. profit margin is calculated as net income/net sales (not net sales/net income), leverage ratio is debt/EBITDA, EV multiple is EV/EBITDA and so on.

However, machine learning algorithms tend to work better if the feature is consistent: if we rank Company A better than B and B better than C, the feature should follow the same order. The choice of ratio vs inverse ratio matters. Profit margin fits the bill: all else being equal, the higher the margin, the better the business. However, the leverage ratio (along with many ratios that link income statement to balance sheet figures) is not consistent. Consider a company that issued 100MM of debt when it generated 50MM of EBITDA. The gross leverage ratio was initially a healthy 2x. The business then declined, with EBITDA falling to 20MM, 0MM and -10MM in subsequent years. The corresponding leverage ratios are 5x, infinity and -10x. The leverage ratio goes up to infinity and then switches sign as the business declines. This would confuse many algorithms. The solution is to use the inverse ratio. If we define inverse gross leverage as EBITDA/debt, we get 0.5, 0.2, 0 and -0.1 as the time series. Now the new feature is consistent. In general, it is better to keep the number that is always positive in the denominator when generating financial ratios.
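The worked example above takes only a few lines to reproduce. The two helper functions are illustrative names of my own, not taken from the notebook.

```python
import math

debt = 100.0  # MM
ebitda_path = [50.0, 20.0, 0.0, -10.0]  # MM, the declining business in the text


def leverage(debt, ebitda):
    """Conventional debt/EBITDA; blows up at zero EBITDA and flips sign below it."""
    return math.inf if ebitda == 0 else debt / ebitda


def inverse_leverage(debt, ebitda):
    """EBITDA/debt; well defined as long as debt is positive."""
    return ebitda / debt


print([leverage(debt, e) for e in ebitda_path])
print([inverse_leverage(debt, e) for e in ebitda_path])
```

The inverse ratio falls monotonically as the business deteriorates, whereas the conventional ratio jumps from a large positive to a negative number.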

Stacked Averaged Algorithm

Other than the stacked averaged algorithm, the rest are off-the-shelf implementations. Hence a bit more explanation is provided here. The stacked averaged algorithm is a form of ensemble algorithm. It aims to improve the forecast by aggregating the predictions of N weaker learners with a meta algorithm. The meta algorithm is often a simple learner such as a Lasso or Kernel Ridge regressor. In the training phase, the training data is split into k folds and each weaker learner is fitted once per fold, resulting in k copies of each weaker learner (or k*N fitted models altogether). In the prediction phase, the feature data is fed to the k copies of each weaker learner to generate k predictions, which are then averaged. The process is repeated for each of the N different weak learners before a vector of length N is passed on to the meta algorithm.
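The structure described above might look roughly like the sketch below. It is deliberately dependency-free: `MeanModel` and `LineModel` are toy stand-ins for the real weak learners and meta algorithm, and the class and attribute names are my own. One assumption worth flagging is that the meta learner is trained on out-of-fold predictions, which is a common choice but not something spelled out in the description.

```python
from statistics import fmean


class MeanModel:
    """Toy weak learner: always predicts the training mean."""
    def fit(self, X, y):
        self.mu = fmean(y)
        return self

    def predict(self, X):
        return [self.mu for _ in X]


class LineModel:
    """Toy learner: least-squares line on one scalar summary of each row."""
    def __init__(self, feature):
        self.feature = feature

    def fit(self, X, y):
        xs = [self.feature(row) for row in X]
        x_bar, y_bar = fmean(xs), fmean(y)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (t - y_bar) for x, t in zip(xs, y))
        self.slope = sxy / sxx if sxx else 0.0
        self.intercept = y_bar - self.slope * x_bar
        return self

    def predict(self, X):
        return [self.intercept + self.slope * self.feature(row) for row in X]


class StackedAveragedModel:
    def __init__(self, base_factories, meta_factory, k=5):
        self.base_factories = base_factories
        self.meta_factory = meta_factory
        self.k = k

    def fit(self, X, y):
        n = len(X)
        folds = [list(range(i, n, self.k)) for i in range(self.k)]
        self.copies = [[] for _ in self.base_factories]  # N lists of k fitted models
        out_of_fold = [[0.0] * len(self.base_factories) for _ in range(n)]
        for held_out in folds:
            held = set(held_out)
            X_tr = [X[i] for i in range(n) if i not in held]
            y_tr = [y[i] for i in range(n) if i not in held]
            for j, make in enumerate(self.base_factories):
                model = make().fit(X_tr, y_tr)
                self.copies[j].append(model)
                for i, p in zip(held_out, model.predict([X[i] for i in held_out])):
                    out_of_fold[i][j] = p
        # Assumption: the meta learner is trained on out-of-fold predictions
        self.meta = self.meta_factory().fit(out_of_fold, y)
        return self

    def predict(self, X):
        columns = []
        for models in self.copies:
            preds = [m.predict(X) for m in models]           # k predictions per row
            columns.append([fmean(p) for p in zip(*preds)])  # average over the k copies
        meta_X = [list(row) for row in zip(*columns)]        # one length-N vector per row
        return self.meta.predict(meta_X)


# Toy data: y = 2x + 1
X = [[float(i)] for i in range(20)]
y = [2.0 * row[0] + 1.0 for row in X]
model = StackedAveragedModel(
    base_factories=[MeanModel, lambda: LineModel(lambda row: row[0])],
    meta_factory=lambda: LineModel(fmean),
).fit(X, y)
print(model.predict([[10.0]]))
```

In practice the weak learners and the meta algorithm would be the regressors listed earlier; only the fold, average and stack plumbing is the point here.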

Results and Discussion

RMSE is the performance metric used, measured in notches. To be clear, the difference between A- and BBB+ is 1 notch in rating. As shown in the cross-validation data, the best algorithm is the stacked averaged model (RMSE = 1.43 {\pm} 0.05). Apart from logistic regression, the other algorithms are statistically no worse than the stacked averaged model. The out-of-sample performances are more or less the same. Considering that only quantitative data is used here (CRAs use both qualitative and quantitative inputs, their data are much better curated and some materials are non-public) and that the RMS difference between ratings assigned by different CRAs is close to 1 notch, the performance of these supervised learning algorithms is not bad. Potentially, supervised learning models can be used as a more responsive market monitoring tool.
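For concreteness, this is what an RMSE in notches means on the ordinal scale. The predicted values below are made up and are not results from the study.

```python
import math


def rmse_in_notches(actual, predicted):
    """Root mean squared error between ordinal ratings, in notches."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Ordinal scale from the text: A- = 7, BBB+ = 8, BBB- = 10, BB+ = 11
actual = [7, 8, 10, 11]
predicted = [8.0, 8.0, 9.0, 13.0]  # made-up model outputs
print(rmse_in_notches(actual, predicted))
```

Because the errors are squared, one 2-notch miss weighs as much as four 1-notch misses.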

It is useful to understand what contributes to the performance of these algorithms. The feature importance statistics are readily available once a random forest model has been trained. The most important feature is the market cap. Basically, it is a vote by the equity market. The strongest companies at this moment are rewarded with the highest share prices while the weakest ones see their share prices tank. The second and third most important features are two different adjusted inverse gross leverage ratios: one adjusted for the historical equity volatility, the other a comparison against the sector average. While the leverage ratio is a key indicator of debt affordability, what is deemed acceptable (or too high) varies between industries. Intuitively, a stable business with low fixed costs can sustain more debt. This type of domain knowledge is not available in a quantitative data set. The adjustments partially fill the gap. For inverse gross leverage (defined as EBITDA/gross debt), the larger the better. The historical equity volatilities of businesses deemed stable tend to be lower. Given two companies with the same inverse gross leverage ratio, dividing each by its respective historical equity volatility would benefit the one with the more stable business. Similarly, dividing inverse gross leverage by its sectoral average also introduces a reference point for comparison.
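The two adjustments can be illustrated with a toy example. The company names, field names and numbers are all made up; the sketch also assumes positive sector averages, which would need guarding against in real data.

```python
from statistics import fmean

# Made-up companies; all field names and numbers are illustrative assumptions
companies = [
    {"name": "StableCo", "sector": "Utilities", "ebitda": 30.0, "debt": 100.0, "equity_vol": 0.15},
    {"name": "CyclicalCo", "sector": "Energy", "ebitda": 30.0, "debt": 100.0, "equity_vol": 0.45},
    {"name": "PeerCo", "sector": "Energy", "ebitda": 50.0, "debt": 100.0, "equity_vol": 0.40},
]


def add_adjusted_leverage(rows):
    """Add inverse gross leverage plus its volatility- and sector-adjusted versions."""
    for r in rows:
        r["inv_leverage"] = r["ebitda"] / r["debt"]
    sector_avg = {
        s: fmean([r["inv_leverage"] for r in rows if r["sector"] == s])
        for s in {r["sector"] for r in rows}
    }
    for r in rows:
        r["inv_lev_vol_adj"] = r["inv_leverage"] / r["equity_vol"]
        r["inv_lev_vs_sector"] = r["inv_leverage"] / sector_avg[r["sector"]]
    return rows


for r in add_adjusted_leverage(companies):
    print(r["name"], round(r["inv_lev_vol_adj"], 2), round(r["inv_lev_vs_sector"], 2))
```

StableCo and CyclicalCo have the same raw inverse leverage of 0.3, but the volatility adjustment ranks the more stable business higher.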

Link to Jupyter Notebook


Pairs Trade – Practice

In the last post, we reviewed some theory related to pairs trade. In this post, we will go through a textbook case of arbitrage to show what various test statistics should look like. We also introduce the half-life of mean reversion and the Hurst exponent as performance indicators. We then look into a possible implementation of a mean-reversion strategy before discussing the real-world issues in pairs trading.


Pairs Trade – Theory

Pairs trade is one of the simplest market-neutral statistical arbitrage strategies. The goal is to find a pair of securities which historically move up and down in a highly correlated fashion but whose price differential is temporarily at an extreme. We then go long the relatively cheap security and simultaneously short the other. Hopefully, the price differential will promptly revert to normal so that we can realise some profit. We review some useful statistical concepts (such as co-integration and stationary processes) and discuss the types of securities that are likely to form good trading pairs.

Demystify the Volatility Cone

The volatility cone is a visualisation tool for displaying the historical volatility term structure. It was introduced by Burghardt and Lane[1] in early 1990 and is popular in the option trading community. Using the same methodology, we can extend the use of such charts to periodic return data. I find these charts useful not only for options but also for the general market.