Intraday Credit Index Trading Behaviour as Revealed by Post-Trade Reports

For many OTC derivative products, each market participant sees only a corner of the whole market. While their view is still unlikely to be comprehensive, the more connected players have a much better picture of the market. This creates a degree of information asymmetry that successive regulatory reforms have aimed to address. One of the policy initiatives is the collection of transaction data in repositories for public dissemination. This blog is a taster for the microstructure analysis of credit indices that we can conduct upon these post-trade records.

The iTraxx Europe and Crossover indices (denoted as ITX and XO) are the two credit indices under study. The raw data consists of SDR and MIFID post-trade reports from Oct 2020 to Mar 2021 for the ITX and XO series 34 5-year indices, republished via Bloomberg. The underlying reference entities of these credit indices are European corporate entities, and they are actively traded by both US Persons [1] and non-US Persons [2]. Depending on the location of the trading venue and the jurisdiction of the trader, some of the regulatory trade reporting obligations fall under EU rules and others under US rules. While regulatory regimes are neither my main interest nor my specialist area, I am going to briefly dabble in some of these matters when trying to gather all the relevant information.

OTC Trade Reports Aimed at Public Dissemination

The collapse of Lehman Brothers in 2008 was a wake-up call for the regulators. They were caught off-guard by the hidden scale of the OTC derivative exposures amongst major financial institutions and thus demanded more timely and detailed reporting post-crisis. The reform was implemented in phases. The earlier stage was about submitting transaction records to global trade repositories or warehouses. The second stage was about mandatory clearing for the more liquid products, more timely reporting to relevant regulators and the release of post-trade data to the investing public. It is the latter that piques my interest.

Yet when I started looking into the reports, they did not seem straightforward to follow. I think there are two main reasons. First, the regulatory regimes have not been harmonised across the Atlantic. Each regulator may choose an agency or principal reporting regime. Along the same line, there are inconsistencies on some basic issues, e.g. who should report (buyer, seller or the trading venue). Second, many OTC markets are not very liquid or are dominated by large players. Forcing instantaneous disclosure of all information for every order could upset the normal functioning of the markets. To mitigate the reporting burden, the regulators introduced deferral mechanisms for data release along with various exemptions. In doing so, the reporting becomes more complicated.

For an OTC derivative traded in the EU, three sets of reports are typically required: EMIR reporting to Trade Repositories at T+1, reporting through an Approved Publication Arrangement (APA) in near real-time for public consumption, and transaction reporting through an Approved Reporting Mechanism (ARM) at T+1 for the relevant authorities. The APA is the focus here as it is the set of information that would eventually be disclosed to the public. Trading is often done either through a Multilateral Trading Facility (MTF) or with an investment bank through its bank-specific electronic trading platform. An MTF is a recognised market established through EU regulations, a form of “exchange lite”. A client can see the quotes posted by the multiple dealers it has trading relationships with and trade directly with the one that offers the best price. The venue then handles the post-trade reporting requirements. Trading on an investment bank's own electronic platform is a typical example of transacting outside a recognised trading venue. In such a case, one of the parties is responsible for the APA trade reporting, depending on who is the buyer or seller and whether one of the parties is a Systematic Internaliser (SI). On these platforms, the bank would act as an SI with obligations to handle the APA reporting and disclose its firm quotes [ESMA70-156-2756 Sec 3.1.8] to its other clients in order to improve market transparency.

For a non-equity OTC derivative traded in the US, the CFTC is the regulator. In the US, OTC derivatives are normally traded on a Swap Execution Facility (SEF). A participant can post an RFQ to multiple other participants. After a trade is done, the facility is responsible for sending a post-trade report to a Swap Data Repository (SDR) as soon as technically practicable. The trade is then available for real-time public dissemination. Off-venue trading is possible in limited cases.

One complication is that a US Person can trade on an MTF. Similarly, an EU financial institution can trade on a SEF. As the reporting frameworks have not been harmonised, duplicate dissemination of the same transaction at different times can happen. The ISDA guide highlights that a US Person trading on an MTF would have its trade reported through the venue to the EU side, but it still has to send the same transaction information to the US side as an “off-facility” trade. On the other hand, for an EU Person trading on a SEF, the requirement to republish the transaction via an APA is waived. The implication is that we should ignore the off-facility trades in the SDR report when aggregating the trades in order to avoid double counting.

While reports aimed at public dissemination are collected in real-time as prescribed by regulators on both sides of the Atlantic, only the SDR on the US side releases the information (execution time, traded size and price) to the public right away. However, the SDR withholds the exact trade size for any trade with a notional larger than $100M. On the EU side, credit indices are deemed not sufficiently liquid and trade-by-trade data can be deferred for public dissemination. Each Tuesday, the aggregate trading volume for the prior week is released, while the trade-by-trade data are made available to the public only after four extra weeks. In other words, the tick data is not real-time but delayed for weeks. It could be an interesting machine learning project to train a model to spot actual transactions across EU trading venues.

Trading Activities by Venue

For the iTraxx Europe and Crossover indices, there are three sets of reports in the data set (SDR, MTF, APA; both MTF and APA refer to trades covered under MIFID rules, but APA covers those traded outside recognised markets). The average daily volumes were €7.4bn and €2.9bn for iTraxx Europe and Crossover respectively across the different trading venues. On average, 160 iTraxx Europe and 190 Crossover trades were done per day. Assuming a 9-hour trading day, there is a transaction roughly every 3 minutes on average. No wonder many credit index trading desks can still cope with the workload when they are only semi-automated. In terms of trading venue, SEF is the most popular, followed by MTF. While many US financial institutions are very active on both sides of the market and SEFs are not used exclusively by US Persons, the predominance of SEF is still a bit of a surprise.

Trade statistics for the iTraxx Europe & Crossover S34-5Y on-the-run indices between Oct20-Mar21

Brexit can be a counter-intuitive factor accounting for the popularity of SEF. The market share of SEF jumped from about 40% to above 60% when the Brexit transition ended on 31 Dec 2020. Most of the MTF and APA trades used to be booked in London. While the leading MTF and APA operators have opened new facilities in the EU to accommodate their EU-based clients post Brexit, liquidity could fall as the market is split into two. Since many clients operating in Europe already have access to SEFs, shifting more trading activity onto the already popular SEFs can be a rational response.

Change in market share of SEF trading venue between Oct20-Mar21

Comparing the average trading volume to the number of trades per day gives us an indication of the relative trade size. The trades reported through APA tended to be the largest and the SEF ones the smallest. This is plausible, as many packaged index trades (e.g. as part of a CDS-index basis package, or the index delta leg of options or tranche trades) are booked through APA and they tend to be large trades. In the case of SEF, the price transparency and level of automation are the highest amongst the different trade venues, which may thus attract traders running more nimble and higher frequency strategies.

Arrival Time Between Successive Trades

The histogram of the arrival time between successive trades shows an exponential-like drop-off. The log frequency plot shows that the relationship is reasonably linear, at least for arrival times of less than around 10 minutes. The pattern is similar for both ITX34 and XO34. The exponential distribution is thus a reasonable model for the arrival time between successive trades.
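This check can be sketched in a few lines: for an exponential distribution, the MLE of the rate is one over the sample mean, and the log of the histogram counts should fall linearly with arrival time at roughly that rate. The inter-arrival times below are simulated stand-ins; in practice they would come from differencing the sorted execution timestamps in the post-trade reports.

```python
import numpy as np

# Simulated inter-arrival times in seconds (mean ~3 minutes). In practice,
# use np.diff on the sorted trade execution timestamps.
rng = np.random.default_rng(0)
arrival_times = rng.exponential(scale=180.0, size=5000)

# MLE of the exponential rate parameter is simply 1 / sample mean.
lam = 1.0 / arrival_times.mean()

# Log-frequency check: for an exponential, log(counts) is linear in t,
# with slope close to -lambda.
counts, edges = np.histogram(arrival_times, bins=30, range=(0, 600))
centres = 0.5 * (edges[:-1] + edges[1:])
mask = counts > 0
slope, intercept = np.polyfit(centres[mask], np.log(counts[mask]), 1)
```

With real data, a fitted slope that deviates materially from `-lam` (as happens here beyond ~10 minutes) is the hint that a single exponential is not enough.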

Arrival Time Distribution for ITX34 and XO34

For arrival times beyond about 10 minutes, the rate of decrease becomes slower. One possibility is that there exist two sub-populations representing the busier and quieter trading days. If we define a cut-off arrival time of 10 minutes, there were only 9 trading days on which more than 30% of all trades were above the cut-off, and these days were the Thu and Fri around Thanksgiving and in the second half of Dec. A more advanced arrival time model might classify dates into busier and quieter days and fit a separate exponential distribution to each subset of the data.

Trade Size Distribution

The distribution of trade size is not smooth. Most still prefer trading in multiples of 5MM or 10MM, with 25MM and 10MM being the most popular trade sizes for iTraxx Europe and XO respectively. Note that in the case of SEF, the exact size of trades larger than USD 100MM is not publicly disclosed and is capped at the EUR equivalent in the reports. Depending on the EURUSD exchange rate, the size of these large trades might be reported as €82MM+, €84MM+, €90MM+ etc. This slightly distorts the results when we aggregate the data (especially for ITX, which tends to be traded in larger sizes).

Trade Size Distribution for ITX34 and XO34

Seasonality Effect – Intraweek and Intraday

The trading activity in many markets often follows cyclical patterns, known as the seasonality effect. For those who intend to design a trade execution model, the intraweek and intraday time scales are the most relevant. First, we examine the intraweek seasonality effect. Tuesday tended to be the quietest day, with activity picking up towards the end of the week. Nevertheless, the difference is less than 10-15% measured either by the average number of trades or by volume. This effect was not that apparent for credit index trading during the observation period.

Seasonality Effect – Intraweek for ITX34 and XO34

Unless there are major economic events, there is not much trading in ITX or XO before 0730 or after 1730 (London time). OTC credit trading does not have official open or close times. I thus pick 0730 and 1730 as the quasi trading hours and aggregate the trades into half-hour slots. In terms of intraday pattern, there is an early peak around 0800 to 0830 and trading activity gradually slows down towards noon. The market heats up again when the US traders come into the office, reaching its peak between 1530 and 1630. The intraday seasonality pattern is quite pronounced, with the average trading volume in the peak hours easily twice or more that of the quieter hours.
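The half-hour aggregation is a simple bucketing exercise. A minimal sketch with stdlib only, using made-up timestamps in place of the execution times parsed from the SDR/MIFID reports (converted to London time):

```python
from datetime import datetime
from collections import Counter

def half_hour_slot(ts: datetime) -> str:
    """Map a trade timestamp to its half-hour slot label, e.g. '08:00'."""
    minute = 0 if ts.minute < 30 else 30
    return f"{ts.hour:02d}:{minute:02d}"

# Hypothetical trade timestamps.
trades = [
    datetime(2021, 1, 15, 8, 5),
    datetime(2021, 1, 15, 8, 25),
    datetime(2021, 1, 15, 8, 40),
    datetime(2021, 1, 15, 15, 45),
]

# Number of trades per half-hour slot; summing notionals per slot instead
# gives the volume-based version of the same seasonality plot.
slots = Counter(half_hour_slot(t) for t in trades)
```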

Seasonality Effect – Intraday for ITX34 and XO34

Intraday Return Analysis

The transactions are aggregated into a number of different time intervals, with lengths of 7.5, 15, 30, 60 and 120 minutes. The spread movement in each time interval is taken as a proxy for the return. The first four moments of the returns (mean, standard deviation, skewness and excess kurtosis) are calculated for each return time interval. A time series of less than 5 months is not that long, even for intraday return analysis. The reader should bear in mind that this is more of a taster than a robust study.
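For reference, the four moments can be computed directly from the return sample without any statistics library. A minimal sketch, sanity-checked on simulated Gaussian returns (where skewness and excess kurtosis should both be near zero):

```python
import numpy as np

def four_moments(returns):
    """Mean, standard deviation, skewness and excess kurtosis of a sample."""
    r = np.asarray(returns, dtype=float)
    mu = r.mean()
    sd = r.std(ddof=0)
    z = (r - mu) / sd
    skew = np.mean(z ** 3)
    ex_kurt = np.mean(z ** 4) - 3.0  # excess over the normal's kurtosis of 3
    return mu, sd, skew, ex_kurt

# Sanity check on simulated Gaussian returns.
rng = np.random.default_rng(1)
mu, sd, skew, ex_kurt = four_moments(rng.normal(0.0, 1.0, 100_000))
```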

For the period under examination (Oct20 to Mar21), the market rallied on the back of the successful launch of the first covid vaccine. Spreads tightened during the period, so naturally the means are negative. Also, the large movements of the market during this period tended to be driven by risk-on news, which explains the negative skewness.

The standard deviations tend to increase as a power of the time interval. The underlying random process matters, and this is a topic to be discussed in the scaling law section later on.

Moment for Different Return Intervals for ITX34 and XO34

Excess kurtosis is often (extremely) unstable and much larger than zero (i.e. fatter tails than the normal distribution). In the case of ITX34, the excess kurtosis falls from a very high level as the time interval for the return calculation increases. This is similar to many other financial series.

In the case of XO34, the kurtosis seems to be all over the place, with the kurtosis calculated using 15-minute and hourly returns much higher than the rest.

Examining the actual data, this was caused by a genuine sharp movement in spread. At around 11:30 am on 9Nov20, Pfizer announced the successful third-stage covid vaccine trial. XO34 tightened by more than 23bp (or 10x the hourly standard deviation) in the subsequent hour. As the market rally was driven by incessant buying orders for high yield risk (rather than a sudden jump), the kurtosis calculated at shorter return intervals is not affected as much. If this data point were removed from XO34, the kurtoses would fall substantially across the board. Perhaps a data set with a longer sampling period should be used. Alternatively, kurtosis is just that unstable by nature.

Effect of Removal of Just One Extreme Data Point Upon Kurtosis

Scaling Law

There is an element of randomness in returns. Based upon Mandelbrot’s initial analysis, [3] suggests an empirical scaling law between return volatility (as measured by {E(|r|)}) and the time interval {\Delta t ^{D}}. {D} is termed the drift exponent. If the return follows a Gaussian random walk, the drift exponent would be 0.5. If it follows a more trend-following process, with large movements tending to cluster with other large movements, the drift exponent would be larger than 0.5 with {0.5 <D <1}. For more mean-reverting processes, the drift exponent would be smaller than 0.5 with {0 <D <0.5}. Since the focus is intraday behaviour, the time interval is limited to a maximum of 240 min.

\displaystyle E(|r|)=c\,\Delta t ^{D}

The expected absolute return ({E(|r|)}) is plotted against the time interval {\Delta t} in a log-log plot. It seems more appropriate to fit the data with separate shorter-end and longer-end models, picking a transition point at 900 seconds (15 minutes). Below that, the drift exponent is above 0.8 for both indices, suggesting the spreads tend to trend when the time interval is short. Beyond this point, the drift exponent falls to nearly 0.5, suggesting the return is not too different from a random walk over longer horizons. This seems to be corroborated by empirical observation: after an actual transaction goes through, the market quotes tend to trend until the orders from other participants who are in the same direction but with higher private reserve prices all get filled.
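The drift exponent estimation amounts to a linear regression in log-log space: regress {\log E(|r|)} on {\log \Delta t} and read off the slope. A minimal sketch, verified on a simulated Gaussian random walk (where D should come out near 0.5):

```python
import numpy as np

def drift_exponent(prices, intervals):
    """Estimate D in E(|r|) = c * dt^D by regressing log E(|r|) on log dt.

    `intervals` are return horizons expressed in numbers of ticks.
    """
    log_dt, log_absret = [], []
    for dt in intervals:
        r = prices[dt:] - prices[:-dt]          # dt-tick spread changes
        log_dt.append(np.log(dt))
        log_absret.append(np.log(np.mean(np.abs(r))))
    D, log_c = np.polyfit(log_dt, log_absret, 1)  # slope is the drift exponent
    return D

# For a Gaussian random walk the drift exponent should be close to 0.5.
rng = np.random.default_rng(2)
walk = np.cumsum(rng.normal(0.0, 1.0, 200_000))
D = drift_exponent(walk, intervals=[1, 2, 4, 8, 16, 32, 64])
```

Running the same regression separately on the sub-900-second and longer intervals reproduces the two-regime fit described above.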

Scaling Law Analysis for ITX34 and XO34

Note 1 A US Person is defined as a US resident, a partnership or corporation formed under US laws, or various types of accounts held for the benefit of a US Person.

Note 2 Non-US Person: largely refers to market participants falling within the EU jurisdiction. Brexit can be a complication here. Given there is still no agreement on the equivalence of financial regulation between the EU and the UK, the UK's FCA temporarily adopted EU rules after the end of the Brexit transition period on 31 Dec 2020. The situation could change pending further negotiations.

Note 3 “An Introduction to High-Frequency Finance”, Dacorogna, Gencay, Muller, Olsen, 2001, Ch 5.5.


The Construction of an Interest Rate Vol Cube

A volatility surface maps time-to-expiration and strike to implied volatility. To capture the implied volatilities for a particular instrument such as an equity index or a currency, a volatility surface is sufficient. Interest rate volatility products are more complex as they have an extra dimension: the tenor of the underlying swap-type instrument. Hence, a volatility cube, which handles the mapping from time-to-expiration, tenor and strike to implied volatility, is required.

Based on [1] and [2], I implemented the algorithms related to the construction of a volatility cube. Swaptions and caps/floors are taken as the raw inputs. In some implementations, other derivatives such as eurodollar futures (ED/ER series) can also be included as extra inputs. I pick the USD rates market on 2021-03-16 as the example throughout this blog. By adding a marker to each cell in the volatility cube for which I have access to a quotation of the underlying instrument, we can see the sparseness of the input data set. While those at the forefront of rates trading would have better access, that would not change the fact that many data points do not have readily available quotations, as the liquidity is simply not there. Some form of volatility extrapolation (“smile-lift”, as explained later) is necessary.

Quote available to build a Volatility Cube (USD as of 2021-03-16). Source: ICAP/Bloomberg

Vol from Cap/Floor & Swaption

As swaption and cap/floor provide complementary information, they are both necessary when building a volatility cube.

One of the main uses of swaptions is to hedge the prepayment risk of mortgage exposure. The demand is thus concentrated at the at-the-money (ATM) or near-ATM strikes. While ATM quotations are readily available for most standard expiration dates and tenors, quotations for Out-of-The-Money (OTM) strikes are limited to only a few of the most popular time-to-expiration and tenor combinations. Also, the range of strikes is usually not far away from the ATM strike.

Introducing interest rate caps/floors as inputs fills some gaps in both the time-to-expiration and strike dimensions. As caps and floors are often utilised to hedge the contractual upper and lower bounds of floating rate obligations, the dealers tend to make markets for a wider set of fixed strikes for expiration dates ranging from 1 year to 20-30 years. However, caps/floors have a very limited range in terms of tenor. A cap/floor should be interpreted as a series of interest rate options with a tenor the same as that of the payment period. Much of the trading follows the market convention, in which the payment period is 3-month for USD, 6-month for GBP, and for EUR 3-month if the maturity is at or below 2 years and 6-month if above. In other words, the tenor is restricted to either 3-month or 6-month. Nevertheless, the wider strike range and comprehensive coverage of different expiration dates make them complementary to the swaption data.

Implied volatilities

For a strike {K}, notional {N}, year fraction between times x and y {\tau_{x,y}} and a reference index rate {L}, e.g. LIBOR, fixed at {t_{i-1}} and paid at {t_i}, the payoffs of the cap, floor and swaption are as follows.

\displaystyle  Cap = N \sum_{i=1}^n P(t_0, t_i) \, \tau_{t_{i-1}, t_i} \, (L(t_{i-1}, t_i)-K)^+

\displaystyle  Floor = N \sum_{i=1}^n P(t_0, t_i) \, \tau_{t_{i-1}, t_i} \, (K-L(t_{i-1}, t_i))^+

\displaystyle  Swaption = N \left( \sum_{i=1}^n P(t_0, t_i) \, \tau_{t_{i-1}, t_i} \, (L(t_{i-1}, t_i)-K) \right)^+

Each swaption is defined by its time-to-expiration, tenor and strike. The calculation of the implied volatility is straightforward: it involves the conversion of the quoted price to an implied volatility using a root finding algorithm. Due to the current low rate environment, either the shifted Black or the Bachelier model is commonly used.

The volatility calculation for a cap requires more in-depth explanation. Recall that a cap is comprised of a series of caplets. There is a flat volatility, which treats the whole cap as if it were a single option. A series of spot volatilities corresponding to the maturity of each caplet can also be calculated. For the volatility cube, the latter is what we require. Let's say the payment frequency is quarterly and the 1-year cap is examined first. Note the convention to ignore the trade date as a reset date. There are three caplets in the 1-year cap, with reset dates at 3-month, 6-month, 9-month and payment dates at 6-month, 9-month, 12-month respectively. Since the caplets are not traded separately, assumptions have to be made about the shape of the caplet volatility curve. A common assumption is that the caplet volatilities are constant within the same year. After processing the 1-year cap, the implied caplet volatility calculation can be advanced to the 2-year cap. There are 4 caplets between year 1 and year 2 (reset dates at 12-, 15-, 18- and 21-month). Subtracting the premium of the 1-year cap from that of the 2-year cap gives the sum of the premiums for these four caplets. The implied volatility calculation can then be repeated for the rest of the interest rate caps as well as the floors.
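The stripping step above can be sketched as follows. All inputs (quarterly forwards, discount factors, the 2-year cap premium and the already-stripped year-1 caplet vol) are hypothetical numbers chosen only to make the example run; the Bachelier caplet formula is used for illustration. The year-1 caplet premium is subtracted from the 2-year cap premium, then a constant year-2 caplet vol matching the residual is found by bisection.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def caplet(F, K, T_reset, tau, df, sigma):
    """Bachelier caplet price: discounted, tau-weighted normal call on the forward."""
    d = (F - K) / (sigma * math.sqrt(T_reset))
    return df * tau * ((F - K) * norm_cdf(d) + sigma * math.sqrt(T_reset) * norm_pdf(d))

# Hypothetical quarterly forwards, discount factors and a 1% strike.
K = 0.01
resets = [0.25, 0.50, 0.75, 1.00, 1.25, 1.50, 1.75]   # caplet reset dates (years)
fwds   = [0.008, 0.009, 0.010, 0.011, 0.012, 0.013, 0.014]
dfs    = [0.998, 0.996, 0.994, 0.992, 0.990, 0.988, 0.986]  # at payment dates

sigma_y1 = 0.0050          # year-1 caplet vol, assumed already stripped
cap_2y_premium = 0.004     # hypothetical quoted 2-year cap premium (per unit notional)

# Premium of the year-1 caplets (resets at 3, 6, 9 months).
year1 = sum(caplet(fwds[i], K, resets[i], 0.25, dfs[i], sigma_y1) for i in range(3))

# Bisect for a constant year-2 caplet vol matching the remaining premium
# (the year-2 caplets reset at 12, 15, 18 and 21 months).
target = cap_2y_premium - year1
lo, hi = 1e-6, 0.05
for _ in range(100):
    mid = 0.5 * (lo + hi)
    p = sum(caplet(fwds[i], K, resets[i], 0.25, dfs[i], mid) for i in range(3, 7))
    if p < target:
        lo = mid
    else:
        hi = mid
sigma_y2 = 0.5 * (lo + hi)
```

Repeating this for each successive cap maturity (and the floors) yields the full spot caplet volatility curve.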

Filling in the Blanks

The table below shows the number of implied volatility calculations available for each time-to-expiration and tenor tuple (referred to as a “cell” in this section). There are three main types of cells when classified in terms of data source: 1. with cap/floor data (grey cells), 2. with ATM and OTM swaption data (beige cells), 3. with ATM swaption data only (turquoise cells). In the first type of cells (in grey), implied volatilities are derived from cap and floor data. Examining the raw input, each cell has cap/floor prices at fixed strikes from 0.5% to 7%, with a 0.5% increment for the lower rates and 1.0% for the higher end. ATM caps are also available. The second type of cells (in beige) has both ATM and OTM swaption data. The strikes are relative to ATM (e.g. ATM {\pm0.5\%, \pm1\%, \pm2\%, \pm3\%}) rather than fixed. The third type of cells (in turquoise) only has a volatility calculation at the ATM strike.

Number of implied volatilities calculations available for each expiration and tenor (as of 2021-03-16 USD Volatility Cube)

SABR is a popular model which can be used to capture the volatility smile. Usually {\beta} for SABR is preselected, leaving three parameters to estimate. The first two types of cells (in grey and beige) all have sufficient data to build a volatility smile. The implied volatility at any strike of these cells can then be found using the fitted SABR model.

For the third type of cells (in turquoise), there is only an implied volatility estimate at the ATM strike. We have to assume the shape of the volatility smile is similar to that of another cell of either type 1 (grey) or type 2 (beige). We can either pick a neighbouring cell or, if nothing is available close by, the cap with matching expiration date can be selected. This procedure is known as “smile-lift”.

Smile-lift can be done in a number of different ways.

  • Moneyness Adjustment
    Suppose {\sigma_u=\sigma(\epsilon_u, \tau_u, k_u)} for cell {u}, with time-to-expiration {\epsilon_u}, tenor {\tau_u} and strike {k_u}, is the unknown volatility we want to find, and let cell {a} be the one with a fitted volatility smile model.

    Define relative strike {\tilde{k}}

    \displaystyle \tilde{k} = k \frac{ k_a^{ATM}}{k_u^{ATM}}

    where {k_a^{ATM}} is the ATM strike at expiration of {\epsilon_a} and tenor of {\tau_a}.

    If using Bachelier option quoting model,

    \displaystyle \sigma_u= \sigma(\epsilon_u, \tau_u, k_u^{ATM})+\sigma(\epsilon_a, \tau_a, \tilde{k})-\sigma(\epsilon_a, \tau_a, k_a^{ATM})

    If using Black option quoting model,

    \displaystyle \sigma_u= \sigma(\epsilon_u, \tau_u, k_u^{ATM})*\sigma(\epsilon_a, \tau_a, \tilde{k})/\sigma(\epsilon_a, \tau_a, k_a^{ATM})

  • SABR Lift
    Let {\alpha_a, \beta_a, \nu_a, \rho_a} be the parameters of the SABR model fitted to cell {a}. The parameters {\beta_a, \nu_a} and {\rho_a} represent the shape of the smile. To lift the volatility smile from cell {a} to cell {u}, the parameters {\beta_u, \nu_u} and {\rho_u} are set to be the same as {\beta_a, \nu_a} and {\rho_a}. {\alpha} is a volatility-like parameter. {\alpha_u} is calibrated such that the SABR model at cell {u} reproduces the ATM swaption volatility at {k_u^{ATM}}.

It is also possible to derive a structural model by relating the SDEs of the forward swap rate to forward rates. Please refer to [2] for details.
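The moneyness adjustment is simple enough to sketch directly. The quadratic smile for cell a below is an arbitrary stand-in for a fitted SABR smile; the ATM strikes and vols are made-up numbers. Note the sanity check built into the construction: the relative strike maps the ATM of cell u onto the ATM of cell a, so the lifted vol at the ATM strike reproduces the known ATM vol exactly.

```python
def lift_bachelier(sigma_u_atm, atm_u, atm_a, smile_a, k):
    """Additive (Bachelier) moneyness-adjusted smile lift from cell a to cell u.

    smile_a is the fitted smile of cell a as a callable strike -> normal vol.
    """
    k_tilde = k * atm_a / atm_u          # relative strike mapped into cell a
    return sigma_u_atm + smile_a(k_tilde) - smile_a(atm_a)

def lift_black(sigma_u_atm, atm_u, atm_a, smile_a, k):
    """Multiplicative (Black) moneyness-adjusted smile lift."""
    k_tilde = k * atm_a / atm_u
    return sigma_u_atm * smile_a(k_tilde) / smile_a(atm_a)

# Hypothetical quadratic smile fitted to cell a (ATM strike 1.2%).
atm_a = 0.012
smile_a = lambda k: 0.0060 + 8.0 * (k - atm_a) ** 2

# Lift onto cell u with ATM strike 1.5% and ATM normal vol of 65bp.
vol_atm = lift_bachelier(0.0065, atm_u=0.015, atm_a=atm_a, smile_a=smile_a, k=0.015)
vol_otm = lift_bachelier(0.0065, atm_u=0.015, atm_a=atm_a, smile_a=smile_a, k=0.020)
```

The SABR-lift variant differs only in that the shape parameters themselves, rather than the vol differences, are carried over from cell a.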

A Worked Example

Smile-lift is a form of extrapolation with many different possible approaches. For instance, we can lift the smile from a cap volatility cell far away. We can also lift the smile from nearby cells with both ATM/OTM swaption quotes and take a weighted average between them. There is no right or wrong answer; we have to test against real data to tell which approach works better.

A worked example is shown here without any attempt to show which approach is statistically better. The target is the 2Y x 10Y (time-to-expiration x tenor) USD swaptions with different strikes on 2021-03-16. The implied volatilities of these strikes are known and will be used as a reference when comparing with the outputs generated by different smile lifting approaches.

Three different approaches are demonstrated here. In the first approach (labelled “Cap”), the volatility smile is lifted from the caps at 2Y x 3M. In the second approach (labelled “Swptn”), the volatility smile is lifted from the swaptions twice: first from 1Y x 10Y, then from 5Y x 10Y. The volatilities are calculated from each volatility smile before computing the weighted average as {\sigma = 0.75 * \sigma_{1Y x 10Y} + 0.25 * \sigma_{5Y x 10Y}}. In both approaches, a moneyness adjustment is carried out to lift the smile. A simple average of the first and second approaches is also calculated (labelled “Avg”).

The comparison is shown in the table below. The values are Bachelier volatilities in basis points. While this is a worked example with a single comparison performed for a single date, it still gives us some idea of the magnitude of the estimation error. At least for this worked example, the estimates from either the first or the second approach are not too bad, with the averaging cancelling out much of the residual error.

Worked example for the volatility calculation of the 2Y x 10Y point using different smile lifting approaches

Note 1 “Interest Rate Volatility Cube: Construction and Use”, Hagan & Konikov, 2004.

Note 2 “The Perfect Smile: Filling the Gaps in the Swaption Volatility Cube”, Deloitte, 2018.

Credit Rating Assignment by Supervised Learning

The aim of this study is to see how well we can assign credit ratings to corporate entities by applying supervised learning techniques to widely available financial data.

Starting from the top 1000 US listed companies, the financial and real estate companies are first removed (as separate models may be required due to their substantial differences in financial metrics when compared with other corporates). Those with a debt-to-asset ratio of less than 20% or listed for less than 3 years are also removed, leaving 450 companies as the universe for this study. The S&P long term corporate rating is chosen as the raw label to target, and widely available financial data (total assets, debt, cash, net income, FCF etc. for the last year, the annual sales and EBITDA for the last 6 years, along with a limited amount of share price data: market cap and realised price volatility for the last year) are taken as the raw feature set.

The data preprocessing and feature engineering procedures performed are pretty standard. New features are generated from the raw feature input. They are mostly financial ratios (such as profit margin, EV/EBITDA ratio, interest coverage) or sales/EBITDA growth rates across different time frames. The sectoral averages are also compared with the metrics of individual companies to form new features. After feature engineering, missing data are replaced by the group median of the respective fields, highly skewed variables are smoothed by a Box-Cox transformation (necessary for some algorithms, such as Lasso regression, that are sensitive to highly skewed data) and the categorical variable (only the sector field in this data set) is converted into indicator variables. For the raw labels, the S&P rating watch indicators are stripped before converting them onto an ordinal scale (AAA: 1, AA+: 2, …, BBB-: 10, BB+: 11, …, CCC+: 17). In this way, the learning task can be formulated as a regression problem.
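The label encoding can be sketched as a small lookup table. The watch-indicator format (`"*-"`, `"*+"` suffixes) is an assumption about how the raw labels arrive; the ordinal values match the scale quoted above.

```python
# Build the ordinal scale AAA=1, AA+=2, ..., BBB-=10, BB+=11, ..., CCC+=17
# (the loop also extends past CCC+ for completeness).
LETTERS = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC"]
SCALE = {}
notch = 1
for letter in LETTERS:
    grades = [letter] if letter == "AAA" else [letter + "+", letter, letter + "-"]
    for g in grades:
        SCALE[g] = notch
        notch += 1

def encode(raw_label: str) -> int:
    """Strip any watch indicator (e.g. 'BBB+ *-') and map onto the ordinal scale."""
    return SCALE[raw_label.split("*")[0].strip()]
```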

The data set is not large. We use 5-fold cross-validation on the training data (67% of the data set) when training the models (including tuning the hyperparameters). We run a random grid search algorithm when tuning the hyperparameters. 33% of the data (about 150 companies) is retained as the test set. The Root Mean Squared Error (RMSE) is chosen as the performance metric.

The performance of the following supervised learning algorithms will be examined.

  • Logistic Regression
  • Lasso Regression
  • Kernel Ridge Regression
  • Elastic Net
  • Gradient Boosting
  • Random Forest
  • Stacking Averaged Model

As much of this supervised learning study is based upon a standard machine learning workflow, it suffices to leave the implementation details in the Jupyter notebook linked below. There are a couple of interesting points related to feature generation and the stacked averaged model that will be discussed separately before examining the results.

Feature generation for financial ratios

Financial ratios are often more useful than raw accounting numbers. For instance, it is difficult to tell whether the debt is too high unless we also know the company's ability to repay it. The leverage ratio, calculated as debt/EBITDA, is thus a much better reflection of debt affordability. When communicating with other people about financial ratios, most tend to stick to conventions, e.g. profit margin is calculated as net income/net sales (not net sales/net income), the leverage ratio is debt/EBITDA, the EV multiple is EV/EBITDA and so on.

However, machine learning algorithms tend to work better if the feature is consistent: if we rank Company A better than B and B better than C, the feature should follow the same order. The choice of ratio vs inverse ratio matters. Profit margin fits the bill: all else being equal, a higher margin means a better business. However, the leverage ratio (along with many ratios that link income statement to balance sheet figures) is not consistent. Consider a company that issued 100MM of debt when it initially generated 50MM EBITDA. The gross leverage ratio was a healthy 2x initially. Then the company suffered a declining business, with EBITDA reducing to 20MM, 0MM and -10MM in subsequent years. The corresponding leverage ratios are 5x, infinity and -10x. The leverage ratio goes up to infinity then switches sign as the business declines. This would confuse many algorithms. The solution is to use the inverse ratio. If we define inverse gross leverage as EBITDA/debt, we get 0.5, 0.2, 0, -0.1 as the time series. Now the new feature is consistent. In general, it is better to keep the number which is always positive in the denominator when generating financial ratios.
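The worked example above, in code form, with the numbers from the text:

```python
# A declining business: EBITDA falls from 50MM to -10MM against 100MM of debt.
debt = 100.0
ebitda_path = [50.0, 20.0, 0.0, -10.0]

# Conventional leverage (debt / EBITDA) blows up to infinity and flips sign...
leverage = [debt / e if e != 0 else float("inf") for e in ebitda_path]

# ...while inverse leverage (EBITDA / debt) declines monotonically,
# giving a consistently ordered feature.
inverse_leverage = [e / debt for e in ebitda_path]
```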

Stacked Averaged Algorithm

Other than the stacked averaged algorithm, the rest are off-the-shelf implementations, hence a bit more explanation is provided here. The stacked averaged algorithm is a form of ensemble algorithm. It aims to improve the forecast by aggregating the predictions of N weaker learners with a meta algorithm. The meta algorithm is often a simple learner like a Lasso or Kernel Ridge regressor. In the training phase, the training data is split into k folds and fitted to the weaker learners, resulting in k copies of each weaker learner (or k*N fitted models altogether). In the prediction phase, the feature data is fed to the k copies of each weaker learner model to generate k predictions, which are then averaged. The process is repeated for each of the N weak learners before passing a vector of length N on to the meta algorithm.
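The description above can be sketched with a toy implementation. This is numpy only, with two stand-in base learners (a linear least-squares fit and a mean predictor) in place of the Lasso/Kernel Ridge and tree models used in the study, and plain least squares as the meta algorithm; the study's actual notebook uses scikit-learn estimators.

```python
import numpy as np

class StackedAverager:
    """Minimal stacked-averaging sketch: base learners expose fit/predict,
    the meta learner is least squares over out-of-fold predictions."""

    def __init__(self, base_learners, k=5, seed=0):
        self.base_learners = base_learners  # list of learner classes
        self.k, self.seed = k, seed

    def fit(self, X, y):
        n = len(y)
        folds = np.array_split(np.random.default_rng(self.seed).permutation(n), self.k)
        self.fitted_ = [[] for _ in self.base_learners]  # k copies per learner
        oof = np.zeros((n, len(self.base_learners)))     # out-of-fold predictions
        for fold in folds:
            mask = np.ones(n, dtype=bool)
            mask[fold] = False
            for j, learner in enumerate(self.base_learners):
                model = learner().fit(X[mask], y[mask])
                self.fitted_[j].append(model)
                oof[fold, j] = model.predict(X[fold])
        # Meta step: least-squares weights over the N out-of-fold columns.
        self.w_, *_ = np.linalg.lstsq(oof, y, rcond=None)
        return self

    def predict(self, X):
        # Average the k copies of each base learner, then apply the meta weights.
        base = np.column_stack([
            np.mean([m.predict(X) for m in models], axis=0)
            for models in self.fitted_])
        return base @ self.w_

class LinReg:
    def fit(self, X, y):
        A = np.column_stack([X, np.ones(len(X))])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self
    def predict(self, X):
        return np.column_stack([X, np.ones(len(X))]) @ self.coef_

class MeanModel:
    def fit(self, X, y):
        self.mu_ = y.mean()
        return self
    def predict(self, X):
        return np.full(len(X), self.mu_)

# Synthetic check: nearly-linear data, so the meta weights should lean on LinReg.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(0.0, 0.1, 300)
model = StackedAverager([LinReg, MeanModel]).fit(X, y)
rmse = np.sqrt(np.mean((model.predict(X) - y) ** 2))
```

Fitting the meta weights on out-of-fold predictions (rather than in-sample ones) is the key design choice: it stops the meta learner from simply rewarding whichever base learner overfits the most.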

Results and Discussion

RMSE, measured in notches, is the performance metric used. To be clear, the difference between A- and BBB+ is 1 notch in rating. As shown in the cross-validation data, the best algorithm is the stacked averaged model (RMSE = 1.43 {\pm} 0.05). Apart from logistic regression, the other algorithms are statistically no worse than the stacked averaged model. The out-of-sample performances are more or less the same. Considering that only quantitative data is used here (CRAs use both qualitative and quantitative inputs, their data are much better cultivated and some materials are non-public) and that the RMS difference between ratings assigned by different CRAs is close to 1 notch, the performances of these supervised learning algorithms are not bad. Potentially, supervised learning models can be used as a more responsive market monitoring tool.

It is useful to understand what contributes to the performance of these algorithms. Feature importance statistics are readily available once a random forest model is trained. The most important feature is the market cap. Basically, it is voting by the equity market: the strongest companies at this moment are rewarded with the highest share prices while the weakest ones see their share prices tank. The second and third most important features are two different adjusted inverse gross leverage ratios – one adjusted for the historical equity volatility, the other a comparison against the sector average. While the leverage ratio is a key indicator of debt affordability, what is deemed acceptable (or too high) varies between industries. Intuitively, a stable business with low fixed costs can sustain more debt. This type of domain knowledge is not available in a quantitative data set; the adjustments partially fill the gap. Inverse gross leverage (defined as EBITDA/gross debt) is the larger the better. The historical equity volatilities of businesses deemed stable tend to be lower. Given two companies with the same inverse gross leverage ratio, dividing it by the respective historical equity volatility would benefit the one with the more stable business. Similarly, dividing inverse gross leverage by its sectoral average introduces a reference point for comparison.
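For illustration, the statistics can be read off the `feature_importances_` attribute of a fitted scikit-learn random forest. The data and feature names below are synthetic stand-ins, not the study's data set:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-ins for the real features discussed above (names illustrative)
rng = np.random.RandomState(0)
n = 500
market_cap = rng.lognormal(mean=10.0, sigma=1.0, size=n)
inv_gross_leverage = rng.normal(0.3, 0.1, size=n)
profit_margin = rng.normal(0.1, 0.05, size=n)

X = np.column_stack([market_cap, inv_gross_leverage, profit_margin])
# A made-up rating proxy (in notches) driven mostly by market cap
y = 0.8 * np.log(market_cap) + 2.0 * inv_gross_leverage + rng.normal(0, 0.1, n)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
for name, imp in zip(["market_cap", "inv_gross_leverage", "profit_margin"],
                     rf.feature_importances_):
    print(f"{name}: {imp:.3f}")
```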

Link to Jupyter Notebook


Pragmatic Note on Volatility Model When Rates Turn Negative

In the interest rate volatility market, participants often employ two sets of models – one for quoting the market (i.e. mapping traded option prices into volatility terms, initially using the Black model and migrating to shifted Black or Bachelier in recent years), and another for modelling the underlying volatility dynamics, e.g. the volatility smile (using SABR and maybe Hull-White). Any model with an implicit lognormal assumption will fail under zero or negative rates. As interest rates in much of the developed world have fallen close to or below zero following the 2008 global financial crisis and the 2012 European sovereign debt crisis, earlier models and conventions needed to be adjusted. Here are some of my notes on this topic.

Changes to the Option Quotation Model

The Black model assumes a lognormal distribution of the forward rate with {dF = \sigma_B F dW}. It was the market convention for trading rate volatility products in the 90s and 00s. The prices of a European call {C} and put option {P} are

\displaystyle  C=D[F\Phi(d_1) - K\Phi(d_2)]

\displaystyle  P=D[K\Phi(-d_2) - F\Phi(-d_1)]


\displaystyle  d_1 = \frac{ln(F/K)+\sigma_B^2 T/ 2}{\sigma_B \sqrt{T}}

\displaystyle  d_2=d_1- \sigma_B \sqrt{T}

{\Phi()} is the cumulative normal distribution function. {\sigma_B}, {F}, {K}, {T} are the Black volatility, forward rate, strike and the time-to-expiry of the option respectively. {D} is a discount factor scaled by the amount of derivatives being held. For example, in the case of a caplet with a notional of {N}, a time interval of {\delta} between the rate reset and payment date and a discount factor {df} at the payment date, {D=N*\delta*df}. In the case of a swaption, {D = N*A} if we define the annuity {A} as the sum over payment periods of the product of discount factor and time interval.
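As a quick sanity check, the formulas above can be coded in a few lines (a sketch using only the standard library; the function name is mine):

```python
from math import log, sqrt
from statistics import NormalDist


def black_call_put(F, K, sigma_B, T, D=1.0):
    """Black model call and put prices per the formulas above.
    D carries the notional/annuity scaling (e.g. N*delta*df for a caplet)."""
    Phi = NormalDist().cdf
    d1 = (log(F / K) + 0.5 * sigma_B**2 * T) / (sigma_B * sqrt(T))
    d2 = d1 - sigma_B * sqrt(T)
    call = D * (F * Phi(d1) - K * Phi(d2))
    put = D * (K * Phi(-d2) - F * Phi(-d1))
    return call, put
```

Put-call parity, {C - P = D(F-K)}, offers a simple check of the implementation.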

When the interest rate being considered (either the forward or the strike) is very low, the solution of the Black model becomes very unstable. When either becomes zero or negative, the solution of the Black model is undefined. We can resolve this by adopting either a shifted Black model or a Bachelier (normal) model.

Solution 1: Shifted Black Model

The idea of the shifted Black model is very simple. We pick a “low enough” interest rate level that we do not expect the interest rate would ever fall below (say -3%) and add this amount to all the inputs. There are two main drawbacks: 1. The shifted Black volatility depends on the shift amount. If two dealers quote volatilities with different shift amounts, a conversion has to be done before the levels can be compared. 2. The process of selecting the shift amount can be a bit arbitrary.

Solution 2: Bachelier (Normal) Model

The alternative is to use Bachelier’s century-old model, which assumes that the forward rate follows a normal distribution {dF = \sigma_N dW}. It is also known as the Normal model. The European call and put option prices become

\displaystyle  C=D[(F-K) \Phi(d) + \sigma_N \sqrt{T} \phi(d)]

\displaystyle  P=D[(K-F) \Phi(-d) + \sigma_N \sqrt{T} \phi(d)]

\displaystyle d=(F-K)/(\sigma_N \sqrt{T})

with {\phi()} being the probability density function of normal distribution.

Some quick notes on normal volatility: first, normal volatility is shift-invariant as {dF} does not depend on the forward rate. This will become relevant later on. Second, the market convention for normal volatility is to quote in bps (rather than in percent as in the case of Black volatility).
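A matching sketch of the Bachelier formulas, which remain well-defined for zero or negative forwards and strikes (again standard library only, function name mine):

```python
from math import sqrt
from statistics import NormalDist


def bachelier_call_put(F, K, sigma_N, T, D=1.0):
    """Bachelier model call and put prices per the formulas above.
    sigma_N is in absolute terms (e.g. 0.0050 for a 50 bps quote).
    Works for zero or negative forwards and strikes."""
    nd = NormalDist()
    d = (F - K) / (sigma_N * sqrt(T))
    call = D * ((F - K) * nd.cdf(d) + sigma_N * sqrt(T) * nd.pdf(d))
    put = D * ((K - F) * nd.cdf(-d) + sigma_N * sqrt(T) * nd.pdf(d))
    return call, put
```

Note that put-call parity {C - P = D(F-K)} holds here as well, even with negative rates.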

Brief Review of the SABR Vol Model

SABR is a popular volatility model for interest rate products. Take swaption as an example: dealer quotes are often available for just the at-the-money and a few popular out-of-the-money strikes (say, ATM, ATM{\pm}100bp, ATM{\pm}200bp etc) for a given swaption expiry date and tenor. If we need the volatility at a strike different from the known values, we can use SABR as a volatility smile model to handle the interpolation.

SABR is a four-parameter model with parameters {\alpha, \beta, \nu, \rho} with {\alpha \geq 0}, {0 \leq \beta \leq 1}, {\nu \geq 0} and {-1 < \rho <1}.

\displaystyle  \begin{array}{rcl}  dF &=& \hat{\alpha} F^\beta \, dW_1 \\ d\hat{\alpha} &=& \nu \hat{\alpha} \, dW_2, \quad \hat{\alpha}(0)=\alpha \\ dW_1 dW_2 &=& \rho dt \end{array}

{\beta} is often chosen by the user instead of backed out from the data. Setting {\beta} to zero is a special case: the instantaneous change of the forward rate then does not depend on its current value, and this is known as the stochastic normal SABR model. {\alpha} is a volatility-like parameter and {\nu} is a vol-of-vol-like parameter. {\rho} is the correlation coefficient between the two sources of random noise. The effect of changing each SABR parameter is shown in the series of charts below.

The effect of changing different SABR parameters on the quoted Bachelier volatility. The base case assumes {\alpha=0.001, \beta=0.5, \nu=0.2, \rho=0.3, fwd=0.03, t=5} and one parameter is changed at a time

Hagan [1] derived closed-form approximations which map the SABR parameters to either Black or Bachelier volatilities. Define {T} as the option expiry, and {z} and {\chi(z)} as

\displaystyle  z = \frac{\nu}{\alpha} (FK)^{\frac{1-\beta}{2}} \log{\frac{F}{K}}

\displaystyle  \chi(z) = \log{\frac{\sqrt{1-2\rho z+z^2}+z-\rho}{1-\rho}}

SABR to Black Volatility:

\displaystyle   \sigma_B(F,K) = \frac{\alpha \left[ 1+ \left( \frac{(1-\beta)^2 \alpha^2}{24 (FK)^{1-\beta}}+\frac{\alpha \beta \nu \rho}{4 (FK)^{(1-\beta)/2}}+ \frac{2-3 \rho^2}{24} \nu^2 \right) T \right]}{ (FK)^{(1-\beta)/2} \{1+\frac{(1-\beta)^2}{24}\log^2{\frac{F}{K}}+\frac{(1-\beta)^4}{1920}\log^4{\frac{F}{K}}\} } \frac{z}{\chi(z)} \ \ \ \ \ (1)

SABR to Bachelier Volatility:

\displaystyle  \begin{array}{rcl}  c_1 &=& \dfrac{1+\frac{1}{24}\log^2\frac{F}{K}+\frac{1}{1920}\log^4\frac{F}{K}}{1+\frac{(1-\beta)^2}{24}\log^2\frac{F}{K}+\frac{(1-\beta)^4}{1920}\log^4\frac{F}{K}} \\ c_2 &=& \left[1+\frac{-\alpha^2 \beta (2-\beta)}{24 (FK)^{1-\beta}}+\frac{\alpha \beta \nu \rho}{4 (FK)^{(1-\beta)/2}}+\frac{2-3\rho^2}{24}\nu^2 \right] T \end{array}

\displaystyle   \sigma_N(F,K) = \alpha (FK)^{\beta/2}\, c_1 \, c_2 \, \frac{z}{\chi(z)} \ \ \ \ \ (2)

Continuing with our swaption example: we first calculate the implied volatilities of the quoted swaptions with our chosen option model. We then fit the paired strike and implied volatility list to Eq 1 or 2 by constrained optimisation to obtain the SABR parameters.
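The calibration step can be sketched with scipy's `least_squares`: implement Eq 2, then fit {\alpha, \nu, \rho} (with {\beta} fixed by the user) to the quoted strike/volatility pairs. The function names and starting values are illustrative, not a production implementation:

```python
import numpy as np
from scipy.optimize import least_squares


def sabr_normal_vol(F, K, T, alpha, beta, nu, rho):
    """Hagan approximation (Eq 2): SABR parameters -> Bachelier volatility."""
    logFK = np.log(F / K)
    z = (nu / alpha) * (F * K) ** ((1 - beta) / 2) * logFK
    chi = np.log((np.sqrt(1 - 2 * rho * z + z**2) + z - rho) / (1 - rho))
    atm = np.abs(z) < 1e-12                 # z/chi(z) -> 1 as K -> F
    zx = np.where(atm, 1.0, z / np.where(atm, 1.0, chi))
    c1 = (1 + logFK**2 / 24 + logFK**4 / 1920) / \
         (1 + (1 - beta)**2 * logFK**2 / 24 + (1 - beta)**4 * logFK**4 / 1920)
    c2 = 1 + (-(alpha**2) * beta * (2 - beta) / (24 * (F * K) ** (1 - beta))
              + alpha * beta * nu * rho / (4 * (F * K) ** ((1 - beta) / 2))
              + (2 - 3 * rho**2) / 24 * nu**2) * T
    return alpha * (F * K) ** (beta / 2) * c1 * c2 * zx


def calibrate_sabr(F, T, strikes, vols, beta=0.5):
    """Back out alpha, nu, rho from quoted (strike, normal vol) pairs, beta fixed."""
    def resid(p):
        alpha, nu, rho = p
        return sabr_normal_vol(F, strikes, T, alpha, beta, nu, rho) - vols
    fit = least_squares(resid, x0=[0.001, 0.2, 0.0],
                        bounds=([1e-8, 1e-8, -0.999], [np.inf, np.inf, 0.999]))
    return fit.x
```

A quick round-trip test – generating volatilities from known SABR parameters and recovering them by calibration – is a useful check of such an implementation.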

Hagan applied a singular perturbation technique to derive the above approximations. For points far away from the ATM strike, the solutions can deviate by a not insignificant amount from solutions obtained by solving the SDEs numerically (e.g. by Monte Carlo).

Changes to Vol Model

Coming back to the topic of negative rates. For implementations based on Hull-White, rates can be negative and no adjustment is required, as Hull-White assumes the short rate follows a normal distribution {dr = (\theta-\alpha r)\,dt+\sigma\,dW}.

The SABR volatility model is, however, affected when rates approach or fall below zero. As {dF \propto F^\beta}, the SABR model is undefined for {0<\beta \leq 1} if rates become zero or negative. There are three common solutions, corresponding to replacing {F^\beta} with {(F+x)^\beta}, {1} and {|F|^\beta} respectively.

Solution 1: Shifted Model

Again we can pick a “low enough” interest rate level that we do not expect the interest rate would ever reach, and add that amount to all the inputs (forward rate, strikes). As before, determining such an adjustment can be a bit arbitrary.

Solution 2: Stochastic Normal SABR

Setting {\beta} to 0 turns the SABR model into its stochastic normal special case. The term {c_1} of Eq 2 then becomes 1 and the log forward or log strike calculations become unnecessary, so it can handle negative interest rates without any further modification. However, the increment of the forward rate {dF} is now independent of its current value. For some, it is arguable whether this is conceptually appropriate [2].

Solution 3: Modified Models e.g. Free SABR

Antonov [3] modifies the SDE by taking the absolute value of the forward rate: {dF = \hat{\alpha} |F|^\beta dW_1}. This introduces a fairly sharp peak in the probability density around zero, which can be explained as the market’s natural tendency to price a high chance that the interest rate would hover around, but not far below, zero. By now, most would agree that central banks have played a crucial role in driving short term rates negative. A case in point: the short end of the Euro curve has been below zero (around -25 bps or lower) for a number of years since 2016, and the market expectation is likely to peak around this negative ECB policy rate. The model has no flexibility to place the pdf peak below zero.

Probability density of the free SABR model (blue solid). Graph is from [3]

Note 1 “Managing Smile Risk”, Hagan, Kumar, Lesniewski & Woodward, Wilmott Magazine, 2002.

Note 2 “Stability of the SABR model”, Deloitte, 2016.

Note 3 “The Free Boundary SABR: Natural Extension to Negative Rates”, Antonov, Konikov & Spector, SSRN, 2015.

Nelson-Siegel Yield Curve Model

Yield curves are obtained by bootstrapping the interest rate information contained in a range of risk-free/near risk-free fixed income instruments (deposit rates, LIBOR, FRAs, interest rate futures, interest rate swaps, OIS swaps, government bonds…).  Rather than treating each data point separately, many are more interested in the term structure of the yield curve.  The Nelson-Siegel (and its extension Nelson-Siegel-Svensson) exponential components framework is one of the most popular yield curve models.  The forward rates are readily available once a Nelson-Siegel model is obtained.  As shown by Diebold & Li, the three time-varying components in Nelson-Siegel can be interpreted as factors corresponding to level, slope and curvature.  These factors have also been shown by Diebold & Li to have some forecasting power.  We will go through the meaning of the various components of the Nelson-Siegel model, its implementation and other alternatives in yield curve modelling. Continue reading “Nelson-Siegel Yield Curve Model”

Nelson Siegel Model – Python Source Code

This program implements the Nelson-Siegel and Nelson-Siegel-Svensson yield curve models.  Grid-based OLS is chosen as the parameter estimation algorithm.  Last update: changed OLS to weighted OLS to improve model fitting performance.
  • Author: David Y
  • Date of release: 2019-11-13
  • Date of last update: 2019-12-01
  • Version: 1.1.0
  • License: BSD
  • Language: Python (tested in python 3.7.1)

click here to download

Does Taking Higher Risk Lead to More Return In Bonds?

The low volatility anomaly is well-known in equities.  Holding a basket of shares with the highest beta does not generate the highest return; this has been shown across many regions and periods.  A similar mechanism may be at work in bonds.  The yield is higher when going down the rating spectrum, but beyond a certain point it does not fully compensate for the deterioration in credit quality.  Examining 20 years of Bloomberg Barclays bond indices for US and European corporates, buying and holding the riskiest credit did not generate a good return.  There seems to be a sweet spot when going down the credit spectrum. Continue reading “Does Taking Higher Risk Lead to More Return In Bonds?”

Pairs Trade – Practice

In the last post, we reviewed some theory related to pairs trade.  In this post, we will go through a textbook case of arbitrage to show what various test statistics should look like.  We also introduce the half-life of mean reversion and the Hurst exponent as performance indicators.  We then look into a possible implementation of a mean-reversion strategy before discussing real-world issues in pairs trade.

Continue reading “Pairs Trade – Practice”

Pairs Trade – Theory

Pairs trade is one of the simplest market-neutral statistical arbitrage strategies.  The goal is to find a pair of securities which historically move up and down in a highly correlated fashion but whose price differential is temporarily at an extreme.  We then go long the relatively cheap security and simultaneously short the other.  Hopefully, the price differential promptly reverts to normal so that we can realise some profit.  We review some useful statistical concepts (such as co-integration and stationary processes) and discuss the types of securities that are likely to form good trading pairs.  Continue reading “Pairs Trade – Theory”