The aim of this study is to see how well we can assign credit ratings to corporate entities by applying supervised learning techniques to widely available financial data.

Starting from the top 1000 US-listed companies, financial and real estate companies are first removed (a separate model may be required for them, because their financial metrics differ substantially from those of other corporates). Companies with a debt-to-asset ratio below 20% or listed for less than 3 years are also removed, leaving 450 companies as the universe for this study. The S&P long-term corporate rating is chosen as the raw target label, and widely available financial data (total assets, debt, cash, net income, FCF, etc. for the last year; annual sales and EBITDA for the last 6 years; and a limited amount of share price data, namely market cap and realised price volatility for the last year) are taken as the raw feature set.
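The universe filters above can be sketched with pandas. This is a minimal illustration, not the study's code: the column names (`sector`, `total_debt`, `total_assets`, `years_listed`) and the sector labels in `EXCLUDED_SECTORS` are hypothetical and would need to match whatever the real data source uses.

```python
import pandas as pd

# Hypothetical sector labels for the industries excluded from the study
EXCLUDED_SECTORS = {"Financials", "Real Estate"}


def build_universe(companies: pd.DataFrame) -> pd.DataFrame:
    """Keep non-financial, non-real-estate companies with debt/assets >= 20%
    that have been listed for at least 3 years (column names are assumed)."""
    debt_to_asset = companies["total_debt"] / companies["total_assets"]
    keep = (
        ~companies["sector"].isin(EXCLUDED_SECTORS)
        & (debt_to_asset >= 0.20)
        & (companies["years_listed"] >= 3)
    )
    return companies.loc[keep]


# Toy input: only "IND" survives all three filters
toy = pd.DataFrame({
    "ticker": ["IND", "BNK", "LOW", "NEW"],
    "sector": ["Industrials", "Financials", "Industrials", "Energy"],
    "total_debt": [40.0, 50.0, 10.0, 30.0],
    "total_assets": [100.0, 100.0, 100.0, 100.0],
    "years_listed": [10, 20, 10, 2],
})
universe = build_universe(toy)
print(universe["ticker"].tolist())
```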

The data preprocessing and feature engineering procedures are fairly standard. New features are generated from the raw inputs. Most are financial ratios (such as profit margin, EV/EBITDA and interest coverage) or sales/EBITDA growth rates over different time frames. Sector averages are also compared with the metrics of individual companies to form further features. After feature engineering, missing values are replaced by the group median of the respective fields, highly skewed variables are smoothed with a Box-Cox transformation (necessary for some algorithms, such as Lasso regression, that are sensitive to highly skewed data), and the categorical variable (only the sector field in this data set) is converted into indicator variables. For the raw labels, the S&P rating watch indicators are stripped before the ratings are converted to an ordinal scale (AAA: 1, AA+: 2, …, BBB-: 10, BB+: 11, …, CCC+: 17). In this way, the learning task can be formulated as a regression problem.
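A minimal sketch of the label encoding and preprocessing steps described above, under several stated assumptions. The watch-indicator format (a suffix such as `*-` or `/Watch Neg` after the rating) is assumed; the grouping key for median imputation is assumed to be the sector; and the skewness threshold of 0.75 and the fixed lambda of 0.15 are illustrative choices. This sketch swaps in SciPy's `boxcox1p` (Box-Cox applied to 1 + x) so that zero values are allowed, and only transforms non-negative columns. All function and constant names are hypothetical.

```python
import re

import numpy as np
import pandas as pd
from scipy.special import boxcox1p
from scipy.stats import skew

# S&P long-term scale from AAA down to CCC+, mapped to 1..17 as in the text
SP_SCALE = ["AAA", "AA+", "AA", "AA-", "A+", "A", "A-",
            "BBB+", "BBB", "BBB-", "BB+", "BB", "BB-",
            "B+", "B", "B-", "CCC+"]
RATING_TO_NOTCH = {rating: i + 1 for i, rating in enumerate(SP_SCALE)}

# Assumed raw format: the rating comes first, any watch suffix follows it.
# Longer alternatives are listed first so "AAA" is not read as "AA" or "A".
RATING_RE = re.compile(r"^(AAA|AA[+-]?|A[+-]?|BBB[+-]?|BB[+-]?|B[+-]?|CCC\+)")


def rating_to_notch(raw: str) -> int:
    """Strip any watch suffix and return the ordinal notch (AAA = 1)."""
    match = RATING_RE.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognised rating: {raw!r}")
    return RATING_TO_NOTCH[match.group(1)]


def preprocess(df: pd.DataFrame, group_col: str = "sector",
               skew_threshold: float = 0.75, lam: float = 0.15) -> pd.DataFrame:
    out = df.copy()
    num_cols = out.select_dtypes(include="number").columns
    # 1. Replace missing values with the median of the same group
    group_medians = out.groupby(group_col)[num_cols].transform("median")
    out[num_cols] = out[num_cols].fillna(group_medians)
    # 2. Box-Cox (1 + x) transform for highly right-skewed, non-negative columns
    for col in num_cols:
        values = out[col]
        if values.min() >= 0 and skew(values) > skew_threshold:
            out[col] = boxcox1p(values, lam)
    # 3. Convert the categorical group column into indicator variables
    return pd.get_dummies(out, columns=[group_col])


print(rating_to_notch("BBB- *+"), rating_to_notch("A-/Watch Neg"))
```

Note that a group whose values are all missing has no median, so those cells would stay empty in this sketch; a fallback such as the overall median would be needed in that case.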

The data set is not large. We use 5-fold cross-validation on the training data (67% of the data set) when training the models, including when tuning the hyperparameters, which is done with a randomized grid search. The remaining 33% of the data (about 150 companies) is retained as the test set. The Root Mean Squared Error (RMSE) is chosen as the performance metric.
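The evaluation protocol above (a 67/33 train/test split, randomized hyperparameter search scored by 5-fold cross-validated RMSE, then a single test-set RMSE) can be sketched with scikit-learn. The data here is synthetic from `make_regression`, standing in for the real feature matrix and notch labels, and Lasso with a log-uniform range for `alpha` is only an illustrative choice of model and search space.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the 450-company feature matrix and notch labels
X, y = make_regression(n_samples=450, n_features=30, noise=10.0, random_state=0)

# Hold out 33% as the test set; the remaining 67% is used for cross-validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Randomized search: sample 20 alpha values and score each by 5-fold CV RMSE
search = RandomizedSearchCV(
    Lasso(max_iter=10_000),
    param_distributions={"alpha": loguniform(1e-3, 1e1)},
    n_iter=20,
    cv=5,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximises scores, hence the sign
    random_state=0,
)
search.fit(X_train, y_train)

cv_rmse = -search.best_score_
test_rmse = float(np.sqrt(mean_squared_error(y_test, search.predict(X_test))))
print(f"best alpha={search.best_params_['alpha']:.4f}, "
      f"CV RMSE={cv_rmse:.2f}, test RMSE={test_rmse:.2f}")
```

Sampling a fixed number of candidates (`n_iter`) rather than evaluating every grid point keeps tuning cheap, which matters when every candidate is refitted five times.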

The performance of the following supervised learning algorithms will be examined.

- Logistic Regression
- Lasso Regression
- Kernel Ridge Regression
- Elastic Net
- Gradient Boosting
- Random Forest
- Stacked Averaged Model

As most of this study follows a standard machine learning workflow, the implementation details are left to the Jupyter notebook linked below. However, a couple of interesting points related to feature generation and the stacked averaged model are worth discussing separately before examining the results.

** Feature generation for financial ratios **

Financial ratios are often more useful than raw accounting numbers. For instance, it is difficult to tell whether debt is too high unless we also know the company's ability to repay it. The leverage ratio, calculated as debt/EBITDA, is therefore a much better reflection of debt affordability. When discussing financial ratios, most people stick to conventions, e.g. profit margin is calculated as net income/net sales (not net sales/net income), leverage ratio is debt/EBITDA, EV multiple is EV/EBITDA, and so on.

However, machine learning algorithms tend to work better if a feature is consistent: if we rank Company A above B and B above C, the feature should follow the same order. The choice between a ratio and its inverse therefore matters. Profit margin fits the bill: all else being equal, the higher the margin, the better the business. The leverage ratio (along with many ratios that link income statement figures to balance sheet figures), however, is not consistent. Consider a company that issued 100MM of debt when it was generating 50MM of EBITDA. Its gross leverage ratio was initially a healthy 2x. The business then declined, with EBITDA falling to 20MM, 0MM and -10MM in subsequent years. The corresponding leverage ratios are 5x, infinity and -10x: as the business declines, the leverage ratio rises to infinity and then switches sign. This would confuse many algorithms.

The solution is to use the inverse ratio. If we define inverse gross leverage as EBITDA/debt, the time series becomes 0.5, 0.2, 0 and -0.1. The new feature is now consistent: it falls steadily as the business deteriorates. In general, when generating financial ratios, it is better to keep the quantity that is always positive in the denominator.
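The worked example above can be reproduced in a few lines. The sketch computes both the leverage ratio and its inverse for the declining EBITDA path, returning infinity for the zero-EBITDA year purely for illustration, and checks which series moves in one direction as the business deteriorates. The helper names are hypothetical.

```python
import math

debt = 100.0                         # MM, held constant in this example
ebitda = [50.0, 20.0, 0.0, -10.0]    # MM, a declining business


def leverage_ratio(debt: float, ebitda: float) -> float:
    """Debt/EBITDA; returns +inf when EBITDA is zero (illustration only)."""
    return math.inf if ebitda == 0 else debt / ebitda


def is_strictly_decreasing(xs) -> bool:
    """True if every value is below the one before it."""
    return all(a > b for a, b in zip(xs, xs[1:]))


leverage = [leverage_ratio(debt, e) for e in ebitda]   # debt / EBITDA
inverse_leverage = [e / debt for e in ebitda]           # EBITDA / debt

print(leverage)                                  # [2.0, 5.0, inf, -10.0]
print(inverse_leverage)                          # [0.5, 0.2, 0.0, -0.1]
print(is_strictly_decreasing(inverse_leverage))  # True: one direction as quality falls
print(is_strictly_decreasing(leverage))          # False: the ordering breaks down
```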

** Stacked Averaged Algorithm **

Apart from the stacked averaged algorithm, all the algorithms used here are off-the-shelf implementations, so a bit more explanation is provided for it here. The stacked averaged algorithm is a form of ensemble method. It aims to improve the forecast by aggregating the predictions of N weak learners with a meta algorithm, which is often a simple learner such as a Lasso or Kernel Ridge regressor. In the training phase, the training data is split into k folds and each weak learner is fitted k times, each time leaving out one fold, resulting in k copies of each weak learner (k*N fitted models altogether). In the usual form of this scheme, each copy predicts the fold it did not see, and these out-of-fold predictions form the N inputs on which the meta algorithm is trained. In the prediction phase, the feature data is fed to the k copies of each weak learner to generate k predictions, which are then averaged. This is repeated for each of the N weak learners, and the resulting vector of length N is passed on to the meta algorithm.
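The scheme above can be sketched as a scikit-learn-compatible estimator. This is a minimal sketch assuming the standard out-of-fold training of the meta learner; the class name `StackingAveragedModels` and its parameters are illustrative, and the choice of Lasso, ElasticNet and Kernel Ridge as weak learners, with a Lasso meta learner, on synthetic data is only for demonstration.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.datasets import make_regression
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import ElasticNet, Lasso
from sklearn.model_selection import KFold


class StackingAveragedModels(BaseEstimator, RegressorMixin):
    """N weak learners, each fitted k times on k-1 folds; a meta learner
    is trained on their out-of-fold predictions."""

    def __init__(self, base_models, meta_model, n_folds=5, random_state=0):
        self.base_models = base_models
        self.meta_model = meta_model
        self.n_folds = n_folds
        self.random_state = random_state

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        kfold = KFold(n_splits=self.n_folds, shuffle=True, random_state=self.random_state)
        self.base_models_ = [[] for _ in self.base_models]  # k fitted copies per learner
        out_of_fold = np.zeros((X.shape[0], len(self.base_models)))
        for i, model in enumerate(self.base_models):
            for train_idx, holdout_idx in kfold.split(X, y):
                instance = clone(model).fit(X[train_idx], y[train_idx])
                self.base_models_[i].append(instance)
                # Each copy predicts only the fold it was not trained on
                out_of_fold[holdout_idx, i] = instance.predict(X[holdout_idx])
        self.meta_model_ = clone(self.meta_model).fit(out_of_fold, y)
        return self

    def predict(self, X):
        X = np.asarray(X)
        # Average the k copies of each learner, giving one column per learner
        meta_features = np.column_stack([
            np.mean([m.predict(X) for m in copies], axis=0)
            for copies in self.base_models_
        ])
        return self.meta_model_.predict(meta_features)


X, y = make_regression(n_samples=120, n_features=8, noise=5.0, random_state=0)
stack = StackingAveragedModels(
    base_models=[Lasso(alpha=0.1), ElasticNet(alpha=0.1), KernelRidge(alpha=1.0)],
    meta_model=Lasso(alpha=0.01),
    n_folds=5,
).fit(X, y)
print(stack.predict(X[:3]))
```

For comparison, scikit-learn's own `StackingRegressor` also trains its final estimator on cross-validated predictions, but it refits each base estimator once on the full training set instead of averaging the k fold copies.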

** Results and Discussion **

RMSE is the performance metric used, and it is measured in notches. To be clear, the difference between A- and BBB+ is 1 notch. As shown in the cross-validation results, the best algorithm is the stacked averaged model (RMSE = 1.43 ± 0.05). Apart from logistic regression, the other algorithms are statistically no worse than the stacked averaged model. The out-of-sample performances are more or less the same. Considering that only quantitative data is used here (CRAs use both qualitative and quantitative inputs, their data are much better curated, and some of their material is non-public), and that the RMS difference between ratings assigned by different CRAs is close to 1 notch, the performance of these supervised learning algorithms is not bad. Potentially, supervised learning models could be used as a more responsive market monitoring tool.
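Because the labels are on the ordinal scale defined earlier (A- = 7, BBB+ = 8, and so on), RMSE comes out directly in notches. The small sketch below uses hypothetical predictions for three companies; the function name is illustrative.

```python
import numpy as np


def rmse_in_notches(true_notches, predicted_notches) -> float:
    """Root mean squared error between ordinal rating labels, in notches."""
    err = np.asarray(predicted_notches, dtype=float) - np.asarray(true_notches, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


# A- = 7, BBB = 9, BB = 12 on the study's ordinal scale
true_notches = [7, 9, 12]
predicted = [8, 9, 10]  # hypothetical: off by 1 notch, exact, off by 2 notches
print(rmse_in_notches(true_notches, predicted))  # sqrt((1 + 0 + 4) / 3), about 1.291
```

Squaring the errors means one 2-notch miss costs as much as four 1-notch misses, so an RMSE of around 1.4 notches is compatible with most predictions landing within one or two notches.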

It is useful to understand what contributes to the performance of these algorithms. Feature importance statistics are readily available once a random forest model has been trained. The most important feature is market cap: in effect, the equity market is casting a vote. The strongest companies at any given moment are rewarded with the highest share prices, while the weakest see their share prices tank. The second and third most important features are two differently adjusted inverse gross leverage ratios, one adjusted for historical equity volatility and the other compared against the sector average.

While the leverage ratio is a key indicator of debt affordability, what counts as acceptable (or too high) leverage varies between industries. Intuitively, a stable business with low fixed costs can sustain more debt. This type of domain knowledge is not available in a quantitative data set, and the adjustments partially fill the gap. Inverse gross leverage (defined as EBITDA/gross debt) is better the larger it is. Businesses deemed stable tend to have lower historical equity volatility, so given two companies with the same inverse gross leverage ratio, dividing each by its historical equity volatility favours the one with the more stable business. Similarly, dividing inverse gross leverage by its sector average introduces a reference point for comparison.
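The two adjusted features and the feature-importance readout can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: column names (`ebitda`, `gross_debt`, `equity_vol`, `sector`, `market_cap`) are hypothetical, the data and target are synthetic, and the resulting ranking says nothing about the study's actual results. Note that dividing by a sector average only behaves sensibly while that average is clearly positive.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def add_adjusted_leverage(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["inv_gross_leverage"] = out["ebitda"] / out["gross_debt"]
    # Volatility-adjusted: same inverse leverage scores higher for stabler equity
    out["inv_lev_per_vol"] = out["inv_gross_leverage"] / out["equity_vol"]
    # Sector-relative: compare each company with its sector's average
    sector_mean = out.groupby("sector")["inv_gross_leverage"].transform("mean")
    out["inv_lev_vs_sector"] = out["inv_gross_leverage"] / sector_mean
    return out


rng = np.random.default_rng(0)
n = 200
toy = add_adjusted_leverage(pd.DataFrame({
    "sector": rng.choice(["Energy", "Tech", "Retail"], size=n),
    "ebitda": rng.uniform(-10, 200, n),
    "gross_debt": rng.uniform(50, 500, n),
    "equity_vol": rng.uniform(0.15, 0.6, n),
    "market_cap": rng.lognormal(9, 1, n),
}))

features = ["market_cap", "inv_gross_leverage", "inv_lev_per_vol",
            "inv_lev_vs_sector", "equity_vol"]
# Synthetic notch-like target, purely illustrative (lower = better credit)
y = (10 - 2 * np.log(toy["market_cap"] / toy["market_cap"].median())
     - 5 * toy["inv_lev_per_vol"] + rng.normal(0, 1, n))

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(toy[features], y)
importances = pd.Series(rf.feature_importances_, index=features).sort_values(ascending=False)
print(importances)
```

The impurity-based `feature_importances_` attribute is what makes the ranking "readily available" after training; it is normalised to sum to one across features.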

** Link to Jupyter Notebook **