When do most defaults happen at Bondora?

Post author By Taavi
Post date 14th October 2014

Knowing when most loans default can give you a good idea of how well your portfolio, or some portion of the platform’s portfolio, is performing.

In addition, if you know what proportion of loans default within 6 months of being issued, and that the proportions have been relatively stable over time, you can also make some rough estimates of your future default rates.

The estimates might not be scientifically exact, but they’re certainly a whole lot better than no estimates at all.
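
The rough estimate described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a formula taken from the Bondora dataset: the function name and the example figures (8% defaulted so far, 45% to 55% of eventual defaults arriving in the first 6 months) are assumptions made up for the example.

```python
def estimate_final_default_rate(observed_rate, early_share_low, early_share_high):
    """Scale an early default rate up to a rough final range.

    observed_rate: share of a loan group that has defaulted in its first 6 months.
    early_share_low / early_share_high: assumed historical range for the share
    of all eventual defaults that arrive within those first 6 months.
    """
    if not 0 < early_share_low <= early_share_high <= 1:
        raise ValueError("shares must satisfy 0 < low <= high <= 1")
    # A smaller early share means more defaults are still to come,
    # so the low share gives the high end of the estimate.
    return observed_rate / early_share_high, observed_rate / early_share_low


# Hypothetical numbers: 8% of a group has defaulted after 6 months, and
# historically 45-55% of all defaults happened in that window.
low, high = estimate_final_default_rate(0.08, 0.45, 0.55)
print(f"Estimated final default rate: {low:.1%} to {high:.1%}")
```

The estimate is only as good as the assumption that the timing pattern stays stable, which is exactly what the rest of this post tries to check.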

To shed some light on this topic, I’ve done a few analyses on the loan dataset provided by Bondora and will highlight some results below.

When do defaults happen?

First, I decided to establish a baseline by looking at the entire dataset to see when loans usually default.

The same information, split into daily proportions of defaults:

Default proportions by day — The proportions of defaults by number of days since issuance.

About 50% of defaults have happened within the first 6 months and the biggest jump in defaults seems to happen in the first 120 days.
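
For anyone who wants to reproduce this kind of figure, here is a minimal sketch of the calculation. It assumes you have an issue date and a default date for every defaulted loan; the sample dates and function names below are made up for illustration and are not taken from the Bondora dataset.

```python
from datetime import date

# Hypothetical sample: (issue date, default date) for defaulted loans only.
defaulted_loans = [
    (date(2013, 1, 10), date(2013, 3, 1)),
    (date(2013, 2, 1), date(2013, 5, 22)),
    (date(2013, 3, 15), date(2013, 11, 10)),
    (date(2013, 4, 1), date(2014, 6, 25)),
]


def days_to_default(loans):
    """Number of days between issuance and default, sorted ascending."""
    return sorted((defaulted - issued).days for issued, defaulted in loans)


def share_defaulted_within(loans, max_days):
    """Share of all observed defaults that happened within max_days of issuance."""
    days = days_to_default(loans)
    return sum(1 for d in days if d <= max_days) / len(days)


for horizon in (120, 180, 360):
    share = share_defaulted_within(defaulted_loans, horizon)
    print(f"Defaults within {horizon} days: {share:.0%}")
```

Note that this measures only the timing of defaults among loans that did default. It says nothing about how many loans default overall.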

Defaults for different markets

From here, I wanted to see whether there is any difference between markets. For this, I used the loans issued in 2013 and grouped the defaults into ranges of days to get a clearer picture of what’s going on. This means that Slovakia is not included, as no loans were issued there in 2013.

Bondora defaults in Spain and Finland — Default proportions by day in Estonia, Spain and Finland for 2013.

There doesn’t seem to be much difference between countries in the proportions of defaults. For Estonian loans it looks as if more defaults happen later on, but that is simply because the Finnish market was opened only at the end of July and the Spanish market in October.

As time goes on, there will be some defaults at the 480-day point and some at 600 days for FIN and ESP loans too, and the graphs will likely look very much the same.

Defaults for different years

Another thing to look at is whether the proportions have changed over time, and how. Again, I split the periods into ranges and put the results on the graph below.

Bondora defaults per year of issuance — Default proportions per issuing years.

Some conclusions to make on this:

Year after year, the defaults seem to be happening later in the loan term. My guess is that this is the result of stricter underwriting rules and better fraud checks. For example, initially it was possible to apply even with current payment problems.
The long tail really adds up after 2009. This is not a huge increase at one particular date, but small numbers of defaults every now and then, which is to be expected (people lose their jobs or get into difficulties).
Even though 2013 seems to show a large increase in the proportion of initial defaults, keep in mind that a large portion of those loans haven’t even reached the 360-day mark yet, and the later defaults are still to come. The pattern seems much the same as for previous years.

Year after year the pattern seems to be essentially the same, with some tendency for proportionally more loans to default after a longer period of time.

Monthly seasonality

The next question was whether it matters when a loan was issued. We know that there’s some seasonality in when defaults happen, so it would be interesting to see whether there are any differences based on the month in which a loan was issued.

I ended up with this somewhat crowded graph (I know, it’s a bit difficult to figure out the months, but that’s the best I could get):

Monthly default proportions — The default proportions per month of issuing.

There seem to be two patterns that stand apart from the other months:

Loans issued in the summer months (April-June) seem to have the largest proportion of defaults within the first 120 days (up to 4 months since issuing).
Loans issued in the winter months (October-January) seem to default mostly within the 120-240 day range (4-8 months since issuing).

If someone cares to dig more deeply into the data, I think it would be very interesting to see the possible reasons behind this. If you have an idea why this is so, let us know in the comments.

Credit groups and default proportions

I also looked at the numbers for different credit groups to see whether this explains some of the differences, but not really.

Credit group default proportions in time — When defaults happen for different credit groups.

There are some differences, notably for credit score 500 and perhaps the 360-day part for 600-700, but nothing that would clearly say “this is the cause of the differences”.

I guess this is the place for someone to fire up their SPSS and come up with an answer.

So, what’s your take on this?

***

If you’d like to receive the dataset I used for my analyses with all the data and graphs, then subscribe to the list below:

Taavi has been investing into P2P-lending platforms since 2010.

12 replies on “When do most defaults happen at Bondora?”

It is wonderful that somebody is doing such analyses. It made my fingers itch for data analysis too :). However, using 3D on a computer screen unless it is absolutely necessary (here it is not) does not help in understanding what is going on. The first three graphs are wonderful.
I find the monthly graphs quite interesting. Looks like the xmas season is big for bad loans.
Did you normalize using the total amount of loans, or was each year/month normalized by the loans given out during that time period?

Tried doing these without 3D, but the lines crossed each other so much that it didn’t really give any good overview of patterns (perhaps with some strain and really focusing it would).

This seemed like the best solution. Especially since the goal was only to see patterns, not very specific actual percentage points at every datapoint separately.

I simply used the loan date (the time when the loan was issued) and the default date. In other words, the graph shows what percentage of the defaulted loans issued in a certain month defaulted within a certain time range.

It says nothing about the size of default rates for any group, just the timing.
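
As a rough sketch of that calculation (not the actual spreadsheet behind the graphs): group the defaulted loans by the month they were issued in, put each default into a range of days since issuance, and divide by the number of defaults in that month. The 120-day bucket width, the function name and the sample data are assumptions for illustration.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

BUCKET_DAYS = 120  # assumed width of each time range, in days


def timing_by_issue_month(loans):
    """For each issue month, the share of its defaults falling into each range.

    loans: iterable of (issue date, default date) for defaulted loans only.
    Returns {month: {first day of range: share of that month's defaults}}.
    """
    counts = defaultdict(Counter)
    for issued, defaulted in loans:
        bucket = (defaulted - issued).days // BUCKET_DAYS
        counts[issued.month][bucket] += 1
    return {
        month: {
            bucket * BUCKET_DAYS: n / sum(buckets.values())
            for bucket, n in sorted(buckets.items())
        }
        for month, buckets in counts.items()
    }


# Hypothetical sample: three loans issued in May, two in November.
may, nov = date(2013, 5, 6), date(2013, 11, 4)
sample = [
    (may, may + timedelta(days=30)),
    (may, may + timedelta(days=100)),
    (may, may + timedelta(days=200)),
    (nov, nov + timedelta(days=150)),
    (nov, nov + timedelta(days=200)),
]
result = timing_by_issue_month(sample)
for month, shares in sorted(result.items()):
    print(month, shares)
```

Because each month is divided by its own number of defaults, the shares for every month add up to 100% regardless of how many loans were issued or defaulted.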

Maybe adding some smooth gridlines would make these 3D graphs easier to read.

Thanks for an interesting analysis!

There are a couple of issues, though, that I started wondering about.

1. If you used the whole dataset for the first two graphs, aren’t the results distorted by the fact that a lot of the loans have been given out very recently, meaning that a significant portion of the loans haven’t even had the chance to default later than, say, 6 months?

To make my point clear, if we assumed that for example 50% of the loans outstanding were given out on 14.4.2014 and the other 50% on 1.1.2000, then you would actually get the correct picture from 50% of the loans, but the other 50% would have only had the chance to default for 6 months, making a default seem relatively more likely in the first 6 months than it actually is.

To my understanding, this could be countered by using only loans that have been outstanding for at least the period over which you are trying to establish the likelihood of default. Another way would be to weight the number of defaults occurring at a certain point in time by the inverse of the number of loans that were issued at least that long ago.
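
The first of these two fixes could look roughly like the sketch below. The snapshot date, the function name and the sample loans are assumptions for illustration only.

```python
from datetime import date, timedelta


def mature_loans(loans, snapshot, horizon_days):
    """Keep only loans issued at least horizon_days before the data snapshot.

    loans: iterable of (issue date, default date or None).
    Every loan that is kept has had the full horizon in which to default.
    """
    cutoff = snapshot - timedelta(days=horizon_days)
    return [loan for loan in loans if loan[0] <= cutoff]


# Hypothetical sample; None means the loan has not defaulted.
loans = [
    (date(2013, 1, 10), date(2013, 3, 1)),
    (date(2014, 4, 14), None),
    (date(2014, 6, 1), date(2014, 8, 1)),
]
snapshot = date(2014, 10, 14)  # assumed date of the data export

kept = mature_loans(loans, snapshot, horizon_days=180)
print(f"{len(kept)} of {len(loans)} loans are at least 180 days old")
```

Any timing statistics for the first 180 days would then be computed on the kept loans only.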

2. For me using the share of defaults taking place at a certain time in the life of a loan out of total defaults actually makes little sense. The reason is that the number of loans that can possibly default is much higher at the 100th day of the loan than it is for example on the 500th.

To illustrate, say that we started with 100 loans, of which 10 defaulted in the first year and another 30 were paid back successfully. Then in year 2 we have only 60 (100-10-30) loans left, out of which 9 default. Using your rationale, the default rate in year 2 would be lower than it was in year 1, although actually only 10% (10/100) of the loans outstanding defaulted in year 1, whereas a whole 15% (9/60) of the loans outstanding in year 2 defaulted.
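
The two ways of counting in this example can be put side by side in a short sketch. The function names are made up for illustration, and the year-2 repayment figure is not given in the example, so it is set to 0 here (it does not affect either result).

```python
def share_of_all_defaults(defaults_per_year):
    """Each year's defaults as a share of all defaults (the post's measure)."""
    total = sum(defaults_per_year)
    return [d / total for d in defaults_per_year]


def conditional_default_rates(starting_loans, defaults_per_year, repaid_per_year):
    """Each year's defaults as a share of loans still outstanding at its start."""
    rates = []
    outstanding = starting_loans
    for defaults, repaid in zip(defaults_per_year, repaid_per_year):
        rates.append(defaults / outstanding)
        # Defaulted and repaid loans both leave the pool.
        outstanding -= defaults + repaid
    return rates


defaults = [10, 9]  # year 1, year 2
repaid = [30, 0]    # year-2 repayments are an assumption, see above

shares = share_of_all_defaults(defaults)
rates = conditional_default_rates(100, defaults, repaid)
for year, (share, rate) in enumerate(zip(shares, rates), start=1):
    print(f"Year {year}: {share:.1%} of all defaults, "
          f"{rate:.1%} of outstanding loans")
```

The first measure falls from year 1 to year 2 while the second one rises, which is the point of the example.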

These two flaws undermine the post (at least its first part) quite badly, in my opinion. Is this something that you have actually thought about?

In my own calculations I have reached a fairly stable annual default rate for the loans (excluding the very beginning, that is, those loans that never make the first payment).

Thanks for your input. I’ve replied to your points using the same numbering:

1. That’s what you see on graph 4. That one includes the proportional defaults for loans issued in different years; in other words, those that have had 4 years, 3 years, etc. to mature.
Your point is correct that for the later loans it will look as if most defaults happen early in the process, and this is also illustrated by the 2013 loans. However, you can also see that the pattern still holds for the older loans, although in a somewhat more subtle form.

2. What value could you get from this kind of analysis:
a) if you consider that the patterns are similar over time and in different segments (as they seem to be), you could estimate by the end of the first 240 days, what the range of total default rate for some new group of loans might be at the end;
b) you can account for the risk when buying loans on the secondary market and pay less/more of a mark-up based on the age of the loan as the likelihood to default at certain points later on is lower.

I’m afraid I failed to make my point 2 clear. I didn’t mean that the data about when loans default is useless (and I totally agree with your further elaborations), but rather that your methodology for measuring the likelihood of a loan defaulting at a certain point is erroneous.

Referring back to my original comment, the third-to-last paragraph explains via an example why the share of total defaults that take place at a certain phase in the life of a loan is not a good way to measure a loan’s probability of default at the corresponding point in time.

A further way to think of the issue: assume that we knew that the overall default rate is 10%, but that 100% of the loans that have made it that far default between 1500 and 1600 days. However, 99% of all loans have been either paid back or defaulted by the 1500th day. So out of all defaults, 10% (that is, (100%-99%)*100%/10%) take place between 1500 and 1600 days. That might be 80% less than, for example, between the 100th and 200th day, if 50% of total defaults happened then. Everything seems clear: according to your calculations, the probability of default between 100 and 200 days is 5 times higher than between 1500 and 1600 days (50% vs 10%). However, as we know, 100% of the active loans actually default between 1500 and 1600 days, while only 5% of active loans default between the 100th and 200th day.

Good point. Although I’d guess it only has an effect at the very late stages of the loan terms, or perhaps in a few cases where defaults are very, very high. In other cases defaults are a minority of all loans and usually happen a lot sooner than repayments.

In other words, the number of loans that will default in the future shrinks faster than the number of properly paying loans that get entirely repaid. At least that’s what I think based on my experience; I haven’t looked at the data.

[…] no certain rule on how many months you should exclude, however, since a decent proportion of defaults happen around the 4th-5th month, excluding the last 4-6 months of loans may give you a decently accurate result […]

[…] example, let’s take the first graph about verification type. If you remember the graph on when defaults happen, you’ll probably understand that the longer a certain group of loans has been on the market, […]

[…] based on my earlier analysis, about 35% of all the …defaults had happened by somewhere around the 120-day mark (based on the data that was in the database at the time), and there seemed to be a trend that loans are […]

[…] most of the defaults have already happened by now, but judging by past results, we could expect more and the inaccuracy can only grow for those […]

[…] previous post Jan asked if I’ve done any analyses on the timing of when defaults happen. I did one back in October of 2014, but that could be relatively outdated and there’s a lot of new data available […]
