In May 2016, day-to-day (i.e. next-day) A-post delivery of ordinary mail was abolished in Denmark, and a more expensive service was introduced in its place. A few weeks ago I ordered a package with day-to-day delivery through PostNord, and it took longer than expected. This made me wonder how effective their package delivery in Denmark really is. To motivate the rest of the post, let us look at the delivery trails of some of the packages we will be investigating, shown in the video below. We will return to this later and explain exactly what is going on.
Mining the PostNord Track & Trace service
While waiting for my package, I realised that the only input the PostNord T&T service requires is a single tracking number. I investigated how the T&T service on the PostNord website retrieves its data, to see if I could query their database directly. After a while I gave up on that and turned my attention to the Selenium WebDriver with Python instead. With it, I simply entered a T&T number and scraped the result. I let this script run for a week or two, checking 105628 T&T numbers in total. 10005 of them corresponded to an actual package shipment, so around 9.5% of the checked numbers gave a result.
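The scraping loop itself boils down to "try a number, keep it if it returns a shipment, and track the hit rate". Below is a minimal sketch of that bookkeeping, assuming a hypothetical `lookup` callable that wraps the actual page scraping (for example a Selenium WebDriver session) and returns a dict of parsed fields, or `None` when the number gives no result. It is not the author's script, which is not published.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def scan_numbers(
    numbers: Iterable[str],
    lookup: Callable[[str], Optional[dict]],
) -> Tuple[Dict[str, dict], int]:
    """Query each tracking number and keep the ones that return a shipment.

    `lookup` is a hypothetical stand-in for the scraper: it returns a dict
    of parsed shipment fields, or None when the T&T page has no result.
    """
    hits: Dict[str, dict] = {}
    checked = 0
    for number in numbers:
        checked += 1
        result = lookup(number)
        if result is not None:
            hits[number] = result
    return hits, checked


def hit_rate(n_hits: int, n_checked: int) -> float:
    """Percentage of checked numbers that gave an actual shipment."""
    return 100.0 * n_hits / n_checked if n_checked else 0.0


# The totals reported above: 10005 hits out of 105628 checked numbers.
print(f"{hit_rate(10005, 105628):.1f}%")
```

Keeping the scraper behind a plain callable makes the loop easy to test with a dictionary instead of a live browser session.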
Investigating the data
What do I want to investigate?
- Overview of the data
- Is day-to-day dead? (Posting time distribution)
- What's in the box? (Weight and size distribution)
- Most common shippers
- Video of deliveries
Overview of the data
First off, let's look at how many of our data entries have both a "ship date" and a "delivery date". We need both later on to check how well day-to-day delivery is doing. As can be seen, requiring both a ship date and a delivery date excludes quite a few entries. A few more are excluded when we additionally require that the delivery date be (logically) later than the ship date. With the remaining entries, we can now investigate whether day-to-day delivery is still alive.
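The two filtering steps above can be sketched as follows. The field names `ship_date` and `delivery_date` are hypothetical, and each entry is assumed to already hold parsed `datetime` values (or `None` when the T&T page did not report a date). The function also counts how many entries each step drops, mirroring the two exclusions described above.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical entry layout: field name -> parsed timestamp or None.
Entry = Dict[str, Optional[datetime]]


def filter_for_timing(entries: List[Entry]) -> Tuple[List[Entry], int, int]:
    """Keep entries usable for delivery-time analysis.

    Returns the kept entries, the number dropped for a missing date, and
    the number dropped because delivery is not later than shipping.
    """
    kept: List[Entry] = []
    missing = not_later = 0
    for entry in entries:
        ship, delivered = entry.get("ship_date"), entry.get("delivery_date")
        if ship is None or delivered is None:
            missing += 1  # step 1: both dates are required
        elif delivered <= ship:
            not_later += 1  # step 2: delivery must come after shipping
        else:
            kept.append(entry)
    return kept, missing, not_later
```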
Is day-to-day delivery dead?
Plotting the time of day at which packages are shipped and delivered results in the plot above. While the shipping times are concentrated in a few hours, the delivery times are spread out from early morning until late evening, peaking at around 10-11 o'clock. The uneven distribution of shipping hours could stem from the fact that many online shops advertise "order before 12/13/14 and we'll ship your package today", which in turn results in a lot of packages being registered when they are shipped at the end of that day. The hours from 12 to 14 o'clock together account for around 85% of the shipments. We can now compute the time between shipping and delivery, shown below.
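A time-of-day distribution like the one above only needs the hour of each timestamp. This is a minimal sketch using the standard library; exactly which hour buckets make up "12 to 14 o'clock" is an assumption, so it is left as a parameter.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable


def hour_distribution(times: Iterable[datetime]) -> Dict[int, float]:
    """Percentage of events falling in each hour of the day (0-23)."""
    counts = Counter(t.hour for t in times)
    total = sum(counts.values())
    if not total:
        return {}
    return {hour: 100.0 * counts.get(hour, 0) / total for hour in range(24)}


def share_in_hours(dist: Dict[int, float], hours: Iterable[int]) -> float:
    """Combined percentage for a set of hour buckets."""
    return sum(dist.get(hour, 0.0) for hour in hours)
```

For example, `share_in_hours(hour_distribution(ship_times), range(12, 15))` would give the combined share of the 12, 13 and 14 o'clock buckets, where `ship_times` is a hypothetical list of shipping timestamps.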
To my surprise, a remarkably large share of packages is delivered within a day or two. It seems day-to-day delivery is still alive! The shortest delivery time is 9 hours, while the longest is 20 days. The dips, of course, correspond to night-time. To get a better grasp of these numbers, I grouped the deliveries by day and plotted the cumulative percentage, as seen below.
Now it is even more evident that day-to-day delivery is still alive: roughly 33% of all packages are delivered on the first day, and 73% within 2 days. I must admit that these numbers impress me.
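The cumulative curve can be computed by binning each delivery time into whole days and taking a running sum. This sketch assumes "day n" means an elapsed time in the interval (24·(n-1), 24·n] hours; counting calendar days instead would shift the numbers somewhat. It expects pairs that already passed the delivery-after-shipping check.

```python
import math
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def cumulative_by_day(pairs: Iterable[Tuple[datetime, datetime]]) -> List[float]:
    """Cumulative percentage delivered within 1, 2, ... days of shipping.

    Each (ship, delivered) pair is binned by elapsed 24-hour periods,
    which is an assumption about how "days" are counted.
    """
    days = []
    for ship, delivered in pairs:
        elapsed = (delivered - ship) / timedelta(days=1)  # fractional days
        if elapsed <= 0:
            raise ValueError("delivery must be later than shipping")
        days.append(math.ceil(elapsed))
    if not days:
        return []
    per_day = [0] * max(days)
    for day in days:
        per_day[day - 1] += 1
    result, running = [], 0
    for count in per_day:
        running += count
        result.append(100.0 * running / len(days))
    return result
```

The first two entries of the returned list correspond to the "first day" and "2 days or less" shares quoted above.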
What's in the box?
In some cases, the T&T service also returns information about the weight and/or size of the package. We start by looking at the weight distribution.
Notice that the y-axis is on a logarithmic scale. Most packages (91%) weigh 5 kg or less. The associated x, y and z size distributions can be seen below.
Combining the average weight with the averages of the x, y and z size distributions above gives an average density of 113 kg/m^3. A quick lookup tells us this could be styrofoam (75 kg/m^3) with some heavier cargo inside. But of course, that is a total guess.
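One plausible reading of "combining" the averages is to divide the mean weight by the volume of a box whose sides are the mean x, y and z sizes, i.e. ρ = mean(w) / (mean(x) · mean(y) · mean(z)). The sketch below does exactly that; it assumes weights in kilograms and sizes in centimetres, which the T&T data may or may not use.

```python
from statistics import mean
from typing import Sequence


def average_density(
    weights_kg: Sequence[float],
    xs_cm: Sequence[float],
    ys_cm: Sequence[float],
    zs_cm: Sequence[float],
) -> float:
    """Mean weight divided by the volume of a box with the mean sides, in kg/m^3.

    Assumes weights in kg and side lengths in cm (an assumption about units).
    """
    volume_m3 = mean(xs_cm) * mean(ys_cm) * mean(zs_cm) / 1e6  # cm^3 -> m^3
    return mean(weights_kg) / volume_m3
```

Note that this ratio of averages is generally not the same as the average of per-package densities. The latter would require weight and size for the same packages, which only some entries have.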
Most common shippers
It is also interesting to look at who is actually sending these packages. (The data also describes the recipients, but for privacy reasons I will not plot that.) Below I plotted the 10 most common shippers.
Retnemt.dk takes first place. It is a service that sends customers the ingredients for new dinners at a reasonable price. In second and third place we have Jysk Firmafrugt, a company that delivers fruit to Danish companies. It makes sense that they ship a lot of packages (they appear twice in the top list because they have two different shipping addresses). Most of the rest are clothing-related. So we can conclude that people need food and clothes :)
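Counting shippers is a one-liner with `collections.Counter`. In this sketch, each shipment is assumed to be a hypothetical `(name, address)` tuple. Keying on both fields is what makes a company with two shipping addresses, like Jysk Firmafrugt, show up twice in the ranking.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Shipper = Tuple[str, str]  # hypothetical (shipper name, shipper address)


def top_shippers(shipments: Iterable[Shipper], n: int = 10) -> List[Tuple[Shipper, int]]:
    """The n most common (name, address) pairs with their package counts."""
    return Counter(shipments).most_common(n)
```

Keying on the name alone would merge the two Jysk Firmafrugt entries into one bar.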
Video of deliveries
For 6527 T&T numbers I got information about the time and place of both shipping and delivery (as can be seen in one of the earlier plots). Using the googlemaps Python API together with the Basemap package, I created plots of the packages being delivered, shown in the video below.
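To animate packages moving across the map, each frame needs a position for every package in transit. A minimal sketch, assuming each leg has known start and end coordinates (for instance from the googlemaps client's `geocode` method) and that packages move in a straight line at constant speed in latitude/longitude. That is a simplification for animation only, not how the parcels actually travel.

```python
from datetime import datetime, timedelta
from typing import Iterator, Optional, Tuple

LatLon = Tuple[float, float]


def position_at(start: LatLon, end: LatLon, t0: datetime, t1: datetime,
                t: datetime) -> Optional[LatLon]:
    """Linearly interpolated position on one leg at time t.

    Returns None when the package is not on this leg at time t.
    Straight-line, constant-speed motion is an assumption.
    """
    if not (t0 <= t <= t1):
        return None
    if t1 == t0:
        return end
    frac = (t - t0) / (t1 - t0)  # fraction of the leg completed
    return (start[0] + frac * (end[0] - start[0]),
            start[1] + frac * (end[1] - start[1]))


def frame_times(first: datetime, last: datetime, step: timedelta) -> Iterator[datetime]:
    """Evenly spaced frame timestamps from first to last (inclusive)."""
    t = first
    while t <= last:
        yield t
        t += step
```

Each frame would then plot `position_at(...)` for every leg that returns a position at that frame's timestamp.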
The first few frames of the video above are full of blue points, indicating shipment addresses. Most of the shipments head for the two purple dots, which are the major package centers in Denmark: Taulov package center in Jutland (on the left) and Brøndby package center on Zealand (on the right). After being handled at these centers, packages are shipped either to their delivery address, shown in red, or to the other package center, shown in purple. The final case, orange, indicates a package that goes directly from shipper to delivery address with no package center involved. Below the map, I plotted the number of packages in each of these categories in the current frame, along with the accompanying timeline.
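The colour scheme can be expressed as a lookup on the kinds of endpoints a leg connects, and the per-frame counts are then a `Counter` over the legs in transit at that time. The endpoint labels and the `(from_kind, to_kind, departure, arrival)` tuple layout are hypothetical; the colour mapping follows the description above.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

# Hypothetical labels for the kind of place at each end of a leg.
SHIPPER, CENTER, RECIPIENT = "shipper", "center", "recipient"

COLOURS = {
    (SHIPPER, CENTER): "blue",       # shipper -> package center
    (CENTER, RECIPIENT): "red",      # package center -> delivery address
    (CENTER, CENTER): "purple",      # between the two package centers
    (SHIPPER, RECIPIENT): "orange",  # direct, no package center involved
}

Leg = Tuple[str, str, datetime, datetime]  # from_kind, to_kind, departure, arrival


def leg_colour(from_kind: str, to_kind: str) -> str:
    """Colour for a leg; raises KeyError for combinations not in the scheme."""
    return COLOURS[(from_kind, to_kind)]


def count_in_transit(legs: Iterable[Leg], t: datetime) -> Counter:
    """Number of legs of each colour in transit at time t (one frame's counts)."""
    return Counter(
        leg_colour(src, dst) for src, dst, t0, t1 in legs if t0 <= t <= t1
    )
```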
Here are a few extracts from the video above. First, the blue shipments head for the package centers, as mentioned earlier.
Likewise, after leaving the package centers, the red packages head for the recipients.
As always, the code for generating the plots is available on GitHub (I have not included the scraping script).
All of this data is publicly available and is classified as metadata, since it says nothing about what is inside the packages. It is quite remarkable how much you can learn from "data about data". I assume the same holds for all the metadata about people that the NSA has collected.