Wednesday, July 16, 2014

Finding the World's Economic Center of Gravity

[]

by Daniel Velkov, for comments go to Hacker News

The story starts with an insightful graphic which was published in The Economist. The original work was done by the McKinsey Global Institute which itself is based on data collected by the University of Groningen (link to the raw spreadsheet).

The graphic plots the movement of the world's geographic center of economic activity for the years between 1 and 2025. There was one thing which bothered me, and I hope it bothers you too, the points from the 20th century are all positioned in or above Scandinavia which seems unlikely to be the center of anything in the world. You can find the reason for that in the caption on the McKinsey webpage. Their report looks at the Earth as a sphere and finds the economic center of gravity which falls somewhere inside the sphere. To plot it on the map, they take a radius through the center of gravity and intersect it with the surface. To see why this is a problem, consider the extreme case of only USA and China existing and their economies being equally sized. In this setup the economic center will be close to the North Pole because the two countries happen to be on almost the exact opposite sides of the Earth and roughly at the same latitude.

I had an idea how to visualize the same data in a better and more intuitive way. Instead of the original method, I did all the calculations in 2D with respect to the usual map coordinates. This has the nice property that in the USA/China case from above, the center will end up somewhere near Spain which is the midway point between USA and China on most maps. The result of this approach is:

You can see that it shows the same directional trends as the first chart. In the year 1 AD the world's economic center is close to China and India. Looking back at the spreadsheet for that year, you can see that the total economic output of Asia is 5 times larger than Europe's. There is a slight move to the east, in the years 1-1600, which is due to the decline of the Roman Empire and Europe in general during that time. As time progresses, Europe's economy rises and the center starts shifting west. With the help of the US, this movement continues until 1950 when the trend reverses. The return of the Asian economies starts pushing the point east and in 2000 the position is at the same level on the east-west axis as it was in 1900.

The rest of this post describes how to gather the necessary data and produce the plot. It includes processing the raw spreadsheet data with the help of pandas and plotting with matplotlib and its basemap toolkit.

In [20]:
from mpl_toolkits.basemap import Basemap
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
%matplotlib inline
import pandas as pd

The first step was to find a source for historical GDP data by country. One of the top results when searching for "historical gdp data by country" was this spreadsheet which is what The Economist and McKinsey are using.

In [21]:
gdp = pd.read_csv('horizontal-file_03-2007 - GDP.csv', skip_footer=4, thousands=',')
gdp.ix[:5,:10]
Out[21]:
Unnamed: 0 1 1000 1500 1600 1700 1820 1821 1822 1823
0 Western Europe NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Austria 213 298 1414 2093 2483 4104 NaN NaN NaN
2 Belgium 135 170 1225 1561 2288 4529 NaN NaN NaN
3 Denmark 72 144 443 569 727 1471 1541 1564 1564
4 Finland 8 16 136 215 255 913 NaN NaN NaN
5 France 2366 2763 10912 15559 19539 35468 38524 37267 38706

The values are in inflation adjusted 1990 US dollars.

In [22]:
gdp.rename(columns = {'Unnamed: 0':'Country'}, inplace=True)

Some of the country names had an extra space at the end:

In [23]:
gdp.Country[:10].values
Out[23]:
array(['Western Europe ', 'Austria ', 'Belgium ', 'Denmark ', 'Finland ',
       'France', 'Germany ', 'Italy ', 'Netherlands ', 'Norway '], dtype=object)

Let's clean those up. Also set the Country column as an index which we'll use when joining with the geo coordinates data later.

In [24]:
gdp.Country = gdp.Country.map(lambda s: s.strip())
gdp.set_index('Country', inplace=True)

As you saw in the table above there are some NaN values. To ensure a smooth series of values for each country I fill the missing values with the nearest known value to the left. To handle the countries with an NaN in the first column we can set those cells to 0.

In [25]:
gdp_fill = gdp.applymap(lambda v: v if type(v) == np.float64 else np.nan)
gdp_fill.astype(float)
gdp_fill.ix[:,0].fillna(0, inplace=True)
gdp_fill.fillna(method='ffill', axis=1, inplace=True)
gdp_fill.ix[:5,:10]
Out[25]:
1 1000 1500 1600 1700 1820 1821 1822 1823 1824
Country
Western Europe 0 0 0 0 0 0 0 0 0 0
Austria 213 298 1414 2093 2483 4104 4104 4104 4104 4104
Belgium 135 170 1225 1561 2288 4529 4529 4529 4529 4529
Denmark 72 144 443 569 727 1471 1541 1564 1564 1611
Finland 8 16 136 215 255 913 913 913 913 913

The second piece of data that I need is a list of the geographic centers for each country. A quick search led me to http://gothos.info/resources/ which has exactly what I need.

In [26]:
coord = pd.read_table('country_centroids_primary.csv', sep='\t')
coord.rename(columns = {'SHORT_NAME':'Country'}, inplace=True)
coord.ix[:5]
Out[26]:
LAT LONG DMS_LAT DMS_LONG MGRS JOG DSG AFFIL FIPS10 Country FULL_NAME MOD_DATE ISO3136
0 33.000000 66.0 330000 660000 42STB1970055286 NI42-09 PCLI NaN AF Afghanistan Islamic Republic of Afghanistan 2009-04-10 AF
1 41.000000 20.0 410000 200000 34TDL1589839239 NK34-08 PCLI NaN AL Albania Republic of Albania 2007-02-28 AL
2 28.000000 3.0 280000 30000 31REL0000097202 NH31-15 PCLI NaN AG Algeria People's Democratic Republic of Algeria 2011-03-03 DZ
3 -14.333333 -170.0 -142000 -1700000 1802701 NaN PCLD US AS American Samoa Territory of American Samoa 1998-10-06 AS
4 42.500000 1.5 423000 13000 31TCH7675006383 NK31-04 PCLI NaN AN Andorra Principality of Andorra 2007-02-28 AD
5 -12.500000 18.5 -123000 183000 34LBM2828516873 SD34-01 PCLI NaN AO Angola Republic of Angola 2007-02-28 AO

One problem that I found was that there were countries present in one table but not the other. Here are the lists of all those cases:

In [27]:
print "Has coordinates but no GDP:", sorted(set(coord.Country) - set(gdp_fill.index))
print "\nHas GDP but no coordinates:", sorted(set(gdp_fill.index) - set(coord.Country))
Has coordinates but no GDP: ['American Samoa', 'Andorra', 'Anguilla', 'Antigua and Barbuda', 'Aruba', 'Ascension', 'Bahamas', 'Barbados', 'Belize', 'Bermuda', 'Bhutan', 'Bonaire', 'Bosnia and Herzegovina', 'British Virgin Islands', 'Brunei', 'Cayman Islands', 'Christmas Island', 'Cocos (Keeling) Islands', 'Comoros', 'Cook Islands', "Cote d'Ivoire", 'Cura\xc3\xa7ao', 'Cyprus', 'Democratic Republic of the Congo', 'Dominica', 'Eritrea', 'Ethiopia', 'Falkland Islands', 'Federated States of Micronesia', 'Fiji', 'French Guiana', 'French Polynesia', 'Gaza Strip', 'Gibraltar', 'Greenland', 'Grenada', 'Guadeloupe', 'Guam', 'Guinea-Bissau', 'Guyana', 'Haiti', 'Iceland', 'Kiribati', 'Kosovo', 'Liechtenstein', 'Luxembourg', 'Maldives', 'Malta', 'Marshall Islands', 'Martinique', 'Mayotte', 'Monaco', 'Montenegro', 'Montserrat', 'Nauru', 'New Caledonia', 'Niue', 'Norfolk Island', 'Northern Mariana Islands', 'Palau', 'Papua New Guinea', 'Pitcairn Islands', 'Republic of the Congo', 'Russia', 'Saint Barthelemy', 'Saint Kitts and Nevis', 'Saint Lucia', 'Saint Martin', 'Saint Pierre and Miquelon', 'Saint Vincent and the Grenadines', 'Samoa', 'San Marino', 'Sao Tome and Principe', 'Serbia', 'Sint Maarten', 'Solomon Islands', 'South Sudan', 'Suriname', 'Timor-Leste', 'Tokelau', 'Tonga', 'Turks and Caicos Islands', 'Tuvalu', 'US Virgin Islands', 'Vanuatu', 'Vatican City', 'Wallis and Futuna', 'West Bank', 'Western Sahara']

Has GDP but no coordinates: ['15 Latin American countries', '15 West Asian countries', '16 East Asian countries', '29 East Asian countries', '57 African countries', '8 Latin American countries', 'Bosnia', 'Comoro Islands', 'Congo', 'Czechoslovakia', "C\xc3\xb4te d'Ivoire", 'Eastern Europe', 'Eritrea and Ethiopia', 'Former Czechoslovakia', 'Former Yugoslavia', 'Guinea Bissau', 'Ha\xc3\xafti', 'Hong Kong', 'Russian Federation', 'Serbia/Montenegro', 'Successor Republics of USSR', 'S\xc3\xa3o Tom\xc3\xa9 and Principe', 'Total 12 Western Europe', 'Total 13 small WEC', 'Total 15 Latin American countries', 'Total 15 West Asian countries', 'Total 16 East Asian countries', 'Total 23 Small East Asian countries', 'Total 24 small Caribbean countries', 'Total 29 East Asian countries', 'Total 29 Western Europe', 'Total 3 Small African countries', 'Total 7 East European Countries', 'Total 8 Latin American countries', 'Total Africa', 'Total Asia', 'Total Former USSR', 'Total Latin America', 'Total Western Offshoots', 'West Bank and Gaza', 'Western Europe', 'Western Offshoots', 'Yugoslav and Czech Successor states', 'Yugoslavia', 'Zaire']

Looking through the two lists the only country which was worth a fix was Russia (or Russian Federation).

In [28]:
coord.Country[coord.Country=='Russia'] = 'Russian Federation'

A related problem were the separate entries for Russia and USSR. To reconcile those I filled the empty values for Russia with the ones from USSR.

In [29]:
rf = gdp.ix['Russian Federation']
ussr = gdp_fill.ix['Total Former USSR']
gdp_fill.ix['Russian Federation'][rf.isnull()] = ussr[rf.isnull()]

Set the Country column as the index for the coordinates table and join the two tables.

In [30]:
coord.set_index('Country', inplace=True)
joined = pd.merge(gdp_fill, coord, left_index=True, right_index=True, how='inner', sort=True)

The new table named joined now contains the columns from both tables. We'll need to know where the GDP table's columns end:

In [31]:
joined.columns[187:191]
Out[31]:
Index([u'2002', u'2003', u'LAT', u'LONG'], dtype=object)
In [32]:
N_years = 189

Now we have everything ready to plot the data. First let's look at the number of GDP entries per year (before we padded the gdp table):

In [33]:
plt.plot([sum(gdp.ix[:,x].notnull()) for x in range(189)])
Out[33]:
[<matplotlib.lines.Line2D at 0x1032cc890>]

Those initial years with the lowest counts will result in overlapping points in the plot because the padding we did takes the values from the previous column. When there are a low number of original entries this means that the consecutive centers will be almost identical. To remove them I picked a threshold of at least 15 valid entries per year in order to plot that center.

In [34]:
threshold = 14

It's time to plot what we have. The most important lines in the next code block are:

lat, lon = year_gdp * joined.LAT, year_gdp * joined.LONG
total = sum(year_gdp)
x, y = world(sum(lon)/total, sum(lat)/total)

To find the economic center we use the formula for the center of mass. We weight the coordinates for each country by the respective GDP, take the sum of the weighted coordinates and divide by the sum of all the GDPs.

In [35]:
def plot1(llon, llat, ulon, ulat):
    plt.figure(figsize=(20, 12))
    plt.title('World\'s Center of Economic Activity (years 1-2003)', fontsize=24)
    
    # Initialize the map and configure the style
    world = Basemap(resolution='l',projection='merc', area_thresh=10000,
                llcrnrlon=llon, llcrnrlat=llat, urcrnrlon=ulon, urcrnrlat=ulat)
    world.drawcoastlines(linewidth=0.1)
    world.drawcountries(linewidth=0.1)
    world.drawlsmask(land_color='#E1E1D1', ocean_color='#F0F0E8')
    
    # indices of year columns with enough GDP data
    years = [x for x in range(N_years)
             if sum(gdp.ix[:,x].notnull()) > threshold]
    N = len(years)
    for (c, i) in enumerate(years):
        year_gdp = joined.ix[:,i]
        # weight the coordinates for each country by the corresponding GDP
        lat, lon = year_gdp * joined.LAT, year_gdp * joined.LONG
        total = sum(year_gdp)
        # find the center of mass and convert to map coordinates
        x, y = world(sum(lon)/total, sum(lat)/total)
        world.plot(x, y, 'o', color=cm.Spectral(float(c)/N),
                   markersize=10, label=joined.columns[i])
    
    # Pick the first 4 points and then every 20th for the legend
    handles_labels = zip(*plt.gca().get_legend_handles_labels())
    handles_labels = [handles_labels[i] for i in range(4)+range(5,N,20)]
    handles, labels = zip(*handles_labels)
    legend = plt.legend(handles, labels, title='Year', fontsize=16, numpoints=1)
    plt.setp(legend.get_title(),fontsize=18)
    
    return world

plot1(-110, -40, 140, 65)
Out[35]:
<mpl_toolkits.basemap.Basemap at 0x106b16610>

The plot shows the expected east-west-east move of the economic center. There is a big cluster of points between 1900 and 1940. To see it better we can zoom in:

In [36]:
plot1(-10, 35, 30, 45)
Out[36]:
<mpl_toolkits.basemap.Basemap at 0x106f1ed10>

My guess is that the small reversal in the trend is caused by World War I and the Great Depression. Those events didn't affect Asia as much as the US and Europe.

Another zoomed-in version of the map omits the legend and instead plots year labels next to a sample of the points.

In [37]:
def plot2(llon, llat, ulon, ulat):
    plt.figure(figsize=(20, 15))
    plt.title('World\'s Center of Economic Activity (years 1-2003)', fontsize=24)
    
    world = Basemap(resolution='l',projection='merc', area_thresh=10000,
                llcrnrlon=llon, llcrnrlat=llat, urcrnrlon=ulon, urcrnrlat=ulat)
    world.drawcoastlines(linewidth=0.1)
    world.drawcountries(linewidth=0.1)
    world.drawlsmask(land_color='#E1E1D1', ocean_color='#F0F0E8')
    
    t = 0
    years = [x for x in range(45) + range(50, N_years, 5) + [N_years-1]
             if sum(gdp.ix[:,x].notnull()) > threshold]
    N = len(years)
    labels = [0, 1, 3, 5, 7, 11, 18, 24, 28, 36]
    for (c, i) in enumerate(years):
        year_gdp = joined.ix[:,i]
        lat, lon = year_gdp * joined.LAT, year_gdp * joined.LONG
        total = sum(year_gdp)
        x, y = world(sum(lon)/total, sum(lat)/total)
        world.plot(x, y, 'o', color=cm.Spectral(float(c)/N),
                   markersize=12, label=joined.columns[i])
        if t in labels:
            plt.annotate(joined.columns[i], xy=(x-150000, y+140000),
                         family='cursive', size=23)
        t += 1
    
    return world

plot2(-20, 20, 80, 50)
Out[37]:
<mpl_toolkits.basemap.Basemap at 0x106b16e90>

No comments:

Post a Comment