[]

by Daniel Velkov, for comments go to Hacker News

The story starts with an insightful graphic which was published in The Economist. The original work was done by the McKinsey Global Institute which itself is based on data collected by the University of Groningen (link to the raw spreadsheet).

The graphic plots the movement of the world's geographic center of economic activity for the years between 1 and 2025. There was one thing which bothered me, and I hope it bothers you too, the points from the 20th century are all positioned in or above Scandinavia which seems unlikely to be the center of anything in the world. You can find the reason for that in the caption on the McKinsey webpage. Their report looks at the Earth as a sphere and finds the economic center of gravity which falls somewhere inside the sphere. To plot it on the map, they take a radius through the center of gravity and intersect it with the surface. To see why this is a problem, consider the extreme case of only USA and China existing and their economies being equally sized. In this setup the economic center will be close to the North Pole because the two countries happen to be on almost the exact opposite sides of the Earth and roughly at the same latitude.

I had an idea how to visualize the same data in a better and more intuitive way. Instead of the original method, I did all the calculations in 2D with respect to the usual map coordinates. This has the nice property that in the USA/China case from above, the center will end up somewhere near Spain which is the midway point between USA and China on most maps. The result of this approach is:

You can see that it shows the same directional trends as the first chart. In the year 1 AD the world's economic center is close to China and India. Looking back at the spreadsheet for that year, you can see that the total economic output of Asia is 5 times larger than Europe's. There is a slight move to the east, in the years 1-1600, which is due to the decline of the Roman Empire and Europe in general during that time. As time progresses, Europe's economy rises and the center starts shifting west. With the help of the US, this movement continues until 1950 when the trend reverses. The return of the Asian economies starts pushing the point east and in 2000 the position is at the same level on the east-west axis as it was in 1900.

The rest of this post describes how to gather the necessary data and produce the plot. It includes processing the raw spreadsheet data with the help of pandas and plotting with matplotlib and its basemap toolkit.

In [20]:

from mpl_toolkits.basemap import Basemap
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
%matplotlib inline
import pandas as pd

The first step was to find a source for historical GDP data by country. One of the top results when searching for "historical gdp data by country" was this spreadsheet which is what The Economist and McKinsey are using.

In [21]:

gdp = pd.read_csv('horizontal-file_03-2007 - GDP.csv', skip_footer=4, thousands=',')
gdp.ix[:5,:10]

Out[21]:

	Unnamed: 0	1	1000	1500	1600	1700	1820	1821	1822	1823
0	Western Europe	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	Austria	213	298	1414	2093	2483	4104	NaN	NaN	NaN
2	Belgium	135	170	1225	1561	2288	4529	NaN	NaN	NaN
3	Denmark	72	144	443	569	727	1471	1541	1564	1564
4	Finland	8	16	136	215	255	913	NaN	NaN	NaN
5	France	2366	2763	10912	15559	19539	35468	38524	37267	38706

The values are in inflation adjusted 1990 US dollars.

In [22]:

gdp.rename(columns = {'Unnamed: 0':'Country'}, inplace=True)

Some of the country names had an extra space at the end:

In [23]:

gdp.Country[:10].values

Out[23]:

array(['Western Europe ', 'Austria ', 'Belgium ', 'Denmark ', 'Finland ',
       'France', 'Germany ', 'Italy ', 'Netherlands ', 'Norway '], dtype=object)

Let's clean those up. Also set the Country column as an index which we'll use when joining with the geo coordinates data later.

In [24]:

gdp.Country = gdp.Country.map(lambda s: s.strip())
gdp.set_index('Country', inplace=True)

As you saw in the table above there are some NaN values. To ensure a smooth series of values for each country I fill the missing values with the nearest known value to the left. To handle the countries with an NaN in the first column we can set those cells to 0.

In [25]:

gdp_fill = gdp.applymap(lambda v: v if type(v) == np.float64 else np.nan)
gdp_fill.astype(float)
gdp_fill.ix[:,0].fillna(0, inplace=True)
gdp_fill.fillna(method='ffill', axis=1, inplace=True)
gdp_fill.ix[:5,:10]

Out[25]:

	1	1000	1500	1600	1700	1820	1821	1822	1823	1824
Country
Western Europe	0	0	0	0	0	0	0	0	0	0
Austria	213	298	1414	2093	2483	4104	4104	4104	4104	4104
Belgium	135	170	1225	1561	2288	4529	4529	4529	4529	4529
Denmark	72	144	443	569	727	1471	1541	1564	1564	1611
Finland	8	16	136	215	255	913	913	913	913	913

The second piece of data that I need is a list of the geographic centers for each country. A quick search led me to http://gothos.info/resources/ which has exactly what I need.

In [26]:

coord = pd.read_table('country_centroids_primary.csv', sep='\t')
coord.rename(columns = {'SHORT_NAME':'Country'}, inplace=True)
coord.ix[:5]

Out[26]:

	LAT	LONG	DMS_LAT	DMS_LONG	MGRS	JOG	DSG	AFFIL	FIPS10	Country	FULL_NAME	MOD_DATE	ISO3136
0	33.000000	66.0	330000	660000	42STB1970055286	NI42-09	PCLI	NaN	AF	Afghanistan	Islamic Republic of Afghanistan	2009-04-10	AF
1	41.000000	20.0	410000	200000	34TDL1589839239	NK34-08	PCLI	NaN	AL	Albania	Republic of Albania	2007-02-28	AL
2	28.000000	3.0	280000	30000	31REL0000097202	NH31-15	PCLI	NaN	AG	Algeria	People's Democratic Republic of Algeria	2011-03-03	DZ
3	-14.333333	-170.0	-142000	-1700000	1802701	NaN	PCLD	US	AS	American Samoa	Territory of American Samoa	1998-10-06	AS
4	42.500000	1.5	423000	13000	31TCH7675006383	NK31-04	PCLI	NaN	AN	Andorra	Principality of Andorra	2007-02-28	AD
5	-12.500000	18.5	-123000	183000	34LBM2828516873	SD34-01	PCLI	NaN	AO	Angola	Republic of Angola	2007-02-28	AO

One problem that I found was that there were countries present in one table but not the other. Here are the lists of all those cases:

In [27]:

print "Has coordinates but no GDP:", sorted(set(coord.Country) - set(gdp_fill.index))
print "\nHas GDP but no coordinates:", sorted(set(gdp_fill.index) - set(coord.Country))

Has coordinates but no GDP: ['American Samoa', 'Andorra', 'Anguilla', 'Antigua and Barbuda', 'Aruba', 'Ascension', 'Bahamas', 'Barbados', 'Belize', 'Bermuda', 'Bhutan', 'Bonaire', 'Bosnia and Herzegovina', 'British Virgin Islands', 'Brunei', 'Cayman Islands', 'Christmas Island', 'Cocos (Keeling) Islands', 'Comoros', 'Cook Islands', "Cote d'Ivoire", 'Cura\xc3\xa7ao', 'Cyprus', 'Democratic Republic of the Congo', 'Dominica', 'Eritrea', 'Ethiopia', 'Falkland Islands', 'Federated States of Micronesia', 'Fiji', 'French Guiana', 'French Polynesia', 'Gaza Strip', 'Gibraltar', 'Greenland', 'Grenada', 'Guadeloupe', 'Guam', 'Guinea-Bissau', 'Guyana', 'Haiti', 'Iceland', 'Kiribati', 'Kosovo', 'Liechtenstein', 'Luxembourg', 'Maldives', 'Malta', 'Marshall Islands', 'Martinique', 'Mayotte', 'Monaco', 'Montenegro', 'Montserrat', 'Nauru', 'New Caledonia', 'Niue', 'Norfolk Island', 'Northern Mariana Islands', 'Palau', 'Papua New Guinea', 'Pitcairn Islands', 'Republic of the Congo', 'Russia', 'Saint Barthelemy', 'Saint Kitts and Nevis', 'Saint Lucia', 'Saint Martin', 'Saint Pierre and Miquelon', 'Saint Vincent and the Grenadines', 'Samoa', 'San Marino', 'Sao Tome and Principe', 'Serbia', 'Sint Maarten', 'Solomon Islands', 'South Sudan', 'Suriname', 'Timor-Leste', 'Tokelau', 'Tonga', 'Turks and Caicos Islands', 'Tuvalu', 'US Virgin Islands', 'Vanuatu', 'Vatican City', 'Wallis and Futuna', 'West Bank', 'Western Sahara']

Has GDP but no coordinates: ['15 Latin American countries', '15 West Asian countries', '16 East Asian countries', '29 East Asian countries', '57 African countries', '8 Latin American countries', 'Bosnia', 'Comoro Islands', 'Congo', 'Czechoslovakia', "C\xc3\xb4te d'Ivoire", 'Eastern Europe', 'Eritrea and Ethiopia', 'Former Czechoslovakia', 'Former Yugoslavia', 'Guinea Bissau', 'Ha\xc3\xafti', 'Hong Kong', 'Russian Federation', 'Serbia/Montenegro', 'Successor Republics of USSR', 'S\xc3\xa3o Tom\xc3\xa9 and Principe', 'Total 12 Western Europe', 'Total 13 small WEC', 'Total 15 Latin American countries', 'Total 15 West Asian countries', 'Total 16 East Asian countries', 'Total 23 Small East Asian countries', 'Total 24 small Caribbean countries', 'Total 29 East Asian countries', 'Total 29 Western Europe', 'Total 3 Small African countries', 'Total 7 East European Countries', 'Total 8 Latin American countries', 'Total Africa', 'Total Asia', 'Total Former USSR', 'Total Latin America', 'Total Western Offshoots', 'West Bank and Gaza', 'Western Europe', 'Western Offshoots', 'Yugoslav and Czech Successor states', 'Yugoslavia', 'Zaire']

Looking through the two lists the only country which was worth a fix was Russia (or Russian Federation).

In [28]:

coord.Country[coord.Country=='Russia'] = 'Russian Federation'

A related problem were the separate entries for Russia and USSR. To reconcile those I filled the empty values for Russia with the ones from USSR.

In [29]:

rf = gdp.ix['Russian Federation']
ussr = gdp_fill.ix['Total Former USSR']
gdp_fill.ix['Russian Federation'][rf.isnull()] = ussr[rf.isnull()]

Set the Country column as the index for the coordinates table and join the two tables.

In [30]:

coord.set_index('Country', inplace=True)
joined = pd.merge(gdp_fill, coord, left_index=True, right_index=True, how='inner', sort=True)

The new table named joined now contains the columns from both tables. We'll need to know where the GDP table's columns end:

In [31]:

joined.columns[187:191]

Out[31]:

Index([u'2002', u'2003', u'LAT', u'LONG'], dtype=object)

In [32]:

N_years = 189

Now we have everything ready to plot the data. First let's look at the number of GDP entries per year (before we padded the gdp table):

In [33]:

plt.plot([sum(gdp.ix[:,x].notnull()) for x in range(189)])

Out[33]:

[<matplotlib.lines.Line2D at 0x1032cc890>]

Those initial years with the lowest counts will result in overlapping points in the plot because the padding we did takes the values from the previous column. When there are a low number of original entries this means that the consecutive centers will be almost identical. To remove them I picked a threshold of at least 15 valid entries per year in order to plot that center.

In [34]:

threshold = 14

It's time to plot what we have. The most important lines in the next code block are:

lat, lon = year_gdp * joined.LAT, year_gdp * joined.LONG
total = sum(year_gdp)
x, y = world(sum(lon)/total, sum(lat)/total)

To find the economic center we use the formula for the center of mass. We weight the coordinates for each country by the respective GDP, take the sum of the weighted coordinates and divide by the sum of all the GDPs.

In [35]:

def plot1(llon, llat, ulon, ulat):
    plt.figure(figsize=(20, 12))
    plt.title('World\'s Center of Economic Activity (years 1-2003)', fontsize=24)
    
    # Initialize the map and configure the style
    world = Basemap(resolution='l',projection='merc', area_thresh=10000,
                llcrnrlon=llon, llcrnrlat=llat, urcrnrlon=ulon, urcrnrlat=ulat)
    world.drawcoastlines(linewidth=0.1)
    world.drawcountries(linewidth=0.1)
    world.drawlsmask(land_color='#E1E1D1', ocean_color='#F0F0E8')
    
    # indices of year columns with enough GDP data
    years = [x for x in range(N_years)
             if sum(gdp.ix[:,x].notnull()) > threshold]
    N = len(years)
    for (c, i) in enumerate(years):
        year_gdp = joined.ix[:,i]
        # weight the coordinates for each country by the corresponding GDP
        lat, lon = year_gdp * joined.LAT, year_gdp * joined.LONG
        total = sum(year_gdp)
        # find the center of mass and convert to map coordinates
        x, y = world(sum(lon)/total, sum(lat)/total)
        world.plot(x, y, 'o', color=cm.Spectral(float(c)/N),
                   markersize=10, label=joined.columns[i])
    
    # Pick the first 4 points and then every 20th for the legend
    handles_labels = zip(*plt.gca().get_legend_handles_labels())
    handles_labels = [handles_labels[i] for i in range(4)+range(5,N,20)]
    handles, labels = zip(*handles_labels)
    legend = plt.legend(handles, labels, title='Year', fontsize=16, numpoints=1)
    plt.setp(legend.get_title(),fontsize=18)
    
    return world

plot1(-110, -40, 140, 65)

Out[35]:

<mpl_toolkits.basemap.Basemap at 0x106b16610>

The plot shows the expected east-west-east move of the economic center. There is a big cluster of points between 1900 and 1940. To see it better we can zoom in:

In [36]:

plot1(-10, 35, 30, 45)

Out[36]:

<mpl_toolkits.basemap.Basemap at 0x106f1ed10>

My guess is that the small reversal in the trend is caused by World War I and the Great Depression. Those events didn't affect Asia as much as the US and Europe.

Another zoomed-in version of the map omits the legend and instead plots year labels next to a sample of the points.

In [37]:

def plot2(llon, llat, ulon, ulat):
    plt.figure(figsize=(20, 15))
    plt.title('World\'s Center of Economic Activity (years 1-2003)', fontsize=24)
    
    world = Basemap(resolution='l',projection='merc', area_thresh=10000,
                llcrnrlon=llon, llcrnrlat=llat, urcrnrlon=ulon, urcrnrlat=ulat)
    world.drawcoastlines(linewidth=0.1)
    world.drawcountries(linewidth=0.1)
    world.drawlsmask(land_color='#E1E1D1', ocean_color='#F0F0E8')
    
    t = 0
    years = [x for x in range(45) + range(50, N_years, 5) + [N_years-1]
             if sum(gdp.ix[:,x].notnull()) > threshold]
    N = len(years)
    labels = [0, 1, 3, 5, 7, 11, 18, 24, 28, 36]
    for (c, i) in enumerate(years):
        year_gdp = joined.ix[:,i]
        lat, lon = year_gdp * joined.LAT, year_gdp * joined.LONG
        total = sum(year_gdp)
        x, y = world(sum(lon)/total, sum(lat)/total)
        world.plot(x, y, 'o', color=cm.Spectral(float(c)/N),
                   markersize=12, label=joined.columns[i])
        if t in labels:
            plt.annotate(joined.columns[i], xy=(x-150000, y+140000),
                         family='cursive', size=23)
        t += 1
    
    return world

plot2(-20, 20, 80, 50)

Out[37]:

<mpl_toolkits.basemap.Basemap at 0x106b16e90>

Daniel's blog

Wednesday, July 16, 2014

Finding the World's Economic Center of Gravity

No comments:

Post a Comment