Mapping the (non-English) blogosphere

Posted on 02/01/2011 by


Determining the size of the blogosphere has been a notoriously tricky task that has plagued researchers and pundits of various inclinations since the early 2000s.  Some estimates put the number of blogs at roughly 200–250 million.  The other day I realized that most of the estimates we have cover only the English-speaking world.  Why?

Some attribute the concentration of data on the English-language blogosphere to the dominance of Technorati.  Back in the day, it was the place to go to check the authority or ranking of a blog.  If you wanted attention, you made sure you were featured on Technorati (the landscape has changed quite a bit since; you can read up on the history on Wikipedia).  As a result, a great deal of information was concentrated in one (English-speaking) place.

But now, the times are a-changin’.

As various news sources reported last year, ICANN approved Chinese characters for use in domain names.  Wired reported that:

[T]his comes after countries such as Russia, Egypt and the United Arab Emirates successfully applied for country TLDs in their own alphabets. There are still pending requests from countries such as Jordan, Sri Lanka and Thailand to receive their own domains in Arabic, Sinhalese and Thai, respectively (Wired).

Internet usage is increasing worldwide as bandwidth becomes cheaper and governments invest in ICT infrastructure.  Pingdom reports that the number of Chinese blogs in 2009 was as high as 126 million.  By now, the figure could be twice that: the rate of internet penetration in urban China is truly astounding, and the country added over 69 million internet users last year (Gigaom).

Luckily, a few researchers are looking into this question.

One of my favorite places in academia is the Berkman Center for Internet and Society at Harvard.  Its faculty and fellows research government, net neutrality and privacy issues, cyber-governance, social norms, and more.  It is pretty awesome.  And in the past couple of years, they have been working on mapping the non-English blogosphere.

The Center’s “Mapping the Arabic Blogosphere: Politics, Culture and Dissent”  is a prime example of using data visualization to make larger points about social organization.

From their key findings:

We conducted a study of the Arabic language blogosphere using link analysis, term frequency analysis, and human coding of individual blogs. We identified a base network of approximately 35,000 active blogs, created a network map of the 6,000 most connected blogs, and with a team of Arabic speakers hand coded 4,000 blogs.
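Part of the quoted method is picking out the "most connected" blogs from a link graph. As a rough illustration only, here is a minimal Python sketch that ranks blogs by the number of distinct inbound links (in-degree) and keeps the top k. This is not the Berkman Center's actual procedure: the `(source, target)` input format, the hypothetical `most_connected` function, and the use of raw in-degree as the connectivity measure are all assumptions.

```python
from collections import Counter

def most_connected(links, k):
    """Return the k blogs with the most distinct inbound links.

    Hypothetical helper: `links` is assumed to be an iterable of
    (source_blog, target_blog) pairs. Duplicate links are counted once
    and self-links are ignored.
    """
    unique = {(src, dst) for src, dst in links if src != dst}
    # In-degree: how many distinct blogs link to each target.
    in_degree = Counter(dst for _, dst in unique)
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(in_degree.items(), key=lambda item: (-item[1], item[0]))
    return [blog for blog, _ in ranked[:k]]

links = [
    ("a.example", "b.example"),
    ("c.example", "b.example"),
    ("c.example", "d.example"),
    ("b.example", "d.example"),
    ("a.example", "b.example"),  # duplicate link, counted once
    ("d.example", "d.example"),  # self-link, ignored
    ("b.example", "a.example"),
]
print(most_connected(links, 2))
```

With this toy data the call returns `['b.example', 'd.example']`. A study of tens of thousands of blogs might well use a richer measure than raw link counts; in-degree is simply the easiest one to show.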

The authors conclude that:

  1. The Arabic blogosphere is organized primarily around countries.
  2. Demographic results indicate that Arabic bloggers are predominantly young and male.
  3. Personal life and local issues are most important: most bloggers write mainly personal, diary-style observations.
  4. When discussing terrorism, Arab bloggers are overwhelmingly critical of terrorists. When the US is discussed, it is nearly always discussed critically.

I highly recommend reading their whole report, if only for the gorgeous visuals.

A similar and more recent (October 2010) piece is entitled “Public Discourse in the Russian Blogosphere: Mapping RuNet Politics and Mobilization.”  I won’t go into detail now (I would like to cover the Berkman Center’s work more thoroughly in a later post), but once again they are able to draw conclusions about the preferences of, and relationships among, internet users in a way that Technorati never did.

What is the future of non-English internet use? What I see is an increasing divergence in the content and relationships between various ‘spheres’ (as the reports frame them).  The more data we can gather on ever-growing internet participation, the better we can identify the cleavages that are carving out increasingly distinct spaces on the web.

(Thanks to the Berkman Center for their images)