Recently Google released its Google Analytics API into public beta, which means any old Joe Soap developer can give it a go without having to apply. Despite the fact I didn’t apply for the private beta, I’ve been looking forward to this API for a long time… I know, I know, get a life. Seriously though, this opens the door to some pretty tight integration between web / mobile / desktop apps and analytical data; the possibilities are endless.
Google Analytics Data Export API
The API is easy to use and works with standard HTTP requests which return XML feeds, so you can use it from any programming language. I’m going to explore the API more over the next while, but I’ve used it below to pull the top 20 content items from an Analytics profile along with pageview counts for each.
Before you dive in, please have a look over the developer guide, and the Protocol page in particular.
Using the Analytics API mostly centers around three key tasks: Authentication, Account Authorization Query and finally Profile Query. The Account Authorization Query is not required if you already know the numeric ID of the account/profile combo you need to access. All these steps involve (in a nutshell) requesting a URL and examining the response for what you need. I’ll go through these steps with code later on, but here’s a plain English overview of what’s involved first.
You need to tell Google which user your app is representing/requesting data for. Google offers three ways to authenticate your app for access to a certain Google Analytics profile. These are AuthSub, OAuth and ClientLogin.
AuthSub means Google manages the entering of the username/password of the account you want to work with. This will be reassuring to a lot of people, as they log in via Google.com and your app gets access to only the services they explicitly approve (in this case Google Analytics). The drawback is that you lose a bit of control, and Google displays nasty warning messages of differing severity depending on a number of things.
OAuth is kind of an open standard version of AuthSub which can be used for authorising the use of data in many apps (not just Google ones). A security certificate corresponding to your app must be uploaded to use it.
ClientLogin on the other hand is more traditional and requires your app to request the username/password from the user, or to define it manually (perhaps in the web.config file) if it is static. If it is static and you’re working with the same account (yours or your client’s) all the time, this is not a problem. If your application works with arbitrary accounts, however, users of those accounts may be uneasy about giving you their Google login details, as you might use them in an unethical way or store them stupidly and later be hacked. Additionally, when they give your app their user/pass combo they are giving you access to the entire range of Google services they use (not just Google Analytics).
Therefore, depending on the type of app you’re building, one authentication mode may be more appropriate than the others. I’ve written C# code which utilizes both AuthSub and ClientLogin authentication, which I’ll step through later, but if you want to read more about OAuth please visit OAuth Authentication for Web Applications.
Account Authorization Query
After your app has ‘logged in’ (authenticated) you need to retrieve the profile ID of the account/profile combo you want to query, as this is required in the next step. If you intend to query the same profile(s) all the time, you can retrieve the IDs manually via the Google Analytics GUI: simply log in and click the ‘Edit’ link listed beside each of your website profiles, and you will see the ID listed on the next page. You can hardcode one or more profile IDs into your web.config if you like. If you don’t know ahead of time which website profiles your app will query, you must run an account query first, and Google will return all the website profiles your authenticated user is authorised to work with.
When you are authenticated and have the ID of the profile you want to query, you can then do just that: query. This step, like the others before it, involves submitting an HTTP request (asking for a webpage). The response from that request is an XML feed with all the data you asked for. You define the data you want by configuring query params for that HTTP request.
Step through - Top 20 page titles by pageview count for March 2009
OK, let’s actually see how to get our hands on some data. I’m going to grab the top 20 pages (by pageviews) for March 2009 and just output that data in plain text. A rough demo I put together is located at http://www.davecallan.com/analytics/, and the source code is available at the end of this post.
How to make an HTTP Request (with specified headers) with ASP.NET/C#?
The whole API is HTTP request based so you’ll need to know how to do this. I’m using the below method as a kind of helper when I need to interact with the API at all. The main classes of interest here are HttpWebRequest and HttpWebResponse (if anyone cares these are (roughly speaking) equivalent to using Curl in PHP). Both of these classes are located in the System.Net namespace.
//signature inferred from how the helper is called further down (url, token, auth mode)
public static string GArequestResponseHelper(string url, string token, mode mode)
{
    HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(url);
    //will always be a token of some sort required in the header but the format
    //it is passed in will depend on what type of authorization is being used
    if (mode == mode.ClientLogin)
        myRequest.Headers.Add("Authorization: GoogleLogin auth=" + token);
    else if (mode == mode.AuthSub)
        myRequest.Headers.Add("Authorization: AuthSub token=" + token);
    //obviously you need some kind of try/catch here
    //but OK to bubble auth/connection failures up for demo
    HttpWebResponse myResponse = (HttpWebResponse)myRequest.GetResponse();
    Stream responseBody = myResponse.GetResponseStream();
    Encoding encode = System.Text.Encoding.GetEncoding("utf-8");
    StreamReader readStream = new StreamReader(responseBody, encode);
    //return string itself (easier to work with)
    return readStream.ReadToEnd();
}
The above code requests ‘url’ and returns the response to the calling code. You’ll need to import System.Net, System.IO and System.Text to use it. The Google Analytics API requires (well, not always, but mostly) authorization tokens to be included in headers which are sent along with requests to the API. The code above shows how an authorization token is included via HTTP headers when AuthSub or ClientLogin mode is used. You’ll notice the expected format of the Authorization header changes slightly based on which authentication mode your app is using.
How to authenticate (step 1 of possible 3) using AuthSub. (ClientLogin can be used here instead)
As outlined by Google on the Protocol page regarding AuthSub authentication:
AuthSub proxy authentication is used by web applications that need to authenticate users to Google Accounts. With AuthSub, the website operator and the client code never see the user’s username and password. Instead, the client obtains special AuthSub tokens which it uses to act on a particular user’s behalf.
To use this mode your app must first direct users (via a standard link) to the Google site to log in securely. After login takes place, Google will redirect users back to your app with a query param named ‘token’ embedded in the URL. Your app then needs to upgrade this once-off token to a session token, which it does (yes, you guessed it) via an HTTP request.
Click on the ‘AUTHSUB REMOTE LOGIN’ link on http://www.davecallan.com/analytics/ to see what the process is like (notice the address bar when you get redirected back to my site). Notice also the structure of the link which directs users to the login page on Google’s site in the first place. In my case it’s
The most important param here is ‘next’. It is used by Google after authentication to determine where to redirect the user to. Please read http://code.google.com/apis/analytics/docs/gdata/1.0/gdataProtocol.html#AuthSub for an explanation of the other params.
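To make the shape of that login link a bit more concrete, here is a rough sketch of how it could be assembled in C#. It is not the exact link used on my demo page: the class and method names are made up for illustration, the return address passed in is a placeholder, and the ‘scope’, ‘secure’ and ‘session’ values are my assumptions based on the AuthSub documentation, so check them against the Protocol page before relying on them.

```csharp
using System;

public static class AuthSubLinkSketch
{
    // Builds the link that sends a user to Google's AuthSub login page.
    // nextUrl is the page on YOUR site that Google redirects back to
    // (with ?token=... appended) once the user has approved access.
    public static string BuildAuthSubLoginUrl(string nextUrl)
    {
        return "https://www.google.com/accounts/AuthSubRequest"
            + "?next=" + Uri.EscapeDataString(nextUrl)
            // assumption: this scope value covers the Analytics feeds
            + "&scope=" + Uri.EscapeDataString("https://www.google.com/analytics/feeds/")
            // assumption: the app is not registered for secure tokens
            + "&secure=0"
            // ask for a token that can later be upgraded to a session token
            + "&session=1";
    }
}
```

Calling AuthSubLinkSketch.BuildAuthSubLoginUrl("http://www.example.com/analytics/") gives a URL you can drop into a normal link. When Google redirects back, an ASP.NET page would typically read the once-off token with Request.QueryString["token"].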
In terms of upgrading the once-off token in the address bar to a longer term session token, well, I’m using the method below for that:
public static string getSessionTokenAuthSub(string tempToken)
{
    string response = GArequestResponseHelper("https://www.google.com/accounts/AuthSubSessionToken", tempToken, mode.AuthSub);
    //temp (once off) token will have been exchanged for session token, return it
    //(assumes the first line of the response has the form Token=<value>)
    return response.Split('\n')[0].Split(new char[] { '=' }, 2)[1].Trim();
}
Following the API reference on the analytics developer site I know what URL to ask for. I also specify the authentication mode so the HTTP request/response helper method will know what format of header to include.
How to get a list of website profiles which the authenticated user is allowed to access (step 2 of 3).
This step is not needed if you know the ID of the profile you want to work with. The method I’m using is below:
string response = GArequestResponseHelper("https://www.google.com/analytics/feeds/accounts/default", sessionToken, mode);
//response will contain an XML formatted string;
//we need to convert it to proper XML for parsing
XmlDocument accountinfoXML = new XmlDocument();
accountinfoXML.LoadXml(response);
//each account/profile combo the current user is authorized for will have an 'entry' element
XmlNodeList entries = accountinfoXML.GetElementsByTagName("entry");
NameValueCollection profiles = new NameValueCollection();
for (int i = 0; i < entries.Count; i++)
//profile name, profile ID - profile ID is needed to identify what data you want from the API
Pass in the token (now a session token) acquired in the previous step. This method parses the XML response to get the website profile name and profile ID, which I have bound to a dropdownlist (see the demo) so the user can select the relevant profile to query (in the next step). You need the System.Xml namespace for the above to work, plus System.Collections.Specialized for NameValueCollection.
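If it helps to see the parsing step on its own, here is a minimal, self-contained sketch of pulling profile name / profile ID pairs out of an account feed. It is not the exact code from my GoogleAnalytics class. The class and method names are hypothetical, and the sketch assumes each ‘entry’ carries the profile name in a ‘title’ element and the ID to query with in a ‘dxp:tableId’ element, so compare it against a real response from the accounts feed before using it.

```csharp
using System;
using System.Collections.Specialized;
using System.Xml;

public static class AccountFeedSketch
{
    // Returns profile name -> profile (table) ID pairs found in the account feed XML.
    public static NameValueCollection ParseProfiles(string accountFeedXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(accountFeedXml);

        NameValueCollection profiles = new NameValueCollection();
        // one 'entry' element per account/profile combo the user may access
        foreach (XmlElement entry in doc.GetElementsByTagName("entry"))
        {
            // assumption: title holds the profile name, dxp:tableId the ID to query with
            XmlNodeList titles = entry.GetElementsByTagName("title");
            XmlNodeList tableIds = entry.GetElementsByTagName("dxp:tableId");
            if (titles.Count == 0 || tableIds.Count == 0)
                continue; // skip entries that don't match the assumed shape
            profiles.Add(titles[0].InnerText, tableIds[0].InnerText);
        }
        return profiles;
    }
}
```

The resulting collection can be bound to a dropdownlist or simply looped over; the value side of each pair is what goes into the query URL in the next step.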
How to actually query an account (step 3 of 3).
Authentication this, authorization that… it’s time to actually request and get some real data. Specifying what data you want is all done by configuring query params in the URL which you request from the Analytics API. The Retrieving Report Data section on the protocol page has a wealth of information (but there’s more to be had on other sections of the analytics developer site too) so get it open in another window if you haven’t already done so.
As before, it’s all based around an HTTP request (asking for a webpage), so we’ll be using our friend GArequestResponseHelper again. The base request URL is https://www.google.com/analytics/feeds/data, however we need to add a load of query string params to that URL to tell Google what type of information we want. First I’ll show you the URL I have hardcoded (for the purposes of this demo only) which gets me the top 20 page titles by pageviews, and then I’ll explain it somewhat. Please refer to the retrieving report data section linked to above for more.
The ids param is required and lets you specify the profile ID of the profile you want data for (you got this ID previously). The last four params should be fairly obvious. Metrics are the actual values you want to get hold of. In this case I’ve requested pageview counts. Dimensions are the contexts or breakdowns (or cross sections) for those metrics. Without dimensions specified, the metric value reflects data in an Analytics account as a whole (one aggregated value only); however, if you specify that you want to see a metric (or metrics) broken down by a dimension, you get a breakdown of dimension->value combos. In this instance I don’t just want a count of total pageviews in an Analytics account, I want total pageviews for each individual page title (the top 20 of them) in the account.
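As a rough illustration of how these params fit together, here is a small sketch which builds a ‘top pages by pageviews’ query URL. It is not necessarily identical to the URL hardcoded in my demo. The class and method names are hypothetical, and the param names (ids, dimensions, metrics, sort, start-date, end-date, max-results) and the ga:pageTitle / ga:pageviews values reflect my reading of the retrieving report data section, so treat them as assumptions to verify there.

```csharp
using System;
using System.Globalization;

public static class DataFeedUrlSketch
{
    // Builds a data feed URL asking for page titles ranked by pageviews.
    // tableId is the profile ID in the form returned by the account feed (e.g. "ga:1234").
    public static string BuildTopPagesUrl(string tableId, DateTime start, DateTime end, int maxResults)
    {
        return "https://www.google.com/analytics/feeds/data"
            + "?ids=" + Uri.EscapeDataString(tableId)
            + "&dimensions=" + Uri.EscapeDataString("ga:pageTitle")
            + "&metrics=" + Uri.EscapeDataString("ga:pageviews")
            // leading minus asks for descending order, i.e. most viewed first
            + "&sort=" + Uri.EscapeDataString("-ga:pageviews")
            + "&start-date=" + start.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
            + "&end-date=" + end.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
            + "&max-results=" + maxResults.ToString(CultureInfo.InvariantCulture);
    }
}
```

For March 2009 you would call BuildTopPagesUrl("ga:1234", new DateTime(2009, 3, 1), new DateTime(2009, 3, 31), 20), where ga:1234 is a placeholder profile ID, and pass the result to GArequestResponseHelper along with your session token.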
What’s returned from the API is completely dependent on what you ask for in the request parameters (assuming you have any data in the first place of course). An excerpt from the XML returned by the above request (on the Google Analytics website profile for akamarketing.com) is below:
<title type='text'>ga:pageTitle=dynamic URL rewriter tool | mod rewrite tool | convert dynamic urls into static urls</title>
<link rel='alternate' type='text/html' href='http://www.google.com/analytics'/>
<dxp:dimension name='ga:pageTitle' value='dynamic URL rewriter tool | mod rewrite tool | convert dynamic urls into static urls'/>
<dxp:metric confidenceInterval='0.0' name='ga:pageviews' type='integer' value='409'/>
<title type='text'>ga:pageTitle=Google Analytics - exclude your visits even with a dynamic IP</title>
<link rel='alternate' type='text/html' href='http://www.google.com/analytics'/>
<dxp:dimension name='ga:pageTitle' value='Google Analytics - exclude your visits even with a dynamic IP'/>
<dxp:metric confidenceInterval='0.0' name='ga:pageviews' type='integer' value='389'/>
Each dimension value (page title) gets its own ‘entry’ element. In this case each ‘entry’ element has only one dimension and one metric, however since you can request multiple metrics and multiple dimensions in the same request this will not always be the case. You will therefore have to alter your XML parsing code depending on exactly what data you’re requesting. It is important to note that not all metrics can be combined with all dimensions; some data relationships just don’t make sense. If you request a bad combination Google will respond with a ‘bad request’ error.
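For the simple one dimension / one metric case above, the parsing could look something like the sketch below. It is only an illustration: the class and method names are hypothetical, it reads just the first dxp:dimension and dxp:metric in each entry, and it assumes the metric value is an integer, as it is for ga:pageviews in the excerpt.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Xml;

public static class DataFeedParseSketch
{
    // Returns (page title, pageviews) pairs in the order the feed lists them.
    public static List<KeyValuePair<string, int>> ParsePageviews(string dataFeedXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(dataFeedXml);

        List<KeyValuePair<string, int>> rows = new List<KeyValuePair<string, int>>();
        foreach (XmlElement entry in doc.GetElementsByTagName("entry"))
        {
            XmlNodeList dimensions = entry.GetElementsByTagName("dxp:dimension");
            XmlNodeList metrics = entry.GetElementsByTagName("dxp:metric");
            if (dimensions.Count == 0 || metrics.Count == 0)
                continue; // not the shape this sketch expects

            // the data lives in the 'value' attribute of each element
            string title = ((XmlElement)dimensions[0]).GetAttribute("value");
            int views = int.Parse(((XmlElement)metrics[0]).GetAttribute("value"), CultureInfo.InvariantCulture);
            rows.Add(new KeyValuePair<string, int>(title, views));
        }
        return rows;
    }
}
```

If you request several dimensions or metrics at once, you would loop over every dxp:dimension and dxp:metric in the entry and use the ‘name’ attribute to tell them apart, rather than taking only the first of each.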
I’ve used the AuthSub method for step 1 above. Steps 2 and 3 are the same regardless of which authentication mode is used. I won’t go through the ClientLogin mode much, as it’s simple enough and the code is available and commented. Basically, to use ClientLogin you need to POST username/password details to a specific URL (as defined in the API reference). If they are correct you will get back a session token, which you use in exactly the same way as if the token had originated from AuthSub authentication.
I’ve put all the code helper segments into a class called GoogleAnalytics so you can see the full source. It’s by no means production ready code, so please don’t leave a comment bringing that to my attention. If you want to see the full code for what’s located on http://www.davecallan.com/analytics then I’ve created an analytics.zip file which might be helpful. The code behind for the default page is quite simple and really just makes use of the static helper methods in the GoogleAnalytics class, but if you’ve any questions let me know. The main thing for you, me and everyone else to do at this stage is to just play around with the new API; eventually I’m sure lots of excellent resources, tutorials and walkthroughs will appear to enable it to really take off.
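For anyone who wants the gist of ClientLogin without opening the zip, here is a hedged sketch of the POST. It is not a copy of the code in my GoogleAnalytics class. The class and method names are hypothetical, and the endpoint, the form field names (accountType, Email, Passwd, service, source) and the ‘Auth=’ line in the response are my understanding of the ClientLogin documentation, so please verify them against the API reference.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class ClientLoginSketch
{
    // Builds the form-encoded body for the login POST (field names are assumptions, see above).
    public static string BuildPostBody(string email, string password, string appName)
    {
        return "accountType=GOOGLE"
            + "&Email=" + Uri.EscapeDataString(email)
            + "&Passwd=" + Uri.EscapeDataString(password)
            + "&service=analytics"
            + "&source=" + Uri.EscapeDataString(appName);
    }

    // Picks the token out of a response made up of key=value lines; returns null if absent.
    public static string ExtractAuthToken(string responseBody)
    {
        foreach (string line in responseBody.Split('\n'))
        {
            if (line.StartsWith("Auth=", StringComparison.Ordinal))
                return line.Substring("Auth=".Length).Trim();
        }
        return null;
    }

    // POSTs the credentials and returns the token; failures bubble up as WebException.
    public static string Login(string email, string password, string appName)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://www.google.com/accounts/ClientLogin");
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        byte[] body = Encoding.UTF8.GetBytes(BuildPostBody(email, password, appName));
        request.ContentLength = body.Length;
        using (Stream requestStream = request.GetRequestStream())
        {
            requestStream.Write(body, 0, body.Length);
        }
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            return ExtractAuthToken(reader.ReadToEnd());
        }
    }
}
```

The token that comes back would then be passed to GArequestResponseHelper with mode.ClientLogin, so that the ‘GoogleLogin auth=’ form of the Authorization header is used.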
If you’re not an ASP.NET developer, here are some PHP resources about the new Analytics API:
Similar to this blog post, only targeted towards PHP programmers
Using PHP & CURL to authenticate against ClientLogin
PHP Class for doing common API things.