For anyone in marketing or analytics, attributing your acquisition channels correctly is key to understanding the origin and success of your marketing campaigns, as well as your visitors' journey. For sites whose currency is the consumption of published articles and content, there are a few considerations when using campaign tracking, which is how you separate your acquisition sources or channels.
As a working example, consider how you might track content or campaigns you send out via Twitter. Your URL might be www.mysite.com/page1.htm?cpid=SOCIAL:1233456, which includes a campaign tracking parameter (this example uses SiteCatalyst-style tracking). I would certainly advocate setting up an account with a URL shortening service (bit.ly is a good one) and using it to shorten any long URL, as it will give you some extra data.
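The `cpid` value packs a channel and an ID into one string. SiteCatalyst does its own processing of this, so the following is only a rough server-side sketch, assuming the `CHANNEL:ID` layout shown above; `parse_campaign` is a hypothetical helper, not part of any analytics product.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def parse_campaign(url: str, param: str = "cpid") -> Optional[Tuple[str, str]]:
    """Split a CHANNEL:ID tracking code (e.g. SOCIAL:1233456) into its two parts."""
    values = parse_qs(urlsplit(url).query).get(param)
    if not values:
        return None  # no campaign code on this URL
    channel, sep, campaign_id = values[0].partition(":")
    if not sep:
        return None  # value is not in the assumed CHANNEL:ID format
    return channel, campaign_id


print(parse_campaign("http://www.mysite.com/page1.htm?cpid=SOCIAL:1233456"))
# ('SOCIAL', '1233456')
```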
Tagging your URLs with campaign tracking helps you identify visitors who arrive from Twitter. This isn't foolproof, as your tweeted link could end up in an email, be bookmarked and so on, but it does give you an indication of reach.
There are pitfalls to be aware of though.
Google is now much better at dealing with parameters within URLs (correct at the time of writing), and since Google uses Twitter to feed and accelerate the indexing of new content (my assertion), the link sent over Twitter may well be picked up by Google. Certain URL shorteners issue a permanent 301 redirect from the short link to your full URL, which means there is a permanent reference to the campaign-tagged version of the page.
So why is this an issue?
Imagine you have a popular site that publishes articles to Twitter, RSS and the website simultaneously. The site's internal links might use a nice clean URL, such as www.mysite.com/news/12345/this-is-my-proper-page.htm. The RSS feed (say you also use FeedBurner) is auto-tagged by adding ?cpid=RSS:123456, and the Twitter URL adds ?cpid=SOCIAL:123456 to the end of the standard URL and is shortened by bit.ly.
There are now three URLs that could end up in Google's index, but only one physical page. If Google indexed all three, it would, in its simplest form, treat them as separate pages and so create duplicate content. This is a challenge that needs to be overcome.
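To make the duplication concrete: strip the tracking parameter from each of the three URLs and they collapse into one. A minimal sketch, assuming `cpid` is the only tracking parameter in play and that the bit.ly link has already been expanded to the tagged Twitter URL (`TRACKING_PARAMS` and `clean_url` are hypothetical names):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of parameters that never change the page content.
TRACKING_PARAMS = {"cpid"}


def clean_url(url: str) -> str:
    """Drop tracking parameters (and any fragment) so every variant maps to one URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


variants = [
    "http://www.mysite.com/news/12345/this-is-my-proper-page.htm",
    "http://www.mysite.com/news/12345/this-is-my-proper-page.htm?cpid=RSS:123456",
    "http://www.mysite.com/news/12345/this-is-my-proper-page.htm?cpid=SOCIAL:123456",
]
print({clean_url(u) for u in variants})
# {'http://www.mysite.com/news/12345/this-is-my-proper-page.htm'}
```

A search engine that doesn't know `cpid` is irrelevant sees three distinct strings, which is exactly the duplicate content problem; the options below are different ways of telling it, or forcing it, to see just the one.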
There are several ways of overcoming this, and of cleaning up existing duplicate content on Google. They are detailed below, each with different results and challenges.
1. Use Google's Webmaster Tools to remove parameters.
The world doesn't revolve around Google alone, so this is not a complete solution: it only works for Google. Given Google's market share, you could hypothesise that most visitors would come that route and naturally be led to think it's a quick win! Unfortunately, it isn't. In my experience, applying URL parameter filtering after the fact does not guarantee that the duplicates are cleaned up. Launching a new site with it enabled may help, but the technique is not universally supported.
2. Write a canonical reference onto each page, so that parameters are ignored and the one true page is always acknowledged by search engines.
There are inherent risks in retrospectively applying this to a huge site, but it should clean up a lot of duplicate content. Beware of accidentally canonicalising your entire site to one page: that is seriously bad news for your business! This approach is possibly the simplest to deploy (bar testing), as it requires little technical knowledge and a content management system can usually be changed to output it. Canonicals also help where the same content is categorised differently, for example a pair of shoes listed under both red and pink, or an article about a new product launch that appears in both a car section and a new product launches section.
The following is an example of a canonical tag that would appear in the <head> of the document:
<link rel="canonical" href="http://www.mysite.com/pagename.htm" />
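Rather than hand-writing that tag on every page, a CMS template can derive it from each request. A minimal sketch, assuming that no query parameter on the site ever changes the page content (`canonical_link` is a hypothetical helper):

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link(request_url: str) -> str:
    """Build a <link rel="canonical"> element pointing at the parameter-free URL."""
    parts = urlsplit(request_url)
    # Drop the whole query string and fragment. This ASSUMES no parameter on the
    # site changes the content (e.g. no ?page=2); check that before deploying.
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{escape(clean, quote=True)}" />'


print(canonical_link("http://www.mysite.com/pagename.htm?cpid=SOCIAL:123456"))
# <link rel="canonical" href="http://www.mysite.com/pagename.htm" />
```

Deriving the href from each request's own path, rather than hard-coding a single URL into a shared template, is what guards against canonicalising the whole site to one page.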
3. Use a 301 redirect to remove URL parameters.
This is possibly the most challenging option to implement, and it has downsides as well as upsides. Google now acknowledges that some credit is lost with every 301 redirect, and there is also a question over whether anchor text is retained. The real upside is that visitors always end up on the clean URL, so if the page is later shared, it is shared without all the parameters attached. This solution requires server access to issue the 301 redirect, and the ability to write the campaign data either into the page (as meta data, say) or into a cookie for later access. Be careful if you also depend on the GCLID (gclid) parameter for AdWords and Google Analytics: a 301 redirect that cleans up the URL will strip it, which will have knock-on effects if you use Google Analytics with AdWords.
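One way to act on the gclid caveat is to strip only the parameters you know are campaign codes and leave everything else alone. A minimal sketch, assuming `cpid` is the only parameter to remove (`STRIP_PARAMS` and `redirect_target` are hypothetical names):

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of campaign parameters to strip. gclid is deliberately NOT
# in it, so the AdWords / Google Analytics link survives the redirect.
STRIP_PARAMS = {"cpid"}


def redirect_target(url: str) -> Optional[str]:
    """Return the clean URL to 301 to, or None if the URL is already clean."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k not in STRIP_PARAMS]
    if len(kept) == len(pairs):
        return None  # nothing to strip; redirecting anyway would loop forever
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))


print(redirect_target("http://www.mysite.com/p.htm?cpid=SOCIAL:123456&gclid=abc123"))
# http://www.mysite.com/p.htm?gclid=abc123
```

A blocklist like this is the conservative choice: an allowlist of "safe" parameters would produce cleaner URLs but risks silently dropping something another tool depends on.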
Using a 301 redirect requires the web server to process the page request and, when it issues the redirect, either set a cookie in the visitor's browser (if cookies are used) or pass the parameters on so they can be written into the resulting page as meta data. Also be aware that some analytics vendors, SiteCatalyst for example, lose the referring source through redirection. The analytics JavaScript should therefore be written to pick up the campaign data either from the meta data in the DOM or from the cookie.
If you use the cookie method, write the parameters to a first-party cookie to reduce the impact of cookie filtering. Since your own website writes the cookie, this should be a moot point. We must not forget, though, that this technique may not be appropriate for mobile devices where cookies are not enabled. Thankfully, with the advent of Android and the iPhone, there is greater support for JavaScript and cookies on mobile devices.
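Putting the redirect and the hand-off together, here is a minimal WSGI sketch using only the Python standard library. It assumes `cpid` is the only parameter to strip; the cookie name `campaign` and the `<meta name="campaign">` tag are hypothetical choices, not anything SiteCatalyst or Google Analytics expects. On the tagged request it sets a first-party cookie and issues the 301; on the clean URL it writes the campaign code into the page as meta data.

```python
from html import escape
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlencode
from wsgiref.util import request_uri

COOKIE_NAME = "campaign"  # hypothetical first-party cookie name


def app(environ, start_response):
    params = parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)
    if "cpid" in params:
        # Store the campaign code in a first-party cookie set by our own domain.
        cookie = SimpleCookie()
        cookie[COOKIE_NAME] = params.pop("cpid")[0]
        cookie[COOKIE_NAME]["path"] = "/"
        # 301 to the same path, keeping any other parameters (gclid, for example).
        location = request_uri(environ, include_query=False)
        if params:
            location += "?" + urlencode(params, doseq=True)
        start_response("301 Moved Permanently", [
            ("Location", location),
            ("Set-Cookie", cookie[COOKIE_NAME].OutputString()),
        ])
        return [b""]
    # Clean URL: hand the campaign code to the analytics JavaScript as meta data.
    cookie = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    campaign = cookie[COOKIE_NAME].value if COOKIE_NAME in cookie else ""
    body = ('<html><head><meta name="campaign" content="'
            + escape(campaign, quote=True)
            + '" /></head><body><!-- article --></body></html>')
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body.encode("utf-8")]
```

For local testing it can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`. The analytics JavaScript could then read the value from the DOM, for example via `document.querySelector('meta[name="campaign"]')`, or straight from `document.cookie`, since the cookie is not marked HttpOnly. No expiry is set, so the cookie lasts only for the browser session; that choice should be weighed against your attribution window.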
Conclusion
My take is that pages need a canonical reference to their clean URL. However, it is also worth considering the 301 redirect solution, as this ensures that any bookmarked, shared or linked content on your site uses a clean URL rather than depending on the canonical. With canonicals, you are at the mercy of the search engines' continued support for the approach, though I can't see that being an issue. The same applies to 301 redirects, and to whether credit for links is diminished in some way.
One caveat, though: any post-launch change to a site comes with risk. The way to reduce it is lots of testing, to ensure you haven't done anything to damage your site.
I've focused on campaign management in this post, but the same applies to anything that adds parameters to your URLs and risks creating duplicate content in the eyes of a search engine.