Backlink Data – Who is the Best Provider of them All?
January 9, 2012
With the death of Yahoo Site Explorer, online marketers around the world have started looking for replacements for this crucial service. There are several link data providers in the market and the number seems to be climbing constantly. Yet, there has been little or no independent research to assess the quality and the comprehensiveness of the data that these vendors offer.
When looking at backlink data, different parameters are important for different purposes. Some people do not want to look at all the data because of the sheer size of the reports that some vendors offer. What is important to remember is that anyone who decides to look at partial data is automatically subscribing to the decision-making process used to choose which links to show and which to leave out of the report. This process has to be based on a set of rules deciding which links are important and which are not. These rules are usually arbitrary and rest on a fair number of assumptions about how search engines decide which links should get more weight and which should be discounted. While these assumptions may help you wade through large amounts of data, they may have absolutely nothing to do with how search engines look at the link data, nor do they necessarily help you get what you need from the backlink reports. Therefore, here at RankAbove, we always prefer looking at as much data as possible and performing our own analysis on the most comprehensive source of data.
To establish which of the current large backlink data providers lets us analyse the largest sample of backlink data, we chose three different URLs from sites we work on and ran them through the most popular backlink data providers on the market today: OSE (by SEOmoz), MajesticSEO (both Fresh and Historic indices), Ahrefs and Link Research Tools. First, here is a little bit of background on each of the vendors we analysed:
OSE (SEOMoz): there is no one in the world of SEO who is not familiar with both the SEOMoz blog and their set of tools. Their backlink data analyser, Open Site Explorer, is powered by their Linkscape Web index. It is built by crawling the web, starting from a seed list of trusted sites and branching out from there. In addition to the backlink index, it offers some of its own proprietary analysis, assigning quality, trust and authority scores to links. According to SEOMoz, its index includes 9.2 trillion links and 0.36 trillion URLs. With the advanced reporting feature, you can download up to 100,000 links per report.
MajesticSEO: Majestic was the first link data vendor to come to the SEO market. The need for a larger dataset existed even when YSE was still around – Yahoo allowed only 1000 links per query, and even though one could increase this number through advanced queries, there was still a large gap in the information available to webmasters. This gap was filled by MajesticSEO, which provided access to all the data in its backlink index. As a matter of fact, Majestic has two indices: the Historic Index, which contains all of their data from June 2006 till November 2011 and gets updated every 30 days, and the Fresh Index, which includes all the links found in the last 30 days and gets committed to the Historic Index every 30 days. According to the MajesticSEO site, the Historic Index includes 357.6 trillion pages crawled and 3.6 quadrillion links, while the Fresh Index includes 20.2 trillion pages and 108.3 trillion links. Their reports allow you to download all the links, although the number of links and reports you are able to see depends on the package you purchase.
UPDATED: May 2018 – Updated to show new information about index and amounts allowed to download
Ahrefs: this is a relatively new member of the link data crowd. They claim to have their own bots that crawl the web and gather backlink data. According to their site, their index has 5.9 billion unique URLs and 15.8 trillion unique backlinks (updated from 13 billion). However, there is a limit to the number of links you can export and it depends on your package – the basic package allows 100,000 backlinks per query (updated from 2,500), while the most expensive package allows 10 million backlinks per query (updated from 20,000).
(Update: According to the Ahrefs CEO commenting in the SEO Book Forum (subscription only), they do have the capability to provide access to more raw data, with a free account being able to download up to 10K links and the most expensive account (Enterprise – $499 per month) up to 200M links. Unfortunately, access to this data was not apparent at the time we did the research, so the Ahrefs data is based on the originally stated sample sizes.)
Link Research Tools – this service does not crawl the web itself; it is more of a “meta link provider”. It compiles data from several other available sources (among them SEOMoz, MajesticSEO, SEMRush, etc.), de-duplicates it and checks whether the links are live. As we will see later in this post, link decay is a big problem that takes a lot of resources to deal with, so having an external service do this for you can be a big plus. However, they allow only up to 2,000 links per report in their Quality Backlink Report, with the possibility of using something called Link Boost within a different tool called Backlink Profiler to get up to 12,000.
So with all the central players presented, we started checking the issues we felt were important to assess as to which of the providers has the best data.
We checked several things:
1. The number of provided links for a requested URL
2. The percentage of decayed links – how many links were dead at the time of the report creation
3. The percentage of unique links – how many links appear in a specific provider’s report, measured against each of the other providers.
These three parameters were the most crucial for us when choosing an external backlink provider. They represent the intersection of the need to look at the most comprehensive index with the wish not to waste resources on analysing link data for links that are not live. Additionally, we did not want to judge each provider based solely on these two needs, so we checked for the possibility that a certain provider – while providing a smaller sample of links – does a good job of showing a unique sample of links that is not available through other providers.
Number of links
As mentioned earlier, we looked at three different URLs from sites in three completely different industries. One of the URLs is a homepage and the other two are internal pages.
Table A reports the numbers of links for each of the URLs:
Table A
As can be seen, there are huge differences in the number of links reported for each URL. However, this data is useless without knowing how many of the reported links are actually live, as opposed to links that were removed or linking sites that no longer exist. For that purpose, we developed specialized tools that check the server response of the linking page, whether the link from the provided list exists on that page and what its anchor text is. This way we were able to establish the link decay for each of the URLs in every provider report.
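The three checks described above (server response, presence of the link, anchor text) can be sketched roughly as follows. This is only an illustration, not the tool we actually built: the names `check_link` and `_AnchorCollector` are made up for this example, the page is assumed to have been fetched already with whatever HTTP client you prefer, and URL matching is simplified to ignoring a trailing slash.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _AnchorCollector(HTMLParser):
    """Collects (absolute href, anchor text) pairs from an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # finished (href, anchor_text) pairs
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative hrefs against the linking page's URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def _normalize(url):
    # Simplification (an assumption of this sketch): only ignore a trailing slash.
    return url.rstrip("/")


def check_link(status_code, html, linking_url, target_url):
    """Classify one reported backlink as live or decayed."""
    if status_code != 200:
        return {"live": False, "reason": f"HTTP {status_code}", "anchor": None}
    parser = _AnchorCollector(linking_url)
    parser.feed(html)
    for href, anchor in parser.links:
        if _normalize(href) == _normalize(target_url):
            return {"live": True, "reason": "link found", "anchor": anchor}
    return {"live": False, "reason": "link missing", "anchor": None}


# Usage with a hard-coded page; in practice status_code and html come from a fetch.
page = '<a href="/about">About</a> <a href="http://example.com/page/">Blue widgets</a>'
print(check_link(200, page, "http://blog.example.org/post", "http://example.com/page"))
```

A link counts as decayed here both when the linking page does not return a 200 response and when the page loads but no longer contains a link to the target URL.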
Link Decay
Listed in Table B are the percentages of decayed links in each of the reports we created:
Table B
As can be seen, the Majestic Historic index has enormous percentages of link decay. This is rather strange, as one would expect that, with the ability to remove deleted links from the report, fewer missing links would be reported. Another thing that jumps out is that the percentage of link decay can vary quite a bit from site to site (and, by extension, from industry to industry). Notice the difference between the decay percentage of URL3 and URL1/2 in OSE. A further thing worth mentioning is that Link Research Tools reports the missing links as ‘missing’, unlike other tools that report them as ‘live’.
Table C shows what the backlink report looks like after accounting for the link decay.
Table C
Yellow fields mark figures that were calculated through sampling. Even with link decay factored in, the Majestic Historic index outperforms all the other indices by far. Ahrefs comes close (though not in all cases; see URL1, for example), and it would be interesting to check how much those two indices overlap.
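The adjustment behind Table C is simple arithmetic: the reported link count multiplied by the share of links that are still live. A minimal sketch, assuming a hypothetical `is_live` callback that performs the liveness check and using made-up numbers rather than our measured figures:

```python
import random


def estimate_decay(links, is_live, sample_size=None, seed=0):
    """Return the fraction of decayed links, checking all links or a random sample."""
    if not links:
        return 0.0
    if sample_size is not None and sample_size < len(links):
        # For very large reports, check only a random sample of the links.
        links = random.Random(seed).sample(links, sample_size)
    dead = sum(1 for link in links if not is_live(link))
    return dead / len(links)


def live_link_estimate(reported_links, decay_rate):
    """Estimated number of live links after accounting for link decay."""
    return round(reported_links * (1 - decay_rate))


# Hypothetical report: 1,000 links, every fourth one is dead.
report = [f"http://site{i}.example/" for i in range(1000)]
dead_links = set(report[::4])
rate = estimate_decay(report, lambda url: url not in dead_links)
print(rate, live_link_estimate(len(report), rate))
```

When `sample_size` is set, the decay rate is an estimate, and so is the live link count derived from it, which is why sampled figures deserve to be marked separately.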
The next thing we wanted to analyse is what percentage of unique links each report includes. We checked only URL1 and URL3, since URL2 had a huge number of links and a lot of the reports only represented a sample of all of them.
Unique links
Table D shows the figures on the number of unique links each backlink report included:
URL1
And here is the graphical representation of the above data:
And in Table E is the data for the URL3:
And here is the graphical representation of the above data:
Notice that in both cases, the average figures for the Majestic Fresh Index are skewed towards the lower end due to the high overlap with the Majestic Historic Index (only and 6.29% unique links for URL1 and URL3 respectively). If we discount the overlap with the Historic Index, the Majestic Fresh Index has a pretty high percentage of unique links.
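For readers who want to repeat this comparison, the pairwise unique percentages and the effect of discounting one index can be sketched as below. The function names and the tiny link sets are hypothetical and serve only to show the calculation; each report is assumed to be a set of already normalized, live linking URLs.

```python
def unique_percentages(reports):
    """For each provider, the % of its links that are absent from each other provider's report."""
    result = {}
    for name, links in reports.items():
        row = {}
        for other, other_links in reports.items():
            if other == name:
                continue
            # Set difference: links this provider has that the other one lacks.
            row[other] = 100.0 * len(links - other_links) / len(links) if links else 0.0
        result[name] = row
    return result


def average_unique(row, exclude=()):
    """Average unique percentage for one provider, optionally ignoring some providers."""
    values = [pct for other, pct in row.items() if other not in exclude]
    return sum(values) / len(values) if values else 0.0


# Made-up data: the "fresh" report is fully contained in the "historic" one.
reports = {
    "historic": {"a", "b", "c", "d"},
    "fresh": {"a", "b"},
    "other": {"a", "e"},
}
table = unique_percentages(reports)
print(average_unique(table["fresh"]))                        # average against all providers
print(average_unique(table["fresh"], exclude=("historic",)))  # discounting the Historic overlap
```

In this toy data the Fresh report has no links that Historic lacks, which drags its average down, exactly the kind of skew described above.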
So the question we pose is how do unique percentages translate into numbers? In Table E below are the numbers of unique, live links each backlink data provider gives for the two URLs we tested:
Conclusion:
Here at RankAbove we were looking for a backlink data supplier that would provide the Company with the most comprehensive information about backlinks that we could find. Our proprietary Drive algorithm makes a lot of decisions and SEO suggestions based on backlink data, and we wanted to base those decisions on the most extensive and complete backlink data that exists in the market. For that purpose, it is clear that the MajesticSEO Historic index is without a doubt the best solution for our needs.
Ahrefs was a worthy contender, but at the moment their pricing structure and the number of raw links they allow you to download make them an unfeasible solution for our needs. However, different users may require different things from their backlink data. The number of API calls or the amount of data processing (if you are automating your backlink analysis) may be an important factor, in which case the size of the index could count against a provider in your decision. It is important to remember that the extremely high percentages of link decay in the Majestic Historic Index make it necessary to perform independent crawling to establish which of the links in the index are live. This process can be very resource-demanding and should be an important factor in choosing a backlink data provider. There are other parameters that we didn’t check, such as the comprehensiveness of the linking domain and/or anchor profile reports, the freshness of the different indices or even the number of API functions available for slicing and dicing the data on the provider side.
Furthermore, with the disappearance of Yahoo Site Explorer, there are constantly new backlink data providers appearing in the market. We hope that both the results and the methodology presented in this post will help to accurately assess new providers that may arise in the future.
About the Author:
Branko Rihtman started out in the world of SEO in 2001. Since then he has helped numerous companies increase revenue in some of the most competitive online niches. Over that time, he realized that the SEO competitive advantage is to be found in proper testing and analysis and has started applying his scientific training to plan and execute extensive SEO experiments.
Branko completed his M.Sc. in Microbial Ecology at the Hebrew University, Jerusalem. He was a featured speaker at a number of leading SEO conferences in the US and in Europe, such as SMX Advanced, Sphinncon, MIT Forum, Affilicon, etc. Branko joined RankAbove in October 2011.
Comments
Great post Branko and some great data comparing these providers. As a user of a number of these tools my most important take away from this article was the need to independently measure link decay (especially from the Majestic historical index).
Also very interesting to see OSE lagging behind. I need to take a closer look at Ahrefs.
I’m surprised you did not mention the supplemental metrics provided with the SEOMoz data set. If you are looking only at raw link metrics (# links, # unique linking domains, etc.), the volume matters. But we know that modern search engines look at far more intelligent measurements of link quality – PageRank and TrustRank being prime examples. SEOMoz is currently the only company that provides these types of metrics, which are essential IMHO to crafting a link strategy that avoids over optimizations…
Hi Russ,
As I mentioned in the post, all of these metrics are subjective. Unless you have access to Google’s metrics, you are making an assumption and we were not interested in assumptions.
I have recently read a testimony by someone who was trying to find out why a relatively new website is ruling almost all of his SERPs. Using OSE, there were no links reported that explained the rise in rankings. When looking at Majestic database, he found a bunch of crappy links, using exact anchor text. According to some subjective criteria, these links shouldn’t be counting. But Google counts them and they are helping a site rank for target keywords.
My point is that different people need different things from their link data provider. If SEOMoz assumptions on trust and value work for you, then by all means go and use them. For our needs, we wanted to look at the most complete available sample of backlinks and make our own judgement regarding what is helping and what isn’t. For that task OSE was just not enough.
+1 for your answer and thanks for this benchmark.
About links quality in MajesticSEO:
The ACRank (provided by MajesticSEO, mainly based on number of Unique Domain Name linking to a page/domain) is enough to filter out the links quality:
“ACRank stands for A-Citation-Rank. It is a very simple measure of how important a particular page is by assigning a number from 0 (lowest) to 15 (highest) depending on the number of unique referring external root domains.”
“ACRank is a very simple and primitive measure of importance that allows us to prioritise analysis by looking at the data with the highest ACRanks first. However it should be said that in its current design it suffers from a serious drawback: the actual ACRank weight of a backlink is not taken into account, so a page with one backlink of ACRank 1 will have the same ACRank value of 1 just like the other page that has got a backlink with ACRank of 10 (or 15). Clearly the latter page should have got a higher ACRank as being linked to from the more important pages intuitively makes the page they linked to also more important. This drawback is going to be addressed shortly. Despite this limitation ACRank makes it easier to analyse data that is likely to have the highest weight. Meanwhile a concept of ACRank spread is used in an attempt to have better comparison of pages that might have the same ACRank.”
Another important point which should count when comparing link providers is the number of unique domain-names found. For instance, a backlink in footer can make huge variation in data comparison.
I have been using MajesticSEO for a year and a half and I can’t find a better link source than this one for huge websites.
For small websites, I like the design and the ergonomic interface of Ahrefs.
Thanks for the comment.
Regarding the ACRank – every provider has their own, proprietary way of measuring quality, in that they are all the same. The big difference is whether they limit access to link data according to those parameters or not. OSE does, LRT does and Ahrefs does to some extent. In this way Majestic is a big winner – let me download all the data and decide for my own needs what is important and what isn’t.
As for Ahrefs, this is definitely a tool we will be looking more into in the future. In addition to the link data, they offer other services which seem to be amazing – their SERPs Analysis tool lets you see what keywords any site ranks for and as they expand the database of tracked keywords this will be especially useful tool. If they integrate it with the link data, there is a potential for something amazing there, much better than similar SERP tracking services that exist in the market today – SEMRush for example.
Although those trust metrics could be important factors, you can’t rely on them 100%. Remember, they are based on Moz’s interpretation of trust, and not the actual interpretation of Google.
I often find that some links that count as high trust don’t seem to make an impact, while others that don’t, do.
Exactly my point. OSE is a great tool, don’t get me wrong. At the moment, and if I am not mistaken, OSE is the only tool out there that allows you to quickly find domains that 301 to a different domain, something that is sometimes done as an attempt to get rid of penalties or to disguise link building efforts. It just wasn’t suitable for our needs; we don’t like working according to other people’s assumptions, and this approach was shown to be useful in identifying some spammy (but working) link building tactics in the past.
No RavenTools?
Raventools uses Majestic (and SEOMoz) as data sources. They are not a data source in their own right.
Great post.. in terms of links that were reported by each tool, what was the comparison with the link information given in the GWT?
Looking at a site this morning in OSE reported 120 links but GWT has around 1600.. it would be interesting to see the comparisons with larger sites.
Hey Branko! Thanks for sharing the results of your evaluation. I also picked Majestic for my work (not after such an extensive review though ^^ ), but I agree with SMO BLogger, why didn’t you consider Raven Tools? I never tried them myself, but I mostly hear good things. Would be interesting to know your opinion. Thanks again for a great post!
Raven uses MajesticSEO as their source.
So as Mayer wrote, RavenTools is not a link data provider, they have an (awesome) tool for managing your link building efforts and much more but they are not a provider themselves. In that sense, Link Research Tools are not a data provider either, they are more of a meta provider as they combine link data from several sources. I thought it would be interesting to take them into account as I expected them to cover all the links available and because they check for decay as a part of their data integration process.
As for the GWT data, I was asked this on Twitter as well. There are two main reasons why Google Webmaster Tools data did not make it into this study:
1) It is only available for the sites you own. If you want to do competition research or link prospecting, GWT cannot help you. Our Drive algorithm uses link data for a lot of different purposes, so we needed data for sites that we do not necessarily have access to their GWT
2) There is currently no API access to the GWT backlink data
Excellent post Branko. Have been struggling to see which link provided the best backlink data. I have been using OSE for most of my clients and for website analysis and has proved quite useful. Really love the way data can be filtered to highlight spammy links during site reviews.
Not had a chance to review Majestic SEO thoroughly, will give it a try thanks.
Thanks for this, great article. I particularly like Ahrefs as the free version is excellent and you can download reports of competitor sites, including anchor texts.
Excellent article Branko! You concluded that Majestic was the winner but will you still be using any of the other providers as well? While your article didn’t mention this I was wondering if some of the unique back links found on the other providers were of such high quality to justify keeping those BL providers as a resource?
You also mentioned having to develop specialized tools that check what is the server response of the linking page for these tests, do you know of any tools out in the mainstream that can validate those links? I know ALM does but it is very slow, does your Drive platform have this component?
@mindquest
Found out today that the Screaming Frog SEO Spider Tool will do it for you – http://goo.gl/8a8LU (great tool and worth having even if you don’t need this option)
Also SEO Tools for Excel can check this for you. I would say the SEO Spider would be a more stable option though if checking a lot of URLs.
Hi Mindquest,
Thanks for the compliments and thanks for the love on the forum 😉 Really appreciate it.
We will definitely keep an eye on Ahrefs. As I mentioned in the article, they performed only slightly worse than the Majestic Historic index. I think that I will take a better look at them in cases where I get the feeling Majestic is not providing me the answers I need, although this has yet to happen. As I wrote in one of the other comments, Ahrefs has a few more tools that are definitely worth a look, especially the SERP tracker which tells you what other keywords a site is ranking for.
As for the tools we have, we had a separate web-based link checker developed for this purpose. Drive does have this capability, but we did not want to put a load on Drive as it could possibly delay some of the ongoing checks that our customers perform. As Marc said, Screaming Frog SEO Spider (which I am sure you have seen me pimp numerous times on the forum) can do this as it is described in the article Marc linked to.
Great article, we made similar internal analysis and came to same result.
Great study, have been using SEOMoz, but will look at Majestic after this.
I’m still in shock and mourning about yahoo site explorer…. 🙁
What would people recommend as the best for a free overview?
Jonny
With the change in how OSE will update its index do you think it would change the results of this test much?
SEOmoz announcement:
http://www.seomoz.org/blog/58-billion-urls-largest-linkscape-index-update-yet
This is an excellent blog post Branko. Ahref looks great but like you say their pricing structure needs to be looked at.
I have been using Majestic for a while now but am not too sure about the quality of their data.
You did some fantastic analysis here and we’ll have to re-review our Majestic look as well. The fact that it has such a large link index compared to the others makes it have quite an edge on the sheer volume front. I wonder how long it will take a few of the others to catch up.