- Seeing content in Google Cache doesn’t mean it is indexed by Google.
If you want to know which frameworks work well with SEO, but don’t want to go through the experiment’s documentation, click here to scroll straight to the results section and see the charts presenting the data.
Why I Created This Experiment
I believe Google’s announcement was widely misunderstood. Let me explain why.
Most developers reference this section of Google’s blog post:
In the same article, you will find a few more statements that are quite interesting, yet overlooked.
“Sometimes things don't go perfectly during rendering, which may negatively impact search results for your site.”
Angular U conference, June 22-25, 2015, Hyatt Regency, San Francisco Airport
“Angular 2 Server Rendering”
If you search for any competitive keyword terms, it’s always gonna be server rendered sites. And the reason is because, although Google does index client rendered HTML, it’s not perfect yet and other search engines don’t do it as well. So if you care about SEO, you still need to have server-rendered content.
This experiment is the first step in providing clear, actionable data on how Google handles websites built with different JS frameworks.
Now that we have discussed the why of this test, let’s look at how we set it up.
Setting Up the Website
The core of the website was coded 100% in HTML to make sure it was fully crawlable and indexable. Things got interesting when you opened one of the subpages.
At this point, our experiment was more or less ready to go. All we needed now was content.
Our “Hello World” pages got indexed a few hours after we launched the website. To make sure there was some unique content we could “feed” Googlebot, I decided to hire artificial intelligence to write the articles for us. To do that, we used Articoloo, which generates amazing AI-written content.
I decided the theme of our articles would be based on popular tourist destinations.
Having indexed content is only half the battle, though. A website’s architecture can only work properly if Googlebot can follow internal and external links.
Let me show you an example.
To make it even easier to track, links pointed to the *framework*/test/ URLs.
A link generated by the Angular 2 page (http://jsseo.expert/angular2/) would point to http://jsseo.expert/angular2/t e s t/ (spaces added to avoid messing up the experiment with a live link!). This made it really easy to track how Googlebot crawls the /test/ URLs. The links weren’t accessible to Googlebot in any other form (external links, sitemaps, GSC fetch, etc.).
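Purely as an illustration (this is not the experiment’s code, which lives in the GitHub repo, and the `buildTestUrl` helper is hypothetical), this is how a client-side script can expose a link that only exists after JavaScript runs:

```javascript
// Illustration only: a link injected by client-side JavaScript. A crawler
// that doesn't execute JS never sees the anchor, which is what made the
// /test/ hits a clean signal of JS rendering.
function buildTestUrl(framework) {
  return 'http://jsseo.expert/' + framework + '/test/';
}

// Browser-only part: inject the link into the rendered page.
if (typeof document !== 'undefined') {
  var link = document.createElement('a');
  link.href = buildTestUrl('angular2');
  link.textContent = 'test page';
  document.body.appendChild(link);
}
```

Since the anchor is created at runtime, the /test/ URL can only be discovered by a crawler that actually renders the page.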
To track if Googlebot visited those URLs, we streamed server logs to Loggly.com. This way, I would have a live preview of what was being crawled by Googlebot, while my log data history would be safely stored on the server.
Next, I created an alert to be notified about visits to any */test/ URL from any known Google IP address.
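For illustration, here is a minimal filter of the kind our Loggly alert approximated (the function and the sample log line are assumptions of ours, not Loggly’s query syntax): flag any request to a */test/ URL whose user agent claims to be Googlebot.

```javascript
// Sketch of the alert condition: a */test/ request from a client that
// identifies itself as Googlebot.
function isGooglebotTestHit(logLine) {
  var hitsTestUrl = /GET \/[a-z0-9-]+\/test\//i.test(logLine);
  var claimsGooglebot = /Googlebot/i.test(logLine);
  return hitsTestUrl && claimsGooglebot;
}

// Hypothetical combined-log-format line for a Googlebot hit on /angular2/test/.
var sampleHit =
  '66.249.66.1 - - [10/May/2017:06:25:14 +0000] ' +
  '"GET /angular2/test/ HTTP/1.1" 200 5123 "-" ' +
  '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"';
```

The user-agent string alone is spoofable, which is why the alert was additionally restricted to known Google IP addresses.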
The methodology for the experiment was dead simple, to make sure we measured everything precisely and avoided false positives or negatives.
- We had a plain HTML page as a reference to make sure Googlebot could fully access our website, content, etc.
- We tracked server logs. Tools: Loggly for a live preview, plus full server logs stored on the server (Loggly has a limited log retention time).
- We carefully tracked the website’s uptime to make sure it was accessible to Googlebot. Tools: NewRelic, OnPage.org, StatusCake.
- We made sure all resources (CSS, JS) were fully accessible to Googlebot.
- All http://jsseo.expert/*FRAMEWORK-NAME*/test/ URLs were set to noindex, follow, and we carefully tracked whether Googlebot visited any of the /test/ pages via custom alerts set up in Loggly.com.
- We kept this experiment secret while gathering the data (to prevent someone from sharing the test URL on social media or fetching it as Googlebot to mess with our results). Of course, we couldn’t control crawlers, scrapers, and organic traffic hitting the website after it got indexed by Google.
After getting feedback on this experiment from John Mueller and seeing different results across different browsers/devices, we won't be continuing to look at cache data when proceeding with this experiment. It doesn't reflect Googlebot's crawling or indexing abilities.
After collecting all the data, we created a simple methodology to analyze all the findings pouring in.
- Fetch and render via Google Search Console - does it render properly?
- Is URL indexed by Google?
- Is URL’s content visible in Google’s cache?
- Are links displayed properly in Google’s cache?
- Search for unique content from the framework’s page.
- Check if the “*framework*/test/” URL was crawled.
Let’s go through this checklist by looking at the Angular 2 framework. If you want to follow the same steps, check out the framework’s URL here.
2. Is the framework’s URL indexed by Google?
The URL is properly indexed by Google, so this is obviously a SUCCESS!
3. Is URL’s content visible in Google’s cache?
4. Are links displayed properly in Google’s cache?
5. Search for unique content from the framework’s page
6. Check if the “*framework*/test/” URL was crawled
To track Googlebot’s crawling we used Loggly, and to double-check the data we manually went through the logs.
Here are the results for Angular 2.
Let’s start with basic configurations for all frameworks used for this experiment.
This is not the end of the experiment, though. The most exciting part of the results is still ahead of us.
Experiment Results - jQuery - Internal vs. External vs. Ajax call
Experiment Results - React - Inline vs. External
Again, not much to add here. I think you’re starting to see the interesting pattern this experiment exposed: inline code is fully crawlable and indexable, while external code somehow blocks Googlebot from visiting the /test/ URL.
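To make the distinction concrete, here is a simplified illustration of the two setups (not the experiment’s exact markup, which is in the GitHub repo):

```html
<!-- Inline: the script body ships inside the HTML document itself. -->
<script>
  document.getElementById('content').innerHTML =
    '<a href="/jquery/test/">test page</a>';
</script>

<!-- External: the same code moved to a separate file. Googlebot now has to
     successfully fetch and execute app.js before the link exists at all. -->
<script src="app.js"></script>
```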
Experiment Results - Angular JS 1 and 2 - Inline vs. External vs. Bundled
In the SEO community, we are used to Google making things complicated, so I won’t elaborate on this topic. Suffice it to say that Google’s framework was the most complicated and difficult to diagnose. Fortunately, it also delivered the most exciting results.
We can clearly see that neither of the Angular frameworks is SEO-friendly “out of the box”. Now, this is interesting: they weren’t designed to be SEO-friendly without server-side rendering.
Googlers know and admit that.
The problem with client rendered Angular websites comes from the lack of expertise of some Angular JS developers. Let me quote a few very smart Angular JS guys, who also happen to be responsible for creating and developing this framework.
During my research I found a short YouTube video that explains it all.
If you search for any competitive keyword terms, it’s always going to be server rendered sites. And the reason is because although Google does index client-side rendered HTML, it’s not perfect yet and other search engines don’t do it as well. So if you care about SEO, you still need to have server-rendered content.
Angular U conference, June 22-25, 2015, Hyatt Regency, San Francisco Airport
“Angular 2 Server Rendering”
Jeff Whelpley worked with Tobias Bosch (a Google engineer and part of the core Angular team). You can find profiles of both Jeff and Tobias here: https://angular.io/about/.
I think the quote and video above explain it all. If you work with an Angular JS website, I highly recommend watching the whole thing and, of course, sending it over to your client’s developers.
The takeaway here is really hard to argue with and gives us (SEOs) a powerful argument against client rendered Angular websites. I don’t know about you guys, but I had a lot of my clients considering such solutions.
If you are making an Angular website, it has to be server rendered.
Not doing so is simply poor development. It is only OK to use client-rendered Angular for content that isn’t publicly accessible (not accessible to Googlebot), for example your website’s CMS panel.
Experiment Results - Inline, External, or Bundled?
Technical things aside, this experiment gave us a little extra info we didn’t expect, info that sheds some light on how Google’s crawling and indexing work.
Google Cache vs. Google’s Index?
Googlers have mentioned several times that Google cache works a bit differently than Google’s index. Still, I find it quite interesting to see that content can be cached, but NOT indexed.
This really puts a huge question mark on even looking at Google’s cache while diagnosing potential technical SEO issues, and definitely confirms Google’s stand on Google cache being a separate entity from Google’s Index.
Methodology Behind the Experiment
1. The goal of the experiment was to achieve 100% transparency and accuracy in the results. To make sure this was the case, we focused on multiple metrics.
2. The experiment was set up on a separate, brand-new domain with no external links, no history, etc.
Before deploying the website live, we configured:
- Loggly (to access server logs more easily),
- server log storage (Loggly stores server logs for a limited period of time),
- NewRelic - we used it to make sure there were no anomalies, downtimes, etc. that could affect crawling and indexing,
- OnPage.org - we normally use OnPage.org for technical SEO, but in this case we used it to track uptime,
- Statuscake.com - also for uptime monitoring; undocumented downtime could affect our crawling data and we wanted to make sure that was not the case,
- Google Search Console - for fetching URLs as Googlebot,
- Google Analytics.
3. We made sure the experiment was kept secret while we were gathering data. Making it public would open up the option to tamper with the log and crawling data (e.g. external links to /test/ URLs, Tweets, etc.).
4. We checked if frameworks’ URLs were indexed properly to make sure that Googlebot had a chance to crawl and index the content on those URLs.
5. We added one page that was 100% HTML generated to have a “control group” URL where we could check if our methodology and tests add up.
To be even more transparent, we published the code used for our experiment on GitHub.
GitHub - the experiment’s documentation
This experiment only makes sense if we are fully transparent about the code and methodology we used. You can find the full GitHub repository with all the code used to build our experiment here: https://github.com/kamilgrymuza/jsseo.
The Experiment Continues
I am really excited about what we’ve managed to achieve with this simple experiment, but I know it is just the start.
We are all aware that Google isn’t making this process any easier, and experiments like the one presented here can save hundreds of thousands of dollars otherwise spent on website development that delivers poor SEO results.
Feel free to contact me or Kamil with your questions. If you are a developer, it would be awesome if you could contribute your JS framework/configuration to our GitHub repository. Just drop me a line and we’ll get it done so you can be sure your code is SEO-friendly.