Why Hulu.com? Their website is unique and especially interesting from an SEO perspective, because it is often the only site that can satisfy Hulu-related Google queries. It is in Google’s interest to rank and show Hulu.com. Let me show you an example.
Hulu.com is the only place where you can watch many popular US TV shows like “Casual”, “The Awesomes” and many more. You can find a full list of Hulu.com exclusive shows here. This gives Hulu the upper hand in SEO but, unfortunately, this enormous SEO potential is not fulfilled.
Hulu SEO problems
“The Awesomes” is a Hulu original series.
Let’s search for the title of the Hulu landing page we see above - “Watch The Awesomes Online at Hulu”.
Here is what the http://www.hulu.com/the-awesomes landing page looks like:
But when you look “under the hood” of this page, you won’t see the regular HTML code you would expect.
This is what you see when you click “View Source” (Chrome).
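The same “View Source” check can be approximated without a browser. The sketch below is only an illustration: it assumes you have already saved the raw, unrendered HTML of a page as a string, and the names `phrase_in_raw_html` and `_TextExtractor` are hypothetical helpers, not part of any Hulu or Google tooling. It reports whether a phrase a visitor sees on the page is present in the markup itself, ignoring anything that lives only inside script or style blocks.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def phrase_in_raw_html(raw_html: str, phrase: str) -> bool:
    """Return True if `phrase` appears in the text of the unrendered HTML.

    Whitespace is collapsed and the comparison is case-insensitive.
    """
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split()).lower()
    needle = " ".join(phrase.split()).lower()
    return needle in text
```

If the function returns False for text you can clearly read in the browser, that text is most likely injected by JavaScript after the page loads, which is the situation described here.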
How do you check if Google is crawling and indexing your website properly? This is where it gets complicated. Most people would recommend fetching and rendering your website using Google Search Console. However, there are two issues that need to be addressed to follow this recommendation:
1. I obviously don’t have access to Hulu’s Google Search Console.
2. Fetch and render is an awesome tool, but it is good to double-check the fetch result against Google cache and the Google index. How do you do that?
Here comes the simplest and most obvious solution to check if Google can properly crawl and index Hulu.com. The awesome thing is that you can do this within a few seconds right now and see it for yourself.
1. Open any page on the website and note its URL
2. Find any unique content within the page and copy it
3. Search for the content and see if the URL from step 1 is ranking (unique content should rank #1)
Did Hulu.com pass the test?
Unfortunately, Google never indexed Hulu’s content:
Now if you want to be SUPER sure, you can take it one step further and search for the page’s content only within a specific page.
How do you do this? Go to Google and search for “website’s content” site:hulu.com/casual, replacing “website’s content” with the text you copied from the page.
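To avoid typos when building these queries by hand, here is a minimal sketch that assembles them. The function names `build_google_query` and `google_search_url` are my own, chosen for illustration; the only things taken from the steps above are the exact-phrase quotes and the site: operator. The script does not contact Google, it only produces the query string and a URL you can paste into the browser.

```python
from urllib.parse import quote_plus


def build_google_query(phrase: str, site: str = "") -> str:
    """Wrap the copied content in quotes and optionally restrict it with site:."""
    query = f'"{phrase.strip()}"'
    if site:
        query += f" site:{site.strip()}"
    return query


def google_search_url(phrase: str, site: str = "") -> str:
    """Return a Google search URL for the exact-phrase query."""
    return "https://www.google.com/search?q=" + quote_plus(
        build_google_query(phrase, site)
    )


if __name__ == "__main__":
    # The phrase below is a placeholder; use content copied from the page.
    print(build_google_query("website's content", "hulu.com/casual"))
    print(google_search_url("website's content", "hulu.com/casual"))
```

Leaving `site` empty gives the plain exact-phrase query from the previous step; passing a path such as hulu.com/casual narrows the search to that single page.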
Unfortunately, Google has never seen this content.
We can shorten the query to see if Google indexed at least short pieces of content from Hulu.com.
Again, no luck. We are now 100% sure that Google never indexed Hulu’s content.
You probably think the hulu.com/casual URL isn’t indexed by Google. Let’s check this out.
As you can see, the URL is in Google’s index. This also means that Google had to find SOME content there to index it.
This leads us to another question. What did Google crawl and index?
Fortunately, we can analyze this valuable data in Google Cache.
Unfortunately, as you can see below, there is a bug in Hulu’s code, which is blocking Google Cache from being presented properly.
Yes, you read that right: what you see above is a … 500 error page.
What we see above comes from a bug in Hulu’s code causing Hulu.com to search for the slug from Google’s cache URL.
Let me explain. This is Google’s cache URL:
Due to the bug in the code, Hulu.com takes a piece of that URL, “search?q=cache:1hQQypT6GMoJ:www.hulu.com/casual”, and performs a search for it.
This is how it should look.
Hulu is protecting its content from being displayed on different domains, so the content we see above won’t be displayed in Google’s cache. Here is an example: if I try to load hulu.com from my own domain, the content doesn’t show up and the request returns a 500 error.
This solution protects Hulu from being scraped and prevents its content (movies, shows, etc.) from being launched from other websites. This is definitely a good move, but it should be disabled for Google Cache.
What is interesting, however, is that some of the pages from hulu.com return 404 error codes when looking into cache.
Enough about cache, let’s move on.
Risks for Hulu.com
1. Lack of indexation and risk of losing content authorship
With thousands of thin content pages indexed (in a crawler’s eyes), Hulu is a perfect target for Google Panda, Phantom and other algorithms that target thin/poor content. This is a very likely reason why Hulu.com ranks so badly right now, with a constant downtrend in visibility.
Hulu has two problems as far as crawling by search engines is concerned.
1. Hulu’s internal search algorithm interferes with Google’s crawler, making it impossible for Google to see the same content that visitors see.
which may prevent crawlers from seeing that content.
When it comes to solving those problems, here are two proposed solutions aimed at improving the crawling effectiveness of hulu.com:
1. The search algorithm should be modified in order to strip “cache:www.hulu.com/” from the search criteria string (if present) so that the request performed by Google crawlers can be understood properly and relevant content can be returned.
First and foremost though, we need to make sure that the actual requests performed by Google crawlers are properly understood and return the expected data (what I mean by actual requests are requests performed by the actual crawlers, not only those emulated in the browser by prepending the address with “cache:”).
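The first proposal can be sketched in a few lines. This is not Hulu’s code: `clean_search_term` is a hypothetical helper, and the pattern is an assumption based only on the single cache slug shown earlier (“cache:1hQQypT6GMoJ:www.hulu.com/casual”), so a real fix would need to be checked against the requests Hulu actually receives.

```python
import re

# Assumed shape of the prefix: "cache:", an optional cache token followed by
# a colon, an optional scheme, an optional "www.", then "hulu.com/".
_CACHE_PREFIX = re.compile(
    r"^cache:(?:[A-Za-z0-9_-]+:)?(?:https?://)?(?:www\.)?hulu\.com/?",
    re.IGNORECASE,
)


def clean_search_term(raw_term: str) -> str:
    """Strip a leading Google-cache prefix so only the page slug is searched."""
    return _CACHE_PREFIX.sub("", raw_term.strip())


if __name__ == "__main__":
    print(clean_search_term("cache:1hQQypT6GMoJ:www.hulu.com/casual"))
    print(clean_search_term("cache:www.hulu.com/casual"))
    print(clean_search_term("casual"))
```

Ordinary search terms pass through unchanged, so the cleanup can run on every query before it reaches the internal search.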
What is prerender?
And they change it into nice looking HTML, as can be seen on the screenshot below:
Basically, we do on the server side what every browser does. Thanks to that, we don’t leave rendering to Google. Our website gets crawled and indexed properly, it ranks better and everyone is happy.
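As a rough illustration of the idea, the sketch below shows one common way a prerendering setup decides what to send: requests from known crawlers receive the HTML that was already rendered on the server, and everyone else receives the normal JavaScript page. Everything in it is an assumption made for the example: the crawler list is incomplete, and `is_crawler` and `choose_response` are hypothetical names rather than the API of any particular prerender service.

```python
# Substrings that commonly appear in crawler User-Agent headers.
# Illustrative only; a production list would be longer and kept up to date.
CRAWLER_TOKENS = ("googlebot", "bingbot", "yandex", "baiduspider", "duckduckbot")


def is_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header looks like a search engine crawler."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def choose_response(user_agent: str, prerendered_html: str, app_shell_html: str) -> str:
    """Serve the server-rendered snapshot to crawlers, the JS app to everyone else."""
    return prerendered_html if is_crawler(user_agent) else app_shell_html


if __name__ == "__main__":
    googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    snapshot = "<html><body><h1>Casual</h1></body></html>"
    shell = "<html><body><div id='app'></div></body></html>"
    print(choose_response(googlebot, snapshot, shell))
```

The snapshot has to contain the same content visitors see after the page renders; serving crawlers something different would defeat the purpose.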