A 404 error code is returned when a web server receives a request for a web page and is unable to locate it. To back up slightly: based on requests, a server returns a variety of standard response codes, including one when a page is found, and sends these across the internet (usually to be viewed in a web browser). As such, all kinds of response codes are sent – we just don’t usually see them as web users.
What is a Soft 404 Error Response Code?
A 404 error response from a server indicates that the page wasn’t found on the server. This is like a red light at a set of traffic lights – it’s either illuminated or it isn’t. The server is simply saying that the page wasn’t there when the server went to look for it.
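These response codes are just three-digit numbers with standardized meanings, and Python’s standard library enumerates them. A minimal sketch (nothing here contacts a real server):

```python
from http import HTTPStatus

# Standard HTTP response codes and their official reason phrases.
# 404 means "Not Found": the server looked for the page and it wasn't there.
print(HTTPStatus.NOT_FOUND.value, HTTPStatus.NOT_FOUND.phrase)  # 404 Not Found
print(HTTPStatus.OK.value, HTTPStatus.OK.phrase)                # 200 OK
print(HTTPStatus.MOVED_PERMANENTLY.value)                       # 301
```

Every request a browser or search engine bot makes comes back with one of these codes attached, whether or not a human ever sees it.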
A Soft 404 error is a little different, and the distinction is an important one. With a Soft 404 error, the server is communicating that while the page with that filename in the folders/sub-folders indicated by the URL wasn’t present, something else happened.
The non-existent page didn’t cause the server to serve a 404 “page not found” error page to the web user. At least, not right away. Instead, the request may have been redirected to another page. Maybe this secondary page was missing too, or it provided additional information to the web user.
In the above scenario, the actual page was not present, but something else happened in response. The page request didn’t completely fail, so the search engines still see the page as present.
Web Servers Aren’t Automatically Intelligent
It is possible that, following a redesign, mistakes were made. A folder may have been renamed, a capital letter introduced into the folder name, or a dash (-) added, and the server wasn’t able to overcome this discrepancy.
A human could look at the server, realize the page is missing, and spot the mistake. We’d then either change the URL we’re looking for by amending it inside the web browser, or we’d fix the mistake by renaming the folder on the server to correct it.
However, web servers aren’t that clever. There’s no AI in the background realizing the mistake. So, the server error response code is generated. The server doesn’t improvise; it follows instructions.
How Does a Soft 404 Error Response Differ from a 404 Error Response?
A 404 error code is where the server was unable to find the page where it was supposed to be, and nothing was set up on the server side to work around such issues. The request fell over, generating an HTTP 404 status code, and then a standard 404 error page was displayed to let the web user know.
With a Soft 404 error, the server did not find the page where it was supposed to be. Yet, the activity didn’t stop there because the server had other instructions. Also, and important for SEO purposes, a Soft 404 error tells the search engines that there’s a page there, kind of…
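The distinction can be sketched in code. This is a hypothetical heuristic of the kind a crawler might apply – not Google’s actual logic – and the phrase list and function name are made up for illustration:

```python
# Hedged sketch: flag a suspected soft 404 when the server answers
# 200 OK but the page body reads like an error page.
# SOFT_404_PHRASES is an illustrative list, not a real crawler's rules.
SOFT_404_PHRASES = ("page not found", "article was removed", "no longer available")

def looks_like_soft_404(status_code: int, body: str) -> bool:
    if status_code != 200:
        return False  # a real 404 is an honest error, not a "soft" one
    text = body.lower()
    return any(phrase in text for phrase in SOFT_404_PHRASES)

print(looks_like_soft_404(200, "<h1>Sorry, page not found</h1>"))  # True
print(looks_like_soft_404(404, "<h1>Not Found</h1>"))              # False
```

The key point the sketch captures: the status code says “success” while the content says “failure”, and that mismatch is what makes a Soft 404 a soft one.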
For example, there might be a plan to display a different page whenever a soft 404 error is generated. The secondary page might tell the user that the article was removed, which is useful information to them. They can stop searching for it.
Why Do Soft 404 Errors Usually Occur?
The larger the site, the more pages it has. When redesigning a website, it’s common that the website structure will change to some degree.
Also, if moving platforms, e.g., from a Wix site to a WordPress one, the structure and page naming conventions for individual web pages change too. Because of this, pages sometimes get lost in all the fuss, renamed, or deleted. Other times, an article is simply removed on purpose.
Crawl Budget, and Why Soft 404 Errors Are Bad for SEO
Without a standard 404 error code being generated, the website is telling the search engines that the page is present. The fact that another page is being shown to the visitor is beside the point. But it certainly muddies the waters.
The crawl budget is the number of pages that Googlebot and other search engine bots will look at on a site. It isn’t a published figure, but it exists. Google doesn’t index the complete web or all pages on a site, and the newer the site, the less chance of it being fully indexed. Therefore, the crawl budget needs to be protected, to ensure the most important and/or the latest content is indexable.
Sadly, with a Soft 404 error, the eventual page that does get shown is the one that’ll be indexed. Each one wastes a slot in the crawl budget that a valuable page could have used instead. And that’s potential web traffic that’s been lost.
Redirect Pages with Changed URLs
A frequent mistake in either site redesigns or web content management system switchovers is messing up the web structure. Because of this, pages go missing.
Whoever makes such changes needs to ensure that a 301 permanent redirection instruction is added to the server. This will redirect any requests for a page at an old location to the page at its new location. If the page file has been renamed but its folder location is the same, then a 301 redirection can also be used (this may be needed when a published article’s filename had a typo and was later fixed).
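On an Apache server, both cases can be handled with `Redirect` directives in the site’s `.htaccess` file. The paths below are placeholders for illustration, not real URLs:

```apache
# Illustrative .htaccess rules (Apache). All paths are placeholders.

# Page moved to a new folder after a redesign:
Redirect 301 /old-folder/article.html /new-folder/article.html

# Same folder, filename corrected (e.g. a typo fixed after publishing):
Redirect 301 /blog/artcle-name.html /blog/article-name.html
```

The `301` tells browsers and search engine bots alike that the move is permanent, so they update their records rather than treating the old URL as a live page.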
By using the correct type of server redirection instruction, the server knows what action to take, and web browsers and search engine robots receive the correct information. As a result, they don’t use up the crawl budget or interfere with the indexing of a valuable page.
How to Redirect Pages to Avoid a Soft 404 Error
Over 40 percent of websites today use WordPress as their CMS.
To set up redirection in WordPress, there are redirection plugins that will perform this function for you. Redirects can be inserted manually; however, many redirection plugins will pick up on a filename that’s just been changed and will automatically create a new 301 redirection to handle it properly.
It’s also possible to modify the .htaccess file at the server level. However, this is more cumbersome, and mistakes made here could render your site unloadable. Therefore, when using a CMS, it’s better to find a plugin or other solution to set up redirection within it.
While Soft 404 errors may seem like a minor issue, the crawlability of your site is at stake. To extract as much value from your site as possible, it needs to be as crawler-friendly as you can make it.