About the author
Rosslyn Elliott
Last week, 404 Media broke the news that Reddit has blocked most major search engines from indexing its recent content.
The single exception is Google, as a result of the company’s agreement earlier this year to pay for Reddit’s content.
Though Google’s payment to Reddit may seem like a logical reason for its continuing access, the privilege gives Google yet another major advantage over its competitors.
For a company already facing lawsuits over its monopolistic status, the outcome may invite further government crackdowns in the long run.
How Did Reddit Block Search Engines?
Reddit recently updated its robots.txt file, a plain-text file that uses a standard web protocol to tell search engine crawlers which parts of a website they may crawl and index. This change prevents web crawlers from accessing Reddit’s latest posts and comments, affecting a wide range of popular search engines.
Google appears to be using an authorized manual override to avoid the block.
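robots.txt is advisory. It asks crawlers to stay out, and well-behaved crawlers check it before fetching a page. The sketch below uses Python's standard-library `urllib.robotparser` to show how a crawler reads a blanket block.

Several parts of the example are assumptions for illustration:
- Both robots.txt texts are hypothetical, not copies of Reddit's live file.
- The thread URL is made up.
- The second file shows one way a site *could* exempt a single named crawler. The article describes Google's access as a manual override, so this is not necessarily how that access works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one catch-all group that shuts out every crawler.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical variant: a named crawler gets its own group that allows everything,
# while every other crawler falls through to the catch-all block.
BLOCK_ALL_BUT_ONE = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    thread = "https://www.reddit.com/r/example/comments/abc123/"  # hypothetical URL
    for agent in ("Googlebot", "Bingbot", "DuckDuckBot"):
        print(agent,
              crawl_allowed(BLOCK_ALL, agent, thread),
              crawl_allowed(BLOCK_ALL_BUT_ONE, agent, thread))
```

Under the first file, every crawler is refused. Under the second, only the crawler named in its own group is let through.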
Search Engines That Can’t Access New Reddit Content
The following search engines have been affected by Reddit’s new policy:
Bing
DuckDuckGo
Mojeek
Qwant
Baidu
Yandex
The search engine Kagi still has access to new Reddit data because of its existing agreement to buy search results from Google.
Google’s Deal to Pay for Reddit Content Preserves Search Access
Google’s continued access to Reddit’s content stems from a $60 million deal struck earlier this year. This agreement allows Google to use Reddit’s data for AI training purposes, setting it apart from other search engines. Reddit had already pressed the issue of compensation for its content in 2023, when it began charging for access to its API.
While users and other search engine companies were quick to claim that the lock-out occurred because of the Google deal, Reddit denies that connection. “We block all crawlers that are unwilling to commit to not using crawl data for AI training, which is in line with enforcing our Public Content Policy and updated robots.txt file,” a company spokesperson said to Engadget.
How to Tell If Your Search Engine is Blocked
Users can easily check whether their preferred search engine is affected by searching for “site:reddit.com” and then filtering by date range or sorting by the most recent results.
For blocked search engines, users will notice:
· No results from the past week
· Empty search result pages
· Outdated content (several years old)
· Messages stating the site won’t allow descriptions
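To run this check across several engines at once, a small helper can build one “site:” query URL per engine. The results can then be sorted or date-filtered by hand.

This is a sketch under stated assumptions:
- The endpoint URLs are assumptions based on each engine's public results page, not taken from the article, and may change.
- Date-filter parameters differ from engine to engine, so none are added.
- The helper name `site_check_urls` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed search endpoints (public results pages); formats may change over time.
SEARCH_ENDPOINTS = {
    "Google": "https://www.google.com/search",
    "Bing": "https://www.bing.com/search",
    "DuckDuckGo": "https://duckduckgo.com/",
    "Mojeek": "https://www.mojeek.com/search",
}


def site_check_urls(site: str = "reddit.com", terms: str = "") -> dict[str, str]:
    """Build one 'site:' query URL per engine for a manual freshness check."""
    query = f"site:{site} {terms}".strip()
    # urlencode percent-encodes the colon and turns spaces into '+'
    return {name: f"{base}?{urlencode({'q': query})}"
            for name, base in SEARCH_ENDPOINTS.items()}


if __name__ == "__main__":
    for engine, url in site_check_urls(terms="router").items():
        # Open each URL, then sort by recent results by hand.
        print(f"{engine}: {url}")
```

If an engine's newest results for the same query are weeks or years old while Google's are from today, that engine is likely affected by the block.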
Impact on Users and Search Engines
For many internet users, adding “Reddit” to search queries has become a common way to find human-generated answers on topics ranging from tech support to personal advice. With AI-generated content flooding the web, Reddit provides a valuable source of human input.
With this change, users who are looking for recent Reddit content will be limited to Google or search engines that pull from Google’s index.
Understanding Web Scraping and the AI Controversy
What is Web Scraping?
Web scraping is the automated process of extracting data from websites. Reddit has a no-scraping policy that forbids companies from scraping its data without compensation.
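A scraper has two parts: one that fetches pages and one that pulls data out of them. A polite scraper also checks robots.txt before fetching.

This minimal sketch shows only the extraction step. It uses Python's standard-library `html.parser` on a hypothetical page inlined as a string, so it runs without network access. The class name and sample markup are illustrative assumptions, not any company's actual scraper.

```python
from html.parser import HTMLParser


class LinkAndTitleScraper(HTMLParser):
    """Collect the page title and every link target from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.links: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            # attrs arrives as a list of (name, value) pairs
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is kept; other text is ignored.
        if self._in_title:
            self.title += data


# Hypothetical page, inlined so the example needs no network access.
SAMPLE_HTML = """
<html><head><title>Example thread</title></head>
<body>
  <a href="/r/example/comments/1/">First post</a>
  <a href="/r/example/comments/2/">Second post</a>
</body></html>
"""

if __name__ == "__main__":
    scraper = LinkAndTitleScraper()
    scraper.feed(SAMPLE_HTML)
    print(scraper.title)  # Example thread
    print(scraper.links)
```

Run across thousands of fetched pages, the same pattern yields a dataset. That scale is what separates scraping from ordinary browsing.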
AI companies are just one type of organization that tries to scrape the web. Others include:
1. Search engines to index content
2. Researchers gathering data
3. Businesses monitoring competitors
AI companies scrape data to train models that generate new material based on that data. This use has created unprecedented controversy.
Aggressive scraping by AI companies has also worried individuals, as more people want to keep their data out of AI training. Recent major data breaches have added to general anxiety about personal information exposed online.
The AI Data Controversy
The use of online data for AI training has become a source of public debate and lawsuits due to several issues:
1. Copyright concerns: Many content creators argue that using their work to train AI models without permission or compensation infringes on their intellectual property rights.
2. Privacy issues: Privacy advocates have concerns about personal information being included in training data without consent.
3. Bias and representation: The data used to train AI can perpetuate or amplify existing biases in online content.
4. Economic impact: As AI models become more sophisticated, there are fears they could replace human workers, especially in customer service, retail, bookkeeping, and banking.
Reddit’s Statement on Scraping and Compensation for Content
Reddit spokesperson Tim Rathschmidt told The Verge:
“We have been in discussions with multiple search engines. We have been unable to reach agreements with all of them, since some are unable or unwilling to make enforceable promises regarding their use of Reddit content, including their use for AI.”
This statement suggests that Reddit’s primary concern is not just about compensation, but also about controlling how its content is used, particularly in AI applications.
Search Block is the Latest Volley in a Larger Battle
Content Monetization and AI Training
Reddit’s move aligns with a growing trend of content creators and platforms seeking compensation for the use of their data in AI training.
Many web publishers feel that their survival depends on not allowing AI to take their content without payment.
Brent Csutoras, founder of Search Engine Journal, commented on the battle between AI companies and content platforms. “Publications, artists, and entertainers have been suing OpenAI and other AI companies, blocking AI companies, and fighting to avoid using public content for AI training,” Csutoras said in a LinkedIn post.
Search Engine Choice at Risk
If Google is the only major search engine able to index recent Reddit content, other search engines will find it even harder to compete with Google.
Google’s market dominance has long threatened user choice. That an entire industry, SEO marketing, revolves around Google’s algorithms shows how little real competition exists in the search engine market.
Implications for the Open Web
Reddit’s decision raises questions about the future of the open web. As more platforms restrict access to their content, it could lead to a more fragmented internet where information is siloed within specific ecosystems.
Questions Raised for Regulation and Search in the Future
As the situation unfolds, several questions remain:
1. Will other search engines eventually strike deals similar to Google’s?
2. How will this standoff affect Reddit’s stock price and overall valuation?
3. Could this lead to more anti-trust lawsuits over Google’s growing influence?
4. Will other major websites follow Reddit’s lead in restricting access?
The Future of Web Crawling and Content Access
Reddit’s actions may set a precedent for other major websites and platforms. As the value of data continues to rise, we may see more content providers implementing similar restrictions on web crawlers and AI training data access.
Potential Outcomes
Legal and regulatory changes: Governments might step in to regulate the use of online data for AI training.
New business models: We might see the emergence of new monetization strategies for online content.
Technological adaptations: Search engines and AI companies may develop new ways to access and use online data ethically.
User behavior shifts: Internet users may change how they search for and consume online content.
FAQ: AI and Why Reddit Blocked Major Search Engines
What is web scraping?
Web scraping is the automated process of extracting data from websites. It’s used by search engines, researchers, businesses, and AI companies to gather online information.
How does AI potentially violate intellectual property laws?
AI companies may use copyrighted content to train their models without permission or compensation. This can infringe on creators’ intellectual property rights.
Which major search engines are blocked from accessing new Reddit content?
Bing, DuckDuckGo, Mojeek, Qwant, Baidu, and Yandex are blocked from accessing new Reddit content. Google is the main exception due to a payment agreement.
What is a robots.txt file?
A robots.txt file is a plain-text file that uses a standard web protocol to tell search engine crawlers which parts of a website they may crawl and index. Reddit updated its file to block most search engines from recent content.
How much did Google agree to pay Reddit for content access?
Google agreed to pay Reddit $60 million for access to its content. This deal allows Google to use Reddit’s data for AI training purposes.