I’m getting a lot of traffic from the MJ12bot, and not only that, it is being a bad bot by trying to visit pages that my robots.txt file politely asks it not to index. As a result, my bad bot checker keeps blocking it, but it uses many different addresses. Here is a small list of 27 of them:
51.38.181.206 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
54.38.85.17 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
65.108.110.26 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
65.108.2.171 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
65.108.203.133 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
65.108.227.178 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
65.108.46.72 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
65.108.78.33 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
67.144.2.241 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
135.181.180.59 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
135.181.212.177 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
135.181.74.243 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
135.181.79.106 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
146.19.215.244 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
149.202.65.189 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
149.202.86.86 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
158.220.119.234 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
162.244.27.137 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
178.150.14.250 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
178.25.244.194 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
192.99.36.61 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
192.99.37.133 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
193.70.81.103 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
193.70.81.106 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
213.199.34.199 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
217.182.134.134 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
217.182.175.187 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
These bots are the majority of my visitors and they keep morphing (changing IP addresses and User Agent strings) to avoid being blocked.
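Since the addresses in the list above cluster in a handful of ranges, it can help to group them before deciding what to block. The following is a minimal Python sketch, not part of my actual bad bot checker: it assumes log lines in the same "IP - protocol - user agent" layout shown above, and the /16 prefix length is an arbitrary grouping choice for illustration, not a real allocation boundary. The `group_by_network` function name is hypothetical.

```python
import ipaddress
import re
from collections import defaultdict

# A few of the entries listed above, in the same layout.
LOG_LINES = """\
65.108.110.26 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
65.108.2.171 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
135.181.180.59 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
135.181.74.243 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
192.99.36.61 - HTTP/1.1 - Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
"""

# Dotted-quad IPv4 address at the start of a line.
IP_AT_LINE_START = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\b")


def group_by_network(lines, prefix_len=16):
    """Group the leading IPv4 address of each line by its enclosing network."""
    groups = defaultdict(list)
    for line in lines:
        match = IP_AT_LINE_START.match(line.strip())
        if not match:
            continue  # skip lines that do not start with an address
        try:
            address = ipaddress.ip_address(match.group(1))
        except ValueError:
            continue  # skip malformed addresses such as 999.1.1.1
        # strict=False lets us derive the network from a host address.
        network = ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)
        groups[str(network)].append(str(address))
    return dict(groups)


groups = group_by_network(LOG_LINES.splitlines())
for network, addresses in sorted(groups.items()):
    print(f"{network}: {len(addresses)} address(es)")
```

Grouping like this only shows where the traffic comes from. Whether blocking an entire range is safe depends on who else shares it.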
What is MJ12bot?
MJ12bot is a web crawler developed by Majestic, a UK-based company specializing in SEO and link intelligence data. Its primary role is to gather information about websites to enhance Majestic’s backlink analysis tools. This bot is known for its extensive crawling capabilities, reportedly visiting over 8 billion websites daily to maintain an up-to-date database of link relationships across the internet[2][3][4].
Majestic and the MJ-12 (Majestic 12) documents are not related in any formal or organizational capacity.
Overview of Majestic 12
– MJ-12 Background: Majestic 12, also known as Majic-12, is a purported secret committee allegedly formed by the U.S. government in 1947 to investigate UFOs and extraterrestrial phenomena. The concept emerged from a series of documents that surfaced in the 1980s, which many experts and organizations, including the FBI, have declared to be forgeries or hoaxes[9][10].
– Claims of Existence: The documents claim that this group was established by President Harry S. Truman following incidents such as the Roswell crash. However, extensive investigations have found no credible evidence supporting the existence of such a committee[9][12].
Overview of Majestic (the Company)
– Majestic Company: In contrast, Majestic is a legitimate company known for its SEO tools and link intelligence data, specifically through its web crawler MJ12bot. This bot is focused on collecting data to aid in search engine optimization and does not have any connection to UFO investigations or conspiracy theories[9].
While both share the name “Majestic” or “MJ-12,” they operate in completely different domains—one in the realm of conspiracy theories related to UFOs and the other in digital marketing and SEO analytics. The similarities in nomenclature are coincidental and do not imply any relationship between the two.
Key Features of MJ12bot
– SEO Focus: MJ12bot is designed specifically for SEO purposes, helping businesses analyze their backlink profiles and understand the interconnectedness of web pages through internal and external links[3][4].
– High Activity: It is recognized as one of the most active SEO bots, ranking just behind Google and Bing in terms of crawling frequency[3].
– Data Collection: The bot collects data that allows Majestic to provide insights into website trustworthiness and influence based on backlink metrics[4][5].
Interaction with Websites
– Crawling Behavior: MJ12bot does not follow a fixed visitation schedule; its crawling frequency can vary based on factors like keyword rankings and the number of backlinks pointing to a site[2][6].
– Robots.txt Compliance: Website owners can manage MJ12bot’s access via the `robots.txt` file, which can either allow or restrict its crawling activities. However, compliance with these directives can vary among bots[5][6].
– Considerations for Blocking: While some website owners may consider blocking MJ12bot due to high traffic from it, doing so could be counterproductive if they benefit from SEO services that rely on such crawlers[2][5].
In summary, MJ12bot plays a significant role in the landscape of SEO by providing essential data for website analysis, although its high activity level can sometimes raise concerns among webmasters regarding server resource usage.
How does Majestic Make Money Crawling My Site?
Majestic, known for its SEO tools and link intelligence data through its web crawler MJ12bot, generates revenue primarily through the following methods:
1. Subscription Services
Majestic offers various subscription plans that provide access to its extensive link intelligence database. These plans cater to different user needs, from small businesses to large enterprises, allowing users to analyze backlinks, track competitors, and assess their own website’s performance. The subscription model is a significant source of recurring revenue for the company[14][19].
2. Data Sales
The data collected by MJ12bot is not only used internally but is also sold to third parties. This includes providing insights and analytics to digital marketers, SEO professionals, and businesses looking for detailed information about their website’s link profile and overall online presence. Such data is crucial for businesses aiming to enhance their search engine optimization strategies[18][24].
3. API Access
Application Programming Interface: Majestic provides API access to its data, allowing developers and businesses to integrate link intelligence features into their applications. This service is typically charged based on usage or through a subscription model, further diversifying the company’s revenue streams[19][24].
4. Affiliate Programs
Majestic may also engage in affiliate marketing programs where it partners with other companies to promote its services. This can include offering commissions for referrals or providing bundled services with complementary products[19].
Author Explorer Summary
The Majestic web site says: “Author Explorer is a breakdown of what Majestic’s Link Intelligence data can tell you about a notable handle, or content author”. Sure, makes sense. So I spend a lot of time on this web site to avoid obnoxious tracking on Big Social Media and this is what I get. Good to know.
How Can I Block the MJ12bot?
To block the MJ12bot on an Apache server, you can use the `.htaccess` file or configure the block directly in the main Apache configuration. Here are both methods:
Using .htaccess
1. Open your `.htaccess` file, which is typically located in the root directory of your website.
2. Add the following code to block MJ12bot:
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC]
    RewriteRule .* - [F,L]
</IfModule>
This code checks if the user agent matches “MJ12bot” and returns a 403 Forbidden response if it does, effectively blocking the bot from accessing your site[26].
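To see which user agent strings a rule like this would catch before deploying it, the match can be approximated outside Apache. The sketch below is only an illustration of the matching logic, not how Apache evaluates the rule internally: it applies the same pattern as an unanchored, case-insensitive regular expression (the effect of the `[NC]` flag). The `status_for` helper is hypothetical, and 200 simply stands in for "the request is passed through".

```python
import re

# Same pattern as the RewriteCond; re.IGNORECASE plays the role of [NC].
BLOCKED_AGENT = re.compile(r"MJ12bot", re.IGNORECASE)


def status_for(user_agent):
    """Return 403 if the rule would block this user agent, otherwise 200."""
    return 403 if BLOCKED_AGENT.search(user_agent or "") else 200


samples = [
    "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)",
    "mozilla/5.0 (compatible; mj12bot/v1.4.8; http://mj12bot.com/)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0",
]
for ua in samples:
    print(status_for(ua), ua)
```

Because the pattern is unanchored, any user agent containing "MJ12bot" anywhere in the string is blocked, including a spoofed one. A bot that changes its user agent string is not affected by this rule at all.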
Using Apache Configuration
If you have access to the main Apache configuration file (httpd.conf or apache2.conf), you can block MJ12bot by adding the following directives:
<Directory "/path/to/your/directory">
    SetEnvIfNoCase User-Agent "MJ12bot" bad_bots
    <RequireAll>
        Require all granted
        Require not env bad_bots
    </RequireAll>
</Directory>
Replace `"/path/to/your/directory"` with the actual path to your website. This configuration will deny access to any requests from MJ12bot while allowing all other traffic[27]. Note that the `Require` and `<RequireAll>` directives assume Apache 2.4 or later.
Additional Notes
– Ensure that `mod_rewrite` is enabled on your server for the `.htaccess` method to work.
– It’s also advisable to verify that you are blocking the legitimate MJ12bot and not a spoofed version, as there have been instances of fake bots mimicking its user agent[28].
– You can also manage bot behavior using `robots.txt` by adding:
```
User-agent: MJ12bot
Disallow: /
```
This instructs compliant bots not to crawl your site, although it may not prevent all access from non-compliant bots[29].
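To double-check what the robots.txt rule above asks of a crawler, Python's standard `urllib.robotparser` module can evaluate it. This is a small sketch under a few assumptions: the `example.com` URL is a placeholder, `SomeOtherBot` is a made-up crawler name, and the result only shows what a compliant bot should conclude, not what MJ12bot actually does.

```python
from urllib.robotparser import RobotFileParser

# The same two-line rule shown above.
ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Pass the bot's product token, not the full user agent string.
mj12_allowed = parser.can_fetch("MJ12bot", "https://example.com/any-page/")
other_allowed = parser.can_fetch("SomeOtherBot", "https://example.com/any-page/")

print("MJ12bot allowed:", mj12_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

If a bot keeps requesting pages that this check reports as disallowed, the robots.txt file is not the problem, and a server-side block like the ones above is the next step.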
Yeah, I’m not sure about it being a fake MJ12bot, because the Majestic site says for newsi8.com that it has over 3000 pages crawled. At this time I only have about 1600 published, except for fake pages set up to trap bad bots in a black hole. So, the “it wasn’t us, really!” excuse doesn’t seem valid in my case. Unless there are AMP pages for each of my pages that I don’t know about, or something else I’m overlooking.
Three Other Bots Blocked
Today I also noticed three bots were hitting my site particularly hard. Amazonbot has gone crazy, visiting 122,286 times. Well, it could be an imposter.
- Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot) Chrome/119.0.6045.214 Safari/537.36 (Visits: 122,286)
- Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot) (Visits: 8,963)
- Mozilla/5.0 (compatible; DotBot/1.2; +https://opensiteexplorer.org/dotbot; help@moz.com) (Visits: 5,983)
Since I don’t have any humans leaving comments these days, it does not seem like allowing these bots does anything useful. If they were indexing the site to provide to human users, that would be fine.
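Visit counts like the ones above can be tallied from raw user agent strings with a few lines of Python. This is a rough sketch, not the tool that produced my numbers: it assumes you already have one user agent string per request, the `KNOWN_BOTS` tuple and `bot_name` helper are hypothetical, and the sample list is made up for illustration.

```python
from collections import Counter

# Product tokens of the bots discussed in this post.
KNOWN_BOTS = ("MJ12bot", "Amazonbot", "PetalBot", "DotBot")


def bot_name(user_agent):
    """Return the first known bot token found in the user agent, else 'other'."""
    lowered = user_agent.lower()
    for name in KNOWN_BOTS:
        if name.lower() in lowered:
            return name
    return "other"


# Made-up sample: one user agent string per request.
requests = [
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot) Chrome/119.0.6045.214 Safari/537.36",
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot) Chrome/119.0.6045.214 Safari/537.36",
    "Mozilla/5.0 (compatible; DotBot/1.2; +https://opensiteexplorer.org/dotbot; help@moz.com)",
    "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0",
]

counts = Counter(bot_name(ua) for ua in requests)
for name, visits in counts.most_common():
    print(f"{name}: {visits}")
```

A tally like this counts whatever the user agent claims to be, so an imposter using the Amazonbot string is counted as Amazonbot.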
Read More
[1] https://www.mj12bot.com
[2] https://darkvisitors.com/agents/mj12bot
[3] https://originality.ai/seo-bot
[4] https://www.ushiblo.com/reject-mj12bot/
[5] https://webmasters.stackexchange.com/questions/129316/trying-to-determine-if-bot-crawling-my-site-is-malicious-mj12bot
[6] https://fortiguard.fortinet.com/appcontrol/44782
[7] https://webmasters.stackexchange.com/questions/93672/do-i-really-have-to-block-mj12bot-as-the-prevailing-visitor-on-my-site
[8] https://datadome.co/bots/0tpxvmoq/
[9] https://en.wikipedia.org/wiki/Majestic_12
[10] https://science.howstuffworks.com/space/aliens-ufos/majestic-12.htm
[11] https://www.youtube.com/watch?v=KzL4_vydsg0
[12] https://www.archives.gov/research/military/air-force/ufos
[13] https://vault.fbi.gov/Majestic%2012/Majestic%2012%20Part%2001%20(Final)/view
[14] https://majesticcruiseline.com/business-model
[15] https://www.mj12bot.com
[16] https://darkvisitors.com/agents/mj12bot
[17] https://corporate.majestic.co.uk/at-a-glance
[18] https://originality.ai/seo-bot
[19] https://majestic.com/company/about
[20] https://www.research-tree.com/newsfeed/article/majestic-corporation-annual-financial-report-and-notice-of-agm-2439298
[21] https://www.webmasterworld.com/search_engine_spiders/3937580.htm
[22] https://webmasters.stackexchange.com/questions/129316/trying-to-determine-if-bot-crawling-my-site-is-malicious-mj12bot
[23] https://www.the-buyer.net/insight/how-majestic-wine-is-transforming-its-business
[24] https://datadome.co/bot-management-protection/crawlers-list/
[25] https://internetretailing.net/magazine-article/majestic-wine-retail-strategy/
[26] https://cleantalk.org/help/blocking-bots-by-ua
[27] https://www.milesweb.in/hosting-faqs/block-bots-apache-whm-cpanel/
[28] https://www.webmasterworld.com/apache/3603824.htm
[29] https://github.com/fail2ban/fail2ban/issues/712
[30] https://www.apachelounge.com/viewtopic.php?t=7425