Server logs «to go»
I decided to gradually launch a number of sites in a popular niche. For monitoring traffic (bots and users), I find server logs convenient: they leave no footprints on the front end, and there's no need either to install one shared JS snippet or, conversely, to register a separate counter for each site.
The sites are on regular hosting, and I don't have time to figure out how to enable server logs there and export them somehow.
So I made a kind of «takeaway» server log:
- On the client sites, it's just a PHP include of a single script that sends the visit data to the server in a POST request;
- The server receives the data, checks bots for validity, and writes the data to the database.
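The client side described above could look roughly like the sketch below. It is not the author's script: the endpoint URL, the function names, and the payload keys are all assumptions made for illustration, and it assumes the cURL extension is available on the hosting. Short timeouts are used so that a slow collector does not hold up the page.

```php
<?php
// Hypothetical collector endpoint; replace with your own server URL.
const LOG_ENDPOINT = 'https://logs.example.com/collect.php';

/**
 * Collect the visit fields from a $_SERVER-like array.
 * Key names in the returned array are illustrative, not the author's.
 */
function build_visit_payload(array $server): array
{
    return [
        'time'     => gmdate('Y-m-d H:i:s', (int)($server['REQUEST_TIME'] ?? time())),
        'url'      => ($server['HTTP_HOST'] ?? '') . ($server['REQUEST_URI'] ?? ''),
        // CF-IPCountry and CF-Connecting-IP are request headers added by Cloudflare.
        'country'  => $server['HTTP_CF_IPCOUNTRY'] ?? '',
        'language' => $server['HTTP_ACCEPT_LANGUAGE'] ?? '',
        'referer'  => $server['HTTP_REFERER'] ?? '',
        'ua'       => $server['HTTP_USER_AGENT'] ?? '',
        'ip'       => $server['HTTP_CF_CONNECTING_IP'] ?? ($server['REMOTE_ADDR'] ?? ''),
    ];
}

/**
 * Send the payload as a form-encoded POST. Returns false on any failure
 * so that logging problems never break the page itself.
 */
function send_visit(array $payload, string $endpoint): bool
{
    if (!function_exists('curl_init')) {
        return false; // cURL extension not available
    }
    $ch = curl_init($endpoint);
    curl_setopt_array($ch, [
        CURLOPT_POST              => true,
        CURLOPT_POSTFIELDS        => http_build_query($payload),
        CURLOPT_RETURNTRANSFER    => true,
        CURLOPT_CONNECTTIMEOUT_MS => 300,
        CURLOPT_TIMEOUT_MS        => 500,
    ]);
    return curl_exec($ch) !== false;
}

// Usage on a client site (for example at the top of index.php):
// send_visit(build_visit_payload($_SERVER), LOG_ENDPOINT);
```

Passing `$_SERVER` in as an argument keeps the payload builder easy to check in isolation.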
What gets logged:
- Date-Time;
- URL accessed;
- Country (data from CF);
- Language;
- Referer;
- User Agent;
- IP;
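On the receiving side, the fields listed above have to be cleaned up before they go into the database. A minimal sketch follows, assuming form-encoded POST keys named `time`, `url`, `country`, `language`, `referer`, `ua`, `ip` and a hypothetical `visits` table; none of these names come from the original scripts.

```php
<?php
/**
 * Keep only the expected fields from a POST body and cap their length.
 * Unknown keys are dropped; non-string values become empty strings.
 */
function normalize_visit(array $post): array
{
    $fields = ['time', 'url', 'country', 'language', 'referer', 'ua', 'ip'];
    $row = [];
    foreach ($fields as $name) {
        $value = isset($post[$name]) && is_string($post[$name]) ? $post[$name] : '';
        $row[$name] = substr(trim($value), 0, 2048);
    }
    // Discard anything that is not a syntactically valid IPv4/IPv6 address.
    if (filter_var($row['ip'], FILTER_VALIDATE_IP) === false) {
        $row['ip'] = '';
    }
    // Country codes are two letters; store them uppercased.
    $row['country'] = strtoupper(substr($row['country'], 0, 2));
    return $row;
}

/**
 * Insert one row into a hypothetical `visits` table.
 * A prepared statement keeps the untrusted values out of the SQL text.
 */
function store_visit(PDO $db, array $row): void
{
    $stmt = $db->prepare(
        'INSERT INTO visits (visited_at, url, country, language, referer, ua, ip)
         VALUES (:time, :url, :country, :language, :referer, :ua, :ip)'
    );
    $stmt->execute($row);
}

// Usage in the collector script:
// store_visit($db, normalize_visit($_POST));
```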
The server also validates Google, Yandex, and Bing bots in two stages: first against their IP ranges, and if the IP isn't found there, via the PTR record.
For easy viewing I made a reader for these logs (shown in the screenshot).
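The two-stage check can be sketched as follows. This is an illustration rather than the author's code: the function names are made up, the range check handles IPv4 only, and the CIDR list in the usage example is a placeholder from the documentation range, so real ranges would have to be loaded from the lists the search engines publish. The PTR stage does a reverse lookup, checks the hostname suffix, and then confirms that the hostname resolves back to the same IP.

```php
<?php
/** Stage 1 helper: true if an IPv4 address falls inside a CIDR block like "192.0.2.0/24". */
function ip_in_cidr(string $ip, string $cidr): bool
{
    [$subnet, $bits] = array_pad(explode('/', $cidr, 2), 2, '32');
    $ipLong = ip2long($ip);
    $subnetLong = ip2long($subnet);
    if ($ipLong === false || $subnetLong === false || !preg_match('/^\d{1,2}$/', $bits)) {
        return false;
    }
    $bits = (int)$bits;
    if ($bits > 32) {
        return false;
    }
    // Build the network mask; assumes 64-bit PHP integers.
    $mask = $bits === 0 ? 0 : (~0 << (32 - $bits)) & 0xFFFFFFFF;
    return ($ipLong & $mask) === ($subnetLong & $mask);
}

/** True if $host equals one of the suffixes or ends with "." followed by the suffix. */
function host_has_suffix(string $host, array $suffixes): bool
{
    $host = strtolower(rtrim($host, '.'));
    foreach ($suffixes as $suffix) {
        if ($host === $suffix || substr($host, -strlen($suffix) - 1) === '.' . $suffix) {
            return true;
        }
    }
    return false;
}

/** Stage 2: reverse lookup, suffix check, then forward-confirm the hostname. */
function verify_bot_by_ptr(string $ip, array $suffixes): bool
{
    $host = gethostbyaddr($ip); // returns the IP itself when there is no PTR record
    if ($host === false || $host === $ip || !host_has_suffix($host, $suffixes)) {
        return false;
    }
    $forward = gethostbynamel($host); // IPv4 addresses for the hostname, or false
    return $forward !== false && in_array($ip, $forward, true);
}

/** Ranges first; fall back to the PTR check only when no range matches. */
function is_verified_bot(string $ip, array $cidrs, array $suffixes): bool
{
    foreach ($cidrs as $cidr) {
        if (ip_in_cidr($ip, $cidr)) {
            return true;
        }
    }
    return verify_bot_by_ptr($ip, $suffixes);
}

// Usage (placeholder range; load the real published ranges instead):
// $googleSuffixes = ['googlebot.com', 'google.com'];
// is_verified_bot($ip, ['192.0.2.0/24'], $googleSuffixes);
```

Checking the ranges first avoids DNS lookups for most genuine bot hits, and the forward confirmation in stage 2 protects against a forged PTR record.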
Why?
- Monitoring the activity of search bots;
- Understanding whether the site has any traffic at all, without installing JS counters;
- Understanding the sources of traffic (so you don't have to guess where the traffic came from);
- Understanding how much of this traffic is live, and blocking junk UAs/languages/countries.
I'm not publishing the scripts themselves (most people probably don't need them, and vibe-coding your own doesn't take long, but if anyone needs them, write in the comments). I was originally inspired by this script: http://usefulscript.ru/log_info.php. It's generally fine, too.
Nuances:
Only requests to scripts that contain the include are logged. If you want to see requests for static files (robots.txt, sitemap.xml, for example), you have to serve them through PHP. If you need logs for non-existent URLs, set up your own error page handler and put the include there.
How to make robots.txt executable:
robots.php
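<?php
// Serve this script's output as plain text, like a real robots.txt.
header("Content-Type: text/plain");
// Include your logging script here so that requests for robots.txt get logged.
// Helper: return the HTTP status code of a URL as a 3-character string, or null on failure.
function get_http_response_code($theURL) {
    $headers = get_headers($theURL);
    // get_headers() returns false on failure; $headers[0] looks like "HTTP/1.1 200 OK".
    return $headers === false ? null : substr($headers[0], 9, 3);
}
?>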
User-agent: *
Disallow: /wp-login.php
.htaccess
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} robots\.txt$ [NC]
RewriteRule ^(.*)$ /robots.php [L]
Custom error-handling pages:
ErrorDocument 404 /404.php
ErrorDocument 403 /403.php
ErrorDocument 500 /500.php
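Each of these error pages then needs the logging include. A sketch of what such a page might contain is below, written so that one script could serve all three codes. It relies on the `REDIRECT_STATUS` variable that Apache sets for the internal redirect to an ErrorDocument; the function name and the `log_client.php` filename are hypothetical.

```php
<?php
/**
 * Read the status Apache stored for the internal ErrorDocument redirect.
 * Falls back to $default when the variable is missing or is not one of
 * the codes handled here.
 */
function error_status(array $server, int $default = 404): int
{
    $status = (int)($server['REDIRECT_STATUS'] ?? 0);
    return in_array($status, [403, 404, 500], true) ? $status : $default;
}

// Usage inside 404.php (or a single shared error page):
// $status = error_status($_SERVER);
// http_response_code($status);          // keep the error status on the response
// include __DIR__ . '/log_client.php';  // hypothetical name of the logging include
// echo "Error $status";
```

The original request path is still available in `$_SERVER['REQUEST_URI']` at this point, so the logged URL is the one the visitor or bot actually asked for.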