Building a Spam-Filtering PHP Form Processing Script

 

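On the previous page, we had PHP write a file to the tmp/ directory under the user's home directory. That file records data about the visit that loaded the form, including the time the page was opened, the browser and IP address that opened it, and a hash built from the time and the IP address.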

Now we're going to use that file to enable the form processor script to know whether the data was submitted through the form, as well as whether the form was submitted more quickly than a fast human typist could have filled it out. Let's start by initializing the script and collecting some data about the computer that called the processor script.

<?php
session_start();
date_default_timezone_set('America/New_York'); // replace with the server's time zone
$submitTime = time(); // gets the submission time
$submitBrowser = $_SERVER['HTTP_USER_AGENT'] ?? ''; // gets the current browser, or '' if no User-Agent header was sent
$submitIP = $_SERVER['REMOTE_ADDR']; // gets the current IP address
$submitReferer = $_SERVER['HTTP_REFERER'] ?? ''; // gets the URL of the page that sent the form data, if available

Next, we have to tell PHP where to find the PHP file we created on the form page. We used the session ID as the first part of the filename and stored it in the tmp/ directory, so we know where it should be. If it's not there, then we know this submission didn't come through the form. So what do we do?

Your basic choices at this point are to accept the submission but rewrite its title to label it as suspicious, to redirect it to a different email address, or to discard it.

What I usually do with submissions that I know for a fact could not have come through the form is redirect the spammer or spambot to the success page and terminate the script. Why the success page? So they stop trying. Whether it's a human spammer or a robot, they'll think they succeeded; so they'll either go away or keep attempting to send spam using the same faulty script, all of which will be discarded.

So assuming that my success page is /success.php and that I want to discard the message, my next snippet of code would be something like:

$checkFile = "/home/yourusername/tmp/" . session_id() . ".php";
if (!file_exists($checkFile)) {
    include($_SERVER['DOCUMENT_ROOT'] . "/success.php"); // a bare "/success.php" would point at the filesystem root
    die;
}

or if using a meta refresh instead of an include:

$checkFile = "/home/yourusername/tmp/" . session_id() . ".php";
if (!file_exists($checkFile)) {
    print "<meta http-equiv=\"refresh\" content=\"0;URL=https://www.mydomain.tld/success.php\">";
    die;
}
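
Both snippets above build $checkFile directly from session_id(). As a minimal sketch, and only under the assumption that you want to reject malformed session IDs before touching the filesystem, the path could be built by a small helper. The function name checkFilePath() is hypothetical and not part of the original script; the character set it accepts is the one PHP's built-in session ID generator uses.

```php
<?php
// Hypothetical helper (not from the original script): build the path to the
// per-session check file, or return null if the session ID looks malformed.
function checkFilePath(string $tmpDir, string $sessionId): ?string
{
    // PHP-generated session IDs contain only letters, digits, "," and "-".
    if (!preg_match('/\A[A-Za-z0-9,-]+\z/', $sessionId)) {
        return null;
    }
    // Normalize the directory so we always join with exactly one slash.
    return rtrim($tmpDir, '/') . '/' . $sessionId . '.php';
}

// Possible use in the processor script, after session_start():
// $checkFile = checkFilePath('/home/yourusername/tmp', session_id());
// if ($checkFile === null || !file_exists($checkFile)) { /* show the success page and stop */ }
```

A null result can be treated exactly like a missing file, so the rest of the script does not need to change.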

Assuming that the file does exist, let's include it so we can do some more checking.

include $checkFile;

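This will allow us to do at least three more tests:

1. Whether the hash stored in the file matches a hash rebuilt from the stored start time and IP address.
2. Whether the browser that opened the form page is the same browser that submitted the form.
3. Whether the form was submitted more quickly than a human could have filled it out.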

Let's check for all three. For the third test, we'll assume that the world's fastest typist would take at least six seconds to fill out the form, so we'll use four seconds to provide a bit of a safety margin against false positives. Forms that are filled out too quickly are one of the most common signs of robotic spam, but we want to be careful not to trash any legitimate messages.

To refresh your memory, four of the variables defined in the PHP file that the form page created were:

$startTime // time the form page was opened
$startBrowser // the browser used to open the form page
$startIP // the IP address used to open the form page
$startHash // an md5 hash of ($startTime . $startIP)

To perform the first test, we have to reconstruct the hash and compare it to $startHash. For the second test, we have to compare $startBrowser to $submitBrowser. For the third test we have to subtract $startTime from $submitTime and determine whether the result is less than four seconds. We can do all three tests with:

$testHash = md5($startTime . $startIP);
if (
    ($testHash !== $startHash) ||
    ($startBrowser !== $submitBrowser) ||
    ($submitTime - $startTime < 4)
) {
    include($_SERVER['DOCUMENT_ROOT'] . "/success.php"); // a bare "/success.php" would point at the filesystem root
    die;
}

or

$testHash = md5($startTime . $startIP);
if (
    ($testHash !== $startHash) ||
    ($startBrowser !== $submitBrowser) ||
    ($submitTime - $startTime < 4)
) {
    print "<meta http-equiv=\"refresh\" content=\"0;URL=https://www.mydomain.tld/success.php\">";
    die;
}
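
If you would rather keep the three tests in one place, they could be wrapped in a function along these lines. This is only a sketch: the function name failsFormChecks() and the $minSeconds parameter are hypothetical, it assumes $startTime and $submitTime are integer timestamps from time(), and it swaps the !== hash comparison for PHP's hash_equals(), which compares two strings in constant time.

```php
<?php
// Hypothetical helper (not from the original script): returns true when a
// submission fails any of the three tests described above.
function failsFormChecks(
    int $startTime,
    string $startIP,
    string $startBrowser,
    string $startHash,
    int $submitTime,
    string $submitBrowser,
    int $minSeconds = 4
): bool {
    // Test 1: rebuild the hash and compare it to the stored one.
    $testHash = md5($startTime . $startIP);
    if (!hash_equals($startHash, $testHash)) {
        return true;
    }
    // Test 2: the browser must be the same on both pages.
    if ($startBrowser !== $submitBrowser) {
        return true;
    }
    // Test 3: the form must not have been submitted too quickly.
    return ($submitTime - $startTime) < $minSeconds;
}

// Possible use in the processor script:
// if (failsFormChecks($startTime, $startIP, $startBrowser, $startHash, $submitTime, $submitBrowser)) {
//     /* show the success page and stop */
// }
```

Keeping the threshold as a parameter makes it easy to raise or lower the four-second limit for longer or shorter forms.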

At this point, the PHP file we created has served its purpose and can be deleted:

unlink($checkFile);

I strongly advise you not to use a change in IP address between loading the form page and submitting the form as a spam test. Most mobile providers issue IP addresses with very short leases, so it's not at all unusual for a legitimate user's IP to change between loading the form and submitting it. It can be useful, however, for tracking the source of spam that does get past the filters, which is the main reason I collect it.

Another good use for the IP address collected at the time the form is submitted would be to check it against available databases of Web spammers or other Internet miscreants. There are many free and paid blocklists available from organizations such as Project Honeypot and AbuseIPDB. I don't typically check IP addresses at the form level because I import the blocklists at the firewall level; but if you have no control over your server's firewall, you can learn how to compare the IP of the computer that submitted the form against a text file of known spammers' IP addresses (plus get a free blocklist to get you started) on this page.
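
As a rough illustration of the text-file comparison, and not necessarily the method described on the linked page, a lookup might look something like the sketch below. It assumes a blocklist with one IP address per line and optional comment lines starting with #; the function name isBlocklisted() and the file path in the usage comment are hypothetical.

```php
<?php
// Hypothetical helper (not from the original script): returns true if $ip
// appears in a plain-text blocklist with one address per line.
function isBlocklisted(string $ip, string $blocklistFile): bool
{
    if (!is_readable($blocklistFile)) {
        return false; // no list available, so treat the address as not listed
    }
    $lines = file($blocklistFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return false;
    }
    foreach ($lines as $line) {
        $entry = trim($line);
        // Skip blank lines and comment lines (assumed to start with "#").
        if ($entry === '' || $entry[0] === '#') {
            continue;
        }
        if ($entry === $ip) {
            return true;
        }
    }
    return false;
}

// Possible use in the processor script:
// if (isBlocklisted($submitIP, '/home/yourusername/blocklist.txt')) { /* show the success page and stop */ }
```

This does exact string matching only, so it would not catch CIDR ranges or differently formatted IPv6 addresses; a large list would also be better served by a database or the firewall.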

You may have noticed that we also collected the referer page. Although not very useful for spam filtering because it can easily be spoofed, knowing the referer page may be helpful for tracking down the source of sudden spam deluges. Or not. But having the information doesn't hurt.

At this point, we still haven't even loaded the variables from the form input, yet we've done quite a bit of spam filtering already. That's by design. Before processing the form data, we may as well discard mail that we know didn't come through the form or that was submitted too quickly for a human to have typed it. We'll talk about processing the form data on the next page.