Trackbacks – a form of remote notification or reverse linking from one blog to another – are in jeopardy.
Spammers are increasingly targeting Trackbacks. Trackbacks are much less protected than other spam targets such as blog comments. Blog comments are often protected against automated attacks by password authentication, such as that available on our BlogHarbor service via Blogware Reader Accounts, or by CAPTCHAs that detect automated processes. Trackbacks, however, do not require any standardized security procedures, which makes them very susceptible to abuse.
Trackbacks should be treated more conservatively than email. Trackbacks are not ‘mission critical’ in the way that email is; the loss of one trackback incorrectly tagged as spam (a false positive) would not disrupt a business. Receiving a spam email is not a public matter; you click the delete button in your email client and move on. But a spam posted to your weblog is indeed a public matter, a defacement of your public persona. A liberal trackback implementation will lead to decreased utility of trackbacks on your site. If there’s any question that a trackback could be spam, it should be deleted or queued for moderation.
The question is, how do we save Trackbacks? How can we tell if an incoming trackback is authentic? Many methods are in use, but most seem to rely on crude content analysis, and constantly updating lists of grep patterns is not something the average blogger is likely to do.
There is a better way: analyze trackbacks as if they were email, using the same proven spam prevention tools already available to email servers.
Using DNSBLs to Verify the Source of a Trackback
A common means of defending mail servers against spam is the use of DNS blacklists, or DNSBLs. A DNSBL is a list of IP addresses through which spam has been sent or which are likely to be used by spammers. These IPs can include open proxies, dynamically assigned IPs, and compromised servers.
DNSBLs allow a mail server to determine in real time if an incoming email is likely to be spam. By sending a query to DNSBL services such as Spamhaus, SORBS, SPEWS, or NJABL, a mail server can determine in milliseconds whether or not an incoming email is being sent from an IP address which is likely being used by a spammer.
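As a rough illustration of how such a query works, the sketch below reverses the octets of an IPv4 address, appends the DNSBL zone, and treats any successful DNS answer as "listed". The zone name `dnsbl.example.org` and the function names are placeholders chosen for this example, not the addresses or API of any particular service; each real DNSBL documents its own zone names and usage terms.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    # Raises ValueError if `ip` is not a valid IPv4 address.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the DNSBL zone answers for this address."""
    try:
        resolve(dnsbl_query_name(ip, zone))
    except OSError:
        # Any lookup failure, including "no such name", is treated as
        # "not listed" in this sketch.
        return False
    return True


if __name__ == "__main__":
    # Hypothetical zone; substitute the zone of a DNSBL you are allowed to query.
    print(is_listed("192.0.2.10", "dnsbl.example.org"))
```

A blog server could run a check like this against the address that delivered the trackback ping before accepting it. The `resolve` parameter exists only so the lookup can be swapped out, for instance in tests.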
Blogging services must immediately begin basing trackback security on existing DNSBL standards. Some users have begun implementing such checks already:
http://weblog.sinteur.com/index.php?p=7967
http://bradchoate.com/weblog/2004/11/05/mt-dsbl
The use of DNSBLs has already proven to be an excellent defense against the very same spammers who are now beginning to attack blog trackback systems.
Content Scanning with DNSBLs
In addition to checking whether or not an incoming trackback originates from an IP address listed on a DNSBL, the content of a Trackback should be scanned against one or more DNSBLs.
The following parameters are sent as part of the Trackback protocol:
- Title
- Excerpt
- URL
- Blog Name
While the URL and excerpt parameters are the most likely places for spammers to put spam URLs, it is not inconceivable that the title or blog name parameters might also contain spammers’ web addresses. It would therefore be prudent to implement trackback security procedures which scan the entire content of a Trackback for web addresses, resolve those hostnames to IPs, and check those IPs against one or more DNSBLs.
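The scan described above might be sketched as follows. The field names, the URL pattern, and both function names are assumptions made for this example; `resolve_host` and `ip_is_listed` are caller-supplied callables standing in for a DNS lookup and a DNSBL check, so nothing here is tied to a specific blacklist.

```python
import re
from urllib.parse import urlparse

# Deliberately simple pattern: anything that looks like an http(s) URL.
URL_PATTERN = re.compile(r"https?://[^\s<>]+", re.IGNORECASE)


def extract_hostnames(fields):
    """Collect every hostname found in any trackback field."""
    hosts = set()
    for value in fields.values():
        for url in URL_PATTERN.findall(value):
            try:
                host = urlparse(url).hostname
            except ValueError:
                continue  # malformed URL; skip it
            if host:
                hosts.add(host)
    return hosts


def links_to_listed_host(fields, resolve_host, ip_is_listed):
    """Return True if any linked hostname resolves to a blacklisted IP.

    resolve_host(hostname) should return a list of IP strings;
    ip_is_listed(ip) should return True for blacklisted addresses.
    """
    for host in sorted(extract_hostnames(fields)):
        try:
            addresses = resolve_host(host)
        except OSError:
            continue  # hostnames that do not resolve cannot be checked
        if any(ip_is_listed(ip) for ip in addresses):
            return True
    return False
```

Scanning all four fields rather than only the URL costs little, since the hostnames are deduplicated before any lookups are made.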
Dynamic IP Addresses
While using a standard DNSBL such as Spamhaus will help reduce spam by blocking Trackbacks that originate from, or contain links to, known spammer IP addresses, one of the most effective ways to reduce trackback spam would be to check trackbacks against DNSBLs such as SORBS which list dynamic IP address space, and to deny Trackbacks originating from dynamic IP addresses.
Many if not most ISPs already disallow email sent from dynamic IP addresses and require outgoing email to be sent through the ISP's own mail server. This helps to reduce not only deliberate email spamming from end users on dial-up or broadband networks, but also “accidental spamming” when zombie computers are used to send spam without their owners' knowledge.
It is not good practice going forward to allow trackbacks originating from end-user clients. Best practice should be that trackbacks originate from servers, not from clients.
Blog hosting providers should require trackbacks to be sent from address space that is not dynamically assigned. The Trackback protocol has no inherent security and as a result, allowing trackbacks from dynamically assigned IP space is irresponsible.
Comments on blogs can be secured in any number of ways, such as requiring authentication, but the trackback spec does not allow for such security measures. Therefore additional means must be introduced in order to keep Trackback viable.
Blog hosts should send Trackbacks, not blog clients. Trackbacks technically are from weblogs – hosts – so requiring the trackback ping to actually come from a host is not unreasonable.
It would be necessary to alter the behavior and Trackback implementation in some blog hosting and blogging client tools, but if this modification can save the Trackback, that is a small sacrifice.
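For a sense of what a host-originated ping involves, here is a minimal sketch of a blog server building the request itself instead of leaving it to the author's desktop client. A Trackback ping is an HTTP POST of form-encoded parameters; the function name and the example URLs are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_trackback_ping(trackback_url, title, excerpt, url, blog_name):
    """Build (but do not send) a Trackback ping as a form-encoded POST."""
    body = urlencode(
        {
            "title": title,
            "excerpt": excerpt,
            "url": url,
            "blog_name": blog_name,
        }
    ).encode("utf-8")
    return Request(
        trackback_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    ping = build_trackback_ping(
        "http://blog.example.net/trackback/42",
        "My reply",
        "A short excerpt of the replying post.",
        "http://myblog.example.org/2005/01/reply",
        "My Blog",
    )
    print(ping.get_method(), ping.full_url)
```

Passing the returned object to `urllib.request.urlopen` from the hosting server would send the ping from the server's own address, which is the address the receiving blog would then check.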
SpamAssassin
One of the most popular open source tools for protecting mail servers against spam is SpamAssassin. This tool parses incoming email for content, and assigns it a score based on a series of tests. A higher score means that the message is more likely to be spam. Mail server administrators can configure their server to reject mail above the threshold of their choosing.
SpamAssassin is now part of the Apache project and has a high level of support within the developer community. It is regularly updated with new tests to detect spam, and it can also check DNSBLs as part of its scanning process.
SpamAssassin can be integrated with other software, and some developers have already begun implementing SpamAssassin-based trackback protection.
Using SpamAssassin as an element of Trackback spam detection should provide a considerable level of protection.
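One possible way to wire this up, sketched under several assumptions: the trackback is wrapped in a synthetic email message and handed to SpamAssassin's `spamc` client in check mode (`spamc -c`), which prints a `score/threshold` pair. This assumes `spamc` is installed and a `spamd` daemon is running; the placeholder sender address, the `X-Trackback-Source-IP` header, and the function names are inventions for this example, not part of SpamAssassin or the Trackback protocol.

```python
import subprocess
from email.message import EmailMessage


def trackback_to_message(fields, source_ip):
    """Wrap trackback fields in an email-like message for scanning."""
    msg = EmailMessage()
    msg["Subject"] = fields.get("title", "")
    msg["From"] = "trackback@example.invalid"  # placeholder sender
    msg["X-Trackback-Source-IP"] = source_ip   # hypothetical custom header
    body = "\n".join(
        [
            "Blog: " + fields.get("blog_name", ""),
            "URL: " + fields.get("url", ""),
            "",
            fields.get("excerpt", ""),
        ]
    )
    msg.set_content(body)
    return msg


def parse_spamc_check(output):
    """Parse the 'score/threshold' line printed by `spamc -c`."""
    score, threshold = output.strip().split("/")
    return float(score), float(threshold)


def score_with_spamc(msg):
    """Run the message through spamc; requires spamc and a running spamd."""
    # check=False because spamc exits with status 1 when the message is spam.
    result = subprocess.run(
        ["spamc", "-c"], input=msg.as_bytes(), capture_output=True, check=False
    )
    return parse_spamc_check(result.stdout.decode("ascii"))
```

A receiving blog could then queue the trackback for moderation whenever the returned score meets or exceeds the threshold, in keeping with the conservative policy argued for earlier.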
Summary
By combining IP-based spam detection via DNSBLs with content scanning from tools such as SpamAssassin and requiring that Trackbacks originate from servers and not from dynamically assigned IP space, Trackbacks can be saved…