
Blogged thoughts is our web blog. Expect views, opinions, rants and tirades about everything and anything.




Blogged thoughts

by the www.akamarketing.com team

Archive for July, 2008

Using Wordpress? Check the text only version of Google’s cache for hidden spam links

Thursday, July 31st, 2008

I’m up to my eyes programming another AdWords API system at the moment, so when I discovered that my Wordpress installation had been hacked I wanted to strangle someone (ideally the person responsible), because I really didn’t have time for this.

Wordpress hidden spam links hack
The hack didn’t shut down my blog, but it might as well have, because it made all my posts unfindable on the major search engines for any of their related keywords (and exact string searches). The hack I fell victim to involves some waste of space making secret changes to Wordpress source files and the Wordpress database, enabling him to output a tonne of hidden links on all blog pages via a hidden Wordpress plugin. The links were complete keyword-stuffed spam, with anchor texts such as ‘viagra’, ‘xanax’ and ‘teeth whitening’ common among them, so needless to say the search engines don’t like my blog pages anymore.

What makes this hack hard to detect is the fact that the links only get output when a major search engine visits a page from an ‘infected’ Wordpress installation, so blog readers will likely not notice until a lot of damage has already been done to your Google, MSN and Yahoo rankings. I myself only stumbled upon it earlier today, by pure chance, when I saw all the links near the bottom of Google’s text only cache of my last post about converting to PDF from within PHP. The links were present on the regular cache too; however, they were contained in a hidden div, so they could not be seen by anything except the search engines… unless you viewed the page source.

Want to see an example? Right now there are lots of cached examples on Google of what this hack did to my pages, but I’m hoping they will be gone soon, so here’s a copy of the text only cache of http://www.akamarketing.com/blog/109-php-to-pdf-conversion-with-tcpdf.html from today (31st July 2008).

How can I tell if my Wordpress blog has been hit with this?
The easiest thing to do is to visit Google’s text only cache page for a couple of your blog posts (and perhaps your main blog page) and keep an eye out for about 50 spam links towards the end of the page. If you have caching by search engines disabled, you can use something like Curl to ‘fake’ your user agent string so you appear to be Google (and then check the page source). I’ve already done this for you with an iamgoogle.php script: visit http://www.akamarketing.com/iamgoogle.php?url=http://www.akamarketing.com/blog/&google=1, replacing my URL ‘http://www.akamarketing.com/blog/’ with the URL of one of your blog pages. When the google parameter equals 1 the user agent is ‘Googlebot’; for any other value a regular ‘human’ user agent is used. If you’re checking your main blog page, be sure to add the trailing slash after your blog folder: Wordpress redirects the non-slashed URL to the slashed version, so without the trailing slash you’ll just get a ‘Moved Permanently’ message. The code of iamgoogle.php is available for those of us who are ‘into’ PHP.
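For anyone who would rather roll their own check, here is a minimal PHP sketch. It is not the iamgoogle.php script mentioned above; instead of Curl it uses PHP’s built-in HTTP stream wrapper with a user_agent context option, fetches a page once as Googlebot and once as a browser, and compares how many links each version contains. Since this hack only outputs its links to search engines, a large positive difference is a warning sign. The function names, the browser user agent string and the regex-based link count are assumptions for illustration, not a finished tool.

```php
<?php
// Hypothetical cloaking check (not the author's iamgoogle.php): fetch a page
// as Googlebot and as a browser, then compare how many links each one gets.

// Example user agent strings; the browser string is an arbitrary choice.
const GOOGLEBOT_UA = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
const BROWSER_UA = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0';

// HTTP stream context that sends the chosen User-Agent header.
function ua_context(bool $asGooglebot)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $asGooglebot ? GOOGLEBOT_UA : BROWSER_UA,
            'follow_location' => 1, // follow WordPress's redirect to the slashed URL
            'timeout' => 15,        // seconds
        ],
    ]);
}

// Rough count of <a ... href=...> tags in the page source; links inside
// hidden divs are counted too, because this looks at the raw HTML.
function count_links(string $html): int
{
    return (int) preg_match_all('/<a\s[^>]*href\s*=/i', $html);
}

// Links served to Googlebot minus links served to a browser.
function extra_links_for_googlebot(string $url): int
{
    $asBot = file_get_contents($url, false, ua_context(true));
    $asHuman = file_get_contents($url, false, ua_context(false));
    if ($asBot === false || $asHuman === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return count_links($asBot) - count_links($asHuman);
}

// Usage (needs allow_url_fopen and network access):
// echo extra_links_for_googlebot('http://www.example.com/blog/');
```

A regex count is crude compared with a real HTML parser, but for spotting roughly 50 extra links it is plenty.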

If I’ve been hit with this hidden link hack how to I get rid of it?
After discovering this hack, my first port of call was Google, to try and find some good information. I found three particularly good articles about what this hack is and how to get rid of it, so rather than go through the removal in detail I’ll just point you in their direction, if you don’t mind (it’s been a long day). The posts below all helped me:

Wordpress exploit giving backlinks, redirects and headaches but no visitors ;)

Wordpress exploit: we been hit by hidden spam link injection

Has Your WordPress Been Hacked Recently?

The above links will fill you in on the complete story, but in essence fixing this hack for me involved doing a bit of fiddling with the Wordpress database, deleting some files with strange extensions and upgrading Wordpress from version 2.0.2 to 2.6. On that note, hats off to the Wordpress development team: it was pretty much the most pain-free web application upgrade I’ve ever done… (although I did back up everything twice just to be safe). If you already have the latest version of Wordpress, I’d still recommend replacing your source code with ‘fresh’ code just in case it’s been edited (which is very likely with this hack).
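The ‘files with strange extensions’ step can be partly automated. The sketch below is hypothetical: the function name and especially the allowlist of extensions are assumptions, so adjust the list to whatever your theme and plugins legitimately contain. It walks a WordPress directory and lists every file whose extension is not on the allowlist, for you to review by hand rather than delete blindly.

```php
<?php
// Hypothetical scanner: list files whose extension is not on an allowlist.
// The allowlist is an assumption; extend it for your own theme and plugins.
const ALLOWED_EXTENSIONS = ['php', 'js', 'css', 'html', 'htm', 'txt',
    'png', 'gif', 'jpg', 'jpeg', 'ico', 'svg', 'xml', 'po', 'mo'];

function unexpected_files(string $root): array
{
    $found = [];
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($it as $file) {
        if (!$file->isFile()) {
            continue;
        }
        // getExtension() returns only the last part, so "x.php.bak" gives "bak"
        $ext = strtolower($file->getExtension());
        if (!in_array($ext, ALLOWED_EXTENSIONS, true)) {
            $found[] = $file->getPathname();
        }
    }
    sort($found);
    return $found;
}

// Usage: print_r(unexpected_files('/path/to/wordpress'));
```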

How can I detect something like this in the future?
After I upgraded Wordpress I was pretty certain that my installation was now clean. However, I have a hunch that this hack had been ‘active’ since April, so I asked myself how I could detect something like this more quickly if it happens again. I came to the conclusion that I needed some sort of file integrity checker, similar to Tripwire, to alert me when any of the files in my www space change.

Tripwire and many similar systems are not usually available on shared hosts, but they all essentially work the same way: take a sha1 (or md5) hash of every watched file, store the hashes, and then periodically compare the stored hashes against freshly generated ones to check whether any files have been edited. Writing something custom for my own needs therefore wouldn’t be that hard.
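Here is a minimal sketch of that idea in PHP, assuming you can run a PHP script from cron (or trigger it from a page) on your host. It hashes every file under a directory with sha1_file(), stores the hashes as JSON, and later reports files that were added, removed or changed. The function names, the JSON baseline format and the paths are illustrative assumptions, not a finished tool; keep the baseline file outside the watched directory so it doesn’t end up hashing itself.

```php
<?php
// Minimal Tripwire-style sketch: sha1 every file under $root, save the hashes
// as a JSON baseline, then report files added, removed or changed since.

function hash_tree(string $root): array
{
    $root = rtrim($root, '/\\');
    $hashes = [];
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($it as $file) {
        if ($file->isFile()) {
            // key by path relative to $root so the baseline is portable
            $rel = substr($file->getPathname(), strlen($root) + 1);
            $hashes[$rel] = sha1_file($file->getPathname());
        }
    }
    ksort($hashes);
    return $hashes;
}

function save_baseline(string $root, string $baselineFile): void
{
    file_put_contents($baselineFile, json_encode(hash_tree($root), JSON_PRETTY_PRINT));
}

function compare_to_baseline(string $root, string $baselineFile): array
{
    $old = json_decode(file_get_contents($baselineFile), true);
    $new = hash_tree($root);
    $changed = [];
    foreach (array_intersect_key($new, $old) as $path => $hash) {
        if ($old[$path] !== $hash) {
            $changed[] = $path;
        }
    }
    return [
        'added' => array_keys(array_diff_key($new, $old)),
        'removed' => array_keys(array_diff_key($old, $new)),
        'changed' => $changed,
    ];
}

// Usage: save_baseline('/path/to/www', '/path/outside/www/baseline.json') once
// after a clean install, then compare_to_baseline() on a schedule.
```

Run save_baseline() right after a clean install or upgrade, then have a scheduled job call compare_to_baseline() and email you whenever any of the three lists is non-empty.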

OK, that’s enough rambling for today. Here’s hoping you’ve had a better day than me.

PHP to PDF conversion with TCPDF

Friday, July 25th, 2008

Recently I had a development client who, as part of a larger system, needed to create a PDF based report from his clients’ metrics, KPIs etc., which he could then forward on to them. It was simple numerical data, but for presentation purposes it was needed in PDF… you know, to look good.

In the past, when budget was less of an issue, I used PDFlib, a commercial library which PHP can use through its PDF extension. This project, however, required me to look for a free alternative. I found TCPDF on Sourceforge. It had almost 80,000 downloads, good documentation and lots of examples, and was being used by applications such as Joomla, Drupal, Moodle and phpMyAdmin, so I said I’d give it a go.

Installation was easy, basically I just needed to copy the TCPDF folder to my www space and require() the main class file from PHP scripts that needed to create PDFs on the fly.

I have to say I found creating the more complex dynamic PDFs with this library quite a slow and tedious process. However, this was because of what I was trying to do overall and not the library’s ‘fault’; after all, creating PDFs dynamically is quite different from creating webpages dynamically. The most frustrating parts were having to work out all the ‘maths’ for positioning elements, and not being able to just press refresh to see whether my latest line or two had output as intended.

OK, to give you a feel for how the TCPDF class library can be used, I’ll go through how I actually created the PDF report my client wanted, using a stripped-down version of the code. The two interesting things about the report were that it had to have a table with all the data, and the page the table was on had to be landscape (because the table was wide). The table I output is related to golf and is very simple, but hopefully it will be a good TCPDF starting block for you.

Creating a table with TCPDF
Within the TCPDF class there are a couple of useful methods which enable me to output a nice table with DB data embedded in the cells: writeHTML(), writeHTMLCell(), Cell() and MultiCell(). I had to rule out Cell() for the most part, as it does not support HTML in the cell data. Although I could have output a standard HTML-coded table using writeHTML(), I went with MultiCell() in the end. The code below is similar to what I used; it produces this PDF (please right click and save as… otherwise your browser might crash). Wordpress was acting the goat again and stripped the HTML anchor tags out of the line marked ‘FIX THIS LINE’, so make sure that line builds a proper <a href="..."> link before you run it. The full not-messed-up-by-wordpress version is available too.

//reference the class so you can use it (adjust the path to wherever you copied the TCPDF folder)
 require_once('tcpdf/tcpdf.php');

 // create new PDF document (TCPDF defaults: portrait, millimetres, A4, Unicode)
 $pdf = new TCPDF();

 //do not show header or footer
 $pdf->SetPrintHeader(false);
 $pdf->SetPrintFooter(false);

 // add a page - landscape style
 $pdf->AddPage('L');

 // set font
 $pdf->SetFont('freeserif', '', 11);

 //Colors, line width and bold font for the header
 $pdf->SetFillColor(11, 47, 132); //background color of next Cell
 $pdf->SetTextColor(255); //font color of next cell
 $pdf->SetFont('', 'B'); //B for bold, '' keeps the current font family
 $pdf->SetDrawColor(0); //color of cell borders
 $pdf->SetLineWidth(.3); //thickness of cell borders (in mm)

 $cols = array('Rank', 'Player', 'Pts. Avg.', 'Total Pts.'); //Column titles
 $width = array(20, 50, 40, 30); //number of elements must match the $cols array above

 for($i = 0; $i < count($cols); $i++)
 {
     //void Cell( float $w, [float $h = 0], [string $txt = ''], [mixed $border = 0],
     //[int $ln = 0], [string $align = ''], [int $fill = 0], [mixed $link = ''], [int $stretch = 0])
     $pdf->Cell($width[$i], 7, $cols[$i], 1, 0, 'C', 1);
 }
 $pdf->Ln(); //new row

 //styling for normal non header cells
 $pdf->SetTextColor(0); //black
 $pdf->SetFont('', ''); //back to regular (non-bold) text

 //the data - normally would come from DB, web service etc.
 $rank = array('1', '2', '3');
 $player = array('Tiger Woods, USA', 'Phil Mickelson, USA', 'Padraig Harrington, Irl');
 $playerWWW = array('http://tigerwoods.com/', 'http://philmickelson.com/', 'http://padraigharrington.com/');
 $avgPts = array('10', '9', '8');
 $totPts = array('100', '90', '80');

 //create & populate table cells
 for($i = 0; $i < count($rank); $i++)
 {
     if($i == 2) //highlight Harrington because he's Irish...
     {           //in reality you might highlight profits/losses etc.
         $pdf->SetFillColor(89, 239, 152); //green
     }
     else
     {
         $pdf->SetFillColor(255); //white
     }

     //link the player's name to his website
     $playerANDlink = '<a href="' . $playerWWW[$i] . '">' . $player[$i] . '</a>'; //FIX THIS LINE: must build a real <a href> tag

     //int MultiCell( float $w, float $h, string $txt, [mixed $border = 0], [string $align = 'J'],
     //[int $fill = 0], [int $ln = 1], [int $x = ''], [int $y = ''], [boolean $reseth = true],
     //[int $stretch = 0], [boolean $ishtml = false])
     $pdf->MultiCell($width[0], 7, $rank[$i], 1, 'C', 1, 0);
     $pdf->MultiCell($width[1], 7, $playerANDlink, 1, 'L', 1, 0, '', '', true, 0, true); //ishtml = true
     $pdf->MultiCell($width[2], 7, $avgPts[$i], 1, 'C', 1, 0);
     $pdf->MultiCell($width[3], 7, $totPts[$i], 1, 'C', 1, 0);
     $pdf->Ln(); //new row
 }

 //save the PDF to a file on the server
 $pdf->Output('./pdfs/example.pdf', 'F'); //F for saving output to file

PDF creation and setup
OK, I’ll briefly go through this code then. The first couple of lines just set up the PDF document or the pages within it; please refer to the TCPDF class documentation for more information. The only real item of note here is how to create a landscape PDF page. When AddPage() is called without parameters, the page is created with the document’s default orientation (set when the TCPDF object is created, often from the TCPDF config file), which is usually portrait, so pass in an ‘L’ for landscape pages: $pdf->AddPage('L'). It is possible to have some pages landscape and some portrait in a single PDF document.

Table Header
The TCPDF class has a lot of methods for setting the style of elements. A style you set applies to the cells/elements drawn after it, until you change it again. Most of them are obvious: SetFillColor() sets the background color of a cell when that cell is set to be painted or filled. The fun begins, though, when you actually start outputting cells (rectangles). The header is just plain text, so I used Cell(). Cell() is well documented on the TCPDF site and it is easy to use. Its parameters, from left to right, are: width, height, cell text, border (0 for none, 1 for a frame), where the next cell should go, cell alignment, whether to fill in the cell, an optional link, and stretch options.

The $ln parameter (where the next cell should go) is useful if you want to build your tables vertically rather than horizontally. What I suggest is leaving it at 0 (go to the right) and then calling Ln() to start a new row, much like starting a new tr in an HTML table. If the fill parameter is set to true, the cell background will be the color set by SetFillColor() as mentioned above; if no fill color has been set yet, the background will be grey. My header is built by using a loop to create the four required cells. The first iteration in the loop will be:

$pdf->Cell(20, 7, 'Rank', 1, 0, 'C', 1);
which means create a cell of width 20 and height 7 with its value set to “Rank”. It should have a border, have its value centered and should have its background filled in.

Table Body
The main body of the table is very similar, but uses MultiCell() because we want the ability to output HTML as a cell’s value. A couple of arrays of data are created and populated; these will slot into the cells we are about to create. In reality the cell values will likely come straight from a DB or web service, but hardcoded arrays are fine for this sample.

MultiCell() shares a lot of parameters with Cell() above, so I won’t mention them again. It also introduces a couple of new parameters, including X and Y for setting the position of a cell, reseth, which resets the height of the last cell (without setting this to true you’re likely to get crazy-looking tables… leave it as true and forget about it), and ishtml, which determines whether the cell value can hold HTML. MultiCell()’s full definition is below.

int MultiCell( float $w, float $h, string $txt, [mixed $border = 0], [string $align = 'J'], [int $fill = 0], [int $ln = 1], [int $x = ''], [int $y = ''], [boolean $reseth = true], [int $stretch = 0], [boolean $ishtml = false])

It’s pretty simple to use. It provides power by allowing you to set the exact X and Y coordinates of a cell, but also ease of use in the sense that if you don’t specify values for X and Y it will just output at the current position (just like cell() does) so you don’t have to do any logic to get suitable X & Y values… in most cases anyhow.
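If you do decide to pass explicit X values, the positioning ‘maths’ is simple arithmetic. The sketch below is a hypothetical helper, not part of TCPDF: it works out the left edge of each column so the table is centred on the page. With the widths from the example table (20 + 50 + 40 + 30 = 140 mm) on a 297 mm wide A4 landscape page, the table starts at (297 - 140) / 2 = 78.5 mm. It ignores page margins and assumes the document uses TCPDF’s default unit of millimetres.

```php
<?php
// Hypothetical helper: left edge of each column so the table is centred.
// Widths and page width must use the same unit (TCPDF defaults to mm).
function centred_column_x(array $widths, float $pageWidth): array
{
    $tableWidth = array_sum($widths);
    if ($tableWidth > $pageWidth) {
        throw new InvalidArgumentException("Table ($tableWidth) is wider than the page ($pageWidth)");
    }
    $x = ($pageWidth - $tableWidth) / 2; // equal space left and right
    $positions = [];
    foreach ($widths as $w) {
        $positions[] = $x;
        $x += $w; // next column starts where this one ends
    }
    return $positions;
}

// centred_column_x([20, 50, 40, 30], 297) returns [78.5, 98.5, 148.5, 188.5];
// each value could be passed as MultiCell()'s $x argument for that column.
```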

After four calls to MultiCell() which printed one row of cells, we call Ln() to move to a new line. In fact we didn’t even need to do this to be honest, we could have just changed the $ln parameter value from 0 (to the right) to 1 (to the beginning of the next line) on our fourth cell in each row. The code then would change from this:

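$pdf->MultiCell($width[3], 7, $totPts[$i], 1, 'C', 1, 0);
$pdf->Ln(); //new row

to this:

$pdf->MultiCell($width[3], 7, $totPts[$i], 1, 'C', 1, 1); //1 = move to the beginning of the next line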
Personally I prefer the first way of doing things as it’s more obvious that a new line/row is being outputted. 

Before the calls to MultiCell() I changed the fill colour of the cells in Padraig Harrington’s row (for those that don’t know who he is… he’s a two-time golf major champion from Dublin) and set it back to white for all other rows. Of course that’s more hardcoding; in a ‘real world’ scenario you might highlight your good figures in green and your bad figures in red.
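To move away from hardcoding, the colour can be chosen from the data itself. This sketch is a hypothetical helper: it returns an RGB triple for a figure, reusing the green from the example (89, 239, 152) for positive values, an assumed red (239, 89, 89) for negative ones and white otherwise. The zero threshold is also an assumption.

```php
<?php
// Hypothetical helper: pick a fill colour for a figure.
// Green matches the example table; the red value is an assumption.
function fill_for_value(float $value): array
{
    if ($value > 0) {
        return [89, 239, 152];   // green for good figures
    }
    if ($value < 0) {
        return [239, 89, 89];    // red for bad figures
    }
    return [255, 255, 255];      // white for neutral
}

// Usage with TCPDF, spreading the array into SetFillColor($r, $g, $b):
// $pdf->SetFillColor(...fill_for_value($profit));
```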

Outputting the final PDF
When you’ve finished creating all the required cells, images, links, text etc., you have to call the Output() method to actually get your hands on your dynamically created PDF. It can be called with no parameters, in which case the PDF is sent to the browser; more commonly, though, developers specify the filename and the destination of the generated PDF. The destination can be one of four values:

I: send the file inline to the browser.
D: send to the browser and force a file download with the name given by name.
F: save to a local file with the name given by name.
S: return the document as a string.

You can see my code sets the destination value as F:

$pdf->Output("./pdfs/example.pdf", "F");

This tells TCPDF to save the dynamically generated PDF document in the pdfs folder with the name example.pdf. On Windows it’s not needed, but on Unix-based machines you will need to set appropriate permissions on the pdfs (or whatever) folder to allow TCPDF to write the PDFs to it.

A little tip when you’re developing locally (as opposed to directly on your webhost) and using ‘F’ for the destination parameter: create your PDFs with a random filename so you can simply press refresh on the script that does the PDF creation. If you have a static filename as I do in this example (example.pdf) and the last generated PDF file (also example.pdf) is still open, TCPDF will not be able to write the new PDF, because the file is already open and a sharing violation error will occur internally. What I often use for random filenames during development is sha1(microtime()); this means that to check changes I just need to press refresh on my PHP script and then visit my PDFs folder, without having to close previous versions of my PDF.
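Here is that tip as a small helper, with the permissions point from above folded in as a check. The function name and the writability check are illustrative additions; the naming scheme is the sha1(microtime()) approach just described, so each request gets a new file name.

```php
<?php
// Development helper for the sha1(microtime()) tip: returns a PDF path in
// $dir whose name changes with the current time. Name is illustrative.
function dev_pdf_path(string $dir): string
{
    // On Unix the web server user needs write permission on this folder.
    if (!is_dir($dir) || !is_writable($dir)) {
        throw new RuntimeException("$dir is missing or not writable");
    }
    return rtrim($dir, '/') . '/' . sha1(microtime()) . '.pdf';
}

// Usage with the example from the post:
// $pdf->Output(dev_pdf_path('./pdfs'), 'F');
```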

S is useful if you want to send the PDF as an email attachment without first saving it to disk.
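As a rough sketch of the S route: the string returned by $pdf->Output('', 'S') can be base64-encoded and wrapped in a multipart/mixed MIME message for PHP’s mail(). The helper name, boundary format, charset and message text are assumptions; for anything serious a mail library would handle the MIME details for you, and mail() itself needs a working mail setup on the server.

```php
<?php
// Sketch: wrap PDF bytes in a multipart/mixed MIME message for mail().
// Returns [$headers, $body]; helper name and boundary format are assumptions.
function pdf_mail_parts(string $pdfBytes, string $filename, string $text): array
{
    $boundary = 'pdf-' . sha1(uniqid('', true));
    $headers = "MIME-Version: 1.0\r\n"
             . "Content-Type: multipart/mixed; boundary=\"$boundary\"";
    $body = "--$boundary\r\n"
          . "Content-Type: text/plain; charset=UTF-8\r\n\r\n"
          . $text . "\r\n"
          . "--$boundary\r\n"
          . "Content-Type: application/pdf; name=\"$filename\"\r\n"
          . "Content-Transfer-Encoding: base64\r\n"
          . "Content-Disposition: attachment; filename=\"$filename\"\r\n\r\n"
          . chunk_split(base64_encode($pdfBytes)) // 76-char lines ending in \r\n
          . "--$boundary--\r\n";
    return [$headers, $body];
}

// Usage:
// list($headers, $body) = pdf_mail_parts($pdf->Output('', 'S'), 'report.pdf', 'Your report is attached.');
// mail('client@example.com', 'Monthly report', $body, $headers);
```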

Both I and D allow you to access the PDF quickly via the browser. A note about these two lads: Internet Explorer often looks at the extension of the file (which will be .php) and assumes that the output will be HTML, so instead of presenting a PDF it will likely show a load of binary data in the webpage itself, which obviously is not what you want. Firefox handles both I and D perfectly, so I recommend using it during development. You obviously need to keep this in mind when you go into production, as your users might have the same problem. It might be an idea to save to disk first, provide a link to the PDF and then periodically purge your temp PDFs folder.
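The ‘periodically purge your temp PDFs folder’ part can be a few lines of PHP run from cron or at the top of the PDF script. This is a minimal sketch: the function name and the one-hour limit in the usage line are arbitrary choices, and it only touches files ending in .pdf directly inside the given folder.

```php
<?php
// Sketch: delete *.pdf files in $dir older than $maxAgeSeconds.
// Returns how many files were deleted. $now can be injected for testing.
function purge_old_pdfs(string $dir, int $maxAgeSeconds, ?int $now = null): int
{
    $now = $now ?? time();
    $deleted = 0;
    foreach (glob(rtrim($dir, '/') . '/*.pdf') ?: [] as $path) {
        // filemtime() is the last modification time of the generated PDF
        if ($now - filemtime($path) > $maxAgeSeconds && unlink($path)) {
            $deleted++;
        }
    }
    return $deleted;
}

// Usage: purge_old_pdfs('./pdfs', 3600); // remove PDFs older than an hour
```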

I guess you could say that was kind of an introduction to TCPDF; my own introduction to it came from the TCPDF examples page. Thanks to Nicola Asuni for all his hard work on the examples, and on TCPDF itself of course.

At this stage I’m really only learning TCPDF myself, so I can’t really comment on its real power yet. I’ve come across a couple of issues using it so far, but each one had a workaround. I imagine the commercially available libraries will outdo it, but for a library that’s free and relatively easy to use I offer my closing statement as… so far so good.

12 Lorcan Crescent, Santry, Dublin 9, Ireland +353 87 9807629