Mint Forum

Crawlers Pepper

mls
Minted
Posted on Mar 06, '07 at 07:28 am

I think I’ll have it ready for another round of testing later today or tomorrow, so if you’re interested send me an email and I’ll send you a copy as soon as I can.

admin [at] mlslatest.com

JCFP
Minted
Posted on Mar 06, '07 at 09:49 am

mls,

do you have a website for us to keep track of this pepper development?

it sounds like a great pepper.

plz see email

mls
Minted
Posted on Mar 06, '07 at 09:21 pm

Public Beta for everyone: mlslatest.com/pepper

If you have any problems please post here.

JCFP
Minted
Posted on Mar 06, '07 at 09:36 pm

great job, mls.

just want to know if the redirect bug has been fixed in this beta?

also, as a feature request, is it possible to have a pane showing the subtotals for the various search engine crawls (not just individual pages), with a subpanel that allows for comparisons between different time periods (e.g., vs. past 24 hours, 48 hours, 1 week, etc)?

mls
Minted
Posted on Mar 06, '07 at 09:44 pm

Yep, found the problem… it appeared to be my (and some of the testers’) placement of the tracking code.

Basically, my site was being redirected because it was attempting to use the Mint database for all of its regular database queries. This was due to the positioning of the tracker include. As long as you place your tracker code in the correct spot you’ll have no problems.

I explained it in detail in the readme, but you can read about it here: http://php.net/manual/en/ref.mysql.php

JCFP
Minted
Posted on Mar 06, '07 at 10:03 pm

mls,

is there any possibility to enable your tracker by non-php methods? i ask this because the blog software i am using, by default, does not allow php code. though it is possible to turn that feature on, for security reasons we have not done so for all of the templates that run the blog. thus, for now, i am only able to track pages that have php-enabled templates.

just a thought…

mls
Minted
Posted on Mar 06, '07 at 10:08 pm

It is possible to track via JavaScript, but most crawlers and robots do not have JS enabled… thus you’re not going to be able to track them at all.

I believe User Agent 007 tracks crawlers with JS enabled.

JCFP
Minted
Posted on Mar 06, '07 at 10:20 pm

mls,

i read your post on php.net, can you clarify exactly where i need to insert your php code?

i have a blog that uses templates, whereby a master template has multiple embedded templates, each of which handles the header, the banner, the body, the footer, the sidebar etc…

i take it that i will need to insert your code into my embedded template that handles my header since this is the first template. do i put your php code at the beginning above all other code?

also, from the perspective of a search engine or anyone else looking at the referrer header, apache logs, etc., will there be any change with and without your php code (i.e., any alteration AT ALL)?

thanks in advance for the clarification…

mls
Minted
Posted on Mar 06, '07 at 10:32 pm

The best place for it would probably be in the header.php-type file, instead of the template file… but it may also work in the header template depending on how the blog software is coded.

You can always place it in your blog template and visit your site with the User Agent Switcher Firefox extension to make sure everything is working correctly.

I’ll try and look into popular blog software and CMS software and figure out the best place to place it.

JCFP
Minted
Posted on Mar 06, '07 at 10:36 pm

mls,

that is great… if you can post your suggestions for the following blog software:

  1. wordpress
  2. expressionengine
  3. movable type

much appreciated… these should cover most bloggers…

I’ve just installed it on my Expression Engine site. But I’m unsure what you mean by before any other db calls.

Does it need to be within the head tag? or can we place it before the head tag?

I’ve set Mint up to insert itself just before my <head> tag. I use an EE call to get a page title, so would it be best to put this straight after the </head> tag?

Am I right in thinking I could add this using an .htaccess rule?

Something like:

AddType application/x-httpd-php .html .htm
php_value auto_prepend_file /home/me/mydomain.com/mint/pepper/mlslatest/crawlers/tracker.php

mls
Minted
Posted on Mar 07, '07 at 07:16 am

@David Webb

It does not need to be within the <head> tag. It should be placed within the header file, or some sort of PHP file that is included throughout your site. I will look into placement for WP, ExpressionEngine, and Movable Type later today.

What I mean by DB calls is that you want the tracker code to execute before any other MySQL queries are made on every page of your site. Otherwise, the last database that was connected to (in this case the Mint DB) will be used for the rest of the queries within your site. For example, your blog may try to grab all the posts to display from the Mint database instead of using its own.

This would only be an issue if ExpressionEngine does not set a link_identifier for its MySQL functions.
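To make the ordering issue concrete, here is a toy model of "use the last opened connection when no link is given". It is written in Python purely for illustration and is not the pepper's or ExpressionEngine's code; `connect`, `query`, `run_tracker`, and the database names are all hypothetical stand-ins.

```python
# Toy model of implicit "last opened connection" semantics.
# All names are hypothetical; this is not Mint or ExpressionEngine code.

_last_link = None  # the most recently opened connection


class Link:
    def __init__(self, database):
        self.database = database


def connect(database):
    """Open a connection and remember it as the implicit default."""
    global _last_link
    _last_link = Link(database)
    return _last_link


def query(sql, link=None):
    """Run a query on `link`, or on the last opened link if none is given."""
    target = link if link is not None else _last_link
    if target is None:
        raise RuntimeError("no open connection")
    return f"{target.database}: {sql}"


def run_tracker():
    """Stand-in for the tracker include: it opens the Mint database."""
    mint = connect("mint_db")
    query("INSERT crawl record", mint)


def page_with_tracker_first():
    run_tracker()
    connect("blog_db")            # the blog connects afterwards
    return query("SELECT posts")  # implicit link is the blog's


def page_with_tracker_late():
    connect("blog_db")
    run_tracker()                 # Mint is now the last opened link
    return query("SELECT posts")  # implicit link is Mint's: wrong database


def page_with_explicit_link():
    blog = connect("blog_db")
    run_tracker()
    return query("SELECT posts", blog)  # explicit link avoids the problem
```

In this model only `page_with_tracker_late` sends the blog's query to the Mint database, which matches the two conditions described above: the tracker runs after the site has connected, and the site's queries do not pass an explicit link.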

@giginger

Great idea. That should work, but I have not tested it myself. I’ll test it out a little later on today when I get a chance.

If anyone is unsure about how the pepper will function on your site you could always place the tracker code on a private page and test it out using the User Agent Switcher. If everything on the page is operating normally you should be able to add it site-wide.

I’ll give it a whirl and let you know if I break anything.

Well I tried it but it seemed to remove my mint include. I was doing it wrong somehow. I’ll wait until somebody more knowledgeable does it :)

OK. so if it doesn’t need to be called in any specific location (I assume because only crawlers see it), how does it or does it not get called first, last, or anything else?

For expression engine I’d suggest putting the call in the main index.php file on your server. This way it’s included in every page that your site serves.

I call both my mint include and my google analytics include via this page, but both get placed in a specific location on the delivered page. As I said I’m not totally understanding where this include is placed on the delivered page or if it’s even visible to joe public.

mls
Minted
Posted on Mar 07, '07 at 03:19 pm

The thing about the include is that it is not visible to anyone. PHP is executed server-side as opposed to by the browser itself. A regular user cannot see anything, nor can a crawler. In a way it is transparent tracking. This allows it to track ANY visit to your website. Regular users will still have the file included, but of course they won’t see anything, and nothing will be recorded since the user agent string will not match that of a crawler.

I just took a look at Expression Engine and if you place the code in index.php minus the <?php and ?> tags you should be tracking with no problems. Just be sure to place it somewhere near the top; right before this line should work:
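As a rough illustration of the matching described here, the sketch below records a visit only when the user agent string contains a tracked crawler signature. It is Python written for this example, not the pepper's code; the signature list and the `match_crawler` and `track` helpers are assumptions.

```python
# Hypothetical list of tracked crawler signatures; the pepper's real list
# is not shown in this thread.
TRACKED_CRAWLERS = ("googlebot", "slurp", "msnbot")


def match_crawler(user_agent, tracked=TRACKED_CRAWLERS):
    """Return the first tracked signature found in the user agent, else None."""
    ua = (user_agent or "").lower()
    for signature in tracked:
        if signature in ua:
            return signature
    return None


def track(user_agent, page, log):
    """Append (crawler, page) to `log` only for tracked crawlers.

    Nothing is added to the response either way, so regular visitors and
    crawlers receive exactly the same page.
    """
    crawler = match_crawler(user_agent)
    if crawler is not None:
        log.append((crawler, page))
    return crawler is not None
```

A browser user agent falls through without being logged, while a string containing `Googlebot` is recorded against the requested page.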

// DO NOT EDIT BELOW THIS!!!

If anyone is running some of the popular blog software and finds a place that it will work with no problems please share where you placed it and I’ll include some instructions in the next release.

Cheers for the clarification. I’ve popped the call in both my index files. So we’ll see if it works in a couple of days i guess.

For Wordpress: I updated the WP-Mint file to put it in right where it puts the .js file in (ie edited the wp-mint.php file and added it right below the js include line). Which puts it in the header, seems to work fine for me right now.

mls
Minted
Posted on Mar 07, '07 at 08:22 pm

I’ve posted an optional, but recommended update for my pepper. Crawlers version 0.6. It simply adds an extra feature that I feel might help some people out.

You can grab v0.6 here

I’ve added the ability for you to “simulate” exactly what a crawler will see when it visits a page on your site. By adding a simple query to the end of your site’s URL from within your browser you can view the site just as a crawler would.

To simulate what a crawler will see when they visit your website add ?crawler_observe=string to the end of various page URLs in your browser. For example entering the URL: yourdomain.com/page.php?crawler_observe=googlebot

will simulate what Googlebot will see when it crawls page.php.

It doesn’t really matter what string you choose as long as it is one of the crawlers that you are tracking. The only purpose of setting a string is so that your visit will actually get recorded. You can clear your test visits in the preferences pane by ticking the new option that I have added.
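One way to picture the override is the sketch below, which reads `crawler_observe` from the URL and accepts it only if it names a tracked crawler. It is a Python illustration under assumptions, not the pepper's implementation; the tracked list and the `observed_crawler` helper are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical list of tracked crawlers; substitute your own.
TRACKED_CRAWLERS = ("googlebot", "slurp", "msnbot")


def observed_crawler(url, tracked=TRACKED_CRAWLERS):
    """Return the tracked crawler named by ?crawler_observe=..., else None."""
    params = parse_qs(urlsplit(url).query)
    values = params.get("crawler_observe", [])
    if not values:
        return None
    requested = values[0].lower()
    # Only a string naming a tracked crawler causes the visit to be recorded.
    return requested if requested in tracked else None
```

With these assumptions, `yourdomain.com/page.php?crawler_observe=googlebot` is treated as a Googlebot visit, while a URL without the parameter, or with a string that is not in the tracked list, is not.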

I plan to release v1.0 within the next few weeks.

Hey, just checked my crawlers pane and the site’s been crawled by three bots. As you can see below.

Now if you hover over a link it’s giving the full link as http://mint.photography-of-rock.com/……

Which is obviously wrong. I run mint from a sub-domain of my site. But the site runs from photography-of-rock.com and galleries.photography-of-rock.com.

The page I’ve highlighted should actually be http://galleries.photography-of-rock.co … images/229

I assume it’s your pepper generating the wrong URL, purely because the crawlers will have no knowledge of mint or where I’ve put it.

Hope this is an easy fix. It could also be nice to see the page titles as well as the urls.

SDJL
Minted
Posted on Mar 08, '07 at 06:40 am

If I use the ?crawler_observe=googlebot appended to any URL on my domain, I get the following error:

Fatal error: Call to a member function logErrorNote() on a non-object in /home/sdjl/public_html/mint/app/lib/pepper.php on line 56

David

mls
Minted
Posted on Mar 08, '07 at 07:33 am

@David Webb

Thanks for pointing that out, I’ve fixed the problem and will have it included in the next release. I’ve also done some work on recording page titles and hopefully I’ll be able to get that out soon.

@SDJL

Thanks for the details, I’ll look into it and try to find a fix for the next release.
