Author Topic: [module] Google sitemap.xml Generator  (Read 134728 times)

Offline viulian

  • Posts: 53
[module] Google sitemap.xml Generator
« on: November 10, 2005, 12:28:17 AM »
Hi guys!

Introduction:

This module creates the sitemap.xml file required by Google Sitemaps; for more information about Google Sitemaps, please visit: http://www.google.com/webmasters/sitemaps/docs/en/about.html

Observations:

  • It is my first module :) but I have tested it on both Linux and Windows and it works (I tested only with WB 2.5.2).
  • The sitemap.xml validates successfully using xsv.
  • sitemap.xml is generated in the root of the WB installation, for example http://server/wb/sitemap.xml. Please make sure that file is writeable by the webserver.
  • If you want to uninstall the module, please make sure to first remove the page you created. Otherwise an error with "already in use" will appear.
  • It will only generate links for the visibility='public' pages. I have not yet added support for directories in the gallery (I use PicKLE)... but I intend to do that.
  • The daily/weekly/monthly/yearly update frequencies are computed from each page's last modification time.
  • As WB doesn't have (or at least I did not see it in the database) any info about the visitors for each page, all the pages get a priority of 1.0. (I would have used the visitor counts to generate priorities on the fly.)
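
The observations above can be sketched roughly as follows. This is not the module's PHP code, only a minimal Python illustration: the page dictionaries, the `build_sitemap` name and the sitemaps.org 0.9 namespace are assumptions (the 2005 module targeted Google's schema of that time).

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace, not the schema the module used in 2005.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages, base_url):
    """Return sitemap XML (as str) for the pages whose visibility is 'public'."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if page["visibility"] != "public":
            continue  # hidden and private pages are left out
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + page["link"]
        ET.SubElement(url, "lastmod").text = page["modified"]  # YYYY-MM-DD
        # No per-page visitor statistics, so every page gets the same priority.
        ET.SubElement(url, "priority").text = "1.0"
    return ET.tostring(urlset, encoding="unicode")
```

Writing the returned string to `sitemap.xml` in the installation root is then a plain file write, which is why that file has to be writable by the web server.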

Steps to install and use:

1. Install the module
2. Create a hidden page of type Google sitemap.xml Generator (or a private page for the Administrators of the same type)
3. You can then access http://server/wb/pages/[abovepage].php (if the page is hidden you can access it directly; if it is private you first have to log in as admin).
3.1 You can also generate the sitemap.xml by modifying the module page you created. There's a link there you can use to invoke the page itself and generate the file. (Sorry for this, but I did not figure out how to create an admin/administrator module :) so I had to create a page to get this done quicker.)

Finally, for any other questions/problems, please post here.

Change Log

Version 1.0.4
- menu links that point outside the site (external links) are now skipped.

Version 1.0.3
- fixed google resubmission request (sitemap path was badly formed when composing the request)

Version 1.0.2
- links to news posts (not only to news pages as it was in 1.0). Only active posts are included.
- http resubmission of the sitemap.

Version 1.0
- original version.
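
For the 1.0.4 entry above (skipping menu links that point outside the site), a check along these lines would do. It is a sketch only; `is_external` is a hypothetical helper and not the module's actual code.

```python
from urllib.parse import urlparse


def is_external(link, site_url):
    """Return True if link names a host other than the site's own host."""
    host = urlparse(link).netloc
    # Relative links have no host part and therefore count as internal.
    return bool(host) and host.lower() != urlparse(site_url).netloc.lower()
```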

The module archive:

The latest version of the gsitemap module for WB 2.5.2 is attached in this post.


[attachment deleted because of being too old]
« Last Edit: February 20, 2006, 04:27:04 PM by viulian »

Offline i2Paq

  • Posts: 524
  • Gender: Male
  • Tempelier, on bare feet!
Re: [module] Google sitemap.xml Generator
« Reply #1 on: November 10, 2005, 08:02:47 AM »
OK, I've installed your module and when I create the page it says

Quote
Hi! You cannot modify anything on this page :)
Please click on the link below:

http://wb252ml.opensourcebakery.nl/pages/google.php

to generate the /var/www/wb252ml/sitemap.xml file.

Please make sure that the file is writtable by the web server.

When I go to the link mentioned it says

Quote
Done in 0 seconds.

Now what?
Opensource is my life, but then elsewhere.

Offline viulian

  • Posts: 53
Re: [module] Google sitemap.xml Generator
« Reply #2 on: November 10, 2005, 08:21:40 AM »
That means success :) and you should be able to access the sitemap file at:

http://wb252ml.opensourcebakery.nl/sitemap.xml

You have to create an account with Google and submit the URL above as your sitemap; Google will then start crawling all the links on your site.

Offline i2Paq

  • Posts: 524
  • Gender: Male
  • Tempelier, on bare feet!
Re: [module] Google sitemap.xml Generator
« Reply #3 on: November 10, 2005, 09:31:04 AM »
What if things (pages) change, get deleted or added to my site? Do I just run http://wb252ml.opensourcebakery.nl/pages/google.php again?

Offline viulian

  • Posts: 53
Re: [module] Google sitemap.xml Generator
« Reply #4 on: November 10, 2005, 09:52:04 AM »
Yes.

Another idea is to set up a cronjob so that the page is accessed every day at midnight - then it will run automatically. I don't usually run the sitemap daily because my pages are static rather than dynamic content.
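
A small script for such a cronjob could look like the sketch below. The function name and the injectable `opener` parameter are my own choices for illustration; the only thing it does is request the generator page.

```python
import urllib.request


def regenerate_sitemap(page_url, opener=urllib.request.urlopen):
    """Request the generator page and return the HTTP status code.

    Only works unattended for a hidden page; a private page would
    require an admin login first.
    """
    with opener(page_url) as response:
        return response.status
```

Calling `regenerate_sitemap("http://server/wb/pages/google.php")` from a script and scheduling that script with a crontab entry such as `0 0 * * * python3 /path/to/script.py` (path is a placeholder) would rebuild the sitemap every midnight.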

I have to search more about modules that are "admin", maybe those can be triggered automatically whenever the content changes - hence regenerating the sitemap on the fly.

Offline i2Paq

  • Posts: 524
  • Gender: Male
  • Tempelier, on bare feet!
Re: [module] Google sitemap.xml Generator
« Reply #5 on: November 10, 2005, 10:32:32 AM »
Well, most of the sites I build have static pages, so an automatic feature is not needed.

If the other "mod" builders could have a look at your contribution and maybe have their way with it we could make it a completed module.

I'll wait for their opinion and then upgrade the rest of my sites.

Btw, I guess that when you need a manual install (PHP safe-mode) you just copy the files into \yoursite\modules\gsitemap and run install.php?
« Last Edit: November 10, 2005, 02:05:19 PM by i2Paq »

Offline viulian

  • Posts: 53
Re: [module] Google sitemap.xml Generator
« Reply #6 on: November 10, 2005, 09:21:36 PM »
Well, I have not tried it, so I can't be sure...

However, install.php is almost empty (it is just the basic two-line install.php; the module does not need settings to be stored in the database) and I don't use other files outside its directory.

So I assume you can just copy it into its own directory, and that's as good as it gets :)
« Last Edit: November 10, 2005, 09:23:36 PM by viulian »

Woudloper

  • Guest
Re: [module] Google sitemap.xml Generator
« Reply #7 on: November 10, 2005, 11:47:10 PM »
Sounds like a nice module that is good for working on SEO, but would it also be possible to add support for the following option: "Resubmitting using the My Sitemaps page or using an HTTP request" which is described on this page over at Google....

upd: Would it also be possible to customize the fields:


per page? Furthermore I see Google asks for the files to be compressed. Would it be possible to work with .gz compression for the sitemap.xml? And one other thing... Do you also convert the special characters (like: & etc.) correctly, as mentioned by Google?

In the end I would like to say I like the module and encourage people to work on more and more modules for WebsiteBaker.
« Last Edit: November 11, 2005, 12:07:20 AM by Woudloper »

Offline viulian

  • Posts: 53
Re: [module] Google sitemap.xml Generator
« Reply #8 on: November 11, 2005, 12:25:08 AM »
Hi!

Thanks for the input :)

1. Resubmitting is doable of course, after creating it.

2. Change frequency is updated based on the last modification time of the page. That is
  • if a page was last modified today, then change frequency is daily
  • if a page was modified between a couple of days and 7 days ago, change freq is weekly
... and so on, till yearly.
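
In code form the rule reads roughly like this. It is a sketch: "today" is approximated as "less than one day ago", and the 30-day cut-off between monthly and yearly is my assumption, since only the daily and weekly boundaries are spelled out above.

```python
from datetime import datetime, timedelta


def change_frequency(modified, now):
    """Map the age of a page's last modification to a changefreq value."""
    age = now - modified
    if age < timedelta(days=1):
        return "daily"
    if age <= timedelta(days=7):
        return "weekly"
    if age <= timedelta(days=30):  # assumed cut-off
        return "monthly"
    return "yearly"
```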

2.1 Priority... Frankly, I wrote this module first for SMF - and there I used the visitor counts (for >1000 per thread, I declared it as priority 1.0 and for ~500 visitors as priority 0.5). I've been thinking about how to compute the priority on the fly for a WB page, but I found nothing I can use.
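
The SMF approach amounts to scaling the view count into the 0.0 to 1.0 range. A sketch, assuming a simple linear scale that reaches 1.0 at 1000 views (the exact formula is not the one from the SMF version, just something that matches the two figures given):

```python
def priority_from_views(views, full_at=1000):
    """Scale a view count linearly to a sitemap priority between 0.0 and 1.0."""
    # Anything at or above `full_at` views is capped at the maximum priority.
    return round(min(1.0, max(0, views) / full_at), 1)
```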

What do you guys think of a new column/page in the database, where the admin can enter a priority value?

3. I already have the code for .gz :) I just haven't added it, so that the .xml can still be read in a browser.
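
For reference, the compression step itself is small. This is a Python sketch with hypothetical function names, not the PHP code mentioned above:

```python
import gzip


def gzip_sitemap(xml_text):
    """Return the gzip-compressed bytes of a sitemap string."""
    return gzip.compress(xml_text.encode("utf-8"))


def save_gzipped(xml_text, path):
    """Write the compressed sitemap to path, e.g. sitemap.xml.gz."""
    with open(path, "wb") as fh:
        fh.write(gzip_sitemap(xml_text))
```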

4. For the escaping, you have a point. I use the page name as found in the WB database, and somehow I assumed that to be already escaped. I will add escaping too, to be certain  8-)
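
The escaping in question is entity escaping of the five XML special characters in each URL. A minimal Python sketch (the `escape_loc` name is made up for illustration):

```python
from xml.sax.saxutils import escape


def escape_loc(url):
    """Entity-escape &, <, >, double and single quotes for use inside <loc>."""
    # escape() handles &, < and > itself; the quotes are passed as extras.
    return escape(url, {'"': "&quot;", "'": "&apos;"})
```

This matters when the XML is assembled by string concatenation; a URL such as `page.php?x=1&y=2` otherwise produces an invalid file.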

LATEREDIT:

There's something in Google Sitemaps that I have not yet implemented, but it's only used when a huge number of pages have to be put into the sitemap.xml (you need to create multiple sitemaps if the sitemap.xml is larger than 10 MB or has more than 50,000 links).
But that's too time consuming for me to do, given the fact that only very few people will really need it.
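
For anyone who does need it, the splitting part is straightforward. The sketch below only handles the 50,000-link limit, not the file-size limit, and the `sitemapN.xml` naming is a hypothetical choice:

```python
MAX_URLS = 50_000  # per-file link limit mentioned above


def split_urls(urls, limit=MAX_URLS):
    """Split a list of URLs into chunks of at most `limit` entries."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


def index_entries(base_url, chunk_count):
    """Return the sitemap file URLs that a sitemap index would list."""
    base = base_url.rstrip("/")
    return [f"{base}/sitemap{n}.xml" for n in range(1, chunk_count + 1)]
```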
« Last Edit: November 11, 2005, 12:30:06 AM by viulian »

Offline zuccs

  • Posts: 15
Re: [module] Google sitemap.xml Generator
« Reply #9 on: November 14, 2005, 12:38:02 AM »
excellent mod mate! works very well.  :mrgreen:

one little suggestion though, how hard would it be to include all the news posts in the xml file output as well as the normal WYSIWYG type pages...? Most of my pages are based around news posts and not static content pages, as I am sure a lot of other websites are too.... if this is not too hard to include would you be kind enough to add it to your mod?

thanks

Offline viulian

  • Posts: 53
Re: [module] Google sitemap.xml Generator
« Reply #10 on: November 14, 2005, 10:19:01 AM »
Hi! (and thank you :) )

I have added a couple of news items on my site, ran the sitemap, and the news items do exist in sitemap.xml.

If you tried the module, please provide more information (like website, sitemap.xml, and a link to a news item that is not present in the .xml).

Thanks!

Offline zuccs

  • Posts: 15
Re: [module] Google sitemap.xml Generator
« Reply #11 on: November 15, 2005, 06:23:43 AM »
URL: Myspace Place

Sitemap: WB Sitemap

and none of the news items are present in the xml file. eg. Myspace Templates

thanks mate

Offline viulian

  • Posts: 53
Re: [module] Google sitemap.xml Generator
« Reply #12 on: November 15, 2005, 12:07:52 PM »
Hi!

I have updated the module, and added:
  • sitemap.xml HTTP resubmission - per Woudloper request.
  • news posts are now included in the sitemap.xml - per zuccs request.

Please download the attachment from the first post in the thread :)

I will try to include the other requests too in the next updates.

Offline zuccs

  • Posts: 15
Re: [module] Google sitemap.xml Generator
« Reply #13 on: November 16, 2005, 05:43:47 AM »
amazing!

thanks for the mod update mate, keep up the good work.

mangione

  • Guest
Re: [module] Google sitemap.xml Generator
« Reply #14 on: January 04, 2006, 05:47:41 PM »
Wow, this sitemap creator is really nice. I solved an old problem with it. Thanx a lot.

Offline bupaje

  • Posts: 570
    • http://www.stormvisions.com
Re: [module] Google sitemap.xml Generator
« Reply #15 on: January 08, 2006, 07:23:58 PM »
Hi. I installed this and get

Done in 0 seconds.

Submitting sitemap to Google...
Done.
Answer from Google is:
--------------------------------------------------------------------------------
HTTP/1.0 400 Bad Request Cache-control: private Content-Length: 145 Date: Sun, 08 Jan 2006 18:22:22 GMT Content-Type: text/html Server: GFE/1.3 Connection: Keep-Alive
Bad Request
Error 400
Done.


any idea what I need to do?
My Blog, My Site

Offline viulian

  • Posts: 53
Re: [module] Google sitemap.xml Generator
« Reply #16 on: January 10, 2006, 01:15:39 PM »
Well, is it still giving errors?

I am wondering whether this is a temporary issue with Google... or maybe you have an older version of PHP and it didn't form the request properly?

The code is actually very simple, it shouldn't behave badly... Please PM (or post online) the link to your sitemap and I'll try to figure it out.

Thanks

Offline bupaje

  • Posts: 570
    • http://www.stormvisions.com
Re: [module] Google sitemap.xml Generator
« Reply #17 on: January 10, 2006, 10:54:50 PM »
The xml file is correctly created here



the page is here



what permissions do I need to set the xml file to? I tried 775 but it didn't work, so I reset it to 664 for now as I don't really understand all those permissions.

Thanks for the mod and any help you can offer.

« Last Edit: April 29, 2009, 05:22:49 AM by bupaje »

Offline viulian

  • Posts: 53
Re: [module] Google sitemap.xml Generator
« Reply #18 on: January 11, 2006, 12:17:09 AM »
Are you familiar/comfortable with editing .php files?

If so, please find the modules/gsitemap/view.php file and search for this line
Code:
[code not preserved in this copy of the thread]
and add two lines like below:
Code:
[code not preserved in this copy of the thread]
After this, please save / rerun the sitemap generator and post the output (or let me know that you made the change and I will access the google.php page again).

The things I suspect are:
1. the $sitemapURL is not computed properly (WB_URL might not be set correctly) but this is unlikely because otherwise the forum would not work.
2. the urlencode function of PHP returns an empty result for a valid $sitemapURL

If you are not comfortable with it, please wait till tomorrow, when I will release a new version of the module, because I just found a bug in the code that breaks automatic submission to Google.
The sitemap path is not computed properly in the request. But please go ahead and add those two lines anyway, we want to make sure urlencode works for you.
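
To show what a well-formed request looks like, here is a Python sketch of how the ping URL is put together. The endpoint is the one Google documented for HTTP resubmission at the time of this thread and should be treated as an assumption today; `build_ping_url` is a hypothetical helper, not code from view.php.

```python
from urllib.parse import quote

# Assumption: ping endpoint as documented at the time; it may no longer exist.
PING_ENDPOINT = "http://www.google.com/webmasters/sitemaps/ping"


def build_ping_url(site_url, sitemap_name="sitemap.xml"):
    """Build the resubmission URL with the sitemap address fully URL-encoded."""
    sitemap_url = site_url.rstrip("/") + "/" + sitemap_name
    if not sitemap_url.startswith(("http://", "https://")):
        # A malformed or empty sitemap address is what leads to a 400 Bad Request.
        raise ValueError("sitemap URL must be absolute: %r" % sitemap_url)
    return PING_ENDPOINT + "?sitemap=" + quote(sitemap_url, safe="")
```

Printing the sitemap URL before and after encoding is the same kind of check the two debug lines are meant to provide.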

About the bug, the line:

Code:
[code not preserved in this copy of the thread]
should in fact be (uppercase S in $fSep):

Code:
[code not preserved in this copy of the thread]
I will release a new version tomorrow - thanks for letting me know about your problem :) otherwise I would not have found this new issue.

Offline bupaje

  • Posts: 570
    • http://www.stormvisions.com
Re: [module] Google sitemap.xml Generator
« Reply #19 on: January 11, 2006, 12:52:50 AM »
Hi. Thanks very much. Changing the case of the s in $fsep to $fSep fixes the problem - thanks. I left the page with the two lines there in case it provides useful info for you and will delete those lines later.

Thanks again!

Woudloper

  • Guest
Re: [module] Google sitemap.xml Generator
« Reply #20 on: January 11, 2006, 11:12:34 PM »
The installation of the module fails (on my Windows system). I solved it by recreating the .zip file. Your file contains a folder structure for the general files (view, add, etc.) and this causes an error.


suggestion: Furthermore I was wondering why you solved the generation and submitting of the 'sitemap.xml' file via a front-end page?

  • Couldn't this be handled via the admin section as this can be much nicer and
  • create a dummy page for the frontend of the website

I solved something similar for resubmitting via the admin console in the 'flickr gallery module'. There a link is available which loads a popup that handles the resubmitting and so on, maybe this is also something you can use...

Maybe you can even integrate this in the existing sitemap module, but I don't know if that is a wise decision....

Offline viulian

  • Posts: 53
Re: [module] Google sitemap.xml Generator
« Reply #21 on: January 13, 2006, 01:55:16 AM »
I will retest too on my Windows machine (I just got it reinstalled) so it will take a couple of days till I get to this too.
I kept the same folder structure as I did for the version 1.0 which did install ok on WB 2.5.2.. Are you using 2.5.2 or a newer version ?

About your suggestions:
- I like the idea of launching the gsitemap from the admin section, but I did not know how to implement an admin module or a section to add there..
- It doesn't come easy for me to do the popup that handles resubmission (I mean I never did it before) so that's why the clumsy solution with the front-end page (I mean the page can be hidden from the normal users and only be seen by administrators) but.. it was easier for me :-D that way.

About the integration you mentioned, I am OK with it - the code for this Google sitemap module is actually very simple and can be integrated easily. However, some people (like me) might only want an SEO module and not use a sitemap module at all.

nelsoni

  • Guest
Re: [module] Google sitemap.xml Generator
« Reply #22 on: January 16, 2006, 10:19:05 AM »
Hi guys - I'm new to this, could you tell me why I get this error when I try to install the module?

Warning: main(/home/fhlinux205/s/solwaycoastrally.co.uk/user/htdocs/temp/unzip/info.php): failed to open stream: No such file or directory in /home/fhlinux205/s/solwaycoastrally.co.uk/user/htdocs/admin/modules/install.php on line 60

Warning: main(/home/fhlinux205/s/solwaycoastrally.co.uk/user/htdocs/temp/unzip/info.php): failed to open stream: No such file or directory in /home/fhlinux205/s/solwaycoastrally.co.uk/user/htdocs/admin/modules/install.php on line 60

Fatal error: main(): Failed opening required '/home/fhlinux205/s/solwaycoastrally.co.uk/user/htdocs/temp/unzip/info.php' (include_path='.:/usr/share/pear') in /home/fhlinux205/s/solwaycoastrally.co.uk/user/htdocs/admin/modules/install.php on line 60


Ian N.

Woudloper

  • Guest
Re: [module] Google sitemap.xml Generator
« Reply #23 on: January 16, 2006, 10:44:21 AM »
As mentioned in the post above and in another post on the forum, this has to do with the way the new version of WB handles subdirectories in .zip files. To solve this you need to unzip and rezip the files and make sure the zip file contains no folder structure...
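
If you prefer not to unzip and rezip by hand, the same thing can be scripted. This is a sketch with a hypothetical `flatten_zip` helper; it assumes no two files in the archive share the same name, since the folders are dropped.

```python
import posixpath
import zipfile


def flatten_zip(src, dst):
    """Copy every file from src into dst with its folder path stripped."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            if info.is_dir():
                continue  # skip the folder entries themselves
            # Zip member names always use forward slashes.
            name = posixpath.basename(info.filename)
            zout.writestr(name, zin.read(info))
```

Both arguments can be file paths, for example `flatten_zip("gsitemap.zip", "gsitemap_flat.zip")` (file names are placeholders).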

Offline viulian

  • Posts: 53
Re: [module] Google sitemap.xml Generator
« Reply #24 on: January 16, 2006, 10:58:32 AM »
As people are probably starting to use WB 2.6, I have made a zip file with no directory structure and attached it to the first post in the thread. Due to the limit of one attachment per post, I am attaching the older version for WB 2.5.2 here :)




[deleted by Administrator]
« Last Edit: February 20, 2006, 04:28:19 PM by viulian »