WebsiteBaker Community Forum

WebsiteBaker Support (2.12.x) => General Help & Support => Topic started by: noname8 on April 22, 2019, 06:13:02 AM

Title: Create mirror from website baker
Post by: noname8 on April 22, 2019, 06:13:02 AM
I would like to create a mirror of my WebsiteBaker website.
I would host this mirror on a static-only server: no PHP support, no MySQL.

To do this, I'm thinking of some kind of PHP curl crawler script that I'd run every time I update content on the main Baker site through the admin, about once a month. The mirror would need a different root URL but the same page structure, with the .php pages converted to .html, and the whole media dir copied as well.

Has anybody done this? I don't know where to start...

Thanks for your help
Title: Re: Create mirror from website baker
Post by: hgs on April 22, 2019, 07:28:23 AM
Did I get that right?
You want to create a backup of your WebsiteBaker live page via script?

My hoster offers this for the webspace and the database. I have it done automatically once a week by a cronjob, so my "mirror" is created automatically. A 2nd cronjob then automatically deletes all data older than 15 weeks.
Title: Re: Create mirror from website baker
Post by: ruebenwurzel on April 22, 2019, 09:41:07 AM
Hello,

try this here:

http://www.httrack.com/

Matthias
Title: Re: Create mirror from website baker
Post by: noname8 on April 23, 2019, 02:13:00 AM
Thank you for replies

The mirror I would like to create should be .htm only, without .php and MySQL, so a plain clone does not work.
I have used HTTrack before, but I'm currently on a Mac, so I would like to have a bash script or a PHP curl job.
[edit: HTTrack seems to work on Mac too, but I would still prefer a quick script that I can run on the server]
Title: Re: Create mirror from website baker
Post by: Martin Hecht on April 23, 2019, 09:58:37 AM
maybe something like wget -m -p -D yourdomain yourstartpage
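A slightly fuller version of that idea, as a rough sketch only: the function name `mirror_site`, the `DRY_RUN` switch and the example.com values are placeholders and not from this thread. The extra wget options `--convert-links` and `--adjust-extension` are meant to rewrite links for local browsing and to save pages with an .html suffix; whether that is enough for every WebsiteBaker link is untested here.

```shell
#!/bin/sh
# Sketch: mirror a site into static files with wget.
# Usage: mirror_site HOST START_URL OUT_DIR
# Set DRY_RUN=1 to print the wget command instead of running it.
mirror_site() {
  host=$1
  start_url=$2
  out_dir=$3
  # Build the command as positional parameters so every argument stays intact.
  set -- wget --mirror --page-requisites --convert-links --adjust-extension \
    --no-parent --domains="$host" --directory-prefix="$out_dir" "$start_url"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Placeholder values; this only prints the command, nothing is downloaded.
DRY_RUN=1
mirror_site www.example.com https://www.example.com/ ./mirror
```

Unset `DRY_RUN` (or set it to 0) to actually run the download; the result ends up under the given output directory and can then be copied to the mirror host.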
Title: Re: Create mirror from website baker
Post by: evaki on April 23, 2019, 10:35:57 AM
Cache your pages and crawl once.
To renew a page:
After you have updated a page, delete its cached (html) file and crawl again  :-D

Your Template:
Code:
<?php
// Cache file for this page, named after its menu title.
// The /htmlout/ directory must already exist and be writable.
//$cacheFile = $_SERVER['DOCUMENT_ROOT']."/wb/pages/".constant('MENU_TITLE').".html";
$cacheFile = $_SERVER['DOCUMENT_ROOT']."/htmlout/".constant('MENU_TITLE').".html";
if (file_exists($cacheFile)) {
    // serve the cached copy to reduce database load
    header("Content-Type: text/html");
    readfile($cacheFile);
    exit;
} else {
    ob_start(); // start buffering so we can cache for future accesses
}
?>

<html>
<body>
<content>Hello, World!</content>
</body>
</html>

<?php
// get the buffer
$buffer = ob_get_contents();

// end output buffering; the buffer content is sent to the client
ob_end_flush();

// now we create the cache file
$fp = fopen($cacheFile, "w");
fwrite($fp, $buffer);
fclose($fp);
?>
Regards, Evaki
Title: Re: Create mirror from website baker
Post by: evaki on April 23, 2019, 11:02:13 AM
The other tasks can be solved with PHP
Title: Re: Create mirror from website baker
Post by: evaki on April 23, 2019, 04:17:28 PM
Alternative:
Code:
https://forum.WebsiteBaker.org/index.php/topic,14663.msg92478.html#msg92478
Title: Re: Create mirror from website baker
Post by: evaki on April 23, 2019, 07:02:11 PM
@ the helpers
Please ignore the suggestions I posted.
Background:
They came out of the "grab bag". That often turns out to be useful, but in this case it is "fit for the bin". I don't look at every piece that gets "passed over" to me, but this time I did, because it seemed suspiciously "simple".

You can take it as an idea; it only becomes more than that if you "tinker" with it a bit.
I tested it this afternoon. As an idea it is actually interesting. If you rewrite $cacheFile, for example as
$cacheFile=$_SERVER['DOCUMENT_ROOT']."/htmlout/".$row['link'].".html";
that is, give each page the matching file name (from the DB, $row['link']), it works well for files in wb_root. For submenus, however, it does not work "yet": the directories would have to exist beforehand. Once they do, that works too. Anyone who has time for this can extend/adapt it.

This would also be only part of the task, and it is questionable whether it would satisfy what the topic asks for. A whole string of follow-up problems hangs on it. Those can be solved, but an alternative approach might make more sense.

I would rather recommend a crawler script (the kind people probably only know from the early days).

Apologies again for uploading it untested.

Regards, Evaki
Title: Re: Create mirror from website baker
Post by: noname8 on April 24, 2019, 12:08:48 PM
Thank you for your replies. I decided to go with http://short.dev4me.nl/

On the mirror server I decided to have PHP support after all (but no SQL)
and to run the same script there, modified so that it fetches each page from the main site with curl, rewrites all the URLs with a string replace, and then caches the file. It works about 80% now; still a work in progress.
Title: Re: Create mirror from website baker
Post by: CodeALot on April 24, 2019, 08:54:41 PM
 :-o :-o

Short.php is in no way intended to create some kind of "mirror-without-a-database" from a WB website. It is a script that eliminates /pages/ from the URLs.
So how is it that you "decided to go with short.php"?
Title: Re: Create mirror from website baker
Post by: noname8 on April 25, 2019, 02:33:58 AM
I decided to go with short.php because
- all the links and the site structure in WB use .php URLs, so just mirroring it to .html cannot work
- WB also uses FULL (not relative) paths for every file on the site, and these need to be rewritten
- I like URLs of the form /mything better than /pages/mything.php, and maybe Google likes them more too

So dev4me's .htaccess + short.php already covered about 70% of the requirement. I tried the HTTrack website copier and that just didn't work: the copy ends up with broken links after a few clicks. It seems it cannot rebuild a WB site, at least not with .php as the original URLs.

On top of the dev4me solution I'm trying to implement a cache. I have already made a curl call that fetches pages from the main server and rewrites the URLs with the mirror domain and path; now it just needs caching and a few... a lot more tweaks.
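
For anyone who wants the URL rewriting step as a shell filter instead of PHP, here is a minimal sketch. It is not the script I am running: `rewrite_urls` and the two example.com hosts are made-up names, and it assumes page links look like MAIN/pages/name.php and that everything else (media, templates) only needs the domain swapped.

```shell
#!/bin/sh
# Sketch: rewrite absolute URLs of the main site so they point at the mirror.
# Usage: rewrite_urls MAIN_BASE MIRROR_BASE < input.html > output.html
#   1) MAIN_BASE/pages/foo.php -> MIRROR_BASE/foo   (short URLs, no /pages/, no .php)
#   2) any other MAIN_BASE URL -> MIRROR_BASE        (media, templates, ...)
# Note: dots in MAIN_BASE are treated as regex wildcards, which is loose but
# good enough for a sketch.
rewrite_urls() {
  main=$1
  mirror=$2
  sed -E \
    -e "s#${main}/pages/([^\"' >]*)\.php#${mirror}/\1#g" \
    -e "s#${main}#${mirror}#g"
}

# Offline demo with placeholder hosts.
printf '%s\n' '<a href="https://main.example.com/pages/sub/page.php">x</a>' \
  | rewrite_urls https://main.example.com https://mirror.example.com
```

In real use the input would come from something like `curl -s` against the main site, with the output written to the cache file for that page.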