Author Topic: Upgrade from 2.8.3 to 2.10.0-latin1 to utf8 (Read 71412 times)

contactjw · « **on:** July 16, 2017, 07:41:09 PM »

I'm getting ready to upgrade. I read on the German support forum that there is a problem with older DBs with latin1 entries. My DB has quite a few latin1 entries so I'm planning on finding and replacing in notepad all latin1 entries with utf8. Just want to check to make sure that I understood correctly that this is something that I should do before I upgrade.

jacobi22 · « **Reply #1 on:** July 17, 2017, 07:41:31 PM »

the problem with latin1 comes, if you use special chars like the german Ä, Ö, Ü, ä, ö, ü, ß or other specialchars from other languages. search specially for problems in textfields like page- or menutitle, description, keywords of pages and for the project (wb-settings) , but also in newstitle, bakery-item-title etc,
do you use only english in your project? if yes, i think, thats no problem

contactjw · « **Reply #2 on:** July 18, 2017, 01:57:21 AM »

Yes, I only use English. I will go ahead with the upgrade without changing the entries. If needed, I can always go back and change the entries later but sounds like it shouldn't be needed. Thanks for the info.

contactjw · « **Reply #3 on:** July 18, 2017, 11:56:25 PM »

Seems the problem comes up in English for certain characters like '. I've gone ahead and reverted back to 2.8.3 SP7. If I understand correctly the easiest way to resolve the problem is to replace all the latin1 entries with utf8 using notepad. Is that right? Or is there an easier/better way?

DarkViper · « **Reply #4 on:** July 19, 2017, 02:44:13 AM »

Quote from: contactjw on July 18, 2017, 11:56:25 PM

[…] is to replace all the latin1 entries with utf8 using notepad. Is that right? Or is there an easier/better way?

Never use Notepad for any kind of changes! Notepad is the biggest producer of charset mismatch.

A really good and utf8 secure editor is Notepad++ (Download Notepad++
NP++ also allows to change the charset in the editor..

contactjw · « **Reply #5 on:** July 19, 2017, 04:58:30 PM »

OK. Thanks. Will do.

contactjw · « **Reply #6 on:** July 19, 2017, 08:05:20 PM »

OK. Changed all DB entries from latin1 to utf8. However, that doesn't seem to have resolved the problem. Seems like the problem is only in the news module pages and one page I use with a sortable list. The other pages are OK. When I access one of the affected pages from the editor and check the source code I find â€™ for all of the ' (single quote) throughout a page. (See first example of code below)
For the second problem in the sortable table list I get ï»¿ only in some places on one side or both sides of an entry but not in others. (See bottom portion of the second example below.)
Seems this is the editor changing or adding these characters. Is there a way to resolve this without manually fixing each entry?

1.
<div class="alignCenter">When things go wrong, as they sometimes will, 
When the road youâ€™re trudging seems all uphill, 
When the funds are low and the debts are high, 
And you want to smile, but you have to sigh, 
When care is pressing you down a bit, 
Rest, if you mustâ€”but donâ€™t you quit. 
 

2.
<tr>
 <td width="135">
 <a href="http://en.wikipedia.org/wiki/Marathon_(2005_film)" target="_blank">Marathon</a></td>
 <td width="175">
 <a href="http://en.wikipedia.org/wiki/Jo_Seung-woo" target="_blank" title="Jo Seung-woo">Jo Seung-woo</a> 
 <a href="http://en.wikipedia.org/wiki/Kim_Mi-sook" target="_blank" title="Kim Mi-sook">Kim Mi-sook</a> 
 <a href="http://en.wikipedia.org/wiki/Lee_Ki-young" target="_blank" title="Lee Ki-young">Lee Ki-young</a></td>
 <td width="137"> based on (Korean w/ Engish subtitles)</td>
 <td width="111">courage</td>
 <td width="52">2005</td>
 <td width="85">ï»¿Amazon</td>
 <td width="72">ï»¿Netflix</td>
 </tr>

contactjw · « **Reply #7 on:** July 21, 2017, 11:37:18 PM »

OK. I looked it over and there weren't that many entries that needed to be corrected so I went ahead and did them manually. Funny thing is there wasn't any consistency to the problem entries. Some places had them and others didn't.

hgs · « **Reply #8 on:** July 22, 2017, 09:49:16 AM »

That makes courage! I also have a web page that I urgently need the current version.

jacobi22 · « **Reply #9 on:** July 22, 2017, 01:07:28 PM »

question: the examples in #6 ist from the SQL-Backup-file???

looks for me like a wrong encoding on the way browser-> used charset
â€™ is the UTF8-Char for the apostroph, it has to show in your browser as ', if the browser use charset utf8 and (if you read this with a tool like phpmyadmin from the database) for a wrong table collation.
But....
its also possible, that you use the wrong method in the import of the repaired backup

1. at first (see post from Darkviper): Never use the simple Notepad / (Windows-)Editor or also Wordpad for a job like this. better is a editor like Notepad++ or a good IDE like Netbeans etc with a included converter
2. look at first into the database with a tool like phpmyadmin or mysqldumper etc and search there for the table collation and there for the fields "collation" in the overview (see my example)

the whole database has a collation (picture 1) and also every table (see pic #2). use here utf8_general_ci or utf8_unicode_ci.
you can change this in your backup-file. at the end of every table structure for found the settings like

Quote

ENGINE=MyISAM AUTO_INCREMENT=28 DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;

DEFAULT CHARSET has to be change to utf8
COLLATE has to be change to utf8_general_ci or utf8_unicode_ci

3. look into every table for the collations of the fields. see the next picture from my backup. in the install-process, every field get the collation from the settings in the wb-installer. you say, your old wb runs with latin1, so i'm sure, every single field has now the wrong collation like my screenshot

you have to change every field collation in the backup file here also from (here in the example) latin1_german1_ci to utf8_general_ci or utf8_unicode_ci

if you dont do this, you write the wrong collation for this tables, when you import the repaired backup and you see the problem in the output later. best method for a job like this is a php-based-converter, works with the method search&&replace. a lot of users has a own converter, but its very complexe, too many possibilitys. for example: if i add in my converter-code the job: search for latin1_german1_ci and i use in my database latin1_german2_ci, the script found nothing and replace nothing. in the next job, i use latin1_swedish_ci instead of latin1_german1_ci. in simple words: if you use the search&&replace method, its not possible to build a converter for everybody, that works in every case

my converter use a long list with ~ 20 different collations and writing forms and everytime, when i found a new combination in a backup-file, i add a new line in my converter and start the job again.
its also possible, to replace the wrong collations with notepad++, but you have to check every table structure (same problem like the php-converter, if you use different collations)

back to your code and to notepad++
open the backup-sql-file with notepad++, go in the top-menu to "ENCODING". set notepad++ here to UTF8 without BOM (important!!). your code show's ï»¿ and thats a sign for the simple UTF8 (with BOM)
if you open the file, the position of the black point show's the actual status, here UTF8 ohne BOM

if the actual status show not UTF8 without BOM, go in the same menu to the point "Convert to UTF8 without BOM"
now make a check for some special chars
- replace all ï»¿ with nothing
- search for other combinations, the most UTF8-chars start with Â or Ã or Å . these are chars from spain or swedish language, not used in english. Search for this letters and check every result. best method is a single edit for every result or a single search for every utf8-char. See also a simple list from here Search for the blue marked letters in your file, try it for lower case letters and upper case letters and replace it with the real chars ( â€™ == ', â€“ == - etc)

at last step, add this line somewhere in the top of the sql-file

Code: [Select]

SET NAMES 'utf8' COLLATE 'utf8_general_ci';
save the repaired file with a new file name, maybe with date + time at the end of the filename. now you can import the file as UTF8-formated File

contactjw · « **Reply #10 on:** July 22, 2017, 09:09:37 PM »

#6 examples are from pages in ckeditor in source code view.

My database is latin1_swedish_ci. All my tables are latin1_swedish_ci except for droplets, output filter, and settings which are all utf8_unicode_ci. Not sure why it's swedish?

Here's how my db file looks:

(Addons table latin1)

CREATE TABLE `wbaddons` (
`addon_id` int(11) NOT NULL AUTO_INCREMENT,
`type` varchar(255) NOT NULL DEFAULT '',
`directory` varchar(255) NOT NULL DEFAULT '',
`name` varchar(255) NOT NULL DEFAULT '',
`description` text NOT NULL,
`function` varchar(255) NOT NULL DEFAULT '',
`version` varchar(255) NOT NULL DEFAULT '',
`platform` varchar(255) NOT NULL DEFAULT '',
`author` varchar(255) NOT NULL DEFAULT '',
`license` varchar(255) NOT NULL DEFAULT '',
PRIMARY KEY (`addon_id`),
UNIQUE KEY `ident` (`directory`)
) ENGINE=MyISAM AUTO_INCREMENT=198 DEFAULT CHARSET=latin1;
/*!40101 SET character_set_clien t = @saved_cs_client */;

(Settings table utf8.)

CREATE TABLE `wbsettings` (
`name` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
`value` text COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
/*!40101 SET character_set_clien t = @saved_cs_client */;

So if I change all the default latin1 entries to utf 8 will that be enough?

contactjw · « **Reply #11 on:** July 22, 2017, 11:56:21 PM »

I changed the db to utf8_unicode_ci. Then I changed all the latin1 entries to utf8 in notepad++. Now all my tables show utf8_general_ci except the ones that were already utf8_unicode_ci. They stayed the same. I also used the ckeditor to change the few remaining entries like â€™ that I missed when I first did the ones that were not reading properly. Everything seems to be working ok.

Did I do anything wrong? Is there anything else I need to do?

jacobi22 · « **Reply #12 on:** July 23, 2017, 12:14:55 AM »

what is the used ckeditor version and do you use the editor setting $ckeditor->config['entities'] = true ??

you found this setting (if avaiable) in include.php of the editor (modules/ckeditor/include.php), nearly in the middle of the file, in my version at line 183

or the same setting in wb_ckconfig.js
config.entities = false;

to explain it: this option converted (if TRUE) special chars to html-entities (for example: the apostroph ' is saved as ' )
is this option deactivated or false, the script use the charset from WB, save it in sql-language as â€™ and in the browser readable as '

pls check the table collation of this table. open this section with the ckeditor, search there for the section-id in the head of this section and search in the module-table mod_wysiwyg (if its is a wysiwyg section). what do you see in the database tool like phpmyadmin?
export this table and open it with a simple editor like notepad (not notepad++) what do you see there???

contactjw · « **Reply #13 on:** July 23, 2017, 01:33:35 AM »

Version 4.6.2.

if (!$bWbConfigSetting ) { $ckeditor->config['entities'] = false; }

config.entities = false;

(This is from a news module post)

When the road you're trudging seems all uphill, (ckeditor source view)

When the road you’re trudging seems all uphill, (phpmyadmin content view)

When the road you’re trudging seems all uphill, (notepad)

(This is from a normal page.)

<a href="https://en.wikipedia.org/wiki/Fran%C3%A7ois_Cluzet" target="_blank">François Cluzet</a> (ckeditor source view)

<a href="https://en.wikipedia.org/wiki/Fran%C3%A7ois_Cluzet" target="_blank">François Cluzet</a> (phpmyadmin content view)

<a href=\"https://en.wikipedia.org/wiki/Fran%C3%A7ois_Cluzet\" target=\"_blank\">François Cluzet</a> \n (notepad)

Also, while looking at phpmyadmin I noticed that my localhost db is set to latin1 swedish with all the tables the same as my wb db was before I changed it. Should I change my localhost db like I did my wb db?

jacobi22 · « **Reply #14 on:** July 23, 2017, 02:22:33 PM »

thanks for the infos, but i see, that the simple notepad doesnt show the real SQL

sorry for the trouble

here the real UTF8-Chars from the table wb_addons in real SQL without Converter, only visible in a editor, but not in your browser

and the same code in notepad, notepad++ or other IDE's with included converter

but back to your post

the editors like ckeditor, tiny_mc etc has a own converter included, if activated, the editor convert special chars via html_entities(), a php-function. i dont now, how much, but html_entities has a small range, not possible, to convert all chars from all over the world. thats why, utf8 is the better solution, there are no limits. because of that, the php-group make a recommendation to use utf8 as default charset on every server since PHP 5.5. the most providers follow this instruction. in older php-versions it was the choice of the provider. german providers prefer the latin-charsets in the past and today its nearly not possible, to change the default charset without trouble. if they change the server charset for all the customers at the same time, you have a big chaos on the service line on one day or one week, so its part of the customers, to choose a new charset (like your case), so you have maybe only 1 user with trouble every day and not all in the same time

to your code-examples
thats okay in ckeditor source view and also the version with entities is okay in ckeditor source view

Quote

When the road you're trudging seems all uphill, (ckeditor source view)

the editor and his own converter convert the entities from the database and show the real char (like your apostroph). it doesnt matter, what the script found in the database. if its a readable char, its possible to convert it
P.S.: if the editor founds a not-readable char and its not possibe to convert it, the editor show nothing in the editor field and show also no toolbar. its a javascript error and its not possible for javascript to show not-existing-chars

But.... whats happend, if you save this content??

Quote

When the road you’re trudging seems all uphill, (phpmyadmin content view)

after saving this content from your example (with the correct charset and correct table collation) the phpmyadmin must show the real chars like

Quote

When the road you're trudging seems all uphill,

its okay for older CKeditor versions with activated entities-converter, but not for the newer version. if i remenber correct, we change the default settings for the ckeditor in WB 2.8.3 SP6, but its possible for every User, to use the old method, if he set config.entities to TRUE
Background for this was the view in tools like phpmyadmin. its difficult to read the content, when i use Entities like this

Quote

When the road you’re trudging seems all uphil

Without entities, i've the same view in phpmyadmin like my output in Front- or backend, the used charset convert the SQL-language to readable chars without entities

notepad++ has a good converter, but not possible to convert different methods at the same time, because of that, we use php-converters, in my case with a big array of chars to replace. Maybe you can send me your zipped sql-backup and i'll try to convert it.

for my locale development, i use xampp for windows with the default settings from xampp. if i look into phpmyadmin at the databases overview, i found in the last line latin1_swedish_ci, dont know, where it comes from. the same on the online-servers, dont know why, maybe a default setting somewhere.

in one project, i've a lot of trouble like yours, a long period of search&&try, but then i found the database charset settings in this overview and it was set to latin1 also, so i change it to utf8_general_ci and all the problems are gone after converting all the chars in the backup
to change it, go to phpadmin, startpage with the overview (show all databases as list in the big, right content-field, (like my last picture). Click on the link to your prefered database and after reload in the top-menu on the OPERATION-Link, there you can change the collation for this database

contactjw · « **Reply #15 on:** July 23, 2017, 06:22:01 PM »

From what I've read latin1_swedish_ci is the default for MySQL from it's beginning and which has never changed.

utf8mb4 seems to be the latest recommended setting as far as capabilities to deal with a wider range of characters.

utf8mb4_unicode_ci seems to be a more accurate collation setting than utf8mb4_general_ci and seems to be the recommended setting.

Thanks for the offer to try and convert my db for me. If it's not too much trouble for you I can send you my zipped sql-backup. Would you want the backup before I upgraded or after upgrading?

Also found this for converting from latin1 to utf8 in MySQL:

ALTER DATABASE databasename CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Is this an option?

contactjw · « **Reply #16 on:** July 29, 2017, 07:14:02 PM »

Well, all was well for about a week until yesterday. Could no longer connect with db. So I ended up reverting back to 2.8.3. I'm wondering if maybe a better approach would be to identify the columns where utf8 is actually needed and using the drop downs in PHPMyadmin change just the ones that need it.

jacobi22 · « **Reply #17 on:** July 30, 2017, 10:20:25 AM »

Quote

Could no longer connect with db

cannot be a problem with the charset
and if you mean, the charset is the problem, why does it work well for one week??

contactjw · « **Reply #18 on:** July 30, 2017, 07:19:20 PM »

Quote from: jacobi22 on July 30, 2017, 10:20:25 AM

Quote
Could no longer connect with db

cannot be a problem with the charset
and if you mean, the charset is the problem, why does it work well for one week??

You're right. I found out later that it was a denial of service attack and somehow I guess my connection got messed up in the process.

Back to my charset. I tried using PHPMyadmin to change the collation in particular columns like the content_short, content_long columns of the news_posts table to utf8mb4_unicode_ci in the 2.8.3 installation. I could see the text when I opened the website in the browser, but the ckeditor was just blank without any text. Any idea why that would happen?

jacobi22 · « **Reply #19 on:** July 31, 2017, 12:25:14 PM »

Quote

I tried using PHPMyadmin to change the collation in particular columns like the content_short, content_long columns of the news_posts table to utf8mb4_unicode_ci in the 2.8.3 installation. I could see the text when I opened the website in the browser, but the ckeditor was just blank without any text. Any idea why that would happen?

see my answer in the top :

Quote from: jacobi22

P.S.: if the editor founds a not-readable char and its not possibe to convert it, the editor show nothing in the editor field and show also no toolbar. its a javascript error and its not possible for javascript to show not-existing-chars

the ckeditor and also a lot of functions inside of all the modules and the core doesnt support utf8mb4
utf8mb4 needs special multibyte-string-functions to work, see also
http://www.phptherightway.com/#php_and_utf8

contactjw · « **Reply #20 on:** August 08, 2017, 11:29:08 PM »

Took a while for me to get back to working on this. So what I did is I changed the fields in the table news_posts to utf8_unicode_ci in phpMyadmin. Then I changed the table to Utf8_unicode_ci. I got a few places where the character didn't convert and I lost a little bit of text but I just retyped it in the ckeditor without too much effort. Everything seems to be working OK.

I'm wondering about a few things still. Should I change the fields in the news_settings table to Utf8 as well? What about the database? Should I set that to Utf8 also or leave it as latin1_swedish_ci?

One more thing. I'm not really sure I need to convert to Utf8 at all since I really don't have anything but English on my site. The instances of having to post a word or two in another language is rare and I can always get by with substituting any letters outside of English with an English letter. I figure if I can change to Utf8 I would be better off but don't know if it's really worth it. WDYT?

jacobi22 · « **Reply #21 on:** August 09, 2017, 01:17:51 AM »

Quote

Everything seems to be working OK.

I'm wondering about a few things still. Should I change the fields in the news_settings table to Utf8 as well? What about the database? Should I set that to Utf8 also or leave it as latin1_swedish_ci?

WB use the same Table Collation and Table Engine for all database tables. Since WB 2.8.3 SP7 only UTF8 as Charset is possible, no other charsets.
Maybe everything works now and you see no problems, but it comes, if you need your SQL-Backup the next time. Dont think about german or other special letters like ä, ö, ü, é, ó, í etc, also the special chars in your english text like apostroph, paragraph, %, & or the simple ? have to be encoded in the database.
maybe you have to change the provider, lower price, more service, combination with smart-tv etc. and you have to import your backup there. you have actual some tables with utf8-encoded chars and some tables with latin1. whatever you select in the charset box for the sql-importer, its only possible to use one charset and you start to correct the rest again. why not now and without pressure. step by step, every day a table?

okay, if you dont think about the future and dont think about the providers etc, its of course possible, to leave it at it is, it works, if nobody change the conditions
recommendation is: use only utf8 as default charset and utf8_general_ci or utf8_unicode_ci as table collation for all WebsiteBaker-tables

contactjw · « **Reply #22 on:** August 09, 2017, 09:04:50 PM »

OK. I changed all the news module tables and fields to utf8. Everything seems to work OK except I can't seem to get the drag and drop working for rearranging the order of posts. It seems to act unpredictably. Also one of the posts won't work even when I hit the up and down icon. It only works with the down icon if it's in the top position but otherwise it just goes up on either icon. Don't know if this has anything to do with the changing of character sets but wouldn't think so.

jacobi22 · « **Reply #23 on:** August 09, 2017, 11:42:37 PM »

the drag&drop- or better, the sortable-problem is reported on the WB-Project-Page as Ticket #50
it has nothing to do with a charset, its a ajax-problem

contactjw · « **Reply #24 on:** August 10, 2017, 08:58:01 PM »

OK. Changed addons table and fields. Everything I checked is OK except the 2 drop downs in the language section of the addons menu. Some entries have all or a few "?" for the actual letters. I can still select them and in "View Details" it shows the actual letters for the language. Should I be concerned about this, or will this not have any affect on anything else in the program?

News

Support WebsiteBaker

Your donations will help to:

You can donate by clicking on the button below.

Author Topic: Upgrade from 2.8.3 to 2.10.0-latin1 to utf8 (Read 71412 times)