Wednesday, July 6, 2011

How to set up MySql master slave replication

On master server you must enter in your configuration:

# replication master 
server-id=1
log-bin=www0-bin
expire_logs_days = 30
sync_binlog=1
# end replication

On slave server you must enter in you configuration:

# replication slave 
relay-log=/data/database/mysql/mysqld-relay-bin
server-id=2
# end replication

Be careful to set the unique id for the server-id (must be numeric)!

On master dump the DB in single transaction with current master position (on top of the dump file):

mysqldump -pyourpass --single-transaction --quick --master-data=2 schema_name > db_dump.sql

Using master-data=2 option will add change master to statement on the top of your dump file (commented).

On slave server if replication is running, we must first clean slave state. Log in to mysql console and run:

STOP SLAVE;
RESET SLAVE;

Now we can import the database on slave:

mysql -uroot -p db_name <  db_dump.sql

Get the master position from the top of db_dump.sql:

head -50 db_dump.sql | grep CHANGE

Add some additional parameters to CHANGE MASTER command (master_user, ...):

CHANGE MASTER TO MASTER_LOG_FILE='www0-bin.001519', MASTER_LOG_POS=518850262, master_user='replslave';

Start the slave and check the status:

START SLAVE;
show slave status \G;

If by any chance there are some duplicates, do the following:

STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;

How to kill long lasting connections in mysql automatically

Although it is possible to use console and kill long lasting connections by hand, or we could set up MYSQL to do the same in config, wouldn't it be nice to have a procedure that will inspect the database for long lasting connections automatically. Well here it is:


drop procedure if exists purge_slow_queries;
drop procedure if exists purge_idle_connections;
drop event if exists auto_purge_slow_queries;
drop event if exists auto_purge_idle_connections;

delimiter //

create procedure purge_idle_connections()
deterministic
begin
    declare done boolean default false;
    declare max_time int default coalesce(@max_kill_time, 200);
    declare pid bigint;
    declare c cursor for
    SELECT id
    FROM information_schema.processlist
    WHERE command in ('Sleep')
        AND time > max_time;
    declare continue handler for not found
        set done = true;
    open c;
    set @q_kill = 'KILL ?';
    prepare q_kill from @q_kill;
    PURGELOOP: loop
        fetch c into pid;
        if done then
            leave PURGELOOP;
        end if;
        set @pid = pid;
        execute q_kill using @pid;
    end loop;
    deallocate prepare q_kill;
end//


create procedure purge_slow_queries()
deterministic
begin
    declare done boolean default false;
    declare max_time int default coalesce(@max_kill_time, 200);
    declare pid bigint;
    declare c cursor for
    SELECT id
    FROM information_schema.processlist
    WHERE state in ('executing')
        AND time > max_time;
    declare continue handler for not found
        set done = true;
    open c;
    set @q_kill = 'KILL ?';
    prepare q_kill from @q_kill;
    PURGELOOP: loop
        fetch c into pid;
        if done then
            leave PURGELOOP;
        end if;
        set @pid = pid;
        execute q_kill using @pid;
    end loop;
    deallocate prepare q_kill;
end//

delimiter ;

create event auto_purge_idle_connections
    on schedule every 10 second
    do call purge_idle_connections();

create event auto_purge_slow_queries
    on schedule every 10 second
    do call purge_slow_queries();

Please note that you must have information_schema.processlist table wich was added in MySQL 5.1.7.

Wednesday, June 15, 2011

Demystifying the garbage collector

In my work history I have met a lot of people that were using languages like PHP, Java, .NET but had no clue of how garbage collectors actually work. Most common answer was: 'Memory is no problem, because we have garbage collector, that free the resources.'. Unfortunately although this is true, we have garbage collectors, memory leaks are as big problem as they were in C or C++ for example. To illustrate the problem I have written PHP snippet bellow, but could be in any language.


echo 'Memory usage on script run:' .memory_get_usage() . "n"; 
//let us create a simple object 
$var = array(); 
for($i=1; $i < 10000; $i++) { 
$var[] = 'Hello'; 
} 
//after array 
echo 'Memory usage when object is created:' .memory_get_usage() . "n";

//memory usage is the same 
//because there was no data cloning, just pointing 
$new_var = $var; 
echo 'Memory usage when new pointer is made:' .memory_get_usage() . "n";

//unset does not free resource 
//because there is still pointer to that memory location $var 
unset($var); 
echo 'Memory usage when one after uset:' .memory_get_usage() . "n";

//unset($new_var); 
//echo 'Memory usage when one after all pointers:' .memory_get_usage() . "n"; 

The code illustrates the problem. If you are using objects and you assign a variable an instance of an object it will hold the resource, because are actually pointing to that object and not cloning it. Therefor the resources used by an object will be collected by garbage collector only when all the pointers (variables) pointing to that resource are released.

Good article that explains garbage collectors can be found here

Monday, May 30, 2011

Safe file_get_contents

If we want to get file from remote location and link is valid but file does not exist, we get connection refused error and php returns critical error and kills the script (at least in my version of php). Using @ won't help. For accessing external locations I use this method:


static public function safe_url_get_contents
($url,$include_path = false,$context = null,
$start = -1,$max_length = null)
   {
      $strItem = FALSE;
      if (self::checkUrl($url))
      {
         if (!is_numeric($max_length))
         {
            $strItem = file_get_contents($url,$include_path,$context,$start);
         }
         else 
         {
            $strItem = file_get_contents($url,$include_path,$context, $start, $max_length);
         }
         
       }
       
       return $strItem;
   }

   static public function checkUrl($url)
   {
      @$headers = get_headers($url);
      if (preg_match('/^HTTP\/\d\.\d\s+(200|301|302)/', $headers[0]))
      {
         return true;
      }
      else
         return false;
   }

Although it introduces the overhead of checking the headers I believe it to be a small price to pay, when alternative is PHP to fail.

How to convert html encoded string to utf8

When I need to convert string from html encoded to e.g. UTF8 I use the next line of code:

html_entity_decode(strip_tags($strBody), null, 'UTF-8')

First I strip all the html tags and then decode the string.

Tuesday, May 17, 2011

How to fix image color dept problem or why does this picture do not work in IE

We have all stumbled upon a problem when certain pictures (jpegs) work perfectly in firefox, chrome ... but not in IE (any version). The problem is that color depth in image header is different than the actual color depth. So if i.e. actual color depth of a picture is 24bit but header reports that it is 32bit, this picture will not display in IE. If you are using Linux, you are in luck, because that can simply be fixed.

I've found this site with tools needed (there is also an ubuntu package if you are using ubuntu). To detect the problem use:

jpeginfo -c -merronly Downloads/test.jpg

If there is an error, you will get something like:
Downloads/test.jpg  594 x 420  32bit 
Exif  N  174956  Unsupported color conversion request  [ERROR]

To fix the problem simply use imageMagic:

#convert it to png
mogrify -format png Downloads/test.jpg
#and then back to jpg
mogrify -format jpeg Downloads/test.png

SVN: How to list all the changes made by uses in a period of time

When you are using svn, you may wan't to inspect what did you (or your team member) changed in a period of time. To achieve that use following command:

svn log -r "{2011-04-01 00:00:00}:{2011-05-01 00:00:00}" | 
sed -n '/username/,/-----$/ p'

where you should replace the username with username of aa actual user.

Tuesday, April 19, 2011

How to update a mysql database record with value from another record in the same table

I have stumbled upon a problem that is not so uncommon. I wanted to set parent id of child record. I could always do something like:

update news_category set parent = 100;

But that was not the option because there was no way I can be 100% sure that 100 was id of a correct parent on any other database except mine.

When we try to do something like:

update news_category set parent = (select id from news_category where name = 'parent');
we get an error. This is the point where people give up and write a script in php or something, not knowing that this problem can solved by simple aliasing.
If we use a simple trick an convince the mysql that we are using two tables, one for update and one for select, it will work.

Update looks like:

update news_category set parent = (select id from (select id from  news_category where name = 'parent') ct) where name = 'child';
Update statement for mysql is exactly the same as
update news_category set parent= (select id from ct) where name = 'child';
because we used table alias.

How to optimize tables in mysql

As everyone that uses mysql knows, cardinality of indexes and partitioning of table space is a problem, especially when having heavy usage database. The result are the declining performance of queries, in my case query took 15-16 second instead of 2. We could always do OPTIMIZE TABLE on a specific table, but I wanted a system solution.

I was familiar with EXECUTE_IMMEDIATE concept from oracle and was lucky to find this post. With minnor modification (if not using default collation original won't work) it provided all I needed.

-- --------------------------------------------------------------------------------
-- OPTIMIZE_SCHEMA_TABLES
-- --------------------------------------------------------------------------------
DELIMITER $$

CREATE PROCEDURE `cms`.`OPTIMIZE_SCHEMA_TABLES` ( inSchemaName VARCHAR(999) )
BEGIN
        DECLARE vDynSQL    VARCHAR(999)   DEFAULT '';
        DECLARE vTableName VARCHAR(999)   DEFAULT '';
        DECLARE vDone      BOOLEAN        DEFAULT FALSE;

        DECLARE cDB_TABLES CURSOR
        FOR     SELECT DISTINCT TABLE_NAME
                FROM INFORMATION_SCHEMA.TABLES
                WHERE TABLE_SCHEMA = CONVERT(inSchemaName USING utf8);

        DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET vDone=TRUE;

        OPEN cDB_TABLES;

        REPEAT 
                FETCH cDB_TABLES INTO vTableName;
                SET vDynSQL = CONCAT('OPTIMIZE TABLE ', inSchemaName, '.', vTableName);
                CALL EXECUTE_IMMEDIATE(vDynSQL);
                SET vDynSQL = '';
        UNTIL vDone END REPEAT;

        CLOSE cDB_TABLES;
END;
-- --------------------------------------------------------------------------------
-- Routine DDL
-- --------------------------------------------------------------------------------
DELIMITER $$

CREATE DEFINER=`cms`@`localhost` PROCEDURE `EXECUTE_IMMEDIATE`(IN tSQLStmt TEXT)
BEGIN

        SET @executeImmediateSQL = tSQLStmt;

        PREPARE executeImmediateSTML FROM @executeImmediateSQL;

        EXECUTE executeImmediateSTML;

        DEALLOCATE PREPARE executeImmediateSTML;

END

Only thing left to do was to put it in a cron.

How to implement Levenshtein algotithm in mysql

I have stumbled upon a problem. I needed to check if a title contains the keyword. Using previous experience, I wanted to find a mysql function implementing the algorithm and I found this site that helped a lot. Algorithm published was almost perfect, except for a few limits in my version of mysql.

-- --------------------------------------------------------------------------------
-- Routine DDL
-- --------------------------------------------------------------------------------
DELIMITER $$

CREATE DEFINER=`cms`@`localhost` FUNCTION `levenstein`() RETURNS int(11)
    DETERMINISTIC
BEGIN
 DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
 DECLARE s1_char CHAR;
 DECLARE cv0, cv1 VARBINARY(256);

 SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;

 IF s1 = s2 THEN
      RETURN 0;
    ELSEIF s1_len = 0 THEN
      RETURN s2_len;
    ELSEIF s2_len = 0 THEN
      RETURN s1_len;
    ELSE
      WHILE j <= s2_len DO
        SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
      END WHILE;
      WHILE i <= s1_len DO
        SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
        WHILE j <= s2_len DO
          SET c = c + 1;
          IF s1_char = SUBSTRING(s2, j, 1) THEN 
            SET cost = 0; ELSE SET cost = 1;
          END IF;
          SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
          IF c > c_temp THEN SET c = c_temp; END IF;
            SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;
            IF c > c_temp THEN 
              SET c = c_temp; 
            END IF;
            SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
        END WHILE;
        SET cv1 = cv0, i = i + 1;
      END WHILE;
    END IF;
    RETURN c;
END

-- --------------------------------------------------------------------------------
-- Routine DDL
-- --------------------------------------------------------------------------------
DELIMITER $$

CREATE FUNCTION levenshtein_ratio( s1 VARCHAR(255), s2 VARCHAR(255) )
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE s1_len, s2_len, max_len INT;
SET s1_len = LENGTH(s1), s2_len = LENGTH(s2);
IF s1_len > s2_len THEN 
SET max_len = s1_len; 
ELSE 
SET max_len = s2_len; 
END IF;
RETURN ROUND((1 - LEVENSHTEIN(s1, s2) / max_len) * 100); 
END

To use it simply use select:

select levenshtein_ratio(upper(a),upper(keyword));

That was just a half of the story. I needed the split function, that would split the title. There is none in mysql.Luckily i found this thread to help me out. Only modification I made was not to return record set, because functions cannot return record sets.

-- --------------------------------------------------------------------------------
-- Routine DDL
-- --------------------------------------------------------------------------------
DELIMITER $$

CREATE DEFINER=`cms`@`localhost` PROCEDURE `split`(input text, delim VARCHAR(10))
begin

    declare foundPos tinyint unsigned;
    declare tmpTxt text;
    declare delimLen tinyint unsigned;
    declare element text;

    drop temporary table if exists tmpValues;
    create temporary table tmpValues
    (
        `values` varchar(256) not null default ''
    ) engine = memory;

    set delimLen = length(delim);
    set tmpTxt = input;

    set foundPos = instr(tmpTxt,delim);
    while foundPos <> 0 do
        set element = substring(tmpTxt, 1, foundPos-1);
        set tmpTxt = replace(tmpTxt, concat(element,delim), '');
        insert into tmpValues (`values`) values ( element);
        set foundPos = instr(tmpTxt,delim);
    end while;

    if tmpTxt <> '' then
        insert into tmpValues (`values`) values (tmpTxt);
    end if;

    -- select * from tmpValues;
END

Now I had all the tools I needed so I could write my own function to do the keyword search.

-- --------------------------------------------------------------------------------
-- Routine DDL
-- --------------------------------------------------------------------------------
DELIMITER $$

CREATE FUNCTION compare_title(title VARCHAR(255), keyword VARCHAR(255) )
  RETURNS INT
  DETERMINISTIC
BEGIN

    DECLARE done INT DEFAULT 0;
    DECLARE match_found INT DEFAULT 0;
    DECLARE a VARCHAR(255);
    DECLARE ratio INT DEFAULT 0;
    DECLARE cur1 CURSOR FOR SELECT `cms`.`values` FROM tmpValues;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    OPEN cur1;

    call split(title, ' ');

    read_loop: LOOP
        FETCH cur1 INTO a;
        IF done THEN
            LEAVE read_loop;
        END IF;
  select levenshtein_ratio(upper(a),upper(keyword)) into ratio;
        IF ratio > 80 THEN
  SET done = 1;
  SET match_found = 1;
        END IF;
       
    END LOOP read_loop;

    CLOSE cur1;
    return match_found;

END

What function does is: open a cursor on a temporary table created by split procedure, loops through it and compare the words with the keyword. If levensthein ratio is above 79% match than returns 1 else continues until the end of table and return 0.

To use it you should use something like:

select id from table where compare_title(upper(title),'mysql') = 1;

Although it works this method can be used only on very limited set of data. It take approx. 3 minutes to compare 150 titles on my quad core pc.