Tuesday, April 19, 2011

How to update a mysql database record with value from another record in the same table

I have stumbled upon a problem that is not so uncommon. I wanted to set parent id of child record. I could always do something like:

update news_category set parent = 100;

But that was not the option because there was no way I can be 100% sure that 100 was id of a correct parent on any other database except mine.

When we try to do something like:

update news_category set parent = (select id from news_category where name = 'parent');
we get an error. This is the point where people give up and write a script in php or something, not knowing that this problem can solved by simple aliasing.
If we use a simple trick an convince the mysql that we are using two tables, one for update and one for select, it will work.

Update looks like:

update news_category set parent = (select id from (select id from  news_category where name = 'parent') ct) where name = 'child';
Update statement for mysql is exactly the same as
update news_category set parent= (select id from ct) where name = 'child';
because we used table alias.

How to optimize tables in mysql

As everyone that uses mysql knows, cardinality of indexes and partitioning of table space is a problem, especially when having heavy usage database. The result are the declining performance of queries, in my case query took 15-16 second instead of 2. We could always do OPTIMIZE TABLE on a specific table, but I wanted a system solution.

I was familiar with EXECUTE_IMMEDIATE concept from oracle and was lucky to find this post. With minnor modification (if not using default collation original won't work) it provided all I needed.

-- --------------------------------------------------------------------------------
-- OPTIMIZE_SCHEMA_TABLES
-- --------------------------------------------------------------------------------
DELIMITER $$

CREATE PROCEDURE `cms`.`OPTIMIZE_SCHEMA_TABLES` ( inSchemaName VARCHAR(999) )
BEGIN
        DECLARE vDynSQL    VARCHAR(999)   DEFAULT '';
        DECLARE vTableName VARCHAR(999)   DEFAULT '';
        DECLARE vDone      BOOLEAN        DEFAULT FALSE;

        DECLARE cDB_TABLES CURSOR
        FOR     SELECT DISTINCT TABLE_NAME
                FROM INFORMATION_SCHEMA.TABLES
                WHERE TABLE_SCHEMA = CONVERT(inSchemaName USING utf8);

        DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET vDone=TRUE;

        OPEN cDB_TABLES;

        REPEAT 
                FETCH cDB_TABLES INTO vTableName;
                SET vDynSQL = CONCAT('OPTIMIZE TABLE ', inSchemaName, '.', vTableName);
                CALL EXECUTE_IMMEDIATE(vDynSQL);
                SET vDynSQL = '';
        UNTIL vDone END REPEAT;

        CLOSE cDB_TABLES;
END;
-- --------------------------------------------------------------------------------
-- Routine DDL
-- --------------------------------------------------------------------------------
DELIMITER $$

CREATE DEFINER=`cms`@`localhost` PROCEDURE `EXECUTE_IMMEDIATE`(IN tSQLStmt TEXT)
BEGIN

        SET @executeImmediateSQL = tSQLStmt;

        PREPARE executeImmediateSTML FROM @executeImmediateSQL;

        EXECUTE executeImmediateSTML;

        DEALLOCATE PREPARE executeImmediateSTML;

END

Only thing left to do was to put it in a cron.

How to implement Levenshtein algotithm in mysql

I have stumbled upon a problem. I needed to check if a title contains the keyword. Using previous experience, I wanted to find a mysql function implementing the algorithm and I found this site that helped a lot. Algorithm published was almost perfect, except for a few limits in my version of mysql.

-- --------------------------------------------------------------------------------
-- Routine DDL
-- --------------------------------------------------------------------------------
DELIMITER $$

CREATE DEFINER=`cms`@`localhost` FUNCTION `levenstein`() RETURNS int(11)
    DETERMINISTIC
BEGIN
 DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
 DECLARE s1_char CHAR;
 DECLARE cv0, cv1 VARBINARY(256);

 SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;

 IF s1 = s2 THEN
      RETURN 0;
    ELSEIF s1_len = 0 THEN
      RETURN s2_len;
    ELSEIF s2_len = 0 THEN
      RETURN s1_len;
    ELSE
      WHILE j <= s2_len DO
        SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
      END WHILE;
      WHILE i <= s1_len DO
        SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
        WHILE j <= s2_len DO
          SET c = c + 1;
          IF s1_char = SUBSTRING(s2, j, 1) THEN 
            SET cost = 0; ELSE SET cost = 1;
          END IF;
          SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
          IF c > c_temp THEN SET c = c_temp; END IF;
            SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;
            IF c > c_temp THEN 
              SET c = c_temp; 
            END IF;
            SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
        END WHILE;
        SET cv1 = cv0, i = i + 1;
      END WHILE;
    END IF;
    RETURN c;
END

-- --------------------------------------------------------------------------------
-- Routine DDL
-- --------------------------------------------------------------------------------
DELIMITER $$

CREATE FUNCTION levenshtein_ratio( s1 VARCHAR(255), s2 VARCHAR(255) )
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE s1_len, s2_len, max_len INT;
SET s1_len = LENGTH(s1), s2_len = LENGTH(s2);
IF s1_len > s2_len THEN 
SET max_len = s1_len; 
ELSE 
SET max_len = s2_len; 
END IF;
RETURN ROUND((1 - LEVENSHTEIN(s1, s2) / max_len) * 100); 
END

To use it simply use select:

select levenshtein_ratio(upper(a),upper(keyword));

That was just a half of the story. I needed the split function, that would split the title. There is none in mysql.Luckily i found this thread to help me out. Only modification I made was not to return record set, because functions cannot return record sets.

-- --------------------------------------------------------------------------------
-- Routine DDL
-- --------------------------------------------------------------------------------
DELIMITER $$

CREATE DEFINER=`cms`@`localhost` PROCEDURE `split`(input text, delim VARCHAR(10))
begin

    declare foundPos tinyint unsigned;
    declare tmpTxt text;
    declare delimLen tinyint unsigned;
    declare element text;

    drop temporary table if exists tmpValues;
    create temporary table tmpValues
    (
        `values` varchar(256) not null default ''
    ) engine = memory;

    set delimLen = length(delim);
    set tmpTxt = input;

    set foundPos = instr(tmpTxt,delim);
    while foundPos <> 0 do
        set element = substring(tmpTxt, 1, foundPos-1);
        set tmpTxt = replace(tmpTxt, concat(element,delim), '');
        insert into tmpValues (`values`) values ( element);
        set foundPos = instr(tmpTxt,delim);
    end while;

    if tmpTxt <> '' then
        insert into tmpValues (`values`) values (tmpTxt);
    end if;

    -- select * from tmpValues;
END

Now I had all the tools I needed so I could write my own function to do the keyword search.

-- --------------------------------------------------------------------------------
-- Routine DDL
-- --------------------------------------------------------------------------------
DELIMITER $$

CREATE FUNCTION compare_title(title VARCHAR(255), keyword VARCHAR(255) )
  RETURNS INT
  DETERMINISTIC
BEGIN

    DECLARE done INT DEFAULT 0;
    DECLARE match_found INT DEFAULT 0;
    DECLARE a VARCHAR(255);
    DECLARE ratio INT DEFAULT 0;
    DECLARE cur1 CURSOR FOR SELECT `cms`.`values` FROM tmpValues;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    OPEN cur1;

    call split(title, ' ');

    read_loop: LOOP
        FETCH cur1 INTO a;
        IF done THEN
            LEAVE read_loop;
        END IF;
  select levenshtein_ratio(upper(a),upper(keyword)) into ratio;
        IF ratio > 80 THEN
  SET done = 1;
  SET match_found = 1;
        END IF;
       
    END LOOP read_loop;

    CLOSE cur1;
    return match_found;

END

What function does is: open a cursor on a temporary table created by split procedure, loops through it and compare the words with the keyword. If levensthein ratio is above 79% match than returns 1 else continues until the end of table and return 0.

To use it you should use something like:

select id from table where compare_title(upper(title),'mysql') = 1;

Although it works this method can be used only on very limited set of data. It take approx. 3 minutes to compare 150 titles on my quad core pc.