Friday, September 28, 2012

How to remove duplicate rows in MySQL

Sometimes your table contains duplicate keys due to the fact that you forgot to add a primary key to the initial design. Something like that
CREATE TABLE employee (
ssn INT NOT NULL,
name VARCHAR(20) NOT NULL 
);
Yeah, Bad database design!!

Here are different ways to fix it
1- You realize that a primary(unique) key is missing, so you add it
ALTER IGNORE TABLE employee ADD PRIMARY KEY (ssn);  
or
ALTER IGNORE TABLE employee ADD UNIQUE KEY (ssn); 
notice the IGNORE keyword.
You could later reverse that constraint by
ALTER TABLE employee DROP index ssn;  
or 
ALTER TABLE employee DROP index ssn; 
2- Delete all but one using LIMIT keyword

DELETE FROM employee WHERE ssn=x LIMIT n; 
where x is the value of the duplicated attribute and n is the number of required deleted rows, i.e. less than the total duplicated rows by 1.
example:
if you have a duplicated ssn = 123456789 repeated 10 times and you want to delete 9 of them

DELETE FROM employee WHERE ssn=123456789 LIMIT 9; 
You can do a COUNT on that ssn before you apply the previous command.

3- Using a temporary table
CREATE TEMPORARY TABLE employee_temp AS SELECT DISTINCT * FROM employee;
DELETE * FROM employee;
INSERT INTO employee SELECT * FROM employee_temp; 

 

No comments:

Post a Comment