How to Delete Duplicate Rows in MS SQL Server

What is the best way to remove duplicate rows from a fairly large SQL Server table with 1,000,000+ rows?

The rows, of course, will not be perfect duplicates because of the existence of the RowID identity field.

CREATE TABLE MyTable (
    RowID int not null identity(1,1) primary key,
    Col1 varchar(20) not null,
    Col2 varchar(2048) not null,
    Col3 tinyint not null
);

How to Remove Duplicate Rows in SQL Server?

Answers:

Assuming no NULLs, GROUP BY the columns that define a duplicate and SELECT the MIN (or MAX) RowId of each group as the row to keep. Then delete every row whose RowId is not in that set of kept rows:

-- T-SQL needs the delete target named before FROM when a join is used
DELETE MyTable
FROM MyTable
LEFT OUTER JOIN (
    SELECT MIN(RowId) AS RowId, Col1, Col2, Col3
    FROM MyTable
    GROUP BY Col1, Col2, Col3
) AS KeepRows ON
    MyTable.RowId = KeepRows.RowId
WHERE
    KeepRows.RowId IS NULL;

In case you have a GUID instead of an integer, you can replace

MIN(RowId)
with

CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn)))
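
The keep-the-MIN-per-group idea from this answer can be tried without a SQL Server instance. The following is only a sketch under assumptions: it uses Python's built-in sqlite3 module as a stand-in, SQLite has no DELETE ... JOIN, so the anti-join is written as NOT IN, and the helper name dedupe_keep_min and the sample rows are made up for illustration. The keep-set subquery is the same one used above.

```python
import sqlite3


def dedupe_keep_min(rows):
    """Load rows into an in-memory MyTable, delete duplicates, return survivors.

    Hypothetical helper for illustration; SQLite stands in for SQL Server.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTable ("
        " RowID INTEGER PRIMARY KEY AUTOINCREMENT,"
        " Col1 TEXT NOT NULL, Col2 TEXT NOT NULL, Col3 INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO MyTable (Col1, Col2, Col3) VALUES (?, ?, ?)", rows
    )
    # Keep the smallest RowID of every (Col1, Col2, Col3) group, delete the rest.
    conn.execute(
        """
        DELETE FROM MyTable
        WHERE RowID NOT IN (
            SELECT MIN(RowID)
            FROM MyTable
            GROUP BY Col1, Col2, Col3
        )
        """
    )
    survivors = conn.execute(
        "SELECT RowID, Col1, Col2, Col3 FROM MyTable ORDER BY RowID"
    ).fetchall()
    conn.close()
    return survivors


if __name__ == "__main__":
    sample = [
        ("a", "x", 1), ("a", "x", 1), ("b", "y", 2),
        ("a", "x", 1), ("b", "y", 2), ("c", "z", 3),
    ]
    print(dedupe_keep_min(sample))
```

With the sample rows, the survivors are the rows with RowID 1, 3 and 6, that is, the first occurrence of each distinct combination.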

Another possible way of doing this is:

-- Ensure that any immediately preceding statement is terminated with a semicolon
;WITH cte AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3
                              ORDER BY (SELECT 0)) AS RN
    FROM MyTable
)
DELETE FROM cte
WHERE RN > 1;

ORDER BY (SELECT 0) is used above because it is arbitrary which row is preserved in the event of a tie.

To preserve the latest row in RowID order, for example, you could use ORDER BY RowID DESC instead.
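
The ORDER BY RowID DESC variant can be sketched the same way. This is an illustration under assumptions, not the SQL Server statement itself: it runs on Python's sqlite3 module (window functions need SQLite 3.25 or newer), SQLite cannot delete through a CTE, so the CTE only supplies the RowIDs to remove, and the helper name dedupe_keep_latest is hypothetical.

```python
import sqlite3


def dedupe_keep_latest(rows):
    """Keep only the highest RowID of each duplicate group (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTable ("
        " RowID INTEGER PRIMARY KEY AUTOINCREMENT,"
        " Col1 TEXT NOT NULL, Col2 TEXT NOT NULL, Col3 INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO MyTable (Col1, Col2, Col3) VALUES (?, ?, ?)", rows
    )
    # RN = 1 marks the latest row of each group; every RN > 1 is a duplicate.
    conn.execute(
        """
        WITH cte AS (
            SELECT RowID,
                   ROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3
                                      ORDER BY RowID DESC) AS RN
            FROM MyTable
        )
        DELETE FROM MyTable
        WHERE RowID IN (SELECT RowID FROM cte WHERE RN > 1)
        """
    )
    survivors = conn.execute(
        "SELECT RowID, Col1, Col2, Col3 FROM MyTable ORDER BY RowID"
    ).fetchall()
    conn.close()
    return survivors


if __name__ == "__main__":
    sample = [
        ("a", "x", 1), ("a", "x", 1), ("b", "y", 2),
        ("a", "x", 1), ("b", "y", 2), ("c", "z", 3),
    ]
    print(dedupe_keep_latest(sample))
```

Here the surviving RowIDs are 4, 5 and 6, the last occurrence of each combination, whereas ordering by RowID ascending would keep 1, 3 and 6.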

The following query is useful to delete duplicate rows. The table in this example has ID as an identity column and the columns which have duplicate data are Column1, Column2 and Column3.

DELETE FROM TableName
WHERE ID NOT IN (SELECT MAX(ID)
                 FROM TableName
                 GROUP BY Column1,
                          Column2,
                          Column3
                 /* Even if ID is not nullable, SQL Server treats MAX(ID) as
                    potentially nullable. Because of the semantics of NOT IN (NULL),
                    including the clause below can simplify the plan. */
                 HAVING MAX(ID) IS NOT NULL);
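
The comment about NOT IN (NULL) refers to SQL's three-valued logic: if the list contains a NULL, NOT IN can never evaluate to TRUE. A small sketch of that behaviour follows, assuming SQLite (via Python's sqlite3) as a stand-in for SQL Server; the helper name not_in_is_true is hypothetical, and the sketch says nothing about query plans.

```python
import sqlite3


def not_in_is_true(value, candidates):
    """Return True only if `value NOT IN (candidates)` evaluates to TRUE in SQL.

    Hypothetical helper; `candidates` must be a non-empty list and may contain None.
    """
    conn = sqlite3.connect(":memory:")
    placeholders = ", ".join("?" for _ in candidates)
    row = conn.execute(
        f"SELECT ? NOT IN ({placeholders})", [value, *candidates]
    ).fetchone()
    conn.close()
    # SQLite returns 1 for TRUE, 0 for FALSE and NULL (None) for UNKNOWN.
    return row[0] == 1


if __name__ == "__main__":
    print(not_in_is_true(3, [1, 2]))     # no NULL in the list
    print(not_in_is_true(3, [1, None]))  # NULL makes the result UNKNOWN
    print(not_in_is_true(1, [1, None]))  # a match makes the result FALSE
```

Only the first call yields True. A row is deleted only when the WHERE predicate is TRUE, so a single NULL in the subquery result would stop NOT IN from matching any row.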

The following script combines GROUP BY, HAVING, and ORDER BY in one query and returns each duplicated value along with the number of times it occurs.

SELECT YourColumnName,
       COUNT(*) AS TotalCount
FROM YourTableName
GROUP BY YourColumnName
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC;
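
Running this kind of report before deleting anything shows how much duplication exists. The sketch below runs the same query shape on sample data, assuming SQLite via Python's sqlite3 as a stand-in; the helper name duplicate_counts is hypothetical, and a secondary sort on the column is added only to make the output order deterministic.

```python
import sqlite3


def duplicate_counts(values):
    """Return (value, count) pairs for values occurring more than once (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTableName (YourColumnName TEXT)")
    conn.executemany(
        "INSERT INTO YourTableName (YourColumnName) VALUES (?)",
        [(v,) for v in values],
    )
    result = conn.execute(
        """
        SELECT YourColumnName,
               COUNT(*) AS TotalCount
        FROM YourTableName
        GROUP BY YourColumnName
        HAVING COUNT(*) > 1
        ORDER BY COUNT(*) DESC, YourColumnName
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(duplicate_counts(["x", "y", "x", "z", "y", "x"]))
```

For the sample list, "x" is reported with a count of 3 and "y" with a count of 2; "z" occurs once and is filtered out by the HAVING clause.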

I prefer using a CTE to delete duplicate rows from a SQL Server table.

Method 1 – by keeping original

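WITH CTE AS
(
    SELECT *,
           -- ORDER BY repeats the PARTITION BY columns, so all rows in a group tie
           -- and the surviving row is arbitrary; use ORDER BY RowID to keep the earliest.
           ROW_NUMBER() OVER (PARTITION BY col1, col2, col3 ORDER BY col1, col2, col3) AS RN
    FROM MyTable
)
DELETE FROM CTE WHERE RN <> 1;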

Method 2 – without keeping original

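WITH CTE AS
(
    SELECT *, R = RANK() OVER (ORDER BY col1, col2, col3)
    FROM MyTable
)
-- Rows that share a rank are duplicates of each other; delete every member of such groups.
DELETE CTE
WHERE R IN (SELECT R FROM CTE GROUP BY R HAVING COUNT(*) > 1);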
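
The RANK-based variant removes every member of a duplicated group, including the first occurrence. The sketch below illustrates this under assumptions: SQLite 3.25 or newer via Python's sqlite3 stands in for SQL Server, SQLite cannot delete through a CTE, so the CTE only identifies the RowIDs to remove, and the helper name drop_all_duplicated is hypothetical.

```python
import sqlite3


def drop_all_duplicated(rows):
    """Delete every row whose (Col1, Col2, Col3) occurs more than once (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTable ("
        " RowID INTEGER PRIMARY KEY AUTOINCREMENT,"
        " Col1 TEXT NOT NULL, Col2 TEXT NOT NULL, Col3 INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO MyTable (Col1, Col2, Col3) VALUES (?, ?, ?)", rows
    )
    # Identical rows tie in the ordering and therefore share one RANK value;
    # a rank that appears more than once marks a duplicated group.
    conn.execute(
        """
        WITH CTE AS (
            SELECT RowID,
                   RANK() OVER (ORDER BY Col1, Col2, Col3) AS R
            FROM MyTable
        )
        DELETE FROM MyTable
        WHERE RowID IN (
            SELECT RowID FROM CTE
            WHERE R IN (SELECT R FROM CTE GROUP BY R HAVING COUNT(*) > 1)
        )
        """
    )
    survivors = conn.execute(
        "SELECT RowID, Col1, Col2, Col3 FROM MyTable ORDER BY RowID"
    ).fetchall()
    conn.close()
    return survivors


if __name__ == "__main__":
    sample = [
        ("a", "x", 1), ("a", "x", 1), ("b", "y", 2),
        ("a", "x", 1), ("b", "y", 2), ("c", "z", 3),
    ]
    print(drop_all_duplicated(sample))
```

With the sample rows, only the row with RowID 6 survives, because it is the only combination that was never duplicated.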
