How to Find Duplicate Values in a SQL Table @ Johnson峰的部落格

Generally, it’s best practice to put unique constraints on a table to prevent duplicate rows. However, you may find yourself working with a database where duplicate rows have been created through human error, a bug in your application, or uncleaned data from external sources. This tutorial will teach you how to find these duplicate rows.

To follow along, you’ll need read access to your database and a tool to query your database.

Identify Duplicate Criteria
The first step is to define your criteria for a duplicate row. Do you need a combination of two columns to be unique together, or are you simply searching for duplicates in a single column? In this example, we are searching for duplicates across two columns in our users table: username and email.
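To see why the choice of criteria matters, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table layout (id, username, email) follows the article, but the rows and the duplicate_groups helper are hypothetical, made up only to show that single-column and two-column criteria can give different answers.

```python
import sqlite3

# Hypothetical sample rows; only the column names come from the article.
ROWS = [
    (1, "Pete", "pete@example.com"),
    (2, "Pete", "pete@work.example.com"),  # same username, different email
    (3, "Peter", "pete@example.com"),      # same email, different username
    (4, "Pete", "pete@example.com"),       # same username AND email as id 1
]

def duplicate_groups(conn, columns):
    """Return every value combination of `columns` that occurs more than once."""
    # Column names are hard-coded by the caller, never taken from user input.
    cols = ", ".join(columns)
    sql = (
        f"SELECT {cols}, COUNT(*) FROM users "
        f"GROUP BY {cols} HAVING COUNT(*) > 1 ORDER BY {cols}"
    )
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", ROWS)

by_email = duplicate_groups(conn, ["email"])             # single-column criterion
by_pair = duplicate_groups(conn, ["username", "email"])  # two-column criterion

print(by_email)  # [('pete@example.com', 3)]
print(by_pair)   # [('Pete', 'pete@example.com', 2)]
```

With email alone as the criterion, three rows count as duplicates of each other; with the username and email pair, only two do. Settling this before writing the real query avoids flagging rows that are legitimately distinct.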

Write Query to Verify Duplicates Exist
The first query we’re going to write is a simple one to verify whether duplicates do indeed exist in the table. For our example, the query looks like this:

SELECT username, email, COUNT(*)
FROM users
GROUP BY username, email
HAVING COUNT(*) > 1

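HAVING is important here because, unlike WHERE, it can filter on the result of an aggregate function. WHERE is applied to individual rows before they are grouped, so it cannot refer to COUNT(*); HAVING is applied to each group after the counts have been computed.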

If any rows are returned, that means we have duplicates. In this example, our results look like this:

USERNAME	EMAIL	COUNT
Pete	pete@example.com	2
Jessica	jessica@example.com	2
Miles	miles@example.com	2
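The verification step can be tried end to end with a short sketch, again using Python's sqlite3 module and an in-memory database. The six duplicate rows mirror the ids shown later in the article; the two unique rows (Ana and Raj) are hypothetical filler, and the ORDER BY is added here only to make the output order predictable. The sketch also shows what happens in SQLite if the aggregate condition is placed in WHERE instead of HAVING; other databases report this with their own error messages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "Pete", "pete@example.com"),
        (2, "Miles", "miles@example.com"),
        (3, "Ana", "ana@example.com"),      # hypothetical unique row
        (6, "Pete", "pete@example.com"),
        (7, "Raj", "raj@example.com"),      # hypothetical unique row
        (9, "Miles", "miles@example.com"),
        (12, "Jessica", "jessica@example.com"),
        (13, "Jessica", "jessica@example.com"),
    ],
)

# The article's query, plus ORDER BY for a deterministic result order.
duplicates = conn.execute(
    """
    SELECT username, email, COUNT(*)
    FROM users
    GROUP BY username, email
    HAVING COUNT(*) > 1
    ORDER BY username
    """
).fetchall()

# The same condition in WHERE is rejected: rows are filtered before grouping,
# so no count exists yet when WHERE is evaluated.
try:
    conn.execute(
        "SELECT username, email FROM users WHERE COUNT(*) > 1 GROUP BY username, email"
    )
    where_failed = False
except sqlite3.Error as exc:
    where_failed = True
    print("WHERE with an aggregate failed:", exc)

for row in duplicates:
    print(row)
```

An empty duplicates list would mean the table has no duplicates under the chosen criteria.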

List All Rows Containing Duplicates

In the previous step, our query returned a list of duplicates. Now, we want to return the entire record for each duplicate row.

To accomplish this, we’ll need to select the entire table and join that to our duplicate rows. Our query looks like this:

SELECT a.*
FROM users a
JOIN (SELECT username, email, COUNT(*)
      FROM users
      GROUP BY username, email
      HAVING COUNT(*) > 1) b
  ON a.username = b.username
 AND a.email = b.email
ORDER BY a.email

If you look closely, you’ll see that this query is not so complicated. The initial SELECT simply selects every column in the users table, and then inner joins it with the duplicated data table from our initial query. Because we’re joining the table to itself, it’s necessary to use aliases (here, we’re using a and b) to label the two versions.

Here is what our results look like for this query:

ID	USERNAME	EMAIL
12	Jessica	jessica@example.com
13	Jessica	jessica@example.com
2	Miles	miles@example.com
9	Miles	miles@example.com
1	Pete	pete@example.com
6	Pete	pete@example.com

Because this result set includes all of the row ids, we can use it to help us deduplicate the rows later.
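As a rough sketch of how those ids might be used, the snippet below runs the join query with Python's sqlite3 module and groups the ids by username and email. The article does not say which row in each group should be kept, so keeping the lowest id is purely an assumption made for illustration, as are the names ids_by_pair and removal_candidates and the extra sort on a.id. Nothing is deleted here; the sketch only lists candidates.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "Pete", "pete@example.com"),
        (2, "Miles", "miles@example.com"),
        (3, "Ana", "ana@example.com"),  # hypothetical unique row
        (6, "Pete", "pete@example.com"),
        (9, "Miles", "miles@example.com"),
        (12, "Jessica", "jessica@example.com"),
        (13, "Jessica", "jessica@example.com"),
    ],
)

# The article's join query; a.id is added to the ORDER BY so that ids within
# each duplicate group come back in ascending order.
rows = conn.execute(
    """
    SELECT a.id, a.username, a.email
    FROM users a
    JOIN (SELECT username, email
          FROM users
          GROUP BY username, email
          HAVING COUNT(*) > 1) b
      ON a.username = b.username
     AND a.email = b.email
    ORDER BY a.email, a.id
    """
).fetchall()

# Collect the ids belonging to each duplicated (username, email) pair.
ids_by_pair = defaultdict(list)
for row_id, username, email in rows:
    ids_by_pair[(username, email)].append(row_id)

# Assumption: keep the lowest id in each group and treat the rest as
# candidates for removal. Review them before running any DELETE.
removal_candidates = sorted(i for ids in ids_by_pair.values() for i in ids[1:])

print(dict(ids_by_pair))
print(removal_candidates)  # [6, 9, 13]
```

Which row to keep is a business decision (oldest, newest, most complete), so the rule in the last step would need to be adapted to your data.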

Article reposted from https://chartio.com/learn/databases/how-to-find-duplicate-values-in-a-sql-table/

Johnson峰
