Suprtool

A Discussion on Deleting Duplicates

Recently, there was an interesting discussion on recovering from an error that inadvertently created duplicate records in a detail dataset.

Below is the original post from Richard French at Axiom Systems Inc.

 From: Richard S. French 
 Sent: Wednesday, October 10, 2001 1:10 PM
 To: robelle-l
 Subject: [robelle-l] Suprtool - Deleting duplicates in detail dataset


 Using Suprtool, I inadvertently created duplicate records in my detail
 dataset. Is there an easier way to delete these duplicate
 records, other than what I have described below?
 Note - currently using Suprtool Version 4.3
 Thanks.

 Open database
 Get detail dataset records
 Sort records
 Duplicate none keys
 Output flat file
 Delete
 Xeq
 Input flat file
 Put to detail dataset
 Xeq

 Richard S. French

Richard's approach deletes every record from the dataset and then puts them all back, minus the duplicates. Glenn Cole suggested using Adager to erase the dataset instead, which would be faster. But Hans Hendriks from Robelle support came up with a way to do the entire job much faster:

Sent:   Wednesday, October 10, 2001 5:35 PM
To:     robelle-l
Subject:        Re: Suprtool - Deleting duplicates in detail dataset

Richard,

Your example deletes *all* records, then writes back *all* "originals". This
could be expensive if you have a large number of records, and only a few
duplicates.

If this is the case, I suggest you do the following:

Pass 1 - find duplicated records
  get dataset
  define longfield,1,80 (enough to make records unique)
  sort longfield
  duplicate only keys
  output dupfile,link
  xeq

Pass 2 - delete all records that have duplicates
  get dataset
  table duptab,longfield,sorted,dupfile
  if $lookup(duptab,longfield)
  delete
  Output $null
  xeq

Pass 3 - add back 1 of each previously duplicated record
  input dupfile
  sort longfield
  duplicate none keys  {in case there were 3 or more of some....}
  put dataset
  xeq

/Hans
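
To make the logic of the three passes easier to follow, here is a minimal Python sketch that models them on an in-memory list. It is not Suprtool code and says nothing about how Suprtool works internally. The function names are hypothetical, each record is reduced to a single string standing in for longfield, and the sketch assumes that "duplicate only keys" emits the second and later copies in each group of equal records, which is consistent with Hans's remark about "3 or more of some".

```python
def find_duplicates(dataset):
    """Pass 1: sort, then keep every record that repeats the one before it.

    Assumption: this models "duplicate only keys", so a record that
    occurs three times contributes two entries to the result.
    """
    ordered = sorted(dataset)
    return [rec for prev, rec in zip(ordered, ordered[1:]) if rec == prev]


def delete_duplicated(dataset, dupfile):
    """Pass 2: drop every record whose value is found in the lookup table."""
    duptab = set(dupfile)  # stands in for the table loaded from dupfile
    return [rec for rec in dataset if rec not in duptab]


def add_back_one_each(dataset, dupfile):
    """Pass 3: put back exactly one copy of each previously duplicated record."""
    # set() plays the role of "duplicate none keys" on the sorted dupfile.
    return dataset + sorted(set(dupfile))


def dedupe(dataset):
    """Run the three passes in order and return the cleaned record list."""
    dupfile = find_duplicates(dataset)
    survivors = delete_duplicated(dataset, dupfile)
    return add_back_one_each(survivors, dupfile)


records = ["A", "B", "B", "C", "C", "C", "D"]
print(find_duplicates(records))  # ['B', 'C', 'C']
print(dedupe(records))           # ['A', 'D', 'B', 'C']
```

Records that were never duplicated ("A" and "D" here) are left alone; only the duplicated ones are deleted and written back.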

And Richard confirmed the beauty of the solution:

Subject:      Re: Suprtool - Deleting duplicates in detail dataset
To: robelle-l

Thanks, Hans. That was just what I was looking for - as my detail dataset
had 1.5 million records and I had roughly 20,000 duplicates. I just tested
this and it worked great!
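
A rough count of dataset updates shows why the difference matters at this size. The Python sketch below is back-of-the-envelope arithmetic only, not a measurement. It assumes that "20,000 duplicates" means 20,000 extra copies and that each duplicated record occurs exactly twice; neither is stated in the thread. It also ignores the serial reads, which both approaches need. The function name is hypothetical.

```python
def update_counts(total_records, extra_copies):
    """Return (full_rewrite, selective) counts of deletes plus puts.

    Assumption: every duplicated record occurs exactly twice, so the
    selective approach deletes two records per duplicate and puts one back.
    """
    # Delete everything, then put back everything except the extra copies.
    full_rewrite = total_records + (total_records - extra_copies)
    # Delete both copies of each duplicated record, then put one back.
    selective = 2 * extra_copies + extra_copies
    return full_rewrite, selective


full_rewrite, selective = update_counts(1_500_000, 20_000)
print(full_rewrite)  # 2980000
print(selective)     # 60000
```

Under those assumptions the selective approach performs about one fiftieth of the deletes and puts.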