
Deduplication Helper - Deduplicate bibliographic references faster

This little tool makes manual deduplication of bibliographic references in EndNote much faster while keeping it transparent and accurate.

Recently my favorite deduplication tool, Deduklick, was down. As a medical information specialist specializing in systematic reviews and evidence synthesis, I need such tools (that is, deduplication) very often.
That's why I decided to test other applications and methods:

Comparison of deduplication tools

Here are the results:

Table comparing the deduplication results of references in ASySD, ASR Deduplicator, Covidence and manual deduplication
The results differ by almost 300 references, which is a lot if you have to go through all of them manually.

 

Leeds method is balanced, but doesn't include DOI

For my data set, the Leeds method in EndNote seems to be quite balanced. But this method is, scientifically speaking, quite old and doesn't take the DOI into account.
Newer versions of EndNote can mark duplicates by DOI. But, and this is where the Deduplication Helper comes in, DOIs coming from different sources are often formatted differently, which makes semi-automatic comparison difficult. The same goes for the page format.

The Deduplication Helper fixes DOIs and page numbers with one click! This makes manual deduplication in EndNote so much faster!
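To give an idea of what such a clean-up involves, here is a minimal sketch in Python. It is not the tool's actual code: the function names `normalize_doi` and `normalize_pages` and the specific rules (stripping URL and `doi:` prefixes, lowercasing, expanding abbreviated page ranges) are my assumptions for illustration only.

```python
import re

# Hypothetical rules for illustration; the real tool may clean fields differently.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(raw: str) -> str:
    """Reduce a DOI field to a bare lowercase DOI such as '10.1000/abc123'."""
    doi = _DOI_PREFIX.sub("", raw.strip())
    # Some exports append extra tokens after the DOI, e.g. ' [doi]'.
    doi = doi.split()[0] if doi else ""
    # Drop trailing punctuation and ignore case differences.
    return doi.rstrip(".,;").lower()


def normalize_pages(raw: str) -> str:
    """Bring a page range into the form 'start-end' with a full end page."""
    # Treat en and em dashes like a plain hyphen.
    pages = raw.strip().replace("\u2013", "-").replace("\u2014", "-")
    pages = re.sub(r"\s*-\s*", "-", pages)
    match = re.fullmatch(r"(\d+)-(\d+)", pages)
    if match:
        start, end = match.group(1), match.group(2)
        # Expand abbreviated ranges: '123-9' becomes '123-129'.
        if len(end) < len(start):
            end = start[: len(start) - len(end)] + end
        return f"{start}-{end}"
    # Leave anything else (single pages, article numbers) untouched.
    return pages
```

With rules like these, `https://doi.org/10.1000/ABC123` and `doi:10.1000/abc123.` end up as the same string, so EndNote's duplicate matching can treat them as equal.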

Download the free Deduplication Helper installer (.exe, 11 MB) to have this handy tool wherever you need it. The tool is documented on GitHub, where the raw Python file is also available.

INSTALL DEDUPLICATION HELPER 

How to use the Deduplication Helper

  1. EndNote: Create a group for each database
  2. Import your reference files into EndNote (version 20 and above) in the following order and add the references to the appropriate groups:
    1. Ovid databases: a) Medline, b) Embase, c) PsycINFO, d) ERIC
    2. Embase.com
    3. PubMed
    4. CINAHL databases
    5. Web of Science
    6. ProQuest databases
    7. Cochrane Reviews
    8. CENTRAL
    9. etc.
  3. Overwrite the "Name of database" field if necessary.
  4. Export all references to a .txt file
  5. Open the Deduplication Helper. Select the .txt file.
    The tool places a cleaned file in the original folder, marked with CLEANED at the end of the name.
  6. EndNote: Delete all references (empty trash too)
  7. EndNote: Import the cleaned .txt file
  8. If you like: sort the references by "Name of database" and add them to the groups
  9. Create a new group "DOI"
  10. Go to "All References" and sort them by DOI.
    Select all references with a DOI and add them to the group "DOI"
  11. Go to the group "DOI" and select all references there
  12. Preferences > Duplicates
    Select "Title" and "DOI"
  13. Library > Find Duplicates (click "Cancel" in the pop-up window)
  14. Delete all marked duplicates
  15. Delete group "DOI"
  16. Follow the steps of the Leeds method (described here).

The Leeds method is now much faster because all DOI duplicates have already been removed. In addition, its steps are easier and faster because the page format is identical in all references (and therefore comparable by machine).
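For readers who want to see the logic behind steps 5 and 12 to 14 outside of EndNote, the following Python sketch may help. It is an illustration under assumptions, not the tool's or EndNote's implementation: the `_CLEANED` file name pattern, the record dictionaries with `doi` and `title` keys, and the helper names are all hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def cleaned_path(original: str) -> Path:
    """Build a file name in the same folder with CLEANED at the end of the name.

    The exact pattern ('_CLEANED' before the extension) is an assumption.
    """
    path = Path(original)
    return path.with_name(f"{path.stem}_CLEANED{path.suffix}")


def duplicate_groups(records):
    """Group records that share both DOI and title, ignoring case and spacing.

    `records` is a list of dicts with hypothetical 'doi' and 'title' keys.
    Records without a DOI are skipped, like references outside the "DOI" group.
    """
    groups = defaultdict(list)
    for record in records:
        doi = record.get("doi", "").strip().lower()
        if not doi:
            continue
        title = " ".join(record.get("title", "").split()).casefold()
        groups[(doi, title)].append(record)
    # Only keys that occur more than once are duplicate candidates.
    return {key: recs for key, recs in groups.items() if len(recs) > 1}
```

A quick usage example: `cleaned_path("C:/reviews/export.txt").name` gives `export_CLEANED.txt`, and `duplicate_groups(records)` returns only those DOI and title pairs that appear in more than one record. The matching only works reliably once the DOIs have been brought into the same format, which is the whole point of cleaning the file first.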

Let me know what you think of it:

Tanya Karrer, M.A, MAS ALIS
Information Specialist
University of Bern
Medical Library
tanya.karrer(at)unibe.ch