"Wir alle haben eine Geschichte zu erzählen"

Deduplication Helper - Deduplicate bibliographic references faster

This little tool makes manual deduplication of bibliographic references in EndNote much faster while keeping the process transparent and accurate.

Recently my favorite deduplication tool, Deduklick, was down. As a medical information specialist specializing in systematic reviews and evidence synthesis, I need deduplication tools very often.
That's why I decided to test other applications and methods:

ASySD (Automated Systematic Search Deduplicator): https://camarades.shinyapps.io/ASySD/ (free)
Systematic Review Accelerator Deduplicator: https://sr-accelerator.com/#/deduplicator (free)
Covidence built-in deduplicator: https://www.covidence.org/ (paid), used as a cross-check
Manual deduplication in EndNote (following the Leeds method): PDF

Comparison of deduplication tools

Here are the results:

Table comparing the deduplication results of references in ASySD, SRA Deduplicator, Covidence and manual deduplication
They differ by almost 300 references, which is a lot if you have to go through all of them.

Leeds method is balanced, but doesn't include DOI

For my data set, the Leeds method in EndNote seems to be quite balanced. But this method is, scientifically speaking, quite old and doesn't take DOIs into account.
Newer versions of EndNote are able to mark duplicates by DOI. But, and this is where the Deduplication Helper comes in, DOIs coming from different sources often aren't written in the same format, which makes semi-automatic comparison difficult. The same goes for the page format.

The Deduplication Helper fixes DOIs and page numbers with one click! This makes manual deduplication in EndNote so much faster!
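
To illustrate what "fixing" DOIs and page numbers can mean, here is a minimal sketch in Python. It is not the code of the Deduplication Helper (that is on GitHub); the function names `normalize_doi` and `normalize_pages` are hypothetical, and I assume here that the target format is a bare lowercase DOI and a full page range such as 123-129.

```python
import re


def normalize_doi(raw: str) -> str:
    """Reduce a DOI field to a bare, lowercase DOI (hypothetical helper)."""
    doi = raw.strip().lower()
    # Drop resolver or label prefixes such as "https://doi.org/" or "doi:".
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi)
    # Keep only the first token, so trailing labels like " [doi]" are dropped.
    doi = doi.split()[0] if doi else ""
    # Remove a trailing full stop left over from citation-style exports.
    return doi.rstrip(".")


def normalize_pages(raw: str) -> str:
    """Write page ranges as full ranges with a plain hyphen (assumed target format)."""
    pages = raw.strip().replace("\u2013", "-").replace("\u2014", "-")
    match = re.fullmatch(r"(\d+)\s*-\s*(\d+)", pages)
    if not match:
        # Article numbers such as "e1234" or single pages are left untouched.
        return pages
    start, end = match.groups()
    if len(end) < len(start):
        # Expand an abbreviated end page: "123-9" becomes "123-129".
        end = start[: len(start) - len(end)] + end
    return f"{start}-{end}"
```

With this sketch, "https://dx.doi.org/10.1000/ABC123" and "10.1000/abc123 [doi]" both end up as the same string, so a comparison by DOI can match them.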

Download the free Deduplication Helper installer (.exe, 11 MB) to have this handy tool wherever you need it (it is documented on GitHub, raw Python file available there).

INSTALL DEDUPLICATION HELPER

How to use the Deduplication Helper

EndNote: Create a group for each database
Import your reference files into EndNote (version 20 and above) in the following order and add the references to the appropriate groups:
1. Ovid databases: a) MEDLINE, b) Embase, c) PsycINFO, d) ERIC
2. Embase.com
3. PubMed
4. CINAHL databases
5. Web of Science
6. ProQuest databases
7. Cochrane Reviews
8. CENTRAL
9. etc.
Overwrite "Name of database" field if necessary.
Export all references to a .txt file
Open the Deduplication Helper. Select the .txt file.
The tool places a cleaned file in the original folder, marked with CLEANED at the end of the name.
EndNote: Delete all references (empty trash too)
EndNote: Import the cleaned .txt file
If you like: sort the references by "name of database" and add them to the groups
Create a new group "DOI"
Go to "all references", sort them by DOI.
Select all references with DOI and add them to the group "DOI"
Go to group "DOI", mark all references there
Preferences > Duplicates
Select "Title" and "DOI"
Library > Find Duplicates (click "Cancel" in the pop-up window)
Delete all marked duplicates
Delete group "DOI"
Follow the steps of the Leeds method (described here).

This is now much faster because all DOI duplicates are already removed. In addition, the following steps are easier/faster because the page format is identical in all references (and comparable by machine reading).
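
The "Title" and "DOI" comparison in the steps above can be pictured with a small sketch. This is not how EndNote works internally (I don't know its matching algorithm); it only shows the idea of grouping references that share the same DOI and title. The record layout (dictionaries with `doi` and `title` keys) and the function names are assumptions for illustration.

```python
import re
from collections import defaultdict


def title_key(title: str) -> str:
    """Lowercase a title and collapse punctuation, so small differences don't matter."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def find_duplicates(records):
    """Return lists of record indices that share the same DOI and title key."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        doi = record.get("doi", "").strip().lower()
        if not doi:
            # References without a DOI are left for the Leeds method steps.
            continue
        groups[(doi, title_key(record.get("title", "")))].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]


records = [
    {"doi": "10.1000/abc123", "title": "A randomized trial."},
    {"doi": "10.1000/ABC123", "title": "A Randomized Trial"},
    {"doi": "", "title": "A randomized trial"},
    {"doi": "10.1000/xyz789", "title": "Another study"},
]
duplicate_groups = find_duplicates(records)
```

In this example the first two records form one duplicate group, and the reference without a DOI is skipped. The matching only works this simply when the DOIs are already in one format, which is why the cleaning step comes first.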

Let me know what you think of it:

Tanya Karrer, M.A, MAS ALIS
Information Specialist
University of Bern
Medical Library
tanya.karrer(at)unibe.ch