Deduplication Helper - Deduplicate bibliographic references faster
This little tool makes manual deduplication of bibliographic references in EndNote much faster while keeping it transparent and accurate.
Recently my favorite deduplication tool, Deduklick, was down. As a medical information specialist specializing in systematic reviews and evidence synthesis, I need deduplication tools very often.
That's why I decided to test other applications and methods:
- ASySD (Automated Systematic Search Deduplicator): https://camarades.shinyapps.io/ASySD/ (free)
- Systematic Review Accelerator Deduplicator: https://sr-accelerator.com/#/deduplicator (free)
- Covidence in-built deduplicator: https://www.covidence.org/ (paid) - to double check
- Manual deduplication in EndNote (following the Leeds method): PDF
Comparison of deduplication tools
Here are the results:
They differ by almost 300 references, which is a lot if you have to go through all of them.
Leeds method is balanced, but doesn't include DOI
For my data set, the Leeds method in EndNote seems to be quite balanced. But this method is - scientifically speaking - quite old and doesn't take DOIs into account.
Newer versions of EndNote are able to mark duplicates by DOI. But - and this is where the Deduplication Helper comes in - DOIs coming from different sources are often formatted differently, which makes semi-automatic comparison difficult. The same goes for the page format.
The Deduplication Helper fixes DOIs and page numbers with one click! This makes manual deduplication in EndNote so much faster!
Download the free Deduplication Helper installer (.exe, 11 MB) to have this handy tool wherever you need it (it is documented on GitHub, where the raw Python file is also available).
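To give an idea of what "fixing" DOIs and page numbers can mean, here is a minimal Python sketch. It is not the code of the Deduplication Helper itself: the function names `normalize_doi` and `normalize_pages` are made up for illustration, and the target formats (bare lowercase DOI, full page ranges such as 123-129) are assumptions of mine.

```python
import re

# Prefixes that databases commonly put in front of a DOI (assumed list).
DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(raw: str) -> str:
    """Reduce a DOI to its bare, lowercase form (hypothetical helper)."""
    doi = DOI_PREFIX.sub("", raw.strip())
    # Drop a trailing " [doi]" label, as found in some exports.
    doi = re.sub(r"\s*\[doi\]$", "", doi, flags=re.IGNORECASE)
    # DOIs are case-insensitive, so lowercasing is safe for comparison.
    return doi.lower().rstrip(".")


def normalize_pages(raw: str) -> str:
    """Write page ranges as 'start-end' with a full end page (hypothetical helper)."""
    # Replace en dashes, em dashes, minus signs etc. with a plain hyphen.
    pages = re.sub(r"\s*[\u2010-\u2015\u2212-]\s*", "-", raw.strip())
    match = re.fullmatch(r"(\d+)-(\d+)", pages)
    if match:
        start, end = match.groups()
        # Expand abbreviated end pages: 123-9 becomes 123-129.
        if len(end) < len(start):
            end = start[: len(start) - len(end)] + end
        return f"{start}-{end}"
    # Anything else (e.g. article numbers like e45) is left unchanged.
    return pages


print(normalize_doi("https://doi.org/10.1000/ABC.123"))  # 10.1000/abc.123
print(normalize_pages("123-9"))                          # 123-129
```

Once both fields follow one format, two records of the same article compare as equal, which is what duplicate detection by DOI depends on.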
How to use the Deduplication Helper
- EndNote: Create a group for each database
- Import your reference files into EndNote (version 20 and above) in the following order and add the references to the appropriate groups:
1. Ovid databases: a) MEDLINE, b) Embase, c) PsycINFO, d) ERIC
2. Embase.com
3. PubMed
4. CINAHL databases
5. Web of Science
6. ProQuest databases
7. Cochrane Reviews
8. CENTRAL
9. etc.
- Overwrite the "Name of database" field if necessary.
- Export all references to a .txt file
- Open the Deduplication Helper. Select the .txt file.
The tool places a cleaned file in the original folder, marked with CLEANED at the end of the name.
- EndNote: Delete all references (empty trash too)
- EndNote: Import the cleaned .txt file
- If you like: sort the references by "name of database" and add them to the groups
- Create a new group "DOI"
- Go to "all references", sort them by DOI.
Select all references with DOI and add them to the group "DOI"
- Go to group "DOI", mark all references there
- Preferences > Duplicates
Select "Title" and "DOI"
- Library > Find Duplicates (click "Cancel" in the pop-up window)
- Delete all marked duplicates
- Delete group "DOI"
- Follow the steps of the Leeds method (described here).
This is now much faster because all DOI duplicates are already removed. In addition, the following steps are easier/faster because the page format is identical in all references (and comparable by machine reading).
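The idea behind comparing by "Title" and "DOI" in the steps above can be sketched in a few lines of Python. This is only an illustration of the principle, not how EndNote works internally: the function `find_duplicates`, the record layout (dictionaries with `doi` and `title` keys), and the sample data are all assumptions.

```python
from collections import defaultdict


def find_duplicates(records):
    """Return groups of records sharing the same DOI and title (hypothetical helper)."""
    groups = defaultdict(list)
    for rec in records:
        doi = rec.get("doi", "").strip().lower()
        if not doi:
            # Records without a DOI are left for the manual Leeds steps.
            continue
        # Compare titles without regard to case or extra whitespace.
        title = " ".join(rec.get("title", "").lower().split())
        groups[(doi, title)].append(rec)
    return [group for group in groups.values() if len(group) > 1]


# Invented sample records, for illustration only.
records = [
    {"doi": "10.1000/abc.123", "title": "Aspirin for headache", "db": "Medline"},
    {"doi": "10.1000/ABC.123", "title": "Aspirin  for Headache", "db": "Embase"},
    {"doi": "10.1000/xyz.999", "title": "Another study", "db": "CINAHL"},
    {"doi": "", "title": "No DOI here", "db": "CENTRAL"},
]

for group in find_duplicates(records):
    print([rec["db"] for rec in group])  # ['Medline', 'Embase']
```

The match only works when the DOIs have the same form in every record, which is why the cleaning step comes first.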
Let me know what you think of it:
Tanya Karrer, M.A, MAS ALIS
Information Specialist
University of Bern
Medical Library
tanya.karrer(at)unibe.ch