Working with Large Datasets in Derwent Data Analyzer

Overview:

When you work with large datasets in Derwent Data Analyzer, you may see an error message saying that you are running out of RAM. The following guidelines may help you free up system memory and continue working. Which guidelines apply depends on your computer hardware, your operating system, your analytical needs, and where you are in the workflow. Strategies discussed in this document include:

  • Use a 64-bit Operating System and use the 64-bit version of Derwent Data Analyzer with at least 16GB RAM and preferably more.
  • Close other programs that are not essential to your analysis.
  • Import a small number of fields at first; use “Import More Fields” to add other fields later.

Use a 64-bit OS, 64-bit Derwent Data Analyzer, and Install a Lot of RAM

Derwent Data Analyzer is available in both a 32-bit and a 64-bit version, and is subject to the per-process memory usage limits of the operating system.

If you have a 32-bit version of Windows, you must use the 32-bit version of Derwent Data Analyzer. The maximum amount of memory that 32-bit Derwent Data Analyzer can use on 32-bit Windows is 2 GB, which is not sufficient for serious analysis. On a 64-bit Windows system, 32-bit Derwent Data Analyzer can use up to 3 GB, which is still not sufficient for serious analysis. These limits apply regardless of how much physical memory the computer has installed.

If you have a 64-bit operating system, you should use the 64-bit version of Derwent Data Analyzer with at least 16 GB of RAM, and preferably more. On most professional versions of 64-bit Windows, 64-bit Derwent Data Analyzer can use up to 128 GB of RAM.
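The limits described above can be summarized as a small lookup. The Python sketch below is illustrative only: the function names `max_usable_memory_gb` and `has_recommended_ram` are hypothetical, and the figures are simply the ones quoted in this section (2 GB, 3 GB, 128 GB, 16 GB). Nothing here is queried from Derwent Data Analyzer or from Windows.

```python
def max_usable_memory_gb(os_bits: int, app_bits: int) -> int:
    """Return the per-process memory ceiling (in GB) quoted in this guide.

    The values are copied from the text above; they are not measured.
    """
    if os_bits not in (32, 64) or app_bits not in (32, 64):
        raise ValueError("bitness must be 32 or 64")
    if app_bits == 64 and os_bits == 32:
        raise ValueError("a 64-bit application cannot run on a 32-bit OS")
    if app_bits == 32:
        # 32-bit application: 2 GB on 32-bit Windows, 3 GB on 64-bit Windows
        return 2 if os_bits == 32 else 3
    # 64-bit application on most professional editions of 64-bit Windows
    return 128


def has_recommended_ram(installed_gb: float) -> bool:
    """True if the machine meets the 16 GB minimum recommended in this guide."""
    return installed_gb >= 16


if __name__ == "__main__":
    for os_bits, app_bits in [(32, 32), (64, 32), (64, 64)]:
        limit = max_usable_memory_gb(os_bits, app_bits)
        print(f"{app_bits}-bit application on {os_bits}-bit OS: up to {limit} GB")
```

The point of the lookup is that the ceiling depends on the bitness of both the operating system and the application, not on how much physical memory is installed.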

Close Non-essential Programs and *.vpt Files

If you have other applications running that are not essential to your workflow, close them to make more system memory available for Derwent Data Analyzer to use. If you have more than one Derwent Data Analyzer data file (*.vpt) open, close all open data files except the one in which you are currently working.

Maintain a Dataset with as Few Fields as Possible

When you maintain a dataset with only the essential fields, you also keep the size (in MB) of the *.vpt file on disk as small as possible. This is especially important when you import raw data files: it is advisable to import only the “Title” field at first, so that you do not run out of memory before you save the *.vpt file to disk. Once you have saved your dataset as a *.vpt file, exit and restart Derwent Data Analyzer (to free up as much memory as possible) and open your saved dataset. You can then use “Import More Fields” (on the “Refine” ribbon) to add the other fields you need.

Here is a useful table to guide you in selecting a minimum set of fields:

Minimum Field Set              Additional Fields Needed              Additional Fields Needed
for the Cleanup Macro          for the Reporting Macros [defaults]   for the Pivot Charts
-----------------------------  ------------------------------------  ---------------------------
1. Assignee Code - DWPI        1. Country [Priority Countries]       1. Priority Number (long)
2. Assignee/Applicant          2. Year [Priority Years (earliest)]   2. Family Member Years
3. Assignee/Applicant (long)   3. Technology [DWPI Manual Codes]     3. Family Member Countries
4. DWPI Accession Number
5. Inventor

Use discretion when choosing which fields to add. Whenever possible, avoid importing long-text fields (e.g. Patent Claims or Abstracts).

Fields with a very large number of items will also consume a lot of system resources. Examples of such fields include:

  • Fields with “NLP” Words or Phrases
  • “Cited References” fields and fields derived from Cited References (e.g. “Cited Authors” or “Cited Journals”)
  • Authors, Inventors, Full Organization Names, or fields with uncontrolled vocabulary terms (see note below)

Note: Delete existing large fields that are not in use, but only if they can be readily imported again using “Import More Fields.” Take care not to delete “Cleaned” fields or fields that have Groups you want to keep. “Cleaned” fields cannot be readily re-imported with “Import More Fields”[1], but the originating field on which the cleaning was done can usually be safely deleted.

Fields that include a lot of items also tend to have long tails in their record frequency distributions. That is, the vast majority of the terms occur in only one or two records. When this is the case, consider creating a group of all terms that occur in at least N records. You can then use “Create Field using Group Items” to make a new field with far fewer items, and delete the originating, much larger field.
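The idea behind the "at least N records" threshold can be illustrated outside the application. The Python sketch below is not Derwent Data Analyzer functionality and does not use its file format; it assumes each record is simply a list of terms, and the function names `frequent_terms` and `reduce_field` are hypothetical. It counts how many records each term appears in, keeps the terms that meet the threshold, and builds a smaller field from them.

```python
from collections import Counter


def frequent_terms(records, min_records):
    """Return the set of terms that occur in at least `min_records` records."""
    counts = Counter()
    for terms in records:
        # Use a set so a term repeated within one record is counted once.
        counts.update(set(terms))
    return {term for term, n in counts.items() if n >= min_records}


def reduce_field(records, keep):
    """Build a smaller field that retains only the terms in `keep`."""
    return [[term for term in terms if term in keep] for terms in records]


# Made-up sample data: three records with a few terms each.
records = [
    ["battery", "anode", "lithium"],
    ["battery", "cathode"],
    ["battery", "lithium", "separator"],
]

keep = frequent_terms(records, min_records=2)
reduced = reduce_field(records, keep)

print(sorted(keep))
print(reduced)
```

In this sample, the terms that appear in only one record ("anode", "cathode", "separator") form the long tail and are dropped, so the reduced field holds two distinct items instead of five.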



[1] If you saved your List Cleanup work as a thesaurus, you can re-import the original field, and run your saved thesaurus on that field.