Working with Large Data Files

Q can handle very large files; we have tested a file with 130,000,000 cases. However, the larger the data file, the slower Q becomes. Generally, you should not notice any problems unless you are working with large data files (e.g., 100,000+ cases, 200+ MB) or with large numbers of variables (e.g., 10,000 variables and 10,000 cases). Here are a few things to consider:

  • If Q is slow, restart your computer and work with as few other programs open as possible.
  • You will need to be running a 64-bit version of Windows, because otherwise Q will be unable to use more than 2 GB of your machine's RAM. (This is a limitation of Windows.)
  • If you are working with an SPSS or Excel data file and it is surprisingly large, you can often substantially reduce its size as follows:
    1. Select File > New Project.
    2. Select File > Data Sets > Add to Project > From File.
    3. In the Data Import Window:
      1. Select Use original data file structure.
      2. Untick Tidy Up Variable Labels and Strip HTML from Labels.
      3. Click OK.
    4. Select Tools > Save Data as SPSS/CSV File.
    5. Use File > Open to open the initial Q Project again.
    6. Select File > Data Sets > Update to switch your project to the new file.
  • Reduce the number of variables and/or cases in the data file:
    1. Delete cases that you do not need via the Data tab (e.g., incomplete respondents).
    2. Hide variables that you do not need (see How to Hide Variables and Questions).
    3. Select Tools > Save Data as SPSS/CSV File.
    4. Open the initial Q Project again.
    5. Select File > Data Sets > Update to switch your project to the new file.
  • You may need extra memory (RAM) in your computer. As a rough guide, multiply the number of variables in your file by the number of cases, divide the result by 100,000,000, and add 2. The answer is the number of GB of RAM you will need.
  • Read the pages Tables Take a Long Time to Compute and Files Take a Long Time to Open.
  • Avoid unnecessary linkages between variables. That is, avoid situations where A refers to B, B refers to C, C refers to D, etc. In general, with huge projects it is better to have fewer variables that each contain many calculations than to have many variables that each contain only a small part of the calculations. This is because the more variables a calculation involves, the more of your computer's memory it uses.
  • If your project is still too slow after you have acted on the above suggestions, send it to us and we will see whether we can help. We may be able to suggest improvements to your project, or we may improve Q.
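
The rough RAM guide given in the list above can be worked through as a short calculation. The following is a minimal Python sketch of that rule of thumb only; the function name estimate_ram_gb is hypothetical and not part of Q, and the result is no more precise than the guide itself.

```python
def estimate_ram_gb(num_variables: int, num_cases: int) -> float:
    """Estimate the GB of RAM needed, using the rough guide on this page.

    Rule of thumb: (variables x cases) / 100,000,000 + 2.
    This is an approximation, not a measurement of what Q actually uses.
    """
    if num_variables < 0 or num_cases < 0:
        raise ValueError("num_variables and num_cases must be non-negative")
    return (num_variables * num_cases) / 100_000_000 + 2


if __name__ == "__main__":
    # Example from this page: 10,000 variables and 10,000 cases.
    print(estimate_ram_gb(10_000, 10_000))  # 3.0
```

For the example of 10,000 variables and 10,000 cases, the guide gives 100,000,000 / 100,000,000 + 2 = 3 GB.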