Working with Large Data Files
Q can handle very large files; we have tested a file with 130,000,000 cases. However, the larger the data, the slower Q becomes. Generally, you should not notice any problems unless you are working with a large data file (e.g., 100,000+ cases, 200+ MB) or with large numbers of variables (e.g., 10,000 variables and 10,000 cases). Here are a few things to consider:
- If Q is slow, restart your computer and work with as few other programs open as possible.
- You will need to be running a 64-bit version of Windows, because otherwise Q will be unable to use more than 2GB of your machine's RAM. (This is a limitation of Windows.)
- If you are working with an SPSS or Excel data file and it is surprisingly large, you can often substantially reduce its size as follows:
- Select File > New Project.
- Select File > Data Sets > Add to Project > From File.
- In the Data Import Window:
- Select Use original data file structure.
- Untick Tidy Up Variable Labels and Strip HTML from Labels.
- Click OK.
- Select Tools > Save Data as SPSS/CSV File.
- Use File > Open to open the initial Q Project again.
- Select File > Data Sets > Update to switch your project to the new file.
- Reduce the number of variables and/or cases in the data file.
- You may need extra memory (RAM) in your computer. As a rough guide, multiply the number of variables in your file by the number of cases, divide by 100,000,000, and add 2; the result is approximately how many GB of RAM you will need.
- Some specific operations in Q, such as table calculations, can utilize extra cores in your CPU. If you find that an operation is slow, perform this test to see if using a CPU with more cores will help:
- Open Windows' Task Manager.
- Click More details.
- Go to the Performance tab, CPU section.
- Right-click on the graph and select Change graph to > Logical processors (each "core" of your CPU may have one or more "logical processors", and each allows extra parallel work to be done by Q).
- Perform the slow operation in Q while keeping an eye on the graph.
- If all of the logical processors are fully utilized (near 100%), the operation in Q can take advantage of extra cores, and performing it on a CPU with more cores may make it faster.
- If only one or two logical processors are fully utilized and the rest are not, the operation in Q is effectively using a single core. It will not go any faster on a CPU with more cores, but it will benefit from a CPU with a higher clock speed (typically rated in GHz, gigahertz).
- Read Tables Take a Long Time to Compute and Files Take a Long Time to Open.
- Avoid unnecessary linkages between variables. That is, avoid situations where A refers to B, B refers to C, C refers to D, and so on. In huge projects, it is generally better to have fewer variables that each contain lots of calculations than to have lots of variables that each contain only a small part of the calculation. This is because the more variables that are involved in a calculation, the more of your computer's memory is used up.
- If your project is still too slow after you have acted on the above suggestions, send it to us and we will see whether we can help. We may be able to suggest improvements to your project, or we may improve Q.
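The RAM guide above (variables × cases ÷ 100,000,000, plus 2) can be sketched as a short calculation. This is purely illustrative; the function name is ours, not part of Q:

```python
def estimated_ram_gb(num_variables, num_cases):
    """Rough RAM requirement in GB, per the rule of thumb above:
    variables * cases / 100,000,000 + 2."""
    return num_variables * num_cases / 100_000_000 + 2

# Example: a file with 1,000 variables and 500,000 cases
print(estimated_ram_gb(1_000, 500_000))  # 7.0 (GB)
```

So a 1,000-variable, 500,000-case file suggests a machine with roughly 7 GB of RAM or more.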
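If you prefer to check core counts programmatically rather than through Task Manager, Python's standard library can report the number of logical processors. This is a quick sketch, unrelated to Q itself:

```python
import os

# os.cpu_count() returns the number of logical processors visible to
# the operating system; on CPUs with hyper-threading this is typically
# twice the number of physical cores.
logical = os.cpu_count()
print(f"Logical processors: {logical}")
```

This tells you how many logical processors the Task Manager graph will show, but not how well an operation parallelizes; for that, you still need to watch utilization during the slow operation.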
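The point about variable linkages can be illustrated in miniature (plain Python, not Q's variable system): each derived variable in a chain materializes a full-length copy of the data, while a single combined calculation keeps only the final result in memory.

```python
data = list(range(100_000))

# Chained style: A refers to B refers to C; all three full-size
# intermediate lists are alive in memory at once.
a = [x * 2 for x in data]
b = [x + 1 for x in a]
c = [x / 3 for x in b]

# Combined style: one variable containing all of the calculations,
# so only the final list is materialized.
c_combined = [(x * 2 + 1) / 3 for x in data]

print(c == c_combined)  # True
```

Both styles produce identical results; the combined style simply avoids holding the intermediate steps in memory.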
Using QDat Files (Experimental)
Our new QDat file format can substantially speed up the loading of a project. To convert your raw data file to this format:
- Import the raw data file into Q 5.12.0 or later (projected availability: later in 2021).
- After importing, select Use original data file structure.
- Select Tools > Save Data as SPSS/CSV File... and click OK.
- In the file save dialog (which may be slow to appear), change Save as type to QDat Files (*.QDat).
- Click Save to save the new data file.
- Open your original project.
- Select File > Data Sets > Update > your file.
- Select the new QDat file and click Open.
You can also export your current project to a QDat file and use that in Q instead of your original data, but keep in mind that all banners, filters, and other constructed variables will be hard-coded and will not update with new or revised data.