SUMIF function does not appear to work in 11.7.6, copy calcs

I purchased the program due to needing to work with several million+ row spreadsheets. It loads and saves them very quickly. I’m using Windows 7 Ultimate 64 bit and the 64 bit v. 11.7.6 of GS-CALC.

Problem 1: SUMIF formula returns an error
However, when I try and use SUMIF to collect data from a sheet after Sheet 1, I get a VALUE error.

My workbook is set up as follows:
Sheet 1, which is named Events, has record numbers in rows 2:467775 in column A. In column W in row 2, I’m trying to collect data from Sheet 4, which is named NM. NM has record numbers in column A in rows 2:20711, and numeric data in columns B and C in the same rows. The record numbers on Sheets 2-4 vary, some will have more than one of the same, some will not have all of the record numbers from Sheet 1. I use SUMIF to compare, collect, and condense numeric data for the various records so I can have a total for each record on Sheet 1.

In the wizard for formulas, it describes SUMIF as:

sumIf(if_range, criteria, [sum_range])

Sums cells that meet the specified criteria. If the ‘sum_range’ range is omitted, cells from the ‘if_range’ range are used. The ‘sum_range’ and ‘if_range’ ranges must have the same number of columns and rows. The criteria can be one of the following:

(1) a number or a text string,

(2) a text string beginning with the =,>,>=,<,<= operators - numeric and date/time searches requires unformatted numbers and generic date/time strings (YYYY-MM-DD),

(3) a search pattern: a text string containing special characters ‘?’ (any character) or ‘*’ (any string, including an empty string). To search for ? or * place a tilde (~) before them.

=sumIf({1,2;3,2}, 2,) returns 4

=sumIf({1,2;3,4}, “>2”,) returns 7

=sumIf({“abcde”,“def”;“abc”,“a”}, “?bc*”, {2, 4; 3, 1}) returns 5

=sumIf({“abcde”,“def”;“bc”,“a”}, “bc”, {2, 4; 3, 1}) returns 3

=sumIf({“ABcde”,“def”;“Bc”,“a”}, “>=bc”, {2, 4; 3, 1}) returns 7

The syntax is listed as : =sumIf(if_range, criteria, [sum_range])
I tired to collect it from Sheet 4 column C as follows using both the wizard function under formula and also manually typing in the data:
=sumIf(NM!A2:A20711,Events!A2:A467775,NM!C2:C20711)
I have used the above formula on Open Office Calc and it returns the proper result.

I experimented with inserting the Sheet name before the cell ranges and this made no difference:
=sumIf(NM!A2:NM!A20711,Events!A2:Events!A467775,NM!C2:NM!C20711)

I tried copying & pasting the Sheet 4 data into blank columns (AE, AF, and AG) on Sheet 1 and got the same result:
=sumIf(AE2:AE20711,A2:A467775,AG2:AG20711)

Problem 2: Auto-filling rows with SUMIF calculations
If I can cross this bridge, the next thing I need to do is to have this calculation repeated for rows 3:467775 on Sheet 1. Is there a way to easily do so? On Open Office Calc, I can double-click the “handle” of the cell with the formula and it automatically copies the formula for the remaining rows.

Thank you for any help, I look forward to using the program.

Ad. 1)
The 2nd argument has to be a single value if the function is to be used as the “non-array” version. It seems this is how (let’s say: the most popular approach to) the specification of the function looks like.
The problem here is that the SUMIF (and COUNTIF, for that matter) function hasn’t been implemented (so far) to be used as the array versions.
The update (11.7.7) should be published tomorrow.

However, I would recommend to not use the SUM(SUMIF()) construction with the array version of SUMIF.
Instead, use a series of the SUMIF functions then the SUM function at the end of the calculation chain.
If you have millions of cells to process and a multicore processor, this should be significantly faster. (Also probably significantly faster than that non-array SUMIF version accepting an array as the 2nd argument.)

Ad. 2)
It works more or less the same. The “Insert > Series” command can be used to create series of numbers, dates and formulas (optionally with some interleaving empty cells etc.)
Please see the “Series” help topic for details and examples. In brief: select a cell range with a given formula in the top-left cell and press F8.
(Unless you meant something else.)

Thank you for the reply.

I’m not as knowledgeable as I should be on spreadsheets, so I’ll try and express my needs plainly (thinking out loud here). I have combined all data onto one sheet using more columns to cut down on the number of rows. I now have a maximum of about 800K rows on the largest worksheet, the smaller one I’m working on now has 540K rows and about 80 columns.
In Column A, I have a series of record numbers. They only appear once in this column and there are a total of 305,399 rows with record numbers.
In Column BN, I have a series of record numbers that are within the range of Column A. Some appear more than once, as there are a total of 540,420 rows.
In Columns BO and BP, I have a series of either a 0 or a 1 to be added to the applicable record in Column A. They total the same 540,420 rows.

This situation repeats for Columns BQ (record #), BR (data), and BS (data) for the second data set, and Columns BT (record #), BU (data), and BV (data) for the third one. They total 210,045 and 11,139 rows respectively.

What I want to end up with are two columns of data that capture the subtotals for each matching Column A record. The first column will be from BO+BR+BU and the second column from BP+BS+BV.

I’m trying to capture subtotals in two columns that can then be added to the total columns. If I can use a formula to get the numbers directly into the totals, that would be great.

Thanks also for the F8 tip.

The following formula appears to be working, after selecting the 305,399 cells where I wanted the formula and hitting F8, it is updating one column now and shows about 50% progress bar after a few minutes (I see I only had 2 threads/CPU set, forgot to advise I’m running an AMD A8-3850 CPU). Updating a smaller Open Office file has taken over 4 hours, so GS-Calc is far ahead at this point :smiley: :

=sum((sumIf(BN2:BN54020,A2,BO2:BO540420)),(sumIf(BQ2:BQ210045,A2,BR2:BR210045)),(sumIf(BT2:BT11139,A2,BU2:BU11139)))

When I scrolled down while it was updating, I noted a SYNTAX error in cell #152700, exactly halfway down the selected cells, but it still says updating. I’m going to let it run and will check back later.

For the second column, I’m going to use this formula:

=sum((sumIf(BN2:BN54020,A2,BP2:BP540420)),(sumIf(BQ2:BQ210045,A2,BS2:BS210045)),(sumIf(BT2:BT11139,A2,BV2:BV11139)))

Thinking out loud, I’m comparing the specified range in BN against A2, then the specified range in BQ against A2, then the specified range in BT against A2, and summing all three up. Am I doing this as you suggested? Is there a better way to write the formula? How do I place the SUM function after the calculations?

After letting the updating finish, I’m losing data. My total for the first set by the three columns is 110,490, and for the second set is 1,236.
For the two columns I added up using the SUMIFs, I got 27,575 and 117.

The errors no longer show up, and it appears the proper formula is in place for the second half of the cells. Not sure what has happened.

If you’re using the “Insert Series (F8)” command for this formula:
=sum((sumIf(BN2:BN54020,A2,BP2:BP540420)),(sumIf(BQ2:BQ210045,A2,BS2:BS210045)),(sumIf(BT2:BT11139,A2,BV2:BV11139)))
as I understand, you probably have to use absolute references for the search and summing ranges, that is BN$2:BN$54020 etc. Otherwise they might be updated (to BN3:BN54021 etc.) depending on your current “Settings > Modify copied (…)” option selection.

Is this a file created entirely in GS-Calc or copied/imported from something else? If there was some “syntax” error, it’s still there unless you corrected it.
Please use the “File > Content & Statistics” command to check to see whether there are any formulas returning errors or accidentally entered array formulas. If there are some errors or array formulas, use the “Inspect cells” window to locate them.

You might want to verify that your worksheets are built correctly using some initial records/rows only. (If you could remove/replace any sensitive data and if you think it might be helpful, feel free to send me a small test file if there’re still some problems.)

It looks like the absolute references were the solution. I thinned the worksheet columns down to remove the irrelevant (for this operation) data between the overall record number and the subsets. I’m updating now and have checked a couple of record numbers to verify the data in the end result cell are correct. I won’t have totals until the updating is finished, but so far so good. I will post back here to let you know if this returned matching data.

I copied the data from a CSV file opened in GS-Calc, these are traffic crash records and that is how the state sends them out. The formulas were written in GS-Calc. The settings/modify copied was default as yes. The file content & statistics does not list any error or array formulas, but does list 36,208 number formulas (I have a formula at the bottom of each data column to total the numbers in the column).

Thanks for your help, this is a great program and a good value as compared to the others I have used. I’ll be writing some positive reviews once this job is completed.

I have finished the calculations for the first column, and they match the expected total based upon the raw CSV data. Presuming the second column also matches, I’ll have my desired data. This was a good solution to a complex (for me) problem, thank you again for the help. I have left favorable reviews on a couple of software sites.

Thanks. BTW, even if the update times you mentioned above are acceptable at the moment, if your data grows to million(s) of records x million(s) of searchable record number sets, then worksheets with simple SUMIF’s will/might become unusable (being too slow). It’s because SUMIF searches a given range linearly.
Ideally, you should sort those ranges with record numbers. If you do, you’ll be able to use the MATCH() function with fast binary searching flag to find the indices of the first and last occurrences of a given record, use the RANGE() function to create a range and then the plain SUM() function to sum what’s needed.
If - like you wrote - SUMIF’s required several minutes to complete, the above shouldn’t take more than a few seconds.

If manual sorting isn’t possible or is too inconvenient, it should be also possible to use some formula-based, internal sorting, though the results will be worse.
Unfortunately, these are “native” GS-Calc features and a corresponding Open Document *.ods file will not update correctly in other spreadsheet programs.

The data I needed to combine and then list each has a record number. If I use MATCH vs SUMIF, wouldn’t this return the number of matching records vs the numeric data I need? I may have a matching record that has a 0 value, and if I understand the MATCH function correctly, this would return a 1 or if there were two driver records, a 2 when the correct total may be 0. I hope I’m missing something here, will have to learn about using MATCH with RANGE and then SUM.

My pared down spreadsheet looks like this:
A: record_number B: Total_injury C: Total_fatal D: Driver_record E: Driver_injury F: Driver_fatal G: Pass_record H: Pass_injury I: Pass_Fatal J: NM_record K: NM_injury L: NM_fatal

Rows A2:A305399 are crash record numbers

Bows B2:B305399 and C2:C305399 are where I want the combined data sums to appear

Rows D2:D540420 are driver records reduced to injury
Rows E2:E540420 and F2:F540420 are data for injury and fatal

Rows G2:210045 are driver records reduced to injury
Rows H2:H210045 and I2:I210045 are data for injury and fatal

Rows J2:J11139 are driver records reduced to injury
Rows K2:K11139 and L2:L11139 are data for injury and fatal

The goal is for rows in B and C to total the matching record numeric data from the applicable data in E,F,H,I,K, and L.

I have a screencap but I see there is no image attachment feature on this forum.

The worksheet “layout” will be the same. You just need to replace the SUMIF’s with SUM which adds data from the proper sub-range of the sorted E…:F…, H…:I…, K…:L… ranges.

For example, to determine the first row of your E… data for the column B and the n-th record you need to use the following:

=match(n, D$2:D$540420, SEARCH::FirstMatch + SEARCH::SortAscending + SEARCH::NoPattern)

The last row:

=match(n, D$2:D$540420, SEARCH::LastMatch + SEARCH::SortAscending + SEARCH::NoPattern)

The above specifies indices relative to the E$2:E$540420 range (first row, last row, column 1), so the range to sum would be:

=index(E$2:E$540420, match(n, D$2:D$540420, SEARCH::FirstMatch + SEARCH::SortAscending + SEARCH::NoPattern), 1):index(E2:E540420, match(n, D$2:D$540420, SEARCH::LastMatch + SEARCH::SortAscending + SEARCH::NoPattern), 1)

And final formula for the E$2:E$540420 values would be

=sum(index(E$2:E$540420, match(n, D$2:D$540420, SEARCH::FirstMatch + SEARCH::SortAscending + SEARCH::NoPattern), 1):index(E2:E540420, match(n, D$2:D$540420, SEARCH::LastMatch + SEARCH::SortAscending + SEARCH::NoPattern), 1))

You just need to add the sum() results for E…, H… and K… in the column B, and use similar formulas for the F…, I… and L… values to placed in the column C.

Finally, the IF’s and isError() function must be used for each individual sum to trap the situation where the searched data range doesn’t contain a given record number.

(You can use numerical values instead of the SEARCH:: flags to make the formulas much shorter.)

Thank you for the detailed reply. The fact is I only need to do a total of 5 of these files, and they will never be increased as they are historical data. The above looks like a lot of work and the opportunity for more keystroke errors, my typing skills are not the best. I have a good template down for =SUM(SUMIF), and have already completed one file, am halfway through the second, and have a third running on an unused laptop. I will stick with this process for the remainder.

I really appreciate you education on your program, it has been a good investment.

This is mostly copying and pasting but if you would like to avoid it, you can simply use the pivotData() function. This will generate the whole record/sums sorted report with just one (array) formula entered in sheet1, for example:

=pivotData(sheet2!D1:F1000000, {1},{2,3}, {PIVOT::Sum,PIVOT::Sum}, PIVOT::RowGrandTotals + PIVOT::ColumnGrandTotals + PIVOT::SubTotals)

or with two formulas for each column of sums:

=pivotData(sheet2!D1:F1000000, {1},{2}, {PIVOT::Sum}, PIVOT::RowGrandTotals + PIVOT::ColumnGrandTotals + PIVOT::SubTotals)
and
=pivotData(sheet2!D1:F1000000, {1},{3}, {PIVOT::Sum}, PIVOT::RowGrandTotals + PIVOT::ColumnGrandTotals + PIVOT::SubTotals)

(where the D1, E1 and F1 cells contain e.g. the following labels: “record number”, “injury”, “fatal”.)

However the report will contain only those record numbers which occur within the sheet2!D1:D1000000 range. To display all record numbers (e.g. 1…n), you would need to add the series of 1…n numbers below sheet2!D1000000 and
change the above sheet2!D1:F1000000 to include these numbers.

The above will be slower than the previous solution, but still much faster than SUMIFs (for very large data sets).

If you need something like that in the future, please take a look at the corresponding help topics and the sample.gsc file, which contains a few pivot tables.