Stop Using +=, or How to Add Data to an Array
If you are coming to PowerShell from another scripting language, you may fall into this coding trap. I know I did when I was first learning the ropes. It’s easy to miss, too, since your system probably won’t feel the load when running lighter tasks. However, this trick may save you when you are running reports with 1,000+ items. So let’s get into it.
For this example, let’s say you are asked to build a report of all the processes running on a system. You might have some prior scripting knowledge, so you whip up some code similar to the example below. There’s just one problem: it doesn’t work…
$processes = Get-Process
foreach ($process in $processes) {
    $properties = [ordered]@{
        Name = $process.Name
        ID   = $process.ID
    }
    $object = New-Object -TypeName PSObject -Property $properties
    # $report was never initialized, so it starts out as $null
    $report += $object
}
Write-Output $report
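The PowerShell prompt will yell at you a lot: every pass through the loop after the first throws an error along the lines of “Method invocation failed because [System.Management.Automation.PSObject] does not contain a method named 'op_Addition'.” So let’s look. After running the script in the prompt, we can inspect $report as shown below.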
PS C:\> $report | Get-Member
   TypeName: System.Management.Automation.PSCustomObject
Name        MemberType   Definition
----        ----------   ----------
Equals      Method       bool Equals(System.Object obj)
GetHashCode Method       int GetHashCode()
GetType     Method       type GetType()
ToString    Method       string ToString()
ID          NoteProperty int ID=10124
Name        NoteProperty string Name=audiodg
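You can reproduce what happened to $report in a few lines. This is a minimal sketch. $object here is a stand-in with made-up values rather than a real process, and the variable is set to $null to mimic a variable that was never assigned.

```powershell
# A stand-in for one of the objects built in the loop (values are made up)
$object = [pscustomobject]@{ Name = 'demo'; ID = 1 }

# A variable that was never assigned is $null, just like $report on the first pass
$report = $null

# $null += <object> simply stores the object; no array is created
$report += $object
$report.GetType().FullName    # System.Management.Automation.PSCustomObject
$report -is [array]           # False

# The second += fails because a PSCustomObject has no addition operator
try {
    $report += $object
}
catch {
    "Second += failed: $($_.Exception.Message)"
}
```

The failed assignment leaves $report unchanged, so it is still the single object from the first pass.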
So our problem is how PowerShell handles the type of $report. We want it to be an array, but we never created it, so it starts out as $null. When we “add” (+=) $object to $null, PowerShell simply stores $object, which makes $report a PSCustomObject rather than an array. Then, when we loop through again, PowerShell tries to add one PSCustomObject to another, which it doesn’t know how to do. So it gives up and throws an error. So how do we fix that?
$processes = Get-Process
# Initialize $report as an empty array before the loop
$report = @()
foreach ($process in $processes) {
    $properties = [ordered]@{
        Name = $process.Name
        ID   = $process.ID
    }
    $object = New-Object -TypeName PSObject -Property $properties
    $report += $object
}
Write-Output $report
We’ll go ahead and explicitly initialize $report as an empty array with $report = @(). That solves the issue, and now the code runs. However, this method of adding to an array actually sacrifices some performance.
Here’s what’s going on under the hood when we use += to add to an array in PowerShell. In the beginning, our array, $report, is created in memory as an empty array. .NET arrays have a fixed size and can’t grow in place, so when PowerShell reaches the first +=, it allocates a new block of memory big enough to hold the current contents of $report PLUS the new $object. The old version of $report is still hanging out in memory while this happens. PowerShell copies the contents of the old $report into the new block, adds $object at the end, and then points $report at the new array. The old array is left for the .NET garbage collector to clean up.
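You can watch += build a new array instead of growing the old one. This is a minimal sketch with throwaway values, and $original is just a second variable kept around for comparison. After the +=, $report points at a different, larger array, while the original three-item array is left untouched.

```powershell
# Start with a small array and keep a second variable pointing at the same array
$report   = @(1, 2, 3)
$original = $report

# Arrays have a fixed size, which is why += has to copy
$report.IsFixedSize                              # True

# += cannot grow the existing array; it builds a new, larger one and copies into it
$report += 4

# $report now refers to a different array object than before
[object]::ReferenceEquals($report, $original)    # False

# The old array still holds three items; the new one holds four
$original.Count                                  # 3
$report.Count                                    # 4
```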
This means that on every iteration of the foreach loop, you briefly end up with two copies of $report in memory, and every item added so far gets copied again. While you are running through 100 tiny objects, this isn’t an issue on modern machines. However, once you run through 10,000, 100,000, or more objects, all that copying gets a little dicey. We’ll use the code below to demonstrate.
@(10, 100, 1000, 10000, 100000) | ForEach-Object {
    $IterationTime = Measure-Command {
        $array = @() # make a fresh array
        # Count from 1 up to the current value of the outer loop (10, 100, 1000, and so on)
        1..$_ | ForEach-Object {
            $array += $_
        }
    }
    $output = [pscustomobject]@{
        IterationTime  = $IterationTime.TotalMilliseconds
        IterationCount = $_
    }
    $output
}
IterationTime IterationCount
------------- --------------
       3.5395             10
       5.5907            100
      41.9854           1000
    2775.8977          10000
  554277.5833         100000
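A rough count shows why these numbers climb so steeply. Assuming the copying is the dominant cost (this ignores allocation and pipeline overhead), adding the k-th item copies the k - 1 items already in the array, so building an n-item array with += copies this many elements in total:

```latex
\sum_{k=1}^{n} (k - 1) = \frac{n(n - 1)}{2}
```

For n = 10,000 that’s 49,995,000 element copies; for n = 100,000 it’s 4,999,950,000. Ten times more items means roughly a hundred times more copying, which is why the jump from 10,000 to 100,000 items hurts so much more than the jump from 1,000 to 10,000.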
Those numbers don’t mean much without something to compare them to, so let’s run the same test, this time adding to an ArrayList, a commonly recommended solution.
@(10, 100, 1000, 10000, 100000) | ForEach-Object {
    $IterationTime = Measure-Command {
        $array = [System.Collections.ArrayList]::new()
        1..$_ | ForEach-Object {
            $array.Add($_)
        }
    }
    $output = [pscustomobject]@{
        IterationTime  = $IterationTime.TotalMilliseconds
        IterationCount = $_
    }
    $output
}
IterationTime IterationCount
------------- --------------
       3.3847             10
       3.4836            100
       9.7816           1000
      76.1171          10000
     764.8612         100000
Item Count | += | ArrayList |
---|---|---|
10 | 3.5395 ms | 3.3847 ms |
100 | 5.5907 ms | 3.4836 ms |
1000 | 41.9854 ms | 9.7816 ms |
10000 | 2775.8977 ms | 76.1171 ms |
100000 | 554277.5833 ms | 764.8612 ms |
The data looks pretty clear. Around 10,000 items, you really start feeling the performance hit: about 2.78 seconds using += versus about 0.076 seconds using an ArrayList. Keep in mind that we did absolutely no work on the data before adding it to the array, so in real-world scripts the cost of += comes on top of whatever processing each item needs. If you need to add items to a collection one at a time, stick to an ArrayList.
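One detail the benchmark hides: ArrayList.Add returns the index of the item it just added. Inside Measure-Command that value is simply thrown away, but in a script or function it ends up in your output. The sketch below shows the leak (Show-AddLeak is a hypothetical function name used only for illustration) and two common ways around it: discarding the return value with $null =, or using a generic List[T], whose Add method returns nothing. List[T] is my swap here, not something the benchmark above measured.

```powershell
# Hypothetical helper: calling Add without discarding its result leaks the indexes
function Show-AddLeak {
    $list = [System.Collections.ArrayList]::new()
    $list.Add('a')    # returns 0, which becomes function output
    $list.Add('b')    # returns 1, which becomes function output
}
$leaked = Show-AddLeak    # holds 0 and 1, not the strings

# Option 1: assign the return value to $null to discard it
$arrayList = [System.Collections.ArrayList]::new()
foreach ($n in 1..5) {
    $null = $arrayList.Add($n)
}

# Option 2: a generic List[T]; its Add method returns void, so nothing leaks
$typedList = [System.Collections.Generic.List[int]]::new()
foreach ($n in 1..5) {
    $typedList.Add($n)
}

$leaked             # 0 and 1
$arrayList.Count    # 5
$typedList.Count    # 5
```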
After looking at the ArrayList example above, you might be thinking: “No way am I going to remember [System.Collections.ArrayList]::new() every time I want to build an array.” If so, I have some good news for you: you don’t have to remember any of that. Just remember this simple trick instead.
@(10, 100, 1000, 10000, 100000) | ForEach-Object {
    $IterationTime = Measure-Command {
        $array = 1..$_ | ForEach-Object {
            Write-Output $_
        }
    }
    $output = [pscustomobject]@{
        IterationTime  = $IterationTime.TotalMilliseconds
        IterationCount = $_
    }
    $output
}
IterationTime IterationCount
------------- --------------
       3.6656             10
       9.8842            100
      54.5917           1000
     540.9592          10000
    4490.1897         100000
Welcome to the compromise. It’s a little slower than using an ArrayList explicitly, but it’s still far faster than +=, because PowerShell does the collecting for you. It’s slower for a couple of reasons. First, we added Write-Output to the loop, which wasn’t there before. Second, there’s a little extra overhead while PowerShell figures out what we’re doing. Basically, PowerShell notices we’re capturing the output of something, so, to be clever, it collects the output of each loop iteration in a growable list behind the scenes (much like an ArrayList). Once the loop finishes, PowerShell converts that list into an ordinary object array (System.Object[]), or hands back the item on its own if only one was produced. When we pipe $array to Get-Member, the elements go through the pipeline one at a time, so Get-Member reports their type, System.Int32.
PS C:\> $array | get-member
TypeName: System.Int32
Name MemberType Definition
---- ---------- ----------
...
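To see the difference between the variable itself and what Get-Member reports, a quick check like the one below helps. It’s a minimal sketch: Get-Member’s -InputObject parameter inspects the array as a whole instead of its elements, and the last lines show a related gotcha. If the loop emits only one item, you get that item back rather than a one-element array, unless you wrap the expression in @().

```powershell
# Capture loop output the same way as the benchmark above
$array = 1..5 | ForEach-Object { Write-Output $_ }

# The variable holds an ordinary object array
$array.GetType().FullName                        # System.Object[]

# Piping sends the elements one by one, so Get-Member reports their type
($array | Get-Member)[0].TypeName                # System.Int32

# -InputObject passes the array as a single object instead
(Get-Member -InputObject $array)[0].TypeName     # System.Object[]

# With only one item, PowerShell returns the item itself, not an array
$single = 1..1 | ForEach-Object { $_ }
$single -is [array]                              # False

# Wrapping the expression in @() always produces an array
$forced = @(1..1 | ForEach-Object { $_ })
$forced -is [array]                              # True
```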
So, to put it all together with our use case from the beginning: if we want to efficiently collect the objects we’re creating for the processes running on a system, we’ll want to use the following.
$processes = Get-Process
$report = foreach ($process in $processes) {
    $properties = [ordered]@{
        Name = $process.Name
        ID   = $process.ID
    }
    $object = New-Object -TypeName PSObject -Property $properties
    Write-Output $object
}
Write-Output $report
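On PowerShell 3.0 or later, the same pattern can be written a bit more compactly. The sketch below is my variation, not the code measured above: it uses the [pscustomobject] type accelerator instead of New-Object, lets the objects flow out of the loop implicitly instead of calling Write-Output, and wraps the loop in @() so $report is always an array, even if the loop happens to emit only one object.

```powershell
# Collect one object per running process; @() guarantees an array result
$report = @(
    foreach ($process in Get-Process) {
        # Emitting the object is enough; PowerShell captures it for $report
        [pscustomobject]@{
            Name = $process.Name
            ID   = $process.Id
        }
    }
)

$report.GetType().FullName    # System.Object[]
$report | Select-Object -First 5
```

Dropping Write-Output also skips a cmdlet call per item, which is one of the overheads mentioned in the compromise section.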
This lets us get the benefit of a growable collection while cheating a little: we never have to remember how to create an ArrayList explicitly. Now, when you spot someone using += to add to an array in their code, just point them to this article and help change their mind 😀
vP
May 25, 2020 - 2:08 am
This is a good illustration of a well-known feature of immutable objects. Java’s strings were the classic example: manipulating immutable strings created a lot of objects, and that was very slow in the early 2000s, so StringBuilder was invented. .NET has StringBuilder for the same purpose. But I digress.
Let’s see the array version, but this time declaring an initial size for the array. The syntax for declaring an array of a given size is New-Object Object[] num-of-elements. In the code, I use ($_ + 1) to get an array that’s large enough for each iteration; the +1 is needed because the loop starts filling at index 1 rather than 0.
Like so:
@(10, 100, 1000, 10000, 100000) | ForEach-Object {
    $IterationTime = Measure-Command {
        $array = New-Object Object[] ($_ + 1)
        1..$_ | ForEach-Object {
            $array[$_] = $_
        }
    }
    $output = [pscustomobject]@{
        IterationTime  = $IterationTime.TotalMilliseconds
        IterationCount = $_
    }
    $output
}
Care to guess how that one performs?
actnjaxxon
May 25, 2020 - 9:57 am
Interestingly enough the first iteration is slower than the second.
IterationTime IterationCount
------------- --------------
       3.7552             10
       1.6015            100
      10.3745           1000
      78.0969          10000
      884.661         100000
The rest of the times line up with running as an ArrayList on my test machine 😀