Stop Using +=, or How to Add Data to an Array

If you are coming to PowerShell from another scripting language, you may fall into this coding trap. I know I did when I was first learning the ropes. It’s easy to miss, too, since your system probably won’t feel the load when running lighter tasks. However, this trick may save you when you are running reports with 1,000+ items. So let’s get into it.

For this example, let’s say you are asked to build a report of all the processes running on a system. You might have some prior scripting knowledge, so you whip up some code similar to the example below. Except there’s one problem: it doesn’t run.

$processes = Get-Process
foreach ($process in $processes) {
    $properties = [ordered]@{
        Name = $process.Name
        ID = $process.ID
    }
    $object = New-Object -TypeName PSObject -Property $properties
    $report += $object
}
Write-Output $report

PowerShell will yell at you a lot, saying “A hash table can only be added to another hash table.” So let’s take a look. After running that script at the prompt, I can inspect $report, as we see below.

PS C:\> $report | get-member


   TypeName: System.Management.Automation.PSCustomObject

Name        MemberType   Definition
----        ----------   ----------
Equals      Method       bool Equals(System.Object obj)
GetHashCode Method       int GetHashCode()
GetType     Method       type GetType()
ToString    Method       string ToString()
ID          NoteProperty int ID=10124
Name        NoteProperty string Name=audiodg

So our problem is how PowerShell decides the type of $report. We want it to be an array. However, $report doesn’t exist before the loop, so the first time we “add” (+=) $object to it, PowerShell simply makes $report that single PSCustomObject. Then, on the next pass through the loop, PowerShell gets confused because it doesn’t know how to add one object to another. So it gives up and throws an error. So how do we fix that?

$processes = Get-Process

#################
$report = @()
#################

foreach ($process in $processes) {
    $properties = [ordered]@{
        Name = $process.Name
        ID = $process.ID
    }
    $object = New-Object -TypeName PSObject -Property $properties
    $report += $object
}
Write-Output $report

We’ll go ahead and explicitly declare $report as an array by using $report = @(). That solves the issue, and now the code runs. However, we are actually sacrificing some performance by adding to an array this way.
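To see the difference the initialization makes, here is a minimal sketch that can be pasted into a prompt. The variable names ($uninitialized, $initialized) and the sample property values are made up for illustration and are not part of the report script.

```powershell
# Without initialization: adding an object to $null just yields that object
$uninitialized = $null
$uninitialized += [pscustomobject]@{ Name = 'demo'; ID = 1 }
$uninitialized -is [array]      # False: a single PSCustomObject, not an array

# With initialization: += appends to the (initially empty) array
$initialized = @()
$initialized += [pscustomobject]@{ Name = 'demo'; ID = 1 }
$initialized += [pscustomobject]@{ Name = 'demo2'; ID = 2 }
$initialized -is [array]        # True
$initialized.Count              # 2
```

The second += only works in the initialized case because the left-hand side is already an array.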

Here’s what’s going on under the hood when we use += to add to an array in PowerShell. Arrays have a fixed size, so they can’t grow in place. In the beginning, our array, $report, is created in memory as an empty array. When PowerShell reaches the first +=, it allocates a new array large enough to hold the current contents of $report plus the new $object. This happens while the old version of $report is still sitting in memory. PowerShell then copies the contents of $report into the new array and adds $object at the end. After all of that, the old version of the $report array is left to be cleaned up.
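The copy-on-append behavior can be observed directly. This is a small sketch rather than anything from the report script; $before is a hypothetical variable that simply keeps a reference to the original array so the two can be compared.

```powershell
$array = @(1, 2, 3)
$array.IsFixedSize                           # True: arrays cannot grow in place

$before = $array                             # keep a reference to the original array
$array += 4                                  # builds a brand-new array with 4 slots

[object]::ReferenceEquals($before, $array)   # False: $array now points at a different object
$before.Count                                # 3: the old array is untouched
$array.Count                                 # 4
```

Because the old array stays alive for as long as something references it, both copies exist in memory at the same time during each +=.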

This means that on every iteration of the foreach loop you end up with two copies of $report in memory, and every existing element gets copied again. While you are running through 100 tiny objects, this isn’t an issue on modern machines. However, once you run through 10,000, or even 100,000 or more objects, it can get a little dicey. We’ll use the code below to demonstrate.

@(10, 100, 1000, 10000, 100000) | ForEach-Object {
    $IterationTime = Measure-Command {
        $array = @() #make a fresh array
        1..$_ | ForEach-Object {        #We make Powershell count from 1 to the current value in the iteration 10, 100, 1000 etc
            $array += $_ 
        }
    }
    $output = [pscustomobject]@{
        IterationTime  = $IterationTime.TotalMilliseconds 
        IterationCount = $_
    }
    $output
}

IterationTime IterationCount
------------- --------------
       3.5395             10
       5.5907            100
      41.9854           1000
    2775.8977          10000
  554277.5833         100000

We don’t really have anything to compare those numbers to yet, so let’s run this. We’ll add to a collection using Microsoft’s accepted solution, an ArrayList.

@(10, 100, 1000, 10000, 100000) | ForEach-Object {
    $IterationTime = Measure-Command {
        $array = [System.Collections.ArrayList]::new()
        1..$_ | ForEach-Object {
            $array.Add($_) 
        }
    }
    $output = [pscustomobject]@{
        IterationTime  = $IterationTime.TotalMilliseconds
        IterationCount = $_
    }
    $output
}

IterationTime IterationCount
------------- --------------
       3.3847             10
       3.4836            100
       9.7816           1000
      76.1171          10000
     764.8612         100000

Here’s the data consolidated into a table:

Item Count  +=              ArrayList
----------  --              ---------
10          3.5395 ms       3.3847 ms
100         5.5907 ms       3.4836 ms
1000        41.9854 ms      9.7816 ms
10000       2775.8977 ms    76.1171 ms
100000      449415.2217 ms  764.8612 ms

The data looks pretty clear. Around 10,000 items you will really start feeling the performance hit: 2.7758 seconds using += versus 0.076 seconds using an ArrayList. Keep in mind that we did absolutely no work on the data before adding it to the array, so expect this issue to compound as you use += on arrays in the real world. Stick to ArrayLists if you want to add items to a collection.
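One practical detail when using an ArrayList outside of Measure-Command: its Add() method returns the index of the item it just added, so in a normal script those numbers would land in your output unless you discard them. A minimal sketch, with made-up values and a hypothetical $list variable:

```powershell
$list = [System.Collections.ArrayList]::new()

$index = $list.Add('first')     # Add() returns the index of the new item (0 here)
[void]$list.Add('second')       # cast to [void] to throw the returned index away
$null = $list.Add('third')      # assigning to $null does the same thing

$list.Count                     # 3
$list.GetType().FullName        # System.Collections.ArrayList
```

Either way of discarding the return value works; which one to use is mostly a matter of taste.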

After looking at the ArrayList example above, you might be thinking: “No way am I going to remember [System.Collections.ArrayList]::new() when I go to make an array object”. If you are, I have some good news for you. You don’t have to remember any of that. Just remember this simple trick instead.

@(10, 100, 1000, 10000, 100000) | ForEach-Object {
    $IterationTime = Measure-Command {
        $array = 1..$_ | ForEach-Object {
            write-output $_ 
        }
    }
    $output = [pscustomobject]@{
        IterationTime  = $IterationTime.TotalMilliseconds
        IterationCount = $_
    }
    $output
}

IterationTime IterationCount
------------- --------------
       3.6656             10
       9.8842            100
      54.5917           1000
     540.9592          10000
    4490.1897         100000

Welcome to the compromise. It’s a little slower than explicitly using an ArrayList, but you are still using an ArrayList under the hood. It’s slower for a couple of reasons. First, we are adding Write-Output to the loop, which wasn’t there before. Second, there’s a little extra overhead while PowerShell figures out what we’re doing. Basically, PowerShell notices we’re capturing the output of something, so to be clever it spins up an ArrayList and collects the output of each loop iteration. Once the loop finishes, PowerShell converts the completed ArrayList into an appropriate object, here a regular array. Piping that array to Get-Member reports the type of its elements, which in our case is System.Int32.

PS C:\> $array | get-member


   TypeName: System.Int32

Name        MemberType   Definition
----        ----------   ----------
...
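Since Get-Member looks at the items coming down the pipeline, GetType() is a more direct way to check what the captured variable itself is. The sketch below uses hypothetical variable names ($many, $one, $wrapped) and also shows one caveat of capturing output: when only a single item comes back, you get that item rather than an array, unless you wrap the expression in @().

```powershell
# Several items: the captured result is a regular object array
$many = 1..3 | ForEach-Object { $_ }
$many.GetType().FullName      # System.Object[]

# One item: the captured result is just that item
$one = 1..1 | ForEach-Object { $_ }
$one.GetType().FullName       # System.Int32

# Wrapping in @() guarantees an array regardless of how many items come back
$wrapped = @(1..1 | ForEach-Object { $_ })
$wrapped.GetType().FullName   # System.Object[]
$wrapped.Count                # 1
```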

So, let’s put it all together with our use case from the beginning. If we want to efficiently collect the objects we’re creating for the processes running on a system, we’ll want to use the following.

$processes = Get-Process
$report = foreach ($process in $processes) {
    $properties = [ordered]@{
        Name = $process.Name
        ID = $process.ID
    }
    $object = New-Object -TypeName PSObject -Property $properties
    Write-Output $object
}
Write-Output $report
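For comparison, here is a more compact sketch of the same idea. It swaps in the [pscustomobject] literal in place of New-Object, relies on implicit output instead of Write-Output, and wraps the loop in @() so $report is an array even if only one process comes back. These are stylistic substitutions of mine, not a change to the technique.

```powershell
$report = @(
    foreach ($process in Get-Process) {
        # An object left on its own line is emitted to the output stream
        [pscustomobject]@{
            Name = $process.Name
            ID   = $process.Id
        }
    }
)
$report.Count   # number of processes collected
```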

This approach lets us leverage ArrayLists effectively while cheating a little by not having to remember how to explicitly create an ArrayList variable. Now, when you spot someone using += in their code to add to an array, just point them to this article and help change their mind 😀


Comments

  1. This is a good illustration of a well-known feature of immutable objects. Java’s strings were the classic example, as manipulating immutable strings created a lot of objects, and that was very slow in the early 2000s, so StringBuilder was invented. .NET has StringBuilder for the same purpose. But I digress.

    Let’s see the array version, but this time declaring an initial size for the array. The syntax for declaring an array of a given size is New-Object Object[] num-of-elements. In the code, ($_ + 1) gives an array that’s large enough for each iteration; the +1 is there because the loop’s indexes start at 1 rather than 0.

    Like so,

    @(10, 100, 1000, 10000, 100000) | ForEach-Object {
        $IterationTime = Measure-Command {
            $array = New-Object Object[] ($_+1)
            1..$_ | ForEach-Object {
                $array[$_] = $_
            }
        }
        $output = [pscustomobject]@{
            IterationTime = $IterationTime.TotalMilliseconds
            IterationCount = $_
        }
        $output
    }

    Care to guess how that one performs?

  2. Interestingly enough the first iteration is slower than the second.

    IterationTime IterationCount
    ------------- --------------
           3.7552             10
           1.6015            100
          10.3745           1000
          78.0969          10000
          884.661         100000

    The rest of the times line up with running as an ArrayList on my test machine 😀
