[SOLVED] Perl, array vs hash with int

Procyon · 2009-12-21 17:42:34

I started learning Perl this weekend and read something interesting about efficiency:

Perl has two kinds of array, numerically-indexed and associative. Perl associative arrays are called "hashes". Awk arrays are usually translated to hashes, but if you happen to know that the index is always going to be numeric you could change the {...} to [...]. Iteration over a hash is done using the keys() function, but iteration over an array is NOT. You might need to modify any loop that iterates over such an array.
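To make the difference in the quoted advice concrete, here is a minimal sketch of the two iteration styles. The helper names `hash_pairs` and `array_pairs` and the PID/command data are hypothetical. A hash is walked through `keys()`, while an array is walked over its index range `0 .. $#array`:

```perl
use strict;
use warnings;

# Hash: keys are arbitrary (here, hypothetical PIDs); walk them with keys().
# Hash order is unspecified, so sort numerically for stable output.
sub hash_pairs {
    my ($h) = @_;
    return map { "$_=$h->{$_}" } sort { $a <=> $b } keys %$h;
}

# Array: indices are the integers 0 .. $#array; walk that range instead.
sub array_pairs {
    my ($arr) = @_;
    return map { "$_=$arr->[$_]" } 0 .. $#$arr;
}

my %cmd_by_pid = (1 => 'init', 412 => 'sshd', 50000 => 'bash');
my @cmds       = ('init', 'sshd', 'bash');

print join(' ', hash_pairs(\%cmd_by_pid)), "\n";   # 1=init 412=sshd 50000=bash
print join(' ', array_pairs(\@cmds)), "\n";        # 0=init 1=sshd 2=bash
```

This is why converting `{...}` to `[...]` also means touching the loops: code that calls `keys` on a hash has to be rewritten to iterate over indices once the container becomes an array.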

I am writing a process tree viewer, so this applies to me since I use PIDs as indices. But how does Perl handle an array with only two elements, say at index 1 and 50000? Won't it take up a huge amount of memory?

Last edited by Procyon (2009-12-22 09:21:23)

juster · 2009-12-22 02:03:35

Since Perl's arrays aren't sparse, all those undefined elements in the middle do take up memory. The undefined elements don't allocate scalars of their own, but the array itself reserves a pointer-sized slot for every index up to the highest one, ready to point at a scalar later on (roughly 4 bytes per index in the output below). I wouldn't call the wasted memory a "huge" amount, but it is a waste. The Devel::Size module shows how much:

C:\>perl -MDevel::Size=size -E "$#PIDS = 5000;
say q{Bytes of 5000 element array = }, size( \@PIDS );
say q{Bytes of 1 element array = }, size( [ q{} ] );"
Bytes of 5000 element array = 20212
Bytes of 1 element array = 108
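For readers without Devel::Size (a CPAN module, not part of core Perl), here is a minimal core-only sketch of the same point. It counts slots rather than bytes: storing at indices 1 and 50000 stretches the array to 50001 slots, while a hash holds only the two keys. The helper names `array_usage` and `hash_usage` are hypothetical:

```perl
use strict;
use warnings;

# Store a value at each given index of a fresh array and report
# (total slots, slots that actually hold a defined value).
sub array_usage {
    my @idx = @_;
    my @arr;
    $arr[$_] = "pid $_" for @idx;
    return (scalar @arr, scalar grep { defined } @arr);
}

# Store the same values in a hash and report how many keys exist.
sub hash_usage {
    my %h;
    $h{$_} = "pid $_" for @_;
    return scalar keys %h;
}

my ($slots, $used) = array_usage(1, 50000);
my $keys = hash_usage(1, 50000);
print "array: $slots slots, $used defined; hash: $keys keys\n";
# array: 50001 slots, 2 defined; hash: 2 keys
```

Only the two defined slots get scalars, but all 50001 slots exist in the array's internal pointer list, which is what Devel::Size is measuring above.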

I would only follow the advice you quoted if the keys are densely packed (to limit the gaps in between) and start near some lower bound (so you can offset them to zero if need be). Otherwise, go with a hash; they're pretty fast anyway!
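The "offset them to zero" idea can be sketched like this, assuming the keys are known to fall in a dense range starting at a fixed base. `$BASE`, `set_cmd`, `get_cmd` and `slot_count` are hypothetical names, and real PIDs usually do not satisfy this assumption:

```perl
use strict;
use warnings;

# Assumption: every key lies in a dense range starting at $BASE.
my $BASE = 4000;
my @cmd;    # $cmd[$pid - $BASE] holds the entry for $pid

sub set_cmd {
    my ($pid, $name) = @_;
    $cmd[$pid - $BASE] = $name;
}

sub get_cmd {
    my ($pid) = @_;
    my $i = $pid - $BASE;
    # Guard: a negative index would silently count from the end of the array.
    return $i >= 0 ? $cmd[$i] : undef;
}

sub slot_count { return scalar @cmd }

set_cmd(4000, 'sshd');
set_cmd(4003, 'bash');
print slot_count(), " slots\n";   # 4 slots (offsets 0..3), not 4004
print get_cmd(4003), "\n";        # bash
```

The negative-index guard matters: without it, a key below `$BASE` would quietly return an element from the end of the array instead of `undef`.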

pauldonnelly · 2009-12-22 07:30:06

Procyon wrote:

I am writing a process tree viewer, so this applies to me since I use PIDs as indices. But how does Perl handle an array with only two elements, say at index 1 and 50000? Won't it take up a huge amount of memory?

Yeah. If you really had only two processes, a linear search through a list of key/value pairs would probably be fastest, but since I expect a real system to have closer to 100 processes, a hash table is likely the way to go.

EDIT: Although on second thought, since Perl is interpreted, using the C-implemented hash table might be fastest regardless of the number of processes.
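One way to check this on your own machine is a minimal sketch with the core Benchmark module, comparing a linear scan over (pid, name) pairs with a hash lookup. The ~100-entry process table is made up, the sub names are hypothetical, and the numbers `cmpthese` prints will depend on your hardware and perl version:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical process table: 100 (pid, name) pairs with scattered PIDs.
my @pairs  = map { [ $_ * 37 + 1, "proc$_" ] } 0 .. 99;
my %by_pid = map { ($_->[0] => $_->[1]) } @pairs;

# Linear search: every loop iteration is executed by the interpreter.
sub linear_lookup {
    my ($pid) = @_;
    for my $p (@pairs) {
        return $p->[1] if $p->[0] == $pid;
    }
    return;
}

# Hash lookup: hashing and bucket search happen inside perl's C code.
sub hash_lookup { return $by_pid{ $_[0] } }

my $target = $pairs[-1][0];    # worst case for the linear scan

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    linear => sub { linear_lookup($target) },
    hash   => sub { hash_lookup($target) },
});
```

Trying different table sizes (change `0 .. 99`) would show where, if anywhere, the linear scan stops being competitive.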

Last edited by pauldonnelly (2009-12-22 07:30:56)

Procyon · 2009-12-22 09:20:56

Thanks, very interesting. Will stay with the hash table.
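For the process tree viewer itself, a minimal sketch of the hash approach might look like the following. The (pid, ppid, command) rows are hypothetical; a real viewer would have to collect them from the system (for example from `ps -eo pid,ppid,comm` or `/proc`), and `tree_lines` is a made-up helper name:

```perl
use strict;
use warnings;

# Hypothetical (pid, ppid, command) rows.
my @rows = (
    [1,     0,     'init'],
    [412,   1,     'sshd'],
    [50000, 412,   'sshd'],
    [50001, 50000, 'bash'],
    [733,   1,     'crond'],
);

# Both hashes are keyed by PID, so the gap between 733 and 50000 costs nothing.
my (%cmd, %children);
for my $r (@rows) {
    my ($pid, $ppid, $name) = @$r;
    $cmd{$pid} = $name;
    push @{ $children{$ppid} }, $pid;
}

# Depth-first walk, returning one indented line per process.
sub tree_lines {
    my ($pid, $depth) = @_;
    my $line = '  ' x $depth . "$pid $cmd{$pid}";
    my @out  = ($line);
    for my $kid (sort { $a <=> $b } @{ $children{$pid} || [] }) {
        push @out, tree_lines($kid, $depth + 1);
    }
    return @out;
}

print "$_\n" for tree_lines(1, 0);
```

Running it prints `init` at the top, `412 sshd` and `733 crond` indented one level, and the `50000`/`50001` chain nested under `412`.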

Arch Linux
