Parallel array

In computing, a group of parallel arrays (also known as SoA) is a data structure for representing arrays of records. It keeps a separate, homogeneous data array for each field of the record, each having the same number of elements. Then, objects located at the same index in each array are implicitly the fields of a single record. Pointers from one object to another are replaced by array indices. This contrasts with the normal approach of storing all fields of each record together in memory (also known as AoS). For example, one might declare an array of 100 names, each a string, and 100 ages, each an integer, associating each name with the age that has the same index.

An example in C using parallel arrays:

int  ages[]   = {0,          17,        2,          52,         25};
char *names[] = {"None",     "Mike",    "Billy",    "Tom",      "Stan"};
int  parent[] = {0 /*None*/, 3 /*Tom*/, 1 /*Mike*/, 0 /*None*/, 3 /*Tom*/};

for(i = 1; i <= 4; i++) {
    printf("Name: %s, Age: %d, Parent: %s \n",
           names[i], ages[i], names[parent[i]]);
}

in Perl (using a hash of arrays to hold references to each array):

my %data = (
    first_name   => ['Joe',  'Bob',  'Frank',  'Hans'    ],
    last_name    => ['Smith','Seger','Sinatra','Schultze'],
    height_in_cm => [169,     158,    201,      199      ]);

for $i (0..$#{$data{first_name}}) {
    printf "Name: %s %s\n", $data{first_name}[$i], $data{last_name}[$i];
    printf "Height in CM: %i\n", $data{height_in_cm}[$i];
}

Or, in Python:

first_names = ['Joe',  'Bob',  'Frank',  'Hans'    ]
last_names = ['Smith','Seger','Sinatra','Schultze']
heights_in_cm = [169,     158,    201,      199      ]

for i in xrange(len(first_names)):
    print "Name: %s %s" % (first_names[i], last_names[i])
    print "Height in CM: %s" % heights_in_cm[i]

Using zip:
for first_name, last_name, height in zip(first_names, last_names, heights_in_cm):
    print "Name: %s %s" % (first_name, last_name)
    print "Height in CM: %s" % height_in_cm

Pros and cons

Parallel arrays have a number of practical advantages over the normal approach:

However, parallel arrays also have several strong disadvantages, which serves to explain why they are not generally preferred:

The bad locality of reference can be alleviated in some cases: if a structure can be divided into groups of fields that are generally accessed together, an array can be constructed for each group, and its elements are records containing only these subsets of the larger structure's fields. (see data oriented design). This is a valuable way of speeding up access to very large structures with many members, while keeping the portions of the structure tied together. An alternative to tying them together using array indexes is to use references to tie the portions together, but this can be less efficient in time and space. Another alternative is to mock up a record structure in a single-dimensional array by declaring an array of n*m size and referring to the r-th field in record i as element as array(m*i+r). Some compiler optimizations, particularly for vector processors, are able to perform this transformation automatically when arrays of structures are created in the program.

See also

References

This article is issued from Wikipedia - version of the 6/14/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.