Joseph
June 1, 2016

Data Structures – A Soft Introduction

After you’ve learned to code a bit, and know how to get a script to do what you want, the question always comes to: Can it be done faster? One of the most crucial concepts to understand as a developer is the idea of efficiency – specifically how we can make code run in the fastest, most extensible, and most coherent way possible. It’s the challenge that differentiates the utilitarianism of code and to a degree, the beauty of code.

From the outside looking in, some of these more advanced concepts, especially understanding efficiency, can be a little difficult to wrap your head around. So we’re going to start with one of the most basic concepts of computer science, data structures.

Everything about an application is centered around data, how it’s stored, accessed, modified, and removed. This concept isn’t limited to just programming though, think about roads, mail carriers, or libraries. Take for example libraries. Imagine a library where there are only 4 shelves of books in a 12,000 square foot building.  The kicker is that the 4 shelves are all spread equally apart in different corners of the warehouse.

If you want to check out a book from two different shelves, that requires running across the entire warehouse for each shelf. The situation gets worse if you don’t know what shelf either book is on. No Dewey decimal system here! This example illustrates the major points of what makes data structures important, allocation of space, organization, and indexing. This library would be much better to visit if it didn’t take up quite so much space, after all, who needs 12,000 square feet for 4 shelves? It wouldn’t be terrible either if the library was the appropriate size, and we just had to sort through 4 shelves. But imagine 12 shelves? 24 shelves? The problem of finding a book starts to become a bit more daunting.

Let’s correlate this to a computer science concept. How do a store and access a list of names, say a contacts application? Well from a simple standpoint we could hard code each contact in:

...
joseph_email = "joseph@unconfusing.com"
joseph_phone = "334-826-1288"
heath_email = "heath@unconfusing.com"
...

But we’ll quickly notice that this simple concept for storing names is going to become unwieldy very quickly. Each time you create a contact, you’ll have to manually add a group of variables for each user. Not only that, you’ll also have to keep up with what all these variables are named. To give an example, imagine packing up your house to move, and putting every. single. item. into a different box. That’s what this illustrates.

This is where the concept of arrays and objects come in. These can be thought of exactly as boxes. Practically, you want to put things in boxes that have similar places. Put all the plates from the kitchen in this box, all the clothes in this one, etc. Then you know exactly what’s in where when you unpack it all right? Let’s take this concept back to our idea of an array. (in this case we will be using an associative array, in some languages this would be considered an object).

...
joseph = ['email' => 'joseph@unconfusing.com', 'phone' => '334-826-1288']
erich = ['email' => 'erich@unconfusing.com', 'phone' => '334-826-1288']
...

So now we have a box for each person, we know for joseph there’s going to be a box with an email inside and a phone number. But we can do better than this.

Let’s scale back from our box example for a moment, think about moving a huge corporation, it’s going to take a lot more than just the one truck it did for your house isn’t it? So now we have hundreds and thousands of boxes stored in trucks. From a practicality standpoint, we’re now not looking at boxes as individuals, but trucks. We can arrange our data into ‘trucks’ as well:

[
   0 => [
     'name' => 'heath',
     'email' => 'heath@unconfusing.com',
     'phone' => '334-826-1288'
   ],
   1 => [
     'name' => 'erich',
     'email' => 'erich@unconfusing.com',
     'phone' => '334-826=1288'
   ],
   ...
]

Going through the code above, we can imagine the first set of braces to be the truck. We’re now using the traditional idea of an array, it’s numbered from 0 to the number of elements in the array. Meaning that each box has a number (originally they had numbers anyways, but we just called them by name). So now we know that the truck we are looking at has 2 boxes in it, and each box has a name, email and phone in it.

Hold up though – here’s a question. Would it be easier to find the box named erich in a truck if we had to look in every box to find the name? Or would it be easier if we wrote the name on the outside? Okay, but what if two boxes need to have the same name? Do we call one box office_supplies_1 and the other office_supplies_2? That starts to make things confusing doesn’t it.

Well this is where it starts to get fun.

If we’re moving just a house, we’re not going to have many boxes, so it might make more sense for us to put the name of everything on the outside of the box, that we always know exactly where it’s supposed to go. The extra time to write out a name on the box will make it more convenient to find them later.

But if we’re moving a corporation, like our last example. We can just group boxes in trucks, all the office supplies go in these trucks, furniture in those trucks, etc. It doesn’t make sense to name the boxes, because there’s just so much of it. Names would either be non-descriptive, too specific, or repetitive. So why not just count the boxes and figure out how to set everything back up on the other side?

Which sets us up for the next post – how do we decide?

Leave a Reply

Your email address will not be published. Required fields are marked *