Skip to main content

Growing Pains: Building a Vector in Go

Table of Contents

I’ve been interested in the Go programming language for a while: it’s statically typed, compiles to native code, and has concurrency built-in. As a Python developer, it has a very attractive feature set.

To get a feel for Go, I thought it’d be fun to implement the Vector datatype, also known as the dynamically sized array.

Follow along as I try to get to grips with Go, and do a little algorithmic analysis along the way.

All the code is available here.

Why Go?
#

So what makes Go so interesting? For me, there’s three selling points:

  • Static Type Checking: The Go compiler performs type-checking at compile time, meaning no runtime type errors. This is a huge attraction coming from Python, which is dynamically typed, meaning type-related errors at runtime are a common occurrence.

  • Compiles To Native Code: Go compiles to optimized machine code making it crazy fast. This is again a contrast to Python, which is interpreted and very slow.

  • Concurrency Built-In: Go was designed with concurrency as a core feature. We won’t go into its concurrency support in this post, but it’s something I’ll explore in the future.

With that context, let’s define basic operations for Vector.

Deciding Functionality
#

To keep the scope small, I came up with the following functionality:

  • NewVector() - constructor function to create new, empty Vectors
  • Size() - returns the number of elements in the Vector
  • IsEmpty() - returns true if the Vector has no elements, false otherwise
  • Capacity() - returns the available capacity of the Vector
  • At(index) - element access with bounds checking
  • Push(elem) - add elements to the end of the Vector
  • Insert(index, elem) - insert an element at a certain position
  • Pop() - remove and return an element from the end
  • Delete(index) - delete an element at an index

Implementation
#

Setting Everything Up
#

Go structures code into packages - source files that live in the same directory - and modules, a collection of packages.

To get started, I made a new Go module:

go mod init go_vector

go mod init creates a go.mod file, which contains metadata about your module.

I then created a main.go file and a directory for the Vector code itself.

main.go is a special file which we’ll return to.

I created a vector directory and switched to it, and then made a vector.go file where we’ll spend most of our time.

touch main.go
mkdir vector && cd vector
touch vector.go  # this is where the vector code will go

After all that, this was my directory structure:

go.mod
main.go
vector
└── vector.go

With everything set up, it was time to start implementing.

Defining Types
#

Taking inspiration from its great-grandfather C, Go uses structures (struct) to define types instead of classes and objects.

If you’re unfamiliar with structs, a struct is just a grouping of data.

For Vector to be useful, I made it generic, meaning it can hold any data type:

// vector/vector.go

package vector

type Vector[T any] struct {
    arr []T
    capacity int
}

From the top:

  • package vector: Every Go source file belongs to a package, normally the name of the directory the file is in.

  • type Vector[T any] struct: I declare Vector as a generic struct which can hold any type T.

  • arr []T: Creates a generic slice called arr where I’ll store the elements and manage resizing. I’ll return to slices shortly.

  • capacity int: Holds the current capacity of the Vector - the number of elements it can hold.

Creating New Vectors
#

With my struct defined, I implemented the constructor function to return a new empty Vector.

func NewVector[T any]() *Vector[T] {
    initialCap := 1
    
    return &Vector[T]{
        arr: make([]T, 0, initialCap),
        capacity:   initialCap,
    }
}
  • func NewVector[T any]() *Vector[T]: Defines a generic function NewVector which returns *Vector[T] - a pointer to the newly created Vector instantiated with type T.

  • initialCap := 1: The initial capacity of the Vector will be 1.

  • Inside the new struct I initialize the fields. capacity is set to initialCap, and arr is initialized using make([]T, 0, initialCap), which creates a new slice of type T with length 0 (no elements) but a capacity of initialCap (which is 1).

  • I then return &Vector[T], a reference to the newly created Vector.

Before we go further, let me explain what slices are.

Slices In A Sentence (Or Two)
#

A slice is a view into an underlying backing array. Slices are Go’s answer to dynamically sized arrays.

A slice is made up of three things:

  • A pointer to the underlying array
  • The number of elements it currently contains (its length)
  • The number of elements it can contain (its capacity)

When you append to a slice beyond its capacity, Go automatically allocates a larger backing array and copies the contents over.

Size(), Capacity(), IsEmpty()
#

Next I implemented functions related to the size and capacity of a Vector.

func (v *Vector[T]) Size() int {
    return len(v.arr)
}

func (v *Vector[T]) Capacity() int {
    return v.capacity
}

func (v *Vector[T]) IsEmpty() bool {
    return v.Size() == 0
}

Hmm, what’s that weird (v *Vector[T]) syntax before the function names? Ah, now we come to another quirk of Go.

Functions and Methods
#

As I mentioned earlier, Go isn’t an object-oriented language - it doesn’t have classes, methods, or inheritance.

But you can define functions which act on a type, just like methods which act on an instance of a class.

In Go these functions are called receiver methods (just to confuse you).

The receiver is the type that the function acts on. It’s specified in-front of the function name, which is what the (v *Vector[T]) is before each of the function names above.

Value and Pointer Receivers
#

There are two kinds of receiver - pointer receivers and value receivers.

I use pointer receivers in my code snippet above (you can tell because of the *). Pointer receivers mean that the function acts on the datatype directly.

This is in contrast to value receivers, which don’t have the * in-front of the type name (i.e., (v Vector[T])). In value receivers, a copy of the type is made each time the function is called.

The table below shows important differences between the two kinds of receiver.

Aspect Pointer Receiver Value Receiver
Syntax func (v *Vector[T]) IsEmpty() bool func (v Vector[T]) IsEmpty() bool
Modifies Receiver Yes No (works on a copy)
Memory Passes a pointer Copies the entire data type
Best practice Use for consistency if any method modifies Use for small, immutable types

I chose to use pointer receivers for all Vector methods for two reasons:

  1. Some methods modify the contents of the Vector struct, so they must use pointer receivers.
  2. In Go, it’s good practice to use pointer receivers for every method even if they are only required in some methods.

At(index int)
#

For accessing an element at a given index, I defined At(index int).

func (v *Vector[T]) At(index int) (T, error) {

    if index < 0 || index >= v.Size() {
        var zero T
        return zero, fmt.Errorf("Index out of range")
    }

    return v.arr[index], nil
}

var zero T creates a zero value of type T (e.g., 0 for int, "" for string, nil for pointers). This is needed because if the Vector is empty, we need something to return. Setting to zero avoids memory leaks if T is specialized as a pointer type.

You’ll notice the function returns (T, error). In the case where index is out of bounds, I return a tuple of the aforementioned zero and a formatted error message (fmt.Errorf).

In the case where index is within bounds, I return the element at index (v.arr[index]) and nil to signify “no fault occurred”.

Returning a tuple such as (T, error) brings me to another Go idiom.

Errors As Values
#

Go doesn’t have exceptions. Instead, functions explicitly return an error value that the caller must check.

If someone calls Pop in the case of Vector, they’d have to do so like this:

data, err := myVector.Pop()
if err != nil {
    // handle the error
}
// use data

The code pattern above is very common, and is how Go programs check for failure of functions.

Adding Elements
#

Now we come to the real meat and potatoes - adding elements to the Vector.

Let’s start with adding elements to the end of the array.

Push(elem T)
#

func (v *Vector[T]) Push(elem T) {
    // check if we need to resize
    if v.Size()+1 > v.Capacity() {
        // double the capacity
        v.capacity *= 2 
        // create the resized slice
        newArr := make([]T, v.Size(), v.capacity)
        // copy elements to new
        copy(newArr, v.arr)
        v.arr = newArr
    }
    v.arr = append(v.arr, elem)
}

When adding a new element, I first check there’s enough allocated memory (capacity).

If not, I double the capacity.

Why double the capacity? I’ll explain in a moment.

I then create a new slice using the make function, which allocates and initializes the memory and returns (in my case) something of type []T. I specify the length of the slice as v.Size() and the new doubled capacity.

Once that’s done, I call the builtin append function to add the element to the end of v.arr.

Why did I choose to double the capacity? Theory time!

Spreading The Cost
#

Doubling
#

To explain why I’ve chosen to double the size of the array when resizing, let’s do some analysis.

Assume we start with a capacity of 1 and we want to push n elements into the Vector. Every time the capacity is reached, we double it.

Let’s track when copying elements actually happens:

Resize Occurs When… New Capacity Must Copy
1 Pushing the 2nd element 2 1 element
2 Pushing the 3rd element 4 2 elements
3 Pushing the 5th element 8 4 elements
4 Pushing the 9th element 16 8 elements

Do you see the pattern? We only pay the copy cost when the number of elements hits a power of 2.

To find the total cost of inserting n elements, we need to sum two things:

  • The Insertions: We perform n constant-time writes (one for each new element), which costs n.
  • The Copies: All the copy operations caused by resizing up to n.

The copy operations from 1 to n form a geometric series:

\(1 + 2 + 4 + 8 + \cdots + 2^k\)

Here, \(2^k\) represents the number of elements copied during the most recent resize, which is the largest power of 2 less than or equal to n.

To find the total cost of all the copy operations up to n, we need to find the sum of the series from \(1\) to \(2^k\).

A neat trick about powers of 2 is that the sum of the series is always one less than the next power of 2.

\(1 + 2 + 4 + 8 + \cdots + 2^k = 2^{k+1} − 1\)

Since \(2^k \leq n\), the next power of 2 (\(2^{k+1}\)) must be \(\leq 2n\). Therefore, we can bound the total copy operations:

\(\text{Total Copies} \leq 2n\)

Now we combine the costs to find the total:

\(\text{Total Work} = \text{Insertion Cost} + \text{Copy Cost}\)

\(\text{Total Work} \leq n + 2n\)

\(\text{Total Work} \leq 3n\)

Looking at that occasional expensive resize, you may think the operation is \(O(n)\).

However, if I spread the total cost over n push operations, the average cost per push is \(3n/n = 3\).

Since 3 is a constant, we can conclude that pushing amortizes to \(O(1)\) per push - constant time on average, even though individual pushes might occasionally cost \(O(n)\) when a resize is needed.

Constant Resize
#

Compare this to increasing the number of slots by a fixed amount such as 2.

Starting with size 2:

Resize Occurs When… New Capacity Must Copy
1 Pushing the 3rd element 4 2 elements
2 Pushing the 5th element 6 4 elements
3 Pushing the 7th element 8 6 elements
4 Pushing the 9th element 10 8 elements

Notice that resizing occurs more frequently, and the copy cost grows linearly with each resize.

For n insertions with a constant increment of k (where \(k = 2\)):

  • The Insertions: n constant-time insertions still cost n.
  • The Copies: We need to work out the total cost of copying up to n when copying occurs every k insertions.

The copy operations from 2 up to n will be:

\(2 + 4 + 6 + 8 + \cdots + n\), assuming n is a multiple of 2 for simplicity.

Factoring out \(k = 2\) from earlier:

\(k * (1 + 2 + 3 + 4 + \cdots + n/k)\)

In this case, the copy operations form an arithmetic series rather than a geometric one.

Using the formula for the sum of the first m positive integers to give us the total copies:

\(k * (m * \frac{a + l}{2})\), where

  • \(m\) = number of integers
  • \(a\) = first term
  • \(l\) = last term

Substituting:

  • \(m\) = \(n/k\)
  • \(a\) = \(1\)
  • \(l\) = \(n/k\)

We get:

\(k * (\frac{(n/k)(n/k + 1)}{2}) \approx \frac{n^2}{2k}\)

Substituting \(k = 2\)

\(\text{Total copies} \approx \frac{n^2}{4}\)

So the total work is \(n + \frac{n^2}{4}\), which is \(O(n^2)\).

If we spread the total cost over n Push operations:

\(\frac{O(n^2)}{O(n)} = O(n)\)

Therefore, increasing the size by a constant amount amortizes to \(O(n)\), or linear complexity per Push, making each operation more expensive than doubling.

This means on average, every single push becomes a generic linear \(O(n)\) operation, which is disastrous for performance compared to \(O(1)\).

Math lesson over.

Insert(index int, elem T)
#

I’ll Insert an element at an index, and shift all the elements after it to the right. I’ll add some bounds checking to make sure users only Insert elements within the Vector’s bounds.

For example, if the original Vector is [1, 2, 4] and I Insert(2, 3), the resulting Vector should be [1, 2, 3, 4].

func (v *Vector[T]) Insert(index int, elem T) error {
    // check bounds
    if index < 0 || index > v.Size() {
        return fmt.Errorf("Index %d out of bounds.", index)
    }

    // add new space at the end
    var zero T
    v.Push(zero)

    // shift elements to the right, starting from the end
    for i := v.Size() - 1; i > index; i-- {
        v.arr[i] = v.arr[i-1]
    }

    // insert the element
    v.arr[index] = elem

    return nil
}
  1. First I check that index is within bounds.

  2. If index is within bounds, I Push zero to the end of the Vector.

  3. Then I loop from the end of the Vector to index, moving elements to the right.

  4. Finally I Insert the element at index.

The Insert function introduces another Go quirk.

Loops
#

Go only has one kind of loop - the for loop. The for loop serves a few purposes:

  • It can be used as a normal for loop: for i := 0; i < len(v.arr); i++

  • It can iterate over a range: for index, element := range v.arr

  • It doubles as a while loop: for i < len(v.arr)

  • It can be used as an infinite loop: for { ... }

I found the lack of a while loop took a little getting used to, but it soon felt very natural.

Removing Elements
#

We’ll begin with removing elements from the end of the Vector with Pop.

Pop()
#

func (v *Vector[T]) Pop() (T, error) {
    var zero T
    // can't pop from an empty Vector
    if v.Size() == 0 {        
        return zero, fmt.Errorf("Can't pop from an empty Vector.")
    }
    // save the last element
    lastElem := v.arr[v.Size()-1]
    // set the last element to zero - avoids memory leaks
    v.arr[v.Size()-1] = zero
    // shrink the slice
    v.arr = v.arr[:v.Size()-1]
    return lastElem, nil
}
  • If v.Size() == 0 is true the Vector has no elements. In that case, I return a tuple of zero and an fmt.Errorf. I’ll explain this in a moment.

  • Otherwise I save the element at the end of the Vector to lastElem.

  • Then I set the end of the Vector to zero to mark it empty.

  • Next, I shrink the length of the slice by 1 with v.arr = v.arr[:v.Size()-1].

  • Finally I return a tuple containing lastElem and nil, again signaling success.

Delete(index int)
#

Next I implemented Delete, which takes an index and deletes the element at that index, shifting all elements after it to the left.

E.g., given the Vector [1, 2, 3], Delete(1) should yield [1, 3].

func (v *Vector[T]) Delete(index int) error {

    // bounds check
    if index < 0 || index >= v.Size() {
        return fmt.Errorf("index out of range")
    }

    // start at 'index' and overwrite the
    // current element with the next one (index+1)
    for i := index; i < v.Size()-1; i++ {
        v.arr[i] = v.arr[i+1]
    }

    var zero T
    // set last element to zero
    v.arr[v.Size()-1] = zero
    // shrink the slice
    v.arr = v.arr[:v.Size()-1]

    return nil
}
  • As usual we start with a bounds check.

  • If index is within bounds, I loop from index to the second last element of the Vector (for i := index; i < v.Size() - 1; i++) and overwrite the current element (v.arr[i]) with the next element v.arr[i+1].

  • At the end of the loop, I assign the last value to the generic zero.

  • Then I reduce the length of the slice by 1 with v.arr[:v.Size()-1].

  • I return nil to signal success.

With that, we’re done!

Next Steps
#

With a working Vector implementation, there are several directions to explore next.

Testing and Benchmarking: The code would benefit from a comprehensive test suite using Go’s built-in testing package. Writing unit tests for edge cases (empty Vectors, single elements, boundary conditions) would help ensure correctness. Benchmarking against Go’s built-in slices would also reveal how the implementation performs in practice.

Additional Methods: There are useful operations we haven’t covered yet: Clear() to empty the Vector, Reserve(capacity) to pre-allocate memory, Shrink() to reduce excess capacity, and Contains(elem) for searching. A String() method would also make debugging easier.

Concurrency: We haven’t touched Go’s concurrency features yet, but making Vector thread-safe would be a natural next step. This could involve adding mutex locks or exploring Go’s channels and goroutines to handle concurrent access safely.

Real-World Usage: The best way to learn is by building something. Try using Vector in a small project - perhaps implementing other data structures like a stack or queue on top of it, or using it in a program that processes data dynamically.

Go’s simplicity and power make it a joy to work with. While this Vector implementation is educational, it’s given me hands-on experience with Go’s type system, memory management, and idioms. The journey from Python to Go continues, and I’m excited to see where it leads.