Repository: Where It Should Live and How To Avoid Common Mistakes

Unlock maintainable code with the Repository Pattern—essential insights for startup engineers aiming for robust architecture. Learn to sidestep common pitfalls and elevate your development process.

Jun 19, 2024

Creating abstractions to access data in a recurring topic.

In the dynamic world of software engineering, especially within the fast-paced startup environment, the ability to write scalable and maintainable code is paramount. As projects become complex, the direct interaction with data storage can quickly become a tangled web of dependencies, making the system harder to manage and evolve. This is where abstracting the data layer is necessary.

Adopting repositories is a good solution for this issue.

Adopting the Repository Pattern creates a consistent and easily testable interface for your data operations, ensuring your codebase remains flexible and adaptable to changes. Whether you're dealing with a relational database, a NoSQL store, or even a remote API, the Repository Pattern allows you to encapsulate these details, promoting cleaner code and reducing the risk of errors.

In this post, I will explain:

The structure of the applications and where repositories come in place.
The difference between a Repository and a Data Access Object.

Along the way, I will also share some mistakes I made in the past using repositories.

The 3 Application Layers

You can think of any application divided into three layers.

Even the most straightforward application should have at least this structure. You can have sub-layers and more complex structures, but the idea is the same. You want to separate these three layers.

Each module should have this clear division if your application is divided into modules.

1. The API Layer

The API Layer is the public interface of your application.

Users will interact with your applications using the functions exposed by this layer. Other applications might interact with your application through this layer. If your application is bigger, it should be structured in modules, and each module should expose its public interface, which other modules will use.

The API Layer represents a contract between your application and the outside world.

2. The Business Logic Layer

The Business Logic Layer is the core of your application.

It contains relevant logic to your application and should not be exposed. Changing this logic should not affect anything but your application. If your application has modules, each module will have its Business Logic Layer completely different.

The Business Logic Layer is why your application exists; it expresses the functionalities of your application that will be used through the API Layer.

3. The Storage Layer

The Storage Layer or Data Layer is responsible for retrieving and storing data.

It doesn’t matter if your application stores data in an SQL or NoSQL database, a file on your disk, or a remote server. The logic to access those data is in the Storage Layer. If your application is divided into modules, each module will access its storage with its Storage Layer.

Repositories live inside your Storage Layer.

The Repository

A Repository is a way to access storage data, which abstracts how those data are structured and persisted.

Your Business Logic Layer should not know how data are structured inside your database. The BLL only wants an entity with a specific shape. If that entity is stored in two different tables, the repository should not expose it.

In the above image, users and addresses are two different tables. But in the BLL, those might be the same entity. This means the UserRepository should abstract this information and always expose a User with its Address.

Repository vs Data Access Object

Sometimes, Repositories and Data Access Objects are considered the same thing.

They both are abstractions to storage. But they are not the same thing. A Data Access Object is a CRUD interface for a database table. It provides more granular operations than a Repository. A DAO has almost no logic. The user should directly express the operations using a DAO.

Repositories do not expose operations that are not allowed in the domain logic.

DAOs are a direct interface connected to data storage that allows for any operation. DAOs do not know your domain, and they shouldn’t. Repositories, on the other hand, are related to your domain.

You should implement a repository to return objects in the shape expected by your BLL; DAOs will return what you have in your database.

Collection Repositories

A Collection Repository is a Repository that exposes data as if they would in a Collection.

With a Collection Repository, you can modify your data as if they were in memory. You can imagine your data storage as a Set to give you a concrete example. You can modify elements in a Javascript Set without re-saving them since your changes are reflected.

Collection Repositories are usually implemented using ad-hoc tools.

These tools should support a mechanism to track the changes to your objects and ensure they are reflected in the data storage version. Using a Repository for data that lives in memory is also possible, but this is not a common scenario.

A Collection Repository would be used just like a Set:

As you can see, changes made to the user are reflected in the storage.

Persistence Based Repositories

When you use Persistent Based Repositories, you need to save items manually.

If you are creating a repository system without ORMs, you will probably use this kind of repository. The downside of this type of repository is that you have less abstraction. Anyway, this might also be considered a good thing in some situations.

The code using this repository type would look like this:

As you can see, you need to re-save the user before seeing the changes reflected in the storage.