Scalability is the ability of a system, process, or resource to adapt to and handle an increase in user demand without losing performance or functionality. In the context of technology and operating systems, scalability refers to the ability to handle an increase in workload efficiently, whether by increasing hardware capacity, distributing the load across multiple resources, or applying software optimization techniques.
Horizontal vs Vertical Scalability
Scalability in computer systems can be achieved in two main ways: horizontal scaling and vertical scaling. Both approaches have their own specific characteristics and applications.
Horizontal scalability
- Also known as scale-out scalability, it involves adding more resource instances, such as servers, nodes, or virtual machines (virtualization), to distribute the workload;
- It is achieved by adding more resource units horizontally, that is, adding more servers or nodes to the system;
- It allows handling large volumes of traffic by distributing the workload across multiple instances;
- It is highly flexible and can easily adapt to sudden increases in demand by adding more instances;
- It is commonly used in distributed applications and cloud computing environments, such as cloud storage.
Vertical scalability
- Also known as scale-up scalability, it involves increasing the resource capacity of a single instance, such as a server or virtual machine, by adding more resources, such as CPU (parallel processing), RAM, or storage;
- It generally involves a higher cost per upgrade, as it requires the acquisition of more powerful hardware;
- It is suitable for applications that cannot be easily distributed across multiple instances or where latency between components is critical;
- It may reach physical or economic limits in terms of upgradability, depending on hardware specifications and budget.
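The trade-off between the two approaches can be sketched with a small calculation. This is only one possible policy, assuming that a single upgraded machine is preferred whenever it can carry the load; the function name and all capacity figures are hypothetical.

```python
import math


def plan_scaling(load_rps, instance_capacity_rps, max_single_instance_rps):
    """Return (strategy, instance_count) for a target load in requests/second.

    All three parameters are hypothetical numbers chosen for illustration.
    """
    if load_rps <= max_single_instance_rps:
        # The largest affordable single machine can serve the load: scale up.
        return ("vertical", 1)
    # The load exceeds the vertical limit: scale out with standard instances.
    return ("horizontal", math.ceil(load_rps / instance_capacity_rps))


print(plan_scaling(800, 500, 2000))   # fits on one upgraded server
print(plan_scaling(5200, 500, 2000))  # 5200 / 500 = 10.4, so 11 instances
```

In practice the decision also depends on cost, latency between components, and whether the application can be distributed at all, as the lists above point out.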
Traffic spikes
Managing traffic spikes is crucial to ensuring that IT systems and online applications can handle a sudden increase in demand efficiently and remain operational.
Two important techniques for managing traffic spikes are traffic engineering and traffic balancing.
Traffic engineering
The process of managing and optimizing the flow of data on a network to ensure optimal performance and high availability. It involves the design and configuration management of network infrastructure to minimize congestion and maximize traffic efficiency.
Traffic engineering techniques may include network segmentation, intelligent routing, priority bandwidth allocation, and congestion control. In the context of traffic spike management, it can help distribute the load evenly among available resources and avoid network bottlenecks.
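As an illustration of priority bandwidth allocation, here is a minimal sketch that hands out a link's capacity to traffic classes in strict priority order. The class names, priorities, and figures are assumptions made for the example; real network equipment uses more elaborate schedulers.

```python
def allocate_bandwidth(link_capacity_mbps, demands):
    """Allocate link capacity to traffic classes in strict priority order.

    `demands` is a list of (class_name, priority, demand_mbps) tuples;
    a lower priority number means more important traffic.
    """
    remaining = link_capacity_mbps
    allocation = {}
    for name, _priority, demand in sorted(demands, key=lambda d: d[1]):
        granted = min(demand, remaining)  # never grant more than is left
        allocation[name] = granted
        remaining -= granted
    return allocation


# Hypothetical traffic classes competing for a 1000 Mbps link.
demands = [("bulk", 3, 600), ("voice", 1, 100), ("web", 2, 500)]
print(allocate_bandwidth(1000, demands))
```

During a spike, the lowest-priority class absorbs the shortfall: here "bulk" asks for 600 Mbps but only 400 Mbps remain after "voice" and "web" are served.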
Traffic balancing
A technique that distributes the workload across multiple servers or resources to optimize system performance and availability. It spreads user requests evenly across multiple servers, which helps prevent overload and improves server responsiveness.
It can be implemented at the network level, using devices such as load balancers, which direct incoming requests to different servers based on certain criteria, such as the current load on each server or resource availability. Traffic balancing can be performed statically (requests are distributed based on a predefined configuration) or dynamically (the distribution is adjusted in real time based on load conditions and resource availability).
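The difference between static and dynamic balancing can be sketched with two small policies: round-robin, which follows a fixed rotation, and least-connections, which reacts to the current load. The class names and server labels are hypothetical, and this is an illustration rather than how any particular load balancer is implemented.

```python
import itertools


class RoundRobinBalancer:
    """Static policy: cycle through servers in a fixed, predefined order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Dynamic policy: choose the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a request finishes so the load picture stays current.
        self.active[server] -= 1


rr = RoundRobinBalancer(["a", "b", "c"])
print([rr.pick() for _ in range(4)])

lc = LeastConnectionsBalancer(["a", "b"])
first, second = lc.pick(), lc.pick()
lc.release(second)  # the second server finishes its request first
print(lc.pick())    # so it is chosen again
```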
Databases
Databases are fundamental tools in data management at scale, offering solutions adapted to different scalability and performance needs.
Distributed databases
They store data on multiple nodes or servers, distributing the workload and allowing quick access from different geographic locations. Horizontal scalability is a fundamental feature of distributed databases, as they can scale by adding more nodes to the cluster to handle increases in demand.
NoSQL databases (Not Only SQL)
Database management systems designed to handle large volumes of unstructured or semi-structured data across multiple nodes. These databases are usually highly scalable. Some NoSQL technologies, such as MongoDB or Cassandra, are specifically designed to scale horizontally and can efficiently distribute data across clusters of servers.
In-memory databases
They store data in the server's RAM instead of physical disks, allowing for extremely fast access. Vertical scalability is important for in-memory databases, as adding more hardware resources can improve performance and data handling capacity.
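A quick way to try the idea is SQLite's in-memory mode, available through Python's standard `sqlite3` module. SQLite is not mentioned above and is used here only as a convenient stand-in; the table and column names are hypothetical.

```python
import sqlite3

# ":memory:" creates a database that lives only in RAM for this connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (user_id INTEGER PRIMARY KEY, token TEXT)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [(1, "abc"), (2, "def")],
)

row = conn.execute(
    "SELECT token FROM sessions WHERE user_id = ?", (2,)
).fetchone()
token = row[0]
print(token)

# Closing the connection discards the data: nothing was written to disk.
conn.close()
```

Because everything is held in RAM, the amount of data such a store can hold is bounded by the memory of the machine, which is why adding hardware resources matters for this kind of database.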
Scalable databases
Those that can grow efficiently to handle a greater volume of data and a greater workload. They can be both relational and NoSQL, but their design and architecture are oriented towards scalability, whether horizontal or vertical. Scalable databases typically use techniques such as sharding, partitioning, and data replication to distribute the workload and ensure optimal performance as demand grows.
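Replication can be sketched with a toy primary/replica store: writes go to the primary and are copied to every replica, while reads rotate across the replicas to spread the load. This assumes synchronous copying for simplicity; the class and key names are hypothetical, and real databases usually replicate asynchronously or by consensus.

```python
import itertools


class ReplicatedStore:
    """Toy primary/replica store kept entirely in Python dictionaries."""

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        # Writes are applied to the primary, then copied to each replica.
        self.primary[key] = value
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads rotate across replicas; the index shows which one answered.
        index = next(self._next_replica)
        return index, self.replicas[index].get(key)


store = ReplicatedStore(replica_count=2)
store.write("user:1", "Ada")
print(store.read("user:1"))
print(store.read("user:1"))
```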
Architectures and design patterns
Architectures and design patterns are powerful tools for achieving scalability in software systems, allowing them to adapt and grow efficiently to meet changing user and application demands.
Microservices
Microservices architecture is a software design approach where an application is built as a set of small, independent services, each running in its own process and communicating over lightweight network protocols such as HTTP. Horizontal scalability is one of its key features, as each service can be scaled according to demand, allowing for greater flexibility and efficiency in resource use.
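A minimal sketch of one such service, using only Python's standard library, might look like this. The "inventory" service, its `/stock` endpoint, and the payload are hypothetical. To keep the example in a single runnable file, the service runs in a background thread instead of its own process.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class InventoryHandler(BaseHTTPRequestHandler):
    """Hypothetical 'inventory' service exposing GET /stock as JSON."""

    def do_GET(self):
        if self.path == "/stock":
            body = json.dumps({"item": "widget", "in_stock": 42}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_stock(port):
    """What another service would do: call the inventory service over HTTP."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/stock")
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


# Port 0 asks the operating system for any free port.
server = ThreadingHTTPServer(("127.0.0.1", 0), InventoryHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

stock = fetch_stock(server.server_address[1])
print(stock)

server.shutdown()
server.server_close()
```

Because the caller only depends on the HTTP interface, several copies of the service could run behind a load balancer without the caller changing.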
Distributed architecture
A model in which the components of a software system run on multiple interconnected nodes or servers. It allows horizontal scaling by adding more nodes as needed to handle increases in demand.
Sharding
Sharding is a data partitioning technique that divides a database into multiple fragments (or shards) and distributes them across multiple servers. Each shard contains a portion of the total data and can be managed independently, allowing storage capacity and performance to be scaled out.
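One simple way to decide which shard holds a record is to hash its key and take the result modulo the number of shards. The sketch below assumes this hash-based scheme and uses in-memory dictionaries in place of real servers; the function, class, and key names are hypothetical.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used to get repeatable placement.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Toy key-value store whose data is split across several shards."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(shard_count=4)
for user_id in range(100):
    store.put(f"user:{user_id}", {"id": user_id})

print(store.get("user:7"))
print([len(shard) for shard in store.shards])  # records held by each shard
```

A known limitation of plain modulo placement is that changing the number of shards moves most keys; consistent hashing is a common alternative when shards are added or removed often.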