Article #2 from 2022
During my 1.5 years working as a Cloud Product Owner on a major defense company's private Cloud, I had the opportunity to use Cloud Computing technologies and to write a research paper about them. Cloud computing offers remarkable financial flexibility, improves operational efficiency, and delivers updates faster.
Reading Time: 10 minutes
Public: A Cloud provider owns and operates infrastructure resources and services that you access and use via the internet. Although you share them with other tenants, the advantages are manifold: no purchase, no maintenance, virtually unlimited scaling, high reliability.
Private: Instead of being shared, the infrastructure is dedicated to a single customer, whether they own it (on-premise Cloud) or it is managed by a third party (managed private Cloud). This grants total control over resources and over the related confidentiality management.
Hybrid: By combining the advantages of Public and Private Cloud, you make it possible to move applications easily between the two environments while benefiting both from greater control and from virtually unlimited scaling when needed.
Public - No investment, unlimited capacity, shared resources.
Private - High customization, secure data, complexity of operation.
Hybrid - Increased flexibility, ease of adaptation, difficulty of integration.
Resource virtualization simulates hardware functions in order to create a virtual computer system such as an application, a server, a storage space or even a network: virtualizing resources reduces costs and increases efficiency. More concretely, virtualizing consists of creating several virtual machines (VMs) from a physical machine using software called a hypervisor, which manages the sharing of resources.
A hypervisor can be of type 1 (KVM, Hyper-V, vSphere), called a native or bare-metal hypervisor, which runs directly on the host hardware, as in datacenters and server-based environments. It can also be of type 2 (VMware Workstation, VirtualBox), called a hosted hypervisor, which runs on top of a host operating system that itself runs on the hardware. This type of hypervisor is used to run multiple operating systems on a personal computer.
A VM operates like a physical machine such as a computer, but several VMs can rely on the resources of a single physical system, making it possible to run several operating systems on a single server called the host. The hypervisor allocates host resources to the instantiated VMs according to their needs. This flexible allocation of resources makes IT operations more efficient and is the foundation of Cloud Computing, which operates as follows:
1. Virtualization relies on a hypervisor to create virtual machines from physical servers, making the processing power, applications or storage space of these servers available in a virtual environment: the Cloud.
2. The user accesses this Cloud via the network.
3. In the event of a greater need for resources, the user can ask the operators to allocate more or provision them directly thanks to the as-a-Service model (see the sketch after this list).
4. When the user no longer needs the resources, they release them, reducing their bill and allowing others to use them.
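To make steps 3 and 4 concrete, here is a minimal sketch using AWS's boto3 SDK as one example of the as-a-Service model; the region, image ID and instance type are placeholders, and any provider's equivalent API could stand in for it.

```python
# Minimal sketch of self-service provisioning with AWS's boto3 SDK.
# The region, AMI ID and instance type are placeholders, not recommendations.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Step 3: the user allocates more resources on demand (here, one small VM).
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder image ID
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print(f"Provisioned {instance_id}")

# Step 4: once the resources are no longer needed, the user releases them,
# which stops the billing and frees capacity for other tenants.
ec2.terminate_instances(InstanceIds=[instance_id])
```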
Cloud Computing allows different users to access resources that they share through virtualization, the technology that allows a server to provide its capabilities to multiple consumers. The advantages of virtualization are greater flexibility, reliability and efficiency: the company has greater flexibility in the way it allocates resources, day-to-day operation is simplified, data backup and recovery can be automated, resource consumption is optimized and costs are reduced.
Since the emergence of the open-source Docker container engine in 2013, the VM is no longer the only viable virtualization technique. A container is generally lighter because it does not include an operating system but only the code and software elements necessary for an application to run in any environment. A container image is a software package containing everything needed to run an application: code, tools, libraries, settings, etc. This image becomes a container as soon as it is instantiated.
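As an illustration, here is a minimal sketch using the Docker SDK for Python, assuming a local Docker daemon is running; the alpine image is just an arbitrary small example.

```python
# Minimal sketch with the Docker SDK for Python (pip install docker),
# assuming a local Docker daemon is running.
import docker

client = docker.from_env()

# Pull a small public image: the "software package" described above.
client.images.pull("alpine", tag="3.19")

# Instantiating the image turns it into a container, an isolated process
# that shares the host kernel instead of booting a full operating system.
output = client.containers.run("alpine:3.19", ["echo", "hello from a container"], remove=True)
print(output.decode())
```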
VMs and containers have similar benefits in isolation and resource allocation but work differently because containers virtualize the operating system instead of the hardware and are more portable. A container is an abstraction at the application layer that embeds the code and its dependencies together. Multiple containers can run on the same machine and share the same operating system between them while operating as isolated processes.
Containers generally take up less space than a VM, with images weighing a few tens of MB, and can thus support more applications with fewer resources. A VM is an abstraction of the physical hardware layer that turns one server into multiple servers, and the hypervisor allows multiple VMs to run on a single machine. Each VM carries an entire copy of an operating system in addition to the application code and its dependencies, thus taking up several tens of GB and increasing launch time.
To guarantee its users complete autonomy in managing the resources made available to them, the Cloud manages its infrastructure as code (IaC), responding dynamically to needs by automating the associated deployments. Infrastructure is managed with lines of code rather than manual processes, ensuring an idempotent way to build and modify it, whether in a public or private Cloud and with virtually no technological constraints.
The code is stored in configuration files containing the characteristics of the infrastructure, facilitating modification and version management. Indeed, codifying and documenting the configuration facilitates its management and reduces the rate of undocumented changes.
Version control is essential in IaC and is enabled by the use of configuration files that can be managed by a source control system like any other software source code. The deployment can also be split into modules, i.e. configuration files grouped by theme, which can be combined as needed. By automating the implementation, called provisioning, of the infrastructure with IaC, operators no longer need to manually manage any component each time a new version or patch is deployed: no need to connect to x servers to perform dozens of operations, all you have to do is write the code once and deploy it on x servers.
IaC can be declarative or procedural/imperative. With the declarative approach, the desired state of the system must be defined and the IaC tool (Terraform) is responsible for modifying the system configuration accordingly. With the procedural approach, the specific commands necessary to obtain the desired configuration of the system must be defined, which the IaC tool (Ansible) will then execute in order.
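The difference can be sketched in a few lines of Python rather than in Terraform's or Ansible's own languages; everything below is a hypothetical, simplified model of the two approaches, not real tooling.

```python
# Conceptual sketch (not Terraform or Ansible code): the same goal expressed
# declaratively and procedurally. All names here are hypothetical.

# Declarative: describe the desired state; a reconciliation step computes
# whatever actions are needed to reach it.
desired_state = {"web_servers": 3, "instance_type": "small"}

def reconcile(current, desired):
    """Add or remove servers until the observed count matches the declaration."""
    actions = []
    diff = desired["web_servers"] - current["web_servers"]
    if diff > 0:
        actions += [("create_server", desired["instance_type"])] * diff
    elif diff < 0:
        actions += [("delete_server", None)] * (-diff)
    return actions  # the tool, not the user, decides these steps

# Procedural/imperative: the user spells out the exact ordered commands.
procedural_playbook = [
    ("create_server", "small"),
    ("create_server", "small"),
    ("create_server", "small"),
]

print(reconcile({"web_servers": 1}, desired_state))
# [('create_server', 'small'), ('create_server', 'small')]
```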
Unlike infrastructure provisioning in the past, which was long and costly because it had to be managed at the physical hardware level in data centers, virtualization and the Cloud allow it to be done remotely. However, as the number of applications put into production every day has grown, the number of infrastructure components has also increased rapidly, and IaC has become the only practical way to manage current infrastructures. Its advantages are numerous: reduced costs, accelerated deployments, fewer errors, improved infrastructure consistency and the elimination of configuration drift.
Version control is a system that records the evolution of files over time so that any previous version can be recalled at any time. Whether for a digital drawing, a spreadsheet or computer code, using a VCS (Version Control System) like git allows you to return any project to a previous state, to visualize changes over time, and to see which modifications were made and by whom. Using a VCS means having the security of being able to return to a stable state in the event of an error.
What differentiates git from other VCSs is its branching model: independent branches let you work on several versions of the same files in parallel and merge them later. You can quickly switch context to test an idea and run a few experiments before returning to the untouched base version, while keeping the freshly created test branch. Branches also make a division by role possible: a single branch can be designated as the one going into production while others are reserved for testing, or each branch can be dedicated to the development of a specific feature.
As branches multiply, git keeps users from getting lost thanks to several core operations: showing which branch you are on, switching branches, comparing the differences between two branches, creating or deleting a branch and, of course, merging two branches with potential conflicts highlighted.
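These operations map to a handful of git commands; the sketch below drives them from Python with subprocess, assuming a recent git (with the switch subcommand), an existing repository with a main branch, and a hypothetical test-idea branch name.

```python
# Sketch of the branch operations above, run from Python via subprocess.
# Assumes a recent git, an existing repository and a "main" branch.
import subprocess

def git(*args):
    """Run a git command in the current repository and return its output."""
    result = subprocess.run(["git", *args], check=True, capture_output=True, text=True)
    return result.stdout

print(git("branch", "--show-current"))      # which branch am I on?
git("switch", "-c", "test-idea")            # create and move to a test branch
print(git("diff", "main...test-idea"))      # compare the two branches
git("switch", "main")                       # return to the stable version
git("merge", "test-idea")                   # merge the test branch back into main
git("branch", "-d", "test-idea")            # delete the branch once merged
```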
By combining the practices and tools of the software world with those of infrastructure, we end up with the ability to deliver services at a high pace. This is DevOps, a contraction of Development and Operations, which applies software development processes to infrastructure management to gain competitiveness, optimize operations and evolve more quickly. The development and operations teams are merged into a single team, and engineers can then work on the entire life cycle of an application: creation, testing, deployment, operation.
DevOps accelerates the pace of innovations delivered to customers and the ability to adapt to the market, resulting in increased efficiency and growth. New versions are released rapidly and overall reliability increases because the high delivery rate ensures regular fixes for the problems encountered. This model is accompanied by several best practices, including the IaC presented above, but also the design of microservices, where each service fulfils a single function. The latter facilitate continuous delivery, which consists of automatically testing and publishing code changes. Monitoring and tracking anomalies are made easier by maintaining continuously fed event logs.
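As a small illustration of such an event log, here is a sketch using Python's standard logging module; the file name, service name and events are arbitrary examples.

```python
# Sketch of a continuously fed event log with Python's standard logging module.
# The file name, logger name and messages are arbitrary examples.
import logging

logging.basicConfig(
    filename="service-events.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
log = logging.getLogger("payment-service")

log.info("new version deployed")
try:
    raise ValueError("card declined")   # simulated anomaly
except ValueError:
    log.exception("payment failed")     # recorded with full traceback
```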
Automation also affects containers with tools dedicated to their orchestration due to the complexity of controlling the scaling, resilience and frequency of changes in containerized applications. The first version of Kubernetes was released in 2015 and this tool is now the most widespread. It orchestrates containers, their networking, scaling, storage infrastructure and load balancing.
By deploying their application with Kubernetes, the user obtains a cluster composed of at least one worker machine, called a node, on which the containerized application runs. Each node is permanently connected to the control plane, the cluster's orchestration layer in charge of the lifecycle of the containers and therefore of the nodes. This orchestration layer is also responsible for exposing the cluster's Kubernetes API, a software interface giving access to the services of a program, which is used by the deployment tools presented above.
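That API can also be queried directly; here is a minimal sketch with the official Kubernetes Python client, assuming a kubeconfig already points at a cluster.

```python
# Minimal sketch with the official Kubernetes Python client (pip install kubernetes),
# assuming a kubeconfig file already points at a cluster.
from kubernetes import client, config

config.load_kube_config()   # reuse local cluster credentials
v1 = client.CoreV1Api()     # talk to the API exposed by the control plane

# List the worker machines (nodes) of the cluster...
for node in v1.list_node().items:
    print("node:", node.metadata.name)

# ...and the pods (groups of containers) currently running on them.
for pod in v1.list_pod_for_all_namespaces().items:
    print("pod:", pod.metadata.namespace, pod.metadata.name)
```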
Applied to the Cloud, elasticity describes the ability of a system to add or remove resources on the fly to adapt to load variations over time. It is a dynamic property of Cloud Computing that can be horizontal or vertical. Elasticity is based on scaling: the most widespread form is horizontal scaling, which adds or removes the VMs an application runs on, while vertical scaling increases or reduces the capacity of those VMs. Both ultimately aim to adjust the sizing of resources to the evolution of demand over time. Elasticity can be seen as the combination of scaling resources, automation of operations and optimization of allocations.
Elasticity = Scaling + Automation + Optimization
This means that elasticity is not the equivalent of scaling but is built through scaling. Nor is elasticity the equivalent of automated scaling, called auto-scaling, because it adds the notion of efficiency: using the fewest possible resources to perform a task.
To facilitate elasticity, a Cloud must use interoperable resources so that changes happen transparently, keep them available to meet requests through both reactive and proactive operations, offer the shortest possible startup time, define thresholds that trigger capacity adjustments, monitor consumption, encourage its users to adapt their applications to the Cloud, find a balance between user needs and the provider's interests, and combine the use of VMs and containers.
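The threshold-driven part of this can be sketched in a few lines; the thresholds, metric and scaling rule below are hypothetical and stand in for whatever the platform actually exposes.

```python
# Conceptual sketch of threshold-based auto-scaling; the thresholds, metric
# and scaling rule are hypothetical, not a specific Cloud provider's API.
def autoscale(current_vms, cpu_utilization, min_vms=1, max_vms=10,
              scale_up_at=0.80, scale_down_at=0.30):
    """Return the VM count that keeps utilization inside the thresholds."""
    if cpu_utilization > scale_up_at and current_vms < max_vms:
        return current_vms + 1   # horizontal scale-out under load
    if cpu_utilization < scale_down_at and current_vms > min_vms:
        return current_vms - 1   # scale-in to avoid paying for idle VMs
    return current_vms           # desired state already reached

print(autoscale(current_vms=3, cpu_utilization=0.92))  # -> 4
print(autoscale(current_vms=3, cpu_utilization=0.10))  # -> 2
```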
Cluster by entity: Since business entities have different projects and data, separation is needed at several levels to ensure maximum isolation of the environments and to control communication flows, preventing intrusions into the system and data theft. The first separation takes place at the entity level: each entity is considered a tenant, meaning it gets its own cluster, namely a group of hosts, with its own domain name to access it and control over access management. The entities share the available resources, but each receives a fixed amount reserved for it. This separation into clusters is made possible by network segmentation.
Virtual network by project: Within an entity's cluster, all the infrastructure elements communicate through a first network layer dedicated to them. On this underlying network, a virtual overlay network is built on which the VMs dedicated to a project run, and there are as many overlay networks as there are projects within the entity. The chosen network encapsulation protocol (VXLAN) makes it possible to create several million virtual networks superimposed on a single underlying network. For projects using Kubernetes, the operation is similar: on its overlay network, the team can orchestrate millions of individually isolated containers. The encapsulation protocol used is different, but the result is the same regarding the number of usable virtual networks. In its cluster, an entity can therefore theoretically have millions of different projects separated from each other, and each of them can itself have millions of virtual networks to separate its containers.
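The "several million" figure follows from the size of the VXLAN network identifier (the 24-bit VNI), as the quick calculation below shows.

```python
# The VXLAN header carries a 24-bit network identifier (VNI),
# which bounds the number of distinct overlay networks.
vni_bits = 24
print(2 ** vni_bits)  # 16777216 possible overlay networks on one underlay
```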
Environment by need: Environments are where code is deployed, and they come in threes so that users can follow the three stages of maturity: the development environment for creating and configuring the software, the pre-production environment for verifying conformity, and the production environment that makes it accessible to all.
General redundancy: Since computer systems must guarantee maximum reliability, often called HA for High Availability, operators use redundancy. This technique consists of duplicating critical elements so that a failure does not cause the system to shut down. Take a computer network as an example: redundancy means having two output gateways to avoid any interruption of the connection. Redundancy can be active-active, where the two elements share the load, or active-passive, with one taking the entire load while the second only receives control requests. Redundancy is meant to be extended to the datacenter level by duplicating all or part of the infrastructure in a second datacenter in order to benefit from all the advantages of duplication, such as better resilience and higher availability. In the most extreme cases, such as a fire or the destruction of the building, this second datacenter becomes essential to ensure compliance with the RPO (Recovery Point Objective), the maximum amount of data that can be lost, and the RTO (Recovery Time Objective), the maximum acceptable downtime.
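To make the active-passive case concrete, here is a deliberately simplified sketch of failover between two gateways; the gateway objects and routing function are hypothetical illustrations, not a real network stack.

```python
# Simplified sketch of active-passive redundancy between two gateways.
# The gateway structures and routing logic are hypothetical illustrations.
def make_gateway(name, healthy=True):
    return {"name": name, "healthy": healthy}

def handle(request, active, passive):
    """Route traffic to the active gateway, failing over to the passive one."""
    gateway = active if active["healthy"] else passive
    return f"{gateway['name']} handled {request}"

active, passive = make_gateway("gw-1"), make_gateway("gw-2")
print(handle("GET /status", active, passive))   # gw-1 handled GET /status
active["healthy"] = False                       # simulated failure of gw-1
print(handle("GET /status", active, passive))   # gw-2 handled GET /status
```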
Edge Computing: As defined above, Cloud Computing is the execution of a workload in the Cloud via the network, and the more data there is on the network, the greater the risk of slowdown and loss. It is therefore necessary to avoid the scenario where the data never reaches the Cloud. No data, no processing. No processing, no results. No results… no results. Bringing computing capacity and resources as close as possible to users located at the edge of the network represents an architectural and technological challenge. Offering Edge Computing means adding new resources external to the existing datacenters, which requires work on data flow and security management, centralized management of a decentralized network, load transfer management, taking location and application specificities into account, and offloading traffic according to the load on the network.