- Robert Sheldon
Erasure coding (EC) is a method of data protection in which data is broken into fragments, expanded and encoded with redundant data pieces and stored across a set of different locations or storage media.
If a drive fails or data becomes corrupted, the data can be reconstructed from the segments stored on the other drives. In this way, EC can help increase data redundancy, without the overhead or limitations that come with different implementations of RAID.
How does erasure coding work?
Erasure coding works by splitting a unit of data, such as a file or object, into multiple fragments (data blocks) and then creating additional fragments (parity blocks) that can be used for data recovery. For each parity fragment, the EC algorithm calculates the parity's value based on the original data fragments. The data and parity fragments are stored across multiple drives to protect against data loss in case a drive fails or data becomes corrupted on one of the drives. If such an event occurs, the parity fragments can be used to rebuild the data unit without experiencing data loss.
For example, a storage system might use a 5+2 encoding configuration to distribute data across multiple physical drives. In this configuration, the EC algorithm breaks each data unit into five data fragments and then adds two parity fragments, which are calculated from the original data. Each fragment is stored on a different physical drive. As a result, the storage system must include at least seven drives.
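The 5+2 layout above can be sketched in a few lines of Python. The article does not name a specific algorithm, so this is only an illustration under stated assumptions: it uses a Reed-Solomon-style polynomial code over the small prime field GF(257), treats each fragment as a single integer symbol rather than a block of bytes, and the names `encode` and `recover` are hypothetical. Production systems typically work in GF(2^8) so that every parity symbol fits in one byte.

```python
P = 257  # assumed prime modulus for the sketch; parity values may reach 256


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through the given (xi, yi) points, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, m):
    """Return the k data fragments followed by m parity fragments."""
    k = len(data)
    points = list(enumerate(data))  # data fragment i sits at x = i
    parity = [_interpolate(points, x) for x in range(k, k + m)]
    return list(data) + parity


def recover(surviving, k):
    """Rebuild the k data fragments from any k surviving (index, value) pairs."""
    if len(surviving) < k:
        raise ValueError("too many fragments lost to recover the data")
    points = surviving[:k]
    return [_interpolate(points, x) for x in range(k)]


# 5+2: five data fragments, two parity fragments, seven "drives"
data = [10, 20, 30, 40, 50]
fragments = encode(data, m=2)

# Simulate losing drives 1 and 4, then rebuild from the five survivors
surviving = [(i, v) for i, v in enumerate(fragments) if i not in (1, 4)]
assert recover(surviving, k=5) == data
```

Any five of the seven fragments are enough to rebuild the data here, which is why it does not matter whether the failed drives held data or parity.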
In a 5+2 configuration, the parity data adds 40% on top of the stored data, which works out to about 29% of raw capacity (two of every seven fragments). The configuration can tolerate up to two disk failures, whether the disks contain data fragments or parity fragments. However, EC is flexible enough to support a wide range of configurations. For example, a 17+3 encoding would split each data unit into 17 segments and then add three parity segments. Although this configuration requires at least 20 physical drives, it can support up to three simultaneous disk failures, while reducing the parity overhead to less than 18% of the stored data (15% of raw capacity).
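The arithmetic behind these figures is easy to check. The helper below is a hypothetical sketch (the name `ec_profile` is not from any library) that reports the two common ways of expressing parity overhead: relative to the data stored, and as a share of raw capacity.

```python
def ec_profile(k, m):
    """Summarize a k+m erasure coding configuration."""
    total = k + m
    return {
        "drives_required": total,         # one fragment per drive
        "failures_tolerated": m,          # any m fragments may be lost
        "parity_vs_data": m / k,          # overhead relative to stored data
        "parity_share_of_raw": m / total  # fraction of raw capacity used by parity
    }


for k, m in [(5, 2), (17, 3)]:
    p = ec_profile(k, m)
    print(f"{k}+{m}: {p['drives_required']} drives, "
          f"tolerates {p['failures_tolerated']} failures, "
          f"parity = {p['parity_vs_data']:.1%} of data, "
          f"{p['parity_share_of_raw']:.1%} of raw capacity")
```

For 5+2 this reports 40.0% of data and 28.6% of raw capacity; for 17+3 it reports 17.6% and 15.0%.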
Erasure coding makes it possible to protect data without having to fully replicate it because the data can be reconstructed from parity fragments. For instance, in a simple 2+1 configuration, a data unit is split into two segments, with one parity fragment added for protection. If an application tries to retrieve data from either of the data segments and those segments are available, the operation proceeds as normal, even if the parity segment is unavailable.
However, if the first data fragment is available but the second data fragment isn't, or vice versa, data is read from the first data fragment and the parity fragment. Together these two fragments are used to reconstruct the data that was in the second fragment, making it possible to continue data operations while the disk is being rebuilt.
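The 2+1 case can be illustrated with plain XOR parity. This is a minimal sketch, assuming the single parity fragment is the byte-wise XOR of the two data fragments; the article does not say which calculation a given product uses, and the function names are hypothetical.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_2_plus_1(data):
    """Split data into two equal data fragments plus one XOR parity fragment."""
    if len(data) % 2:
        data += b"\x00"  # pad so both fragments have the same length
    half = len(data) // 2
    d1, d2 = data[:half], data[half:]
    return d1, d2, xor_bytes(d1, d2)


d1, d2, parity = split_2_plus_1(b"erasure!")

# Normal read: both data fragments are available, parity is not needed
assert d1 + d2 == b"erasure!"

# Degraded read: the second data fragment is lost,
# so it is rebuilt from the first fragment and the parity
rebuilt_d2 = xor_bytes(d1, parity)
assert rebuilt_d2 == d2
```

The same XOR step recovers the first fragment from the second fragment and the parity, which is the "or vice versa" case described above.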
Erasure coding vs. RAID
Erasure codes, also known as forward error correction codes, were developed more than 50 years ago to help detect and correct errors in data transmissions. The technology has since been adapted for storage to help protect data in the event of drive failure or data corruption. More recently, EC has been gaining popularity for use with large object-based data sets, particularly those in the cloud. As data sets continue to grow and object storage is more widely implemented, EC is becoming an increasingly viable alternative to RAID.
RAID relies on two primary mechanisms for protecting data: mirroring and striping with parity. Mirroring is one of the most basic forms of data protection. When used alone, it's referred to as RAID 1. In this configuration, multiple copies of the data are stored on two or more drives. If one drive fails, the data can be retrieved from one of the other drives, without interruption to service. Mirroring is easy to implement and maintain, but it uses a large amount of storage resources, just like any form of replication.
Striping with parity, referred to as RAID 5, stripes data across multiple hard disks and adds parity blocks to protect the data. If a drive fails, the missing data can be reconstructed using the data on the other disks. However, RAID 5 can support only one disk failure at a time. For this reason, some vendors offer RAID 6 storage systems, which can handle up to two simultaneous disk failures. Different RAID configurations can also be combined, as in RAID 10, which uses disk mirroring and data striping without parity to protect data.
The various RAID configurations have been integral to data center operations for many years because the technology is well understood and has proven a reliable form of data protection for a wide range of workloads. However, RAID comes with significant challenges. For example, mirroring is inefficient when it comes to resource utilization, and striping with parity can protect against only two simultaneous disk failures at best.
Another issue with RAID is related to capacity. As disk drives become larger, it takes much more time to rebuild a drive if it should fail. Not only can this degrade application performance while the rebuild is running, it can also increase the risk of losing data. For example, if a drive fails in a RAID 5 configuration, it might take days to rebuild that drive, leaving the storage array vulnerable to a second failure until the rebuild is complete.
In some cases, erasure coding can be used in place of RAID to address its limitations. Erasure coding can exceed RAID 6 in terms of the number of failed drives that can be tolerated, increasing the level of fault tolerance. In a 10+6 erasure coding configuration, 16 data and parity segments are spread across 16 drives, making it possible to handle up to six simultaneous drive failures.
Erasure coding is also much more flexible than RAID, whose configurations are fairly rigid. With EC, organizations can implement a storage system to meet their specific data protection requirements. In addition, EC can reduce the amount of time it takes to rebuild a disk that has failed, depending on the configuration and number of disks.
Despite these benefits, EC has a serious drawback: its effect on performance. Erasure coding is a processing-intensive operation. The EC algorithm must run against all data written to storage, and the data and parity segments must be written across all participating disks. If a disk fails, rebuild operations put an even greater strain on CPU resources because the data must be reconstructed on the fly. RAID configurations, whether mirroring or striping with parity, have much less of an effect on performance and can often improve it.
Why is erasure coding useful?
Major cloud storage services such as Amazon Simple Storage Service (S3), Microsoft Azure and Google Cloud use erasure coding extensively to protect their vast stores of data. Erasure coding has proven especially beneficial for protecting object-based storage systems, as well as distributed systems, making it well suited to cloud storage services. That said, erasure coding has also been making its way into on-premises object storage systems, such as the Dell EMC Elastic Cloud Storage (ECS) object storage platform.
Erasure coding can be useful with large quantities of data and any applications or systems that must tolerate failures, such as disk array systems, data grids, distributed storage applications, object stores and archival storage. Most of today's use cases revolve around large data sets for which RAID isn't a practical option. To support EC, the infrastructure must be able to deliver the necessary performance, which is why its predominant use case has been with major cloud services.
Erasure coding is often recommended for storage such as backups or archives -- the types of data sets that are fairly static and not write-intensive. That said, erasure coding is finding its way into a variety of systems trying to avoid the high costs of replication. For example, many Hadoop Distributed File System (HDFS) implementations now use EC to reduce the overhead associated with storing redundant data across data nodes. In addition, object storage platforms such as Hitachi Content Platform now support erasure coding for protecting data.
What are the benefits of erasure coding?
Although RAID can still be a useful tool for data protection, EC offers several important benefits that should be considered when planning data storage:
- Better resource utilization. Replication techniques such as RAID 1 mirroring use a high percentage of storage capacity for data copies. Erasure coding can significantly reduce storage consumption, while still protecting data. The exact amount of capacity saving will depend on the encoding configuration, but whatever it is, it will still translate to greater storage efficiency and lower storage costs.
- Lower risk of data loss. When a RAID array is made up of high-capacity disks, rebuilding a failed drive can take an extremely long time, which increases the risk of data loss should another drive fail before the first one can be rebuilt. Erasure coding can handle many more simultaneous disk failures, depending on the encoding configuration, which means that there is a lower risk of data loss if a drive goes down.
- Greater flexibility. RAID tends to be limited to fairly fixed configurations. Although vendors can implement proprietary RAID configurations, most RAID implementations are fairly standard. Erasure coding provides far more flexibility. Organizations can choose the data-to-parity ratio that best fits their specific workloads and storage systems.
- Greater durability. Erasure coding enables an organization to configure a storage system that offers a high degree of availability and durability. For example, Amazon S3 is designed for 99.999999999% object durability across multiple Availability Zones. Unlike RAID 6, which can sustain only two simultaneous disk failures, an EC-based system can be configured to handle substantially more.
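The resource-utilization point in the list above can be made concrete with a rough calculation. The sketch below is illustrative only: the 100 TB figure is a hypothetical workload, the function names are assumptions, and it ignores spare capacity, metadata and file system overhead.

```python
def raw_for_replication(usable_tb, copies):
    """Raw capacity needed when every byte is stored `copies` times."""
    return usable_tb * copies


def raw_for_ec(usable_tb, k, m):
    """Raw capacity needed for a k+m erasure coding configuration."""
    return usable_tb * (k + m) / k


usable_tb = 100  # hypothetical amount of data to protect

print("2-copy mirroring:", raw_for_replication(usable_tb, 2), "TB raw, tolerates 1 failure")
for k, m in [(5, 2), (10, 6), (17, 3)]:
    raw = raw_for_ec(usable_tb, k, m)
    print(f"EC {k}+{m}: {raw:.1f} TB raw, tolerates {m} failures")
```

Under these assumptions, a two-copy mirror needs 200 TB of raw capacity to survive one failure, while 5+2 needs 140 TB to survive two and 10+6 needs 160 TB to survive six.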
When planning their storage strategies, organizations must consider several factors, including how to protect against data loss and provide disaster recovery. Straightforward replication is one approach and RAID is another. Erasure coding is yet one more.
Each strategy comes with advantages and disadvantages. However, with the growing amount of data and continued move to object storage, EC is destined to gain momentum. Erasure coding enables organizations to meet their scalability needs and still protect their data, without incurring the high costs of full replication. Even so, no technology can flourish without adapting to industry changes, and the EC in service today could look much different five years down the road.
This was last updated in January 2021
What is RAID 5 erasure coding?
RAID 5 or RAID 6 erasure coding is a policy attribute that you can apply to virtual machine components. To use RAID 5, set Failure tolerance method to RAID-5/6 (Erasure Coding) - Capacity and Primary level of failures to tolerate to 1.
What is erasure explain with the help of at least one example?
The erasure of something is the removal, loss, or destruction of it. Globalization is the erasure of national borders for economic purposes.
Why is RAID mirror replication parity erasure code by itself not a replacement for backup?
Keep in mind that mirroring and replication by themselves are not a replacement for backups, versions, snapshots, or another recovery point, time-interval (time-gap) protection. The reason is that replication and mirroring maintain a copy of the source at one or more destination targets.
What is data erasure and how it is useful?
Data erasure is a software-based method of permanently erasing the sensitive, confidential data on a device to make it irrecoverable while ensuring that the device is still reusable. Unlike other measures of data sanitization, it doesn't lead to device destruction and thus, is eco-friendly.
What is meant by data erasure how is it useful?
Data erasure (sometimes referred to as data clearing, data wiping, or data destruction) is a software-based method of overwriting the data that aims to completely destroy all electronic data residing on a hard disk drive or other digital media by using zeros and ones to overwrite data onto all sectors of the device in ...
What are 3 types of RAID?
- Striping: data is split between multiple disks.
- Mirroring: data is mirrored between multiple disks.
- Parity: also referred to as a checksum. Parity is a calculated value used to mathematically rebuild data.
The most common types are RAID 0 (striping), RAID 1 (mirroring) and its variants, RAID 5 (distributed parity), and RAID 6 (dual parity).
Why is it called RAID 5?
(Redundant Array of Independent Disks Mode 5) A popular disk or solid state drive (SSD) subsystem that increases safety by computing parity data and increasing speed by interleaving data across three or more drives (striping).
What are the two 2 types of Erasure?
- Erasure is a type of alteration in document. It can be classified as chemical erasure and physical erasure.
What is an example of Erasure?
The act or an instance of erasing. Erasure of the blackboard. A mark showing that something has been erased. The document has many erasures.
Type erasure ensures that no new classes are created for parameterized types; consequently, generics incur no runtime overhead.
What is the difference between replication and mirroring?
Mirroring is the copying of data or a database to a different location, while replication is the creation of data and database objects to increase the distribution actions.
How does RAID differ from mirroring?
Mirroring is another form of RAID – RAID-1 for the purist. Mirroring consists of at least 2 disk drives that duplicate the storage of data. More frequently, you will see 2 or more disk units on each array so duplicate data is sent to the second array of disks.
Which is better mirror or parity?
Device parity protection is slower than mirroring when a failure has occurred because data on the failing unit has to be reconstructed from the data on other units. Mirroring can also provide protection from other hardware failures.
Why is erasure coding better than RAID?
In erasure coding, the data is broken into parts, then expanded and encoded. After that, the data segments are kept in multiple locations. RAID also provides data protection; however, erasure coding consumes less storage, while RAID is more time efficient.
What is the most secure method of data erasure?
Another form of physical destruction, shredding may be the most secure and cost-effective way to destroy electronic data in any media that contain hard drives or solid state drives and have reached their end-of-life.
That's why you need erasure software to remove all traces of data. Data wiping software makes it more difficult for third parties to get their hands on deleted data. It erases data from hard drives, phone memory, and other storage media several times to prevent the leakage of sensitive information.
What is the difference between data deletion and data erasure?
Deleted data can be recovered using a DIY data recovery tool or manual/lab techniques. Erased data is no longer present on the system and cannot be retrieved by any means, whether software or in-lab methods/techniques.
How do you handle data erasure requests?
If your company receives an erasure request, you must be transparent with the requestor by detailing what will happen to their data when the request is fulfilled. You should always verify the identity of the individual first in order to confirm that they are who they claim to be.
What is data erasure?
Data erasure / data wiping is a method of software-based overwriting, that completely destroys all electronic data residing on a hard disk drive.
What is RAID in simple words?
RAID (redundant array of independent disks) is a way of storing the same data in different places on multiple hard disks or solid-state drives (SSDs) to protect data in the case of a drive failure. There are different RAID levels, however, and not all have the goal of providing redundancy.
Why is it called a RAID?
The term itself stems from the military definition of 'a sudden attack and/or seizure of some objective'.
What are the two 2 types of RAIDs?
There are two types of RAID implementation: hardware and software. Each implementation has its own advantages and disadvantages.
Which RAID is fastest?
RAID 0 is the only RAID type without fault tolerance. It is also by far the fastest RAID type. RAID 0 works by using striping, which disperses system data blocks across several different disks.
What is RAID 0 RAID 1 and RAID 5?
RAID 0 – striping. RAID 1 – mirroring. RAID 5 – striping with parity. RAID 6 – striping with double parity. RAID 10 – combining mirroring and striping.
Which RAID is the strongest?
RAID 1 – For Highest Security. RAID 1 is best described using two disks as an example. Imagine two hard drives on your desk, and they are working as real-time clones of each other.
Can you do RAID with 3 drives?
A RAID 5 array is built from a minimum of three disk drives, and uses data striping and parity data to provide redundancy. Parity data provides data protection, and striping improves performance. Parity data is an error-correcting redundancy that's used to re-create data if a disk drive fails.
What is the advantage of RAID?
Why is RAID used? RAID stands for Redundant Array of Independent Disks, and combines multiple hard drives together in order to improve efficiency. Depending on how your RAID is configured, it can increase your computer's speed while giving you a single drive with a huge capacity. RAIDs can also increase reliability.
When should I use RAID?
RAID is extremely useful if uptime and availability are important to you or your business. Backups will help insure you against a catastrophic data loss. But restoring large amounts of data, as when you experience a drive failure, can take many hours to perform.
What is the difference between Erasure and eraser?
Erasure is the act of erasing, deleting, or removing something. It's tricky to write an essay on a typewriter instead of a computer, because it's hard to hide any erasures. An erasure can be made, appropriately, by erasing pencilled words with an eraser, but there are many other kinds of erasure.
What is the difference between obliteration and Erasure?
Alterations include erasures, charring, indented writing, additional markings, as well as obliterations, while obliterations are the overwriting of text with another substance.
What is type erasure and explain the functionality of type erasure with example?
Type erasure is a process in which compiler replaces a generic parameter with actual class or bridge method. In type erasure, compiler ensures that no extra classes are created and there is no runtime overhead.
What is the opposite of Erasure?
Antonyms & Near Antonyms for erasure. enactment, legislation.
Does data erasure use encryption?
What is Cryptographic Erasure (CE)? This wiping method uses the native command to call a cryptographic erasure, which erases the encryption key. While the encrypted data remains on the storage device itself, it is effectively impossible to decrypt, rendering the data unrecoverable.
What are erasure software standards called?
DoD 5220.22-M Standard is a widely recognized method for data erasure used by government agencies and organizations worldwide for performing drive erasure. In the media sanitization circles, it is known as US DoD 5220.22-M data wipe standard.
What are reified generics?
Reified Generics is a generics system that makes generics type information available at runtime. C# is one language that supports Reified Generics natively, as opposed to Java's Type-Erased Generics.
What is bridge method in Java?
When compiling a class or interface that extends a parameterized class or implements a parameterized interface, the compiler may need to create a synthetic method, which is called a bridge method, as part of the type erasure process.
What are not allowed for generics?
Cannot Declare Static Fields Whose Types are Type Parameters. Cannot Use Casts or instanceof With Parameterized Types. Cannot Create Arrays of Parameterized Types. Cannot Create, Catch, or Throw Objects of Parameterized Types.
What are the three types of replication?
There were three models for how organisms might replicate their DNA: semi-conservative, conservative, and dispersive.
What's the difference between replication and clustering?
Replication writes all storage points, it can be at the same time, or it can be delayed depending on how it is configured. Clustering uses shared storage. Only the active node accesses the data. Even in Active/Active clustering, only 1 node accesses data at a time, as a cluster group can only be active on one node.
What is the difference between replication and caching?
Replicating data and distributing it as needed is called mirroring. Pure replication differs from caching in the sense that caching systems "pull data" from the origin server, while replication systems tend to "push" data to maintain mirror copies of the same data at various places on the network.
What are the three characteristics of RAID?
Each RAID level has its own characteristics, such as fault tolerance, performance and capacity. Fault tolerance is the ability to survive one or several disk failures. Performance shows the change in the read and write speed of the entire array as compared to a single disk.
Is RAID faster than single drive?
A common RAID setup for volumes that are larger, faster, and more safe than any single drive. Your data is spread across all the drives in the RAID along with information that will allow your data to be recovered in case of a single drive failure.
How many drives can you have in a RAID 1?
RAID 1 is most often implemented with two drives. Data on the drives is mirrored, providing fault tolerance in case of drive failure. Read performance is increased while write performance will be similar to a single drive. A single drive failure can be sustained without data loss.
Why is parity used in RAID?
RAID 5 is disk striping with parity. With this level of RAID, data is striped across three or more disks, with parity information stored across multiple disks. Parity is a calculated value that's used to restore data from the other drives if one of the drives in the set fails.
What are the differences between storage pool and storage spaces?
Creating a Pool and a Storage Space
A pool is simply a logical grouping of physical disks, whereas a storage space is a virtualized disk that can be used like a physical disk.
The virtual disk uses three-way mirroring and is a fixed size of 20 GB. You must have at least five physical disks in the storage pool for this cmdlet to work.
What does erasure mean in GDPR?
This is also known as the 'right to be forgotten'.
You have the right to have your data erased, without undue delay, by the data controller, if one of the following grounds applies: Where your personal data are no longer necessary in relation to the purpose for which it was collected or processed.
Erasure Coding VS RAID?
Erasure coding and RAID are sometimes mixed up but they are very much different from each other. RAID allows data to be stored at different locations and it protects against drive failures. In erasure coding, the data is broken in parts, then expanded and encoded.
What is erasure error?
In coding theory, an erasure code is a forward error correction (FEC) code under the assumption of bit erasures (rather than bit errors), which transforms a message of k symbols into a longer message (code word) with n symbols such that the original message can be recovered from a subset of the n symbols.
What is right to erasure or blocking?
The right to erasure or blocking
Under the law, you have the right to suspend, withdraw or order the blocking, removal or destruction of your personal data. You can exercise this right upon discovery and substantial proof of the following: Your personal data is incomplete, outdated, false, or unlawfully obtained.
What is a personal data breach?
A personal data breach is defined as “a breach of security leading to the accidental or unlawful destruction, loss, alteration, unauthorised disclosure of, or access to, personal data transmitted, stored or otherwise processed.” The key is that any breach must concern personal data.
What is considered an erasure?
The word erasure has a multiplicity of meanings and applications; to remove, rub or scrape out written or drawn marks; to remove recorded matter; to remove from existence or memory; and to nullify the effect or force of something.
What is RAID 5 and how it works?
RAID 5 is a redundant array of independent disks configuration that uses disk striping with parity. Because data and parity are striped evenly across all of the disks, no single disk is a bottleneck. The parity also allows data to be reconstructed in case of a disk failure.
What are the three 3 general data privacy principles?
Principles of Transparency, Legitimate Purpose and Proportionality. The processing of personal data shall be allowed subject to adherence to the principles of transparency, legitimate purpose, and proportionality.
What are the 8 rights in data privacy?
Under Chapter IV of the Act, there are eight (8) rights that belong to data subjects, namely: the right to be informed; the right to access; the right to object; the right to erasure and blocking; the right to rectify; the right to file a complaint; the right to damages; and the right to data portability.