
Bit Rot, And How Important Is This For Audiophiles

music library

Usually I refrain from blogging about technical computing topics on Snakeoil. This article is a rare exception: today I would like to talk about bit rot. This is something you should be aware of (but not alarmed by).

This article will briefly describe:

  • What is bit rot
  • What bit rot means to an audiophile
  • My personal experiences with bit rot
  • How to detect if you are affected with bit rot
  • Strategies to prevent bit rot

Hopefully after reading this article you'll be more aware of what bit rot is and how to counteract it. Before we start, let it be made clear that bit rot is not a big problem. There is no reason to panic and the sky is not going to fall.

Let us begin!


The Basics Of Bits

Computer data is made up of bits - a bit is a binary value because it can only take one of the two given values, for example 0 and 1.

In a simple sense, data is a collection of bits grouped and arranged in a specific order to represent information. In computing terms we call this representation an encoding. Like the picture below, bits are first grouped into basic building blocks, then arranged to form a more complex building block. Several of these more complex secondary building blocks can then be used to form an even larger building block.

By applying this technique (like Lego building), computer scientists have created a system where binary values can be used to represent any type of information or instruction. The whole structure is written to digital media such as a hard disk (HDD), solid state drive (SSD) or USB stick, to be read at a later time.

In the following example we start with the most basic data primitive - the char (short for character).

Bit Representation (Data)    Value (Information)    Meaning
0000 0001                    1                      The number 1
0100 0000                    64                     The number 64
1100 1001                    201                    The number 201

The char data type takes up 8 bits of space (8 bits = 1 byte). We are also treating it as unsigned (geek speak for no negative values). With 8 bits we can represent whole numbers from 0 to 255 (a total of 256 values). On the next page we'll explain how the maximum of 255 comes about, as well as the difference between unsigned (zero and positive only) and signed (positive and negative).
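As a rough illustration (a Python sketch, not something used elsewhere in this article), the three rows of the table can be checked by reading each bit pattern as an unsigned 8 bit number. The helper name bits_to_value is made up for this example.

```python
def bits_to_value(bits: str) -> int:
    """Interpret a pattern such as '1100 1001' as an unsigned 8 bit number."""
    cleaned = bits.replace(" ", "")
    if len(cleaned) != 8 or set(cleaned) - {"0", "1"}:
        raise ValueError("expected exactly 8 binary digits")
    return int(cleaned, 2)


for pattern in ("0000 0001", "0100 0000", "1100 1001"):
    print(pattern, "->", bits_to_value(pattern))
```

The smallest pattern (all zeros) gives 0 and the largest (all ones) gives 255, which is where the 256 possible values come from.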

The Basics Of Bits (Part II)

In the binary number system, the bit position gains more value (magnitude) as we move to the left (starting from the rightmost bit). 

The rightmost bit is the first bit, also called the least significant bit (LSB). It holds a value of 0 or 1, the second bit holds a value of 0 or 2, the third bit holds a value of 0 or 4, and so on until the last bit, the most significant bit (MSB). The MSB can be interpreted in two ways:

  • As an unsigned char it holds a value of 0 or 128.
  • As a signed char it holds the sign (see figure above). Because one of the eight bits is now spent on the sign, only seven bits remain for the magnitude, and the range of values that can be represented by the signed char becomes -128 to 127.
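To make the two interpretations concrete, here is a minimal Python sketch. It assumes the two's complement convention for signed values (the convention that gives the -128 to 127 range); the function names are invented for illustration.

```python
def as_unsigned(byte: int) -> int:
    """Read the 8 bits as an unsigned char (0 to 255)."""
    return byte & 0xFF


def as_signed(byte: int) -> int:
    """Read the same 8 bits as a two's complement signed char (-128 to 127)."""
    byte &= 0xFF
    # When the MSB is set, the pattern stands for the unsigned value minus 256.
    return byte - 256 if byte & 0x80 else byte


for pattern in (0b0111_1111, 0b1000_0000, 0b1100_1001, 0b1111_1111):
    print(f"{pattern:08b}  unsigned={as_unsigned(pattern):4d}  signed={as_signed(pattern):4d}")
```

The bits on disk are identical in both cases; only the interpretation chosen by the software changes the number.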

The value of each bit position adheres to the formula:

Bit Position           To The Power Of    Binary Value
1 (LSB)                0 or 2^0           0 or 1
2                      0 or 2^1           0 or 2
3                      0 or 2^2           0 or 4
4                      0 or 2^3           0 or 8
5                      0 or 2^4           0 or 16
6                      0 or 2^5           0 or 32
7                      0 or 2^6           0 or 64
8 (MSB if unsigned)    0 or 2^7           0 or 128
8 (MSB if signed)      0 or 1             sign (+ or -)

Therefore the binary representation 1100 1001 has a value of 201 because:

2^0 + 0 + 0 + 2^3 + 0 + 0 + 2^6 + 2^7
= 1 + 0 + 0 + 8 + 0 + 0 + 64 + 128
= 201

And the binary representation 1111 0111 has a value of 247 because:

2^0 + 2^1 + 2^2 + 0 + 2^4 + 2^5 + 2^6 + 2^7
= 1 + 2 + 4 + 0 + 16 + 32 + 64 + 128
= 247

Finally, here's how we get 255 as the maximum (when all bits are 1):

2^0 + 2^1 + 2^2 + 2^3 + 2^4 + 2^5 + 2^6 + 2^7
= 1 + 2 + 4 + 8 + 16 + 32 + 64 + 128
= 255
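The sums above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; bit_contributions is a hypothetical helper that lists what each bit adds to the total, starting from the LSB.

```python
def bit_contributions(bits: str) -> list[int]:
    """Return what each bit adds to the total, starting from the LSB."""
    cleaned = bits.replace(" ", "")
    return [int(bit) << position for position, bit in enumerate(reversed(cleaned))]


for pattern in ("1100 1001", "1111 0111", "1111 1111"):
    parts = bit_contributions(pattern)
    print(pattern, "=", " + ".join(str(p) for p in parts), "=", sum(parts))
```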

8 bit numbers are rarely used for arithmetic in modern day computing because the numbers they can represent are too small for most practical purposes. The most familiar use of this 8 bit data type is character representation (you'll read more on this later).

For numerical purposes we use wider integers. 16 bit integers were the norm on older machines, while 32 bit integers are the usual default on today's 32 bit and 64 bit computers.

Integer Size                Range For Signed                   Range For Unsigned
16 bit integer (2 bytes)    -32,768 to 32,767                  0 to 65,535
32 bit integer (4 bytes)    -2,147,483,648 to 2,147,483,647    0 to 4,294,967,295
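The ranges in the table follow directly from the number of bits. A small Python sketch (integer_range is an illustrative name, and the signed range assumes two's complement) shows the calculation:

```python
def integer_range(bits: int, signed: bool) -> tuple[int, int]:
    """Smallest and largest whole number that fits in the given number of bits."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


for bits in (8, 16, 32):
    low, high = integer_range(bits, signed=True)
    _, unsigned_high = integer_range(bits, signed=False)
    print(f"{bits:2d} bits: signed {low:,} to {high:,}, unsigned 0 to {unsigned_high:,}")
```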

By now you'll realise what the effects of bit rot can be for the more complex data types: a flip at the 16th or 30th position is far more significant than a flip at the 8th position or lower.


The Basics Of Bits (Part III)

Computers use different patterns to represent the number 201 and the text 201. Text is an example of a new building block built on the raw bits. We mentioned earlier that this is called encoding. The two most commonly used encoding systems for text are ASCII and the more modern Unicode.

In many cases a piece of text (a string) ends with a special character that denotes the end of the string. This special character is called null or the terminator.

The number 201 can be represented in a single byte (of type char), but to represent the same number as text requires a total of 4 bytes. The table below depicts the ASCII encoding of the text 201 (note the extra terminator character to denote the end of the string):

Text    ASCII Code    Binary
'2'     50            0011 0010
'0'     48            0011 0000
'1'     49            0011 0001
NULL    0             0000 0000

A big number like 2 billion (2,000,000,000) can fit into a 32 bit integer type (4 bytes), but to represent that same number (including commas) in text format will require a total of 14 bytes (do not forget the terminator)!
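The size difference between a number and its text form can be checked with Python's standard struct module. This is a sketch only; the choice of a little-endian 32 bit signed integer ("<i") is an assumption made for the example.

```python
import struct

number = 201
as_char = struct.pack("B", number)                 # one unsigned byte
as_text = str(number).encode("ascii") + b"\x00"    # three digits plus the terminator

print(len(as_char), list(as_char))
print(len(as_text), list(as_text))

big = 2_000_000_000
big_as_int = struct.pack("<i", big)                   # 32 bit signed integer
big_as_text = f"{big:,}".encode("ascii") + b"\x00"    # "2,000,000,000" plus the terminator

print(len(big_as_int), len(big_as_text))
```

The lists printed for as_text are the ASCII codes of each character, followed by 0 for the terminator.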

char, integer and text are three of the most common data types used by computers. There are many more data types, and together they let us create very complex software and data files.

For example, consider photo viewing software that can load a JPEG encoded file on your computer. The software is loaded from the storage medium and executed by the computer. The execution includes reading another file, encoded in JPEG format, processing that file and then displaying the decoded picture on your computer monitor.

Another example you'll be more familiar with is music playing software (e.g. MPD, Roon) reading a FLAC, WAV, APE or MP3 file, decoding the information within and converting that into music that is played back on your Hifi system.

Computer bits are like grains of sand: on its own each is just an unassuming speck. Together the grains of sand can be shaped and sculpted to take on new forms and meaning - a beautiful sand castle.

Sand castles can be washed away by the incoming tide, just as bits can erode away over time.

What Is Bit Rot?

Bit rot is a form of data degradation - what you read now is not exactly the same as what was written before. In binary terms it means a value originally written down as 1 is read back as 0 (or the other way round).

The following illustrates the effects of bit rot on the raw data type unsigned char (used to represent whole numbers from 0 to 255).

Original Bits    Original Value    Corrupted Bit(s) Representation    Corrupted Value
1111 1111        255               1111 1110                          254
                                   0111 1111                          127
                                   1111 0111                          247
0000 0000        0                 0000 0001                          1
                                   0010 0000                          32
                                   1000 0000                          128

Note that the change in the corrupted value depends on the position of the corrupted bit. The table below is the same one from The Basics Of Bits (Part II); it highlights the jump in significance from the LSB to the MSB.

Bit Position           To The Power Of    Binary Value
1                      0 or 2^0           0 or 1
2                      0 or 2^1           0 or 2
3                      0 or 2^2           0 or 4
4                      0 or 2^3           0 or 8
5                      0 or 2^4           0 or 16
6                      0 or 2^5           0 or 32
7                      0 or 2^6           0 or 64
8 (MSB if unsigned)    0 or 2^7           0 or 128
8 (MSB if signed)      0 or 1             sign (+ or -)
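The corrupted values in this section can be reproduced by flipping one bit at a time. The sketch below uses XOR for the flip; flip_bit is a made-up helper, and positions are counted the same way as in the table (1 is the LSB, 8 is the MSB).

```python
def flip_bit(value: int, position: int) -> int:
    """Flip one bit of an unsigned 8 bit value. Position 1 is the LSB, 8 the MSB."""
    if not 1 <= position <= 8:
        raise ValueError("position must be between 1 and 8")
    return (value ^ (1 << (position - 1))) & 0xFF


for original, position in ((255, 1), (255, 8), (255, 4), (0, 1), (0, 6), (0, 8)):
    corrupted = flip_bit(original, position)
    print(f"{original:08b} ({original:3d}) -> bit {position} flipped -> {corrupted:08b} ({corrupted:3d})")
```

Flipping the same bit a second time restores the original value, which is why a second unlucky flip can occasionally hide the first one from a weak checksum.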

The effect of bit rot on the char data type is small: the difference ranges from an insignificant change of 1 to half the range of the data type, 128. Neither is a big number by most measures. Where it can make a difference is in stored passwords. For example, if bit rot happens in the space where your password is saved, in the worst case you can no longer log in to your OS or access the application software.

This problem scales up with more complex building blocks. On other data types (e.g. 4 byte integers), a bit flip can alter the final value by 2^31 (that's a value of 2,147,483,648)! With more complicated encodings that work on multiple bytes in a single block (e.g. a FLAC frame), a single corrupted bit means the entire block has to be discarded.
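To get a feel for how the damage grows with bit position in a 4 byte integer, here is a short Python sketch. The function name and the starting value of 1000 are arbitrary choices for illustration; bit 0 is the LSB here.

```python
def change_from_flip(value: int, bit: int, width: int = 32) -> int:
    """Absolute change in an unsigned integer when one bit (0 = LSB) is flipped."""
    if not 0 <= bit < width:
        raise ValueError("bit is outside the integer")
    return abs((value ^ (1 << bit)) - value)


for bit in (0, 7, 15, 30, 31):
    print(f"bit {bit:2d}: value changes by {change_from_flip(1000, bit):,}")
```

Whatever the starting value, flipping bit n always moves the number by exactly 2^n, so the top bit of a 32 bit integer is worth 2^31.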

In general, the more complex the representation built on the bits, the worse the effects of bit rot can be. Having said that, some encodings have mechanisms such as built in error detection, fallback, backup and correction.

So again let me stress this is nothing to be alarmed about! The odds of any of the above happening are really small - your odds of winning division 1 lottery are likely to be higher.


Bit rot only becomes a problem when multiple bit flips over time corrupt a data file (or software) enough to cause trouble (e.g. random computer crashes, corrupted videos, corrupted audio, etc). Usually by this time it is too late to fix or salvage anything.

Regular backups are no guarantee against this data corruption. You have to ensure the integrity of the files being backed up as well.

In the next section we'll demonstrate the effects of a single flipped bit.

Effects Of Bit Rot

Here is a simple BMP file that is just 9 pixels wide and 7 pixels high. This BMP file is a blank canvas - only the white background and nothing else. This file is then loaded in Microsoft Paint and zoomed in to make it easier for you to see. At 100% this bitmap is extremely small: on a typical 1920 x 1080 screen its 63 pixels take up only about 0.003% of the screen.

A single bit is then flipped in the file - and saved as Corrupted.png. The file is loaded and again zoomed to the maximum. The corrupted file shows a clear yellow spot in the top left corner. This yellow spot is just from a single bit change.

At the original resolution this yellow spot will be very difficult to spot indeed.

The above is an exaggerated example that makes bit rot easier to spot. The resolution of this example is a mere 9 by 7 pixels (a total of 63 pixels). The resolution of a modern camera like the Canon 1Dx is 5184 by 3456 pixels, that's 17,915,904 pixels in total.

Imagine trying to spot about 3 pixels of error in a picture of nearly 18 megapixels. The error(s) will be nearly impossible to spot - in most cases these errors will look like digital noise.

Effects Of Bit Rot (Part II)

This is the hex[1] dump of the original bitmap. You can recreate the file from the previous example by using a hex editor and entering these values:

Here is the hex dump of the corrupted bitmap, almost identical, except for one small difference.

Can you find it?

The difference between the two files is at position 0x000000C8. Look at the row labelled 000000C0, and then count 8 columns from the left. The original file has a value of 0xFF, while the corrupted file has a value of 0x7F. 

Original Representation    Original Value    Corrupted Representation    Corrupted Value
1111 1111                  255 (FF)          0111 1111                   127 (7F)

So the two files are identical in almost every way except for that single bit. A single bit flip caused that yellow spot you saw on the previous page. Over time, as more bit rot creeps in, the picture will change, and because the rotting is unpredictable and random, it is really hard to say for certain what the result will be.
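The experiment can be approximated in Python without a hex editor. The sketch below rests on assumptions the article does not state: it builds an all white, uncompressed 24 bit BMP with the standard 54 byte header, so its bytes may not match the original file exactly. All names in it are invented for the example.

```python
import struct

WIDTH, HEIGHT = 9, 7
BYTES_PER_PIXEL = 3                                  # assumption: 24 bit colour
ROW_SIZE = (WIDTH * BYTES_PER_PIXEL + 3) // 4 * 4    # each row is padded to 4 bytes
HEADER_SIZE = 14 + 40                                # file header + info header


def white_bmp() -> bytearray:
    """Build an all white, uncompressed 24 bit BMP."""
    padding = ROW_SIZE - WIDTH * BYTES_PER_PIXEL
    row = b"\xff" * (WIDTH * BYTES_PER_PIXEL) + b"\x00" * padding
    pixels = row * HEIGHT
    file_header = struct.pack("<2sIHHI", b"BM", HEADER_SIZE + len(pixels), 0, 0, HEADER_SIZE)
    info_header = struct.pack("<IiiHHIIiiII", 40, WIDTH, HEIGHT, 1, 24, 0,
                              len(pixels), 2835, 2835, 0, 0)
    return bytearray(file_header + info_header + pixels)


def pixel_at_offset(bmp: bytearray, offset: int) -> tuple[int, int, int, int, int]:
    """Return (x, y_from_top, red, green, blue) for the pixel containing a byte offset."""
    row_from_bottom, column_byte = divmod(offset - HEADER_SIZE, ROW_SIZE)
    x = column_byte // BYTES_PER_PIXEL
    start = HEADER_SIZE + row_from_bottom * ROW_SIZE + x * BYTES_PER_PIXEL
    blue, green, red = bmp[start:start + BYTES_PER_PIXEL]   # BMP stores blue first
    return x, HEIGHT - 1 - row_from_bottom, red, green, blue  # rows are stored bottom-up


original = white_bmp()
corrupted = bytearray(original)
corrupted[0xC8] ^= 0x80          # flip the top bit: 0xFF becomes 0x7F

differences = [i for i, (a, b) in enumerate(zip(original, corrupted)) if a != b]
print([hex(i) for i in differences])
print(pixel_at_offset(corrupted, 0xC8))
```

Under these assumptions the flipped byte is the blue channel of a pixel near the top left corner. Halving the blue while red and green stay at full strength turns white into a pale yellow, which is consistent with the spot described above.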

Footnotes:

  1. Hexadecimal is another way of representing numbers. Binary is base 2 (0, 1, 10, 11, 100, 101, ...), decimal is base 10 (0, 1, ..., 9, 10, 11, 12, ...), and hexadecimal is base 16 (0, ..., 9, A, B, ..., E, F, 10, 11, ..., 19, 1A, ...).

What Causes Bit Rot?

The most plausible theory right now is that the magnetic or electrical charge changes after a long period of time, e.g. a bit of 1 slowly loses its magnetism/charge over time and drifts to the threshold where it's interpreted as 0.

This is a reasonable assumption because data is packed denser and denser into smaller and smaller areas, making cross contamination that much easier. In support of this theory, you can copy a file to your SSD just once, never use the SSD until years later, and find corruption in some of the files.

However, in my experience, a computer system is a very complicated beast and the multi-layered interaction and interconnection between hardware, firmware and software can cause problems, especially when parts of this critical chain of communication go unmaintained over the years and turn obsolete as other layers are updated and modernised.

My money is on an obscure bug somewhere that is causing this flip. This bug rarely surfaces, but when it does, the bit flip happens, e.g. as data is travelling along the SATA cable to be written to the SSD, something happens and a bit is flipped along the way.

Corruption may well not be caused by bit rot. Things like anti-virus and spyware installed on your system are constantly looking at the things we do. Any of these applications is capable of altering the contents of anything inside your computer (if it has the access rights).

I cannot prove any of my theories though.

Regardless of the cause, we know the effects - the data retrieved is sometimes not the same as what was written. You can read more about bit rot here: wikipedia article.

My Experience With Bit Rot

Since sometime in 2010, I've been using a NAS. My first machine was a 7 bay SOHO NAS made by Thecus - the N7700PRO. All my photos, music, movies and a ton of anime I've collected since my childhood were transferred from various media and storage to this NAS.

The N7700PRO faithfully served my digital content for well over 6 years, until I began to notice something strange. The data that I copied and verified by CRC32 (later MD5) checksum during the transfer stage started to show problems. These problems were usually over in the blink of an eye so I did not really pay much attention to them. The problems include:

  • Video corruption. Corruption tends to happen to my MPEG2 encodings the most. In the worst case I lose a few seconds of audio, video or both, but usually it is just a corrupted section in a few frames.
  • FLAC files sometimes skip a fraction of a second, a brief moment of silence. If you're not paying attention you may well miss it.
  • JPEG files start displaying weird coloured blocks.

Not a lot of files are corrupted. A rough number would be fewer than fifteen files out of tens of thousands, with MPEG2 files (copied from Video CDs) accounting for the majority. This was enough to make me curious to investigate further. That is easy enough for FLAC since it has a built in integrity checker.

My Experience With Bit Rot (Part II)

The FLAC format has the ability to test its own integrity. You can invoke the test by using the -t option with the flac command line utility, like so:

flac -t file_to_be_tested.flac

The following graphic is showing the output of a normal FLAC file (OK means all clear!):

Next I created a copy of this file (calling it Corrupted.flac), modified a single bit from 1 to 0 and ran the test again. Testing now detects a checksum mismatch in one of the frames:

Knowing how FLAC testing works, I created a quick Bash script to run it over all the FLAC files in a given directory. To illustrate how rare bit rot really is, only 3-4 FLAC tracks failed this verification test. My N7700PRO setup ran RAID-6 with the XFS file system, which allows up to two disks to fail, but unfortunately it cannot protect me from bit rot.
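The same idea can be sketched in Python. This is not the Bash script mentioned above, just a rough equivalent written for illustration. It assumes the flac utility is on your PATH and that it exits with a non-zero status when the -t test fails; the function names are invented here.

```python
import subprocess
from pathlib import Path


def flac_test_passes(path: Path) -> bool:
    """Run `flac -t` on one file and report whether it exited cleanly."""
    result = subprocess.run(
        ["flac", "-t", str(path)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_failed_flacs(root, tester=flac_test_passes) -> list[Path]:
    """Walk a directory tree and return every .flac file that fails the test."""
    return [path for path in sorted(Path(root).rglob("*.flac")) if not tester(path)]
```

Calling find_failed_flacs("/media/music") would return the files that did not pass. The tester argument exists so the walk can be tried out with a stand-in function on a machine that does not have flac installed.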

Checksum Collision

Note that no form of integrity testing can detect every possible corruption. There are cases where two different contents yield the same checksum. This effect is known as a hash collision. It is an acceptable trade off: while there will be instances where bit rot goes undetected, a checksum is still a good tool to have because eventually the errors will be detected.
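A collision is easiest to see with a deliberately weak checksum. The Python sketch below uses a toy "sum of all bytes" checksum invented for this example (it is not what FLAC or any file system uses) and compares it with CRC32 from the standard zlib module.

```python
import zlib


def toy_checksum(data: bytes) -> int:
    """A deliberately weak 8 bit checksum: the sum of all bytes, modulo 256."""
    return sum(data) % 256


original = bytes([0b0000_0001, 0b0000_0000])
one_flip = bytes([0b0000_0000, 0b0000_0000])     # one bit flipped
two_flips = bytes([0b0000_0000, 0b0000_0001])    # a second flip cancels out the first

print(toy_checksum(original), toy_checksum(one_flip), toy_checksum(two_flips))
print(zlib.crc32(original) != zlib.crc32(two_flips))
```

The toy checksum catches the single flip but is fooled once the second flip lands, while CRC32 still tells the two apart. Stronger checksums make collisions far less likely, but no fixed size checksum can rule them out completely.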

Ironically you would not be able to hear a problem if you are using WAV files. There are 44,100 samples per channel per second, so every second of stereo audio is represented by a total of 88,200 numbers. This is like trying to spot a few bad pixels in a twenty megapixel picture. It is even harder with music since a single bit error is unlikely to be resolved physically by your Hifi system during playback. It is only apparent with FLAC files because the decoder drops the corrupted frame, leaving a gap long enough for you to notice the skip.
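The numbers in this paragraph can be worked through in Python. The sketch assumes CD quality audio (stereo, 16 bits per sample), which the article implies but does not spell out, and flip_error_db is a made-up helper that expresses the size of a one bit error relative to full scale.

```python
import math

SAMPLE_RATE = 44_100      # samples per channel per second
CHANNELS = 2              # assumption: stereo
BITS_PER_SAMPLE = 16      # assumption: CD quality

numbers_per_second = SAMPLE_RATE * CHANNELS
full_scale = 1 << (BITS_PER_SAMPLE - 1)


def flip_error_db(bit: int) -> float:
    """Size of the error from flipping one bit (0 = LSB), in dB relative to full scale."""
    return 20 * math.log10((1 << bit) / full_scale)


print(numbers_per_second)
for bit in (0, 8, 15):
    print(f"bit {bit:2d}: {flip_error_db(bit):6.1f} dBFS")
```

A flip in one of the lowest bits is a tiny error roughly 90 dB below full scale. A flip in the top bit is a full scale error, but it still affects only one of the 88,200 numbers in that second.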

About 18 months ago I swapped my N7700PRO for a different machine running the FreeNAS OS. This new machine has 16 GB of ECC RAM and uses the more modern ZFS file system. This file system is supposed to guard against bit rot, using a process called "scrubbing". Scrubbing is a system process that runs about once a month to detect and correct any bit errors it discovers.

Under two years is still too early for me to state unequivocally that my problem with bit rot is now completely resolved. I will continue to check the scrubbing output, as well as monitoring for corruption of any newly created files on the new NAS.

Bit rot is a very rare issue to begin with, which adds to the difficulty of detection and diagnosis. It is reassuring to know that scrubbing did detect (and correct) a single bit error once. However it is still way too early for me to ascertain just how reliable scrubbing (and its error correcting mechanism) is. In the worst case scenario, bit correcting may apparently corrupt my entire file system!


Bit Rot Detection

The Snakeoil Squad has created a quick Linux script that can test the integrity of your FLAC music library. The script is packed as a Snakeoil module and you can find this in the Snakeoil Technical Support Forums. Refer to the Snakeoil Technical Reference on how to install a Snakeoil module.

Once installed, you need to SSH into your Snakeoil machine and run the script like so:

cd /media/music
./testflac.sh

You can see a sample of the output here; corrupted files will be listed with the prefix of Error. Files that pass the test will not be shown, so please be patient and wait for the process to finish. Ideally nothing at all pops up on the screen. At the end of the run the total number of FLAC files tested will be shown, along with the number of corrupted files detected. Like so:

You'll also get another file called testflac.log if there are any corrupted files. This file is located in your home directory and it contains a list of all the files that failed the integrity test. For example:

#cat ~/testflac.log
./Corrupted.flac
#

Bit rot should be a very rare occurrence. There should be no more than 1 file per 10,000 on average. If your total number of corrupted files exceeds this ratio, please leave me a comment because I'd be interested to know your storage details.

Note that this script only works with FLAC, and it expects to have write permission to the /media/music folder. Expect some bugs as this script may need more fine-tuning to suit operating environments that are different to my setup.

As mentioned previously this script cannot test WAV files as the format does not support integrity verification. For other formats that do, please modify the script as you see fit, and send me patches so I can update the module for every Snakeoil user.

Once you've verified the integrity of your existing files, replacing any that were corrupted, you may want to consider moving your files to a more secure file system. According to the Internet, you need to use a modern file system with built in checksum support, e.g. ZFS or Btrfs.

I'm not entirely certain how true this claim actually is as I'm mid-way into this myself (I'm using FreeNAS with ZFS). It will be a few years before I can say for certain bit rot is a thing of the past. Alternatively, you can perform regular backups and verify the backed up copy against the working copy.
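Verifying a backup against the working copy can be sketched in Python with the standard hashlib module. This is one possible approach under simple assumptions (both copies are plain directory trees with the same layout); the function names and the choice of SHA-256 are mine, not part of the Snakeoil module.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files do not fill up memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(working, backup) -> list[Path]:
    """Return relative paths that are missing from the backup or differ from it."""
    working, backup = Path(working), Path(backup)
    mismatched = []
    for source in sorted(p for p in working.rglob("*") if p.is_file()):
        relative = source.relative_to(working)
        copy = backup / relative
        if not copy.is_file() or file_digest(source) != file_digest(copy):
            mismatched.append(relative)
    return mismatched
```

A mismatch only tells you the two copies differ, not which one has rotted. For FLAC files the integrity test can settle that; for other formats a third copy or a checksum recorded at the time of the original transfer is needed.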

Disclaimer

This test script scans your file system and tests the integrity of all the FLAC files it can find. This is a non-invasive test. Having said that, the increased I/O may cause a HDD/SSD on its last legs to die. The Snakeoil Squad is not responsible for any bad thing that happens after you run this script. If in doubt, maintain a regular backup schedule before embarking on this, or simply do not execute this.

Conclusion

Do not be alarmed about the damage bit rot can do.

The two examples in this article are the exceptions: I know what I'm doing, so I deliberately picked situations where a bit flip can be detected. Any corrupted files you may have discovered in your music library are likely to have tens of bits flipped, something that takes years to happen.

Most RAID setups are not designed to detect and fix bit rot. For performance reasons some NAS do not even look at the parity bits when doing a READ operation. Where possible, set up your volumes to use Btrfs or ZFS. In the case of ZFS, make sure scrubbing is done on a regular schedule (about once a month).

Prevention is better than cure. These words are especially good advice since there is no real cure for bit rot - once a file is corrupted the only way to recover is to restore from a backup. You can always re-rip your music. However, for other files like family photos and tax documents you have to rely on backups.

The best strategy in my opinion to combat bit rot is to detect and catch the problem early before it gets too bad. Do not get overly concerned about this issue as a single bit flip will not be a problem in most cases. The law of averages is on your side here.

If you are interested in articles like these please check out the rest of the Snakeoil Tweaks.

Comments

Frank Collins (not verified) Mon, 02/06/2017 - 04:53

I don’t believe bit rot is a problem and the Red Book standard copes with missing bits, to a point.

I store all of my music (FLAC, level 8 compression) on a NAS, using FreeNAS. It does a four-weekly scrub which detects and corrects bit rot on the whole file system. While bit rot isn’t a problem for me, it never will be.

agent_kith Mon, 02/20/2017 - 07:47

Maybe it’s Cosmic Rays: https://science.slashdot.org/story/17/02/19/2330251/serious-computer-glitches-can-be-caused-by-cosmic-rays

A "single-event upset" was also blamed for an electronic voting error in Schaerbeek, Belgium, back in 2003. A bit flip in the electronic voting machine added 4,096 extra votes to one candidate. The issue was noticed only because the machine gave the candidate more votes than were possible. "This is a really big problem, but it is mostly invisible to the public," said Bharat Bhuva. Bhuva is a member of Vanderbilt University’s Radiation Effects Research Group, established in 1987 to study the effects of radiation on electronic systems.
