memtest86+: Checking Your RAM For Errors

Bits are bits - and your computer playback will always be perfect. Right?
Do not answer that - the above is a rhetorical question.
No matter how good your computers are, once in a while you may find your computer running fine one minute and crashing the next.
The more rarely this happens, the harder it is to troubleshoot the cause.
There are many reasons why a system may crash this way. More often than not, the main cause is simply badly written software. The less usual suspects are leaky motherboard capacitors, a poor quality 'Yumcha' power supply, and issues like overheating.
However, there is one component that is often overlooked, and this component can cause unpredictable system or software failure. This component is the main focus of this article, and you probably already know what it is:
Bad RAM!
So what is RAM? It is short for Random Access Memory.
You can picture RAM as the link between your CPU and your storage (SSD, HDD, etc).
Anything the CPU needs is retrieved from storage, stored in RAM, and called upon by the CPU as required.
Very generally, the following are stored in RAM:
- Application
Otherwise known as code. This is the software you are running (e.g. MPD, Squeezelite, LMS). In Windows, code is stored in a file that we refer to as an exe (short for executable). It is the same in Linux, but we call them binaries. Libraries are code that can be reused by multiple applications. In general, an application will require a multitude of these shared libraries to execute.
- Data
Software is not very useful running by itself.
Most software will load or generate data and manipulate it in some way to make itself useful. For example, if Microsoft Excel is the application, the data is the contents of the spreadsheet. And if MPD is the application, the FLAC music files will be the data.
Data can be allocated in memory in one of two ways - heap or stack. And data can be categorised into many types (constants, buffers, variables etc). Explaining the technical differences of all this is beyond the scope of this blog. In the context of this article, it is sufficient to see them all as just data.
One of the roles of the Operating System is to divide all usable system RAM into individual units called 'pages'. These pages are then assigned to each running process as needed.
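To make the idea of pages a little more concrete, here is a minimal Python sketch that asks the operating system for its page size and works out how many pages a given amount of memory would occupy. The helper name `pages_needed` is made up for this illustration, and the page size you see depends on your platform (4096 bytes is common on x86, but do not rely on it).

```python
import mmap

# Page size the OS uses on this machine, in bytes (commonly 4096 on x86).
PAGE_SIZE = mmap.PAGESIZE


def pages_needed(num_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Return how many whole pages are needed to hold num_bytes.

    Hypothetical helper for illustration only.
    """
    # Round up: a partly used page still occupies a whole page.
    return (num_bytes + page_size - 1) // page_size


if __name__ == "__main__":
    one_gib = 1024 ** 3
    print(f"Page size: {PAGE_SIZE} bytes")
    print(f"Pages in 1 GiB of RAM: {pages_needed(one_gib)}")
```

With a 4096 byte page size, 1 GiB works out to 262144 pages, which gives a feel for how finely the Operating System slices up your RAM.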
Memory allocated to each process roughly looks like so:
The application (#1) is stored in the blue section. Anything stored in this band can be executed by the CPU. Everything else is data.
Imagine if there is a problem (read or write errors) in the blue box. The CPU is very likely to generate an invalid instruction fault. This fault will cause your software or Operating System to crash or reboot.
In Windows you'd see the Blue Screen Of Death:
In Linux you will get a kernel oops.
Linux is slightly different from Windows because an oops can play out in two ways. In the first, it escalates to a kernel panic, which behaves like Windows and halts the entire system, requiring a reboot. In the second, the kernel kills only the offending process and tries to carry on, leaving everything else to tick along as if nothing has gone wrong (although the system may no longer be fully reliable).
Again, let me stress that 99% of the time these crashes are caused by bad software and not bad RAM.
Previously we've seen what can happen if the bad part of the RAM falls in the Instructions area.
How the system behaves when it comes to data can be pretty unpredictable. This is because data is categorised (or split) into multiple types - Literals, Static Data, Dynamic Heap and Stack. And the behaviour of bad RAM differs depending on which of these it falls in.
Understanding what each of these data types is, and knowing how Operating Systems handle them, is key to knowing how bad RAM affects the system. Unfortunately, explaining these types in detail would explode this article into many, many pages. After all, the purpose of this article is to check whether your system has bad RAM or not.
I am over-simplifying the consequences by a huge extent. Problems you can expect when bad RAM falls in the data regions are:
- Blue screen of death or kernel oops (OS crash)
- Corrupted files
- Software crashes
- Random freezes or reboots
- Corrupted video/audio
Frustratingly, in some cases, your computer may even function fine without missing a beat!
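Why can the same fault be harmless one moment and destructive the next? A rough way to see it is to flip a single bit in a piece of data and look at what changes. The sketch below is only a simulation: `flip_bit` is a made-up helper, and the 16-bit sample value of 1000 is an arbitrary example, not something taken from a real failure.

```python
import struct


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted.

    Hypothetical helper: bit 0 is the lowest bit of the first byte.
    """
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


# One 16-bit little-endian audio sample with the (arbitrary) value 1000.
sample = struct.pack("<h", 1000)

# Flip the lowest bit: the value moves by 1, which nobody would notice.
low = struct.unpack("<h", flip_bit(sample, 0))[0]

# Flip bit 14 (one of the highest bits): the value jumps by 16384.
high = struct.unpack("<h", flip_bit(sample, 14))[0]

print(low)   # 1001
print(high)  # 17384
```

The same one-bit fault gives a tiny error in one position and a large one in another. If the affected byte happens to hold a file offset or a pointer rather than an audio sample, the outcome could just as easily be a corrupted file or a crash.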
Do problems like these affect audio quality? Like bit rot discussed earlier, it is unlikely to. Having said that, why use a faulty component or allow things to fail when you know there is a mitigation strategy?
Memory testing utilities that can check for RAM issues are readily available.
One of the more popular tools is memtest86+.
From Wikipedia:
MemTest86 and Memtest86+ are memory test software programs designed to test and stress test an x86 architecture computer's random access memory (RAM) for errors, by writing test patterns to most memory addresses, reading back the data, and comparing for errors.[1] Each tries to verify that the RAM will accept and correctly retain arbitrary patterns of data written to it, that there are no errors where different bits of memory interact, and that there are no conflicts between memory addresses.
In other words - in plain English, memtest86+ will run through your RAM and tell you if it is OK, or not.
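The write, read back and compare idea from the quote above can be sketched in a few lines of Python. This is only a toy model and not how memtest86+ is actually implemented: the real tool runs without an Operating System and works on physical memory, whereas this sketch only touches a small buffer the OS hands to Python. The names `pattern_test` and `stuck_bit`, the choice of patterns, and the simulated fault are all assumptions made for illustration.

```python
from typing import Callable, List, Optional

# All zeros, all ones, and two alternating-bit patterns (illustrative choice).
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)


def pattern_test(
    buf: bytearray,
    fault: Optional[Callable[[bytearray], None]] = None,
) -> List[int]:
    """Write each pattern to every byte, read it back, return bad offsets."""
    bad = set()
    for pattern in PATTERNS:
        # Write phase: fill the whole buffer with the pattern.
        for i in range(len(buf)):
            buf[i] = pattern
        # Optionally simulate hardware misbehaving between write and read.
        if fault is not None:
            fault(buf)
        # Read phase: any byte that differs from the pattern is an error.
        for i in range(len(buf)):
            if buf[i] != pattern:
                bad.add(i)
    return sorted(bad)


def stuck_bit(offset: int, bit: int) -> Callable[[bytearray], None]:
    """Build a simulated fault: one bit at offset always reads back as 1."""
    def apply(buf: bytearray) -> None:
        buf[offset] |= 1 << bit
    return apply


healthy = pattern_test(bytearray(1024))
faulty = pattern_test(bytearray(1024), fault=stuck_bit(offset=300, bit=2))
print(healthy)  # []
print(faulty)   # [300]
```

Note that the stuck bit goes unnoticed with the 0xFF and 0x55 patterns, because those already have that bit set. This is one reason a memory tester cycles through several different patterns instead of just one.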
To be safe rather than sorry, it is good practice to test your RAM to ensure it is working as designed. Always run this test when you first build (or purchase) your computer, and re-run it once every year or two to verify that your computer is still working as expected.
Now that you have some background on what bad RAM is and what memtest86+ does, we'll move on to the important bit: how to run memtest86+.
memtest86+ is really an Operating System unto itself. As such, you'll need to start it from the Snakeoil OS ISO. Please refer to the manual if you haven't created a Snakeoil OS yet. Here are some prerequisites for memtest86+:
- You need to connect a keyboard and monitor to the test PC
- The test PC must support legacy BIOS mode (to run 16 bit code)
This is the boot menu you should see when you first boot up the Snakeoil ISO:
To start memtest86+, move the cursor down with the arrow keys until the entry Memory Test is highlighted. Press the ENTER key to begin.
If your computer supports legacy BIOS, the next screen should be this.
On the next page we'll show you a short video of how to invoke memtest86+ and run through the battery of tests, and we'll end the article by describing the important bits.
Here's a quick clip of how to start memtest86+, and how it works.
Read on to know what this test is all about.
The first thing to check is the area highlighted in the red box. This tells you the amount of RAM your system has (1 GB in the example) and the throughput (40744 MB/s). This is a good way to see if the memory DIMMs in your new computer are slotted in properly. For example, if you have 2x 8 GB sticks of RAM but one of them is not slotted in properly, you'd only see 8 GB of memory in total, and your throughput will be half of what it should be.
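The sanity check on the reported memory size is simple arithmetic, sketched below. The helper name `missing_ram_gb` and the two 4 GB sticks are hypothetical, chosen only to illustrate the comparison. On some systems the reported figure can be a little lower than the nominal total, so treat small differences with care.

```python
from typing import Sequence


def missing_ram_gb(installed_sticks_gb: Sequence[int], detected_gb: int) -> int:
    """Return how many GB of installed RAM the memory test did not report.

    Hypothetical helper for illustration only.
    """
    return sum(installed_sticks_gb) - detected_gb


# Hypothetical build: two 4 GB sticks, but the test screen reports only 4 GB.
print(missing_ram_gb([4, 4], detected_gb=4))  # 4 -> one stick is probably not seated
print(missing_ram_gb([4, 4], detected_gb=8))  # 0 -> everything was detected
```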
In the top left corner (highlighted in magenta), memtest86+ shows the test that is currently running. RAM faults have different ways of showing up, so memtest86+ brute-forces its way through every test across all your memory to make sure everything is working correctly. In truth this section is only relevant to the geeks, but some of you may find this screen strangely hypnotic.
The most important bits are really the Pass/Errors columns (sky blue box). For all intents and purposes, the Pass count needs to be greater than 0 and the Errors count must always be 0. My personal preference is to run through all the tests 3 times to make sure everything passes with flying colours. To do this I just start memtest86+, leave it running overnight and check the screen again the following morning.
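The rule of thumb above can be written down as a tiny decision function. This is just a sketch of the reasoning, not part of memtest86+; the name `ram_verdict` and its return strings are invented here, and the default of 3 passes reflects the personal preference mentioned above rather than any official threshold.

```python
def ram_verdict(passes: int, errors: int, min_passes: int = 3) -> str:
    """Interpret the Pass and Errors counters from the memtest86+ screen.

    Hypothetical helper: any error fails the RAM outright, otherwise we
    wait until enough complete passes have finished.
    """
    if errors > 0:
        return "FAIL"
    if passes >= min_passes:
        return "OK"
    return "KEEP TESTING"


print(ram_verdict(passes=0, errors=0))  # KEEP TESTING
print(ram_verdict(passes=3, errors=0))  # OK
print(ram_verdict(passes=5, errors=1))  # FAIL
```

A single error is enough to fail the RAM, no matter how many passes have completed.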
To finish the test, either power down the computer, or press the ESC key on the keyboard. And this is how you can test your system for defects. Now knowing your system is fully capable of decoding the bits, you should rest your feet and enjoy the bliss of music played back on your Snakeoil PC!
We'd like to hear from you! Please feel free to comment below or better yet, submit your results in the forum poll and let us know if your PC is fine, or suffering issues.