Recovery of data from half of RAID0

(07/17/22)

Case

There was a time, before fast NVMe SSDs, when a typical home computer used a 3.5-inch SATA HDD spinning at 7200 RPM. With that technology, such drives could reach up to 160MB/s for read/write operations. That was also the case for my home PC, until at some point I decided to use RAID 0. Of course, I traded safety for performance, because I wasn't able to connect more than two drives and one DVD recorder to my motherboard. I used this setup for several years, until the time came to upgrade my PC. I cleared one of the RAID 0 drives using dd, then decided to run photorec on the second. Plenty of files appeared during recovery...

Theory

RAID stands for Redundant Array of Independent Disks. It allows two or more hard drives to work together as one logical unit. In most cases, RAID is set up to improve performance and/or data redundancy. Consider RAID 0 with two drives, A and B. In this case, one part of a file is stored on disk A and another part on disk B. The amount of data written to one drive before moving on to the next is called the stripe size, and it is chosen during configuration. Depending on the platform, the stripe size can vary from several kilobytes up to several megabytes. Large files will therefore definitely be spread across both drives, and reading or writing a large file on RAID 0 is faster because two disks share the work. What about files smaller than the stripe size? They will usually sit entirely on one of the drives, unless they happen to cross a stripe boundary.

Moreover, even though files are split between the two drives, if a file is plain text there is a chance that part of it can be retrieved from only one drive... and it can, as will be shown later.
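
The round-robin layout described above can be sketched with a little shell arithmetic. This is only an illustration under stated assumptions: the function name stripe_location is made up for this example, the array is assumed to be plain RAID 0 with no metadata area at the start of the disks, and the 64 KiB stripe size used below is an example value, not the one from my old setup.

```shell
#!/bin/sh
# Map a logical byte offset in a striped array to "disk_index offset_on_disk".
# Assumption: plain RAID 0, stripes handed out round-robin, no metadata header.
stripe_location() {
    offset=$1        # logical byte offset within the array
    stripe_size=$2   # stripe size in bytes
    disks=$3         # number of member disks

    stripe_no=$(( offset / stripe_size ))   # index of the stripe across the whole array
    disk=$(( stripe_no % disks ))           # stripes alternate between the disks
    row=$(( stripe_no / disks ))            # full rows of stripes that come before this one
    disk_offset=$(( row * stripe_size + offset % stripe_size ))

    echo "$disk $disk_offset"
}
```

For example, stripe_location 65536 65536 2 reports disk 1, offset 0: the second 64 KiB of the array is the first stripe of the second disk. With two disks, every other stripe of a large file is missing when only one drive is left, which is why recovered files come back as fragments.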

Research

Left with one drive from the former RAID 0, I decided to run photorec. It is data recovery software that scans a given drive for lost files such as photos, videos, text files and many others¹. PhotoRec is quite simple to use: in a few steps, the user chooses which drive to scan, which partition (or the whole disk), and where to copy the retrieved files. Depending on the size of the drive, scanning the entire drive can take up to several hours.

Screenshot 1: Menu of photorec.

Screenshot 2: Selecting filesystem.

After the entire drive had been scanned, about 75000 text files were recovered. Because of how photorec works, most of them are fragments of original files, and not necessarily of text files. Looking through a random file, I discovered that it was part of an e-mail: I had been using Thunderbird.

Screenshot 3: During recovery process.

Thunderbird storage method

Thunderbird uses a variation of the mbox format to store data. In that format, all messages from a specific folder on the e-mail server are stored in one plain text file. Every single e-mail can be read from this file: each one starts with the From keyword and contains all of its information, including date and time, sender and recipient addresses, any HTML formatting and even attachments, which in this case are encoded using base64 and stored in the same mbox file. Depending on how many messages a folder contains, these mbox files may reach sizes of up to 4GB².
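
Because each message begins with a line starting with From followed by a space, a rough count of the messages that begin inside a file can be obtained with grep. The sketch below is a minimal illustration, not part of my original workflow: the function name count_mbox_messages is made up, and it assumes that body lines starting with From are escaped (commonly written as >From), which is the usual mbox convention.

```shell
#!/bin/sh
# Count the message separator lines ("From " at the start of a line) in one file.
# Assumption: body lines that begin with "From " are escaped as ">From ",
# so they are not counted as separators.
count_mbox_messages() {
    # grep -c prints the number of matching lines (0 when nothing matches)
    grep -c '^From ' "$1"
}
```

On a recovered fragment this counts only the messages whose first line lies inside the fragment. A fragment that begins in the middle of a message still contains that message's remaining text, just without its separator line.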

Retrieving information from recovered files

The retrieved files were not entire mbox files, only parts of them that fit within one stripe of data. Using grep and a regex for the e-mail format, I tried to find e-mail addresses in all retrieved txt files. I ran the following command on all files.

grep -E -o "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" * > aaaMaile.txt

Having all occurrences of e-mail addresses, I found that there were at least 63142 addresses, though with many repetitions.

Screenshot 4: All recovered e-mails with duplicates.

To find only the unique values, I used the following command

sort -u aaaMaile.txt > 1.txt

and received 5216 unique e-mail addresses! I believe I have recreated more than just the address book from Thunderbird. I think that some of these addresses could have been stored in HTML files from the web browser's cache.

Screenshot 5: All sorted, unique addresses recovered from drive.
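
The two steps can also be combined into one pipeline. The sketch below is a variation written for illustration, not the exact commands I ran: the function name extract_unique_emails is made up, and the -r, -h and -I options as well as \b assume GNU grep. When grep is given several files it prefixes each match with the file name, so -h is used here to print the bare address; the addresses are also lowercased so that the same address in different letter case is counted once.

```shell
#!/bin/sh
# Print the unique e-mail addresses found in all files under a directory.
# Assumptions: GNU grep; $1 is the directory holding the recovered files.
extract_unique_emails() {
    # -r recurse into the directory, -h omit file names, -o print only the match,
    # -E extended regex, -I skip files that look binary
    grep -rhoEI '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | sort -u
}
```

It could be called as extract_unique_emails ./recovered > unique.txt, where ./recovered is a hypothetical directory with the files saved by photorec. The output file is best kept outside that directory so that it is not scanned on a later run.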

Conclusion

Simply by asking the question "What would happen if..." I found out that it is possible to recover data from part of a RAID 0 array.

How to prevent such data recovery, or keep such a scenario from happening?

After disassembling any RAID, erase the data from all of the drives.
Encrypting the data on drives always improves security.
Improve the physical security of drives: most modern NAS devices support hot-swapping drives. If a threat actor has access to the storage, they can remove a drive, put in a new one, and the NAS will simply rebuild the data onto the new drive using its built-in mechanisms.
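
For the first point, it can be worth checking that a drive really is empty after wiping it. The sketch below is one possible check, assuming the drive was overwritten with zeros (for example with dd reading from /dev/zero); the function name is_all_zero is made up for this example, and head -c is assumed to be available, as it is in GNU coreutils.

```shell
#!/bin/sh
# Succeed only if the given file or block device contains nothing but zero bytes.
# Assumption: the drive was wiped with zeros, not with random data.
is_all_zero() {
    # Delete every NUL byte from the stream; if even one byte is left over,
    # head passes it on and wc counts it.
    remaining=$(tr -d '\000' < "$1" | head -c 1 | wc -c)
    [ "$remaining" -eq 0 ]
}
```

Run against a device (a placeholder such as /dev/sdX, with read permission), it reads until the first non-zero byte or the end of the drive, so on a properly wiped disk it takes about as long as one full read.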

All that from buying a new PC :-)