In Windows malware analysis, we as analysts usually employ several approaches. The key is to understand what the malware is doing and to classify it by the behavior or artifacts that match an existing malware family or attacker toolset. For malicious file detection, vendors write detection logic to address known malware campaigns; when a new set of malware appears in the wild and that logic still matches, the new files are detected as well. Malware authors keep pushing new evasion techniques and try to propagate further. In this article we are not going to look at evasion techniques. Instead, we are going to look at one of the older detection methods, a generic signature used by many AV engines to detect malware, referred to as sectional MD5.
MD5 is a popular hashing algorithm used to check the integrity of a message or a file. For example, when one party sends a message and another receives it, the message can be converted into a hash. Consider a server that stores a file and a client that downloads it. The server has already published the hash of the file, in our case computed with MD5. After the client downloads the file, it can calculate the hash of its copy and check whether the two hashes, and therefore the two files, are the same. Hashing algorithms are one-way: one can generate the hash of a message or a file, but cannot reverse the hash back to the original. Here the hash serves purely as an integrity check. SHA-1 and SHA-256 are other popular hash algorithms. Note that MD5 is prone to collisions; we can talk about collisions in another post.
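The download check described above can be sketched in a few lines of Python using the standard hashlib module. The function names md5_hex and is_intact are our own, chosen only for illustration; this is a minimal sketch that assumes the file content fits in memory.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


def is_intact(downloaded: bytes, published_md5: str) -> bool:
    """Compare the hash of the downloaded bytes with the hash the server published."""
    return md5_hex(downloaded) == published_md5.strip().lower()
```

The client would call is_intact with the bytes it downloaded and the hash string shared by the server. A single changed byte in the download is enough to make the check fail.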
Every PE file has sections, and with PE tools such as PE-bear, FileAlyzer, or PEStudio we can determine the hash of each section in a file. When a malware researcher is given the task of creating a generic signature, the researcher can compare the samples and may find that one section contains the malicious code and has the same hash in all of the given samples. The researcher can then write the logic: if any section of a file has this particular MD5, the file is detected as malware. This technique is called sectional MD5. A common question follows: if a single byte changes in that section, the sectional MD5 no longer matches because a completely new hash is generated, and the malware easily escapes detection. In fact, most of the time the sections do not share a hash across samples even though the samples have the same behavior and code; the difference may be a single changed byte or a change in the assembly logic. How do we handle that scenario?
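To make the whole-section idea concrete, here is a minimal sketch in Python that walks the PE section table using only the standard library and hashes the raw on-disk bytes of each section. The names section_md5s and detect, and the known_bad set of hashes, are hypothetical and are not taken from any AV engine; the sketch also does no validation beyond the MZ and PE signatures.

```python
import hashlib
import struct


def section_md5s(pe: bytes) -> list:
    """Return a list of (section name, MD5 of the section's raw bytes on disk)."""
    if pe[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)
    if pe[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The 20-byte COFF file header follows the signature.
    coff = e_lfanew + 4
    _machine, n_sections, _ts, _symptr, _nsyms, opt_size, _chars = \
        struct.unpack_from("<HHIIIHH", pe, coff)
    # The section table starts right after the optional header.
    table = coff + 20 + opt_size
    result = []
    for i in range(n_sections):
        # Each section header is 40 bytes; only the first 24 are needed here.
        name, _vsize, _vaddr, raw_size, raw_ptr = \
            struct.unpack_from("<8sIIII", pe, table + 40 * i)
        label = name.rstrip(b"\0").decode("latin-1")
        raw = pe[raw_ptr:raw_ptr + raw_size]
        result.append((label, hashlib.md5(raw).hexdigest()))
    return result


def detect(pe: bytes, known_bad: set) -> list:
    """Return the names of sections whose MD5 appears in known_bad."""
    return [name for name, digest in section_md5s(pe) if digest in known_bad]
```

Given the bytes of a sample, detect returns the matching section names, or an empty list when nothing matches. Because the hash covers the entire section, one changed byte anywhere in it produces a different MD5 and the signature stops matching.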
While debugging, we may spot a malicious call and find the same call, with the same bytes, in the other files. We collect those bytes, locate them in the file on disk, and calculate the hash of just those bytes, which can then be supplied as the sectional MD5. In the previous case we hashed the whole section. In this case the sectional MD5 is created only for the suspicious or malicious subroutine found in the file.
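The narrower variant can be sketched as follows in Python. We assume here that the signature is stored as a file offset, a length, and the MD5 of that byte range; real engines may anchor the range differently (for example relative to a section start or the entry point), so treat the layout and the names make_region_signature and region_matches as hypothetical.

```python
import hashlib


def make_region_signature(sample: bytes, routine: bytes) -> tuple:
    """Locate the routine bytes in a sample and return (offset, length, md5)."""
    offset = sample.find(routine)
    if offset < 0:
        raise ValueError("routine bytes not found in sample")
    return offset, len(routine), hashlib.md5(routine).hexdigest()


def region_matches(data: bytes, signature: tuple) -> bool:
    """Hash the same byte range in another file and compare it with the signature."""
    offset, length, digest = signature
    region = data[offset:offset + length]
    # A file too short to contain the full range cannot match.
    return len(region) == length and hashlib.md5(region).hexdigest() == digest
```

Because only the bytes of the routine are hashed, changes elsewhere in the section no longer break the signature. A change inside the routine itself, or a shift in its offset, still does under this assumed layout.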
Many advanced techniques for writing detections are used in practice, but sectional MD5 is less well known now, and many people do not even know whether their engine has this capability. In future posts we will cover similar detection-writing techniques and other malware analysis topics.
Kindly note: this post was written by a human, not generated by AI, so please share it widely and help us keep writing. Your support is needed. Our focus is to create high-quality articles in the malware analysis field without using any AI.
Post by