Ransomware: research strikes back

All it takes is one click for your computer to be taken hostage by a malware program that will relinquish control only for a ransom. Upon the launch of Cybermallix, a CNRS joint laboratory with the software publisher Wallix, Jean-Yves Marion, director of the LORIA, explains how these attacks work, and how AI and formal methods provide protection.

You specialise in the fight against ransomware using formal methods, artificial intelligence, and collaboration with start-ups and industry. Can you explain what these programs are, and how they function?
Jean-Yves Marion: The principle of ransomware is relatively simple. Malware compromises a computer or any other computer system, such as a tablet or smartphone. It then blocks or encrypts part of it, and returns access only after being paid a ransom, generally in the form of cryptocurrency. The attacker can also retrieve data for the purpose of blackmail or resale. In any event, law enforcement recommends never paying, and instead visiting the dedicated official site.

These attacks can take place on a large scale, as was recently the case in the US, where a pipeline network was completely paralysed. Almost all of them follow roughly the same pattern in three phases. The first consists of entering the system. The intrusion generally occurs through phishing, a social engineering technique that prompts the victim to open an email containing a link or document that will execute a script. This script, known as a loader, in turn retrieves, unbeknownst to the user, malicious code that installs the executable file at the heart of the ransomware. A malware program such as WannaCry can also exploit a vulnerability, for example a bug or an error, in order to infect a system.

Discreet real-time observation of attacks on the Internet. Partnership between the High Security Laboratory (LHS, France) and the National Institute of Information and Communications Technology (NICT, Japan).

The second phase then begins. The installed malware can try to update itself, check whether the host machine is already infected, or seek to communicate with the attacker via an encrypted protocol. These attacks typically occur within a network. A compromised machine, known as a bot, will connect to a larger network, the botnet. All of this is controlled using command and control software. Ransomware is part of a fairly structured underground economy that includes multiple specialised tools, which generally do not come from the same source. Some of them compromise the system and conceal the malware’s presence, while others develop the elements central to the attack.

The third phase is that of exploitation. Data can be stolen and exfiltrated, or it can be encrypted and held for ransom in the case of ransomware. What’s more, one piece of malware can subsequently be replaced by another, which will in turn infect the host and exploit its position. This is the case with Emotet, some versions of which have downloaded Trickbot, and then the Ryuk ransomware.

Can a single click paralyse a computer immediately? How does it actually happen?
J.-Y. M.:
A certain amount of time may pass between unknowingly downloading the first loader and the final attack. Speed is not necessary, as long as the infiltration goes unnoticed. This is known as a backdoor, which remains open without the user knowing that their machine has been infected. It can compromise systems of all kinds. Since devices are increasingly connected, a phenomenon that will only be exacerbated with 5G, the infection can affect a computer as well as a television, and even a home Internet router. These attackable digital objects form a chain, in which the information available on a particular link can help reach the next one. If, for example, a smartphone is contaminated, the ransomware can extract the passwords shared with other devices. The malware will then have a broader perimeter of attack. This is why today’s protection solutions try to take the entire digital chain into account.

Diagram showing a self-modifying program, which tries to hide its intentions, for instance by unfolding in successive waves. Since its real purpose is revealed only after its deployment, it remains invisible at first glance as long as the program is not executed.

One can imagine cameras and security systems being infected in advance, with the attacker waiting for a favourable moment to take advantage of the situation. This type of attack mostly targets industry, transportation, and the energy sector. The network of compromised devices, the botnet, can also proceed with denial-of-service attacks, which flood a server with requests in order to overload it and shut it down.

What are the specific difficulties in combatting ransomware?
J.-Y. M.:
These programs incorporate obfuscation techniques, in other words protections that hide their intent. These were originally used to protect the intellectual property of commercial software. Such methods slow down analysis of a program by both humans and computers. Malware can rely on cryptography to hide its data, transform its control-flow graph, or introduce program parts that have no particular purpose. Finally, it can detect when it is being analysed, and stop functioning or self-destruct. Countermeasures are therefore needed to deceive it.

Obfuscation offers different ways of designing software, such as self-modifying code. Such programs are deployed in waves, and self-generate during execution. It thus becomes impossible to access their semantics, in other words their ultimate intent, without running them. The most sophisticated malware can include up to six hundred waves. To achieve this, the best-known method is to use a succession of packers that decode data on the fly and transform it into executable code. They are often analysed using dynamic analysis, which runs malware in a secure environment in order to record traces of its execution.
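The idea of waves can be illustrated with a deliberately harmless toy. The sketch below is only an analogy, not how any real packer works: it assumes each layer is plain compression plus base64 behind a made-up `WAVE:` marker, and the names `pack` and `unpack_waves` are hypothetical. It shows why the final content is invisible in the outer layer and appears only once every wave has been peeled off in turn.

```python
import base64
import zlib
from typing import Iterator

MARKER = b"WAVE:"  # hypothetical tag marking a layer that still has to be decoded


def pack(payload: bytes, waves: int) -> bytes:
    """Wrap `payload` in `waves` layers of compression + base64 (toy stand-in for a packer)."""
    data = payload
    for _ in range(waves):
        data = MARKER + base64.b64encode(zlib.compress(data))
    return data


def unpack_waves(blob: bytes) -> Iterator[bytes]:
    """Peel the layers one at a time, yielding what each wave reveals."""
    data = blob
    while data.startswith(MARKER):
        data = zlib.decompress(base64.b64decode(data[len(MARKER):]))
        yield data


if __name__ == "__main__":
    packed = pack(b"harmless final payload", waves=3)
    for number, revealed in enumerate(unpack_waves(packed), start=1):
        print(f"wave {number}: {len(revealed)} bytes revealed")
```

An analyst who only inspects `packed` sees the first layer; each subsequent wave exists only after the previous one has been decoded, which is what makes purely static inspection of such programs difficult.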

Software development courses teach us to separate programs from data, to consider them as two distinct elements. This distinction is however not so clear-cut in such wave-based programming. These self-modifying programs use both a reflexive mechanism, which converts data into a program, and a reification mechanism, which transforms a program into data. Another example of self-modification is program virtualisation, which involves stacking interpreters of programming languages. Each interpreter executes code written in a different programming language that is unknown to outsiders. Obfuscations are fascinating, as they are concrete in some ways but also relate to programming theory, which remains little explored.
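The two mechanisms can be seen in miniature in any language that exposes its own compiler. The following is a minimal sketch using Python's standard `compile`, `exec` and `ast` facilities (`ast.unparse` assumes Python 3.9 or later); the `double` function and the constant being rewritten are invented purely for illustration.

```python
import ast

# Reflection: a plain string (data) becomes a running function (program).
source = "def double(x):\n    return 2 * x\n"
namespace = {}
exec(compile(source, "<generated>", "exec"), namespace)
double = namespace["double"]

# Reification: the program is turned back into data (a syntax tree)
# that can be inspected and rewritten like any other data structure.
tree = ast.parse(source)
for node in ast.walk(tree):
    if isinstance(node, ast.Constant) and node.value == 2:
        node.value = 3
new_source = ast.unparse(tree)  # requires Python 3.9+

# Reflection again: the rewritten data becomes a new program.
exec(compile(new_source, "<generated-2>", "exec"), namespace)
rewritten = namespace["double"]

if __name__ == "__main__":
    print(double(4), rewritten(4))
```

The same name now refers to code that did not exist in the original text, which is the property that makes the program/data boundary so blurry for an analyst.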

There is sometimes talk of digital epidemics. What do you think of this notion?
J.-Y. M.:
Today malware is mutating, and each of these mutations can be considered a variant, to draw an analogy with the current health crisis. They are new versions of a known attack, which incorporate new protections and sometimes new functionalities. Defence systems that can easily identify the original threat can quite readily miss a variant they have never seen. Yet it is possible to create a number of effective variants, using malware that has already been identified and even dismantled.
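Why a variant slips past a defence that knows the original can be sketched with two toy detectors. This is an illustration under stated assumptions, not a description of any real product: the "samples" are harmless text strings, the exact-match detector is a SHA-256 fingerprint, and the tolerant one is a simple Jaccard similarity over byte 4-grams chosen only for brevity.

```python
import hashlib


def signature(blob: bytes) -> str:
    """Exact fingerprint: any change to the bytes gives a different value."""
    return hashlib.sha256(blob).hexdigest()


def ngrams(blob: bytes, n: int = 4) -> set:
    """Set of all overlapping n-byte windows in `blob`."""
    return {blob[i:i + n] for i in range(len(blob) - n + 1)}


def similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard similarity of the two n-gram sets, between 0.0 and 1.0."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Harmless stand-ins for a known sample and a slightly modified variant.
known = b"open file; read contents; scramble contents; write file; show note"
variant = b"open file; read contents; pause briefly; scramble contents; write file; show note"

if __name__ == "__main__":
    print("exact signature matches:", signature(known) == signature(variant))
    print("structural similarity:", round(similarity(known, variant), 2))
```

The fingerprint comparison fails as soon as a single step is inserted, while the similarity score stays high because most of the structure is unchanged.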

What solutions can researchers offer?
J.-Y. M.:
Unfortunately, the public academic world has not sufficiently grappled with these questions, although there are several approaches. First, static analysis combines formal methods and algorithmic techniques to identify a program’s semantics, which determines whether its execution can be dangerous. Reverse engineering techniques also help infer its intentions, and hence recognise ransomware and malware in general. Static analysis offers a series of promising avenues, especially by simulating the execution of a program without actually running it, and even by calculating the deployment of a program’s code waves in advance.

In this illustration, a program (in this case WannaCry) is reconstructed in the form of a graph. Sites are then detected within it, each corresponding to an elementary behaviour. The nature of the sites identified indicates that it is ransomware.
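The idea of locating behaviour "sites" in a program graph can be sketched as a small pattern search. This is a simplified illustration and not the method used in the figure: the graph, the node labels such as `read_file` and `encrypt_buffer`, and the function `find_sites` are all hypothetical, and a behaviour is assumed to be a fixed sequence of labels along a path.

```python
from typing import Dict, List, Tuple

# node id -> (label of the operation, ids of successor nodes)
Graph = Dict[int, Tuple[str, List[int]]]


def find_sites(graph: Graph, pattern: List[str]) -> List[List[int]]:
    """Return every path of nodes whose labels spell out `pattern` in order."""
    if not pattern:
        return []
    sites: List[List[int]] = []

    def extend(path: List[int]) -> None:
        if len(path) == len(pattern):
            sites.append(path)
            return
        for successor in graph[path[-1]][1]:
            if graph[successor][0] == pattern[len(path)]:
                extend(path + [successor])

    for node, (label, _) in graph.items():
        if label == pattern[0]:
            extend([node])
    return sites


# Hypothetical graph of a program that loops over files.
toy_graph: Graph = {
    0: ("list_files", [1]),
    1: ("open_file", [2]),
    2: ("read_file", [3]),
    3: ("encrypt_buffer", [4]),
    4: ("write_file", [1, 5]),
    5: ("show_message", []),
}

if __name__ == "__main__":
    print(find_sites(toy_graph, ["read_file", "encrypt_buffer", "write_file"]))
    print(find_sites(toy_graph, ["write_file", "open_file"]))
```

Each match is one site; a real system would then reason about which combination of sites is present, for instance encryption in place together with a loop over many files.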

For a number of years now, AI has provided new detection solutions. Machine learning methods can train a neural network using characteristics from malware obtained via static analysis. However, AI is not the be-all and end-all, as malware authors, who know this only too well, develop attacks specifically designed to bypass such software. The other approach involves dynamic analysis, in which the malware is executed in a closed environment known as a sandbox. This makes it possible to gather information on its behaviour, which is once again processed using static analysis methods. In practice, most researchers combine both the static and dynamic solutions.

The most advanced systems for malware analysis and detection include the three above-mentioned approaches. The Gorille software suite, developed by the Cyber-Detect start-up thanks to the technology transfer to industry of research conducted at the LORIA, is no exception. It proceeds with morphological analysis of code, and detects threats based on what it has already learned. While Gorille is not a traditional antivirus program, it can provide rapid answers in the face of new infections. 

My laboratory, the LORIA, has just signed a LabCom partnership, called Cybermallix, with the company Wallix. We propose a solution that collects all available information regarding a network, and then draws on our expertise in executable programs and network traces to identify an attack from the outset. For me this project is interesting because it involves an approach developed by researchers, and put to use by experts in this field.
