The Terrible, Terrible State of VOIP Security
>>> You can read this Long article split in 3 parts <<<
The beginning of this (sad) sad story dates back to December 2012, but ends somewhere in 2016 — so it took us almost 4 years to go from unknown to the impossible. It was a journey that involved us getting to know new people and technologies. It was actually so long that by its end new protocols were introduced in the market and some old ones deprecated. It was a journey full of frustration and constant experiments with uncommon setups. Read on to see who won the battle for encryption, and what happens when you bypass standards.
This article, despite being targeted at security professionals and network engineers, provides references and explanations in an attempt to be accessible to a broader audience.
VOIP is slowly but surely replacing traditional telephony. Nowadays, some telecoms aren't even offering conventional phone lines anymore — just VOIP channels on top of the Internet.
Businesses are also embracing the change, happily noticing that VOIP is overall a cheaper and more flexible solution. A new market is emerging: hosted virtual IP-based PBX services are being offered to simplify phone system management. IP is everywhere.
With change come new challenges. Among others: security. IP systems are susceptible to attacks and eavesdropping and, since they allow for a more distributed topology — e.g., people working from home, distributed offices, etc. — concerns for securing telephone conversations in these new systems are pretty common.
There are two main types of attacks that we are concerned about: compromised logins and eavesdropping. There are, actually, more security concerns than just these two, but the others are beyond the scope of this rant.
If an attacker compromises a PBX login, the company may end up being abused by criminals as a free-for-all telephone exchange for long-distance calls, thus losing thousands and thousands on phone bills. One might think that eavesdropping on legitimate conversations is harmless, but its implications are far-reaching, ranging from the loss of basic privacy through industrial espionage to governmental surveillance.
Having said that, we want to make sure that the technology we use is safe. And in the wake of E. Snowden's revelations, we want our communications to be secure. Most Internet-users can now distinguish between an unencrypted (unsafe) and encrypted (safer) transmission indicated by their Internet browser. Using telephony, most people just assume it is safe and private — but with VOIP-based communication this is not necessarily the case.
Thus, we embarked on a trip of securing our VOIP setup so that our customers don't have to think about their privacy and security. And on the way we discovered that the world is not ready for change. While surfing the web, if we want to switch to secure transmission, we just change "http" to "https" in the web site address and enjoy immediate effect of encryption safeguarding our actions while we work our way through the web (provided the page administrator has enabled this feature, of course). VOIP, too, has this notion of securing the transmission by just changing the protocol to a secure counterpart. However, contrary to our expectations, it just didn't work out that way, and required a lot of knowledge, persistence and patience.
VOIP is actually an umbrella term nowadays. Any program or protocol that communicates audio from one point to another over the Internet is considered VOIP. IP Telephony is a more modern term for that because it doesn't restrict the protocols to voice only. Our concern was not IP Telephony in general, but a particular incarnation of it: SIP. SIP-based telephony is probably the most widely used. There are numerous SIP servers available, both free and commercial, and the communication is relatively easy to set up.
Asterisk is, for example, the most popular free SIP-based VOIP server, with numerous installations all over the world. Not only is it free but also open source, making it a good choice for administrators familiar with open/free software. It comes in different flavours and names, with and without a GUI, and is a de-facto standard in the world of VOIP.
There are also quite a few commercial SIP-based servers available, with varying degrees of implementation discipline: some of them are very proprietary, others adhere more closely to standard features. Even though we chose 3CX for our needs (since we liked its Microsoft Windows affinity and administration-friendly concepts), it really could have been any other vendor; the experience would probably have been very similar.
When one speaks of the SIP protocol, what is actually meant is a set of different protocols that don't necessarily even run over the same connection. Let us briefly look at the whole suite in some detail.
SIP itself is one of them, and is the starting point of all communications. SIP's task is to provide signalling and other meta information, such as routing and voice channel assignment. Within SIP (Session Initiation Protocol), we can also find SDP or Session Description Protocol. SDP bears media description information — such as audio codecs negotiation information and other audio endpoint parameters, including, actually, a specification of the protocol that will be used to communicate the audio streams. Since SDP runs on top of — or within — SIP, it is often referred to as part of SIP or SIP/SDP.
As you can see from the above, SIP/SDP only specify where the audio will be running, but do not actually transmit the audio streams themselves. For that, yet another protocol is used. It could theoretically be any protocol capable of media transmission, but RTP (Real-Time Transport Protocol) is the most widely used one. This is where it becomes complicated, since RTP (or any other audio transmission protocol) runs over a different connection than SIP and asynchronously from SIP.
Thus, we have at least 3 levels here: the first one, a signalling protocol, SIP, that provides basic signalling and metadata transmission; a second one, SDP, that provides media and encoding specification; and the third, RTP, that transmits the actual encoded media.
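To make the three levels more tangible, here is a minimal Python sketch that splits a hand-written SIP request into its signalling part and its SDP body, and then reads from the SDP where the RTP audio is supposed to flow. The message, the addresses and the helper name split_layers are made up for illustration; a real INVITE carries more headers (Content-Length, for one) and real parsers are far more tolerant.

```python
# Illustrative only: a hand-written SIP INVITE with an SDP body.
# Hosts, ports and identifiers below are made up for this sketch.
SIP_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.0.2.10:5060\r\n"
    "From: <sip:alice@example.com>;tag=1\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: demo-call-1\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "o=alice 1 1 IN IP4 192.0.2.10\r\n"
    "s=-\r\n"
    "c=IN IP4 192.0.2.10\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0 8\r\n"
)


def split_layers(message: str) -> dict:
    """Separate the SIP signalling part from the SDP media description."""
    # SIP headers and the SDP body are separated by an empty line.
    head, _, body = message.partition("\r\n\r\n")
    request_line = head.split("\r\n")[0]
    sdp_lines = [line for line in body.split("\r\n") if line]

    connection = next(line for line in sdp_lines if line.startswith("c="))
    media = next(line for line in sdp_lines if line.startswith("m="))
    # An SDP media line reads: m=<media> <port> <proto> <fmt> ...
    kind, port, proto, *formats = media[2:].split()

    return {
        "sip_request": request_line,                 # level 1: signalling
        "media_kind": kind,                          # level 2: SDP description
        "media_proto": proto,
        "payload_types": formats,
        "rtp_address": connection[2:].split()[-1],   # level 3: where RTP flows
        "rtp_port": int(port),
    }


layers = split_layers(SIP_INVITE)
print(layers)
```

Note that the RTP address and port come out of the SDP body, not out of the connection that carried the SIP request. That is exactly why the audio ends up on a separate channel.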
For the sake of simplicity, we are not looking into other media protocols like video transmission or protocol extensions such as chat.
What we have described so far was communication in plain text, without encryption. Thus, in order to encrypt the parts involved we need to provide the means of encryption for all three protocols mentioned above.
Since SDP runs inside SIP, if we encrypt SIP, SDP will be encrypted along the way. Similar to secure HTTP, or HTTPS, there is a secure counterpart to SIP, or SIPS, which is the same protocol, but run within a TLS-encrypted channel. Now originally, SIP runs over UDP — but since TLS requires a stateful connection, secure SIP runs over TCP. There are both positive and negative implications to switching to TCP, but they go beyond this document. This type of SIP-encryption is often referred to as SIP/TLS, rather than SIPS.
RTP does not have a secure counterpart of that kind, but when RTP's data is to be encrypted, the encryption key for the stream will typically be specified within the stream metadata contained in SDP. And since SDP is inherently encrypted once SIP is, the encryption key for RTP, transmitted over SDP, is also transmitted in encrypted form and will then be used by RTP to encrypt the stream data itself. Encrypted RTP is generally referred to as SRTP, or Secure Real-Time Transport Protocol.
From that setup we can derive one important conclusion: In order for SIP-based communication to be secure, we must provide encryption for both SIP and RTP. By omitting either one, we open up the voice data to prying eyes — or ears, for that matter.
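The key hand-over described above can be sketched in a few lines of Python, assuming the key travels in an SDP "a=crypto" attribute in the style of SDP Security Descriptions (RFC 4568). The media section, the dummy key and the helper names parse_crypto and call_is_protected are invented for this example; they are not taken from our traces.

```python
import base64

# Hypothetical SDP media section offering SRTP with an inline key.
# The key material is a dummy value generated for this sketch.
DUMMY_KEY_AND_SALT = bytes(range(30))  # 16-byte key followed by 14-byte salt
SDP_MEDIA = (
    "m=audio 49170 RTP/SAVP 0 8\r\n"
    "a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:"
    + base64.b64encode(DUMMY_KEY_AND_SALT).decode("ascii")
    + "\r\n"
)


def parse_crypto(sdp: str):
    """Return (suite, key, salt) from the first a=crypto line, or None."""
    for line in sdp.splitlines():
        if line.startswith("a=crypto:"):
            _tag, suite, key_params = line[len("a=crypto:"):].split()[:3]
            method, _, encoded = key_params.partition(":")
            if method != "inline":
                continue
            # Lifetime/MKI fields may follow after '|'; they are ignored here.
            raw = base64.b64decode(encoded.split("|")[0])
            # The 16/14 split is specific to the AES_CM_128 suites.
            return suite, raw[:16], raw[16:]
    return None


def call_is_protected(signalling_transport: str, sdp: str) -> bool:
    """Both halves must be encrypted: SIP over TLS and media over SRTP."""
    media_line = next(line for line in sdp.splitlines() if line.startswith("m="))
    media_is_srtp = media_line.split()[2] == "RTP/SAVP"
    return (
        signalling_transport.upper() == "TLS"
        and media_is_srtp
        and parse_crypto(sdp) is not None
    )


print(parse_crypto(SDP_MEDIA)[0])
print(call_is_protected("TLS", SDP_MEDIA))
print(call_is_protected("UDP", SDP_MEDIA))
```

The last call illustrates the conclusion: if the same SDP travels over plain UDP, anyone on the path can read the inline key, and encrypting the audio alone buys nothing.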
With all that fuzzy setup in mind we embarked on the journey to provide our existing and new DaaS-customers with telephony services.
We had to provide them with robust telephony devices and a robust service as well. So basically, we had two choices to make: a scalable SIP server and a reliable vendor of IP phones.
We chose Cisco as our phone vendor, since the phones themselves were of quite good quality and relatively easily customizable. We are talking about the Cisco Small Business SPA500 series, originally Linksys products before Cisco bought them. This fact becomes important once we start implementing encryption on these phones.
For the server side our choice of vendor fell on 3CX, a European company with a sound product whose SIP implementation is close to non-proprietary. Hence, from here on we will be assuming 3CX as our server-side component.
Although 3CX provided phone provisioning to automate phone setup on the customers' side, initially that provisioning was done over HTTP without encryption — so we also wanted to make it really secure by providing an HTTPS-based provisioning and client-certificate-based authentication, thus closing the circle and leaving no chance for an attacker to eavesdrop.
We first made sure that our setup worked in general, and only then did we start to transform it so that it embraced encryption.
And this is where the actual fun began.
Further comparing an HTTPS session with a SIP/TLS session, we have a phone, which can be regarded as a browser and the SIP-server as a web server. As a web browser requests a web page over a secure connection, so does the phone, making a SIP request over a TLS-connection. An established (and in our case, authenticated) TLS session assumes that both sides have exchanged their cryptographically signed certificates and that both are mutually trusted. On a computer running your web browser, you would import any root certificate to the appropriate store to make a foreign certificate trusted. At the time of writing, this is not a huge problem with Cisco, but it was indeed at the time of implementation.
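The mutual trust described above can be sketched with Python's standard ssl module. This is only an illustration of the concept, not how 3CX, IIS or the phones are configured; the function name and its three file-path parameters are hypothetical placeholders.

```python
import ssl


def make_mutual_tls_context(server_cert=None, server_key=None, trusted_roots=None):
    """Build a server-side TLS context that insists on a client certificate.

    All three file paths are hypothetical placeholders. Nothing is loaded
    when they are left as None, so the sketch runs without any files.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Refuse clients (phones) that do not present a certificate we trust.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if server_cert:
        # The certificate the server presents to the phone.
        ctx.load_cert_chain(certfile=server_cert, keyfile=server_key)
    if trusted_roots:
        # The root certificates used to validate the phone's client certificate.
        ctx.load_verify_locations(cafile=trusted_roots)
    return ctx


ctx = make_mutual_tls_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)
```

The phone needs the mirror image of this: a list of roots it trusts for the server certificate. That list is exactly what we could not extend on the Cisco phones at the time.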
Cisco phones came with a pre-installed client certificate and a small list of root certificates that could be trusted. That meant that we needed to obtain a certificate for our provisioning server and for our SIP server, signed by Cisco. Indeed, there was a known way to do so, well described on a Cisco support site. It basically said that you would need to generate a certificate request and submit it to Cisco to be signed. It even provided an email address to send the requests to, but the submission could only be initiated by an official Cisco distributor. We contacted two distributors regarding the task and neither of them was even able to understand what we wanted from them, since they had never heard about Cisco SPA phones in the context of secure SIP.
We opened a case with Cisco partner support and, after a week or two of emailing back and forth, we were able to find the person responsible. A few days later we finally got a signed certificate and were able to install it on the web server (to ensure secure provisioning) and on the SIP server (for SIP/TLS to work). Although SIP/TLS did not work right away (more on that in part IV), things started moving.
Now, at this stage you have to remember that SIP encryption is separate from RTP encryption and we had to provide that, too. After switching on SRTP on the phones, we realized that it was not working: the phones were not even trying to establish a secure RTP session. Support forums were full of questions regarding SRTP encryption, most with no answers.
Further reading revealed that SPA phones required the encryption key pair to be generated for them and installed as part of the provisioning process. But there was no specification of the format of that data. Two obscure settings were provided for that: SRTP-key and a "mini-certificate". These were mentioned on support pages, too, with a reference to a command-line tool that would generate that data. But all links that used to point to that tool were broken. We saw a community attempt to recreate the tool, but could not find a working copy of it. Only another support request provided us with the tool that would generate the necessary data for the provisioning.
It is not entirely clear why Cisco chose that complexity in implementing SRTP, but jumping ahead, it is worth mentioning that other vendors did not require any special setup in order to be able to generate SRTP keys.
After installing the server certificates and the obscure SRTP keys we thought that the troubles would be over, but the setup just did not work. This time it was the 3CX server that did not play nice.
As SIP/TLS sessions between our Cisco phones and 3CX would not get established, we started to collect network traces. Since we were in possession of the private key of the server certificate, we were able to decrypt the otherwise encrypted communication between the phone and the server (3CX). What we saw there was that after the phone presented its protocol specification to the server during the TLS negotiation (the so-called handshake), the server would just close the connection.
We sent decrypted traces to 3CX support and got an answer, saying that our "phone is trying to talk SSL 2.0 and we do not support that old protocol". Being relatively new to the field of applied cryptography, we were inclined just to believe that 3CX blaming Cisco's choice of an "old protocol" was justifiable. So, we went on and filed another case with Cisco support, citing 3CX and the "old protocol" claim.
The answer came relatively quickly this time, referring us to the TLS RFCs (2246 and 5246), where we could read that a client wishing to support an older protocol (SSL 2.0 in our phone's case), but also able to support a newer one (TLS 1.0 in our case), may start talking the old protocol while indicating support for the new one, thus allowing the server to decide which protocol to use. The server in our case, once it registered that the client initiating an SSL 2.0 connection also supported TLS 1.0, should have answered with the new protocol, and the phone would have dropped SSL 2.0 and started communicating in TLS. But instead, the server closed the connection. That was against the RFC and we returned to 3CX with our findings.
The screenshot above depicts a part of an SSL handshake with an indication of SSL 2.0 as an initial protocol specification (underlined in red) and a supported TLS version as an alternative (underlined in blue).
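For readers who want to recognise this pattern in their own traces, the following Python sketch tells the two framings apart by looking at the first bytes a client sends. The byte layout follows the backward-compatibility appendix of RFC 5246; the function name and the two sample byte strings are made up for illustration and are not taken from our capture.

```python
VERSION_NAMES = {
    (2, 0): "SSL 2.0",
    (3, 0): "SSL 3.0",
    (3, 1): "TLS 1.0",
    (3, 2): "TLS 1.1",
    (3, 3): "TLS 1.2",
}


def describe_client_hello(data: bytes) -> str:
    """Classify the framing of a ClientHello and report the offered version."""
    # SSL 2.0 framing: two-byte length with the high bit set, then the
    # message type (1 = client hello), then the highest supported version.
    if len(data) >= 5 and data[0] & 0x80 and data[2] == 0x01:
        offered = VERSION_NAMES.get((data[3], data[4]), "unknown")
        return f"SSL 2.0 framed hello, offering up to {offered}"
    # TLS record framing: content type 0x16 (handshake), record version,
    # record length, handshake type (1), 3-byte length, client version.
    if len(data) >= 11 and data[0] == 0x16 and data[5] == 0x01:
        offered = VERSION_NAMES.get((data[9], data[10]), "unknown")
        return f"TLS record framed hello, offering up to {offered}"
    return "not a ClientHello"


# Made-up leading bytes of two hellos, both offering TLS 1.0.
v2_framed = bytes([0x80, 0x2E, 0x01, 0x03, 0x01])
tls_framed = bytes([0x16, 0x03, 0x01, 0x00, 0x2F, 0x01, 0x00, 0x00, 0x2B, 0x03, 0x01])

print(describe_client_hello(v2_framed))
print(describe_client_hello(tls_framed))
```

A server that only looks at the framing sees "SSL 2.0" and hangs up. A server that reads the version field sees that the client is perfectly willing to speak TLS 1.0.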
By now we knew that both Cisco and 3CX used OpenSSL for their implementation of TLS. We looked at how OpenSSL handles these types of requests and saw that it was a matter of linkage options to get it working. It took us some effort to persuade 3CX to investigate the issue, and it then took 3CX almost half a year, including QA, to implement the change. Thankfully, we were provided with a private patch that had the needed functionality and were able to carry on with our investigation.
It was almost a success, until suddenly 3CX, who had seemingly struck a strong partnership with Snom, told us that SPA phones should not be considered anymore, that Cisco would likely stop producing them and that we should switch to Snom phones, which had everything we needed out of the box. By that time we had already deployed SPAs in the field, so we were not immediately convinced, but still, we decided to give it a shot.
Snom is a German-based company with good quality phones, a sound roadmap, and German engineering. Their phones do look good and robust, although I personally still prefer the Cisco finishing to Snom's.
We presented our long case to Snom engineers in full detail and were told that their telephones were able to complete our task with ease and that TLS worked out of the box with their phones. We received 3 different models and started testing.
To our big surprise, the out-of-the-box functionality was nowhere to be found. As soon as we imported Cisco's root certificates into the phone the certificate chains became broken and TLS would stop working altogether. After filing a support case — now with Snom — we were told that the current firmware has a bug and so it would not be possible to use other root certificates than those already installed in the phones. New firmware was in the making, but there was no ETA for the release, so we stowed the phones for a later date.
On a side note, by this time I started realizing that most vendors — whether phone or software — often backed off once encryption was being considered and it started getting complicated. Snom phones did work fine without encryption, but so did Cisco's. But with encryption enabled, all sorts of problems began to creep in, and we could not find anyone capable of handling them.
A year or two passed, and we decided to revisit the Snom case and took the phones off the shelf. After updating to the latest firmware, we saw that only two out of three phones worked. Another case was opened. The engineers were at first perplexed, because "the hardware platform is the same across all three models", but then it suddenly turned out that the smallest model was indeed different and that the difference would even prevent it from working properly in our setup!
The phones were sent back to Snom and the TLS case was closed, though not completely resolved for us. But by now we had a surprise waiting for us from our SIP server vendor, 3CX.
Software vendors update their product from time to time. This is obviously a good practice, because bugs need to be killed and new features have to be added — the software evolves. Along the way they sometimes make architectural changes that might break loosely coupled nodes within your infrastructure.
That was our case with a new version from 3CX. The system now assumed that its web interface, TLS endpoints for SIP and everything else that had to do with SSL was governed by one and the same certificate and was to be publicly trusted if we wanted everything to work fine. This was new, since previously we had a certificate signed by Cisco and it was not publicly trusted, since Cisco's Certificate Authority was a private one and the certificates it signed were normally not for open publication.
So, we had to obtain a new certificate that was (a) publicly trusted, (b) trusted by Cisco and (c) wildcard-capable, because we needed to repurpose it for many different things within our VOIP infrastructure. At the time of our first experiments this would not have been achievable, since Cisco phones did not support any third-party certificates: everything had to be signed by Cisco. For reasons we can only speculate about (perhaps even Snowden has an idea about the purpose of such a policy), Cisco suddenly had a pleasant surprise for us in their pipeline.
It came to our attention that an option to support third-party root certificates, which had been present in earlier versions of the phone firmware but never worked, finally started working and allowed the use of third-party certificates. This was very good news in light of the changes within the 3CX system, and we plunged into a new investigation.
While we prepared and rearranged our lab, clouds descended on SHA1, one of the fundamental hash algorithms used in PKI of all sorts. No, it was not yet 2017, when the first collisions were reported, but browser vendors and certificate authorities were already preparing: it was basically no longer possible to get a SHA1-signed certificate issued, and the new algorithm to be used was SHA256. It was (and is) cryptographically much stronger, and the newest Cisco firmware allegedly even supported SHA256-signed certificates.
Now remember that the Small Business SPA product line consists of 300- and 500-series phones. And within the 500 series there is one model, the SPA-525G2, that does not share the firmware codebase with the rest of the family. There is no apparent reason for that, and although it has many more functions than its siblings, it is still a phone with many aspects in common with them. Now guess what? This was the only phone that did not work with that new cryptography! The whole line worked fine: the 300s, the 500s, but not the 525!
Time to open a new case with Cisco. The first thing we learn is that support engineers are not aware of the fact that Cisco now supports third-party certificates, and at first they just refuse to take the case, saying that we have to use only Cisco-signed certificates. We insist and the case gets escalated.
The second level is unable to reproduce the case and seems to be quite reluctant to probe any further, since they do not understand what is happening. Time goes on without any progress until we push our contact at Cisco and the case finally gets assigned to a new engineer. The engineer cannot reproduce and asks us to send him network traces.
Now in order to obtain decrypted TLS traces, you need a decryptable handshake on the wire. This is becoming quite rare, as more and more nodes negotiate the keys using Diffie-Hellman algorithms. These algorithms do not rely on pre-defined keys (otherwise obtainable from the key/certificate pair), but instead negotiate an ephemeral key that is not stored anywhere and is not accessible to our deciphering tool (Wireshark). Thus, in order to obtain the traces, we had to disable cipher suites that used Diffie-Hellman key negotiation in favour of RSA keys that we could use for decryption. Luckily, this change had no effect on the phones' behaviour, so we collected the traces from both a working phone (a 300-series) and a non-working one (the 525) and sent them both to Cisco.
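The selection rule behind that change can be sketched in Python, assuming suites are identified by their IANA-style names. The list of offered suites is an illustrative sample, not what our phones or servers actually negotiated, and the helper names are made up.

```python
# IANA-style suite names; an illustrative sample only.
OFFERED = [
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_DHE_RSA_WITH_AES_128_CBC_SHA",
    "TLS_RSA_WITH_AES_128_CBC_SHA",
    "TLS_RSA_WITH_AES_256_CBC_SHA",
]


def key_exchange(suite: str) -> str:
    """Return the key-exchange part of a TLS 1.2-style suite name."""
    # Names look like TLS_<key exchange>_WITH_<cipher>_<mac>.
    return suite[len("TLS_"):].split("_WITH_")[0]


def decryptable_with_server_key(suite: str) -> bool:
    """True when the session secret is transported under the server's RSA key.

    With DHE/ECDHE the secret is derived from ephemeral values, so holding
    the server's private key is not enough to decrypt a captured session.
    """
    return key_exchange(suite) == "RSA"


keep = [s for s in OFFERED if decryptable_with_server_key(s)]
print(keep)
```

Only the two plain RSA suites survive the filter. Needless to say, this is a lab setting for troubleshooting, because it gives up forward secrecy.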
The next day Cisco's support engineer is back, though not with findings, but with more riddles. He tells us that our server closes the connection after the phone (the 525) presents its client certificate within the handshake and that Cisco cannot be held accountable for our server's behaviour. Then he gives us a crucial piece of information: the two traces that we sent him, from a working phone and from our problem phone, show two different TLS versions: v1.0 in the first and v1.2 in the second, non-working case.
So off we go to dive into the trace and decipher individual messages. It turns out that TLS 1.2, according to the RFC, assumes the hashes in the cryptographic signatures for the presented client certificates to be at least SHA1, unless specified otherwise by the sender in a special field during the TLS handshake.
It also turns out that the Cisco phones' client certificates use MD5 for their hashes and that the phones did indeed specify this algorithm in that special field during the handshake, as stated in the RFC.
An excerpt from a network trace showing an MD5 signature for the presented client certificate.
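To read such a field by hand, it helps to know how TLS 1.2 encodes it. The sketch below assumes that the "special field" is the two-byte hash/signature pair defined in RFC 5246 (the same encoding is used in the signature_algorithms list) and decodes it in Python. The sample bytes and function names are made up and are not copied from our trace.

```python
# Identifier tables as defined for TLS 1.2 in RFC 5246.
HASHES = {0: "none", 1: "md5", 2: "sha1", 3: "sha224",
          4: "sha256", 5: "sha384", 6: "sha512"}
SIGNATURES = {0: "anonymous", 1: "rsa", 2: "dsa", 3: "ecdsa"}


def decode_pair(hash_id: int, sig_id: int) -> tuple:
    """Translate one (hash, signature) byte pair into readable names."""
    return (
        HASHES.get(hash_id, f"unknown({hash_id})"),
        SIGNATURES.get(sig_id, f"unknown({sig_id})"),
    )


def decode_algorithm_list(data: bytes) -> list:
    """Decode a length-prefixed list of (hash, signature) pairs."""
    length = int.from_bytes(data[:2], "big")
    body = data[2:2 + length]
    return [decode_pair(body[i], body[i + 1]) for i in range(0, len(body) - 1, 2)]


# Made-up example: a 6-byte list advertising SHA256, SHA1 and MD5 with RSA.
sample = bytes.fromhex("0006040102010101")
pairs = decode_algorithm_list(sample)
print(pairs)
print(("md5", "rsa") in pairs)
```

A peer that honours this field has no reason to be surprised by an MD5-based signature. A peer that ignores it and silently assumes SHA1 will see a mismatch.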
So, everything should have been fine as all the steps were carried out according to the big plan — the RFC. What was wrong? We were so close to the resolution, and yet something was still missing.
We carried out two additional tests and were able to see that SIP/TLS indeed worked fine — only the secure provisioning over HTTPS was not working. Something was obviously wrong with our web server. Now our web server was a standard IIS on a Windows Server 2012 R2, and we were not expecting any surprises.
It is important to know that IIS itself does not handle TLS, but rather outsources this functionality to another subsystem called SChannel, Microsoft's SSL/TLS engine. After some further examination we found out that SChannel has a flaw in TLS 1.2! And this flaw concerns exactly our scenario: the server, while talking TLS 1.2, ignores the said field for hash algorithm specification and, having received an MD5 hash instead of the expected SHA1, decides that this is a violation of its TLS contract and closes the connection.
The SIP flow that was not governed by SChannel — but rather by OpenSSL — worked fine, since OpenSSL most probably was RFC compliant in this issue.
Many cryptographic forums were full of questions and outcries on this subject. So we consider ourselves lucky for understanding what had happened — others were just poking around in the dark.
The same setup with TLS 1.2 disabled worked just fine (the phone and the server would negotiate TLS 1.0 in that case), and for the time being it could serve as a compromise. But in the long run the problem would need to be solved. Hopefully Microsoft patches this flaw at some point.
Even if we think MD5 is a thing of the past, it still has to be supported, if only to accommodate old devices that got their certificates issued when MD5 was still OK to use, or devices from slower vendors with older root certificates.
What have we got at the end of the road?
Let us see:
So, why on earth did we expect an asynchronous, encrypted telephone system composed of nodes from different vendors to be uncomplicated? This assumption might have been our biggest mistake: to assume and just hope that it would work.
But it was not just our bad luck: We learnt in that lengthy experiment that implementation details (and errors) are crucial. You cannot just expect two devices that promised to talk the same protocol to work flawlessly together. Manufacturers make assumptions and predictions and they do it with good intentions or to save money. But sometimes it backfires.
Had Cisco not imposed restrictions on their PKI, we would have been able to use trusted certificates from the very beginning. Had 3CX implemented the integration with OpenSSL correctly, we would have been able to use Cisco phones from the very beginning. Had Snom implemented the PKI in their phones properly, we might have switched to Snom from the very beginning. Had Cisco used the same codebase for the whole product line, we would not have lost so much time scratching our heads over why the SPA525 was the only funny phone among the rest of the family. And finally, had Microsoft implemented TLS 1.2 properly, all this would finally just work as intended, which matters all the more given that the life of TLS 1.0 is coming to an end.
And as if this was not enough, I am curious just how Cisco is going to replace the Client Certificates in the myriad of phone devices deployed around the world? All these certificates bear MD5 hashes and are signed by some (now) obscure certificate authorities that bear the names of two VOIP vendors that Cisco bought in the course of time: Linksys and Sipura. These root certificates expire in 2035. Other Cisco root certificates expire later and have SHA1 hashes — but as far as we know, SHA1 is practically broken, so the next problem is just around the corner.
If you think about the last point made, you will have to realize that this problem is actually fundamental for all devices with security features: As security evolves and features get deprecated, devices may just sit there, un-updated, and are unable to get updated, thus posing a potential threat to network environments.
It seems that complicated setups like the one described will never just work, or at least not until standards are strictly enforced and vendor self-discipline rises above today's level.
Let us hope that this day comes soon.
 Voice-Over-IP (Internet Protocol), a set of standards and protocols that enable using Internet channels for transmitting telephone calls.
 Private Branch Exchange, telephone switchboard installed locally in an office.
 E. Snowden is an NSA whistle-blower, who in 2013 revealed a large-scale governmental surveillance program.
 HTTP: Hyper-Text Transfer Protocol, the main protocol used for surfing the internet
 Secure HTTP, an encrypted version of the HTTP protocol
 Graphical User Interface, http://en.wikipedia.org/wiki/Graphical_user_interface
 It is possible to run TLS over UDP using DTLS (Datagram TLS), but DTLS is not compatible with current SIP implementations, so it is not used.
 Desktop as a Service: http://en.wikipedia.org/wiki/Desktop_virtualization
 Secure Socket Layer: a predecessor to TLS.
 Request for Comments, a publication type from the Internet Engineering Task Force, defining standards for the Internet: https://en.wikipedia.org/wiki/Request_for_Comments
 Public Key Infrastructure, a set of roles, policies, and procedures needed to create, manage, distribute, use, store, and revoke digital certificates and manage public-key encryption. (Cited from Wikipedia)
 SHA1 is a cryptographical hash function (irreversible transformation) that generates a 20-byte hash-digest.
 First successful collision attacks were reported in 2017: https://phys.org/news/2017-02-cwi-google-collision-industry-standard.html
 A replacement for SHA1 is SHA2, a family of hash functions with different bit lengths, e.g. SHA256 uses 256 bits for the digest calculation and produces a 32-byte value.
 A key-exchange protocol that does not rely on a pre-defined secret: https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange
 Internet Information Server, a standard part of Microsoft Windows Server