Here we are with another post in our technical blog series. In this series, we share our experience with Wi-Fi related issues, discuss how they affect Wi-Fi Quality of Experience, and explain how to avoid them.
Before diving into this blog post, you might want to check other technical posts in this series.
- The Playbook: An ISP’s Guide For Providing The Best In-Home Wi-Fi Service
- Coverage Issues
- 2.4 GHz Time and Congestion Issues
- Client Related Issues
Alright! Let’s skip the ads and start talking about today’s topic: The surprising effects of CPEs’ free memory on Wi-Fi performance.
Even though Wi-Fi issues related to the free memory of CPEs account for a small percentage of the issues we see in the field, we think they are worth a blog post, considering their significant effect on the Wi-Fi quality of experience.
When and how does free memory become a problem?
When we analyze different CPE models for free memory in the field, we observe many different behaviors. We are interested in the behaviors resulting in very low free memory, and hence, poor user experience.
In the field, we observe that the longer a CPE's uptime, the more likely it is to have low free memory. However, uptime is not the only cause. Fig. 1 shows three common behaviors that result in low free memory, annotated with script restart and jump events. Script restart events mark the start of data collection. Jump events mark large, abrupt changes in free memory caused by a reboot or by internal processes in the CPE. In the first case, free memory decays exponentially over a couple of days. In the second case, free memory decreases monotonically over time until it reaches a level where poor user experience starts to be observed. We group these two cases together as memory leaks. In the last case, free memory shows sudden jumps and drops. These occur when a heavy-load process runs, and free memory tends to return to its previous high level once the process completes.
Fig. 1: Three common behaviors resulting in low free memory: exponential free memory decay, slow memory leaks over time, and sudden jumps in free memory.
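To make the idea of jump events more concrete, here is a minimal sketch of how such events could be flagged in a series of free memory samples. It is only an illustration under our own assumptions, not the detection logic behind Fig. 1: the function names, the kilobyte unit, and the `min_jump_kb` threshold are all hypothetical.

```python
from typing import List, Tuple


def find_jump_events(free_kb: List[float], min_jump_kb: float) -> List[Tuple[int, float]]:
    """Return (index, delta) for each sample-to-sample change of at least min_jump_kb.

    A positive delta is a sudden gain in free memory (e.g. after a reboot),
    a negative delta is a sudden drop (e.g. a heavy-load process starting).
    The threshold is a hypothetical tuning parameter.
    """
    events = []
    for i in range(1, len(free_kb)):
        delta = free_kb[i] - free_kb[i - 1]
        if abs(delta) >= min_jump_kb:
            events.append((i, delta))
    return events


def split_at_jumps(free_kb: List[float], events: List[Tuple[int, float]]) -> List[List[float]]:
    """Split the series into segments that contain no jump event inside them."""
    segments, start = [], 0
    for index, _delta in events:
        segments.append(free_kb[start:index])
        start = index
    segments.append(free_kb[start:])
    return segments


# Made-up samples: a slow decrease, a sudden drop, then a recovery.
samples = [100, 98, 96, 94, 40, 95, 93, 91]
events = find_jump_events(samples, min_jump_kb=30)
segments = split_at_jumps(samples, events)
```

Splitting the series at jump events leaves segments in which a slow decay or leak trend can be examined without being distorted by reboots or short-lived load spikes.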
Now we know what causes low free memory, but what is “low” here? It is definitely important to understand the levels of memory usage that could affect Wi-Fi performance. To determine the “critical” free memory affecting the performance, we conducted various tests on different CPEs.
Let’s talk about the test results.
Each CPE brand shows different types of Wi-Fi performance degradation, and at different memory levels. While free memory is at a critical level, we have observed disruptions in the internet connection, DNS issues, instability in the client throughput, increased ping delays, and packet losses. Fig. 2 is an example showing the effect of low free memory on the client throughput. We observed that the client throughput diminishes and the CPU load doubles after the CPE has spent one hour in the critical free memory region. In some cases, the variance in the client throughput also increases, and disconnections occur after free memory hits the critical level. Additionally, Fig. 3 is an example of high ping delays due to low free memory. The ping delays were measured while memory on the CPE was deliberately leaked. Ping delays rose to as much as 12 seconds, causing packet loss. This behavior is also observed on other CPE brands. Another issue for some CPEs at critical memory is that connected clients can continue downloading content whose source IP address is already resolved, but cannot open a new URL. In other words, low memory sometimes causes DNS resolution failures that prevent the user from browsing new web pages.
Fig. 2: An example of low free memory causing increased CPU load and diminishing client throughput.
Fig. 3: An example of high ping delays due to low free memory.
Another interesting observation from our tests is self-rebooting, depicted in Fig. 4. When the free memory of some CPE models is at a critical level, the average CPU load suddenly jumps, and the devices reboot themselves. After the reboot, these CPEs return to safe free memory and low CPU load levels. This self-preservation mechanism results in a few minutes of service interruption for end-users, but staying in the problematic state for a long time is the worse scenario. Unfortunately, many CPE models do not have such a mechanism and remain under low free memory or high CPU conditions, trying to survive with the limited memory they have. This causes more severe problems on both the CPE and the client side.
Fig. 4: An example of a self-reboot triggered by low free memory.
Categorizing the memory problems in the population
We approach the memory problem in four categories. The first category is for the unproblematic CPEs. The second category is for the CPEs whose free memory is at the critical level, beyond which users start having a poor quality of experience. Finding this critical level may require explicit testing and a comprehensive analysis of data acquired from a population of the CPEs of interest. The third and fourth categories represent the memory leak issue at different levels of severity. When a memory leak is detected by pattern recognition algorithms, we estimate the number of days remaining until free memory reaches the critical level. CPEs estimated to hit the critical level in under two weeks are considered urgent cases and fall into the third category. CPEs with an estimate of more than two weeks are considered less urgent and fall into the fourth category.
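The four categories can be sketched in a few lines of code. The example below is a simplified illustration, not the pattern recognition used in production: it assumes a plain least-squares line fit as the leak detector, and the function names, category labels, and megabyte values are hypothetical. Only the two-week boundary comes from the description above.

```python
from typing import List, Optional

URGENT_DAYS = 14  # two-week boundary between the third and fourth categories


def leak_slope(days: List[float], free_mb: List[float]) -> float:
    """Least-squares slope of free memory over time, in MB per day."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(free_mb) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("need samples from at least two distinct points in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, free_mb))
    return sxy / sxx


def days_to_critical(days: List[float], free_mb: List[float],
                     critical_mb: float) -> Optional[float]:
    """Estimated days until free memory reaches critical_mb, or None if not shrinking."""
    slope = leak_slope(days, free_mb)
    if slope >= 0:
        return None  # assumption: a non-negative trend is treated as "no leak"
    return (free_mb[-1] - critical_mb) / -slope


def categorize(days: List[float], free_mb: List[float], critical_mb: float) -> str:
    """Map one CPE's samples to one of four hypothetical category labels."""
    if free_mb[-1] <= critical_mb:
        return "critical"
    remaining = days_to_critical(days, free_mb, critical_mb)
    if remaining is None:
        return "unproblematic"
    return "urgent_leak" if remaining < URGENT_DAYS else "non_urgent_leak"


# Made-up daily samples: free memory shrinking by 2 MB per day.
days = [0, 1, 2, 3]
free_mb = [100, 98, 96, 94]
urgent = categorize(days, free_mb, critical_mb=80)   # about 7 days left
relaxed = categorize(days, free_mb, critical_mb=50)  # about 22 days left
```

A straight-line fit is the simplest possible choice here. For the exponential decay case described earlier, fitting in log space or using a more robust trend estimator would likely give a better estimate of the remaining time.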
Let’s examine the required actions for the above categories under field test results.
A field study: Memory leak in the CPE firmware is a real thing!
We conducted a field study to understand whether firmware can cause a memory leak. For this, we selected two firmware versions of the same CPE type, one the successor of the other, both widely deployed in the field. On Lifemote, live trackers report, for each firmware group separately, the ratio of the population suffering from a memory leak issue every day. Fig. 5 shows the aggregated results of those trackers: the CPEs with the old firmware have comparatively more memory issues than the CPEs with the new firmware. After the firmware update, the ratio of unproblematic CPEs increased by 4.7%, and the CPEs at the critical memory level shifted to the low-priority memory leak region, where they have a long time before hitting the critical memory level.
From the feedback we received from the field, we learned that this problem is largely reflected in the end user’s Wi-Fi experience. The findings of this field study show that different firmware versions can exhibit different memory leak behavior, even when deployed on the same CPE model. Hence, updating the CPE firmware is a promising way to address memory-related issues.
Fig. 5: Memory light distributions of two different firmware on the same CPE.
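A per-firmware tracker of this kind boils down to a grouped ratio. The sketch below shows one way it could be computed for a single day. The record layout, the function name, and the firmware labels are hypothetical, and the sample records are made up for illustration; they are not the field study data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def leak_ratio_by_firmware(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return, per firmware version, the share of CPEs flagged with a memory leak.

    Each record is a (firmware_version, has_leak) pair for one CPE on one day.
    """
    totals: Dict[str, int] = defaultdict(int)
    leaking: Dict[str, int] = defaultdict(int)
    for firmware, has_leak in records:
        totals[firmware] += 1
        if has_leak:
            leaking[firmware] += 1
    return {firmware: leaking[firmware] / totals[firmware] for firmware in totals}


# Illustrative records only.
todays_records = [
    ("fw_old", True),
    ("fw_old", False),
    ("fw_new", False),
    ("fw_new", False),
]
ratios = leak_ratio_by_firmware(todays_records)
```

Running the same aggregation every day and plotting the ratios per firmware group gives a tracker that makes differences between firmware versions visible over time.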
What is the solution?
As we discussed in prior sections, free memory can have significant impacts on the Wi-Fi quality of experience and therefore needs to be actively maintained. The recipe for resolving memory-related issues begins with granular data collection and running analysis on the whole population. Then, we can determine the categories of the memory issue mentioned in the previous sections. Finally, appropriate actions can be taken according to the memory condition.
Internet Service Providers (ISPs) can act reactively or proactively when faced with memory issues. A reactive action is a call-center action taken to solve a problem after an end-user complains about it. For CPEs at critical memory levels, this action would be a reboot. Rebooting the CPE provides an instant but temporary solution to the problem. In the proactive case, ISPs can take action without waiting for customers to call. Here, we can benefit from estimating the time it will take a CPE to hit the critical memory level. If the estimated time to reach low free memory levels is short for most CPEs in a population, a firmware upgrade would address the problem efficiently. On the other hand, if the problem is limited to a small group of CPEs with high uptimes (e.g., a few months), rebooting these devices from the ACS would bring free memory levels back to normal.
In general, memory-related issues that degrade Wi-Fi performance occur only in a small percentage of the population, but when they happen, they happen in all their glory. Long delays, internet connection problems, high CPU loads, DNS issues, and all nightmares… Troubleshooting such issues without any visibility into the CPE or insight into the user experience turns into a mission impossible! In that sense, granular data collection enables comprehensive analysis of the population and, as a result, possible reactive and proactive solutions. As our field study shows, a firmware upgrade might be a promising solution to memory-related Wi-Fi problems.
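The proactive decision described above can be expressed as a small rule. This is a deliberately simplified sketch: the category labels, the action names, and the 50% majority threshold are assumptions made for illustration, and a real policy would also weigh uptime and other signals.

```python
from typing import List


def population_action(categories: List[str], majority: float = 0.5) -> str:
    """Suggest a population-level action from per-CPE memory categories.

    categories holds one hypothetical label per CPE, such as "unproblematic",
    "critical", "urgent_leak", or "non_urgent_leak".
    """
    if not categories:
        return "none"
    urgent = sum(1 for c in categories if c == "urgent_leak")
    if urgent / len(categories) > majority:
        # Most CPEs will reach the critical level soon: treat it as a firmware problem.
        return "firmware_upgrade"
    if urgent > 0 or "critical" in categories:
        # Only a small group is affected: reboot those devices remotely.
        return "reboot_affected_via_acs"
    return "none"


widespread = population_action(["urgent_leak"] * 6 + ["unproblematic"] * 4)
isolated = population_action(["urgent_leak"] + ["unproblematic"] * 9)
healthy = population_action(["unproblematic"] * 3)
```

Keeping the rule at population level reflects the distinction made above: a widespread short time-to-critical points at the firmware, while an isolated group of affected CPEs is better handled with targeted reboots.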
What does Lifemote bring to the table?
Lifemote brings visibility to in-home Wi-Fi networks and the CPEs, providing charts, trackers, and insight cards that are built on powerful analytics, running over millions of devices in the population. CPEs with low free memory can be identified automatically through data-driven population analytics. Memory health can be categorized under various severity levels and displayed on an insight card that elaborates on the issue. The memory health of the whole population can also be monitored by filtering CPEs by their memory status. As a result, Lifemote’s data-driven approach enables reactive and proactive solutions and helps service providers offer better and seamless Wi-Fi experiences to their customers.
About the Authors:
Customer Success Manager