Users can experience slowness for many reasons, and understanding the root cause is what lets us troubleshoot effectively. To resolve a problem quickly, it is extremely important that users provide details. To some users, "slow" means a page takes more than 2 seconds to load, while to others it means 20 seconds for the same page. Instead of qualitative words (big, small, slow, fast, good, not good), use quantitative terms (seconds, minutes, hours, date / time) when communicating with the engineers who are going to help you. Be specific: indicate which page (function / screen / module) is slow, how many seconds it takes to load, and how many seconds it takes to generate a given report. When the problem is with a report, state exactly which date range, which branches / company, and which category filters or settings were used to generate it. All of this information is essential for identifying the root cause quickly and resolving the problem effectively.

Possible Root Cause of the System Slowness, Detection, and Solutions

No. | Possible Root Cause | Troubleshooting Guide | Solutions
1. Client Device

 

The fastest way to check whether the slowness is caused by the client device is to access the same website / functions from several other devices. If access from the other devices is normal, the slowness is almost certainly caused by the device itself.

Regardless of whether it is mobile or desktop devices, see below for more steps to identify the root cause:

Besides system performance monitoring, our technical support team is well aware that client devices infected by a virus are known to perform badly. When we detect high CPU usage and high network bandwidth usage, we usually also recommend that users install anti-virus software to detect potential security threats on the device.

Another possibility is the web browser. For web browser usage and settings, the following areas can be checked:

  • DNS settings
  • Too many tabs
  • Cache
  • Proxy Settings

 

 

Depending on the root cause found while troubleshooting the client device, users can upgrade the RAM or processor, or run an anti-virus scan.

By the time you read this, the guides referenced here may be outdated, but Google is your best friend: plenty of resources are available online to help you optimize your devices.

2. Client Local Area Network

When devices are not connected directly to the internet, traffic may pass through routers, switches, proxy servers and other network hops to reach Wavelet EMP in the cloud. There are a few ways to check whether the problem is caused by the Local Area Network:

  • Non-technical approach
    • Get help from another user at another location to access the website
    • Use a different device that connects directly to the internet. For example, if your computer has problems accessing the website, test with your phone (without connecting to Wi-Fi; use a direct mobile connection such as 3G)
  • Using Ping
    • Ping your router's IP address (not Google or other external websites / servers)
  • Monitoring the network performance on the networking devices
    • This hyperlink provides a tutorial for D-Link routers; other routers should be similar, so refer to the respective user manual for other routers.
  • Other devices within the same local area network
    • Sometimes there is no problem with your device or router, but another device within the local area network is infected by a virus and is consuming a significant amount of bandwidth. It is also possible that another user somewhere in the LAN (Local Area Network) has installed an application such as a torrent client that downloads and shares large files, or is watching YouTube or other live streaming content.
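The ping check in the list above can be sketched as follows. This is a minimal sketch that assumes a Linux client with the ip and ping utilities available; the helper name gateway_from_routes is hypothetical and not part of any Wavelet tooling. On Windows, the router address is instead listed as "Default Gateway" in the output of ipconfig.

```shell
#!/bin/sh
# Hypothetical helper: extract the router (default gateway) IP address
# from `ip route` output supplied on stdin. Assumes a line of the form
# "default via 192.168.1.1 dev eth0 ...".
gateway_from_routes() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Usage on the affected device (commented out so the block has no side effects):
#   router_ip=$(ip route show default | gateway_from_routes)
#   ping -c 10 "$router_ip"
# Slow or lost replies from the router itself point at the local network,
# not at the internet connection or the EMP server.
```

If the router answers quickly and without loss, the local network is less likely to be the cause and the next layers (internet connectivity, server) are worth checking.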

 

 
3. Client Internet Connectivity

To make client-side troubleshooting easier, ask the customer to use a suitable remote-support tool such as TeamViewer. The most common causes on the client side are:

  • The Obvious: The URL has a typo.
    • Ask the user to double-check the URL. This may seem obvious, but it is critical that the user enters the correct URL. It is better to ask the user to send you the URL so that you can examine it and try it yourself. There are three things to check in the URL:
      1. Is the protocol correct (i.e. http vs https)? Your web server admin could very well have blocked port 80 altogether.
      2. Are the domain name and the resource being accessed spelled correctly?
      3. Is the port number (if any) correct? The browser defaults to port 80 for http and port 443 for https.
  • The user’s PC does not have a network connection
    • Ask the user to access other sites to make sure the PC is on the network. If no sites can be reached, chances are the PC itself has a connectivity problem.

  • The user’s PC is not able to resolve names (DNS)
    • It is possible that the user’s PC is unable to resolve any names due to DNS issues. Have the user execute “nslookup <domain name>” in command prompt and ensure that he is able to resolve the domain name to an IP. 

  • The client's router has a problem
    • Advise the client to have their internal IT expert fix the router until they are able to connect to the internet. Kindly inform the customer that we do not support internal problems such as a router that has been reset, a LAN cable that does not work, or Wi-Fi problems. We only support issues related to the EMP server and port forwarding.
    • If the server is accessible internally but not externally, ask the customer to log in to the router's admin page and check the port forwarding setup. Get the router brand and model and look up the configuration steps on Google. If you can get their internal IT to fix the port forwarding, that is better.

  • The client's router is congested
    • Root cause: a large number of network broadcasts. This can indicate a misconfigured or faulty NIC, host, switch, software, driver, etc., and can also indicate a malware infection somewhere in the network.
    • Root cause: heavy internal usage by the client's users, such as downloading and streaming that consumes too much bandwidth.
    • We can advise the customer to restart their modem and router: remove the power cord from the modem and/or router, wait at least 10 seconds, then plug it back in.
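The three URL checks above (protocol, domain name, port) can be made explicit by splitting the URL the user sends you into its parts. The following is a minimal sketch only: parse_url is a hypothetical helper, the example host names are made up, and it assumes plain URLs without IPv6 literals or user:password@ prefixes. When no scheme is given, this sketch simply treats the URL as http.

```shell
#!/bin/sh
# Hypothetical helper: print the protocol, host and port of a URL so that
# each part can be compared against the expected EMP address.
parse_url() {
    url=$1
    case $url in
        *://*) proto=${url%%://*}; rest=${url#*://} ;;
        *)     proto=http; rest=$url ;;   # assumption of this sketch
    esac
    hostport=${rest%%/*}                   # drop the path, keep host[:port]
    case $hostport in
        *:*) host=${hostport%%:*}; port=${hostport##*:} ;;
        *)   host=$hostport
             case $proto in
                 https) port=443 ;;        # default port for https
                 *)     port=80 ;;         # default port for http
             esac ;;
    esac
    echo "protocol=$proto host=$host port=$port"
}

# Usage (commented out; the URL is an example only):
#   parse_url "http://emp.example.com:8080/emp/login"
#   nslookup emp.example.com    # then confirm the host resolves to an IP
```

Seeing the port spelled out makes it easier to spot a URL that silently relies on a default port the server does not listen on.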

Troubleshoot Internet connection problems
http://windows.microsoft.com/en-us/windows-vista/troubleshoot-internet-connection-problems
Guide to set DNS
https://developers.google.com/speed/public-dns/docs/using?hl=en

4. Public Internet

If you are connecting over the public internet, you might have difficulty accessing the URL. The following are possible causes and things to check:

  • Enhanced Protected Mode
  • Browser History
  • Browser Add-On
  • Proxy and DNS Settings 
  • Browser needs to be reset
  • Check whether a third-party service, program, or anti-virus is conflicting with browser
  • Temporarily disabling the firewall
  • Updating older drivers and editing registry key TabProcGrowth 
  • Checking Windows Updates for drivers
  • Restore or refresh your PC

 

 

 

Refer to this wiki for solutions

https://support.microsoft.com/en-us/kb/956196

5. Data Center Network

Always refer to the complete server list: https://docs.google.com/spreadsheets/d/1nfHlOJYQhhxe_aMSOLJd6zC1cL9dptmPPf8t7zMwllE/edit#gid=0

Information to request from the data center:

  • Internet connectivity of the data center
  • Condition of the customer's server
  • Whether the SSH port is accessible
  • Whether the database backup can be restored, and manually restored to the Wavelet cloud or AWS

AWS Confluence pages:

http://intranet.wavelet.asia/projects/cloud-amazon/wiki/Launching_new_EMP_instance_on_Amazon

/wiki/spaces/WU/pages/36798614

/wiki/spaces/WU/pages/22839391

/wiki/spaces/WU/pages/28803771

Azure Confluence pages:

/wiki/spaces/WU/pages/11567303

 

6. Server Network

For the AWS server network, click the instance -> Monitoring, then observe the Network In and Network Out graphs.

If Network Out is too high, use netstat to find out which IP address is causing it.

Killing the offending process and using chmod to restrict it will help.
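One way to use netstat for this is to count connections per remote address. The sketch below is an assumption about how to apply the step above, not an established Wavelet procedure; top_remote_ips is a hypothetical helper name. Note that a connection count is only a rough proxy: netstat does not report how many bytes each connection has transferred.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -ntu` output on stdin and print the
# number of connections per remote address, busiest first.
top_remote_ips() {
    # Keep only tcp/udp rows, strip the trailing ":port" from the
    # foreign address (column 5), then count identical addresses.
    awk '$1 ~ /^(tcp|udp)/ { sub(/:[0-9]+$/, "", $5); print $5 }' |
        sort | uniq -c | sort -rn
}

# Usage on the server (commented out so the block has no side effects):
#   netstat -ntu | top_remote_ips | head
```

An address that holds an unusually large number of connections is a reasonable first candidate to investigate.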

7. Server Operating Systems

Server operating systems: Ubuntu, CentOS, Amazon Linux AMI

Useful Linux commands

http://www.tecmint.com/useful-linux-commands-for-system-administrators/

The difference between Ubuntu and CentOS (package manager)

Ubuntu : apt-get

CentOS : yum

8. Server Application Server

Refer to the Ubuntu and EMP installation guide to go back to basics

/wiki/spaces/WU/pages/11568251

Basic checks for a slow server

http://intranet.wavelet.asia/projects/tech/wiki/Trouble_Shoot_Server_Slow__Unresponsive

http://intranet.wavelet.asia/projects/tech/wiki/Server_Slow_Issue

Missing run.sh and shutdown.sh

Auto add run.sh and shutdown.sh using rc.local script

 

Out of memory error

http://intranet.wavelet.asia/projects/tech/wiki/Out_of_memory_error_--_javalangOutOfMemoryError

 

 

 

 

As root, check the running processes using these commands and observe:

showlog : Output the running server log

top : List processes running on the system

free -m : check RAM size

ps aux|grep jboss : Process status of jboss

jboss-stop : to stop server

/etc/init.d/postgresql-9.2.4 restart : to restart postgres

jboss-start : to start server

reboot : to reboot the server

 

Useful wiki

http://intranet.wavelet.asia/projects/tech/wiki/%22What_to_do_when_server_is_down%22-A_complete_write_up

http://intranet.wavelet.asia/projects/tech/wiki/Checking_Performance_of_Linux_System_(_using_%E2%80%9Ctop%E2%80%9D_command_)

http://intranet.wavelet.asia/projects/tech/wiki/High_Availability__Load_Balancing_and_Replication_for_PostgreSQL

http://intranet.wavelet.asia/projects/tech/wiki/JBoss_

http://intranet.wavelet.asia/projects/tech/wiki/JBoss_configurations

http://intranet.wavelet.asia/projects/tech/wiki/Connection_Pool_size

9. Server Database Server
  • If you are not able to connect to the wsemp database, most probably postgres has not started yet.
    • As root, run /etc/init.d/postgresql-9.2.4 start, then try the psql command again.
  • If ps aux|grep postgres shows a process that is idle in transaction, a generated query is stuck and you need to find out which query it is.
  • In most cases the customer is impatient when the server is down, so offer to restart their server immediately when idle in transaction occurs. After restarting the server, check the server log and troubleshoot what happened.
    • kill -9 the running process
    • restart postgres
    • jboss-start
    • Check server log : cd /usr/java/jboss/server/default/log/
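Finding the stuck backends described above can be sketched as a small filter over ps aux output. This is a minimal sketch, assuming the PostgreSQL backends show their state in the process title (for example "postgres: ... idle in transaction"); idle_in_transaction_pids is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `ps aux` output on stdin and print the PID
# (column 2) of every postgres backend that is idle in transaction.
# The [p] bracket keeps this awk process from matching its own command line.
idle_in_transaction_pids() {
    awk '/[p]ostgres:/ && /idle in transaction/ { print $2 }'
}

# Usage on the server (commented out so the block has no side effects):
#   ps aux | idle_in_transaction_pids
```

The PIDs only tell you which backends are stuck. The query text itself can be looked up in PostgreSQL's pg_stat_activity view before anything is killed.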


A common error is that postgres cannot start because postmaster.pid contains invalid data.

To solve: http://intranet.wavelet.asia/projects/tech/wiki/When_postgres_cannot_start_as_postmasterpid_has_invalid_data

10. Server Hard Disk

Run this command

df -h

If the usage is more than 90%, you need to free up space. Find large files (over 20 MB) using this command:

find / -type f -size +20M -exec ls -lh {} \; | awk '{ print $NF ": " $5 }'

Note: Do not delete anything inside /var/lib/pgsql/data. The database is stored in that folder, so it is normal for it to be large.

In some cases, the hard disk reaches 100% because of /var/log/cups/
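The 90% check described above can be scripted so that only the file systems needing attention are listed. This is a minimal sketch; filesystems_over is a hypothetical helper name, and it assumes the POSIX output format of df -P (usage in column 5, mount point in column 6) with mount points that contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: read `df -P` output on stdin and print every mount
# point whose usage is above the given percentage.
filesystems_over() {
    threshold=$1
    awk -v limit="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                 # "93%" -> "93"
        if (use + 0 > limit + 0) print $6 ": " $5
    }'
}

# Usage on the server (commented out so the block has no side effects):
#   df -P | filesystems_over 90
```

No output means no file system is above the threshold.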

/wiki/spaces/WU/pages/20579213

11. Wavelet EMP Itself

Slowness may be limited to a specific EMP module, caused by inefficient code or SQL queries.

  • Find out from the customer which module is slow, and test from your own PC or laptop whether the module really is slow. Run the showlog command and check whether any exception appears. If you are not confident with the code, get a programmer to sit beside you and help troubleshoot.
    • Variety of exceptions:
      • org.postgresql.util.PSQLException. Solution: copy the query logged just before the error and report it to the programming team so they can fix the SQL query.

      • NullPointerException. Solution: repopulate the database to your localhost. Run the debugger in Eclipse and find which table or query causes the null pointer exception. In most cases the error is solved by finding which table has null data and updating it to a value that depends on the column type. For example, if the type is integer, update it to 0; if it is string, update it to '' (empty string).
  • When a customer complains that their reports run very slowly, but no slowness occurs after you repopulate the database to your localhost, do the following:
    • Check whether VACUUM is running.
      • As the empbackup user, run vim backup.log and make sure the latest backup log entry shows that VACUUM completed.
    • Inform the customer that you will repopulate the database again at night. In some cases, the slowness is gone after we repopulate their database. This cannot be done during working hours unless the customer asks you to.

  • A common case is that modules are missing from EMP. Please refer to the wiki below to fix the problem.

 

Check the server log using vim command

 vim /usr/java/jboss/server/default/log/server.log
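Besides reading the log in vim, it can help to get a quick tally of which exceptions occur most often. The following is a minimal sketch, assuming exceptions appear in the log by their Java class name (for example org.postgresql.util.PSQLException); count_exceptions is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read a log on stdin and print each exception or
# error class name with the number of times it appears, most frequent first.
count_exceptions() {
    grep -oE '[A-Za-z0-9_.$]+(Exception|Error)' | sort | uniq -c | sort -rn
}

# Usage on the server (commented out so the block has no side effects):
#   count_exceptions < /usr/java/jboss/server/default/log/server.log
```

The most frequent class name is a good starting point for the search inside the log itself.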

12. Usage of EMP

Heavy usage of EMP by the customer: find out which report is being generated and by whom, and communicate with the customer about their report usage.

The queries for checking running processes are useful for identifying the specific report.

Queries to check running processes

http://intranet.wavelet.asia/projects/tech/wiki/Queries_to_check_running_processes

 

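If jboss-stop does not stop anything, find the process ID of the Java process and kill it (2304 below is an example PID; use the one reported by ps):

ps aux|grep java

kill -9 2304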

 

 

 

 

 

 

 

 
