Server Troubleshoot

There are many possible reasons why users experience slowness. Understanding the root cause of the slowness lets us troubleshoot effectively, so it is extremely important for users to provide as much detail as possible.


  1. CHECKING SERVER SLOW SCRIPT
    This script is usually already on the server. To run it, as root:
    ./checking_server_slow.sh

    This script was written to check for errors on the server that may cause slowness.

    Note: If the script is not on the customer's server, add it as follows.

    Steps:
    vim checking_server_slow.sh
    Press i to enter insert mode
    Paste the content of the script, then press Esc
    Type :wq to save and quit
    chmod a+x ./checking_server_slow.sh



    As shown above, the server is currently having an OutOfMemory error. This error often prevents some users from accessing the server while others are not affected. The recommended solution for this error is to restart the server.



    This is an example of an Exception error being shown. Users may experience slowness when this happens.
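    A quick way to see whether these errors are present, without reading the whole log, is to count the matching lines. This is a sketch only: it assumes the errors appear in server.log under the standard Java names (java.lang.OutOfMemoryError, ...Exception), and count_server_errors is a hypothetical helper, not part of checking_server_slow.sh.

```shell
#!/bin/sh
# Hypothetical helper (not part of checking_server_slow.sh): counts lines in a
# server.log that mention an OutOfMemoryError or any *Exception.
count_server_errors() {
  log_file=$1
  if [ ! -r "$log_file" ]; then
    echo "cannot read $log_file" >&2
    return 1
  fi
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps that
  # from aborting scripts that run with "set -e".
  oom=$(grep -c 'OutOfMemoryError' "$log_file" || true)
  exc=$(grep -c 'Exception' "$log_file" || true)
  printf 'OutOfMemoryError: %s\nException: %s\n' "$oom" "$exc"
}

# Example, using the path reported by "locate server.log":
# count_server_errors /path/to/server.log
```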

    Checking the running queries can reveal a query that may be stuck and causing the server slowness.

     

    The query above shows a user generating a report that has been running for a long time, consuming server resources and causing the server slowness.


  2. VIRUS
    However, there could be other causes of server slowness. One of them is a virus on the server PC. To check for this, run top as root and look for a process whose CPU usage is above 100%.



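    As shown above, the CPU usage is currently 332.6%, which indicates that a virus may be present on the server PC. Note that on a multi-core server top counts each core as 100%, so a busy legitimate process (for example java or postgres) can also exceed 100%; check whether the process name is one you recognize.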
    If this happens, please refer to wiki: /wiki/spaces/WM/pages/1002276120
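    To check without opening the interactive top screen, ps can list every process with its CPU usage, and a small filter can keep only those above a threshold. This is a sketch: filter_high_cpu is a hypothetical name, and ps reports %CPU averaged over each process's lifetime, so a process that has only just started spinning may show a lower figure than top does.

```shell
#!/bin/sh
# Hypothetical filter: reads "PID %CPU COMMAND" lines on stdin and prints
# the lines whose %CPU is above the threshold given as $1 (default 100).
filter_high_cpu() {
  awk -v limit="${1:-100}" '$2 + 0 > limit + 0'
}

# Example: processes above 100% CPU, highest first (--sort is a procps option).
# ps -eo pid=,pcpu=,comm= --sort=-pcpu | filter_high_cpu 100
```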


  3. HARD DISK FULL
    For on-premise customers, a full hard disk can also lead to server slowness.

    The screenshot above is a sign that the hard disk is too full for the running queries to be checked.

    To check the hard disk usage, enter df -h as root.
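    df -h is meant for reading by eye. When checking several servers it can help to flag only the filesystems above a threshold. A sketch, assuming the hypothetical function name full_filesystems and 90% as an example threshold:

```shell
#!/bin/sh
# Hypothetical helper: reads "df -P" output on stdin and prints each mount
# point whose Capacity column is at or above the percentage given as $1
# (default 90). -P gives the POSIX format, one filesystem per line.
# Mount points containing spaces are truncated at the first space.
full_filesystems() {
  awk -v limit="${1:-90}" 'NR > 1 {
    use = $5
    sub(/%$/, "", use)
    if (use + 0 >= limit + 0) print $6 " " $5
  }'
}

# Example: df -P | full_filesystems 90
```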


    The image above shows that the hard disk is at 100% capacity, which causes the server slowness. If this happens, you can remove files that are no longer used, for example old server logs.
    To find the server log, enter locate server.log



    The output shows the directory that contains the server log. cd to that folder and enter ls -lhrt


    This lists the files in the folder with human-readable sizes, sorted by modification time so that the newest files appear last, so you can see the size of each server log. To make room on the hard disk, you can remove some of the older server logs as a temporary measure.
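    Rather than reading the ls -lhrt listing by hand, find can list old log files by age. The sketch below only prints candidates and deletes nothing; the function name, the server.log* pattern for rotated logs, and the 30-day example are assumptions, so confirm each file before removing it.

```shell
#!/bin/sh
# Hypothetical helper: prints server.log files (including rotated copies such
# as server.log.2024-01-01) under directory $1 that were last modified more
# than $2 days ago (default 30). It only lists files; it never deletes them.
list_old_server_logs() {
  dir=$1
  days=${2:-30}
  find "$dir" -type f -name 'server.log*' -mtime +"$days" -print
}

# Example, using the folder reported by "locate server.log":
# list_old_server_logs /path/to/log/folder 30
```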

    You can also find large files with:
    find / -type f -size +20M -exec ls -lh {} \; | awk '{ print $NF ": " $5 }'

    This finds files larger than 20MB. You can change the size to suit your needs.
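    The command above lists matches in the order find reaches them. A variant that sorts by size, largest first, and stays on a single filesystem is sketched below. It relies on GNU find's -printf, the function name is hypothetical, and because of -xdev you would run it once per mount point if /var or /opt are mounted separately.

```shell
#!/bin/sh
# Hypothetical helper: lists files under $1 larger than $2 (a find -size
# value such as 20M), largest first, as "SIZE_IN_BYTES PATH".
# -xdev keeps find on the filesystem of $1 (skips /proc and other mounts);
# -printf is a GNU find extension.
largest_files() {
  find "$1" -xdev -type f -size +"$2" -printf '%s %p\n' 2>/dev/null | sort -nr
}

# Example: the 20 largest files over 20M on the root filesystem.
# largest_files / 20M | head -n 20
```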



    If you are unsure whether a file can be safely removed, please check with your manager.

  4. RUNNING QUERIES
    A stuck query can also cause server slowness. To check for this, log in to postgres from tealive test or keyopswork:
    PGPASSWORD=<password> psql postgres --host=my-samsung-hq-rds-new-emp.cejxvpigvz8w.ap-southeast-1.rds.amazonaws.com --port=5432 --username=janet

    To list all running queries in postgres:
    SELECT (now() - xact_start) as period, pid,datname,state, substring(query, 0, 80) FROM pg_stat_activity where (now() - xact_start) is not null order by (now() - xact_start) desc;

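    To check a specific server's database (ALLIT), excluding idle connections (which have no open transaction and would otherwise be listed first):
    SELECT (now() - xact_start), pid,datname,state, substring(query, 0, 80), application_name FROM pg_stat_activity where datname ='allit' and xact_start is not null order by (now() - xact_start) desc;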
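    When you only care about queries that have been running for a while, the check can be narrowed by time. The sketch below prints such a query for psql to run; the function name and the 5-minute default are hypothetical, and the filter relies on the query_start and state columns of pg_stat_activity (present from PostgreSQL 9.2 onwards).

```shell
#!/bin/sh
# Hypothetical helper: prints SQL listing non-idle sessions whose current
# query started more than $1 minutes ago (default 5), longest first.
# The argument is checked to be a whole number so it is safe to embed.
long_running_sql() {
  minutes=${1:-5}
  case $minutes in
    ''|*[!0-9]*) echo "minutes must be a whole number" >&2; return 1 ;;
  esac
  cat <<EOF
SELECT now() - query_start AS duration, pid, datname, state, left(query, 80) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND now() - query_start > interval '$minutes minutes'
ORDER BY duration DESC;
EOF
}

# Example (same connection options as above; psql reads the SQL from stdin):
# long_running_sql 5 | psql postgres --host=<host> --port=5432 --username=janet
```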

    To see the full query, replace XXXX with its pid:
    SELECT query from pg_stat_activity where pid =XXXX;

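    To stop a query, first try pg_cancel_backend, which cancels only the current query and leaves the connection open. If the query does not stop, use pg_terminate_backend, which closes the whole connection:
    select pg_cancel_backend(PID);
    SELECT pg_terminate_backend(PID);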
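    Because a mistyped PID can stop the wrong session, it can help to generate the statement from a validated number. A sketch with the hypothetical name stop_backend_sql, which prints pg_cancel_backend by default and pg_terminate_backend only when asked:

```shell
#!/bin/sh
# Hypothetical helper: prints the SQL to stop backend $1.
# $2 = "cancel" (default) cancels the current query only;
# $2 = "terminate" closes the whole connection.
stop_backend_sql() {
  pid=$1
  mode=${2:-cancel}
  case $pid in
    ''|*[!0-9]*) echo "PID must be a number" >&2; return 1 ;;
  esac
  case $mode in
    cancel) echo "SELECT pg_cancel_backend($pid);" ;;
    terminate) echo "SELECT pg_terminate_backend($pid);" ;;
    *) echo "mode must be cancel or terminate" >&2; return 1 ;;
  esac
}

# Example (12345 is a placeholder PID):
# stop_backend_sql 12345 | psql postgres --host=<host> --port=5432 --username=janet
```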

  5. RESTARTING SERVER
    Restarting the server usually solves most problems. To restart the server, first stop JBoss using jboss-stop.
    This can take a while depending on how busy the server is. If it takes too long, use ps aux | grep jboss to find the JBoss process and kill -9 [PID]. However, killing JBoss this way can cause transaction errors on the server.
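    Instead of watching jboss-stop and deciding by eye when it has taken too long, the wait can be bounded. The sketch below is hypothetical (the function name and the 120-second default are assumptions, not a product setting): it polls the JBoss PID with kill -0 and only falls back to kill -9 after the timeout, with the same transaction risk described above. pgrep -f jboss is a shorter way to get the PID than ps aux | grep jboss, and it does not list the search command itself.

```shell
#!/bin/sh
# Hypothetical helper: after jboss-stop has been run, waits up to $2 seconds
# (default 120) for process $1 to exit, then falls back to kill -9.
# Returns 0 if the process exited by itself, 2 if it had to be killed.
wait_or_kill() {
  pid=$1
  timeout=${2:-120}
  waited=0
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "process $pid still running after ${timeout}s, sending kill -9" >&2
      kill -9 "$pid" 2>/dev/null
      return 2
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Example: note the PID from "pgrep -f jboss", run jboss-stop, then
# wait_or_kill <PID> 120
```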

    For on-premise customers, restart postgres as well.
    as root: /etc/init.d/postgresql-9.2.4 restart

Private & Confidential