Wednesday, 18 February 2015

Ubuntu black screen after system upgrade fix

I have a laptop on which I run Linux. When I first bought it, it came with Windows pre-installed and, in general, the manufacturer only guaranteed support for Windows. However, due to the nature of my work, I simply had to wipe it and install Linux instead. I picked Ubuntu simply because I have been using it for some time, but I strongly believe most of the things mentioned below would be the same or very similar on any other Linux distribution, Debian or Red Hat based.

Even though almost everything worked right out of the box, I did have some issues with the graphics card. The card I have is an NVIDIA GeForce GT 540M (yep, the laptop is a bit old by now). During system installation I was able to find, download and install the proper legacy binary drivers for it, and it performed beautifully.

However, from time to time an update comes along that messes up the graphics card drivers and renders the card unusable. The symptom is almost always the same: the system starts booting normally, even the login screen is displayed perfectly, but then, after a successful login (you can hear the sounds confirming it), the screen goes completely blank, as if you had simply switched off the monitor. That is a clear indicator that the graphics card drivers are broken.

It has happened to me a few times already, so when it happened again last night I decided to write down the quick steps to recover from this problem, so that next time I can fix it within minutes.

Of course, there are several different ways to solve this problem, but after dealing with it several times, I found the following procedure to be the quickest.

Step 1: Reboot the machine and boot in text mode

To boot the machine in text mode, wait for the bootloader to display the boot options, use the arrow keys to select the latest option available and then press the 'e' key to edit that boot entry.

In the editor, find the line with the kernel options and replace the 'quiet splash' option with 'text'. Then boot the edited entry (in GRUB 2, which Ubuntu uses, that is Ctrl+X or F10). The change applies to this boot only.

In some cases the 'nomodeset' option might work instead of 'text' and should be able to start the graphical interface in default mode, but in a situation like this, when I have a problematic graphics driver, I always prefer text mode only.
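To make the edit from Step 1 concrete, here is a minimal sketch of the same substitution done with sed. The helper name and the sample kernel line are made up for illustration; the real line on your machine will have a different kernel version and root device.

```shell
# Hypothetical helper: rewrite a kernel command line the way Step 1 describes.
# It replaces the "quiet splash" options with "text" and leaves the rest alone.
to_text_mode() {
    printf '%s\n' "$1" | sed 's/quiet splash/text/'
}

# Sample kernel line (illustrative only, not taken from a real machine).
to_text_mode 'linux /boot/vmlinuz root=/dev/sda1 ro quiet splash'
```

This only shows what the edited line should look like; in the bootloader you still make the change by hand.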

Step 2: Remove the graphics drivers you have on the machine

Since the problem is a faulty graphics driver, my quickest solution was to simply remove all installed drivers. In my case, I simply do:
sudo apt-get remove 'nvidia-*'
If you wish, you can always dig a bit deeper, find exactly which driver is to blame and remove only that one. For me, the culprit last time was nvidia-319-updates, so I would have fixed the problem with:
sudo apt-get purge nvidia-common nvidia-settings-319-updates nvidia-319-updates
Whichever option you choose, after removing the faulty drivers you should be able to move on.
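If you want to see which NVIDIA packages are actually installed before removing anything, a small filter over the dpkg package list can help. This is only a sketch: the helper name is hypothetical, and it assumes the usual `dpkg -l` layout where the first column is the status and the second is the package name.

```shell
# Hypothetical helper: read `dpkg -l` style output on stdin and print the
# names of installed (status "ii") packages whose name starts with "nvidia-".
installed_nvidia_packages() {
    awk '$1 == "ii" && $2 ~ /^nvidia-/ { print $2 }'
}

# Typical use on the affected machine:
#   dpkg -l | installed_nvidia_packages
```

The names it prints are the ones you can pass to apt-get remove or apt-get purge.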

Step 3: Reboot the machine and install the correct drivers

Simply reboot the machine. It should be able to boot fully using the default graphics drivers. Sure, the picture you get might not be perfect, but at least it will get you back to your home ground and all the familiar GUI tools you use.

Use any tool to locate and install the correct drivers for your graphics card (e.g. Additional Drivers). After installing and applying the correct drivers, everything should be back to normal. I also tend to do one more reboot just to make sure my changes were not temporary and that the system will indeed work as expected.

Monday, 16 December 2013

Short Veritas cluster cheat sheet

Veritas Cluster Server (also known as VCS) is high-availability cluster software for Unix, Linux and Microsoft Windows systems, created by Veritas Software (now part of Symantec). It provides application cluster capabilities to systems running other applications, including databases, network file sharing, and electronic commerce. As I'm working with it, here is a short list of the most important commands with short descriptions.

The list is compiled more or less from the official VCS pages and documents, so you won't find anything new here, merely a short, organized list of what I found most useful in my case.

Basic commands

Cluster daemons and log files

Daemon / file                   Description
had                             High Availability Daemon
hashadow                        Companion Daemon
Agent                           Resource Agent daemon
CmdServer                       Web Console cluster management daemon
/var/VRTSvcs/log                Log directory
/var/VRTSvcs/log/engine_A.log   Primary log file (engine log file)

Cluster status

Command             Description
hastatus            Continually monitor the cluster and display relevant information
hastatus -sum       Display cluster summary
hastatus -display   Verify the cluster is operating

Cluster details

Command                              Description
haclus -display                      Display information about a cluster
haclus -value <attribute>            Display the value of a specific cluster attribute
haclus -modify <attribute> <value>   Modify a cluster attribute
haclus -enable LinkMonitoring        Enable LinkMonitoring
haclus -disable LinkMonitoring       Disable LinkMonitoring

Starting and stopping the cluster

Command                   Description
hastart [-stale|-force]   "-stale" instructs the engine to treat the local config as stale
                          "-force" instructs the engine to treat a stale config as a valid one
hastart [-onenode]        Start VCS on a single-node cluster
hasys -force <sys>        Bring the cluster into running mode from a stale state using the configuration file from a particular server
hastop -local             Stop the cluster on the local server and take its service groups offline, without failing the application/s over (add -force to leave them running)
hastop -local -evacuate   Stop the cluster on the local server but evacuate (fail over) the application/s to another node within the cluster
hastop -all -force        Stop the cluster on all nodes but leave the application/s running

System operations 

Command                                        Description
hasys -add <sys>                               Add a system to the cluster
hasys -delete <sys>                            Delete a system from the cluster
hasys -modify <sys> <modify options>           Modify a system's attributes
hasys -state                                   List a system's state
hasys -force <sys>                             Force a system to start
hasys -display [-sys]                          Display a system's attributes
hasys -list                                    List all the systems in the cluster
hasys -load <sys> <value>                      Change the load attribute of a system
hasys -nodeid                                  Display the value of a system's node ID (/etc/llthosts)
hasys -freeze [-persistent][-evacuate] <sys>   Freeze a system (no offlining of the system, no onlining of groups). Note: main.cf must be in write mode
hasys -unfreeze [-persistent] <sys>            Unfreeze a system (re-enable bringing groups and resources back online). Note: main.cf must be in write mode

User operations 

Command                 Description
hauser -add <user>      Add a user
hauser -update <user>   Modify a user
hauser -delete <user>   Delete a user
hauser -display         Display all users

Dynamic Configuration Commands 

The VCS configuration must be in read/write mode in order to make changes. While it is in read/write mode the configuration is considered stale, and a .stale file is created in $VCS_CONF/conf/config. When the configuration is put back into read-only mode, the .stale file is removed.

Command                                             Description
haconf -makerw                                      Change the configuration to read/write mode
haconf -dump -makero                                Save the configuration to disk and change it to read-only mode
haclus -display | grep -i 'readonly'                Check what mode the cluster is running in (0 = write mode; 1 = read-only mode)
hacf -verify /etc/VRTSvcs/conf/config               Check the configuration file. Note: you can point to any directory as long as it has main.cf and types.cf
hacf -cftocmd /etc/VRTSvcs/conf/config -dest /tmp   Convert a main.cf file into cluster commands
hacf -cmdtocf /tmp -dest /etc/VRTSvcs/conf/config   Convert a command file into a main.cf file
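Because every configuration change has to be wrapped in the same makerw / dump -makero pair, it can be handy to put that pattern into a small shell function. This is only a sketch: the function name is hypothetical and it assumes haconf is on the PATH of a cluster node.

```shell
# Hypothetical wrapper: open the configuration, run one command, then save
# and close the configuration again, even if the command itself failed.
with_rw_config() {
    haconf -makerw || return 1
    "$@"
    rc=$?
    haconf -dump -makero || return 1
    return "$rc"
}

# Example (group and system names as used elsewhere in this post):
#   with_rw_config hagrp -modify groupw SystemList sun1 1 sun2 2
```

The wrapper returns the exit status of the wrapped command, so a failed change is still visible to the caller after the configuration has been closed.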

Service groups

Command                                                   Description

haconf -makerw                                            Add a service group
  hagrp -add groupw
  hagrp -modify groupw SystemList sun1 1 sun2 2
  hagrp -autoenable groupw -sys sun1
haconf -dump -makero

haconf -makerw                                            Delete a service group
  hagrp -delete groupw
haconf -dump -makero

haconf -makerw                                            Change a service group
  hagrp -modify groupw SystemList sun1 1 sun2 2 sun3 3
haconf -dump -makero

hagrp -list                                               List the service groups
hagrp -dep <group>                                        List a group's dependencies
hagrp -display <group>                                    List the parameters of a group
hagrp -resources <group>                                  Display a service group's resources
hagrp -state <group>                                      Display the current state of the service group
hagrp -clear <group> [-sys <sys>]                         Clear a faulted non-persistent resource in a specific group

# remove the host                                         Change the system list in a cluster
hagrp -modify grp_zlnrssd SystemList -delete <sys>
# add the new host (don't forget to state its position)
hagrp -modify grp_zlnrssd SystemList -add <sys> 1
# update the autostart list
hagrp -modify grp_zlnrssd AutoStartList <sys1> <sys2>

Service group operations 

Command                                 Description
hagrp -online <group> -sys <sys>        Start a service group and bring its resources online
hagrp -offline <group> -sys <sys>       Stop a service group and take its resources offline
hagrp -switch <group> -to <sys>         Switch a service group from one system to another
hagrp -enableresources <group>          Enable all the resources in a group
hagrp -disableresources <group>         Disable all the resources in a group
hagrp -freeze <group> [-persistent]     Freeze a service group (disable onlining and offlining)
hagrp -unfreeze <group> [-persistent]   Unfreeze a service group (enable onlining and offlining)

haconf -makerw                          Enable a service group. Only enabled groups can be brought online.
  hagrp -enable <group> [-sys <sys>]
haconf -dump -makero

haconf -makerw                          Disable a service group, which stops it from being brought online.
  hagrp -disable <group> [-sys <sys>]
haconf -dump -makero

hagrp -flush <group> -sys <sys>         Flush a service group and enable corrective action

Resources 

Command                                        Description

haconf -makerw                                 Add a resource
  hares -add appDG DiskGroup groupw
  hares -modify appDG Enabled 1
  hares -modify appDG DiskGroup appdg
  hares -modify appDG StartVolumes 0
haconf -dump -makero

haconf -makerw                                 Delete a resource
  hares -delete <resource>
haconf -dump -makero

haconf -makerw                                 Change a resource
  hares -modify appDG Enabled 1
haconf -dump -makero

hares -global <resource> <attribute> <value>   Make a resource attribute global (one value for all systems)
hares -local <resource> <attribute>            Make a resource attribute local (a separate value per system)
hares -display <resource>                      List the parameters of a resource
hares -list                                    List the resources
hares -dep                                     List the resource dependencies

Resource operations  

Command                                  Description
hares -online <resource> [-sys <sys>]    Online a resource
hares -offline <resource> [-sys <sys>]   Offline a resource
hares -state                             Display the state of a resource (offline, online, etc.)
hares -display <resource>                Display the parameters of a resource
hares -offprop <resource> -sys <sys>     Offline a resource and propagate the command to its children
hares -probe <resource> -sys <sys>       Cause a resource agent to immediately monitor the resource
hares -clear <resource> [-sys <sys>]     Clear a resource (automatically initiates onlining)

Resource types operations 

Command                            Description
hatype -add <type>                 Add a resource type
hatype -delete <type>              Remove a resource type
hatype -list                       List all resource types
hatype -display <type>             Display a resource type
hatype -resources <type>           List the resources of a particular resource type
hatype -value <type> <attribute>   Display the value of a particular resource type attribute

LLT and GAB

VCS uses two components, LLT and GAB, to share data over the private networks among systems. These components provide the performance and reliability required by VCS.

LLT (Low Latency Transport) provides fast, kernel-to-kernel communications and monitors network connections. The system admin configures LLT by creating a configuration file (llttab) that describes the systems in the cluster and the private network links among them. LLT runs at layer 2 of the network stack.
GAB (Group membership and Atomic Broadcast) provides the global message order required to maintain a synchronised state among the systems, and monitors disk communications such as those required by the VCS heartbeat utility. The system admin configures the GAB driver by creating a configuration file (gabtab).

LLT and GAB files

File                               Description
/etc/llthosts                      The file is a database, containing one entry per system, that links the LLT system ID with the host's name. The file is identical on each server in the cluster.
/etc/llttab                        The file contains information that is derived during installation and is used by the utility lltconfig.
/etc/gabtab                        The file contains the information needed to configure the GAB driver. This file is used by the gabconfig utility.
/etc/VRTSvcs/conf/config/main.cf   The VCS configuration file. The file contains the information that defines the cluster and its systems.
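Since /etc/llthosts is just a list of "ID hostname" pairs, looking up the LLT system ID of a host is a one-liner. The sketch below wraps it in a function; the function name is hypothetical, and the host names in the example are the ones used elsewhere in this post.

```shell
# Hypothetical helper: print the LLT system ID for a host name, given a file
# in llthosts format (one "<id> <hostname>" pair per line).
# Exits non-zero when the host is not listed.
llt_node_id() {
    awk -v host="$2" '$2 == host { print $1; found = 1 } END { exit !found }' "$1"
}

# Example:
#   llt_node_id /etc/llthosts sun1
```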

Gabtab entries

Example entries
/sbin/gabdiskconf -i /dev/dsk/c1t2d0s2 -s 16 -S 1123
/sbin/gabdiskconf -i /dev/dsk/c1t2d0s2 -s 144 -S 1124
/sbin/gabdiskhb -a /dev/dsk/c1t2d0s2 -s 16 -p a -S 1123
/sbin/gabdiskhb -a /dev/dsk/c1t2d0s2 -s 144 -p h -S 1124
/sbin/gabconfig -c -n2

Command                       Description
gabdiskconf                   -i   Initialises the disk region
                              -s   Start block
                              -S   Signature
gabdiskhb (heartbeat disks)   -a   Add a GAB disk heartbeat resource
                              -s   Start block
                              -p   Port
                              -S   Signature
gabconfig                     -c   Configure the driver for use
                              -n   Number of systems in the cluster

LLT and GAB commands 

Command                               Description
lltstat -n                            Verify that links are active for LLT
lltstat -nvv | more                   Verbose output of the lltstat command
lltstat -p                            Open ports for LLT
lltstat -c                            Display the values of LLT configuration directives
lltstat -l                            List information about each configured LLT link
lltconfig -a list                     List all MAC addresses in the cluster
lltconfig -U                          Stop LLT
lltconfig -c                          Start LLT
gabconfig -a                          Verify that GAB is operating. Note: port a indicates that GAB is communicating, port h indicates that VCS is started
gabconfig -U                          Stop GAB
gabconfig -c -n <number of systems>   Start GAB
gabconfig -c -x                       Override the seed values in the gabtab file
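The port a / port h check from the gabconfig -a row can be scripted. Here is a minimal sketch, assuming the output contains one line per port that starts with "Port a" or "Port h" (as in "Port a gen ... membership 01"); the function name is hypothetical.

```shell
# Hypothetical helper: read `gabconfig -a` output on stdin and succeed only
# if both port a (GAB) and port h (VCS) have a membership line.
gab_ports_ok() {
    awk '$1 == "Port" && $2 == "a" { a = 1 }
         $1 == "Port" && $2 == "h" { h = 1 }
         END { exit !(a && h) }'
}

# Typical use on a cluster node:
#   gabconfig -a | gab_ports_ok && echo "GAB and VCS are up"
```

It only checks that both ports are present; it does not look at which systems are listed in the membership.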