Crashes
“… Lembre-se que estes organismos biônicos também falham.”
(Seriado recente da Mulher Biônica)
Sim é um fato real: o Linux eventualmente “crasha”.
Isto não ocorre com frequência, entretanto eu resolvi investigar um pouco.
Deixei micro ligado antes do almoço, por volta do meio-dia e quando retornei, às 15:00 percebi que o sistema aparentava estar normal, exceto pela rede wireless estar desconectada e a luz de indicação na frente do micro, indicava conexão.
Achei aquilo esquisito e tentei reabilitar a rede com o Network-manager, mas em foi em vão. Parte para os procedimentos de perdas-mínimas que sempre utilizo:
- Fechei as páginas abertas no Firefox
- Parei os aplicativos
- Reiniciei o Gnome – ctrl-alt-bkspc
- Ao aparecer a console e verifiquei o processos com o tradicional ps -ax e os mesmos estavam normais
- Já no root, Parei a rede com o comando /etc/init.d/networking stop e a console congelou quando emiti comando ifconfig
- Fiz o mesmo em outra console e esta congelou também
- Na tty7 a mensagem informava que o Gnome estava reiniciando.
- Emiti o comando poweroff e o sistema fez o broadcast para as consoles, mas a parada ficou aguardando alguma evolução – em outras palavras travou
- Nisto o indicador de wireless ainda sinalizava conexão.
- Restou-me o bom e velho ctrl-alt-del. E o sistema começou a cair sem desmontar os discos (aiaiaiaeee) e … novamente parou no meio do processo de reinicialização.
- Por fim… desliguei na marra pressionando o botão de liga-desliga.
Óbvio que sempre me preocupa desligar tudo com discos montados e isto me dá uma sensação realmente muito desagradável de fraqueza néofita. (:-S)
Quando reiniciava o sistema me dei conta de que esta situação, devido a um suposto problema no driver da placa de rede, tem ocorrido com uma frequência acima do esperado – em torno de uma a cada duas semanas.
Minha falta de controle sobre a quantidade de vezes que isto ocorre também me chamou a atenção.
Quando o sistema voltou, passei a pesquisar alguma forma de registrar isto e também saber qual a origem do problema, pois devido a “natureza fechada” de certos drivers, em especial este da broadcom – Broadcom Corporation NetLink BCM5906M Fast Ethernet PCI Express (rev 02) – admito que tenho sido muito tolerante com estas falhas e admito minha negligência.
A minha investigação resultou em alguns links sobre o tema:
Tips and tools for handling Linux crashes
Projeto Linux Kernel Crash Dumps
Dealing with crash reports
Log Analyzers
Log Analysis for intrusion detection (off-topic)
Com os estes logs
cheguei a seguinte conclusão
Quando rodei o apport ele gerou o seguinte que me deu uma ÓTIMA DICA sobre o horário do problema (11:45)
marz@persimmon:~$ apport
bash: apport: comando não encontrado
marz@persimmon:~$ ap
apache2 apt
apache2ctl apt-cache
aplay apt-cdrom
aplaymidi apt-config
apm apt-extracttemplates
apm_available apt-ftparchive
apmd apt-get
apmsleep aptitude
apparmor_parser aptitude-create-state-bundle
apparmor_status aptitude-run-state-bundle
appletviewer apt-key
apport-cli apt-mark
apport-unpack apt-sortpkgs
appres apturl
apropos
marz@persimmon:~$ apport-cli*** The program “evolution” closed unexpectedly on 2008-04-23 at 11:45:11.
If you were not doing anything confidential (entering passwords or other
private information), you can help to improve the application by reporting
the problem.What would you like to do? Your options are:
R: Report Problem…
I: Cancel and ignore future crashes of this program version
C: Cancel
Please choose (R/I/C): c
Na segunda vez que rodei o apport ele não gerou mais nada. (armadilhas…)
Daí eu rodei o apport-cli no /var/crash que havia encontrado, e ele gerou a seguinte saída: apport-cli log
que não me trouxe nenhuma informação útil.
O problema ficou evidente ao analisar o kern.log e damon.log por volta do horário apresentado pelo apport.log.
- Eu fiz uma inserção do PEN-DRIVE e o retirei em seguida
- Esta operação aparentemente gerou algum atraso no bus, que
- Causou alguma disfunção na placa de rede Wireless
- Nesta situação, sem rede, o evolution deve ter se enrolado nas suas várias threads e entrou num deadlock, causando a parada do sistema de rede
Abaixo as comparações:
Hora do problema: apport-cli
*** The program "evolution" closed unexpectedly on 2008-04-23 at 11:45:11.
Inserção do pendrive (a rede começou a soluçar)
Apr 23 11:12:13 persimmon hald: mounted /dev/sdb1 on behalf of uid 1000
Apr 23 11:18:56 persimmon vmnet-dhcpd: DHCPREQUEST for 172.16.47.129 from 00:0c:29:a8:36:89 via vmnet8
Apr 23 11:18:56 persimmon vmnet-dhcpd: DHCPACK on 172.16.47.129 to 00:0c:29:a8:36:89 via vmnet8
Retirada do pendrive (e o soluço continua)
Apr 23 11:20:53 persimmon hald: unmounted /dev/sdb1 from '/media/PEN_MARZ' on behalf of uid 1000
Apr 23 11:20:56 persimmon NetworkManager: <debug> [1208960456.131519] nm_hal_device_removed(): Device removed (hal udi is '/org/freedesktop/Hal/devices/usb_device_13fe_1d00_5B750294183A_if0_scsi_host_scsi_device_lun0_scsi_generic').
Apr 23 11:20:56 persimmon NetworkManager: <debug> [1208960456.136506] nm_hal_device_removed(): Device removed (hal udi is '/org/freedesktop/Hal/devices/usb_device_13fe_1d00_5B750294183A_if0_scsi_host_scsi_device_lun0').
Apr 23 11:20:56 persimmon NetworkManager: <debug> [1208960456.136999] nm_hal_device_removed(): Device removed (hal udi is '/org/freedesktop/Hal/devices/usb_device_13fe_1d00_5B750294183A_if0_scsi_host').
Apr 23 11:20:56 persimmon NetworkManager: <debug> [1208960456.137468] nm_hal_device_removed(): Device removed (hal udi is '/org/freedesktop/Hal/devices/volume_uuid_47FC_E843').
Apr 23 11:20:56 persimmon NetworkManager: <debug> [1208960456.137933] nm_hal_device_removed(): Device removed (hal udi is '/org/freedesktop/Hal/devices/usb_device_13fe_1d00_5B750294183A_if0').
Apr 23 11:20:56 persimmon NetworkManager: <debug> [1208960456.138401] nm_hal_device_removed(): Device removed (hal udi is '/org/freedesktop/Hal/devices/usb_device_13fe_1d00_5B750294183A_usbraw').
Apr 23 11:20:56 persimmon NetworkManager: <debug> [1208960456.149947] nm_hal_device_removed(): Device removed (hal udi is '/org/freedesktop/Hal/devices/storage_serial_Kingston_DataTraveler_2_0_5B750294183A_0_0').
Apr 23 11:20:56 persimmon NetworkManager: <debug> [1208960456.153384] nm_hal_device_removed(): Device removed (hal udi is '/org/freedesktop/Hal/devices/usb_device_13fe_1d00_5B750294183A').
Apr 23 11:24:38 persimmon nmbd[11652]: [2008/04/23 11:24:38, 0] nmbd/nmbd_namequery.c:query_name_response(109)
Queda da rede
Apr 23 11:24:38 persimmon nmbd[11652]: query_name_response: Multiple (2) responses received for a query on subnet 192.168.1.110 for name SOHO<1d>.
Apr 23 11:24:38 persimmon nmbd[11652]: This response was from IP 192.168.1.100, reporting an IP address of 192.168.1.100.
Apr 23 11:24:38 persimmon nmbd[11652]: [2008/04/23 11:24:38, 0] nmbd/nmbd_namequery.c:query_name_response(109)
Apr 23 11:24:38 persimmon nmbd[11652]: query_name_response: Multiple (3) responses received for a query on subnet 192.168.1.110 for name SOHO<1d>.
Apr 23 13:33:18 persimmon NetworkManager: <info> Activation (eth0) Stage 2 of 5 (Device Configure) complete.
Apr 23 13:33:55 persimmon NetworkManager: <WARN> nm_device_802_11_wireless_get_essid(): error getting ESSID for device eth0: Resource temporarily unavailable
Tentativa de volta da rede
Apr 23 13:34:27 persimmon last message repeated 16 times
Apr 23 13:35:17 persimmon last message repeated 25 times
Apr 23 13:35:18 persimmon NetworkManager: <info> Activation (eth0/wireless): association took too long (>120s), failing activation.
Apr 23 13:35:18 persimmon NetworkManager: <info> Activation (eth0) failure scheduled...
Apr 23 13:35:19 persimmon NetworkManager: <info> Activation (eth0) failed for access point (SohoAP)
Apr 23 13:35:19 persimmon NetworkManager: <info> Activation (eth0) failed.
Apr 23 13:35:19 persimmon NetworkManager: <info> Deactivating device eth0.
Apr 23 13:35:19 persimmon NetworkManager: <WARN> nm_device_802_11_wireless_set_essid(): error setting ESSID to '' for device eth0: Resource temporarily unavailable
Apr 23 13:35:19 persimmon NetworkManager: <WARN> nm_device_802_11_wireless_set_wep_enc_key(): error setting key for device eth0: Resource temporarily unavailable
Apr 23 13:35:19 persimmon NetworkManager: <WARN> nm_device_802_11_wireless_get_mode(): error getting card mode on eth0: Resource temporarily unavailable
Apr 23 13:35:19 persimmon NetworkManager: <WARN> nm_device_802_11_wireless_set_mode(): error setting card eth0 to Infrastructure mode: Resource temporarily unavailable
Reboot forçado
Apr 23 13:35:19 persimmon NetworkManager: <WARN> nm_device_802_11_wireless_set_essid(): error setting ESSID to '' for device eth0: Resource temporarily unavailable
Apr 23 13:35:19 persimmon NetworkManager: <WARN> nm_device_802_11_wireless_set_wep_enc_key(): error setting key for device eth0: Resource temporarily unavailable
Apr 23 13:35:19 persimmon NetworkManager: <WARN> nm_device_802_11_wireless_get_mode(): error getting card mode on eth0: Resource temporarily unavailable
Apr 23 13:35:19 persimmon NetworkManager: <WARN> nm_device_802_11_wireless_set_mode(): error setting card eth0 to Infrastructure mode: Resource temporarily unavailable
Apr 23 13:41:29 persimmon vmnet-dhcpd: DHCPREQUEST for 172.16.47.129 from 00:0c:29:a8:36:89 via vmnet8
No kern.log a coisa aparece mais limpa
Apr 23 11:13:54 persimmon kernel: [99516.360000] /dev/vmmon[11030]: host clock rate change request 83 -> 1043
Apr 23 11:14:07 persimmon kernel: [99529.372000] /dev/vmmon[11030]: host clock rate change request 1043 -> 83
Apr 23 11:20:56 persimmon kernel: [99937.600000] usb 7-4: USB disconnect, address 6
Apr 23 13:32:53 persimmon kernel: [107854.820000] ipw3945: Error sending cmd #07 to daemon: time out after 500ms.
Apr 23 13:32:54 persimmon kernel: [107856.416000] ipw3945: Error sending SCAN_ABORT_CMD: time out after 500ms.
Apr 23 13:32:55 persimmon kernel: [107856.916000] ipw3945: Error sending cmd #08 to daemon: time out after 500ms.
Apr 23 13:32:55 persimmon kernel: [107857.416000] ipw3945: Error sending ADD_STA: time out after 500ms.
Apr 23 13:32:56 persimmon kernel: [107857.916000] ipw3945: Error sending SCAN_ABORT_CMD: time out after 500ms.
Apr 23 13:32:59 persimmon kernel: [107861.484000] ipw3945: Error sending LEDS_CMD: time out after 500ms.
Apr 23 13:33:18 persimmon kernel: [107880.060000] bridge-eth0: disabling the bridge
Apr 23 13:33:18 persimmon kernel: [107880.160000] bridge-eth0: down
Apr 23 13:33:18 persimmon kernel: [107880.160000] ADDRCONF(NETDEV_UP): eth0: link is not ready
Apr 23 13:33:18 persimmon kernel: [107880.160000] bridge-eth0: enabling the bridge
Apr 23 13:33:18 persimmon kernel: [107880.160000] bridge-eth0: up
Apr 23 13:33:55 persimmon kernel: [107916.916000] ipw3945: Error sending LEDS_CMD: time out after 500ms.
Apr 23 13:33:55 persimmon kernel: [107917.416000] ipw3945: Error sending LEDS_CMD: time out after 500ms.
Apr 23 13:35:20 persimmon kernel: [108001.648000] bridge-eth0: disabling the bridge
Apr 23 13:35:20 persimmon kernel: [108001.660000] bridge-eth0: down
E o syslog passa a régua.
Apr 23 13:34:29 persimmon last message repeated 16 times
Apr 23 13:35:17 persimmon last message repeated 24 times
Apr 23 13:35:18 persimmon NetworkManager: <info> Activation (eth0/wireless): association took too long (>120s), failing activation.
Apr 23 13:35:18 persimmon NetworkManager: <info> Activation (eth0) failure scheduled...
Apr 23 13:35:19 persimmon NetworkManager: <info> Activation (eth0) failed for access point (SohoAP)
Apr 23 13:35:19 persimmon NetworkManager: <info> Activation (eth0) failed.
Apr 23 13:35:19 persimmon NetworkManager: <info> Deactivating device eth0.
Apr 23 13:35:19 persimmon NetworkManager: <WARN> nm_device_802_11_wireless_set_essid(): error setting ESSID to '' for device eth0: Resource temporarily unavailable
Apr 23 13:35:19 persimmon NetworkManager: <WARN> nm_device_802_11_wireless_set_wep_enc_key(): error setting key for device eth0: Resource temporarily unavailable
Apr 23 13:35:19 persimmon NetworkManager: <WARN> nm_device_802_11_wireless_get_mode(): error getting card mode on eth0: Resource temporarily unavailable
Apr 23 13:35:19 persimmon NetworkManager: <WARN> nm_device_802_11_wireless_set_mode(): error setting card eth0 to Infrastructure mode: Resource temporarily unavailable
Apr 23 13:35:20 persimmon kernel: [108001.648000] bridge-eth0: disabling the bridge
Apr 23 13:35:20 persimmon kernel: [108001.660000] bridge-eth0: down
Apr 23 13:41:29 persimmon vmnet-dhcpd: DHCPREQUEST for 172.16.47.129 from 00:0c:29:a8:36:89 via vmnet8
Conclusão: os LOGs devem ser levados em consideração em análises quando o sistema se comporta de maneira estranha. E eu apesar de ter entendido o fato ocorrido – guardadas as devidas irrelevâncias – continuo com um “mico” na mão, pois a fragilidade do sistema parece vir de várias partes do sistema:
- HAL
- Usb
- Rede
- Evolution
Alguma ajuda ou idéia ?
Nenhum comentário ainda.



