#1 2014-11-18 12:05:22

stef_204
Member
Registered: 2014-08-15
Posts: 37

[SOLVED] Fan problems, infinite loop, revs up and down, since update

Hi,

(I believe this is a kernel issue, but if this is the wrong forum, mods please feel free to move it or let me know.)

The fan on my laptop (MSI GE60, CPU Intel Core i7-3630QM, GPU Nvidia GeForce GTX 660M) is running a perfectly consistent cycle of revving very high and then immediately revving back down, approximately every 10 seconds. It is like a perfect sine wave, which starts as soon as the kernel boots and never fails to appear.

This started with a recent update (hard to pinpoint exactly, but approximately in the last 10 days).

I tried the LTS kernel (3.14.24.1), but it does not resolve the problem (which could indicate that the cause is something other than the kernel, e.g. a module?).

I have also booted my laptop from a Mandriva 2008 Live CD I had lying around, and the issue does not seem to be present using that OS.

I have googled, and searched Arch's wiki and forum, to no avail.

I have looked at https://wiki.archlinux.org/index.php/fan_speed_control and at lm-sensors to try to sort it out, but without success so far.

The sensors.conf man page ($ man sensors.conf) seems to describe either this problem or a similar one (I do not know whether the issue I am experiencing is related) in the following section:

man sensors.conf wrote:

THERMAL HYSTERESIS MECHANISM
Many monitoring chips do not handle the high and critical temperature limits as simple limits. Instead, they have two values for each limit, one which triggers an alarm when the temperature rises and another one which clears the alarm when the temperature falls. The latter is typically a few degrees below the former. This mechanism is known as hysteresis.

The reason for implementing things that way is that high temperature alarms typically trigger an action to attempt to cool the system down, either by scaling down the CPU frequency, or by kicking in an extra fan. This should normally let the temperature fall in a timely manner. If this was clearing the alarm immediately, then the system would be back to its original state where the temperature rises and the alarm would immediately trigger again, causing an undesirable tight fan on, fan off loop. The hysteresis mechanism ensures that the system is really cool before the fan stops, so that it will not have to kick in again immediately.
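To make the quoted mechanism concrete, here is a minimal Python sketch of a fan switch with and without hysteresis. It is only an illustration, not how any particular chip or the embedded controller implements it: the function names, the temperature readings, and the choice to treat the hysteresis value as an offset below the limit are all assumptions made for the example.

```python
def fan_states(temps, high, hyst):
    """Return the fan on/off state after each temperature reading.

    Assumed model: the fan turns on when the temperature reaches `high`
    and turns off only once it drops below `high - hyst`.
    """
    on = False
    states = []
    for t in temps:
        if not on and t >= high:
            on = True            # alarm triggers: start the fan
        elif on and t < high - hyst:
            on = False           # alarm clears: stop the fan
        states.append(on)
    return states


def count_switches(states):
    """Count how often the fan changes state between consecutive readings."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


# Hypothetical readings (degrees C) hovering around an 87 degree limit.
readings = [86, 87, 86, 87, 86, 87, 85, 84, 83, 87]

no_hyst = fan_states(readings, high=87, hyst=0)
with_hyst = fan_states(readings, high=87, hyst=3)

print("switches without hysteresis:", count_switches(no_hyst))
print("switches with 3 degree hysteresis:", count_switches(with_hyst))
```

With these made-up readings the fan toggles far more often when the clearing threshold equals the trigger threshold, which is the tight on/off loop the man page warns about.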

I have run sensors-detect and gone through all the steps, but nothing changed. I have also downloaded the most recent version of sensors-detect (revision 6256, 2014-11-17 09:21:25 +0100), but that made no difference either.

Running $ sensors (with the latest revision 6256) on my laptop yields very little information (which could itself indicate a problem?):

$ sensors
nouveau-pci-0100
Adapter: PCI adapter
temp1:            N/A  (high = +95.0°C, hyst =  +3.0°C)
                       (crit = +105.0°C, hyst =  +5.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)

That's the only info, which seems very little.

I looked at changing the fan divisor, but I am struggling to identify the chipset and am not even sure it would work. I'm going about this by trial and error, which is probably not the best approach.

Would greatly appreciate some guidance in troubleshooting this issue (which I am concerned might ruin the fan over time) and hopefully resolving it.

Thanks.

PS Edit: I have run sensors-detect one more time, using the sensors-detect script that ships with our distro (revision 6209), and the output of "sensors" is now back to the following (but the fan loop problem is still present):

$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +46.0°C  (high = +87.0°C, crit = +105.0°C)
Core 0:         +45.0°C  (high = +87.0°C, crit = +105.0°C)
Core 1:         +46.0°C  (high = +87.0°C, crit = +105.0°C)
Core 2:         +44.0°C  (high = +87.0°C, crit = +105.0°C)
Core 3:         +40.0°C  (high = +87.0°C, crit = +105.0°C)

nouveau-pci-0100
Adapter: PCI adapter
temp1:            N/A  (high = +95.0°C, hyst =  +3.0°C)
                       (crit = +105.0°C, hyst =  +5.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)

And here is:

$ sensors -u
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:
  temp1_input: 48.000
  temp1_max: 87.000
  temp1_crit: 105.000
  temp1_crit_alarm: 0.000
Core 0:
  temp2_input: 45.000
  temp2_max: 87.000
  temp2_crit: 105.000
  temp2_crit_alarm: 0.000
Core 1:
  temp3_input: 48.000
  temp3_max: 87.000
  temp3_crit: 105.000
  temp3_crit_alarm: 0.000
Core 2:
  temp4_input: 44.000
  temp4_max: 87.000
  temp4_crit: 105.000
  temp4_crit_alarm: 0.000
Core 3:
  temp5_input: 40.000
  temp5_max: 87.000
  temp5_crit: 105.000
  temp5_crit_alarm: 0.000

nouveau-pci-0100
Adapter: PCI adapter
temp1:
ERROR: Can't get value of subfeature temp1_input: Can't read  (<-- could this error be contributing to the problem?)
  temp1_max: 95.000
  temp1_max_hyst: 3.000
  temp1_crit: 105.000
  temp1_crit_hyst: 5.000
  temp1_emergency: 135.000
  temp1_emergency_hyst: 5.000
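
For anyone comparing outputs like the one above, here is a small Python sketch that scans pasted `sensors -u` text and lists the features whose input value could not be read. It assumes the layout shown in this post (chip name, "Adapter:" line, feature lines ending in a colon, indented subfeatures, and the ERROR line appearing inline as it does when copied from a terminal); the function name and the shortened sample are made up for illustration.

```python
def unreadable_inputs(sensors_u_text):
    """Return (chip, feature) pairs whose *_input subfeature failed to read."""
    chip = None
    feature = None
    expect_chip = True          # the first non-blank line of a block is the chip name
    failed = []
    for line in sensors_u_text.splitlines():
        if not line.strip():
            expect_chip = True  # a blank line separates chips
            feature = None
            continue
        if expect_chip:
            chip = line.strip()
            expect_chip = False
            continue
        if line.startswith("Adapter:"):
            continue
        if line.startswith("ERROR:"):
            if "_input" in line:
                failed.append((chip, feature))
            continue
        if not line.startswith(" ") and line.rstrip().endswith(":"):
            feature = line.strip().rstrip(":")  # e.g. "Core 0" or "temp1"
    return failed


# Shortened sample based on the output in this post.
sample = """\
coretemp-isa-0000
Adapter: ISA adapter
Core 0:
  temp2_input: 45.000
  temp2_max: 87.000

nouveau-pci-0100
Adapter: PCI adapter
temp1:
ERROR: Can't get value of subfeature temp1_input: Can't read
  temp1_max: 95.000
"""

print(unreadable_inputs(sample))  # prints [('nouveau-pci-0100', 'temp1')]
```

This only flags which sensor returns no reading; it says nothing about whether that sensor is involved in the fan behaviour.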

Last edited by stef_204 (2014-11-24 10:17:43)

#2 2014-11-19 11:55:27

stef_204
Member
Registered: 2014-08-15
Posts: 37

Re: [SOLVED] Fan problems, infinite loop, revs up and down, since update

Update

  1. The issue has become less consistent: at times it has appeared before the kernel boots, and at other times it has failed to appear for about 10 minutes after booting, only to come back afterwards. It is hard to know what sets it off.

  2. Upon further research, I have found many owners of the MSI GE60 laptop experiencing the same problem (there are video clips on YouTube of users documenting exactly the same issue), so this looks like a bug specific to the MSI GE60.

  3. It is related to the fan tables/temperatures in the EC (Embedded Controller), and MSI has finally come up with an updated version of the EC firmware to fix it. I did the update (of the EC only, not the full BIOS) and so far so good. I will wait a couple of days to make sure it is (and stays) fixed, then mark this thread as solved. (It now looks like the thread should have been posted to the "Laptop Issues" forum rather than Kernel.)

Last edited by stef_204 (2014-11-19 11:59:28)
