Discussion:
svn commit: r309400 - head/sys/dev/acpica
Oliver Pinter
2016-12-03 20:12:54 UTC
On Fri, Dec 2, 2016 at 9:21 AM, Hans Petter Selasky
Author: hselasky
Date: Fri Dec 2 08:21:08 2016
New Revision: 309400
URL: https://svnweb.freebsd.org/changeset/base/309400
Fix for endless recursion in the ACPI GPE handler during boot.
When handling a GPE ACPI interrupt object, the EcSpaceHandler()
function can be called, which checks the EC_EVENT_SCI bit and then
recurses into the EcGpeQueryHandler() function. If there are multiple GPE
events pending, the EC_EVENT_SCI bit will be set at the next call to
EcSpaceHandler(), causing it to recurse again via the
EcGpeQueryHandler() function. This leads to a slow, never-ending
recursion during boot which prevents proper system startup, because
the EC_EVENT_SCI bit never gets cleared in this scenario.
The behaviour is reproducible with the ALASKA AMI in combination with
Enter BIOS and adjust the clock one hour forward. Save and exit the
BIOS. System fails to boot due to the above mentioned bug in
EcGpeQueryHandler() which was observed recursing multiple times.
This patch adds a simple recursion guard to the EcGpeQueryHandler()
function and also adds logic to detect if new GPE events occurred
during the execution of EcGpeQueryHandler() and then loop on this
function instead of recursing.
Reviewed by: jhb
MFC after: 2 weeks
head/sys/dev/acpica/acpi_ec.c
I have seen a similar error since the latest BIOS update on my Gigabyte
H170N-Wifi board. The curious thing about this BIOS update is that, after
upgrading to this version, there is no way to roll back to an older
version.

The other weird thing is that MFCing this patch back does not help. I
get a stuck lock on the acmtx mutex, as you
can see from the attached log. The other interesting thing is the ACPI
error at boot time:

[1] ACPI Error: Mutex [0x0] is not acquired, cannot release (20160527/utmutex-386)
[1] ACPI Error: Could not release AML Interpreter mutex (20160527/exutils-147)
[1] ACPI Error: Mutex [0x0] is not acquired, cannot release (20160527/utmutex-386)
[1] ACPI Error: Could not release AML Interpreter mutex (20160527/exutils-147)
[1] cpu1: <ACPI CPU> on acpi0
[1] ACPI Error: Mutex [0x0] is not acquired, cannot release (20160527/utmutex-386)
[1] ACPI Error: Could not release AML Interpreter mutex (20160527/exutils-147)
[1] ACPI Error: Mutex [0x0] is not acquired, cannot release (20160527/utmutex-386)
[1] ACPI Error: Could not release AML Interpreter mutex (20160527/exutils-147)

(This error is on 10-STABLE.)
Modified: head/sys/dev/acpica/acpi_ec.c
==============================================================================
--- head/sys/dev/acpica/acpi_ec.c Fri Dec 2 08:15:52 2016 (r309399)
+++ head/sys/dev/acpica/acpi_ec.c Fri Dec 2 08:21:08 2016 (r309400)
@@ -613,16 +613,14 @@ EcCheckStatus(struct acpi_ec_softc *sc,
 }
 
 static void
-EcGpeQueryHandler(void *Context)
+EcGpeQueryHandlerSub(struct acpi_ec_softc *sc)
 {
-    struct acpi_ec_softc *sc = (struct acpi_ec_softc *)Context;
     UINT8 Data;
     ACPI_STATUS Status;
     int retry;
     char qxx[5];
 
     ACPI_FUNCTION_TRACE((char *)(uintptr_t)__func__);
-    KASSERT(Context != NULL, ("EcGpeQueryHandler called with NULL"));
 
     /* Serialize user access with EcSpaceHandler(). */
     Status = EcLock(sc);
@@ -647,7 +645,6 @@ EcGpeQueryHandler(void *Context)
             EC_EVENT_INPUT_BUFFER_EMPTY)))
             break;
     }
-    sc->ec_sci_pend = FALSE;
     if (ACPI_FAILURE(Status)) {
         EcUnlock(sc);
         device_printf(sc->ec_dev, "GPE query failed: %s\n",
@@ -678,6 +675,29 @@ EcGpeQueryHandler(void *Context)
     }
 }
 
+static void
+EcGpeQueryHandler(void *Context)
+{
+    struct acpi_ec_softc *sc = (struct acpi_ec_softc *)Context;
+    int pending;
+
+    KASSERT(Context != NULL, ("EcGpeQueryHandler called with NULL"));
+
+    do {
+        /* Read the current pending count */
+        pending = atomic_load_acq_int(&sc->ec_sci_pend);
+
+        /* Call GPE handler function */
+        EcGpeQueryHandlerSub(sc);
+
+        /*
+         * Try to reset the pending count to zero. If this fails we
+         * know another GPE event has occurred while handling the
+         * current GPE event and need to loop.
+         */
+    } while (!atomic_cmpset_int(&sc->ec_sci_pend, pending, 0));
+}
+
 /*
  * The GPE handler is called when IBE/OBF or SCI events occur. We are
  * called from an unknown lock context.
@@ -706,13 +726,14 @@ EcGpeHandler(ACPI_HANDLE GpeDevice, UINT
      * It will run the query and _Qxx method later, under the lock.
      */
     EcStatus = EC_GET_CSR(sc);
-    if ((EcStatus & EC_EVENT_SCI) && !sc->ec_sci_pend) {
+    if ((EcStatus & EC_EVENT_SCI) &&
+        atomic_fetchadd_int(&sc->ec_sci_pend, 1) == 0) {
         CTR0(KTR_ACPI, "ec gpe queueing query handler");
         Status = AcpiOsExecute(OSL_GPE_HANDLER, EcGpeQueryHandler, Context);
-        if (ACPI_SUCCESS(Status))
-            sc->ec_sci_pend = TRUE;
-        else
+        if (ACPI_FAILURE(Status)) {
             printf("EcGpeHandler: queuing GPE query handler failed\n");
+            atomic_store_rel_int(&sc->ec_sci_pend, 0);
+        }
     }
     return (ACPI_REENABLE_GPE);
 }
@@ -759,7 +780,8 @@ EcSpaceHandler(UINT32 Function, ACPI_PHY
      * we call it directly here since our thread taskq is not active yet.
      */
     if (cold || rebooting || sc->ec_suspending) {
-        if ((EC_GET_CSR(sc) & EC_EVENT_SCI)) {
+        if ((EC_GET_CSR(sc) & EC_EVENT_SCI) &&
+            atomic_fetchadd_int(&sc->ec_sci_pend, 1) == 0) {
             CTR0(KTR_ACPI, "ec running gpe handler directly");
             EcGpeQueryHandler(sc);
         }
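
The guard-and-loop pattern used by the patch can be modeled outside the kernel. What follows is only a minimal userspace sketch, assuming C11 <stdatomic.h> in place of FreeBSD's atomic(9) primitives. The names struct ec_model and ec_model_* are hypothetical, invented for illustration, and are not part of acpi_ec.c.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver softc; only the counter is modeled. */
struct ec_model {
    atomic_int sci_pend;  /* SCI events noticed but not yet drained */
    int queries_run;      /* how many times the query body executed */
    int inject;           /* events simulated as arriving during a query */
    int nested_starts;    /* nested notifications that (wrongly) won */
};

/*
 * Models the "atomic_fetchadd_int(...) == 0" test: only the caller that
 * moves the counter from 0 to 1 may start the query handler.
 */
static bool
ec_model_notify(struct ec_model *m)
{
    return atomic_fetch_add(&m->sci_pend, 1) == 0;
}

/* Stand-in for EcGpeQueryHandlerSub(): a new event may show up mid-query. */
static void
ec_model_query_sub(struct ec_model *m)
{
    m->queries_run++;
    if (m->inject > 0) {
        m->inject--;
        /* The counter is already non-zero, so this must not win. */
        if (ec_model_notify(m))
            m->nested_starts++;
    }
}

/*
 * Models the do/while loop: rerun the query until the counter can be
 * reset to zero without having changed underneath us. Unlike FreeBSD's
 * atomic_cmpset_int(), the C11 call writes the observed value back into
 * 'pending' on failure; the loop reloads it anyway.
 */
static void
ec_model_query(struct ec_model *m)
{
    int pending;

    do {
        pending = atomic_load(&m->sci_pend);
        ec_model_query_sub(m);
    } while (!atomic_compare_exchange_strong(&m->sci_pend, &pending, 0));
}
```

In this model, a handler started with two injected events runs the query body three times in a loop, never starts a nested handler, and leaves the counter at zero so that the next notification can start a fresh handler.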
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/svn-src-head
Oliver Pinter
2016-12-03 20:56:35 UTC
After backporting the latest ACPICA update to 10-STABLE together with this
patch, the issue was reduced to this warning message:

[1] acpi0: Power Button (fixed)
[1] ACPI Error: Method parse/execution failed [\134_SB.PCI0.IOTR._CRS] (Node 0xfffff80006592f00), AE_AML_NO_RESOURCE_END_TAG (20161117/psparse-560)
[1] ACPI Error: Method execution failed [\134_SB.PCI0.IOTR._CRS] (Node 0xfffff80006592f00), AE_AML_NO_RESOURCE_END_TAG (20161117/uteval-111)
[1] can't fetch resources for \134_SB_.PCI0.IOTR - AE_AML_NO_RESOURCE_END_TAG

but the lockup is gone. ;)


[trim]
Oliver Pinter
2016-12-03 21:10:58 UTC
Attached are the two backports.
CC: ACPI and AMD64
Moore, Robert
2016-12-06 22:01:55 UTC
Post by Oliver Pinter
[1] acpi0: Power Button (fixed)
[1] ACPI Error: Method parse/execution failed [\134_SB.PCI0.IOTR._CRS] (Node 0xfffff80006592f00), AE_AML_NO_RESOURCE_END_TAG (20161117/psparse-560)
[1] ACPI Error: Method execution failed [\134_SB.PCI0.IOTR._CRS] (Node 0xfffff80006592f00), AE_AML_NO_RESOURCE_END_TAG (20161117/uteval-111)
[1] can't fetch resources for \134_SB_.PCI0.IOTR - AE_AML_NO_RESOURCE_END_TAG
[Moore, Robert]

This is a regression in 20161117. It has been fixed, and the fix will be released in about 2 weeks.
Bob
Oliver Pinter
2016-12-06 22:48:14 UTC
Hi Bob,

For me the original issue was different. Querying the ACPI-related
sysctls deadlocked, and backporting the 20161117 ACPICA fixed that
issue but introduced a new one (probably the one you mentioned).

Btw, thanks for the info about the expected new release.

Oliver