5tan/cxxmidi 21

C++ MIDI library

5tan/ESPAsyncWebServer 0

Async Web Server for ESP8266 and ESP32

5tan/pomodoro 0

Pomodoro timer plugin for Xfce4 panel.

issue comment esphome/issues

Weird ESC character in Web Server GUI logs

How to reopen?

5tan

comment created 23 days ago

issue comment esphome/issues

Double log messages in Web Server GUI

How to reopen?

5tan

comment created 23 days ago

issue comment esphome/issues

Scheduler deadlock

@oxan Thank you for your concern. I forgot to mention that I have downgraded to esphome==1.20.4 (which seems to perform better, at least for my application, in terms of the number of requests it can handle). If the fix was applied, it must have been in a later version. Since I applied the fix and worked around the race condition (described above), my application has been running stably for 19h already :tada:

5tan

comment created a month ago

fork 5tan/ESPAsyncWebServer

Async Web Server for ESP8266 and ESP32

forked a month ago

issue comment esphome/issues

Scheduler deadlock

The issue was already reported: https://github.com/Depau/ESPAsyncWebServer/tree/wi-se-patches

5tan

comment created a month ago

issue comment esphome/issues

Scheduler deadlock

I have fixed the issue by adding a recursive mutex to Scheduler and putting a lock on each of its public methods :tada: No more scheduler deadlocks! Thanks for your support @OttoWinter !
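
For reference, here is a minimal sketch of the kind of change I mean (the class layout, member names, and method set are illustrative, not the actual esphome code):

#include <functional>
#include <mutex>
#include <vector>

class Scheduler {
 public:
  void set_timeout(std::function<void()> fn) {
    // A recursive mutex is needed because a callback running inside
    // call() may itself call set_timeout(); a plain std::mutex would
    // self-deadlock in that case.
    std::lock_guard<std::recursive_mutex> guard(lock_);
    to_add_.push_back(std::move(fn));
  }

  void call() {
    std::lock_guard<std::recursive_mutex> guard(lock_);
    std::vector<std::function<void()>> due;
    due.swap(to_add_);  // run a snapshot; callbacks may safely re-enter set_timeout()
    for (auto &fn : due)
      fn();
  }

 private:
  std::recursive_mutex lock_;
  std::vector<std::function<void()>> to_add_;
};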

Unfortunately I still can't say that esphome works stably. After ~1h of bombarding it with requests, I got the following crash:

[13:56:09]Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.
[13:56:09]Core 0 register dump:
[13:56:09]PC      : 0x400e71be  PS      : 0x00060030  A0      : 0x800e72c8  A1      : 0x3ffd5380  
[13:56:10]A2      : 0x3ffd6980  A3      : 0x3ffd5a10  A4      : 0xbaad5678  A5      : 0x3ffc6f5c  
[13:56:10]A6      : 0x3ffd6a14  A7      : 0x3ffd6990  A8      : 0x800e721d  A9      : 0x3ffd5360  
[13:56:10]A10     : 0x3ffd5a10  A11     : 0x3ffd5a10  A12     : 0x3ffc6f5c  A13     : 0x3ffd602e  
[13:56:10]A14     : 0x0000000d  A15     : 0x00000000  SAR     : 0x00000008  EXCCAUSE: 0x0000001c  
[13:56:10]EXCVADDR: 0xbaad5678  LBEG    : 0x400014fd  LEND    : 0x4000150d  LCOUNT  : 0xfffffffe  
[13:56:10]
[13:56:10]ELF file SHA256: 0000000000000000
[13:56:10]
[13:56:10]Backtrace: 0x400e71be:0x3ffd5380 0x400e72c5:0x3ffd53b0 0x400e73c1:0x3ffd53f0 0x400e7615:0x3ffd5440 0x40163dd9:0x3ffd5460 0x40163e55:0x3ffd54a0 0x40164502:0x3ffd54c0 0x4008a5ba:0x3ffd54f0
WARNING Found stack trace! Trying to decode it
WARNING Decoded 0x400e71be: String::c_str() const at /home/z00lwix/my_test/my_test/.piolibdeps/my_test/ESPAsyncWebServer-esphome/src/WebRequest.cpp:797
 (inlined by) AsyncWebServerRequest::_removeNotInterestingHeaders() at /home/z00lwix/my_test/my_test/.piolibdeps/my_test/ESPAsyncWebServer-esphome/src/WebRequest.cpp:184
WARNING Decoded 0x400e72c5: AsyncWebServerRequest::_parseLine() at /home/z00lwix/my_test/my_test/.piolibdeps/my_test/ESPAsyncWebServer-esphome/src/WebRequest.cpp:797
WARNING Decoded 0x400e73c1: AsyncWebServerRequest::_onData(void*, unsigned int) at /home/z00lwix/my_test/my_test/.piolibdeps/my_test/ESPAsyncWebServer-esphome/src/WebRequest.cpp:797
WARNING Decoded 0x400e7615: std::_Function_handler<void (void*, AsyncClient*, void*, unsigned int), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*, void*, unsigned int)#8}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&, std::_Any_data const&, unsigned int&&) at /home/z00lwix/my_test/my_test/.piolibdeps/my_test/ESPAsyncWebServer-esphome/src/WebRequest.cpp:797
 (inlined by) _M_invoke at /home/z00lwix/.platformio/packages/toolchain-xtensa32/xtensa-esp32-elf/include/c++/5.2.0/functional:1871
WARNING Decoded 0x40163dd9: std::function<void (void*, AsyncClient*, void*, unsigned int)>::operator()(void*, AsyncClient*, void*, unsigned int) const at /home/z00lwix/my_test/my_test/.piolibdeps/my_test/AsyncTCP-esphome/src/AsyncTCP.cpp:1136
 (inlined by) AsyncClient::_recv(tcp_pcb*, pbuf*, signed char) at /home/z00lwix/my_test/my_test/.piolibdeps/my_test/AsyncTCP-esphome/src/AsyncTCP.cpp:951
WARNING Decoded 0x40163e55: AsyncClient::_s_recv(void*, tcp_pcb*, pbuf*, signed char) at /home/z00lwix/my_test/my_test/.piolibdeps/my_test/AsyncTCP-esphome/src/AsyncTCP.cpp:1136
WARNING Decoded 0x40164502: _async_service_task(void*) at /home/z00lwix/my_test/my_test/.piolibdeps/my_test/AsyncTCP-esphome/src/AsyncTCP.cpp:1136
 (inlined by) _async_service_task at /home/z00lwix/my_test/my_test/.piolibdeps/my_test/AsyncTCP-esphome/src/AsyncTCP.cpp:197
WARNING Decoded 0x4008a5ba: vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c:355 (discriminator 1)

@OttoWinter what exactly does "LoadProhibited" mean? Does it mean the ESP32 ran out of memory? Or does it mean a "segmentation fault"/invalid pointer? (If the latter, I guess I should open a separate issue.)

5tan

comment created a month ago

issue comment esphome/issues

Scheduler deadlock

If the underlying library can't easily be configured to work in a single thread, I thought I could simply put a mutex lock on WebServer::handleRequest to make the web server component handle requests synchronously. I have tested this idea and unfortunately it didn't solve the issue: the premature scheduler major increment still takes place. I think this refutes the hypothesis that the race condition comes from the web server.
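
What I tested looks roughly like this (a sketch only; the real WebServer::handleRequest signature differs, and handle_mutex_ is an illustrative name):

#include <mutex>

class AsyncWebServerRequest;  // from ESPAsyncWebServer

class WebServer {
 public:
  void handleRequest(AsyncWebServerRequest *request) {
    // Serialize all requests: only one handler body runs at a time,
    // even if the async TCP task invokes handlers concurrently.
    std::lock_guard<std::mutex> guard(handle_mutex_);
    // ... original handler body ...
  }

 private:
  std::mutex handle_mutex_;
};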

5tan

comment created a month ago

issue comment esphome/issues

Scheduler deadlock

This hypothesis seems consistent with my observation that the deadlock doesn't happen on ESP8266/sonoff_sv. Is it possible to make the web server synchronous/single-threaded on ESP32?

5tan

comment created a month ago

issue comment esphome/issues

Scheduler deadlock

@OttoWinter what I have observed with #define ESPHOME_DEBUG_SCHEDULER enabled is that the scheduler deadlock always happens after millis_major_ gets incremented here.

e.g.:

[15:20:23][I][main:127]: hello from my script :)
[15:20:23][I][main:127]: hello from my script :)
[15:20:23][I][main:127]: hello from my script :)
[15:20:23][I][main:127]: hello from my script :)
[15:20:23][I][main:127]: hello from my script :)
[15:20:23][I][main:127]: hello from my script :)
[15:20:23][I][main:127]: hello from my script :)
[15:20:24][I][main:127]: hello from my script :)
[15:20:24][I][main:127]: hello from my script :)
[15:20:24][I][main:127]: hello from my script :)
[15:20:24][I][main:127]: hello from my script :)
[15:20:24][I][main:127]: hello from my script :)
[15:20:24][I][scheduler:235]: Incrementing scheduler major
[15:20:24][I][main:127]: hello from my script :)
[15:20:24][I][main:083]: hello from interval!
[15:20:25][I][main:083]: hello from interval!
[15:20:25][I][scheduler:095]: Items: count=40, now=239057
[15:20:25][I][scheduler:101]:   interval 'update' interval=3000 last_execution=236532 (0) next=239532 (0)
[15:20:25][I][scheduler:101]:   interval 'update' interval=1000 last_execution=238704 (0) next=239704 (0)
[15:20:25][I][scheduler:101]:   interval 'update' interval=1000 last_execution=238815 (0) next=239815 (0)
[15:20:25][I][scheduler:101]:   interval 'update' interval=1000 last_execution=238898 (0) next=239898 (0)
[15:20:25][I][scheduler:101]:   interval 'update' interval=1000 last_execution=238900 (0) next=239900 (0)
[15:20:25][I][scheduler:101]:   interval 'update' interval=1000 last_execution=238964 (0) next=239964 (0)
[15:20:25][I][scheduler:101]:   interval 'update' interval=3000 last_execution=236976 (0) next=239976 (0)
[15:20:25][I][scheduler:101]:   interval '' interval=10000 last_execution=231436 (0) next=241436 (0)
[15:20:25][I][scheduler:101]:   interval 'update' interval=3000 last_execution=239020 (0) next=242020 (0)
[15:20:25][I][scheduler:101]:   interval 'update' interval=10000 last_execution=237455 (0) next=247455 (0)
[15:20:25][I][scheduler:101]:   interval 'update' interval=60000 last_execution=235652 (0) next=295652 (0)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=237708 (1) next=237708 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=237758 (1) next=237758 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=237810 (1) next=237810 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=237855 (1) next=237855 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=237901 (1) next=237901 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=237949 (1) next=237949 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=237996 (1) next=237996 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=238041 (1) next=238041 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=238088 (1) next=238088 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=238139 (1) next=238139 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=238189 (1) next=238189 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=238238 (1) next=238238 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=238282 (1) next=238282 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=238347 (1) next=238347 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=238394 (1) next=238394 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=238443 (1) next=238443 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=238489 (1) next=238489 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=238538 (1) next=238538 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=238584 (1) next=238584 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=238630 (1) next=238630 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=238677 (1) next=238677 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=238735 (1) next=238735 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=238781 (1) next=238781 (1)
[15:20:25][I][scheduler:101]:   timeout '' interval=0 last_execution=238835 (1) next=238835 (1)
[15:20:26][I][scheduler:101]:   timeout '' interval=0 last_execution=238882 (1) next=238882 (1)
[15:20:26][I][scheduler:101]:   timeout '' interval=0 last_execution=238937 (1) next=238937 (1)
[15:20:26][I][scheduler:101]:   timeout '' interval=0 last_execution=238986 (1) next=238986 (1)
[15:20:26][I][scheduler:101]:   timeout '' interval=0 last_execution=239040 (1) next=239040 (1)
[15:20:26][I][scheduler:101]:   timeout '' interval=5000 last_execution=237658 (1) next=242658 (1)

Now all timeout items are being added to the heap with this "major"==1 and they never get executed.

I am not exactly sure what the purpose of "major" is. Looking at this line, I thought it should be incremented only on overflow (after approx. 50 days, according to the reference). But it looks like "major" gets incremented earlier when I am bombarding esphome with REST requests. Any idea why?

Also, why are those timeout items not being executed?
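
For context, here is my current understanding of the mechanics as a sketch (assuming "major" is a wrap counter for the 32-bit millis(), which the reference suggests; the function and variable names below are mine, not esphome's):

#include <cstdint>
#include <cstdio>

// If "major" counts 32-bit millis() overflows, the effective scheduling
// time is the 64-bit value (major << 32) | millis.
uint64_t extended_time(uint8_t major, uint32_t millis) {
  return (static_cast<uint64_t>(major) << 32) | millis;
}

int main() {
  // Values taken from the log above: "now" still has major 0, but the
  // stuck timeout items were stored with major 1.
  uint64_t now = extended_time(0, 239057);
  uint64_t due = extended_time(1, 238882);
  // due - now is ~2^32 ms, i.e. ~49.7 days: the item appears to be due
  // far in the future, so it is never popped from the heap.
  std::printf("item appears due in %llu ms\n",
              static_cast<unsigned long long>(due - now));
  return 0;
}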

5tan

comment created a month ago

issue comment esphome/issues

Scheduler deadlock

I am unable to reproduce the issue with esphome==1.20.4.

5tan

comment created a month ago

issue opened esphome/issues

Scheduler deadlock

The problem

I can very easily put the scheduler into a "deadlock state" using REST requests to the web server. In this "deadlock state" the scheduler won't dispatch any new REST requests (not even a reboot request).

To reproduce the issue, please compile & run the attached YAML, then start periodic requests to the web server:

while [ true ] ; do curl http://192.168.0.15/switch/my_switch/toggle -i -X POST ; done

The scheduler deadlock will happen within a few minutes.

Here is a demo video: https://user-images.githubusercontent.com/15819543/138690910-0db82fb3-50f8-45d8-ba38-2cd142b4f402.mp4 (the deadlock happens at 02:46).


I have added the following log statement to diagnose the issue:

diff --git a/usr/local/lib/python3.8/dist-packages/esphome/core/scheduler.cpp_ b/usr/local/lib/python3.8/dist-packages/esphome/core/scheduler.cpp
index a6d3e03..7efe953 100644
--- a/usr/local/lib/python3.8/dist-packages/esphome/core/scheduler.cpp_
+++ b/usr/local/lib/python3.8/dist-packages/esphome/core/scheduler.cpp
@@ -198,6 +198,7 @@ void HOT Scheduler::process_to_add() {
 
     this->items_.push_back(std::move(it));
     std::push_heap(this->items_.begin(), this->items_.end(), SchedulerItem::cmp);
+    ESP_LOGI(TAG, "queue size is %d", this->items_.size());
   }
   this->to_add_.clear();
 }

Here is a video showing the logs: https://user-images.githubusercontent.com/15819543/138692401-e2ccda85-eb6f-4677-a0ed-f686ea2a8a5f.mp4 (the issue happens at 02:34). As you can see, no more items from the queue/heap are being dispatched/popped.

Which version of ESPHome has the issue?

2021.10.2

What type of installation are you using?

pip

Which version of Home Assistant has the issue?

N/A

What platform are you using?

ESP32

Board

nodemcu-32s / Espressif ESP32-DevKitC-32E

Component causing the issue

core/scheduler

Example YAML snippet

esphome:
  name: scheduler-esp32
  platform: ESP32
  board: nodemcu-32s

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

logger:
  level: INFO

ota:

web_server:
  port: 80

interval:
  - interval: 1s
    then:
      - logger.log:
          format: "hello from interval"
          level: INFO

globals:
  - id: my_switch_status
    type: bool
    restore_value: no
    initial_value: "false"

script:
  - id: my_script
    mode: restart
    then:
      - logger.log:
          format: "hello from my script :)"
          level: INFO

switch:
  - platform: template
    id: my_switch
    name: my_switch
    lambda: return id(my_switch_status);
    turn_on_action:
      - script.execute: my_script
      - lambda: id(my_switch_status) = true;
    turn_off_action:
      - script.execute: my_script
      - lambda: id(my_switch_status) = false;
  - platform: restart
    name: reboot

Anything in the logs that might be useful for us?

No response

Additional information

The issue is very easy to reproduce within a few minutes on ESP32, but I was unable to reproduce it on ESP8266/sonoff_sv!

created a month ago

issue comment esphome/issues

Toggling switch in Web Server GUI can trigger "Guru Meditation Error: Core 1 panic'ed (LoadProhibited)", when DEBUG log level is used and website is shown

Possibly related to https://github.com/me-no-dev/ESPAsyncWebServer/issues/932

5tan

comment created 2 months ago
