Wednesday, 3 April 2013
arjan has set the subject to: Zotonic - the Erlang Content Management Framework
[14:54:55] <Marc Worrell> free virtual Windows machines with IE10 for testing
[14:54:56] <Marc Worrell> http://www.modern.ie/en-US/virtualization-tools
[14:55:05] <Marc Worrell> (from Microsoft, fully licensed OS)
[17:17:19] <Maas> funny..
[20:07:03] <simon.smithies> Just logged a case about my 'restarting' problem
[20:09:08] <simon.smithies> I thought the issue had gone away on my vps but it hasn't ... this is a test server ... if it would help solve the problem, I could give someone ssh access to it
[20:46:49] <Andreas Stenius> looking at the logs..
[20:49:44] <simon.smithies> :)
[20:51:24] <Andreas Stenius> if you don't mind me looking, I've got some time now, if you want me to log on to your server..
[20:53:16] <Andreas Stenius> Just reading up on how to share a ssh session.. so you can follow what I'd do... ;)
[20:53:44] <simon.smithies> sorry - just grabbing breakfast
[20:53:58] <Andreas Stenius> np :)
[20:53:59] <simon.smithies> would be great to watch
[20:54:58] <Andreas Stenius> you have my e-mail, right? or pm the details to me..
[20:56:27] <simon.smithies> you have a gtalk account?
[20:56:40] <Andreas Stenius> yep
[20:57:13] <Andreas Stenius> otherwise mail at git@astekk.se..
[21:49:26] <Andreas Stenius> marc? instead of "chatting" on github... :p
[21:54:45] <simon.smithies> I think it restarts regularly, but crashes (on restart) just occasionally
[21:55:04] <Andreas Stenius> frustrating that it's not telling why it restarts..
[21:55:19] <simon.smithies> so console shows restarts every 5 mins, but crash.log only has occasional entries
[21:55:21] <Marc Worrell> can you log in the supervisors?
[21:55:40] <Andreas Stenius> do you mean add code and recompile?
[21:55:41] <Marc Worrell> or z_supervisors - it should show when a child stops
[21:55:47] <Marc Worrell> worst case - yes
[21:55:56] <Andreas Stenius> I can do it (have access)
[21:56:03] <Marc Worrell> or start the debugger
[21:56:09] <Andreas Stenius> no gui :/
[21:56:13] <Marc Worrell> :p
[21:56:19] <Marc Worrell> sprinkle lager calls
[21:56:23] <Andreas Stenius> haven't redirected the display :p
[21:57:05] <Marc Worrell> we need to know if it is just killed (-9 'ed) or shut down nicely
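One way to tell a hard kill from a clean shutdown, sketched in shell under stated assumptions: when a shell waits on its own child, an exit status above 128 conventionally means the child was terminated by signal (status - 128), so 137 points at `kill -9`. `describe_exit` is a hypothetical helper and `sleep 30` merely stands in for the real VM; the approach only applies if the process is started from a wrapper that can `wait` on it.

```shell
#!/bin/sh
# Hypothetical helper: describe how a child ended, given the status from `wait`.
describe_exit() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "clean exit"
    elif [ "$code" -gt 128 ]; then
        # Shell convention: 128 + signal number (a program could also
        # return such a code itself, so treat this as a strong hint only).
        echo "killed by signal $((code - 128))"
    else
        echo "exited with error $code"
    fi
}

# Demo: start a stand-in process, kill it with -9, and inspect the status.
sleep 30 &
pid=$!
kill -9 "$pid"
code=0
wait "$pid" || code=$?
describe_exit "$code"
```

A status of 143 (128 + 15) would instead suggest a plain `kill` (SIGTERM), which the process had a chance to handle.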
[21:57:37] <Andreas Stenius> I'd like to see all incoming connections..
[21:57:41] <Andreas Stenius> isn't that logged?
[21:57:54] <Marc Worrell> there are logger calls
[21:58:08] <Marc Worrell> but the access log is written after the request
[21:58:16] <Marc Worrell> would be nice to have something at the start :p
[21:58:23] <Andreas Stenius> there has to be some outside thing affecting the system.. or we would've seen this
[21:58:24] <Andreas Stenius> yeah
[21:58:27] <Marc Worrell> the z_logger can log better
[21:58:50] <Andreas Stenius> we need better transparency :)
[21:58:52] <Marc Worrell> this is a good exercise to add this kind of instrumentation
[21:58:58] <Andreas Stenius> yep
[21:59:08] <Marc Worrell> more logging - live views etc etc
[22:00:21] <Marc Worrell> i also want to add some kind of debugging to the templates & translations
[22:00:40] <Marc Worrell> "where does this come from" is the kind of question I regularly ask myself
[22:03:12] <Andreas Stenius> on restart there was nothing in sasl nor crash log. console log simply logs a normal startup... and within less than 5 minutes of initial boot after a rebuild...
[22:03:18] <Marc Worrell> are there any entries in the system logs?
[22:03:19] <Andreas Stenius> will add some debug logging...
[22:03:32] <Andreas Stenius> uhm... will check
[22:03:55] <Marc Worrell> it smells like something external - as I would expect other errors if it were internal
[22:04:33] <Arjan> Andreas Stenius: did you look in other log files on the system?
[22:04:34] <Andreas Stenius> indeed. access denied to /var/log/syslog
[22:04:35] <Arjan> /var/log/*
[22:04:46] <Arjan> hmm
[22:07:27] <Andreas Stenius> hmm.. how does heart monitor erlang?
[22:07:54] <Andreas Stenius> could there be something in the system preventing heart from detecting that it is alive, so it restarts it every 5 minutes?
[22:07:55] <Marc Worrell> via the pid
[22:08:08] <Marc Worrell> try starting without heart?
[22:08:28] <Andreas Stenius> +1
[22:09:00] <Andreas Stenius> well, the pid file looks ok
[22:09:13] <Marc Worrell> default heart beat is 60 seconds
[22:09:26] <Arjan> http://erlang.org/doc/man/heart.html
[22:09:31] <Arjan> it does not look at the pid file iirc
[22:09:34] <Maas> heart and beam.smp look at each other. when you kill heart it is restarted by beam.smp
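For reference, the heart manual linked above describes two environment variables, `HEART_BEAT_TIMEOUT` and `HEART_COMMAND`. A minimal sketch of setting them before starting the VM with `-heart`; the command path is a placeholder, the value 120 is only an example, and Zotonic's own start script may set these differently:

```shell
#!/bin/sh
# Seconds heart waits for a heartbeat before acting; 60 is the documented default.
# Raising it is one (assumed) way to test whether slow heartbeats cause the restarts.
export HEART_BEAT_TIMEOUT=120

# Command heart runs to bring the system back up (placeholder path).
export HEART_COMMAND="/path/to/zotonic-start-script"

echo "heart timeout: ${HEART_BEAT_TIMEOUT}s, restart command: ${HEART_COMMAND}"

# The VM would then be started with heart enabled, e.g.:
# erl -heart ...
```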
[22:09:45] <simon.smithies> @Andreas your user can sudo ... if you need to tail syslog
[22:13:05] <Andreas Stenius> oh, ok
[22:13:11] <Andreas Stenius> didn't even try that :p
[22:15:51] <Andreas Stenius> didn't see anything in the system logs.. sendmail is run every 10 minutes...
[22:16:23] <Maas> Looking at the crash log I see echild. That is from unix ECHILD errno I guess. Why does it error like that? Anyhow, it sounds pretty low level.
[22:16:41] <Andreas Stenius> yeah, I found that strange too
[22:17:07] <Andreas Stenius> and from a gen_server init_ack too... from webmachine_mochiweb, it seems...
[22:19:09] <Maas> Is there anything particularly interesting about this machine?
[22:19:11] <Arjan> ECHILD -- The wait() or waitpid() function tried to wait for a child process to exit, but all children have already exited.
[22:20:11] <Marc Worrell> could happen when stuff is closed?
[22:22:41] <Andreas Stenius> for future reference... to tag along a ssh session, start with: `screen -d -R <session name>`, then others can join by `screen -x <session name>`. ;)
[22:23:21] <Andreas Stenius> hmmm starting takes quite a long time...
[22:24:27] <simon.smithies> yep ... I thought that was due to a change in 0.9
[22:24:56] <simon.smithies> sometimes git is slow delivering the deps
[22:25:06] <Andreas Stenius> no, shouldn't have to be...
[22:25:07] <Andreas Stenius> aha
[22:25:22] <Andreas Stenius> but when you already have them... it shouldn't take this long..
[22:26:54] <Maas> Does ulimit -a reveal an interesting constraint on this machine maybe?
[22:27:17] <Andreas Stenius> zotonic@vps783:~/zotonic$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 16382
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
[22:28:43] <Marc Worrell> really need to add lager log calls to see if the system is shutting down nicely or just killed from mid-air
[22:28:49] <Marc Worrell> did running without heart help?
[22:28:50] <Maas> doesn't look like it, no max user processes. just checking
[22:29:06] <Andreas Stenius> watching that now..
[22:29:25] <Andreas Stenius> didn't find any way to start without heart, so I tweaked a copy of the start script
[22:29:36] <Andreas Stenius> (don't want to use debug, the eshell is funky)
[22:30:03] <simon.smithies> I have to leave you guys to it. Will be back online when I get to the office.
[22:30:09] <Andreas Stenius> feels like eshell and screen don't play nice or something..
[22:30:22] <Andreas Stenius> I'm headed to bed in a few...
[22:30:36] <simon.smithies> ok - thanks for looking into this!
[22:30:56] <Andreas Stenius> Simon, sorry I wasn't of more help. I guess we'll have to add more instrumentation to be able to spot what's going on
[22:31:10] <Andreas Stenius> I can try 0.9 real quick, see if it's more stable...
[22:31:24] <simon.smithies> please - good idea
[22:31:36] <simon.smithies> I'll hang around for that
[22:33:16] <Andreas Stenius> just delaying a minute to see if we get the 5 minute restart first...
[22:34:25] <Andreas Stenius> you know what, it seems like running without heart helped...
[22:35:01] <Andreas Stenius> that is interesting to know, now I can switch you back to 0.9
[22:42:11] <Andreas Stenius> oh, bugger... there seem to be some leftovers from 0.10-dev now...
[22:42:19] <simon.smithies> :(
[22:43:41] <Marc Worrell> sometimes there are some leftovers in deps
[22:43:57] <Andreas Stenius> I cleaned the deps dir... oh, I didn't clean ebin
[22:44:01] <Marc Worrell> if heart was not working then maybe we can dig further there?
[22:44:34] <Andreas Stenius> yep.
[22:44:39] <Andreas Stenius> but I need to go to bed... :/
[22:47:35] <simon.smithies> understood - and I need to go to work!
[22:50:04] <Andreas Stenius> ok, I got rid of the leftovers, I'll monitor it for a while...
[22:50:46] <simon.smithies> not too long - you need your beauty sleep! ;)
[22:50:53] <Andreas Stenius> :)
[22:51:11] <simon.smithies> might chat again in your morning
[22:51:22] <Andreas Stenius> sounds good :)
[22:51:23] <Marc Worrell> surely :)
[22:51:39] <simon.smithies> thanks guys
[22:51:42] <simon.smithies> cu
[22:51:45] <Andreas Stenius> cu
[22:52:45] <Andreas Stenius> ok, 0.9 is restarting for him too... going heartless...
[22:59:51] <Andreas Stenius> which seems to work *phew*
[23:00:28] <Marc Worrell> :p
[23:00:30] <Andreas Stenius> so, it definitely seems to be a heart related issue
[23:00:37] <Marc Worrell> very strange
[23:00:41] <Andreas Stenius> indeed
[23:02:44] <Andreas Stenius> could add some logging in the zotonic-start script whenever zotonic is restarted by heart..
[23:03:02] <Andreas Stenius> where does stdout go in that case? or do we need to send it to syslog or some such?
[23:03:09] <Andreas Stenius> or a file would work, of course
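The file-based variant could look roughly like the sketch below. It assumes heart is configured (via `HEART_COMMAND`) to run a wrapper like this one; `log_restart`, the `RESTART_LOG` variable, and the log path are all hypothetical names, not part of Zotonic. Writing to a file sidesteps the question of where stdout ends up.

```shell
#!/bin/sh
# Hypothetical log location; override by setting RESTART_LOG in the environment.
RESTART_LOG="${RESTART_LOG:-/tmp/zotonic-heart-restarts.log}"

# Append one timestamped line per invocation, including the wrapper's pid.
log_restart() {
    printf '%s restart requested (pid %s): %s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$$" "$*" >> "$RESTART_LOG"
}

log_restart "invoked by heart"

# After logging, a real wrapper would hand over to the normal start command.
# Placeholder only, the actual command depends on the installation:
# exec /path/to/normal-start-command
```

If syslog is preferred, the `printf` line could be swapped for a call to the standard `logger` utility, for example `logger -t zotonic "restart requested"`, on systems where it is available.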
