summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-06-21res_pjsip_pubsub: Address SEGV when attempting to terminate a subscriptionGeorge Joseph
Occasionally under load we'll attempt to send a final NOTIFY on a subscription that's already been terminated and a SEGV will occur down in pjproject's evsub_destroy function. This is a result of a race condition between all the paths that can generate a notify and/or destroy the underlying pjproject evsub object: * The client can send a SUBSCRIBE with Expires: 0. * The client can send a SUBSCRIBE/refresh. * The subscription timer can expire. * An extension state can change. * An MWI event can be generated. * The pjproject transaction timer (timer_b) can expire. Normally when our pubsub_on_evsub_state is called with a terminate, we push a task to the serializer and return at which point the dialog is unlocked. This is usually not a problem because the task runs immediately and locks the dialog again. When the system is heavily loaded though, there may be a delay between the unlock and relock during which another event may occur such as the subscription timer or timer_b expiring, an extension state change, etc. These may also cause a terminate to be processed and if so, we could cause pjproject to try to destroy the evsub structure twice. There's no way for us to tell that the evsub was already destroyed and the evsub's group lock can't tolerate this and SEGVs. The remedy is twofold. * A patch has been submitted to Teluu and added to the bundled pjproject which adds add/decrement operations on evsub's group lock. * In res_pjsip_pubsub: * configure.ac and pjproject-bundled's configure.m4 were updated to check for the new evsub group lock APIs. * We now add a reference to the evsub group lock when we create the subscription and remove the reference when we clean up the subscription. This prevents evsub from being destroyed before we're done with it. * A state has been added to the subscription tree structure so termination progress can be tracked through the asyncronous tasks. * The pubsub_on_evsub_state callback has been split so it's not doing double duty. It now only handles the final cleanup of the subscription tree. pubsub_on_rx_refresh now handles both client refreshes and client terminates. It was always being called for both anyway. * The serialized_on_server_timeout task was removed since serialized_pubsub_on_rx_refresh was almost identical. * Missing state checks and ao2_cleanups were added. * Some debug levels were adjusted to make seeing only off-nominal things at level 1 and nominal or progress things at level 2+. ASTERISK-26099 #close Reported-by: Ross Beer. Change-Id: I779d11802cf672a51392e62a74a1216596075ba1
2016-06-21BuildSystem: Avoid obsolete warning with HELP_STRING on autoconf.Alexander Traud
Some configure scripts used both AC_HELP_STRING and its replacement AS_HELP_STRING. For consistency and to avoid obsolete warnings, those were changed to AS_HELP_STRING. ASTERISK-26046 Change-Id: I8aad4fd2bdee40aa2a31ce3339a1eb33ff4f5b0f
2016-06-21Merge "fix: memory leaks, resource leaks, out of bounds and bugs"zuul
2016-06-20Merge "app_voicemail.c: Fix IMAP compile error."zuul
2016-06-20Merge "http: leverage 'bindaddr' for TLS in http.conf"zuul
2016-06-20app_voicemail.c: Fix IMAP compile error.Richard Mudgett
Fix compile error introduced by the patch for ASTERISK-26045 Change-Id: I5b02876266f2824f4cec2b54d6ff4db5de5778d3
2016-06-20fix: memory leaks, resource leaks, out of bounds and bugsAlexei Gradinari
ASTERISK-26119 #close Change-Id: Iecbf7d0f360a021147344c4e83ab242fd1e7512c
2016-06-20ARI: Ensure announcer channels are destroyed.Mark Michelson
Announcer channels were not being destroyed because the stasis_app_control structure that referenced them was not being destroyed. The control structure was not being destroyed because it was not being unlinked from its container. It was not being unlinked from its container because the after bridge callback for the announcer channel was not being run. The after bridge callback was not being run because the after bridge datastore was not being removed from the channel on destruction. The channel was not being destroyed because the hangup that used to destroy the channel was now only reducing the reference count to one. The reference count of the channel was only being reduced to one because the stasis_app_control structure was holding the final reference... The control structure used to not keep a reference to the channel, so that loop described above did not happen. The solution is to manually remove the control structure from its container when the playback on a bridge is complete. ASTERISK-26083 #close Reported by Joshua Colp Change-Id: I0ddc0f64484ea0016245800b409b567dfe85cfb4
2016-06-20http: leverage 'bindaddr' for TLS in http.confAlexander Traud
The internal HTTP/WebSocket server supports both TCP and TLS, which can be activated separately via the file http.conf. The source code intends to re-use the TCP parameter 'bindaddr' for TLS, even if 'tlsbindaddr' is not specified explicitly. This did not work because of a typo. This change resolves this typo. ASTERISK-26126 #close Change-Id: I5efb0409ae12044dfb3495b6b97b6d40a8c9c51f
2016-06-17Merge "Add support for OGG/Speex file format"Joshua Colp
2016-06-16Merge "chan_sip: bigger buffers for headers, better failure mode"zuul
2016-06-15res_pjsip_transport_management.c: Misc cleanups to survive shutdown.Richard Mudgett
* In unload_module(), reordered destroying things to minimize the window that the global transports container could be used by other threads on shutdown. When shutting down you need to stop things in the opposite order of creation. * Put the global transports container into an AO2_GLOBAL_OBJ_STATIC to eliminate the crash potential by other threads using the container on shutdown. * Made struct monitored_transport.sip_received not use ast_atomic_fetchadd_int() since it is used as a boolean value that is only set TRUE. It was previously incremented for every received SIP message and could theoretically overflow. * In monitored_transport_state_callback(), allocated the monitored transport object without a lock since the lock was unused. * In keepalive_global_loaded(), removed releasing the transports container if the keepalive_thread could not be started. I set it up to be tried again if the user reloads the configuration. Change-Id: I8d12d16ef564290fa6d25a32334bb5ce8fdf87ff
2016-06-14res_pjsip.c: Add check that timer actually got scheduled.Richard Mudgett
Change-Id: Iabaa2e5dccf0762c258101ea0eb1487cf6959ad1
2016-06-14Merge "res_pjsip_session.c: Reorganize ast_sip_session_terminate()."zuul
2016-06-13res_rtp_multicast.c: Fix warning message typo.Richard Mudgett
Change-Id: Ic9928208b9957e09866abe3d9649030942ec52b3
2016-06-10res_pjsip_session.c: Reorganize ast_sip_session_terminate().Richard Mudgett
Change-Id: I68a2128bcba4830985d2d441e70dfd1ac5bd712b
2016-06-10Merge "core: Not the configured but granted number of possible file ↵zuul
descriptors."
2016-06-10core: Not the configured but granted number of possible file descriptors.Alexander Traud
With CLI "core show settings", simply the parameter maxfiles of the file asterisk.conf was shown. If that parameter was not set, nothing was displayed although the environment might have set a default number itself. Or if maxfiles were not granted (completely), still maxfiles was shown. Now, the maximum number of possible file descriptors in the environment is shown. ASTERISK-26097 Change-Id: I2df5c58863b5007b34b77adbe28b885dfcdf7e0b
2016-06-10Merge "astfd: With RLIMIT_NOFILE only the current value is sensible."Joshua Colp
2016-06-10translate: Enables native Packet-Loss Concealment (PLC) for supporting codecs.Joshua Colp
This reverts commit 5bfef2a8b4674382f959b21a3b8e14cf1d942bab as it caused fax test failures. ASTERISK-25629 Change-Id: I79de974dc4f63a1cafe0d2509169fd9a6b3cbaf4
2016-06-10astfd: With RLIMIT_NOFILE only the current value is sensible.Alexander Traud
With menuselect "DEBUG_FD_LEAKS" and CLI "core show fd", both the maximum max and current max of possible file descriptors were shown. Both show the same value always. Not to confuse users, just the current maximum is shown now. ASTERISK-26097 Change-Id: I49cf7952d73aec9e3f6a88942842c39be18380fa
2016-06-09Merge "cel: Ensure only one dial status per channel exists."zuul
2016-06-09Merge "ARI: Ensure proper channel state on operations."zuul
2016-06-09Merge "test_http_media_cache: Fix failing test."zuul
2016-06-09Merge "chan_sip: Support auth username for callbackextension feature"zuul
2016-06-09Merge "res_pjsip_registrar.c: Eliminate rx REGISTER request race condition."Joshua Colp
2016-06-09Merge "stasis: Add setting subscription congestion levels."Joshua Colp
2016-06-09Merge "sorcery: Add setting object type congestion levels."Joshua Colp
2016-06-09Merge "taskprocessors: Implement high/low water mark alerts."Joshua Colp
2016-06-09Merge "res_pjsip_session: Use distributor serializer for incoming calls."Joshua Colp
2016-06-09Merge "res_pjsip_pubsub.c: Recreate subscriptions using distributor serializer."Joshua Colp
2016-06-09Merge "res_pjsip_pubsub.c: Use distributor serializer for incoming ↵Joshua Colp
subscriptions."
2016-06-09Merge "pjsip_distributor.c: Consistently pick a serializer for messages."Joshua Colp
2016-06-09Merge "pjsip_distributor.c: Ignore messages until fully booted."zuul
2016-06-09cel: Ensure only one dial status per channel exists.Joshua Colp
CEL wrongly assumed that a channel would only have a single dial event on it. This is incorrect. Particularly in a queue each call attempt to a member will result in a dial event, adding a new dial status in CEL without removing the old one. This would cause the container to grow with only one dial status being removed when the channel went away. The other dial status entries would remain leaking memory. This change fixes the memory leak by ensuring that only one dial status will only ever exist for each channel. The behavior during the scenario where multiple events are received has also been improved. For failure cases the first failure will be the dial status. If an answer dial status is received, though, it will take priority and the dial status for the channel will be answer. Memory usage has also been decreased by storing the minimal amount of information and the code has been cleaned up slightly. ASTERISK-25262 #close Change-Id: I5944eb923db17b6a0faa7317ff6abc9307c009fe
2016-06-09ARI: Ensure proper channel state on operations.Mark Michelson
ARI was recently outfitted with operations to create and dial channels. This leads to the ability to try funny stuff. You could create a channel and then immediately try to play back media on it. You could create a channel, dial it, and while it is ringing attempt to make it continue in the dialplan. This commit attempts to fix this by adding a channel state check to operations that should not be able to operate on outbound channels that have not yet answered. If a channel is in an invalid state, we will send a 412 response. ASTERISK-26047 #close Reported by Mark Michelson Change-Id: I2ca51bf9ef2b44a1dc5a73f2d2de35c62c37dfd8
2016-06-09test_http_media_cache: Fix failing test.Mark Michelson
The retrieve_cache_control_directives test has been failing occasionally in Jenkins. The apparent failure occurs when attempting to validate the expiration of the retrieved file. After reproducing, the problem was pretty clear. At the beginning of the test, the current time is retrieved. The seconds value of this timestamp is X. When the file is retrieved, res_http_media_cache calculates the expiration and in doing so retrieves the current time. In most cases, since the test executes quickly, it will also retrieve a timestamp with X seconds. However, if the test starts very near to when the timestamp seconds are set to increment, res_http_media_cache may retrieve a timestamp with X+1 seconds instead. The test attempted to account for this by allowing a tolerance of 1 second when validating the expiration. However, the problem was that the comparisons being used in the validation used > and < operations. This meant that values that fell within the tolerance (because they equaled the upper bound of the tolerance) would fail. The solution is to use >= and <= operators in the expiration validation. However, I estimated that while the one second tolerance should be fine on most machines, it would still be possible on a very slow machine to end up falling outside the one second tolerance. So I have also relaxed the tolerance of expiration validation to be three seconds instead. The final change here is to add a debug message when validating expiration so that we can see what values are being compared. ASTERISK-25959 #close Reported by Joshua Colp Change-Id: Ic1a0e10722c1c5d276d5a4d6a67136d6ec26c247
2016-06-09Add support for OGG/Speex file formatTimo Teräs
ASTERISK-18995 #close Change-Id: I98518bd28fc8f95668b3fe27d2cab45045ff3f7a
2016-06-09Merge "chan_pjsip: Lock channel when checking for RTP changes."zuul
2016-06-09cdr.c: Remove assert in base_process_dial_endGeorge Joseph
Scenario: Caller blonde transfer Bob calls Charlie who answers. Bob puts Charlie on hold and calls Alice. Before Alice answers, Bob transfers Charlie to Alice. Charlie's channel triggers an assert because he gets an "ANSWERED" event even though he never dialed anything. With recent changes to dial events, this is now a valid scenario so the assert needed to be removed. ASTERISK-26103 #close Change-Id: I2679b517b696e7952ab7fb29403df9140e7d1de2
2016-06-09chan_pjsip: Lock channel when checking for RTP changes.Mark Michelson
bridge_native_rtp can call into an RTP-capable channel driver in order for the driver to update information about who the channel is communicating with. For SIP channel drivers, this means deactivating RTCP and sending a reinvite so that the endpoints can communicate directly. bridge_native_rtp does the right thing and has the channel locked when calling into the channel driver. chan_pjsip can't alter session properties in this thread, though. chan_pjsip queues a task on the session serializer in order to update properties there. The problem is that this queued task was not locking the channel. This meant that the queued task could attempt to deactivate RTCP at the same time that the channel thread was attempting to process an incoming RTCP packet. This could lead to a crash. This patch fixes the issue by locking the channel in the queued task when altering RTP properties. ASTERISK-26092 #close Reported by Niklas Larsson Change-Id: I3464e226a3c41f6b915f97891e07fa1599e2a159
2016-06-09res_pjsip_registrar.c: Eliminate rx REGISTER request race condition.Richard Mudgett
This patch fixes a race condition processing received REGISTER requests and their retransmissions caused by REGISTER requests being processed by two threads. The "sip_transaction Unable to register REGISTER transaction (key exists)" message is a notable symptom of this issue. This issue was more likely to happen before the pjsip/distributor serializers were created. Instead of steps one and two below placing the REGISTER messages into the same pjsip/distributor they were placed in random pjsip/default serializers. 1) REGISTER requests come in and get placed on the pjsip/distributor serializer. 2) Before the first request is processed a retransmission comes in and is placed on the same pjsip/distributor serializer. 3) The first request goes up the pjsip stack and is then shunted off to the pjsip/aor/<aor> serializer. 4) Before the first request is completed processing in the pjsip/aor/<aor> serializer, the second request goes up the pjsip stack and is also shunted off to the pjsip/aor/<aor> serializer. 5) The first request completes processing and sends out its response. 6) The second request completes processing and tries to send out its response but pjlib complains that the REGISTER transaction key already exists. 7) Sadness ensues. * The race is eliminated by removing the pjsip/aor/<aor> serializer and continuing the processing in the pjsip/distributor serializer. Now any retransmissions queued in the pjsip/distributor serializer will be processed after the first message is completely processed. ASTERISK-26088 #close Reported by: Richard Mudgett Change-Id: I842d714346088bf717ea27437f1dd85bff0bab5a
2016-06-09stasis: Add setting subscription congestion levels.Richard Mudgett
Stasis subscriptions and message routers create taskprocessors to process the event messages. API calls are needed to be able to set the congestion levels of these taskprocessors for selected subscriptions and message routers. * Updated CDR, CEL, and manager's stasis subscription congestion levels based upon stress testing. Increased the congestion levels to reduce the potential for bursty call setup/teardown activity from triggering the taskprocessor overload alert. CDRs in particular need an extra high congestion level because they can take awhile to process the stasis messages. ASTERISK-26088 Reported by: Richard Mudgett Change-Id: Id0a716394b4eee746dd158acc63d703902450244
2016-06-09sorcery: Add setting object type congestion levels.Richard Mudgett
Sorcery creates taskprocessors for object types to process object observer callbacks. An API call is needed to be able to set the congestion levels of these taskprocessors for selected object types. * Updated PJSIP's contact and contact_status sorcery object type observer default congestion levels based upon stress testing. Increased the congestion levels to reduce the potential for bursty register/unregister and subscribe/unsubscribe activity from triggering the taskprocessor overload alert. ASTERISK-26088 Reported by: Richard Mudgett Change-Id: I4542e83b556f0714009bfeff89505c801f1218c6
2016-06-09taskprocessors: Implement high/low water mark alerts.Richard Mudgett
When taskprocessors get backed up, there is a good chance that we are being overloaded and need to defer adding new work to the system. * Implemented a high/low water alert mechanism for modules to check if the system is being overloaded and take appropriate action. When a taskprocessor is created it has default congestion levels set. A taskprocessor can later have those congestion levels altered for specific needs if stress testing shows that the taskprocessor is a symptom of overloading or needs to handle bursty activity without triggering an overload alert. * Add CLI "core show taskprocessor" low/high water columns. * Fixed __allocate_taskprocessor() to not use RAII_VAR(). RAII_VAR() was never a good thing to use when creating a taskprocessor because of the nature of how its references needed to be cleaned up on a partial creation. * Made res_pjsip's distributor check if the taskprocessor overload alert is active before placing a message representing brand new work onto a distributor serializer. ASTERISK-26088 Reported by: Richard Mudgett Change-Id: I182f1be603529cd665958661c4c05ff9901825fa
2016-06-09res_pjsip_session: Use distributor serializer for incoming calls.Richard Mudgett
We must continue using the serializer that the original INVITE came in on for the dialog. There may be retransmissions already enqueued in the original serializer that can result in reentrancy and message sequencing problems. Outgoing call legs create the pjsip/outsess/<endpoint> serializers for their dialogs. ASTERISK-26088 Reported by: Richard Mudgett Change-Id: I24d7948749c582b8045d5389ba3f6588508adbbc
2016-06-09res_pjsip_pubsub.c: Recreate subscriptions using distributor serializer.Richard Mudgett
* Resolves potential reentrancy problems if system restarted in the middle of subscription message transactions. * Fixes memory leak recreating persistent subscriptions when the subscription resource tree could not be created. ASTERISK-26088 Reported by: Richard Mudgett Change-Id: I71e34d7ae8ed35a694f1030e820e2548c48697be
2016-06-09res_pjsip_pubsub.c: Use distributor serializer for incoming subscriptions.Richard Mudgett
We must continue using the serializer that the original SUBSCRIBE came in on for the dialog. There may be retransmissions already enqueued in the original serializer that can result in reentrancy and message sequencing problems. The "sip_transaction Unable to register SUBSCRIBE transaction (key exists)" message is a notable symptom of this issue. Outgoing subscriptions still create the pjsip/pubsub/<endpoint> serializers for their dialogs. ASTERISK-26088 Reported by: Richard Mudgett Change-Id: I18b00bb74a56747b2c8c29543a82440b110bf0b0
2016-06-09pjsip_distributor.c: Consistently pick a serializer for messages.Richard Mudgett
Incoming messages that are not part of a dialog or a recognized response to one of our requests need to be sent to a consistent serializer. Under load we may be queueing retransmissions before we can process the original message. We don't need to throw these messages onto random serializers and cause reentrancy and message sequencing problems. * Created a pool of pjsip/distributor serializers that get picked by hashing the call-id and remote tag strings of the received messages. * Made ast_sip_destroy_distributor() destroy items in the reverse order of creation. ASTERISK-26088 Reported by: Richard Mudgett Change-Id: I2ce769389fc060d9f379977f559026fbcb632407
2016-06-09pjsip_distributor.c: Ignore messages until fully booted.Richard Mudgett
We should not be processing any incoming messages until we are fully booted. We may not have dialplan or other needed configuration loaded yet. ASTERISK-26089 #close Reported by: Scott Griepentrog ASTERISK-26088 Reported by: Richard Mudgett Change-Id: I584aefb4f34b885a8927e1f13a2c64babd606264