Age | Commit message | Author |
|
|
|
Added a new API to res_statsd.c to allow it to receive a
character pointer for the value argument. This allows for a
'+' and a '-' to easily be sent with the value.
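A minimal sketch of why a string value helps, assuming the usual StatsD line
format of name:value|type. The helper format_statsd_line() is hypothetical and
is not the function added to res_statsd.c.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not the res_statsd.c API: builds a StatsD line from a
 * value that is already a string, so a leading '+' or '-' survives intact.
 * Returns the number of characters written, or -1 if the buffer is too small.
 */
static int format_statsd_line(char *buf, size_t size, const char *name,
	const char *value, const char *type)
{
	int len = snprintf(buf, size, "%s:%s|%s", name, value, type);

	return (len < 0 || (size_t) len >= size) ? -1 : len;
}
```

With an integer argument the sign of a positive delta would be lost, while the
string "+1" is passed through unchanged.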
ASTERISK-25419
Reported By: Ashley Sanders
Change-Id: Id6bb53600943d27347d2bcae26c0bd5643567611
|
|
|
|
When an endpoint is deleted (such as through an API), the persistent endpoint
currently continues to lurk around. While this isn't harmful from a memory
consumption perspective - as all persistent endpoints are reclaimed on
shutdown - it does cause Stasis endpoint related operations to continue
to believe that the endpoint may or may not exist.
This patch causes the persistent endpoint related to a PJSIP endpoint to be
destroyed if the PJSIP endpoint is deleted.
Change-Id: I85ac707b4d5e6aad882ac275b0c2e2154affa5bb
|
|
The contact_status Sorcery objects are currently not destroyed when a contact
is deleted. This causes the contact's last known RTT/status to be 'sticky'
when the contact itself may no longer exist. This patch causes the
contact_status objects associated with both dynamic and static contacts to
be destroyed if the AoR holding those contacts is also destroyed (or via
other paths where a contact may be deleted.)
Change-Id: I7feec8b9278cac3c5263a4c0483f4a0f3b62426e
|
|
During a stress test of subscriptions, a huge blast of
subscription-related traffic resulted in the threadpool expanding to a
ridiculous number of threads. The ballooning of threads resulted in an
increase in memory use, which led to a crash due to being out of memory.
An easy fix for the particular test was to limit the size of the
threadpool, thus reining in the amount of memory that would be used. It
was decided that there really is no downside to having a non-infinite
default value for the maximum size of the threadpool, so this change
introduces 50 threads as the maximum threadpool size for the SIP
threadpool.
ASTERISK-25513 #close
Reported by: John Bigelow
Change-Id: If0b9514f1d9b172540ce1a6e2f2ffa1f2b6119be
|
|
When an AoR is created or destroyed dynamically, the scheduled OPTIONS
requests that qualify the contacts on the AoR are not necessarily started
or destroyed, particularly for persistent contacts created for that AoR.
This patch adds create/update/delete sorcery observers for an AoR, which
schedule/unschedule the qualifies as expected.
Change-Id: Ic287ed2e2952a7808ee068776fe966f9554bdf7d
|
|
When compiled with assertions enabled, an assertion triggers when destroying
the subscription tree after UAS dialog creation fails. This is because
the code assumes that a dialog will always exist on a subscription
tree, when in reality it won't in this specific scenario.
This change makes it so a dialog is not removed from the subscription
tree if it is not present.
ASTERISK-25505 #close
Change-Id: Id5c182b055aacc5e66c80546c64804ce19218dee
|
|
|
Add the ability to filter output from pjsip list and show commands
using the "like" predicate, as chan_sip does.
For endpoints, aors, auths, registrations, identifies and transports,
the modification was a simple change of an ast_sorcery_retrieve_by_fields
call to ast_sorcery_retrieve_by_regex. For channels and contacts a
little more work had to be done because neither of those objects are
true sorcery objects. That was just removing the non-matching object
from the final container. Of course, a little extra plumbing in the
common pjsip_cli code was needed to parse the "like" and pass the regex
to the get_container callbacks.
Some of the get_container code in res_pjsip_endpoint_identifier was also
refactored for simplicity.
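The removal of non-matching objects from the final container, as described
for channels and contacts, can be sketched roughly as follows. This assumes
POSIX extended regular expressions and a plain array of names in place of an
ao2 container; filter_like() is a hypothetical name.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical sketch: keep only the names matching the "like" pattern,
 * compacting the array in place. Returns the number of names kept, or -1
 * if the pattern does not compile.
 */
static int filter_like(const char *pattern, const char **names, size_t count)
{
	regex_t re;
	size_t i;
	size_t kept = 0;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
		return -1;
	}
	for (i = 0; i < count; i++) {
		if (regexec(&re, names[i], 0, NULL, 0) == 0) {
			names[kept++] = names[i];
		}
	}
	regfree(&re);

	return (int) kept;
}
```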
ASTERISK-25477 #close
Reported by: Bryant Zimmerman
Tested by: George Joseph
Change-Id: I646d9326b778aac26bb3e2bcd7fa1346d24434f1
|
|
response"
|
|
During outbound registration it is possible to receive a fatal (any permanent/
non-temporary 4xx, 5xx, 6xx) response from the registrar that is simply due
to a problem with the registrar itself. Upon receiving the failure response
Asterisk terminates outbound registration for the given endpoint.
This patch adds an option, 'fatal_retry_interval', that when set continues
outbound registration at the given interval up to 'max_retries' upon receiving
a fatal response.
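The retry decision can be sketched as below. This is an illustration under
stated assumptions, not the res_pjsip_outbound_registration code: the function
name and parameters are hypothetical, and an unset option is modeled as 0.

```c
/*
 * Hypothetical sketch of the retry decision.
 * Returns the number of seconds to wait before the next REGISTER attempt,
 * or 0 to stop registering.
 */
static unsigned int next_retry_delay(int fatal, unsigned int retries,
	unsigned int max_retries, unsigned int retry_interval,
	unsigned int fatal_retry_interval)
{
	if (retries >= max_retries) {
		return 0;
	}
	if (fatal) {
		/* Unset (0) keeps the old behavior: a fatal response ends registration. */
		return fatal_retry_interval;
	}

	return retry_interval;
}
```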
ASTERISK-25485 #close
Change-Id: Ibc2c7b47164ac89cc803433c0bbe7063bfa143a2
|
|
A certain situation can result in our attempting to send a NOTIFY on a
destroyed dialog. Say we attempt to send a NOTIFY to a subscriber, but
that subscriber has dropped off the network. We end up retransmitting
that NOTIFY until the appropriate SIP timer says to destroy the NOTIFY
transaction. When the pjsip evsub code is told that the transaction has
been terminated, it responds in kind by alerting us that the
subscription has been terminated, destroying the subscription, and then
removing its reference to the dialog, thus destroying the dialog.
The problem is that when we get told that the subscription is being
terminated, we detect that we have not sent a terminating NOTIFY
request, so we queue up such a NOTIFY to be sent out. By the time that
queued NOTIFY gets sent, the dialog has been destroyed, so attempting to
send that NOTIFY can result in a crash.
The fix being introduced here is actually a reintroduction of something
the pubsub code used to employ. We hold a reference to the dialog and
wait to decrement our reference to the dialog until our subscription
tree object is destroyed. This way, we can send messages on the dialog
even if the PJSIP evsub code wants to terminate earlier than we would
like.
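A simplified model of the reference holding described above. The structs and
functions are hypothetical stand-ins, with a flag in place of actually freeing
the dialog.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the PJSIP dialog and our subscription tree. */
struct dialog {
	int refs;
	int destroyed;
};

struct sub_tree {
	struct dialog *dlg;
};

static void dialog_ref(struct dialog *dlg)
{
	dlg->refs++;
}

static void dialog_unref(struct dialog *dlg)
{
	if (--dlg->refs == 0) {
		dlg->destroyed = 1;
	}
}

/* The tree takes its own reference when it is attached to the dialog... */
static void tree_init(struct sub_tree *tree, struct dialog *dlg)
{
	dialog_ref(dlg);
	tree->dlg = dlg;
}

/* ...and gives it up only when the tree itself is destroyed. */
static void tree_destroy(struct sub_tree *tree)
{
	dialog_unref(tree->dlg);
	tree->dlg = NULL;
}
```

Even if the evsub side drops its reference first, the dialog stays usable for
a queued NOTIFY until tree_destroy() runs.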
In doing this, some NULL checks for subscription tree dialogs have been
removed since NULL dialogs are no longer actually possible.
Change-Id: I013f43cddd9408bb2a31b77f5db87a7972bfe1e5
|
|
When sending a NOTIFY, we lock the dialog and then unlock the dialog
when finished. A recent change made it so that the subscription tree's
dialog pointer will be set NULL when sending the final NOTIFY request
out. This means that when we attempt to unlock the dialog, we pass a
NULL pointer to pjsip_dlg_dec_lock(). The result is that the dialog
remains locked after we think we have unlocked it. When a response to
the NOTIFY arrives, the monitor thread attempts to lock the dialog, but
it cannot because we never released the dialog lock. This results in
Asterisk being unable to process incoming SIP traffic any longer.
The fix in this patch is to use a local pointer to save off the pointer
value of the subscription tree's dialog when locking and unlocking the
dialog. This way, if the subscription tree's dialog pointer is NULLed
out, the local pointer will still point to the proper place and the
dialog lock will be unlocked as we expect.
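The local pointer pattern looks roughly like this. The lock is modeled as a
simple depth counter and every name here is hypothetical; the real code locks
and unlocks the PJSIP dialog.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the lock is modeled as a depth counter. */
struct dialog {
	int lock_depth;
};

struct sub_tree {
	struct dialog *dlg;
};

static void dialog_lock(struct dialog *dlg)
{
	if (dlg) {
		dlg->lock_depth++;
	}
}

static void dialog_unlock(struct dialog *dlg)
{
	if (dlg) {
		dlg->lock_depth--;
	}
}

/* Stand-in for sending the final NOTIFY, which clears the tree's pointer. */
static void send_final_notify(struct sub_tree *tree)
{
	tree->dlg = NULL;
}

static void notify_with_local_pointer(struct sub_tree *tree)
{
	/* Save the pointer before anything can NULL it out. */
	struct dialog *dlg = tree->dlg;

	dialog_lock(dlg);
	send_final_notify(tree);
	/* tree->dlg is NULL here, but dlg still refers to the locked dialog. */
	dialog_unlock(dlg);
}
```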
Change-Id: I7ddb3eaed7276cceb9a65daca701c3d5e728e63a
|
|
The SIP dialog is removed from the subscription tree when the final
NOTIFY is sent. However, after the final NOTIFY is sent, the persistence
update function still attempts to access the cseq from the dialog,
resulting in a crash.
This fix removes the subscription persistence at the same time that the
dialog is removed from the subscription tree. This way, there is no
attempt to update persistence when the subscription is being destroyed.
Change-Id: Ibb46977a6cef9c51dc95f40f43446e3d11eed5bb
|
|
There have been crashes seen where a taskprocessor's listener is NULL
unexpectedly.
Looking at backtraces, the problem was specifically seen in PJSIP
serializers.
Subscriptions make the mistake of removing a serializer from a dialog
during subscription tree destruction. Since subscription trees are
reference-counted, guaranteeing the circumstances behind the destruction
is not possible. This makes it so that the dialog serializer can be
removed while not holding the dialog lock. This makes it possible for
the distributor to get a pointer to the dialog serializer and have that
serializer get freed out from under it.
The fix for this is to remove the serializer from a subscription dialog
when sending the final NOTIFY. This guarantees that the serializer is
removed with the dialog lock held. By doing this, we guarantee that if
the distributor gains access to the dialog's serializer, it will not be
possible for the serializer to get freed by another thread.
Change-Id: I21f5dac33529f65cec45679bdace60670800ff66
|
|
If an old persistent subscription is recreated but then immediately
destroyed because it is out of date, the subscription tree will have no
leaf subscriptions on it. This was resulting in a crash when attempting
to destroy the subscription tree.
A simple NULL check fixes this problem.
Change-Id: I85570b9e2bcc7260a3fe0ad85904b2a9bf36d2ac
|
|
There have been crashes and general instability seen in the pubsub code,
so this patch introduces three changes to increase the stability.
First, the ownership model for subscriptions has been modified. Due to
RLS, subscriptions are stored in memory as a tree structure. Prior to my
patch, the PJSIP subscription was the owner of the subscription tree.
When the PJSIP subscription told us that it was terminating, we started
destroying the subscription tree along with all of the individual leaf
subscriptions that belong to the tree. The problem with this model is
that the two actors in play here, the PJSIP subscription and the
individual leaf subscriptions, need to have joint ownership of the
subscription tree. So now, the PJSIP subscription and the individual
leaf subscriptions each have a reference to the subscription tree. This
way, we will not actually free memory until no players are left that
care. The PJSIP subscription is a bigger stakeholder, in that if the
PJSIP subscription's reference to the subscription tree is removed, the
subscription tree instructs the leaf subscriptions to shut down and drop
their references to the subscription tree when possible. The individual
leaf subscriptions, upon being told to shut down, can drop their stasis
subscriptions or whatever they use to learn of new state, and then drop
their reference to the subscription tree once they are ready to die.
Second, the lifetime of a PJSIP subscription's reference to our
subscription tree has been altered. As I learned from doing a deep dive,
the PJSIP evsub code can tell Asterisk multiple times that the
subscription has been terminated, and not all of these times
are especially helpful. I have altered the message flow that we use for
SIP subscriptions such that we will always drop the PJSIP subscription's
reference to the subscription tree when we send the NOTIFY that
terminates a SIP subscription. This also means that we will now queue
NOTIFY requests to be sent after responding to incoming SUBSCRIBEs so
that we can have predictable state changes from the PJSIP evsub code.
Third, the synchronization of operations has been improved. PJSIP can
call into our code from a serializer thread (e.g. upon receiving an
incoming request) or from the monitor thread (e.g. when a subscription
times out). Because of this, there is the possibility of competing
threads stepping on each other. PJSIP attempts to do some
synchronization on its own by always keeping the dialog lock held when
it calls into us. However, since we end up pushing tasks into the
serializer, the result was that serialized operations were not grabbing
the dialog lock and could, as a result, step on something that was being
attempted by a different thread. Now we ensure that serialized
operations grab the dialog lock, then check for extenuating
circumstances, then proceed with their operation if they can.
Change-Id: Iff2990c40178dad9cc5f6a5c7f76932ec644b2e5
|
|
In a realtime based system with a limited number of threadpool threads
it is possible for a deadlock to occur. This happens when permanent
endpoint state is updated, which will cause database queries to be done.
These queries may result in URI validation being done which is done
synchronously using a PJSIP thread. If all PJSIP threads are in use
processing traffic they themselves may be blocked waiting to get the
permanent endpoint container lock when identifying an endpoint.
This change moves URI validation to occur at use time instead of
configuration time. While this comes at the cost of not seeing a problem
until you use it, it does solve the underlying deadlock problem.
ASTERISK-25486 #close
Change-Id: I2d7d167af987d23b3e8199e4a68f3359eba4c76a
|
|
On v13, loading several thousand PJSIP endpoints on Asterisk start causes
a deadlock most of the time.
Thanks to mdu113 for discovering that there was a call to pgsql_exec() not
protected by the pgsql_lock reentrancy lock.
{quote}
I believe a code path exists that attempts to use pgsql connection without
locking pgsql_lock. I believe what happens during that deadlock that I
see is two concurrent threads are both attempting to send query to pgsql,
one of the thread is using a code path without locking pgsql_lock. If
they managed to send queries at the same time, it seems postgres ignores
one of the queries and replies only to the one of them. If it happens so
that the thread holding the lock didn't receive the reply it will wait for
it (and hold the lock) forever (or at least for very long time), thus
completely blocking all access to db.
{quote}
* Added missing reentrancy locking around pgsql_exec() in find_table().
* Moved unlock of pgsql_lock in unload_module() to avoid locking inversion
between the psql_tables list lock and the pgsql_lock.
ASTERISK-25455 #close
Reported by: mdu113
Patches:
res_config_pgsql.c-connlock2.diff (license #5543) patch uploaded by mdu113
Change-Id: Id9e7cdf8a3b65ff19964b0cf942ace567938c4e2
|
|
|
|
The struct send_request_wrapper has a pjsip lock associated with it that
is created non-recursive. There is a code path for the struct
send_request_wrapper lock that will attempt to lock it recursively. The
reporter's deadlock showed that the thread calling endpt_send_request()
deadlocked itself right after the wrapper object got created.
Out-of-dialog requests such as MESSAGE, qualify OPTIONS, and unsolicited
MWI NOTIFY messages can hit this deadlock.
* Replaced the struct send_request_wrapper pjsip lock with the mutex lock
that can come with an ao2 object since all of Asterisk's mutexes are
recursive. Benefits include removal of code maintaining the pjsip
non-recursive lock since ao2 objects already know how to maintain their
own lock and the lock will show up in the CLI "core show locks" output.
ASTERISK-25435 #close
Reported by: Dmitriy Serov
Change-Id: I458e131dd1b9816f9e963f796c54136e9e84322d
|
|
frame->subclass.frame_ending"
|
|
In ast_rtp_read, the value of the variable 'mark' which we try to assign to a
frame->subclass.frame_ending may be 0, 1 or (1<<23), but we should translate
it to 0 or 1.
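The translation amounts to collapsing any non-zero marker value to 1. A
minimal sketch, with a hypothetical helper name:

```c
/* Hypothetical helper: collapse any non-zero marker value (1 or 1 << 23) to 1. */
static int mark_to_frame_ending(unsigned int mark)
{
	return mark ? 1 : 0;
}
```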
ASTERISK-25451 #close
Change-Id: I53bdf5c026041730184a6a809009c028549ce626
|
|
When we decide we will no longer schedule an RTCP write, we remove the
reference to the RTP instance, then assign -1 to the stored scheduler ID
in case something else comes along and wants to see if anything is scheduled.
That scheduler ID is on the RTP instance. After 60a9172d7ef2 was merged to
fix the regression introduced by 3cf0f29310, this improper assignment on a
potentially destroyed object started getting tripped on the build agents.
Frankly, this should have been crashing a lot more often earlier. I can only
assume that the timing was changed just enough by both changes to start
actually hitting this problem.
As it is, simply moving the assignment prior to the ao2 dereference is sufficient
to keep the RTP instance from being referenced when it is very, truly,
absolutely dead.
(Note that it is still good practice to assign -1 to the scheduler ID when we
know we won't be scheduling it again, as the ao2 deref *may* not always destroy
the ao2 object.)
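The ordering can be illustrated with a simplified reference-counted object.
The struct and function names are hypothetical, and a flag stands in for the
memory actually being freed.

```c
/* Hypothetical stand-in for a reference-counted RTP instance. */
struct rtp_instance {
	int refs;
	int rtcp_sched_id;
	int destroyed;
};

static void rtp_unref(struct rtp_instance *rtp)
{
	if (--rtp->refs == 0) {
		rtp->destroyed = 1;
	}
}

static void stop_rtcp_write(struct rtp_instance *rtp)
{
	/* Touch the object first, while our reference still keeps it alive. */
	rtp->rtcp_sched_id = -1;
	/* This may drop the last reference; rtp must not be used afterwards. */
	rtp_unref(rtp);
}
```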
ASTERISK-25449
Change-Id: Ie6d3cb4adc7b1a6c078b1c38c19fc84cf787cda7
|
|
Apparently some endpoints attempt to send a reINVITE before completing the
initial INVITE transaction. In this case PJSIP responds appropriately to
the reINVITE with a 491 INVITE request pending. Unfortunately chan_pjsip
is using the initial INVITE transaction state to determine if an INVITE is
the initial INVITE or a reINVITE. Since the initial INVITE transaction
has not been confirmed yet chan_pjsip thinks the reINVITE is an initial
INVITE and starts another PBX thread on the channel. The extra PBX thread
ensures that hilarity ensues.
* Fix checks for a reINVITE on incoming requests to look for the presence
of a to-tag instead of the initial INVITE transaction state.
* Made caller_id_incoming_request() determine what to do if there is a
channel on the session or not. After a channel is created it is too late
to just store the new party id on the session because the session's party
id has already been copied to the channel's caller id.
ASTERISK-25404 #close
Reported by: Chet Stevens
Change-Id: Ie78201c304a2b13226f3a4ce59908beecc2c68be
|
|
When 5c713fdf18f was merged, it allowed for scheduled items to have an ID of
'0' returned. While this was valid per the documentation for the API, it was
apparently never returned previously. As a result, several users of the
scheduler API viewed the result as being invalid, causing them to reschedule
already scheduled items or otherwise fail in interesting ways.
This patch corrects the users such that they view '0' as valid, and a returned
ID of -1 as being invalid.
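The corrected check can be sketched as a single comparison. The helper name
is hypothetical; the point is that 0 counts as "already scheduled".

```c
/*
 * Hypothetical helper: decide whether an item still needs to be scheduled.
 * An ID of 0 is a valid scheduled item; only a negative ID means "none".
 */
static int needs_scheduling(int current_id)
{
	return current_id < 0;
}
```

A check written as `!current_id` or `current_id <= 0` would treat ID 0 as
unscheduled and schedule the item a second time.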
Note that the failing HEP RTCP tests now pass with this patch. These tests
failed due to a duplicate scheduling of the RTCP transmissions.
ASTERISK-25449 #close
Change-Id: I019a9aa8b6997584f66876331675981ac9e07e39
|
|
A deadlock can happen when a sorcery object is being expired from the
memory cache when at the same time another object is being placed into the
memory cache. There are a couple other variations on this theme that
could cause the deadlock. Basically if an object is being expired from
the sorcery memory cache at the same time as another thread tries to
update the next object expiration timer the deadlock can happen.
* Add a deadlock avoidance loop in expire_objects_from_cache() to check if
someone is trying to remove the scheduler callback from the scheduler.
ASTERISK-25441 #close
Change-Id: Iec7b0bdb81a72b39477727b1535b2539ad0cf4dc
|
|
Make sorcery_memory_cache_close() call remove_all_from_cache() instead of
partially inlining it.
ASTERISK-25441
Change-Id: I1aa6cb425b1a4307096f3f914d17af8ec179a74c
|
|
Basically you should shut down in the opposite order of how you set up, since
later setup pieces likely depend on earlier setup pieces. E.g.,
registering your external API with the rest of the system should be the
last thing set up and the first thing unregistered during shutdown.
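One way to keep the two orders in sync is a single table walked forwards on
load and backwards on unload. This is a sketch with hypothetical step names;
each step only records a letter so the order can be inspected.

```c
#include <stddef.h>

/* Records the order in which the hypothetical steps run. */
static char trace[8];
static size_t trace_len;

static void note(char c)
{
	if (trace_len < sizeof(trace) - 1) {
		trace[trace_len++] = c;
	}
}

static void setup_data(void) { note('D'); }
static void setup_threads(void) { note('T'); }
static void register_api(void) { note('A'); }
static void unregister_api(void) { note('a'); }
static void stop_threads(void) { note('t'); }
static void free_data(void) { note('d'); }

struct step {
	void (*setup)(void);
	void (*teardown)(void);
};

/* Listed in setup order; the external API is registered last. */
static const struct step steps[] = {
	{ setup_data, free_data },
	{ setup_threads, stop_threads },
	{ register_api, unregister_api },
};

#define STEP_COUNT (sizeof(steps) / sizeof(steps[0]))

static void load_all(void)
{
	size_t i;

	for (i = 0; i < STEP_COUNT; i++) {
		steps[i].setup();
	}
}

static void unload_all(void)
{
	size_t i;

	/* Walk the table backwards so the API is unregistered first. */
	for (i = STEP_COUNT; i > 0; i--) {
		steps[i - 1].teardown();
	}
}
```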
Change-Id: I5715765b723100c8d3c2642e9e72cc7ad5ad115e
|
|
Change-Id: I8cd32dffbb4f33bb0c39518d6e4c991e73573160
|
|
Change-Id: Ibca6574dc3c213b29cc93486e01ccd51f5caa46c
|
|
In practice the set_role API callback can be invoked even
when no ICE is present on an RTP instance. This can occur
if ICE has not been enabled on it.
ASTERISK-25438 #close
Change-Id: I0e17e4316f0f0d7f095c78c3d4fd73a913b6ba69
|
|
Made use of the ao2 sort compare template function and OBJ_SEARCH_xxx
identifiers.
Change-Id: Ic53005dc5aafa7a36c72300dd89b75fb63c92f4c
|
|
* Now conf_alloc() has more off nominal error checking.
* Eliminated RAII_VAR() use in conf_alloc().
* Eliminated a dubious shortcut when destroying cfg->general in
conf_destructor() that would cause a crash if cfg->general failed to get
allocated.
* Add some ACO registration section comments.
Change-Id: Ia40c2b1b2d0777d641605118ae019c5a73865e1a
|
|
Need to finish initializing the string fields in the ao2 object before
putting any default strings into them.
ASTERISK-25383 #close
Reported by: yaron nahum
Change-Id: I9f7f3a03f0c4991a01593abf8697b9a587c0ea84
|
|
After b99a7052621700a1aa641a1c24308f5873275fc8 was merged, subscribing to a
NULL bridge causes app_subscribe_bridge to implicitly subscribe to
all bridges. Unfortunately, the res_stasis control loop did not check that
a bridge changing on a channel's control object was actually also non-NULL.
As a result, app_subscribe_bridge will be called with a NULL bridge when a
channel leaves a bridge. This causes a new subscription to be made to the
bridge. If an application has also subscribed to the bridge, the application
will now have two subscriptions:
(1) The explicit one created by the app
(2) The implicit one accidentally created by the control structure
As a result, the 'BridgeDestroyed' event can be sent multiple times. This
patch corrects the control loop such that it only subscribes an application
to a new bridge if the bridge pointer is non-NULL.
ASTERISK-24870
Change-Id: I3510e55f6bc36517c10597ead857b964463c9f4f
|
|
This patch adds the ability to subscribe to all events. There are two possible
ways to accomplish this:
(1) On initial WebSocket connection. This patch adds a new query parameter,
'subscribeAll'. If present and True, Asterisk will subscribe the
applications to all ARI events.
(2) Via the applications resource. When subscribing in this manner, an ARI
client should merely specify a blank resource name, i.e., 'channels:'
instead of 'channels:12354'. This will subscribe the application to all
resources of the 'channels' type.
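The blank resource name rule from (2) can be sketched as a small parser. The
function is hypothetical and only inspects the text after the colon; it is
not the ARI implementation.

```c
#include <string.h>

/*
 * Hypothetical parser for an event source such as "channels:12354".
 * Returns 1 when the resource name after the colon is blank (subscribe to
 * every resource of that type), 0 for a specific resource, and -1 when the
 * string has no colon at all.
 */
static int is_subscribe_all(const char *event_source)
{
	const char *colon = strchr(event_source, ':');

	if (!colon) {
		return -1;
	}

	return colon[1] == '\0';
}
```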
ASTERISK-24870 #close
Change-Id: I4a943b4db24442cf28bc64b24bfd541249790ad6
|
|
This patch adds support for subscribing to all device state changes. This is
done either by subscribing to an empty device, e.g., 'eventSource=deviceState:',
or by the WebSocket connection specifying that it wants all state in the
system.
ASTERISK-24870
Change-Id: I9cfeca1c9e2231bd7ea73e45919111d44d2eda32
|
|
This patch adds support for receiving events regarding Peer status changes
and Contact status changes. This is particularly useful in scenarios where
we are subscribed to all endpoints and channels, where we often want to know
more about the state of channel technology specific items than a single
endpoint's state.
ASTERISK-24870
Change-Id: I6137459cdc25ce27efc134ad58abf065653da4e9
|
|
|
|
Validate string buffer allocations before using them.
ASTERISK-25323
Change-Id: Ib9c338bdc1e53fb8b81366f0b39482b83ef56ce0
|
|
Validate the buffer returned by ast_malloc before using it in
set_redirecting_value().
ASTERISK-25323
Change-Id: I15d2ed7cb0546818264c0bf251aa40adeae83253
|
|
There is a slim chance of a race condition occurring where two threads
can both attempt to manipulate the same area.
Thread A can be handling an incoming initial SUBSCRIBE request. Thread A
lets the specific subscription handler know that the subscription has
been established.
At this point, Thread B may detect a state change on the subscribed
resource and queue up a notification task on Thread C, the subscription
serializer thread.
Now Thread A attempts to generate the initial NOTIFY request to send to
the subscriber at the same time that Thread C attempts to generate a
state change NOTIFY request to send to the subscriber.
The result is that Threads A and C can step on the same memory area,
resulting in a crash. The crash has been observed as happening when
attempting to allocate more space to hold the body for the NOTIFY.
The solution presented here is to queue the subscription establishment
and initial NOTIFY generation onto the subscription serializer thread
(Thread C in the above scenario). This way, there is no way that a state
change notification can occur before the initial NOTIFY is sent, and if
there is a quick succession of NOTIFYs, we can guarantee that the two
NOTIFY requests will be sent in succession.
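The ordering guarantee comes from pushing both operations through one FIFO
queue. Below is a single-threaded model of that idea, not the Asterisk
taskprocessor API; all names are hypothetical and the tasks only record a
letter.

```c
#include <stddef.h>

#define QUEUE_MAX 8

typedef void (*task_fn)(void);

/* Hypothetical single-threaded model of a serializer: a FIFO of tasks. */
struct serializer {
	task_fn tasks[QUEUE_MAX];
	size_t head;
	size_t tail;
};

/* Records which NOTIFY was generated, in order. */
static char sent[8];
static size_t sent_len;

static void record(char c)
{
	if (sent_len < sizeof(sent) - 1) {
		sent[sent_len++] = c;
	}
}

static void send_initial_notify(void) { record('I'); }
static void send_state_notify(void) { record('S'); }

static int serializer_push(struct serializer *s, task_fn task)
{
	if (s->tail == QUEUE_MAX) {
		return -1;
	}
	s->tasks[s->tail++] = task;

	return 0;
}

/* Tasks run one at a time, in the order they were pushed. */
static void serializer_run(struct serializer *s)
{
	while (s->head < s->tail) {
		s->tasks[s->head++]();
	}
}
```

Because the initial NOTIFY is queued when the SUBSCRIBE is handled, a state
change queued afterwards cannot run before it or alongside it.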
Change-Id: I5a89a77b5f2717928c54d6efb9955e5f6f5cf815
|