Occasionally under load we'll attempt to send a final NOTIFY on a
subscription that's already been terminated and a SEGV will occur
down in pjproject's evsub_destroy function. This is a result of a
race condition between all the paths that can generate a notify
and/or destroy the underlying pjproject evsub object:
* The client can send a SUBSCRIBE with Expires: 0.
* The client can send a SUBSCRIBE/refresh.
* The subscription timer can expire.
* An extension state can change.
* An MWI event can be generated.
* The pjproject transaction timer (timer_b) can expire.
Normally when our pubsub_on_evsub_state is called with a terminate,
we push a task to the serializer and return, at which point the dialog
is unlocked. This is usually not a problem because the task runs
immediately and locks the dialog again. When the system is heavily
loaded, though, there may be a delay between the unlock and relock
during which another event may occur, such as the subscription timer
or timer_b expiring, an extension state change, etc. These may also
cause a terminate to be processed and, if so, we could cause pjproject
to try to destroy the evsub structure twice. There's no way for us to
tell that the evsub was already destroyed, and the evsub's group lock
can't tolerate the second destroy, so it SEGVs.
The remedy is twofold.
* A patch has been submitted to Teluu and added to the bundled
  pjproject which adds add/decrement operations on evsub's group lock.
* In res_pjsip_pubsub:
  * configure.ac and pjproject-bundled's configure.m4 were updated
    to check for the new evsub group lock APIs.
  * We now add a reference to the evsub group lock when we create
    the subscription and remove the reference when we clean up the
    subscription. This prevents evsub from being destroyed before
    we're done with it.
  * A state has been added to the subscription tree structure so
    termination progress can be tracked through the asynchronous tasks.
  * The pubsub_on_evsub_state callback has been split so it's not doing
    double duty. It now only handles the final cleanup of the
    subscription tree. pubsub_on_rx_refresh now handles both client
    refreshes and client terminates. It was always being called for
    both anyway.
  * The serialized_on_server_timeout task was removed since
    serialized_pubsub_on_rx_refresh was almost identical.
  * Missing state checks and ao2_cleanups were added.
  * Some debug levels were adjusted so that only off-nominal things
    are logged at level 1 and nominal or progress things at level 2+.
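The idea of combining a termination state with an extra reference can be
modelled roughly as below. This is only a toy sketch in Python, not the
res_pjsip_pubsub code: the class, the state names and the counters are all
hypothetical, and the real change works on pjproject's evsub group lock in C.

```python
import enum
import threading


class TerminateState(enum.Enum):
    # Hypothetical state names; the commit only says a state was added.
    ACTIVE = 0
    TERMINATE_PENDING = 1
    TERMINATED = 2


class Subscription:
    """Toy stand-in for a subscription tree wrapping an evsub-like object."""

    def __init__(self):
        self._lock = threading.Lock()
        self.state = TerminateState.ACTIVE
        self.refs = 1           # reference taken when the subscription is created
        self.destroy_count = 0  # how many times the underlying object was freed

    def request_terminate(self):
        """Called from any event path; only the first caller gets True."""
        with self._lock:
            if self.state is not TerminateState.ACTIVE:
                return False    # a terminate is already in progress
            self.state = TerminateState.TERMINATE_PENDING
            return True

    def cleanup(self):
        """Final cleanup: drop our reference, destroy only at zero."""
        with self._lock:
            if self.state is TerminateState.TERMINATED:
                return          # already cleaned up, nothing to do
            self.state = TerminateState.TERMINATED
            self.refs -= 1
            if self.refs == 0:
                self.destroy_count += 1
```

In this model a second terminate arriving between the unlock and relock is
simply rejected, and the underlying object is released exactly once.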
ASTERISK-26099 #close
Reported-by: Ross Beer.
Change-Id: I779d11802cf672a51392e62a74a1216596075ba1
|
|
ASTERISK-26119 #close
Change-Id: Iecbf7d0f360a021147344c4e83ab242fd1e7512c
|
|
Announcer channels were not being destroyed because the
stasis_app_control structure that referenced them was not being
destroyed. The control structure was not being destroyed because it was
not being unlinked from its container. It was not being unlinked from
its container because the after bridge callback for the announcer
channel was not being run. The after bridge callback was not being run
because the after bridge datastore was not being removed from the
channel on destruction. The channel was not being destroyed because the
hangup that used to destroy the channel was now only reducing the
reference count to one. The reference count of the channel was only
being reduced to one because the stasis_app_control structure was
holding the final reference...
Previously the control structure did not keep a reference to the channel,
so the loop described above could not occur.
The solution is to manually remove the control structure from its
container when the playback on a bridge is complete.
ASTERISK-26083 #close
Reported by Joshua Colp
Change-Id: I0ddc0f64484ea0016245800b409b567dfe85cfb4
|
|
* In unload_module(), reordered destroying things to minimize the window
that the global transports container could be used by other threads on
shutdown. When shutting down you need to stop things in the opposite
order of creation.
* Put the global transports container into an AO2_GLOBAL_OBJ_STATIC to
eliminate the crash potential by other threads using the container on
shutdown.
* Made struct monitored_transport.sip_received not use
ast_atomic_fetchadd_int() since it is used as a boolean value that is only
set TRUE. It was previously incremented for every received SIP message
and could theoretically overflow.
* In monitored_transport_state_callback(), allocated the monitored
transport object without a lock since the lock was unused.
* In keepalive_global_loaded(), removed releasing the transports container
if the keepalive_thread could not be started. I set it up to be tried
again if the user reloads the configuration.
Change-Id: I8d12d16ef564290fa6d25a32334bb5ce8fdf87ff
|
|
Change-Id: Iabaa2e5dccf0762c258101ea0eb1487cf6959ad1
|
|
Change-Id: Ic9928208b9957e09866abe3d9649030942ec52b3
|
|
Change-Id: I68a2128bcba4830985d2d441e70dfd1ac5bd712b
|
|
ARI was recently outfitted with operations to create and dial channels.
This leads to the ability to try funny stuff. You could create a channel
and then immediately try to play back media on it. You could create a
channel, dial it, and while it is ringing attempt to make it continue in
the dialplan.
This commit attempts to fix this by adding a channel state check to
operations that should not be able to operate on outbound channels that
have not yet answered. If a channel is in an invalid state, we will send
a 412 response.
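As a rough illustration of such a state check, the sketch below rejects an
operation on an outbound channel that has not answered. The enum, the function
name and the use of None for "allowed" are assumptions made for this example;
only the 412 status comes from the commit message.

```python
from enum import Enum


class ChannelState(Enum):
    DOWN = "Down"
    RINGING = "Ringing"
    UP = "Up"


def precondition_status(state, is_outbound):
    """Return 412 if the operation must be refused, or None if it may proceed.

    Hypothetical helper: an outbound channel that has not yet answered
    (anything other than UP) is treated as being in an invalid state.
    """
    if is_outbound and state is not ChannelState.UP:
        return 412
    return None
```

For example, `precondition_status(ChannelState.RINGING, True)` yields 412,
while the same call on an answered channel yields None.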
ASTERISK-26047 #close
Reported by Mark Michelson
Change-Id: I2ca51bf9ef2b44a1dc5a73f2d2de35c62c37dfd8
|
|
This patch fixes a race condition in processing received REGISTER requests
and their retransmissions, caused by REGISTER requests being processed by
two threads. The "sip_transaction Unable to register REGISTER transaction
(key exists)" message is a notable symptom of this issue.
This issue was more likely to happen before the pjsip/distributor
serializers were created. Instead of steps one and two below placing the
REGISTER messages into the same pjsip/distributor serializer, the messages
were placed in random pjsip/default serializers.
1) REGISTER requests come in and get placed on the pjsip/distributor
serializer.
2) Before the first request is processed a retransmission comes in and is
placed on the same pjsip/distributor serializer.
3) The first request goes up the pjsip stack and is then shunted off to
the pjsip/aor/<aor> serializer.
4) Before the first request has finished processing in the pjsip/aor/<aor>
serializer, the second request goes up the pjsip stack and is also shunted
off to the pjsip/aor/<aor> serializer.
5) The first request completes processing and sends out its response.
6) The second request completes processing and tries to send out its
response but pjlib complains that the REGISTER transaction key already
exists.
7) Sadness ensues.
* The race is eliminated by removing the pjsip/aor/<aor> serializer and
continuing the processing in the pjsip/distributor serializer. Now any
retransmissions queued in the pjsip/distributor serializer will be
processed after the first message is completely processed.
ASTERISK-26088 #close
Reported by: Richard Mudgett
Change-Id: I842d714346088bf717ea27437f1dd85bff0bab5a
|
|
Sorcery creates taskprocessors for object types to process object observer
callbacks. An API call is needed to be able to set the congestion levels
of these taskprocessors for selected object types.
* Updated PJSIP's contact and contact_status sorcery object type observer
default congestion levels based upon stress testing. Increased the
congestion levels to reduce the potential for bursty register/unregister
and subscribe/unsubscribe activity from triggering the taskprocessor
overload alert.
ASTERISK-26088
Reported by: Richard Mudgett
Change-Id: I4542e83b556f0714009bfeff89505c801f1218c6
|
|
When taskprocessors get backed up, there is a good chance that we are
being overloaded and need to defer adding new work to the system.
* Implemented a high/low water alert mechanism for modules to check if the
system is being overloaded and take appropriate action. When a
taskprocessor is created it has default congestion levels set. A
taskprocessor can later have those congestion levels altered for specific
needs if stress testing shows that the taskprocessor is a symptom of
overloading or needs to handle bursty activity without triggering an
overload alert.
* Add CLI "core show taskprocessor" low/high water columns.
* Fixed __allocate_taskprocessor() to not use RAII_VAR(). RAII_VAR() was
never a good thing to use when creating a taskprocessor because of the
nature of how its references needed to be cleaned up on a partial
creation.
* Made res_pjsip's distributor check if the taskprocessor overload alert
is active before placing a message representing brand new work onto a
distributor serializer.
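A high/low water alert is a form of hysteresis, which can be sketched as
follows. This is a simplified Python model rather than the taskprocessor code:
the class, its method names and the exact boundary comparisons (>= high,
<= low) are assumptions for illustration.

```python
class Taskprocessor:
    """Toy queue-depth tracker with a high/low water overload alert."""

    def __init__(self, low_water, high_water):
        if not 0 <= low_water < high_water:
            raise ValueError("need 0 <= low_water < high_water")
        self.low_water = low_water
        self.high_water = high_water
        self.queue_len = 0
        self.alert = False

    def push(self):
        """Queue one task; raise the alert on reaching the high water mark."""
        self.queue_len += 1
        if not self.alert and self.queue_len >= self.high_water:
            self.alert = True

    def pop(self):
        """Finish one task; clear the alert on draining to the low water mark."""
        if self.queue_len == 0:
            raise IndexError("pop from empty taskprocessor")
        self.queue_len -= 1
        if self.alert and self.queue_len <= self.low_water:
            self.alert = False

    def accepts_new_work(self):
        """A caller such as a distributor would check this before queueing."""
        return not self.alert
```

Keeping the two thresholds apart stops the alert from flapping when the queue
depth hovers around a single limit.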
ASTERISK-26088
Reported by: Richard Mudgett
Change-Id: I182f1be603529cd665958661c4c05ff9901825fa
|
|
We must continue using the serializer that the original INVITE came in on
for the dialog. There may be retransmissions already enqueued in the
original serializer that can result in reentrancy and message sequencing
problems.
Outgoing call legs create the pjsip/outsess/<endpoint> serializers for
their dialogs.
ASTERISK-26088
Reported by: Richard Mudgett
Change-Id: I24d7948749c582b8045d5389ba3f6588508adbbc
|
|
* Resolves potential reentrancy problems if system restarted in the middle
of subscription message transactions.
* Fixes memory leak recreating persistent subscriptions when the
subscription resource tree could not be created.
ASTERISK-26088
Reported by: Richard Mudgett
Change-Id: I71e34d7ae8ed35a694f1030e820e2548c48697be
|
|
We must continue using the serializer that the original SUBSCRIBE came in
on for the dialog. There may be retransmissions already enqueued in the
original serializer that can result in reentrancy and message sequencing
problems. The "sip_transaction Unable to register SUBSCRIBE transaction
(key exists)" message is a notable symptom of this issue.
Outgoing subscriptions still create the pjsip/pubsub/<endpoint>
serializers for their dialogs.
ASTERISK-26088
Reported by: Richard Mudgett
Change-Id: I18b00bb74a56747b2c8c29543a82440b110bf0b0
|
|
Incoming messages that are not part of a dialog or a recognized response
to one of our requests need to be sent to a consistent serializer. Under
load we may be queueing retransmissions before we can process the original
message. We don't need to throw these messages onto random serializers
and cause reentrancy and message sequencing problems.
* Created a pool of pjsip/distributor serializers that get picked by
hashing the call-id and remote tag strings of the received messages.
* Made ast_sip_destroy_distributor() destroy items in the reverse order of
creation.
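Picking a serializer by hashing can be sketched like this. The function name,
the CRC32 hash and the way the two strings are joined are assumptions made for
the example; the commit only states that the call-id and remote tag strings
are hashed to choose from the pool.

```python
import zlib


def pick_serializer(call_id, remote_tag, pool_size):
    """Map a message to a pool index so related messages share a serializer.

    The same (call_id, remote_tag) pair always yields the same index, so a
    retransmission lands behind the original message in the same queue.
    """
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    # Join with a NUL byte so ("ab", "c") and ("a", "bc") hash differently.
    key = (call_id + "\x00" + remote_tag).encode("utf-8")
    return zlib.crc32(key) % pool_size
```

Because the index depends only on the message identifiers, ordering within one
call-id/tag pair is preserved without any shared lookup table.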
ASTERISK-26088
Reported by: Richard Mudgett
Change-Id: I2ce769389fc060d9f379977f559026fbcb632407
|
|
We should not be processing any incoming messages until we are fully
booted. We may not have dialplan or other needed configuration loaded
yet.
ASTERISK-26089 #close
Reported by: Scott Griepentrog
ASTERISK-26088
Reported by: Richard Mudgett
Change-Id: I584aefb4f34b885a8927e1f13a2c64babd606264
|
|
POSIX defines signal.h. sys/signal.h should not be used as it is a
C library internal header which may or may not exist. Notably, with
musl, including it generates a warning that the include is incorrect.
Change-Id: Ia56b0aa1d84b5c590114867b1b384a624f39a6fc
|
|
A crash can occur in res_hep_pjsip or res_hep_rtcp if res_hep has not
loaded and does not have a configuration file. Previously when this
occurred, checks were put in to see if the configuration was loaded
successfully. While this is a good idea - and has been added to the
offending function in res_hep - the reality is res_hep_pjsip and
res_hep_rtcp have no business running if res_hep isn't also running.
As such, this patch also adds a function to res_hep that returns whether
or not it successfully loaded. Oddly enough, ast_module_check returns
"everything is peachy" even if a module declined its load - so it cannot
be solely relied on. res_hep_pjsip and res_hep_rtcp now also check this
function to see if they should continue to load; if it fails, they
decline their load as well.
ASTERISK-26096 #close
Change-Id: I007e535fcc2e51c2ca48534f48c5fc2ac38935ea
|
|
Testing has shown that our usage of UnixODBC is problematic
due to bugs within UnixODBC itself as well as the heavyweight
cost of connecting and disconnecting database connections, even
when pooling is enabled.
For users of UnixODBC 2.3.1 and earlier, crashes would occur due
to insufficient protection of the disconnect operation. This was
fixed in UnixODBC 2.3.2 and above.
For users of UnixODBC 2.3.3 and higher, a slow-down would occur
under heavy database use due to repeated connection establishment.
A regression is present where on each connection the database
configuration is cached again, with the cache growing out of
control.
The connection pool implementation present in this change helps
to mitigate these issues by reducing how often we connect and
disconnect database connections. We also solve the issue of
crashes under UnixODBC 2.3.1 by defaulting the maximum number of
connections to 1, returning us to the previous working behavior.
For users who have a fixed version, the maximum concurrent
connection limit can be increased to help performance.
The connection pool works by keeping a list of active connections.
If the connection limit has not been reached a new connection is
established. If the connection limit has been reached then the
request waits until a connection becomes available before
continuing.
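The pool behaviour described above can be sketched as follows. This is a
minimal Python model, not the res_odbc implementation: the class, the
`connect` callable, the timeout parameter and the reuse of idle connections
before opening new ones are assumptions for illustration.

```python
import threading


class ConnectionPool:
    """Toy bounded pool: reuse idle connections, open up to a limit, else wait."""

    def __init__(self, connect, max_connections=1):
        self._connect = connect        # hypothetical factory for a connection
        self._max = max_connections
        self._idle = []
        self._count = 0                # connections opened so far
        self._cond = threading.Condition()

    def acquire(self, timeout=None):
        with self._cond:
            ready = self._cond.wait_for(
                lambda: self._idle or self._count < self._max, timeout)
            if not ready:
                raise TimeoutError("no connection became available")
            if self._idle:
                return self._idle.pop()
            self._count += 1           # reserve a slot before connecting
        try:
            return self._connect()     # connect outside the lock
        except Exception:
            with self._cond:
                self._count -= 1       # give the slot back on failure
                self._cond.notify()
            raise

    def release(self, conn):
        with self._cond:
            self._idle.append(conn)
            self._cond.notify()        # wake one waiting request
```

With the default limit of 1, every request is funnelled through a single
connection; raising the limit trades that safety for concurrency.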
ASTERISK-26074 #close
ASTERISK-26054 #close
Change-Id: I6774bf4bac49a0b30242c76a09c403d2e856ecff
|
|
Since libSRTP 1.5, its Random Number Generator (RNG) is not maintained anymore.
Therefore, the symbol RAND_bytes is used instead of crypto_get_random.
ASTERISK-24436 #close
Change-Id: Iea0bae4d4e3c9aa0926ea442b6484b5159789d96
|
|
If you create a local channel and don't specify an originator channel
to take capabilities from, we automatically add all audio formats to
the new channel's capabilities. When we try to make the channel
compatible with another, the "best format" functions pick the best
format available, which in this case will be slin192. While this is
great for preserving quality, it's the worst for performance and
overkill for the vast majority of applications.
In the absence of any other information, adding all formats is the
correct thing to do, and it's not always possible to supply an
originator, so a new parameter 'formats' has been added to the channel
create/originate functions. It's just a comma separated list of formats
to make available for the channel. Example: "ulaw,slin,slin16".
'formats' and 'originator' are mutually exclusive.
To facilitate determination of format names, the format name has been
added to "core show codecs".
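How a 'formats' value might be validated and split can be sketched as below.
The function name and the choice to return None when no list is given are
hypothetical; only the comma separated syntax and the mutual exclusion with
'originator' come from the commit message.

```python
def resolve_formats(formats=None, originator=None):
    """Return the requested format names, or None if no list was supplied.

    None means the caller falls back to the existing behaviour (take the
    originator's capabilities, or add all audio formats).
    """
    if formats and originator:
        raise ValueError("'formats' and 'originator' are mutually exclusive")
    if not formats:
        return None
    # Split on commas and ignore surrounding whitespace and empty entries.
    names = [name.strip() for name in formats.split(",")]
    return [name for name in names if name]
```

Calling `resolve_formats("ulaw,slin,slin16")` returns the three names in the
order given.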
ASTERISK-26070 #close
Change-Id: I091b23ecd41c1b4128d85028209772ee139f604b
|
|
These flags are non-portable GNU extensions. Make their use
optional. This fixes a compilation error on e.g. musl C library
based systems.
Change-Id: I0aa06efc62aa8995f091445c8b762a75a91042f3
|
|
The pjproject doxygen for rdata->msg_info.info says to call
pjsip_rx_data_get_info() instead of accessing the struct member directly.
You need to call the function mostly because the function will generate
the struct member value if it is not already set up.
Change-Id: I4d519385a577f3e9d9193a88125e493cf17fa799
|
|
Re-ordered the body items so Message-Account is second.
Messages-Waiting: no
Message-Account: sip:1571@<IP Removed>:5060
Voice-Message: 0/0 (0/0)
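A body in that order could be assembled as in the sketch below. The function
and its parameters are hypothetical, and the account URI used in the usage
note is a made-up documentation address, not the one removed from the example
above.

```python
def build_mwi_body(account_uri, new, old, urgent_new=0, urgent_old=0):
    """Build a message-summary body with Message-Account as the second line."""
    waiting = "yes" if new > 0 else "no"
    lines = [
        "Messages-Waiting: " + waiting,
        "Message-Account: " + account_uri,
        "Voice-Message: %d/%d (%d/%d)" % (new, old, urgent_new, urgent_old),
    ]
    # SIP message bodies use CRLF line endings.
    return "\r\n".join(lines) + "\r\n"
```

For instance, `build_mwi_body("sip:1571@192.0.2.10:5060", 0, 0)` produces the
three lines in the order shown above, with "Messages-Waiting: no" first.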
ASTERISK-26065 #close
Reported-by: Ross Beer
Change-Id: If5d35a64656eac98c2dd5e490cc0b2807bed80c3
|
|
Added notes about when you can read or write headers. Specifically
about being able to read on the inbound channel and write on an
outbound channel.
ASTERISK-26063 #close
Reported by: Private Name
Tested by: Rusty Newton
Change-Id: Ibeb64af17d1f6451028b3c29855a3f151a01d8c5
|
|
This adds a new parameter to the end of a multicast RTP dialing string.
This parameter defines the following options:
* i: Set the interface from which multicast RTP is sent
* l: Set whether multicast packets are looped back to the sender
* t: Set the TTL for multicast packets
* c: Set the codec to use for RTP
ASTERISK-26068 #close
Reported by Mark Michelson
Change-Id: I033b706b533f0aa635c342eb738e0bcefa07e219
|
|
ARI dial had been implemented using the Dial API. This made great sense
when dialing was 100% separate from bridging. However, if a channel were
to be added to a bridge during the dial attempt, there would be a
conflict between the dialing thread and the bridging thread. Each would
be attempting to read frames from the dialed channel and act on them.
The initial attempt to make the two play nice was to have the Dial API
suspend the channel in the bridge and stay in charge of the channel
until the dial was complete. The problem with this was that it was
riddled with potential race conditions. It also was not well-suited for
the case where the channel changed which bridge it was in during the
dial.
This new approach removes the use of the Dial API altogether. Instead,
the channel we are dialing is placed into an invisible ARI dialing
bridge. The bridge channel thread handles incoming frames from the
channel. If the channel is added to a real bridge, it is departed from
the invisible bridge and then added to the real bridge. Similarly, if
the channel is removed from the real bridge, it is automatically added
back to the invisible bridge if the dial attempt is still active.
This approach keeps the threading simple by always having the channel
being handled by bridge channel threads.
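The bridge movements described above can be modelled as a small state holder.
This is only a sketch: the class, the bridge name constant and the attribute
names are hypothetical, and what happens to a channel still parked in the
invisible bridge when the dial ends is deliberately left out because the
commit message does not describe it.

```python
DIAL_BRIDGE = "ari-dial-bridge"  # hypothetical name for the invisible bridge


class DialedChannel:
    """Toy model of which bridge a dialed channel currently sits in."""

    def __init__(self, name):
        self.name = name
        self.dial_active = True
        self.bridge = DIAL_BRIDGE  # dialing starts in the invisible bridge

    def add_to_bridge(self, bridge_id):
        # Joining a real bridge implies departing the invisible one.
        self.bridge = bridge_id

    def remove_from_bridge(self):
        # Fall back to the invisible bridge only while the dial is active.
        self.bridge = DIAL_BRIDGE if self.dial_active else None
```

At every point the channel belongs to exactly one bridge (or none once the
dial is over), which is what keeps a single thread in charge of its frames.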
ASTERISK-25925
Change-Id: I7750359ddf45fcd45eaec749c5b3822de4a8ddbb
|
|
Because res_pjsip_nat rewrites the contact's address, only the last Via
header can contain the source address of the registered endpoint.
The Call-ID header may also contain the source address of the registered
endpoint.
Added "via_addr", "via_port", and "call_id" to contact.
Added new fields ViaAddress and CallID to the AMI event ContactStatus.
ASTERISK-26011
Change-Id: I36bcc0bf422b3e0623680152d80486aeafe4c576
|
|
There are a lot of verbose messages about Endpoint and Contact status
changes if there are many dynamic endpoints.
The patch sets verbose level 2 for Endpoint status changes
and verbose level 3 for Contact status changes.
ASTERISK-26055 #close
Change-Id: Ie64e261ddbbc41bfff0f0190241152cc123fe6d7
|
|
The pjproject doxygen for rdata->msg_info.info says to call
pjsip_rx_data_get_info() instead of accessing the struct member directly.
You need to call the function mostly because the function will generate
the struct member value if it is not already set up.
Change-Id: Iafe8b01242b7deb0ebfdc36685e21374a43936d2
|
|
destroying."
|
|
Recent changes to res_pjsip_outbound_publish have introduced a
race condition at shutdown where an outbound publish may be shut down
twice. In this case the first shutdown succeeds as a result of the
unpublish. In the second invocation, since it has already been
unpublished, a task is queued to just destroy the client. This task
holds no ref to the publish and as a result the publish may be
destroyed before the task is run, causing a crash.
This explicit destruction task now holds a reference to the publish
to ensure it remains valid.
ASTERISK-26053 #close
Change-Id: I10789b98add3e50292ee3b33a55a1d9061cec94b
|
|
recording"
|
|
The send request callback function currently assumes that it
will only ever be called on transaction state changes. This is
not always true. If our own timer callback occurs we will call
the callback with a timer event instead of a transaction state
change event. In this case the transaction on the event is
invalid and accessing it will result in a crash.
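The kind of guard this implies can be sketched as below. The commit message
does not spell out the fix, so this is an assumption: the callback checks the
event type before touching the transaction. All names here are hypothetical
Python stand-ins for the pjproject event structures.

```python
from dataclasses import dataclass
from typing import Optional

TSX_STATE = "tsx_state"  # stands in for a transaction state change event
TIMER = "timer"          # stands in for our own timer firing


@dataclass
class Transaction:
    status_code: int


@dataclass
class Event:
    kind: str
    transaction: Optional[Transaction] = None


def send_request_callback(event):
    """Return the response status, or None when the event carries none."""
    if event.kind != TSX_STATE:
        # Timer events have no valid transaction, so do not dereference it.
        return None
    return event.transaction.status_code
```

A timer event therefore takes the early return instead of reading a
transaction that is not there.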
ASTERISK-26049 #close
Change-Id: I623211c8533eb73056b0250b4580b49ad4174dfc
|
|
When receiving an incoming response to a dialog-starting INVITE, we were
not matching the response to the INVITE dialog. Since we had not
recorded the to-tag in the dialog structure, the PJSIP-provided method
to find the dialog did not match.
Most of the time, this was not a problem, because there is a fall-back
that makes the response get routed to the same serializer that the
request was sent on. However, in cases where an asynchronous DNS lookup
occurs in the PJSIP core, the thread that sends the INVITE is not
actually a threadpool serializer thread. This means we are unable to
record a serializer to handle the incoming response.
Now, imagine what happens when an INVITE is sent on a non-serialized
thread, and an error response (such as a 486) arrives. The 486 ends up
getting put on some random threadpool thread. Eventually, a hangup task
gets queued on the INVITE dialog serializer. Since the 486 is being
handled on a different thread, the hangup task can execute at the same
time that the 486 is being handled. The hangup task assumes that it is
the sole owner of the INVITE session and channel, so it ends up
potentially freeing the channel and NULLing the session's channel
pointer. The thread handling the 486 can crash as a result.
This change has the incoming response match the INVITE transaction, and
then get the dialog from that transaction. It's the same method we had
been using for matching incoming CANCEL requests. By doing this, we get
the INVITE dialog and can ensure that the 486 response ends up being
handled by the same thread as the hangup, ensuring that the hangup runs
after the 486 has been completely handled.
ASTERISK-25941 #close
Reported by Javier Riveros
Change-Id: I0d4cc5d07e2a8d03e9db704d34bdef2ba60794a0
|
|
This patch adds a new feature to ARI that allows a client to download
the media associated with a stored recording. The new route is
/recordings/stored/{name}/file, and transmits the underlying binary file
using the file transfer facilities of Asterisk's HTTP server.
Because this REST route returns non-JSON, a few small enhancements had
to be made to the Python Swagger generation code, as well as the
mustache templates that generate the ARI bindings.
ASTERISK-26042 #close
Change-Id: I49ec5c4afdec30bb665d9c977ab423b5387e0181
|