asyncio
Scrapy has partial support for asyncio. After you install the
asyncio reactor, you may use asyncio and
asyncio-powered libraries in any coroutine.
Installing the asyncio reactor
To enable asyncio support, set the TWISTED_REACTOR setting to
'twisted.internet.asyncioreactor.AsyncioSelectorReactor', which is the
default value.
If you are using AsyncCrawlerRunner or
CrawlerRunner, you also need to
install the AsyncioSelectorReactor
reactor manually. You can do that using
install_reactor():
install_reactor("twisted.internet.asyncioreactor.AsyncioSelectorReactor")
Handling a pre-installed reactor
twisted.internet.reactor and some other Twisted imports install the default
Twisted reactor as a side effect. Once a Twisted reactor is installed, it is
not possible to switch to a different reactor at run time.
If you configure the asyncio Twisted reactor and, at run time, Scrapy complains that a different reactor is already installed, chances are you have some such imports in your code.
You can usually fix the issue by moving those offending module-level Twisted imports to the method or function definitions where they are used. For example, if you have something like:
from twisted.internet import reactor

def my_function():
    reactor.callLater(...)
Switch to something like:
def my_function():
    from twisted.internet import reactor

    reactor.callLater(...)
Alternatively, you can try to manually install the asyncio reactor, with install_reactor(), before
those imports happen.
Integrating Deferred code and asyncio code
Coroutine functions can await on Deferreds by wrapping them into
asyncio.Future objects. Scrapy provides two helpers for this:
- scrapy.utils.defer.deferred_to_future(d: Deferred[_T]) Future[_T][source]
Return an asyncio.Future object that wraps d.
This function requires AsyncioSelectorReactor to be installed.
When using the asyncio reactor, you cannot await on Deferred objects from Scrapy callables defined as coroutines; you can only await on Future objects. Wrapping Deferred objects into Future objects allows you to wait on them:

    class MySpider(Spider):
        ...

        async def parse(self, response):
            additional_request = scrapy.Request('https://example.org/price')
            deferred = self.crawler.engine.download(additional_request)
            additional_response = await deferred_to_future(deferred)

Changed in version 2.14: This function no longer installs an asyncio loop if called before the Twisted asyncio reactor is installed. A RuntimeError is raised in this case.
- scrapy.utils.defer.maybe_deferred_to_future(d: Deferred[_T]) Deferred[_T] | Future[_T][source]
Return d as an object that can be awaited from a Scrapy callable defined as a coroutine.
What you can await in Scrapy callables defined as coroutines depends on the value of TWISTED_REACTOR:
- When using the asyncio reactor, you can only await on asyncio.Future objects.
- When not using the asyncio reactor, you can only await on Deferred objects.
If you want to write code that uses Deferred objects but works with any reactor, use this function on all Deferred objects:

    class MySpider(Spider):
        ...

        async def parse(self, response):
            additional_request = scrapy.Request('https://example.org/price')
            deferred = self.crawler.engine.download(additional_request)
            additional_response = await maybe_deferred_to_future(deferred)
Tip
If you don’t need to support reactors other than the default
AsyncioSelectorReactor, you can use deferred_to_future();
otherwise, use maybe_deferred_to_future().
Tip
If you need to use these functions in code that aims to be compatible
with lower versions of Scrapy that do not provide these functions,
down to Scrapy 2.0 (earlier versions do not support
asyncio), you can copy the implementation of these functions
into your own code.
Coroutines and futures can be wrapped into Deferreds (for example, when a Scrapy API requires passing a Deferred to it) using the following helpers:
- scrapy.utils.defer.deferred_from_coro(o: Awaitable[_T]) Deferred[_T][source]
- scrapy.utils.defer.deferred_from_coro(o: _T2) _T2
Convert a coroutine or other awaitable object into a Deferred, or return the object as is if it isn’t a coroutine.
- scrapy.utils.defer.deferred_f_from_coro_f(coro_f: Callable[_P, Awaitable[_T]]) Callable[_P, Deferred[_T]][source]
Convert a coroutine function into a function that returns a Deferred.
The coroutine function will be called at the time when the wrapper is called. Wrapper args will be passed to it. This is useful for callback chains, as callback functions are called with the previous callback result.
- scrapy.utils.defer.ensure_awaitable(o: Awaitable[_T], _warn: str | None = None) Awaitable[_T][source]
- scrapy.utils.defer.ensure_awaitable(o: _T, _warn: str | None = None) Awaitable[_T]
Convert any value to an awaitable object.
For a Deferred object, use maybe_deferred_to_future() to wrap it into a suitable object. For an awaitable object of a different type, return it as is. For any other value, return a coroutine that completes with that value.
Added in version 2.14.
Enforcing asyncio as a requirement
If you are writing a component that requires asyncio
to work, use scrapy.utils.asyncio.is_asyncio_available() to
enforce it as a requirement. For
example:
from scrapy.utils.asyncio import is_asyncio_available


class MyComponent:
    def __init__(self):
        if not is_asyncio_available():
            raise ValueError(
                f"{MyComponent.__qualname__} requires asyncio support. "
                "Make sure you have configured the asyncio reactor in the "
                "TWISTED_REACTOR setting. See the asyncio documentation "
                "of Scrapy for more information."
            )
- scrapy.utils.asyncio.is_asyncio_available() bool[source]
Check if it’s possible to call asyncio code that relies on the asyncio event loop.
Added in version 2.14.
This function returns True if there is a running asyncio event loop. If there is no such loop, it returns True if the Twisted reactor that is installed is AsyncioSelectorReactor, returns False if a different reactor is installed, and raises a RuntimeError if no reactor is installed.
Code that doesn’t directly require a Twisted reactor should use this function, while code that requires AsyncioSelectorReactor should use is_asyncio_reactor_installed().
When this returns True, an asyncio loop is installed and used by Scrapy. It’s possible to call functions that require it, such as asyncio.sleep(), and await on asyncio.Future objects in Scrapy-related code.
When this returns False, a non-asyncio Twisted reactor is installed. It’s not possible to use asyncio features that require an asyncio event loop or await on asyncio.Future objects in Scrapy-related code, but it’s possible to await on Deferred objects.
Note
As this function uses asyncio.get_running_loop(), it will only detect the event loop if called in the same thread and from the code that runs inside that loop (this shouldn’t be a problem when calling it from code such as spiders and Scrapy components, if Scrapy is run using one of the supported ways).
Changed in version 2.15.0: This function now also returns True if there is a running asyncio loop, even if no Twisted reactor is installed.
- scrapy.utils.reactor.is_asyncio_reactor_installed() bool[source]
Check whether the installed reactor is AsyncioSelectorReactor.
Raise a RuntimeError if no reactor is installed.
In a future Scrapy version, when Scrapy supports running without a Twisted reactor, this function won’t be useful for checking if it’s possible to use asyncio features, so code that doesn’t directly require a Twisted reactor should use scrapy.utils.asyncio.is_asyncio_available() instead of this function.
Changed in version 2.13: In earlier Scrapy versions this function silently installed the default reactor if there was no reactor installed. Now it raises an exception to prevent silent problems in this case.
Using Scrapy without a Twisted reactor
Added in version 2.15.0.
Warning
This is currently experimental and may not be suitable for production use.
It’s possible to use Scrapy without installing a Twisted reactor at all, by
setting the TWISTED_REACTOR_ENABLED setting to False. In this
mode Scrapy will use the asyncio event loop directly, and most of the Scrapy
functionality will work in the same way.
Doing this provides several benefits in certain use cases:
- A Twisted reactor, once stopped, cannot be started again. This prevents, for example, using several instances of AsyncCrawlerProcess in the same process when they use a reactor, but with TWISTED_REACTOR_ENABLED=False it becomes possible.
- There may be limitations imposed by AsyncioSelectorReactor and related Twisted code, such as the requirement of using SelectorEventLoop on Windows (see Windows-specific notes), that do not apply if the reactor is not used.
- AsyncioSelectorReactor manages the underlying event loop, and while AsyncCrawlerRunner can use a pre-existing reactor which, in turn, can use a pre-existing event loop, it’s easier to use AsyncCrawlerRunner with a pre-existing loop directly.
- Omitting the reactor machinery may improve performance and reliability.
Limitations
As some Scrapy features and components require a reactor, they don’t work and are disabled without it. Replacements that don’t require a reactor may be added in future Scrapy versions. The following features are not available:
- The default HTTP(S) download handler, HTTP11DownloadHandler (this is likely the biggest difference; Scrapy provides an HTTP(S) download handler that doesn’t require a reactor and will be used instead of it: HttpxDownloadHandler)
- CrawlerRunner and CrawlerProcess (AsyncCrawlerProcess and AsyncCrawlerRunner are available)
- Twisted-specific DNS resolvers (the DNS_RESOLVER setting)
- User and 3rd-party code that requires a reactor (see below for examples)
Note that importing Twisted modules and, among other things, creating and using
Deferred objects doesn’t require a reactor, so
code that uses Deferred,
Failure and some other Twisted APIs will not
necessarily stop working.
Other differences
When TWISTED_REACTOR_ENABLED is set to False, Scrapy will change
the defaults of some other settings:
- TELNETCONSOLE_ENABLED is set to False.
- The "http" and "https" keys in DOWNLOAD_HANDLERS_BASE are set to "scrapy.core.downloader.handlers._httpx.HttpxDownloadHandler".
- The "ftp" key in DOWNLOAD_HANDLERS_BASE is set to None.
Thus, HttpxDownloadHandler is
used by default for making HTTP(S) requests. Please refer to its documentation
for its differences and limitations compared to
HTTP11DownloadHandler.
Additionally, AsyncCrawlerProcess will install a
meta path finder that prevents twisted.internet.reactor from
being imported.
Adding support to existing code
Code that doesn’t directly use Twisted APIs or APIs that depend on Twisted ones doesn’t need special support for running without a reactor.
Here are some examples of APIs and patterns that need a replacement:
- Using reactor.callLater() for sleeping or delayed calls. You can use asyncio.loop.call_later() instead.
- Using twisted.internet.threads.deferToThread(), reactor.callFromThread() and related APIs to execute code in other threads. You can use asyncio.to_thread(), asyncio.loop.call_soon_threadsafe() and related APIs instead.
- Using twisted.internet.task.LoopingCall for scheduling repeated tasks. As there is no direct replacement in the standard library, you may need to write your own using asyncio.sleep() in a task.
- Using Twisted network client and server APIs (reactor.connectTCP(), reactor.listenTCP(), twisted.web.client, twisted.mail.smtp, etc.). You can use other built-in or 3rd-party libraries for this.
- Using CrawlerProcess or CrawlerRunner. You should use AsyncCrawlerProcess or AsyncCrawlerRunner respectively instead.
- Checking whether asyncio support is available with scrapy.utils.reactor.is_asyncio_reactor_installed(). You should use scrapy.utils.asyncio.is_asyncio_available() instead.
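For the LoopingCall case, a minimal stand-in can be written with plain asyncio (a sketch; the interval and iteration count are illustrative, and unlike LoopingCall it runs a fixed number of times):

```python
import asyncio


async def looping_call(func, interval, count):
    # Minimal stand-in for twisted.internet.task.LoopingCall:
    # call func every `interval` seconds, `count` times in total.
    for _ in range(count):
        func()
        await asyncio.sleep(interval)


results = []
asyncio.run(looping_call(lambda: results.append(len(results)), 0.01, 3))
```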
Scrapy provides unified helpers for some of these examples:
- scrapy.utils.asyncio.call_later(delay: float, func: Callable[[Unpack[_Ts]], object], *args: Unpack[_Ts]) CallLaterResult[source]
Schedule a function to be called after a delay.
This uses either asyncio.loop.call_later() or reactor.callLater(), depending on whether asyncio support is available.
Added in version 2.14.0.
- scrapy.utils.asyncio.create_looping_call(func: Callable[_P, _T], *args: _P.args, **kwargs: _P.kwargs) AsyncioLoopingCall | LoopingCall[source]
Create an instance of a looping call class.
This creates an instance of AsyncioLoopingCall or LoopingCall, depending on whether asyncio support is available.
Added in version 2.14.0.
- class scrapy.utils.asyncio.AsyncioLoopingCall(func: Callable[_P, _T], *args: _P.args, **kwargs: _P.kwargs)[source]
A simple implementation of a periodic call using asyncio, keeping some API and behavior compatibility with LoopingCall.
The function is called every interval seconds, independent of the finish time of the previous call. If the function is still running when it’s time to call it again, calls are skipped until the function finishes.
The function must not return a coroutine or a Deferred.
Added in version 2.14.0.
- async scrapy.utils.asyncio.run_in_thread(func: Callable[_P, _T], *args: _P.args, **kwargs: _P.kwargs) _T[source]
Call a function in a thread and return its result as a coroutine.
This uses either asyncio.to_thread() or twisted.internet.threads.deferToThread(), depending on whether asyncio support is available.
Added in version 2.15.0.
If your code needs to know whether the reactor is available, you can either
check for the value of the TWISTED_REACTOR_ENABLED setting (you need
access to the Crawler instance to do this) or use the
following function:
- scrapy.utils.reactorless.is_reactorless() bool[source]
Check if we are running in reactorless mode, i.e. with TWISTED_REACTOR_ENABLED set to False.
As this checks the runtime state and not the setting itself, it can be wrong when executed very early, before the reactor and/or the asyncio event loop are initialized.
Note
As this function uses scrapy.utils.asyncio.is_asyncio_available(), it has the same limitations for detecting a running asyncio event loop as that one.
Added in version 2.15.0.
In general, code that doesn’t use the reactor (directly or indirectly) can be used unmodified both with the asyncio reactor and without a reactor. This includes code that converts Deferreds to futures and vice versa as described in Integrating Deferred code and asyncio code.
Troubleshooting
ImportError: Import of twisted.internet.reactor is forbidden when running
without a Twisted reactor […]: Scrapy is configured to run without a
reactor, but some code imported twisted.internet.reactor, most likely
because that code needs a reactor to be used. You need to stop using this code
or set TWISTED_REACTOR_ENABLED back to True. It’s also possible
that the reactor isn’t really needed but was installed due to the problem
described in Handling a pre-installed reactor, in which case it should be
enough to fix the problematic imports.
RuntimeError: TWISTED_REACTOR_ENABLED is False but a Twisted reactor is
installed: Scrapy is configured to run without a reactor, but a reactor is
already installed before the Scrapy code is executed. If you are trying to set
TWISTED_REACTOR_ENABLED via per-spider settings, it’s currently unsupported.
RuntimeError: We expected a Twisted reactor to be installed but it isn’t:
Scrapy is configured to run with a reactor and not to install one, but a
reactor wasn’t installed before the Scrapy code is executed. If you are trying
to set TWISTED_REACTOR_ENABLED via per-spider settings, it’s currently unsupported.
RuntimeError: <class> doesn’t support TWISTED_REACTOR_ENABLED=False: The
listed class cannot be used with TWISTED_REACTOR_ENABLED set to
False. There may be a replacement in the documentation above or the documentation of the affected class.
Windows-specific notes
The Windows implementation of asyncio can use two event loop
implementations, ProactorEventLoop (default) and
SelectorEventLoop. However, only
SelectorEventLoop works with Twisted.
Scrapy changes the event loop class to SelectorEventLoop
automatically when you change the TWISTED_REACTOR setting or call
install_reactor().
Note
Other libraries you use may require
ProactorEventLoop, e.g. because it supports
subprocesses (this is the case with playwright), so you cannot use
them together with Scrapy on Windows (but you should be able to use
them on WSL or native Linux).
Note
This problem doesn’t apply when not using the reactor, see Using Scrapy without a Twisted reactor.
Using custom asyncio loops
You can also use custom asyncio event loops with the asyncio reactor. Set the
ASYNCIO_EVENT_LOOP setting to the import path of the desired event
loop class to use it instead of the default asyncio event loop.
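For example, to use uvloop's event loop (assuming the third-party uvloop package is installed; 'uvloop.Loop' is the import path of its loop class):

```python
# settings.py (sketch): replace the default asyncio event loop with uvloop's
ASYNCIO_EVENT_LOOP = "uvloop.Loop"
```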
Switching to a non-asyncio reactor
If for some reason your code doesn’t work with the asyncio reactor, you can use
a different reactor by setting the TWISTED_REACTOR setting to its
import path (e.g. 'twisted.internet.epollreactor.EPollReactor') or to
None, which will use the default reactor for your platform. If you are
using AsyncCrawlerRunner or
AsyncCrawlerProcess you also need to switch to their
Deferred-based counterparts: CrawlerRunner or
CrawlerProcess respectively.