<?php
/**
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 * http://www.gnu.org/copyleft/gpl.html
 *
 * @file
 */
namespace Wikimedia\Rdbms;

use LogicException;
use Psr\Log\LoggerAwareInterface;
use Psr\Log\LoggerInterface;
use Psr\Log\NullLogger;
use Wikimedia\ObjectCache\BagOStuff;
use Wikimedia\ObjectCache\EmptyBagOStuff;

/**
 * Provide a given client with protection against visible database lag.
 *
 * ### In a nutshell
 *
 * This class tries to hide visible effects of database lag. It does this by temporarily remembering
 * the database positions after a client makes a write, and on their next web request we will prefer
 * non-lagged database replicas. When replica connections are established, we wait up to a few seconds
 * for sufficient replication to have occurred, if they were not yet caught up to that same point.
 *
 * This ensures a consistent ordering of events as seen by a client. Kind of like Hawking's
 * [Chronology Protection Agency](https://en.wikipedia.org/wiki/Chronology_protection_conjecture).
 *
 * ### Purpose
 *
 * For performance and scalability reasons, almost all data is queried from replica databases.
 * Only queries relating to writing data are sent to a primary database. When rendering a web page
 * with content or activity feeds on it, the very latest information may thus not yet be there.
 * That's okay in general, but if, for example, a client recently changed their preferences or
 * submitted new data, we do our best to make sure their next web response does reflect at least
 * their own recent changes.
 *
 * ### How
 *
 * To explain how it works, we will look at an example lifecycle for a client.
 *
 * A client is browsing the site. Their web requests are generally read-only and display data from
 * database replicas, which may be a few seconds out of date if a client elsewhere in the world
 * recently modified that same data. If the application is run from multiple data centers, then
 * these web requests may be served from the nearest secondary DC.
 *
 * A client performs a POST request, perhaps to publish an edit or change their preferences. This
 * request is routed to the primary DC (this is the responsibility of infrastructure outside
 * the web app). There, the data is saved to the primary database, after which the database
 * host will asynchronously replicate this to its replicas in the same and any other DCs.
 *
 * Toward the end of the response to this POST request, the application takes note of the primary
 * database's current "position", and saves this under a "clientId" key in the ChronologyProtector
 * store. The web response will also set two cookies that are similarly short-lived (about ten
 * seconds): `UseDC=master` and `cpPosIndex=<posIndex>@<write time>#<clientId>`.
 *
 * The ten-second window is meant to account for the time needed for the database writes to have
 * replicated across all active database replicas, including the cross-dc latency for those
 * further away in any secondary DCs. The "clientId" is placed in the cookie to handle the case
 * where the client's IP address frequently changes between web requests.
 *
 * Future web requests from the client should fall in one of two categories:
 *
 * 1. Within the ten-second window.
 *    Their UseDC cookie will make them return to the primary DC, where we
 *    access the ChronologyProtector store and use the database "position"
 *    to decide which local database replica to use and, if needed, wait
 *    a split second for replication to catch up.
 * 2. After the ten-second window. They will be routed to the nearest and
 *    possibly different DC. Any local ChronologyProtector store existing there
 *    will not be interacted with. A random database replica may be used as
 *    the client's own writes are expected to have been applied here by now.
 *
 * @anchor ChronologyProtector-storage-requirements
 *
 * ### Storage requirements
 *
 * The store used by ChronologyProtector, as configured via {@link $wgMicroStashType},
 * should meet the following requirements:
 *
 * - Low latencies. Nearly all web requests that involve a database connection will
 *   unconditionally query this store first. It is expected to respond within the order
 *   of one millisecond.
 * - Best effort persistence, without active eviction pressure. Data stored here cannot be
 *   obtained elsewhere or recomputed. As such, under normal operating conditions, this store
 *   should not be full, and should not evict values before their intended expiry time elapsed.
 * - No replication, local consistency. Each DC may have a fully independent dc-local store
 *   associated with ChronologyProtector (no replication across DCs is needed). Local writes
 *   must be immediately reflected in subsequent local reads. No intra-dc read lag is allowed.
 * - No redundancy, fast failure. Loss of data will likely be noticeable and disruptive to
 *   clients, but the data is not considered essential. Under maintenance or unprecedented load,
 *   it is recommended to lose some data, instead of compromising other requirements such as
 *   latency or availability for new writes. The fallback is that users may be temporarily
 *   confused as they observe their own actions as not being immediately reflected.
 *   For example, they might change their skin or language preference but still get one or two
 *   page views afterward with the old settings. Or they might have published an edit and briefly
 *   not yet see it appear in their contribution history.
 *
 * ### Operational requirements
 *
 * These are the expectations a site administrator must meet for chronology protection:
 *
 * - If the application is run from multiple data centers, then you must designate one of them
 *   as the "primary DC". The primary DC is where the primary database is located, from which
 *   replication propagates to replica databases in that same DC and any other DCs.
 *
 * - Web requests that use the POST verb, or carry a `UseDC=master` cookie, must be routed to
 *   the primary DC only.
 *
 *   An exception is requests carrying the `Promise-Non-Write-API-Action: true` header,
 *   which use the POST verb for large read queries, but don't actually require the primary DC.
 *
 *   If you have legacy extensions deployed that perform queries on the primary database during
 *   GET requests, then you will have to identify a way to route any of their relevant URLs to the
 *   primary DC as well, or to accept that their reads do not enjoy chronology protection, and
 *   that writes may be slower (due to cross-dc latency).
 *   See [T91820](https://phabricator.wikimedia.org/T91820) for %Wikimedia Foundation's routing.
 *
 * @ingroup Database
 * @internal
 */
class ChronologyProtector implements LoggerAwareInterface {
	/** @var array Web request information about the client */
	private $requestInfo;
	/** @var string Secret string for HMAC hashing */
	private string $secret;
	private bool $cliMode;
	/** @var BagOStuff */
	private $store;
	/** @var LoggerInterface */
	protected $logger;

	/** @var string Storage key name */
	protected $key;
	/** @var string Hash of client parameters */
	protected $clientId;
	/** @var string[] Map of client information fields for logging */
	protected $clientLogInfo;
	/** @var int|null Expected minimum index of the last write to the position store */
	protected $waitForPosIndex;

	/** @var bool Whether reading/writing session consistency replication positions is enabled */
	protected $enabled = true;

	/** @var float|null UNIX timestamp when the client data was loaded */
	protected $startupTimestamp;

	/** @var array<string,DBPrimaryPos> Map of (primary server name => position) */
	protected $startupPositionsByPrimary = [];
	/** @var array<string,DBPrimaryPos> Map of (primary server name => position) */
	protected $shutdownPositionsByPrimary = [];
	/** @var array<string,float> Map of (DB cluster name => UNIX timestamp) */
	protected $startupTimestampsByCluster = [];
	/** @var array<string,float> Map of (DB cluster name => UNIX timestamp) */
	protected $shutdownTimestampsByCluster = [];

	/** @var float|null */
	private $wallClockOverride;

	/**
	 * Whether a clientId is new during this request.
	 *
	 * If the clientId wasn't passed by the incoming request, lazyStartup()
	 * can skip fetching position data, and thus LoadBalancer can skip
	 * its IDatabaseForOwner::primaryPosWait() call.
	 *
	 * See also: <https://phabricator.wikimedia.org/T314434>
	 *
	 * @var bool
	 */
	private $hasNewClientId = false;

	/** Seconds to store position write index cookies (safely less than POSITION_STORE_TTL) */
	public const POSITION_COOKIE_TTL = 10;
	/** Seconds to store replication positions */
	private const POSITION_STORE_TTL = 60;

	/** Lock timeout to use for key updates */
	private const LOCK_TIMEOUT = 3;
	/** Lock expiry to use for key updates */
	private const LOCK_TTL = 6;

	private const FLD_POSITIONS = 'positions';
	private const FLD_TIMESTAMPS = 'timestamps';
	private const FLD_WRITE_INDEX = 'writeIndex';

	/**
	 * @param BagOStuff|null $cpStash
	 * @param string|null $secret Secret string for HMAC hashing [optional]
	 * @param bool|null $cliMode Whether the context is CLI or not, setting it to true would disable CP
	 * @param LoggerInterface|null $logger
	 * @since 1.27
	 */
	public function __construct( $cpStash = null, $secret = null, $cliMode = null, $logger = null ) {
		$this->requestInfo = [
			'IPAddress' => $_SERVER['REMOTE_ADDR'] ?? '',
			'UserAgent' => $_SERVER['HTTP_USER_AGENT'] ?? '',
			// Headers application can inject via LBFactory::setRequestInfo()
			'ChronologyClientId' => null, // prior $cpClientId value from LBFactory::shutdown()
			'ChronologyPositionIndex' => null // prior $cpIndex value from LBFactory::shutdown()
		];
		$this->store = $cpStash ?? new EmptyBagOStuff();
		$this->secret = $secret ?? '';
		$this->logger = $logger ?? new NullLogger();
		$this->cliMode = $cliMode ?? ( PHP_SAPI === 'cli' || PHP_SAPI === 'phpdbg' );
	}

	private function load() {
		// Not enabled or already loaded, short-circuit.
		if ( !$this->enabled || $this->clientId ) {
			return;
		}
		$client = [
			'ip' => $this->requestInfo['IPAddress'],
			'agent' => $this->requestInfo['UserAgent'],
			'clientId' => $this->requestInfo['ChronologyClientId'] ?: null
		];
		if ( $this->cliMode ) {
			$this->setEnabled( false );
		} elseif ( $this->store instanceof EmptyBagOStuff ) {
			// Nowhere to store any DB positions and wait for them to appear
			$this->setEnabled( false );
			$this->logger->debug( 'Cannot use ChronologyProtector with EmptyBagOStuff' );
		}

		if ( isset( $client['clientId'] ) ) {
			$this->clientId = $client['clientId'];
		} else {
			$this->hasNewClientId = true;
			$this->clientId = ( $this->secret != '' )
				? hash_hmac( 'md5', $client['ip'] . "\n" . $client['agent'], $this->secret )
				: md5( $client['ip'] . "\n" . $client['agent'] );
		}
		$this->key = $this->store->makeGlobalKey( __CLASS__, $this->clientId, 'v4' );
		$this->waitForPosIndex = $this->requestInfo['ChronologyPositionIndex'];

		$this->clientLogInfo = [
			'clientIP' => $client['ip'],
			'clientAgent' => $client['agent'],
			'clientId' => $client['clientId'] ?? null
		];
	}

	public function setRequestInfo( array $info ) {
		if ( $this->clientId ) {
			throw new LogicException( 'ChronologyProtector already initialized' );
		}

		$this->requestInfo = $info + $this->requestInfo;
	}

	public function setLogger( LoggerInterface $logger ) {
		$this->load();
		$this->logger = $logger;
	}

	/**
	 * @return string Client ID hash
	 * @since 1.32
	 */
	public function getClientId() {
		$this->load();
		return $this->clientId;
	}

	/**
	 * @param bool $enabled Whether reading/writing session replication positions is enabled
	 * @since 1.27
	 */
	public function setEnabled( $enabled ) {
		$this->enabled = $enabled;
	}

	/**
	 * Yield client "session consistency" replication position for a new ILoadBalancer
	 *
	 * If the stash has a previous primary position recorded, this will try to make
	 * sure that the next query to a replica server of that primary will see changes up
	 * to that position by delaying execution.
	 * The delay may time out and allow stale data if no non-lagged
	 * replica servers are available.
	 *
	 * @internal This method should only be called from LBFactory.
	 *
	 * @param ILoadBalancer $lb
	 * @return DBPrimaryPos|null
	 */
	public function getSessionPrimaryPos( ILoadBalancer $lb ) {
		$this->load();
		if ( !$this->enabled ) {
			return null;
		}

		$cluster = $lb->getClusterName();
		$primaryName = $lb->getServerName( ServerInfo::WRITER_INDEX );

		$pos = $this->getStartupSessionPositions()[$primaryName] ?? null;
		if ( $pos instanceof DBPrimaryPos ) {
			$this->logger->debug( "ChronologyProtector will wait for '$pos' on $cluster ($primaryName)" );
		} else {
			$this->logger->debug( "ChronologyProtector skips wait on $cluster ($primaryName)" );
		}

		return $pos;
	}

	/**
	 * Update client "session consistency" replication position for an end-of-life ILoadBalancer
	 *
	 * This records the replication position of the primary DB if this request made writes to
	 * it using the provided ILoadBalancer instance.
	 *
	 * @internal This method should only be called from LBFactory.
	 *
	 * @param ILoadBalancer $lb
	 * @return void
	 */
	public function stageSessionPrimaryPos( ILoadBalancer $lb ) {
		$this->load();
		if ( !$this->enabled || !$lb->hasOrMadeRecentPrimaryChanges( INF ) ) {
			return;
		}

		$cluster = $lb->getClusterName();
		$masterName = $lb->getServerName( ServerInfo::WRITER_INDEX );

		if ( $lb->hasStreamingReplicaServers() ) {
			$pos = $lb->getPrimaryPos();
			if ( $pos ) {
				$this->logger->debug( __METHOD__ . ": $cluster ($masterName) position now '$pos'" );
				$this->shutdownPositionsByPrimary[$masterName] = $pos;
				$this->shutdownTimestampsByCluster[$cluster] = $pos->asOfTime();
			} else {
				$this->logger->debug( __METHOD__ . ": $cluster ($masterName) position unknown" );
				$this->shutdownTimestampsByCluster[$cluster] = $this->getCurrentTime();
			}
		} else {
			$this->logger->debug( __METHOD__ . ": $cluster ($masterName) has no replication" );
			$this->shutdownTimestampsByCluster[$cluster] = $this->getCurrentTime();
		}
	}

	/**
	 * Persist any staged client "session consistency" replication positions
	 *
	 * @internal This method should only be called from LBFactory.
	 *
	 * @param int|null &$clientPosIndex DB position key write counter; incremented on update
	 * @return DBPrimaryPos[] Empty on success; map of (db name => unsaved position) on failure
	 */
	public function persistSessionReplicationPositions( &$clientPosIndex = null ) {
		$this->load();
		if ( !$this->enabled ) {
			return [];
		}

		if ( !$this->shutdownTimestampsByCluster ) {
			$this->logger->debug( __METHOD__ . ": no primary positions data to save" );

			return [];
		}

		$scopeLock = $this->store->getScopedLock( $this->key, self::LOCK_TIMEOUT, self::LOCK_TTL );
		if ( $scopeLock ) {
			$positions = $this->mergePositions(
				$this->unmarshalPositions( $this->store->get( $this->key ) ),
				$this->shutdownPositionsByPrimary,
				$this->shutdownTimestampsByCluster,
				$clientPosIndex
			);

			$ok = $this->store->set(
				$this->key,
				$this->marshalPositions( $positions ),
				self::POSITION_STORE_TTL
			);
			unset( $scopeLock );
		} else {
			$ok = false;
		}

		$clusterList = implode( ', ', array_keys( $this->shutdownTimestampsByCluster ) );

		if ( $ok ) {
			$this->logger->debug( "ChronologyProtector saved position data for $clusterList" );
			$bouncedPositions = [];
		} else {
			// Maybe position store is down
			$this->logger->warning( "ChronologyProtector failed to save position data for $clusterList" );
			$clientPosIndex = null;
			$bouncedPositions = $this->shutdownPositionsByPrimary;
		}

		return $bouncedPositions;
	}

	/**
	 * Get the UNIX timestamp when the client last touched the DB, if they did so recently
	 *
	 * @internal This method should only be called from LBFactory.
	 *
	 * @param ILoadBalancer $lb
	 * @return float|false UNIX timestamp; false if not recent or on record
	 * @since 1.35
	 */
	public function getTouched( ILoadBalancer $lb ) {
		$this->load();
		if ( !$this->enabled ) {
			return false;
		}

		$cluster = $lb->getClusterName();

		$timestampsByCluster = $this->getStartupSessionTimestamps();
		$timestamp = $timestampsByCluster[$cluster] ?? null;
		if ( $timestamp === null ) {
			$recentTouchTimestamp = false;
		} elseif ( ( $this->startupTimestamp - $timestamp ) > self::POSITION_COOKIE_TTL ) {
			// If the position store is not replicated among datacenters and the cookie that
			// sticks the client to the primary datacenter expires, then the touch timestamp
			// will be found for requests in one datacenter but not others. For consistency,
			// return false once the user is no longer routed to the primary datacenter.
			$recentTouchTimestamp = false;
			$this->logger->debug( __METHOD__ . ": old timestamp ($timestamp) for $cluster" );
		} else {
			$recentTouchTimestamp = $timestamp;
			$this->logger->debug( __METHOD__ . ": recent timestamp ($timestamp) for $cluster" );
		}

		return $recentTouchTimestamp;
	}

	/**
	 * @return array<string,DBPrimaryPos>
	 */
	protected function getStartupSessionPositions() {
		$this->lazyStartup();

		return $this->startupPositionsByPrimary;
	}

	/**
	 * @return array<string,float>
	 */
	protected function getStartupSessionTimestamps() {
		$this->lazyStartup();

		return $this->startupTimestampsByCluster;
	}

	/**
	 * Load the stored replication positions and touch timestamps for the client
	 *
	 * @return void
	 */
	protected function lazyStartup() {
		if ( $this->startupTimestamp !== null ) {
			return;
		}

		$this->startupTimestamp = $this->getCurrentTime();

		// There wasn't a client id in the cookie so we built one
		// There is no point in looking it up.
		if ( $this->hasNewClientId ) {
			$this->startupPositionsByPrimary = [];
			$this->startupTimestampsByCluster = [];
			return;
		}

		$this->logger->debug( 'ChronologyProtector using store ' .
			get_class( $this->store ) );
		$this->logger->debug( "ChronologyProtector fetching positions for {$this->clientId}" );

		$data = $this->unmarshalPositions( $this->store->get( $this->key ) );

		$this->startupPositionsByPrimary = $data ? $data[self::FLD_POSITIONS] : [];
		$this->startupTimestampsByCluster = $data[self::FLD_TIMESTAMPS] ?? [];

		// When a stored array expires and is re-created under the same (deterministic) key,
		// the array value naturally starts again from index zero. As such, it is possible
		// that if certain store writes were lost (e.g. store down), that we unintentionally
		// point to an offset in an older incarnation of the array.
		// We don't try to detect or do something about this because:
		// 1. Waiting for an older offset is harmless and generally no-ops.
		// 2. The older value will have expired by now and thus treated as non-existing,
		//    which means we wouldn't even "see" it here.
		$indexReached = is_array( $data ) ? $data[self::FLD_WRITE_INDEX] : null;
		if ( $this->waitForPosIndex > 0 ) {
			if ( $indexReached >= $this->waitForPosIndex ) {
				$this->logger->debug( 'expected and found position index {cpPosIndex}', [
					'cpPosIndex' => $this->waitForPosIndex,
				] + $this->clientLogInfo );
			} else {
				$this->logger->warning( 'expected but failed to find position index {cpPosIndex}', [
					'cpPosIndex' => $this->waitForPosIndex,
					'indexReached' => $indexReached,
					'exception' => new \RuntimeException(),
				] + $this->clientLogInfo );
			}
		} else {
			if ( $indexReached ) {
				$this->logger->debug( 'found position data with index {indexReached}', [
					'indexReached' => $indexReached
				] + $this->clientLogInfo );
			}
		}
	}

	/**
	 * Merge the new replication positions with the currently stored ones (highest wins)
	 *
	 * @param array<string,mixed>|false $storedValue Current replication position data
	 * @param array<string,DBPrimaryPos> $shutdownPositions New replication positions
	 * @param array<string,float> $shutdownTimestamps New DB post-commit shutdown timestamps
	 * @param int|null &$clientPosIndex New
	 *   position write index
	 * @return array<string,mixed> Combined replication position data
	 */
	protected function mergePositions(
		$storedValue,
		array $shutdownPositions,
		array $shutdownTimestamps,
		?int &$clientPosIndex = null
	) {
		/** @var array<string,DBPrimaryPos> $mergedPositions */
		$mergedPositions = $storedValue[self::FLD_POSITIONS] ?? [];
		// Use the newest positions for each DB primary
		foreach ( $shutdownPositions as $masterName => $pos ) {
			if (
				!isset( $mergedPositions[$masterName] ) ||
				!( $mergedPositions[$masterName] instanceof DBPrimaryPos ) ||
				$pos->asOfTime() > $mergedPositions[$masterName]->asOfTime()
			) {
				$mergedPositions[$masterName] = $pos;
			}
		}

		/** @var array<string,float> $mergedTimestamps */
		$mergedTimestamps = $storedValue[self::FLD_TIMESTAMPS] ?? [];
		// Use the newest touch timestamp for each DB primary
		foreach ( $shutdownTimestamps as $cluster => $timestamp ) {
			if (
				!isset( $mergedTimestamps[$cluster] ) ||
				$timestamp > $mergedTimestamps[$cluster]
			) {
				$mergedTimestamps[$cluster] = $timestamp;
			}
		}

		$clientPosIndex = ( $storedValue[self::FLD_WRITE_INDEX] ?? 0 ) + 1;

		return [
			self::FLD_POSITIONS => $mergedPositions,
			self::FLD_TIMESTAMPS => $mergedTimestamps,
			self::FLD_WRITE_INDEX => $clientPosIndex
		];
	}

	/**
	 * @internal For testing only
	 * @return float UNIX timestamp
	 * @codeCoverageIgnore
	 */
	protected function getCurrentTime() {
		if ( $this->wallClockOverride ) {
			return $this->wallClockOverride;
		}

		$clockTime = (float)time(); // call this first
		// microtime() can severely drift from time() and the microtime() value of other threads.
		// Instead of seeing the current time as being in the past, use the value of time().
		return max( microtime( true ), $clockTime );
	}

	/**
	 * @internal For testing only
	 * @param float|null &$time Mock UNIX timestamp
	 * @codeCoverageIgnore
	 */
	public function setMockTime( &$time ) {
		$this->load();
		$this->wallClockOverride =& $time;
	}

	private function marshalPositions( array $positions ) {
		foreach ( $positions[ self::FLD_POSITIONS ] as $key => $pos ) {
			$positions[ self::FLD_POSITIONS ][ $key ] = $pos->toArray();
		}

		return $positions;
	}

	/**
	 * @param array|false $positions
	 * @return array|false
	 */
	private function unmarshalPositions( $positions ) {
		if ( !$positions ) {
			return $positions;
		}

		foreach ( $positions[ self::FLD_POSITIONS ] as $key => $pos ) {
			$class = $pos[ '_type_' ];
			$positions[ self::FLD_POSITIONS ][ $key ] = $class::newFromArray( $pos );
		}

		return $positions;
	}

	/**
	 * Build a string conveying the client and write index of the chronology protector data
	 *
	 * @param int $writeIndex
	 * @param int $time UNIX timestamp; can be used to detect stale cookies (T190082)
	 * @param string $clientId Client ID hash from ILBFactory::shutdown()
	 * @return string Value to use for "cpPosIndex" cookie
	 * @since 1.32 in LBFactory, moved to CP in 1.41
	 */
	public static function makeCookieValueFromCPIndex(
		int $writeIndex,
		int $time,
		string $clientId
	) {
		// Format is "<write index>@<write timestamp>#<client ID hash>"
		return "{$writeIndex}@{$time}#{$clientId}";
	}

	/**
	 * Parse a string conveying the client and write index of the chronology protector data
	 *
	 * @param string|null $value Value of "cpPosIndex" cookie
	 * @param int $minTimestamp Lowest UNIX timestamp that a non-expired value can have
	 * @return array (index: int or null, clientId: string or null)
	 * @since 1.32 in LBFactory, moved to CP in 1.41
	 */
	public static function getCPInfoFromCookieValue( ?string $value, int $minTimestamp ) {
		static $placeholder = [ 'index' => null, 'clientId' => null ];

		if ( $value === null ) {
			return $placeholder; // not set
		}

		// Format is "<write index>@<write timestamp>#<client ID hash>"
		if (
			!preg_match( '/^(\d+)@(\d+)#([0-9a-f]{32})$/', $value, $m )
		) {
			return $placeholder; // invalid
		}

		$index = (int)$m[1];
		if ( $index <= 0 ) {
			return $placeholder; // invalid
		} elseif ( isset( $m[2] ) && $m[2] !== '' && (int)$m[2] < $minTimestamp ) {
			return $placeholder; // expired
		}

		$clientId = ( isset( $m[3] ) && $m[3] !== '' ) ? $m[3] : null;

		return [ 'index' => $index, 'clientId' => $clientId ];
	}
}
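
// Usage sketch (illustrative only; not part of the upstream file).
// It shows how the two static cookie helpers above round-trip a "cpPosIndex"
// value. The client ID seed, write index and timestamps are made-up values
// chosen for the example; real values come from LBFactory.
//
//   $clientId = md5( 'example-client' ); // any 32-character lowercase hex string
//   $value = ChronologyProtector::makeCookieValueFromCPIndex( 3, 1700000000, $clientId );
//   // $value has the form "3@1700000000#<clientId>"
//
//   // Accept cookies written within the last POSITION_COOKIE_TTL seconds:
//   $minTimestamp = 1700000000 - ChronologyProtector::POSITION_COOKIE_TTL;
//   $info = ChronologyProtector::getCPInfoFromCookieValue( $value, $minTimestamp );
//   // $info is [ 'index' => 3, 'clientId' => $clientId ]
//
//   // A write time below $minTimestamp is treated as expired:
//   $stale = ChronologyProtector::getCPInfoFromCookieValue( $value, 1700000001 );
//   // $stale is [ 'index' => null, 'clientId' => null ]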