diff --git "a/include/class-mw\\API.php" "b/include/class-mw\\API.php" index 0806850..83adb86 100644 --- "a/include/class-mw\\API.php" +++ "b/include/class-mw\\API.php" @@ -1,329 +1,341 @@ . # MediaWiki namespace mw; use cli\Log; /** * Make HTTP request to a MediaWiki API */ class API extends \network\HTTPRequest { /** * MediaWiki tokens handler * * @var mw\Tokens */ private $tokens; /** * Default MediaWiki login username * * @var string */ static $DEFAULT_USERNAME; /** * Default MediaWiki login password * * @var string */ static $DEFAULT_PASSWORD; /** * Default MediaWiki API maxlag * * See https://www.mediawiki.org/wiki/Manual:Maxlag_parameter * * @var int */ static $DEFAULT_MAXLAG = 5; /** * Default MediaWiki API response format * * @var string */ static $DEFAULT_FORMAT = 'json'; /** * Inspect POST flag * * @var bool */ public static $INSPECT_BEFORE_POST = false; /** * Username used for the login. * * @var string */ private $username; /** * Constructor * * @param $api string API endpoint */ public function __construct( $api ) { parent::__construct( $api, [] ); $this->tokens = new Tokens( $this ); } /** * Create an API query with continuation handler * * @param $data array Data * @return mw\APIQuery */ public function createQuery( $data ) { return new APIQuery( $this, $data ); } /** * Effectuate an HTTP POST request but only after a login. * * @param $data array GET/POST data * @param $args array Internal arguments * @override \network\HTTPRequest#post() * @return mixed */ public function post( $data = [], $args = [] ) { if( !$this->isLogged() ) { $this->login(); } if( static::$INSPECT_BEFORE_POST ) { print_r( $data ); \cli\Input::askInput( "Press ENTER to submit" ); } return parent::post( $data, $args ); } /** * Effectuate an HTTP POST (multipart) request but only after a login. 
* * @param $data array Array of ContentDisposition(s) * @param $args array Internal arguments * @override \network\HTTPRequest#postMultipart() * @return mixed */ public function postMultipart( $data = [], $args = [] ) { if( !$this->isLogged() ) { $this->login(); } if( static::$INSPECT_BEFORE_POST ) { print_r( $data ); \cli\Input::askInput( "Press ENTER to submit" ); } return parent::postMultipart( $data, $args ); } /** * Fetch response * * @param $data array GET/POST data * @param $args array Internal arguments * @return mixed */ public function fetch( $data = [], $args = [] ) { if( [] === $data ) { throw new \InvalidArgumentException( 'empty data' ); } return parent::fetch( $data, $args ); } /** * Preload some tokens * * @return self */ public function preloadTokens( $tokens ) { $this->tokens->preload( $tokens ); return $this; } /** * Get the value of a token * * @param $token string Token name * @return string Token value */ public function getToken( $token ) { return $this->tokens->get( $token ); } /** * Invalidate a token * * @param $token string Token name * @return self */ public function invalidateToken( $token ) { $this->tokens->invalidate( $token ); return $this; } /** * Get the username used for the login * * @return string|null */ public function getUsername() { return $this->username; } /** * Check if it's already logged in. * * @return bool */ public function isLogged() { return $this->getUsername() !== null; } /** * Log in to MediaWiki using a username/password pair. * * Yes, I'm talking about a bot password. * * @param string $username Username * @param string $password Password * @return self */ public function login( $username = null, $password = null ) { // Can use a default set of credentials if( ! $username && ! $password ) { if( $this->isLogged() ) { return $this; } $username = self::$DEFAULT_USERNAME; $password = self::$DEFAULT_PASSWORD; } // no password no party if( !$username || !$password ) { throw new \Exception( sprintf( 'you must call %1$s#login( $username, $password ) or '. 
'set %1$s::$DEFAULT_USERNAME and %1$s::$DEFAULT_PASSWORD ' . 'before trying to login', __CLASS__ ) ); } // keep track of the login Log::info( "login with username '$username'" ); // Login $response = parent::post( [ 'action' => 'login', 'lgname' => $username, 'lgpassword' => $password, 'lgtoken' => $this->getToken( Tokens::LOGIN ), ], [ // do not show this sensitive data in cleartext in the log 'sensitive' => true, ] ); // no success no party // TODO: create ExceptionLoginFailed and pass $response to it if( !isset( $response->login->result ) || $response->login->result !== 'Success' ) { print_r( $response ); throw new \Exception("login failed"); } // remember the username $this->username = $response->login->lgusername; return $this; } /** * Filters the data before using it. * * Array elements are imploded by a pipe * NULL values are unset * * @override network\HTTPRequest::onDataReady() * @param $data array GET/POST data * @return array */ protected function onDataReady( $data ) { // Some default values $data = array_replace( [ 'maxlag' => self::$DEFAULT_MAXLAG, 'format' => self::$DEFAULT_FORMAT, ], $data ); foreach( $data as $k => $v ) { if( null === $v ) { unset( $data[ $k ] ); } elseif( is_array( $v ) ) { // remove duplicates (API netiquette) $v = array_unique( $v ); // index alphabetically (API netiquette) sort( $v, SORT_STRING ); $data[ $k ] = implode( '|', $v ); } } if( $this->isLogged() ) { $data = array_replace( [ 'assertuser' => $this->getUsername(), ], $data ); } return $data; } /** * JSON decode and check formal API errors * * @param $response mixed Response * @param $request_data array GET/POST request data + * @param $method string HTTP Method 'GET'/'POST' * @override \network\HTTPRequest#onFetched() * @throws \mw\API\Exception */ - protected function onFetched( $response_raw, $request_data ) { + protected function onFetched( $response_raw, $request_data, $method ) { $response = json_decode( $response_raw ); if( null === $response ) { Log::debug( 
$response_raw ); throw new \Exception( 'response is not JSON-encoded' ); } if( isset( $response->warnings ) ) { foreach( $response->warnings as $subject => $warning ) { Log::warn( sprintf( '%s: %s', $subject, $warning->{'*'} ) ); } } if( isset( $response->error ) ) { $exception = API\Exception::createFromApiError( $response->error ); if( $exception instanceof API\MaxLagException ) { // retry after some time when server lags Log::warn( "Lag! ({$this->api}) {$response->error->info}" ); - $response = $this->fetch( $request_data, [ + $args = [ 'wait-anti-dos' => true, - ] ); + ]; + + if( $method === 'POST' ) { + + $response = $this->post( $request_data, $args ); + + } else { + + $response = $this->fetch( $request_data, $args ); + + } + } else { throw $exception; } } return $response; } } diff --git "a/include/class-network\\HTTPRequest.php" "b/include/class-network\\HTTPRequest.php" index c6ad2a7..32cfd86 100644 --- "a/include/class-network\\HTTPRequest.php" +++ "b/include/class-network\\HTTPRequest.php" @@ -1,585 +1,588 @@ . # Network namespace network; use cli\Log; use InvalidArgumentException; /** * HTTP request handler for GET and POST requests. */ class HTTPRequest { /** * Class version * * To be incremented every time do you notice this number. * * @var string */ const VERSION = 0.7; /** * Source code URL * * @var string */ const REPO = 'https://gitpull.it/source/boz-mw/'; /** * Wait seconds before each GET request. * * @var float */ public static $WAIT = 0.2; /** * Wait seconds before each POST request * * @var float */ public static $WAIT_POST = 0.2; /** * Seconds to wait before each server error * * Well, do not try to not denial of service the webserver. * * @var float */ public static $WAIT_ANTI_DOS = 5.0; /** * Additional seconds to wait for each retry * * @var float */ public static $WAIT_ANTI_DOS_STEP = 1.5; /** * Number of requests done because of server errors * * When it's over self::$MAX_RETRIES, the script dies. 
* * @var int */ private $retries = 0; /** * Maximum number of retries before quitting */ public static $MAX_RETRIES = 8; /** * Full HTTP URL to the API endpoint * * @var string */ protected $api; /** * HTTP GET/POST data * * @var array */ private $data; /** * Internal arguments * * @param array */ private $args; /** * Latest HTTP response headers * * They are indexed by lowercase header name. * * @var array */ private $latestHttpResponseHeaders = []; /** * Latest HTTP error status code * * @var Status */ private $latestHttpResponseStatus; /** * HTTP cookies * * @var array */ private $cookies = []; /** * Constructor. * * @param $api string API endpoint * @param $args array Internal arguments */ public function __construct( $api, $args = [] ) { $this->api = $api; $this->setArgs( $args ); } /** * Statical constructor. * * @return network\HTTPRequest */ public static function factory( $api, $args = [] ) { return new self( $api, $args ); } /** * Get internal arguments. * * @return array */ public function getArgs() { return $this->args; } /** * Set internal arguments * * method: GET, POST etc. 
* user-agent: HTTP user agent * sensitive: flag to indicate if the data is sensitive and should not be printed as usual in the log * wait: seconds to wait before every GET request * wait-post: seconds to wait before every POST request * headers: HTTP headers * * @param $args array Internal arguments * @return self */ public function setArgs( $args ) { $this->args = array_replace( [ 'method' => 'GET', 'user-agent' => sprintf( 'boz-mw HTTPRequest.php/%s %s', self::VERSION, self::REPO ), 'sensitive' => false, 'multipart' => false, 'wait' => self::$WAIT, 'wait-post' => self::$WAIT_POST, 'headers' => [], ], $args ); return $this; } /** * Make an HTTP request (GET by default) * * @param $data array GET/POST data * @param $args array Internal arguments * @return mixed Response */ public function fetch( $data = [], $args = [] ) { // merge the default arguments with the specified ones (the latter take priority) $args = array_replace( $this->args, $args ); // Post-process the data, if needed, before using it $data = static::onDataReady( $data ); // HTTP query using file_get_contents() $url = $this->api; $context = [ 'http' => [], ]; + + // GET or POST $context['http']['method'] = $args['method']; // populate the User-Agent if( $args['user-agent'] ) { $args['headers'][] = self::header( 'User-Agent', $args['user-agent'] ); } // well, we support Cookie if( $this->haveCookies() ) { $args['headers'][] = $this->getCookieHeader(); } $query = ''; switch( $args['method'] ) { case 'POST': case 'PUT': // populate the content context // the multipart has a data boundary if( $args['multipart'] ) { // get the request body aggregating the content dispositions generating a safe boundary $query = ContentDisposition::aggregate( $data, $boundary ); // override the content type and set the boundary $args['content-type'] = "multipart/form-data; boundary=$boundary"; } else { // normal POST/PUT request $query = http_build_query( $data ); } $context['http']['content'] = $query; $args['headers'][] = 
self::header( 'Content-Type', $args['content-type'] ); Log::sensitive( "{$args['method']} $url $query", "{$args['method']} $url" ); break; case 'GET': case 'HEAD': $query = http_build_query( $data ); $url .= "?$query"; Log::debug( "{$args['method']} $url" ); break; } if( $args['headers'] ) { $context['http']['header'] = self::implodeHTTPHeaders( $args['headers'] ); } // if needed, preserve server resources to avoid a denial of service if( isset( $args['wait-anti-dos'] ) ) { // I hope you will see this amazing warning if( $this->retries >= self::$MAX_RETRIES ) { Log::error( "stop riding a dead horse: this server ({$this->api}) is burning ¯\_(ツ)_/¯" ); exit( 1 ); } // set a base wait time and increase it on each step $args['wait'] = self::$WAIT_ANTI_DOS; $args['wait'] += self::$WAIT_ANTI_DOS_STEP * $this->retries; $this->retries++; Log::warn( sprintf( "wait and retry (%d of %d)", $this->retries, self::$MAX_RETRIES ) ); } // wait before executing the query if( $args['wait'] ) { Log::debug( sprintf( "waiting %.2f seconds", $args['wait'] ) ); usleep( $args['wait'] * 1000000 ); } // build context for the shitty but handy file_get_contents() $stream_context = stream_context_create( $context ); // suppress warnings about error 500 (handled later) $response = @file_get_contents( $url, false, $stream_context ); // here $http_response_header should magically exist but sometimes it does not if( !isset( $http_response_header ) ) { throw new MissingResponseHeadersException( $url, $stream_context ); } // load the response headers $this->loadHTTPResponseHeaders( $http_response_header ); // Check the HTTP status $status = $this->getLatestHTTPResponseStatus(); if( ! $status->isOK() ) { if( $status->isServerError() ) { // oh nose! 
Log::error( sprintf( "Houston, we have the code %s: %s", $status->getCode(), $status->getMessage() ) ); // override the upstream wait $args = array_replace( $args, [ 'wait-anti-dos' => true, ] ); return $this->fetch( $data, $args ); } throw new NotOKException( $status ); } Log::debug( $status->getHeader() ); - return static::onFetched( $response, $data ); + return static::onFetched( $response, $data, $args['method'] ); } /** * Perform an HTTP POST. * * @param $data array POST data * @param $args array Internal arguments * @return mixed Response */ public function post( $data = [], $args = [] ) { $args = array_replace( // low-priority arguments [ 'wait' => $this->args['wait-post'], 'content-type' => 'application/x-www-form-urlencoded', ], // medium priority arguments $args, // high priority arguments [ 'method' => 'POST' ] ); return $this->fetch( $data, $args ); } /** * Perform an HTTP POST with multipart (suitable for files) * * A useful reference: * https://stackoverflow.com/a/4247082 * * @param array $data POST data to be converted into content dispositions * @param array $args Internal arguments * @return mixed Response */ public function postMultipart( $data = [], $args = [] ) { $args = array_replace( $args, [ 'multipart' => true, 'content-type' => null, // will be filled later ] ); return $this->post( $data, $args ); } /** * Get the latest HTTP response headers * * They will be indexed by lowercase HTTP header name. 
* * @return array */ public function getLatestHTTPResponseHeaders() { return $this->latestHttpResponseHeaders; } /** * Get latest HTTP status * * @return Status */ private function getLatestHTTPResponseStatus() { return $this->latestHttpResponseStatus; } /** * Check if cookies are set * * @return bool */ public function haveCookies() { return ! empty( $this->cookies ); } /** * Get cookies * * @return array */ public function getCookies() { return $this->cookies; } /** * Set an HTTP cookie name => value pair. * * @param $name string Cookie name * @param $value string Cookie value * @return self */ public function setCookie( $name, $value ) { $this->cookies[ $name ] = $value; return $this; } /** * Get the 'Cookie' HTTP header * * @return string */ public function getCookieHeader() { $cookies = []; foreach( $this->getCookies() as $name => $value ) { $cookies[] = urlencode( $name ) . '=' . urlencode( $value ); } return self::header( 'Cookie', implode( '; ', $cookies ) ); } /** * Get an array of all the HTTP cookies * * @return array */ public function getHTTPCookies() { return $this->cookies; } /** * Set an HTTP cookie in its raw form. * * @param $cookie string Cookie text */ private function setRawCookie( $cookie ) { $parts = explode(';', $cookie); if( isset( $parts[0] ) ) { $name_value = explode('=', $parts[0] ); if( 2 === count( $name_value ) ) { list( $name, $value ) = $name_value; $this->setCookie( $name, $value ); } } } /** * Load HTTP response headers filling cookies * * The 'Set-Cookie' response header is analyzed. 
*/ private function loadHTTPResponseHeaders( $http_response_headers ) { // parse the HTTP response headers and save the last headers and the last status list( $this->latestHttpResponseHeaders, $this->latestHttpResponseStatus ) = self::parseHTTPResponseHeaders( $http_response_headers ); // parse each cookie (the header names are always stored in lowercase) if( isset( $this->latestHttpResponseHeaders['set-cookie'] ) ) { foreach( $this->latestHttpResponseHeaders['set-cookie'] as $cookie ) { $this->setRawCookie( $cookie ); } } } /** * Implode HTTP headers with CRLF * * @param array $headers * @return string|null */ public static function implodeHTTPHeaders( $headers ) { $s = ''; if( ! $headers ) { return null; } foreach( $headers as $header ) { $s .= "$header\r\n"; } return $s; } /** * Group HTTP headers by keys and get the HTTP Status. * * Note that the keys will always be lowercase e.g. 'set-cookie'. * * @param array $http_response_headers * @return array The first element contains an associative array of header name and value(s). The second one contains the Status. 
*/ private static function parseHTTPResponseHeaders( $http_response_headers ) { $status = null; $headers = []; foreach( $http_response_headers as $header ) { // check if it's a header like 'Foo: bar' $header_parts = explode(':', $header, 2); if( 2 === count( $header_parts ) ) { list( $name, $value ) = $header_parts; // the header names must be considered case-insensitive $name = strtolower( $name ); if( !isset( $headers[ $name ] ) ) { $headers[ $name ] = []; } $headers[ $name ][] = ltrim( $value ); } else { try { $status = Status::createFromHeader( $header ); } catch( InvalidArgumentException $e ) { $headers[ $header ] = true; } } } // wtf if( null === $status ) { throw new Exception( "HTTP response without an HTTP status code" ); } return [ $headers, $status ]; } /** * Build a sanitized HTTP header string from a name => value pair * * @param string $name HTTP header name * @param string $value HTTP header value * @return string HTTP header */ public static function header( $name, $value ) { return self::headerRaw( sprintf( '%s: %s', $name, $value ) ); } /** * Sanitize a single header * * As you know, a header must not contain a line feed or a carriage return. * * @param $header string HTTP header * @return string HTTP header */ private static function headerRaw( $header ) { if( false !== strpos( $header, "\n" ) || false !== strpos( $header, "\r" ) ) { Log::warn( "wtf header with line feed or carriage return (header injection?)" ); Log::debug( $header ); return str_replace( [ "\n", "\r" ], '', $header ); } return $header; } /** * Can be overridden to post-process the data before its use * * @param $data array GET/POST data * @return array GET/POST data post-processed */ protected function onDataReady( $data ) { return $data; } /** * Callback to be overridden * * This is called every time something is fetched. 
* * @param $response mixed Response * @param $request_data mixed GET/POST data + * @param $method string HTTP Method 'GET'/'POST' * @return mixed Response */ - protected function onFetched( $response, $request_data ) { + protected function onFetched( $response, $request_data, $method ) { return $response; } }
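
The change above forwards the HTTP method to onFetched() so that a retry after a lag error repeats a POST as a POST instead of falling back to GET. The following is only a minimal sketch of that wiring, not the actual boz-mw code: the class `SketchRequest`, its `$calls` and `$lagsLeft` properties and the string responses 'lag'/'ok' are hypothetical stand-ins, and no network request is made.

```php
<?php
// Hypothetical stand-in for network\HTTPRequest: no network I/O, only the callback wiring.
class SketchRequest {

	/** @var array HTTP methods used so far, in call order */
	public $calls = [];

	/** @var int How many requests the fake server still answers with a lag error */
	public $lagsLeft = 1;

	public function fetch( $data = [], $args = [] ) {
		$args = array_replace( [ 'method' => 'GET' ], $args );
		$this->calls[] = $args['method'];

		// pretend the server lags for the first $lagsLeft requests
		$response = $this->lagsLeft-- > 0 ? 'lag' : 'ok';

		// forward the method to the callback, as the patched fetch() does
		return $this->onFetched( $response, $data, $args['method'] );
	}

	public function post( $data = [], $args = [] ) {
		// the method is a high-priority argument and always wins
		return $this->fetch( $data, array_replace( $args, [ 'method' => 'POST' ] ) );
	}

	protected function onFetched( $response, $request_data, $method ) {
		if( $response !== 'lag' ) {
			return $response;
		}

		// retry with the same method that was originally used
		return $method === 'POST'
			? $this->post( $request_data )
			: $this->fetch( $request_data );
	}
}

$request = new SketchRequest();
$result  = $request->post( [ 'action' => 'edit' ] );
```

In this sketch both recorded calls use POST. Without the `$method` argument the retry would go through fetch() with the default method and be sent as GET.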