Posts Tagged ‘Crawler’

Master Blogging and Altitis Birthday

So after a significant slowdown to my posting activities, this is my 300th post on Altitis. Incidentally, the blog is also 1 year (and 10 days) old now.

Before moving on towards 400, let’s take the opportunity to review some facts both interesting and trivial about this place:

  • Collectively, my Damage Meter benchmarking series is what interested most readers, attracting slightly over 10,000 pageviews over time. While I can’t make any promises, I intend to get back to these “soon” to check where we stand now that the landscape has stabilized and the new combat log feature is almost ironed out.
  • My Parrot review remains the most popular post not part of a series, followed quite closely by my CowTip review.

Interestingly enough, given that the WoW blogosphere always makes a point of mentioning this kind of thing, none of the above has ever been mentioned by wowinsider or, for that matter, linked to from other blogs. The readers all come in through search engines, 98% of them from Google.

A quick review of phpbb3 combined with a mention of wowdb comes next in popularity, although I suspect most visitors to that page leave disappointed. From the search terms used, visitors were mainly interested in phpbb3 wow themes, not my short review & ramblings. Well, for wow-themed phpbb3 styles, here’s a short list:

There’s likely to be more out there if you want to google around but the above sampling should give you a good starting point.

My two most popular rants are tied to the Ghostwolf nerf, and I have mainly Mania to thank for that, as most viewers to these pages come from her blog.

One of my oldest theory posts still attracts a decent number of viewers every day: the second one in the Defense Theory series, which explains how PvE defense works, in particular for tanks.

Now for some other interesting or odd stats:

  • Last week, Altitis ranked second in Google for clicked queries on wrath talent trees (in fact I’m still second as I write this). There’s definitely a hunger for information on the matter out there. Unfortunately for visitors looking for this kind of information, what they get here is my post on how I believe it is too early to engage in in-depth discussions about wrath talents.
  • Some people are apparently still interested in my clumsy attempts to write my own armory crawler in PHP.
  • To the three people looking for Stop the Warrior: although we frequently comment on each other’s posts and sometimes shout out to (or at) each other, his blog is over there. And while we’re at it, his GM, who holds a (probably deserved) bad opinion of me, has her own blog as well, and if you’re interested in insights into how guild management works in a serious raiding environment, you should have her on your blogroll. No excuses, go subscribe now.
  • What gives honor in AV? Killing other people of course, but also burning towers, holding onto towers until the end of the match, killing the opposing Captain (that’s either Galvangar or Balinda depending on your faction), protecting your own Captain until the end of the game, and killing the enemy General.
  • Armchair from treehugger: dunno what you were looking for, but it sounds hurtful.
  • Casserole FFXI: sounds tasty
  • Cheese Conspiracy Theory: Yes, the good old mystery about the Darnassian Bleu still hasn’t been solved.

While there are a lot of additional sassy keywords in here, this is probably enough self-congratulation for a single post. As always, allow me to thank every one of you for reading and commenting on Altitis; it’s your silent or outspoken presence which gives this blog a reason to be.

On Similar Matters

The need for better Arena Statistics – Armory Crawler v2 in design

Relmstein stirred an interesting debate in his last two posts and gave me the kick I needed to get back to work in earnest on improving my Arena Ladder crawler.

In his first post, Relmstein made some pointed remarks about the reasons for the popularity of the warrior / pallie combo in arena. Based on the numbers I had gathered with the first version of the Crawler, I did however dispute part of the premise, namely that warrior / pallie combos are over-represented to a significant degree in the top arena teams.

In his second post, he grabbed some numbers based on the top 5 in a couple of US battlegroups which added credence to that same premise. A quick verification on one of these however showed that a shaman was present as the third man in every instance where a pallie and warrior duo were on the roster for more than 10% of the season’s games.

Beyond nitpicking on the interpretation of these small samples, though, I’m feeling the same dissatisfaction with the generally accepted conventional wisdom as the one that led me to begin a crawler in the first place (back then it was a claim that locks were massively over-represented in the top teams), that is, not enough hard numbers.

In my next recode, I’ll be grabbing more data for a bit more detailed analysis. This is what I plan to gather, still for top 100 teams per snapshot:
- Team composition, Rating, total games in season
- Member class, total points per talent tree, games played in season, and team rank in all three brackets

This should allow me to produce the following figures:

  • The old class and build statistics, this time discarding new (or bought-in) players; I’m thinking of setting the threshold so that a player has to have attended 10% of the team’s games in order to be considered
  • Team composition figures (I could actually extract part of those from my old data, but I’m not satisfied with the way I stored them)
  • Player competitiveness in the other brackets
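
As a rough illustration, the 10% attendance threshold could be applied along these lines. This is a minimal sketch: the function name `filter_regulars`, the array layout, and the roster itself are made up for the example and are not the crawler's actual data model.

```php
<?php
// Hypothetical helper: keep only members who played at least
// $threshold (default 10%) of the team's games this season.
function filter_regulars(array $members, $teamGames, $threshold = 0.10)
{
    $regulars = array();
    foreach ($members as $member) {
        // A team with no games yet has no meaningful attendance figure.
        if ($teamGames > 0 && $member['games'] / $teamGames >= $threshold) {
            $regulars[] = $member;
        }
    }
    return $regulars;
}

// Made-up roster: the third player joined (or bought in) late.
$roster = array(
    array('name' => 'A', 'class' => 'Warrior', 'games' => 180),
    array('name' => 'B', 'class' => 'Paladin', 'games' => 150),
    array('name' => 'C', 'class' => 'Mage',    'games' => 5),
);
$regulars = filter_regulars($roster, 200);
echo count($regulars), " of ", count($roster), " members counted\n";
```

With a team total of 200 games, the player with 5 games falls under the threshold and is left out of the class and build statistics.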

If you see anything else worth recording, this is the moment to chime in. Anything related to gear is, however, something I won’t consider; I believe that’s slightly beyond the scope of what I’m trying to achieve.

Armory Crawler – future plans

After my first series of runs and my recent hiatus, here are my plans for the armory crawler:

  • Rewrite a V2 with a better data model for storage
  • Include US BattleGroups
  • Get a couple of late Season 2 Snapshots
  • Harvest early Season 3 results.

I do expect some differences in class composition between both seasons, especially in 5v5, what with the practice of some top-ranked teams who sell spots to Joe Randoms for easy points and / or mounts. I expect that in the third week of S3 we will start to see the true top team compositions in that bracket emerge. In any case, time will tell.

Stay tuned for future updates on the matter.

Coding an Armory Crawler in PHP – basic HOWTO

Upon special request, a few notes on how I built my crawler.
What I’m using: a stock XAMPP package (for Windows in my case), containing

  • PHP 5: I wanted 5 simply because it contains SimpleXML, which makes parsing easy and straightforward
  • MySQL 5: that one was in the package, I don’t think v4 would have made any difference
  • Initially nothing else.

In my latest recode I finally managed to find out how to activate cURL in PHP.
The biggest difficulty I had was getting the Armory to send me back XML data instead of a formatted web page. From what little understanding I have, modern browsers are considered to have all the necessary extensions to run the AJAX code locally in order to display the Armory – in that case Blizzard only sends you the page data, and the rendering is done on your own machine (I hope I did get that right). On older or unknown browsers, however, the page rendering is done on the Armory server and you are sent a formatted HTML page – which isn’t what you want.
To determine the browser, the Armory will look at your User Agent. This can be set either in code or in the php.ini file.
An important note if you’re just starting out: XAMPP (and, I expect, other PHP installations as well) has a php.ini in the \php subdirectory of your web server tools, which you can edit to your heart’s content once you’ve started your server for the first time… without any results. I expect this one is the template used to build the real php.ini, which resides under your \apache directory. This is why I couldn’t get cURL to work for several days.
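
One way to avoid editing the wrong file is to ask PHP which php.ini it actually loaded. The sketch below uses only built-in functions; the wrapper name `php_config_report` is made up for the example.

```php
<?php
// Report which php.ini PHP actually loaded and whether cURL is available.
function php_config_report()
{
    return array(
        // Full path of the loaded php.ini, or false if none was loaded.
        'loaded_ini'  => php_ini_loaded_file(),
        // True once the cURL extension is enabled in that file.
        'curl_loaded' => extension_loaded('curl'),
        // Current default user agent for PHP's own HTTP wrapper.
        'user_agent'  => ini_get('user_agent'),
    );
}

$report = php_config_report();
var_dump($report);
```

The check is best run as a page served by the web server rather than from the command line, since the two may load different php.ini files.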

Told you I’m a noob.

Anyway, there are three ways you can “fake” your user agent so that the Armory believes you’re a modern browser:
- In php.ini (the default is set to PHP and the version number)
- In your code, you can use the below:

ini_set('user_agent', '[a modern user agent string]');

- If you’re using cURL, you can pass it in a cURL session with curl_setopt:

$myvar = curl_init();
curl_setopt($myvar, CURLOPT_USERAGENT, '[a modern user agent string]');

As for the user agent string, you could use a recent Firefox one, like this one:
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6". If you want others, there’s a good list here.

Once you have that, the rest is a matter of browsing through the various Armory pages, which is simply done by pointing to a valid URL.

In my case, the way I’m doing this is as follows:

  • I grab 5 arena ladder pages (which gives me the top 100)
  • Using SimpleXML, I parse out the URL of each individual team
  • I store these in a temporary table
  • In a second step (in order to limit timeout), I go “browse” these Team URLs one by one
  • I fetch the class composition and the URL to each player’s individual character sheet
  • I browse through the char sheet and get the Talent Trees
  • I store team rank, class and build type (not all Trees, just a type classification)
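
The two-step approach in the list above (queue the team URLs first, visit them later in small batches) might be sketched as follows. The XML is a made-up fragment: only `arenaTeams`, `arenaTeam` and the `name` attribute appear in this post, while the `page` root, the `url` and `ranking` attributes, and both function names are assumptions for illustration. A plain array stands in for the temporary table.

```php
<?php
// Made-up ladder page. Only arenaTeams/arenaTeam and the "name" attribute
// come from the post; "url" and "ranking" are assumed for illustration.
$ladderXml = <<<XML
<page>
  <arenaTeams>
    <arenaTeam name="Team One" ranking="1" url="team-one"/>
    <arenaTeam name="Team Two" ranking="2" url="team-two"/>
    <arenaTeam name="Team Three" ranking="3" url="team-three"/>
  </arenaTeams>
</page>
XML;

// Step 1: collect the team URLs from a ladder page into a queue
// (an array here, standing in for the temporary table).
function queue_teams($xmlstr, array $queue)
{
    $xml = new SimpleXMLElement($xmlstr);
    foreach ($xml->arenaTeams->arenaTeam as $team) {
        $queue[] = array(
            'rank' => (int) $team['ranking'],
            'url'  => (string) $team['url'],
        );
    }
    return $queue;
}

// Step 2: take a limited batch off the queue per run, so that a single
// script execution stays well within the time limit.
function next_batch(array &$queue, $size)
{
    return array_splice($queue, 0, $size);
}

$queue = queue_teams($ladderXml, array());
$batch = next_batch($queue, 2);
echo count($batch), " teams to visit now, ", count($queue), " left for later\n";
```

Splitting the work this way means a timeout only costs the current batch, since the remaining URLs are still waiting in the queue.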

The rest is done by my still to be improved statistics code.

To “browse” to the various Armory URLs, there are two approaches:
Without cURL:

$myvar = fopen($myURL, 'r');
$xmlstr = stream_get_contents($myvar);
fclose($myvar);

That one takes just three lines but apparently a lot of processing time.
Using cURL:

$myvar = curl_init();
curl_setopt($myvar, CURLOPT_USERAGENT, '[a modern user agent string]');
curl_setopt($myvar, CURLOPT_URL, $myURL);
curl_setopt($myvar, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
$xmlstr = curl_exec($myvar);
curl_close($myvar);

Which is more code but quite a bit faster. In both cases, you can then parse $xmlstr with whatever method your PHP release allows for – if you have PHP 5, SimpleXML is the simplest way to do it, since all relevant data is actually contained in the XML attributes:

$xml = new SimpleXMLElement($xmlstr);

This will give you $xml as an object (at least I think so, as I said before, I’m a noob coder), where the various attributes can be accessed simply by writing, for instance,

$myattribute = $xml->arenaTeams->arenaTeam['name'];

Of course, you’ll want to study the XML of the particular Armory data you’re looking for in order to extract whatever you need, but you get the gist of it.
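
Two details of SimpleXML attribute access are easy to trip over, so here is a small sketch using a made-up fragment (only `arenaTeams`, `arenaTeam` and `name` come from this post; the `page` root is an assumption): the attribute comes back as an object rather than a string, and without a loop only the first matching element is read.

```php
<?php
// Made-up fragment; only arenaTeams/arenaTeam/name appear in the post.
$xml = new SimpleXMLElement(
    '<page><arenaTeams>'
    . '<arenaTeam name="Team One"/><arenaTeam name="Team Two"/>'
    . '</arenaTeams></page>'
);

// Without an index, this reads the attribute of the first arenaTeam only.
$first = $xml->arenaTeams->arenaTeam['name'];

// The value is a SimpleXMLElement object; cast it before storing or comparing.
$firstName = (string) $first;

// Looping visits every arenaTeam element in turn.
$names = array();
foreach ($xml->arenaTeams->arenaTeam as $team) {
    $names[] = (string) $team['name'];
}

echo $firstName, "\n";
echo implode(', ', $names), "\n";
```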

EDIT: Changed link to the user agent strings listing, as the site appears dead

More Arena Composition Statistics

Context: Data sampled tonight, 20h after arena reset. Limitations of the previous sample still apply.
Without changing the Crawler code, I collected more ladders. The data collected is reproduced below by class and build, expressed in percentage of the total amount of team members surveyed.

Since there’s still quite a bit of automation I need to code, the result isn’t yet as reader-friendly as I want it to be. The Build Type numbers correspond to the order of the talent trees for their respective classes, e.g. for warrior, Build 1 is Arms, Build 2 is Fury, Build 3 is Prot.
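
For reference, turning head counts into percentages of all surveyed members could look like the sketch below. The function name `build_shares` is made up; the input is the Warrior breakdown of the Blackout EU 2v2 sample (29 warriors out of 223 players).

```php
<?php
// Convert per-build head counts into percentages of all surveyed members,
// rounded to one decimal as in the tables.
function build_shares(array $counts, $totalMembers)
{
    $shares = array();
    foreach ($counts as $build => $count) {
        $shares[$build] = round(100 * $count / $totalMembers, 1);
    }
    // Class total is computed from the raw counts, not from rounded shares.
    $shares['Class Total'] = round(100 * array_sum($counts) / $totalMembers, 1);
    return $shares;
}

// Warrior counts from the Blackout EU 2v2 sample (223 players surveyed).
$warrior = array('Hybrids' => 1, 'Build 1' => 25, 'Build 2' => 2, 'Build 3' => 1);
$shares = build_shares($warrior, 223);
foreach ($shares as $column => $share) {
    echo $column, ": ", $share, "%\n";
}
```

This reproduces the Warrior row of the Blackout 2v2 table (0.4%, 11.2%, 0.9%, 0.4%, 13%). It also shows where the rounding caveat comes from: the four rounded build shares sum to 12.9%, while the class total computed from the raw counts rounds to 13%.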

EU Blackout, 5v5, 661 members, 92 teams

Class     Hybrids   Build 1   Build 2   Build 3   Class Total
Warrior   0.9%      12.1%     2.1%      0.8%      15.9%
Paladin   1.2%      15%       0.5%      0.5%      17.1%
Priest    0.8%      0.8%      7.1%      4.8%      13.5%
Rogue     0.2%      3.2%      3.9%      0.5%      7.7%
Hunter    0%        1.8%      1.5%      0%        3.3%
Druid     0.3%      1.1%      0.9%      2.7%      5%
Mage      0.8%      0.8%      1.7%      9.7%      12.9%
Shaman    0.5%      4.5%      0.8%      6.2%      12%
Warlock   0.2%      6.8%      5.1%      0.6%      12.7%

EU Cataclysme (FR), 5v5, 596 members, 100 teams

Class     Hybrids   Build 1   Build 2   Build 3   Class Total
Warrior   0.5%      13.3%     0.7%      0.8%      15.3%
Paladin   0.5%      15.9%     0%        0.3%      16.8%
Priest    0.7%      0.8%      10.2%     4.5%      16.3%
Rogue     0.2%      2.7%      3.7%      0.3%      6.9%
Hunter    0.5%      0.7%      2.2%      0.3%      3.7%
Druid     0%        1.2%      1.7%      3.2%      6%
Mage      0.2%      1.2%      1.3%      10.2%     12.9%
Shaman    0.2%      6.9%      0%        5.7%      12.8%
Warlock   0.2%      5.7%      3%        0.5%      9.4%

EU Blackout, 2v2, 223 members, 87 teams

Class     Hybrids   Build 1   Build 2   Build 3   Class Total
Warrior   0.4%      11.2%     0.9%      0.4%      13%
Paladin   0%        9%        0%        0.9%      9.9%
Priest    0.9%      1.8%      7.2%      5.4%      15.2%
Rogue     0.4%      8.5%      8.1%      0.4%      17.5%
Hunter    0.4%      0.9%      1.3%      0%        2.7%
Druid     0%        0.9%      1.8%      9%        11.7%
Mage      0%        0.4%      0%        5.4%      5.8%
Shaman    0.4%      0.9%      0%        4.9%      6.3%
Warlock   0%        8.1%      9.9%      0%        17.9%

Note: This is last night’s sample reformatted to match the other tables.

EU Cataclysme (FR), 2v2, 253 members, 100 teams

Class     Hybrids   Build 1   Build 2   Build 3   Class Total
Warrior   0.4%      11.5%     0.4%      1.2%      13.4%
Paladin   0.4%      7.1%      0%        0%        7.5%
Priest    0.8%      0.8%      10.3%     5.9%      17.8%
Rogue     0.4%      4%        10.7%     2%        17%
Hunter    0%        1.2%      2.4%      0.4%      4%
Druid     0%        0.8%      0.4%      7.5%      8.7%
Mage      0%        0%        0.8%      5.9%      6.7%
Shaman    0%        1.2%      0%        2.8%      4%
Warlock   0.4%      7.5%      13%       0%        20.9%

EU Vindication, 2v2, 266 members, 100 teams

Class     Hybrids   Build 1   Build 2   Build 3   Class Total
Warrior   0.8%      10.9%     1.1%      1.9%      14.7%
Paladin   2.3%      13.5%     0.4%      0.4%      16.5%
Priest    2.3%      1.1%      3%        4.9%      11.3%
Rogue     3%        1.5%      7.1%      0.4%      12%
Hunter    0%        1.1%      0.4%      0%        1.5%
Druid     1.1%      0%        1.5%      1.5%      4.1%
Mage      2.6%      0.8%      1.1%      9.4%      13.9%
Shaman    0.8%      1.1%      1.9%      4.5%      8.3%
Warlock   1.5%      7.1%      8.3%      0.8%      17.7%

EU Vindication, 3v3, 486 members, 100 teams

Class     Hybrids   Build 1   Build 2   Build 3   Class Total
Warrior   2.5%      11.7%     1%        0.6%      15.8%
Paladin   2.5%      12.8%     0.2%      0.6%      16%
Priest    3.1%      0.6%      4.1%      4.7%      12.6%
Rogue     1.9%      2.5%      6.4%      0.2%      10.9%
Hunter    1%        0.8%      1.4%      0.2%      3.5%
Druid     0.8%      0.4%      2.9%      2.3%      6.4%
Mage      2.5%      1%        1.4%      7%        11.9%
Shaman    1.6%      1.2%      1%        3.9%      7.8%
Warlock   3.5%      4.3%      6.2%      1%        15%

There you go for tonight. Watch this space for more samples in the future. As with last night, the mandatory caveat: percentages may not add up exactly to 100% due to rounding errors.

Top Arena 2v2 Team Composition – Blackout EU

As promised, here’s the data gathered so far.

Notes:
Battlegroup: Blackout EU
Ladder: 2v2
Teams surveyed: Top 87
Players surveyed: 223
Methodology: A player who invests at least 31 points in a tree is classified as that build type; if no tree has 31 points, the player is counted as a hybrid.
Sample time: Data was collected 2 hours before weekly Arena Score accounting by Blizzard.
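
The classification rule from the methodology note can be written down in a few lines. This is a sketch with a made-up function name, returning the tree number in talent-tree order (1 to 3) or 0 for a hybrid.

```php
<?php
// Classify a character by talent points per tree, given in tree order.
// Returns 1, 2 or 3 for the tree holding at least 31 points, 0 for a hybrid.
function classify_build(array $treePoints)
{
    foreach (array_values($treePoints) as $index => $points) {
        if ($points >= 31) {
            return $index + 1;
        }
    }
    return 0; // no tree reaches 31 points
}

echo classify_build(array(41, 20, 0)), "\n"; // first tree, e.g. Arms for a warrior
echo classify_build(array(30, 30, 1)), "\n"; // hybrid
```

With 61 talent points available at level 70, at most one tree can reach 31 points, so returning the first match is enough.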

Class Breakdown:

Warrior:
Count: 29
Highest Rank: 7
Arms: 25
Fury: 2
Prot: 1
Hybrids: 1

Paladin:
Count: 22
Highest Rank: 5
Holy: 20
Prot: 0
Ret: 2
Hybrids: 0

Priest:
Count: 34
Highest Rank: 1
Disc: 4
Holy: 12
Shadow: 16
Hybrids: 2

Rogue:
Count: 39
Highest Rank: 1
Assassination: 19
Combat: 18
Subtlety: 1
Hybrids: 1

Hunter:
Count: 6
Highest Rank: 8 (next one is 71)
BM: 2
MM: 3
Surv: 0
Hybrids: 1

Druid:
Count: 26
Highest Rank: 7
Balance: 2
Feral: 4
Resto: 20
Hybrids: 0

Mage:
Count: 13
Highest Rank: 1
Arcane: 1
Fire: 0
Frost: 12
Hybrids: 0

Shaman:
Count: 14
Highest Rank: 5
Elemental: 2
Enhancement: 0
Resto: 11
Hybrids: 1

Warlock:
Count: 40
Highest Rank: 2
Affliction: 18
Demo: 22
Destro: 0
Hybrids: 0

Team Composition Summary
The numbers below measure the share of teams that had at least one member of the given class on their roster:
Warrior: 31%
Paladin: 21.8%
Priest: 37.9%
Rogue: 41.4%
Hunter: 5.8%
Druid: 29.9%
Mage: 11.5%
Shaman: 16.9%
Warlock: 43.7%
Since a team is counted once for every class on its roster, these numbers add up to well over 100%.
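
A per-class presence figure of this kind could be computed roughly as follows. This is only a sketch: the function name `class_presence` and the four rosters are invented for the example and are not data from the sample.

```php
<?php
// Share of teams (in percent, one decimal) with at least one member
// of each class. $teams maps a team name to its members' classes.
function class_presence(array $teams)
{
    $teamsWithClass = array();
    foreach ($teams as $classes) {
        // array_unique: a team with two warriors still counts once.
        foreach (array_unique($classes) as $class) {
            if (!isset($teamsWithClass[$class])) {
                $teamsWithClass[$class] = 0;
            }
            $teamsWithClass[$class]++;
        }
    }
    $presence = array();
    foreach ($teamsWithClass as $class => $n) {
        $presence[$class] = round(100 * $n / count($teams), 1);
    }
    return $presence;
}

// Made-up rosters, not data from the sample.
$teams = array(
    'Team One'   => array('Warrior', 'Paladin'),
    'Team Two'   => array('Warrior', 'Warrior', 'Druid'),
    'Team Three' => array('Rogue', 'Priest'),
    'Team Four'  => array('Warlock', 'Druid'),
);
$presence = class_presence($teams);
print_r($presence);
```

Because every team counts once for each class it fields, the shares are not expected to sum to 100%.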

Please note that due to calculation time, the Crawler does not currently make any distinction based on the actual number of games played.

Crawler Progress Update

The crawler itself is now totally debugged (there was one bug I kept missing until the last minute, which prevented it from fetching the individual talent trees for each arena team member).

I now gather all the data I want for this project, but have yet to code the statistics part.

My next issue is the execution time of the whole thing – hitting the armory with 410 requests in quick succession, with an fopen() and fclose() for every single one, is horribly inefficient. I raised my script execution time to 5 minutes and I cannot even get all 400 members of a 2v2 ladder that way.

Looks like I have to investigate more into getting cURL to work instead of fopen() to get the armory data, as it is supposed to be a lot more efficient, so a recode is due shortly.
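
For what it's worth, one thing cURL allows is reusing a single handle for a whole series of requests, which lets libcurl keep the connection open between them and may cut some of the per-request overhead. A minimal sketch, assuming the cURL extension is enabled; the function name `fetch_all` and the 10-second timeout are arbitrary choices for the example.

```php
<?php
// Fetch several URLs over one reused cURL handle. Returns url => body,
// with false for any request that failed.
function fetch_all(array $urls, $userAgent)
{
    $handle = curl_init();
    curl_setopt($handle, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true); // return body as string
    curl_setopt($handle, CURLOPT_TIMEOUT, 10);          // seconds per request

    $bodies = array();
    foreach ($urls as $url) {
        curl_setopt($handle, CURLOPT_URL, $url);
        $bodies[$url] = curl_exec($handle);
    }
    curl_close($handle);
    return $bodies;
}

// Allow the whole run five minutes, as described above.
set_time_limit(300);
```

Whether this is enough to get a full ladder within the time limit would have to be measured; fetching in smaller batches remains the fallback.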

In the next post I’ll dump the data I have now, a lot less than expected but a lot more than what I did manually.

New Crawler Delay

Crawler coding proper is done; I was able to collect a data sample from the top 2v2 teams in one battlegroup. Now it’s the statistical part which I have to work on.
Not entirely sure how I want to compile and display the data yet, so I’ll take another night or two to think things through.

Crawler coding progress

Got parsing to work, coded through retrieving individual team members and storing them.

Tomorrow, I only need to retrieve their builds and I’m done with the Crawling. After that, it’s all a matter of interpreting the collected data. Looking good.

Arena Class Distribution Crawler Progress

After hitting my head against a brick wall for ages in trying to solve various problems generated by my lack of PHP expertise, I finally managed to get the armory to send me xml data instead of formatted html. Sounds trivial but it took me several hours to figure it out. The database part is fine, so this weekend I’ll be able to start parsing the xml properly.
So I’m progressing at a slower pace than expected, but at least I now have all the tools I need; the rest is only coding and ironing out bugs and typos.

World of Warcraft™ and Blizzard Entertainment® are all trademarks or registered trademarks of Blizzard Entertainment in the United States and/or other countries. These terms and all related materials, logos, and images are copyright © Blizzard Entertainment. This site is in no way associated with Blizzard Entertainment®