The current contest grading scale is:
Code:
Art and Style 5 points
Sound 5 points
Polish and Completeness 10 points
Originality 15 points
Overall 15 points
Two categories stick out as being arbitrary and over-valued: "Polish and Completeness", and "Originality". Together these categories make up half of the points.
Both of these categories have some oddball results from last year's contest. For example, "Nothing Good Can Come of This" was ranked last in originality by non-entrants, but first in originality by entrants. I get the impression that nobody's certain on how to judge these two categories, and that their results largely overlap with the other, more concrete categories.
A Proposal
Let's scrap the grading scale and have just a single vote for the overall score. Rate every game out of 10. That's it. The top 5 games get prizes. Really simple.
In addition, let's have a bunch of smaller categories run independently:
Code:
Art and Style
Sound
Game Play
Originality
Humor
Programming
Multiplayer
Rate each out of 10. The winner of each category wins bragging rights (and perhaps one of M-Tee's ribbons on the a53 menu). These scores are completely independent of the overall score; nothing is summed together.
That would make some aspects of judging easier. I remember staring at the "originality" field when judging your F-FF and not knowing what to put. On one hand, it was just F-Zero, and not original at all. On the other hand, you did something really technically new and different on the NES, which would make it incredibly original.
I think I just shrugged and made up a number based on how much I liked it instead.
I like having different, independent categories as suggested.
I think a ranking-based voting system might be better than assigning scores.
I haven't read in-depth about it, but
https://civs.cs.cornell.edu/ might be worth checking out, with a different poll for each category.
M_Tee wrote:
I haven't read in-depth about it, but
https://civs.cs.cornell.edu/ might be worth checking out, with
I did some quick research.
If using Condorcet, the best version would be Ranked Pairs, which assigns 2nd, 3rd, 4th, etc. places rather than just 1st. The website you posted can be used to automate this.
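For reference, a minimal Ranked Pairs tally might look like the sketch below. This is my own illustration, not the site's implementation; it assumes every ballot ranks every game, and the game names are hypothetical.

```python
from itertools import combinations

def ranked_pairs(ballots):
    """Rank candidates via Ranked Pairs (Tideman's method).
    ballots: list of rankings, each a list of candidates, best first.
    Assumes every ballot ranks every candidate."""
    candidates = sorted({c for b in ballots for c in b})
    # Tally pairwise margins: margin[(a, b)] = voters preferring a over b.
    margin = {}
    for a, b in combinations(candidates, 2):
        ab = sum(1 for ballot in ballots if ballot.index(a) < ballot.index(b))
        margin[(a, b)], margin[(b, a)] = ab, len(ballots) - ab
    # Keep only winning directions, sorted by strength of victory.
    pairs = sorted(((a, b) for (a, b), m in margin.items() if m > margin[(b, a)]),
                   key=lambda p: margin[p], reverse=True)
    # Lock in pairs, strongest first, skipping any that would create a cycle.
    locked = set()
    def reaches(x, y):
        return any(b == y or reaches(b, y) for (a, b) in locked if a == x)
    for a, b in pairs:
        if not reaches(b, a):
            locked.add((a, b))
    # The candidate nobody beats is 1st; remove and repeat for 2nd, 3rd, ...
    order, remaining = [], set(candidates)
    while remaining:
        winner = next(c for c in remaining
                      if not any((a, c) in locked for a in remaining))
        order.append(winner)
        remaining.remove(winner)
    return order
```

Repeating the "remove the winner" step is what yields the full 2nd/3rd/4th ordering rather than just a single winner.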
Still, I'm not really convinced this is the way to go. Preferential voting is less precise than numeric ratings because there's less data. The results are harder to check (and near impossible if using a 3rd party), and the system is really complicated to describe. So it's cool and all, but I think numeric is better for our purposes because it's simple while still being accurate.
Outside of that, one other topic to talk about is "insincere" voting: the idea that people can vote based on the outcome they want, rather than the actual quality of the games. Because of how few voters we have, a single low vote is enough to drop an entry several places and push up the voter's own. People can win better prizes by being jerks, essentially.
I don't believe this has happened yet (we're all swell people), but it's something to consider. Ignoring the best and worst vote of each entry could filter out some of this should the need arise. Or we could just use the median.
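Either filter is trivial to compute. A minimal sketch (function names are mine, not any existing system):

```python
from statistics import median

def trimmed_total(scores):
    """Sum of an entry's scores with the single best and single worst
    vote removed, blunting one insincere outlier in either direction.
    Needs at least three votes to leave anything after trimming."""
    s = sorted(scores)
    return sum(s[1:-1])

def median_score(scores):
    """Median of an entry's scores; a single judge tanking (or inflating)
    an entry barely moves it."""
    return median(scores)
```

With votes like [1, 8, 8, 9, 10], both approaches largely ignore the lone 1.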
Ahhh voting... I wish it could be as simple as rating each game out of 10. But do you remember how close all of the games were? And that was with multiple categories! I think there would definitely need to be categories to keep some type of granularity.
I'm down for adjusted grading categories, and tweaking the scale. Of course this would have to be agreed upon by everyone since it wasn't announced before the beginning of the competition. Maybe make everything 10's? Things have changed since the beginning of the competition. There weren't a lot of collaborations with artists and musicians, but that is changing. Perhaps it brings more value to the competition and art/sound probably deserve higher value. Originality was there pretty much to dissuade people from making simple rip-offs of official releases, but honestly that really isn't important as long as it isn't copyright infringing. The overall category was kind of like a "Ok, the art was nice, it sounded good, but was it a good game?".
As for numeric scores, a voting system where a number could be typed in (thereby accepting non-whole number input) would be very helpful, considering the number of collaborations present. For instance, last time around, Lukasz and I both graded each game separately and then I had planned to submit our average for our judging submission, but being restricted to whole numbers, we had to determine a fair way to handle rounding so that no game received an uneven boost due to multiple categories needing to be rounded up.
Another preference, if possible, would be to assess by game and not by category. For instance, I would like to input scores for all the categories for Game A before moving onto Game B instead of ranking every game in Category A before ranking every game in Category B.
Google Forms could handle this. Each game could be a section (page). Once one section is made, it could be duplicated and edited for the rest.
Here's a mockup with the first two games from '17 in it as an example:
https://goo.gl/forms/yIDdJVv1kaF9uJ843
NESHomebrew wrote:
Ahhh voting... I wish it could be as simple as rating each game out of 10. But do you remember how close all of the games were? And that was with multiple categories! I think there would definitely need to be categories to keep some type of granularity.
That's a very good point, but here's how the games from last year rank if only using "Overall" score:
Code:
27.76 project blue
25.36 grunio
24.4 wolfling
23.09 alphonzo game
22.36 f-ff
21.94 jamin honey
20.6 miedow
20.24 robo ninja
18.42 nothing good
18.22 star evil
18 alphonzo melee
16.82 inherent smile
15.42 lightshields
The results are actually more spread out than the combined multi-category scores!
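The spread can be put into a number. A sample standard deviation of the overall scores listed above gives a rough measure (a proper comparison would also need the combined-category totals):

```python
from statistics import stdev

# Overall scores from last year's entries, as listed above.
overall = [27.76, 25.36, 24.4, 23.09, 22.36, 21.94, 20.6,
           20.24, 18.42, 18.22, 18, 16.82, 15.42]

spread = stdev(overall)  # sample standard deviation, roughly 3.6
```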
@M_Tee that google form looks really good! But it kinda makes it hard to go back and check what you ranked previous entries.
Yeah, honestly, the way I'd like to judge the games would be just typing them into a spreadsheet file. category by row and game by column or vice versa.
EDIT:
NESHomebrew wrote:
Originality was there pretty much to dissuade people from making simple rip-offs of official releases, but honestly that really isn't important as long as it isn't copyright infringing. The overall category was kind of like a "Ok, the art was nice, it sounded good, but was it a good game?".
I actually like the heavy weight that originality has, or at least its presence. It adds a little motivation not just to do something well, but to do something new.
The large weight of the overall category provided an opportunity to assess aspects that I found important that weren't necessarily assessed elsewhere, and as I look back over them, most were gameplay based (difficulty, controls, replay value, social value), with just 3 points actually going to a literal "overall" (gestalt/synergy), so it seems to have done that job, at least in my case.
pubby wrote:
I get the impression that nobody's certain on how to judge these two categories, and that their results largely overlap with the other, more concrete categories.
As an art teacher, my career is basically built around attempting to assess works in a subjective field in the most objective manner possible. Not an easy thing, haha. Definitely worth doing though.
I did play around with google forms for a while, but I found that survey monkey had a few advantages. I think you can return and change answers after the fact with google, but I don't think it worked as well as the survey monkey one.
I'll have to look at add-ons for google forms. If we could get something working well with google forms I'd be more than happy to use it. The nice thing as well would be a place to put some anonymous feedback for the entrants. I'm pretty sure I could do that with survey monkey as well.
I like the spreadsheet idea, where you can play around with your values and make sure you are judging them all equally.
Spreadsheet seems like it could be the easiest to set up and to judge, with no forms to construct or navigate.
A little tricky getting all the data together at the end, but from what I've searched, it seems the following might be feasible:
Using Google Sheets, a basic judging spreadsheet could be made and copied for each judge, shared with only that judge and the person in charge. Once the judging deadline's over, each judge could have their collaborator access removed to prevent further changes. A separate sheet could then reference each of the judges' sheets dynamically via the ImportRange feature.
Games as rows, categories as columns; the last column could be very wide and set up for comments, so anonymous feedback could be collected that way. A little conditional formatting could even be used on the judging sheets to highlight cells if a score input is out of range.
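The aggregation step could then be a one-liner per judge, something like the following (the sheet ID and range here are placeholders, not real):

```
=IMPORTRANGE("https://docs.google.com/spreadsheets/d/JUDGE_SHEET_ID", "Scores!A1:G14")
```

One such formula per judge on the master sheet would pull all the scores into one place once judging closes.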
I've created a few polls to gauge how the community feels on voting. Please vote!
FIRST POLL - How should the rubric work?
Option 1 is what past competitions have used. Participants assign a score to various categories (art, sound, etc) and then sum up these categories to reach a final score.
Option 2 is similar in that players still assign scores to various categories, but the final score is independent of this. Players pick whatever final score they want for each game.
In both cases, the final score is what determines prizes.
Vote here:
https://www.strawpoll.me/17291856
SECOND POLL - What voting method?
With option 1, voters assign each category a score, presumably from 1 to 10.
With option 2, voters rank each game by preference.
https://en.wikipedia.org/wiki/Borda_count
Vote here:
https://www.strawpoll.me/17291860
THIRD POLL - What categories should we vote on?
Select all categories you want to see.
Vote here:
https://www.strawpoll.me/17291910
Alright, so it seems like we want to keep the same format as last year, but the polls show a desire to add a "Gameplay" category.
Something like this then?
Code:
Art 10
Sound 10
Gameplay 10
Polish 10
Freshness 10
Overall 20
(freshness can be renamed originality; I'm just stubborn and thought calling it "freshness" would make the category less ambiguous to judge)
Voting Method
I hadn't voted on method while waiting to read up on the Borda method, and I just cast my vote for it. I think it would simplify the act of judging, as it's easier to rank entries against each other than to assign a numeric score. The primary benefit is eliminating judges' relative interpretations of scale, reducing the effect of a single judge whose assessments are far stricter or looser overall.
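As I understand the Borda count, each ballot awards a game one point for every game ranked below it, and the totals across all ballots give the final order. A rough sketch (game names hypothetical):

```python
def borda(ballots):
    """Borda count: on each ballot, the game ranked i-th (0-based) out of
    n earns n - 1 - i points; summed totals give the final ranking."""
    totals = {}
    for ballot in ballots:
        n = len(ballot)
        for i, game in enumerate(ballot):
            totals[game] = totals.get(game, 0) + (n - 1 - i)
    return sorted(totals, key=totals.get, reverse=True)
```

Because every judge works from the same 1st/2nd/3rd scale, a judge who would have scored everything harshly contributes exactly the same information as a lenient one.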
Category Weights
I like the suggested heavier weights for art and sound (10 points each instead of 5). Regardless of the individual weights of categories, I feel that the final scores as announced should be converted mathematically to an out-of-fifty scale in order to retain comparability with previous years' results.
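The conversion is a single rescale. A quick sketch, assuming the proposed 70-point total (10+10+10+10+10+20); the function name is mine:

```python
def to_fifty(score, max_total=70):
    """Rescale a summed score onto the historical 50-point scale.
    max_total is the sum of the new category maxima (70 assumed here)."""
    return round(score * 50 / max_total, 2)
```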
Terminology
I'm not a big fan of the term freshness. It adds no clarity in terms of definition, (and although I know it is not intended) it has a feel of corporate marketing hipness (see Rotten Tomatoes' usage of the term) that leaves a poor taste, but that could very well be my own disposition.
Regardless, I still feel that the ambiguity in both the originality and overall categories is beneficial because it provides room for each judge to express their own priorities in judging.
pubby wrote:
Alright, so it seems like we want to keep the same format as last year, but the polls show a desire to add a "Gameplay" category.
Something like this then?
Code:
Art 10
Sound 10
Gameplay 10
Polish 10
Freshness 10
Overall 20
(freshness can be renamed originality; I'm just stubborn and thought calling it "freshness" would make the category less ambiguous to judge)
"Polish and Completeness" was kind of meant to include gameplay. Honestly, changing the categories this close to the competition deadline probably isn't the best idea (or am I wrong?). I do like the conversation, and if changes are warranted it would be nice to start next year's competition with any judging changes already ironed out.
I'm cool with postponing this until next year. I agree that it could be unethical to change the categories so late.
In reality, I don't think the voting system makes a huge difference. The best games always win, the worst games don't. The system can be improved of course, but there's no dire need to do it now.
While this thread is still going, here are two more polls regarding the categories. The first is for this year (a change which likely won't happen), and the second is for next year. This pastebin explains the choices:
https://pastebin.com/raw/CSPCca7P
This year:
https://www.strawpoll.me/17303975
Next year:
https://www.strawpoll.me/17303984
Of course, these polls aren't binding or official or anything, but they might help a little in planning.
Agreeing that no changes should be made to the ongoing compo.
Maybe a way to distinguish gameplay from polish more clearly would be to subtitle them a bit more verbosely:
gameplay design and game rules
execution, completeness, and polish
the first examines the creative design of the game rules
the second looks at how well things function and how complete they feel
Anyway, I'm in favour of balancing art and sound. Not sure why graphics and sound/music shouldn't count as much as design and programming?
I include gameplay in my overall rating, IIRC. For polish, I look for screen transitions, flourishes, etc.: anything that's not essential but enhances the experience. Something like Wolfling's camera scrolling system, Miedow's cutscenes, or nonessential sound effects (such as G2's falling sound effect) are what I think of when I think of polish.
How about we do this:
Art and Style 10
Sound 10
Polish and Completeness 10
Originality 10
Overall 20
And then reconsider the categories for 2019.
I've been thinking about this quite a lot lately. I usually try to create a full-fledged game within 32 or 64K. And I mean a feature-length game, or whatever you call it, and that usually means I have to simplify stuff. I wonder if that is taken into account when judges rate the entries. If I spent those 64K on a single, uber-polished level, with lots of eye candy and complex mechanics, would the game rank higher? How do you balance this?
My entries often don't have cool transitions or nice game over screens simply because I'm using every single byte on level data. Would you rather have shorter but prettier games?
I've also been thinking about this because I'm quite aware that very nice things I've worked on and added to my games won't ever get noticed, as they are in the later levels.
I won't change this; I mean, I like the challenge of cramming a full-fledged game into just 32K/64K the best I can, but I'm just curious what you guys think about the matter.
I don't vote, but I'd rather have shorter and prettier games.
Imagine two entries. One with 40 levels, one with 4. Same amount of development time.
The one with 40 levels is likely going to take more of my time to beat. But I'm actually less likely to enjoy the extra time it will take to beat. While the 40 level game's team was building 36 levels, the 4 level game's team was polishing everything else.
Even outside the context of the NES, if a game takes 60 hours, I'm likely to beat it only once. I've played many hour-long games 60 times, and greatly prefer that. For every set of game mechanics, there is a cutoff point where new levels stop adding meaningful content.
If I instead play 60 hour-long games, I get 60 sets of game mechanics that don't overstay their welcome. The longer a single game is, the more likely it is to overstay its welcome. Most games released today outlast the new experiences their mechanics provide. At least for me. Less content to consume means that's less likely to happen, and it also means less of my time will be spent finding out if it gets boring.
In short: I'm less likely to think about whether the time playing the game was worth spending if it wasn't a lot of time.
Edit: Simplifying something to get more space is something I'd personally try very hard not to do, because the player is more likely to notice the simplification than they are to notice the absence of content the simplification made room for. Especially in a free game.
I will generally rank a "full-length" game higher than a short demo of similar quality. A full game just takes a lot more time and effort, and thus is more impressive.
I'm always impressed that you (Mojon Twins) submit full-length games.
Full-length gets more points from me too, though your games are usually so difficult I can't finish them, and so don't see any surprises in the later parts, like you said.
I'm a terrible gamer, so I usually don't get very far in most games. I guess I prefer presentation over length for this reason.
NESHomebrew wrote:
How about we do this:
Art and Style 10
Sound 10
Polish and Completeness 10
Originality 10
Overall 20
And then reconsider the categories for 2019.
Sounds good.