on voting

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
on voting
by on (#229073)
The current contest grading scale is:

Code:
Art and Style              5 points
Sound                      5 points
Polish and Completeness   10 points
Originality               15 points
Overall                   15 points

Two categories stick out as being arbitrary and over-valued: "Polish and Completeness", and "Originality". Together these categories make up half of the points.

Both of these categories had some oddball results in last year's contest. For example, "Nothing Good Can Come of This" was ranked last in originality by non-entrants, but first in originality by entrants. I get the impression that nobody's certain how to judge these two categories, and that their results largely overlap with the other, more concrete categories.

A Proposal

Let's scrap the grading scale and have just a single vote for the overall score. Rate every game out of 10. That's it. The top 5 games get prizes. Really simple.

In addition, let's have a bunch of smaller categories run independently:

Code:
Art and Style
Sound
Game Play
Originality
Humor
Programming
Multiplayer

Rate each out of 10. The winner of each category wins bragging rights (and perhaps one of M-Tee's ribbons on the a53 menu). These scores are completely independent from the overall score; nothing is summed together.
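A minimal sketch of how this proposal could be tallied, assuming each voter's ballot is stored as a nested dict of scores out of 10 (the ballot layout, voter names, and game names are hypothetical). Only "Overall" decides prizes; every other category is averaged and ranked on its own.

```python
from statistics import mean

# Hypothetical ballots: ballots[voter][game][category] = score out of 10.
ballots = {
    "voter_a": {
        "Game A": {"Overall": 8, "Sound": 9, "Humor": 4},
        "Game B": {"Overall": 6, "Sound": 7, "Humor": 9},
        "Game C": {"Overall": 7, "Sound": 5, "Humor": 6},
    },
    "voter_b": {
        "Game A": {"Overall": 9, "Sound": 8, "Humor": 5},
        "Game B": {"Overall": 5, "Sound": 8, "Humor": 8},
        "Game C": {"Overall": 7, "Sound": 6, "Humor": 7},
    },
}


def category_averages(ballots, category):
    """Average score per game for one category; categories are never summed."""
    games = next(iter(ballots.values())).keys()
    return {g: mean(b[g][category] for b in ballots.values()) for g in games}


def prize_ranking(ballots, top_n=5):
    """Games sorted by average Overall score; the first top_n win prizes."""
    avg = category_averages(ballots, "Overall")
    return sorted(avg, key=avg.get, reverse=True)[:top_n]


def category_winner(ballots, category):
    """Bragging-rights winner of a single side category."""
    avg = category_averages(ballots, category)
    return max(avg, key=avg.get)


print(prize_ranking(ballots))
print(category_winner(ballots, "Humor"))
```

Game B loses on Overall here but still wins Humor, which is the point of keeping the side categories independent.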
Re: on voting
by on (#229074)
That would make some aspects of judging easier. I remember staring at the "originality" field when judging your F-FF and not knowing what to put. On one hand, it was just F-Zero, and not original at all. On the other hand, you did something really technically new and different on the NES, so that would make it incredibly original.

I think I just shrugged and made up a number based on how much I liked it instead.
Re: on voting
by on (#229075)
I like having different, independent categories as suggested.
I think a ranking-based voting system might be better than assigning scores.

I haven't read in-depth about it, but https://civs.cs.cornell.edu/ might be worth checking out, with a different poll per each category.
Re: on voting
by on (#229088)
M_Tee wrote:
I haven't read in-depth about it, but https://civs.cs.cornell.edu/ might be worth checking out, with

I did some quick research.

If using Condorcet, the best version would be Ranked Pairs, which assigns 2nd, 3rd, 4th, etc. places rather than just 1st. The website you posted can be used for automating this.

Still, I'm not really convinced this is the way to go. Preferential voting is less precise than numeric ratings because there's less data. The results are harder to check (and near impossible if using a 3rd party), and the system is really complicated to describe. So it's cool and all, but I think numeric is better for our purposes because it's simple while still being accurate.
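For a sense of how much machinery Ranked Pairs involves, here is a minimal sketch of the Tideman method in Python. It assumes every ballot ranks every game with no ties, and it breaks equal-strength victories by input order, which is a simplification of the full method. The candidate names and ballots are made up.

```python
from itertools import permutations


def ranked_pairs(ballots, candidates):
    """Minimal Ranked Pairs sketch. Each ballot lists candidates from most to
    least preferred; returns a full finishing order (1st, 2nd, 3rd, ...)."""
    # pref[(a, b)] = number of voters who rank a above b
    pref = {pair: 0 for pair in permutations(candidates, 2)}
    for ballot in ballots:
        pos = {c: i for i, c in enumerate(ballot)}
        for a, b in pref:
            if pos[a] < pos[b]:
                pref[(a, b)] += 1

    # Every pairwise majority "a beats b", strongest victory first.
    wins = [(a, b) for (a, b) in pref if pref[(a, b)] > pref[(b, a)]]
    wins.sort(key=lambda p: (pref[p], -pref[(p[1], p[0])]), reverse=True)

    locked = {c: set() for c in candidates}  # locked[a]: candidates a is locked above

    def reaches(start, goal):
        """True if goal can be reached from start along locked edges."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(locked[node])
        return False

    # Lock in each victory unless it would create a cycle.
    for a, b in wins:
        if not reaches(b, a):
            locked[a].add(b)

    # Repeatedly take a candidate that no remaining candidate is locked above.
    order, remaining = [], list(candidates)
    while remaining:
        top = next(c for c in remaining
                   if not any(c in locked[o] for o in remaining))
        order.append(top)
        remaining.remove(top)
    return order


# A cyclic example: A beats B 5-2, B beats C 5-2, but C beats A 4-3.
ballots = [["A", "B", "C"]] * 3 + [["B", "C", "A"]] * 2 + [["C", "A", "B"]] * 2
print(ranked_pairs(ballots, ["A", "B", "C"]))
```

The weakest majority (C over A) is the one discarded to break the cycle, which is exactly the kind of step that makes the results hard to check by hand.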

Outside of that, one other topic to talk about is "insincere" voting: the idea that people can vote based on the outcome they want, rather than the actual quality of the games. Because of how few voters we have, a single low vote is enough to drop an entry several places and push up the voter's own. People can win better prizes by being jerks, essentially.

I don't believe this has happened yet (we're all swell people), but it's something to consider. Ignoring the best and worst vote of each entry could filter out some of this should the need arise. Or we could just use the median.
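A small illustration of the two filters, using made-up votes for a single entry: one voter swaps an honest 7 for a spiteful 1. The plain mean drops noticeably, while the trimmed mean (dropping the single best and worst vote) and the median are unaffected.

```python
from statistics import mean, median


def trimmed_mean(votes):
    """Average after dropping the single best and single worst vote.
    Assumes at least three votes."""
    ordered = sorted(votes)
    return mean(ordered[1:-1])


honest = [8, 7, 8, 9, 7]
sabotaged = [8, 7, 8, 9, 1]  # hypothetical: the last voter tanks a rival with a 1

for name, votes in (("honest", honest), ("sabotaged", sabotaged)):
    print(name, mean(votes), trimmed_mean(votes), median(votes))
```

The trade-off is that trimming also throws away one honest vote at each end, which matters more when there are only a handful of voters.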
Re: on voting
by on (#229319)
Ahhh voting... I wish it could be as simple as rating each game out of 10. But do you remember how close all of the games were? And that was with multiple categories! I think there would definitely need to be categories to keep some type of granularity.

I'm down for adjusted grading categories, and tweaking the scale. Of course this would have to be agreed upon by everyone since it wasn't announced before the beginning of the competition. Maybe make everything 10's? Things have changed since the beginning of the competition. There weren't a lot of collaborations with artists and musicians, but that is changing. Perhaps it brings more value to the competition and art/sound probably deserve higher value. Originality was there pretty much to dissuade people from making simple rip-offs of official releases, but honestly that really isn't important as long as it isn't copyright infringing. The overall category was kind of like a "Ok, the art was nice, it sounded good, but was it a good game?".
Re: on voting
by on (#229322)
As for numeric scores, a voting system where a number could be typed in (thereby accepting non-whole number input) would be very helpful, considering the number of collaborations present. For instance, last time around, Lukasz and I both graded each game separately and then I had planned to submit our average for our judging submission, but being restricted to whole numbers, we had to determine a fair way to handle rounding so that no game received an uneven boost due to multiple categories needing to be rounded up.
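The rounding problem can be shown with made-up scores for one game from two collaborating judges (the judge scores are hypothetical). When every category average lands on .5 and each is rounded up, the game gains 2.5 points over the unrounded total; a game where the collaborators agreed gains nothing. Accepting fractional input and summing before any rounding avoids the uneven boost.

```python
import math


def round_half_up(x):
    """Round .5 upward (Python's built-in round() uses banker's rounding)."""
    return math.floor(x + 0.5)


def per_category_rounding(a, b):
    """Average two collaborators' scores, round each category, then sum."""
    return sum(round_half_up((a[c] + b[c]) / 2) for c in a)


def exact_total(a, b):
    """Average two collaborators' scores and sum without any rounding."""
    return sum((a[c] + b[c]) / 2 for c in a)


judge_a = {"Art": 4, "Sound": 3, "Polish": 7, "Originality": 12, "Overall": 11}
judge_b = {"Art": 5, "Sound": 4, "Polish": 8, "Originality": 13, "Overall": 12}

print(per_category_rounding(judge_a, judge_b), exact_total(judge_a, judge_b))
```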

Another preference, if possible, would be to assess by game and not by category. For instance, I would like to input scores for all the categories for Game A before moving onto Game B instead of ranking every game in Category A before ranking every game in Category B.

Google Forms could handle this. Each game could be a section (page). Once one section is made, it could be duplicated and edited for the rest.

Here's a mockup with the first two games from '17 in it as an example: https://goo.gl/forms/yIDdJVv1kaF9uJ843
Re: on voting
by on (#229324)
NESHomebrew wrote:
Ahhh voting... I wish it could be as simple as rating each game out of 10. But do you remember how close all of the games were? And that was with multiple categories! I think there would definitely need to be categories to keep some type of granularity.


That's a very good point, but here's how the games from last year rank if only using "Overall" score:

Code:
27.76 project blue
25.36 grunio
24.4 wolfling
23.09 alphonzo game
22.36 f-ff
21.94 jamin honey
20.6 miedow
20.24 robo ninja
18.42 nothing good
18.22 star evil
18 alphonzo melee
16.82 inherent smile
15.42 lightshields

The results are actually more spread out than the multiple categories combined scores!
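One way to put a number on "more spread out" is to compute the range, standard deviation, and smallest gap between neighbouring ranks. The sketch below does that for the Overall-only scores listed above; the combined-category totals aren't reproduced in this thread, so the same calculation would have to be run on those separately to compare.

```python
from statistics import pstdev

# Overall-only scores from last year, as listed above.
overall = {
    "project blue": 27.76, "grunio": 25.36, "wolfling": 24.4,
    "alphonzo game": 23.09, "f-ff": 22.36, "jamin honey": 21.94,
    "miedow": 20.6, "robo ninja": 20.24, "nothing good": 18.42,
    "star evil": 18.22, "alphonzo melee": 18, "inherent smile": 16.82,
    "lightshields": 15.42,
}

scores = sorted(overall.values(), reverse=True)
score_range = scores[0] - scores[-1]
gaps = [hi - lo for hi, lo in zip(scores, scores[1:])]  # gap between neighbouring ranks

print(f"range {score_range:.2f}, std dev {pstdev(scores):.2f}, "
      f"smallest gap {min(gaps):.2f}")
```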

@M_Tee that google form looks really good! But it kinda makes it hard to go back and check how you ranked previous entries.
Re: on voting
by on (#229325)
Yeah, honestly, the way I'd like to judge the games would be just typing them into a spreadsheet file. Category by row and game by column, or vice versa.

EDIT:
NESHomebrew wrote:
Originality was there pretty much to dissuade people from making simple rip-offs of official releases, but honestly that really isn't important as long as it isn't copyright infringing. The overall category was kind of like a "Ok, the art was nice, it sounded good, but was it a good game?".

I actually like the heavy weight that originality has, or at least its presence. It adds a little motivation not just to do something well, but to do something new.

The large size of overall provided an opportunity to assess aspects that I found important that weren't necessarily assessed elsewhere, and as I look back over them, most were gameplay based (difficulty, controls, replay value, social value), with just 3 points actually going to a literal "overall" (gestalt / synergy), so it seems to have done that job, at least in my case.

pubby wrote:
I get the impression that nobody's certain on how to judge these two categories, and that their results largely overlap with the other, more concrete categories.

As an art teacher, my career is basically built around attempting to assess works in a subjective field in the most objective manner possible. Not an easy thing, haha. Definitely worth doing though.
Re: on voting
by on (#229340)
I did play around with google forms for a while, but I found that survey monkey had a few advantages. I think you can return and change answers after the fact with google, but I don't think it worked as well as the survey monkey one.

I'll have to look at add-ons for google forms. If we could get something working well with google forms I'd be more than happy to use it. The nice thing as well would be a place to put some anonymous feedback for the entrants. I'm pretty sure I could do that with survey monkey as well.

I like the spreadsheet idea, where you can play around with your values and make sure you are judging them all equally.
Re: on voting
by on (#229342)
Spreadsheet seems like it could be the easiest to setup and to judge, no forms to construct or navigate.

A little tricky getting all the data together at the end, but from what I've searched, it seems the following might be feasible:

Using Google Sheets, a basic judging spreadsheet could be made and copied for each judge, shared with only that judge and the person in charge. Once the judging deadline's over, each judge could have their collaborator access removed to prevent further changes. A separate sheet could then reference each of the judge's sheets dynamically via the ImportRange feature.
Games as rows, categories as columns, and the last column could be very wide and set up for comments, so anonymous feedback could be collected that way. A little conditional formatting could even be used on each judging sheet to highlight cells if the score input is out of range.
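If the per-judge sheets were exported as CSV instead of (or in addition to) being pulled together with ImportRange, the combining step might look like the sketch below. The CSV layout (games as rows, categories as columns, comments last), the judge names, and the per-category maximums are all assumptions for illustration. It averages valid scores, collects comments without the judge's name, and lists out-of-range entries, the same cells the conditional formatting would highlight.

```python
import csv
import io
from statistics import mean

# Hypothetical exports of two judges' sheets.
JUDGE_SHEETS = {
    "judge_1": "Game,Art,Sound,Comments\nGame A,4,5,Great music\nGame B,3,2,\n",
    "judge_2": "Game,Art,Sound,Comments\nGame A,5,4,\nGame B,6,3,Needs a title screen\n",
}
MAX_SCORE = {"Art": 5, "Sound": 5}  # assumed per-category maximums


def collect(sheets):
    """Return (average per (game, category), anonymous comments, problems)."""
    scores, comments, problems = {}, [], []
    for judge, text in sheets.items():
        for row in csv.DictReader(io.StringIO(text)):
            game = row["Game"]
            for cat, top in MAX_SCORE.items():
                value = float(row[cat])
                if not 0 <= value <= top:  # out of range: report, don't count
                    problems.append((judge, game, cat, value))
                    continue
                scores.setdefault((game, cat), []).append(value)
            if row["Comments"]:
                comments.append((game, row["Comments"]))  # judge name dropped
    averages = {key: mean(vals) for key, vals in scores.items()}
    return averages, comments, problems


averages, comments, problems = collect(JUDGE_SHEETS)
print(problems)
```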
Re: on voting
by on (#232745)
I've created a few polls to gauge how the community feels on voting. Please vote!



FIRST POLL - How should the rubric work?

Option 1 is what past competitions have used. Participants assign a score to various categories (art, sound, etc) and then sum up these categories to reach a final score.

Option 2 is similar in that players still assign scores to various categories, but the final score is independent of this. Players pick whatever final score they want for each game.

In both cases, the final score is what determines prizes.

Vote here: https://www.strawpoll.me/17291856



SECOND POLL - What voting method?

With option 1, voters assign each category a score, presumably from 1 to 10.

With option 2, voters rank each game by preference. https://en.wikipedia.org/wiki/Borda_count

Vote here: https://www.strawpoll.me/17291860
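For reference, a minimal sketch of one common Borda variant for option 2: with n games, each voter's 1st choice gets n-1 points, 2nd gets n-2, and so on down to 0, and points are summed across voters. It assumes every ranking lists every game exactly once; how entrants would handle their own game is left open here.

```python
def borda(rankings):
    """Sum Borda points across voters; returns (game, points), best first."""
    totals = {}
    for ranking in rankings:
        n = len(ranking)
        for place, game in enumerate(ranking):
            totals[game] = totals.get(game, 0) + (n - 1 - place)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


rankings = [
    ["Game A", "Game B", "Game C"],
    ["Game B", "Game A", "Game C"],
    ["Game A", "Game C", "Game B"],
]
print(borda(rankings))
```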



THIRD POLL - What categories should we vote on?

Select all categories you want to see.

Vote here: https://www.strawpoll.me/17291910
Re: on voting
by on (#232860)
Alright, so it seems like we want to keep the same format as last year, but the polls show a desire to add a "Gameplay" category.

Something like this then?

Code:
Art       10
Sound     10
Gameplay  10
Polish    10
Freshness 10
Overall   20

(freshness can be renamed originality; I'm just stubborn and thought calling it "freshness" would make the category less ambiguous to judge)
Re: on voting
by on (#232864)
Voting Method
I hadn't voted on method, waiting to read up on the Borda method, and I just cast my vote for it. I think it would simplify the act of judging, as it's easier to rank entries against each other, as opposed to assigning a numeric score. The primary benefit is eliminating judges' relative interpretation of scale, reducing the effect of a single judge whose assessments are overall far stricter or looser.
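A small made-up example of that effect: two judges narrowly prefer Game A, while a third hypothetical judge uses a much wider spread and strongly prefers Game B. Summing raw scores lets that one judge decide the result; converting each judge's scores to ranks first gives every judge the same weight.

```python
def raw_totals(judges):
    """Sum raw scores across judges."""
    totals = {}
    for scores in judges:
        for game, s in scores.items():
            totals[game] = totals.get(game, 0) + s
    return totals


def rank_totals(judges):
    """Convert each judge's scores to rank points (worst 0, best n-1), then sum.
    Ties within one judge's scores are broken arbitrarily."""
    totals = {}
    for scores in judges:
        ordered = sorted(scores, key=scores.get)  # worst first
        for points, game in enumerate(ordered):
            totals[game] = totals.get(game, 0) + points
    return totals


judges = [
    {"Game A": 9, "Game B": 8},
    {"Game A": 9, "Game B": 8},
    {"Game A": 1, "Game B": 6},  # hypothetical strict judge with a wide spread
]
print(raw_totals(judges), rank_totals(judges))
```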

Category Weights
I like the suggested heavier weights of art and sound (10 instead of 5 each). Regardless of the individual weights of categories, I feel that the final scores as announced should be converted mathematically to an out-of-fifty scale in order to retain comparability to previous years' results.
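The conversion is a plain linear rescale: multiply by 50 and divide by the rubric's maximum. The sketch below uses the 70-point table proposed above and the 60-point one proposed later in the thread; the example totals are made up.

```python
def to_out_of_fifty(score, max_points):
    """Linearly rescale a total so results stay comparable with 50-point years."""
    return score * 50 / max_points


# 60-point rubric proposed in this thread.
rubric = {"Art and Style": 10, "Sound": 10, "Polish and Completeness": 10,
          "Originality": 10, "Overall": 20}

print(to_out_of_fifty(42, 70))                    # a 42/70 total
print(to_out_of_fifty(48, sum(rubric.values())))  # a 48/60 total
```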

Terminology
I'm not a big fan of the term freshness. It adds no clarity in terms of definition, and (although I know it is not intended) it has a feel of corporate marketing hipness (see Rotten Tomatoes' usage of the term) that leaves a poor taste, but that could very well be my own disposition.

Regardless, I still feel that the ambiguity in both the originality and overall categories is beneficial because it provides room for each judge to express their own priorities in judging.
Re: on voting
by on (#232865)
pubby wrote:
Alright, so it seems like we want to keep the same format as last year, but the polls show a desire to add a "Gameplay" category.

Something like this then?

Code:
Art       10
Sound     10
Gameplay  10
Polish    10
Freshness 10
Overall   20

(freshness can be renamed originality; I'm just stubborn and thought calling it "freshness" would make the category less ambiguous to judge)


Polish and Completeness was kind of meant to include gameplay. Honestly, changing the categories this close to the competition deadline probably isn't the best idea (or am I wrong?). I do like the conversation, and if changes are warranted it would be nice to start next year's competition with any judging changes already ironed out.
Re: on voting
by on (#232866)
I'm cool with postponing this until next year. I agree that it could be unethical to change the categories so late.

In reality, I don't think the voting system makes a huge difference. The best games always win, the worst games don't. The system can be improved of course, but there's no dire need to do it now.

While this thread is still going, here are two more polls regarding the categories. The first is for this year (a change which likely won't happen), and the second is for next year. This pastebin explains the choices: https://pastebin.com/raw/CSPCca7P

This year: https://www.strawpoll.me/17303975

Next year: https://www.strawpoll.me/17303984

Of course, these polls aren't binding or official or anything, but they might help a little in planning.
Re: on voting
by on (#232867)
Agreeing that no changes should be made to the ongoing compo.

Maybe a way to distinguish gameplay from polish more clearly would be to subtitle them a bit more verbosely:

Gameplay design and game rules
Execution, completeness and polish

The first examines the creative design of the game rules.
The second looks at how well things function and how complete they feel.

Anyway, I'm in favour of balancing art and sound. Not sure why graphics and sound/music would not count as much as design and programming?
Re: on voting
by on (#232903)
I include gameplay in my overall rating, IIRC. For polish, I look for screen transitions, flourishes, etc.: anything nonessential that enhances the experience. Something like Wolfling's camera scrolling system, Miedow's cutscenes, or nonessential sound effects (such as G2's falling sound effect) are what I think of when I think of polish.
Re: on voting
by on (#233192)
How about we do this:

Art and Style 10
Sound 10
Polish and Completeness 10
Originality 10
Overall 20

And then reconsider the categories for 2019.
Re: on voting
by on (#233385)
I've been thinking about this quite a lot lately. I usually try to create a full-fledged game within 32 or 64K. And I mean a feature-length game, or whatever you call it, and that usually means I have to simplify stuff. I wonder if that is taken into account when judges rate the entries. If I spent such 64K on a single, uber-polished level, with lots of eye candy and complex mechanics, would the game get higher ranks? How do you balance this?

My entries often don't have cool transitions or nice game over screens simply because I'm using every single byte in level data. Would you rather have shorter but prettier games?

I've been thinking about this also because I'm quite aware that very nice things I've worked on and added to my games won't ever get noticed, as they are in the later levels.

I won't change this, I mean, I like the challenge of cramming a full fledged game in just 32K/64K the best I can, but I'm just curious about what you guys think about the matter.
Re: on voting
by on (#233387)
I don't vote, but I'd rather have shorter and prettier games.

Imagine two entries. One with 40 levels, one with 4. Same amount of development time.

The one with 40 levels is likely going to take more of my time to beat. But I'm actually less likely to enjoy the extra time it will take to beat. While the 40 level game's team was building 36 levels, the 4 level game's team was polishing everything else.

Even outside the context of NES, if a game takes 60 hours, I'm likely to beat it only once. I've played many hour-long games 60 times, and greatly prefer it. For every set of game mechanics, there is a cutoff point where new levels stop adding meaningful content.

If I play 60 hour-long games, I get 60 sets of game mechanics that don't overstay their welcome. The longer a single game is, the more likely it is to overstay its welcome. Most games released today outlast the new experiences their mechanics provide. At least for me. Less content to consume means that's less likely to happen, and it also means less of my time will be spent finding out if it gets boring.

In short: I'm less likely to think about whether the time playing the game was worth spending if it wasn't a lot of time.

Edit: Simplifying something to get more space is something I'd personally try very hard not to do, because the player is more likely to notice the simplification than they are to notice the absence of content the simplification made room for. Especially in a free game.
Re: on voting
by on (#233392)
I will generally rank a "full-length" game higher than a short demo of similar quality. A full game just takes a lot more time and effort, and thus is more impressive.

I'm always impressed that you (Mojon Twins) submit full-length games.
Re: on voting
by on (#233406)
Full-length gets more points from me too, though your games are usually so difficult I can't finish them, and so don't see any surprises in the later parts, like you said.
Re: on voting
by on (#233407)
I'm a terrible gamer, so I usually don't get very far in most games. I guess I prefer presentation over length for this reason.
Re: on voting
by on (#233444)
NESHomebrew wrote:
How about we do this:

Art and Style 10
Sound 10
Polish and Completeness 10
Originality 10
Overall 20

And then reconsider the categories for 2019.

Sounds good.