Since we have the opportunity to use the system one more time at Battlecry, we might as well consider some more additions to it. However, we'd like to maintain the simplicity of the system from the drive team's perspective (the summary statistics they get from the system are simple), so we're still not sure we should implement every single one of these.

They are mostly statistics we would add to the Java app and database, falling into three categories: hypothesis tests, confidence intervals, and OPR.

An introduction (important, yet skippable): note that the statistics we get from the database can be treated in two different ways. We could treat them as all the data available: we do not consider previous or future competitions, so the data on the system is all the data there is (assuming we get all forms), making it the result of a census. On the other hand, we could treat them as a sample from the theoretical population of all matches a specific team ever plays with their current drive team and robot. Imagine the set of all matches that drive team and robot ever plays: our data in the system would be a subset (a sample) of that set.

If we treat the data in the system in the first way, then everything below (except OPR) is not relevant. But if we treat it in the second way, we can use the tools below to come up with some more interesting statistics. For the most part this year we've treated the data in the first way, but it may be in our interest to treat it in the second way. For Battlecry, we will assume teams are using the same drive teams they used at the district events so that these methods apply properly, but for next year it would be interesting to include additional questions in prescouting regarding drive team...

- HYPOTHESIS TESTS: say we are at our second district event, we have data on a team from our previous event, and we want to know whether that team has improved since then. A well-designed hypothesis test can tell us. The idea is to hypothesize that they did not improve, then calculate the chances of getting the sample data we did under that hypothesis. Say our data for this team looks visibly better than our data from the previous event. We could take that as enough evidence that they improved, but the increase may simply be sampling variability (natural randomness in the data). If the chances of getting the data we did are too low under the no-improvement hypothesis, then that hypothesis is probably wrong, giving us strong evidence that they did in fact improve. If the chances are not low enough, what looked like an improvement may have just been sampling randomness, which does not show they improved.
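As a sketch of how this could look in the Java app, the comparison above can be done with a two-sample (Welch) t test on a team's per-match gear counts from the two events. The class name, the sample data, and the critical value below are illustrative assumptions, not part of the current system:

```java
import java.util.Arrays;

public class ImprovementTest {
    // Sample mean
    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(0.0);
    }

    // Sample variance (n - 1 denominator)
    static double variance(double[] xs) {
        double m = mean(xs);
        double ss = 0.0;
        for (double x : xs) ss += (x - m) * (x - m);
        return ss / (xs.length - 1);
    }

    // Welch t statistic for H0 "no improvement" (equal true means),
    // against the one-sided alternative that the event-2 mean is higher
    static double welchT(double[] event1, double[] event2) {
        double se = Math.sqrt(variance(event1) / event1.length
                            + variance(event2) / event2.length);
        return (mean(event2) - mean(event1)) / se;
    }

    public static void main(String[] args) {
        // Hypothetical gear counts for the same team at two events
        double[] event1 = { 2, 1, 2, 3, 2, 2, 1, 2 };
        double[] event2 = { 3, 3, 4, 2, 3, 4, 3, 3 };
        double t = welchT(event1, event2);
        // With roughly 14 degrees of freedom, a one-sided 5% test rejects
        // H0 when t exceeds about 1.76 (approximate t critical value)
        System.out.printf("t = %.2f, evidence of improvement: %b%n", t, t > 1.76);
    }
}
```

The one-sided 5% cutoff here is the usual "chances are too low" threshold from the paragraph above; with more matches per event, the cutoff would shrink slightly toward 1.645.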

- CONFIDENCE INTERVALS: It's great to have averages and standard deviations for a team's gear makes and shot makes, but those are sometimes hard to read. A standard deviation is meant to tell you how variable the sample data is, but its significance is tied to units - a standard deviation of 2 gear makes means something very different from a standard deviation of 2 shots made. Besides, the average we have is only a sample average, and shouldn't be regarded as a team's 'true average' even though we often treat it that way. What if we could instead provide a range of values in which we believe the team's true average gear makes lies? We can never calculate a team's true average with a specific drive team and robot (it's a theoretical value), but we can provide an interval likely to include it. That is called a confidence interval. Not only can we provide such an interval, we can also state how confident we are that the value is in the range provided (e.g. 90% confidence, 95% confidence). Looking at this statistic for gear makes (simply a range of plausible averages) would be much simpler than looking at a mean and standard deviation. It's the difference between telling the drive team "this robot usually makes 2.7 to 3.2 gears per match" and "this robot averages 3 gears per match with a standard deviation of 0.4 gears".
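A minimal sketch of the interval calculation, again with assumed data and class names. It uses the normal critical value 1.96 for ~95% confidence; with the small per-event sample sizes we actually get, a t critical value (e.g. ~2.26 for 10 matches) would be more honest, as noted in the comments:

```java
import java.util.Arrays;

public class GearInterval {
    // Sample mean
    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(0.0);
    }

    // Sample standard deviation (n - 1 denominator)
    static double stdDev(double[] xs) {
        double m = mean(xs);
        double ss = 0.0;
        for (double x : xs) ss += (x - m) * (x - m);
        return Math.sqrt(ss / (xs.length - 1));
    }

    // Returns {low, high}, an approximate 95% confidence interval for
    // the true average. Uses z = 1.96; for our small samples a
    // t critical value would widen the interval slightly.
    static double[] confidenceInterval95(double[] xs) {
        double m = mean(xs);
        double half = 1.96 * stdDev(xs) / Math.sqrt(xs.length);
        return new double[] { m - half, m + half };
    }

    public static void main(String[] args) {
        // Hypothetical gear counts for one team across their qual matches
        double[] gears = { 3, 2, 4, 3, 3, 2, 4, 3, 3, 4 };
        double[] ci = confidenceInterval95(gears);
        System.out.printf("~95%% CI for true average gears: %.2f to %.2f%n",
                          ci[0], ci[1]);
    }
}
```

The printed range is exactly the kind of "2.6 to 3.6 gears per match" statement the drive team would see instead of a mean and standard deviation.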

- OFFENSIVE POWER RATING: although a subjective measure, this would have "the database pick teams for you". It would rank all teams by a score calculated from weights on the different functions a robot can have in the game. For example, gears might be 50% of the weight, climbing 35%, shooting 15% - choosing those weights is the subjective part. One way to test how good a weighting system is would be to compare its ranks to the actual qualification rankings of some competition: the closer the two rankings, the better the weighting system. Note that unlike the traditional OPR calculation, which uses linear algebra to solve a least-squares system from alliance scores, this weighted-score version needs no linear algebra at all.
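A sketch of the weighted ranking under the example 50/35/15 split. The team numbers, per-team averages, and normalization scheme (dividing each stat by the best value in the field so the three components are comparable before weighting) are all assumptions for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WeightedRating {
    // Assumed weights - the subjective part of this measure
    static final double W_GEARS = 0.50, W_CLIMB = 0.35, W_SHOTS = 0.15;

    // Each team maps to {avg gears, climb rate, avg shots}. Each stat is
    // divided by the field's best value so all three land on a 0-1 scale
    // before the weights are applied.
    static Map<String, Double> rate(Map<String, double[]> stats) {
        double maxG = 0, maxC = 0, maxS = 0;
        for (double[] s : stats.values()) {
            maxG = Math.max(maxG, s[0]);
            maxC = Math.max(maxC, s[1]);
            maxS = Math.max(maxS, s[2]);
        }
        Map<String, Double> scores = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : stats.entrySet()) {
            double[] s = e.getValue();
            scores.put(e.getKey(), W_GEARS * s[0] / maxG
                                 + W_CLIMB * s[1] / maxC
                                 + W_SHOTS * s[2] / maxS);
        }
        return scores;
    }

    public static void main(String[] args) {
        // Hypothetical scouted averages for three teams
        Map<String, double[]> stats = new LinkedHashMap<>();
        stats.put("125", new double[] { 3.0, 0.9, 10 });
        stats.put("195", new double[] { 2.5, 1.0, 25 });
        stats.put("254", new double[] { 3.5, 0.8, 5 });
        rate(stats).entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .forEach(e -> System.out.printf("%s: %.3f%n", e.getKey(), e.getValue()));
    }
}
```

Swapping in a different weight vector and re-running against a past event's qualification rankings is the comparison test described above.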

These are only examples of what these tools can do. For instance, we could also use a hypothesis test to see whether one team is truly better than another at gears. It's only a matter of whether we think these tools will make the system simpler and/or more powerful. The only chance we'll have to test them is Battlecry.