Gaussian Distribution+Hash Tables
The original question I uploaded on StackOverflow. I assume it is extra mathematically inclined so I uploaded it below once more. In regards to mathematics.
There is a class of pupils. Each pupils has a rating in between 0 and also 300 (gaussian, with some recognized standard and also typical inconsistency). I require to locate courses of marks (like 0 - 100, 101 - 125, 126 - 140, 141 - 160, 161 - 175, 176 - 200, 201 - 300) such that :
- The variety of courses is minimum
- The variety of pupils in each class is minimum
How do I deal with doing this? Is this a typical trouble? Additionally, is it feasible to confirm that just this set of courses will certainly have the above building.
You wonder about is a little bit underdefined, yet as a whole what you are seeking is index (order?) data. As an example, if the variety of courses is to be $2$, you require to reduce them at the typical. If it is $3$, you reduced them at the $1/3$ and also $2/3$ order data, and more. The courses will certainly be totally equally - spread.
The previous paragraph thought the courses can rely on the information. If the courses are not intended to depend just on the circulation, after that you simply require to locate the index data for the circulation. As an example, the typical of a Gaussian amounts to its mean, and also for the various other factors you can get in touch with a table (there is no shut kind). You can after that approximate just how equally spread out ball games are mosting likely to be, as an example the spread in the L2 spread is offered by a $\chi^2$ circulation, whose suggest you can seek out in Wikipedia.
To have the minimum variety of courses, take a class from 0 to 300. To have the minimum variety of pupils per class, take each various rating to be a class.
Extra seriously, you usually require to specify the variety of courses you desire, and also whether you desire all the courses to be concerning an offered dimension. You could desire this when offering qualities, claim 50% A's, 40% B is and also 10% C's. For various other applications you are simply attempting to team "things that are close adequate with each other that we can treat them the same." If the circulation is a real Gaussian, without sound, there is no wonderful solution. Yet with a sensibly tiny example, you can simply outline a pie chart, seek ratings that are reasonably uncommon, and also make it there.
In greater than one measurement it obtains tougher. You could consider cluster analysis
Statisticians would generally describe your "classes" as "bins".
Your needs for "minimum variety of bind" and also "minimum dimension of the bins" remain in resistance, and also you would certainly need to make clear just how the problem needs to be settled. As an example, if one setup has 10 containers with 5 marks each, and also an additional has 5 containers with 10 marks each, which is much better? Or, one setup has 10 containers with 10 marks each, and also an additional has 10 containers complete, 9 with 9 marks each, and also the last one with 19 marks. You need to define a regulation for just how to recognize which of any kind of 2 setups is much better. Just after that can you ask just how to attain the most effective one, and also whether it is one-of-a-kind.