i feel like you misunderstood that part of the argument.
he is saying representing the state is very hard, and you are saying: given a well-represented state, ML is very good at finding the important features, reducing the dimensionality, finding mathematical transformations, etc.
deep learning has been so successful with images because representing them is trivial: a flattened pixel vector.
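to make the "trivial representation" point concrete, here is a minimal sketch of flattening an image into a single vector. the 2x2 RGB image and the `flatten_image` helper are made up for illustration, not taken from any particular library:

```python
def flatten_image(image):
    """Flatten an H x W x C nested list of pixel values into one flat list."""
    return [channel for row in image for pixel in row for channel in pixel]


# A tiny, made-up 2x2 RGB image (values are illustrative only).
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]

# The whole "state" is now one vector of H * W * C numbers.
vector = flatten_image(image)
```

there is no comparably obvious recipe for a game state with fog of war, unit orders, and resources, which is the point being argued.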
as for your last paragraph: in starcraft, that raises some questions about what rules the AI is going to adhere to.
in SC, you don't view the entire board. you view the minimap, hear noises and alerts, and decide where to focus your attention on the map. in battle, being able to click and accurately place attacks quickly is important.
do you give the computer a full view of everything it would be able to see? does the computer get 10-million-clicks-per-second abilities, so that essentially every action is like hitting pause and then making the next action?
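one way to answer the clicks-per-second question is to cap the agent's actions per minute. below is a rough sketch of a sliding-window limiter, assuming the game exposes a clock in seconds. `ApmLimiter`, `max_apm`, and `try_act` are hypothetical names, not part of any real StarCraft API:

```python
from collections import deque


class ApmLimiter:
    """Allow at most `max_apm` actions in any sliding window of game time."""

    def __init__(self, max_apm, window_seconds=60.0):
        self.max_apm = max_apm
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of recently accepted actions

    def try_act(self, now):
        """Return True and record the action if the budget allows it."""
        # Forget actions that have fallen out of the sliding window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_apm:
            self._timestamps.append(now)
            return True
        return False


# Hypothetical usage: a very low cap of 3 actions per minute.
limiter = ApmLimiter(max_apm=3)
results = [limiter.try_act(t) for t in [0.0, 0.1, 0.2, 0.3]]
```

with a cap like this the agent can no longer "pause" between every action, so it has to prioritise the same way a human does. what number counts as fair is a separate question.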
I was actually assuming the input representation would just be a video stream, which (combined with audio) is enough for human players, but looking more into it, it's a lot more than a video feed[1].
It feels a little like cheating, but I guess processing the game UI video feed isn't the interesting part of the problem. Plus, it makes the problem much more accessible to hobbyists who can't afford the GPU cluster required to productively experiment on models that process streams of 1080p video.
Still, in principle, I think modern ML modeling approaches could handle the problem of transforming the video feed into a useful high level state representation. I don't think I misunderstood the OP in that regard at least.
Using just the video feed, the AI would be required to reconstruct an overview of the strategic situation, and then develop a forward strategy on top of that involving individual units. Even for a much simpler game like Doom, video-only input is enough for strategies like "see an enemy, target and shoot it as fast as possible".
For an AI to be able to compete effectively in a complex game like SC2, preparing high-level inputs is important. Think of these as shortcuts: heuristic approximations of tasks that would be hard to represent and train with deep learning. I would guess an implementation would need multiple independent nets for various tasks, combined with heuristics. Each net could then be trained separately to do its given task.
People should just read the article, I think. It answers all the things you are debating (limit on APM, what features are used, what models they already tried and how well they perform).