Two Cents on Computer Architecture Research -104 [Gollu’s Kutti Story: Solving a research problem, one step at a time]
Preamble: This blog is a continuation of my previous blogs, Two Cents on Computer Architecture Research [1][2][3]. It is primarily meant for Indian undergraduate and early graduate students who have taken a course on computer organization/architecture and want to pursue research in computer architecture. As I write this blog, the COVID-19 surge in India is at its peak and still soaring. So, first, stay safe and then read this blog. PS: Kutti means “small” in Tamil :)
Through this blog, we jump into the journey of solving a problem, one step at a time. So this blog takes a ten-foot view, not a 10,000-foot view. After going through [1][2][3], Gollu found a research problem to solve. From where? Maybe while reading research papers from top venues, or while discussing with his group members, or maybe his advisor, Prof. Golli, gave him a problem to solve. He did what was supposed to be done, as mentioned in [3]. So now he is excited and ready to solve the problem. For the benefit of readers, we will assume a simple problem (one that undergraduates can relate to) about the performance aspects of computer architecture: the goal is to improve the execution time of an application through microarchitectural enhancements such as better caches, TLBs, etc.
Let the kutti story begin: Gollu, excited to solve his first research problem, met his advisor, Prof. Golli. Prof. Golli suggested going through this amazing article by Prof. Mark Hill (https://www.sigarch.org/increasing-your-research-impact/) and not forgetting Amdahl’s law. The rest of the story is the conversation between Gollu and Prof. Golli.
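A quick refresher for readers before the conversation starts: Amdahl’s law bounds the overall speedup by the fraction of execution time that an enhancement actually touches. Below is a minimal sketch; the function name and the 30% figure are illustrative assumptions, not numbers from Gollu’s work.

```python
def amdahl_speedup(fraction_enhanced: float, enhancement_speedup: float) -> float:
    """Overall speedup when `fraction_enhanced` of the execution time runs
    `enhancement_speedup` times faster and the rest is unchanged."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    if enhancement_speedup <= 0.0:
        raise ValueError("enhancement_speedup must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / enhancement_speedup)


# Hypothetical case: 30% of the execution time is spent in the part we improve.
print(f"Part made 2x faster: {amdahl_speedup(0.30, 2.0):.3f}x overall")
# An infinitely fast enhancement gives the upper bound for that fraction.
print(f"Upper bound: {amdahl_speedup(0.30, float('inf')):.3f}x overall")
```

With these assumed numbers, the overall speedup is roughly 1.18x, and even a perfect enhancement cannot exceed about 1.43x.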
Gollu: I have this exciting problem to solve, and I guess I have some interesting ideas too. Here is an initial write-up/slides.
Prof. Golli: Sounds interesting. Let’s take one step at a time on the problem itself.
Gollu: Why on the problem? I know the problem. I want to discuss the ideas to solve the problem.
Prof. Golli: Well, if you know the problem well, then you know the answer to the problem too.
Instead of searching for ideas, understand the problem deeply. The more you know about the problem, the better your solution will be.
Gollu: ohhhhh.
Prof. Golli: What is the scope of the problem?
Gollu: What do you mean by that? And how do I quantify or reason about it?
Prof. Golli: Let’s assume you have the best possible solution for this problem. For example, if you want to propose a new branch predictor, let’s see the performance when the branch predictor accuracy is 100%; similarly, a 100% cache hit rate, a 100% TLB hit rate, etc. So could you get me a plot that shows the performance improvement you can achieve with this oracle, and where the state-of-the-art (SOTA) stands? How big is the gap? If the gap is narrow, then the community has largely solved the problem, and it has marginal scope. Did the SOTA ignore some cases where you can improve?
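(An aside for readers: the oracle-versus-SOTA comparison can be sketched as below. The function name, the workload names, and the cycle counts are all hypothetical; in practice the numbers would come from your simulator runs with the baseline, the SOTA, and the oracle configuration.)

```python
def headroom(baseline_cycles: float, sota_cycles: float, oracle_cycles: float) -> dict:
    """Compare SOTA against an oracle, both measured against the same baseline."""
    sota_speedup = baseline_cycles / sota_cycles
    oracle_speedup = baseline_cycles / oracle_cycles
    ideal_gain = oracle_speedup - 1.0
    # Fraction of the ideal gain that SOTA already delivers (undefined if there is no ideal gain).
    captured = (sota_speedup - 1.0) / ideal_gain if ideal_gain > 0 else float("nan")
    return {
        "sota_speedup": sota_speedup,
        "oracle_speedup": oracle_speedup,
        "gap": oracle_speedup - sota_speedup,
        "captured_fraction": captured,
    }


# Hypothetical cycle counts per workload: (baseline, SOTA, oracle).
measurements = {
    "workload_a": (1000.0, 900.0, 700.0),
    "workload_b": (1000.0, 820.0, 800.0),
}
for name, cycles in measurements.items():
    r = headroom(*cycles)
    print(f"{name}: SOTA {r['sota_speedup']:.2f}x, oracle {r['oracle_speedup']:.2f}x, "
          f"SOTA captures {r['captured_fraction']:.0%} of the ideal gain")
```

In this made-up data, workload_a still has a wide gap, while workload_b is nearly solved by the SOTA.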
Gollu: I did not think about it. I was thinking about the ideas that may work. I will get back to you soon.
Prof. Golli: Make sure you use the right set of benchmarks; otherwise, the insights and conclusions will be wrong. For example, if you want to improve the performance of TLBs, then your benchmark suite should contain workloads with high TLB miss rates. Look for memory-intensive and irregular workloads that spend a good fraction of their time on TLB misses.
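(An aside for readers: one common way to shortlist such workloads is to rank them by misses per kilo-instruction, MPKI. The sketch below assumes you already have per-workload TLB miss and instruction counts; the workload names, the counts, and the 1.0 MPKI threshold are hypothetical.)

```python
def mpki(misses: int, instructions: int) -> float:
    """Misses per kilo-instruction."""
    return 1000.0 * misses / instructions


def select_workloads(stats: dict, min_mpki: float) -> list:
    """Return workloads at or above the MPKI threshold, highest MPKI first."""
    selected = [name for name, (misses, instrs) in stats.items()
                if mpki(misses, instrs) >= min_mpki]
    return sorted(selected, key=lambda name: mpki(*stats[name]), reverse=True)


# Hypothetical (TLB misses, retired instructions) per workload.
tlb_stats = {
    "workload_a": (2_500_000, 100_000_000),
    "workload_b": (40_000, 100_000_000),
    "workload_c": (900_000, 100_000_000),
}
print(select_workloads(tlb_stats, min_mpki=1.0))
```

The threshold is a judgment call, so it is worth reporting which workloads were dropped and why.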
…….
Gollu: Here is the plot showing a small gap between SOTA and the ideal solution. I think this problem is not worth solving. I am not sure what to do now.
Prof. Golli: OK. What if we bring “xyz” into this plot? XYZ can be a new dimension to the same old problem: new technology, new workloads, or a new microarchitecture. No solution is perfect under all conditions. So, let’s try it.
Gollu: Oh, let me do that.
…….
Gollu: Yes, the scope is much larger now. Shall I try “abc” too and see if that broadens the scope further?
Prof. Golli: Sure. However, do not digress too much.
Gollu: I am on it.
…….
Gollu: I guess we have a strong motivation now. I am excited again. Let’s discuss the ideas.
Prof. Golli: Nah. Let’s discuss why SOTA ideas are ineffective or less effective. Could you get me some insights?
Gollu: Hmmm. But why should I care about what others have done? I have a good problem to solve, and we should move on with the ideas.
Prof. Golli: We should do it so that we do not end up with an incremental solution on top of the state-of-the-art.
Gollu: Oh, I see. OK.
Prof. Golli: Also, let’s write all that we have got.
Gollu: Write? Now? Let’s focus on finding ideas.
…….
Gollu: So, here are the insights on the SOTA and why there is a big gap between the SOTA and the ideal solution. Here is the write-up summarizing all. I guess I have the ideas now.
Prof. Golli: Let’s try to improve the existing ideas incrementally, and see if the gap to the ideal performance narrows.
Gollu: Yes, it does for some of the existing ideas. However, not for all the workloads, and there is still a significant gap to the ideal performance. Can we focus on the ideas now? I think I have some ideas that bridge the performance gap.
Prof. Golli: Can we characterize the workloads where SOTA ideas fail? Is it possible to get performance closer to the ideal solution in practice, keeping all the trade-offs in mind? What are the trade-offs of interest? Let’s get a detailed characterization with an end-to-end picture. These characterizations will help us find ideas.
…….
Gollu: Here are the insights; there are many, so many plots. I am not sure what to get out of all these plots.
Prof. Golli: Connect all the plots and try to get a global picture.
Gollu: I found something interesting that SOTA ideas miss. It is extremely simple, and I guess anyone can come up with this idea. I think I should find a better idea.
Prof. Golli: Did anyone propose these insights before?
Gollu: No
Prof. Golli: Then :) It is not easy to find simple ideas that work. So you are on the right path. I guess now you know what to do, and you have your ideas too :)
Gollu: Oh yes, I will evaluate and get back to you.
Prof. Golli: Sure. Before that, write again.
Gollu: Now? Again?
Prof. Golli: Yup
…….
Gollu: I noticed some subtle points while writing and performed some more experiments. Now I have some more insights. I have evaluated everything. The performance is closer to the ideal solution. I think we are done. We should submit this work.
Prof. Golli: Can you find out why you get what you get?
Gollu: I am confused.
Prof. Golli: Let’s take a workload where you’ve shown a 25% performance improvement. Where does this 25% come from? Could you get me a stacked bar showing how much each enhancement contributes?
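(An aside for readers: one simple way to get the data for such a stacked bar is to enable the enhancements one at a time and attribute the change in speedup to each step. The sketch below assumes that approach; the enhancement labels and speedups are hypothetical, and the split depends on the order in which enhancements are enabled, so it is worth trying more than one order.)

```python
def stepwise_contributions(cumulative_speedups: list) -> dict:
    """Attribute improvement to each step.

    `cumulative_speedups` is an ordered list of (label, speedup over baseline)
    pairs, where each entry has one more enhancement enabled than the previous one.
    Returns the contribution of each step in percentage points.
    """
    contributions = {}
    previous = 1.0  # the baseline itself
    for label, speedup in cumulative_speedups:
        contributions[label] = (speedup - previous) * 100.0
        previous = speedup
    return contributions


# Hypothetical ablation for one workload that ends at a 25% improvement.
steps = [("enhancement_1", 1.10), ("enhancement_2", 1.18), ("enhancement_3", 1.25)]
for label, points in stepwise_contributions(steps).items():
    print(f"{label}: +{points:.1f} percentage points")
```

Each value becomes one segment of the stacked bar, and the segments add up to the total improvement.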
…….
Gollu: Here it is. I guess now we are done. Shall we submit to a conference?
Prof. Golli: I see. The ideal solution provides a 40% performance improvement, whereas we get 25%. What happened to the rest? Where are we failing?
Gollu: I found a bug in my implementation; now I am getting only 15% improvement.
Prof. Golli: Any new insights that we can improve upon?
Gollu: There is an insight that is not closely related to our work, but it can be used. Shall I go for it?
Prof. Golli: Sure.
…….
Gollu: Now, I am getting 28% improvement, and I have reasons why we fail to get 40%. I think our idea is OK but not good enough.
Prof. Golli: If someone wanted to build on what we have proposed to reach 40%, how practical would that be?
Gollu: Oh, that will demand significant storage overhead, which is not practical.
Prof. Golli: Then we are good to go. Could you perform sensitivity studies showing the effectiveness of your idea across different microarchitectures? Also, compare where we do well and where SOTA ideas fail, and explain why.
Gollu: Done, what next?
Prof. Golli: Write, write, write, read, write, read, write. ……
Gollu: Till?
Prof. Golli: Our goal should be to reject our own work and keep improving it until we find no more negatives. At the very least, we should be sure that we have done our best.
…….
Gollu: While reading, I got another insight. Is it too late to incorporate it?
Prof. Golli: Not at all. Could you do it?
Gollu: Now, we have 35% improvement, and I have reasons for all the performance benefits, and I have got some ideas that we can try to extend this work further.
Prof. Golli: Awesome. Let’s freeze all and submit.
So, if I have to summarize the story: understand the problem, find out where the problem is, get the ideal solution, characterize the problem, and get the insights that give birth to ideas. Then ask how good the idea is, and why something works or does not work. As always, thanks to my mentees, especially Anuj and Vinod. This “kutti” story is a real story, where Gollu is my M.S. student Vasudha (https://www.cse.iitk.ac.in/users/vasudha/).
[A small note: 06th July 2024] Thanks to Ishita [4] for bringing out a subtle message relevant to this blog. For some research problems, you may not need to find out where the problem is, and the ideal solution. Instead, you jump into the problem with a “what if”/ “can we” kinda narrative. One such research paper is Ghost [5], which appeared in ISCA 2024.
[1] https://medium.com/@biswabandan/two-cents-on-computer-architecture-research-101-4f00957c312a
[2] https://medium.com/@biswabandan/two-cents-on-computer-architecture-research-102-8ec0e127b25d
[4] https://ishitachaturvedi.github.io/
[5] https://liberty.princeton.edu/Publications/isca24_ghost.pdf