Posted on 2012-11-30 15:48 · Liaoning
I have found two timing-related problems in the game's main ELF, but I cannot say whether that list is complete or just the tip of the iceberg. Both problems can be triggered by a CDVD command completing at the wrong time, and, if triggered, they will freeze the game instantly. The problems are:
1. Access to the global variable cntrl_** from within both cdvCallback and cdvMain.
2. Use of sceCdGetError in cdvCallback.
Both are programming errors, and a program containing them is not guaranteed to run without issues even on real hardware. In my opinion, it is pure luck that SRW does not hang when run from optical media. The first problem can be masked if the hardware is slow enough, but the second one cannot (fortunately, the second one occurs even more rarely).
Now for the details. The global variable cntrl_** keeps track of whether a command has been sent to the CDVD mechanism. It starts at zero and is incremented on two occasions: once by cdvMain when it starts a command (e.g., via sceCdRead), and once more by cdvCallback when the command has completed successfully. The problem is that cdvMain increments the variable only after the command has been started successfully, using a non-atomic three-instruction sequence (e.g., lw, addiu, sw), so the command can already complete while that sequence is still in progress. cdvCallback increments it the same way after calling sceCdGetError to check for an error. It reminds me of Shadow Hearts Covenant; it's the same kind of problem.
Now suppose the callback comes in after cdvMain's lw/addiu but before its sw. The temporary register then contains the value 1, which will be written to cntrl_** no matter what. However, the callback thread has a higher priority, so it interrupts this sequence and updates cntrl_** itself, setting it to 1. When cdvMain resumes, it writes the value 1 again, but then waits for the value 2 to appear. That never happens (the callback has already run, but its update to the variable was lost), so the game hangs.
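To make the interleaving concrete, here is a minimal C model of the lost update. It assumes nothing about the real code beyond what is described above: `cntrl` stands in for cntrl_**, `cdv_callback` and `simulate_round` are hypothetical names, and the three MIPS instructions are replayed as three C statements so the bad ordering can be forced deterministically instead of waiting for real CDVD timing.

```c
/* Hypothetical host-side model of the race; "cntrl" stands in for the
 * game's cntrl_** variable. No real CDVD or threading is involved: the
 * interleaving is replayed step by step so the outcome is deterministic. */
static int cntrl;

/* cdvCallback: higher priority, so once scheduled it runs its whole
 * read-modify-write without being interrupted by cdvMain. */
static void cdv_callback(void)
{
    cntrl = cntrl + 1;
}

/* Replays one command round and returns the final counter value.
 * If preempt_before_sw is nonzero, the command completes (and the
 * callback runs) between cdvMain's addiu and sw. */
static int simulate_round(int preempt_before_sw)
{
    cntrl = 0;

    /* cdvMain has already started the command (e.g. sceCdRead) and now
     * increments cntrl with a non-atomic lw / addiu / sw sequence. */
    int t0 = cntrl;          /* lw    t0, cntrl   */
    t0 = t0 + 1;             /* addiu t0, t0, 1   */
    if (preempt_before_sw)
        cdv_callback();      /* completion arrives here: cntrl becomes 1 */
    cntrl = t0;              /* sw    t0, cntrl: stores the stale 1 */

    if (!preempt_before_sw)
        cdv_callback();      /* normal case: the callback runs afterwards */

    return cntrl;            /* cdvMain waits until this reaches 2 */
}
```

With the normal ordering, `simulate_round(0)` ends with the counter at 2, which is what cdvMain waits for; with the callback squeezed in before the store, `simulate_round(1)` ends at 1 and the wait never finishes.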
This problem exists at several locations within cdvMain, as there is no single place where it increments cntrl_**. The code is duplicated a few times, essentially once for every call to an asynchronous CDVD routine (sceCdRead, sceCdStandby and so on). A fix would need to patch all of these increments so that cdvCallback cannot interrupt them; I recommend patching cdvCallback itself as well. I will try wrapping the increment in DIntr/EIntr and see whether that helps.
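The DIntr/EIntr idea can be sketched the same way. This is only a host-side model: `disable_interrupts` and `enable_interrupts` are hypothetical stand-ins for DIntr()/EIntr(), and the assumption is that while interrupts are disabled the completion callback is deferred until they are re-enabled, instead of running in the middle of the lw/addiu/sw sequence.

```c
/* Hypothetical model of the proposed fix. On the PS2 the critical section
 * would be DIntr()/EIntr(); here two flags emulate that behaviour. */
static int cntrl;
static int interrupts_enabled = 1;
static int callback_pending;

static void cdv_callback(void) { cntrl = cntrl + 1; }

/* Command completion: the callback runs immediately only if interrupts
 * are enabled; otherwise it is deferred (assumption of this model). */
static void command_completes(void)
{
    if (interrupts_enabled)
        cdv_callback();
    else
        callback_pending = 1;
}

static void disable_interrupts(void)   /* stand-in for DIntr() */
{
    interrupts_enabled = 0;
}

static void enable_interrupts(void)    /* stand-in for EIntr() */
{
    interrupts_enabled = 1;
    if (callback_pending) {            /* deliver the deferred completion */
        callback_pending = 0;
        cdv_callback();
    }
}

/* Same bad timing as before, but the increment is now protected. */
static int simulate_protected_round(void)
{
    cntrl = 0;
    disable_interrupts();
    int t0 = cntrl;          /* lw    */
    t0 = t0 + 1;             /* addiu */
    command_completes();     /* completion arrives mid-sequence, gets deferred */
    cntrl = t0;              /* sw    */
    enable_interrupts();     /* deferred callback runs now: cntrl becomes 2 */
    return cntrl;
}
```

The critical section only needs to cover the three instructions of each increment, which is why every duplicated increment site in cdvMain would have to be patched.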
The first issue is rare, and unfortunately it does not seem to occur as often on my test setup as it does on your machine. I can leave my TOOL running with ODEM for half an hour before the game freezes. However, when I single-step across the problematic regions in the debugger, the hang is reproducible. I can also get the game to continue after such a hang by manually setting cntrl_** to 2.
Second problem. Remember I said that cdvCallback invokes sceCdGetError to check whether the command completed successfully? Well, it's not that simple. The routine actually does this: "while ((err = sceCdGetError()) == -1);". That is, it invokes sceCdGetError repeatedly until it succeeds, denoted by a return value other than -1. But for this routine to work, the EE-side CDVD client library must be able to acquire the SCMD semaphore; that is, no other SCMD may be outstanding when sceCdGetError runs, or the routine fails and returns -1. The second problem is exactly this case: the main thread (cdvMain) issues an SCMD, and that call acquires the SCMD semaphore, performs an RPC and waits for it to complete. Then the callback fires and control switches to cdvCallback, which tries to invoke sceCdGetError. That routine sees the SCMD semaphore being held by another thread and fails, prompting cdvCallback to invoke it again immediately. Of course, cdvMain has no means to run in the meantime (cdvCallback never sleeps and runs at a higher priority), so the program is stuck indefinitely.
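The second problem behaves like a livelock. The sketch below models it, assuming that the hypothetical `fake_get_error` fails with -1 whenever the hypothetical `scmd_sema_held` flag is set, and that under strict priority scheduling nothing can clear that flag while the callback spins. The loop is capped only so the model terminates; the real loop has no cap.

```c
/* Hypothetical model of the second problem. scmd_sema_held stands in for
 * the SCMD semaphore taken by cdvMain's outstanding SCMD; fake_get_error
 * mimics sceCdGetError failing with -1 while that semaphore is taken. */
static int scmd_sema_held;

static int fake_get_error(void)
{
    if (scmd_sema_held)
        return -1;     /* could not acquire the SCMD semaphore */
    return 0;          /* assume "no error" once the call goes through */
}

/* cdvCallback's retry loop, with an iteration cap so the model ends.
 * Returns the attempt on which the call succeeded, or -1 if the cap was
 * hit. Because the callback never sleeps and outranks cdvMain, nothing
 * releases the semaphore between attempts. */
static int callback_wait_for_error(int max_attempts)
{
    for (int i = 1; i <= max_attempts; i++) {
        if (fake_get_error() != -1)
            return i;
        /* No sleep or yield here: scmd_sema_held cannot change. */
    }
    return -1;
}
```

If the semaphore happens to be free when the callback runs, the first attempt succeeds, which is why the hang only appears when the completion coincides with an outstanding SCMD.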
I don't have a solution to the second problem yet. In my opinion, the program logic is fundamentally flawed, because sceCdGetError is simply not safe to call from the callback. The developers seem to have known about such a problem (otherwise, why would they have put that crazy loop around sceCdGetError?), but unfortunately that doesn't help us. I will try to patch in a call to DelayThread here and see whether that helps; the routine does not exist in the file yet, but it should be easy to implement manually.
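The DelayThread idea amounts to giving up the CPU between retries so that the lower-priority cdvMain can finish its SCMD and release the semaphore. In this sketch, `fake_delay_thread` is a hypothetical stand-in for DelayThread, and `scmd_rpc_remaining` is an assumed count of how many sleeps cdvMain needs to complete its RPC. Whether sleeping inside the real cdvCallback is actually safe is exactly what the patch would have to show.

```c
/* Hypothetical model of the DelayThread patch; all names are stand-ins. */
static int scmd_sema_held;
static int scmd_rpc_remaining;   /* sleeps left until cdvMain's RPC finishes */

static int fake_get_error(void)
{
    return scmd_sema_held ? -1 : 0;   /* -1: SCMD semaphore unavailable */
}

/* Stand-in for DelayThread(): while the callback sleeps, lower-priority
 * cdvMain runs, makes progress on its SCMD RPC and eventually releases
 * the semaphore. */
static void fake_delay_thread(void)
{
    if (scmd_sema_held && --scmd_rpc_remaining <= 0)
        scmd_sema_held = 0;
}

/* Patched retry loop: sleep between failed attempts instead of spinning.
 * Returns the attempt on which the call succeeded, or -1 at the cap. */
static int callback_wait_with_delay(int max_attempts)
{
    for (int i = 1; i <= max_attempts; i++) {
        if (fake_get_error() != -1)
            return i;
        fake_delay_thread();
    }
    return -1;
}
```

For example, with the semaphore held and three sleeps needed, the fourth attempt succeeds; the cap is again only there so the model always terminates.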
I'm going to test some patches to the game, but I cannot say when they'll be finished. I will be very busy with university for at least the coming month, but I will try to find some time for hacking SRW.
This is the reply he gave.