Computer Architecture

References

1. Adve SV, Gharachorloo K. Shared memory consistency models: A tutorial. IEEE Computer. 1996;29(12):66–76 (December).

2. Adve SV, Hill MD. Weak ordering—a new definition. Proc 17th Annual Int’l Symposium on Computer Architecture (ISCA) 1990;2–14.

3. Agarwal, A. [1987]. “Analysis of Cache Performance for Operating Systems and Multiprogramming,” Ph.D. thesis, Tech. Rep. No. CSL-TR-87-332, Stanford University, Palo Alto, Calif.

4. Agarwal A. Limits on interconnection network performance. IEEE Trans on Parallel and Distributed Systems. 1991;2(4):398–412 (April).

5. Agarwal A, Pudar SD. Column-associative caches: A technique for reducing the miss rate of direct-mapped caches. 20th Annual Int’l Symposium on Computer Architecture (ISCA) 1993; Also appears in Computer Architecture News. 1993;21(2):179–190 (May).

6. Agarwal A, Bianchini R, Chaiken D, Johnson K, Kranz D. The MIT Alewife machine: Architecture and performance. Int’l Symposium on Computer Architecture 1995; June, 2–13.

7. Agarwal A, Hennessy JL, Simoni R, Horowitz MA. An evaluation of directory schemes for cache coherence. Proc 15th Int’l Symposium on Computer Architecture (June) 1988;280–289.

8. Agarwal A, Kubiatowicz J, Kranz D, et al. Sparcle: An evolutionary processor design for large-scale multiprocessors. IEEE Micro. 1993;13:48–61 (June).

9. Agerwala, T., and J. Cocke [1987]. High Performance Reduced Instruction Set Processors, IBM Tech. Rep. RC12434, IBM, Armonk, N.Y.

10. Akeley K, Jermoluk T. High-Performance Polygon Rendering. Proc 15th Annual Conf on Computer Graphics and Interactive Techniques (SIGGRAPH 1988) 1988;239–246.

11. Alexander WG, Wortman DB. Static and dynamic characteristics of XPL programs. IEEE Computer. 1975;8(11):41–46 (November).

12. Alles, A. [1995]. “ATM Internetworking,” White Paper (May), Cisco Systems, Inc., San Jose, Calif. (www.cisco.com/warp/public/614/12.html).

13. Alliant. Alliant FX/Series: Product Summary Acton, Mass: Alliant Computer Systems Corp.; 1987.

14. Almasi GS, Gottlieb A. Highly Parallel Computing Redwood City, Calif.: Benjamin/Cummings; 1989.

15. Alverson G, Alverson R, Callahan D, Koblenz B, Porterfield A, Smith B. Exploiting heterogeneous parallelism on a multithreaded multiprocessor. Proc ACM/IEEE Conf on Supercomputing 1992;188–197.

16. Amdahl GM. Validity of the single processor approach to achieving large scale computing capabilities. Proc AFIPS Spring Joint Computer Conf. 1967;483–485.

17. Amdahl GM, Blaauw GA, Brooks Jr FP. Architecture of the IBM System 360. IBM J Research and Development. 1964;8(2):87–101 (April).

18. Amza C, Cox AL, Dwarkadas S, et al. Treadmarks: Shared memory computing on networks of workstations. IEEE Computer. 1996;29(2):18–28 (February).

19. Anderson D. You don’t know jack about disks. Queue. 2003;1(4):20–30 (June).

20. Anderson D, Dykes J, Riedel E. SCSI vs ATA—More than an interface. Proc 2nd USENIX Conf on File and Storage Technology (FAST ’03) 2003.

21. Anderson DW, Sparacio FJ, Tomasulo RM. The IBM 360 Model 91: Processor philosophy and instruction handling. IBM J Research and Development. 1967;11(1):8–24 (January).

22. Anderson MH. Strength (and safety) in numbers (RAID, disk storage technology). Byte. 1990;15(13):337–339 (December).

23. Anderson TE, Culler DE, Patterson D. A case for NOW (networks of workstations). IEEE Micro. 1995;15(1):54–64 (February).

24. Ang B, Chiou D, Rosenband D, Ehrlich M, Rudolph L, Arvind. StarT-Voyager: A flexible platform for exploring scalable SMP issues. Proc ACM/IEEE Conf on Supercomputing 1998.

25. Anjan KV, Pinkston TM. An efficient, fully-adaptive deadlock recovery scheme: Disha. Proc 22nd Annual Int’l Symposium on Computer Architecture (ISCA) 1995.

26. Anon. et al. [1985]. A Measure of Transaction Processing Power, Tandem Tech. Rep. TR85.2. Also appears in Datamation 31:7 (April), 112–118, 1985.

27. Apache Hadoop. In: http://hadoop.apache.org; 2011.

28. Archibald J, Baer J-L. Cache coherence protocols: Evaluation using a multiprocessor simulation model. ACM Trans on Computer Systems. 1986;4(4):273–298 (November).

29. Armbrust, M., A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, M. Zaharia [2009]. Above the Clouds: A Berkeley View of Cloud Computing, Tech. Rep. UCB/EECS-2009-28, University of California, Berkeley (http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.html).

30. Arpaci RH, Culler DE, Krishnamurthy A, Steinberg SG, Yelick K. Empirical evaluation of the CRAY-T3D: A compiler perspective. 22nd Annual Int’l Symposium on Computer Architecture (ISCA) 1995.

31. Asanovic, K. [1998]. “Vector Microprocessors,” Ph.D. thesis, Computer Science Division, University of California, Berkeley.

32. Associated Press. Gap Inc shuts down two Internet stores for major overhaul. In: USATODAY.com; 2005.

33. Atanasoff, J.V. [1940]. Computing Machine for the Solution of Large Systems of Linear Equations, Internal Report, Iowa State University, Ames.

34. Atkins M. Performance and the i860 Microprocessor. IEEE Micro. 1991;11(5):24–27, 72–78 (September).

35. Austin TM, Sohi G. Dynamic dependency analysis of ordinary programs. Proc 19th Annual Int’l Symposium on Computer Architecture (ISCA) 1992;342–351.

36. Gabbay F, Mendelson A. Using value prediction to increase the power of speculative execution hardware. ACM Trans on Computer Systems. 1998;16(3):234–270 (August).

37. Baer J-L, Wang W-H. On the inclusion property for multi-level cache hierarchies. Proc 15th Annual Int’l Symposium on Computer Architecture 1988;73–80.

38. Bailey DH, Barszcz E, Barton JT, et al. The NAS parallel benchmarks. Int’l J Supercomputing Applications. 1991;5:63–73.

39. Bakoglu HB, Grohoski GF, Thatcher LE, et al. IBM second-generation RISC processor organization. Proc IEEE Int’l Conf on Computer Design, September 1989;138–142.

40. Balakrishnan H, Padmanabhan VN, Seshan S, Katz RH. A comparison of mechanisms for improving TCP performance over wireless links. IEEE/ACM Trans on Networking. 1997;5(6):756–769 (December).

41. Ball T, Larus J. Branch prediction for free. Proc ACM SIGPLAN’93 Conference on Programming Language Design and Implementation (PLDI) 1993;300–313.

42. Banerjee, U. [1979]. “Speedup of Ordinary Programs,” Ph.D. thesis, Dept. of Computer Science, University of Illinois at Urbana-Champaign.

43. Barham P, Dragovic B, Fraser K, et al. Xen and the art of virtualization. Proc of the 19th ACM Symposium on Operating Systems Principles 2003.

44. Barroso LA. Warehouse Scale Computing [keynote address]. Proc ACM SIGMOD 2010.

45. Barroso LA, Hölzle U. The case for energy-proportional computing. IEEE Computer. 2007;40(12):33–37 (December).

46. Barroso LA, Hölzle U. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines San Rafael, Calif.: Morgan & Claypool; 2009.

47. Barroso LA, Gharachorloo K, Bugnion E. Memory system characterization of commercial workloads. Proc 25th Annual Int’l Symposium on Computer Architecture (ISCA) 1998;3–14.

48. Barton RS. A new approach to the functional design of a computer. Proc Western Joint Computer Conf. 1961;393–396.

49. Bashe CJ, Buchholz W, Hawkins GV, Ingram JL, Rochester N. The architecture of IBM’s early computers. IBM J Research and Development. 1981;25(5):363–375 (September).

50. Bashe CJ, Johnson LR, Palmer JH, Pugh EW. IBM’s Early Computers Cambridge, Mass: MIT Press; 1986.

51. Baskett F, Keller TW. An evaluation of the Cray-1 processor. In: Kuck DJ, Lawrie DH, Sameh AH, eds. High Speed Computer and Algorithm Organization. San Diego: Academic Press; 1977;71–84.

52. Baskett F, Jermoluk T, Solomon D. The 4D-MP graphics superworkstation: Computing + graphics = 40 MIPS + 40 MFLOPS and 10,000 lighted polygons per second. Proc IEEE COMPCON 1988;468–471.

53. BBN Laboratories. [1986]. Butterfly Parallel Processor Overview, Tech. Rep. 6148, BBN Laboratories, Cambridge, Mass.

54. Bell CG. The mini and micro industries. IEEE Computer. 1984;17(10):14–30 (October).

55. Bell CG. Multis: A new class of multiprocessor computers. Science. 1985;228:462–467 (April 26).

56. Bell CG. The future of high performance computers in science and engineering. Communications of the ACM. 1989;32(9):1091–1101 (September).

57. Bell, G., and J. Gray [2001]. Crays, Clusters and Centers, Tech. Rep. MSR-TR-2001-76, Microsoft Research, Redmond, Wash.

58. Bell CG, Gray J. What’s next in high performance computing? CACM. 2002;45(2):91–95 (February).

59. Bell CG, Newell A. Computer Structures: Readings and Examples New York: McGraw-Hill; 1971.

60. Bell CG, Strecker WD. Computer structures: What have we learned from the PDP-11? Third Annual Int’l Symposium on Computer Architecture (ISCA) 1976;1–14.

61. Bell CG, Strecker WD. Computer structures: What have we learned from the PDP-11? 25 Years of the International Symposia on Computer Architecture (Selected Papers) 1998;138–151.

62. Bell CG, Mudge JC, McNamara JE. A DEC View of Computer Engineering Bedford, Mass: Digital Press; 1978.

63. Bell CG, Cady R, McFarland H, et al. A new architecture for mini-computers: The DEC PDP-11. Proc AFIPS Spring Joint Computer Conf. 1970;657–675.

64. Benes VE. Rearrangeable three stage connecting networks. Bell System Technical Journal. 1962;41:1481–1492.

65. Bertozzi D, Jalabert A, Murali S, et al. NoC synthesis flow for customized domain specific multiprocessor systems-on-chip. IEEE Trans on Parallel and Distributed Systems. 2005;16(2):113–130 (February).

66. Bhandarkar DP. Alpha Architecture and Implementations Newton, Mass: Digital Press; 1995.

67. Bhandarkar DP, Clark DW. Performance from architecture: Comparing a RISC and a CISC with similar hardware organizations. Proc Fourth Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1991;310–319.

68. Bhandarkar DP, Ding J. Performance characterization of the Pentium Pro processor. Proc Third Int’l Symposium on High-Performance Computer Architecture 1997;288–297.

69. Bhuyan LN, Agrawal DP. Generalized hypercube and hyperbus structures for a computer network. IEEE Trans on Computers. 1984;32(4):322–333 (April).

70. Bienia, C., S. Kumar, J. P. Singh, K. Li [2008]. The Parsec Benchmark Suite: Characterization and Architectural Implications, Tech. Rep. TR-811-08, Princeton University, Princeton, N.J.

71. Bier, J. [1997]. “The Evolution of DSP Processors,” presentation at University of California, Berkeley, November 14.

72. Bird S, Phansalkar A, John LK, Mericas A, Indukuru R. Characterization of performance of SPEC CPU benchmarks on Intel’s Core Microarchitecture based processor. Proc 2007 SPEC Benchmark Workshop 2007.

73. Birman M, Samuels A, Chu G, et al. Developing the WRL3170/3171 SPARC floating-point coprocessors. IEEE Micro. 1990;10(1):55–64.

74. Blackburn M, Garner R, Hoffman C, et al. The DaCapo benchmarks: Java benchmarking development and analysis. ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA) 2006;169–190.

75. Blaum M, Bruck J, Vardy A. MDS array codes with independent parity symbols. IEEE Trans on Information Theory. 1996;IT-42:529–542 (March).

76. Blaum M, Brady J, Bruck J, Menon J. EVENODD: An optimal scheme for tolerating double disk failures in RAID architectures. Proc 21st Annual Int’l Symposium on Computer Architecture (ISCA) 1994;245–254.

77. Blaum M, Brady J, Bruck J, Menon J. EVENODD: An optimal scheme for tolerating double disk failures in RAID architectures. IEEE Trans on Computers. 1995;44(2):192–202 (February).

78. Blaum M, Brady J, Bruck J, Menon J, Vardy A. The EVENODD code and its generalization. In: Jin H, Cortes T, Buyya R, eds. High Performance Mass Storage and Parallel I/O: Technologies and Applications. New York: Wiley–IEEE; 2001;187–208.

79. Bloch E. The engineering design of the Stretch computer. 1959 Proceedings of the Eastern Joint Computer Conf. 1959;48–59.

80. Boddie JR. History of DSPs. In: http://www.lucent.com/micro/dsp/dsphist.html; 2000.

81. Bolt KM. Amazon sees sales rise, profit fall. Seattle Post-Intelligencer. 2005; October 25. http://seattlepi.nwsource.com/business/245943_techearns26.html.

82. Bordawekar R, Bondhugula U, Rao R. Believe It or Not!: Multi-core CPUs can Match GPU Performance for a FLOP-Intensive Application!. 19th International Conference on Parallel Architecture and Compilation Techniques (PACT 2010) 2010;537–538.

83. Borg A, Kessler RE, Wall DW. Generation and analysis of very long address traces. 19th Annual Int’l Symposium on Computer Architecture (ISCA) 1992;270–279.

84. Bouknight WJ, Deneberg SA, McIntyre DE, Randall JM, Sameh AH, Slotnick DL. The Illiac IV system. Proc IEEE. 1972;60(4):369–379. Also appears in: Siewiorek DP, Bell CG, Newell A. Computer Structures: Principles and Examples New York: McGraw-Hill; 1972; 306–316.

85. Brady JT. A theory of productivity in the creative process. IEEE CG&A 1986; (May), 25–34.

86. Brain M. Inside a Digital Cell Phone. In: www.howstuffworks.com/inside-cellphone.htm; 2000.

87. Brandt M, Brooks J, Cahir M, Hewitt T, Lopez-Pineda E, Sandness D. The Benchmarker’s Guide for Cray SV1 Systems Seattle, Wash: Cray Inc.; 2000.

88. Brent RP, Kung HT. A regular layout for parallel adders. IEEE Trans on Computers. 1982;C-31:260–264.

89. Brewer EA, Kuszmaul BC. How to get good performance from the CM-5 data network. Proc Eighth Int’l Parallel Processing Symposium 1994.

90. Brin S, Page L. The anatomy of a large-scale hypertextual Web search engine. Proc 7th Int’l World Wide Web Conf. 1998;107–117.

91. Brown A, Patterson DA. Towards maintainability, availability, and growth benchmarks: A case study of software RAID systems. Proc 2000 USENIX Annual Technical Conf. 2000.

92. Bucher IY, Hayes AH. I/O performance measurement on Cray-1 and CDC 7000 computers. Proc Computer Performance Evaluation Users Group, 16th Meeting 1980;245–254.

93. Bucher IY. The computational speed of supercomputers. Proc Int’l Conf on Measuring and Modeling of Computer Systems (SIGMETRICS 1983) 1983;151–165.

94. Buchholz W. Planning a Computer System: Project Stretch New York: McGraw-Hill; 1962.

95. Burgess N, Williams T. Choices of operand truncation in the SRT division algorithm. IEEE Trans on Computers. 1995;44(7):933–938.

96. Burkhardt III, H., S. Frank, B. Knobe, J. Rothnie [1992]. Overview of the KSR1 Computer System, Tech. Rep. KSR-TR-9202001, Kendall Square Research, Boston, Mass.

97. Burks AW, Goldstine HH, von Neumann J. Preliminary discussion of the logical design of an electronic computing instrument. Report to the U.S. Army Ordnance Department, p. 1. Also appears in: Aspray W, Burks A, eds. Papers of John von Neumann. Cambridge, Mass.: MIT Press, and Los Angeles, Calif.: Tomash Publishers; 1987;97–146.

98. Calder B, Reinman G, Tullsen DM. Selective value prediction. Proc 26th Annual Int’l Symposium on Computer Architecture (ISCA) 1999.

99. Calder B, Grunwald D, Jones M, et al. Evidence-based static branch prediction using machine learning. ACM Trans Program Lang Syst. 1997;19(1):188–222.

100. Callahan D, Dongarra J, Levine D. Vectorizing compilers: A test suite and results. Proc ACM/IEEE Conf on Supercomputing 1988;98–105.

101. Cantin JF, Hill MD. Cache Performance for Selected SPEC CPU2000 Benchmarks. www.jfred.org/cache-data.html; 2001; (June).

102. Cantin JF, Hill MD. Cache Performance for SPEC CPU2000 Benchmarks, Version 3.0. In: www.cs.wisc.edu/multifacet/misc/spec2000cache-data/index.html; 2003.

103. Carles S. Amazon reports record Xmas season, top game picks. Gamasutra, December 27 http://www.gamasutra.com/php-bin/news_index.php?story=7630; 2005.

104. Carter J, Rajamani K. Designing energy-efficient servers and data centers. IEEE Computer. 2010;43(7):76–78 (July).

105. Case RP, Padegs A. The architecture of the IBM System/370. Communications of the ACM. 1978;21(1):73–96. Also appears in: Siewiorek DP, Bell CG, Newell A. Computer Structures: Principles and Examples New York: McGraw-Hill; 1978; 830–855.

106. Censier L, Feautrier P. A new solution to coherence problems in multicache systems. IEEE Trans on Computers. 1978;C-27(12):1112–1118 (December).

107. Chandra R, Devine S, Verghese B, Gupta A, Rosenblum M. Scheduling and page migration for multiprocessor compute servers. Sixth Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1994;12–24.

108. Chang F, Dean J, Ghemawat S, et al. Bigtable: A distributed storage system for structured data. Proc 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’06) 2006.

109. Chang J, Meza J, Ranganathan P, Bash C, Shah A. Green server design: Beyond operational energy to sustainability. Proc Workshop on Power Aware Computing and Systems (HotPower ’10) 2010.

110. Chang PP, Mahlke SA, Chen WY, Warter NJ, Hwu WW. IMPACT: An architectural framework for multiple-instruction-issue processors. 18th Annual Int’l Symposium on Computer Architecture (ISCA) 1991;266–275.

111. Charlesworth AE. An approach to scientific array processing: The architecture design of the AP-120B/FPS-164 family. Computer. 1981;14(9):18–27 (September).

112. Charlesworth A. Starfire: Extending the SMP envelope. IEEE Micro. 1998;18(1):39–49 (January/February).

113. Chen PM, Lee EK. Striping in a RAID level 5 disk array. Proc ACM SIGMETRICS Conf on Measurement and Modeling of Computer Systems 1995;136–145.

114. Chen PM, Gibson GA, Katz RH, Patterson DA. An evaluation of redundant arrays of inexpensive disks using an Amdahl 5890. Proc ACM SIGMETRICS Conf on Measurement and Modeling of Computer Systems 1990.

115. Chen PM, Lee EK, Gibson GA, Katz RH, Patterson DA. RAID: High-performance, reliable secondary storage. ACM Computing Surveys. 1994;26(2):145–188 (June).

116. Chen S. Large-scale and high-speed multiprocessor system for scientific applications. Proc NATO Advanced Research Workshop on High-Speed Computing 1983. Also appears in: Hwang K, ed. Superprocessors: Design and applications. IEEE; 1983;602–609 (August).

117. Chen TC. Overlap and parallel processing. In: Stone H, ed. Introduction to Computer Architecture. Chicago: Science Research Associates; 1980;427–486.

118. Chow, F. C. [1983]. “A Portable Machine-Independent Global Optimizer—Design and Measurements,” Ph.D. thesis, Stanford University, Palo Alto, Calif.

119. Chrysos GZ, Emer JS. Memory dependence prediction using store sets. Proc 25th Annual Int’l Symposium on Computer Architecture (ISCA) 1998;142–153.

120. Clark B, Deshane T, Dow E, et al. Xen and the art of repeated research. Proc USENIX Annual Technical Conf. 2004;135–144.

121. Clark DW. Cache performance of the VAX-11/780. ACM Trans on Computer Systems. 1983;1(1):24–37.

122. Clark DW. Pipelining and performance in the VAX 8800 processor. Proc Second Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1987;173–177.

123. Clark DW, Emer JS. Performance of the VAX-11/780 translation buffer: Simulation and measurement. ACM Trans on Computer Systems. 1985;3(1):31–62 (February).

124. Clark D, Levy H. Measurement and analysis of instruction set use in the VAX-11/780. Proc Ninth Annual Int’l Symposium on Computer Architecture (ISCA) 1982;9–17.

125. Clark D, Strecker WD. Comments on ‘the case for the reduced instruction set computer’. Computer Architecture News. 1980;8(6):34–38 (October).

126. Clark WA. The Lincoln TX-2 computer development. Proc Western Joint Computer Conference 1957;143–145.

127. Clidaras J, Johnson C, Felderman B. Private communication 2010.

128. Climate Savers Computing Initiative. Efficiency Specs. In: http://www.climatesaverscomputing.org/; 2007.

129. Clos C. A study of non-blocking switching networks. Bell Systems Technical Journal. 1953;32:406–424 (March).

130. Cody WJ, Coonen JT, Gay DM, et al. A proposed radix- and word-length-independent standard for floating-point arithmetic. IEEE Micro. 1984;4(4):86–100.

131. Colwell RP, Steck R. A 0.6 μm BiCMOS processor with dynamic execution. Proc of IEEE Int’l Symposium on Solid State Circuits (ISSCC) 1995;176–177.

132. Colwell RP, Nix RP, O’Donnell JJ, Papworth DB, Rodman PK. A VLIW architecture for a trace scheduling compiler. Proc Second Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1987;180–192.

133. Comer D. Internetworking with TCP/IP 2nd ed. Englewood Cliffs, N.J.: Prentice Hall; 1993.

134. Compaq Computer Corporation. [1999]. Compiler Writer’s Guide for the Alpha 21264, Order Number EC-RJ66A-TE, June, www1.support.compaq.com/alpha-tools/documentation/current/21264_EV67/ec-rj66a-te_comp_writ_gde_for_alpha21264.pdf.

135. Conti C, Gibson DH, Pitkowsky SH. Structural aspects of the System/360 Model 85 Part I General organization. IBM Systems J. 1968;7(1):2–14.

136. Coonen J. [1984]. “Contributions to a Proposed Standard for Binary Floating-Point Arithmetic,” Ph.D. thesis, University of California, Berkeley.

137. Corbett P, English B, Goel A, et al. Row-diagonal parity for double disk failure correction. Proc 3rd USENIX Conf on File and Storage Technology (FAST ’04) 2004.

138. Crawford J, Gelsinger P. Programming the 80386 Alameda, Calif.: Sybex Books; 1988.

139. Culler DE, Singh JP, Gupta A. Parallel Computer Architecture: A Hardware/Software Approach San Francisco: Morgan Kaufmann; 1999.

140. Curnow HJ, Wichmann BA. A synthetic benchmark. The Computer J. 1976;19(1):43–49.

141. Cvetanovic Z, Kessler RE. Performance analysis of the Alpha 21264-based Compaq ES40 system. Proc 27th Annual Int’l Symposium on Computer Architecture (ISCA) 2000;192–202.

142. Dally WJ. Performance analysis of k-ary n-cube interconnection networks. IEEE Trans on Computers. 1990;39(6):775–785 (June).

143. Dally WJ. Virtual channel flow control. IEEE Trans on Parallel and Distributed Systems. 1992;3(2):194–205 (March).

144. Dally WJ. Interconnect limited VLSI architecture. Proc of the International Interconnect Technology Conference 1999.

145. Dally WJ, Seitz CI. The torus routing chip. Distributed Computing. 1986;1(4):187–196.

146. Dally WJ, Towles B. Route packets, not wires: On-chip interconnection networks. Proc 38th Design Automation Conference 2001.

147. Dally WJ, Towles B. Principles and Practices of Interconnection Networks San Francisco: Morgan Kaufmann; 2003.

148. Darcy JD, Gay D. FLECKmarks: Measuring floating point performance using a full IEEE compliant arithmetic benchmark. CS 252 class project. Berkeley: University of California; 1996; see HTTP.CS.Berkeley.EDU/~darcy/Projects/cs252/.

149. Darley, H. M. et al. [1989]. “Floating Point/Integer Processor with Divide and Square Root Functions,” U.S. Patent 4,878,190, October 31.

150. Davidson ES. The design and control of pipelined function generators. Proc IEEE Conf on Systems, Networks, and Computers 1971;19–21.

151. Davidson ES, Thomas AT, Shar LE, Patel JH. Effective control for pipelined processors. Proc IEEE COMPCON 1975;181–184.

152. Davie BS, Peterson LL, Clark D. Computer Networks: A Systems Approach 2nd ed. San Francisco: Morgan Kaufmann; 1999.

153. Dean J. Designs, lessons and advice from building large distributed systems [keynote address]. Proc 3rd ACM SIGOPS Int’l Workshop on Large-Scale Distributed Systems and Middleware, Co-located with the 22nd ACM Symposium on Operating Systems Principles 2009.

154. Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters. Proc Operating Systems Design and Implementation (OSDI) 2004;137–150.

155. Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters. Communications of the ACM. 2008;51(1):107–113.

156. DeCandia G, Hastorun D, Jampani M, et al. Dynamo: Amazon’s highly available key-value store. Proc 21st ACM Symposium on Operating Systems Principles 2007.

157. Dehnert JC, Hsu PY-T, Bratt JP. Overlapped loop support on the Cydra 5. Proc Third Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1989;26–39.

158. Demmel JW, Li X. Faster numerical algorithms via exception handling. IEEE Trans on Computers. 1994;43(8):983–992.

159. Denehy TE, Bent J, Popovici FI, Arpaci-Dusseau AC, Arpaci-Dusseau RH. Deconstructing storage arrays. Proc 11th Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2004;59–71.

160. Desurvire E. Lightwave communications: The fifth generation. Scientific American (International Edition). 1992;266(1):96–103 (January).

161. Diep TA, Nelson C, Shen JP. Performance evaluation of the PowerPC 620 microarchitecture. Proc 22nd Annual Int’l Symposium on Computer Architecture (ISCA) 1995.

162. Digital Semiconductor. Alpha Architecture Handbook, Version 3 Maynard, Mass: Digital Press; 1996.

163. Ditzel DR, McLellan HR. Branch folding in the CRISP microprocessor: Reducing the branch delay to zero. Proc 14th Annual Int’l Symposium on Computer Architecture (ISCA) 1987;2–7.

164. Ditzel DR, Patterson DA. Retrospective on high-level language computer architecture. Proc Seventh Annual Int’l Symposium on Computer Architecture (ISCA) 1980;97–104.

165. Doherty WJ, Kelisky RP. Managing VM/CMS systems for user effectiveness. IBM Systems J. 1979;18(1):143–166.

166. Dongarra JJ. A survey of high performance processors. Proc IEEE COMPCON 1986;8–11.

167. Dongarra J, Sterling T, Simon H, Strohmaier E. High-performance computing: Clusters, constellations, MPPs, and future directions. Computing in Science & Engineering. 2005;7(2):51–59 (March/April).

168. Douceur JR, Bolosky WJ. A large scale study of file-system contents. Proc ACM SIGMETRICS Conf on Measurement and Modeling of Computer Systems 1999;59–69.

169. Douglas J. [2005]. “Intel 8xx series and Paxville Xeon-MP microprocessors,” paper presented at Hot Chips 17, August 14–16, 2005, Stanford University, Palo Alto, Calif.

170. Duato J. A new theory of deadlock-free adaptive routing in wormhole networks. IEEE Trans on Parallel and Distributed Systems. 1993;4(12):1320–1331 (December).

171. Duato J, Pinkston TM. A general theory for deadlock-free adaptive routing using a mixed set of resources. IEEE Trans on Parallel and Distributed Systems. 2001;12(12):1219–1235 (December).

172. Duato J, Yalamanchili S, Ni L. Interconnection Networks: An Engineering Approach 2nd printing San Francisco: Morgan Kaufmann; 2003.

173. Duato J, Johnson I, Flich J, Naven F, Garcia P, Nachiondo T. A new scalable and cost-effective congestion management strategy for lossless multistage interconnection networks. Proc 11th Int’l Symposium on High-Performance Computer Architecture 2005.

174. Duato J, Lysne O, Pang R, Pinkston TM. Part I: A theory for deadlock-free dynamic reconfiguration of interconnection networks. IEEE Trans on Parallel and Distributed Systems. 2005;16(5):412–427 (May).

175. Dubois M, Scheurich C, Briggs F. Synchronization, coherence, and event ordering. IEEE Computer. 1988;21(2):9–21 (February).

176. Dunigan W, Vetter K, White K, Worley P. Performance evaluation of the Cray X1 distributed shared memory architecture. IEEE Micro. 2005;30–40 (January/February).

177. Eden A, Mudge T. The YAGS branch prediction scheme. Proc of the 31st Annual ACM/IEEE Int’l Symposium on Microarchitecture 1998;69–80.

178. Edmondson JH, Rubinfield PI, Preston R, Rajagopalan V. Superscalar instruction execution in the 21164 Alpha microprocessor. IEEE Micro. 1995;15(2):33–43.

179. Eggers, S. [1989]. “Simulation Analysis of Data Sharing in Shared Memory Multiprocessors,” Ph.D. thesis, University of California, Berkeley.

180. Elder J, Gottlieb A, Kruskal CK, et al. Issues related to MIMD shared-memory computers: The NYU Ultracomputer approach. Proc 12th Annual Int’l Symposium on Computer Architecture (ISCA) 1985;126–135.

181. Ellis JR. Bulldog: A Compiler for VLIW Architectures Cambridge, Mass: MIT Press; 1986.

182. Emer JS, Clark DW. A characterization of processor performance in the VAX-11/780. Proc 11th Annual Int’l Symposium on Computer Architecture (ISCA) 1984;301–310.

183. Enriquez P. What happened to my dial tone? A study of FCC service disruption reports. poster, Richard Tapia Symposium on the Celebration of Diversity in Computing 2001.

184. Erlichson A, Nuckolls N, Chesson G, Hennessy JL. SoftFLASH: Analyzing the performance of clustered distributed virtual shared memory. Proc Seventh Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1996;210–220.

185. Esmaeilzadeh H, Cao T, Xi Y, Blackburn SM, McKinley KS. Looking Back on the Language and Hardware Revolution: Measured Power, Performance, and Scaling. Proc 16th Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2011.

186. Evers M, Patel SJ, Chappell RS, Patt YN. An analysis of correlation and predictability: What makes two-level branch predictors work. Proc 25th Annual Int’l Symposium on Computer Architecture (ISCA) 1998;52–61.

187. Fabry RS. Capability based addressing. Communications of the ACM. 1974;17(7):403–412 (July).

188. Falsafi B, Wood DA. Reactive NUMA: A design for unifying S-COMA and CC-NUMA. Proc 24th Annual Int’l Symposium on Computer Architecture (ISCA) 1997;229–240.

189. Fan X, Weber W, Barroso LA. Power provisioning for a warehouse-sized computer. Proc 34th Annual Int’l Symposium on Computer Architecture (ISCA) 2007.

190. Farkas KI, Jouppi NP. Complexity/performance trade-offs with non-blocking loads. Proc 21st Annual Int’l Symposium on Computer Architecture (ISCA) 1994.

191. Farkas KI, Jouppi NP, Chow P. How useful are non-blocking loads, stream buffers and speculative execution in multiple issue processors? Proc First IEEE Symposium on High-Performance Computer Architecture 1995;78–89.

192. Farkas KI, Chow P, Jouppi NP, Vranesic Z. Memory-system design considerations for dynamically-scheduled processors. Proc 24th Annual Int’l Symposium on Computer Architecture (ISCA) 1997;133–143.

193. Fazio D. It’s really much more fun building a supercomputer than it is simply inventing one. Proc IEEE COMPCON 1987;102–105.

194. Fisher JA. Trace scheduling: A technique for global microcode compaction. IEEE Trans on Computers. 1981;30(7):478–490 (July).

195. Fisher JA. Very long instruction word architectures and ELI-512. 10th Annual Int’l Symposium on Computer Architecture (ISCA) 1982;140–150.

196. Fisher JA, Freudenberger SM. Predicting conditional branches from previous runs of a program. Proc Fifth Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1992;85–95.

197. Fisher JA, Rau BR. Journal of Supercomputing 1993; January (special issue).

198. Fisher JA, Ellis JR, Ruttenberg JC, Nicolau A. Parallel processing: A smart compiler and a dumb processor. Proc SIGPLAN Conf on Compiler Construction 1984;11–16.

199. Fleming PJ, Wallace JJ. How not to lie with statistics: The correct way to summarize benchmark results. Communications of the ACM. 1986;29(3):218–221 (March).

200. Flynn MJ. Very high-speed computing systems. Proc IEEE. 1966;54(12):1901–1909 (December).

201. Forgie JW. The Lincoln TX-2 input-output system. Proc Western Joint Computer Conference 1957;156–160 (February).

202. Foster CC, Riseman EM. Percolation of code to enhance parallel dispatching and execution. IEEE Trans on Computers. 1972;C-21(12):1411–1415 (December).

203. Frank SJ. Tightly coupled multiprocessor systems speed memory access time. Electronics. 1984;57(1):164–169 (January).

204. Freiman CV. Statistical analysis of certain binary division algorithms. Proc IRE. 1961;49(1):91–103.

205. Friesenborg SE, Wicks RJ. DASD Expectations: The 3380, 3380-23, and MVS/XA Gaithersburg, Md.: Tech. Bulletin GG22-9363-02, IBM Washington Systems Center; 1985.

206. Fuller SH, Burr WE. Measurement and evaluation of alternative computer architectures. Computer. 1977;10(10):24–35 (October).

207. Furber SB. ARM System Architecture Harlow, England: Addison-Wesley; 1996; see www.cs.man.ac.uk/amulet/publications/books/ARMsysArch.

208. Gagliardi UO. Report of workshop 4—software-related advances in computer hardware. Proc Symposium on the High Cost of Software 1973;99–120.

209. Gajski D, Kuck D, Lawrie D, Sameh A. CEDAR—a large scale multiprocessor. Proc Int’l Conf on Parallel Processing (ICPP) 1983;524–529.

210. Gallagher DM, Chen WY, Mahlke SA, Gyllenhaal JC, Hwu WW. Dynamic memory disambiguation using the memory conflict buffer. Proc Sixth Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1994;183–193.

211. Galles M. Scalable pipelined interconnect for distributed endpoint routing: The SGI SPIDER chip. Proc IEEE HOT Interconnects ’96 1996.

212. Game M, Booker A. CodePack code compression for PowerPC processors. MicroNews. 1999;5. In: www.chips.ibm.com/micronews/vol5_no1/codepack.html.

213. Gao QS. The Chinese remainder theorem and the prime memory system. 20th Annual Int’l Symposium on Computer Architecture (ISCA) 1993; (Computer Architecture News 21:2 (May), 337–340).

214. Gap. [2005]. “Gap Inc. Reports Third Quarter Earnings,” http://gapinc.com/public/documents/PR_Q405EarningsFeb2306.pdf.

215. Gap. [2006]. “Gap Inc. Reports Fourth Quarter and Full Year Earnings,” http://gapinc.com/public/documents/Q32005PressRelease_Final22.pdf.

216. Garner R, Agarwal A, Briggs F, et al. Scalable processor architecture (SPARC). Proc IEEE COMPCON 1988;278–283.

217. Gebis J, Patterson D. Embracing and extending 20th-century instruction set architectures. IEEE Computer. 2007;40(4):68–75 (April).

218. Gee JD, Hill MD, Pnevmatikatos DN, Smith AJ. Cache performance of the SPEC92 benchmark suite. IEEE Micro. 1993;13(4):17–27 (August).

219. Gehringer EF, Siewiorek DP, Segall Z. Parallel Processing: The Cm* Experience Bedford, Mass: Digital Press; 1987.

220. Gharachorloo K, Gupta A, Hennessy JL. Hiding memory latency using dynamic scheduling in shared-memory multiprocessors. Proc 19th Annual Int’l Symposium on Computer Architecture (ISCA) 1992.

221. Gharachorloo K, Lenoski D, Laudon J, Gibbons P, Gupta A, Hennessy JL. Memory consistency and event ordering in scalable shared-memory multiprocessors. Proc 17th Annual Int’l Symposium on Computer Architecture (ISCA) 1990;15–26.

222. Ghemawat S, Gobioff H, Leung S-T. The Google file system. Proc 19th ACM Symposium on Operating Systems Principles 2003.

223. Gibson DH. Considerations in block-oriented systems design. AFIPS Conf Proc. 1967;30:75–80.

224. Gibson GA. Redundant Disk Arrays: Reliable, Parallel Secondary Storage, ACM Distinguished Dissertation Series Cambridge, Mass: MIT Press; 1992.

225. Gibson J. C. [1970] “The Gibson mix,” Rep. TR. 00.2043, IBM Systems Development Division, Poughkeepsie, N.Y. (research done in 1959).

226. Gibson J, Kunz R, Ofelt D, Horowitz M, Hennessy J, Heinrich M. FLASH vs (simulated) FLASH: Closing the simulation loop. Proc Ninth Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2000;49–58.

227. Glass CJ, Ni LM. The Turn Model for adaptive routing. 19th Annual Int’l Symposium on Computer Architecture (ISCA) 1992.

228. Goldberg D. What every computer scientist should know about floating-point arithmetic. Computing Surveys. 1991;23(1):5–48.

229. Goldberg IB. 27 bits are not enough for 8-digit accuracy. Communications of the ACM. 1967;10(2):105–106.

230. Goldstein, S. [1987]. Storage Performance—An Eight Year Outlook, Tech. Rep. TR 03.308-1, IBM Santa Teresa Laboratory, San Jose, Calif.

231. Goldstine HH. The Computer: From Pascal to von Neumann Princeton, N.J.: Princeton University Press; 1972.

232. González J, González A. Limits of instruction level parallelism with data speculation. Proc Vector and Parallel Processing (VECPAR) Conf. 1998;585–598.

233. Goodman JR. Using cache memory to reduce processor memory traffic. Proc 10th Annual Int’l Symposium on Computer Architecture (ISCA) 1982;124–131.

234. Goralski W. SONET: A Guide to Synchronous Optical Network New York: McGraw-Hill; 1997.

235. Gosling JB. Design of Arithmetic Units for Digital Computers New York: Springer-Verlag; 1980.

236. Gray J. A census of Tandem system availability between 1985 and 1990. IEEE Trans on Reliability. 1990;39(4):409–418 (October).

237. Gray J. The Benchmark Handbook for Database and Transaction Processing Systems 2nd ed. San Francisco: Morgan Kaufmann; 1993.

238. Gray J. Sort benchmark home page. In: http://sortbenchmark.org/; 2006.

239. Gray J, Reuter A. Transaction Processing: Concepts and Techniques San Francisco: Morgan Kaufmann; 1993.

240. Gray J, Siewiorek DP. High-availability computer systems. Computer. 1991;24(9):39–48 (September).

241. Gray J, van Ingen C. Empirical Measurements of Disk Failure Rates and Error Rates Redmond, Wash: MSR-TR-2005-166, Microsoft Research; 2005.

242. Greenberg A, Jain N, Kandula S, et al. VL2: A Scalable and Flexible Data Center Network. In: Proc ACM SIGCOMM. 2009.

243. Grice C, Kanellos M. Cell phone industry at crossroads: Go high or low? CNET News 2000; August 31; technews.netscape.com/news/0-1004-201-2518386-0.html?tag=st.ne.1002.tgif.sf.

244. Groe JB, Larson LE. CDMA Mobile Radio Design Boston: Artech House; 2000.

245. Gunther KD. Prevention of deadlocks in packet-switched data transport systems. IEEE Trans on Communications. 1981;COM-29(4):512–524 (April).

246. Hagersten E, Koster M. WildFire: A scalable path for SMPs. Proc Fifth Int’l Symposium on High-Performance Computer Architecture 1998.

247. Hagersten E, Landin A, Haridi S. DDM—a cache-only memory architecture. IEEE Computer. 1992;25(9):44–54 (September).

248. Hamacher VC, Vranesic ZG, Zaky SG. Computer Organization 2nd ed. New York: McGraw-Hill; 1984.

249. Hamilton J. [2009]. “Data center networks are in my way,” paper presented at the Stanford Clean Slate CTO Summit, October 23, 2009 (http://mvdirona.com/jrh/TalksAndPapers/JamesHamilton_CleanSlateCTO2009.pdf).

250. Hamilton J. [2010]. “Cloud computing economies of scale,” paper presented at the AWS Workshop on Genomics and Cloud Computing, June 8, 2010, Seattle, Wash. (http://mvdirona.com/jrh/TalksAndPapers/JamesHamilton_GenomicsCloud20100608.pdf).

251. Handy J. The Cache Memory Book Boston: Academic Press; 1993.

252. Hauck EA, Dent BA. Burroughs’ B6500/B7500 stack mechanism. Proc AFIPS Spring Joint Computer Conf. 1968;245–251.

253. Heald R, Aingaran K, Amir C, et al. Implementation of third-generation SPARC V9 64-b microprocessor. ISSCC Digest of Technical Papers 2000;412–413 and slide supplement.

254. Heinrich J. MIPS R4000 User’s Manual Englewood Cliffs, N.J.: Prentice Hall; 1993.

255. Henly, M., and B. McNutt [1989]. “DASD I/O Characteristics: A Comparison of MVS to VM,” Tech. Rep. TR 02.1550 (May), IBM General Products Division, San Jose, Calif.

256. Hennessy J. VLSI processor architecture. IEEE Trans on Computers. 1984;C-33(11):1221–1246 (December).

257. Hennessy J. VLSI RISC processors. VLSI Systems Design. 1985;6(10):22–32 (October).

258. Hennessy J, Jouppi N, Baskett F, Gill J. MIPS: A VLSI processor architecture. In: CMU Conference on VLSI Systems and Computations. Rockville, Md.: Computer Science Press; 1981.

259. Hewlett-Packard. PA-RISC 2.0 Architecture Reference Manual 3rd ed. Palo Alto, Calif: Hewlett-Packard; 1994.

260. Hewlett-Packard. HP’s ‘5NINES:5MINUTES’ Vision Extends Leadership and Redefines High Availability in Mission-Critical Environments. February 10 www.future.enterprisecomputing.hp.com/ia64/news/5nines_vision_pr.html; 1998.

261. Hill, M. D. [1987]. “Aspects of Cache Memory and Instruction Buffer Performance,” Ph.D. thesis, Tech. Rep. UCB/CSD 87/381, Computer Science Division, University of California, Berkeley.

262. Hill MD. A case for direct mapped caches. Computer. 1988;21(12):25–40 (December).

263. Hill MD. Multiprocessors should support simple memory consistency models. IEEE Computer. 1998;31(8):28–34 (August).

264. Hillis WD. The Connection Machine Cambridge, Mass: MIT Press; 1985.

265. Hillis WD, Steele GL. Data parallel algorithms. Communications of the ACM. 1986;29(12):1170–1183 (December); http://doi.acm.org/10.1145/7902.7903.

266. Hinton G, Sager D, Upton M, et al. The microarchitecture of the Pentium 4 processor. Intel Technology Journal 2001; February.

267. Hintz RG, Tate DP. Control data STAR-100 processor design. Proc IEEE COMPCON 1972;1–4.

268. Hirata H, Kimura K, Nagamine S, et al. An elementary processor architecture with simultaneous instruction issuing from multiple threads. Proc 19th Annual Int’l Symposium on Computer Architecture (ISCA) 1992;136–145.

269. Hitachi. SuperH RISC Engine SH7700 Series Programming Manual Santa Clara, Calif: Hitachi; 1997; see www.halsp.hitachi.com/tech_prod/ and search for title.

270. Ho R, Mai KW, Horowitz MA. The future of wires. Proc of the IEEE. 2001;89(4):490–504 (April).

271. Hoagland AS. Digital Magnetic Recording New York: Wiley; 1963.

272. Hockney RW, Jesshope CR. Parallel Computers 2: Architectures, Programming and Algorithms Bristol, England: Adam Hilger, Ltd.; 1988.

273. Holland JH. A universal computer capable of executing an arbitrary number of subprograms simultaneously. Proc East Joint Computer Conf. 1959;16:108–113.

274. Holt RC. Some deadlock properties of computer systems. ACM Computer Surveys. 1972;4(3):179–196 (September).

275. Hopkins M. [2000]. “A critical look at IA-64: Massive resources, massive ILP, but can it deliver?” Microprocessor Report, February.

276. Hord RM. The Illiac-IV, The First Supercomputer Rockville, Md: Computer Science Press; 1982.

277. Horel T, Lauterbach G. UltraSPARC-III: Designing third-generation 64-bit performance. IEEE Micro. 1999;19(3):73–85 (May–June).

278. Hospodor AD, Hoagland AS. The changing nature of disk controllers. Proc IEEE. 1993;81(4):586–594 (April).

279. Hölzle U. Brawny cores still beat wimpy cores, most of the time. IEEE Micro. 2010;30 (July/August).

280. Hristea C, Lenoski D, Keen J. Measuring memory hierarchy performance of cache-coherent multiprocessors using micro benchmarks. Proc ACM/IEEE Conf on Supercomputing 1997.

281. Hsu P. Designing the TFP microprocessor. IEEE Micro. 1994;14(2):23–33 (April).

282. Huck J. Introducing the IA-64 Architecture. IEEE Micro. 2000;20(5):12–23 (September–October).

283. Hughes CJ, Kaul P, Adve SV, Jain R, Park C, Srinivasan J. Variability in the execution of multimedia applications and implications for architecture. Proc 28th Annual Int’l Symposium on Computer Architecture (ISCA) 2001;254–265.

284. Hwang K. Computer Arithmetic: Principles, Architecture, and Design New York: Wiley; 1979.

285. Hwang K. Advanced Computer Architecture and Parallel Programming New York: McGraw-Hill; 1993.

286. Hwu W-M, Patt Y. HPSm, a high performance restricted data flow architecture having minimum functionality. Proc 13th Annual Int’l Symposium on Computer Architecture (ISCA) 1986;297–307.

287. Hwu WW, Mahlke SA, Chen WY, et al. The superblock: An effective technique for VLIW and superscalar compilation. J Supercomputing. 1993;7(1–2):229–248 (March).

288. IBM. The Economic Value of Rapid Response Time White Plains, N.Y.: GE20-0752-0, IBM; 1982; 11–82.

289. IBM. [1990]. “The IBM RISC System/6000 processor” (collection of papers), IBM J. Research and Development 34:1 (January).

290. IBM. The PowerPC Architecture San Francisco: Morgan Kaufmann; 1994.

291. IBM. Blue Gene. IBM J Research and Development. 2005;49 (special issue).

292. IEEE. IEEE standard for binary floating-point arithmetic. SIGPLAN Notices. 1985;22(2):9–25.

293. IEEE. Intel virtualization technology, computer. IEEE Computer Society. 2005;38(5):48–56 (May).

294. IEEE. 754-2008 Working Group. DRAFT Standard for Floating-Point Arithmetic 754-2008. In: http://dx.doi.org/10.1109/IEEESTD.2008.4610935; 2006.

295. Imprimis Product Specification, 97209 Sabre Disk Drive IPI-2 Interface 1.2 GB, Document No. 64402302, Imprimis, Dallas, Tex.

296. InfiniBand Trade Association. [2001]. InfiniBand Architecture Specifications Release 1.0.a, www.infinibandta.org.

297. Intel. Using MMX Instructions to Convert RGB to YUV Color Conversion. In: cedar.intel.com/cgi-bin/ids.dll/content/content.jsp?cntKey=Legacy::irtm_AP548_9996&cntType=IDS_EDITORIAL; 2001.

298. Internet Retailer. The Gap launches a new site—after two weeks of downtime. Internet Retailer 2005; September 28; http://www.internetretailer.com/2005/09/28/the-gap-launches-a-new-site-after-two-weeks-of-downtime.

299. Jain R. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling New York: Wiley; 1991.

300. Jantsch A, Tenhunen H, eds. Networks on Chips. The Netherlands: Kluwer Academic Publishers; 2003.

301. Jimenez DA, Lin C. Neural methods for dynamic branch prediction. ACM Trans on Computer Systems. 2002;20(4):369–397 (November).

302. Johnson M. Superscalar Microprocessor Design Englewood Cliffs, N.J.: Prentice Hall; 1990.

303. Jordan HF. Performance measurements on HEP—a pipelined MIMD computer. Proc 10th Annual Int’l Symposium on Computer Architecture (ISCA) 1982;207–212.

304. Jordan KE. Performance comparison of large-scale scientific processors: Scalar mainframes, mainframes with vector facilities, and supercomputers. Computer. 1987;20(3):10–23 (March).

305. Jouppi NP. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. Proc 17th Annual Int’l Symposium on Computer Architecture (ISCA) 1990;364–373.

306. Jouppi NP. Retrospective: Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. 25 Years of the International Symposia on Computer Architecture (Selected Papers) 1998;71–73.

307. Jouppi NP, Wall DW. Available instruction-level parallelism for super-scalar and superpipelined processors. Proc Third Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1989;272–282.

308. Jouppi NP, Wilton SJE. Trade-offs in two-level on-chip caching. Proc 21st Annual Int’l Symposium on Computer Architecture (ISCA) 1994;34–45.

309. Kaeli DR, Emma PG. Branch history table prediction of moving target branches due to subroutine returns. Proc 18th Annual Int’l Symposium on Computer Architecture (ISCA) 1991;34–42.

310. Kahan W. [1990]. “On the advantage of the 8087’s stack,” unpublished course notes, Computer Science Division, University of California, Berkeley.

311. Kahan W. 7094-II system support for numerical analysis. SHARE Secretarial Distribution SSD-159 University of Toronto: Department of Computer Science; 1968.

312. Kahaner DK. Benchmarks for ‘real’ programs. SIAM News 1988; November.

313. Kahn RE. Resource-sharing computer communication networks. Proc IEEE. 1972;60(11):1397–1407 (November).

314. Kane G. MIPS R2000 RISC Architecture Englewood Cliffs, N.J.: Prentice Hall; 1986.

315. Kane G. PA-RISC 2.0 Architecture Upper Saddle River, N.J: Prentice Hall; 1996.

316. Kane G, Heinrich J. MIPS RISC Architecture Englewood Cliffs, N.J: Prentice Hall; 1992.

317. Katz RH, Patterson DA, Gibson GA. Disk system architectures for high performance computing. Proc IEEE. 1989;77(12):1842–1858 (December).

318. Keckler SW, Dally WJ. Processor coupling: Integrating compile time and runtime scheduling for parallelism. Proc 19th Annual Int’l Symposium on Computer Architecture (ISCA) 1992;202–213.

319. Keller RM. Look-ahead processors. ACM Computing Surveys. 1975;7(4):177–195 (December).

320. Keltcher CN, McGrath KJ, Ahmed A, Conway P. The AMD Opteron processor for multiprocessor servers. IEEE Micro. 2003;23(2):66–76 (March–April); dx.doi.org/10.1109.MM.2003.119116.

321. Kembel R. Fibre Channel: A comprehensive introduction. Internet Week 2000; April.

322. Kermani P, Kleinrock L. Virtual Cut-Through: A New Computer Communication Switching Technique. Computer Networks. 1979;3:267–286 (January).

323. Kessler R. The Alpha 21264 microprocessor. IEEE Micro. 1999;19(2):24–36 (March/April).

324. Kilburn T, Edwards DBG, Lanigan MJ, Sumner FH. One-level storage system. IRE Trans on Electronic Computers. 1962;EC-11:223–235 (April). Also appears in: Siewiorek DP, Bell CG, Newell A. Computer Structures: Principles and Examples New York: McGraw-Hill; 1982; 135–148.

325. Killian E. MIPS R4000 technical overview–64 bits/100 MHz or bust. Hot Chips III Symposium Record 1991;1.6–1.19.

326. Kim MY. Synchronized disk interleaving. IEEE Trans on Computers. 1986;C-35(11):978–988 (November).

327. Kissell KD. MIPS16: High-density for the embedded market. Proc Real Time Systems ’97 1997; see www.sgi.com/MIPS/arch/MIPS16/MIPS16.whitepaper.pdf.

328. Kitagawa K, Tagaya S, Hagihara Y, Kanoh Y. A hardware overview of SX-6 and SX-7 supercomputer. NEC Research & Development J. 2003;44(1):2–7 (January).

329. Knuth D. The Art of Computer Programming. Vol. II 2nd ed. Reading, Mass: Addison-Wesley; 1981.

330. Kogge PM. The Architecture of Pipelined Computers New York: McGraw-Hill; 1981.

331. Kohn L, Fu S-W. A 1,000,000 transistor microprocessor. Proc of IEEE Int’l Symposium on Solid State Circuits (ISSCC) 1989;54–55.

332. Kohn L, Margulis N. Introducing the Intel i860 64-Bit Microprocessor. IEEE Micro. 1989;9(4):15–30 (July).

333. Kontothanassis L, Hunt G, Stets R, et al. VM-based shared memory on low-latency, remote-memory-access networks. Proc 24th Annual Int’l Symposium on Computer Architecture (ISCA) 1997.

334. Koren I. Computer Arithmetic Algorithms Englewood Cliffs, N.J: Prentice Hall; 1989.

335. Kozyrakis C. [2000]. “Vector IRAM: A media-oriented vector processor with embedded DRAM,” paper presented at Hot Chips 12, August 13–15, 2000, Palo Alto, Calif.

336. Kozyrakis C, Patterson D. Vector vs superscalar and VLIW architectures for embedded multimedia benchmarks. Proc 35th Annual Int’l Symposium on Microarchitecture (MICRO-35) 2002.

337. Kroft D. Lockup-free instruction fetch/prefetch cache organization. Proc Eighth Annual Int’l Symposium on Computer Architecture (ISCA) 1981;81–87.

338. Kroft D. Retrospective: Lockup-free instruction fetch/prefetch cache organization. 25 Years of the International Symposia on Computer Architecture 1998;20–21 (Selected Papers).

339. Kuck D, Budnik PP, Chen S-C, et al. Measurements of parallelism in ordinary FORTRAN programs. Computer. 1974;7(1):37–46 (January).

340. Kuhn DR. Sources of failure in the public switched telephone network. IEEE Computer. 1997;30(4):31–36 (April).

341. Kumar A. The HP PA-8000 RISC CPU. IEEE Micro. 1997;17(2):27–32 (March/April).

342. Kunimatsu A, Ide N, Sato T, et al. Vector unit architecture for emotion synthesis. IEEE Micro. 2000;20(2):40–47 (March–April).

343. Kunkel SR, Smith JE. Optimal pipelining in supercomputers. Proc 13th Annual Int’l Symposium on Computer Architecture (ISCA) 1986;404–414.

344. Kurose JF, Ross KW. Computer Networking: A Top-Down Approach Featuring the Internet Boston: Addison-Wesley; 2001.

345. Kuskin J, Ofelt D, Heinrich M, et al. The Stanford FLASH multiprocessor. Proc 21st Annual Int’l Symposium on Computer Architecture (ISCA) 1994.

346. Lam M. Software pipelining: An effective scheduling technique for VLIW processors. SIGPLAN Conf on Programming Language Design and Implementation 1988;318–328.

347. Lam MS, Wilson RP. Limits of control flow on parallelism. Proc 19th Annual Int’l Symposium on Computer Architecture (ISCA) 1992;46–57.

348. Lam MS, Rothberg EE, Wolf ME. The cache performance and optimizations of blocked algorithms. Proc Fourth Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1991;63–74 (SIGPLAN Notices 26:4 (April)).

349. Lambright D. Experiences in measuring the reliability of a cache-based storage system. Proc of First Workshop on Industrial Experiences with Systems Software (WIESS 2000), Co-Located with the 4th Symposium on Operating Systems Design and Implementation (OSDI) 2000.

350. Lamport L. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans on Computers. 1979;C-28(9):241–248 (September).

351. Lang W, Patel JM, Shankar S. Wimpy node clusters: What about non-wimpy workloads? Proc Sixth International Workshop on Data Management on New Hardware (DaMoN) 2010.

352. Laprie J-C. Dependable computing and fault tolerance: Concepts and terminology. Proc 15th Annual Int’l Symposium on Fault-Tolerant Computing 1985;2–11.

353. Larson, E. R. [1973] “Findings of fact, conclusions of law, and order for judgment,” File No. 4-67, Civ. 138, Honeywell v. Sperry-Rand and Illinois Scientific Development, U.S. District Court for the State of Minnesota, Fourth Division (October 19).

354. Laudon J, Lenoski D. The SGI Origin: A ccNUMA highly scalable server. Proc 24th Annual Int’l Symposium on Computer Architecture (ISCA) 1997;241–251.

355. Laudon J, Gupta A, Horowitz M. Interleaving: A multithreading technique targeting multiprocessors and workstations. Proc Sixth Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1994;308–318.

356. Lauterbach G, Horel T. UltraSPARC-III: Designing third generation 64-bit performance. IEEE Micro. 1999;19 (May/June).

357. Lazowska ED, Zahorjan J, Graham GS, Sevcik KC. Quantitative System Performance: Computer System Analysis Using Queueing Network Models Englewood Cliffs, N.J.: Prentice Hall; 1984 (although out of print, it is available online at www.cs.washington.edu/homes/lazowska/qsp/).

358. Lebeck AR, Wood DA. Cache profiling and the SPEC benchmarks: A case study. Computer. 1994;27(10):15–26 (October).

359. Lee R. Precision architecture. Computer. 1989;22(1):78–91 (January).

360. Lee WV, et al. Debunking the 100X GPU vs CPU myth: An evaluation of throughput computing on CPU and GPU. Proc 37th Annual Int’l Symposium on Computer Architecture (ISCA) 2010.

361. Leighton FT. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes San Francisco: Morgan Kaufmann; 1992.

362. Leiner AL. System specifications for the DYSEAC. J ACM. 1954;1(2):57–81 (April).

363. Leiner AL, Alexander SN. System organization of the DYSEAC. IRE Trans of Electronic Computers. 1954;EC-3(1):1–10 (March).

364. Leiserson CE. Fat trees: Universal networks for hardware-efficient supercomputing. IEEE Trans on Computers. 1985;C-34(10):892–901 (October).

365. Lenoski D, Laudon J, Gharachorloo K, Gupta A, Hennessy JL. The Stanford DASH multiprocessor. Proc 17th Annual Int’l Symposium on Computer Architecture (ISCA) 1990;148–159.

366. Lenoski D, Laudon J, Gharachorloo K, et al. The Stanford DASH multiprocessor. IEEE Computer. 1992;25(3):63–79 (March).

367. Levy H, Eckhouse R. Computer Programming and Architecture: The VAX Boston: Digital Press; 1989.

368. Li K. IVY: A shared virtual memory system for parallel computing. Proc 1988 Int’l Conf on Parallel Processing University Park, Penn: Pennsylvania State University Press; 1988.

369. Li, S., K. Chen, J. B. Brockman, N. Jouppi [2011]. “Performance Impacts of Non-blocking Caches in Out-of-order Processors,” HP Labs Tech Report HPL-2011-65 (full text available at http://Library.hp.com/techpubs/2011/Hpl-2011-65.html).

370. Lim K, Ranganathan P, Chang J, Patel C, Mudge T, Reinhardt S. Understanding and designing new system architectures for emerging warehouse-computing environments. Proc 35th Annual Int’l Symposium on Computer Architecture (ISCA) 2008.

371. Lincoln NR. Technology and design trade offs in the creation of a modern supercomputer. IEEE Trans on Computers. 1982;C-31(5):363–376 (May).

372. Lindholm T, Yellin F. The Java Virtual Machine Specification 2nd ed. Reading, Mass: Addison-Wesley; 1999; (also available online at java.sun.com/docs/books/vmspec/).

373. Lipasti MH, Shen JP. Exceeding the dataflow limit via value prediction. Proc 29th Int’l Symposium on Microarchitecture 1996.

374. Lipasti MH, Wilkerson CB, Shen JP. Value locality and load value prediction. Proc Seventh Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1996;138–147.

375. Liptay JS. Structural aspects of the System/360 Model 85, Part II: The cache. IBM Systems J. 1968;7(1):15–21.

376. Lo J, Barroso L, Eggers S, Gharachorloo K, Levy H, Parekh S. An analysis of database workload performance on simultaneous multithreaded processors. Proc 25th Annual Int’l Symposium on Computer Architecture (ISCA) 1998;39–50.

377. Lo J, Eggers S, Emer J, Levy H, Stamm R, Tullsen D. Converting thread-level parallelism into instruction-level parallelism via simultaneous multithreading. ACM Trans on Computer Systems. 1997;15(2):322–354 (August).

378. Lovett T, Thakkar S. The Symmetry multiprocessor system. Proc 1988 Int’l Conf of Parallel Processing 1988;303–310.

379. Lubeck O, Moore J, Mendez R. A benchmark comparison of three supercomputers: Fujitsu VP-200, Hitachi S810/20, and Cray X-MP/2. Computer. 1985;18(12):10–24 (December).

380. Luk C-K, Mowry TC. Automatic compiler-inserted prefetching for pointer-based applications. IEEE Trans on Computers. 1999;48(2):134–141 (February).

381. Lunde A. Empirical evaluation of some features of instruction set processor architecture. Communications of the ACM. 1977;20(3):143–152 (March).

382. Luszczek, P., J. J. Dongarra, D. Koester, R. Rabenseifner, B. Lucas, J. Kepner, J. McCalpin, D. Bailey, D. Takahashi [2005]. “Introduction to the HPC challenge benchmark suite,” Lawrence Berkeley National Laboratory, Paper LBNL-57493 (April 25), repositories.cdlib.org/lbnl/LBNL-57493.

383. Maberly NC. Mastering Speed Reading New York: New American Library; 1966.

384. Magenheimer DJ, Peters L, Pettis KW, Zuras D. Integer multiplication and division on the HP precision architecture. IEEE Trans on Computers. 1988;37(8):980–990.

385. Mahlke SA, Chen WY, Hwu W-M, Rau BR, Schlansker MS. Sentinel scheduling for VLIW and superscalar processors. Proc Fifth Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1992;238–247.

386. Mahlke SA, Hank RE, McCormick JE, August DI, Hwu WW. A comparison of full and partial predicated execution support for ILP processors. Proc 22nd Annual Int’l Symposium on Computer Architecture (ISCA) 1995;138–149.

387. Major JB. Are queuing models within the grasp of the unwashed? Proc Int’l Conf on Management and Performance Evaluation of Computer Systems 1989;831–839.

388. Markstein PW. Computation of elementary functions on the IBM RISC System/6000 processor. IBM J Research and Development. 1990;34(1):111–119.

389. Mathis HM, Mericas AE, McCalpin JD, Eickemeyer RJ, Kunkel SR. Characterization of the multithreading (SMT) efficiency in Power5. IBM J Research and Development. 2005;49(4/5):555–564 (July/September).

390. McCalpin J. STREAM: Sustainable Memory Bandwidth in High Performance Computers. In: www.cs.virginia.edu/stream/; 2005.

391. McCalpin, J., D. Bailey, D. Takahashi [2005]. Introduction to the HPC Challenge Benchmark Suite, Paper LBNL-57493 Lawrence Berkeley National Laboratory, University of California, Berkeley, repositories.cdlib.org/lbnl/LBNL-57493.

392. McCormick, J., and A. Knies [2002]. “A brief analysis of the SPEC CPU2000 benchmarks on the Intel Itanium 2 processor,” paper presented at Hot Chips 14, August 18–20, 2002, Stanford University, Palo Alto, Calif.

393. McFarling S. Program optimization for instruction caches. Proc Third Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1989;183–191.

394. McFarling S. Combining Branch Predictors Palo Alto, Calif: WRL Technical Note TN-36, Digital Western Research Laboratory; 1993.

395. McFarling S, Hennessy J. Reducing the cost of branches. Proc 13th Annual Int’l Symposium on Computer Architecture (ISCA) 1986;396–403.

396. McGhan H, O’Connor M. PicoJava: A direct execution engine for Java bytecode. Computer. 1998;31(10):22–30 (October).

397. McKeeman WM. Language directed computer design. Proc AFIPS Fall Joint Computer Conf. 1967;413–417.

398. McMahon, F. M. [1986]. “The Livermore FORTRAN Kernels: A Computer Test of Numerical Performance Range,” Tech. Rep. UCRL-55745, Lawrence Livermore National Laboratory, University of California, Livermore.

399. McNairy C, Soltis D. Itanium 2 processor microarchitecture. IEEE Micro. 2003;23(2):44–55 (March–April).

400. Mead C, Conway L. Introduction to VLSI Systems Reading, Mass: Addison-Wesley; 1980.

401. Mellor-Crummey JM, Scott ML. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans on Computer Systems. 1991;9(1):21–65 (February).

402. Menabrea LF. Sketch of the analytical engine invented by Charles Babbage. Bibliothèque Universelle de Genève. 1842;82 (October).

403. Menon A, Renato Santos J, Turner Y, Janakiraman G, Zwaenepoel W. Diagnosing performance overheads in the xen virtual machine environment. Proc First ACM/USENIX Int’l Conf on Virtual Execution Environments 2005;13–23.

404. Merlin PM, Schweitzer PJ. Deadlock avoidance in store-and-forward networks Part I Store-and-forward deadlock. IEEE Trans on Communications. 1980;COM-28(3):345–354 (March).

405. Metcalfe RM. Computer/network interface design: Lessons from Arpanet and Ethernet. IEEE J on Selected Areas in Communications. 1993;11(2):173–180 (February).

406. Metcalfe RM, Boggs DR. Ethernet: Distributed packet switching for local computer networks. Communications of the ACM. 1976;19(7):395–404 (July).

407. Metropolis N, Howlett J, Rota GC, eds. A History of Computing in the Twentieth Century. New York: Academic Press; 1980.

408. Meyer RA, Seawright LH. A virtual machine time sharing system. IBM Systems J. 1970;9(3):199–218.

409. Meyers GJ. The evaluation of expressions in a storage-to-storage architecture. Computer Architecture News. 1978;7(3):20–23 (October).

410. Meyers GJ. Advances in Computer Architecture 2nd ed. New York: Wiley; 1982.

411. Micron. Calculating Memory System Power for DDR2. In: http://download.micron.com/pdf/pubs/designline/dl1Q04.pdf; 2004.

412. Micron. The Micron® System-Power Calculator. In: http://www.micron.com/systemcalc; 2006.

413. MIPS. MIPS16 Application Specific Extension Product Description. In: www.sgi.com/MIPS/arch/MIPS16/mips16.pdf; 1997.

414. Miranker GS, Rubenstein J, Sanguinetti J. Squeezing a Cray-class supercomputer into a single-user package. Proc IEEE COMPCON 1988;452–456.

415. Mitchell D. The Transputer: The time is now. Computer Design (RISC suppl.) 1989;40–41.

416. Mitsubishi. Mitsubishi 32-Bit Single Chip Microcomputer M32R Family Software Manual Cypress, Calif: Mitsubishi; 1996.

417. Miura K, Uchida K. FACOM vector processing system: VP100/200. Proc NATO Advanced Research Workshop on High-Speed Computing 1983; Also appears in: Hwang K, ed. Superprocessors: Design and applications. 1983;59–73. IEEE (August 1984).

418. Miya EN. Multiprocessor/distributed processing bibliography. Computer Architecture News. 1985;13(1):27–29.

419. Montoye RK, Hokenek E, Runyon SL. Design of the IBM RISC System/6000 floating-point execution. IBM J Research and Development. 1990;34(1):59–70.

420. Moore B, Padegs A, Smith R, Buchholz W. Concepts of the System/370 vector architecture. 14th Annual Int’l Symposium on Computer Architecture (ISCA) 1987;282–292.

421. Moore GE. Cramming more components onto integrated circuits. Electronics. 1965;38(8):114–117 (April 19).

422. Morse S, Ravenal B, Mazor S, Pohlman W. Intel microprocessors—8080 to 8086. Computer. 1980;13 (October).

423. Moshovos A, Sohi GS. Streamlining inter-operation memory communication via data dependence prediction. Proc 30th Annual Int’l Symposium on Microarchitecture 1997;235–245.

424. Moshovos A, Breach S, Vijaykumar TN, Sohi GS. Dynamic speculation and synchronization of data dependences. 24th Annual Int’l Symposium on Computer Architecture (ISCA) 1997.

425. Moussouris J, Crudele L, Freitas D, et al. A CMOS RISC processor with integrated system functions. Proc IEEE COMPCON 1986;191.

426. Mowry TC, Lam S, Gupta A. Design and evaluation of a compiler algorithm for prefetching. Proc Fifth Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1992;62–73.

427. MSN Money. Amazon Shares Tumble after Rally Fizzles. In: http://moneycentral.msn.com/content/CNBCTV/Articles/Dispatches/P133695.asp; 2005.

428. Muchnick SS. Optimizing compilers for SPARC. Sun Technology. 1988;1(3):64–77 (Summer).

429. Mueller M, Alves LC, Fischer W, Fair ML, Modi I. RAS strategy for IBM S/390 G5 and G6. IBM J Research and Development. 1999;43(5–6):875–888 (September–November).

430. Mukherjee SS, Weaver C, Emer JS, Reinhardt SK, Austin TM. Measuring architectural vulnerability factors. IEEE Micro. 2003;23(6):70–75.

431. Murphy B, Gent T. Measuring system and software reliability using an automated data collection process. Quality and Reliability Engineering International. 1995;11(5):341–353 (September–October).

432. Myer TH, Sutherland IE. On the design of display processors. Communications of the ACM. 1968;11(6):410–414 (June).

433. Narayanan D, Thereska E, Donnelly A, Elnikety S, Rowstron A. Migrating server storage to SSDs: Analysis of trade-offs. Proc 4th ACM European Conf on Computer Systems 2009.

434. National Research Council. The Evolution of Untethered Communications, Computer Science and Telecommunications Board Washington, D.C.: National Academy Press; 1997.

435. National Storage Industry Consortium. Tape Roadmap. In: www.nsic.org; 1998.

436. Nelson VP. Fault-tolerant computing: Fundamental concepts. Computer. 1990;23(7):19–25 (July).

437. Ngai T-F, Irwin MJ. Regular, area-time efficient carry-lookahead adders. Proc Seventh IEEE Symposium on Computer Arithmetic 1985;9–15.

438. Nicolau A, Fisher JA. Measuring the parallelism available for very long instruction word architectures. IEEE Trans on Computers. 1984;C-33(11):968–976 (November).

439. Nikhil RS, Papadopoulos GM, Arvind. *T: A multithreaded massively parallel architecture. Proc 19th Annual Int’l Symposium on Computer Architecture (ISCA) 1992;156–167.

440. Noordergraaf L, van der Pas R. Performance experiences on Sun’s WildFire prototype. Proc ACM/IEEE Conf on Supercomputing 1999.

441. Nyberg CR, Barclay T, Cvetanovic Z, Gray J, Lomet D. AlphaSort: A RISC machine sort. Proc ACM SIGMOD 1994.

442. Oka M, Suzuoki M. Designing and programming the emotion engine. IEEE Micro. 1999;19(6):20–28 (November–December).

443. Okada S, Okada S, Matsuda Y, Yamada T, Kobayashi A. System on a chip for digital still camera. IEEE Trans on Consumer Electronics. 1999;45(3):584–590 (August).

444. Oliker L, Canning A, Carter J, Shalf J, Ethier S. Scientific computations on modern parallel vector systems. Proc ACM/IEEE Conf on Supercomputing 2004;10.

445. Pabst T. Performance Showdown at 133 MHz FSB—The Best Platform for Coppermine. In: www6.tomshardware.com/mainboard/00q1/000302/; 2000.

446. Padua D, Wolfe M. Advanced compiler optimizations for supercomputers. Communications of the ACM. 1986;29(12):1184–1201 (December).

447. Palacharla S, Kessler RE. Evaluating stream buffers as a secondary cache replacement. Proc 21st Annual Int’l Symposium on Computer Architecture (ISCA) 1994;24–33.

448. Palmer J, Morse S. The 8087 Primer New York: John Wiley & Sons; 1984; 93.

449. Pan S-T, So K, Rameh JT. Improving the accuracy of dynamic branch prediction using branch correlation. Proc Fifth Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1992;76–84.

450. Partridge C. Gigabit Networking Reading, Mass: Addison-Wesley; 1994.

451. Patterson D. Reduced instruction set computers. Communications of the ACM. 1985;28(1):8–21 (January).

452. Patterson D. Latency lags bandwidth. Communications of the ACM. 2004;47(10):71–75 (October).

453. Patterson DA, Ditzel DR. The case for the reduced instruction set computer. Computer Architecture News. 1980;8(6):25–33 (October).

454. Patterson DA, Hennessy JL. Computer Organization and Design: The Hardware/Software Interface 3rd ed. San Francisco: Morgan Kaufmann; 2004.

455. Patterson, D. A., G. A. Gibson, and R. H. Katz [1987]. A Case for Redundant Arrays of Inexpensive Disks (RAID), Tech. Rep. UCB/CSD 87/391, University of California, Berkeley. Also appeared in Proc. ACM SIGMOD, June 1–3, 1988, Chicago, 109–116.

456. Patterson DA, Garrison P, Hill M, et al. Architecture of a VLSI instruction cache for a RISC. 10th Annual Int’l Conf on Computer Architecture Conf Proc. 1983;108–116.

457. Pavan P, Bez R, Olivo P, Zanoni E. Flash memory cells—an overview. Proc IEEE. 1997;85(8):1248–1271 (August).

458. Peh LS, Dally WJ. A delay model and speculative architecture for pipelined routers. Proc 7th Int’l Symposium on High-Performance Computer Architecture 2001.

459. Peng V, Samudrala S, Gavrielov M. On the implementation of shifters, multipliers, and dividers in VLSI floating point units. Proc 8th IEEE Symposium on Computer Arithmetic 1987;95–102.

460. Pfister GF. In Search of Clusters 2nd ed. Upper Saddle River, N.J.: Prentice Hall; 1998.

461. Pfister GF, Brantley WC, George DA, et al. The IBM research parallel processor prototype (RP3): Introduction and architecture. Proc 12th Annual Int’l Symposium on Computer Architecture (ISCA) 1985;764–771.

462. Pinheiro E, Weber WD, Barroso LA. Failure trends in a large disk drive population. Proc 5th USENIX Conference on File and Storage Technologies (FAST ’07) 2007.

463. Pinkston TM. Deadlock characterization and resolution in interconnection networks. In: Zhu MC, Fanti MP, eds. Deadlock Resolution in Computer-Integrated Systems. Boca Raton, FL: CRC Press; 2004;445–492.

464. Pinkston TM, Shin J. Trends toward on-chip networked microsystems. Int’l J of High Performance Computing and Networking. 2005;3(1):3–18.

465. Pinkston TM, Warnakulasuriya S. On deadlocks in interconnection networks. 24th Annual Int’l Symposium on Computer Architecture (ISCA) 1997.

466. Pinkston TM, Benner A, Krause M, Robinson I, Sterling T. InfiniBand: The ‘de facto’ future standard for system and local area networks or just a scalable replacement for PCI buses? Cluster Computing (special issue on communication architecture for clusters). 2003;6(2):95–104 (April).

467. Postiff MA, Greene DA, Tyson GS, Mudge TN. The limits of instruction level parallelism in SPEC95 applications. Computer Architecture News. 1999;27(1):31–40 (March).

468. Przybylski SA. Cache Design: A Performance-Directed Approach San Francisco: Morgan Kaufmann; 1990.

469. Przybylski SA, Horowitz M, Hennessy JL. Performance trade-offs in cache design. 15th Annual Int’l Symposium on Computer Architecture 1988;290–298.

470. Puente V, Beivide R, Gregorio JA, Prellezo JM, Duato J, Izu C. Adaptive bubble router: A design to improve performance in torus networks. Proc 28th Int’l Conference on Parallel Processing 1999.

471. Radin G. The 801 minicomputer. Proc Symposium Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1982;39–47.

472. Bordawekar R, Bondhugula U, Rao R. Believe it or not!: Multi-core CPUs can match GPU performance for a FLOP-intensive application! Proc 19th Int’l Conf on Parallel Architecture and Compilation Techniques (PACT 2010) 2010;537–538 (September).

473. Ramamoorthy CV, Li HF. Pipeline architecture. ACM Computing Surveys. 1977;9(1):61–102 (March).

474. Ranganathan P, Leech P, Irwin D, Chase J. Ensemble-Level Power Management for Dense Blade Servers. Proc 33rd Annual Int’l Symposium on Computer Architecture (ISCA) 2006;66–77.

475. Rau BR. Iterative modulo scheduling: An algorithm for software pipelining loops. Proc 27th Annual Int’l Symposium on Microarchitecture 1994;63–74.

476. Rau BR, Glaeser CD, Picard RL. Efficient code generation for horizontal architectures: Compiler techniques and architectural support. Proc Ninth Annual Int’l Symposium on Computer Architecture (ISCA) 1982;131–139.

477. Rau BR, Yen DWL, Yen W, Towle RA. The Cydra 5 departmental supercomputer: Design philosophies, decisions, and trade-offs. IEEE Computer. 1989;22(1):12–34 (January).

478. Reddi VJ, Lee BC, Chilimbi T, Vaid K. Web search using mobile cores: Quantifying and mitigating the price of efficiency. Proc 37th Annual Int’l Symposium on Computer Architecture (ISCA) 2010.

479. Redmond KC, Smith TM. Project Whirlwind—The History of a Pioneer Computer Boston: Digital Press; 1980.

480. Reinhardt SK, Larus JR, Wood DA. Tempest and Typhoon: User-level shared memory. 21st Annual Int’l Symposium on Computer Architecture (ISCA) 1994;325–336.

481. Reinman G, Jouppi NP. Extensions to CACTI. In: research.compaq.com/wrl/people/jouppi/CACTI.html; 1999.

482. Rettberg RD, Crowther WR, Carvey PP, Tomlinson RS. The Monarch parallel processor hardware design. IEEE Computer. 1990;23(4):18–30 (April).

483. Riemens A, Vissers KA, Schutten RJ, Sijstermans FW, Hekstra GJ, La Hei GD. Trimedia CPU64 application domain and benchmark suite. Proc IEEE Int’l Conf on Computer Design: VLSI in Computers and Processors (ICCD’99) 1999;580–585.

484. Riseman EM, Foster CC. Percolation of code to enhance parallel dispatching and execution. IEEE Trans on Computers. 1972;C-21(12):1411–1415 (December).

485. Robin J, Irvine C. Analysis of the Intel Pentium’s ability to support a secure virtual machine monitor. Proc USENIX Security Symposium 2000.

486. Robinson B, Blount L. The VM/HPO 3880-23 Performance Results Gaithersburg, Md: IBM Tech. Bulletin GG66-0247-00, IBM Washington Systems Center; 1986.

487. Ropers, A., H. W. Lollman, and J. Wellhausen [1999]. DSPstone: Texas Instruments TMS320C54x, Tech. Rep. IB 315 1999/9-ISS-Version 0.9, Aachen University of Technology, Aachen, Germany (www.ert.rwth-aachen.de/Projekte/Tools/coal/dspstone_c54x/index.html).

488. Rosenblum M, Herrod SA, Witchel E, Gupta A. Complete computer simulation: The SimOS approach. IEEE Parallel and Distributed Technology (now called Concurrency). 1995;4(3):34–43.

489. Rowen C, Johnson M, Ries P. The MIPS R3010 floating-point coprocessor. IEEE Micro. 1988;8(3):53–62 (June).

490. Russell RM. The Cray-1 processor system. Communications of the ACM. 1978;21(1):63–72 (January).

491. Rymarczyk J. Coding guidelines for pipelined processors. Proc Symposium Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1982;12–19.

492. Saavedra-Barrera, R. H. [1992]. “CPU Performance Evaluation and Execution Time Prediction Using Narrow Spectrum Benchmarking,” Ph.D. dissertation, University of California, Berkeley.

493. Salem K, Garcia-Molina H. Disk striping. Proc 2nd Int’l IEEE Conf on Data Engineering 1986;249–259.

494. Saltzer JH, Reed DP, Clark DD. End-to-end arguments in system design. ACM Trans on Computer Systems. 1984;2(4):277–288 (November).

495. Samples, A. D., and P. N. Hilfinger [1988]. Code Reorganization for Instruction Caches, Tech. Rep. UCB/CSD 88/447, University of California, Berkeley.

496. Santoro MR, Bewick G, Horowitz MA. Rounding algorithms for IEEE multipliers. Proc Ninth IEEE Symposium on Computer Arithmetic 1989;176–183.

497. Satran J, Smith D, Meth K, et al. iSCSI. IPS Working Group of IETF 2001; Internet draft www.ietf.org/internet-drafts/draft-ietf-ips-iscsi-07.txt.

498. Saulsbury A, Wilkinson T, Carter J, Landin A. An argument for Simple COMA. Proc First IEEE Symposium on High-Performance Computer Architectures 1995;276–285.

499. Schneck PB. Superprocessor Architecture Norwell, Mass: Kluwer Academic Publishers; 1987.

500. Schroeder B, Gibson GA. Understanding failures in petascale computers. J of Physics Conf Series. 2007;78(1):188–198.

501. Schroeder B, Pinheiro E, Weber W-D. DRAM errors in the wild: a large-scale field study. Proc Eleventh Int’l Joint Conf on Measurement and Modeling of Computer Systems (SIGMETRICS) 2009.

502. Schurman E, Brutlag J. The user and business impact of server delays. Proc Velocity: Web Performance and Operations Conf. 2009.

503. Schwartz JT. Ultracomputers. ACM Trans on Programming Languages and Systems. 1980;2(4):484–521.

504. Scott NR. Computer Number Systems and Arithmetic Englewood Cliffs, N.J.: Prentice Hall; 1985.

505. Scott SL. Synchronization and communication in the T3E multiprocessor. Seventh Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1996.

506. Scott SL, Goodman J. The impact of pipelined channels on k-ary n-cube networks. IEEE Trans on Parallel and Distributed Systems. 1994;5(1):1–16 (January).

507. Scott SL, Thorson GM. The Cray T3E network: Adaptive routing in a high performance 3D torus. Proc IEEE HOT Interconnects ’96 1996;147–156.

508. Scranton, R. A., D. A. Thompson, and D. W. Hunter [1983]. “The Access Time Myth,” Tech. Rep. RC 10197 (45223), IBM, Yorktown Heights, N.Y.

509. Seagate. [2000]. Seagate Cheetah 73 Family: ST173404LW/LWV/LC/LCV Product Manual, Vol. 1, Seagate, Scotts Valley, Calif. (www.seagate.com/support/disc/manuals/scsi/29478b.pdf).

510. Seitz CL. The Cosmic Cube (concurrent computing). Communications of the ACM. 1985;28(1):22–33 (January).

511. Senior JM. Optical Fiber Communications: Principles and Practice 2nd ed. Hertfordshire, U.K.: Prentice Hall; 1993.

512. Sharangpani H, Arora K. Itanium Processor Microarchitecture. IEEE Micro. 2000;20(5):24–43 (September–October).

513. Shurkin J. Engines of the Mind: A History of the Computer New York: W.W. Norton; 1984.

514. Shustek, L. J. [1978]. “Analysis and Performance of Computer Instruction Sets,” Ph.D. dissertation, Stanford University, Palo Alto, Calif.

515. Silicon Graphics. [1996]. MIPS V Instruction Set (see www.sgi.com/MIPS/arch/ISA5/#MIPSV_indx).

516. Singh JP, Hennessy JL, Gupta A. Scaling parallel programs for multiprocessors: Methodology and examples. Computer. 1993;26(7):22–33 (July).

517. Sinharoy B, Kalla RN, Tendler JM, Eickemeyer RJ, Joyner JB. POWER5 system microarchitecture. IBM J Research and Development. 2005;49(4–5):505–521.

518. Sites, R. [1979]. Instruction Ordering for the CRAY-1 Computer, Tech. Rep. 78-CS-023, Dept. of Computer Science, University of California, San Diego.

519. Sites RL, ed. Alpha Architecture Reference Manual. Burlington, Mass: Digital Press; 1992.

520. Sites RL, Witek R, eds. Alpha Architecture Reference Manual. Newton, Mass: Digital Press; 1995.

521. Skadron K, Clark DW. Design issues and tradeoffs for write buffers. Proc Third Int’l Symposium on High-Performance Computer Architecture 1997;144–155.

522. Skadron K, Ahuja PS, Martonosi M, Clark DW. Branch prediction, instruction-window size, and cache size: Performance tradeoffs and simulation techniques. IEEE Trans on Computers. 1999;48 (November).

523. Slater R. Portraits in Silicon Cambridge, Mass: MIT Press; 1987.

524. Slotnick DL, Borck WC, McReynolds RC. The Solomon computer. Proc AFIPS Fall Joint Computer Conf. 1962;97–107.

525. Smith AJ. Cache memories. Computing Surveys. 1982;14(3):473–530 (September).

526. Smith A, Lee J. Branch prediction strategies and branch-target buffer design. Computer. 1984;17(1):6–22 (January).

527. Smith BJ. A pipelined, shared resource MIMD computer. Proc Int’l Conf on Parallel Processing (ICPP) 1978;6–8.

528. Smith BJ. Architecture and applications of the HEP multiprocessor system. Real-Time Signal Processing IV. 1981;298:241–248 (August).

529. Smith JE. A study of branch prediction strategies. Proc Eighth Annual Int’l Symposium on Computer Architecture (ISCA) 1981;135–148.

530. Smith JE. Decoupled access/execute computer architectures. ACM Trans on Computer Systems. 1984;2(4):289–308 (November).

531. Smith JE. Characterizing computer performance with a single number. Communications of the ACM. 1988;31(10):1202–1206 (October).

532. Smith JE. Dynamic instruction scheduling and the Astronautics ZS-1. Computer. 1989;22(7):21–35 (July).

533. Smith JE, Goodman JR. A study of instruction cache organizations and replacement policies. Proc 10th Annual Int’l Symposium on Computer Architecture (ISCA) 1983;132–137.

534. Smith JE, Pleszkun AR. Implementing precise interrupts in pipelined processors. IEEE Trans on Computers. 1988;37(5):562–573 (May) (This paper is based on an earlier paper that appeared in Proc. 12th Annual Int’l. Symposium on Computer Architecture (ISCA), June 17–19, 1985, Boston, Mass.).

535. Smith JE, Dermer GE, Vanderwarn BD, et al. The ZS-1 central processor. Proc Second Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1987;199–204.

536. Smith MD, Horowitz M, Lam MS. Efficient superscalar performance through boosting. Proc Fifth Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1992;248–259.

537. Smith MD, Johnson M, Horowitz MA. Limits on multiple instruction issue. Proc Third Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1989;290–302.

538. Smotherman M. A sequencing-based taxonomy of I/O systems and review of historical machines. Computer Architecture News. 1989;17(5):5–15 (September). Reprinted in: Hill MD, Jouppi NP, Sohi GS, eds. Computer Architecture Readings. San Francisco: Morgan Kaufmann; 1989.

539. Sodani A, Sohi G. Dynamic instruction reuse. Proc 24th Annual Int’l Symposium on Computer Architecture (ISCA) 1997.

540. Sohi GS. Instruction issue logic for high-performance, interruptible, multiple functional unit, pipelined computers. IEEE Trans on Computers. 1990;39(3):349–359 (March).

541. Sohi GS, Vajapeyam S. Tradeoffs in instruction format design for horizontal architectures. Proc Third Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1989;15–25.

542. Soundararajan V, Heinrich M, Verghese B, Gharachorloo K, Gupta A, Hennessy JL. Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors. Proc 25th Annual Int’l Symposium on Computer Architecture (ISCA) 1998;342–355.

543. SPEC. [1989]. SPEC Benchmark Suite Release 1.0 (October 2).

544. SPEC. [1994]. SPEC Newsletter (June).

545. Sporer M, Moss FH, Mathais CJ. An introduction to the architecture of the Stellar Graphics supercomputer. Proc IEEE COMPCON 1988;464.

546. Spurgeon C. Charles Spurgeon’s Ethernet Web Site. In: wwwhost.ots.utexas.edu/ethernet/ethernet-home.html; 2001.

547. Spurgeon C. Charles Spurgeon’s Ethernet Web Site. In: www.ethermanage.com/ethernet/ethernet.html; 2006.

548. Stenström P, Joe T, Gupta A. Comparative performance evaluation of cache-coherent NUMA and COMA architectures. Proc 19th Annual Int’l Symposium on Computer Architecture (ISCA) 1992;80–91.

549. Sterling T. Beowulf PC Cluster Computing with Windows and Beowulf PC Cluster Computing with Linux Cambridge, Mass: MIT Press; 2001.

550. Stern N. Who invented the first electronic digital computer? Annals of the History of Computing. 1980;2(4):375–376 (October).

551. Stevens WR. TCP/IP Illustrated (three volumes) Reading, Mass: Addison-Wesley; 1994–1996.

552. Stokes J. Sound and Vision: A Technical Overview of the Emotion Engine. In: arstechnica.com/reviews/1q00/playstation2/ee-1.html; 2000.

553. Stone H. High Performance Computers New York: Addison-Wesley; 1991.

554. Strauss W. DSP Strategies 2002. In: www.usadata.com/market_research/spr_05/spr_r127-005.htm; 1998.

555. Strecker WD. Cache memories for the PDP-11? Proc Third Annual Int’l Symposium on Computer Architecture (ISCA) 1976;155–158.

556. Strecker WD. VAX-11/780: A virtual address extension of the PDP-11 family. Proc AFIPS National Computer Conf. 1978;967–980.

557. Sugumar RA, Abraham SG. Efficient simulation of caches under optimal replacement with applications to miss characterization. Proc ACM SIGMETRICS Conf on Measurement and Modeling of Computer Systems 1993;24–35.

558. Sun Microsystems. [1989]. The SPARC Architectural Manual, Version 8, Part No. 8001399-09, Sun Microsystems, Santa Clara, Calif.

559. Sussenguth E. IBM’s ACS-1 Machine. IEEE Computer. 1999;22 (November).

560. Swan RJ, Fuller SH, Siewiorek DP. Cm*—a modular, multi-microprocessor. Proc AFIPS National Computing Conf. 1977;637–644.

561. Swan RJ, Bechtolsheim A, Lai KW, Ousterhout JK. The implementation of the Cm* multi-microprocessor. Proc AFIPS National Computing Conf. 1977;645–654.

562. Swartzlander E, ed. Computer Arithmetic. Los Alamitos, Calif: IEEE Computer Society Press; 1990.

563. Takagi N, Yasuura H, Yajima S. High-speed VLSI multiplication algorithm with a redundant binary addition tree. IEEE Trans on Computers. 1985;C-34(9):789–796.

564. Talagala, N. [2000]. “Characterizing Large Storage Systems: Error Behavior and Performance Benchmarks,” Ph.D. dissertation, Computer Science Division, University of California, Berkeley.

565. Talagala, N., and D. Patterson [1999]. An Analysis of Error Behavior in a Large Storage System, Tech. Report UCB//CSD-99-1042, Computer Science Division, University of California, Berkeley.

566. Talagala N, Arpaci-Dusseau R, Patterson D. Micro-Benchmark Based Extraction of Local and Global Disk Characteristics University of California, Berkeley: CSD-99-1063, Computer Science Division; 2000.

567. Talagala N, Asami S, Patterson D, Futernick R, Hart D. The art of massive storage: A case study of a Web image archive. Computer 2000; (November).

568. Tamir Y, Frazier G. Dynamically-allocated multi-queue buffers for VLSI communication switches. IEEE Trans on Computers. 1992;41(6):725–734 (June).

569. Tanenbaum AS. Implications of structured programming for machine architecture. Communications of the ACM. 1978;21(3):237–246 (March).

570. Tanenbaum AS. Computer Networks 2nd ed. Englewood Cliffs, N.J: Prentice Hall; 1988.

571. Tang CK. Cache design in the tightly coupled multiprocessor system. Proc AFIPS National Computer Conf. 1976;749–753.

572. Tanqueray D. The Cray X1 and supercomputer road map. Proc 13th Daresbury Machine Evaluation Workshop 2002.

573. Tarjan, D., S. Thoziyoor, and N. Jouppi [2005]. “HPL Technical Report on CACTI 4.0,” www.hpl.hp.com/techreports/2006/HPL-2006-86.html.

574. Taylor GS. Compatible hardware for division and square root. Proc 5th IEEE Symposium on Computer Arithmetic 1981;127–134.

575. Taylor GS. Radix 16 SRT dividers with overlapped quotient selection stages. Proc Seventh IEEE Symposium on Computer Arithmetic 1985;64–71.

576. Taylor G, Hilfinger P, Larus J, Patterson D, Zorn B. Evaluation of the SPUR LISP architecture. Proc 13th Annual Int’l Symposium on Computer Architecture (ISCA) 1986.

577. Taylor MB, Lee W, Amarasinghe SP, Agarwal A. Scalar operand networks. IEEE Trans on Parallel and Distributed Systems. 2005;16(2):145–162 (February).

578. Tendler JM, Dodson JS, Fields Jr JS, Le H, Sinharoy B. Power4 system microarchitecture. IBM J Research and Development. 2002;46(1):5–26.

579. Texas Instruments. History of Innovation: 1980s. In: www.ti.com/corp/docs/company/history/1980s.shtml; 2000.

580. Tezzaron Semiconductor. [2004]. Soft Errors in Electronic Memory, White Paper, Tezzaron Semiconductor, Naperville, Ill. (http://www.tezzaron.com/about/papers/soft_errors_1_1_secure.pdf).

581. Thacker CP, McCreight EM, Lampson BW, Sproull RF, Boggs DR. Alto: A personal computer. In: Siewiorek DP, Bell CG, Newell A, eds. Computer Structures: Principles and Examples. New York: McGraw-Hill; 1982;549–572.

582. Thadhani AJ. Interactive user productivity. IBM Systems J. 1981;20(4):407–423.

583. Thekkath R, Singh AP, Singh JP, John S, Hennessy JL. An evaluation of a commercial CC-NUMA architecture—the CONVEX Exemplar SPP1200. Proc 11th Int’l Parallel Processing Symposium (IPPS) 1997.

584. Thorlin JF. Code generation for PIE (parallel instruction execution) computers. Proc Spring Joint Computer Conf. 1967;27.

585. Thornton JE. Parallel operation in the Control Data 6600. Proc AFIPS Fall Joint Computer Conf., Part II 1964;33–40.

586. Thornton JE. Design of a Computer, the Control Data 6600 Glenview, Ill: Scott, Foresman; 1970.

587. Tjaden GS, Flynn MJ. Detection and parallel execution of independent instructions. IEEE Trans on Computers. 1970;C-19(10):889–895.

588. Tomasulo RM. An efficient algorithm for exploiting multiple arithmetic units. IBM J Research and Development. 1967;11(1):25–33 (January).

589. Torrellas J, Gupta A, Hennessy J. Characterizing the caching and synchronization performance of a multiprocessor operating system. Proc Fifth Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1992;162–174.

590. Touma WR. The Dynamics of the Computer Industry: Modeling the Supply of Workstations and Their Components Boston: Kluwer Academic; 1993.

591. Tuck N, Tullsen D. Initial observations of the simultaneous multithreading Pentium 4 processor. Proc 12th Int’l Conf on Parallel Architectures and Compilation Techniques (PACT’03) 2003;26–34.

592. Tullsen DM, Eggers SJ, Levy HM. Simultaneous multithreading: Maximizing on-chip parallelism. Proc 22nd Annual Int’l Symposium on Computer Architecture (ISCA) 1995;392–403.

593. Tullsen DM, Eggers SJ, Emer JS, Levy HM, Lo JL, Stamm RL. Exploiting choice: Instruction fetch and issue on an implementable simultaneous multithreading processor. Proc 23rd Annual Int’l Symposium on Computer Architecture (ISCA) 1996;191–202.

594. Ungar D, Blau R, Foley P, Samples D, Patterson D. Architecture of SOAR: Smalltalk on a RISC. Proc 11th Annual Int’l Symposium on Computer Architecture (ISCA) 1984;188–197.

595. Unger SH. A computer oriented towards spatial problems. Proc Institute of Radio Engineers. 1958;46(10):1744–1750 (October).

596. Vahdat A, Al-Fares M, Farrington N, Niranjan Mysore R, Porter G, Radhakrishnan S. Scale-Out Networking in the Data Center. IEEE Micro. 2010;30(4):29–41 (July/August).

597. Vaidya AS, Sivasubramaniam A, Das CR. Performance benefits of virtual channels and adaptive routing: An application-driven study. Proc ACM/IEEE Conf on Supercomputing 1997.

598. Vajapeyam, S. [1991]. “Instruction-Level Characterization of the Cray Y-MP Processor,” Ph.D. thesis, Computer Sciences Department, University of Wisconsin-Madison.

599. van Eijndhoven JTJ, Sijstermans FW, Vissers KA, et al. Trimedia CPU64 architecture. Proc IEEE Int’l Conf on Computer Design: VLSI in Computers and Processors (ICCD’99) 1999;586–592.

600. Van Vleck T. The IBM 360/67 and CP/CMS. In: http://www.multicians.org/thvv/360-67.html; 2005.

601. von Eicken T, Culler DE, Goldstein SC, Schauser KE. Active Messages: A mechanism for integrated communication and computation. Proc 19th Annual Int’l Symposium on Computer Architecture (ISCA) 1992.

602. Waingold E, Taylor M, Srikrishna D, et al. Baring it all to software: Raw Machines. IEEE Computer. 1997;30:86–93 (September).

603. Wakerly J. Microcomputer Architecture and Programming New York: Wiley; 1989.

604. Wall DW. Limits of instruction-level parallelism. Proc Fourth Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1991;248–259.

605. Wall DW. Limits of Instruction-Level Parallelism Palo Alto, Calif: Research Rep. 93/6, Western Research Laboratory, Digital Equipment Corp.; 1993.

606. Walrand J. Communication Networks: A First Course Homewood, Ill: Aksen Associates/Irwin; 1991.

607. Wang W-H, Baer J-L, Levy HM. Organization and performance of a two-level virtual-real cache hierarchy. Proc 16th Annual Int’l Symposium on Computer Architecture (ISCA) 1989;140–148.

608. Watanabe T. Architecture and performance of the NEC supercomputer SX system. Parallel Computing. 1987;5:247–255.

609. Waters F, ed. IBM RT Personal Computer Technology. Austin, Tex: SA 23-1057,0 IBM; 1986.

610. Watson WJ. The TI ASC—a highly modular and flexible super processor architecture. Proc AFIPS Fall Joint Computer Conf. 1972;221–228.

611. Weaver DL, Germond T. The SPARC Architectural Manual, Version 9 Englewood Cliffs, N.J.: Prentice Hall; 1994.

612. Weicker RP. Dhrystone: A synthetic systems programming benchmark. Communications of the ACM. 1984;27(10):1013–1030 (October).

613. Weiss S, Smith JE. Instruction issue logic for pipelined supercomputers. Proc 11th Annual Int’l Symposium on Computer Architecture (ISCA) 1984;110–118.

614. Weiss S, Smith JE. A study of scalar compilation techniques for pipelined supercomputers. Proc Second Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1987;105–109.

615. Weiss S, Smith JE. Power and PowerPC San Francisco: Morgan Kaufmann; 1994.

616. Wendel D, Kalla R, Friedrich J, et al. The Power7 processor SoC. Proc Int’l Conf on IC Design and Technology 2010;71–73.

617. Weste N, Eshraghian K. Principles of CMOS VLSI Design: A Systems Perspective 2nd ed. Reading, Mass: Addison-Wesley; 1993.

618. Wiecek C. A case study of the VAX 11 instruction set usage for compiler execution. Proc Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1982;177–184.

619. Wilkes M. Slave memories and dynamic storage allocation. IEEE Trans Electronic Computers. 1965;EC-14(2):270–271 (April).

620. Wilkes MV. Hardware support for memory protection: Capability implementations. Proc Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1982;107–116.

621. Wilkes MV. Memoirs of a Computer Pioneer Cambridge, Mass: MIT Press; 1985.

622. Wilkes MV. Computing Perspectives San Francisco: Morgan Kaufmann; 1995.

623. Wilkes MV, Wheeler DJ, Gill S. The Preparation of Programs for an Electronic Digital Computer Cambridge, Mass: Addison-Wesley; 1951.

624. Williams S, Waterman A, Patterson D. Roofline: An insightful visual performance model for multicore architectures. Communications of the ACM. 2009;52(4):65–76 (April).

625. Williams TE, Horowitz M, Alverson RL, Yang TS. A self-timed chip for division. In: Losleben P, ed. Stanford Conference on Advanced Research in VLSI. Cambridge, Mass: MIT Press; 1987.

626. Wilson Jr AW. Hierarchical cache/bus architecture for shared-memory multiprocessors. Proc 14th Annual Int’l Symposium on Computer Architecture (ISCA) 1987;244–252.

627. Wilson RP, Lam MS. Efficient context-sensitive pointer analysis for C programs. Proc ACM SIGPLAN’95 Conf on Programming Language Design and Implementation 1995;1–12.

628. Wolfe A, Shen JP. A variable instruction stream extension to the VLIW architecture. Proc Fourth Int’l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1991;2–14.

629. Wood DA, Hill MD. Cost-effective parallel computing. IEEE Computer. 1995;28(2):69–72 (February).

630. Wulf W. Compilers and computer architecture. Computer. 1981;14(7):41–47 (July).

631. Wulf W, Bell CG. C.mmp—A multi-mini-processor. Proc AFIPS Fall Joint Computer Conf. 1972;765–777.

632. Wulf W, Harbison SP. Reflections in a pool of processors—an experience report on C.mmp/Hydra. Proc AFIPS National Computing Conf. 1978;939–951.

633. Wulf WA, McKee SA. Hitting the memory wall: Implications of the obvious. ACM SIGARCH Computer Architecture News. 1995;23(1):20–24 (March).

634. Wulf WA, Levin R, Harbison SP. Hydra/C.mmp: An Experimental Computer System New York: McGraw-Hill; 1981.

635. Yamamoto W, Serrano MJ, Talcott AR, Wood RC, Nemirovsky M. Performance estimation of multistreamed, superscalar processors. Proc 27th Annual Hawaii Int’l Conf on System Sciences 1994;195–204.

636. Yang Y, Mason G. Nonblocking broadcast switching networks. IEEE Trans on Computers. 1991;40(9):1005–1015 (September).

637. Yeager K. The MIPS R10000 superscalar microprocessor. IEEE Micro. 1996;16(2):28–40 (April).

638. Yeh T, Patt YN. Alternative implementations of two-level adaptive branch prediction. Proc 19th Annual Int’l Symposium on Computer Architecture (ISCA) 1992;124–134.

639. Yeh T, Patt YN. A comparison of dynamic branch predictors that use two levels of branch history. Proc 20th Annual Int’l Symposium on Computer Architecture (ISCA) 1993b;257–266.