- Design
- Process Technology/Packaging/Assembly
- AI
- Test/Validation
Low-cost Error Resilient Circuits for Digital Control Paths
Aradhana Kumari
STMicroelectronics
Automotive and space applications demand design resilience against bit-flips in digital circuits, which could lead to catastrophic results. Immunity to SEU (Single Event Upset) events is presently ensured through “rad hard” (radiation-hardened) processes and error-resilient circuits. Rad-hard devices are slower and limit the operating frequency of designs. They are also expensive because they require special processing steps to fabricate. In this paper, multiple circuit-level error-resilient techniques are discussed and proposed that enable immunity to SEU events, thus preventing system failure. A state machine with complex feedback paths is made error resilient by increasing the Hamming distance between the states. The proposed approach is much more efficient than directly using Hamming codes for state encoding. An error-resilient high-speed binary counter is proposed that one-hot codes its high-speed LSBs (Least Significant Bits), thus self-correcting any deviations from one-hot codes. For lower-frequency operation, it is shown that the SEU error resilience of a Gray code counter is much higher than that of a regular binary counter. It is demonstrated that the presented techniques have a higher figure of merit than contemporary redundancy-based error-resilient solutions. The proposed digital circuits are technology independent and algorithmically resistant to any SEU events in the control path.
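The encoding ideas named in this abstract can be illustrated with a minimal Python sketch. It is not the author's circuit: the helper names (`hamming_distance`, `min_distance`, `single_flip_detectable`) and the 2-bit, 3-bit, and 4-bit widths are assumptions chosen for illustration. It only shows why a larger Hamming distance between valid states makes a single bit-flip detectable, and that consecutive Gray code values differ in exactly one bit.

```python
from itertools import combinations


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two code words differ."""
    return bin(a ^ b).count("1")


def min_distance(codes) -> int:
    """Smallest Hamming distance between any two valid code words."""
    return min(hamming_distance(a, b) for a, b in combinations(codes, 2))


def binary_to_gray(n: int) -> int:
    """Standard reflected binary (Gray) code of n."""
    return n ^ (n >> 1)


def single_flip_detectable(codes, width: int) -> bool:
    """True if flipping any one bit of any valid code gives an invalid code."""
    valid = set(codes)
    return all((c ^ (1 << bit)) not in valid
               for c in codes for bit in range(width))


# Illustrative encodings (assumed sizes, not taken from the paper).
binary_states = list(range(4))                # 2-bit binary: 00, 01, 10, 11
one_hot_states = [1 << i for i in range(4)]   # 0001, 0010, 0100, 1000
gray_sequence = [binary_to_gray(i) for i in range(8)]  # 3-bit Gray counter

print("binary min distance :", min_distance(binary_states))
print("one-hot min distance:", min_distance(one_hot_states))
print("one-hot single flip detectable:",
      single_flip_detectable(one_hot_states, 4))
print("binary 3 -> 4 flips", hamming_distance(3, 4), "bits")
```

With a minimum distance of 2, every single bit-flip of a one-hot state lands on an invalid word, which a checker can detect; the dense binary encoding has no invalid words to land on. How the proposed circuits go from detection to self-correction is described in the paper, not in this sketch.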
Building Digital Twins for semiconductor manufacturing using AWS and Generative AI
Dhara Vaishnav
Amazon Web Services (AWS)
Digital twins are transforming industrial enterprises, offering a powerful tool to drive data-driven decision making and optimize operations. A digital twin is a living digital representation of a physical system that is dynamically updated with data to precisely mimic the true structure, state, and behavior of that system.
In the semiconductor and high-tech manufacturing sectors, which face immense complexity and capital-intensive production, digital twins are emerging as a critical technology. These virtual models ingest data from a variety of sources, including sensors, SCADA systems, and production systems, and combine it with 3D CAD models to create comprehensive digital representations. By leveraging the scalability, reliability, and advanced analytics capabilities of the AWS cloud, manufacturers can build and manage these digital twins at scale, integrating and contextualizing data from disparate IT and OT sources. By constructing detailed scenarios of real-world manufacturing situations, these intelligent virtual models can serve as an early-warning system, forecasting events and their likelihood before they occur. They also provide a digital test bed for evaluating strategies, optimizing schedules and sequences, and improving equipment performance – driving substantial gains in efficiency, quality, and time-to-market.
Building a digital twin, especially for highly specialized applications (such as multimachine production scheduling or vehicle routing), can be time-consuming and resource-intensive. The effort often entails designing and developing new digital-twin models, a process that can take six months or longer and incur substantial labor, computing, and server costs. Through the seamless integration of AWS services like IoT TwinMaker and generative AI capabilities, semiconductor and high-tech companies can build and deploy sophisticated digital twins that deliver tangible operational and business benefits. Large language models (LLMs) can create code for the digital twin, accelerating the development process and increasing effectiveness. This ability to generate such output leads to an exciting prospect: LLMs could possibly be used to create a generalized digital-twin solution. These generalized digital twins can then be augmented using other generative AI techniques to further enhance their value.
Secure Management of Hyperscale Cloud Network Accelerators
Faye Yang
Microsoft
This paper introduces an innovative design of a secure platform management system tailored for high-speed network accelerators, featuring robust security measures and advanced hardware interfaces to ensure resiliency and high availability in hyperscale cloud environments.
Modernizing Semiconductor Manufacturing Analytics Platform using AWS Data Analytics Services
Upasana Pandya
Amazon
In the high-tech semiconductor industry, data is paramount to achieving operational excellence, product quality, and competitive advantage. This paper will explore prevalent manufacturing analytics use cases in the semiconductor industry, underscoring their role in optimizing processes, enhancing yield, improving quality, and fostering innovation. Additionally, it provides insights into how to establish a robust manufacturing analytics data lake utilizing AWS services, enabling organizations to harness the power of data-driven decision-making and gain a competitive edge in their respective industries.
Complex Shape EUV Extreme Ultraviolet Patterning: EUV Resist Process Optimization and Dry Etch Solutions for Defect Reduction and Cross-Wafer
Yashvi Singh
Micron
Enhancing and preserving silicon surface area is imperative for chip scaling. This paper discusses the defects generated during the EUV lithography process and the challenges of controlling, through dry etch and EUV processing, the novel complex geometric shapes needed to enhance silicon surface area. The proposed EUV track developer optimization, based on a new developer nozzle, is critical for any memory technology pursuing an EUV process. The proposed dry etch solutions for improving cross-wafer pattern uniformity and pattern integrity are fundamental to advancing high-aspect-ratio, tight-pitch EUV etches for complex patterns. We provide three mechanisms to achieve large surface area while maintaining cross-wafer uniformity of the complex pattern.
Elevating 3D NAND performance: Dogwood and Its Process-Property Correlation for Low Resistivity, High Speed, and Superior Cell Performance
Lakshmi Suresh
Micron
A significant breakthrough in fluorine-free Dogwood (DW is a Micron codename for a first-of-a-kind WL metal) wordline processing for 3D NAND flash memory has been realized. The integration of DW into WLs enhances both vertical and horizontal cell scaling, yielding substantial reductions in resistance-capacitance (RC) delay and lower leakage failure rates compared to tungsten (W) counterparts. Key to this development are process optimizations aimed at minimizing oxygen impurity, a critical factor in improving resistivity and work function stability. By optimizing deposition parameters such as partial pressure, precursor concentration, and purge cycles, oxygen residues within DW films can be effectively mitigated. Optimizing hydrogen partial pressure and tuning the reaction chemistry through the precursor during deposition enhance reduction reactions, improving film purity. Novel chemical surface treatments and optimized ALD cycling techniques eliminate oxygen contamination in DW voids, driving down resistivity and boosting speed performance. This fluorine-free process directly addresses key issues like read disturb and charge loss, which are closely tied to metal fill and work function degradation. Through fine-tuning DW film growth dynamics and reducing oxygen incorporation, the process achieves superior electrical properties with larger grain structures and higher uniformity. Collectively, these advancements extend the scaling limits of 3D NAND memory, offering a pathway to achieve low resistivity and high-speed operation with exceptional cell performance. This DW-based process is a vital innovation for next-generation 3D NAND flash technologies, supporting the increasing demand for high-density, high-performance memory.
Timing Constraint Generation for SerDes Interface IP Using Generative AI
Patricia Fong
Marvell
Using ML based virtual metrology for advanced process control to improve high product mix manufacturing
Srividya Jayaram
Siemens EDA
An advanced process control system that uses a virtual metrology model for run-to-run control is proposed; it incorporates measurement, design, and fault detection and classification features to achieve the desired thickness target for the chemical vapor deposition process.
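Run-to-run control driven by a virtual metrology estimate can be sketched as follows. The abstract does not specify the control law, so this sketch swaps in a textbook EWMA (exponentially weighted moving average) offset controller. The linear process model, the gain of 2.0 nm/s, the 5 nm chamber drift, and the `virtual_metrology` stand-in are all hypothetical values chosen for illustration, not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class EwmaRunToRunController:
    """EWMA controller for an assumed linear model: thickness = offset + gain * time."""
    target: float         # desired film thickness (nm)
    gain: float           # assumed process gain (nm per second of deposition)
    offset: float = 0.0   # running estimate of the process offset (nm)
    weight: float = 0.3   # EWMA weight, 0 < weight <= 1

    def recipe(self) -> float:
        """Deposition time that should hit the target under the current model."""
        return (self.target - self.offset) / self.gain

    def update(self, deposition_time: float, thickness: float) -> None:
        """Blend the offset implied by the latest run into the running estimate."""
        observed_offset = thickness - self.gain * deposition_time
        self.offset = (self.weight * observed_offset
                       + (1.0 - self.weight) * self.offset)


def virtual_metrology(deposition_time: float, chamber_drift: float) -> float:
    """Hypothetical stand-in for a trained VM model that predicts thickness.

    A real model would take measurement, design, and FDC features as inputs.
    """
    return 2.0 * deposition_time + chamber_drift


def simulate(runs: int = 30) -> list:
    """Run the controller against a constant, initially unknown 5 nm drift."""
    controller = EwmaRunToRunController(target=100.0, gain=2.0)
    history = []
    for _ in range(runs):
        time_s = controller.recipe()
        predicted = virtual_metrology(time_s, chamber_drift=5.0)
        history.append(predicted)
        controller.update(time_s, predicted)
    return history


if __name__ == "__main__":
    for run, thickness in enumerate(simulate(10)):
        print(f"run {run}: predicted thickness {thickness:.3f} nm")
```

Under these assumptions the thickness error shrinks by a factor of `1 - weight` on every run, which is the usual trade-off of EWMA control: a larger weight reacts faster to drift but passes more metrology noise into the recipe.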
Parametric Data Analysis for Pre-empting Link Failures
Granthana Rangaswamy
Meta
As Meta prepares to deploy the 24K cluster and beyond, designing resilient hardware systems is increasingly critical to prevent unplanned resource unavailability and job restarts. There is a growing need to accurately identify and proactively repair interconnect issues to minimize the significant costs associated with hardware failures. High-speed data designs have always been a significant challenge for system deployment, often determining the overall performance and reliability of the system. With higher speeds, more complex modulation schemes, reduced signal-to-noise ratios (SNRs), and longer job runs for AI clusters, high-speed designs are even more critical and must be designed with utmost care.
Our proposed solution involves collecting large-scale data on Ethernet SerDes parameters, along with system and channel data, to build anomaly detectors that can predict link failures. These anomaly detectors, enhanced with machine learning algorithms and refined pass/fail criteria, will enable preemptive detection of link issues and shifts in margin distributions. This capability will accelerate the deployment and effective management of next-generation systems.
We have implemented parametric analysis on one of our systems, generating a data correlation heatmap that showcases good correlation, analyzing trends, and identifying clusters of well-performing and underperforming ports. By presenting our analysis and insights from a system-level perspective, we aim to inform and shape AI infrastructure development and failure attribution practices, driving more informed decision-making.
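The kind of parametric analysis described here can be sketched with a small example. The abstract does not give the actual parameters, models, or thresholds, so everything below is an assumption: the parameter names `eye_height_mv` and `snr_db`, the per-port values (synthetic, made up for illustration), and the use of a robust z-score with a 3.5 cutoff in place of the machine learning detectors mentioned above.

```python
from statistics import mean, median


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def robust_z_scores(values):
    """Median/MAD based z-scores, less distorted by the outliers being sought."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; values are too uniform to score")
    return [0.6745 * (v - med) / mad for v in values]


def flag_weak_ports(values, threshold: float = -3.5):
    """Indices of ports whose margin sits far below the population median."""
    return [i for i, z in enumerate(robust_z_scores(values)) if z < threshold]


# Synthetic per-port readings (hypothetical, not measured data).
eye_height_mv = [310, 305, 298, 312, 301, 307, 180, 303]
snr_db = [24.1, 23.8, 23.2, 24.3, 23.5, 23.9, 17.0, 23.6]

print("eye height vs SNR correlation:",
      round(pearson(eye_height_mv, snr_db), 3))
print("ports flagged for low margin:", flag_weak_ports(eye_height_mv))
```

Computing `pearson` for every pair of collected parameters gives the matrix behind a correlation heatmap, and the flagged indices separate underperforming ports from the well-performing cluster in this toy data set.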
Test Time Optimization: A Novel Staggered-capture Architecture Using A Token-passing Architecture
Khushboo Agarwal
AMD
ATE Multi-Site Hardware Design for Wi-Fi 6E and BT Devices
Kate Cheng
Synaptics
This article summarizes best practices and guidelines for designing multi-site ATE PCBs for testing BT and WLAN combo products. As frequency, bandwidth, and performance requirements rise, the design of the ATE load board becomes more crucial. The parasitics of the PCB itself and electromagnetic interference (EMI) between traces can significantly degrade RF performance. Multi-site parallel testing is now preferred because it improves test efficiency and reduces cost, so signal integrity and isolation are important factors in the PCB design. DC power planes, layer-to-layer and digital signal isolation, and proper grounding must also be taken into consideration.
Meta Silicon Infrastructure and Evolution with AI
Salina Dbritto
Meta
Meta is developing its own specialized silicon to support AI operations. Moving into the AI domain necessitated a shift in thinking, perspective, and a paradigm change in our approach to test and validate infrastructure and workflows. Where previously a focus on individual component validation and infrastructure was adequate, we now need to broaden our infrastructure scope to accommodate AI systems, transitioning from a component-centric to a system-level outlook. This involves integrating computing, networking, and liquid cooling strategies, and constructing infrastructure in labs that not only support component validation but also explore methods to assess integrated racks and interoperability capabilities. Shifting from component-based to AI system-based infrastructure poses significant challenges, and we are crafting solutions to address the needs of AI infrastructure. Silicon automation infrastructure is designed to be portable and scalable, allowing for seamless integration across various development and validation phases and environments. This includes pre-Silicon phase Emulation environments, post-Silicon engineering labs, data centers during NPI phase, deployment in the fleet during MP phase, and ODM/Vendor environments. By standardizing our automation tools across multiple geographical regions and platforms, we provide engineers with a consistent user experience, ultimately increasing developer velocity. Additionally, we have shifted many of our validation efforts to earlier stages of the NPI cycle, minimizing issues that may arise in the fleet. The robust infrastructure, standard automated tools, and processes have played a crucial role in enabling the significant growth and achievement in the AI programs. We continue to improve and enhance our infra capabilities to support integrated AI systems for next-generation Silicon programs.
Emerging PMIC Trends Demand Test Innovation
Lauren Getz
Teradyne
Power management integrated circuits (PMICs) are responsible for managing the power requirements of mobile systems while preserving the battery charge. If the PMIC has an integrated charger, it is also responsible for handling different charging schemes, modes, and inputs, such as wall, USB, and wireless. This component has become increasingly complex and embeds not only low-dropout regulators, multi-phase DC-DC converters, USB fast chargers, general-purpose analog-to-digital converters, and communication interfaces but also other optional blocks such as LEDs, drivers, fuel gauges, and audio analog-to-digital / digital-to-analog converters. Testing high-performance PMICs requires high-density DC instruments with highly flexible merging capabilities to achieve high current. This presentation will discuss the trends and requirements driving PMIC innovation and test solutions that maximize yield for high-quality chips at the lowest cost of test.
Clock Sensitivity Test
Lei Han
Synopsys
Customers and internal users may experience QoR instability due to minor changes in flow, settings, or design. These fluctuations complicate the debugging process and compromise the stability of release-to-release results. A sensitivity test was initiated to assess the engine’s response to external stimuli in ICCII. Thirteen defects were identified during the experiment, with 100% R&D engagement. This guarantees stable and predictable tool performance and enables adaptability to customer scenarios. Recent customer feedback has confirmed the success of this approach in streamlining the migration process and reducing development costs. Looking ahead, an AI model for predictive analysis based on historical sensitivity data is underway.
Testing beyond spec for product security assurance!
Rachana Maitra
Marvell
This paper will bring to light the aspects of product security testing not covered by conventional hardware verification/validation methods and explain the challenges stemming from the fundamental shift in mindset required to try to break a product like a hacker. Rather than looking for functional correctness and compliance with power and performance expectations per spec, security testing should ensure the absence of behavior or characteristics that a hacker could exploit for malicious purposes such as stealing or counterfeiting IP, disrupting or corrupting functionality, or leaking secret keys or confidential information. For this reason, when creating a security test plan, the test parameters, or what to look for while analyzing test results, must not be bounded by spec. One must also accept that the threat landscape is continually evolving and the sophistication of bad actors is growing exponentially. Hence, security testing strategies must also be open to continuous improvement in test methods and tools. This paper will demonstrate how Marvell is building a best known method (BKM) to address these challenges, as part of corporate initiatives to continuously improve its security development lifecycle (SDL) process. While elaborating on the concept of testing beyond spec, it will reinforce the need for a layered approach to testing to achieve product security assurance before production. It will do so by categorizing security testing requirements throughout the various pre- and post-silicon development phases, with strategies and tools appropriate for each phase. In every phase, the goal of the testing will be to intentionally look for potential security violations that may remain hidden within a functionally clean design!