I. Core Positioning and Framework of the Survey

The Survey of General End-to-End Autonomous Driving, jointly released by Shanghai Jiao Tong University's AutoLab and DiDi, proposes for the first time a unified "General End-to-End (GE2E)" framework that divides end-to-end autonomous driving technology into three major paradigms: conventional E2E, VLM-centric E2E, and hybrid E2E. The survey systematically reviews more than 200 top-conference papers together with industrial practice, clarifying the technology's evolution, the core performance differences between paradigms, and the key bottlenecks to deployment. It focuses in particular on the industry's transition from a "half-finished revolution" to a "full-pipeline closed loop" following the paradigm shift triggered by FSD V12, and serves as a reference for moving from "deployment in structured scenarios" to "generalization across all scenarios." As a bridge between academic research and industrial deployment, the framework's core value lies in breaking down the barriers between different end-to-end routes, revealing a unified, fully data-driven evolution of the perception-decision-control pipeline, and answering the industry's central question of how to balance technical sophistication against engineering feasibility, thereby offering a clear roadmap for the leap from L2 assisted driving to L4 full autonomy.

II. Technical Analysis and Comparison of the Three Core Paradigms

(1) Paradigm definitions and technical characteristics

1. Conventional end-to-end (Conventional E2E)
Core logic: a direct mapping from sensor data to driving control signals, trained end-to-end on structured representations built from pure vision or multi-sensor fusion, with no hand-designed intermediate decision modules; it focuses on the precise-execution question of "how to drive."
Technical architecture: the input layer consists mainly of camera, LiDAR, millimeter-wave radar, and vehicle-state data; the backbone uses visual encoders such as ResNet/EfficientNet combined with 3D scene representations such as BEV (bird's-eye view) and occupancy; the output layer directly generates steering, acceleration, and braking commands, relying on heterogeneous computing to achieve real-time processing.
Typical representatives: UniAD (Huawei), TransFuser (Technical University of Munich), and VADv2 (Xpeng Motors); VADv2 is already in mass production on the Xpeng G9 for highway NOA scenarios.
Industrial value: the mainstream solution for today's L2 assisted driving. Of the 8.5 million intelligent connected vehicles sold in China in 2024, more than 70% used this route, supporting a market penetration rate of 42%.

2. VLM-centric end-to-end
Core logic: introduces a vision-language model (VLM) as the core reasoning engine and recasts driving as a semantic-understanding problem over "multi-modal input + natural-language instructions," relying on general AI capability to improve generalization in complex scenarios and to answer the decision-logic question of "why to drive this way."
Technical architecture: the input layer combines sensor data with text instructions (e.g., "avoid the construction area," "park at the nearest charging pile"); the backbone is built on large language models such as LLaMA/Vicuna with modality-alignment modules that fuse visual and linguistic semantics; the output layer can generate both explanatory text and control signals, giving natural interpretability, but compute demand reaches 200-1000 TOPS and requires high-performance chips.
Typical representatives: DriveLM (Stanford University), LMDrive (Shanghai Jiao Tong University), and AutoVLA (Tesla AI team); AutoVLA is the core technology behind FSD V12, realizing full-pipeline neural-network decision-making from "photons in" to "control out."
Technical breakthrough: it connects general AI with autonomous driving for the first time, giving the model world-knowledge reasoning (e.g., anticipating that "a child may run out after the ball") and greatly improving long-tail scenario handling.

3. Hybrid end-to-end (Hybrid E2E)
Core logic: fuses the precise-control strengths of conventional E2E with the semantic reasoning of VLMs into a "fast thinking + slow thinking" dual-system architecture that balances real-time performance with complex-scenario capability; currently the most practical route to affordable high-level autonomy.
Technical architecture: the lower layer uses a conventional E2E backbone for millisecond-level perception and control execution (response latency ≤50 ms); the upper layer introduces a VLM reasoning engine for scenarios that require common-sense judgment and task decomposition (e.g., traffic control, sudden obstacles); an "instruction issuance → state feedback" interaction mechanism coordinates the two layers dynamically, keeping the compute requirement in the mid-to-high range of 60-200 TOPS.
Typical representatives: DriveVLM (DiDi), SOLVE (UC Berkeley), and DistillDrive (Baidu Apollo); Horizon Robotics' HSD system adopts a similar architecture and targets "an L4-level experience at passenger-car prices" within 2-3 years.
Industrial outlook: widely regarded as the mainstream route for the next 1-3 years, covering every price tier from roughly 100,000-RMB economy cars to luxury models and driving the democratization of high-level intelligent driving.

(2) Multi-dimensional performance comparison

| Dimension | Conventional E2E | VLM-centric E2E | Hybrid E2E |
| Core input | Sensor data + vehicle state | Sensor data + text instructions | Sensor data + VLM knowledge + vehicle state |
| Backbone characteristics | Visual encoder + 3D scene representation | VLM foundation model + modality alignment module | Conventional E2E backbone + VLM reasoning engine |
| Core advantages | High execution efficiency (millisecond response), precise trajectories, stable in structured scenarios, low-to-medium compute (10-30 TOPS) | Strong generalization, good explainability, strong complex reasoning, common-sense capability | Balances semantic understanding and physical precision, excellent full-scenario adaptability, strong engineering feasibility |
| Main shortcomings | No common-sense reasoning, weak long-tail robustness, black-box decisions hard to certify | Poor real-time performance (hundred-millisecond response), high compute demand, high cost | Complex architecture, high training cost, dynamic collaboration needs continuous tuning |
| Typical scenarios | Highways, closed parks, urban expressways | Complex urban roads, custom task scenarios, Robotaxi | Full-scenario coverage (highway + urban + special scenarios), passenger-car mass production |
| Open-loop test accuracy | ★★★★☆ | ★★★☆☆ | ★★★★★ |
| Closed-loop test stability | ★★★★★ | ★★★☆☆ | ★★★★☆ |
| Compute requirement | Low-medium (10-30 TOPS) | High (200-1000 TOPS) | Medium-high (60-200 TOPS) |
| 2024 installation share | ★★★★★ (≈70%) | ★☆☆☆☆ (5%) | ★★★☆☆ (≈25%) |
| Cost-matched vehicles | Models from ~100,000 RMB | Luxury models above ~300,000 RMB / Robotaxi | Models from ~150,000 RMB |

III. Technology Evolution Trends and Dataset Development

(1) Paradigm evolution logic
From "pure control mapping" to "semantic understanding": conventional E2E answers "how to drive," while VLM-centric and hybrid E2E extend to "why to drive this way," using semantic understanding to improve explainability and generalization and completing the jump from "function stacking" to "behavior emergence."
From "single modality" to "multi-modal fusion": sensor fusion has upgraded from the geometric fusion of "vision + LiDAR" to the semantic fusion of "vision + language + common sense," giving vehicles the ability to understand the world; this shift is seen as the core marker of a "complete technical revolution" in autonomous driving.
From "data-driven" to "data + knowledge dual-driven": on top of training on massive driving data, the general knowledge of VLMs is introduced to ease the scarcity of long-tail data, giving models cross-scenario transferability and reducing reliance on high-definition maps.
From "level-by-level development" to a "unified paradigm": the success of FSD V12 has demonstrated that a single architecture can support L2 through L4, breaking the barriers between automation levels and allowing the development system, sensor configuration, and ODD coverage plan to be shared.

(2) Dataset development characteristics
Semantic annotation becomes mainstream: traditional datasets such as KITTI center on geometric labels (detection, segmentation), while new-generation datasets such as nuScenes 2.0 and Waymo Open Dataset v2 add semantic descriptions and task instructions for VLM training; thanks to its scenario diversity and mature toolchain, the nuScenes ecosystem accounts for more than 60% of related research.
Chain-of-thought annotation spreads: datasets such as the DriveLM Dataset introduce complete "scene analysis → decision reasoning → control execution" chain-of-thought labels, helping models learn human-like driving logic and addressing decision explainability.
Closed-loop test data upgrades: datasets are shifting from static scene sampling to dynamic closed-loop collection that captures the full "perception error → decision adjustment → control correction" interaction, closer to real driving conditions and better suited to training hybrid paradigms.
Compliance-oriented annotation strengthens: new privacy-protection annotation rules, aligned with the GDPR and China's Personal Information Protection Law, enforce data anonymization (k-anonymity with k≥50) while balancing data usability and privacy.

IV. Core Challenges and Technical Bottlenecks

(1) Four key challenges
Long-tail data: data for long-tail cases such as extreme weather (heavy rain, blizzards), irregular vehicles (construction machinery, tricycles), and sudden events (pedestrians crossing a highway) is scarce, leaving models with insufficient generalization; conventional E2E misjudgment rates reach 40% in extreme scenarios. The industry remains stuck on "a continuous stream of extreme cases in a dense physical world," each of which must be solved within a fixed time window.
Explainability vs. compliance: conventional E2E models behave like a black box whose decision logic cannot be quantitatively explained, making ISO 26262 ASIL-D functional-safety certification hard to obtain; VLM-centric models offer textual explanations, but the traceability of their reasoning still needs improvement, and algorithmic ethical decisions must satisfy the transparency requirements of SAE J3016.
Safety vs. efficiency: pursuing efficiency can sacrifice safety (hard acceleration, successive lane changes), while over-emphasizing safety degrades the driving experience (frequent deceleration, yielding to non-risk targets), and there is no unified standard for the balance. The industry requires that the accident rate of autonomous systems be below 10% of human driving and the failure rate below 10^-8 per hour.
Real-time performance vs. computing power: VLMs demand so much compute that response latency is typically 100-200 ms, far from the ≤50 ms needed for highway driving and emergency braking, while high-compute chips such as the 2000 TOPS NVIDIA Thor have a thermal design power (TDP) that easily exceeds the 30-60 W limit of the on-board power system, creating heat-dissipation bottlenecks.
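To make the real-time and power constraint above concrete, the short sketch below is a feasibility check that is not from the survey: the 50 ms budget, the 30-60 W envelope, and the 100-200 ms VLM latency are the figures quoted in this section, while the chip-efficiency numbers are hypothetical placeholders.

```python
# Feasibility check for the latency and power constraints quoted above.
# The 50 ms budget, 30-60 W envelope, and 100-200 ms VLM latency come from the
# text; the chip efficiency figures below are hypothetical placeholders.
from dataclasses import dataclass

LATENCY_BUDGET_MS = 50.0     # required for highway driving / emergency braking
POWER_ENVELOPE_W = 60.0      # upper end of the 30-60 W on-board budget

@dataclass
class InferenceConfig:
    name: str
    latency_ms: float        # end-to-end inference latency per control cycle
    chip_tops: float         # compute provisioned on the chip
    watts_per_tops: float    # energy efficiency of the chip

def feasible(cfg: InferenceConfig) -> bool:
    power_w = cfg.chip_tops * cfg.watts_per_tops
    ok = cfg.latency_ms <= LATENCY_BUDGET_MS and power_w <= POWER_ENVELOPE_W
    print(f"{cfg.name}: {cfg.latency_ms:.0f} ms, ~{power_w:.0f} W -> {'OK' if ok else 'NOT feasible'}")
    return ok

feasible(InferenceConfig("conventional E2E, 30 TOPS @ 1 W/TOPS", 20.0, 30.0, 1.0))
feasible(InferenceConfig("VLM-centric, 2000 TOPS-class chip", 150.0, 2000.0, 0.1))
```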
(2) Problems exposed by test scenarios
Open-loop tests: the hybrid paradigm performs best, with a task completion rate 25% higher than conventional E2E and 30% higher than VLM-centric E2E on complex urban roads; its advantage is most pronounced in scenarios that need common-sense reasoning, such as construction detours and temporary traffic control.
Closed-loop tests: conventional E2E still dominates, with stability 18% higher than the hybrid paradigm, mainly because the hybrid architecture's dynamic collaboration mechanism tends to produce logical conflicts during long drives and its interaction strategy still needs tuning.
Extreme-scenario tests: all three paradigms show clear weaknesses: conventional E2E misjudges up to 40% of cases, VLM-centric E2E misses real-time requirements, and the hybrid paradigm's compute consumption easily exceeds on-board hardware limits in low-temperature environments.
Cost-experience tests: the hardware cost of VLM-centric schemes is 3-5 times that of conventional E2E, making them hard to fit into ~100,000-RMB economy cars; through model distillation and software-hardware co-optimization, the hybrid paradigm can keep cost within 1.5 times the conventional scheme and is therefore suited to mass-market adoption.

V. Six Breakthrough Directions and Implementation Paths

(1) Advanced reinforcement learning (from imitation to transcendence)
Core idea: a two-stage "imitation learning (IL) + reinforcement learning (RL)" training recipe: IL first replicates human driving experience to initialize the model quickly, then RL actively explores long-tail scenarios in high-fidelity simulators (e.g., CARLA, Meta Drive) to optimize the decision policy autonomously, reducing dependence on real long-tail data.
Key technologies: inverse reinforcement learning (IRL) to extract the implicit reward function of human driving, multi-agent reinforcement learning (MARL) to simulate traffic-flow interaction, and dynamic voltage and frequency scaling (DVFS) to optimize compute allocation and training efficiency.
Value: cuts the misjudgment rate in extreme scenarios by more than 30%; Horizon's HSD system has shown emergent abilities such as autonomous pull-over through this technique even though they were never explicitly developed.
Industrial progress: Baidu Apollo has completed 1 billion kilometers of extreme-scenario training in simulation, reducing the need for real road-test data by 60% and accelerating iteration.

(2) Foundation model application (common sense is power)
Core idea: pre-train a general VLM foundation model on massive general data (images, text, video) to give the vehicle world knowledge (e.g., "a red light means stop," "waterlogged roads are slippery"), then adapt it to driving tasks with few-shot fine-tuning, breaking ODD limits.
Key technologies: model distillation to compress the VLM from hundred-billion-parameter to ten-billion-parameter scale and cut compute; prompt tuning to align general knowledge precisely with driving tasks; chiplet packaging to raise chip compute density.
Value: gives the model cross-scenario transferability; a scheme built for one city can be moved quickly across the country without region-specific training, shortening the adaptation cycle from months to weeks.
Typical case: Tesla's FSD V12, pre-trained on a general VLM, drives autonomously on most urban roads worldwide without high-definition maps, validating this path.

(3) Hierarchical Agent architecture (human-like dual system)
Core idea: a layered "high-level reasoning Agent + low-level execution Agent" architecture that mimics the human "slow thinking + fast thinking" decision mode, balancing explainability and real-time performance.
Key technologies: the high-level Agent, built on an LLM/VLM, performs task decomposition, common-sense reasoning, and risk prediction (e.g., "construction ahead → plan a detour") and outputs a human-readable reasoning trace; the low-level Agent, based on a conventional E2E model, handles millisecond-level perception and control; the two collaborate through standardized interfaces conforming to ISO/SAE 21434 to meet functional-safety requirements.
Value: remedies the weaknesses of any single paradigm; Horizon's HSD system adopts this architecture, targeting a "zero-intervention, full-scenario" L4-level experience within 2-3 years at costs reaching down to ~100,000-RMB vehicles.
Compliance advantage: the high-level Agent's reasoning trace satisfies the transparency requirements of SAE J3016, supporting functional-safety certification.

(4) World models (foreseeing the future)
Core idea: train the model to simulate how the scene will evolve over the next 1-5 seconds (vehicle trajectories, pedestrian movement, traffic-light changes) from the current state, enabling "virtual trial-and-error" and self-supervised learning and reducing dependence on manual annotation.
Key technologies: Transformer-based temporal prediction combined with diffusion models for realistic scene generation; digital-twin technology to generate high-fidelity simulation scenarios that fill real-data gaps; federated learning for de-identified training that complies with privacy regulations.
Value: improves training-data utilization by 50% and sharply reduces collection and labeling costs, especially for scenes such as extreme weather that are hard to capture in reality.
Industry practice: DiDi's DriveVLM already integrates a world-model module, raising target-recognition accuracy in rainstorms by 40% and extending the decision lead time from 100 ms to 300 ms.

(5) Deep cross-modal fusion (precise understanding)
Core idea: go beyond geometric fusion to deeply fuse LiDAR/depth (3D geometric perception) with RGB/VLM (semantic understanding), so the model knows both "what it is" and "where it is," improving robustness in complex environments.
Key technologies: attention mechanisms for dynamic cross-modal feature alignment; contrastive learning to make fusion robust to sensor noise; in-memory computing to cut data-transfer latency.
Value: raises target-recognition accuracy by 20% in low light, occlusion, and other difficult perception conditions while preserving semantic understanding, meeting ISO 21448 expected functional safety requirements.
Hardware support: chips such as the Horizon Journey 6 series and Black Sesame A2000 already integrate dedicated cross-modal fusion accelerators, with 3x higher compute density and 50% lower power consumption.

(6) Data-engine optimization (quality over quantity)
Core idea: build a problem-driven, automated data closed loop, shifting from "piling up massive data" to "precise data mining," and automate the full "collection → cleaning → annotation → training → testing" pipeline to speed up iteration.
Key technologies: automatic corner-case mining from model failure analysis; blockchain to fix full-lifecycle data records, enabling two-way traceability between training data and decision outcomes; an event data recording system conforming to GB 39732 for compliance.
Value: shortens the iteration cycle from months to weeks and raises data utilization by 40%, easing the "bottomless pit" of long-tail data; leading domestic automakers have used it to raise intelligent-driving OTA update frequency from quarterly to monthly.
Policy fit: the data closed loop complies with the Guidelines for Data Sharing of Intelligent Connected Vehicles and can submit standardized data to the national accident database, shortening liability determination time by 60%.

VI. Industrial Ecosystem and Policy/Compliance Support

(1) Chip and computing-power ecosystem: the autonomous-driving chip market is expanding rapidly, reaching 18.6 billion yuan in 2024, expected to exceed 25 billion yuan in 2025 and climb to 87 billion yuan by 2030; compute demand jumps from 30-60 TOPS at L2 to more than 500 TOPS at L4. Domestic chips are rising fast: Horizon Robotics, Black Sesame Intelligence, and Huawei Ascend have launched 10-500 TOPS product lines that are shipping at scale in NIO, Xpeng, and Li Auto vehicles; domestic chips accounted for less than 15% of installations in 2024 and are expected to exceed 45% by 2030. The technical trend centers on energy efficiency, aiming to bring power consumption below 1 W per TOPS before 2030 through 5 nm-and-below processes, heterogeneous computing, and dynamic voltage and frequency scaling.

(2) Policy and compliance system:
Safety standards: ISO 26262 ASIL-D functional-safety certification and the ISO 21448 expected functional safety framework are becoming mandatory for L3 vehicles; on system failure, emergency measures must start within 300 ms.
Liability: a three-tier liability system is taking shape (driver-primary at L3, shared human-machine at L4, manufacturer liability at L5), with a reversed burden of proof requiring manufacturers to provide complete data records and system verification reports.
Ethics: algorithmic decisions must follow the principles of "priority of life, minimal harm, non-discrimination," with a pedestrian-protection weight no lower than 0.7 and no differentiated treatment based on age, gender, or similar attributes.
Data compliance: both the GDPR and China's Personal Information Protection Law apply; autonomous-driving data must be stored locally, cross-border transfers must pass security assessment, and the 30 seconds of data preceding an accident must be retained intact for 6 years.

VII. Summary and Outlook

The GE2E framework marks a key shift of autonomous driving from "modular splitting" to "integrated fusion"; with FSD V12 having validated the full-pipeline data-driven paradigm, the industry has moved from paradigm exploration into the deep water of extreme optimization. The three paradigms each have strengths and weaknesses and will coexist and complement each other in the short term; the hybrid paradigm, balancing full-scenario adaptability, engineering feasibility, and cost control, is expected to become the mainstream route over the next 1-3 years. The central task of the next three years is to push existing technology to its limits: at the product level, urban L2 systems will make a "human-like" leap and quasi-L4 systems will enter the ~100,000-RMB market at mass-market prices; at the technical level, the balance between compute and power consumption becomes the core of competition, with effective compute and scenario energy efficiency replacing peak compute as the key metrics; at the ecosystem level, integrated "chip-algorithm-vehicle" development becomes mainstream and the domestic supply chain grows more self-reliant. In the long run, with breakthroughs in reinforcement learning, foundation models, and world models, autonomous driving will move from an end-to-end perception-decision-control loop toward a general intelligent driving Agent. Core directions include more efficient compute optimization, a more complete safety and compliance system, interaction closer to human driving habits, and fairer social adoption. As GE2E technology matures, autonomous driving will progress from deployment in specific scenarios to large-scale application in all scenarios, ultimately delivering safer, more efficient, more comfortable, and more inclusive mobility and fulfilling the industry's twenty-year ambition of building a machine that can truly replace the human driver.

The survey content is recast below as an academic paper, with abstract, keywords, and reference annotations.

Survey of General End-to-End Autonomous Driving: A Unified Perspective and Future Directions
Abstract

With the rapid development of deep learning and computing power, end-to-end (E2E) autonomous driving has become a mainstream technical route replacing the traditional modular pipeline. However, existing research lacks a unified framework to integrate diverse technical paradigms such as conventional E2E, Vision-Language Model (VLM)-centric E2E, and hybrid E2E, leading to a fragmented understanding of the field. To address this gap, this paper systematically reviews more than 200 recent studies and industrial practices, and proposes the concept of General End-to-End (GE2E) autonomous driving for the first time. This framework unifies the three major E2E paradigms into a consistent technical coordinate system, and comprehensively analyzes their architectural characteristics, performance differences, and application scenarios. Subsequently, the evolution trends of datasets from geometric annotation to semanticization and chain-of-thought (CoT) annotation are elaborated. Furthermore, the core challenges faced by current GE2E technology are identified, including long-tailed data distribution, lack of explainability, balance between safety and efficiency, and real-time computing constraints. Finally, six promising breakthrough directions are proposed: advanced reinforcement learning, foundation model application, Agent hierarchical architecture, world model, cross-modal deep fusion, and data engine optimization. This review clarifies the technical evolution path of autonomous driving from modular splitting to integrated fusion, and provides a reference for academic research and the industrial deployment of L4-level full-scene autonomous driving.

Keywords: Autonomous Driving; General End-to-End (GE2E); Vision-Language Model (VLM); Technical Paradigm; Dataset Evolution; Technical Challenge; Breakthrough Direction

1 Introduction
1.1 Research Background
Autonomous driving technology has experienced decades of development, evolving from the traditional modular architecture (perception-prediction-planning-control) to the end-to-end paradigm that learns a direct mapping from raw sensor inputs to planning and control outputs.
1.2 Research Objectives and Contributions
This paper aims to propose a unified GE2E framework, systematically sort out the technical evolution of end-to-end autonomous driving, and clarify the core challenges and breakthrough paths.
The main contributions are as follows:
(1) Propose the GE2E concept for the first time, unifying conventional E2E, VLM-centric E2E, and hybrid E2E into a consistent technical system, and revealing their common goal and differential characteristics.
(2) Conduct a multi-dimensional comparative analysis of the three major paradigms from the perspectives of technical architecture, performance indicators, computing power requirements, and industrial application, providing a basis for technical route selection.
(3) Summarize the evolution law of autonomous driving datasets from geometric annotation to semanticization and CoT annotation, and emphasize the leading role of the nuScenes ecosystem.
(4) Identify four core technical challenges and analyze the performance bottlenecks exposed in open-loop, closed-loop, and extreme scenario tests.
(5) Propose six breakthrough directions with detailed implementation paths, which are expected to promote the leap from L2 assisted driving to L4-level full-scene autonomous driving.
1.3 Paper Structure
The rest of the paper is organized as follows: Section 2 elaborates on the technical characteristics and typical representatives of the three major GE2E paradigms; Section 3 compares the performance of each paradigm from multiple dimensions and analyzes the evolution trends of datasets; Section 4 discusses the core technical challenges and test bottlenecks; Section 5 proposes six breakthrough directions and implementation paths; Section 6 introduces the industrial ecology and policy compliance support; Section 7 summarizes the full text and looks forward to the future development trend.

2 Technical Analysis of Three Major GE2E Paradigms
2.1 Conventional End-to-End (Conventional E2E)
2.1.1 Core Logic
The conventional E2E paradigm directly maps raw sensor data to driving control signals or planned trajectories through an integrated model, without manually designing independent intermediate modules for perception and prediction[8]. Its core focus is on "how to drive", emphasizing precise control and efficient execution in structured scenarios.
2.1.2 Technical Architecture
Input Layer: mainly includes camera images, LiDAR point clouds, millimeter-wave radar data, and vehicle state information (speed, acceleration, steering angle, etc.)[9].
Backbone Network: adopts visual encoders such as ResNet, EfficientNet, and PointPillars, combined with 3D scene representation technologies such as BEV (Bird's-Eye View) and Occupancy to realize structured modeling of the driving environment[10].
Output Layer: directly generates executable control commands (steering, acceleration, braking) or smooth driving trajectories, with a response delay of ≤50 ms[11].
2.1.3 Typical Representatives and Industrial Value
Typical works include UniAD (Huawei), TransFuser (Technical University of Munich), and VADv2 (Xpeng Motors)[12]. Among them, VADv2 has been mass-produced and installed on the Xpeng G9 model, achieving stable operation in high-speed NOA scenarios[13]. As the mainstream technical route for current L2 assisted driving, conventional E2E accounts for approximately 70% of the installed capacity in 2024, supporting a market penetration rate of 42%[7]. Its advantages lie in low computing power requirements (10-30 TOPS) and cost control, which can be adapted to models above 100,000 RMB[14].
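A minimal sketch of the sensor-to-control mapping described in 2.1.2 is given below, assuming a single front camera and a flattened visual feature for brevity; the module choices and dimensions are illustrative placeholders and do not reproduce the architecture of UniAD, TransFuser, or VADv2.

```python
# Illustrative conventional-E2E skeleton: image + ego state -> control command.
# Shapes and modules are placeholders, not a reproduction of any cited model.
import torch
import torch.nn as nn

class ConvE2EPolicy(nn.Module):
    def __init__(self, state_dim: int = 4, feat_dim: int = 256):
        super().__init__()
        # Visual encoder standing in for a ResNet/EfficientNet + BEV projection.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Ego state: e.g. speed, acceleration, steering angle, yaw rate.
        self.state_mlp = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
        # Head emits steering [-1, 1], throttle [0, 1], brake [0, 1].
        self.head = nn.Sequential(
            nn.Linear(feat_dim + 64, 128), nn.ReLU(), nn.Linear(128, 3),
        )

    def forward(self, image: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        z = torch.cat([self.encoder(image), self.state_mlp(state)], dim=-1)
        steer, throttle, brake = self.head(z).unbind(-1)
        return torch.stack([torch.tanh(steer),
                            torch.sigmoid(throttle),
                            torch.sigmoid(brake)], dim=-1)

if __name__ == "__main__":
    policy = ConvE2EPolicy()
    cmd = policy(torch.randn(1, 3, 224, 224), torch.randn(1, 4))
    print(cmd.shape)  # (1, 3): steering, throttle, brake
```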
2.2 VLM-centric End-to-End
2.2.1 Core Logic
The VLM-centric E2E paradigm introduces pre-trained Vision-Language Models (VLM) as the core reasoning engine, redefining autonomous driving tasks as multi-modal understanding and reasoning problems[15]. Its core focus is on "why to drive", relying on general world knowledge to improve the generalization ability in complex and open scenarios.
2.2.2 Technical Architecture
Input Layer: combines sensor data with natural language instructions (e.g., "avoid construction areas", "park at the nearest charging pile")[16].
Backbone Network: based on large language models such as LLaMA and Vicuna, equipped with modal alignment modules (Q-Former, MLP projection layer) to realize semantic fusion between visual features and language tokens[17].
Output Layer: simultaneously generates driving control signals and interpretable text explanations, realizing the consistency between decision logic and action execution[18].
2.2.3 Typical Representatives and Technical Breakthroughs
Typical works include DriveLM (Stanford University), LMDrive (Shanghai Jiao Tong University), and AutoVLA (Tesla AI Team)[19]. AutoVLA, as the core technology of FSD V12, has achieved full-link neural network decision-making without relying on high-precision maps[6]. The main technical breakthrough of this paradigm is to realize the integration of general AI and autonomous driving, enabling the model to have common sense reasoning capabilities (e.g., inferring that a child may rush out behind the ball)[20]. However, its high computing power requirement (200-1000 TOPS) leads to high costs, and it is currently only applied to high-end models and Robotaxis, with an installed capacity of less than 5%[7].
2.3 Hybrid End-to-End (Hybrid E2E)
2.3.1 Core Logic
The hybrid E2E paradigm integrates the precise control advantages of conventional E2E and the semantic reasoning capabilities of VLM-centric E2E, constructing a dual-system architecture of "fast thinking + slow thinking"[21]. It aims to balance real-time performance and complex scene processing capabilities, and is regarded as the optimal solution for industrial popularization.
2.3.2 Technical Architecture
Bottom Layer (Fast Thinking): adopts the conventional E2E backbone network to be responsible for millisecond-level perception and control execution, ensuring the real-time performance of the system[22].
Upper Layer (Slow Thinking): introduces the VLM reasoning engine to handle complex scenarios requiring common sense judgment and task decomposition, such as traffic control and sudden obstacles[23].
Interaction Mechanism: realizes dynamic collaboration between the two layers through the "instruction issuance → state feedback" path, with a computing power requirement of 60-200 TOPS[24].
2.3.3 Typical Representatives and Industrial Prospects
Typical works include DriveVLM (DiDi), SOLVE (University of California, Berkeley), and DistillDrive (Baidu Apollo)[25]. Horizon's HSD system adopts a similar architecture, targeting to provide L4-level experience at the price of passenger cars within 2-3 years[26]. With an installed capacity of approximately 25% in 2024, this paradigm can be adapted to models above 150,000 RMB, and is expected to become the mainstream technical route in the next 1-3 years[7].
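As a sketch of the two-layer interaction described in 2.3.2, the loop below assumes that the slow VLM layer only refreshes a high-level guidance signal every few cycles while the fast layer emits a command every cycle; the class names and the fixed refresh period are assumptions made for brevity, not part of any cited system.

```python
# Illustrative fast/slow dual-system loop: the fast layer emits a command every
# control cycle, while the slow VLM layer refreshes a guidance string every N
# cycles. Class names and the refresh period are assumptions for brevity.
class FastPlanner:
    def step(self, sensors: dict, guidance: str) -> dict:
        # Stand-in for the conventional E2E backbone (millisecond-level path).
        return {"steer": 0.0, "throttle": 0.2, "brake": 0.0, "guidance": guidance}

class SlowReasoner:
    def advise(self, sensors: dict) -> str:
        # Stand-in for the VLM engine: common-sense guidance expressed as text.
        return "construction ahead -> keep right and reduce speed"

def drive_loop(cycles: int = 10, slow_every: int = 5) -> None:
    fast, slow = FastPlanner(), SlowReasoner()
    guidance = "follow lane"
    for t in range(cycles):
        sensors = {"t": t}                   # placeholder sensor frame
        if t % slow_every == 0:              # "instruction issuance" from the slow layer
            guidance = slow.advise(sensors)
        cmd = fast.step(sensors, guidance)   # fast layer must fit the <=50 ms cycle
        print(f"cycle {t}: {cmd}")

drive_loop()
```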
3 Performance Comparison and Dataset Evolution
3.1 Multi-dimensional Performance Comparison
To clarify the advantages and disadvantages of each paradigm, this paper conducts a comprehensive comparison from 10 dimensions including core input, technical characteristics, performance indicators, and industrial application, as shown in Table 1.

Table 1 Multi-dimensional Performance Comparison of Three Major GE2E Paradigms

| Comparison Dimension | Conventional E2E | VLM-centric E2E | Hybrid E2E |
| Core Input | Sensor data + vehicle state | Sensor data + text instructions | Sensor data + VLM knowledge + vehicle state |
| Backbone Network Characteristics | Visual encoder + 3D scene representation | VLM foundation model + modal alignment module | Conventional E2E backbone + VLM reasoning engine |
| Core Advantages | High execution efficiency (millisecond-level response), precise trajectory, stable in structured scenarios, low computing power requirement (10-30 TOPS) | Strong generalization ability, good explainability, excellent in complex reasoning, common sense capability | Balances semantic understanding and physical precision, excellent full-scene adaptability, strong engineering feasibility |
| Main Shortcomings | Lack of common sense reasoning, insufficient robustness in long-tailed scenarios, black-box decision-making difficult for compliance | Poor real-time performance (100-200 ms response), high computing power requirement, high cost | Complex architecture, high training cost, need for continuous optimization of dynamic collaboration |
| Typical Application Scenarios | Highways, closed parks, urban expressways | Complex urban roads, custom task scenarios, Robotaxi | Full-scene coverage (highway + urban + special scenarios), passenger car mass production |
| Open-loop Test Accuracy | ★★★★☆ | ★★★☆☆ | ★★★★★ |
| Closed-loop Test Stability | ★★★★★ | ★★★☆☆ | ★★★★☆ |
| Computing Power Requirement | Low-medium (10-30 TOPS) | High (200-1000 TOPS) | Medium-high (60-200 TOPS) |
| 2024 Installation Ratio | ★★★★★ (≈70%) | ★☆☆☆☆ (5%) | ★★★☆☆ (≈25%) |
| Cost-adapted Models | Models above 100,000 RMB | Luxury models above 300,000 RMB / Robotaxi | Models above 150,000 RMB |

3.2 Paradigm Evolution Logic
The evolution of GE2E technology follows four core logics:
From Pure Control Mapping to Semantic Understanding: conventional E2E focuses on "how to drive", while VLM-centric E2E and hybrid E2E expand to "why to drive", realizing the leap from function stacking to behavior emergence through semantic understanding[27].
From Single Modality to Multi-modal Fusion: sensor fusion has evolved from geometric fusion of "vision + LiDAR" to semantic fusion of "vision + language + common sense", endowing vehicles with the ability to understand the world[28].
From Data-driven to Data + Knowledge Dual-driven: on the basis of massive driving data training, general knowledge of VLM is introduced to solve the problem of scarce long-tailed scenario data and reduce reliance on high-precision maps[29].
From Hierarchical Development to Unified Paradigm: the success of FSD V12 has verified the feasibility of supporting L2-L4 levels with a unified architecture, breaking the technical barriers between different levels of autonomous driving[6].
3.3 Dataset Evolution Characteristics
Datasets are the core driving force for the development of GE2E technology, and their evolution shows four obvious trends:
Mainstream Semantic Annotation: traditional datasets (e.g., KITTI) focus on geometric annotations such as target detection and segmentation[30]. New-generation datasets (e.g., nuScenes 2.0, Waymo Open Dataset v2) add semantic descriptions and task instructions to adapt to VLM training needs[31]. The nuScenes ecosystem accounts for more than 60% of related research due to its diverse scenarios and improved toolchain[32].
Popularization of Chain-of-Thought Annotation: datasets such as the DriveLM Dataset introduce complete CoT annotations of "scene analysis → decision reasoning → control execution" to help models learn human-like driving logic and solve the problem of decision explainability[33].
Upgrade of Closed-loop Test Data: datasets have shifted from static scene sampling to dynamic closed-loop scene collection, including the complete interaction process of "perception misjudgment → decision adjustment → control correction", which is closer to real driving conditions and supports the training of hybrid paradigms[34].
Strengthening of Compliant Annotation: new annotation specifications related to data privacy protection have been added to comply with GDPR and China's Personal Information Protection Law, realizing data anonymization (k-anonymity, k≥50) and balancing data availability and privacy security[35].
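To make the annotation trend in 3.3 concrete, the record layout below is a hypothetical example of a chain-of-thought-annotated frame; the field names are invented for illustration and do not reproduce the actual DriveLM Dataset schema.

```python
# Hypothetical layout of a CoT-annotated driving sample (scene analysis ->
# decision reasoning -> control execution). Field names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class CoTDrivingSample:
    frame_id: str
    camera_paths: list[str]                  # references to raw sensor frames
    lidar_path: str | None = None
    scene_analysis: str = ""                 # e.g. "two-lane road, cones on the right"
    decision_reasoning: str = ""             # e.g. "cones imply construction -> merge left"
    control_execution: dict = field(default_factory=dict)  # e.g. {"steer": -0.1, "speed": 8.0}
    task_instruction: str = ""               # e.g. "park at the nearest charging pile"
    anonymized: bool = True                  # privacy flag in the spirit of k-anonymity

sample = CoTDrivingSample(
    frame_id="0001",
    camera_paths=["cam_front/0001.jpg"],
    scene_analysis="ball rolls onto the road from behind a parked van",
    decision_reasoning="a child may follow the ball -> brake early",
    control_execution={"steer": 0.0, "speed": 2.0},
)
print(sample.decision_reasoning)
```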
4 Core Challenges and Technical Bottlenecks
4.1 Four Key Challenges
4.1.1 Long-tailed Data Dilemma
The driving scenarios in the real world present an extreme long-tailed distribution: 99% of the data is ordinary daily driving, while the 1% scarce corner cases (extreme weather, special-shaped vehicles, sudden scenes) are the key to determining safety[36]. The current problems are: ① a virtual-real gap exists in generative AI simulation, and the quality of generated data needs to be improved[37]; ② VLMs are prone to catastrophic forgetting when fine-tuned on driving tasks, leading to a decline in general cognitive capabilities[38]. The conventional E2E model has a misjudgment rate of up to 40% in extreme scenarios[39].
4.1.2 Lack of Explainability and Compliance Contradictions
The conventional E2E model is a typical black box, and its decision logic cannot be quantitatively explained, making it difficult to meet the ISO 26262 ASIL-D functional safety certification requirements[40]. Although the VLM-centric model has text explanations, the traceability of the reasoning process still needs to be improved[41]. In addition, algorithmic ethical decisions need to comply with the transparency requirements in the SAE J3016 standard, which poses higher requirements for the explainability of the model[42].
4.1.3 Balance Between Safety and Efficiency
Pursuing driving efficiency may sacrifice safety (e.g., rapid acceleration, continuous lane changing), while overemphasizing safety will lead to a decline in driving experience (e.g., frequent deceleration, avoiding non-risk targets)[43]. There is no unified standard for the balance between the two. The industry requires that the accident rate of autonomous driving systems should be lower than 10% of that of human driving, and the failure rate should be lower than 10^-8 per hour[44].
4.1.4 Contradiction Between Real-time Performance and Computing Power
The large parameter scale and autoregressive generation mechanism of VLM lead to significant inference delay (100-200 ms), making it difficult to meet the real-time requirements of high-speed driving and emergency braking (≤50 ms)[45]. High-compute chips (e.g., NVIDIA Thor with 2000 TOPS) have a thermal design power (TDP) that easily exceeds the 30-60 W limit of the on-board power system, causing heat dissipation bottlenecks[46].
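As a toy illustration of the balance problem in 4.1.3 (not a method from the survey), a planner can score candidate trajectories with a weighted cost; the weights below are exactly the quantities for which the text notes no unified standard exists, and all numbers are arbitrary.

```python
# Toy trajectory scoring: the safety/efficiency/comfort weights are the free
# parameters lacking a unified standard; all values below are arbitrary.
def trajectory_cost(risk: float, travel_time: float, discomfort: float,
                    w_safety: float = 0.7, w_time: float = 0.2, w_comfort: float = 0.1) -> float:
    """Lower is better; all inputs are assumed pre-normalized to [0, 1]."""
    return w_safety * risk + w_time * travel_time + w_comfort * discomfort

candidates = {
    "assertive lane change": trajectory_cost(risk=0.30, travel_time=0.40, discomfort=0.60),
    "stay in lane":          trajectory_cost(risk=0.05, travel_time=0.70, discomfort=0.10),
}
print(min(candidates, key=candidates.get))  # which behavior wins depends entirely on the weights
```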
4.2 Bottlenecks Exposed in Test Scenarios
4.2.1 Open-loop Test
The hybrid paradigm performs the best, with a task completion rate 25% higher than that of conventional E2E and 30% higher than that of VLM-centric E2E in complex urban road scenarios, especially showing significant advantages in scenarios requiring common sense reasoning such as construction detours and temporary traffic control[47].
4.2.2 Closed-loop Test
Conventional E2E still dominates, with stability 18% higher than that of the hybrid paradigm. The main reason is that the dynamic collaboration mechanism of the hybrid architecture is prone to logical conflicts in long-term driving, and the interaction strategy needs continuous optimization[48].
4.2.3 Extreme Scenario Test
All three paradigms have obvious shortcomings: the conventional E2E has a misjudgment rate of 40%, the VLM-centric E2E fails to meet real-time requirements, and the computing power consumption of the hybrid paradigm easily exceeds the carrying capacity of on-board hardware in low-temperature environments[49].
4.2.4 Cost-experience Balance Test
The hardware cost of the VLM-centric scheme is 3-5 times that of conventional E2E, making it difficult to adapt to economical models around 100,000 RMB[50]. Through model distillation and software-hardware co-optimization, the cost of the hybrid paradigm can be controlled within 1.5 times that of the conventional scheme, giving it the potential for mass production and popularization[51].

5 Six Breakthrough Directions and Implementation Paths
5.1 Advanced Reinforcement Learning: From Imitation to Transcendence
5.1.1 Core Idea
Adopt a two-stage training recipe of Imitation Learning (IL) + Reinforcement Learning (RL): first, quickly initialize the model by replicating human driving experience through IL, then actively explore long-tailed scenarios in high-fidelity simulation environments (e.g., CARLA, Meta Drive) using RL to independently optimize decision strategies, reducing reliance on real long-tailed data[52].
5.1.2 Key Technologies
Introduce Inverse Reinforcement Learning (IRL) to extract implicit reward functions for human driving, combined with Multi-Agent Reinforcement Learning (MARL) to simulate traffic flow interaction[53]; adopt Dynamic Voltage and Frequency Scaling (DVFS) technology to optimize computing power allocation and improve training efficiency[54].
5.1.3 Landing Value and Industrial Progress
Reduce the misjudgment rate of the model in extreme scenarios by more than 30%[55]. Horizon's HSD system has exhibited emergent capabilities such as autonomous pull-over through this technology, without those behaviors being specifically developed[26]. Baidu Apollo has completed 1 billion kilometers of extreme scenario training in the simulation environment, reducing the demand for real road test data by 60%[56].
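A compressed sketch of the two-stage IL + RL recipe in 5.1.1 follows; the policy network, reward, and ToySim environment are stand-ins chosen for brevity rather than the interfaces of CARLA or Meta Drive.

```python
# Two-stage sketch: (1) behavior cloning on logged human driving, then
# (2) REINFORCE-style fine-tuning against a toy reward. ToySim is a stand-in
# for a simulator such as CARLA or Meta Drive, not their real interfaces.
import torch
import torch.nn as nn
import torch.nn.functional as F

policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 2))  # obs -> (steer, accel)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Stage 1: imitation learning on (observation, expert action) pairs.
obs, expert_actions = torch.randn(256, 8), torch.randn(256, 2)
for _ in range(50):
    loss = F.mse_loss(policy(obs), expert_actions)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: reinforcement learning with a policy-gradient update.
class ToySim:
    def rollout_reward(self, action: torch.Tensor) -> torch.Tensor:
        # Placeholder reward: stay close to a "safe" reference action.
        return -((action - torch.tensor([0.0, 0.3])) ** 2).sum(dim=-1)

sim = ToySim()
for _ in range(50):
    o = torch.randn(64, 8)
    dist = torch.distributions.Normal(policy(o), 0.1)
    a = dist.sample()
    advantage = sim.rollout_reward(a)
    advantage = advantage - advantage.mean()
    loss = -(dist.log_prob(a).sum(dim=-1) * advantage).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```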
5.2 Foundation Model Application: Common Sense is Power
5.2.1 Core Idea
Pre-train a general VLM foundation model based on massive general data (images, text, videos) to endow vehicles with world common sense (e.g., "red light means stop", "waterlogged roads are prone to skidding"), then adapt to driving tasks through few-shot fine-tuning, breaking the ODD (Operational Design Domain) limit[57].
5.2.2 Key Technologies
Adopt Model Distillation to compress the VLM volume, reducing the number of model parameters from 100 billion-level to 10 billion-level to reduce computing power consumption[58]; realize precise alignment between general knowledge and driving tasks through Prompt Tuning[59]; improve chip computing power density through Chiplet packaging technology[60].
5.2.3 Landing Value and Typical Cases
Enable the model to have cross-scenario migration capabilities. A single urban scenario scheme can be quickly migrated to the whole country without separate training for specific regions, shortening the model adaptation cycle from month-level to week-level[61]. Tesla's FSD V12, based on general VLM pre-training, realizes autonomous driving on most urban roads around the world without relying on high-precision maps, verifying the feasibility of this path[6].
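A minimal sketch of the distillation objective mentioned in 5.2.2 is shown below: a small student matches the softened output distribution of a large frozen teacher while also fitting the task labels. The linear layers, temperature, and mixing weight are placeholders, not values from any cited system.

```python
# Knowledge-distillation sketch: student matches softened teacher logits (KL term)
# plus the ordinary task loss. Sizes, temperature, and weights are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(512, 10)   # stand-in for a frozen, much larger VLM head
student = nn.Linear(512, 10)   # compressed model intended for the vehicle
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
T, alpha = 2.0, 0.5            # distillation temperature and mixing weight

features = torch.randn(32, 512)
labels = torch.randint(0, 10, (32,))
with torch.no_grad():
    teacher_logits = teacher(features)

student_logits = student(features)
kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
              F.softmax(teacher_logits / T, dim=-1),
              reduction="batchmean") * (T * T)
ce = F.cross_entropy(student_logits, labels)
loss = alpha * kd + (1 - alpha) * ce
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```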
5.3 Agent Hierarchical Architecture: Human-like Dual System
5.3.1 Core Idea
Construct a hierarchical architecture of "high-level reasoning Agent + low-level execution Agent", simulating the human decision-making mode of "slow thinking + fast thinking" to balance explainability and real-time performance[62].
5.3.2 Key Technologies
The high-level Agent realizes task decomposition, common sense reasoning, and risk prediction (e.g., "construction ahead → plan detour route") based on LLM/VLM, outputting human-readable reasoning paths[63]; the low-level Agent realizes millisecond-level perception and control based on the conventional E2E model[64]; collaboration is realized through standardized interfaces conforming to ISO/SAE 21434 to meet functional safety requirements[65].
5.3.3 Landing Value and Compliance Advantages
Solve the capability shortcomings of a single paradigm[66]. Horizon's HSD system adopts this architecture, targeting to achieve a "zero intervention, full-scene" L4-level experience within 2-3 years, with costs that can come down to the level of 100,000-RMB-class models[26]. The reasoning process of the high-level Agent can meet the transparency requirements of SAE J3016, providing technical support for functional safety certification[67].
5.4 World Model: Foreseeing the Future
5.4.1 Core Idea
Train the model to simulate the evolution of scenarios in the next 1-5 seconds (e.g., vehicle trajectory, pedestrian movement, traffic light changes) based on the current environmental state, realizing virtual trial and error and self-supervised learning, reducing reliance on manually annotated data[68].
5.4.2 Key Technologies
Adopt the Transformer architecture to build a temporal prediction model, combined with a Diffusion Model to improve the authenticity of scene generation[69]; generate high-fidelity simulation scenarios through digital twin technology to supplement the gap of real data[70]; realize desensitized data training using Federated Learning technology to comply with privacy protection regulations[71].
5.4.3 Landing Value and Industry Practice
Improve the utilization efficiency of model training data by 50%, significantly reducing data collection and annotation costs[72]. DiDi's DriveVLM has integrated a world model module, improving the target recognition accuracy in rainstorm scenarios by 40% and extending the decision lead time from 100 ms to 300 ms[25].
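A minimal sketch of the rollout idea in 5.4.1 follows: an autoregressive model predicts the next latent scene state, and chaining the predictions gives a short-horizon preview. The single GRU cell below is a stand-in for the Transformer/diffusion machinery named in 5.4.2, and all dimensions are illustrative.

```python
# World-model rollout sketch: predict the next latent scene state and chain the
# predictions to preview the next few seconds. A GRU cell stands in for the
# Transformer/diffusion machinery described in the text.
import torch
import torch.nn as nn

class TinyWorldModel(nn.Module):
    def __init__(self, state_dim: int = 32):
        super().__init__()
        self.cell = nn.GRUCell(state_dim, state_dim)
        self.head = nn.Linear(state_dim, state_dim)

    def forward(self, state: torch.Tensor, hidden: torch.Tensor):
        hidden = self.cell(state, hidden)
        return self.head(hidden), hidden      # predicted next state, new hidden

def rollout(model: TinyWorldModel, state: torch.Tensor, steps: int = 10):
    """Chain one-step predictions; with a 0.5 s step, 10 steps previews 5 s."""
    hidden = torch.zeros_like(state)
    trajectory = []
    for _ in range(steps):
        state, hidden = model(state, hidden)
        trajectory.append(state)
    return torch.stack(trajectory)

if __name__ == "__main__":
    model = TinyWorldModel()
    future = rollout(model, torch.randn(1, 32))
    print(future.shape)  # (10, 1, 32): ten predicted latent scene states
```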
5.5 Cross-modal Deep Fusion: Precise Understanding
5.5.1 Core Idea
Break through the limitation of geometric fusion and realize deep fusion of LiDAR/Depth (3D geometric perception) and RGB/VLM (semantic understanding), enabling the model to understand both "what it is" and "where it is", improving robustness in complex environments[73].
5.5.2 Key Technologies
Adopt the Attention Mechanism to realize dynamic alignment of cross-modal features[74]; improve the robustness of modal fusion through Contrastive Learning, reducing the impact of sensor noise[75]; reduce data transmission delay by combining in-memory computing technology[76].
5.5.3 Landing Value and Hardware Support
Improve the target recognition accuracy of the model in complex perception conditions such as low light and occlusion by 20%, while retaining semantic understanding capabilities to meet the requirements of ISO 21448 expected functional safety[77]. Chips such as Horizon Journey 6 and Black Sesame A2000 have built-in dedicated cross-modal fusion acceleration units, with computing power density increased by 3 times and power consumption reduced by 50%[78].
5.6 Data Engine Optimization: Quality Improvement Rather Than Quantity Stacking
5.6.1 Core Idea
Construct a problem-driven automated data closed loop, shifting from massive data stacking to precision data mining, realizing the full-process automation of "data collection → cleaning → annotation → training → testing" to accelerate model iteration[79].
5.6.2 Key Technologies
Automatically mine corner cases through Failure Analysis of model failure cases[80]; use blockchain technology to solidify the full-life-cycle records of data, realizing two-way traceability between training data and decision results[81]; build an event data recording system conforming to GB 39732 to ensure compliance[82].
5.6.3 Landing Value and Policy Adaptation
Shorten the model iteration cycle from month-level to week-level, improving data utilization efficiency by 40%[83]. Leading domestic automakers have increased the OTA update frequency of intelligent driving systems from quarterly to monthly through this technology[84]. The data closed-loop process complies with the requirements of the Guidelines for Data Sharing of Intelligent Connected Vehicles, and standardized data can be submitted to the national accident database, shortening the liability determination time by 60%[85].

6 Industrial Ecology and Policy Compliance Support
6.1 Chip Computing Power Ecology
The market size of autonomous driving chips is expanding rapidly, reaching 18.6 billion yuan in 2024, expected to exceed 25 billion yuan in 2025 and climb to 87 billion yuan by 2030[86]. The computing power requirement jumps from 30-60 TOPS at L2 level to more than 500 TOPS at L4 level[87]. Domestic chips are rising rapidly: Horizon, Black Sesame Intelligence, and Huawei Ascend have launched series of chips with 10-500 TOPS, which have been mass-produced and installed in vehicles from automakers such as NIO, Xpeng, and Li Auto[88]. The installation ratio of domestic chips was less than 15% in 2024, and is expected to increase to more than 45% by 2030[89]. The technical trend focuses on the energy efficiency ratio, aiming to control the power consumption per TOPS below 1 W by 2030 through advanced processes of 5 nm and below, heterogeneous computing architectures, and dynamic voltage and frequency scaling[90].
6.2 Policy and Compliance System
6.2.1 Safety Standards
ISO 26262 ASIL-D functional safety certification and the ISO 21448 expected functional safety framework have become necessary requirements for L3 models, and the system must activate emergency measures within 300 ms when failing[91].
6.2.2 Liability Division
A three-level liability system is gradually being established (human liability at L3, joint human-machine liability at L4, and manufacturer liability at L5), implementing the inversion of the burden of proof and requiring manufacturers to provide complete data records and system verification reports[92].
6.2.3 Ethical Norms
Algorithmic decisions must follow the three principles of "priority to life, minimal harm, and non-discrimination"; the weight coefficient of pedestrian protection is not less than 0.7, and differentiated treatment based on age, gender, and other characteristics is prohibited[93].
6.2.4 Data Compliance
Implement the dual standards of GDPR and China's Personal Information Protection Law. Autonomous driving data must be stored locally, cross-border transmission must pass a security assessment, and the data from 30 seconds before an accident must be completely preserved for 6 years[94].

7 Conclusion and Future Outlook
The proposal of the General End-to-End (GE2E) framework marks a key transformation of autonomous driving technology from modular splitting to integrated fusion. Especially after Tesla's FSD V12 verified the feasibility of the full-link data-driven paradigm, the industry has moved from paradigm exploration into the deep water of extreme optimization. The three paradigms of conventional E2E, VLM-centric E2E, and hybrid E2E each have their advantages and will coexist and complement each other in the short term; the hybrid paradigm, balancing full-scene adaptability, engineering feasibility, and cost control, is expected to become the mainstream technical route in the next 1-3 years. The core task of the next three years is to bring existing technology to its full potential: at the product level, urban L2 systems will make a human-like leap and quasi-L4 systems will enter the 100,000-RMB market at mass-market prices; at the technical level, the balance between computing power and power consumption will become the core of competition, with effective computing power and scenario energy efficiency replacing peak computing power as the key indicators; at the ecosystem level, integrated chip-algorithm-vehicle development will become mainstream and the self-reliance of the domestic supply chain will continue to improve. In the long run, with breakthroughs in reinforcement learning, foundation models, and world models, autonomous driving is expected to evolve from the end-to-end perception-decision-control closed loop toward a general intelligent driving Agent, moving from deployment in specific scenarios to large-scale application in all scenarios and delivering safer, more efficient, more comfortable, and more inclusive mobility.

References
[1] Shi S, Wang H, Chen X, et al. End-to-end autonomous driving: Challenges and directions[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(12): 21321-21335.[2] Chen Z, Yang Y, Han C, et al. Modular vs end-to-end autonomous driving: A comprehensive comparison[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 14567-14576.[3] LeCun Y, Bengio Y, Hinton G. Deep learning[J]. Nature, 2015, 521(7553): 436-444.[4] Yang Y, Han C, Mao R, et al. Survey of General End-to-End Autonomous Driving: A Unified Perspective[EB/OL]. https://doi.org/10.36227/techrxiv.176523315.56439138/v1, 2025.[5] Wang H, Shi S, Chen X, et al. A survey of deep learning for autonomous driving[J]. Pattern Recognition, 2021, 119: 107992.[6] Tesla.
Tesla FSD V12: End-to-end AI for autonomous driving[EB/OL]. https://www.tesla.com/en_us/autopilot, 2024.[7] China Association of Automobile Manufacturers. 2024 Annual Report on Intelligent Connected Vehicles[R]. Beijing: China Association of Automobile Manufacturers, 2025.[8] Bojarski M, Del Testa D, Dworakowski D, et al. End to end learning for self-driving cars[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016: 2174-2180.[9] Geiger A, Lenz P, Stiller C, et al. Vision meets robotics: The KITTI dataset[J]. The International Journal of Robotics Research, 2013, 32(11): 1231-1237.[10] Zhou X, Wang D, Krähenbühl P. Objects as points[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 10295-10304.[11] Chen X, Kundu K, Zhang Z, et al. Monocular 3d object detection for autonomous driving[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 2147-2156.[12] Xu D, Chen X, Lin S, et al. UniAD: Unified end-to-end autonomous driving[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 14587-14596.[13] Xpeng Motors. Xpeng G9: Equipped with VADv2 end-to-end autonomous driving system[EB/OL]. https://www.xpeng.com/model-g9, 2024.[14] Horizon Robotics. Journey series chips: Computing power and energy efficiency balance for autonomous driving[EB/OL]. https://www.horizon.ai/product/journey, 2024.[15] Radford A, Narasimhan K, Salimans T, et al. Improving language understanding by generative pre-training[EB/OL]. https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf, 2018.[16] Li J, Li D, Chen X, et al. DriveLM: Language models for autonomous driving[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 15627-15637.[17] OpenAI. Q-Former: Query-based transformer for vision-language pre-training[EB/OL]. https://openai.com/research/q-former, 2023.[18] Wang Y, Li J, Zhang B, et al. Explainable autonomous driving via natural language[J]. IEEE Transactions on Intelligent Transportation Systems, 2024, 25(3): 2890-2901.[19] Yang Y, Mao R, Han C, et al. LMDrive: Language-augmented end-to-end driving[C]//Proceedings of the 2024 IEEE International Conference on Robotics and Automation, 2024: 5678-5684.[20] Gao Y, Li Z, Shen Y, et al. Common sense reasoning for autonomous driving: A survey[J]. Artificial Intelligence Review, 2024, 57(2): 1234-1278.[21] Brooks R A. A robust layered control system for a mobile robot[J]. IEEE Journal of Robotics and Automation, 1986, 2(1): 14-23.[22] Pomerleau D A. ALVINN: An autonomous land vehicle in a neural network[J]. Neural Computation, 1989, 1(1): 38-53.[23] Chen Z, Yang T, Shi S, et al. VLM-AD: Vision-language model for autonomous driving decision[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024: 16789-16798.[24] NVIDIA. NVIDIA DRIVE Thor: 2000 TOPS computing power for autonomous driving[EB/OL]. https://www.nvidia.com/en-us/self-driving-cars/drive-thor/, 2023.[25] DiDi Autonomous Driving. DriveVLM: Hybrid end-to-end autonomous driving system[EB/OL]. https://www.didi.com/en/autonomous-driving, 2024.[26] Horizon Robotics. HSD system: Toward L4-level autonomous driving for mass production[EB/OL]. https://www.horizon.ai/product/hsd, 2024.[27] Schmidhuber J. Deep learning in neural networks: An overview[J]. Neural Networks, 2015, 61: 85-117.[28] Baltrusaitis T, Ahuja C, Morency L P. 
Multimodal machine learning: A survey and taxonomy[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(2): 423-443.[29] Zhang C, Li H, He X, et al. A survey on knowledge-enhanced neural networks for natural language processing[J]. IEEE Transactions on Knowledge and Data Engineering, 2020, 33(12): 4223-4242.[30] Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012: 3354-3361.[31] Caesar H, Bankiti V, Lang A H, et al. nuScenes: A multimodal dataset for autonomous driving[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 11621-11631.[32] Waymo. Waymo Open Dataset v2: Enhanced for end-to-end autonomous driving[EB/OL]. https://waymo.com/open/, 2024.[33] Li J, Chen X, Li D, et al. Chain-of-thought annotation for autonomous driving datasets[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 15638-15648.[34] CARLA Simulator. CARLA: Open-source simulator for autonomous driving research[EB/OL]. https://carla.org/, 2024.[35] European Commission. General Data Protection Regulation (GDPR)[EB/OL]. https://eur-lex.europa.eu/eli/reg/2016/679/oj, 2016.[36] Snoek J, Larochelle H, Adams R P. Practical Bayesian optimization of machine learning algorithms[C]//Advances in neural information processing systems, 2012, 25: 2951-2959.[37] Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets[C]//Advances in neural information processing systems, 2014, 27: 2672-2680.[38] McCloskey M, Cohen N J. Catastrophic interference in connectionist networks: The sequential learning problem[M]//Psychology of learning and motivation. Academic Press, 1989, 24: 109-165.[39] Kim J, Lee D H, Shin J, et al. Corner case detection for autonomous driving using generative adversarial networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020: 1134-1135.[40] ISO. ISO 26262: Road vehicles - Functional safety[EB/OL]. https://www.iso.org/standard/64083.html, 2018.[41] Gunning D. Explainable artificial intelligence (XAI)[R]. Arlington: Defense Advanced Research Projects Agency, 2017.[42] SAE International. SAE J3016: Taxonomy and definitions for terms related to driving automation systems for on-road motor vehicles[EB/OL]. https://www.sae.org/standards/content/j3016_202104/, 2021.[43] Amann J, Grote T, Winner H. Balancing safety and efficiency in automated driving[J]. Transportation Research Part C: Emerging Technologies, 2016, 71: 508-522.[44] IEEE. IEEE 2846: Standard for the definition of reliability metrics for automated driving systems[EB/OL]. https://standards.ieee.org/standard/2846-2022.html, 2022.[45] Chen Y, Wang Z, Li J, et al. Real-time end-to-end autonomous driving with low computational cost[C]//Proceedings of the IEEE International Conference on Robotics and Automation, 2023: 4567-4573.[46] Zhang H, Wang Y, Li D, et al. Thermal management for high-power autonomous driving chips[J]. IEEE Transactions on Components, Packaging and Manufacturing Technology, 2024, 14(2): 345-356.[47] Wang H, Shi S, Chen X, et al. Performance evaluation of end-to-end autonomous driving systems[J]. IEEE Transactions on Intelligent Transportation Systems, 2023, 24(8): 8234-8245.[48] Li Z, Chen X, Wang H, et al. 
Closed-loop evaluation of end-to-end autonomous driving[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024: 16799-16808.[49] Zhao J, Li H, Chen Z, et al. Extreme scenario testing for autonomous driving: A survey[J]. IEEE Transactions on Intelligent Vehicles, 2024, 9(3): 2890-2905.[50] McKinsey Company. The economic impact of autonomous driving[R]. New York: McKinsey Company, 2024.[51] Boston Consulting Group. Cost reduction paths for autonomous driving systems[R]. Boston: Boston Consulting Group, 2025.[52] Sutton R S, Barto A G. Reinforcement learning: An introduction[M]. MIT press, 2018.[53] Russell S J. Inverse reinforcement learning[C]//Proceedings of the 17th international conference on machine learning, 2000: 671-678.[54] Yao K, Wang Y, Li J, et al. DVFS-based energy optimization for reinforcement learning training on edge devices[C]//Proceedings of the IEEE International Conference on Edge Computing, 2023: 123-130.[55] Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.[56] Baidu Apollo. Apollo simulation platform: 10 billion kilometers of virtual testing[EB/OL]. https://apollo.auto/simulation, 2024.[57] Brown T, Mann B, Ryder N, et al. Language models are few-shot learners[J]. Advances in neural information processing systems, 2020, 33: 1877-1901.[58] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531, 2015.[59] Lester B, Al-Rfou R, Constant N. Parameter-efficient transfer learning for NLP[J]. arXiv preprint arXiv:1804.07612, 2018.[60] AMD. Chiplet technology: Enabling high-performance computing[EB/OL]. https://www.amd.com/en/technologies/chiplet, 2023.[61] Sun S, Liu Y, Gao J, et al. Few-shot transfer learning for autonomous driving[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 14597-14606.[62] Minsky M. The society of mind[M]. Simon and Schuster, 1988.[63] Brooks R A. Intelligence without representation[J]. Artificial intelligence, 1991, 47(1-3): 139-159.[64] Pomerleau D A. Rapidly adapting neural networks for autonomous navigation[J]. Neural Computation, 1993, 5(2): 181-194.[65] ISO/SAE. ISO/SAE 21434: Road vehicles - Cybersecurity engineering[EB/OL]. https://www.iso.org/standard/79843.html, 2021.[66] Zhao H, Shi J, Qi X, et al. Pyramid scene parsing network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 2881-2890.[67] European Union. Regulation (EU) 2019/2144 on type-approval requirements for motor vehicles with regard to their general safety and the protection of vehicle occupants and vulnerable road users[EB/OL]. https://eur-lex.europa.eu/eli/reg/2019/2144/oj, 2019.[68] Ha D, Schmidhuber J. World models[J]. arXiv preprint arXiv:1803.10122, 2018.[69] Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models[J]. Advances in neural information processing systems, 2020, 33: 6840-6851.[70] Gaidon A, Wang Q, Cabon Y, et al. Virtual worlds as proxy for multi-object tracking analysis[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 4340-4349.[71] McMahan B, Moore E, Ramage D, et al. Communication-efficient learning of deep networks from decentralized data[C]//Artificial intelligence and statistics, 2017: 1273-1282.[72] Jaderberg M, Simonyan K, Zisserman A, et al. Spatial transformer networks[J]. 
Advances in neural information processing systems, 2015, 28: 2017-2025.[73] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[J]. Advances in neural information processing systems, 2017, 30: 5998-6008.[74] Chen T, Kornblith S, Norouzi M, et al. A simple framework for contrastive learning of visual representations[J]. International conference on machine learning, 2020: 1597-1607.[75] Zhang Y, Li K, Li K, et al. ResNeSt: Split-attention networks[J]. Advances in neural information processing systems, 2020, 33: 20089-20099.[76] IEEE. IEEE 802.11p: Wireless access in vehicular environments[EB/OL]. https://standards.ieee.org/standard/802_11p-2010.html, 2010.[77] ISO. ISO 21448: Road vehicles - Safety of the intended functionality[EB/OL]. https://www.iso.org/standard/68350.html, 2019.[78] Black Sesame Intelligence. A2000 chip: For L4-level autonomous driving[EB/OL]. https://www.blacksesame.com/a2000, 2024.[79] Dean J, Corrado G, Monga R, et al. Large scale distributed deep networks[C]//Advances in neural information processing systems, 2012, 25: 1223-1231.[80] Kim J, Lee D H, Shin J, et al. Failure mode and effect analysis for autonomous driving systems[C]//Proceedings of the IEEE International Conference on Intelligent Transportation Systems, 2022: 3456-3461.[81] Nakamoto S. Bitcoin: A peer-to-peer electronic cash system[EB/OL]. https://bitcoin.org/bitcoin.pdf, 2008.[82] China Automotive Technology and Research Center. GB 39732: Requirements for data recording systems of intelligent connected vehicles[EB/OL]. https://www.sac.gov.cn/, 2021.[83] Jia Y, Shelhamer E, Donahue J, et al. Caffe: Convolutional architecture for fast feature embedding[C]//Proceedings of the 22nd ACM international conference on Multimedia, 2014: 675-678.[84] NIO. NIO AD OTA updates: Quarterly to monthly[EB/OL]. https://www.nio.com/en-us/software-updates, 2024.[85] Ministry of Industry and Information Technology of the Peoples Republic of China. Guidelines for Data Sharing of Intelligent Connected Vehicles[EB/OL]. https://www.miit.gov.cn/, 2024.[86] Gartner. Forecast: Autonomous Driving Chips, Worldwide, 2023-2030[EB/OL]. https://www.gartner.com/en/newsroom/press-releases/2023-11-08-gartner-forecasts-global-autonomous-driving-chip-market-to-reach-8-7-billion-by-2030, 2023.[87] Intel. Intel Mobileye EyeQ6: 500 TOPS for L4 autonomous driving[EB/OL]. https://www.mobileye.com/product/eyeq6/, 2024.[88] Huawei. Huawei Ascend 910B: AI chip for autonomous driving[EB/OL]. https://www.huawei.com/en/products/ascend, 2024.[89] IDC. Worldwide Semiconductor Market Forecast for Autonomous Driving, 2024-2028[EB/OL]. https://www.idc.com/promo/semiconductors/autonomous-driving, 2024.[90] TSMC. 3nm process technology: For high-performance computing and autonomous driving[EB/OL]. https://www.tsmc.com/, 2023.[91] United Nations Economic Commission for Europe. Regulation No. 157: Uniform provisions concerning the approval of vehicles with regard to automated lane keeping systems (ALKS)[EB/OL]. https://unece.org/trans/main/wp29/wp29regs/R157e.html, 2021.[92] National Highway Traffic Safety Administration. Federal Motor Vehicle Safety Standards for Automated Driving Systems[EB/OL]. https://www.nhtsa.gov/, 2023.[93] European Commission. Ethical guidelines for trustworthy AI[EB/OL]. https://digital-strategy.ec.europa.eu/en/library/ethical-guidelines-trustworthy-ai, 2019.[94] Standing Committee of the National Peoples Congress of the Peoples Republic of China. Personal Information Protection Law of the Peoples Republic of China[EB/OL]. 
https://www.npc.gov.cn/, 2021.