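Automated machine learning (AutoML) is the process of automating, end to end, the application of machine learning to real-world problems. In a typical machine learning application, practitioners have a dataset of input data points to train on. The raw data may not be in a form that every algorithm can use out of the box. An expert may have to apply suitable data preprocessing, feature engineering, feature extraction, and feature selection methods to make the dataset amenable to machine learning. After these preprocessing steps, practitioners must select an algorithm and optimize its hyperparameters to maximize the predictive performance of the final model. Because many of these steps are beyond the abilities of non-experts, AutoML was proposed as an artificial-intelligence-based solution to the growing challenge of applying machine learning.[1][2]

Automating the end-to-end process of applying machine learning offers several advantages: it produces simpler solutions, creates them faster, and often yields models that outperform hand-designed ones. However, AutoML is not a silver bullet. It can introduce additional parameters of its own (in effect, hyperparameters of the AutoML system itself), and setting these may require some expertise. It does, however, make machine learning easier for non-experts to apply.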
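The core search problem described above (choosing preprocessing, an algorithm, and its hyperparameters together so as to maximize predictive performance) is what Thornton et al.[1] call combined algorithm selection and hyperparameter optimization (CASH). The sketch below illustrates the idea with plain random search scored by k-fold cross-validation. Everything in it is an illustrative assumption rather than the method of any cited system: the two preprocessing options, the two toy classifiers (k-nearest neighbours and nearest centroid), their hyperparameter ranges, the synthetic data, and every function name (`random_search`, `cv_accuracy`, `make_data`, and so on) are hypothetical. Real tools use more sample-efficient strategies, for example Bayesian optimization in Auto-WEKA or genetic programming in TPOT.

```python
import random
import statistics


def identity(train_X, test_X):
    """Hypothetical preprocessing option: leave features unchanged."""
    return list(train_X), list(test_X)


def standardize(train_X, test_X):
    """Hypothetical preprocessing option: z-score features using training-fold statistics only."""
    cols = list(zip(*train_X))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # 1.0 avoids division by zero

    def apply(X):
        return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in X]

    return apply(train_X), apply(test_X)


def knn_predict(train_X, train_y, test_X, k):
    """k-nearest neighbours with majority vote; ties go to the smallest label."""
    preds = []
    for q in test_X:
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(q, p)), label)
            for p, label in zip(train_X, train_y)
        )
        votes = [label for _, label in dists[:k]]
        preds.append(max(sorted(set(votes)), key=votes.count))
    return preds


def centroid_predict(train_X, train_y, test_X, metric):
    """Nearest class centroid under a Manhattan or (squared) Euclidean metric."""
    classes = sorted(set(train_y))
    centroids = {}
    for c in classes:
        members = [x for x, t in zip(train_X, train_y) if t == c]
        centroids[c] = tuple(statistics.fmean(col) for col in zip(*members))

    def dist(a, b):
        if metric == "manhattan":
            return sum(abs(u - v) for u, v in zip(a, b))
        return sum((u - v) ** 2 for u, v in zip(a, b))  # squared: same argmin as Euclidean

    return [min(classes, key=lambda c: dist(q, centroids[c])) for q in test_X]


PREPROCESSORS = {"none": identity, "standardize": standardize}
ALGORITHMS = {"knn": knn_predict, "centroid": centroid_predict}
# Hypothetical search space: each entry samples one hyperparameter configuration.
SEARCH_SPACE = {
    "knn": lambda rng: {"k": rng.choice([1, 3, 5, 7])},
    "centroid": lambda rng: {"metric": rng.choice(["euclidean", "manhattan"])},
}


def cv_accuracy(X, y, prep, algo, params, folds=5):
    """k-fold cross-validated accuracy of one (preprocessor, algorithm, hyperparameters) pipeline."""
    n, correct = len(X), 0
    for f in range(folds):
        test_idx = list(range(f, n, folds))
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        tr_X, te_X = prep([X[i] for i in train_idx], [X[i] for i in test_idx])
        preds = ALGORITHMS[algo](tr_X, [y[i] for i in train_idx], te_X, **params)
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / n


def random_search(X, y, budget=20, seed=0):
    """Jointly sample preprocessing, algorithm and hyperparameters; keep the best CV score."""
    rng = random.Random(seed)
    best = None
    for _ in range(budget):
        prep = rng.choice(sorted(PREPROCESSORS))
        algo = rng.choice(sorted(ALGORITHMS))
        params = SEARCH_SPACE[algo](rng)
        score = cv_accuracy(X, y, PREPROCESSORS[prep], algo, params)
        if best is None or score > best[0]:
            best = (score, prep, algo, params)
    return best


def make_data(n_per_class=40, seed=1):
    """Synthetic two-class data: feature 0 is informative, feature 1 is large-scale noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(center, 1.0), rng.gauss(0.0, 100.0)))
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    score, prep, algo, params = random_search(X, y)
    print(f"best pipeline: {prep} -> {algo} {params}, CV accuracy {score:.2f}")
```

On this synthetic data the unscaled noise feature dominates both distance computations, so pipelines that include `standardize` would be expected to score higher; the exact winning configuration depends on the seed and budget. Treating the preprocessing step as just another searchable choice, rather than fixing it by hand, is what makes the search "end to end".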
AutoML can target various stages of the machine learning process:[2]
Notable platforms that address various stages of AutoML:
[1] Thornton C, Hutter F, Hoos HH, Leyton-Brown K (2013). "Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms". KDD '13: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 847–855.
[2] Hutter F, Caruana R, Bardenet R, Bilenko M, Guyon I, Kégl B, Larochelle H. "AutoML 2014 @ ICML". AutoML 2014 Workshop @ ICML. Retrieved 2018-03-28.
[3] Kotthoff L, Thornton C, Hoos HH, Hutter F, Leyton-Brown K (2017). "Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA". Journal of Machine Learning Research. 18 (25): 1–5.
[4] Feurer M, Klein A, Eggensperger K, Springenberg J, Blum M, Hutter F (2015). "Efficient and Robust Automated Machine Learning". Advances in Neural Information Processing Systems 28 (NIPS 2015). pp. 2962–2970.
[5] Swearingen T, Drevo W, Cyphers B, Cuesta-Infante A, Ross A, Veeramachaneni K (December 2017). "ATM: A distributed, collaborative, scalable system for automated machine learning". 2017 IEEE International Conference on Big Data (Big Data). IEEE. doi:10.1109/bigdata.2017.8257923. ISBN 9781538627150.
[6] Olson RS, Urbanowicz RJ, Andrews PC, Lavender NA, Kidd L, Moore JH (2016). "Automating biomedical data science through tree-based pipeline optimization". Proceedings of EvoStar 2016. Lecture Notes in Computer Science. 9597. pp. 123–137. arXiv:1601.07925. doi:10.1007/978-3-319-31204-0_9. ISBN 978-3-319-31203-3.
[7] Olson RS, Bartley N, Urbanowicz RJ, Moore JH (2016). "Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science". Proceedings of EvoBIO 2016. GECCO '16. pp. 485–492. arXiv:1603.06212. doi:10.1145/2908812.2908918. ISBN 9781450342063.
[8] Nabar S (2018-08-16). "Open Sourcing TransmogrifAI – Automated Machine Learning for Structured Data". Salesforce Engineering. Retrieved 2018-08-16.
[9] Wiggers K (2018-08-16). "Salesforce open-sources TransmogrifAI, the machine learning library that powers Einstein". VentureBeat. Retrieved 2018-08-16. Quote: "Once TransmogrifAI has extracted features from the dataset, it's primed to begin automated model training. At this stage, it runs a cadre of machine learning algorithms in parallel on the data, automatically selects the best-performing model, and samples and recalibrates predictions to avoid imbalanced data."
[10] de Sá AGC, Pinto WJGS, Oliveira LOVB, Pappa GL (2017). "RECIPE: A Grammar-Based Framework for Automatically Evolving Classification Pipelines". Lecture Notes in Computer Science. Springer International Publishing. pp. 246–261. doi:10.1007/978-3-319-55696-3_16. ISBN 9783319556956.
[11] Jin H, Song Q, Hu X (2018). "Auto-Keras: Efficient Neural Architecture Search with Network Morphism". arXiv:1806.10282 [cs.LG].