Digging into the data: The Cropster Data Project

In part 2 of our series on the Cropster Data Project, we took a closer look at how we process, protect and secure the data we used for the project. In this third part, we'll look at the results of our research and the implications they have for the coffee community.

Machine and Human Learning

Before digging into the details, we had to convert the data into a form that could be easily analyzed. During this phase, our data engineering team worked closely with the researchers to clean and pre-process the data, eliminate statistical outliers and decide which data subsets were most relevant to the research questions.

After the data cleaning, the research team tried to identify which roasting parameters have the largest influence on the quality of the roasted coffee (as measured by cupping score). For this we used temperature time series data and the corresponding metadata and looked for relationships with the cupping score. In total we analyzed more than 40 features, such as mean temperatures in certain phases, deviations, and the skewness and kurtosis of the roast curves. Because more than 40 features are hard to visualize and interpret at once, we reduced the dimensionality of the data with a principal component analysis. As we expected, given the many variables that matter in roasting, this first approach did not reveal any significant patterns, though it did point us in the right direction going forward.
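To make the kind of features mentioned above more concrete, here is a minimal sketch of how a few of them (mean, standard deviation, skewness and kurtosis) could be computed for a single roast curve. The function name `curve_features` and the plain list of temperatures are assumptions for illustration; this is not the project's actual feature set or code, and the principal component analysis step is not shown.

```python
import statistics

def curve_features(temps):
    """Summary statistics for one roast curve (a list of temperatures in °C).

    Hypothetical helper: the real project used more than 40 features.
    """
    n = len(temps)
    mean = statistics.fmean(temps)
    std = statistics.pstdev(temps)  # population standard deviation
    # Third and fourth central moments of the curve.
    m3 = sum((t - mean) ** 3 for t in temps) / n
    m4 = sum((t - mean) ** 4 for t in temps) / n
    # Standardize the moments; a flat curve (std == 0) gets 0.0 for both.
    skewness = m3 / std ** 3 if std > 0 else 0.0
    kurtosis = m4 / std ** 4 - 3.0 if std > 0 else 0.0  # excess kurtosis
    return {"mean": mean, "std": std, "skewness": skewness, "kurtosis": kurtosis}
```

Applying such a function to every roast, or to each phase of every roast, would give one row of features per roast, which is the kind of table a principal component analysis works on.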

We then analyzed the similarity of a roast curve with its corresponding profile curve. Although this sounds relatively easy, it is actually quite complex. Since roast curves are never exactly the same length, we had to use shape matching algorithms to align the roast curve with its profile curve without losing too much information. Unfortunately, the difference between the roast and profile curves did not show a direct correlation with the cupping score.
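One common way to compare two curves of different lengths is dynamic time warping (DTW). We are using it here purely as an illustrative stand-in: the shape matching algorithms actually used in the project are not specified, and the function name `dtw_distance` is an assumption.

```python
def dtw_distance(roast, profile):
    """Dynamic time warping distance between two temperature sequences.

    Illustrative only; not necessarily the algorithm used in the project.
    """
    n, m = len(roast), len(profile)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning roast[:i] with profile[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(roast[i - 1] - profile[j - 1])
            # Extend the cheapest of the three possible previous alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

A distance of zero means the roast follows its profile exactly once the time axis is stretched or compressed; larger values mean the shapes differ more.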

Next, we simplified our approach by defining "good roasts" as those with the highest possible normalized score and "bad roasts" as those with any lower score. We applied state-of-the-art machine learning algorithms to the simplified data set and were then able to distinguish good roasts from bad roasts with a statistically significant level of accuracy.
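The simplification described above amounts to turning a score into a two-class label. A minimal sketch, assuming scores are normalized by dividing by a maximum possible score (the name `label_roasts` and the default `max_score=100.0` are assumptions, not values from the project):

```python
def label_roasts(scores, max_score=100.0):
    """Label each roast 'good' if it reaches the top normalized score, else 'bad'.

    max_score is a hypothetical normalization constant.
    """
    labels = []
    for score in scores:
        normalized = score / max_score
        labels.append("good" if normalized >= 1.0 else "bad")
    return labels
```

Labels like these can then serve as the target for any standard binary classifier trained on the roast curve features.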

We then focused our analysis on predicting the flavors a roast was going to have. The university team developed a new representation of the problem space which allowed them to train machine learning algorithms to predict the flavor labels based on the temperature curve. This task required intensive preprocessing of the labels in collaboration with one of our roasting experts. The results not only allowed us to predict the flavor labels of a roast from its curve to a significant degree, but also showed, for instance, that bitter tastes form during roasts that remain longer in a medium temperature range (approx. 125 °C to 175 °C) before they reach high ranges (>220 °C) later during the roast.
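The observation about bitter tastes can be expressed as a simple curve feature: how long a roast stays in the medium range before it first enters the high range. The sketch below uses the temperature bounds from the text; the function name `seconds_in_medium_range` and the fixed sampling interval are assumptions for illustration.

```python
def seconds_in_medium_range(temps, sample_interval_s=1.0,
                            low=125.0, high=175.0, hot=220.0):
    """Time spent between low and high (°C) before the curve first exceeds hot.

    Assumes temps are sampled at a constant, hypothetical interval.
    """
    seconds = 0.0
    for t in temps:
        if t > hot:
            break  # only count time before the high range is reached
        if low <= t <= high:
            seconds += sample_interval_s
    return seconds
```

For example, `seconds_in_medium_range([100, 130, 150, 170, 200, 225])` counts the three samples at 130, 150 and 170 °C and stops once the curve passes 220 °C.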

The results of the first project were refined in a second project. There, we improved the mapping between flavors and the temperature curve by adding sentiment labels to the flavor labels and by combining the labels in a systematic manner. This allowed us to reduce the number of potential classes and eliminate uncommon labels from the target dataset. With the second dataset we were able to extend the prototype to predict multiple combinations of labels at once.
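As a rough illustration of what combining flavor and sentiment labels and dropping uncommon ones could look like, consider the sketch below. The label format (`"flavor:sentiment"`), the function name `combine_labels` and the `min_count` threshold are all assumptions; the scheme the project actually used is not described here.

```python
from collections import Counter

def combine_labels(roast_labels, min_count=2):
    """Merge (flavor, sentiment) pairs per roast and drop rare combined labels.

    roast_labels: one list of (flavor, sentiment) tuples per roast.
    min_count: hypothetical threshold below which a label counts as uncommon.
    """
    # One sorted, de-duplicated list of combined labels per roast.
    combined = [
        sorted({f"{flavor}:{sentiment}" for flavor, sentiment in pairs})
        for pairs in roast_labels
    ]
    # Count in how many roasts each combined label occurs.
    counts = Counter(label for labels in combined for label in labels)
    return [[l for l in labels if counts[l] >= min_count] for labels in combined]
```

Each roast keeps a list of labels, so a model trained on this target predicts several labels at once rather than a single class.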

Outlook

The Cropster Data Project has already shown us that there is much more to discover. Even with a few small-scale research projects, we have found and visualized relationships between roasting data and the quality of the final product. Some of the ideas worked well and some proved not to be worth pursuing, but we are just getting started and plan to continue our collaboration with enthusiastic researchers from different domains. Many aspects of coffee roasting remain mysterious, and the new knowledge we gained has left us with more questions than answers. These questions invite further analysis of the coffee roasting process and we are excited to continue that research!

 
