Dynamic pricing with demand learning refers to the profit maximization problem where one optimizes the profit by choosing a price and learns the demand at the same time. Traditionally, the cost is fixed and the problem may be formulated as a multi armed bandit problem, which is known to have an O(logT) lower bound on the expected regret, where T is the number of periods. In this paper, we consider the case where the cost changes over periods. Then, the optimal pricing decision becomes a function of the cost. We propose an upper-confidence-bound type of algorithm to solve the problem. When the cost is a continuous random variable, we prove that the expected regret of our proposed algorithm is O((logT)2). When the cost is discrete, surprisingly, we find that the expected regret may be bounded by a constant. This is a joint work with Ying Zhong and Guangwu Liu.